Commit 27734c2e
authored 4 years ago by lukas leufen
added table of contents to readme

parent 1398ec15
No related branches or tags found.

3 merge requests: !146 Develop, !145 Resolve "new release v0.12.0", !138 Resolve "Advanced Documentation"

Pipeline #45971 passed 4 years ago (stages: test, docs, pages, deploy)
Showing 2 changed files with 55 additions and 57 deletions:

* README.md (+2, −57)
* mlair/data_handler/station_preparation.py (+53, −0)
README.md +2 −57
...
...
@@ -4,6 +4,8 @@ MLAir (Machine Learning on Air data) is an environment that simplifies and accel
learning (ML) models for the analysis and forecasting of meteorological and air quality time series. You can find the
docs [here](http://toar.pages.jsc.fz-juelich.de/mlair/docs/).

[[_TOC_]]

# Installation

MLAir is based on several python frameworks. To work properly, you have to install all packages from the
...
...
@@ -375,60 +377,3 @@ add it to `src/join_settings.py` in the hourly data section. Replace the `TOAR_S
value. To make sure that this **sensitive** data is not uploaded to the remote server, use the following command to
prevent git from tracking this file: `git update-index --assume-unchanged src/join_settings.py`
# remaining things

## Transformation

There are two different approaches (called scopes) to transform the data:

1) `station`: transform the data for each station independently (somewhat like batch normalisation)
1) `data`: transform all data of each station with shared metrics

Transformation must be set by the `transformation` attribute. If `transformation = None` is given to `ExperimentSetup`,
the data is not transformed at all. For all other setups, use the following dictionary structure to specify the
transformation.

```
transformation = {"scope": <...>,
                  "method": <...>,
                  "mean": <...>,
                  "std": <...>}
ExperimentSetup(..., transformation=transformation, ...)
```
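
For illustration, a filled-in version of this dictionary might look as follows. The chosen values are just one valid combination, and the `ExperimentSetup` import path is an assumption that may differ between MLAir versions.

```
# hedged sketch, not taken from the MLAir docs: the import path below is assumed
from mlair.run_modules.experiment_setup import ExperimentSetup

# transform all data with shared metrics, estimating mean/std station-wise
transformation = {"scope": "data",
                  "method": "standardise",
                  "mean": "estimate"}

# all other ExperimentSetup arguments (stations, variables, ...) are omitted here
ExperimentSetup(transformation=transformation)
```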
### scopes

**station**: mean and std are not used

**data**: either provide already calculated values for mean and std (if required by the transformation method), or choose
from different calculation schemes, explained in the mean and std section.

### supported transformation methods

Currently supported methods are (a short sketch of both follows below):

* standardise (default, if method is not given)
* centre
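
As a rough sketch of what the two methods amount to (the expressions below are the usual definitions, not copied from the MLAir implementation, and the variable names are only examples):

```
import numpy as np
import xarray as xr

# toy data: 100 time steps, 3 variables
data = xr.DataArray(np.random.rand(100, 3),
                    dims=["datetime", "variables"],
                    coords={"variables": ["o3", "temp", "relhum"]})

mean = data.mean(dim="datetime")
std = data.std(dim="datetime")

standardised = (data - mean) / std  # "standardise": zero mean, unit std per variable
centred = data - mean               # "centre": zero mean, spread unchanged
```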
### mean and std
`"mean"="accurate"`
: calculate the accurate values of mean and std (depending on method) by using all data. Although,
this method is accurate, it may take some time for the calculation. Furthermore, this could potentially lead to memory
issue (not explored yet, but could appear for a very big amount of data)
`"mean"="estimate"`
: estimate mean and std (depending on method). For each station, mean and std are calculated and
afterwards aggregated using the mean value over all station-wise metrics. This method is less accurate, especially
regarding the std calculation but therefore much faster.
We recommend to use the later method
*estimate*
because of following reasons:
*
much faster calculation
*
real accuracy of mean and std is less important, because it is "just" a transformation / scaling
*
accuracy of mean is almost as high as in the
*accurate*
case, because of
$
\b
ar{x_{ij}} =
\b
ar{
\l
eft(
\b
ar{x_i}
\r
ight)_j}$. The only difference is, that in the
*estimate*
case, each mean is
equally weighted for each station independently of the actual data count of the station.
*
accuracy of std is lower for
*estimate*
because of $
\v
ar{x_{ij}}
\n
e
\b
ar{
\l
eft(
\v
ar{x_i}
\r
ight)_j}$, but still the mean of all
station-wise std is a decent estimate of the true std.
`"mean"=<value, e.g. xr.DataArray>`
: If mean and std are already calculated or shall be set manually, just add the
scaling values instead of the calculation method. For method
*centre*
, std can still be None, but is required for the
*standardise*
method.
**Important**
: Format of given values
**must**
match internal data format of DataPreparation
class:
`xr.DataArray`
with
`dims=["variables"]`
and one value for each variable.
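
A minimal sketch of providing precomputed values in exactly this format (variable names and numbers are made up for illustration):

```
import xarray as xr

# one value per variable; dims must be exactly ["variables"]
mean = xr.DataArray([47.3, 8.1], dims=["variables"],
                    coords={"variables": ["o3", "temp"]})
std = xr.DataArray([15.9, 7.2], dims=["variables"],
                   coords={"variables": ["o3", "temp"]})

transformation = {"scope": "data",
                  "method": "standardise",
                  "mean": mean,
                  "std": std}
```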
mlair/data_handler/station_preparation.py +53 −0
...
...
@@ -514,6 +514,59 @@ class DataHandlerSingleStation(AbstractDataHandlerSingleStation):
:param transformation: the transformation dictionary as described above.
:return: updated transformation dictionary

## Transformation

There are two different approaches (called scopes) to transform the data:

1) `station`: transform the data for each station independently (somewhat like batch normalisation)
1) `data`: transform all data of each station with shared metrics

Transformation must be set by the `transformation` attribute. If `transformation = None` is given to `ExperimentSetup`,
the data is not transformed at all. For all other setups, use the following dictionary structure to specify the
transformation.

```
transformation = {"scope": <...>,
                  "method": <...>,
                  "mean": <...>,
                  "std": <...>}
ExperimentSetup(..., transformation=transformation, ...)
```

### scopes

**station**: mean and std are not used

**data**: either provide already calculated values for mean and std (if required by the transformation method), or choose
from different calculation schemes, explained in the mean and std section.

### supported transformation methods

Currently supported methods are:

* standardise (default, if method is not given)
* centre

### mean and std

`"mean"="accurate"`: calculate the accurate values of mean and std (depending on the method) by using all data. Although
this method is accurate, it may take some time for the calculation. Furthermore, it could potentially lead to memory
issues (not explored yet, but this could appear for a very large amount of data).

`"mean"="estimate"`: estimate mean and std (depending on the method). For each station, mean and std are calculated and
afterwards aggregated using the mean value over all station-wise metrics. This method is less accurate, especially
regarding the std calculation, but therefore much faster.

We recommend using the latter method *estimate* for the following reasons:

* much faster calculation
* the real accuracy of mean and std is less important, because it is "just" a transformation / scaling
* the accuracy of the mean is almost as high as in the *accurate* case, because
  $\bar{x_{ij}} = \bar{\left(\bar{x_i}\right)_j}$. The only difference is that in the *estimate* case, each mean is
  equally weighted for each station, independently of the actual data count of the station.
* the accuracy of the std is lower for *estimate* because $\var{x_{ij}} \ne \bar{\left(\var{x_i}\right)_j}$, but still the mean of all
  station-wise std is a decent estimate of the true std.

`"mean"=<value, e.g. xr.DataArray>`: If mean and std are already calculated or shall be set manually, just add the
scaling values instead of the calculation method. For the method *centre*, std can still be None, but it is required for the
*standardise* method. **Important**: The format of the given values **must** match the internal data format of the DataPreparation
class: `xr.DataArray` with `dims=["variables"]` and one value for each variable.
"""
if transformation is None:
    return
...
...