
MachineLearningTools

This is a collection of the functions used for machine learning work in the ESDE group.

Inception Model

See a description here or take a look at the papers Going Deeper with Convolutions (Szegedy et al., 2014) and Network In Network (Lin et al., 2014).

Installation

  • Install proj on your machine using the console, e.g. on openSUSE Leap: zypper install proj
  • A C++ compiler is required for the cartopy installation.

Security

  • Using hourly data from ToarDB via the JOIN interface requires a private token. Request your personal access token and add it to src/join_settings.py in the hourly data section, replacing the TOAR_SERVICE_URL and the Authorization value. To make sure that this sensitive data is not uploaded to the remote server, use the following command to prevent git from tracking changes to this file: git update-index --assume-unchanged src/join_settings.py

Customise your experiment

This section summarises which parameters can be customised for a training.

Transformation

There are two different approaches (called scopes) to transform the data:

  1. station: transform data for each station independently (somewhat like batch normalisation)
  2. data: transform the data of all stations with shared metrics

Transformation must be set by the transformation attribute. If transformation = None is given to ExperimentSetup, data is not transformed at all. For all other setups, use the following dictionary structure to specify the transformation.

transformation = {"scope": <...>, 
                  "method": <...>,
                  "mean": <...>,
                  "std": <...>}
ExperimentSetup(..., transformation=transformation, ...)

scopes

station: mean and std are not used

data: either provide already calculated values for mean and std (if required by transformation method), or choose from different calculation schemes, explained in the mean and std section.
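As an illustration of the two scopes, the dictionaries below follow the structure shown above. This is only a sketch under assumptions: the source does not say whether unused keys may be omitted or must be set to None, nor whether "std" has to be set when a calculation scheme is chosen for "mean", so check this against ExperimentSetup before relying on it.

```python
# Station scope: each station is transformed with its own metrics,
# so "mean" and "std" are not used (assumption: they may simply be left out).
station_transformation = {"scope": "station", "method": "standardise"}

# Data scope with a calculation scheme instead of precomputed values
# (assumption: naming the scheme under "mean" is sufficient).
data_transformation = {"scope": "data", "method": "standardise", "mean": "estimate"}
```

Either dictionary would then be passed as the transformation argument of ExperimentSetup.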

supported transformation methods

Currently supported methods are:

  • standardise (default, if method is not given)
  • centre
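What the two methods do to the data can be sketched in plain Python. This is not the package's implementation: the helper functions centre and standardise and the sample values are hypothetical, and the population standard deviation is assumed.

```python
import statistics


def centre(values, mean):
    """Subtract the mean so that the data is centred around zero."""
    return [v - mean for v in values]


def standardise(values, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


# Hypothetical sample of a single variable
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.fmean(sample)
std = statistics.pstdev(sample)  # population standard deviation (assumption)

centred = centre(sample, mean)                 # needs the mean only
standardised = standardise(sample, mean, std)  # needs mean and std
```

This also shows why centre can work without std while standardise cannot.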

mean and std

"mean"="accurate": calculate the exact values of mean and std (depending on method) using all data. This is accurate, but the calculation may take some time. It could also lead to memory issues (not explored yet, but possible for a very large amount of data).

"mean"="estimate": estimate mean and std (depending on method). Mean and std are calculated for each station and then aggregated by averaging the station-wise metrics. This method is less accurate, especially for std, but much faster.

We recommend the latter method, estimate, for the following reasons:

  • much faster calculation
  • real accuracy of mean and std is less important, because it is "just" a transformation / scaling
  • accuracy of mean is almost as high as in the accurate case, because \bar{x_{ij}} = \bar{\left(\bar{x_i}\right)_j}. The only difference is that in the estimate case each station's mean is weighted equally, independently of the station's actual data count.
  • accuracy of std is lower for estimate, because \mathrm{var}\left(x_{ij}\right) \ne \bar{\left(\mathrm{var}\left(x_i\right)\right)_j}, but the mean of all station-wise std values is still a decent estimate of the true std.
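The difference between the two schemes can be sketched with a few made-up numbers. This is a plain-Python illustration, not the package's code: the station names and values are hypothetical, and the population standard deviation is assumed.

```python
import statistics

# Hypothetical data: one variable at three stations with different sample counts
stations = {
    "station_a": [1.0, 2.0, 3.0],
    "station_b": [3.0, 5.0],
    "station_c": [1.0, 3.0, 3.0, 5.0],
}

# "accurate": pool all values, then calculate mean and std once
pooled = [v for values in stations.values() for v in values]
accurate_mean = statistics.fmean(pooled)
accurate_std = statistics.pstdev(pooled)

# "estimate": calculate the metrics per station, then average them with equal weight
station_means = [statistics.fmean(values) for values in stations.values()]
station_stds = [statistics.pstdev(values) for values in stations.values()]
estimate_mean = statistics.fmean(station_means)
estimate_std = statistics.fmean(station_stds)

print(f"mean: accurate={accurate_mean:.3f}, estimate={estimate_mean:.3f}")
print(f"std:  accurate={accurate_std:.3f}, estimate={estimate_std:.3f}")
```

In this sample the two means differ only because the stations have different sample counts, while the estimated std is smaller than the pooled one because averaging station-wise std values ignores the spread between the station means.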

"mean"=<value, e.g. xr.DataArray>: If mean and std are already calculated or are to be set manually, provide the scaling values instead of the calculation method. For the centre method, std can remain None, but it is required for the standardise method. Important: the format of the given values must match the internal data format of the DataPreparation class: xr.DataArray with dims=["variables"] and one value for each variable.