MLAir issues
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues
2021-03-08T15:38:05+01:00

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/271
add CDC database DataHandler
2021-03-08T15:38:05+01:00
Ghost User
The goal is to add a database access script inside the DataHandler structure for use in Falco's master's thesis project.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/265
REFAC: remove defaults from default data handler
2022-06-07T12:18:56+02:00
Ghost User
Currently, `DataHandlerSingleStation` sets many defaults. As a result, some settings are determined by the data handler and not by the experiment setup (e.g. station_type is not given in the experiment setup, but is set by the data handler). It should be clearer which defaults are used. Therefore, I suggest removing some defaults and using `None` instead.
* [ ] investigate which parameters shouldn't be filled with a default value by the data handler
* [ ] create a list with these parameters
* [ ] apply refactoring

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/256
tests for default data handler
2020-12-17T10:45:04+01:00
Ghost User
*placeholder for now*
implement tests for `data_handler/default_data_handler`
Data Handler Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/255
test for data handler single station
2020-12-17T10:44:35+01:00
Ghost User
*placeholder for now*
implement tests for `data_handler/data_handler_single_station`
Data Handler Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/254
tests for data handler mixed sampling
2020-12-17T10:44:12+01:00
Ghost User
*placeholder for now*
implement tests for `data_handler/data_handler_mixed_sampling`
Data Handler Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/253
tests for data handler kz filter
2020-12-17T10:43:46+01:00
Ghost User
*placeholder for now*
implement tests for `data_handler/data_handler_kz_filter`
Data Handler Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/252
tests for abstract data handler
2020-12-17T10:43:07+01:00
Ghost User
*placeholder for now*
implement tests for `data_handler/abstract_data_handler`
Data Handler Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/251
new tests for training
2020-12-17T10:41:15+01:00
Ghost User
*placeholder for now*
implement tests for run module `training`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/250
new tests for run environment
2020-12-17T10:40:58+01:00
Ghost User
*placeholder for now*
implement tests for run module `run_environment`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/249
new tests for preprocessing
2020-12-17T10:40:31+01:00
Ghost User
*placeholder for now*
implement tests for run module `pre_processing`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/248
new tests for post processing
2020-12-17T10:40:13+01:00
Ghost User
*placeholder for now*
implement tests for run module `post_processing`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/247
new tests for model setup
2020-12-17T10:39:50+01:00
Ghost User
*placeholder for now*
implement tests for run module `model_setup`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/246
new tests for experiment setup
2020-12-17T10:39:29+01:00
Ghost User
*placeholder for now*
implement tests for run module `experiment_setup`
Run Module Testing

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/234
New Map Plot using Folium
2022-08-10T15:48:52+02:00
Ghost User
There is an interesting Python package called [Folium](https://python-visualization.github.io/folium/index.html) that can create different maps using Leaflet.
![image](/uploads/e9fb2feeac76374e77d7e652693c5728/image.png)
![image](/uploads/aa9cf6429c205f202c83b3e53b263e32/image.png)
![image](/uploads/3dd107b6df62f0ea5b88f499289071d5/image.png)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/230
Boost decoupling of MLAir and data handlers
2022-08-31T10:39:18+02:00
Ghost User
Some parameters that relate to our data handlers could be removed from MLAir's workflow and run_module so that the workflow becomes more general. This applies, for example, to the parameters `window_history_length` and `window_lead_time`, which may have no effect for custom data handlers. These parameters are only used to build the data handler, which could also be done with default values inside the data handler. If these parameters need to be adjusted, it would still be possible to add the key-value pair in the workflow's init call. All kwargs are also stored in the data store and are therefore available to the data handler.
Create a list of parameters that are **candidates for removal**:
* [ ] `window_history_length`
* [ ] `window_lead_time`
* [ ] `interpolation_method`
* [ ] `interpolation_limit`
* [ ] `data_origin`
* [ ] `variables`
* [ ] `statistics_per_var`
* [ ] `extreme_values`
* [ ] `extremes_on_right_tail_only`
* [ ] `neighbors`
* [ ] `overwrite_local_data` (<- discussion on this parameter)
* [ ] `sampling` (<- discussion on this parameter, used in postprocessing - for what?)
* [ ] `store_data_locally`
* [ ] `store_processed_data` (<- discussion on this parameter, actually I forgot the meaning of this)
* [x] `target_dim` (<- discussion on this parameter, this could be required for all postprocessing routines, but is currently not used in fact! - is there still too much hardcoded? - solved in #272)
* [ ] `target_var` (<- discussion on this parameter)
It is not easy to remove parameters that change between subsets, such as `start` and `min_length`. **How to deal with this?**
Next, indicate which parameters are going to be removed!
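The idea of moving defaults into the data handler while still allowing an override through the workflow's kwargs could look roughly like the following. This is only a sketch: `WindowedHandlerSketch`, `build_handler`, the default values, and the use of a plain dict in place of MLAir's data store are all assumptions made for illustration, not existing MLAir code.

```python
class WindowedHandlerSketch:
    """Hypothetical data handler that owns its window defaults."""

    # Defaults live in the handler, not in the workflow (values are made up).
    DEFAULTS = {"window_history_length": 13, "window_lead_time": 3}

    def __init__(self, station, **kwargs):
        self.station = station
        # Start from the handler's defaults, then let explicit kwargs win.
        # Unknown keys are ignored so unrelated workflow settings do no harm.
        overrides = {k: v for k, v in kwargs.items() if k in self.DEFAULTS}
        settings = {**self.DEFAULTS, **overrides}
        self.window_history_length = settings["window_history_length"]
        self.window_lead_time = settings["window_lead_time"]


def build_handler(handler_cls, station, data_store):
    """Pass everything stored from the workflow's kwargs on to the handler.

    `data_store` is a plain dict standing in for MLAir's data store.
    """
    return handler_cls(station, **data_store)


# No window settings in the workflow: the handler falls back to its defaults.
default_handler = build_handler(WindowedHandlerSketch, "DEBW107", {})
# A key-value pair given in the workflow's init call overrides one default.
custom_handler = build_handler(WindowedHandlerSketch, "DEBW107", {"window_lead_time": 5})
```

With this layout, a custom data handler that has no notion of a window simply never looks at these keys, and the workflow does not need to know about them.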
**Further tasks**
* [x] decouple make prediction in postprocessing (solved in #272)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/219
bootstrapped skill scores on denormalised data
2020-11-24T16:49:49+01:00
Ghost User
Currently, normalised forecasts are stored only so that they can be used for the bootstrap analysis. However, this analysis could also be performed in the original value space if the retransformation is applied to the bootstrapped predictions. It may be worth saving the computation time for the normalised forecasts and applying the transformation to the bootstrap predictions instead.
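As a rough illustration of evaluating in the original value space, the sketch below applies an inverse transformation to both the regular and the bootstrapped prediction before computing a score. It assumes a simple standardisation (mean and standard deviation) as the transformation and an MSE-based skill score; both are assumptions for this example, and all function names and numbers are hypothetical rather than taken from MLAir.

```python
def inverse_standardise(values, mean, std):
    """Map standardised values back to the original value space."""
    return [v * std + mean for v in values]


def mse(pred, obs):
    """Mean squared error between a prediction and the observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def skill_score(pred, ref, obs):
    """MSE-based skill score of `pred` relative to the reference `ref`."""
    return 1.0 - mse(pred, obs) / mse(ref, obs)


# Made-up transformation parameters and data, for illustration only.
train_mean, train_std = 40.0, 10.0
observations = [30.0, 50.0, 45.0]
forecast_norm = [-0.8, 0.9, 0.4]   # regular prediction, normalised
bootstrap_norm = [0.1, -0.2, 0.3]  # prediction on bootstrapped input, normalised

# Retransform both predictions, then score in the original value space.
forecast = inverse_standardise(forecast_norm, train_mean, train_std)
bootstrap = inverse_standardise(bootstrap_norm, train_mean, train_std)
score = skill_score(forecast, bootstrap, observations)
```

In this sketch only the retransformed predictions are needed for scoring, so the normalised forecasts would not have to be stored separately.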
Before making a decision, first check whether the normalised forecast is used only by the bootstrap methods or also somewhere else.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/218
hyperparameters for model class
2020-11-20T14:00:24+01:00
Ghost User
Currently, all hyperparameters regarding the model class (e.g. initial learning rate) have to be implemented inside the model class. Change this so that a model can request hyperparameters. Try to use a scheme similar to the one used for the data handler (working with a requirements method that is called by MLAir).

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/217
split init and run of all run_modules
2021-07-23T18:18:52+02:00
Ghost User
Split init and run of all run_modules. Also add the init and the run call to the workflow's run method.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/209
create test methods to verify a custom class implementation
2020-11-11T14:13:44+01:00
Ghost User
Create a test suite that can be used for default testing during CI as well as by a user of MLAir.
With this test suite, a custom class, e.g. a new data handler or model module, can be tested to see whether it fulfils all requirements.
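One possible building block for such a suite is a check that a custom class provides the expected methods. The sketch below is a minimal, assumed design: the helper `missing_methods` and the example class are hypothetical, and the method names are taken from the data handler checklist of this issue. It only checks that the methods exist, not that they behave correctly.

```python
REQUIRED_DATA_HANDLER_METHODS = ("get_X", "get_Y", "build", "requirements")


def missing_methods(cls, required=REQUIRED_DATA_HANDLER_METHODS):
    """Return the names in `required` that `cls` does not provide as callables."""
    return [name for name in required if not callable(getattr(cls, name, None))]


class IncompleteHandler:
    """Hypothetical custom data handler that forgot two methods."""

    def get_X(self):
        return []

    def get_Y(self):
        return []


# A test (in CI or run by a user) could assert that nothing is missing.
problems = missing_methods(IncompleteHandler)
```

The same helper could be reused for model classes or run modules by passing a different tuple of required names.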
Tests to implement:
* [ ] data handler
  * [ ] get_X
  * [ ] get_Y
  * [ ] build
  * [ ] requirements
* [ ] model class
  * [ ] model
  * [ ] loss
* [ ] run module
  * [ ] init (maybe split init and _run, and always call init and run of a stage -> refac)
* [ ] workflow
  * [ ] can run?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/205
stationwise batch-normalisation not supported anymore
2020-11-04T14:14:05+01:00
Ghost User
Due to the developments in #202, the stationwise normalisation (each station's train data has mean=0, std=1), formerly called `scope="station"` (in contrast to `scope="data"`), is no longer supported on the preprocessing side (but still on the postprocessing side, as far as I can tell). If this "feature" becomes of interest again, the code, namely the preprocessing and data handler parts, must be adjusted.
First idea: It could be sufficient to skip the general transformation step. But how to deal with the remaining subsets? Is there an attribute that is stored for each station separately? (Something like an additional subscope in the data store like `general.train.DEBW107`.) This might be much easier if #204 is solved in such a way that the remaining subsets are some kind of copy. In that case, the parameters would not be reloaded and the transformation already estimated from the train subset could be used.
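The idea of estimating the transformation per station on the train subset and reusing it for the remaining subsets could be sketched as follows. This is an assumption-laden illustration in plain Python: the function names, the dict keyed by station ID (standing in for a per-station subscope in the data store), and the numbers are all hypothetical.

```python
from statistics import mean, pstdev


def fit_station_transformation(train_by_station):
    """Estimate mean and std per station from its train data only."""
    return {
        station: (mean(values), pstdev(values))
        for station, values in train_by_station.items()
    }


def apply_transformation(values, station, transformation):
    """Standardise values with the parameters stored for this station."""
    station_mean, station_std = transformation[station]
    return [(v - station_mean) / station_std for v in values]


# Made-up data for one station; each station gets its own parameters.
train = {"DEBW107": [10.0, 20.0, 30.0]}
transformation = fit_station_transformation(train)

# The train subset ends up with mean 0 and std 1 for this station.
train_scaled = apply_transformation(train["DEBW107"], "DEBW107", transformation)
# Other subsets reuse the train parameters instead of estimating their own.
val_scaled = apply_transformation([20.0, 40.0], "DEBW107", transformation)
```

Keeping the parameters in one per-station mapping means the validation and test subsets never need to re-estimate them.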