MLAir issues
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues

Issue 459: Preprocessing German stations (Michael Langguth, 2023-11-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/459

Preprocess data (i.e. generation of transformation and apriori data) for all German stations (rural, suburban **and** urban stations) for the DestinE-AQ use case.
For this purpose, a revised list of stations is parsed and the filtering of NOx data is deactivated.

Issue 455: Skip harmonize history and target on demand (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/455

To issue a real-time forecast, history and target data must not be harmonized, as target data is not available at that time.
* [ ] add a parameter that stores unharmonized history data in a separate variable `self.full_history`
* [ ] add a method that also forecasts on the `full_history` parameter and stores the forecasts as `forecast_full.nc`

Issue 456: add different example run scripts (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/456

Add a number of different example run scripts.
* [ ] run climate FIR
* [ ] run IFS forecast
* [ ] ?

Issue 457: set config paths as parameter (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/457

Add parameters to set the data paths of IFS or CAMS from outside. Use config files only if the parameter is not provided.

Issue 452: update proj version (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/452

<!-- Use this template for a bug in MLAir. -->
# Bug
## Error description
<!-- Provide a context when the bug / error arises -->
Pipeline tests from scratch are failing
## Error message
<!-- Provide the error log if available -->
```shell
$ zypper --no-gpg-checks --non-interactive install proj=9.1.0
Loading repository data...
Reading installed packages...
No provider of 'proj=9.1.0' found.
'proj=9.1.0' not found in package names. Trying capabilities.
```
## First guess on error origin
<!-- Add first ideas where the error could come from -->
* the proj version was updated to 9.2.0
* first, try to remove the version pin entirely
* if that does not work, pin to the new version
## Error origin
<!-- Fill this up when the bug / error origin has been found -->
## Solution
<!-- Short description how to solve the error -->
Remove all the package versions.

Issue 451: robust apriori estimate for short timeseries (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/451

When time series are shorter than one year, there are issues with calculating climate statistics, resulting in NaN values.
* [x] add `dropna` along time axis in `filter.py:create_monthly_mean`
* [x] also check `filter.py:create_seasonal_hourly_mean` whether this behaviour occurs there

Issue 447: store and load local clim apriori data (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/447

Currently, apriori data to calculate filter components must either be parsed in the workflow args or is calculated on the data. Also, apriori data is stored during lazy processing, but is encrypted and therefore hard to read from outside. After this issue, it should be possible to use one experiment's apriori data in another experiment.
* [x] Store apriori data locally, similar to transformation properties. ~Maybe only store when `store_apriori=True`.~ Apriori data is always stored.
* [x] define location and if a single file storage or multiple files are suitable: `<exp>/data/apriori`
* [x] define storage format (.nc, .np, .csv, ...?): `.pickle`
* [x] is it required to store filter information, like filter size and number of splits (to ensure integrity of the apriori information)? not implemented
* [x] Load apriori data from local path
* [x] define parameter `apriori_file=<path/file>.pickle`
* [x] ~check for matching filter settings~ canceled

Issue 448: load model from path (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/448

Currently, MLAir expects a trained model to be inside the `model` directory within the experiment path. When running a fresh experiment with no training at all, this fails, as there is no option to refer to an existing model (without copying it into the model directory after initiating the MLAir workflow). Therefore, add an option to use an external model.
* [x] introduce `model_path` parameter that can be set during `ExperimentSetup`
* [x] add option to copy the given model into the experiment path (which should then override the external `model_path`?)

Issue 442: bias free evaluation (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/442

Implement a post-processing evaluation that is free of bias. This is particularly interesting when comparing DL with model data, as model data suffers from systematic deviations. A bias-free evaluation can show how much of the target's variance a model is able to predict.
Therefore, implement two strategies:
(i) Calculate a total mean of a given model for each station and subtract this value from the model's forecasts.
(ii) Calculate a running mean of a given model for each station and subtract this series from the model's forecasts.
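The two strategies could be sketched with numpy as follows (hypothetical helper names, not existing MLAir functions; the running-mean window length is an arbitrary choice):

```python
import numpy as np

def debias_total_mean(forecast, observation):
    """Strategy (i): subtract the station's total mean bias from the forecast."""
    return forecast - np.nanmean(forecast - observation)

def debias_running_mean(forecast, observation, window=30):
    """Strategy (ii): subtract a running-mean bias series from the forecast."""
    error = forecast - observation
    kernel = np.ones(window) / window          # simple box-car running mean
    running_bias = np.convolve(error, kernel, mode="same")
    return forecast - running_bias
```

Strategy (ii) removes slowly varying systematic deviations while keeping short-term variability, so scores computed afterwards reflect how well the model captures the target's variance rather than its offset.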
Each strategy is then applied to all competing models, and the evaluation is performed in addition to the standard evaluation.

Issue 444: choose interp method in CAMS competitor (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/444

* [x] add option to set the CAMS interpolation method (currently only nearest neighbor is used)
* [x] enable using both methods as separate competitors

Issue 449: load era5 data from toar db (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/449

The TOAR DB now includes ERA5 data. Therefore, replace the current bypass using local data.
* [x] check that `era5` can be used as data origin to load from the TOAR DB
* [x] trigger the former ERA5 loader only with the flag `era5_local` (do not completely remove this functionality)

Issue 453: advanced retry strategy (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/453

Implement an advanced retry strategy when downloading data. Currently, only the `get` command is in charge of retries, without building up a renewed connection. As the TOAR DB sometimes kills jobs without an error response, the connection breaks without being recognized. Therefore, adjust the retry strategy to establish a new connection instead of only retrying to get the data.

Issue 450: load ifs data (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/450

Implement a data loader that can load locally stored IFS forecast data. It should look similar to the existing ERA5 data loader.
* [x] be able to load IFS data
* [x] trigger IFS loading by using `ifs` in `data_origin`
# Design choices
* IFS data contain two temporal axes: init time (every 12 hours), valid time (hourly)
* for now: create a single time series consisting of the closest combination of init and valid time (use only t0+0h to t0+11h of each init time).
* for the future: think about the information for ti > t0 of each sample. Maybe rather use the init time of t0 and all forecast steps for future time steps. This breaks with a single time series, but it is similar to the filter approach and prevents data leakage. Maybe this should be part of another issue.
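The "for now" choice above (stitching the closest init/valid combinations, i.e. steps t0+0h to t0+11h of each 12-hourly init time, into one hourly series) could be sketched with pandas like this (illustrative only; the column names are assumptions, not the actual loader schema):

```python
import pandas as pd

# toy IFS-like frame: two init times 12 h apart, hourly lead times 0..23 h
init_times = pd.to_datetime(["2023-01-01 00:00", "2023-01-01 12:00"])
records = [
    {"init_time": it, "step": h,
     "valid_time": it + pd.Timedelta(hours=h), "value": float(h)}
    for it in init_times for h in range(24)
]
df = pd.DataFrame(records)

# keep only the first 12 lead hours of every init time: this picks the
# closest init/valid combination, then indexing by valid time yields a
# single hourly series without overlaps
series = (df[df["step"] < 12]
          .set_index("valid_time")["value"]
          .sort_index())

assert not series.index.duplicated().any()  # one gap-free hourly axis
```

Because each valid time is covered by exactly one init time, no forecast steps overlap and the result behaves like an ordinary single-time-axis series.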
## discussions
Usage of operational forecast data poses some problems with the current setup of MLAir. For now, raw time series always contained a single time dimension which then is transformed into two during sample setup. Now, this second dimension already exists from the NWP model's lead time. So all methods implemented for now cannot handle this data (interpolation, filter, ...).
Changing the general behaviour, e.g. always adding window dimension 0, is a huge refactoring step.
Designing a new data handler just for IFS data harms compatibility with all other data handlers and would produce a lot of almost duplicated code.
Changing the filter calculation methods might therefore be the simplest solution.
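The window-dimension idea from the discussion (always carrying a length-1 `window` dimension with the single entry `0`) could look like this with xarray; this is a sketch, not the actual MLAir loader code:

```python
import numpy as np
import xarray as xr

# single-time-dimension data as produced by the existing loaders
da = xr.DataArray(
    np.arange(5.0),
    dims=["datetime"],
    coords={"datetime": np.arange(5)},
)

# add a length-1 "window" dimension with the single entry 0 so the data
# can be merged with IFS data that already carries a lead-time dimension
da_win = da.expand_dims(window=[0])

# if in the end only the window=0 entry exists, drop the dimension again
# and the result is identical to the input
da_back = da_win.sel(window=0, drop=True)
assert da_back.identical(da)
```

The round trip being lossless is what makes this refactoring safe: loaders that never see IFS data return exactly the same arrays as before.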
## TODO
* [x] refac: expand dims in all data loaders (era5_local, join) by the dimension `window` so that the data can be merged with IFS data. The window dimension has the single entry `0`. If the final dataframe only has the 0 entry, remove this dimension again. If no IFS data are loaded, the returned data are as before!
* [x] implement/adjust: ClimateFIR filter should be able to use data with two time dimensions as input. As after the first filter iteration, data is already structured with two time dimensions, it should be possible to use such data from the beginning.
* [x] new data handler for IFS data? Skip interpolation (or apply it later after the data is restructured?), create time series data for each init time (ti < t0: closest combination of init and valid time, ti >= t0: most recent forecast, be aware of running times 01 and 13 local time), maybe interpolate now, calculate filter.

Assignee: Michael Langguth

Issue 445: Data Insight Plot Monthly Distribution (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/445

* [ ] implement a variant of the monthly summary plot but only with observations; instead, use different colors/bars for each subset

Issue 454: Use Toar statistics api v2 (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/454

As TOAR Statistics API v1 is offline, adjust the code to load aggregated data (like dma8eu ozone) from the new API v2.
When the old code looks like
```python
from io import BytesIO
import pandas as pd
import requests
resp = requests.get("https://toar-data.fz-juelich.de/statistics/api/v1/?format=csv&timeseries_id=31099&names=dma8eu&sampling=daily")
df = pd.read_csv(BytesIO(resp.content), index_col="datetime", parse_dates=True)
```
the new code could look like
```python
from io import BytesIO
import time
import zipfile

import pandas as pd
import requests

resp = requests.get("https://toar-data.fz-juelich.de/api/v2/analysis/statistics/?sampling=daily&statistics=dma8eu&id=31099")
while True:
    resp = requests.get(resp.json()["status"], timeout=(3.05, 5))
    if resp.history:  # the request was redirected, i.e. the result is ready
        break
    time.sleep(5)  # poll politely while the aggregation job is running
with zipfile.ZipFile(BytesIO(resp.content)) as file:
    df = pd.read_csv(BytesIO(file.read("31099_dma8eu.csv")), comment="#",
                     index_col="datetime", parse_dates=True)
```

Issue 440: release v2.3.0 (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/440

<!-- Use this template for a new release of MLAir. -->
# Release
<!-- add your release version here -->
v2.3.0
## checklist
* [x] Create Release Issue
* [x] Create merge request: branch `release_v2.3.0` into `master`
* [x] Merge `develop` into `release_v2.3.0`
* [x] Checkout `release_v2.3.0`
* [x] Adjust `changelog.md` (see template for changelog)
* [x] Update version number in `mlair/__init__.py`
* [x] Create new dist file: `python3 setup.py sdist bdist_wheel`
* [x] Add new dist file `mlair-2.3.0-py3-none-any.whl` to git
* [x] Update file link `distribution file (current version)` in `README.md`
* [x] Update file link in `docs/_source/installation.rst`
* [x] Commit + push
* [x] Merge `release_v2.3.0` into `master`
* [ ] Create new tag with
* [ ] distribution file (.whl)
* [ ] link to Documentation
* [ ] Example Jupyter Notebook
* [ ] changelog
## template for changelog
<!-- use this structure for the changelog. Link all issue to at least one item. -->
```
## v2.3.0 - 2022-11-25 - <release description>
### general:
* text
### new features:
* words (issue)
### technical:
*
```

Issue 458: release v2.4.0 (Ghost User, 2023-06-30)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/458

<!-- Use this template for a new release of MLAir. -->
# Release
<!-- add your release version here -->
v2.4.0
## checklist
* [x] Create Release Issue
* [x] Create merge request: branch `release_v2.4.0` into `master`
* [x] Merge `develop` into `release_v2.4.0`
* [x] Checkout `release_v2.4.0`
* [x] Adjust `changelog.md` (see template for changelog)
* [x] Update version number in `mlair/__init__.py`
* [x] Create new dist file: `python3 setup.py sdist bdist_wheel`
* [ ] Add new dist file `mlair-2.4.0-py3-none-any.whl` to git
* [x] Update file link `distribution file (current version)` in `README.md`
* [x] Update file link in `docs/_source/installation.rst`
* [x] Commit + push
* [ ] Merge `release_v2.4.0` into `master`
* [ ] Create new tag with
* [ ] distribution file (.whl)
* [ ] link to Documentation
* [ ] Example Jupyter Notebook
* [ ] changelog
## template for changelog
<!-- use this structure for the changelog. Link all issue to at least one item. -->
```
## v2.4.0 - 2023-06-30 - <release description>
### general:
* text
### new features:
* words (issue)
### technical:
*
```

Issue 433: Contingency Analysis (Ghost User, 2023-06-27)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/433

Implement some kind of contingency analysis:
* [ ] add hit rate and the other metrics
* [ ] for dma8eu ozone: calculate metrics for legal limits
* [ ] calculate percent hits of the top 2, 5, 10%: does a model/competitor exceed its own X% threshold when the observation does?

Issue 79: KZ filter (Ghost User, 2023-06-19)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/79

Implement the Kolmogorov-Zurbenko filter for preprocessing. It should be callable like the transform method inside the data generator, as a method of the data preparation class. A first implementation can be found here: https://gitlab.version.fz-juelich.de/leufen1/kolmogorovzurbenkofilter . But this implementation has to be adapted to the current situation.
**NOTE:** a Gaussian filter is not appropriate here, because values that are not available at the forecast point of time would be used for the baseline -> use the left side of the Gaussian curve and a hard cut-off.

Label: Temporal Decomposed Input Data

Issue 214: check bootstrap behaviour on separation of scales (Ghost User, 2023-06-19)
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/214

Check bootstrap behaviour on separation of scales.
Does it work properly? The plot looks promising, but is there valid shuffling for each variable? Is it more important to shuffle only a single filter dimension to estimate the influence of variable and filter? Or should only the low-pass terms be shuffled (but for all variables at once)?

Label: Temporal Decomposed Input Data
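Shuffling a single filter dimension, as discussed above, could be sketched with numpy as follows (a sketch with a hypothetical helper; the `(samples, variables, filters)` axis ordering is an assumption):

```python
import numpy as np

def shuffle_single_component(data, var_idx, filter_idx, rng=None):
    """Bootstrap-shuffle one (variable, filter) component along the sample
    axis while leaving all other components untouched.

    data: array of shape (samples, variables, filters)
    """
    rng = np.random.default_rng() if rng is None else rng
    shuffled = data.copy()
    perm = rng.permutation(data.shape[0])
    shuffled[:, var_idx, filter_idx] = data[perm, var_idx, filter_idx]
    return shuffled
```

Comparing the skill drop after shuffling one component against shuffling a whole variable (all filter entries at once) would show whether the per-filter influence can be separated validly.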