
Changelog

All notable changes to this project will be documented in this file.

v2.3.0 - 2022-11-25 - new models and plots

general:

  • new model classes for ResNet and U-Net
  • new plots and variations of existing plots

new features:

technical:

v2.2.0 - 2022-08-16 - new data sources and python3.9

general:

  • new data sources: era5 data and ToarDB V2
  • CAMS competitor available
  • improved execution speed
  • MLAir is now updated to python3.9

new features:

technical:

v2.1.0 - 2022-06-07 - new evaluation metrics and improved training

general:

  • new evaluation metrics, IOA and MNMB
  • advanced train options for early stopping
  • reduced execution time by refactoring

new features:

  • uncertainty estimation of MSE is now applied for each season separately (#374 (closed))
  • added different configurations of early stopping to use either last trained or best epoch (#378 (closed))
  • train monitoring plots now add a star for best epoch when using early stopping (#367 (closed))
  • new evaluation metric index of agreement, IOA (#376 (closed))
  • new evaluation metric modified normalised mean bias, MNMB (#380 (closed))
  • new plot available that shows temporal evolution of MSE for each station (#381 (closed))
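
For orientation, the two new metrics can be sketched in plain Python. This is a minimal illustration that assumes the commonly used textbook definitions (Willmott's index of agreement and the usual MNMB formula). The function names are hypothetical and this is not MLAir's own implementation.

```python
from statistics import fmean


def index_of_agreement(obs, pred):
    """Index of agreement (assumed Willmott definition); 1.0 means perfect agreement.

    Note: undefined (division by zero) if all values equal the observation mean.
    """
    obs_mean = fmean(obs)
    squared_error = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential_error = sum(
        (abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - squared_error / potential_error


def modified_normalised_mean_bias(obs, pred):
    """MNMB (assumed standard definition); 0.0 means no bias.

    For strictly positive values the result lies between -2 and 2.
    """
    return 2.0 * fmean((p - o) / (p + o) for o, p in zip(obs, pred))
```

A forecast that matches the observations exactly yields an IOA of 1.0 and an MNMB of 0.0.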

technical:

  • reduced loading of forecast path from data store (#328 (closed))
  • bug fix for uncaught error during transformation (#385 (closed))
  • bug fix for data handler with climate and FIR filter, which always calculated the transformation with the FIR filter (#387 (closed))
  • improved duration for latex report creation at end of preprocessing (#388 (closed))
  • enhanced speed for make prediction in postprocessing (#389 (closed))
  • fix to always create version badge from version and not from tag name (#382 (closed))

v2.0.0 - 2022-04-08 - tf2 usage, new model classes, and improved uncertainty estimate

general:

  • MLAir now uses tensorflow v2
  • new customisable model classes for CNN and RNN
  • improved uncertainty estimate

new features:

  • MLAir now depends on tensorflow v2 (#331 (closed))
  • new CNN class that can be configured layer-wise (#368 (closed))
  • new RNN class that can be configured in more detail (#361 (closed))
  • new branched-input CNN class (#368 (closed))
  • new branched-input RNN class (#362 (closed))
  • set custom model display name that is used in plots (#341 (closed))
  • specify names of input branches to use in feature importance plots (#356 (closed))
  • uncertainty estimate of model error is now additionally calculated for each forecast step (#359 (closed))
  • data transformation properties are stored locally and can be loaded into an experiment run (#345 (closed))
  • uncertainty estimate now includes a Mann-Whitney U rank test (#355 (closed))
  • data handlers can now have access to "future" data specified by new parameter extend_length_opts (#339 (closed))

technical:

  • MLAir now uses python3.8 on Jülich HPC systems (#375 (closed))
  • no support of MLAir for tensorflow v1.X, replaced by tf v2.X (#331 (closed))
  • all data handlers with filters can return data as branches (#370 (closed))
  • bug fix to force model name and competitor names to be unique (#366 (closed), #369 (closed))
  • fix to use only a single forecast step (#315 (closed))
  • CI pipeline adjustments (#340 (closed), #365 (closed))
  • new option to set the level of the print logging (#364 (closed))
  • advanced logging for batch data creation and in postprocessing (#350 (closed), #360 (closed))
  • batch data creation is skipped on disabled training (#341 (closed))
  • multiprocessing pools are now closed properly (#342 (closed))
  • bug fix if no competitor data is available (#343 (closed))
  • bug fix for model loading (#343 (closed))
  • models plotted by PlotSampleUncertaintyFromBootstrap are now ordered by mean error (#344 (closed))
  • fix for usage of lazy data that caused unintended reloading of data (#347 (closed))
  • fix for latex reports not showing all stations and competitors (#349 (closed))
  • refactoring of hard coded dimension names in skill scores calculation (#357 (closed))
  • bug fix for the order of bootstrap methods in feature importance calculation, which caused errors (#358 (closed))
  • MLAir now distinguishes between window_history_offset (position of last time step), window_history_size (total length of input sample), and extend_length_opts ("future" data that is available at given time) (#353 (closed))

v1.5.0 - 2021-11-11 - new uncertainty estimation

general:

  • introduces method to estimate sample uncertainty
  • improved multiprocessing
  • last release with tensorflow v1 support

new features:

  • test set sample uncertainty estimation during postprocessing (#333 (closed))
  • support of Kolmogorov-Zurbenko filter for data handlers with filters (#334 (closed))

technical:

v1.4.0 - 2021-07-27 - new model classes and data handlers, improved usability and transparency

general:

  • many technical adjustments to improve usability and transparency of MLAir
  • new FCN and CNN classes for easy NN model creation
  • new plots

new features:

technical:

v1.3.0 - 2021-02-24 - competitors and improved transformation

general:

  • release of official MLAir logo (#274 (closed))
  • new transformation schema for better independence of MLAir and data handler (#272 (closed))
  • competing models can be included in postprocessing for direct comparison (#198 (closed))

new features:

technical:

v1.2.1 - 2021-02-08 - bug fix for recursive import error

general:

  • applied bug fix

technical:

v1.2.0 - 2020-12-18 - parallel preprocessing and improved data handlers

general:

  • new plots
  • parallelism for faster preprocessing
  • improved data handler with mixed sampling types
  • enhanced test coverage

new features:

technical:

v1.1.0 - 2020-11-18 - hourly resolution support and new data handlers

general:

  • MLAir can be used with 1H resolution data from JOIN
  • new data handlers to use the Kolmogorov-Zurbenko filter and mixed sampling types

new features:

  • new data handler DataHandlerKzFilter to use Kolmogorov-Zurbenko filter (kz filter) on inputs (#195 (closed))
  • new data handler DataHandlerMixedSampling that can use mixed sampling types for input and target (#197 (closed))
  • new data handler DataHandlerMixedSamplingWithFilter that uses kz filter and mixed sampling (#197 (closed))
  • new data handler DataHandlerSeparationOfScales to use filter-dependent time step sizes on filtered inputs using mixed sampling (#196 (closed))
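
As background, a Kolmogorov-Zurbenko filter is an iterated centred moving average. The sketch below illustrates the idea in plain Python. The function names and the edge handling (shrinking the window at the series boundaries) are assumptions made for this example and do not describe how DataHandlerKzFilter is implemented.

```python
def moving_average(series, window):
    """Centred moving average; the window shrinks at the edges (assumed edge handling)."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def kz_filter(series, window, iterations):
    """Apply a moving average of odd length `window` repeatedly, `iterations` times."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    result = list(series)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result
```

A longer window or more iterations removes more of the short-term variability, which is what allows a series to be split into components on different time scales.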

technical:

  • bug fix for very short time series in TimeSeriesPlot (#215 (closed))
  • bug fix for variable dictionary when using hourly resolution (#212 (closed))
  • variable naming for data from JOIN interface harmonised (#206 (closed))
  • transformation setup is now separated for inputs and targets (#202 (closed))
  • bug fix in PlotClimatologicalSkillScore if only single station is used (#193 (closed))
  • preprocessed data is now stored inside experiment and not in the data folder

v1.0.0 - 2020-10-08 - official release of new version 1.0.0

general:

  • This is the first official release of MLAir ready for use
  • updated license and installation instructions

technical:

  • restructured order of packages in requirements

v0.12.2 - 2020-10-01 - HDFML support

general:

  • HDFML support

technical:

v0.12.1 - 2020-09-28 - examples in notebook

general:

  • introduced a notebook documentation for easy starting, #174 (closed)
  • updated special installation instructions for the Juelich HPC systems, #172 (closed)

new features:

  • names of input and output shape are renamed consistently to: input_shape, and output_shape, #175 (closed)

technical:

  • it is possible to assign a custom name to a run module (e.g. used in logging), #173 (closed)

v0.12.0 - 2020-09-21 - Documentation and Bugfixes

general:

  • improved documentation including installation instructions and many examples from the paper, #153 (closed)
  • bugfixes (see technical)

new features:

  • MyLittleModel is now a pure feed-forward network (before it had a CNN part), #168 (closed)

technical:

  • new compile options check to ensure its execution, #154 (closed)
  • bugfix for key errors in time series plot, #169 (closed)
  • bugfix for not used kwargs in DefaultDataHandler, #170 (closed)
  • trainable parameter is renamed to train_model to prevent confusion with the tf trainable parameter, #162 (closed)
  • fixed HPC installation failure, #159 (closed)

v0.11.0 - 2020-08-24 - Advanced Data Handling for MLAir

general

  • Introduce advanced data handling with much more flexibility (independent of TOAR DB, custom data handling is pluggable), #144 (closed)
  • default data handler is still using TOAR DB

new features

technical

v0.10.0 - 2020-07-15 - MLAir is official name, Workflows, easy Model plug-in

general

  • Official project name is released: MLAir (Machine Learning on Air data)
  • a model class can now easily be plugged into MLAir, #121 (closed)
  • introduced new concept of workflows, #134 (closed)

new features

  • workflows are used to execute a sequence of run modules, #134 (closed)
  • default workflows for standard and the Juelich HPC systems are available, custom workflows can be defined, #134 (closed)
  • seasonal decomposition is available for conditional quantile plot, #112 (closed)
  • map plot is created with coordinates, #108 (closed)
  • flatten_tails are now more general and easier to customise, #114 (closed)
  • model classes have custom compile options (replaces set_loss), #110 (closed)
  • model can be set in ExperimentSetup from outside, #121 (closed)
  • default experiment settings can be queried using get_defaults(), #123 (closed)
  • training and model settings are reported as Markdown and TeX tables, #145 (closed)

technical

v0.9.0 - 2020-04-15 - faster bootstraps, extreme value upsampling

general

  • improved and faster bootstrap workflow
  • new plot PlotAvailability
  • extreme values upsampling
  • improved runtime environment

new features

  • entire bootstrap workflow has been refactored and is much faster now, can be skipped with evaluate_bootstraps=False, #60 (closed)
  • upsampling of extreme values, set with parameter extreme_values=[your_values_standardised] (e.g. [1, 2]) and extremes_on_right_tail_only=<True/False> if only right tail of distribution is affected or both, #58 (closed), #87 (closed)
  • minimal data length property (in total and for all subsets), #76 (closed)
  • custom objects in model class to load customised model objects like padding class, loss, #72 (closed)
  • new plot for data availability: PlotAvailability, #103 (closed)
  • introduced (default) plot_list to specify which plots to draw
  • latex and markdown information on sample sizes for each station, #90 (closed)
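
To illustrate how the two upsampling parameters interact, here is a rough sketch of one plausible reading: for each threshold, samples beyond it are appended once more to the data. The function upsample_extremes is hypothetical and works on a plain list of standardised values. The exact selection rule in MLAir may differ.

```python
def upsample_extremes(samples, extreme_values, extremes_on_right_tail_only=False):
    """Append an extra copy of each sample that exceeds a threshold (assumed behaviour).

    `samples` are standardised values and `extreme_values` are thresholds such as [1, 2].
    """
    result = list(samples)
    for threshold in extreme_values:
        if extremes_on_right_tail_only:
            # only large positive values count as extreme
            extras = [s for s in samples if s > threshold]
        else:
            # both tails of the distribution count as extreme
            extras = [s for s in samples if abs(s) > threshold]
        result.extend(extras)
    return result
```

With thresholds [1, 2], a value beyond 2 is added twice under this reading, so the most extreme samples receive the highest weight.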

technical

  • implemented tests on gpu and from scratch for develop, release and master branches, #95 (closed)
  • usage of tensorflow 1.13.1 (gpu / cpu), separated in 2 different requirements, #81 (closed)
  • new abstract plot class to have uniform plot class design
  • New time tracking wrapper to use for functions or classes
  • improved logger (info on display, debug into file), #73 (closed), #85 (closed), #88 (closed)
  • improved run environment, especially for error handling, #86 (closed)
  • prefix general in data store scope is now optional and can be skipped. If given scope is not general, it is treated as subscope, #82 (closed)
  • all 2D Padding classes are now selected by Padding2D(padding_name=<padding_type>) e.g. Padding2D(padding_name="SymPad2D"), #78 (closed)
  • custom learning rate (or lr_decay) is optional now, #71 (closed)