lukas leufen authored
Changelog
All notable changes to this project will be documented in this file.
v2.3.0 - 2022-11-25 - new models and plots
general:
- new model classes for ResNet and U-Net
- new plots and variations of existing plots
new features:
- new model classes: ResNet (#419), U-Net (#423)
- seasonal MSE stack plot (#422)
- new aggregated and line versions of Time Evolution Plot (#424, #427)
- box-and-whisker plots are created for all error metrics (#431)
- new split and frequency distribution versions of box-and-whisker plots for error metrics (#425, #434)
- new evaluation metric: mean error / bias (#430)
- conditional quantiles are now also available for all competitors (#435)
- new map plot showing the MSE at each location (#432)
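The mean error (bias) complements the MSE: systematic over- and underestimation cancel in the bias but not in the MSE. A minimal sketch, assuming the common sign convention forecast minus observation (MLAir's own sign convention and handling of missing values are not stated in this changelog; the function names are hypothetical):

```python
from typing import Sequence


def _check(forecast: Sequence[float], observation: Sequence[float]) -> None:
    # Both series must be aligned in time and non-empty.
    if len(forecast) != len(observation) or len(forecast) == 0:
        raise ValueError("forecast and observation must be non-empty and of equal length")


def mean_error(forecast: Sequence[float], observation: Sequence[float]) -> float:
    """Mean error (bias): average of forecast minus observation (assumed sign convention)."""
    _check(forecast, observation)
    return sum(f - o for f, o in zip(forecast, observation)) / len(forecast)


def mean_squared_error(forecast: Sequence[float], observation: Sequence[float]) -> float:
    """MSE for comparison: squaring keeps positive and negative errors from cancelling."""
    _check(forecast, observation)
    return sum((f - o) ** 2 for f, o in zip(forecast, observation)) / len(forecast)


# Errors of -1 and +1 cancel in the bias but not in the MSE.
print(mean_error([1.0, 3.0], [2.0, 2.0]))          # 0.0
print(mean_squared_error([1.0, 3.0], [2.0, 2.0]))  # 1.0
```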
technical:
- speed-up in model setup (#421)
- bug fix for boundary trim in FIR filter (#418)
- persistence is now calculated only on demand (#426)
- block MSE values are stored locally in a file (#428)
- fixed issue with boolean variables not being recognized by argparse (#417)
- renaming of ahead labels (#436)
v2.2.0 - 2022-08-16 - new data sources and python3.9
general:
- new data sources: ERA5 data and ToarDB V2
- CAMS competitor available
- improved execution speed
- MLAir is now updated to python3.9
new features:
- new data loading method to load ERA5 data on Jülich systems (#393)
- new data loading method to load data from ToarDB V2 (#396)
- implemented competitor model using CAMS ensemble forecasts (#394)
- OLS competitor is only calculated if provided in competitor list (#404)
- experimental: snapshot creation to skip preprocessing stage (#346, #405, #406)
- new workflow HyperSearchWorkflow that stops after the training stage (#408)
technical:
- fixed minor issues and improved execution speed in postprocessing (#401, #413)
- improved speed in keras iterator creation (#409)
- solved bug for very long competitor time series (#395)
- updated python, HPC and CI environment (#402, #403, #407, #410)
- fix for climateFIR data handler (#399)
- fix for report model error (#416)
v2.1.0 - 2022-06-07 - new evaluation metrics and improved training
general:
- new evaluation metrics, IOA and MNMB
- advanced training options for early stopping
- reduced execution time by refactoring
new features:
- uncertainty estimation of MSE is now applied for each season separately (#374)
- added different configurations of early stopping to use either the last trained or the best epoch (#378)
- train monitoring plots now mark the best epoch with a star when using early stopping (#367)
- new evaluation metric index of agreement, IOA (#376)
- new evaluation metric modified normalised mean bias, MNMB (#380)
- new plot showing the temporal evolution of MSE for each station (#381)
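For reference, the textbook definitions of the two new metrics are Willmott's index of agreement, d = 1 - Σ(P_i - O_i)² / Σ(|P_i - Ō| + |O_i - Ō|)², and the modified normalised mean bias, MNMB = (2/N) Σ (P_i - O_i) / (P_i + O_i). A minimal sketch of these standard formulas; MLAir's actual implementation (dimension handling, NaN treatment) is not shown in this changelog, and the function names are hypothetical:

```python
from typing import Sequence


def _check(pred: Sequence[float], obs: Sequence[float]) -> None:
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def index_of_agreement(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Willmott's index of agreement d in [0, 1]; 1 means perfect agreement."""
    _check(pred, obs)
    obs_mean = sum(obs) / len(obs)
    numerator = sum((p - o) ** 2 for p, o in zip(pred, obs))
    denominator = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for p, o in zip(pred, obs))
    if denominator == 0:
        # Only possible if every prediction and observation equals the observed mean.
        return 1.0
    return 1.0 - numerator / denominator


def modified_normalised_mean_bias(pred: Sequence[float], obs: Sequence[float]) -> float:
    """MNMB = (2/N) * sum((p - o) / (p + o)); lies in [-2, 2] for non-negative data."""
    _check(pred, obs)
    if any(p + o == 0 for p, o in zip(pred, obs)):
        raise ValueError("p + o must be non-zero for every pair")
    return 2.0 / len(obs) * sum((p - o) / (p + o) for p, o in zip(pred, obs))


# A constant overestimation by a factor of three:
print(index_of_agreement([3.0, 3.0], [1.0, 1.0]))             # 0.0
print(modified_normalised_mean_bias([3.0, 3.0], [1.0, 1.0]))  # 1.0
```

Because MNMB normalises each error by P + O, it weights relative rather than absolute deviations, which is why it stays bounded for concentration-like (non-negative) data.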
technical:
- reduced loading of forecast path from data store (#328)
- bug fix for uncaught error during transformation (#385)
- bug fix for data handler with climate and FIR filter that always calculated the transformation with the FIR filter (#387)
- reduced duration of LaTeX report creation at the end of preprocessing (#388)
- faster prediction step in postprocessing (#389)
- fix to always create version badge from version and not from tag name (#382)
v2.0.0 - 2022-04-08 - tf2 usage, new model classes, and improved uncertainty estimate
general:
- MLAir now uses tensorflow v2
- new customisable model classes for CNN and RNN
- improved uncertainty estimate
new features:
- MLAir now depends on tensorflow v2 (#331)
- new CNN class that can be configured layer-wise (#368)
- new RNN class that can be configured in more detail (#361)
- new branched-input CNN class (#368)
- new branched-input RNN class (#362)
- custom model display name can be set for use in plots (#341)
- names of input branches can be specified for use in feature importance plots (#356)
- uncertainty estimate of model error is now additionally calculated for each forecast step (#359)
- data transformation properties are stored locally and can be loaded into an experiment run (#345)
- uncertainty estimate now includes a Mann-Whitney U rank test (#355)
- data handlers can now access "future" data specified by new parameter extend_length_opts (#339)
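The Mann-Whitney U rank test compares two samples (e.g. the bootstrapped errors of two models) without assuming normality. A minimal sketch of the U statistic with average ranks for ties; it only computes U, not the p-value (a library routine such as `scipy.stats.mannwhitneyu` also returns that). Which samples MLAir compares and which routine it calls are not stated here; the helper names are hypothetical:

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); U_x counts pairs with x > y, ties counting one half."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


print(mann_whitney_u([1.0, 2.0], [2.0, 3.0]))  # (0.5, 3.5)
```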
technical:
- MLAir now uses python3.8 on Jülich HPC systems (#375)
- MLAir no longer supports tensorflow v1.X, which is replaced by tf v2.X (#331)
- all data handlers with filters can return data as branches (#370)
- bug fix to force model name and competitor names to be unique (#366, #369)
- fix to use only a single forecast step (#315)
- CI pipeline adjustments (#340, #365)
- new option to set the level of the print logging (#364)
- advanced logging for batch data creation and in postprocessing (#350, #360)
- batch data creation is skipped when training is disabled (#341)
- multiprocessing pools are now closed properly (#342)
- bug fix if no competitor data is available (#343)
- bug fix for model loading (#343)
- models plotted by PlotSampleUncertaintyFromBootstrap are now ordered by mean error (#344)
- fix for unintended reloading of data when using lazy data (#347)
- fix for LaTeX reports not showing all stations and competitors (#349)
- refactoring of hard-coded dimension names in skill scores calculation (#357)
- bug fix for errors caused by the order of the bootstrap method in feature importance calculation (#358)
- MLAir now distinguishes between window_history_offset (position of last time step), window_history_size (total length of input sample), and extend_length_opts ("future" data that is available at given time) (#353)
v1.5.0 - 2021-11-11 - new uncertainty estimation
general:
- introduces method to estimate sample uncertainty
- improved multiprocessing
- last release with tensorflow v1 support
new features:
- test set sample uncertainty estimation during postprocessing (#333)
- support of Kolmogorov-Zurbenko filter for data handlers with filters (#334)
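One common way to estimate the sample uncertainty of a test-set metric such as the MSE is a block bootstrap, which resamples contiguous blocks so that temporally autocorrelated errors stay together. A minimal sketch, assuming non-overlapping, equal-length blocks; MLAir's actual resampling scheme is not described in this changelog, and `block_length`, `n_boot` and the function names are hypothetical:

```python
import random
from typing import List, Sequence


def block_mse(errors: Sequence[float], block_length: int) -> List[float]:
    """MSE of each non-overlapping block; an incomplete trailing block is dropped."""
    n_blocks = len(errors) // block_length
    return [
        sum(e * e for e in errors[b * block_length:(b + 1) * block_length]) / block_length
        for b in range(n_blocks)
    ]


def bootstrap_mse(errors: Sequence[float], block_length: int,
                  n_boot: int = 1000, seed: int = 0) -> List[float]:
    """Resample blocks with replacement; with equal-length blocks the mean of
    the block MSEs equals the MSE of the resampled series."""
    blocks = block_mse(errors, block_length)
    if not blocks:
        raise ValueError("series is shorter than one block")
    rng = random.Random(seed)
    return [
        sum(rng.choice(blocks) for _ in range(len(blocks))) / len(blocks)
        for _ in range(n_boot)
    ]
```

The spread of the returned values (e.g. their percentiles) then serves as an uncertainty range for the MSE.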
technical:
- new communication scheme for multiprocessing (#321, #322)
- improved error reporting (#323)
- feature importance now returns unaggregated results (#335)
- error metrics are reported for all competitors (#332)
- minor bug fixes and refactorings (#330, #326, #329, #325, #324, #320, #337)
v1.4.0 - 2021-07-27 - new model classes and data handlers, improved usability and transparency
general:
- many technical adjustments to improve usability and transparency of MLAir
- new FCN and CNN classes for easy NN model creation
- new plots
new features:
- new FCN class that can be customized in many ways (#284)
- new CNN class as well (#289)
- added new bootstrap analysis method: mean bootstrapping (#300)
- new data handler using FIR filters (#306)
- performance measures are now stored in local files (#286)
- histogram plots for inputs and targets (#299)
- periodogram plots for filtered data (#298)
technical:
- a calling run script can be stored inside the experiment folder if a reference to this script is passed as an argument (#99)
- new callback to track epoch runtime (#312)
- added switch to use multiprocessing (#297)
- customize maximum number of parallel processes (#308)
- support non-monotonic window lead times (#313)
- resolved bug with FileExistsError (#311)
- resolved bug if no chemical is used at all (#307)
- min/max scaler now scales between -1 and 1 (#302)
- added missing offset parameter to some data handlers (#305)
- improved data store logging (#304)
- improved logging message on station removal in preprocessing (#294)
- limited number of retries in JOIN module (#296)
- adjusted competing skill score plot (#301)
- transformation parameter check (#295)
- implemented lazy data preprocessing for selected data handlers (#292)
- fixed bug in separation of scales data handler (#290)
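Scaling to [-1, 1] uses x' = 2 (x - min) / (max - min) - 1, with the inverse x = (x' + 1) (max - min) / 2 + min. A minimal sketch, assuming min and max are estimated once (typically on training data) and then reused for other subsets and for back-transforming predictions; the function names are hypothetical and not MLAir's transformation API:

```python
from typing import List, Optional, Sequence, Tuple


def minmax_scale(values: Sequence[float], vmin: Optional[float] = None,
                 vmax: Optional[float] = None) -> Tuple[List[float], float, float]:
    """Scale values to [-1, 1]; returns the scaled values and the (min, max) used."""
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax
    if vmax == vmin:
        raise ValueError("min and max must differ")
    scaled = [2.0 * (v - vmin) / (vmax - vmin) - 1.0 for v in values]
    return scaled, vmin, vmax


def minmax_inverse(scaled: Sequence[float], vmin: float, vmax: float) -> List[float]:
    """Map values from [-1, 1] back to the original range."""
    return [(s + 1.0) * (vmax - vmin) / 2.0 + vmin for s in scaled]


train_scaled, lo, hi = minmax_scale([0.0, 5.0, 10.0])
print(train_scaled)                       # [-1.0, 0.0, 1.0]
print(minmax_inverse(train_scaled, lo, hi))  # [0.0, 5.0, 10.0]
```

Values outside the training range map outside [-1, 1], which is expected when test data exceed the extremes seen during training.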
v1.3.0 - 2021-02-24 - competitors and improved transformation
general:
- release of official MLAir logo (#274)
- new transformation schema for better independence of MLAir and data handler (#272)
- competing models can be included in postprocessing for direct comparison (#198)
new features:
- new helper functions for geographic issues (#280)
- default data handler and its subclasses can use min/max and log transformation (#276, #275)
- include IntelliO3-ts model as reference via automatic download (#131)
technical:
- experiment name now always includes target sampling type (#263)
- competitive skill score plot is refactored (#260)
- bug fix for climatological skill scores (#259)
- bug fix for custom objects handling (#277)
- bug fix for monitoring plots when multiple output branches are used (#278)
- updated requirements to newer versions and dependencies (#262, #273)
- HPC scripts are updated to work properly with parallel data processing (#281)
v1.2.1 - 2021-02-08 - bug fix for recursive import error
general:
- applied bug fix
technical:
- bug fix for recursive import error (#269)
v1.2.0 - 2020-12-18 - parallel preprocessing and improved data handlers
general:
- new plots
- parallelism for faster preprocessing
- improved data handler with mixed sampling types
- enhanced test coverage
new features:
- station map plot now highlights subsets on the map and displays the number of stations for each subset (#227, #231)
- two new data availability plots `PlotAvailabilityHistogram` (#191, #192, #223)
- introduced parallel code in preprocessing if the system supports parallelism (#164, #224, #225)
- data handler `DataHandlerMixedSampling` (and subclasses) supports an offset parameter to end inputs at a different time than 00 hours (#220)
- args for data handler `DataHandlerMixedSampling` (and subclasses) that differ for input and target can now be passed as a tuple (#229)
technical:
- added templates for release and bug issues (#189)
- improved test coverage (#236, #238, #239, #240, #241, #242, #243, #244, #245)
- station map plot now includes the number of stations for each subset (#231)
- postprocessing plots are encapsulated in try-except statements (#107)
- updated git settings (#213)
- bug fix for data handler (#235)
- reordering and bug fix for preprocessing reporting (#207, #232)
- bug fix for outdated system path style (#226)
- new plots are included in default plot list (#211)
- `helpers/join` connection to ToarDB (e.g. used by DefaultDataHandler) now reports which variable could not be loaded (#222)
- plot `PlotBootstrapSkillScore` can now additionally highlight specific variables, but this is not yet included in postprocessing (#201)
- data handler `DataHandlerMixedSampling` now has reduced data loading (#221)
v1.1.0 - 2020-11-18 - hourly resolution support and new data handlers
general:
- MLAir can be used with 1H resolution data from JOIN
- new data handlers to use the Kolmogorov-Zurbenko filter and mixed sampling types
new features:
- new data handler `DataHandlerKzFilter` to use the Kolmogorov-Zurbenko filter (KZ filter) on inputs (#195)
- new data handler `DataHandlerMixedSampling` that can use mixed sampling types for input and target (#197)
- new data handler `DataHandlerMixedSamplingWithFilter` that uses the KZ filter and mixed sampling (#197)
- new data handler `DataHandlerSeparationOfScales` to use filter-dependent time step sizes on filtered inputs using mixed sampling (#196)
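A Kolmogorov-Zurbenko filter KZ(m, k) applies a centred moving average of odd window length m, k times in succession; more iterations suppress high frequencies more strongly. A minimal sketch, assuming that at the series edges the average is taken over the available values only (MLAir's edge and missing-value handling may differ; the function names are hypothetical):

```python
from typing import List, Sequence


def centred_moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centred moving average; near the edges only the available values are averaged."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half):min(len(values), i + half + 1)]
        out.append(sum(segment) / len(segment))
    return out


def kz_filter(values: Sequence[float], window: int, iterations: int) -> List[float]:
    """KZ(window, iterations): the moving average applied `iterations` times."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    out = list(values)
    for _ in range(iterations):
        out = centred_moving_average(out, window)
    return out


print(kz_filter([0.0, 3.0, 0.0], window=3, iterations=1))  # [1.5, 1.0, 1.5]
```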
technical:
- bug fix for very short time series in TimeSeriesPlot (#215)
- bug fix for variable dictionary when using hourly resolution (#212)
- variable naming for data from JOIN interface harmonised (#206)
- transformation setup is now separated for inputs and targets (#202)
- bug fix in PlotClimatologicalSkillScore if only a single station is used (#193)
- preprocessed data is now stored inside the experiment and not in the data folder
v1.0.0 - 2020-10-08 - official release of new version 1.0.0
general:
- This is the first official release of MLAir ready for use
- updated license and installation instructions
technical:
- restructured order of packages in requirements
v0.12.2 - 2020-10-01 - HDFML support
general:
- HDFML support
technical:
- installation script for HDFML adjusted, #183
v0.12.1 - 2020-09-28 - examples in notebook
general:
- introduced notebook documentation for an easy start, #174
- updated special installation instructions for the Juelich HPC systems, #172
new features:
- names of input and output shape are consistently renamed to input_shape and output_shape, #175
technical:
- it is possible to assign a custom name to a run module (e.g. used in logging), #173
v0.12.0 - 2020-09-21 - Documentation and Bugfixes
general:
- improved documentation including installation instructions and many examples from the paper, #153
- bug fixes (see technical)
new features:
- `MyLittleModel` is now a pure feed-forward network (before, it had a CNN part), #168
technical:
- new compile options check to ensure its execution, #154
- bug fix for key errors in time series plot, #169
- bug fix for unused kwargs in `DefaultDataHandler`, #170
- `trainable` parameter is renamed to `train_model` to prevent confusion with the tf trainable parameter, #162
- fixed HPC installation failure, #159
v0.11.0 - 2020-08-24 - Advanced Data Handling for MLAir
general
- introduced advanced data handling with much more flexibility (independent of TOAR DB, custom data handling is pluggable), #144
- default data handler still uses TOAR DB
new features
- default data handler using TOAR DB refactored according to advanced data handling, #140, #141, #152
- data sets are handled as collections, #142, and are iterable in a standard way (StandardIterator) and optimised for keras (KerasIterator), #143
- automatically moving station map plot, #136
technical
- model modules available from package, #139
- renaming of parameter time dimension, #151
- refactoring of README.md, #138
v0.10.0 - 2020-07-15 - MLAir is official name, Workflows, easy Model plug-in
general
- Official project name is released: MLAir (Machine Learning on Air data)
- a model class can now easily be plugged into MLAir, #121
- introduced new concept of workflows, #134
new features
- workflows are used to execute a sequence of run modules, #134
- default workflows for standard and the Juelich HPC systems are available, custom workflows can be defined, #134
- seasonal decomposition is available for conditional quantile plot, #112
- map plot is created with coordinates, #108
- `flatten_tails` are now more general and easier to customise, #114
- model classes have custom compile options (replaces `set_loss`), #110
- model can be set in ExperimentSetup from outside, #121
- default experiment settings can be queried using `get_defaults()`, #123
- training and model settings are reported as Markdown and TeX tables, #145
technical
- Juelich HPC systems are supported and installation scripts are available, #106
- data store is tracked, I/O is saved and illustrated in a plot, #116
- batch size and epoch parameters have to be defined in ExperimentSetup, #127, #122
- automatic documentation with sphinx, #109
- default experiment settings are updated, #123
- refactoring of experiment path and its default naming, #124
- refactoring of some parameter names, #146
- preparation for package distribution with pip, #119
- all run scripts are updated to run with workflows, #134
- the experiment folder is restructured, #130
v0.9.0 - 2020-04-15 - faster bootstraps, extreme value upsampling
general
- improved and faster bootstrap workflow
- new plot PlotAvailability
- extreme values upsampling
- improved runtime environment
new features
- entire bootstrap workflow has been refactored and is much faster now, can be skipped with `evaluate_bootstraps=False`, #60
- upsampling of extreme values, set with parameter `extreme_values=[your_values_standardised]` (e.g. `[1, 2]`) and `extremes_on_right_tail_only=<True/False>` to affect only the right tail of the distribution or both tails, #58, #87
- minimal data length property (in total and for all subsets), #76
- custom objects in model class to load customised model objects like padding class, loss, #72
- new plot for data availability: `PlotAvailability`, #103
- introduced (default) `plot_list` to specify which plots to draw
- LaTeX and Markdown information on sample sizes for each station, #90
technical
- implemented tests on GPU and from scratch for develop, release and master branches, #95
- usage of tensorflow 1.13.1 (GPU / CPU), separated into 2 different requirements, #81
- new abstract plot class to have uniform plot class design
- new time tracking wrapper to use for functions or classes
- improved logger (info on display, debug into file), #73, #85, #88
- improved run environment, especially for error handling, #86
- prefix `general` in data store scope is now optional and can be skipped. If the given scope is not `general`, it is treated as a subscope, #82
- all 2D padding classes are now selected by `Padding2D(padding_name=<padding_type>)`, e.g. `Padding2D(padding_name="SymPad2D")`, #78
- custom learning rate (or lr_decay) is optional now, #71
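Assuming `"SymPad2D"` denotes symmetric padding (mirroring that includes the edge value, as in TensorFlow's `tf.pad` with mode `"SYMMETRIC"`), its effect on a 2D field can be sketched in pure Python. This only illustrates the padding scheme; it is not MLAir's `Padding2D` layer, and the function names are hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def symmetric_pad_1d(items: Sequence[T], pad: int) -> List[T]:
    """Mirror `pad` items at each end, including the edge item itself."""
    if pad > len(items):
        raise ValueError("pad width must not exceed the sequence length")
    items = list(items)
    left = items[:pad][::-1]
    right = items[len(items) - pad:][::-1]
    return left + items + right


def symmetric_pad_2d(grid: Sequence[Sequence[float]], pad_rows: int, pad_cols: int) -> List[List[float]]:
    """Pad columns within each row first, then pad the rows themselves."""
    padded_rows = [symmetric_pad_1d(row, pad_cols) for row in grid]
    # Copy rows so mirrored rows are independent lists, not shared references.
    return [list(row) for row in symmetric_pad_1d(padded_rows, pad_rows)]


print(symmetric_pad_1d([1, 2, 3], 2))  # [2, 1, 1, 2, 3, 3, 2]
```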