To find the state of this project's repository at the time of any of these versions, check out the tags.


Changelog
All notable changes to this project will be documented in this file.

v1.5.0 -  2021-11-11  - 

general:

introduces method to estimate sample uncertainty
improved multiprocessing
last release with tensorflow v1 support


new features:

test set sample uncertainty estimation during postprocessing (#333 (closed))
support of Kolmogorov-Zurbenko filter for data handlers with filters (#334 (closed))


technical:

new communication scheme for multiprocessing (#321 (closed), #322 (closed))
improved error reporting (#323 (closed))
feature importance now returns unaggregated results (#335 (closed))
error metrics are reported for all competitors (#332 (closed))
minor bugfixes and refactorings (#330 (closed), #326 (closed), #329 (closed), #325 (closed), #324 (closed), #320 (closed), #337 (closed))


v1.4.0 -  2021-07-27  - new model classes and data handlers, improved usability and transparency

general:

many technical adjustments to improve usability and transparency of MLAir
new FCN and CNN classes for easy NN model creation
new plots


new features:

new FCN class that can be customized in many ways (#284 (closed))
also new CNN class (#289 (closed))
added new bootstrap analysis method: mean bootstrapping (#300 (closed))
new data handler using FIR filters (#306 (closed))
performance measures are now stored in local files (#286 (closed))
histogram plots for inputs and targets (#299 (closed))
periodogram plots for filtered data (#298 (closed))


technical:

a calling run script can be stored inside the experiment folder if a reference to this script is passed as an argument (#99 (closed))
new callback to track epoch-runtime (#312 (closed))
added switch to use multiprocessing (#297 (closed))
customize maximum number of parallel processes (#308 (closed))
support non-monotonic window lead times (#313 (closed))
resolved bug with FileExistsError (#311 (closed))
resolved bug if no chemical is used at all (#307 (closed))
min/max scaler now scales between -1 and 1 (#302 (closed))
added missing offset parameter to some data handlers (#305 (closed))
improved data store logging (#304 (closed))
improved logging message on station removal in preprocessing (#294 (closed))
limited number of retries in JOIN module (#296 (closed))
adjusted competing skill score plot (#301 (closed))
transformation parameter check (#295 (closed))
implemented lazy data preprocessing for selected data handlers (#292 (closed))
fix bug in separation of scales data handler (#290 (closed))
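
The change of the min/max scaler range to -1 and 1 (#302) can be illustrated with a minimal sketch. This is not MLAir's implementation; the function name `min_max_scale` and its parameters are hypothetical and only show the linear mapping involved.

```python
def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that the minimum becomes `low` and the maximum `high`.

    Hypothetical helper for illustration only, not part of MLAir.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant series has no range to scale.
        raise ValueError("cannot scale data with zero range")
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


scaled = min_max_scale([0.0, 5.0, 10.0])
```

With the default bounds, the smallest value maps to -1, the largest to 1, and the midpoint of the data range to 0.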


v1.3.0 -  2021-02-24  - competitors and improved transformation

general:

release of official MLAir logo (#274 (closed))
new transformation schema for better independence of MLAir and data handler (#272 (closed))
competing models can be included in postprocessing for direct comparison (#198 (closed))


new features:

new helper functions for geographic issues (#280 (closed))
default data handler and inheritances can use min/max and log transformation (#276 (closed), #275 (closed))
include IntelliO3-ts model as reference via automatic download (#131 (closed))


technical:

experiment name now always includes target sampling type (#263 (closed))
competitive skill score plot is refactored (#260 (closed))
bug fix for climatological skill scores (#259 (closed))
bug fix for custom objects handling (#277 (closed))
bug fix for monitoring plots when multiple output branches are used (#278 (closed))
update requirements to newer version and dependencies (#262 (closed), #273 (closed))
HPC scripts are updated to work properly with parallel data processing (#281 (closed))


v1.2.1 -  2021-02-08  - bug fix for recursive import error

general:

applied bug fix


technical:

bug fix for recursive import error (#269 (closed))


v1.2.0 -  2020-12-18  - parallel preprocessing and improved data handlers

general:

new plots
parallelism for faster preprocessing
improved data handler with mixed sampling types
enhanced test coverage


new features:

station map plot now highlights subsets on the map and displays the number of stations for each subset (#227 (closed), #231 (closed))
two new data availability plots PlotAvailabilityHistogram (#191 (closed), #192 (closed), #223 (closed))
introduced parallel code in preprocessing if system supports parallelism (#164 (closed), #224 (closed), #225 (closed))
data handler DataHandlerMixedSampling (and inheritances) supports an offset parameter to end inputs at a different time than 00 hours (#220 (closed))
args for data handler DataHandlerMixedSampling (and inheritances) that differ for input and target can now be passed as a tuple (#229 (closed))


technical:

added templates for release and bug issues (#189 (closed))
improved test coverage (#236 (closed), #238 (closed), #239 (closed), #240 (closed), #241 (closed), #242 (closed), #243 (closed), #244 (closed), #245 (closed))
station map plot now includes the number of stations for each subset (#231 (closed))
postprocessing plots are encapsulated in try except statements (#107 (closed))
updated git settings (#213 (closed))
bug fix for data handler (#235 (closed))
reordering and bug fix for preprocessing reporting (#207 (closed), #232 (closed))
bug fix for outdated system path style (#226 (closed))
new plots are included in default plot list (#211 (closed))

helpers/join connection to ToarDB (e.g. used by DefaultDataHandler) now reports which variable could not be loaded (#222 (closed))
plot PlotBootstrapSkillScore can now additionally highlight specific variables, but this is not yet included in postprocessing (#201 (closed))
data handler DataHandlerMixedSampling now has reduced data loading (#221 (closed))
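
The encapsulation of postprocessing plots in try/except statements (#107) follows a common pattern, sketched below. This is an illustration under assumptions, not MLAir's code: `run_plots`, `ok_plot`, and `broken_plot` are hypothetical names.

```python
import logging


def run_plots(plot_functions):
    """Call each plot function and log failures instead of aborting the run.

    `plot_functions` maps a plot name to a callable without arguments.
    Returns the names of all plots that raised an exception.
    """
    failed = []
    for name, func in plot_functions.items():
        try:
            func()
        except Exception as exc:
            # One broken plot should not stop the remaining plots.
            logging.error("could not create plot %s: %s", name, exc)
            failed.append(name)
    return failed


def ok_plot():
    pass  # stands in for a plot routine that succeeds


def broken_plot():
    raise ValueError("no data available")  # stands in for a failing plot


failed_plots = run_plots({"ok_plot": ok_plot, "broken_plot": broken_plot})
```

The benefit of this design is that a single failing plot is reported in the log while all other plots are still created.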


v1.1.0 -  2020-11-18  - hourly resolution support and new data handlers

general:

MLAir can be used with 1H resolution data from JOIN
new data handlers to use the Kolmogorov-Zurbenko filter and mixed sampling types


new features:

new data handler DataHandlerKzFilter to use Kolmogorov-Zurbenko filter (kz filter) on inputs (#195 (closed))
new data handler DataHandlerMixedSampling that can use mixed sampling types for input and target (#197 (closed))
new data handler DataHandlerMixedSamplingWithFilter that uses kz filter and mixed sampling (#197 (closed))
new data handler DataHandlerSeparationOfScales that applies filter-dependent time step sizes to filtered inputs using mixed sampling (#196 (closed))
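
For readers unfamiliar with the Kolmogorov-Zurbenko filter mentioned above: it is an iterated centered moving average. The following is a minimal sketch of that idea and not the code used by DataHandlerKzFilter. The function names and the edge handling (shrinking the window at the series boundaries) are assumptions made for this example.

```python
def moving_average(series, window):
    """Centered moving average with an odd window length.

    At the boundaries the window is truncated to the available values
    (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def kz_filter(series, window, iterations):
    """Apply the moving average `iterations` times in a row."""
    result = list(series)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result


spike = kz_filter([0.0, 0.0, 3.0, 0.0, 0.0], window=3, iterations=1)
```

A larger window or more iterations yields a smoother low-pass filtered series.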


technical:

bug fix for very short time series in TimeSeriesPlot (#215 (closed))
bug fix for variable dictionary when using hourly resolution (#212 (closed))
variable naming for data from JOIN interface harmonised (#206 (closed))
transformation setup is now separated for inputs and targets (#202 (closed))
bug fix in PlotClimatologicalSkillScore if only single station is used (#193 (closed))
preprocessed data is now stored inside experiment and not in the data folder


v1.0.0 -  2020-10-08  - official release of new version 1.0.0

general:

This is the first official release of MLAir ready for use
updated license and installation instructions


technical:

restructured order of packages in requirements


v0.12.2 -  2020-10-01  - HDFML support

general:

HDFML support


technical:

installation script for HDFML adjusted, #183 (closed)


v0.12.1 -  2020-09-28  - examples in notebook

general:

introduced a notebook documentation for easy starting, #174 (closed)

updated special installation instructions for the Juelich HPC systems, #172 (closed)


new features:

names of input and output shape are renamed consistently to input_shape and output_shape, #175 (closed)


technical:

it is possible to assign a custom name to a run module (e.g. used in logging), #173 (closed)


v0.12.0 -  2020-09-21  - Documentation and Bugfixes

general:

improved documentation includes installation instructions and many examples from the paper, #153 (closed)

bugfixes (see technical)


new features:


MyLittleModel is now a pure feed-forward network (before it had a CNN part), #168 (closed)


technical:

new compile options check to ensure its execution, #154 (closed)

bugfix for key errors in time series plot, #169 (closed)

bugfix for unused kwargs in DefaultDataHandler, #170 (closed)


trainable parameter is renamed to train_model to prevent confusion with the tf trainable parameter, #162 (closed)

fixed HPC installation failure, #159 (closed)


v0.11.0 -  2020-08-24  -  Advanced Data Handling for MLAir

general

introduced advanced data handling with much more flexibility (independent of TOAR DB, custom data handling is pluggable), #144 (closed)

default data handler is still using TOAR DB


new features

default data handler using TOAR DB refactored according to advanced data handling, #140 (closed), #141 (closed), #152 (closed)

data sets are handled as collections, #142 (closed), and are iterable in a standard way (StandardIterator) and optimised for keras (KerasIterator), #143 (closed)

automatically moving station map plot, #136 (closed)


technical

model modules available from package, #139 (closed)

renaming of parameter time dimension, #151 (closed)

refactoring of README.md, #138 (closed)


v0.10.0 -  2020-07-15  -  MLAir is official name, Workflows, easy Model plug-in

general

Official project name is released: MLAir (Machine Learning on Air data)
a model class can now easily be plugged into MLAir, #121 (closed)

introduced new concept of workflows, #134 (closed)


new features

workflows are used to execute a sequence of run modules, #134 (closed)

default workflows for standard and the Juelich HPC systems are available, custom workflows can be defined, #134 (closed)

seasonal decomposition is available for conditional quantile plot, #112 (closed)

map plot is created with coordinates, #108 (closed)


flatten_tails are now more general and easier to customise, #114 (closed)

model classes have custom compile options (replaces set_loss), #110 (closed)

model can be set in ExperimentSetup from outside, #121 (closed)

default experiment settings can be queried using get_defaults(), #123 (closed)

training and model settings are reported as Markdown and TeX tables, #145 (closed)


technical

Juelich HPC systems are supported and installation scripts are available, #106 (closed)

data store is tracked, I/O is saved and illustrated in a plot, #116 (closed)

batch size and epoch parameters have to be defined in ExperimentSetup, #127 (closed), #122 (closed)

automatic documentation with sphinx, #109 (closed)

default experiment settings are updated, #123 (closed)

refactoring of experiment path and its default naming, #124 (closed)

refactoring of some parameter names, #146 (closed)

preparation for package distribution with pip, #119 (closed)

all run scripts are updated to run with workflows, #134 (closed)

the experiment folder is restructured, #130 (closed)