implement lazy data preprocessing

Lazy data loading

Use lazy data loading already on first use if possible:

  • store the data locally in the data path under a separate folder

  • create a checksum for the name and always reuse these data if the checksum matches (this replaces all previous steps and saves a lot of computation time; see the sketch after this list)

  • if this works, it can be used for all subsets because the data were preloaded as "preprocessing". For all subsets it would be sufficient to use lazy loading

  • check out how to create a checksum

  • store the attributes _data, meta, input_data, target_data (data already loaded, interpolated, and KZ filter applied; only create history, labels, ... still have to be performed); additional attributes are stored for the DataHandlerKzFilterSingleStation (self.cutoff_period, self.cutoff_period_days)

  • add a parameter lazy_preprocessing (default: False) to trigger lazy preprocessing

  • compare the checksum and try to load the data

  • continue with the missing steps

  • there must be a check regarding variables and start/end point. Data would have to be reloaded if the start date is earlier than what is available in the stored data (a missing data point at the start date could otherwise trigger an unintended re-preprocessing of the data) -> NO check for start and end. We assume that the data are first used with the total time range.
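A minimal sketch of how this could be wired together, assuming pickle is used for storage; the class, method, and folder names (LazyDataHandler, lazy_data, setup_data, ...) are hypothetical and only illustrate the flow described above:

```python
import hashlib
import os
import pickle


class LazyDataHandler:
    """Sketch of lazy preprocessing: store preprocessed attributes on disk and
    reload them if a matching checksum is found (all names are illustrative)."""

    _lazy_attributes = ["_data", "meta", "input_data", "target_data"]

    def __init__(self, station, variables, data_path, lazy_preprocessing=False):
        self.station = station
        self.variables = variables
        self.lazy_preprocessing = lazy_preprocessing
        # store lazily preprocessed data in a separate folder inside the data path
        self.lazy_path = os.path.join(data_path, "lazy_data")
        os.makedirs(self.lazy_path, exist_ok=True)

    def _get_hash(self):
        # checksum over a string of the essential properties (see last section)
        summary = "".join(str(v) for v in (self.station, sorted(self.variables)))
        return hashlib.md5(summary.encode()).hexdigest()

    def _lazy_file(self):
        return os.path.join(self.lazy_path, f"{self._get_hash()}.pickle")

    def setup_data(self):
        if self.lazy_preprocessing:
            try:
                self.load_lazy()        # checksum matches -> reuse stored data
            except FileNotFoundError:
                self.make_data()        # full preprocessing on first use
                self.store_lazy()
        else:
            self.make_data()
        self.make_history_and_labels()  # steps that always run (create history, labels, ...)

    def store_lazy(self):
        with open(self._lazy_file(), "wb") as f:
            pickle.dump({a: getattr(self, a) for a in self._lazy_attributes}, f)

    def load_lazy(self):
        with open(self._lazy_file(), "rb") as f:
            for attr, value in pickle.load(f).items():
                setattr(self, attr, value)

    def make_data(self):
        # placeholder for loading, interpolation, and KZ filtering
        self._data, self.meta, self.input_data, self.target_data = None, None, None, None

    def make_history_and_labels(self):
        pass  # placeholder for the remaining steps
```

With this structure the checksum depends only on the station name and variables, so all subsets reuse the same stored file as long as the total time range was preprocessed first (matching the decision above to skip start/end checks).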

Check these links out:

These links are related to how a class can be stored:

https://stackoverflow.com/questions/23582489/python-pickle-protocol-choice

https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence

It seems that a checksum cannot be created directly for class instances. Maybe there is a way to create a string that summarises all essential properties of a class and create a checksum from this?
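A possible way to do this, assuming the essential properties are known in advance (hashlib is from the standard library; the property names below are only examples):

```python
import hashlib


def class_checksum(obj, properties=("station", "variables", "sampling")):
    """Build a checksum from a string that summarises the essential
    properties of a class instance (property names are examples)."""
    summary = "".join(str(getattr(obj, p, None)) for p in properties)
    return hashlib.md5(summary.encode()).hexdigest()
```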
