implement lazy data preprocessing

Lazy data loading

Use lazy data loading already on first use if possible:

  • store the data locally in the data path under a separate folder

  • create a checksum for the name and always reuse these data if the checksum matches (this replaces all previous steps and saves a lot of computation time; see the sketch after this list)

  • if this works, it can be used for all subsets because the data were preloaded as "preprocessing". For all subsets it would be sufficient to use lazy loading

  • check out how to create a checksum

  • store the attributes _data, meta, input_data, target_data (data already loaded, interpolated, and KZ filter applied; only create history, labels, ... still have to be performed); additional attributes are stored for the DataHandlerKzFilterSingleStation (self.cutoff_period, self.cutoff_period_days)

  • add a parameter lazy_preprocessing (default: False) to trigger lazy preprocessing

  • compare the checksum and try to load the data

  • continue with the missing steps

  • there must be a check regarding variables and start/end point. Data would have to be reloaded if the start date is earlier than what is available in the stored data (a missing data point at the start date could otherwise trigger an unintended re-preprocessing of the data) -> NO check for start and end. We assume that the data are first used with the total time range.
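A minimal sketch of how this could be wired together, assuming pickle is used for storage; the class, method, and folder names (LazyDataHandler, lazy_data, setup_data, ...) are hypothetical and only illustrate the flow described above:

```python
import hashlib
import os
import pickle


class LazyDataHandler:
    """Sketch of lazy preprocessing: store preprocessed attributes on disk and
    reload them if a matching checksum is found (all names are illustrative)."""

    _lazy_attributes = ["_data", "meta", "input_data", "target_data"]

    def __init__(self, station, variables, data_path, lazy_preprocessing=False):
        self.station = station
        self.variables = variables
        self.lazy_preprocessing = lazy_preprocessing
        # store lazily preprocessed data in a separate folder inside the data path
        self.lazy_path = os.path.join(data_path, "lazy_data")
        os.makedirs(self.lazy_path, exist_ok=True)

    def _get_hash(self):
        # checksum over a string of the essential properties (see last section)
        summary = "".join(str(v) for v in (self.station, sorted(self.variables)))
        return hashlib.md5(summary.encode()).hexdigest()

    def _lazy_file(self):
        return os.path.join(self.lazy_path, f"{self._get_hash()}.pickle")

    def setup_data(self):
        if self.lazy_preprocessing:
            try:
                self.load_lazy()        # checksum matches -> reuse stored data
            except FileNotFoundError:
                self.make_data()        # full preprocessing on first use
                self.store_lazy()
        else:
            self.make_data()
        self.make_history_and_labels()  # steps that always run (create history, labels, ...)

    def store_lazy(self):
        with open(self._lazy_file(), "wb") as f:
            pickle.dump({a: getattr(self, a) for a in self._lazy_attributes}, f)

    def load_lazy(self):
        with open(self._lazy_file(), "rb") as f:
            for attr, value in pickle.load(f).items():
                setattr(self, attr, value)

    def make_data(self):
        # placeholder for loading, interpolation, and KZ filtering
        self._data, self.meta, self.input_data, self.target_data = None, None, None, None

    def make_history_and_labels(self):
        pass  # placeholder for the remaining steps
```

With this structure the checksum depends only on the station name and variables, so all subsets reuse the same stored file as long as the total time range was preprocessed first (matching the decision above to skip start/end checks).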

Check these links out:

These links are related to how a class can be stored:

https://stackoverflow.com/questions/23582489/python-pickle-protocol-choice

https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence

It seems that a checksum cannot be created directly for class instances. Maybe there is a way to create a string that summarises all essential properties of a class and create a checksum from this?
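A possible way to do this, assuming the essential properties are known in advance (hashlib is from the standard library; the property names below are only examples):

```python
import hashlib


def class_checksum(obj, properties=("station", "variables", "sampling")):
    """Build a checksum from a string that summarises the essential
    properties of a class instance (property names are examples)."""
    summary = "".join(str(getattr(obj, p, None)) for p in properties)
    return hashlib.md5(summary.encode()).hexdigest()
```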
