experimental: load preprocessing snapshot
load preprocessing from previous run (experimental)
Especially with big data, it is annoying to rerun preprocessing although the data is not changing. The lazy preprocessing functionality of the data handlers reduces the time spent on recurring (but unchanged) tasks to a reasonable minimum, but it has problems when a huge number of stations is used. Furthermore, with big data or many stations, preprocessing seems to slow down from the overall station evaluation (which is still quite fast) to the test and the combined train/val data sets, where it takes multiple seconds per station although the computation is generally the same or less.
idea
Store some kind of snapshot after preprocessing that can be loaded in a follow-up run. It should be possible to load this snapshot from a local path, and it must be ensured that the configuration is identical between the initial run and the continuing run.
requirements
- Store snapshot of data store as pickle in each run
- load snapshot of data store and compare with current data store
- which parameters are allowed to change? These shouldn't be included in the comparison (e.g. paths to write forecasts or plots to)
- after loading the snapshot, data should be accessible from the snapshot experiment (important: the snapshot stores the references to the data, not the data itself)
- what about the batches? They could be reused too.
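
The first three requirements could be sketched as follows. This is only an illustration under assumptions: a plain dict stands in for the data store, and the function names as well as the ignored keys `forecast_path` and `plot_path` are hypothetical, not taken from the existing code.

```python
import pickle
from pathlib import Path

# Hypothetical: keys that may differ between runs without invalidating the snapshot.
IGNORED_KEYS = frozenset({"forecast_path", "plot_path"})


def save_snapshot(store: dict, path: Path) -> None:
    """Pickle the data store (configuration and data references) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(store, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_snapshot(path: Path) -> dict:
    """Load a previously pickled data store. Only unpickle trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


def mismatched_keys(current: dict, snapshot: dict, ignored=IGNORED_KEYS) -> list:
    """Return the sorted keys whose values differ, skipping ignored keys."""
    keys = (set(current) | set(snapshot)) - set(ignored)
    missing = object()  # sentinel so that a missing key never equals a stored None
    return sorted(k for k in keys if current.get(k, missing) != snapshot.get(k, missing))
```

An empty result from `mismatched_keys` would mean the snapshot can be reused. A key that exists on only one side counts as a mismatch, which errs on the side of rerunning preprocessing.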
todo
- in each run (independently)
  - create snapshot folder
  - store snapshot after preprocessing
- when loading from snapshot
  - set path to load snapshot from
  - abort run if snapshot does not match
  - be able to use the data referred to by the snapshot
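
A rough sketch of how the todo items could fit together, assuming a dict-like configuration. The folder name `snapshot`, the file name `snapshot.pickle`, the exception `SnapshotMismatchError` and all function names are hypothetical. For brevity the comparison here is strict equality, so parameters that are allowed to change would have to be filtered out beforehand.

```python
import pickle
from pathlib import Path


class SnapshotMismatchError(RuntimeError):
    """Raised when the loaded snapshot does not match the current configuration."""


def create_snapshot_dir(experiment_path: Path) -> Path:
    """Create (if needed) and return the snapshot folder of this run."""
    snapshot_dir = Path(experiment_path) / "snapshot"  # hypothetical layout
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_dir


def store_snapshot(config: dict, snapshot_dir: Path) -> Path:
    """Write the snapshot after preprocessing and return its file path."""
    target = Path(snapshot_dir) / "snapshot.pickle"  # hypothetical file name
    with open(target, "wb") as f:
        pickle.dump(config, f)
    return target


def load_or_abort(snapshot_file: Path, config: dict) -> dict:
    """Load the snapshot, aborting the run if it differs from `config`."""
    with open(snapshot_file, "rb") as f:
        snapshot = pickle.load(f)
    if snapshot != config:
        raise SnapshotMismatchError(f"snapshot {snapshot_file} does not match this run")
    return snapshot
```

Raising an exception rather than silently falling back to a fresh preprocessing makes a mismatch visible immediately; whether a fallback is preferable is still open.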