save progress of experiment steps
This applies only to RunEnvironment and the classes that inherit from it.
- save the data store as a pickle file (on __del__): naming like
- query whether the class has already been executed (on
- skip the class if it has already been executed (directly go from
- force button for rerun (regardless of whether a checkpoint is available)
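The checkpoint steps above could look roughly like the following minimal Python sketch. Everything in it is an assumption rather than the mlt API: the class name CheckpointedStage, the dict-based data store, the checkpoint directory and the `<ClassName>.pickle` naming scheme are hypothetical (the actual naming is still open above). Saving happens in `__del__`, as proposed. CPython calls `__del__` as soon as the last reference disappears, but reference cycles, other interpreters or interpreter shutdown can delay or skip it, so an explicit `save_checkpoint()` at the end of `run()` may turn out to be more robust.

```python
import os
import pickle
import tempfile


class CheckpointedStage:
    """Hypothetical stand-in for a RunEnvironment subclass with checkpointing.

    Assumptions (not the mlt API): the data store is a plain dict, checkpoints
    live in `checkpoint_dir`, and each file is named after the stage class.
    """

    def __init__(self, data_store, checkpoint_dir, force=False):
        self.data_store = data_store
        self.checkpoint_dir = checkpoint_dir
        self.executed = False
        self.skipped = False
        # Query: reuse an existing checkpoint unless a rerun is forced.
        if not force and self.is_executed():
            self.load_checkpoint()
            self.skipped = True
        else:
            self.run()
            self.executed = True

    @property
    def checkpoint_path(self):
        # Hypothetical naming scheme: one pickle file per stage class.
        return os.path.join(self.checkpoint_dir, f"{type(self).__name__}.pickle")

    def is_executed(self):
        return os.path.isfile(self.checkpoint_path)

    def run(self):
        raise NotImplementedError  # implemented by each concrete stage

    def save_checkpoint(self):
        os.makedirs(self.checkpoint_dir, exist_ok=True)
        with open(self.checkpoint_path, "wb") as f:
            pickle.dump(self.data_store, f)

    def load_checkpoint(self):
        with open(self.checkpoint_path, "rb") as f:
            self.data_store.update(pickle.load(f))

    def __del__(self):
        # Only stages that actually ran write a checkpoint; skipped ones do not.
        if getattr(self, "executed", False):
            self.save_checkpoint()


class Preprocessing(CheckpointedStage):
    def run(self):
        self.data_store["n_samples"] = 42  # placeholder for real preprocessing


with tempfile.TemporaryDirectory() as tmp:
    first = Preprocessing({}, tmp)
    del first  # CPython: last reference gone, so __del__ writes the checkpoint
    restored = {}
    second = Preprocessing(restored, tmp)  # checkpoint found: run() is skipped
    forced = Preprocessing({}, tmp, force=True)  # reruns regardless
    was_skipped, was_forced_skipped = second.skipped, forced.skipped
    del second, forced  # forced rerun rewrites its checkpoint while tmp still exists
```

Keeping the skip decision in the constructor means a subclass only implements `run()`; the trade-off is that the data store of a skipped stage is whatever the pickle contains, so later stages must not depend on side effects outside the data store.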
Background: This is required if mlt runs on HPC systems with different partitions. For example, experiment setup and preprocessing should run on CPU nodes (and on login nodes, because they need an internet connection), whereas the training step should run on the GPU partition. Post-processing can afterwards run on CPU nodes again (it has not been evaluated yet whether bootstrap prediction requires a GPU, or whether it is actually faster there).
- implement checkpoint saving on local disk
- implement loading of checkpoints
- implement skipping of execution if checkpoint was loaded
- RunEnvironment behaviour (when used directly rather than through a subclass): add force button and clean-up button
- check the speed of postprocessing depending on the partition (not directly related, but relevant for the final setup: whether postprocessing should run on CPU or GPU)
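For the clean-up button and the speed check, a similarly hedged sketch: `clean_up_checkpoints` deletes all `*.pickle` checkpoints in a hypothetical checkpoint directory, and `timed_step` records the wall-clock time of a step together with the host name and, assuming the HPC system uses SLURM, the `SLURM_JOB_PARTITION` environment variable, so that runs on CPU and GPU partitions can be compared. The function names and the record layout are illustrative only.

```python
import glob
import os
import socket
import tempfile
import time
from contextlib import contextmanager


def clean_up_checkpoints(checkpoint_dir):
    """Hypothetical 'clean-up button': delete all stage checkpoints."""
    removed = []
    for path in glob.glob(os.path.join(checkpoint_dir, "*.pickle")):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)


@contextmanager
def timed_step(name, records):
    """Append the wall-clock duration of the wrapped block to `records`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        records.append({
            "step": name,
            "host": socket.gethostname(),
            # Assumes SLURM; falls back to "unknown" outside a SLURM job.
            "partition": os.environ.get("SLURM_JOB_PARTITION", "unknown"),
            "seconds": time.perf_counter() - start,
        })


records = []
with timed_step("postprocessing", records):
    sum(i * i for i in range(100_000))  # placeholder for real postprocessing

with tempfile.TemporaryDirectory() as tmp:
    for name in ("PreProcessing", "Training"):
        open(os.path.join(tmp, f"{name}.pickle"), "wb").close()  # dummy checkpoints
    removed = clean_up_checkpoints(tmp)
    remaining = os.listdir(tmp)
```

Collecting the same record on each partition (for example once in a CPU job and once in a GPU job) gives directly comparable timings for the postprocessing decision.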