# On data handling

This README describes which function loads which data and where it is stored.

## experiment setup

*data_path* is the local directory where all downloaded data is stored. Data is downloaded from TOARDB either via the
JOIN interface or via a direct connection to the underlying PostgreSQL DB. If the data has already been downloaded, no
new download is started. Missing data is downloaded on the fly and saved in *data_path*.

`data_path = src.helpers.prepare_host()`

The current implementation resolves to the following paths:

| hostname | path | comment |
| --- | --- | --- |
| ZAM144 | `/home/{user}/Data/toar_daily/` | notebook Felix |
| zam347 | `/home/{user}/Data/toar_daily/` | ESDE server |
| linux-gzsx | `/home/{user}/machinelearningtools/data/toar_daily/` | notebook Lukas |
| jureca | `/p/project/cjjsc42/{user}/DATA/toar_daily/` | JURECA |
| juwels | `/p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/` | JUWELS |
| runner-6HmDp9Qd-project-2411-concurrent | `/home/{user}/machinelearningtools/data/toar_daily/` | gitlab-runner |

*experiment_path* is the root folder in which all results of an experiment are saved. Each experiment should have its
own distinct folder. The experiment path can be set in `ExperimentSetup`: `experiment_date` can be set via
`parser_args`, and `experiment_path` (this argument is not the same as the internally stored *experiment_path*!) can be
passed as an argument. The internal *experiment_path* is the combination of both arguments,
`os.path.join(experiment_path, f"{experiment_date}_network")`. Inside this folder, several subfolders are created while
the program runs.
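The path logic described above can be sketched in Python as follows. This is a minimal, hypothetical illustration and not the code of `src.helpers.prepare_host()`: the names `HOST_DATA_PATHS`, `resolve_data_path` and `resolve_experiment_path` are assumptions, the sketch assumes an exact hostname match, and `{user}` is simply filled with the given user name.

```python
import os

# Hypothetical lookup table mirroring the host table above; the real
# src.helpers.prepare_host() may detect hosts differently.
HOST_DATA_PATHS = {
    "ZAM144": "/home/{user}/Data/toar_daily/",
    "zam347": "/home/{user}/Data/toar_daily/",
    "linux-gzsx": "/home/{user}/machinelearningtools/data/toar_daily/",
    "jureca": "/p/project/cjjsc42/{user}/DATA/toar_daily/",
    "juwels": "/p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/",
    "runner-6HmDp9Qd-project-2411-concurrent": "/home/{user}/machinelearningtools/data/toar_daily/",
}


def resolve_data_path(hostname: str, user: str) -> str:
    """Return data_path for a known host (hypothetical helper, exact hostname match assumed)."""
    try:
        template = HOST_DATA_PATHS[hostname]
    except KeyError:
        raise ValueError(f"unknown host '{hostname}', cannot resolve data_path") from None
    return template.format(user=user)


def resolve_experiment_path(experiment_path: str, experiment_date: str) -> str:
    """Combine the two user-supplied arguments into the internal experiment_path."""
    return os.path.join(experiment_path, f"{experiment_date}_network")


if __name__ == "__main__":
    print(resolve_data_path("zam347", "alice"))
    print(resolve_experiment_path("/tmp/results", "2019-11-14"))
```

Keeping the host table as a plain dictionary makes adding a new machine a one-line change; an unknown host fails loudly instead of silently writing data to an unexpected location.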
```text
data_path
    <station1>_<var1>_<var2>_..._<varx>.nc
    <station1>_<var1>_<var2>_..._<varx>_meta.csv
    <station2>_<var1>_<var2>_..._<varx>.nc
    <station2>_<var1>_<var2>_..._<varx>_meta.csv
------
experiment_path
|   history.json
|   history_lr.json
|   <experiment_name>_model.pdf
|   <experiment_name>_model-best.h5
|   <experiment_name>_my_model.h5
├─── forecasts
|       forecasts_<station1>_test.nc
|       forecasts_<station2>_test.nc
|       ...
└─── plots
        conditional_quantiles_cali-ref_plot.pdf
        conditional_quantiles_like-bas_plot.pdf
        test_monthly_box.pdf
        test_map_plot.pdf
        <experiment_name>_history_learning_rate.pdf
        <experiment_name>_history_loss.pdf
        <experiment_name>_history_main_loss.pdf
        <experiment_name>_history_main_mse.pdf
        ...
```

*plot_path* contains all created plots. If not given, it is created inside *experiment_path* by default (as shown in the
folder structure above). It can be customised via `ExperimentSetup(plot_path=<path>)`.

*forecast_path* is the directory where all forecasts are stored as netCDF files. Each file contains exactly one station.
If not given, it is created inside *experiment_path* by default (as shown in the folder structure above). It can be
customised via `ExperimentSetup(forecast_path=<path>)`.

## pre-processing

For each requested station, the program checks whether its data is already available in *data_path*. All files follow
the naming convention `<station_name>_<sorted_list_of_all_variables_split_by_underscore>.nc`. For example, the station
*DEBW013* with the variables cloudcover, NO, NO2, O3 and temp (all TOARDB short names) is saved as
`DEBW013_cloudcover_no_no2_o3_temp.nc`, whereas the same station with only O3 and temperature becomes
`DEBW013_o3_temp.nc`. Although all data of the latter file may already be contained in the former, the program always
downloads the data for a new variable specification and saves it into a new file. The download is skipped only if a
file with exactly the matching name is available locally.
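A minimal sketch of this naming convention and the local-file check, assuming that variable names are lower-cased before sorting (which matches the *DEBW013* examples) and that only the exact file name is compared. The helpers `data_file_name` and `needs_download` are hypothetical and not functions of this repository; the `overwrite_local_data` argument mirrors the setting of the same name in `experiment_setup.py`.

```python
import os
from typing import Iterable


def data_file_name(station: str, variables: Iterable[str]) -> str:
    """Build <station>_<sorted, lower-cased variables>.nc (lower-casing is an assumption)."""
    var_part = "_".join(sorted(v.lower() for v in variables))
    return f"{station}_{var_part}.nc"


def needs_download(data_path: str, station: str, variables: Iterable[str],
                   overwrite_local_data: bool = False) -> bool:
    """Return True unless a file with exactly this name exists in data_path.

    Only the file name is compared; the time range inside the file is not checked.
    """
    if overwrite_local_data:
        return True
    file_name = data_file_name(station, variables)
    return not os.path.exists(os.path.join(data_path, file_name))


if __name__ == "__main__":
    print(data_file_name("DEBW013", ["cloudcover", "NO", "NO2", "O3", "temp"]))
    print(data_file_name("DEBW013", ["temp", "O3"]))
```

Because the check is purely name-based, a locally cached `DEBW013_o3_temp.nc` does not satisfy a request for `DEBW013` with five variables, even though the larger file would contain the same O3 and temperature series.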
**NOTE**: The time range of the data is not checked; only the file name is compared. Set `overwrite_local_data=True` in
`experiment_setup.py` to overwrite local data with a fresh download.

## model setup

*checkpoint* is created inside *experiment_path* as `<experiment_name>_model-best.h5`.

The architecture of the model is plotted into *experiment_path* as `<experiment_name>_model.pdf`.

## training

Training metrics are saved in `history.json` and `history_lr.json`.

The best model is saved in `<experiment_name>_my_model.h5`.

## post-processing

In the *make_forecast* method, all forecasts calculated by the neural network, the persistence model and ordinary least
squares (OLS), together with the target values for the corresponding lead times, are saved locally inside
*forecast_path* as `forecasts_<station>_test.nc`.

All plots are created inside *plot_path*.
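To summarise where the outputs of model setup, training and post-processing end up, the hypothetical helper `output_paths` below collects the file locations named in this README. It is a sketch and not part of the program: it assumes that *plot_path* and *forecast_path* default to the `plots` and `forecasts` subfolders of *experiment_path*, as in the folder structure above, and the dictionary keys are illustrative names only.

```python
import os
from typing import Dict, Iterable, Optional


def output_paths(experiment_path: str, experiment_name: str, stations: Iterable[str],
                 plot_path: Optional[str] = None,
                 forecast_path: Optional[str] = None) -> Dict[str, str]:
    """Collect the locations of all experiment outputs (hypothetical helper)."""
    # Assumed defaults: the sub-folders shown in the folder structure of this README.
    plot_path = plot_path or os.path.join(experiment_path, "plots")
    forecast_path = forecast_path or os.path.join(experiment_path, "forecasts")
    paths = {
        "checkpoint": os.path.join(experiment_path, f"{experiment_name}_model-best.h5"),
        "model_plot": os.path.join(experiment_path, f"{experiment_name}_model.pdf"),
        "best_model": os.path.join(experiment_path, f"{experiment_name}_my_model.h5"),
        "history": os.path.join(experiment_path, "history.json"),
        "history_lr": os.path.join(experiment_path, "history_lr.json"),
        "plot_path": plot_path,
    }
    # One netCDF forecast file per station.
    for station in stations:
        paths[f"forecast_{station}"] = os.path.join(forecast_path, f"forecasts_{station}_test.nc")
    return paths


if __name__ == "__main__":
    for key, path in output_paths("/tmp/results/2019-11-14_network", "demo", ["DEBW013"]).items():
        print(f"{key}: {path}")
```

Passing `plot_path` or `forecast_path` explicitly overrides the defaults, in the same spirit as `ExperimentSetup(plot_path=<path>)` and `ExperimentSetup(forecast_path=<path>)`.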