Commit 8ad5f2c8 authored by lukas leufen
created detailed description of all data paths (which steps creates which data and where is it stored)
parent ac662ace
# On data handling
This readme describes which function loads which data and where that data is stored.
## experiment setup
*Data_path* is the destination where all downloaded data is locally stored. Data is downloaded from TOARDB either using
the JOIN interface or a direct connection to the underlying PostgreSQL DB. If data was already downloaded, no new
download will be started. Missing data will be downloaded on the fly and saved in data_path.
`data_path = src.helpers.prepare_host()`
The current implementation leads to the following paths:
| hostname | path | comment |
| --- | --- | --- |
| ZAM144 | `/home/{user}/Data/toar_daily/` | notebook Felix |
| zam347 | `/home/{user}/Data/toar_daily/` | ESDE server |
| linux-gzsx | `/home/{user}/machinelearningtools/data/toar_daily/` | notebook Lukas |
| jureca | `/p/project/cjjsc42/{user}/DATA/toar_daily/` | JURECA |
| juwels | `/p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/` | JUWELS |
| runner-6HmDp9Qd-project-2411-concurrent | `/home/{user}/machinelearningtools/data/toar_daily/` | gitlab-runner |
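
The table above can be read as a lookup from hostname to a path template. The following is a minimal sketch of such a lookup, not the actual code of `src.helpers.prepare_host()`: the names `HOST_PATHS` and `resolve_data_path` are hypothetical, and exact hostname matching is an assumption.

```python
import getpass
import socket

# Hypothetical lookup table copied from the rows above (not the project's code).
HOST_PATHS = {
    "ZAM144": "/home/{user}/Data/toar_daily/",
    "zam347": "/home/{user}/Data/toar_daily/",
    "linux-gzsx": "/home/{user}/machinelearningtools/data/toar_daily/",
    "jureca": "/p/project/cjjsc42/{user}/DATA/toar_daily/",
    "juwels": "/p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/",
    "runner-6HmDp9Qd-project-2411-concurrent": "/home/{user}/machinelearningtools/data/toar_daily/",
}


def resolve_data_path(hostname=None, user=None):
    """Return the data path for a host, assuming exact hostname matches."""
    hostname = socket.gethostname() if hostname is None else hostname
    user = getpass.getuser() if user is None else user
    try:
        template = HOST_PATHS[hostname]
    except KeyError:
        # Unknown hosts are rejected here; the real function may behave differently.
        raise OSError(f"unknown host: {hostname}") from None
    return template.format(user=user)
```

For example, `resolve_data_path("zam347", "alice")` yields `/home/alice/Data/toar_daily/`.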
*experiment_path* is the root folder in which all results of an experiment are saved. Each experiment should have its
own distinct folder. The experiment path can be set in ExperimentSetup: `experiment_date` can be set by parser_args and
`experiment_path` (this argument is not the same as the internally stored experiment_path!) as args. The resulting
*experiment_path* is the combination of both given arguments, `os.path.join(experiment_path, f"{experiment_date}_network")`.
Inside this folder, several subfolders are created in the course of the program.
```bash
data_path
    <station1>_<var1>_<var2>_..._<varx>.nc
    <station1>_<var1>_<var2>_..._<varx>_meta.csv
    <station2>_<var1>_<var2>_..._<varx>.nc
    <station2>_<var1>_<var2>_..._<varx>_meta.csv
------
experiment_path
|    history.json
|    history_lr.json
|    <experiment_name>_model.pdf
|    <experiment_name>_model-best.h5
|    <experiment_name>_my_model.h5
├─── forecasts
|        forecasts_<station1>_test.nc
|        forecasts_<station2>_test.nc
|        ...
└─── plots
         conditional_quantiles_cali-ref_plot.pdf
         conditional_quantiles_like-bas_plot.pdf
         test_monthly_box.pdf
         test_map_plot.pdf
         <experiment_name>_history_learning_rate.pdf
         <experiment_name>_history_loss.pdf
         <experiment_name>_history_main_loss.pdf
         <experiment_name>_history_main_mse.pdf
         ...
```
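
The composition of the internal experiment path from the two arguments can be sketched as follows. Only the `os.path.join` expression is taken from the text above. The function name `build_experiment_path` is hypothetical and not part of ExperimentSetup.

```python
import os


def build_experiment_path(experiment_path, experiment_date):
    """Join the user-supplied root and the dated folder name, as described above."""
    return os.path.join(experiment_path, f"{experiment_date}_network")


if __name__ == "__main__":
    # The date string is an arbitrary illustration, not a required format.
    print(build_experiment_path("/tmp/experiments", "2020-01-01"))
```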
*plot_path* contains all created plots. If not given, it is created inside the experiment_path by default (as shown in
the folder structure above). It can be customised by `ExperimentSetup(plot_path=<path>)`.
*forecast_path* is the place where all forecasts are stored as netCDF files. Each file contains exactly one
station. If not given, it is created inside the experiment_path by default (as shown in the folder structure above). It
can be customised by `ExperimentSetup(forecast_path=<path>)`.
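
The default behaviour for both paths can be illustrated with a small sketch. The subfolder names `plots` and `forecasts` come from the folder structure above. The helper `resolve_output_paths` is hypothetical, and how ExperimentSetup resolves the defaults internally is an assumption.

```python
import os


def resolve_output_paths(experiment_path, plot_path=None, forecast_path=None):
    """Fall back to subfolders of experiment_path when no custom path is given."""
    if plot_path is None:
        plot_path = os.path.join(experiment_path, "plots")
    if forecast_path is None:
        forecast_path = os.path.join(experiment_path, "forecasts")
    return plot_path, forecast_path
```

Passing only `plot_path` leaves `forecast_path` at its default inside the experiment folder, and vice versa.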
## pre-processing
Each requested station is checked for whether it is already included in *data_path*. All files follow the naming
convention `<station_name>_<sorted_list_of_all_variables_split_by_underscore>.nc`. E.g. the station *DEBW013* with the
variables cloudcover, NO, NO2, O3 and temp (all TOARDB short names) is saved as `DEBW013_cloudcover_no_no2_o3_temp.nc`,
whereas the same station with only O3 and temperature becomes `DEBW013_o3_temp.nc`. Although all data of the latter file
is potentially also included in the former file, the program will always download the data for the new specification
again and save it into a new file. Only if the exactly matching file is available locally is no data downloaded.
**NOTE**: There is no check on the data time range, only the file name is compared. Set `overwrite_local_data=True`
in `experiment_setup.py` to overwrite local data by downloading new data.
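
The naming convention and the name-only check can be sketched as below. This is an illustration under assumptions, not the project's implementation: `station_file_name` and `needs_download` are hypothetical helpers, and lowercasing the variable names is inferred from the two example file names.

```python
import os


def station_file_name(station, variables):
    """Build '<station>_<sorted variables joined by underscore>.nc'."""
    # Assumption: variable names are lowercased, as in the examples above.
    var_part = "_".join(sorted(v.lower() for v in variables))
    return f"{station}_{var_part}.nc"


def needs_download(data_path, station, variables, overwrite_local_data=False):
    """Return True unless the exactly matching file already exists locally."""
    if overwrite_local_data:
        return True
    file_name = station_file_name(station, variables)
    # Only the file name is compared; the time range is never inspected.
    return not os.path.isfile(os.path.join(data_path, file_name))
```

With this sketch, a local `DEBW013_cloudcover_no_no2_o3_temp.nc` does not prevent a download for the variables O3 and temp, because the expected name `DEBW013_o3_temp.nc` differs.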
## model setup
The *checkpoint* is created inside *experiment_path* as `<experiment_name>_model-best.h5`.
The architecture of the model is plotted into *experiment_path* as `<experiment_name>_model.pdf`.
## training
Training metrics are saved in `history.json` and `history_lr.json`.
Best model is saved in `<experiment_name>_my_model.h5`.
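
The saved training metrics can be inspected with the standard library. The sketch below assumes that `history.json` holds a mapping from metric name to a list with one value per epoch. This layout, the metric name `val_loss`, and the helper names are assumptions and are not confirmed by this readme.

```python
import json
import os


def load_history(experiment_path, file_name="history.json"):
    """Read a metrics file from experiment_path and return its JSON content."""
    with open(os.path.join(experiment_path, file_name)) as f:
        return json.load(f)


def best_epoch(history, metric="val_loss"):
    """Return the zero-based index of the epoch with the lowest metric value."""
    values = history[metric]  # assumed layout: {"metric": [v_epoch0, v_epoch1, ...]}
    return min(range(len(values)), key=values.__getitem__)
```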
## post-processing
During the *make_forecast* method, all calculated forecasts of the neural network, persistence and ordinary least
squares, together with the target values for the corresponding lead time, are saved locally inside *forecast_path* as
`forecasts_<station>_test.nc`.
All plots are created inside *plot_path*.