bayesian-statistical-learning-2

On data handling

    This README describes which function loads which data and where that data is stored.

    experiment setup

    data_path is the destination where all downloaded data is stored locally. Data is downloaded from TOARDB, either through the JOIN interface or through a direct connection to the underlying PostgreSQL database. If the data has already been downloaded, no new download is started; missing data is downloaded on the fly and saved in data_path.

    data_path = src.helpers.prepare_host()

    The current implementation leads to the following paths:

    hostname                                | path                                                    | comment
    ----------------------------------------|---------------------------------------------------------|---------------
    ZAM144                                  | /home/{user}/Data/toar_daily/                           | notebook Felix
    zam347                                  | /home/{user}/Data/toar_daily/                           | ESDE server
    linux-gzsx                              | /home/{user}/machinelearningtools/data/toar_daily/      | notebook Lukas
    jureca                                  | /p/project/cjjsc42/{user}/DATA/toar_daily/              | JURECA
    juwels                                  | /p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/ | JUWELS
    runner-6HmDp9Qd-project-2411-concurrent | /home/{user}/machinelearningtools/data/toar_daily/      | gitlab-runner
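    src.helpers.prepare_host() itself is not reproduced here. As a rough sketch of the lookup the table describes, the hypothetical resolve_data_path below maps a hostname to a path template and fills in the user name. The HOST_PATHS dictionary, the function name and the substring fallback for cluster nodes (whose hostnames may carry prefixes such as login-node names) are assumptions, not documented behaviour of prepare_host.

```python
import getpass
import socket
from typing import Optional

# Hypothetical lookup table mirroring the hostname/path table above.
# "{user}" is replaced with the current user name.
HOST_PATHS = {
    "ZAM144": "/home/{user}/Data/toar_daily/",
    "zam347": "/home/{user}/Data/toar_daily/",
    "linux-gzsx": "/home/{user}/machinelearningtools/data/toar_daily/",
    "jureca": "/p/project/cjjsc42/{user}/DATA/toar_daily/",
    "juwels": "/p/home/jusers/{user}/juwels/intelliaq/DATA/toar_daily/",
    "runner-6HmDp9Qd-project-2411-concurrent": "/home/{user}/machinelearningtools/data/toar_daily/",
}


def resolve_data_path(hostname: Optional[str] = None, user: Optional[str] = None) -> str:
    """Sketch of a hostname -> data_path lookup (not the project's prepare_host)."""
    hostname = hostname or socket.gethostname()
    user = user or getpass.getuser()
    # Try an exact hostname match first.
    template = HOST_PATHS.get(hostname)
    if template is None:
        # Assumption: cluster nodes contain the system name, e.g. "jwlogin01.juwels".
        matches = [t for key, t in HOST_PATHS.items() if key in hostname]
        if not matches:
            raise OSError(f"unknown host '{hostname}', no data_path configured")
        template = matches[0]
    return template.format(user=user)
```

    Raising an error for unknown hosts, rather than silently falling back to a default directory, makes a missing table entry obvious on a new machine.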

    experiment_path is the root folder in which all results of an experiment are saved. Each experiment should have its own distinct folder. The experiment path can be set in ExperimentSetup: experiment_date can be set through parser_args, and experiment_path (this argument is not the same as the internally stored experiment_path!) through args. The internally stored experiment_path combines both arguments as os.path.join(experiment_path, f"{experiment_date}_network"). Inside this folder, several subfolders are created while the program runs.
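    The composition of the internal experiment_path can be sketched as follows. The helper build_experiment_path is hypothetical, and the argparse flag name and default below are assumptions chosen to match the argument names in this README, not the project's actual parser.

```python
import argparse
import os


def build_experiment_path(experiment_path: str, experiment_date: str) -> str:
    # Internal experiment_path = <experiment_path argument>/<experiment_date>_network
    return os.path.join(experiment_path, f"{experiment_date}_network")


# Hypothetical parser: only experiment_date is read from the command line here.
parser = argparse.ArgumentParser()
parser.add_argument("--experiment_date", default="testrun")
# Parse an explicit list so the example does not depend on sys.argv.
parser_args = parser.parse_args(["--experiment_date", "2020-01-01"])

# experiment_path is passed directly as an argument, not via the parser.
internal_path = build_experiment_path("/data/runs", parser_args.experiment_date)
```

    Here internal_path resolves to /data/runs/2020-01-01_network on POSIX systems, which illustrates why the argument and the internally stored value must not be confused.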

    data_path
        <station1>_<var1>_<var2>_..._<varx>.nc
        <station1>_<var1>_<var2>_..._<varx>_meta.csv
        <station2>_<var1>_<var2>_..._<varx>.nc
        <station2>_<var1>_<var2>_..._<varx>_meta.csv
    ------
    experiment_path
    |   history.json
    |   history_lr.json
    |   <experiment_name>_model.pdf
    |   <experiment_name>_model-best.h5
    |   <experiment_name>_my_model.h5
    ├─── forecasts 
    |       forecasts_<station1>_test.nc
    |       forecasts_<station2>_test.nc
    |       ...
    └─── plots
            conditional_quantiles_cali-ref_plot.pdf
            conditional_quantiles_like-bas_plot.pdf
            monthly_summary_box_plot.pdf
            skill_score_clim_all_terms_<architecture>.pdf
            skill_score_clim_<architecture>.pdf
            skill_score_competitive_<architecture>.pdf
            station_map.pdf
            <experiment_name>_history_learning_rate.pdf
            <experiment_name>_history_loss.pdf
            <experiment_name>_history_main_loss.pdf
            <experiment_name>_history_main_mse.pdf
            ...
    

    plot_path contains all created plots. If it is not given, it is created inside experiment_path by default (as shown in the folder structure above). It can be customised with ExperimentSetup(plot_path=<path>).

    forecast_path is where all forecasts are stored as netCDF files. Each file contains exactly one station. If it is not given, it is created inside experiment_path by default (as shown in the folder structure above). It can be customised with ExperimentSetup(forecast_path=<path>).
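    The defaulting rule for plot_path and forecast_path can be sketched as below. The function resolve_output_paths is hypothetical; it only mirrors the behaviour described here (fall back to the plots and forecasts subfolders of experiment_path) and is not ExperimentSetup's code.

```python
import os
from typing import Dict, Optional


def resolve_output_paths(experiment_path: str,
                         plot_path: Optional[str] = None,
                         forecast_path: Optional[str] = None) -> Dict[str, str]:
    # Use the custom location if one is given, otherwise the default
    # subfolder inside experiment_path (see the folder tree above).
    return {
        "plot_path": plot_path or os.path.join(experiment_path, "plots"),
        "forecast_path": forecast_path or os.path.join(experiment_path, "forecasts"),
    }
```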

    pre-processing

    For each requested station, the program checks whether it is already present in data_path. All files follow the naming convention <station_name>_<sorted_list_of_all_variables_split_by_underscore>.nc. For example, the station DEBW013 with the variables cloudcover, NO, NO2, O3 and temp (all TOARDB short names) is saved as DEBW013_cloudcover_no_no2_o3_temp.nc, whereas the same station with only O3 and temperature becomes DEBW013_o3_temp.nc. Although all data of the latter file may also be contained in the former, the program always downloads the data for a new specification and saves it into a new file. No data is downloaded only if an exactly matching file is available locally. NOTE: There is no check on the data time range; only the file name is compared. Set overwrite_local_data=True in experiment_setup.py to overwrite local data by downloading it again.
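    A minimal sketch of the naming convention and the name-only cache check, assuming variable names are lower-cased before sorting (consistent with the DEBW013 examples). The functions station_file_name and load_or_download and the download callback are hypothetical; the callback stands in for the JOIN or PostgreSQL download, which is not shown.

```python
import os
from typing import Callable, Iterable


def station_file_name(station: str, variables: Iterable[str]) -> str:
    # Lower-case and sort the variable names so that the same request
    # always maps to the same file name, regardless of input order.
    names = sorted(v.lower() for v in variables)
    return f"{station}_{'_'.join(names)}.nc"


def load_or_download(data_path: str, station: str, variables: Iterable[str],
                     download: Callable[[str], None],
                     overwrite_local_data: bool = False) -> str:
    file_path = os.path.join(data_path, station_file_name(station, variables))
    # Only the file name is compared; the stored time range is not checked.
    if overwrite_local_data or not os.path.exists(file_path):
        download(file_path)  # hypothetical hook: fetch the data and write file_path
    return file_path
```

    Because only names are compared, a request for ["temp", "O3"] reuses DEBW013_o3_temp.nc, while a request that adds a single variable triggers a fresh download into a differently named file.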

    model setup

    A checkpoint is created inside experiment_path as <experiment_name>_model-best.h5.

    The architecture of the model is plotted into experiment_path as <experiment_name>_model.pdf.
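    The two model-setup artifacts can be located with a small helper. model_artifact_paths is a hypothetical name; it only encodes the file names listed in this section.

```python
import os
from typing import Dict


def model_artifact_paths(experiment_path: str, experiment_name: str) -> Dict[str, str]:
    # File names as listed in the model setup section of this README.
    return {
        "checkpoint": os.path.join(experiment_path, f"{experiment_name}_model-best.h5"),
        "architecture_plot": os.path.join(experiment_path, f"{experiment_name}_model.pdf"),
    }
```

    In a Keras workflow, such a checkpoint path is typically passed to keras.callbacks.ModelCheckpoint(filepath, save_best_only=True), and the plot path to keras.utils.plot_model(model, to_file=...). Whether the project uses exactly these calls is not stated in this README.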

    training

    Training metrics are saved in history.json and history_lr.json.

    The best model is saved in <experiment_name>_my_model.h5.
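    Writing the two metric files can be sketched with the standard json module. The function save_history is hypothetical, and so is the exact content of history_lr.json: the sketch assumes both files hold a dict of per-epoch lists, the same shape as the history attribute of a Keras History object.

```python
import json
import os
from typing import Dict, List


def save_history(experiment_path: str,
                 history: Dict[str, List[float]],
                 lr_history: Dict[str, List[float]]) -> None:
    # Per-epoch training metrics, e.g. {"loss": [...], "val_loss": [...]}.
    with open(os.path.join(experiment_path, "history.json"), "w") as f:
        json.dump(history, f)
    # Learning-rate history; the {"lr": [...]} layout is an assumption.
    with open(os.path.join(experiment_path, "history_lr.json"), "w") as f:
        json.dump(lr_history, f)
```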

    post-processing

    During the make_forecast method, all forecasts calculated by the neural network, the persistence forecast and ordinary least squares (OLS), together with the target values for the respective lead times, are saved locally in forecast_path as forecasts_<station>_test.nc.
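    Since each forecast file holds exactly one station and follows a fixed name pattern, the file for a station, and the list of stations that already have forecasts, can be derived from forecast_path alone. Both functions below are hypothetical helpers; reading the netCDF contents themselves is not shown.

```python
import os
import re
from typing import List

# File names of the form forecasts_<station>_test.nc
_FORECAST_FILE = re.compile(r"^forecasts_(.+)_test\.nc$")


def forecast_file(forecast_path: str, station: str) -> str:
    return os.path.join(forecast_path, f"forecasts_{station}_test.nc")


def stations_with_forecasts(forecast_path: str) -> List[str]:
    # Recover the station IDs from the file names, ignoring unrelated files.
    stations = []
    for name in sorted(os.listdir(forecast_path)):
        match = _FORECAST_FILE.match(name)
        if match:
            stations.append(match.group(1))
    return stations
```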

    All plots are created inside plot_path.