- [Introduction to Atmospheric Machine Learning Benchmarking System](#introduction-to-atmospheric-machine-learning-benchmarking-system)
- [Requirements](#requirements)
- [Getting started](#getting-started)
  * [Download of the repository](#download-of-the-repository)
  * [Download of NVIDIA's TF1.15 singularity container](#download-of-nvidia-s-tf115-singularity-container)
  * [Workflow without singularity containers](#workflow-without-singularity-containers)
  * [Virtual environment](#virtual-environment)
    + [On HPC-systems](#on-hpc-systems)
      - [Case I - With TF1.15 singularity container](#case-i---with-tf115-singularity-container)
      - [Case II - Without TF1.15 singularity container, but with software modules](#case-ii---without-tf115-singularity-container--but-with-software-modules)
    + [On other systems](#on-other-systems)
    + [Further details on the arguments](#further-details-on-the-arguments)
- [Contributors and contact](#contributors-and-contact)
## Introduction to Atmospheric Machine Learning Benchmarking System
...
...
The following steps are part of the workflow:
1) **Data Extraction**:<br> In this step, the variables of interest are extracted from the raw ERA5 grib files:
```bash
[sbatch] ./data_extraction_era5.sh
```
2) **Data Preprocessing**:<br> In this step, the ERA5 data is sliced to the region of interest (preprocessing step 1). All data is loaded into memory once, which allows computing some statistics (for later normalization), and is then saved as pickle files in the output directory. The TFRecord files which are streamed to the neural network for training and postprocessing are created in preprocessing step 2. Thus, two (batch) scripts have to be executed:
```bash
[sbatch] ./preprocess_data_era5_step1.sh
...
...
```bash
[sbatch] ./visualize_postprocess_era5_<exp_id>.sh
```
5) **Meta-Postprocessing**: <br> AMBS also provides a runscript to compare different models against each other (called meta-postprocessing). This happens in the `meta_postprocess`-step. While the runscript generator currently cannot handle this step, it can be configured by adapting the file `meta_config.json` in the `meta_postprocess_config/`-directory. A template for the related runscript is provided under `HPC_scripts/` and `no_HPC_scripts/`, respectively.
```bash
[sbatch] ./meta_postprocess_era5.sh
```
### Additional Jupyter Notebooks
Following up on the interactive discussion during the peer-review phase (click on `discussion` when opening the [manuscript's landing page](https://doi.org/10.5194/gmd-2021-430)), some additional evaluations have been conducted. While the training of one convolutional model from WeatherBench has been integrated into the workflow, some evaluations (e.g. the ERA5 short-range forecast evaluation) have been realized in the scope of Jupyter Notebooks. These Notebooks are provided in the `Jupyter_Notebooks/`-directory, where further (technical) details are given. The software requirements to run these Jupyter Notebooks are the same as for the workflow (see [above](#Software-requirements)).
## Directory tree and naming convention
To successfully run the workflow and to enable tracking the results from each workflow step, the input and output directories as well as the file names should follow the convention depicted below.
At first, we show the directory structure of the ERA5 dataset which serves as the raw input data source in this study. The data is available hourly and stored in two different kinds of grib files: files with the suffix `*_ml.grb` provide data on the model levels of the underlying IFS model (to allow subsequent interpolation onto pressure levels), whereas `*_sf.grb`-files include data without a vertical dimension.
```
├── ERA5 dataset
│   ├── <YYYY>
│   │   ├── <MM>
│   │   │   ├── *_ml.grb
│   │   │   ├── *_sf.grb
│   │   │   ├── ...
│   │   ├── <MM>
│   │   │   ├── *_ml.grb
│   │   │   ├── *_sf.grb
│   │   │   ├── ...
```
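A quick sanity check of this input tree can be done programmatically. A minimal sketch in Python (the function name and its arguments are illustrative, not part of AMBS):

```python
from pathlib import Path


def list_era5_grib(root: str, year: str, month: str):
    """List the ERA5 grib files of one month, separated by file type.

    Files ending in '_ml.grb' carry model-level data, while files ending
    in '_sf.grb' carry surface data without a vertical dimension.
    """
    month_dir = Path(root) / year / month
    ml_files = sorted(month_dir.glob("*_ml.grb"))
    sf_files = sorted(month_dir.glob("*_sf.grb"))
    return ml_files, sf_files
```

Both lists can then be inspected to verify that no hour of the month is missing before the data extraction step is launched.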
The base output directory, where all results of the workflow are stored, should be set up when running the workflow for the first time (see the `<my_target_dir>`-parameter of `create_env.sh` as described [here](#Virtual-environment)).
The structure of the base output directory (i.e. the directory tree) should be as follows. More details on the naming convention are provided below.
```
├── extractedData
│   ├── <YYYY>
│   │   ├── <MM>
│   │   │   ├── ecmwf_era5_<YYMMDDHH>.nc
├── preprocessedData
│   ├── <directory_name_convention>
│   │   ├── pickle
│   │   │   ├── <YYYY>
│   │   │   │   ├── X_<MM>.pkl
│   │   │   │   ├── T_<MM>.pkl
│   │   │   │   ├── stat_<MM>.pkl
│   │   ├── tfrecords_seq_len_<X>
│   │   │   ├── sequence_Y_<YYYY>_M_<MM>.tfrecords
│   │   ├── metadata.json
│   │   ├── options.json
├── models
│   ├── <directory_name_convention>
│   │   ├── <model_name>
│   │   │   ├── <timestamp>_<user>_<exp_id>
│   │   │   │   ├── checkpoint_<iteration_step>
│   │   │   │   │   ├── model_*
│   │   │   │   ├── ...
│   │   │   │   ├── val_losses.pkl
│   │   │   │   ├── *.json
├── results
│   ├── <directory_name_convention>
│   │   ├── <model_name>
│   │   │   ├── <timestamp>_<user>_<exp_id>
│   │   │   │   ├── vfp_date_<YYYYMMDDHH>_*.nc
│   │   │   │   ├── evaluation_metrics.nc
│   │   │   │   ├── *.png
├── meta_postprocess
│   ├── <exp_id>
```
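Note that the experiment directories under `models/` and `results/` share the pattern `<directory_name_convention>/<model_name>/<timestamp>_<user>_<exp_id>`. A minimal sketch of how such a path could be composed (the helper name and the example values are assumptions for illustration, not prescribed by AMBS):

```python
from pathlib import Path


def experiment_dir(base_dir: str, stage: str, dir_name: str, model_name: str,
                   timestamp: str, user: str, exp_id: str) -> Path:
    """Compose <base_dir>/<stage>/<dir_name>/<model_name>/<timestamp>_<user>_<exp_id>,
    where stage is e.g. 'models' or 'results'."""
    return Path(base_dir) / stage / dir_name / model_name / f"{timestamp}_{user}_{exp_id}"
```

For example, `experiment_dir("/p/output", "models", "Y2007-2019M01to12-92x56-38.40N0.00E-2t_tcc_t_850", "convLSTM", "2022-01-01T12:00", "alice", "exp1")` yields the directory holding the checkpoints of that training run.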
#### Overview of placeholders in the output directory tree
| Placeholder | Value |
|--- |--- |
| `<YYYY>` | four-digit year, e.g. 2007, 2008 etc.|
| `<MM>` | two-digit month, e.g. 01, 02, ..., 12|
| `<DD>` | two-digit day, e.g. 01, 02, ..., 31|
| `<HH>` | two-digit hour, e.g. 00, 01, ..., 23|
| `<X>` | index of the sequence (TFRecords)|
| `<directory_name_convention>` | name indicating the data period, the target domain and the selected variables|
| `<model_name>` | convLSTM, savp or weatherBench|
| `<timestamp>` | time stamp of the experiment (from the runscript generator)|
| `<user>` | the user name on the operating system|
| `<exp_id>` | experiment ID (customized by the user)|
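The date placeholders also appear in the result file names, e.g. `vfp_date_<YYYYMMDDHH>_*.nc`. A small, hypothetical helper to recover the date components from such a file name (not part of the AMBS code base):

```python
import re


def parse_vfp_date(fname: str):
    """Extract (<YYYY>, <MM>, <DD>, <HH>) from a results file name
    matching the pattern 'vfp_date_<YYYYMMDDHH>_*.nc'."""
    match = re.match(r"vfp_date_(\d{4})(\d{2})(\d{2})(\d{2})_.*\.nc$", fname)
    if match is None:
        raise ValueError(f"Not a valid vfp results file name: {fname}")
    return match.groups()
```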
#### Directory name convention
The components of the directory name convention
`Y<YYYY>-<YYYY>M<MM>to<MM>-<nx>x<ny>-<nn.nn>N<ee.ee>E-<var1>_<var2>_<var3>` have the following meaning:
* `Y<YYYY>-<YYYY>M<MM>to<MM>`: data period defined by years and months
* `<nx>x<ny>`: the size of the target domain, e.g. 92x56 means 92 grid points in longitude and 56 grid points in latitude direction
* `<nn.nn>N<ee.ee>E`: the geolocation of the south-west corner of the target domain, e.g. 38.40N0.00E for the largest target domain
* `<var1>_<var2>_<var3>`: short names of the selected meteorological variables (channels)
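As a worked example, the directory name can be assembled from its components. The sketch below (function name, argument layout and the variable short names are illustrative assumptions) reproduces e.g. `Y2007-2019M01to12-92x56-38.40N0.00E-2t_tcc_t_850`:

```python
def dir_name_convention(year_start: int, year_end: int, month_start: int,
                        month_end: int, nx: int, ny: int, lat_sw: float,
                        lon_sw: float, variables: list) -> str:
    """Compose Y<YYYY>-<YYYY>M<MM>to<MM>-<nx>x<ny>-<nn.nn>N<ee.ee>E-<var1>_<var2>_...

    lat_sw/lon_sw denote the south-west corner of the target domain.
    """
    period = f"Y{year_start}-{year_end}M{month_start:02d}to{month_end:02d}"
    domain = f"{nx}x{ny}-{lat_sw:.2f}N{lon_sw:.2f}E"
    return f"{period}-{domain}-" + "_".join(variables)
```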