* [Climatological mean data](#climatological-mean-data)
* [Other systems](#other-systems)
* [Run the workflow](#run-the-workflow)
* [Preparation with NVIDIA's TF1.15 singularity containers](#preparation-with-nvidia-s-tf115-singularity-containers)
* [Create specific runscripts](#create-specific-runscripts)
...
...
The **A**tmospheric **M**achine learning **B**enchmarking **S**ystem (AMBS) aims to provide state-of-the-art video prediction methods applied to the meteorological domain. In the scope of the current application, the focus lies on the hourly evolution of the 2m temperature over a user-defined region.
Different deep learning video prediction architectures such as convLSTM and SAVP are trained on ERA5 reanalysis data to perform a prediction for 12 hours based on the previous 12 hours. In addition to the 2m temperature (2t) itself, other variables can be fed to the video frame prediction models to enhance their capability to learn the complex physical processes driving the diurnal cycle of temperature. Currently, the recommended additional meteorological variables are the 850 hPa temperature (t850) and the total cloud cover (tcc), as described in our GMD preprint paper.
## Prepare your dataset
#### Access ERA5 dataset (~TB)
The experiment described in the GMD paper relies on the rather large ERA5 dataset comprising 13 years of data.
- For users of the JSC HPC system: you can access the data under the following path: `/p/fastdata/slmet/slmet111/met_data/ecmwf/era5/grib`. If you encounter access permission issues, please contact Olaf Stein <o.stein@fz-juelich.de>.
- For users of other HPC systems: you can retrieve the ERA5 data from the [ECMWF MARS archive](https://confluence.ecmwf.int/pages/viewpage.action?pageId=123799065) by specifying a resolution of 0.3° in the retrieval script (keyword "GRID"); see the sketch below. The variable names and the corresponding paramIDs can be found in the [ERA5 documentation](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Howtoacknowledge,citeandrefertoERA5).
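For orientation, a purely illustrative MARS request is sketched below. It is not the project's official retrieval script: the dates, times, area, target file and the choice of a single variable are placeholders (only four analysis times are listed for brevity; list all 24 hours for hourly data), and the `mars` client is typically only available on ECMWF systems.

```bash
# Illustrative only: retrieve one month of ERA5 2m temperature (paramID 167)
# at 0.3 deg resolution from MARS. All values below are placeholders; adapt
# dates, times, area and variables (e.g. t850, tcc) to your experiment.
cat > era5_2t_request.mars << 'EOF'
retrieve,
  class=ea,
  expver=1,
  stream=oper,
  type=an,
  levtype=sfc,
  param=167.128,
  date=2010-01-01/to/2010-01-31,
  time=00/06/12/18,
  grid=0.3/0.3,
  area=75/-20/30/60,
  target="era5_2t_201001.grb"
EOF
mars era5_2t_request.mars
```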
We recommend users to store the data following the input structure described in the [folder structure section](#input-and-output-folder-structure-and-naming-convention).
#### Dry run with small samples (~15 GB)
In our application, we are dealing with a large dataset. Nevertheless, we also prepared rather small samples of ~15 GB (3 months of data with a few variables) to help users test the workflow quickly. The data can be downloaded through the following link [link!!]. For members of the *deepacf* project at JSC, the samples are also available under `/p/project/deepacf/deeprain/video_prediction_shared_folder/GMD_samples`.
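If you are a member of the *deepacf* project at JSC, one simple way to obtain the samples is to copy them from the shared folder mentioned above (the target directory below is just a placeholder):

```bash
# Copy the small sample dataset to a working directory of your choice
mkdir -p $HOME/ambs_data
cp -r /p/project/deepacf/deeprain/video_prediction_shared_folder/GMD_samples $HOME/ambs_data/
```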
#### Climatological mean data
The climatological mean, which is inferred at each grid point from the ERA5 reanalysis data between 1990 and 2019, is used in the postprocessing step. The data can be downloaded along with the small samples [link!!].
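In case you prefer to derive such a climatology yourself instead of downloading it, a rough sketch with CDO is given below; the file names are placeholders and the exact averaging used for the paper may differ from this simple multi-year monthly mean.

```bash
# Hypothetical sketch: multi-year monthly mean of 2m temperature with CDO
# (not the project's official procedure; input/output names are placeholders)
cdo ymonmean era5_t2m_1990-2019.nc climatology_t2m_1990-2019.nc
```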
## Prerequisites
- Linux or macOS
- Python 3.6
- CPU or NVIDIA GPU + CUDA CuDNN
- MPI
- Tensorflow 1.13.1 or CUDA-enabled NVIDIA TensorFlow 1.15 within a singularity container
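A quick way to check most of these prerequisites on your machine is sketched below (command names assume a typical Linux setup; adapt as needed):

```bash
# Sanity checks for the prerequisites listed above
python3 --version                                              # should report Python 3.6.x
mpirun --version                                               # any working MPI installation
python3 -c "import tensorflow as tf; print(tf.__version__)"   # 1.13.1 (or 1.15 inside the container)
nvidia-smi                                                     # only relevant for GPU training
```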
...
...
### Set-up virtual environment
AMBS is a tool for users who develop on HPC systems with a Slurm batch system, since large datasets and model architectures are involved.
However, as mentioned above, we also provide a small dataset and runscripts so that users can explore the tool on their personal computers.
For this purpose, we provide three approaches to set up your virtual environment, depending on the system you work on: the Jülich HPC systems, other HPC systems, or other computer systems. These are described below.
#### On Jülich's HPC systems
...
...
```bash
cd env_setup
source create_env.sh <env_name>
```
This also already sets up the runscript templates with regard to the five steps of the workflow for you under the folder `[...]/ambs/video_prediction_tools/JSC_scripts`.
By default, the runscript templates make use of the standard target base directory `/p/project/deepacf/deeprain/video_prediction_shared_folder/`. This directory will serve as your standard top-level directory to store the output of each step of the workflow (see details in the [folder structure section](#input-and-output-folder-structure-and-naming-convention)). In case you want to deviate from this, you may call `create_env.sh` to set up a new root directory as follows:
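A minimal sketch of such a call is given below; the option name for the base directory is an assumption and may differ in your version of `create_env.sh`, so check the script header if it is not recognised.

```bash
# Create the environment and redirect the workflow output to an alternative
# root directory (flag name assumed; <my_target_dir> is a placeholder)
source create_env.sh <env_name> -base_dir=<my_target_dir>
```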
**Note** that sufficient read-write permissions and a reasonable amount of storage space are mandatory for your alternative standard output directory.
#### On other HPC systems
Setting up the environment on other HPC systems differs from the JSC setup, since the available software stacks vary considerably and the required modules have to be loaded manually. If you are working on another HPC system, customise the runscript templates under the folder `nonJSC_HPC_scripts` and then place the customised templates into `HPC_scripts`, for instance as sketched below.
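For illustration, this step could look as follows (the concrete template file name is an assumption and may differ in your checkout):

```bash
# Copy a non-JSC template, adapt it to your machine, and place it in HPC_scripts/
cp nonJSC_HPC_scripts/train_model_era5_template.sh HPC_scripts/train_model_era5.sh
# then edit module loads, paths and batch directives in the copied file
```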
#### Other systems
AMBS also allows users to test the tool on other, non-HPC machines. You may enter the folder `../ambs/video_prediction_tools/env_setup` and execute:
```bash
source create_env_non_HPC.sh <env_name>
```
Then the virtual environment will be created under `../ambs/video_prediction_tools/virtual_envs` and the required packages (listed in `requirement_non_HPC.txt`) will be installed.
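Assuming a standard Python `venv` layout under that folder (the path follows from the text above, the layout is an assumption), the environment can then be activated as sketched below:

```bash
# Activate the virtual environment created by create_env_non_HPC.sh
source ../ambs/video_prediction_tools/virtual_envs/<env_name>/bin/activate
```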
### Run the workflow
Depending on the computing system you are working on, the workflow steps are invoked by dedicated runscripts, either from the directory `JSC_scripts/` (on the JSC HPC systems, see above), `HPC_scripts/` (other HPC systems), or `other_scripts/` (non-HPC machines).
To help users conduct different experiments with different configurations (e.g. input variables, hyperparameters), each runscript can be set up conveniently with the help of the Python script `generate_runscript.py`. Its usage as well as the workflow runscripts are described subsequently.
### Preparation with NVIDIA's TF1.15 singularity containers
...
...
A singularity container with a CUDA-enabled NVIDIA TensorFlow v1.15 was made available.
Then, you can either download the container image ([Link](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_21-09.html#rel_21-09)) and place it under the folder `HPC_scripts`, or, if you are part of the *deepacf* project, you can access the image through a symlink command as sketched below (still linking it into the `HPC_scripts` directory).
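As an illustration only, such a symlink could look like the sketch below; both the source path and the image file name are assumptions rather than official project paths.

```bash
# Link an already available container image into the runscript directory
# (both paths are placeholders; use the image you downloaded or were granted access to)
ln -s /path/to/shared/containers/nvidia_tensorflow_21.09-tf1-py3.sif HPC_scripts/tensorflow_21.09-tf1-py3.sif
```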
Note that if you are a user of the JSC HPC system, you need to log in to your [Judoor account](https://judoor.fz-juelich.de/login) and specifically request access to the restricted container software.
### Create specific runscripts
Specific runscripts for each workflow substep (see below) are generated conveniently by keyboard interaction.
The interactive Python script thereby has to be executed in an activated virtual environment with some additional modules! After prompting
```bash
python generate_runscript.py
...
...
You will be asked first which workflow runscript shall be generated. You can choose, for example:
- train
- postprocess
The subsequent keyboard interactions then allow the user to make individual settings for the workflow step at hand. By simply pressing Enter, the user may receive some guidance for the keyboard interaction.
Note that the runscript creation of later workflow substeps depends on the preceding steps (i.e. by checking the arguments from keyboard interaction).
Thus, they should be created sequentially instead of all at once at the beginning.
**Warning**: `generate_runscript.py` is currently intended for JSC users only; non-JSC users can skip this step. If you have different settings for various experiments, you can simply copy the template to a new file and customize your settings there.
### Running the workflow substeps
Having created the runscripts by keyboard interaction, the workflow substeps can be run sequentially. Depending on the machine you are working on, change to `JSC_scripts/` (on Juwels, Juwels Booster or HDF-ML), `HPC_scripts/`, or `other_scripts/`. There, the respective runscripts for all steps of the workflow are located; their order is as follows. Note that `[sbatch]` only has to precede the command on one of the HPC systems. Besides, data extraction and preprocessing step 1 are only mandatory when ERA5 data is subject to the application.
Note that we provide default configurations for each runscript, but users still need to manually configure some flags depending on the computing project and HPC system they work on. In particular, you must set the flag `#SBATCH --account=<your computing project name>` to your project name. For partitions (`#SBATCH --partition`), we refer users to [JUWELS/JUWELS Booster](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html#slurm-partitions) for further information. If you are using the HDF-ML system, you can simply use `batch` as the partition.
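For orientation, the relevant header lines of a runscript could look like this (values are placeholders to be adapted to your project and system):

```bash
# Slurm directives to adapt in every runscript
#SBATCH --account=<your computing project name>
#SBATCH --partition=batch    # e.g. "batch" on HDF-ML; see the JUWELS link above for other partitions
```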
Now it is time to run the AMBS workflow:
1. Data Extraction: This script retrieves the demanded variables for user-defined years from the complete ERA5 reanalysis grib files and stores the data in netCDF files.
```bash
[sbatch] ./data_extraction_era5.sh
```
2. Data Preprocessing: Crop the ERA5 data (multiple years possible) to the region of interest (preprocessing step 1). All years of data are processed once and the statistics are calculated and saved in the output folder. The TFRecord files which are fed to the model during training (next workflow step) are created afterwards (preprocessing step 2). Thus, two cases exist at this stage:
```bash
[sbatch] ./preprocess_data_era5_step1.sh
...
...
Note that the `exp_id` is generated automatically when running `generate_runscript.py`.
```bash
[sbatch] ./train_model_era5_<exp_id>.sh
```
4. Postprocess: Create some plots and calculate the evaluation metrics for the test dataset. Note that the `exp_id` is generated automatically when running `generate_runscript.py`.
```bash
[sbatch] ./visualize_postprocess_era5_<exp_id>.sh
...
...
### Compare and visualize the results
AMBS also provides a tool for comparing the results of different experiments and visualizing them as shown in the GMD paper, invoked through the `meta_postprocess` step. The runscript templates are also prepared in `HPC_scripts`, `JSC_scripts`, and `other_scripts`.
### Input and Output folder structure and naming convention
To successfully run the workflow and be able to track the results of each step, the input and output directories and the file naming convention should be constructed as described below:
We demonstrate an example of the input structure for the ERA5 dataset. In detail, the data is recorded hourly and stored in two grib files: the file with suffix `*_ml.grb` contains the multi-level variables, whereas `*_sf.grb` only includes the surface data.
```
├── ERA5 dataset
...
...
│ │ │ ├── *_ml.grb
│ │ │ ├── *_sf.grb
│ │ │ ├── ...
```
The root output directory should be set up when you run the workflow for the first time, as mentioned above. The output structure for each step of the workflow, along with the file name convention, is described below:
```
├── ExtractedData
│ ├── [Year]
...
...
```
***Details of file name convention:***
| Arguments | Value |
|--- |--- |
| [Year] | 2005, 2006, 2007, ..., 2019 |
...
...