From 857f198a684316c014f5fcf81844803bcbd1aceb Mon Sep 17 00:00:00 2001
From: Bing Gong <b.gong@fz-juelich.de>
Date: Sun, 13 Feb 2022 00:53:29 +0100
Subject: [PATCH] Update README.md

---
 README.md | 90 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 70 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 3c00dbc0..fad3606c 100644
--- a/README.md
+++ b/README.md
@@ -22,11 +22,9 @@
   * [Running the workflow substeps](#running-the-workflow-substeps)
   * [Compare and visualize the results](#compare-and-visualize-the-results)
   * [Input and Output folder structure and naming convention](#input-and-output-folder-structure-and-naming-convention)
-  * [Benchmarking architectures:](#benchmarking-architectures-)
-  * [Contributors and contact](#contributors-and-contact)
-  * [On-going work](#on-going-work)
-
-
+- [Benchmarking architectures:](#benchmarking-architectures-)
+- [Contributors and contact](#contributors-and-contact)
+- [On-going work](#on-going-work)
 
 
 
@@ -171,7 +169,6 @@ Now it is time to run the AMBS workflow
 2. Data Preprocessing: Crop the ERA 5 data (multiple years possible) to the region of interest (preprocessing step 1).
 The TFRecord files which are fed to the trained model (next workflow step) are created afterwards (preprocessing step 2). Thus, two substeps exist at this stage:
 
-    * **ERA 5 data**
     ```bash
     [sbatch] ./preprocess_data_era5_step1.sh
     [sbatch] ./preprocess_data_era5_step2.sh
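+    # Optional sanity check (illustrative path, cf. the output structure
+    # below): after step 2 the TFRecord files should appear under
+    # <output_dir>/PreprocessedData/*/tfrecords/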
@@ -179,7 +176,7 @@ The TFrecord-files which are fed to the trained model (next workflow step) are c
 
 3. Training: Training of one of the available models with the preprocessed data. 
 Note that the `exp_id` is generated automatically when running `generate_runscript.py`.
-    * **ERA 5 data**
+
     ```bash
     [sbatch] ./train_model_era5_<exp_id>.sh
     ```
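+
+    A quick, optional way to check on a finished run (a sketch, not part of the workflow scripts; it assumes that `train_losses.pkl` from the output structure below holds a pickled Python object):
+
+    ```bash
+    # peek at the recorded training losses (path illustrative, see the
+    # Models/ branch of the output structure below)
+    python -c "import pickle; obj = pickle.load(open('train_losses.pkl', 'rb')); print(type(obj))"
+    ```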
@@ -187,7 +184,7 @@ Note that the `exp_id` is generated automatically when running `generate_runscri
 4. Postprocess: Create some plots and calculate the evaluation metrics for the test dataset. <br>
 Note that the `exp_id` is generated automatically when running `generate_runscript.py`.
 
-    * **ERA 5 data**
+
     ```bash
     [sbatch] ./visualize_postprocess_era5_<exp_id>.sh
     ```
@@ -199,9 +196,26 @@ AMBS also provide the tool (called met_postprocess) for the users to compare dif
 
 
 ### Input and Output folder structure and naming convention
+To successfully run the workflow and to keep track of the results of each step, the input and output directories and the file naming convention should be structured as described below:
+
+Below is an example of the input structure for the ERA5 dataset. The data is recorded hourly and stored in two GRIB files per month: the file with the suffix `*_ml.grb` contains the variables on multiple model levels, whereas `*_sf.grb` only includes the surface data.
+
+```
+├── ERA5 dataset
+│   ├── [Year]
+│   │   ├── [Month]
+│   │   │   ├── *_ml.grb 
+│   │   │   ├── *_sf.grb 
+│   │   │   ├── ...
+│   │   ├── [Month]
+│   │   │   ├── *_ml.grb 
+│   │   │   ├── *_sf.grb 
+│   │   │   ├── ...
 
-The details can be found [name_convention](docs/structure_name_convention.md)
 
+```
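+
+If the ecCodes command-line tools are installed, the content of these GRIB files can be inspected directly. A minimal sketch (the year/month path components are placeholders):
+
+```bash
+# list the records of the surface and the multi-level files of one month
+grib_ls <Year>/<Month>/*_sf.grb
+grib_ls <Year>/<Month>/*_ml.grb
+```
+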
+The root output directory should be set up when you run the workflow for the first time, as mentioned above.
+The output structure for each step of the workflow, along with the file naming convention, is described below:
 ```
 ├── ExtractedData
 │   ├── [Year]
@@ -210,32 +224,69 @@ The details can be found [name_convention](docs/structure_name_convention.md)
 ├── PreprocessedData
 │   ├── [Data_name_convention]
 │   │   ├── pickle
-│   │   │   ├── train
-│   │   │   ├── val
-│   │   │   ├── test
+│   │   │   ├── X_<Month>.pkl
+│   │   │   ├── T_<Month>.pkl
+│   │   │   ├── stat_<Month>.pkl
 │   │   ├── tfrecords
-│   │   │   ├── train
-│   │   │   ├── val
-│   │   │   ├── test
+│   │   │   ├── sequence_Y_<Year>_M_<Month>.tfrecords
+│   │   ├── metadata.json
 ├── Models
 │   ├── [Data_name_convention]
 │   │   ├── [model_name]
-│   │   ├── [model_name]
+│   │   │   ├── <timestamp>_<user>_<exp_id>
+│   │   │   │   ├── checkpoint_<iteration>
+│   │   │   │   │   ├── model_*
+│   │   │   │   ├── timing_per_iteration_time.pkl
+│   │   │   │   ├── timing_total_time.pkl
+│   │   │   │   ├── timing_training_time.pkl
+│   │   │   │   ├── train_losses.pkl
+│   │   │   │   ├── val_losses.pkl
+│   │   │   │   ├── *.json
 ├── Results
 │   ├── [Data_name_convention]
 │   │   ├── [training_mode]
 │   │   │   ├── [source_data_name_convention]
 │   │   │   │   ├── [model_name]
+│   │   │   │   │   ├── *.nc
+├── meta_postprocess
+│   ├── [experiment ID]
 
 ```
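+
+For instance, the evaluation metrics produced by postprocessing end up as netCDF files, which can be examined with the standard netCDF utilities. A sketch (replace the bracketed placeholders, explained in the table below, with real values):
+
+```bash
+# dump the header of one result file (ncdump ships with the netCDF tools)
+ncdump -h Results/[Data_name_convention]/[training_mode]/[source_data_name_convention]/[model_name]/<file>.nc
+```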
 
-### Benchmarking architectures:
+
+| Argument | Value |
+|---|---|
+| [Year] | 2005, 2006, ..., 2019 |
+| [Month] | 01, 02, ..., 12 |
+| [Data_name_convention] | Y[yyyy]to[yyyy]M[mm]to[mm]-[nx]_[ny]-[nn.nn]N[ee.ee]E-[var1]_[var2]_[var3] |
+| [model_name] | convLSTM, savp, ... |
+
+***Data name convention***
+
+`Y[yyyy]to[yyyy]M[mm]to[mm]-[nx]_[ny]-[nn.nn]N[ee.ee]E-[var1]_[var2]_[var3]`
+ - Y[yyyy]to[yyyy]M[mm]to[mm]: the years and months covered by the dataset
+ - [nx]_[ny]: the size of the images, e.g. 64_64 means 64x64 pixels
+ - [nn.nn]N[ee.ee]E: the geolocation of the selected region with two decimal places, e.g. 0.00N11.50E
+ - [var1]_[var2]_[var3]: the abbreviations of the selected variables
+
+
+| Examples | Name abbreviation |
+|---|---|
+| all data from March to June of the years 2005-2015 | Y2005to2015M03to06 |
+| data from February to May of the years 2005-2008 + data from March to June of the year 2015 | Y2005to2008M02to05_Y2015M03to06 |
+| data from February to May and from October to December of 2005 | Y2005M02to05_Y2005M10to12 |
+| 'operational' database: whole year 2016 | Y2016M01to12 |
+| add the whole year 2017 to the operational database | Y2016to2017M01to12 |
+
+Note: Y2016to2017M01to12 = Y2016M01to12_Y2017M01to12
+
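+To make the convention concrete, here is a hypothetical example of assembling such a name in the shell (the variable abbreviations are placeholders, not real variable names):
+
+```bash
+# data from 2007-2010, January to December, 64x64 pixels,
+# region at 38.40N 0.00E, with three placeholder variables
+years="Y2007to2010"; months="M01to12"
+size="64_64"; location="38.40N0.00E"
+variables="var1_var2_var3"
+echo "${years}${months}-${size}-${location}-${variables}"
+# -> Y2007to2010M01to12-64_64-38.40N0.00E-var1_var2_var3
+```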
+
+## Benchmarking architectures:
 
 - convLSTM: [paper](https://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf), [code](https://github.com/loliverhennigh/Convolutional-LSTM-in-Tensorflow)
 - Stochastic Adversarial Video Prediction (SAVP): [paper](https://arxiv.org/pdf/1804.01523.pdf), [code](https://github.com/alexlee-gk/video_prediction)
 - Variational Autoencoder: [paper](https://arxiv.org/pdf/1312.6114.pdf)
 
-### Contributors and contact
+## Contributors and contact
 
 The project is currently developed by Bing Gong, Michael Langguth, Amirpasha Mozafarri, and Yan Ji. 
 
@@ -246,11 +297,10 @@ The project is currently developed by Bing Gong, Michael Langguth, Amirpasha Moz
 
 Former code developers are Scarlet Stadtler and Severin Hussmann.
 
-### On-going work
+## On-going work
 
 - Port to a PyTorch version
 - Parallel training of the neural networks
 - Integrate precipitation data and the new architecture used in our submitted CVPR paper
 - Integrate ML benchmark datasets such as Moving MNIST
 
-
-- 
GitLab