MachineLearningTools
This project contains the source code to rerun "IntelliO3-ts v1.0: A neural network approach to predict near-surface ozone concentrations in Germany" by F. Kleinert, L. H. Leufen and M. G. Schultz (2020, submitted to GMD). Moreover, the source code includes some functionality which is not used in the study named above.
Installation
We assume that you have downloaded or cloned the project from GitLab or b2share. In the latter case, you can skip the first remark regarding the `data_path`. Instructions on how to rerun the specific version are given below.
- Install proj and geos on your machine using the console, e.g. on openSUSE Leap:
zypper install proj
- A C++ compiler is required for the cartopy installation.
- graphviz is required to plot the model architecture
- Make sure that CUDA 10.0 is installed if you want to use Nvidia GPUs (compatible with TensorFlow 1.13.1).
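The prerequisites listed above can be checked before the Python packages are installed. The following is a minimal sketch, not part of the project: the helper `find_missing` is hypothetical, and the executable names (`proj`, `geos-config`, `g++`, `dot`) are assumptions that may differ between distributions. A found executable also does not guarantee that the matching development headers are installed.

```python
import shutil

# Assumed executable names; adjust them to your distribution.
PREREQUISITES = {
    "proj": "proj",          # command line tool shipped with PROJ
    "geos": "geos-config",   # configuration helper shipped with GEOS
    "C++ compiler": "g++",   # any C++ compiler works, g++ is only an example
    "graphviz": "dot",       # layout program shipped with graphviz
}


def find_missing(prerequisites, which=shutil.which):
    """Return the names of all prerequisites whose executable is not on PATH."""
    return [name for name, exe in prerequisites.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing(PREREQUISITES)
    print("missing:", ", ".join(missing) if missing else "nothing")
```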
Create a virtual environment by executing `python3.6 -m venv venv` and make sure that it is activated (`source venv/bin/activate`). Afterwards, install the requirements into the venv, depending on whether a GPU is available on your system:
- CPU version:
pip install -r requirements.txt
- GPU version:
pip install -r requirements_gpu.txt
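The choice between the two requirements files can be sketched in a few lines. This is an illustration only: both helper functions are hypothetical, and treating a visible `nvidia-smi` executable as a sign of a usable Nvidia GPU is an assumption that may not hold on every system.

```python
import shutil


def requirements_file(gpu_available):
    """Map the GPU flag to one of the two requirements files named above."""
    return "requirements_gpu.txt" if gpu_available else "requirements.txt"


def gpu_probably_available(which=shutil.which):
    # Assumption: a visible `nvidia-smi` executable indicates an Nvidia driver.
    return which("nvidia-smi") is not None


if __name__ == "__main__":
    print("pip install -r", requirements_file(gpu_probably_available()))
```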
Remarks on the first setup
- The source code does not include any data to process. Instead, it checks whether the data are available on your local machine and downloads any that are missing. We did not implement a default `data_path` because we want you to choose where exactly the data should be stored. Consequently, you have to pass a custom data path to `ExperimentSetup` in `run.py` (see example below). If all required data are already available locally, the program does not download any new data.
- Please note that cartopy may cause errors at run time. If cartopy raises an error, you can try the following (in the activated venv):
pip uninstall shapely
pip uninstall cartopy
pip install --upgrade numpy
pip install --no-binary shapely shapely
pip install cartopy
Cartopy is needed only to create one plot showing the station locations and does not affect the neural network itself. If the procedure above does not solve the problem, you can force the workflow to ignore cartopy by passing the first two characters of your hostname (`echo $HOSTNAME`) as a list containing one string to the keyword argument `hpc_hosts` in `run.py`. The example below assumes that the output of `echo $HOSTNAME` is "your_hostname". Please also consult the installation instructions of the cartopy package itself.
Example of all remarks given above:
import [...]

[...]

def main(parser_args):
    [...]
    with RunEnvironment():
        ExperimentSetup(parser_args,
                        data_path="<your>/<custom>/<path>",  # <- Remark 1
                        hpc_hosts=["yo"],  # <- Remark 2
                        [...]
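The value passed to `hpc_hosts` in the example above can also be derived programmatically. The following sketch uses a hypothetical helper, `hpc_hosts_from_hostname`, and assumes that `socket.gethostname()` returns the same name as `echo $HOSTNAME`, which is usually but not always the case.

```python
import socket


def hpc_hosts_from_hostname(hostname=None):
    """Return the first two characters of the hostname as a one-element list."""
    if hostname is None:
        # Assumption: this matches the output of `echo $HOSTNAME`.
        hostname = socket.gethostname()
    return [hostname[:2]]


print(hpc_hosts_from_hostname("your_hostname"))  # prints ['yo']
```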
HPC - JUWELS and HDFML setup
The following instructions guide you through the installation on JUWELS and HDFML.
- Clone the repo to the HPC system (we recommend placing it in `/p/project/<project name>`).
- Set up the venv by executing `source setupHPC.sh`. This script loads all pre-installed modules and creates a venv for all other packages. Furthermore, it creates slurm/batch scripts to execute code on compute nodes. You have to enter the HPC project's budget name (`--account` flag).
- The default external data path on JUWELS and HDFML is set to `/p/project/deepacf/intelliaq/<user>/DATA/toar_<sampling>`. To choose a different location, open `run.py` and add the following keyword argument to `ExperimentSetup`: `data_path=<your>/<custom>/<path>`.
- Execute `python run.py` on a login node to download example data. The program throws an `OSError` after downloading.
- Execute either `sbatch run_juwels_develgpus.bash` or `sbatch run_hdfml_batch.bash` to verify that the setup went well.
- Currently cartopy is not working on our HPC system, therefore `PlotStations` does not create any output.
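To make the placeholders in the default data path concrete, the sketch below fills in `<user>` and `<sampling>`. The helper `default_data_path` is hypothetical and not part of the project; the user name `jdoe` and the sampling value `daily` are example values only.

```python
from pathlib import PurePosixPath


def default_data_path(user, sampling):
    """Fill the <user> and <sampling> placeholders of the default HPC data path."""
    base = PurePosixPath("/p/project/deepacf/intelliaq")
    return str(base / user / "DATA" / f"toar_{sampling}")


# "jdoe" and "daily" are hypothetical example values.
print(default_data_path("jdoe", "daily"))
```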
HPC JUWELS and HDFML remarks
Please note that the HPC setup is customised for JUWELS and HDFML. When using another HPC system, you can use the HPC setup files as a skeleton and customise them to your needs.
Note: The method `PartitionCheck` currently only checks if the hostname starts with `ju` or `hdfmll`. Therefore, it might be necessary to adapt the `if` statement in `src/run_modules/PartitionCheck._run`.
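The kind of hostname check described in the note can be sketched as follows. This is not the code of `PartitionCheck`; the function `is_known_hpc_host` and the extra prefix `mycluster` are hypothetical and only illustrate how such a prefix test could be extended for another system.

```python
import socket

# Prefixes named in the note above.
KNOWN_PREFIXES = ("ju", "hdfmll")


def is_known_hpc_host(hostname, prefixes=KNOWN_PREFIXES):
    """Return True if the hostname starts with one of the given prefixes."""
    return hostname.startswith(tuple(prefixes))


if __name__ == "__main__":
    # "mycluster" is a hypothetical prefix for another HPC system.
    extended = KNOWN_PREFIXES + ("mycluster",)
    print(is_known_hpc_host(socket.gethostname(), extended))
```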
IntelliO3-ts
How to run
After you have followed the instructions above, you can rerun the trained model by executing (in the activated venv) `python run.py --experiment_date=IntelliO3-ts`. If you want to train the model from scratch, you have to modify `run.py`. To be more precise, you have to set `create_new_model=True`. Please note that the evaluation of bootstrapped input variables takes some time. You can skip this evaluation by using `evaluate_bootstraps=False`.
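The command line flag and the two settings mentioned above can be illustrated with a small, self-contained sketch. It does not reproduce the project's `run.py`: the functions `parse_args` and `training_options` are hypothetical, and passing the options on to `ExperimentSetup` as keyword arguments is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line flag used in the call above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--experiment_date", type=str, default=None,
                        help="name of the experiment, e.g. IntelliO3-ts")
    return parser.parse_args(argv)


def training_options(train_from_scratch, skip_bootstraps):
    """Collect the two settings described above as keyword arguments."""
    return {
        "create_new_model": train_from_scratch,
        "evaluate_bootstraps": not skip_bootstraps,
    }


args = parse_args(["--experiment_date=IntelliO3-ts"])
options = training_options(train_from_scratch=True, skip_bootstraps=True)
# Assumption: in run.py these would be passed on as keyword arguments,
# e.g. ExperimentSetup(args, **options).
print(args.experiment_date, options)
```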