    Getting started with MLAir

    Install MLAir

    MLAir is based on several Python frameworks. To work properly, you have to install all packages from the requirements.txt file. In addition, the geographical plotting part requires geo packages built for your operating system. Unfortunately, the names of these packages may differ between systems. In these instructions, we try to address users of different operating systems, namely openSUSE Leap, Ubuntu and macOS. If the installation still does not work, we recommend skipping the geographical plot. We have put together a small workaround :ref:`here<Workaround to skip geographical plot>`. For special instructions to install MLAir on the Jülich HPC systems, see section :ref:`Installation on Jülich HPC systems`.

    Pre-requirements

    • Make sure that Python 3.6 is installed.
    • (geo) A C++ compiler is required to install cartopy.
    • (geo) Install proj and GEOS on your machine using the console.
    • Install the Python 3.6 development libraries.
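
    The pre-requirements above can be checked from Python before installing anything else. The following is a minimal sketch and not part of MLAir: the helper names python_matches and module_available are hypothetical, and the check only reports whether the cartopy Python module can be found, not whether proj and GEOS are set up correctly.

```python
import importlib.util
import sys

# Python version named in the pre-requirements above.
REQUIRED_PYTHON = (3, 6)


def python_matches(version_info=None, required=REQUIRED_PYTHON):
    """Return True if major.minor of the interpreter equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Hypothetical helpers, used here only to report the state of the environment.
print("python version ok:", python_matches())
print("cartopy importable:", module_available("cartopy"))
```

    If cartopy is reported as not importable, the geo packages are probably missing, and the workaround to skip the geographical plot may be the easier route.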

    Installation of MLAir

    • Install all requirements from requirements.txt preferably in a virtual environment
    • Either clone MLAir from the gitlab repository
    • or download the distribution file (current version) and install it via pip install <dist_file>.whl. In this case, you can simply import MLAir in any python script inside your virtual environment using import mlair.
    • (tf) Currently, TensorFlow-1.13 is listed in the requirements. We have already tested TensorFlow-1.15 and could not find any compatibility errors. Please note that tf-1.13 and 1.15 each come in two distinct branches: the default branch for CPU support and the "-gpu" branch for GPU support. If the GPU version is installed, MLAir will make use of the GPU device.

    Special Instructions for Installation

    openSUSE Leap 15.1

    • c++ compiler

    sudo zypper install gcc-c++

    • geo packages

    sudo zypper install proj geos-devel

    • depending on the pre-installed packages it could be required to install further packages

    sudo zypper install libproj-devel binutils gdal-devel graphviz

    • python develop libraries

    sudo zypper install python3-devel

    Ubuntu 20.04.1

    • c++ compiler

    sudo apt install build-essential

    • geo packages

    sudo apt install proj-bin libgeos-dev libproj-dev

    • depending on the pre-installed packages it could be required to install further packages

    sudo apt install graphviz libgeos++-dev

    • python develop libraries

    sudo apt install python3.6-dev

    macOS & windows

    The installation on macOS is not tested yet. The following commands are possibly needed:

    brew install geos

    sudo port install graphviz

    The installation on Windows is not tested yet.

    Installation on Jülich HPC systems

    Please note that the HPC setup is customised for JUWELS and HDFML. When using another HPC system, you can use the HPC setup files as a skeleton and customise them to your needs.

    The following instructions guide you through the installation on JUWELS and HDFML.

    • Clone the repo to the HPC system (we recommend placing it in /p/projects/<project name>).
    • Set up the venv by executing source setupHPC.sh. This script loads all pre-installed modules and creates a venv for all other packages. Furthermore, it creates slurm/batch scripts to execute code on compute nodes. You have to enter the HPC project's budget name (--account flag).
    • The default external data path on JUWELS and HDFML is set to /p/project/deepacf/intelliaq/<user>/DATA/toar_<sampling>.
    • To choose a different location, open run.py and add the following keyword argument to ExperimentSetup: data_path=<your>/<custom>/<path>.
    • Execute python run.py on a login node to download example data. The program will throw an OSError after downloading.
    • Execute either sbatch run_juwels_develgpus.bash or sbatch run_hdfml_batch.bash to verify that the setup went well.
    • Currently, cartopy is not working on our HPC system. Therefore, PlotStations does not create any output.

    Note: The method PartitionCheck currently only checks if the hostname starts with ju or hdfmll. Therefore, it might be necessary to adapt the if statement in PartitionCheck._run.
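
    The kind of hostname test described in the note can be sketched as follows. This is not the code of PartitionCheck._run but a stand-alone illustration. The function name is_known_hpc_host and the constant KNOWN_PREFIXES are hypothetical; only the two prefixes are taken from the note.

```python
import socket

# Prefixes named in the note above; extend this tuple for other systems.
KNOWN_PREFIXES = ("ju", "hdfmll")


def is_known_hpc_host(hostname=None, prefixes=KNOWN_PREFIXES):
    """Return True if `hostname` starts with one of `prefixes`.

    Hypothetical helper: falls back to the local hostname if none is given.
    """
    if hostname is None:
        hostname = socket.gethostname()
    return hostname.startswith(tuple(prefixes))


print("known HPC host:", is_known_hpc_host())
```

    On a different HPC system, a check like this would only succeed after adding that system's hostname prefix, which mirrors the adjustment of the if statement mentioned in the note.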

    Workaround to skip geographical plot

    If it is not possible to install all required geo libraries on your system, a good compromise is to skip the creation of the geographical plot. To do so, remove the plot from the plot_list manually. We recommend using this code snippet as a starting point.

    import mlair
    from mlair.helpers import remove_items
    from mlair.configuration.defaults import DEFAULT_PLOT_LIST
    
    # run MLAir with the default plots except the geographical station map
    mlair.run(plot_list=remove_items(DEFAULT_PLOT_LIST, "PlotStationMap"))
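
    To make clear what the snippet above does to the plot list, here is a minimal stand-in written in plain Python. It is a sketch of the filtering idea only, not MLAir's remove_items implementation. The function drop_items and the entries "PlotA" and "PlotB" are hypothetical placeholders.

```python
def drop_items(items, to_remove):
    """Return a new list without the entries named in `to_remove`.

    Hypothetical stand-in for illustration; accepts one name or a list of names.
    """
    if isinstance(to_remove, str):
        to_remove = [to_remove]
    return [item for item in items if item not in to_remove]


# Placeholder plot list; in MLAir the real default is DEFAULT_PLOT_LIST.
example_plot_list = ["PlotA", "PlotStationMap", "PlotB"]
print(drop_items(example_plot_list, "PlotStationMap"))
```

    The original list is left unchanged, so the default plot list can still be reused elsewhere.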

    How to start with MLAir

    In this section, we show three examples of how to work with MLAir. Note that for these examples MLAir was installed using the distribution file. If you are using the git clone, you need to adjust the import path unless the code is executed directly inside the source directory of MLAir.

    Example 1

    We start MLAir in a dry run without any modification. Just import mlair and run it.

    import mlair
    
    # just give it a dry run without any modification
    mlair.run()

    The logging output will show you a lot of information. Additional information (including debug messages) is collected inside the experiment path in the logging folder.

    INFO: DefaultWorkflow started
    INFO: ExperimentSetup started
    INFO: Experiment path is: /home/<usr>/mlair/testrun_network
    ...
    INFO: load data for DEBW001 from JOIN
    ...
    INFO: Training started
    ...
    INFO: DefaultWorkflow finished after 00:00:12 (hh:mm:ss)
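
    If you want to compare runs, the log lines shown above can be evaluated with a few lines of Python. The following sketch assumes that the log format is exactly as in the excerpt. The function names runtime_seconds and loaded_stations are hypothetical and not part of MLAir.

```python
import re

# Patterns derived from the log excerpt above (assumed to be stable).
FINISH_PATTERN = re.compile(r"finished after (\d+):(\d{2}):(\d{2}) \(hh:mm:ss\)")
STATION_PATTERN = re.compile(r"load data for (\S+) from")


def runtime_seconds(log_text):
    """Return the reported workflow runtime in seconds, or None if absent."""
    match = FINISH_PATTERN.search(log_text)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def loaded_stations(log_text):
    """Return the station ids mentioned in 'load data for ...' lines."""
    return STATION_PATTERN.findall(log_text)


sample_log = """INFO: DefaultWorkflow started
INFO: load data for DEBW001 from JOIN
INFO: Training started
INFO: DefaultWorkflow finished after 00:00:12 (hh:mm:ss)"""

print(loaded_stations(sample_log), runtime_seconds(sample_log))
```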

    Example 2

    Now we update the stations and customise the window history size parameter.

    import mlair
    
    # our new stations to use
    stations = ['DEBW030', 'DEBW037', 'DEBW031', 'DEBW015', 'DEBW107']
    
    # expanded temporal context to 14 (days, because of default sampling="daily")
    window_history_size = 14
    
    # restart the experiment with little customisation
    mlair.run(stations=stations,
              window_history_size=window_history_size)

    The output looks similar, but we can see that the new stations are loaded.

    INFO: DefaultWorkflow started
    INFO: ExperimentSetup started
    ...
    INFO: load data for DEBW030 from JOIN
    INFO: load data for DEBW037 from JOIN
    ...
    INFO: Training started
    ...
    INFO: DefaultWorkflow finished after 00:00:24 (hh:mm:ss)
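
    To give an intuition for the window_history_size parameter used in this example, the following sketch cuts a daily series into input windows. It is an illustration only and not MLAir's data handling. In particular, it is an assumption that each window holds the current time step plus the window_history_size preceding steps. The function history_windows and the data are made up.

```python
def history_windows(series, window_history_size):
    """Return one input window per time step that has a full history.

    Assumption: a window holds the current value plus the
    `window_history_size` preceding values.
    """
    length = window_history_size + 1
    return [series[end - length:end] for end in range(length, len(series) + 1)]


daily_values = list(range(20))  # 20 days of made-up data
windows = history_windows(daily_values, window_history_size=14)

# number of windows and number of time steps per window
print(len(windows), len(windows[0]))
```

    A larger window gives the model more temporal context per sample but leaves fewer samples for a series of the same length.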

    Example 3

    Let's just apply our trained model to new data. To do so, we keep the window history size parameter but change the stations. In the run method, we need to disable the trainable and create_new_model parameters. MLAir will then use the model we have trained before. Note that this only works if the experiment path has not changed or a suitable trained model is placed inside the experiment path.

    import mlair
    
    # our new stations to use
    stations = ['DEBY002', 'DEBY079']
    
    # same setting for window_history_size
    window_history_size = 14
    
    # run experiment without training
    mlair.run(stations=stations,
              window_history_size=window_history_size,
              create_new_model=False,
              trainable=False)

    We can see from the terminal that no training was performed. Analysis is now made on the new stations.

    INFO: DefaultWorkflow started
    ...
    INFO: No training has started, because trainable parameter was false.
    ...
    INFO: DefaultWorkflow finished after 00:00:06 (hh:mm:ss)