

MLAir - Machine Learning on Air Data

MLAir (Machine Learning on Air data) is an environment that simplifies and accelerates the creation of new machine learning (ML) models for the analysis and forecasting of meteorological and air quality time series. You can find the docs here.

Installation

MLAir is based on several Python frameworks. To work properly, all packages from the requirements.txt file must be installed. Additionally, the geographical plotting part requires geo packages built for your operating system. Unfortunately, the names of these packages may differ between systems. In this instruction, we address users of different operating systems, namely openSUSE Leap, Ubuntu and macOS. If the installation still does not work, we recommend skipping the geographical plot. We have put together a small workaround here. For special instructions on installing MLAir on the Juelich HPC systems, see here.

  • Make sure to have Python 3.6 installed.
  • (geo) A C++ compiler is required to install the cartopy package.
  • (geo) Install proj and GEOS on your machine using the console.
  • Install the Python 3.6 development libraries.
  • Install all requirements from requirements.txt, preferably in a virtual environment. You can use pip install -r requirements.txt to install all requirements at once. Note that we recently updated the version of Cartopy, and there seems to be an ongoing issue when installing numpy and Cartopy at the same time. If you run into trouble, you can use cat requirements.txt | cut -f1 -d"#" | sed '/^\s*$/d' | xargs -L 1 pip install instead, which installs the packages one at a time.
  • Installation of MLAir:
    • Either clone MLAir from the gitlab repository and use it without installation (besides the requirements)
    • or download the distribution file (current version) and install it via pip install <dist_file>.whl. In this case, you can import MLAir in any Python script inside your virtual environment using import mlair.
  • (tf) Currently, TensorFlow 1.13 is listed in the requirements. We have already tested TensorFlow 1.15 and found no compatibility errors. Please note that TensorFlow 1.13 and 1.15 each come in two distinct packages: the default package for CPU support and the "-gpu" package for GPU support. If the GPU version is installed, MLAir will make use of the GPU device.
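The shell one-liner in the requirements bullet strips comments and blank lines from requirements.txt and then calls pip once per remaining line. A minimal Python sketch of the same idea follows, assuming it is run from the MLAir source directory inside the activated virtual environment. The helper names parse_requirements and install_one_by_one are hypothetical and not part of MLAir.

```python
import subprocess
import sys
from pathlib import Path


def parse_requirements(text):
    """Return requirement specifiers, mirroring the cut/sed part of the one-liner."""
    requirements = []
    for line in text.splitlines():
        spec = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if spec:  # skip lines that are empty after removing comments
            requirements.append(spec)
    return requirements


def install_one_by_one(requirements_file="requirements.txt"):
    """Install each requirement in a separate pip call (hypothetical helper)."""
    text = Path(requirements_file).read_text()
    for spec in parse_requirements(text):
        # use the pip of the running interpreter, i.e. the active virtual environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", spec])
```

Calling install_one_by_one() then behaves like the xargs -L 1 variant: a failure stops at the offending package instead of aborting one large resolver run.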

openSUSE Leap 15.1

  • c++ compiler

sudo zypper install gcc-c++

  • geo packages

sudo zypper install proj geos-devel

  • depending on the pre-installed packages, it may be necessary to install further packages

sudo zypper install libproj-devel binutils gdal-devel graphviz graphviz-gnome

  • python development libraries

sudo zypper install python3-devel

Ubuntu 20.04.1

  • c++ compiler

sudo apt install build-essential

  • geo packages

sudo apt install proj-bin libgeos-dev libproj-dev

  • depending on the pre-installed packages, it may be necessary to install further packages

sudo apt install graphviz libgeos++-dev

  • python development libraries

sudo apt install python3.6-dev
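Before running pip, it can help to check whether the compiler and the GEOS and PROJ libraries installed above are actually visible to Python. The sketch below is a rough check only, assuming the libraries are installed under their usual names (libgeos_c, libproj) in standard locations; it does not verify the header files that the -devel/-dev packages provide, and check_geo_prerequisites is a hypothetical helper, not part of MLAir.

```python
import ctypes.util
import shutil


def check_geo_prerequisites():
    """Report whether a C++ compiler and the GEOS/PROJ shared libraries can be found."""
    return {
        # g++ comes with gcc-c++ (openSUSE) or build-essential (Ubuntu)
        "c++ compiler": shutil.which("g++") is not None or shutil.which("c++") is not None,
        # libgeos_c is the C API of GEOS
        "geos": ctypes.util.find_library("geos_c") is not None,
        # libproj is the PROJ library
        "proj": ctypes.util.find_library("proj") is not None,
    }


for name, found in check_geo_prerequisites().items():
    print("{:<13} {}".format(name, "found" if found else "missing"))
```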

macOS & Windows

The installation on macOS has not been tested yet. The following commands (using Homebrew and MacPorts, respectively) may be needed:

brew install geos

sudo port install graphviz

The installation on Windows has not been tested yet.

How to start with MLAir

In this section, we show three examples of how to work with MLAir. Note that MLAir was installed from the distribution file for these examples. If you use a git clone instead, you need to adjust the import path unless you run the scripts directly inside the MLAir source directory. There is also a downloadable Jupyter Notebook in which you can run the following examples. Note that this notebook still requires an installation of MLAir.
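For the git clone case, one way to adjust the import path is to prepend the clone to sys.path before importing. This is a minimal sketch assuming the top-level directory of the clone contains the mlair package folder; the helper add_mlair_checkout_to_path and the path "~/path/to/mlair" are hypothetical placeholders.

```python
import sys
from pathlib import Path


def add_mlair_checkout_to_path(repo_dir):
    """Make `import mlair` resolve to a cloned repository (hypothetical helper).

    Assumes `repo_dir` is the top-level directory of the git clone that
    contains the `mlair` package folder.
    """
    repo_dir = str(Path(repo_dir).expanduser().resolve())
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)  # prepend so the clone wins over other installs
    return repo_dir


# adjust this placeholder to the location of your clone
add_mlair_checkout_to_path("~/path/to/mlair")
# import mlair  # now resolves to the clone, provided the assumption above holds
```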

Example 1

We start MLAir in a dry run without any modification. Just import mlair and run it.

import mlair

# just give it a dry run without any modification 
mlair.run()

The logging output will show you a lot of information. Additional information (including debug messages) is collected in the logging folder inside the experiment path.

INFO: DefaultWorkflow started
INFO: ExperimentSetup started
INFO: Experiment path is: /home/<usr>/mlair/testrun_network 
...
INFO: load data for DEBW107 from JOIN
INFO: load data for DEBY081 from JOIN
INFO: load data for DEBW013 from JOIN
INFO: load data for DEBW076 from JOIN
INFO: load data for DEBW087 from JOIN
...
INFO: Training started
...
INFO: DefaultWorkflow finished after 0:03:04 (hh:mm:ss)
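To inspect the debug messages after a run, you can open the most recently written file in the logging folder. The sketch below assumes the folder is named logging and sits directly inside the experiment path printed in the log output; the exact layout may differ, and latest_log_file is a hypothetical helper.

```python
from pathlib import Path


def latest_log_file(experiment_path, folder_name="logging"):
    """Return the newest file in <experiment_path>/<folder_name>, or None.

    The folder name "logging" is an assumption based on the text above.
    """
    log_dir = Path(experiment_path).expanduser() / folder_name
    if not log_dir.is_dir():
        return None
    files = [p for p in log_dir.iterdir() if p.is_file()]
    # pick the file that was written last, e.g. the log of the most recent run
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


log_file = latest_log_file("/home/<usr>/mlair/testrun_network")
if log_file is not None:
    print(log_file.read_text()[-2000:])  # tail of the log, including debug messages
```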

Example 2

Now we update the stations and customise the window history size parameter.

import mlair

# our new stations to use
stations = ['DEBW030', 'DEBW037', 'DEBW031', 'DEBW015', 'DEBW107']

# expand the temporal context to 14 (days, because of the default sampling="daily")
window_history_size = 14

# restart the experiment with little customisation
mlair.run(stations=stations, 
          window_history_size=window_history_size)

The output looks similar, but we can see that the new stations are loaded.

INFO: DefaultWorkflow started
INFO: ExperimentSetup started
...
INFO: load data for DEBW030 from JOIN 
INFO: load data for DEBW037 from JOIN 
INFO: load data for DEBW031 from JOIN 
INFO: load data for DEBW015 from JOIN 
...
INFO: Training started
...
INFO: DefaultWorkflow finished after 00:02:03 (hh:mm:ss)
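If you saved the console output or a log file, the loaded stations can also be extracted programmatically instead of by eye. This sketch only relies on the "load data for <station> from JOIN" lines shown above; loaded_stations is a hypothetical helper, and the excerpt is copied from the output above.

```python
import re

# matches lines like "INFO: load data for DEBW030 from JOIN"
LOAD_PATTERN = re.compile(r"load data for (\w+) from JOIN")


def loaded_stations(log_text):
    """Return station IDs in the order they appear in "load data" log lines."""
    return LOAD_PATTERN.findall(log_text)


log_excerpt = """
INFO: load data for DEBW030 from JOIN
INFO: load data for DEBW037 from JOIN
INFO: load data for DEBW031 from JOIN
INFO: load data for DEBW015 from JOIN
"""
print(loaded_stations(log_excerpt))  # ['DEBW030', 'DEBW037', 'DEBW031', 'DEBW015']
```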

Example 3

Let's apply our trained model to new data. We keep the window history size parameter but change the stations. In the run method, we need to disable the train_model and create_new_model parameters by setting them to False. MLAir will then use the model we have trained before. Note that this only works if the experiment path has not changed or a suitable trained model is placed inside the experiment path.

import mlair

# our new stations to use
stations = ['DEBY002', 'DEBY079']

# same setting for window_history_size
window_history_size = 14

# run experiment without training
mlair.run(stations=stations, 
          window_history_size=window_history_size, 
          create_new_model=False, 
          train_model=False)

We can see from the terminal that no training was performed. The analysis is now performed on the new stations.

INFO: DefaultWorkflow started
...
INFO: No training has started, because train_model parameter was false. 
...
INFO: DefaultWorkflow finished after 0:01:27 (hh:mm:ss)
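Example 3 only works if a trained model is available in the experiment path. A small guard that fails early otherwise could look like the sketch below. The file pattern *.h5 (Keras HDF5 files) and the recursive search are assumptions about how the trained model is stored, not documented MLAir behaviour, and run_kwargs_for_prediction is a hypothetical helper.

```python
from pathlib import Path


def run_kwargs_for_prediction(experiment_path, stations, window_history_size,
                              model_pattern="*.h5"):
    """Build keyword arguments for mlair.run() that reuse a trained model.

    Raises FileNotFoundError if no file matching `model_pattern` (an assumed
    naming scheme) exists anywhere below `experiment_path`.
    """
    root = Path(experiment_path).expanduser()
    models = sorted(root.rglob(model_pattern)) if root.is_dir() else []
    if not models:
        raise FileNotFoundError(
            "no trained model found below {}; run with training first".format(root))
    return {
        "stations": stations,
        "window_history_size": window_history_size,
        "create_new_model": False,  # keep the existing model
        "train_model": False,       # do not train again
    }


# usage (requires MLAir and the experiment path from the earlier examples):
# import mlair
# mlair.run(**run_kwargs_for_prediction("/home/<usr>/mlair/testrun_network",
#                                       ['DEBY002', 'DEBY079'], 14))
```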

Default Workflow

MLAir consists of so-called run_modules, which are executed in a distinct order called a workflow. MLAir provides a default_workflow, which runs the run modules ExperimentSetup, PreProcessing, ModelSetup, Training, and PostProcessing one after another.

Sketch of the default workflow.
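To illustrate the idea of a workflow as an ordered sequence of run modules, here is a toy sketch in plain Python. The class ToyWorkflow and its methods are hypothetical and deliberately simplified; they are not MLAir's actual classes. Only the stage names and their order are taken from the description above.

```python
class ToyWorkflow:
    """Run a fixed sequence of stages one after another (illustration only)."""

    def __init__(self):
        self._stages = []

    def add(self, name, func):
        """Append a stage; stages run in the order they were added."""
        self._stages.append((name, func))
        return self

    def run(self, **context):
        """Execute every stage in order, passing a shared context dict along."""
        executed = []
        for name, func in self._stages:
            func(context)  # each stage may read and extend the shared context
            executed.append(name)
        return executed


# stage names mirror the default workflow; the bodies are placeholders
default_order = ["ExperimentSetup", "PreProcessing", "ModelSetup", "Training", "PostProcessing"]
workflow = ToyWorkflow()
for stage_name in default_order:
    workflow.add(stage_name, lambda ctx, n=stage_name: ctx.setdefault("log", []).append(n))
print(workflow.run())  # ['ExperimentSetup', 'PreProcessing', 'ModelSetup', 'Training', 'PostProcessing']
```

Passing one shared context from stage to stage mirrors the idea that later run modules build on what earlier ones prepared.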