This is a collection of all relevant functions used for machine learning (ML) applications in the ESDE group.
# MLAir - Machine Learning on Air Data
MLAir (Machine Learning on Air data) is an environment that simplifies and accelerates the creation of new machine
learning (ML) models for the analysis and forecasting of meteorological and air quality time series.
# Installation
* Install __proj__ on your machine using the console, e.g. on openSUSE Leap with `zypper install proj`.
* A C++ compiler is required for the installation of the package __cartopy__.
* Install all requirements from `requirements.txt`, preferably in a virtual environment.
* Installation of MLAir:
    * Either clone MLAir from its repository in GitLab (link??) and use it without installation,
    * or download the distribution file (?? .whl) and install it via `pip install <??>`. In this case, you can simply
      import MLAir in any Python script inside your virtual environment using `import mlair`.
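
After installing from the distribution file, it can be handy to check that the package is actually visible from the active environment. The following is only a minimal sketch using the Python standard library; the helper names `is_installed` and `in_virtualenv` are made up for this illustration and are not part of MLAir.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # "mlair" is the import name mentioned in the installation notes above.
    print("inside virtual environment:", in_virtualenv())
    print("mlair importable:", is_installed("mlair"))
```

If `mlair importable` is reported as `False`, the wheel was probably installed into a different environment than the one currently activated.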
## Special instructions for installation on Jülich HPC systems
_Please note that the HPC setup is customised for JUWELS and HDFML. When using another HPC system, you can use the HPC
setup files as a skeleton and customise them to your needs._
The following instructions guide you through the installation on JUWELS and HDFML.
* Clone the repo to the HPC system (we recommend placing it in `/p/projects/<project name>`).
* Set up the venv by executing `source setupHPC.sh`. This script loads all pre-installed modules and creates a venv for
all other packages. Furthermore, it creates slurm/batch scripts to execute code on compute nodes. <br>
You have to enter the HPC project's budget name (--account flag).
...
...
* Execute either `sbatch run_juwels_develgpus.bash` or `sbatch run_hdfml_batch.bash` to verify that the setup went well.
* Currently, cartopy is not working on our HPC system; therefore, `PlotStations` does not create any output.
### HPC JUWELS and HDFML remarks
Note: The method `PartitionCheck` currently only checks if the hostname starts with `ju` or `hdfmll`.
Therefore, it might be necessary to adapt the `if` statement in `PartitionCheck._run`.
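
To illustrate the kind of check described in the note, here is a minimal sketch of a hostname prefix test. It is not the actual code of `PartitionCheck._run`; the function `is_supported_host` and the cluster name `mycluster` are hypothetical, and only the two prefixes `ju` and `hdfmll` are taken from the note above.

```python
import socket

# Prefixes named in the note above. Everything else in this block is a
# hypothetical illustration, not MLAir's actual PartitionCheck code.
KNOWN_PREFIXES = ("ju", "hdfmll")


def is_supported_host(hostname, prefixes=KNOWN_PREFIXES):
    """Return True if the hostname starts with one of the given prefixes."""
    return hostname.startswith(tuple(prefixes))


if __name__ == "__main__":
    host = socket.gethostname()
    print(f"{host}: supported = {is_supported_host(host)}")
    # Extending the prefixes for another (made-up) cluster name:
    extended = KNOWN_PREFIXES + ("mycluster",)
    print(is_supported_host("mycluster-login1", extended))
```

On a different HPC system, an adapted `if` statement would need to accept that system's hostname pattern in a similar way.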
...
...
* To use hourly data from ToarDB via the JOIN interface, a private token is required. Request your personal access token and
add it to `src/join_settings.py` in the hourly data section. Replace the `TOAR_SERVICE_URL` and the `Authorization`
value. To make sure that this **sensitive** data is not uploaded to the remote server, use the following command to
prevent git from tracking this file: `git update-index --assume-unchanged src/join_settings.py`
# Customise your experiment
...
...
scaling values instead of the calculation method. For method *centre*, std can still be None, but is required for the
*standardise* method. **Important**: Format of given values **must** match internal data format of DataPreparation
class: `xr.DataArray` with `dims=["variables"]` and one value for each variable.
## Inception Model
See a description [here](https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202)
or take a look at the papers [Going Deeper with Convolutions (Szegedy et al., 2014)](https://arxiv.org/abs/1409.4842)
and [Network In Network (Lin et al., 2014)](https://arxiv.org/abs/1312.4400).