MLAir commit a5ca723c, authored 5 years ago by Felix Kleinert
update readme and prepare run.py to be used for IntelliO3-ts
parent d85907d2
Pipeline #37629 failed (5 years ago). Stages: test, pages, deploy.
Showing 2 changed files with 59 additions and 40 deletions: README.md (+47 −37) and run.py (+12 −3).
README.md +47 −37
````diff
 # MachineLearningTools

-This is a collection of all relevant functions used for ML stuff in the ESDE group
+This project contains the source code to rerun "IntelliO3-ts v1.0: A neural network approach to predict near-surface
+ozone concentrations in Germany" by F. Kleinert, L. H. Leufen and M. G. Schultz, 2020.
+Moreover, the source code includes some functionality which is not used in the study named above.

 ## Inception Model

 See a description [here](https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202)
-or take a look on the papers [Going Deeper with Convolutions (Szegedy et al., 2014)](https://arxiv.org/abs/1409.4842)
+or take a look at the papers [Going Deeper with Convolutions (Szegedy et al., 2014)](https://arxiv.org/abs/1409.4842)
 and [Network In Network (Lin et al., 2014)](https://arxiv.org/abs/1312.4400).

 # Installation

-We assume that you have downloaded or cloned the project.
+We assume that you have downloaded or cloned the project from GitLab or
+[b2share](https://doi.org/10.34730/c5dae21fac954aa6bdb4e86172221526). In the latter case, you can skip the first remark
+regarding the `data_path`. Instructions on how to rerun the specific version are given below.

-* Install __proj__ and __geos__ on your machine using the console. E.g. for opensuse / leap `zypper install proj`
+* Install __proj__ and __geos__ on your machine using the console. E.g. for OpenSUSE / leap `zypper install proj`
 * c++ compiler required for cartopy installation
-* Make sure that CUDA 10.0 is installed if you want to use Nvidia GPUs
+* Make sure that CUDA 10.0 is installed if you want to use Nvidia GPUs (compatible with TensorFlow 1.13.1).

 Depending on your system (GPU available or not) you can create a virtual environment by executing `python3 -m venv venv`.
 Make sure that the venv is activated (`source venv/bin/activate`). Afterwards
...
````
````diff
@@ -23,14 +27,14 @@ you can install the requirements into the venv:
 * CPU version: `pip install -r requirements.txt`
 * GPU version: `pip install -r requirements_gpu.txt`

-### Remarks on first setup
+## Remarks on the first setup
-1. The source code does not include any data to process. Instead, it will check if data are available on your local machine and will download data from the [here](https://join.fz-juelich.de). We did not implement
+1. The source code does not include any data to process. Instead, it checks if data are available on your local machine and downloads data from [here](https://join.fz-juelich.de). We did not implement
 a default `data_path` as we want to allow you to choose where exactly the data should be stored.
 Consequently, you have to pass a custom data path to `ExperimentSetup` in `run.py` (see example below).
-If all required data are already locally available, the program will not download any new data.
+If all required data are already locally available, the program does not download any new data.
-2. Please note that cartopy may cause errors on run time. If cartopy raises an error you can try the following
+2. Please note that cartopy may cause errors at run time. If cartopy raises an error, you can try the following
 (in activated venv):
 * `pip uninstall shapely`
 * `pip uninstall cartopy`
...
@@ -38,16 +42,16 @@ If all required data are already locally available, the program will not downloa
 * `pip install --no-binary shapely shapely`
 * `pip install cartopy`

-Catropy is needed only to create one plot showing the station locations and does not effect the neural network
+Cartopy is needed only to create one plot showing the station locations and does not affect the neural network
 itself. If the procedure above does not solve the problem, you can force the workflow to ignore cartopy by adding
 the first two characters of your hostname (`echo $HOSTNAME`) as a list containing a string to the keyword argument
 `hpc_hosts` in `run.py`.
-The example below, assumes that the output of `echo $HOSTNAME` is "your_hostname".
+The example below assumes that the output of `echo $HOSTNAME` is "your_hostname".
+Please also consult the [installation instruction](https://scitools.org.uk/cartopy/docs/latest/installing.html#installing)
+of the cartopy package itself.

-Example for all remarks given above:
+Example of all remarks given above:
 ```
 import [...]
 [...]
...
@@ -62,38 +66,43 @@ def main(parser_args):
 ```
````
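The example referenced above is collapsed in this diff view. As a minimal sketch of such a configuration (the `RunEnvironment` import path is assumed by analogy with the other `src.run_modules` imports; the argument values follow the run.py diff further down, so treat this as illustrative, not as the literal file content):

```python
# Illustrative sketch only, not the literal run.py of this commit; see the
# run.py diff below for the shipped version.
import os

from src.run_modules.experiment_setup import ExperimentSetup
from src.run_modules.run_environment import RunEnvironment  # assumed import path


def main(parser_args):
    with RunEnvironment():
        ExperimentSetup(parser_args,
                        # custom storage location for the downloaded data (remark 1)
                        data_path=f"{os.getcwd()}/raw_input_IntelliO3-ts/",
                        # first two characters of "your_hostname": forces the
                        # workflow to skip cartopy (remark 2)
                        hpc_hosts=["yo"])
```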
````diff
-## HPC - JUWELS and HDFML setup
-The following instruction guide you through the installation on JUWELS and HDFML.
+### HPC - JUWELS and HDFML setup
+The following instruction guides you through the installation on JUWELS and HDFML.
 * Clone the repo to HPC system (we recommend to place it in `/p/projects/<project name>`).
 * Setup venv by executing `source setupHPC.sh`. This script loads all pre-installed modules and creates a venv for
 all other packages. Furthermore, it creates slurm/batch scripts to execute code on compute nodes. <br>
+You have to enter the HPC project's budget name (--account flag).
 * The default external data path on JUWELS and HDFML is set to `/p/project/deepacf/intelliaq/<user>/DATA/toar_<sampling>`. <br>
 To choose a different location open `run.py` and add the following keyword argument to `ExperimentSetup`:
+`data_path=<your>/<custom>/<path>`.
-* Execute `python run.py` on a login node to download example data. The program will throw an OSerror after downloading.
+* Execute `python run.py` on a login node to download example data. The program throws an OSError after downloading.
 * Execute either `sbatch run_juwels_develgpus.bash` or `sbatch run_hdfml_batch.bash` to verify that the setup went well.
 * Currently cartopy is not working on our HPC system, therefore PlotStations does not create any output.

-### HPC JUWELS and HDFML remarks
+#### HPC JUWELS and HDFML remarks
 Please note, that the HPC setup is customised for JUWELS and HDFML. When using another HPC system, you can use the HPC
 setup files as a skeleton and customise it to your needs.

 Note: The method `PartitionCheck` currently only checks if the hostname starts with `ju` or `hdfmll`.
-Therefore, it might be necessary to adopt the `if` statement in `PartitionCheck._run`.
+Therefore, it might be necessary to adapt the `if` statement in `src/run_modules/PartitionCheck._run`.
````
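For orientation, a hypothetical sketch of the hostname check described in the note above; the README only states the two prefixes, so the actual code in `src/run_modules/partition_check.py` may look different:

```python
# Hypothetical sketch of the check in PartitionCheck._run, not the actual source.
import socket


def on_supported_partition(hostname: str = "") -> bool:
    hostname = hostname or socket.gethostname()
    # Adapt these prefixes when porting the HPC setup to another system.
    return hostname.startswith(("ju", "hdfmll"))


print(on_supported_partition("juwels03.ib.juwels.fzj.de"))  # True
```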
````diff
+## How to run [IntelliO3-ts](https://doi.org/10.34730/c5dae21fac954aa6bdb4e86172221526)
+After you have followed the instructions above, you can rerun the trained model by executing (in activated venv)
+`python run.py --experiment_date=IntelliO3-ts`. If you want to train the model from scratch, you have to modify
+`run.py`. To be more precise, you have to set `create_new_model=True`.

 # Security
 * To use hourly data from ToarDB via JOIN interface, a private token is required. Request your personal access token and
-add it to `src/join_settings.py` in the hourly data section. Replace the `TOAR_SERVICE_URL` and the `Authorization`
+add it to `src/join_settings.py` in the hourly data section. Replace the `TOAR_SERVICE_URL` and the `Authorisation`
 value. To make sure that this **sensitive** data is not uploaded to the remote server, use the following command to
 prevent git from tracking this file: `git update-index --assume-unchanged src/join_settings.py`
````
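A hypothetical sketch of the two entries meant here (the names are quoted from the README, the values are placeholders; the real layout of `src/join_settings.py` is not shown in this diff):

```python
# Hypothetical sketch of the hourly data section in src/join_settings.py.
# Both values are placeholders; insert your personal access token.
TOAR_SERVICE_URL = "https://join.fz-juelich.de/..."  # hourly service endpoint
Authorisation = "Token <your-private-token>"
```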
````diff
 # Customise your experiment
-This section summarises which parameters can be customised for a training.
+This section summarises which parameters can be customised for training.

 ## Transformation
...
@@ -120,29 +129,30 @@ ExperimentSetup(..., transformation=transformation, ...)
 from different calculation schemes, explained in the mean and std section.

 ### supported transformation methods
-Currently supported methods are:
+Currently, supported methods are:
-* standardise (default, if method is not given)
+* standardise (default, if the method is not given)
 * centre
````
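Both methods subtract the mean; only *standardise* additionally divides by the standard deviation. A minimal sketch using the usual definitions (not MLAir's internal implementation):

```python
# Minimal sketch of the two supported methods, assuming the usual definitions.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

centred = x - x.mean()                   # "centre"
standardised = (x - x.mean()) / x.std()  # "standardise" (default)
```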
````diff
 ### mean and std
-`"mean"="accurate"`: calculate the accurate values of mean and std (depending on method) by using all data. Although,
+`"mean"="accurate"`: calculate the accurate values of mean and std (depending on method) by using all data. Although
 this method is accurate, it may take some time for the calculation. Furthermore, this could potentially lead to memory
-issue (not explored yet, but could appear for a very big amount of data)
+issues (not explored yet, but could appear for a huge amount of data).

 `"mean"="estimate"`: estimate mean and std (depending on method). For each station, mean and std are calculated and
 afterwards aggregated using the mean value over all station-wise metrics. This method is less accurate, especially
 regarding the std calculation but therefore much faster.

-We recommend to use the later method *estimate* because of following reasons:
+We recommend to use the latter method *estimate* because of the following reasons:
 * much faster calculation
-* real accuracy of mean and std is less important, because it is "just" a transformation / scaling
+* real accuracy of mean and std is less important because it is "just" a transformation / scaling
-* accuracy of mean is almost as high as in the *accurate* case, because of $\bar{x_{ij}} = \bar{\left(\bar{x_i}\right)_j}$. The only difference is, that in the *estimate* case, each mean is
+* accuracy of mean is almost as high as in the *accurate* case, because of $\bar{x_{ij}} = \bar{\left(\bar{x_i}\right)_j}$. The only difference is that in the *estimate* case each mean is
 equally weighted for each station independently of the actual data count of the station.
 * accuracy of std is lower for *estimate* because of $\var{x_{ij}} \ne \bar{\left(\var{x_i}\right)_j}$, but still the
 mean of all station-wise std is a decent estimate of the true std.
````
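A small numeric illustration of the two schemes in plain NumPy (not MLAir code): with unequally sized stations, the equally weighted mean of station means deviates from the pooled mean, and the mean of station-wise stds can differ substantially from the pooled std:

```python
# Numeric sketch of "accurate" vs "estimate"; plain NumPy, not MLAir code.
import numpy as np

stations = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 12.0])]  # unequal sizes
pooled = np.concatenate(stations)

# "accurate": one pass over all pooled data
mean_accurate, std_accurate = pooled.mean(), pooled.std()

# "estimate": aggregate station-wise statistics, each station weighted equally
mean_estimate = np.mean([s.mean() for s in stations])
std_estimate = np.mean([s.std() for s in stations])

print(mean_accurate, mean_estimate)  # 5.6 vs 6.5
print(std_accurate, std_estimate)    # ~4.50 vs ~0.91
```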
`"mean"=<value, e.g. xr.DataArray>`
: If mean and std are already calculated or shall be set manually,
just
add the
scaling values instead of the calculation method. For method
*centre*
, std can still be None
,
but is required for the
`"mean"=<value, e.g. xr.DataArray>`
: If mean
,
and std are already calculated or shall be set manually,
you can
add the
scaling values instead of the calculation method. For method
*centre*
, std can still be None but is required for the
*
standardise
* method. **Important**: Format of given values **must*
*
match internal data format of DataPreparation
class:
`xr.DataArray`
with
`dims=["variables"]`
and one value for each variable.
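Tying the section to the hunk header above (`ExperimentSetup(..., transformation=transformation, ...)`), a hedged usage sketch; the exact dictionary schema is an assumption based on the keys described in this section:

```python
# Hedged sketch: key names follow this section, the exact schema is assumed.
transformation = {
    "method": "standardise",  # or "centre"
    "mean": "estimate",       # or "accurate", or a precomputed xr.DataArray
}

# passed through in run.py, as in the hunk header above:
# ExperimentSetup(..., transformation=transformation, ...)
```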
run.py +12 −3
````diff
...
@@ -4,6 +4,7 @@ __date__ = '2019-11-14'

 import argparse
 import json
+import os

 from src.run_modules.experiment_setup import ExperimentSetup
 from src.run_modules.partition_check import PartitionCheck
...
@@ -23,12 +24,19 @@ def main(parser_args):

     with RunEnvironment():
-        ExperimentSetup(parser_args, stations=stations,
+        ExperimentSetup(parser_args, data_path=f"{os.getcwd()}/raw_input_IntelliO3-ts/",  # hpc_hosts=["yo"], #
+                        stations=stations,
                         # stations=['DEBW107', 'DEBY081', 'DEBW013', 'DEBW076', 'DEBW087', 'DEBW001'],
                         station_type='background', window_lead_time=4, window_history_size=6,
-                        trainable=False, create_new_model=True, permute_data_on_training=True,
+                        trainable=False, create_new_model=False, permute_data_on_training=True,
                         extreme_values=3., train_min_length=365, val_min_length=365, test_min_length=365,
-                        create_new_bootstraps=True, hpc_hosts=["za"])
+                        create_new_bootstraps=True,
+                        plot_list=["PlotMonthlySummary", "PlotStationMap", "PlotClimatologicalSkillScore",
+                                   "PlotCompetitiveSkillScore", "PlotBootstrapSkillScore",
+                                   "PlotConditionalQuantiles", "PlotAvailability"],
+                        )
         PreProcessing()
...
@@ -40,6 +48,7 @@ def main(parser_args):
     PostProcessing()


 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument('--experiment_date', metavar='--exp_date', type=str, default="testrun",
...
````