
Air quality mapping with the AQ-Bench dataset

The goal of this project is to map metadata at station locations to air quality statistics.

These instructions will get you a copy of the project up and running on your PC.

Structure of the project

This project consists of two parts:

Obtaining the training dataset from TOAR-DB and JOIN. We call this dataset AQ-Bench for now.
The mapping part, in which station metadata are mapped to air quality statistics

Hyperparameter tuning "Hackathon"

Get yourself up and running:

Clone the project from Git
Run source prepare.sh to set up the Python environment
Start Jupyter Notebook: cd source, then jupyter notebook

Rules of the game:

We provide you with the training data (split it into train and dev sets as you like)
Try out hyperparameters
Submit your best hyperparameters to be evaluated on our secret test set
The best hyperparameters win the prize!
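How you split the data and search the hyperparameters is up to you. As one possible starting point, here is a minimal, dependency-free sketch of a reproducible random train/dev split combined with a random search over a small grid. The helper names (train_dev_split, sample_hyperparameters) and the search space are hypothetical; the parameter names merely mirror scikit-learn's RandomForestRegressor and are not taken from this project's code.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_dev_split(rows: Sequence[T], dev_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` with a fixed seed and hold out `dev_fraction` of it."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[:-n_dev], shuffled[-n_dev:]


def sample_hyperparameters(space: Dict[str, Sequence], n_trials: int,
                           seed: int = 0) -> List[Dict]:
    """Random search: draw `n_trials` configurations uniformly from a discrete grid."""
    rng = random.Random(seed)
    return [{name: rng.choice(list(values)) for name, values in space.items()}
            for _ in range(n_trials)]


# Hypothetical search space; names mirror scikit-learn's RandomForestRegressor.
SPACE = {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]}

stations = list(range(10))  # stand-in for the rows of the training data
train, dev = train_dev_split(stations, dev_fraction=0.2, seed=42)
for config in sample_hyperparameters(SPACE, n_trials=3, seed=1):
    print(config)  # here you would fit on `train` and score on `dev`
```

Fixing the seeds keeps both the split and the sampled configurations reproducible, which makes it easier to compare your candidates fairly before submitting one.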

Downloading the AQ-Bench dataset

We provide the dataset in the data folder of this project. Nevertheless, you can also download it yourself.
If you would like to download the AQ-Bench dataset, connect to the FZJ VPN for TOAR-DB access.
Create a file dataset_dbaccess.py in the source directory that contains your credentials for TOAR-DB (if you do not have access to TOAR-DB, just leave '****' as username and password):

db_user = '****'
db_password = '****'
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'
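Port 5432 is the PostgreSQL default, so the following sketch assumes TOAR-DB is a PostgreSQL server and shows how the five fields of dataset_dbaccess.py could be combined into a standard postgresql:// connection URI. The helper build_postgres_dsn is hypothetical and is not part of this project; the retrieval scripts may connect differently.

```python
from urllib.parse import quote


def build_postgres_dsn(user: str, password: str, host: str, port: str,
                       name: str) -> str:
    """Assemble a postgresql:// connection URI from the dataset_dbaccess fields.

    Credentials are percent-encoded so characters such as '@' or '/'
    in a password cannot break the URI.
    """
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{name}")


# Hypothetical usage from inside the source directory:
#   import dataset_dbaccess as cfg
#   dsn = build_postgres_dsn(cfg.db_user, cfg.db_password, cfg.db_host,
#                            cfg.db_port, cfg.db_name)
#   # libpq-based clients such as psycopg2.connect(dsn) accept this URI form
```

Keep dataset_dbaccess.py out of version control, since it holds your credentials.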

Resources to describe AQ-Bench

The resources folder contains .csv files with the information needed to handle the dataset.

AQbench_variables.csv: Info on all variables in the dataset
*_cols.csv: Info for dataset retrieval
climatic_zone.csv, htap_region.csv, climatic_landcover.csv: Info on decoded variables
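Lookup files such as climatic_zone.csv are typically read into a code-to-label mapping. The sketch below assumes only that a resource file has a header row; the column names ("code", "label") and the toy content are hypothetical, so check the actual header (and delimiter) of each file before using it.

```python
import csv
import io
from typing import Dict, TextIO


def load_lookup(handle: TextIO, key_column: str, value_column: str,
                delimiter: str = ",") -> Dict[str, str]:
    """Read a resource CSV with a header row and map one column onto another."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = {key_column, value_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in header: {sorted(missing)}")
    return {row[key_column]: row[value_column] for row in reader}


# Toy in-memory file with hypothetical columns; for a real file use e.g.
#   with open("resources/climatic_zone.csv", newline="") as f: ...
toy = io.StringIO("code,label\n1,zone A\n2,zone B\n")
lookup = load_lookup(toy, "code", "label")
```

All values come back as strings, so convert the keys (for example with int) if the dataset stores the codes as numbers.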

Run Scripts

Run source run.sh to start the interactive script launcher. You can choose from the following options:

prepare
- Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
- Creates and activates the mapping environment
test
- Starts all tests in the test folder
retrieval
- Starts the dataset retrieval from TOAR-DB and JOIN
sanitycheck
- Carries out a sanity check of your dataset
preanalysis
- Preliminary analysis of dataset statistics
- Visualisation of missing values
mapping
- Mapping of the dataset (multilayer perceptron)
- Mapping of the dataset (random forest)
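The sanitycheck and preanalysis options are implemented by the project's own scripts. Independently of those, a minimal sketch of one such check, the fraction of missing values per column, might look like this. The missing-value markers, the column names, and the 0.3 threshold are all hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Iterable, Mapping, Sequence

# Strings treated as "missing"; an assumption, adjust to the dataset's convention.
MISSING_MARKERS = frozenset({"", "nan", "na", "none", "null"})


def is_missing(value: object) -> bool:
    """True for None, float NaN, or a (case-insensitive) missing-value marker."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def missing_fraction(rows: Iterable[Mapping[str, object]],
                     columns: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows in which each column is missing (absent keys count as missing)."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to check")
    return {col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
            for col in columns}


# Toy rows with hypothetical column names, not real AQ-Bench records.
rows = [
    {"o3_average": "31.2", "altitude": ""},
    {"o3_average": "nan", "altitude": ""},
    {"o3_average": "28.0", "altitude": "95"},
    {"o3_average": "30.1", "altitude": "40"},
]
report = missing_fraction(rows, ["o3_average", "altitude"])
flagged = [col for col, frac in report.items() if frac > 0.3]  # hypothetical threshold
```

Here report is {"o3_average": 0.25, "altitude": 0.5}, so only altitude is flagged; such per-column fractions are also a natural input for a missing-value plot.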

Authors

Clara Betancourt
Scarlet Stadtler
Timo Stomberg

License

This project is licensed under the MIT License.