Air quality mapping with the AQ-Bench dataset
The goal of this project is to map metadata at station locations to air quality statistics.
These instructions will get you a copy of the project up and running on your PC.
Structure of the project
This project consists of two parts:
- Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now.
- The mapping part
Hyperparameter tuning "Hackathon"
Get yourself up and ready:
- Download the project from Git
- Run
source prepare.sh
for the Python environment
- Start the Jupyter notebook
cd source
jupyter notebook
Rules for the Game:
- We provide you with training data (train/dev split as you like)
- Try out hyperparameters
- Submit your best hyperparameters to be tested with our secret test set
- The best hyperparameters win the prize!
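The workflow described by the rules (split the provided data, try hyperparameter combinations, keep the best one) could be sketched roughly as follows. This is only an illustration using the Python standard library: the helper names train_dev_split and grid_search, the default dev fraction of 0.2, and the shape of the evaluate callback are assumptions and not part of this project.

```python
import itertools
import random


def train_dev_split(samples, dev_fraction=0.2, seed=0):
    """Shuffle samples reproducibly and split them into (train, dev) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_error).

    `grid` maps a hyperparameter name to a list of candidate values.
    `evaluate` is a user-supplied function (hypothetical here) that trains
    on the train split and returns the error on the dev split.
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

The dev split is only used to compare hyperparameter combinations; the final ranking is still done on the secret test set mentioned above.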
Downloading the AQ-Bench dataset
- We provide the dataset in the data folder of this project. Nevertheless, you can also download it yourself.
- If you would like to download the AQ-Bench dataset, turn on the FZJ VPN for TOAR access.
- Create a file dataset_dbaccess.py in the source directory which contains your credentials for TOAR-DB (if you do not have access to TOAR-DB, just leave '****' for username and password):
db_user = '****'
db_password = '****'
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'
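As a rough illustration of how these five settings might be combined, the sketch below assembles them into a single connection URL. It assumes the database is PostgreSQL (5432 is the PostgreSQL default port, but this README does not state the database type), and the helper name build_postgres_url is hypothetical. It does not open a connection.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, port, name):
    """Assemble a postgresql:// URL, percent-encoding user, password and db name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(name, safe='')}"
    )


# Possible usage, assuming dataset_dbaccess.py is importable:
# import dataset_dbaccess as cfg
# url = build_postgres_url(cfg.db_user, cfg.db_password,
#                          cfg.db_host, cfg.db_port, cfg.db_name)
```

Percent-encoding keeps the URL valid when a password contains characters such as '@' or ':'.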
Resources to describe AQ-Bench
The resources folder contains .csv files with necessary info to handle the dataset.
- AQbench_variables.csv: Info on all variables in the dataset
- *_cols.csv: Info for dataset retrieval
- climatic_zone.csv, htap_region.csv, climatic_landcover.csv: Info on decoded variables
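One simple way to inspect these files is to load them with the csv module from the Python standard library. The sketch below makes no assumption about the column names, since they are not listed here; the helper name read_resource_table and the comma delimiter are assumptions.

```python
import csv


def read_resource_table(path, delimiter=","):
    """Read a resource .csv file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Possible usage (path relative to the project root is an assumption):
# rows = read_resource_table("resources/AQbench_variables.csv")
# print(rows[0].keys())  # shows which columns the file actually provides
```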
Run Scripts
Run
source run.sh
to start the interactive script starter. You may choose from various options:
- prepare
  - Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
  - Creates and activates the mapping environment
- test
  - Starts all tests in the test folder
- retrieval
  - Starts the dataset retrieval from TOAR-DB and JOIN
- sanitycheck
  - Carries out a sanity check on your dataset
- preanalysis
  - Preliminary analysis of dataset statistics
  - Visualisation of missing values
- mapping
  - Mapping of the dataset (multi layer perceptron)
  - Mapping of the dataset (random forest)
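For readers who want to reproduce the folder setup of the prepare option by hand, a minimal Python equivalent might look like the sketch below. It is not the project's own implementation (that lives in the shell scripts); the helper name create_project_folders is hypothetical, and only the three folder names come from the description above.

```python
from pathlib import Path


def create_project_folders(root=".", names=("logs", "data", "plots")):
    """Create the working folders under `root`; existing folders are left untouched."""
    created = []
    for name in names:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Because exist_ok=True is set, calling the function a second time does not raise an error or delete anything already stored in those folders.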
Authors
- Clara Betancourt
- Scarlet Stadtler
- Timo Stomberg
License
This project is licensed under the MIT License.