The goal of this project is to map metadata at station locations to air quality statistics.
This repository provides a machine learning quickstart for the AQ-Bench dataset.
These instructions will get you a copy of the project up and running on your PC.
The AQ-Bench benchmark dataset is described in Betancourt et al. (manuscript): "AQ-Bench: A Benchmark Dataset for Machine Learning on Global Air Quality Metrics" (link follows).
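To make "mapping metadata to air quality statistics" concrete, here is a toy sketch. It assumes each station's metadata has already been encoded as a numeric feature vector and that the target is one air quality statistic per station. The k-nearest-neighbour regressor, the helper names `knn_predict` and `rmse`, and the toy values are hypothetical illustrations, not the models or data used in this project.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a station's air quality statistic as the mean target of the
    k training stations whose metadata vectors are closest to `query`."""
    # Euclidean distance between two equally long feature vectors
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Sort training stations by their distance to the query station
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    # Average the targets of the k nearest stations
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted statistics."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy example with made-up, already scaled metadata features (hypothetical)
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [30.0, 32.0, 34.0, 60.0]
print(knn_predict(train_x, train_y, (0.1, 0.1), k=3))  # 32.0
```

With real metadata, the features would need to be scaled first so that no single variable dominates the distance.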
## Structure of the project
This project consists of two parts:
* Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now.
* The mapping part: mapping station metadata to air quality statistics with machine learning
## Quickstart
Run it on Binder!
## Get the project running on your PC
Get yourself up and ready:
* Prerequisite: Conda or Miniconda with Python 3.6
* Download the project from Git
* Create an environment from ```environment.yml``` (```conda env create -f environment.yml```), then activate it
* Run ```source prepare.sh``` to set up the Python environment
* Navigate to ```source```, start Jupyter with ```jupyter notebook``` and open ```introduction_jupyter.ipynb```
## Hyperparameter tuning "Hackathon"
Rules of the game:
* We provide you with training data (split it into training and development sets as you like)
* Try out hyperparameters
* Submit your best hyperparameters to be tested on our secret test set
* The best hyperparameters win the prize!
## Structure of the repository
* ```resources``` contains the data
* ```source``` contains the scripts
## Downloading the AQ-Bench dataset
* We provide the dataset in the data folder of this project. You can also download it yourself.
* If you would like to download the AQ-Bench dataset, turn on the FZJ VPN for TOAR access.
* Create a file ```dataset_dbaccess.py``` in the ```source``` directory containing your TOAR-DB credentials (if you do not have access to TOAR-DB, leave '****' as username and password):
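```python
# TOAR-DB credentials; keep '****' if you have no TOAR-DB access
db_user = '****'
db_password = '****'
# Connection details of the TOAR database server (reachable via the FZJ VPN)
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'
```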
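How the retrieval scripts consume these values is not shown in this README. As a sketch, assuming a standard PostgreSQL client, the variables can be combined into a libpq-style `postgresql://` connection URI. The helper `toar_db_uri` is hypothetical and not part of the repository.

```python
from urllib.parse import quote

# Placeholder value used in dataset_dbaccess.py when you have no TOAR-DB access
PLACEHOLDER = '****'


def toar_db_uri(db_user, db_password, db_host, db_port, db_name):
    """Build a postgresql:// connection URI from the dataset_dbaccess.py values.

    Returns None while the credentials are still the '****' placeholders.
    """
    if PLACEHOLDER in (db_user, db_password):
        return None
    # Percent-encode user and password so characters such as '@' or ':' stay valid
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote(db_user, safe=''), quote(db_password, safe=''),
        db_host, db_port, db_name,
    )
```

For example, `toar_db_uri('alice', 'p@ss', 'localhost', '5432', 'mydb')` returns `'postgresql://alice:p%40ss@localhost:5432/mydb'`. In practice you would pass the attributes of the imported `dataset_dbaccess` module.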
## Resources to describe AQ-Bench
The ```resources``` folder contains .csv files with the information needed to handle the dataset:
* ```AQbench_variables.csv```: Info on all variables in the dataset
* ```*_cols.csv```: Info for dataset retrieval
* ```climatic_zone.csv```, ```htap_region.csv```, ```climatic_landcover.csv```: Info on decoded variables
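The decoding tables can be read with Python's standard `csv` module. This is a sketch: the column names of these files are not documented here, so `load_lookup` (a hypothetical helper, like `decode`) takes them as parameters. Check the header row of each file before use.

```python
import csv


def load_lookup(csv_file, key_col, value_col):
    """Read a decoding table (e.g. resources/climatic_zone.csv) into a dict.

    `key_col` and `value_col` are the header names of the code and label columns.
    All values are returned as strings, exactly as csv.DictReader reads them.
    """
    reader = csv.DictReader(csv_file)
    return {row[key_col]: row[value_col] for row in reader}


def decode(codes, lookup, missing='unknown'):
    """Translate encoded values into labels, marking codes absent from the table."""
    # Keys read from CSV are strings, so codes are converted before the lookup
    return [lookup.get(str(code), missing) for code in codes]
```

Typical use is to open the file with `open('resources/climatic_zone.csv', newline='')` and pass the file handle to `load_lookup` together with the two column names found in its header.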
## Run Scripts
Run ```source run.sh``` to start the interactive script launcher. It offers the following options:
* ```prepare```
  * Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots are stored)
  * Creates and activates the mapping environment
* ```test```
  * Runs all tests in the ```test``` folder
* ```retrieval```
  * Starts the dataset retrieval from TOAR-DB and JOIN