Commit 55c64c96 authored by Clara Betancourt
parents a7d8ec25 de49effa
# Air quality mapping with the AQ-Bench dataset # Machine learning on the AQ-Bench dataset
The goal of this project is to map metadata at station locations to air quality statistics. This repository enables a machine learning quickstart on the AQ-Bench dataset.
These instructions will get you a copy of the project up and running on your PC. The AQ-Bench Benchmark dataset is described in Betancourt et al. (manuscript): "AQ-Bench: A Benchmark Dataset for Machine Learning on Global Air Quality Metrics" (link follows)
## Structure of the project ## Quickstart
This project consists of two parts: Run it on binder! Click on the badge below to start machine learning on AQ-Bench in your browser (might take some time to launch).
* Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now. https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.version.fz-juelich.de%2Ftoar%2Fozone-mapping/devel?filepath=source%2Fintroduction_jupyter.ipynb
* The mapping part
## Hyperparameter tuning "Hackathon" ## Get the project running on your PC
Get yourself up and ready: * Prerequisite: Conda or Miniconda with Python 3.6
* Download the project from Git * Use ```environment.yml``` to create an environment, then activate it
* Run ```source prepare.sh``` to set up the Python environment * Navigate to ```source``` and open ```introduction_jupyter.ipynb``` by running ```jupyter notebook```
* Start the Jupyter notebook ```cd source```, ```jupyter notebook```
Rules for the Game: ## Structure of the repository
* We provide you with training data (train/dev split as you like)
* Try out hyperparameters
* Submit your best hyperparameters to be tested with our secret test set
* The best hyperparameters win the prize!
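The rules above leave the train/dev split up to you. A minimal sketch of one way to do it, using only the Python standard library, is shown here. The function name `train_dev_split`, the 20% dev fraction and the fixed seed are illustrative assumptions, not part of this repository.

```python
import random


def train_dev_split(rows, dev_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, dev) lists.

    Hypothetical helper for illustration; not part of the project code.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Example with ten placeholder row indices.
train, dev = train_dev_split(range(10), dev_fraction=0.2)
print(len(train), len(dev))
```

Fixing the seed makes hyperparameter runs comparable, because every run sees the same dev set.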
## Downloading the AQ-Bench dataset * ```resources``` contains the data
* ```source``` contains the scripts
* We provide the dataset in the data folder of this project. Nevertheless, you can also download it yourself.
* If you would like to download the AQ-Bench dataset, turn on the FZJ VPN for TOAR access.
* Create a file ```dataset_dbaccess.py``` in the source directory which contains your credentials for TOAR-DB (if you do not have access to TOAR-DB, just leave '****' for username and password):
```
db_user = '****'
db_password = '****'
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'
```
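As a rough sketch of how these settings could be combined, the snippet below builds a connection URI from the five values. It assumes the database is PostgreSQL (port 5432 is the PostgreSQL default, but this README does not state the database type), and `build_postgres_uri` is a hypothetical helper, not a function from this repository.

```python
from urllib.parse import quote


def build_postgres_uri(user, password, host, port, name):
    """Assemble a PostgreSQL-style connection URI (hypothetical helper).

    User and password are percent-encoded so that characters such as
    '@' or '/' do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


# Placeholder credentials, as in dataset_dbaccess.py above.
db_user = '****'
db_password = '****'
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'

uri = build_postgres_uri(db_user, db_password, db_host, db_port, db_name)
```

Keeping the credentials in a separate module like ```dataset_dbaccess.py``` means they can be excluded from version control.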
## Resources to describe AQ-Bench
The resources folder contains .csv files with necessary info to handle the dataset.
* ```AQbench_variables.csv```: Info on all variables in the dataset
* ```*_cols.csv```: Info for dataset retrieval
* ```climatic_zone.csv```, ```htap_region.csv```, ```climatic_landcover.csv```: Info on decoded variables
## Run Scripts
Run ```source run.sh``` to start the interactive script starter. You may choose from various options:
* ```prepare```
  * Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
  * Creates and activates the mapping environment
* ```test```
  * Starts all tests in the test folder
* ```retrieval```
  * Starts the dataset retrieval from TOAR-DB and JOIN
* ```sanitycheck```
  * Carries out a sanity check for your dataset
* ```preanalysis```
  * Preliminary analysis of dataset statistics
  * Visualisation of missing values
* ```mapping```
  * Mapping of the dataset (multilayer perceptron)
  * Mapping of the dataset (random forest)
## Authors ## Authors
......