Commit ec39b321 authored by Clara Betancourt

Update README.md

parent dc4bd91f
# Air quality mapping with the AQ-Bench dataset # Machine learning on the AQ-Bench dataset
The goal of this project is to map metadata at station locations to air quality statistics. This repository enables a machine learning quickstart on the AQ-Bench dataset.
These instructions will get you a copy of the project up and running on your PC. The AQ-Bench Benchmark dataset is described in Betancourt et al. (manuscript): "AQ-Bench: A Benchmark Dataset for Machine Learning on Global Air Quality Metrics" (link follows)
## Structure of the project ## Quickstart
This project consists of two parts: Run it on binder!
* Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now.
* The mapping part
## Hyperparameter tuning "Hackathon" ## Get the project running on your PC
Get yourself up and ready: * Prerequisite: Conda or MiniConda with Python 3.6
* Download the project from Git * Use ```environment.yml``` to create an environment, then activate it
* Run ```source prepare.sh``` to set up the Python environment * Navigate to ```source``` and start ```introduction_jupyter.ipynp``` by running ```jupyter notebook```
* Start the Jupyter notebook ```cd source```, ```jupyter notebook```
Rules for the Game: ## Structure of the repository
* We provide you with training data (train/dev split as you like)
* Try out hyperparameters
* Submit your best hyperparameters to be tested with our secret test set
* The best hyperparameters win the prize!
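The rules leave the train/dev split to the participant. A minimal sketch of one way to do it, assuming the training data has already been loaded into a list of rows. The helper name `train_dev_split`, the 80/20 ratio and the fixed seed are illustrative choices, not part of the project:

```python
import random


def train_dev_split(rows, dev_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, dev).

    Hypothetical helper: the ratio and the seed are assumptions.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]
```

Keeping the seed fixed makes hyperparameter runs comparable, since every run is evaluated on the same dev rows.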
## Downloading the AQ-Bench dataset * ```resources``` contains the data
* ```source``` contains the scripts
* We provide the dataset in the data folder of this project. Nevertheless, you can also download it yourself.
* If you would like to download the AQ-Bench dataset, turn on the FZJ VPN for TOAR access.
* Create a file ```dataset_dbaccess.py``` in the source directory which contains your credentials for TOAR-DB (if you do not have access to TOAR-DB, just leave '****' for username and password):
```
db_user = '****'
db_password = '****'
db_host = 'zam10131.zam.kfa-juelich.de'
db_port = '5432'
db_name = 'surface_observations_toar'
```
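The five settings above are enough to assemble a PostgreSQL connection URI. The sketch below only builds the string and does not open a connection. It assumes a client that accepts `postgresql://` URIs; the project's own retrieval code may connect differently, and `build_db_uri` is a hypothetical helper:

```python
from urllib.parse import quote


def build_db_uri(user, password, host, port, name):
    """Assemble a postgresql:// URI from the dataset_dbaccess.py settings.

    User and password are percent-encoded so that characters such as
    '@' or '/' cannot break the URI structure.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, name
    )
```

With the values from ```dataset_dbaccess.py``` this could be called as `build_db_uri(db_user, db_password, db_host, db_port, db_name)`.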
## Resources to describe AQ-Bench
The resources folder contains .csv files with necessary info to handle the dataset.
* ```AQbench_variables.csv```: Info on all variables in the dataset
* ```*_cols.csv```: Info for dataset retrieval
* ```climatic_zone.csv```, ```htap_region.csv```, ```climatic_landcover.csv```: Info on decoded variables
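Since the resource files are plain CSV, they can be inspected with the Python standard library. A minimal sketch, assuming each file is comma-separated and starts with a header row. The column names are not documented here, so none are assumed, and `load_csv_rows` is a hypothetical helper:

```python
import csv


def load_csv_rows(path):
    """Return (column_names, rows) for a CSV file with a header row.

    Each row is a dict keyed by the column names found in the header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows
```

Calling it on a file such as ```AQbench_variables.csv``` and printing the returned column names is a quick way to see which fields that file describes.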
## Run Scripts
Run ```source run.sh``` to start the interactive script starter. You may choose from various options:
* ```prepare ```
* Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
* Creates and activates the mapping environment
* ```test ```
* Starts all tests in the test folder
* ```retrieval ```
* Starts the dataset retrieval from TOAR-DB and JOIN
* ```sanitycheck ```
* Carries out a sanity check for your dataset
* ```preanalysis ```
* Preliminary analysis of dataset statistics
* Visualisation of missing values
* ```mapping ```
* Mapping of the dataset (multilayer perceptron)
* Mapping of the dataset (random forest)
## Authors