diff --git a/README.md b/README.md
index 5fa4b7c2bc02d5325ce39955af61a0b817884876..77d8bcab7941ea25119e79fbcba27890bdfac129 100644
--- a/README.md
+++ b/README.md
@@ -1,68 +1,25 @@
-# Air quality mapping with the AQ-Bench dataset
+# Machine learning on the AQ-Bench dataset
 
-The goal of this project is to map metadata at station locations to air quality statistics.
+This repository enables a machine learning quickstart on the AQ-Bench dataset.
 
-These instructions will get you a copy of the project up and running on your PC.
+The AQ-Bench Benchmark dataset is described in Betancourt et al. (manuscript): "AQ-Bench: A Benchmark Dataset for Machine Learning on Global Air Quality Metrics" (link follows)
 
-## Structure of the project
+## Quickstart
 
-This project consists of two parts:
-* Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now.
-* The mapping part
+Run it on binder!
 
-## Hyperparameter tuning "Hackathon"
+## Get the project running on your PC
 
-Get yourself up and ready:
-* Download the project from Git
-* Run ```source prepare.sh``` for python environment
-* Start the Jupyter notebook ```cd source```, ```jupyter notebook```
+* Prerequisite: Conda or MiniConda with Python 3.6
+* Use ```environment.yml``` to create an environment, then activate it
+* Navigate to ```source``` and start the ```introduction_jupyter.ipynp``` by prompting ```jupyter notebook```
 
-Rules for the Game:
-* We provide you with training data (train/dev split as you like)
-* Try out hyperparameters
-* Submit your best hyper-parameters to be tested with our secret test set
-* Best hyper-parameters win the price!
+## Structure of the repository
 
-## Downloading the AQ-Bench dataset
+* ```resources``` contains the data
+* ```source``` contains the scripts
 
-* We provide the dataset in the data folder of this project. Nevertheless, you can also download it by yourself.
-* If would like to download the AQ-Bench dataset, turn on FZJ VPN for TOAR access.
-* Create a file ```dataset_dbaccess.py``` in the source directory which contains your credentials for TOAR-DB (if you do not have access to TOAR-db, then just leave '***' for username and password):
-```
-db_user = '****'
-db_password = '****'
-db_host = 'zam10131.zam.kfa-juelich.de'
-db_port = '5432'
-db_name = 'surface_observations_toar'
-```
-## Resources to describe AQ-Bench
-
-The resources folder contains .csv files with necessary info to handle the dataset.
-
-* ```AQbench_variables.csv```: Info on all variables in the dataset
-* ```*_cols.csv```: Info for dataset retrieval
-* ```climatic_zone.csv```, ```htap_region.csv```, ```climatic_landcover.csv```: Info on decoded variables
-
-## Run Scripts
-
-Run ```source run.sh``` to start the interactive script starter. You may choose from various options:
-
-* ```prepare ```
-  * Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
-  * Creates and activates the mapping environment
-* ```test ```
-  * Starts all tests in the test folder
-* ```retrieval ```
-  * Starts the dataset retrieval from TOAR-DB and JOIN
-* ```sanitycheck ```
-  * Carries out a sanitycheck for your dataset
-* ```preanalysis ```
-  * Preliminary analysis of dataset statistics
-  * Visualisation of missing values
-* ```mapping ```
-  * Mapping of the dataset (multi layer perceptron)
-  * Mapping of the dataset (random forest)
 ## Authors