aq-bench

    Air quality mapping with the AQ-Bench dataset

    The goal of this project is to map metadata at station locations to air quality statistics.

    These instructions will get you a copy of the project up and running on your PC.

    Structure of the project

    This project consists of two parts:

    • Obtaining the training dataset from TOAR-DB and JOIN. We call it AQ-Bench for now.
    • Mapping the station metadata to air quality statistics.

    Hyperparameter tuning "Hackathon"

    Get yourself up and ready:

    • Download the project from Git
    • Run source prepare.sh to set up the Python environment
    • Start the Jupyter notebook: cd source, then jupyter notebook

    Rules for the Game:

    • We provide you with training data (train/dev split as you like)
    • Try out hyperparameters
    • Submit your best hyperparameters to be tested with our secret test set
    • Best hyperparameters win the prize!
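
    The rules above leave the train/dev split and the search strategy up to you. One possible approach is a random split plus an exhaustive grid search, sketched below with the standard library only. The names train_dev_split, grid, best_hyperparameters and score_on_dev are hypothetical and not part of this project; score_on_dev stands for whatever function you write to train a model with the given hyperparameters and return its error on the dev set.

```python
import itertools
import random


def train_dev_split(rows, dev_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, dev)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


def grid(search_space):
    """Yield every combination of the search space as a dict."""
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def best_hyperparameters(search_space, score_on_dev):
    """Return the candidate with the lowest dev score (lower is better)."""
    return min(grid(search_space), key=score_on_dev)
```

    A call such as best_hyperparameters({"learning_rate": [0.1, 0.01]}, score_on_dev) then returns the dict you would submit.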

    Downloading the AQ-Bench dataset

    • We provide the dataset in the data folder of this project. Nevertheless, you can also download it yourself.
    • If you would like to download the AQ-Bench dataset, turn on the FZJ VPN for TOAR access.
    • Create a file dataset_dbaccess.py in the source directory which contains your credentials for TOAR-DB (if you do not have access to TOAR-DB, just leave '****' as username and password):
    db_user = '****'
    db_password = '****'
    db_host = 'zam10131.zam.kfa-juelich.de'
    db_port = '5432'
    db_name = 'surface_observations_toar'
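
    How the retrieval script consumes these values is not shown here. As a rough sketch, assuming TOAR-DB is a PostgreSQL database (port 5432 suggests this, but it is an assumption), the five settings could be combined into a libpq-style connection string. The helpers has_real_credentials and build_dsn are hypothetical and not part of this project; in practice the values would come from "import dataset_dbaccess".

```python
def has_real_credentials(user, password):
    """Return False while either value is empty or still the '****' placeholder."""
    return not (set(user) <= {"*"} or set(password) <= {"*"})


def build_dsn(user, password, host, port, name):
    """Assemble a keyword/value connection string as understood by libpq.

    Values containing spaces or quotes would need extra quoting,
    which this sketch does not handle.
    """
    return (
        f"dbname={name} user={user} password={password} "
        f"host={host} port={port}"
    )
```

    Checking has_real_credentials first lets a script fall back to the dataset shipped in the data folder instead of attempting a connection with placeholder credentials.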

    Resources to describe AQ-Bench

    The resources folder contains .csv files with necessary info to handle the dataset.

    • AQbench_variables.csv: Info on all variables in the dataset
    • *_cols.csv: Info for dataset retrieval
    • climatic_zone.csv, htap_region.csv, climatic_landcover.csv: Info on decoded variables
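
    The lookup files can be read with Python's csv module. The sketch below is only an illustration: the column names "code" and "label" and the two example rows are made up, so check the header of the actual .csv file before adapting it. load_lookup is a hypothetical helper, not part of this project.

```python
import csv
import io


def load_lookup(csv_file, key_column, value_column):
    """Read an open CSV file into a {key: value} dict of strings."""
    reader = csv.DictReader(csv_file)
    return {row[key_column]: row[value_column] for row in reader}


# Made-up stand-in for a file such as resources/climatic_zone.csv.
example = io.StringIO("code,label\n1,example zone A\n2,example zone B\n")
lookup = load_lookup(example, "code", "label")
```

    For a real file, replace the StringIO object with open(path, newline="") inside a with statement.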

    Run Scripts

    Run source run.sh to start the interactive script starter. You may choose from various options:

    • prepare
      • Creates folders for logs (where your log files are stored), data (where the dataset is stored) and plots (where your plots will be stored)
      • Creates and activates the mapping environment
    • test
      • Starts all tests in the test folder
    • retrieval
      • Starts the dataset retrieval from TOAR-DB and JOIN
    • sanitycheck
      • Carries out a sanity check on your dataset
    • preanalysis
      • Preliminary analysis of dataset statistics
      • Visualisation of missing values
    • mapping
      • Mapping of the dataset (multi layer perceptron)
      • Mapping of the dataset (random forest)

    Authors

    • Clara Betancourt
    • Scarlet Stadtler
    • Timo Stomberg

    License

    This project is licensed under the MIT License.