esde / machine-learning / MLAir / Issues / #384

AQW data handler

Data Handler for AQWatch

A data handler designed for the work of @li40.

Structure of Data

Inputs: forecasts from CTMs

  • root folder: mod
  • structured per region of interest (each will be a separate experiment with a different NN), e.g. /colorado
  • depending on the region: a different number and different names of CTMs, plus always a mean of the ensemble (not used as an input for now), e.g. /lotos_tno
  • data are already interpolated on station level
  • data are stored per forecast date /$YYYYMMDD
  • single file per species, e.g. $nvar_TNO_inp.nc
  • data files are structured as follows: the first dimension is time (model time, UTC), the second dimension is station (named by the station's long name).
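The file-naming convention above can be captured in a small path helper. This is an illustrative sketch, not MLAir API: the function name `ctm_input_path` and its arguments are made up here, and the `interpolated` subdirectory is taken from the directory tree shown further below.

```python
import os

def ctm_input_path(root, region, ctm_dir, ctm_tag, nvar, forecast_date):
    """Path of one CTM forecast file, e.g. for CTM#1:
    <root>/mod/colorado/lotos_tno/20220501/interpolated/no2_TNO_inp.nc"""
    return os.path.join(root, "mod", region, ctm_dir, forecast_date,
                        "interpolated", f"{nvar}_{ctm_tag}_inp.nc")
```

For example, `ctm_input_path("/data", "colorado", "lotos_tno", "TNO", "no2", "20220501")` yields the CTM#1 NO2 file for the 2022-05-01 forecast.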

Targets: observations from measurement stations

  • root folder: /obs/obs_download_scripts
  • structured per region of interest (each will be a separate experiment with a different NN), e.g. /Colorado
  • data are inside data folder /Data
  • single file per species and date (including all stations and 24h), e.g. obs_$nvar_$YYYYMMDD.nc
  • data files are structured as follows: the index is date_utc and the columns are stations identified by id (size: timesteps x stations).
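The observation files are keyed by the day before the forecast date (see %obs_date below). A hypothetical helper assuming the layout described above; `obs_path` is an illustrative name, not MLAir API:

```python
import os
from datetime import datetime, timedelta

def obs_path(root, region, nvar, forecast_date):
    """Observation file for the day BEFORE the forecast date, e.g.
    <root>/obs/obs_download_scripts/Colorado/Data/obs_no2_20220430.nc"""
    obs_date = (datetime.strptime(forecast_date, "%Y%m%d")
                - timedelta(days=1)).strftime("%Y%m%d")
    return os.path.join(root, "obs", "obs_download_scripts", region,
                        "Data", f"obs_{nvar}_{obs_date}.nc")
```

Note that the region folder is capitalized here ("Colorado"), unlike on the model side ("colorado").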

Competitor: ensemble mean calculated over all CTM forecasts

  • root folder: mod
  • structured per region of interest (each will be a separate experiment with a different NN), e.g. /colorado
  • stored in directory /ens
  • structure same as for inputs
|-- mod
|   `-- colorado
|       |-- ens (MEAN of CTM#1-3)
|       |   `-- $YYYYMMDD (%forecast_date)
|       |       `-- interpolated
|       |           `-- $nvar_ensmean_inp.nc  # time series of ensemble mean (competitor)
|       |-- lotos_tno (CTM#1)
|       |   |-- $YYYYMMDD (%forecast_date)
|       |   |   `-- interpolated
|       |   |       `-- $nvar_TNO_inp.nc  # time series of CTM#1 (input feature)
|       |   `-- stations_colo.csv  # station info
|       |-- silam_fmi (CTM#2)
|       |   |-- $YYYYMMDD (%forecast_date)
|       |   |   `-- interpolated
|       |   |       `-- $nvar_FMI_inp.nc  # time series of CTM#2 (input feature)
|       |   `-- stations_colo.csv
|       `-- wrf_ucar (CTM#3)
|           |-- $YYYYMMDD (%forecast_date)
|           |   `-- interpolated
|           |       `-- $nvar_UCAR_inp.nc  # time series of CTM#3 (input feature)
|           `-- stations_colo.csv
`-- obs
    `-- obs_download_scripts
        `-- Colorado
            |-- stations_colo.csv
            `-- Data
                `-- obs_$nvar_$YYYYMMDD.nc (%obs_date)  # time series of observation (target)

$nvar = ['co','so2','no2','o3','pm10','pm25']  # pollutant species available
%forecast_date = current day
%obs_date = one day before current day
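Putting the tree and the $nvar list together, the competitor (ensemble-mean) files expected for one forecast date can be enumerated as below. `ensmean_path` and `ensmean_files` are illustrative names, not part of MLAir:

```python
import os

NVARS = ["co", "so2", "no2", "o3", "pm10", "pm25"]  # the $nvar values

def ensmean_path(root, region, nvar, forecast_date):
    """Competitor file, e.g.
    <root>/mod/colorado/ens/20220501/interpolated/no2_ensmean_inp.nc"""
    return os.path.join(root, "mod", region, "ens", forecast_date,
                        "interpolated", f"{nvar}_ensmean_inp.nc")

def ensmean_files(root, region, forecast_date):
    """All competitor files expected for one forecast date."""
    return [ensmean_path(root, region, v, forecast_date) for v in NVARS]
```

This enumerates six files per forecast date, one per pollutant species.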

Start Script

This is a basic script that could be used for the AQWatch data handler. The script does not set up the NN explicitly, but it can be used to check whether the workflow runs through.

__author__ = "Lukas Leufen"
__date__ = '2022-05-18'

import argparse
import os
import sys
sys.path.append("<abs_path_to_mlair>")  # replace with the absolute path to your MLAir checkout

from mlair.workflows import DefaultWorkflow
from mlair.data_handler.data_handler_aqwatch import DataHandlerAQWatch, DataHandlerAQWatchSingleStation


def main(parser_args):
    args = dict(data_handler=DataHandlerAQWatch,
                interpolation_limit=3, overwrite_local_data=False,
                overwrite_lazy_data=True,
                lazy_preprocessing=True,
                train_min_length=0,  # just replace defaults which are 90
                val_min_length=0,  # just replace defaults which are 90
                test_min_length=0,  # just replace defaults which are 90
                window_history_size=0,  # has to be 0 to indicate t0
                window_lead_time=0,  # has to be 0 to indicate t0
                start="2022-05-01",  # start and train_start should be the same
                train_start="2022-05-01",
                train_end="2022-05-02",
                val_start="2022-05-02",
                val_end="2022-05-10",
                test_start="2022-05-10",
                test_end="2022-05-20",
                end="2022-05-20",  # end and test_end should be the same
                region="colorado",  # specify the region
                variables=["no2"],  # this sets your variable, currently it is not possible to use more than one
                target_var=["no2"],   # this sets your variable, currently it is not possible to use more than one
                ctm_list=["test_ctm", "test2_ctm"],  # name models to use
                competitors=["aqw_ens_mean"],
                sampling="hourly",
                #stations=['80050006', '80131001', '80130014', '80350004', '80410017', '80410015', '80410013',
                #          '80830006', '80310002', '80310013', '80691004', '80690011', '80690009', '80310028',
                #          '80590011', '80519991', '80770017', '81230009', '81230006', '80050002', '80310027',
                #          '80310026', '80130003', '80410016', '80830101', '80770020', '81030006', '80450012',
                #          '80450007', '80590006', '80690007', '80699991', '80677001', '80677003', '80013001',
                #          '80970008'],
                stations=["Aurora East", "Boulder Reservoir","Chatfield Park - 11500 N. Roxborough Park Rd.","Colorado Springs - USAF Academy","Cortez Ozone","Denver - CAMP - 2105 Broadway","Fort Collins - CSU - 708 S. Mason St.","Fort Collins - West - Laporte Ave. & Overland Tr.","Golden - NREL - 2054 Quaker St.","Gothic","Greeley - Weld Co. Tower - 3101 35th Ave.","Highland Reservoir - 8100 S. University Blvd.","La Casa NCORE - 4545 Navajo St.","Manitou Springs","Mesa Verde NP","Palisade Ozone","Rangely, CO","Rifle Ozone","Rocky Flats - N - 16600 W. Colo. Hwy. 128","Rocky Mountain NP","Ute 1","Ute 3","Welby - 78th Ave. & Steele St."],
                transformation={
                    "o3": {"method": "log"},
                    "no": {"method": "log"},
                    "no2": {"method": "log"},
                },
                data_path=os.path.join(".", "data", "aqw_data"),  # <- root folder of data containing obs and mod
                **parser_args.__dict__,
                )
    workflow = DefaultWorkflow(**args, start_script=__file__)
    workflow.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--experiment_date', metavar='--exp_date', type=str, default=None,
                        help="set experiment date as string")
    args = parser.parse_args(["--experiment_date", "testrun"])
    main(args)

TODOs

  • include the competitor: maybe write a class that can load the AQWatch data (and store it), similar to the IntelliO3 competitor
  • check if the workflow works from beginning to end
  • be able to load metadata such as lon/lat from meta files such as stations_colo.csv
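For the last TODO, a minimal sketch of reading station metadata from stations_colo.csv with the standard csv module. The column names station_id/lon/lat are assumptions, since the actual header of the file is not shown in this issue:

```python
import csv
import io

def load_station_meta(csv_file):
    """Parse a stations_colo.csv-like file into {station_id: (lon, lat)}.
    NOTE: the column names 'station_id', 'lon', 'lat' are assumptions;
    adjust them to the real header of stations_colo.csv."""
    reader = csv.DictReader(csv_file)
    return {row["station_id"]: (float(row["lon"]), float(row["lat"]))
            for row in reader}

# usage with an in-memory example (real code would pass an open file handle)
sample = io.StringIO("station_id,lon,lat\n80050006,-104.93,39.64\n")
meta = load_station_meta(sample)
```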
Edited Jun 23, 2022 by Wing Yi Li