AQW data handler
Data Handler for AQWatch
Data handler designed for the work of @li40.
Structure of Data
Inputs: forecasts from CTMs
- root folder:
mod
- structured per region of interest (each will be a separate experiment with a different NN), e.g.
/colorado
- the number and names of CTMs differ by region; the ensemble mean is always present (not used as input for now), e.g.
/lotos_tno
- data are already interpolated on station level
- data are stored per forecast date
/$YYYYMMDD
- single file per species, e.g.
$nvar_TNO_inp.nc
- data files are structured as follows: first dimension is time (model time UTC), second dimension is station (named by station long name).
Targets: observations from measurement stations
- root folder:
/obs/obs_download_scripts
- structured per region of interest (each will be a separate experiment with a different NN), e.g.
/Colorado
- data are inside data folder
/Data
- single file per species and date (including all stations and 24h), e.g.
obs_$nvar_$YYYYMMDD.nc
- data files are structured as follows: index is date_utc, columns are stations indicated by id (size is timesteps x stations).
Competitor: ensemble mean calculated over all CTM forecasts
- root folder:
mod
- structured per region of interest (each will be a separate experiment with a different NN), e.g.
/colorado
- stored in directory
/ens
- structure same as for inputs
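As a rough illustration of what "ensemble mean calculated over all CTM forecasts" amounts to, the sketch below averages several forecasts element-wise over a time x station grid. It is not the code that produces the files in /ens. The function name `ensemble_mean`, the nested-list layout, and the choice to skip NaN values are assumptions made for this example.

```python
import math


def ensemble_mean(forecasts):
    """Average several CTM forecasts element-wise.

    `forecasts` maps a CTM name to a 2-D list indexed as [time][station].
    All CTMs are assumed to share the same shape. NaN values are skipped
    (an assumption of this sketch, not documented project behaviour).
    """
    models = list(forecasts.values())
    n_time, n_station = len(models[0]), len(models[0][0])
    mean = []
    for t in range(n_time):
        row = []
        for s in range(n_station):
            values = [m[t][s] for m in models if not math.isnan(m[t][s])]
            row.append(sum(values) / len(values) if values else math.nan)
        mean.append(row)
    return mean


# Made-up numbers for three CTMs, two time steps and two stations.
example = {
    "lotos_tno": [[10.0, 20.0], [30.0, 40.0]],
    "silam_fmi": [[12.0, 22.0], [math.nan, 44.0]],
    "wrf_ucar": [[14.0, 24.0], [34.0, 48.0]],
}
ens = ensemble_mean(example)
```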
|-- mod
|   `-- colorado
|       |-- ens (MEAN of CTM#1-3)
|       |   `-- $YYYYMMDD (%forecast_date)
|       |       `-- interpolated
|       |           `-- $nvar_ensmean_inp.nc  # time series of ensemble mean (competitor)
|       |-- lotos_tno (CTM#1)
|       |   |-- $YYYYMMDD (%forecast_date)
|       |   |   `-- interpolated
|       |   |       `-- $nvar_TNO_inp.nc  # time series of CTM#1 (input feature)
|       |   `-- stations_colo.csv  # Station info
|       |-- silam_fmi (CTM#2)
|       |   |-- $YYYYMMDD (%forecast_date)
|       |   |   `-- interpolated
|       |   |       `-- $nvar_FMI_inp.nc  # time series of CTM#2 (input feature)
|       |   `-- stations_colo.csv
|       `-- wrf_ucar (CTM#3)
|           |-- $YYYYMMDD (%forecast_date)
|           |   `-- interpolated
|           |       `-- $nvar_UCAR_inp.nc  # time series of CTM#3 (input feature)
|           `-- stations_colo.csv
`-- obs
    `-- obs_download_scripts
        `-- Colorado
            |-- stations_colo.csv
            `-- Data
                `-- obs_$nvar_$YYYYMMDD.nc (%obs_date)  # time series of observation (target)
$nvar = ['co','so2','no2','o3','pm10','pm25'] # pollutant species available
%forecast_date = current day
%obs_date = one day before current day
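The layout above can be turned into concrete file paths. The following is a minimal sketch that assumes the directory and file names shown in the tree. The helpers `forecast_file` and `observation_file` and the lookup `CTM_FILE_TAG` are hypothetical names introduced for this example and are not part of the data handler.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

# File tag used in each CTM directory, as read from the tree
# (the mapping itself is an assumption of this sketch).
CTM_FILE_TAG = {
    "lotos_tno": "TNO",
    "silam_fmi": "FMI",
    "wrf_ucar": "UCAR",
    "ens": "ensmean",
}


def forecast_file(root, region, ctm, species, forecast_date):
    """Path of one CTM (or ensemble mean) file for a given forecast date."""
    day = forecast_date.strftime("%Y%m%d")
    name = f"{species}_{CTM_FILE_TAG[ctm]}_inp.nc"
    return PurePosixPath(root, "mod", region, ctm, day, "interpolated", name)


def observation_file(root, region_dir, species, forecast_date):
    """Path of the observation file, dated one day before the forecast date."""
    obs_day = (forecast_date - timedelta(days=1)).strftime("%Y%m%d")
    name = f"obs_{species}_{obs_day}.nc"
    return PurePosixPath(root, "obs", "obs_download_scripts", region_dir, "Data", name)


today = date(2022, 5, 18)
inp = forecast_file("data/aqw_data", "colorado", "lotos_tno", "no2", today)
obs = observation_file("data/aqw_data", "Colorado", "no2", today)
```

Note that the region directory is lower case under mod (colorado) but capitalised under obs (Colorado), which is why the two helpers take it as a separate argument.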
Start Script
This is a basic script that can be used with the AQWatch data handler. It does not set up the NN explicitly, but it can be used to check whether the workflow runs through.
__author__ = "Lukas Leufen"
__date__ = '2022-05-18'

import argparse
import os
import sys

sys.path.append("<abs_path_to_mlair>")

from mlair.workflows import DefaultWorkflow
from mlair.data_handler.data_handler_aqwatch import DataHandlerAQWatch, DataHandlerAQWatchSingleStation


def main(parser_args):
    args = dict(data_handler=DataHandlerAQWatch,
                interpolation_limit=3, overwrite_local_data=False,
                overwrite_lazy_data=True,
                lazy_preprocessing=True,
                train_min_length=0,  # just replace defaults which are 90
                val_min_length=0,  # just replace defaults which are 90
                test_min_length=0,  # just replace defaults which are 90
                window_history_size=0,  # has to be 0 to indicate t0
                window_lead_time=0,  # has to be 0 to indicate t0
                start="2022-05-01",  # start and train_start should be the same
                train_start="2022-05-01",
                train_end="2022-05-02",
                val_start="2022-05-02",
                val_end="2022-05-10",
                test_start="2022-05-10",
                test_end="2022-05-20",
                end="2022-05-20",  # end and test_end should be the same
                region="colorado",  # specify the region
                variables=["no2"],  # this sets your variable, currently it is not possible to use more than one
                target_var=["no2"],  # this sets your variable, currently it is not possible to use more than one
                ctm_list=["test_ctm", "test2_ctm"],  # name models to use
                competitors=["aqw_ens_mean"],
                sampling="hourly",
                # stations=['80050006', '80131001', '80130014', '80350004', '80410017', '80410015', '80410013',
                #           '80830006', '80310002', '80310013', '80691004', '80690011', '80690009', '80310028',
                #           '80590011', '80519991', '80770017', '81230009', '81230006', '80050002', '80310027',
                #           '80310026', '80130003', '80410016', '80830101', '80770020', '81030006', '80450012',
                #           '80450007', '80590006', '80690007', '80699991', '80677001', '80677003', '80013001',
                #           '80970008'],
                stations=["Aurora East", "Boulder Reservoir",
                          "Chatfield Park - 11500 N. Roxborough Park Rd.",
                          "Colorado Springs - USAF Academy", "Cortez Ozone",
                          "Denver - CAMP - 2105 Broadway",
                          "Fort Collins - CSU - 708 S. Mason St.",
                          "Fort Collins - West - Laporte Ave. & Overland Tr.",
                          "Golden - NREL - 2054 Quaker St.", "Gothic",
                          "Greeley - Weld Co. Tower - 3101 35th Ave.",
                          "Highland Reservoir - 8100 S. University Blvd.",
                          "La Casa NCORE - 4545 Navajo St.", "Manitou Springs",
                          "Mesa Verde NP", "Palisade Ozone", "Rangely, CO", "Rifle Ozone",
                          "Rocky Flats - N - 16600 W. Colo. Hwy. 128",
                          "Rocky Mountain NP", "Ute 1", "Ute 3",
                          "Welby - 78th Ave. & Steele St."],
                transformation={
                    "o3": {"method": "log"},
                    "no": {"method": "log"},
                    "no2": {"method": "log"}, },
                data_path=os.path.join(".", "data", "aqw_data"),  # <- root folder of data containing obs and mod
                **parser_args.__dict__,
                )
    workflow = DefaultWorkflow(**args, start_script=__file__)
    workflow.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--experiment_date', metavar='--exp_date', type=str, default=None,
                        help="set experiment date as string")
    args = parser.parse_args(["--experiment_date", "testrun"])
    main(args)
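The comments in the script state that start must equal train_start and end must equal test_end. A small pre-flight check along these lines could catch a mismatch before the workflow starts. This is only a sketch: `check_split_dates` is a hypothetical helper, not an MLAir function, and the extra check that each subset starts before it ends is an assumption added for illustration.

```python
from datetime import date


def check_split_dates(cfg):
    """Return a list of problems found in the date settings of a run configuration."""
    problems = []
    if cfg["start"] != cfg["train_start"]:
        problems.append("start and train_start differ")
    if cfg["end"] != cfg["test_end"]:
        problems.append("end and test_end differ")
    for subset in ("train", "val", "test"):
        begin = date.fromisoformat(cfg[f"{subset}_start"])
        finish = date.fromisoformat(cfg[f"{subset}_end"])
        if begin > finish:
            problems.append(f"{subset}_start is after {subset}_end")
    return problems


# Same dates as in the start script.
settings = dict(
    start="2022-05-01", train_start="2022-05-01", train_end="2022-05-02",
    val_start="2022-05-02", val_end="2022-05-10",
    test_start="2022-05-10", test_end="2022-05-20", end="2022-05-20",
)
problems = check_split_dates(settings)
```

With the dates from the start script the returned list is empty; changing only `end` would report that end and test_end differ.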
TODOs
- include competitor: maybe write a class that loads (and stores) the AQWatch data, similar to the IntelliO3 competitor
- check if the workflow works from beginning to end
- be able to load metadata such as lon/lat from meta files such as stations_colo.csv
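For the metadata item, one possible starting point is a plain CSV reader that maps each station to its coordinates. The column layout of stations_colo.csv is not described here, so the column names are passed in as arguments. `load_station_coordinates` and the sample content below are made up for illustration only.

```python
import csv
import io


def load_station_coordinates(handle, id_column, lon_column, lat_column):
    """Map station identifier -> (lon, lat) from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return {
        row[id_column]: (float(row[lon_column]), float(row[lat_column]))
        for row in reader
    }


# Made-up sample; the real header of stations_colo.csv may differ.
sample = io.StringIO("station_id,lon,lat\nA1,-104.8,39.6\nB2,-105.2,40.0\n")
coords = load_station_coordinates(sample, "station_id", "lon", "lat")
```

For a real file, the handle would come from `open(path, newline="")` and the three column names would be replaced by whatever the file actually uses.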
Edited by Wing Yi Li