produce_data_many_years_withOptional.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example with optional parameters\n",
    "Toargridding has a number of required arguments for a dataset. Those include the time range, variable and statistical analysis. The TAOR-DB has a large number of metadata fileds that can be used to further refine this request.\n",
    "A python dictionary can be provided to include theses other fields. The analysis service provides an error message, if the requested parameters does not exist (check for typos) or if the provided value is wrong.\n",
    "\n",
    "In this example we want to obtain data from 2000 to 2018 (maybe change this, if you want your results faster:-)).\n",
    "\n",
    "The fist block contains the includes and the setup of the logging."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime as dt\n",
    "from collections import namedtuple\n",
    "from pathlib import Path\n",
    "\n",
    "from toargridding.toar_rest_client import AnalysisServiceDownload, Connection\n",
    "from toargridding.grids import RegularGrid\n",
    "from toargridding.gridding import get_gridded_toar_data\n",
    "from toargridding.metadata import TimeSample"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now want to include packages for logging. We want so see some output in the shell, we want to log exceptions and we maybe want to have a logfile to review everything later:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import logging\n",
    "from toargridding.defaultLogging import toargridding_defaultLogging\n",
    "\n",
    "#setup of logging\n",
    "logger = toargridding_defaultLogging()\n",
    "logger.addShellLogger(logging.DEBUG)\n",
    "logger.logExceptions()\n",
    "logger.addRotatingLogFile(Path(\"log/produce_data_withOptional.log\"))#we need to explicitly set a logfile\n",
    "#logger.addSysLogger(logging.DEBUG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation of requests\n",
    "in the next block we prepare the request to the analysis service.\n",
    "The dictionary details4Query adds additional requirements to the request. Here, the two fields *toar1_category* and *type_of_area* are used. Both stand for a classification of stations depending on the surrounding area. It is advised to use only one at once. \n",
    "\n",
    "\n",
    "moreOptions is implemented as a dict to add additional arguments to the query to the REST API\n",
    "For example the field *toar1_category* with its possible values Urban, RuralLowElevation, RuralHighElevation and Unclassified can be added \n",
    "(see page 18 in https://toar-data.fz-juelich.de/sphinx/TOAR_UG_Vol03_Database/build/latex/toardatabase--userguide.pdf).\n",
    "Or *type_of_area* with urban, suburban and rural on page 20 can be used.\n",
    "\n",
    "There are a many more metadata available in the user guide, feel free to look around."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#creation of request.\n",
    "# helper to keep the configuration together\n",
    "Config = namedtuple(\"Config\", [\"grid\", \"time\", \"variables\", \"stats\",\"moreOptions\"])\n",
    "\n",
    "#uncomment, what you want to test:-)\n",
    "details4Query ={\n",
    "    #\"toar1_category\" : \"Urban\" #uncomment if wished:-)\n",
    "    #\"toar1_category\" : \"RuralLowElevation\" #uncomment if wished:-)\n",
    "    #\"toar1_category\" : \"RuralHighElevation\" #uncomment if wished:-)\n",
    "    #\"type_of_area\" : \"Urban\" #also test Rural, Suburban,\n",
    "    \"type_of_area\" : \"Rural\" #also test Rural, Suburban,\n",
    "    #\"type_of_area\" : \"Suburban\" #also test Rural, Suburban,\n",
    "}\n",
    "\n",
    "#a regular grid with 1.9°x2.5° resolution. A warning will be issued as 1.9° does not result in  natural number of grid cells.\n",
    "grid = RegularGrid( lat_resolution=1.9, lon_resolution=2.5, )\n",
    "\n",
    "configs = dict()\n",
    "\n",
    "# we split the request into one request per year.\n",
    "for year in range(0,19):\n",
    "    valid_data = Config(\n",
    "        grid,\n",
    "        TimeSample( start=dt(2000+year,1,1), end=dt(2000+year,12,31), sampling=\"daily\"),#possibly adopt range:-)\n",
    "        #TimeSample( start=dt(2000+year,1,1), end=dt(2000+year,12,31), sampling=\"monthly\"),#possibly adopt range:-)\n",
    "        [\"mole_fraction_of_ozone_in_air\"],#variable name\n",
    "        #[ \"mean\", \"dma8epax\"],# will start one request after another other...\n",
    "        [ \"mean\" ],\n",
    "        details4Query\n",
    "    )\n",
    "    \n",
    "    configs[f\"test_ta{year}\"] = valid_data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now need to setup the out connection to the analysis service of the TOAR database. We also setup directories to store the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats_endpoint = \"https://toar-data.fz-juelich.de/api/v2/analysis/statistics/\"\n",
    "cache_basepath = Path(\"cache\")\n",
    "result_basepath = Path(\"results\")\n",
    "cache_basepath.mkdir(exist_ok=True)\n",
    "result_basepath.mkdir(exist_ok=True)\n",
    "analysis_service = AnalysisServiceDownload(stats_endpoint=stats_endpoint, cache_dir=cache_basepath, sample_dir=result_basepath, use_downloaded=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download and gridding:\n",
    "Now we come to the last step: we want to download the data, process them and store them to disc.\n",
    "\n",
    "CAVE: this cell runs about 45minutes per requested year. therefore we increase the waiting duration to 1h per request.\n",
    "the processing is done on the server of the TOAR database.\n",
    "A restart of the cell continues the request to the REST API. Data are cached on the local computer to prevent repetitive downloads.\n",
    "The download can also take a few minutes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# maybe adopt the interval for requesting the results and the total duration, before the client pauses the requests.\n",
    "# as the requests take about 45min, it is more suitable to wait 60min before timing out the requests than the original 30min.\n",
    "analysis_service.connection.set_request_times(interval_min=5, max_wait_minutes=60)\n",
    "\n",
    "for person, config in configs.items():\n",
    "    print(f\"\\nProcessing {person}:\")\n",
    "    print(f\"--------------------\")\n",
    "    datasets, metadatas = get_gridded_toar_data(\n",
    "        analysis_service=analysis_service,\n",
    "        grid=config.grid,\n",
    "        time=config.time,\n",
    "        variables=config.variables,\n",
    "        stats=config.stats,\n",
    "        contributors_path=result_basepath,\n",
    "        **config.moreOptions\n",
    "    )\n",
    "\n",
    "    for dataset, metadata in zip(datasets, metadatas):\n",
    "        dataset.to_netcdf(result_basepath / f\"{metadata.get_id()}_{config.grid.get_id()}.nc\")\n",
    "        print(metadata.get_id())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "toargridding-8RVrxzmn-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}