03_produce_data_station_metadata.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example with optional parameters\n",
    "Toargridding has a number of required arguments for a dataset. Those include the time range, variable and statistical analysis. The TAOR-DB has a large number of metadata fileds that can be used to further refine this request.\n",
    "A python dictionary can be provided to include theses other fields. The analysis service provides an error message, if the requested parameters does not exist (check for typos) or if the provided value is wrong.\n",
    "\n",
    "In this example we want to obtain data from 2012.\n",
    "\n",
    "The fist block contains the includes and the setup of the logging."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### inclusion of packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "from datetime import datetime as dt\n",
    "from collections import namedtuple\n",
    "from pathlib import Path\n",
    "\n",
    "from toargridding.toar_rest_client import AnalysisServiceDownload, Connection\n",
    "from toargridding.grids import RegularGrid\n",
    "from toargridding.gridding import get_gridded_toar_data\n",
    "from toargridding.metadata import TimeSample\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Setup of logging\n",
    "\n",
    "In the next step we setup the logging, i.e. the level of information that is displayed as output. \n",
    "\n",
    "We start with a default setup and restrict the output to information and more critical output like warnings and errors.\n",
    "\n",
    "We also add logging to a file. This will create a new log file at midnight and keep up to 7 log files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from toargridding.defaultLogging import toargridding_defaultLogging\n",
    "\n",
    "logger = toargridding_defaultLogging()\n",
    "logger.addShellLogger(logging.INFO)\n",
    "logger.logExceptions()\n",
    "log_path = Path(\"log\")\n",
    "log_path.mkdir(exist_ok=True)\n",
    "logger.addRotatingLogFile( log_path / \"produce_data_station_metadata.log\")#we need to explicitly set a logfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Setting up the analysis\n",
    "\n",
    "We need to prepare our connection to the analysis service of the toar database, which will provide us with temporal and statistical aggregated data. \n",
    "Besides the url of the service, we also need to setup two directories on our computer:\n",
    "- one to save the data provided by the analysis service (called cache)\n",
    "- a second to store our gridded dataset (called results)\n",
    "Those will be found in the directory examples/cache and examples/results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "stats_endpoint = \"https://toar-data.fz-juelich.de/api/v2/analysis/statistics/\"\n",
    "cache_basepath = Path(\"cache\")\n",
    "result_basepath = Path(\"results\")\n",
    "cache_basepath.mkdir(exist_ok=True)\n",
    "result_basepath.mkdir(exist_ok=True)\n",
    "analysis_service = AnalysisServiceDownload(stats_endpoint=stats_endpoint, cache_dir=cache_basepath, sample_dir=result_basepath, use_downloaded=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our following request will take some time, so we edit the durations between two checks, if our data are ready for download and the maximum duration for checking.\n",
    "We will check every 45min for 12h. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analysis_service.connection.set_request_times(interval_min=15, max_wait_minutes=12*60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Preparation of requests with station metadata\n",
    "\n",
    "We restrict our request to one year and of daily mean ozone data. In addition we would like to only include urban stations.\n",
    "\n",
    "We use a container class to keep the configurations together (type: namedtuple).\n",
    "\n",
    "We also want to refine our station selection by using further metadata.\n",
    "Therefore, we create the `station_metadata` dictionary. We can use the further metadata stored in the TOAR-DB by providing their name and our desired value. This also discards stations, without a provided value for a metadata field. We can find information on different metadata values in the [documentation](https://toar-data.fz-juelich.de/sphinx/TOAR_UG_Vol03_Database/build/latex/toardatabase--userguide.pdf). For example for the *toar1_category* on page 18 and for the *type_of_area* on page 20.\n",
    "\n",
    "We can use this to filter for all additional metadata, which are supported by the [statistics endpoint of the analysis service](https://toar-data.fz-juelich.de/api/v2/analysis/#statistics), namely station metadata and timeseries metadata. \n",
    "\n",
    "In the end we have wo requests, that we want to submit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Config = namedtuple(\"Config\", [\"grid\", \"time\", \"variables\", \"stats\", \"station_metadata\"])\n",
    "\n",
    "#uncomment, if you want to change the metadata:\n",
    "station_metadata ={\n",
    "    #\"toar1_category\" : \"Urban\" #uncomment if wished:-)\n",
    "    \"type_of_area\" : \"Urban\" #also test Rural, Suburban,\n",
    "}\n",
    "\n",
    "grid = RegularGrid( lat_resolution=1.9, lon_resolution=2.5, )\n",
    "\n",
    "configs = dict()\n",
    "request_config = Config(\n",
    "    grid,\n",
    "    TimeSample( start=dt(2012,1,1), end=dt(2012,12,31), sampling=\"daily\"),\n",
    "    [\"mole_fraction_of_ozone_in_air\"],\n",
    "    [ \"mean\" ],\n",
    "    station_metadata\n",
    ")\n",
    "configs[f\"test_ta\"] = request_config\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### execution of toargridding and saving of results \n",
    "Now we want to request the data from the TOAR analysis service and create the gridded dataset.\n",
    "Therefore, we call the function `get_gridded_toar_data` with everything we have prepared until now.\n",
    "\n",
    "The request will be submitted to the analysis service, which will process the request. On our side, we will check in intervals, if the processing is finished. After several request, we will stop checking. The setup for this can be found a few cells above.\n",
    "A restart of this cell allows to continue the look-up, if the data are available.\n",
    "\n",
    "The obtained data are stored in the result directory (`results_basepath`). Before submitting a request, toargridding checks his cache, if the data have already been downloaded.\n",
    "\n",
    "Last but not least, we want to save our dataset as netCDF file.\n",
    "In the global metadata of this file we can find a recipe on how to obtain a list of contributors with the contributors file created by `get_gridded_toar_data`. This function also creates the required  file with the extension \"*.contributors\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "for config_id, config in configs.items():\n",
    "    print(f\"\\nProcessing {config_id}:\")\n",
    "    print(f\"--------------------\")\n",
    "    datasets, metadatas = get_gridded_toar_data(\n",
    "        analysis_service=analysis_service,\n",
    "        grid=config.grid,\n",
    "        time=config.time,\n",
    "        variables=config.variables,\n",
    "        stats=config.stats,\n",
    "        contributors_path=result_basepath,\n",
    "        **config.station_metadata\n",
    "    )\n",
    "\n",
    "    for dataset, metadata in zip(datasets, metadatas):\n",
    "        dataset.to_netcdf(result_basepath / f\"{metadata.get_id()}_{config.grid.get_id()}.nc\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.21"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}