00_download_and_visualization.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get Dataset from request\n",
    "\n",
    "This cell imports all required packages and sets up the logging as well as the required information for the requests to the TOAR-DB.\n",
    "\n",
    "We will receive a warning as the lateral resolution of 1.9° does not result in a natural number of cells. Therefore, it is slightly adopted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime as dt\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from toargridding.grids import RegularGrid\n",
    "from toargridding.toar_rest_client import (\n",
    "    AnalysisServiceDownload,\n",
    "    STATION_LAT,\n",
    "    STATION_LON,\n",
    ")\n",
    "from toargridding.metadata import Metadata, TimeSample, AnalysisRequestResult, Coordinates\n",
    "from toargridding.variables import Coordinate\n",
    "\n",
    "from toargridding.contributors import contributions_manager_by_id\n",
    "\n",
    "import logging\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Setup of logging\n",
    "\n",
    "In the next step we setup the logging, i.e. the level of information that is displayed as output. \n",
    "\n",
    "We start with a default setup and restrict the output to information and more critical output like warnings and errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from toargridding.defaultLogging import toargridding_defaultLogging\n",
    "\n",
    "logger = toargridding_defaultLogging()\n",
    "logger.addShellLogger(logging.INFO)\n",
    "logger.logExceptions()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### selection of data and grid:\n",
    "We need to select a temporal aggregation by selecting one week of data with a daily sampling.\n",
    "With this sampling we define our metadata for the request. As variable we select ozone and as statistical aggregation a mean. \n",
    "\n",
    "The last step is the definition of the grid. We select a resolution of 2° in latitude and 2.5° in longitude."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "time = TimeSample(dt(2016,1,1), dt(2016,1,8), \"daily\")\n",
    "metadata = Metadata.construct(\"mole_fraction_of_ozone_in_air\", time, \"mean\")\n",
    "my_grid = RegularGrid(2.0, 2.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Setting up the analysis\n",
    "\n",
    "We need to prepare our connection to the analysis service of the toar database, which will provide us with temporal and statistical aggregated data. \n",
    "Besides the url of the service, we also need to setup two directories on our computer:\n",
    "- one to save the data provided by the analysis service (called cache)\n",
    "- a second to store our gridded dataset (called results)\n",
    "Those will be found in the directory examples/cache and examples/results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "endpoint = \"https://toar-data.fz-juelich.de/api/v2/analysis/statistics/\"\n",
    "#starts in directory [path/to/toargridding]/examples\n",
    "toargridding_base_path = Path(\".\")\n",
    "cache_dir = toargridding_base_path / \"cache\"\n",
    "data_download_dir = toargridding_base_path / \"results\"\n",
    "\n",
    "cache_dir.mkdir(exist_ok=True)\n",
    "data_download_dir.mkdir(exist_ok=True)\n",
    "analysis_service = AnalysisServiceDownload(endpoint, cache_dir, data_download_dir, use_downloaded=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download of data and writing to disc:\n",
    "In the next step we want to download the data and store them to disc.\n",
    "\n",
    "To obtain the contributors for this dataset, we need to create a dedicated file. This can be uploaded to the TOAR database to obtain a preformatted list of contributors. The required recipe can be found in the global metadata of the netCDF file.\n",
    "\n",
    "The request the database can take several minutes. This duration is also dependent on the overall usage of the services. The `get_data` function checks every 5minutes, if the data are ready for download. After 30min this cell stops the execution. Simply restart this cell to continue checking for the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this cell can run longer than 30minutes\n",
    "data = analysis_service.get_data(metadata)\n",
    "\n",
    "# create contributors endpoint and write result to metadata\n",
    "contrib = contributions_manager_by_id(metadata.get_id(), data_download_dir)\n",
    "contrib.extract_contributors_from_data_frame(data.stations_data)\n",
    "metadata.contributors_metadata_field = contrib.setup_contributors_endpoint_for_metadata()\n",
    "\n",
    "ds = my_grid.as_xarray(data)\n",
    "#store dataset\n",
    "out_file_name = data_download_dir / f\"{metadata.get_id()}_{my_grid.get_id()}.nc\"\n",
    "ds.to_netcdf(out_file_name)\n",
    "\n",
    "print(\"Gridded data have been written to \", out_file_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visual inspection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are working here with raw data and also want to visualize the station positions. Therefore, we want to distinguish stations that have valid data and those without valid data.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#calculation of coordinates for plotting\n",
    "#especially separation of coordinates with results and without results.\n",
    "\n",
    "mean_data = ds[\"mean\"]\n",
    "clean_coords = data.stations_coords\n",
    "all_na = data.stations_data.isna().all(axis=1)\n",
    "clean_coords = all_na.to_frame().join(clean_coords)[[\"latitude\", \"longitude\"]]\n",
    "all_na_coords = clean_coords[all_na]\n",
    "not_na_coords = clean_coords[~all_na]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the next step we prepare a function for plotting the gridded data to a world map. The flag *discrete* influences the creation of the color bar. The *plot_stations* flag allows including the station positions into the map."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cartopy.crs as ccrs\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.ticker as mticker\n",
    "import numpy as np\n",
    "\n",
    "#definition of plotting function\n",
    "\n",
    "def plot_cells(data, stations, na_stations, discrete=True, plot_stations=False):\n",
    "    fig = plt.figure(figsize=(9, 18))\n",
    "\n",
    "    ax = plt.axes(projection=ccrs.PlateCarree())\n",
    "    ax.coastlines()\n",
    "    gl = ax.gridlines(draw_labels=True)\n",
    "    gl.top_labels = False\n",
    "    gl.left_labels = False\n",
    "    gl.xlocator = mticker.FixedLocator(data.longitude.values)\n",
    "    gl.ylocator = mticker.FixedLocator(data.latitude.values)\n",
    "\n",
    "    cmap = mpl.cm.viridis\n",
    "\n",
    "    if discrete:\n",
    "        print(np.unique(data.values))\n",
    "        bounds = np.arange(8)\n",
    "        norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend=\"both\")\n",
    "        ticks = np.arange(bounds.size + 1)[:-1] + 0.5\n",
    "        ticklables = bounds\n",
    "        \n",
    "        im = plt.pcolormesh(\n",
    "            data.longitude,\n",
    "            data.latitude,\n",
    "            data,\n",
    "            transform=ccrs.PlateCarree(),\n",
    "            cmap=cmap,\n",
    "            shading=\"nearest\",\n",
    "            norm=norm,\n",
    "        )\n",
    "        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)\n",
    "        cb.set_ticks(ticks)\n",
    "        cb.set_ticklabels(ticklables)\n",
    "        im = plt.pcolormesh(\n",
    "            data.longitude,\n",
    "            data.latitude,\n",
    "            data,\n",
    "            transform=ccrs.PlateCarree(),\n",
    "            cmap=cmap,\n",
    "            shading=\"nearest\",\n",
    "            norm=norm,\n",
    "        )\n",
    "    else:\n",
    "        im = plt.pcolormesh(\n",
    "            data.longitude,\n",
    "            data.latitude,\n",
    "            data,\n",
    "            transform=ccrs.PlateCarree(),\n",
    "            cmap=cmap,\n",
    "            shading=\"nearest\",\n",
    "        )\n",
    "\n",
    "        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)\n",
    "    \n",
    "\n",
    "    if plot_stations:\n",
    "        plt.scatter(na_stations[\"longitude\"], na_stations[\"latitude\"], s=1, c=\"k\")\n",
    "        plt.scatter(stations[\"longitude\"], stations[\"latitude\"], s=1, c=\"r\")\n",
    "\n",
    "    plt.tight_layout()\n",
    "\n",
    "    plt.title(f\"global ozon at {data.time.values} {data.time.units}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we do the actual plotting. We select a single time from the dataset. To obtain two maps: 1) the mean ozone concentration per grid point and second the number of stations contributing to a grid point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#example visualization for two time points\n",
    "print(not_na_coords)\n",
    "timestep = 2\n",
    "time = ds.time[timestep]\n",
    "data = ds.sel(time=time)\n",
    "\n",
    "plot_cells(data[\"mean\"], not_na_coords, all_na_coords, discrete=False, plot_stations=True)\n",
    "plt.show()\n",
    "\n",
    "plot_cells(data[\"n\"], not_na_coords, all_na_coords, discrete=True)\n",
    "plt.show()\n",
    "\n",
    "n_observations = ds[\"n\"].sum([\"latitude\", \"longitude\"])\n",
    "plt.plot(ds.time, n_observations)\n",
    "print(np.unique(ds[\"n\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last but not least: We print the data and metadata of the dataset. Especially a look into the metadata can be interesting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(data)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "toargridding-g-KQ1Hyq-py3.10",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}