Commit 4a5c89ab authored by Carsten Hinz's avatar Carsten Hinz

added script as fallback for notebook.

added comment to README.
parent 5c8a904f
@@ -132,6 +132,10 @@ The aim of this first example is the creation of a gridded dataset and the visualization
For the visualization, `cartopy` is required, which might need dependencies that are not installed by `pip`. If you experience any issues, do not hesitate to continue with the next examples.
If you are still curious about the results, we have uploaded the resulting map as the title image of this README.
Caveat: For a specific version and case, we observed that the notebook runs into an exception from `matplotlib.pyplot.pcolormesh`. The error message complains about the use of 'flat' instead of 'nearest' shading, although 'nearest' is passed to the plot function; the value seems to change at some point after the call. We have not investigated this further.
There is also an export called `examples/00_download_and_vis_export.py`, which runs without issues.
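If you run into this exception, one possible workaround (a sketch only, not verified across matplotlib versions) is to pin the default shading before plotting:

```python
import matplotlib as mpl

# force 'nearest' as the default pcolormesh shading, so matplotlib does not
# fall back to 'flat', which expects cell-edge instead of cell-center coordinates
mpl.rcParams["pcolor.shading"] = "nearest"
```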
## 01: Retrieval of one week
```bash
jupyter notebook examples/01_produce_data_one_week.ipynb
......
# %% [markdown]
# ### Get Dataset from request
#
# This cell imports all required packages and sets up the logging as well as the required information for the requests to the TOAR-DB.
#
# We will receive a warning, as a lateral resolution of 1.9° does not result in a whole number of cells (for example, 180° / 1.9° ≈ 94.7); it is therefore slightly adjusted.
# %%
from datetime import datetime as dt
from pathlib import Path
import pandas as pd
from toargridding.grids import RegularGrid
from toargridding.toar_rest_client import (
AnalysisServiceDownload,
STATION_LAT,
STATION_LON,
)
from toargridding.metadata import Metadata, TimeSample, AnalysisRequestResult, Coordinates
from toargridding.variables import Coordinate
from toargridding.contributors import contributions_manager_by_id
import logging
# %% [markdown]
# #### Setup of logging
#
# In the next step we set up the logging, i.e. the level of information that is displayed as output.
#
# We start with a default setup and restrict the output to informational messages and more critical output such as warnings and errors.
# %%
from toargridding.defaultLogging import toargridding_defaultLogging
logger = toargridding_defaultLogging()
logger.addShellLogger(logging.INFO)
logger.logExceptions()
# %% [markdown]
# #### Selection of data and grid
# We select a temporal aggregation: one week of data with daily sampling.
# With this sampling we define the metadata for our request. As variable we select ozone, and as statistical aggregation the mean.
#
# The last step is the definition of the grid. We select a resolution of 2° in latitude and 2.5° in longitude.
# %%
time = TimeSample(dt(2016,3,1), dt(2016,3,3), "daily")
metadata = Metadata.construct("mole_fraction_of_ozone_in_air", time, "mean")
my_grid = RegularGrid(2.0, 2.5)
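# %% [markdown]
# A quick sanity check (an optional sketch, not part of the original example): the chosen resolutions divide the 180° latitude range and the 360° longitude range into whole numbers of cells.
# %%
# 2.0° in latitude and 2.5° in longitude yield an integer number of cells
print(180 / 2.0, 360 / 2.5)  # 90.0 and 144.0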
# %% [markdown]
# #### Setting up the analysis
#
# We need to prepare our connection to the analysis service of the TOAR database, which will provide us with temporally and statistically aggregated data.
# Besides the URL of the service, we also need to set up two directories on our computer:
# - one to save the data provided by the analysis service (called cache)
# - a second to store our gridded dataset (called results)
# These will be created as examples/cache and examples/results.
# %%
endpoint = "https://toar-data.fz-juelich.de/api/v2/analysis/statistics/"
# assumes the script is started from the repository root [path/to/toargridding]
toargridding_base_path = Path("examples")
cache_dir = toargridding_base_path / "cache"
data_download_dir = toargridding_base_path / "results"
cache_dir.mkdir(exist_ok=True)
data_download_dir.mkdir(exist_ok=True)
analysis_service = AnalysisServiceDownload(endpoint, cache_dir, data_download_dir, use_downloaded=True)
# %% [markdown]
# #### Download of data and writing to disk
# In the next step we want to download the data and store it on disk.
#
# To obtain the contributors for this dataset, we need to create a dedicated file. It can be uploaded to the TOAR database to obtain a preformatted list of contributors. The required recipe can be found in the global metadata of the netCDF file.
#
# The request to the database can take several minutes; the duration also depends on the overall usage of the services. The `get_data` function checks every 5 minutes whether the data are ready for download and stops after 30 minutes. Simply restart this cell to continue checking for the results.
# %%
# this cell can run for longer than 30 minutes
data = analysis_service.get_data(metadata)
# create contributors endpoint and write result to metadata
contrib = contributions_manager_by_id(metadata.get_id(), data_download_dir)
contrib.extract_contributors_from_data_frame(data.stations_data)
metadata.contributors_metadata_field = contrib.setup_contributors_endpoint_for_metadata()
ds = my_grid.as_xarray(data)
#store dataset
out_file_name = data_download_dir / f"{metadata.get_id()}_{my_grid.get_id()}.nc"
ds.to_netcdf(out_file_name)
print("Gridded data have been written to ", out_file_name)
# %% [markdown]
# ### Visual inspection
# %% [markdown]
# We are working with raw data here and also want to visualize the station positions. Therefore, we distinguish stations that have valid data from those without.
# %%
# calculation of coordinates for plotting,
# in particular the separation of coordinates with and without results
mean_data = ds["mean"]
clean_coords = data.stations_coords
all_na = data.stations_data.isna().all(axis=1)
clean_coords = all_na.to_frame().join(clean_coords)[["latitude", "longitude"]]
all_na_coords = clean_coords[all_na]
not_na_coords = clean_coords[~all_na]
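# a quick look at the group sizes (an optional sketch, not part of the original example)
print(f"stations with data: {len(not_na_coords)}, without data: {len(all_na_coords)}")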
# %% [markdown]
# In the next step we define a function for plotting the gridded data on a world map. The *discrete* flag influences the creation of the color bar; the *plot_stations* flag allows including the station positions in the map.
# %%
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
# definition of the plotting function
def plot_cells(data, stations, na_stations, discrete=True, plot_stations=False):
    fig = plt.figure(figsize=(9, 18))
    # world map with coastlines; gridlines are placed at the grid-cell centers
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    gl = ax.gridlines(draw_labels=True)
    gl.top_labels = False
    gl.left_labels = False
    gl.xlocator = mticker.FixedLocator(data.longitude.values)
    gl.ylocator = mticker.FixedLocator(data.latitude.values)
    cmap = mpl.cm.viridis
    if discrete:
        # discrete color bar with integer bounds and ticks centered on each color
        print(np.unique(data.values))
        bounds = np.arange(8)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend="both")
        ticks = np.arange(bounds.size + 1)[:-1] + 0.5
        ticklabels = bounds
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
            norm=norm,
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
        cb.set_ticks(ticks)
        cb.set_ticklabels(ticklabels)
    else:
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
    if plot_stations:
        # stations without valid data in black, stations with valid data in red
        plt.scatter(na_stations["longitude"], na_stations["latitude"], s=1, c="k")
        plt.scatter(stations["longitude"], stations["latitude"], s=1, c="r")
    plt.tight_layout()
    plt.title(f"global ozone at {data.time.values} {data.time.units}")
# %% [markdown]
# Now we do the actual plotting. We select a single time step from the dataset to obtain two maps: 1) the mean ozone concentration per grid cell and 2) the number of stations contributing to each grid cell.
# %%
# example visualization for a single time point
print(not_na_coords)
timestep = 2
time = ds.time[timestep]
data = ds.sel(time=time)
print(data)
plot_cells(data["mean"], not_na_coords, all_na_coords, discrete=False, plot_stations=True)
plot_cells(data["n"], not_na_coords, all_na_coords, discrete=True)
n_observations = ds["n"].sum(["latitude", "longitude"])
plt.plot(ds.time, n_observations)
plt.title("Number of Stations per Time Point")
plt.xlabel("Time")
plt.ylabel("Number of Stations")
print(np.unique(ds["n"]))
# %% [markdown]
# Last but not least, we print the data and metadata of the dataset; especially the metadata can be worth a look.
# %%
print(data)
plt.show()
%% Cell type:markdown id: tags:
### Get Dataset from request
This cell imports all required packages and sets up the logging as well as the required information for the requests to the TOAR-DB.
We will receive a warning, as a lateral resolution of 1.9° does not result in a whole number of cells (for example, 180° / 1.9° ≈ 94.7); it is therefore slightly adjusted.
%% Cell type:code id: tags:
``` python
from datetime import datetime as dt
from pathlib import Path
import pandas as pd
from toargridding.grids import RegularGrid
from toargridding.toar_rest_client import (
AnalysisServiceDownload,
STATION_LAT,
STATION_LON,
)
from toargridding.metadata import Metadata, TimeSample, AnalysisRequestResult, Coordinates
from toargridding.variables import Coordinate
from toargridding.contributors import contributions_manager_by_id
```
%% Cell type:markdown id: tags:
#### Setup of logging
In the next step we set up the logging, i.e. the level of information that is displayed as output.
We start with a default setup and restrict the output to informational messages and more critical output such as warnings and errors.
%% Cell type:code id: tags:
``` python
from toargridding.defaultLogging import toargridding_defaultLogging
import logging
logger = toargridding_defaultLogging()
logger.addShellLogger(logging.INFO)
logger.logExceptions()
```
%% Cell type:markdown id: tags:
#### Selection of data and grid
We select a temporal aggregation: one week of data with daily sampling.
With this sampling we define the metadata for our request. As variable we select ozone, and as statistical aggregation the mean.
We also select an offline post-processing of the data: we average the means calculated for all time series at a station to create a single value.
Last but not least, we select the data quality flag 'AllOK', requesting that all data points pass the quality tests of the provider and the automatic tests of the TOAR data infrastructure. Other options exclude the tests performed by TOAR.
The last step is the definition of the grid. We select a resolution of 2° in latitude and 2.5° in longitude.
%% Cell type:code id: tags:
``` python
time = TimeSample(dt(2016,3,1), dt(2016,3,3), "daily")
metadata = Metadata.construct("mole_fraction_of_ozone_in_air", time, "mean", "meanTSbyStation", data_quality_flag="AllOK")
my_grid = RegularGrid(2.0, 2.5)
```
%% Cell type:markdown id: tags:
#### Setting up the analysis
We need to prepare our connection to the analysis service of the TOAR database, which will provide us with temporally and statistically aggregated data.
Besides the URL of the service, we also need to set up two directories on our computer:
- one to save the data provided by the analysis service (called cache)
- a second to store our gridded dataset (called results)
These will be created as examples/cache and examples/results.
%% Cell type:code id: tags:
``` python
endpoint = "https://toar-data-testing.fz-juelich.de/api/v2/analysis/statistics/"
# starts in directory [path/to/toargridding]/examples
toargridding_base_path = Path(".")
cache_dir = toargridding_base_path / "cache"
data_download_dir = toargridding_base_path / "results"
cache_dir.mkdir(exist_ok=True)
data_download_dir.mkdir(exist_ok=True)
analysis_service = AnalysisServiceDownload(endpoint, cache_dir, data_download_dir, use_downloaded=True)
```
%% Cell type:markdown id: tags:
#### Download of data and writing to disk
In the next step we want to download the data and store it on disk.
To obtain the contributors for this dataset, we need to create a dedicated file. It can be uploaded to the TOAR database to obtain a preformatted list of contributors. The required recipe can be found in the global metadata of the netCDF file.
The request to the database can take several minutes; the duration also depends on the overall usage of the services. The `get_data` function checks every 5 minutes whether the data are ready for download and stops after 30 minutes. Simply restart this cell to continue checking for the results.
%% Cell type:code id: tags:
``` python
# this cell can run for longer than 30 minutes
data = analysis_service.get_data(metadata)
# create contributors endpoint and write result to metadata
contrib = contributions_manager_by_id(metadata.get_id(), data_download_dir)
contrib.extract_contributors_from_data_frame(data.stations_data)
metadata.contributors_metadata_field = contrib.setup_contributors_endpoint_for_metadata()
ds = my_grid.as_xarray(data)
```
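%% Cell type:markdown id: tags:
The gridded dataset can now be written to disk (a sketch mirroring the export script `examples/00_download_and_vis_export.py`; the file name combines the metadata id and the grid id):
%% Cell type:code id: tags:
``` python
# store the gridded dataset as netCDF, as done in the export script
out_file_name = data_download_dir / f"{metadata.get_id()}_{my_grid.get_id()}.nc"
ds.to_netcdf(out_file_name)
print("Gridded data have been written to ", out_file_name)
```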
%% Cell type:markdown id: tags:
### Visual inspection
%% Cell type:markdown id: tags:
We are working with raw data here and also want to visualize the station positions. Therefore, we distinguish stations that have valid data from those without.
%% Cell type:code id: tags:
``` python
# calculation of coordinates for plotting,
# in particular the separation of coordinates with and without results
clean_coords = data.stations_coords
all_na = data.stations_data.isna().all(axis=1)
clean_coords = all_na.to_frame().join(clean_coords)[["latitude", "longitude"]]
all_na_coords = clean_coords[all_na]
not_na_coords = clean_coords[~all_na]
```
%% Cell type:code id: tags:
``` python
print(ds)
```
%% Cell type:markdown id: tags:
In the next step we define a function for plotting the gridded data on a world map. The *discrete* flag influences the creation of the color bar; the *plot_stations* flag allows including the station positions in the map.
%% Cell type:code id: tags:
``` python
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
# definition of the plotting function
def plot_cells(data, stations, na_stations, discrete=True, plot_stations=False):
    fig = plt.figure(figsize=(9, 18))
    # world map with coastlines; gridlines are placed at the grid-cell centers
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    gl = ax.gridlines(draw_labels=True)
    gl.top_labels = False
    gl.left_labels = False
    gl.xlocator = mticker.FixedLocator(data.longitude.values)
    gl.ylocator = mticker.FixedLocator(data.latitude.values)
    cmap = mpl.cm.viridis
    if discrete:
        # discrete color bar with integer bounds and ticks centered on each color
        print(np.unique(data.values))
        bounds = np.arange(8)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend="both")
        ticks = np.arange(bounds.size + 1)[:-1] + 0.5
        ticklabels = bounds
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
            norm=norm,
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
        cb.set_ticks(ticks)
        cb.set_ticklabels(ticklabels)
    else:
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
    if plot_stations:
        # stations without valid data in black, stations with valid data in red
        plt.scatter(na_stations["longitude"], na_stations["latitude"], s=1, c="k")
        plt.scatter(stations["longitude"], stations["latitude"], s=1, c="r")
    plt.tight_layout()
    unit = data.attrs["units"]
    plt.title(f"global ozone [{unit}] at {data.time.values} {data.time.units}")
```
%% Cell type:markdown id: tags:
Now we do the actual plotting. We select a single time step from the dataset to obtain two maps: 1) the mean ozone concentration per grid cell and 2) the number of stations contributing to each grid cell.
%% Cell type:code id: tags:
``` python
# example visualization for a single time point
#print(not_na_coords)
timestep = 2
time = ds.time[timestep]
selected_data = ds.sel(time=time)
print(selected_data)
plot_cells(selected_data["mean"], not_na_coords, all_na_coords, discrete=False, plot_stations=False)
plt.show()
plot_cells(selected_data["n"], not_na_coords, all_na_coords, discrete=True)
plt.show()
n_observations = ds["n"].sum(["latitude", "longitude"])
plt.plot(ds.time, n_observations)
plt.title("Number of Contributions per Time Point")
plt.xlabel("Time")
plt.ylabel("Number of contributions")
print(np.unique(ds["n"]))
```
%% Cell type:markdown id: tags:
Last but not least, we print the data and metadata of the dataset; especially the metadata can be worth a look.
%% Cell type:code id: tags:
``` python
print(data)
```
......