Commit 4a5c89ab authored by Carsten Hinz's avatar Carsten Hinz

added script as fallback for notebook.

added comment to README.
parent 5c8a904f
@@ -132,6 +132,10 @@ The aim of this first example is the creation of a gridded dataset and the visualization
For the visualization, `cartopy` is required, which might need dependencies that are not installed by `pip`. If you experience any issues, do not hesitate to continue with the next examples.
If you are still curious about the results, we have uploaded the resulting map as the title image of this README.
Caveat: For a specific version and case, we observed that the notebook runs into an exception from `matplotlib.pyplot.pcolormesh`. The error message complains about the use of 'flat' instead of 'nearest' shading, although 'nearest' is passed to the plot function; the value seems to change at some point after the call. We have not investigated this further.
There is also an export called `examples/00_download_and_vis_export.py`, which runs without issues.
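If you run into this exception, one possible workaround (a sketch only, not verified across matplotlib versions) is to pin the default shading before plotting:

```python
import matplotlib as mpl

# force 'nearest' as the default pcolormesh shading, so matplotlib does not
# fall back to 'flat', which expects cell-edge instead of cell-center coordinates
mpl.rcParams["pcolor.shading"] = "nearest"
```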
## 01: Retrieval of one week
```bash
jupyter notebook examples/01_produce_data_one_week.ipynb
......
# %% [markdown]
# ### Get Dataset from request
#
# This cell imports all required packages and sets up the logging as well as the required information for the requests to the TOAR-DB.
#
# We will receive a warning, as a lateral resolution of 1.9° does not result in a whole number of cells (for example, 180° / 1.9° ≈ 94.7); it is therefore slightly adjusted.
# %%
from datetime import datetime as dt
from pathlib import Path
import pandas as pd
from toargridding.grids import RegularGrid
from toargridding.toar_rest_client import (
AnalysisServiceDownload,
STATION_LAT,
STATION_LON,
)
from toargridding.metadata import Metadata, TimeSample, AnalysisRequestResult, Coordinates
from toargridding.variables import Coordinate
from toargridding.contributors import contributions_manager_by_id
import logging
# %% [markdown]
# #### Setup of logging
#
# In the next step we set up the logging, i.e. the level of information that is displayed as output.
#
# We start with a default setup and restrict the output to informational messages and more critical output such as warnings and errors.
# %%
from toargridding.defaultLogging import toargridding_defaultLogging
logger = toargridding_defaultLogging()
logger.addShellLogger(logging.INFO)
logger.logExceptions()
# %% [markdown]
# #### Selection of data and grid
# We select a temporal aggregation: one week of data with daily sampling.
# With this sampling we define the metadata for our request. As variable we select ozone, and as statistical aggregation the mean.
#
# The last step is the definition of the grid. We select a resolution of 2° in latitude and 2.5° in longitude.
# %%
time = TimeSample(dt(2016,3,1), dt(2016,3,3), "daily")
metadata = Metadata.construct("mole_fraction_of_ozone_in_air", time, "mean")
my_grid = RegularGrid(2.0, 2.5)
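# %% [markdown]
# A quick sanity check (an optional sketch, not part of the original example): the chosen resolutions divide the 180° latitude range and the 360° longitude range into whole numbers of cells.
# %%
# 2.0° in latitude and 2.5° in longitude yield an integer number of cells
print(180 / 2.0, 360 / 2.5)  # 90.0 and 144.0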
# %% [markdown]
# #### Setting up the analysis
#
# We need to prepare our connection to the analysis service of the TOAR database, which will provide us with temporally and statistically aggregated data.
# Besides the URL of the service, we also need to set up two directories on our computer:
# - one to save the data provided by the analysis service (called cache)
# - a second to store our gridded dataset (called results)
# These will be created as examples/cache and examples/results.
# %%
endpoint = "https://toar-data.fz-juelich.de/api/v2/analysis/statistics/"
# assumes the script is started from the repository root [path/to/toargridding]
toargridding_base_path = Path("examples")
cache_dir = toargridding_base_path / "cache"
data_download_dir = toargridding_base_path / "results"
cache_dir.mkdir(exist_ok=True)
data_download_dir.mkdir(exist_ok=True)
analysis_service = AnalysisServiceDownload(endpoint, cache_dir, data_download_dir, use_downloaded=True)
# %% [markdown]
# #### Download of data and writing to disk
# In the next step we want to download the data and store it on disk.
#
# To obtain the contributors for this dataset, we need to create a dedicated file. It can be uploaded to the TOAR database to obtain a preformatted list of contributors. The required recipe can be found in the global metadata of the netCDF file.
#
# The request to the database can take several minutes; the duration also depends on the overall usage of the services. The `get_data` function checks every 5 minutes whether the data are ready for download and stops after 30 minutes. Simply restart this cell to continue checking for the results.
# %%
# this cell can run for longer than 30 minutes
data = analysis_service.get_data(metadata)
# create contributors endpoint and write result to metadata
contrib = contributions_manager_by_id(metadata.get_id(), data_download_dir)
contrib.extract_contributors_from_data_frame(data.stations_data)
metadata.contributors_metadata_field = contrib.setup_contributors_endpoint_for_metadata()
ds = my_grid.as_xarray(data)
#store dataset
out_file_name = data_download_dir / f"{metadata.get_id()}_{my_grid.get_id()}.nc"
ds.to_netcdf(out_file_name)
print("Gridded data have been written to ", out_file_name)
# %% [markdown]
# ### Visual inspection
# %% [markdown]
# We are working with raw data here and also want to visualize the station positions. Therefore, we distinguish stations that have valid data from those without.
# %%
# calculation of coordinates for plotting,
# in particular the separation of coordinates with and without results
mean_data = ds["mean"]
clean_coords = data.stations_coords
all_na = data.stations_data.isna().all(axis=1)
clean_coords = all_na.to_frame().join(clean_coords)[["latitude", "longitude"]]
all_na_coords = clean_coords[all_na]
not_na_coords = clean_coords[~all_na]
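# a quick look at the group sizes (an optional sketch, not part of the original example)
print(f"stations with data: {len(not_na_coords)}, without data: {len(all_na_coords)}")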
# %% [markdown]
# In the next step we define a function for plotting the gridded data on a world map. The *discrete* flag influences the creation of the color bar; the *plot_stations* flag allows including the station positions in the map.
# %%
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
# definition of the plotting function
def plot_cells(data, stations, na_stations, discrete=True, plot_stations=False):
    fig = plt.figure(figsize=(9, 18))
    # world map with coastlines; gridlines are placed at the grid-cell centers
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    gl = ax.gridlines(draw_labels=True)
    gl.top_labels = False
    gl.left_labels = False
    gl.xlocator = mticker.FixedLocator(data.longitude.values)
    gl.ylocator = mticker.FixedLocator(data.latitude.values)
    cmap = mpl.cm.viridis
    if discrete:
        # discrete color bar with integer bounds and ticks centered on each color
        print(np.unique(data.values))
        bounds = np.arange(8)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend="both")
        ticks = np.arange(bounds.size + 1)[:-1] + 0.5
        ticklabels = bounds
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
            norm=norm,
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
        cb.set_ticks(ticks)
        cb.set_ticklabels(ticklabels)
    else:
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
    if plot_stations:
        # stations without valid data in black, stations with valid data in red
        plt.scatter(na_stations["longitude"], na_stations["latitude"], s=1, c="k")
        plt.scatter(stations["longitude"], stations["latitude"], s=1, c="r")
    plt.tight_layout()
    plt.title(f"global ozone at {data.time.values} {data.time.units}")
# %% [markdown]
# Now we do the actual plotting. We select a single time step from the dataset to obtain two maps: 1) the mean ozone concentration per grid cell and 2) the number of stations contributing to each grid cell.
# %%
# example visualization for a single time point
print(not_na_coords)
timestep = 2
time = ds.time[timestep]
data = ds.sel(time=time)
print(data)
plot_cells(data["mean"], not_na_coords, all_na_coords, discrete=False, plot_stations=True)
plot_cells(data["n"], not_na_coords, all_na_coords, discrete=True)
n_observations = ds["n"].sum(["latitude", "longitude"])
plt.plot(ds.time, n_observations)
plt.title("Number of Stations per Time Point")
plt.xlabel("Time")
plt.ylabel("Number of Stations")
print(np.unique(ds["n"]))
# %% [markdown]
# Last but not least, we print the data and metadata of the dataset; especially the metadata can be worth a look.
# %%
print(data)
plt.show()
%% Cell type:markdown id: tags:
### Get Dataset from request
This cell imports all required packages and sets up the logging as well as the required information for the requests to the TOAR-DB.
We will receive a warning, as a lateral resolution of 1.9° does not result in a whole number of cells (for example, 180° / 1.9° ≈ 94.7); it is therefore slightly adjusted.
%% Cell type:code id: tags:
``` python
from datetime import datetime as dt
from pathlib import Path
import pandas as pd
from toargridding.grids import RegularGrid
from toargridding.toar_rest_client import (
AnalysisServiceDownload,
STATION_LAT,
STATION_LON,
)
from toargridding.metadata import Metadata, TimeSample, AnalysisRequestResult, Coordinates
from toargridding.variables import Coordinate
from toargridding.contributors import contributions_manager_by_id
```
%% Cell type:markdown id: tags:
#### Setup of logging
In the next step we set up the logging, i.e. the level of information that is displayed as output.
We start with a default setup and restrict the output to informational messages and more critical output such as warnings and errors.
%% Cell type:code id: tags:
``` python
from toargridding.defaultLogging import toargridding_defaultLogging
import logging
logger = toargridding_defaultLogging()
logger.addShellLogger(logging.INFO)
logger.logExceptions()
```
%% Cell type:markdown id: tags:
#### Selection of data and grid
We select a temporal aggregation: one week of data with daily sampling.
With this sampling we define the metadata for our request. As variable we select ozone, and as statistical aggregation the mean.
We also select an offline post-processing of the data: we average the means calculated for all time series at a station to create a single value.
Last but not least, we select the data quality flag 'AllOK', requesting that all data points pass the quality tests of the provider and the automatic tests of the TOAR data infrastructure. Other options exclude the tests performed by TOAR.
The last step is the definition of the grid. We select a resolution of 2° in latitude and 2.5° in longitude.
%% Cell type:code id: tags:
``` python
time = TimeSample(dt(2016,3,1), dt(2016,3,3), "daily")
metadata = Metadata.construct("mole_fraction_of_ozone_in_air", time, "mean", "meanTSbyStation", data_quality_flag="AllOK")
my_grid = RegularGrid(2.0, 2.5)
```
%% Cell type:markdown id: tags:
#### Setting up the analysis
We need to prepare our connection to the analysis service of the TOAR database, which will provide us with temporally and statistically aggregated data.
Besides the URL of the service, we also need to set up two directories on our computer:
- one to save the data provided by the analysis service (called cache)
- a second to store our gridded dataset (called results)
These will be created as examples/cache and examples/results.
%% Cell type:code id: tags:
``` python
endpoint = "https://toar-data-testing.fz-juelich.de/api/v2/analysis/statistics/"
# starts in directory [path/to/toargridding]/examples
toargridding_base_path = Path(".")
cache_dir = toargridding_base_path / "cache"
data_download_dir = toargridding_base_path / "results"
cache_dir.mkdir(exist_ok=True)
data_download_dir.mkdir(exist_ok=True)
analysis_service = AnalysisServiceDownload(endpoint, cache_dir, data_download_dir, use_downloaded=True)
```
%% Cell type:markdown id: tags:
#### Download of data and writing to disk
In the next step we want to download the data and store it on disk.
To obtain the contributors for this dataset, we need to create a dedicated file. It can be uploaded to the TOAR database to obtain a preformatted list of contributors. The required recipe can be found in the global metadata of the netCDF file.
The request to the database can take several minutes; the duration also depends on the overall usage of the services. The `get_data` function checks every 5 minutes whether the data are ready for download and stops after 30 minutes. Simply restart this cell to continue checking for the results.
%% Cell type:code id: tags:
``` python
# this cell can run for longer than 30 minutes
data = analysis_service.get_data(metadata)
# create contributors endpoint and write result to metadata
contrib = contributions_manager_by_id(metadata.get_id(), data_download_dir)
contrib.extract_contributors_from_data_frame(data.stations_data)
metadata.contributors_metadata_field = contrib.setup_contributors_endpoint_for_metadata()
ds = my_grid.as_xarray(data)
```
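%% Cell type:markdown id: tags:
The gridded dataset can now be written to disk (a sketch mirroring the export script `examples/00_download_and_vis_export.py`; the file name combines the metadata id and the grid id):
%% Cell type:code id: tags:
``` python
# store the gridded dataset as netCDF, as done in the export script
out_file_name = data_download_dir / f"{metadata.get_id()}_{my_grid.get_id()}.nc"
ds.to_netcdf(out_file_name)
print("Gridded data have been written to ", out_file_name)
```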
%% Cell type:markdown id: tags:
### Visual inspection
%% Cell type:markdown id: tags:
We are working with raw data here and also want to visualize the station positions. Therefore, we distinguish stations that have valid data from those without.
%% Cell type:code id: tags:
``` python
# calculation of coordinates for plotting,
# in particular the separation of coordinates with and without results
clean_coords = data.stations_coords
all_na = data.stations_data.isna().all(axis=1)
clean_coords = all_na.to_frame().join(clean_coords)[["latitude", "longitude"]]
all_na_coords = clean_coords[all_na]
not_na_coords = clean_coords[~all_na]
```
%% Cell type:code id: tags:
``` python
print(ds)
```
%% Cell type:markdown id: tags:
In the next step we define a function for plotting the gridded data on a world map. The *discrete* flag influences the creation of the color bar; the *plot_stations* flag allows including the station positions in the map.
%% Cell type:code id: tags:
``` python
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
# definition of the plotting function
def plot_cells(data, stations, na_stations, discrete=True, plot_stations=False):
    fig = plt.figure(figsize=(9, 18))
    # world map with coastlines; gridlines are placed at the grid-cell centers
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    gl = ax.gridlines(draw_labels=True)
    gl.top_labels = False
    gl.left_labels = False
    gl.xlocator = mticker.FixedLocator(data.longitude.values)
    gl.ylocator = mticker.FixedLocator(data.latitude.values)
    cmap = mpl.cm.viridis
    if discrete:
        # discrete color bar with integer bounds and ticks centered on each color
        print(np.unique(data.values))
        bounds = np.arange(8)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N, extend="both")
        ticks = np.arange(bounds.size + 1)[:-1] + 0.5
        ticklabels = bounds
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
            norm=norm,
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
        cb.set_ticks(ticks)
        cb.set_ticklabels(ticklabels)
    else:
        im = plt.pcolormesh(
            data.longitude,
            data.latitude,
            data,
            transform=ccrs.PlateCarree(),
            cmap=cmap,
            shading="nearest",
        )
        cb = fig.colorbar(im, ax=ax, shrink=0.2, aspect=25)
    if plot_stations:
        # stations without valid data in black, stations with valid data in red
        plt.scatter(na_stations["longitude"], na_stations["latitude"], s=1, c="k")
        plt.scatter(stations["longitude"], stations["latitude"], s=1, c="r")
    plt.tight_layout()
    unit = data.attrs["units"]
    plt.title(f"global ozone [{unit}] at {data.time.values} {data.time.units}")
```
%% Cell type:markdown id: tags:
Now we do the actual plotting. We select a single time step from the dataset to obtain two maps: 1) the mean ozone concentration per grid cell and 2) the number of stations contributing to each grid cell.
%% Cell type:code id: tags:
``` python
# example visualization for a single time point
#print(not_na_coords)
timestep = 2
time = ds.time[timestep]
selected_data = ds.sel(time=time)
print(selected_data)
plot_cells(selected_data["mean"], not_na_coords, all_na_coords, discrete=False, plot_stations=False)
plt.show()
plot_cells(selected_data["n"], not_na_coords, all_na_coords, discrete=True)
plt.show()
n_observations = ds["n"].sum(["latitude", "longitude"])
plt.plot(ds.time, n_observations)
plt.title("Number of Contributions per Time Point")
plt.xlabel("Time")
plt.ylabel("Number of contributions")
print(np.unique(ds["n"]))
```
%% Cell type:markdown id: tags:
Last but not least, we print the data and metadata of the dataset; especially the metadata can be worth a look.
%% Cell type:code id: tags:
``` python
print(data)
```
......