%% Cell type:markdown id: tags:
# Example with optional parameters
Toargridding has a number of required arguments for a dataset. These include the time range, the variable, and the statistical analysis. The TOAR-DB has a large number of metadata fields that can be used to further refine such a request.
A Python dictionary can be provided to include these other fields (a minimal sketch follows below). The analysis service returns an error message if a requested parameter does not exist (check for typos) or if the provided value is invalid.
In this example we want to obtain data from 2012.
The first block contains the imports and the setup of the logging.
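%% Cell type:markdown id: tags:
As a minimal sketch, such a dictionary simply maps TOAR-DB metadata field names to the desired values. The field and value here are only an illustration; the actual dictionary for this example is created further below.
%% Cell type:code id: tags:
``` python
# minimal sketch: additional metadata fields are passed as a plain dictionary;
# field names and values must match entries in the TOAR-DB
station_metadata = {
    "type_of_area": "Urban",  # keep only urban stations
}
```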
%% Cell type:markdown id: tags:
#### Inclusion of packages
%% Cell type:code id: tags:
``` python
import logging
from datetime import datetime as dt
from collections import namedtuple
from pathlib import Path
from toargridding.toar_rest_client import AnalysisServiceDownload, Connection
from toargridding.grids import RegularGrid
from toargridding.gridding import get_gridded_toar_data
from toargridding.metadata import TimeSample
```
%% Cell type:markdown id: tags:
#### Setup of logging
In the next step we set up the logging, i.e. the level of detail of the messages that are displayed as output.
We start with a default setup. The output can be restricted to informational and more critical messages like warnings and errors; in this example we enable the more verbose debug level.
We also add logging to a file. A new log file is created at midnight, and up to 7 old log files are kept.
%% Cell type:code id: tags:
``` python
from toargridding.defaultLogging import toargridding_defaultLogging
logger = toargridding_defaultLogging()
#logger.addShellLogger(logging.INFO)  # restrict the shell output to informational messages and above
logger.addShellLogger(logging.DEBUG)  # show all messages, including debug output
logger.logExceptions()
log_path = Path("log")
log_path.mkdir(exist_ok=True)
logger.addRotatingLogFile(log_path / "produce_data_station_metadata.log")  # the log file needs to be set explicitly
```
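%% Cell type:markdown id: tags:
As an aside, the rotation behavior described above (a new file at midnight, up to 7 old files) corresponds to the standard library's `TimedRotatingFileHandler`. The following sketch shows the equivalent setup with plain `logging`; whether `addRotatingLogFile` uses this handler internally is an assumption.
%% Cell type:code id: tags:
``` python
import logging.handlers

# sketch with the standard library only (assumed equivalent of addRotatingLogFile):
# rotate the log file at midnight and keep up to 7 old files
handler = logging.handlers.TimedRotatingFileHandler(
    log_path / "example.log", when="midnight", backupCount=7
)
logging.getLogger().addHandler(handler)
```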
%% Cell type:markdown id: tags:
#### Setting up the analysis
We need to prepare our connection to the analysis service of the TOAR database, which will provide us with temporally and statistically aggregated data.
Besides the URL of the service, we also need to set up two directories on our computer:
- one to save the data provided by the analysis service (called cache)
- a second to store our gridded datasets (called results)

These will be created as examples/cache and examples/results.
%% Cell type:code id: tags:
``` python
stats_endpoint = "https://toar-data.fz-juelich.de/api/v2/analysis/statistics/"
cache_basepath = Path("cache")
result_basepath = Path("results")
cache_basepath.mkdir(exist_ok=True)
result_basepath.mkdir(exist_ok=True)
analysis_service = AnalysisServiceDownload(
    stats_endpoint=stats_endpoint,
    cache_dir=cache_basepath,
    sample_dir=result_basepath,
    use_downloaded=True,
)
```
%% Cell type:markdown id: tags:
Our following request will take some time to process. Therefore, we adjust the interval between two checks of whether our data are ready for download, as well as the maximum duration for checking.
We will check every 15 min for up to 12 h.
%% Cell type:code id: tags:
``` python
analysis_service.connection.set_request_times(interval_min=15, max_wait_minutes=12*60)
```
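%% Cell type:markdown id: tags:
For clarity, these two settings bound how often the service is polled; a quick check of the arithmetic:
%% Cell type:code id: tags:
``` python
# with a 15 min interval and a maximum waiting time of 12 h,
# the client polls the service at most 48 times
max_checks = (12 * 60) // 15
print(f"at most {max_checks} status checks")
```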
%% Cell type:markdown id: tags:
#### Preparation of a request with station metadata
We restrict our request to one year of daily aggregated ozone data. In addition, we would like to include only stations in a certain type of area, rural stations in this example.
We use a container class (type: namedtuple) to keep the configuration together.
We also want to refine our station selection by using further metadata.
Therefore, we create the `station_metadata` dictionary. We can use the additional metadata stored in the TOAR-DB by providing a field name and our desired value. This also discards stations without a value for the given metadata field. Information on the different metadata values can be found in the [documentation](https://toar-data.fz-juelich.de/sphinx/TOAR_UG_Vol03_Database/build/latex/toardatabase--userguide.pdf), for example for *toar1_category* on page 18 and for *type_of_area* on page 20.
We can use this to filter by all additional metadata supported by the [statistics endpoint of the analysis service](https://toar-data.fz-juelich.de/api/v2/analysis/#statistics), namely station metadata and time series metadata.
In the end we have one request that we want to submit.
%% Cell type:code id: tags:
``` python
Config = namedtuple("Config", ["grid", "time", "variables", "stats", "station_metadata", "data_aggregation_mode"])

# select the station metadata used to refine the request:
station_metadata = {
    #"toar1_category": "Urban",  # uncomment if wished:-)
    "type_of_area": "Rural",  # also try Suburban or Urban
}

grid = RegularGrid(lat_resolution=1.9, lon_resolution=2.5)

configs = dict()
request_config = Config(
    grid,
    TimeSample(start=dt(2012, 1, 1), end=dt(2012, 12, 31), sampling="daily"),
    ["mole_fraction_of_ozone_in_air"],
    #["mean"],
    ["dma8epa_strict"],
    station_metadata,
    data_aggregation_mode="meanTSbyStation",
)
configs["test_ta"] = request_config
```
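%% Cell type:markdown id: tags:
The `configs` dictionary can hold several requests at once. As a sketch, a second entry for suburban stations could be added as follows; the processing loop below handles all entries. It is commented out here, since every additional request adds to the waiting time.
%% Cell type:code id: tags:
``` python
# optional sketch: a second request differing only in the type_of_area;
# uncomment to submit both requests in the loop below
#configs["test_ta_suburban"] = Config(
#    grid,
#    TimeSample(start=dt(2012, 1, 1), end=dt(2012, 12, 31), sampling="daily"),
#    ["mole_fraction_of_ozone_in_air"],
#    ["dma8epa_strict"],
#    {"type_of_area": "Suburban"},
#    data_aggregation_mode="meanTSbyStation",
#)
```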
%% Cell type:markdown id: tags:
#### Execution of toargridding and saving of results
Now we want to request the data from the TOAR analysis service and create the gridded dataset.
Therefore, we call the function `get_gridded_toar_data` with everything we have prepared until now.
The request is submitted to the analysis service, which processes it. On our side, we check at the configured interval whether the processing is finished; after the maximum waiting duration we stop checking. The setup for this can be found a few cells above.
Restarting this cell allows us to continue checking whether the data are available.
The obtained data are stored in the result directory (`result_basepath`). Before submitting a request, toargridding checks its cache to see whether the data have already been downloaded.
Last but not least, we want to save our dataset as a netCDF file.
The global metadata of this file contain a recipe for obtaining a list of contributors from the contributors file created by `get_gridded_toar_data`. This function also creates the required file with the extension "*.contributors".
%% Cell type:code id: tags:
``` python
for config_id, config in configs.items():
    print(f"\nProcessing {config_id}:")
    print("--------------------")
    datasets, metadatas = get_gridded_toar_data(
        analysis_service=analysis_service,
        grid=config.grid,
        time=config.time,
        variables=config.variables,
        stats=config.stats,
        contributors_path=result_basepath,
        data_aggregation_mode=config.data_aggregation_mode,
        **config.station_metadata
    )

    for dataset, metadata in zip(datasets, metadatas):
        dataset.to_netcdf(result_basepath / f"{metadata.get_id()}_{config.grid.get_id()}.nc")
```
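%% Cell type:markdown id: tags:
Once a file has been written, it can be inspected like any other netCDF file. A small sketch using `xarray` (assuming it is installed and at least one result file exists):
%% Cell type:code id: tags:
``` python
import xarray as xr

# sketch: open the first gridded result and print its content and global metadata
nc_files = sorted(result_basepath.glob("*.nc"))
if nc_files:
    with xr.open_dataset(nc_files[0]) as ds:
        print(ds)        # variables, coordinates and dimensions
        print(ds.attrs)  # global metadata, including the contributors recipe
```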