The TOARgridding projects data from the TOAR database (https://toar-data.fz-juelich.de/) onto a grid.
The request to the database includes a statistical analysis of the requested value.
The mean and standard deviation of all stations within a cell are computed.
This tool handles the request to the database over the REST API and the subsequent processing.
The results of the gridding are provided as [xarray datasets](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) for subsequent processing and visualization by the user.
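The per-cell aggregation described above can be sketched in plain Python; this is a simplified illustration with hypothetical station records, not the actual toargridding implementation:

```python
import math
from statistics import mean, stdev

# Hypothetical station records: (longitude, latitude, value).
stations = [
    (6.4, 50.9, 30.0),
    (6.6, 50.8, 34.0),
    (13.4, 52.5, 28.0),
]

def cell_index(lon: float, lat: float, res: float = 1.0) -> tuple[int, int]:
    """Map a coordinate to the index of its cell on a regular lon/lat grid."""
    return (math.floor((lon + 180.0) / res), math.floor((lat + 90.0) / res))

# Group the station values by cell, then aggregate per cell.
cells: dict[tuple[int, int], list[float]] = {}
for lon, lat, value in stations:
    cells.setdefault(cell_index(lon, lat), []).append(value)

gridded = {
    idx: {
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "n_stations": len(values),
    }
    for idx, values in cells.items()
}
```

In the actual package, the resulting per-cell values are stored in an xarray dataset instead of a plain dictionary.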
While this project provides ready-to-use examples, it is intended as a library to be used in dedicated analysis scripts. Furthermore, the long-term goal is to provide the gridding as a service over a RESTful API.
This project is in beta and provides the intended basic functionality.
The documentation and this README are work in progress.
# Requirements
This project requires Python 3.11 or higher.
...
...
The latter line activates the virtual environment for further usage. To deactivate it, call
```bash
deactivate
```
With the virtual environment activated, install all required dependencies by calling
```bash
pip install -e .
```
To be able to execute the examples, which are provided as Jupyter notebooks, additional packages need to be installed by calling
```bash
pip install -e ".[interactive]"
```
...
...
```bash
python [/path/to/scriptname.py]
```
# How does this tool work?
This tool has two main parts. The first part handles requests to the TOAR database via its analysis service. This includes the statistical analysis of the requested timeseries.
The second part is the gridding, which is performed offline.
## Request to TOAR Database with Statistical Analysis
...
...
Requests are sent to the analysis service of the TOAR database, which allows a statistical analysis of the requested data on the server side.
Whenever a request is submitted, it will be processed. The returned status endpoint will point to the results as soon as the analysis is finished.
A request can take several hours, depending on time range and the number of requested stations.
This module stores the requests and their status endpoint in a local cache file. These endpoints are used to check whether the processing by the analysis service is finished.
Requests are deleted from the cache after 14 days. You can adjust this limit with `Cache.set_max_days_in_cache([max age in days])`.
At the moment, there is no way to check the status of a running job before it is finished (as of 2024-05-14).
Crashed requests seem to respond with an internal server error (HTTP status code 500). Those requests are therefore automatically deleted from the cache and resubmitted.
The status endpoint of a finished request does not stay valid forever, but the analysis service keeps the results in its own cache for a longer period. Whenever the same request is submitted again, this cache is checked first to see whether the results have already been calculated. Retrieving results from the cache can take some time, similar to the analysis itself.
There is no check whether a request is already running. Submitting a request multiple times therefore leads to additional load on the system and slows down all requests.
The TOAR database has only a limited number of workers for performing statistical analyses. It is therefore advised to run one request after another, especially for large requests covering a large number of stations and/or a long time range.
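The caching behaviour described in this section can be sketched as follows. This is a simplified illustration, not the actual `Cache` class of this package; the file layout and all names besides `set_max_days_in_cache` are assumptions:

```python
import json
import tempfile
import time
from pathlib import Path

class RequestCache:
    """Simplified sketch of a local cache mapping requests to their status endpoints."""

    def __init__(self, path: Path, max_days: int = 14):
        self.path = path
        self.max_days = max_days
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def set_max_days_in_cache(self, max_days: int) -> None:
        self.max_days = max_days

    def put(self, request_key: str, status_endpoint: str) -> None:
        self.entries[request_key] = {"endpoint": status_endpoint, "created": time.time()}
        self._save()

    def get(self, request_key: str):
        self._evict_expired()
        entry = self.entries.get(request_key)
        return entry["endpoint"] if entry else None

    def _evict_expired(self) -> None:
        # Drop entries older than the configured maximum age.
        cutoff = time.time() - self.max_days * 86400
        self.entries = {k: v for k, v in self.entries.items() if v["created"] >= cutoff}
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.entries))

# Hypothetical usage with a made-up request key and endpoint:
cache = RequestCache(Path(tempfile.mkdtemp()) / "cache.json")
cache.put("o3_dma8epa_2000", "https://example.org/status/abc123")
```

On top of this, the real cache also resubmits requests whose status endpoint answers with HTTP status code 500.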
## Gridding
...
...
Per cell, the mean, the standard deviation, and the number of contributing stations are reported in the resulting dataset.
## Contributors
The contributors include all projects, organizations and persons that are associated with any timeseries of a gridded dataset with the roles "contributor" or "originator". In offline mode, this information is preserved by saving the timeseries IDs in a dedicated file with one ID per line. In the metadata of a dataset, this filename is stated together with the contributors endpoint (at the moment: `https://toar-data.fz-juelich.de/api/v2/timeseries/request_contributors`; note: this endpoint is under development and expected to be published by 2024-07-31) to retrieve the actual names. For this, the created contributor file needs to be submitted as a POST request.
The default output format is a JSON file that contains the full information on all roles associated with the provided timeseries IDs. These data should be processed to fit your needs.
The second option provides a ready-to-use list of all programs, organizations and persons that contributed to this dataset. This requires setting the output format of the request to *TBD*.
The provided organizations include the affiliations of all individual persons, as stored in the TOAR database.
## Logging
Output created by the different modules and classes of this package uses Python's `logging` module.
There is also an auxiliary class to reuse the same logger setup for the examples and scripts provided by this package. It can also be used in custom scripts based on this library.
It configures logging to the shell as well as to the system log of a Linux system.
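A minimal sketch of such a setup using only the standard `logging` module; the actual auxiliary class of this package may differ in names and details:

```python
import logging
import logging.handlers

def setup_logger(name: str = "toargridding", level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to the shell and, if available, to the system log."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated setup

    # Handler for the shell.
    shell = logging.StreamHandler()
    shell.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
    logger.addHandler(shell)

    try:
        # /dev/log exists on most Linux systems; skip silently elsewhere.
        syslog = logging.handlers.SysLogHandler(address="/dev/log")
        syslog.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger.addHandler(syslog)
    except OSError:
        pass

    return logger

log = setup_logger()
log.info("logger configured")
```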
# Example
This package provides a number of examples as Jupyter notebooks (https://jupyter.org/).
Jupyter uses your web browser to display results and code blocks.
As an alternative, visual studio code directly supports execution of jupyter notebooks.
For VS Code, please make sure to select the kernel of the virtual environment ([see the documentation](https://code.visualstudio.com/docs/datascience/jupyter-notebooks)).
After activating the virtual environment, the notebooks can be run by calling
```bash
jupyter notebook
```
...
...
This notebook provides an example on how to download data, apply the gridding, and save the results.
The `AnalysisServiceDownload` class caches already obtained data on the local machine.
This allows computing different grids without having to repeat the request to the TOAR database, the statistical analysis, and the subsequent download.
As an example, we calculated *dma8epa_strict* on a daily basis for the years 2000 to 2018 for all timeseries in the TOAR database.
The first attempt for this example covered the full range of 19 years in a single request. It turned out that a year-by-year extraction is more reliable.
The subsequent requests function as a progress report and allow working with the data, while further requests are processed.
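Splitting a long time range into yearly requests can be sketched with plain `datetime` arithmetic, independent of the actual toargridding API:

```python
from datetime import datetime

def yearly_ranges(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive chunks of at most one calendar year."""
    ranges = []
    current = start
    while current < end:
        # Cut at the next New Year, or at the overall end, whichever comes first.
        next_cut = min(datetime(current.year + 1, 1, 1), end)
        ranges.append((current, next_cut))
        current = next_cut
    return ranges

# The 19-year range of the example above becomes 19 yearly requests.
chunks = yearly_ranges(datetime(2000, 1, 1), datetime(2019, 1, 1))
```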
...
...
As the gridding is done offline, it will be executed for already downloaded files.
This example is based on the previous one but uses additional arguments to refine the selection of stations. As an example, different classifications of the stations are used: first the "toar1_category" and second the "type_of_area".
Details can be found in [documentation of the FastAPI REST interface](https://toar-data.fz-juelich.de/api/v2/#stationmeta) or the [user guide](https://toar-data.fz-juelich.de/sphinx/TOAR_UG_Vol03_Database/build/latex/toardatabase--userguide.pdf).
The selection of only a limited number of stations leads to significantly faster results. On the downside, the classifications used are not available for all stations.
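For illustration, such station classifications end up as additional query parameters of the REST request. The values below are examples, and the actual request construction in toargridding may differ:

```python
from urllib.parse import urlencode

# Example filter settings; "toar1_category" and "type_of_area" are
# station metadata fields of the TOAR database, the values are examples.
filters = {
    "toar1_category": "RuralLowElevation",
    "type_of_area": "rural",
}
query = urlencode(filters)  # "toar1_category=RuralLowElevation&type_of_area=rural"
```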
...
...
```
Downloads data from the TOAR database with a manual creation of the request to the TOAR database.
The extracted data are written to disc. No further processing or gridding is done.
The result is a ZIP file containing two CSV files: the first contains the statistical analysis of the timeseries and the second contains metadata of the timeseries, such as the coordinates of the stations or a link to a citation.
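Such an archive can be inspected with the standard library. The sketch below first builds a tiny stand-in ZIP, since the file names and columns of the real download are assumptions here:

```python
import csv
import io
import zipfile

# Build a small stand-in for the downloaded result (names and columns assumed).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("statistics.csv", "timeseries_id,2000-01-01\n42,31.5\n")
    zf.writestr("metadata.csv", "timeseries_id,station_lon,station_lat\n42,6.4,50.9\n")

# Read both CSV files back from the archive.
tables = {}
with zipfile.ZipFile(buffer) as zf:
    for name in zf.namelist():
        with zf.open(name) as fh:
            tables[name] = list(csv.reader(io.TextIOWrapper(fh, encoding="utf-8")))
```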
## Retrieving data
```bash
...
...
```
Notebook for downloading and visualization of data.
The data are downloaded and reused for subsequent executions of this notebook.
The gridding is done on the downloaded data and is repeated whenever this example is executed. Gridded data are not saved to disc.
# Supported Grids
The first supported grid is a regular longitude-latitude grid covering the whole world.
# Supported Variables
This module supports all variables of the TOAR database (extraction: 2024-05-27). They can be identified either by their *cf_standardname* or by their name as stored in the TOAR database.
The second option is shorter, and not all variables in the database have a *cf_standardname*.
The up-to-date list of all available variables with their name and *cf_standardname* can be accessed by querying the TOAR database, e.g. with https://toar-data.fz-juelich.de/api/v2/variables/?limit=None.
The configuration of toargridding can be updated by running the script `tools/setupFunctions.py`.
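The response of the variables endpoint can be processed with standard JSON tooling. The sample records below only mimic the assumed response shape, a list of variable objects with `name` and `cf_standardname` fields:

```python
import json

# Sample response body, mimicking the assumed shape of the variables endpoint.
response_text = json.dumps([
    {"name": "o3", "cf_standardname": "mole_fraction_of_ozone_in_air"},
    {"name": "benzene", "cf_standardname": None},
])

variables = json.loads(response_text)
by_name = {v["name"]: v["cf_standardname"] for v in variables}
# Variables without a cf_standardname can only be identified by their name.
no_cf_name = [v["name"] for v in variables if not v["cf_standardname"]]
```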
# Supported Time intervals
At the moment, only time differences larger than one day work; i.e., start and end = start + 1 day leads to a crash.
# Setup functions:
This package comes with all required information. There is a function to fetch an update of the available variables from the TOAR database.
This will override the original file:
```bash
python tools/setupFunctions.py
```
# Benchmarks
...
...
# Tested platforms
This project has been tested on
- Rocky Linux 9
# Automated Testing
At the moment, automated tests for this module are under development and not yet operational.