Merge branch 'dev' into wip_tests_and_notebooks · 3ee81cb4
Carsten Hinz authored 10 months ago

README.md 7.37 KiB

TOAR Gridding Tool

About

TOARgridding projects data from the TOAR database (https://toar-data.fz-juelich.de/) onto a grid. The request to the database can also include a statistical analysis of the requested variable. For each grid cell, the mean and standard deviation over all stations within the cell are computed.

The tool handles the request to the database over the REST API and the subsequent processing. The results of the gridding are provided as xarray objects for subsequent processing by the user.

This project is in beta and provides the intended basic functionality. The documentation is a work in progress.

Requirements

TBD, see pyproject.toml

Installation

Change to the folder into which you want to download this project. Then download the source code (https://gitlab.jsc.fz-juelich.de/esde/toar-public/toargridding/-/tree/dev?ref_type=heads), either as a ZIP archive or via git:

Download with GIT

Clone the project from its git repository:

git clone https://gitlab.jsc.fz-juelich.de/esde/toar-public/toargridding.git

With git, we need to check out the development branch (dev). To do so, first change to the project directory:

cd toargridding
git checkout dev

Installing Dependencies and Setting up Virtual Environment

Required packages are managed with poetry (https://python-poetry.org/). After installing poetry, you can install all required dependencies for this project by running poetry in the project directory:

poetry install

This also creates a virtual environment, which ensures that the dependencies of different projects do not interfere with each other. To run a jupyter notebook in the virtual environment, execute

poetry run jupyter notebook

and to run a script use

poetry run python [/path/to/scriptname.py]

How does this tool work?

This tool has two main parts. The first handles requests to the TOAR database and the analysis of the data. The second part is the gridding, which is performed offline.

Request to TOAR Database with Statistical Analysis

Requests are sent to the analysis service of the TOAR database. This service allows selecting stations based on their metadata and performing a statistical analysis. Every submitted request is processed. The returned status endpoint points to the results as soon as processing has finished. A request can take several hours, depending on the time range and the number of requested stations. At the moment, there is no way to check the status of a running job before it has finished (Date: 2024-05-14).
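Waiting for the status endpoint to deliver results can be sketched as a simple polling loop. This is only an illustration and not the tool's implementation: `wait_for_result`, `fetch_status`, and the convention that a pending job yields `None` are assumptions made for this example.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    fetch_status: Callable[[], Optional[bytes]],
    poll_interval_s: float = 300.0,
    max_wait_s: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch_status repeatedly until it returns data or time runs out.

    fetch_status is a hypothetical callable that queries the status endpoint
    and returns None while the job is still running.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        if result is not None:
            return result
        if waited >= max_wait_s:
            raise TimeoutError(f"no result after {waited:.0f} s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Since requests can take hours, a long polling interval keeps the additional load on the service small. The `sleep` argument is injectable so that the loop can be exercised without actually waiting.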

Once a request has finished, the status endpoint does not remain valid indefinitely. The data are, however, kept for a longer time in a cache by the analysis service. When the same request is submitted again, the cache is checked first to see whether the results have already been calculated. Retrieving results from the cache can still take some time, similar to the analysis itself.

There is no check whether a request is already running. Therefore, submitting a request multiple times puts additional load on the system and slows down all requests.

The TOAR database has only a limited number of workers for performing statistical analyses. It is therefore advisable to run one request after another, especially for large requests covering many stations and/or a long time range.
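One way to follow this advice on the client side is to drop duplicate requests and submit the remaining ones strictly one after another. The sketch below is a hypothetical illustration: `request_key`, `run_sequentially`, `submit_and_wait`, and the representation of a request as a parameter dictionary are assumptions and not part of the tool.

```python
import json
from typing import Any, Callable, Dict, List


def request_key(params: Dict[str, Any]) -> str:
    """Build an order-independent key so identical requests compare equal."""
    return json.dumps(params, sort_keys=True)


def run_sequentially(
    requests: List[Dict[str, Any]],
    submit_and_wait: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Submit each distinct request once, waiting for it before the next."""
    results: Dict[str, Any] = {}
    for params in requests:
        key = request_key(params)
        if key in results:
            continue  # identical request already handled, do not resubmit
        results[key] = submit_and_wait(params)  # blocks until finished
    return results
```

Because `submit_and_wait` blocks until a request has finished, at most one request from this client occupies a worker of the analysis service at any time.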

Gridding

The gridding uses a user-defined grid to combine all stations within a cell. For each cell, the mean, the standard deviation, and the number of stations are reported.
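The per-cell aggregation can be illustrated with a minimal sketch that uses only the Python standard library. It assumes a regular latitude/longitude grid and uses the sample standard deviation; the function name `grid_stations`, the cell indexing, and the choice of estimator are assumptions for this example and may differ from what the tool does internally.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_stations(
    stations: Iterable[Tuple[float, float, float]],
    lat_res: float,
    lon_res: float,
) -> Dict[Cell, Dict[str, float]]:
    """Aggregate (lat, lon, value) triples on a regular lat/lon grid."""
    cells = defaultdict(list)
    for lat, lon, value in stations:
        # Cell index counted from the south-west corner (-90, -180).
        row = math.floor((lat + 90.0) / lat_res)
        col = math.floor((lon + 180.0) / lon_res)
        cells[(row, col)].append(value)

    summary = {}
    for cell, values in cells.items():
        summary[cell] = {
            "mean": statistics.fmean(values),
            # Sample standard deviation; undefined for a single station.
            "std": statistics.stdev(values) if len(values) > 1 else math.nan,
            "n": len(values),
        }
    return summary
```

Reporting the number of stations alongside mean and standard deviation lets users judge how representative a cell value is, for example by masking cells that contain only one station.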

Example

At the moment, three examples are provided as jupyter notebooks (https://jupyter.org/).

They can be run in the Python environment created by poetry with

poetry run jupyter notebook

High-level function

tests/produce_data_withOptional.ipynb

Provides an example of how to download data, apply the gridding, and save the results as netCDF files. The AnalysisServiceDownload caches already obtained data on the local machine. This allows different griddings without repeating the request to the TOAR database and the subsequent download.

In total, two requests are executed, each for a different statistical quantity (mean and dma8epax). The example uses a dictionary to pass additional arguments to the request to the TOAR database (here: the station category from TOAR 1). A detailed list of the available arguments can be found at https://toar-data.fz-juelich.de/api/v2/#stationmeta
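How a dictionary of additional arguments ends up in a request can be sketched as follows. This is a hedged illustration only: `build_query`, the parameter name `statistics`, and the value `Urban` are placeholders chosen for the example, and the notebook itself should be consulted for the actual call.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_query(
    base_params: Dict[str, str],
    more_options: Optional[Dict[str, str]] = None,
) -> str:
    """Merge optional metadata filters into the base parameters and encode them."""
    params = dict(base_params)  # copy so the caller's dict is untouched
    params.update(more_options or {})
    return urlencode(sorted(params.items()))


# Hypothetical parameter names and values, for illustration only.
base = {"statistics": "dma8epax"}
print(build_query(base, {"toar1_category": "Urban"}))
```

Keeping the optional filters in a separate dictionary makes it easy to reuse the same base request with and without a station-type restriction.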

tests/produce_data_manyStations.ipynb

Uses a similar request, but without the restriction on the station type. As a result, a much larger number of stations is requested (about 1000, compared to a few hundred that have the "toar1_category" classification used in the previous example). For this reason, the example is restricted to the calculation of "dma8epax".