TOAR Gridding Tool
====

# About

TOARgridding projects data from the TOAR database (https://toar-data.fz-juelich.de/) onto a grid.
The request to the database also allows a statistical analysis of the requested value.
The mean and standard deviation of all stations within a cell are computed.

The tool handles the request to the database over the REST API and the subsequent processing.
The results of the gridding are provided as [xarray datasets](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) for subsequent processing and visualization by the user.

This project is in beta and provides the intended basic functionality.
The documentation is a work in progress.

# Requirements

TBD, see pyproject.toml

# Installation

Move to the folder you want to download this project to.
Then download the source code from the [repository](https://gitlab.jsc.fz-juelich.de/esde/toar-public/toargridding/-/tree/dev?ref_type=heads) either as a ZIP file or via git:

## Download with GIT
Clone the project from its git repository:
```bash
git clone https://gitlab.jsc.fz-juelich.de/esde/toar-public/toargridding.git
```
With git, we need to check out the development branch (dev). To do so, we first change to the project directory:
```bash
cd toargridding
git checkout dev
```

## Installing Dependencies and Setting up Virtual Environment

The handling of required packages is done with [poetry](https://python-poetry.org/).
After installing poetry, you can simply install all required dependencies for this project by running poetry in the project directory:
```bash
poetry install
```
This also creates a virtual environment, which ensures that the dependencies of different projects do not interfere.
To run a jupyter notebook in the virtual environment execute
```bash
# to select a notebook in the file browser of your web browser:
poetry run jupyter notebook
# or to open a notebook directly:
poetry run jupyter notebook [/path/to/notebookname.ipynb]
```
and to run a script use
```bash
poetry run python [/path/to/scriptname.py]
```

# How does this tool work?

This tool has two main parts. The first handles requests to the TOAR database via its analysis service. This includes the statistical analysis of the requested timeseries.
The second part is the gridding, which is performed offline.

## Request to TOAR Database with Statistical Analysis

Requests are sent to the analysis service of the TOAR database. This allows selecting stations based on their metadata and performing a statistical analysis.
Whenever a request is submitted, it will be processed. The returned status endpoint will point to the results as soon as the analysis is finished.
A request can take several hours, depending on the time range and the number of requested stations.
At the moment, there is no way to check the status of a running job before it has finished (as of 2024-05-14).
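
Waiting for a long-running request can be sketched as a simple polling loop. This is only an illustration and not part of this project: `wait_for_result` and its `fetch` argument are hypothetical names, and `fetch` stands for any user-supplied function that queries the status endpoint once and returns `None` as long as no results are available.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    fetch: Callable[[], Optional[T]],
    interval_s: float = 300.0,
    timeout_s: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` repeatedly until it returns something other than None.

    `fetch` is a hypothetical function that asks the status endpoint once
    and returns None while the analysis is still running.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("analysis did not finish within the timeout")
        # Wait between attempts to avoid additional load on the service.
        sleep(interval_s)
```

A long interval between attempts is chosen on purpose, since a request can take several hours.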

After a request has finished, the status endpoint does not stay valid forever. The analysis service keeps the data in a cache for a longer time. When the same request is submitted again, the cache is checked first to see whether the results have already been calculated. Retrieving results from the cache can take some time, similar to the analysis itself.

There is no check whether a request is already running. Therefore, submitting a request multiple times puts additional load on the system and slows down all requests.

The TOAR database has only a limited number of workers for performing a statistical analysis. Therefore, it is advised to run one request after another, especially for large requests covering a large number of stations and/or a longer time range.

## Gridding

The gridding uses a user-defined grid to combine all stations in a cell.
For each cell, the mean, the standard deviation and the number of stations are reported in the resulting xarray dataset.
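
The idea of the gridding can be illustrated with a minimal sketch in plain Python. It is not the implementation used by this project: the function name `grid_stations`, the tuple layout `(lat, lon, value)` and the use of the sample standard deviation are assumptions made for this example.

```python
import math
from collections import defaultdict
from statistics import fmean, stdev


def grid_stations(stations, resolution_deg):
    """Group (lat, lon, value) tuples into regular cells and summarize them.

    Returns {(lat_index, lon_index): (mean, std, n_stations)}.
    """
    cells = defaultdict(list)
    for lat, lon, value in stations:
        # Cell indices count from the south-west corner (-90, -180).
        i = math.floor((lat + 90.0) / resolution_deg)
        j = math.floor((lon + 180.0) / resolution_deg)
        cells[(i, j)].append(value)

    summary = {}
    for key, values in cells.items():
        # Assumption: sample standard deviation, undefined for one station.
        std = stdev(values) if len(values) > 1 else math.nan
        summary[key] = (fmean(values), std, len(values))
    return summary
```

Cells without any station do not appear in the returned dictionary.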

# Example

At the moment, five examples are provided as jupyter notebooks (https://jupyter.org/).
Jupyter uses your web browser to display the code blocks and their results. The examples are written in Python.
As an alternative, Visual Studio Code directly supports the execution of jupyter notebooks.
For VS Code, please make sure to select the kernel of the virtual environment (see the [VS Code documentation](https://code.visualstudio.com/docs/datascience/jupyter-notebooks)).

Running the provided examples with the python environment created by poetry can be done by
```bash
poetry run jupyter notebook
```
as pointed out previously.

## High level function
```
tests/produce_data_manyStations.ipynb
#(please see next notebook for a faster example)
```
This notebook provides an example of how to download data, apply gridding and save the results as [netCDF files](https://docs.xarray.dev/en/stable/user-guide/io.html).
The AnalysisServiceDownload caches already obtained data on the local machine.
This allows different griddings without the necessity to repeat the request to the TOARDB, the statistical analysis and the subsequent download.
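
How a local cache can recognize an already downloaded request may be sketched as follows. This is an assumption for illustration and not the scheme used by AnalysisServiceDownload: the function `cache_path` and the parameter names in the usage note are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, request_params):
    """Map request parameters to a deterministic file path.

    Sorting the keys makes the path independent of the parameter order.
    """
    canonical = json.dumps(request_params, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.zip"
```

Before sending a request, a caller could test `cache_path("cache", params).exists()` and only contact the database if the file is missing.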

As an example, we calculated dma8epa_strict on a daily basis for the years 2000 to 2018 for all timeseries in the TOAR database.
The first attempt for this example covered the full range of 19 years in a single request. It turned out that extracting the data year by year is more reliable.
The individual requests also function as a progress report and allow working with the data while further requests are processed.
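
Splitting a long period into yearly requests can be sketched like this. The helper `yearly_ranges` is a hypothetical name, and treating the end of each interval as exclusive is an assumption of this example.

```python
from datetime import datetime


def yearly_ranges(first_year, last_year):
    """Return one (start, end) pair per calendar year, end exclusive."""
    return [
        (datetime(year, 1, 1), datetime(year + 1, 1, 1))
        for year in range(first_year, last_year + 1)
    ]
```

Each pair can then be used for one request at a time, so that a finished year is available while the next one is still being processed.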

As the gridding is done offline, it is executed for the already downloaded files whenever the notebook is rerun. Please note that the file name also contains the day of creation.

```bash
poetry run jupyter notebook tests/produce_data_withOptional.ipynb
```
This example is based on the previous one but uses additional arguments to reduce the number of stations per request. As an example, different classifications of the stations are used: first the toar1_category and second the type_of_area.
Details can be found in [documentation of the FastAPI REST interface](https://toar-data.fz-juelich.de/api/v2/#stationmeta) or the [user guide](https://toar-data.fz-juelich.de/sphinx/TOAR_UG_Vol03_Database/build/latex/toardatabase--userguide.pdf).

Selecting only a limited number of stations leads to significantly faster results. On the downside, the classifications used are not available for all stations.

## Retrieving data
```bash
poetry run jupyter notebook tests/get_sample_data_manual.ipynb
```
This notebook downloads data from the TOAR database using a manually created request.
The extracted data are written to disk. No further processing or gridding is done.
The result is a ZIP file containing two CSV files. The first one contains the statistical analysis of the timeseries and the second one the coordinates of the stations.
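
Reading such an archive can be sketched with the Python standard library. The function `read_csvs_from_zip` is a hypothetical helper; the comma as delimiter and the UTF-8 encoding are assumptions, and the names of the files inside the archive are not fixed here.

```python
import csv
import io
import zipfile


def read_csvs_from_zip(zip_source):
    """Return {member name: list of rows} for every CSV file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: UTF-8 encoded, comma-separated files.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text))
    return tables
```

`zip_source` can be a path to the downloaded file or a file-like object.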

## Retrieving data
```bash
poetry run jupyter notebook tests/get_sample_data.ipynb
```
For comparison with the previous example, this one performs the same request using the interface of this project.

## Retrieving data and visualization
```bash
poetry run jupyter notebook tests/quality_controll.ipynb
```
Notebook for downloading and visualizing data.
The data are downloaded and reused for subsequent executions of this notebook.
The gridding is done on the downloaded data. Gridded data are not saved to disk.

# Benchmarks

## Duration of Different Requests

```bash
poetry run python tests/benchmark.py
```
This script requests datasets with different durations (days to months) from the TOAR database and saves them to disk.
It reports the duration of the different requests.
There is no gridding involved.
Caution: This script can run for several hours.
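
Measuring the duration of a single request can be sketched with a small timing helper. `timed` is a hypothetical name and not taken from the benchmark script.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

Any function that performs a request can be wrapped this way, for example `data, seconds = timed(download, params)` with a user-defined `download`.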

# Supported Grids

The first supported grid is a regular grid with longitude and latitude.
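
A regular grid is fully described by its boundaries and its resolution. The following sketch computes the cell centers along one axis; the function name `cell_centers` is hypothetical and the code does not reflect how this project stores its grid.

```python
def cell_centers(lower, upper, resolution):
    """Return the centers of equally sized cells covering [lower, upper]."""
    n_cells = round((upper - lower) / resolution)
    return [lower + (k + 0.5) * resolution for k in range(n_cells)]


# Example: a grid with 2 degree cells in latitude and 2.5 degree cells in longitude.
latitudes = cell_centers(-90.0, 90.0, 2.0)
longitudes = cell_centers(-180.0, 180.0, 2.5)
```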

# Supported Variables

At the moment only a limited number of variables from the TOAR database is supported.

# Supported Time intervals

At the moment, only time ranges longer than one day work, i.e. a request with end = start + 1 day leads to a crash.
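
Until this is resolved, a time range can be checked before a request is sent. The helper `check_time_range` is a hypothetical name used for this sketch only.

```python
from datetime import datetime, timedelta


def check_time_range(start, end):
    """Raise ValueError unless the range is longer than one day."""
    if end - start <= timedelta(days=1):
        raise ValueError("time range must be longer than one day")


# A range of two days passes the check without raising an exception.
check_time_range(datetime(2000, 1, 1), datetime(2000, 1, 3))
```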

# Documentation of Source Code

At the moment, Carsten Hinz is working on the documentation of the source code while getting familiar with this project.
The aim is a brief overview of the functionality and the arguments of individual functions. As he personally does not like repetitions,
the documentation might not match other style guides.
It will definitely be possible to extend the documentation :-)

```python
class example:
    """An example class

    A more detailed explanation of the purpose of this example class.
    """

    def __init__(self, varA : int, varB : str):
        """Constructor

        Attributes:
        varA:
            brief details and more context
        varB:
            same here.
        """
        ...  # [implementation]

    def func1(self, att1, att2):
        """Brief

        details

        Attributes:
        -----------
        att1:
            brief/details
        att2:
            brief/details
        """
        ...  # [implementation]
```

```python
from dataclasses import dataclass


@dataclass
class dataClass:
    """Brief description

    optional details

    Parameters
    ----------
    anInt:
        brief description
    anStr:
        brief description
    secStr:
        brief description (explanation of default value, if this seems necessary)
    """
    anInt : int
    anStr : str
    secStr : str = "Default value"
```

# Tested platforms
This project has been tested on
- Rocky Linux 9