
toarstats

This repository contains a collection of statistical tools for the analysis of time series data. It is split into two subpackages:

  • metrics: collection of statistics and metrics to calculate on hourly time series data (some specific to ozone measurements)
  • trends: calculate trends on time series data using ordinary least squares (OLS) or quantile regression

Installation

To install a specific version of the package together with all its dependencies, run the following command from within the dist folder of this repository:

python3 -m pip install toarstats-<version>-py3-none-any.whl

It is advised to set up a virtual environment beforehand.

metrics

This subpackage contains a collection of statistics that can be calculated on hourly data. The statistics in the ozone_metrics.py file are specific to ozone data. The statistics in the stats.py file can be calculated for other variables as well.

Import

To use the package, import calculate_statistics in one of the following ways:

from toarstats.metrics import calculate_statistics # or
from toarstats.metrics import * # or
import toarstats.metrics
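The input types are only described here as "a list of date time values and associated parameter values". As a starting point, the sketch below builds two days of synthetic hourly data with the standard library. The values, and the assumption that plain Python lists are accepted for datetimes and values, are illustrative only; the function may require other containers (for example pandas objects). The call is left as a comment and uses only keyword arguments documented in the interface.

```python
from datetime import datetime, timedelta

# Two days of synthetic hourly timestamps (illustrative data only).
start = datetime(2020, 7, 1)
datetimes = [start + timedelta(hours=h) for h in range(48)]

# Hypothetical ozone-like values: higher during daytime hours (06:00-17:59).
values = [30.0 if 6 <= dt.hour < 18 else 20.0 for dt in datetimes]

# Assumed call, using only keyword arguments documented in the interface;
# uncomment once toarstats is installed (input types are an assumption):
# from toarstats.metrics import calculate_statistics
# result = calculate_statistics(sampling="daily", statistics="mean",
#                               datetimes=datetimes, values=values)
```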

Interface

The calculate_statistics interface is defined like this:

def calculate_statistics(
    sampling=None, statistics=None, data=None, metadata=None, seasons=None,
    crops=None, min_data_capture=None, datetimes=None, values=None,
    station_lat=None, station_lon=None, station_climatic_zone=None
):
    """Calculate the requested statistics.

    This function is the public interface for the ``toarstats`` package.
    It takes all the user inputs and returns the result of all requested
    statistics and metrics.

    :param sampling: temporal aggregation, one of ``daily``,
                     ``monthly``, ``seasonal``, ``vegseason``,
                     ``summer``, ``xsummer``, or ``annual``;
                     ``summer`` will pick the 6-month summer season in
                     the hemisphere where the station is located;
                     ``xsummer`` does the same for a 7-month summer
                     season;
                     ``vegseason`` also requires the ``crops`` argument
                     and will then determine the appropriate growing
                     seasons based on the ``climatic_zone`` metadata and
                     crop type
    :param statistics: a single statistic or metric or a list of
                       statistics and metrics to call, these must be
                       defined in ``stats.py`` or ``ozone_metrics.py``
    :param data: data containing a list of date time values and
                 associated parameter values on which to calculate the
                 statistics;
                 if not given, both ``datetimes`` and ``values`` must be
                 given instead
    :param metadata: metadata information about the station's latitude,
                     longitude and climatic zone (keys: ``station_lat``,
                     ``station_lon`` and ``station_climatic_zone``);
                     if not given and any requested statistic or metric
                     needs metadata information, ``station_lat``,
                     ``station_lon`` and ``station_climatic_zone`` must
                     be given instead
    :param seasons: a list of season names for seasonal statistics;
                    for a definition of seasons, see ``stats_utils.py``;
                    if ``None`` is passed, seasonal statistics will be
                    computed for the default seasons of the respective
                    metrics; normally, these are the four meteorological
                    seasons ``DJF``, ``MAM``, ``JJA`` and ``SON``;
                    if sampling is set to ``summer`` or ``xsummer``, the
                    correct season will be determined based on the
                    ``station_lat`` metadata;
                    if sampling is ``vegseason`` and the ``crops``
                    argument is given, the appropriate growing seasons
                    will be selected based on the crop type and
                    ``climatic_zone`` metadata;
                    the growing seasons for ``wheat`` and ``rice`` will
                    also be selected if sampling is ``seasonal`` and the
                    chosen metrics contain ``aot40`` or ``w126``
    :param crops: a single crop type or a list of crop types for
                  ``vegseason`` statistics;
                  default is ``["wheat", "rice"]``
    :param min_data_capture: a fractional value which will be used to
                             identify valid data periods;
                             the default is 0.75 for most statistics,
                             meaning that 75% of hourly values must be
                             present in a given interval in order to
                             mark a result as valid;
                             note that the ``count``, ``mean`` and
                             ``stddev`` statistics do not use this
                             capture criterion: ``count`` counts all
                             values, while ``mean`` and ``stddev`` are
                             calculated when there are at least 10 valid
                             hourly values in an interval;
                             the fraction may not always be applied to
                             original hourly values, but could for
                             example also be used to count the number of
                             valid days for a ``monthly``, ``seasonal``,
                             or ``annual`` statistic
    :param datetimes: must be given with ``values`` if the ``data``
                      argument is missing
    :param values: must be given with ``datetimes`` if the ``data``
                   argument is missing
    :param station_lat: station's latitude, used if missing in the
                        ``metadata`` argument
    :param station_lon: station's longitude, used if missing in the
                        ``metadata`` argument
    :param station_climatic_zone: station's climatic zone, used if
                                  missing in the ``metadata`` argument
    """

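The default seasons DJF, MAM, JJA and SON are named after the initials of their months (December-January-February, and so on). The sketch below maps a timestamp to its meteorological season using these standard definitions, not the package's own season tables in stats_utils.py. Because DJF spans the turn of the year, December is attributed here to the following year's winter; that convention is an assumption, and the package may label seasons differently.

```python
from datetime import datetime

# Standard meteorological seasons (month numbers); not read from stats_utils.py.
SEASON_MONTHS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}
MONTH_TO_SEASON = {m: s for s, months in SEASON_MONTHS.items() for m in months}


def season_key(dt):
    """Return (season, season_year); December counts towards the next year's DJF (assumed)."""
    season = MONTH_TO_SEASON[dt.month]
    season_year = dt.year + 1 if dt.month == 12 else dt.year
    return season, season_year


print(season_key(datetime(2019, 12, 15)))  # ('DJF', 2020)
print(season_key(datetime(2020, 7, 1)))    # ('JJA', 2020)
```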
trends

This subpackage contains regression methods for trend analysis: ordinary least squares (OLS) and quantile regression.
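For orientation, an OLS trend is the slope of a straight line fitted to the values against time. The sketch below computes it in plain Python, with time expressed as fractional years since an arbitrary epoch (assuming 365.25-day years) so that the slope reads as units per year. It illustrates the technique only; the function names are hypothetical and this is not the calculate_trend implementation.

```python
from datetime import datetime

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: mean Julian year


def to_decimal_years(datetimes, epoch=datetime(2000, 1, 1)):
    """Express datetimes as (fractional) years elapsed since ``epoch``."""
    return [(dt - epoch).total_seconds() / SECONDS_PER_YEAR for dt in datetimes]


def ols_trend(t, y):
    """Least squares fit y ~ intercept + slope * t; returns (intercept, slope)."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


# Synthetic yearly values for 2000-2009: +0.5 units/year plus alternating noise.
dates = [datetime(year, 7, 1) for year in range(2000, 2010)]
y = [3.0 + 0.5 * k + (0.2 if k % 2 else -0.2) for k in range(10)]
intercept, slope = ols_trend(to_decimal_years(dates), y)
print(f"trend: {slope:.3f} units per year")
```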

Import

To use the package, import calculate_trend in one of the following ways:

from toarstats.trends import calculate_trend # or
from toarstats.trends import * # or
import toarstats.trends
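Whereas OLS fits the conditional mean, quantile regression at a quantile tau (one entry of the quantiles argument) minimises the "pinball" (check) loss, and so describes how, say, the 5th or 95th percentile changes over time. For a straight line with an intercept, some optimal fit always passes through at least two observations with distinct times (a property of the underlying linear program), so for small samples the exact solution can be found by trying every pair of points. The sketch below does exactly that; it is cubic in the number of points, the function names are hypothetical, and it only illustrates what method="quant" estimates, not how the package computes it.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_trend(t, y, tau):
    """Exact linear quantile regression by enumerating lines through point pairs.

    Returns (intercept, slope); only suitable for small illustrative samples.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must be between 0 and 1")
    best = None
    for (t1, y1), (t2, y2) in combinations(zip(t, y), 2):
        if t1 == t2:
            continue  # two points at the same time do not define a line in (t, y)
        slope = (y2 - y1) / (t2 - t1)
        intercept = y1 - slope * t1
        residuals = (yi - intercept - slope * ti for ti, yi in zip(t, y))
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct time values")
    return best[1], best[2]


t = list(range(10))
y = [1.0 + 2.0 * k for k in range(10)]
y[-1] = 100.0  # a single outlier at the end of the record
print(quantile_trend(t, y, 0.5))  # (1.0, 2.0): the median line ignores the outlier
```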

Interface

The calculate_trend interface is defined like this:

def calculate_trend(method, data, quantiles=None, num_samples=1000):
    """Calculate the trend using the requested method.

    This function is the public interface for the ``trends`` subpackage.
    It takes all the user inputs and returns the result of the requested
    trend analysis.

    The calculation follows "Guidance note on best statistical practices
    for TOAR analyses" (Chang et al. 2023,
    https://arxiv.org/pdf/2304.14236.pdf) Annex E.

    :param method: either ``"OLS"`` or ``"quant"``
    :param data: data containing a list of date time values and
                 associated parameter values on which to calculate the
                 trend
    :param quantiles: a single quantile or a list of quantiles to
                      calculate, these must be between 0 and 1; only
                      needed when ``method="quant"``
    :param num_samples: number of trends sampled in the moving block
                        bootstrap
    """