Niklas Selke authored 1c5aba62

toarstats

This repository contains a collection of statistical tools for the analysis of time series data. It is split into two subpackages:

  • metrics: a collection of statistics and metrics for hourly time series data (some specific to ozone measurements)
  • trends: trend calculation on time series data using ordinary least squares (OLS) or quantile regression

Installation

To install a specific version of the package together with all its dependencies, run the following command from within the dist folder of this repository:

python3 -m pip install toarstats-<version>-py3-none-any.whl

It is advised to set up a virtual environment beforehand.

metrics

This subpackage contains a collection of statistics that can be calculated on hourly data. The metrics in the ozone_metrics.py file are specific to ozone data, while the statistics in the stats.py file can be calculated for other variables as well.
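As an illustration of what makes a metric ozone-specific, the following minimal sketch accumulates hourly ozone exceedances above 40 ppb, the idea behind AOT40-type metrics such as the aot40 metric named in the interface documentation. The function aot40 here is hypothetical and not the toarstats implementation: it assumes values in ppb that are already restricted to the relevant hours and season, skips missing hours (None), and applies no data-capture correction.

```python
def aot40(hourly_ozone_ppb):
    """Sum of hourly exceedances above 40 ppb (result in ppb h).

    Hypothetical sketch: assumes the input already covers only the hours
    and season of interest; missing hours are given as None and skipped.
    """
    threshold = 40.0
    return sum(
        value - threshold
        for value in hourly_ozone_ppb
        if value is not None and value > threshold
    )


# Only 42.5 and 60.0 exceed the threshold: 2.5 + 20.0
print(aot40([35.0, 42.5, None, 60.0, 40.0]))  # 22.5
```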

Import

To use the subpackage, import calculate_statistics in one of the following ways:

from toarstats.metrics import calculate_statistics # or
from toarstats.metrics import * # or
import toarstats.metrics

Interface

The calculate_statistics interface is defined like this:

def calculate_statistics(
    sampling=None, statistics=None, data=None, metadata=None, seasons=None,
    crops=None, min_data_capture=None, datetimes=None, values=None,
    station_lat=None, station_lon=None, station_climatic_zone=None
):
    """Calculate the requested statistics.

    This function is the public interface for the ``toarstats`` package.
    It takes all the user inputs and returns the result of all requested
    statistics and metrics.

    :param sampling: temporal aggregation, one of ``daily``,
                     ``monthly``, ``seasonal``, ``vegseason``,
                     ``summer``, ``xsummer``, or ``annual``;
                     ``summer`` will pick the 6-month summer season in
                     the hemisphere where the station is located;
                     ``xsummer`` does the same for a 7-month summer
                     season;
                     ``vegseason`` also uses the ``crops`` argument
                     and will then determine the appropriate growing
                     seasons based on the ``climatic_zone`` metadata and
                     crop type
    :param statistics: a single statistic or metric or a list of
                       statistics and metrics to calculate; these must be
                       defined in ``stats.py`` or ``ozone_metrics.py``
    :param data: data containing a list of date time values and
                 associated parameter values on which to calculate the
                 statistics;
                 if not given, both ``datetimes`` and ``values`` must be
                 given instead
    :param metadata: metadata information about the station's latitude,
                     longitude and climatic zone (keys: ``station_lat``,
                     ``station_lon`` and ``station_climatic_zone``);
                     if not given and any requested statistic or metric
                     needs metadata information, ``station_lat``,
                     ``station_lon`` and ``station_climatic_zone`` must
                     be given instead
    :param seasons: a list of season names for seasonal statistics;
                    for a definition of seasons, see ``stats_utils.py``;
                    if ``None`` is passed, seasonal statistics will be
                    computed for the default seasons of the respective
                    metrics; normally these are the four meteorological
                    seasons ``DJF``, ``MAM``, ``JJA`` and ``SON``;
                    if sampling is set to ``summer`` or ``xsummer``, the
                    correct season will be determined based on the
                    ``station_lat`` metadata;
                    if sampling is ``vegseason`` and the ``crops``
                    argument is given, the appropriate growing seasons
                    will be selected based on the crop type and
                    ``climatic_zone`` metadata;
                    the growing seasons for ``wheat`` and ``rice`` will
                    also be selected if sampling is ``seasonal`` and the
                    chosen metrics contain ``aot40`` or ``w126``
    :param crops: a single crop type or a list of crop types for
                  ``vegseason`` statistics;
                  default is ``["wheat", "rice"]``
    :param min_data_capture: a fractional value which will be used to
                             identify valid data periods;
                             the default is 0.75 for most statistics,
                             meaning that 75% of hourly values must be
                             present in a given interval in order to
                             mark a result as valid;
                             note that the ``count``, ``mean`` and
                             ``stddev`` statistics do not use this
                             capture criterion: ``count`` counts all
                             values, while ``mean`` and ``stddev`` are
                             calculated when there are at least 10 valid
                             hourly values in an interval;
                             the fraction may not always be applied to
                             original hourly values, but could for
                             example also be used to count the number of
                             valid days for a ``monthly``, ``seasonal``,
                             or ``annual`` statistic
    :param datetimes: must be given with ``values`` if the ``data``
                      argument is missing
    :param values: must be given with ``datetimes`` if the ``data``
                   argument is missing
    :param station_lat: station's latitude, used if missing in the
                        ``metadata`` argument
    :param station_lon: station's longitude, used if missing in the
                        ``metadata`` argument
    :param station_climatic_zone: station's climatic zone, used if
                                  missing in the ``metadata`` argument
    """

trends

This subpackage contains regression methods for trend analysis: ordinary least squares (OLS) and quantile regression.

Import

To use the subpackage, import calculate_trend in one of the following ways:

from toarstats.trends import calculate_trend # or
from toarstats.trends import * # or
import toarstats.trends

Interface

The calculate_trend interface is defined like this:

def calculate_trend(method, data, formula="value ~ datetime", quantiles=None):
    """Calculate the trend using the requested method.

    This function is the public interface for the ``trends`` subpackage.
    It takes all the user inputs and returns the result of the requested
    trend analysis.

    :param method: either ``"OLS"`` or ``"quant"``
    :param data: data containing a list of date time values and
                 associated parameter values on which to calculate the
                 trend
    :param formula: the formula specifying the model
    :param quantiles: a single quantile or a list of quantiles to
                      calculate; these must be between 0 and 1 and are
                      only needed when ``method="quant"``
    """