toarstats
This repository contains a collection of statistical tools for the analysis of time series data. It is split into two subpackages:
- metrics: collection of statistics and metrics to calculate on hourly time series data (some specific to ozone measurements)
- trends: trend analysis on time series data using ordinary least squares (OLS) or quantile regression
Installation
To install a specific version of the package together with all its dependencies, run the following command from within the dist folder of this repository:
python3 -m pip install toarstats-<version>-py3-none-any.whl
It is advised to set up a virtual environment beforehand.
metrics
This subpackage contains a collection of statistics that can be calculated on hourly data. The statistics in the ozone_metrics.py file are specific to ozone data. The statistics in the stats.py file can be calculated for other variables as well.
Import
To use the package, import calculate_statistics in one of the following ways:
from toarstats.metrics import calculate_statistics # or
from toarstats.metrics import * # or
import toarstats.metrics
Interface
The calculate_statistics interface is defined like this:
calculate_statistics(
sampling=None, statistics=None, data=None, metadata=None, seasons=None,
crops=None, min_data_capture=None, datetimes=None, values=None,
station_lat=None, station_lon=None, station_climatic_zone=None
)
"""Calculate the requested statistics.
This function is the public interface for the ``toarstats`` package.
It takes all the user inputs and returns the result of all requested
statistics and metrics.
:param sampling: temporal aggregation, one of ``daily``,
``monthly``, ``seasonal``, ``vegseason``,
``summer``, ``xsummer``, ``annual``, or ``custom``;
``summer`` will pick the 6-month summer season in
the hemisphere where the station is located;
``xsummer`` does the same for a 7-month summer
season;
``vegseason`` also requires the ``crops`` argument
and will then determine the appropriate growing
seasons based on the ``climatic_zone`` metadata and
crop type;
``custom`` will create one aggregate value over the
entire time series
:param statistics: a single statistic or metric or a list of
statistics and metrics to call, these must be
defined in ``stats.py`` or ``ozone_metrics.py``
:param data: data containing a list of date time values and
associated parameter values on which to calculate the
statistics;
if not given, both ``datetimes`` and ``values`` must be
given instead
:param metadata: metadata information about the station's latitude,
longitude and climatic zone (keys: ``station_lat``,
``station_lon`` and ``station_climatic_zone``);
if not given and any requested statistic or metric
needs metadata information, ``station_lat``,
``station_lon`` and ``station_climatic_zone`` must
be given instead
:param seasons: a list of season names for seasonal statistics;
for a definition of seasons, see ``stats_utils.py``;
if ``None`` is passed, seasonal statistics will be
computed for the default seasons of the respective
metrics, normally, these are the four meteorological
seasons ``DJF``, ``MAM``, ``JJA`` and ``SON``;
if sampling is set to ``summer`` or ``xsummer``, the
correct season will be determined based on the
``station_lat`` metadata;
if sampling is ``vegseason`` and the ``crops``
argument is given, the appropriate growing seasons
will be selected based on the crop type and
``climatic_zone`` metadata;
the growing seasons for ``wheat`` and ``rice`` will
also be selected if sampling is ``seasonal`` and the
chosen metrics contain ``aot40`` or ``w126``
:param crops: a single crop type or a list of crop types for
``vegseason`` statistics;
default is ``["wheat", "rice"]``
:param min_data_capture: a fractional value which will be used to
identify valid data periods;
the default is 0.75 for most statistics,
meaning that 75% of hourly values must be
present in a given interval in order to
mark a result as valid;
note that the ``count``, ``mean`` and
``stddev`` statistics do not use this
capture criterion, ``count`` counts all
values, ``mean`` and ``stddev`` are
calculated when there are at least 10 valid
hourly values in an interval;
the fraction may not always be applied to
original hourly values, but could for
example also be used to count the number of
valid days for a ``monthly``, ``seasonal``,
or ``annual`` statistic
:param datetimes: must be given with ``values`` if the ``data``
argument is missing
:param values: must be given with ``datetimes`` if the ``data``
argument is missing
:param station_lat: station's latitude, used if missing in the
``metadata`` argument
:param station_lon: station's longitude, used if missing in the
``metadata`` argument
:param station_climatic_zone: station's climatic zone, used if
missing in the ``metadata`` argument
"""
trends
This subpackage contains a collection of regression methods.
Import
To use the package, import calculate_trend in one of the following ways:
from toarstats.trends import calculate_trend # or
from toarstats.trends import * # or
import toarstats.trends
Interface
The calculate_trend interface is defined like this:
calculate_trend(method, data, quantiles=None, num_samples=1000)
"""Calculate the trend using the requested method.
This function is the public interface for the ``trends`` subpackage.
It takes all the user inputs and returns the result of the requested
trend analysis.
The calculation follows "Guidance note on best statistical practices
for TOAR analyses" (Chang et al. 2023,
https://arxiv.org/pdf/2304.14236.pdf) Annex E.
:param method: either ``"OLS"`` or ``"quant"``
:param data: data containing a list of date time values and
associated parameter values on which to calculate the
trend
:param quantiles: a single quantile or a list of quantiles to
calculate, these must be between 0 and 1; only
needed when ``method="quant"``
:param num_samples: number of sampled trends in moving block
bootstrap
"""