toarstats
This repository contains a collection of statistical tools for the analysis of time series data. It is split into two subpackages:
- metrics: collection of statistics and metrics to calculate on hourly time series data (some specific to ozone measurements)
- trends: calculate trends on time series data using ordinary least squares (OLS) or quantile regression
Installation
To install a specific version of the package together with all its dependencies, run the following command from within the dist folder of this repository:
python3 -m pip install toarstats-<version>-py3-none-any.whl
It is advised to set up a virtual environment beforehand.
metrics
This subpackage contains a collection of statistics that can be calculated on hourly data. The statistics in the ozone_metrics.py file are specific to ozone data. The statistics in the stats.py file can be calculated for other variables as well.
Import
To use the subpackage, import calculate_statistics in one of the following ways:
from toarstats.metrics import calculate_statistics # or
from toarstats.metrics import * # or
import toarstats.metrics
Interface
The calculate_statistics interface is defined like this:
def calculate_statistics(
sampling=None, statistics=None, data=None, metadata=None, seasons=None,
crops=None, min_data_capture=None, datetimes=None, values=None,
station_lat=None, station_lon=None, station_climatic_zone=None
):
"""Calculate the requested statistics.
This function is the public interface for the ``toarstats`` package.
It takes all the user inputs and returns the result of all requested
statistics and metrics.
:param sampling: temporal aggregation, one of ``daily``,
``monthly``, ``seasonal``, ``vegseason``,
``summer``, ``xsummer``, or ``annual``;
``summer`` will pick the 6-month summer season in
the hemisphere where the station is located;
``xsummer`` does the same for a 7-month summer
season;
``vegseason`` also requires the ``crops`` argument
and will then determine the appropriate growing
seasons based on the ``climatic_zone`` metadata and
crop type
:param statistics: a single statistic or metric or a list of
statistics and metrics to call; these must be
defined in ``stats.py`` or ``ozone_metrics.py``
:param data: data containing a list of date time values and
associated parameter values on which to calculate the
statistics;
if not given, both ``datetimes`` and ``values`` must be
given instead
:param metadata: metadata information about the station's latitude,
longitude and climatic zone (keys: ``station_lat``,
``station_lon`` and ``station_climatic_zone``);
if not given and any requested statistic or metric
needs metadata information, ``station_lat``,
``station_lon`` and ``station_climatic_zone`` must
be given instead
:param seasons: a list of season names for seasonal statistics;
for a definition of seasons, see ``stats_utils.py``;
if ``None`` is passed, seasonal statistics will be
computed for the default seasons of the respective
metrics; normally, these are the four meteorological
seasons ``DJF``, ``MAM``, ``JJA`` and ``SON``;
if sampling is set to ``summer`` or ``xsummer``, the
correct season will be determined based on the
``station_lat`` metadata;
if sampling is ``vegseason`` and the ``crops``
argument is given, the appropriate growing seasons
will be selected based on the crop type and
``climatic_zone`` metadata;
the growing seasons for ``wheat`` and ``rice`` will
also be selected if sampling is ``seasonal`` and the
chosen metrics contain ``aot40`` or ``w126``
:param crops: a single crop type or a list of crop types for
``vegseason`` statistics;
default is ``["wheat", "rice"]``
:param min_data_capture: a fractional value which will be used to
identify valid data periods;
the default is 0.75 for most statistics,
meaning that 75% of hourly values must be
present in a given interval in order to
mark a result as valid;
note that the ``count``, ``mean`` and
``stddev`` statistics do not use this
capture criterion, ``count`` counts all
values, ``mean`` and ``stddev`` are
calculated when there are at least 10 valid
hourly values in an interval;
the fraction may not always be applied to
original hourly values, but could for
example also be used to count the number of
valid days for a ``monthly``, ``seasonal``,
or ``annual`` statistic
:param datetimes: must be given with ``values`` if the ``data``
argument is missing
:param values: must be given with ``datetimes`` if the ``data``
argument is missing
:param station_lat: station's latitude, used if missing in the
``metadata`` argument
:param station_lon: station's longitude, used if missing in the
``metadata`` argument
:param station_climatic_zone: station's climatic zone, used if
missing in the ``metadata`` argument
"""
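As a rough illustration of the interface above, the following sketch builds one week of synthetic hourly values with the standard library and requests the ``mean`` and ``count`` statistics at ``daily`` sampling. The helper names `make_hourly_series` and `run_example` and the synthetic numbers are hypothetical. It is also an assumption that plain Python lists are accepted for ``datetimes`` and ``values``; the package may expect pandas objects instead. The return type is not described in this document, so the result is only printed.

```python
from datetime import datetime, timedelta
import math


def make_hourly_series(start, hours):
    """Return (datetimes, values) for a synthetic hourly series (hypothetical helper)."""
    datetimes = [start + timedelta(hours=h) for h in range(hours)]
    # Simple diurnal cycle around 40 units; the numbers are purely illustrative.
    values = [40.0 + 15.0 * math.sin(2 * math.pi * dt.hour / 24) for dt in datetimes]
    return datetimes, values


def run_example():
    """Call calculate_statistics on the synthetic series if toarstats is installed."""
    try:
        from toarstats.metrics import calculate_statistics
    except ImportError:
        print("toarstats is not installed; skipping the call")
        return None
    datetimes, values = make_hourly_series(datetime(2020, 1, 1), 7 * 24)
    # Assumption: list inputs are accepted for datetimes and values.
    return calculate_statistics(
        sampling="daily",
        statistics=["mean", "count"],
        datetimes=datetimes,
        values=values,
    )


if __name__ == "__main__":
    print(run_example())
```

Because only ``mean`` and ``count`` are requested, no station metadata should be needed here; statistics that depend on the hemisphere or climatic zone would additionally require ``metadata`` or the ``station_*`` arguments.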
trends
This subpackage contains a collection of regression methods.
Import
To use the subpackage, import calculate_trend in one of the following ways:
from toarstats.trends import calculate_trend # or
from toarstats.trends import * # or
import toarstats.trends
Interface
The calculate_trend interface is defined like this:
def calculate_trend(method, data, formula="value ~ datetime", quantiles=None):
"""Calculate the trend using the requested method.
This function is the public interface for the ``trends`` subpackage.
It takes all the user inputs and returns the result of the requested
trend analysis.
:param method: either ``"OLS"`` or ``"quant"``
:param data: data containing a list of date time values and
associated parameter values on which to calculate the
trend
:param formula: the formula specifying the model
:param quantiles: a single quantile or a list of quantiles to
calculate; these must be between 0 and 1; only
needed when ``method="quant"``
"""
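A call to this interface might look like the sketch below, which fits a median and a 95th-percentile quantile regression to a short synthetic annual series. The helpers `make_annual_rows` and `run_trend_example` are hypothetical. It is an assumption that ``data`` can be a pandas DataFrame whose column names (``datetime`` and ``value``) match the default formula; the return type is not described in this document, so the result is only printed.

```python
from datetime import datetime


def make_annual_rows(first_year, n_years, slope=0.5, intercept=30.0):
    """Return synthetic rows with a linear trend plus a small wiggle (hypothetical helper)."""
    return [
        {
            "datetime": datetime(first_year + i, 1, 1),
            # Alternate +1/-1 so the series is not perfectly linear.
            "value": intercept + slope * i + (1.0 if i % 2 else -1.0),
        }
        for i in range(n_years)
    ]


def run_trend_example():
    """Call calculate_trend on the synthetic rows if the dependencies are installed."""
    try:
        import pandas as pd
        from toarstats.trends import calculate_trend
    except ImportError:
        print("pandas or toarstats is not installed; skipping the call")
        return None
    # Assumption: a DataFrame with "datetime" and "value" columns is accepted,
    # matching the default formula "value ~ datetime".
    data = pd.DataFrame(make_annual_rows(2000, 20))
    # Quantiles must lie between 0 and 1 and are only needed for method="quant".
    return calculate_trend("quant", data, quantiles=[0.5, 0.95])


if __name__ == "__main__":
    print(run_trend_example())
```

For an ordinary least squares fit, the same call with ``"OLS"`` as the method and without ``quantiles`` should apply, since ``quantiles`` is documented as only needed for ``method="quant"``.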