toarstats
This repository contains a collection of statistics that can be
calculated on hourly data. The statistics in the ozone_metrics.py
file are specific to ozone data. The statistics in the stats.py file
can be calculated for other variables as well.
Installation
To install a specific version of the package together with all of its
dependencies, run the following command from within the dist folder of
this repository:
python3 -m pip install toarstats-<version>-py3-none-any.whl
It is advised to set up a virtual environment beforehand.
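To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library. The helper name installed_version is hypothetical, and the sketch assumes the distribution is registered under the name toarstats, as the wheel file name suggests.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: "toarstats" is the distribution name (taken from the wheel file name).
found = installed_version("toarstats")
print(f"toarstats version: {found}" if found else "toarstats is not installed")
```

Running this inside the virtual environment used for the installation checks that the package landed in the intended environment.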
Usage
Import
To use the package, import calculate_statistics in one of the following ways:
from toarstats import calculate_statistics # or
from toarstats import * # or
import toarstats
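Once imported, a call might look like the sketch below, which uses the datetimes and values arguments described in the Interface section. The helper names build_hourly_inputs and run_example are hypothetical, the data are synthetic, and it is an assumption that plain Python lists are accepted for datetimes and values.

```python
from datetime import datetime, timedelta


def build_hourly_inputs(start, hours, value=40.0):
    """Return (datetimes, values) lists for a synthetic hourly series."""
    datetimes = [start + timedelta(hours=i) for i in range(hours)]
    values = [value] * hours  # synthetic constant values, not real measurements
    return datetimes, values


def run_example():
    """Call calculate_statistics on two days of synthetic hourly data."""
    # Imported lazily so the helper above stays usable without the package.
    from toarstats import calculate_statistics

    datetimes, values = build_hourly_inputs(datetime(2020, 1, 1), 48)
    # Assumption: plain Python lists are accepted for datetimes and values.
    return calculate_statistics(
        sampling="daily",
        statistics=["mean", "value_count"],
        datetimes=datetimes,
        values=values,
    )
```

Calling run_example() requires toarstats to be installed. The structure of the returned result is not described in this section, so inspect it before relying on a particular layout.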
Interface
The calculate_statistics interface is defined like this:
calculate_statistics(
    sampling=None, statistics=None, data=None, metadata=None, seasons=None,
    crops=None, data_capture=None, datetimes=None, values=None,
    station_lat=None, station_lon=None, station_climatic_zone=None
)
"""Calculate the requested statistics.
This function is the public interface for the ``toarstats`` package.
It takes all the user inputs and returns the result of all requested
statistics and metrics.
:param sampling: temporal aggregation, one of ``daily``,
``monthly``, ``seasonal``, ``vegseason``,
``summer``, ``xsummer``, or ``annual``;
``summer`` will pick the 6-month summer season in
the hemisphere where the station is located;
``xsummer`` does the same for a 7-month summer
season;
``vegseason`` also requires the ``crops`` argument
and will then determine the appropriate growing
seasons based on the ``climatic_zone`` metadata and
crop type
:param statistics: a single statistic or metric or a list of
statistics and metrics to call, these must be
defined in ``stats.py`` or ``ozone_metrics.py``
:param data: data containing a list of date time values and
associated parameter values on which to calculate the
statistics;
if not given, both ``datetimes`` and ``values`` must be
given instead
:param metadata: metadata information about the station's latitude,
longitude and climatic zone (keys: ``station_lat``,
``station_lon`` and ``station_climatic_zone``);
if not given and any requested statistic or metric
needs metadata information, ``station_lat``,
``station_lon`` and ``station_climatic_zone`` must
be given instead
:param seasons: a list of season names for seasonal statistics;
for a definition of seasons, see ``stats_utils.py``;
if ``None`` is passed, seasonal statistics will be
computed for the default seasons of the respective
metrics; normally, these are the four meteorological
seasons ``DJF``, ``MAM``, ``JJA`` and ``SON``;
if sampling is set to ``summer`` or ``xsummer``, the
correct season will be determined based on the
``station_lat`` metadata;
if sampling is ``vegseason`` and the ``crops``
argument is given, the appropriate growing seasons
will be selected based on the crop type and
``climatic_zone`` metadata;
the growing seasons for ``wheat`` and ``rice`` will
also be selected if sampling is ``seasonal`` and the
chosen metrics contain ``aot40`` or ``w126``
:param crops: a single crop type or a list of crop types for
``vegseason`` statistics;
default is ``["wheat", "rice"]``
:param data_capture: a fractional value which will be used to
identify valid data periods;
the default is 0.75 for most statistics,
meaning that 75% of hourly values must be
present in a given interval in order to mark a
result as valid;
note that the ``value_count``, ``mean`` and
``standard_deviation`` statistics do not use
this capture criterion: ``value_count`` counts
all values, while ``mean`` and
``standard_deviation`` are calculated when there
are at least 10 valid hourly values in an interval;
the fraction is not always applied to the
original hourly values; it may, for example,
also be applied to the number of valid days
for a ``monthly``, ``seasonal``, or ``annual``
statistic
:param datetimes: must be given with ``values`` if the ``data``
argument is missing
:param values: must be given with ``datetimes`` if the ``data``
argument is missing
:param station_lat: station's latitude, used if missing in the
``metadata`` argument
:param station_lon: station's longitude, used if missing in the
``metadata`` argument
:param station_climatic_zone: station's climatic zone, used if
missing in the ``metadata`` argument
"""
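The data_capture criterion described above amounts to a fraction check on the values present in an interval. The sketch below illustrates the idea for a single day; it is not the package's implementation. The helper names is_valid_interval and mean_is_calculated are hypothetical, missing hourly values are represented by None for the sake of the example, and treating exactly 75% as sufficient is an assumption.

```python
def is_valid_interval(hourly_values, expected_count, data_capture=0.75):
    """Return True if enough hourly values are present in the interval."""
    present = sum(1 for v in hourly_values if v is not None)
    # Assumption: a fraction exactly equal to data_capture counts as valid.
    return present / expected_count >= data_capture


def mean_is_calculated(hourly_values, minimum_count=10):
    """mean and standard_deviation only need at least 10 valid hourly values."""
    present = sum(1 for v in hourly_values if v is not None)
    return present >= minimum_count


# A day with 18 of 24 hourly values present: 18 / 24 = 0.75.
day = [30.0] * 18 + [None] * 6
print(is_valid_interval(day, expected_count=24))

# Dropping one present value leaves 17 of 24, which is below 0.75.
print(is_valid_interval(day[1:], expected_count=24))
```

The second helper mirrors the exception noted for mean and standard_deviation, which depend on a minimum count of valid hourly values rather than on the capture fraction.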