(Meta) Data versioning and changelog tables are defined

General Remark

Only processing logs that were operating on the database will go to the changelog tables defined below.
Processing logs from database insertion from the raw data (which was communicated with database providers) will be kept separately outside the database.

Changelog Tables

StationmetaChangelog

column name type description required
datetime timestamp with time zone Yes
station_id bigint Yes
type_of_change int controlled -- see below Yes
description text Yes
old_value character varying(256) Yes
new_value character varying(256) Yes
author foreign reference unity (see below) Yes

Remarks:

column name type description required
coordinate_validation_status string Yes
coordinate_validation_date timestamp with time zone Yes
coordinate_validator_id integer Yes

See Update from Oct 8th, 2020 (https://gitlab.version.fz-juelich.de/toar/toardb_fastapi/-/issues/17#note_61137) regarding coordinate_validation_status.

will be removed from StationmetaCoreBase and will only appear in the StationsChangelog table.

TimeseriesChangelog

All data ingestions (also extensions of time series) will also create a changelog entry!

column name type description required
datetime timestamp with time zone Yes
timeseries_id bigint Yes
type_of_change int controlled -- see below Yes
description text Yes
old_value character varying(254) (also for metadata updates) Yes
new_value character varying(254) (also for metadata updates) Yes
period_start timestamp with time zone for changes on data No
period_end timestamp with time zone for changes on data No
version char(28) the label of the new version, which is introduced by this change Yes
author foreign reference unity (see below) Yes

type_of_change – try with a common controlled vocabulary for both tables:

0 – created
1 – single value correction in metadata
2 – comprehensive metadata revision
3 – typographic correction of metadata
4 – unspecified data value corrections (this holds also, if there is only one single value to be corrected; the addition "unspecified" keeps all possibilities open to add "specified" corrections later (e. g. from QC)
5 – replaced data with a new version
6 – data value flagging
7 -

“whodunit“ (author):

(from "The FREE online Pro Git book"):
You may be wondering what the difference is between author and committer. The author is the person who originally wrote the patch, whereas the committer is the person who last applied the patch. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit — you as the author and the core member as the committer.
decision for TOAR database: Role of author is sufficient (no committer needed).

(Meta)Data Versions:

EXCURSUS: VERSION NUMBERS (STANDARDS)

PEP 386 defines the standard way to specify versions within the Python community.
verlib 0.1:
pip install verlib (Last released: Feb 15, 2012)

The most common scenarios:

  • Major.Minor.Micro
  • Major.Minor.Micro.alpha/beta

Here is an example that conforms to PEP 386:

version = { 'major': ...,
            'minor': ...,
            'micro': ...,
            'releaselevel': ...,  ('alpha', 'beta', 'final') 
            'serial': ... 
          }

Python Enhancement Proposals (PEPs): https://www.python.org/dev/peps

ACTUAL DECISIONS FOR TOARDB

Format of version string: "major.minor.micro" (e. g. 000001.000001.20200911100000)

micro:

  • timestamp (yyyymmddhhMMss)
  • used for all data insertions
    (by this approach snapshots of the database given a special date in the past can be accomplished even if time series have been extended)
  • will not be logged in changelog tables

minor:

  • string with 6 digits
  • will be logged in changelog tables

major:

  • string with 6 digits
  • will be logged in changelog tables
  • preliminary data will always have major version "000000"

Tests on Data Versions:

Tests on data versions will be done to clarify if indexing the string format may be slow (when accessing the data).
There might be a switch to big int and three columns for major, minor, micro.

some thoughts (not to be forgotten):

  • use special hash index for version string
  • what will happen, if (f. ex.):
    data available for the whole year, March has to be scaled (due to conversion error)
    --> period_start: March 1st, period_end: March 31st, description: "unit conversion..."
    --> version changes from 1.0 to 1.1 (for March)
    ==> now March values will be doubled:
    - version 1.0 ("wrong" values):These values will go to table "data_archive".
    - version 1.1 (converted values): These values will go to table "data".
Edited by Sabine Schröder