(Meta) Data versioning and changelog tables are defined
General Remark
Only logs of processing steps that operate on the database will go to the changelog tables defined below.
Logs from the insertion of raw data into the database (i.e. the raw data that was communicated with the database providers) will be kept separately, outside the database.
Changelog Tables
StationmetaChangelog
column name | type | description | required |
---|---|---|---|
datetime | timestamp with time zone | | Yes |
station_id | bigint | | Yes |
type_of_change | int | controlled -- see below | Yes |
description | text | | Yes |
old_value | character varying(256) | | Yes |
new_value | character varying(256) | | Yes |
author | foreign reference | (see below) | Yes |
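For illustration only, a minimal SQLAlchemy sketch of this table is given below; it is an assumption, not the actual toardb_fastapi code, and the surrogate primary key and the referenced table names (stationmeta_core, persons) are illustrative:

```python
# Minimal sketch of the StationmetaChangelog table defined above (names are assumptions).
from sqlalchemy import BigInteger, Column, ForeignKey, Integer, String, Text, TIMESTAMP
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class StationmetaChangelog(Base):
    __tablename__ = "stationmeta_changelog"

    # every change gets its own row; a surrogate primary key is assumed here
    id = Column(Integer, primary_key=True)
    datetime = Column(TIMESTAMP(timezone=True), nullable=False)
    station_id = Column(BigInteger, ForeignKey("stationmeta_core.id"), nullable=False)
    type_of_change = Column(Integer, nullable=False)  # controlled vocabulary, see below
    description = Column(Text, nullable=False)
    old_value = Column(String(256), nullable=False)
    new_value = Column(String(256), nullable=False)
    author_id = Column(Integer, ForeignKey("persons.id"), nullable=False)  # "whodunit", see below
```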
Remarks:

The following columns will be removed from StationmetaCoreBase and will only appear in the StationmetaChangelog table:

column name | type | description | required |
---|---|---|---|
coordinate_validation_status | string | | Yes |
coordinate_validation_date | timestamp with time zone | | Yes |
coordinate_validator_id | integer | | Yes |

See the update from Oct 8th, 2020 (https://gitlab.version.fz-juelich.de/toar/toardb_fastapi/-/issues/17#note_61137) regarding coordinate_validation_status.
TimeseriesChangelog
All data ingestions (including extensions of time series) will also create a changelog entry!
column name | type | description | required |
---|---|---|---|
datetime | timestamp with time zone | | Yes |
timeseries_id | bigint | | Yes |
type_of_change | int | controlled -- see below | Yes |
description | text | | Yes |
old_value | character varying(254) | (also for metadata updates) | Yes |
new_value | character varying(254) | (also for metadata updates) | Yes |
period_start | timestamp with time zone | for changes on data | No |
period_end | timestamp with time zone | for changes on data | No |
version | char(28) | the label of the new version, which is introduced by this change | Yes |
author | foreign reference | (see below) | Yes |
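The analogous sketch for this table, with the additional period and version columns (same caveats as above: names are illustrative assumptions, not the actual implementation):

```python
# Minimal sketch of the TimeseriesChangelog table defined above (names are assumptions).
from sqlalchemy import BigInteger, CHAR, Column, ForeignKey, Integer, String, Text, TIMESTAMP
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TimeseriesChangelog(Base):
    __tablename__ = "timeseries_changelog"

    id = Column(Integer, primary_key=True)
    datetime = Column(TIMESTAMP(timezone=True), nullable=False)
    timeseries_id = Column(BigInteger, ForeignKey("timeseries.id"), nullable=False)
    type_of_change = Column(Integer, nullable=False)                 # controlled vocabulary, see below
    description = Column(Text, nullable=False)
    old_value = Column(String(254), nullable=False)                  # also for metadata updates
    new_value = Column(String(254), nullable=False)                  # also for metadata updates
    period_start = Column(TIMESTAMP(timezone=True), nullable=True)   # for changes on data
    period_end = Column(TIMESTAMP(timezone=True), nullable=True)     # for changes on data
    version = Column(CHAR(28), nullable=False)                       # label of the new version introduced by this change
    author_id = Column(Integer, ForeignKey("persons.id"), nullable=False)  # "whodunit", see below
```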
type_of_change – a common controlled vocabulary will be tried for both tables:
0 – created
1 – single value correction in metadata
2 – comprehensive metadata revision
3 – typographic correction of metadata
4 – unspecified data value corrections (this also holds if only a single value has to be corrected; the addition "unspecified" keeps all options open to add "specified" corrections later, e.g. from QC)
5 – replaced data with a new version
6 – data value flagging
7 -
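As a sketch, this controlled vocabulary could be expressed as a Python IntEnum shared by both changelog tables; the member names are illustrative only, and value 7 is left out because it is not yet defined:

```python
from enum import IntEnum


class TypeOfChange(IntEnum):
    """Controlled vocabulary for type_of_change (shared by both changelog tables)."""
    CREATED = 0
    SINGLE_VALUE_METADATA_CORRECTION = 1
    COMPREHENSIVE_METADATA_REVISION = 2
    TYPOGRAPHIC_METADATA_CORRECTION = 3
    UNSPECIFIED_DATA_VALUE_CORRECTION = 4
    REPLACED_DATA_WITH_NEW_VERSION = 5
    DATA_VALUE_FLAGGING = 6
    # 7 - not yet defined
```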
“whodunit” (author):
(from the free online Pro Git book):
You may be wondering what the difference is between author and committer. The author is the person who originally wrote the patch, whereas the committer is the person who last applied the patch. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit — you as the author and the core member as the committer.
Decision for the TOAR database: the role of author is sufficient (no committer needed).
(Meta)Data Versions:
EXCURSUS: VERSION NUMBERS (STANDARDS)
PEP 386 defines the standard way to specify versions within the Python community.
verlib 0.1:
pip install verlib
(Last released: Feb 15, 2012)
The most common scenarios:
- Major.Minor.Micro
- Major.Minor.Micro.alpha/beta
Here is an example that conforms to PEP 386:
```python
version = {
    'major': ...,
    'minor': ...,
    'micro': ...,
    'releaselevel': ...,  # one of 'alpha', 'beta', 'final'
    'serial': ...,
}
```
Python Enhancement Proposals (PEPs): https://www.python.org/dev/peps
ACTUAL DECISIONS FOR TOARDB
Format of version string: "major.minor.micro" (e. g. 000001.000001.20200911100000)
micro:
- timestamp (yyyymmddhhMMss)
- used for all data insertions (with this approach, snapshots of the database for any given date in the past can be reproduced, even if time series have been extended)
- will not be logged in changelog tables
minor:
- string with 6 digits
- will be logged in changelog tables
major:
- string with 6 digits
- will be logged in changelog tables
- preliminary data will always have major version "000000"
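A minimal sketch of how such a version label could be composed and parsed (the helper names are hypothetical): two 6-digit parts plus two dots plus a 14-digit timestamp give exactly 28 characters, matching char(28) in the TimeseriesChangelog table.

```python
from datetime import datetime


def make_version(major: int, minor: int, insertion_time: datetime) -> str:
    """Compose "major.minor.micro", e.g. "000001.000001.20200911100000"."""
    micro = insertion_time.strftime("%Y%m%d%H%M%S")  # yyyymmddhhMMss
    return f"{major:06d}.{minor:06d}.{micro}"


def parse_version(version: str) -> tuple[int, int, datetime]:
    """Split a version label back into its major, minor and micro parts."""
    major, minor, micro = version.split(".")
    return int(major), int(minor), datetime.strptime(micro, "%Y%m%d%H%M%S")


# example: preliminary data always has major version "000000"
label = make_version(0, 1, datetime(2020, 9, 11, 10, 0, 0))
assert len(label) == 28 and label == "000000.000001.20200911100000"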
Tests on Data Versions:
Tests on data versions will be done to clarify whether indexing the string format is too slow (when accessing the data).
There might be a switch to bigint and three separate columns for major, minor and micro.
Some thoughts (not to be forgotten):
- use a special hash index for the version string
- what will happen if, for example:
  data is available for the whole year and March has to be scaled (due to a conversion error)
  --> period_start: March 1st, period_end: March 31st, description: "unit conversion..."
  --> version changes from 1.0 to 1.1 (for March)
  ==> the March values now exist twice:
  - version 1.0 ("wrong" values): these values will go to table "data_archive".
  - version 1.1 (converted values): these values will go to table "data".
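To make this example concrete, a hypothetical TimeseriesChangelog entry for the March correction could look as follows; all ids, timestamps and the exact version label are illustrative assumptions, and the minor version is bumped here to mirror the 1.0 --> 1.1 step above:

```python
from datetime import datetime, timezone

# Hypothetical changelog entry for the March unit-conversion example above;
# the scaled values are written to "data" under the new version, while the
# old version 1.0 values are moved to "data_archive".
march_correction = {
    "datetime": datetime.now(timezone.utc),
    "timeseries_id": 1234,                                  # illustrative id
    "type_of_change": 4,                                    # unspecified data value corrections
    "description": "unit conversion error in March corrected",
    "old_value": "",                                        # left empty in this sketch
    "new_value": "",
    "period_start": datetime(2020, 3, 1, tzinfo=timezone.utc),
    "period_end": datetime(2020, 3, 31, 23, tzinfo=timezone.utc),
    "version": "000001.000001.20200401120000",              # 1.0 -> 1.1, micro = insertion timestamp
    "author": 1,                                            # illustrative author id
}
```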