Alternative data preprocessing
In this branch, a new approach for data preprocessing is tested.
Instead of the multi-step approach that is currently implemented (grib -> netCDF -> pickle -> TFRecords), a two-step approach is tried out here.
The required data is processed directly from grib to netCDF files, where all operations on the data (i.e. variable selection, slicing and vertical interpolation) are performed with CDO.
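The grib-to-netCDF step can be sketched as a single chained CDO call. The file names, variable names, lon/lat box and pressure levels below are hypothetical placeholders; the real operator chain depends on the dataset at hand.

```python
# Sketch: build one chained CDO call that selects variables, slices a
# lon/lat box and interpolates from model to pressure levels, writing
# netCDF output. All file names, variables and levels are examples only.
import shutil
import subprocess

def build_cdo_command(infile, outfile,
                      variables=("t", "u", "v"),
                      lonlatbox=(0, 30, 40, 60),
                      pressure_levels=(85000, 50000)):
    """Assemble the CDO command; operators are applied right to left."""
    return [
        "cdo", "-f", "nc",                                      # netCDF output format
        "ml2pl," + ",".join(str(p) for p in pressure_levels),   # vertical interpolation
        "-sellonlatbox," + ",".join(str(c) for c in lonlatbox), # horizontal slicing
        "-selname," + ",".join(variables),                      # variable selection
        infile, outfile,
    ]

cmd = build_cdo_command("era5_2017_01.grb", "era5_2017_01.nc")
if shutil.which("cdo"):             # execute only where CDO is installed
    subprocess.run(cmd, check=True)
```

Chaining the operators in one call avoids writing intermediate files for each operation.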
The data is then collected into monthly netCDF files that can be further merged into a single netCDF file for the training (validation and testing) dataset.
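The merge of the monthly files can be done with CDO's `mergetime` operator along the time axis. Again, the file names are hypothetical placeholders.

```python
# Sketch: merge twelve monthly netCDF files into one training file via
# CDO's mergetime operator (example file names only).
import shutil
import subprocess

monthly_files = ["era5_2017_%02d.nc" % m for m in range(1, 13)]
merge_cmd = ["cdo", "mergetime"] + monthly_files + ["era5_2017_train.nc"]

if shutil.which("cdo"):             # execute only where CDO is installed
    subprocess.run(merge_cmd, check=True)
```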
This netCDF file can then be loaded into (CPU) memory during training, which enables on-the-fly sequence generation and data normalization. Note that this also adds flexibility to the normalization process, since the required statistics can be computed quickly once the data fits into memory.
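The idea of computing the statistics once over the in-memory data and then normalizing sequences lazily can be sketched as follows; the list below is a pure-Python stand-in for the netCDF variable arrays, and the generator shape is an assumption, not the actual training code.

```python
# Sketch: compute normalization statistics once over in-memory data,
# then yield normalized training sequences on the fly. The data values
# are made up (stand-in for e.g. a 2m-temperature series in Kelvin).
from statistics import mean, stdev

def fit_stats(data):
    """Compute mean/std once; cheap when the data fits into memory."""
    return mean(data), stdev(data)

def normalized_sequences(data, seq_len, mu, sigma):
    """Yield normalized sliding-window sequences without materializing all of them."""
    for start in range(len(data) - seq_len + 1):
        window = data[start:start + seq_len]
        yield [(x - mu) / sigma for x in window]

data = [281.3, 282.1, 280.7, 283.4, 284.0, 282.8]
mu, sigma = fit_stats(data)
seqs = list(normalized_sequences(data, seq_len=3, mu=mu, sigma=sigma))
```

Because the statistics are fitted before training, the same `mu`/`sigma` can be reused to de-normalize model outputs for evaluation.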
However, it remains to be verified that this approach yields an efficient data pipeline.