small fix in postprocessing, updated data handler description

Commit 805c1fe6 authored 4 years ago by leufen1
--- a/docs/_source/customise.rst
+++ b/docs/_source/customise.rst
@@ -225,6 +225,9 @@ A data handler should inherit from the :py:`AbstractDataHandler` class. This cla
 * :py:`self.transformation(*args, **kwargs)` is a placeholder to execute any desired transformation. This class method
  is called during the preprocessing stage in the default MLAir workflow. Note that a transformation operation is only
  estimated on the train data subset and afterwards applied on all data subsets.
+* :py:`self.apply_transformation(data, inverse=False, **kwargs)` is used during postprocessing to apply the inverse
+  transformation to the model prediction. This method applies a transformation stored internally in the data handler and
+  returns the (inverse) transformed data.
 * :py:`self.get_coordinates()` is a placeholder and can be used to return a position for a geographical overview plot.

 During the preprocessing stage the following is executed:
@@ -241,6 +244,9 @@ During the preprocessing stage the following is executed:
 Later on during ModelSetup, Training and PostProcessing, MLAir requests data using :py:`data_handler.get_X()` and
 :py:`data_handler.get_Y()`.

+In PostProcessing, MLAir applies the inverse transformation to some data by calling
+:py:`data_handler.apply_transformation(data, inverse=True, **kwargs)`.
+
 Default Data Handler
 ~~~~~~~~~~~~~~~~~~~~

@@ -252,12 +258,31 @@ Custom Data Handler

 * Choose your personal data source, either a web interface or locally available data.
 * Create your custom data handler class by inheriting from :py:`AbstractDataHandler`.
-* Implement the initialiser :py:`__init__(*args, **kwargs)` and make sure to call the super class initialiser as well.
-  After executing this method data should be ready to use. Besides there are no further rules for the initialiser.
+* Implement the initializer :py:`__init__(*args, **kwargs)` and make sure to call the super class initializer as well.
+  After executing this method, the data should be ready to use. Beyond that, the initializer is not restricted.
+* Implement the data providers :py:`get_X(upsampling=False, as_numpy=False)` and
+  :py:`get_Y(upsampling=False, as_numpy=False)` to return inputs (X) and targets (Y). These methods should be able to
+  return the data in both xarray and numpy format. The numpy format is used for training, whereas the xarray format is
+  used for postprocessing. The :py:`upsampling` argument can be used to implement custom handling of extreme values
+  that is only enabled during training. The :py:`as_numpy` argument should switch between numpy and xarray output.
+* Implement :py:`apply_transformation(data, inverse=False, **kwargs)` to provide proper data scaling. If no
+  scaling is used (see the notes on :py:`transformation()`), it is sufficient to return the given data without any
+  modification. In all other cases, apply the transformation internally and return the calculated data. It is important
+  that the custom data handler supports the :py:`inverse` parameter, because it is used in the postprocessing stage.
+  With :py:`inverse=True`, the method should therefore return data in the original value space.
+* (optionally) Create a custom :py:`transformation()` method that transforms data. All parameters required for this
+  method should already be queried during the initialization of the data handler. For communication between the
+  data handler and MLAir, the keyword "transformation" is used. If the custom :py:`transformation()` returns a value, it
+  is stored inside MLAir. To use this value again, it is only required to add a parameter named "transformation" to
+  the initializer's arguments. When using the default MLAir workflow (or the HPC version), MLAir only executes this
+  method when creating the train data subset. Therefore, a transformation logic can be created on the train data and
+  afterwards be applied to validation and test data. If transformation parameters are fixed before running an MLAir
+  workflow, it is not required to implement this method. Just use the keyword "transformation" to pass the information
+  to the data handler.
 * (optionally) Modify the class method :py:`cls.build(*args, **kwargs)` to calculate pre-build operations. Otherwise the
-  data handler calls the class initialiser. On modification make sure to return the class at the end.
+  data handler calls the class initializer. On modification make sure to return the class at the end.
 * (optionally) Add names of required arguments to the :py:`cls._requirements` list. It is not required to add args and
-  kwargs from the initialiser, they are added automatically. Modifying the requirements is only necessary if the build
+  kwargs from the initializer, they are added automatically. Modifying the requirements is only necessary if the build
  method is modified (see previous bullet).
 * (optionally) Overwrite the base class :py:`self.get_coordinates()` method to return coordinates as dictionary with
  keys *lon* and *lat*.

--- a/mlair/run_modules/post_processing.py
+++ b/mlair/run_modules/post_processing.py
@@ -667,8 +667,8 @@ class PostProcessing(RunEnvironment):
        try:
            data = self.train_val_data[station]
            observation = data.get_observation()
-            transformation_opts = data.get_transformation_Y()
-            external_data = self._create_observation(observation, None, transformation_opts, normalised=False)
+            transformation_func = data.apply_transformation
+            external_data = self._create_observation(observation, None, transformation_func, normalised=False)
            return external_data.rename({external_data.dims[0]: 'index'})
        except (IndexError, KeyError):
            return None
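
The contract that the updated documentation describes for :py:`apply_transformation(data, inverse=False, **kwargs)` can be sketched roughly as follows. This is only an illustration and not MLAir code. The class `SketchDataHandler` is hypothetical and does not inherit from the real `AbstractDataHandler`. The layout of the "transformation" value (a dict with `mean` and `std`) is an assumption chosen for the example.

```python
class SketchDataHandler:
    """Hypothetical stand-in; a real handler would inherit from AbstractDataHandler."""

    def __init__(self, transformation=None):
        # Assumed layout: {"mean": ..., "std": ...}, or None if no scaling is used.
        self._transformation = transformation

    def apply_transformation(self, data, inverse=False, **kwargs):
        # Without scaling, return the given data unmodified.
        if self._transformation is None:
            return data
        mean = self._transformation["mean"]
        std = self._transformation["std"]
        if inverse:
            # Back to the original value space (used in postprocessing).
            return data * std + mean
        # Forward direction: standardise the data.
        return (data - mean) / std


# The arithmetic above works on plain floats as well as on numpy or xarray objects.
handler = SketchDataHandler(transformation={"mean": 20.0, "std": 10.0})
scaled = handler.apply_transformation(30.0)
restored = handler.apply_transformation(scaled, inverse=True)

# A handler without scaling hands the data back unchanged.
unscaled_handler = SketchDataHandler()
```

With a handler like this, postprocessing code can take the bound method `handler.apply_transformation` and call it with `inverse=True`, which matches the way the changed `PostProcessing` lines pass `data.apply_transformation` along instead of a set of transformation options.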