diff --git a/docs/_source/customise.rst b/docs/_source/customise.rst index cb77eb63f8bf1a53d54ca2e0f80fd9bdeb93a0b0..14738fcb6751495b5dcb10b83bcd2d420b79d71b 100644 --- a/docs/_source/customise.rst +++ b/docs/_source/customise.rst @@ -225,6 +225,9 @@ A data handler should inherit from the :py:`AbstractDataHandler` class. This cla * :py:`self.transformation(*args, **kwargs)` is a placeholder to execute any desired transformation. This class method is called during the preprocessing stage in the default MLAir workflow. Note that a transformation operation is only estimated on the train data subset and afterwards applied on all data subsets. +* :py:`self.apply_transformation(data, inverse=False, **kwargs)` is used in the postprocessing to apply the inverse + transformation to the model prediction. This method applies a transformation stored internally in the data handler and + returns the (inverse) transformed data. * :py:`self.get_coordinates()` is a placeholder and can be used to return a position for a geographical overview plot. During the preprocessing stage the following is executed: @@ -241,6 +244,9 @@ During the preprocessing stage the following is executed: Later on during ModelSetup, Training and PostProcessing, MLAir requests data using :py:`data_handler.get_X()` and :py:`data_handler.get_Y()`. +In PostProcessing, MLAir applies the inverse transformation to some data by calling +:py:`data_handler.apply_transformation(data, inverse=True, **kwargs)`. + Default Data Handler ~~~~~~~~~~~~~~~~~~~~ @@ -252,12 +258,31 @@ Custom Data Handler * Choose your personal data source, either a web interface or locally available data. * Create your custom data handler class by inheriting from :py:`AbstractDataHandler`. -* Implement the initialiser :py:`__init__(*args, **kwargs)` and make sure to call the super class initialiser as well. - After executing this method data should be ready to use. Besides there are no further rules for the initialiser. 
+* Implement the initializer :py:`__init__(*args, **kwargs)` and make sure to call the super class initializer as well. + After executing this method data should be ready to use. Besides there are no further rules for the initializer. +* Implement the data providers :py:`get_X(upsampling=False, as_numpy=False)` and + :py:`get_Y(upsampling=False, as_numpy=False)` to return inputs (X) and targets (Y). These methods should be able to + return the data both in xarray and numpy format. The numpy format is used for training whereas the xarray is used for + postprocessing. The :py:`upsampling` argument can be used to implement a custom method how to deal with extreme values + that is only enabled during training. The argument :py:`as_numpy` should trigger a numpy or xarray return format. +* Implement the :py:`apply_transformation(data, inverse=False, **kwargs)` to provide a proper data scaling. If no + scaling is used (see annotations to :py:`transformation()`) it is sufficient to return the given data without any + modification. In all other cases, apply the transformation internally and return the calculated data. It is important + that the custom data handler supports the :py:`inverse` parameter, because it is used in the postprocessing stage. + The method should therefore return data that are processed by an inverse transformation (original value space). +* (optionally) Create a custom :py:`transformation()` method that transforms data. All parameters required for this + method should already be queried during the initialization of the data handler. For communication between + data handler and MLAir the keyword "transformation" is used. If the custom :py:`transformation()` returns a value, it + is stored inside MLAir. To use this parameter again, it is only required to add a parameter named "transformation" in + the initializer's arguments. 
When using the default MLAir workflow (or the HPC version), MLAir only executes this + method when creating the train data subset. Therefore a transformation logic can be created on the train data and can + afterwards be applied on validation and test data. If transformation parameters are fixed before running an MLAir + Workflow, it is not required to implement this method. Just use the keyword "transformation" to pass the information + to the data handler. * (optionally) Modify the class method :py:`cls.build(*args, **kwargs)` to calculate pre-build operations. Otherwise the - data handler calls the class initialiser. On modification make sure to return the class at the end. + data handler calls the class initializer. On modification make sure to return the class at the end. * (optionally) Add names of required arguments to the :py:`cls._requirements` list. It is not required to add args and - kwargs from the initialiser, they are added automatically. Modifying the requirements is only necessary if the build + kwargs from the initializer, they are added automatically. Modifying the requirements is only necessary if the build method is modified (see previous bullet). * (optionally) Overwrite the base class :py:`self.get_coordinates()` method to return coordinates as dictionary with keys *lon* and *lat*. 
diff --git a/mlair/run_modules/post_processing.py b/mlair/run_modules/post_processing.py index eaa593050ccf5dcaf120e80b8c190302b83f054f..f1fd1d533c4c62012393a0115db17fbeb1bae017 100644 --- a/mlair/run_modules/post_processing.py +++ b/mlair/run_modules/post_processing.py @@ -667,8 +667,8 @@ class PostProcessing(RunEnvironment): try: data = self.train_val_data[station] observation = data.get_observation() - transformation_opts = data.get_transformation_Y() - external_data = self._create_observation(observation, None, transformation_opts, normalised=False) + transformation_func = data.apply_transformation + external_data = self._create_observation(observation, None, transformation_func, normalised=False) return external_data.rename({external_data.dims[0]: 'index'}) except (IndexError, KeyError): return None
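Reviewer note: the contract described in the documentation hunk above (estimate parameters on the train subset, hand them back through the "transformation" keyword, and support :py:`inverse=True` in :py:`apply_transformation`) might look roughly like the sketch below. It is not MLAir code. The class name `SketchDataHandler`, the standardisation by mean and standard deviation, and the `{"mean": ..., "std": ...}` dictionary layout are all assumptions made for illustration, and the class deliberately does not inherit from `AbstractDataHandler` so that the snippet runs on its own with plain Python lists.

```python
from statistics import fmean, pstdev


class SketchDataHandler:
    """Hypothetical stand-in for a class derived from AbstractDataHandler."""

    def __init__(self, values, transformation=None):
        self._values = list(values)
        # The keyword "transformation" follows the docs above; the dict layout
        # {"mean": ..., "std": ...} is an assumption of this sketch.
        self._transformation = transformation

    @classmethod
    def transformation(cls, train_values):
        """Estimate scaling parameters, intended for the train subset only."""
        # Fall back to 1.0 so constant data does not cause a division by zero.
        std = pstdev(train_values) or 1.0
        return {"mean": fmean(train_values), "std": std}

    def apply_transformation(self, data, inverse=False, **kwargs):
        """Scale data, or map it back to the original value space."""
        if self._transformation is None:
            # No scaling configured: return the data unmodified.
            return data
        mean = self._transformation["mean"]
        std = self._transformation["std"]
        if inverse:
            return [x * std + mean for x in data]
        return [(x - mean) / std for x in data]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
opts = SketchDataHandler.transformation(train)
handler = SketchDataHandler(train, transformation=opts)
scaled = handler.apply_transformation(train)
restored = handler.apply_transformation(scaled, inverse=True)
```

With this shape, the change in `post_processing.py` only needs the bound method `data.apply_transformation`; the postprocessing code no longer has to know how the transformation options are structured.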