Closed BUG: wrong time extend when using lazy preprocessing
  • Closed Issue created by Ghost User

    Bug

    Error description

    Although there should be data for a longer period, MLAir cannot find data for the val and test subsets.

    Error message

    Only an indirect error occurring at a later stage.

    First guess on error origin

    • There must be something going wrong with the lazy preprocessing: either the loading is not executed properly, or the storing is erroneous. Therefore, check the loading thoroughly!

    • There is a mismatch in the hash, so the file is not overwritten. An inspection of the hash content suggests an issue with the data_origin parameter, which unintentionally changes to its default values instead of keeping the values given in the experiment setup. The malfunctioning lazy preprocessing might be a direct result of this issue. Let's check this out.

    Error origin

    There are two issues:

    1. With overwriting of lazy data enabled, an existing pickle file that should be overwritten is not removed. When the newly calculated pickle file is stored, there is a check whether the file already exists; in that case the file is not overwritten, so the old file perseveres.

    2. There is a JOIN helper function, helpers/join.py:download_join, which works with the external parameter data_origin. Unintentionally, this method changes that parameter. As the parameter is a dictionary, which is always passed by reference and not as a copy, this affects the self.data_origin attribute of a data handler. Consequently, the parameter changes between the existence check when loading lazy data and the existence check when storing lazy data, which leads to a deviating hash and thus to two separate files.
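
    The reference-vs-copy effect described in point 2 can be reproduced in isolation. The following sketch is illustrative only: `make_hash` and `download` are hypothetical stand-ins, not MLAir's actual API, but they show how a callee mutating a dict argument silently changes the caller's dict and thus any hash derived from it.

    ```python
    import hashlib
    import json

    def make_hash(data_origin):
        # illustrative stand-in for the data handler's hash, derived from data_origin
        return hashlib.md5(json.dumps(data_origin, sort_keys=True).encode()).hexdigest()

    def download(data_origin):
        # mimics the bug: the callee mutates the dict it received by reference
        data_origin["o3"] = "REA"  # unintended side effect on the caller's dict

    data_origin = {"o3": "UBA"}
    hash_before = make_hash(data_origin)   # hash used when looking for lazy data
    download(data_origin)                  # the caller's dict silently changes here
    hash_after = make_hash(data_origin)    # hash used when storing lazy data
    print(hash_before == hash_after)       # False -> two separate pickle files
    ```

    Because the two hashes differ, the load step and the store step address two different files, which matches the observed behaviour.
    
    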

    Solution

    1. Change a line in mlair/data_handler/data_handler_single_station.py. It is not required to catch the error if the file does not exist, as os.remove will then raise a FileNotFoundError, which would be raised afterwards anyway.
    class DataHandlerSingleStation(AbstractDataHandler):
        ....
        def load_lazy(self):
            hash = self._get_hash()
            filename = os.path.join(self.lazy_path, hash + ".pickle")
            try:
                if self.overwrite_lazy_data is True:
    +               os.remove(filename)
                    raise FileNotFoundError
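
    As a standalone sketch of this fix (a hypothetical free function, not MLAir's actual method): with overwriting enabled, the stale pickle is removed before raising FileNotFoundError to trigger recomputation, and if the file is already missing, os.remove itself raises FileNotFoundError, so no extra existence check is needed.

    ```python
    import os
    import pickle
    import tempfile

    def load_lazy(filename, overwrite_lazy_data):
        """Sketch of the fixed loading logic (standalone, illustrative only)."""
        try:
            if overwrite_lazy_data is True:
                os.remove(filename)  # drop the stale file so it cannot persevere
                raise FileNotFoundError
            with open(filename, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return None  # caller recomputes the data and stores a fresh pickle

    # demo: a stale pickle is removed when overwriting is enabled
    path = os.path.join(tempfile.mkdtemp(), "abc123.pickle")
    with open(path, "wb") as f:
        pickle.dump("stale data", f)
    assert load_lazy(path, overwrite_lazy_data=True) is None
    print(os.path.exists(path))  # False: the stale file is gone
    ```
    
    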
    2. Add a line to copy a given data_origin dictionary in mlair/helpers/join.py:download_join
    def download_join(...):
        ....
        # make sure station_name parameter is a list
        station_name = helpers.to_list(station_name)
    +
    +   # also ensure that given data_origin dict is no reference
    +   data_origin = None if data_origin is None else {k: v for (k, v) in data_origin.items()}
    
        # get data connection settings
        join_url_base, headers = join_settings(sampling)
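
    The effect of the defensive copy can be shown in isolation. In this sketch, `consume_origin` is a hypothetical function, and `dict(data_origin)` is equivalent to the `{k: v for (k, v) in data_origin.items()}` comprehension in the patch: both build a shallow copy, so internal changes no longer leak back to the caller's dict.

    ```python
    def consume_origin(data_origin=None):
        # work on a shallow copy so the caller's dict stays untouched
        data_origin = None if data_origin is None else dict(data_origin)
        if data_origin is not None:
            data_origin["o3"] = "REA"  # internal change, no longer visible outside
        return data_origin

    origin = {"o3": "UBA"}
    changed = consume_origin(origin)
    print(origin)   # {'o3': 'UBA'} -> caller's dict is untouched
    print(changed)  # {'o3': 'REA'}
    ```

    With the copy in place, the hash computed before and after download_join stays the same, so loading and storing address the same pickle file again.
    
    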
    
    
