BUG: wrong time extent when using lazy preprocessing
Bug
Error description
Although there should be data for a longer period, MLAir cannot find data for val and test subsets.
Error message
Only an indirect error occurring at a later stage.
First guess on error origin
- There must be something going wrong with the lazy preprocessing: either the loading is not executed properly, or the storing is erroneous. Therefore, check the loading carefully!
- There is a mismatch in the hash, so the file is not overwritten. A check of the hash content suggests an issue with the `data_origin` parameter, which unintentionally changes to default values and does not keep the parameters given in the experiment setup. That the lazy preprocessing is not working correctly might be a direct result of this issue. Let's check this out.
Error origin
There are two issues:
- With overwriting of lazy data enabled, an existing pickle file that should be overwritten is not removed. When storing the newly calculated pickle file, there is a check whether the file already exists, and in that case the file is not written (so the old file persists).
- There is a JOIN helper function `helpers/join.py:download_join` which works with the external parameter `data_origin`. Unintentionally, this method changes this parameter. As this parameter is a dictionary, which is always passed by reference and not as a copy, this also affects the `self.data_origin` attribute of a data handler. Consequently, the parameter changes between the existence check when loading lazy data and the existence check when storing lazy data (this leads to a deviating hash and thus to two separate files).
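The first issue can be reproduced with a minimal sketch of the store pattern (hypothetical code, not the actual MLAir implementation): because only the existence of the file is checked, a stale pickle survives even when overwriting is intended.

```python
import os
import pickle
import tempfile

def store_lazy(data, filename):
    # the existence check silently skips the write, so a stale file survives
    if not os.path.exists(filename):
        with open(filename, "wb") as f:
            pickle.dump(data, f)

path = os.path.join(tempfile.mkdtemp(), "data.pickle")
store_lazy("old", path)  # first run writes "old"
# overwriting is enabled, but the old file was never removed ...
store_lazy("new", path)  # ... so this write is silently skipped
with open(path, "rb") as f:
    print(pickle.load(f))  # still "old"
```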
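The mechanism behind the second issue can be sketched as follows (hypothetical names and values; the real hash is computed over the data handler's settings): mutating the shared dictionary between the load-time check and the store-time check yields two different hashes.

```python
def compute_hash(data_origin):
    # stand-in for the real hash over the data handler's settings
    return hash(frozenset((data_origin or {}).items()))

def download_join(station_name, data_origin):
    # the helper unintentionally mutates the caller's dict in place
    data_origin["o3"] = "default"  # falls back to a default origin

data_origin = {"o3": "UBA"}                # hypothetical experiment setup
hash_on_load = compute_hash(data_origin)   # checked before loading lazy data
download_join("DEBW107", data_origin)      # mutates the shared dict
hash_on_store = compute_hash(data_origin)  # checked when storing lazy data
print(hash_on_load == hash_on_store)       # False -> two separate pickle files
```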
Solution
- Change the line in `mlair/data_handler/data_handler_single_station.py`. It is not required to catch the error if `filename` does not exist, as this will cause a `FileNotFoundError`, which would be raised afterwards anyway.
```diff
 class DataHandlerSingleStation(AbstractDataHandler):
     ...
     def load_lazy(self):
         hash = self._get_hash()
         filename = os.path.join(self.lazy_path, hash + ".pickle")
         try:
             if self.overwrite_lazy_data is True:
+                os.remove(filename)
                 raise FileNotFoundError
```
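That no extra error handling is needed can be checked quickly: `os.remove` on a missing path raises exactly the `FileNotFoundError` that the surrounding `try` block already catches (a standalone sketch, independent of MLAir).

```python
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), "no_such.pickle")
try:
    os.remove(missing)  # file does not exist
    handled = False
except FileNotFoundError:
    # same exception the except branch in load_lazy already handles
    handled = True
print(handled)  # True
```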
- Add a line to copy a given dictionary for `data_origin` in `mlair/helpers/join.py:download_join`.
```diff
 def download_join(...):
     ...
     # make sure station_name parameter is a list
     station_name = helpers.to_list(station_name)
+
+    # also ensure that given data_origin dict is no reference
+    data_origin = None if data_origin is None else {k: v for (k, v) in data_origin.items()}
     # get data connection settings
     join_url_base, headers = join_settings(sampling)
```
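The effect of the added copy can be illustrated with a reduced sketch (hypothetical station name, origin values, and mutation; only the copy line matches the actual fix): any later in-place change no longer reaches the caller's dictionary, so both hash checks see the same settings.

```python
def download_join(station_name, data_origin=None):
    # the fix: decouple from the caller's dict before any mutation
    data_origin = None if data_origin is None else {k: v for (k, v) in data_origin.items()}
    # hypothetical mutation, standing in for what happens deeper in the helper
    if data_origin is not None:
        data_origin["o3"] = "default"
    return data_origin

caller_origin = {"o3": "UBA"}
returned = download_join("DEBW107", caller_origin)
print(caller_origin)  # unchanged: {'o3': 'UBA'}
print(returned)       # {'o3': 'default'}
```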