BUG: workflow failure on HPC during clim skill score calculation

Bug

Error description

Calculating the climatological skill scores fails on the HDFML system.

Error message

2022-02-25 15:34:45,634 - INFO: start calculate_error_metrics  [time_tracking.py:__enter__:131]
2022-02-25 15:34:45,634 - DEBUG: get: forecast_path(general)=/p/project/deepacf/intelliaq/leufen1/demystify-temporal-components/experiments/2022-02-24_MBRNN_LT_ST/2022-02-25_15-22-21_network_daily/forecasts  [datastore.py:__call__:118]
2022-02-25 15:34:45,635 - DEBUG: get: stations(general)=['DEBB051', 'DEBB053', 'DEBE062', 'DEHH063', 'DEMV001', 'DEMV004', 'DEMV012', 'DEMV017', 'DENI031', 'DENI058', 'DENI059', 'DENI060', 'DENI063', 'DESH001', 'DESH008', 'DEST069', 'DEST089', 'DEUB001', 'DEUB005', 'DEUB006', 'DEUB007', 'DEUB020', 'DEUB022', 'DEUB024', 'DEUB026', 'DEUB027', 'DEUB028', 'DEUB030', 'DEUB034', 'DEUB038', 'DEUB040', 'DEBB028', 'DEBB038', 'DEBB039', 'DEBB040', 'DEBB048', 'DEBB050', 'DEBB063', 'DEBB067', 'DEBE051', 'DEHH005', 'DEHH021', 'DEHH022', 'DEHH030', 'DEHH047', 'DEHH049', 'DEHH050', 'DEMV007', 'DEMV018', 'DENI029', 'DENI052', 'DENI062', 'DESH005', 'DESH006', 'DESH016']  [datastore.py:__call__:118]
2022-02-25 15:34:45,635 - DEBUG: No competitor found for combination 'DEBB051' and 'IntelliO3-ts-v1'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:45,636 - DEBUG: No competitor found for combination 'DEBB051' and 'MB-FCN-LT_ST'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:45,636 - DEBUG: No competitor found for combination 'DEBB051' and 'MB-RNN-LT_ST'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:45,636 - DEBUG: No competitor found for combination 'DEBB051' and 'OLS_hourly'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:45,637 - DEBUG: No competitor found for combination 'DEBB051' and 'RNN_hourly'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:45,637 - DEBUG: No competitor found for combination 'DEBB051' and 'FCN_1449_512_32_4'.  [post_processing.py:load_competitors:273]
2022-02-25 15:34:46,661 - INFO: calculate_error_metrics finished after 0:00:02 (hh:mm:ss)  [time_tracking.py:__exit__:137]
2022-02-25 15:34:46,661 - ERROR: Reindexing only valid with uniquely valued Index objects  [run_environment.py:__exit__:137]
Traceback (most recent call last):
  File "/p/home/jusers/leufen1/hdfml/intelliaq/mlair_tf2/mlair/mlair/workflows/abstract_workflow.py", line 31, in run
    stage(**self._registry_kwargs[pos])
  File "/p/home/jusers/leufen1/hdfml/intelliaq/mlair_tf2/mlair/mlair/run_modules/post_processing.py", line 99, in __init__
    self._run()
  File "/p/home/jusers/leufen1/hdfml/intelliaq/mlair_tf2/mlair/mlair/run_modules/post_processing.py", line 129, in _run
    skill_score_competitive, _, skill_score_climatological, errors = self.calculate_error_metrics()
  File "/p/home/jusers/leufen1/hdfml/intelliaq/mlair_tf2/mlair/mlair/run_modules/post_processing.py", line 1000, in calculate_error_metrics
    skill_score_climatological[station] = skill_score.climatological_skill_scores(
  File "/p/home/jusers/leufen1/hdfml/intelliaq/mlair_tf2/mlair/mlair/helpers/statistics.py", line 387, in climatological_skill_scores
    external_data = self.external_data.sel({self.ahead_dim: iahead, self.type_dim: [self.observation_name]})
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1202, in sel
    ds = self._to_temp_dataset().sel(
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/dataset.py", line 2182, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/coordinates.py", line 397, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/indexing.py", line 270, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/indexing.py", line 200, in convert_label_indexer
    indexer = get_indexer_nd(index, label, method, tolerance)
  File "/p/software/hdfml/stages/2020/software/Jupyter/2021.3.1-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/xarray/core/indexing.py", line 103, in get_indexer_nd
    flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
  File "/p/software/hdfml/stages/2020/software/SciPy-Stack/2021-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/pandas-1.1.0-py3.8-linux-x86_64.egg/pandas/core/indexes/base.py", line 2980, in get_indexer
    raise InvalidIndexError(
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

First guess on error origin

  • Is there an empty array? Check the shape and coords of self.external_data for more insight. ❌
  • The error says that the index is not unique, so look at all competing models in self.external_data: ✔
array(['MB-RNN-LT_ST', 'persi', 'obs', 'ols', 'IntelliO3-ts-v1',
       'MB-FCN-LT_ST', 'MB-RNN-LT_ST', 'OLS_hourly', 'RNN_hourly',
       'FCN_1449_512_32_4'], dtype=object)
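
The listing above can be checked for repeated entries with a few lines of Python. This is a minimal sketch using only the standard library; the helper name find_duplicates is hypothetical and not part of MLAir.

```python
from collections import Counter

# Model names copied from the array shown above.
model_types = [
    "MB-RNN-LT_ST", "persi", "obs", "ols", "IntelliO3-ts-v1",
    "MB-FCN-LT_ST", "MB-RNN-LT_ST", "OLS_hourly", "RNN_hourly",
    "FCN_1449_512_32_4",
]


def find_duplicates(names):
    """Return every name that occurs more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


duplicates = find_duplicates(model_types)
print(duplicates)  # → ['MB-RNN-LT_ST']
```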

Error origin

  • The error is caused by a duplicated name in the model_type dimension (see the duplicate 'MB-RNN-LT_ST' in the array above).
  • Origin: model_display_name was set to 'MB-RNN-LT_ST', and a competitor with the same name was also added to the competitors list in the run script.
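
The failure can be reproduced outside the workflow. The traceback ends in pandas' Index.get_indexer, which refuses label lookups on an index with repeated values. The sketch below assumes only that pandas is installed; the shortened index is illustrative and reuses names from the array above.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Shortened stand-in for the model type coordinate, with one repeated name.
index = pd.Index(["MB-RNN-LT_ST", "persi", "obs", "MB-RNN-LT_ST"])

try:
    # Label lookup on a non-unique index is rejected by pandas.
    index.get_indexer(["obs"])
    message = None
except InvalidIndexError as err:
    message = str(err)
print(message)

# Once the repeated entry is gone, the same lookup succeeds.
unique_index = index.drop_duplicates()
position = unique_index.get_indexer(["obs"])
print(position)
```

Dropping duplicates is shown here only to confirm the diagnosis. In the workflow itself the two entries may hold different forecasts, so the names should be made unique rather than one entry being discarded.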

Solution

  • Implement a name check in the experiment setup that prevents this duplication and aborts the experiment run when a name is not unique.
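
A possible shape for such a check is sketched below. The function name check_unique_model_names and its signature are assumptions for illustration, not the actual MLAir implementation. The competitor names are taken from the log above.

```python
def check_unique_model_names(model_display_name, competitors):
    """Raise ValueError if any name among model and competitors is repeated."""
    names = [model_display_name, *competitors]
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    if duplicates:
        raise ValueError(
            f"Model names must be unique, found duplicates: {duplicates}"
        )


# Competitor names as they appear in the log above.
competitors = [
    "IntelliO3-ts-v1", "MB-FCN-LT_ST", "MB-RNN-LT_ST",
    "OLS_hourly", "RNN_hourly", "FCN_1449_512_32_4",
]

try:
    check_unique_model_names("MB-RNN-LT_ST", competitors)
    aborted = False
except ValueError as err:
    aborted = True
    print(err)
```

Running the check during experiment setup stops the run before training starts, instead of failing in post-processing after the forecasts have already been computed.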
Edited Feb 28, 2022 by Ghost User