Bug: element order in feature_importance_bootstrap_method causes a crash
Bug
The order of the list elements passed to the argument feature_importance_bootstrap_method can cause the program to crash.
Error description
Within run.py, this setting FAILS:
feature_importance_bootstrap_method=["zero_mean", "shuffle"],
while this setting WORKS:
feature_importance_bootstrap_method=["shuffle", "zero_mean"],
Error message
Traceback (most recent call last):
File "[...]/mlair/run.py", line 46, in <module>
main(args)
File "[...]/mlair/run.py", line 38, in main
workflow.run()
File "[...]/mlair/mlair/workflows/abstract_workflow.py", line 30, in run
stage(**self._registry_kwargs[pos])
File "[...]/mlair/mlair/run_modules/post_processing.py", line 99, in __init__
self._run()
File "[...]/mlair/mlair/run_modules/post_processing.py", line 125, in _run
self.report_feature_importance_results(self.feature_importance_skill_scores)
File "[...]/mlair/mlair/run_modules/post_processing.py", line 1027, in report_feature_importance_results
df = pd.DataFrame(res, columns=col_names)
File "[...]/lib/python3.8/site-packages/pandas/core/frame.py", line 509, in __init__
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "[...]/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 524, in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File "[...]/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 567, in _list_to_arrays
raise ValueError(e) from e
ValueError: 6 columns passed, passed data had 25 columns
2022-01-28 15:29:19,809 - INFO: PostProcessing finished after 0:01:03 (hh:mm:ss) [run_environment.py:__del__:118]
Process finished with exit code 1
First guess on error origin
post_processing.py, in the method report_feature_importance_results, at the assignment to col_names.
Error origin
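Currently, the number of value columns is derived from the first element of res alone, via *list(range(len(res[0]) - 5)). If this element is shorter than the longest element, the column list passed to the pandas DataFrame is too short and construction fails. In the failing run, the first row had 6 entries (the five metadata fields plus one value), while at least one later row had 25 entries, which is exactly the mismatch reported in the error message. With "shuffle" listed first, res[0] is one of the longest rows, which is why that order works.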
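The mismatch can be reproduced with pandas alone. The sketch below is only an illustration: it uses two made-up rows (five metadata fields plus one or three values) in place of the real bootstrap results, and the column names are placeholders, not the MLAir dimension names.

```python
import pandas as pd

# Made-up rows shaped like those built in report_feature_importance_results:
# five metadata fields followed by the skill-score values. The first row has
# one value, the second has three.
res = [
    ["model", "zero_mean", "station_a", "o3", 1, 0.12],
    ["model", "shuffle", "station_a", "o3", 1, 0.10, 0.11, 0.09],
]
meta = ["model_type", "method", "station", "boot_var", "ahead"]  # placeholder names

# Current behaviour: the number of value columns is taken from res[0] only,
# so only 6 column names are given for rows that are up to 8 entries long.
try:
    pd.DataFrame(res, columns=meta + list(range(len(res[0]) - 5)))
    failed = False
except ValueError as err:
    failed = True
    print(f"ValueError: {err}")

# Proposed fix: the number of value columns is taken from the longest row.
max_cols = max(len(row) for row in res)
df = pd.DataFrame(res, columns=meta + list(range(max_cols - 5)))
print(df.shape)  # (2, 8); the shorter row is padded with missing values
```

Swapping the two rows makes the first variant succeed as well, mirroring the order dependence described above.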
Solution
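Determine the length of the longest result element and use it to build the column names of the DataFrame; pandas then fills the missing trailing values of shorter rows with missing values.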
 class PostProcessing(RunEnvironment):
     ...
     def report_feature_importance_results(self, results):
         ...
         res = []
+        max_cols = 0
         for boot_type, d0 in results.items():
             for boot_method, d1 in d0.items():
                 for station_name, vals in d1.items():
                     for boot_var in vals.coords[self.boot_var_dim].values.tolist():
                         for ahead in vals.coords[self.ahead_dim].values.tolist():
                             res.append([boot_type, boot_method, station_name, boot_var, ahead,
                                         *vals.sel({self.boot_var_dim: boot_var,
                                                    self.ahead_dim: ahead}).values.round(5).tolist()])
+                            max_cols = max(max_cols, len(res[-1]))
         col_names = [self.model_type_dim, "method", "station", self.boot_var_dim, self.ahead_dim,
-                     *list(range(len(res[0]) - 5))]
+                     *list(range(max_cols - 5))]
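A variant of the same fix computes the width once after the loop instead of tracking max_cols inside it. The helper feature_importance_col_names below is hypothetical and not part of MLAir; it is a minimal sketch assuming, as in the patch, that every row starts with the metadata fields followed by the values. Passing default to max keeps an empty res from raising, whereas res[0] in the original code raises IndexError in that case.

```python
from typing import Hashable, List, Sequence


def feature_importance_col_names(res: Sequence[Sequence[object]],
                                 meta_names: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical helper: metadata names followed by 0..n-1 for the value columns.

    The number of value columns is taken from the longest row, so rows from
    bootstrap methods with fewer values no longer truncate the header.
    """
    n_meta = len(meta_names)
    # default=n_meta makes an empty result list yield zero value columns.
    width = max((len(row) for row in res), default=n_meta)
    return [*meta_names, *range(width - n_meta)]


meta = ["model_type", "method", "station", "boot_var", "ahead"]  # placeholder names
rows = [["m", "zero_mean", "s", "o3", 1, 0.1],
        ["m", "shuffle", "s", "o3", 1, 0.1, 0.2, 0.3]]
print(feature_importance_col_names(rows, meta))
# → ['model_type', 'method', 'station', 'boot_var', 'ahead', 0, 1, 2]
```

The result no longer depends on the order of the bootstrap methods, since reversing rows gives the same column names.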