mean bootstrapping
Motivation
The current bootstrapping method is quite slow for hourly data because of the repeated resampling.
Proposed solution
Implement a variation of this destroy-information-of-a-single-input approach that destroys the information by replacing the variable with its mean value. For transformed data this mean value is 0 by choice, so it is much cheaper to apply than a full shuffling and redrawing of values for each variable. The question such an investigation can answer is "how much information can be used from the inspected variable if it deviates from the mean state". If this information is quite low, then the variable has limited influence on the prediction. If the skill score becomes strongly negative, this is a hint that the real value is indeed important for the prediction. In the end, this gives a similar (or the same) answer as the shuffling approach, but with massively less computational effort. Furthermore, using the mean value as input is a less drastic destruction of the information than randomly shuffled data.
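A minimal sketch of the idea, not the project's implementation: it assumes inputs of shape (samples, variables) in transformed space and an MSE-based skill score of the form 1 - MSE_boot / MSE_ref, so that a negative score means the prediction got worse. The helper names `replace_with_mean` and `skill_score` and the toy linear "model" are hypothetical and only serve the illustration.

```python
import numpy as np

def replace_with_mean(x, var_index, mean_value=0.0):
    """Return a copy of x (samples, variables) with one variable set to a constant."""
    x_boot = x.copy()
    x_boot[:, var_index] = mean_value
    return x_boot

def skill_score(y_true, y_ref, y_boot):
    """Assumed MSE skill score: 0 means no change, negative means worse than the reference."""
    mse_ref = np.mean((y_true - y_ref) ** 2)
    mse_boot = np.mean((y_true - y_boot) ** 2)
    return 1.0 - mse_boot / mse_ref

# Toy data: three standardised variables with strong, weak and no influence.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3))
weights = np.array([2.0, 0.1, 0.0])
y_true = x @ weights + 0.1 * rng.standard_normal(1000)

def predict(inputs):
    """Stand-in for a trained model (hypothetical)."""
    return inputs @ weights

y_ref = predict(x)
scores = {
    i: skill_score(y_true, y_ref, predict(replace_with_mean(x, i)))
    for i in range(x.shape[1])
}
```

In this toy setup the strongly weighted variable gets the most negative score and the unused variable stays at zero, with a single forward pass per variable and no resampling.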
Implementation
- Create a new bootstrap class that can replace single variables by a given value. This could be either 0, because we know that the data is scaled, or a value that must be calculated?
- There should be a summary plot similar to the current bootstrapping plot.
- When using a filter approach, there will be another dimension. In this case, there should be the overall plot as mentioned above, but also a more specific plot with the mean bootstrapping for each single filter component per variable.
- When implementing this plot for filters, one could think about an additional plot in which all long-term components are disabled at once, then the faster ones, and finally the residuals. This plot could show how much overall influence is contained in the different time components.
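The filter case can be sketched as follows, assuming the input gains a third axis so that it has shape (samples, variables, filter components). The function name `set_to_mean` and the example layout are hypothetical; indices are expected to be integers or slices.

```python
import numpy as np

def set_to_mean(x, variables=slice(None), components=slice(None), mean_value=0.0):
    """Return a copy of x (samples, variables, components) with the selection set to a constant.

    `variables` and `components` are an int or a slice; the defaults select everything.
    """
    x_boot = x.copy()
    x_boot[:, variables, components] = mean_value
    return x_boot

# Toy input: 2 samples, 3 variables, 4 filter components, no zeros in the original.
x = np.arange(1, 25, dtype=float).reshape(2, 3, 4)

# Specific plot: one filter component of one variable at a time.
per_cell = {
    (v, c): set_to_mean(x, variables=v, components=c)
    for v in range(x.shape[1])
    for c in range(x.shape[2])
}

# Additional plot: one time component disabled for all variables at once.
per_component = {c: set_to_mean(x, components=c) for c in range(x.shape[2])}
```

Each entry is one modified input set to pass through the model, so the specific plot needs variables × components forward passes and the per-component plot one pass per component.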
tasks to implement
- create new class MeanBootstrapping
- make sure that data is in transformed space -> mean=0, otherwise ask for mean?
- set single variable to mean and return
- iterate over all variables with this mean replacement
- be able to handle branches
  - set single variable in single branch to mean
  - replace single variable in all branches at once to mean
- able to set entire branches (all variables in this branch) to mean
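The tasks above could look roughly like the following sketch of a class along the lines of the one named in the task list. It assumes that branches are given as a list of arrays of shape (samples, variables) sharing the same variable order, and that the mean is passed in (0 for transformed data). All method names and the variable names in the usage example are hypothetical, and the check for transformed space is left out.

```python
import numpy as np

class MeanBootstrapping:
    """Sketch: replace variables or whole branches by a constant mean value."""

    def __init__(self, branches, variables, mean_value=0.0):
        self.branches = branches      # list of arrays, each (samples, variables)
        self.variables = variables    # variable names, same order in every branch (assumption)
        self.mean_value = mean_value  # 0 for transformed data, otherwise supplied by the caller

    def _copy(self):
        return [branch.copy() for branch in self.branches]

    def variable_in_branch(self, branch, variable):
        """Set a single variable in a single branch to the mean."""
        data = self._copy()
        data[branch][:, self.variables.index(variable)] = self.mean_value
        return data

    def variable_in_all_branches(self, variable):
        """Set a single variable to the mean in all branches at once."""
        data = self._copy()
        index = self.variables.index(variable)
        for branch in data:
            branch[:, index] = self.mean_value
        return data

    def entire_branch(self, branch):
        """Set all variables of one branch to the mean."""
        data = self._copy()
        data[branch][...] = self.mean_value
        return data

    def __iter__(self):
        """Iterate over all variables, yielding (name, modified branches)."""
        for variable in self.variables:
            yield variable, self.variable_in_all_branches(variable)


# Usage with two toy branches and hypothetical variable names.
branches = [np.ones((4, 2)), np.ones((4, 2))]
boot = MeanBootstrapping(branches, ["temp", "o3"])
single = boot.variable_in_branch(0, "o3")
everywhere = boot.variable_in_all_branches("temp")
whole = boot.entire_branch(1)
names = [name for name, _ in boot]
```

Every method works on copies, so the original branch data stays untouched between iterations.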
not solved in this issue
See #316 about a mean bootstrapping with a non-zero mean