Missing note on requirement for linearity of evaluation metric in bootstrapping-routine
The bootstrapping routine performs block bootstrapping on the evaluation metric itself rather than on the data. While this approach saves a lot of computation time, it is only valid when the evaluation metric is linear with respect to the realizations. For MSE, SSMI and the texture analysis used in the GMD paper, this is true. However, for other metrics such as the Fractions Skill Score (FSS) and the RMSE, care is required. E.g. for the FSS, this approach is possible when bootstrapping the Fractions Brier Score (FBS) and FBS_worst separately (but of course in the same manner, so that the resampled data sequences match). Afterwards, using
FSS(resampled) = 1 - sum(FBS(resampled))/sum(FBS_worst(resampled))
gives the correct (bootstrapped) result.
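To make the FSS procedure concrete, here is a minimal sketch in NumPy. It is not the routine from this repository: the function names `block_bootstrap_indices` and `bootstrap_fss`, the parameters `block_len`, `n_boot` and `seed`, and the simple moving-block scheme are all assumptions made for illustration. The key point is that one set of resampled indices is applied to both FBS and FBS_worst before the ratio is formed.

```python
import numpy as np


def block_bootstrap_indices(n, block_len, rng):
    """Draw indices for one moving-block resample of a length-n sequence.

    Hypothetical helper; assumes 1 <= block_len <= n.
    """
    n_blocks = int(np.ceil(n / block_len))
    # Random start position of each block, chosen so that blocks stay in range.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return idx[:n]  # trim to the original sequence length


def bootstrap_fss(fbs, fbs_worst, block_len=5, n_boot=1000, seed=0):
    """Bootstrap the FSS from per-sample FBS and FBS_worst values.

    Both arrays are resampled with the SAME indices, and the ratio
    is only taken after summing over the resampled sequence.
    """
    fbs = np.asarray(fbs, dtype=float)
    fbs_worst = np.asarray(fbs_worst, dtype=float)
    rng = np.random.default_rng(seed)
    fss = np.empty(n_boot)
    for b in range(n_boot):
        idx = block_bootstrap_indices(fbs.size, block_len, rng)
        fss[b] = 1.0 - fbs[idx].sum() / fbs_worst[idx].sum()
    return fss
```

Resampling per-sample FSS values directly would instead average ratios, which in general differs from the ratio of the sums used above.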
At the very least, a note on this issue should be added to the doc-string of the corresponding method.
@ji4 : I noted down the procedure for the FSS explicitly :-)