Skip to content
Snippets Groups Projects
Commit c93a0e2b authored by Michael Langguth's avatar Michael Langguth
Browse files

Updated Jupyter Notebooks.

parent 32d27ef8
No related branches found
No related tags found
No related merge requests found
Pipeline #105855 failed
%% Cell type:code id:5272c6d8-2d18-4de1-a437-5fe6461ca743 tags:
%% Cell type:code id:sacred-impossible tags:
``` python
import os, sys
sys.path.append("../utils/")
sys.path.append("../video_prediction_tools/utils/")
sys.path.append("../video_prediction_tools/postprocess/")
import xarray as xr
import numpy as np
import pandas as pd
from statistical_evaluation import perform_block_bootstrap_metric, avg_metrics, calculate_cond_quantiles, Scores
```
%% Cell type:markdown id:61e6b683-b9c6-4691-9c1f-130324ddb7a8 tags:
%% Cell type:markdown id:antique-trader tags:
# Evaluation of the ERA5 short-range forecasts
Define the path to the file and load the data.
In this Jupyter Notebook, the ERA5 short-range forecasts created with the script `get_era5_forecasts.sh` are evaluated.
As a result of this Jupyter Notebook, the netCDF-file `evaluation_metrics.nc` will be created which carries the relevant evaluation metrics presented in [Gong et al., 2021](https://doi.org/10.5194/gmd-2021-430). With this file, the meta-postprocessing step can be run in order to create Figure 5 of the mentioned manuscript. <br>
%% Cell type:code id:453ad6f2-5655-47bf-8f39-7a1d73b059ab tags:
We start by defining the path to the respective netCDF-file and load the data.
%% Cell type:code id:streaming-croatia tags:
``` python
indir = "/p/home/jusers/langguth1/juwels/video_prediction_shared_folder/results/era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t_850/era5_forecast"
indir = "<dir/with/era5_forecasts>" # see get_era5_forecasts.sh
yr=2019
era5_fcst_file = os.path.join(indir, "{0}_era5_short_range_fcst.nc".format(yr))
if not os.path.isfile(era5_fcst_file):
raise FileNotFoundError("Could not find file with all ERA5 forecasts '{0}'".format(era5_fcst_file))
```
%% Cell type:code id:459fa9bf-dbd0-48ee-8184-ccfec9ddbd59 tags:
``` python
era5_fcst = xr.open_dataset(era5_fcst_file)
print(era5_fcst)
```
%% Cell type:markdown id:07b1324f-c6df-45c8-b77a-e334c2bb2966 tags:
%% Cell type:markdown id:sixth-comfort tags:
Next we initialize the function for calculating the MSE and call it to evaluate the ERA5 and persistence forecasts. <br>
If you require further evaluation metrics, just expand the cell accordingly, e.g. add the following lines <br>
```
ssim_func = Scores("ssim", ["lat", "lon"]).score_func
ssim_era5_all = ssim_func(data_fcst=era5_fcst[varname_fcst], data_ref=era5_fcst[varname_ref])
ssim_per_all = (data_fcst=era5_fcst[varname_per], data_ref=era5_fcst[varname_ref])
```
in case you want to evaluate the SSIM as well.
Next, we read the climatology data which is required to calculate the anomaly correlation coefficient (see Eq. B2 in Gong et al., 2022). <br>
From this file, we load the 2m temperature and slice the data to the region of interest:
%% Cell type:code id:e066baeb-3190-4012-a0c1-ace1100be3aa tags:
%% Cell type:code id:meaningful-flashing tags:
``` python
fdata_clim = os.path.join("/p/project/deepacf/deeprain/video_prediction_shared_folder/preprocessedData/T2monthly/", "climatology_t2m_1991-2020.nc")
fdata_clim = os.path.join("<dir_to_climatology_reference_data>", "climatology_t2m_1991-2020.nc")
data = xr.open_dataset(fdata_clim)
# copied from load_climdata in main_visualize_postprocess.py
var="var167"
dt_clim = data[var]
lon_dom = np.arange(0., 27.5, 0.3)
lat_dom = np.arange(54.9, 38.1, -0.3)
# get the coordinates of the data after running CDO
coords = dt_clim.coords
nlat, nlon = len(coords["lat"]), len(coords["lon"])
# modify it our needs
coords_new = dict(coords)
coords_new.pop("time")
coords_new["month"] = np.arange(1, 13)
coords_new["hour"] = np.arange(0, 24)
# initialize a new data array with explicit dimensions for month and hour
data_clim_new = xr.DataArray(
np.full((12, 24, nlat, nlon), np.nan),
coords=coords_new,
dims=["month", "hour", "lat", "lon"],
)
# do the reorganization
for month in np.arange(1, 13):
data_clim_new.loc[dict(month=month)] = dt_clim.sel(
time=dt_clim["time.month"] == month
)
data_clim = data_clim_new.sel(lon=lons, lat=lats, tolerance=0.01, method="nearest")
print(data_clim["lat"])
```
%% Cell type:code id:0dd70ea1-9341-4b7a-9086-405ee75c6a64 tags:
%% Cell type:markdown id:prompt-scanning tags:
Then, we start to compute the evaluation metrics. <br>
Unfortunately, we have to rewrite the functions to calculate the ACC and the SSIM since both score-functions expect a `fcst_hour`-dimension which is not present in the data at hand.
%% Cell type:code id:synthetic-bracelet tags:
``` python
from tqdm import tqdm
from skimage.metrics import structural_similarity as ssim
# overwrite some score-functions which inherently expect a separate fcst_hour-dimension next to batch_size
def calc_acc_batch(data_fcst, data_ref, **kwargs):
"""
Calculate acc ealuation metric of forecast data w.r.t reference data
:param data_fcst: forecasted data (xarray with dimensions [batch, fore_hours, lat, lon])
:param data_ref: reference data (xarray with dimensions [batch, fore_hours, lat, lon])
:param data_clim: climatology data (xarray with dimensions [monthly, hourly, lat, lon])
:return: averaged acc for each batch example [batch, fore_hours]
"""
print("Start calculating ACC")
if "data_clim" in kwargs:
data_clim = kwargs["data_clim"]
else:
raise KeyError("%{0}: climatological data must be parsed to calculate the ACC.".format(method))
batch_size = data_fcst.shape[0]
acc = np.ones([batch_size])*np.nan
for i in tqdm(range(batch_size)):
img_fcst = data_fcst[i, ...]
img_ref = data_ref[i, ...]
# get the forecast time
img_month = img_fcst["fcst_hour"].dt.month.values
img_hour = img_fcst["fcst_hour"].dt.hour.values
img_clim = data_clim.sel(month=img_month, hour=img_hour)
img1_ = img_ref - img_clim
img2_ = img_fcst - img_clim
cor1 = np.sum(img1_*img2_)
cor2 = np.sqrt(np.sum(img1_**2)*np.sum(img2_**2))
acc[i] = cor1/cor2
# convert to data array
acc = xr.DataArray(acc, coords={"fcst_hour": data_fcst["fcst_hour"]}, dims=["fcst_hour"])
return acc
def calc_ssim_batch(data_fcst, data_ref, **kwargs):
"""
Calculate ssim ealuation metric of forecast data w.r.t reference data
:param data_fcst: forecasted data (xarray with dimensions [batch, fore_hours, lat, lon])
:param data_ref: reference data (xarray with dimensions [batch, fore_hours, lat, lon])
:return: averaged ssim for each batch example, shape is [batch,fore_hours]
"""
method = Scores.calc_ssim_batch.__name__
batch_size = np.array(data_ref).shape[0]
ssim_pred = []
for i in tqdm(range(batch_size)):
ssim_pred.append(ssim(data_ref[i, ...],data_fcst[i,...]))
# convert to data array
ssim_pred = xr.DataArray(np.asarray(ssim_pred), coords={"fcst_hour": data_fcst["fcst_hour"]}, dims=["fcst_hour"])
return ssim_pred
```
%% Cell type:code id:e76c4db9-f4da-4664-8054-fdd99cf5f64b tags:
%% Cell type:markdown id:linear-bathroom tags:
However, for the MSE and the texture-metric, we can simply use the functions provided by the Scores-class:
%% Cell type:code id:ordered-wireless tags:
``` python
# to get and configure the score-functions
score_dims = ["lat", "lon"]
scores = ["mse", "ssim", "acc", "texture"]
# the reference data
varname_ref, varname_fcst, varname_per = "2t_ref", "2t_era5_fcst", "2t_persistence_fcst"
# initialize empty dictionaries to store the score functions and to save the corresponding results
score_data = {}
for score in scores:
# overwrite functions that are incompatble here
if score == "acc":
score_func = calc_acc_batch
elif score == "ssim":
score_func = calc_ssim_batch
else:
score_func = Scores(score, score_dims).score_func
# parsing the persistence forecast as float32 increases throughput by a factor of 10!
score_data[score] = {"era5_all": score_func(data_fcst=era5_fcst[varname_fcst], data_ref=era5_fcst[varname_ref], data_clim=data_clim,),
"per_all": score_func(data_fcst=era5_fcst[varname_per].astype("float32"), data_ref=era5_fcst[varname_ref], data_clim=data_clim)}
```
%% Cell type:code id:c612a06f-e193-4d4a-85a4-1adc11fde81e tags:
%% Cell type:code id:human-fighter tags:
``` python
print(score_data["mse"]["era5_all"])
print(score_data["ssim"]["era5_all"])
print(score_data["texture"]["era5_all"])
```
%% Cell type:markdown id:b91695dc-7d3f-47de-8e67-03e5675aeac1 tags:
%% Cell type:markdown id:optional-channels tags:
Next, we initialize the data arrays to store the metrics for each forecast hour. <br>
Note that the ERA5 short-range forecasts only start twice a day at 06 and 18 UTC, respectively. Besides, the have only data starting from lead time 6 hours, but for consistency with the video prediction models, the data arrays cover all lead times between forecast hour 1 and 12. The unavailable values will be set to None.
Note that the ERA5 short-range forecasts only start twice a day at 06 and 18 UTC, respectively. Besides, we only have data starting from lead time 6 hours (see `get_era5_forecasts.sh`), but for consistency with the video prediction models, the data arrays cover all lead times between forecast hour 1 and 12. The unavailable values will be set to None.
%% Cell type:code id:95611d9c-7551-466d-84b6-ca24be2d3977 tags:
%% Cell type:code id:dynamic-animal tags:
``` python
init_times = [6, 18]
fcst_hours = np.arange(1, 13)
nhours = len(fcst_hours)
nboots=1000
# MSE
mse_era5_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_era5_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
dims=["fcst_hour", "iboot"])
mse_per_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_per_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
dims=["fcst_hour", "iboot"])
# SSMI
ssim_era5_fcst, ssim_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
ssim_era5_fcst_boot, ssim_per_fcst_boot = mse_per_fcst_boot.copy(), mse_per_fcst_boot.copy()
# ACC
acc_era5_fcst, acc_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
acc_era5_fcst_boot, acc_per_fcst_boot = mse_per_fcst_boot.copy(), mse_per_fcst_boot.copy()
# ACC
txtr_era5_fcst, txtr_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
txtr_era5_fcst_boot, txtr_per_fcst_boot = mse_per_fcst_boot.copy(), mse_per_fcst_boot.copy()
```
%% Cell type:markdown id:e5700456-043b-4656-8ac4-759f42fc03c4 tags:
%% Cell type:markdown id:local-motor tags:
Finally, we populate the initialized data arrays by looping over the forecast hours for which data is available. <br>
Additionally, we perform block bootstrapping to estimate the uncertainty of our evaluation metrics.
%% Cell type:code id:f3ea77c8-60c1-48bd-8285-95d6af66217a tags:
%% Cell type:code id:built-notice tags:
``` python
def handle_scores(scores1_all, scores_ref_all, fhh):
scores1 = scores1_all.sel(fcst_hour=(scores1_all.fcst_hour.dt.hour.isin(fhh)))
scores_ref = scores_ref_all.sel(fcst_hour=(scores_ref_all.fcst_hour.dt.hour.isin(fhh)))
score1_mean, score_ref_mean = scores1.mean(), scores_ref.mean()
# two runs per day -> 2*7 correpsonds to a block length of one week
score1_boot = perform_block_bootstrap_metric(scores1, "fcst_hour", 2*7)
score_ref_boot = perform_block_bootstrap_metric(scores_ref, "fcst_hour", 2*7)
return score1_mean, score_ref_mean, score1_boot, score_ref_boot
for fh in fcst_hours[5::]:
print("Handling scores for forecast hour '{0:0d}'".format(fh))
fh_curr = (init_times + fh)%24
# MSE
mse_era5_fcst[fh-1], mse_per_fcst[fh-1], mse_era5_fcst_boot[fh-1, :], mse_per_fcst_boot[fh-1, :] = handle_scores(score_data["mse"]["era5_all"],
score_data["mse"]["per_all"], fh_curr)
# SSIM
ssim_era5_fcst[fh-1], ssim_per_fcst[fh-1], ssim_era5_fcst_boot[fh-1, :], ssim_per_fcst_boot[fh-1, :] = handle_scores(score_data["ssim"]["era5_all"],
score_data["ssim"]["per_all"], fh_curr)
# ACC
acc_era5_fcst[fh-1], acc_per_fcst[fh-1], acc_era5_fcst_boot[fh-1, :], acc_per_fcst_boot[fh-1, :] = handle_scores(score_data["acc"]["era5_all"],
score_data["acc"]["per_all"], fh_curr)
# TEXTURE
txtr_era5_fcst[fh-1], txtr_per_fcst[fh-1], txtr_era5_fcst_boot[fh-1, :], txtr_per_fcst_boot[fh-1, :] = handle_scores(score_data["texture"]["era5_all"],
score_data["texture"]["per_all"], fh_curr)
#mse_era5_curr = mse_era5_all.sel(fcst_hour=(mse_era5_all.fcst_hour.dt.hour.isin(fh_curr)))
#mse_per_curr = mse_per_all.sel(fcst_hour=(mse_per_all.fcst_hour.dt.hour.isin(fh_curr)))
#mse_era5_fcst[fh-1], mse_per_fcst[fh-1] = mse_era5_curr.mean(), mse_per_curr.mean()
## two runs per day -> 2*7 correpsonds to a block length of one week
#mse_era5_fcst_boot[fh-1, :] = perform_block_bootstrap_metric(mse_era5_curr, "fcst_hour", 2*7)
#mse_per_fcst_boot[fh-1, :] = perform_block_bootstrap_metric(mse_per_curr, "fcst_hour", 2*7)
```
%% Cell type:code id:5ec002e6-cb1b-4c66-9ed0-9f5e2bd3fae9 tags:
``` python
print(mse_era5_fcst)
print(ssim_era5_fcst)
print(acc_era5_fcst)
print(txtr_era5_fcst)
```
%% Cell type:markdown id:276f4f4c-e926-4009-9198-c4e43271e87d tags:
%% Cell type:markdown id:accompanied-commerce tags:
Finally, we put the data arrays into a joint dataset and save the results into the netCDF-file.
%% Cell type:code id:88ccbdce-5d6a-4c40-971c-ebe9a05a8c88 tags:
%% Cell type:code id:serious-president tags:
``` python
# create Dataset and save to netCDF-file
ds_mse = xr.Dataset({"2t_era5_mse_avg": mse_era5_fcst, "2t_era5_mse_bootstrapped": mse_era5_fcst_boot,
"2t_persistence_mse_avg": mse_per_fcst, "2t_persistence_mse_bootstrapped": mse_per_fcst_boot,
# SSIM
"2t_era5_ssim_avg": ssim_era5_fcst, "2t_era5_mse_bootstrapped": ssim_era5_fcst_boot,
"2t_persistence_ssim_avg": ssim_per_fcst, "2t_persistence_mse_bootstrapped": ssim_per_fcst_boot,
# ACC
"2t_era5_acc_avg": acc_era5_fcst, "2t_era5_acc_bootstrapped": acc_era5_fcst_boot,
"2t_persistence_acc_avg": acc_per_fcst, "2t_persistence_acc_bootstrapped": acc_per_fcst_boot,
# TEXTURE
"2t_era5_texture_avg": txtr_era5_fcst, "2t_era5_texture_bootstrapped": txtr_era5_fcst_boot,
"2t_persistence_texture_avg": txtr_per_fcst, "2t_persistence_texture_bootstrapped": txtr_per_fcst_boot,
})
outfile = os.path.join(indir, "evaluation_metrics.nc")
print("Save evaluation metrics to '{0}'".format(outfile))
ds_mse.to_netcdf(outfile)
```
%% Cell type:markdown id:a7678cc4-7333-402a-bcf4-1bc15712255d tags:
%% Cell type:markdown id:mental-oxford tags:
## DONE!
......
%% Cell type:code id:d636ee12-e299-485f-b84d-6f35c05fa766 tags:
%% Cell type:code id:written-productivity tags:
``` python
import os, sys
import glob
sys.path.append("../utils/")
import xarray as xr
import numpy as np
import pandas as pd
from statistical_evaluation import perform_block_bootstrap_metric, avg_metrics, calculate_cond_quantiles, Scores
```
%% Cell type:markdown id:8510684d-9374-4e40-bc1c-4d69181c925c tags:
%% Cell type:markdown id:english-conducting tags:
# Evaluation over a smaller (joint) domain
In this Jupyter Notebook, evaluation will be performed on a subregion which is nested into the target region of the evaluated trained video prediction models. The resulting netCDF-files can be used in the meta-postpro
The following cells will first merge all forecast files under `indir` into a single netCDF-file.<br>
Then the data is sliced to the domain defined by `lonlatbox` and all subsequent evaluation is performed on this smaller domain.<br>
The evaluation metrics are then saved to a file under `indir` named `evaluation_metrics_<nlon>x<nlat>.nc` where `nlat` and `nlon` denote the number of grid points/pixels in latitude and longitude direction of the smaller domain, respectively. <br>
Thus, first let's define the basic parameters:
%% Cell type:code id:440b15fa-ecd4-4bb4-9100-ede5abb2b04f tags:
%% Cell type:code id:metropolitan-mailman tags:
``` python
indir = "/p/project/deepacf/deeprain/video_prediction_shared_folder/results/era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t_850/savp/20210901T090059_gong1_savp_cv12/"
model = "savp"
# define domain. [3., 24.3, 40.2, 53.1] corresponds to the smallest domain tested in the GMD paper
lonlatbox = [3., 24.3, 40.2, 53.1]
```
%% Cell type:markdown id:fd759f01-2561-4615-8056-036bdee6e2c7 tags:
%% Cell type:markdown id:turned-player tags:
Next, we perform a first merging step. For computational efficiency, we merge max. 1000 files in the first step.<br>
Since the data is not sorted by the dimension `init_time` when querying along the sample index, we sort it before saving to intermediate files.<br>
Given that the merging step has already been performed, no further processing is required.<br>
If this is not the case, we start with the sample indices between 0 and 999:
%% Cell type:code id:e6726da3-d774-4eda-89d6-e315a865bb99 tags:
%% Cell type:code id:aging-radio tags:
``` python
def get_fname_ndigits(indir, prefix, suffix, n, patt="[0-9]"):
flist = []
for i in range(1, n+1):
fn_search = os.path.join(indir, "{0}{1}{2}".format(prefix, i*patt, suffix))
flist = flist + glob.glob(fn_search)
if len(flist) == 0:
raise FileNotFoundError("Could not find any file under '{0}' with prefix '{1}' and suffix '{2}' containing digits.".format(indir, prefix, suffix))
return flist
# get list of files with sample index between 0 and 999.
vfp_list = get_fname_ndigits(indir, "vfp_date_*sample_ind_", ".nc", 3)
outfile = os.path.join(indir, "vfp_{0}_forecasts_sample_ind_0_999.nc".format(model))
if not os.path.isfile(outfile):
print("File '{0}' does not exist. \n Start reading data with sample index between 0 and 999 from '{1}'...".format(outfile, indir))
data_all = xr.open_mfdataset(vfp_list, concat_dim="init_time", combine="nested", decode_cf=True).load()
data_all = data_all.sortby("init_time")
print("Data loaded successfully. Save merged data to '{0}'.".format(outfile))
data_all.to_netcdf(outfile, encoding={'init_time':{'units': "seconds since 1900-01-01 00:00:00"}})
```
%% Cell type:markdown id:1c0222c0-386d-44f4-9532-4e824b14828c tags:
%% Cell type:markdown id:controversial-picking tags:
Then, we proceed with the rest.
%% Cell type:code id:54f4aa3e-3a39-496e-ae97-65f79d9cd598 tags:
%% Cell type:code id:basic-corps tags:
``` python
for i in np.arange(1, 9):
outfile = os.path.join(indir, "vfp_{0}_forecasts_sample_ind_{1:d}000_{1:d}999.nc".format(model, i))
if not os.path.isfile(outfile):
print("File '{0}' does not exist. Start reading data with sample index between {1:d}000 and {1:d}999 from '{2}'...".format(outfile, i, indir))
data_all = xr.open_mfdataset(os.path.join(indir, "vfp_date_*sample_ind_{0}???.nc".format(i)), concat_dim="init_time", combine="nested", decode_cf=True).load()
data_all = data_all.sortby("init_time")
print("Data loaded successfully. Save merged data to '{0}'.".format(outfile))
data_all.to_netcdf(outfile, encoding={'init_time':{'units': "seconds since 1900-01-01 00:00:00"}})
```
%% Cell type:markdown id:bdf16158-0ce5-40a3-848d-f574a1b9d622 tags:
%% Cell type:markdown id:severe-satisfaction tags:
Still, xarray's `open_mfdataset`-method would not be able to concatenate all data since the `init_time`-dimension is not montonically increasing/decreasing when looping through the files. <br>
Thus, we have to merge the data manually.
The merged dataset is then saved to separate datafile for later computation.
If the data has already been merged, we simply read the data from the corresponding netCDF-file.
%% Cell type:code id:92f15edf-c23f-4803-b3c5-618305194de5 tags:
%% Cell type:code id:illegal-headset tags:
``` python
outfile_all = os.path.join(indir, "vfp_{0}_forecasts_all.nc".format(model))
if not os.path.isfile(outfile_all):
print("netCDF-file with all forecasts '{0}' does not exist yet. Start merging and sorting all precursor files.".format(outfile))
all_files = sorted(glob.glob(os.path.join(indir, "vfp_{0}_forecasts_sample_ind_*.nc".format(model))))
if len(all_files) == 0:
raise FileNotFoundError("Could not find any precursor files.")
for i, f in enumerate(all_files):
print("Processing file '{0}'".format(f))
tmp = xr.open_dataset(os.path.join(indir, f)).load()
if i == 0:
all_fcst = tmp.copy()
else:
print("Start merging")
all_fcst = xr.merge([all_fcst, tmp])
# sort by init_time-dimension...
all_fcst = all_fcst.sortby("init_time")
# ... and save to file
print("Finally, write all merged and sorted data to '{0}'.".format(outfile_all))
all_fcst.to_netcdf(outfile_all)
else:
all_fcst = xr.open_dataset(outfile_all).load()
```
%% Cell type:markdown id:0fcf1cb1-ba0d-4262-8e23-12ba44b6e2d0 tags:
%% Cell type:markdown id:fresh-favor tags:
Now, we slice the dataset to the domain of interest (defined by `lonlatbox`).
%% Cell type:code id:ede23e56-5be8-48be-b584-0eb8741acbf3 tags:
%% Cell type:code id:initial-campbell tags:
``` python
all_fcst_sl = all_fcst.sel({"lon": slice(lonlatbox[0], lonlatbox[1]), "lat": slice(lonlatbox[3], lonlatbox[2])})
print(all_fcst_sl)
```
%% Cell type:markdown id:e21b89c8-57ab-4070-9b4c-ec0fe24c37b9 tags:
%% Cell type:markdown id:optimum-drunk tags:
Next we initialize the function for calculating the MSE and call it to evaluate the ERA5 and persistence forecasts. <br>
If you require further evaluation metrics, just expand the cell accordingly, e.g. add the following lines <br>
```
ssim_func = Scores("ssim", ["lat", "lon"]).score_func
ssim_era5_all = ssim_func(data_fcst=era5_fcst[varname_fcst], data_ref=era5_fcst[varname_ref])
ssim_per_all = (data_fcst=era5_fcst[varname_per], data_ref=era5_fcst[varname_ref])
```
in case you want to evaluate the SSIM as well.
%% Cell type:code id:c2b70b80-6b86-4674-b051-6a23aaa821ea tags:
%% Cell type:code id:sorted-adams tags:
``` python
mse_func = Scores("mse", ["lat", "lon"]).score_func
varname_ref, varname_fcst, varname_per = "2t_ref", "2t_{0}_fcst".format(model), "2t_persistence_fcst"
mse_model_all = mse_func(data_fcst=all_fcst_sl[varname_fcst], data_ref=all_fcst_sl[varname_ref])
mse_per_all = mse_func(data_fcst=all_fcst_sl[varname_per], data_ref=all_fcst_sl[varname_ref])
```
%% Cell type:markdown id:7745356d-ad44-47b6-9655-8d6db3433b1a tags:
%% Cell type:markdown id:demonstrated-eligibility tags:
Then, we initialize the data arrays to store the desired evaluation metrics...
%% Cell type:code id:b49db031-126c-44b1-b649-4f70587fac89 tags:
%% Cell type:code id:protective-battle tags:
``` python
fcst_hours = all_fcst_sl["fcst_hour"]
nhours = len(fcst_hours)
nboots=1000
mse_model_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_model_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
dims=["fcst_hour", "iboot"])
mse_per_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_per_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
dims=["fcst_hour", "iboot"])
```
%% Cell type:markdown id:55967405-02d1-46e8-b3c3-8952d0e28bd2 tags:
%% Cell type:markdown id:norman-swedish tags:
... and populate them by looping over all forecast hours.
%% Cell type:code id:5090e71c-f20f-43e6-94f6-71cbd0b6006d tags:
%% Cell type:code id:extra-bible tags:
``` python
for i, fh in enumerate(fcst_hours):
mse_model_curr = mse_model_all.sel(fcst_hour=fh)
mse_per_curr = mse_per_all.sel(fcst_hour=fh)
mse_model_fcst[fh-1], mse_per_fcst[fh-1] = mse_model_curr.mean(), mse_per_curr.mean()
mse_model_fcst_boot[i, :] = perform_block_bootstrap_metric(mse_model_curr, "init_time", 24*7)
mse_per_fcst_boot[i, :] = perform_block_bootstrap_metric(mse_per_curr, "init_time", 24*7)
```
%% Cell type:code id:d526324d-5d19-4193-8208-e609d9c65205 tags:
``` python
print(mse_model_fcst)
```
%% Cell type:code id:b42cd738-b966-4b24-ad13-351d9b88f9e8 tags:
``` python
print(mse_model_fcst)
```
%% Cell type:markdown id:b13c7287-7a8c-4133-bccc-f250bf25dad7 tags:
%% Cell type:markdown id:limiting-corpus tags:
Finally, we put the data arrays into a joint dataset and save the results into the netCDF-file.
%% Cell type:code id:7cd455ee-4749-46dd-8095-9e43744a1563 tags:
%% Cell type:code id:reported-traffic tags:
``` python
# create Dataset and save to netCDF-file
ds_mse = xr.Dataset({"2t_{0}_mse_avg".format(model): mse_model_fcst, "2t_{0}_mse_bootstrapped".format(model): mse_model_fcst_boot,
"2t_persistence_mse_avg": mse_per_fcst, "2t_persistence_mse_bootstrapped": mse_per_fcst_boot})
outfile = os.path.join(indir, "evaluation_metrics_{0:d}x{1:d}.nc".format(len(all_fcst_sl["lon"]), len(all_fcst_sl["lat"])))
print("Save evaluation metrics to '{0}'".format(outfile))
print(ds_mse)
ds_mse.to_netcdf(outfile)
```
%% Cell type:markdown id:69bb9464-6fdb-489b-ba59-0170040144ee tags:
%% Cell type:markdown id:correct-bulgarian tags:
## Done!
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment