Commit 764b7c70 authored by Michael Langguth
Integrate Jupyter Notebooks for evaluating IFS forecasts and evaluation on shared subdomain.

parent 9bb49049
Jupyter_Notebooks/first_cond_quantile.png (35.2 KiB)

#!/usr/bin/env bash
# Retrieve the 2m temperature from ERA5 short-range forecast GRIB-files, convert them
# to netCDF and attach the reference data of the video prediction experiment.
yr=$1
indir=/p/fastdata/slmet/slmet111/met_data/ecmwf/era5/grib/${yr}
indir_ref=/p/project/deepacf/deeprain/video_prediction_shared_folder/results/era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t_850/savp/20210901T090059_gong1_savp_cv12/
outdir=/p/scratch/deepacf/deeprain/video_prediction_shared_folder/era5_forecast_ref/${yr}

if [[ ! -d "${outdir}" ]]; then
  mkdir -p "${outdir}"
fi

# lead-time window of interest (hours, cf. the {6..12}-glob below) and forecast runs to process
hh_s=6
hh_e=12
declare -a hh_list=(18)

for hh in "${hh_list[@]}"; do
  hh0=$(printf "%02d" "${hh}")
  # slice and retrieve 2m temperature from forecast-files
  for mm in {01..12}; do
    sf_files=( $(ls ${indir}/${mm}/fc_${hh0}/*_{6..12}_sf_fc.grb) )
    for sf_file in "${sf_files[@]}"; do
      newfile=$(basename "${sf_file}")
      newfile="${outdir}/${newfile/.grb/.nc}"
      echo "Processing file '${newfile}'"
      cdo --eccodes -f nc copy -selname,2t -sellonlatbox,0.,27.3,38.4,54.9 "${sf_file}" "${newfile}"
    done
  done

  date1=$(date -d "${yr}0101" '+%Y%m%d')
  date2=$(date -d "${yr}0101 ${hh0}" '+%Y-%m-%d %H:%M:00')
  date_end=$(date -d "${yr}1231 ${hh0}" '+%Y%m%d')
  while [[ "$date1" -le "$date_end" ]]; do
    outfile=${outdir}/${date1}${hh0}_allfc.nc
    echo "Merging forecasts for run at ${date2}..."
    cdo mergetime ${outdir}/${date1}_${hh}00_*.nc "${outfile}"
    echo "Manipulate attributes and dimensions..."
    ncrename -O -v 2t,2t_era5_fcst "${outfile}" "${outfile}"
    ncap2 -O -s init_time=0 "${outfile}" "${outfile}"
    ncatted -O -a calendar,init_time,a,c,"proleptic_gregorian" -a units,init_time,a,c,"hours since ${date2}" "${outfile}" "${outfile}"
    ncrename -O -d time,fcst_hour -v time,fcst_hour "${outfile}" "${outfile}"
    ncap2 -O -s 'fcst_hour=int(fcst_hour)' "${outfile}" "${outfile}"
    # add reference data if available
    patt=${indir_ref}/vfp_date_${date1}${hh0}_sample_ind*.nc
    if ls ${patt} 1>/dev/null 2>&1; then
      file_src=$(ls ${patt})
      ncks -d fcst_hour,5,11 -v 2t_ref,2t_persistence_fcst -A "${file_src}" "${outfile}"
    else
      echo "Could not find reference data for ${date2}..."
      mv "${outfile}" "${outfile/.nc/_ref_miss.nc}"
    fi
    date1="$(date -u --date="${date1} tomorrow" '+%Y%m%d')"
    date2="$(date -u --date="${date2} tomorrow" '+%Y-%m-%d %H:%M:00')"
  done
done
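For reference, the script expects the year to process as its only command-line argument; a hypothetical invocation would be `./preprocess_era5_forecasts.sh 2019` (the script name is illustrative, it is not fixed by this commit).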
%% Cell type:code id:5272c6d8-2d18-4de1-a437-5fe6461ca743 tags:
``` python
import os, sys
sys.path.append("../utils/")
import xarray as xr
import numpy as np
import pandas as pd
from statistical_evaluation import perform_block_bootstrap_metric, avg_metrics, calculate_cond_quantiles, Scores
```
%% Cell type:markdown id:61e6b683-b9c6-4691-9c1f-130324ddb7a8 tags:
# Evaluation of the ERA5 short-range forecasts
Define the path to the file and load the data.
%% Cell type:code id:453ad6f2-5655-47bf-8f39-7a1d73b059ab tags:
``` python
indir = "/p/home/jusers/langguth1/juwels/video_prediction_shared_folder/results/era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t_850/era5_forecast"
yr=2019
era5_fcst_file = os.path.join(indir, "{0}_era5_short_range_fcst.nc".format(yr))
if not os.path.isfile(era5_fcst_file):
    raise FileNotFoundError("Could not find file with all ERA5 forecasts '{0}'".format(era5_fcst_file))
```
%% Cell type:code id:459fa9bf-dbd0-48ee-8184-ccfec9ddbd59 tags:
``` python
era5_fcst = xr.open_dataset(era5_fcst_file)
print(era5_fcst)
```
%% Cell type:markdown id:07b1324f-c6df-45c8-b77a-e334c2bb2966 tags:
Next, we initialize the functions for calculating the evaluation metrics and call them to evaluate the ERA5 and persistence forecasts. <br>
If you require further evaluation metrics, just expand the cell accordingly, e.g. add the following lines <br>
```
ssim_func = Scores("ssim", ["lat", "lon"]).score_func
ssim_era5_all = ssim_func(data_fcst=era5_fcst[varname_fcst], data_ref=era5_fcst[varname_ref])
ssim_per_all = ssim_func(data_fcst=era5_fcst[varname_per], data_ref=era5_fcst[varname_ref])
```
in case you want to evaluate the SSIM as well.
%% Cell type:code id:e066baeb-3190-4012-a0c1-ace1100be3aa tags:
``` python
fdata_clim = os.path.join("/p/project/deepacf/deeprain/video_prediction_shared_folder/preprocessedData/T2monthly/", "climatology_t2m_1991-2020.nc")
data = xr.open_dataset(fdata_clim)
# copied from load_climdata in main_visualize_postprocess.py
var = "var167"
dt_clim = data[var]
# target domain (0.3° grid spacing, latitudes stored in descending order)
lon_dom = np.arange(0., 27.5, 0.3)
lat_dom = np.arange(54.9, 38.1, -0.3)
# get the coordinates of the data after running CDO
coords = dt_clim.coords
nlat, nlon = len(coords["lat"]), len(coords["lon"])
# modify the coordinates to our needs
coords_new = dict(coords)
coords_new.pop("time")
coords_new["month"] = np.arange(1, 13)
coords_new["hour"] = np.arange(0, 24)
# initialize a new data array with explicit dimensions for month and hour
data_clim_new = xr.DataArray(
    np.full((12, 24, nlat, nlon), np.nan),
    coords=coords_new,
    dims=["month", "hour", "lat", "lon"],
)
# do the reorganization
for month in np.arange(1, 13):
    data_clim_new.loc[dict(month=month)] = dt_clim.sel(
        time=dt_clim["time.month"] == month
    )
# slice the climatology to the target domain
data_clim = data_clim_new.sel(lon=lon_dom, lat=lat_dom, tolerance=0.01, method="nearest")
print(data_clim["lat"])
```
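A quick sanity check of the reorganized climatology can catch indexing slips early; this is only a sketch relying on the variables defined in the cell above:
``` python
# the reorganized climatology should hold one complete lat-lon field
# per (month, hour) combination
assert data_clim_new.shape == (12, 24, nlat, nlon)
assert not np.isnan(data_clim_new.sel(month=1, hour=12).values).all(), "month/hour slice was not populated"
```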
%% Cell type:code id:0dd70ea1-9341-4b7a-9086-405ee75c6a64 tags:
``` python
from tqdm import tqdm
from skimage.metrics import structural_similarity as ssim
# overwrite some score-functions which inherently expect a separate fcst_hour-dimension next to batch_size
def calc_acc_batch(data_fcst, data_ref, **kwargs):
    """
    Calculate the ACC evaluation metric of forecast data w.r.t. reference data.
    :param data_fcst: forecasted data (xarray with dimensions [batch, fore_hours, lat, lon])
    :param data_ref: reference data (xarray with dimensions [batch, fore_hours, lat, lon])
    :param data_clim: climatology data (xarray with dimensions [month, hour, lat, lon])
    :return: ACC for each sample (xarray with dimension [fcst_hour])
    """
    method = calc_acc_batch.__name__
    print("Start calculating ACC")
    if "data_clim" in kwargs:
        data_clim = kwargs["data_clim"]
    else:
        raise KeyError("%{0}: climatological data must be parsed to calculate the ACC.".format(method))
    batch_size = data_fcst.shape[0]
    acc = np.ones([batch_size])*np.nan
    for i in tqdm(range(batch_size)):
        img_fcst = data_fcst[i, ...]
        img_ref = data_ref[i, ...]
        # get month and hour of the forecast time to pick the matching climatology field
        img_month = img_fcst["fcst_hour"].dt.month.values
        img_hour = img_fcst["fcst_hour"].dt.hour.values
        img_clim = data_clim.sel(month=img_month, hour=img_hour)
        # ACC = sum((ref - clim)*(fcst - clim)) / sqrt(sum((ref - clim)^2) * sum((fcst - clim)^2))
        img1_ = img_ref - img_clim
        img2_ = img_fcst - img_clim
        cor1 = np.sum(img1_*img2_)
        cor2 = np.sqrt(np.sum(img1_**2)*np.sum(img2_**2))
        acc[i] = cor1/cor2
    # convert to data array
    acc = xr.DataArray(acc, coords={"fcst_hour": data_fcst["fcst_hour"]}, dims=["fcst_hour"])
    return acc
def calc_ssim_batch(data_fcst, data_ref, **kwargs):
    """
    Calculate the SSIM evaluation metric of forecast data w.r.t. reference data.
    :param data_fcst: forecasted data (xarray with dimensions [batch, fore_hours, lat, lon])
    :param data_ref: reference data (xarray with dimensions [batch, fore_hours, lat, lon])
    :return: SSIM for each sample (xarray with dimension [fcst_hour])
    """
    method = calc_ssim_batch.__name__
    batch_size = np.array(data_ref).shape[0]
    ssim_pred = []
    for i in tqdm(range(batch_size)):
        ssim_pred.append(ssim(data_ref[i, ...], data_fcst[i, ...]))
    # convert to data array
    ssim_pred = xr.DataArray(np.asarray(ssim_pred), coords={"fcst_hour": data_fcst["fcst_hour"]}, dims=["fcst_hour"])
    return ssim_pred
```
%% Cell type:code id:e76c4db9-f4da-4664-8054-fdd99cf5f64b tags:
``` python
# to get and configure the score-functions
score_dims = ["lat", "lon"]
scores = ["mse", "ssim", "acc", "texture"]
# the reference data
varname_ref, varname_fcst, varname_per = "2t_ref", "2t_era5_fcst", "2t_persistence_fcst"
# initialize empty dictionaries to store the score functions and to save the corresponding results
score_data = {}
for score in scores:
    # overwrite functions that are incompatible here
    if score == "acc":
        score_func = calc_acc_batch
    elif score == "ssim":
        score_func = calc_ssim_batch
    else:
        score_func = Scores(score, score_dims).score_func
    # parsing the persistence forecast as float32 increases throughput by a factor of 10!
    score_data[score] = {"era5_all": score_func(data_fcst=era5_fcst[varname_fcst], data_ref=era5_fcst[varname_ref], data_clim=data_clim),
                         "per_all": score_func(data_fcst=era5_fcst[varname_per].astype("float32"), data_ref=era5_fcst[varname_ref], data_clim=data_clim)}
```
%% Cell type:code id:c612a06f-e193-4d4a-85a4-1adc11fde81e tags:
``` python
print(score_data["mse"]["era5_all"])
print(score_data["ssim"]["era5_all"])
print(score_data["texture"]["era5_all"])
```
%% Cell type:markdown id:b91695dc-7d3f-47de-8e67-03e5675aeac1 tags:
Next, we initialize the data arrays to store the metrics for each forecast hour. <br>
Note that the ERA5 short-range forecasts are only initialized twice a day, at 06 and 18 UTC, respectively. Besides, data is only available from lead time 6 hours onwards. For consistency with the video prediction models, however, the data arrays cover all lead times between forecast hour 1 and 12; the unavailable values will be set to None.
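To make the hour-based selection below concrete: a forecast with lead time `fh` from a run initialized at 06 or 18 UTC verifies at the hours `(init_time + fh) % 24`. A minimal illustration:
``` python
# valid hours at which a given lead time verifies for the 06 and 18 UTC runs
init_times_arr = np.asarray([6, 18])
for fh in [6, 9, 12]:
    print("lead time {0:2d} h verifies at {1} UTC".format(fh, (init_times_arr + fh) % 24))
# lead time  6 h verifies at [12  0] UTC
# lead time  9 h verifies at [15  3] UTC
# lead time 12 h verifies at [18  6] UTC
```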
%% Cell type:code id:95611d9c-7551-466d-84b6-ca24be2d3977 tags:
``` python
init_times = [6, 18]
fcst_hours = np.arange(1, 13)
nhours = len(fcst_hours)
nboots=1000
# MSE
mse_era5_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_era5_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
                                  coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
                                  dims=["fcst_hour", "iboot"])
mse_per_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_per_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
                                 coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
                                 dims=["fcst_hour", "iboot"])
# SSIM
ssim_era5_fcst, ssim_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
ssim_era5_fcst_boot, ssim_per_fcst_boot = mse_era5_fcst_boot.copy(), mse_per_fcst_boot.copy()
# ACC
acc_era5_fcst, acc_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
acc_era5_fcst_boot, acc_per_fcst_boot = mse_era5_fcst_boot.copy(), mse_per_fcst_boot.copy()
# TEXTURE
txtr_era5_fcst, txtr_per_fcst = mse_era5_fcst.copy(), mse_per_fcst.copy()
txtr_era5_fcst_boot, txtr_per_fcst_boot = mse_era5_fcst_boot.copy(), mse_per_fcst_boot.copy()
```
%% Cell type:markdown id:e5700456-043b-4656-8ac4-759f42fc03c4 tags:
Finally, we populate the initialized data arrays by looping over the forecast hours for which data is available. <br>
Additionally, we perform block bootstrapping to estimate the uncertainty of our evaluation metrics.
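`perform_block_bootstrap_metric` is imported from `statistical_evaluation`; its actual implementation is not part of this commit. As a rough sketch of the underlying idea (and only that): contiguous blocks of the score time series are resampled with replacement, so that temporal autocorrelation within a block is preserved:
``` python
def block_bootstrap_sketch(values, block_length, nboots=1000, seed=42):
    """Illustrative block bootstrap: resample contiguous blocks with replacement
    and average each resample to approximate the distribution of the mean score."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    nblocks = len(values) // block_length
    blocks = values[:nblocks * block_length].reshape(nblocks, block_length)
    # draw nblocks blocks with replacement for each bootstrap resample
    idx = rng.integers(0, nblocks, size=(nboots, nblocks))
    return blocks[idx].mean(axis=(1, 2))
```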
%% Cell type:code id:f3ea77c8-60c1-48bd-8285-95d6af66217a tags:
``` python
def handle_scores(scores1_all, scores_ref_all, fhh):
    scores1 = scores1_all.sel(fcst_hour=(scores1_all.fcst_hour.dt.hour.isin(fhh)))
    scores_ref = scores_ref_all.sel(fcst_hour=(scores_ref_all.fcst_hour.dt.hour.isin(fhh)))
    score1_mean, score_ref_mean = scores1.mean(), scores_ref.mean()
    # two runs per day -> 2*7 corresponds to a block length of one week
    score1_boot = perform_block_bootstrap_metric(scores1, "fcst_hour", 2*7)
    score_ref_boot = perform_block_bootstrap_metric(scores_ref, "fcst_hour", 2*7)
    return score1_mean, score_ref_mean, score1_boot, score_ref_boot

for fh in fcst_hours[5::]:
    print("Handling scores for forecast hour '{0:d}'".format(fh))
    # valid hours at which lead time fh verifies for the 06 and 18 UTC runs
    fh_curr = (np.asarray(init_times) + fh) % 24
    # MSE
    mse_era5_fcst[fh-1], mse_per_fcst[fh-1], mse_era5_fcst_boot[fh-1, :], mse_per_fcst_boot[fh-1, :] = handle_scores(score_data["mse"]["era5_all"],
                                                                                                                     score_data["mse"]["per_all"], fh_curr)
    # SSIM
    ssim_era5_fcst[fh-1], ssim_per_fcst[fh-1], ssim_era5_fcst_boot[fh-1, :], ssim_per_fcst_boot[fh-1, :] = handle_scores(score_data["ssim"]["era5_all"],
                                                                                                                         score_data["ssim"]["per_all"], fh_curr)
    # ACC
    acc_era5_fcst[fh-1], acc_per_fcst[fh-1], acc_era5_fcst_boot[fh-1, :], acc_per_fcst_boot[fh-1, :] = handle_scores(score_data["acc"]["era5_all"],
                                                                                                                     score_data["acc"]["per_all"], fh_curr)
    # TEXTURE
    txtr_era5_fcst[fh-1], txtr_per_fcst[fh-1], txtr_era5_fcst_boot[fh-1, :], txtr_per_fcst_boot[fh-1, :] = handle_scores(score_data["texture"]["era5_all"],
                                                                                                                         score_data["texture"]["per_all"], fh_curr)
```
%% Cell type:code id:5ec002e6-cb1b-4c66-9ed0-9f5e2bd3fae9 tags:
``` python
print(mse_era5_fcst)
print(ssim_era5_fcst)
print(acc_era5_fcst)
print(txtr_era5_fcst)
```
%% Cell type:markdown id:276f4f4c-e926-4009-9198-c4e43271e87d tags:
Finally, we put the data arrays into a joint dataset and save the results to a netCDF-file.
%% Cell type:code id:88ccbdce-5d6a-4c40-971c-ebe9a05a8c88 tags:
``` python
# create Dataset and save to netCDF-file
ds_mse = xr.Dataset({# MSE
                     "2t_era5_mse_avg": mse_era5_fcst, "2t_era5_mse_bootstrapped": mse_era5_fcst_boot,
                     "2t_persistence_mse_avg": mse_per_fcst, "2t_persistence_mse_bootstrapped": mse_per_fcst_boot,
                     # SSIM
                     "2t_era5_ssim_avg": ssim_era5_fcst, "2t_era5_ssim_bootstrapped": ssim_era5_fcst_boot,
                     "2t_persistence_ssim_avg": ssim_per_fcst, "2t_persistence_ssim_bootstrapped": ssim_per_fcst_boot,
                     # ACC
                     "2t_era5_acc_avg": acc_era5_fcst, "2t_era5_acc_bootstrapped": acc_era5_fcst_boot,
                     "2t_persistence_acc_avg": acc_per_fcst, "2t_persistence_acc_bootstrapped": acc_per_fcst_boot,
                     # TEXTURE
                     "2t_era5_texture_avg": txtr_era5_fcst, "2t_era5_texture_bootstrapped": txtr_era5_fcst_boot,
                     "2t_persistence_texture_avg": txtr_per_fcst, "2t_persistence_texture_bootstrapped": txtr_per_fcst_boot,
                     })
outfile = os.path.join(indir, "evaluation_metrics.nc")
print("Save evaluation metrics to '{0}'".format(outfile))
ds_mse.to_netcdf(outfile)
```
%% Cell type:markdown id:a7678cc4-7333-402a-bcf4-1bc15712255d tags:
## DONE!
%% Cell type:code id:d636ee12-e299-485f-b84d-6f35c05fa766 tags:
``` python
import os, sys
import glob
sys.path.append("../utils/")
import xarray as xr
import numpy as np
import pandas as pd
from statistical_evaluation import perform_block_bootstrap_metric, avg_metrics, calculate_cond_quantiles, Scores
```
%% Cell type:markdown id:8510684d-9374-4e40-bc1c-4d69181c925c tags:
# Evaluation over a smaller (joint) domain
The following cells will first merge all forecast files under `indir` into a single netCDF-file.<br>
Then the data is sliced to the domain defined by `lonlatbox` and all subsequent evaluation is performed on this smaller domain.<br>
The evaluation metrics are then saved to a file under `indir` named `evaluation_metrics_<nlon>x<nlat>.nc` where `nlat` and `nlon` denote the number of grid points/pixels in latitude and longitude direction of the smaller domain, respectively. <br>
Thus, first let's define the basic parameters:
%% Cell type:code id:440b15fa-ecd4-4bb4-9100-ede5abb2b04f tags:
``` python
indir = "/p/project/deepacf/deeprain/video_prediction_shared_folder/results/era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t_850/savp/20210901T090059_gong1_savp_cv12/"
model = "savp"
# define domain. [3., 24.3, 40.2, 53.1] corresponds to the smallest domain tested in the GMD paper
lonlatbox = [3., 24.3, 40.2, 53.1]
```
%% Cell type:markdown id:fd759f01-2561-4615-8056-036bdee6e2c7 tags:
Next, we perform a first merging step. For computational efficiency, we merge at most 1000 files per step.<br>
Since the data is not sorted by the dimension `init_time` when querying along the sample index, we sort it before saving to intermediate files.<br>
If the merging step has already been performed, no further processing is required.<br>
Otherwise, we start with the sample indices between 0 and 999:
%% Cell type:code id:e6726da3-d774-4eda-89d6-e315a865bb99 tags:
``` python
def get_fname_ndigits(indir, prefix, suffix, n, patt="[0-9]"):
    """Return all files under indir whose name matches prefix + 1 to n digits + suffix."""
    flist = []
    for i in range(1, n+1):
        fn_search = os.path.join(indir, "{0}{1}{2}".format(prefix, i*patt, suffix))
        flist = flist + glob.glob(fn_search)
    if len(flist) == 0:
        raise FileNotFoundError("Could not find any file under '{0}' with prefix '{1}' and suffix '{2}' containing digits.".format(indir, prefix, suffix))
    return flist

# get list of files with sample index between 0 and 999
vfp_list = get_fname_ndigits(indir, "vfp_date_*sample_ind_", ".nc", 3)
outfile = os.path.join(indir, "vfp_{0}_forecasts_sample_ind_0_999.nc".format(model))
if not os.path.isfile(outfile):
    print("File '{0}' does not exist. \n Start reading data with sample index between 0 and 999 from '{1}'...".format(outfile, indir))
    data_all = xr.open_mfdataset(vfp_list, concat_dim="init_time", combine="nested", decode_cf=True).load()
    data_all = data_all.sortby("init_time")
    print("Data loaded successfully. Save merged data to '{0}'.".format(outfile))
    data_all.to_netcdf(outfile, encoding={'init_time': {'units': "seconds since 1900-01-01 00:00:00"}})
```
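For illustration, the digit patterns built by `get_fname_ndigits` for `n=3` expand as follows, thereby covering all sample indices from 0 to 999:
``` python
prefix, suffix, patt = "vfp_date_*sample_ind_", ".nc", "[0-9]"
for i in range(1, 4):
    print("{0}{1}{2}".format(prefix, i * patt, suffix))
# vfp_date_*sample_ind_[0-9].nc
# vfp_date_*sample_ind_[0-9][0-9].nc
# vfp_date_*sample_ind_[0-9][0-9][0-9].nc
```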
%% Cell type:markdown id:1c0222c0-386d-44f4-9532-4e824b14828c tags:
Then, we proceed with the rest.
%% Cell type:code id:54f4aa3e-3a39-496e-ae97-65f79d9cd598 tags:
``` python
for i in np.arange(1, 9):
    outfile = os.path.join(indir, "vfp_{0}_forecasts_sample_ind_{1:d}000_{1:d}999.nc".format(model, i))
    if not os.path.isfile(outfile):
        print("File '{0}' does not exist. Start reading data with sample index between {1:d}000 and {1:d}999 from '{2}'...".format(outfile, i, indir))
        data_all = xr.open_mfdataset(os.path.join(indir, "vfp_date_*sample_ind_{0}???.nc".format(i)), concat_dim="init_time", combine="nested", decode_cf=True).load()
        data_all = data_all.sortby("init_time")
        print("Data loaded successfully. Save merged data to '{0}'.".format(outfile))
        data_all.to_netcdf(outfile, encoding={'init_time': {'units': "seconds since 1900-01-01 00:00:00"}})
```
%% Cell type:markdown id:bdf16158-0ce5-40a3-848d-f574a1b9d622 tags:
Still, xarray's `open_mfdataset`-method would not be able to concatenate all data since the `init_time`-dimension is not monotonically increasing/decreasing when looping through the files. <br>
Thus, we have to merge the data manually.
The merged dataset is then saved to a separate datafile for later computation.
If the data has already been merged, we simply read the data from the corresponding netCDF-file.
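As a side note, the loop below merges incrementally, i.e. the growing dataset is re-merged at every step. If memory permits, an equivalent alternative (a sketch, not the committed code) is to open all precursor files first and merge once:
``` python
# alternative sketch: open all precursor files, then merge and sort in one go
datasets = [xr.open_dataset(f).load() for f in all_files]
all_fcst = xr.merge(datasets).sortby("init_time")
```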
%% Cell type:code id:92f15edf-c23f-4803-b3c5-618305194de5 tags:
``` python
outfile_all = os.path.join(indir, "vfp_{0}_forecasts_all.nc".format(model))
if not os.path.isfile(outfile_all):
    print("netCDF-file with all forecasts '{0}' does not exist yet. Start merging and sorting all precursor files.".format(outfile_all))
    all_files = sorted(glob.glob(os.path.join(indir, "vfp_{0}_forecasts_sample_ind_*.nc".format(model))))
    if len(all_files) == 0:
        raise FileNotFoundError("Could not find any precursor files.")
    for i, f in enumerate(all_files):
        print("Processing file '{0}'".format(f))
        tmp = xr.open_dataset(os.path.join(indir, f)).load()
        if i == 0:
            all_fcst = tmp.copy()
        else:
            print("Start merging")
            all_fcst = xr.merge([all_fcst, tmp])
    # sort by init_time-dimension...
    all_fcst = all_fcst.sortby("init_time")
    # ... and save to file
    print("Finally, write all merged and sorted data to '{0}'.".format(outfile_all))
    all_fcst.to_netcdf(outfile_all)
else:
    all_fcst = xr.open_dataset(outfile_all).load()
```
%% Cell type:markdown id:0fcf1cb1-ba0d-4262-8e23-12ba44b6e2d0 tags:
Now, we slice the dataset to the domain of interest (defined by `lonlatbox`).
%% Cell type:code id:ede23e56-5be8-48be-b584-0eb8741acbf3 tags:
``` python
# note: latitudes are stored in descending order, so the slice runs from north to south
all_fcst_sl = all_fcst.sel({"lon": slice(lonlatbox[0], lonlatbox[1]), "lat": slice(lonlatbox[3], lonlatbox[2])})
print(all_fcst_sl)
```
%% Cell type:markdown id:e21b89c8-57ab-4070-9b4c-ec0fe24c37b9 tags:
Next, we initialize the function for calculating the MSE and call it to evaluate the video prediction and persistence forecasts. <br>
If you require further evaluation metrics, just expand the cell accordingly, e.g. add the following lines <br>
```
ssim_func = Scores("ssim", ["lat", "lon"]).score_func
ssim_model_all = ssim_func(data_fcst=all_fcst_sl[varname_fcst], data_ref=all_fcst_sl[varname_ref])
ssim_per_all = ssim_func(data_fcst=all_fcst_sl[varname_per], data_ref=all_fcst_sl[varname_ref])
```
in case you want to evaluate the SSIM as well.
%% Cell type:code id:c2b70b80-6b86-4674-b051-6a23aaa821ea tags:
``` python
mse_func = Scores("mse", ["lat", "lon"]).score_func
varname_ref, varname_fcst, varname_per = "2t_ref", "2t_{0}_fcst".format(model), "2t_persistence_fcst"
mse_model_all = mse_func(data_fcst=all_fcst_sl[varname_fcst], data_ref=all_fcst_sl[varname_ref])
mse_per_all = mse_func(data_fcst=all_fcst_sl[varname_per], data_ref=all_fcst_sl[varname_ref])
```
%% Cell type:markdown id:7745356d-ad44-47b6-9655-8d6db3433b1a tags:
Then, we initialize the data arrays to store the desired evaluation metrics...
%% Cell type:code id:b49db031-126c-44b1-b649-4f70587fac89 tags:
``` python
fcst_hours = all_fcst_sl["fcst_hour"]
nhours = len(fcst_hours)
nboots=1000
mse_model_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_model_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
                                   coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
                                   dims=["fcst_hour", "iboot"])
mse_per_fcst = xr.DataArray(np.empty(nhours, dtype=object), coords={"fcst_hour": fcst_hours}, dims=["fcst_hour"])
mse_per_fcst_boot = xr.DataArray(np.empty((nhours, nboots), dtype=object),
                                 coords={"fcst_hour": fcst_hours, "iboot": np.arange(nboots)},
                                 dims=["fcst_hour", "iboot"])
```
%% Cell type:markdown id:55967405-02d1-46e8-b3c3-8952d0e28bd2 tags:
... and populate them by looping over all forecast hours.
%% Cell type:code id:5090e71c-f20f-43e6-94f6-71cbd0b6006d tags:
``` python
for i, fh in enumerate(fcst_hours):
    mse_model_curr = mse_model_all.sel(fcst_hour=fh)
    mse_per_curr = mse_per_all.sel(fcst_hour=fh)
    mse_model_fcst[i], mse_per_fcst[i] = mse_model_curr.mean(), mse_per_curr.mean()
    # hourly init times -> 24*7 corresponds to a block length of one week
    mse_model_fcst_boot[i, :] = perform_block_bootstrap_metric(mse_model_curr, "init_time", 24*7)
    mse_per_fcst_boot[i, :] = perform_block_bootstrap_metric(mse_per_curr, "init_time", 24*7)
```
%% Cell type:code id:d526324d-5d19-4193-8208-e609d9c65205 tags:
``` python
print(mse_model_fcst)
```
%% Cell type:code id:b42cd738-b966-4b24-ad13-351d9b88f9e8 tags:
``` python
print(mse_per_fcst)
```
%% Cell type:markdown id:b13c7287-7a8c-4133-bccc-f250bf25dad7 tags:
Finally, we put the data arrays into a joint dataset and save the results into the netCDF-file.
%% Cell type:code id:7cd455ee-4749-46dd-8095-9e43744a1563 tags:
``` python
# create Dataset and save to netCDF-file
ds_mse = xr.Dataset({"2t_{0}_mse_avg".format(model): mse_model_fcst, "2t_{0}_mse_bootstrapped".format(model): mse_model_fcst_boot,
                     "2t_persistence_mse_avg": mse_per_fcst, "2t_persistence_mse_bootstrapped": mse_per_fcst_boot})
outfile = os.path.join(indir, "evaluation_metrics_{0:d}x{1:d}.nc".format(len(all_fcst_sl["lon"]), len(all_fcst_sl["lat"])))
print("Save evaluation metrics to '{0}'".format(outfile))
print(ds_mse)
ds_mse.to_netcdf(outfile)
```
%% Cell type:markdown id:69bb9464-6fdb-489b-ba59-0170040144ee tags:
## Done!