"# Evaluation of the ERA5 short-range forecasts\n",
"\n",
"Define the path to the file and load the data."
"In this Jupyter Notebook, the ERA5 short-range forecasts created with the script `get_era5_forecasts.sh` are evaluated.\n",
"As a result of this Jupyter Notebook, the netCDF-file `evaluation_metrics.nc` will be created which carries the relevant evaluation metrics presented in [Gong et al., 2021](https://doi.org/10.5194/gmd-2021-430). With this file, the meta-postprocessing step can be run in order to create Figure 5 of the mentioned manuscript. <br>\n",
"\n",
"We start by defining the path to the respective netCDF-file and load the data."
"Then, we start to compute the evaluation metrics. <br>\n",
"Unfortunately, we have to rewrite the functions to calculate the ACC and the SSIM since both score-functions expect a `fcst_hour`-dimension which is not present in the data at hand."
]
},
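The adapted ACC can be sketched as follows. This is a minimal numpy sketch assuming plain `(time, lat, lon)` arrays and a climatology field; the function name `calc_acc` and its signature are illustrative, not the notebook's actual code:

```python
import numpy as np

def calc_acc(fcst, obs, clim):
    """Anomaly Correlation Coefficient per time step for (time, lat, lon)
    arrays, i.e. without any fcst_hour-dimension."""
    fcst_ano = fcst - clim          # forecast anomalies w.r.t. climatology
    obs_ano = obs - clim            # observed anomalies w.r.t. climatology
    num = np.sum(fcst_ano * obs_ano, axis=(1, 2))
    den = np.sqrt(np.sum(fcst_ano ** 2, axis=(1, 2)) *
                  np.sum(obs_ano ** 2, axis=(1, 2)))
    return num / den                # one ACC value per time step
```

A perfect forecast yields an ACC of 1 for every time step, which makes the function easy to sanity-check.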
{
"cell_type": "code",
"execution_count": null,
"id": "0dd70ea1-9341-4b7a-9086-405ee75c6a64",
"id": "synthetic-bracelet",
"metadata": {},
"outputs": [],
"source": [
...
...
" return ssim_pred"
]
},
{
"cell_type": "markdown",
"id": "linear-bathroom",
"metadata": {},
"source": [
"However, for the MSE and the texture-metric, we can simply use the functions provided by the Scores-class:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e76c4db9-f4da-4664-8054-fdd99cf5f64b",
"id": "ordered-wireless",
"metadata": {},
"outputs": [],
"source": [
...
...
{
"cell_type": "code",
"execution_count": null,
"id": "c612a06f-e193-4d4a-85a4-1adc11fde81e",
"id": "human-fighter",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "b91695dc-7d3f-47de-8e67-03e5675aeac1",
"id": "optional-channels",
"metadata": {},
"source": [
"Next, we initialize the data arrays to store the metrics for each forecast hour. <br>\n",
"Note that the ERA5 short-range forecasts only start twice a day at 06 and 18 UTC, respectively. Besides, the have only data starting from lead time 6 hours, but for consistency with the video prediction models, the data arrays cover all lead times between forecast hour 1 and 12. The unavailable values will be set to None."
"Note that the ERA5 short-range forecasts only start twice a day at 06 and 18 UTC, respectively. Besides, we only have data starting from lead time 6 hours (see `get_era5_forecasts.sh`), but for consistency with the video prediction models, the data arrays cover all lead times between forecast hour 1 and 12. The unavailable values will be set to None."
]
},
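Such an initialization can be sketched like this (a numpy sketch under the assumption that unavailable lead times are represented as NaN in the arrays; the variable names are illustrative):

```python
import numpy as np

fcst_hours = np.arange(1, 13)    # forecast hours 1..12, consistent with the video prediction models
avail_hours = np.arange(6, 13)   # ERA5 short-range data is only available from lead time 6 h onwards

# initialize the metric arrays with NaN; hours without data simply stay NaN
mse_era5 = np.full(fcst_hours.size, np.nan)
acc_era5 = np.full(fcst_hours.size, np.nan)
```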
{
"cell_type": "code",
"execution_count": null,
"id": "95611d9c-7551-466d-84b6-ca24be2d3977",
"id": "dynamic-animal",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "e5700456-043b-4656-8ac4-759f42fc03c4",
"id": "local-motor",
"metadata": {},
"source": [
"Finally, we populate the initialized data arrays by looping over the forecast hours for which data is available. <br>\n",
"In this Jupyter Notebook, evaluation will be performed on a subregion which is nested into the target region of the evaluated trained video prediction models. The resulting netCDF-files can be used in the meta-postpro\n",
"The following cells will first merge all forecast files under `indir` into a single netCDF-file.<br>\n",
"Then the data is sliced to the domain defined by `lonlatbox` and all subsequent evaluation is performed on this smaller domain.<br>\n",
"The evaluation metrics are then saved to a file under `indir` named `evaluation_metrics_<nlon>x<nlat>.nc` where `nlat` and `nlon` denote the number of grid points/pixels in latitude and longitude direction of the smaller domain, respectively. <br>\n",
...
...
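The output filename follows the pattern `evaluation_metrics_<nlon>x<nlat>.nc` described above; its construction can be sketched as follows (the `nlat`/`nlon` values are hypothetical):

```python
# hypothetical grid size of the sliced domain
nlat, nlon = 32, 48

# filename pattern: evaluation_metrics_<nlon>x<nlat>.nc
outfile = f"evaluation_metrics_{nlon}x{nlat}.nc"
```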
{
"cell_type": "code",
"execution_count": null,
"id": "440b15fa-ecd4-4bb4-9100-ede5abb2b04f",
"id": "metropolitan-mailman",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "fd759f01-2561-4615-8056-036bdee6e2c7",
"id": "turned-player",
"metadata": {},
"source": [
"Next, we perform a first merging step. For computational efficiency, we merge max. 1000 files in the first step.<br>\n",
...
...
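Splitting the file list into batches of at most 1000 files can be sketched as follows (`chunk_files` is an illustrative helper, not the notebook's actual code):

```python
def chunk_files(file_list, max_files=1000):
    """Split a list of forecast files into chunks of at most max_files
    for a stepwise merge."""
    return [file_list[i:i + max_files]
            for i in range(0, len(file_list), max_files)]
```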
{
"cell_type": "code",
"execution_count": null,
"id": "e6726da3-d774-4eda-89d6-e315a865bb99",
"id": "aging-radio",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "1c0222c0-386d-44f4-9532-4e824b14828c",
"id": "controversial-picking",
"metadata": {},
"source": [
"Then, we proceed with the rest. "
...
...
{
"cell_type": "code",
"execution_count": null,
"id": "54f4aa3e-3a39-496e-ae97-65f79d9cd598",
"id": "basic-corps",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "bdf16158-0ce5-40a3-848d-f574a1b9d622",
"id": "severe-satisfaction",
"metadata": {},
"source": [
"Still, xarray's `open_mfdataset`-method would not be able to concatenate all data since the `init_time`-dimension is not montonically increasing/decreasing when looping through the files. <br>\n",
...
...
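The manual merge can be sketched as follows: concatenate first, then sort along `init_time` so the dimension becomes monotonic. This is a numpy sketch; the notebook presumably operates on xarray datasets, and the helper below is illustrative:

```python
import numpy as np

def merge_by_init_time(times_list, data_list):
    """Concatenate per-file arrays and sort them along init_time so that
    the init_time-dimension is monotonically increasing."""
    times = np.concatenate(times_list)
    data = np.concatenate(data_list, axis=0)
    order = np.argsort(times)       # permutation that sorts init_time
    return times[order], data[order]
```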
{
"cell_type": "code",
"execution_count": null,
"id": "92f15edf-c23f-4803-b3c5-618305194de5",
"id": "illegal-headset",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "0fcf1cb1-ba0d-4262-8e23-12ba44b6e2d0",
"id": "fresh-favor",
"metadata": {},
"source": [
"Now, we slice the dataset to the domain of interest (defined by `lonlatbox`)."
...
...
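The slicing can be sketched with plain boolean masks. This numpy sketch assumes a 2-D `(lat, lon)` field and `lonlatbox = (lon_min, lon_max, lat_min, lat_max)`; the notebook itself presumably slices an xarray dataset, so the helper is illustrative:

```python
import numpy as np

def slice_lonlatbox(data, lats, lons, lonlatbox):
    """Cut a (lat, lon)-field to the domain given by
    lonlatbox = (lon_min, lon_max, lat_min, lat_max)."""
    lon_min, lon_max, lat_min, lat_max = lonlatbox
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    return data[np.ix_(lat_mask, lon_mask)], lats[lat_mask], lons[lon_mask]
```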
{
"cell_type": "code",
"execution_count": null,
"id": "ede23e56-5be8-48be-b584-0eb8741acbf3",
"id": "initial-campbell",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "e21b89c8-57ab-4070-9b4c-ec0fe24c37b9",
"id": "optimum-drunk",
"metadata": {},
"source": [
"Next we initialize the function for calculating the MSE and call it to evaluate the ERA5 and persistence forecasts. <br>\n",
...
...
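A minimal sketch of such an MSE computation over plain `(time, lat, lon)` arrays (illustrative only; in the notebook, the MSE comes from the `Scores` class):

```python
import numpy as np

def calc_mse(fcst, obs):
    """Mean squared error over the spatial dimensions,
    one value per time step."""
    return np.mean((fcst - obs) ** 2, axis=(1, 2))
```

The same call evaluates both the ERA5 and the persistence forecasts, since only the forecast array differs.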
{
"cell_type": "code",
"execution_count": null,
"id": "c2b70b80-6b86-4674-b051-6a23aaa821ea",
"id": "sorted-adams",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "7745356d-ad44-47b6-9655-8d6db3433b1a",
"id": "demonstrated-eligibility",
"metadata": {},
"source": [
"Then, we initialize the data arrays to store the desired evaluation metrics..."
...
...
{
"cell_type": "code",
"execution_count": null,
"id": "b49db031-126c-44b1-b649-4f70587fac89",
"id": "protective-battle",
"metadata": {},
"outputs": [],
"source": [
...
...
},
{
"cell_type": "markdown",
"id": "55967405-02d1-46e8-b3c3-8952d0e28bd2",
"id": "norman-swedish",
"metadata": {},
"source": [
"... and populate them by looping over all forecast hours."