RMSE scores on website differ from those on the corresponding Google Cloud dataset #184

cindyl983 · 2024-09-20T13:49:45Z

Hi WeatherBench 2 team,

I was taking a look at the RMSE scores of different weather models on your website: https://sites.research.google/weatherbench/deterministic-scores/.

However, these scores all differ from scores found in your Google Cloud bucket: https://console.cloud.google.com/storage/browser/weatherbench2/results/1440x721/deterministic;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

The scores on the website seem to be consistently higher across all parameters, with the difference increasing as the lead times increase. I have two examples in this spreadsheet containing the RMSE scores from the website and the Google Cloud for each lead time.
WeatherBench 2 RMSE Scores (1).xlsx

Example of retrieving RMSE scores

Model: FuXi
Dataset: ERA5
Variable: Geopotential
Metric: RMSE
Level: 500
Region: Global
Year: 2020
Resolution: 1440x721

Website

I hovered over each point on the plot to read off the score. For example, the RMSE score for lead_time=6 is 19.1883.

Google Cloud bucket dataset

The RMSE score for lead_time=6 is 19.17010272.

import gcsfs
import numpy as np
import xarray as xr
from google.colab import auth

# Authenticate in Colab before creating the filesystem handle.
auth.authenticate_user()
fs = gcsfs.GCSFileSystem()

path = 'gs://weatherbench2/results/1440x721/deterministic/fuxi_vs_era_2020_deterministic.nc'

# Load the data into memory while the file is still open.
with fs.open(path, 'rb') as f:
    ds = xr.open_dataset(f).load()

# RMSE at a single lead time (6 hours).
ds['geopotential'].sel(level=500, region='global', metric='rmse', lead_time=np.timedelta64(6, 'h')).values

# Output
array(19.17010272)

ds['geopotential'].sel(level=500, region='global', metric='rmse').values

# Output
array([ 19.17010272,  30.50602631,  33.31361289,  42.70132768,
        47.80672409,  58.73283364,  65.87870037,  77.75102942,
        87.01627604, 100.64913862, 112.15159811, 127.53932959,
       141.26029425, 158.69348736, 174.7353766 , 194.35554665,
       212.64561075, 234.24519231, 254.39069177, 277.60062767,
       294.03645833, 313.28461093, 331.8133458 , 352.49123041,
       371.59800125, 392.58279915, 411.67895299, 432.52519587,
       451.15322293, 471.2914886 , 489.00525285, 508.19880698,
       524.81361289, 542.78650285, 558.00587607, 574.56623932,
       588.26046118, 603.36204594, 615.70713141, 629.5929042 ,
       636.72930021, 646.43491809, 654.61672009, 664.60741631,
       672.32594373, 681.75587607, 688.69115028, 697.3772703 ,
       703.4286859 , 711.22240028, 716.29985755, 723.14387464,
       727.27711004, 733.34628739, 736.73744658, 742.14547721,
       744.96354167, 749.94586895, 752.37473291, 756.99795228])

raspstephan · 2024-09-30T08:08:25Z

Hey, it is quite possible that one of those corresponds to an earlier version. Do you see the same differences for other metrics? If not, the most likely cause is that we used to take the time mean outside of the square root when computing RMSE, but now take it inside, to match what ECMWF does. This leads to slightly different results.
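
To illustrate the two averaging orders described above, here is a minimal sketch. It uses synthetic per-time MSE values (the array `mse_per_time` and the variable names are hypothetical, and this is not the WeatherBench 2 implementation). It only shows that the order of the time mean and the square root changes the result.

```python
import numpy as np

# Synthetic per-initialization-time MSE values (hypothetical, not WeatherBench 2 data).
rng = np.random.default_rng(0)
mse_per_time = rng.uniform(200.0, 600.0, size=100)

# Time mean outside the square root: take an RMSE per time step, then average.
mean_of_rmse = np.sqrt(mse_per_time).mean()

# Time mean inside the square root: average the MSE over time, then take one square root.
rmse_of_mean = np.sqrt(mse_per_time.mean())

print("mean outside sqrt:", mean_of_rmse)
print("mean inside sqrt: ", rmse_of_mean)
```

By Jensen's inequality, the value with the mean inside the square root is never smaller than the value with the mean outside. That would be consistent with one source being consistently higher, although this thread does not confirm which source uses which convention.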
