RMSE scores on website differ from those on the corresponding Google Cloud dataset #184

Open
cindyl983 opened this issue Sep 20, 2024 · 1 comment

cindyl983 commented Sep 20, 2024

Hi WeatherBench 2 team,

I was taking a look at the RMSE scores of different weather models on your website: https://sites.research.google/weatherbench/deterministic-scores/.

However, these scores all differ from scores found in your Google Cloud bucket: https://console.cloud.google.com/storage/browser/weatherbench2/results/1440x721/deterministic;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

The scores on the website seem to be consistently higher across all parameters, with the difference increasing as the lead times increase. I have two examples in this spreadsheet containing the RMSE scores from the website and the Google Cloud for each lead time.
WeatherBench 2 RMSE Scores (1).xlsx

Example of retrieving RMSE scores

Model: FuXi
Dataset: ERA5
Variable: Geopotential
Metric: RMSE
Level: 500
Region: Global
Year: 2020
Resolution: 1440x721

Website

I hovered over each point on the plot to read off the score. For example, the RMSE score for lead_time=6 is 19.1883.

Google Cloud bucket dataset

The RMSE score for lead_time=6 is 19.17010272.

import gcsfs
import numpy as np
import xarray as xr
from google.colab import auth

# Authenticate the Colab session before creating the GCS filesystem.
auth.authenticate_user()
fs = gcsfs.GCSFileSystem()

path = 'gs://weatherbench2/results/1440x721/deterministic/fuxi_vs_era_2020_deterministic.nc'

# Load the data into memory while the file handle is still open.
with fs.open(path, 'rb') as f:
    ds = xr.open_dataset(f).load()

# RMSE of 500 hPa geopotential, global region, at a 6-hour lead time.
ds['geopotential'].sel(level=500, region='global', metric='rmse', lead_time=np.timedelta64(6, 'h')).values

# Output
array(19.17010272)

# The same selection for every lead time.
ds['geopotential'].sel(level=500, region='global', metric='rmse').values

# Output
array([ 19.17010272,  30.50602631,  33.31361289,  42.70132768,
        47.80672409,  58.73283364,  65.87870037,  77.75102942,
        87.01627604, 100.64913862, 112.15159811, 127.53932959,
       141.26029425, 158.69348736, 174.7353766 , 194.35554665,
       212.64561075, 234.24519231, 254.39069177, 277.60062767,
       294.03645833, 313.28461093, 331.8133458 , 352.49123041,
       371.59800125, 392.58279915, 411.67895299, 432.52519587,
       451.15322293, 471.2914886 , 489.00525285, 508.19880698,
       524.81361289, 542.78650285, 558.00587607, 574.56623932,
       588.26046118, 603.36204594, 615.70713141, 629.5929042 ,
       636.72930021, 646.43491809, 654.61672009, 664.60741631,
       672.32594373, 681.75587607, 688.69115028, 697.3772703 ,
       703.4286859 , 711.22240028, 716.29985755, 723.14387464,
       727.27711004, 733.34628739, 736.73744658, 742.14547721,
       744.96354167, 749.94586895, 752.37473291, 756.99795228])
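
For the single pair of values quoted above (FuXi, 500 hPa geopotential, 6-hour lead time), the size of the gap can be quantified as in the sketch below. The variable names are illustrative only, and the website value is the one read off the plot by hovering, so its precision is limited to what the tooltip shows.

```python
# Values quoted in this issue for lead_time = 6 h.
website_rmse = 19.1883       # read from the website plot
bucket_rmse = 19.17010272    # from fuxi_vs_era_2020_deterministic.nc

# Absolute and relative gap between the two sources.
abs_diff = website_rmse - bucket_rmse
rel_diff = abs_diff / bucket_rmse

print(f"absolute difference: {abs_diff:.5f}")
print(f"relative difference: {rel_diff:.4%}")
```

Repeating this for every lead time and for another metric would show whether the gap is specific to RMSE.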

raspstephan (Collaborator) commented

Hey, it is entirely possible that one of those corresponds to an earlier version. Do you see the same differences for other metrics? If not, the most likely cause is a change in how we average RMSE over time: we used to take the time mean outside the square root (averaging the per-time RMSE values), but now take it inside (averaging the squared errors over time before taking the square root) to match what ECMWF does. This leads to slightly different results.
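
To illustrate the two averaging orders described in this reply, here is a minimal sketch using synthetic numbers. It is not the WeatherBench 2 evaluation code: the array `mse_per_time`, its size, and its distribution are all assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spatially averaged MSE, one value per forecast initialization time.
mse_per_time = rng.gamma(shape=2.0, scale=200.0, size=732)

# Older convention: square root first, then the time mean
# (time mean "outside" the square root).
rmse_mean_outside = np.sqrt(mse_per_time).mean()

# Newer convention: time mean first, then the square root
# (time mean "inside" the square root).
rmse_mean_inside = np.sqrt(mse_per_time.mean())

print(rmse_mean_outside, rmse_mean_inside)
```

Because the square root is concave, the mean-inside value can never be smaller than the mean-outside value. That would be consistent with the website scores being the higher ones if the website uses the newer convention, although the thread does not confirm which source uses which.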
