
Comparison values (e.g. FCN) in the paper do not match up #1

Open

marvingabler opened this issue Apr 21, 2024 · 2 comments

@marvingabler

Hey folks, first congrats to your paper & hard work!

I am wondering where the comparison values in the paper (e.g. the FourCastNet RMSE) are coming from. They seem to be very different from what the authors describe in their papers and what we could experimentally verify last year:

  • you report an RMSE for FCN of 1.28K for t2m at 6h, while the authors describe roughly 0.74K (which I can verify is correct)
  • you report an RMSE for FCN of 1.68K for t2m at 24h, while the authors describe roughly 0.94K (which I can also verify)

These numbers would obviously change the conclusion of the paper. Before I dig deeper and also check the other variables and ClimaX, I wanted to reach out to see if I am missing something.

To compare your scores easily with other open AI weather models (you can choose target resolutions), I can highly recommend WeatherBench's web UI. A minimal sketch of regridding to a chosen target resolution is shown below.
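
As an illustration of comparing at a chosen target resolution, here is a minimal sketch of regridding a 0.25° field onto the 5.625° (32x64) grid before scoring. It assumes xarray and uses plain bilinear interpolation (WeatherBench itself uses conservative regridding, so this is only an approximation), and the file/variable names are placeholders:

```python
# Hypothetical sketch (not from the paper): regrid a 0.25° field onto the
# 5.625° (32x64) grid before scoring. WeatherBench uses conservative
# regridding; the bilinear interpolation below is only an approximation.
import numpy as np
import xarray as xr

def to_5p625(da: xr.DataArray) -> xr.DataArray:
    """Interpolate a (lat, lon) DataArray onto the 32x64 WeatherBench grid."""
    target_lat = np.linspace(-87.1875, 87.1875, 32)  # 5.625° cell centres
    target_lon = np.arange(0.0, 360.0, 5.625)        # 64 longitudes
    return da.interp(lat=target_lat, lon=target_lon)

# Usage (file/variable names are placeholders):
# t2m = xr.open_dataset("fcn_forecast_t2m.nc")["t2m"]
# t2m_coarse = to_5p625(t2m)
```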

One more comment:

  • GraphCast and Pangu are open & can be used for comparison (in contrast to your statement in the paper)
@yogeshverma1998
Collaborator

yogeshverma1998 commented Apr 22, 2024

Hi,

Our work doesn't use the full set of variables from the FourCastNet paper (for training the model), primarily due to academic computational constraints. FCN uses a resolution of 0.25°, whereas we use a resolution of 5.625°, leading to a 32x64 grid. Thus, a direct comparison is unfair.

We have described the variables and the resolution of the data we used in Appendix B. For a fair comparison, we re-ran FCN and ClimaX (without pretraining) with the same hyper-parameters (provided in their official repos), on those variables at 32x64 resolution. This also means the scores are not directly comparable to the WeatherBench web UI.
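
For reference, a minimal sketch of a WeatherBench-style latitude-weighted RMSE on a 32x64 (5.625°) grid; whether the scores above use exactly this weighting is an assumption here, and the array names are placeholders:

```python
# Sketch of a WeatherBench-style latitude-weighted RMSE on a 32x64 (5.625°)
# grid. Whether the reported numbers use exactly this weighting is an
# assumption; array names are placeholders.
import numpy as np

def lat_weighted_rmse(forecast: np.ndarray, truth: np.ndarray,
                      lat: np.ndarray) -> float:
    """forecast/truth: (..., n_lat, n_lon); lat: (n_lat,) in degrees."""
    weights = np.cos(np.deg2rad(lat))
    weights = weights / weights.mean()   # normalise so weights average to 1
    sq_err = (forecast - truth) ** 2
    # Weight each latitude band, then average over all points (and lead times).
    return float(np.sqrt((sq_err * weights[:, None]).mean()))

# Example grid: lat = np.linspace(-87.1875, 87.1875, 32), lon has 64 points.
```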

We acknowledge that we use a small number of variables in the spatial domain, primarily due to academic computational constraints. Our main goal was to demonstrate that "proper" continuous-time approaches can be viable for weather. We are still working on expanding the method to incorporate more variables and on comparing the scores directly with the WeatherBench web UI.

> GraphCast and Pangu are open & can be used for comparison (in contrast to your statement in the paper)

I think the training code for Pangu is still not released (as stated here: 198808xc/Pangu-Weather#58), which makes it impossible to have a fair comparison. The initial release of GraphCast was, I believe, made after the ICLR submission deadline (and was later updated with some usage instructions, etc.). We are still working towards a fair one-to-one comparison with GraphCast by adapting their code. However, it is a bit challenging due to the lack of proper documentation regarding training, pre-processing, inference, etc.

@marvingabler
Author

Hey @yogeshverma1998, thanks for the prompt response & detailed explanation, makes total sense to me!

I think your approach is quite unique & I want to support further evaluation. I can offer you a few nodes of H100s (8 H100s each) for your research experiments; drop me a mail at [email protected] if that would help!
