Reproducing NeurIPS paper Fig. 6 results: predictions have no longitudinal dimension. #12

Open · blutjens opened this issue Jun 19, 2024 · 7 comments
Labels: bug, emulator

@blutjens (Contributor)

What is the expected shape of the preds variable in metrics.py -> LLweighted_RMSE_Climax()? Following the instructions in README_emulator.md, I get a shape that is missing the longitude dimension.

In more detail, I have been following the instructions to reproduce the values in Fig. 6 of the camera-ready NeurIPS paper, but am running into some issues. I am running the suggested command from the README:

python emulator/run.py experiment=single_emulator/unet/NorESM2-LM_unet_tas+pr_run-01.yaml logger=none seed=3423

After training for 50 epochs, trainer.test() returns values similar to Fig. 6 of the paper: test/ssp245_NorESM2-LM/tas/llrmse_climax = 0.3184917569160461. However, the variable preds in LLweighted_RMSE_Climax (which I believe is the metric used in Fig. 6) has shape (4, 12, 2, 96), which is missing the longitude dimension. The header in metrics.py says the shape should be (batch_size / N, time, lon, lat), which would be (4, 12, 144, 96).

P.S.:
During debugging I tried setting datamodule.channels_last = True, which gives me a shape with lat and lon swapped: (4, 12, 96, 144). But with that I get an llrmse_climax of ~0.024, which is significantly lower than the results in Fig. 6. Also, the latitude weights assume that latitude is the last dimension:
LLweighted_RMSE_Climax() -> lat_size = y.shape[-1]
I also tried editing split_vector_by_variable() with no success.
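
To illustrate why the metric still runs despite the missing dimension, here is a sketch (my own code, not the repo's):

```python
import torch

# Illustrative only: LLweighted_RMSE_Climax reads lat_size = y.shape[-1],
# i.e. it assumes latitude is the last axis. With the observed shape
# (4, 12, 2, 96), the last axis happens to be 96 (= n_lat), so the metric
# runs without a shape error even though the longitude axis is missing.
preds = torch.randn(4, 12, 2, 96)  # observed shape: (N, time, ?, lat)
lat_size = preds.shape[-1]         # 96 -- looks plausible, no error raised
```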

I would greatly appreciate any pointers to resolve this issue. Thank you!

@jirvin16

I think this issue is caused by the slicing used here, which ends up slicing the wrong dimension. It can be fixed by using something like vector[:, :, var_channel_limits["start"] : var_channel_limits["end"]] instead, to ensure the correct dimension (the variable dimension) is sliced. To be clear, the output of the data loader is (B, T, V, lat, lon), so the current code slices the longitude dimension instead of the variable dimension as intended, which is why we're not seeing any longitude dimension (I believe it's squeezed here).
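For concreteness, a minimal sketch of what the fixed helper could look like (the function and key names follow the snippet above; treat the rest as illustrative, not the repo's exact code):

```python
def split_vector_by_variable(vector, var_channel_limits):
    """Split a (B, T, V, lat, lon) tensor into per-variable tensors.

    var_channel_limits: dict mapping variable name -> {"start": int, "end": int}
    indices into the variable axis (axis 2).
    """
    split = {}
    for var_name, limits in var_channel_limits.items():
        # Slice axis 2 (variables) explicitly, not the trailing spatial axes.
        split[var_name] = vector[:, :, limits["start"] : limits["end"]]
    return split
```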

It seems like there is at least one other issue as well. The latitude and longitude dimensions seem to be swapped – AFAICT there should be 144 longitudes and 96 latitudes, but the code specifies the opposite, e.g. here and here.

Kudos to @aharbi for spotting these and coming up with the above fix.

@liellnima (Collaborator)

Hi Björn and Jeremy,

Thank you so much for pointing this out. There is indeed a slicing issue, and I can confirm what Jeremy has suggested. I remember we had issues with the different weighted RMSE metrics all over the place, and I am now wondering if they were due to this slicing issue and the longitude-latitude swap.

I'd love for Charlie to fix this since she knows the code best; however, she is off the project and away for the next few months. Feel free to make a PR for this! I am trying to find someone who can address it soon, and if not, I will take care of it in August/September.

Sorry for being slow on all of this! And thanks to @aharbi from my side too :)

@hirasaleem0703

After fixing the slicing issue, the model does not seem to produce the same results as reported in the paper. This is especially the case for precipitation predictions.

@hassony2

Thank you for looking into this. @hirasaleem0703, as I am also interested in Fig. 6, I would be very interested in knowing how large the discrepancy you are observing is.

All the best,

Yana

@liellnima (Collaborator) commented Sep 3, 2024

Dear everyone,

Sorry for my late response; I am finally back from my internship and travels.

I just discovered that the swapped longitude-latitude issue actually lies deeper: in our configs and constants, the numbers for longitude and latitude are swapped. It should be lon -> 144 and lat -> 96.

(@jirvin16 thanks - you already pointed out the inconsistencies!).

This affects configs/datamodule/climate.yaml, configs/datamodule/climate_super.yaml, and src/data/constants.py.
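In constants.py, the fix amounts to something like the following sketch (the identifiers are illustrative, not necessarily the repo's exact names), with the analogous swap of the lon/lat values in the two yaml configs:

```python
# src/data/constants.py (sketch) -- corrected grid sizes
LON = 144  # number of longitude points (was swapped with LAT)
LAT = 96   # number of latitude points
```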

I still have to understand how this impacts everything else downstream, but that is definitely (one of) the root problems here. I will let you know when it's fixed.

I will rerun the experiments after fixing all the metric and loss issues and making sure everything is doing the right thing.

@liellnima (Collaborator)

Side note, to prevent anyone from relying on this: the expected behavior of channels_last is to change the original shape (batch_size, time_sequence, vars, lon, lat) to a "variables-come-last" shape (batch_size, time_sequence, lon, lat, vars). This means that using channels_last will not fix the issue.
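
In code terms, channels_last only moves the variable axis to the end; it never touches the spatial axes, so it cannot repair a lon/lat swap. A sketch (sizes illustrative):

```python
import torch

B, T, V, LON, LAT = 4, 12, 2, 144, 96       # sizes illustrative
x = torch.randn(B, T, V, LON, LAT)          # (batch, time, vars, lon, lat)
x_channels_last = x.permute(0, 1, 3, 4, 2)  # (batch, time, lon, lat, vars)
```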

@liellnima (Collaborator)

Update:

  • longitude / latitude: The swap has a lot of downstream effects. I have rewritten the relevant parts. From now on, we follow the convention "first latitude, last longitude" to keep it consistent with other repos.
  • RMSE loss: I have rewritten and fixed our loss function. Besides the shape issues, the weighting was wrong as well.
  • channels_last: There have also been some inconsistencies in how "channels_last" was used. I addressed this; from now on it does not matter whether you set it to True or False, you should get the same results.

For channels_last = False, you should now get the following shape:
[batch, time, variables, latitude (96), longitude (144)]
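
If you need to adapt your own evaluation code in the meantime, here is a minimal sketch of a cos(latitude)-weighted RMSE over that layout (my own code; the weighting and normalization are illustrative and may not match the fixed metric exactly):

```python
import torch

def lat_weighted_rmse(preds, target, lats_deg):
    """RMSE with cos(latitude) area weighting.

    preds, target: (batch, time, variables, lat, lon)
    lats_deg:      1-D tensor of latitudes in degrees, length = n_lat (96)
    """
    weights = torch.cos(torch.deg2rad(lats_deg))
    weights = weights / weights.mean()   # normalize to a mean weight of 1
    sq_err = (preds - target) ** 2       # (..., lat, lon)
    sq_err = sq_err * weights[:, None]   # broadcast over the latitude axis
    return torch.sqrt(sq_err.mean())

# Example on the NorESM2-LM grid: 96 latitudes, 144 longitudes.
preds = torch.randn(4, 12, 2, 96, 144)
lats = torch.linspace(-90, 90, 96)
print(lat_weighted_rmse(preds, torch.zeros_like(preds), lats))
```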

At this point, I expect our results to change quite a bit. Rerunning the experiments will take a bit longer. Once I have fixed the RMSE metric and made sure it works as expected, I will push the changes to the repo, because I want to make sure you have access to the updated code as soon as possible.
