verification brainstorming #6

Open
konradmayer opened this issue Jun 6, 2023 · 20 comments

@konradmayer
Collaborator

Issue for brainstorming and collecting material.

konradmayer added this to the verification milestone Jun 6, 2023
@konradmayer
Collaborator Author

mlr3 Dictionary of Performance Measures

https://mlr3.mlr-org.com/reference/mlr_measures.html

@r3xth0r
Collaborator

r3xth0r commented Jul 25, 2023

A hint at spatiotemporal resampling: https://mlr3spatiotempcv.mlr-org.com/

@seblehner
Collaborator

As noted in https://ml4physicalsciences.github.io/2019/files/NeurIPS_ML4PS_2019_75.pdf and in other literature, power spectral density (PSD) may be better suited than MSE or PSNR (peak signal-to-noise ratio) for evaluating high-resolution features.

@konradmayer
Collaborator Author

The verification script of the Spanish team can be found at: https://github.com/ECMWFCode4Earth/DeepR/tree/main/deepr/validation/netcdf

It calculates well-established skill scores for individual coordinates:

https://github.com/ECMWFCode4Earth/DeepR/blob/main/deepr/validation/netcdf/metrics.py#L50-L56
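To illustrate what per-coordinate skill scores can look like, here is a minimal sketch in Python/NumPy. It is not the DeepR code: the function name `gridpoint_scores`, the `(time, y, x)` array layout and the choice of bias, RMSE and Pearson correlation are assumptions made for illustration only.

```python
import numpy as np

def gridpoint_scores(pred, obs):
    """Bias, RMSE and Pearson correlation per grid point, taken over time.

    pred, obs: arrays of shape (time, y, x) (assumed layout).
    """
    err = pred - obs
    bias = err.mean(axis=0)
    rmse = np.sqrt((err ** 2).mean(axis=0))
    # anomalies relative to each grid point's own time mean
    pa = pred - pred.mean(axis=0)
    oa = obs - obs.mean(axis=0)
    denom = np.sqrt((pa ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    corr = (pa * oa).sum(axis=0) / denom
    return {"bias": bias, "rmse": rmse, "corr": corr}

# synthetic example: the "prediction" is the reference plus a constant offset
rng = np.random.default_rng(0)
obs = rng.normal(size=(50, 4, 5))
pred = obs + 0.5
scores = gridpoint_scores(pred, obs)
```

Each entry of `scores` is a 2D map with one value per grid point, so it can be plotted directly or summarized over the domain.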

@konradmayer
Collaborator Author

I did some testing on radially averaged PSD using the R package {radialpsd}.

For lead time 12, with a radially averaged 2D Fourier transform per timestep (912 in total), the first plots look as follows (the line is the mean over all timesteps, the shaded area is the min-max range):

image
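For readers who want to reproduce the idea outside R, the following is a rough Python/NumPy sketch of a radially averaged power spectrum. It is not the {radialpsd} implementation: the function name `radial_psd`, the integer-ring binning and the absence of any scaling or normalization are assumptions, and it presumes square pixels.

```python
import numpy as np

def radial_psd(field):
    """Radially averaged power spectrum of a 2D field of shape (ny, nx)."""
    ny, nx = field.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    # wavenumbers in cycles per domain, centred on zero
    ky = np.fft.fftshift(np.fft.fftfreq(ny)) * ny
    kx = np.fft.fftshift(np.fft.fftfreq(nx)) * nx
    k = np.hypot(*np.meshgrid(ky, kx, indexing="ij"))
    ring = np.rint(k).astype(int).ravel()
    # mean power inside each integer-radius ring
    sums = np.bincount(ring, weights=power.ravel())
    counts = np.maximum(np.bincount(ring), 1)
    k_max = min(ny, nx) // 2  # keep only rings that fit inside the domain
    return np.arange(k_max + 1), (sums / counts)[: k_max + 1]

# one spectrum per timestep, then mean and min-max band as in the plot
rng = np.random.default_rng(0)
stack = rng.normal(size=(10, 32, 32))  # synthetic (time, y, x) data
spectra = np.array([radial_psd(f)[1] for f in stack])
mean_psd = spectra.mean(axis=0)
band_lo, band_hi = spectra.min(axis=0), spectra.max(axis=0)
```

The wavenumber axis here is in cycles per domain width, which differs from the convention used in the plots of this thread.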

@konradmayer
Collaborator Author

konradmayer commented Aug 28, 2023

  • Is an exploratory approach sufficient here, or do we need to derive a skill score?
  • How should the different lead times be compared (facets?)?
  • Would a 3D FT be better suited (not supported by the package)?
  • Are scaling and normalization necessary?
  • How should non-square domains be treated (truncating or padding?)? Is windowing (i.e. padding) a good idea in any case to avoid artifacts in the high-frequency range?
  • Does this need to be done on projected data (otherwise wavenumber/distance would not be valid)?
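On the non-square domain and windowing questions, one possible preprocessing step can be sketched as follows in Python/NumPy. Tapering and zero-padding are treated here as two separate steps. The function name `window_and_pad`, the choice of a Hann taper and the mean removal are assumptions for illustration, not something tested in this thread.

```python
import numpy as np

def window_and_pad(field):
    """Remove the mean, apply a 2D Hann taper, then zero-pad to a square."""
    ny, nx = field.shape
    # separable 2D taper that falls to zero at the domain edges
    taper = np.outer(np.hanning(ny), np.hanning(nx))
    tapered = (field - field.mean()) * taper
    # centre the tapered field inside a square array of zeros
    n = max(ny, nx)
    out = np.zeros((n, n))
    y0, x0 = (n - ny) // 2, (n - nx) // 2
    out[y0:y0 + ny, x0:x0 + nx] = tapered
    return out

rng = np.random.default_rng(0)
padded = window_and_pad(rng.normal(size=(20, 30)))  # synthetic non-square field
```

Because the taper reaches zero at the edges, the padded zeros join the data without a jump, which is the usual motivation for tapering before padding. The taper also reduces total power, so spectra computed this way are only comparable with each other if all fields get the same treatment.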

@mdaber
Collaborator

mdaber commented Aug 28, 2023

I am not familiar with the score, but I guess a skill score is mainly needed when the differences between the methods are small; if they are big enough, I don't think it is necessary. The same goes for scaling and normalizing. Regarding the lead times: maybe we can have a summarizing score for the lead times, show a lead-time-wise graphic, and use the power spectrum at just one or two lead times (night or day). Just for my understanding: does PSD penalize a bias?
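On the bias question, a small numerical check in Python/NumPy may help. For an unnormalized spectrum, a constant additive offset only changes the zero-wavenumber (mean) component of the discrete Fourier transform, so all other wavenumbers are blind to it. The random test field below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
field = rng.normal(size=(16, 16))
biased = field + 2.0  # constant additive bias

p0 = np.abs(np.fft.fft2(field)) ** 2
p1 = np.abs(np.fft.fft2(biased)) ** 2

diff = np.abs(p1 - p0)
dc_changed = bool(diff[0, 0] > 1e-6)  # zero-wavenumber power differs
diff[0, 0] = 0.0
others_unchanged = bool(np.allclose(diff, 0.0, atol=1e-8))  # the rest does not
```

A multiplicative bias behaves differently: scaling the field by a factor `a` scales the power at every wavenumber by `a**2`.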

@konradmayer
Collaborator Author

konradmayer commented Aug 28, 2023

It's probably also better not to use a logarithmic wavenumber axis, for easier interpretation:

image

@konradmayer
Collaborator Author

For comparison, here is the plot with a logarithmic x-axis:

image

(the last two plots were without scaling and normalization, the one in #6 (comment) was with both)

@konradmayer
Collaborator Author

@r3xth0r, do you think that comparing variograms between CERRA and the models would be a useful addition/alternative to PSD?

konradmayer pushed a commit that referenced this issue Aug 28, 2023
@konradmayer
Collaborator Author

konradmayer commented Aug 28, 2023

Pushed my first tests on PSD with 011f663 to its own experimental branch. Any suggestions and ideas are very welcome.

@konradmayer
Collaborator Author

Just did some tests with variography. I aggregated the time steps (to seasons), as it's computationally much more expensive than PSD. Here's a test output for lead_time 12.

image

This is for testing only, but here we would interpret that spatial variability is underestimated by the downscaled data (samos in this case) in summer, but overestimated in the other seasons, compared to CERRA. In general, the shape of the variograms is more or less reproduced.

I think this analysis (same for PSD) is generally only valid for projected coordinates, as otherwise distance is not uniform across space, and neither this plot nor the plots above use projected coordinates! (added this point to the list in #6 (comment); @r3xth0r, any thoughts on this?)

Here the unit of x is degrees; in the PSD plots above, wavenumber is diagonal_domain_size (in px)^-1.
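To make the point about non-uniform distance concrete, here is a small Python sketch using the haversine formula on a spherical Earth. The function name `haversine_km`, the radius of 6371 km and the example latitudes are assumptions for illustration only.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# one degree of longitude at two latitudes, and one degree of latitude
d_lon_40 = haversine_km(40, 10, 40, 11)
d_lon_60 = haversine_km(60, 10, 60, 11)
d_lat = haversine_km(40, 10, 41, 10)
```

One degree of longitude at 60°N spans roughly half the distance of one degree of latitude, so a lag or wavenumber expressed in degrees mixes different physical distances across a continental domain.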

@konradmayer
Collaborator Author

However, aren't we especially interested in distances smaller than the ERA5 pixel size (which is 0.25)? These are not covered at all by the above variograms (first bin at 0.54). Is it even reasonable to derive a variogram with bins small enough to learn something about the short-distance variability we are mainly interested in?
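On a regular grid, the smallest lag that can be resolved is one grid spacing. The following Python/NumPy sketch computes an empirical semivariogram at integer grid lags along one axis. It is a simplified directional estimator, not the binned variogram used for the plot above; the function name `semivariogram_x` and the restriction to the x direction are assumptions for illustration.

```python
import numpy as np

def semivariogram_x(field, max_lag):
    """Empirical semivariance along x for lags of 1..max_lag grid cells.

    gamma(h) = 0.5 * mean((z(x + h) - z(x))^2)
    """
    lags = np.arange(1, max_lag + 1)
    gamma = np.array([
        0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2) for h in lags
    ])
    return lags, gamma

rng = np.random.default_rng(0)
demo = rng.normal(size=(40, 60)).cumsum(axis=1)  # synthetic correlated field
lags, gamma = semivariogram_x(demo, max_lag=10)
```

Because it uses only array shifts, this is cheap enough to run per timestep. Running the same function on the transposed field gives the y direction, which is one simple way to look at anisotropy.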

@r3xth0r
Collaborator

r3xth0r commented Aug 29, 2023

(1) CRS: you are right. The effect of using geographic instead of projected coordinates might be negligible for small AOIs, but could be considerable on a continental scale.

(2) I am somewhat unsure about the added value of using variograms here. We would probably need to consider anisotropy to some extent, but this might not be straightforward, as it does not occur consistently across the whole area. It's probably sufficient to stick to PSD.

(3) A 3D FT would probably be better suited indeed, but I doubt that the additional effort of a manual implementation is really worth it.

@konradmayer
Collaborator Author

Here is the PSD as in #6 (comment), but stratified by season:

image

@konradmayer
Collaborator Author

konradmayer commented Aug 31, 2023

Inspired by yesterday's meeting (thanks, @mc4117), I added the PSD for (bilinearly interpolated) ERA5 to this analysis. This is what it looks like for individual timesteps:

image

@konradmayer
Collaborator Author

We can clearly see that the power spectrum of the downscaled field (samos) closely follows that of the CERRA data, while ERA5 shows bigger differences.

@konradmayer
Collaborator Author

konradmayer commented Aug 31, 2023

This is also the case when averaging over all timesteps:

image

To get an idea of the variation of these power spectra, here are the median, IQR and 0.05-0.95 range instead of the mean:

image
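The summary statistics in this plot can be sketched in Python/NumPy as follows, assuming the per-timestep spectra are stacked in an array of shape `(timestep, wavenumber)`. The function name `spectrum_spread` and the synthetic input are hypothetical.

```python
import numpy as np

def spectrum_spread(spectra):
    """Median, IQR and 0.05-0.95 range per wavenumber.

    spectra: array of shape (timestep, wavenumber).
    """
    q05, q25, q50, q75, q95 = np.quantile(
        spectra, [0.05, 0.25, 0.5, 0.75, 0.95], axis=0
    )
    return {"median": q50, "iqr": (q25, q75), "range_05_95": (q05, q95)}

rng = np.random.default_rng(0)
demo_spectra = rng.lognormal(size=(912, 17))  # synthetic stand-in data
spread = spectrum_spread(demo_spectra)
```

Quantiles are less sensitive to single extreme timesteps than the min-max band, which is why they can give a clearer picture of the typical variation.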

Alternatively, mean by season:

image

All plots above are for lead time 12.

konradmayer pushed a commit that referenced this issue Aug 31, 2023
@mc4117

mc4117 commented Aug 31, 2023

Thanks! It's great to see the comparison
