How do we merge clusters over different training splits #101
Thanks for raising this issue. In my opinion, this task really needs expert knowledge, because in some cases the areas might not overlap (due to p-value masks) but can still belong to the same group (e.g. areas in the Pacific that form the same SST horseshoe pattern but are separated by the p-value mask). Fortunately, since you normally won't have hundreds of training splits, it is reasonable to let users apply their expert knowledge and adjust some labels manually. So I think what we are doing here is providing a basic landscape that gives plausible results and doesn't require much correction from users who disagree with it. It cannot be perfect, since we rely heavily on the outcome of RGDR, but the results should make sense in as many cases as possible. We can design an algorithm that at least labels the easy cases correctly for the user, using the area comparison method suggested by @Peter9192. This can be a utility function that takes the clustered maps (e.g. a list of maps) as input.
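A minimal sketch of such an area comparison utility, assuming each split's clustered map is a 2D integer numpy array where 0 means "no cluster" and any nonzero value is a cluster label. The function names `align_to_reference` and `align_splits` are hypothetical, not part of the s2spy or RGDR API. Every cluster in a split takes the label of the reference-split cluster it overlaps most by area; clusters without any overlap get a fresh label.

```python
import numpy as np


def align_to_reference(reference, other, area_weights=None):
    """Relabel the clusters in `other` so they match the labels in `reference`.

    Hypothetical helper. Each cluster in `other` gets the label of the
    reference cluster with which it shares the largest (weighted) area.
    Clusters that overlap no reference cluster get a new, unused label.
    0 means "not part of any cluster".
    """
    reference = np.asarray(reference)
    other = np.asarray(other)
    if area_weights is None:
        area_weights = np.ones(reference.shape)
    # Allow e.g. cos(lat) weights of shape (n_lat, 1) to broadcast to the grid
    area_weights = np.broadcast_to(area_weights, reference.shape)

    aligned = np.zeros_like(other)
    ref_labels = [lab for lab in np.unique(reference) if lab != 0]
    next_label = max([abs(lab) for lab in ref_labels], default=0) + 1

    for lab in np.unique(other):
        if lab == 0:
            continue
        mask = other == lab
        # Weighted overlap area with every reference cluster
        overlaps = [area_weights[mask & (reference == r)].sum() for r in ref_labels]
        if overlaps and max(overlaps) > 0:
            aligned[mask] = ref_labels[int(np.argmax(overlaps))]
        else:
            aligned[mask] = next_label
            next_label += 1
    return aligned


def align_splits(label_maps, area_weights=None):
    """Align every split's clusters to those of the first split."""
    reference = np.asarray(label_maps[0])
    return [reference.copy()] + [
        align_to_reference(reference, m, area_weights) for m in label_maps[1:]
    ]
```

Note that two clusters in one split can map onto the same reference label. That is arguably what we want for areas that belong together but were cut apart by the p-value mask; the user can still adjust labels manually afterwards.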
About the algorithm, here are my thoughts:
Thanks for opening the issue and describing it so clearly. Looking at this example, I agree that it is not trivial to "align" the clusters, as there doesn't seem to be an obvious alignment even by eye. As you say, it may be different for other use cases/examples, but if we want to come up with something general, perhaps we should take a step back first. Instead of a function like
Only once we are able to judge whether the clusters are robust can we start thinking about 'merging' or 'aligning' them.
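One possible robustness diagnostic, sketched under the assumption that each split's clustered map is a 2D numpy array with 0 for "no cluster": compare *where* significant clusters were found in each pair of splits, ignoring the labels altogether, using the Jaccard index (intersection over union of the clustered gridcells). The name `jaccard_matrix` is hypothetical. Low off-diagonal values suggest the clusters are not robust across splits, in which case aligning labels may not be meaningful yet.

```python
import numpy as np


def jaccard_matrix(label_maps):
    """Pairwise Jaccard similarity of clustered gridcells between splits.

    Entry (i, j) is |A_i & A_j| / |A_i | A_j|, where A_i is the set of
    gridcells that belong to any cluster in split i. Two empty maps are
    treated as identical (similarity 1).
    """
    masks = [np.asarray(m) != 0 for m in label_maps]
    n = len(masks)
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = np.logical_or(masks[i], masks[j]).sum()
            inter = np.logical_and(masks[i], masks[j]).sum()
            sim[i, j] = sim[j, i] = inter / union if union else 1.0
    return sim
```

Because it is label-agnostic, this check can run before any alignment step.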
Thanks for the comments on this issue, @Peter9192 and @geek-yang. I like Peter's suggestion to run some diagnostics over the clusters. It would be very cool if in the end we could have a final map showing clusters with shaded colours over gridcells (the darker, the more robust), so you can see that some significantly correlating gridcells were found in some splits but not in others. This even sparks the idea of, in the end, using the time series of all regions, weighted by how many times each region is found across the splits. For now I'll continue with a simple method that compares regions, like the
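Both ideas can be sketched in a few lines, assuming each split's map is a 2D numpy array (0 = no cluster) and, for the weighting, that the region labels have already been aligned across splits by some method. All function names are hypothetical. `detection_frequency` gives the value for the shaded map (1 = found in every split), and the weights are the fraction of splits in which each region label occurs.

```python
import numpy as np


def detection_frequency(label_maps):
    """Fraction of splits in which each gridcell belongs to any cluster.

    Values near 1 mark robust gridcells (darker shading on the final map),
    values near 0 mark gridcells that were significant in only a few splits.
    """
    stacked = np.stack([np.asarray(m) != 0 for m in label_maps])
    return stacked.mean(axis=0)


def region_weights(aligned_maps):
    """Fraction of splits in which each (already aligned) region label occurs."""
    n_splits = len(aligned_maps)
    labels = np.unique(np.concatenate([np.ravel(m) for m in aligned_maps]))
    return {
        int(lab): sum(bool(np.any(np.asarray(m) == lab)) for m in aligned_maps) / n_splits
        for lab in labels
        if lab != 0
    }


def weight_region_series(series_by_label, weights):
    """Scale each region's time series by its detection weight."""
    return {
        lab: weights[lab] * np.asarray(ts, dtype=float)
        for lab, ts in series_by_label.items()
    }
```

Scaling each region's series (rather than averaging them into one) keeps all regions available as separate predictors; whether that is the best use of the weights is still open.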
One of the main reasons we couldn't use s2spy during the Lorenz workshop was that the labels of the precursor regions weren't aligned over the training splits.
When using RGDR to identify precursor regions of interest, we use the following procedure:
Up until now, we haven't thought about a way to align the areas across training splits. The example below shows that it is not trivial to match these areas. Here I have used 4 splits over the data in /tests to look at the clustered regions across the splits. I have adapted the plotting function in rgdr slightly to get the same colorbar for every figure.
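Getting the same colorbar on every figure mostly comes down to using one fixed colour range for all splits. A minimal sketch, assuming the clustered maps are numpy arrays where masked gridcells may be NaN; `shared_label_range` is a hypothetical helper, not the adapted rgdr plotting function itself.

```python
import numpy as np


def shared_label_range(label_maps):
    """Return (vmin, vmax) covering the cluster labels of every split.

    NaN gridcells (e.g. masked by the p-value threshold) are ignored.
    """
    values = np.concatenate(
        [np.ravel(np.asarray(m, dtype=float)) for m in label_maps]
    )
    values = values[np.isfinite(values)]
    return float(values.min()), float(values.max())
```

The resulting `vmin`/`vmax` can then be passed to each plotting call, for example matplotlib's `ax.pcolormesh(..., vmin=vmin, vmax=vmax)` or xarray's `DataArray.plot(vmin=vmin, vmax=vmax)`, so that a given label maps to the same colour in every figure.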
Note:
We could come up with some algorithm that mimics what we would identify as one cluster by eye. There are some things to consider:
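One way to mimic the "by eye" judgement is to match clusters by the distance between their centroids instead of by overlap, so that nearby clusters still match even when they don't share gridcells. This is a swapped-in heuristic, not something decided in this issue. The sketch assumes 2D label maps (0 = no cluster) on a regular grid with 1D `lat`/`lon` axes in degrees; the names and the `max_distance_km` threshold are hypothetical. Longitudes are averaged naively, so for the Pacific a 0 to 360 longitude convention avoids wrap-around errors at the dateline.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def centroids(labels, lat, lon):
    """Mean (lat, lon) of the gridcells in each cluster (naive longitude mean)."""
    labels = np.asarray(labels)
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    return {
        int(lab): (lat2d[labels == lab].mean(), lon2d[labels == lab].mean())
        for lab in np.unique(labels)
        if lab != 0
    }


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.deg2rad, (*p, *q))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def match_by_centroid(reference, other, lat, lon, max_distance_km=1500.0):
    """Map each cluster label in `other` to the nearest reference cluster label,
    or to None if no reference centroid lies within `max_distance_km`."""
    ref_c = centroids(reference, lat, lon)
    mapping = {}
    for lab, c in centroids(other, lat, lon).items():
        dists = {r: haversine_km(c, rc) for r, rc in ref_c.items()}
        nearest = min(dists, key=dists.get, default=None)
        if nearest is not None and dists[nearest] <= max_distance_km:
            mapping[lab] = nearest
        else:
            mapping[lab] = None
    return mapping
```

Clusters mapped to `None` are the cases a user would need to inspect by hand.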