ec-46 compatibility #167

pimmeerdink · 2023-05-30T17:34:56Z

Hey guys! First of all, I'm going to be working with, and on, the s2spy software in the coming time. I think we may have met once at some presentations about the work going on at the IVM, where I was presenting work for my Master's thesis in AI. Either way, I have an issue I could use your help with.

In a nutshell: we need to be able to use the preprocess functionality not just with historical data but also with modelled data. In the case of EC-46, this means that we don't only have a time, latitude and longitude dimension, but also a /step/ dimension. This is the number of steps into the future that a data point represents: on every day, for every grid point, a simulation is run 30 days into the future, so every day has 30 prediction fields associated with it. When it comes to deseasonalizing, this means that we want to group not only by the day of the year, but also by the step into the future: we calculate a normalization coefficient for every (day of year, step) combination.

Sounds easy, it's just a groupby and merge over an extra axis. It gets hairy when you realize, as I did, that xarray does not support groupby operations on multiple columns. I have done some dirty coding on the ec-46 compatibility branch in s2spy/preprocess to get it working anyway, but it's crazy inefficient and, let's face it, ugly. Wondering if you guys can help: I think the code could potentially use a little restructuring/rethinking given the new use case. I have attached a netcdf file of the format that should be pre-processable by the functions (zipped because I had to...)

Hope you guys can help! Thanks in advance.

input_data.netcdf.zip

BSchilperoort · 2023-05-31T07:05:51Z

Hi Pim, we did meet at that presentation, but Yang was not there. However, if you'll be working on s2spy, I am sure that we will meet (again) some time. It would be good to align our thoughts and ideas in person.

Looking at the file and your description, having the dimension named "time" can be a bit misleading, right? As it is the time that the forecast was made, not the actual time the forecast represents. I think rearranging that would be a good first step.
Possibly with time representing the actual date of the forecast temperature, and instead of step a lead_time (or similar name) representing the number of days ahead the forecast was made.
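A minimal sketch of that rearrangement, on synthetic data rather than the attached file. The names init_time, lead_time, valid_time and t2m are placeholders chosen for illustration, not an agreed s2spy format:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an EC-46-like field: "time" is the date the forecast
# was started and "step" is how far ahead it looks. Sizes and names are assumptions.
init_times = pd.date_range("2020-01-01", periods=4, freq="D")
steps = pd.to_timedelta(np.arange(1, 4), unit="D")
forecast = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 3)),
    dims=("time", "step"),
    coords={"time": init_times, "step": steps},
    name="t2m",
)

# Rename the dimensions so their meaning is explicit.
forecast = forecast.rename({"time": "init_time", "step": "lead_time"})

# The date each value is valid for is the start date plus the lead time.
# The sum broadcasts to a 2-D (init_time, lead_time) coordinate.
valid_time = forecast["init_time"] + forecast["lead_time"]
forecast = forecast.assign_coords(valid_time=valid_time)
```

Here valid_time stays a two-dimensional coordinate. Turning it into the main time dimension would need a further reshaping step, which this sketch leaves out.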

This issue that @semvijverberg made in Lilio had data structured more like that: AI4S2S/lilio#54

Quoting @pimmeerdink: "xarray does not support groupby operations on multiple columns."

If you stack your dimensions, you should be able to use groupby on the stacked dimension. This stacked dimension will have a MultiIndex as coordinates.

In the end, it would be nice to be able to support EC-46 and similar forecast/ensemble data in a flexible way, without writing too much custom code in the processing functions.
One way to do this is to have a converter function that takes a certain dataset (e.g. EC-46) and converts it to a format compatible with s2spy. Then we can rely on a certain data format/structure in the rest of the code.

geek-yang · 2023-05-31T07:23:47Z

@pimmeerdink Thanks for asking. I agree with @BSchilperoort: you can simply deseasonalize your data with s2spy using the stack and unstack trick, which is quite quick and straightforward.

Indeed we are considering supporting ensemble forecasts in a nicer way. In the meantime @semvijverberg and @jannesvaningen are experimenting with lilio and s2spy, using EC-46 data. Once we figure out a nice way and implement a new feature, we will let you know.

pimmeerdink · 2023-05-31T09:14:26Z

Hey guys, thanks for the responses! I'm actually the one experimenting with the ec-46 data at the moment instead of jannes and sem, and of course I agree we would like to write as little custom code as possible, and agreeing upon a data format for data like this would be the way to go. For now, I'd like to get it working in a somewhat practical way, which would help with the future development. However, we know that deseasonalizing for data of this format (so where a double groupby is necessary) will need to be supported.

I agree that stack would seem like the logical choice. However, there's a small problem with simply stacking the dimensions: we are unable to access the dayofyear attribute of the time dimension:

data.stack(doy__step=["time.dt.dayofyear", "step"])
*** KeyError: 'time.dt.dayofyear'

While we can access it directly by writing "data.time.dt.dayofyear", this does not seem to work inside a stack call. Intuitively, you would then calculate it separately, assign it as a new coordinate and stack with that. However, in that case the dayofyear (doy) that we calculate and assign is a coordinate, not a dimension, and stacking is only possible for dimensions. So basically: I can't figure out how to make the stack, and thus the groupby operation, work. If one of you could help, that would be great!

BSchilperoort · 2023-05-31T10:54:32Z

Hi Pim, try this:

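import xarray as xr
ds = xr.open_dataset("/home/bart/Downloads/input_data.netcdf")
# Compute the day of year from the time coordinate and register it as a coordinate
ds["doy"] = ds["time"].dt.dayofyear
ds = ds.set_coords("doy")
# Make "doy" the dimension in place of "time", so that it can be stacked
ds = ds.swap_dims({"time": "doy"})
# Combine "doy" and "step" into one stacked dimension with a MultiIndex
ds = ds.stack(doystep=["doy", "step"])
ds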

pimmeerdink · 2023-05-31T12:05:02Z

Great! That worked, thanks a lot :)

BSchilperoort · 2023-06-07T07:10:41Z

I had a discussion with @pimmeerdink yesterday, and the conclusion was that:

If the trend and climatology are a function of the step as well (e.g. slope = F(step, latitude, longitude)), no changes have to be made to the preprocessor, as this is already supported.
If the step dimension should be flattened for the trend or climatology, changes will have to be made to the preprocessor.
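To illustrate the first case, here is a rough sketch in plain xarray on synthetic data. It is not the s2spy preprocessor itself. The variable name t2m, the grid and the sizes are assumptions made for the example:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for EC-46-like data; names, grid and sizes are assumptions.
rng = np.random.default_rng(42)
times = pd.date_range("2018-01-01", "2020-12-31", freq="D")
steps = np.arange(1, 4)  # lead time in days
data = xr.DataArray(
    rng.normal(size=(times.size, steps.size, 2, 2)),
    dims=("time", "step", "latitude", "longitude"),
    coords={
        "time": times,
        "step": steps,
        "latitude": [50.0, 52.0],
        "longitude": [4.0, 6.0],
    },
    name="t2m",
)

# Climatology: grouping on day of year and averaging over "time" only
# leaves "step" and the spatial dimensions intact, so there is one value
# per (dayofyear, step, latitude, longitude).
grouped = data.groupby("time.dayofyear")
climatology = grouped.mean("time")
anomalies = grouped - climatology

# Linear trend: polyfit along "time" is fitted independently for every
# (step, latitude, longitude) point, i.e. slope = F(step, latitude, longitude).
fit = data.polyfit(dim="time", deg=1)
slope = fit["polyfit_coefficients"].sel(degree=1)
```

Because the reductions only run over time, step is carried along like any other remaining dimension. Flattening step would instead need an explicit extra reduction or stacking step.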

semvijverberg · 2023-06-07T07:25:07Z

That is nice, curious to see a code snippet!

pimmeerdink assigned BSchilperoort and geek-yang May 30, 2023
