ec-46 compatibility #167
Comments
Hi Pim, we did meet at that presentation, but Yang was not there. However, if you'll be working on s2spy, I am sure that we will meet (again) some time. It would be good to align our thoughts and ideas in person. Looking at the file and your description, having the dimension named "time" can be a bit misleading, right? It is the time the forecast was made, not the actual time the forecast represents. I think rearranging that would be a good first step. This issue that @semvijverberg made in Lilio had data structured more like that: AI4S2S/lilio#54
If you stack your dimensions, you should be able to use groupby on the stacked dimension. In the end, it would be nice to be able to support EC-46 and similar forecast/ensemble data in a flexible way, without writing too much custom code in the processing functions.
@pimmeerdink Thanks for asking. I agree with @BSchilperoort: you can simply deseasonalize your data that way. Indeed, we are considering supporting ensemble forecasts in a nicer way. In the meantime, @semvijverberg and @jannesvaningen are experimenting with this.
Hey guys, thanks for the responses! I'm actually the one experimenting with the EC-46 data at the moment, instead of Jannes and Sem. Of course I agree that we would like to write as little custom code as possible, and that agreeing on a data format for data like this is the way to go. For now, I'd like to get it working in a somewhat practical way, which would help with future development.

However, we know that deseasonalizing data of this format (where a double groupby is necessary) will need to be supported. I agree that stack seems like the logical choice. However, there's a small problem with simply stacking the dimensions: we are unable to access the dayofyear attribute of the time dimension:

data.stack(doy__step=["time.dt.dayofyear", "step"])

While we can access it directly by writing data.time.dt.dayofyear, this does not seem to work inside a stack call. Intuitively you would then calculate it separately, assign it as a new coordinate, and stack with that. However, in that case the dayofyear (doy) that we calculate and assign is a coordinate, not a dimension, and stacking is only possible over dimensions. So basically: I can't figure out how to make the stack, and thus the groupby operation, work. If one of you could help, that would be great!
Hi Pim, try this:

```python
import xarray as xr

ds = xr.open_dataset("/home/bart/Downloads/input_data.netcdf")
ds["doy"] = ds["time"].dt.dayofyear
ds = ds.set_coords("doy")
ds = ds.swap_dims({"time": "doy"})
ds = ds.stack(doystep=["doy", "step"])
ds
```
Great! That worked, thanks a lot :)
I had a discussion with @pimmeerdink yesterday, and the conclusion was that:
That is nice, curious to see a code snippet!
Hey guys! First of all, I'm going to be working with, and on, the s2spy software in the coming period. I think we may have met at a presentation about the work going on at the IVM, where I was presenting work for my Master's thesis in AI. Either way, I have an issue I could use your help with.
In a nutshell: we need to be able to use the preprocess functionality not just with historical data but also with modelled data. In the case of EC-46, this means that we don't only have time, latitude and longitude dimensions, but also a step dimension. In essence, step is the number of steps into the future that a datapoint represents: on every day, for every gridpoint, a simulation is run 30 days into the future, so every day has 30 prediction fields associated with it.

When it comes to deseasonalizing, this means that we want to group not only by the day of the year, but also by the step into the future: we calculate a normalization coefficient for every (day of year, step) combination. Sounds easy, it's just a groupby and merge over an extra axis. It gets hairy when you realize, as I did, that xarray does not support groupby operations on multiple columns. I have done some dirty coding on the ec-46 compatibility branch in s2spy/preprocess to get it working anyway, but it's crazy inefficient and, let's face it, ugly. Wondering if you guys can help: I think the code could potentially use a little restructuring/rethinking given the new use case. I have attached a netcdf file in the format that should be pre-processable by the functions (zipped because I had to...).
Hope you guys can help! Thanks in advance.
input_data.netcdf.zip
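The multi-column groupby described above is exactly what pandas offers out of the box, which is a useful mental model for the computation the preprocessing needs to reproduce in xarray. A minimal sketch on a made-up flat table (the column names `doy`, `step` and `t2m` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical flat version of the forecast data: one row per
# (forecast day, lead-time step) pair.
times = pd.date_range("2001-01-01", periods=365 * 2, freq="D")
steps = np.arange(1, 31)
df = pd.DataFrame(
    {
        "doy": np.repeat(times.dayofyear.to_numpy(), steps.size),
        "step": np.tile(steps, times.size),
        "t2m": np.random.default_rng(0).normal(size=times.size * steps.size),
    }
)

# pandas can group on several columns at once, so the per-(doy, step)
# climatology is a single call...
clim = df.groupby(["doy", "step"])["t2m"].mean()
# ...and the anomaly is each value minus its own group's mean.
df["anom"] = df["t2m"] - df.groupby(["doy", "step"])["t2m"].transform("mean")
```

In xarray the same target computation is reached via the stacked-dimension trick discussed earlier in the thread; the pandas version just makes explicit what "groupby over an extra axis" has to produce.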