Replies: 3 comments
-
Dear Ryan, thank you very much for these kind words about parcels, and for your thoughtful comments. Philippe and I have discussed these, and have the following responses and questions on your three ideas.
We would very much welcome the opportunity to work with you and the rest of the xarray and xgcm teams to streamline Parcels and integrate it more closely. As you say, we citizens of the scientific python ecosystem should aim to reduce unnecessary repetition. We'll open a separate Issue for 1), and leave this one open for discussion on the other two points.
-Erik and Philippe
-
One way to auto-detect the relevant variables would be to use CF conventions and examine the dataset's metadata, e.g. the standard_name attributes on the velocity fields.
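A minimal sketch of what that detection could look like (the function and the mapping are hypothetical, not parcels code; only the standard_name values are actual CF names):

```python
import xarray as xr

# Real CF standard names for horizontal ocean velocity components.
CF_STANDARD_NAMES = {
    "U": "eastward_sea_water_velocity",
    "V": "northward_sea_water_velocity",
}

def find_velocity_variables(ds: xr.Dataset) -> dict:
    """Map parcels-style field names to dataset variables via CF metadata."""
    found = {}
    for field, standard_name in CF_STANDARD_NAMES.items():
        for name, var in ds.data_vars.items():
            if var.attrs.get("standard_name") == standard_name:
                found[field] = name
    return found
```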
It makes total sense that, when it comes time to actually integrate the trajectories, you need numpy arrays. But rather than just calling np.asarray on everything up front, the coercion could be deferred until the values are actually needed, which would keep dask-backed inputs lazy until then.
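A toy illustration of the distinction (the array shape and chunking are arbitrary):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((1000, 1000))).chunk({"dim_0": 100})

lazy = da.data          # still a dask array; nothing has been computed
eager = np.asarray(da)  # coerces to numpy, forcing the whole compute
```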
I totally agree that 3 is hard and probably impractical. The ideal path, given lots of developer time and excellent coordination between packages, would involve exposing the fast parcels interpolation routines through xarray's indexing and interpolation machinery, so that they plug in where xarray's own indexers and interpolators do today.
This would be awesome, because now any xarray dataset could use the cool, fast interpolators you have developed, which are currently accessible only from deep inside parcels. I am not claiming this would be easy, but I think it is, in some sense, the "right" way. In the meantime, it might be worth reviewing some of the recent changes to xarray indexing; one diagram in that discussion in particular seems to capture what one wants to do with particles. cc @shoyer and @jhamman, who might be able to weigh in on the timeframe for the needed changes to xarray indexing.
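Concretely, the pattern one wants is pointwise, vectorized sampling of a field at particle positions. For a simple rectilinear grid, xarray's interp can already express this; a runnable toy example (the field and coordinates here are made up):

```python
import numpy as np
import xarray as xr

# Build a toy lat/lon velocity field standing in for a real dataset.
lon = np.linspace(-40.0, -10.0, 61)
lat = np.linspace(30.0, 50.0, 41)
u = xr.DataArray(
    np.random.rand(lat.size, lon.size),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="u",
)

# Particle positions as 1-D arrays sharing a common "points" dimension.
px = xr.DataArray(np.linspace(-30.0, -20.0, 100), dims="points")
py = xr.DataArray(np.linspace(40.0, 45.0, 100), dims="points")

# Pointwise (vectorized) interpolation at every particle position,
# not on the outer product of px and py.
u_at_particles = u.interp(lon=px, lat=py)  # 100 values along "points"
```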
-
This shows how to use dask for distributing potentially many similar parcels experiments: https://nbviewer.jupyter.org/gist/willirath/6b5c4654ca6be3774fa76acf4a266b96

The basic idea is to wrap the parcels experiment in a function and map it onto a dask bag of parameters, as sketched below. It would be very interesting to use the xarray / zarr backend and run this on distributed resources on Pangeo. My feeling (somewhat informed by a few tests) is that there are only a few obstacles to overcome before splitting parcels particle sets and distributing them in a more transparent way than with this relatively crude approach.
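A minimal sketch of the approach, assuming the standard parcels API; the file pattern, variable names, and release parameters are hypothetical (the linked notebook has the real thing):

```python
from datetime import timedelta

import dask.bag as db
from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

def run_experiment(release_lon):
    """Advect one small particle set and return its final positions."""
    fieldset = FieldSet.from_netcdf(
        "velocities_*.nc",                 # hypothetical file pattern
        variables={"U": "uo", "V": "vo"},  # hypothetical variable names
        dimensions={"lon": "lon", "lat": "lat", "time": "time"},
    )
    pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                       lon=[release_lon], lat=[45.0])
    pset.execute(AdvectionRK4, runtime=timedelta(days=10),
                 dt=timedelta(minutes=30))
    return [(p.lon, p.lat) for p in pset]

# Map the function over a bag of parameters; dask runs the independent
# experiments in parallel, locally or on a distributed cluster.
results = db.from_sequence(range(-30, 0, 5)).map(run_experiment).compute()
```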
-
Hi Parcels Folks,
Congratulations on this amazing package! It is awesome to see how far it has evolved since Erik's Lagrangian meeting at Imperial a few years ago. Both the code itself and the documentation / example sets are just beautiful.
I have been reading through the code and documentation, and I have a few comments that I want to share. Please take these comments with a giant grain of salt: they are meant purely for discussion, not as criticism of your package, which, as I said above, is awesome and amazing.
Parcels frequently seems to assume that the velocity data resides in netCDF files. This may lead to problems down the line, because there are many cases in which the velocities might live elsewhere. For example:

- zarr stores, possibly sitting in cloud object storage rather than on a local filesystem
- GRIB files
- remote datasets served over opendap
However, all of these can be read into xarray. As a specific example, in the Pangeo Sea Surface Height example, I can load the entire AVISO dataset in zarr format from Google Cloud Storage in one line. I would love to be able to just call parcels on this data and compute Lagrangian trajectories. That is not trivial right now.
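That one-line load looks roughly like this (the bucket path below is illustrative, not the exact Pangeo location):

```python
import gcsfs
import xarray as xr

# Anonymous access to a public bucket; open the zarr store lazily.
gcs = gcsfs.GCSFileSystem(token="anon")
ds = xr.open_zarr(gcs.get_mapper("pangeo-data/aviso-ssh.zarr"))
```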
So here are some ideas, in increasing order of difficulty / complexity:

1. I believe it might make sense for you to accept xarray objects when creating velocity `FieldSet` objects, i.e. `fieldset.from_xarray(ds)`. Looking at the code, I don't think this would be too hard. In fact, it might even already work with `fieldset.from_data`, since xarray datasets provide a dictionary-like interface (see the sketch after this list). This would solve some of the problems I described above (but not all). You also use xarray internally to read netCDF data, so maybe that path could be followed instead.
2. It might not work with the Pangeo AVISO zarr dataset, though, because you also frequently coerce inputs to numpy arrays, which would trigger dask arrays to compute. It would be great to operate lazily until data is actually required for computation. This might be easy with duck typing, simply by avoiding explicit coercion to numpy arrays. Maybe the netCDF code path is lazier, but, as noted above, it only accepts netCDF files, not generic objects.
3. More generally, it looks like you are essentially re-implementing large portions of xarray in this package. For example, `FieldSet` is conceptually similar to `xarray.Dataset`, and `Field` is conceptually similar to `xarray.DataArray`. Labelled multi-dimensional arrays are an extremely common pattern in geoscience code, and many packages that once had their own implementations of these data structures have refactored to just use xarray (satpy is a great example). Your various `Field.search_indices` methods are very similar to xarray's indexing operations, and xarray now has multidimensional interpolation based on scipy, which you also implement in parcels. This would of course imply a major refactor of your internal structure, so it is a very presumptuous suggestion on my part. However, there could be major advantages, including a reduced maintenance burden and better interoperability with the rest of the ecosystem. Of course, xarray does not provide all the functionality you need for working with GCM data: operations related to grid cells are not part of xarray. I am trying (mostly without success) to rally folks to use and contribute to xgcm for this purpose. (So that is part of my ulterior motive here.)
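To make 1) and 2) concrete, here is a rough sketch; `from_xarray` is hypothetical, and the file and variable names are placeholders for whatever a real dataset uses:

```python
import xarray as xr
from parcels import FieldSet

ds = xr.open_dataset("velocities.nc")  # could equally be open_zarr(...)

# What an xarray-aware constructor might look like (hypothetical):
# fieldset = FieldSet.from_xarray(ds)

# In the meantime, from_data may already get close, because a Dataset is
# dict-like. Note that the .values accesses below are exactly the kind of
# eager coercion that defeats dask (idea 2):
fieldset = FieldSet.from_data(
    {"U": ds["uo"].values, "V": ds["vo"].values},
    {"lon": ds["lon"].values, "lat": ds["lat"].values,
     "time": ds["time"].values},
)
```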
Anyway, of course I don't expect any of these things (other than perhaps 1) to make it onto your todo list any time soon. Clearly this package is amazing and very successful in what it does. But as an xarray developer, oceanographer, and concerned citizen of the scientific python ecosystem, I just can't resist pointing out this opportunity where using xarray more could significantly reduce your development burden and lead to enhanced interoperability.
-Ryan