-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscellaneous lazy preprocessor improvements #2520
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2520 +/- ##
=======================================
Coverage 94.66% 94.66%
=======================================
Files 251 251
Lines 14287 14296 +9
=======================================
+ Hits 13525 13534 +9
Misses 762 762 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
17c4320
to
56a0615
Compare
@bouweandela this is a lovely PR - did you want to still work on it? It's been a while since it's been sitting out, let's get it in if you done working on it 🍺 |
It's ready for review. I would like to add more tests, but I'm not sure when I'll have time to do that, so maybe it's better to just go ahead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @bouweandela - a few very minor points, until the last one - chunks of aux factories/coords - me not happy about that, see me comment 🍺
esmvalcore/preprocessor/_shared.py
Outdated
corresponds to a dimension of *mask* and its value provides | ||
the index in *array* which the dimension of *mask* corresponds | ||
to, so the first element of *dim_map* gives the index of *array* | ||
that corresponds to the first dimension of *mask* etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be a bit more clear:
- is it a list/tuple of ints? or of lists/tuples?
- give an example e.g.
dim_map = [2, 3, 4]
->array[2]
will be masked withmask[0]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copy/pasted from the iris docs. I could also just link that? Anyway, this is not part of our public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha OK, pls post a link then - it's as clear as a foggy November day in the North Atlantic 😁 The module is private, but this seems to be a decent function for our public API? Up to you on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it in 904e32a so the docstring now just refers to the iris.util.broadcast_to_shape
function. That function has examples: https://scitools-iris.readthedocs.io/en/stable/generated/api/iris.util.html#iris.util.broadcast_to_shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module is private, but this seems to be a decent function for our public API?
Let's keep it private for now, I'm not sure if this is the kind of functionality ESMValCore should provide.
|
||
mask = iris.util.broadcast_to_shape( | ||
mask, array.shape, dim_map=dim_map, chunks=array_chunks | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to put this in the func docstring: Raises: whatever the iris utility may raise - that utility is about 90% of this function, so I'd put it in the description/API doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not raise any obvious exceptions: https://github.com/SciTools/iris/blob/354aac1b269df36be9def15ba1a3b0fbca74e0b6/lib/iris/util.py#L32-L118
chunks + (None,) | ||
) | ||
cube.replace_coord(coord) | ||
return cube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a costly function - is it worth its cost vs not doing anything and going with default chunks for aux coords? Also, I don't like it when we do things in our code, that should be done by Iris - this is probably something they should do at load point ie make sure the chunks are decent. Am rather not happy about this TBH, if I understand the issue correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround for SciTools/iris#5457
Not doing anything will result in all workers dying when using the distributed scheduler because they use too much memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh absolutely, I commend your work 🎖️ I am frowning at Iris, bud - when you plop all the other suggestions, I'll Approve, am aware this needs be here, am just not happy it's not at iris
Co-authored-by: Valeriu Predoi <[email protected]>
Description
Various improvements to the preprocessor functions related to handling lazy data.
Related to #674
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: