-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a FILL_VALUE=-1 everywhere #299
Conversation
The external fill value is only used when converting back to dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this great improvement (especially since it also reduces the code complexity). I have tested the following:
- the example from the issue now works again as expected
- all related dfm_tools tests: 2 failing, but I see I have to update dfm_tools code/tests to solve this (Fix xugrid tests in testbank (contourf) dfm_tools#989), the xugrid implementation is correct.
- related dfm_tools example scripts (just to be sure): all work properly again
One request, would be great to have a xugrid release at a time that I can also update dfm_tools to it. This does not have to be asap, but would be great to keep in touch about this.
I think the previous release is basically half-broken, I was even thinking of whether we should yank it from conda-forge and pypi. But if we release this quickly, that might not be needed since no one will end up getting the previous one installed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got three comments. I also couldn't find in my glancing some tests where a dataset was opened with a different fill value, some operations were done, and then was saved and then read again. That should result in the same fill value attribute in the dataset.
If there isn't one of these yet, it would be worthwhile to add that to avoid any future problems with datasets with different fill values.
Fixed | ||
~~~~~ | ||
|
||
- Release 0.12.0 changed the return type of the face node connectivity of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: This should be one bullet in the list right? The first paragraph is context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd started off by just fixing the fill value mismatch, but ended up getting rid of all of them. I've changed it to a single bullet.
examples-dev/voronoi.py
Outdated
n_edge = len(edge_nodes) | ||
edge_coords = np.empty((n_edge, 2, 2), dtype=float) | ||
node_0 = edge_nodes[:, 0] | ||
node_1 = edge_nodes[:, 1] | ||
valid = (node_0 != fill_value) & (node_1 != fill_value) | ||
valid = (node_0 != -1) & (node_1 != -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not:
valid = (node_0 != -1) & (node_1 != -1) | |
valid = (node_0 != FILL_VALUE) & (node_1 != FILL_VALUE) |
That clarifies that the edge is not connected to a node and a fill value is used. It also helps identifying parts of the code that could be affected when alterations are made to how fill values are treated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had intended this more or less as a standalone-ish example. But you're right: FILL_VALUE is more clearly understood than -1.
# Internally we always use a fill value of -1. This ensures we can always index | ||
# with the fill value as well, since any array will have at least size 1. | ||
FILL_VALUE = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would you want to index with the fill value?
I can imagine this can result in some nasty bugs where an array is returned with lot of the final values, instead of dropped/set to np.nan. With a large negative FILL VALUE, you are more likely to get an error if any of this indexing with a fill value is done. Would help maintainability of the code base, as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indexing with the fill value is unavoidable because we are dealing with rectangular numpy arrays, rather than ragged/awkward arrays: https://github.com/scikit-hep/awkward
Connectivity arrays are primarily used for indexing and cannot be float typed. NaN is not an option.
Let's say you have a mixed quad/triangular grid:
vertices = np.array(
[
[0.0, 0.0], # 0
[1.0, 0.0], # 1
[2.0, 0.0], # 2
[0.0, 1.0], # 3
[1.0, 1.0], # 4
[2.0, 1.0], # 5
[1.0, 2.0], # 6
]
)
faces = np.array(
[
[0, 1, 4, 3],
[1, 2, 5, 4],
[3, 4, 6, -1],
[4, 5, 6, -1],
]
)
If you want the coordinates for every face, you simply do: coords = vertices[faces]
.
This inserts node -1 (i.e. 6) for the triangles.
But we simply "remove" them with: coords[faces == FILL_VALUE] = np.nan
.
Note that this allocates just one temporary array (faces == FILL_VALUE
).
For reference, this is how you'd avoid indexing with a fill value:
_faces = faces[faces != FILL_VALUE] # 1D result
_coords = vertices[_faces] # 1D result
coords = np.full((*faces.shape, 2), np.nan)
coords[faces == FILL_VALUE, :] = _coords
This allocates 4 temporaries: _faces, _coords, faces != FILL_VALUE, faces == FILL_VALUE
Using 0 temporarily:
_faces = faces.copy()
_faces[face == FILL_VALUE] = 0
coords = vertices[_faces]
coords[face == FILL_VALUE] = np.nan
Doesn't help you one a bit: if the array is empty, it will IndexError all the same.
This is one example, but there are several others where having -1 is very convenient given that we're working with vectorized numpy operations.
Note that any valid Ugrid1d topology has atleast two vertices, and any valid Ugrid2d topology at least three vertices. Hence, -1 will always work.
Worrying about empty topologies is valid, but the check belongs at Ugrid1d/Ugrid2d initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're feeling clever / cute, you could add (np.nan, np.nan)
as the last vertex. But the UGRID conventions don't specifiy that, they just specifiy a fill value. A fill value of -1 is fully conforming to UGRID conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I was under the misimpression that the FILL_VALUE was also used on places beyond the connectivity matrices. Probably should have read the UGRID conventions on this.
…erent from the internal FILL_VALUE
The external fill value is only used when converting back to dataset.
Fixes #298
(Should've done this long ago, there was more than one place where a fill value != -1 might have resulted in an error.)
The error of #298 makes the last release unfortunately quite broken for datasets where a fill_value != -1 was used.
@veenstrajelmer, you might want to run the dfm tools tests with this as well. The fill values were used in a lot of places, it's possible the xugrid test still have some soft spots. If there are any failures, we should patch them and add them as new tests.