Use a FILL_VALUE=-1 everywhere #299

Huite · 2024-09-05T17:05:21Z

The external fill value is only used when converting back to dataset.

Fixes #298

(Should've done this long ago, there was more than one place where a fill value != -1 might have resulted in an error.)

Unfortunately, the error in #298 leaves the last release quite broken for datasets that use a fill_value other than -1.

@veenstrajelmer, you might want to run the dfm_tools tests with this as well. The fill values were used in a lot of places, so it's possible the xugrid tests still have some soft spots. If there are any failures, we should patch them and add them as new tests.

veenstrajelmer

Thanks for this great improvement (especially since it also reduces the code complexity). I have tested the following:

- the example from the issue now works again as expected
- all related dfm_tools tests: 2 fail, but I need to update the dfm_tools code/tests to solve this (Fix xugrid tests in testbank (contourf) dfm_tools#989); the xugrid implementation is correct
- related dfm_tools example scripts (just to be sure): all work properly again

One request: it would be great to time the xugrid release so that I can update dfm_tools to it as well. This does not have to be asap, but it would be good to keep in touch about this.

Huite · 2024-09-05T19:02:45Z

I think the previous release is basically half-broken, I was even thinking of whether we should yank it from conda-forge and pypi. But if we release this quickly, that might not be needed since no one will end up getting the previous one installed.

JoerivanEngelen

I've got three comments. From a quick glance, I also couldn't find any tests where a dataset is opened with a different fill value, some operations are done, and the result is saved and then read again. That should result in the same fill value attribute in the dataset.

If there isn't one of these yet, it would be worthwhile to add that to avoid any future problems with datasets with different fill values.
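
A minimal sketch of the round trip such a test would check, using plain NumPy arrays rather than an actual dataset. The helpers `to_internal` and `to_external` and the external fill value of -999 are hypothetical illustrations, not xugrid functions:

```python
import numpy as np

FILL_VALUE = -1  # internal fill value, as in xugrid/constants.py


def to_internal(connectivity, external_fill_value):
    """Hypothetical helper: replace the file's fill value with FILL_VALUE."""
    out = np.asarray(connectivity).copy()
    out[out == external_fill_value] = FILL_VALUE
    return out


def to_external(connectivity, external_fill_value):
    """Hypothetical helper: restore the file's fill value when writing."""
    out = np.asarray(connectivity).copy()
    out[out == FILL_VALUE] = external_fill_value
    return out


def test_fill_value_roundtrip():
    external_fill_value = -999  # assumed value stored in the file
    on_disk = np.array([[0, 1, 2, 3], [1, 4, 2, external_fill_value]])

    internal = to_internal(on_disk, external_fill_value)
    assert internal[1, 3] == FILL_VALUE
    # Operations on the internal representation would happen here.
    written = to_external(internal, external_fill_value)

    # The connectivity written back must match what was read.
    assert np.array_equal(written, on_disk)


test_fill_value_roundtrip()
```

A real test would open, modify, save, and re-read a dataset, and compare the fill value attribute as well as the array contents.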

JoerivanEngelen · 2024-09-06T14:34:57Z

docs/changelog.rst

+Fixed
+~~~~~
+
+- Release 0.12.0 changed the return type of the face node connectivity of


Nitpick: This should be one bullet in the list, right? The first paragraph is context.

Yeah, I'd started off by just fixing the fill value mismatch, but ended up getting rid of all of them. I've changed it to a single bullet.

JoerivanEngelen · 2024-09-06T14:40:46Z

examples-dev/voronoi.py

    n_edge = len(edge_nodes)
    edge_coords = np.empty((n_edge, 2, 2), dtype=float)
    node_0 = edge_nodes[:, 0]
    node_1 = edge_nodes[:, 1]
-    valid = (node_0 != fill_value) & (node_1 != fill_value)
+    valid = (node_0 != -1) & (node_1 != -1)


Why not:

Suggested change

-    valid = (node_0 != -1) & (node_1 != -1)
+    valid = (node_0 != FILL_VALUE) & (node_1 != FILL_VALUE)

That clarifies that the edge is not connected to a node and a fill value is used. It also helps identify parts of the code that could be affected when the treatment of fill values changes.
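
For illustration, a self-contained sketch of the masking pattern with a named constant. The node coordinates and edge connectivity below are made-up data, and the way `edge_coords` is filled is an assumption about how the example continues:

```python
import numpy as np

FILL_VALUE = -1

# Hypothetical data: three nodes and three edges; the last edge lacks a second node.
node_coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
edge_nodes = np.array([[0, 1], [1, 2], [2, FILL_VALUE]])

node_0 = edge_nodes[:, 0]
node_1 = edge_nodes[:, 1]
# An edge is valid only if both of its nodes are real nodes.
valid = (node_0 != FILL_VALUE) & (node_1 != FILL_VALUE)

# Fill coordinates for valid edges only; invalid edges stay NaN.
edge_coords = np.full((len(edge_nodes), 2, 2), np.nan)
edge_coords[valid, 0] = node_coords[node_0[valid]]
edge_coords[valid, 1] = node_coords[node_1[valid]]
```

Searching the code base for `FILL_VALUE` then finds this line, which a bare `-1` would not.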

I had intended this more or less as a standalone-ish example. But you're right: FILL_VALUE is more clearly understood than -1.

JoerivanEngelen · 2024-09-06T14:49:08Z

xugrid/constants.py

+# Internally we always use a fill value of -1. This ensures we can always index
+# with the fill value as well, since any array will have at least size 1.
+FILL_VALUE = -1


When would you want to index with the fill value?

I can imagine this can result in some nasty bugs where an array is returned with a lot of the final values, instead of those entries being dropped or set to np.nan. With a large negative FILL_VALUE, you are more likely to get an error if any indexing with a fill value is done. That would help maintainability of the code base, as far as I can tell.

Indexing with the fill value is unavoidable because we are dealing with rectangular numpy arrays, rather than ragged/awkward arrays: https://github.com/scikit-hep/awkward

Connectivity arrays are primarily used for indexing and cannot be float typed. NaN is not an option.

Let's say you have a mixed quad/triangular grid:

    vertices = np.array(
        [
            [0.0, 0.0],  # 0
            [1.0, 0.0],  # 1
            [2.0, 0.0],  # 2
            [0.0, 1.0],  # 3
            [1.0, 1.0],  # 4
            [2.0, 1.0],  # 5
            [1.0, 2.0],  # 6
        ]
    )
    faces = np.array(
        [
            [0, 1, 4, 3],
            [1, 2, 5, 4],
            [3, 4, 6, -1],
            [4, 5, 6, -1],
        ]
    )

If you want the coordinates for every face, you simply do: coords = vertices[faces].
This inserts node -1 (i.e. 6) for the triangles.
But we simply "remove" them with: coords[faces == FILL_VALUE] = np.nan.

Note that this allocates just one temporary array (faces == FILL_VALUE).
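
Put together as a runnable sketch, using the mixed grid from this comment. The centroid computation at the end is an added illustration of why NaN padding is convenient, not something taken from the PR:

```python
import numpy as np

FILL_VALUE = -1

vertices = np.array(
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
)
faces = np.array(
    [[0, 1, 4, 3], [1, 2, 5, 4], [3, 4, 6, FILL_VALUE], [4, 5, 6, FILL_VALUE]]
)

# Indexing with -1 silently picks the last vertex (node 6) for the triangles.
coords = vertices[faces]  # shape (n_face, 4, 2)
picked_last_vertex = bool((coords[2, 3] == vertices[-1]).all())

# Overwrite the padded entries; the boolean mask is the only temporary.
coords[faces == FILL_VALUE] = np.nan

# NaN-aware reductions then ignore the padding, e.g. for face centroids.
centroids = np.nanmean(coords, axis=1)  # shape (n_face, 2)
```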

For reference, this is how you'd avoid indexing with a fill value:

    _faces = faces[faces != FILL_VALUE]  # 1D result
    _coords = vertices[_faces]  # one row per valid node, shape (n_valid, 2)
    coords = np.full((*faces.shape, 2), np.nan)
    coords[faces != FILL_VALUE, :] = _coords

This allocates 4 temporaries: _faces, _coords, and the mask faces != FILL_VALUE, which is computed twice.

Using 0 temporarily:

    _faces = faces.copy()
    _faces[faces == FILL_VALUE] = 0
    coords = vertices[_faces]
    coords[faces == FILL_VALUE] = np.nan

Doesn't help you one bit: if the vertices array is empty, it will raise an IndexError all the same.
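
A small sketch to check this behaviour. The helper `raises_index_error` and the large negative value -999 are hypothetical, chosen only to compare the cases discussed in this thread:

```python
import numpy as np


def raises_index_error(array, index):
    """Return True if fancy-indexing `array` with `index` raises IndexError."""
    try:
        array[np.array([index])]
    except IndexError:
        return True
    return False


empty = np.empty((0, 2))   # no vertices at all
single = np.zeros((1, 2))  # a single vertex

print(raises_index_error(empty, -1))     # fill value -1 on an empty array
print(raises_index_error(empty, 0))      # temporary 0 on an empty array
print(raises_index_error(single, -1))    # -1 on a non-empty array
print(raises_index_error(single, -999))  # large negative fill value
```

On an empty array both -1 and 0 raise. On any non-empty array -1 is a valid index, whereas a large negative value raises unless the array is at least that long.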

This is one example, but there are several others where having -1 is very convenient given that we're working with vectorized numpy operations.

Note that any valid Ugrid1d topology has at least two vertices, and any valid Ugrid2d topology at least three vertices. Hence, -1 will always work.

Worrying about empty topologies is valid, but the check belongs at Ugrid1d/Ugrid2d initialization.

If you're feeling clever / cute, you could add (np.nan, np.nan) as the last vertex. But the UGRID conventions don't specify that, they just specify a fill value. A fill value of -1 is fully conforming to UGRID conventions.
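
As a sketch of that trick (an illustration only, not what xugrid does): appending a NaN row makes index -1 resolve to NaN directly, so no masking step is needed afterwards. The name `padded` is hypothetical:

```python
import numpy as np

FILL_VALUE = -1

vertices = np.array(
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
)
faces = np.array(
    [[0, 1, 4, 3], [1, 2, 5, 4], [3, 4, 6, FILL_VALUE], [4, 5, 6, FILL_VALUE]]
)

# Append a sentinel vertex so that index -1 selects (nan, nan).
padded = np.vstack([vertices, [np.nan, np.nan]])
coords = padded[faces]  # shape (n_face, 4, 2); padded entries are already NaN
```

The cost is a copy of the vertex array that is one row longer than the topology itself, and every consumer of that array needs to know about the extra row.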

Thanks for the explanation, I was under the mistaken impression that the FILL_VALUE was also used in places beyond the connectivity matrices. I probably should have read the UGRID conventions on this.

Use a FILL_VALUE=-1 everywhere.

333419b

The external fill value is only used when converting back to dataset.

Huite requested review from JoerivanEngelen and veenstrajelmer September 5, 2024 17:05

Huite added 2 commits September 5, 2024 20:22

Update voronoi dev example

a1c0a7f

Update changelog

15bb386

veenstrajelmer approved these changes Sep 5, 2024

Mention to_netcdf in changelog too

8e0fed4

JoerivanEngelen reviewed Sep 6, 2024

Huite added 3 commits September 6, 2024 17:23

Address review comments

125a0c8

Only replace the fill_value in the returned connectivity if it's different from the internal FILL_VALUE

ee87753

Add a test to assert that fill_value is set back

81fa3a6

JoerivanEngelen approved these changes Sep 6, 2024

Huite merged commit 4c6239a into main Sep 6, 2024
14 checks passed

Huite deleted the remove-fill-value branch September 6, 2024 17:01

veenstrajelmer mentioned this pull request Sep 9, 2024

Support a start_index of 1 #300

Merged
