Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a FILL_VALUE=-1 everywhere #299

Merged
merged 7 commits into from
Sep 6, 2024
Merged

Use a FILL_VALUE=-1 everywhere #299

merged 7 commits into from
Sep 6, 2024

Conversation

Huite
Copy link
Collaborator

@Huite Huite commented Sep 5, 2024

The external fill value is only used when converting back to dataset.

Fixes #298

(Should've done this long ago, there was more than one place where a fill value != -1 might have resulted in an error.)

The error of #298 makes the last release unfortunately quite broken for datasets where a fill_value != -1 was used.

@veenstrajelmer, you might want to run the dfm tools tests with this as well. The fill values were used in a lot of places, it's possible the xugrid test still have some soft spots. If there are any failures, we should patch them and add them as new tests.

The external fill value is only used when converting back to dataset.
Copy link
Collaborator

@veenstrajelmer veenstrajelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great improvement (especially since it also reduces the code complexity). I have tested the following:

  • the example from the issue now works again as expected
  • all related dfm_tools tests: 2 failing, but I see I have to update dfm_tools code/tests to solve this (Fix xugrid tests in testbank (contourf) dfm_tools#989), the xugrid implementation is correct.
  • related dfm_tools example scripts (just to be sure): all work properly again

One request, would be great to have a xugrid release at a time that I can also update dfm_tools to it. This does not have to be asap, but would be great to keep in touch about this.

@Huite
Copy link
Collaborator Author

Huite commented Sep 5, 2024

I think the previous release is basically half-broken, I was even thinking of whether we should yank it from conda-forge and pypi. But if we release this quickly, that might not be needed since no one will end up getting the previous one installed.

Copy link
Contributor

@JoerivanEngelen JoerivanEngelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got three comments. I also couldn't find in my glancing some tests where a dataset was opened with a different fill value, some operations were done, and then was saved and then read again. That should result in the same fill value attribute in the dataset.

If there isn't one of these yet, it would be worthwhile to add that to avoid any future problems with datasets with different fill values.

Fixed
~~~~~

- Release 0.12.0 changed the return type of the face node connectivity of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This should be one bullet in the list right? The first paragraph is context

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd started off by just fixing the fill value mismatch, but ended up getting rid of all of them. I've changed it to a single bullet.

n_edge = len(edge_nodes)
edge_coords = np.empty((n_edge, 2, 2), dtype=float)
node_0 = edge_nodes[:, 0]
node_1 = edge_nodes[:, 1]
valid = (node_0 != fill_value) & (node_1 != fill_value)
valid = (node_0 != -1) & (node_1 != -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not:

Suggested change
valid = (node_0 != -1) & (node_1 != -1)
valid = (node_0 != FILL_VALUE) & (node_1 != FILL_VALUE)

That clarifies that the edge is not connected to a node and a fill value is used. It also helps identifying parts of the code that could be affected when alterations are made to how fill values are treated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had intended this more or less as a standalone-ish example. But you're right: FILL_VALUE is more clearly understood than -1.

Comment on lines +25 to +27
# Internally we always use a fill value of -1. This ensures we can always index
# with the fill value as well, since any array will have at least size 1.
FILL_VALUE = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would you want to index with the fill value?

I can imagine this can result in some nasty bugs where an array is returned with lot of the final values, instead of dropped/set to np.nan. With a large negative FILL VALUE, you are more likely to get an error if any of this indexing with a fill value is done. Would help maintainability of the code base, as far as I can tell.

Copy link
Collaborator Author

@Huite Huite Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing with the fill value is unavoidable because we are dealing with rectangular numpy arrays, rather than ragged/awkward arrays: https://github.com/scikit-hep/awkward

Connectivity arrays are primarily used for indexing and cannot be float typed. NaN is not an option.

Let's say you have a mixed quad/triangular grid:

vertices = np.array(
    [
        [0.0, 0.0],  # 0
        [1.0, 0.0],  # 1
        [2.0, 0.0],  # 2
        [0.0, 1.0],  # 3
        [1.0, 1.0],  # 4
        [2.0, 1.0],  # 5
        [1.0, 2.0],  # 6
    ]
)
faces = np.array(
    [
        [0, 1, 4, 3],
        [1, 2, 5, 4],
        [3, 4, 6, -1],
        [4, 5, 6, -1],
    ]
)

If you want the coordinates for every face, you simply do: coords = vertices[faces].
This inserts node -1 (i.e. 6) for the triangles.
But we simply "remove" them with: coords[faces == FILL_VALUE] = np.nan.

Note that this allocates just one temporary array (faces == FILL_VALUE).

For reference, this is how you'd avoid indexing with a fill value:

_faces = faces[faces != FILL_VALUE]  # 1D result
_coords = vertices[_faces]  # 1D result
coords = np.full((*faces.shape, 2), np.nan)
coords[faces == FILL_VALUE, :] = _coords

This allocates 4 temporaries: _faces, _coords, faces != FILL_VALUE, faces == FILL_VALUE

Using 0 temporarily:

_faces = faces.copy()
_faces[face == FILL_VALUE] = 0
coords = vertices[_faces]
coords[face == FILL_VALUE] = np.nan

Doesn't help you one a bit: if the array is empty, it will IndexError all the same.

This is one example, but there are several others where having -1 is very convenient given that we're working with vectorized numpy operations.

Note that any valid Ugrid1d topology has atleast two vertices, and any valid Ugrid2d topology at least three vertices. Hence, -1 will always work.

Worrying about empty topologies is valid, but the check belongs at Ugrid1d/Ugrid2d initialization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're feeling clever / cute, you could add (np.nan, np.nan) as the last vertex. But the UGRID conventions don't specifiy that, they just specifiy a fill value. A fill value of -1 is fully conforming to UGRID conventions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I was under the misimpression that the FILL_VALUE was also used on places beyond the connectivity matrices. Probably should have read the UGRID conventions on this.

@Huite Huite merged commit 4c6239a into main Sep 6, 2024
14 checks passed
@Huite Huite deleted the remove-fill-value branch September 6, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

contourf fails for face variables with tri/quad mixed topologies
3 participants