-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Round trip tests & various fixes #42
Conversation
IMO, no. I think that would inevitably lead to the ambiguities you mentioned around not wanting to remove nulls that were originally there.
Agreed. As long as it's valid (according to STAC) I'm OK with any formatting difference, as long as we document any potential precision loss issues, if there are any.
|
Is conversion between dict and Or maybe it would be better to create a new type for |
I think either that, or if we do make an overload for dict then it should just do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Looks good overall. Just two small questions on the assert equal definitions.
tests/test_arrow.py
Outdated
Raises: | ||
AssertionError: If the two values are not equal | ||
""" | ||
if isinstance(result, (list, tuple)) and isinstance(expected, (list, tuple)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care whether these should be considered equal?
assert_equal([1, 2], (1, 2))
IIUC, the current implementation considered them equal. If that's deliberate, might want to add it to the documented allowed variations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the JSON parser only creates lists, not tuples, so we can remove the tuples from the check here, if that's preferred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separate the two checks in f50f64c
(#42). Does that look ok to you?
key_name: str, | ||
) -> None: | ||
"""Compare two JSON numbers""" | ||
assert abs(result - expected) <= precision, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can nan
get to this point, and if so will this fail since nan != nan
by definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think JSON was able to store NaN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be not standardized, but used?
In [1]: import json
In [2]: json.dumps({"a": float("nan")})
Out[2]: '{"a": NaN}'
In [3]: json.loads(_)
Out[3]: {'a': nan}
The Python docs mention
It also understands NaN, Infinity, and -Infinity as their corresponding float values, which is outside the JSON spec.
I'd say if the JSON parser that we're using supports NaN (and IIUC this code path should only be hit by JSON that was parsed by pyarrow) then let's add code to handle NaN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
85b17ac
(#42) Allows NaN equality
One thing I noticed about the implementation, which is selecting bbox fields by position. Might need to update that based on the discussion in opengeospatial/geoparquet#202. |
Fixed in |
Thanks! |
Change list
Adds manual checking for recursive dict-equality, which allows for some variations between the expected and result dicts:
Adds STAC items from 12 different collections (chosen mostly at random) from planetary computer.
Support 3-dimensional bounding boxes
Test float bbox downcasting with allowed precision; test bbox round trip is exactly equal with no downcasting.
Cast null timestamp columns to
timestamp[millisecond]
type.Closes #39
Original PR description:
Creating this as a stub to ask about some behavior in the round tripping.
In particular, the round tripping is working except for this diff:
That is, because we coerce to a columnar representation, by default, pyarrow generates null values for any keys that were not originally present. Should we add a step to remove any null value? One tricky part is that we don't want to remove any field that was originally null. Essentially Arrow/Parquet have a
null
but not also anundefined
. So when converting Arrow -> Parquet we can't necessarily know thatnull
was not present in the original JSON. The key could've existed and been set tonull
.The other diff here is in the datetime string formatting. Both of these date strings conform to ISO 8601/RFC 3339, so it's probably ok to leave the output string as is?
Should we write more complex dict-equality logic in the tests? We could allow:
downcast=True
into_arrow._convert_bbox_to_struct
.