Add test data covering different native (geoarrow-based) encodings #204
Just a few notes! (I haven't checked the actual files yet)
test_data/data-linestring.wkt
These look like .csv files to me...should they have a .csv extension?
geometries_wkt = [
    "POINT (30 10)",
    "POINT EMPTY",
]
Should there also be a version of these that contain NULLs or a version that contains Z values?
Added some null values!
test_data/generate_test_data.py
geometries = pa.array(
    [[(30, 10), (10, 30), (40, 40)], []], type=linestring_type)
Maybe it would be less error-prone to construct these geometries from the WKT? Then we don't have to manually check that the WKT is equivalent to here.
Or maybe it would be better to use GeoJSON and pyarrow's JSON reader to infer geometry columns? Though then you'd need to convert from interleaved to separated coords, which is a bit involved for this test data here 😕
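A minimal sketch of the WKT-driven idea, assuming only 2D LINESTRING input. The helper name `linestring_wkt_to_coords` is hypothetical, and a real script would more likely rely on shapely than on hand-written parsing. It only shows that the coordinate tuples can be derived from the same WKT strings rather than typed out twice:

```python
import re


def linestring_wkt_to_coords(wkt):
    """Parse a 2D LINESTRING WKT string into a list of (x, y) tuples.

    Hypothetical helper for illustration; only handles 2D LINESTRING
    and LINESTRING EMPTY.
    """
    text = wkt.strip()
    if re.fullmatch(r"LINESTRING\s+EMPTY", text, flags=re.IGNORECASE):
        return []
    match = re.fullmatch(r"LINESTRING\s*\((.*)\)", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a 2D LINESTRING: {wkt!r}")
    coords = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # exactly two ordinates per coordinate
        coords.append((float(x), float(y)))
    return coords


geometries_wkt = ["LINESTRING (30 10, 10 30, 40 40)", "LINESTRING EMPTY"]
geometries_coords = [linestring_wkt_to_coords(w) for w in geometries_wkt]
```

The resulting nested list has the same shape as the hard-coded input to `pa.array(..., type=linestring_type)`, so the WKT file and the Arrow data would come from a single source.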
Yeah, I actually tried to automate writing this hard-coded data a bit, by parsing the WKT with shapely and converting to GeoJSON. The main problem is that then I need to convert the inner lists to tuples (as json is only lists), to have the construction as a separated struct type work with pyarrow (a tuple element can be converted to a struct, but a list element not). And that gives some convoluted logic ..
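For illustration, the list-to-tuple step described above could look roughly like this. It is a sketch only: the helper name `coords_to_tuples` is hypothetical, and it assumes GeoJSON-style nested coordinate lists where the innermost level is a list of numbers:

```python
def coords_to_tuples(coords):
    """Turn innermost [x, y] lists into (x, y) tuples, keeping outer lists.

    Hypothetical helper: the outer nesting levels (rings, parts) stay
    lists, only the coordinate itself becomes a tuple.
    """
    if coords and all(isinstance(c, (int, float)) for c in coords):
        return tuple(coords)
    return [coords_to_tuples(c) for c in coords]


# GeoJSON-style coordinates of a linestring and of a polygon with one ring
line = coords_to_tuples([[30, 10], [10, 30], [40, 40]])
polygon = coords_to_tuples([[[30, 10], [40, 40], [20, 40], [30, 10]]])
```

An empty coordinate list stays an empty list, which is fine for an empty linestring but ambiguous for an empty point, so that case would still need separate handling.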
cc @rouault would this (have been) useful for testing in GDAL? (xref #189 (comment))
Yes, thanks.
I've noticed a typo: geometry_type should be geometry_types. The generated files should be regenerated.
Ah yes, that's because I started from #134, which I wrote before we made that name change. Thanks for the catch!
I'm going to go ahead and merge this in, as it's been open and approved for a while and seems like a good addition. Any more work can be done as PRs to main.
@jorisvandenbossche - The In addition, it would be nice to pull the version identifier out of the I'll leave the same comment on #232 (as I think it would be nice to consolidate the examples/test data into one thing that is easier to maintain).
Yes, I realized last week with the release I suboptimally hardcoded the number here in this PR. Will try to look at it tomorrow.
Addressing part of #199 (small example data)
Similar to #134, but focusing here on the new geoarrow-based encodings (I should also update that older PR).
This adds some tiny example data for the different geometry types:
In #134, I also saved the metadata in a separate file, which makes it easier to validate and review that it is correct. I could do the same here if that is useful.
(I should still add validation of the generated files against the JSON schema.)
Does this look useful in this form? Any suggestions to approach it differently?