Docs for pyarrow reader / writer #46
Interesting... I suppose I've never hit that because I'm not often saving data with the embedded Pandas-specific metadata. If you write this table without the Pandas-specific metadata, it loads fine:
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq
# Explicit list<int64> type for the column
list_int = pa.list_(pa.int64())
col = pa.array([[1, 1], [2, 2]], type=list_int)
table = pa.table({'col': col})
pq.write_table(table, "ex.parquet")
df = pq.read_table("ex.parquet").to_pandas(types_mapper=pd.ArrowDtype)
print(repr(df))
print(df.dtypes)
gives:
Maybe the quickest way to unblock ourselves would be to have a helper function for GeoPandas that ignores any embedded GeoPandas metadata?
Oh, sorry, I must have been testing against the
>>> geopandas.read_parquet("items.parquet").head()
works fine, so I think we can ignore that issue.
I'm not 100% sure here, but I thought that I'm getting the same result as wherever you're using
Thanks!
This adds docs for the pyarrow reader / writer.
I think most users will benefit from this being the primary method of producing stac-geoparquet, because it has the best support for nested data.
For analytics, where all the features of a library like geopandas are desirable, I'll want to work on pandas-dev/pandas#57411, which currently prevents (geo)pandas from reading these Parquet files.