ENH: Expose `to_pandas_kwargs` in `read_parquet` for pyarrow engine #49236
Comments
xref #34823 for read_csv. |
take |
take |
Adds the `to_pandas_kwargs` parameter to `pd.read_parquet` to allow passing arguments to `pyarrow.Table.to_pandas`. This addresses issues that may arise during Parquet-to-DataFrame conversion, such as handling microsecond timestamps. Fixes pandas-dev#49236
take |
In one of the closed PRs, @WillAyd brought up that he finds this a weird API and that this ties us to the pyarrow API (#57044 (comment)), and it was suggested to update the documentation instead. Like @phofl, I am still +1 on adding this keyword here. It does indeed depend on the API of pyarrow (we could also make the keyword even more specific in name and call it something like
That is not exactly equivalent though (e.g. pandas has different handling of the path (e.g. URL support), sets up some default type mappers (especially relevant now with the string dtype), handles |
Sounds good Joris. My objection was pretty soft, so happy to have this progressed |
I think #59654 is ready if the consensus is to go ahead with this. To me since we are already passing through arguments to |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I want to read a parquet file but have control over how the pyarrow Table is converted to a pandas DataFrame by specifying the `to_pandas_kwargs` argument in the call to `Table.to_pandas()`.
That raises with
The solution, in pyarrow, is to pass `timestamp_as_object=True` in the call to `.to_pandas()`.
Feature Description
Add a new parameter to `read_parquet` (technically just the arrow engine, but adding it here for docs).
Alternative Solutions
Just use pyarrow :)
Additional Context
No response