-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-966003: Fix Arrow return value for zero-row queries #1801
SNOW-966003: Fix Arrow return value for zero-row queries #1801
Conversation
can we add some test here? |
Thanks for the quick response. I've added a quick test to check that the dtype of a integer column in fact is what we expect (the highest bit-depth). Feel free to add or change to this PR. My goal is only to see this behaviour fixed. |
I'm a little worried about seeing this PR go stale like #1275. Is there anything further that I can do to finish this? |
src/snowflake/connector/cursor.py
Outdated
@@ -1117,7 +1117,7 @@ def fetch_arrow_batches(self) -> Iterator[Table]: | |||
) | |||
return self._result_set._fetch_arrow_batches() | |||
|
|||
def fetch_arrow_all(self) -> Table | None: | |||
def fetch_arrow_all(self) -> Table: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sfc-gh-mkeller , do you think this is a API breaking change that previously we return None
now we return an empty Table
.
I'm thinking of introducing a parameter to protect the change.
(name TBD)
def fetch_arrow_all(self, return_table_even_no_data=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could do that and then use typing.overload to express the different return types based on that boolean flag.
I'd love if we had a plan on how to introduce such changes with a timeline to deprecate them over some time. Maybe some warning wit a date'd comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts
- Here's a suggested fix using your ideas:
from typing import overload, Literal
from warnings import warn
from pyarrow import Table
# Temporarily omitting `self` in the methods in order to make it easy to test
@overload
def fetch_arrow_all(return_table_even_no_data: Literal[False] = False) -> None:
...
@overload
def fetch_arrow_all(return_table_even_no_data: Literal[True] = True) -> Table:
...
def fetch_arrow_all(return_table_even_no_data: bool = False) -> Table | None:
if not return_table_even_no_data:
warn(
"The default behaviour of fetch_arrow_all when the query returns zero rows "
"will change from returning None to returning an empty pyarrow table with "
"schema using the highest bit length in version X.X.X.",
DeprecationWarning,
stacklevel=2,
)
# regular code here
foo = fetch_arrow_all(return_table_even_no_data=True) # Table, or "Unknown" if no type stubs for pyarrow are present
bar = fetch_arrow_all(return_table_even_no_data=False) # None
I'm happy to change any of this.
-
You could consider this a bug-fix though. It is a discrepancy between the pandas and pyarrow apis. Depends on how strict you want to be about potentially breaking changes.
-
If you follow semver, then I'd expect that you would only introduce breaking changes with major version change. As a user of the connector, it would be nice to know when one expects breaking changes, even if you don't follow semver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @thomasaarholt, thanks for the proposal, I personally like it. re your third point, yeah.. sometime I feel torn when facing a breaking change bug fix, but the fact is that there might be programs that depend on this "bug" behavior, so even with a small change like this, something can go wrong in users env.
so I feel it would be safer to introduce a parameter for the new behavior.
I'm thinking of a better naming force_return_table
instead of the return_table_even_no_data
(or you have a better idea for the name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help make the code change based upon our discussion? really appreciate your contribution and the discussion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I decided to omit the line containing the X.X.X version entirely.
Changes are implemented. This is ready for review again. |
def fetch_arrow_all(self, force_return_table: Literal[True] = True) -> Table: | ||
... | ||
|
||
def fetch_arrow_all(self, force_return_table: bool = False) -> Table | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shall also update the changelog
thanks! @thomasaarholt I did another round of review and left some comments. |
Thank you! I can't believe I didn't actually implement it...! My two-year-old is making me sleep-deprived. I'm very grateful for your code suggestions. I've incorporated them and updated the docstring with a bit more info and also overloaded the |
Btw, I would encourage you to change the pyarrow import here so that you don't set I considered doing so in this PR, but decided to avoid scope creep 😅 |
hey @thomasaarholt , I like your proposal on the for this PR, I tried to do some small changes on your PR, but looks like I don't have the permission to push to your branch..
I added another two commits which tried to fix lint, update doc: #1832. would you be good if I merge the other PR to avoid you going back and forth tweaking the minor changes? |
That’s fine! Go right ahead! I’ll be happy to make a PR on the Table tomorrow (evening in Europe now). I’ll close this when you merge the other one! |
sounds good @thomasaarholt , I will make sure the PR is merged this week. |
This PR was merged via #1832! Closing. |
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-966003: Unexpected arrow table behaviour when query returns zero rows #1800
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
This PR copies the approach done in Return empty Arrow table instead of None #1275 by @JakobGM, but based on the latest main (I'm sorry @JakobGM, but it was easier to just copy rather than rebase your solution, and I hope you don't mind that), with one more type hint added due to some later commits. That approach itself is based on the current implementation of
.fetch_pandas_all()
.This is a minor code change, copying methodology already implemented by snowflake on the pandas side, but it makes it a lot easier to work with arrow data from the snowflake-connector-python.
Please let me know if anything more is required from me.