I use `cursor.fetch_arrow_all()` to return Snowflake data in Arrow format. Sometimes a `where` filter results in the table having no rows. Currently, instead of returning an Arrow table with the correct schema and zero rows, snowflake-connector-python returns `None`.
This is quite undesirable, since one might be, for example, left-joining this table to another table in Python. With the current behavior, the entire schema (column names and dtypes) is lost, which forces a workaround that regenerates the schema whenever a query happens to return zero rows.
The pandas version of that method, `fetch_pandas_all()`, does not return `None`; it returns a DataFrame with each column cast to the highest bit-depth of its dtype (e.g. `float64` for the float type).
The following example reproduces the issue. First install the connector:
```shell
pip install 'snowflake-connector-python[pandas]'
```
```python
import snowflake.connector

con = snowflake.connector.connect(...)

### Example returning rows
query = "select 1::int as foo, 2::float as bar"
cursor = con.cursor().execute(query)
table = cursor.fetch_arrow_all()
print(table)
### CURRENT OUTPUT
# pyarrow.Table
# FOO: int8 not null
# BAR: double not null
# ----
# FOO: [[1]]
# BAR: [[2]]

### Example returning no rows
cursor = con.cursor().execute(query + " limit 0")
table = cursor.fetch_arrow_all()
print(table)
### CURRENT OUTPUT
# None  # <------------------------ This is the issue
### DESIRED OUTPUT
# pyarrow.Table
# FOO: int64
# BAR: double
# ----
# FOO: [[]]
# BAR: [[]]

### Example using fetch_pandas_all returning no rows
cursor = con.cursor().execute(query + " limit 0")
df = cursor.fetch_pandas_all()
print(df)
print(df.dtypes)
### CURRENT OUTPUT
# Empty DataFrame
# Columns: [FOO, BAR]
# Index: []
# FOO      int64
# BAR    float64
# dtype: object
```
What did you expect to see?
As per the example above, I would expect a table with a schema that is as correct as possible. I am submitting a PR with a suggested solution: in the zero-row case, return a table whose schema uses the highest bit-depth for each dtype, since that is what Snowflake provides us (see upcoming PR).
If necessary, I am happy to go through our company's Snowflake support to see this prioritized.
Can you set logging to DEBUG and collect the logs?
github-actions bot changed the title from *Unexpected arrow table behaviour when query returns zero rows* to *SNOW-966003: Unexpected arrow table behaviour when query returns zero rows* on Nov 9, 2023.
Python version
Python 3.11.4 (main, Jun 20 2023, 09:03:38) [Clang 14.0.3 (clang-1403.0.22.14.1)]
Operating system and processor architecture
macOS-14.0-arm64-arm-64bit
Installed packages