-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date #991
Comments
The data set used is DCSKINPRODUCT.sas7bdat from SAS-Visual-Forecasting---sample-data-sets -- Query Snowflake to examine the invalid dates
select to_varchar("DATE") from dcskinproduct limit 20;
43113053-09-24 17:00:00.000
43132219-01-30 16:00:00.000
43151384-06-05 17:00:00.000
43170549-10-11 17:00:00.000
43189715-02-16 16:00:00.000
43208880-06-22 17:00:00.000
43228045-10-28 17:00:00.000
43247211-03-05 16:00:00.000
43266376-07-10 17:00:00.000
43285541-11-15 16:00:00.000
43304707-03-23 17:00:00.000
43323872-07-27 17:00:00.000
43343037-12-02 16:00:00.000
43362203-04-09 17:00:00.000
43381368-08-13 17:00:00.000
43400533-12-19 16:00:00.000
43419699-04-25 17:00:00.000
43438864-08-30 17:00:00.000
43458030-01-05 16:00:00.000
43477195-05-13 17:00:00.000 The data type datetime64[ns] is stored internally as an int64 nanoseconds timedelta as of 1970-01-01. Given the Year values Snowflake has here, my suspicion is that the write_pandas is evaluating as if the timedelta were in milliseconds. >>> df['DATE'][0], df['DATE'].astype('int64')[0]
(Timestamp('2013-02-10 00:00:00'), 1360454400000000000) select to_varchar(timestampadd('milliseconds', 1360454400000000000, '1970-01-01'::timestamp));
43113053-09-25 00:00:00.000
select to_varchar(timestampadd('nanoseconds', 1360454400000000000, '1970-01-01'::timestamp));
2013-02-10 00:00:00.000 |
For now, I have the following workaround in place, but this should not be required. # Map datetime columns to string (needed because of bug in write_pandas)
for column in df.columns.to_list():
if is_datetime64_dtype(df[column]):
df[column] = df[column].dt.strftime('%Y-%m-%d %H:%M:%S.%f') |
linking snowflake-connector-python#600 and comment. New release of |
closing this issue as |
When will this paramter be available in Snowpark? The new parameter is not available in Snowpark for Python version 1.9.0 or 1.10.0 snowflake.snowpark.Session.write_pandas. >>> session.write_pandas(df, target_table, use_logical_type=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Session.write_pandas() got an unexpected keyword argument 'use_logical_type'
>>> from snowflake.snowpark.version import VERSION as snowpark_version
>>> print(snowpark_version)
(1, 10, 0) |
It is coming in the next release. |
My bad. I thought this was snowflake-connector-python issue |
BUMP- I'm currently running into related issues, happy to open a new issue but I assume this one is still open for a good reason. Thanks in advance for any eyes here. Edit:
I'm successfully passing
As a user, this is hard for me to triage because the staged parquets are difficult for me to peek into. In my opinion, the write_pandas command should not be able to write something like Is there a good workaround for this? My current fix is just to avoid using |
Please answer these questions before submitting your issue. Thanks!
What version of Python are you using?
Python 3.10.5
What operating system and processor architecture are you using?
Windows-10-10.0.19044-SP0
What are the component versions in the environment (
pip freeze
)?pipfreeze.txt
What did you do?
Expected Pandas datetime column to be written to Snowflake as a TIMESTAMP_NTZ. Dates are not loading properly.
sas2snow2.log
The text was updated successfully, but these errors were encountered: