Pandas DateTime column timezone incorrectly converted at insertion #380
Comments
Is this similar to #257?
This issue causes incorrect data to be stored in the database, while #257 is only about the read-back format. With the #288 fix, this issue still exists. Btw, I don't see that #288 changes the query results of
Okay. Does a pure insert and select without pandas lead to incorrect data?
Inserting plain Python datetimes seems fine.
import datetime
import pytz

tz = pytz.timezone('America/Chicago')
# Timezone-aware datetimes localized to the column's timezone
plain_data = [
[tz.localize(datetime.datetime(2023, 6, 1, 11, 28, 5, 661537))],
[tz.localize(datetime.datetime(2023, 6, 1, 11, 28, 6, 334573))],
[tz.localize(datetime.datetime(2023, 6, 1, 11, 28, 7, 821988))],
]
client.execute('INSERT INTO debug_tbl (ts) VALUES', plain_data)
Describe the bug
When inserting a pandas dataframe into a ClickHouse table where a DateTime column is defined with a timezone other than UTC, timezone-aware datetime columns in the dataframe are incorrectly localized and inserted as UTC time. The timestamps actually stored in the database are shifted from the original timestamps.
This could be due to incorrect timezone localization and conversion in apply_timezones_before_write. The items returned from block.get_column_by_index should be UNIX timestamps and are always in UTC, but pd.to_datetime(items).tz_localize(timezone) incorrectly localizes the timestamps to the timezone defined in the table column ('America/Chicago' in the example below), and then converts them back to UTC. If these two parts are commented out as below, the problem is gone:
The workaround works with DateTime table columns defined in UTC or in other timezones.
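To illustrate the suspected mechanism in isolation, here is a minimal pandas-only sketch. It is not the driver's code: it assumes that the values reaching the write path are timezone-naive and already expressed in UTC, as described above, and it reuses the first timestamp from the plain-Python example in the comments. The variable names are hypothetical.

```python
import pandas as pd

# Timezone-aware source value, as it would appear in the dataframe.
source = pd.Series(
    pd.to_datetime(["2023-06-01 11:28:05.661537"]).tz_localize("America/Chicago")
)

# Assumption: the column values arrive as naive datetime64 values in UTC.
items = source.to_numpy(dtype="datetime64[ns]")

# Suspected faulty path: treat the UTC values as Chicago wall-clock time,
# then convert to UTC. This applies the UTC offset a second time.
shifted = pd.to_datetime(items).tz_localize("America/Chicago").tz_convert("UTC")

# Interpreting the same values as UTC preserves the original instant.
correct = pd.to_datetime(items).tz_localize("UTC")

offset = shifted[0] - correct[0]
print(offset)
```

Under that assumption the stored instant ends up ahead of the source by the column timezone's UTC offset, which is 5 hours for America/Chicago in June.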
To Reproduce
The returned query results are:
But it should return
On the command line, clickhouse-client shows that the inserted data is not correct:
Expected behavior
The timestamps in the query result above are expected to be the same as those in the source dataframe.
Versions