Lost timezone info when reading from cache #135
Or, instead of using the iso8601 package, I could just use … I need a drink.
And you deserve one. The question is: what is the desired behaviour if you ask for the cached consumption of 20/04/2016? I agree that you expect to get the data of that day, in local time. Timezones and dates get messed up, as you illustrated in your example above. A commonly used principle is to store everything in UTC. This works for, e.g., hourly time series. However, caching daily data in UTC may have been a bad idea; the date stamps (and data) should be taken according to local time... I guess we can discuss this at grid:camp :-)
You are correct about the desired behaviour; this is exactly what Forecast.io does. It returns its 'daily' report with a timestamp at local midnight. If you store this in UTC you can get a date mismatch, so you'd need to localise it again, but to which timezone? The timezone information has been thrown away. So there are two solutions: tz-aware caching, or storing the desired timezone for each site in the Houseprint (which will bring along a whole different set of issues, I'm sure).
We need to combine both solutions. Timezone is a sensor characteristic. (Email reply to Jan Pecinovsky's comment of Thu, Apr 21, 2016, 11:32 AM.)
The saga continues: a Pandas DatetimeIndex allows either a fully localised timezone (e.g. 'UTC' or 'Europe/Brussels') or a single fixed offset (e.g. +02:00) shared by all timestamps. So when we cache a localised index that spans a change to or from DST, the parser cannot convert it back to a tz-aware index, because the offset changes from +01:00 to +02:00. This might be the final straw for me: I'm inclined to say that we do have to save everything in UTC and add a timezone field to the Houseprint...
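A minimal sketch of the DST problem described above, assuming a recent pandas with timezone data available: an hourly Europe/Brussels index across the 2016-03-27 switch serialises to ISO strings carrying two different offsets, so no single fixed offset fits the whole column. Parsing the strings as UTC and converting to a known zone (the kind of per-site timezone field suggested here) restores the localised index.

```python
import pandas as pd

# Four hourly stamps around the DST switch of 2016-03-27 in Brussels:
# built in UTC, then shown in local time (01:00 +01:00 is followed by 03:00 +02:00).
idx = pd.date_range("2016-03-26 23:00", periods=4,
                    freq=pd.Timedelta(hours=1), tz="UTC").tz_convert("Europe/Brussels")

# A text cache (CSV/JSON) stores one ISO string per row, each with its own offset.
as_text = [ts.isoformat() for ts in idx]
offsets = sorted({s[-6:] for s in as_text})
print(offsets)  # two distinct offsets, so no single FixedOffset fits every row

# Workaround sketch: parse as UTC, then convert to the zone stored for the site.
restored = pd.to_datetime(as_text, utc=True).tz_convert("Europe/Brussels")
print([ts.isoformat() for ts in restored] == as_text)
```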
I have a solution: instead of saving to CSV and having to parse everything, why don't we just save to pickle? That way we're sure the data is read from the cache in exactly the same format as we wrote it in the first place.
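A minimal sketch of the pickle idea, assuming a hypothetical cache file name in a temporary directory: `DataFrame.to_pickle()` and `pandas.read_pickle()` store the index dtype as-is, so a Europe/Brussels index that crosses DST comes back unchanged, with no parsing step. The trade-off is that pickles are Python-only (no Excel) and should only be loaded from trusted sources.

```python
import os
import tempfile

import pandas as pd

# Hourly Brussels-time index that crosses the 2016-03-27 DST switch.
idx = pd.date_range("2016-03-26 23:00", periods=4,
                    freq=pd.Timedelta(hours=1), tz="UTC").tz_convert("Europe/Brussels")
df = pd.DataFrame({"consumption": [1.0, 2.0, 3.0, 4.0]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.pkl")  # hypothetical cache location
    df.to_pickle(path)
    cached = pd.read_pickle(path)

# The timezone survives the round trip, so no re-localising is needed.
print(cached.index.tz)
print(cached.equals(df))
```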
The advantage of CSV is that it also happens to be useful for Excel. (Email reply to Jan Pecinovsky's comment of Thu, Apr 21, 2016, 3:18 PM.)
Caching to JSON shows the same behaviour as caching to CSV. I expect every format where Pandas has to do some parsing to behave the same way.
TL;DR: There is a bug in iso8601. It has been fixed, but the fix hasn't been published on PyPI yet.
Here I go again with the long story of a wild bug chase:
By default, `pandas.read_csv()` does not "really" support timezone-aware timestamps. I'm using double quotes because Pandas does convert them to UTC, but returns them as timezone-naive. In `caching.py`, @saroele has added the line `df.index = df.index.tz_localize('UTC')`, so technically you get the correct time back, albeit in UTC instead of the timezone you used when you did the caching.

I'm caching weather data per day; my timestamps look something like `2016-04-20 00:00:00+02:00`, and after caching they look like `2016-04-19 22:00:00+00:00`. This gives me really stupid errors and a lot of headaches when I try to compare dates.

The solution is easy: instead of using the default pandas date parser, you can set the `date_parser` argument of `pandas.read_csv()` to use the iso8601 library, like so: `pandas.read_csv(date_parser=iso8601.parse_date)`. This works great: it parses the timezone-aware timestamp and uses a `FixedOffset` to represent the timezone.

However, when I try `.truncate()` or `.loc[]` on the resulting frame, iso8601 gets stuck in an infinite loop... a bug that was fixed on 2015-11-18 and will be included in the NEXT RELEASE!

So I'm writing all this down so that I don't forget to check the iso8601 PyPI page someday to see whether they have released version 0.1.12... In the meantime I'll figure out some workaround... bleeurg, I hate timezones.