Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(web-analytics): Turn timezones into UTC offsets instead of timezone names #26409

Merged
merged 4 commits into from
Nov 26, 2024

Conversation

rafaeelaudibert
Copy link
Member

Timezone names on their own aren't that useful because they're basically a harder-to-use version of displaying events by city/region. Displaying them by offsets, however, can prove to be valuable. We're doing some magic to calculate the offset instead of simply using something like timezoneOffset because toString is one of the only functions in Clickhouse that allows us to use a dynamic timezone versus a hardcoded one. This might have some performance implications, we'll figure it out soon :). To workaround that, we're converting to a string, then parsing that back to a date, and getting the unix timestamp. Comparing the UNIX timestamp in a specific timezone versus the value at UTC allows us to get the offset at that day.

See #26376

Changes

image

Does this work well for both Cloud and self-hosted?

Yep

How did you test this code?

Comprehensive unit test + manual UI testing

properties={"$session_id": session_id, "$pathname": f"/path{i}", "$timezone": distinct_id},
)

with self.assertRaisesRegex(Exception, "Cannot load time zone"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable error?

@@ -1006,50 +1006,67 @@ def test_cohort_test_filters(self):
assert results == [["/path1", 1, 1]]

def test_timezone_filter(self):
d1, s1 = "d1", str(uuid7("2024-07-30"))
d2, s2 = "d2", str(uuid7("2024-07-30"))
date = "2024-07-30"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me to be suuuper confidant this works, I'd like to see some tests that cover before / after the daylight savings change for the same timezone

Copy link
Member Author

@rafaeelaudibert rafaeelaudibert Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added it, good call

3a98085

Copy link
Contributor

github-actions bot commented Nov 26, 2024

Size Change: 0 B

Total Size: 1.16 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.16 MB

compressed-size-action

# Get the difference between the UNIX timestamp at UTC and the UNIX timestamp at the event's timezone
# Value is in milliseconds, turn it to hours, works even for fractional timezone offsets (I'm looking at you, Australia)
return parse_expr(
"if(isNull(properties.$timezone), NULL, (toUnixTimestamp64Milli(parseDateTimeBestEffort(assumeNotNull(toString(timestamp, properties.$timezone)))) - toUnixTimestamp64Milli(timestamp)) / 3600000)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test for key not present, value is NULL, and value is an empty string, I think you might need some extra logic to handle the empty string.

Copy link
Member

@robbie-c robbie-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one more test IMO but happy for this to be a follow-up :)

Timezone names on their own aren't that useful because they're basically a harder-to-use version of displaying events by city/region. Displaying them by offsets, however, can prove to be valuable. We're doing some magic to calculate the offset instead of simply using something like `timezoneOffset` because `toString` is one of the only functions in Clickhouse that allows us to use a dynamic timezone versus a hardcoded one. This might have some performance implications, we'll figure it out soon :). To workaround that, we're converting to a string, then parsing that back to a date, and getting the unix timestamp. Comparing the UNIX timestamp in a specific timezone versus the value at UTC allows us to get the offset at that day.

See #26376
@rafaeelaudibert rafaeelaudibert force-pushed the 26376-turn-timezones-into-utc-offsets-instead branch from 4a43be8 to d6361fd Compare November 26, 2024 15:45
@rafaeelaudibert rafaeelaudibert merged commit c0652cf into master Nov 26, 2024
96 checks passed
@rafaeelaudibert rafaeelaudibert deleted the 26376-turn-timezones-into-utc-offsets-instead branch November 26, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants