Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(experiments): Restore internal user filter for dw #26911

Merged
merged 7 commits into from
Dec 16, 2024

Conversation

danielbachhuber
Copy link
Contributor

@danielbachhuber danielbachhuber commented Dec 13, 2024

See #26332
See https://posthog.slack.com/archives/C06KJQ2RH1A/p1734117119825209?thread_ts=1733936000.594629&cid=C06KJQ2RH1A

Changes

Reverts #26818 and adds some tests

Restores the ability to use the test and internal user filter for data warehouse experiments. As it turns out, it works fine if you set up a join to the persons table.

How did you test this code?

Tests should pass.

@danielbachhuber danielbachhuber requested a review from a team December 13, 2024 20:50
Copy link
Contributor

github-actions bot commented Dec 13, 2024

Size Change: 0 B

Total Size: 1.11 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.11 MB

compressed-size-action

@jurajmajerik
Copy link
Contributor

jurajmajerik commented Dec 16, 2024

As it turns out, it works fine if you set up a join to the persons table.

That's great!

Comment on lines +876 to +887
_create_event(
team=self.team,
event="$feature_flag_called",
distinct_id="internal_test_1",
properties={
feature_flag_property: "test",
"$feature_flag_response": "test",
"$feature_flag": feature_flag.key,
"$user_id": "internal_test_1",
},
timestamp=datetime(2023, 1, 3),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer a clearer test to explicitly verify that this event is excluded from the returned data. It might be better to have a separate method specifically for testing the filtering of test users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How might you suggest constructing the test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

  • Only populate the data warehouse records with rows including the test user id
  • The experiment query should return no result

My point is: we’re creating a test user and an event generated by them. Am I missing something, or what exactly are we doing with this user and event later on? How are we ensuring that this event is excluded from the results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we ensuring that this event is excluded from the results?

It's inferred. These values will change significantly if the event is included in results:

https://github.com/PostHog/posthog/pull/26911/files#diff-36098f224702c73f16f7fce4f65403a8d9f840e079324042dc4dabeac1176f4dL924-R971

The way I went about writing the test was:

  1. Add the data warehouse row and event so the test was failing.
  2. Add the persons table join and filterTestAccounts=True statement so the test is passing with the original conditions again.

I think what you suggest is marginally more explicit, but could also be unexpectedly erroneous in some other ways.

I updated the test to also run the query with filterTestAccounts=false and assert the results: 5ca3fd3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍

@@ -906,7 +946,7 @@ def test_query_runner_with_data_warehouse_series_no_end_date_and_nested_id(self)
)

# Assert the expected join condition in the clickhouse SQL
expected_join_condition = f"and(equals(events.team_id, {query_runner.count_query_runner.team.id}), equals(event, %(hogql_val_8)s), greaterOrEquals(timestamp, assumeNotNull(parseDateTime64BestEffortOrNull(%(hogql_val_9)s, 6, %(hogql_val_10)s))), lessOrEquals(timestamp, assumeNotNull(parseDateTime64BestEffortOrNull(%(hogql_val_11)s, 6, %(hogql_val_12)s))))) AS e__events ON"
expected_join_condition = f"and(equals(events.team_id, {query_runner.count_query_runner.team.id}), equals(event, %(hogql_val_12)s), greaterOrEquals(timestamp, assumeNotNull(parseDateTime64BestEffortOrNull(%(hogql_val_13)s, 6, %(hogql_val_14)s))), lessOrEquals(timestamp, assumeNotNull(parseDateTime64BestEffortOrNull(%(hogql_val_15)s, 6, %(hogql_val_16)s))))) AS e__events ON"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud - how fragile is it to have a test set up like this, where we rely on checking the generated Clickhouse SQL? This could break if the hogql_val names change, even though it wouldn't actually affect our functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct: it does break when the hogql_val names change. I'm fine with that right now, though. It's a brittle test for a brittle piece of code, and I'd like for it to break noisily if anything changes.

I added a more descriptive exception message for anyone who runs into it though: 9fb4210

@danielbachhuber danielbachhuber merged commit 1861698 into master Dec 16, 2024
96 checks passed
@danielbachhuber danielbachhuber deleted the experiments/allow-filtering-internal branch December 16, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants