Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: re2 dot all flag for non-hogql filters #19487

Closed
wants to merge 7 commits into from

Conversation

pauldambra
Copy link
Member

@pauldambra pauldambra commented Dec 21, 2023

we follow re2 regex syntax and so does ClickHouse except
For example, the string a\nb shouldn't match the pattern a.b, but it does in CH
this is because According to the re2 docs, the s flag is false by default,
but in CH it seems to be true by default.
prepending (?-s) to the regex string will make it work as expected
see ClickHouse/ClickHouse#34603


we found this while investigating queries hanging here https://posthog.slack.com/archives/C03PB072FMJ/p1703123107173919

We don't completely understand why this fixes it but in the reproduction case we had the query would hang forever (eventually killing the node) but not when prepending (?-s) to the query

I think this patches regex for both non-HogQL property filters and HogQL queries


in our docs we explicitly say dot won't match newlines https://posthog.com/tutorials/regex-basics#the-basics-of-regex so it should be safe (or at least more correct) to change this behaviour for all filters

@fuziontech
Copy link
Member

fuziontech commented Dec 21, 2023

this looks right....let me tag an expert though @EDsCODE 👀

elif operator == PropertyOperator.regex:
return ast.Call(name="match", args=[field, ast.Constant(value=value)])
return ast.Call(name="match", args=[field, ast.Constant(value=f"(?-s){value}")])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely looks correct.

# but in CH it seems to be true by default.
# prepending (?-s) to the regex string will make it work as expected
# see https://github.com/ClickHouse/ClickHouse/issues/34603
"v{}_{}".format(prepend, idx): f"(?-s){prop.value}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure this is right

@pauldambra
Copy link
Member Author

@mariusandra for the HogQL assist too?

Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a regex expert but based on the issue this looks right. Maybe worth having a test

@fuziontech
Copy link
Member

Closing this one up since we won't be landing it (it breaks regex elsewhere in the app)

@fuziontech fuziontech closed this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants