-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: funnel actors queries on udf #25013
Conversation
2fce29a
to
13f8a8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything is perfect! I didn't manage to test associated session recordings locally. It's always empty, but it's also empty without UDFs. One potential low-hanging fruit is using the orjson module instead of json since we deal with massive data arrays. When Ansible scripts for UDFs are in place, modules should be easier to use.
@skoob13 Just did a quick benchmark to see what's what. In the benchmark, I parsed a json blob, iterated through a 100 item array and multipled each item by 2, to emulate doing some logic in the UDF, and then printed out the result, 1,000,000 times. Python JSON: 14.8s A little surprised about the node performance, thought it would be a bit faster! Using orjson definitely makes sense, but can you think of an easy way to administer it? Would we have to set up ansible to maintain a virtualenv at each clickhouse node? Or maybe their distro provides a package they can install natively.
Python
JS
Rust
|
if property not in self._extra_event_properties: | ||
self._extra_event_properties.append(property) | ||
|
||
# I think I can delete this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you? 😄
@@ -1,158 +1,44 @@ | |||
# serializer version: 1 | |||
# name: TestFunnelPersons.test_funnel_person_recordings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all the TestFunnelPersons
turned into TestFunnelPersonsUDF
– is this intended here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I put the UDF class in its own file...
And the test_funnel_persons.ambr still has all the TestFunnelPersonsUDF stuff and test_funnel_persons_udf.ambr has TestFunnelPersons in it.
🤯
@aspicer the other day, I was watching this video, which gives a nice overview of basic UDF optimizations. We'll need to run benchmarks for I'm not sure how we can orchestrate venv though. Potentially, we can use another format like Native, which seems to be parseable with the in-built Additionally, it seems that the most significant achievable perf benefit we can achieve with Python is to use Perf is not important right now, so let's postpone this discussion. |
6d5bf56
to
3abc7a5
Compare
3abc7a5
to
ff7d4e7
Compare
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
8980cc6
to
c9fc432
Compare
Move funnel actors queries over to UDF if you have the feature flag on.
Passes the UUID of each event to the UDF. The UDF returns, for each step, a list of event UUIDs that correspond to the funnel matches for each user.
Adds in timing fields used by correlation and user paths.
Also a tiny bit of cleanup of the superfluous leftovers in SQL that were hanging on from the old queries.
Testing
Tested with tests. Added UDF actors query unit tests. Also tested quite a bit on dev.