-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aw-transform: Add union_events_split #179
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #179 +/- ##
===========================================
+ Coverage 46.30% 59.97% +13.67%
===========================================
Files 51 44 -7
Lines 6148 4765 -1383
Branches 1454 0 -1454
===========================================
+ Hits 2847 2858 +11
+ Misses 2442 1907 -535
+ Partials 859 0 -859 ☔ View full report in Codecov by Sentry. |
d409bbb
to
2e5e4fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only taken a quick look but looks alright.
However, as we discussed AFK, I'm not convinced this is the best approach to merging the web events compared to simply replacing all browser-window events when an active browser event exists, but that has its own issues (like having to inject the browser appname, slightly changing the title by removing the appended - Mozilla Firefox
, and maybe more).
|
||
'event1: for mut event1 in events1 { | ||
let event1_endtime = event1.calculate_endtime(); | ||
'event2: for event2 in events2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a nested loop one might want to step through each list under certain conditions (similarly to how we do it in aw-server-python for some transforms: e1_i++
and e2_i++
).
The borrow checker would probably hate that though, and I guess the timestamp checks are pretty fast despite the worst-case O(N^2)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the borrow checker would have any issues with that, you could just use iterators which should work I think.
So like the opposite of filter_period_intersect and then concatenating window and browser events? |
@johan-bjareholt I think something like: window_events = query(...) # query and do AFK filtering
for browser in browsers: # this for-loop would need to be expressed in the function constructing the query
web_events = query("aw-watcher-web-$browser")
window_events_browser = filter_by(window_events, {app: $browserAppname}) # can't remember what transform is used to do this
web_events_active = filter_period_intersect(browser_events, window_events_browser)` # filter web events by browser being the active window
web_events_active = amend_data(web_events_active, {app: $browserAppname}) # optional, amends the missing {app: $browserAppname}
window_events = union(web_events_active, window_events) # picks events from web_events_active first, then fills rest with window_events Not sure if we have something like
A complete end-to-end example would be:
|
I think this would be a strange transform, inserting data into an event which is hardcoded in the query and does not come from a bucket feels like something we should try to avoid. One reason would be because it encourages people to write dynamically generated queries like we do in the web-ui which is something we don't want.
To call that "union" is very misleading in my opinion. A union should be a common ground, not exclude anything. |
What this transform does is essentially this, but it also injects the browsers appname. See the example in the code
So if we put that into perspective of window events and browser events already filtered with the browser window:
Which is pretty much exacly what you just wrote? |
For sure, I was just unsure what to call it.
I agree in general, but in this case the data does come from a bucket (and doesn't actually introduce new hardcoding, since we already have our list of But on second thought I realize it'll still be messy, due to there being multiple possible appnames for each browser.
I thought it would become:
Basically what I'm trying to get rid of splitting events into two. Edit: Ah nvm, your example would indeed become as you wrote after filtering the browser events. But it would still lead to things like this, no?:
So if I'm not mistaken, this would result in 'middle-events' where the title (which is gotten from the window event, if I understood Edit 2: I'm not sure, but maybe this could be resolved by using non-flooded window-events for the Maybe that would lead to:
Which after flooding would become the correct:
(or something similar, depending on flooding strategy) This might require a lot of extra memory though, since we'd need both Edit 3: Regardless, I'd be happy to merge this if there were more comprehensive tests (for example, checking that these 'middle-events' get created as expected). Unless we can come up with a neat solution to the problem (which I'm no longer sure there is) I think it's better to just merge this in the meantime, and work on perfecting the transforms/queries later (as this'll probably work good enough). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second review, now that I understand things better.
|
||
/* test non-object conflict, prefer map1 value */ | ||
// TODO: This does not work yet! | ||
// It should be a pretty rare use-case anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wouldn't be rare? Both window-events and web-events have a title
?
duration: Duration::seconds(3), | ||
data: json_map! {"test": json!(1)}, | ||
}; | ||
let mut e2 = e1.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have more realistic data here, for example:
For e1: {app: "firefox", title: "google - firefox"}
For e2: {title: "google", url: "google.com"}
I'd also like event series that are at least two events long (like the examples I recently commented about). Would also help with ensuring behavior stays consistent when we eventually remove the nested loops.
That's a very good point which I thought of before but forgot.
That sounds like a very clever way of solving it, will try that out.
I took a deep dive into the query code and it seems like you are correct here. Here's an example of how we would have to change our transforms
The query language is very inefficient with its assignments, every time we assign something it gets cloned every time it's used afterwards because we do not know if the variable will be used afterwards or not and we need to guarantee that the variable won't change. So previously when we just called But I think that the impact of just one more clone would be minimal compared to the whole issue we have today with #119, there are lots of more unnecessary clones than just this. |
Here's a query that works with this transform, the web-ui queries.ts code is really messy though so I'm having a hard time adapting it, especially now that "canonicalQuery" should probably include browser events.
|
2e5e4fd
to
77eef3d
Compare
Hi! Seems that all major problems have been solved in above discussions. What's currently blocking this PR? |
9d93857
to
096da6d
Compare
One step in fixing the "Merge web watcher events into window events in the query? (to allow for classifying by url/domain)" ActivityWatch/aw-webui#151. We already have it in the same query, but the data is not merged.
Hopefully this will be a good transform to merge window and browser data in that manner.