Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ActivityWatch to Ignoring / Filtering #302

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

NathanaelA
Copy link

@ErikBjare @johan-bjareholt - Actually when it came to implementing it -- several things changed from my initial idea that I proposed in ActivityWatch/activitywatch#1

  1. I ended up using the brand new KeyValue table in 0.12.0, So basically in the keyvalue table you can add a new key called IGNORE_FILTERS with the json value of: {"APPS": ["app1", "app2", ...], "TITLES": ["title1", "title2", ...]}

  2. The DataStore module on startup loads this JSON key, parses it into two vectors that it then uses during inserts and replace functions on the event table. If any of the values you put into the TITLES key sub-string matches the Event.title it skips adding it to the database. Ditto with the Event.app against the APPS vector. This is a substring match so if you put "BRAV" in the APPS then it will filter out any app that contains BRAV anywhere in the app name. (i.e. BRAVE, BRAVO, ABRAVING, etc...

At this point I DO NOT have any GUI to add this value, you just use SQLite (or perhaps a POST to / which acts like a KeyValue "set") but at this point the process works.

@raudabaugh
Copy link

raudabaugh commented Nov 5, 2022

Thanks for this! It's working really well for me. :)

Some things to consider:

  1. aw-watcher-window has additional filtering logic specific to macOS/Chrome here. I think it makes sense to drop such platform-specific cases and keep filtering responsibility solely within aw-server-rust.

  2. For my use case, I'd prefer that AW still adds the events to the database while redacting the title. I implemented it this way here.

Copy link
Member

@johan-bjareholt johan-bjareholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using the keystore to save the ignores for titles and apps. But there are some other rough edges that needs to be addressed.

@@ -1,3 +1,4 @@
#![feature(option_result_contains)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add any more nightly features, our aim is for aw-server-rust to be able to run on stable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what you are referring too, using XXX.contains afaik, is what caused the lib.rs to be updated to add #![feature(option_result_contains)]. And not sure how that stops this from being run on "Stable" -- I've been running this version of the server since I pushed my PR...

Is there a better way to see if a string contains another string that won't trigger rust to auto-add that [feature] flag to the lib.rs file if this is an actual issue that I'm not understanding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With stable I mean "rust stable", the toolchain version. We currently require rust nightly because we have one dependency which requires it (which we have plans to get rid of).

I'm not sure what you mean that rust "auto-adds" this to lib.rs, rust itself can't change the source so either you have added it youself or you have a really invasive IDE or something.

I have not used contains before, but it seems to just be syntatic sugar. You just need to add another "match" to unpack the option/result, and if it was some/ok you can just do "x == y" as usual.

Is there a better way to see if a string contains another string

That's what String::contains does, this feature is in regards to Result::contains or Option::contains. So it's possible you have a bug here in case you are not using String::contains, since the other two checks for full equality rather than for a "substring".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this was added by my IDE then. When I have time, I'll try and remove the feature flag and see what happens, maybe the IDE guessed wrong... 😀


I'm not following you completely on the potential .contains replacement. This isn't a == situation.

The .contains() is the code here:
if app.contains(&key) { and here if title.contains(&key) {
Basically it is equivalent to Javascript title.indexOf(key) !== -1 where we are seeing if this specific key value is part of the app name or title. This way if the key says "BROWSER" then if you are running an app called "INTERNET BROWSER" or "BROWSER OPERA" or "THE BROWSER IS AWESOME" all three would be ignored because it found that the string had BROWSER in it somewhere... It is does string contain another string. I'm not a rust expert by any means, but if you have a better way to see if string A contains string B then I'm willing to switch to it... 😀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible to continue using contains as-is in the code here and just remove the nightly flag.

first_init,
db_version,
};
ds.get_stored_buckets(&conn)?;
ds.get_stored_ignores(&conn)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is only done on startup? So if you ever change the list of apps/titles to ignore, you have to restart aw-server? Does not seem optimal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a prototype to see how well it works. I'm using it great and my list never changes so far. I suspect just like in @raudabaugh code modification the list is generated normally once and your good to go. This list should rarely change... However, I agree that if we add some UI in front, that somehow we need to trigger the reload of the values on its change...

In addition I do believe Sam has a valid use case of rather than Ignoring the entries to use Redacted so this code should be enhanced to have Redacted_apps and Redacted_titles in addition to ignored as both are valid use cases.

@@ -459,7 +490,32 @@ impl DatastoreInstance {
)))
}
};
for event in &mut events {

'event: for event in &mut events {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be the datastores responsibility to ignore apps/titles imo. It should not manipulate any data. I think it's better to have this logic in the rest endpoint for insert_event and heartbeat.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to re-look but I could have sworn there was more than just insert_event and heartbeat that could trigger the insert, which is why I was trying to consolidate the code only in two paths rather than have multiple places for the code... But maybe I miss read the code and only insert_event and heardbeat call these two datastore code paths...

@@ -573,6 +629,26 @@ impl DatastoreInstance {
bucket_id: &str,
event: &Event,
) -> Result<(), DatastoreError> {

let app_exists = event.data.get("app");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste of the same code above. Better to put it in a generic function. Unittests also wouldn't hurt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this should be made into a generic gunction -- the second path was because in my original implementation I found there was the second path that also created data entries (heartbeat)... So I duplicated the code to verify it would cover all the cases... Revamping the code is fine and I will check into moving this up to a higher layer, I just want to make sure I don't miss any code paths that plugins or other entities can call that will generate this data....

@real-or-random
Copy link

@NathanaelA Are you planning to continue working on this?

@NathanaelA
Copy link
Author

@real-or-random when I have some of that spare time that people talk about... It is an low priority as this PR works great for my use-case.

@2e3s
Copy link
Contributor

2e3s commented Oct 1, 2023

I made a similar feature in Linux Idle&Window watcher, but with regexp replacements. Given so many various watchers for ActivityWatch, it seems that the general rule may better be to keep watchers choosing what to send rather than implement the filtering and processing to make the server decide on what to store in DB.
+ This would also give extra security, because less unwarranted data travel through the network.
+ Also it allows to preview easily the private data without storing it on the server (if not sure of what app ID and title will be used).

@johan-bjareholt
Copy link
Member

Given so many various watchers for ActivityWatch, it seems that the general rule may better be to keep watchers choosing what to send rather than implement the filtering and processing to make the server decide on what to store in DB.

That's the big reason why implementing it on the server would make sense, all watchers could get support for filtering without having to modify all of them to support this feature. If you would want to support filtering in all watchers separately, we would have to implement this feature at least 3 times (once for aw-client-python, once for aw-client-rust, once for aw-client-js) as well as maintaining it.

I'd also argue that the previewing would be more user friendly if done on the server-side, as then you could easily support that in the web-ui.

@real-or-random
Copy link

real-or-random commented Oct 2, 2023

If you would want to support filtering in all watchers separately, we would have to implement this feature at least 3 times (once for aw-client-python, once for aw-client-rust, once for aw-client-js) as well as maintaining it.

I agree, and let me add having separate filters could be very surprising to user. And surprises are particularly bad when it comes to privacy features. For example, suppose the user has configured some filters, but then adds a new watcher, and suddenly the filters are not effective anymore.

edit: I still can imagine that some watchers would filter certain entries by default (e.g., in a browser watcher, one could ignore websites visited in private mode). But applying manual filters should be done on the server side.

@2e3s
Copy link
Contributor

2e3s commented Oct 2, 2023

all watchers could get support for filtering without having to modify all of them to support this feature

Don't all watchers use their own keys? I clicked a couple random ones from https://docs.activitywatch.net/en/latest/watchers.html and looks like it. This code in PR looks specifically for app and title which is a realm of the window watcher, which is pretty much the only one.

I'd also argue that the previewing would be more user friendly if done on the server-side

It would be more GUI oriented indeed, but the event would be stored on the server then, while the whole point of the feature is to prevent that. For instance, I don't know how bitcoin wallet is reported in terms of app ID and title, and I don't want to keep in the history the usage of it. Client-based preview is trivial, but server-one would be not.

@johan-bjareholt
Copy link
Member

but the event would be stored on the server then, while the whole point of the feature is to prevent that. For instance, I don't know how bitcoin wallet is reported in terms of app ID and title, and I don't want to keep in the history the usage of it.

It would be stored the first times when you test your filter yes, but since you then have a filter that works you could easily use that same filter to remove those events from the server.

I think it's important that we try to keep as many features as possible accessible through a GUI. We want to avoid excluding users just because they are not tech savvy when possible.

Don't all watchers use their own keys? I clicked a couple random ones from https://docs.activitywatch.net/en/latest/watchers.html and looks like it. This code in PR looks specifically for app and title which is a realm of the window watcher, which is pretty much the only one.

I agree that this is not desirable how it works in this PR and should be made more generic.

dependabot bot and others added 7 commits December 2, 2023 17:06
Bumps [dirs](https://github.com/soc/dirs-rs) from 4.0.0 to 5.0.1.
- [Commits](https://github.com/soc/dirs-rs/commits)

---
updated-dependencies:
- dependency-name: dirs
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [rusqlite](https://github.com/rusqlite/rusqlite) from 0.28.0 to 0.30.0.
- [Release notes](https://github.com/rusqlite/rusqlite/releases)
- [Changelog](https://github.com/rusqlite/rusqlite/blob/master/Changelog.md)
- [Commits](rusqlite/rusqlite@v0.28.0...v0.30.0)

---
updated-dependencies:
- dependency-name: rusqlite
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [fancy-regex](https://github.com/fancy-regex/fancy-regex) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/fancy-regex/fancy-regex/releases)
- [Changelog](https://github.com/fancy-regex/fancy-regex/blob/main/CHANGELOG.md)
- [Commits](fancy-regex/fancy-regex@0.11.0...0.12.0)

---
updated-dependencies:
- dependency-name: fancy-regex
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…SYNC_DIR` env var (ActivityWatch#457)

* docs(sync): improved usage instructions, updated docs to mention sync dir config options

* docs: added global `--sync-dir` cli param and `AW_SYNC_DIR` env var
ErikBjare and others added 13 commits May 13, 2024 13:10
…sform (ActivityWatch#484)

* Replace rocket to url for URI parsing

Url crate is most popular and basic for the task.
Rocket version in aw-transform is also out of sync with aw-server,
which may cause a conflict elsewhere.

* Easen dependency version

Reqwest specifies a particular version which may be a problem.
Currently, services dependent on aw-server.service (e.g. aw-awatcher)
start just after its launch.

This patch causes systemd to launch the dependent services only after
the rocket server is up an running.

It's aimed to help slow systems to not close the dependent service
prematurely due to not being able to connect to the server.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants