experiment with hypothesis normaliser via HPI extraction and the corresponding 'cleanup' normaliser #28

Draft
wants to merge 3 commits into
base: master


karlicoss
Owner

No description provided.

@karlicoss
Owner Author

karlicoss commented Oct 17, 2023

@seanbreckenridge tried a quick hacky objects normaliser for hypothesis -- worked great! I guess exposing a method to parse files like in your modules is cleaner, but it's kinda nice that config hacking works as well.

It suggested pruning some extra files that weren't pruned before, which made me realize I'm currently pruning hypothesis via the json module. That helped me identify some flaky fields and implement a 'cleanup' normaliser.
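For illustration, the 'cleanup' idea could look roughly like the sketch below. This is not the code from this PR or bleanser's API: the field names in `FLAKY_KEYS` and both function names are made up, and it only assumes the export is a JSON list of objects.

```python
import json
from typing import Any, FrozenSet, List

# Hypothetical field names, chosen for illustration only
FLAKY_KEYS: FrozenSet[str] = frozenset({"updated", "links"})


def cleanup(obj: Any, flaky_keys: FrozenSet[str] = FLAKY_KEYS) -> Any:
    """Recursively drop keys that change between exports without carrying data."""
    if isinstance(obj, dict):
        return {k: cleanup(v, flaky_keys) for k, v in obj.items() if k not in flaky_keys}
    if isinstance(obj, list):
        return [cleanup(v, flaky_keys) for v in obj]
    return obj


def normalised_lines(raw: str) -> List[str]:
    """One canonical JSON line per top-level item, so two exports can be diffed."""
    items = cleanup(json.loads(raw))
    return sorted(json.dumps(item, sort_keys=True) for item in items)
```

With the flaky keys dropped, two exports that differ only in those fields normalise to the same lines, so the older one shows up as prunable.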

(probably won't merge like this, will play a bit more first with other HPI modules)

@karlicoss
Owner Author

Also, forcing a module to use just one input file is kinda relevant to karlicoss/HPI#318 (comment)

@purarue
Contributor

purarue commented Oct 17, 2023

Ah yeah, I tend to circumvent this whole problem of dealing with the config, or having to specify the one file you're interested in parsing, by using non-cached functions from the ..._export library or the _parse_ function in the HPI code, rather than the public-facing HPI functions. As examples:

https://github.com/seanbreckenridge/bleanser/blob/c96195b7b32769db5f7f351224a77fac217da4c6/src/bleanser_sean/modules/activitywatch.py#L4

https://github.com/seanbreckenridge/bleanser/blob/c96195b7b32769db5f7f351224a77fac217da4c6/src/bleanser_sean/modules/discord.py#L27

https://github.com/seanbreckenridge/bleanser/blob/c96195b7b32769db5f7f351224a77fac217da4c6/src/bleanser_sean/modules/zsh.py#L4
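The pattern in those modules can be sketched roughly as follows. Everything here is a stand-in: `parse_file` plays the role of a single-file parser from an export library, and `normalise` / `is_redundant` are hypothetical helpers, not bleanser functions. The sketch assumes each input is a JSON list of objects.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List


def parse_file(path: Path) -> Iterator[dict]:
    """Hypothetical stand-in for an export library's single-file parser."""
    yield from json.loads(path.read_text())


def normalise(path: Path, parse: Callable[[Path], Iterator[dict]] = parse_file) -> List[str]:
    """Turn one input file into sorted canonical lines, comparable across snapshots."""
    return sorted(json.dumps(obj, sort_keys=True) for obj in parse(path))


def is_redundant(older: Path, newer: Path) -> bool:
    """An older snapshot adds nothing if the newer one contains all of its items."""
    return set(normalise(older)) <= set(normalise(newer))
```

Since the parser takes the path directly, no config is involved and no caching layer gets in the way of comparing two specific files.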

@purarue
Contributor

purarue commented Oct 17, 2023

This does, however, mean I have to specify my files in two places: I have them hardcoded in this helper script.

I haven't found that too annoying to maintain though, and it's not always the case that the files I'm parsing for HPI are the same files I want to be running bleanser on (for example, I have some partial exports that get combined as a sort of incremental export).

@karlicoss
Owner Author

Yeah, generally it's nicer to expose anyway, although in some cases the DAL takes multiple inputs and does the data merging itself, e.g. here https://github.com/karlicoss/rexport/blob/61eb8d219064c9f80dfb92a756b6323276314460/src/rexport/dal.py#L152-L160
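As a rough sketch of what "DAL does the merging itself" means (not rexport's actual code; the class below and the assumption that every item has a stable `id` key are purely illustrative):

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


class DAL:
    """Hypothetical data access layer that merges several export snapshots."""

    def __init__(self, sources: Iterable[Path]) -> None:
        # Sort so that older snapshots (by filename) are read first
        self.sources: List[Path] = sorted(Path(s) for s in sources)

    def items(self) -> Iterator[Dict]:
        emitted = set()
        for src in self.sources:
            for item in json.loads(src.read_text()):
                key = item["id"]  # assumes every item carries a stable "id"
                if key in emitted:
                    continue  # already seen in an earlier snapshot
                emitted.add(key)
                yield item
```

With an interface like this, a normaliser can still look at one snapshot at a time by constructing `DAL([single_file])`, so the merging never kicks in.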

Also would be kinda cool if in the future bleanser modules could be almost completely agnostic of the module implementation, e.g. using magic similar to inferring module stats to detect data providers https://github.com/karlicoss/HPI/blob/fe26efaea849e9c2b0fb57a4cc75878a45c3f8bf/my/core/stats.py#L16-L23
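To make the "magic" a bit more concrete, here is a minimal sketch of one possible heuristic. It is an assumption for illustration, not what `my/core/stats.py` actually does: treat every public generator function with no required arguments as a data provider. `guess_providers` is a made-up name.

```python
import inspect
from typing import Any, Callable, Dict


def guess_providers(module: Any) -> Dict[str, Callable]:
    """Collect public generator functions that can be called without arguments."""
    providers: Dict[str, Callable] = {}
    for name, fn in vars(module).items():
        if name.startswith("_") or not inspect.isfunction(fn):
            continue
        params = inspect.signature(fn).parameters.values()
        needs_args = any(
            p.default is inspect.Parameter.empty
            and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
            for p in params
        )
        # Only generator functions count here; plain helpers are skipped
        if not needs_args and inspect.isgeneratorfunction(fn):
            providers[name] = fn
    return providers
```

A bleanser module could then iterate over whatever this returns and dump the yielded objects, without knowing the names of the module's functions in advance.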

@karlicoss
Owner Author

Wrote another module for comparing Twitter databases from the Android app -- it's proving super useful since the db has so much garbage (#39).
I'll start merging them into the hpi/ subdirectory just so the code isn't lost and CI runs on it; we can think of something better later as we shape them.


2 participants