experiment with hypothesis normaliser via HPI extraction and the corresponding 'cleanup' normaliser #28
base: master
@seanbreckenridge I tried a quick, hacky objects normaliser for hypothesis, and it worked great! I guess exposing a method to parse files, like in your modules, is cleaner, but it's kinda nice that config hacking works as well. It suggested pruning some extra files that weren't pruned before, so I realized that currently I am pruning hypothesis via (probably won't merge like this; I'll play a bit more with other HPI modules first)
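A minimal sketch of the idea behind an objects normaliser. It assumes, hypothetically, that an export is a JSON list of objects; a real version would get the objects from the HPI hypothesis module instead. Each object becomes one canonical JSON line, so an older export is prunable when every one of its lines also appears in a newer export. `parse_export`, `normalise` and `is_prunable` are illustrative names, not bleanser's API.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def parse_export(path: Path) -> Iterator[dict[str, Any]]:
    # Hypothetical parser: assumes the export is a JSON list of objects.
    # In practice the objects would come from the HPI module.
    with path.open() as f:
        yield from json.load(f)


def normalise(objects: Iterable[dict[str, Any]]) -> set[str]:
    # One canonical JSON line per object; sort_keys makes the result
    # independent of key order in the raw export.
    return {json.dumps(o, sort_keys=True) for o in objects}


def is_prunable(older: Path, newer: Path) -> bool:
    # The older export is redundant if every object it contains
    # also appears in the newer one.
    return normalise(parse_export(older)) <= normalise(parse_export(newer))
```

Using a set ignores ordering and duplicates, which is usually what you want when comparing snapshot-style exports.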
Also, forcing a module to use just one input file is kinda relevant to karlicoss/HPI#318 (comment).
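One way the "config hacking" could look, sketched with a hypothetical `config` object and `export_path` attribute (these are stand-ins, not HPI's actual config classes): temporarily point the config at a single file so the module only parses that input, then restore the original value.

```python
from contextlib import contextmanager
from pathlib import Path
from types import SimpleNamespace
from typing import Iterator

# Hypothetical stand-in for a module's user config.
config = SimpleNamespace(export_path="/data/hypothesis/*.json")


def inputs() -> list[Path]:
    # Hypothetical module function: resolves whatever the config points at.
    p = Path(config.export_path)
    return sorted(p.parent.glob(p.name))


@contextmanager
def single_input(path: Path) -> Iterator[None]:
    # Point the config at exactly one file, restoring it even on error.
    old = config.export_path
    config.export_path = str(path)
    try:
        yield
    finally:
        config.export_path = old
```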
Ah yeah, I tend to circumvent this whole problem of dealing with the config, or having to specify the one file you're interested in parsing, by using non-cached functions from the
This does, however, mean I have to specify my files in two places; I have them hardcoded in this helper script. I haven't found that too annoying to maintain, though, and it's not always the case that the files I'm parsing for HPI are the same files I want to run bleanser on (for example, I have some partial exports that get combined into a sort of incremental export).
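A sketch of that split, assuming a hypothetical module that reads JSON Lines files: a pure, non-cached `parse_file` that a pruning tool can call on any single file, next to a cached, config-driven `entities` for normal HPI use. `HPI_INPUTS` is a hypothetical stand-in for wherever the module's configured files come from, which is why the file list ends up specified in two places.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Any, Iterable

# Hypothetical: the files HPI is configured to read.
HPI_INPUTS: tuple[Path, ...] = ()


def parse_lines(lines: Iterable[str]) -> list[dict[str, Any]]:
    # Pure parser: no config lookup and no caching (blank lines skipped).
    return [json.loads(line) for line in lines if line.strip()]


def parse_file(path: Path) -> list[dict[str, Any]]:
    # Non-cached entry point that a pruning tool can call per file.
    with path.open() as f:
        return parse_lines(f)


@lru_cache(maxsize=None)
def entities() -> tuple[dict[str, Any], ...]:
    # Config-driven, cached entry point for regular HPI usage:
    # concatenates entries from every configured input.
    return tuple(e for p in HPI_INPUTS for e in parse_file(p))
```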
Yeah, generally it's nicer to expose it anyway, although in some cases the DAL takes multiple inputs and does data merging itself, e.g. here: https://github.com/karlicoss/rexport/blob/61eb8d219064c9f80dfb92a756b6323276314460/src/rexport/dal.py#L152-L160. Also, it would be kinda cool if in the future bleanser modules could be almost completely agnostic of the module implementation, e.g. by detecting data providers with magic similar to how module stats are inferred: https://github.com/karlicoss/HPI/blob/fe26efaea849e9c2b0fb57a4cc75878a45c3f8bf/my/core/stats.py#L16-L23
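A rough sketch of what such provider detection might look like using only the standard `inspect` module: treat public functions defined in the module itself that can be called without arguments as candidate data providers. This is not the logic in HPI's stats.py, just a heuristic in the same spirit; `guess_providers` is a hypothetical name.

```python
import inspect
from types import ModuleType
from typing import Callable, Dict


def guess_providers(module: ModuleType) -> Dict[str, Callable[..., object]]:
    found: Dict[str, Callable[..., object]] = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        params = inspect.signature(func).parameters.values()
        # Callable with no arguments: every parameter is optional or variadic.
        if all(
            p.default is not inspect.Parameter.empty
            or p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
            for p in params
        ):
            found[name] = func
    return found
```

A function like `parse(path)` is skipped because it needs an input, which is exactly the kind of function a bleanser module would want to call directly instead.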
…esponding 'cleanup' normaliser
Force-pushed from 25d4dda to 1b02e56, then from 1b02e56 to 30f94a3.
…ormaliser using HPI also see #28
Wrote another module for comparing Twitter databases from the Android app; it's proving super useful, since the DB has so much garbage. #39
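For SQLite databases like this, a normaliser can dump only the tables worth keeping as sorted rows and ignore everything else. A minimal sketch using the standard `sqlite3` module; the table names passed in are hypothetical, since the Android app's schema isn't described here, and `dump_tables` / `older_is_redundant` are illustrative names.

```python
import sqlite3
from typing import Iterable


def dump_tables(conn: sqlite3.Connection, keep: Iterable[str]) -> list[str]:
    # One line per row for each table we care about; other tables
    # (caches, sync state, ...) are ignored entirely.
    lines: list[str] = []
    for table in sorted(keep):
        # Table names are assumed trusted, hence the plain f-string.
        rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        # Sort by repr so rows with mixed types or NULLs still compare.
        lines.extend(f"{table}\t{row!r}" for row in sorted(rows, key=repr))
    return lines


def older_is_redundant(old: sqlite3.Connection, new: sqlite3.Connection, keep: Iterable[str]) -> bool:
    keep = list(keep)
    return set(dump_tables(old, keep)) <= set(dump_tables(new, keep))
```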