-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rethinking canonify #278
Comments
Hi @sopoforic sorry for late reply, just got to process my email! Thanks so much, that looks interesting.
Right, that makes sense. On the other hand, keeping each site in a separate file makes it much more annoying for refactoring, maintaining and seeing common patterns. I'd say it makes sense to keep most normalisers in the same file, but also have a mechanism so users could load their own files with normalisers as well.
Yep, very appreciated! It's hard to tell if it would be a suitable approach so early on (with just a couple urls). I'd say if it's okay, keep it in your branch for now -- and as you add more stuff would be clearer if it's a good approach. Skimmed the diff -- maybe one comment is that imo it's unavoidable that URL first will need to be split into protocol/domain/path/query parts, we can only get so far with regex approach. But again would probably be more evident as you add more site specific normalisers. By the way you'll be very welcome in https://memex.zulipchat.com -- there are spaces there to discuss Promnesia in particular and you might get some input from other people as well (you can login with github -- so won't need to create a new account!) |
Sure, I think it makes sense to have anything related in a single file--one file per site wasn't exactly a design requirement, just how it worked out in this little sample. Though I do think that having one giant file with a bunch of unrelated things is a recipe for trouble, so there should be a balance there.
I agree. I use
Okay, will do. I don't often encounter sites that both have problems and exist in my notes, but it won't be hard to maintain the branch as things accumulate. |
Right now, the URL normalizer in
cannon.py
is a kind of monolithic, complicated thing with special cases for certain kinds of URLs. It would make more sense to me to encapsulate all the per-site behavior cleanly, so I took a stab at it:https://github.com/sopoforic/promnesia/tree/better_specs
This branch is a first attempt at cleaning up
cannon.py
. It does two things, as a demonstration:canonify
--which is essentially whatcanonify
already does, but this way that behavior is extracted out of the main function.It also has a couple of nice features:
First, the normalizers are in a namespace package, and are loaded dynamically, so as to support users making their own private additions. It should also be easier to take PRs for better site support with this kind of structure.
Second, because it uses
__subclasses__
to accomplish this, it means users can just add something liketo their
my.config
and it'll get picked up automatically.This is just an experiment, but it's working fine for me, so I am in fact using it already. If this direction seems fruitful to you, I'll put a little more effort into this and make a PR.
The text was updated successfully, but these errors were encountered: