-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance enhancements #73
base: main
Are you sure you want to change the base?
Conversation
Delivers significant performance improvements by caching previously computed results.
# Conflicts: # edtf/fields.py # edtf/jdutil.py # edtf/natlang/en.py # edtf/natlang/tests.py # edtf/parser/grammar.py # edtf/parser/parser_classes.py # edtf/parser/tests.py # pyproject.toml # setup.py
Not sure why it's failing -- the tests all pass and work on my end. |
Also added Andrew Hankinson to the authors list in pyproject.toml
It occurs to me that a big part of the reason for the speed-up is the use of the functools lru_cache, which will cache the result of the same input string and skip the computation of that value, instead returning only the cached value. While this is useful in real-world contexts, it's probably not so useful in benchmark runs! |
@aweakley does this PR look OK? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you very much. I've just put a couple of comments - I think some things were duplicated in a merge at some point maybe? and one regex looks different.
edtf/parser/parser_classes.py
Outdated
def get_month(self) -> Optional[int]: | ||
return self._month | ||
|
||
month = property(get_month, set_month) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods duplicate the ones above. I'm not sure if that's just a weird merge thing though?
I've had a pass at the Parser Classes file, but there are a lot of problems still to be sorted out. I've added return types and argument types whereever it makes sense. The "UncertainOrApproximate" class is a hot mess. There are boolean values with property and method calls associated with them, and I would be surprised if it actually works. However, it doesn't seem to be tested or implemented, so I can't figure out where to go from here.
This wasn't actually used anywhere! Also removed a redundant regex group
I've done a bit of cleanup, but the parser_classes file is still quite buggy. Unfortunately I don't quite know how to trigger the bugs, but I'm at least confident that I didn't make it much worse. |
Thank you. I'll check it out. |
@ahankinson The tests are passing for me, are there particular bugs you were worried about in your comment above? |
Yeah the tests pass but I don’t think they exercise the parser classes that much. I have the ruff checker running on that and it still highlights a bunch of stuff |
It's great to see that this project has some life in it now. I have been using it for a while and maintaining my own fork with some performance enhancements.
Primarily, the main enhancements have been to enable packratting (which seems to now be enabled) and pre-declaring and compiling the regexes in the natural language processor.
If the current benchmarks are to be believed, this leads to a significant speedup. All the tests pass, so I guess it's OK?
https://rism-digital.github.io/python-edtf/dev/bench/
I've gone through and tried to integrate the latest changes, but that was a big PR so another set of eyes would be helpful.