-
-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluation cache #1226
base: master
Are you sure you want to change the base?
Evaluation cache #1226
Conversation
There is something I broke in the scanner which I think will cause all tests to fail with something like:
I am looking at this now. Some short-term solutions are to ignore this test here. I suspect also in CI we could go back to an older version of the scanner but this is tricky in that there was a bug in adding |
@mmatera For now I have pinned in master Mathics_Scanner at 1.0.0 which should work for now. When that is in you might have to rebase or merge your PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before undertaking such a pervasive change, it would be interesting to get a rough idea of how much improvement we are going to get.
For example a maybe come up with some scenarios based in specific examples and show how much time gain we get using this.
This is an experiment: I know that WMA implements something similar, and the reason is that this helps to avoid keeping many Expressions identical expression objects, as well as reducing the time used in applying replacing rules over the same expression. Also, this could help to reduce the time doing comparisons over identical objects. My implementation is incomplete (apart from the obvious thing that is not working) because I didn't think about a mechanism to decide when to remove elements from the cache, so even if this passes the tests, I shouldn't merge this. |
Ok - thanks for the explanations and clarifications. |
This is a result of one possible experiment:
|
Very nice. The concept is good. We should huddle over the implementation strategies though. Also, I should mention that next weekend I will start looking at comparisons and try to come up with a slightly different organization of the code, keeping the essential logic inside the comparisons the same. |
b007e8c
to
ef16eb6
Compare
8367d69
to
83bb068
Compare
9570fdd
to
499f1bf
Compare
I can try to do the rebase. In any case, this was an experiment. |
Ok.
So a very performant experiment! |
I'd like to look the whole performance issue in detail sometime. The last time I looked at this, it didn't felt more like a shot in the dark rather than looking at what's slow and attacking that. |
I think that the problem is that there is not a single performance issue, but several ones. This PR tries to implement something that the is in WMA: the expression cache. So, if you want to evaluate many times the same complicate expression, which always produces the same result, the best is to store it and reuse it. |
It's not that I don't think an expression cache will help. I in fact do think so. However the implementation that I saw previously might have been a bit lacking. Ideally we'd come up with a bunch of ideas at a high level, kick those around and then drill down from there. Although many things I do them incrementally, for some things, I don't think they should be done as incremental improvements. Like come up with some kind of cache. See if you can improve it a little, and so on. Up front, various performance measurements and estimates should be done. |
@@ -64,7 +64,7 @@ def get_symbol_list(list, error_callback): | |||
class _SetOperator(object): | |||
def assign_elementary(self, lhs, rhs, evaluation, tags=None, upset=False): | |||
# TODO: This function should be splitted and simplified | |||
|
|||
evaluation.cache_result = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line of code appear many many times. What's the rule, and reason behind when to do and when not to?
I wonder if there's a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is just an experiment to see how we can implement the Expression cache.
evaluation.cache_result = False
means that the result of this expression should not be stored in the cache. For example, control expressions, loops and file handling should not be stored in the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - thanks for the information. I got the gist of that aspect.
Generally one thinks of a cache as just working not something where you have to make 30 hand decisions as to what goes in and what doesn't.
That's what I mean by I wonder if there is a better way. Like using someone else's mechanism like LRUCache. (Hmm, where have I heard this idea before - use someone else's package rather than rolling your own?)
Since this is an experiment let's mark it as a draft for now.
Disclaimer: for now I am rewriting/revising the doc system. Although I would love to ditch that, I also want the docs to get tagged and organized better and using a more standard doc system is too big a lift for me right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I mean by I wonder if there is a better way. Like using someone else's mechanism like LRUCache. (Hmm, where have I heard this idea before - use someone else's package rather than rolling your own?)
OK, I totally agree: if there is something already implemented, let's use it. The problem here is that I didn't find something like LRUCache that works with expressions. In any case, right now I do not have much time to work on this. I just rebase this in case we want to come back to it in the future.
Since this is an experiment let's mark it as a draft for now.
Disclaimer: for now I am rewriting/revising the doc system. Although I would love to ditch that, I also want the docs to get tagged and organized better and using a more standard doc system is too big a lift for me right now.
I think this is more urgent: to have a clear code and documentation will help to get more help...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I mean by I wonder if there is a better way. Like using someone else's mechanism like LRUCache. (Hmm, where have I heard this idea before - use someone else's package rather than rolling your own?)
OK, I totally agree: if there is something already implemented, let's use it. The problem here is that I didn't find something like LRUCache that works with expressions. In any case, right now I do not have much time to work on this. I just rebase this in case we want to come back to it in the future.
Since this is an experiment let's mark it as a draft for now.
Disclaimer: for now I am rewriting/revising the doc system. Although I would love to ditch that, I also want the docs to get tagged and organized better and using a more standard doc system is too big a lift for me right now.I think this is more urgent: to have a clear code and documentation will help to get more help...
I am working on it and I hope to have the code for Django in place this week. Basically I'm continuing what I started before in on the Mathics side where we are grouping builtins - eventually according to how it appears in Mathematica 5 or listed as "Guides" in the online docs. So far it has been pleasing to see.
After the code to see this on Django is done, on the Core side we can move more stuff around and make it more closely conform to the Mathematica organization.
Get performance benchmarks and pinpoint exactly where things are slow and figure figure out why. You can use this draft to see where it lies in speeding things up. As an example, build this with Pyston and run the unit tests. I got a 30% speedup which is what the Pyston folks said to expect. What is the speedup using PyPy 3.7? But please, if you can't do this on your own, leave it. There are many many many other things you can do totally on your own. |
934273c
to
ada387b
Compare
Here is a proposal for implementing a cache expression. This should reduce the evaluation time for expressions that are evaluated many times. The idea is to keep a dictionary inside the Evaluation object, whose keys are the hashes of the expressions. The dict stores as values both the input as the output expression. Also, the implementation handles the caching of subexpressions.