Stack overflow when merging. #93
Comments
It's not just stack usage to worry about - the spikes at the end of running Kinetic Merge gathered as evidence for #91 show an alarming increase in heap and CPU usage when finding the longest common subsequence as part of merging. This might be from the enclosing merge algorithm, though...
The story so far... Two different approaches have been taken, currently leading to head commits of 7850425 and 8f7683f. The first approach changes the longest common subsequence calculation to use dynamic programming. There is a rather crude hack that tries to dynamically purge the cache of partial results used by the dynamic programming implementation; it is instructive to compare the size of the cache at the end of computing the top-level result with the theoretical upper bound on the number of entries in the cache without purging... (NOTE: using dynamic programming results in a performance increase compared with the original implementation, but the purging technique is crude and drastically reduces performance to well below the original's.) From running ...
From running Kinetic Merge:
So there is definitely some trimming going on, although perhaps it is only towards the end that it really starts to make a difference? Those entries:
are cause for concern. I had no idea that so many sections could be produced...
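For reference, here is a minimal sketch - not the Kinetic Merge code, which works across three sides (base, left and right) rather than the two sequences shown here - of the bottom-up dynamic programming shape under discussion. It makes clear why, without purging, the cache of partial results grows to (a.size + 1) * (b.size + 1) entries:

```scala
// Minimal sketch only - not the Kinetic Merge implementation, which works
// over three sides (base, left, right). Shown here for two sequences to
// illustrate why, without purging, the cache of partial results grows to
// (a.size + 1) * (b.size + 1) entries.
object BottomUpLcs:
  def lcsLength[A](a: Vector[A], b: Vector[A]): Int =
    // Cache keyed by (prefix length of a, prefix length of b).
    val cache = scala.collection.mutable.Map.empty[(Int, Int), Int]

    for
      i <- 0 to a.size
      j <- 0 to b.size
    do
      val entry =
        if i == 0 || j == 0 then 0
        else if a(i - 1) == b(j - 1) then cache((i - 1, j - 1)) + 1
        else cache((i - 1, j)) max cache((i, j - 1))
      cache((i, j)) = entry

    println(s"Cache held ${cache.size} entries for inputs of ${a.size} x ${b.size}.")
    cache((a.size, b.size))
```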
OK, so the second approach goes back to what the original implementation did, doing a straightforward recursive decomposition (that is unfortunately not tail recursive) and using memoisation to gain the same benefits as dynamic programming, in that any given subproblem isn't computed twice. It turned out that the original implementation was only somewhat successful at memoisation; this has been cut over and the results are good. Let's revisit ...
What's going on here is that the subproblems are evaluated top-down by recursion, and only if they are actually needed - the algorithm chooses a fork: either it can drop a common element and thus evaluate just one subproblem, or it needs to try dropping differing elements on each side, evaluating multiple subproblems. This can avoid having to calculate every single subproblem, whereas the dynamic programming approach works bottom-up and thus doesn't have the calling context to avoid exploring subproblems. Of course, the second approach is still recursive, and while it can avoid subproblems, it doesn't purge the cache.
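A minimal sketch of that top-down, memoised decomposition - again over two sequences rather than the three sides the real implementation handles. Note that the recursion is not tail recursive, which is what this ticket is about:

```scala
// Sketch of the top-down, memoised decomposition described above (two
// sequences rather than Kinetic Merge's three sides). The recursion is
// *not* tail recursive - hence this ticket.
object TopDownLcs:
  def lcsLength[A](a: Vector[A], b: Vector[A]): Int =
    val cache = scala.collection.mutable.Map.empty[(Int, Int), Int]

    def of(i: Int, j: Int): Int =
      cache.get((i, j)) match
        case Some(result) => result // Subproblem already solved once.
        case None =>
          val result =
            if i == 0 || j == 0 then 0
            else if a(i - 1) == b(j - 1) then
              // Common element: drop it from both sides, one subproblem.
              1 + of(i - 1, j - 1)
            else
              // Differing elements: fork, trying a drop on each side.
              of(i - 1, j) max of(i, j - 1)
          cache((i, j)) = result
          result

    of(a.size, b.size)
```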
Checking with the first approach, the cache is being dynamically purged throughout the execution of the dynamic programming algorithm. So that's not to blame for the heap explosion - although it may be that the partial results (which are built out of ...) are to blame...
Would using a difference list in the first approach (or just a plain list followed by a reversal) allow better structure sharing between partial results? If so, that might get us out of the heap explosion problem. (This is probably going to affect work done on the other branch too.) Another thing - the existing dynamic purging caches across index variations for the left and right sides, keeping the base index constant. In the interest of minimising the cache size, shouldn't we optimise the combination of sides? Not sure how much difference that would make in real life, though...
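To make the structure-sharing idea concrete, here is a hypothetical illustration (the names are invented, not taken from either branch): partial results built by prepending to an immutable list share their tails with the shorter results they extend, with a single reverse at the end; a difference list defers even that reverse:

```scala
// Hypothetical illustration (names invented) of the structure-sharing idea:
// build each partial result by prepending to an immutable list, so longer
// results share their tails with the shorter results they extend, and do a
// single reverse when the top-level answer is read off.
final case class PartialResult[Element](commonInReverse: List[Element]):
  // O(1), and the existing list is shared wholesale rather than copied.
  def addCommon(element: Element): PartialResult[Element] =
    PartialResult(element :: commonInReverse)

  def result: List[Element] = commonInReverse.reverse

// A difference list would defer even the final reverse: represent the
// partial result as a function that prepends it to whatever follows.
type DiffList[Element] = List[Element] => List[Element]

def emptyDiffList[Element]: DiffList[Element] = identity

def snoc[Element](self: DiffList[Element], element: Element): DiffList[Element] =
  rest => self(element :: rest)

def materialise[Element](self: DiffList[Element]): List[Element] = self(Nil)
```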
Meanwhile, back with the second approach, an idea is to stick with top-down memoisation, but to use continuations (in the form of ...). Hopefully this will allow the cache to be purged dynamically as the algorithm then works in horizontal swathes, so we can detect when all three sides have lost interest in cached partial results at a certain distance from the current 'wavefront'.
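Something along these lines, perhaps - a sketch assuming cats.Eval as the continuation vehicle (the branch may well use a different mechanism). The recursive calls are deferred onto a heap-based trampoline rather than the JVM stack, although nothing here purges the cache yet:

```scala
import cats.Eval

// Sketch only, assuming cats.Eval carries the continuations mentioned above
// - the branch may use something else. Recursive calls are deferred, so the
// subproblem graph is explored on a heap-based trampoline rather than the
// JVM stack; the cache itself is still not purged here.
object TopDownLcsWithEval:
  def lcsLength[A](a: Vector[A], b: Vector[A]): Int =
    val cache = scala.collection.mutable.Map.empty[(Int, Int), Eval[Int]]

    def of(i: Int, j: Int): Eval[Int] =
      cache.getOrElseUpdate(
        (i, j),
        Eval
          .defer {
            if i == 0 || j == 0 then Eval.now(0)
            else if a(i - 1) == b(j - 1) then of(i - 1, j - 1).map(_ + 1)
            else
              of(i - 1, j).flatMap(dropLeft =>
                of(i, j - 1).map(dropRight => dropLeft max dropRight)
              )
          }
          .memoize
      )

    of(a.size, b.size).value
```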
Good news on the first approach, commit 9f7cb5d... From running ...
From running Kinetic Merge:
So it's within 3G of heap during the merge / longest common subsequence when running Kinetic Merge.
Furthermore, there was no need to optimise structure sharing between the partial solutions - just trimming the cache was enough. Commit 7850425 uses a map to implement the cache, whereas 9f7cb5d uses an array-backed deque, so possibly that also helps things? So what would happen if this was applied to the other approach?
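To illustrate the kind of trimming that pays off, here is a sketch (two sequences again for brevity; the real three-sided cache would retain planes rather than rows) where only the previous row of partial results is kept, so the cache stays at O(b.size) entries rather than O(a.size * b.size):

```scala
// Sketch of the trimming idea for the bottom-up approach: only the previous
// row of partial results is ever consulted, so older rows can be discarded
// as the computation advances.
object TrimmedBottomUpLcs:
  def lcsLength[A](a: Vector[A], b: Vector[A]): Int =
    var previousRow = Array.fill(b.size + 1)(0)
    var currentRow = Array.fill(b.size + 1)(0)

    for i <- 1 to a.size do
      currentRow(0) = 0
      for j <- 1 to b.size do
        currentRow(j) =
          if a(i - 1) == b(j - 1) then previousRow(j - 1) + 1
          else previousRow(j) max currentRow(j - 1)
      // Rows older than the previous one are never needed again - this is
      // the purging; the old array is simply recycled for the next row.
      val recycled = previousRow
      previousRow = currentRow
      currentRow = recycled

    previousRow(b.size)
```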
NOTE: prior to closing this ticket, it might be worth revisiting the breakdown of filler sections that was fudged around in #43. That had to adopt a compromise where fillers are broken down into a prefix, a potential common part and a suffix, but perhaps we could be bolder and use one-token sections now. Hopefully the latest incarnation won't suffer from the merge alignment problems that were also present... |
Back on the second approach, 69a8c70 brings us use of ... Performance is lousy - this code uses a fussy pure-FP approach to caching, using an immutable map implementation. What happens if we: ...
Commit 9cbe260 gives us back the imperative cache, and performance is improved, albeit not to the original level.
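For contrast, a sketch of the pure-FP caching style being abandoned here (invented names, two sequences only): the immutable cache has to be threaded through and returned from every call, whereas the imperative version is essentially the earlier memoised sketch, with a mutable.Map updated as a side effect:

```scala
// Invented names, two sequences only. The pure style threads an immutable
// Map through every call and returns the enlarged cache alongside each
// answer; the imperative counterpart simply mutates a shared map instead.
object PureStyleCaching:
  def lcsLength[A](a: Vector[A], b: Vector[A]): Int =
    type Cache = Map[(Int, Int), Int]

    def of(i: Int, j: Int, cache: Cache): (Int, Cache) =
      cache.get((i, j)) match
        case Some(hit) => (hit, cache)
        case None =>
          val (result, cacheAfter) =
            if i == 0 || j == 0 then (0, cache)
            else if a(i - 1) == b(j - 1) then
              val (inner, innerCache) = of(i - 1, j - 1, cache)
              (inner + 1, innerCache)
            else
              val (left, leftCache) = of(i - 1, j, cache)
              val (right, rightCache) = of(i, j - 1, leftCache)
              (left max right, rightCache)
          (result, cacheAfter + ((i, j) -> result))

    of(a.size, b.size, Map.empty)._1
```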
Running times for ...

Baseline, 62dd55e: 20 seconds.

First approach...
- Dynamic programming, a9d3f7f: 20 seconds.
- Dynamic programming with aggressive purging of the cache, 9f7cb5d: 23 seconds.
- Same but without lazy values, fa04ea4: 21 seconds.

Second approach...
- Centralize caching, 8f7683f: 15 seconds.
- More or less the same but without lazy values, 38891f6: 14 seconds.
- Use an immutable cache implementation with ...
- Same but with ...
- Go back to using an imperative cache, 9cbe260: 39 seconds.
It's fair to say that the second approach pulled ahead early, but as soon as an immutable cache plus monadic workflow was adopted, performance went to the wall. It might have been interesting to see how ... Even putting the imperative cache back into the mix couldn't quite get back to earlier performance levels.

Now, there is still the business of trying to purge the cache that is outstanding on the second approach's branch. To do this when working top-down requires the subproblems to be evaluated breadth-first in reverse order. For that to work using a continuation style, all of the nodes in the call tree have to be encoded as continuations - so that is every subproblem that is evaluated top-down. That means that even if the cache is incrementally purged, we still have to use heap space proportional to the number of evaluated subproblems - and that number can sometimes be high, even when taking the possible savings of the top-down approach into account. That also ignores the added complexity of implementing a breadth-first evaluation strategy - unlike depth-first, it doesn't naturally emerge from the monadic translation of top-down calls. So it is with regret that I have to say, "Second approach, you're fired".
Merged on to ...
Went out in Release 1.4.0. |
Seen as part of working on #91.
Reproduced in the same manner, only disabling the failing invariant check discussed on that ticket. (Also seen while doing WIP on fixing the failing assertion.)
This is a known issue because the merge algorithm isn't tail-recursive; so far this hasn't been a problem, as the use of sections derived mostly from matches keeps the size of the merge input sequences down. It seems the day has finally arrived to cut over to something stack-safe...
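As a sketch of what such a cutover might look like (illustrative only, not the actual merge algorithm), here is a non-tail-recursive walk rewritten with the standard library's TailCalls trampoline, so that recursion depth is bounded by heap rather than by the JVM stack:

```scala
import scala.util.control.TailCalls.*

// Illustrative only - not the merge algorithm itself. A non-tail-recursive
// walk over an input sequence, and the same walk rewritten on the TailCalls
// trampoline so deep inputs no longer blow the JVM stack.
def consumeStackUnsafely[Element](input: List[Element]): Int =
  input match
    case Nil       => 0
    case _ :: tail => 1 + consumeStackUnsafely(tail) // one stack frame per element

def consumeStackSafely[Element](input: List[Element]): Int =
  def step(remaining: List[Element]): TailRec[Int] =
    remaining match
      case Nil       => done(0)
      case _ :: tail => tailcall(step(tail)).map(_ + 1) // heap-allocated continuation
  step(input).result
```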