Stack overflow when merging. #93

Closed
sageserpent-open opened this issue Oct 7, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@sageserpent-open
Owner

sageserpent-open commented Oct 7, 2024

Seen as part of working on #91.

Reproduced in the same manner, only with the failing invariant check discussed on that ticket disabled. (Also seen while doing WIP on fixing the failing assertion.)

This is a known risk because the merge algorithm isn't tail-recursive; so far that hasn't bitten, as the use of sections derived mostly from matches keeps the size of the merge input sequences down. It seems the day has finally arrived to cut over to something stack-safe...
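For reference, the kind of cutover meant here is trampolining the recursion. A minimal sketch using cats' Eval (illustrative only, not Kinetic Merge's actual merge code):

```scala
import cats.Eval

// Minimal sketch of trampolined recursion with cats.Eval: the recursion is
// driven on the heap rather than the JVM call stack, so depth no longer
// matters. This only illustrates the stack-safety technique, not the
// actual merge algorithm.
def depth(n: Long): Eval[Long] =
  if (n == 0) Eval.now(0L)
  else Eval.defer(depth(n - 1)).map(_ + 1)

// A naive recursive equivalent would overflow the stack at this depth.
val result: Long = depth(1000000L).value
```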

@sageserpent-open sageserpent-open added the bug Something isn't working label Oct 7, 2024
sageserpent-open added a commit that referenced this issue Oct 7, 2024
…g invariant seen in #91 so that execution proceeds to merging. This then exhibits the bug targeted by #93.
@sageserpent-open
Owner Author

It's not just stack usage to worry about - the spikes at the end of the Kinetic Merge run captured as evidence for #91 show an alarming increase in heap and CPU usage when finding the longest common subsequence as part of merging:

[Screenshot 2024-10-18 at 08 42 33]

[Screenshot 2024-10-18 at 08 45 40]

This might be from the enclosing merge algorithm, though...

@sageserpent-open
Owner Author

sageserpent-open commented Oct 20, 2024

The story so far...

There are two different approaches being taken, currently leading to head commits 7850425 and 8f7683f.

The first approach changes LongestCommonSubsequence.of to use an exhaustive dynamic programming approach, avoiding recursive calls to gain stack safety. This works (and has been tried out with the original example of running the Kinetic Merge application), but still requires 10G of heap to avoid an out-of-memory fault.

The dynamic programming implementation still uses a cache of partial results, and there is a rather crude hack that tries to purge it dynamically; it is instructive to compare the size of the cache at the end of computing the top-level result with the theoretical upper bound on the number of entries it would hold without purging...

(NOTE: using dynamic programming results in a performance increase compared with the original implementation, but the purging technique is crude and drastically reduces performance to well below the original's.)
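For orientation, here is a hedged two-sequence sketch of the bottom-up technique (the real LongestCommonSubsequence.of works over three sides and builds the subsequence itself, so take this as illustrative only):

```scala
// Classic two-sequence LCS length computed bottom-up by dynamic programming.
// Every (i, j) subproblem is tabulated, so without purging the cache holds
// one entry per index combination - the second figure in each pair below is
// that theoretical upper bound for the real three-sided inputs.
def lcsLength[A](left: IndexedSeq[A], right: IndexedSeq[A]): Int = {
  val table = Array.ofDim[Int](left.size + 1, right.size + 1)
  for {
    i <- 1 to left.size
    j <- 1 to right.size
  } table(i)(j) =
    if (left(i - 1) == right(j - 1)) table(i - 1)(j - 1) + 1
    else math.max(table(i - 1)(j), table(i)(j - 1))
  table(left.size)(right.size)
}
```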

From running LongestCommonSubsequenceTest.theLongestCommonSubsequenceUnderpinsAllThreeResults:

(4001,85140)
(408,5508)
(324,1620)
(966,12992)
(1250,20000)
(2296,44772)
(1288,18676)
(2552,52316)
(1800,40500)
(1892,44462)
(1144,25168)
(2158,36504)
(297,1200)
(2813,42398)
(1807,21840)
(544,5984)
(4675,63168)
(3276,75348)
(801,8800)
(715,7150)
(200,3900)
(1508,22620)
(1920,32640)
(986,15283)

From running Kinetic Merge:

(117,288)
(8,8)
(57,144)
(36936,2123250)
(1303,14283)
(57,144)
(8,8)
(8,8)
(8,8)
(8,8)
(43,64)
(63,125)
(43,64)
(267,1331)
(213111,33775416)
(1422,14112)
(8,8)
(454,2940)
(87,216)
(929,8800)
(13782,558092)
(8,8)

So there is definitely some trimming going on, although perhaps it is only towards the end that it really starts to make a difference? Those entries:

(36936,2123250)
(1303,14283)

(213111,33775416)

(13782,558092)

are cause for concern. I had no idea that so many sections could be produced...

@sageserpent-open
Owner Author

OK, so the second approach goes back to what the original implementation did, doing a straightforward recursive decomposition (that is unfortunately not tail recursive) and using memoisation to gain the same benefit as dynamic programming, namely that any given subproblem isn't computed twice.

It turned out that the original implementation was only somewhat successful at memoisation; this has been cut over and the results are good. Let's revisit LongestCommonSubsequenceTest.theLongestCommonSubsequenceUnderpinsAllThreeResults; this time the first figure reflects the number of subproblem evaluations performed:


(85139,85140)
(5481,5508)
(1612,1620)
(12990,12992)
(19998,20000)
(6894,44772)
(16606,18676)
(52285,52316)
(40497,40500)
(44456,44462)
(25165,25168)
(5513,36504)
(1193,1200)
(16976,42398)
(21793,21840)
(1343,5984)
(63165,63168)
(22388,75348)
(446,8800)
(7149,7150)
(3862,3900)
(22617,22620)
(32637,32640)
(15282,15283)
(15707,15708)
(21915,21945)
(27986,28014)
(98788,98838)
(32719,47040)
(19962,19992)
(80353,80401)

What's going on here is that the subproblems are evaluated top-down by recursion, and only if they are actually needed - at each step the algorithm either drops a common element, which requires just one subproblem, or has to try dropping differing elements on each side, which requires multiple subproblems. This can avoid having to calculate every single subproblem, whereas the dynamic programming approach works bottom-up and thus lacks the calling context needed to avoid exploring subproblems.

Of course, the second approach is still recursive, and while it can avoid subproblems, it doesn't purge the cache.
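As a hedged two-sequence sketch of that top-down shape (the real code handles three sides, and stack safety is the remaining problem):

```scala
import scala.collection.mutable

// Top-down memoised recursion on two sequences. When the heads match we
// commit to a single subproblem; only when they differ do we fork and
// explore both drops. Memoisation means each (i, j) pair is evaluated at
// most once, and subproblems that are never reached are never evaluated at
// all - hence the first figure above coming in below the theoretical bound.
// Not stack-safe as written, just like the real thing.
def lcsLengthTopDown[A](left: IndexedSeq[A], right: IndexedSeq[A]): Int = {
  val cache = mutable.Map.empty[(Int, Int), Int]

  def of(i: Int, j: Int): Int =
    cache.getOrElseUpdate(
      (i, j),
      if (i == 0 || j == 0) 0
      else if (left(i - 1) == right(j - 1)) 1 + of(i - 1, j - 1) // one subproblem
      else math.max(of(i - 1, j), of(i, j - 1))                  // fork: two subproblems
    )

  of(left.size, right.size)
}
```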

@sageserpent-open
Owner Author

Checking with the first approach, the cache is being dynamically purged throughout the execution of the dynamic programming algorithm. So that's not to blame for the heap explosion - although it may be that the partial results (which are built out of Vector instances for each side of the result) aren't efficiently sharing structure, thus exploding the heap...

@sageserpent-open
Owner Author

sageserpent-open commented Oct 20, 2024

Would using a difference list in the first approach (or just a plain list followed by a reversal) allow better structure sharing between partial results? If so, that might get us out of the heap explosion problem.
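The idea, as a hedged sketch with hypothetical names (not the project's actual partial-result type):

```scala
// Prepending to an immutable List is O(1) and the extended partial result
// shares its entire tail with the one it came from, so sibling partial
// results can share most of their structure; a single reverse at the end
// restores the original order. Hypothetical names, illustrative only.
final case class PartialResult[Element](reversedCommonSubsequence: List[Element]) {
  def addCommon(element: Element): PartialResult[Element] =
    PartialResult(element :: reversedCommonSubsequence) // shares the old list wholesale

  def finished: Vector[Element] = reversedCommonSubsequence.reverse.toVector
}
```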

(This is probably going to affect work done on the other branch too.)

Another thing - the existing dynamic purging caches across index variations for the left and right sides, keeping the base index constant. In the interest of minimising the cache size, shouldn't we optimise the combination of sides? Not sure how much difference that would make in real life, though...

@sageserpent-open
Owner Author

sageserpent-open commented Oct 20, 2024

Meanwhile, back with the second approach, an idea is to stick with top-down memoisation, but to use continuations (in the form of Eval) to avoid stack overflow when executing subproblems. There is an extra twist in that the cached partial results are unevaluated lazy values, so the subproblem DAG can be evaluated in a breadth-first rather than depth-first fashion.

Hopefully this will allow the cache to be purged dynamically as the algorithm then works in horizontal swathes, so we can detect when all three sides have lost interest in cached partial results at a certain distance from the current 'wavefront'.
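A hedged sketch of the caching-unevaluated-subproblems idea, again on two sequences and with a plain mutable map rather than the StateT layering on the branch:

```scala
import cats.Eval
import scala.collection.mutable

// Each cache entry is a memoised Eval, so looking one up merely defers the
// work; Eval's trampolining keeps the recursion off the JVM stack, and
// memoisation still guarantees each subproblem is evaluated at most once.
// Evaluation order (depth- versus breadth-first) is a separate concern not
// shown here.
def lcsLengthEval[A](left: IndexedSeq[A], right: IndexedSeq[A]): Int = {
  val cache = mutable.Map.empty[(Int, Int), Eval[Int]]

  def of(i: Int, j: Int): Eval[Int] =
    cache.getOrElseUpdate(
      (i, j),
      Eval
        .defer {
          if (i == 0 || j == 0) Eval.now(0)
          else if (left(i - 1) == right(j - 1)) of(i - 1, j - 1).map(1 + _)
          else
            for {
              dropLeft  <- of(i - 1, j)
              dropRight <- of(i, j - 1)
            } yield math.max(dropLeft, dropRight)
        }
        .memoize
    )

  of(left.size, right.size).value
}
```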

@sageserpent-open
Owner Author

sageserpent-open commented Oct 21, 2024

Good news on the first approach, commit: 9f7cb5d...

From running LongestCommonSubsequenceTest.theLongestCommonSubsequenceUnderpinsAllThreeResults:

(3870,85140)
(408,5508)
(324,1620)
(896,12992)
(1250,20000)
(2296,44772)
(1288,18676)
(2552,52316)
(1800,40500)
(1892,44462)
(1144,25168)
(2028,36504)
(240,1200)
(2494,42398)
(1680,21840)
(544,5984)
(3948,63168)
(3276,75348)
(800,8800)
(650,7150)
(200,3900)
(1508,22620)
(1920,32640)
(986,15283)
(1848,15708)
(2310,21945)
(1932,28014)
(3468,98838)
(2940,47040)
(1428,19992)
(4346,80401)
(2310,48510)
(1152,18432)
(1134,9639)
(3956,85054)
(384,4416)
(1302,27342)
(2516,22644)
(2070,18630)
(2080,36400)
(462,8547)
(1456,24752)
(2100,32550)
(2418,47151)
(4784,100464)
(1664,23296)
(3864,63756)
(322,5796)
(1040,9880)
(2128,39368)
(3404,73186)
(112,1288)
(372,1302)
(2100,45150)
(2376,29700)
(1320,27720)
(4472,105092)
(1280,5760)
(696,12528)
(1472,16192)
(2204,20938)

From running Kinetic Merge:

(96,288)
(8,8)
(48,144)
(33972,2123250)
(1242,14283)
(48,144)
(8,8)
(8,8)
(8,8)
(8,8)
(32,64)
(50,125)
(32,64)
(242,1331)
(212424,33775416)
(1344,14112)
(8,8)
(420,2940)
(72,216)
(880,8800)
(13612,558092)
(8,8)

From VisualVM:
[Screenshot 2024-10-21 at 12 54 53]

So it's within 3G of heap during the merge / longest common subsequence when running Kinetic Merge.

@sageserpent-open
Owner Author

sageserpent-open commented Oct 21, 2024

Furthermore, there was no need to optimise structure sharing between the partial solutions - just trimming the cache was enough.

Commit 7850425 uses a map to implement the cache, whereas 9f7cb5d uses an array-backed deque, so possibly that also helps things?

So what would happen if this was applied to the other approach?
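For intuition on why trimming alone can be enough, here is a hedged two-sequence sketch: for the plain LCS length only the previous row of the table is ever consulted, so the cache can be kept to two rows regardless of input size. The real code has to retain enough partial results to reconstruct the subsequence across three sides, so its window is wider, but the principle is the same.

```scala
// Two-row variant of the bottom-up LCS length: purge everything except the
// previous row. Illustrative only; the cache in 9f7cb5d is an array-backed
// deque holding richer partial results.
def lcsLengthTwoRows[A](left: IndexedSeq[A], right: IndexedSeq[A]): Int = {
  var previous = Array.fill(right.size + 1)(0)
  var current  = Array.fill(right.size + 1)(0)
  for (i <- 1 to left.size) {
    for (j <- 1 to right.size)
      current(j) =
        if (left(i - 1) == right(j - 1)) previous(j - 1) + 1
        else math.max(previous(j), current(j - 1))
    // Swap the rows; index 0 is never written (stays 0) and every other slot
    // is overwritten on the next pass, so nothing needs clearing.
    val swap = previous; previous = current; current = swap
  }
  previous(right.size)
}
```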

@sageserpent-open
Owner Author

sageserpent-open commented Oct 21, 2024

NOTE: prior to closing this ticket, it might be worth revisiting the breakdown of filler sections that was fudged around in #43. That had to adopt a compromise where fillers are broken down into a prefix, a potential common part and a suffix, but perhaps we could be bolder and use one-token sections now. Hopefully the latest incarnation won't suffer from the merge alignment problems that were also present...

@sageserpent-open
Owner Author

Back on the second approach, 69a8c70 brings in the use of Eval layered within StateT to avoid stack overflow when doing a top-down memoised recursive breakdown of the subproblems.

Performance is lousy - LongestCommonSubsequenceTest suffers an eightfold slowdown compared with 38891f6, which is more or less what we started with, albeit with some housekeeping done on the old code.

This code uses a fussy pure-FP approach to caching, using an immutable map implementation. What happens if we:

  1. Go back to using an imperative cache implementation?
  2. Try to drop cached entries using a strategy similar to what was done for the first approach?

@sageserpent-open
Owner Author

sageserpent-open commented Oct 23, 2024

Commit 9cbe260 gives us back the imperative cache, and performance is improved, albeit not to the original level.

@sageserpent-open
Owner Author

sageserpent-open commented Oct 23, 2024

Running times for LongestCommonSubsequenceTest.

Baseline, 62dd55e: 20 seconds.

First approach...

Dynamic programming, a9d3f7f: 20 seconds.

Dynamic programming with aggressive purging of the cache, 9f7cb5d: 23 seconds.

Same but without lazy values, fa04ea4: 21 seconds.

Second approach...

Centralize caching, 8f7683f: 15 seconds.

More or less the same but without lazy values, 38891f6: 14 seconds.

Use an immutable cache implementation with StateT over Id, 1f77026: 58 seconds!

Same but with Eval instead of Id to gain stack-safety, 69a8c70: 115 seconds!

Go back to using an imperative cache, 9cbe260: 39 seconds.

@sageserpent-open
Owner Author

sageserpent-open commented Oct 23, 2024

It's fair to say that the second approach pulled ahead early but as soon as an immutable cache plus monadic workflow was adopted, performance went to the wall. It might have been interesting to see how StateT over Id with an imperative cache would have fared, but ultimately we need Eval.

Even putting the imperative cache back into the mix couldn't quite get back to earlier performance levels.

Now, there is still the outstanding business of trying to purge the cache on the second approach's branch. Doing this when working top-down requires the subproblems to be evaluated breadth-first in reverse order. Making that work in a continuation style means that all of the nodes in the call tree have to be encoded as continuations - so that is all of the subproblems evaluated top-down.

That means that even if the cache is incrementally purged, we still have to use heap space proportional to the number of evaluated subproblems - and that number can be high sometimes, even when taking the possible savings of the top-down approach into account.

That also ignores the added complexity of implementing a breadth-first evaluation strategy - unlike depth-first, it doesn't naturally emerge from the monad translation of top-down calls.

So it is with regret that I have to say, "Second approach, you're fired".

@sageserpent-open
Owner Author

Merged on to main in: 4ced27f.

@sageserpent-open
Owner Author

Went out in Release 1.4.0.
