Investigate parallelisation of the longest common subsequence algorithm. #132
An outline of how to implement this...
As of 66261b3, have implemented the first, second and fifth items from above to make a sequential implementation, just to make sure the idea is correct and can pass the tests. That was a good idea, as it has been a really hard and painful slog trying to deliver just that part! The ordering of subproblems was really tricky to get right, and the implementation, while encapsulated, is a sprawling mess of imperative loops. Got to love it. The payoff though is that execution times for the example merge from #35 have come down a bit to: 1 minute 13 seconds, 1 minute 12 seconds, 1 minute 11 seconds, 1 minute 12 seconds. This is without introducing any parallelisation. It has become apparent that there is a very obvious way of parallelising three linear strands of computation of subproblems within each leading swathe without having to worry about potential overlap, so we may be in for a pleasant surprise next...
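
To make the swathe ordering concrete, here is a minimal sketch of a bottom-up LCS length computation that visits its subproblems swathe by swathe - the names and shape are assumptions made for illustration, not the actual implementation:

```scala
def lcsLength[Element](left: IndexedSeq[Element], right: IndexedSeq[Element]): Int =
  // `solutions(i)(j)` holds the LCS length of `left.take(i)` and `right.take(j)`;
  // row 0 and column 0 keep their default value of zero as the base cases.
  val solutions = Array.ofDim[Int](left.size + 1, right.size + 1)

  // Swathe k contains all the subproblems (i, j) with i + j == k, so every
  // subproblem in a swathe depends only on solutions from earlier swathes.
  for
    swathe <- 2 to left.size + right.size
    i      <- math.max(1, swathe - right.size) to math.min(left.size, swathe - 1)
  do
    val j = swathe - i
    solutions(i)(j) =
      if left(i - 1) == right(j - 1) then solutions(i - 1)(j - 1) + 1
      else math.max(solutions(i - 1)(j), solutions(i)(j - 1))

  solutions(left.size)(right.size)
```

Because nothing within a swathe reads anything else in the same swathe, the inner loop over `i` is where parallelism could be introduced without worrying about overlap.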
The code is a lot tidier as of fe7023c. This is important, because part of the aforementioned really hard and painful slog was having to keep sizeable, unreadable, near-duplicated blocks of code consistent with each other as bug-fixes were applied. This didn't always happen! The duplication has been removed, and performance hasn't dropped noticeably as a result of using a more principled approach.
Right, got to b6885ee. This fuses the swathes' future computations together to make a big
It's possible that something was misconfigured - perhaps the Cats threading model needs tweaking for this? Anyway, the execution times are now: 1 minute, 1 minute, 1 minute. Let's call it a wrap!
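
As a rough illustration of fusing the swathes' computations into one effect (the coordinate type, the `solveSubproblem` callback and the overall shape are assumptions for the sketch, not the actual code), the subproblems of each swathe could be traversed in parallel with Cats Effect while the swathes themselves remain sequenced:

```scala
import cats.effect.IO
import cats.syntax.all.*

// Subproblems within one swathe are independent of each other, so a swathe can be
// traversed in parallel; successive swathes stay sequenced, because each swathe
// reads solutions written by the ones before it.
def evaluateSwathes(
    swathes: List[List[(Int, Int)]],
    solveSubproblem: ((Int, Int)) => IO[Unit]
): IO[Unit] =
  swathes.foldLeft(IO.unit) { (earlierSwathes, swathe) =>
    earlierSwathes *> swathe.parTraverse(solveSubproblem).void
  }
```

Sequencing the swathes with `foldLeft` preserves the dependency order, while `parTraverse` leaves it to the Cats Effect runtime to decide how much of each swathe actually runs concurrently.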
This went out in Release 1.4.0.
The example merge from #35 takes around 20 seconds to perform per-file merging at the end.
No progress bar is shown during this period, which leaves the user with a sense of the application having hung.
A spike to add a progress bar in #35 yielded very unsatisfactory results - rather than devote more time to that, how about simply speeding it up via some parallelisation?
If the execution time comes down to less than a minute, I think that beats a psychological barrier.
The implementation uses an imperative cache of subproblem solutions, but perhaps the use of swathes described in #107 could allow each swathe's problems to be processed in a parallel fashion?
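
For context, the kind of imperative cache of subproblem solutions meant here can be pictured along the following lines - a hypothetical sketch computing only the LCS length, not the actual implementation:

```scala
import scala.collection.mutable

// Top-down LCS length computation that memoises each (i, j) subproblem in a
// mutable map so it is solved at most once. (Recursion depth grows with the
// input sizes, so this is purely illustrative.)
def lcsLengthWithCache[Element](left: IndexedSeq[Element], right: IndexedSeq[Element]): Int =
  val cache = mutable.Map.empty[(Int, Int), Int]

  def solve(i: Int, j: Int): Int =
    if i == 0 || j == 0 then 0
    else
      cache.getOrElseUpdate(
        (i, j),
        if left(i - 1) == right(j - 1) then solve(i - 1, j - 1) + 1
        else math.max(solve(i - 1, j), solve(i, j - 1))
      )

  solve(left.size, right.size)
```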