Memory usage and performance optimizations #7
The first commit of this series fixes the problem that long RCS histories of large files (nearly 30k revisions resulting in a 4 MB file) require a tremendous amount of memory (200 GB of RAM was not enough...). The solution is to keep only a hash digest for revisions that will no longer be used for diffing. Commit coalescing is still possible by comparing the hashes, but it requires far less memory.
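To illustrate the idea only (this is not the code from this PR), here is a minimal Python sketch. The `Revision` class, its `compact` method, and the choice of SHA-256 are all assumptions made for the example; the actual commit may use a different structure and digest.

```python
import hashlib


def _digest(lines):
    """Hash the full text of a revision, given as a list of lines."""
    h = hashlib.sha256()
    for line in lines:
        h.update(line.encode("utf-8"))
    return h.digest()


class Revision:
    """Hypothetical revision record: holds the full text until compacted."""

    def __init__(self, rev_id, lines):
        self.rev_id = rev_id
        self.lines = lines      # full text, needed while diffs are applied
        self.digest = None      # filled in once the text is dropped

    def compact(self):
        """Drop the text and keep only its digest (a few dozen bytes)."""
        if self.lines is not None:
            self.digest = _digest(self.lines)
            self.lines = None

    def content_key(self):
        """Return a key that compares equal for identical content."""
        if self.digest is not None:
            return self.digest
        return _digest(self.lines)


def same_content(a, b):
    """Coalescing check that works whether or not a revision was compacted."""
    return a.content_key() == b.content_key()
```

Once a revision is no longer a diff base, calling `compact()` releases its text, while `same_content` still lets two revisions be compared for coalescing.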
The next three changes avoid some unnecessary string and array copies.
This is complemented by applying the diff with a linear scan, which avoids many small array allocations. This change might be problematic because it introduces a new assumption: that the line numbers in a diff are always increasing.
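A rough Python sketch of such a linear scan, again not the PR's code: the `apply_edits` function and the tuple representation of edits are hypothetical, loosely modelled on the RCS `d<line> <count>` and `a<line> <count>` commands, whose line numbers refer to the old text. The sketch raises an error when the ordering assumption is violated instead of silently producing a wrong result.

```python
def apply_edits(old, edits):
    """Apply edits to `old` in a single pass, building one output list.

    Each edit is a hypothetical tuple:
      ("d", first_line, count)   delete `count` lines starting at `first_line`
      ("a", after_line, lines)   insert `lines` after old line `after_line`
    Line numbers are 1-based and refer to `old`. They must not decrease.
    """
    result = []
    pos = 0  # number of lines of `old` already consumed
    for op, line, payload in edits:
        if op == "d":
            start = line - 1
            if start < pos:
                raise ValueError("edits are not in increasing line order")
            result.extend(old[pos:start])   # copy unchanged lines
            pos = start + payload           # skip the deleted lines
        elif op == "a":
            if line < pos:
                raise ValueError("edits are not in increasing line order")
            result.extend(old[pos:line])    # copy unchanged lines
            pos = line
            result.extend(payload)          # add the new lines
        else:
            raise ValueError("unknown edit operation: %r" % (op,))
    result.extend(old[pos:])                # copy the remaining tail
    return result
```

For example, `apply_edits(["a", "b", "c", "d"], [("d", 2, 1), ("a", 3, ["x", "y"])])` deletes `b` and inserts two lines after `c`. Swapping the two edits raises `ValueError`, which is one way to make the new assumption explicit.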