Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean Merging (including code motion inside files) #13

Closed
sageserpent-open opened this issue Aug 27, 2023 · 25 comments
Closed

Clean Merging (including code motion inside files) #13

sageserpent-open opened this issue Aug 27, 2023 · 25 comments
Assignees
Labels
can-your-generative-turk-do-this enhancement New feature or request minimum viable product Contributing to the simplest thing that is useful when it is shipped

Comments

@sageserpent-open
Copy link
Owner

sageserpent-open commented Aug 27, 2023

This follows on from #1 , adding in the requirement dropped from that ticket:

Given a Git repository of files in multiple directories that has been worked on in two branches, provide a command line utility that merges one branch into another, taking into account code motion inside files.

Any divergent code motion is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.

Code motion is only considered inside a file on a file-by-file basis - so when a method is extracted and placed elsewhere in the same file, it should be picked up as code motion in this ticket. Similarly for code rearrangement due to method sorting in something like IntelliJ.

Moving a class into its own file would not be picked up in this ticket, that job is for #26.

@sageserpent-open sageserpent-open added the minimum viable product Contributing to the simplest thing that is useful when it is shipped label Aug 27, 2023
@sageserpent-open sageserpent-open changed the title Clean Merging (including code motion) Clean Merging (including code motion within files) Feb 14, 2024
@sageserpent-open sageserpent-open changed the title Clean Merging (including code motion within files) Clean Merging (including code motion inside files) Feb 14, 2024
@sageserpent-open
Copy link
Owner Author

Update: breaking out code motion across files into its own ticket: #26 .

@sageserpent-open sageserpent-open added the enhancement New feature or request label Feb 15, 2024
@sageserpent-open sageserpent-open self-assigned this Mar 12, 2024
@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 13, 2024

It may be the case that divergent code motion can be accomodated, at least in some situations by allowing both moves to take place; this can also pick up edits from the other side of the move in both cases. It is then up to user to decide whether to delete either or both moves, or simply accept both.

This doesn't work with ambiguous matches because multiple edits may associate to several of the matches, but that can be left as unsupported.

In fact, having ambiguous matches with code motion can lead to a situation where a moved section is associated with multiple edits on the other side - so there are edit collisions on the same side. Stay away from this for now...

@sageserpent-open
Copy link
Owner Author

More pondering leads to this rule of thumb: if an all-sides match leads to a coincident edit in the merge with the left and right contributions to the match being moved to different locations, then we have two independent edits, each applying to the move on the opposite side.

Going further, if the two moves land in the same place and are picked up by the merge as either a coincident insertion or edit, then we have an edit collision that can be represented as an insertion / insertion conflict or an edit / edit conflict.

If the two moves land in the same place that happens to be a preservation or coincident insertion because of an ambiguous match, we don't care what happens as that's not supported yet, as long as the application doesn't crash.

@sageserpent-open
Copy link
Owner Author

Regarding edit collisions due to ambiguous matches, a quick-and-dirty approach is to store the rival edits as a set, so duplicated edits don't actually conflict. If a set turns out to have more than edit, they are simply concatenated in no particular order and a diagnostic can be issued. A deletion would form its own entry in the set too.

For now however, just store one edit or deletion for each side and issue a diagnostic if a collision occurs on the same side.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 14, 2024

So in the implementation of the merge algebra for MergeResultDetectingMotion, we have to build up a mapping from the moved element or elements to either an edit or a deletion - and we have to supply the moved elements from the match and ignore any left or right element passed in by the calling merge algoithm, because that favours the left side for preservations, coincident insertions and coincident deletions.

By using the left and right elements from the match, we can distinguish between edits and / or deletions associated with both sides.

@sageserpent-open
Copy link
Owner Author

Got tests for the simpler situations exercising MergeResultDetectingMotion where there are just one edit or deletion to propagate in a move.

The more complex scenarios can wait (eg. a single edit on one side versus two distinct moves on the other side, or an edit on one side adjacent to a move destination versus a move on the other side).

The next thing to do is to wire in use of MergeResultDetectingMotion into CodeMotionAnalysisExtension and use the move mappings there - and write some example tests for CodeMotionAnalysisExtension to shows moves, moved edits and moved deletions in operation...

@sageserpent-open
Copy link
Owner Author

Got a failure in CodeMotionAnalysisExtensionTest.codeMotion, analysis follows...

@sageserpent-open
Copy link
Owner Author

BASE FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    override val showLocation: Boolean = true
    override val showTypes: Boolean    = true
  end assert
end ExpectyFlavouredAssert

@sageserpent-open
Copy link
Owner Author

LEFT FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    // Swapped the next two lines around...
    override val showTypes: Boolean    = true
    override val showLocation: Boolean = true

  end assert
end ExpectyFlavouredAssert

@sageserpent-open
Copy link
Owner Author

RIGHT FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    override val showLocation: Boolean = true
    override val showTypes: Boolean    = false // This edit should propagate.
  end assert
end ExpectyFlavouredAssert

@sageserpent-open
Copy link
Owner Author

DESIRED MERGE BASED ON INTUITION:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    // Swapped the next two lines around...
    override val showTypes: Boolean    = false // This edit should propagate.
    override val showLocation: Boolean = true

  end assert
end ExpectyFlavouredAssert

@sageserpent-open
Copy link
Owner Author

Outcome seen as of Git commit SHA: 9c5dd26 ...

11:43:26.972 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,0,28) as it is common to all three sides.
11:43:26.974 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(*** STUNT DOUBLE ***,28,7) into SectionImplementation(*** STUNT DOUBLE ***,28,11).
11:43:26.975 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,39,6) as it is common to all three sides.
11:43:26.975 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,45,1) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right, coalescing with following left insertion of SectionImplementation(*** STUNT DOUBLE ***,46,7).
11:43:26.977 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,46,7) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right, coalescing with following left insertion of SectionImplementation(*** STUNT DOUBLE ***,53,4).
11:43:26.977 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,53,4) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right.
11:43:26.982 [main] DEBUG c.s.kineticmerge.core.merge$ -- Merge yielded:
MergeResultDetectingMotion(
  coreMergeResult = MergedWithConflicts(
    leftElements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 45, size = 1),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 46, size = 7),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 53, size = 4)
    ),
    rightElements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 41, size = 12)
    )
  ),
  changesPropagatedThroughMotion = Map()
)

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 17, 2024

  1. override val showTypes: Boolean = was matched and was considered to be preserved in the same spot, contrary to the intuitive view that it would move on the left side towards the start of the file. It appeared as (..., 35, 6) on the base and right and (..., 39, 6) on the left.
  2. override val showLocation: Boolean = true was considered to be moved to the end of the file instead, being dealt with as a left-edit at its source and with the moved section appearing in a conflict at the end. It appeared as (..., 28, 7) on the base and right and (..., 46, 7) on the left.
  3. The initialiser true for override val showTypes: Boolean = true on the base and right was expected to match as a base + right match, being edited into false. This didn't happen (and as a single token, wouldn't be picked up anyway as a match), but puzzlingly, the merge algorithm didn't get the chance to make a last-minute fallback match either...
  4. ... what did happen as that on the base, the entire:true<newline>end assert<newline>end ExpectyFlavouredAssert became a section in its own right (..., 41, 5) ...
  5. ... on the left, true<newline> and then end assert<newline>end ExpectyFlavouredAssert became two sections (..., 45, 1) and (..., 53, 4) with the moved section (..., 46, 7) sitting between them ...
  6. ... and on the right, false // This edit should propagate.<newline>end assert<newline>end ExpectyFlavouredAssert came in as a final section (..., 41, 12).

So we got a conflict at the end.

@sageserpent-open
Copy link
Owner Author

The minimum sure-fire match size was 5, and as there was just the one file, no smaller matches would have been considered. So the obvious match of end assert<newline>end ExpectyFlavouredAssert across all sides didn't happen as that would be four tokens long with whitespace condensed into the preceding token.

The final section on each side was generated as a filler, hence the lack of consistency between them.

@sageserpent-open
Copy link
Owner Author

What happens with a minimum match size of 4? (Git commit SHA: 5a9a82b)

Now we see an all-sides match for end assert<newline>end ExpectyFlavouredAssert, good.

13:26:03.567 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: base at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,41,1).
13:26:03.567 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: left at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,28,11).
13:26:03.568 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: left at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,45,1).
13:26:03.568 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: right at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,41,8).
13:26:03.618 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,0,28) as it is common to all three sides.
13:26:03.620 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(*** STUNT DOUBLE ***,28,7) into SectionImplementation(*** STUNT DOUBLE ***,28,11).
13:26:03.620 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,39,6) as it is common to all three sides.
13:26:03.621 [main] DEBUG c.s.kineticmerge.core.merge$ -- Right edit of SectionImplementation(*** STUNT DOUBLE ***,41,1) into SectionImplementation(*** STUNT DOUBLE ***,41,8).
13:26:03.621 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left insertion of SectionImplementation(*** STUNT DOUBLE ***,46,7).
13:26:03.622 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,53,4) as it is common to all three sides.
13:26:03.625 [main] DEBUG c.s.kineticmerge.core.merge$ -- Merge yielded:
MergeResultDetectingMotion(
  coreMergeResult = FullyMerged(
    elements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 41, size = 8),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 46, size = 7),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 53, size = 4)
    )
  ),
  changesPropagatedThroughMotion = Map()
)

That is fully merged, seems promising ... but the test still fails!

@sageserpent-open
Copy link
Owner Author

Ah - the test expectations were too rigid - was using default equality to compare tokens, which takes whitespace into account. The tests were cutover to use the whitespace-insensitive Token.equality and after backpatching the test to check whether the changes in expectations had any effect on legacy testing, this was merged in via Git commit SHA: a694056.

Test now passes.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 17, 2024

This is good, but I'm uneasy about how much has been proven.

  1. The test analysed above didn't actually propagate any edits via moves - the edited code stayed put and the non-edited code leapfrogged over it. That is perfectly valid in its own right, but we need to prove propagation of edits (and deletions) works.
  2. Related to this is the ambiguity on the longest common subsequence algorithm about which LCS to choose when there is more than one possibility. This happens here because two lines of code are reordered - so one gets to be part of the LCS and the other is considered to move.
  3. We can jemmy the test cases so as to stumble across one that has to move an edit (or deletion), but here's a thought: it seems intuitively obvious that when reordering a long section of text with a smaller section of text, it is the smaller one that should be considered as moving - at least if both the moved and the LCS section both belong to distinct all-sides matches across each side. If one section is only matched in a pairwise match and the other in an all-sides match, it seems reasonable to treat the all-sides match as being part of the LCS (and that it was the longest common subsequence algorithm will do in a tiebreak).

@sageserpent-open
Copy link
Owner Author

Another wrinkle is that the nice intuitive picture of a single section moving on one side of the merge and being edited on the other generally won't be the case unless the minimum match size is quite large.

What will happen instead is that there will be three sections on each side - a leading and trailing one that are both matched on all sides, moving in tandem, and a middle one that has a pairwise match between the base and the side with the move - and an unmatched section that is the precisely the edit change. I expect this to work with the current implementation as of a694056, but this needs testing...

@sageserpent-open
Copy link
Owner Author

As of Git commit SHA: d3e02cb, CodeMotionAnalysisExtensionTest.codeMotion has been bolstered to show the propagation of an edit through a section move, and it passes.

Code motion has arrived!

@sageserpent-open
Copy link
Owner Author

Pushing all further work on supporting edge cases for propagating edits and deletions through moves into a separate ticket #28 .

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 18, 2024

Favouring moves of small sections over moves of large sections by modifying LongestCommonSubsequence results in a merge failure in CodeMotionAnalysisExtensionTest.codeMotion, investigating this in case the change in behaviour of the LCS is exposing a fundamental defect in the downstream merge...

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 19, 2024

As of Git commit SHA: 9a2fddb the longest common subsequence algorithm favours subsequences with longer section when choosing between longest common subsequences of the same length.

This is reflected in the downstream merge, where it is the smaller sections that get picked up as code motion when they interchange with larger sections.

The test failure noted above in CodeMotionAnalysisExtensionTest.codeMotion is due to the lack of support for edit anchoring. What happened was that a large construct was moved on one side and a small suffix of it edited on the other - the unchanged prefix moved cleanly through the merge, being picked up as an all-sides match, but the small suffix was not picked up as a pairwise match on the base and on the other side from the edit.

This led to the suffix moving on the other side from the edit as is, and an ordinary delete versus edit conflict at the source of the move. Had the suffix been large enough to match, then this would have resulted in a move of the suffix in tandem with the prefix; the edit being propagated through the move.

Edit anchoring is a possible way of dealing with this - unmatched sections are considered to be anchored to adjacent matched sections, so as long as the matched sections move in tandem, they could pick up the unmatched section as a move, thus allowing edit propagation.

A new ticket #29 has been made for edit anchoring; for now, we have something worth getting out there...

@sageserpent-open
Copy link
Owner Author

This went out in release 1.1.0, Git commit SHA: 26a9c9a.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Mar 21, 2024

Evidence (screencast of a demonstration of Kinetic Merge):

Kinetic.Merge.in.action.-.part.1.-.editing.the.code.mov
Kinetic.Merge.in.action.-.part.2.-.refactoring.the.code.mov
Kinetic.Merge.in.action.-.part.3.-.merging.the.branches.mov

@sageserpent-open
Copy link
Owner Author

Closing now that evidence is uploaded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can-your-generative-turk-do-this enhancement New feature or request minimum viable product Contributing to the simplest thing that is useful when it is shipped
Projects
None yet
Development

No branches or pull requests

1 participant