Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synthetic merge commits #4

Open
laughinghan opened this issue Dec 21, 2016 · 5 comments
Open

Synthetic merge commits #4

laughinghan opened this issue Dec 21, 2016 · 5 comments

Comments

@laughinghan
Copy link
Owner

This is the subject of a lengthy multi-part comment in the code, and is still not resolved and needs improvement:

# - Complication: there can be >1 parent, with different Main trees. Luckily,

Basically the situation is, suppose there are two branches earlier in the history of Main that conflict in files outside Sub. Suppose each branch is split off into subproject commits, and the subproject commits are merged. What happens when we assimilate this merge commit (call it "assimilatee") back into the HEAD of Main? Obviously the subproject tree of the new synthetic merge commit will just be the root tree of the assimilatee, but what about the rest of the tree, which conflicts?

  • Currently, we aim to guarantee that the subsequent git merge ASSIMILATE_HEAD will not change anything outside the subproject. This is nice because it means that git-subhistory merge is almost literally just git-subhistory assimilate "$@" && git merge ASSIMILATE_HEAD, isolating the complexity to just git-subhistory assimilate. I see no reason for this to be a hard requirement though, we could instead do git-subhistory assimilate "$@" && git merge ASSIMILATE_HEAD --no-commit && git checkout HEAD <all-files-outside-subproject> (to enforce that git-subhistory merge doesn't change anything outside the subproject, which in my opinion is a hard requirement) if loosening that contract would somehow improve git-subhistory assimilate, like by making it more predictable or usable by a use case other than git-subhistory merge.
    • The current way we guarantee this could be made smarter than it is but is fundamentally full of tradeoffs, as far as I can tell. What we do is 1) if all parent commits of the assimilatee have the same Main tree outside the subproject, use that; 2) if there'll be exactly one merge base between the ASSIMILATE_HEAD and HEAD, use that merge base's Main tree, since that can't possibly conflict with HEAD; 3) if all else fails, just use HEAD's Main tree, since HEAD cannot conflict with itself. Note that paths 2 and 3 may result in a synthetic merge commit whose Main tree is very different from any of its parents, which seems pretty undesirable, ideally synthetic merge commits at least vaguely resemble the commit you'd get if you manually merged the parents.
    • One straightforward improvement to path 2 that I thought of just now as I'm writing this is to look for a merge base between the synthetic merge commit and HEAD, rather than ASSIMILATE_HEAD; that way, the synthetic merge commit's Main tree would be as close to its parents as it could be while still being based on a real Main merge commit, and also while still doing "exactly the right thing in a common use case".
    • One possible improvement to this that I considered and discarded was to search for the commit in HEAD's history where these conflicting branches were merged (HEAD is one commit so this must have happened at some point). Even if that only happens once, though, there's no guarantee that it will prevent a merge conflict between ASSIMILATE_HEAD and HEAD: if the two conflicting branches turn out to be the merge bases, then git-merge will recursively merge the multiple merge bases together, until eventually a single fictitious merge base is left, which is used as the base of the 3-way merge; the tree in this fictitious merge base has no relationship to the tree of the actual merge commit, which you can set to whatever you want when you create a merge commit.
    • That almost seems like good news, because it seems almost like we can hack the recursive git merge algorithm: if assimilatee's parents' corresponding Main commits are the merge bases, then we know recursive merge will use the merge of those as the fictitious merge base, so we can just automatically merge them right now and use that tree as the synthetic merge commit's tree, and then the tree won't conflict because it is the (fictitious) merge base's tree. But what if these branches conflict?
  • One straightforward-seeming alternative is to indeed forgo the requirement that ASSIMILATE_HEAD will be a clean merge into HEAD outside the subproject, and instead specify that synthetic merge commits always resolve conflicts in the Main tree in favor of the first parent (^1), or even to specify that synthetic merge commits don't even bother trying to merge the Main tree, outside the subproject they're always identical to the first parent's Main tree. (This might be desirable because it's common for automatic merging to result in a broken project state due to semantic merge conflicts, especially in the case of merge conflicts that are blindly resolved in favor of one branch or other.)
    • It would be pretty sad to lose the clean merge though, because in the vast majority of cases we can actually do exactly the right thing.
@laughinghan
Copy link
Owner Author

Another point against the clean merge requirement: just because ASSIMILATE_HEAD is a clean merge into HEAD now doesn't mean it will be in the future, even into commits that are a fast-forward from HEAD. In other words,

git-subhistory assimilate path/to/sub/ subproj
git pull # or `git commit ...` or whatever to add new commits on top of HEAD
git merge ASSIMILATE_HEAD

could well result in a merge conflict!

@laughinghan
Copy link
Owner Author

Right now, after going through all this, what I'm thinking going forward is (assuming I don't kill git-subhistory entirely):

  • definitely improve path 2 in the straightforward way described above
  • consider adding a path 1.5 where, if the parents have just one merge base, and all but one parent's Main tree (outside the subproject) is identical to that of the merge base, then we use the odd-one-out's Main tree
    • is that the only case where a merge results in exactly the same tree as one of the parents? I guess it's possible for A to have made a subset of the changes that B made to the merge base M? I guess anything like that is what I want?
    • basically only do an "ultra-safe, ultra-clean merge"
  • instead of path 3 (falling back to HEAD), if paths 1, 1.5, and 2 fail then fall back to copying the first parent's Main tree wholesale (outside the subproject)

Even though we lose the guarantee of a clean merge into HEAD (due to the change to path 3), these still maintain the guarantee that outside the subproject, the Main tree will always be identical to that of some Main commit. I think it also might very well do a more-or-less optimal job of doing the most sensible merge of the Main tree given the parents, while maintaining that guarantee.

Because of the loss of the clean merge guarantee, we must also indeed change git-subhistory merge to do git-subhistory assimilate "$@" && git merge ASSIMILATE_HEAD --no-commit && git checkout HEAD <all-files-outside-subproject>.

@laughinghan
Copy link
Owner Author

To summarize, the ultimate proposed algorithm is:

  1. If normal merging of the parents results in a Main tree that is (outside of the subproject) identical to one of the parent trees, use it.
  2. If there will be exactly one merge base between HEAD and the synthetic merge commit, use that merge base's Main tree. Typically this would happen with:
    ⋮
    |
    |   * Merge 2nd fix from Main
    |  /|
    | / |
    |/  |
    * 2nd fix to subproject in Main
    |   |
    |   |
    |   * Merge fix from Main
    |  /|
    | / |
    |/  |
    * Fix subproject in Main
    |   |
    |   |
    |   * Update subproject (outside Main)
    |  /
    | /
    |/
    * Initial subproject commit in Main
    |
    |
    ⋮
    
    Both kinds of Merge * from Main commits would have their merge bases be the *fix subproject in Main commits, which is exactly the right Main tree to copy.
  3. If neither of those apply, use the Main tree of the first parent.

This feels like a very clean contract to me, and I think it's "optimal" in the sense that in every case where there's an obvious right answer to a human about how to merge the Main tree of the synthetic merge commit, this algorithm would pick it, and it would also never pick any obviously wrong answers, all while maintaining the guarantee that, outside the subproject, the Main tree of the synthetic commit is always identical to that of some real, non-synthetic Main commit.

@laughinghan
Copy link
Owner Author

Man, for a day there I almost convinced myself that case 2 was strictly a subset of case 1 (and hence superfluous); it isn't, but in the course of trying to prove it I've figured out exactly why and when it's needed. Math! Proofs! It's been so long!

Case 2 is not needed in the example above, which actually also satisfies case 1 because for each of those merge commits, one parent is synthetic and therefore makes no changes to the Main tree (outside the subproject) relative to the merge base [FN1], so normal merge will result in a Main tree identical (outside the subproject) to the non-synthetic parent even if it does change the Main tree.

However, case 2 is necessary if the synthetic merge commit M's parents have multiple merge bases (due to criss-cross merges) that conflict when recursively merged, like if 1 and 2 below had a merge conflict (outside the subproject):

---1---Q---A---M
    \ /       /
     X       /
    / \     /
---2---o---B---o---o [HEAD]

There's a unique merge base between M and HEAD, namely B, and B..A (the commits that are ancestors of A but not of B) are all synthetic and should never change the Main tree outside the subproject [FN1], therefore outside the subproject the Main tree should only reflect A..B, i.e., be identical to B, which is what case 2 enforces.

One might hope that's unnecessary because if B..A make no changes to a path and only A..B make changes to the path, then you'd think the merge of A and B would be just B at that path. However, if 1 and 2 conflict, or in fact just as long as they don't satisfy cases 1 or 2 when generating synthetic merge commit Q (e.g. 1 and 2 have independent, non-conflicting changes), then Q would've been case 3 and hence would have a Main tree identical (outside the subproject) to one of 1 or 2, not their merge. But their merge is exactly the merge base that Git's default "recursive" merge strategy would use when merging A and B, so Q and therefore A would in fact have changes relative to that.

This example not only illustrates why case 2 is not a subset of case 1, but also illustrates why case 2 is exactly what we want: in the general case, even for octopus merges, if there's a unique merge base B between HEAD and synthetic merge commit M, then B..M^@ (the commits that are ancestors of any parent of M (like A) but not of B) are all synthetic and should never change the Main tree outside the subproject. Therefore, outside the subproject the Main tree should only reflect the changes in M^@..B, i.e. be identical to B.

Footnote:

  • [FN1]: By design, this is true even of subproject commits that were split from Main commits that do change the Main tree outside the subproject, such commits will essentially be cherry-picked omitting the Main tree changes outside the subproject. This design is because if a feature branch has both Main changes and subproject changes, and you split out the subproject changes, and then git-subhistory merge in those subproject changes, you obviously only want the subproject changes and not the feature branch's Main changes. So for commits with Main changes that aren't on HEAD, we cherry-pick rather than mapping back.

@Darthholi
Copy link

(Trying to understand it to help you :) But maybe in the end I will just wait for the time, when in test.sh does appear a new test that will simulate this behavior to test if your new strategy and algorithm works... :) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants