git-subhistory
— Interchangeably merge in and split out subtree history
by Han [email protected]
git-subhistory
manages virtual subproject repos inside a superproject
repo. Like git-subtree
but unlike git-submodule
, git-subhistory
is
stateless, you pass the subproject's subdirectory (path/to/sub/
) as an
argument but the subdirectory isn't specially marked in any way. The
subproject (Sub)'s files are tracked directly by the superproject (Main)
repo like any other files, and no other Git tools know or care that
path/to/sub/
contains a subproject.
You know how you can git log path/to/sub/
to see the history of just
the stuff in path/to/sub/
? Well, git-subhistory split
creates actual
commits of just that stuff. This separate commit history of just Sub can
then be git push
-ed to its own repo to be shared with other projects.
If commits are added on top of that separate commit history,
git-subhistory merge
can merge those changes back into Main's commit
history by inverting split
, creating separate commits that make the
same changes but inside path/to/sub/
, which can then be merged in.
(git-subtree split
is almost the same as git-subhistory split
.
git-subtree merge
contrasts with git-subhistory merge
in that while
it also merges the changes correctly, it messes up the commit graph with
duplicate commits and breaks Git's algorithms to find appropriate merge
bases and determine fast-forwardness. See also "Comparison with
git-subtree
".)
Let's say we have history like so:
[HEAD]
[initial commit] [master]
o-------------------------------o-------------------------------o-------------------------------o
Add a Main thing Add a Sub thing Add another Sub thing Add another Main thing
__________________________ __________________________ __________________________ __________________________
| | | | | | | |
| Files: | | Files: | | Files: | | Files: |
| + a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing |
| | | + path/to/sub/ | | path/to/sub/ | | + another-Main-thing |
| | | + a-Sub-thing | | a-Sub-thing | | path/to/sub/ |
| | | | | + another-Sub-thing | | a-Sub-thing |
| | | | | | | another-Sub-thing |
| | | | | | | |
|__________________________| |__________________________| |__________________________| |__________________________|
We can split out the commit history of just Sub in path/to/sub/
by
doing git-subhistory split path/to/sub/ -b subproj
, which also creates
a new branch subproj
to point to it. This results in 2 disconnected
histories, untouched master
and sparkly new subproj
:
[HEAD]
[initial commit] [master]
o-------------------------------o-------------------------------o-------------------------------o
Add a Main thing Add a Sub thing Add another Sub thing Add another Main thing
__________________________ __________________________ __________________________ __________________________
| | | | | | | |
| Files: | | Files: | | Files: | | Files: |
| + a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing |
| | | + path/to/sub/ | | path/to/sub/ | | + another-Main-thing |
| | | + a-Sub-thing | | a-Sub-thing | | path/to/sub/ |
| | | | | + another-Sub-thing | | a-Sub-thing |
| | | | | | | another-Sub-thing |
| | | | | | | |
|__________________________| |__________________________| |__________________________| |__________________________|
[SPLIT_HEAD]
[initial commit] [subproj]
o-------------------------------o
Add a Sub thing Add another Sub thing
__________________________ __________________________
| | | |
| Files: | | Files: |
| + a-Sub-thing | | a-Sub-thing |
| | | + another-Sub-thing |
| | | |
| | | |
|__________________________| |__________________________|
Say we push subproj
to a public repo for just Sub, and implausibly,
our code isn't perfect and bugfixes are contributed to the public repo.
Now we've git pull
-ed in upstream bugfixes:
[sub-upstream/master] # remote-tracking branch
[initial commit] [subproj]
o-------------------------------o-------------------------------o-------------------------------o
Add a Sub thing Add another Sub thing Fix Sub somehow Fix Sub further
__________________________ __________________________ __________________________ __________________________
| | | | | | | |
| Files: | | Files: | | Files: | | Files: |
| + a-Sub-thing | | a-Sub-thing | | a-Sub-thing | | a-Sub-thing |
| | | + another-Sub-thing | | another-Sub-thing | | another-Sub-thing |
| | | | | + fix-Sub-somehow | | fix-Sub-somehow |
| | | | | | | + fix-Sub-further |
| | | | | | | |
|__________________________| |__________________________| |__________________________| |__________________________|
Here's where the magic happens: we can easily merge these changes into
Main using the Add another Sub thing
commit as the merge base by
doing git-subhistory merge path/to/sub/ subproj
, resulting in:
[HEAD]
[initial commit] [master]
o-------------------------------o-------------------------------o-------------------------------o--------------------------------------------------------------------------------------------------o
| | |\------------------------------|--------------------------------o--------------------------------o-------------------------------/|
Add a Main thing Add a Sub thing Add another Sub thing Add another Main thing Fix Sub somehow Fix Sub further Merge subhistory branch 'subproj' under path/to/sub/
__________________________ __________________________ __________________________ __________________________ __________________________ __________________________ __________________________
| | | | | | | | | | | | | |
| Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: |
| + a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing |
| | | + path/to/sub/ | | path/to/sub/ | | + another-Main-thing | | path/to/sub/ | | path/to/sub/ | | < another-Main-thing |
| | | + a-Sub-thing | | a-Sub-thing | | path/to/sub/ | | a-Sub-thing | | a-Sub-thing | | path/to/sub/ |
| | | | | + another-Sub-thing | | a-Sub-thing | | another-Sub-thing | | another-Sub-thing | | a-Sub-thing |
| | | | | | | another-Sub-thing | | + fix-Sub-somehow | | fix-Sub-somehow | | another-Sub-thing |
| | | | | | | | | | | + fix-Sub-further | | > fix-Sub-somehow |
| | | | | | | | | [Note: no | | | | > fix-Sub-further |
| | | | | | | | | another-Main-thing yet] | | | | |
|__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________|
See that? The commits on subproj
that fixed Sub after adding
another-Sub-thing
were assimilated into commits that fix Sub inside
path/to/sub/
after adding path/to/sub/another-Sub-thing
, allowing
them to merge cleanly like they clearly should, no commits duplicated.
This goes the other way, too. Say we make further changes to Sub:
[HEAD]
[initial commit] [master]
o-------------------------------o-------------------------------o-------------------------------o--------------------------------------------------------------------------------------------------o--------------------------------o
| | |\------------------------------|--------------------------------o--------------------------------o-------------------------------/| |
Add a Main thing Add a Sub thing Add another Sub thing Add another Main thing Fix Sub somehow Fix Sub further Merge subhistory branch ... Add yet another Sub thing
__________________________ __________________________ __________________________ __________________________ __________________________ __________________________ __________________________ _____________________________
| | | | | | | | | | | | | | | |
| Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: |
| + a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing |
| | | + path/to/sub/ | | path/to/sub/ | | + another-Main-thing | | path/to/sub/ | | path/to/sub/ | | < another-Main-thing | | another-Main-thing |
| | | + a-Sub-thing | | a-Sub-thing | | path/to/sub/ | | a-Sub-thing | | a-Sub-thing | | path/to/sub/ | | path/to/sub/ |
| | | | | + another-Sub-thing | | a-Sub-thing | | another-Sub-thing | | another-Sub-thing | | a-Sub-thing | | a-Sub-thing |
| | | | | | | another-Sub-thing | | + fix-Sub-somehow | | fix-Sub-somehow | | another-Sub-thing | | another-Sub-thing |
| | | | | | | | | | | + fix-Sub-further | | > fix-Sub-somehow | | fix-Sub-somehow |
| | | | | | | | | [Note: no | | | | > fix-Sub-further | | fix-Sub-further |
| | | | | | | | | another-Main-thing yet] | | | | | | + yet-another-Sub-thing |
|__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |_____________________________|
And then we git-subhistory split path/to/sub/ -b subproj2
:
[sub-upstream/master] [SPLIT_HEAD]
[initial commit] [subproj] [subproj2]
o-------------------------------o-------------------------------o-------------------------------o--------------------------------o
Add a Sub thing Add another Sub thing Fix Sub somehow Fix Sub further Add yet another Sub thing
__________________________ __________________________ __________________________ __________________________ __________________________
| | | | | | | | | |
| Files: | | Files: | | Files: | | Files: | | Files: |
| + a-Sub-thing | | a-Sub-thing | | a-Sub-thing | | a-Sub-thing | | a-Sub-thing |
| | | + another-Sub-thing | | another-Sub-thing | | another-Sub-thing | | another-Sub-thing |
| | | | | + fix-Sub-somehow | | fix-Sub-somehow | | fix-Sub-somehow |
| | | | | | | + fix-Sub-further | | fix-Sub-further |
| | | | | | | | | + yet-another-Sub-thing |
|__________________________| |__________________________| |__________________________| |__________________________| |__________________________|
The assimilated commits are guaranteed to map back to the exact same
commits they were assimilated from, down to the hashes. That means
subproj2
will be a fast-forward from subproj
, and if more bugfixes
have been added upstream, Fix Sub further
will be the merge base.
This is a data model guarantee, requiring no local state to enforce:
someone else could pull your master
into their clone and split them
out, and they would still map to the exact same commits they were
assimilated from, and the merge bases and fast-forwards will still be
exactly right.
-
git-subhistory split <subproj-path> [(-b | -B) <subproj-branch>]
Literally just uses the
--subdirectory-filter
ofgit-filter-branch
, which does pretty much the same thing asgit-subtree split
: it generates a completely new, synthetic commit graph of the history of just Sub's directory. Like the commit history shown bygit log --graph path/to/sub/
, it includes only commits that affectedpath/to/sub/
, but each commit is rewritten so its root tree is thepath/to/sub/
subtree. -
git-subhistory assimilate <subproj-path> <subproj-branch>
Invert
split
: look through the commits on<subproj-branch>
for synthetic commits that were generated by splitting out some ancestor ofHEAD
, and then on top of the original Main commits those synthetic commits were generated from, generates more synthetic commits that make the same change as the Sub commits but topath/to/sub/
in Main instead.
(Note: synthetic merge commits are tricky, because the Main trees of their parents may conflict.git-subhistory
tries to create the simplest synthetic merge commits, but currently prioritizes guaranteeing thatASSIMILATE_HEAD
be a clean merge intoHEAD
. This may change: mayhaps in the future simplicity of synthetic merge commits is prioritized, andgit-subhistory
merge will be made more sophisticated and will automatically resolve conflicts in the Main tree outsidepath/to/sub/
in favor ofHEAD
.)
(Note: it is recommended to only split from/merge into the same Main branch. TODO: explain why merging in commits split from another branch can be like cherry-picking.) -
git-subhistory merge <subproj-path> <subproj-branch>
Almost literally just
git-subhistory assimilate "$@" && git merge ASSIMILATE_HEAD
.
TODO:
-
git-subhistory push <subproj-path> <remote> <remote-branch>
Should be just
git-subhistory split <subproj-path> && git push <remote> SPLIT_HEAD:<remote-branch>
. -
git-subhistory pull <subproj-path> <remote> <remote-branch>
Should be just
git fetch <remote> <remote-branch> && git-subhistory merge <subproj-path> FETCH_HEAD
.
-
working directory: Unlike
git-submodule
andgit-subtree
,git-subhistory
does NOT "need to [be run] from the toplevel of the working tree", run it wherever you damn well please, even insidepath/to/sub/
with.
as<subproj-path>
. -
SPLIT_HEAD
: All commands rungit-subhistory split
at some point, mutatingSPLIT_HEAD
to beHEAD
split at<subproj-path>
. -
grafts/replacement refs: Synthetic commits are all created with
git-filter-branch
, which honors theinfo/grafts
file and refs inrefs/replaces/
. Grafts and replacement refs will hence be permanently baked into the synthetic commit histories. Changing relevant ones will cause subsequent synthetic history to not match up at all. (Potential TODO: creating synthetic grafts and replacement refs for the synthetic history? But originals and synthetic grafts/replacement refs would have to be modified in sync.)
I actually think git-subtree
was halfway there, it got splitting pretty
much right, but then did merging wrong and had to needlessly complicate
splitting to deal with the broken merging. If you went through the
example above with equivalent git-subtree
commands, the history after
splitting would be identical right down to the hashes, but after
merging, you'd get:
[initial commit] [initial commit] [master]
o-------------------------------o-------------------------------o-------------------------------o--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------o
| | | | o--------------------------------o--------------------------------o--------------------------------o-------------------------------/|
Add a Main thing Add a Sub thing Add another Sub thing Add another Main thing Add a Sub thing Add another Sub thing Fix Sub somehow Fix Sub further Merge commit 'd535a7c' under subtree path/to/sub/
__________________________ __________________________ __________________________ __________________________ __________________________ __________________________ __________________________ __________________________ __________________________
| | | | | | | | | | | | | | | | | |
| Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: | | Files: |
| + a-Main-thing | | a-Main-thing | | a-Main-thing | | a-Main-thing | | + a-Sub-thing | | a-Sub-thing | | a-Sub-thing | | a-Sub-thing | | a-Main-thing |
| | | + path/to/sub/ | | path/to/sub/ | | + another-Main-thing | | | | + another-Sub-thing | | another-Sub-thing | | another-Sub-thing | | < another-Main-thing |
| | | + a-Sub-thing | | a-Sub-thing | | path/to/sub/ | | | | | | + fix-Sub-somehow | | fix-Sub-somehow | | path/to/sub/ |
| | | | | + another-Sub-thing | | a-Sub-thing | | | | | | | | + fix-Sub-further | | a-Sub-thing |
| | | | | | | another-Sub-thing | | | | | | | | | | another-Sub-thing |
| | | | | | | | | | | | | | | | | > fix-Sub-somehow |
| | | | | | | | | | | | | | | | | > fix-Sub-further |
| | | | | | | | | | | | | | | | | |
|__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________| |__________________________|
Note the duplicate Add a Sub thing
and Add another Sub thing
commits.
What happened was, there were the originals, which add
path/to/sub/a-Sub-thing
and path/to/sub/another-Sub-thing
to Main,
right? They were split into commits which add a-Sub-thing
and
another-Sub-thing
to Sub, just like you'd want.
But then, git-subtree merge
did git merge --strategy subtree
, which
is a fine and dandy merge strategy, but git-merge
always creates a
merge commit whose parents are the commits passed to it, and in this
case one is a commit to Main and one is a commit to Sub: notice how in
the merge commit, the subproject files a-Sub-thing
and
another-Sub-thing
are at different paths in the two parent commits.
Hence duplicated commits: the changes are to different paths.
By contrast, git-subhistory merge
first does git-subhistory assimilate
to generate synthetic commits to Main from the upstream bugfixes to Sub,
so both parents of the merge commit are commits to Main. Importantly,
the synthetic commits are generated on top of the original subproject
commits, so instead of being duplicated, those originals are the merge
base, like they should be!
(Asides:
merge -s subtree
is actually awesome if you only ever merge in upstream changes and never split out changes to push upstream: nextmerge -s subtree
, the commit that was last merged in usingmerge -s subtree
will be the merge base, and Sub's commit history shows up in Main's commit history. Our problem with it is that if you do split out any commits, they get duplicated and can't serve as a merge base and will never fast-forward.git-subtree split
doesn't just serve the same purpose asgit-subhistory split
, it actually does almost exactly the same thing and often generates identical commits with identical hashes. In fact, I started out trying to forkgit-subtree
, intending to add an alternative way to merge, until I discovered thatgit-subtree split
was not justgit filter-branch --subdirectory-filter
, it "manually" loops through history and callsgit commit-tree
. I don't know exactly why but I'm pretty sure that's because it does something special withmerge -s subtree
commits, which is wheregit-subtree
-generated commits start diverging from plain oldgit filter-branch --subdirectory-filter
-generated commits (which whatgit-subhistory split
does).- Because of all this, most
git-subtree
tutorials I've seen actually recommend using--squash
when merging, which creates a synthetic commit combining all upstream changes to Sub since the last merge. This solves the duplicate commit problem, but no upstream Sub commits are in Main's history, Main commits are never a fast-forward of any upstream Sub commits, and there will never be a merge-base. On projects I've worked on where we wanted to both push to and pull from an upstream Sub repo, we decided that was worse than submodules and used submodules instead.
)