git combining two files into one with history preserved

Imagine that you have two files in a git repository, say A.txt and B.txt.

Is it possible to concatenate the two files into a third one, A+B.txt, removing the original A.txt and B.txt and committing it all, so that the history is still preserved?

That is, if I asked git log --follow A+B.txt I would know that the content originated from the A.txt and B.txt files?

I've tried to separate the files into two different branches and then merging them into a new file (while removing the old ones), but to no avail.

Answer

November 4, 2019

KyleMit

The long answer is 'yes'!

Full credit to Raymond Chen's article Combining two files into one while preserving line history:

Imagine you had two files: fruits & veggies

The naïve way of combining the files would be to do it in a single commit, but you'll lose line history on one of the files (or both).

You could tweak the git blame algorithms with options like -M and -C to get it to try harder, but in practice, you don’t often have control over those options (e.g. the git blame may be performed on a server).
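
To make the problem concrete, here is a minimal sketch of the naïve single-commit approach in a throwaway repository. The authors (Alice, Bob, Carol) and the file contents are made up for illustration and are not part of the original article. Plain git blame ends up pointing at least some lines at the combining commit; what -M and -C manage to recover will depend on the size and content of the files.

```shell
#!/bin/sh
# Sketch only: combine fruits and veggies in ONE commit, then inspect blame.
# Alice, Bob and Carol are hypothetical authors used for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits" --author="Alice <alice@example.com>"

printf 'carrot\ndaikon\nendive\n' > veggies
git add veggies
git commit -q -m "add veggies" --author="Bob <bob@example.com>"

# Naive approach: concatenate and delete the originals in a single commit.
cat fruits veggies > produce
git rm -q fruits veggies
git add produce
git commit -q -m "combine fruits and veggies" --author="Carol <carol@example.com>"

# Plain blame: collect the distinct authors that lines are attributed to.
naive_authors=$(git blame --line-porcelain produce | sed -n 's/^author //p' | sort -u)
echo "$naive_authors"

# Ask blame to try harder; the result depends on file size and content.
git blame -M -C -C produce
```

Carol shows up in the author list even though she wrote none of the lines, which is the loss of history described above.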

The trick is to use a merge with two forked branches.

In one branch, we rename veggies to produce.

In the other branch, we rename fruits to produce.

git checkout -b rename-veggies
git mv veggies produce
git commit -m "rename veggies to produce"
# switch back to the branch we started from
git checkout -
git mv fruits produce
git commit -m "rename fruits to produce"

Then merge the first branch into the second:

git merge -m "combine fruits and veggies" rename-veggies

This will generate a merge conflict, and that's okay. Now take the contents of each branch's produce file and combine them into one. Here's a simple concatenation (but resolve the merge conflict however you please):

cat "produce~HEAD" "produce~rename-veggies" >produce
git add produce
git merge --continue

The resulting produce file was created by a merge, so git knows to look in both parents of the merge to learn what happened.

And that’s where it sees that each parent contributed half of the file, and it also sees that the files in each branch were themselves created via renames of other files, so it can chase the history back into both of the original files.

Each line should be correctly attributed to the person who introduced it in the original file, whether it’s fruits or veggies. People investigating the produce file get a more accurate history of who last touched each line of the file.
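
The whole procedure can be tried end to end in a throwaway repository. The sketch below is an illustration, not the article's exact script: the authors (Alice, Bob) and file contents are invented, and it makes two assumptions of its own. It reads each side of the conflict from the branch tips with git show rather than from the produce~HEAD working-tree files, since what the merge leaves in the working tree may vary with the git version, and it concludes the merge with git commit --no-edit so that no editor is opened.

```shell
#!/bin/sh
# Sketch only: two pure renames on forked branches, merged into one file.
# Alice and Bob are hypothetical authors used for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits" --author="Alice <alice@example.com>"

printf 'carrot\ndaikon\nendive\n' > veggies
git add veggies
git commit -q -m "add veggies" --author="Bob <bob@example.com>"

# One pure rename on each of two forked branches.
git checkout -q -b rename-veggies
git mv veggies produce
git commit -q -m "rename veggies to produce"
git checkout -q -
git mv fruits produce
git commit -q -m "rename fruits to produce"

# The merge is expected to stop with a conflict, so keep set -e from aborting.
git merge -m "combine fruits and veggies" rename-veggies >/dev/null 2>&1 || true

# Assumption: take each side from its branch tip instead of produce~* files.
{ git show HEAD:produce; git show rename-veggies:produce; } > produce
git add produce
git commit -q --no-edit

# Collect the distinct authors that blame attributes the merged lines to.
merged_authors=$(git blame --line-porcelain produce | sed -n 's/^author //p' | sort -u)
echo "$merged_authors"
```

If the renames were detected as described, the blame lists only the original authors, and the "Demo" identity that performed the renames and the merge does not appear.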

For best results, your rename commit should be a pure rename. Resist the temptation to edit the file’s contents at the same time you rename it. A pure rename ensures that git’s rename detection will find the match. If you edit the file in the same commit as the rename, then whether the rename is detected as such will depend on git’s “similar files” heuristic.
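
One way to check that a rename commit really is pure is to look at the similarity score git reports for it. This is a small sketch in a throwaway repository with made-up file contents; the check itself is just git diff with -M and --name-status, where a status of R100 means a rename with 100% similarity.

```shell
#!/bin/sh
# Sketch only: confirm that a rename commit contains no content changes.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits"

# Pure rename: the content is not touched in this commit.
git mv fruits produce
git commit -q -m "rename fruits to produce"

# R100 = rename detected with 100% similarity between old and new file.
rename_status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$rename_status"
```

A score below 100 (for example R087) would indicate that the file was also edited in the rename commit, which is the situation the advice above is warning against.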

Check out the full article for a step-by-step breakdown and more explanation.

Originally, I had thought this might be a use case for git merge-file doing something like this:

>produce echo #empty
git merge-file fruits produce veggies --union -p > produce
git rm fruits veggies
git add produce
git commit -m "combine fruits and veggies"

However, all this does is help simulate the merge diffing algorithm against two different files. The end result, once committed, is identical to what you'd get if the file had been updated by hand and the changes committed manually.
