Migrating multi-project Subversion repo to Git

I've been recording my Subversion to Git journeys to remind myself for the next Subversion conversion. Here are my prior entries in case they're helpful:
Simple SourceForge repo from Subversion to Git
Self-hosted Git server

I had saved one Subversion project for last because of its relatively complex structure. It's actually one of my first projects, the Text Trix text editor, and I had migrated it from CVS to Subversion way back in the day using svn2git.

Here is the basic structure:
root
|--texttrix
   |--trunk/branches/tags
|--jsyntaxpanettx
   |--trunk/branches/tags
|--osterttx
   |--trunk/branches/tags
|--plugins
   |--plugin0
      |--trunk/branches/tags
   |--plugin1
      |--...
|--txtrx
   |--trunk/branches/tags

When I gave svn2git the root-level URL, it only tracked changes from the period when the repo held a single project in the trunk/branches/tags layout. Once I had expanded the repo to hold multiple projects, each with its own trunk/branches/tags organization, svn2git stopped following the folders. To capture the full history, I pointed svn2git at the main project only, and it was able to follow the project's history back through its move to that location. I also excluded any .class, .jar, and .lex files since these can be regenerated.

svn2git https://svn.code.sf.net/p/texttrix/svn/texttrix --authors ~/authors-ttx.txt --verbose --exclude '.*\.class$' --exclude '.*\.jar$' --exclude '.*\.lex$'
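
Since each --exclude value is a regular expression, it can help to preview what a pattern catches before starting a long import. Here is a minimal sketch, assuming grep -E is a close enough stand-in for svn2git's regex handling; the sample paths are made up for illustration.

```shell
# Hypothetical sample paths; substitute a listing from your own repository.
paths='trunk/src/TextTrix.java
trunk/src/TextTrix.class
trunk/lib/oster.jar
trunk/src/syntax.lex
trunk/docs/subclass'

# An unescaped dot matches any character, so a pattern like '.*.class$'
# also catches a file that merely ends in "class", such as "subclass".
loose=$(printf '%s\n' "$paths" | grep -E '.*.class$')

# Escaping the dot restricts the match to a literal extension.
strict=$(printf '%s\n' "$paths" | grep -E '.*\.(class|jar|lex)$')

printf 'loose pattern matches:\n%s\n\n' "$loose"
printf 'strict pattern matches:\n%s\n' "$strict"
```

The unescaped form matches a superset of the intended files, so it would not by itself explain files slipping past the filter.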

For some reason many of the files I meant to exclude leaked through, so I eventually turned to the BFG Repo-Cleaner to remove them, with some helpful hints from this guide. I first needed to initialize a bare Git repo to "remotely" host my Subversion-imported repo:

mkdir ttxhost
cd ttxhost
git init --bare
cd ../texttrix
git remote add origin ../ttxhost

This "remote" host is actually local but serves as a clean repository from which BFG can pull and push. But before I actually ran BFG, I needed to make sure that my commit at HEAD did not have any files that I intended to remove since BFG leaves the HEAD commit alone.

git rm lib/*.jar
git commit -m "Remove remaining jars"
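
To confirm that HEAD really is free of the unwanted files before running BFG, one option is to list the tracked files and filter by extension. This is a sketch on a throwaway repository; the file names and the demo identity are hypothetical.

```shell
# Build a throwaway repository (hypothetical file names) to illustrate the check.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/lib" "$repo/src"
echo 'binary stand-in' > "$repo/lib/oster.jar"
echo 'class Demo {}' > "$repo/src/Demo.java"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial import"

# List tracked files at HEAD that match the extensions slated for removal.
before=$(git -C "$repo" ls-tree -r --name-only HEAD | grep -E '\.(class|jar|lex)$')

# Remove them and commit, mirroring the step above.
git -C "$repo" rm -q 'lib/*.jar'
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "Remove remaining jars"

# An empty result means HEAD no longer carries any of the unwanted files.
after=$(git -C "$repo" ls-tree -r --name-only HEAD | grep -E '\.(class|jar|lex)$' || true)
echo "before: ${before:-<none>}"
echo "after:  ${after:-<none>}"
```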

Push the master branch into ttxhost (adding --all and --tags would bring along any other branches and tags):
git push origin master
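
If the imported repo has branches or tags beyond master, they need to be pushed explicitly. The sketch below uses throwaway repositories standing in for texttrix and ttxhost; the branch and tag names are hypothetical.

```shell
# Throwaway repositories standing in for the working repo and the bare host.
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
git init -q --bare "$work/host"

git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
git -C "$src" branch release-1.0      # hypothetical extra branch
git -C "$src" tag v1.0                # hypothetical tag
git -C "$src" remote add origin "$work/host"

# --all pushes every local branch; --tags then pushes the tags.
git -C "$src" push -q origin --all
git -C "$src" push -q origin --tags

# Show which refs arrived in the bare host repository.
host_refs=$(git -C "$work/host" for-each-ref --format='%(refname)')
printf '%s\n' "$host_refs"
```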

Next I cloned this host repo and let BFG work its magic. I had some difficulty finding the syntax for removing multiple file types in the main docs but eventually found this Stack Overflow comment.

cd ..
mkdir ttxclean
cd ttxclean
git clone --mirror ../ttxhost
java -jar ../bfg-1.12.15.jar --delete-files "{*.class,*.jar,*.lex}" ttxhost.git

BFG reported that it had successfully cleaned files, and after summing the sizes of these files, I realized that BFG had removed about 80% of the original size of my repo. Inspecting the output carefully turned out to be important, as that is where I learned about the need to remove files from the HEAD commit.
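
Independently of BFG's report, Git itself can estimate how many bytes the unwanted files occupy in history, either before a cleanup or to confirm afterwards that nothing is left. This is a sketch on a throwaway repository with a hypothetical 10-byte jar; note that it sums uncompressed blob sizes, not on-disk pack sizes.

```shell
# Throwaway repository with a hypothetical jar that was added and later deleted.
repo=$(mktemp -d)
git init -q "$repo"
commit() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
printf '0123456789' > "$repo/demo.jar"        # 10-byte stand-in for a binary
echo 'class Demo {}' > "$repo/Demo.java"
git -C "$repo" add .
commit "Add sources and jar"
git -C "$repo" rm -q demo.jar
commit "Remove jar"

# Walk every object reachable from any ref, look up its type and size,
# and total the sizes of blobs whose path ends in an unwanted extension.
total=$(git -C "$repo" rev-list --objects --all |
  git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" && /\.(class|jar|lex)$/ { sum += $2 } END { print sum + 0 }')
echo "bytes still held in history by unwanted files: $total"
```

Even though demo.jar is gone from the latest commit, its blob is still reachable from the first commit, which is exactly the kind of leftover BFG is meant to remove.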

To get all these changes onto the host repo, I needed to clean the repo and push its changes:

cd ttxhost.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

When I went back to the host repo, however, it remained the same size. And running "git gc" made it even bigger! According to this post, this is known git gc behavior: as a safety mechanism, it keeps unreferenced objects for 2 weeks, in unpacked form. Using the "prune" flag greatly reduced the size:

cd ../../ttxhost
git gc --prune=now
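
The difference between the two gc invocations can be seen on a throwaway repository. In this sketch, a blob written directly to the object store stands in for objects orphaned by a history rewrite; the default two-week grace period is assumed to be unchanged in the local Git configuration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "base"

# Write a blob that no commit references, mimicking an orphaned object.
blob=$(printf 'orphaned payload' | git -C "$repo" hash-object -w --stdin)

# Plain gc keeps recently orphaned objects (default grace period: 2 weeks).
git -C "$repo" gc -q
if git -C "$repo" cat-file -e "$blob" 2>/dev/null; then after_gc=kept; else after_gc=pruned; fi

# --prune=now drops unreachable objects immediately.
git -C "$repo" gc -q --prune=now
if git -C "$repo" cat-file -e "$blob" 2>/dev/null; then after_prune=kept; else after_prune=pruned; fi

echo "after git gc:             $after_gc"
echo "after git gc --prune=now: $after_prune"
```

Running git count-objects -vH before and after each step is another way to watch the loose and packed sizes change.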

And voila, the host repo went from 9.1 MB to 3 MB! I next cloned a fresh copy of this repo to check it:

cd ..
mv texttrix texttrix.old
git clone ttxhost texttrix # can skip if you don't need to check

The new repo was clean and ready for upload to GitHub! Following this guide, I cloned a bare copy of the host repo and mirrored it to GitHub to upload all references.

mkdir ttxforgithub
cd ttxforgithub
git clone --bare ../ttxhost
cd ttxhost.git
git push --mirror https://github.com/the4thchild/texttrix.git
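
To check that a mirror push really carried every reference across, the ref listings on both sides can be compared. In this sketch, local bare repositories stand in for ttxhost and the GitHub remote, and the branch and tag names are hypothetical.

```shell
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
git -C "$work/src" branch release-1.0
git -C "$work/src" tag v1.0

# Local bare repositories stand in for the host repo and the GitHub remote.
git clone -q --bare "$work/src" "$work/host.git"
git init -q --bare "$work/github.git"
git -C "$work/host.git" push -q --mirror "$work/github.git"

# Compare ref names and the objects they point to on both sides.
refs() { git -C "$1" for-each-ref --format='%(objectname) %(refname)'; }
if [ "$(refs "$work/host.git")" = "$(refs "$work/github.git")" ]; then
  mirror_status="refs match"
else
  mirror_status="refs differ"
fi
echo "$mirror_status"
```

Against a real remote, git ls-remote with the remote URL gives a comparable listing without needing a local copy.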

I removed my prior local repo and cloned in the one from GitHub:

cd ../..
rm -rf texttrix
git clone https://github.com/the4thchild/texttrix.git

And here it is on GitHub. Now I have to repeat this all over again for each of the remaining projects and plugins.