[Howto] Git history cleanup

Git is great. It stores everything you hand over to it. Unfortunately, it also stores things that you later realize you should not have handed over, for example because of security concerns. Here are two short ways to remove such content from Git and clean up the history.

Most people who use Git in their daily routine sooner or later stumble into a situation where they realize that they have committed files which should not be in the repository anymore. In my case (a rather special one, I admit) I was working in a Git repo and created a new branch to add some further content in a new sub-directory. Later on, however, when I had to clone the content of the new branch to another remote location, I realized that there were some old files in the repo (and thus also in the new branch) which could not be exported to another location due to security concerns. They had to be removed beforehand!

So, I had to screw around with Git, but since Git is awesome, there are ways. One way, which I found in a blog post with the marvelous title “git rocks even when it sucks”, is to go through the entire commit history and rewrite each and every commit, deleting everything stored under the given path. This can be done with git filter-branch.

For example, to remove a folder called “Calculation”, the command is:

$ git filter-branch -f --index-filter 'git rm -rf --cached --ignore-unmatch Calculation' -- --all
Rewrite 5089fb36c64934c1b7a8301fe346a214a7cccdaa (360/365)rm 'Calculation'
Rewrite cc232788dfa60355dd6db6c672305700717835b4 (361/365)rm 'Calculation'
Rewrite 33d1782fdd6de5c75b7db994abfe228a028d7351 (362/365)rm 'Calculation'
Rewrite 7416d33cac120fd782f75c6eb91157ce8135590b (363/365)rm 'Calculation'
Rewrite 81e77acb22bd08c9de743b38a02341682ca369dd (364/365)rm 'Calculation'
Rewrite 2dce54592832f333f3ab947b020c0f98c94d1f51 (365/365)rm 'Calculation'

Ref 'refs/heads/documentation' was rewritten
Ref 'refs/remotes/origin/master' was rewritten
Ref 'refs/remotes/origin/documentation' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged

The folder was removed entirely! However, the old commit messages are still there, so you had better not have any sensitive data in them. Also, the rewritten repository itself still holds the old objects, because git filter-branch keeps backup refs under refs/original. As mentioned in the linked blog post, the best way to really get rid of all traces of the files is to clone the repository again afterwards.
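To make the "clone again" advice a bit more tangible, here is a minimal sketch that plays the whole thing through on a throwaway repository. Everything in it is made up for illustration (the temporary directory, the file names and contents, the demo identity), and the FILTER_BRANCH_SQUELCH_WARNING variable only matters on newer Git versions, which otherwise print a long warning before filter-branch starts. It is not the only way to check the result, just one that is easy to follow.

```shell
#!/bin/sh
set -eu
# Newer Git versions print a long warning and pause before filter-branch runs; this silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with made-up content (all names here are illustrative).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir Calculation documentation
echo "secret" > Calculation/numbers.txt
echo "docs" > documentation/readme.txt
git add .
git commit -q -m "initial import"

# Remember the blob that holds the sensitive file so we can look for it later.
blob=$(git rev-parse HEAD:Calculation/numbers.txt)

# The same rewrite as in the article.
git filter-branch -f --index-filter \
    'git rm -rf --cached --ignore-unmatch Calculation' -- --all >/dev/null 2>&1

# Branches and tags no longer reference the folder ...
remaining=$(git log --branches --tags --oneline -- Calculation | wc -l | tr -d ' ')
echo "commits on branches/tags touching Calculation: $remaining"

# ... but the rewritten repository still holds the old blob (backup refs, reflog).
if git cat-file -e "$blob" 2>/dev/null; then in_rewritten=yes; else in_rewritten=no; fi
echo "old blob still in rewritten repo: $in_rewritten"

# A fresh clone over file:// transfers only what the branches and tags can reach.
git clone -q "file://$work/repo" "$work/clean"
if git -C "$work/clean" cat-file -e "$blob" 2>/dev/null; then in_clone=yes; else in_clone=no; fi
echo "old blob in fresh clone: $in_clone"
```

Note that the check uses --branches --tags rather than --all: right after a filter-branch run, --all also walks the backup refs under refs/original and would therefore still find the old commits. The clone uses a file:// URL on purpose, since cloning from a plain local path may simply copy or hardlink the whole object store.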

In my case an even simpler way was to take the new subdirectory, make it the new root of the repository and rewrite everything relative to that root. All files not under the new root are discarded in that case. Here is the proper command, given that I had added my new content under the subdirectory “documentation”:

$ git filter-branch --subdirectory-filter documentation -- --all
Rewrite dd1d03f648e983208b1acd9a9db853ee820129b9 (34/34)
Ref 'refs/heads/documentation' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
Ref 'refs/remotes/origin/documentation' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
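What the subdirectory filter does to the tree and to the commit list is easiest to see on a small example. The following sketch again uses a throwaway repository with invented file names and a demo identity, so treat it as an illustration under those assumptions rather than a recipe for a real repo.

```shell
#!/bin/sh
set -eu
# Silences the warning that newer Git versions print before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with made-up content (all names here are illustrative).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir Calculation documentation
echo "secret" > Calculation/numbers.txt
echo "docs" > documentation/readme.txt
git add .
git commit -q -m "initial import"
echo "more" >> Calculation/numbers.txt
git commit -q -am "touch only Calculation"
echo "more docs" >> documentation/readme.txt
git commit -q -am "extend documentation"

before=$(git rev-list --count HEAD)

# Make "documentation" the new root of the repository.
git filter-branch -f --subdirectory-filter documentation -- --all >/dev/null 2>&1

after=$(git rev-list --count HEAD)
files=$(git ls-tree -r --name-only HEAD)

echo "commits before: $before, after: $after"
echo "files at the new root: $files"
```

In this example the commit that only touched Calculation disappears from the history, and readme.txt ends up directly at the top level instead of under documentation/.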

Please note that in both cases you have to be extra careful if the directories were renamed in the meantime, because both filters only match the path name you give them. If you are not sure, better check all files which have ever been in the repository:

$ for commit in `git log --all --pretty=format:%H`; do git ls-tree -r -l $commit; done |awk '{print $5}'
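A possible variation of this check, sketched here on a throwaway repository with invented names, prints every path only once and keeps file names containing spaces intact (the awk call above cuts them off at the first space). It also shows why renames matter: the renamed folder shows up under both its old and its new name.

```shell
#!/bin/sh
set -eu

# Throwaway repository with made-up content (all names here are illustrative).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir Calculation
echo "secret" > "Calculation/my numbers.txt"
git add .
git commit -q -m "add calculation"
git mv Calculation calc
git commit -q -m "rename folder"

# Walk every commit reachable from any ref and collect each path once.
paths=$(git rev-list --all | while read -r commit; do
    git ls-tree -r --name-only "$commit"
done | sort -u)

printf '%s\n' "$paths"
```

Keep in mind that --all also includes the backup refs that git filter-branch leaves under refs/original, so running this right after a rewrite in the same repository will still list the old paths. In a fresh clone those refs are not present.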
