Note
Github’s single file size limit is 100MB
The recommendation is to keep file size less then 50MB
The Problem
So let’s say by “accident” you added some large files to your git repo in the pass. You start to notice now that each commit you make is pretty slow. git push
seems to be compressing and writing files that are quite large.
You realized that your need to remove these large files. You tried git rm --cached
but still the commits are slow. That’s because the large file still exists in the git history; that is the basic of what version control does!
Therefore, the problem is, how do I remove a complete file from Git history? This can be useful in 2 ways:
- remove large data files that’s commited by accident
- remove sensitive files
Finding the files
As recommended by this post(the second answer), you can run this shell script:
$ cd path/to/your/git/repo
$ sh git_blob_obj.sh
This will show you a list of all the blob objects in the repo, from smallest to biggest. The stackover flow post has a couple more options for filtering for files. For Mac users, you will need to remove the last line in git_blob_obj.sh
Removing from Git History completely
For the file large_file.wmv
as an example, you can run the following:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch path/to/large_file.wmv' HEAD
git push origin master
Before running git push
, I found that I have to run git filter-branch
again with the -f
option: git filter-branch -f --index-filter
. As to why, this post does a good job of explaining. And here is the doc for git filter-branch
.
Prevention
A good way to remove blob from your repo, for example .wmv files, is to run this script:
$ find . -name *.wmv -print0 | xargs -0 git rm -r --cached --ignore-unmatch
echo *.wmv >> .gitignore
Git Lola (a side discovery)
I strumbled across git lola
from this stackoverflow post.
git lola
is a git alias that you need to add to your ~/.gitconfig
file. It is a command-line tool that shows you details about your commits.
Detail instructions are available in this tutorial.
git lol
shows you your repo tree structure with syntax coloring. git lola
is basically the same as lol but with details for all branches.
and using git lola --name-status
we can also see what files were modified for each commit