84.33 Git Remove all Traces of a File

20190509 GitHub maintains archival copies of everything, so it takes a little effort to remove something completely from a repository. You cannot guarantee to remove all traces of a file you uploaded to a git repository unless it is a private repository to which only you have access. For public repositories someone may have already downloaded the files, or cloned or forked the repository. However, if you are quick enough you can limit the risk. The typical use case is when you have accidentally uploaded a file containing secret information, such as a password. Even after removing the file from your git repository you should change any exposed passwords.

To begin with, be sure to back up your local clone of the repository in case anything fails during the process.

As suggested in the GitHub docs, BFG Repo-Cleaner is a utility that allows more flexibility when removing files from a repository. For example, it can also remove folders, or files above a certain size.

To remove files and their history from a repository we can do the following. First delete any branches on origin that are no longer needed, since the rewritten history will not apply to PRs, already checked out branches, or forks. Merge any open PRs first for the same reason.

Begin by cloning the bare repository as a mirror:

git clone --mirror https://github.com/myuser/myrepo.git

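Then delete the files (e.g., the file secrets.csv and the folder secrets) from the history of the local repository mirror. BFG matches on the file or folder name, not on the path, and folders are removed with a separate option:

bfg --delete-files secrets.csv myrepo.git
bfg --delete-folders secrets myrepo.git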

A glob pattern can be passed to --delete-files to match multiple files in a single call.
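As a minimal sketch of the wildcard form, the script below builds a BFG call with a glob pattern. The pattern and the mirror name myrepo.git are illustrative assumptions, and it assumes bfg is available as a command on the PATH (it is often run as java -jar bfg.jar instead). If BFG or the mirror is missing, the script only prints the command it would run.

```shell
#!/bin/sh
# Illustrative values only: adjust the glob and the mirror directory to suit.
glob='*.{csv,pem}'
repo='myrepo.git'

# Keep the glob quoted so the shell hands it to BFG unexpanded.
if command -v bfg >/dev/null 2>&1 && [ -d "$repo" ]; then
    bfg --delete-files "$glob" "$repo"
    msg="ran bfg on $repo"
else
    msg="would run: bfg --delete-files '$glob' $repo"
fi
echo "$msg"
```

Quoting matters here: an unquoted pattern may be expanded by the shell against files in the current directory before BFG ever sees it.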

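BFG rewrites the commits but leaves the old objects in place. From inside the mirror directory, expire the reflog and garbage-collect so that the now unreferenced objects are actually deleted:

cd myrepo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive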
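To see why both commands are needed, the following sketch builds a throwaway repository in a temporary directory, makes a blob unreachable by deleting the only branch that contained it, and checks whether the object still exists before and after the clean-up. The repository, branch, and file names are hypothetical and chosen only for the demonstration.

```shell
#!/bin/sh
set -e
# Throwaway repository; all names here are illustrative.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "initial"
base=$(git symbolic-ref --short HEAD)

# Commit a secret on a side branch, then delete that branch.
git checkout -q -b tmp
echo "hunter2" > secret.txt
git add secret.txt
git commit -q -m "add secret"
blob=$(git rev-parse HEAD:secret.txt)
git checkout -q "$base"
git branch -q -D tmp

# The blob is no longer on any branch but is still stored (and in the reflog).
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before clean-up: $before, after clean-up: $after"
```

Until the reflog is expired, git still treats the old commits as reachable, so gc alone would keep them.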

Force push the rewritten history to the remote:

git push --force

Follow through to avoid accidentally reintroducing the removed files into the repository.

Now delete your old local clone and take a fresh clone. Also tell collaborators to take a fresh clone of the repository, or to rebase (rather than merge) any branches based on the old history before submitting a pull request.

ToDo GitHub is currently (20210830) advising that git-filter-repo be used rather than git-filter-branch. THE FOLLOWING NEEDS TO BE UPDATED AND IS NOT RECOMMENDED IN ITS CURRENT FORM.
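As a sketch only of what the git-filter-repo equivalent might look like, the script below creates a throwaway repository containing test/private.py and removes that path from all history. It assumes git-filter-repo has been installed separately and is on the PATH (it does not ship with git); otherwise the script skips the rewrite. The --force option is used only because the demonstration repository is not a fresh clone, which git-filter-repo otherwise insists on.

```shell
#!/bin/sh
set -e
# Throwaway repository; file contents are illustrative.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir test
echo 'PASSWORD = "hunter2"' > test/private.py
echo 'print("ok")' > test/public.py
git add test
git commit -q -m "add files"

if command -v git-filter-repo >/dev/null 2>&1; then
    # --invert-paths keeps everything except the paths given with --path.
    git filter-repo --force --invert-paths --path test/private.py
    remaining=$(git log --all --oneline -- test/private.py | wc -l | tr -d ' ')
    result="rewritten, commits still touching the file: $remaining"
else
    result="git-filter-repo not installed, nothing rewritten"
fi
echo "$result"
```

On a real repository the usual approach is to run the filter-repo command in a fresh clone without --force and then force push the result.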

Here we demonstrate, using the older git filter-branch, the removal of all traces of a file named test/private.py from a git repository.

$ git rm test/private.py
$ git commit -m "Permanently remove this file."
$ git push
$ git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch test/private.py' \
  --prune-empty --tag-name-filter cat -- --all

WARNING: git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort, then use an
         alternative filtering tool such as 'git filter-repo'
         (https://github.com/newren/git-filter-repo/) instead.  See the
         filter-branch manual page for more details; to squelch this warning,
         set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...

Rewrite 51c5....a070 (143/204) (...)    rm 'test/private.py'
Rewrite caf4....9d47 (143/204) (...)    rm 'test/private.py'

$ git push --all --force

In this process we (optionally) first rm the file of interest, commit that change, and push it to the remote repository. The filter-branch command then does the actual work: it rewrites every commit on every branch and tag, removing the file from the index of each commit that contains it. Finally the rewritten history is force pushed to the remote.

After this process test/private.py no longer appears in the history of any branch in the repository. It will not be removed from any existing clones or forks.
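One way to check the result locally is sketched below, using a throwaway repository with illustrative contents. Note that filter-branch keeps backup copies of the old refs under refs/original, so the sketch deletes those and garbage-collects before counting the commits that still touch the file.

```shell
#!/bin/sh
set -e
# Throwaway repository; file contents are illustrative.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir test
echo 'PASSWORD = "hunter2"' > test/private.py
echo "readme" > README
git add .
git commit -q -m "add files"
echo "more" >> README
git commit -q -am "update readme"

before=$(git log --all --oneline -- test/private.py | wc -l | tr -d ' ')

# Same filter as in the text; the environment variable silences the warning.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch test/private.py' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Drop the backup refs that filter-branch leaves behind, then clean up.
git for-each-ref --format='%(refname)' refs/original |
while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc --quiet --prune=now

after=$(git log --all --oneline -- test/private.py | wc -l | tr -d ' ')
echo "commits touching test/private.py: before=$before after=$after"
```

A count of zero only describes the local repository; the remote must still be force pushed, and copies held in clones or forks are unaffected.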

