How does one .gitignore pickle files in python? [duplicate] - python

How can I delete "file1.txt" from my repository?

Use git rm.
If you want to remove the file from the Git repository and the filesystem, use:
git rm file1.txt
git commit -m "remove file1.txt"
But if you want to remove the file only from the Git repository and not remove it from the filesystem, use:
git rm --cached file1.txt
git commit -m "remove file1.txt"
And to push changes to remote repo
git push origin branch_name

git rm file.txt removes the file from the repo but also deletes it from the local file system.
To remove the file from the repo and not delete it from the local file system use:
git rm --cached file.txt
The below exact situation is where I use git to maintain version control for my business's website, but the "mickey" directory was a tmp folder to share private content with a CAD developer. When he needed HUGE files, I made a private, unlinked directory and ftpd the files there for him to fetch via browser. Forgetting I did this, I later performed a git add -A from the website's base directory. Subsequently, git status showed the new files needing committing. Now I needed to delete them from git's tracking and version control...
Sample output below is from what just happened to me, where I unintentionally deleted the .003 file. Thankfully, I don't care what happened to the local copy to .003, but some of the other currently changed files were updates I just made to the website and would be epic to have been deleted on the local file system! "Local file system" = the live website (not a great practice, but is reality).
[~/www]$ git rm shop/mickey/mtt_flange_SCN.7z.003
error: 'shop/mickey/mtt_flange_SCN.7z.003' has local modifications
(use --cached to keep the file, or -f to force removal)
[~/www]$ git rm -f shop/mickey/mtt_flange_SCN.7z.003
rm 'shop/mickey/mtt_flange_SCN.7z.003'
[~/www]$
[~/www]$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: shop/mickey/mtt_flange_SCN.7z.003
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: shop/mickey/mtt_flange_SCN.7z.001
# modified: shop/mickey/mtt_flange_SCN.7z.002
[~/www]$ ls shop/mickey/mtt_flange_S*
shop/mickey/mtt_flange_SCN.7z.001 shop/mickey/mtt_flange_SCN.7z.002
[~/www]$
[~/www]$
[~/www]$ git rm --cached shop/mickey/mtt_flange_SCN.7z.002
rm 'shop/mickey/mtt_flange_SCN.7z.002'
[~/www]$ ls shop/mickey/mtt_flange_S*
shop/mickey/mtt_flange_SCN.7z.001 shop/mickey/mtt_flange_SCN.7z.002
[~/www]$
[~/www]$
[~/www]$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: shop/mickey/mtt_flange_SCN.7z.002
# deleted: shop/mickey/mtt_flange_SCN.7z.003
#
# Changed but not updated:
# modified: shop/mickey/mtt_flange_SCN.7z.001
[~/www]$
Update: This answer is getting some traffic, so I thought I'd mention my other Git answer shares a couple of great resources: This page has a graphic that help demystify Git for me. The "Pro Git" book is online and helps me a lot.

First, if you are using git rm, especially for multiple files, consider any wildcard will be resolved by the shell, not by the git command.
git rm -- *.anExtension
git commit -m "remove multiple files"
But, if your file is already on GitHub, you can (since July 2013) directly delete it from the web GUI!
Simply view any file in your repository, click the trash can icon at the top, and commit the removal just like any other web-based edit.
Then "git pull" on your local repo, and that will delete the file locally too.
Which makes this answer a (roundabout) way to delete a file from git repo?
(Not to mention that a file on GitHub is in a "git repo")
(the commit will reflect the deletion of that file):
And just like that, it’s gone.
For help with these features, be sure to read our help articles on creating, moving, renaming, and deleting files.
Note: Since it’s a version control system, Git always has your back if you need to recover the file later.
The last sentence means that the deleted file is still part of the history, and you can restore it easily enough (but not yet through the GitHub web interface):
See "Restore a deleted file in a Git repo".

This is the only option that worked for me.
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch *.sql'
Note: Replace *.sql with your file name or file type. Be very careful because this will go through every commit and rip this file type out.
EDIT:
pay attention - after this command you will not be able to push or pull - you will see the reject of 'unrelated history' you can use 'git push --force -u origin master' to push or pull

Additionally, if it's a folder to be removed and it's subsequent child folders or files, use:
git rm -r foldername

More generally, git help will help with at least simple questions like this:
zhasper#berens:/media/Kindle/documents$ git help
usage: git [--version] [--exec-path[=GIT_EXEC_PATH]] [--html-path] [-p|--paginate|--no-pager] [--bare] [--git-dir=GIT_DIR] [--work-tree=GIT_WORK_TREE] [--help] COMMAND [ARGS]
The most commonly used git commands are:
add Add file contents to the index
:
rm Remove files from the working tree and from the index

If you want to delete the file from the repo, but leave it in the the file system (will be untracked):
bykov#gitserver:~/temp> git rm --cached file1.txt
bykov#gitserver:~/temp> git commit -m "remove file1.txt from the repo"
If you want to delete the file from the repo and from the file system then there are two options:
If the file has no changes staged in the index:
bykov#gitserver:~/temp> git rm file1.txt
bykov#gitserver:~/temp> git commit -m "remove file1.txt"
If the file has changes staged in the index:
bykov#gitserver:~/temp> git rm -f file1.txt
bykov#gitserver:~/temp> git commit -m "remove file1.txt"

git rm will only remove the file on this branch from now on, but it remains in history and git will remember it.
The right way to do it is with git filter-branch, as others have mentioned here. It will rewrite every commit in the history of the branch to delete that file.
But, even after doing that, git can remember it because there can be references to it in reflog, remotes, tags and such.
If you want to completely obliterate it in one step, I recommend you to use git forget-blob
https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/
It is easy, just do git forget-blob file1.txt.
This will remove every reference, do git filter-branch, and finally run the git garbage collector git gc to completely get rid of this file in your repo.

Note: if you want to delete file only from git use below:
git rm --cached file1.txt
If you want to delete also from hard disk:
git rm file1.txt
If you want to remove a folder(the folder may contain few files) so, you should remove using recursive command, as below:
git rm -r foldername
If you want to remove a folder inside another folder
git rm -r parentFolder/childFolder
Then, you can commit and push as usual. However, if you want to recover deleted folder, you can follow this: recover deleted files from git is possible.
From doc:
git rm [-f | --force] [-n] [-r] [--cached] [--ignore-unmatch] [--quiet] [--] <file>…​
OPTIONS
<file>…​
Files to remove. Fileglobs (e.g. *.c) can be given to remove all matching files. If you want Git to expand file glob characters, you
may need to shell-escape them. A leading directory name (e.g. dir to
remove dir/file1 and dir/file2) can be given to remove all files in
the directory, and recursively all sub-directories, but this requires
the -r option to be explicitly given.
-f
--force
Override the up-to-date check.
-n
--dry-run
Don’t actually remove any file(s). Instead, just show if they exist in the index and would otherwise be removed by the command.
-r
Allow recursive removal when a leading directory name is given.
--
This option can be used to separate command-line options from the list of files, (useful when filenames might be mistaken for
command-line options).
--cached
Use this option to unstage and remove paths only from the index. Working tree files, whether modified or not, will be left alone.
--ignore-unmatch
Exit with a zero status even if no files matched.
-q
--quiet
git rm normally outputs one line (in the form of an rm command) for each file removed. This option suppresses that output.
Read more on official doc.

The answer by Greg Hewgill, that was edited by Johannchopin helped me, as I did not care about removing the file from the history completely.
In my case, it was a directory, so the only change I did was using:
git rm -r --cached myDirectoryName
instead of "git rm --cached file1.txt"
..followed by:
git commit -m "deleted myDirectoryName from git"
git push origin branch_name
Thanks Greg Hewgill and Johannchopin!

Another way if you want to delete the file from your local folder using rm command and then push the changes to the remote server.
rm file1.txt
git commit -a -m "Deleting files"
git push origin master

According to the documentation.
git rm --cached file1.txt
When it comes to sensitive data—better not say that you removed the file but rather just include it in the last known commit:
0. Amend last commit
git commit --amend -CHEAD
If you want to delete the file from all git history, according to the documentation you should do the following:
1. Remove it from your local history
git filter-branch --force --index-filter \ "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \ --prune-empty --tag-name-filter cat -- --all
# Replace PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the path to the file you want to remove, not just its filename
Don't forget to include this file in .gitignore (If it's a file you never want to share (such as passwords...):
echo "YOUR-FILE-WITH-SENSITIVE-DATA" >> .gitignore
git add .gitignore
git commit -m "Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore"
3. If you need to remove from the remote
git push origin --force --all
4. If you also need to remove it from tag releases:
git push origin --force --tags

In my case I tried to remove file on github after few commits but save on computer
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch file_name_with_path' HEAD
git push --force -u origin master
and later this file was ignored

To delete a specific file
git rm filename
To clean all the untracked files from a directory recursively in single shot
git clean -fdx

If you have the GitHub for Windows application, you can delete a file in 5 easy steps:
Click Sync.
Click on the directory where the file is located and select your latest version of the file.
Click on tools and select "Open a shell here."
In the shell, type: "rm {filename}" and hit enter.
Commit the change and resync.

First,Remove files from local repository.
git rm -r File-Name
or, remove files only from local repository but from filesystem
git rm --cached File-Name
Secondly, Commit changes into local repository.
git commit -m "unwanted files or some inline comments"
Finally, update/push local changes into remote repository.
git push

go to your project dir and type:
git filter-branch --tree-filter 'rm -f <deleted-file>' HEAD
after that push --force for delete file from all commits.
git push origin --force --all

https://stackoverflow.com/a/2047477/14508423
https://stackoverflow.com/a/19666677/14508423
Quite similar, tried both. Nothing. Git keeps tracking the file. (I use vscode so the file got instantly marked as U(untracked) or A(added))
Doing this https://stackoverflow.com/a/53431148/14508423 to whole category solves the problem.
Do this to delete a file from git and make git forget about it:
**manually add the file name to .gitignore**
git rm --cached settings.py
git commit -m "remove settings.py"
git push origin master
git rm -r --cached .
git add .
git commit -m "Untrack files in .gitignore"
ps: every time it failed I tried to delete file with sensitive info like tokens (.env and settings.py files). But it works fine with random files oO

For the case where git rm doesn't suffice and the file needs to be removed from history: As the git filter-branch manual page now itself suggests using git-filter-repo, and I had to do this today, here's an example using that tool. It uses the example repo https://example/eguser/eg.git
Clone the repository into a new directory git clone https://example/eguser/eg.git
Keep everything except the unwanted file. git-filter-repo --path file1.txt --invert-paths
Add the remote repository origin back :
git remote add origin https://example/eguser/eg.git.
The git-filter-repo tool removes remote remote info by design and suggests a new remote repo (see point 4). This makes sense for big shared repos but might be overkill for getting rid a single newly added file as in this example.
When happy with the contents of local, replace remote with it.
git push --force -u origin master. Forcing is required due to the changed history.
Also note the useful --dry-run option and a good discussion in the linked manual on team and project dynamics before charging in and changing repository history.

I tried a lot of the suggested options and none appeared to work (I won't list the various problems).
What I ended up doing, which worked, was simple and intuitive (to me) was:
move the whole local repo elsewhere
clone the repo again from master to your local drive
copy back the files/folder from your original copy in #1 back into the new clone from #2
make sure that the problem large file is either not there or excluded in the .gitignore file
do the usual git add/git commit/git push

After you have removed the file from the repo with git rm you can use BFG Repo-Cleaner to completely and easily obliterate the file from the repo history.

Just by going on the file in your github repository you can see the delete icon beside Raw|Blame and don't forget to click on commit changes button. And you can see that your file has been deleted.

New answer that works in 2022.
Do not use:
git filter-branch
this command might not change the remote repo after pushing. If you clone after using it, you will see that nothing has changed and the repo still has a large size. This command is old now. For example, if you use the steps in https://github.com/18F/C2/issues/439, this won't work.
You need to use
git filter-repo
Steps:
(1) Find the largest files in .git:
git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)
(2) Strat filtering these large files:
git filter-repo --path-glob '../../src/../..' --invert-paths --force
or
git filter-repo --path-glob '*.zip' --invert-paths --force
or
git filter-repo --path-glob '*.a' --invert-paths --force
or
whatever you find in step 1.
(3)
git remote add origin git#github.com:.../...git
(4)
git push --all --force
git push --tags --force
DONE!

I have obj and bin files that accidentally made it into the repo that I don't want polluting my 'changed files' list
After I noticed they went to the remote, I ignored them by adding this to .gitignore
/*/obj
/*/bin
Problem is they are already in the remote, and when they get changed, they pop up as changed and pollute the changed file list.
To stop seeing them, you need to delete the whole folder from the remote repository.
In a command prompt:
CD to the repo folder (i.e. C:\repos\MyRepo)
I want to delete SSIS\obj. It seems you can only delete at the top level, so you now need to CD into SSIS: (i.e. C:\repos\MyRepo\SSIS)
Now type the magic incantation git rm -r -f obj
rm=remove
-r = recursively remove
-f = means force, cause you really mean it
obj is the folder
Now run git commit -m "remove obj folder"
I got an alarming message saying 13 files changed 315222 deletions
Then because I didn't want to have to look up the CMD line, I went into Visual Sstudio and did a Sync to apply it to the remote

if your file is sensitive (for example, settings or keys you accidently added and committed) then you can remove it from all versions.
To remove from all versions use the command below (warning: careful because you won't be able to restore the removed file in the repo if do not have a copy):
Using Git
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch file.ext' HEAD
Using git-filter-repo (get it here, new recommended way by Git)
git filter-repo --path file.ext

If you need to remove files from a determined extension (for example, compiled files) you could do the following to remove them all at once:
git remove -f *.pyc

Incase if you don't file in your local repo but in git repo, then simply open file in git repo through web interface and find Delete button at right corner in interface.
Click Here, To view interface Delete Option

Related

Why is github trying to download a file that doesn't exist? [duplicate]

I accidentally dropped a DVD-rip into a website project, then carelessly git commit -a -m ..., and, zap, the repo was bloated by 2.2 gigs. Next time I made some edits, deleted the video file, and committed everything, but the compressed file is still there in the repository, in history.
I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the 2 commits so that the big file doesn't show in the history and is cleaned in the garbage collection procedure?
Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.
Carefully follow the usage instructions, the core part is just this:
$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
Any files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
After pruning, we can force push to the remote repo*
$ git push --force
*NOTE: cannot force push a protect branch on GitHub
The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
What you want to do is highly disruptive if you have published history to other developers. See “Recovering From Upstream Rebase” in the git rebase documentation for the necessary steps after repairing your history.
You have at least two options: git filter-branch and an interactive rebase, both explained below.
Using git filter-branch
I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.
Say your git history is:
$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A login.html
* cb14efd Remove DVD-rip
| D oops.iso
* ce36c98 Careless
| A oops.iso
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
Note that git lola is a non-standard but highly useful alias. (See the addendum at the end of this answer for details.) The --name-status switch to git log shows tree modifications associated with each commit.
In the “Careless” commit (whose SHA1 object name is ce36c98) the file oops.iso is the DVD-rip added by accident and removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is:
git filter-branch --prune-empty -d /dev/shm/scratch \
--index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
--tag-name-filter cat -- --all
Options:
--prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
-d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
--index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
--tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
-- specifies the end of options to git filter-branch
--all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.
After some churning, the history is now:
$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A login.html
* e45ac59 Careless
| A other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A login.html
| * cb14efd Remove DVD-rip
| | D oops.iso
| * ce36c98 Careless
|/ A oops.iso
| A other.html
|
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
Notice that the new “Careless” commit adds only other.html and that the “Remove DVD-rip” commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”
$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now
For a simpler alternative, clone the repository to discard the unwanted bits.
$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo
Using a file:///... clone URL copies objects rather than creating hardlinks only.
Now your history is:
$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A login.html
* e45ac59 Careless
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
The SHA1 object names for the first two commits (“Index” and “Admin page”) stayed the same because the filter operation did not modify those commits. “Careless” lost oops.iso and “Login page” got a new parent, so their SHA1s did change.
Interactive rebase
With a history of:
$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A login.html
* cb14efd Remove DVD-rip
| D oops.iso
* ce36c98 Careless
| A oops.iso
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
you want to remove oops.iso from “Careless” as though you never added it, and then “Remove DVD-rip” is useless to you. Thus, our plan going into an interactive rebase is to keep “Admin page,” edit “Careless,” and discard “Remove DVD-rip.”
Running $ git rebase -i 5af4522 starts an editor with the following contents.
pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page
# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
Executing our plan, we modify it to
edit ce36c98 Careless
pick f772d66 Login page
# Rebase 5af4522..f772d66 onto 5af4522
# ...
That is, we delete the line with “Remove DVD-rip” and change the operation on “Careless” to be edit rather than pick.
Save-quitting the editor drops us at a command prompt with the following message.
Stopped at ce36c98... Careless
You can amend the commit now, with
git commit --amend
Once you are satisfied with your changes, run
git rebase --continue
As the message tells us, we are on the “Careless” commit we want to edit, so we run two commands.
$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue
The first removes the offending file from the index. The second modifies or amends “Careless” to be the updated index and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.
This gives a history of:
$ git lola --name-status
* 93174be (HEAD, master) Login page
| A login.html
* a570198 Careless
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
which is what you want.
Addendum: Enable git lola via ~/.gitconfig
Quoting Conrad Parker:
The best tip I learned at Scott Chacon’s talk at linux.conf.au 2010, Git Wrangling - Advanced Tips and Tricks was this alias:
lol = log --graph --decorate --pretty=oneline --abbrev-commit
This provides a really nice graph of your tree, showing the branch structure of merges etc. Of course there are really nice GUI tools for showing such graphs, but the advantage of git lol is that it works on a console or over ssh, so it is useful for remote development, or native development on an embedded board …
So, just copy the following into ~/.gitconfig for your full color git lola action:
[alias]
lol = log --graph --decorate --pretty=oneline --abbrev-commit
lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
[color]
branch = auto
diff = auto
interactive = auto
status = auto
Why not use this simple but powerful command?
git filter-branch --tree-filter 'rm -f DVD-rip' HEAD
The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.
If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2..HEAD to avoid rewriting too much history, thus avoiding diverging commits if you haven't pushed yet. This comment courtesy of #alpha_989 seems too important to leave out here.
See this link.
(The best answer I've seen to this problem is: https://stackoverflow.com/a/42544963/714112 , copied here since this thread appears high in Google search rankings but that other one doesn't)
🚀 A blazingly fast shell one-liner 🚀
This shell script displays all blob objects in the repository, sorted from smallest to largest.
For my sample repo, it ran about 100 times faster than the other ones found here.
On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.
The Base Script
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
When you run above code, you will get nice human-readable output like this:
...
0d99bb931299 530KiB path/to/some-image.jpg
2ba44098e28f 12MiB path/to/hires-image.png
bd1741ddce0d 63MiB path/to/some-video-1080p.mp4
🚀 Fast File Removal 🚀
Suppose you then want to remove the files a and b from every commit reachable from HEAD, you can use this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD
After trying virtually every answer in SO, I finally found this gem that quickly removed and deleted the large files in my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories
CD to your local working folder and run the following command:
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all
replace FOLDERNAME with the file or folder you wish to remove from the given git repository.
Once this is done run the following commands to clean up the local repository:
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
Now push all the changes to the remote repository:
git push --all --force
This will clean up the remote repository.
100 times faster than git filter-branch and simpler
There are very good answers in this thread, but meanwhile many of them are outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories.
git-filter-repo is much faster and simpler to use.
git-filter-repo is a Python script, available at github: https://github.com/newren/git-filter-repo . When installed it looks like a regular git command and can be called by git filter-repo.
You need only one file: the Python3 script git-filter-repo. Copy it to a path that is included in the PATH variable. On Windows you may have to change the first line of the script (refer INSTALL.md). You need Python3 installed installed on your system, but this is not a big deal.
First you can run
git filter-repo --analyze
This helps you to determine what to do next.
You can delete your DVD-rip file everywhere:
git filter-repo --invert-paths --path-match DVD-rip
Filter-repo is really fast. A task that took around 9 hours on my computer by filter-branch, was completed in 4 minutes by filter-repo. You can do many more nice things with filter-repo. Refer to the documentation for that.
Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and all their descendants down to the last commits!
These commands worked in my case:
git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
It is little different from the above versions.
For those who need to push this to github/bitbucket (I only tested this with bitbucket):
# WARNING!!!
# this will rewrite completely your bitbucket refs
# will delete all branches that you didn't have in your local
git push --all --prune --force
# Once you pushed, all your teammates need to clone repository again
# git pull will not work
According to GitHub Documentation, just follow these steps:
Get rid of the large file
Option 1: You don't want to keep the large file:
rm path/to/your/large/file # delete the large file
Option 2: You want to keep the large file into an untracked directory
mkdir large_files # create directory large_files
touch .gitignore # create .gitignore file if needed
'/large_files/' >> .gitignore # untrack directory large_files
mv path/to/your/large/file large_files/ # move the large file into the untracked directory
Save your changes
git add path/to/your/large/file # add the deletion to the index
git commit -m 'delete large file' # commit the deletion
Remove the large file from all commits
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch path/to/your/large/file" \
--prune-empty --tag-name-filter cat -- --all
git push <remote> <branch>
I ran into this with a bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.
git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
Relpace MY-BIG-DIRECTORY with the folder in question to completely rewrite your history (including tags).
source: https://web.archive.org/web/20170727144429/http://naleid.com:80/blog/2012/01/17/finding-and-purging-big-files-from-git-history/
Just note that this commands can be very destructive. If more people are working on the repo they'll all have to pull the new tree. The three middle commands are not necessary if your goal is NOT to reduce the size. Because the filter branch creates a backup of the removed file and it can stay there for a long time.
$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git push origin master --force
git filter-branch --tree-filter 'rm -f path/to/file' HEAD
worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.
The pro-git book has an entire chapter on rewriting history - have a look at the filter-branch/Removing a File from Every Commit section.
If you know your commit was recent instead of going through the entire tree do the following:
git filter-branch --tree-filter 'rm LARGE_FILE.zip' HEAD~10..HEAD
This will remove it from your history
git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch bigfile.txt' --prune-empty --tag-name-filter cat -- --all
Use Git Extensions, it's a UI tool. It has a plugin named "Find large files" which finds lage files in repositories and allow removing them permenently.
Don't use 'git filter-branch' before using this tool, since it won't be able to find files removed by 'filter-branch' (Altough 'filter-branch' does not remove files completely from the repository pack files).
I basically did what was on this answer:
https://stackoverflow.com/a/11032521/1286423
(for history, I'll copy-paste it here)
$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git push origin master --force
It didn't work, because I like to rename and move things a lot. So some big file were in folders that have been renamed, and I think the gc couldn't delete the reference to those files because of reference in tree objects pointing to those file.
My ultimate solution to really kill it was to:
# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:
# Go back at the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapply patches,
# it will really forget about the references to hidden big files.
git rebase newinit
# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit,
# so we remove all the references.
# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/
# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive
My repo (the .git) changed from 32MB to 388KB, that even filter-branch couldn't clean.
git filter-branch is a powerful command which you can use it to delete a huge file from the commits history. The file will stay for a while and Git will remove it in the next garbage collection.
Below is the full process from deleteing files from commit history. For safety, below process runs the commands on a new branch first. If the result is what you needed, then reset it back to the branch you actually want to change.
# Do it in a new testing branch
$ git checkout -b test
# Remove file-name from every commit on the new branch
# --index-filter, rewrite index without checking out
# --cached, remove it from index but not include working tree
# --ignore-unmatch, ignore if files to be removed are absent in a commit
# HEAD, execute the specified command for each commit reached from HEAD by parent link
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD
# The output is OK, reset it to the prior branch master
$ git checkout master
$ git reset --soft test
# Remove test branch
$ git branch -d test
# Push it with force
$ git push --force origin master
NEW ANSWER THAT WORKS IN 20222.
DO NOT USE:
git filter-branch
this command might not change the remote repo after pushing. If you clone after using it, you will see that nothing has changed and the repo still has a large size. this command is old now. For example, if you use the steps in https://github.com/18F/C2/issues/439, this won't work.
You need to use
git filter-repo
Steps:
(1) Find the largest files in .git:
git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)
(2) Start filtering these large files:
git filter-repo --path-glob '../../src/../..' --invert-paths --force
or
git filter-repo --path-glob '*.zip' --invert-paths --force
or
git filter-repo --path-glob '*.a' --invert-paths --force
or
whatever you find in step 1.
(3)
git remote add origin git#github.com:.../...git
(4)
git push --all --force
git push --tags --force
DONE!!!
You can do this using the branch filter command:
git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD
When you run into this problem, git rm will not suffice, as git remembers that the file existed once in our history, and thus will keep a reference to it.
To make things worse, rebasing is not easy either, because any references to the blob will prevent git garbage collector from cleaning up the space. This includes remote references and reflog references.
I put together git forget-blob, a little script that tries removing all these references, and then uses git filter-branch to rewrite every commit in the branch.
Once your blob is completely unreferenced, git gc will get rid of it
The usage is pretty simple git forget-blob file-to-forget. You can get more info here
https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/
I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!
Other than git filter-branch (slow but pure git solution) and BFG (easier and very performant), there is also another tool to filter with good performance:
https://github.com/xoofx/git-rocket-filter
From its description:
The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:
Fast rewriting of commits and trees (by an order of x10 to x100).
Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
Use of .gitignore like pattern for tree-filtering
Fast and easy C# Scripting for both commit filtering and tree filtering
Support for scripting in tree-filtering per file/directory pattern
Automatically prune empty/unchanged commit, including merge commits
This works perfectly for me : in git extensions :
right click on the selected commit :
reset current branch to here :
hard reset ;
It's surprising nobody else is able to give this simple answer.
git reset --soft HEAD~1
It will keep the changes but remove the commit then you can re-commit those changes.

How to exclude Jupyter notebook checkpoints from git [duplicate]

I put a file that was previously being tracked by Git onto the .gitignore list. However, the file still shows up in git status after it is edited. How do I force Git to completely forget the file?
.gitignore will prevent untracked files from being added (without an add -f) to the set of files tracked by Git. However, Git will continue to track any files that are already being tracked.
To stop tracking a file, we must remove it from the index:
git rm --cached <file>
To remove a folder and all files in the folder recursively:
git rm -r --cached <folder>
The removal of the file from the head revision will happen on the next commit.
WARNING: While this will not remove the physical file from your local machine, it will remove the files from other developers' machines on their next git pull.
The series of commands below will remove all of the items from the Git index (not from the working directory or local repository), and then will update the Git index, while respecting Git ignores. PS. Index = Cache
First:
git rm -r --cached .
git add .
Then:
git commit -am "Remove ignored files"
Or as a one-liner:
git rm -r --cached . && git add . && git commit -am "Remove ignored files"
git update-index does the job for me:
git update-index --assume-unchanged <file>
Note: This solution is actually independent of .gitignore as gitignore is only for untracked files.
Update, a better option
Since this answer was posted, a new option has been created and that should be preferred. You should use --skip-worktree which is for modified tracked files that the user don't want to commit anymore and keep --assume-unchanged for performance to prevent git to check status of big tracked files. See https://stackoverflow.com/a/13631525/717372 for more details...
git update-index --skip-worktree <file>
To cancel
git update-index --no-skip-worktree <file>
git ls-files -c --ignored --exclude-standard -z | xargs -0 git rm --cached
git commit -am "Remove ignored files"
This takes the list of the ignored files, removes them from the index, and commits the changes.
Move it out, commit, and then move it back in.
This has worked for me in the past, but there is probably a 'gittier' way to accomplish this.
I always use this command to remove those untracked files.
One-line, Unix-style, clean output:
git ls-files --ignored --exclude-standard | sed 's/.*/"&"/' | xargs git rm -r --cached
It lists all your ignored files, replaces every output line with a quoted line instead to handle paths with spaces inside, and passes everything to git rm -r --cached to remove the paths/files/directories from the index.
The copy/paste (one-liner) answer is:
git rm --cached -r .; git add .; git status; git commit -m "Ignore unwanted files"
This command will NOT change the content of the .gitignore file. It will just ignore the files that have already been committed to a Git repository, but now we have added them to .gitignore.
The command git status; is to review the changes and could be dropped.
Ultimately, it will immediately commit the changes with the message "Ignore unwanted files".
If you don't want to commit the changes, drop the last part of the command (git commit -m "Ignore unwanted files")
Use this when:
You want to untrack a lot of files, or
You updated your .gitignore file
Source: Untrack files already added to Git repository based on .gitignore
Let’s say you have already added/committed some files to your Git repository and you then add them to your .gitignore file; these files will still be present in your repository index. This article we will see how to get rid of them.
Step 1: Commit all your changes
Before proceeding, make sure all your changes are committed, including your .gitignore file.
Step 2: Remove everything from the repository
To clear your repository, use:
git rm -r --cached .
rm is the remove command
-r will allow recursive removal
–cached will only remove files from the index. Your files will still be there.
The rm command can be unforgiving. If you wish to try what it does beforehand, add the -n or --dry-run flag to test things out.
Step 3: Readd everything
git add .
Step 4: Commit
git commit -m ".gitignore fix"
Your repository is clean :)
Push the changes to your remote to see the changes effective there as well.
If you cannot git rm a tracked file because other people might need it (warning, even if you git rm --cached, when someone else gets this change, their files will be deleted in their filesystem). These are often done due to config file overrides, authentication credentials, etc. Please look at https://gist.github.com/1423106 for ways people have worked around the problem.
To summarize:
Have your application look for an ignored file config-overide.ini and use that over the committed file config.ini (or alternately, look for ~/.config/myapp.ini, or $MYCONFIGFILE)
Commit file config-sample.ini and ignore file config.ini, have a script or similar copy the file as necessary if necessary.
Try to use gitattributes clean/smudge magic to apply and remove the changes for you, for instance smudge the config file as a checkout from an alternate branch and clean the config file as a checkout from HEAD. This is tricky stuff, I don't recommend it for the novice user.
Keep the config file on a deploy branch dedicated to it that is never merged to master. When you want to deploy/compile/test you merge to that branch and get that file. This is essentially the smudge/clean approach except using human merge policies and extra-git modules.
Anti-recommentation: Don't use assume-unchanged, it will only end in tears (because having git lie to itself will cause bad things to happen, like your change being lost forever).
I accomplished this by using git filter-branch. The exact command I used was taken from the man page:
WARNING: this will delete the file from your entire history
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
This command will recreate the entire commit history, executing git rm before each commit and so will get rid of the specified file. Don't forget to back it up before running the command as it will be lost.
What didn't work for me
(Under Linux), I wanted to use the posts here suggesting the ls-files --ignored --exclude-standard | xargs git rm -r --cached approach. However, (some of) the files to be removed had an embedded newline/LF/\n in their names. Neither of the solutions:
git ls-files --ignored --exclude-standard | xargs -d"\n" git rm --cached
git ls-files --ignored --exclude-standard | sed 's/.*/"&"/' | xargs git rm -r --cached
cope with this situation (get errors about files not found).
So I offer
git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached
git commit -am "Remove ignored files"
This uses the -z argument to ls-files, and the -0 argument to xargs to cater safely/correctly for "nasty" characters in filenames.
In the manual page git-ls-files(1), it states:
When -z option is not used, TAB, LF, and backslash characters in
pathnames are represented as \t, \n, and \\, respectively.
so I think my solution is needed if filenames have any of these characters in them.
Do the following steps for a file/folder:
Remove a File:
need to add that file to .gitignore.
need to remove that file using the command (git rm --cached file name).
need to run (git add .).
need to (commit -m) "file removed".
and finally, (git push).
For example:
I want to delete the test.txt file. I accidentally pushed to GitHub and want to remove it. Commands will be as follows:
First, add "test.txt" in file .gitignore
git rm --cached test.txt
git add .
git commit -m "test.txt removed"
git push
Remove Folder:
need to add that folder to file .gitignore.
need to remove that folder using the command (git rm -r --cached folder name).
need to run (git add .).
need to (commit -m) "folder removed".
and finally, (git push).
For example:
I want to delete the .idea folder/directory. I accidentally pushed to GitHub and want to remove it. The commands will be as follows:
First, add .idea in file .gitignore
git rm -r --cached .idea
git add .
git commit -m ".idea removed"
git push
Update your .gitignore file – for instance, add a folder you don't want to track to .gitignore.
git rm -r --cached . – Remove all tracked files, including wanted and unwanted. Your code will be safe as long as you have saved locally.
git add . – All files will be added back in, except those in .gitignore.
Hat tip to #AkiraYamamoto for pointing us in the right direction.
Do the following steps serially, and you will be fine.
Remove the mistakenly added files from the directory/storage. You can use the "rm -r" (for Linux) command or delete them by browsing the directories. Or move them to another location on your PC. (You maybe need to close the IDE if running for moving/removing.)
Add the files / directories to the .gitignore file now and save it.
Now remove them from the Git cache by using these commands (if there is more than one directory, remove them one by one by repeatedly issuing this command)
git rm -r --cached path-to-those-files
Now do a commit and push by using the following commands. This will remove those files from Git remote and make Git stop tracking those files.
git add .
git commit -m "removed unnecessary files from Git"
git push origin
I think, that maybe Git can't totally forget about a file because of its conception (section "Snapshots, Not Differences").
This problem is absent, for example, when using CVS. CVS stores information as a list of file-based changes. Information for CVS is a set of files and the changes made to each file over time.
But in Git every time you commit, or save the state of your project, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. So, if you added file once, it will always be present in that snapshot.
These two articles were helpful for me:
git assume-unchanged vs skip-worktree and How to ignore changes in tracked files with Git
Basing on it I do the following, if the file is already tracked:
git update-index --skip-worktree <file>
From this moment all local changes in this file will be ignored and will not go to remote. If the file is changed on remote, conflict will occur, when git pull. Stash won't work. To resolve it, copy the file content to the safe place and follow these steps:
git update-index --no-skip-worktree <file>
git stash
git pull
The file content will be replaced by the remote content. Paste your changes from the safe place to the file and perform again:
git update-index --skip-worktree <file>
If everyone, who works with the project, will perform git update-index --skip-worktree <file>, problems with pull should be absent. This solution is OK for configurations files, when every developer has their own project configuration.
It is not very convenient to do this every time, when the file has been changed on remote, but it can protect it from overwriting by remote content.
Using the git rm --cached command does not answer the original question:
How do you force git to completely forget about [a file]?
In fact, this solution will cause the file to be deleted in every other instance of the repository when executing a git pull!
The correct way to force Git to forget about a file is documented by GitHub here.
I recommend reading the documentation, but basically:
git fetch --all
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch full/path/to/file' --prune-empty --tag-name-filter cat -- --all
git push origin --force --all
git push origin --force --tags
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Just replace full/path/to/file with the full path of the file. Make sure you've added the file to your .gitignore file.
You'll also need to (temporarily) allow non-fast-forward pushes to your repository, since you're changing your Git history.
Move or copy the file to a safe location, so you don't lose it. Then 'git rm' the file and commit.
The file will still show up if you revert to one of those earlier commits, or another branch where it has not been removed. However, in all future commits, you will not see the file again. If the file is in the Git ignore, then you can move it back into the folder, and Git won't see it.
The answer from Matt Frear was the most effective IMHO. The following is just a PowerShell script for those on Windows to only remove files from their Git repository that matches their exclusion list.
# Get files matching exclusionsfrom .gitignore
# Excluding comments and empty lines
$ignoreFiles = gc .gitignore | ?{$_ -notmatch "#"} | ?{$_ -match "\S"} | % {
$ignore = "*" + $_ + "*"
(gci -r -i $ignore).FullName
}
$ignoreFiles = $ignoreFiles| ?{$_ -match "\S"}
# Remove each of these file from Git
$ignoreFiles | % { git rm $_}
git add .
The accepted answer does not "make Git "forget" about a file..." (historically). It only makes Git ignore the file in the present/future.
This method makes Git completely forget ignored files (past/present/future), but it does not delete anything from the working directory (even when re-pulled from remote).
This method requires usage of file /.git/info/exclude (preferred) or a pre-existing .gitignore in all the commits that have files to be ignored/forgotten. 1
All methods of enforcing Git ignore behavior after-the-fact effectively rewrite history and thus have significant ramifications for any public/shared/collaborative repositories that might be pulled after this process. 2
General advice: start with a clean repository - everything committed, nothing pending in working directory or index, and make a backup!
Also, the comments/revision history of this answer (and revision history of this question) may be useful/enlightening.
#Commit up-to-date .gitignore (if not already existing)
#This command must be run on each branch
git add .gitignore
git commit -m "Create .gitignore"
#Apply standard Git ignore behavior only to the current index, not the working directory (--cached)
#If this command returns nothing, ensure /.git/info/exclude AND/OR .gitignore exist
#This command must be run on each branch
git ls-files -z --ignored --exclude-standard | xargs -0 git rm --cached
#Commit to prevent working directory data loss!
#This commit will be automatically deleted by the --prune-empty flag in the following command
#This command must be run on each branch
git commit -m "ignored index"
#Apply standard git ignore behavior RETROACTIVELY to all commits from all branches (--all)
#This step WILL delete ignored files from working directory UNLESS they have been dereferenced from the index by the commit above
#This step will also delete any "empty" commits. If deliberate "empty" commits should be kept, remove --prune-empty and instead run git reset HEAD^ immediately after this command
git filter-branch --tree-filter 'git ls-files -z --ignored --exclude-standard | xargs -0 git rm -f --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all
#List all still-existing files that are now ignored properly
#If this command returns nothing, it's time to restore from backup and start over
#This command must be run on each branch
git ls-files --other --ignored --exclude-standard
Finally, follow the rest of this GitHub guide (starting at step 6) which includes important warnings/information about the commands below.
git push origin --force --all
git push origin --force --tags
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Other developers that pull from the now-modified remote repository should make a backup and then:
#fetch modified remote
git fetch --all
#"Pull" changes WITHOUT deleting newly-ignored files from working directory
#This will overwrite local tracked files with remote - ensure any local modifications are backed-up/stashed
git reset FETCH_HEAD
Footnotes
1 Because /.git/info/exclude can be applied to all historical commits using the instructions above, perhaps details about getting a .gitignore file into the historical commit(s) that need it is beyond the scope of this answer. I wanted a proper .gitignore file to be in the root commit, as if it was the first thing I did. Others may not care since /.git/info/exclude can accomplish the same thing regardless where the .gitignore file exists in the commit history, and clearly rewriting history is a very touchy subject, even when aware of the ramifications.
FWIW, potential methods may include git rebase or a git filter-branch that copies an external .gitignore into each commit, like the answers to this question.
2 Enforcing Git ignore behavior after-the-fact by committing the results of a stand-alone git rm --cached command may result in newly-ignored file deletion in future pulls from the force-pushed remote. The --prune-empty flag in the following git filter-branch command avoids this problem by automatically removing the previous "delete all ignored files" index-only commit. Rewriting Git history also changes commit hashes, which will wreak havoc on future pulls from public/shared/collaborative repositories. Please understand the ramifications fully before doing this to such a repository. This GitHub guide specifies the following:
Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.
Alternative solutions that do not affect the remote repository are git update-index --assume-unchanged </path/file> or git update-index --skip-worktree <file>, examples of which can be found here.
In my case I needed to put ".envrc" in the .gitignore file.
And then I used:
git update-index --skip-worktree .envrc
git rm --cached .envrc
And the file was removed.
Then I committed again, telling that the file was removed.
But when I used the command git log -p, the content of the file (which was secret credentials of the Amazon S3) was showing the content which was removed and I don't want to show this content ever on the history of the Git repository.
Then I used this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch .envrc' HEAD
And I don't see the content again.
I liked JonBrave's answer, but I have messy enough working directories that commit -a scares me a bit, so here's what I've done:
git config --global alias.exclude-ignored '!git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached && git ls-files -z --ignored --exclude-standard | xargs -0 git stage && git stage .gitignore && git commit -m "new gitignore and remove ignored files from index"'
Breaking it down:
git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached
git ls-files -z --ignored --exclude-standard | xargs -0 git stage
git stage .gitignore
git commit -m "new gitignore and remove ignored files from index"
remove ignored files from the index
stage .gitignore and the files you just removed
commit
The BFG is specifically designed for removing unwanted data like big files or passwords from Git repositories, so it has a simple flag that will remove any large historical (not-in-your-current-commit) files: '--strip-blobs-bigger-than'
java -jar bfg.jar --strip-blobs-bigger-than 100M
If you'd like to specify files by name, you can do that too:
java -jar bfg.jar --delete-files *.mp4
The BFG is 10-1000x faster than git filter-branch and is generally much easier to use - check the full usage instructions and examples for more details.
Source: Reduce repository size
If you don't want to use the CLI and are working on Windows, a very simple solution is to use TortoiseGit. It has the "Delete (keep local)" Action in the menu which works fine.
This is no longer an issue in the latest Git (v2.17.1 at the time of writing).
The .gitignore file finally ignores tracked-but-deleted files. You can test this for yourself by running the following script. The final git status statement should report "nothing to commit".
# Create an empty repository
mkdir gitignore-test
cd gitignore-test
git init
# Create a file and commit it
echo "hello" > file
git add file
git commit -m initial
# Add the file to gitignore and commit
echo "file" > .gitignore
git add .gitignore
git commit -m gitignore
# Remove the file and commit
git rm file
git commit -m "removed file"
# Reintroduce the file and check status.
# .gitignore is now respected - status reports "nothing to commit".
echo "hello" > file
git status
This is how I solved my issue:
git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD
git push
In this, we are basically trying to rewrite the history of that particular file in previous commits also.
For more information, you can refer to the man page of filter-branch here.
Source: Removing sensitive data from a repository - using filter-branch
Source: Git: How to remove a big file wrongly committed
In case of already committed DS_Store:
find . -name .DS_Store -print0 | xargs -0 git rm --ignore-unmatch
Ignore them by:
echo ".DS_Store" >> ~/.gitignore_global
echo "._.DS_Store" >> ~/.gitignore_global
echo "**/.DS_Store" >> ~/.gitignore_global
echo "**/._.DS_Store" >> ~/.gitignore_global
git config --global core.excludesfile ~/.gitignore_global
Finally, make a commit!
Especially for the IDE-based files, I use this:
For instance, for the slnx.sqlite file, I just got rid off it completely like the following:
git rm {PATH_OF_THE_FILE}/slnx.sqlite -f
git commit -m "remove slnx.sqlite"
Just keep that in mind that some of those files store some local user settings and preferences for projects (like what files you had open). So every time you navigate or do some changes in your IDE, that file is changed and therefore it checks it out and show as uncommitted changes.
If anyone is having a hard time on Windows and you want to ignore the entire folder, go to the desired 'folder' on file explorer, right click and do 'Git Bash Here' (Git for Windows should have been installed).
Run this command:
git ls-files -z | xargs -0 git update-index --assume-unchanged
For me, the file was still available in the history and I first needed to squash the commits that added the removed files: https://gist.github.com/patik/b8a9dc5cd356f9f6f980
Combine the commits. The example below combines the last 3 commits
git reset --soft HEAD~3
git commit -m "New message for the combined commit"
Push the squashed commit
If the commits have been pushed to the remote:
git push origin +name-of-branch
In my case here, I had several .lock files in several directories that I needed to remove. I ran the following and it worked without having to go into each directory to remove them:
git rm -r --cached **/*.lock
Doing this went into each folder under the 'root' of where I was at and excluded all files that matched the pattern.

not able to Add file to git VIA windows 10 [fatal: not a git repository (or any of the parent directories): .git]

i am not able to add a file
to git via windows 10 and also it is giving me this error
for checking the status also
fatal: not a git repository (or any of the parent directories): .git
make sure the remote url exists in the directory where you are trying to add the file.
Run the following command:
git remote -v
If it still stays `fatal: not a git repository (or any of the parent directories): .git
`
then it means git is not configured to your code.
git init
git remote add origin <HTTPS URL FOR GIT REPO>
then you will be able to add the file

How to solve the "remote: You are not allowed to upload code." error on GitLab CI/CD job?

I am currently trying to use GitLab to run a CI/CD job that runs a Python file that makes changes to a particular repository and then commits and pushes those changes to master. I also have a role of Master in the repository. It appears that all git functions run fine except for the git push, which leads to fatal: You are not currently on a branch. and with using git push origin HEAD:master --force, that leads to fatal: unable to access 'https://gitlab-ci-token:xxx#xxx/project.git/': The requested URL returned error: 403. I've been looking over solutions online, one being this one, and another being unprotecting it, and couldn't quite find what I was looking for just yet. This is also a sub-project within the GitLab repository.
Right now, this is pretty much what my .gitlab-ci.yml looks like.
before_script:
- apt-get update -y
- apt-get install git -y
- apt-get install python -y
- apt-get python-pip -y
main:
script:
- git config --global user.email "xxx#xxx"
- git config --global user.name "xxx xxx"
- git config --global push.default simple
- python main.py
My main.py file essentially has a function that creates a new file within an internal directory provided that it doesn't already exist. It has a looks similar to the following:
import os
import json
def createFile(strings):
print ">>> Pushing to repo...";
if not os.path.exists('files'):
os.system('mkdir files');
for s in strings:
title = ("files/"+str(s['title'])+".json").encode('utf-8').strip();
with open(title, 'w') as filedata:
json.dump(s, filedata, indent=4);
os.system('git add files/');
os.system('git commit -m "Added a directory with a JSON file in it..."');
os.system('git push origin HEAD:master --force');
createFile([{"title":"A"}, {"title":"B"}]);
I'm not entirely sure why this keeps happening, but I have even tried to modify the repository settings to change from protected pull and push access, but when I hit Save, it doesn't actually save. Nonetheless, this is my overall output. I would really appreciate any guidance any can offer.
Running with gitlab-runner 10.4.0 (00000000)
on cicd-shared-gitlab-runner (00000000)
Using Kubernetes namespace: cicd-shared-gitlab-runner
Using Kubernetes executor with image ubuntu:16.04 ...
Waiting for pod cicd-shared-gitlab-runner/runner-00000000-project-00000-concurrent-000000 to be running, status is Pending
Waiting for pod cicd-shared-gitlab-runner/runner-00000000-project-00000-concurrent-000000 to be running, status is Pending
Running on runner-00000000-project-00000-concurrent-000000 via cicd-shared-gitlab-runner-0000000000-00000...
Cloning repository...
Cloning into 'project'...
Checking out 00000000 as master...
Skipping Git submodules setup
$ apt-get update -y >& /dev/null
$ apt-get install git -y >& /dev/null
$ apt-get install python -y >& /dev/null
$ apt-get install python-pip -y >& /dev/null
$ git config --global user.email "xxx#xxx" >& /dev/null
$ git config --global user.name "xxx xxx" >& /dev/null
$ git config --global push.default simple >& /dev/null
$ python main.py
[detached HEAD 0000000] Added a directory with a JSON file in it...
2 files changed, 76 insertions(+)
create mode 100644 files/A.json
create mode 100644 files/B.json
remote: You are not allowed to upload code.
fatal: unable to access 'https://gitlab-ci-token:xxx#xxx/project.git/': The requested URL returned error: 403
HEAD detached from 000000
Changes not staged for commit:
modified: otherfiles/otherstuff.txt
no changes added to commit
remote: You are not allowed to upload code.
fatal: unable to access 'https://gitlab-ci-token:xxx#xxx/project.git/': The requested URL returned error: 403
>>> Pushing to repo...
Job succeeded
Here is a resource from Gitlab that describes how to make commits to the repository within the CI pipeline: https://gitlab.com/guided-explorations/gitlab-ci-yml-tips-tricks-and-hacks/commit-to-repos-during-ci/commit-to-repos-during-ci
Try configuring your gitlab-ci.yml file to push the changes rather than trying to do it from the python file.
I managed to do this via ssh on a runner by making sure the ssh key is added, and then using the full git url:
task_name:
stage: some_stage
script:
- ssh-add -K ~/.ssh/[ssh key]
- git push -o ci-skip git#gitlab.com:[path to repo].git HEAD:[branch name]
If it is the same repo that triggered the job, the url could also be written as:
git#$CI_SERVER_HOST:$CI_PROJECT_PATH.git
This method can be used to commit tags or files. You may also wish to consider using the CI CD
variable API to store cross-build persistent data if it does not have to be committed to the repo
https://docs.gitlab.com/ee/api/project_level_variables.html
https://docs.gitlab.com/ee/api/group_level_variables.html
ACCESS_TOKEN below is a variable at the repo or an upbound group level that contains a token that
can write to the target repos. Since maintainer can see these, it is best practice to
create tokens on special API users who are least privileged for just what they need to do.
write_to_another_repo:
before_script:
- git config --global user.name "${GITLAB_USER_NAME}"
- git config --global user.email "${GITLAB_USER_EMAIL}"
script:
- |
echo "This CI job demonstrates writing files and tags back to a different repository than this .gitlab-ci.yml is stored in."
OTHERREPOPATH="guided-explorations/gitlab-ci-yml-tips-tricks-and-hacks/commit-to-repos-during-ci/pushed-to-from-another-repo-ci.git"
git clone https://gitlab-ci-token:${CI_JOB_TOKEN}#$CI_SERVER_HOST/$OTHERREPOPATH
cd pushed-to-from-another-repo-ci
CURRENTDATE="$(date)"
echo "$CURRENTDATE added a line" | tee -a timelog.log
git status
git add timelog.log
# "[ci skip]" and "-o ci-skip" prevent a CI trigger loop
git commit -m "[ci skip] updated timelog.log at $CURRENTDATE"
git push -o ci-skip http://root:$ACCESS_TOKEN#$CI_SERVER_HOST/$OTHERREPOPATH HEAD:master
#Tag commit (can be used without commiting files)
git tag "v$(date +%s)"
git tag
git push --tags http://root:$ACCESS_TOKEN#$CI_SERVER_HOST/$OTHERREPOPATH HEAD:master
The requested URL returned error: 403
The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it.
The problem is we cannot provide a valid authentication to git and hence our request is forbidden.
Try this:Control Panel => User Accounts => Manage your credentials => Windows Credentials
It worked for me.However I'm not quite sure if it will work for you.
Maybe you can need to generate access token on profile, edit profile - then access tokens for 'read_repository' or 'write_repository'
profile => edit profile => access tokens

Not transferring project to server when git push

I have a django project under development on my windows computer (dev-machine). I am using pyCharm for development.
I have set up a server (server-machine) running ubuntu. And now want to push my project to the server.
So in my project folder on the dev-machine I have done the git init:
$ git init
$ git add .
$ git commit -m"Init of Git"
And on the server-machine I have made a project folder: /home/username/projects
In this folder I init git as well
$ git init --bare
Back on my dev-machine, I set the connection to the server-machine by doing this
$ git remote add origin username#11.22.33.44:/home/username/projects
And finally pushing my project to server-machine by typing this command on my dev-machine
$ git push origin master
It starts to do ome transfer. And here's the problem.
On the server-machine when I check what's been transferred, it's only stuff like this
~/projects$ ls
branches config description HEAD hooks info objects refs
Not a single file from the project is transferred. This looks much like what the .git folder contains on the dev-machine.
What am I doing wrong?
What you see is the directory structure git uses to store your files and meta data. This is not a checked-out copy of the repository.
To check whether the data made it into the repository use git log in ~/project
Okay, so I understand now where I went wrong.
With git push I am not setting up the project.
To set up the project I need to do git clone.
This is how I did it.
1.
So I made a folder for git repositories on the server-machine. I called it /home/username/gitrepos/
2.
Inside there, I made a folder for my project, where I push the git repository into. So path would look like this for me /home/username/gitrepos/projectname/
3.
Being inside that folder I do a 'git init' like this
$ git init --bare
4.
Then I push the git repo to this location. First setting the remote adress from my dev-machine with this command. If adding a remote destination new use this:
$ git remote set nameofconnection username#ip.ip.ip.ip:/home/username/gitrepos/projectname
if changing the adress for a remote destination use this:
$ git remote set-url nameofconnection username#ip.ip.ip.ip:/home/username/gitrepos/projectname
To se with remote destinations you have set type this:
$ git remote -v
5.
Now go back to server-machine and clone the project into a project folder. I made a folder like this /home/username/projects/
When being inside that folder I clone from the gitrepo ike this:
$ git clone /home/username/gitrepos/projectname
Thank you all for the help! <3

Categories