git reset --hard HEAD vs git checkout <file>


I have a file foo.py. I have made some changes in the working directory, but have not staged or committed any of them yet. I know I can use git checkout foo.py to get rid of these changes. I also read about using git reset --hard HEAD, which essentially resets your working directory, staging area and commit history to match the latest commit.
Is there any reason to prefer using one over the other in my case, where my changes are still in working directory?

Is there any reason to prefer using one over the other in my case, where my changes are still in working directory?
No, since they will accomplish the same thing.
Is there any reason to prefer using [git checkout -- path/to/file] over [git reset --hard] in [general but not in my specific case]?
Yes: this will affect only the one file. If your habit is git reset --hard to undo changes to one file, and you have work-tree and/or staged changes to other files too, and you git reset --hard, you may be out of luck getting those changes back without reconstructing them by hand.
Note that there is a third construct: git checkout HEAD path/to/file. The difference between this and the one without HEAD (with -- instead [1]) is that the one with HEAD means copy the version of the file that is in a permanent, unchangeable commit into the index / staging-area first, and then into the work-tree. The one with -- means copy the version of the file that is in the index / staging-area into the work-tree.
[1] The reason to use -- is to make sure Git never confuses a file name with anything else, like a branch name. For instance, suppose you name a file master, just to be obstinate. What, then, does git checkout master mean? Is it supposed to check out branch master, or extract file master? The -- in git checkout -- master makes it clear, to Git and to humans, that this means "extract file master".
Summary, or, things to keep in mind
There are, at all times, three active copies of each file:
one in HEAD;
one in the index / staging-area;
one in the work-tree.
The git status command looks at all three, comparing HEAD-vs-index first—this gives Git the list of "changes to be committed"—and then index-vs-work-tree second. The second one gives Git the list of "changes not staged for commit".
The git add command copies from the work-tree, into the index.
The git checkout command copies either from HEAD to the index and then the work-tree, or just from the index to the work-tree. So it's a bit complicated, with multiple modes of operation:
git checkout -- path/to/file: copies from index to work-tree. What's in HEAD does not matter here. The -- is usually optional, unless the file name looks like a branch name (e.g., a file named master) or an option (e.g., a file named -f).
git checkout HEAD -- path/to/file: copies from HEAD commit, to index, then to work-tree. What's in HEAD overwrites what's in the index, which then overwrites what's in the work-tree. The -- is usually optional, unless the file name looks like an option (e.g., -f).
It's wise to use the -- always, just as a good habit.
The git reset command is complicated (it has many modes of operation).
(This is not to say that git checkout is simple: it, too, has many modes of operation, probably too many. But I think git reset is at least a little bit worse.)
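The copy directions described above can be sketched with a toy model. This is only an illustration, not how Git is implemented: each of the three areas is a plain dict mapping a path to its contents, untracked files are ignored, and all function names are invented for the example.

```python
# Toy model of the three copies of each file: HEAD, index, work-tree.
# Real Git stores blobs and trees; plain dicts are an assumption made
# purely for illustration.

def git_add(state, path):
    """Model of `git add <path>`: copy work-tree -> index."""
    state["index"][path] = state["worktree"][path]

def checkout_from_index(state, path):
    """Model of `git checkout -- <path>`: copy index -> work-tree."""
    state["worktree"][path] = state["index"][path]

def checkout_from_head(state, path):
    """Model of `git checkout HEAD -- <path>`: HEAD -> index -> work-tree."""
    state["index"][path] = state["head"][path]
    state["worktree"][path] = state["index"][path]

def reset_hard(state):
    """Model of `git reset --hard HEAD`: every tracked file is reset."""
    state["index"] = dict(state["head"])
    state["worktree"] = dict(state["head"])

def status(state):
    """Return (staged, unstaged) path lists, like the two `git status` lists."""
    staged = sorted(p for p in state["index"]
                    if state["index"][p] != state["head"].get(p))
    unstaged = sorted(p for p in state["worktree"]
                      if state["worktree"][p] != state["index"].get(p))
    return staged, unstaged

def make_state():
    """foo.py is edited but unstaged; bar.py has a staged change."""
    return {
        "head": {"foo.py": "v1", "bar.py": "v1"},
        "index": {"foo.py": "v1", "bar.py": "v2"},
        "worktree": {"foo.py": "v2", "bar.py": "v2"},
    }
```

In this model, checkout_from_index(state, "foo.py") discards only the foo.py edit and leaves the staged bar.py change alone, while reset_hard(state) discards both.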

Maybe this can help (image not recovered; source: https://www.patrickzahnd.ch)

Under your constraints, there's no difference. If there ARE staged changes, there is, though:
reset reverts staged changes.
checkout does not ...

Related

Is there any way to recover files from github and local machine that has been accidentally removed? [duplicate]

In Git, I was trying to do a squash commit by merging in another branch and then resetting HEAD to the previous place via:
git reset origin/master
But I need to step out of this. How can I move HEAD back to the previous location?
I have the SHA-1 fragment (23b6772) of the commit that I need to move it to. How can I get back to this commit?

Before answering, let's add some background, explaining what this HEAD is.
First of all what is HEAD?
HEAD is simply a reference to the current commit (latest) on the current branch.
There can only be a single HEAD at any given time (excluding git worktree).
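The content of HEAD is stored inside .git/HEAD. When a branch is checked out, it contains a symbolic reference such as ref: refs/heads/master; in a detached HEAD state, it contains the 40-character SHA-1 of the current commit.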
detached HEAD
If HEAD points directly at a commit instead of at a branch name (for example, after checking out a prior commit in history), it is called a detached HEAD.
On the command line, the SHA-1 is shown instead of the branch name, since HEAD is not pointing to the tip of the current branch.
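As a rough illustration of the attached versus detached distinction, the sketch below classifies the text of a .git/HEAD file. The helper name parse_head is made up for this example. It assumes the usual layout, in which an attached HEAD reads "ref: refs/heads/<branch>" and a detached HEAD is a bare hexadecimal object ID.

```python
import re

# 40 hex digits for SHA-1 repositories, 64 for SHA-256 repositories.
OBJECT_ID = re.compile(r"[0-9a-f]{40}(?:[0-9a-f]{24})?")

def parse_head(text):
    """Classify the contents of a .git/HEAD file.

    Returns ("attached", ref_name) or ("detached", object_id).
    """
    text = text.strip()
    if text.startswith("ref: "):
        return ("attached", text[len("ref: "):])
    if OBJECT_ID.fullmatch(text):
        return ("detached", text)
    raise ValueError("unrecognized HEAD contents: %r" % text)
```

For example, parse_head(open(".git/HEAD").read()) could be called from the top level of an ordinary clone. In linked worktrees and submodules .git is a file rather than a directory, so that path is an assumption.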
A few options on how to recover from a detached HEAD:
git checkout
git checkout <commit_id>
git checkout -b <new branch> <commit_id>
git checkout HEAD~X  # X is the number of commits to go back
This will checkout the new branch pointing to the desired commit.
This command will checkout to a given commit.
At this point, you can create a branch and start to work from this point on.
# Checkout a given commit.
# Doing so will result in a `detached HEAD`, which means that the `HEAD`
# is not pointing to the latest so you will need to checkout branch
# in order to be able to update the code.
git checkout <commit-id>
# Create a new branch forked to the given commit
git checkout -b <branch name>
git reflog
You can always use the reflog as well.
git reflog will display any change which updated the HEAD and checking out the desired reflog entry will set the HEAD back to this commit.
Every time the HEAD is modified there will be a new entry in the reflog
git reflog
git checkout HEAD@{...}
This will get you back to your desired commit
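To pick the right reflog entry programmatically, a script could parse the default git reflog output. The sketch below assumes the usual one-line format "<abbreviated-hash> HEAD@{n}: <message>", optionally with a ref decoration in parentheses after the hash. The function names and the sample text in the usage note are invented for illustration.

```python
import re

# One default `git reflog` line: hash, optional "(decoration)", selector, message.
REFLOG_LINE = re.compile(r"^([0-9a-f]+)(?: \([^)]*\))? HEAD@\{(\d+)\}: (.*)$")

def parse_reflog(output):
    """Turn `git reflog` text into a list of {"sha", "index", "message"} dicts."""
    entries = []
    for line in output.splitlines():
        match = REFLOG_LINE.match(line.strip())
        if match:
            sha, index, message = match.groups()
            entries.append({"sha": sha, "index": int(index), "message": message})
    return entries

def previous_head(entries):
    """Return the hash HEAD had before its most recent move (HEAD@{1})."""
    for entry in entries:
        if entry["index"] == 1:
            return entry["sha"]
    return None
```

Given output whose first two lines are "a1b2c3d HEAD@{0}: reset: moving to origin/master" and "23b6772 HEAD@{1}: commit: add feature", previous_head returns "23b6772", which is the commit to check out or reset to.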
git reset --hard <commit_id>
"Move" your HEAD back to the desired commit.
# This will destroy any local modifications.
# Don't do it if you have uncommitted work you want to keep.
git reset --hard 0d1d7fc32
# Alternatively, if there's work to keep:
git stash
git reset --hard 0d1d7fc32
git stash pop
# This saves the modifications, then reapplies that patch after resetting.
# You could get merge conflicts if you've modified things which were
# changed since the commit you reset to.
Note: (Since Git 2.7) you can also use git rebase --no-autostash.
git revert <sha-1>
"Undo" the given commit or commit range.
The revert command will "undo" any changes made in the given commit.
A new commit with the undo patch will be committed while the original commit will remain in history as well.
# Add a new commit with the undo of the original one.
# The <sha-1> can be any commit(s) or commit range
git revert <sha-1>
(A diagram in the original answer illustrated which command does what; it is not reproduced here.)
As that diagram showed, reset and checkout modify the HEAD.

First reset locally:
git reset 23b6772
To see if you're on the right position, verify with:
git status
You will see something like:
On branch master Your branch is behind 'origin/master' by 17 commits,
and can be fast-forwarded.
Then rewrite history on your remote tracking branch to reflect the change:
git push --force-with-lease  # a useful command @oktober mentions in comments
Using --force-with-lease instead of --force will raise an error if others have meanwhile committed to the remote branch, in which case you should fetch first. More info in this article.

Quickest possible solution (just 1 step)
Use git checkout -
You will see Switched to branch <branch_name>. Confirm it's the branch you want.
Brief explanation: this command will move HEAD back to its last position. See note on outcomes at the end of this answer.
Mnemonic: this approach is a lot like using cd - to return to your previously visited directory. Syntax and the applicable cases are a pretty good match (e.g. it's useful when you actually want HEAD to return to where it was).
More methodical solution (2-steps, but memorable)
The quick approach solves the OP's question. But what if your situation is slightly different: say you have restarted Bash then found yourself with HEAD detached. In that case, here are 2 simple, easily remembered steps.
1. Pick the branch you need
Use git branch -v
You see a list of existing local branches. Grab the branch name that suits your needs.
2. Move HEAD to it
Use git checkout <branch_name>
You will see Switched to branch <branch_name>. Success!
Outcomes
With either method, you can now continue adding and committing your work as before: your next changes will be tracked on <branch_name>.
Note that both git checkout - and git checkout <branch_name> will give additional instructions if you have committed changes while HEAD was detached.

The question can be read as:
I was in detached-state with HEAD at 23b6772 and typed git reset origin/master (because I wanted to squash). Now I've changed my mind, how do I go back to HEAD being at 23b6772?
The straightforward answer being: git reset 23b6772
But I hit this question because I got sick of typing (copy & pasting) commit hashes or its abbreviation each time I wanted to reference the previous HEAD and was Googling to see if there were any kind of shorthand.
It turns out there is!
git reset - (or in my case git cherry-pick -)
Which incidentally was the same as cd - to return to the previous current directory in *nix! So hurrah, I learned two things with one stone.

When you run git checkout commit_id, HEAD becomes detached at that commit (say 13ca5593d) and you are no longer on a branch.
To move back to the previous location, run these commands step by step:
git pull origin branch_name (say master)
git checkout branch_name
git pull origin branch_name
You will be back to the previous location with an updated commit from the remote repository.

Today, I mistakenly checked out a commit and started working on it, making some commits in a detached HEAD state. Then I pushed to the remote branch using the following command:
git push origin HEAD:<My-remote-branch>
Then
git checkout <My-remote-branch>
Then
git pull
I finally got all the changes that I made in the detached HEAD state into my branch.

This may not be a technical solution, but it works (if any of your teammates has the same branch locally).
Let's assume your branch is named branch-xxx.
Steps to solve:
Don't update or pull anything.
Just create a new branch (branch-yyy) from branch-xxx on that teammate's machine.
That's all: all your existing changes will be in this new branch (branch-yyy). You can continue your work with this branch.
Note: Again, this is not a technical solution, but it will help for sure.

Move last non-pushed commits to a new branch
If your problem is that you started committing on the WRONG_BRANCH, and want to move those last non-pushed commits to the RIGHT_BRANCH, the easiest thing to do is
git checkout WRONG_BRANCH
git branch RIGHT_BRANCH
git reset --hard LAST_PUSHED_COMMIT
git checkout RIGHT_BRANCH
At this point, if you run git log HEAD you will see that all your commits are there, in the RIGHT_BRANCH.
Data
WRONG_BRANCH is where your committed changes (yet to push) are now
RIGHT_BRANCH is where your committed changes (yet to push) will be
LAST_PUSHED_COMMIT is where you want to restore the WRONG_BRANCH to
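The reason this works is that branches are just movable names for commits. A minimal sketch of the two pointer moves follows, with branches modeled as a dict from branch name to commit ID. The function name and the commit IDs in the usage note are placeholders, not real Git API.

```python
def move_unpushed(branches, wrong, right, last_pushed):
    """Model the branch-pointer effect of the commands above.

    branches maps branch name -> commit ID; a new dict is returned.
    """
    updated = dict(branches)
    # `git branch RIGHT_BRANCH`: the new name points at the current tip.
    updated[right] = updated[wrong]
    # `git reset --hard LAST_PUSHED_COMMIT`: the old name moves back.
    updated[wrong] = last_pushed
    return updated
```

For instance, move_unpushed({"WRONG_BRANCH": "c5"}, "WRONG_BRANCH", "RIGHT_BRANCH", "c3") yields a mapping where RIGHT_BRANCH holds c5 (and therefore the unpushed commits) while WRONG_BRANCH is back at c3.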

Git: get changes released to master over time

As a personal project, I'd like to check different Python libraries and projects (be they proprietary or open source) and analyze how the code changed over time across releases, to gather some information about technical debt (mainly through static code analysis). I'm doing this using the GitPython library. However, I'm struggling to filter the merge commits to master.
I filter the merge commits using git.log("--merges", "--first-parent", "master") from where I extract the commit hashes and filter these particular commits from all repository commits.
As the second part, I'd like to get all changed files in each merge commit. I'm able to access the blobs via git tree, but I don't know how to get only changed files.
Is there an efficient way to accomplish this? Thanks!

... I'd like to get all changed files in each merge commit. ... but I don't know how to get only changed files.
Once you have your commit list as you described above, loop over them and run the following:
Use git diff with the --name-only flag. From the git diff documentation:
--name-only
    Show only names of changed files.
--name-status
    Show only the names and status of changed files. See the description of the --diff-filter option on what the status letters mean.
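One way to run this from Python for each merge commit is to call the git command line through subprocess rather than through GitPython. That is a swap made here only to keep the sketch dependency-free. It compares each commit against its first parent (commit^1), which matches the --first-parent view of master. The function names are illustrative.

```python
import subprocess

def changed_files_cmd(commit):
    """Build the argv that lists files changed relative to the first parent."""
    return ["git", "diff", "--name-only", commit + "^1", commit]

def parse_names(output):
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in output.splitlines() if line]

def changed_files(commit, repo_dir="."):
    """Run git in repo_dir and return the paths changed by the given commit."""
    result = subprocess.run(
        changed_files_cmd(commit),
        cwd=repo_dir,
        check=True,           # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return parse_names(result.stdout)
```

Looping over the merge hashes gathered earlier and calling changed_files(hash, repo_dir) then gives one list of paths per release merge. Paths containing unusual characters may come back quoted by Git, which this sketch does not undo.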

Using black with git clean filter

I'm trying to set up black to run on any files checked into git.
I've set it up as follows:
git config filter.black.clean 'black -'
echo '*.py filter=black' >> .git/info/attributes
As far as I understand, this should work ok as black with - as the source path will read from STDIN and output to STDOUT, which is what I think the git filter needs it to do.
However this does not work. When I add an un-black file with git add I see the following output:
reformatted -
All done! ✨ 🍰 ✨
1 file reformatted.
And the file is not changed on disk. What am I doing wrong?

The Black documentation recommends using a pre-commit hook, rather than smudge and clean filters. Note that filter.black.clean defines a clean filter and you have not set up any smudge filters.
The reason you see no change to the work-tree version of the file is that a clean filter is used when turning the work-tree version of a file into the index (to-be-committed) version of the file. This has no effect on the work-tree version of the file!
A smudge filter is used in the opposite direction: Git has a file that is in the index—for whatever reason, such as because it was just copied into the index as part of the git checkout operation to switch to a specific commit—and desires to convert that in-the-index, compressed, Git-ized file to one that you can actually see and edit in your editor, or run with python. Git will, at this time, run the (de-compressed) file contents through your smudge filter.
Note that if you convert some file contents, pre-compression, in a clean filter, and then later extract that file from the repository, into the index, and on into your work-tree, you will at that time be able to see what happened in the clean filter (assuming you do not have a countervailing smudge filter that undoes the effect).
In a code-reformatting world, one could conceivably use a clean filter to turn all source files into some sort of canonical (perhaps four-space-indentation) form, and a smudge filter to turn all source files into one's preferred format (two-space or eight-space indentation). If these transformations are all fully reversible, what you would see, in your work-tree, would be in your preferred format, and what others would see in their work-trees would be their preferred format; but what the version control system itself would see would be the canonical, standardized format.
That's not how Black is actually intended to be used, though it probably can be used that way.
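To make the clean-filter contract concrete, here is a minimal sketch of a filter program. It is not Black: the transformation (stripping trailing whitespace) and the names clean and main are stand-ins chosen for illustration. Git writes the work-tree contents to the filter's standard input and stores whatever the filter writes to standard output in the index. The file on disk is never rewritten.

```python
import sys

def clean(text):
    """Stand-in canonicalization: strip trailing whitespace on every line."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines) + "\n" if lines else ""

def main():
    """Filter entry point: stdin (work-tree contents) -> stdout (index contents)."""
    sys.stdout.write(clean(sys.stdin.read()))
```

A real filter script would call main() at the bottom and be registered with something like git config filter.tidy.clean 'python3 path/to/script.py', where both the filter name tidy and the script path are hypothetical.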

It's not obvious how to set this up manually, but using the pre-commit framework seems to work well.
This approach does two things:
checks prior to committing that the files pass the black test
runs black to fix the files if they don't
So if files don't pass, the commit fails, and black fixes the files, which you then need to git add before trying to commit again.
In a separate test, I managed to set up the pre-commit hook manually using black . --check in .git/hooks/pre-commit (this just does the check - doesn't fix anything if it fails), but never figured out how to configure black to work with the clean and smudge filters.
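A manual hook of that kind could look roughly like the sketch below, written in Python instead of shell. It assumes black is on the PATH of whoever commits. The names run_check and BLACK_CHECK are invented here, and returning 127 for a missing command is just a convention borrowed from shells.

```python
import subprocess
import sys

# The check-only invocation mentioned above; it reports but does not reformat.
BLACK_CHECK = ["black", ".", "--check"]

def run_check(cmd):
    """Run cmd and return its exit status; 0 means the check passed."""
    try:
        return subprocess.run(cmd).returncode
    except FileNotFoundError:
        print("pre-commit: command not found: %s" % cmd[0], file=sys.stderr)
        return 127
```

Saved as an executable .git/hooks/pre-commit with a Python shebang line, the script would finish with sys.exit(run_check(BLACK_CHECK)). Git aborts the commit whenever the hook exits with a non-zero status.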

diff in pre-receive hook

I have written a simple server-side Git pre-receive hook in Python. The goal is to analyze diffs and reject pushes that contain certain text that we consider invalid. I wrote the hook using the following commands:
git ls-tree
git diff --name-only
git cat-file
However, I just noticed that I am scanning the entire files that are pushed as part of the commit. I only want to scan the diff, i.e. the changed lines in this push.
The reason is that some invalid text can be a false positive and is okay; it can be force pushed. However, if the same file is edited again and valid text is added, the push will be rejected just because that file previously had invalid text. This will happen each time the file is edited, which is kind of annoying.
So basically the question is: how do I get just the changed lines (the diff) in the current push in server-side hook code, instead of scanning complete files?
Thanks

... how to get just the changed lines
This question is incomplete. Suppose I tell you that there are some people, including Alice, Bob, Carol, and so on. Now I tell you that Bob is different. Different from who or what?
In a pre-receive hook, you must read lines from your standard input. Each line has the form:
old-hash new-hash reference-name
What do these mean? (That's an exercise for you to answer before you go on to the next sections, though the answer is embedded in the last section below.)
Obtaining a diff requires that you select two items
A commit is a snapshot of files—complete copies of every file that was frozen into that commit. There are no differences involved; there are just complete files.
You, however, want differences. To get a difference for some file file.ext, you must pick some other version of file.ext and compare the two. What is the correct "other version"?
For some commits, you are in luck: there's a very clear correct "other version" of file.ext, which is: the copy of file.ext in that commit's parent commit. In fact, this repeats for every file in the commit: we would like to compare that commit's version of that file, to the parent's version of that file, to see what changed.
There's a handy script-able ("plumbing") command for this, which is git diff-tree: given the hash ID of an ordinary non-merge commit, git diff-tree compares the commit's parent to the commit. Add -p or --patch to get a textual difference (this automatically implies the -r option). Consider using -U0 to drop context lines. You will, of course, still need to parse the output lines, to detect hunk headers and the added/deleted markers.
A simple git diff-tree <hash> does not, however, work for two cases of commits:
A root commit has no parent. Fortunately, the empty tree comes to the rescue: git diff-tree -p $(git hash-object -t tree /dev/null) $hash does the trick.
A merge commit has two or more parents. Here git diff-tree produces a combined diff by default. If that's OK, you can ignore this case. If not, you might consider using --first-parent -m or just -m to split the merge and get multiple diffs, against each parent (default) or the first parent (--first-parent).
That gets you the diff for one commit, so now we move on to the last part.
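A sketch of the per-commit step follows: one helper builds the git diff-tree command for root and ordinary commits, and another pulls only the added lines out of the resulting patch text. The helper names are made up. EMPTY_TREE is the ID that git hash-object -t tree /dev/null prints in a SHA-1 repository. Merge commits are deliberately left to the caller, since how to treat them is a policy decision.

```python
# ID of the empty tree in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def diff_tree_cmd(commit, parent_count):
    """Build the diff-tree argv for a root (0 parents) or ordinary (1) commit."""
    base = ["git", "diff-tree", "-p", "-U0"]
    if parent_count == 0:
        return base + [EMPTY_TREE, commit]   # compare against the empty tree
    if parent_count == 1:
        return base + [commit]               # compare against the sole parent
    raise NotImplementedError("decide how merge commits should be diffed")

def added_lines(patch_text):
    """Yield (path, text) for every added line in unified-diff output."""
    path = None
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False                  # a new file section begins
            path = None
        elif line.startswith("@@"):
            in_hunk = True                   # hunk header: content follows
        elif not in_hunk and line.startswith("+++ "):
            target = line[4:]
            path = target[2:] if target.startswith("b/") else target
        elif in_hunk and line.startswith("+"):
            yield path, line[1:]
```

Tracking whether the parser is inside a hunk is what keeps an added line whose own text begins with "++" from being mistaken for a "+++" file header. Paths that Git chooses to quote are passed through as-is in this sketch.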
Now it's time to deal with the hook's stdin input lines
As you read each line, it's your job to:
Check the old and new hashes for the special all-zero-digits null hash. In Python, there are multiple ways to express this; one is:
def is_null(hash):
    return all(i == '0' for i in hash)
If the old hash is null, the reference is being created at the new hash. If the new hash is null, the reference used to have the given old hash, and is being deleted. Otherwise—neither hash is null—the reference is being updated: it had the old hash, and will have the new hash.
Figure out what to do, if anything, with the change to the particular reference. Is deletion allowed? Is creation allowed? Does it matter if this is a branch name (starts with refs/heads/) vs a tag name (starts with refs/tags/) vs something else entirely?
Creations are especially difficult. The newly introduced name makes the given object reachable by that name. If the object is a tag or commit, that makes additional objects reachable by that name as well. Some or all of these objects may be new. Some or all of these objects may already exist. The classic case is when someone creates a new branch name: it may point to an existing commit, already on some other branch, or it may point to a new commit, the new tip of the new branch, which may have many additional new commits before joining up with some existing branch(es).
Updates are the most common, and usually the simplest to handle. You know that the existing reference name made the old object reachable, and the proposed update is to make the new object reachable. If the reference is a branch name, both objects are in fact commit objects, and it is easy to find which commits, if any, are newly reachable from the proposed new hash, and which commits, if any, are being removed from reachability via the proposed new hash:
git rev-list $old..$new
produces the set of hash IDs that are newly reachable, and:
git rev-list $new..$old
produces the set that are no longer reachable. (Use git rev-list --left-right $old...$new, with three dots, to get both sets of hash IDs at once, with distinguishing markers. You can use $new...$old: the symmetric difference that this produces is itself symmetric, except of course that the left and right sides are reversed.)
Assuming you have handled creation somehow, if your goal is to examine newly-reachable commits—whether or not they are new to the repository overall—you can simply walk through all the new commits, testing each one to see if it is a root commit, an ordinary (single-parent) commit, or a merge commit. (Hint: add --parents to the git rev-list command to get the parent IDs included, so that you can easily tell how many parents each commit has. Also, consider the graph structure of the commit graph fragment you are walking: $old..$new may include merges, which may make many commits reachable that may or may not be new to the repository.)
You now have all the commit hashes, and their parent counts. You also know how to use git diff-tree to compare each commit against its parent(s) or against the empty tree as needed. So now you are ready to write your fancy pre-receive hook.
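Pulling the stdin handling together, a skeleton might look like the following. It is a sketch under the assumptions stated in this answer: each input line is "old-hash new-hash reference-name", and git rev-list --parents prints each commit followed by its parents. All function names are illustrative, and actually running the git commands is left out.

```python
def is_null(hash_id):
    """True for the all-zero hash that marks a created or deleted reference."""
    return bool(hash_id) and all(c == "0" for c in hash_id)

def parse_update(line):
    """Split one pre-receive stdin line into (old, new, refname)."""
    old, new, ref = line.strip().split(" ", 2)
    return old, new, ref

def classify(old, new):
    """Name the kind of reference change: create, delete, or update."""
    if is_null(old):
        return "create"
    if is_null(new):
        return "delete"
    return "update"

def rev_list_cmd(old, new):
    """Argv listing commits newly reachable from new, with their parents."""
    return ["git", "rev-list", "--parents", old + ".." + new]

def parse_rev_list(output):
    """Turn `rev-list --parents` output into (commit, parent_count) pairs."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        if fields:
            pairs.append((fields[0], len(fields) - 1))
    return pairs
```

A hook would loop over sys.stdin, call parse_update and classify on each line, and for updates feed the rev_list_cmd output through parse_rev_list to decide, per commit, whether it is a root, ordinary, or merge commit.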

Moving files and commit history from one git branch to another without changing the SHA

I have a branch called add-ivp-solver that I have submitted as a PR for a project on GitHub. The branch has gotten a bit bloated and we now wish to move some of the files out of add-ivp-solver to a new branch called add-models which will be submitted as another PR in the future.
I would like to know if it is possible to move files and their associated commit history from add-ivp-solver to add-models in a way that will allow us to cleanly merge add-ivp-solver into master and close the original PR.
I think that git filter-branch might be what I need. This should allow me to remove the files and commit history from add-ivp-solver to add-models, but I am concerned that it will leave add-ivp-solver in an "inconsistent state" which will make it nearly impossible to merge and close the PR.

I am concerned that it will leave add-ivp-solver in an "inconsistent state" which will make it nearly impossible to merge and close the PR.
No, it will leave that branch with a different history, which means:
you will need to force push it to your fork
git checkout add-ivp-solver
# do your filter-branch
git push --force origin add-ivp-solver
the PR will automatically adjust in order to take into account that new history (nothing to do)
the maintainer of the original repo can test again the merge of that PR
