Missing git commit: can't retrieve it in Python

I cloned the apache/tomcat git repo to extract some information about its commits.
However, when I use git.Repo('path to local repo').iter_commits(), some commits are missing.
I also can't find these commits with GitHub's search.
For example, commit 69c56080fb3355507e1b55d014ec0ee6767a6150 is in the apache/tomcat repo, yet searching for '69c56080fb3355507e1b55d014ec0ee6767a6150' with the 'in this repository' option returns nothing.
This surprises me.
It seems the commit isn't on the master branch, so it can't be found by search?
I'd like to understand what is going on here and how to get information about these 'missing' commits in Python.
Thanks.

repo.iter_commits(), with no arguments, gives you the commits that can be reached by tracing back through the parent(s) of the current commit. In other words, if you are on the master branch, it will only give you commits that are part of the master branch.
You can give it a rev argument which, among other things, can be a branch name. For example, iter_commits(rev='8.5.x') ought to give you all commits in the 8.5.x branch, which will include 69c5608. You can use the repo.branches property if you need a list of branches.
Alternatively, if you already know the hash of a single commit that you want to look up, you can use repo.commit(), again with a rev parameter, which in this case is the full or abbreviated commit hash: commit(rev='69c5608').
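For illustration, a minimal sketch of both approaches (assuming a local clone of apache/tomcat in 'tomcat/'; if 8.5.x only exists as a remote-tracking branch, a rev such as 'origin/8.5.x' can be used instead):
import git

repo = git.Repo('tomcat')  # path to the local clone (assumed)

# Walk every commit reachable from the 8.5.x branch head
for commit in repo.iter_commits(rev='8.5.x'):
    print(commit.hexsha, commit.summary)

# Look up a single commit directly by its (full or abbreviated) hash
c = repo.commit(rev='69c56080fb3355507e1b55d014ec0ee6767a6150')
print(c.author, c.authored_datetime, c.message)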

I believe the issue here is that this commit is on the 8.5.x branch and not on master. You can see this in the first link: the commit's page shows which branches include it. GitHub's search only indexes the repository's default branch (typically master, main, or trunk).
To find it via the GitPython library, try switching to that branch. See these instructions on how to switch branches: https://gitpython.readthedocs.io/en/stable/tutorial.html#switching-branches
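A minimal sketch of that approach (assuming a local branch named 8.5.x already exists in the clone):
import git

repo = git.Repo('tomcat')          # path to the local clone (assumed)
repo.heads['8.5.x'].checkout()     # switch the working tree to the 8.5.x branch

# With no rev argument, iteration now starts from the 8.5.x head
for commit in repo.iter_commits():
    print(commit.hexsha, commit.summary)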


Python Pip automatically increment version number based on SCM

Similar questions have been asked many times, but I was not able to find a solution to my specific problem.
I was playing around with setuptools_scm recently and at first thought it was exactly what I need. I have it configured like this:
pyproject.toml
[build-system]
requires = ["setuptools_scm"]
build-backend = "setuptools.build_meta"
[project]
...
dynamic = ["version"]
[tool.setuptools_scm]
write_to = "src/hello_python/_version.py"
version_scheme = "python-simplified-semver"
and my __init__.py
from ._version import __version__
from ._version import __version_tuple__
Relevant features it covers for me:
I can use semantic versioning
it is able to use *.*.*.devN version strings
it increments minor version in case of feature-branches
it increments patch/micro version in case of fix-branches
This is all cool. As long as I am on my feature-branch I am able to get the correct version strings.
What I like particularly is, that the dev version string contains the commit hash and is thus unique across multiple branches.
My workflow now looks like this:
create feature or fix branch
commit, (push, ) publish
merge PR to develop-branch
As soon as I am on my feature-branch I am able to run python -m build, which generates a new _version.py with the correct version string according to the latest git tag found. If I add new commits, that is fine, as the devN part of the version string changes due to the commit hash. I could even run python -m twine upload dist/* now: my package is built with the correct version, so I simply publish it. This works perfectly fine locally and on CI, for both fix and feature branches alike.
The problem I am facing now is that I need slightly different behavior for my merged pull requests.
As soon as I merge, e.g. 0.0.1.dev####, I want to run my Jenkins job not on the feature-branch anymore, but on the develop-branch instead. And the important part is, I want to:
get develop-branch (done by CI)
update the version string to the same as on the branch, but without devN, so: 0.0.1
build and publish
In fact, setuptools_scm now changes the version to 0.0.2.dev###, whereas I would like to have 0.0.1.
I was tinkering a bit with creating git tags before running setuptools_scm or build, but I was not able to get the correct version string to put into the tag. At this point I am stuck.
Is anyone aware of a solution that gives me:
minor increment on feature-branches + add .devN
patch/micro increment on fix-branches + add .devN
no increment on develop-branch and version string only containing major.minor.patch of merged branch
TL;DR: turning off writing the version number to a file every time setuptools_scm runs could solve your problem; alternatively, add the version file to .gitignore.
Explanation:
I also just started using setuptools_scm, so I am not very confident in using it yet.
But as far as I understand, the version number is derived and incremented according to the state of your repository (the detailed logic is documented here: https://github.com/pypa/setuptools_scm/#default-versioning-scheme).
If I am not mistaken, the tool does exactly what it is expected to: it does NOT derive the version solely from the tag, but also adds a devN suffix, because in your case the tag does not reference the most recent commit on the head of the develop branch.
I also had the problem that letting setuptools_scm generate a version while configuring it to write that version to a file changes the working tree relative to the last commit, which again produces a dev version number.
To get a "clean" version number (e.g. v0.0.1) I had to do the tagging after merging (with a merge commit), since the merge commit is also taken into account by the version numbering logic.
Still, my setup is currently less complex than yours: just feature and fix branches and a main branch without develop, so fewer merge commits (I chose to do merge commits, so no linear history). After merging with a commit, I create a tag manually and choose its name myself.
This also only works for me if I opt out of writing the version number to a file, which I have done by putting the following into pyproject.toml:
[tool.setuptools_scm]
# intentionally empty/commented out
# the write_to option leads to an unclean workspace during build,
# which in turn makes setuptools_scm produce wheels with unclean (dev) version numbers
# write_to = "version.txt"
Since setuptools_scm runs during the build, a new version file is generated then as well, which pollutes your worktree. Because your worktree is never clean this way, you always get a dev version number. If you still want a version file, add it to your .gitignore so that it is ignored during the build.
My approach is not perfect and involves some manual steps, but for now it works for me.
It is certainly not 100% applicable to your CI scenario, but maybe you could change the order in which you do merges and tags. I hope this helps somehow.
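As a quick way to inspect what setuptools_scm would compute before creating a tag, its Python API can be called directly (a minimal sketch; the root path and the printed examples are assumptions):
from setuptools_scm import get_version

# Ask setuptools_scm which version it would derive from the current
# repository state, without building anything
version = get_version(
    root='.',                                   # repository root (assumed)
    version_scheme='python-simplified-semver',  # same scheme as in pyproject.toml
)
print(version)  # e.g. '0.0.1' on a clean, tagged checkout; '0.0.2.devN+g<hash>' otherwise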

pygit2 merge simulating -Xtheirs or -Xours

I'm automating changes to files on GitHub using pygit2. Sometimes the files change on GitHub while I am processing a repo, so I want to pull() before I push().
Since this is automated, I would like to avoid conflicts by either having my local changes always override the remote, or vice versa. This seems like a very simple scenario, but after hours of scouring the internet I have found zero examples of someone doing this. The pygit2 source itself has some examples that get close, but the "handle conflicts" portion is just a "TODO" comment.
It looks like pygit2 should support this, but none of the APIs seem to do it.
For example,
Repository.merge_commits(ours, theirs, favor='normal', flags={}, file_flags={})
When I set favor="theirs" or favor="ours" and purposely force a conflict I still get conflicts.
I tried this:
ancestor_id = repo.merge_base(repo.head.target, remote_master_id)
repo.merge_trees(ancestor_id, repo.head, remote_master_id, favor="theirs")
No conflict now, but I somehow end up with the repo in a state where both changes (ours and theirs) are in the commit history, yet the file itself contains neither change.
I'm just guessing here, since I have no clue what merge_trees does (beyond "merge trees"), and experimenting with values of ancestor_id.
Is there a way to get pygit2 to get it to do what I want?
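For reference: both merge_commits and merge_trees return an in-memory Index that touches neither HEAD nor the working tree, so the merged result still has to be written to a tree and committed. A minimal sketch under that assumption (paths and signature are hypothetical; favor= only resolves content-level conflicts inside files):
import pygit2

repo = pygit2.Repository('path/to/repo')
remote_master_id = repo.lookup_reference('refs/remotes/origin/master').target
local_id = repo.head.target

# Merge in memory, preferring the remote side on content conflicts
index = repo.merge_commits(repo[local_id], repo[remote_master_id], favor='theirs')

if index.conflicts is None:             # e.g. add/delete conflicts are not covered by favor=
    tree_id = index.write_tree(repo)    # write the merged tree into the object database
    sig = pygit2.Signature('Automation', 'automation@example.com')  # hypothetical
    repo.create_commit(
        'refs/heads/master',            # branch to advance (assumed)
        sig, sig,
        'Merge remote changes, preferring theirs',
        tree_id,
        [local_id, remote_master_id],   # record both parents so history stays linked
    )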

Can I enforce the order branches are merged on GitHub

I have a GitHub repo that contains three protected branches: master, staging & uat. Anyone may create other branches to make changes, but I would like a way to make sure that people merge in this order:
users_branch -> uat -> staging -> master.
I have looked at pre-receive hooks using Python, but I can't seem to get the information I need about which branches are being merged to build this logic. The only arguments available in pre-receive are: base, commit & ref.
Is there any way to enforce that only uat may merge into staging and only staging may merge into master?
You could set up a workflow with git-flow.
Or you could set up a manual process where commit rights to those branches reside with one person who is responsible for pulling in changes and merging them in the right order.
One thing to remember with Git is that these controls only apply at your 'central' repo. You can't control what happens in the individual cloned repos. Also, since hooks are not distributed with repositories for security reasons, you will not be able to enforce this order via hooks in those clones either.
I guess the best you could do is to check every merge commit among the newly pushed commits: verify that it has exactly two parents and that the second parent is contained in the branch you want to enforce merging from (see the sketch below).
But you cannot add arbitrary git hooks to GitHub repositories anyway, or can you?
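A minimal sketch of that parent check with GitPython, in the shape of a pre-receive hook (assuming it runs in the repository receiving the push; branch names and the policy mapping are examples, and pushes that create a new ref, where the old SHA is all zeros, are not handled):
import sys
import git

# A pre-receive hook reads "<old-sha> <new-sha> <ref-name>" lines on stdin.
repo = git.Repo('.')                                # the repo receiving the push (assumed)
allowed_source = {'refs/heads/staging': 'uat',      # example policy:
                  'refs/heads/master': 'staging'}   # staging <- uat, master <- staging

for line in sys.stdin:
    old_sha, new_sha, ref = line.split()
    source = allowed_source.get(ref)
    if source is None:
        continue
    for commit in repo.iter_commits(f'{old_sha}..{new_sha}'):
        if len(commit.parents) == 2:
            # the merged-in side must already be reachable from the allowed branch
            if not repo.is_ancestor(commit.parents[1], repo.heads[source].commit):
                sys.exit(f'merges into {ref} must come from {source}')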

GitPython: how to commit updated submodule

I have been at this for hours now, and although I have a feeling I'm close I can't seem to figure this out.
I'm trying to make a script that takes a git repository, updates a submodule in that repository to a specified version, and commits that change.
What works:
I can find the repository, get the submodule and check out the commit I want.
What doesn't work:
I can't seem to add the updated submodule hash so I can commit it.
My Code:
repos = Repo('path/to/repos')
submodule = repos.submodule('submodule-name')
submodule.module().git.checkout('wanted commit')
diff = repos.index.diff(None)
At this point I can see the submodule-change. If I check sourcetree, I can see the changed submodule in the 'unstaged files'.
The thing is, I have no clue how to stage the change so I can commit it.
What I have tried:
If I commit using repos.index.commit(''), it creates an empty commit.
If I try to add the path of the submodule using repos.index.add([submodule.path]), all files in the submodule are added to the repository, which is definitely not what I want.
If I try to add the submodule itself (which should be possible according to the docs) using repos.index.add([submodule]), nothing seems to happen.
There are two ways to add new submodule commits to the parent repository: one uses the git command directly, the other is implemented in pure Python.
All examples are based on the code presented in the question.
Simple
repos.git.add(submodule.path)
repos.index.commit("updated submodule to 'wanted commit'")
The code above will invoke the git command, which is similar to doing a git add <submodule.path> in a shell.
Pythonic
submodule.binsha = submodule.module().head.commit.binsha
repos.index.add([submodule])
repos.index.commit("updated submodule to 'wanted commit'")
The IndexFile.add(...) implementation adds whichever binsha it finds in the submodule object. That would be the binsha recorded in the parent repository's current commit, not the commit that was checked out in the submodule. One can see the Submodule object as a snapshot of the submodule: it does not change, nor does it know about changes to the submodule's repository.
In this case it seems easiest to simply overwrite the binsha field of the submodule object with the commit that is actually checked out in its repository, so that adding it to the index has the desired effect.
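To double-check what will be committed, the staged changes can be inspected by diffing the index against HEAD before calling commit (a small sketch using the objects from the answer):
# After repos.index.add([submodule]), the only staged change should be
# the submodule's path (its gitlink entry), not the submodule's files.
for d in repos.index.diff('HEAD'):
    print(d.a_path)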

Check which branches contain a specific git commit sha using Python dulwich?

I would like to do the following command from Python script using dulwich:
$ git branch --contains <myCommitSha> | wc -l
What I intend is to check whether a particular commit (SHA) is contained in more than one branch.
Of course, I could execute the above command from Python and parse the output (the number of branches), but that's a last-resort solution.
Any other ideas/comments? Thanks in advance.
Just in case someone was wondering how to do this now using gitpython:
repo.git.branch('--contains', YOURSHA)
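To reproduce the | wc -l part, the output can simply be split into lines and counted (a small sketch; the repo path and YOURSHA are placeholders):
import git

repo = git.Repo('path/to/repo')
branches = repo.git.branch('--contains', YOURSHA).splitlines()
print(len(branches))   # number of local branches containing the commit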
Since branches are just pointers to random commits and they don't "describe" trees in any way, there is nothing linking some random commit TO a branch.
The only sensible way to find out whether a given commit is an ancestor of the commit some branch points to is to traverse the ancestor chain from the branch's tip commit downwards.
In other words, in dulwich I would iterate over the branches and traverse backwards to see whether the SHA is on the chain.
I am rather certain that's exactly what git branch --contains <myCommitSha> does, as I am not aware of any other shortcut.
Since your choice is (a) make Python do the iteration or (b) make C do the same iteration, I'd just go with C. :)
There is no built-in function for this, but you can of course implement this yourself.
You can also just do something like this (untested):
branches = [
    ref for ref in repo.refs.keys(base="refs/heads/")
    # keys(base=...) returns names with the prefix stripped, so add it back for the lookup
    if any(commit.id == YOURSHA
           for commit in repo.get_walker(include=[repo.refs["refs/heads/" + ref]]))
]
This will give you a list of all the branch heads that contain the given commit, but it will have a runtime of O(n*m), n being the number of commits in your repo and m the number of branches. The git implementation probably has a runtime of O(n).
In case anyone uses GitPython and wants all branches, including remote-tracking ones:
import git
gLocal = git.Git("<LocalRepoLocation>")
gLocal.branch('-a', '--contains', '<CommitSHA>').split('\n')
