Get changed files using gitpython - python

I want to get a list of the changed files in the current Git repository: the files that are normally listed under "Changes not staged for commit:" when calling git status.
So far I have managed to connect to the repository, pull it and show all untracked files:
from git import Repo
repo = Repo(pk_repo_path)
o = repo.remotes.origin
o.pull()
print(repo.untracked_files)
But now I want to show all files that have uncommitted changes. Can anybody point me in the right direction? I looked at the names of the methods of repo and experimented for a while, but I can't find the correct solution.
Obviously I could call repo.git.status and parse the files, but that isn't elegant at all. There must be something better.
Edit: Now that I think about it, a function that tells me the status of a single file would be more useful. Like:
print(repo.get_status(path_to_file))
>>untracked
print(repo.get_status(path_to_another_file))
>>not staged

for item in repo.index.diff(None):
    print(item.a_path)
or to get just the list:
changedFiles = [ item.a_path for item in repo.index.diff(None) ]
repo.index.diff() returns a git.diff.DiffIndex (a list of git.diff.Diff objects), described in http://gitpython.readthedocs.io/en/stable/reference.html#module-git.diff
So the function can look like this:
def get_status(repo, path):
    # Paths with changes in the working tree that are not yet staged
    changed = [ item.a_path for item in repo.index.diff(None) ]
    if path in repo.untracked_files:
        return 'untracked'
    elif path in changed:
        return 'modified'
    else:
        return "don't care"

Just to follow up on @ciasto piekarz's question: it depends on what you want to show.
repo.index.diff(None)
lists only files with changes that have not been staged, while
repo.index.diff('HEAD')
lists only files with changes that have been staged.
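Putting the two calls together, a status helper that also distinguishes staged changes might look like the sketch below. The names classify_path and get_status_with_staged are made up for illustration. The second function assumes repo is a GitPython Repo with at least one commit (so that 'HEAD' resolves) and that path is given relative to the repository root.

```python
def classify_path(path, untracked, unstaged, staged):
    """Return a status label for path, given three collections of repo-relative paths."""
    if path in untracked:
        return 'untracked'
    labels = []
    if path in staged:
        labels.append('staged')
    if path in unstaged:
        labels.append('not staged')
    # A file can be staged and then modified again, so both labels may apply.
    return ', '.join(labels) if labels else 'unchanged'


def get_status_with_staged(repo, path):
    """Hypothetical wrapper; repo is assumed to be a git.Repo instance."""
    unstaged = {item.a_path for item in repo.index.diff(None)}
    staged = {item.a_path for item in repo.index.diff('HEAD')}
    return classify_path(path, set(repo.untracked_files), unstaged, staged)
```

Keeping the classification separate from the GitPython calls means the three collections can be computed once and reused when many paths have to be checked.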

Related

GitPython: retrieve changed files between commits in specific sub directory

The repo structure looks like this:
- folder_a
- folder_b
- folder_c
- ...
I am particularly interested in the files that changed in a specific commit, but only the ones in folder_a. My solution is
for filename, details in commit.stats.files.items():
    if not filename.startswith('folder_a'):
        continue
    # ...
but it seems the performance is not quite good if there are a great number of files in the other folders. Is there any better way to skip the files I don't care about?
If I understand correctly, you want stats on the modifications made by a commit, restricted to one specific subfolder.
Using plain git:
git show [commit] --stat folder_a
will display exactly what you want.
From your Python script, have a look at what repo.git.show('<commit identifier>', '--stat', 'folder_a') returns (where repo is your git.Repo object).
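If the result has to be processed further, the --numstat form is easier to parse than --stat. The helper below is only a sketch: parse_numstat is a name invented here, and it assumes the text came from something like repo.git.show(commit, '--numstat', '--format=', '--', 'folder_a'), where each line is added<TAB>deleted<TAB>path and binary files report "-" for both counts. Renamed paths are not given any special treatment.

```python
def parse_numstat(text):
    """Map each path to (added, deleted); counts are None for binary files."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        added, deleted, path = line.split('\t', 2)
        stats[path] = (
            None if added == '-' else int(added),
            None if deleted == '-' else int(deleted),
        )
    return stats


# Hand-written sample in numstat layout, not output from a real repository.
sample = "3\t1\tfolder_a/main.py\n-\t-\tfolder_a/logo.png\n"
print(parse_numstat(sample))
```

Because git does the path filtering, files outside folder_a never reach Python at all.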

Extract commits related to code changes from commit tree

Right now I am able to traverse the commit tree of a GitHub repository using the pygit2 library. I am getting all the commits for each file change in the repository. This means that I am also getting changes for text files with the extension .rtf. How do I filter out everything except the commits that are related to code changes? I don't want the changes related to text documents.
Appreciate any help or pointers. Thanks.
last = repo[repo.head.target]
t0 = last
f = open(outputFile, 'w')
print t0.hex
for commit in repo.walk(last.id):
    if t0.hex == commit.hex:
        continue
    print commit.hex
    out = repo.diff(t0, commit)
    f.write(out.patch)
    t0 = commit
As part of the output, I get the differences in the rtf files as well, as shown below:
diff --git a/archived-output/NEW/action-core[best].rtf b/archived-output/NEW/action-core[best].rtf
deleted file mode 100644
index 56cdec6..0000000
--- a/archived-output/NEW/action-core[best].rtf
+++ /dev/null
@@ -1,8935 +0,0 @@
-{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff31507\deff0\stshfdbch31506\stshfloch31506\stshfhich31506\stshfbi31507\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fbidi \fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
-{\f2\fbidi \fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}{\f3\fbidi \froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}
Either I have to filter the commits from the tree or I have to filter the output. I was wondering whether I could remove the changes related to rtf files by skipping the corresponding commits while walking through the tree.
If that is possible, how do we get the list of modified files?
Ah, now you're asking the right questions! Git, of course, does not store a list of modified files in each commit. Rather, each commit represents the state of the entire repository at a certain point in time. In order to find the modified files, you need to compare the files contained in one commit with the previous commit.
For each commit returned by repo.walk(), the tree attribute refers to the associated Tree object (which is itself a list of TreeEntry objects representing files and directories contained in that particular Tree).
A Tree object has a diff_to_tree() method that can be used to compare it against another Tree object. This returns a Diff object, which acts as an iterator over a list of Patch objects. Each Patch object refers to the changes in a single file between the two Trees that are being compared.
The Patch object is really the key to all this, because this is how
we determine which files have been modified.
The following code demonstrates this. For each commit, it will print
a list of new, modified, or deleted files:
import stat
import pygit2

repo = pygit2.Repository('.')
prev = None
for cur in repo.walk(repo.head.target):

    if prev is not None:
        print prev.id
        diff = cur.tree.diff_to_tree(prev.tree)
        for patch in diff:
            print patch.status, ':', patch.new_file_path,
            if patch.new_file_path != patch.old_file_path:
                print '(was %s)' % patch.old_file_path,
            print

    if cur.parents:
        prev = cur
        cur = cur.parents[0]
If we run this against a sample repository, we can look at the
output for the first few commits:
c285a21e013892ee7601a53df16942cdcbd39fe6
D : fragments/configure-flannel.sh
A : fragments/flannel-config.service.yaml
A : fragments/write-flannel-config.sh
M : kubecluster.yaml
b06de8f2f366204aa1327491fff91574e68cd4ec
M : fragments/enable-services-master.sh
M : fragments/enable-services-minion.sh
c265ddedac7162c103672022633a574ea03edf6f
M : fragments/configure-flannel.sh
88a8bd0eefd45880451f4daffd47f0e592f5a62b
A : fragments/configure-docker-storage.sh
M : fragments/write-heat-params.yaml
M : kubenode.yaml
And compare that to the output of git log --oneline --name-status:
c285a21 configure flannel via systemd unit
D fragments/configure-flannel.sh
A fragments/flannel-config.service.yaml
A fragments/write-flannel-config.sh
M kubecluster.yaml
b06de8f call daemon-reload before starting services
M fragments/enable-services-master.sh
M fragments/enable-services-minion.sh
c265dde fix json syntax problem
M fragments/configure-flannel.sh
88a8bd0 configure cinder volume for docker storage
A fragments/configure-docker-storage.sh
M fragments/write-heat-params.yaml
M kubenode.yaml
...aaaand, that looks just about identical. Hopefully this is enough
to get you started.
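To come back to the original question: once the changed paths are available, they can be filtered by extension before anything is written out. A minimal sketch, assuming the documents to ignore can be recognised by suffix alone. The function names and the default set of suffixes are illustrative and not part of pygit2.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes mark documents rather than code.
DOCUMENT_SUFFIXES = frozenset({'.rtf', '.doc', '.docx', '.pdf'})


def is_code_path(path, ignored=DOCUMENT_SUFFIXES):
    """True unless the path ends in one of the ignored suffixes (case-insensitive)."""
    return PurePosixPath(path).suffix.lower() not in ignored


def code_changes(changes, ignored=DOCUMENT_SUFFIXES):
    """Keep only the (status, path) pairs that refer to code files."""
    return [(status, path) for status, path in changes if is_code_path(path, ignored)]


changes = [
    ('D', 'archived-output/NEW/action-core[best].rtf'),
    ('M', 'kubecluster.yaml'),
]
print(code_changes(changes))
```

A commit whose filtered list comes back empty touched only documents and can be skipped.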
This is mainly a rewrite of larsks's excellent answer to
- the current pygit2 API
- Python 3
It also fixes a flaw in the iteration logic: the original code would fail to diff the last revision against its parent when a revision range (a..b) is walked.
The following approximates the command
git log --name-status --pretty="format:Files changed in %h" origin/devel..master
on the sample repository given by larsks.
I was unable to trace file renames, though. This is printed as a deletion and an addition. The code line printing a rename is never reached.
import pygit2

repo = pygit2.Repository('.')

# Show files changed between origin/devel and current HEAD
devel = repo.revparse_single('origin/devel')
walker = repo.walk(repo.head.target)
walker.hide(devel.id)
for cur in walker:
    if cur.parents:
        print(f'Files changed in {cur.short_id}')
        prev = cur.parents[0]
        # Diff each commit against its first parent
        diff = prev.tree.diff_to_tree(cur.tree)
        for patch in diff:
            print(patch.delta.status_char(), ':', patch.delta.new_file.path)
            if patch.delta.new_file.path != patch.delta.old_file.path:
                print(f'(was {patch.delta.old_file.path})')
        print()
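On the rename issue: libgit2 only reports renames after similarity detection has been run on the diff, which pygit2 exposes as Diff.find_similar(). The sketch below assumes diff is the object returned by diff_to_tree() and that the default similarity threshold is acceptable. describe_changes is a name made up here.

```python
def describe_changes(diff):
    """Return one line per changed file, with renames shown as 'R : new (was old)'."""
    # find_similar() modifies the diff in place, turning matching
    # delete/add pairs into rename deltas.
    diff.find_similar()
    lines = []
    for delta in diff.deltas:
        line = f'{delta.status_char()} : {delta.new_file.path}'
        if delta.new_file.path != delta.old_file.path:
            line += f' (was {delta.old_file.path})'
        lines.append(line)
    return lines
```

In the loop above this would be called as describe_changes(prev.tree.diff_to_tree(cur.tree)), printing each returned line.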

How to get svn directory status using pysvn

I am trying to get the SVN status (unversioned, normal, ...) of a particular directory using the pysvn Client.status(dirname) function.
But as the documentation says, pysvn returns an array with the status of the files within the directory, not of the directory itself.
Is there another way to obtain this information?
Have a nice day.
Ouille
Sorry, found it.
Client.status(dirname) returns an array. The last element of this array is the directory.
So myclient.status(dirname)[-1].text_status returns the directory status.
Have a nice day.
Ouille.
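Relying on the position of the entry is a little fragile. A sketch that looks the directory up by path instead, assuming each status entry exposes path and text_status attributes as pysvn's status objects do. directory_status is a hypothetical helper; it takes the list returned by Client.status(dirname), so it does not call pysvn itself.

```python
import os


def directory_status(entries, dirname):
    """Return text_status of the entry whose path is dirname, or None if absent."""
    target = os.path.normcase(os.path.abspath(dirname))
    for entry in entries:
        # Normalise both sides so relative and absolute spellings compare equal.
        if os.path.normcase(os.path.abspath(entry.path)) == target:
            return entry.text_status
    return None
```

Usage would be directory_status(myclient.status(dirname), dirname).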

What is the expected behaviour of tarfile.add() when adding an archive to itself?

The question might sound strange because I know I am forcing a strange situation. It came up by accident (a bug, one might say) and I even know how to avoid it, so please skip that part.
I would really like to understand the behaviour I see.
The point of the function is to add all files with a given prefix in a directory to an archive. I noticed that despite the "bug", the program works correctly (sic!). I wanted to understand why.
The code is fairly simple, so I will post the whole function:
def pack(prefix, custom_meta_files = []):
    postfix = 'tgz'
    if prefix[-1] != '.':
        postfix = '.tgz'
    archive = tarfile.open(prefix+postfix, "w:gz")
    files = filter(lambda path: path.startswith(prefix), os.listdir())
    #print('files: {0}'.format(list(files)))
    for file in files:
        print('packing `{0}`'.format(file))
        archive_name = file[len(prefix):] #skip prefix + dot
        archive.add(file, archive_name)
    not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
    print('metas to add: {0}'.format(not_doubled_metas))
    for meta in not_doubled_metas:
        print('packing `{0}`'.format(meta))
        archive.add(meta)
    print('contents:{0}'.format(archive.getnames()))
As one can see, I create the archive with the prefix, and then I build the list of files to pack by listing everything in the cwd and filtering it via the lambda. Naturally the archive itself passes the filter. There is also a snippet that adds fixed files if the names do not overlap, although I don't think it is important here.
So the output from such run is e.g:
packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']
So the script tried adding the archive to itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all and the documentation does not mention anything. I read the parts about the methods that add members and searched for "itself" and "same name".
I would assume it is automatically skipped, but I don't know how to actually check it. I would personally expect a zero-length file to be added as a member, although I understand skipping, as it actually makes more sense.
Question: Is it the intended behaviour of tarfile.add() to skip adding the archive to itself? Where is this documented?
Scanning the tarfile.py source from 2.4 to 3.2, every version has code similar to this in add():
# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
    self._dbg(2, "tarfile: Skipped %r" % name)
    return
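One way to check this without reading the source is a small experiment in a temporary directory. The sketch below is not the code from the question: pack_directory and the file names are made up, and the archive is deliberately created inside the directory that is being packed.

```python
import os
import tarfile
import tempfile


def pack_directory(directory, archive_name="bundle.tgz"):
    """Add every entry of directory, including the archive itself, and list the members."""
    archive_path = os.path.join(directory, archive_name)
    with tarfile.open(archive_path, "w:gz") as archive:
        # The archive file already exists at this point, so listdir() returns it too.
        for entry in sorted(os.listdir(directory)):
            archive.add(os.path.join(directory, entry), arcname=entry)
        return archive.getnames()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("params", "stats"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("data\n")
    names = pack_directory(tmp)
    print(names)  # the archive's own name is not among the members
```

The skip is silent unless the archive is opened with a debug level of 2 or higher, which matches the absence of any warning in the question.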

Turn subversion path into walkable directory

I have a Subversion repo, e.g. "http://crsvn/trunk/foo". I want to walk this directory or, for starters, simply get a directory listing.
The idea is to create a script that will run mergeinfo on all the branches in "http://crsvn/branches/bar" and compare them to trunk to see if each branch has been merged.
So the first problem I have is that I cannot walk it or do
os.listdir('http://crsvn/branches/bar')
I get the error "the value label syntax is incorrect" (mentioning the URL).
You can use PySVN. In particular, the pysvn.Client.list method should do what you want:
import pysvn
svncl = pysvn.Client()
entries = svncl.list("http://rabbitvcs.googlecode.com/svn/trunk/")
# Gives you a list of directories:
dirs = (entry[0].repos_path for entry in entries if entry[0].kind == pysvn.node_kind.dir)
list(dirs)
No checkout needed. You could even specify a revision to work on, to ensure your script can ignore other people working on the repository while it runs.
listdir takes a path and not a URL. It would be nice if Python could be aware of the structure on a remote server, but I don't think that is the case.
If you were to check out your repository locally first, you could easily walk the directories using Python's functions.
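For an actual walk rather than a single listing, the listing call can be wrapped in a generator modelled on os.walk. This is a sketch only: walk_remote and the list_children callable are hypothetical, with list_children(url) assumed to return (name, is_dir) pairs for the immediate children of url, for example via a thin wrapper around pysvn.Client.list.

```python
def walk_remote(url, list_children):
    """Yield (url, dirnames, filenames) top-down, in the style of os.walk."""
    dirnames, filenames = [], []
    for name, is_dir in list_children(url):
        (dirnames if is_dir else filenames).append(name)
    yield url, dirnames, filenames
    # Recurse into each subdirectory after reporting the current level.
    for name in dirnames:
        yield from walk_remote(url.rstrip('/') + '/' + name, list_children)
```

Passing the listing function in as a parameter keeps the traversal independent of the Subversion client and makes it easy to try out against a fake tree.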
