GitPython: retrieve changed files between commits in a specific subdirectory - python

The repo structure looks like this:
- folder_a
- folder_b
- folder_c
- ...
I am particularly interested in the files that changed in a specific commit, but only the ones in folder_a. My solution is:
for filename, details in commit.stats.files.items():
    if not filename.startswith('folder_a'):
        continue
    # ...
but the performance is poor when the other folders contain a great number of files, since commit.stats is computed for the whole commit before any filtering happens. Is there a better way to skip the files I don't care about?

If I understand correctly: you want stats on the modifications from a commit, restricted to one specific subfolder.
Using plain git:
git show [commit] --stat folder_a
will display exactly what you want.
Have a look at what git.show('<commit identifier>', '--stat', 'folder_a') returns in your Python script.
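A minimal sketch of that idea with GitPython (the repository path and commit SHA below are placeholders):
from git import Repo

repo = Repo('/path/to/repo')
# Let git itself filter by path, so files outside folder_a are never processed:
stat_output = repo.git.show('abc1234', '--stat', '--', 'folder_a')
print(stat_output)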

Related

Include multiple RST files with Sphinx with directive .. include::

In my Sphinx project, I want to have a include folder with multiple RST files that I can reuse in other projects. My source folder looks something like:
\source
    \include
        links.rst  # Here I have useful external links
        roles.rst  # Here I define custom roles
        subs.rst   # Here I define common substitutions (replace directive)
    ... rest of my stuff
    conf.py
Basically, I want to be able to write a single .. include:: in my source RST files that accounts for all of these files, i.e. the equivalent of /include/*.rst.
I have come up with a neat solution that I post below, since it might be useful to someone else. However, it would be nice to hear other alternatives, since my solution comes with an infinite-loop problem when using sphinx-autobuild.
My solution consists of modifying conf.py to include this small piece of code:
conf.py
import os

# Code to generate include.rst
files = os.listdir('include')
with open('include.rst', 'w') as file:
    for rst in files:
        file.write('.. include:: /include/' + rst + '\n')
This will create a new include.rst file in the root source directory, which will look as:
\source
    \include
        links.rst  # Here I have useful external links
        roles.rst  # Here I define custom roles
        subs.rst   # Here I define common substitutions (replace directive)
    ... rest of my stuff
    conf.py
    include.rst
With the new file include.rst looking like:
.. include:: /include/links.rst
.. include:: /include/roles.rst
.. include:: /include/subs.rst
Finally, in my source files I only need to add the following line at the top:
.. include:: include.rst
to benefit from all my custom links, roles, and substitutions (or anything else you might want there).
PROBLEM:
My solution presents a problem. Since I use sphinx-autobuild to automatically build the HTML output whenever a change is detected, an infinite loop is produced: each build executes conf.py, which recreates include.rst, which in turn counts as a change. Any ideas on how to solve this?
UPDATE:
I have found the solution to the problem mentioned above, which was actually pretty obvious.
Now I execute sphinx-autobuild with the --re-ignore option:
> sphinx-autobuild source build/html --re-ignore include.rst
and the loop stops happening.
This is fine if I change the child rst files (i.e. roles, links, or subs), but if include.rst itself changes (e.g. a new child rst file was added) then I need to stop and re-run sphinx-autobuild.
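A possible refinement (just a sketch, not tested with sphinx-autobuild): have conf.py rewrite include.rst only when its content actually changes, so ordinary rebuilds do not retrigger the watcher:
import os

files = sorted(os.listdir('include'))
new_content = ''.join('.. include:: /include/' + rst + '\n' for rst in files)
old_content = None
if os.path.exists('include.rst'):
    with open('include.rst') as f:
        old_content = f.read()
# Only touch the file when the list of includes really changed:
if new_content != old_content:
    with open('include.rst', 'w') as f:
        f.write(new_content)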

Deploying 1 python package under 2 different names

I am trying to have 2 simultaneous versions of a single package on a server: a production one and a testing one.
I want these 2 to live in the same git repository on 2 different branches (the testing branch would merge into production), but I would love to keep them in the same directory so that no imports or paths need to change.
Is it possible to dynamically change the package name in setup.py, depending on the git branch?
Or is it possible to deploy them with different names using pip?
EDIT: I may have found a proper solution for my problem here: Git: ignore some files during a merge (keep some files restricted to one branch).
Gitattributes can be set up to ignore merging of my setup.py; I'll close this question after I test it.
This could be done with a setup script that looks like this:
#!/usr/bin/env python3
import pathlib
import setuptools

def _get_git_ref():
    # Read the symbolic ref (e.g. 'refs/heads/master') from .git/HEAD
    ref = None
    git_head_path = pathlib.Path(__file__).parent.joinpath('.git', 'HEAD')
    with git_head_path.open('r') as git_head:
        ref = git_head.readline().split()[-1]
    return ref

def _get_project_name():
    # Map branch refs to distribution names
    name_map = {
        'refs/heads/master': 'ThingProd',
        'refs/heads/develop': 'ThingTest',
    }
    git_ref = _get_git_ref()
    name = name_map.get(git_ref, 'ThingUnknown')
    return name

setuptools.setup(
    # see 'setup.cfg'
    name=_get_project_name(),
)
It reads the current git ref directly from the .git/HEAD file and looks up the corresponding name in a table.
Inspired by: https://stackoverflow.com/a/56245722/11138259.
Using a .gitattributes file with the content "setup.py merge=ours", together with git config --global merge.ours.driver true, makes the merge keep our version of setup.py instead of merging it. This only seems to work if both master and the child branch have changed the file since they first parted ways.

Run a python script recursively indicating input/output paths/names

I have a directory containing multiple subdirectories, all of which contain a file named sample.fas. I want to run a python script (script.py) on each sample.fas and export the output(s) named after the respective subdirectory.
However, the script requires the user to indicate the path/name of the input, and it does not create the outputs automatically (it is necessary to specify their path/name), like this:
script.py sample_1.fas output_1a.nex output_1b.fas
I tried using these lines, without success:
while find . -name '*.fas'; # find the *fas files
do python script.py $*.fas > /path/output_1a output_1b; # run the script and export the two outputs
done
So, I want to create a bash that read each sample.fas from all subdirectories (run the script recursively), and export the outputs with the names of their subdirectories.
I would appreciate any help.
One quick way of doing this would be something like:
for x in $(find . -type f -name '*.fas'); do
    /usr/bin/python /my/full/path/to/script.py ${x} > /my/path/$(basename $(dirname ${x}))
done
This runs the script against all .fas files found in the current directory (subdirectories included) and redirects whatever the python script outputs to a file named after the directory in which the currently processed .fas file is located. That file is created in /my/path/.
There are a few assumptions here: all the directories which contain .fas files have unique names, the paths contain no spaces (this can be fixed with proper quoting), and the script always outputs valid data (this simply redirects all of the script's output to that file). Hopefully this gets you going in the right direction; a Python alternative that sidesteps the quoting issue is sketched below.
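A rough Python equivalent (the output naming scheme is hypothetical; it assumes script.py takes the input file plus two output paths, as shown in the question):
import pathlib
import subprocess

for fas in pathlib.Path('.').rglob('*.fas'):
    subdir = fas.parent.name  # name of the containing subdirectory
    subprocess.run([
        'python', '/my/full/path/to/script.py',
        str(fas),
        '/my/path/' + subdir + '.nex',  # hypothetical output names
        '/my/path/' + subdir + '.fas',
    ], check=True)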
But I get the feeling that I didn't properly understand your question. If that is the case, could you rephrase it and maybe provide a tree showing how the directories and subdirectories are structured?

Get changed files using gitpython

I want to get a list of the changed files of the current git repo: the files that are normally listed under Changes not staged for commit: when calling git status.
So far I have managed to connect to the repository, pull it, and show all untracked files:
from git import Repo

repo = Repo(pk_repo_path)
origin = repo.remotes.origin
origin.pull()
print(repo.untracked_files)
But now I want to show all files that have uncommitted changes. Can anybody push me in the right direction? I looked at the names of the methods of repo and experimented for a while, but I can't find the correct solution.
Obviously I could call repo.git.status and parse the files, but that isn't elegant at all. There must be something better.
Edit: Now that I think about it, a function that tells me the status of a single file would be more useful, like:
print(repo.get_status(path_to_file))
>> untracked
print(repo.get_status(path_to_another_file))
>> not staged
for item in repo.index.diff(None):
    print(item.a_path)
or to get just the list:
changedFiles = [item.a_path for item in repo.index.diff(None)]
repo.index.diff() returns git.diff.Diffable, described in http://gitpython.readthedocs.io/en/stable/reference.html#module-git.diff
So the function can look like this:
def get_status(repo, path):
    changed = [item.a_path for item in repo.index.diff(None)]
    if path in repo.untracked_files:
        return 'untracked'
    elif path in changed:
        return 'modified'
    else:
        return "don't care"
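Hypothetical usage, assuming repo was created as in the question:
print(get_status(repo, 'path/to/file'))          # e.g. 'untracked'
print(get_status(repo, 'path/to/another_file'))  # e.g. 'modified'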
To follow up on @ciasto piekarz's question: depending on what you want to show,
repo.index.diff(None)
lists only files that have not been staged, while
repo.index.diff('HEAD')
lists only files that have been staged.
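A quick illustration (assuming the same repo object as above):
unstaged = [d.a_path for d in repo.index.diff(None)]   # working tree vs. index
staged = [d.a_path for d in repo.index.diff('HEAD')]   # index vs. HEAD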

Turn subversion path into walkable directory

I have a subversion repo, i.e. "http://crsvn/trunk/foo" ... I want to walk this directory or, for starters, simply get a directory listing.
The idea is to create a script that will do mergeinfo on all the branches in "http://crsvn/branches/bar" and compare them to trunk to see whether each branch has been merged.
So the first problem I have is that I cannot walk or do
os.listdir('http://crsvn/branches/bar')
I get "the value label syntax is incorrect" (mentioning the URL).
You can use PySVN. In particular, the pysvn.Client.list method should do what you want:
import pysvn
svncl = pysvn.Client()
entries = svncl.list("http://rabbitvcs.googlecode.com/svn/trunk/")
# Gives you a list of directories:
dirs = (entry[0].repos_path for entry in entries if entry[0].kind == pysvn.node_kind.dir)
list(dirs)
No checkout needed. You could even specify a revision to work on, to ensure your script can ignore other people working on the repository while it runs.
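For example, a sketch of pinning the listing to a fixed revision (the revision number is a placeholder):
rev = pysvn.Revision(pysvn.opt_revision_kind.number, 1234)
entries = svncl.list("http://rabbitvcs.googlecode.com/svn/trunk/", revision=rev)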
listdir takes a path and not a URL. It would be nice if Python could be aware of the structure on a remote server, but I don't think that is the case.
If you were to check out your repository locally first, you could easily walk the directories using Python's functions.
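For instance, a sketch assuming the repository has been checked out to a hypothetical /tmp/checkout:
import os

for dirpath, dirnames, filenames in os.walk('/tmp/checkout'):
    dirnames[:] = [d for d in dirnames if d != '.svn']  # skip svn metadata
    for name in filenames:
        print(os.path.join(dirpath, name))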
