I am trying to get the content of every revision in the history of a file in my local repo. I am using the GitPython library, and here is the code:
import git
import json
from pprint import pprint

repo = git.Repo()
path = "my_file_path"

revlist = (
    (commit, (commit.tree / path).data_stream.read())
    for commit in repo.iter_commits(paths=path)
)
for commit, filecontents in revlist:
    filecontentsjs = json.loads(filecontents)
    pprint(commit)
    pprint(filecontentsjs["execution_status"])
    pprint(filecontentsjs["execution_end_time"])
Problem: I am comparing our Bitbucket history against the history I get from this script, and the script comes up short: the Bitbucket history has more revisions of the file, but when I clone the repo locally, the script finds roughly half of them.
Am I missing something here? Is there a limitation or anything like that?
So it turns out it was my fault: I hadn't noticed that the filename had changed slightly, and Bitbucket did what it is supposed to do, since it assumed "if the code is the same, the file is the same", which is not true.
By adding the --follow flag to git log I was seeing the full "bad" history.
The real "good" history is the one without --follow, since I care only about the file with its current name.
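The mismatch is easy to reproduce in a throwaway repo with one rename: git log on the current name stops at the rename (which is what iter_commits(paths=...) gives you), while --follow crosses it (which is what Bitbucket was showing). A self-contained sketch calling plain git via subprocess; the file names and commit messages are made up for the demo, and in GitPython the same --follow output should be reachable through the repo.git.log(...) command wrapper:

```python
import json, os, subprocess, tempfile

def run(*args, cwd):
    return subprocess.check_output(args, cwd=cwd).decode().strip()

# Build a throwaway repo containing one rename (names are made up).
tmp = tempfile.mkdtemp()
run("git", "init", "-q", cwd=tmp)
run("git", "config", "user.email", "demo@example.com", cwd=tmp)
run("git", "config", "user.name", "demo", cwd=tmp)
run("git", "config", "commit.gpgsign", "false", cwd=tmp)

with open(os.path.join(tmp, "data.json"), "w") as f:
    json.dump({"execution_status": "ok"}, f)
run("git", "add", "data.json", cwd=tmp)
run("git", "commit", "-q", "-m", "add data.json", cwd=tmp)

run("git", "mv", "data.json", "data_v2.json", cwd=tmp)
run("git", "commit", "-q", "-m", "rename to data_v2.json", cwd=tmp)

with open(os.path.join(tmp, "data_v2.json"), "w") as f:
    json.dump({"execution_status": "done"}, f)
run("git", "add", "data_v2.json", cwd=tmp)
run("git", "commit", "-q", "-m", "update data_v2.json", cwd=tmp)

# Without --follow: history of the current name only (the "good" history).
plain = run("git", "log", "--format=%H", "--", "data_v2.json", cwd=tmp).splitlines()
# With --follow: history across the rename (the fuller history Bitbucket shows).
followed = run("git", "log", "--follow", "--format=%H", "--",
               "data_v2.json", cwd=tmp).splitlines()
print(len(plain), len(followed))
```

The commit before the rename only appears in the --follow listing, which is exactly the gap between the script's output and the Bitbucket view.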
I noticed I could not find Newport, Oregon in my django_cities_light Django application. It is a small city with a population slightly above 10k, so I downloaded cities1000.zip, which contains cities with a population higher than 1k. I unzipped the file and searched for Newport's id, and indeed it is there:
5742750 Newport Newport N'juport,Newport (Oregon),Njuport,ONP,nyupoteu,nyupoto,nywbwrt,nywpwrt awrgn,Њупорт,Ньюпорт,Нюпорт,نيوبورت,نیوپورت، اورگن,نیوپورٹ، اوریگون,ニューポート,뉴포트 44.63678 -124.05345 P PPLA2 US OR ...
Now, I have in my myapp/settings/development.py the following:
CITIES_LIGHT_TRANSLATION_LANGUAGES = ['en']
CITIES_LIGHT_INCLUDE_CITY_TYPES = ['PPL', 'PPLA', 'PPLA2', 'PPLA3', 'PPLA4', 'PPLC', 'PPLF', 'PPLG', 'PPLL', 'PPLR', 'PPLS', 'STLMT',]
CITIES_LIGHT_APP_NAME = 'jobs'
CITIES_LIGHT_CITY_SOURCES = ['http://download.geonames.org/export/dump/cities1000.zip'] # <-- this added as part of this task
I added CITIES_LIGHT_CITY_SOURCES following this post and the information here.
I then tried to import using the following command, which as I understand downloads the cities1000 file specified in myapp.settings.development:
python manage.py cities_light --settings=myapp.settings.development --force-all --progress
Newport, Oregon, with id 5742750, is not found in my database. I also cannot tell from the command's output whether my settings file is being used and whether the value of CITIES_LIGHT_CITY_SOURCES is overriding the default properly.
Does anyone know what I'm doing wrong and how to properly import from these source files? Thx!
EDIT: I added DJANGO_SETTINGS_MODULE explicitly as an env var, went into the cities_light directory in my virtual environment, and added a print that checks the value of DJANGO_SETTINGS_MODULE. It points to my settings file. I also added a print of the value of CITIES_LIGHT_CITY_SOURCES, and it is correct. I also went into cities_light/data and saw that both cities1000.zip and cities1000.txt were there. I deleted them, ran the command again, and they were downloaded again. Still no success in having Newport, Oregon in my database.
EDIT 2: It seems that either I was doing something wrong (likely) or there is a bug in cities_light (less likely). If I start with a database previously populated, let's say with cities15000.zip, and then try to populate it as described above, it won't work. If I start from a completely empty database and then run the command as I mentioned, it works. One thing I did notice: I made a script to manually insert the cities from cities1000.txt. The insert for Newport worked successfully, so I then went to check the database, and now ALL the cities were there. I am not sure how this happened; maybe they had been there for a while and I just missed it.
EDIT 3: It is important not to confuse id and geoname_id.
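That last point is worth spelling out: in the GeoNames dump the first tab-separated field is the geoname_id, while the Django id is just the auto-increment primary key assigned at import time, so the two usually differ. A small sketch parsing a dump row (the row below is reconstructed and truncated from the paste above; the real file is tab-separated with more fields):

```python
# A cities1000.txt row is tab-separated; field 0 is the GeoNames id,
# NOT the Django primary key that cities_light assigns on import.
row = "\t".join([
    "5742750",                                 # geonameid
    "Newport",                                 # name
    "Newport",                                 # asciiname
    "N'juport,Newport (Oregon),Njuport,ONP",   # alternatenames (truncated)
    "44.63678",                                # latitude
    "-124.05345",                              # longitude
    "P",                                       # feature class
    "PPLA2",                                   # feature code
])

fields = row.split("\t")
geoname_id = int(fields[0])
print(geoname_id, fields[7])
```

With the ORM, the lookup would therefore be along the lines of City.objects.get(geoname_id=5742750) rather than filtering on id (assuming the geoname_id field name used by the cities_light models).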
I am trying to replicate another researcher's findings using the Python file he added as a supplement to his paper. It is the first time I am diving into Python, so the error might be extremely simple to fix, yet after two days I still haven't fixed it. For context, the Readme file has the following instruction:
"To run the script, make sure Python2 is installed. Put all files into one folder designated as “cf_dir”."
In the script I get an error at the following lines:
if __name__ == '__main__':
    cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
    os.chdir(cf_dir)
    cf = pd.read_csv(cf_file)
    cf_phys = pd.read_csv(cf_phys_file)
ValueError: need more than 0 values to unpack
The "cf_file" and "cf_phys_file" are two of the main files in the folder named "cf_dir". The "cf_phys_file" relates only to two survey questions (Q22 and Q23), and the "cf_file" includes all the other questions, 1-21. Now, it seems that the code is meant to retrieve those two files from the directory? Only for the "cf_phys_file" are the columns 1:4 needed. The current working directory is already set to the right location.
The path where I located "cf_dir" is as follows:
C:\Users\Marc-Marijn Ossel\Documents\RSM\Thesis\Data\Suitable for ML\Data en Artikelen\Per task Suitability for Machine Learning score readme\cf_dir
Alternative option in the readme file:
In the readme file there's this option, but here too I cannot understand how to point the command at the right location:
"Run the following command in an open terminal (substituting for file names
below): python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile
jobDataFile OESfile
This should generate the data and plots as necessary."
When I run that in Command Prompt, I get the following error, and I am not sure how to set the working directory correctly:
- python: can't open file 'cfProcessor_AEAPnP.py': [Errno 2] No such file or directory
Thanks for the reading, and I hope there's someone who could help me!
Best regards & stay safe out there during Corona!!
Marc
cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
means that the Python file expects a few command-line arguments when it is called.
In order to run
python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile jobDataFile OESfile
the command prompt should be opened in the folder that contains the script.
So, open Command Prompt and type
cd path_to_the_folder_where_ur_python_file_is_located
Now you are in the directory that contains the Python file.
Also, make sure you give the arguments as full paths in double quotes.
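The unpacking error at the top of the question ("need more than 0 values to unpack") happens because sys.argv[1:4] is an empty slice when the script is run with no arguments. A minimal sketch of guarding the unpack with a usage message; the helper name parse_args is mine, not from the paper's script:

```python
import sys

def parse_args(argv):
    # argv[0] is the script name; the script expects at least three more values.
    if len(argv) < 4:
        raise SystemExit(
            "usage: python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file ..."
        )
    cf_dir, cf_file, cf_phys_file = argv[1:4]
    return cf_dir, cf_file, cf_phys_file

# Simulating the call the readme describes (file names are placeholders):
print(parse_args(["cfProcessor_AEAPnP.py", "cf_dir", "cf.csv", "cf_phys.csv"]))
```

In the real script you would call parse_args(sys.argv); running it with too few arguments then prints the usage line instead of the bare ValueError.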
I download files from SVN; the files are quite big (~300 MB).
I use the script many times, and if the file already exists I don't want to download it again. For that I need to validate that the file I have is the same as the one on SVN (connectivity issues sometimes kill my script, so the download doesn't finish).
Is there any way to validate before downloading? (e.g. get the checksum of the SVN file, which I couldn't find how to do)
I'm using this line for the download:
fn,download_exit_code=urllib.urlretrieve(url_path,ver_name)
Thanks ahead...
You could check the SVN revision number. Here's some code which works for me:
import subprocess

svn_url = ...

# Revision of the local working copy:
version_string = subprocess.check_output("svnversion")
parts = version_string.split(":")
svn_version = int(''.join(s for s in parts[-1] if s.isdigit()))

# Latest revision on the server:
repository_info = subprocess.check_output(["svn", "info", svn_url])
for line in repository_info.split("\n"):
    if line.lower().startswith("last changed rev"):
        latest_svn_version = int(''.join(s for s in line if s.isdigit()))

if svn_version < latest_svn_version:
    # Download the new version
This does require you to use SVN to download your file, though (through an svn up).
If you have access to svn, you could fetch the file once with svn checkout and then run svn update. svn update will update the file only if it has changed and otherwise do nothing.
import os
import subprocess

os.chdir('/path/to/file')
subprocess.check_call(['svn', 'update'])
We have multiple branches in SVN and use Hudson CI jobs to maintain our builds. We use the SVN revision number as part of our application version number. The issue is that when a Hudson job checks out the HEAD of a branch, it gets the repository's HEAD revision number, not the last committed revision of that branch. I know SVN maintains revision numbers globally, but we want our version to reflect the last committed revision of the particular branch.
Is there a way to get the last committed revision number of a branch using a Python script, so that I can check out the branch at that revision?
Or better, is there a way to do it in Hudson itself?
Thanks.
Getting the last committed revision of a path using Python:
from subprocess import check_output as run  # Python >= 2.7

path = './'
cmd = ['svn', '--username', 'XXXX', '--password', 'XXXX',
       '--non-interactive', 'info', path]
out = run(cmd).splitlines()
out = (i.split(':', 1) for i in out if i)
info = {k: v.strip() for k, v in out}

# you can access the other svn info fields in a similar manner
rev = info['Last Changed Rev']

with open('.last-svn-commit', 'w') as fh:
    fh.write(rev)
I don't think the Subversion SCM plugin can give you the information you need (it exports only SVN_URL and SVN_REVISION). Keep in mind that there is no difference between checking out the 'Last Changed Rev' and the HEAD revision: they both refer to the same content in your branch.
You might want to consider using a separate job for every branch you have. This way, the commit that triggers a build will be the 'Last Changed Rev' (unless you trigger it yourself). You can do this manually by cloning the trunk job and changing the repository URL, or you could use a tool like jenkins-autojobs to do it automatically.
Besides svn info, you can also use svn log -q -l 1 URL or svn ls -v --depth empty URL.
EDIT: This question duplicates How to access the current Subversion build number? (Thanks for the heads up, Charles!)
Hi there,
This question is similar to Getting the subversion repository number into code
The differences being:
I would like to add the revision number to Python
I want the revision of the repository (not the checked out file)
I.e., I would like to extract the revision number from the output of 'svn info', like so:
$ svn info
Path: .
URL: svn://localhost/B/trunk
Repository Root: svn://localhost/B
Revision: 375
Node Kind: directory
Schedule: normal
Last Changed Author: bmh
Last Changed Rev: 375
Last Changed Date: 2008-10-27 12:09:00 -0400 (Mon, 27 Oct 2008)
I want a variable containing 375 (the revision). It's easy enough to put $Rev$ into a variable to keep track of changes to a single file. However, I would like to keep track of the repository's version, and I understand (and it seems so, based on my tests) that $Rev$ only updates when the file itself changes.
My initial thoughts turn to using the svn/libsvn modules built into Python, though I can't find any documentation on or examples of how to use them.
Alternatively, I've thought of calling 'svn info' and regexing the revision out, though that seems rather brutal. :)
Help would be most appreciated.
Thanks & Cheers.
There is a command called svnversion which comes with Subversion and is meant to solve exactly this kind of problem.
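One caveat: svnversion's output is not always a bare number. Mixed-revision working copies print a range like 375:382, and modified or switched copies append letters such as M or S. A small sketch of parsing it without needing svn installed; the helper name and sample strings are mine:

```python
import re

def parse_svnversion(s):
    """Extract the highest revision from svnversion output.

    Handles plain ("375"), mixed ("375:382"), and modified ("382M")
    working copies; returns None for unversioned directories.
    """
    match = re.search(r"(\d+)[MSP]*$", s.strip())
    if not match:
        return None  # e.g. "Unversioned directory"
    return int(match.group(1))

for sample in ["375", "375:382M", "Unversioned directory"]:
    print(sample, "->", parse_svnversion(sample))
```

In practice you would feed it the output of subprocess.check_output(["svnversion"]) from your working copy.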
Stolen directly from django:
def get_svn_revision(path=None):
    rev = None
    if path is None:
        path = MODULE.__path__[0]
    entries_path = '%s/.svn/entries' % path
    if os.path.exists(entries_path):
        entries = open(entries_path, 'r').read()
        # Versions >= 7 of the entries file are flat text. The first line is
        # the version number. The next set of digits after 'dir' is the revision.
        if re.match('(\d+)', entries):
            rev_match = re.search('\d+\s+dir\s+(\d+)', entries)
            if rev_match:
                rev = rev_match.groups()[0]
        # Older XML versions of the file specify revision as an attribute of
        # the first entries node.
        else:
            from xml.dom import minidom
            dom = minidom.parse(entries_path)
            rev = dom.getElementsByTagName('entry')[0].getAttribute('revision')
    if rev:
        return u'SVN-%s' % rev
    return u'SVN-unknown'
Adapt as appropriate. You might want to change MODULE to the name of one of your code modules.
This code has the advantage of working even if the destination system does not have subversion installed.
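To see what the flat-format branch of that function extracts, here is the same regex applied to a minimal made-up entries file (format version 10; the blank second line is the root entry's empty name, followed by its kind and revision):

```python
import re

# Minimal flat-format .svn/entries sample; fields beyond the revision
# (URLs etc.) are illustrative and truncated.
entries = "10\n\ndir\n375\nsvn://localhost/B/trunk\nsvn://localhost/B\n"

rev = None
if re.match(r"(\d+)", entries):
    rev_match = re.search(r"\d+\s+dir\s+(\d+)", entries)
    if rev_match:
        rev = rev_match.groups()[0]
print("SVN-%s" % rev)
```

The first \d+ consumes the format version, and the group after 'dir' is the working-copy revision, matching the 375 from the svn info output above.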
Python has direct bindings to libsvn, so you don't need to invoke the command line client at all. See this blog post for more details.
EDIT: You can basically do something like this:
from svn import fs, repos, core
repository = repos.open(root_path)
fs_ptr = repos.fs(repository)
youngest_revision_number = fs.youngest_rev(fs_ptr)
I use a technique very similar to this in order to show the current subversion revision number in my shell:
svnRev=$(echo "$(svn info)" | grep "^Revision" | awk -F": " '{print $2};')
echo $svnRev
It works very well for me.
Why do you want the Python files to change every time the version number of the entire repository is incremented? This will make things like diffing two files annoying if one is from the repo and the other is from a tarball.
If you want a variable in one source file that can be set to the current working copy revision, and that does not rely on Subversion and a working copy actually being available when you run your program, then SubWCRev may be your solution.
There also seems to be a Linux port called SVNWCRev.
Both substitute $WCREV$ with the highest commit revision of the working copy. Other information may also be provided.
Based on CesarB's response and the link Charles provided, I've done the following:
try:
    from subprocess import Popen, PIPE
    _p = Popen(["svnversion", "."], stdout=PIPE)
    REVISION = _p.communicate()[0]
    _p = None  # otherwise we get a wild exception when Django auto-reloads
except Exception, e:
    print "Could not get revision number: ", e
    REVISION = "Unknown"
Golly Python is cool. :)