How would I go about getting the full commit hash in Dulwich? - python

I would like to get the behavior of git show -s --format=%H in Dulwich, i.e. the full commit hash pointed to by HEAD. However, as it turns out, the porcelain.show() function behaves pretty much like git show but doesn't seem to accept any of the additional options that the Git CLI does.
I am not surprised, given porcelain.describe() behaves similarly. But what alternative means do I have with Dulwich to see the full commit hash of HEAD?
For the abbreviated - albeit hardcoded to 7 characters (!) - hash I can use the aforementioned porcelain.describe().

Consulting the code for porcelain.describe() lets us pull the pieces together:
open_repo_closing offers a nice context manager for the dulwich.repo.BaseRepo class, with contextlib.closing behavior.
BaseRepo.head() returns the information as bytes.
A minimal implementation could look like this:
def get_latest_hash(repo):
    from dulwich.porcelain import open_repo_closing
    with open_repo_closing(repo) as r:
        return r.head().decode("ascii")
Simpler than I initially expected.

Related

Equivalent to python's -R option that affects the hash of ints

We have a large collection of python code that takes some input and produces some output.
We would like to guarantee that, given the identical input, we produce identical output regardless of python version or local environment. (e.g. whether the code is run on Windows, Mac, or Linux, in 32-bit or 64-bit)
We have been enforcing this in an automated test suite by running our program both with and without the -R option to python and comparing the output, assuming that would shake out any spots where our output accidentally wound up dependent on iteration over a dict (the most common source of non-determinism in our code).
However, as we recently adjusted our code to also support python 3, we discovered a place where our output depended in part on iteration over a dict that used ints as keys. This iteration order changed in python3 as compared to python2, and was making our output different. Our existing tests (all on python 2.7) didn't notice this, because -R doesn't affect the hash of ints. Once found, it was easy to fix, but we would like to have found it earlier.
Is there any way to further stress-test our code and give us confidence that we've ferreted out all places where we end up implicitly depending on something that will possibly be different across python versions/environments? I think that something like -R or PYTHONHASHSEED that applied to numbers as well as to str, bytes, and datetime objects could work, but I'm open to other approaches. I would however like our automated test machine to need only a single python version installed, if possible.
Another acceptable alternative would be some way to run our code with pypy tweaked so as to use a different order when iterating items out of a dict; I think our code runs on pypy, though it's not something we've ever explicitly supported. However, if some pypy expert gives us a way to tweak dictionary iteration order on different runs, it's something we'll work towards.
Using PyPy is not the best choice here, given that it always retains the insertion order in its dicts (using a method that also makes dicts use less memory). We could of course make it change the order in which dicts are enumerated, but that defeats the point.
Instead, I'd suggest hacking the CPython source code to change the way the hash is used inside dictobject.c. For example, after each hash = PyObject_Hash(key); if (hash == -1) { ..error.. }; you could add hash ^= HASH_TWEAK; and compile different versions of CPython with different values for HASH_TWEAK. (I did such a thing at one point, but I can't find it any more. You need to be a bit careful about where the hash values are the original ones and where they are the modified ones.)
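The premise from the question, that hash randomization changes str hashes but leaves int hashes alone, can be checked with a single installed interpreter. The following is a minimal sketch, not part of the original answer: it assumes CPython, and the helper name hashes_under_seed is made up for illustration. It starts two fresh interpreters with different PYTHONHASHSEED values and compares what they report.

```python
import os
import subprocess
import sys


def hashes_under_seed(seed):
    """Return (hash(12345), hash('abc')) as computed by a fresh interpreter.

    hashes_under_seed is a hypothetical helper name used for this sketch.
    """
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(12345), hash('abc'))"],
        env=env,
    )
    int_hash, str_hash = out.decode("ascii").split()
    return int(int_hash), int(str_hash)


first = hashes_under_seed(1)
second = hashes_under_seed(2)

print("int hash unchanged across seeds:", first[0] == second[0])
print("str hash unchanged across seeds:", first[1] == second[1])
```

Because the int hash is the same under both seeds, varying the seed alone cannot reorder a dict or set keyed by ints, which is why the tests in the question missed the problem.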

Execution Code Tracking - How to know which code has been executed in project?

Let's say that I have an open source project from which I would like to borrow some functionality. Can I get some sort of report generated during execution of and/or interaction with this project?
The report should contain e.g.:
which functions have been called,
in which order,
which classes have been instantiated etc.
Would be nice to have some graphic output for that... you know, if else tree and highlighted the executed branch etc.
I am mostly interested in python and C (perl would be fine too) but if there is any universal tool that cover multiple languages (or one tool per language) for that, it would be very nice.
PS: I am familiar with debuggers but I do not want to step through every single line of code and check if this is the correct instruction. I'm assuming that if functions/methods/classes etc. are properly named then one can get some hints about where to find the desired piece of code. But naming alone is not enough, because you do not know (from a brief overview of the code) whether a promising-looking function foo() requires some data that was generated by an obscure function bar() etc. For that reason I am looking for something that can visualize relations between running code.
PS: Do not know if this is question for SO or programmers.stackexchange. Feel free to move if you wish. PS: I've noticed that tags that I've used are not recommended but execution flow tracking is the best phrase to describe this process
Check out Ned Batchelder's coverage and perhaps pycallgraph, which is built on graphviz/dot. They may not be exactly what you need, and they are Python-only, but they are in the ballpark.
Pycallgraph is actually likelier to be of interest because it shows the execution path, not just what codelines got executed. It only renders to PDF normally, but it wasn't too difficult to get it to do SVG instead (dot/graphviz supports svg and other formats, pycallgraph was hardcoding pdf rendering).
Neither will do exactly what you want but they are a start.
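For the "which functions have been called, in which order" part of the question, a rough idea of what such tools record can be had from the standard library alone. This is a minimal sketch under my own assumptions, not something taken from coverage or pycallgraph: trace_calls, foo and bar are hypothetical names, and only calls to Python-level functions are logged.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called, in order)."""
    calls = []

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; C functions
        # produce "c_call" events, which this sketch ignores.
        if event == "call":
            calls.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return result, calls


# Hypothetical functions standing in for the project being explored.
def bar():
    return 21


def foo():
    return bar() * 2


result, calls = trace_calls(foo)
print(result, calls)  # 42 ['foo', 'bar']
```

The recorded list only gives names in call order; a call graph like the one pycallgraph draws would additionally need the caller of each frame (available as frame.f_back).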

Maintaining two versions of an ipython notebook

I often need to create two versions of an ipython notebook: One contains tasks to be carried out (usually including some python code and output), the other contains the same text plus solutions. Let's call them the assignment and the solution.
It is easy to generate the solution document first, then strip the answers to generate the assignment (or vice versa). But if I subsequently need to make changes (and I always do), I need to repeat the stripping process. Is there a reasonable workflow that will allow changes in the assignment to be propagated to the solutions document?
Partial self-answer: I have experimented with leveraging mercurial's hg copy, which will let two files with different names share history. But I can only get this to work if assignment and solution are in different directories, in two linked hg repositories. I would much prefer a simpler set-up. I've also noticed that diff gets very confused when one JSON file has more sections than another, making a VCS-based solution even less attractive. (To be clear: Ordinary use of a VCS with notebooks is fine; it's the parallel versions that stumble).
This question covers similar ground, but does not solve my problem. In fact an answer to my question would solve the OP's second remaining problem, "pulling changes" (see the Update section).
It sounds like you are maintaining an assignment and an answer key of some kind and want to be able to distribute the assignments (without solutions) to students, and still have the answers for yourself or a TA.
For something like this, I would create two branches "unsolved" and "solved". First write the questions on the "unsolved" branch. Then create the "solved" branch from there and add the solutions. If you ever need to update a question, update back to the "unsolved" branch, make the update and merge the change into "solved" and fix the solution.
You could try going the other way, but my hunch is that going "backwards" from solved to unsolved might be strange to maintain.
After some experimentation I concluded that it is best to tackle this by processing the notebook's JSON code. Version control systems are not the right approach, for the following reasons:
JSON doesn't diff very well when adding or deleting cells. A minimal change leads to mis-matched braces and a very messy diff.
In my use case, the superset version of the file (containing both the assignments and their solutions) must be the source document. This is because the assignment includes example code and output that depends on earlier parts, to be written by the students. This model does not play well with version control, as pointed out by @ChrisPhillips in his answer.
I ended up filtering the JSON structure for the notebook and stripping out the solution cells; they may be recognized via special metadata (which can be set interactively using the metadata button in the interface), or by pattern-matching on the cell contents. The following snippet shows how to filter out cells whose first line starts with # SOLUTION:
import json
import re
def stripcell(cell, pattern):
    """Check if the first line of the cell's content matches `pattern`"""
    # In this worksheet-based notebook format, code cells keep their text under "input".
    if cell["cell_type"] == "code":
        content = cell["input"]
    else:
        content = cell["source"]
    return len(content) > 0 and re.search(pattern, content[0])
pattern = r"^# SOLUTION:"
with open("input.ipynb") as infile:
    struct = json.load(infile)
cells = struct["worksheets"][0]["cells"]
struct["worksheets"][0]["cells"] = [c for c in cells if not stripcell(c, pattern)]
# Open in text mode: json.dump writes str, so "wb" fails on Python 3.
with open("output.ipynb", "w") as outfile:
    json.dump(struct, outfile, indent=1)
I used the generic json library rather than the notebook API. If there's a better way to go about it, please let me know.
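The other option mentioned above, recognizing solution cells via special metadata, might look roughly like the following. This is only a sketch under assumptions: it targets the newer notebook JSON layout (nbformat 4, where cells sit in a top-level "cells" list rather than under "worksheets"), and both the metadata key "solution" and the function name strip_solutions are hypothetical.

```python
import json


def strip_solutions(notebook, key="solution"):
    """Return a copy of the notebook dict without cells flagged as solutions.

    `key` is a hypothetical metadata flag set by the notebook author.
    """
    stripped = dict(notebook)
    stripped["cells"] = [
        cell for cell in notebook.get("cells", [])
        if not cell.get("metadata", {}).get(key, False)
    ]
    return stripped


# A tiny in-memory notebook standing in for json.load() on a real .ipynb file.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Task 1"]},
        {"cell_type": "code", "metadata": {"solution": True},
         "source": ["print(42)"], "outputs": [], "execution_count": None},
    ],
}

assignment = strip_solutions(notebook)
print(json.dumps(assignment, indent=1))
```

The input dict is left untouched, so the same loaded notebook can be written out once as the solution and once as the assignment.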

How do I compare two nested data structures for unittesting?

For those who know perl, I'm looking for something similar to Test::Deep::is_deeply() in Python.
In Python's unittest I can conveniently compare nested data structures already, if I expect them to be equal:
self.assertEqual(list(os.walk('some_path')),
                 list(my.walk('some_path')),
                 "compare os.walk with my own implementation")
However, in the wanted test, the order of files in the respective sublist of the os.walk tuple shall be of no concern.
If it was just this one test it would be ok to code an easy solution. But I envision several tests on differently structured nested data. And I am hoping for a general solution.
I checked Python's own unittest documentation, looked at pyUnit, and at nose and its plugins. Active maintenance would also be an important aspect for usage.
The ultimate goal for me would be to have a set of descriptive types like UnorderedIterable, SubsetOf, SupersetOf, etc which can be called to describe a nested data structure, and then use that description to compare two actual sets of data.
In the os.walk example I'd like something like:
comparison = OrderedIterable(
    OrderedIterable(
        str,
        UnorderedIterable(),
        UnorderedIterable()
    )
)
The above describes the kind of data structure that list(os.walk()) would return. For comparison of data A and data B in a unit test, the current path names would be cast into a str(), and the dir and file lists would be compared ignoring the order with:
self.assertDeep(A, B, comparison, msg)
Is there anything out there? Or is it such a trivial task that people write their own? I feel comfortable doing it, but I don't want to reinvent, and especially would not want to code the full orthogonal set of types, tests for those, etc. In short, I wouldn't publish it and thus the next one has to rewrite again...
Python Deep seems to be a project to reimplement perl's Test::Deep. It is written by the author of Test::Deep himself. Last development happened in early 2016.
Update (2018/Aug): The latest release (2016/Feb) is located on PyPI/Deep.
I have done some P3k porting work on GitHub.
Not a solution, but the currently implemented workaround to solve the particular example listed in the question:
os_walk = list(os.walk('some_path'))
dt_walk = list(my.walk('some_path'))
self.assertEqual(len(dt_walk), len(os_walk), "walk() same length")
for ((osw, osw_dirs, osw_files), (dt, dt_dirs, dt_files)) in zip(os_walk, dt_walk):
    self.assertEqual(dt, osw, "walk() currentdir")
    self.assertSameElements(dt_dirs, osw_dirs, "walk() dirlist")
    self.assertSameElements(dt_files, osw_files, "walk() fileList")
As this example implementation shows, that's quite a bit of code. It also shows that Python's unittest has most of the ingredients required.
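On current Python 3 the same workaround can be written with assertCountEqual, since assertSameElements is no longer available there. The sketch below is an assumption-laden illustration rather than the original author's code: the class name WalkComparison, the helper assertWalkEqual and the sample data are all made up, and literal tuples stand in for the output of os.walk() and my.walk().

```python
import unittest


class WalkComparison(unittest.TestCase):
    def assertWalkEqual(self, expected, actual):
        """Compare two walk() results, ignoring order inside dir and file lists."""
        expected, actual = list(expected), list(actual)
        self.assertEqual(len(actual), len(expected), "walk() same length")
        for (e_dir, e_dirs, e_files), (a_dir, a_dirs, a_files) in zip(expected, actual):
            self.assertEqual(a_dir, e_dir, "walk() currentdir")
            # assertCountEqual checks for the same elements with the same
            # counts, regardless of their order.
            self.assertCountEqual(a_dirs, e_dirs, "walk() dirlist")
            self.assertCountEqual(a_files, e_files, "walk() filelist")

    def test_order_of_entries_is_ignored(self):
        # Sample data standing in for list(os.walk(...)) and list(my.walk(...)).
        expected = [("root", ["a", "b"], ["x.txt", "y.txt"])]
        actual = [("root", ["b", "a"], ["y.txt", "x.txt"])]
        self.assertWalkEqual(expected, actual)
```

Run it with python -m unittest as usual. Wrapping the loop in a custom assert method keeps the individual tests short, but it is still specific to the walk() shape and not the general descriptive comparison asked for in the question.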

Vim Python omni-completion failing to work on system modules

I'm noticing that even for system modules, code completion doesn't work too well.
For example, if I have a simple file that does:
import re
p = re.compile(pattern)
m = p.search(line)
If I type p., I don't get completion for the methods I'd expect to see (I don't see search(), for example, but I do see others, such as func_closure() and func_code()).
If I type m., I don't get any completion whatsoever (I'd expect .groups(), in this case).
This doesn't seem to affect all modules. Has anyone seen this behaviour and know how to correct it?
I'm running Vim 7.2 on WinXP, with the latest pythoncomplete.vim from vim.org (0.9), running python 2.6.2.
Completion for this kind of thing is tricky, because it would need to execute the actual code to work.
For example p.search() could return None or a MatchObject, depending on the data that is passed to it.
This is why omni-completion does not work here, and probably never will. It works for things that can be statically determined, for example a module's contents.
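A short illustration of the point about p.search(): the type of the result depends on the input data, so it cannot be known without running the code. The pattern and input strings below are made up for the example.

```python
import re

p = re.compile(r"(\d+)")

# The same call returns a match object for one input and None for another.
hit = p.search("line 42")
miss = p.search("no digits here")

print(type(hit).__name__, hit.groups())
print(miss)
```

A completer that only reads the source sees the same expression in both cases and has no way to tell whether .groups() is available on the result.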
I never got the builtin omnicomplete to work for any languages. I had the most success with pysmell (which seems to have been updated slightly more recently on github than in the official repo). I still didn't find it to be reliable enough to use consistently but I can't remember exactly why.
I've resorted to building an extensive set of snipMate snippets for my primary libraries and using the default tab completion to supplement.
