Equivalent to Python's -R option that affects the hash of ints

We have a large collection of python code that takes some input and produces some output.
We would like to guarantee that, given the identical input, we produce identical output regardless of python version or local environment. (e.g. whether the code is run on Windows, Mac, or Linux, in 32-bit or 64-bit)
We have been enforcing this in an automated test suite by running our program both with and without the -R option to python and comparing the output, assuming that would shake out any spots where our output accidentally wound up dependent on iteration over a dict. (The most common source of non-determinism in our code)
However, as we recently adjusted our code to also support python 3, we discovered a place where our output depended in part on iteration over a dict that used ints as keys. This iteration order changed in python3 as compared to python2, and was making our output different. Our existing tests (all on python 2.7) didn't notice this. (Because -R doesn't affect the hash of ints) Once found, it was easy to fix, but we would like to have found it earlier.
Is there any way to further stress-test our code and give us confidence that we've ferreted out all places where we end up implicitly depending on something that will possibly be different across python versions/environments? I think that something like -R or PYTHONHASHSEED that applied to numbers as well as to str, bytes, and datetime objects could work, but I'm open to other approaches. I would however like our automated test machine to need only a single python version installed, if possible.
Another acceptable alternative would be some way to run our code with pypy tweaked so as to use a different order when iterating items out of a dict; I think our code runs on pypy, though it's not something we've ever explicitly supported. However, if some pypy expert gives us a way to tweak dictionary iteration order on different runs, it's something we'll work towards.

Using PyPy is not the best choice here, given that it always retains the insertion order in its dicts (using a technique that also makes dicts use less memory). We could of course make it change the order in which dicts are enumerated, but that defeats the point.
Instead, I'd suggest hacking the CPython source code to change the way the hash is used inside dictobject.c. For example, after each hash = PyObject_Hash(key); if (hash == -1) { ..error.. }; you could add hash ^= HASH_TWEAK; and compile different versions of CPython with different values for HASH_TWEAK. (I did such a thing at one point, but I can't find it any more. You need to be a bit careful about which hash values are the original ones and which are the modified ones.)
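As a lighter-weight illustration of the questioner's second alternative (randomising dict iteration order during tests), a pure-Python sketch could use a dict subclass in the test suite; ShuffledDict is a made-up helper here, and it only catches code that actually goes through this class:

import random

class ShuffledDict(dict):
    # Test-only helper: every iteration sees the keys in a fresh random order,
    # so code that silently relies on dict iteration order fails loudly in tests.
    def __iter__(self):
        keys = list(dict.__iter__(self))
        random.shuffle(keys)
        return iter(keys)
    def keys(self):
        return list(self)                      # Python 2 style: plain lists
    def items(self):
        return [(k, self[k]) for k in self]
    def values(self):
        return [self[k] for k in self]

The code under test would have to build its dicts through something like this (e.g. via a factory swapped in by the tests), which is why the CPython-level tweak above is the more thorough option.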

Does Python load in function arguments into registers or does it keep them on the stack?

So I'm writing a function that takes in a tuple as an argument and does a bunch of stuff to it. Here is what that looks like:
def swap(self, location):
    if (location[0] < 0 or location[1] < 0 or
            location[0] >= self.r or location[1] >= self.c):
        return False
    self.board[0][0] = self.board[location[0]][location[1]]
    self.board[location[0]][location[1]] = 0
    self.empty = (location[0], location[1])
I'm trying to make my code as efficient as possible, so since I am not modifying the values of location, does it make sense to load the variables in registers (loc0 = location[0]; loc1 = location[1]) for faster computations (zero-cycle read) or is location already loaded into registers by the Python compiler when it's passed in as a function argument?
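The cached variant being asked about would presumably look like this (a sketch reconstructed from that description, not the OP's actual tested code):

def swap(self, location):
    loc0 = location[0]   # cache the two subscripts up front
    loc1 = location[1]
    if loc0 < 0 or loc1 < 0 or loc0 >= self.r or loc1 >= self.c:
        return False
    self.board[0][0] = self.board[loc0][loc1]
    self.board[loc0][loc1] = 0
    self.empty = (loc0, loc1)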
Edit: I bit the bullet and ran some tests. Here are the results (in seconds) for this function running 10 million times with the repeating inputs: "up", "down", "left", "right" (respectively)
Code as is:
run#1: 19.39
run#2: 17.18
run#3: 16.85
run#4: 16.90
run#5: 16.74
run#6: 16.76
run#7: 16.94
Code after defining location[0] and location[1] in the beginning of the function:
run#1: 14.83
run#2: 14.79
run#3: 14.88
run#4: 15.033
run#5: 14.77
run#6: 14.94
run#7: 14.67
That's an average of 16% increase in performance. Definitely not insignificant for my case. Of course, this is not scientific as I need to do more tests in more environments with more inputs, but enough for my simple use case!
Times measured using Python 2.7 on a Macbook Pro (Early 2015), which has a Broadwell i5-5257U CPU (2c4t max turbo 3.1GHz, sustained 2.7GHz, 3MB L3 cache).
IDE: PyCharm Edu 3.5.1 (JRE 1.8.0_112-release-408-b6 x86_64, OpenJDK 64-Bit Server VM).
Unfortunately, this is for a class that grades based on code speed.
If you're using an interpreter, it's unlikely that any Python variables will live in registers between different expressions. You could look at how the Python source is compiled to bytecode.
Python bytecode (the kind stored in files outside the interpreter) is stack-based (http://security.coverity.com/blog/2014/Nov/understanding-python-bytecode.html). This byte-code is then interpreted or JIT-compiled to native machine code. Regular python only interprets, so it's not plausible for it to keep python variables in machine registers across multiple statements.
An interpreter written in C might keep the top of the bytecode stack in a local variable inside an interpret loop, and the C compiler might keep that C variable in a register. So repeated use of the same Python variable might end up not having too many store/reload round-trips.
Note that store-forwarding latency on your Broadwell CPU is about 4 or 5 clock cycles, nowhere near the hundreds of cycles for a round-trip to DRAM. A store/reload doesn't even have to wait for the store to retire and commit to L1D cache; it's forwarded directly from the store buffer. (Related: http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/, http://agner.org/optimize/, and other links in the x86 tag wiki.) Load-use latency is also only 5 clock cycles for an L1D cache hit (latency from the address being ready to the data being ready; you can measure it by pointer-chasing through a linked list, in asm). There's enough interpreter overhead (the total number of instructions it runs to figure out what to do next) that this probably isn't even the bottleneck.
Keeping a specific Python variable in a register is not plausible at all for an interpreter. Even if you wrote an interpreter in asm, the fundamental problem is that registers aren't addressable. An x86 add r14d, eax instruction has to have both registers hard-coded into the instruction's machine code. (Every other ISA works the same way: register numbers are part of the machine code for the instruction, with no indirection based on any data.) Even if the interpreter did the work to figure out that it needed to "add reg-var #3 to reg-var #2" (i.e. decoding the bytecode stack operations back into register variables for an internal representation that it interprets), it would have to use a different function for every different combination of registers.
Given an integer, the only ways to get the value of the Nth register are branching to an instruction that uses that register, or storing all the registers to memory and indexing the resulting array. (Or maybe some kind of branchless compare and mask stuff).
Anyway, trying to do anything specific about this is not profitable, which is why people just write the interpreter in C and let the C compiler do a (hopefully) good job of optimizing the machine code that will actually run.
Or you write a JIT-compiler like Sun did for Java (the HotSpot VM). IDK if there are any for Python. See Does the Python 3 interpreter have a JIT feature?.
A JIT-compiler does actually turn the Python code into machine code, where register state mostly holds Python variables rather than interpreter data. Again, without a JIT compiler (or ahead-of-time compiler), "keeping variables in registers" is not a thing.
It's probably faster because it avoids the [] operator and other overhead (see Bren's answer, which you accepted).
Footnote: a couple ISAs have memory-mapped registers. e.g. AVR (8-bit RISC microcontrollers), where the chip also has built-in SRAM containing the low range of memory addresses that includes the registers. So you can do an indexed load and get register contents, but you might as well have done that on memory that wasn't holding architectural register contents.
The Python VM only uses a stack to execute its bytecode, and this stack is completely independent of the hardware stack. You can use dis to disassemble your code to see how your changes affect the generated bytecode.
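For instance, a minimal sketch (bounds_check is a standalone stand-in for the question's method, so it can be disassembled on its own):

import dis

def bounds_check(location, r, c):
    # On CPython 2.7 the cached version performs two BINARY_SUBSCR operations,
    # versus four in the original expression that indexes location repeatedly.
    loc0 = location[0]
    loc1 = location[1]
    return not (loc0 < 0 or loc1 < 0 or loc0 >= r or loc1 >= c)

dis.dis(bounds_check)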
It will be a little faster if you store these two variables:
loc0 = location[0]
loc1 = location[1]
Because there will be only two lookups instead of four.
By the way, if you're writing Python, you usually shouldn't worry about performance at this low a level.
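If you want to see the size of that difference for yourself, one way is timeit; a minimal sketch (the placeholder values are arbitrary):

import timeit

setup = "location = (3, 4); r = c = 10"
uncached = ("location[0] < 0 or location[1] < 0 or "
            "location[0] >= r or location[1] >= c")
cached = ("loc0 = location[0]; loc1 = location[1]; "
          "loc0 < 0 or loc1 < 0 or loc0 >= r or loc1 >= c")

print(timeit.timeit(uncached, setup, number=1000000))
print(timeit.timeit(cached, setup, number=1000000))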
Those kinds of details are not part of the specified behavior of Python. As Ignacio's answer says, CPython does it one way, but that is not guaranteed by the language itself. Python's description of what it does is very far removed from low-level notions like registers, and most of the time it's not useful to worry about how what Python does maps onto those details. Python is a high-level language whose behavior is defined in terms of high-level abstractions, akin to an API.
In any case, doing something like loc0 = location[0] in Python code has nothing to do with setting registers. It's just binding a new Python name to an existing Python object.
That said, there is a performance difference, because if you use location[0] everywhere, the actual lookup will (or at least may -- in theory a smart Python implementation could optimize this) happen again and again every time the expression location[0] is evaluated. But if you do loc0 = location[0] and then use loc0 everywhere, you know the lookup only happens once. In typical situations (e.g., location is a Python list or dict, you're not running this code gazillions of times in a tight loop) this difference will be tiny.

Python script crashes when executed from server

I am executing a Python script locally through PHP's exec function and everything works fine.
Now it's time to move the project to the server, and I am facing a problem executing the same Python script.
In both cases I have the same version of Python (2.7.3) and all the required libs are installed.
I have spotted where the issue occurs but I cannot figure out why.
It's this line in my Python script:
import networkx as nx
CG1_nodes=nx.connected_components(G1)[:]
It runs successfully locally but crashes on the server. I have found out that it works if I remove the:
[:]
I have also checked the content of G1 and it's populated.
Any idea what I am missing here?
You're consuming a generator. It's possible that it has billions of items; if that's the case, Python may run out of resources. Make sure you're not overloading the system by checking how big the resulting list will be.
I'd also take a look at problems with slicing in libraries that are used by networkx (NumPy? SciPy?). Perhaps try:
CG1_nodes=list(nx.connected_components(G1))
To avoid slicing.
You should check if you have the same version of networkx in both cases.
In older networkx versions nx.connected_components(G1) returned a list. In the newer version (1.9.1) it returns a generator. If X is a generator, then X[:] doesn't work, but if X is a list it does. So if your machine and the server have different versions, the slicing would be allowed in one case but not the other.
You "fixed" this by removing the [:], so CG1_nodes is now a generator rather than a list. As long as your use of it is consistent with a generator, the results will (probably) be the same, so both versions of the code would work. Obviously making it a list explicitly would work too, but might be memory intensive.
More details are documented here. In particular note:
To recover the earlier behavior, use list(connected_components(G)).
I believe the previous version returned the list sorted by decreasing component size. The new version is not sorted. If you need it sorted you'll need to do a bit more:
sorted(list(nx.connected_components(G)), key=len, reverse=True)
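Putting those two points together, a version-agnostic sketch might look like this (the tiny graph is just a stand-in for the question's G1):

import networkx as nx

G1 = nx.Graph()                       # stand-in for the question's graph
G1.add_edges_from([(1, 2), (3, 4), (4, 5)])

# list() accepts either a list or a generator, so this works across networkx versions
CG1_nodes = list(nx.connected_components(G1))
# older versions returned components largest-first; restore that ordering if you rely on it
CG1_nodes.sort(key=len, reverse=True)
print(CG1_nodes)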

Unable to store huge strings in-memory

I have data in the following form:
## De
A B C.
## dabc
xyz def ghi.
## <MyName_1>
Here is example.
## Df
A B C.
## <MyName_2>
De another one.
## <MyName_3>
Df next one.
## dabc1
xyz def ghi.
## <MyName_4>
dabc this one.
Convert it into the following form:
A B#1 C. //step 1 -- 1 assigned to the first occurrence of A B C.
xyz def#1 ghi. //1 assigned to first occurrence of xyz def ghi
Here is example
A B#2 C. //step 1 -- 2 assigned in increasing order
B#1 another one. //step 2
B#2 next one.
xyz def ghi.
def#1 this one.
// Here, // marks comments; they are not part of the output.
The algorithm is the following.
If the line following ## (the second line of each pair) gets repeated, then append #number to the middle word, where number is a numeric identifier assigned in increasing order over the repetitions of that line.
Replace ##... with word#number wherever it occurs.
Remove all ## markers whose following line is not repeated.
To achieve this I am storing all the triples and then finding their occurrences in order to assign numbers in increasing order. Is there some other way to achieve this in Python? My file is 500 GB, so it is not possible to store all the triples in memory in order to count their occurrences.
If you need something that's like a dict, but too big to hold in memory, what you need is a key-value database.
The simplest way to do this is with a dbm-type library, which is a very simple key-value database with almost exactly the same interface as a dict, except that it only allows strings for keys and values, and has some extra methods to control persistence and caching and the like. Depending on your platform and how your Python 2.7 was built, you may have any of:
dbm
gdbm
dumbdbm
dbhash
bsddb
bsddb185
bsddb3
PyBSDDB
The last three are all available on PyPI if your Python install doesn't include them, as long as you have the relevant version of libbsddb itself and don't have any problems with its license.
The problem is that, depending on your platform, the various underlying database libraries may not exist (although of course you can download the C library, install it, then build and install the Python wrapper), or may not support databases this big, or may do so but only in a horribly inefficient way (or, in a few cases, in a buggy way…).
Hopefully one of them will work for you, but the only way you'll really know is to test all of the ones you have.
Of course, if I understand properly, you're mapping strings to integers, not to strings. You could use the shelve module, which wraps any dbm-like library to allow you to use string keys but anything picklable as values… but that's huge overkill (and may kill your performance) for a case like this; you just need to change code like this:
counts.setdefault(key, 0)
counts[key] += 1
… into this:
counts.setdefault(key, '0')
counts[key] = str(int(counts[key]) + 1)
And of course you can easily write a wrapper class that does this for you (maybe even one that supports the Counter interface instead of the dict interface).
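Putting those pieces together on Python 2.7, a minimal sketch of the dbm-backed counter (anydbm picks whichever dbm flavor your build has; the filename is arbitrary):

import anydbm                                  # the Python 3 equivalent is dbm

counts = anydbm.open('triple_counts.db', 'c')  # 'c' creates the file if it does not exist
key = 'A B C.'
if not counts.has_key(key):
    counts[key] = '0'                          # dbm values must be strings
counts[key] = str(int(counts[key]) + 1)
counts.close()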
If that doesn't work, you need a more powerful database.
Most builds of Python come with sqlite3 in the stdlib, but using it will require learning a pretty low-level API, and learning SQL, which is a whole different language that's very unlike Python. (There are also a variety of different relational databases out there, but you shouldn't need any of them.)
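For a sense of what that looks like, a rough sketch of the same counter on top of sqlite3 (the table and column names are made up for illustration):

import sqlite3

conn = sqlite3.connect('triple_counts.sqlite')
conn.execute('CREATE TABLE IF NOT EXISTS counts (k TEXT PRIMARY KEY, n INTEGER)')

def bump(key):
    # insert the key with a count of 0 if it is new, then increment it
    conn.execute('INSERT OR IGNORE INTO counts VALUES (?, 0)', (key,))
    conn.execute('UPDATE counts SET n = n + 1 WHERE k = ?', (key,))

bump('A B C.')
conn.commit()
print(conn.execute('SELECT n FROM counts WHERE k = ?', ('A B C.',)).fetchone()[0])
conn.close()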
There are also a variety of query expression libraries and even full object-relational mappers, like SQLAlchemy (which can be used either way) that let you write your queries in a much more Pythonic way, but it's still not going to be as simple as using a dict or dbm. (That being said, it's not that hard to wrap a dbm-like interface around SQLAlchemy.)
There are also a wide variety of non-relational or semi-relational databases that are generally lumped under the term NoSQL, the simplest of which are basically dbm on steroids. Again, they'll usually require learning a pretty low-level API, and sometimes a query language as well—but some of them will have nice Python libraries that make them easier to use.

Python ordered garbage collectible dictionary?

I want my Python program to be deterministic, so I have been using OrderedDicts extensively throughout the code. Unfortunately, while debugging memory leaks today, I discovered that OrderedDicts have a custom __del__ method, making them uncollectable whenever there's a cycle. It's rather unfortunate that there's no warning in the documentation about this.
So what can I do? Is there any deterministic dictionary in the Python standard library that plays nicely with gc? I'd really hate to have to roll my own, especially over a stupid one line function like this.
Also, is this something I should file a bug report for? I'm not familiar with the Python library's procedures, and what they consider a bug.
Edit: It appears that this is a known bug that was fixed back in 2010. I must have somehow gotten a really old version of 2.7 installed. I guess the best approach is to just include a monkey patch in case the user happens to be running a broken version like me.
If the presence of the __del__ method is problematic for you, just remove it:
>>> import collections
>>> del collections.OrderedDict.__del__
You will gain the ability to use OrderedDicts in a reference cycle. You will lose having the OrderedDict free all its resources immediately upon deletion.
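A quick throwaway check that the patch actually makes cycles collectable on your interpreter (not production code; it just builds a self-referencing OrderedDict and collects it):

import collections, gc

if hasattr(collections.OrderedDict, '__del__'):
    del collections.OrderedDict.__del__        # the monkey patch from above

d = collections.OrderedDict()
d['self'] = d                                  # build a reference cycle
del d
gc.collect()
print(gc.garbage)                              # stays [] once the cycle is collectable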
It sounds like you've tracked down a bug in OrderedDict that was fixed at some point after your version of 2.7. If it wasn't in any actual released versions, maybe you can just ignore it. But otherwise, yeah, you need a workaround.
I would suggest that, instead of monkeypatching collections.OrderedDict, you should instead use the Equivalent OrderedDict recipe that runs on Python 2.4 or later linked in the documentation for collections.OrderedDict (which does not have the excess __del__). If nothing else, when someone comes along and says "I need to run this on 2.6, how much work is it to port" the answer will be "a little less"…
But two more points:
Rewriting everything to avoid cycles is a huge amount of effort.
The fact that you've got cycles in your dictionaries is a red flag that you're doing something wrong (typically using strong refs for a cache or for back-pointers), which is likely to lead to other memory problems, and possibly other bugs. So that effort may turn out to be necessary anyway.
You still haven't explained what you're trying to accomplish; I suspect the "deterministic" thing is just a red herring (especially since dicts actually are deterministic), so the best solution is s/OrderedDict/dict/g.
But if determinism is necessary, you can't depend on the cycle collector, because it's not deterministic, and that means your finalizer ordering and so on all become non-deterministic. It also means your memory usage is non-deterministic—you may end up with a program that stays within your desired memory bounds 99.999% of the time, but not 100%; if those bounds are critically important, that can be worse than failing every time.
Meanwhile, the iteration order of dictionaries isn't specified, but in practice, CPython and PyPy iterate in the order of the hash buckets, not the id (memory location) of either the value or the key, and whatever Jython and IronPython do (they may be using some underlying Java or .NET collection that has different behavior; I haven't tested), it's unlikely that the memory order of the keys would be relevant. (How could you efficiently iterate a hash table based on something like that?) You may have confused yourself by testing with objects that use id for hash, but most objects hash based on value.
For example, take this simple program:
d = {}
d[0] = 0
d[1] = 1
d[2] = 2
for k in d:
    print(k, d[k], id(k), id(d[k]), hash(k))
If you run it repeatedly with CPython 2.7, CPython 3.2, and PyPy 1.9, the keys will always be iterated in order 0, 1, 2. The id columns may also be the same each time (that depends on your platform), but you can fix that in a number of ways—insert in a different order, reverse the order of the values, use string values instead of ints, assign the values to variables and then insert those variables instead of the literals, etc. Play with it enough and you can get every possible order for the id columns, and yet the keys are still iterated in the same order every time.
The order of iteration is not predictable, because to predict it you need the function for converting hash(k) into a bucket index, which depends on information you don't have access to from Python. Even if it's just hash(k) % self._table_size, unless that _table_size is exposed to the Python interface, it's not helpful. (It's a complex function of the sequence of inserts and deletes that could in principle be calculated, but in practice it's silly to try.)
But it is deterministic; if you insert and delete the same keys in the same order every time, the iteration order will be the same every time.
Note that the fix made in Python 2.7 to eliminate the __del__ method and so stop them from being uncollectable does unfortunately mean that every use of an OrderedDict (even an empty one) results in a reference cycle which must be garbage collected. See this answer for more details.

Counting number of symbols in Python script

I have a Telit module which runs Python 1.5.2+ (http://www.roundsolutions.com/techdocs/python/Easy_Script_Python_r13.pdf). There are certain restrictions on the number of variable, module and method names I can use (< 500), the size of each variable (16k) and the amount of RAM (~1 MB). Refer to pages 113 and 114 of that document for details. I would like to know how to get the number of symbols being generated, the size in RAM of each variable, and the memory usage (stack and heap usage).
I need something similar to a map file that gets generated with gcc after the linking process which shows me each constant / variable, symbol, its address and size allocated.
Python is an interpreted and dynamically-typed language, so generating that kind of output is very difficult, if it's even possible. I'd imagine that the only reasonable way to get this information is to profile your code on the target interpreter.
If you're looking for a true memory map, I doubt such a tool exists since Python doesn't go through the same kind of compilation process as C or C++. Since everything is initialized and allocated at runtime as the program is parsed and interpreted, there's nothing to say that one interpreter will behave the same as another, especially in a case such as this where you're running on such a different architecture. As a result, there's nothing to say that your objects will be created in the same locations or even with the same overall memory structure.
If you're just trying to determine memory footprint, you can do some manual checking with sys.getsizeof(object, [default]), provided that it is supported by Telit's libs. I don't think they're using a straight implementation of CPython. Even then, this doesn't always work and will raise a TypeError when an object's size cannot be determined, unless you specify the default parameter.
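For example, a sketch of that kind of manual check on a desktop interpreter (as noted above, it may not exist in Telit's build):

import sys

for obj in (0, 3.14, 'a short string', [1, 2, 3], {'key': 'value'}):
    # the second argument is returned instead of raising TypeError when the size is unknown
    print('%-10s %s' % (type(obj).__name__, sys.getsizeof(obj, -1)))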
You might also get some interesting results by studying the output of the dis module's bytecode disassembly, but that assumes that dis works on your interpreter, and that your interpreter is actually implemented as a VM.
If you just want a list of symbols, take a look at this recipe. It uses reflection to dump a list of symbols.
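The linked recipe isn't reproduced here, but a rough sketch of the same reflection idea (dump_symbols is just an illustrative name; it prints a module's top-level names and what kind of object each one is) could look like this:

def dump_symbols(module):
    # list every top-level name in the module together with the kind of object it names
    for name in sorted(dir(module)):
        obj = getattr(module, name)
        print('%-20s %s' % (name, type(obj).__name__))

import math                                    # any module works; math is just an example
dump_symbols(math)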
Good manual testing is key here. Your best bet is to set up the module's CMUX (COM port MUXing), and watch the console output. You'll know very quickly if you start running out of memory.
This post reminds me of my pain with Telit GM862-GPS modules. My code was exactly at the point where the number of variables, strings, etc. added up to the limit. Of course, I didn't know that fact at the time. I added one innocent line and my program did not work any more. It drove me really crazy for two days until I looked at the datasheet and found this limit.
What you are looking for might not have a good answer, because the Python interpreter on the module is not a full-fledged version. What I did was to reuse the same local variable names as much as possible. I also deleted docstrings for functions (those count too) and replaced them with # comments.
In the end, I want to say that this module is good for small applications. The python interpreter does not support threads or interrupts so your program must be a super loop. When your application gets bigger, each iteration will take longer. Eventually, you might want to switch to a faster platform.
