I am creating finite element code in Python that relies on numpy and scipy for array, matrix and linear algebra calculations. The initial generated code seems to be working and I am getting the results I need.
However, another feature requires calling the analysis function more than once, and when I review the results of the second call they differ completely from the first, although both calls use the same inputs. The only explanation I can think of is that garbage collection is not working and memory is being corrupted.
Here is the procedure used:
1. Call the setup function to generate the model database: mDB = F0(inputs)
2. Call the first analysis with some variable input: r1 = F1(mDB, v1)
3. Repeat the first analysis with the same variable input as step 2: r2 = F1(mDB, v1)
Since nothing has changed, I would expect the results of steps 2 and 3 to be identical; however, my code produces different results (verified using matplotlib).
I am using:
Python 2.7 (32bit) on Windows 7 with numpy-1.6.2 and scipy-0.11.0
If your results are sensitive to rounding error (e.g. because of a programming error in your code), then in general floating-point results are not reproducible. This can happen purely because of the way modern compilers optimize code, so it does not require anything like accessing uninitialized memory.
Please see:
http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
Another likely possibility is that your computation function modifies its input data. The point you mention in the comment above does not exclude this possibility: Python passes references to objects, so a function can mutate any mutable argument it receives.
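For example, a minimal illustration (not your code) of how a function can silently modify its numpy input:

import numpy as np

def analyse(K):
    K *= 2.0           # in-place: this changes the caller's array
    return K.sum()

A = np.ones(3)
analyse(A)
print(A)               # [ 2.  2.  2.] -- A was modified by the call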
Ok, based on suggestion from above I found the problem.
In my code I rely on dictionaries (hash tables). Inside a separate function I take the contents of the original input dictionary mDB and modify them; I assumed the original contents would not be changed, but they are. I come from Fortran and Matlab, where such assignments make independent copies.
The answer was to deepcopy the contents of my original dictionary rather than use simple assignment. Note that I tried a shallow copy first, as in:
A = mDB['A'].copy()
but that did not work either. I had to use:
import copy
A = copy.deepcopy(mDB['A'])
I know some would say that I should read the manual, which states that "Assignment statements in Python do not copy objects, they create bindings between a target and an object" (documentation), but this is still new and surprising behavior for me.
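For anyone hitting the same issue, here is a minimal sketch of the difference (the nested structure is hypothetical, not my actual mDB):

import copy
import numpy as np

mDB = {'A': {'K': np.eye(2)}}     # nested: a dict inside a dict

shallow = mDB['A'].copy()         # copies the outer dict only
shallow['K'][0, 0] = 99.0         # 'K' still aliases the original array
print(mDB['A']['K'][0, 0])        # 99.0 -- the original was modified

mDB = {'A': {'K': np.eye(2)}}     # reset
deep = copy.deepcopy(mDB['A'])    # recursively copies every level
deep['K'][0, 0] = 99.0
print(mDB['A']['K'][0, 0])        # 1.0 -- the original is untouched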
Any suggestions for using other than dictionaries for storing my original data?
I can't seem to find the code for numpy argmax.
The source link in the docs leads me to here, which doesn't have any actual code.
I went through every function that mentions argmax using the GitHub search tool and still had no luck. I'm sure I'm missing something.
Can someone lead me in the right direction?
Thanks
Numpy is written in C. It uses a template engine that parses special comments to generate many versions of the same generic function (typically one per type). This tool is very helpful for generating fast code, since the C language does not provide (proper) templates, unlike C++ for example. However, it also makes the code more cryptic than necessary, since function names are often generated. For example, generic function names can look like #TYPE#_#OP#, where #TYPE# and #OP# are two macros that can each take different values. On top of all this, the CPython binding also makes the code more complex, since the C functions have to be wrapped so they can be called from CPython with complex arrays (possibly with many dimensions and custom user types) and have their CPython arguments decoded.
_PyArray_ArgMinMaxCommon is quite a good entry point, but it is only a wrapper and not the main computing function. It is only useful if you plan to change the prototype of the Numpy function from Python.
The main computational function can be found here. The comment just above the function is the one used to generate the variants of the function (e.g. CDOUBLE_argmax). Note that there are some alternative implementations for specific types below the main one, such as OBJECT_argmax, since CPython objects and strings must be handled a bit differently. Thank you for contributing to Numpy.
As mentioned in the comments, you'll likely find what you are searching for in the C implementation (here, under _PyArray_ArgMinMaxCommon). The code itself can be very convoluted, so if your intent was to open an issue on numpy with a broad idea, I would do it on the page you linked anyway.
My python code is performing fairly complex numerical calculations, and in many cases I am unable to provide known solutions to enable unit testing (especially for intermediate results).
However, I have found that I can catch a lot of bugs with nose, by performing regression testing using the following workflow:
Write test code to solve some relatively small problem
Run it once, inspect the results (often in the form of a matplotlib plot), and decide by comparison with analytical results, other numerical software, or physical intuition that the results are correct to within acceptable numerical accuracy.
Save the resulting numpy arrays to text files to act as a reference (FWIW I was avoiding numpy's saving routines as a workaround for this bug, but as this has been fixed in a released version I think I can use them now).
The test code performs the calculation and compares it with the reference data read from the file using numpy's assert_allclose.
The test function is written in such a way that by default it performs the test, but by passing non-default argument values I can plot the results or overwrite the reference file if that becomes necessary. The reference file is checked into git, so there is little risk of accidentally overwriting the test values without noticing.
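A stripped-down sketch of one such test (the file name and the solve function are placeholders for my real code):

import numpy as np
from numpy.testing import assert_allclose

REF_FILE = 'reference/small_problem.txt'    # checked into git

def test_small_problem(plot=False, overwrite=False):
    result = solve_small_problem()          # placeholder for the actual calculation
    if overwrite:
        np.savetxt(REF_FILE, result)        # regenerate the reference (deliberate use only)
        return
    reference = np.loadtxt(REF_FILE)
    if plot:
        import matplotlib.pyplot as plt
        plt.plot(result)
        plt.plot(reference)
        plt.show()
    assert_allclose(result, reference, rtol=1e-10)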
However, I find myself writing a lot of boilerplate code to implement the above functionality, which outweighs the actual test code itself. Cleaning this up would make it much easier to increase test coverage.
Is there some python testing framework or plugin for nose that could easily automate the above workflow?
A few months ago I wrote the nrtest utility in an attempt to make this workflow easier. It sounds like it might help you too.
Here's a quick overview. Each test is defined by its input files and its expected output files. Following execution, output files are stored in a portable benchmark directory. A second step then compares this benchmark to a reference benchmark. A recent update has enabled user extensions, so you can define comparison functions for your custom data.
I hope it helps.
I'm writing a game in Python in which the environment is generated randomly. Currently, the game's "save" function works by writing out all parts of the environment which the player has explored. The result is that save files are larger than they need to be—why write random data to disk when you can just generate it again?
What I could use is a random noise function: a function noise such that noise(x) returns a random number, and always the same number whenever it's called with the same value of x. Now, for each point (x,y) in the game's environment, instead of generating a random number using random() and storing the result in env[(x,y)], I can generate a random number using noise((x,y)), throw it away, and generate the same number later.
Not quite sure if I'm stating the obvious, but using some variation of a Perlin noise generator is a common way to do this. This post is a nice description of doing exactly that (as mentioned in the comments, it's not exactly Perlin noise).
For a given position, the Perlin function will return a random value (the position can be 2D, 3D or any dimensionality).
There is a noise module, and this page has an implementation of it.
There's a similar thread on gamedev.SE
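For example, with the noise package (a sketch assuming its pnoise2 function, which returns a repeatable value for given coordinates):

import noise

# The same coordinates always produce the same value -- no state is kept.
print(noise.pnoise2(1.5, 2.5))
print(noise.pnoise2(1.5, 2.5))   # identical to the line above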
First, if you need noise(x) to always return the same value for the same x, no matter what, even if it has never been called with that x before, then you can't really use randomness at all. A good hash function is the only possibility.
However, if you just need to be able to restore a previous state consisting of the values for all of the previously-explored points (never-explored points may turn out different after save and load than if you hadn't quit… but how can anyone tell without access to multiple universes?), and you don't want to store all of those points, then it might be reasonable to regenerate them.
But let's back up a step. You want something that acts like a hash function. Is there a hash function you can use?
I'd imagine the algorithms in hashlib are too slow (md5 is probably the fastest, but test them all), but I wouldn't reject them without actually testing.
It's possible that the "random period" of zlib.adler32 (or zlib.crc32) is too short, but I wouldn't reject it (except maybe hash) without thinking through whether it's good enough. For that matter, even hash plus a decent fixed-size blender function might be good enough (at least on a 64-bit system).
Python doesn't come with anything "between" md5 and adler32 out of the box. But you can find PyPI modules or source recipes for hundreds of other hash algorithms. For that matter, if you're familiar with any particular hash algorithm that sounds good, most of them are trivial; you could probably code up, e.g., an FNV hash with xor-folding in less time than it takes to look through the alternatives.
Also, keep in mind that you can generate a bunch of random bytes at "new game" time, store that in the save file, and use it as salt to your hash function.
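A minimal sketch combining both ideas: a 64-bit FNV-1a hash with xor-folding, salted with per-game random bytes (the FNV constants are from the published spec; the mapping to a float is my own choice):

FNV_OFFSET = 0xcbf29ce484222325    # 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001b3          # 64-bit FNV prime

def noise(point, salt=b''):
    data = salt + repr(point).encode()               # salt plus the coordinates
    h = FNV_OFFSET
    for byte in bytearray(data):
        h = ((h ^ byte) * FNV_PRIME) & (2**64 - 1)   # one FNV-1a step, kept to 64 bits
    h ^= h >> 32                                     # xor-fold the high bits down
    return (h & 0xffffffff) / 4294967296.0           # map to [0, 1)

print(noise((3, 4)))   # always the same value for the same point and salt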
If you've exhausted those possibilities and you really do need more randomness than a fast-enough hash function with arbitrary salt can give you, then:
It sounds like you'll already need to store a list of the points the user has explored (because how else would you know which points to restore?), and the order doesn't really matter, so you can store them in exploration order. That means you can regenerate the values deterministically, just by iterating over the list, which means you can use the suggestion by @delnan on your own answer.
However, seed is not the way to do that. It isn't guaranteed to put the RNG into the same state each time across runs, Python versions, machines, etc. For that, you need setstate:
To save, call random.getstate(), and pickle and stash the result.
To load, read and unpickle the state, and call random.setstate(state).
See the docs for full details.
If you're using a random.Random instance, it's exactly the same, except of course that you have to construct a random.Random before you can call setstate on it.
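A minimal sketch of the save/load round trip:

import pickle
import random

rng = random.Random(12345)

blob = pickle.dumps(rng.getstate())   # save: capture the state and pickle it

rng2 = random.Random()                # load: construct first, then restore
rng2.setstate(pickle.loads(blob))

assert rng.random() == rng2.random()  # both streams now continue identically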
This is guaranteed to work between runs of your program, across machines, etc. Even with a newer version of Python. However, it's not guaranteed to work with an older version of Python. (That is, if the user saves a game with Python 2.6, then tries to load it with 2.5, the state will not be compatible. I believe the only problems come with 2.6->older and 2.3->older, but of course there's no guarantee there won't be additional ones in the future.) I'd suggest stashing the Python version, and if they've downgraded, show a warning saying "This save file requires Python 2.6 or later. You have Python 2.5. The load may fail. Continue anyway?"
This is only guaranteed for random.Random and for the random module itself (since the top-level module functions just use a hidden random.Random instance). In particular, random.SystemRandom is explicitly documented not to work.
Practically speaking, you can also just pickle a random.Random directly, because the state gets pickled in. It seems like that ought to work (otherwise, what would be the point of pickling a Random object?), and it definitely does work. But it isn't actually documented to work, so I'd stick with pickling the result of getstate, for safety.
One possible implementation of noise is this:
import random

def noise(point):
    gen = random.Random()   # a fresh, independent generator per call
    gen.seed(point)         # seeded only by the coordinates
    return gen.random()     # so the same point always yields the same value
I don't know how fast Random.seed() is, though. In addition, Random may change from one version of Python to the next, causing the players of my game to find that the environment changes when they upgrade.
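A version-stable variant I'm considering (a sketch using hashlib, so the result never depends on Random's internals):

import hashlib
import struct

def noise(point):
    digest = hashlib.md5(repr(point).encode()).digest()   # md5 is stable across Python versions
    n, = struct.unpack('>Q', digest[:8])                  # first 8 bytes as an unsigned 64-bit int
    return n / float(2 ** 64)                             # map to [0, 1)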
I'm using python to set up a computationally intense simulation, then running it in a custom built C-extension and finally processing the results in python. During the simulation, I want to store a fixed-length number of floats (C doubles converted to PyFloatObjects) representing my variables at every time step, but I don't know how many time steps there will be in advance. Once the simulation is done, I need to pass back the results to python in a form where the data logged for each individual variable is available as a list-like object (for example a (wrapper around a) continuous array, piece-wise continuous array or column in a matrix with a fixed stride).
At the moment I'm creating a dictionary mapping the name of each variable to a list of PyFloatObjects. This format is perfect to work with in the post-processing stage, but I have a feeling the creation stage could be a lot faster.
Time is quite crucial, since the simulation is already a computationally heavy task. I expect that a combination of (a) buying lots of memory and (b) setting up the experiment wisely will allow the entire log to fit in RAM. However, with my current dict-of-lists solution, keeping every variable's log in a contiguous section of memory would require a lot of copying and overhead.
My question is: What is a clever, low-level way of quickly logging gigabytes of doubles in memory with minimal space/time overhead, that still translates to a neat python data structure?
Clarification: when I say "logging", I mean storing until after the simulation. Once that's done a post-processing phase begins and in most cases I'll only store the resulting graphs. So I don't actually need to store the numbers on disk.
Update: In the end, I changed my approach a little and added the log (as a dict mapping variable names to sequence types) to the function parameters. This allows you to pass in objects such as lists or array.arrays or anything else that has an append method. It adds a little time overhead, because I'm using the PyObject_CallMethodObjArgs function to call the append method instead of PyList_Append or similar. Using arrays lets you reduce the memory load, which appears to be the best I can do short of writing my own expanding storage type. Thanks everyone!
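On the Python side it looks something like this (simulate and parameters stand in for my real extension entry point and inputs):

from array import array

log = {'velocity': array('d'), 'pressure': array('d')}   # compact C doubles, not PyFloatObjects
simulate(parameters, log)      # the extension calls log[name].append(value) at each time step
velocities = log['velocity']   # list-like, ready for post-processing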
You might want to consider doing this in Cython instead of as a C extension module. Cython is smart, and lets you do things in a pretty Pythonic way while still letting you mix C datatypes and Python datatypes.
Have you checked out the array module? It allows you to store lots of scalar, homogeneous types in a single collection.
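For example:

from array import array

a = array('d')              # 'd' = C double, 8 bytes per item
a.extend([1.0, 2.0, 3.0])
print(a.itemsize * len(a))  # 24 bytes of payload, far leaner than a list of floats
print(a[1])                 # 2.0 -- indexes like a list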
If you're truly "logging" these, and not just returning them to CPython, you might try opening a file and fprintf'ing them.
BTW, realloc might be your friend here, whether you go with a C extension module or Cython.
This is going to be more a huge dump of ideas rather than a consistent answer, because it sounds like that's what you're looking for. If not, I apologize.
The main thing you're trying to avoid here is storing billions of PyFloatObjects in memory. There are a few ways around that, but they all revolve around storing billions of plain C doubles instead, and finding some way to expose them to Python as if they were sequences of PyFloatObjects.
To make Python (or someone else's module) do the work, you can use a numpy array, a standard library array, a simple hand-made wrapper on top of the struct module, or ctypes. (It's a bit odd to use ctypes to deal with an extension module, but there's nothing stopping you from doing it.) If you're using struct or ctypes, you can even go beyond the limits of your memory by creating a huge file and mmapping windows of it in as needed.
To make your C module do the work, instead of actually returning a list, return a custom object that implements the sequence protocol, so that when someone writes, say, foo[i] (i.e. calls __getitem__), you convert _array[i] to a PyFloatObject on the fly.
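The pure-Python analogue of that wrapper looks like this (the C version fills in the same two slots of the sequence protocol):

class DoubleLog(object):
    """Wraps a compact buffer of doubles; converts to float only on access."""
    def __init__(self, raw):
        self._raw = raw              # e.g. an array('d') filled by the C code
    def __len__(self):
        return len(self._raw)
    def __getitem__(self, i):
        return float(self._raw[i])   # the float object is created on the fly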
Another advantage of mmap is that, if you're creating the arrays iteratively, you can create them by just streaming to a file, and then use them by mmapping the resulting file back as a block of memory.
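For instance, with numpy the mmap route is a one-liner (a sketch; it assumes the file is just raw doubles):

import numpy as np

# Maps the whole file without copying; pages are read only when touched.
data = np.memmap('simulation.log', dtype='float64', mode='r')
print(data[1000000])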
Otherwise, you need to handle the allocations. If you're using the standard array, it takes care of auto-expanding as needed, but otherwise you're doing it yourself. The code to realloc and copy when necessary isn't that difficult, and there's lots of sample code online, but you do have to write it. Or you may want to consider building a strided container that you can expose to Python as if it were contiguous, even though it isn't. (You can do this directly via the buffer protocol, which is complex, but personally I've always found that harder than writing my own sequence implementation.) If you can use C++, std::vector is an auto-expanding array, and std::deque is a strided container (and if you've got the SGI STL rope, it may be an even better strided container for the kind of thing you're doing).
As the other answer pointed out, Cython can help with some of this, though not so much with the "exposing lots of floats to Python" part. What you can do is move pieces of the Python part into Cython, where they'll get compiled into C. If you're lucky, all of the code that needs to deal with the lots of floats will work within the subset of Python that Cython implements, and the only things you'll need to expose to actual interpreted code are higher-level drivers (if even that).
I'm looking into speeding up my python code, which is all matrix math, using some form of CUDA. Currently my code is using Python and Numpy, so it seems like it shouldn't be too difficult to rewrite it using something like either PyCUDA or CudaMat.
However, on my first attempt using CudaMat, I realized I had to rearrange a lot of the equations in order to keep the operations all on the GPU. This included the creation of many temporary variables so I could store the results of the operations.
I understand why this is necessary, but it turns what were once easy-to-read equations into somewhat of a mess that is difficult to inspect for correctness. Additionally, I would like to be able to easily modify the equations later on, which isn't practical in their converted form.
The package Theano manages to do this by first creating a symbolic representation of the operations, then compiling them to CUDA. However, after trying Theano out for a bit, I was frustrated by how opaque everything was. For example, just getting the actual value of myvar.shape[0] is difficult, since the tree doesn't get evaluated until much later. I would also much prefer less of a framework in which my code must conform to a library that acts invisibly in the place of Numpy.
Thus, what I would really like is something much simpler. I don't want automatic differentiation (there are other packages like OpenOpt that can do that if I require it), or optimization of the tree, but just a conversion from standard Numpy notation to CudaMat/PyCUDA/somethingCUDA. In fact, I want to be able to have it evaluate to just Numpy without any CUDA code for testing.
I'm currently considering writing this myself, but before even considering such a venture, I wanted to see if anyone else knows of similar projects or a good starting place. The only other project I know of that might be close to this is SymPy, but I don't know how easy it would be to adapt to this purpose.
My current idea would be to create an array class that looks like a Numpy array class. Its only function would be to build a tree. At any time, that symbolic array class could be converted to a Numpy array class and evaluated (there would also be one-to-one parity between the two). Alternatively, the array class could be traversed and CudaMat commands generated from it. If optimizations are required, they can be done at that stage (e.g. re-ordering of operations, creation of temporary variables, etc.) without getting in the way of inspecting what's going on.
Any thoughts/comments/etc. on this would be greatly appreciated!
Update
A usage case may look something like the following (where sym is the theoretical module), for instance when calculating a gradient:
W = sym.array(np.random.rand(numVisible, numHidden))
delta_o = -(x - z)
delta_h = sym.dot(delta_o, W)*h*(1.0-h)
grad_W = sym.dot(X.T, delta_h)
In this case, grad_W would actually just be a tree containing the operations that need to be done. If you wanted to evaluate the expression normally (i.e. via Numpy) you could do:
npGrad_W = grad_W.asNumpy()
which would just execute the Numpy commands that the tree represents. If on the other hand, you wanted to use CUDA, you would do:
cudaGrad_W = grad_W.asCUDA()
which would convert the tree into expressions that can be executed via CUDA (this could happen in a couple of different ways).
That way it should be trivial to: (1) test grad_W.asNumpy() == grad_W.asCUDA(), and (2) convert your pre-existing code to use CUDA.
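To make the idea concrete, here is a toy sketch of the tree-building class (two operations only, and only the Numpy evaluation path):

import numpy as np

class Sym(object):
    """Expression-tree node: records the operation instead of computing it."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __mul__(self, other):
        return Sym('mul', self, other)
    def asNumpy(self):
        vals = [a.asNumpy() if isinstance(a, Sym) else a for a in self.args]
        if self.op == 'leaf':
            return vals[0]
        if self.op == 'mul':
            return vals[0] * vals[1]
        if self.op == 'dot':
            return np.dot(vals[0], vals[1])

def array(x):
    return Sym('leaf', x)

def dot(a, b):
    return Sym('dot', a, b)

W = array(np.ones((2, 2)))
expr = dot(W, W) * 2.0        # nothing is computed yet; expr is a tree
print(expr.asNumpy())         # now the Numpy commands actually run

An asCUDA method would walk the same tree and emit CudaMat calls instead of executing Numpy ones.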
Have you looked at the GPUArray portion of PyCUDA?
http://documen.tician.de/pycuda/array.html
While I haven't used it myself, it seems like it would be what you're looking for. In particular, check out the "Single-pass Custom Expression Evaluation" section near the bottom of that page.