I am interested in rewriting an old Fortran code in Python. The code solves for a general field variable, call it F (velocity, temperature, pressure, etc.), but to solve each variable we have to declare an EQUIVALENCE between that variable and F.
For example, something like this:
EQUIVALENCE (F(1,1,1),TP(1,1)),(FOLD(1,1,1),TPOLD(1,1))
Is there a Python version of the above concept?
To my knowledge, there is no direct way to manipulate memory layout like that in pure Python.
You can perhaps simply use a list.
F=[]
and
FOLD=[]
When you do
F = FOLD
the names F and FOLD will refer to the same data, since assignment in Python binds a name rather than copying.
I would suggest using numpy and scipy to create the solvers and relying on Python idioms to make them efficient, instead of trying to mimic Fortran concepts, especially very old ones.
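If what you really need is two names aliasing the same storage (which is what EQUIVALENCE gives you), NumPy views come closest. A minimal sketch, with array names and shapes that are purely illustrative and not from the original code:

import numpy as np

# F is the generic 3-D field array; TP is meant to alias one 2-D slab of it,
# roughly the way EQUIVALENCE (F(1,1,1), TP(1,1)) aliases storage in Fortran.
F = np.zeros((10, 10, 5), order='F')   # Fortran-ordered, for familiarity
TP = F[:, :, 0]                        # a view: no copy, shares F's memory

TP[3, 4] = 42.0
print(F[3, 4, 0])                      # 42.0 -- writing through TP changes F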
I can't seem to find the code for numpy argmax.
The source link in the docs leads me here, which doesn't have any actual code.
I went through every function that mentions argmax using the GitHub search tool and still no luck. I'm sure I'm missing something.
Can someone lead me in the right direction?
Thanks
Numpy is written in C. It uses a template engine that parses special comments to generate many versions of the same generic function (typically one per supported type). This tool is very helpful for generating fast code, since the C language does not provide (proper) templates, unlike C++ for example. However, it also makes the code more cryptic than necessary, since function names are often generated. For example, a generic function name can look like #TYPE#_#OP#, where #TYPE# and #OP# are two macros that can each take different values. On top of all of this, the CPython binding also makes the code more complex, since the C functions have to be wrapped so they can be called from CPython code with arbitrary arrays (possibly with a large number of dimensions and custom user types) and CPython arguments to decode.
_PyArray_ArgMinMaxCommon is quite a good entry point, but it is only a wrapper and not the main computing function. It is only useful if you plan to change the prototype of the Numpy function as seen from Python.
The main computational function can be found here. The comment just above the function is the one used to generate the variants of the function (e.g. CDOUBLE_argmax). Note that there are some alternative implementations for specific types below the main one, like OBJECT_argmax, since CPython objects and strings must be handled a bit differently. Thank you for contributing to Numpy.
As mentioned in the comments, you'll likely find what you are searching for in the C implementation (here, under _PyArray_ArgMinMaxCommon). The code itself can be very convoluted, so if your intent was to open an issue on numpy with a broad idea, I would do it on the page you linked anyway.
The Question:
Given a sympy expression, is there an easy way to generate python code (in the end I want a .py or perhaps a .pyc file)? I imagine this code would contain a function that is given any necessary inputs and returns the value of the expression.
Why
I find myself pretty frequently needing to generate python code to compute something that is nasty to derive, such as the Jacobian matrix of a nasty nonlinear function.
I can use sympy to derive the expression for the nonlinear thing I want: very good. What I then want is to generate python code from the resulting sympy expression, and save that python code to its own module. I've done this before, but I had to:
Call str(sympyResult)
Do custom things with regular expressions to get this string to look like valid python code
write this python code to a file
I note that sympy has code generation capabilities for several other languages, but not python. Is there an easy way to get python code out of sympy?
I know of several possible but problematic ways around this problem:
I know that I could just call evalf on the sympy expression and plug in the numbers I want. This has several unfortunate side effects:
dependency: my code now depends on sympy to run. This is bad.
efficiency: sympy now must run every time I numerically evaluate: even if I pickle and unpickle the expression, I still need evalf every time.
I also know that I could generate, say, C code and then wrap that code using a host of tools (python/C api, cython, weave, swig, etc...). This, however, means that my code now depends on there being an appropriate C compiler.
Edit: Summary
It seems that sympy.python, or possibly just str(expression), is what there is (see the answer from smichr and the comment from Oliver W.), and they work for simple scalar expressions.
That doesn't help much with things like Jacobians, but then it seems that sympy.printing.print_ccode chokes on matrices as well. I suppose code that could handle the printing of matrices to another language would have to assume matrix support in the destination language, which for python would probably mean reliance on the presence of things like numpy. It would be nice if such a way to generate numpy code existed, but it seems it does not.
If you don't mind having a SymPy dependency in your code itself, a better solution is to generate the SymPy expression in your code and use lambdify to evaluate it. This will be much faster than using evalf, especially if you use numpy.
You could also look at using the printer in sympy.printing.lambdarepr directly, which is what lambdify uses to convert an expression into a lambda function.
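For example, a rough sketch of the lambdify route, reusing the same Jacobian as the python() example in the answer below:

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
J = sp.Matrix([[x**2, sp.exp(y) + x]]).jacobian([x, y])

# lambdify compiles the symbolic Jacobian into an ordinary Python function
# that calls numpy, instead of re-running sympy's evalf on every evaluation
jac = sp.lambdify((x, y), J, 'numpy')

print(jac(2.0, 0.0))   # evaluates the 2x2 Jacobian numerically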
The function you are looking for to generate python code is python (i.e. sympy.python). Although it generates python code, that code will need some tweaking to remove its dependence on SymPy objects, as Oliver W pointed out.
>>> import sympy as sp
>>> x = sp.Symbol('x')
>>> y = sp.Symbol('y')
>>> print(sp.python(sp.Matrix([[x**2,sp.exp(y) + x]]).jacobian([x, y])))
x = Symbol('x')
y = Symbol('y')
e = MutableDenseMatrix([[2*x, 0], [1, exp(y)]])
I've been using the simplify function in Sympy to simplify some long complicated equations, but it's not proving sufficient, as it frequently does not simplify things as much as possible, giving my program numerical errors when it comes to solving the equations.
Does anyone know of any other symbolic engines with a simplify function that can be used instead?
Many thanks.
Maybe you could use Python's subprocess module to run Maxima on behalf of your Python program? This is what maxima-mode in Emacs does, so just do something similar: start Maxima, keep handles to its input/output, feed it equations and let it mangle them as you desire (Maxima has lots of equation-transforming functions), and read back the result from the output handle.
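A minimal sketch of that idea, assuming Maxima is on the PATH and accepts the --very-quiet flag (flag names and output formatting vary between Maxima versions):

import subprocess

def maxima_simplify(expr):
    # Pipe one expression through Maxima's ratsimp and return whatever it prints.
    # --very-quiet is meant to suppress the banner and prompts.
    proc = subprocess.run(
        ['maxima', '--very-quiet'],
        input='ratsimp(' + expr + ');\n',
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()

print(maxima_simplify('(x^2 - 1)/(x - 1)'))   # expect something like x + 1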
Sympy vs. Maxima
pyMaxima
I'm looking into speeding up my python code, which is all matrix math, using some form of CUDA. Currently my code is using Python and Numpy, so it seems like it shouldn't be too difficult to rewrite it using something like either PyCUDA or CudaMat.
However, on my first attempt using CudaMat, I realized I had to rearrange a lot of the equations in order to keep the operations all on the GPU. This included the creation of many temporary variables so I could store the results of the operations.
I understand why this is necessary, but it turns what were once easy-to-read equations into somewhat of a mess that is difficult to inspect for correctness. Additionally, I would like to be able to easily modify the equations later on, which isn't easy in their converted form.
The package Theano manages to do this by first creating a symbolic representation of the operations, then compiling them to CUDA. However, after trying Theano out for a bit, I was frustrated by how opaque everything was. For example, just getting the actual value of myvar.shape[0] is made difficult since the tree doesn't get evaluated until much later. I would also much prefer less of a framework, in which my code must conform to a library that acts invisibly in the place of Numpy.
Thus, what I would really like is something much simpler. I don't want automatic differentiation (there are other packages like OpenOpt that can do that if I require it), or optimization of the tree, but just a conversion from standard Numpy notation to CudaMat/PyCUDA/somethingCUDA. In fact, I want to be able to have it evaluate to just Numpy without any CUDA code for testing.
I'm currently considering writing this myself, but before I even consider such a venture, I wanted to see if anyone else knows of similar projects or a good starting place. The only other project I know of that might be close to this is SymPy, but I don't know how easy it would be to adapt to this purpose.
My current idea would be to create an array class that looks like a Numpy array class. Its only function would be to build a tree. At any time, that symbolic array class could be converted to a Numpy array and evaluated (there would also be one-to-one parity). Alternatively, the array class could be traversed and have CudaMat commands generated from it. If optimizations are required, they can be done at that stage (e.g. re-ordering of operations, creation of temporary variables, etc.) without getting in the way of inspecting what's going on.
Any thoughts/comments/etc. on this would be greatly appreciated!
Update
A usage case may look something like the following (where sym is the theoretical module), for something such as calculating a gradient:
W = sym.array(np.random.rand(numVisible, numHidden))
delta_o = -(x - z)
delta_h = sym.dot(delta_o, W)*h*(1.0-h)
grad_W = sym.dot(X.T, delta_h)
In this case, grad_W would actually just be a tree containing the operations that needed to be done. If you wanted to evaluate the expression normally (i.e. via Numpy) you could do:
npGrad_W = grad_W.asNumpy()
which would just execute the Numpy commands that the tree represents. If on the other hand, you wanted to use CUDA, you would do:
cudaGrad_W = grad_W.asCUDA()
which would convert the tree into expressions that can be executed via CUDA (this could happen in a couple of different ways).
That way it should be trivial to: (1) test grad_W.asNumpy() == grad_W.asCUDA(), and (2) convert your pre-existing code to use CUDA.
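As a very rough sketch of what such a tree-building array class could look like (the class and method names are illustrative, not an existing library; only a couple of operations are implemented and the CUDA backend is left out):

import numpy as np

class LazyArray:
    # Builds an expression tree instead of computing immediately.
    def __init__(self, op, *args, value=None):
        self.op, self.args, self.value = op, args, value

    @staticmethod
    def array(data):
        return LazyArray('leaf', value=np.asarray(data))

    def __sub__(self, other):
        return LazyArray('sub', self, _wrap(other))

    def dot(self, other):
        return LazyArray('dot', self, _wrap(other))

    def asNumpy(self):
        # Walk the tree and execute it with plain NumPy; an asCUDA() backend
        # would traverse the same tree and emit CudaMat/PyCUDA calls instead.
        if self.op == 'leaf':
            return self.value
        a, b = (arg.asNumpy() for arg in self.args)
        return {'sub': np.subtract, 'dot': np.dot}[self.op](a, b)

def _wrap(x):
    return x if isinstance(x, LazyArray) else LazyArray.array(x)

# usage: build a tree first, evaluate it later
a = LazyArray.array([[1.0, 2.0], [3.0, 4.0]])
b = LazyArray.array([[1.0, 0.0], [0.0, 1.0]])
expr = a.dot(b) - a          # no computation yet, just a tree
print(expr.asNumpy())        # executes the tree with NumPy (all zeros here)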
Have you looked at the GPUArray portion of PyCUDA?
http://documen.tician.de/pycuda/array.html
While I haven't used it myself, it seems like it would be what you're looking for. In particular, check out the "Single-pass Custom Expression Evaluation" section near the bottom of that page.
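If it helps, here is a small sketch of gpuarray plus an ElementwiseKernel doing a fused single-pass evaluation (this assumes a working PyCUDA/CUDA installation; the kernel itself is just an illustration):

import numpy as np
import pycuda.autoinit                      # creates a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

# single fused kernel for z = a*x + y, evaluated entirely on the GPU
axpy = ElementwiseKernel(
    "float a, float *x, float *y, float *z",
    "z[i] = a * x[i] + y[i]",
    "axpy",
)

x = gpuarray.to_gpu(np.random.rand(1000000).astype(np.float32))
y = gpuarray.to_gpu(np.random.rand(1000000).astype(np.float32))
z = gpuarray.empty_like(x)

axpy(np.float32(2.0), x, y, z)
print(z.get()[:5])                          # copy back to the host to inspect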
I'm having a lot of fun learning Python by writing a genetic programming type of application.
I've had some great advice from Torsten Marek, Paul Hankin and Alex Martelli on this site.
The program has 4 main functions:
generate (randomly) an expression tree.
evaluate the fitness of the tree
crossbreed
mutate
As generate, crossbreed and mutate all call 'evaluate the fitness', it is the busiest function and the primary bottleneck speed-wise.
As is the nature of genetic algorithms, it has to search an immense solution space so the faster the better. I want to speed up each of these functions. I'll start with the fitness evaluator. My question is what is the best way to do this. I've been looking into cython, ctypes and 'linking and embedding'. They are all new to me and quite beyond me at the moment but I look forward to learning one and eventually all of them.
The 'fitness function' needs to compare the value of the expression tree to the value of the target expression. So it will consist of a postfix evaluator which will read the tree in a postfix order. I have all the code in python.
I need advice on which I should learn and use now: cython, ctypes or linking and embedding.
Thank you.
Ignore everyone else's answers for now. The first thing you should learn to use is the profiler. Python comes with profile/cProfile; you should learn how to read the results and analyze where the real bottlenecks are. The goal of optimization is three-fold: reduce the time spent on each call, reduce the number of calls made, and reduce memory usage to reduce disk thrashing.
The first goal is relatively easy. The profiler will show you the most time-consuming functions and you can go straight to that function to optimize it.
The second and third goals are harder, since they mean you need to change the algorithm to reduce the need for so many calls. Find the functions with a high number of calls and try to find ways to reduce the need to call them. Utilize the built-in collections; they're very well optimized.
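A quick sketch of what that first profiling step can look like (the functions here are just placeholders for the real GP code):

import cProfile
import pstats

def evaluate_fitness(tree):                 # placeholder for the real evaluator
    return sum(node * node for node in tree)

def run_generation():
    population = [list(range(50)) for _ in range(1000)]
    return [evaluate_fitness(t) for t in population]

cProfile.run('run_generation()', 'gp.prof')

stats = pstats.Stats('gp.prof')
stats.sort_stats('cumulative').print_stats(10)   # top 10 by cumulative time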
If you're doing a lot of number and array processing, you should take a look at pandas, Numpy/Scipy, gmpy third party modules; they're well optimised C libraries for processing arrays/tabular data.
Another thing you want to try is PyPy. PyPy can JIT-compile your code and do much more advanced optimisations than CPython, and it'll work without the need to change your Python code, though well-optimised code targeting CPython can look quite different from well-optimised code targeting PyPy.
Next to try is Cython. Cython is a slightly different language from Python; in fact, it is best described as C with a typed, Python-like syntax.
For parts of your code that sit in very tight loops and that you can no longer optimize any other way, you may want to rewrite them as a C extension. Python has very good support for extending with C. In PyPy, the best way to extend is with cffi.
Cython is the quickest way to get the job done, either by writing your algorithm directly in Cython, or by writing it in C and binding it to Python with Cython.
My advice: learn Cython.
Another great option is boost::python which lets you easily wrap C or C++.
Of these possibilities though, since you have python code already written, cython is probably a good thing to try first. Perhaps you won't have to rewrite any code to get a speedup.
Try to work your fitness function so that it will support memoization. This will replace all calls that are duplicates of previous calls with a quick dict lookup.
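A small sketch of that idea: if the tree can be flattened into a hashable key (a tuple of postfix tokens here, which is just my stand-in representation), functools.lru_cache does the dict lookup for you:

from functools import lru_cache
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

@lru_cache(maxsize=None)
def evaluate_postfix(tokens):
    # tokens is a hashable tuple of postfix tokens, so it doubles as the cache key
    stack = []
    for tok in tokens:
        if tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))
    return stack[0]

def fitness(tokens, target):
    # duplicate trees now cost a dict lookup instead of a full re-evaluation
    return abs(evaluate_postfix(tokens) - target)   # error: lower is better

print(fitness(('2', '3', '+', '4', '*'), target=20.0))   # (2+3)*4 = 20 -> 0.0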