Optimizing for PyPy - python

(This is a follow-up to Statistical profiler for PyPy)
I'm running some Python code under PyPy and would like to optimize it.
In CPython, I would use statprof or line_profiler to find out which exact lines are causing the slowdown and try to work around them. Under PyPy, though, neither tool reports sensible results, as PyPy may optimize some lines away. I would also prefer not to use cProfile, as I find it very difficult to distil which part of the reported function is the bottleneck.
Does anyone have some tips on how to proceed? Perhaps another profiler which works nicely under PyPy? In general, how does one go about optimizing Python code for PyPy?

If you understand the way the PyPy architecture works, you'll realize that trying to pinpoint individual lines of code isn't really productive. You start with a Python interpreter written in RPython, which then gets run through a tracing JIT that generates flow graphs and then transforms those graphs to optimize the RPython interpreter. What this means is that the layout of the Python code being run by the JIT'ed RPython interpreter may have a very different structure from the optimized assembler that actually runs. Furthermore, keep in mind that the JIT always works on a loop or a function, so line-by-line stats are not as meaningful. Consequently, I think cProfile may really be a good option for you, since it will give you an idea of where to concentrate your optimization. Once you know which functions are your bottlenecks, you can spend your optimization efforts targeting those slower functions, rather than trying to fix a single line of Python code.
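As a rough sketch of that workflow (the function names here are made up, not from the question), run cProfile over your entry point and sort by cumulative time to see which functions dominate:

    import cProfile
    import pstats

    def hot_loop(n):                  # stand-in for a function you suspect is slow
        return sum(i * i for i in range(n))

    def simulate():                   # hypothetical entry point of your program
        return [hot_loop(100000) for _ in range(20)]

    cProfile.run("simulate()", "prof.out")           # collect per-function statistics
    stats = pstats.Stats("prof.out")
    stats.sort_stats("cumulative").print_stats(10)   # show the ten most expensive calls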
Keep in mind as you do this that PyPy has very different performance characteristics from CPython. Always try to write code in as simple a way as possible (that doesn't mean as few lines as possible, by the way). There are a few other heuristics that help, such as using specialized lists, using objects instead of dicts when you have a small number of mostly constant keys, avoiding C extensions that use the CPython C API, etc.
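For example (my own illustration of the "objects over dicts" heuristic, not from the original answer), a record with a small, fixed set of keys is a good candidate for a plain class:

    # Instead of passing dicts like {"x": 0.0, "y": 0.0, "vx": 1.0, "vy": 0.5} around,
    # give the record a fixed set of attributes; PyPy's JIT handles this shape well.
    class Particle(object):
        def __init__(self, x, y, vx, vy):
            self.x = x
            self.y = y
            self.vx = vx
            self.vy = vy

        def step(self, dt):
            self.x += self.vx * dt
            self.y += self.vy * dt

    p = Particle(0.0, 0.0, 1.0, 0.5)
    p.step(0.1)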
If you really, really insist on trying to optimize at the line level, there are a few options. One is called JitViewer (https://foss.heptapod.net/pypy/jitviewer), which will give you a very low-level view of what the JIT is doing to your code. For instance, you can even see the assembler instructions which correspond to a Python loop. Using that tool, you can really get a sense of just how fast PyPy will be with certain parts of the code, as you can now do silly things like count the number of assembler instructions used for your loop.

Related

Using C/C++ for heavy calculations in Python (Also MySQL)

I'm implementing an algorithm into my Python web application, and it includes doing some (possibly) large clustering and matrix calculations. I've seen that Python can use C/C++ libraries, and thought that it might be a good idea to utilize this to speed things up.
First: Are there any reasons not to, or anything I should keep in mind while doing this?
Second: I have some reluctance about connecting C to MySQL (which is where I would get the data for the calculations). Is this in any way justified?
Use the ecosystem.
For matrices, using numpy and scipy can provide approximately the same range of functionality as tools like Matlab. If you learn to write idiomatic code with these modules, the inner loops can take place in the C or FORTRAN implementations of the modules, resulting in C-like overall performance with Python expressiveness for most tasks. You may also be interested in numexpr, which can further accelerate and in some cases parallelize numpy/scipy expressions.
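As a contrived sketch of what "idiomatic" means here (my example, with made-up data): the same reduction written as a Python loop and as a numpy call, where only the latter runs its inner loop in compiled code:

    import numpy as np

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    # Pure-Python inner loop: every iteration goes through the interpreter.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y

    # Idiomatic numpy: the same dot product runs in optimized C/BLAS code.
    total_vec = float(np.dot(a, b))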
If you must write compute-intensive inner loops in Python, think hard about it first. Maybe you can reformulate the problem in a way more suited to numpy/scipy. Or, maybe you can use data structures available in Python to come up with a better algorithm rather than a faster implementation of the same algorithm. If not, there’s Cython, which uses a restricted subset of Python to compile to machine code.
Only as a last resort, and after profiling to identify the absolute worst bottlenecks, should you consider writing an extension module in C/C++. There are just so many easier ways to meet the vast majority of performance requirements, and numeric/mathematical code is an area with very good existing library support.
Not the answer you expected, but I have been down that road and advise KISS:
First make it work in the most simple way possible.
Only then look into speeding things up / complicating the design later.
There are lots of other ways to phrase this such as "do not fix hypothetical problems unless resources are unlimited".
Cython's support for C++ is much better than it used to be. You can use most of the standard library in Cython seamlessly. There are speedups of up to 500x in the extreme best case.
My experience is that it is best to keep the Cython code extremely thin and forward all arguments to C++. It is much easier to debug C++ directly, and its syntax is better understood. Having to maintain a code base unnecessarily in three different languages is a pain.
Using C++/Cython means that you have to spend a little time thinking about ownership issues. That is, it is often safest not to allocate anything in C++ but to prepare the memory in Python / Cython (use array.array or numpy.array). Alternatively, make a C++ object wrapped in Cython which has a deallocation function. All this means that your application will be more fragile than if it were written only in Python or only in C++: you are abandoning both RAII and the garbage collector.
On the other hand, your Python code should translate line for line into modern C++. So this reminds you not to use old-fashioned new or delete, etc., in your new C++ code, but to make things fast and clean by keeping the abstractions at a high level.
Remember too to re-examine the assumptions behind your original algorithmic choices. What is sensible for Python might be foolish for C++.
Finally, Python makes everything significantly simpler, cleaner and faster to debug than C++. But in many ways, C++ encourages more powerful abstractions and better separation of concerns.
When you program with Python and Cython and C++, it slowly comes to feel like taking the worst bits of both approaches. It might be worth biting the bullet and rewriting completely in C++. You can keep the Python test harness and use the original design as a prototype / testbed.

Can I improve python runtime by compiling?

I'm writing a small toy simulation in Python. Granted, these simulations are slow. To my understanding, the major reason Python code is slow is the fact that Python is an interpreted language. I don't want to give up Python, since the clear syntax and the available libraries cut the writing time significantly. So is there a simple way for me to "compile" my Python code?
Edit
I answer some questions:
Yes, I'm using numpy. It greatly simplifies the code and I don't think I can improve performance by writing the functions on my own. I use numpy for all my lists and I add all of the beads together. Namely, I invoke

    pos += V*dt + forces*0.5*dt**2

where pos, V, and forces are all np.arrays with dimensions (2000, 3).
I'm quite certain that the slow part is the forces calculation. This is logical, as I have to iterate over all my particles and check their positions. For my real project (Ph.D. stuff) I have code of roughly the same level of complexity, and I know that this is the expensive part.
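As an aside (not part of the original question), the all-pairs force pattern described above can often be moved from a Python loop into numpy broadcasting; a minimal sketch with a made-up, softened inverse-square interaction:

    import numpy as np

    n = 2000
    pos = np.random.rand(n, 3)
    eps = 1e-3                         # softening term to avoid division by zero

    # diff[i, j] = pos[j] - pos[i] for every pair at once; shape (n, n, 3).
    # Note this uses O(n^2) memory, which is fine for n = 2000.
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    dist2 = np.sum(diff * diff, axis=-1) + eps                       # shape (n, n)

    # Sum a 1/r^2-style contribution over all partners j for each particle i.
    forces = np.sum(diff / dist2[..., np.newaxis] ** 1.5, axis=1)    # shape (n, 3)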
If none of the solutions in the comments suffice, you can also take a look at Cython.
For a quick tutorial & example check:
http://docs.cython.org/src/tutorial/cython_tutorial.html
Used at the correct spots (e.g. around frequently called functions) it can easily speed things up by a factor of 10 - 100.
Python is a slightly odd language in that it is both interpreted and compiled. Well, sort of. When you run it, it is compiled to ".pyc" bytecode - so we can quickly get bogged down in semantic details here. Hell, I don't even know if what I just said is strictly accurate. But at the end of the day you want to speed things up, so...
First, use the profiler and timeit to work out where all the time is going (see the timeit sketch at the end of this answer)
Second, rewrite your pure python code to improve the slow bits you've discovered
Third, see how it goes when optimised
Now, it depends on your scenario, but seriously think: "Can I run it on a bigger CPU / more memory?"
Ok, try rewriting those slow sections in C++
Screw it, write it all in C++
If you get as far as the last option, I dare say you're screwed and the savings aren't going to be significant.
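For the first step, a minimal timeit sketch (the function being timed is made up) might look like:

    import timeit

    def update_positions(pos, v, dt):        # hypothetical slow bit you've identified
        return [p + vi * dt for p, vi in zip(pos, v)]

    pos = [float(i) for i in range(10000)]
    v = [0.5] * 10000

    # Run the snippet 100 times per measurement and keep the best of 5 measurements,
    # which filters out interference from other processes.
    best = min(timeit.repeat(lambda: update_positions(pos, v, 0.1), number=100, repeat=5))
    print(best)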

Convert python script to binary executable

I wrote some number-crunching Python code. The calculations involved can take hours. Is it possible somehow to compile it to binary?
Thanks
Not in any useful (for you) way, but moving the calculations into NumPy or Cython will speed them up.
First you can try Psyco; that may give you a speedup of as much as 10x, but 2x is more typical.
If you can post the code up somewhere, perhaps someone can point out how to leverage numpy.
If your task doesn't map well onto numpy, then Cython is a good choice for converting an intensive function or two into C code just by adding a few cdefs.
If you can show us the code (even just the hot spots) we can probably give you better advice.
Perhaps you can modify your algorithm
Shed Skin might be worth a try.
From their front page blurb:
Shed Skin is an experimental compiler that can translate pure, but implicitly statically typed, Python programs into optimized C++. It can generate stand-alone programs or extension modules that can be imported and used in larger Python programs.
Besides the typing restriction, programs cannot freely use the Python standard library (although about 20 common modules, such as random and re, are currently supported). Also, not all Python features, such as nested functions and variable numbers of arguments, are supported (see the tutorial for details).
For a set of 44 non-trivial test programs (at over 10,000 lines in total (sloccount)), measurements show a typical speedup of 2-40 times over Psyco, and 2-220 times over CPython. Because Shed Skin is still in an early stage of development, however, many other programs will not compile out-of-the-box.

I need to speed up a function. Should I use cython, ctypes, or something else?

I'm having a lot of fun learning Python by writing a genetic programming type of application.
I've had some great advice from Torsten Marek, Paul Hankin and Alex Martelli on this site.
The program has 4 main functions:
generate (randomly) an expression tree.
evaluate the fitness of the tree
crossbreed
mutate
As generate, crossbreed and mutate all call 'evaluate the fitness', it is the busiest function and is the primary bottleneck speed-wise.
As is the nature of genetic algorithms, it has to search an immense solution space, so the faster the better. I want to speed up each of these functions. I'll start with the fitness evaluator. My question is: what is the best way to do this? I've been looking into Cython, ctypes and 'linking and embedding'. They are all new to me and quite beyond me at the moment, but I look forward to learning one and eventually all of them.
The 'fitness function' needs to compare the value of the expression tree to the value of the target expression. So it will consist of a postfix evaluator which will read the tree in postfix order. I have all the code in Python.
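A stack-based postfix evaluator of the kind described might look roughly like this (a sketch only; the operator set and token format are assumptions, not the question's actual code):

    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}   # assumed operator set

    def evaluate_postfix(tokens):
        """Evaluate a postfix token list such as [2, 3, '+', 4, '*']."""
        stack = []
        for tok in tokens:
            if tok in OPS:
                b = stack.pop()
                a = stack.pop()
                stack.append(OPS[tok](a, b))
            else:
                stack.append(tok)
        return stack[0]

    assert evaluate_postfix([2, 3, "+", 4, "*"]) == 20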
I need advice on which I should learn and use now: Cython, ctypes, or linking and embedding.
Thank you.
Ignore everyone else's answers for now. The first thing you should learn to use is the profiler. Python comes with profile/cProfile; you should learn how to read the results and analyze where the real bottlenecks are. The goal of optimization is three-fold: reduce the time spent on each call, reduce the number of calls to be made, and reduce memory usage to reduce disk thrashing.
The first goal is relatively easy. The profiler will show you the most time-consuming functions and you can go straight to that function to optimize it.
The second and third goals are harder, since they mean you need to change the algorithm to reduce the need to make so many calls. Find the functions that have a high number of calls and try to find ways to reduce the need to call them. Utilize the built-in collections; they're very well optimized.
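For instance (my example, assuming "built-in collections" includes the collections module), hand-rolled bookkeeping often has an optimized counterpart:

    from collections import Counter, deque

    items = ["a", "b", "a", "c", "a"]

    # Hand-rolled counting loop...
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1

    # ...versus the optimized built-in, which is shorter and faster.
    counts = Counter(items)

    # deque gives O(1) appends and pops at both ends, unlike list.pop(0).
    queue = deque([1, 2, 3])
    queue.appendleft(0)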
If you're doing a lot of number and array processing, you should take a look at the pandas, NumPy/SciPy and gmpy third-party modules; they're well-optimised C libraries for processing arrays/tabular data.
Another thing you want to try is PyPy. PyPy can JIT-compile and do much more advanced optimisation than CPython, and it'll work without the need to change your Python code, though well-optimised code targeting CPython can look quite different from well-optimised code targeting PyPy.
Next to try is Cython. Cython is a slightly different language from Python; in fact, it is best described as C with a typed, Python-like syntax.
For parts of your code that are in very tight loops that you can no longer optimize in any other way, you may want to rewrite them as a C extension. Python has very good support for extending with C. In PyPy, the best way to write an extension is with cffi.
Cython is the quickest way to get the job done, either by writing your algorithm directly in Cython, or by writing it in C and binding it to Python with Cython.
My advice: learn Cython.
Another great option is boost::python which lets you easily wrap C or C++.
Of these possibilities, though, since you have Python code already written, Cython is probably a good thing to try first. Perhaps you won't have to rewrite any code to get a speedup.
Try to work your fitness function so that it will support memoization. This will replace all calls that are duplicates of previous calls with a quick dict lookup.
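A minimal sketch of that idea using functools.lru_cache (the fitness body here is a stand-in, and it assumes the tree is hashable, e.g. a nested tuple rather than a list):

    from functools import lru_cache

    @lru_cache(maxsize=None)          # unbounded cache; repeat calls become dict lookups
    def fitness(tree):
        # Stand-in for the real, expensive evaluation of the expression tree.
        return sum(hash(node) % 97 for node in tree)

    t = (1, 2, (3, 4))
    fitness(t)     # computed
    fitness(t)     # answered from the cache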

How much of NumPy and SciPy is in C?

Are parts of NumPy and/or SciPy programmed in C/C++?
And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?
I'm just wondering if Python is a better option than Java or C# for scientific apps.
If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.
I would question any benchmark which doesn't show the source for each implementation (or did I miss something)? It's entirely possible that either or both of those solutions are coded badly which would result in an unfair appraisal of either or both language's performance. [Edit] Oops, now I see the source. As others have pointed out though, it's not using the NumPy/SciPy libraries so those benchmarks are not going to help you make a decision.
I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
It probably depends what you're doing in any of those languages as to how much overhead there is for a particular application.
I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.
What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.
Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
There is a better comparison here (not a benchmark, but it shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, swig, f2py) / C++ (boost.python, weave.inline, weave.blitz) / Fortran (f2py) - or even just by adding type annotations to Python so it can be processed to C (Cython). I don't think there are many comparably easy options for C# or Java - at least none that so seamlessly handle passing numerical arrays of different types (although I guess proponents would argue that since they don't have the performance penalty of Python, there is less need to).
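For example, here is about the smallest possible ctypes sketch, calling the C math library directly (it assumes a Unix-like system where find_library can locate libm):

    import ctypes
    import ctypes.util

    # Load the system C math library and describe the signature of cos().
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))   # 1.0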
A lot of it is written in C or Fortran. You can rewrite the hot loops in C (or use one of the gazillion ways to speed Python up; boost/weave is my favorite), but does it really matter?
Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker on Python.
Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.
I am not too familiar with VM-based low-level details, but I believe the interface cost would be higher because of the restrictions put on the JVM and the CLR. One of the reasons why NumPy is often faster than similar environments is the memory representation and how arrays are shared/passed between functions. Whereas most environments (Matlab and R as well, I believe) use copy-on-write to pass arrays between functions, NumPy uses references. But doing so in e.g. the JVM would be hard (because of restrictions on how you can use pointers, etc.). It is doable (an early port of NumPy for Jython exists), but I don't know how they solved this issue. Maybe C++/CLI would make this easier, but I have zero experience with that environment.
It always depends on your own ability to handle the language, so that the language can generate fast code. In my experience, numpy is several times slower than good .NET implementations, and I expect Java to be similarly fast. The optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.
numpy, on the other hand, comes with a syntax which is easier to use for those who are attuned to scripting languages. But when it comes to application development, those advantages often turn into obstacles and you will yearn for type safety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET. Personally, I tend towards C#, because it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.
