I am taking my first steps with Cython, and I am wondering how to improve performance further.
So far I have halved the usual (Python-only) execution time, but I think there must be more to gain!
I know about cython -a and I have already typed my variables. But there is still a lot of yellow in my function. Is this because Cython does not recognise numpy, or is there something else I am missing?
I believe you can benefit from using the math functions from libc, since you are calling np.sqrt and np.floor on scalars. This has not only the Python call overhead, but the numpy ufuncs also take different code paths for scalars and arrays, so every call involves at least a type switch.
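For example, a minimal sketch of what that swap looks like in a .pyx file (the function and variable names here are just illustrative):

    from libc.math cimport sqrt, floor  # C library functions, no Python overhead

    def compute(double x):
        # Direct C calls: no Python object creation, no ufunc dispatch
        cdef double root = sqrt(x)    # instead of np.sqrt(x)
        cdef double base = floor(x)   # instead of np.floor(x)
        return root + base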
I think it's not a problem. I've tested with the official tutorial: every np.* line there is also reported as yellow and involves Python, just the same as your code.
Point 3 at the end of that page should have explained this:
Calling NumPy/SciPy functions currently has a Python call overhead; it would be possible to take a short-cut from Cython directly to C. (This does however require some isolated and incremental changes to those libraries; mail the Cython mailing list for details).
I am fairly new to using Cython and I am interested in using the "Pure Python" mode.
The work that I am doing right now uses numpy extensively, and knowing that there is a C API for numpy, I was excited to see what it could do.
As a small test, I put together two small test files, test.py and test.pxd. Their content is as follows:
test.py:
    import cython
    import numpy as np

    @cython.locals(array=np.ndarray)
    @cython.returns(np.ndarray)
    def test(array):
        return np.cumsum(array)

    test_array = np.array([1, 2, 3, 4, 5])
    test(test_array)
test.pxd:
    # cython: language_level=3
    cimport numpy as np

    cdef np.ndarray test(np.ndarray array)
I then compiled these files with cython -a test.py, hoping to see little to no Python interaction when calling np.cumsum(). However, when I inspected the generated HTML file, I found that the line calling np.cumsum was still highlighted almost entirely in yellow.
From this, it appears that my call to np.cumsum heavily interacts with Python, which feels counter-intuitive. My expectation, since I should be using the cimported numpy, is that there should be very little Python interaction.
My question is: is my intuition correct? Have I set something up incorrectly in my files that is preventing the cimported numpy from actually being used for the function call, and that is why I am still seeing so much yellow? Or am I fundamentally misunderstanding something?
Thanks for reading!
Defining the types as np.ndarray mainly improves one thing: it makes indexing them to get single values significantly faster. Almost everything else remains the same speed.
np.cumsum (and any other Numpy function) is called through the standard Python mechanism and runs at exactly the same speed (internally, of course, it's implemented in C and should be quite quick). Mathematical operators (such as +, -, *, etc.) are also called through Python and remain the same speed.
In reality, your wrapping probably makes it slower: it adds an unnecessary type check (to make sure the array really is an np.ndarray) and an extra layer of indirection.
There is nothing to be gained through typing here.
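For completeness, a minimal sketch of the one kind of code where element typing does pay off, indexing individual values in a tight loop (the function is illustrative, not from the question; a typed memoryview is used rather than np.ndarray, which is the currently recommended spelling):

    def sum1d(double[:] arr):
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(arr.shape[0]):
            total += arr[i]   # compiles to a raw memory access, no Python call
        return total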
I'm working on improving the performance of a Python program, and I was wondering if there is a way to improve performance by disabling array bounds checking. I know that some versions of Pascal let you do this. Does Python have any such feature?
I know that Python isn't really designed for high performance, but I'd like to know whether it's possible to improve performance in this manner; otherwise, yes, I am aware that switching to C would be faster.
Disabling array bounds checking cannot be done in Python as far as I know.
It can, however, be done in cython fairly easily with the directive
    # cython: boundscheck=False

placed at the top of the .pyx file.
However, since "premature optimization is the root of all evil" (Knuth), you might want to first check whether your script runs faster simply by switching to PyPy instead of CPython.
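The same directive can also be applied per function with a decorator, which is safer than disabling checks module-wide. A minimal sketch (the function is hypothetical):

    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def fill(double[:] arr, double value):
        cdef Py_ssize_t i
        for i in range(arr.shape[0]):
            arr[i] = value   # no bounds check is emitted for this access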
For ordinary lists, no. However, you can create your own data structures / functions in C or C++ using the Python C API (see Extending Python with C or C++), which you could use to implement a dangerous_unchecked_list data structure that behaves like a regular list but without this checking (or some other higher-level data structure that needs to bypass bounds checking internally). For much larger operations that can't be implemented optimally in Python, you could also use interprocess communication (IPC), as opposed to the C API, to invoke a non-Python implementation.
However, before going down this route, you should make sure that this is truly the performance bottleneck. You may find that there are other areas where you can get bigger wins, such as using a better algorithm, choosing other functions or data structures to implement natively that actually are the performance bottleneck, precomputing and/or caching information, or some other means.
I am working on rewriting a Python module, originally written in C using the Python C API, in Cython. The module also uses NumPy. A major challenge of the project is to maintain the current speed of the module, and it should also work for all NumPy data types. I am thinking of using fused types to make the code generic, but I am worried about their effect on performance. Is there any other technique I can use instead of fused types to achieve both speed and generic code?
Ignoring ali_m's perfectly valid comment about whether you've actually measured your performance issues...
http://docs.cython.org/src/userguide/fusedtypes.html#selecting-specializations
"For a cdef or cpdef function called from Cython this means that the specialization is figured out at compile time. For def functions the arguments are typechecked at runtime, and a best-effort approach is performed to figure out which specialization is needed."
Essentially, if you're calling from Cython there should be no issue - separate functions are generated and used without overhead. If you're calling from Python it obviously has to stop and think about which one to call.
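As a rough sketch of what a fused-type function looks like (the type list and names here are mine, not from the question):

    ctypedef fused number:
        float
        double
        long

    cpdef number total(number[:] arr):
        # From Cython, the right specialization is chosen at compile time;
        # from Python, it is dispatched at runtime based on the argument type.
        cdef number acc = 0
        cdef Py_ssize_t i
        for i in range(arr.shape[0]):
            acc += arr[i]
        return acc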
But measure your performance before worrying about it! (And read the manual, which answers your question quite clearly.)
I wrote some number-crunching Python code. The calculations involved can take hours. Is it possible somehow to compile it to a binary?
Thanks
Not in any useful (for you) way, but moving the calculations into NumPy or Cython will speed them up.
First, you can try Psyco; that may give you a speedup of as much as 10x, but 2x is more typical.
If you can post the code up somewhere, perhaps someone can point out how to leverage numpy.
If your task doesn't map well onto numpy, then cython is a good choice for converting an intensive function or two into C code just by adding a few cdefs.
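To make "a few cdefs" concrete, here is a hypothetical sketch in the style of the Cython tutorial's integration example (the function and the integrand are illustrative):

    def integrate_f(double a, double b, int n):
        cdef double dx = (b - a) / n
        cdef double s = 0.0
        cdef int i
        for i in range(n):
            s += (a + i * dx) ** 2 * dx   # f(x) = x**2 as a stand-in integrand
        return s

The cdef declarations let the loop compile down to plain C arithmetic instead of Python object operations.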
If you can show us the code (even just the hot spots) we can probably give you better advice.
Perhaps you can modify your algorithm.
Shedskin might be worth a try.
From their front page blurb:
Shed Skin is an experimental compiler, that can translate pure, but implicitly statically typed Python programs into optimized C++. It can generate stand-alone programs or extension modules that can be imported and used in larger Python programs.

Besides the typing restriction, programs cannot freely use the Python standard library (although about 20 common modules, such as random and re, are currently supported). Also, not all Python features, such as nested functions and variable numbers of arguments, are supported (see the tutorial for details).

For a set of 44 non-trivial test programs (at over 10,000 lines in total (sloccount)), measurements show a typical speedup of 2-40 times over Psyco, and 2-220 times over CPython. Because Shed Skin is still in an early stage of development, however, many other programs will not compile out-of-the-box.
Are parts of NumPy and/or SciPy programmed in C/C++?
And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?
I'm just wondering if Python is a better option than Java or C# for scientific apps.
If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.
I would question any benchmark which doesn't show the source for each implementation (or did I miss something?). It's entirely possible that either or both of those solutions are coded badly, which would result in an unfair appraisal of either or both languages' performance. [Edit] Oops, now I see the source. As others have pointed out, though, it's not using the NumPy/SciPy libraries, so those benchmarks are not going to help you make a decision.
I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
It probably depends what you're doing in any of those languages as to how much overhead there is for a particular application.
I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.
What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.
Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
There is a better comparison here (not a benchmark, but it shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways to very easily extend your code with C (ctypes, SWIG, f2py), C++ (boost.python, weave.inline, weave.blitz), or Fortran (f2py), or even just by adding type annotations to Python so it can be processed to C (cython). I don't think there is anything comparably easy for C# or Java, at least nothing that so seamlessly handles passing numerical arrays of different types (although I guess proponents would argue that since those languages don't have the performance penalty of Python, there is less need to).
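As one concrete illustration of how little ceremony the ctypes route needs, here is a sketch that calls the C math library's cos directly (this assumes a Unix-like system where libm can be located):

    import ctypes
    import ctypes.util

    # Load the C math library and declare the signature of cos()
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))  # 1.0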
A lot of it is written in C or Fortran. You can rewrite the hot loops in C (or use one of the gazillion ways to speed Python up; boost/weave is my favorite), but does it really matter?
Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker in Python.
Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.
I am not too familiar with VM-based low-level details, but I believe the interface cost would be higher because of the restrictions put on the JVM and the CLR. One of the reasons why numpy is often faster than similar environments is its memory representation and how arrays are shared/passed between functions. Whereas most environments (Matlab and R as well, I believe) use copy-on-write to pass arrays between functions, NumPy uses references. Doing the same in e.g. the JVM would be hard (because of restrictions on how to use pointers, etc.). It is doable (an early port of NumPy for Jython exists), but I don't know how they solved this issue. Maybe C++/CLI would make this easier, but I have zero experience with that environment.
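A tiny sketch of what passing by reference means in practice (the function name is hypothetical):

    import numpy as np

    def scale_in_place(arr, factor):
        arr *= factor           # mutates the caller's array; nothing is copied

    a = np.arange(5, dtype=float)
    scale_in_place(a, 2.0)
    print(a)                    # [0. 2. 4. 6. 8.] -- the caller sees the change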
It always depends on your own ability to handle the language, and on whether the language is able to generate fast code. In my experience, numpy is several times slower than good .NET implementations, and I expect Java to be similarly fast. The optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.
numpy, on the other hand, comes with a syntax that is easier to use for those who are attuned to scripting languages. But when it comes to application development, those advantages often turn into obstacles, and you will yearn for type safety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET. Personally, I tend towards C#, because it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.