Are parts of NumPy and/or SciPy programmed in C/C++?
And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?
I'm just wondering if Python is a better option than Java or C# for scientific apps.
If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.
I would question any benchmark which doesn't show the source for each implementation (or did I miss something)? It's entirely possible that either or both of those solutions are coded badly which would result in an unfair appraisal of either or both language's performance. [Edit] Oops, now I see the source. As others have pointed out though, it's not using the NumPy/SciPy libraries so those benchmarks are not going to help you make a decision.
I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
It probably depends what you're doing in any of those languages as to how much overhead there is for a particular application.
I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.
What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.
Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
There is a better comparison here (not a benchmark but shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, swig,f2py) / C++ (boost.python, weave.inline, weave.blitz) / Fortran (f2py) - or even just by adding type annotations to Python so it can be processed to C (cython). I don't think there are many things comparably easy for C# or Java - at least that so seemlessly handle passing numerical arrays of different types (although I guess proponents would argue since they don't have the performance penalty of Python there is less need to).
A lot of it is written in C or fortran. You can re-write the hot loops in C (or use one of the gazillion ways to speed python up, boost/weave is my favorite), but does it really matter?
Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker on Python.
Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.
I am not too familiar with vm-based low-level details, but I believe the interface cost would be higher because of the restrictions put on the jvm and the .clr. One of the reason why numpy is often faster than similar environments is the memory representation and how arrays are shared/passed between functions. Whereas most environments (Matlab and R as well I believe) use Copy-On-Write to pass arrays between functions, NumPy use references. But doing so in e.g. the JVM would be hard (because of restrictions on how to use pointer, etc...). It is doable (an early port of NumPy for Jython exists), but I don't know how they solve this issue. Maybe C++/Cli would make this easier, but I have zero experience with that environment.
It always depends on your own capability to handle the langue, so the language is able to generate fast code. Out of my experience, numpy is several times slower then good .NET implementations. And I expect JAVA to be similar fast. Their optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.
numpy on the other hand comes with a syntax wich is easier to use for those, which are attuned to scripting languages. But if it comes to application development, those advantages often turn to obstacles and you will yearn for typesafety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET.Personally I tend towards C#, bacause it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.
Related
The Julia Language syntax looks very similar to python, while the concept of a class (if one should address it as such a thing) is more what you use in C. There were many reasons why the creators decided on the difference with respect to the OOP. Still would it have been so hard (in comparison to create Julia in first place which is impressive) to find some canonical way to interpret python to Julia and thus get a hold of all the python libraries?
Yes. The design of Python makes it fundamentally difficult to optimize at compile-time (i.e. before you run the code). It is simply false that Julia is fast BECAUSE of its JIT. Rather, Julia is designed with its type system and multiple dispatch in mind so that way the compiler can know all of the necessary details to compile "the same code you would have written in C". That's what makes it fast: the type system. It makes a few trade-offs that allow it to, in "type-stable" functions, fully deduce what the types of every variable is, know what the memory layout of the type should be (including parametric types, so Vector{Float64} has a memory layout which is determined by the type and its parameter which inlines Float64 values like a NumPy array, except this is generalized in a way that your own struct types get the same efficiency), and compile a version of the code specifically for the types which are seen.
There are many ways where this is at odds with Python. For example, if the number of fields in a struct could change, then the memory layout could not be determined and thus these optimizations cannot occur at "compile-time". Julia was painstakingly designed to make sure that it would have type inferrability, and it uses that to generate code which is fully typed and remove all runtime checks (in type-stable functions. When a function is not type-stable, the types of the variables become dynamic rather than static and it slows down to Python-like speeds). In this sense, Julia actually isn't even optimized yet: all of its performance comes "for free" given the design of its type system. Python/MATLAB/R has to try really hard to optimize at runtime because it doesn't have the capability to do these deductions. In fact, those languages are "better optimized" right now in terms of runtime optimizations, but no one has really worked on runtime optimizations in Julia yet because in most performance sensitive cases you can get it all at compile time.
So then, what about Numba? Numba tries to take the route that Julia takes but with Python code by limiting what can be done so that way it can get type-stable code and compile that efficiently. However, this means a few things. First of all, it's not compatible with all Python codes or libraries. But more importantly, since Python is not a language built around its type system, the tools for controlling the code at the level of types is much reduced. So Numba doesn't have parametric vectors and generic codes which auto-specialize via multiple dispatch because these aren't features of the language. But that also means that it cannot make full use of the design, which limits how much it can do. It can handle the "use only floating point array" stuff just fine, but you can see it gets limited if you want one code to produce efficient code for "any number type, even ones I don't know about". However, by design, Julia does this automatically.
So at the core, Julia and Python are extremely different languages. It can be hard to see because Julia's syntax is close to Python's syntax, but they do not work the same at all.
This is a short summary of what I have described in a few blog posts. These go into more detail and show you how Julia is actually generating efficient code, how it gives you a generic "Python looking style" but doing so with full inferrability all the way down, and what the tradeoffs are.
How type-stability plus multiple dispatch gives performance:
http://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia
http://www.stochasticlifestyle.com/7-julia-gotchas-handle/
How the type system allows for highly performant generic designs
http://www.stochasticlifestyle.com/type-dispatch-design-post-object-oriented-programming-julia/
I'm working on improving the performance of a python program, and I was wondering if there was a way to improve performance by disabling array bounds checking? I know that some versions of Pascal let you do this. Does python have any such feature?
I know that Python isn't really designed for high performance, but I'd like to know if it's possible to improve performance in this manner; otherwise yes, I am aware that switching to C would be faster.
Disabling array bounds checking cannot be done in Python as far as I know.
It can, however, be done in cython fairly easily with the directive
#cython: boundscheck=False
However, since "premature optimization is the root of all evil." (Knuth), you might want to first check if your script runs faster by switching to pypy instead of python.
For ordinary lists, no. However, you can create your own datastructures / functions in C or C++ using the Python C API (see Extending Python with C or C++), which you could use to implement a dangerous_unchecked_list data structure that behaves like a regular list but without this checking (or some other higher level data structure that needs to bypass bounds-checking internally). For much larger operations that can't be implemented optimally in Python, you could also use interprocess communication (IPC) -- as opposed to the C API -- for invoking a non-Python implementation.
However, before going down this route, you should make sure that this is truly the performance bottleneck. You may find that there are other areas where you can get bigger wins, such as by using a better algorithm, by choosing other functions or datastructures to implement natively that actually are the peformance bottleneck, by precomputing and/or caching information, or by some other means.
I'm implementing an algorithm into my Python web application, and it includes doing some (possibly) large clustering and matrix calculations. I've seen that Python can use C/C++ libraries, and thought that it might be a good idea to utilize this to speed things up.
First: Are there any reasons not to, or anything I should keep in mind while doing this?
Second: I have some reluctance against connecting C to MySQL (where I would get the data the calculations). Is this in any way justified?
Use the ecosystem.
For matrices, using numpy and scipy can provide approximately the same range of functionality as tools like Matlab. If you learn to write idiomatic code with these modules, the inner loops can take place in the C or FORTRAN implementations of the modules, resulting in C-like overall performance with Python expressiveness for most tasks. You may also be interested in numexpr, which can further accelerate and in some cases parallelize numpy/scipy expressions.
If you must write compute-intensive inner loops in Python, think hard about it first. Maybe you can reformulate the problem in a way more suited to numpy/scipy. Or, maybe you can use data structures available in Python to come up with a better algorithm rather than a faster implementation of the same algorithm. If not, there’s Cython, which uses a restricted subset of Python to compile to machine code.
Only as a last resort, and after profiling to identify the absolute worst bottlenecks, should you consider writing an extension module in C/C++. There are just so many easier ways to meet the vast majority of performance requirements, and numeric/mathematical code is an area with very good existing library support.
Not the answer you expected, but i have been down that road and advise KISS:
First make it work in the most simple way possible.
Only than look into speeding things up later / complicating the design.
There are lots of other ways to phrase this such as "do not fix hypothetical problems unless resources are unlimited".
cython support for c++ is much better than what it was. You can use most of the standard library in cython seamlessly. There are up to 500x speedups in the extreme best case.
My experience is that it is best to keep the cython code extremely thin, and forward all arguments to c++. It is much easier to debug c++ directly, and the syntax is better understood. Having to maintain a code base unnecessarily in three different languages is a pain.
Using c++/cython means that you have to spend a little time thinking about ownership issues. I.e. it is often safest not to allocate anything in c++ but prepare the memory in python / cython. (Use array.array or numpy.array). Alternatively, make a c++ object wrapped in cython which has a deallocation function. All this means that your application will be more fragile than if it is written only in python or c++: You are abandoning both RAII / gc.
On the other hand, your python code should translate line for line into modern c++. So this reminds you not to use old fashioned new or delete etc in your new c++ code but make things fast and clean by keeping the abstractions at a high level.
Remember too to re-examine the assumptions behind your original algorithmic choices. What is sensible for python might be foolish for c++.
Finally, python makes everything significantly simpler and cleaner and faster to debug than c++. But in many ways, c++ encourages more powerful abstractions and better separation of concerns.
When you programme with python and cython and c++, it slowly comes to feel like taking the worse bits of both approaches. It might be worth biting the bullet and rewriting completely in c++. You can keep the python test harness and use the original design as a prototype / testbed.
I wrote a number crunching python code. The calculations involved can take hours. Is it possible somehow to compile it to binary?
Thanks
Not in any useful (for you) way, but moving the calculations into NumPy or Cython will speed them up.
First you can try psyco, that may give you a speed up as much as 10x, but 2x is more typical
If you can post the code up somewhere, perhaps someone can point out how to leverage numpy.
If your task doesn't map well only numpy then cython is a good choice to convert a intensive function or two into C code just by adding a few cdefs.
If you can show us the code (even just the hot spots) we can probably give you better advice.
Perhaps you can modify your algorithm
Shedskin might be worth a try.
From their front page blurb:
Shed Skin is an experimental compiler,
that can translate pure, but
implicitly statically typed Python
programs into optimized C++. It can
generate stand-alone programs or
extension modules that can be imported
and used in larger Python programs.
Besides the typing restriction,
programs cannot freely use the Python
standard library (although about 20
common modules, such as random and re,
are currently supported). Also, not
all Python features, such as nested
functions and variable numbers of
arguments, are supported (see the
tutorial for details).
For a set of 44 non-trivial test
programs (at over 10,000 lines in
total (sloccount)), measurements show
a typical speedup of 2-40 times over
Psyco, and 2-220 times over CPython.
Because Shed Skin is still in an early
stage of development, however, many
other programs will not compile
out-of-the-box.
I want to compute magnetic fields of some conductors using the Biot–Savart law and I want to use a 1000x1000x1000 matrix. Before I use MATLAB, but now I want to use Python. Is Python slower than MATLAB ? How can I make Python faster?
EDIT:
Maybe the best way is to compute the big array with C/C++ and then transfering them to Python. I want to visualise then with VPython.
EDIT2: Which is better in my case: C or C++?
You might find some useful results at the bottom of this link
http://wiki.scipy.org/PerformancePython
From the introduction,
A comparison of weave with NumPy, Pyrex, Psyco, Fortran (77 and 90) and C++ for solving Laplace's equation.
It also compares MATLAB and seems to show similar speeds to when using Python and NumPy.
Of course this is only a specific example, your application might be allow better or worse performance. There is no harm in running the same test on both and comparing.
You can also compile NumPy with optimized libraries such as ATLAS which provides some BLAS/LAPACK routines. These should be of comparable speed to MATLAB.
I'm not sure if the NumPy downloads are already built against it, but I think ATLAS will tune libraries to your system if you compile NumPy,
http://www.scipy.org/Installing_SciPy/Windows
The link has more details on what is required under the Windows platform.
EDIT:
If you want to find out what performs better, C or C++, it might be worth asking a new question. Although from the link above C++ has best performance. Other solutions are quite close too i.e. Pyrex, Python/Fortran (using f2py) and inline C++.
The only matrix algebra under C++ I have ever done was using MTL and implementing an Extended Kalman Filter. I guess, though, in essence it depends on the libraries you are using LAPACK/BLAS and how well optimised it is.
This link has a list of object-oriented numerical packages for many languages.
http://www.oonumerics.org/oon/
NumPy and MATLAB both use an underlying BLAS implementation for standard linear algebra operations. For some time both used ATLAS, but nowadays MATLAB apparently also comes with other implementations like Intel's Math Kernel Library (MKL). Which one is faster by how much depends on the system and how the BLAS implementation was compiled. You can also compile NumPy with MKL and Enthought is working on MKL support for their Python distribution (see their roadmap). Here is also a recent interesting blog post about this.
On the other hand, if you need more specialized operations or data structures then both Python and MATLAB offer you various ways for optimization (like Cython, PyCUDA,...).
Edit: I corrected this answer to take into account different BLAS implementations. I hope it is now a fair representation of the current situation.
The only valid test is to benchmark it. It really depends on what your platform is, and how well the Biot-Savart Law maps to Matlab or NumPy/SciPy built-in operations.
As for making Python faster, Google's working on Unladen Swallow, a JIT compiler for Python. There are probably other projects like this as well.
As per your edit 2, I recommend very strongly that you use Fortran because you can leverage the available linear algebra subroutines (Lapack and Blas) and it is way simpler than C/C++ for matrix computations.
If you prefer to go with a C/C++ approach, I would use C, because you presumably need raw performance on a presumably simple interface (matrix computations tend to have simple interfaces and complex algorithms).
If, however, you decide to go with C++, you can use the TNT (the Template Numerical Toolkit, the C++ implementation of Lapack).
Good luck.
If you're just using Python (with NumPy), it may be slower, depending on which pieces you use, whether or not you have optimized linear algebra libraries installed, and how well you know how to take advantage of NumPy.
To make it faster, there are a few things you can do. There is a tool called Cython that allows you to add type declarations to Python code and translate it into a Python extension module in C. How much benefit this gets you depends a bit on how diligent you are with your type declarations - if you don't add any at all, you won't see much of any benefit. Cython also has support for NumPy types, though these are a bit more complicated than other types.
If you have a good graphics card and are willing to learn a bit about GPU computing, PyCUDA can also help. (If you don't have an nvidia graphics card, I hear there is a PyOpenCL in the works as well). I don't know your problem domain, but if it can be mapped into a CUDA problem then it should be able to handle your 10^9 elements nicely.
And here is an updated "comparison" between MATLAB and NumPy/MKL based on some linear algebra functions:
http://dpinte.wordpress.com/2010/03/16/numpymkl-vs-matlab-performance/
The dot product is not that slow ;-)
I couldn't find much hard numbers to answer this same question so I went ahead and did the testing myself. The results, scripts, and data sets used are all available here on my post on MATLAB vs Python speed for vibration analysis.
Long story short, the FFT function in MATLAB is better than Python but you can do some simple manipulation to get comparable results and speed. I also found that importing data was faster in Python compared to MATLAB (even for MAT files using the scipy.io).
I would also like to point out that Python (+NumPy) can easily interface with Fortran via the F2Py module, which basically nets you native Fortran speeds on the pieces of code you offload into it.