I want to compute magnetic fields of some conductors using the Biot–Savart law and I want to use a 1000x1000x1000 matrix. Before I use MATLAB, but now I want to use Python. Is Python slower than MATLAB ? How can I make Python faster?
EDIT:
Maybe the best way is to compute the big array with C/C++ and then transfering them to Python. I want to visualise then with VPython.
EDIT2: Which is better in my case: C or C++?
You might find some useful results at the bottom of this link
http://wiki.scipy.org/PerformancePython
From the introduction,
A comparison of weave with NumPy, Pyrex, Psyco, Fortran (77 and 90) and C++ for solving Laplace's equation.
It also compares MATLAB and seems to show similar speeds to when using Python and NumPy.
Of course this is only a specific example, your application might be allow better or worse performance. There is no harm in running the same test on both and comparing.
You can also compile NumPy with optimized libraries such as ATLAS which provides some BLAS/LAPACK routines. These should be of comparable speed to MATLAB.
I'm not sure if the NumPy downloads are already built against it, but I think ATLAS will tune libraries to your system if you compile NumPy,
http://www.scipy.org/Installing_SciPy/Windows
The link has more details on what is required under the Windows platform.
EDIT:
If you want to find out what performs better, C or C++, it might be worth asking a new question. Although from the link above C++ has best performance. Other solutions are quite close too i.e. Pyrex, Python/Fortran (using f2py) and inline C++.
The only matrix algebra under C++ I have ever done was using MTL and implementing an Extended Kalman Filter. I guess, though, in essence it depends on the libraries you are using LAPACK/BLAS and how well optimised it is.
This link has a list of object-oriented numerical packages for many languages.
http://www.oonumerics.org/oon/
NumPy and MATLAB both use an underlying BLAS implementation for standard linear algebra operations. For some time both used ATLAS, but nowadays MATLAB apparently also comes with other implementations like Intel's Math Kernel Library (MKL). Which one is faster by how much depends on the system and how the BLAS implementation was compiled. You can also compile NumPy with MKL and Enthought is working on MKL support for their Python distribution (see their roadmap). Here is also a recent interesting blog post about this.
On the other hand, if you need more specialized operations or data structures then both Python and MATLAB offer you various ways for optimization (like Cython, PyCUDA,...).
Edit: I corrected this answer to take into account different BLAS implementations. I hope it is now a fair representation of the current situation.
The only valid test is to benchmark it. It really depends on what your platform is, and how well the Biot-Savart Law maps to Matlab or NumPy/SciPy built-in operations.
As for making Python faster, Google's working on Unladen Swallow, a JIT compiler for Python. There are probably other projects like this as well.
As per your edit 2, I recommend very strongly that you use Fortran because you can leverage the available linear algebra subroutines (Lapack and Blas) and it is way simpler than C/C++ for matrix computations.
If you prefer to go with a C/C++ approach, I would use C, because you presumably need raw performance on a presumably simple interface (matrix computations tend to have simple interfaces and complex algorithms).
If, however, you decide to go with C++, you can use the TNT (the Template Numerical Toolkit, the C++ implementation of Lapack).
Good luck.
If you're just using Python (with NumPy), it may be slower, depending on which pieces you use, whether or not you have optimized linear algebra libraries installed, and how well you know how to take advantage of NumPy.
To make it faster, there are a few things you can do. There is a tool called Cython that allows you to add type declarations to Python code and translate it into a Python extension module in C. How much benefit this gets you depends a bit on how diligent you are with your type declarations - if you don't add any at all, you won't see much of any benefit. Cython also has support for NumPy types, though these are a bit more complicated than other types.
If you have a good graphics card and are willing to learn a bit about GPU computing, PyCUDA can also help. (If you don't have an nvidia graphics card, I hear there is a PyOpenCL in the works as well). I don't know your problem domain, but if it can be mapped into a CUDA problem then it should be able to handle your 10^9 elements nicely.
And here is an updated "comparison" between MATLAB and NumPy/MKL based on some linear algebra functions:
http://dpinte.wordpress.com/2010/03/16/numpymkl-vs-matlab-performance/
The dot product is not that slow ;-)
I couldn't find much hard numbers to answer this same question so I went ahead and did the testing myself. The results, scripts, and data sets used are all available here on my post on MATLAB vs Python speed for vibration analysis.
Long story short, the FFT function in MATLAB is better than Python but you can do some simple manipulation to get comparable results and speed. I also found that importing data was faster in Python compared to MATLAB (even for MAT files using the scipy.io).
I would also like to point out that Python (+NumPy) can easily interface with Fortran via the F2Py module, which basically nets you native Fortran speeds on the pieces of code you offload into it.
Related
I am currently need to run FFT on 1024 sample points signal. So far I have implementing my own DFT algorithm in python, but it is very slow. If I use the NUMPY fftpack, or even move to C++ and use FFTW, do you guys think it would be better?
If you are implementing the DFFT entirely within Python, your code will run orders of magnitude slower than either package you mentioned. Not just because those libraries are written in much lower-level languages, but also (FFTW in particular) they are written so heavily optimized, taking advantage of cache locality, vector units, and basically every trick in the book, that it would not surprise me if they ran at 10,000x the speed of a naive Python implementation. Even if you are using numpy in your implementation, it will still pale in comparison.
So yes; use numpy's fftpack. If that is not fast enough, you can try the python bindings for FFTW (PyFFTW), but the speedup from fftpack to fftw will not be nearly as dramatic. I really doubt there's a need to drop into C++ just for FFTs - they're sort of the ideal case for Python bindings.
If you need speed, then you want to go for FFTW, check out the pyfftw project.
In order to use processor SIMD instructions, you need to align the data and there is not an easy way of doing so in numpy. Moreover, pyfftw allows you to use true multithreading, so trust me, it will be much faster.
In case you wish to stick to Python (handling and maintaining custom C++ bindings can be time consuming), you have the alternative of using OpenCV's implementation of FFT.
I put together a toy example comparing OpenCV's dft() and numpy's fft2 functions in python (Intel(R) Core(TM) i7-3930K CPU).
samplesFreq_cv2 = [
cv2.dft(samples[iS])
for iS in xrange(nbSamples)]
samplesFreq_np = [
np.fft.fft2(samples[iS])
for iS in xrange(nbSamples)]
Results for sequentially transforming 20000 image patches of varying resolutions from 20x20 to 60x60:
Numpy's fft2: 1.709100 seconds
OpenCV's dft: 0.621239 seconds
This is likely not as fast as binding to a dedicates C++ library like fftw, but it's a rather low-hanging fruit.
I'm implementing an algorithm into my Python web application, and it includes doing some (possibly) large clustering and matrix calculations. I've seen that Python can use C/C++ libraries, and thought that it might be a good idea to utilize this to speed things up.
First: Are there any reasons not to, or anything I should keep in mind while doing this?
Second: I have some reluctance against connecting C to MySQL (where I would get the data the calculations). Is this in any way justified?
Use the ecosystem.
For matrices, using numpy and scipy can provide approximately the same range of functionality as tools like Matlab. If you learn to write idiomatic code with these modules, the inner loops can take place in the C or FORTRAN implementations of the modules, resulting in C-like overall performance with Python expressiveness for most tasks. You may also be interested in numexpr, which can further accelerate and in some cases parallelize numpy/scipy expressions.
If you must write compute-intensive inner loops in Python, think hard about it first. Maybe you can reformulate the problem in a way more suited to numpy/scipy. Or, maybe you can use data structures available in Python to come up with a better algorithm rather than a faster implementation of the same algorithm. If not, there’s Cython, which uses a restricted subset of Python to compile to machine code.
Only as a last resort, and after profiling to identify the absolute worst bottlenecks, should you consider writing an extension module in C/C++. There are just so many easier ways to meet the vast majority of performance requirements, and numeric/mathematical code is an area with very good existing library support.
Not the answer you expected, but i have been down that road and advise KISS:
First make it work in the most simple way possible.
Only than look into speeding things up later / complicating the design.
There are lots of other ways to phrase this such as "do not fix hypothetical problems unless resources are unlimited".
cython support for c++ is much better than what it was. You can use most of the standard library in cython seamlessly. There are up to 500x speedups in the extreme best case.
My experience is that it is best to keep the cython code extremely thin, and forward all arguments to c++. It is much easier to debug c++ directly, and the syntax is better understood. Having to maintain a code base unnecessarily in three different languages is a pain.
Using c++/cython means that you have to spend a little time thinking about ownership issues. I.e. it is often safest not to allocate anything in c++ but prepare the memory in python / cython. (Use array.array or numpy.array). Alternatively, make a c++ object wrapped in cython which has a deallocation function. All this means that your application will be more fragile than if it is written only in python or c++: You are abandoning both RAII / gc.
On the other hand, your python code should translate line for line into modern c++. So this reminds you not to use old fashioned new or delete etc in your new c++ code but make things fast and clean by keeping the abstractions at a high level.
Remember too to re-examine the assumptions behind your original algorithmic choices. What is sensible for python might be foolish for c++.
Finally, python makes everything significantly simpler and cleaner and faster to debug than c++. But in many ways, c++ encourages more powerful abstractions and better separation of concerns.
When you programme with python and cython and c++, it slowly comes to feel like taking the worse bits of both approaches. It might be worth biting the bullet and rewriting completely in c++. You can keep the python test harness and use the original design as a prototype / testbed.
I guess the question speaks for itself. I'm interested in doing some serious computations but am not a programmer by trade. I can string enough python together to get done what I want. But can I write a program in python and have the GPU execute it using CUDA? Or do I have to use some mix of python and C?
The examples on Klockner's (sp) "pyCUDA" webpage had a mix of both python and C, so I'm not sure what the answer is.
If anyone wants to chime in about Opencl, feel free. I heard about this CUDA business only a couple of weeks ago and didn't know you could use your video cards like this.
You should take a look at CUDAmat and Theano. Both are approaches to writing code that executes on the GPU without really having to know much about GPU programming.
I believe that, with PyCUDA, your computational kernels will always have to be written as "CUDA C Code". PyCUDA takes charge of a lot of otherwise-tedious book-keeping, but does not build computational CUDA kernels from Python code.
pyopencl offers an interesting alternative to PyCUDA. It is described as a "sister project" to PyCUDA. It is a complete wrapper around OpenCL's API.
As far as I understand, OpenCL has the advantage of running on GPUs beyond Nvidia's.
Great answers already, but another option is Clyther. It will let you write OpenCL programs without even using C, by compiling a subset of Python into OpenCL kernels.
A promising library is Copperhead (alternative link), you just need to decorate the function that you want to be run by the GPU (and then you can opt-in / opt-out it to see what's best between cpu or gpu for that function)
There is a good, basic set of math constructs with compute kernels already written that can be accessed through pyCUDA's cumath module. If you want to do more involved or specific/custom stuff you will have to write a touch of C in the kernel definition, but the nice thing about pyCUDA is that it will do the heavy C-lifting for you; it does a lot of meta-programming on the back-end so you don't have to worry about serious C programming, just the little pieces. One of the examples given is a Map/Reduce kernel to calculate the dot product:
dot_krnl = ReductionKernel(np.float32, neutral="0",
reduce_expr="a+b",
map_expr="x[i]*y[i]",
arguments="float *x, float *y")
The little snippets of code inside each of those arguments are C lines, but it actually writes the program for you. the ReductionKernel is a custom kernel type for map/reducish type functions, but there are different types. The examples portion of the official pyCUDA documentation goes into more detail.
Good luck!
Scikits CUDA package could be a better option, provided that it doesn't require any low-level knowledge or C code for any operation that can be represented as numpy array manipulation.
I was wondering the same thing and carried a few searches. I found the article linked below which seems to answer your question. However, you asked this back in 2014 and the Nvidia article does not have a date.
https://developer.nvidia.com/how-to-cuda-python
The video goes through the set up, an initial example and, quite importantly, profiliing. However, I do not know if you can implement all of the usual general compute patterns. I would think you can because as far as I could there are no limitations in NumPy.
I wrote a number crunching python code. The calculations involved can take hours. Is it possible somehow to compile it to binary?
Thanks
Not in any useful (for you) way, but moving the calculations into NumPy or Cython will speed them up.
First you can try psyco, that may give you a speed up as much as 10x, but 2x is more typical
If you can post the code up somewhere, perhaps someone can point out how to leverage numpy.
If your task doesn't map well only numpy then cython is a good choice to convert a intensive function or two into C code just by adding a few cdefs.
If you can show us the code (even just the hot spots) we can probably give you better advice.
Perhaps you can modify your algorithm
Shedskin might be worth a try.
From their front page blurb:
Shed Skin is an experimental compiler,
that can translate pure, but
implicitly statically typed Python
programs into optimized C++. It can
generate stand-alone programs or
extension modules that can be imported
and used in larger Python programs.
Besides the typing restriction,
programs cannot freely use the Python
standard library (although about 20
common modules, such as random and re,
are currently supported). Also, not
all Python features, such as nested
functions and variable numbers of
arguments, are supported (see the
tutorial for details).
For a set of 44 non-trivial test
programs (at over 10,000 lines in
total (sloccount)), measurements show
a typical speedup of 2-40 times over
Psyco, and 2-220 times over CPython.
Because Shed Skin is still in an early
stage of development, however, many
other programs will not compile
out-of-the-box.
Are parts of NumPy and/or SciPy programmed in C/C++?
And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?
I'm just wondering if Python is a better option than Java or C# for scientific apps.
If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.
I would question any benchmark which doesn't show the source for each implementation (or did I miss something)? It's entirely possible that either or both of those solutions are coded badly which would result in an unfair appraisal of either or both language's performance. [Edit] Oops, now I see the source. As others have pointed out though, it's not using the NumPy/SciPy libraries so those benchmarks are not going to help you make a decision.
I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
It probably depends what you're doing in any of those languages as to how much overhead there is for a particular application.
I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.
What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.
Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
There is a better comparison here (not a benchmark but shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, swig,f2py) / C++ (boost.python, weave.inline, weave.blitz) / Fortran (f2py) - or even just by adding type annotations to Python so it can be processed to C (cython). I don't think there are many things comparably easy for C# or Java - at least that so seemlessly handle passing numerical arrays of different types (although I guess proponents would argue since they don't have the performance penalty of Python there is less need to).
A lot of it is written in C or fortran. You can re-write the hot loops in C (or use one of the gazillion ways to speed python up, boost/weave is my favorite), but does it really matter?
Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker on Python.
Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.
I am not too familiar with vm-based low-level details, but I believe the interface cost would be higher because of the restrictions put on the jvm and the .clr. One of the reason why numpy is often faster than similar environments is the memory representation and how arrays are shared/passed between functions. Whereas most environments (Matlab and R as well I believe) use Copy-On-Write to pass arrays between functions, NumPy use references. But doing so in e.g. the JVM would be hard (because of restrictions on how to use pointer, etc...). It is doable (an early port of NumPy for Jython exists), but I don't know how they solve this issue. Maybe C++/Cli would make this easier, but I have zero experience with that environment.
It always depends on your own capability to handle the langue, so the language is able to generate fast code. Out of my experience, numpy is several times slower then good .NET implementations. And I expect JAVA to be similar fast. Their optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.
numpy on the other hand comes with a syntax wich is easier to use for those, which are attuned to scripting languages. But if it comes to application development, those advantages often turn to obstacles and you will yearn for typesafety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET.Personally I tend towards C#, bacause it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.