I was using PIL to do image processing, and I tried to convert a color image into a grayscale one, so I wrote a Python function to do that, meanwhile I know PIL already provides a convert function to this.
But the version I wrote in Python takes about 2 seconds to finish the grayscaling, while PIL's convert almost instantly. So I read the PIL code, figured out that the algorithm I wrote is pretty much the same, but
PIL's convert is written in C or C++.
So is this the problem making the performance's different?
Yes, coding the same algorithm in Python and in C, the C implementation will be faster. This is definitely true for the usual Python interpreter, known as CPython. Another implementation, PyPy, uses a JIT, and so can achieve impressive speeds, sometimes as fast as a C implementation. But running under CPython, the Python will be slower.
If you want to do image processing, you can use
OpenCV(cv2), SimpleCV, NumPy, SciPy, Cython, Numba ...
OpenCV, SimpleCV SciPy have many image processing routines already.
NumPy can do operations on arrays in c speed.
If you want loops in Python, you can use Cython to compile your python code with static declaration into an external module.
Or you can use Numba to do JIT convert, it can convert your python code into machine binary code, and will give you near c speed.
Related
I am currently need to run FFT on 1024 sample points signal. So far I have implementing my own DFT algorithm in python, but it is very slow. If I use the NUMPY fftpack, or even move to C++ and use FFTW, do you guys think it would be better?
If you are implementing the DFFT entirely within Python, your code will run orders of magnitude slower than either package you mentioned. Not just because those libraries are written in much lower-level languages, but also (FFTW in particular) they are written so heavily optimized, taking advantage of cache locality, vector units, and basically every trick in the book, that it would not surprise me if they ran at 10,000x the speed of a naive Python implementation. Even if you are using numpy in your implementation, it will still pale in comparison.
So yes; use numpy's fftpack. If that is not fast enough, you can try the python bindings for FFTW (PyFFTW), but the speedup from fftpack to fftw will not be nearly as dramatic. I really doubt there's a need to drop into C++ just for FFTs - they're sort of the ideal case for Python bindings.
If you need speed, then you want to go for FFTW, check out the pyfftw project.
In order to use processor SIMD instructions, you need to align the data and there is not an easy way of doing so in numpy. Moreover, pyfftw allows you to use true multithreading, so trust me, it will be much faster.
In case you wish to stick to Python (handling and maintaining custom C++ bindings can be time consuming), you have the alternative of using OpenCV's implementation of FFT.
I put together a toy example comparing OpenCV's dft() and numpy's fft2 functions in python (Intel(R) Core(TM) i7-3930K CPU).
samplesFreq_cv2 = [
cv2.dft(samples[iS])
for iS in xrange(nbSamples)]
samplesFreq_np = [
np.fft.fft2(samples[iS])
for iS in xrange(nbSamples)]
Results for sequentially transforming 20000 image patches of varying resolutions from 20x20 to 60x60:
Numpy's fft2: 1.709100 seconds
OpenCV's dft: 0.621239 seconds
This is likely not as fast as binding to a dedicates C++ library like fftw, but it's a rather low-hanging fruit.
Just wondering one simple thing:
Does the majority of numpy code have bindings to C++?
(Which would make it run almost as fast as native C++ code)
Or is it all in python?
An easy way to bind NumPy (Python) code to C++ and use the NumPy arrays in place is xtensor: https://github.com/QuantStack/xtensor-python
In case you ever need that extra bit of speed.
The documention of the numpy array API gives a lot of examples of how you can handle numpy arrays in C (and C++, it's the same API) or handling data existing in a C/C++ program through numpy.
I aim to start opencv little by little but first I need to decide which API of OpenCV is more useful. I predict that Python implementation is shorter but running time will be more dense and slow compared to the native C++ implementations. Is there any know can comment about performance and coding differences between these two perspectives?
As mentioned in earlier answers, Python is slower compared to C++ or C. Python is built for its simplicity, portability and moreover, creativity where users need to worry only about their algorithm, not programming troubles.
But here in OpenCV, there is something different. Python-OpenCV is just a wrapper around the original C/C++ code. It is normally used for combining best features of both the languages, Performance of C/C++ & Simplicity of Python.
So when you call a function in OpenCV from Python, what actually run is underlying C/C++ source. So there won't be much difference in performance.( I remember I read somewhere that performance penalty is <1%, don't remember where. A rough estimate with some basic functions in OpenCV shows a worst-case penalty of <4%. ie penalty = [maximum time taken in Python - minimum time taken in C++]/minimum time taken in C++ ).
The problem arises when your code has a lot of native python codes.For eg, if you are making your own functions that are not available in OpenCV, things get worse. Such codes are ran natively in Python, which reduces the performance considerably.
But new OpenCV-Python interface has full support to Numpy. Numpy is a package for scientific computing in Python. It is also a wrapper around native C code. It is a highly optimized library which supports a wide variety of matrix operations, highly suitable for image processing. So if you can combine both OpenCV functions and Numpy functions correctly, you will get a very high speed code.
Thing to remember is, always try to avoid loops and iterations in Python. Instead, use array manipulation facilities available in Numpy (and OpenCV). Simply adding two numpy arrays using C = A+B is a lot times faster than using double loops.
For eg, you can check these articles :
Fast Array Manipulation in Python
Performance comparison of OpenCV-Python interfaces, cv and cv2
All google results for openCV state the same: that python will only be slightly slower. But not once have I seen any profiling on that. So I decided to do some and discovered:
Python is significantly slower than C++ with opencv, even for trivial programs.
The most simple example I could think of was to display the output of a webcam on-screen and display the number of frames per second. With python, I achieved 50FPS (on an Intel atom). With C++, I got 65FPS, an increase of 25%. In both cases, the CPU usage was using a single core, and to the best of my knowledge, was bound by the performance of the CPU.
Additionally this test case about aligns with what I have seen in projects I've ported from one to the other in the past.
Where does this difference come from? In python, all of the openCV functions return new copies of the image matrices. Whenever you capture an image, or if you resize it - in C++ you can re-use existing memory. In python you cannot. I suspect this time spent allocating memory is the major difference, because as others have said: the underlying code of openCV is C++.
Before you throw python out the window: python is much faster to develop in, and if long as you aren't running into hardware-constraints, or if development speed it more important than performance, then use python. In many applications I've done with openCV, I've started in python and later converted only the computer vision components to C++ (eg using python's ctype module and compiling the CV code into a shared library).
Python Code:
import cv2
import time
FPS_SMOOTHING = 0.9
cap = cv2.VideoCapture(2)
fps = 0.0
prev = time.time()
while True:
now = time.time()
fps = (fps*FPS_SMOOTHING + (1/(now - prev))*(1.0 - FPS_SMOOTHING))
prev = now
print("fps: {:.1f}".format(fps))
got, frame = cap.read()
if got:
cv2.imshow("asdf", frame)
if (cv2.waitKey(2) == 27):
break
C++ Code:
#include <opencv2/opencv.hpp>
#include <stdint.h>
using namespace std;
using namespace cv;
#define FPS_SMOOTHING 0.9
int main(int argc, char** argv){
VideoCapture cap(2);
Mat frame;
float fps = 0.0;
double prev = clock();
while (true){
double now = (clock()/(double)CLOCKS_PER_SEC);
fps = (fps*FPS_SMOOTHING + (1/(now - prev))*(1.0 - FPS_SMOOTHING));
prev = now;
printf("fps: %.1f\n", fps);
if (cap.isOpened()){
cap.read(frame);
}
imshow("asdf", frame);
if (waitKey(2) == 27){
break;
}
}
}
Possible benchmark limitations:
Camera frame rate
Timer measuring precision
Time spent in print formatting
The answer from sdfgeoff is missing the fact that you can reuse arrays in Python. Preallocate them and pass them in, and they will get used. So:
image = numpy.zeros(shape=(height, width, 3), dtype=numpy.uint8)
#....
retval, _ = cv.VideoCapture.read(image)
You're right, Python is almost always significantly slower than C++ as it requires an interpreter, which C++ does not. However, that does require C++ to be strongly-typed, which leaves a much smaller margin for error. Some people prefer to be made to code strictly, whereas others enjoy Python's inherent leniency.
If you want a full discourse on Python coding styles vs. C++ coding styles, this is not the best place, try finding an article.
EDIT:
Because Python is an interpreted language, while C++ is compiled down to machine code, generally speaking, you can obtain performance advantages using C++. However, with regard to using OpenCV, the core OpenCV libraries are already compiled down to machine code, so the Python wrapper around the OpenCV library is executing compiled code. In other words, when it comes to executing computationally expensive OpenCV algorithms from Python, you're not going to see much of a performance hit since they've already been compiled for the specific architecture you're working with.
Why choose?
If you know both Python and C++, use Python for research using Jupyter Notebooks and then use C++ for implementation.
The Python stack of Jupyter, OpenCV (cv2) and Numpy provide for fast prototyping.
Porting the code to C++ is usually quite straight-forward.
I wrote a number crunching python code. The calculations involved can take hours. Is it possible somehow to compile it to binary?
Thanks
Not in any useful (for you) way, but moving the calculations into NumPy or Cython will speed them up.
First you can try psyco, that may give you a speed up as much as 10x, but 2x is more typical
If you can post the code up somewhere, perhaps someone can point out how to leverage numpy.
If your task doesn't map well only numpy then cython is a good choice to convert a intensive function or two into C code just by adding a few cdefs.
If you can show us the code (even just the hot spots) we can probably give you better advice.
Perhaps you can modify your algorithm
Shedskin might be worth a try.
From their front page blurb:
Shed Skin is an experimental compiler,
that can translate pure, but
implicitly statically typed Python
programs into optimized C++. It can
generate stand-alone programs or
extension modules that can be imported
and used in larger Python programs.
Besides the typing restriction,
programs cannot freely use the Python
standard library (although about 20
common modules, such as random and re,
are currently supported). Also, not
all Python features, such as nested
functions and variable numbers of
arguments, are supported (see the
tutorial for details).
For a set of 44 non-trivial test
programs (at over 10,000 lines in
total (sloccount)), measurements show
a typical speedup of 2-40 times over
Psyco, and 2-220 times over CPython.
Because Shed Skin is still in an early
stage of development, however, many
other programs will not compile
out-of-the-box.
I want to compute magnetic fields of some conductors using the Biot–Savart law and I want to use a 1000x1000x1000 matrix. Before I use MATLAB, but now I want to use Python. Is Python slower than MATLAB ? How can I make Python faster?
EDIT:
Maybe the best way is to compute the big array with C/C++ and then transfering them to Python. I want to visualise then with VPython.
EDIT2: Which is better in my case: C or C++?
You might find some useful results at the bottom of this link
http://wiki.scipy.org/PerformancePython
From the introduction,
A comparison of weave with NumPy, Pyrex, Psyco, Fortran (77 and 90) and C++ for solving Laplace's equation.
It also compares MATLAB and seems to show similar speeds to when using Python and NumPy.
Of course this is only a specific example, your application might be allow better or worse performance. There is no harm in running the same test on both and comparing.
You can also compile NumPy with optimized libraries such as ATLAS which provides some BLAS/LAPACK routines. These should be of comparable speed to MATLAB.
I'm not sure if the NumPy downloads are already built against it, but I think ATLAS will tune libraries to your system if you compile NumPy,
http://www.scipy.org/Installing_SciPy/Windows
The link has more details on what is required under the Windows platform.
EDIT:
If you want to find out what performs better, C or C++, it might be worth asking a new question. Although from the link above C++ has best performance. Other solutions are quite close too i.e. Pyrex, Python/Fortran (using f2py) and inline C++.
The only matrix algebra under C++ I have ever done was using MTL and implementing an Extended Kalman Filter. I guess, though, in essence it depends on the libraries you are using LAPACK/BLAS and how well optimised it is.
This link has a list of object-oriented numerical packages for many languages.
http://www.oonumerics.org/oon/
NumPy and MATLAB both use an underlying BLAS implementation for standard linear algebra operations. For some time both used ATLAS, but nowadays MATLAB apparently also comes with other implementations like Intel's Math Kernel Library (MKL). Which one is faster by how much depends on the system and how the BLAS implementation was compiled. You can also compile NumPy with MKL and Enthought is working on MKL support for their Python distribution (see their roadmap). Here is also a recent interesting blog post about this.
On the other hand, if you need more specialized operations or data structures then both Python and MATLAB offer you various ways for optimization (like Cython, PyCUDA,...).
Edit: I corrected this answer to take into account different BLAS implementations. I hope it is now a fair representation of the current situation.
The only valid test is to benchmark it. It really depends on what your platform is, and how well the Biot-Savart Law maps to Matlab or NumPy/SciPy built-in operations.
As for making Python faster, Google's working on Unladen Swallow, a JIT compiler for Python. There are probably other projects like this as well.
As per your edit 2, I recommend very strongly that you use Fortran because you can leverage the available linear algebra subroutines (Lapack and Blas) and it is way simpler than C/C++ for matrix computations.
If you prefer to go with a C/C++ approach, I would use C, because you presumably need raw performance on a presumably simple interface (matrix computations tend to have simple interfaces and complex algorithms).
If, however, you decide to go with C++, you can use the TNT (the Template Numerical Toolkit, the C++ implementation of Lapack).
Good luck.
If you're just using Python (with NumPy), it may be slower, depending on which pieces you use, whether or not you have optimized linear algebra libraries installed, and how well you know how to take advantage of NumPy.
To make it faster, there are a few things you can do. There is a tool called Cython that allows you to add type declarations to Python code and translate it into a Python extension module in C. How much benefit this gets you depends a bit on how diligent you are with your type declarations - if you don't add any at all, you won't see much of any benefit. Cython also has support for NumPy types, though these are a bit more complicated than other types.
If you have a good graphics card and are willing to learn a bit about GPU computing, PyCUDA can also help. (If you don't have an nvidia graphics card, I hear there is a PyOpenCL in the works as well). I don't know your problem domain, but if it can be mapped into a CUDA problem then it should be able to handle your 10^9 elements nicely.
And here is an updated "comparison" between MATLAB and NumPy/MKL based on some linear algebra functions:
http://dpinte.wordpress.com/2010/03/16/numpymkl-vs-matlab-performance/
The dot product is not that slow ;-)
I couldn't find much hard numbers to answer this same question so I went ahead and did the testing myself. The results, scripts, and data sets used are all available here on my post on MATLAB vs Python speed for vibration analysis.
Long story short, the FFT function in MATLAB is better than Python but you can do some simple manipulation to get comparable results and speed. I also found that importing data was faster in Python compared to MATLAB (even for MAT files using the scipy.io).
I would also like to point out that Python (+NumPy) can easily interface with Fortran via the F2Py module, which basically nets you native Fortran speeds on the pieces of code you offload into it.