Does Enthought Canopy support parallel code execution on CPU using perhaps openMPI or on GPU using openCV or CUDA
I am looking into switching from C++ to python as i want to make GUI for my parallel code.
Is this a good idea. Does python support parallel computation?
Yes, Python does support this. There are three layers to processes with Python:
subprocess: which simply starts a process within the same thread
threading: which starts a new thread and leaves the old on alone. There are some frequent stories that this not necessarily leads to better performance.
multiprocessing: which is what you are after
Here is an intro to parallel processing on Python.
The official docs for the multiprocessing are here.
The ever so useful discussions on the Python Module of the Week are also worth a look.
Edit:
The python libraries mentioned by HT #jonathan are likely to be:
Cuda:
http://mathema.tician.de/software/pycuda
OpenCV:
http://code.google.com/p/pyopencv/
There is a nice tutorial for this here.
And Message Passing Interface:
http://mpi4py.scipy.org/docs/usrman/intro.html
Related
I am starting to read up over possible ways to parallelise Python code.
DISCLAIMER. This is NOT a question about Multiprocessing vs Multithreading.
At this link https://ipyparallel.readthedocs.io/en/latest/demos.html one finds references to several
concurrency packages for Python to avoid the GIL: https://scipy.github.io/old-wiki/pages/ParallelProgramming
-IPython1
-mpi4py
-parallel python
-Numba
There is also a multiprocessing package:
https://docs.python.org/3/library/multiprocessing.html
And another one called processing:
https://pypi.org/project/processing/
First of all, it is not at all clear to me the difference between the latter two above; what is the difference in using between the multiprocessing module and the processing module?.
In general, I fail to understand the differences between those all -- which must be there, given some developers made the effort to create a mpi4py version for the MPI used in C++. I guess this is not just about the dualism between "threading" and "multiprocessing" approaches, where in one case the memory is shared while the other has each process with its own memory and interpreter, something more must be different between all of those different packages out there.
Thanks to all of those who will dedicate time to answer this!
The difference is that the last version of processing was released in April of 2008 and multiprocessing was added in Python 2.6 in October 2008.
processing was a library that was used before multiprocessing was distributed with Python.
As far as the specific difference between other modules designed for multiprocessing: The scipy page you linked says that "This is a subject for graduate courses in computer science, and I'm not going to address it here....there are some python tools you can use to implement the things you learn in that graduate course." While they admit that may be a bit of an exaggeration, independent study of multiprocessing in general will be required to discern the difference between these libraries, you should probably just stick to the built in multiprocessing module for your initial experiments while you learn how it works. One you're more comfortable with multiprocessing, you might want to check out the pathos framework.
But here are the basics for the packages you mention:
Numba adds decorators that automatically compile functions to make them run faster, it isn't really a multiprocessing tool as much as a JIT compiling tool.
Parallel Python overcomes the GIL to utilize multiple cores or multiple computers, it's designed to be easy to use and to handle all the complex stuff behind the scenes.
MPI for Python is like Paralell Python with less emphasis on simplicity.
IPython is a toolkit with many features, including a shell and Jupyter kernel, it's also not really a multiprocessing tool.
Keep in mind that plenty of libraries/modules do the same thing, there doesn't need to be a reason more than one exists. Use whatever works for you.
I am trying to restrict the number of CPUs used by Python (for benchmarking & to see if it speeds up my program).
I have found a few Python modules for achieving this ('os', 'affinity', 'psutil') except that their methods for changing affinity only works with Linux (and sometimes Windows). There is also a suggestion to use the 'taskset' command (Why does multiprocessing use only a single core after I import numpy?) but this command not available on macOS as far as I know.
Is there a (preferable clean & easy) way to change affinity while running Python / iPython on macOS? It seems like changing processor affinity in Mac is not as easy as in other platforms (http://softwareramblings.com/2008/04/thread-affinity-on-os-x.html).
Not possible. See Thread Affinity API Release Notes:
OS X does not export interfaces that identify processors or control thread placement—explicit thread to processor binding is not supported. Instead, the kernel manages all thread placement. Applications expect that the scheduler will, under most circumstances, run its threads using a good processor placement with respect to cache affinity.
Note that thread affinity is something you'd consider fairly late when optimizing a program, there are a million things to do which have a larger impact on your program.
Also note that Python is particularly bad at multithreading to begin with.
This is an assignment I am working on.
I have been asked to write a sample parallel processing program using native python features. I can write the code but the problem is - even after searching I cannot find a native parallel programming feature in python.
Since we have to import "multiprocessing" module - it is not native. I just cannot find which feature is available to use.
Already checked following threads but they use multiprocessing:
Parallel programming in python
Python multiprocessing for parallel processes
How to do parallel programming in Python
I think your definition of "native" is too narrow, or your understanding of the term "import" is mistaken.
The multiprocessing module is part of Python's standard library. Every Python implementation should have it. It is a native feature of Python.
The term "import" should be understood as "make this module available in this program", not as "add this non-native feature to the language". Importing a module does not change the language.
Edit:
In Python 3 you could make concurrent programs with async def and yield. But that shouldn't be considered real parallel processing. You might call it cooperative "multitasking", but it isn't really. It's task switching.
Today I stumbled over a post in stackoverflow (see also here):
We are developing opencl4py, higher level bindings. This project uses CFFI, so it works on Pypy.
The major issue we encountered with pyopencl is that 'import pyopencl' does OpenCL initialization and takes the whole virtual memory in case of NVIDIA driver, preventing from correct forking and effectively disabling multiprocessing (yes, we claim that using pyopencl disables multiprocessing at least with NVIDIA). opencl4py uses lazy OpenCL initialization, resolving this "import hell".
Later, it gained some nice features like super easy binary program caching, etc. Unfortunately, the documentation is somewhat brief. The best way to learn how it works is go through the tests.
As there is also pyOpenCL, I was woundering what the difference between these two packages is. Does anybody know where I can find an overview on the pro's and con's for these both packages?
Edit: To include benshope's comment as I would also be interested: what does "disable[s] multiprocessing" mean? Like, it can't run kernels on several devices at one time?
As far as I know, there is no such overview. I'll try to list some key points:
pyOpenCL is a mature project with a relatively large user base. There are tutorials, FAQ, etc. opencl4py appeared on 03/2014; no tutorials, FAQ and so on - only unit tests and docstrings.
pyOpenCL is a native cPython extension, whereas opencl4py uses cffi, so that it works on PyPy (pyOpenCL does NOT) and it does not require to be recompiled each time cPython changes version.
PyOpenCL has extras, such as random number generator and OpenGL interoperability.
opencl4py is extensively tested in Samsung production real world scenarios and is being actively developed.
what does "disable[s] multiprocessing" mean? Like, it can't run kernels on several devices at one time?
Of course, it can, I was trying to say that after importing pyopencl, os.fork() or multiprocessing.Process() lead to crashes inside NVIDIA OpenCL userspace library. It is always a bad idea of doing work during import.
I have a problem when I run a script with python. I haven't done any parallelization in python and don't call any mpi for running the script. I just execute "python myscript.py" and it should only use 1 cpu.
However, when I look at the results of the command "top", I see that python is using almost 390% of my cpus. I have a quad core, so 8 threads. I don't think that this is helping my script to run faster. So, I would like to understand why python is using more than one cpu, and stop it from doing so.
Interesting thing is when I run a second script, that one also takes up 390%. If I run a 3rd script, the cpu usage for each of them drops to 250%. I had a similar problem with matlab a while ago, and the way I solved it was to launch matlab with -singlecompthread, but I don't know what to do with python.
If it helps, I'm solving the Poisson equation (which is not parallelized at all) in my script.
UPDATE:
My friend ran the code on his own computer and it only takes 100% cpu. I don't use any BLAS, MKL or any other thing. I still don't know what the cause for 400% cpu usage is.
There's a piece of fortran algorithm from the library SLATEC, which solves the Ax=b system. That part I think is using a lot of cpu.
Your code might be calling some functions that uses C/C++/etc. underneath. In that case, it is possible for multiple thread usage.
Are you calling any libraries that are only python bindings to some more efficiently implemented functions?
You can always set your process affinity so it run on only one cpu. Use "taskset" command on linux, or process explorer on windows.
This way, you should be able to know if your script has same performance using one cpu or more.
Could it be that your code uses SciPy or other numeric library for Python that is linked against Intel MKL or another vendor provided library that uses OpenMP? If the underlying C/C++ code is parallelised using OpenMP, you can limit it to a single thread by setting the environment variable OMP_NUM_THREADS to 1:
OMP_NUM_THREADS=1 python myscript.py
Intel MKL for sure is parallel in many places (LAPACK, BLAS and FFT functions) if linked with the corresponding parallel driver (the default link behaviour) and by default starts as many compute threads as is the number of available CPU cores.