I have written Python code to carry out genetic algorithm optimization, but it is too slow. I would like to know how to run it in parallel, making use of multiple CPUs.
For more clarity: my code calls another Python script, say 100 times, one after the other. I want to divide these 100 runs between 4 CPUs, so that each CPU handles 25 of them, thereby increasing the speed.
It would be highly appreciated if someone could help me with this.
Thanks in advance!
There are several packages that provide parallel computing for Python. I am the author of a package called pathos, which provides parallel computing with several parallel backends and gives them a common API. pathos provides parallel pipes and maps for multi-process, multi-threaded, parallel over sockets, MPI-parallel, and also interactions with schedulers and over ssh. pathos relies on several packages, which you can pick from if you don't want all the different options.
pathos uses pyina, which in turn uses mpi4py. mpi4py provides bindings to MPI, but you can't run the code from python 'normally'… you need to run with whatever you use to run MPI. pyina enables you to run mpi4py from normal python, and to interact with schedulers. Plus, pyina uses dill, which can serialize most python objects, and thus you are much more able to send what you want across processes.
pathos provides a fork of multiprocessing that also plays well with dill and pyina. Using both can enable you to do hierarchical parallel computing -- like launching MPI parallel that then spawns multiprocess or multithreaded parallel.
pathos also uses ppft, which is a fork of pp (Parallel Python), which provides parallel computing across sockets -- so that means you can connect a parallel map across several machines.
There are alternatives to pathos, such as IPython-parallel. However, the ability to use MPI is very new, and I don't know how capable it is yet. It may or may not leverage IPython-cluster-helper, which has been in development for a little while. Note that IPython doesn't use pp, it uses zmq instead, and IPython also provides connectivity to EC2 if you like cloud stuff.
Here are some relevant links:
pathos, pyina, dill, ppft: https://github.com/uqfoundation
IPython-cluster-helper: https://github.com/roryk/ipython-cluster-helper
IPython: http://ipython.org/ipython-doc/dev/parallel/parallel_mpi.html
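For the use case in the question (100 independent runs split over 4 CPUs), a minimal sketch with a pathos process pool might look like the following; `evaluate` is a placeholder for the external computation that actually gets called:

    # Hedged sketch: divide 100 independent evaluations over 4 worker processes.
    from pathos.multiprocessing import ProcessingPool as Pool

    def evaluate(individual):
        # stand-in for the external code the GA calls; replace with the real call
        return sum(x * x for x in individual)

    if __name__ == '__main__':
        population = [[i, i + 1, i + 2] for i in range(100)]
        pool = Pool(nodes=4)                  # 4 worker processes
        fitnesses = pool.map(evaluate, population)
        print(fitnesses[:5])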
Related
Here's what I understand:
The multiprocessing library uses multiple cores, so it's processing in parallel and not just simulating parallel processing like some libraries. To do this, it overrides the Python GIL.
The concurrent library doesn't override the Python GIL and so it doesn't have the issues that multiprocessing has (i.e. locking, hanging). So it seems like it's not actually using multiple cores.
I understand the difference between concurrency and parallelism. My question is:
How does concurrent actually work behind the scenes?
And does subprocess work like multiprocessing or concurrent?
multiprocessing and concurrent.futures both aim at running Python code in multiple processes concurrently. They're different APIs for much the same thing. multiprocessing's API was, as @András Molnár said, designed to be much like the threading module's. concurrent.futures's API was intended to be simpler.
Neither has anything to do with the GIL. The GIL is a per-process lock in CPython, and the Python processes these modules create each have their own GIL. You can't have a CPython process without a GIL, and there's no way to "override" it (although C code can release it when it's going to be running code that it knows for certain cannot execute Python code - for example, the CPython implementation routinely releases it internally when invoking a blocking I/O function in C, so that other threads can run Python code while the thread that released the GIL waits for the I/O call to complete).
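A small illustration of that release behaviour (just a sketch, not part of the original explanation): two threads each block for about a second, yet the total wall time is roughly one second rather than two, because the blocking call releases the GIL while it waits.

    import threading
    import time

    def blocking_call():
        time.sleep(1)    # C-level blocking call; the GIL is released while waiting

    start = time.time()
    threads = [threading.Thread(target=blocking_call) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("elapsed: %.2f s" % (time.time() - start))   # roughly 1.0, not 2.0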
The subprocess module lets you run and control other programs. Anything you can start with the command line on the computer, can be run and controlled with this module. Use this to integrate external programs into your Python code.
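A minimal subprocess sketch (the command here is just an arbitrary example):

    import subprocess

    result = subprocess.run(
        ["sort"],                          # any external program works here
        input="banana\napple\ncherry\n",
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)                   # apple, banana, cherry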
The multiprocessing module lets you divide tasks written in Python over multiple processes to help improve performance. It provides an API very similar to the threading module; it provides methods to share data across the processes it creates, and makes the task of managing multiple processes to run Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in parallel.
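To see how close the two pool APIs are, here is the same CPU-bound work written both ways (a rough sketch, nothing project-specific):

    import multiprocessing
    import concurrent.futures

    def square(n):
        return n * n

    if __name__ == '__main__':
        data = range(10)

        with multiprocessing.Pool(processes=4) as pool:
            print(pool.map(square, data))

        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
            print(list(executor.map(square, data)))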
I am aware that you have to use something like MPI or Python libraries like celery, jug, pp, etc., when you want to distribute processes over multiple machines, but is that necessary if you have a single machine with multiple CPUs?
So if I had a machine whose motherboard has two or more CPUs (each with multiple cores), would Python's multiprocessing library suffice to fully utilize that machine?
From the multiprocessing documentation:
"...Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine..."
I have a pretty complex computation code written in Octave and a python script which receives user input, and needs to run the Octave code based on the user inputs. As I see it, I have these options:
Port the Octave code to python.
Use external libraries (i.e. oct2py) which enable you to run the Octave/Matlab engine from python.
Communicate between a python process and an octave process. One such possibility would be to use subprocess from the python code and wait for the answer.
Since I'm pretty reluctant to port my code to python and I don't want to rely on maintenance of external libraries such as oct2py, I am in favor of option 3. However, since the system should scale well, I do not want to spawn a new octave process for every request, and a tasks queue system seems more reasonable. Is there any (recommended) tasks queue system to enqueue tasks in python and have an octave worker on the other end process it?
The way it is described here, option 3 degenerates to option 2 because Octave does not have an obvious way (an API or package) for the 'Octave worker' to connect to a task queue.
The only way Octave does "networking" is by the sockets package and this means implementing the protocol for communicating with the task queue from scratch (in Octave).
The original motivation for having an 'Octave worker' is to have the main process of Octave launch once and then "direct it" to execute functions and return results, rather than launching the main process of Octave for every call to a function.
Since Octave cannot do 'a worker' (that launches, listens to a 'channel' and executes code) out of the box, the only other way to achieve this is to have the task queue framework all work in Python and only call Octave when you need its functionality, most likely via oct2py (i.e. option 2).
There are many different ways to do this, ranging from Redis to PyPubSub, Celery, and RabbitMQ. All of them are straightforward and very well documented. PyPubSub does not require any additional components. A rough illustration follows this note.
(Just as a note: the solution of having an 'executable' Octave script, calling it via Python, and blocking until it returns is not as bad as it sounds, however, and for some parallel-processing frameworks it is the only way to have multiple copies of the same Octave script operate on different data segments.)
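As a rough illustration of that approach (Celery with a Redis broker here, but any of the systems above would do; `my_octave_fn` and the broker URL are placeholders): each worker keeps one long-lived Octave session and calls into it via oct2py.

    from celery import Celery
    from oct2py import Oct2Py

    app = Celery('octave_tasks', broker='redis://localhost:6379/0')
    octave = Oct2Py()          # one long-lived Octave session per worker process

    @app.task
    def run_octave(user_input):
        # feval calls a function defined in an .m file on Octave's load path
        return octave.feval('my_octave_fn', user_input)

The Python front end would then enqueue work with run_octave.delay(user_input) instead of spawning a fresh Octave process per request.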
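For completeness, that blocking variant is only a few lines (the script name and arguments below are placeholders):

    import subprocess

    result = subprocess.run(
        ["octave", "--no-gui", "-q", "--eval", "my_script(1, 2)"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)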
All three options are reasonable depending on your particular case.
"I don't want to rely on maintenance of external libraries such as oct2py, I am in favor of option 3"
oct2py is implemented using option 3. You can reinvent what it already does or use it directly. oct2py is pure Python and it has a permissive license: if its development were to stop tomorrow, you could include its code alongside yours.
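Using it directly is only a few lines (the function name below is a placeholder); oct2py keeps a single Octave process alive and reuses it across calls:

    from oct2py import Oct2Py

    octave = Oct2Py()                      # starts (and reuses) one Octave process
    result = octave.feval('my_octave_fn', 1, 2)
    octave.exit()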
I love Python's global interpreter lock because it makes the underlying C code simple.
But it means that each Python interpreter main loop is restricted to one thread at a time.
This is bad because recently the number of cores per processor chip has been doubling frequently.
One of the supposed advantages to zeromq is that it makes multi-threaded programming "easy" or easier.
Is it possible to launch multiple Python interpreters in the same process and have them communicate only using in-process zeromq with no other shared state? Has anyone tried it? Does it work well? Please comment and/or provide links.
I don't know of any way to create multiple instances of the Python interpreter within a single process, but I do have experience with splitting multiple instances across multiple processes and communicating with zmq.
I've been using multiprocessing to implement an island-model architecture for global optimization, with zmq for managing communication between the islands. Each island is its own process with its own Python interpreter, created and managed by the master archipelago process.
Using multiprocessing allows you to launch as many independent Python interpreters as you wish, but they all reside in their own processes with a separate memory space. I believe the OS scheduler takes care of assigning processes to cores and sharing CPU time. The separate memory space is the hardest part, because it means you have to explicitly communicate. To communicate between processes, the objects/data you wish to send must be serializable, because zmq sends byte-strings.
The nice thing about zmq is that it's a piece of cake to scale across systems distributed over a network, and it's pretty lightweight. You can create just about any communication pattern you wish, using REP/REQ, PUB/SUB, or whatever.
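A stripped-down sketch of that pattern (addresses and payloads are made up): one worker process created with multiprocessing, talking to the parent over a zmq REQ/REP pair, with pyzmq's send_pyobj/recv_pyobj doing the pickling into the byte-strings zmq actually sends.

    import multiprocessing
    import zmq

    ADDRESS = "tcp://127.0.0.1:5555"

    def island():
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REP)
        sock.connect(ADDRESS)
        work = sock.recv_pyobj()               # unpickle whatever the parent sent
        sock.send_pyobj([x * x for x in work])
        sock.close()
        ctx.term()

    if __name__ == '__main__':
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REQ)
        sock.bind(ADDRESS)
        worker = multiprocessing.Process(target=island)
        worker.start()
        sock.send_pyobj(list(range(5)))
        print(sock.recv_pyobj())               # [0, 1, 4, 9, 16]
        worker.join()
        sock.close()
        ctx.term()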
But no, it's not as easy as just spinning up a few threads from the threading module.
Edit: Also, here's a Stack Overflow question similar to yours. Inside are some more relevant links which indicate that it may be possible to run multiple Python interpreters within a single process, but it doesn't look simple. Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program
Even though Python and Ruby have one kernel thread per interpreter thread, they have a global interpreter lock (GIL) that is used to protect potentially shared data structures, so this inhibits multi-processor execution. Even though the portions in those languages that are written in C or C++ can be free-threaded, that's not possible with pure interpreted code unless you use multiple processes. What's the best way to achieve this? Using FastCGI? Creating a cluster or a farm of virtualized servers? Using their Java equivalents, JRuby and Jython?
I'm not totally sure which problem you want to solve, but if you deploy your Python/Django application via an Apache prefork MPM using mod_python, Apache will start several worker processes for handling different requests.
If one request needs so many resources that you want to use multiple cores, have a look at pyprocessing. But I don't think that would be wise.
The 'standard' way to do this with rails is to run a "pack" of Mongrel instances (ie: 4 copies of the rails application) and then use apache or nginx or some other piece of software to sit in front of them and act as a load balancer.
This is probably how it's done with other ruby frameworks such as merb etc, but I haven't used those personally.
The OS will take care of running each Mongrel on its own CPU.
If you install mod_rails (aka Phusion Passenger), it will start and stop multiple copies of the Rails process for you as well, so it will end up spreading the load across multiple CPUs/cores in a similar way.
Use an interface that runs each response in a separate interpreter, such as mod_wsgi for Python. This lets multi-threading be used without encountering the GIL.
EDIT: Apparently, mod_wsgi no longer supports multiple interpreters per process because idiots couldn't figure out how to properly implement extension modules. It still supports running requests in separate processes FastCGI-style, though, so that's apparently the current accepted solution.
In Python and Ruby the only way to use multiple cores is to spawn new (heavyweight) processes.
The Java counterparts inherit the possibilities of the Java platform. You could simply use Java threads. That is, for example, a reason why Java application servers like GlassFish are sometimes (often) used for Ruby on Rails applications.
For Python, the PyProcessing project allows you to program with processes much like you would use threads. It is included in the standard library of the recently released 2.6 version as multiprocessing. The module has many features for establishing and controlling access to shared data structures (queues, pipes, etc.) and support for common idioms (i.e. managers and worker pools).
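A small sketch of those idioms, combining a worker pool with a Manager-backed shared dictionary:

    import multiprocessing

    def work(shared, item):
        shared[item] = item * item          # write into the shared structure

    if __name__ == '__main__':
        with multiprocessing.Manager() as manager:
            shared = manager.dict()         # proxy object, usable from any process
            with multiprocessing.Pool(processes=4) as pool:
                pool.starmap(work, [(shared, i) for i in range(8)])
            print(dict(shared))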