Using multiprocessing Pool.map for distributed computing in Python

The following code executes "someprogram" in parallel for the 50 input filenames, using a pool of 5 workers. How is it possible to use 5 workers on
this computer and 7 on another computer, using only standard libraries like multiprocessing? Ideally I would have a list of tuples (hostname, number_of_workers) that could be used to speed things up, and maybe turn it into a decorator so that it can be reused more easily on functions like "commandlinestuff". (Using Linux and Python 2.7)
import multiprocessing
import subprocess

def commandlinestuff(inputfilename):
    p = subprocess.Popen("someprogram " + inputfilename, shell=True)
    p.wait()

inputfilenames = ["something" + str(i).zfill(2) + ".in" for i in range(50)]
p = multiprocessing.Pool(5)
p.map(commandlinestuff, inputfilenames)

It sounds like you are trying to re-invent Pyro, which is itself written in pure Python but is not currently part of the standard library.
Basically you need a server running on the remote machine(s) that accepts a connection, receives a pickled object to execute (and possibly some data), executes it and posts back a result. You will also need a client on the local machine that does the posts, gathers the results and possibly does some load balancing.
The Parallel Processing entry in the Python wiki gives a long list of tools for this sort of thing, with various advantages and disadvantages.
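If staying within the standard library is a hard requirement, a minimal sketch of that server/client idea is possible with multiprocessing.managers: one machine exposes task and result queues over the network, and every machine runs as many workers as it likes against them. This is only a sketch; the host name 'master-host', port 50000, and authkey below are placeholder assumptions, and real code would need error handling and a clean shutdown.

# server.py - run on the machine that owns the work (placeholder names/port)
from multiprocessing.managers import BaseManager
import Queue  # "queue" on Python 3

task_queue = Queue.Queue()
result_queue = Queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register('get_tasks', callable=lambda: task_queue)
QueueManager.register('get_results', callable=lambda: result_queue)

for name in ["something" + str(i).zfill(2) + ".in" for i in range(50)]:
    task_queue.put(name)

manager = QueueManager(address=('', 50000), authkey='changeme')
manager.get_server().serve_forever()

# worker.py - run on every machine; Pool(5) here, Pool(7) on the other box
from multiprocessing.managers import BaseManager
import multiprocessing
import subprocess
import Queue

class QueueManager(BaseManager):
    pass

QueueManager.register('get_tasks')
QueueManager.register('get_results')

def worker(_):
    m = QueueManager(address=('master-host', 50000), authkey='changeme')
    m.connect()
    tasks, results = m.get_tasks(), m.get_results()
    while True:
        try:
            name = tasks.get_nowait()
        except Queue.Empty:
            break
        subprocess.call("someprogram " + name, shell=True)
        results.put(name)

if __name__ == '__main__':
    multiprocessing.Pool(5).map(worker, range(5))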

https://code.google.com/p/distributed-python-for-scripting/ did the trick for me - nothing to install, and it seems to be the shortest/easiest way possible to do multiprocessing in Python.

Related

How to catch runtime errors from native code in python?

I have the following problem. Let's say we have this Python function:
def func():
    # run some code here which calls some native code
    pass
Inside func() I am calling some functions which in turn call some native C code.
If any crash happens, the whole Python process crashes altogether.
How is it possible to catch and recover from such errors?
One way that came to my mind is to run this function in a separate process, but not just by starting another process: the function uses a lot of memory and objects, and it would be very hard to split that out. Is there something like C's fork() available in Python, to create a copy of the exact same process with the same memory structures and so on?
Or maybe other ideas?
Update:
It seems that there is no real way of catching C runtime errors in Python; they happen at a lower level and crash the whole Python virtual machine.
As solutions you currently have two options:
Use os.fork(), which works only in Unix-like OS environments.
Use multiprocessing and a shared-memory model to share big objects between processes. Usual serialization will just not work with objects that take up multiple gigabytes of memory (you will just run out of memory). However, there is a very good Python library called Ray (https://docs.ray.io/en/master/) that performs in-memory serialization of big objects using a shared-memory model; it's ideal for BigData/ML workloads - highly recommended.
As long as you are running on an operating system that supports fork, that's already how the multiprocessing module creates subprocesses. You could use os.fork(), multiprocessing.Process, or multiprocessing.Pool to get what you want.
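As a hedged illustration of that idea (not the original poster's code), you can run the crash-prone call in a child process via multiprocessing and inspect the child's exit code; a segfault in the native code then kills only the child. On Linux the child is created with fork(), so it sees a copy-on-write copy of the parent's memory, which helps with the concern about large in-memory objects. The function name risky_native_call is a placeholder.

import multiprocessing

def risky_native_call():
    # placeholder for the function that calls into native C code
    pass

def run_isolated(target):
    p = multiprocessing.Process(target=target)
    p.start()
    p.join()
    if p.exitcode != 0:
        # A negative exit code means the child was killed by a signal,
        # e.g. -11 for SIGSEGV.
        raise RuntimeError("child crashed with exit code %r" % p.exitcode)

if __name__ == '__main__':
    run_isolated(risky_native_call)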

On what CPU cores are my Python processes running?

The setup
I have written a pretty complex piece of software in Python (on a Windows PC). My software basically starts two Python interpreter shells. The first shell starts up (I suppose) when you double-click the main.py file. Within that shell, other threads are started in the following way:
# Start TCP_thread
TCP_thread = threading.Thread(name='TCP_loop', target=TCP_loop, args=(TCPsock,))
TCP_thread.start()

# Start UDP_thread
UDP_thread = threading.Thread(name='UDP_loop', target=UDP_loop, args=(UDPsock,))
UDP_thread.start()
The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.
The Main_thread also starts a subprocess. This is done in the following way:
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
From the Python documentation, I understand that this subprocess runs simultaneously (!) in a separate Python interpreter session/shell. The Main_thread in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.
I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:
I have several questions concerning this setup. I will list them down here:
Question 1 [Solved]
Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.
Question 2 [Not yet solved]
Do I have a way to track which CPU core it is?
Question 3 [Partly solved]
For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?
Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
The issue has been clarified. This code indeed starts a new Python interpreter instance.
Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?
Question 4 [New question]
The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?
No. The GIL and CPU affinity are unrelated concepts. The GIL can be released during blocking I/O operations and during long CPU-intensive computations inside a C extension.
If a thread is blocked on the GIL, it is not running on any CPU core, so it is fair to say that pure-Python multithreaded code may use only one CPU core at a time on the CPython implementation.
Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
I don't think CPython manages CPU affinity implicitly. It likely relies on the OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.
Q: Or is the Python interpreter able to spread them over multiple cores?
To find out the number of usable CPUs:
>>> import os
>>> len(os.sched_getaffinity(0))
16
Again, whether or not threads are scheduled on different CPUs does not depend on Python interpreter.
Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?
I imagine the specific CPU may change from one time slot to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:
>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
For a current portable solution, see whether psutil exposes such info.
You could restrict the current process to a set of CPUs:
os.sched_setaffinity(0, {0}) # current process on 0-th core
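Putting the /proc approach above together, a rough Linux-only helper could look like the following. It assumes the stat layout of reasonably recent kernels and uses the same [-14] trick as above so spaces in the comm field do not break the parsing.

import os

def thread_cores(pid=None):
    """Return {tid: core} for every thread of the given process (Linux only)."""
    pid = pid if pid is not None else os.getpid()
    cores = {}
    for tid in os.listdir("/proc/%d/task" % pid):
        with open("/proc/%d/task/%s/stat" % (pid, tid)) as f:
            # task_cpu is the 14th field from the end, as above
            cores[int(tid)] = int(f.read().split()[-14])
    return cores

print(thread_cores())  # e.g. {12345: 4, 12346: 2}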
Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?
Yes. The subprocess module creates new OS processes. If you run the python executable, it starts a new Python interpreter. If you run a bash script, no new Python interpreter is created, i.e., running the bash executable does not start a new Python interpreter/session/etc.
Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?
See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).
multiprocessing.Process(target=foo, args=(q,)).start()
multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).
In reality, my subprocess is another file. So this example won't work for me.
Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.
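A minimal sketch of that suggestion, reusing the placeholder names another_file and foo from above:

import multiprocessing
import another_file  # your other script, importable as a module

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=another_file.foo, args=(q,))
    p.start()
    p.join()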
Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'Python interpreter instance') with subprocess.Popen(..) versus multiprocessing.Process(..)?
multiprocessing.Process() also creates a new child process under the hood (via fork() on POSIX, or by launching a new interpreter on Windows). More importantly, multiprocessing provides an API that is similar to the threading API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).
If there are no CPU-intensive tasks, then you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks, then to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml, regex, numpy (or your own extension created using Cython) that can release the GIL during long computations, or offload the work into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures).
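For example, a hedged sketch of the process-pool route using concurrent.futures (cpu_heavy and the worker count are placeholder assumptions, not part of the original answer):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # stands in for your CPU-intensive task
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # Each task runs in its own worker process, so the GIL is not a bottleneck.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_heavy, [10**6] * 8))
    print(len(results))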
Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
subprocess creates new processes, any processes; e.g., you could run a bash script. multiprocessing is used to run Python code in another process. It is more flexible to import a Python module and run its function than to run it as a script. See Call python script with input with in a python script using subprocess.
You are using the threading module, which is built on top of the low-level thread module. As the documentation suggests, it uses the POSIX thread implementation (pthread) of your OS.
The threads are managed by the OS rather than by the Python interpreter, so the answer depends on the pthread library on your system. However, CPython uses the GIL to prevent multiple threads from executing Python bytecode simultaneously, so they are effectively serialized. They can still be scheduled on different cores, which depends on your pthread library.
Simply use a debugger and attach it to your python.exe, for example with the GDB thread command.
Similar to question 1, the new process is managed by your OS and is probably running on a different core. Use a debugger or any process monitor to see it. For more details, see the CreateProcess() documentation page.
1, 2: You have three real threads, but in CPython they're limited by the GIL, so, assuming they're running pure Python code, you'll see CPU usage as if only one core were used.
3: As gdlmx said, it's up to the OS to choose a core to run a thread on, but if you really need control, you can set process or thread affinity using the native API via ctypes. Since you are on Windows, it would be like this:
import ctypes

# This will run your subprocess on core #0 only
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
cpu_mask = 1
ctypes.windll.kernel32.SetProcessAffinityMask(p._handle, cpu_mask)
I use the private Popen._handle here for simplicity. The clean way would be OpenProcess(p.pid) etc.
And yes, subprocess runs python like everything else in another new process.

Threading with Hadoop Streaming

I am using Hadoop Streaming to write a Python-based HTML grabber. I find that running a single-threaded Python script is slow, and I want to modify it to a multithreaded version. Does anyone know what would be a good number of threads to set in the mapper? I am not sure of the specs of each node of the cluster, but I assume each would support at least two threads.
I tried to use threading with Python, but there were issues with the Global Interpreter Lock, so I ported the code to use the multiprocessing module. Internally, Hadoop assigns as many mappers as there are cores in the cluster, so multiprocessing is not the way to go if you need a speedup. Multithreading, if done right, might give some speedup.
I have not used Hadoop Streaming for an HTML grabber, but here is a post that talks about how urllib2 works with multiple threads (plain multithreading, not the multiprocessing package).
Hope it can be helpful.
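For what it's worth, here is a rough sketch (in Python 3, not the post's urllib2 code) of a Hadoop Streaming mapper that reads URLs from stdin and fetches them with a small thread pool; the thread count of 8 is an arbitrary assumption to tune per node.

import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # I/O-bound work: threads overlap the waiting despite the GIL.
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return url, str(len(resp.read()))
    except Exception as exc:
        return url, "ERROR: %s" % exc

urls = [line.strip() for line in sys.stdin if line.strip()]
with ThreadPoolExecutor(max_workers=8) as pool:
    for url, result in pool.map(fetch, urls):
        print("%s\t%s" % (url, result))  # key<TAB>value, streaming convention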

distributed python programming

I am trying to split the execution of a Python program across two different machines. I am wondering if there's a way to call the Python interpreter on one machine from another - not running a script on another machine, but rather splitting the task of execution between the two machines.
Over the course of the next couple of months, I will be teaching myself distributed programming, and I thought this would be a good way to start.
I think the first step is to use one machine to call another machine and send it a piece of the program. Then the next step would be that both machines execute the same program together and communicate to avoid problems. The third step would be three machines, etc.
Advice, tips, and thoughts are all welcome...
Disclaimer: I am a developer of SCOOP.
Data-based technologies you may want to get acquainted with for distributed processing are the MPI standard (for multi-computer setups, using mpi4py [preferred] or pympi) and the standard multiprocessing module, which allows remote computation (but is awkward, in my opinion).
You should begin with task-based frameworks, though. They provide simple and user-friendly usage, which was a primary focus while creating SCOOP. You can try it with pip install -U scoop. On Windows, you may wish to install PyZMQ first using their executable installers. You can check the provided examples and play with the various parameters to easily understand what causes performance degradation or improvement. I encourage you to compare it to alternatives such as Celery for similar work.
These frameworks allow remote launching of Python programs. More importantly, they do the parallel processing for you; you only need to feed them your tasks.
You may want to check Fabric for an easy way to set up your remote environments or even to control or launch scripts remotely.
Check out Ray, which is a library for writing parallel and distributed Python.
Ray uses the same syntax to parallelize code on a single multicore machine and in the distributed setting.
If you add the @ray.remote decorator to a function, it can be executed asynchronously in parallel (on any machine in the cluster). Remote function invocations return futures, whose values can be retrieved with ray.get.
The same thing can be done with Python classes (instead of functions), see the documentation for actors.
import ray
import time

ray.init()

@ray.remote
def function(x):
    time.sleep(1)
    return x

args = [1, 2, 3, 4]

# Submit 4 tasks in parallel.
result_ids = [function.remote(x) for x in args]

# Retrieve the results. Assuming at least 4 cores,
# this will take 1 second.
results = ray.get(result_ids)
See the Ray documentation for more. Note, I'm one of the Ray developers.
There is an MPI version for Python [1] [2].
MPI (Message Passing Interface) is a standardized interface, and it is nice because you will also find it in C, Java, (Fortran), etc.
It enables you to communicate between your processes that run remotely. You use these messages for synchronization and for passing information.
You also have collective operations, like broadcast, gather, and reduce.
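As a small hedged example of this message-passing style with mpi4py (the file name and data are arbitrary), run for instance with mpirun -n 4 python mpi_demo.py:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 0 owns some data; the collective broadcast sends it to every process,
# possibly running on different machines.
data = {'payload': 42} if rank == 0 else None
data = comm.bcast(data, root=0)
print("rank %d got %r" % (rank, data))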
Have a look at RPyC; you might find it useful.

multiprocess a function with a element of list as argument

I am trying to achieve multiprocessing in Python. I might have a minimum of 500 elements in the list. I have a function to which each element of the list needs to be passed as an argument, and each of these function calls should be executed as a separate process using multiprocessing, whether by starting a new interpreter or however. Following is some pseudo code.
def fiction(arrayElement):
    # perform some operations here
    pass

arrayList = []

for eachElement in arrayList:
    fiction(eachElement)
I want to multiprocess the function inside
for eachElement in arrayList:
so that I can use the multiple cores of my box. Any help is appreciated.
The multiprocessing module contains all sorts of basic classes which can be helpful for this:
from multiprocessing import Pool

def f(x):
    return x*x

p = Pool(5)
p.map(f, [1, 2, 3])
And the work will be distributed among the pool's worker processes.
This is fairly simple, but you can achieve much more using external packages, mostly message-oriented middleware.
Prime examples are ActiveMQ, RabbitMQ and ZeroMQ.
RabbitMQ combines a good Python API with simplicity. You can see here how simple it is to create a dispatcher-workers pattern, in which one process sends the workload and other processes perform it.
ZeroMQ is a bit more low-level, but it is very lightweight and does not require an external broker.
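A rough sketch of that dispatcher-workers idea with ZeroMQ's PUSH/PULL sockets (requires pyzmq; the port and host are arbitrary assumptions, and fiction stands in for the function from the question):

import zmq

def fiction(element):
    # placeholder for the per-element work from the question
    pass

def dispatcher(elements):
    # One process pushes the workload...
    context = zmq.Context()
    sender = context.socket(zmq.PUSH)
    sender.bind("tcp://*:5557")
    for element in elements:
        sender.send_pyobj(element)

def worker():
    # ...and any number of worker processes pull items and perform the work.
    context = zmq.Context()
    receiver = context.socket(zmq.PULL)
    receiver.connect("tcp://localhost:5557")
    while True:
        element = receiver.recv_pyobj()
        fiction(element)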
