Python Subprocess Popen Parallelization - python

Objective
- A process (.exe) with multiple input arguments
- Multiple files; for each one, the above-mentioned process shall be executed
- I want to use Python to parallelize the processing

I am using subprocess.Popen to create the processes and afterwards keep at most N processes running in parallel.
For testing purposes, I want to parallelize a simple command like "cmd /c timeout 5".
State of work
import subprocess

count = 10
parallel = 2
processes = []

for i in range(0, count):
    while (len(processes) >= parallel):
        for process in processes:
            if (process.poll() is None):
                processes.remove(process)
                break
    process = subprocess.Popen(["cmd", "/c timeout 5"])
    processes.append(process)
[...]
I read somewhere that a good approach for checking whether a process is still running would be an is not None check, as shown in the code.
Question
I am somehow struggling to set it up correctly, especially the Popen([...]) part. In some cases, all processes are executed without considering the maximum parallel count, and in other cases it doesn't work at all.
I guess there has to be a part where the process is closed once it has finished.
Thanks!

You will probably have a better time using the built-in multiprocessing module to manage the subprocesses running your tasks.
The reason I've wrapped the command in a dict is that imap_unordered doesn't have a starmap alternative, so it's easier to unpack a single "job" within the callable. (imap_unordered is faster than imap but doesn't guarantee ordered execution, since any worker process can grab any job; whether that's acceptable is your business problem.)
import multiprocessing
import subprocess

def run_command(job):
    # TODO: add other things here?
    subprocess.check_call(job["command"])

def main():
    with multiprocessing.Pool(2) as p:
        jobs = [{"command": ["cmd", "/c timeout 5"]} for x in range(10)]
        for result in p.imap_unordered(run_command, jobs):
            pass

if __name__ == "__main__":
    main()
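If the order of results does matter, one alternative (a sketch, not part of the original answer) is Pool.starmap, which unpacks argument tuples for you, so the dict wrapper isn't needed:
import multiprocessing
import subprocess

def run_command(command):
    # check_call raises CalledProcessError if the command exits non-zero
    subprocess.check_call(command)

def main():
    # each job is a tuple of positional arguments for run_command
    jobs = [(["cmd", "/c timeout 5"],) for _ in range(10)]
    with multiprocessing.Pool(2) as pool:
        # starmap unpacks each tuple, keeps input order, and blocks until all jobs are done
        pool.starmap(run_command, jobs)

if __name__ == "__main__":
    main()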

Related

Spawning multiple processes with Python

Earlier I tried to use the threading module in Python to create multiple threads. Then I learned about the GIL and how it does not allow taking advantage of multiple CPU cores on a single machine. So now I'm trying multiprocessing (I don't strictly need separate threads).
Here is some sample code I wrote to see if distinct processes are being created. But as can be seen in the output below, I'm getting the same process ID every time. So multiple processes are not being created. What am I missing?
import multiprocessing as mp
import os

def pri():
    print(os.getpid())

if __name__ == '__main__':
    # Checking number of CPU cores
    print(mp.cpu_count())
    processes = [mp.Process(target=pri()) for x in range(1, 4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Output:
4
12554
12554
12554
The Process class requires a callable as its target.
Instead of running the function in the separate process, you are calling it and passing its result (None in this case) to the Process class.
Just change the following:
mp.Process(target=pri())
with:
mp.Process(target=pri)
Since the subprocesses run as separate processes, you won't see their print statements. They also don't share the same memory space. You pass pri() to target, where it needs to be pri; you need to pass a callable object, not execute it.
The prints you see come from your main process, because passing pri() means the function is actually executed there. You need to change your code so the pri function returns its value rather than printing it.
Then you need to use a queue: all your worker processes write to it, and when they're done, your main process reads from it.
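A minimal sketch of that queue approach (the names are illustrative, not from the original code):
import multiprocessing as mp
import os

def pri(q):
    # each worker reports its own PID through the queue
    q.put(os.getpid())

if __name__ == '__main__':
    q = mp.Queue()
    processes = [mp.Process(target=pri, args=(q,)) for _ in range(3)]
    for p in processes:
        p.start()
    # the main process reads one result per worker
    for _ in processes:
        print(q.get())
    for p in processes:
        p.join()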
A nice feature of the multiprocessing module is the Pool object. It allows you to create a pool of worker processes and then just use it; it's more convenient.
I have tried your code; the thing is that the function executes too quickly, so the OS reuses the PIDs. If you add a time.sleep(1) to your pri function, it will work as you expect.
That is true only for Windows; the example below was run on Windows. On Unix-like machines, you won't need the sleep.
The more convenient solution looks like this:
from multiprocessing import Pool
from time import sleep
import os

def pri(x):
    sleep(1)
    return os.getpid()

def use_procs():
    p_pool = Pool(4)
    p_results = p_pool.map(pri, [_ for _ in range(1, 4)])
    p_pool.close()
    p_pool.join()
    return p_results

if __name__ == '__main__':
    res = use_procs()
    for r in res:
        print r
Without the sleep:
==================== RESTART: C:/Python27/tests/test2.py ====================
6576
6576
6576
>>>
with the sleep:
==================== RESTART: C:/Python27/tests/test2.py ====================
10396
10944
9000

multiprocessing.pool context and load balancing

I've encountered some unexpected behaviour of the python multiprocessing Pool class.
Here are my questions:
1) When does Pool create its context, which is later used for serialization? The example below runs fine as long as the Pool object is created after the Container definition. If you swap the Pool initializations, a serialization error occurs. In my production code I would like to initialize the Pool well before defining the container class. Is it possible to refresh the Pool "context", or to achieve this in another way?
2) Does Pool have its own load balancing mechanism and if so how does it work?
If I run a similar example on my i7 machine with the pool of 8 processes I get the following results:
- For a light evaluation function Pool favours using only one process for computation. It creates 8 processes as requested but for most of the time only one is used (I printed the pid from inside and also see this in htop).
- For a heavy evaluation function the behaviour is as expected. It uses all 8 processes equally.
3) When using Pool I always see 4 more processes than I requested (i.e. for Pool(processes=2) I see 6 new processes). What is their role?
I use Linux with Python 2.7.2
from multiprocessing import Pool
from datetime import datetime

POWER = 10

def eval_power(container):
    for power in xrange(2, POWER):
        container.val **= power
    return container

#processes = Pool(processes=2)

class Container(object):
    def __init__(self, value):
        self.val = value

processes = Pool(processes=2)

if __name__ == "__main__":
    cont = [Container(foo) for foo in xrange(20)]
    then = datetime.now()
    processes.map(eval_power, cont)
    now = datetime.now()
    print "Eval time:", now - then
EDIT - TO BAKURIU
1) I was afraid that that's the case.
2) I don't understand what the Linux scheduler has to do with Python assigning computations to processes. My situation can be illustrated by the example below:
from multiprocessing import Pool
from os import getpid
from collections import Counter

def light_func(ind):
    return getpid()

def heavy_func(ind):
    for foo in xrange(1000000):
        ind += foo
    return getpid()

if __name__ == "__main__":
    list_ = range(100)
    pool = Pool(4)
    l_func = pool.map(light_func, list_)
    h_func = pool.map(heavy_func, list_)
    print "light func:", Counter(l_func)
    print "heavy func:", Counter(h_func)
On my i5 machine (4 threads) I get the following results:
light func: Counter({2967: 100})
heavy func: Counter({2969: 28, 2967: 28, 2968: 23, 2970: 21})
It seems that the situation is as I've described it. However, I still don't understand why Python does it this way. My guess would be that it tries to minimise communication overhead, but the mechanism it uses for load balancing is still unclear to me. The documentation isn't very helpful either; the multiprocessing module is rather poorly documented.
3) If I run the above code I get 4 more processes as described before. The screenshot comes from htop: http://i.stack.imgur.com/PldmM.png
The Pool object creates the subprocesses during the call to __init__, hence you must define Container before. By the way, I wouldn't include all the code in a single file but would use a module to implement Container and the other utilities, and write a small file that launches the main program.
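A minimal sketch of that ordering (Python 3 syntax; the workload is trimmed down from the question's example):
from multiprocessing import Pool

# Define the class (or import it from its own module) before creating the Pool,
# so the worker processes spawned by Pool() can unpickle Container instances.
class Container(object):
    def __init__(self, value):
        self.val = value

def eval_power(container):
    for power in range(2, 5):
        container.val **= power
    return container

if __name__ == "__main__":
    processes = Pool(processes=2)   # created only after Container is defined
    cont = [Container(foo) for foo in range(20)]
    results = processes.map(eval_power, cont)
    print([c.val for c in results])
    processes.close()
    processes.join()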
The Pool does exactly what is described in the documentation. In particular, it has no control over the scheduling of the processes, hence what you see is what Linux's scheduler thinks is right. For small computations, the tasks take so little time that the scheduler doesn't bother parallelizing them (this probably gives better performance due to core affinity etc.).
Could you show this with an example and what you see in the task manager? I think they may be the processes that handle the queue inside the Pool, but I'm not sure. On my machine I can see only the main process plus the two subprocesses.
Update on point 2:
The Pool object simply puts the tasks into a queue, and the child processes take their arguments from that queue. If a task takes almost no time to execute, the Linux scheduler lets the same process keep running (hence consuming more items from the queue). If the execution takes a long time, the scheduler switches processes, so the other child processes also get to run.
In your case a single process is consuming all the items because the computation takes so little time that it has finished them all before the other child processes are even ready.
As I said, Pool doesn't do anything about balancing the work of the subprocesses. It's simply a queue and a bunch of workers; the pool puts items in the queue and the processes get the items and compute the results. AFAIK the only control it has over the queue is batching a certain number of tasks into a single queue item (the chunksize argument; see the documentation), but there is no guarantee about which process will grab which task. Everything else is left to the OS.
On my machine the results are less extreme. Two processes get about twice as many calls as the other two for the light computation, while for the heavy one all have more or less the same number of items processed. Probably on different OSes and/or hardware we would obtain different results again.
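As a rough illustration of that chunking (a sketch reusing the light_func idea above; the exact PID spread will vary by OS and load), passing chunksize=1 puts one task per queue item, which may spread trivial tasks across more workers than the default chunking:
from collections import Counter
from multiprocessing import Pool
from os import getpid

def light_func(ind):
    return getpid()

if __name__ == "__main__":
    with Pool(4) as pool:
        # default chunksize: large batches, so one fast worker may grab most of them
        default_chunks = pool.map(light_func, range(100))
        # chunksize=1: one task per queue item, usually spread over more workers
        single_chunks = pool.map(light_func, range(100), chunksize=1)
    print("default:", Counter(default_chunks))
    print("chunksize=1:", Counter(single_chunks))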

Subprocess management with python

I have a data analysis script that takes an argument specifying the segments of the analysis to perform. I want to run up to 'n' instances of the script at a time, where 'n' is the number of cores on the machine. The complication is that there are more segments of the analysis than there are cores, so I want to run at most 'n' processes at once and, when one of them finishes, kick off another one. Has anyone done something like this before using the subprocess module?
I do think the multiprocessing module will help you achieve what you need.
Take a look at the example technique below.
import multiprocessing

def do_calculation(data):
    """
    #note: you can define your calculation code
    """
    return data * 2

def start_process():
    print 'Starting', multiprocessing.current_process().name

if __name__ == '__main__':
    analsys_jobs = list(range(10))  # could be your analysis work
    print 'analsys_jobs :', analsys_jobs
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size,
                                initializer=start_process,
                                maxtasksperchild=2, )
    # maxtasksperchild tells the pool to restart a worker process
    # after it has finished a few tasks. This can be used to avoid
    # having long-running workers consume ever more system resources.
    pool_outputs = pool.map(do_calculation, analsys_jobs)
    # The result of the map() method is functionally equivalent to the
    # built-in map(), except that individual tasks run in parallel.
    # Since the pool is processing its inputs in parallel, close() and join()
    # can be used to synchronize the main process with the
    # task processes to ensure proper cleanup.
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks
    print 'Pool :', pool_outputs
You could find good multiprocessing techniques here to start with
Use the multiprocessing module, specifically the Pool class. Pool creates a pool of processes (by default, as many processes as you have CPUs), and allows you to submit jobs to the pool which are executed on the next free process. It takes care of all the subprocess management and the details of passing data between tasks, so you can write code in a very straightforward manner. See the documentation for some examples on usage.
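A minimal sketch of that approach (the script name and --segment argument are hypothetical placeholders for your analysis command):
import multiprocessing
import subprocess

def run_segment(segment):
    # hypothetical: each job runs the analysis script on one segment
    subprocess.check_call(["python", "analysis.py", "--segment", str(segment)])

if __name__ == "__main__":
    segments = range(40)                  # more segments than cores
    with multiprocessing.Pool() as pool:  # defaults to cpu_count() workers
        pool.map(run_segment, segments)   # at most cpu_count() run at once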

Multiprocessing in Python while limiting the number of running processes

I'd like to run multiple instances of program.py simultaneously, while limiting the number of instances running at the same time (e.g. to the number of CPU cores available on my system). For example, if I have 10 cores and have to do 1000 runs of program.py in total, only 10 instances will be created and running at any given time.
I've tried using the multiprocessing module, multithreading, and using queues, but there's nothing that seemed to me to lend itself to an easy implementation. The biggest problem I have is finding a way to limit the number of processes running simultaneously. This is important because if I create 1000 processes at once, it becomes equivalent to a fork bomb. I don't need the results returned from the processes programmatically (they output to disk), and the processes all run independently of each other.
Can anyone please give me suggestions or an example of how I could implement this in python, or even bash? I'd post the code I've written so far using queues, but it doesn't work as intended and might already be down the wrong path.
Many thanks.
I know you mentioned that the Pool.map approach doesn't make much sense to you. The map is just an easy way to give it a source of work, and a callable to apply to each of the items. The func for the map could be any entry point to do the actual work on the given arg.
If that doesn't seem right for you, I have a pretty detailed answer over here about using a Producer-Consumer pattern: https://stackoverflow.com/a/11196615/496445
Essentially, you create a Queue and start N workers. Then you either feed the queue from the main thread or create a Producer process that feeds the queue. The workers just keep taking work from the queue, and there will never be more concurrent work happening than the number of processes you have started.
You also have the option of putting a limit on the queue, so that it blocks the producer when there is already too much outstanding work, if you need to put constraints also on the speed and resources that the producer consumes.
The work function that gets called can do anything you want. This can be a wrapper around some system command, or it can import your python lib and run the main routine. There are specific process management systems out there which let you set up configs to run your arbitrary executables under limited resources, but this is just a basic python approach to doing it.
Snippets from that other answer of mine:
Basic Pool:
from multiprocessing import Pool

def do_work(val):
    # could instantiate some other library class,
    # call out to the file system,
    # or do something simple right here.
    return "FOO: %s" % val

pool = Pool(4)
work = get_work_args()
results = pool.map(do_work, work)
Using a process manager and producer
from multiprocessing import Process, Manager
import time
import itertools

def do_work(in_queue, out_list):
    while True:
        item = in_queue.get()
        # exit signal
        if item is None:
            return
        # fake work
        time.sleep(.5)
        result = item
        out_list.append(result)

if __name__ == "__main__":
    num_workers = 4

    manager = Manager()
    results = manager.list()
    work = manager.Queue(num_workers)

    # start the workers
    pool = []
    for i in xrange(num_workers):
        p = Process(target=do_work, args=(work, results))
        p.start()
        pool.append(p)

    # produce data
    # this could also be started in a producer process
    # instead of blocking
    iters = itertools.chain(get_work_args(), (None,) * num_workers)
    for item in iters:
        work.put(item)

    for p in pool:
        p.join()

    print results
You should use a process supervisor. One approach would be to use the API provided by Circus to do this "programmatically" (the documentation site is offline at the moment, but I think that's just a temporary problem); anyway, you can use Circus to handle this. Another approach would be to use supervisord and set the numprocs parameter of the process to the number of cores you have.
An example using Circus:
from circus import get_arbiter

arbiter = get_arbiter("myprogram", numprocesses=3)
try:
    arbiter.start()
finally:
    arbiter.stop()
Bash script rather than Python, but I use it often for simple parallel processing:
#!/usr/bin/env bash

waitForNProcs()
{
    nprocs=$(pgrep -f $procName | wc -l)
    while [ $nprocs -gt $MAXPROCS ]; do
        sleep $SLEEPTIME
        nprocs=$(pgrep -f $procName | wc -l)
    done
}

SLEEPTIME=3
MAXPROCS=10
procName=myPython.py

for file in ./data/*.txt; do
    waitForNProcs
    ./$procName $file &
done
Or, for very simple cases, another option is xargs, where -P sets the number of processes:
find ./data/ | grep txt | xargs -P10 -I SUB ./myPython.py SUB
While there are many answers about using multiprocessing.Pool, there are not many code snippets on how to use multiprocessing.Process, which is indeed more beneficial when memory usage matters. Starting 1000 processes will overload the CPU and exhaust memory. If each process and its data pipelines are memory intensive, the OS or Python itself will limit the number of parallel processes. I developed the code below to submit jobs to the CPU in batches, limiting how many run simultaneously. The batch size can be scaled in proportion to the number of CPU cores. On my Windows PC, the number of jobs per batch can efficiently be up to 4 times the number of CPU cores available.
import multiprocessing

def func_to_be_multiprocessed(q, data):
    # placeholder job: each process just reports back through the queue
    q.put('s')

if __name__ == '__main__':
    number_of_jobs = 100
    data = None  # whatever each job needs

    q = multiprocessing.Queue()
    worker = []
    for p in range(number_of_jobs):
        worker.append(multiprocessing.Process(target=func_to_be_multiprocessed,
                                              args=(q, data)))

    num_cores = multiprocessing.cpu_count()
    scaling_factor_batch_jobs = 3
    num_jobs_per_batch = num_cores * scaling_factor_batch_jobs
    num_of_batches = number_of_jobs // num_jobs_per_batch

    # run the jobs batch by batch, so at most num_jobs_per_batch run at once
    for i_batch in range(num_of_batches):
        floor_job = i_batch * num_jobs_per_batch
        ceil_job = floor_job + num_jobs_per_batch
        for p in worker[floor_job:ceil_job]:
            p.start()
        for p in worker[floor_job:ceil_job]:
            p.join()

    # run whatever is left over after the full batches
    for p in worker[num_of_batches * num_jobs_per_batch:]:
        p.start()
    for p in worker[num_of_batches * num_jobs_per_batch:]:
        p.join()

    for p in multiprocessing.active_children():
        p.terminate()

    result = []
    for p in worker:
        result.append(q.get())
The only problem is that if any job in a batch fails to complete and hangs, the remaining batches will never be started. So the function being processed must have proper error handling routines.
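One way to guard against that (a sketch, not part of the original code) is to join each worker in a batch with a timeout and terminate anything still alive before moving on:
import multiprocessing

def guarded_join(batch, timeout=60):
    # wait up to `timeout` seconds for each worker, then force-stop stragglers
    for p in batch:
        p.join(timeout)
    for p in batch:
        if p.is_alive():
            p.terminate()  # hanging job: kill it so the next batch can start
            p.join()
In the batch code above, guarded_join(worker[floor_job:ceil_job]) could replace the plain join loops.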

How do I run two python loops concurrently?

Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
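For the Queue suggestion above, a minimal sketch (the loop bodies are stand-ins for Task A and Task B) of sending results back to the main process:
from multiprocessing import Process, Queue

def loop_a(q):
    total = sum(range(10000))  # stand-in for Task A
    q.put(("a", total))

def loop_b(q):
    total = sum(range(10000))  # stand-in for Task B
    q.put(("b", total))

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=loop_a, args=(q,)), Process(target=loop_b, args=(q,))]
    for p in procs:
        p.start()
    # collect one result per loop, in whichever order they finish
    for _ in procs:
        print(q.get())
    for p in procs:
        p.join()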
There are many possible options for what you want:
use a loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):
    # use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB must be done after taskA (or the other way around); they can't run simultaneously.
multiprocessing
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks can run simultaneously in the background; you can control the tasks (end or stop them, etc.); tasks can exchange data, and they can be synchronized if they compete for the same resources, etc.
drawbacks: too heavy! The OS will frequently switch between them, and they have their own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
threading is like multiprocessing, just more lightweight. Check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading; a minimal sketch follows the merits and drawbacks below.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
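A minimal gevent sketch (assuming gevent is installed; the task bodies are stand-ins):
import gevent

def loop_a():
    for i in range(10000):
        # Do Task A
        gevent.sleep(0)  # yield so the other greenlet can run

def loop_b():
    for i in range(10000):
        # Do Task B
        gevent.sleep(0)

# spawn both loops as greenlets and wait for them to finish
gevent.joinall([gevent.spawn(loop_a), gevent.spawn(loop_b)])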
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they won't)? Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the Python threading module. However, threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU intensive, this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to run at the same time, it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about a single loop: for i in range(10000): do Task A, then do Task B? Without more information, I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly well for executing multiple processes at once within a Python script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data):  # ...plus any other arguments you need
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data)  # Arguments to be passed into the desired function.
multiple_results = []
for i in range(processes):  # Executes each worker w/ option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i + 1,) + args))
results = [res.get() for res in multiple_results]  # Retrieves results after every worker is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to make sure your function can distinguish between the processes (hence the "option" variable). Additionally, it doesn't have to be an array that is populated at the end, but for my example that's how I used it. Hope this simplifies things or helps you better understand the power of multiprocessing in Python!
