I'm very new to multiprocessing, so I'm likely doing something really dumb. The situation in a nutshell:
I have a GUI app that performs multiple lengthy calculations in the background.
Since it's a GUI app, the wrapper method that does all calculations uses threading to prevent the window from hanging:
def _run_calc(self):
    """
    Run data processing in a separate thread to prevent the main
    window from freezing.
    """
    t = threading.Thread(target=self._process_data)
    t.start()
Inside this thread, further down the line the wrapper method that runs all individual calculations is using multiprocessing:
def _calculate_components(self):
    processes = []
    if self.mineralogy.get():
        self.minerals = self._get_mineralogy_components()
        miner_worker = Process(target=self.calculate_mineralogy())
        processes.append(miner_worker)
    if self.porosity.get():
        porosity_worker = Process(target=self.calculate_porosity())
        processes.append(porosity_worker)
    if self.poi.get():
        poi_worker = Process(target=self.calculate_poi())
        processes.append(poi_worker)
    if self.water_table.get():
        owt_worker = Process(target=self.calculate_owt())
        processes.append(owt_worker)

    for i in processes:
        i.start()
    for i in processes:
        i.join()

    self._add_components_to_data()
Now the problem is that, based on the console output, the processes get executed one after another, not concurrently.
Also, a run on test data takes 35 seconds without multiprocessing and 47 seconds with it, which of course defeats the whole purpose.
I'm pretty sure I'm misunderstanding something here and doing something completely wrong. How do I make the processes run in parallel?
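For reference, the usual cause of this exact symptom is that target=self.calculate_mineralogy() calls the method immediately in the calling thread and hands whatever it returns to Process as the target, so the heavy work happens sequentially before any worker starts. A minimal self-contained sketch of the intended shape (slow_task is a hypothetical stand-in for the calculation methods):

import time
from multiprocessing import Process

def slow_task(name):
    time.sleep(2)           # stand-in for a lengthy calculation
    print(name, "done")

if __name__ == "__main__":
    # Pass the callable itself; no "()" after slow_task.
    workers = [Process(target=slow_task, args=(name,))
               for name in ("mineralogy", "porosity", "poi", "owt")]
    for w in workers:
        w.start()
    for w in workers:
        w.join()            # all four finish after ~2 s total, not ~8 s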
Related
Objective
I have a process (an .exe) that takes multiple input arguments.
There are multiple files, and the process should be executed for each of them.
I want to use Python to parallelize this.
I am using subprocess.Popen to create the processes and afterwards keep a maximum of N parallel processes.
For testing purposes, I want to parallelize a simple script like "cmd timeout 5".
State of work
import subprocess

count = 10
parallel = 2
processes = []
for i in range(0, count):
    while (len(processes) >= parallel):
        for process in processes:
            if (process.poll() is None):
                processes.remove(process)
                break
    process = subprocess.Popen(["cmd", "/c timeout 5"])
    processes.append(process)
[...]
I read somewhere that a good approach for checking if a process is running would be is not None like shown in the code.
Question
I am somehow struggling to set it up correctly, especially the Popen([...]) part. In some cases all processes are executed without respecting the maximum parallel count, and in other cases it doesn't work at all.
I guess there has to be a part where a process is removed from the list once it has finished.
Thanks!
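For what it's worth, a minimal corrected sketch of the polling approach (same cmd /c timeout 5 test command as above): poll() returns None while a child is still running, so a slot only frees up once it returns an exit code.

import subprocess
import time

count = 10
parallel = 2
running = []

for i in range(count):
    # Wait until fewer than `parallel` children are still alive.
    while len(running) >= parallel:
        running = [p for p in running if p.poll() is None]  # keep only live ones
        time.sleep(0.1)  # avoid a busy loop
    running.append(subprocess.Popen(["cmd", "/c timeout 5"]))

for p in running:
    p.wait()  # let the last children finish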
You will probably have a better time using the built-in multiprocessing module to manage the subprocesses running your tasks.
The reason I've wrapped the command in a dict is that imap_unordered doesn't have a starmap alternative, so it's easier to unpack a single "job" within the callable. (imap_unordered is faster than imap, but it doesn't guarantee ordered results since any worker process can grab any job; whether that's okay for you is your business problem.)
import multiprocessing
import subprocess

def run_command(job):
    # TODO: add other things here?
    subprocess.check_call(job["command"])

def main():
    with multiprocessing.Pool(2) as p:
        jobs = [{"command": ["cmd", "/c timeout 5"]} for x in range(10)]
        for result in p.imap_unordered(run_command, jobs):
            pass

if __name__ == "__main__":
    main()
I frequently use the pattern below to parallelize tasks in Python. I do it this way because filling the input queue is quick, and once the processes are launched and running asynchronously, I can call a blocking get() in a loop and pull the results out as they are ready. For tasks that take days this is great, because I can do things like report progress.
from multiprocessing import Process, Queue

class worker():
    def __init__(self, init_dict):
        self.init_dict = init_dict

    def __call__(self, task_queue, done_queue):
        while True:
            task_args = task_queue.get()  # blocks until a task is available
            task_result = self.do_work(task_args)
            done_queue.put(task_result)

if __name__ == "__main__":
    n_threads = 8
    init_dict = {}  # whatever we need to set up our class
    worker_class = worker(init_dict)

    task_queue = Queue()
    done_queue = Queue()

    some_iterator = [1, 2, 3, 4, 5]  # or a list of files to chew through normally

    for task in some_iterator:
        task_queue.put(task)

    for i in range(n_threads):
        Process(target=worker_class, args=(task_queue, done_queue)).start()

    for i in range(len(some_iterator)):
        result = done_queue.get()
        # do something with result
        # print out progress stats, whatever, as tasks complete
I have glossed over a few details like catching errors, dealing with things that fail, killing zombie processes, exiting at the end of the task queue, and catching tracebacks, but you get the idea. I really love this pattern and it works perfectly for my needs. I have a lot of code that uses it.
I need more computing power, though, and want to spread the work across a cluster. Ray offers a multiprocessing pool with an API that matches that of Python's multiprocessing. I just can't work out how to get the above pattern to work. Mainly I get:
RuntimeError: Queue objects should only be shared between processes through inheritance
Does anybody have any recommendations of how I can get results as they are ready from a queue when using a pool, rather than n separate processes?
I appreciate that if I do a massive rewrite, then there are probably other ways to get what I want from ray, but I have a lot of code like this, so want to try and keep changes minimal.
Thanks
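For comparison, the closest pool-based equivalent in the standard library that I know of is imap_unordered, which also yields results as soon as any worker finishes; if Ray's ray.util.multiprocessing.Pool mirrors the stdlib API as advertised, the same shape should carry over. A minimal sketch, with do_work standing in for the worker's real method:

from multiprocessing import Pool  # or, reportedly: from ray.util.multiprocessing import Pool

def do_work(task_args):
    # placeholder for the real per-task computation
    return task_args * 2

if __name__ == "__main__":
    some_iterator = [1, 2, 3, 4, 5]  # or a list of files to chew through
    with Pool(8) as pool:
        # Each result comes back as soon as any worker finishes it,
        # so progress can be reported here much like with done_queue.get().
        for result in pool.imap_unordered(do_work, some_iterator):
            print("finished:", result)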
I've encountered some unexpected behaviour of the python multiprocessing Pool class.
Here are my questions:
1) When does Pool create its context, which is later used for serialization? The example below runs fine as long as the Pool object is created after the Container definition. If you swap the Pool initializations, a serialization error occurs. In my production code I would like to initialize the Pool well before defining the container class. Is it possible to refresh the Pool "context", or to achieve this in another way?
2) Does Pool have its own load balancing mechanism, and if so, how does it work?
If I run a similar example on my i7 machine with a pool of 8 processes I get the following results:
- For a light evaluation function, Pool favours using only one process for the computation. It creates 8 processes as requested, but most of the time only one is used (I printed the pid from inside and also see this in htop).
- For a heavy evaluation function, the behaviour is as expected. It uses all 8 processes equally.
3) When using Pool I always see 4 more processes than I requested (i.e. for Pool(processes=2) I see 6 new processes). What is their role?
I use Linux with Python 2.7.2.
from multiprocessing import Pool
from datetime import datetime

POWER = 10

def eval_power(container):
    for power in xrange(2, POWER):
        container.val **= power
    return container

#processes = Pool(processes=2)

class Container(object):
    def __init__(self, value):
        self.val = value

processes = Pool(processes=2)

if __name__ == "__main__":
    cont = [Container(foo) for foo in xrange(20)]

    then = datetime.now()
    processes.map(eval_power, cont)
    now = datetime.now()

    print "Eval time:", now - then
EDIT - TO BAKURIU
1) I was afraid that that's the case.
2) I don't understand what the Linux scheduler has to do with Python assigning computations to processes. My situation can be illustrated by the example below:
from multiprocessing import Pool
from os import getpid
from collections import Counter

def light_func(ind):
    return getpid()

def heavy_func(ind):
    for foo in xrange(1000000):
        ind += foo
    return getpid()

if __name__ == "__main__":
    list_ = range(100)
    pool = Pool(4)
    l_func = pool.map(light_func, list_)
    h_func = pool.map(heavy_func, list_)

    print "light func:", Counter(l_func)
    print "heavy func:", Counter(h_func)
On my i5 machine (4 threads) I get the following results:
light func: Counter({2967: 100})
heavy func: Counter({2969: 28, 2967: 28, 2968: 23, 2970: 21})
It seems that the situation is as I've described it. However, I still don't understand why Python does it this way. My guess would be that it tries to minimise communication costs, but the mechanism it uses for load balancing is still unknown to me. The documentation isn't very helpful either; the multiprocessing module is very poorly documented.
3) If I run the above code I get 4 more processes as described before. The screenshot comes from htop: http://i.stack.imgur.com/PldmM.png
The Pool object creates the subprocesses during the call to __init__, hence you must define Container before. By the way, I wouldn't include all the code in a single file but would use a module to implement Container and the other utilities, plus a small file that launches the main program.
The Pool does exactly what is described in the documentation. In particular, it has no control over the scheduling of the processes, hence what you see is what Linux's scheduler thinks is right. For small computations, they take so little time that the scheduler doesn't bother parallelizing them (this probably gives better performance due to core affinity etc.).
Could you show this with an example and what you see in the task manager? I think they may be the processes that handle the queue inside the Pool, but I'm not sure. On my machine I can see only the main process plus the two subprocesses.
Update on point 2:
The Pool object simply puts the tasks into a queue, and the child processes pull the arguments from this queue. If a task takes almost no time to execute, then the Linux scheduler lets the process run longer (hence it consumes more items from the queue). If the execution takes a long time, then the scheduler will switch processes, and thus the other child processes also get to run.
In your case a single process is consuming all the items because the computation takes so little time that, before the other child processes are ready, it has already finished them all.
As I said, Pool doesn't do anything to balance the work across the subprocesses. It's simply a queue and a bunch of workers: the pool puts items into the queue and the processes take the items and compute the results. AFAIK the only thing it does to control the queue is put a certain number of tasks into a single queue item (see the chunksize argument in the documentation), but there is no guarantee about which process will grab which task. Everything else is left to the OS.
On my machine the results are less extreme. Two processes get about twice the number of calls as the other two for the light computation, while for the heavy one all have more or less the same number of items processed. Probably on different OSes and/or hardware we would obtain different results.
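A small sketch of that chunking knob: the chunksize argument of map controls how many consecutive tasks are bundled into a single queue item, which is one reason a cheap function can end up handled almost entirely by one pid.

from multiprocessing import Pool
from os import getpid
from collections import Counter

def light_func(ind):
    return getpid()

if __name__ == "__main__":
    pool = Pool(4)
    # chunksize=25 hands each worker a block of 25 consecutive tasks, so at most
    # four blocks ever exist and one or two pids can easily process everything.
    clumped = Counter(pool.map(light_func, range(100), chunksize=25))
    # chunksize=1 puts tasks on the queue one by one; a fast function may still
    # be grabbed mostly by whichever worker wins the race, but the spread is
    # usually more even.
    spread = Counter(pool.map(light_func, range(100), chunksize=1))
    print(clumped)
    print(spread)
    pool.close()
    pool.join()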
Background
I have a collection of Python scripts used to build and execute Verilog-AMS testbenches. The overall design was built with threading in mind, as each major test case is its own testbench and I have all of the supporting files / data output separate for each instance. The only shared items will be the launcher script and my data extraction script. The problem I'm faced with is that my Verilog-AMS simulator does not natively support multithreading, and for my test cases it takes a substantial amount of time to complete.
Problem
The machine I'm running this on has 32GiB of RAM and 8 "cores" available for me to use and I may be able to access a machine with 32. I would like to take advantage of the available computing power and execute the simulations simultaneously. What would be the best approach?
I currently use subprocess.call to execute my simulation. I would like to execute up to n commands at once, with each one executing on a separate thread / as a separate process. Once a simulation has completed, the next one in the queue (if one exists) would execute.
I'm pretty new to Python and haven't really written a threaded application. I would like some advice on how I should proceed. I saw this question, and from that I think the multiprocessing module may be better suited to my needs.
What do you all recommend?
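For concreteness, one minimal way to get "up to n at once, start the next when one finishes" with the standard library is a process pool driving subprocess.call; this is only a sketch, and run_simulation.sh plus the testbench names are hypothetical placeholders for the real simulator invocation:

import subprocess
from multiprocessing import Pool

def run_testbench(testbench_dir):
    # Hypothetical simulator invocation; replace with the real command line.
    return subprocess.call(["./run_simulation.sh", testbench_dir])

if __name__ == "__main__":
    testbenches = ["tb_case1", "tb_case2", "tb_case3", "tb_case4"]  # placeholders
    with Pool(processes=8) as pool:  # at most 8 simulations in flight at once
        # As each simulation finishes, the freed worker picks up the next one.
        for rc in pool.imap_unordered(run_testbench, testbenches):
            print("simulation finished with return code", rc)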
I have done some similar tasks in the past with machine learning and data mining. Using multiprocessing in your case may not be that difficult a task. It depends on how fault-tolerant you want to make the program; you could use a thread-pool pattern. My personal favourite is the producer-consumer pattern using Queue, which can handle a variety of complex tasks. Here is a sample toy program using multiprocessing:
import multiprocessing
from multiprocessing import Queue, Process
from Queue import Empty as QueueEmpty

# Assuming this text is very very very very large
text = "Here I am writing some nonsense\nBut people will read\n..."

def read(q):
    """Read the text and put it in a queue"""
    for line in text.split("\n"):
        q.put(line)

def work(qi, qo):
    """Take a line from the input queue and put it on the output queue"""
    while True:
        try:
            data = qi.get(timeout=1)  # Timeout after 1 second
            qo.put(data)
        except QueueEmpty:
            return  # Exit when all work is done
        except:
            raise  # Raise all other errors

def join(q):
    """Drain the output queue and write to a text file"""
    f = open("file.txt", "w")
    while True:
        try:
            f.write(q.get(timeout=1))
        except QueueEmpty:
            f.close()
            return
        except:
            raise

def main():
    # Input queue
    qi = Queue()
    # Output queue
    qo = Queue()
    # Start the producer
    Process(target=read, args=(qi,)).start()
    # Start 8 consumers
    for i in range(8):
        Process(target=work, args=(qi, qo)).start()
    # Final process to handle the output queue
    Process(target=join, args=(qo,)).start()

if __name__ == "__main__":
    main()
Typed this from memory, so if there is any error, please correct it. :)
Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
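To illustrate the Queue suggestion, a minimal sketch where each loop sends its output back to the parent through a shared multiprocessing.Queue (loop_a and loop_b here just tag their results):

from multiprocessing import Process, Queue

def loop_a(q):
    for i in range(3):
        q.put(("a", i))

def loop_b(q):
    for i in range(3):
        q.put(("b", i))

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=loop_a, args=(q,)),
             Process(target=loop_b, args=(q,))]
    for p in procs:
        p.start()
    for _ in range(6):   # we know each loop puts 3 items
        print(q.get())   # results arrive interleaved, in whatever order they finish
    for p in procs:
        p.join()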
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):
    # use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB can only run after taskA (or the other way around); they can't run simultaneously.
multiprocessing
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process
p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks can run simultaneously in the background; you can control the tasks (end them, stop them, etc.); tasks can exchange data and can be synchronized if they compete for the same resources; etc.
drawbacks: too heavy! The OS will frequently switch between them, and they each have their own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
Threading is like multiprocessing, just lightweight. Check out this post. Their usage is quite similar:
import threading
p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. A minimal sketch follows after the merits/drawbacks below; please google for more details if you're interested.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
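A minimal gevent sketch (assuming gevent is installed): since coroutines are cooperatively scheduled rather than preempted, pure CPU loops have to yield explicitly, e.g. via gevent.sleep(0), for the two loops to interleave.

import gevent

def loop_a():
    for i in range(5):
        print("a", i)
        gevent.sleep(0)   # yield so the other coroutine gets a turn

def loop_b():
    for i in range(5):
        print("b", i)
        gevent.sleep(0)

# spawn both coroutines and wait for them to finish
gevent.joinall([gevent.spawn(loop_a), gevent.spawn(loop_b)])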
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they won't)? Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate proccesses, using the python multiprocessing module. If both tasks are CPU intensive this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to be run at the same time, it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        # Do task A
        pass

def loopB():
    for i in range(10000):
        # Do task B
        pass

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()   # start(), not run(), so each loop gets its own thread
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about: a loop for i in range(10000): do Task A, do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into desired function.

multiple_results = []
for i in range(processes):  # Submits each call asynchronously w/ option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))

results = [res.get() for res in multiple_results]  # Retrieves results after
                                                   # every worker is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option"). Additionally, it doesn't have to be an array that is being populated at the end, but for my example that's how I used it. Hope this simplifies things or helps you better understand the power of multiprocessing in Python!