Understanding Python Multiprocessing Documentation

Trying to understand the Python multiprocessing documentation.
I would put this on meta but I'm not sure whether it might be valuable to searchers later.
I need some guidance as to how these examples relate to multiprocessing.
Am I correct in thinking that multiprocessing uses multiple processes (and thus CPUs) in order to break down an iterable task and so shorten its duration?
from multiprocessing import Process
def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
We are starting one process; but how do I start multiple to complete my task? Do I iterate through Process + start() lines?
Yet there are no examples later in the documentation of for example:
for x in range(5):
    p[x] = Process(target=f, args=('bob',))
    p[x].start()
p.join()
Would that be the 'real life' implementation?
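For reference, a minimal sketch (assumed, not from the docs) of how several processes are usually started and then joined one by one:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    processes = [Process(target=f, args=('bob',)) for _ in range(5)]
    for p in processes:
        p.start()
    for p in processes:    # join each process individually, not the list
        p.join()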
Here is the 'Queue Example':
from multiprocessing import Process, Queue
def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
But again, how is this multiprocessing? This is just starting a process and having it run objects in a queue?
How do I make multiple processes start and run the objects in the queue?
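A minimal sketch (assumed, not from the docs) of several worker processes consuming tasks from one shared Queue, each stopping when it sees a sentinel value:

from multiprocessing import Process, Queue

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:            # sentinel: no more work
            break
        results.put(item * item)

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for x in range(10):
        tasks.put(x)
    for _ in workers:               # one sentinel per worker
        tasks.put(None)
    print(sorted(results.get() for _ in range(10)))    # drain results before joining
    for w in workers:
        w.join()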
Finally for pool:
from multiprocessing import Pool
import time
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(processes=4) as pool:              # start 4 worker processes
        result = pool.apply_async(f, (10,))      # evaluate "f(10)" asynchronously in a single process
        print(result.get(timeout=1))             # prints "100" unless your computer is *very* slow
Are four processes computing 10*10 at once, with the program waiting until all four come back, or does just one process do it because we only gave the pool one argument?
If the former: wouldn't that be slower than just having one process do it in the first place? And what about memory? Is process 1's result held in RAM until process 4 returns, or does it get printed as soon as it is ready?
        print(pool.map(f, range(10)))            # prints "[0, 1, 4,..., 81]"

        it = pool.imap(f, range(10))
        print(next(it))                          # prints "0"
        print(next(it))                          # prints "1"
        print(it.next(timeout=1))                # prints "4" unless your computer is *very* slow

        result = pool.apply_async(time.sleep, (10,))
        print(result.get(timeout=1))             # raises multiprocessing.TimeoutError
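A minimal sketch (assumed, not from the docs) that makes the difference visible: apply_async hands a single call to one worker, while map splits the iterable across all of the pool's workers. Returning the worker's pid shows which process did which piece of work:

from multiprocessing import Pool
import os

def f(x):
    return x * x, os.getpid()    # include the worker's pid to show who did the work

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        single = pool.apply_async(f, (10,))    # one call, handled by one worker
        print(single.get(timeout=1))
        print(pool.map(f, range(10)))          # ten calls spread across the four workers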

Related

multiprocessing.pool with manager and async methods

I am trying to make use of Manager() to share a dictionary between processes, and tried out the following code:
from multiprocessing import Manager, Pool
def f(d):
    d['x'] += 2

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    d['x'] = 2
    p = Pool(4)
    for _ in range(2000):
        p.map_async(f, (d,))    # apply_async, map
    p.close()
    p.join()
    print(d)    # expects this result --> {'x': 4002}
Using map_async and apply_async, the result printed is always different (e.g. {'x': 3838}, {'x': 3770}).
However, using map will give the expected result.
Also, I have tried using Process instead of Pool, and the results differ too.
Any insights?
Is something about the non-blocking calls causing race conditions that the manager does not handle?
When you call map (rather than map_async), it blocks until the workers have finished all the requests you passed in, which in your case is just one call to function f. So even though you have a pool size of 4, you are in essence making the 2000 calls one at a time. To actually parallelize execution, you should do a single p.map(f, [d]*2000) instead of the loop.
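A minimal sketch (not from the original answer) of that single-map approach; note the increment is still unsynchronized here, which the locked version further below addresses:

from multiprocessing import Manager, Pool

def f(d):
    d['x'] += 2    # still racy without a lock; see the locked version below

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['x'] = 2
        with Pool(4) as p:
            p.map(f, [d] * 2000)    # all 2000 calls submitted at once, run by 4 workers
        print(d)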
But when you call map_async, you do not block and are returned a result object. A call to get on the result object blocks until the process finishes and returns the result of the function call. So now you are running up to 4 processes at a time. But the update to the dictionary is not serialized across the processes. I have modified the code to force serialization of d['x'] += 2 by using a multiprocessing lock. You will see that the result is now 4002.
from multiprocessing import Manager, Pool, Lock
def f(d):
    lock.acquire()
    d['x'] += 2
    lock.release()

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['x'] = 2
        lock = Lock()
        # create the multiprocessing lock that is sharable by all the processes
        p = Pool(4, initializer=init, initargs=(lock,))
        results = []    # if the function returned a result we wanted
        for _ in range(2000):
            results.append(p.map_async(f, (d,)))    # apply_async, map
        """
        for i in range(2000):    # if the function returned a result we wanted
            results[i].get()     # wait for everything to finish
        """
        p.close()
        p.join()
        print(d)

Multiprocessing - Shared Array

So I'm trying to implement multiprocessing in Python where I wish to have a Pool of 4-5 processes running a method in parallel. The purpose of this is to run a total of a thousand Monte Carlo simulations (200-250 simulations per process) instead of running all 1000 sequentially. I want each process to write to a common shared array by acquiring a lock on it as soon as it has processed the result for one simulation, writing the result, and releasing the lock. So it should be a three-step process:
Acquire lock
Write result
Release lock for other processes waiting to write to array.
Every time I pass the array to the processes, each process creates a copy of that array, which I do not want since I want a common array. Can anyone help me with this by providing sample code?
Since you're only returning state from the child processes to the parent process, using a shared array and explicit locks is overkill. You can use Pool.map or Pool.starmap to accomplish exactly what you need. For example:
from multiprocessing import Pool
class Adder:
    """I'm using this class in place of a monte carlo simulator"""
    def add(self, a, b):
        return a + b

def setup(x, y, z):
    """Sets up the worker processes of the pool.

    Here, x, y, and z would be your global settings. They are only included
    as an example of how to pass args to setup. In this program they would
    be "some arg", "another" and 2
    """
    global adder
    adder = Adder()

def job(a, b):
    """wrapper function to start the job in the child process"""
    return adder.add(a, b)

if __name__ == "__main__":
    args = list(zip(range(10), range(10, 20)))
    # args == [(0, 10), (1, 11), ..., (8, 18), (9, 19)]
    with Pool(initializer=setup, initargs=["some arg", "another", 2]) as pool:
        # runs jobs in parallel and returns when all are complete
        results = pool.starmap(job, args)
    print(results)    # prints [10, 12, ..., 26, 28]
Not tested, but something like that should work.
The array and lock are shared between processes.
from multiprocessing import Process, Array, Lock
def f(array, lock, n):    # n is the dedicated location in the array
    lock.acquire()
    array[n] = -array[n]
    lock.release()

if __name__ == '__main__':
    arr = Array('i', [3, -7])
    lock = Lock()
    p = Process(target=f, args=(arr, lock, 0))
    q = Process(target=f, args=(arr, lock, 1))
    p.start()
    q.start()
    q.join()
    p.join()
    print(arr[:])
The documentation at https://docs.python.org/3.5/library/multiprocessing.html has plenty of examples to start with.

Python: Thread Execution time

Is it possible to set an execution time for a thread in Python? So that if that time period is exceeded, I can stop that thread and create a new instance of it.
from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)    # start 4 worker processes

    # print "[0, 1, 4,..., 81]"
    print pool.map(f, range(10))

    # print same numbers in arbitrary order
    for i in pool.imap_unordered(f, range(10)):
        print i

    # evaluate "f(20)" asynchronously
    res = pool.apply_async(f, (20,))    # runs in *only* one process
    print res.get(timeout=1)            # prints "400"
From the multiprocessing docs.
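A minimal sketch (assumed, not from the docs) of how you might enforce a time limit with a pool: submit the work with apply_async, wait with a timeout, and terminate and recreate the pool if the work overruns. This assumes the work can safely be killed and restarted.

from multiprocessing import Pool, TimeoutError
import time

def slow_task(x):
    time.sleep(x)    # stand-in for long-running work
    return x

if __name__ == '__main__':
    pool = Pool(processes=1)
    result = pool.apply_async(slow_task, (10,))
    try:
        print(result.get(timeout=2))    # wait at most 2 seconds
    except TimeoutError:
        pool.terminate()                # kill the worker that overran
        pool.join()
        pool = Pool(processes=1)        # start a fresh worker ("new instance")
    pool.close()
    pool.join()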

Python 3 - Multiprocessing - Queue.get() does not respond

I want to make a brute-force attack and therefore need some speed, so I came to use the multiprocessing library. However, in every tutorial I've found, something did not work. This one seems to work very well, except that whenever I call the get() function, IDLE seems to go to sleep and doesn't respond at all. Am I just being silly? I just copy-pasted the example, so it should have worked.
import multiprocessing as mp
import random
import string
# Define an output queue
output = mp.Queue()
# define a example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

# Setup a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(2)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
@dano hit it on the head! You don't have if __name__ == "__main__":, so you have a "fork bomb": each child process runs the module-level code that starts more processes, and so on. You will also notice that I have moved the creation of the queue.
import multiprocessing as mp
import random
import string
# define a example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()

    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(2)]

    # Run processes
    for p in processes:
        p.start()

    # Exit the completed processes
    for p in processes:
        p.join()

    # Get process results from the output queue
    results = [output.get() for p in processes]
    print(results)
What happens is that multiprocessing imports your script as a module in each child process, so __name__ is "__main__" only in the parent. Without that guard, each child process will (attempt to) start two more processes, each of which will start two more, and so on. No wonder IDLE stops responding.

Regarding relation of multiprocessing with cores of server

For the following code, assume I have a 32-core machine. Will Python decide how many processes to create for me?
from multiprocessing import Process
for i in range(100):
    p = Process(target=run, args=(fileToAnalyse,))
    p.start()
No, it does not decide for you.
To limit the number of subprocesses, you need to use a pool of workers.
Example from the documentation:
from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow

    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"
If you omit processes=4, it will use multiprocessing.cpu_count(), which returns the number of CPUs.
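A minimal sketch (assumed; the run function here is a stand-in for the question's analysis code) showing that Pool() with no argument sizes itself to cpu_count(), so on a 32-core machine at most 32 workers run at once no matter how many items you map over it:

from multiprocessing import Pool, cpu_count

def run(fileToAnalyse):
    return fileToAnalyse    # stand-in for the real analysis

if __name__ == '__main__':
    print(cpu_count())      # e.g. 32 on a 32-core machine
    with Pool() as pool:    # defaults to cpu_count() worker processes
        results = pool.map(run, ['file%d' % i for i in range(100)])
    print(len(results))     # 100 results, processed cpu_count() at a time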
