Python 3 - Multiprocessing - Queue.get() does not respond

I want to write a brute-force tool and therefore need some speed, so I decided to use the multiprocessing library. However, in every tutorial I've found, something did not work. This one seems to work very well, except that whenever I call the get() function, IDLE seems to hang and doesn't respond at all. Am I just missing something? I copy-pasted the example, so it should have worked.
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# Define an example function
def rand_string(length, output):
    """Generates a random string of numbers, lower- and uppercase chars."""
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(2)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)

#dano hit it on the head! You don't have if __name__ == "__main__":, so you have a "fork bomb"; that is, each process launches the processes again, and so on. You will also notice that I have moved the creation of the queue inside the guard.
import multiprocessing as mp
import random
import string

# Define an example function
def rand_string(length, output):
    """Generates a random string of numbers, lower- and uppercase chars."""
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()

    # Set up a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(2)]

    # Run processes
    for p in processes:
        p.start()

    # Exit the completed processes
    for p in processes:
        p.join()

    # Get process results from the output queue
    results = [output.get() for p in processes]

    print(results)
What happens is that multiprocessing starts each child process by importing your script as a module, so __name__ is only "__main__" in the parent. If you don't have that guard, each child process will (attempt to) start two more processes, each of which will start two more, and so on. No wonder IDLE stops responding.
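For illustration, here is a minimal sketch (not from the original answer) showing that module-level code runs again in every child when the spawn start method is used; only the parent sees __name__ == "__main__":

import multiprocessing as mp

print("importing module, __name__ =", __name__)  # runs in the parent AND again in each spawned child

def work():
    print("child says hello")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the default on Windows, and on macOS since Python 3.8
    p = mp.Process(target=work)
    p.start()
    p.join()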

Related

How to start functions in parallel, check if they are done, and start a new function in python?

I want to write Python code that does the following:
At first, it starts, say, 3 processes (or threads, or whatever) in parallel.
Then, in a loop, Python waits until any of the processes has finished (and returned some value).
Then, the Python code starts a new function.
In the end, I want 3 processes always running in parallel, until all the functions I need to run have been run. Here is some pseudocode:
import time
import random
from multiprocessing import Process

# some random function which can have different execution time
def foo():
    time.sleep(random.randint(1, 10) + 2)
    return 42

# Start 3 functions
p = []
p.append(Process(target=foo))
p.append(Process(target=foo))
p.append(Process(target=foo))

while True:
    # wait until one of the processes has finished
    ???
    # then add a new process so that always 3 are running in parallel
    p.append(Process(target=foo))
I am pretty sure it is not clear what I want. Please ask.
What you really want is to start three processes and feed a queue with jobs that you want executed. Then there will only ever be three processes and when one is finished, it reads the next item from the queue and executes that:
import time
import random
from multiprocessing import Process, Queue

# some random function which can have different execution time
def foo(a):
    print('foo', a)
    time.sleep(random.randint(1, 10) + 2)
    print(a)
    return 42

def readQueue(q):
    while True:
        item = q.get()
        if item:
            f, *args = item
            f(*args)
        else:
            return

if __name__ == '__main__':
    q = Queue()
    for a in range(4):   # create 4 jobs
        q.put((foo, a))
    for _ in range(3):   # one sentinel per process
        q.put(None)

    # Start 3 processes
    p = []
    p.append(Process(target=readQueue, args=(q,)))
    p.append(Process(target=readQueue, args=(q,)))
    p.append(Process(target=readQueue, args=(q,)))

    for j in p:
        j.start()

    #time.sleep(10)
    for j in p:
        j.join()
You can use the Pool of the multiprocessing module.
from multiprocessing import Pool

my_foos = [foo, foo, foo, foo]

def do_something(method):
    method()

with Pool(3) as p:
    p.map(do_something, my_foos)
The number 3 is the number of parallel worker processes.
map passes each element of the input list as an argument to do_something.
In your case, do_something can be a function which calls the functions you want executed, passed to map as a list.
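For completeness, a minimal self-contained sketch of this Pool approach; foo here is just a placeholder for the real work functions, and the __main__ guard is added so it also runs under the spawn start method:

import time
from multiprocessing import Pool

def foo():
    time.sleep(0.5)   # stand-in for real work
    return 42

def do_something(method):
    return method()

if __name__ == '__main__':
    my_foos = [foo, foo, foo, foo]
    with Pool(3) as p:                       # at most 3 jobs run at once
        print(p.map(do_something, my_foos))  # [42, 42, 42, 42]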

Understanding Python Multiprocessing Documentation

Trying to understand the Python multiprocessing documentation.
I would put this on meta but I'm not sure whether it might be valuable to searchers later.
I need some guidance as to how these examples relate to multiprocessing.
Am I correct in thinking that multiprocessing is using multiple processes (and thus CPUs) in order to break down an iterable task and thus shorten its duration?
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
We are starting one process; but how do I start multiple to complete my task? Do I iterate through Process + start() lines?
Yet there are no examples later in the documentation of something like:
for x in range(5):
    p[x] = Process(target=f, args=('bob',))
    p[x].start()
p.join()
Would that be the 'real life' implementation?
Here is the 'Queue Example':
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
But again, how is this multiprocessing? This is just starting a process and having it run objects in a queue?
How do I make multiple processes start and run the objects in the queue?
Finally for pool:
from multiprocessing import Pool
import time

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(processes=4) as pool:          # start 4 worker processes
        result = pool.apply_async(f, (10,))  # evaluate "f(10)" asynchronously in a single process
        print(result.get(timeout=1))         # prints "100" unless your computer is *very* slow
Are all four processes computing 10*10 at once, with the pool waiting until all four come back, or does just one process do it because we only gave the pool one argument?
If the former: wouldn't that be slower than just having one process do it in the first place? What about memory? Is process 1's result held in RAM until process 4 returns, or does it get printed right away?
        print(pool.map(f, range(10)))  # prints "[0, 1, 4,..., 81]"

        it = pool.imap(f, range(10))
        print(next(it))                # prints "0"
        print(next(it))                # prints "1"
        print(it.next(timeout=1))      # prints "4" unless your computer is *very* slow

        result = pool.apply_async(time.sleep, (10,))
        print(result.get(timeout=1))   # raises multiprocessing.TimeoutError

Getting a pickle error when trying to run processes

What I'm trying to do is run a list of prime number decompositions in different processes at once. I have a threaded version that works, but I can't seem to get it working with processes.
import math
from Queue import Queue
import multiprocessing

def primes2(n):
    primfac = []
    num = n
    d = 2
    while d * d <= n:
        while (n % d) == 0:
            primfac.append(d)  # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    myfile = open('processresults.txt', 'a')
    myfile.write(str(num) + ":" + str(primfac) + "\n")
    return primfac

def mp_factorizer(nums, nprocs):

    def worker(nums, out_q):
        """ The worker function, invoked in a process. 'nums' is a
            list of numbers to factor. The results are placed in
            a dictionary that's pushed to a queue.
        """
        outdict = {}
        for n in nums:
            outdict[n] = primes2(n)
        out_q.put(outdict)

    # Each process will get 'chunksize' nums and a queue to put his out
    # dict into
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []

    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    print resultdict

if __name__ == '__main__':
    mp_factorizer((400243534500, 100345345000, 600034522000, 9000045346435345000), 4)
I'm getting a pickle error when I run this.
Any help would be greatly appreciated :)
You need to use multiprocessing.Queue instead of the regular Queue.
This is because a Process does not run in the same memory space, and some objects are not picklable, such as a regular queue (Queue.Queue). To overcome this, the multiprocessing library provides a Queue class that is actually a proxy to a queue shared between processes.
Also, you could extract def worker(...) out as a top-level function. This may be your main problem, because of how a process is forked at the OS level.
You can also use a multiprocessing.Manager.
Dynamically created functions cannot be pickled and therefore cannot be used as the target of a Process; the function worker needs to be defined in the global scope instead of inside the definition of mp_factorizer.
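Putting both points together, a minimal sketch of the fixed structure (the worker body here is a placeholder, not the original primes2 logic): the worker lives at module level so it can be used as a Process target, and a multiprocessing.Queue carries the results back.

import multiprocessing

def worker(nums, out_q):
    # Placeholder work: a real version would call primes2(n) here.
    outdict = {}
    for n in nums:
        outdict[n] = n * n
    out_q.put(outdict)

if __name__ == '__main__':
    out_q = multiprocessing.Queue()   # NOT Queue.Queue
    chunks = [[2, 3], [4, 5]]
    procs = [multiprocessing.Process(target=worker, args=(chunk, out_q))
             for chunk in chunks]
    for p in procs:
        p.start()
    resultdict = {}
    for _ in procs:                   # one result dict per worker
        resultdict.update(out_q.get())
    for p in procs:
        p.join()
    print(resultdict)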

Regarding relation of multiprocessing with cores of server

For the following code, assume I have a 32-core machine; will Python decide how many processes to create for me?
from multiprocessing import Process

for i in range(100):
    p = Process(target=run, args=(fileToAnalyse,))
    p.start()
No, it does not decide for you.
To limit the number of subprocesses, you need to use a pool of workers.
Example from the documentation:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print result.get(timeout=1)         # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))        # prints "[0, 1, 4,..., 81]"
If you omit processes=4, it will use multiprocessing.cpu_count(), which returns the number of CPUs.
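For instance, a small Python 3 sketch of the default sizing (the core count shown is just the machine described in the question):

from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    print(cpu_count())                    # e.g. 32 on the machine described above
    with Pool() as pool:                  # worker count defaults to cpu_count()
        print(pool.map(abs, range(-4, 4)))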

Execute a list of process without multiprocessing pool map

import multiprocessing as mp

if __name__ == '__main__':
    #pool = mp.Pool(M)

    p1 = mp.Process(target=target1, args=(arg1,))
    p2 = mp.Process(target=target2, args=(arg2,))
    ...
    p9 = mp.Process(target=target9, args=(arg9,))
    p10 = mp.Process(target=target10, args=(arg10,))
    ...
    pN = mp.Process(target=targetN, args=(argN,))

    processList = [p1, p2, ...., p9, p10, ..., pN]
I have N different target functions which take unequal, non-trivial amounts of time to execute.
I am looking for a way to execute them in parallel such that M (1 < M < N) processes are running simultaneously. As soon as a process finishes, the next process from the list should start, until all the processes in processList are completed.
Since I am not calling the same target function for each process, I could not use Pool.
I considered doing something like this:
for i in range(0, N, M):
    limit = i + M
    if limit > N:
        limit = N
    for p in processList[i:limit]:
        p.start()
    for p in processList[i:limit]:
        p.join()
Since my target functions take unequal amounts of time to execute, this method is not really efficient.
Any suggestions? Thanks in advance.
EDIT:
Question title has been changed to 'Execute a list of process without multiprocessing pool map' from 'Execute a list of process without multiprocessing pool'.
You can use a process Pool:
#!/usr/bin/env python
# coding=utf-8

from multiprocessing import Pool
import random
import time

def target_1():
    time.sleep(random.uniform(0.5, 2))
    print('done target 1')

def target_2():
    time.sleep(random.uniform(0.5, 2))
    print('done target 2')

def target_3():
    time.sleep(random.uniform(0.5, 2))
    print('done target 3')

def target_4():
    time.sleep(random.uniform(0.5, 2))
    print('done target 4')

if __name__ == '__main__':
    pool = Pool(2)  # maximum two processes at a time
    pool.apply_async(target_1)
    pool.apply_async(target_2)
    pool.apply_async(target_3)
    pool.apply_async(target_4)
    pool.close()
    pool.join()
Pool is created specifically for what you need to do: execute many tasks in a limited number of processes.
I also suggest you take a look at the concurrent.futures library and its backport to Python 2.7. It has a ProcessPoolExecutor, which has roughly the same capabilities, but its methods return Future objects, and it has a nicer API.
Here is a way to do it in Python 3.4, which could be adapted for Python 2.7:
import concurrent.futures

targets_with_args = [
    (target1, arg1),
    (target2, arg2),
    (target3, arg3),
    ...
]

with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
    futures = [executor.submit(target, arg) for target, arg in targets_with_args]
    results = [future.result() for future in concurrent.futures.as_completed(futures)]
I would use a Queue, adding processes to it from processList; as soon as a process is finished, I would remove it from the queue and add another one.
The pseudocode would look like this:
from Queue import Queue

q = Queue(m)

# add first process to queue
i = 0
q.put(processList[i])
processList[i].start()
i += 1

while not q.empty():
    p = q.get()
    # check if process is finished; if not, return it to the queue for later checking
    if p.is_alive():
        q.put(p)
    # add another process if there is space and there are more processes to add
    if not q.full() and i < len(processList):
        q.put(processList[i])
        processList[i].start()
        i += 1
A simple solution would be to wrap the functions target{1,2,...N} into a single function forward_to_target that forwards to the appropriate target{1,2,...N} function according to the argument that is passed in. If you cannot infer the appropriate target function from the arguments you currently use, replace each argument with a tuple (argX, X), then in the forward_to_target function unpack the tuple and forward to the appropriate function indicated by the X.
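As a rough sketch of that dispatcher idea (target1, target2, and the jobs list below are hypothetical stand-ins, not the asker's real functions):

from multiprocessing import Pool

def target1(arg):
    return ('target1', arg)

def target2(arg):
    return ('target2', arg)

# map each index X to its target function
TARGETS = {1: target1, 2: target2}

def forward_to_target(packed):
    arg, index = packed          # unpack the (argX, X) tuple
    return TARGETS[index](arg)   # dispatch to the matching target

if __name__ == '__main__':
    jobs = [('arg1', 1), ('arg2', 2)]
    with Pool(2) as pool:        # M = 2 processes at a time
        print(pool.map(forward_to_target, jobs))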
You could have two lists of targets and arguments, zip the two together, and send them to a runner function (here it's run_target_on_args):
#!/usr/bin/env python
import multiprocessing as mp

# target functions
targets = [len, str, len, zip]

# arguments for each function
args = [["arg1"], ["arg2"], ["arg3"], [["arg5"], ["arg6"]]]

# applies a target function to its arguments
def run_target_on_args(target_args):
    return target_args[0](*target_args[1])

pool = mp.Pool()
print pool.map(run_target_on_args, zip(targets, args))
