I have a program I want to split into 10 parts with multiprocessing. Each worker will be searching for the same answer using different variables (in this case, it's brute-forcing a password). How do I get the processes to communicate their status, and how do I terminate all processes once one of them has found the answer? Thank you!
If you are going to split it into 10 parts, then either you should have 10 cores or your worker function should at least not be 100% CPU bound.
The following code initializes each pool process with a multiprocessing.Queue instance to which the worker function writes its result. The main process waits for the first entry written to the queue and then terminates all pool processes. For this demo, the worker function is passed the arguments 1, 2, 3, ... 10, sleeps for that number of seconds, and returns the argument passed. So we would expect the worker that was passed the value 1 to complete first, and the total running time of the program should be slightly more than 1 second (it takes some time to create the 10 processes):
import multiprocessing
import time

def init_pool(q):
    global queue
    queue = q

def worker(x):
    time.sleep(x)
    # write result to queue
    queue.put_nowait(x)

def main():
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(10, initializer=init_pool, initargs=(queue,))
    for i in range(1, 11):
        # non-blocking:
        pool.apply_async(worker, args=(i,))
    # wait for first result
    result = queue.get()
    pool.terminate()  # kill all tasks
    print('Result: ', result)

# required for Windows:
if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
Prints:
Result: 1
total time = 1.2548246383666992
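Applied to the original brute-forcing scenario, the same pattern might look like the sketch below. The hash target, the prefix partitioning, and the lowercase candidate alphabet are all illustrative assumptions, not part of the answer above:

import multiprocessing
import hashlib
import itertools
import string

def init_pool(q):
    global queue
    queue = q

def worker(prefix, target_hash):
    # each worker searches only candidates starting with its assigned prefix
    for combo in itertools.product(string.ascii_lowercase, repeat=3):
        candidate = prefix + ''.join(combo)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
            queue.put_nowait(candidate)  # report the hit to the main process
            return

def main():
    target_hash = hashlib.sha256(b'dcat').hexdigest()  # pretend this is unknown
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(10, initializer=init_pool, initargs=(queue,))
    for prefix in string.ascii_lowercase[:10]:  # one starting letter per worker
        pool.apply_async(worker, args=(prefix, target_hash))
    password = queue.get()   # block until some worker finds the answer
    pool.terminate()         # kill the remaining workers
    print('Password:', password)

if __name__ == '__main__':
    main()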
Currently I am writing a function that draws random samples under some restrictions. It is possible that at the last step there is no valid option left to choose, and my function will get stuck.
Is there a way to set a run-time limit for this function? Say, if no result is returned after 2 seconds, the function should be run again, repeating until it returns a result within 2 seconds.
Thanks in advance.
You can use the threading module.
The following code is an example of a Thread with a timeout:
from threading import Thread
p = Thread(target = myFunc, args = [myArg1, myArg2])
p.start()
p.join(timeout = 2)
You could add something like the following to your code, as a check to keep looping until the function finishes properly. Note that join(timeout = 2) only stops waiting after 2 seconds; it does not kill a thread that is still stuck:
shouldFinish = False

def myFunc(myArg1, myArg2):
    global shouldFinish
    ...
    if finishedProperly:
        shouldFinish = True

while not shouldFinish:
    p = Thread(target = myFunc, args = [myArg1, myArg2])
    p.start()
    p.join(timeout = 2)
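Since a thread cannot actually be killed from the outside, an alternative sketch (my own variant, not part of the answer above) is to run the function in a separate process, which can be terminated after the timeout. The function body here is a placeholder:

import multiprocessing
import queue
import time

def my_func(q):
    # stand-in for the random-sampling function; it may occasionally get stuck
    time.sleep(1)
    q.put(42)  # report the result back to the parent

if __name__ == '__main__':
    while True:
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=my_func, args=(q,))
        p.start()
        try:
            result = q.get(timeout=2)  # wait at most 2 seconds for a result
            p.join()
            break
        except queue.Empty:
            p.terminate()  # unlike a thread, a process can be killed
            p.join()
    print('result:', result)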
I have a list which contains table names; let's say the size of the list is n. Now I have m servers, so I have opened m cursors, one corresponding to each server, which are also in another list. Now for every table I want to call a certain function which takes these two lists as parameters.
templst = [T1,T2,T3,T4,T5,T6, T7,T8,T9,T10,T11]
curlst = [cur1,cur2,cur3,cur4,cur5]
These cursors are opened as cur = conn.cursor(), so they are objects.
def extract_single(tableName, cursorconn):
    qry2 = "SELECT * FROM %s" % (tableName)
    cursorconn.execute(qry2).fetchall()
    print " extraction done"
    return
Now I have opened 5 processes (since I have 5 cursors) so as to run them in parallel.
processes = []
for x in range(5):
    print "process : p%d" % x
    new_p = multiprocessing.Process(target=extract_single, args=(templst[x], curlst[x]))
    new_p.start()
    processes.append(new_p)

for process in processes:
    process.join()
So this makes sure that I have opened 5 processes, one for each cursor, and that they take the first 5 table names.
Now I want that, as soon as any of the 5 processes finishes, it should immediately take the 6th table from my templst, and so on until all of templst is done.
How do I modify this code to get this behaviour?
For example
As a simple illustration of what I want to do, let us consider templst as a list of seconds for which I want to call a sleep function:
templst = [1,2,5,7,4,3,6,8,9,10,11]
curlst = [cur1,cur2,cur3,cur4,cur5]
def extract_single(sec, cursorconn):
    print "Sleeping for second=%s done by cursor=%s" % (sec, cursorconn)
    time.sleep(sec)
    print " sleeping done"
    return
So when I start the 5 cursors, it is possible that either sleep(1) or sleep(2) finishes first,
and as soon as it finishes I want to run sleep(3) with that cursor.
My real query will depend on the cursor, since it will be an SQL query.
Modified approach
Considering the previous sleep example, I now want to implement the following: suppose I have 10 cursors and my sleep queue is sorted in increasing or decreasing order.
Considering the list in increasing order:
Out of the 10 cursors, the first 5 will take the first 5 elements from the queue and my other set of 5 cursors will take the last five.
So basically my cursor queue is divided into two halves: one half takes the lowest values and the other half takes the highest values.
Now if a cursor from the first half finishes, it should take the next lowest value available, and if a cursor from the second half finishes, it should take the next value from the end (the 6th value from the end, and so on).
I need to traverse the queue from both sides, with two sets of 5 cursors each.
example: curlst1 = [cur1,cur2,cur3,cur4,cur5]
curlst2 = [cur6,cur7,cur8,cur9,cur10 ]
templst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
so cur1 -> 1
cur2 -> 2
... cur5 -> 5
cur6 -> 16
cur7 -> 15
... cur10 -> 12
Now cur1 finishes first, so it takes 6 (the first available element from the front);
when cur2 finishes, it takes 7, and so on.
If cur10 finishes, it takes 11 (the next available element from the back),
and so on until all elements of templst are consumed.
Place your templst arguments, whether table names as in the real example or numbers of seconds to sleep as in the example below, on a multiprocessing queue. Each process then loops, reading the next item from the queue. When the queue is empty, there is no more work to be performed and the process can return. You have in effect implemented your own process pool in which each process has its own dedicated cursor connection. Your function extract_single now takes as its first argument the queue from which to retrieve the table name or seconds argument.
import multiprocessing
import Queue
import time

def extract_single(q, cursorconn):
    while True:
        try:
            sec = q.get_nowait()
            print "Sleeping for second=%s done by cursor=%s" % (sec, cursorconn)
            time.sleep(sec)
            print " sleeping done"
        except Queue.Empty:
            return

def main():
    q = multiprocessing.Queue()
    templst = [1,2,5,7,4,3,6,8,9,10,11]
    for item in templst:
        q.put(item)  # add items to queue
    curlst = [cur1,cur2,cur3,cur4,cur5]
    processes = []
    for i in xrange(5):
        p = multiprocessing.Process(target=extract_single, args=(q, curlst[i]))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
Note
If you have fewer than 5 processors, you might try running this with 5 (or more) threads, in which case a regular Queue object should be used.
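A minimal sketch of that thread-based variant (Python 2 syntax to match the answer; the strings stand in for the real cursor objects from the question):

import threading
import Queue
import time

def extract_single(q, cursorconn):
    while True:
        try:
            sec = q.get_nowait()
            print "Sleeping for second=%s done by cursor=%s" % (sec, cursorconn)
            time.sleep(sec)
        except Queue.Empty:
            return

q = Queue.Queue()
for item in [1,2,5,7,4,3,6,8,9,10,11]:
    q.put(item)
curlst = ['cur%d' % i for i in xrange(1, 6)]  # stand-ins for real cursors
threads = [threading.Thread(target=extract_single, args=(q, cur)) for cur in curlst]
for t in threads:
    t.start()
for t in threads:
    t.join()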
Updated Answer to Updated Question
The data structure that allows you to remove items from both the front and the end of the queue is known as a deque (double-ended queue). Unfortunately, there is no deque variant that supports multiprocessing. But your table processing might work just as well with threading, and it is highly unlikely that your computer has 10 processors to support 10 concurrent processes running anyway.
import threading
from collections import deque
import time
import sys

templst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
q = deque(templst)
curlst1 = [cur1,cur2,cur3,cur4,cur5]
curlst2 = [cur6,cur7,cur8,cur9,cur10]

def extract_single(cursorconn, from_front):
    while True:
        try:
            sec = q.popleft() if from_front else q.pop()
            # sys.stdout.write with an explicit flush keeps thread output from interleaving
            sys.stdout.write("Sleeping for second=%s done by cursor=%s\n" % (sec, cursorconn))
            sys.stdout.flush()
            time.sleep(sec)
            sys.stdout.write("sleeping done by %s\n" % cursorconn)
            sys.stdout.flush()
        except IndexError:
            return

def main():
    threads = []
    for cur in curlst1:
        t = threading.Thread(target=extract_single, args=(cur, True))   # consume from the front
        threads.append(t)
        t.start()
    for cur in curlst2:
        t = threading.Thread(target=extract_single, args=(cur, False))  # consume from the back
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()
Python's Semaphore doesn't support negative initial values. How, then, do I make a thread wait until 8 other threads have done something? If Semaphore supported negative initial values, I could have just set it to -8 and had each thread increment the value by 1 until it reached 0, which would unblock the waiting thread.
I can manually increment a global counter inside a critical section and then use a condition variable, but I want to see if there are other suggestions.
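For reference, the counter-plus-condition-variable approach mentioned above could look roughly like this (a minimal sketch; the sleep stands in for the real work):

import threading
import time

NUM_THREADS = 8
count = 0
cond = threading.Condition()

def worker():
    global count
    time.sleep(1)  # placeholder for the real work
    with cond:
        count += 1
        cond.notify()  # wake the waiter so it can re-check the count

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
with cond:
    while count < NUM_THREADS:
        cond.wait()  # lock is released while waiting, reacquired on wakeup
print("all", NUM_THREADS, "threads are done")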
Surely it's late for an answer, but it may come in handy for someone else.
If you want to wait for 8 different threads to do something, you can just wait 8 times.
You initialize a semaphore at 0 with
s = threading.Semaphore(0)
and then
for _ in range(8):
s.acquire()
will do the job.
Full example:
import threading
import time

NUM_THREADS = 4
s = threading.Semaphore(0)

def thread_function(i):
    print("start of thread", i)
    time.sleep(1)
    s.release()  # signal completion
    print("end of thread", i)

def main_thread():
    print("start of main thread")
    threads = [
        threading.Thread(target=thread_function, args=(i,))
        for i in range(NUM_THREADS)
    ]
    [t.start() for t in threads]
    [s.acquire() for _ in range(NUM_THREADS)]  # wait until all threads have released
    print("end of main thread")

main_thread()
Possible output:
start of main thread
start of thread 0
start of thread 1
start of thread 2
start of thread 3
end of thread 0
end of thread 2
end of thread 1
end of thread 3
end of main thread
For any further readers: starting from Python 3.2 there are Barrier objects, which provide
a simple synchronization primitive for use by a fixed number of threads that need to wait for each other.
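For instance, the same wait could be written with a Barrier (a minimal sketch; the thread count and sleep are illustrative):

import threading
import time

NUM_THREADS = 8
# NUM_THREADS + 1 parties, so the main thread participates as well
barrier = threading.Barrier(NUM_THREADS + 1)

def worker(i):
    print("start of thread", i)
    time.sleep(1)
    barrier.wait()  # block until all parties have arrived

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
barrier.wait()  # main thread waits here for the 8 workers
print("all threads reached the barrier")
for t in threads:
    t.join()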
Below is the test code; I'm playing around with the Thread Pool found in the standard library. The problem is that the final process never ends. It just hangs.
I should mention what I'm after here: func, depending on its input, can take a few seconds to a few minutes, and I want the calls to finish as soon as possible, in order of whatever finishes first. Ideally the number of func calls I will execute at the same time will be around four or five.
>>> from multiprocessing.pool import ThreadPool
>>>
>>> def func(i):
... import time
... if i % 2 == 0: time.sleep(5)
... print i
...
>>> t = ThreadPool(5)
>>>
>>> for i in range(10):
... z = t.Process(target=func, args=(i,))
... z.start()
...
1
3
5
7
9
>>> 0
2
4
6
8
In other words, after printing "8" the code just waits here until I force a KeyboardInterrupt. I've tried setting the process as a daemon but no luck. Any advice/better documentation?
From the documentation
Worker processes within a Pool typically live for the complete duration of the Pool’s work queue.
You should probably use something like the following for a task like this:
t.imap(func, xrange(10))
t.close()
t.join()
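A fuller sketch of the same idea (Python 2 syntax to match the question): imap yields results in submission order, while imap_unordered yields them as they complete, which matches the "whatever finishes first" requirement:

from multiprocessing.pool import ThreadPool
import time

def func(i):
    if i % 2 == 0:
        time.sleep(5)
    return i

t = ThreadPool(5)
# imap_unordered yields each result as soon as its task finishes
for result in t.imap_unordered(func, xrange(10)):
    print result
t.close()  # no more tasks will be submitted
t.join()   # wait for the worker threads to exit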