I am attempting to create a Python program that executes a series of tasks (Process subclasses), and I need to know when a Process has completed. Ideally, I would like the Process subclass to make a callback to the calling process so it can add the next Process to a queue. Here is what I have so far:
from multiprocessing import Process, Queue
import time

class Task1(Process):
    def __init__(self, queue):
        super(Task1, self).__init__()
        self.queue = queue

    def run(self):
        print 'Start Task 1'
        time.sleep(1)
        print 'Completed Task 1'
        # make a callback to the main process to alert it that its execution has completed

class Task2(Process):
    def __init__(self, queue):
        super(Task2, self).__init__()
        self.queue = queue

    def run(self):
        print 'Start Task 2'
        time.sleep(1)
        print 'Completed Task 2'
        # make a callback to the main process to alert it that its execution has completed

if __name__ == '__main__':
    queue = Queue()
    p1 = Task1(queue)
    p1.start()
    p1.join()
    # need a callback of some sort to know when p1 has completed its execution in order to add Task2 into the queue
Prior to Python, I mainly worked with Objective-C, so I am essentially looking for something on Process that is analogous to a completion block. Thanks.
Functions are first-class citizens in Python, so you can just pass them as arguments, e.g. to your task constructors:
def __init__(self, queue, callb):
    super(Task2, self).__init__()
    self.queue = queue
    self.callb = callb
You can then call the callback when the run method finishes:
def run(self):
    print 'Start Task 2'
    time.sleep(1)
    print 'Completed Task 2'
    self.callb(self)
Define a function somewhere, e.g.
def done(arg):
    print "%s is done" % arg
And pass it to the task constructor:
p1 = Task1(queue, done)
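Putting the snippets together, a minimal sketch (using the same names as above) might look like this; note that because Task1 runs in a separate process, the callback executes in the child process, not in the parent:

from multiprocessing import Process, Queue
import time

class Task1(Process):
    def __init__(self, queue, callb):
        super(Task1, self).__init__()
        self.queue = queue
        self.callb = callb

    def run(self):
        print('Start Task 1')
        time.sleep(1)
        print('Completed Task 1')
        self.callb(self)  # runs in the child process, not the parent

def done(arg):
    print('%s is done' % arg)

if __name__ == '__main__':
    queue = Queue()
    p1 = Task1(queue, done)
    p1.start()
    p1.join()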
If I'm understanding your question correctly, your code already does what you want it to do!
The p1.join() will block the main process until p1 finishes. If p1.join() returns without error, then the process must have terminated and you can immediately start task2. Your "callback" would simply be a check that p1.join() has returned correctly!
From the documentation:
join([timeout])
Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.
If timeout is None then there is no timeout.
A process can be joined many times.
A process cannot join itself because this would cause a deadlock. It is an error to attempt to join a process before it has been started.
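Concretely, with the Task1 and Task2 classes from the question, a minimal sketch of that blocking pattern might be:

if __name__ == '__main__':
    queue = Queue()
    p1 = Task1(queue)
    p1.start()
    p1.join()          # blocks until Task1's run() has finished
    p2 = Task2(queue)  # Task1 is done, so start the next task
    p2.start()
    p2.join()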
Edit:
Optionally, if you want a non-blocking solution, you can poll a particular process to see if it has terminated:
p1.start()
while p1.is_alive():
    pass  # busy wait
p2.start()
This does essentially the same thing as p1.join(), but you can replace the pass with useful work to do while waiting for p1 to complete.
Related
I've read that it's considered bad practice to kill a thread. (Is there any way to kill a Thread?) There are a LOT of answers there, and I'm wondering if even using a thread in the first place is the right answer for me.
I have a bunch of multiprocessing.Processes. Essentially, each Process is doing this:
while some_condition:
    result = self.function_to_execute(i, **kwargs_i)
    # outQ is a multiprocessing.Queue shared between all Processes
    self.outQ.put(Result(i, result))
The problem is that I need a way to interrupt function_to_execute, but I can't modify the function itself. Initially I was thinking of simply calling process.terminate(), but that appears to be unsafe with a multiprocessing.Queue.
Most likely (but not guaranteed), if I need to kill a thread, the 'main' program is going to be done soon. Is my safest option to do something like this? Or perhaps there is a more elegant solution than using a thread in the first place?
def thread_task():
    while some_condition:
        result = self.function_to_execute(i, **kwargs_i)
        if this_thread_is_not_daemonized:
            self.outQ.put(Result(i, result))

t = Thread(target=thread_task)
t.start()

if end_early:
    t.daemon = True
I believe the end result of this is that the Process that spawned the thread will continue to waste CPU cycles on a task I no longer care about the output for, but if the main program finishes, it'll clean up all my memory nicely.
The main problem with daemonizing a thread is that the main program could potentially continue for 30+ minutes even when I don't care about the output of that thread anymore.
From the threading docs:
If you want your threads to stop gracefully, make them non-daemonic
and use a suitable signalling mechanism such as an Event
Here is a contrived example of what I was thinking - no idea if it mimics what you are doing or can be adapted for your situation. Another caveat: I've never written any real concurrent code.
Create an Event object in the main process and pass it all the way to the thread.
Design the thread so that it loops until the Event object is set. Once you don't need the processing anymore, set the Event in the main process. There is no need to modify the function being run in the thread.
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random, os

def f_to_run():
    time.sleep(.2)
    return random.randint(1, 10)

class T(Thread):
    def __init__(self, evt, q, func, parent):
        self.evt = evt
        self.q = q
        self.func = func
        self.parent = parent
        super().__init__()

    def run(self):
        while not self.evt.is_set():
            n = self.func()
            self.q.put(f'PID {self.parent}-{self.name}: {n}')

def f(T, evt, q, func):
    pid = os.getpid()
    t = T(evt, q, func, pid)
    t.start()
    t.join()
    q.put(f'PID {pid}-{t.name} is alive - {t.is_alive()}')
    q.put(f'PID {pid}:DONE')
    return 'foo done'

if __name__ == '__main__':
    results = []
    q = Queue()
    evt = Event()
    # two processes each with one thread
    p = Process(target=f, args=(T, evt, q, f_to_run))
    p1 = Process(target=f, args=(T, evt, q, f_to_run))
    p.start()
    p1.start()
    while len(results) < 40:
        results.append(q.get())
        print('.', end='')
    print('')
    evt.set()
    p.join()
    p1.join()
    while not q.empty():
        results.append(q.get_nowait())
    for thing in results:
        print(thing)
I initially tried to use threading.Event but the multiprocessing module complained that it couldn't be pickled. I was actually surprised that the multiprocessing.Queue and multiprocessing.Event worked AND could be accessed by the thread.
Not sure why I started with a Thread subclass - I think I thought it would be easier to control/specify what happens in its run method. But it can be done with a function also.
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random

def f_to_run():
    time.sleep(.2)
    return random.randint(1, 10)

def t1(evt, q, func):
    while not evt.is_set():
        n = func()
        q.put(n)

def g(t1, evt, q, func):
    t = Thread(target=t1, args=(evt, q, func))
    t.start()
    t.join()
    q.put(f'{t.name} is alive - {t.is_alive()}')
    return 'foo'

if __name__ == '__main__':
    q = Queue()
    evt = Event()
    p = Process(target=g, args=(t1, evt, q, f_to_run))
    p.start()
    time.sleep(5)
    evt.set()
    p.join()
I am trying to learn multiprocessing with queue.
What I want to do is figure out when/how to "add more items to the queue" when the script is in motion.
The below script is the baseline I am working from:
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(
            proc_name, self.name))

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    print(queue.qsize())

    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
At the two queue.put() calls, the Fancy Dan item gets processed, but the Frankie one doesn't, even though I can confirm that Frankie does make it into the queue. I need a spot where I can check for more items and insert them into the queue as needed; if no more items exist, close the queue once the existing items are cleared.
How do I do this?
Thanks!
Let's make it clear:
The target function worker(q) will be called just once in the above scheme. On that first call the function suspends, waiting for the result of the blocking operation q.get(). It gets the MyFancyClass('Fancy Dan') instance from the queue, invokes its do_something method, and finishes.
MyFancyClass('Frankie') will be put into the queue but will never reach the Process, because the process's target function is already done.
One way to handle this is to keep reading from the queue until a marker item signals that the queue is no longer in use; let's use the value None.
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(proc_name, self.name))

def worker(q):
    while True:
        obj = q.get()
        if obj is None:
            break
        obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    # print(queue.qsize())
    queue.put(None)

    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
The output:
Doing something fancy in Process-1 for Fancy Dan!
Doing something fancy in Process-1 for Frankie!
One way you could do this is by changing worker to
def worker(q):
    while not q.empty():
        obj = q.get()
        obj.do_something()
The problem with your original code is that worker returns after doing work on one item on the queue. You need some sort of looping logic.
This solution is imperfect because empty() is not reliable. It will also fail if the queue becomes empty before you add more items to it (the process will simply return).
I would suggest using a ProcessPoolExecutor.
Its submit() method is pretty close to what you're looking for.
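For example, a minimal sketch of that approach, assuming MyFancyClass from the question is defined in the same module, might look like this:

from concurrent.futures import ProcessPoolExecutor

def run_fancy(name):
    MyFancyClass(name).do_something()  # reuse the question's class

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        # submit work whenever it becomes available; no queue bookkeeping needed
        futures = [pool.submit(run_fancy, name) for name in ('Fancy Dan', 'Frankie')]
    # leaving the with-block waits for all submitted work to finish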
I'm making remote API calls using threads, using no join so that the program could make the next API call without waiting for the last to complete.
Like so:
def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return
The problem was that I needed to know when all API calls were completed, so I moved to code that uses a queue and join.
Threads seem to run in serial now.
I can't seem to figure out how to get the join to work so that threads execute in parallel.
What am I doing wrong?
def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.

    :param methods_list: example ((method_name, args), (method_2, args), (method_3, args))
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')

    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()
    threads = []

    # starts worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)

    for method in methods_list:
        q.put(method[0](*method[1]))

    # block until all tasks are done
    q.join()

    # stop workers
    for i in range(num_worker_threads):
        q.put(None)

    for t in threads:
        t.join()

    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
    q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, you do all the work in the main thread and then put the result of each function call in the queue, which doesn't allow any parallelization aside from printing (which is generally not a big enough cost to justify the thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it, queue it to be called in worker
and change the _worker function so it calls the function that does the work in the thread:
def _worker():
    while True:
        task = q.get()
        if task is None:           # the None sentinel tells this worker to stop
            break
        method, args = task        # Extract and unpack callable and arguments
        item = method(*args)       # Call callable with provided args and store result
        method_returns.append(item)
        _output(item)
        q.task_done()
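With that change, each entry of methods_list is queued untouched and a worker thread unpacks and calls it; a hypothetical call (fetch and the URLs are just stand-ins for your API calls) might look like:

def fetch(url):
    # hypothetical stand-in for one of the remote API calls
    return 'fetched ' + url

results = run_que_block([
    (fetch, ('http://example.com/a',)),
    (fetch, ('http://example.com/b',)),
])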
I have a problem with Python multithreaded Queues. I have this script, where the producer takes elements from the input queue, produces some elements and puts them in the output queue, and the consumer takes elements from the output queue and just prints them:
import threading
import Queue

class Producer(threading.Thread):
    def __init__(self, iq, oq):
        threading.Thread.__init__(self)
        self.iq = iq
        self.oq = oq

    def produce(self, e):
        self.oq.put(e*2)
        self.oq.task_done()
        print "Producer %s produced %d and put it to output Queue" % (self.getName(), e*2)

    def run(self):
        while 1:
            e = self.iq.get()
            self.iq.task_done()
            print "Get %d from input Queue" % (e)
            self.produce(e)

class Consumer(threading.Thread):
    def __init__(self, oq):
        threading.Thread.__init__(self)
        self.oq = oq

    def run(self):
        while 1:
            e = self.oq.get()
            self.oq.task_done()
            print "Consumer get %d from output queue and consumed" % e

iq = Queue.Queue()
oq = Queue.Queue()
for i in xrange(2):
    iq.put((i+1)*10)
for i in xrange(2):
    t1 = Producer(iq, oq)
    t1.setDaemon(True)
    t1.start()
    t2 = Consumer(oq)
    t2.setDaemon(True)
    t2.start()
iq.join()
oq.join()
But every time I run it, it behaves differently (raises an exception, or the consumer does no work). I think the problem is with the task_done() calls; can anyone explain to me where the bug is?
I have modified the Consumer class:
import urllib2

class Consumer(threading.Thread):
    def __init__(self, oq):
        threading.Thread.__init__(self)
        self.oq = oq

    def run(self):
        while 1:
            e = self.oq.get()
            self.oq.task_done()
            print "Consumer get %d from output queue and consumed" % e
            page = urllib2.urlopen("http://www.ifconfig.me/ip")
            print page
Now, after each task_done() call the consumer should connect to a web site (which takes some time), but it does not. Instead, if the code after task_done() executes quickly it runs, but if it takes long it does not run. Why? Can anyone explain this issue to me? If I put everything before the task_done() call, then I will block the queue from other threads, which seems wrong. Or is there anything I am missing about multithreading in Python?
From the Queue docs:
Queue.task_done()
Indicate that a formerly enqueued task is complete.
Used by queue consumer threads. For each get() used to fetch a task, a
subsequent call to task_done() tells the queue that the processing on
the task is complete.
If a join() is currently blocking, it will resume when all items have
been processed (meaning that a task_done() call was received for every
item that had been put() into the queue)
For example in your code you do the following in your Producer class:
def produce(self, e):
    self.oq.put(e*2)
    self.oq.task_done()
    print "Producer %s produced %d and put it to output Queue" % (self.getName(), e*2)
You shouldn't do self.oq.task_done() here, since you haven't used oq.get().
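For instance, a minimal sketch of the corrected produce method, keeping the question's Python 2 style, simply drops that call:

def produce(self, e):
    # only put; task_done() belongs to the consumer that calls oq.get()
    self.oq.put(e*2)
    print "Producer %s produced %d and put it to output Queue" % (self.getName(), e*2)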
I am not sure this is the only problem though.
EDIT:
For your other problem: you're using iq.join() and oq.join() at the end, which lets your main thread exit before the other threads print the retrieved pages, and since you created your threads as daemons, your Python application exits without waiting for them to finish executing. (Remember that Queue.join() depends on Queue.task_done().)
Now you're saying "If I put everything before the task_done() command then I will block the queue from other threads". I can't see what you mean; this would only block that Consumer thread, and you can always create more Consumer threads, which won't be blocked by each other.
I'm sorry if this is a stupid question. I am trying to use several Thread subclasses to finish different jobs, which involves invoking these threads at different times, many times over. But I am not sure which method to use. The code looks like this:
class workers1(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):
        do some stuff

class workers2(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):
        do some stuff

class workers3(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):
        do some stuff

WorkerList1 = [workers1(i) for i in range(X)]
WorkerList2 = [workers2(i) for i in range(XX)]
WorkerList3 = [workers3(i) for i in range(XXX)]

while True:
    for thread in WorkerList1:
        thread.run (start? join? or?)
    for thread in WorkerList2:
        thread.run (start? join? or?)
    for thread in WorkerList3:
        thread.run (start? join? or?)
    do sth .
I am trying to have all the threads in all the WorkerLists start functioning at the same time, or at least around the same time. After they have all terminated, I would like to invoke all the threads again.
If there were no loop, I could just use .start(); but since I can only start a thread once, start apparently does not fit here. If I use run(), it seems that all the threads run sequentially, not only the threads in the same list but also threads from different lists.
Can anyone please help?
There are a lot of misconceptions here:
You can only start a specific instance of a thread once. But in your case, the for loop is looping over different instances of a thread, each instance being assigned to the variable thread in the loop, so there is no problem at all in calling the start() method on each thread. (You can think of the variable thread as an alias of the Thread object instantiated in your list.)
run() is not the same as join(): calling run() performs as if you were programming sequentially. The run() method does not start a new thread; it simply executes the statements in the method, as with any other function call.
join() does not start executing anything: it only waits for a thread to finish. In order for join() to work properly for a thread, you have to call start() on that thread first.
Additionally, you should note that you cannot restart a thread once it has finished execution: you have to recreate the thread object for it to be started again (see the sketch below). One workaround is to call Thread.__init__() at the end of the run() method; however, I would not recommend doing this since it prevents using the join() method to detect the end of the thread's execution.
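A minimal sketch of the recommended approach, recreating the Thread objects on every round (do_some_stuff here is just a stand-in for the question's "do some stuff"):

from threading import Thread

def do_some_stuff(i):
    print('worker %d doing some stuff' % i)

for round_no in range(3):   # each round recreates fresh Thread objects
    workers = [Thread(target=do_some_stuff, args=(i,)) for i in range(5)]
    for t in workers:
        t.start()           # all threads start at roughly the same time
    for t in workers:
        t.join()            # wait until the whole round has finished
    # do sth.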
If you called thread.start() in the loops, you would actually start every thread only once, because all the entries in your lists are distinct thread objects (it does not matter that they belong to the same class). You should never call the run() method of a thread directly -- it is meant to be called by the start() method. Calling it directly does not run it in a separate thread.
The code below creates a class that is just a Thread, but whose start() and run() methods call the Thread class's initializer again, so that the thread doesn't know it has already been started.
from threading import Thread

class MTThread(Thread):
    def __init__(self, name="", target=None):
        self.mt_name = name
        self.mt_target = target
        Thread.__init__(self, name=name, target=target)

    def start(self):
        super().start()
        Thread.__init__(self, name=self.mt_name, target=self.mt_target)

    def run(self):
        super().run()
        Thread.__init__(self, name=self.mt_name, target=self.mt_target)

def code():
    pass  # Some code

thread = MTThread(name="SomeThread", target=code)
thread.start()
thread.start()
I had this same dilemma and came up with this solution which has worked perfectly for me. It also allows a thread-killing decorator to be used efficiently.
The key feature is the use of a thread refresher which is instantiated and .started in main. This thread-refreshing thread will run a function that instantiates and starts all other (real, task-performing) threads. Decorating the thread-refreshing function with a thread-killer allows you to kill all threads when a certain condition is met, such as main terminating.
#ThreadKiller(arg)  # what is this?
def RefreshThreads():
    threadTask1 = threading.Thread(name="Task1", target=Task1, args=(anyArguments))
    threadTask2 = threading.Thread(name="Task2", target=Task2, args=(anyArguments))
    threadTask1.start()
    threadTask2.start()

# Main
while True:
    # do stuff
    threadRefreshThreads = threading.Thread(name="RefreshThreads", target=RefreshThreads, args=())
    threadRefreshThreads.start()
from threading import Thread
from time import sleep

def runA():
    while a == 1:
        print('A\n')
        sleep(0.5)

if __name__ == "__main__":
    a = 1
    t1 = Thread(target=runA)
    t1.setDaemon(True)
    t1.start()
    sleep(2)
    a = 0
    print(" now def runA stops")
    sleep(3)
    print("and now def runA continue")
    a = 1
    t1 = Thread(target=runA)
    t1.start()
    sleep(2)