python queue task_done() issue

I have a problem with Python's multithreaded Queues. In this script, the producer takes elements from an input queue, produces new elements, and puts them on an output queue; the consumer takes elements from the output queue and just prints them:
import threading
import Queue

class Producer(threading.Thread):
    def __init__(self, iq, oq):
        threading.Thread.__init__(self)
        self.iq = iq
        self.oq = oq

    def produce(self, e):
        self.oq.put(e*2)
        self.oq.task_done()
        print "Producer %s produced %d and put it to output Queue"%(self.getName(), e*2)

    def run(self):
        while 1:
            e = self.iq.get()
            self.iq.task_done()
            print "Get %d from input Queue"%(e)
            self.produce(e)

class Consumer(threading.Thread):
    def __init__(self, oq):
        threading.Thread.__init__(self)
        self.oq = oq

    def run(self):
        while 1:
            e = self.oq.get()
            self.oq.task_done()
            print "Consumer get %d from output queue and consumed"%e

iq = Queue.Queue()
oq = Queue.Queue()
for i in xrange(2):
    iq.put((i+1)*10)

for i in xrange(2):
    t1 = Producer(iq, oq)
    t1.setDaemon(True)
    t1.start()

t2 = Consumer(oq)
t2.setDaemon(True)
t2.start()

iq.join()
oq.join()
But every time I run it, it behaves differently (it raises an exception, or the consumer does no work at all). I think the problem is in the task_done() call; can anyone explain where the bug is?
I have modified the Consumer class:
import urllib2

class Consumer(threading.Thread):
    def __init__(self, oq):
        threading.Thread.__init__(self)
        self.oq = oq

    def run(self):
        while 1:
            e = self.oq.get()
            self.oq.task_done()
            print "Consumer get %d from output queue and consumed"%e
            page = urllib2.urlopen("http://www.ifconfig.me/ip")
            print page
Now, after each task_done() call, the consumer should connect to a web site (which takes some time), but it does not. If the code after task_done() is quick it runs, but if it takes long it does not! Why? Can anyone explain this? If I put everything before the task_done() call, I will block the queue for other threads, which seems wrong. Or is there something I am missing about multithreading in Python?

From the Queue docs:
Queue.task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
For example in your code you do the following in your Producer class:
def produce(self, e):
    self.oq.put(e*2)
    self.oq.task_done()
    print "Producer %s produced %d and put it to output Queue"%(self.getName(), e*2)
You shouldn't do self.oq.task_done() here, since you haven't used oq.get().
I am not sure this is the only problem though.
EDIT:
For your other problem: you're using iq.join() and oq.join() at the end, which lets your main thread exit before the other threads print the retrieved pages, and since you're creating your threads as daemons, your Python application exits without waiting for them to finish executing. (Remember that Queue.join() depends on Queue.task_done().)
Now you're saying "If I put everything before task_done() command then I will block queue from other threads". I can't see the problem: this only blocks that one Consumer thread, and you can always create more Consumer threads, which won't be blocked by each other.
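For reference, a minimal sketch of the script with both issues addressed (keeping the question's Python 2 style): the producer never calls task_done() on the output queue, and the consumer calls task_done() only after it has finished working on the item, so oq.join() cannot return while work is still in flight.
import threading
import Queue

class Producer(threading.Thread):
    def __init__(self, iq, oq):
        threading.Thread.__init__(self)
        self.iq = iq
        self.oq = oq

    def run(self):
        while 1:
            e = self.iq.get()
            self.oq.put(e * 2)          # hand the result to the consumer
            self.iq.task_done()         # the input item is now fully handled

class Consumer(threading.Thread):
    def __init__(self, oq):
        threading.Thread.__init__(self)
        self.oq = oq

    def run(self):
        while 1:
            e = self.oq.get()
            print "Consumer got %d from output queue" % e   # do the real work here
            self.oq.task_done()         # only now is this output item done

iq = Queue.Queue()
oq = Queue.Queue()
for i in xrange(2):
    iq.put((i + 1) * 10)

for i in xrange(2):
    p = Producer(iq, oq)
    p.setDaemon(True)
    p.start()

c = Consumer(oq)
c.setDaemon(True)
c.start()

iq.join()   # returns once every input item has been task_done()'d
oq.join()   # returns once every output item has been task_done()'d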

Related

EC2 Spot Instance Termination & Python 2.7

I know that the termination notice is made available via the meta-data url and that I can do something similar to
if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
in order to determine if the notice has been posted. I run a Python service on my Spot Instances that:
Loops over long polling SQS Queues
If it gets a message, it pauses polling and works on the payload.
Working on the payload can take 5-50 minutes.
Working on the payload will involve spawning a threadpool of up to 50 threads to handle parallel uploading of files to S3, this is the majority of the time spent working on the payload.
Finally, remove the message from the queue, rinse, repeat.
The work is idempotent, so if the same payload runs multiple times, I'm out the processing time/costs, but will not negatively impact the application workflow.
I'm searching for an elegant way to now also poll for the termination notice every five seconds in the background. As soon as the termination notice appears, I'd like to immediately release the message back to the SQS queue in order for another instance to pick it up as quickly as possible.
As a bonus, I'd like to shutdown the work, kill off the threadpool, and have the service enter a stasis state. If I terminate the service, supervisord will simply start it back up again.
Even bigger bonus! Is there not a python module available that simplifies this and just works?
I wrote this code to demonstrate how a thread can be used to poll for the Spot instance termination. It first starts up a polling thread, which would be responsible for checking the http endpoint.
Then we create a pool of fake workers (mimicking the real work to be done) and start running the pool. Eventually the polling thread kicks in (about 15 seconds into execution, as implemented) and kills the whole thing.
To prevent the script from continuing to work after Supervisor restarts it, we simply put a check at the beginning of __main__: if the termination notice is there, we sleep for 2.5 minutes, which is longer than the notice lasts before the instance is shut down. A sketch of that check follows the code below.
#!/usr/bin/env python
import threading
import Queue
import random
import time
import sys
import os

class Instance_Termination_Poll(threading.Thread):
    """
    Sleep for 5 seconds and eventually pretend that we then receive the
    termination event
    if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
    """
    def run(self):
        print("Polling for termination")
        while True:
            for i in range(30):
                time.sleep(5)
                if i==2:
                    print("Received Termination Poll!")
                    print("Pretend we returned the message to the queue.")
                    print("Now kill the entire program.")
                    os._exit(1)
            print("Well now, this is embarrassing!")

class ThreadPool:
    """
    Pool of threads consuming tasks from a queue
    """
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.errors = Queue.Queue()
        self.tasks = Queue.Queue(self.num_threads)
        for _ in range(num_threads):
            Worker(self.tasks, self.errors)

    def add_task(self, func, *args, **kargs):
        """
        Add a task to the queue
        """
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        """
        Wait for completion of all the tasks in the queue
        """
        try:
            while True:
                if self.tasks.empty() == False:
                    time.sleep(10)
                else:
                    break
        except KeyboardInterrupt:
            print "Ctrl-c received! Kill it all with Prejudice..."
            os._exit(1)
        self.tasks.join()

class Worker(threading.Thread):
    """
    Thread executing tasks from a given tasks queue
    """
    def __init__(self, tasks, error_queue):
        threading.Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.errors = error_queue
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception, e:
                print("Exception " + str(e))
                error = {'exception': e}
                self.errors.put(error)
            self.tasks.task_done()

def do_work(n):
    """
    Sleeps a random amount of time, then creates a little CPU usage to
    mimic some work taking place.
    """
    for z in range(100):
        time.sleep(random.randint(3,10))
        print "Thread ID: {} working.".format(threading.current_thread())
        for x in range(30000):
            x*n
        print "Thread ID: {} done, sleeping.".format(threading.current_thread())

if __name__ == '__main__':
    num_threads = 30

    # Start up the termination polling thread
    term_poll = Instance_Termination_Poll()
    term_poll.start()

    # Create our threadpool
    pool = ThreadPool(num_threads)
    for y in range(num_threads*2):
        pool.add_task(do_work, n=y)

    # Wait for the threadpool to complete
    pool.wait_completion()
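And a sketch of the startup guard described above. It assumes the requests library and the metadata endpoint from the question; the short timeout is just so the check doesn't hang when run outside EC2.
# Sketch of the startup check: if the termination notice is already posted
# when Supervisor restarts the service, sleep past the notice window instead
# of picking up new work. Assumes the `requests` library is installed.
import sys
import time
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def termination_notice_posted():
    try:
        return requests.get(TERMINATION_URL, timeout=1).status_code == 200
    except requests.exceptions.RequestException:
        return False    # metadata service unreachable; assume no notice

if __name__ == '__main__':
    if termination_notice_posted():
        time.sleep(150)  # 2.5 minutes, longer than the notice lasts
        sys.exit(0)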

Can a queue worker signal failure to the parent?

Imagine that I have a task queue with a consumer like this (this is almost identical to the sample code here):
def worker(tasks):
    while True:
        try:
            item = tasks.get_nowait()
        except:
            return
        execute(item)
        tasks.task_done()
and a producer like this:
def batch_execute(items, n_threads):
    tasks = Queue()
    for item in items:
        tasks.put(item)
    for n in range(n_threads):
        t = threading.Thread(target=worker, args=(tasks,))
        t.start()
    tasks.join()
This works, except that execute(item) can throw exceptions. If that happens, the given thread will bail, the others keep running, and the tasks.join() will hang indefinitely. Both traits are undesirable. Is there a typical design people use to e.g. "forward" the exception from the child thread into the parent thread and unblock tasks.join()? Or do I have to manually implement all of that around python's Queue class?
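One design that is often used for this (a sketch, not from this thread): always call task_done() in a finally block so join() cannot hang, and forward any exception to the parent on a second queue that the parent inspects after join() returns.
# Sketch: task_done() in a finally so join() always unblocks; failures are
# forwarded to the parent on a separate queue. `execute` is the work
# function from the question above.
import threading
from Queue import Queue, Empty

def worker(tasks, errors):
    while True:
        try:
            item = tasks.get_nowait()
        except Empty:
            return
        try:
            execute(item)
        except Exception as e:
            errors.put((item, e))      # forward the failure to the parent
        finally:
            tasks.task_done()          # join() can always make progress

def batch_execute(items, n_threads):
    tasks, errors = Queue(), Queue()
    for item in items:
        tasks.put(item)
    for _ in range(n_threads):
        threading.Thread(target=worker, args=(tasks, errors)).start()
    tasks.join()
    if not errors.empty():             # re-raise (or log) in the parent
        raise RuntimeError("%d task(s) failed" % errors.qsize())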

Single Producer Multiple Consumer

I wish to have a single-producer, multiple-consumer architecture in Python, using multiple threads. The operation should work like this:
Producer produces the data
Consumers 1 ..N (N is pre-determined) wait for the data to arrive (block) and then process the SAME data in different ways.
So I need all the consumers to get the same data from the producer.
When I used Queue to perform this, I realized that all but the first consumer would be starved with the implementation I have.
One possible solution is to have a unique queue for each of the consumer threads wherein the same data is pushed in multiple queues by the producer. Is there a better way to do this ?
from threading import Thread
import time
import random
from Queue import Queue

my_queue = Queue(0)

def Producer():
    global my_queue
    my_list = []
    for each in range (50):
        my_list.append(each)
    my_queue.put(my_list)

def Consumer1():
    print "Consumer1"
    global my_queue
    print my_queue.get()
    my_queue.task_done()

def Consumer2():
    print "Consumer2"
    global my_queue
    print my_queue.get()
    my_queue.task_done()

P = Thread(name = "Producer", target = Producer)
C1 = Thread(name = "Consumer1", target = Consumer1)
C2 = Thread(name = "Consumer2", target = Consumer2)

P.start()
C1.start()
C2.start()
In the example above, the C2 gets blocked indefinitely as C1 consumes the data produced by P1. What I would rather want is for C1 and C2 both to be able to access the SAME data as produced by P1.
Thanks for any code/pointers!
Your producer creates only one job to do:
my_queue.put(my_list)
For example, put my_list twice, and both consumers work:
def Producer():
    global my_queue
    my_list = []
    for each in range (50):
        my_list.append(each)
    my_queue.put(my_list)
    my_queue.put(my_list)
This way you put two jobs into the queue with the same list.
However, I have to warn you: modifying the same data in different threads without synchronization is generally a bad idea.
In any case, the single-queue approach will not work for you, since a single queue is meant to be processed by threads running the same algorithm.
So I advise you to go ahead with a unique queue per consumer, since other solutions are not as trivial.
How about a per-thread queue then?
As part of starting each consumer, you would also create another Queue and add it to a list of "all thread queues". Then start the producer, passing it the list of all queues, so it can push the data into each of them.
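A minimal sketch of that fan-out (the function and variable names here are illustrative, not from the original answer):
# Per-consumer queues: the producer puts the same data on every queue,
# so no consumer starves another.
from threading import Thread
from Queue import Queue

def producer(queues):
    data = range(50)
    for q in queues:                   # every consumer gets the same data
        q.put(data)

def consumer(name, q):
    print "%s got %r" % (name, q.get())

queues = []
for name in ("Consumer1", "Consumer2"):
    q = Queue()
    queues.append(q)
    Thread(name=name, target=consumer, args=(name, q)).start()

Thread(name="Producer", target=producer, args=(queues,)).start()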
A single-producer, five-consumer example, verified.
from multiprocessing import Process, JoinableQueue
import time
import os

q = JoinableQueue()

def producer():
    for item in range(30):
        time.sleep(2)
        q.put(item)
    pid = os.getpid()
    print(f'producer {pid} done')

def worker():
    while True:
        item = q.get()
        pid = os.getpid()
        print(f'pid {pid} Working on {item}')
        print(f'pid {pid} Finished {item}')
        q.task_done()

for i in range(5):
    Process(target=worker, daemon=True).start()

producers = []
# it is easy to extend it to multiple producers.
for i in range(1):
    p = Process(target=producer)
    producers.append(p)
    p.start()

# make sure the producers are done
for p in producers:
    p.join()

# block until all workers are done
q.join()
print('All work completed')
Explanation:
1. One producer and five consumers in this example.
2. A JoinableQueue is used to make sure all elements stored in the queue will be processed. task_done() lets a worker notify the queue that an element is done; q.join() waits until all elements have been marked as done.
3. Because of #2, there is no need to join every worker.
4. But it is important to join the producers, so every element gets stored in the queue; otherwise the program exits immediately.
I know it might be overkill, but... what about using the signal/slot framework from Qt? For consistency, QThread could be used instead of threading.Thread.
from __future__ import annotations  # Needed for forward Consumer typehint in register_consumer
from queue import Queue
from typing import List
from PySide2.QtCore import QThread, QObject, QCoreApplication, Signal, Slot, Qt
import time
import random

def thread_name():
    # Convenience function returning the current thread's name
    return QThread.currentThread().objectName()

class Producer(QThread):
    product_available = Signal(list)

    def __init__(self):
        QThread.__init__(self, objectName='ThreadProducer')
        self.consumers: List[Consumer] = list()
        # See Consumer class comments for info (exactly the same reason here)
        self.internal_consumer_queue = Queue()
        self.active = True

    def run(self):
        my_list = [each for each in range(5)]
        self.product_available.emit(my_list)
        print(f'Producer: from thread {QThread.currentThread().objectName()} I\'ve sent my products\n')
        while self.active:
            consumer: Consumer = self.internal_consumer_queue.get(block=True)
            print(f'Producer: {consumer} has told me it has completed his task with my product! '
                  f'(Thread {thread_name()})')
            if consumer not in self.consumers:
                raise ValueError(f'Consumer {consumer} was not registered')
            self.consumers.remove(consumer)
            if len(self.consumers) == 0:
                print('All consumers have completed their task! I\'m terminating myself')
                self.active = False

    @Slot(object)
    def on_task_done_by_consumer(self, consumer: Consumer):
        self.internal_consumer_queue.put(consumer)

    def register_consumer(self, consumer: Consumer):
        if consumer in self.consumers:
            return
        self.consumers.append(consumer)
        consumer.task_done_with_product.connect(self.on_task_done_by_consumer)

class Consumer(QThread):
    task_done_with_product = Signal(object)

    def __init__(self, name: str, producer: Producer):
        self.name = name
        # Super init and set thread name
        QThread.__init__(self, objectName=f'Thread_Of_{self.name}')
        self.producer = producer
        # See method on_product_available doc
        self.internal_queue = Queue()

    def run(self) -> None:
        self.producer.product_available.connect(self.on_product_available, Qt.ConnectionType.UniqueConnection)
        # Thread loop waiting for product availability
        product = self.internal_queue.get(block=True)
        print(f'{self.name}: Product {product} received and elaborated in thread {thread_name()}\n\n')
        # Tell the producer I'm done
        self.task_done_with_product.emit(self)
        # Now the thread is naturally closed

    @Slot(list)
    def on_product_available(self, product: list):
        """
        As a limitation of PySide, it seems that lists are not supported for QueuedConnection.
        This workaround using an internal queue might solve it.
        """
        # This is executed in the main loop!
        print(f'{self.name}: In thread {thread_name()} I received the product, and I\'m queuing it for being elaborated'
              f'in consumer thread')
        self.internal_queue.put(product)
        # Quit the thread
        self.active = False

    def __repr__(self):
        # Needed in case of exception for representing the current consumer
        return f'{self.name}'

# Needed to execute the main and thread event loops
app = QCoreApplication()
QThread.currentThread().setObjectName('MainThread')

producer = Producer()
c1 = Consumer('Consumer1', producer)
c1.start()
producer.register_consumer(c1)
c2 = Consumer('Consumer2', producer)
c2.start()
producer.register_consumer(c2)

producer.product_available.connect(c1.on_product_available)
producer.product_available.connect(c2.on_product_available)

# Start the Producer thread LAST!
producer.start()

app.exec_()
Results:
Producer: from thread ThreadProducer I've sent my products
Consumer1: In thread MainThread I received the product, and I'm queuing it for being elaboratedin consumer thread
Consumer1: Product [0, 1, 2, 3, 4] received and elaborated in thread Thread_Of_Consumer1
Consumer2: In thread MainThread I received the product, and I'm queuing it for being elaboratedin consumer thread
Consumer2: Product [0, 1, 2, 3, 4] received and elaborated in thread Thread_Of_Consumer2
Producer: Consumer1 has told me it has completed his task with my product! (Thread ThreadProducer)
Producer: Consumer2 has told me it has completed his task with my product! (Thread ThreadProducer)
All consumers have completed their task! I'm terminating myself
Notes:
The step-by-step explanation is in the code comments. If anything is unclear, I'll try my best to clarify further.
Unfortunately I've not found a way to use QueuedConnection (doc here) to execute the Slot directly in the proper thread: internal queueing is used to pass information from the main loop to the proper thread (for both Producer and Consumer). It seems that list and object cannot be meta-registered in PySide/PyQt for queuing purposes.

Python Process completion

So right now I am attempting to create a Python program that executes a series of tasks (Process subclasses). One of the things that I would like to know is when a Process has completed. Ideally, what I would want to do is have the Process subclass make a callback to the calling process in order to add the next Process into a queue. Here is what I have so far:
from multiprocessing import Process, Queue
import time

class Task1(Process):
    def __init__(self, queue):
        super(Task1, self).__init__()
        self.queue = queue

    def run(self):
        print 'Start Task 1'
        time.sleep(1)
        print 'Completed Task 1'
        # make a callback to the main process to alert it that its execution has completed

class Task2(Process):
    def __init__(self, queue):
        super(Task2, self).__init__()
        self.queue = queue

    def run(self):
        print 'Start Task 2'
        time.sleep(1)
        print 'Completed Task 2'
        # make a callback to the main process to alert it that its execution has completed

if __name__ == '__main__':
    queue = Queue()
    p1 = Task1(queue)
    p1.start()
    p1.join()
    # need a callback of some sort to know when p1 has completed its execution in order to add Task2 into the queue
Prior to Python, I mainly worked with Objective-C, and I am trying to find something for Process that is analogous to a completion block. Thanks.
Functions are first class citizens in Python, so you can just pass them as arguments,
e.g. to your task constructors:
def __init__(self, queue, callb):
    super(Task2, self).__init__()
    self.queue = queue
    self.callb = callb
You can then call them at the end of the run method:
def run(self):
    print 'Start Task 2'
    time.sleep(1)
    print 'Completed Task 2'
    self.callb(self)
Define a function somewhere, e.g.
def done(arg):
    print "%s is done" % arg
And pass it to the task constructor:
p1 = Task1(queue, done)
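Put together, a minimal runnable version of this approach might look like the following sketch (in the question's Python 2 style). Note that the callback runs inside the child process, which is fine for simple notifications.
# Sketch assembling the pieces above: the callback executes inside run(),
# i.e. in the child process, e.g. to print or to put a message on the queue.
from multiprocessing import Process, Queue
import time

class Task1(Process):
    def __init__(self, queue, callb):
        super(Task1, self).__init__()
        self.queue = queue
        self.callb = callb

    def run(self):
        print 'Start Task 1'
        time.sleep(1)
        print 'Completed Task 1'
        self.callb(self)               # the "completion block"

def done(task):
    print "%s is done" % task.name

if __name__ == '__main__':
    queue = Queue()
    p1 = Task1(queue, done)
    p1.start()
    p1.join()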
If I'm understanding your question correctly, your code already does what you want it to do!
The p1.join() will block the main process until p1 finishes. If p1.join() returns without error, then the process must have terminated and you can immediately start task2. Your "callback" would simply be a check that p1.join() has returned correctly!
From the documentation:
join([timeout])
Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.
If timeout is None then there is no timeout.
A process can be joined many times.
A process cannot join itself because this would cause a deadlock. It is an error to attempt to join a process before it has been started.
Edit:
Optionally, if you want a non-blocking solution, you can poll a particular process to see if it has terminated:
p1.start()
while(p1.is_alive()):
    pass #busy wait
p2.start()
This does exactly the same thing as p1.join(), but you can replace the pass with useful work while waiting for p1 to complete.

How to end a program properly with threads?

I have a class which pulls items from a queue and then runs code on them. I also have code in the main function that adds items to the queue for processing.
For some reason, the program doesn't want to end properly.
Here is the code:
class Downloader(Thread):
    def __init__(self, queue):
        self.queue = queue
        Thread.__init__(self)

    def run(self):
        while True:
            download_file(self.queue.get())
            self.queue.task_done()

def spawn_threads(Class, amount):
    for t in xrange(amount):
        thread = Class(queue)
        thread.setDaemon = True
        thread.start()

if __name__ == "__main__":
    spawn_threads(Downloader, 20)
    for item in items: queue.put(item)
    #not the real code, but simplified because it isn't relevant

    print 'Done scanning. Waiting for downloads to finish.'
    queue.join()
    print 'Done!'
The program waits at queue.join() for everything to finish and prints Done!, but something I can't put my finger on keeps the program from closing. I'd assume it was the while True loop, but I thought setting the threads as daemons was meant to solve that.
You are not using setDaemon() correctly. As a result, none of the Downloader threads are daemon threads.
Instead of
thread.setDaemon = True
write
thread.setDaemon(True)
or
thread.daemon = True
(The docs seem to imply that the latter is the preferred spelling in Python 2.6+.)
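In context, the fixed spawn_threads would look something like this (a sketch of the correction above):
# Set the daemon flag before start(), instead of overwriting the setDaemon
# method with True.
def spawn_threads(Class, amount):
    for _ in xrange(amount):
        thread = Class(queue)
        thread.daemon = True     # or thread.setDaemon(True)
        thread.start()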
