Multiple consumers, is it possible to clone a queue (gevent)? - python

I'd like to do something like this (one queue, multiple consumers):
import gevent
from gevent import queue
q=queue.Queue()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)
def consumer(qq):
    for i in qq:
        print i
jobs=[gevent.spawn(consumer,i) for i in [q,q]]
gevent.joinall(jobs)
But that's not possible: the queue is consumed by the first job, so the second job would block forever.
It raises the exception gevent.hub.LoopExit: This operation would block forever.
I would like each consumer to be able to consume the full queue from the start (the output should be 1,2,3,1,2,3 or 1,1,2,2,3,3; the order doesn't matter).
One idea would be to clone the queue before spawning, but that isn't possible with the copy module (shallow or deep) ;-(
Is there another way to do that?
[EDIT]
What do you think of this?
import gevent
from gevent import queue
class MasterQueueClonable(queue.Queue):
    def __init__(self,*a,**k):
        queue.Queue.__init__(self,*a,**k)
        self.__cloned = []
        self.__old=[]
    #override
    def get(self,*a,**k):
        e=queue.Queue.get(self,*a,**k)
        for i in self.__cloned: i.put(e) # serve to current clones
        self.__old.append(e) # save old element
        return e
    def clone(self):
        q=queue.Queue()
        for i in self.__old: q.put(i) # feed a queue with elements which are out
        self.__cloned.append(q) # stock the queue, to be able to put newer elements too
        return q
q=MasterQueueClonable()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)
def consumer(qq):
    for i in qq:
        print id(qq),i
jobs=[gevent.spawn(consumer,i) for i in [q.clone(), q ,q.clone(),q.clone()]]
gevent.joinall(jobs)
It's based on RyanYe's idea. There is a "master queue" but no dispatcher.
My master queue overrides the get method and dispatches each element to the clones created on demand.
In addition, a "clone" can be created after the master queue has started being consumed (thanks to the __old trick).

I suggest creating a greenlet to dispatch the work to the consumers. Example code:
import gevent
from gevent import queue
master_queue=queue.Queue()
master_queue.put(1)
master_queue.put(2)
master_queue.put(3)
master_queue.put(StopIteration)
total_consumers = 10
consumer_queues = [queue.Queue() for i in xrange(total_consumers)]
def dispatcher(master_queue, consumer_queues):
    for i in master_queue:
        [j.put(i) for j in consumer_queues]
    [j.put(StopIteration) for j in consumer_queues]
def consumer(qq):
    for i in qq:
        print i
# the dispatcher must read from master_queue (the name q is not defined in this snippet)
jobs=[gevent.spawn(dispatcher, master_queue, consumer_queues)] + [gevent.spawn(consumer, i) for i in consumer_queues]
gevent.joinall(jobs)
UPDATE: Fix missing StopIteration for consumer queues. Thanks arilou for pointing it out.

I've added a copy() method to the Queue class:
>>> import gevent.queue
>>> q = gevent.queue.Queue()
>>> q.put(5)
>>> q.copy().get()
5
>>> q
<Queue at 0x1062760d0 queue=deque([5])>
Let me know if it helps.

The answer by Ryan Ye is missing one line at the end of the dispatcher() function:
[j.put(StopIteration) for j in consumer_queues]
Without it we still get 'gevent.hub.LoopExit: This operation would block forever', since the 'for i in master_queue' loop stops at StopIteration without copying it into the consumer_queues.
(Sorry, I can't leave comments yet, so I'm writing this as a separate answer.)
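Why every consumer queue needs its own end marker can be seen in a minimal sketch. It uses the standard-library queue module rather than gevent (an assumption, made so the example runs without gevent installed). A stdlib Queue is not iterable, so iter(q.get, SENTINEL) stands in for gevent's "for i in queue" loop; SENTINEL and drain are hypothetical names, not part of either library.

```python
import queue

SENTINEL = object()  # hypothetical end-of-stream marker (plays the role of StopIteration)

def drain(q):
    """Collect items from q until the sentinel is seen."""
    # iter(callable, sentinel) calls q.get() repeatedly and stops
    # as soon as the returned value equals SENTINEL.
    return list(iter(q.get, SENTINEL))

consumer_queues = [queue.Queue() for _ in range(2)]
for item in (1, 2, 3):
    for cq in consumer_queues:
        cq.put(item)
# Without this loop, drain() would block forever on the final get().
for cq in consumer_queues:
    cq.put(SENTINEL)

for cq in consumer_queues:
    print(drain(cq))
```

If the second loop is removed, each drain() call blocks on an empty queue, which is the stdlib counterpart of the LoopExit error described above.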

Related

Python - Non-empty shared list on separate thread appears empty

I have two classes, MessageProducer and MessageConsumer.
MessageConsumer does the following:
1. receives messages and puts them in its message list "_unprocessed_msgs"
2. on a separate worker thread, moves the messages to the internal list "_in_process_msgs"
3. on the worker thread, processes messages from "_in_process_msgs"
In my development environment, I'm facing an issue with #2 above: after a message is added in step #1, the worker thread checks the length of "_unprocessed_msgs" and gets zero.
When step #1 is repeated, the list correctly shows 2 items on the thread that added the item. But in step #2, on the worker thread, len(_unprocessed_msgs) again returns zero.
I'm not sure why this is happening. I would really appreciate any help on this.
I'm using Ubuntu 16.04 with Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import logging
import threading
import time
# The logging setup was not shown in the original post; this is an assumed minimal configuration.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
LOG = logging.getLogger(__name__)
class MessageConsumerThread(threading.Thread):
    def __init__(self):
        super(MessageConsumerThread, self).__init__()
        self._unprocessed_msg_q = []
        self._in_process_msg_q = []
        self._lock = threading.Lock()
        self._stop_processing = False
    def start_msg_processing_thread(self):
        self._stop_processing = False
        self.start()
    def stop_msg_processing_thread(self):
        self._stop_processing = True
    def receive_msg(self, msg):
        with self._lock:
            LOG.info("Before: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            self._unprocessed_msg_q.append(msg)
            LOG.info("After: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
    def _queue_unprocessed_msgs(self):
        with self._lock:
            LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            if self._unprocessed_msg_q:
                LOG.info("Moving messages from unprocessed to in_process queue")
                self._in_process_msg_q += self._unprocessed_msg_q
                self._unprocessed_msg_q = []
                LOG.info("Moved messages from unprocessed to in_process queue")
    def run(self):
        while not self._stop_processing:
            # Allow other threads to add messages to message queue
            time.sleep(1)
            # Move unprocessed listeners to in-process listener queue
            self._queue_unprocessed_msgs()
            # If nothing to process continue the loop
            if not self._in_process_msg_q:
                continue
            for msg in self._in_process_msg_q:
                self.consume_message(msg)
            # Clean up processed messages
            del self._in_process_msg_q[:]
    def consume_message(self, msg):
        print(msg)
class MessageProducerThread(threading.Thread):
    def __init__(self, producer_id, msg_receiver):
        super(MessageProducerThread, self).__init__()
        self._producer_id = producer_id
        self._msg_receiver = msg_receiver
    def start_producing_msgs(self):
        self.start()
    def run(self):
        for i in range(1,10):
            msg = "From: %s; Message:%s" %(self._producer_id, i)
            self._msg_receiver.receive_msg(msg)
def main():
    msg_receiver_thread = MessageConsumerThread()
    msg_receiver_thread.start_msg_processing_thread()
    msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
                                                msg_receiver=msg_receiver_thread)
    msg_producer_thread.start_producing_msgs()
    msg_producer_thread.join()
    msg_receiver_thread.stop_msg_processing_thread()
    msg_receiver_thread.join()
if __name__ == '__main__':
    main()
Following is the log that I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
This is not a good design for your application.
I spent some time trying to debug this, but threading code is naturally complicated, so we should try to simplify it instead of making it even more confusing.
When I see threading code in Python, I usually see it written in a procedural form: a normal function that is passed to threading.Thread as the target argument and that drives each thread. That way, you don't need to write a new class that will only ever have a single instance.
Another thing is that, although Python's global interpreter lock guarantees that lists won't get corrupted if they are modified from two separate threads, lists are not a recommended data structure for passing data between threads. You should probably look at queue.Queue (Queue.Queue in Python 2) for that.
The thing that looks wrong in this code at first sight is probably not the cause of your problem, given your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which creates a new list object that the other thread momentarily has no reference to (so it might write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just use the del slice idiom you already use in the other method.
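The difference between rebinding a name and clearing a list in place can be shown in a small sketch. The holder dict is a hypothetical stand-in for the thread object's attributes, and rebind and clear_in_place are illustrative names only; no threads are involved, the point is just which references see the change.

```python
def rebind(holder):
    """Replace the list object; earlier references keep the old list."""
    holder["msgs"] = []

def clear_in_place(holder):
    """Empty the existing list object; every reference sees the change."""
    holder["msgs"][:] = []

# Case 1: rebinding leaves the previously obtained reference untouched.
holder = {"msgs": [1, 2]}
old_ref = holder["msgs"]
rebind(holder)
print(old_ref, holder["msgs"])

# Case 2: slice assignment empties the one shared list object.
holder = {"msgs": [1, 2]}
old_ref = holder["msgs"]
clear_in_place(holder)
print(old_ref is holder["msgs"], old_ref)
```

In the first case a thread that had already fetched the old list would keep appending to an object nobody reads anymore; in the second case there is only ever one list.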
But to be on the safe side, and to have more maintainable and less surprising code, you really should change to a procedural approach there, assuming Python threading. Treat "Thread" as the "final" object that can do its thing, and then use Queues around it:
# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty
import time
import random
TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()
class Receiver(object):
    def __init__(self, queue):
        self.queue = queue
        self.in_process = []
    def receive_data(self, data):
        self.in_process.append(data)
    def consume_data(self):
        print("received data:", self.in_process)
        del self.in_process[:]
    def receiver_loop(self):
        queue = self.queue
        while True:
            try:
                data = queue.get(block=False)
            except Empty:
                print("got no data from queue")
                data = NO_DATA_SENTINEL
            if data is TERMINATE_SENTINEL:
                print("Got sentinel: exiting receiver loop")
                break
            # Do not store the "no data" marker as if it were a message
            if data is not NO_DATA_SENTINEL:
                self.receive_data(data)
            time.sleep(random.uniform(0, 0.3))
            if queue.empty():
                # Only process data if we have nothing to receive right now:
                self.consume_data()
                print("sleeping receiver")
                time.sleep(1)
        if self.in_process:
            self.consume_data()
def producer_loop(queue):
    for i in range(10):
        time.sleep(random.uniform(0.05, 0.4))
        print("putting {0} in queue".format(i))
        queue.put(i)
def main():
    msg_queue = Queue()
    msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
    time.sleep(0.1)
    msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))
    msg_receiver_thread.start()
    msg_producer_thread.start()
    msg_producer_thread.join()
    msg_queue.put(TERMINATE_SENTINEL)
    msg_receiver_thread.join()
if __name__ == '__main__':
    main()
Note that since you want multiple methods in the receiver thread to do things with the data, I used a class, but it does not inherit from Thread and does not have to worry about its workings. All its methods are called within the same thread: no need for locks, and no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is built to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same, if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving)
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose
Finally I was able to solve the issue. In the actual code, I have a Manager class that instantiates MessageConsumerThread as the last step of its initializer:
class Manager(object):
    def __init__(self):
        ...
        ...
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' to the MessageConsumerThread initializer while Manager is still executing its own initializer (even though those are the last two steps). The moment I moved the creation of the consumer out of the initializer, the consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code. It manifests itself in the production environment only. Without the above fix, I tried a queue and a dictionary as well but observed the same issue. After the fix, I tried with a queue and a list and was able to execute the code successfully.
I really appreciate and thank @jsbueno and @ivan_pozdeev for their time and help! Community #stackoverflow is very helpful!

Gevent: Using two queues with two consumers without blocking each other at the same time

I have the problem that I need to write values generated by a consumer to disk. I do not want to open a new instance of a file for every write, so I thought I would use a second queue and another consumer to write to disk from a single greenlet. The problem with my code is that the second queue is not consumed concurrently with the first queue. The first queue finishes first, and only then is the second queue consumed.
I want to write values to disk at the same time as other values are being generated.
Thanks for help!
#!/usr/bin/python
# -*- coding: utf-8 -*-
import gevent #pip install gevent
from gevent.queue import *
import gevent.monkey
from timeit import default_timer as timer
from time import sleep
import cPickle as pickle
gevent.monkey.patch_all()
def save_lineCount(count):
    with open("count.p", "wb") as f:
        pickle.dump(count, f)
def loader():
    for i in range(0,3):
        q.put(i)
def writer():
    while True:
        task = q_w.get()
        print "writing",task
        save_lineCount(task)
def worker():
    while not q.empty():
        task = q.get()
        if task%2:
            q_w.put(task)
            print "put",task
        sleep(10)
def asynchronous():
    threads = []
    threads.append(gevent.spawn(writer))
    for i in range(0, 1):
        threads.append(gevent.spawn(worker))
    start = timer()
    gevent.joinall(threads,raise_error=True)
    end = timer()
    #pbar.close()
    print "\n\nTime passed: " + str(end - start)[:6]
q = gevent.queue.Queue()
q_w = gevent.queue.Queue()
gevent.spawn(loader).join()
asynchronous()
In general, that approach should work fine. There are some problems with this specific code, though:
Calling time.sleep causes all greenlets to block. You need to either call gevent.sleep or monkey-patch the process so that only the calling greenlet blocks. (In the code as posted, gevent.monkey.patch_all() is called only after "from time import sleep", so the name sleep still refers to the unpatched function.) I suspect that's the major problem here.
Writing to a file is also synchronous and causes all greenlets to block. You can use FileObjectThread if that's a major bottleneck.
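The question's code runs "from time import sleep" before gevent.monkey.patch_all(), and the order matters. Here is a minimal plain-Python sketch of why, under the assumption that a monkey patch works by replacing the attribute on the time module. fake_sleep is a hypothetical stand-in for a cooperative sleep; gevent itself is not used, so this only illustrates name binding.

```python
import time
from time import sleep as early_sleep  # bound before any patching happens

def fake_sleep(seconds):
    """Hypothetical stand-in for a cooperative sleep such as gevent.sleep."""
    return "patched"

original_sleep = time.sleep
time.sleep = fake_sleep  # assumed to be roughly what a monkey patch does to the module
try:
    # Looking the function up through the module sees the replacement...
    patched_lookup = time.sleep(0)
    # ...but a name imported earlier still points at the original function.
    early_is_original = early_sleep is original_sleep
finally:
    time.sleep = original_sleep  # restore so later code is unaffected

print(patched_lookup, early_is_original)
```

So with gevent the patch call should come before any "from time import sleep" style import, or the code should call gevent.sleep explicitly.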

Python Event loop w/ gevent

import gevent
from gevent.event import AsyncResult
import time
class Job(object):
    def __init__(self, name):
        self.name = name
def setter(job):
    print 'starting'
    gevent.sleep(3)
    job.result.set('%s done' % job.name)
def waiter(job):
    print job.result.get()
# event loop
running = []
for i in range(5):
    print 'creating'
    j = Job(i)
    j.result = AsyncResult()
    running.append(gevent.spawn(setter, j))
    running.append(gevent.spawn(waiter, j))
print 'started greenlets, event loop go do something else'
time.sleep(5)
gevent.joinall(running)
gevent doesn't actually start until joinall is called.
Is there something that would start/spawn gevent asynchronously (why does it not start right away as soon as spawn is called)?
Is there a select/epoll on running greenlets to see which one needs to be joined instead of joinall()?
No, it does not start straight away. It will start as soon as your main greenlet yields to the hub (releases control, for example by calling gevent.sleep or join).
Clearly your intention is that it starts when you call time.sleep. It does not, because you have not monkey-patched it.
Add these lines to the very top of your file:
from gevent import monkey
monkey.patch_all()
This will then give you the behaviour that you want (because, under the hood, time.sleep will be replaced with a version that yields to the hub).
Alternatively, you can call gevent.sleep.
Since you did not monkey-patch, time.sleep() pauses your whole app. Use gevent.sleep(5) instead.
The very first step should be monkey patching:
from gevent import monkey
monkey.patch_all()
With that in place, the spawned greenlets run while the main greenlet sleeps.

Single Producer Multiple Consumer

I wish to have a single-producer, multiple-consumer architecture in Python using multi-threaded programming. I wish to have an operation like this:
1. Producer produces the data
2. Consumers 1..N (N is pre-determined) wait for the data to arrive (block) and then process the SAME data in different ways.
So I need all the consumers to get the same data from the producer.
When I used Queue for this, I realized that all but the first consumer would be starved with the implementation I have.
One possible solution is to have a unique queue for each consumer thread, with the producer pushing the same data into every queue. Is there a better way to do this?
from threading import Thread
import time
import random
from Queue import Queue
my_queue = Queue(0)
def Producer():
    global my_queue
    my_list = []
    for each in range (50):
        my_list.append(each)
    my_queue.put(my_list)
def Consumer1():
    print "Consumer1"
    global my_queue
    print my_queue.get()
    my_queue.task_done()
def Consumer2():
    print "Consumer2"
    global my_queue
    print my_queue.get()
    my_queue.task_done()
P = Thread(name = "Producer", target = Producer)
C1 = Thread(name = "Consumer1", target = Consumer1)
C2 = Thread(name = "Consumer2", target = Consumer2)
P.start()
C1.start()
C2.start()
In the example above, C2 gets blocked indefinitely because C1 consumes the data produced by P. What I would rather want is for C1 and C2 both to be able to access the SAME data produced by P.
Thanks for any code/pointers!
Your producer creates only one job:
my_queue.put(my_list)
For example, put my_list twice, and both consumers work:
def Producer():
    global my_queue
    my_list = []
    for each in range (50):
        my_list.append(each)
    my_queue.put(my_list)
    my_queue.put(my_list)
This way you put two jobs on the queue with the same list.
However, I have to warn you: modifying the same data in different threads without thread synchronization is generally a bad idea.
Anyway, the one-queue approach would not work for you, since a single queue is meant to be processed by threads that all run the same algorithm.
So I advise you to go ahead with a unique queue per consumer, since the other solutions are not as trivial.
How about a per-thread queue then?
As part of starting each consumer, you would also create another Queue and add it to a list of "all thread queues". Then start the producer, passing it the list of all queues, so that it can push the data into every one of them.
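A minimal sketch of that per-thread queue idea, written for Python 3 with the standard queue and threading modules. The names broadcast, DONE and received are hypothetical, and the consumers here only record what they receive, so treat it as an outline rather than a finished design.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker

def broadcast(items, n_consumers):
    """Send every item to each of n_consumers private queues."""
    all_queues = [queue.Queue() for _ in range(n_consumers)]
    received = [[] for _ in range(n_consumers)]

    def consumer(idx):
        # Each consumer blocks on its own queue, so nobody is starved.
        for item in iter(all_queues[idx].get, DONE):
            received[idx].append(item)

    def producer():
        for item in items:
            for q in all_queues:
                q.put(item)
        for q in all_queues:
            q.put(DONE)  # one end marker per consumer

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(n_consumers)]
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(broadcast(range(3), 2))
```

Every queue receives a reference to the same object, so if the consumers mutate the data, each one should get its own copy or the access should be synchronized.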
A single-producer, five-consumer example, verified.
from multiprocessing import Process, JoinableQueue
import time
import os
q = JoinableQueue()
def producer():
    for item in range(30):
        time.sleep(2)
        q.put(item)
    pid = os.getpid()
    print(f'producer {pid} done')
def worker():
    while True:
        item = q.get()
        pid = os.getpid()
        print(f'pid {pid} Working on {item}')
        print(f'pid {pid} Finished {item}')
        q.task_done()
for i in range(5):
    # start() returns None, so there is nothing useful to keep in a variable here
    Process(target=worker, daemon=True).start()
producers = []
# it is easy to extend it to multi producers.
for i in range(1):
    p = Process(target=producer)
    producers.append(p)
    p.start()
# make sure producers done
for p in producers:
    p.join()
# block until all workers are done
q.join()
print('All work completed')
Explanation:
1. There are one producer and five consumers in this example.
2. JoinableQueue is used to make sure every element stored in the queue is processed. A worker calls task_done() to signal that an element is done; q.join() waits until all elements have been marked as done.
3. Because of #2, there is no need to join each worker.
4. But it is important to join the producer, so that it has stored its elements in the queue. Otherwise, the program exits immediately.
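The same task_done()/join() handshake exists on the standard-library queue.Queue, so it can be tried with threads instead of processes. This is a rough sketch, not the answer's code: process_all and finished are hypothetical names, and the "work" is just recording the item.

```python
import queue
import threading

def process_all(items, n_workers=5):
    """Process every item with n_workers threads and wait for completion."""
    q = queue.Queue()
    finished = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            try:
                with lock:
                    finished.append(item)
            finally:
                q.task_done()  # tells q.join() this item is fully handled

    for _ in range(n_workers):
        # Daemon threads do not keep the interpreter alive after main exits.
        threading.Thread(target=worker, daemon=True).start()
    for item in items:
        q.put(item)
    q.join()  # returns once every put() has a matching task_done()
    return sorted(finished)

print(process_all(range(5)))
```

As in the multiprocessing example, each item is handled by exactly one worker, so this shares work between consumers rather than giving every consumer the same data.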
I do know it might be overkill, but... what about using the signal/slot framework from Qt? For consistency, QThread could be used instead of threading.Thread.
from __future__ import annotations # Needed for forward Consumer typehint in register_consumer
from queue import Queue
from typing import List
from PySide2.QtCore import QThread, QObject, QCoreApplication, Signal, Slot, Qt
import time
import random
def thread_name():
    # Convenience function
    return QThread.currentThread().objectName()
class Producer(QThread):
    product_available = Signal(list)
    def __init__(self):
        QThread.__init__(self, objectName='ThreadProducer')
        self.consumers: List[Consumer] = list()
        # See Consumer class comments for info (exactly the same reason here)
        self.internal_consumer_queue = Queue()
        self.active = True
    def run(self):
        my_list = [each for each in range(5)]
        self.product_available.emit(my_list)
        print(f'Producer: from thread {QThread.currentThread().objectName()} I\'ve sent my products\n')
        while self.active:
            consumer: Consumer = self.internal_consumer_queue.get(block=True)
            print(f'Producer: {consumer} has told me it has completed his task with my product! '
                  f'(Thread {thread_name()})')
            if consumer not in self.consumers:
                raise ValueError(f'Consumer {consumer} was not registered')
            self.consumers.remove(consumer)
            if len(self.consumers) == 0:
                print('All consumers have completed their task! I\'m terminating myself')
                self.active = False
    @Slot(object)
    def on_task_done_by_consumer(self, consumer: Consumer):
        self.internal_consumer_queue.put(consumer)
    def register_consumer(self, consumer: Consumer):
        if consumer in self.consumers:
            return
        self.consumers.append(consumer)
        consumer.task_done_with_product.connect(self.on_task_done_by_consumer)
class Consumer(QThread):
    task_done_with_product = Signal(object)
    def __init__(self, name: str, producer: Producer):
        self.name = name
        # Super init and set Thread name
        QThread.__init__(self, objectName=f'Thread_Of_{self.name}')
        self.producer = producer
        # See method on_product_available doc
        self.internal_queue = Queue()
    def run(self) -> None:
        self.producer.product_available.connect(self.on_product_available, Qt.ConnectionType.UniqueConnection)
        # Thread loop waiting for product availability
        product = self.internal_queue.get(block=True)
        print(f'{self.name}: Product {product} received and elaborated in thread {thread_name()}\n\n')
        # Tell the producer I've done
        self.task_done_with_product.emit(self)
        # Now the thread is naturally closed
    @Slot(list)
    def on_product_available(self, product: list):
        """
        As a limitation of PySide, it seems that list are not supported for QueuedConnection. This work around using
        internal queue might solve
        """
        # This is executed in Main Loop!
        print(f'{self.name}: In thread {thread_name()} I received the product, and I\'m queuing it for being elaborated'
              f'in consumer thread')
        self.internal_queue.put(product)
        # Quit the thread
        self.active = False
    def __repr__(self):
        # Needed in case of exception for representing current consumer
        return f'{self.name}'
# Needed to executed main and threads event loops
app = QCoreApplication()
QThread.currentThread().setObjectName('MainThread')
producer = Producer()
c1 = Consumer('Consumer1', producer)
c1.start()
producer.register_consumer(c1)
c2 = Consumer('Consumer2', producer)
c2.start()
producer.register_consumer(c2)
producer.product_available.connect(c1.on_product_available)
producer.product_available.connect(c2.on_product_available)
# Start Producer thread for LAST!
producer.start()
app.exec_()
Results:
Producer: from thread ThreadProducer I've sent my products
Consumer1: In thread MainThread I received the product, and I'm queuing it for being elaboratedin consumer thread
Consumer1: Product [0, 1, 2, 3, 4] received and elaborated in thread Thread_Of_Consumer1
Consumer2: In thread MainThread I received the product, and I'm queuing it for being elaboratedin consumer thread
Consumer2: Product [0, 1, 2, 3, 4] received and elaborated in thread Thread_Of_Consumer2
Producer: Consumer1 has told me it has completed his task with my product! (Thread ThreadProducer)
Producer: Consumer2 has told me it has completed his task with my product! (Thread ThreadProducer)
All consumers have completed their task! I'm terminating myself
Notes:
The step-by-step explanation is in the code comments. If anything is unclear, I'll do my best to clarify.
Unfortunately I've not found a way to use QueuedConnection so as to execute the Slot directly in the proper thread: an internal queue has been used to pass information from the main loop to the proper thread (for both Producer and Consumer). It seems that list and object cannot be meta-registered in PySide/PyQt for queueing purposes.

Gevent threads don't finish even though all the Queue items are exhausted

I'm trying to set up a simple producer-consumer system in Gevent but my script doesn't exit:
import gevent
from gevent.queue import *
import time
import random
q = Queue()
workers = []
def do_work(wid, value):
    """
    Actual blocking function
    """
    gevent.sleep(random.randint(0,2))
    print 'Task', value, 'done', wid
    return
def worker(wid):
    """
    Consumer
    """
    while True:
        item = q.get()
        do_work(wid, item)
def producer():
    """
    Producer
    """
    for i in range(4):
        workers.append(gevent.spawn(worker, random.randint(1, 100000)))
    for item in range(1, 9):
        q.put(item)
producer()
gevent.joinall(workers)
I haven't been able to find good examples/tutorials on using gevent, so what I've pasted above is what I've cobbled together from the internet.
Multiple workers get activated and the items go into the queue, but even when everything in the queue has been processed, the main program doesn't exit. I have to press Ctrl+C.
What am I doing wrong?
Thanks.
On a side note: if there is anything in my script that could be improved, please let me know. Simple things like checking when the Queue is empty, etc.
I think you should use JoinableQueue, as in the example from the documentation.
import gevent
from gevent.queue import *
import time
import random
q = JoinableQueue()
workers = []
def do_work(wid, value):
    gevent.sleep(random.randint(0,2))
    print 'Task', value, 'done', wid
def worker(wid):
    while True:
        item = q.get()
        try:
            do_work(wid, item)
        finally:
            q.task_done()
def producer():
    for i in range(4):
        workers.append(gevent.spawn(worker, random.randint(1, 100000)))
    for item in range(1, 9):
        q.put(item)
producer()
q.join()
In your worker, you start a loop that runs forever.
As a side note, an IMHO more elegant "forever loop" can be written with just:
for work_unit in q:
    # Do work, etc
gevent.joinall() waits for the workers to finish, but they never do, so your program waits forever. This is what keeps it from exiting.
If you don't care about the workers anymore, you can just kill them instead:
gevent.killall(workers)
An alternative is to put a 'special' item in the queue. When a worker receives this item, it recognises it as different from normal work and stops working.
# in the producer: one marker per worker
for _ in workers:
    q.put("TimeToDie")
# in the worker
for work_unit in q:
    if work_unit == "TimeToDie":
        break
    do_work(wid, work_unit)
Or you could even use gevent's Event to do this kind of pattern.
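For the Event variant, here is a rough sketch using the standard-library threading.Event as a stand-in; the assumption is that gevent's Event is used in much the same way (set it once, have the workers check it). run_workers, stop and done are hypothetical names, and the 0.05 second timeout is an arbitrary choice.

```python
import queue
import threading

def run_workers(items, n_workers=4):
    """Process items with worker threads that exit when an Event is set."""
    q = queue.Queue()
    stop = threading.Event()
    done = []
    lock = threading.Lock()

    def worker():
        while not stop.is_set():
            try:
                item = q.get(timeout=0.05)  # wake up regularly to re-check stop
            except queue.Empty:
                continue
            with lock:
                done.append(item)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    q.join()    # wait until every item has been processed
    stop.set()  # then ask the workers to leave their loops
    for t in threads:
        t.join()
    return sorted(done)

print(run_workers(range(1, 9)))
```

Compared with the "TimeToDie" marker, the Event does not have to travel through the queue, but the workers need a timeout on get() so that they notice it.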
