How to use a queue with concurrent.futures ThreadPoolExecutor in Python 3?

I am using the plain threading module to do concurrent jobs. Now I would like to take advantage of the concurrent.futures module. Can someone give me an example of using a queue with the concurrent.futures library?
I am getting TypeError: 'Queue' object is not iterable
I don't know how to iterate over queues.
code snippet:
def run(item):
    self.__log.info(str(item))
    return True

<queue filled here>

with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    # list(queue) raises TypeError: 'Queue' object is not iterable
    furtureIteams = { executor.submit(run, item): item for item in list(queue)}
    for future in concurrent.futures.as_completed(furtureIteams):
        f = furtureIteams[future]
        print(f)

I would suggest something like this:
def run(queue):
    item = queue.get()
    self.__log.info(str(item))
    return True

<queue filled here>

workerThreadsToStart = 10
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    furtureIteams = { executor.submit(run, queue): index for index in range(workerThreadsToStart)}
    for future in concurrent.futures.as_completed(furtureIteams):
        f = furtureIteams[future]
        print(f)
The problem you will run into is that a queue is meant to be unbounded: it is a medium to decouple the threads that put items into the queue from the threads that take items out of it.
When you have a finite number of items, or you compute all items at once and only afterwards process them in parallel, a queue makes no sense. A ThreadPoolExecutor makes a queue obsolete in these cases, as the sketch below illustrates.
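For instance, here is a minimal sketch of the finite-item case; the square function and the items list are placeholders of my own, not from the original post:
import concurrent.futures

def square(item):                      # placeholder work function
    return item * item

items = list(range(20))                # a finite, precomputed set of work items

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # map() feeds the items into the pool's internal work queue for you
    for result in executor.map(square, items):
        print(result)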
I had a look at the ThreadPoolExecutor source:
def submit(self, fn, *args, **kwargs):    # line 94
    ...
    self._work_queue.put(w)               # line 102
A Queue is used inside.

As noted in the comments above, you can use the iter() built-in with a sentinel to run a ThreadPoolExecutor over a queue object. A very general version of this looks something like:
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(run, iter(queue.get, None))
Here the run method executes the desired work on the items of the queue.
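As a minimal, self-contained sketch of that sentinel pattern, assuming None marks the end of the queue and run stands in for your real work function:
import concurrent.futures
import queue

def run(item):
    return item * 2                    # placeholder: do the real work on one queue item

q = queue.Queue()
for i in range(5):
    q.put(i)
q.put(None)                            # sentinel: iter() stops once get() returns None

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # iter(q.get, None) calls q.get() repeatedly until it returns the sentinel
    results = list(executor.map(run, iter(q.get, None)))

print(results)                         # [0, 2, 4, 6, 8]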

Related

Ensuring a python queue that can be populated by multiple threads will always be cleared without polling

I have the code below, which shows how a queue would always be cleared even with multiple threads adding to it. It uses recursion, but a while loop could work as well. Is this bad practice, or is there a scenario where the queue might hold an item that won't get pulled until something else gets added to the queue?
The primary purpose of this is to have a queue that ensures order of execution without the need to continually poll or block with q.get().
import queue
import threading

lock = threading.RLock()
q = queue.Queue()

def execute():
    with lock:
        if not q.empty():
            text = q.get()
            print(text)
            execute()

def add_to_queue(text):
    q.put(text)
    execute()

# Assume multiple threads can call add to queue
add_to_queue("Hello")
This is one solution that uses a timeout on the .get call; one thread pushes to the queue and another reads from it. You could have multiple readers and writers. A sentinel-based variation is sketched after this example.
import queue
import threading

q = queue.Queue()

def read():
    try:
        while True:
            text = q.get(timeout=1)
            print(text)
    except queue.Empty:
        print("exiting")

def write():
    q.put("Hello")
    q.put("There")
    q.put("My")
    q.put("Friend")

writer = threading.Thread(target=write)
reader = threading.Thread(target=read)
writer.start()
reader.start()
reader.join()
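If you would rather not rely on a timeout, here is a variation of the same reader/writer pattern (my sketch, not part of the original code) that uses a None sentinel to tell the reader to exit:
import queue
import threading

q = queue.Queue()

def read():
    while True:
        text = q.get()               # blocks until an item is available
        if text is None:             # assumed sentinel value marking the end
            print("exiting")
            break
        print(text)

def write():
    for word in ("Hello", "There", "My", "Friend"):
        q.put(word)
    q.put(None)                      # one sentinel per reader thread

writer = threading.Thread(target=write)
reader = threading.Thread(target=read)
writer.start()
reader.start()
reader.join()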

Why am I unable to join this thread in python?

I am writing a multithreading class. The class has a parallel_process() function that is overridden with the parallel task. The data to be processed is put in the queue. The worker() function in each thread keeps calling parallel_process() until the queue is empty. Results are put in the results Queue object. The class definition is:
import threading
try:
    from Queue import Queue
except ImportError:
    from queue import Queue

class Parallel:
    def __init__(self, pkgs, common=None, nthreads=1):
        self.nthreads = nthreads
        self.threads = []
        self.queue = Queue()
        self.results = Queue()
        self.common = common
        for pkg in pkgs:
            self.queue.put(pkg)

    def parallel_process(self, pkg, common):
        pass

    def worker(self):
        while not self.queue.empty():
            pkg = self.queue.get()
            self.results.put(self.parallel_process(pkg, self.common))
            self.queue.task_done()
        return

    def start(self):
        for i in range(self.nthreads):
            t = threading.Thread(target=self.worker)
            t.daemon = False
            t.start()
            self.threads.append(t)

    def wait_for_threads(self):
        print('Waiting on queue to empty...')
        self.queue.join()
        print('Queue processed. Joining threads...')
        for t in self.threads:
            t.join()
            print('...Thread joined.')

    def get_results(self):
        results = []
        print('Obtaining results...')
        while not self.results.empty():
            results.append(self.results.get())
        return results
I use it to create a parallel task:
class myParallel(Parallel):  # return the square of the numbers in a list
    def parallel_process(self, pkg, common):
        return pkg**2

p = myParallel(range(50), nthreads=4)
p.start()
p.wait_for_threads()
r = p.get_results()
print('FINISHED')
However, the threads do not all join every time the code is run. Sometimes only 2 join, sometimes none do. I do not think I am blocking the threads from finishing. What reason could there be for join() to not return here?
This statement may lead to errors:
while not self.queue.empty():
    pkg = self.queue.get()
With multiple threads pulling items from the queue, there's no guarantee that self.queue.get() will return a valid item, even if you check whether the queue is empty beforehand. Here is a possible scenario:
Thread 1 checks the queue, finds it is not empty, and control proceeds into the while loop.
Control passes to Thread 2, which also checks the queue, finds it is not empty, and enters the while loop. Thread 2 gets an item from the queue. The queue is now empty.
Control passes back to Thread 1; it calls get() on the now-empty queue. Because get() blocks by default, the thread hangs there forever (with get_nowait() an Empty exception would be raised instead).
You should just use a try/except with get_nowait() to pull items from the queue:
try:
    pkg = self.queue.get_nowait()
except Empty:
    pass
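Put together, the worker loop might look like the following sketch (assuming Python 3's queue.Empty and that a drained queue means the thread should exit):
from queue import Empty

def worker(self):
    while True:
        try:
            pkg = self.queue.get_nowait()
        except Empty:
            break                      # queue drained: let this thread finish
        self.results.put(self.parallel_process(pkg, self.common))
        self.queue.task_done()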
@Brendan Abel identified the cause. I'd like to suggest a different solution: queue.join() is usually a Bad Idea too. Instead, create a unique value to use as a sentinel:
class Parallel:
    _sentinel = object()
At the end of __init__(), add one sentinel to the queue for each thread:
for i in range(nthreads):
    self.queue.put(self._sentinel)
Change the start of worker() like so:
while True:
    pkg = self.queue.get()
    if pkg is self._sentinel:
        break
By the construction of the queue, it won't be empty until each thread has seen its sentinel value, so there's no need to mess with the unreliable queue.empty() check.
Also remove the queue.join() and queue.task_done() cruft.
This will give you reliable code that's easy to modify for fancier scenarios. For example, if you want to add more work items while the threads are running, fine - just write another method to say "I'm done adding work items now", and move the loop adding sentinels into that.
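Combined, a sketch of the reworked worker() could look like this (the rest of the Parallel class is assumed unchanged apart from the sentinel additions above):
def worker(self):
    while True:
        pkg = self.queue.get()         # blocks until an item or a sentinel arrives
        if pkg is self._sentinel:
            break                      # each thread consumes exactly one sentinel
        self.results.put(self.parallel_process(pkg, self.common))
With this shutdown protocol, wait_for_threads() only needs to join the threads; the queue.join() call and the task_done() bookkeeping can be dropped.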

Self-joining thread pool: where's my race condition?

Since I use a similar pattern in my work a lot, I decided to write a class that abstracts very simple worker concurrency via job queue / threading. I know there are already things out there that solve this, but I also wanted to use this as an opportunity to hone my multithreading skills.
The main challenge I've given myself is that I want this to be able to let processes finish, even if they are not explicitly blocked by Queue.join(). "A process finishing" is defined by the input function returning a value (or None). The way I have attempted to accomplish this is by having each job create its own results queue rq, which is then checked by _wait_for_results in a non-daemon thread; that thread blocks the automatic exit of all the other daemonized threads until rq is filled by the worker in add_to_queue.
Here is the full class:
from queue import Queue
from threading import Thread

class EasyPool(object):
    def __init__(self, concurrency, always_finish=True):
        def add_to_queue(q):
            while True:
                func_data, rq = q.get()
                func, args, kwargs = func_data
                if not args:
                    args = []
                if not kwargs:
                    kwargs = {}
                result = func(*args, **kwargs)
                rq.put(result)
                q.task_done()
        self.rqs = []
        self.always_finish = always_finish
        self.q = Queue(maxsize=0)
        self.workers = []
        for i in range(concurrency):
            worker = Thread(target=add_to_queue, args=(self.q,))
            self.workers.append(worker)
            worker.setDaemon(True)
            worker.start()

    def _wait_for_results(self, rq):
        rq.not_empty.acquire()
        rq.not_empty.wait()
        rq.not_empty.notify()
        rq.not_empty.release()

    def add_job(self, func, *args, **kwargs):
        rq = Queue()
        if self.always_finish:
            blocker = Thread(target=self._wait_for_results, args=(rq,))
            blocker.setDaemon(False)
            blocker.start()
        to_add = []
        [ to_add.append(i) if i else to_add.append(None) for i in [func, args, kwargs] ]
        self.q.put((to_add, rq))
        return rq.get
When a job is created via the .add_job instance method, it immediately returns a promise-like object, which is a reference to the .get method of the results queue. The problem I'm facing is that there seems to be a race condition between this .get and the _wait_for_results method. I think the answer probably involves a Lock or a Condition, but I'm not really sure. Any help is much appreciated :)

Is there a way to use asyncio.Queue in multiple threads?

Let's assume I have the following code:
import asyncio
import threading

queue = asyncio.Queue()

def threaded():
    import time
    while True:
        time.sleep(2)
        queue.put_nowait(time.time())
        print(queue.qsize())

@asyncio.coroutine
def async():
    while True:
        time = yield from queue.get()
        print(time)

loop = asyncio.get_event_loop()
asyncio.Task(async())
threading.Thread(target=threaded).start()
loop.run_forever()
The problem with this code is that the loop inside the async coroutine never finishes its first iteration, while the queue size keeps increasing.
Why is this happening, and what can I do to fix it?
I can't get rid of the separate thread, because in my real code I use a separate thread to communicate with a serial device, and I haven't found a way to do that using asyncio.
asyncio.Queue is not thread-safe, so you can't use it directly from more than one thread. Instead, you can use janus, which is a third-party library that provides a thread-aware asyncio queue.
import asyncio
import threading
import janus

def threaded(squeue):
    import time
    while True:
        time.sleep(2)
        squeue.put_nowait(time.time())
        print(squeue.qsize())

@asyncio.coroutine
def async_func(aqueue):
    while True:
        time = yield from aqueue.get()
        print(time)

loop = asyncio.get_event_loop()
queue = janus.Queue(loop=loop)
loop.create_task(async_func(queue.async_q))
threading.Thread(target=threaded, args=(queue.sync_q,)).start()
loop.run_forever()
There is also aioprocessing (full-disclosure: I wrote it), which provides process-safe (and as a side-effect, thread-safe) queues as well, but that's overkill if you're not trying to use multiprocessing.
Edit
As pointed out in other answers, for simple use cases you can use loop.call_soon_threadsafe to add to the queue as well.
If you do not want to use another library, you can schedule a coroutine from the thread. Replacing queue.put_nowait with the following works fine:
asyncio.run_coroutine_threadsafe(queue.put(time.time()), loop)
The variable loop represents the event loop in the main thread.
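Applied to the question's code, a minimal self-contained sketch might look like the following (passing the loop into the thread explicitly and the async/await syntax are my choices, not part of the original answer):
import asyncio
import threading
import time

queue = asyncio.Queue()

def threaded(loop):
    while True:
        time.sleep(2)
        # schedule queue.put() onto the event loop from this worker thread
        asyncio.run_coroutine_threadsafe(queue.put(time.time()), loop)

async def consumer():
    while True:
        value = await queue.get()
        print(value)

loop = asyncio.get_event_loop()
loop.create_task(consumer())
threading.Thread(target=threaded, args=(loop,), daemon=True).start()
loop.run_forever()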
EDIT:
The reason why your async coroutine is not doing anything is that the event loop never gives it a chance to do so. The queue object is not threadsafe, and if you dig through the cpython code you will find that this means put_nowait wakes up consumers of the queue through the use of a future with the call_soon method of the event loop. If we could make it use call_soon_threadsafe, it should work. The major difference between call_soon and call_soon_threadsafe, however, is that call_soon_threadsafe wakes up the event loop by calling loop._write_to_self(). So let's call it ourselves:
import asyncio
import threading

queue = asyncio.Queue()

def threaded():
    import time
    while True:
        time.sleep(2)
        queue.put_nowait(time.time())
        queue._loop._write_to_self()
        print(queue.qsize())

@asyncio.coroutine
def async():
    while True:
        time = yield from queue.get()
        print(time)

loop = asyncio.get_event_loop()
asyncio.Task(async())
threading.Thread(target=threaded).start()
loop.run_forever()
Then, everything works as expected.
As for the thread-safety of accessing shared objects, asyncio.Queue uses collections.deque under the hood, which has threadsafe append and popleft. Maybe checking for the queue not being empty and then calling popleft is not atomic, but if you consume the queue only in one thread (the one of the event loop) it could be fine.
The other proposed solutions, loop.call_soon_threadsafe from Huazuo Gao's answer and my asyncio.run_coroutine_threadsafe, are just doing this: waking up the event loop.
BaseEventLoop.call_soon_threadsafe is at hand. See the asyncio docs for details.
Simply change your threaded() like this:
def threaded():
    import time
    while True:
        time.sleep(1)
        loop.call_soon_threadsafe(queue.put_nowait, time.time())
        loop.call_soon_threadsafe(lambda: print(queue.qsize()))
Here's a sample output:
0
1443857763.3355968
0
1443857764.3368602
0
1443857765.338082
0
1443857766.3392274
0
1443857767.3403943
What about just using threading.Lock with asyncio.Queue?
import asyncio
import queue
import threading

class ThreadSafeAsyncFuture(asyncio.Future):
    """ asyncio.Future is not thread-safe
    https://stackoverflow.com/questions/33000200/asyncio-wait-for-event-from-other-thread
    """
    def set_result(self, result):
        func = super().set_result
        call = lambda: func(result)
        self._loop.call_soon_threadsafe(call)  # Warning: self._loop is undocumented


class ThreadSafeAsyncQueue(queue.Queue):
    """ asyncio.Queue is not thread-safe, threading.Queue is not awaitable
    works only with one putter to an unlimited-size queue and with several getters
    TODO: add maxsize limits
    TODO: make put a coroutine
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lock = threading.Lock()
        self.loop = asyncio.get_event_loop()
        self.waiters = []

    def put(self, item):
        with self.lock:
            if self.waiters:
                self.waiters.pop(0).set_result(item)
            else:
                super().put(item)

    async def get(self):
        with self.lock:
            if not self.empty():
                return super().get()
            else:
                fut = ThreadSafeAsyncFuture()
                self.waiters.append(fut)
        result = await fut
        return result
See also - asyncio: Wait for event from other thread
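Here is a possible usage sketch for the class above; the producer thread and consumer coroutine are illustrative assumptions of mine, and the imports from the class definition are reused:
import time

async def consumer(q):
    while True:
        item = await q.get()           # awaits a ThreadSafeAsyncFuture when the queue is empty
        print('got', item)

def producer(q):
    for i in range(5):
        time.sleep(1)
        q.put(i)                       # called from a plain thread; put() is guarded by the lock

loop = asyncio.get_event_loop()
tsq = ThreadSafeAsyncQueue()
threading.Thread(target=producer, args=(tsq,), daemon=True).start()
loop.create_task(consumer(tsq))
loop.run_forever()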

Multiple consumers, is it possible to clone a queue (gevent)?

I'd like to do something like this (one queue, and multiple consumers):
import gevent
from gevent import queue

q = queue.Queue()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)

def consumer(qq):
    for i in qq:
        print i

jobs = [gevent.spawn(consumer, i) for i in [q, q]]
gevent.joinall(jobs)
But it's not possible: the queue is consumed by job1, so job2 would block forever.
It gives me the exception gevent.hub.LoopExit: This operation would block forever.
I would like each consumer to be able to consume the full queue from the start (it should display 1,2,3,1,2,3 or 1,1,2,2,3,3 ... it doesn't matter).
One idea would be to clone the queue before spawning, but that's not possible using the copy (shallow/deep) module ;-(
Is there another way to do that?
[EDIT]
What do you think of this?
import gevent
from gevent import queue

class MasterQueueClonable(queue.Queue):
    def __init__(self, *a, **k):
        queue.Queue.__init__(self, *a, **k)
        self.__cloned = []
        self.__old = []

    # override
    def get(self, *a, **k):
        e = queue.Queue.get(self, *a, **k)
        for i in self.__cloned: i.put(e)   # serve the current clones
        self.__old.append(e)               # save the old element
        return e

    def clone(self):
        q = queue.Queue()
        for i in self.__old: q.put(i)      # feed the queue with elements which are already out
        self.__cloned.append(q)            # keep the queue, to be able to put newer elements too
        return q

q = MasterQueueClonable()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)

def consumer(qq):
    for i in qq:
        print id(qq), i

jobs = [gevent.spawn(consumer, i) for i in [q.clone(), q, q.clone(), q.clone()]]
gevent.joinall(jobs)
It's based on RyanYe's idea. There is a "master queue" without a dispatcher.
My master queue overrides the get method and can dispatch to an on-demand clone.
Moreover, a "clone" can be created after the master queue has started (with the __old trick).
I suggest you create a greenlet to dispatch the work to the consumers. Example code:
import gevent
from gevent import queue

master_queue = queue.Queue()
master_queue.put(1)
master_queue.put(2)
master_queue.put(3)
master_queue.put(StopIteration)

total_consumers = 10
consumer_queues = [queue.Queue() for i in xrange(total_consumers)]

def dispatcher(master_queue, consumer_queues):
    for i in master_queue:
        [j.put(i) for j in consumer_queues]
    [j.put(StopIteration) for j in consumer_queues]

def consumer(qq):
    for i in qq:
        print i

jobs = [gevent.spawn(dispatcher, master_queue, consumer_queues)] + [gevent.spawn(consumer, i) for i in consumer_queues]
gevent.joinall(jobs)
UPDATE: Fix missing StopIteration for consumer queues. Thanks arilou for pointing it out.
I've added a copy() method to the Queue class:
>>> import gevent.queue
>>> q = gevent.queue.Queue()
>>> q.put(5)
>>> q.copy().get()
5
>>> q
<Queue at 0x1062760d0 queue=deque([5])>
Let me know if it helps.
In the answer by Ryan Ye, one line is missing at the end of the dispatcher() function:
[j.put(StopIteration) for j in consumer_queues]
Without it we still get 'gevent.hub.LoopExit: This operation would block forever', since the 'for i in master_queue' loop doesn't copy the StopIteration sentinel into the consumer_queues.
(Sorry, I can't leave comments yet, so I'm writing this as a separate answer.)
