Get all items from thread Queue - python

I have one thread that writes results into a Queue.
In another thread (GUI), I periodically (in the IDLE event) check if there are results in the queue, like this:
from queue import Empty

def queue_get_all(q):
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except Empty:
            break
    return items

Is this a good way to do it?
Edit:
I'm asking because sometimes the waiting thread gets stuck for a few seconds without taking out new results.
The "stuck" problem turned out to be because I was doing the processing in the idle event handler, without making sure that such events are actually generated by calling wx.WakeUpIdle, as is recommended.

If you're always pulling all available items off the queue, is there any real point in using a queue, rather than just a list with a lock? I.e.:

import threading

class ItemStore(object):
    def __init__(self):
        self.lock = threading.Lock()
        self.items = []

    def add(self, item):
        with self.lock:
            self.items.append(item)

    def getAll(self):
        with self.lock:
            items, self.items = self.items, []
            return items
If you're also pulling them individually, and making use of the blocking behaviour for empty queues, then you should use Queue, but your use case looks much simpler, and might be better served by the above approach.
[Edit2] I'd missed the fact that you're polling the queue from an idle loop, and from your update I see that the problem isn't related to contention, so the approach below isn't really relevant to your problem. I've left it in, in case anyone finds a blocking variant of this useful:
For cases where you do want to block until you get at least one result, you can modify the above code to wait for data to become available through being signalled by the producer thread. E.g.:

class ItemStore(object):
    def __init__(self):
        self.cond = threading.Condition()
        self.items = []

    def add(self, item):
        with self.cond:
            self.items.append(item)
            self.cond.notify()  # Wake 1 thread waiting on cond (if any)

    def getAll(self, blocking=False):
        with self.cond:
            # If blocking is true, always return at least 1 item
            while blocking and len(self.items) == 0:
                self.cond.wait()
            items, self.items = self.items, []
            return items
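For illustration, a small usage sketch of the blocking variant (the producer function is made up for the example): the consumer sleeps inside cond.wait() instead of spinning, and wakes as soon as the producer adds an item.

import threading
import time

store = ItemStore()

def producer():
    for i in range(5):
        time.sleep(0.1)
        store.add(i)

threading.Thread(target=producer).start()

received = 0
while received < 5:
    batch = store.getAll(blocking=True)  # blocks until at least one item
    received += len(batch)
    print(batch)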

I think the easiest way of getting all items out of the queue is the following:

def get_all_queue_result(queue):
    result_list = []
    while not queue.empty():
        result_list.append(queue.get())
    return result_list

Note that empty() only reflects the state of the queue at the moment it is called; with more than one consumer, an item could be taken between the empty() check and the get(), so this is only safe when a single thread drains the queue.

I'd be very surprised if the get_nowait() call caused the pause by not returning if the list was empty.
Could it be that you're posting a large number of (maybe big?) items between checks, which means the receiving thread has a large amount of data to pull out of the Queue? You could try limiting the number you retrieve in one batch:

def queue_get_all(q):
    items = []
    maxItemsToRetrieve = 10
    for _ in range(maxItemsToRetrieve):
        try:
            items.append(q.get_nowait())
        except Empty:
            break
    return items

This would limit the receiving thread to pulling up to 10 items at a time.

The simplest method is using a list comprehension:

items = [q.get() for _ in range(q.qsize())]

Use of the range function is generally frowned upon, but I haven't found a simpler method yet. Note that qsize() is only an approximation: if another consumer drains the queue concurrently, some of the get() calls may block.

If you're done writing to the queue, qsize should do the trick without needing to check the queue on each iteration:

responseList = []
for _ in range(q.qsize()):
    responseList.append(q.get_nowait())

I see you are using get_nowait(), which, according to the documentation, "return[s] an item if one is immediately available, else raise[s] the Empty exception".
Now, you happen to break out of the loop when an Empty exception is thrown. Thus, if there is no result immediately available in the queue, your function returns an empty items list.
Is there a reason why you are not using the get() method instead? It may be the case that the get_nowait() fails because the queue is servicing a put() request at that same moment.

Related

Fill a Queue with Objects from several data loaders using multiprocessing

I'm working on a machine learning input pipeline. I wrote a data loader that reads data from a large .hdf file and returns slices, which takes roughly 2 seconds per slice. Therefore I would like to use a queue that takes in objects from several data loaders and can return single objects from the queue via a next function (like a generator). Furthermore, the processes that fill the queue should somehow run in the background, refilling the queue when it is not full. I can't get it to work properly. It "worked" with a single data loader, but gave me the same slices 4 times.
import multiprocessing as mp

class Queue_Generator():
    def __init__(self, data_loader_list):
        self.pool = mp.Pool(4)
        self.data_loader_list = data_loader_list
        self.queue = mp.Queue(maxsize=16)
        self.pool.map(self.fill_queue, self.data_loader_list)

    def fill_queue(self, gen):
        self.queue.put(next(gen))

    def __next__(self):
        yield self.queue.get()
What I get from this:
NotImplementedError: pool objects cannot be passed between processes or pickled
Thanks in advance
Your specific error means that you cannot have a pool as part of your class when you are passing class methods to a pool. What I would suggest could be the following:
import multiprocessing as mp
from queue import Empty

class QueueGenerator(object):
    def __init__(self, data_loader_list):
        self.data_loader_list = data_loader_list
        self.queue = mp.Queue(maxsize=16)

    def __iter__(self):
        processes = list()
        for _ in range(4):
            pr = mp.Process(target=fill_queue, args=(self.queue, self.data_loader_list))
            pr.start()
            processes.append(pr)
        return self

    def __next__(self):
        try:
            # The timeout should have a value, otherwise your loop will never
            # stop. Make it long enough that your processes have time to update
            # the queue, but not so long that your program freezes for an
            # extended period after all information is processed.
            return self.queue.get(timeout=1)
        except Empty:
            raise StopIteration

# have fill_queue as a separate function
def fill_queue(queue, gen):
    while True:
        try:
            value = next(gen)
            queue.put(value)
        except StopIteration:  # assumes the given data_loader_list is an iterator
            break
    print('stopping')

gen = iter(range(70))
qg = QueueGenerator(gen)

for val in qg:
    print(val)

# test if it works several times:
for val in qg:
    print(val)
The next issue for you to solve I think is to have the data_loader_list be something that provides new information in every separate process. But since you have not given any information about that I can't help you with that. The above does however provide you a way to have the processes fill your queue which is then passed out as an iterator.
Not quite sure why you are yielding in __next__, that doesn't look quite right to me. __next__ should return a value, not a generator object.
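To illustrate the difference, here is a tiny made-up iterator written the way __next__ is meant to be used: it returns one value per call and raises StopIteration when exhausted.

class Counter:
    """Counts 1..n via the iterator protocol, returning (not yielding) values."""
    def __init__(self, n):
        self.i, self.n = 0, n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

print(list(Counter(3)))  # [1, 2, 3]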
Here is a simple way that you can return the results of parallel functions as a generator. It may or may not meet your specific requirements but can be tweaked to suit. It will keep on processing data_loader_list until it is exhausted. This may use a lot of memory compared to keeping, for example, 4 items in a Queue at all times.
import multiprocessing as mp

def read_lines(data_loader):
    from time import sleep
    sleep(2)
    return f'did something with {data_loader}'

def make_gen(data_loader_list):
    with mp.Pool(4) as pool:
        for result in pool.imap(read_lines, data_loader_list):
            yield result

if __name__ == '__main__':
    data_loader_list = [i for i in range(15)]
    result_generator = make_gen(data_loader_list)
    print(type(result_generator))
    for i in result_generator:
        print(i)
Using imap means that the results can be processed as they are produced. map and map_async would block in the for loop until all results were ready. See this question for more.

How to wait until a multithread queue is not empty without wasting too much cpu cycles

I want to make a thread wait until a multithread queue is not empty. The queue has only one producer and one consumer. The producer places tasks in the queue when available, but the consumer has to wait until two or more tasks have been gathered. The reason why I don't just use the get method twice in order to retrieve two tasks is that it overcomplicates the flow of the algorithm. That cannot be depicted in the snippet below, though, because obviously it's just an oversimplified example.
I need to know that the queue is not empty so that I can compare the head of the queue (without removing it) with the element I just removed with get.
How it could be done with sleep:

while myQueue.empty():
    sleep(0.05)
How can I do that without using sleep? Should I use event.wait()? If yes, I cannot figure out how I should properly use the event.clear() command, since the thread that I want to make wait is also the consumer, and I cannot be sure whether the queue is empty, even if I use queue.empty() to check.
Essentially, it seems you need to implement the Queue.peek() method, that would return the next element in the queue without actually removing it.
This method is not available in the standard Queue object, but you can inherit and expand it without problems:
from Queue import Queue

class VoyeurQueue(Queue):
    def peek(self, block=True, timeout=None):
        # ...
Now for the contents of new peek() method, you can simply copy-paste the contents of get() method of the base Queue object with some modifications. You can find it at /usr/lib/python?.?/Queue.py if you're on Linux, or %PYTHONPATH%/lib/Queue.py if you're on Windows (not sure about the latter as I'm currently on Linux machine and cannot check). In my copy of Python 2.7, the get() method is implemented as:
def get(self, block=True, timeout=None):
    # ... lots of comments
    self.not_empty.acquire()
    try:
        if not block:
            if not self._qsize():
                raise Empty
        elif timeout is None:
            while not self._qsize():
                self.not_empty.wait()
        elif timeout < 0:
            raise ValueError("'timeout' must be a non-negative number")
        else:
            endtime = _time() + timeout
            while not self._qsize():
                remaining = endtime - _time()
                if remaining <= 0.0:
                    raise Empty
                self.not_empty.wait(remaining)
        item = self._get()
        self.not_full.notify()
        return item
    finally:
        self.not_empty.release()

def _get(self):
    return self.queue.popleft()
Now, for the differences. You don't want to remove the element, so instead of _get() we define the following:

def _peek(self):
    return self.queue[0]
And in the peek() method, we still use the self.not_empty Condition, but we no longer need the self.not_full.notify(). So the resulting code will look like (note that Empty and _time need importing too, just as Queue.py itself does):

from time import time as _time
from Queue import Queue, Empty

class VoyeurQueue(Queue):
    def peek(self, block=True, timeout=None):
        self.not_empty.acquire()
        try:
            if not block:
                if not self._qsize():
                    raise Empty
            elif timeout is None:
                while not self._qsize():
                    self.not_empty.wait()
            elif timeout < 0:
                raise ValueError("'timeout' must be a non-negative number")
            else:
                endtime = _time() + timeout
                while not self._qsize():
                    remaining = endtime - _time()
                    if remaining <= 0.0:
                        raise Empty
                    self.not_empty.wait(remaining)
            item = self._peek()
            return item
        finally:
            self.not_empty.release()

    def _peek(self):
        return self.queue[0]
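For illustration, a quick made-up use: the head of the queue can now be inspected without consuming it, e.g. to compare it with the element just removed.

q = VoyeurQueue()
q.put('a')
q.put('b')
item = q.get()   # removes and returns 'a'
head = q.peek()  # returns 'b' without removing it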
You can use a semaphore, initialized to zero, in parallel with the queue. Say, for example, mySemaphore = threading.Semaphore(0). By default a thread calling mySemaphore.acquire() will be blocked, since the semaphore is zero, without touching the queue. Then when you put something in the queue, you call mySemaphore.release(), which allows one waiting thread to proceed (until the next acquire(), presumably).
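A minimal sketch of that idea, assuming one producer and one consumer (the names are illustrative):

import queue
import threading

my_queue = queue.Queue()
my_semaphore = threading.Semaphore(0)  # starts at zero, so acquire() blocks

def producer():
    for i in range(5):
        my_queue.put(i)
        my_semaphore.release()  # allow one waiting consumer to proceed

def consumer():
    for _ in range(5):
        my_semaphore.acquire()  # blocks while the count is zero
        print(my_queue.get())

threading.Thread(target=consumer).start()
threading.Thread(target=producer).start()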
Just myQueue.get(block=True) will block your thread (stop its execution) until there is something to retrieve from the queue. When an item is available in the queue, it will be returned by this call. You can add a timeout in case you want to exit if the queue is never fed.
See https://docs.python.org/3/library/queue.html#queue.Queue.get.
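For example (the 2-second timeout is arbitrary):

import queue

q = queue.Queue()
try:
    item = q.get(block=True, timeout=2.0)  # wait up to 2 seconds
except queue.Empty:
    item = None  # the queue was never fed in time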
I want to make a thread wait until a multithread queue is not empty.
I want to avoid retrieving the next object, that's why I am not using the get method
If you don't mind using a sentinel object (I use one named Done to tell my consumer thread we're done, so it can wrap up):

Start = object()  # sentinel object at global scope

In the producer:

queue.put(Start)

and in the worker:

item = queue.get()  # blocks until something is received
if item is Start:
    print('we have now started!')
I'm not sure why you'd do that though, but this does seem to do what you want it to do.

Is there a "single slot" queue?

I need to use a queue which holds only one element, any new element discarding the existing one. Is there a built-in solution?
The solution I coded works but I strive not to reinvent the wheel :)
import Queue

def myput(q, what):
    # empty the queue
    while not q.empty():
        q.get()
    q.put(what)

q = Queue.Queue()
print("queue size: {}".format(q.qsize()))
myput(q, "hello")
myput(q, "hello")
myput(q, "hello")
print("queue size: {}".format(q.qsize()))
EDIT: following some comments & answers -- I know that a variable is just for that :) In my program, though, queues will be used to communicate between processes.
As you specify you are using queues to communicate between processes, you should use the multiprocessing.Queue.
In order to ensure there is only one item in the queue at once, you can have the producers sharing a lock and, whilst locked, first get_nowait from the queue before put. This is similar to the loop you have in your code, but without the race condition of two producers both emptying the queue before putting their new item, and therefore ending up with two items in the queue.
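A sketch of that locked put, assuming the queue and lock are created in the parent and shared with the producers (put_latest is an illustrative name, and note that mp.Queue feeds items via a background thread, hence the small sleep in the demo):

import multiprocessing as mp
import time
from queue import Empty

def put_latest(q, lock, item):
    """Replace whatever is in the queue with `item` (illustrative helper)."""
    with lock:              # serialise producers so they can't race here
        try:
            q.get_nowait()  # discard the existing item, if any
        except Empty:
            pass
        q.put(item)

if __name__ == '__main__':
    q = mp.Queue()
    lock = mp.Lock()
    put_latest(q, lock, "hello")
    time.sleep(0.1)  # give the feeder thread a moment to flush the put
    put_latest(q, lock, "world")
    print(q.get())   # -> "world"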
Although the OP is regarding inter-process communication, I came across a situation where I needed a queue with a single element (such that old elements are discarded when a new element is appended) set up between two threads (producer/consumer).
The following code illustrates the solution I came up with using a collections.deque as was mentioned in the comments:
import collections
import _thread
import time

def main():
    def producer(q):
        i = 0
        while True:
            q.append(i)
            i += 1
            time.sleep(0.75)

    def consumer(q):
        while True:
            try:
                v = q.popleft()
                print(v)
            except IndexError:
                print("nothing to pop...queue is empty")
                time.sleep(1)

    deq = collections.deque(maxlen=1)
    print("starting")
    _thread.start_new_thread(producer, (deq,))
    _thread.start_new_thread(consumer, (deq,))
    while True:
        time.sleep(1)  # keep the main thread alive; _thread threads die when it exits

if __name__ == "__main__":
    main()
In the code above, since the producer is faster than the consumer (sleeps less), some of the elements will not be processed.
Notes (from the documentation):
"Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction."
"Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end."
Warning: The code never stops :)

Execute failed threads again

So I have a script that uses about 50k threads, but only runs 10 at a time. I use the threading library for this and BoundedSemaphore to limit the threads to 10 at a time. In some cases there is not enough memory for all threads, but it is important that all threads get processed, so I would like to repeat those threads that got killed because of insufficient memory.
import some_other_script, threading

class myThread(threading.Thread):
    def __init__(self, item):
        threading.Thread.__init__(self)
        self.item = item

    def run(self):
        threadLimiter.acquire()
        some_other_script.method(self.item)
        somelist.remove(self.item)
        threadLimiter.release()

threadLimiter = threading.BoundedSemaphore(10)
somelist = ['50,000 Items', '...']

for item in somelist:
    myThread(item).start()
As you can see, the only idea I could come up with so far was to delete the item that got processed from the list within every thread with somelist.remove(self.item). (Each item is unique and only present once within the list.)
My idea was to run a while loop around the for loop to check if the list still contains items. That did not work, because after the for loop finishes the threads are not yet finished, and so the list isn't empty.
What I want to do is to catch those threads that fail because the system runs out of memory and execute them again (and again if need be).
Thank you very much in advance!
This solves both the too-many-active-threads problem and the problem in your question:

def get_items():
    threads = threading.enumerate()
    items = set()
    for thr in threads:
        if isinstance(thr, myThread):
            items.add(thr.item)
    return items

def manageThreads(howmany):
    while bigset:
        items = get_items()                      # items currently being processed
        items_to_add = bigset.difference(items)  # items not yet picked up
        # top up to `howmany` running threads (guarding against an empty set)
        while items_to_add and len(items) < howmany:
            item = items_to_add.pop()
            items.add(item)
            processor = myThread(item)
            processor.start()
        with thread_done:
            thread_done.wait()

thread_done = threading.Condition()
bigset = set(["50,000 items", "..."])
manageThreads(10)
The myThread class run method:

    def run(self):
        try:
            some_other_script.method(self.item)
            bigset.remove(self.item)
        finally:
            with thread_done:
                thread_done.notify()
threading.enumerate() returns a list of currently active thread objects. So the manageThreads function initially creates 10 threads, then waits for one to finish, then checks the thread count again, and so on. If a thread runs out of memory or another error occurs during processing, it won't remove the item from bigset, causing it to be requeued by the manager onto a different thread.

check a list constantly and do something if list has items

I have a global list where items are added constantly (from network clients):
mylist = []

def additem(uuid, work):
    mylist.append((uuid, work))
And a function which should check the list and if there are items proceed them:
def proceeditems():
    while True:
        itemdone = []
        if len(mylist) > 0:
            for item in mylist:
                try:
                    # This can go wrong and then it needs to be done again
                    result = gevent.spawn(somework(item))
                    # result returns the uuid
                    itemdone.append(result.value)
                except:
                    pass
        for item in itemdone:
            mylist[:] = [value for value in mylist if value[0] != item]
So I hope you get an idea of what I'm trying to do, but I think the endless loop is not the right solution.
In this kind of case, you must be using either multithreading or multiprocessing (depending on whether the network client is running in a different thread or a different process).
In either case, you should use a Queue to manage the incoming data, then store the results into itemdone afterwards.
You define the queue like this:

my_queue = queue.Queue()  # or multiprocessing.Queue()

Then later you should include the queue in the arguments (or, if you use threading, you can use a global queue, like you did):

def additem(uuid, work, the_queue):
    the_queue.put((uuid, work))  # queue a tuple containing the data

def proceeditems(the_queue):
    while True:
        item = the_queue.get()  # this will block until something is inside the queue
        try:
            result = somework(item)
            itemdone.append(result)
        except:
            the_queue.put(item)  # if it failed, put it back in the queue for retry
        # You don't need the last two lines of your original loop
To stop the whole process, you can make the additem function insert a special token, and proceeditems, upon receiving the special token, will just quit the loop.
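A sketch of that stop mechanism (STOP is an illustrative name):

STOP = object()  # special token

def stopitems(the_queue):
    the_queue.put(STOP)

def proceeditems(the_queue):
    while True:
        item = the_queue.get()
        if item is STOP:  # upon receiving the token, quit the loop
            break
        # ... process item as above ...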
