I have several threads doing several tasks. One thread listens to data from a UDP socket. Another thread processes the data into JSON objects and queues the data. Another thread sends the information.
What I'm trying to work out at the moment is how a queue behaves when it is empty. Most routines do something like:
while not q.empty():
    object = q.get()
I need to find out how to process the queue in a loop whenever there is data on it.
I guess I could put in a while True loop with sleep(1), but the trouble is that data may hit the queue faster than the sleep interval. If I take the sleep(1) out, my understanding is that the while loop will just eat CPU.
So I guess I need some way of telling the thread that there's data on the queue, so that it runs a processing routine?
You can use a condition variable, as used in classical synchronization problems, to avoid busy waiting.
You can read more about the "producer-consumer" problem.
In this case, the producer threads add one or more items to the queue, and a consumer is a thread waiting for an item in the queue.
Look at the threading.Condition class for usage and more details.
Ref: https://docs.python.org/3/library/threading.html#threading.Condition
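As a rough illustration of the condition-variable approach, here is a minimal sketch. The names produce, consume, demo and the shared deque are made up for this example and are not part of any library; only threading.Condition and collections.deque are real.

```python
import threading
from collections import deque

items = deque()                 # shared buffer, only touched while holding `cond`
cond = threading.Condition()

def produce(item):
    """Add one item and wake a single waiting consumer."""
    with cond:
        items.append(item)
        cond.notify()

def consume():
    """Sleep (no spinning) until an item is available, then return it."""
    with cond:
        while not items:        # re-check the condition after every wakeup
            cond.wait()         # releases the lock while waiting
        return items.popleft()

def demo():
    """Hypothetical driver: one consumer thread receives three items."""
    results = []

    def consumer():
        for _ in range(3):
            results.append(consume())

    t = threading.Thread(target=consumer)
    t.start()
    for n in range(3):
        produce(n)
    t.join()
    return results
```

The consumer uses no CPU while it waits, and it wakes as soon as produce() calls notify(), so there is no sleep interval to tune.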
Edit:
If you use the Queue class from the queue module, you don't have to implement the nitty-gritty of lock and condition management yourself; Queue handles it internally.
Ref: https://docs.python.org/3.8/library/queue.html
import queue
import threading

# do_work, source and num_worker_threads are placeholders to be supplied by the caller.
def worker():
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        do_work(item)
        q.task_done()

q = queue.Queue()
threads = []
for i in range(num_worker_threads):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

for item in source():
    q.put(item)

# block until all tasks are done
q.join()

# stop workers
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
In the above example, each worker thread blocks in q.get() until an item is available. When an item is put on the queue, the Queue object wakes one of the waiting threads, which then gets the item, so there is no busy waiting and no need for sleep().
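Applied to the three-thread layout from the question (listen, process to JSON, send), a blocking get() on each queue could look roughly like the sketch below. This is only an illustration under assumptions: the packets iterable stands in for the real UDP receive loop, appending to a list stands in for the real send call, and the function names and the "payload" key are invented for the example.

```python
import json
import queue
import threading

STOP = None  # sentinel meaning "no more data"

def listener(packets, raw_q):
    # Stand-in for the UDP receive loop: `packets` is a hypothetical iterable of bytes.
    for packet in packets:
        raw_q.put(packet)
    raw_q.put(STOP)

def processor(raw_q, out_q):
    while True:
        packet = raw_q.get()            # blocks while the queue is empty
        if packet is STOP:
            out_q.put(STOP)             # pass the shutdown signal downstream
            break
        out_q.put(json.dumps({"payload": packet.decode("utf-8")}))

def sender(out_q, sent):
    while True:
        message = out_q.get()           # blocks while the queue is empty
        if message is STOP:
            break
        sent.append(message)            # stand-in for the real send call

def run_pipeline(packets):
    raw_q, out_q, sent = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=listener, args=(packets, raw_q)),
        threading.Thread(target=processor, args=(raw_q, out_q)),
        threading.Thread(target=sender, args=(out_q, sent)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

Neither the processor nor the sender polls or sleeps; each one simply waits inside get() until the upstream thread puts something on its queue.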
Related
My understanding of how a ThreadPoolExecutor works is that when I call #submit, tasks are assigned to threads until all available threads are busy, at which point the executor puts the tasks in a queue awaiting a thread becoming available.
The behavior I want is to block when there is not a thread available, to wait until one becomes available and then only submit my task.
The background is that my tasks are coming from a queue, and I only want to pull messages off my queue when there are threads available to work on these messages.
In an ideal world, I'd be able to provide an option to #submit to tell it to block if a thread is not available, rather than putting them in a queue.
However, that option does not exist. So what I'm looking at is something like:
with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as executor:
    while True:
        wait_for_available_thread(executor)
        message = pull_from_queue()
        executor.submit(do_work_for_message, message)
And I'm not sure of the cleanest implementation of wait_for_available_thread.
Honestly, I'm surprised this isn't actually in concurrent.futures, as I would have thought the pattern of pulling from a queue and submitting to a thread pool executor would be relatively common.
One approach might be to keep track of your currently running threads via a set of Futures:
import concurrent.futures
import time

active_threads = set()

def pop_future(future):
    # set.pop() takes no argument; discard() removes this specific future
    active_threads.discard(future)

with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as executor:
    while True:
        while len(active_threads) >= CONCURRENCY:
            time.sleep(0.1)  # or whatever
        message = pull_from_queue()
        future = executor.submit(do_work_for_message, message)
        active_threads.add(future)
        future.add_done_callback(pop_future)
A more sophisticated approach might be to have the done_callback be the thing that triggers a queue pull, rather than polling and blocking, but then you need to fall back to polling the queue if the workers manage to get ahead of it.
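Another way to get a blocking submit without polling is to guard the executor with a semaphore that has one slot per worker. The sketch below is an assumption-laden illustration, not something concurrent.futures provides: BlockingExecutor is a hypothetical wrapper class, and only ThreadPoolExecutor, BoundedSemaphore and add_done_callback are real library features.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BlockingExecutor:
    """Hypothetical wrapper: submit() blocks while all worker slots are taken."""

    def __init__(self, max_workers):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_workers)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()               # waits here until a slot is free
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()           # submission failed, give the slot back
            raise
        # Free the slot when the task finishes, whether it succeeded or raised.
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

With this wrapper, the loop from the question would call submit() directly after pulling a message, or acquire a slot before pulling if the message must not leave the queue early. Note that the slot is released when the future completes, which can be a moment before the worker thread is actually idle, so a task may sit in the executor's internal queue very briefly.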
I started looking into creating threads in Python. I did some background reading first to understand how threads work in Python. I also read about the use of Queue in Python and how it can help solve trivial threading problems. I was able to understand the separate code for each. Then I came across the following tutorial:
http://www.ibm.com/developerworks/aix/library/au-threadingpython/
It shows the relevance of Thread and Queue in Python and how it can speed up the execution process under certain circumstances.
I am having difficulty understanding some areas of the code:
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    #wait on the queue until everything has been processed
    queue.join()
main()
print "Elapsed Time: %s" % (time.time() - start)
In the first for loop the multiple threads are created and a queue instance is passed to it. But from my understanding the queue is empty as of now.
In the next for loop
for host in hosts:
The host values are pushed into the queue. Now how is this Queue data assigned to threads?
Lastly, what is the use of queue.join() with relevance to this program?
"In the first for loop the multiple threads are created and a queue instance is passed to it. But from my understanding the queue is empty as of now."
Yes, the threads are started but they have no work to do yet.
for host in hosts:
"The host values are pushed into the queue. Now how is this Queue data assigned to threads?"
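By the Queue instance. Each worker thread calls get() on the shared queue inside its run() method; get() blocks while the queue is empty, and each item that is put on the queue is handed to exactly one of the waiting threads.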
"what is the use of queue.join() with relevance to this program?"
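join() makes your program wait until every item that was put on the queue has been retrieved and marked finished with task_done(). The main thread blocks at this point until the count of unfinished items drops to zero. Because the worker threads are daemon threads, they do not keep the program alive once join() returns.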
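To make the hand-off concrete, here is a small Python 3 sketch of the same structure. The Worker class and the process function are hypothetical stand-ins for the tutorial's ThreadUrl and main; the "work" is just recording which thread received which host.

```python
import queue
import threading

class Worker(threading.Thread):
    """Hypothetical stand-in for the tutorial's ThreadUrl class."""

    def __init__(self, q, results):
        super().__init__(daemon=True)
        self.q = q
        self.results = results

    def run(self):
        while True:
            host = self.q.get()         # blocks until some item is available
            try:
                self.results.append((self.name, host))
            finally:
                self.q.task_done()      # one call per get(); join() counts these

def process(hosts, num_threads=5):
    q, results = queue.Queue(), []
    for _ in range(num_threads):
        Worker(q, results).start()      # threads start and immediately block in get()
    for host in hosts:
        q.put(host)                     # each put wakes one blocked worker
    q.join()                            # returns once every item has been marked done
    return results
```

Calling process(["a.example", "b.example"]) returns a list of (thread name, host) pairs; which thread handles which host is not deterministic, but every host appears exactly once.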
I have a python app that goes like this:
1. main thread is GUI
2. there is a config thread that is started even before GUI
3. config thread starts a few other independent threads
=> How can I let GUI know that all of those "independent threads" (3.) have finished? How do I detect it in my program (just give me general idea)
I know about semaphores but I couldn't figure it out, since this is a little more logically complicated than what I am used to when dealing with threads.
PS: all those threads are QThreads from PyQt, if that is of any importance, but I doubt it.
thanks
The Queue module (named queue in Python 3) is great for communicating between threads without worrying about locks or other mutexes. It features a pair of methods, task_done() and join(), that are used to signal the completion of tasks and to wait for all tasks to be completed. Here's an example from the docs:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done
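If the independent threads are not consuming from a queue, a different and simpler idea is a small watcher thread that joins each of them and then reports back. The sketch below uses plain threading.Thread objects rather than QThreads, and notify_when_done and demo are names invented for this example.

```python
import threading

def notify_when_done(threads, on_done):
    """Start a watcher thread that calls on_done() after every thread has finished."""
    def watch():
        for t in threads:
            t.join()                    # returns immediately if t already finished
        on_done()

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    return watcher

def demo():
    """Hypothetical driver: three trivial workers, completion reported via an Event."""
    all_done = threading.Event()
    workers = [threading.Thread(target=lambda: None) for _ in range(3)]
    for w in workers:
        w.start()
    notify_when_done(workers, all_done.set)
    return all_done.wait(timeout=5)     # True once the watcher has called set()
```

The callback runs in the watcher thread, not in the GUI thread, so in a GUI program it should only hand the news over (for example by setting a flag the GUI checks, or in PyQt by emitting a signal) rather than touch widgets directly.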
A typical producer-consumer problem is solved in Python like below:
from queue import Queue

job_queue = Queue(maxsize=10)

def manager():
    while i_have_some_job_do:
        job = get_data_from_somewhere()
        job_queue.put(job)  # blocks only if queue is currently full

def worker():
    while True:
        data = job_queue.get()  # blocks until data available
        # get things done
But I have a variant of the producer/consumer problem (not one strictly speaking, so let me call it manager-worker):
The manager puts jobs in a Queue, and the worker should keep getting the jobs and doing them. But when the worker gets a job, it does not remove the job from the Queue (unlike Queue.get()). Only the manager is able to remove a job from the Queue.
So how does the worker get a job without removing it from the queue? Maybe a get followed by a put is OK?
How does the manager remove a particular job from the queue?
Perhaps your workers can't remove jobs completely, but consider letting them move jobs from the original queue to a separate "job done" queue. The move itself should be cheap and fast, and the manager can then process the "job done" queue, removing the elements it agrees are done and moving the others back to the worker queue.
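A minimal sketch of this two-queue idea follows. The function names worker_step and manager_review, and the accept callback the manager uses to decide whether a job is really done, are assumptions made for the illustration.

```python
import queue

def worker_step(job_queue, done_queue, do_job):
    """Take one job, run it, and hand it to the manager instead of discarding it."""
    job = job_queue.get()               # blocks until a job is available
    result = do_job(job)
    done_queue.put((job, result))

def manager_review(job_queue, done_queue, accept):
    """Drain done_queue: drop accepted jobs, send the rest back to the workers.

    Returns the number of jobs that were put back on job_queue.
    """
    requeued = 0
    while True:
        try:
            job, result = done_queue.get_nowait()
        except queue.Empty:
            return requeued
        if not accept(job, result):
            job_queue.put(job)          # not accepted: the job goes around again
            requeued += 1
```

In this arrangement a job is only gone for good once the manager has accepted it, which gives the manager the final say the question asks for, while workers still use the ordinary blocking get().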
guys!
My application is a bot. It simply receives a message, processes it and returns the result.
But there are a lot of messages, and I'm creating a separate thread to process each one, which makes the application slower (and not just a bit).
So, is there any way to reduce CPU usage by replacing threads with something else?
You probably want processes rather than threads. Spawn processes at startup, and use Pipes to talk to them.
http://docs.python.org/dev/library/multiprocessing.html
For this problem, threads and processes perform about the same.
Your problem is not which one you use, but how many you use.
The answer is to have only a small, fixed number of threads or processes, say 10.
You then create a Queue (use the Queue module) to store all the messages for your bot.
The 10 threads stay alive: every time one finishes a message, it waits for the next message in the Queue.
This saves you from the overhead of creating and destroying threads.
See http://docs.python.org/library/queue.html for more info.
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done
You could try creating only a limited amount of workers and distribute work between them. Python's multiprocessing.Pool would be the thing to use.
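A pool version could look roughly like this. To keep the sketch runnable anywhere without extra setup it uses multiprocessing.pool.ThreadPool, the thread-backed class that exposes the same map interface as multiprocessing.Pool; handle_message and process_all are hypothetical names, and the upper-casing is just placeholder bot logic.

```python
from multiprocessing.pool import ThreadPool

def handle_message(text):
    # Hypothetical bot logic: replace with the real processing.
    return text.upper()

def process_all(messages, workers=4):
    # The pool creates `workers` threads once and reuses them for every message.
    with ThreadPool(processes=workers) as pool:
        return pool.map(handle_message, messages)   # results keep the input order
```

Switching to multiprocessing.Pool should mostly be a matter of changing the class, provided handle_message is a picklable top-level function and, on platforms that spawn rather than fork, the pool is created under an if __name__ == "__main__": guard.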
You might not even need threads. If your server can handle each request quickly, you can just make it all single-threaded using something like Twisted.