Python threads synchronization

I have a Python app that is structured like this:
1. the main thread is the GUI
2. a config thread is started even before the GUI
3. the config thread starts a few other independent threads

How can I let the GUI know that all of those "independent threads" (3.) have finished? How do I detect that in my program? (Just give me the general idea.)
I know about semaphores, but I couldn't figure it out, since this is logically a little more complicated than what I am used to when dealing with threads.
PS: all those threads are QThreads from PyQt, if that is of any importance, but I doubt it.
thanks

The Queue module is great for communicating between threads without worrying about locks or other synchronization primitives. It features a pair of methods, task_done() and join(), that are used to signal the completion of tasks and to wait for all tasks to be completed. Here's an example from the docs:
from queue import Queue
from threading import Thread

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
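Since the threads in question are QThreads: QThread also emits a finished signal, so another option is to count those signals. A rough sketch follows (workers and on_all_done are hypothetical names; in a real app you would likely connect to a slot on an object living in the GUI thread so the countdown runs there):

remaining = len(workers)        # workers: your list of QThread instances

def on_worker_finished():
    global remaining
    remaining -= 1
    if remaining == 0:
        on_all_done()           # hypothetical: tell the GUI everything is done

for w in workers:
    w.finished.connect(on_worker_finished)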

Related

Python Queues in Multithreading and Continual Processing

I have several threads doing several tasks. One thread listens to data from a UDP socket. Another thread processes the data into JSON objects and queues the data. Another thread sends the information.
What I'm trying to work out at the moment is how a queue behaves when it is empty. Most routines do a

while not q.empty():
    item = q.get()

I need to find out how to process the queue in the while loop as soon as there is data on the queue.
I guess I could put in a while True loop and sleep(1). But the trouble is that data may hit the queue quicker than the sleep time. If I take the sleep(1) out, then my usual understanding is that the while loop will just eat CPU.
So I guess I need some way of telling the thread that there's data on the queue and therefore to run a processing routine?
You can use a condition variable, as in the classical synchronization problems, to avoid busy waiting.
You can read more about the "producer-consumer" problem.
In this case, the producer threads add one or more items to the queue, whereas a consumer is a thread waiting for an item in the queue.
Look at the threading.Condition class for usage and more details.
Ref: https://docs.python.org/3/library/threading.html#threading.Condition
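For illustration, here is a minimal producer-consumer sketch using threading.Condition (process() is a placeholder for whatever you do with an item):

import threading
from collections import deque

buf = deque()
cond = threading.Condition()

def producer(items):
    for item in items:
        with cond:
            buf.append(item)
            cond.notify()       # wake one waiting consumer

def consumer():
    while True:
        with cond:
            while not buf:      # re-check: guards against spurious wakeups
                cond.wait()     # releases the lock while sleeping
            item = buf.popleft()
        process(item)           # placeholder: do the real work outside the lock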
Edit:
If you use the Queue class from the queue module, then you don't have to implement the nitty-gritty of lock and condition management for the queue yourself.
Ref: https://docs.python.org/3.8/library/queue.html
import queue
import threading

def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

q = queue.Queue()
threads = []
for i in range(num_worker_threads):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

for item in source():
    q.put(item)

# block until all tasks are done
q.join()

# stop workers
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
In the above example, the worker threads wait for an item in the queue. When an item becomes available, the Queue object notifies the waiting threads and one of them gets the item from the queue.

Threading cleanup (disposing)

I am referring to this Simple threading event example.
More specifically, this piece of code:
for i in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
    t.start()
From what I understand, this creates 4 threads that will be used. As I am more familiar with C++ and C#, I am wondering about cleanup. Can I just leave these threads open, or is there a proper way of 'closing'/disposing of them? Please do not misread this as wanting to kill the threads; I am just wondering whether there is a proper way of cleaning up once all work is completed.
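For what it's worth, CPython threads release their resources on their own once the target function returns; there is no explicit dispose step. If you want to wait for completion rather than rely on daemon threads dying with the main thread, drop the daemon flag and join() them, roughly like this:

threads = []
for i in range(4):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)
for t in threads:
    t.join()  # blocks until the thread's target returns; nothing is left to clean up after that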

Python multithreading design suggestions

I have a question regarding my current threading design: my current process spawns a new thread and continues the main thread until the termination condition. The process waits until all threads finish before terminating. The issue I am having is that each newly spawned thread needs to check whether the previously spawned thread is done. Should I simply set up a queue and use just one thread to process all the tasks? Or is it possible to spawn a thread, somehow check whether the previous thread is done, and process the task only once that thread is done?
thanks for your help
If all of the threads aside from the initial "main" thread are supposed to run sequentially, then yes, you should use a task queue and a single worker thread.
Queue can help with this (and allow your main thread to .join() on it if it needs to wait for all of the queued tasks to be completed).
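A minimal sketch of that single-worker pattern (the queued callables are stand-ins for your real tasks):

import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        func = tasks.get()
        try:
            func()              # tasks run one at a time, in FIFO order
        finally:
            tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

tasks.put(lambda: print("task 1"))
tasks.put(lambda: print("task 2"))  # guaranteed to run after task 1
tasks.join()                        # main thread blocks until all queued tasks are done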
Look at Gevent.
You can create several Greenlet objects for your several tasks.
Each greenlet is a green thread.

from gevent import monkey
monkey.patch_all()

import gevent
from gevent import Greenlet

class Task(Greenlet):
    def __init__(self, name):
        Greenlet.__init__(self)
        self.name = name

    def _run(self):
        print("Task %s: some task..." % self.name)

t1 = Task("task1")
t2 = Task("task2")
t1.start()
t2.start()
# here we wait for all the tasks to finish
gevent.joinall([t1, t2])

Threads in Python again

guys!
My application is a bot. It simply receives a message, process it and returns result.
But there are a lot of messages, and I'm creating a separate thread to process each one, which makes the application noticeably slower.
So, is there any way to reduce CPU usage by replacing the threads with something else?
You probably want processes rather than threads. Spawn processes at startup, and use Pipes to talk to them.
http://docs.python.org/dev/library/multiprocessing.html
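A minimal sketch of that idea (the upper-casing stands in for your real message processing; the sentinel None tells the worker to stop):

from multiprocessing import Process, Pipe

def worker(conn):
    while True:
        msg = conn.recv()       # blocks until the parent sends something
        if msg is None:         # sentinel: parent wants us to stop
            break
        conn.send(msg.upper())  # stand-in for the real processing

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())   # -> "HELLO"
    parent_conn.send(None)      # tell the worker to stop
    p.join()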
Threads and processes have the same speed.
Your problem is not which one you use, but how many you use.
The answer is to have only a small, fixed number of threads or processes, say 10.
You then create a Queue (use the Queue module) to store all the messages from your bot.
The 10 threads will be working constantly; every time one finishes a message, it waits for a new message from the Queue.
This saves you the overhead of creating and destroying threads.
See http://docs.python.org/library/queue.html for more info.
from queue import Queue
from threading import Thread

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
You could try creating only a limited amount of workers and distribute work between them. Python's multiprocessing.Pool would be the thing to use.
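A minimal sketch of that approach (handle() is a stand-in for your real message processing):

from multiprocessing import Pool

def handle(message):
    return message.upper()      # stand-in for the real processing

if __name__ == "__main__":
    messages = ["a", "b", "c"]
    with Pool(processes=10) as pool:    # a fixed number of worker processes
        results = pool.map(handle, messages)
    print(results)              # -> ['A', 'B', 'C']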
You might not even need threads. If your server can handle each request quickly, you can just make it all single-threaded using something like Twisted.

A thread pool that lets me know when at least 1 has finished?

I need to use a thread pool in Python, and I want to be able to know when at least 1 thread out of the "maximum threads allowed" has finished, so I can start another one if there is still work to do.
I have been using something like this:
import threading

def doSomethingWith(dataforthread):
    global i
    dostuff()
    i = i - 1  # thread has finished

i = 0
poolSize = 5
threads = []
data = ...  # array of data

while len(data):
    while True:
        if i < poolSize:  # if the number of started threads is < poolSize, start a new thread
            dataforthread = data.pop(0)
            i = i + 1
            thread = threading.Thread(target=doSomethingWith, args=(dataforthread,))
            thread.start()
            threads.append(thread)
        else:
            break
    for t in threads:  # wait for ALL threads (I ONLY WANT TO WAIT FOR 1 [any])
        t.join()
As I understand it, my code opens 5 threads and then waits for all of them to finish before starting new threads, until the data is consumed. But what I really want is to start a new thread as soon as one of the threads finishes and the pool has an "available spot" for a new thread.
I have been reading this, but I think it would have the same issue as my code (not sure, I'm new to Python, but looking at joinAll() it seems that way).
Does someone have an example of what I am trying to achieve?
I mean detecting as soon as i is less than poolSize, launching new threads until i == poolSize, and doing that until the data is consumed.
As the article author mentions, and @getekha highlights, thread pools in Python don't accomplish exactly the same thing as they do in other languages. If you need parallelism, you should look into the multiprocessing module. Among other things, it has handy Queue and Pool constructs. Also, there's an accepted PEP for "futures" that you'll probably want to monitor.
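That PEP became concurrent.futures (standard library since Python 3.2), whose wait() with return_when=FIRST_COMPLETED does exactly the "wake me when any one thread finishes" part. A rough sketch, with sleep standing in for the real work:

import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def doSomethingWith(item):
    time.sleep(0.1)             # stand-in for dostuff()
    return item

data = list(range(20))          # stand-in for the real work items
poolSize = 5

with ThreadPoolExecutor(max_workers=poolSize) as pool:
    pending = set()
    while data and len(pending) < poolSize:
        pending.add(pool.submit(doSomethingWith, data.pop(0)))
    while pending:
        # returns as soon as ANY one future finishes
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        while data and len(pending) < poolSize:  # refill the freed slots
            pending.add(pool.submit(doSomethingWith, data.pop(0)))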
The problem is that Python has a Global Interpreter Lock, which must be held to run any Python code. This means that only one thread can execute Python code at any time, so thread pools in Python are not the same as in other languages. This is mainly for arcane reasons known only to a select few (i.e. it's complicated).
If you really want to run code asynchronously, you should spawn new processes; the multiprocessing module has a Pool class which you could look into.
