Join one of many threads in Python

I have a Python program with one main thread and, let's say, 2 other threads (or maybe even more; it probably doesn't matter). I would like to let the main thread sleep until ONE of the other threads is finished. It's easy to do with polling (by calling t.join(1) and waiting one second for every thread t).
Is it possible to do it without polling, just by
SOMETHING_LIKE_JOIN(1, [t1, t2])
where t1 and t2 are threading.Thread objects? The call must sleep for at most 1 second, but wake up as soon as one of t1, t2 is finished. Quite similar to the POSIX select(2) call with two file descriptors.

One solution is to use a multiprocessing.dummy.Pool; multiprocessing.dummy provides an API almost identical to multiprocessing, but backed by threads, so it gets you a thread pool for free.
For example, you can do:
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(2)  # Two workers
for res in pool.imap_unordered(some_func, list_of_func_args):
    # res is whatever some_func returned
    print(res)
multiprocessing.Pool.imap_unordered returns results as they become available, regardless of which task finishes first.
If you can use Python 3.2 or higher (or install the futures backport from PyPI for older Pythons) you can generalize to disparate task functions by creating one or more Futures from a ThreadPoolExecutor, then using concurrent.futures.wait with return_when=FIRST_COMPLETED, or using concurrent.futures.as_completed for a similar effect.
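For instance, here is a minimal sketch of the FIRST_COMPLETED approach; the task function and its sleep times are placeholders of mine, not from the question:

import concurrent.futures
import time

def task(seconds):  # stand-in for whatever t1 and t2 actually do
    time.sleep(seconds)
    return seconds

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(task, s) for s in (0.5, 2, 3)]
    done, pending = concurrent.futures.wait(
        futures, timeout=1,
        return_when=concurrent.futures.FIRST_COMPLETED)
    # Wakes as soon as the first future finishes (after ~0.5 s here),
    # or after the 1-second timeout, whichever comes first; this is
    # essentially the SOMETHING_LIKE_JOIN(1, [t1, t2]) the question asks for.
    print([f.result() for f in done])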

Here is an example using a Condition object.
from threading import Thread, Condition, Lock
from time import sleep
from random import random

_lock = Lock()

def run(idx, condition):
    sleep(random() * 3)
    print('thread_%d is waiting for notifying main thread.' % idx)
    _lock.acquire()
    with condition:
        print('thread_%d notifies main thread.' % idx)
        condition.notify()

def is_working(thread_list):
    for t in thread_list:
        if t.is_alive():
            return True
    return False

def main():
    condition = Condition(Lock())
    thread_list = [Thread(target=run, kwargs={'idx': i, 'condition': condition})
                   for i in range(10)]
    with condition:
        with _lock:
            for t in thread_list:
                t.start()
            while is_working(thread_list):
                _lock.release()
                if condition.wait(timeout=1):
                    print('do something')
                    sleep(1)  # <-- Main thread is doing something.
                else:
                    print('timeout')
    for t in thread_list:
        t.join()

if __name__ == '__main__':
    main()
I don't think there is a race condition as you described in the comment. The Condition object contains a Lock. When the main thread is working (the sleep(1) in the example), it holds the lock, and no thread can notify it until it finishes its work and releases the lock.
Update: I just realized that there was a race condition in the previous example. I added a global _lock to ensure the condition never notifies the main thread until the main thread starts waiting. I don't like how it works, but I haven't figured out a better solution...
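For what it's worth, here is a sketch of an alternative that avoids the race entirely (my own suggestion, not part of the answer above): each worker reports completion on a queue.Queue, and the main thread blocks on get() with a timeout. A queued item is never lost, even if the main thread isn't waiting at the moment a worker finishes:

import queue
import threading
from time import sleep
from random import random

done_q = queue.Queue()

def run(idx):
    sleep(random() * 3)
    done_q.put(idx)  # report completion; safe even if main isn't waiting yet

threads = [threading.Thread(target=run, args=(i,)) for i in range(10)]
for t in threads:
    t.start()

finished = 0
while finished < len(threads):
    try:
        idx = done_q.get(timeout=1)  # wakes as soon as any thread finishes
        print('thread_%d finished; main thread can do something' % idx)
        finished += 1
    except queue.Empty:
        print('timeout')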

You can create a Thread class and have the main thread keep a reference to it. That way you can check whether the thread has finished and make your main thread continue again easily.
If that doesn't help you, I suggest you look at the Queue module!
import threading
import time, random

#THREAD CLASS#
class Thread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.state = False
        #START THREAD (THE RUN METHOD)#
        self.start()

    #THAT IS WHAT THE THREAD ACTUALLY DOES#
    def run(self):
        #THREAD SLEEPS FOR A RANDOM TIME RANGE#
        time.sleep(random.randrange(5, 10))
        #AFTERWARDS IT HAS FINISHED (STORE IN VARIABLE)#
        self.state = True

    #RETURNS THE STATE#
    def getState(self):
        return self.state

#10 SEPARATE THREADS#
threads = []
for i in range(10):
    threads.append(Thread())

#MAIN THREAD#
while True:
    #RUN THROUGH ALL THREADS AND CHECK THEIR STATE#
    for i in range(len(threads)):
        if threads[i].getState():
            print("WAITING IS OVER: THREAD", i)
    #SLEEPS ONE SECOND#
    time.sleep(1)
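If you want to avoid the busy-wait loop above, a variation (my own sketch, not from the answer above) is to have each thread set a shared threading.Event when it finishes, so the main thread sleeps without polling until the first one completes:

import threading
import time, random

first_done = threading.Event()
finished = []  # indices of threads that have finished, in completion order

def work(idx):
    time.sleep(random.randrange(5, 10))
    finished.append(idx)  # list.append is thread-safe enough for this demo
    first_done.set()      # wake the main thread

for i in range(10):
    threading.Thread(target=work, args=(i,), daemon=True).start()

first_done.wait()  # blocks, with no polling, until some thread finishes
print("WAITING IS OVER: THREAD", finished[0])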


Python - Why does ThreadPoolExecutor().submit() with a queue block while Thread().start() doesn't?

Operating system: Windows 10
Python 3.7.6
I have a task function:

def mytest(task_queue):
    while True:
        print(task_queue.get())
I want to run a sub-thread and have it wait for others to put something into task_queue.
If I use concurrent.futures.ThreadPoolExecutor().submit() to start the thread and then put something into the queue, it blocks; task_queue.put(1) never runs.
if __name__ == '__main__':
    import queue
    task_queue = queue.Queue()
    task_queue.put(0)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(mytest, task_queue)
    task_queue.put(1)
    task_queue.put(2)
    # only prints 0, then blocks
If I start the thread with Thread().start(), it works as I expect.
if __name__ == '__main__':
    import queue
    task_queue = queue.Queue()
    task_queue.put(0)
    t1 = threading.Thread(target=mytest, args=(task_queue,))
    t1.start()
    task_queue.put(1)
    task_queue.put(2)
    # prints 0, 1, 2, but the main thread does not exit
But I didn't think either of these methods would block the code, because they just start the thread.
So I have 2 questions:
Why does submit() block the code?
Why does the main thread not exit when start() is used to start the sub-thread without join()?
Thanks
Q-1) Why does submit() block the code?
A-1) No, the submit() method does not block; it schedules the callable mytest to be executed as mytest(task_queue) and returns a Future object. Look at the code below and you will see that the submit() method does not block the main thread:
if __name__ == '__main__':
    import queue
    task_queue = queue.Queue()
    task_queue.put(0)
    executor = ThreadPoolExecutor()
    executor.submit(mytest, task_queue)
    task_queue.put(1)
    task_queue.put(2)
    print("hello")

>> 0
hello
1
2
Or you can do it like this:
if __name__ == '__main__':
    import queue
    task_queue = queue.Queue()
    task_queue.put(0)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(mytest, task_queue)
        task_queue.put(1)
        task_queue.put(2)
You will see that task_queue.put(1) and the others are called immediately.
As the examples above show, the submit() method is not blocking; but when you use concurrent.futures.ThreadPoolExecutor() in a with statement, its __exit__() method is called at the end of the with block. This __exit__() method calls the shutdown(wait=True) method of the ThreadPoolExecutor class. The docs say about shutdown(wait=True):
If wait is True then this method will not return until all the pending
futures are done executing and the resources associated with the
executor have been freed. If wait is False then this method will
return immediately and the resources associated with the executor will
be freed when all pending futures are done executing. Regardless of
the value of wait, the entire Python program will not exit until all
pending futures are done executing.
That's why your main thread blocks at the end of the with statement.
I want to give an answer to your second question as well, but I am confused about whether the main thread exits or not. I will edit this answer later (for the second question).
Thread(...).start() creates a new thread. End of story. You can always create a new thread if there's still some memory left in which to create it. That also answers your second question: a thread created this way is non-daemon by default, and Python will not exit while a non-daemon thread is still running; since mytest loops forever, the process stays alive even though the main thread has finished.
executor.submit(mytest, task_queue) wraps the callable in a work item and puts it on the executor's own internal queue (not your task_queue).
Some time later, when the work item eventually reaches the head of that internal queue, a worker thread will take it and execute it.
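To make the original example terminate, mytest needs some way to leave its loop. A common pattern, sketched below, is to enqueue a sentinel; the None sentinel and the placement of the puts inside the with block are my additions, not part of the original code:

import concurrent.futures
import queue

def mytest(task_queue):
    while True:
        item = task_queue.get()
        if item is None:  # sentinel: stop the worker
            return
        print(item)

if __name__ == '__main__':
    task_queue = queue.Queue()
    task_queue.put(0)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(mytest, task_queue)
        task_queue.put(1)
        task_queue.put(2)
        task_queue.put(None)  # lets mytest return, so shutdown(wait=True) can complete
    # prints 0, 1, 2 and exits cleanly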

How to avoid starting hundreds of threads when launching (very short) actions at different timings in the future

I use this method to launch a few dozen (fewer than a thousand) calls of do_it at different timings in the future:
import threading

timers = []
while True:
    for i in range(20):
        t = threading.Timer(i * 0.010, do_it, [i])  # I pass the parameter i to function do_it
        t.start()
        timers.append(t)  # so that they can be cancelled if needed
    wait_for_something_else()  # this can last from 5 ms to 20 seconds
The runtime of each do_it call is very fast (much less than 0.1 ms) and non-blocking. I would like to avoid spawning hundreds of new threads for such a simple task.
How could I do this with only one additional thread handling all the do_it calls?
Is there a simple way to do this in Python with only the standard library, without third-party libraries?
As I understand it, you want a single worker thread that can process submitted tasks, not in the order they are submitted, but rather in some prioritized order. This seems like a job for the thread-safe queue.PriorityQueue.
from dataclasses import dataclass, field
from threading import Thread
from typing import Any
from queue import PriorityQueue

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)

def thread_worker(q: PriorityQueue[PrioritizedItem]):
    while True:
        do_it(q.get().item)
        q.task_done()

q = PriorityQueue()
t = Thread(target=thread_worker, args=(q,))
t.start()

while True:
    for i in range(20):
        q.put(PrioritizedItem(priority=i * 0.010, item=i))
    wait_for_something_else()
This code assumes you want to run forever. If not, you can add a timeout to the q.get in thread_worker and return when the queue.Empty exception is thrown because the timeout expired. That way you'll be able to join the queue/thread after all the jobs have been processed and the timeout has expired.
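For example, the worker loop with such a timeout might look like this (the 5-second idle timeout is an arbitrary choice of mine):

from queue import Empty

def thread_worker(q: PriorityQueue[PrioritizedItem]):
    while True:
        try:
            task = q.get(timeout=5)  # give up after 5 idle seconds
        except Empty:
            return  # no new work arrived in time; the thread can now be joined
        do_it(task.item)
        q.task_done()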
If you want to wait until some specific time in the future to run the tasks, it gets a bit more complicated. Here's an extension of the approach above that sleeps in the worker thread until the specified time has arrived, but be aware that time.sleep is only as accurate as your OS allows it to be.
from dataclasses import astuple, dataclass, field
from datetime import datetime, timedelta
from time import sleep
from threading import Thread
from typing import Any
from queue import PriorityQueue

@dataclass(order=True)
class TimedItem:
    when: datetime
    item: Any = field(compare=False)

def thread_worker(q: PriorityQueue[TimedItem]):
    while True:
        when, item = astuple(q.get())
        sleep_time = (when - datetime.now()).total_seconds()
        if sleep_time > 0:
            sleep(sleep_time)
        do_it(item)
        q.task_done()

q = PriorityQueue()
t = Thread(target=thread_worker, args=(q,))
t.start()

while True:
    now = datetime.now()
    for i in range(20):
        q.put(TimedItem(when=now + timedelta(seconds=i * 0.010), item=i))
    wait_for_something_else()
To address this problem using only a single extra thread we have to sleep in that thread, so it's possible that new tasks with higher priority could come in while the worker is sleeping. In that case the worker would process that new high priority task after it's done with the current one. The above code assumes that scenario will not happen, which seems reasonable based on the problem description. If that might happen you can alter the sleep code to repeatedly poll if the task at the front of the priority queue has come due. The disadvantage with a polling approach like that is that it would be more CPU intensive.
Also, if you can guarantee that the relative order of the tasks won't change after they've been submitted to the worker, then you can replace the priority queue with a regular queue.Queue to simplify the code somewhat.
These do_it tasks can be cancelled by removing them from the queue.
The above code was tested with the following mock definitions:
def do_it(x):
    print(x)

def wait_for_something_else():
    sleep(5)
An alternative approach that uses no extra threads would be to use asyncio, as pointed out by smcjones. Here's an approach using asyncio that calls do_it at specific times in the future by using loop.call_later:
import asyncio

def do_it(x):
    print(x)

async def wait_for_something_else():
    await asyncio.sleep(5)

async def main():
    loop = asyncio.get_event_loop()
    while True:
        for i in range(20):
            loop.call_later(i * 0.010, do_it, i)
        await wait_for_something_else()

asyncio.run(main())
These do_it tasks can be cancelled using the handle returned by loop.call_later.
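For example (reusing the loop and do_it from the sketch above; this would go inside main):

handle = loop.call_later(0.5, do_it, 99)
handle.cancel()  # do_it(99) will no longer run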
This approach will, however, require either switching over your program to use asyncio throughout, or running the asyncio event loop in a separate thread.
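Here is a minimal sketch of the second option, running the event loop in a background thread. This is my own illustration; call_soon_threadsafe is needed because call_later may only be called from the loop's own thread:

import asyncio
import threading
import time

def do_it(x):
    print(x)

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

# Hand each timed call over to the loop's thread:
for i in range(20):
    loop.call_soon_threadsafe(loop.call_later, i * 0.010, do_it, i)

time.sleep(1)  # keep the demo alive long enough for the callbacks to fire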
It sounds like you want something to be non-blocking and asynchronous, but also single-processed and single-threaded (one thread dedicated to do_it).
If this is the case, and especially if any networking is involved, so long as you're not actively doing serious I/O on your main thread, it is probably worthwhile using asyncio instead.
It's designed to handle non-blocking operations, and allows you to make all of your requests without waiting for a response.
Example:
import asyncio

async def main():
    while True:
        tasks = []
        for i in range(20):
            tasks.append(asyncio.create_task(do_it(i)))  # do_it must be a coroutine here
        await wait_for_something_else()
        for task in tasks:
            await task

asyncio.run(main())
Given the time spent on blocking I/O (seconds), you'll probably waste more time managing threads than you will save by spawning a separate thread for these other operations.
Since, as you have said, each series of 20 do_it calls starts when wait_for_something_else has finished, I would recommend calling the join method in each iteration of the while loop:
import threading

timers = []
while True:
    for i in range(20):
        t = threading.Timer(i * 0.010, do_it, [i])  # I pass the parameter i to function do_it
        t.start()
        timers.append(t)  # so that they can be cancelled if needed
    wait_for_something_else()  # this can last from 5 ms to 20 seconds
    for t in timers[-20:]:
        t.join()
So the requirements are:
the do_it calls run in order and are cancellable;
all do_it calls run in one thread, sleeping until the specified timing (though maybe not with sleep);
a variable "should_run_it" checks whether each do_it should run or not (which makes it cancellable?).
Is it something like this?
import threading
import time

def do_it(i):
    print(f"[{i}] {time.time()}")

should_run_it = {i: True for i in range(20)}

def guard_do_it(i):
    if should_run_it[i]:
        do_it(i)

def run_do_it():
    for i in range(20):
        guard_do_it(i)
        time.sleep(0.010)

if __name__ == "__main__":
    t = threading.Timer(0.010, run_do_it)
    start = time.time()
    print(start)
    t.start()
    # should_run_it[5] = should_run_it[10] = should_run_it[15] = False  # test
    t.join()
    end = time.time()
    print(end)
    print(end - start)
I don't have a ton of experience with threading in Python, so please go easy on me. The concurrent.futures library is part of Python 3 and it's dead simple. I'm providing an example for you so you can see how straightforward it is.
concurrent.futures with exactly one thread for do_it() and concurrency:
import concurrent.futures
import time

def do_it(iteration):
    time.sleep(0.1)
    print('do it counter', iteration)

def wait_for_something_else():
    time.sleep(1)
    print('waiting for something else')

def single_thread():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = (executor.submit(do_it, i) for i in range(20))
        for future in concurrent.futures.as_completed(futures):
            future.result()

def do_asap():
    wait_for_something_else()

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(single_thread), executor.submit(do_asap)]
    for future in concurrent.futures.as_completed(futures):
        future.result()
The code above uses max_workers=1 to execute do_it() in a single thread: inside single_thread(), the ThreadPoolExecutor is constrained to exactly one worker, so all twenty do_it() calls share one thread.
In the final block, both methods are submitted to a second thread pool executor, so single_thread and do_asap run concurrently while do_it stays on its single, non-blocking thread.
The concurrent.futures docs describe how to control the number of threads. When max_workers is not specified, it defaults to min(32, os.cpu_count() + 4).

How do I update the GUI from another thread using Python?

What is the best way to update a GUI from another thread in Python?
I have the main function (the GUI) in thread1, and from it I start another thread (thread2). Is it possible to update the GUI while thread2 is still working, without cancelling the work in thread2? If so, how can I do that?
Any suggested reading about thread handling?
Of course you can use threading to run several pieces of work simultaneously.
You have to create a class like this:

import threading
from threading import Thread

class Work(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.lock = threading.Lock()

    def run(self):  # This method is executed when the thread starts
        # (your code)
        pass
If you want to run several threads at the same time:

def foo():
    i = 0
    workers = []
    while i < 10:
        workers.append(Work())
        workers[i].start()  # start() calls the run() method of the class above
        i += 1
Be careful if you want to use the same variable in several threads. You must lock around this variable so that the threads do not all reach it at the same time. Like this:

lock = threading.Lock()
lock.acquire()
try:
    yourVariable += 1  # While the lock is held, other threads block in lock.acquire() until lock.release() is called.
finally:
    lock.release()
You can also hand work to worker threads through a queue; from the main thread, you can call join() on the queue to wait until all pending tasks have been completed.
This approach has the benefit that you are not creating and destroying threads, which is expensive. The worker threads will run continuously, but will be asleep when no tasks are in the queue, using zero CPU time.
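A minimal sketch of that worker-queue pattern (my own illustration; the actual GUI update is elided, since each GUI toolkit has its own rule about which thread may touch widgets):

import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        # ... do the background work here; post results for the GUI thread ...
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

for job in range(5):
    tasks.put(job)

tasks.join()  # the calling thread sleeps here until every task is done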
I hope it will help you.

Thread not exiting

I am learning about threads in Python and am trying to make a simple program, one that uses threads to grab a number off the Queue and print it.
I have the following code:
import threading
from Queue import Queue

test_lock = threading.Lock()
tests = Queue()

def start_thread():
    while not tests.empty():
        with test_lock:
            if tests.empty():
                return
            test = tests.get()
            print("{}".format(test))

for i in range(10):
    tests.put(i)

threads = []
for i in range(5):
    threads.append(threading.Thread(target=start_thread))
    threads[i].daemon = True

for thread in threads:
    thread.start()

tests.join()
When run it just prints the values and never exits.
How do I make the program exit when the Queue is empty?
From the docstring of Queue.join():
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
So you must call tests.task_done() after processing the item.
Since your threads are daemon threads, and the queue will handle concurrent access correctly, you don't need to check if the queue is empty or use a lock. You can just do:
def start_thread():
    while True:
        test = tests.get()
        print("{}".format(test))
        tests.task_done()

Monitoring a threaded Python program with htop

First of all, this is the code I am referring to:
from random import randint
import time
from threading import Thread
import Queue

class TestClass(object):
    def __init__(self, queue):
        self.queue = queue

    def do(self):
        while True:
            wait = randint(1, 10)
            time.sleep(1.0 / wait)
            print '[>] Enqueuing from TestClass.do...', wait
            self.queue.put(wait)

class Handler(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        task_no = 0
        while True:
            task = self.queue.get()
            task_no += 1
            print ('[<] Dequeuing from Handler.run...', task,
                   'task_no=', task_no)
            time.sleep(1)  # emulate processing time
            print ('[*] Task %d done!') % task_no
            self.queue.task_done()

def main():
    q = Queue.Queue()
    watchdog = TestClass(q)
    observer = Thread(target=watchdog.do)
    observer.setDaemon(True)
    handler = Handler(q)
    handler.setDaemon(True)
    handler.start()
    observer.start()
    try:
        while True:
            wait = randint(1, 10)
            time.sleep(1.0 / wait)
            print '[>] Enqueuing from main...', wait
            q.put(wait)
    except KeyboardInterrupt:
        print '[*] Exiting...', True

if __name__ == '__main__':
    main()
While the code is not very important to my question, it is a simple script that spawns 2 threads on top of the main one. Two of them enqueue "tasks", and one dequeues and "executes" them.
I am just starting to study threading in Python, and I have of course run into the subject of the GIL, so I expected to see one process. But the thing is, when I monitor this particular script with htop, I notice not 1 but 3 processes being spawned.
How is this possible?
The GIL means only one thread will "do work" at a time, but it doesn't mean that Python won't spawn the threads. In your case, you asked Python to spawn two threads, so it did (giving you a total of three threads). FYI, htop lists both processes and threads, in case this was causing your confusion.
Python threads are useful for when you want concurrency but don't need parallelism. Concurrency is a tool for making programs simpler and more modular; it allows you to spawn a thread per task instead of having to write one big (often messy) while loop and/or use a bunch of callbacks (like JavaScript).
If you're interested in this subject, I recommend googling "concurrency versus parallelism". The concept is not language specific.
Edit: Alternatively, you can just read this Stack Overflow thread.
