I have the following script in python that calls a function every X seconds creating a new thread:
def function():
    threading.Timer(X, function).start()
    do_something
function()
My question is, what if the function takes 2*X seconds to execute? Since I'm using threading this should not be a problem, right? I will have more "instances" of the function running at the same time but once every one finishes its thread should be destroyed. Thanks
If the function takes 2*X seconds, then you're going to have multiple instances of function running concurrently. It's easy to see with an example:
import threading
import time
X = 2
def function():
    print("Thread {} starting.".format(threading.current_thread()))
    threading.Timer(X, function).start()
    time.sleep(2*X)
    print("Thread {} done.".format(threading.current_thread()))
function()
Output:
Thread <_MainThread(MainThread, started 140115183785728)> starting.
Thread <_Timer(Thread-1, started 140115158210304)> starting.
Thread <_MainThread(MainThread, started 140115183785728)> done.
Thread <_Timer(Thread-2, started 140115149817600)> starting.
Thread <_Timer(Thread-3, started 140115141424896)> starting.
Thread <_Timer(Thread-1, started 140115158210304)> done.
Thread <_Timer(Thread-4, started 140115133032192)> starting.
Thread <_Timer(Thread-2, started 140115149817600)> done.
Thread <_Timer(Thread-3, started 140115141424896)> done.
Thread <_Timer(Thread-5, started 140115158210304)> starting.
Thread <_Timer(Thread-6, started 140115141424896)> starting.
Thread <_Timer(Thread-4, started 140115133032192)> done.
Thread <_Timer(Thread-7, started 140115149817600)> starting.
Thread <_Timer(Thread-5, started 140115158210304)> done.
Thread <_Timer(Thread-8, started 140115133032192)> starting.
Thread <_Timer(Thread-6, started 140115141424896)> done.
Thread <_Timer(Thread-9, started 140115158210304)> starting.
Thread <_Timer(Thread-7, started 140115149817600)> done.
Thread <_Timer(Thread-10, started 140115141424896)> starting.
Thread <_Timer(Thread-8, started 140115133032192)> done.
Thread <_Timer(Thread-11, started 140115149817600)> starting.
<And on and on forever and ever>
As you can see from the output, this is also an infinite loop, so the program will never end.
If it's safe for multiple instances of function to run at the same time, then this is fine. If it's not, then you need to protect the not-thread-safe part of function with a lock:
import threading
import time
X = 2
lock = threading.Lock()
def function():
    with lock:
        print("Thread {} starting.".format(threading.current_thread()))
        threading.Timer(X, function).start()
        time.sleep(2*X)
        print("Thread {} done.".format(threading.current_thread()))
function()
Output:
Thread <_MainThread(MainThread, started 140619426387712)> starting.
Thread <_MainThread(MainThread, started 140619426387712)> done.
Thread <_Timer(Thread-1, started 140619400812288)> starting.
Thread <_Timer(Thread-1, started 140619400812288)> done.
Thread <_Timer(Thread-2, started 140619392419584)> starting.
Thread <_Timer(Thread-2, started 140619392419584)> done.
Thread <_Timer(Thread-3, started 140619381606144)> starting.
Thread <_Timer(Thread-3, started 140619381606144)> done.
Thread <_Timer(Thread-4, started 140619392419584)> starting.
Thread <_Timer(Thread-4, started 140619392419584)> done.
Thread <_Timer(Thread-5, started 140619381606144)> starting.
One final note: because of the Global Interpreter Lock, in CPython only one thread can ever actually execute bytecode at a time. So when you use threads, you're not really improving performance if you're doing CPU-bound tasks, because only one thread is ever actually executing at a time. Instead, the interpreter ends up frequently switching between all the threads, giving each a bit of CPU time. This will generally end up being slower than a single-threaded approach, because of the added overhead of switching between the threads. If you're planning on doing CPU-bound work in each thread, you may want to use multiprocessing instead.
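As a rough illustration of the multiprocessing suggestion, here is a minimal sketch that hands CPU-bound work to separate processes via concurrent.futures. The names cpu_bound and run_in_processes are hypothetical, and the workload is only a stand-in for whatever do_something really does.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n):
    """Hypothetical CPU-heavy task: sum of the integer square roots below n."""
    return sum(math.isqrt(i) for i in range(n))


def run_in_processes(inputs, workers=2):
    """Run cpu_bound on each input in separate processes, keeping input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, inputs))
```

In a script you would call run_in_processes([100000, 200000]) from under an `if __name__ == "__main__":` guard, since worker processes may need to re-import the main module.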
In theory you could have 3 active threads running at any given time: one that is just about to end, one that's in the middle of a run, and one that's just been spawned.
|-----|
   |-----|
      |-----|
In practice, you might end up with a few more:
import threading
import logging
logger = logging.getLogger(__name__)
import time
def function():
    threading.Timer(X, function).start()
    logger.info('{} active threads'.format(threading.active_count()))
    time.sleep(2*X)
logging.basicConfig(level=logging.DEBUG,
                    format='[%(asctime)s %(threadName)s] %(message)s',
                    datefmt='%H:%M:%S')
X = 3
function()
yields
[16:12:13 MainThread] 2 active threads
[16:12:16 Thread-1] 3 active threads
[16:12:19 Thread-2] 4 active threads
[16:12:22 Thread-3] 4 active threads
[16:12:25 Thread-4] 5 active threads
[16:12:28 Thread-5] 4 active threads
[16:12:31 Thread-6] 4 active threads
[16:12:34 Thread-7] 4 active threads
[16:12:37 Thread-8] 5 active threads
[16:12:40 Thread-9] 4 active threads
[16:12:43 Thread-10] 5 active threads
[16:12:46 Thread-11] 5 active threads
I don't see any inherent problem with this; you just have to be aware of what it's doing.
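If the never-ending chain of timers is a concern, one option is to stop rescheduling after a fixed number of runs. The following is only a sketch under assumptions: MAX_RUNS, the run_number argument, and the done event are not part of the original code, and the interval is shortened so the example finishes quickly.

```python
import threading
import time

X = 0.05          # interval in seconds (shortened for the sketch)
MAX_RUNS = 3      # assumed limit; the original reschedules forever
runs = []
done = threading.Event()


def function(run_number=1):
    # Schedule the next run first, as in the original, but only up to MAX_RUNS.
    if run_number < MAX_RUNS:
        threading.Timer(X, function, args=(run_number + 1,)).start()
    runs.append(run_number)
    time.sleep(2 * X)  # work that takes longer than the interval
    if run_number == MAX_RUNS:
        done.set()     # signal that the last run has finished


function()
done.wait(timeout=5)
```

Runs still overlap here, exactly as in the logged output above; the only change is that the chain ends.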
You may run into a race condition if one instance of the function is writing to a resource while another is trying to read that same resource.
http://en.wikipedia.org/wiki/Multithreading_(computer_architecture)#Disadvantages
Can you set up a test so that you can experiment with the behavior that you are concerned with?
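One way to set up such an experiment is a shared counter that several threads update at once. This is a minimal sketch with made-up names (counter, add_many, run); the lock makes each read-modify-write step atomic, so the final count is predictable.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


def run(num_threads=4, times=10000):
    """Start num_threads workers, wait for all of them, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Removing the `with counter_lock:` line lets you check whether updates get lost on your interpreter; whether they actually do depends on the Python version and timing.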
As the title describes, I create a separate thread to do a long task in Flask.
import schedule
import time
start_time = time.time()
def job():
    print("I'm working..." + str(time.time() - start_time))
def run_schedule():
    while True:
        schedule.run_pending()
        time.sleep(1)
When I press Ctrl + c to terminate the server, the thread still prints. How can I stop the thread when server exits?
You may want to set your thread as daemon.
A thread runs until it ends by itself or it is explicitly stopped.
A daemon thread runs under the same conditions, but only as long as at least one non-daemon thread is still running: once your main thread ends and no other non-daemon threads are running, all daemon threads end as well.
If you're using the threading module, you can make the thread a daemon by setting its daemon attribute before starting it:
import threading
your_thread.daemon = True
If you're using the thread module, it should be one of the kwargs.
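A daemon flag alone stops the thread abruptly at exit. If you also want a clean stop, a common pattern is to combine it with an Event. This is a sketch only: stop_event and ticks are hypothetical names, and the loop body stands in for the schedule.run_pending() call from the question.

```python
import threading
import time

stop_event = threading.Event()
ticks = []


def run_schedule():
    """Background loop; stands in for the schedule.run_pending() loop."""
    while not stop_event.is_set():
        ticks.append(time.time())
        stop_event.wait(0.05)  # sleeps, but wakes early when asked to stop


worker = threading.Thread(target=run_schedule, daemon=True)
worker.start()

time.sleep(0.2)         # the server would be doing its work here
stop_event.set()        # ask the loop to finish
worker.join(timeout=1)  # wait briefly for it to exit
```

In a real server the set() and join() calls would go in whatever shutdown hook is available; the daemon flag is the fallback if that hook never runs.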
I had some performance issues with a multi-threading code to parallelize multiple telnet probes.
Slow
My first implementation was really slow, the same as if the tasks were run sequentially:
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
for thread in threads:
    thread.start()
    thread.join()
Blazingly Fast
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
    thread.start() # <----- moved this
for thread in threads:
    thread.join()
Question
I don't get why moving the start() call changes the performance so much.
In your first implementation you are actually running the code sequentially because by calling join() immediately after start() the main thread is blocked until the started thread is finished.
thread.join() is blocking every thread as soon as they are created in your first implementation.
According to threading.Thread.join() documentation:
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception -- or until the optional timeout occurs.
In your slow example you start the thread and wait till it is complete, then you iterate to the next thread.
Example
from threading import Thread
from time import sleep
def foo(a, b):
    while True:
        print(a + ' ' + b)
        sleep(1)
ths = []
for i in range(3):
    th = Thread(target=foo, args=('hi', str(i)))
    ths.append(th)
for th in ths:
    th.start()
    th.join()
Produces
hi 0
hi 0
hi 0
hi 0
In your slow solution you are basically not using multithreading at all. It's running a thread, waiting for it to finish, and then running another. There is no difference between this solution and running everything in one thread: you are running them in series.
The second one on the other hand starts all threads and then joins them. This solution limits the execution time to the longest execution time of one single thread - you are running them in parallel.
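The difference can be measured with a small timing sketch. The function names and the sleep-based task are made up for illustration; sleep stands in for the blocking telnet probe.

```python
import threading
import time


def task(delay):
    time.sleep(delay)  # stand-in for a blocking I/O call


def start_then_join_each(n, delay):
    """Slow variant: join() right after start(), so threads run one at a time."""
    begin = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=task, args=(delay,))
        t.start()
        t.join()
    return time.perf_counter() - begin


def start_all_then_join(n, delay):
    """Fast variant: start every thread first, then wait for all of them."""
    begin = time.perf_counter()
    threads = [threading.Thread(target=task, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - begin
```

With n=3 and delay=0.2, the first function should take roughly three delays and the second roughly one, though exact numbers depend on the machine.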
Self-taught programming student, so I apologize for all the amateur mistakes. I want to learn some deeper subjects, so I'm trying to understand threading, and exception handling.
import threading
import sys
from time import sleep
from random import randint as r
def waiter(n):
    print "Starting thread " + str(n)
    wait_time = r(1,10)
    sleep(wait_time)
    print "Exiting thread " + str(n)
if __name__=='__main__':
    try:
        for i in range(5):
            t = threading.Thread(target=waiter, args=(i+1,))
            t.daemon = True
            t.start()
            sleep(3)
        print 'All threads complete!'
        sys.exit(1)
    except KeyboardInterrupt:
        print ''
        sys.exit(1)
This script just starts and stops threads after a random time and will kill the program if it receives a ^C. I've noticed that it doesn't print when some threads finish:
Starting thread 1
Starting thread 2
Starting thread 3
Exiting thread 3
Exiting thread 2
Starting thread 4
Exiting thread 1
Exiting thread 4
Starting thread 5
All threads complete!
In this example, it never states it exits thread 5. I find I can fix this if I comment out the t.daemon = True statement, but then exception handling waits for any threads to finish up.
Starting thread 1
Starting thread 2
^C
Exiting thread 1
Exiting thread 2
I can understand that when dealing with threads, it's best that they complete what they're handling before exiting, but I'm just curious as to why this is. I'd really appreciate any answers regarding the nature of threading and daemons to guide my understanding.
The whole point of a daemon thread is that if it's not finished by the time the main thread finishes, it gets summarily killed. Quoting the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property or the daemon constructor argument.
Note Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
Now, look at your logic. The main thread only sleeps for 3 seconds after starting thread 5. But thread 5 can sleep for anywhere from 1-10 seconds. So, about 70% of the time, it's not going to be finished by the time the main thread wakes up, prints "All threads complete!", and exits. If thread 5 is still asleep at that point, it will be killed without ever getting to print "Exiting thread 5".
If this isn't the behavior you want—if you want the main thread to wait for all the threads to finish—then don't use daemon threads.
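A minimal Python 3 sketch of that alternative: the threads are left non-daemonic and the main thread joins each one before announcing completion. The finished list and its lock are additions for illustration, and the waits are shortened so the sketch runs quickly.

```python
import threading
import time
from random import uniform

finished = []
finished_lock = threading.Lock()


def waiter(n):
    time.sleep(uniform(0.01, 0.1))  # shortened from the 1-10 s in the question
    with finished_lock:
        finished.append(n)          # record that this thread ran to the end


threads = [threading.Thread(target=waiter, args=(i + 1,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()                        # wait for every thread before moving on
print("All threads complete!")
```

Because every thread is joined, the final message is only printed after all five have actually finished.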
I'm trying to implement a simple threadpool in python.
I start a few threads with the following code:
threads = []
for i in range(10):
    t = threading.Thread(target=self.workerFuncSpinner(
        taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
    t.setDaemon(True)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()
At this point, the worker thread only prints when it starts and exits and time.sleeps between. The problem is, instead of getting output like:
#All output at the same time
thread 1 starting
thread 2 starting
thread n starting
# 5 seconds pass
thread 1 exiting
thread 2 exiting
thread n exiting
I get:
thread 1 starting
# 5 seconds pass
thread 1 exiting
thread 2 starting
# 5 seconds pass
thread 2 exiting
thread n starting
# 5 seconds pass
thread n exiting
And when I do a threading.current_thread(), they all report they are mainthread.
It's like they're not even threads, but running in the main thread context.
Help?
Thanks
You are calling workerFuncSpinner in the main thread when creating the Thread object. Use a reference to the method instead:
t=threading.Thread(target=self.workerFuncSpinner,
args=(taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
Your original code:
t = threading.Thread(target=self.workerFuncSpinner(
taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
t.start()
could be rewritten as
# call the method in the main thread
spinner = self.workerFuncSpinner(
taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i)
# create a thread that will call whatever `self.workerFuncSpinner` returned,
# with no arguments
t = threading.Thread(target=spinner)
# run whatever workerFuncSpinner returned in background thread
t.start()
You were calling the method serially in the main thread and nothing in the created threads.
I suspect workerFuncSpinner may be your problem. I would verify that it is not actually running the task, but returning a callable object for the thread to run.
https://docs.python.org/2/library/threading.html#threading.Thread
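The difference between passing a callable and passing the result of a call can be checked with a small self-contained sketch. The worker function and the seen list are hypothetical; worker records whether it ran in the thread that created the Thread objects.

```python
import threading

caller = threading.current_thread()
seen = []


def worker(i):
    # Record the argument and whether we are running in the creating thread.
    seen.append((i, threading.current_thread() is caller))


# Wrong: worker(0) runs right here, and its return value (None) becomes target.
wrong = threading.Thread(target=worker(0))
wrong.start()
wrong.join()

# Right: pass the function itself and supply its arguments separately.
right = threading.Thread(target=worker, args=(1,))
right.start()
right.join()
```

After this runs, seen holds (0, True) for the call made in the creating thread and (1, False) for the call made in the new thread.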
from thread import start_new_thread
num_threads = 0
def heron(a):
    global num_threads
    num_threads += 1
    # code has been left out, see above
    num_threads -= 1
    return new
start_new_thread(heron,(99,))
start_new_thread(heron,(999,))
start_new_thread(heron,(1733,))
start_new_thread(heron,(17334,))
while num_threads > 0:
    pass
This is a simple piece of threading code. I want to know why we use a while loop in the last line.
The final while-loop waits for all of the threads to finish before the main thread exits.
It is an expensive check (100% CPU for the spin-wait). You can improve it in one of two ways:
while num_threads > 0:
    time.sleep(0.1)
or by tracking all the threads in a list and joining them one-by-one:
for worker in worker_threads:
    worker.join()
We want to keep the process alive until all children finish their work. So we must keep executing something in the main thread as long as any child is alive, hence the check on the num_threads variable.
If it wasn't for this, all child threads would be killed as soon as the main thread finished its work, regardless of whether they had actually finished theirs, so waiting for them is mandatory to ensure everything is done.
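For comparison, here is a Python 3 sketch that waits with join() instead of polling a counter. The body of heron is an assumption (the original leaves it out): it uses Heron's method to approximate a square root, and the results dict and its lock are added for illustration. Note that an unprotected `num_threads += 1` from several threads is not guaranteed to be atomic, which is another reason to prefer join().

```python
import threading

results = {}
results_lock = threading.Lock()


def heron(a, iterations=30):
    """Assumed body: Heron's method for approximating sqrt(a)."""
    x = float(a)
    for _ in range(iterations):
        x = (x + a / x) / 2
    with results_lock:
        results[a] = x  # store the result under the lock


workers = [threading.Thread(target=heron, args=(a,))
           for a in (99, 999, 1733, 17334)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # blocks without burning CPU until each worker is done
```

Once the second loop finishes, results contains one entry per input and no busy-wait was needed.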
To build on Raymond Hettinger's answer: the parent process starts a number of threads, each of which does work. We then wait for each to exit, so that we can collect and process their output. In this case each worker just outputs to the screen, so the parent just has to join() each task to make sure it ran and exited correctly.
Here's an alternate way to code the above. It uses the higher-level library threading (vs thread), and only calls join() on threads besides the current one. We also use threading.enumerate() instead of manually keeping track of worker threads -- easier.
code:
import threading
def heron(a):
    print '{}: a={}'.format(threading.current_thread(), a)
threading.Thread(target=heron, args=(99,)).start()
threading.Thread(target=heron, args=(999,)).start()
threading.Thread(target=heron, args=(1733,)).start()
threading.Thread(target=heron, args=(17334,)).start()
print
print '{} threads, joining'.format(threading.active_count())
for thread in threading.enumerate():
    print '- {} join'.format(thread)
    if thread == threading.current_thread():
        continue
    thread.join()
print 'done'
Example output:
python ./jointhread.py
<Thread(Thread-1, started 140381408802560)>: a=99
<Thread(Thread-2, started 140381400082176)>: a=999
<Thread(Thread-3, started 140381400082176)>: a=1733
2 threads, joining
- <_MainThread(MainThread, started 140381429581632)> join
- <Thread(Thread-4, started 140381408802560)> join
<Thread(Thread-4, started 140381408802560)>: a=17334
done