This question already has answers here:
Is there any way to kill a Thread?
(31 answers)
Closed 2 years ago.
I want to create threads that will add something to an array, but, if they don't do that in less than 2 seconds, I want to terminate them.
This is a prof of concept, so the code is simple. Every second I want a thread to add that item in the list, so a thread runs after 0, 1, 2, 3 and 4 seconds. The idea is to not let the thread 3 and 4 run.
import threading, time
myList = []
def foo(value):
global myList
time.sleep(value)
print("Value: {}".format(value))
myList.append(value)
threads = []
for i in range(5):
th = threading.Thread(target=foo, args=(i,))
threads.append(th)
for th in threads:
th.start()
What do I do now? I tried using some other logic like using
th.join(timeout)
But that doesn't seem to work.
As I said in a comment you can't really "kill" a thread (externally). However they can "commit suicide" by returning or raising a exception.
Below is example of doing the latter when the thread's execution time has exceeded a given amount of time. Note that this is not the same as doing a join(timeout) call, which only blocks until the thread ends or the specified amount of time has elapsed. That's why the printing of value and its appending to the list happens regardless of whether the thread finishes before the call to join() times-out or not.
I got the basic idea of using sys.settrace() from the tutorial titled Different ways to kill a Thread — although my implementation is slightly different. Also note that this approach likely introduces a significant amount of overhead.
import sys
import threading
import time
class TimelimitedThread(threading.Thread):
def __init__(self, *args, time_limit, **kwargs):
self.time_limit = time_limit
self._run_backup = self.run # Save superclass run() method.
self.run = self._run # Change it to custom version.
super().__init__(*args, **kwargs)
def _run(self):
self.start_time = time.time()
sys.settrace(self.globaltrace)
self._run_backup() # Call superclass run().
self.run = self._run_backup # Restore original.
def globaltrace(self, frame, event, arg):
return self.localtrace if event == 'call' else None
def localtrace(self, frame, event, arg):
if(event == 'line' and
time.time()-self.start_time > self.time_limit): # Over time?
raise SystemExit() # Terminate thread.
return self.localtrace
THREAD_TIME_LIMIT = 2.1 # Secs
threads = []
my_list = []
def foo(value):
global my_list
time.sleep(value)
print("Value: {}".format(value))
my_list.append(value)
for i in range(5):
th = TimelimitedThread(target=foo, args=(i,), time_limit=THREAD_TIME_LIMIT)
threads.append(th)
for th in threads:
th.start()
for th in threads:
th.join()
print('\nResults:')
print('my_list:', my_list)
Output:
Value: 0
Value: 1
Value: 2
Results:
my_list: [0, 1, 2]
Join() is used to wait for the respective thread to finish. To terminate a thread, use stop().. You can try as follows:
time.sleep(N)
th.join()
Related
I'm trying to implement a queue. This is old code which was either taken from some kind of tutorial that I did some time ago or from some kind of experimentation that I did reading the docs, or a mix of the two. Thing is I'm not sure if the code is mine or not, but I'm trying to use it as an example to learn from. The script has a producer that produces numbers in a list and 2 consumers competing for grabbing those numbers and adding them up, the one with the highest sum wins.
So, here's my question: in the following code in the "consume_numbers" function I have a time.sleep(0.01) line which makes the code run. Without it, the code hangs, with it it runs smoothly. Can someone explain why this happens and how I could implement a queue without this issue?
import concurrent.futures
import time
import random
import threading
import queue
class MyQueue(queue.Queue):
def __init__(self, maxsize=10):
super().__init__()
self.maxsize = maxsize
self.numbers = []
def set_number(self, number):
self.put(number)
self.numbers.append(number)
def get_number(self):
return self.get()
def produce_random_numbers(q: MyQueue, maxcount: int, evnt: threading.Event):
count = 0
while not evnt.is_set():
num = random.randint(1, 5)
q.set_number(num)
count += 1
if count > maxcount:
event.set()
def consume_numbers(q: MyQueue, consumed: list, evnt: threading.Event):
while not q.empty() or not evnt.is_set():
num = q.get_number()
time.sleep(0.01)
consumed.append(num)
if __name__ == "__main__":
q = MyQueue(maxsize=10)
event = threading.Event()
cons1 = []
cons2 = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
ex.submit(produce_random_numbers, q, 50, event)
ex.submit(consume_numbers, q, cons1, event)
ex.submit(consume_numbers, q, cons2, event)
event.set()
print(f'Generated Numbers: {q.numbers}')
print(f'Numbers Consumed by Thread1 which summed up to {sum(cons1)} are: {cons1}')
print(f'Numbers Consumed by Thread2 which summed up to {sum(cons2)} are: {cons2}')
if sum(cons1) > sum(cons2):
print("Thread1 Wins!")
elif sum(cons1) < sum(cons2):
print("Thread2 Wins!")
else:
print("It's a tie!")
Thanks!
The code does not implement a queue from scratch, but extends queue.Queue to add memory. There is an event object that is used to signal to the consumers that the producer thread has finished. There is are hidden race conditions in the consumers when there is only one item on the queue.
The check not q.empty() or not evnt.is_set() will run the loop code either if there is something in the queue or the event has not been set. It could happen that:
One thread sees that the queue is not empty and enters the loop
A thread switch happens, and the other thread consumes the last item
A switch happens to the first thread, which calls get_number() and blocks
A similar race condition happens with the evnt.is_set() check:
The last item is added to the queue by the producer, and a thread switch happens
One thread consumes the last item, a switch
A thread switch happens, the consumer gets the last item and goes back to the loop condition. As the event has not been set the loop is executed and get_number() blocks
Having the threads wait minimizes the chance of these conditions happening. Without waiting, it is very likely that a single consumer thread will consume all the queue items, while the other one is still entering its loop.
Using timeouts is cumbersome. A useful idiom that avoids using events is to use iter and use an impossible value as a sentinel:
# --- snip ---
def produce_random_numbers(q: MyQueue, maxcount: int, n_consumers: int):
for _ in range(maxcount):
num = random.randint(1, 5)
q.set_number(num)
for _ in range(n_consumers):
q.put(None) # <--- I use put to put one sentinel per consumer
def consume_numbers(q: MyQueue, consumed: list):
for num in iter(q.get_number, None):
consumed.append(num)
if __name__ == "__main__":
q = MyQueue(maxsize=10)
cons1 = []
cons2 = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
ex.submit(produce_random_numbers, q, 500000, 2)
ex.submit(consume_numbers, q, cons1)
ex.submit(consume_numbers, q, cons2)
print(f'Generated Numbers: {q.numbers}')
# --- snip ---
There are some other issues and things I would have done differently:
The event.set() after the with... block is useless: the event has already been set by the producer
There is a typo in the producer and the global event variable is used instead of the local evnt parameter. Fortunately those refer to the same object.
As there is only one producer, there will be no problem. Otherwise the order of MyQueue.numbers could be different from the order in which the items were added to the queue:
put is called on one thread
a thread switch happens
a put + append happens in the new thread
a thread switch happens, and the first value is appended
Instead of defining MyQueue.set_number I would have overrided put
I am running a python script every hour and I've been using time.sleep(3600) inside of a while loop. It seems to work as needed but I am worried about it blocking new tasks. My research of this seems to be that it only blocks the current thread but I want to be 100% sure. While the hourly job shouldn't take more than 15min, if it does or if it hangs, I don't want it to block the next one that starts. This is how I've done it:
import threading
import time
def long_hourly_job():
# do some long task
pass
if __name__ == "__main__":
while True:
thr = threading.Thread(target=long_hourly_job)
thr.start()
time.sleep(3600)
Is this sufficient?
Also, the reason i am using time.sleep for this hourly job rather than a cron job is I want to do everything in code to make dockerization cleaner.
The code will work (ie: sleep does only block the calling thread), but you should be careful of some issues. Some of them have been already stated in the comments, like the possibility of time overlaps between threads. The main issue is that your code is slowly leaking resources. After creating a thread, the OS keeps some data structures even after the thread has finished running. This is necessary, for example to keep the thread's exit status until the thread's creator requires it. The function to clear these structures (conceptually equivalent to closing a file) is called join. A thread that has finished running and is not joined is termed a 'zombie thread'. The amount of memory required by these structures is very small, and your program should run for centuries for any reasonable amount of available RAM. Nevertheless, it is a good practice to join all the threads you create. A simple approach (if you know that 3600 s is more than enough time for the thread to finish) would be:
if __name__ == "__main__":
while True:
thr = threading.Thread(target=long_hourly_job)
thr.start()
thr.join(3600) # wait at most 3600 s for the thread to finish
if thr.isAlive(): # join does not return useful information
print("Ooops: the last job did not finish on time")
A better approach if you think that it is possible that sometimes 3600 s is not enough time for the thread to finish:
if __name__ == "__main__":
previous = []
while True:
thr = threading.Thread(target=long_hourly_job)
thr.start()
previous.append(thr)
time.sleep(3600)
for i in reversed(range(len(previous))):
t = previous[i]
t.join(0)
if t.isAlive():
print("Ooops: thread still running")
else:
print("Thread finished")
previous.remove(t)
I know that the print statement makes no sense: use logging instead.
Perhaps a little late. I tested the code from other answers but my main process got stuck (perhaps I'm doing something wrong?). I then tried a different approach. It's based on threading Timer class, but trying to emulate the QtCore.QTimer() behavior and features:
import threading
import time
import traceback
class Timer:
SNOOZE = 0
ONEOFF = 1
def __init__(self, timerType=SNOOZE):
self._timerType = timerType
self._keep = threading.Event()
self._timerSnooze = None
self._timerOneoff = None
class _SnoozeTimer(threading.Timer):
# This uses threading.Timer class, but consumes more CPU?!?!?!
def __init__(self, event, msec, callback, *args):
threading.Thread.__init__(self)
self.stopped = event
self.msec = msec
self.callback = callback
self.args = args
def run(self):
while not self.stopped.wait(self.msec):
self.callback(*self.args)
def start(self, msec: int, callback, *args, start_now=False) -> bool:
started = False
if msec > 0:
if self._timerType == self.SNOOZE:
if self._timerSnooze is None:
self._timerSnooze = self._SnoozeTimer(self._keep, msec / 1000, callback, *args)
self._timerSnooze.start()
if start_now:
callback(*args)
started = True
else:
if self._timerOneoff is None:
self._timerOneoff = threading.Timer(msec / 1000, callback, *args)
self._timerOneoff.start()
started = True
return started
def stop(self):
if self._timerType == self.SNOOZE:
self._keep.set()
self._timerSnooze.join()
else:
self._timerOneoff.cancel()
self._timerOneoff.join()
def is_alive(self):
if self._timerType == self.SNOOZE:
isAlive = self._timerSnooze is not None and self._timerSnooze.is_alive() and not self._keep.is_set()
else:
isAlive = self._timerOneoff is not None and self._timerOneoff.is_alive()
return isAlive
isAlive = is_alive
KEEP = True
def callback():
global KEEP
KEEP = False
print("ENDED", time.strftime("%M:%S"))
if __name__ == "__main__":
count = 0
t = Timer(timerType=Timer.ONEOFF)
t.start(5000, callback)
print("START", time.strftime("%M:%S"))
while KEEP:
if count % 10000000 == 0:
print("STILL RUNNING")
count += 1
Notice the while loop runs in a separate thread, and uses a callback function to invoke when the time is over (in your case, this callback function would be used to check if the long running process has finished).
Given the following class:
from abc import ABCMeta, abstractmethod
from time import sleep
import threading
from threading import active_count, Thread
class ScraperPool(metaclass=ABCMeta):
Queue = []
ResultList = []
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
# Initialize attributes
self.MaxNumWorkers = MaxNumWorkers
self.ItemsPerWorker = ItemsPerWorker
self.Queue = Queue # For testing purposes.
def initWorkerPool(self, PrintIDs=True):
for w in range(self.NumWorkers()):
Thread(target=self.worker, args=(w + 1, PrintIDs,)).start()
sleep(1) # Explicitly wait one second for this worker to start.
def run(self):
self.initWorkerPool()
# Wait until all workers (i.e. threads) are done.
while active_count() > 1:
print("Active threads: " + str(active_count()))
sleep(5)
self.HandleResults()
def worker(self, id, printID):
if printID:
print("Starting worker " + str(id) + ".")
while (len(self.Queue) > 0):
self.scraperMethod()
if printID:
print("Worker " + str(id) + " is quiting.")
# Todo Kill is this Thread.
return
def NumWorkers(self):
return 1 # Simplified for testing purposes.
#abstractmethod
def scraperMethod(self):
pass
class TestScraper(ScraperPool):
def scraperMethod(self):
# print("I am scraping.")
# print("Scraping. Threads#: " + str(active_count()))
temp_item = self.Queue[-1]
self.Queue.pop()
self.ResultList.append(temp_item)
def HandleResults(self):
print(self.ResultList)
ScraperPool.register(TestScraper)
scraper = TestScraper(Queue=["Jaap", "Piet"])
scraper.run()
print(threading.active_count())
# print(scraper.ResultList)
When all the threads are done, there's still one active thread - threading.active_count() on the last line gets me that number.
The active thread is <_MainThread(MainThread, started 12960)> - as printed with threading.enumerate().
Can I assume that all my threads are done when active_count() == 1?
Or can, for instance, imported modules start additional threads so that my threads are actually done when active_count() > 1 - also the condition for the loop I'm using in the run method.
You can assume that your threads are done when active_count() reaches 1. The problem is, if any other module creates a thread, you'll never get to 1. You should manage your threads explicitly.
Example: You can put the threads in a list and join them one at a time. The relevant changes to your code are:
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
# Initialize attributes
self.MaxNumWorkers = MaxNumWorkers
self.ItemsPerWorker = ItemsPerWorker
self.Queue = Queue # For testing purposes.
self.WorkerThreads = []
def initWorkerPool(self, PrintIDs=True):
for w in range(self.NumWorkers()):
thread = Thread(target=self.worker, args=(w + 1, PrintIDs,))
self.WorkerThreads.append(thread)
thread.start()
sleep(1) # Explicitly wait one second for this worker to start.
def run(self):
self.initWorkerPool()
# Wait until all workers (i.e. threads) are done. Waiting in order
# so some threads further in the list may finish first, but we
# will get to all of them eventually
while self.WorkerThreads:
self.WorkerThreads[0].join()
self.HandleResults()
according to docs active_count() includes the main thread, so if you're at 1 then you're most likely done, but if you have another source of new threads in your program then you may be done before active_count() hits 1.
I would recommend implementing explicit join method on your ScraperPool and keeping track of your workers and explicitly joining them to main thread when needed instead of checking that you're done with active_count() calls.
Also remember about GIL...
I have an application that fires up a series of threads. Occassionally, one of these threads dies (usually due to a network problem). How can I properly detect a thread crash and restart just that thread? Here is example code:
import random
import threading
import time
class MyThread(threading.Thread):
def __init__(self, pass_value):
super(MyThread, self).__init__()
self.running = False
self.value = pass_value
def run(self):
self.running = True
while self.running:
time.sleep(0.25)
rand = random.randint(0,10)
print threading.current_thread().name, rand, self.value
if rand == 4:
raise ValueError('Returned 4!')
if __name__ == '__main__':
group1 = []
group2 = []
for g in range(4):
group1.append(MyThread(g))
group2.append(MyThread(g+20))
for m in group1:
m.start()
print "Now start second wave..."
for p in group2:
p.start()
In this example, I start 4 threads then I start 4 more threads. Each thread randomly generates an int between 0 and 10. If that int is 4, it raises an exception. Notice that I don't join the threads. I want both group1 and group2 list of threads to be running. I found that if I joined the threads it would wait until the thread terminated. My thread is supposed to be a daemon process, thus should rarely (if ever) hit the ValueError Exception this example code is showing and should be running constantly. By joining it, the next set of threads doesn't begin.
How can I detect that a specific thread died and restart just that one thread?
I have attempted the following loop right after my for p in group2 loop.
while True:
# Create a copy of our groups to iterate over,
# so that we can delete dead threads if needed
for m in group1[:]:
if not m.isAlive():
group1.remove(m)
group1.append(MyThread(1))
for m in group2[:]:
if not m.isAlive():
group2.remove(m)
group2.append(MyThread(500))
time.sleep(5.0)
I took this method from this question.
The problem with this, is that isAlive() seems to always return True, because the threads never restart.
Edit
Would it be more appropriate in this situation to use multiprocessing? I found this tutorial. Is it more appropriate to have separate processes if I am going to need to restart the process? It seems that restarting a thread is difficult.
It was mentioned in the comments that I should check is_active() against the thread. I don't see this mentioned in the documentation, but I do see the isAlive that I am currently using. As I mentioned above, though, this returns True, thus I'm never able to see that a thread as died.
I had a similar issue and stumbled across this question. I found that join takes a timeout argument, and that is_alive will return False once the thread is joined. So my audit for each thread is:
def check_thread_alive(thr):
thr.join(timeout=0.0)
return thr.is_alive()
This detects thread death for me.
You could potentially put in an a try except around where you expect it to crash (if it can be anywhere you can do it around the whole run function) and have an indicator variable which has its status.
So something like the following:
class MyThread(threading.Thread):
def __init__(self, pass_value):
super(MyThread, self).__init__()
self.running = False
self.value = pass_value
self.RUNNING = 0
self.FINISHED_OK = 1
self.STOPPED = 2
self.CRASHED = 3
self.status = self.STOPPED
def run(self):
self.running = True
self.status = self.RUNNING
while self.running:
time.sleep(0.25)
rand = random.randint(0,10)
print threading.current_thread().name, rand, self.value
try:
if rand == 4:
raise ValueError('Returned 4!')
except:
self.status = self.CRASHED
Then you can use your loop:
while True:
# Create a copy of our groups to iterate over,
# so that we can delete dead threads if needed
for m in group1[:]:
if m.status == m.CRASHED:
value = m.value
group1.remove(m)
group1.append(MyThread(value))
for m in group2[:]:
if m.status == m.CRASHED:
value = m.value
group2.remove(m)
group2.append(MyThread(value))
time.sleep(5.0)
I have recently posted a question about how to postpone execution of a function in Python (kind of equivalent to Javascript setTimeout) and it turns out to be a simple task using threading.Timer (well, simple as long as the function does not share state with other code, but that would create problems in any event-driven environment).
Now I am trying to do better and emulate setInterval. For those who are not familiar with Javascript, setInterval allows to repeat a call to a function every x seconds, without blocking the execution of other code. I have created this example decorator:
import time, threading
def setInterval(interval, times = -1):
# This will be the actual decorator,
# with fixed interval and times parameter
def outer_wrap(function):
# This will be the function to be
# called
def wrap(*args, **kwargs):
# This is another function to be executed
# in a different thread to simulate setInterval
def inner_wrap():
i = 0
while i != times:
time.sleep(interval)
function(*args, **kwargs)
i += 1
threading.Timer(0, inner_wrap).start()
return wrap
return outer_wrap
to be used as follows
#setInterval(1, 3)
def foo(a):
print(a)
foo('bar')
# Will print 'bar' 3 times with 1 second delays
and it seems to me it is working fine. My problem is that
it seems overly complicated, and I fear I may have missed a simpler/better mechanism
the decorator can be called without the second parameter, in which case it will go on forever. When I say foreover, I mean forever - even calling sys.exit() from the main thread will not stop it, nor will hitting Ctrl+c. The only way to stop it is to kill python process from the outside. I would like to be able to send a signal from the main thread that would stop the callback. But I am a beginner with threads - how can I communicate between them?
EDIT In case anyone wonders, this is the final version of the decorator, thanks to the help of jd
import threading
def setInterval(interval, times = -1):
# This will be the actual decorator,
# with fixed interval and times parameter
def outer_wrap(function):
# This will be the function to be
# called
def wrap(*args, **kwargs):
stop = threading.Event()
# This is another function to be executed
# in a different thread to simulate setInterval
def inner_wrap():
i = 0
while i != times and not stop.isSet():
stop.wait(interval)
function(*args, **kwargs)
i += 1
t = threading.Timer(0, inner_wrap)
t.daemon = True
t.start()
return stop
return wrap
return outer_wrap
It can be used with a fixed amount of repetitions as above
#setInterval(1, 3)
def foo(a):
print(a)
foo('bar')
# Will print 'bar' 3 times with 1 second delays
or can be left to run until it receives a stop signal
import time
#setInterval(1)
def foo(a):
print(a)
stopper = foo('bar')
time.sleep(5)
stopper.set()
# It will stop here, after printing 'bar' 5 times.
Your solution looks fine to me.
There are several ways to communicate with threads. To order a thread to stop, you can use threading.Event(), which has a wait() method that you can use instead of time.sleep().
stop_event = threading.Event()
...
stop_event.wait(1.)
if stop_event.isSet():
return
...
For your thread to exit when the program is terminated, set its daemon attribute to True before calling start(). This applies to Timer() objects as well because they subclass threading.Thread. See http://docs.python.org/library/threading.html#threading.Thread.daemon
Maybe these are the easiest setInterval equivalent in python:
import threading
def set_interval(func, sec):
def func_wrapper():
set_interval(func, sec)
func()
t = threading.Timer(sec, func_wrapper)
t.start()
return t
Maybe a bit simpler is to use recursive calls to Timer:
from threading import Timer
import atexit
class Repeat(object):
count = 0
#staticmethod
def repeat(rep, delay, func):
"repeat func rep times with a delay given in seconds"
if Repeat.count < rep:
# call func, you might want to add args here
func()
Repeat.count += 1
# setup a timer which calls repeat recursively
# again, if you need args for func, you have to add them here
timer = Timer(delay, Repeat.repeat, (rep, delay, func))
# register timer.cancel to stop the timer when you exit the interpreter
atexit.register(timer.cancel)
timer.start()
def foo():
print "bar"
Repeat.repeat(3,2,foo)
atexit allows to signal stopping with CTRL-C.
this class Interval
class ali:
def __init__(self):
self.sure = True;
def aliv(self,func,san):
print "ali naber";
self.setInterVal(func, san);
def setInterVal(self,func, san):
# istenilen saniye veya dakika aralığında program calışır.
def func_Calistir():
func(func,san); #calışıcak fonksiyon.
self.t = threading.Timer(san, func_Calistir)
self.t.start()
return self.t
a = ali();
a.setInterVal(a.aliv,5);