I have an application that fires up a series of threads. Occasionally, one of these threads dies (usually due to a network problem). How can I properly detect a thread crash and restart just that thread? Here is example code:
import random
import threading
import time
class MyThread(threading.Thread):
    def __init__(self, pass_value):
        super(MyThread, self).__init__()
        self.running = False
        self.value = pass_value
    def run(self):
        self.running = True
        while self.running:
            time.sleep(0.25)
            rand = random.randint(0,10)
            print threading.current_thread().name, rand, self.value
            if rand == 4:
                raise ValueError('Returned 4!')
if __name__ == '__main__':
    group1 = []
    group2 = []
    for g in range(4):
        group1.append(MyThread(g))
        group2.append(MyThread(g+20))
    for m in group1:
        m.start()
    print "Now start second wave..."
    for p in group2:
        p.start()
In this example, I start 4 threads and then 4 more threads. Each thread randomly generates an int between 0 and 10; if that int is 4, it raises an exception. Notice that I don't join the threads: I want both the group1 and group2 lists of threads to be running. I found that if I joined the threads, the program would wait until the thread terminated. My thread is supposed to be a daemon process, so it should rarely (if ever) hit the ValueError exception this example code is showing, and it should be running constantly. By joining it, the next set of threads doesn't begin.
How can I detect that a specific thread died and restart just that one thread?
I have attempted the following loop right after my for p in group2 loop.
while True:
    # Create a copy of our groups to iterate over,
    # so that we can delete dead threads if needed
    for m in group1[:]:
        if not m.isAlive():
            group1.remove(m)
            group1.append(MyThread(1))
    for m in group2[:]:
        if not m.isAlive():
            group2.remove(m)
            group2.append(MyThread(500))
    time.sleep(5.0)
I took this method from this question.
The problem with this is that isAlive() seems to always return True, because the threads never restart.
Edit
Would it be more appropriate in this situation to use multiprocessing? I found this tutorial. Is it more appropriate to have separate processes if I am going to need to restart the process? It seems that restarting a thread is difficult.
It was mentioned in the comments that I should check is_active() on the thread. I don't see this mentioned in the documentation, but I do see the isAlive() that I am currently using. As I mentioned above, though, this returns True, so I'm never able to see that a thread has died.
I had a similar issue and stumbled across this question. I found that join takes a timeout argument, and that is_alive will return False once the thread is joined. So my audit for each thread is:
def check_thread_alive(thr):
    thr.join(timeout=0.0)
    return thr.is_alive()
This detects thread death for me.
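Whichever detection method is used, the dead thread itself cannot be revived. A threading.Thread object can only be started once, and calling start() on it again raises RuntimeError. "Restarting" therefore means building a new Thread from the same arguments and starting it, which the loop in the question never does for its replacements. Below is a minimal Python 3 sketch of such a supervisor. It assumes each worker can be recreated from its value alone. `Worker` and `supervise` are hypothetical names, and the crashing workers print their tracebacks to stderr through the default exception hook.

```python
import random
import threading
import time


class Worker(threading.Thread):
    """Hypothetical stand-in for MyThread: loops until asked to stop,
    and occasionally dies with an uncaught exception."""

    def __init__(self, value, stop_event):
        super().__init__(daemon=True)
        self.value = value
        self.stop_event = stop_event

    def run(self):
        while not self.stop_event.is_set():
            time.sleep(0.01)
            if random.randint(0, 10) == 4:
                raise ValueError("Returned 4!")  # the thread dies here


def supervise(values, stop_event, rounds, interval=0.05):
    """Check the workers every `interval` seconds, `rounds` times, and
    replace each dead one with a new Worker built from the same value.
    Returns how many restarts were needed."""
    workers = {v: Worker(v, stop_event) for v in values}
    for w in workers.values():
        w.start()
    restarts = 0
    for _ in range(rounds):
        time.sleep(interval)
        for v, w in list(workers.items()):
            if not w.is_alive():
                # start() may only be called once per Thread object,
                # so "restarting" means starting a fresh one.
                new_worker = Worker(v, stop_event)
                new_worker.start()
                workers[v] = new_worker
                restarts += 1
    stop_event.set()  # ask the surviving workers to leave their loops
    for w in workers.values():
        w.join()
    return restarts
```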
You could put a try/except around the code where you expect the crash (if it can happen anywhere, put it around the whole body of run) and keep a status variable that records the thread's state.
So something like the following:
class MyThread(threading.Thread):
    def __init__(self, pass_value):
        super(MyThread, self).__init__()
        self.running = False
        self.value = pass_value
        self.RUNNING = 0
        self.FINISHED_OK = 1
        self.STOPPED = 2
        self.CRASHED = 3
        self.status = self.STOPPED
    def run(self):
        self.running = True
        self.status = self.RUNNING
        # Wrap the whole loop, so that an exception really ends the thread
        # and leaves a CRASHED status behind for the supervising loop
        try:
            while self.running:
                time.sleep(0.25)
                rand = random.randint(0,10)
                print threading.current_thread().name, rand, self.value
                if rand == 4:
                    raise ValueError('Returned 4!')
            self.status = self.FINISHED_OK
        except Exception:
            self.status = self.CRASHED
Then you can use your loop:
while True:
    # Create a copy of our groups to iterate over,
    # so that we can delete dead threads if needed
    for m in group1[:]:
        if m.status == m.CRASHED:
            value = m.value
            group1.remove(m)
            new_thread = MyThread(value)
            new_thread.start()  # a new Thread must be started explicitly
            group1.append(new_thread)
    for m in group2[:]:
        if m.status == m.CRASHED:
            value = m.value
            group2.remove(m)
            new_thread = MyThread(value)
            new_thread.start()
            group2.append(new_thread)
    time.sleep(5.0)
Related
I'm trying to implement a queue. This is old code that came either from a tutorial I did some time ago, from experimenting while reading the docs, or from a mix of the two. The thing is, I'm not sure whether the code is mine, but I'm trying to use it as an example to learn from. The script has a producer that produces numbers into a list and 2 consumers competing to grab those numbers and add them up; the one with the highest sum wins.
So, here's my question: in the consume_numbers function in the following code, I have a time.sleep(0.01) line that makes the code run. Without it the code hangs; with it, it runs smoothly. Can someone explain why this happens and how I could implement a queue without this issue?
import concurrent.futures
import time
import random
import threading
import queue
class MyQueue(queue.Queue):
    def __init__(self, maxsize=10):
        super().__init__()
        self.maxsize = maxsize
        self.numbers = []
    def set_number(self, number):
        self.put(number)
        self.numbers.append(number)
    def get_number(self):
        return self.get()
def produce_random_numbers(q: MyQueue, maxcount: int, evnt: threading.Event):
    count = 0
    while not evnt.is_set():
        num = random.randint(1, 5)
        q.set_number(num)
        count += 1
        if count > maxcount:
            event.set()
def consume_numbers(q: MyQueue, consumed: list, evnt: threading.Event):
    while not q.empty() or not evnt.is_set():
        num = q.get_number()
        time.sleep(0.01)
        consumed.append(num)
if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    event = threading.Event()
    cons1 = []
    cons2 = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 50, event)
        ex.submit(consume_numbers, q, cons1, event)
        ex.submit(consume_numbers, q, cons2, event)
    event.set()
    print(f'Generated Numbers: {q.numbers}')
    print(f'Numbers Consumed by Thread1 which summed up to {sum(cons1)} are: {cons1}')
    print(f'Numbers Consumed by Thread2 which summed up to {sum(cons2)} are: {cons2}')
    if sum(cons1) > sum(cons2):
        print("Thread1 Wins!")
    elif sum(cons1) < sum(cons2):
        print("Thread2 Wins!")
    else:
        print("It's a tie!")
Thanks!
The code does not implement a queue from scratch but extends queue.Queue to add memory. An event object is used to signal to the consumers that the producer thread has finished. There are hidden race conditions in the consumers when there is only one item on the queue.
The check not q.empty() or not evnt.is_set() runs the loop body if there is something in the queue or if the event has not been set. It could happen that:
1. One thread sees that the queue is not empty and enters the loop.
2. A thread switch happens, and the other thread consumes the last item.
3. A switch back to the first thread happens; it calls get_number() and blocks.
A similar race condition happens with the evnt.is_set() check:
1. The producer adds the last item to the queue, and a thread switch happens before it sets the event.
2. A consumer gets the last item and goes back to the loop condition.
3. As the event has not been set yet, the loop body runs again and get_number() blocks.
Having the consumers sleep minimizes the chance of these conditions happening. Without the sleep, it is very likely that a single consumer thread will consume all the queue items while the other one is still entering its loop.
Using timeouts is cumbersome. A useful idiom that avoids events altogether is to use iter() with a value that can never be a real item (here None) as a sentinel:
# --- snip ---
def produce_random_numbers(q: MyQueue, maxcount: int, n_consumers: int):
    for _ in range(maxcount):
        num = random.randint(1, 5)
        q.set_number(num)
    for _ in range(n_consumers):
        q.put(None)  # <--- I use put to put one sentinel per consumer
def consume_numbers(q: MyQueue, consumed: list):
    for num in iter(q.get_number, None):
        consumed.append(num)
if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    cons1 = []
    cons2 = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 500000, 2)
        ex.submit(consume_numbers, q, cons1)
        ex.submit(consume_numbers, q, cons2)
    print(f'Generated Numbers: {q.numbers}')
    # --- snip ---
There are some other issues and things I would have done differently:
- The event.set() after the with... block is useless: the event has already been set by the producer.
- There is a typo in the producer: the global event variable is used instead of the local evnt parameter. Fortunately, both refer to the same object.
- As there is only one producer, there will be no problem here. Otherwise, the order of MyQueue.numbers could differ from the order in which the items were added to the queue:
  1. put is called on one thread
  2. a thread switch happens
  3. a put + append happens in the new thread
  4. a thread switch happens, and the first value is appended
- Instead of defining MyQueue.set_number, I would have overridden put.
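For the last point, one way to override put is sketched below. This is an assumption, not code from the question. It holds an extra lock across put() and append(), so numbers keeps the same order as the queue even with several producers (the interleaving listed above can't happen), and it skips the None sentinels. It also passes maxsize on to queue.Queue instead of setting the attribute afterwards.

```python
import queue
import threading


class MyQueue(queue.Queue):
    """Queue that also remembers every real item that was put on it."""

    def __init__(self, maxsize=10):
        super().__init__(maxsize=maxsize)  # let queue.Queue enforce maxsize
        self.numbers = []
        self._record_lock = threading.Lock()

    def put(self, item, block=True, timeout=None):
        # Keep put + append together so two producers can't interleave
        # between the two steps and scramble the order of self.numbers.
        with self._record_lock:
            super().put(item, block, timeout)
            if item is not None:           # don't record the sentinels
                self.numbers.append(item)
```

Since set_number then disappears, the producer simply calls q.put(num). A producer blocked on a full queue keeps holding the extra lock, which only delays other producers. Consumers never take that lock, so they can still drain the queue.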
I am running a Python script every hour, and I've been using time.sleep(3600) inside a while loop. It seems to work as needed, but I am worried about it blocking new tasks. My research suggests that it only blocks the current thread, but I want to be 100% sure. The hourly job shouldn't take more than 15 min, but if it does, or if it hangs, I don't want it to block the next one from starting. This is how I've done it:
import threading
import time
def long_hourly_job():
    # do some long task
    pass
if __name__ == "__main__":
    while True:
        thr = threading.Thread(target=long_hourly_job)
        thr.start()
        time.sleep(3600)
Is this sufficient?
Also, the reason I am using time.sleep for this hourly job rather than a cron job is that I want to do everything in code to make dockerization cleaner.
The code will work (i.e. sleep only blocks the calling thread), but you should be careful of some issues. Some of them have already been stated in the comments, like the possibility of time overlaps between threads.
The main issue is that your code is slowly leaking resources. After creating a thread, the OS keeps some data structures even after the thread has finished running. This is necessary, for example, to keep the thread's exit status until the thread's creator requires it. The function that clears these structures (conceptually equivalent to closing a file) is called join. A thread that has finished running and has not been joined is termed a 'zombie thread'. These structures need very little memory, and your program would have to run for centuries before exhausting any reasonable amount of RAM. Nevertheless, it is good practice to join all the threads you create.
A simple approach (if you know that 3600 s is more than enough time for the thread to finish) would be:
if __name__ == "__main__":
    while True:
        thr = threading.Thread(target=long_hourly_job)
        thr.start()
        thr.join(3600)  # wait at most 3600 s for the thread to finish
        if thr.is_alive():  # join() always returns None, so ask is_alive()
            print("Ooops: the last job did not finish on time")
A better approach if you think that it is possible that sometimes 3600 s is not enough time for the thread to finish:
if __name__ == "__main__":
    previous = []
    while True:
        thr = threading.Thread(target=long_hourly_job)
        thr.start()
        previous.append(thr)
        time.sleep(3600)
        for i in reversed(range(len(previous))):
            t = previous[i]
            t.join(0)
            if t.is_alive():
                print("Ooops: thread still running")
            else:
                print("Thread finished")
                previous.remove(t)
I know that the print calls are only placeholders: use logging instead.
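Along those lines, a version that logs instead of printing might look like the sketch below. `run_periodically` and its `interval` and `iterations` parameters are hypothetical; the parameters exist only so it can be tried with short intervals. As a design choice, it also schedules against time.monotonic(), so the start times don't drift by however long each loop iteration takes.

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("hourly")


def run_periodically(job, interval=3600.0, iterations=None):
    """Start job() in a new thread every `interval` seconds.

    Unfinished jobs are never waited on: they are logged and kept in
    `running` until a later check finds them finished. Returns the
    threads that were still running when the loop ended.
    """
    running = []
    next_start = time.monotonic()
    count = 0
    while iterations is None or count < iterations:
        thr = threading.Thread(target=job, name=f"job-{count}")
        thr.start()
        running.append(thr)
        count += 1
        next_start += interval
        time.sleep(max(0.0, next_start - time.monotonic()))
        for t in running[:]:          # iterate over a copy while removing
            t.join(0)                 # non-blocking check
            if t.is_alive():
                log.warning("%s is still running", t.name)
            else:
                log.info("%s finished", t.name)
                running.remove(t)
    return running
```

In the question's setting, this would be called as run_periodically(long_hourly_job).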
Perhaps a little late. I tested the code from the other answers, but my main process got stuck (perhaps I'm doing something wrong?). I then tried a different approach. It's based on the threading.Timer class, but tries to emulate the behavior and features of QtCore.QTimer():
import threading
import time
import traceback
class Timer:
    SNOOZE = 0
    ONEOFF = 1
    def __init__(self, timerType=SNOOZE):
        self._timerType = timerType
        self._keep = threading.Event()
        self._timerSnooze = None
        self._timerOneoff = None
    class _SnoozeTimer(threading.Timer):
        # This uses threading.Timer class, but consumes more CPU?!?!?!
        def __init__(self, event, msec, callback, *args):
            threading.Thread.__init__(self)
            self.stopped = event
            self.msec = msec
            self.callback = callback
            self.args = args
        def run(self):
            while not self.stopped.wait(self.msec):
                self.callback(*self.args)
    def start(self, msec: int, callback, *args, start_now=False) -> bool:
        started = False
        if msec > 0:
            if self._timerType == self.SNOOZE:
                if self._timerSnooze is None:
                    self._timerSnooze = self._SnoozeTimer(self._keep, msec / 1000, callback, *args)
                    self._timerSnooze.start()
                    if start_now:
                        callback(*args)
                    started = True
            else:
                if self._timerOneoff is None:
                    # threading.Timer takes the callback's arguments as one sequence
                    self._timerOneoff = threading.Timer(msec / 1000, callback, args)
                    self._timerOneoff.start()
                    started = True
        return started
    def stop(self):
        if self._timerType == self.SNOOZE:
            self._keep.set()
            self._timerSnooze.join()
        else:
            self._timerOneoff.cancel()
            self._timerOneoff.join()
    def is_alive(self):
        if self._timerType == self.SNOOZE:
            isAlive = self._timerSnooze is not None and self._timerSnooze.is_alive() and not self._keep.is_set()
        else:
            isAlive = self._timerOneoff is not None and self._timerOneoff.is_alive()
        return isAlive
    isAlive = is_alive
KEEP = True
def callback():
    global KEEP
    KEEP = False
    print("ENDED", time.strftime("%M:%S"))
if __name__ == "__main__":
    count = 0
    t = Timer(timerType=Timer.ONEOFF)
    t.start(5000, callback)
    print("START", time.strftime("%M:%S"))
    while KEEP:
        if count % 10000000 == 0:
            print("STILL RUNNING")
        count += 1
Notice that the timer runs in a separate thread and invokes a callback function when the time is over, while the main thread keeps running its own while loop. In your case, this callback function would be used to check whether the long-running process has finished.
This question already has answers here:
Is there any way to kill a Thread?
(31 answers)
Closed 2 years ago.
I want to create threads that will add something to an array, but, if they don't do that in less than 2 seconds, I want to terminate them.
This is a proof of concept, so the code is simple. Every second, I want a thread to add its item to the list, so the threads add their items after 0, 1, 2, 3 and 4 seconds. The idea is to not let threads 3 and 4 run.
import threading, time
myList = []
def foo(value):
    global myList
    time.sleep(value)
    print("Value: {}".format(value))
    myList.append(value)
threads = []
for i in range(5):
    th = threading.Thread(target=foo, args=(i,))
    threads.append(th)
for th in threads:
    th.start()
What do I do now? I tried some other logic, like calling
th.join(timeout)
but that doesn't seem to work.
As I said in a comment, you can't really "kill" a thread (externally). However, a thread can "commit suicide" by returning or raising an exception.
Below is an example of doing the latter when the thread's execution time has exceeded a given amount of time. Note that this is not the same as a join(timeout) call, which only blocks until the thread ends or the specified amount of time has elapsed. That's why, with join(timeout), value is still printed and appended to the list whether or not the thread finishes before the call to join() times out.
I got the basic idea of using sys.settrace() from the tutorial titled Different ways to kill a Thread — although my implementation is slightly different. Also note that this approach likely introduces a significant amount of overhead.
import sys
import threading
import time
class TimelimitedThread(threading.Thread):
    def __init__(self, *args, time_limit, **kwargs):
        self.time_limit = time_limit
        self._run_backup = self.run # Save superclass run() method.
        self.run = self._run # Change it to custom version.
        super().__init__(*args, **kwargs)
    def _run(self):
        self.start_time = time.time()
        sys.settrace(self.globaltrace)
        self._run_backup() # Call superclass run().
        self.run = self._run_backup # Restore original.
    def globaltrace(self, frame, event, arg):
        return self.localtrace if event == 'call' else None
    def localtrace(self, frame, event, arg):
        if(event == 'line' and
           time.time()-self.start_time > self.time_limit): # Over time?
            raise SystemExit() # Terminate thread.
        return self.localtrace
THREAD_TIME_LIMIT = 2.1 # Secs
threads = []
my_list = []
def foo(value):
    global my_list
    time.sleep(value)
    print("Value: {}".format(value))
    my_list.append(value)
for i in range(5):
    th = TimelimitedThread(target=foo, args=(i,), time_limit=THREAD_TIME_LIMIT)
    threads.append(th)
for th in threads:
    th.start()
for th in threads:
    th.join()
print('\nResults:')
print('my_list:', my_list)
Output:
Value: 0
Value: 1
Value: 2
Results:
my_list: [0, 1, 2]
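When the worker can be written to cooperate, the "commit suicide by returning" route mentioned above needs no trace function. Below is a minimal sketch. It assumes the slow part is a wait that can be replaced by Event.wait(): each thread waits on a shared threading.Event instead of calling time.sleep(), and the main thread sets the event once the time limit has passed. `stop` and `TIME_LIMIT` are hypothetical names. Unlike the sys.settrace() approach, this cannot interrupt a thread that is stuck in code that never checks the event.

```python
import threading
import time

TIME_LIMIT = 2.1             # seconds, like THREAD_TIME_LIMIT above
stop = threading.Event()     # set by the main thread when time is up
my_list = []
list_lock = threading.Lock()


def foo(value):
    # Wait up to `value` seconds; wait() returns True early if stop is set.
    if stop.wait(timeout=value):
        return               # time is up: return without touching my_list
    print("Value: {}".format(value))
    with list_lock:
        my_list.append(value)


threads = [threading.Thread(target=foo, args=(i,)) for i in range(5)]
for th in threads:
    th.start()
time.sleep(TIME_LIMIT)
stop.set()                   # threads 3 and 4 are still waiting and will return
for th in threads:
    th.join()
print('my_list:', sorted(my_list))
```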
join() is used to wait for the respective thread to finish; it does not terminate the thread (threading.Thread has no stop() method). You can try as follows:
time.sleep(N)
th.join()
How do I change a parameter of a function running in an infinite loop in a thread (python)?
I am new to threading and Python, but this is what I want to do (simplified):
class myThread (threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
    def run(i):
        self.blink(i)
    def blink(i):
        if i!=0:
            if i==1:
                speed=0.10
            elif i==2:
                speed=0.20
            elif i==3:
                speed=0.30
            while(true):
                print("speed\n")
i=3
blinkThread=myThread(i)
blinkThread.start()
while(i!=0):
    i=input("Enter 0 to Exit or 1/2/3 to continue\n")
    if i!=0:
        blinkThread.run(i)
Now, obviously this code gives errors regarding the run() method. I want to run the function blink() in an infinite loop but change the 'i' variable. I also cannot do it without a thread, because I have other portions of code that are doing parallel tasks. What can I do?
Thanks!
The best thing to learn first is to never change variables from different threads. Communicate over queues instead:
import threading
import queue
def drive(speed_queue):
    speed = 1
    while True:
        try:
            speed = speed_queue.get(timeout=1)
            if speed == 0:
                break
        except queue.Empty:
            pass
        print("speed:", speed)
def main():
    speed_queue = queue.Queue()
    threading.Thread(target=drive, args=(speed_queue,)).start()
    while True:
        speed = int(input("Enter 0 to Exit or 1/2/3 to continue: "))
        speed_queue.put(speed)
        if speed == 0:
            break
main()
Besides a lot of syntax errors, you're approaching the whole process wrong - there is no point in delegating the work from run to another method, and even if there were, the while(true) loop in blink() would loop forever (if it were actually written as while True:) without ever checking for a speed change.
Also, don't use the run() method to interface with your thread - it's a special method that gets called when the thread starts; you should handle your own updates separately.
You should also devote some time to learn OOP in Python as that's not how one makes a class.
Here's an example that does what you want, hope it might help you:
import threading
import time
class MyThread (threading.Thread):
    def __init__(self, speed=0.1):
        self._speed_cache = 0
        self.speed = speed
        self.lock = threading.RLock()
        super(MyThread, self).__init__()
    def set_speed(self, speed): # you can use a proper setter if you want
        with self.lock:
            self.speed = speed
    def run(self):
        while True:
            with self.lock:
                if self.speed == 0:
                    print("Speed dropped to 0, exiting...")
                    break
                # just so we don't continually print the speed, print only on change
                if self.speed != self._speed_cache:
                    print("Current speed: {}".format(self.speed))
                    self._speed_cache = self.speed
            time.sleep(0.1) # let it breathe
try:
    input = raw_input # add for Python 2.6+ compatibility
except NameError:
    pass
current_speed = 3 # initial speed
blink_thread = MyThread(current_speed)
blink_thread.start()
while current_speed != 0: # main loop until 0 speed is selected
    time.sleep(0.1) # wait a little for an update
    current_speed = int(input("Enter 0 to Exit or 1/2/3 to continue\n")) # add validation?
    blink_thread.set_speed(current_speed)
Also, do note that in CPython, threading does not execute Python code in parallel: it uses the GIL to switch between threads, so two threads never execute Python bytecode at exactly the same time. The mutex (lock) in this sense is there to make sure that a sequence of operations on speed is not interrupted halfway by such a switch.
If you need something to actually execute in parallel (if you have more than one core, that is), you'll need to use multiprocessing instead.
class Job(object):
    def __init__(self, name):
        self.name = name
        self.depends = []
        self.waitcount = 0
    def work(self):
        pass # does some work
    def add_dependent(self, another_job):
        self.depends.append(another_job)
        self.waitcount += 1
So waitcount is based on the number of jobs you have in depends.
job_board = {}
# create a dependency tree
for i in range(1000):
    # create random jobs
    j = Job(<new name goes here>)
    # add jobs to depends if dependent
    # record it in job_board
    job_board[j.name] = j
# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2
rdyQ = Queue.Queue()
def worker():
    try:
        job = rdyQ.get()
        success = job.work()
        # if this job was successful create dependent jobs
        if success:
            for dependent_job in job.depends:
                dependent_job.waitcount -= 1
                if dependent_job.waitcount == 0:
                    rdyQ.put(dependent_job)
And then I would create threads:
for i in range(10):
    t = threading.Thread( target=worker )
    t.daemon=True
    t.start()
for job_name, job_obj in job_board.iteritems():
    if job_obj.waitcount == 0:
        rdyQ.put(job_obj)
while True:
    # until all jobs finished wait
Now here is an example:
# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2
Now, in this scenario, if jobA and jobB are both running and both try to decrement the waitcount of jobC, weird things happen.
So I put a lock:
waitcount_lock = threading.Lock()
and changed this code to:
# if this job was successful create dependent jobs
if success:
    for dependent_job in job.depends:
        with waitcount_lock:
            dependent_job.waitcount -= 1
            if dependent_job.waitcount == 0:
                rdyQ.put(dependent_job)
And strange things still happen: the same job is processed by multiple threads, as if it had been put into the queue twice.
Is it not good practice to have/modify nested objects when complex objects are being passed among threads?
Here's a complete, executable program that appears to work fine. I expect you're mostly seeing "weird" behavior because, as I suggested in a comment, you're counting job successors instead of job predecessors. So I renamed things with "succ" and "pred" in their names to make that much clearer. daemon threads are also usually a Bad Idea, so this code arranges to shut down all the threads cleanly when the work is over. Note too the use of assertions to verify that implicit beliefs are actually true ;-)
import threading
import Queue
import random
NTHREADS = 10
NJOBS = 10000
class Job(object):
    def __init__(self, name):
        self.name = name
        self.done = False
        self.succs = []
        self.npreds = 0
    def work(self):
        assert not self.done
        self.done = True
        return True
    def add_dependent(self, another_job):
        self.succs.append(another_job)
        another_job.npreds += 1
def worker(q, lock):
    while True:
        job = q.get()
        if job is None:
            break
        success = job.work()
        if success:
            for succ in job.succs:
                with lock:
                    assert succ.npreds > 0
                    succ.npreds -= 1
                    if succ.npreds == 0:
                        q.put(succ)
        q.task_done()
jobs = [Job(i) for i in range(NJOBS)]
for i, job in enumerate(jobs):
    # pick some random successors
    possible = xrange(i+1, NJOBS)
    succs = random.sample(possible,
                          min(len(possible),
                              random.randrange(10)))
    for succ in succs:
        job.add_dependent(jobs[succ])
q = Queue.Queue()
for job in jobs:
    if job.npreds == 0:
        q.put(job)
print q.qsize(), "ready jobs initially"
lock = threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(q, lock))
           for _ in range(NTHREADS)]
for t in threads:
    t.start()
q.join()
# add sentinels so threads end cleanly
for t in threads:
    q.put(None)
for t in threads:
    t.join()
for job in jobs:
    assert job.done
    assert job.npreds == 0
CLARIFYING THE LOCK
In a sense, the lock in this code protects "too much". The potential problem it's addressing is that multiple threads may try to decrement the .npreds member of the same Job object simultaneously. Without mutual exclusion, the stored value at the end of that may be anywhere from 1 smaller than its initial value, to the correct result (the initial value minus the number of threads trying to decrement it).
But there's no need to also mutate the queue under lock protection. Queues do their own thread-safe locking. So, e.g., the code could be written like so instead:
for succ in job.succs:
    with lock:
        npreds = succ.npreds = succ.npreds - 1
    assert npreds >= 0
    if npreds == 0:
        q.put(succ)
It's generally best practice to hold a lock for as little time as possible. However, I find this rewrite harder to follow. Pick your poison ;-)
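The property the rewrite relies on can be checked in isolation: each thread decides whether to enqueue based on the value it computed inside the critical section. Below is a small Python 3 sketch in which many threads decrement one predecessor count, and exactly one of them sees it reach zero and enqueues the successor. `Pred`, `finish_one_pred` and `run_demo` are hypothetical names. The per-object lock is a choice of this sketch; the answer's single global lock works the same way.

```python
import queue
import threading


class Pred:
    """Stand-in for a Job: just a predecessor count and its lock."""

    def __init__(self, n):
        self.npreds = n
        self.lock = threading.Lock()


def finish_one_pred(succ, q):
    # Decrement and read the result in one critical section ...
    with succ.lock:
        npreds = succ.npreds = succ.npreds - 1
    assert npreds >= 0
    # ... then act on the private copy outside the lock. Only the thread
    # whose decrement produced 0 can see npreds == 0 here.
    if npreds == 0:
        q.put(succ)


def run_demo(nthreads=50):
    """Return (number of times succ was enqueued, final npreds)."""
    succ = Pred(nthreads)
    q = queue.Queue()
    start = threading.Barrier(nthreads)  # release all threads together

    def worker():
        start.wait()
        finish_one_pred(succ, q)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return q.qsize(), succ.npreds
```

Re-reading succ.npreds after releasing the lock would break this guarantee: two threads could both see 0 and enqueue the same job twice.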