I have some code which runs routinely, and every now and then (about once a month) the program seems to hang somewhere, and I'm not sure where.
I thought I would implement what I hoped would be a quick fix (it turned out not to be that quick): checking how long the program has been running. I decided to run the function in a separate thread and, while it is running, check the elapsed time from the main thread.
For example:
import datetime
import threading

def myfunc():
    pass  # Code goes here

t = threading.Thread(target=myfunc)
t.start()
d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        print('Exiting!')
        raise SystemExit(0)
However, this does not close the other thread (myfunc).
What is the best way to go about killing the other thread?
The docs could be clearer about this. Raising SystemExit tells the interpreter to quit, but "normal" exit processing is still done. Part of normal exit processing is .join()-ing all active non-daemon threads. But your rogue thread never ends, so exit processing waits forever to join it.
As @roippi said, you can do
t.daemon = True
before starting it. Normal exit processing does not wait for daemon threads; your OS should then kill them when the main process exits.
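A minimal sketch of the daemon approach, assuming the goal is simply "give up after N seconds". Instead of polling threading.active_count() in a busy loop, it uses Thread.join(timeout), which blocks without burning CPU. The run_with_timeout helper name and the limits are illustrative, not from the original answer.

```python
import threading
import time


def run_with_timeout(func, timeout):
    """Run func in a daemon thread; return True if it finished within timeout seconds."""
    t = threading.Thread(target=func)
    t.daemon = True  # normal exit processing will not wait to join this thread
    t.start()
    t.join(timeout)  # waits (without busy-looping) until func returns or timeout expires
    return not t.is_alive()


def myfunc():
    time.sleep(1)  # placeholder for the code that occasionally hangs


if __name__ == "__main__":
    if not run_with_timeout(myfunc, 60):
        print('Exiting!')
        raise SystemExit(0)  # the hung daemon thread no longer blocks the exit
```

join(timeout) does not raise when the timeout expires, so checking is_alive() afterwards is how the two outcomes are told apart.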
Another alternative:
import os
os._exit(13) # whatever exit code you want goes there
That stops the interpreter "immediately", and skips all normal exit processing.
Pick your poison ;-)
There is no supported way to kill a thread from the outside; the target has to stop itself from within. The best way is with a hook and a queue. It goes something like this:
import datetime
import queue  # named Queue in Python 2
import threading
import time

# add a kill_hook arg to your function; kill_hook
# is a queue used to pass messages from the main thread
def myfunc(*args, kill_hook=None, **kwargs):
    while True:
        time.sleep(0.1)  # Code goes here (placeholder for real work)
        # put this somewhere which is periodically checked.
        # an ideal place to check the hook is when logging
        try:
            if kill_hook.get_nowait():  # or use kill_hook.get(True, 5) to wait longer
                print('Exiting!')
                raise SystemExit(0)  # inside a thread, this ends only this thread
        except queue.Empty:
            pass

q = queue.Queue()  # the queue used to pass the kill call
t = threading.Thread(target=myfunc, kwargs={'kill_hook': q})
t.start()
d1 = datetime.datetime.utcnow()
while t.is_alive():
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        # if your kill criteria are met, put something in the queue
        q.put(1)
        break
    time.sleep(1)
t.join()
I originally found this answer somewhere online, possibly this. Hope this helps!
Another solution would be to run the work in a separate instance of Python, monitor that process, and kill it at the system level with psutil.
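A rough sketch of the separate-process idea. It swaps psutil for the standard library's subprocess module: psutil offers richer process inspection, but Popen.wait(timeout) plus Popen.kill() is enough for a simple watchdog. run_with_watchdog is a hypothetical helper name, and the one-line child script is only a stand-in for the real program.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout):
    """Run cmd in a separate process, killing it if it runs longer than timeout seconds.

    Returns the exit code; on POSIX a negative value means the process was killed by a signal.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, TerminateProcess() on Windows
        return proc.wait()  # reap the killed process


if __name__ == "__main__":
    # Stand-in for the script that sometimes hangs: it just sleeps for two minutes
    hung = [sys.executable, "-c", "import time; time.sleep(120)"]
    print("exit code:", run_with_watchdog(hung, timeout=2))
```

Unlike a thread, a whole process can be killed from outside, which is why this route avoids the problem entirely.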
Wow, I like the daemon and stealth os._exit solutions too!
I am writing a queue-processing application which uses threads to wait for and respond to queue messages delivered to the app. The main part of the application just needs to stay active. Consider code such as:
while True:
    pass
or
while True:
    time.sleep(1)
Which one will have the least impact on a system? What is the preferred way to do nothing, but keep a python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the Python interpreter and run this:
>>> while True:
...     pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
...     time.sleep(1)
...
Which barely even registers in Activity Monitor (I'm using OS X here, but it should be the same on every platform).
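A rough way to check the same thing from inside Python rather than in Activity Monitor is to compare CPU time (time.process_time()) against wall-clock time for both loops. The helper name, the half-second duration and the 0.1 s sleep are arbitrary choices for this sketch.

```python
import time


def cpu_seconds_used(loop_body, wall_seconds=0.5):
    """Run loop_body repeatedly for about wall_seconds; return the CPU time consumed."""
    start_cpu = time.process_time()  # CPU time of this process only, not wall time
    deadline = time.monotonic() + wall_seconds
    while time.monotonic() < deadline:
        loop_body()
    return time.process_time() - start_cpu


busy = cpu_seconds_used(lambda: None)              # like `while True: pass`
idle = cpu_seconds_used(lambda: time.sleep(0.1))   # like `while True: time.sleep(...)`
print(f"busy loop: {busy:.2f}s CPU, sleep loop: {idle:.2f}s CPU")
```

On an otherwise idle machine the busy loop should use roughly as much CPU time as wall time, while the sleeping loop uses almost none.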
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
    a_thread.join()
See: Thread.join
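A self-contained version of that pattern, with a hypothetical worker function standing in for the real queue-handling threads:

```python
import threading
import time


def worker(n, results):
    time.sleep(0.1 * n)  # stands in for real work
    results.append(n)


results = []
your_threads = []
for n in range(3):
    t = threading.Thread(target=worker, args=(n, results))
    t.start()
    your_threads.append(t)  # keep a reference so the thread can be joined later

for a_thread in your_threads:
    a_thread.join()  # blocks, without busy-waiting, until this thread ends

print(sorted(results))  # [0, 1, 2]
```

Joining in order is fine even if a later thread finishes first; join() on an already-finished thread returns immediately.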
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used to wait for long-running background tasks to complete:
from threading import Event, Thread
from time import sleep

def do_task():
    sleep(10)
    print('Task complete.')
    event.set()

event = Event()
Thread(target=do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
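A minimal sketch of keeping the main thread alive with signal.pause() (Unix only). Because pause() returns after any handled signal, it is wrapped in a loop that checks a flag set by the handler. The received list and the wait_for_shutdown name are illustrative; worker threads would be started before calling it.

```python
import signal

received = []


def handler(signum, frame):
    # Runs in the main thread; signal.pause() returns once it has finished
    received.append(signum)


def wait_for_shutdown():
    """Block, using no CPU, until a handled signal (e.g. SIGINT or SIGTERM) arrives."""
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
    while not received:
        signal.pause()  # returns after *any* handled signal, hence the loop
```

There is a small race: a signal handled between the `while not received` check and the pause() call is only noticed at the next signal.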
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context to what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
[Edited according to additional information in comment below.]
Maybe this is obvious, but if you are reading information from blocking sockets, you could have one thread read from the socket and post suitably formatted messages into a Queue, and have the rest of your "worker" threads read from that queue. The workers will then block on reading from the queue, with no need for either pass or sleep.
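A small sketch of that layout, assuming each received chunk counts as one message (real code would need proper message framing). socket.socketpair() stands in for a real network connection, and a None sentinel per worker lets the workers exit once the socket is closed.

```python
import queue
import socket
import threading


def reader(sock, q, n_workers):
    # The only thread that touches the socket: it blocks in recv() and forwards data
    while True:
        data = sock.recv(1024)
        if not data:  # peer closed the connection
            break
        q.put(data.decode())
    for _ in range(n_workers):
        q.put(None)  # one sentinel per worker


def worker(q, results):
    # Blocks in q.get(); no pass or sleep loop is needed
    while True:
        msg = q.get()
        if msg is None:
            break
        results.append(msg.upper())


a, b = socket.socketpair()
q = queue.Queue()
results = []
workers = [threading.Thread(target=worker, args=(q, results)) for _ in range(2)]
for w in workers:
    w.start()
r = threading.Thread(target=reader, args=(b, q, len(workers)))
r.start()
a.sendall(b"hello")
a.close()  # makes recv() in reader return b"" (EOF)
r.join()
for w in workers:
    w.join()
b.close()
```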
Running a method as a background thread with sleep in Python:
import threading
import time

class ThreadingExample(object):
    """ Threading example class
    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor
        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            time.sleep(self.interval)

example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')
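If the background loop needs to be stopped cleanly rather than abandoned at exit, a possible variant (an assumption, not part of the original answer) replaces daemon=True with a threading.Event that serves both as the stop flag and as an interruptible sleep. StoppableExample, stop() and the iterations counter are illustrative names.

```python
import threading
import time


class StoppableExample(object):
    """Like ThreadingExample, but stop() ends the loop instead of relying on daemon=True."""

    def __init__(self, interval=1):
        self.interval = interval
        self.iterations = 0
        self._stop_event = threading.Event()
        self.thread = threading.Thread(target=self.run)
        self.thread.start()

    def run(self):
        while not self._stop_event.is_set():
            self.iterations += 1
            print('Doing something important in the background')
            # Like time.sleep(self.interval), but returns early once stop() sets the event
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
        self.thread.join()


example = StoppableExample(interval=0.5)
time.sleep(1.2)
example.stop()
print('Bye')
```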
I have some test cases where I start a web server process and then run some URL tests to check that every function works.
The server process's start-up time depends on the system it runs on. It's a matter of seconds, and for now I use time.sleep(5).
But honestly I'm not a huge fan of sleep(): it might work on my systems, but what if the tests run on a system where the server needs 6 seconds to start? It's never really safe to go that way.
Tests will then fail for no reason at all.
So the question is: is there a nice way to check whether the process has really started?
I use the Python multiprocessing module.
Example:
from multiprocessing import Process
import testapp.server
import requests
import testapp.config as cfg
import time
p = Process(target=testapp.server.main)
p.start()
time.sleep(5)
testurl=cfg.server_settings["protocol"] + cfg.server_settings["host"] + ":" +str(cfg.server_settings["port"]) + "/test/12"
r = requests.get(testurl)
p.terminate()
assert int(r.text)==12
So it would be nice to avoid the sleep() and really check when the process started ...
You should use is_alive (docs), but that would almost always return True right after you call start() on the process. If you want to make sure the process is already doing something important, there's no getting around time.sleep (at least from this end; see the last paragraph for another idea).
In any case, you could use is_alive like this:
p = Process(target=testapp.server.main)
p.start()
while not p.is_alive():
    time.sleep(0.1)
do_something_once_alive()
As you can see we still need to "sleep" and check again (just 0.1 seconds), but it will probably be much less than 5 seconds until is_alive returns True.
If both is_alive and time.sleep aren't accurate enough for you to know if the process really does something specific yet, and if you're controlling the other program as well, you should have it raise another kind of flag so you know you're good to go.
I suggest creating your process with a connection object as argument (other synchronization primitives may work) and use the send() method within your child process to notify your parent process that business can go on. Use the recv() method on the parent end of the connection object.
import multiprocessing as mp

def worker(conn):
    conn.send(0)  # argument object must be picklable
    # your worker is ready to do work and just signaled it to the parent

if __name__ == "__main__":  # required where child processes are spawned (Windows, macOS)
    out_conn, in_conn = mp.Pipe()
    process = mp.Process(target=worker,
                         args=(out_conn,))
    process.start()
    in_conn.recv()  # Will block until something is received
    # worker in child process signaled it is ready. Business can go on
    process.join()
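For the web-server case specifically, another option (not from the answers above, and only a sketch) is to poll the server's TCP port until it accepts a connection, with an overall timeout instead of a fixed sleep. wait_for_port is a hypothetical helper; in the question's test it might replace time.sleep(5) as `assert wait_for_port(cfg.server_settings["host"], cfg.server_settings["port"])`, assuming "host" holds a bare hostname.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP connection to (host, port) succeeds; return False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True  # something is accepting connections on that port
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    return False
```

A listening port only shows that the server has bound its socket; if the application does more setup after that, the readiness signal sent over a Pipe is the more precise check.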
I was reading about Queue in the Python documentation and this book, and I don't fully understand why my thread hangs. I have the following MCVE:
from threading import Thread
import queue

def print_number(number_queue_display):
    while True:
        number = number_queue_display.get()
        print(number)
        number_queue_display.task_done()

number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,),)
printing_numbers.start()
number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)
number_queue.join()
printing_numbers.join()
The only time it works is if I set the thread to daemon like so:
printing_numbers.setDaemon(True)
but that's because as stated in the Python documentation, the program will exit when only the daemon threads are left. The Python docs example for Queue doesn't use a daemon thread.
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.
Even if I remove the two joins (number_queue.join() and printing_numbers.join()), it still hangs, but I'm unsure why.
Questions:
Why is it hanging?
How do I keep it as a non-daemon thread, but prevent it from hanging?
print_number() is running an infinite loop - it never exits, so the thread never ends. It sits in number_queue_display.get() forever, waiting for another queue item that never appears. Then, since the thread never ends, printing_numbers.join() also waits forever.
So you need some way to tell the thread to quit. One common way is to put a special "sentinel" value on the queue, and have the thread exit when it sees that. For concreteness, here's a complete program, which is very much the same as what you started with. None is used as the sentinel (and is commonly used for this purpose), but any unique object would work. Note that the .task_done() parts were removed, because they no longer serve a purpose.
from threading import Thread
import queue

def print_number(number_queue_display):
    while True:
        number = number_queue_display.get()
        if number is None:
            break
        print(number)

number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,),)
printing_numbers.start()
number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)
number_queue.put(None)  # tell the thread it's done
printing_numbers.join()  # wait for the thread to exit
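If you would rather keep task_done() and number_queue.join() (for example, to wait until a batch is processed before deciding to shut down), a sketch that combines them with the same None sentinel might look like this. The try/finally makes sure every item, including the sentinel, is marked done.

```python
from threading import Thread
import queue


def print_number(number_queue_display):
    while True:
        number = number_queue_display.get()
        try:
            if number is None:
                break  # sentinel: leave the loop so the (non-daemon) thread can end
            print(number)
        finally:
            number_queue_display.task_done()  # runs for every item, sentinel included


number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,))
printing_numbers.start()
for n in (5, 10, 15, 20):
    number_queue.put(n)
number_queue.join()      # returns once all four numbers have been processed
number_queue.put(None)   # now ask the thread to exit
printing_numbers.join()  # returns because the thread has left its loop
```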
I am using the new concurrent.futures module (which also has a Python 2 backport) to do some simple multithreaded I/O. I am having trouble understanding how to cleanly kill tasks started using this module.
Check out the following Python 2/3 script, which reproduces the behavior I'm seeing:
#!/usr/bin/env python
from __future__ import print_function

import concurrent.futures
import time

def control_c_this():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future1 = executor.submit(wait_a_bit, name="Jack")
        future2 = executor.submit(wait_a_bit, name="Jill")

        for future in concurrent.futures.as_completed([future1, future2]):
            future.result()

        print("All done!")

def wait_a_bit(name):
    print("{n} is waiting...".format(n=name))
    time.sleep(100)

if __name__ == "__main__":
    control_c_this()
While this script is running it appears impossible to kill cleanly using the regular Control-C keyboard interrupt. I am running on OS X.
On Python 2.7 I have to resort to kill from the command line to kill the script. Control-C is just ignored.
On Python 3.4, Control-C works if you hit it twice, but then a lot of strange stack traces are dumped.
Most documentation I've found online talks about how to cleanly kill threads with the old threading module. None of it seems to apply here.
And all the methods provided within the concurrent.futures module to stop stuff (like Executor.shutdown() and Future.cancel()) only work when the Futures haven't started yet or are complete, which is pointless in this case. I want to interrupt the Future immediately.
My use case is simple: When the user hits Control-C, the script should exit immediately like any well-behaved script does. That's all I want.
So what's the proper way to get this behavior when using concurrent.futures?
It's kind of painful. Essentially, your worker threads have to finish before your main thread can exit; you cannot exit unless they do. The typical workaround is to have some global state that each thread can check to determine whether it should do more work.
Here's the quote explaining why. In essence, if threads exited when the interpreter does, bad things could happen.
Here's a working example. Note that Ctrl+C takes at most 1 second to propagate, because of the child threads' sleep duration.
#!/usr/bin/env python
from __future__ import print_function

import concurrent.futures
import time
import sys

quit = False

def wait_a_bit(name):
    while not quit:
        print("{n} is doing work...".format(n=name))
        time.sleep(1)

def setup():
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    future1 = executor.submit(wait_a_bit, "Jack")
    future2 = executor.submit(wait_a_bit, "Jill")

    # main thread must be doing "work" to be able to catch a Ctrl+C
    # http://www.luke.maurits.id.au/blog/post/threads-and-signals-in-python.html
    while (not (future1.done() and future2.done())):
        time.sleep(1)

if __name__ == "__main__":
    try:
        setup()
    except KeyboardInterrupt:
        quit = True
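A variant of the same idea, sketched under the assumption of Python 3.9+: a threading.Event replaces the global quit flag, so workers wake up as soon as it is set instead of finishing their current one-second sleep, and Executor.shutdown(cancel_futures=True) drops tasks that were queued but never started. The stop and main names are illustrative; main() would be called from the script's entry point.

```python
import concurrent.futures
import threading
import time

stop = threading.Event()  # replaces the global `quit` flag


def wait_a_bit(name):
    # stop.wait(1) sleeps up to 1 s but returns True immediately once stop is set
    while not stop.wait(1):
        print("{n} is doing work...".format(n=name))
    return name


def main():
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    futures = [executor.submit(wait_a_bit, n) for n in ("Jack", "Jill")]
    try:
        while not all(f.done() for f in futures):
            time.sleep(1)  # the main thread stays responsive to Ctrl+C
    except KeyboardInterrupt:
        print("Ctrl+C: asking workers to stop")
        stop.set()
    finally:
        # cancel_futures (Python 3.9+) drops queued tasks that have not started yet
        executor.shutdown(wait=True, cancel_futures=True)
```

Running tasks still have to notice the event themselves; cancel_futures only affects tasks that have not started.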
I encountered this, but my issue was that many futures (tens of thousands) were waiting to run, and just pressing Ctrl-C left them waiting rather than actually exiting. I was using concurrent.futures.wait to run a progress loop and needed to add a try ... except KeyboardInterrupt to handle cancelling unfinished Futures.
import concurrent.futures

# MAX_WORKERS, do_work and large_set_to_do_work_over are defined elsewhere
POLL_INTERVAL = 5
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(do_work, arg) for arg in large_set_to_do_work_over]
    # next line returns instantly
    done, not_done = concurrent.futures.wait(futures, timeout=0)
    try:
        while not_done:
            # next line 'sleeps' this main thread, letting the thread pool run
            freshly_done, not_done = concurrent.futures.wait(not_done, timeout=POLL_INTERVAL)
            done |= freshly_done
            # more polling stats calculated here and printed every POLL_INTERVAL seconds...
    except KeyboardInterrupt:
        # only futures that are not done will prevent exiting
        for future in not_done:
            # cancel() returns False if it's already done or currently running,
            # and True if it was able to cancel it; we don't need that return value
            _ = future.cancel()
        # wait for running futures that the above for loop couldn't cancel (note timeout)
        _ = concurrent.futures.wait(not_done, timeout=None)
If you're not interested in keeping exact track of what got done and what didn't (i.e. don't want a progress loop), you can replace the first wait call (the one with timeout=0) with not_done = futures and still leave the while not_done: logic.
The for future in not_done: cancel loop can probably behave differently based on that return value (or be written as a comprehension), but waiting for futures that are done or canceled isn't really waiting - it returns instantly. The last wait with timeout=None ensures that pool's running jobs really do finish.
Again, this only works correctly if the do_work that's being called actually, eventually returns within a reasonable amount of time. That was fine for me - in fact, I want to be sure that if do_work gets started, it runs to completion. If do_work is 'endless' then you'll need something like cdosborn's answer that uses a variable visible to all the threads, signaling them to stop themselves.
Late to the party, but I just had the same problem.
I want to kill my program immediately and I don't care what's going on. I don't need a clean shutdown beyond what Linux will do.
I found that replacing geitda's code in the KeyboardInterrupt exception handler with os.kill(os.getpid(), 9) exits immediately after the first ^C.
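import os
import subprocess

main = str(os.getpid())

def ossystem(c):
    return subprocess.Popen(c, shell=True, stdout=subprocess.PIPE).stdout.read().decode("utf-8").strip()

def killexecutor():
    print("Killing")
    lines = ossystem('ps -a | grep scriptname.py').split('\n')
    for line in lines:
        if not line.strip():
            continue
        pid = line.split()[0]  # first column is the PID; split() copes with leading spaces
        if pid != main:
            try:
                os.kill(int(pid), 9)  # 9 = SIGKILL
            except ProcessLookupError:
                pass  # e.g. the short-lived grep process has already exited

...
killexecutor()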
I've seen a lot of questions related to this, but my code works on Python 2.6.2 and fails on Python 2.6.5. Am I wrong in thinking that the atexit caveat ("functions registered via this module are not called when the program is killed by a signal") shouldn't apply here, because I'm catching the signal and then exiting cleanly? What's going on here? What's the proper way to do this?
import atexit, sys, signal, time, threading

terminate = False
threads = []

def test_loop():
    while True:
        if terminate:
            print('stopping thread')
            break
        else:
            print('looping')
            time.sleep(1)

@atexit.register
def shutdown():
    global terminate
    print('shutdown detected')
    terminate = True
    for thread in threads:
        thread.join()

def close_handler(signum, frame):
    print('caught signal')
    sys.exit(0)

def run():
    global threads
    thread = threading.Thread(target=test_loop)
    thread.start()
    threads.append(thread)
    while True:
        time.sleep(2)
        print('main')

signal.signal(signal.SIGINT, close_handler)

if __name__ == "__main__":
    run()
python 2.6.2:
$ python halp.py
looping
looping
looping
main
looping
main
looping
looping
looping
main
looping
^Ccaught signal
shutdown detected
stopping thread
python 2.6.5:
$ python halp.py
looping
looping
looping
main
looping
looping
main
looping
looping
main
^Ccaught signal
looping
looping
looping
looping
...
looping
looping
Killed <- kill -9 process at this point
The main thread on 2.6.5 appears to never execute the atexit functions.
The root difference here is actually unrelated to both signals and atexit, but rather a change in the behavior of sys.exit.
Before around 2.6.5, sys.exit (more accurately, SystemExit being caught at the top level) would cause the interpreter to exit; if threads were still running, they'd be terminated, just as with POSIX threads.
Around 2.6.5, the behavior changed: the effect of sys.exit is now essentially the same as returning from the main function of the program. When you do that--in both versions--the interpreter waits for all threads to be joined before exiting.
The relevant change is that Py_Finalize now calls wait_for_thread_shutdown() near the top, where it didn't before.
This behavioral change seems incorrect, primarily because it no longer functions as documented, which is simply: "Exit from Python." The practical effect is no longer to exit from Python, but simply to exit the thread. (As a side note, sys.exit has never exited Python when called from another thread, but that obscure divergence from documented behavior doesn't justify a much bigger one.)
I can see the appeal of the new behavior: rather than two ways to exit the main thread ("exit and wait for threads" and "exit immediately"), there's only one, as sys.exit is essentially identical to simply returning from the top function. However, it's a breaking change and diverges from documented behavior, which far outweighs that.
Because of this change, after sys.exit from the signal handler above, the interpreter sits around waiting for threads to exit and then runs atexit handlers after they do. Since it's the handler itself that tells the threads to exit, the result is a deadlock.
Exiting due to a signal is not the same as exiting from within a signal handler. Catching a signal and exiting with sys.exit is a clean exit, not an exit due to a signal. So, yes, I agree that it should run atexit handlers here, at least in principle.
However, there's something tricky about signal handlers: they're completely asynchronous. They can interrupt the program flow at any time, between any two VM opcodes. Take this code, for example. (Treat this as the same form as your code above; I've omitted code for brevity.)
import threading
lock = threading.Lock()

def test_loop():
    while not terminate:
        print('looping')
        with lock:
            print("Executing synchronized operation")
        time.sleep(1)
    print('stopping thread')

def run():
    while True:
        time.sleep(2)
        with lock:
            print("Executing another synchronized operation")
        print('main')
There's a serious problem here: a signal (e.g. ^C) may be received while run() is holding lock. If that happens, your signal handler will be run with the lock still held. It'll then wait for test_loop to exit, and if that thread is waiting for the lock, you'll deadlock.
This is a whole category of problems, and it's why a lot of APIs say not to call them from within signal handlers. Instead, you should set a flag to tell the main thread to shut down at an appropriate time.
do_shutdown = False

def close_handler(signum, frame):
    global do_shutdown
    do_shutdown = True
    print('caught signal')

def run():
    while not do_shutdown:
        ...
My preference is to avoid exiting the program with sys.exit entirely and to explicitly do cleanup at the main exit point (e.g. the end of run()), but you can use atexit here if you want.
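Applied to the question's program, the flag approach might look roughly like the sketch below (Python 3). It swaps the plain terminate boolean for a threading.Event so the worker also wakes early from its sleep, keeps the handler down to setting a flag, and does the cleanup at the end of run() instead of in an atexit function. Returning the thread from run() is only there to make it easy to check.

```python
import signal
import threading
import time

terminate = threading.Event()  # tells the worker thread to stop
do_shutdown = False            # set only by the signal handler


def test_loop():
    while not terminate.is_set():
        print('looping')
        terminate.wait(1)  # like time.sleep(1), but wakes up early on shutdown
    print('stopping thread')


def close_handler(signum, frame):
    # Keep the handler trivial; even print() here could collide with a print in progress
    global do_shutdown
    do_shutdown = True


def run():
    signal.signal(signal.SIGINT, close_handler)
    thread = threading.Thread(target=test_loop)
    thread.start()
    while not do_shutdown:
        time.sleep(0.5)
        print('main')
    # Cleanup at the main exit point, outside the signal handler
    print('shutdown detected')
    terminate.set()
    thread.join()
    return thread
```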
I'm not sure whether this was changed entirely, but this is how I set up my atexit handler in 2.6.5:
import atexit

def goodbye():
    print("\nStopping...")

atexit.register(goodbye)