I'm working on a programming project: writing a basic P2P file-sharing application in Python. I'm using two threads: a main thread that calls select to wait for input from a list of sockets and sys.stdin (to receive typed commands), and a helper thread that takes status-update messages off a queue and prints them. (It is the only thing that prints anything.)
I'm also required to catch the standard SIGINT and handle it to exit gracefully. I have a quit method that does this; typing 'quit' as a command works just fine. So in the main thread I try setting this method as the handler for SIGINT. As far as I can tell, the process catches the signal and calls the quit method. The helper thread prints a message confirming that it is exiting. But then I get the following error message from the main thread:
Traceback (most recent call last):
File "peer.py", line 226, in <module>
main()
File "peer.py", line 223, in main
p.run()
File "peer.py", line 160, in run
readables, writables, exceptions = select(self.sockets, [], [])
select.error: (4, 'Interrupted system call')
After which the program does still exit. Whereas without the signal handler in place, sending a SIGINT gives me the following:
Traceback (most recent call last):
File "peer.py", line 225, in <module>
main()
File "peer.py", line 222, in main
p.run()
File "peer.py", line 159, in run
readables, writables, exceptions = select(self.sockets, [], [])
KeyboardInterrupt
Which fails to terminate the program; I have to stop and kill it. This is confusing because the SIGINT appears to interrupt the call to select only when it is caught by my custom method (which only puts a message on the print queue and sets a "done" variable). Does anyone know how this can happen? Is it just a bad idea to use signal handlers and threads together?
I'm not sure about using signal handlers to catch this case, but I've found a recipe for handling it on *nix-based systems here: http://code.activestate.com/recipes/496735-workaround-for-missed-sigint-in-multithreaded-prog/
In a nutshell (if I understand correctly):
Before you start any new threads, fork a child process (using os.fork) that carries out the rest of the program, and have the parent process watch for the KeyboardInterrupt.
When the parent catches the keyboard interrupt, you can kill the child process (which by now may have started other threads) using os.kill. This will, in turn, terminate any threads of that child process.
Yes, last night after I stopped working on it I realized that I did want the call to be interrupted. Presumably it was being interrupted so the signal handler could execute. So I just catch the select.error and have it jump to the end of the loop, where it immediately exits and moves on to the cleanup code.
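In case it helps anyone else, the pattern I ended up with looks roughly like this (a minimal sketch with names of my own choosing; on Python 3, select.error is just an alias of OSError, and on Python 2 its args tuple also starts with the errno):

```python
import errno
import select

def interruptible_select(rlist, wlist, xlist, timeout=None, done=lambda: False):
    """select() wrapper: an EINTR caused by a signal handler is not fatal.

    Retries the call unless done() reports that the handler asked us to
    quit, in which case empty ready-lists are returned so the caller can
    fall through to its cleanup code.
    """
    while True:
        try:
            return select.select(rlist, wlist, xlist, timeout)
        except select.error as e:      # alias of OSError on Python 3
            if e.args[0] != errno.EINTR:
                raise                   # a real error; propagate it
            if done():                  # the signal handler set the quit flag
                return [], [], []
            # plain interruption with no quit request: retry the select
```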
I have a python application running on the kivy gui platform, and communicating with an AI game engine via
self.katago_process = subprocess.Popen(
command,
startupinfo=startupinfo, # STARTF_USESHOWWINDOW on windows
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
shell=False,
)
The two outputs are continuously checked in threads, while the input is written to as needed.
with self._lock:
    if self.katago_process:
        # store query + callback
        try:
            self.katago_process.stdin.write((json.dumps(query) + "\n").encode())
            self.katago_process.stdin.flush()  # <- this hangs forever
        except OSError as e:
            self.check_alive(os_error=str(e), exception_if_dead=True)
This has worked without issues, sending hundreds of queries to the engine and getting results back fine.
However, in a new feature which automatically submits a query in the callback from another one (automatically playing out a game to the end), I'm seeing stdin.flush() hang forever, breaking the application.
Specifically on exit, I print the stack tracebacks using:
for threadId, stack in sys._current_frames().items():
    print(f"\n# ThreadID: {threadId}")
    for filename, lineno, name, line in traceback.extract_stack(stack):
        print(f"\tFile: {filename}, line {lineno}, in {name}")
        if line:
            print(f"\t\t{line.strip()}")
Which outputs something like:
# ThreadID: 22216 - a message queue waiting for user input, as expected
# ThreadID: 23044 - the stderr read thread waiting at .readline() as expected
# ThreadID: 26552 - self.katago_process.stdin.flush() , the problematic call that hangs, originating from the stdout read thread
# ThreadID: 8504 - the main gui thread printing this.
Checking with the debugger, I can confirm that the process/pipe is still there, i.e. process.poll() is None. This is expected, as otherwise there would be an OSError anyway. Calling flush in the debugger also hangs forever.
What could cause flush to hang forever, and what can be done about it?
This is on Windows, by the way; I have not been able to reproduce it on Linux, although that could also be down to GPU speed. Slowing down the queries (by asking for more calculation) also prevents the issue. It looks like some kind of weird race condition in a core library to me.
The problem was almost certainly this:
# ThreadID: 26552 - self.katago_process.stdin.flush() , the problematic call that hangs, originating from the stdout read thread
Since the C++ program tries to write and flush its output while the thread that is supposed to read that output is itself blocked in a flush, the two occasionally deadlock.
I solved this by making a write queue+thread such that all the flushing to the external program is done in a separate thread.
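A minimal sketch of that idea (the class and method names here are my own, not from the actual app): a dedicated writer thread drains a queue and is the only thread that ever touches the pipe, so callers merely enqueue and can never block on a full pipe buffer:

```python
import queue
import threading

class PipeWriter:
    """Funnel all writes and flushes to a stream through one thread."""

    def __init__(self, stream):
        self._stream = stream
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, data):
        """Enqueue bytes for writing; returns immediately, never blocks."""
        self._queue.put(data)

    def close(self):
        """Flush remaining items and stop the writer thread."""
        self._queue.put(None)  # sentinel telling the thread to exit
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            self._stream.write(item)
            self._stream.flush()  # only this thread can ever block here
```

Reader threads that want to submit a follow-up query then call send() instead of writing to stdin directly.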
I have two Python scripts foo.py and bar.py, foo.py will call bar.py via os.system().
#foo.py
import os
print os.getpid()
os.system("python dir/bar.py")
#bar.py
import time
time.sleep(10)
print "over"
Say the PID of foo.py is 123. If the program terminates normally, it'll print
123
over
If I type kill 123 while it's running, I'll get the following output
123
Terminated
over
If I press Ctrl-C while it's running, I'll get something like
123
^CTraceback (most recent call last):
File "dir/bar.py", line 4, in <module>
time.sleep(10)
KeyboardInterrupt
But if I type kill -SIGINT 123 while it's running, it seems the program will just ignore the signal and exit normally.
123
over
It seems to me that,
if I type kill 123, the sub-process will not be affected.
if I type Ctrl-C, both processes will be terminated.
if I type kill -SIGINT 123 while the sub-process is running, the signal will be ignored.
Can someone please explain to me how it works?
Isn't Ctrl-C and kill -SIGINT supposed to be equivalent?
If I type kill 123 is it guaranteed that the sub-process will not be affected (if it happens to be running)?
I am on Ubuntu 14.04 by the way. Thanks!
Let's consider each case in turn:
if I type kill 123, the sub-process will not be affected.
Yes, that's how kill [pid] works. It sends the signal only to the process you specify. If you want to send the signal to a group of processes, you have to use a negative number representing the process group ID.
if I type Ctrl-C, both processes will be terminated.
I assume you mean "terminated by Ctrl-C". Actually, that's not the case: only the child is terminated. If you add a line like print "I'm a little teapot" at the end of foo.py, you'll see that it gets printed. What happens is that the child gets the signal, and the parent then continues from os.system. Without the additional line it looks like the parent was also affected by the Ctrl-C, but that's not the case, as the additional line shows.
Your shell does send the signal to the process group associated with the tty, which includes the parent. However, os.system uses the system() library call, which ignores the SIGINT and SIGQUIT signals in the process that makes the call. So the parent is immune.
If you do not use os.system, then your process will be affected by the SIGINT. Try this code for foo.py:
import os
import subprocess
print os.getpid()
p = subprocess.Popen(["python", "dir/bar.py"])
p.wait()
print "I'm a little teapot"
If you hit Ctrl-C while this runs, you'll get two tracebacks: one from the parent, one from the child:
$ python foo.py
29626
^CTraceback (most recent call last):
File "dir/bar.py", line 4, in <module>
Traceback (most recent call last):
File "foo.py", line 8, in <module>
time.sleep(10)
KeyboardInterrupt p.wait()
File "/usr/lib/python2.7/subprocess.py", line 1389, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
return func(*args)
KeyboardInterrupt
if I type kill -SIGINT 123 while the sub-process is running, the signal will be ignored.
See above.
Isn't Ctrl-C and kill -SIGINT supposed to be equivalent?
Ctrl-C does send a SIGINT, but it goes to the foreground process group associated with the tty in which you issue the Ctrl-C, not to a single process.
If I type kill 123 is it guaranteed that the sub-process will not be affected (if it happens to be running)?
By itself kill 123 will send the signal only to the process with pid 123. Children won't be affected.
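To make the process-group behaviour concrete, here is a small sketch (Unix only; it uses Python 3's start_new_session, which runs setsid() in the child so the child leads a fresh group):

```python
import os
import signal
import subprocess

# start_new_session=True makes the child the leader of a new process
# group, separate from the parent's group
p = subprocess.Popen(["sleep", "30"], start_new_session=True)

# signal the whole group, the equivalent of `kill -TERM -<pgid>` in a shell
os.killpg(os.getpgid(p.pid), signal.SIGTERM)
p.wait()  # the child's returncode is now -signal.SIGTERM
```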
I've developed a program in Python and PyGTK, and today added a singleton feature, which prevents it from running if another instance is already running. But now I want to go further and, if it's running, somehow make it call self.window.present() to show it.
So I've been looking at signals, pipes, FIFOs, MQs, sockets, etc. for three hours now! I don't know if I'm just not seeing it or what, but I can't find a way to do this (even though lots of apps do it).
Now, the question would be: how do I send a "signal" to a running instance of the same script (which is not in an infinite loop listening for it, but doing its job), to make it call a function?
I'm trying to send signals, using:
os.kill(int(apid[0]),signal.SIGUSR1)
and receiving them with:
signal.signal(signal.SIGUSR1, self.handler)
def handler(signum, frame):
    print 'Signal handler called with signal', signum
but it kills the running process with
Traceback (most recent call last):
File "./algest_new.py", line 4080, in <module>
gtk.main()
KeyboardInterrupt
The simple answer is, you don't. When you say you have implemented a "singleton feature" I'm not sure exactly what you mean. It seems almost as though you are expecting the code in the second process to be able to see the singleton object in the first one, which clearly isn't possible. But I may have misunderstood.
The usual way to do this is to create a file with a unique name at a known location, typically containing the process id of the running process. If you start your program and it sees the file already present it knows to explain to the user that there's a copy already running. You could also send a signal to that process (under Unix, anyway) to tell it to bring its window to the foreground.
Oh, and don't forget that your program should delete the PIDfile when it terminates :-)
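A rough sketch of that approach (the file location and function name below are made up for illustration, and error handling is kept minimal):

```python
import os
import signal

PIDFILE = "/tmp/myapp.pid"  # hypothetical location for the PID file

def acquire_single_instance():
    """Return True if we are the first instance.

    If another instance appears to be running, send it SIGUSR1 (so it
    can present its window) and return False. A stale PID file, left by
    a crashed instance, is silently taken over.
    """
    if os.path.exists(PIDFILE):
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        try:
            os.kill(pid, signal.SIGUSR1)  # ask the running copy to show itself
            return False
        except OSError:
            pass  # stale PID file: that process is gone, take over
    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))
    return True
```

The running instance would install a SIGUSR1 handler that calls self.window.present() (keeping in mind the pygtk caveat discussed below: the handler must not raise).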
Confusingly, gtk.main will raise the KeyboardInterrupt exception if the signal handler raises any exception. With this program:
import gtk
import signal
def ohno(*args):
    raise Exception("Oh no")
signal.signal(signal.SIGUSR1, ohno)
gtk.main()
After launching, calling os.kill(pid, signal.SIGUSR1) from another process results in this exception:
File "signaltest.py", line 9, in <module>
gtk.main()
KeyboardInterrupt
This seems to be an issue with pygtk - an exception raised by a signal.signal handler in a non-gtk python app will do the expected thing and display the handler's exception (e.g. "Oh no").
So in short: if gtk.main is raising KeyboardInterrupt in response to other signals, check that your signal handlers aren't raising exceptions of their own.
I'm trying to kill the notepad.exe process on windows using this function:
import thread, wmi, os, inspect

print 'CMD: Kill command called'

def kill():
    c = wmi.WMI()
    Commands = ['notepad.exe']
    if Commands[0] != 'All':
        print 'CMD: Killing: ', Commands[0]
        for process in c.Win32_Process():
            if process.Name == Commands[0]:
                process.Terminate()
    else:
        print 'CMD: trying to kill all processes'
        for process in c.Win32_Process():
            if process.ExecutablePath != inspect.getfile(inspect.currentframe()):
                try:
                    process.Terminate()
                except:
                    print 'CMD: Unable to kill: ', process.Name

kill()                             # Works
thread.start_new_thread(kill, ())  # Not working
It works like a charm when I'm calling the function like this:
kill()
But when running the function in a new thread it crashes and I have no idea why.
import thread, wmi, os
import pythoncom

print 'CMD: Kill command called'

def kill():
    pythoncom.CoInitialize()
    . . .
Running Windows functions in threads can be tricky, since it often involves COM objects. Calling pythoncom.CoInitialize() in the thread usually allows you to do it. Also, you may want to take a look at the threading library; it's much easier to deal with than thread.
There are a couple of problems (EDIT: The second problem has been addressed since starting my answer, by "MikeHunter", so I will skip that):
Firstly, your program ends right after starting the thread, taking the thread with it. I will assume this is not a problem long-term because presumably this is going to be part of something bigger. To get around that, you can simulate something else keeping the program going by just adding a time.sleep() call at the end of the script with, say, 5 seconds as the sleep length.
This will allow the program to give us a useful error, which in your case is:
CMD: Kill command called
Unhandled exception in thread started by <function kill at 0x0223CF30>
Traceback (most recent call last):
File "killnotepad.py", line 4, in kill
c = wmi.WMI ()
File "C:\Python27\lib\site-packages\wmi.py", line 1293, in connect
raise x_wmi_uninitialised_thread ("WMI returned a syntax error: you're probably running inside a thread without first calling pythoncom.CoInitialize[Ex]")
wmi.x_wmi_uninitialised_thread: <x_wmi: WMI returned a syntax error: you're probably running inside a thread without first calling pythoncom.CoInitialize[Ex] (no underlying exception)>
As you can see, this reveals the real problem and leads us to the solution posted by MikeHunter.
I have a script that repeatedly runs an Ant buildfile and scrapes the output into a parsable format. When I create the subprocess using Popen, there is a small time window where hitting Ctrl+C will kill the script but will not kill the subprocess running Ant, leaving behind a zombie that keeps printing output to the console and can only be killed via Task Manager. Once Ant has started printing output, hitting Ctrl+C will always kill my script as well as Ant. Is there a way to make it so that hitting Ctrl+C will always kill the subprocess running Ant without leaving a zombie behind?
Also of note: I have a handler for SIGINT that performs a few cleanup operations before calling exit(0). If I manually kill the subprocess in the handler using os.kill(p.pid, signal.SIGTERM) (not SIGINT), then I can successfully kill the subprocess in situations where it would normally zombify. However, when you hit Ctrl+C once Ant has started producing output, you get a stacktrace from subprocess where it is unable to kill the subprocess itself as I have already killed it.
EDIT: My code looked something like:
p = Popen('ls')

def handle_sig_int(signum, stack_frame):
    # perform cleanup
    os.kill(p.pid, signal.SIGTERM)
    exit(0)

signal.signal(signal.SIGINT, handle_sig_int)
p.wait()
Which would produce the following stacktrace when triggered incorrectly:
File "****.py", line ***, in run_test
p.wait()
File "/usr/lib/python2.5/subprocess.py", line 1122, in wait
pid, sts = os.waitpid(self.pid, 0)
File "****.py", line ***, in handle_sig_int
os.kill(p.pid, signal.SIGTERM)
I fixed it by catching the OSError raised by p.wait and exiting:
try:
    p.wait()
except OSError:
    exit('The operation was interrupted by the user')
This seems to work in the vast majority of my test runs. I occasionally get a uname: write error: Broken pipe, though I don't know what causes it. It seems to happen if I time the Ctrl+C just right before the child process can start displaying output.
Call p.terminate() in your SIGINT handler:
if p.poll() is None:  # Child still around?
    p.terminate()     # kill it
[EDIT] Since you're stuck with Python 2.5, use os.kill(p.pid, signal.SIGTERM) instead of p.terminate(). The check should make sure you don't get an exception (or reduce the number of times you get one).
To make it even better, you can catch the exception and check the message. If it means "child process not found", then ignore the exception. Otherwise, rethrow it with raise (no arguments).
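That check-and-ignore idea might look something like this (a sketch; safe_kill is my own name, and ESRCH is the errno meaning "no such process"):

```python
import errno
import os
import signal

def safe_kill(pid, sig=signal.SIGTERM):
    """Send a signal, ignoring 'no such process'; rethrow anything else.

    Returns True if the signal was delivered, False if the target process
    was already gone.
    """
    try:
        os.kill(pid, sig)
        return True
    except OSError as e:
        if e.errno == errno.ESRCH:  # child already exited and was reaped
            return False
        raise  # some other problem: re-raise unchanged
```

Signal number 0 can be used with this helper to merely test whether a process still exists without affecting it.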