pipe's stdin.flush() hangs forever - python

I have a python application running on the kivy gui platform, and communicating with an AI game engine via
self.katago_proces = subprocess.Popen(
command,
startupinfo=startupinfo, # STARTF_USESHOWWINDOW on windows
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
shell=False,
)
The two outputs are continuously checked in threads, while the input is written to as needed.
with self._lock:
if self.katago_process:
# store query + callback
try:
self.katago_process.stdin.write((json.dumps(query) + "\n").encode())
self.katago_process.stdin.flush() # <- this hangs forever
except OSError as e:
self.check_alive(os_error=str(e), exception_if_dead=True)
This has worked without issues, sending hundreds of queries to the engine and getting results back fine.
However, in a new feature which automatically submits a query in the callback from another one (automatically playing out a game to the end), I'm seeing stdin.flush() hang forever, breaking the application.
Specifically on exit, I print the stack tracebacks using:
for threadId, stack in sys._current_frames().items():
print(f"\n# ThreadID: {threadId}")
for filename, lineno, name, line in traceback.extract_stack(stack):
print(f"\tFile: {filename}, line {lineno}, in {name}")
if line:
print(f"\t\t{line.strip()}")
Which outputs something like:
# ThreadID: 22216 - a message queue waiting for user input, as expected
# ThreadID: 23044 - the stderr read thread waiting at .readline() as expected
# ThreadID: 26552 - self.katago_process.stdin.flush() , the problematic call that hangs, originating from the stdout read thread
# ThreadID: 8504 - the main gui thread printing this.
Checking with the debugger, I can confirm that the process/pipe is still there, i.e. process.poll() is None. This is expected, as otherwise there would be an OSError anyway. Calling flush in the debugger also hangs forever.
What could cause flush to hang forever, and what can be done about it?
This is on windows by the way, I have not been able to reproduce it on linux, although this could also be a gpu speed issue. Slowing down the queries (by asking for more calculation) also prevents the issue. It looks like some kind of weird race condition in the core library to me.

The problem was almost certainly this:
# ThreadID: 26552 - self.katago_process.stdin.flush() , the problematic call that hangs, originating from the stdout read thread
Since the C++ program tries to write+flush, while the thread that is supposed to read this output is in turn flushing, they get deadlocked occasionally.
I solved this by making a write queue+thread such that all the flushing to the external program is done in a separate thread.

Related

Why os.remove() raises exception PermissionError?

On a Windows 7 platform I'm using python 3.6 as a framework to start working processes (written in C).
For starting the processes subprocess.Popen is used. The following shows the relevant code (one thread per process to be started).
redirstream = open(redirfilename, "w")
proc = subprocess.Popen(batchargs, shell=False, stdout=redirstream)
outs, errs = proc.communicate(timeout=60)
# wait for job to be finished
ret = proc.wait()
...
if ret == 0: # changed !!
redirstream.flush()
redirstream.close()
os.remove(redirfilename)
communicate is just used to be able, to terminate the executable after 60 seconds , for the case it hangs. redirstream is used to write output from the executable (written in C) to a file, used for general debugging purposes (not related to this issue). Of course, all processes are passed redirfiles with different filenames.
Up to ten such subprocesses are started in that way from independent python threads.
Although it works, I made a mysterious observation:
For the case an executable has finished without errors, I want to delete redirfilename, because it is not used anymore.
Now lets say, I have started process-A, B and C.
Processes A and B are finished and gave back 0 as result.
Process C however intentionally doesn't get data (just for testing, a serial connection has been disconnected) and waits for input from a named pipe (created from python) using Windows ReadFile function:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
In that case, while "C" is still waiting for ReadFile to be finished, os.remove(redirfilename) for A and B sometimes throws exception "PermissionError", saying, that the file is still used by another process. But from task manager I can see, that the processes A and B are not existing anymore (as expected).
I tried to catch the PermissionError and repeat the delete command after some delay. Only after "C" has terminated (timeout after 60 seconds), the redirfile for A or B can be deleted.
Why is the redirstream still blocked and somehow in use, although the process behind is not alive anymore and why is it blocked by ReadFile() in a completely unrelated process, which is definitely not related to that particular file? Is that an issue in Python or in my implementation?
Any hints are highly appreciated...

Why does Popen.stdout contain only part of output?

I am running two processes simultaneously in python using the subprocess module:
p_topic = subprocess.Popen(['rostopic','echo','/msg/address'], stdout=PIPE)
p_play = subprocess.Popen(['rosbag','play',bagfile_path])
These are ROS processes: p_topic listens for a .bag file to be played and outputs certain information from that .bag file to the stdout stream; I want to then access this output using the p_topic.stdout object (which behaves as a file).
However, what I find happening is that the p_topic.stdout object only contains the first ~1/3 of the output lines it should have - that is, in comparison to running the two commands manually, simultaneously in two shells side by side.
I've tried waiting for many seconds for output to finish, but this doesn't change anything, its approximately the same ratio of lines captured by p_topic.stdout each time. Any hints on what this could be would be greatly appreciated!
EDIT:
Here's the reading code:
#wait for playing to stop
while p_play.poll() == None:
time.sleep(.1)
time.sleep(X)#wait for some time for the p_topic to finish
p_topic.terminate()
output=[]
for line in p_topic.stdout:
output.append(line)
Note that the value X in time.sleep(X) doesn't make any difference
By default, when a process's stdout is not connected to a terminal, the output is block buffered. When connected to a terminal, it's line buffered. You expect to get complete lines, but you can't unless rostopic unbuffers or explicitly line buffers its stdout (if it's a C program, you can use setvbuf to make this automatic).
The other (possibly overlapping) possibility is that the pipe buffer itself is filling (pipe buffers are usually fairly small), and because you never drain it, rostopic fills the pipe buffer and then blocks indefinitely until you kill it, leaving only what managed to fit in the pipe to be drained when you read the process's stdout. In that case, you'd need to either spawn a thread to keep the pipe drained from Python, or have your main thread use select module components to monitor and drain the pipe (intermingled with polling the other process). The thread is generally easier, though you do need to be careful to avoid thread safety issues.
is it worth trying process communicate/wait? rather than sleep and would that solve your issue?
i have this for general purpose so not sure if you can take this and change it to what you need?
executable_Params = "{0} {1} {2} {3} {4}".format(my_Binary,
arg1,
arg2,
arg3,
arg4)
# execute the process
process = subprocess.Popen(shlex.split(executable_Params),
shell=False,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
ret_code = process.wait()
if ret_code == 0:
return 0
else:
#get the correct message from my enum method
error_msg = Process_Error_Codes(ret_code).name
raise subprocess.CalledProcessError(returncode=ret_code,
cmd=executable_Params)

Close Python subprocess.PIPE after process is terminated

I am using Python 2.7.8 to coordinate and automate the running of several application many times over in a Windows environment. During each run, I use subprocess.Popen to launch several child process, passing subprocess.PIPE for stdin and stdout to each as follows:
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
where cmd is the list of arguments.
The script waits for an external trigger to know when a given run is done, and then terminates each application it is currently running by writing a string to the stdin of each Popen object. The applications read this string, and perform its own graceful shutdown (which is why I don't simply call kill() or terminate()).
# Try to shutdown process
timeout = 5
try:
if proc.poll() is None:
proc.stdin.write(cmd)
# Wait to see if proc shuts down gracefully
while timeout > 0:
if proc.poll() is not None:
break
else:
time.sleep(1)
timeout -= 1
else:
# Kill it the old fashioned way
proc.kill()
except Error:
pass # Process as necessary...
Once the applications are complete, I'm left with a Popen object. If I inspect the stdin or stdout members of that object, I get something like the following:
<open file '<fdopen>', mode 'wb' at 0x0277C758>
The script then loops to perform the next run, relaunching the necessary applications.
My question is, do I need to explicitly call close() for the stdin and stdout file descriptors each time, in order to avoid leaks, i.e. in the finally statement above? I am wondering this because it is possible for the loop to occur hundreds, or even thousands of times during a given script.
I've looked through the subprocess.py code, but the file handles for the pipes are created by an apparent Windows(-only) call in the _subprocess module, so I can't get any further detail.
The pipes might eventually be closed during the garbage collection but you should not rely on it and close the pipes explicitly.
def kill_process(process):
if process.poll() is None: # don't send the signal unless it seems it is necessary
try:
process.kill()
except OSError: # ignore
pass
# shutdown process in `timeout` seconds
t = Timer(timeout, kill_process, [proc])
t.start()
proc.communicate(cmd)
t.cancel()
.communicate() method closes the pipes and waits for the child process to exit.

python how to kill a popen process, shell false [why not working with standard methods]

I'm trying to kill a subprocess started with:
playing_long = Popen(["omxplayer", "/music.mp3"], stdout=subprocess.PIPE)
and after a while
pid = playing_long.pid
playing_long.terminate()
os.kill(pid,0)
playing_long.kill()
Which doesn't work.
Neither the solution pointed out here
How to terminate a python subprocess launched with shell=True
Noting that I am using threads, and it is not recommended to use preexec_fn when you use threads (or at least this is what I read, anyway it doesn't work either).
Why it is not working? There's no error message in the code, but I have to manually kill -9 the process to stop listening the mp3 file.
Thanks
EDIT:
From here, I have added a wait() after the kill().
Surprisingly, before re-starting the process I check if this is still await, so that I don't start a chorus with the mp3 file.
Without the wait(), the system sees that the process is alive.
With the wait(), the system understands that the process is dead and starts again it.
However, the process is still sounding. Definitively I can't seem to get it killed.
EDIT2: The problem is that omxplayer starts a second process that I don't kill, and it's the responsible for the actual music.
I've tried to use this code, found in several places in internet, it seems to work for everyone but not for me
playing_long.stdin.write('q')
playing_long.stdin.flush()
And it prints 'NoneType' object has no attribute 'write'. Even when using this code immediately after starting the popen process, it fails with the same message
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write('q')
playing_long.stdin.flush()
EDIT3: The problem then was that I wasn't establishing the stdin line in the popen line. Now it is
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
*needing to specify that it is bytes what I write in stdin
Final solution then (see the process edited in the question):
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
self.prepare_to_run()
popen_args = self.get_popen_args()
logging.debug("Calling popen with arguments %s" % popen_args)
self.popen = subprocess.Popen(**popen_args)
while True:
outdata = self.popen.stdout.readline()
if not outdata and self.popen.returncode is not None:
# Terminate when we've read all the output and the returncode is set
break
output_consumer.process_output(outdata)
self.popen.poll() # updates returncode so we can exit the loop
output_consumer.finish(self.popen.returncode)
self.post_run()
def get_popen_args(self):
return {
'args': self.command,
'shell': False, # Just being explicit for security's sake
'bufsize': 0, # More likely to see what's being printed as it happens
# Not guarantted since the process itself might buffer its output
# run `python -u` to unbuffer output of a python processes
'cwd': self.get_cwd(),
'env': self.get_environment(),
'stdout': subprocess.PIPE,
'stderr': subprocess.STDOUT,
'close_fds': True, # Doesn't seem to matter
}
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if script launches a background process that writes to stdout then the pipe will no close. Are you sure no other child process still running?
An idea is to change modes when you see the .returncode has set. Once you know the main process is done, read all its output from buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a several seconds timeout and you can clear the buffer without getting stuck waiting child process.
Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess
f = subprocess.Popen(args='./test.sh',
shell=False,
bufsize=0,
stdin=open(os.devnull, 'rb'),
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
close_fds=True)
while 1:
s = f.stdout.readline()
if not s and f.returncode is not None:
break
print s.strip()
f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what is that script starting, and if any process is surviving the main script (check when the main script calls exit_group, if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and everytime you encounter a \n in stdout_collected, yield the current line.
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""
while self.popen.poll() is None:
try:
fds = select.select([handle], [], [], 0.01)[0]
except select.error, exc:
print exc
break
if len(fds) == 0:
# select timed out, no new data
delta = (datetime.now() - begin).total_seconds()
if delta > timeout:
return stdout_collected
# try longer
continue
else:
# have data, timeout counter resets again
begin = datetime.now()
for fd in fds:
if fd == handle:
data = os.read(handle, 1024)
# can handle the bytes as they come in here
# self._handle_stdout(data)
stdout_collected += data
# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
Why are you setting the sdterr to STDOUT?
The real benefit of making a communicate() call on a subproces is that you are able to retrieve a tuple containining the stdout response as well as the stderr meesage.
Those might be useful if the logic depends on their succsss or failure.
Also, it would save you from the pain of having to iterate through lines. Communicate() gives you everything and there would be no unresolved questions about whether or not the full message was received
I wrote a demo with bash subprocess that can be easy explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT
p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
outdata = p.stdout.readline()
if not outdata:
break
#output_consumer.process_output(outdata)
print "* " + repr(outdata)
out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: Closing the pipe distinctly before terminating the process. That is why wait() should be used instead of poll()
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

Categories