Python - Executing code as long as a subprocess is running

Python - Executing code as long as a subprocess is running - python

I would like to run a section of code as long as a forked subprocess (rsync) is running. This is how I did it in my code:
rsync_proc = subprocess.Popen(proc_args, stdout=subprocess.PIPE)
while rsync_proc.poll() == None:
sys.stdout.write('\r'+
rsync_progress_report(source_size_kb, dest, start)),
sys.stdout.flush()
time.sleep(1)
For some reason, this causes the rsync subprocess to get stuck when it's almost finished. The while loop just continues looping with the rsync_proc.poll() returning None.
When I do run this same rsync call without the while loop code, it finishes without a problem.
Thanks in advance.

If you attach strace to your stuck rsync child process, you'll probably see it's blocked writing to stdout.
If it's blocked writing to stdout, it's probably because the pipe is full because you never read from it.
Try reading from the pipe and just discarding the output - or, if you really don't want the output, don't connect it to a pipe in the first place.

Related

python how to kill a popen process, shell false [why not working with standard methods]

I'm trying to kill a subprocess started with:
playing_long = Popen(["omxplayer", "/music.mp3"], stdout=subprocess.PIPE)
and after a while
pid = playing_long.pid
playing_long.terminate()
os.kill(pid,0)
playing_long.kill()
Which doesn't work.
Neither the solution pointed out here
How to terminate a python subprocess launched with shell=True
Noting that I am using threads, and it is not recommended to use preexec_fn when you use threads (or at least this is what I read, anyway it doesn't work either).
Why it is not working? There's no error message in the code, but I have to manually kill -9 the process to stop listening the mp3 file.
Thanks
EDIT:
From here, I have added a wait() after the kill().
Surprisingly, before re-starting the process I check if this is still await, so that I don't start a chorus with the mp3 file.
Without the wait(), the system sees that the process is alive.
With the wait(), the system understands that the process is dead and starts again it.
However, the process is still sounding. Definitively I can't seem to get it killed.
EDIT2: The problem is that omxplayer starts a second process that I don't kill, and it's the responsible for the actual music.
I've tried to use this code, found in several places in internet, it seems to work for everyone but not for me
playing_long.stdin.write('q')
playing_long.stdin.flush()
And it prints 'NoneType' object has no attribute 'write'. Even when using this code immediately after starting the popen process, it fails with the same message
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write('q')
playing_long.stdin.flush()
EDIT3: The problem then was that I wasn't establishing the stdin line in the popen line. Now it is
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
*needing to specify that it is bytes what I write in stdin

Final solution then (see the process edited in the question):
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()

Run tool via Python in cmd does not wait

I'm running a tool via Python in cmd. For each sample in a given directory I want that tool to do something. However, when I use process = subprocess.Popen(command) in the loop, the commands does not wait untill its finished, resulting in 10 prompts at once. And when I use subprocess.Popen(command, stdout=subprocess.PIPE) the command remains black and I can't see the progress, although it does wait untill the command is finished.
Does anyone know a way how to call an external tool via Python in cmd, that does wait untill the command is finished and thats able to show the progress of the tool in the cmd?
#main.py
for sample in os.listdir(os.getcwd()):
if ".fastq" in sample and '_R1_' in sample and "Temp" not in sample:
print time.strftime("%H:%M:%S")
DNA_Bowtie2.DNA_Bowtie2(os.getcwd()+'\\'+sample+'\\'+sample)
#DNA_Bowtie2.py
# Run Bowtie2 command and wait for process to be finished.
process = subprocess.Popen(command, stdout=subprocess.PIPE)
process.wait()
process.stdout.read()
Edit: command = a perl or java command. With above make-up I cannot see tool output since the prompt (perl window, or java window) remains black.

It seems like your subprocess forks otherwise there is no way the wait() would return before the process has finished.

The order is important here: first read the output, then wait.
If you do it this way:
process.wait()
process.stdout.read()
you can experience a deadlock if the pipe buffer is completely full: the subprocess blocks on waiting on stdout and never reaches the end, your program blocks on wait() and never reaches the read().
Do instead
process.stdout.read()
process.wait()
which will read until EOF.
This holds for if you want the stdout of the process at all.
If you don't want that, you should omit the stdout=PIPE stuff. Then the output is directed into that prompt window. Then you can omit process.stdout.read() as well.
Normally, the process.wait() should then prevent that 10 instances run at once. If that doesn't work, I don't know why not...

Python Multiprocessing - sending inputs to child processes

I am using the multiprocessing module in python to launch few processes in parallel. These processes are independent of each other. They generate their own output and write out the results in different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool where due to some error condition it goes into a 'prompt' mode and waits for the user input. Now in my python script I use the join method to wait till all the processes finish their tasks. This is causing the whole thing to wait for this erroneous subprocess call. I can put a timeout for each of the process but I do not know in advance how long each one is going to run and hence this option is ruled out.
How do I figure out if any child process is waiting for an user input and how do I send an 'exit' command to it? Any pointers or suggestions to relevant modules in python will be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing
def write_script(fname,e):
f = open(fname,'w')
f.write("Some useful cammnd calling external tool")
f.close()
subprocess.call(['chmod','+x',os.path.abspath(fname)])
return os.path.abspath(fname)
def run_use(mname,script):
print "ssh "+mname+" "+script
subprocess.call(['ssh',mname,script])
if __name__ == '__main__':
dict1 = {}
dict['mod1'] = ['pp1','ext2','les3','pw4']
dict['mod2'] = ['aaa','bbb','ccc','ddd']
machines = ['machine1','machine2','machine3','machine4']
log_file.write(str(dict1.keys()))
for key in dict1.keys():
arr = []
for mod in dict1[key]:
d = {}
arr.append(mod)
if ((mod == dict1[key][-1]) | (len(arr)%4 == 0)):
for i in range(0,len(arr)):
e = arr.pop()
script = write_script(e+"_temp.sh",e)
d[i] = multiprocessing.Process(target=run_use,args=(machines[i],script,))
d[i].daemon = True
for pp in d:
d[pp].start()
for pp in d:
d[pp].join()

Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling the wait() method on it. So, your spare processes could instead create their own subprocess.Popen instances and poll them with poll() method on the object instead of wait() (in a loop with a suitable delay). This leaves them free to remain in communication with the main process so you can, for example, allow the main process to tell the child process to terminate the Popen instance with the terminate() or kill() methods and then itself exit.
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
def run_use(mname,script):
print "ssh "+mname+" "+script
proc = subprocess.Popen(['ssh',mname,script], stdout=subprocess.PIPE)
for line in proc.stdout:
if "UserPrompt>>>" in line:
proc.terminate()
break
In this example we assume that the process either gets hung on on UserPrompt>>> (replace with the appropriate string) or it terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate - you can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
Finally, if you don't know in advance the prompt that will be giving from your process then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running a process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
self.prepare_to_run()
popen_args = self.get_popen_args()
logging.debug("Calling popen with arguments %s" % popen_args)
self.popen = subprocess.Popen(**popen_args)
while True:
outdata = self.popen.stdout.readline()
if not outdata and self.popen.returncode is not None:
# Terminate when we've read all the output and the returncode is set
break
output_consumer.process_output(outdata)
self.popen.poll() # updates returncode so we can exit the loop
output_consumer.finish(self.popen.returncode)
self.post_run()
def get_popen_args(self):
return {
'args': self.command,
'shell': False, # Just being explicit for security's sake
'bufsize': 0, # More likely to see what's being printed as it happens
# Not guarantted since the process itself might buffer its output
# run `python -u` to unbuffer output of a python processes
'cwd': self.get_cwd(),
'env': self.get_environment(),
'stdout': subprocess.PIPE,
'stderr': subprocess.STDOUT,
'close_fds': True, # Doesn't seem to matter
}
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?

The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if script launches a background process that writes to stdout then the pipe will no close. Are you sure no other child process still running?
An idea is to change modes when you see the .returncode has set. Once you know the main process is done, read all its output from buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a several seconds timeout and you can clear the buffer without getting stuck waiting child process.

Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess
f = subprocess.Popen(args='./test.sh',
shell=False,
bufsize=0,
stdin=open(os.devnull, 'rb'),
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
close_fds=True)
while 1:
s = f.stdout.readline()
if not s and f.returncode is not None:
break
print s.strip()
f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.

If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what is that script starting, and if any process is surviving the main script (check when the main script calls exit_group, if strace is still waiting after that, you have a child still alive).

I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and everytime you encounter a \n in stdout_collected, yield the current line.
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""
while self.popen.poll() is None:
try:
fds = select.select([handle], [], [], 0.01)[0]
except select.error, exc:
print exc
break
if len(fds) == 0:
# select timed out, no new data
delta = (datetime.now() - begin).total_seconds()
if delta > timeout:
return stdout_collected
# try longer
continue
else:
# have data, timeout counter resets again
begin = datetime.now()
for fd in fds:
if fd == handle:
data = os.read(handle, 1024)
# can handle the bytes as they come in here
# self._handle_stdout(data)
stdout_collected += data
# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()

Why are you setting the sdterr to STDOUT?
The real benefit of making a communicate() call on a subproces is that you are able to retrieve a tuple containining the stdout response as well as the stderr meesage.
Those might be useful if the logic depends on their succsss or failure.
Also, it would save you from the pain of having to iterate through lines. Communicate() gives you everything and there would be no unresolved questions about whether or not the full message was received

I wrote a demo with bash subprocess that can be easy explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT
p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
outdata = p.stdout.readline()
if not outdata:
break
#output_consumer.process_output(outdata)
print "* " + repr(outdata)
out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: Closing the pipe distinctly before terminating the process. That is why wait() should be used instead of poll()
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

Process called by subprocess.Popen not closing properly when parent ends

I have the simplified following code in Python:
proc_args = "gzip --force file; echo this_still_prints > out"
post_proc = subprocess.Popen(proc_args, shell=True)
while True:
time.sleep(1)
Assume file is big enough to take several seconds do process. If I close the Python process while gzip is still running, it will cause gzip to end, but it will still execute the following line to gzip. I'd like to know why this happens, and if there's a way I can make it to not continue executing the following commands.
Thank you!

A process exiting does not automatically cause all its child processes to be killed. See this question and its related questions for much discussion of this.
gzip exits because the pipe containing its standard input gets closed when the parent exits; it reads EOF and exits. However, the shell that's running the two commands is not reading from stdin, so it doesn't notice this. So it just continues on and executes the echo command (which also doesn't read stdin).

post_proc.kill() I believe is what you are looking for ... but afaik you must explicitly call it
see: http://docs.python.org/library/subprocess.html#subprocess.Popen.kill

I use try-finally in such cases (unfortunately you cannot employ with like you would in file.open()):
proc_args = "gzip --force file; echo this_still_prints > out"
post_proc = subprocess.Popen(proc_args, shell=True)
try:
while True:
time.sleep(1)
finally:
post_proc.kill()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.