How to debug subprocess call in Python/Django

In my Python/Django code I call the gdalbuildvrt process. This process should create a .VRT file, but it does not. In order to check it, I write the subprocess output to a debug file. This is how I do it:
process = subprocess.Popen(["gdalbuildvrt", mosaic, folder], stdout=subprocess.PIPE)
stdout = process.communicate()[0]
with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "debug.txt"), 'w') as file:
    file.write('{}'.format(stdout) + " -> " + mosaic)
As a result I see this in debug.txt file:
b'0...10...20...30...40...50...60...70...80...90...100 - done.\n' -> /var/www/vwrapper/accent/accent/layers/raw/mosaic.vrt
So, as you can see, the first part of the debug statement says that it's OK:
0...10...20...30...40...50...60...70...80...90...100 - done.
And the second part says that /var/www/vwrapper/accent/accent/layers/raw/mosaic.vrt should have been created. However, when I go to the target folder and refresh it, I see no mosaic.vrt file there. So what may be wrong, and how can I fix it? I should add that on a Windows machine it works 100%, but on CentOS it does not.

Capture stderr and the return code as well, not just stdout:
process = subprocess.Popen(["gdalbuildvrt", mosaic, folder],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
ret = process.returncode
or
process = subprocess.Popen(["gdalbuildvrt", mosaic, folder],
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
which will redirect stderr to stdout.
Then log both of those. All error output should be on stderr, not stdout, and any failure code will appear via process.returncode.
You could also use one of the higher-level functions, like subprocess.check_call().
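For instance, a minimal sketch of the higher-level route (using the question's own mosaic and folder variables; check_output is assumed available, i.e. Python 2.7+):
import subprocess

try:
    # check_output raises CalledProcessError on a non-zero exit status,
    # turning a silent gdalbuildvrt failure into a visible exception.
    out = subprocess.check_output(["gdalbuildvrt", mosaic, folder],
                                  stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as e:
    # e.output holds the combined stdout/stderr of the failed command.
    print("gdalbuildvrt failed with code {}: {}".format(e.returncode, e.output))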

Related

python subprocess module hangs for spark-submit command when writing STDOUT

I have a Python script that is used to submit Spark jobs using the spark-submit tool. I want to execute the command and write the output both to STDOUT and a logfile in real time. I'm using Python 2.7 on an Ubuntu server.
This is what I have so far in my SubmitJob.py script
#!/usr/bin/python
import subprocess

# Submit the command
def submitJob(cmd, log_file):
    with open(log_file, 'w') as fh:
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            output = process.stdout.readline()
            if output == '' and process.poll() is not None:
                break
            if output:
                print output.strip()
                fh.write(output)
    rc = process.poll()
    return rc

if __name__ == "__main__":
    cmdList = ["dse", "spark-submit", "--spark-master", "spark://127.0.0.1:7077", "--class", "com.spark.myapp", "./myapp.jar"]
    log_file = "/tmp/out.log"
    exit_status = submitJob(cmdList, log_file)
    print "job finished with status ", exit_status
The strange thing is, when I execute the same command directly in the shell it works fine and produces output on screen as the program proceeds.
So it looks like something is wrong in the way I'm using subprocess.PIPE for stdout and writing the file.
What's the currently recommended way to use the subprocess module to write to stdout and a log file in real time, line by line? I see a bunch of options on the internet but am not sure which is correct or most current.
Thanks.
Figured out what the problem was.
I was trying to redirect both stdout and stderr to the pipe so I could display them on screen. This seems to block stdout when stderr is present. If I remove the stderr=subprocess.STDOUT argument from Popen, it works fine. So for spark-submit, it looks like you don't need to redirect stderr explicitly, as it already does this implicitly.
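For reference, a minimal sketch of the fixed loop (Python 2, as in the question; submit_job is just an illustrative rename of the submitJob above):
import subprocess

def submit_job(cmd, log_file):
    # No stderr=subprocess.STDOUT here: merging the streams is what
    # blocked stdout, and spark-submit logs to the terminal anyway.
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    with open(log_file, 'w') as fh:
        for line in iter(process.stdout.readline, ''):
            print line.strip()  # echo to the screen in real time
            fh.write(line)      # and append to the log file
    process.wait()
    return process.returncode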
To print the Spark log, one can use the cmdList given by user330612:
cmdList = ["spark-submit", "--spark-master", "spark://127.0.0.1:7077", "--class", "com.spark.myapp", "./myapp.jar"]
Then it can be printed using subprocess; remember to use communicate() to prevent deadlocks: https://docs.python.org/2/library/subprocess.html
Warning: Deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that. Below is the code to print the log.
import subprocess
p = subprocess.Popen(cmdList, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
stderr = stderr.splitlines()
stdout = stdout.splitlines()
for line in stderr:
    print line  # now it can be printed line by line, e.g. to a file, for the log
for line in stdout:
    print line  # for the output
More information about subprocess and printing lines can be found at:
https://pymotw.com/2/subprocess/

How to redirect print and stdout to a pipe and read it from parent process?

If possible I would like to not use subprocess.Popen. The reason I want to capture the stdout of the process started by the child is that I need to save the output of the child in a variable to display it back later. However I have yet to find a way to do so anywhere. I also need to activate multiple programs without necessarily closing the one that's active. I also need to be controlling the child process with the parent process.
I'm launching a subprocess like this
listProgram = ["./perroquet.py"]
listOutput = ["","",""]
tubePerroquet = os.pipe()
pipeMain = os.pipe()
pipeAge = os.pipe()
pipeSavoir = os.pipe()
pid = os.fork()
process = 1
if pid == 0:
os.close(pipePerroquet[1])
os.dup2(pipePerroquet[0],0)
sys.stdout = os.fdopen(tubeMain[1], 'w')
os.execvp("./perroquet.py", listProgram)
Now as you can see I'm launching the program with os.execvp and using os.dup2() to redirect the stdout of the child. However I'm not sure of what I've done in the code and want to know of the correct way to redirect stdout with os.dup2 and then be able to read it in the parent process.
Thank you for your help.
I cannot understand why you do not want to use the excellent subprocess module, which could save you a lot of boilerplate code (and as many error possibilities...). Anyway, I assume perroquet.py is a Python script, not an executable program. The shell knows how to find the correct interpreter for scripts, but the exec family are low-level functions that expect a real executable program.
You should at least have something like:
listProgram = [ "python", "./perroquet.py","",""]
...
os.execvp("python", listProgram)
But I'd rather use:
prog = subprocess.Popen(("python", "./perroquet.py", "", ""), stdout=subprocess.PIPE)
or even, as you are already in Python, import it and call its functions directly from there - see the sketch below.
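A minimal sketch of that last option, assuming perroquet.py exposes a main() function (a hypothetical entry point - adapt to the script's real structure; redirect_stdout needs Python 3.4+):
import io
import contextlib
import perroquet  # import the child script as a module instead of spawning it

buf = io.StringIO()
with contextlib.redirect_stdout(buf):  # capture everything main() prints
    perroquet.main()                   # hypothetical entry point
output = buf.getvalue()                # the child's output, as a string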
EDIT:
It looks like what you really want is:
user gives you a command (can be almost anything)
[you validate that the command is safe] - unsure if you intend to do this, but you should...
you make the shell execute the command and get its output - you may want to read stderr too and check the exit code
You should try something like:
while True:
    cmd = raw_input("commande :")  # use input() with Python 3
    if cmd.strip().lower() == "exit":
        break
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, shell=True)
    out, err = proc.communicate()
    code = proc.returncode
    print("OUT", out, "ERR", err, "CODE", code)
It is absolutely unsafe, since this code executes any command just as the underlying shell would (including rm -rf *, rd /s/q ., ...), but it gives you the output, the errors and the return code of the command, and it can be used in a loop. The only limitation is that as you use a different shell for each command, you cannot use commands that change the shell environment - they will be executed but will have no effect.
Here's a solution if you need to extract any changes to the environment:
from subprocess import Popen, PIPE
import os

def execute_and_get_env(cmd, initial_env=None):
    if initial_env is None:
        initial_env = os.environ
    r_fd, w_fd = os.pipe()
    write_env = "; env >&{}".format(w_fd)
    p = Popen(cmd + write_env, shell=True, env=initial_env,
              pass_fds=[w_fd], stdout=PIPE, stderr=PIPE)
    output, error = p.communicate()
    # this will cause problems if the environment gets very large, as
    # writing to the pipe will hang because it gets full and we only
    # read from the pipe when the process is over
    os.close(w_fd)
    with open(r_fd) as f:
        env = dict(line[:-1].split("=", 1) for line in f)
    return output, error, env
export_cmd = "export my_var='hello world'"
echo_cmd = "echo $my_var"
out, err, env = execute_and_get_env(export_cmd)
out, err, env = execute_and_get_env(echo_cmd, env)
print(out)

python subprocess.call output is not interleaved

I have a Python (v3.3) script that runs other shell scripts. My Python script also prints messages like "About to run script X" and "Done running script X".
When I run my script I'm getting all the output of the shell scripts separate from my print statements. I see something like this:
All of script X's output
All of script Y's output
All of script Z's output
About to run script X
Done running script X
About to run script Y
Done running script Y
About to run script Z
Done running script Z
My code that runs the shell scripts looks like this:
print( "running command: " + cmnd )
ret_code = subprocess.call( cmnd, shell=True )
print( "done running command")
I wrote a basic test script and do *not* see this behaviour. This code does what I would expect:
print("calling")
ret_code = subprocess.call("/bin/ls -la", shell=True )
print("back")
Any idea on why the output is not interleaved?
Thanks. This works but has one limitation - you can't see any output until after the command completes. I found an answer to another question (here) that uses Popen but also lets me see the output in real time. Here's what I ended up with:
import subprocess
import sys

cmd = ['/media/sf_git/test-automation/src/SalesVision/mswm/shell_test.sh', '4', '2']

print('running command: "{0}"'.format(cmd))  # output the command.

# Here, we join the STDERR of the application with the STDOUT of the application.
process = subprocess.Popen(cmd, bufsize=1, universal_newlines=True,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(process.stdout.readline, ''):
    line = line.replace('\n', '')
    print(line)
    sys.stdout.flush()

process.wait()  # Wait for the underlying process to complete.
errcode = process.returncode  # Harvest its returncode, if needed.
print('Script ended with return code of: ' + str(errcode))
This uses Popen and allows me to see the progress of the called script.
It has to do with STDOUT and STDERR buffering. You should be using subprocess.Popen to redirect STDOUT and STDERR from your child process into your application. Then, as needed, output them. Example:
import subprocess
cmd = ['ls', '-la']
print('running command: "{0}"'.format(cmd)) # output the command.
# Here, we join the STDERR of the application with the STDOUT of the application.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, err = process.communicate()  # Waits for the process to complete and captures its STDOUT and STDERR
errcode = process.returncode # Harvest its returncode, if needed.
print(out)
print('done running command')
Additionally, I wouldn't use shell=True unless it's really required. It forces subprocess to fire up a whole shell environment just to run a command. It's usually better to pass the command as a list and, if you need to adjust the child's environment, to do so directly via the env parameter of Popen.
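For instance, a minimal sketch of that advice (MY_FLAG is an invented variable name, purely for illustration):
import os
import subprocess

# Pass the command as a list - no shell needed - and hand the child an
# explicit environment via the env parameter.
child_env = dict(os.environ)
child_env['MY_FLAG'] = '1'  # hypothetical variable, for illustration only

process = subprocess.Popen(['ls', '-la'], env=child_env,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, _ = process.communicate()
print(out)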

prevent subprocess.Popen from displaying output in python

So I am trying to store the output of a command into a variable. I do not want it to display output while running the command though...
The code I have right now is as follows...
def getoutput(*args):
    myargs = args
    listargs = [l.split(' ', 1) for l in myargs]
    import subprocess
    output = subprocess.Popen(listargs[0], shell=False, stdout=subprocess.PIPE)
    out, error = output.communicate()
    return (out, error)

def main():
    a, b = getoutput("httpd -S")

if __name__ == '__main__':
    main()
If I put this in a file and execute it from the command line, I get the following output even though I do not have a print statement in the code. How can I prevent this while still storing the output?
#python ./apache.py
httpd: Could not reliably determine the server's fully qualified domain name, using xxx.xxx.xxx.xx for ServerName
Syntax OK
What you are seeing is standard error output, not standard output. Stderr redirection is controlled by the stderr constructor argument. It defaults to None, which means no redirection occurs, which is why you see this output.
Usually it's a good idea to keep stderr output, since it aids debugging and doesn't affect normal redirection (e.g. | and > shell redirection won't capture stderr by default). However, you can redirect it somewhere else the same way you do stdout:
sp = subprocess.Popen(listargs[0], shell=False,
                      stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = sp.communicate()
Or you can just drop stderr:
devnull = open(os.devnull, 'wb')  # Python >= 2.4
sp = subprocess.Popen(listargs[0], shell=False,
                      stdout=subprocess.PIPE, stderr=devnull)
# Python 3.x:
sp = subprocess.Popen(listargs[0], shell=False,
                      stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
You're catching stdout, but you're not catching stderr (standard error), which I think is where that message is coming from.
output = subprocess.Popen(listargs[0], shell=False, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
That will put anything from stderr into the same place as stdout.

Spawn subprocess that expects console input without blocking?

I am trying to do a CVS login from Python by calling the cvs.exe process.
When calling cvs.exe by hand, it prints a message to the console and then waits for the user to input the password.
When calling it with subprocess.Popen, I've noticed that the call blocks. The code is
subprocess.Popen(cvscmd, shell=True, stdin=subprocess.PIPE,
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
I assume that it blocks because it's waiting for input, but my expectation was that calling Popen would return immediately and that I could then call communicate() to supply the actual password. How can I achieve this behaviour and avoid blocking on Popen?
OS: Windows XP
Python: 2.6
cvs.exe: 1.11
Remove the shell=True part. Your shell has nothing to do with it. Using shell=True is a common cause of trouble.
Use a list of parameters for cmd.
Example:
cmd = ['cvs',
       '-d:pserver:anonymous@bayonne.cvs.sourceforge.net:/cvsroot/bayonne',
       'login']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
This won't block on my system (my script continues executing).
However since cvs reads the password directly from the terminal (not from standard input or output) you can't just write the password to the subprocess' stdin.
What you could do is pass the password as part of the CVSROOT specification instead, like this:
:pserver:<user>[:<passwd>]@<server>:/<path>
I.e. a function to log in to a SourceForge project:
import subprocess

def login_to_sourceforge_cvs(project, username='anonymous', password=''):
    host = '%s.cvs.sourceforge.net' % project
    path = '/cvsroot/%s' % project
    cmd = ['cvs',
           '-d:pserver:%s:%s@%s:%s' % (username, password, host, path),
           'login']
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    return p
This works for me. Calling
login_to_sourceforge_cvs('bayonne')
will log in anonymously to the bayonne project's CVS.
If you are automating external programs that need input - like a password - your best bet would probably be to use pexpect.
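A minimal pexpect sketch (the 'password:' pattern and the CVSROOT are assumptions - prompts vary by CVS version, and pexpect is POSIX-only, so it would not run on the asker's Windows XP box as-is):
import pexpect

# Spawn cvs and answer its interactive password prompt.
child = pexpect.spawn('cvs -d:pserver:user@server:/cvsroot/project login')
child.expect('password:')   # wait for the prompt (assumed pattern)
child.sendline('secret')    # send the password (hypothetical value)
child.expect(pexpect.EOF)   # wait for cvs to exit
print(child.before)         # whatever cvs printed after the prompt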
