I have the following Python script. How can I log the output of each command separately, i.e. one file per command containing that command's output?
#!/usr/bin/env python
from subprocess import Popen
import sys
commands = [
    'command1',
    'command2',
    'command3'
]
processes = [Popen(cmd, shell=True) for cmd in commands]
for p in processes:
    p.wait()
Just set the stdout parameter to a corresponding file:
import shlex
from contextlib import ExitStack  # on Python 2: pip install contextlib2 and import ExitStack from contextlib2
from subprocess import Popen

with ExitStack() as stack:
    for i, cmd in enumerate(commands):
        output_file = stack.enter_context(open('output%d.log' % i, 'w'))
        stack.callback(Popen(shlex.split(cmd), stdout=output_file).wait)
To redirect a child process's stderr, set the stderr parameter as well. With stderr=subprocess.STDOUT, stdout and stderr are merged into a single stream.
ExitStack is used to close the files and to wait for the already-started child processes even if an exception happens inside the with-statement, e.g., if some command fails to start.
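For example, a variation of the snippet above that also gives each command its own stderr log might look like this (just a sketch; the error%d.log names are placeholders I made up):
import shlex
from contextlib import ExitStack
from subprocess import Popen

commands = ['command1', 'command2', 'command3']  # placeholder commands from the question

with ExitStack() as stack:
    for i, cmd in enumerate(commands):
        out = stack.enter_context(open('output%d.log' % i, 'w'))
        err = stack.enter_context(open('error%d.log' % i, 'w'))
        # each command writes its stdout and stderr to its own pair of files
        stack.callback(Popen(shlex.split(cmd), stdout=out, stderr=err).wait)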
Related
I have a Python script that calls a shell script, which in turn calls a .exe called iv4_console. I need to print the stdout of iv4_console for debugging purposes. I used this:
Python:
import sys
import subprocess
var="rW015005000000"
proc = subprocess.Popen(["c.sh", var], shell=True, stdout=subprocess.PIPE)
output = ''
for line in iter(proc.stdout.readline, ""):
    print line
    output += line
Shell:
start_dir=$PWD
release=$1
echo Release inside shell: $release
echo Directory: $start_dir
cd $start_dir
cd ../../iv_system4/ports/visualC12/Debug
echo Debug dir: $PWD
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.xml
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.xml
exit
But this didn't work; it prints nothing. What do you think?
See my comment; the best approach (IMO) would be to just use Python only.
However, in answer to your question, try:
import sys
import subprocess
var="rW015005000000"
proc = subprocess.Popen(["/bin/bash", "/full/path/to/c.sh"], stdout=subprocess.PIPE)
# Best to always avoid shell=True because of security vulnerabilities.
proc.wait() # To make sure the shell script does not continue running indefinitely in the background
output, errors = proc.communicate()
print(output.decode())
# Since subprocess.communicate() returns a bytes-string, you can use .decode() to print the actual output as a string.
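If c.sh still needs the release string as an argument (as in your original call), pass it as an extra list item; a minimal sketch under that assumption:
import subprocess

var = "rW015005000000"
# the script path and its argument are separate list items; no shell=True needed
proc = subprocess.Popen(["/bin/bash", "/full/path/to/c.sh", var],
                        stdout=subprocess.PIPE)
output, errors = proc.communicate()  # reads all output and waits for the script to exit
print(output.decode())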
You can use
import subprocess
subprocess.call(['./c.sh'])
to call the shell script from a Python file,
or
import subprocess
import shlex
subprocess.call(shlex.split('./c.sh var'))
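If you also want the script's output back in Python as a string (the original goal of printing iv4_console's stdout), subprocess.check_output is a convenient shortcut; a sketch, assuming ./c.sh and the same var value as above:
import subprocess

var = "rW015005000000"
# check_output runs the command, waits for it, and returns its stdout as bytes;
# it raises CalledProcessError if the script exits with a non-zero status
output = subprocess.check_output(["./c.sh", var])
print(output.decode())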
I am trying to use Sailfish, which takes multiple fastq files as arguments, in a ruffus pipeline. I execute Sailfish using the subprocess module in Python, but <() in the subprocess call does not work even when I set shell=True.
This is the command I want to execute using python:
sailfish quant [options] -1 <(cat sample1a.fastq sample1b.fastq) -2 <(cat sample2a.fastq sample2b.fastq) -o [output_file]
or (preferably):
sailfish quant [options] -1 <(gunzip sample1a.fastq.gz sample1b.fastq.gz) -2 <(gunzip sample2a.fastq.gz sample2b.fastq.gz) -o [output_file]
A generalization:
someprogram <(someprocess) <(someprocess)
How would I go about doing this in python? Is subprocess the right approach?
To emulate the bash process substitution:
#!/usr/bin/env python
from subprocess import check_call
check_call('someprogram <(someprocess) <(anotherprocess)',
shell=True, executable='/bin/bash')
In Python, you could use named pipes:
#!/usr/bin/env python
from subprocess import Popen
with named_pipes(n=2) as paths:
    someprogram = Popen(['someprogram'] + paths)

    processes = []
    for path, command in zip(paths, ['someprocess', 'anotherprocess']):
        with open(path, 'wb', 0) as pipe:
            processes.append(Popen(command, stdout=pipe, close_fds=True))

    for p in [someprogram] + processes:
        p.wait()
where named_pipes(n) is:
import os
import shutil
import tempfile
from contextlib import contextmanager
@contextmanager
def named_pipes(n=1):
    dirname = tempfile.mkdtemp()
    try:
        paths = [os.path.join(dirname, 'named_pipe' + str(i)) for i in range(n)]
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        shutil.rmtree(dirname)
Another, more preferable way (no need to create a named entry on disk) to implement the bash process substitution is to use /dev/fd/N filenames (if they are available), as suggested by @Dunes. On FreeBSD, fdescfs(5) (/dev/fd/#) creates entries for all file descriptors opened by the process. To test availability, run:
$ test -r /dev/fd/3 3</dev/null && echo /dev/fd is available
If it fails, try to symlink /dev/fd to proc(5), as is done on some Linuxes:
$ ln -s /proc/self/fd /dev/fd
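The same availability check can also be done from Python; a small sketch of one way to probe it (my own assumption of how to test it, not part of the original answer):
import os

# open an arbitrary file to obtain a file descriptor, then check whether the
# matching /dev/fd/N entry is readable
fd = os.open(os.devnull, os.O_RDONLY)
try:
    available = os.access('/dev/fd/%d' % fd, os.R_OK)
finally:
    os.close(fd)
print('/dev/fd is available' if available else '/dev/fd is not available')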
Here's a /dev/fd-based implementation of the someprogram <(someprocess) <(anotherprocess) bash command:
#!/usr/bin/env python3
from contextlib import ExitStack
from subprocess import CalledProcessError, Popen, PIPE
def kill(process):
    if process.poll() is None:  # still running
        process.kill()

with ExitStack() as stack:  # for proper cleanup
    processes = []
    for command in [['someprocess'], ['anotherprocess']]:  # start child processes
        processes.append(stack.enter_context(Popen(command, stdout=PIPE)))
        stack.callback(kill, processes[-1])  # kill on someprogram exit

    fds = [p.stdout.fileno() for p in processes]
    someprogram = stack.enter_context(
        Popen(['someprogram'] + ['/dev/fd/%d' % fd for fd in fds], pass_fds=fds))
    for p in processes:  # close pipes in the parent
        p.stdout.close()
# exit stack: wait for processes
if someprogram.returncode != 0:  # errors shouldn't go unnoticed
    raise CalledProcessError(someprogram.returncode, someprogram.args)
Note: on my Ubuntu machine, the subprocess code works only in Python 3.4+, despite pass_fds being available since Python 3.2.
Whilst J.F. Sebastian has provided an answer using named pipes, it is possible to do this with anonymous pipes.
import shlex
from subprocess import Popen, PIPE
inputcmd0 = "zcat hello.gz" # gzipped file containing "hello"
inputcmd1 = "zcat world.gz" # gzipped file containing "world"
def get_filename(file_):
    return "/dev/fd/{}".format(file_.fileno())

def get_stdout_fds(*processes):
    return tuple(p.stdout.fileno() for p in processes)
# setup producer processes
inputproc0 = Popen(shlex.split(inputcmd0), stdout=PIPE)
inputproc1 = Popen(shlex.split(inputcmd1), stdout=PIPE)
# setup consumer process
# pass input processes pipes by "filename" eg. /dev/fd/5
cmd = "cat {file0} {file1}".format(file0=get_filename(inputproc0.stdout),
file1=get_filename(inputproc1.stdout))
print("command is:", cmd)
# pass_fds argument tells Popen to let the child process inherit the pipe's fds
someprogram = Popen(shlex.split(cmd), stdout=PIPE,
pass_fds=get_stdout_fds(inputproc0, inputproc1))
output, error = someprogram.communicate()
for p in [inputproc0, inputproc1, someprogram]:
p.wait()
assert output == b"hello\nworld\n"
I've got a Python script that calls ffmpeg via subprocess to do some mp3 manipulations. It works fine in the foreground, but if I run it in the background, it gets as far as the ffmpeg command, which itself gets as far as dumping its config into stderr. At this point, everything stops and the parent task is reported as stopped, without raising an exception anywhere. I've tried a few other simple commands in place of ffmpeg; they execute normally in the foreground or background.
This is the minimal example of the problem:
import subprocess
inf = "3HTOSD.mp3"
outf = "out.mp3"
args = [ "ffmpeg",
"-y",
"-i", inf,
"-ss", "0",
"-t", "20",
outf
]
print "About to do"
result = subprocess.call(args)
print "Done"
I really can't work out why or how a wrapped process can cause the parent to terminate without at least raising an error, and how it only happens in so niche a circumstance. What is going on?
Also, I'm aware that ffmpeg isn't the nicest of packages, but I'm interfacing with something that has ffmpeg compiled into it, so using it again seems sensible.
It might be related to Linux process in background - “Stopped” in jobs? e.g., using parent.py:
from subprocess import check_call
check_call(["python", "-c", "import sys; sys.stdin.readline()"])
should reproduce the issue: "parent.py script shown as stopped" if you run it in bash as a background job:
$ python parent.py &
[1] 28052
$ jobs
[1]+ Stopped python parent.py
A background process that tries to read from the terminal receives the SIGTTIN signal, whose default action is to stop the process; that is why the job is shown as Stopped.
The solution is to redirect the input:
import os
from subprocess import check_call
try:
    from subprocess import DEVNULL
except ImportError:  # Python 2
    DEVNULL = open(os.devnull, 'r+b', 0)

check_call(["python", "-c", "import sys; sys.stdin.readline()"], stdin=DEVNULL)
If you don't need to see ffmpeg's stdout/stderr, you could also redirect them to /dev/null:
from subprocess import STDOUT

check_call(ffmpeg_cmd, stdin=DEVNULL, stdout=DEVNULL, stderr=STDOUT)
I like to use the commands module (note that it is Python 2 only; it was removed in Python 3 in favour of subprocess). It's simpler to use, in my opinion.
import commands

cmd = "ffmpeg -y -i %s -ss 0 -t 20 %s 2>&1" % (inf, outf)
status, output = commands.getstatusoutput(cmd)
if status != 0:
    raise Exception(output)
As a side note, sometimes PATH can be an issue, and you might want to use an absolute path to the ffmpeg binary.
matt#goliath:~$ which ffmpeg
/opt/local/bin/ffmpeg
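On Python 3.3+ you can also resolve the absolute path from the script itself with shutil.which; a sketch, reusing the ffmpeg arguments and filenames from the question:
import shutil
import subprocess

ffmpeg = shutil.which('ffmpeg')  # e.g. '/opt/local/bin/ffmpeg', or None if not on PATH
if ffmpeg is None:
    raise RuntimeError('ffmpeg not found on PATH')
subprocess.call([ffmpeg, '-y', '-i', '3HTOSD.mp3', '-ss', '0', '-t', '20', 'out.mp3'])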
From the Python subprocess.call documentation:
Wait for command to complete, then return the returncode attribute.
So as long as the process you called does not exit, your program does not go on.
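A minimal illustration of the difference (a sketch, assuming a POSIX system with the sleep command):
import subprocess

subprocess.call(['sleep', '2'])       # blocks here until sleep exits (~2 seconds)
p = subprocess.Popen(['sleep', '2'])  # returns immediately; the child runs in parallel
# ... other work can happen here while the child is still running ...
p.wait()                              # block only when you decide to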
You should set up a Popen object, send its standard output and error to separate streams, and terminate the process when an error shows up.
Maybe something like this works:
proc = subprocess.Popen(args, stderr=subprocess.PIPE)  # puts stderr into a new stream
while proc.poll() is None:
    try:
        err = proc.stderr.read()
    except:
        continue
    else:
        if err:
            proc.terminate()
            break
I am using Popen to call a shell script that is continuously writing its stdout and stderr to a log file. Is there any way to simultaneously output the log file continuously (to the screen), or alternatively, make the shell script write to both the log file and stdout at the same time?
I basically want to do something like this in Python:
cat file 2>&1 | tee -a logfile #"cat file" will be replaced with some script
Again, this pipes stderr/stdout together to tee, which writes it both to stdout and my logfile.
I know how to write stdout and stderr to a logfile in Python. Where I'm stuck is how to duplicate these back to the screen:
subprocess.Popen("cat file", shell=True, stdout=logfile, stderr=logfile)
Of course, I could just do something like this, but is there any way to do it without tee and shell file descriptor redirection?
subprocess.Popen("cat file 2>&1 | tee -a logfile", shell=True)
You can use a pipe to read the data from the program's stdout and write it to all the places you want:
import sys
import subprocess
logfile = open('logfile', 'w')
proc = subprocess.Popen(['cat', 'file'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in proc.stdout:
    sys.stdout.write(line)
    logfile.write(line)
proc.wait()
UPDATE
In Python 3, the universal_newlines parameter controls how pipes are used. If it is False, pipe reads return bytes objects and may need to be decoded (e.g., line.decode('utf-8')) to get a string. If it is True, Python does the decoding for you.
Changed in version 3.3: When universal_newlines is True, the class uses the encoding locale.getpreferredencoding(False) instead of locale.getpreferredencoding(). See the io.TextIOWrapper class for more information on this change.
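For example, a Python 3 sketch of the loop above with universal_newlines=True (still using the 'cat file' placeholder command and a 'logfile' output file):
import subprocess

# with universal_newlines=True (aliased to text=True in Python 3.7+),
# the pipe yields str objects, so no explicit decode is needed
proc = subprocess.Popen(['cat', 'file'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True)
with open('logfile', 'w') as logfile:
    for line in proc.stdout:
        print(line, end='')   # already a str; no line.decode() required
        logfile.write(line)
proc.wait()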
To emulate: subprocess.call("command 2>&1 | tee -a logfile", shell=True) without invoking the tee command:
#!/usr/bin/env python2
from subprocess import Popen, PIPE, STDOUT
p = Popen("command", stdout=PIPE, stderr=STDOUT, bufsize=1)
with p.stdout, open('logfile', 'ab') as file:
    for line in iter(p.stdout.readline, b''):
        print line,  # NOTE: the comma prevents duplicate newlines (softspace hack)
        file.write(line)
p.wait()
To fix possible buffering issues (if the output is delayed), see links in Python: read streaming input from subprocess.communicate().
Here's the Python 3 version:
#!/usr/bin/env python3
import sys
from subprocess import Popen, PIPE, STDOUT
with Popen("command", stdout=PIPE, stderr=STDOUT, bufsize=1) as p, \
open('logfile', 'ab') as file:
for line in p.stdout: # b'\n'-separated lines
sys.stdout.buffer.write(line) # pass bytes as is
file.write(line)
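As for the buffering issues mentioned above, one common workaround (not from the linked answer; just a sketch assuming GNU coreutils is installed and that the child uses default stdio buffering) is to ask the child to line-buffer its output with the stdbuf utility:
#!/usr/bin/env python3
import sys
from subprocess import Popen, PIPE, STDOUT

# stdbuf -oL asks the child to line-buffer its stdout, so lines arrive promptly;
# 'command' is the same placeholder used in the snippets above
with Popen(['stdbuf', '-oL', 'command'], stdout=PIPE, stderr=STDOUT, bufsize=1) as p, \
        open('logfile', 'ab') as file:
    for line in p.stdout:
        sys.stdout.buffer.write(line)
        file.write(line)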
Write to terminal byte by byte for interactive applications
This method writes any bytes it gets to stdout immediately, which more closely simulates the behavior of tee, especially for interactive applications.
main.py
#!/usr/bin/env python3
import os
import subprocess
import sys
with subprocess.Popen(sys.argv[1:], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as proc, \
        open('logfile.txt', 'bw') as logfile:
    while True:
        byte = proc.stdout.read(1)
        if byte:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
            logfile.write(byte)
            # logfile.flush()
        else:
            break
exit_status = proc.returncode
sleep.py
#!/usr/bin/env python3
import sys
import time
for i in range(10):
    print(i)
    sys.stdout.flush()
    time.sleep(1)
First we can do a non-interactive sanity check:
./main.py ./sleep.py
And we see it counting to stdout in real time.
Next, for an interactive test, you can run:
./main.py bash
Then the characters you type appear immediately on the terminal as you type them, which is very important for interactive applications. This is what happens when you run:
bash | tee logfile.txt
Also, if you want the output to appear in the output file immediately, then you can also add a:
logfile.flush()
but tee does not do this, and I'm afraid it would kill performance. You can test this out easily with:
tail -f logfile.txt
Related question: live output from subprocess command
Tested on Ubuntu 18.04, Python 3.6.7.
I run Python 2.5 on Windows, and somewhere in the code I have
subprocess.Popen("taskkill /PID " + str(p.pid))
to kill an IE window by pid. The problem is that without setting up piping in Popen, I still get output to the console: SUCCESS: The process with PID 2068 has been terminated. I debugged it down to CreateProcess in subprocess.py, but can't go further from there.
Does anyone know how to disable this?
from subprocess import check_call, DEVNULL, STDOUT
check_call(
    ("taskkill", "/PID", str(p.pid)),
    stdout=DEVNULL,
    stderr=STDOUT,
)
I always pass in tuples (or lists) to subprocess as it saves me worrying about escaping. check_call ensures (a) the subprocess has finished before the pipe closes, and (b) a failure in the called process is not ignored.
If you're stuck on Python 2, subprocess doesn't provide DEVNULL. However, you can replicate it by opening os.devnull (the standard, cross-platform way of saying NUL in Python 2.4+):
import os
from subprocess import check_call, STDOUT
DEVNULL = open(os.devnull, 'wb')
try:
    check_call(
        ("taskkill", "/PID", str(p.pid)),
        stdout=DEVNULL,
        stderr=STDOUT,
    )
finally:
    DEVNULL.close()
fh = open("NUL","w")
subprocess.Popen("taskkill /PID " + str(p.pid), stdout = fh, stderr = fh)
fh.close()