I am trying to use Sailfish, which takes multiple FASTQ files as arguments, in a ruffus pipeline. I execute Sailfish using the subprocess module in Python, but <() in the subprocess call does not work even when I set shell=True.
This is the command I want to execute using python:
sailfish quant [options] -1 <(cat sample1a.fastq sample1b.fastq) -2 <(cat sample2a.fastq sample2b.fastq) -o [output_file]
or (preferably):
sailfish quant [options] -1 <(gunzip -c sample1a.fastq.gz sample1b.fastq.gz) -2 <(gunzip -c sample2a.fastq.gz sample2b.fastq.gz) -o [output_file]
A generalization:
someprogram <(someprocess) <(someprocess)
How would I go about doing this in python? Is subprocess the right approach?
To emulate the bash process substitution:
#!/usr/bin/env python
from subprocess import check_call
check_call('someprogram <(someprocess) <(anotherprocess)',
           shell=True, executable='/bin/bash')
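When the file names are only known at run time, the bash command string has to be assembled with care. The following is a minimal sketch, assuming Python 3.3+ for shlex.quote; the helper name process_substitution and the use of cat as the inner command are illustrative assumptions, not part of the original answer. For compressed input the inner command could be something like ['gunzip', '-c'].

```python
import shlex

def process_substitution(inner_command, filenames):
    """Return a bash '<(...)' expression that runs inner_command on filenames.

    Every word is shell-quoted so that spaces in file names survive.
    """
    words = list(inner_command) + list(filenames)
    return '<(' + ' '.join(shlex.quote(word) for word in words) + ')'

# 'someprogram' is the placeholder used in the answer above.
command = ' '.join([
    'someprogram',
    process_substitution(['cat'], ['sample1a.fastq', 'sample1b.fastq']),
    process_substitution(['cat'], ['sample2a.fastq', 'sample2b.fastq']),
])
# The string would then be run through bash, since <() is a bash feature:
# check_call(command, shell=True, executable='/bin/bash')
```

Quoting each word keeps the command safe for unusual file names, at the cost of still depending on bash being installed at /bin/bash.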
In Python, you could use named pipes:
#!/usr/bin/env python
from subprocess import Popen
with named_pipes(n=2) as paths:
    someprogram = Popen(['someprogram'] + paths)
    processes = []
    for path, command in zip(paths, ['someprocess', 'anotherprocess']):
        with open(path, 'wb', 0) as pipe:
            processes.append(Popen(command, stdout=pipe, close_fds=True))
    for p in [someprogram] + processes:
        p.wait()
where named_pipes(n) is:
import os
import shutil
import tempfile
from contextlib import contextmanager
@contextmanager
def named_pipes(n=1):
    dirname = tempfile.mkdtemp()
    try:
        paths = [os.path.join(dirname, 'named_pipe' + str(i)) for i in range(n)]
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        shutil.rmtree(dirname)
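To try the named-pipe approach without the placeholder programs, here is a small self-contained sketch for POSIX systems. It repeats named_pipes() so that it runs on its own, uses cat as a stand-in for someprogram, and uses short Python one-liners as stand-ins for the producer processes; the function name cat_through_named_pipes is an assumption made for this illustration.

```python
import os
import shutil
import sys
import tempfile
from contextlib import contextmanager
from subprocess import Popen, PIPE

@contextmanager
def named_pipes(n=1):
    """Create n FIFOs in a temporary directory; remove them on exit."""
    dirname = tempfile.mkdtemp()
    try:
        paths = [os.path.join(dirname, 'named_pipe' + str(i)) for i in range(n)]
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        shutil.rmtree(dirname)

def cat_through_named_pipes(texts):
    """Feed each text through its own FIFO to a single `cat` process."""
    producer_code = 'import sys; sys.stdout.write(sys.argv[1])'
    with named_pipes(n=len(texts)) as paths:
        consumer = Popen(['cat'] + paths, stdout=PIPE)
        producers = []
        for path, text in zip(paths, texts):
            # open() blocks until the consumer opens this FIFO for reading
            with open(path, 'wb', 0) as pipe:
                producers.append(
                    Popen([sys.executable, '-c', producer_code, text], stdout=pipe))
        output = consumer.communicate()[0]
        for p in producers:
            p.wait()
    return output
```

Because cat opens its arguments one after another, the parent's open() call for the second FIFO only returns once the first producer has finished and cat has moved on.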
A preferable way to implement bash process substitution, since it needs no named entry on disk, is to use /dev/fd/N filenames (if they are available), as suggested by @Dunes. On FreeBSD, fdescfs(5) (/dev/fd/#) creates entries for all file descriptors opened by the process. To test availability, run:
$ test -r /dev/fd/3 3</dev/null && echo /dev/fd is available
If it fails, try to symlink /dev/fd to proc(5), as is done on some Linux systems:
$ ln -s /proc/self/fd /dev/fd
Here's a /dev/fd-based implementation of the bash command someprogram <(someprocess) <(anotherprocess):
#!/usr/bin/env python3
from contextlib import ExitStack
from subprocess import CalledProcessError, Popen, PIPE
def kill(process):
    if process.poll() is None:  # still running
        process.kill()
with ExitStack() as stack:  # for proper cleanup
    processes = []
    for command in [['someprocess'], ['anotherprocess']]:  # start child processes
        processes.append(stack.enter_context(Popen(command, stdout=PIPE)))
        stack.callback(kill, processes[-1])  # kill on someprogram exit
    fds = [p.stdout.fileno() for p in processes]
    someprogram = stack.enter_context(
        Popen(['someprogram'] + ['/dev/fd/%d' % fd for fd in fds], pass_fds=fds))
    for p in processes:  # close pipes in the parent
        p.stdout.close()
# exit stack: wait for processes
if someprogram.returncode != 0:  # errors shouldn't go unnoticed
    raise CalledProcessError(someprogram.returncode, someprogram.args)
Note: on my Ubuntu machine, the subprocess code works only in Python 3.4+, despite pass_fds being available since Python 3.2.
Whilst J.F. Sebastian has provided an answer using named pipes, it is also possible to do this with anonymous pipes.
import shlex
from subprocess import Popen, PIPE
inputcmd0 = "zcat hello.gz" # gzipped file containing "hello"
inputcmd1 = "zcat world.gz" # gzipped file containing "world"
def get_filename(file_):
    return "/dev/fd/{}".format(file_.fileno())
def get_stdout_fds(*processes):
    return tuple(p.stdout.fileno() for p in processes)
# setup producer processes
inputproc0 = Popen(shlex.split(inputcmd0), stdout=PIPE)
inputproc1 = Popen(shlex.split(inputcmd1), stdout=PIPE)
# setup consumer process
# pass input processes pipes by "filename" eg. /dev/fd/5
cmd = "cat {file0} {file1}".format(file0=get_filename(inputproc0.stdout),
                                   file1=get_filename(inputproc1.stdout))
print("command is:", cmd)
# pass_fds argument tells Popen to let the child process inherit the pipe's fds
someprogram = Popen(shlex.split(cmd), stdout=PIPE,
                    pass_fds=get_stdout_fds(inputproc0, inputproc1))
output, error = someprogram.communicate()
for p in [inputproc0, inputproc1, someprogram]:
    p.wait()
assert output == b"hello\nworld\n"
Related
I have created a subprocess object. The subprocess invokes a shell, and I need to send the shell command provided below to it. The code I've tried:
from subprocess import Popen, PIPE
p = Popen(["code.exe","25"],stdin=PIPE,stdout=PIPE,stderr=PIPE)
print p.communicate(input='ping 8.8.8.8')
The command doesn't execute, nothing is being input into the shell. Thanks in advance.
If I simulate code.exe to read the arg and then process stdin:
#!/usr/bin/env bash
echo "arg: $1"
echo "stdin:"
while read LINE
do
    echo "$LINE"
done < /dev/stdin
and slightly update your code:
import os
from subprocess import Popen, PIPE
cwd = os.getcwd()
exe = os.path.join(cwd, 'foo.sh')
p = Popen([exe, '25'], stdin=PIPE, stdout=PIPE, stderr=PIPE,
          universal_newlines=True)  # text mode, so str input works on Python 3
out, err = p.communicate(input='aaa\nbbb\n')
for line in out.split('\n'):
    print(line)
Then the spawned process outputs:
arg: 25
stdin:
aaa
bbb
If the input does not end with a \n, though:
out, err = p.communicate(input='aaa')
Then that line doesn't appear in the output:
arg: 25
stdin:
Process finished with exit code 0
So you might want to look closely at the protocol between both ends of the pipe. For example, this might be enough:
input='ping 8.8.8.8\n'
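The line-oriented exchange above can also be reproduced without the bash script. The sketch below is only an illustration under stated assumptions: the child is a Python one-off that stands in for code.exe, and send_lines is a hypothetical helper that terminates every line with a newline before handing the payload to communicate().

```python
import sys
from subprocess import Popen, PIPE

# Stand-in child: prints its argument, then echoes stdin line by line.
CHILD = (
    "import sys\n"
    "print('arg: ' + sys.argv[1])\n"
    "print('stdin:')\n"
    "for line in sys.stdin:\n"
    "    print(line.rstrip('\\n'))\n"
)

def send_lines(lines, arg='25'):
    """Send newline-terminated lines to the child and return its stdout."""
    payload = ''.join(line + '\n' for line in lines)
    p = Popen([sys.executable, '-c', CHILD, arg],
              stdin=PIPE, stdout=PIPE, universal_newlines=True)
    # communicate() writes the payload, closes stdin and reads until EOF
    out, _ = p.communicate(input=payload)
    return out
```

Appending the newline in one place makes it harder to forget when the receiving program reads its input line by line.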
Hope that helps.
I have a Python script that calls a shell script, which in turn calls a .exe called iv4_console. I need to print the stdout of iv4_console for debugging purposes. I used this:
Python:
import sys
import subprocess
var="rW015005000000"
proc = subprocess.Popen(["c.sh", var], shell=True, stdout=subprocess.PIPE)
output = ''
for line in iter(proc.stdout.readline, ""):
    print line
    output += line
Shell:
start_dir=$PWD
release=$1
echo Release inside shell: $release
echo Directory: $start_dir
cd $start_dir
cd ../../iv_system4/ports/visualC12/Debug
echo Debug dir: $PWD
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.xml
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.xml
exit
But this didn't work, it prints nothing. What do you think?
See my comment; the best approach (IMO) would be to use Python only.
However, to answer your question, try:
import subprocess
var = "rW015005000000"
# Best to avoid shell=True where possible because of security vulnerabilities.
proc = subprocess.Popen(["/bin/bash", "/full/path/to/c.sh", var], stdout=subprocess.PIPE)
# communicate() reads all output and waits for the script to exit; calling
# proc.wait() first could deadlock if the script fills the pipe buffer.
output, errors = proc.communicate()
# Since Popen.communicate() returns bytes here, use .decode() to print the output as a string.
print(output.decode())
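For debugging it is often more useful to see the output while the script is still running rather than after it exits. The following is a minimal sketch of that, assuming Python 3; stream_output is a hypothetical helper name. Note that in the shell script above the iv4_console.exe output is redirected to .xml files with &>, so only the echo lines would reach the pipe unless that redirection is changed.

```python
import sys
from subprocess import Popen, PIPE, STDOUT

def stream_output(cmd):
    """Print the child's output line by line as it arrives.

    Returns the collected lines and the child's return code.
    stderr is merged into stdout so that error messages are not lost.
    """
    lines = []
    with Popen(cmd, stdout=PIPE, stderr=STDOUT, universal_newlines=True) as proc:
        for line in proc.stdout:
            print(line, end='')
            lines.append(line)
    # leaving the with-block waits for the child, so returncode is set
    return lines, proc.returncode
```

It could be called as stream_output(["/bin/bash", "/full/path/to/c.sh", var]) with the names used in the answer above.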
You can use
import subprocess
subprocess.call(['./c.sh'])
to call the shell script from the Python file,
or
import subprocess
import shlex
subprocess.call(shlex.split('./c.sh var'))
I want to get the process name, given its PID, in Python.
Is there any direct method in Python?
The psutil package makes this very easy.
import psutil
process = psutil.Process(pid)
process_name = process.name()
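Where psutil cannot be installed and only Linux needs to be supported, one possible fallback is to read the name from the proc filesystem. This is a sketch under that assumption (it relies on /proc/<pid>/comm, which other platforms do not provide); the function name process_name is chosen for illustration.

```python
import os

def process_name(pid):
    """Return the command name of pid by reading /proc (Linux only).

    Raises an OSError subclass (e.g. FileNotFoundError) if no such process exists.
    """
    with open('/proc/{}/comm'.format(pid)) as f:
        return f.read().rstrip('\n')
```

For example, process_name(os.getpid()) returns the name of the running interpreter process.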
If you want to see the running processes, you can use the os module to execute the Unix ps command:
import os
os.system("ps")
This will list the processes.
But if you want to get process name by ID, you can try ps -o cmd= <pid>
So the python code will be
import os
def get_pname(id):
    # ps prints the name to the terminal; os.system() returns only the exit status
    return os.system("ps -o cmd= {}".format(id))
print(get_pname(1))
A better method is to use subprocess and pipes, which captures the name as a string:
import subprocess
def get_pname(id):
    p = subprocess.Popen(["ps -o cmd= {}".format(id)], stdout=subprocess.PIPE, shell=True)
    # communicate() returns bytes; decode them instead of calling str()
    return p.communicate()[0].decode().strip()
name = get_pname(1)
print(name)
Command name (only the executable name):
from subprocess import PIPE, Popen
def get_cmd(pid):
    with Popen(f"ps -q {pid} -o comm=", shell=True, stdout=PIPE) as p:
        return p.communicate()[0]
Command with all its arguments as a string:
from subprocess import PIPE, Popen
def get_args(pid):
    with Popen(f"ps -q {pid} -o cmd=", shell=True, stdout=PIPE) as p:
        return p.communicate()[0]
I have the following python script. How can I log the outputs of each command separately, i.e. one file per each command containing that command's output?
#!/usr/bin/env python
from subprocess import Popen
import sys
commands = [
    'command1',
    'command2',
    'command3'
]
processes = [Popen(cmd, shell=True) for cmd in commands]
for p in processes:
    p.wait()
Just set the stdout parameter to a corresponding file:
import shlex
from contextlib import ExitStack # $ pip install contextlib2 (on Python 2)
from subprocess import Popen
with ExitStack() as stack:
    for i, cmd in enumerate(commands):
        output_file = stack.enter_context(open('output%d.log' % i, 'w'))
        stack.callback(Popen(shlex.split(cmd), stdout=output_file).wait)
To redirect stderr output from a child process, you could set the stderr parameter. If stderr=subprocess.STDOUT, then stdout and stderr are merged.
ExitStack is used to close the files and wait for already started child processes to exit even if an exception happens inside the with-statement, e.g., if some command fails to start.
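Putting the pieces together, here is a self-contained sketch that runs several commands concurrently and gives each its own log file with stderr merged into stdout. It assumes Python 3; the function name run_logged and the log_dir parameter are assumptions for illustration, while the output%d.log naming follows the answer above.

```python
import os
import sys
import tempfile
from contextlib import ExitStack
from subprocess import Popen, STDOUT

def run_logged(commands, log_dir):
    """Run commands concurrently, each writing stdout and stderr to its own log.

    commands is a list of argument lists; returns the log file paths in order.
    """
    paths = []
    with ExitStack() as stack:
        for i, cmd in enumerate(commands):
            path = os.path.join(log_dir, 'output%d.log' % i)
            log = stack.enter_context(open(path, 'w'))
            process = Popen(cmd, stdout=log, stderr=STDOUT)
            # wait for this child when the stack unwinds, even after an error
            stack.callback(process.wait)
            paths.append(path)
    return paths
```

All children are started before any of them is waited for, so the commands still run in parallel as in the original script.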
I have the following Python code that hangs:
cmd = ["ssh", "-tt", "-vvv"] + self.common_args
cmd += [self.host]
cmd += ["cat > %s" % (out_path)]
p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate(in_string)
It is supposed to save a string (in_string) into a remote file over ssh.
The file is correctly saved but then the process hangs. If I use
cmd += ["echo"] instead of
cmd += ["cat > %s" % (out_path)]
the process does not hang, so I am pretty sure that I misunderstand something about how communicate decides that the process has exited.
Do you know how I should write the command so that "cat > file" does not make communicate hang?
The -tt option allocates a tty, which prevents the child process from exiting when .communicate() closes p.stdin (EOF is ignored). This works:
from shlex import quote  # Python 3.3+; use pipes.quote on Python 2
from subprocess import Popen, PIPE
cmd = ["ssh", self.host, "cat > " + quote(out_path)]  # no '-tt'
p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate(in_string)
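The mechanism can be observed locally, without ssh or a tty. In this minimal sketch a Python one-liner stands in for the remote cat (an assumption for illustration): it reads stdin until EOF, and it only terminates because communicate() closes the pipe after writing.

```python
import sys
from subprocess import Popen, PIPE

def copy_via_child(data):
    """Pipe data through a child that reads stdin until EOF, like `cat`."""
    child = [sys.executable, '-c',
             'import sys; sys.stdout.write(sys.stdin.read())']
    p = Popen(child, stdin=PIPE, stdout=PIPE)
    # communicate() writes data, closes stdin (the child sees EOF), reads stdout
    out, _ = p.communicate(data)
    return out
```

With a tty in between, as with ssh -tt, closing the local pipe is not seen as end-of-file by the remote command, which is why the original call never returns.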
You could use paramiko, a pure-Python SSH library, to write data to a remote file via ssh:
#!/usr/bin/env python
import os
import posixpath
import sys
from contextlib import closing
from paramiko import SSHConfig, SSHClient
hostname, out_path, in_string = sys.argv[1:] # get from command-line
# load parameters to setup ssh connection
config = SSHConfig()
with open(os.path.expanduser('~/.ssh/config')) as config_file:
    config.parse(config_file)
d = config.lookup(hostname)
# connect
with closing(SSHClient()) as ssh:
    ssh.load_system_host_keys()
    ssh.connect(d['hostname'], username=d.get('user'))
    with closing(ssh.open_sftp()) as sftp:
        makedirs_exists_ok(sftp, posixpath.dirname(out_path))
        with sftp.open(out_path, 'wb') as remote_file:
            remote_file.write(in_string)
where the makedirs_exists_ok() function mimics os.makedirs():
from functools import partial
from stat import S_ISDIR
def isdir(ftp, path):
    try:
        return S_ISDIR(ftp.stat(path).st_mode)
    except EnvironmentError:
        return None
def makedirs_exists_ok(ftp, path):
    def exists_ok(mkdir, name):
        """Don't raise an error if name is already a directory."""
        try:
            mkdir(name)
        except EnvironmentError:
            if not isdir(ftp, name):
                raise
    # from os.makedirs()
    head, tail = posixpath.split(path)
    if not tail:
        assert path.endswith(posixpath.sep)
        head, tail = posixpath.split(head)
    if head and tail and not isdir(ftp, head):
        exists_ok(partial(makedirs_exists_ok, ftp), head)  # recursive call
    # do create directory
    assert isdir(ftp, head)
    exists_ok(ftp.mkdir, path)
It makes sense that the cat command hangs: it is waiting for an EOF. I tried sending an EOF in the string but couldn't get it to work. While researching this question, I found a great module for streamlining the use of SSH for command-line tasks like your cat example. It might not be exactly what you need for your use case, but it does do what your question asks.
Install fabric with
pip install fabric
Inside a file called fabfile.py put
from fabric.api import run
def write_file(in_string, path):
    run('echo {} > {}'.format(in_string, path))
And then run this from the command prompt with:
fab -H username@host write_file:in_string=test,path=/path/to/file
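One caveat with building the remote command by plain string formatting is that a string or path containing spaces or shell metacharacters would be reinterpreted by the remote shell. A possible way around that, sketched here with a hypothetical helper name and assuming Python 3.3+ for shlex.quote, is to quote both values and use printf so the text is written verbatim:

```python
import shlex

def remote_write_command(in_string, path):
    """Build a shell command that writes in_string to path verbatim.

    Both values are shell-quoted; printf %s adds no trailing newline.
    """
    return 'printf %s {} > {}'.format(shlex.quote(in_string), shlex.quote(path))
```

The resulting string could then be passed to run() in the fabfile in place of the echo command.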