I am trying to use python in a unix style pipe.
For example, in unix I can use a pipe such as:
$ samtools view -h somefile.bam | python modifyStdout.py | samtools view -bh - > processed.bam
I can do this by using a for line in sys.stdin: loop in the python script and that appears to work without problems.
However I would like to internalise this unix command into a python script. The files involved will be large so I would like to avoid blocking behaviour, and basically stream between processes.
At the moment I am trying to use Popen to manage each command, and pass the stdout of the first process to the stdin of the next process, and so on.
In a separate Python script (sep_process.py) I have:
import sys
f = open("sentlines.txt", 'w')  # 'wr' is not a valid mode; 'w' opens for writing
f.write("hi")
for line in sys.stdin:
    print line
    f.write(line)
f.close()
And in my main python script I have this:
import sys
from subprocess import Popen, PIPE
# Generate an example file to use
f = open('sees.txt', 'w')
f.write('somewhere over the\nrainbow')
f.close()
if __name__ == "__main__":
    # Use grep as an example command
    p1 = Popen("grep over sees.txt".split(), stdout=PIPE)
    # Send to sep_process.py
    p2 = Popen("python ~/Documents/Pythonstuff/Bam_count_tags/sep_process.py".split(), stdin=p1.stdout, stdout=PIPE)
    # Send to final command
    p3 = Popen("wc", stdin=p2.stdout, stdout=PIPE)
    # Read output from wc
    result = p3.stdout.read()
    print result
However, the p2 process fails with [Errno 2] No such file or directory, even though the file exists.
Do I need to implement a Queue of some kind and/or open the python function using the multiprocessing module?
The tilde ~ is a shell expansion. You are not using a shell, so it is looking for a directory called ~.
You could read the environment variable HOME and insert that. Use
os.environ['HOME']
Alternatively you could use shell=True if you can't be bothered to do your own expansion.
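A minimal sketch of doing the expansion yourself, without a shell. The script path is the one from the question; `build_command` is a hypothetical helper name used only for illustration. `os.path.expanduser` replaces a leading `~` with the home directory, which is what the shell would otherwise have done:

```python
import os


def build_command(script_path):
    """Return an argv list with a leading ~ in script_path expanded."""
    # Popen does not expand "~" on its own; expanduser does it for us
    # (on POSIX it uses the HOME environment variable when it is set).
    return ["python", os.path.expanduser(script_path)]


cmd = build_command("~/Documents/Pythonstuff/Bam_count_tags/sep_process.py")
print(cmd)
```

The resulting list can be passed straight to Popen in place of the `.split()` string.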
Thanks @cdarke, that solved the problem for simple commands like grep, wc etc. However, I was too stupid to get subprocess.Popen to work when using an executable such as samtools to provide the data stream.
To fix the issue, I created a string containing the pipe exactly as I would write it in the command line, for example:
import os
sam = '/Users/me/Documents/Tools/samtools-1.2/samtools'
home = os.environ['HOME']
inpath = "{}/Documents/Pythonstuff/Bam_count_tags".format(home)
stream_in = "{s} view -h {ip}/test.bam".format(s=sam, ip=inpath)
pyscript = "python {ip}/bam_tags.py".format(ip=inpath)
stream_out = "{s} view -bh - > {ip}/small.bam".format(s=sam, ip=inpath)
# Absolute paths, written as a pipe
fullPipe = "{inS} | {py} | {outS}".format(inS=stream_in,
                                          py=pyscript,
                                          outS=stream_out)
print fullPipe
# Translates to >>>
# samtools view -h test.bam | python ./bam_tags.py | samtools view -bh - > small.bam
I then used popen from the os module instead and this worked as expected:
os.popen(fullPipe)
I would like to call a complex command line in Python and capture its output, and I don't understand how I should be doing it:
Command line that I'm trying to run is:
cat codegen_query_output.json | jq -r '.[0].code' | echoprint-inverted-query index.bin
As far as I got is:
process = subprocess.Popen(['ls', '-a'], stdout=subprocess.PIPE)
out, err = process.communicate()
print out
but this is just a simple ls -a ([cmd, args]). Any idea how I should structure my more complex command line call?
The cleanest way is to create 2 subprocesses piped together. You don't need a subprocess for the cat command, just pass an opened file handle:
import subprocess
with open("codegen_query_output.json") as input_stream:
    jqp = subprocess.Popen(["jq","-r",'.[0].code'],stdin=input_stream,stdout=subprocess.PIPE)
    ep = subprocess.Popen(["echoprint-inverted-query","index.bin"],stdin=jqp.stdout,stdout=subprocess.PIPE)
    output = ep.stdout.read()
    return_code = ep.wait() or jqp.wait()
The jqp process takes the file contents as input. Its output is passed to ep input.
In the end we read the output from ep to get the final result. The return_code combines both return codes: it is ep's code if that is non-zero, otherwise jqp's, so it differs from 0 if something goes wrong (test the two codes separately if you need to know which process failed).
Standard error isn't captured here. It is displayed on the console unless stderr=subprocess.STDOUT is set, which merges it into the piped output.
This method doesn't require a shell or shell=True, which makes it more portable and secure.
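The same pattern can be wrapped in a small helper. This is only a sketch: `run_pipeline` is a hypothetical name, and the two commands are stand-ins that use the Python interpreter itself so the example runs without jq or echoprint installed. It also closes the parent's copy of the first process's stdout, as the subprocess documentation recommends, and returns both exit codes separately:

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Pipe first_cmd's stdout into second_cmd; return (output, exit codes)."""
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy so p1 can receive SIGPIPE if p2 exits early.
    p1.stdout.close()
    output, _ = p2.communicate()
    return output, (p1.wait(), p2.returncode)


# Stand-in commands (assumptions, chosen so the sketch runs anywhere)
producer = [sys.executable, "-c", "print('alpha'); print('beta')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

output, codes = run_pipeline(producer, consumer)
print(output.decode(), codes)
```

Data flows between the two processes through the OS pipe, so nothing is buffered in the parent until the final read.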
It takes a shell to interpret operators like |. You can ask Python to run a shell, and pass your command as the thing to execute:
cmd = "cat test.py | tail -n3"
process = subprocess.Popen(['bash', '-c', cmd], stdout=subprocess.PIPE)
out, err = process.communicate()
print out
I have a Python script that calls a shell script, which in turn calls a .exe called iv4_console. I need to print the stdout of iv4_console for debugging purposes. I used this:
Python:
import sys
import subprocess
var="rW015005000000"
proc = subprocess.Popen(["c.sh", var], shell=True, stdout=subprocess.PIPE)
output = ''
for line in iter(proc.stdout.readline, ""):
    print line
    output += line
Shell:
start_dir=$PWD
release=$1
echo Release inside shell: $release
echo Directory: $start_dir
cd $start_dir
cd ../../iv_system4/ports/visualC12/Debug
echo Debug dir: $PWD
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_DUN722_20160307_Krk_Krk_113048_092_1_$release.xml
./iv4_console.exe ../embedded/LUA/analysis/verbose-udp-toxml.lua ../../../../../logs/$release/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.dvl &>../../../../FCW/ObjectDetectionTest/VASP_FL140_20170104_C60_Checkout_afterIC_162557_001_$release.xml
exit
But this didn't work, it prints nothing. What do you think?
See my comment, best approach (i.m.o) would be to just use python only.
However, in answer to your question, try:
import sys
import subprocess
var="rW015005000000"
proc = subprocess.Popen(["/bin/bash", "/full/path/to/c.sh", var], stdout=subprocess.PIPE)
# Best to always avoid shell=True because of security vulnerabilities.
# communicate() reads all the output and waits for the script to finish, so no
# separate proc.wait() is needed (calling wait() first can deadlock if the pipe fills up).
output, errors = proc.communicate()
print(output.decode())
# Since Popen.communicate() returns a bytes string, you can use .decode() to print the actual output as a string.
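If you would rather see the output while the script is still running, a sketch along these lines may help. `stream_lines` is a hypothetical helper, and the demo command is a stand-in for `["/bin/bash", "/full/path/to/c.sh", var]`. Note that it can only show what actually reaches the script's stdout: the shell script in the question redirects the output of iv4_console into .xml files with `&>`, so those lines never arrive on the pipe.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Run cmd and yield its combined stdout/stderr one text line at a time."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Stand-in command (assumption) so the sketch runs without c.sh
demo_cmd = [sys.executable, "-c", "print('first'); print('second')"]
lines = []
for line in stream_lines(demo_cmd):
    print(line)
    lines.append(line)
```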
You can use
import subprocess
subprocess.call(['./c.sh'])
to call the shell script from a Python file,
or
import subprocess
import shlex
subprocess.call(shlex.split('./c.sh var'))
I have a .bat script which is calculating the execution time of a process. As follows:
set startTime=%time%
YourApp.exe
echo Start Time: %startTime%
echo Finish Time: %time%
Now, I want to return "Finish Time" to a variable of the script from which this .bat script is called, but I can't work out how to return the value from the .bat script. Kindly suggest how I can achieve it.
You can combine subprocess and regex to parse the output
import subprocess
import re
proc = subprocess.Popen(
    ("yourBatch.bat", "arguments1", "argument2"),
    stdout=subprocess.PIPE, universal_newlines=True)
# communicate() returns the whole output as text and waits for the batch file
output, _ = proc.communicate()
finish_time_search = re.search(r'Finish Time: (.*)', output, re.IGNORECASE)
if finish_time_search:
    finish_time = finish_time_search.group(1).strip()
From the script side:
It is easy to parse the output of the called .bat file and get the information you need without an exit code. You can use the subprocess module and the communicate method to capture the output. As shown in the Python documentation:
https://docs.python.org/2/library/subprocess.html
output=`dmesg | grep hda`
# becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
From the DOS side:
You can use the "exit" command:
exit /b %errorlevel%
And use errorlevel for your purpose.
Please, for more info, look at:
http://www.computerhope.com/exithlp.htm
Another option is to use an environment variable set with the "setx" command (which writes to the registry, so the value is not volatile), or a file as temporary storage.
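For the exit-code route, the calling side could look like the sketch below. `run_and_get_exit_code` is a hypothetical helper, and the child command is a stand-in for `["yourBatch.bat"]` that simply exits with status 3. An exit code is an integer, so it suits a status or a duration in seconds better than a clock time.

```python
import subprocess
import sys


def run_and_get_exit_code(cmd):
    """Run cmd, wait for it to finish, and return its exit code."""
    return subprocess.call(cmd)


# Stand-in child (assumption): exits with status 3, like `exit /b 3`
code = run_and_get_exit_code([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)
```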
I write lots of small scripts to manipulate files on a Bash-based server. I would like to have a mechanism by which to log which commands created which files in a given directory. However, I don't just want to capture every input command, all the time.
Approach 1: a wrapper script that uses a Bash builtin (a la history or fc -ln -1) to grab the last command and write it to a log file. I have not been able to figure out any way to do this, as the shell builtin commands do not appear to be recognized outside of the interactive shell.
Approach 2: a wrapper script that pulls from ~/.bash_history to get the last command. This, however, requires setting up the Bash shell to flush every command to history immediately (as per this comment) and seems also to require that the history be allowed to grow inexorably. If this is the only way, so be it, but it would be great to avoid having to edit the ~/.bashrc file on every system where this might be implemented.
Approach 3: use script. My problem with this is that it requires multiple commands to start and stop the logging, and because it launches its own shell it is not callable from within another script (or at least, doing so complicates things significantly).
I am trying to figure out an implementation that's of the form log_this.script other_script other_arg1 other_arg2 > file, where everything after the first argument is logged. The emphasis here is on efficiency and minimizing syntax overhead.
EDIT: iLoveTux and I both came up with similar solutions. For those interested, my own implementation follows. It is somewhat more constrained in its functionality than the accepted answer, but it also auto-updates any existing logfile entries with changes (though not deletions).
Sample usage:
$ cmdlog.py "python3 test_script.py > test_file.txt"
creates a log file in the parent directory of the output file with the following:
2015-10-12#10:47:09 test_file.txt "python3 test_script.py > test_file.txt"
Additional file changes are added to the log;
$ cmdlog.py "python3 test_script.py > test_file_2.txt"
the log now contains
2015-10-12#10:47:09 test_file.txt "python3 test_script.py > test_file.txt"
2015-10-12#10:47:44 test_file_2.txt "python3 test_script.py > test_file_2.txt"
Running on the original file name again changes the file order in the log, based on modification time of the files:
$ cmdlog.py "python3 test_script.py > test_file.txt"
produces
2015-10-12#10:47:44 test_file_2.txt "python3 test_script.py > test_file_2.txt"
2015-10-12#10:48:01 test_file.txt "python3 test_script.py > test_file.txt"
Full script:
#!/usr/bin/env python3
'''
A wrapper script that will write the command-line
args associated with any files generated to a log
file in the directory where the files were made.
'''
import sys
import os
from os import listdir
from os.path import isfile, join
import subprocess
import time
from datetime import datetime
def listFiles(mypath):
    """
    Return relative paths of all files in mypath
    """
    return [join(mypath, f) for f in listdir(mypath) if
            isfile(join(mypath, f))]

def read_log(log_file):
    """
    Reads a file history log and returns a dictionary
    of {filename: command} entries.
    Expects tab-separated lines of [time, filename, command]
    """
    entries = {}
    with open(log_file) as log:
        for l in log:
            l = l.strip()
            mod, name, cmd = l.split("\t")
            # cmd = cmd.lstrip("\"").rstrip("\"")
            entries[name] = [cmd, mod]
    return entries

def time_sort(t, fmt):
    """
    Turn a strftime-formatted string into a tuple
    of time info
    """
    parsed = datetime.strptime(t, fmt)
    return parsed

ARGS = sys.argv[1]
ARG_LIST = ARGS.split()
# Guess where logfile should be put
# Note: (">" or ">>") in ARG_LIST would only ever test for ">"
if ">" in ARG_LIST or ">>" in ARG_LIST:
    # Get position after the last redirect in arg list
    redirect_index = max(i for i, e in enumerate(ARG_LIST) if e in (">", ">>"))
    output = ARG_LIST[redirect_index + 1]
    output = os.path.abspath(output)
    out_dir = os.path.dirname(output)
elif "cp" in ARG_LIST or "mv" in ARG_LIST:
    # abspath so that a bare filename does not yield an empty directory name
    output = os.path.abspath(ARG_LIST[-1])
    out_dir = os.path.dirname(output)
else:
    out_dir = os.getcwd()
# Set logfile location within the inferred output directory
LOGFILE = out_dir + "/cmdlog_history.log"
# Get file list state prior to running
all_files = listFiles(out_dir)
pre_stats = [os.path.getmtime(f) for f in all_files]
# Run the desired external commands
subprocess.call(ARGS, shell=True)
# Get done time of external commands
TIME_FMT = "%Y-%m-%d#%H:%M:%S"
log_time = time.strftime(TIME_FMT)
# Get existing entries from logfile, if present
if LOGFILE in all_files:
    logged = read_log(LOGFILE)
else:
    logged = {}
# Get file list state after run is complete
post_stats = [os.path.getmtime(f) for f in all_files]
post_files = listFiles(out_dir)
# Find files whose states have changed since the external command
changed = [e[0] for e in zip(all_files, pre_stats, post_stats) if e[1] != e[2]]
new = [e for e in post_files if e not in all_files]
all_modded = list(set(changed + new))
if not all_modded:  # exit early, no need to log
    sys.exit(0)
# Replace files that have changed, add those that are new
for f in all_modded:
    name = os.path.basename(f)
    logged[name] = [ARGS, log_time]
# Write changed files to logfile
with open(LOGFILE, 'w') as log:
    for name, info in sorted(logged.items(), key=lambda x: time_sort(x[1][1], TIME_FMT)):
        cmd, mod_time = info
        if not cmd.startswith("\""):
            cmd = "\"{}\"".format(cmd)
        log.write("\t".join([mod_time, name, cmd]) + "\n")
sys.exit(0)
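The before/after comparison in the script can also be expressed with dictionaries, which sidesteps one fragile spot: the script reads the post-run mtime of every file in the pre-run list, and os.path.getmtime raises if the command deleted one of them. The sketch below is not part of the script above; `snapshot` and `changed_files` are hypothetical names, and the demo uses a throwaway temporary directory.

```python
import os
import tempfile


def snapshot(directory):
    """Map each regular file in directory to its modification time."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[path] = os.path.getmtime(path)
    return result


def changed_files(before, after):
    """Return sorted paths that are new or whose mtime differs."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)


# Demo: one file exists beforehand, one is created by "the command"
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "old.txt"), "w") as fh:
        fh.write("unchanged")
    before = snapshot(tmp)
    with open(os.path.join(tmp, "new.txt"), "w") as fh:
        fh.write("created by the command")
    after = snapshot(tmp)
    changed = [os.path.basename(p) for p in changed_files(before, after)]

print(changed)
```

As with the original, a file rewritten within the filesystem's timestamp granularity may go unnoticed.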
You can use the tee command, which stores its standard input to a file and outputs it on standard output. Pipe the command line into tee, and pipe tee's output into a new invocation of your shell:
echo '<command line to be logged and executed>' | \
tee --append /path/to/your/logfile | \
$SHELL
i.e., for your example of other_script other_arg1 other_arg2 > file,
echo 'other_script other_arg1 other_arg2 > file' | \
tee --append /tmp/mylog.log | \
$SHELL
If your command line needs single quotes, they need to be escaped properly.
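If the command line is assembled in Python, the escaping can be left to the standard library. A small sketch, assuming the `echo ... | tee ... | $SHELL` pipeline above is built as a string; `echo_for_tee` is a hypothetical helper name. `shlex.quote` wraps the text in single quotes and escapes any single quotes inside it:

```python
import shlex


def echo_for_tee(command_line):
    """Build the `echo ...` stage with command_line safely quoted."""
    return "echo {}".format(shlex.quote(command_line))


stage = echo_for_tee("grep 'foo bar' input.txt > file")
print(stage)
```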
OK, so you don't mention Python in your question, but it is tagged Python, so I figured I would see what I could do. I came up with this script:
import sys
from os.path import expanduser, join
from subprocess import Popen, PIPE
def issue_command(command):
    process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True)
    return process.communicate()

home = expanduser("~")
log_file = join(home, "command_log")
# Join the arguments into one string: with shell=True on POSIX, a list
# would run only its first item as the command.
command = " ".join(sys.argv[1:])
with open(log_file, "a") as fout:
    fout.write("{}\n".format(command))
out, err = issue_command(command)
which you can call like (if you name it log_this and make it executable):
$ log_this echo hello world
and it will put "echo hello world" in the file ~/command_log. Note, though, that if you want to use pipes or redirection you have to quote your command (this may or may not be a real drawback for your use case; I haven't yet figured out how to do it without the quotes), like this:
$ log_this "echo hello world | grep h >> /tmp/hello_world"
but since it's not perfect, I thought I would add a little something extra.
The following script allows you to specify a different file to log your commands to as well as record the execution time of the command:
#!/usr/bin/env python
from subprocess import Popen, PIPE
import argparse
from os.path import expanduser, join
from time import time
def issue_command(command):
    process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True)
    return process.communicate()

home = expanduser("~")
default_file = join(home, "command_log")
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--file", type=argparse.FileType("a"), default=default_file)
parser.add_argument("-p", "--profile", action="store_true")
parser.add_argument("command", nargs=argparse.REMAINDER)
args = parser.parse_args()
# Join into one string: with shell=True on POSIX, a list would run only its first item
command = " ".join(args.command)
if args.profile:
    start = time()
    out, err = issue_command(command)
    runtime = time() - start
    entry = "{}\t{}\n".format(command, runtime)
else:
    out, err = issue_command(command)
    entry = "{}\n".format(command)
args.file.write(entry)
args.file.close()
You would use this the same way as the other script. To log to a different file, pass -f <FILENAME> before your actual command; to record the execution time, pass -p (for profile) before your actual command, like so:
$ log_this -p -f ~/new_log "echo hello world | grep h >> /tmp/hello_world"
I will try to make this better, but if you can think of anything else this could do for you, I am making a github project for this where you can submit bug reports and feature requests.
If possible, I would like to avoid subprocess.Popen. I want to capture the stdout of the process started by the child because I need to save its output in a variable and display it later, but I have yet to find a way to do so. I also need to start multiple programs without necessarily closing the one that is active, and I need to control the child process from the parent process.
I'm launching a subprocess like this
listProgram = ["./perroquet.py"]
listOutput = ["","",""]
pipePerroquet = os.pipe()
pipeMain = os.pipe()
pipeAge = os.pipe()
pipeSavoir = os.pipe()
pid = os.fork()
process = 1
if pid == 0:
    os.close(pipePerroquet[1])
    os.dup2(pipePerroquet[0],0)
    sys.stdout = os.fdopen(pipeMain[1], 'w')
    os.execvp("./perroquet.py", listProgram)
Now as you can see I'm launching the program with os.execvp and using os.dup2() to redirect the stdout of the child. However I'm not sure of what I've done in the code and want to know of the correct way to redirect stdout with os.dup2 and then be able to read it in the parent process.
Thank you for your help.
I cannot understand why you do not want to use the excellent subprocess module, which could save you a lot of boilerplate code (and as many chances for error). Anyway, I assume perroquet.py is a Python script, not an executable program. The shell knows how to find the correct interpreter for scripts, but the exec family are low-level functions that expect a real executable program.
You should at least have something like :
listProgram = [ "python", "./perroquet.py","",""]
...
os.execvp("python", listProgram)
But I'd rather use :
prog = subprocess.Popen(("python", "./perroquet.py", "", ""), stdout=subprocess.PIPE)
or, since you are already in Python, even import it and call its functions directly.
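If you do want to stay with os.fork and os.dup2, the sketch below shows one way to wire it up on a POSIX system. `run_and_capture` is a hypothetical helper and the demo child is a stand-in for perroquet.py. The key point is that assigning to sys.stdout in the child has no effect after exec, because the new program only inherits file descriptors; the write end of the pipe therefore has to be duplicated onto descriptor 1 before calling os.execvp.

```python
import os
import sys


def run_and_capture(argv):
    """Fork, exec argv with stdout sent to a pipe; return (output, status)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make descriptor 1 (stdout) refer to the pipe's write end.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; leave without running parent code.
            os._exit(127)
    # Parent: close the write end so read() sees EOF when the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    return output, status


# Stand-in child (assumption) so the sketch runs without perroquet.py
output, status = run_and_capture([sys.executable, "-c", "print('coco')"])
print(output, os.WEXITSTATUS(status))
```

Reading until EOF before waitpid avoids a deadlock when the child writes more than the pipe buffer can hold.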
EDIT :
It looks like what you really want is:
user gives you a command (can be almost anything)
[ you validate that the command is safe ] - unsure if you intend to do it but you should ...
you make the shell execute the command and get its output - you may want to read stderr too and control exit code
You should try something like
import subprocess
while True:
    cmd = raw_input("command: ")  # input with Python 3
    # Compare against the string "exit", not the builtin exit object
    if cmd.strip().lower() == "exit": break
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, shell=True)
    out, err = proc.communicate()
    code = proc.returncode
    print("OUT", out, "ERR", err, "CODE", code)
It is absolutely unsafe, since this code executes any command as the underlying shell would do (including rm -rf *, rd /s/q ., ...), but it gives you the output, the error output and the return code of the command, and it can be used in a loop. The only limitation is that, because each command runs in a fresh shell, commands that change the shell environment are executed but have no lasting effect.
Here's a solution if you need to extract any changes to the environment
from subprocess import Popen, PIPE
import os
def execute_and_get_env(cmd, initial_env=None):
    if initial_env is None:
        initial_env = os.environ
    r_fd, w_fd = os.pipe()
    write_env = "; env >&{}".format(w_fd)
    p = Popen(cmd + write_env, shell=True, env=initial_env, pass_fds=[w_fd], stdout=PIPE, stderr=PIPE)
    output, error = p.communicate()
    # this will cause problems if the environment gets very large as
    # writing to the pipe will hang because it gets full and we only
    # read from the pipe when the process is over
    os.close(w_fd)
    with open(r_fd) as f:
        env = dict(line[:-1].split("=", 1) for line in f)
    return output, error, env

export_cmd = "export my_var='hello world'"
echo_cmd = "echo $my_var"
out, err, env = execute_and_get_env(export_cmd)
out, err, env = execute_and_get_env(echo_cmd, env)
print(out)