In my Python script I need to use awk, but I want to pass the file via sys.argv.
My current code looks like this:
import sys
import os
cmd="awk '/regex/ {print}' sys.argv[1] | sed 's/old/new/g'"
x=os.popen(cmd).read()
Now the problem is that sys.argv is a Python object, but the cmd variable is a Linux command string.
So my question is: is there any way to include sys.argv in my Linux command?
You really don't need Awk or sed for this. Python can do these things natively, elegantly, flexibly, robustly, and naturally.
import re
import sys

r = re.compile(r'regex')
s = re.compile(r'old')
with open(sys.argv[1]) as infile:  # named infile to avoid shadowing the input() builtin
    for line in infile:
        if r.search(line):
            print(s.sub('new', line), end='')  # end='' because line keeps its own newline
If you genuinely want to use a subprocess for something, simply use Python's general string interpolation facilities wherever you need to insert the value of a Python variable into a string.
import subprocess
import sys
import shlex
result = subprocess.run(
    # the doubled braces produce literal { } after str.format()
    """awk '/regex/ {{print}}' {} |
       sed 's/old/new/g'""".format(shlex.quote(sys.argv[1])),
    stdout=subprocess.PIPE,
    shell=True, check=True)
print(result.stdout)
But really, don't do this. If you really can't avoid a subprocess, keep it as simple as possible (avoid shell=True and peel off all the parts which can be done in Python).
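For instance, here is a minimal sketch of that advice applied to the command above: awk runs without a shell, and the sed step becomes plain Python (re.sub applies the replacement globally, like sed 's/old/new/g'):
import re
import subprocess
import sys

# No shell: the argv list is passed directly, so no quoting is needed
result = subprocess.run(
    ["awk", "/regex/ {print}", sys.argv[1]],
    stdout=subprocess.PIPE, check=True, text=True)
print(re.sub("old", "new", result.stdout), end="")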
Just try it like this:
cmd="awk '/regex/ {print}' " + str(sys.argv[1]) + " | sed 's/old/new/g'"
x=os.popen(cmd).read()
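If you do build the command string by concatenation like this, it is worth quoting the argument; a small sketch of the same one-liner with shlex.quote:
import os
import shlex
import sys

# shlex.quote() keeps filenames with spaces or shell metacharacters intact
cmd = "awk '/regex/ {print}' " + shlex.quote(sys.argv[1]) + " | sed 's/old/new/g'"
x = os.popen(cmd).read()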
Your best choice is to implement your logic as pure Python logic, as described in the first part of the answer by @tripleee. Your second best choice is to keep the external tools, but eliminate the need for a shell in invoking them and connecting them together.
See the Python documentation section Replacing Shell Pipelines.
import sys
from subprocess import Popen, PIPE
p1 = Popen(['awk', '/regex/ {print}'], stdin=open(sys.argv[1]), stdout=PIPE)
p2 = Popen(['sed', 's/old/new/g'], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits early
x = p2.communicate()[0]
Your third best choice is to keep the shell, but pass the data out-of-band from the code:
import subprocess
import sys

p = subprocess.run([
"""awk '/regex/ {print}' <"$1" | sed 's/old/new/'""", # code to run
'_', # $0 in context of that code
sys.argv[1] # $1 in context of that code
], shell=True, check=True, stdout=subprocess.PIPE)
print(p.stdout)
Related
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far; can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=file("outfile.txt", "w"))
p_awk.communicate("input data")
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen("awk -f script.awk | sort > outfile.txt",
                            stdin=subprocess.PIPE, shell=True)
awk_sort.communicate(b"input data\n")
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit: some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file against awk | sort will reveal whether concurrency helps. With sort, it rarely helps, because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
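As a sketch of that "Python to sort" shape (what script.awk actually does is not shown in the question, so the transformation step below is a placeholder):
import subprocess

with open("outfile.txt", "wb") as out:
    sort_proc = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=out)
    for line in b"input data\n".splitlines(keepends=True):
        sort_proc.stdin.write(line)  # placeholder: do whatever script.awk did here
    sort_proc.stdin.close()
    sort_proc.wait()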
Sidebar: why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe (not a Python subprocess.PIPE): call os.pipe(), which returns two new file descriptors connected through a common buffer. At this point the process has stdin, stdout and stderr from its parent, plus two files that will become "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.fork() and the os.exec*() family, and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
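A rough sketch of those steps in pure Python, assuming a and b are real executables on the PATH (error handling omitted):
import os

def pipeline(a_argv, b_argv):
    rd, wr = os.pipe()               # the OS-level pipe
    if os.fork() == 0:               # child that becomes `a`
        os.dup2(wr, 1)               # a's stdout -> the pipe's write end
        os.close(rd)
        os.close(wr)
        os.execvp(a_argv[0], a_argv)
    if os.fork() == 0:               # child that becomes `b`
        os.dup2(rd, 0)               # b's stdin <- the pipe's read end
        os.close(rd)
        os.close(wr)
        os.execvp(b_argv[0], b_argv)
    os.close(rd)                     # the parent keeps neither end open
    os.close(wr)
    os.wait()                        # wait for both children
    os.wait()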
However, it's easier to delegate that operation to the shell.
import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
                 stdin=subprocess.PIPE).communicate(some_string)
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
with a.stdout, open("outfile.txt", "wb") as outfile:
b = Popen(["b"], stdin=a.stdout, stdout=outfile)
a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()] # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer sidesteps the actual question. Here is a snippet that chains the output of multiple processes.
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1M count=100
# cmd2 : tee
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# theoretically p1 and p2 may still be running; this ensures we collect their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
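In other words, something like this sketch (command names taken from the question):
from subprocess import Popen, PIPE

with open("outfile.txt", "wb") as outfile:
    p_sort = Popen(["sort"], stdin=PIPE, stdout=outfile)
    p_awk = Popen(["awk", "-f", "script.awk"], stdin=PIPE, stdout=p_sort.stdin)
    p_awk.communicate(b"input data\n")  # no need to read awk's output ourselves
    p_sort.stdin.close()
    p_sort.wait()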
Inspired by @Cristian's answer. I ran into the same issue, but with a different command, so I'm putting my tested example here, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Declared a lazy grep execution with stdin from a pipe. This command is executed when the ps command runs and fills the pipe with its stdout.
Called the primary command ps with stdout directed at the pipe used by the grep command.
Called communicate() on grep to collect its stdout from the pipe.
I like this approach because it is the natural pipe conception, gently wrapped in subprocess interfaces.
The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work; it hangs forever. You need to create a pipeline and a child manually, like this:
import os
import subprocess

input = """\
input data
more input
""" * 10

rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:               # child
    os.close(rd)
    os.write(wr, input.encode())  # os.write() takes bytes
    os.close(wr)
    os._exit(0)     # leave the forked child without running cleanup twice

p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print(out.decode().rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrarily long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
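The same trick can also be done with a thread instead of a forked child; a hypothetical helper in that spirit (not from the answer above):
import subprocess
import threading

def run_pipeline(commands, input_bytes):
    """Chain `commands` (argv lists) and feed `input_bytes` to the first one."""
    procs = [subprocess.Popen(commands[0], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE)]
    for cmd in commands[1:]:
        procs.append(subprocess.Popen(cmd, stdin=procs[-1].stdout,
                                      stdout=subprocess.PIPE))
        procs[-2].stdout.close()  # so upstream sees EOF/SIGPIPE correctly
    def feed():
        procs[0].stdin.write(input_bytes)
        procs[0].stdin.close()
    threading.Thread(target=feed).start()  # the thread plays the child's role
    return procs[-1].communicate()[0]

# e.g. out = run_pipeline([["awk", "{ print $2; }"], ["sort"]], input.encode())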
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile

tf = TemporaryFile()
tf.write(input.encode())  # TemporaryFile is opened in binary mode by default
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines, because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That is the only difference, though; the result on stdout is the same.) The reason seems to be that Python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling differs in these two cases.
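For what it's worth, Python 3's Popen restores SIGPIPE to SIG_DFL in the child by default (the restore_signals=True parameter), which removes this difference. On Python 2 you could restore the disposition by hand; a sketch:
import signal
import subprocess

def restore_sigpipe():
    # undo Python's SIG_IGN so the child dies quietly on a broken pipe,
    # the way it would under a shell
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE,
                          preexec_fn=restore_sigpipe)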
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work there.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.
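A minimal pipes.Template sketch for the question's pipeline (note that pipes was deprecated in Python 3.11 and removed in 3.13, so this only applies to older versions):
import pipes

t = pipes.Template()
t.append('awk -f script.awk', '--')  # '--' means: reads stdin, writes stdout
t.append('sort', '--')
f = t.open('outfile.txt', 'w')       # a writable file object; writes flow through the pipeline
f.write('input data\n')
f.close()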
For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=input_s.encode())  # bytes(input_s) needs an encoding on Python 3
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
I am writing a Python program that drives other software. I was able to pass the command using subprocess.Popen. I am now facing a new problem: I need to concatenate multiple files into two files and use them as the input for the external program. The command line looks like this:
extersoftware --fq --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)
I cannot use shell=True because there are other options I need to pass in via variables, such as --fq. (They are not limited to --fq; that is just an example.)
One possible solution is to generate intermediate files.
This is what I have tried:
file_1 = ['cat', 'fileA_1', 'fileB_1']
p1 = Popen(file_1, stdout=PIPE)
p2 = Popen(['>', 'output_file'], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()
print output
I got this error message: OSError: [Errno 2] No such file or directory. Which part did I get wrong?
It would be better if there were no intermediate file. For that reason I am looking at named pipes, but I do not quite understand them.
I have looked at multiple questions that have been answered here; to me they are all somehow different from my question.
Thanks in advance for all your help.
The way bash handles <(..) is to:
Create a pipe
Fork a command that writes to the write end
Substitute /dev/fd/N for the <(..), where N is the file descriptor of the pipe's read end (try echo <(true)).
Run the command
The command will then open /dev/fd/N, and the OS will cause that to duplicate the inherited read end of the pipe.
We can do the same thing in Python:
import subprocess
import os

# Open a pipe and run a command that writes to the write end
input_fd, output_fd = os.pipe()
subprocess.Popen(["cat", "foo.txt", "bar.txt"], shell=False, stdout=output_fd)
os.close(output_fd)

# Run a command that uses /dev/fd/* to read from the read end;
# pass_fds matters on Python 3, where close_fds defaults to True
proc = subprocess.Popen(["wc", "/dev/fd/" + str(input_fd)],
                        shell=False, stdout=subprocess.PIPE,
                        pass_fds=[input_fd])

# Read that command's output
print(proc.communicate()[0])
For example:
$ cat foo.txt
Hello
$ cat bar.txt
World
$ wc <(cat foo.txt bar.txt)
2 2 12 /dev/fd/63
$ python test.py
2 2 12 /dev/fd/4
Process substitution returns the device filename that is being used. You will have to assign the pipe to a higher FD (e.g. 20) by passing a function to preexec_fn that uses os.dup2() to copy it, and then pass the FD device filename (e.g. /dev/fd/20) as one of the arguments of the call.
def assignfd(fd, handle):
    def assign():
        os.dup2(handle, fd)
    return assign

...

p2 = Popen(['cat', '/dev/fd/20'], preexec_fn=assignfd(20, p1.stdout.fileno()))

...
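For instance, a fuller sketch of the same trick (filenames taken from the question; close_fds=False keeps both the inherited pipe and the dup2'd descriptor open in the child):
import os
from subprocess import Popen, PIPE

def assignfd(fd, handle):
    def assign():
        os.dup2(handle, fd)
    return assign

p1 = Popen(['cat', 'fileA_1', 'fileB_1'], stdout=PIPE)
p2 = Popen(['wc', '-l', '/dev/fd/20'],
           preexec_fn=assignfd(20, p1.stdout.fileno()),
           close_fds=False)
p1.stdout.close()
p2.wait()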
It's actually possible to have it both ways: using a shell, while passing a list of arguments through unambiguously in a way that doesn't allow them to be shell-parsed.
Use bash explicitly rather than shell=True to ensure that you have support for <(), and use "$@" to refer to the additional argv array elements, like so:
subprocess.Popen(['bash', '-c',
                  'extersoftware "$@" --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)',
                  '_',    # this is a dummy passed in as argv[0] of the interpreter
                  '--fq', # this is substituted into the shell by the "$@"
                  ])
If you wanted to independently pass in all three arrays -- extra arguments, and the exact filenames to be passed to each cat instance:
BASH_SCRIPT=r'''
declare -a filelist1=( )
filelist1_len=$1; shift
while (( filelist1_len-- > 0 )); do
  filelist1+=( "$1" ); shift
done

declare -a filelist2=( )
filelist2_len=$1; shift
while (( filelist2_len-- > 0 )); do
  filelist2+=( "$1" ); shift
done

extersoftware "$@" --f <(cat "${filelist1[@]}") <(cat "${filelist2[@]}")
'''

subprocess.Popen(['bash', '-c', BASH_SCRIPT, '_'] +
                 [str(len(filelist1))] + filelist1 +
                 [str(len(filelist2))] + filelist2 +
                 ['--fq'])
You could put more interesting logic in the embedded shell script as well, were you so inclined.
In this specific case, we may use:
import subprocess
import os

if __name__ == '__main__':
    input_fd1, output_fd1 = os.pipe()
    subprocess.Popen(['cat', 'fileA_1', 'fileB_1'],
                     shell=False, stdout=output_fd1)
    os.close(output_fd1)

    input_fd2, output_fd2 = os.pipe()
    subprocess.Popen(['cat', 'fileA_2', 'fileB_2'],
                     shell=False, stdout=output_fd2)
    os.close(output_fd2)

    proc = subprocess.Popen(['extersoftware', '--fq', '--f',
                             '/dev/fd/' + str(input_fd1),
                             '/dev/fd/' + str(input_fd2)], shell=False)
Change log:
Reformatted the code so it should be easier to read now (and hopefully still syntactically correct). It's tested in Python 2.6.6 on Scientific Linux 6.5 and everything looks fine.
Removed unnecessary semicolons.
I have a problem with a shell command: when I want to enter a value using raw_input and put it into the shell command, it displays "s was unexpected in this context".
Here is my program:
import curses, sys, os, signal, argparse
from multiprocessing import Process
from scapy.all import *
from subprocess import call, PIPE

def main():
    var = raw_input("Entre wap's #mac: ")
    subprocess.call('tshark -r crackWEP.pcap "((wlan.fc.type_subtype==0x20)&&(wlan.bssid==**"%s"%var**))"|wc -l', shell=True, stdout=subprocess.PIPE)

if __name__ == u'__main__':
    main()
Well, you are not substituting var into the command at all now, are you?
You mixed bash and Python. You probably meant:
var=raw_input("Entre wap's #mac: ")
subprocess.call('tshark -r crackWEP.pcap "((wlan.fc.type_subtype==0x20)&&(wlan.bssid==**"%s"%'+var+'**))"|wc -l', shell=True, stdout=subprocess.PIPE)
Also, be careful with user input and shell=True; people like to put "much fun" in there. I'd advise calling tshark with shell=False, capturing its output, and counting the lines in Python. Running a separate external program for that seems like a waste.
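A sketch of that advice, assuming a tshark new enough to accept -Y for the display filter (older versions used -R):
import subprocess

var = raw_input("Entre wap's #mac: ")
flt = '(wlan.fc.type_subtype==0x20)&&(wlan.bssid==%s)' % var
out = subprocess.check_output(['tshark', '-r', 'crackWEP.pcap', '-Y', flt])
print len(out.splitlines())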
Edit:
A second, more Pythonic version:
command = 'tshark -r crackWEP.pcap "((wlan.fc.type_subtype==0x20)&&(wlan.bssid==**"%s"%{}**))"|wc -l'.format(var)
subprocess.call(command, shell=True, stdout=subprocess.PIPE)
I am trying to mirror the following shell command using subprocess.Popen():
echo "SELECT employeeid FROM Users WHERE samaccountname=${1};" | bsqldb -S mdw2k8sqlp02.dow.com -D PhoneBookClient -U PortManUser -P plum45\\torts -q
It currently looks like:
stdout = subprocess.Popen(["echo", "\"SELECT", "employeeid", "FROM", "Users",
                           "WHERE", "samaccountname=${1};\"", "|", "bsqldb",
                           "arg1etc"], stdout=subprocess.PIPE)
for line in stdout.stdout.readlines():
    print line
It seems that this is wrong, it returns the following standard out:
"SELECT employeeid FROM Users WHERE samaccountname=${1};" | bsqldb arg1etc
Does anyone know where my syntax for subprocess.Popen() has gone wrong?
The problem is that you're trying to run a shell command without the shell. What happens is that you're passing all of those strings, including "|" and everything after it, as arguments to the echo command.
Just add shell=True to your call to fix that.
However, you almost definitely want to pass the command line as a string, instead of trying to guess at the list that will be joined back up into the string to pass to the shell.
Or, even better, don't use the shell, and instead pipe within Python. The docs have a nice section about Replacing shell pipeline (and all kinds of other things) with subprocess code.
But in your case, the thing you're trying to pipe is just echo, which is quite silly, since you already have exactly what echo would return, and can just feed it as the input to the second program.
Also, I'm not sure what you expect that ${1} to get filled in with. Presumably you're porting a shell script that took some arguments on the command line; your Python script may have the same thing in sys.argv[1], but without knowing more about what you're doing, that's little more than a guess.
The analog of echo some string | command arg1 arg2 shell pipeline in Python is:
from subprocess import Popen, PIPE
p = Popen(["command", "arg1", "arg2"], stdin=PIPE)
p.communicate("some string")
In your case, you could write it as:
import shlex
import sys
from subprocess import Popen, PIPE
cmd = shlex.split("bsqldb -S mdw2k8sqlp02.dow.com -D PhoneBookClient "
                  "-U PortManUser -P plum45\\torts -q")
# sql_escape() is left undefined here: supply escaping appropriate to your database
sql = """SELECT employeeid FROM Users
         WHERE samaccountname={name};""".format(name=sql_escape(sys.argv[1]))
p = Popen(cmd, stdin=PIPE)
p.communicate(input=sql)
sys.exit(p.returncode)
There are many posts here on SO, like this one: Store output of subprocess.Popen call in a string
There is a problem with complicated commands. For example, if I need to get output from this:
ps -ef|grep something|wc -l
subprocess won't do the job, because its argument is [name of program, arguments], so it is not possible to use more sophisticated commands (multiple programs, pipes, etc.).
Is there way to capture the output of a chain of multiple commands?
Just pass the shell=True option to subprocess
import subprocess
subprocess.check_output('ps -ef | grep something | wc -l', shell=True)
For a no-shell, clean version using the subprocess module, you can use the following example (from the documentation):
output = `dmesg | grep hda`
becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The Python program essentially does here what the shell does: it sends the output of each command to the next one in turn. An advantage of this approach is that the programmer has full control over the individual standard error outputs of the commands (they can be suppressed if needed, logged, etc.).
That said, I generally prefer to use instead the subprocess.check_output('ps -ef | grep something | wc -l', shell=True) shell-delegation approach suggested by nneonneo: it is general, very legible and convenient.
Well, another alternative would just be to implement part of the command in plain Python. For example,
import subprocess

count = 0
for line in subprocess.check_output(['ps', '-ef']).decode().split('\n'):
    if something in line:  # or re.search(something, line) to use regex
        count += 1
print(count)