Sed command in python


My input is as
Type combinational function (A B)
Want output to be
Type combinational
function (A B)
I used this code and it's working:
sed 's/\([^ ]* [^ ]*\) \(function.*\)/\1\n\2/' Input_file
When I use this code inside a Python script using os.system and subprocess, it gives me an error.
How can I execute this sed inside a Python script? Or how can I write Python code for the above sed code?
Python code used
cmd='''
sed 's/\([^ ]* [^ ]*\) \(function.*\)/\1\n\2/' Input_file
'''
subprocess.check_output(cmd, shell=True)
Error is
sed: -e expression #1, char 34: unterminated `s' command

The \n in the string is being replaced by Python with a literal newline before sed ever sees it. As suggested by @bereal in a comment, you can avoid that by using r'''...''' instead of '''...''' around the script; but a much better solution is to avoid doing in sed what Python already does very well all by itself.
with open('Input_file') as inputfile:
    lines = inputfile.read()
lines = lines.replace(' function', '\nfunction')
This is slightly less strict than your current sed script, in that it doesn't require exactly two space-separated tokens before the function marker. If you want to be strict, try re.sub() instead.
import re
# ...
lines = re.sub(r'^(\S+\s+\S+)\s+(function)', r'\1\n\2', lines, flags=re.M)
(Tangentially, you also want to avoid the unnecessary shell=True; perhaps see Actual meaning of 'shell=True' in subprocess)
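To make the re.sub() variant concrete, here is a minimal sketch applied to the sample line from the question. The function name split_before_function is made up for illustration, and the pattern uses [^\S\n]+ instead of \s+ so that a match cannot run across a line break (an adjustment of mine, not part of the answer above). Note that the flag is passed as flags=re.M, because the fourth positional argument of re.sub() is count.

```python
import re

def split_before_function(text):
    """Break each line after its first two tokens when 'function' follows.

    Hypothetical helper name; the pattern mirrors the sed script above.
    """
    # [^\S\n]+ matches spaces and tabs but not newlines, so a match stays on one line.
    pattern = r'^(\S+[^\S\n]+\S+)[^\S\n]+(function)'
    return re.sub(pattern, r'\1\n\2', text, flags=re.M)

print(split_before_function('Type combinational function (A B)'))
```

Lines that do not have exactly two tokens before "function" are returned unchanged.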

Although solutions 1 and 2 are the shortest valid way to get your code running (on Unix), I'd like to add some remarks:
a. os.system() has some issues related to it, and should be replaced by subprocess.call() with the command given as a list of arguments and shell=False (the default). Whether you use os.system or subprocess.call with shell=True, passing the command through a shell implies a security risk.
b. Since sed (and awk) are tools that rely heavily on regular expressions, it is recommended, when writing Python for maintainability, to use native Python code. In this case use re, the regular expression module, which has an optimized regular expression implementation.

Related

Invalid argument/option - '|' [duplicate]

When trying to run the tasklist command with grep by using subprocess:
command = ("tasklist | grep edpa.exe | gawk \"{ print $2 }\"")
p = subprocess.Popen(command, stdout=subprocess.PIPE)
text = p.communicate(timeout=600)[0]
print(text)
I get this error:
ERROR: Invalid argument/option - '|'.
Type "TASKLIST /?" for usage.
It works fine when I run the command directly from cmd, but when using subprocess something goes wrong.
How can it be fixed? I need to use the output of the command, so I cannot use os.system.

Two options:
Use the shell=True option of the Popen(); this will pass it through the shell, which is the part that interprets things like the |
Just run tasklist in the Popen(), then do the processing in Python rather than invoking grep and awk
Of the two, the latter is probably the better approach in this particular instance, since these grep and awk commands are easily translated into Python.
Your linters may also complain that shell=True is prone to security issues, although this particular usage would be OK.

In the absence of shell=True, subprocess runs a single subprocess. In other words, you are passing | and grep etc as arguments to tasklist.
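A small sketch of what that means in practice. To keep it runnable on any platform, a tiny Python child process stands in for tasklist (that stand-in is my assumption, not part of the question); it simply prints the arguments it received.

```python
import subprocess
import sys

# Stand-in child process (an assumption, so the sketch runs anywhere):
# a tiny Python program that prints the arguments it received.
child = [sys.executable, '-c', 'import sys; print(sys.argv[1:])']

# Without shell=True, each list element reaches the child as one literal argument.
received = subprocess.check_output(
    child + ['|', 'grep', 'edpa.exe'], text=True).strip()
print(received)
```

The child sees "|" as an ordinary argument, which is exactly what tasklist was complaining about.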
The simplest fix is to add shell=True; but a much better fix is to do the trivial text processing in Python instead. This also coincidentally gets rid of the useless grep.
for line in subprocess.check_output(['tasklist'], text=True, timeout=600).splitlines():
    if 'edpa.exe' in line:
        text = line.split()[1]
        print(text)
I have assumed you really want to match edpa.exe literally, anywhere in the output line; your regex would match edpa followed by any character followed by exe. The code could be improved by doing the split first and then looking for the search string only in the process name field (if that is indeed your intent).
Perhaps notice also how you generally want to avoid the low-level Popen whenever you can use one of the higher-level functions.
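The "split first, then compare the name field" improvement could look roughly like this. The helper name pids_for_image and the sample text are made up for illustration; the sample only imitates the general shape of tasklist output, and the sketch assumes image names without spaces.

```python
def pids_for_image(tasklist_output, image_name):
    """Return the PID column for rows whose first field equals image_name."""
    pids = []
    for line in tasklist_output.splitlines():
        fields = line.split()
        # Compare only the process-name field, and compare it literally.
        if len(fields) >= 2 and fields[0] == image_name:
            pids.append(fields[1])
    return pids

# Made-up sample in the general shape of tasklist output.
sample = """\
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
edpa.exe                      4321 Services                   0     12,345 K
notedpa.exe                   5555 Console                    1      1,024 K
"""
print(pids_for_image(sample, 'edpa.exe'))
```

In real use you would pass it the result of subprocess.check_output(['tasklist'], text=True) instead of the sample string.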

How to apply string formatting to a bash command (incorporated into Python script via subprocess)?

I would like to add a bash command to my Python script, which linearises a FASTA sequence file while leaving sequence separation intact (hence the specific choice of command). Below is the command, with the example input file of "inputfile.txt":
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < inputfile.txt
The aim is to allow the user to specify the file which is to be modified in the command line, for example:
$ python3 program.py inputfile.txt
I have tried to use string formatting (i.e. %s) in conjunction with sys.argv in order to achieve this. However, I have tried many different locations of " and ', and still cannot get this to work and accept a user input from the command line here.
(The command contains escapes such as \n and so I have tried to counteract this by adding additional backslashes, as well as additional % for the existing %s in the command.)
import sys
import subprocess
path = sys.argv[1]
holder = subprocess.Popen("""awk '/^>/ {printf("\\n%%s\\n",$0);next; } { printf("%%s",$0);} END {printf("\\n");}' < %s""" % path , shell=True, stdout=subprocess.PIPE).stdout.read()
print(holder)
I would very much appreciate any help with identifying the syntax error here, or suggestions for how I could add this user input.

TL;DR: Don't shell out to awk! Just use Python. But let's go step by step...
Your instinct of using triple quotes here is good: at least you then don't need to escape both the single and the double quotes that your shell string needs.
The next useful device you can use is raw strings, using r'...' or r"..." or r"""...""". Raw strings don't expand backslash escapes, so in that case you can leave the \ns intact.
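A quick way to see the difference, using a fragment of the awk program as the test string (the variable names are just for illustration):

```python
awk_plain = 'END {printf("\n");}'   # Python turns \n into a real newline
awk_raw = r'END {printf("\n");}'    # backslash and n are kept as two characters

print(len(awk_plain), len(awk_raw))
print(awk_raw == 'END {printf("\\n");}')
```

The raw string is one character longer, and it is equal to the version with the doubled backslash.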
Last is the %s, which you need to escape if you use the % operator, but here I'm going to suggest that instead of using the shell to redirect input, just use Python's subprocess to send stdin from the file! Much simpler and you end up with no substitution.
I'll also recommend that you use subprocess.check_output() instead of Popen(). It's much simpler to use and it's a lot more robust, since it will check that the command exited successfully (with a zero exit status.)
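As a rough illustration of that robustness, here is what happens when the command fails. A Python child process that exits with status 3 stands in for a failing external command (my assumption, so the sketch runs anywhere):

```python
import subprocess
import sys

# Stand-in for a failing external command (an assumption for portability):
# a Python child process that exits with status 3.
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

try:
    subprocess.check_output(failing)
    status = 0
except subprocess.CalledProcessError as exc:
    # check_output() raises instead of silently returning partial output.
    status = exc.returncode

print(status)
```

With a bare Popen(...).stdout.read() the failure would go unnoticed unless you checked the return code yourself.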
Putting it all together (so far), you get:
with open(path) as inputfile:
    holder = subprocess.check_output(
        r"""awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}'""",
        shell=True,
        stdin=inputfile)
But here you can go one step further. You don't really need a shell anymore: it is only being used to split the command line into two arguments, so just do this split in Python. (It's almost always possible and easy to do this, and it's a lot more robust, since you don't have to deal with the shell's word splitting!)
with open(path) as inputfile:
    holder = subprocess.check_output(
        ['awk', r'/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}'],
        stdin=inputfile)
The second string in the list is still a raw string, since you want to preserve the backslash escapes.
I could go into how you can do this without using printf() in awk, using print instead, which should get rid of both \ns and %s, but instead I'll tell you that it's much easier to do what you're doing in Python directly!
In fact, everything that awk (or sed, tr, cut, etc.) can do, Python can do better (or, at least, in a more readable and maintainable way.)
In the case of your particular code:
with open(path) as inputfile:
    for line in inputfile:
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Header: start a new line, then print the header on its own line.
            print()
            print(line)
        else:
            # Sequence data: append to the current line without a newline.
            print(line, end='')
# And a newline at the end, to terminate the last sequence.
print()
Isn't this better?
And you can put this into a function, into a module, and reuse it anywhere you'd like. It's easy to store the result in a string, save it into a variable if you like, much more flexible...
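For instance, a function version that returns the result as a string might look like the sketch below. The name linearize_fasta and the sample records are made up; the output is meant to mirror what the awk command prints (a newline before each header, sequence lines joined, and a final newline).

```python
def linearize_fasta(lines):
    """Join the sequence lines of each FASTA record into a single line."""
    parts = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Header: newline before it, and it sits on a line of its own.
            parts.append('\n' + line + '\n')
        else:
            # Sequence data: concatenate without a line break.
            parts.append(line)
    parts.append('\n')
    return ''.join(parts)

sample = ['>seq1\n', 'ACGT\n', 'TTGA\n', '>seq2\n', 'GGCC\n']
print(linearize_fasta(sample), end='')
```

Since an open file is an iterable of lines, you can call it as linearize_fasta(inputfile) inside the with block.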
Anyways, if you still want to stick to shelling out, see my previous code, I think that's the best you can do while still shelling out, without significantly changing the external command.

python escape markquotes for bash script [duplicate]

When using os.system() it's often necessary to escape filenames and other arguments passed as parameters to commands. How can I do this? Preferably something that would work on multiple operating systems/shells but in particular for bash.
I'm currently doing the following, but am sure there must be a library function for this, or at least a more elegant/robust/efficient option:
def sh_escape(s):
    return s.replace("(","\\(").replace(")","\\)").replace(" ","\\ ")
os.system("cat %s | grep something | sort > %s"
          % (sh_escape(in_filename),
             sh_escape(out_filename)))
Edit: I've accepted the simple answer of using quotes, don't know why I didn't think of that; I guess because I came from Windows where ' and " behave a little differently.
Regarding security, I understand the concern, but, in this case, I'm interested in a quick and easy solution which os.system() provides, and the source of the strings is either not user-generated or at least entered by a trusted user (me).

shlex.quote() does what you want since Python 3.3.
(Use pipes.quote to support both Python 2 and Python 3, though note that pipes has been deprecated since 3.11 and was removed in 3.13.)

This is what I use:
def shellquote(s):
    return "'" + s.replace("'", "'\\''") + "'"
The shell will always accept a quoted filename and remove the surrounding quotes before passing it to the program in question. Notably, this avoids problems with filenames that contain spaces or any other kind of nasty shell metacharacter.
Update: If you are using Python 3.3 or later, use shlex.quote instead of rolling your own.
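One way to convince yourself that both the hand-rolled function and shlex.quote behave sensibly is a round trip through shlex.split(), which applies POSIX shell-like word splitting. This is only a sanity check under that assumption, not a proof for every shell; the sample strings are made up.

```python
import shlex

def shellquote(s):
    # Same approach as above: wrap in single quotes, and spell an embedded
    # single quote as '\'' (close quote, escaped quote, reopen quote).
    return "'" + s.replace("'", "'\\''") + "'"

samples = ["plain.txt", "my file (1).txt", "it's here", "", "$HOME; rm -rf *"]
for s in samples:
    # A correctly quoted string should come back as exactly one unchanged word.
    assert shlex.split(shellquote(s)) == [s]
    assert shlex.split(shlex.quote(s)) == [s]
print("all samples round-trip")
```

The two functions do not produce identical text (shlex.quote leaves safe strings unquoted), but both survive the round trip.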

Perhaps you have a specific reason for using os.system(). But if not you should probably be using the subprocess module. You can specify the pipes directly and avoid using the shell.
The following is from PEP 324:
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
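Here is a runnable variant of the same pattern. Since dmesg and grep are not available everywhere, two small Python child processes stand in for them (that substitution is mine); the extra p1.stdout.close() call follows the recommendation in the subprocess documentation.

```python
import sys
from subprocess import Popen, PIPE

# Stand-ins for dmesg and grep (assumptions, so the sketch runs anywhere).
producer = [sys.executable, '-c', "print('sda1'); print('hda1'); print('hda2')"]
consumer = [sys.executable, '-c',
            "import sys\n"
            "for line in sys.stdin:\n"
            "    if 'hda' in line:\n"
            "        sys.stdout.write(line)\n"]

p1 = Popen(producer, stdout=PIPE)
p2 = Popen(consumer, stdin=p1.stdout, stdout=PIPE, text=True)
p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits early
output = p2.communicate()[0]
p1.wait()
print(output, end='')
```

No shell is involved at any point, so nothing in the arguments needs escaping.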

Maybe subprocess.list2cmdline is a better shot?

Note that pipes.quote is actually broken in Python 2.5 and Python 3.1 and not safe to use: it doesn't handle zero-length arguments.
>>> from pipes import quote
>>> args = ['arg1', '', 'arg3']
>>> print 'mycommand %s' % (' '.join(quote(arg) for arg in args))
mycommand arg1 arg3
See Python issue 7476; it has been fixed in Python 2.6 and 3.2 and newer.

I believe that os.system just invokes whatever command shell is configured for the user, so I don't think you can do it in a platform independent way. My command shell could be anything from bash, emacs, ruby, or even quake3. Some of these programs aren't expecting the kind of arguments you are passing to them and even if they did there is no guarantee they do their escaping the same way.

Notice: This is an answer for Python 2.7.x.
According to the source, pipes.quote() is a way to "Reliably quote a string as a single argument for /bin/sh". (Although it is deprecated since version 2.7 and finally exposed publicly in Python 3.3 as the shlex.quote() function.)
On the other hand, subprocess.list2cmdline() is a way to "Translate a sequence of arguments into a command line string, using the same rules as the MS C runtime".
Here we are, the platform independent way of quoting strings for command lines.
import sys
mswindows = (sys.platform == "win32")
if mswindows:
    from subprocess import list2cmdline
    quote_args = list2cmdline
else:
    # POSIX
    from pipes import quote
    def quote_args(seq):
        return ' '.join(quote(arg) for arg in seq)
Usage:
# Quote a single argument
print quote_args(['my argument'])
# Quote multiple arguments
my_args = ['This', 'is', 'my arguments']
print quote_args(my_args)
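For Python 3, a comparable sketch could use shlex.join() (available from Python 3.8) on POSIX and keep list2cmdline() on Windows. The two helper names are made up so that both branches can be tried on any platform; list2cmdline() is still an undocumented function, so treat this as an assumption rather than a supported API.

```python
import shlex
import subprocess
import sys

def quote_args_posix(seq):
    # shlex.join() (Python 3.8+) quotes each item with shlex.quote().
    return shlex.join(seq)

def quote_args_windows(seq):
    # list2cmdline() is undocumented; it follows the MS C runtime rules.
    return subprocess.list2cmdline(seq)

quote_args = quote_args_windows if sys.platform == "win32" else quote_args_posix

my_args = ['This', 'is', 'my arguments']
print(quote_args_posix(my_args))
print(quote_args_windows(my_args))
```

The POSIX branch wraps the argument containing a space in single quotes, the Windows branch in double quotes.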

The function I use is:
def quote_argument(argument):
    return '"%s"' % (
        argument
        .replace('\\', '\\\\')
        .replace('"', '\\"')
        .replace('$', '\\$')
        .replace('`', '\\`')
    )
that is: I always enclose the argument in double quotes, and then backslash-quote the only characters special inside double quotes.

On UNIX shells like Bash, you can use shlex.quote in Python 3 to escape special characters that the shell might interpret, like whitespace and the * character:
import os
import shlex
os.system("rm " + shlex.quote(filename))
However, this is not enough for security purposes! You still need to be careful that the command argument is not interpreted in unintended ways. For example, what if the filename is actually a path like ../../etc/passwd? Running os.system("rm " + shlex.quote(filename)) might delete /etc/passwd when you only expected it to delete filenames found in the current directory! The issue here isn't with the shell interpreting special characters, it's that the filename argument isn't interpreted by the rm as a simple filename, it's actually interpreted as a path.
Or what if the valid filename starts with a dash, for example, -f? It's not enough to merely pass the escaped filename, you need to disable options using -- or you need to pass a path that doesn't begin with a dash like ./-f. The issue here isn't with the shell interpreting special characters, it's that the rm command interprets the argument as a filename or a path or an option if it begins with a dash.
Here is a safer implementation:
if os.sep in filename:
    raise Exception("Did not expect to find file path separator in file name")
os.system("rm -- " + shlex.quote(filename))
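If you don't need the shell at all, a different technique is to skip rm and call os.remove() directly, which sidesteps both shell quoting and option parsing. The sketch below is that alternative, not the approach shown above; the helper name remove_in_directory is made up, and the check is deliberately strict.

```python
import os
import tempfile

def remove_in_directory(directory, filename):
    """Delete filename inside directory, rejecting anything that is a path."""
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if filename in ('', '.', '..') or any(sep in filename for sep in separators):
        raise ValueError("expected a bare file name, got %r" % filename)
    # os.remove() takes the name as data: no shell parsing, no option parsing.
    os.remove(os.path.join(directory, filename))

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, '-f'), 'w').close()
    remove_in_directory(tmp, '-f')
    print(os.listdir(tmp))
```

A file literally named -f is removed without being mistaken for an option, and a name such as ../../etc/passwd is rejected before anything is deleted.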

I think these answers are a bad idea for escaping command-line arguments on Windows. Based on the results: people are trying to apply a black-list approach to filtering 'bad' characters, assuming (and hoping) they got them all. Windows is very complex and there could be all manner of characters found in the future that might allow an attacker to hijack command line arguments.
I've already seen some answers neglect to filter basic meta-characters in Windows (like the semi-colon.) The approach I take is far simpler:
Make a list of allowed ASCII characters.
Remove all chars that aren't in that list.
Escape slashes and double-quotes.
Surround entire command with double quotes so the command argument cannot be maliciously broken and commandeered with spaces.
A basic example:
def win_arg_escape(arg, allow_vars=0):
    allowed_list = """'"/\\abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-. """
    if allow_vars:
        allowed_list += "~%$"
    # Filter out anything that isn't a
    # standard character.
    buf = ""
    for ch in arg:
        if ch in allowed_list:
            buf += ch
    # Escape all slashes.
    buf = buf.replace("\\", "\\\\")
    # Escape double quotes.
    buf = buf.replace('"', '""')
    # Surround entire arg with quotes.
    # This avoids spaces breaking a command.
    buf = '"%s"' % (buf)
    return buf
The function has an option to enable use of environment variables and other shell variables. Enabling this poses more risk, so it's disabled by default.

subprocess call of sed command giving error

I have a text file which contains the following line
PIXEL_SCALE 1.0 # size of pixel in arc
To replace 1.0 in it with 0.3,
I tried to use sed via subprocess.call from python script.
Following sed regex command works perfectly from shell.
sed -i 's/^\(PIXEL_SCALE\s*\)\([0-9]*\.[0-9]*\)/\10.3/' filename.txt
But the equivalent subprocess.call command gives me the following error.
subprocess.call(['sed','-i',"'s/^\(PIXEL_SCALE\s*\)\([0-9]*\.[0-9]*\)/\10.3/'",'filename.txt'])
sed: -e expression #1, char 1: unknown command: `''
I tried converting the string to raw string by prefixing string with r and also tried .encode("UTF-8"). But they didn't have any effect.
What could be going wrong here?
Thanks

' quotes are delimiters used by the shell. As you do not use a shell, you don't need them around your regular expression:
subprocess.call(['sed','-i',r"s/^\(PIXEL_SCALE\s*\)\([0-9]*\.[0-9]*\)/\10.3/",'filename.txt'])
# The r prefix was added and the inner single quotes were removed.
In addition, I used a raw string (r"....") to prevent interpretation of the backslash-escaped sequences by python.
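If you would rather not call sed at all, the same substitution can be sketched with Python's re module. The function name set_pixel_scale is made up for illustration. One difference from sed is worth noting: in a Python replacement string, \1 followed by digits would be read as a larger group number, so \g<1> is used instead.

```python
import re

def set_pixel_scale(text, new_value):
    """Replace the number after PIXEL_SCALE at the start of a line."""
    pattern = r'^(PIXEL_SCALE\s*)[0-9]*\.[0-9]*'
    # \g<1> is needed here: '\1' followed by digits such as '0.3' would be
    # read as a reference to group 10.
    return re.sub(pattern, r'\g<1>' + new_value, text, flags=re.M)

print(set_pixel_scale('PIXEL_SCALE 1.0 # size of pixel in arc', '0.3'))
```

To edit the file in place you would read filename.txt, pass its contents through the function, and write the result back.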

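subprocess.call(r"sed -i 's/^\(PIXEL_SCALE\s*\)\([0-9]*\.[0-9]*\)/\10.3/' filename.txt", shell=True)
That works, provided the command is a raw string (r"..."); otherwise Python reads \10 as an octal escape before sed ever sees it.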

's/(PIXEL_SCALE\s*)[0-9]+[0-9]+/\10.3/'

grep command called from python

Platform: Windows
Grep: http://gnuwin32.sourceforge.net/packages/grep.htm
Python: 2.7.2
Windows command prompt used to execute the commands.
I am searching for the following pattern "2345$" in a file.
Contents of the file are as follows:
abcd 2345
2345
abcd 2345$
grep "2345$" file.txt
grep returns 2 lines (first and second) successfully.
When I try to run the above command through python I don't see any output.
Python code snippet is as follows:
temp = open('file.txt', "r+")
grep_cmd = []
grep_cmd.extend([grep, '"2345$"' ,temp.name])
print grep_cmd
p = subprocess.Popen(grep_cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
stdoutdata = p.communicate()[0]
print stdoutdata
If I have
grep_cmd.extend([grep, '2345$' ,temp.name])
in my python script, I get the correct answer.
The question is why the grep command with the double quotes,
grep_cmd.extend([grep, '"2345$"' ,temp.name])
executed from Python fails. Isn't Python supposed to execute the command as it is?
Thanks
Gudge.

Do not put double quotes around your pattern. It is only needed on the command line to quote shell metacharacters. When calling a program from python, you do not need this.
You also do not need to open the file yourself - grep will do that:
grep_cmd.extend([grep, '2345$', 'file.txt'])
To understand the reason for the double quotes not being needed and causing your command to fail, you need to understand the purpose of the double quotes and how they are processed.
The shell uses double quotes to prevent special processing of some shell metacharacters. Shell metacharacters are those characters that the shell handles specially and does not pass literally to the programs it executes. The most commonly used shell metacharacter is "space". The shell splits a command on space boundaries to build an argument vector to execute a program with. If you want to include a space in an argument, it must be quoted in some way (single or double quotes, backslash, etc). Another is the dollar sign ($), which is used to signify variable expansion.
When you are executing a program without the shell involved, all these rules about quoting and shell metacharacters are not relevant. In python, you are building the argument vector yourself, so the relevant quoting rules are python quoting rules (e.g. to include a double quote inside a double-quoted string, prefix the double quote with a backslash - the backslash will not be in the final string). The characters in each element of the argument vector when you have completed constructing it are the literal characters that will be passed to the program you are executing.
Grep does not treat double quotes as special characters, so if grep gets double quotes in its search pattern, it will attempt to match double quotes from its input.
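The effect can be imitated in pure Python on the three lines from the question. The grep function below is only a rough stand-in built on re.search (Python's regex syntax differs from grep's in some details), but it shows the same outcome: once the double quotes are part of the pattern, nothing in the file matches.

```python
import re

lines = ['abcd 2345', '2345', 'abcd 2345$']

def grep(pattern, lines):
    # Rough stand-in for grep: keep lines where the regex matches anywhere.
    return [line for line in lines if re.search(pattern, line)]

print(grep('2345$', lines))     # pattern as grep receives it from the shell
print(grep('"2345$"', lines))   # pattern with literal double quotes included
```

The first call returns the first two lines, as in the question; the second returns an empty list.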
My original answer's reference to shell=True was incorrect - first I did not notice that you had originally specified shell=True, and secondly I was coming from the perspective of a Unix/Linux implementation, not Windows.
The python subprocess module page has this to say about shell=True and Windows:
On Windows: the Popen class uses CreateProcess() to execute the child program, which operates on strings. If args is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows.
That linked section on converting an argument sequence to a string on Windows does not make sense to me. First, a string is a sequence, and so is a list, yet the Frequently Used Arguments section says this about arguments:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names).
This contradicts the conversion process described in the Python documentation, and given the behaviour you have observed, I'd say the documentation is wrong and only applies to an argument string, not an argument vector. I cannot verify this myself as I do not have Windows or the source code for Python lying around.
I suspect that if you call subprocess.Popen like:
p = subprocess.Popen(grep + ' "2345$" file.txt', stdout=..., shell=True)
you may find that the double quotes are stripped out as part of the documented argument conversion.

You can use python-textops3:
from textops import *
print('\n'.join(cat('file.txt') | grep('2345$')))
With python-textops3 you can use Unix-like commands with pipes within Python, so there is no need to fork a process, which is comparatively heavy.
