Platform: Windows
Grep: http://gnuwin32.sourceforge.net/packages/grep.htm
Python: 2.7.2
Windows command prompt used to execute the commands.
I am searching for the following pattern "2345$" in a file.
Contents of the file are as follows:
abcd 2345
2345
abcd 2345$
grep "2345$" file.txt
grep returns 2 lines (first and second) successfully.
When I try to run the above command through python I don't see any output.
Python code snippet is as follows:
temp = open('file.txt', "r+")
grep_cmd = []
grep_cmd.extend([grep, '"2345$"' ,temp.name])
print grep_cmd
p = subprocess.Popen(grep_cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
stdoutdata = p.communicate()[0]
print stdoutdata
If I have
grep_cmd.extend([grep, '2345$' ,temp.name])
in my python script, I get the correct answer.
The question is why the grep command with the double quotes,
grep_cmd.extend([grep, '"2345$"' ,temp.name])
fails when executed from Python. Isn't Python supposed to execute
the command as it is?
Thanks
Gudge.
Do not put double quotes around your pattern. It is only needed on the command line to quote shell metacharacters. When calling a program from python, you do not need this.
You also do not need to open the file yourself - grep will do that:
grep_cmd.extend([grep, '2345$', 'file.txt'])
To understand the reason for the double quotes not being needed and causing your command to fail, you need to understand the purpose of the double quotes and how they are processed.
The shell uses double quotes to prevent special processing of some shell metacharacters. Shell metacharacters are those characters that the shell handles specially and does not pass literally to the programs it executes. The most commonly used shell metacharacter is "space". The shell splits a command on space boundaries to build an argument vector to execute a program with. If you want to include a space in an argument, it must be quoted in some way (single or double quotes, backslash, etc). Another is the dollar sign ($), which is used to signify variable expansion.
When you are executing a program without the shell involved, all these rules about quoting and shell metacharacters are not relevant. In python, you are building the argument vector yourself, so the relevant quoting rules are python quoting rules (e.g. to include a double quote inside a double-quoted string, prefix the double quote with a backslash - the backslash will not be in the final string). The characters in each element of the argument vector when you have completed constructing it are the literal characters that will be passed to the program you are executing.
Grep does not treat double quotes as special characters, so if grep gets double quotes in its search pattern, it will attempt to match double quotes from its input.
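A minimal sketch of this point, assuming Python 3 and using a child Python interpreter as a stand-in for grep (the helper name echo_argument is made up for this illustration). Each list element reaches the child exactly as written, so double quotes placed inside the string become part of the argument:

```python
import subprocess
import sys


def echo_argument(arg):
    """Start a child Python process (no shell) and return argv[1] as it saw it."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode()


# The double quotes are ordinary characters here, not shell quoting.
with_quotes = echo_argument('"2345$"')
without_quotes = echo_argument('2345$')
print(repr(with_quotes))
print(repr(without_quotes))
```

With grep in place of the child interpreter, the first form would make grep look for lines that contain the quote characters themselves.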
My original answer's reference to shell=True was incorrect - first I did not notice that you had originally specified shell=True, and secondly I was coming from the perspective of a Unix/Linux implementation, not Windows.
The python subprocess module page has this to say about shell=True and Windows:
On Windows: the Popen class uses CreateProcess() to execute the child program, which operates on strings. If args is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows.
That linked section on converting an argument sequence to a string on Windows does not make sense to me. First, a string is a sequence, and so is a list, yet the Frequently Used Arguments section says this about arguments:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names).
This contradicts the conversion process described in the Python documentation, and given the behaviour you have observed, I'd say the documentation is wrong and only applies to an argument string, not an argument vector. I cannot verify this myself as I have neither Windows nor the Python source code at hand.
I suspect that if you call subprocess.Popen like:
p = subprocess.Popen(grep + ' "2345$" file.txt', stdout=..., shell=True)
you may find that the double quotes are stripped out as part of the documented argument conversion.
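For what it is worth, the conversion the documentation refers to can be inspected directly with subprocess.list2cmdline, an undocumented helper in CPython's subprocess module that, as far as I can tell, is what Popen applies to an argument sequence on Windows. A small sketch (the file name "my file.txt" is only an illustration):

```python
import subprocess

# How an argument list is flattened into a single Windows command line.
quoted_pattern = subprocess.list2cmdline(['grep', '"2345$"', 'file.txt'])
plain_pattern = subprocess.list2cmdline(['grep', '2345$', 'my file.txt'])

print(quoted_pattern)
print(plain_pattern)
```

If this sketch is representative, embedded double quotes in a list element are escaped as \" rather than stripped, and an element containing a space is wrapped in double quotes. That would be consistent with grep receiving the quote characters as part of the pattern.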
You can use python-textops3:
from textops import *
print('\n'.join(cat('file.txt') | grep('2345$')))
With python-textops3 you can use Unix-like commands with pipes inside Python,
so there is no need to fork a separate process, which is comparatively expensive.
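If installing a third-party package is not an option, the same in-process idea can be sketched with the standard library's re module. This is only an approximation: grep_lines is a made-up helper, and Python's regular expression syntax is not identical to grep's.

```python
import re


def grep_lines(lines, pattern):
    """Return the lines that contain a match for pattern, like a basic grep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line.rstrip("\n"))]


# Same contents as file.txt in the question.
sample = ["abcd 2345\n", "2345\n", "abcd 2345$\n"]
matches = grep_lines(sample, r"2345$")
print(matches)
```

For a real file, passing the open file object (for example `open('file.txt')`) as `lines` should work the same way, since iterating a file yields its lines.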
I'm trying without success to pass a JSON string to a Python script from a PowerShell script (.ps1) to automate this task.
spark-submit `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param
When $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test\"\"}', it works fine: the Python script receives a valid JSON string and parses it correctly.
When I use the character &, as in $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&autoReconnect=true&useSSL=false\"\"}', the string is printed as { "job_start": \jdbc:mysql://127.0.0.1:3307/test? and the rest of the string is interpreted as separate commands:
'serverTimezone' is not recognized as an internal or external command
'autoReconnect' is not recognized as an internal or external command
'useSSL' is not recognized as an internal or external command
The \"\" is needed to preserve the double quotes in the Python script; I'm not sure why two escaped double quotes are needed.
UPDATE:
Now I'm having problems with the ! character: I can't escape it, even with ^ or \.
# Only "" doesn't work
$param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\", \"\"password\"\": \"\"testpassword^!123\"\"}'
spark-submit.cmd `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param
# OUTPUT: misses the ! character
{"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC", "password": "testpassword123"}
Thank you all.
tl;dr
Note: The following does not solve the OP's specific problem (the cause of which is still unknown), but hopefully contains information of general interest.
# Use "" to escape " and - in case of delayed expansion - ^! to escape !
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
There are high-profile utilities (CLIs) such as az (Azure) that are Python-based, but on Windows use an auxiliary batch file as the executable that simply relays arguments to a Python script.
Use Get-Command az, for instance, to discover an executable's full file name; batch files, which are processed by cmd.exe, the legacy command processor, have a filename extension of either .cmd or .bat
To prevent calls to such a batch file from breaking, double quotes embedded in arguments passed from PowerShell must be escaped as ""
Additionally, but only if setlocal enabledelayedexpansion is in effect in a given target batch file or if your computer is configured to use delayed expansion by default, for all batch files:
! characters must be escaped as ^!, which, however, is only effective if cmd.exe considers the ! part of a double-quoted string.
It looks like we have a confluence of two problems:
A PowerShell problem with " chars. embedded in arguments passed to external programs:
In an ideal world, passing JSON text such as '{ "foo": "bar" }' to an external program would work as-is, but due to PowerShell's broken handling of embedded double quotes, that is not enough, and the " chars. must additionally be escaped, for the target program, either as \" (which most programs support), or, in the case of cmd.exe (see below), as "", which Python fortunately recognizes too: '{ ""foo"": ""bar"" }'
Limitations of argument-passing and escaping in cmd.exe batch files:
It sounds like spark-submit is an auxiliary batch file (.cmd or .bat) that passes the arguments through to a Python script.
The problem is that if you use \" for escaping embedded ", cmd.exe doesn't recognize them as escaped, which causes it to consider the & characters unquoted, and they are therefore interpreted as shell metacharacters, i.e. as characters with special syntactic function (command sequencing, in this case).
Additionally, and only if setlocal enabledelayedexpansion is in effect in a given batch file, any literal ! characters in arguments require additional handling:
If cmd.exe thinks the ! is part of an unquoted argument, you cannot escape ! at all.
Inside a quoted argument (which invariably means "..." in cmd.exe), you must escape a literal ! as ^!.
Note that this requirement is the inverse of how all other metacharacters must be escaped (which require ^ when unquoted, but not inside "...").
The unfortunate consequence is that you need to know the implementation details of the target batch file - whether it uses setlocal enabledelayedexpansion or not - in order to formulate your arguments properly.
The same applies if your computer is configured to use delayed expansion by default, for all batch files (and interactively), which is neither common nor advisable. To test if a given computer is configured that way, check the output from the following command for DelayedExpansion : 1: if there's no output at all, delayed expansion is OFF; if there's 1 or 2 outputs, delayed expansion is ON by default if the first or only output reports DelayedExpansion : 1.
Get-ItemProperty -EA Ignore 'registry::HKEY_CURRENT_USER\Software\Microsoft\Command Processor', 'registry::HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor' DelayedExpansion
Workaround:
Since you're technically calling a batch file, use "" to escape literal " chars. inside your single-quoted ('...') PowerShell string.
If you know that the target batch file uses setlocal enabledelayedexpansion or if your computer is configured to use delayed expansion by default, escape ! characters as ^!
Note that this is only effective if cmd.exe considers the ! part of a double-quoted string.
Therefore (note that I've extended the URL to include a token with !, meant to be passed through literally as suffix more!):
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
If you need to escape an existing JSON string programmatically:
# Unescaped JSON string, which in an ideal world you'd be able
# to pass as-is.
$param = '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }'
# Escape the " chars.
$param = $param -replace '"', '""'
# If needed, also escape the ! chars.
$param = $param -replace '!', '^!'
Ultimately, both problems should be fixed at the source, but this is highly unlikely, because it would break backward compatibility.
With respect to PowerShell, this GitHub issue contains the backstory, technical details, a robust wrapper function to hide the problems, and discussions about how to fix the problem at least on an opt-in basis.
In the question "Which characters need to be escaped when using Bash?" you will find all the characters that you should escape when passing them as literal characters in the shell; note that & is one of them.
Now I understand that if you tried to escape it, the JSON parser you are using will probably fail to parse the string. So one quick workaround would be to replace the & by any other special non-escapable symbol like # or %, and do a step in your app where you replace it with & before parsing. Just make sure that the symbol you will use isn't used in your strings, and won't be used at any time.
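A minimal sketch of that replacement step on the Python side, assuming # was chosen as the placeholder and never occurs in real values (PLACEHOLDER and parse_param are names invented for this example):

```python
import json

# Hypothetical stand-in for "&"; it must never appear in genuine values.
PLACEHOLDER = "#"


def parse_param(raw):
    """Restore the "&" characters, then parse the JSON text."""
    return json.loads(raw.replace(PLACEHOLDER, "&"))


raw = '{"job_start": "jdbc:mysql://127.0.0.1:3307/test#serverTimezone=UTC#useSSL=false"}'
config = parse_param(raw)
print(config["job_start"])
```

The obvious weakness is that any genuine # in the data would be turned into & as well, which is why the placeholder has to be chosen carefully.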
I would like to run a Python print statement over ssh.
The following is my test code.
import subprocess
# case1:
command_str = "\"print(\'test\')\""
# case 2:
# command_str = "\\\"print(\'test\')\\\""
ssh_command = ['ssh', 'USER_X@localhost', 'python', '-c']
ssh_command.append(command_str)
process = subprocess.run(ssh_command, stdout=subprocess.PIPE)
print(process.stdout)
Neither case 1 nor case 2 worked.
The outputs are as follows:
case 1:
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `python -c print('test')'
b''
case 2:
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `python -c \"print('test')\"'
b''
Please let me know how to make it work.
It should work with
command_str = "'print(\"test\")'"
or equivalently
command_str = '\'print("test")\''
Explanation
The outermost quotes and the escaping are for the local Python. So in either case, the local Python string will be 'print("test")'.
There is no quoting or escaping required for the local shell, as subprocess.run(...) won't invoke it unless shell=True is passed.
Thus the single quotes within the python string are for the remote shell (presumably bash or other sh-compatible shell). The argument passed to the remote Python is thus print("test"). (And the double quotes in there are to signify the string literal to print to the remote python.)
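The layers can be checked locally without an ssh connection. The sketch below assumes that ssh joins its trailing arguments with spaces before handing them to the remote shell, and uses shlex.split as an approximation of how an sh-compatible shell would split that command line:

```python
import shlex

command_str = "'print(\"test\")'"
equivalent = '\'print("test")\''

# Roughly what the remote shell receives as its command line.
remote_command_line = "python -c " + command_str

# Roughly how the remote shell splits it into words for the remote Python.
remote_argv = shlex.split(remote_command_line)
print(remote_argv)
```

The last element of remote_argv is the code the remote Python would execute, with the single quotes already consumed by the shell.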
Can we do without escaping (without \)?
As there are three levels involved (local Python, remote shell, remote Python), I don't think so.
Can we do with a single type of quotes?
Yes, with a bit more escaping. Let's build this from behind (or inside-out).
We want to print
test
This needs to be escaped for the remote Python (to form a string literal instead of an identifier):
"test"
Call this with the print() function:
print("test")
Quite familiar so far.
Now we want to pass this as an argument to python -c on a sh-like shell. To protect the ( and ) from being interpreted by the shell, we quote the whole thing. So that the already present " do not terminate the quotation, we escape them:
"print(\"test\")"
You can try this in a terminal:
$> echo "print(\"test\")"
print("test")
Perfect!
Now we have to represent the whole thing in (the local) Python. We wrap another layer of quotes around it, have to escape the four(!) existing quotation marks as well as the two backslashes:
"\"print(\\\"test\\\")\""
(Done. This can also be used as command_str.)
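The same construction can be done mechanically, which makes it easier to see that the hand-written literal is right. This is only a sketch: the escaping step handles backslashes and double quotes, which is enough for this payload but not for arbitrary text (for example, $ and backticks are still special inside double quotes).

```python
import shlex

# What the remote Python should run.
payload = 'print("test")'

# Escape for a double-quoted shell word: backslashes first, then double quotes.
escaped = payload.replace("\\", "\\\\").replace('"', '\\"')
shell_word = '"' + escaped + '"'

# The hand-written literal from the text above.
command_str = "\"print(\\\"test\\\")\""

print(shell_word)
print(shlex.split(shell_word))
```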
Can we do with only single quotes (') and escaping?
I don't know, but at least not as easily. Why? Because, unlike in Python, double and single quotes aren't interchangeable in sh and bash: within single quotes, these shells assume a raw string without escaping until the closing ' occurs.
My brain hurts!
If literally, go see a doctor. If figuratively, yeah, mine too. And your code's future readers (including yourself) will probably feel the same, when they try to untangle that quoting-escaping-forest.
But there's a painless alternative in our beloved Python standard library!
import shlex
command_str = shlex.quote('print("test")')
This is much easier to understand. The inner quotes (double quotes here, but it doesn't really matter: shlex.quote("print('test')") works just as well) are for the remote Python. The outer quotes are obviously for the local Python. And all the quoting and escaping beyond that for the remote shell is taken care of by this utility function.
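Put together with the ssh call from the question, this could look like the sketch below. The helper build_ssh_command and the destination user@example.invalid are placeholders invented for the example; the list is only built here, not executed.

```python
import shlex


def build_ssh_command(destination, python_code):
    """Build an argument list that runs python_code with the remote Python."""
    # shlex.quote protects the code from the remote shell only;
    # no local shell is involved when the list is passed to subprocess.run.
    return ["ssh", destination, "python", "-c", shlex.quote(python_code)]


ssh_command = build_ssh_command("user@example.invalid", 'print("test")')
print(ssh_command)
```

The resulting list could then be passed to `subprocess.run(ssh_command, stdout=subprocess.PIPE)` as in the question.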
The correct syntax for python 2 and 3 is:
python -c 'print("test")'
When using os.system() it's often necessary to escape filenames and other arguments passed as parameters to commands. How can I do this? Preferably something that would work on multiple operating systems/shells but in particular for bash.
I'm currently doing the following, but am sure there must be a library function for this, or at least a more elegant/robust/efficient option:
def sh_escape(s):
    return s.replace("(","\\(").replace(")","\\)").replace(" ","\\ ")
os.system("cat %s | grep something | sort > %s"
          % (sh_escape(in_filename),
             sh_escape(out_filename)))
Edit: I've accepted the simple answer of using quotes, don't know why I didn't think of that; I guess because I came from Windows where ' and " behave a little differently.
Regarding security, I understand the concern, but, in this case, I'm interested in a quick and easy solution which os.system() provides, and the source of the strings is either not user-generated or at least entered by a trusted user (me).
shlex.quote() does what you want since Python 3.3.
(Use pipes.quote to support both Python 2 and Python 3,
though note that pipes has been deprecated since 3.11
and was removed in 3.13.)
This is what I use:
def shellquote(s):
    return "'" + s.replace("'", "'\\''") + "'"
The shell will always accept a quoted filename and remove the surrounding quotes before passing it to the program in question. Notably, this avoids problems with filenames that contain spaces or any other kind of nasty shell metacharacter.
Update: If you are using Python 3.3 or later, use shlex.quote instead of rolling your own.
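A quick way to convince yourself that this kind of quoting survives a round trip is to split the quoted text again with shlex.split, which approximates POSIX shell word splitting. The sample file names are made up for the illustration:

```python
import shlex


def shellquote(s):
    """Single-quote s for a POSIX shell, escaping embedded single quotes."""
    return "'" + s.replace("'", "'\\''") + "'"


samples = ["plain.txt", "with space.txt", "it's $HOME (1).txt", ""]

# Each quoted name should split back into exactly one word: the original name.
hand_rolled_ok = [shlex.split(shellquote(name)) == [name] for name in samples]
stdlib_ok = [shlex.split(shlex.quote(name)) == [name] for name in samples]
print(hand_rolled_ok, stdlib_ok)
```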
Perhaps you have a specific reason for using os.system(). But if not you should probably be using the subprocess module. You can specify the pipes directly and avoid using the shell.
The following is from PEP324:
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
Maybe subprocess.list2cmdline is a better shot?
Note that pipes.quote is actually broken in Python 2.5 and Python 3.1 and is not safe to use: it doesn't handle zero-length arguments.
>>> from pipes import quote
>>> args = ['arg1', '', 'arg3']
>>> print 'mycommand %s' % (' '.join(quote(arg) for arg in args))
mycommand arg1 arg3
See Python issue 7476; it has been fixed in Python 2.6 and 3.2 and newer.
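For comparison, a sketch of the same check with shlex.quote on Python 3, where the empty argument is kept as an explicit pair of quotes:

```python
import shlex

args = ['arg1', '', 'arg3']
command = 'mycommand ' + ' '.join(shlex.quote(arg) for arg in args)
print(command)
```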
I believe that os.system just invokes whatever command shell is configured for the user, so I don't think you can do it in a platform independent way. My command shell could be anything from bash, emacs, ruby, or even quake3. Some of these programs aren't expecting the kind of arguments you are passing to them and even if they did there is no guarantee they do their escaping the same way.
Notice: This is an answer for Python 2.7.x.
According to the source, pipes.quote() is a way to "Reliably quote a string as a single argument for /bin/sh". (Although it is deprecated since version 2.7 and finally exposed publicly in Python 3.3 as the shlex.quote() function.)
On the other hand, subprocess.list2cmdline() is a way to "Translate a sequence of arguments into a command line string, using the same rules as the MS C runtime".
Here we are, the platform independent way of quoting strings for command lines.
import sys
mswindows = (sys.platform == "win32")
if mswindows:
    from subprocess import list2cmdline
    quote_args = list2cmdline
else:
    # POSIX
    from pipes import quote
    def quote_args(seq):
        return ' '.join(quote(arg) for arg in seq)
Usage:
# Quote a single argument
print quote_args(['my argument'])
# Quote multiple arguments
my_args = ['This', 'is', 'my arguments']
print quote_args(my_args)
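On Python 3 the same idea might be sketched as follows, with shlex.quote replacing pipes.quote. Keep in mind that subprocess.list2cmdline is an undocumented helper and follows the MS C runtime rules, which are not the same as cmd.exe's own quoting rules, so treat this as an approximation rather than a guarantee.

```python
import shlex
import subprocess
import sys


def quote_args(seq):
    """Join arguments into one command-line string for the current platform."""
    if sys.platform == "win32":
        # MS C runtime conventions (undocumented helper).
        return subprocess.list2cmdline(seq)
    # POSIX shell conventions.
    return " ".join(shlex.quote(arg) for arg in seq)


quoted = quote_args(["This", "is", "my arguments"])
print(quoted)
```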
The function I use is:
def quote_argument(argument):
    return '"%s"' % (
        argument
        .replace('\\', '\\\\')
        .replace('"', '\\"')
        .replace('$', '\\$')
        .replace('`', '\\`')
    )
that is: I always enclose the argument in double quotes, and then backslash-quote the only characters special inside double quotes.
On UNIX shells like Bash, you can use shlex.quote in Python 3 to escape special characters that the shell might interpret, like whitespace and the * character:
import os
import shlex
os.system("rm " + shlex.quote(filename))
However, this is not enough for security purposes! You still need to be careful that the command argument is not interpreted in unintended ways. For example, what if the filename is actually a path like ../../etc/passwd? Running os.system("rm " + shlex.quote(filename)) might delete /etc/passwd when you only expected it to delete filenames found in the current directory! The issue here isn't with the shell interpreting special characters, it's that the filename argument isn't interpreted by the rm as a simple filename, it's actually interpreted as a path.
Or what if the valid filename starts with a dash, for example, -f? It's not enough to merely pass the escaped filename, you need to disable options using -- or you need to pass a path that doesn't begin with a dash like ./-f. The issue here isn't with the shell interpreting special characters, it's that the rm command interprets the argument as a filename or a path or an option if it begins with a dash.
Here is a safer implementation:
if os.sep in filename:
    raise Exception("Did not expect to find file path separator in file name")
os.system("rm -- " + shlex.quote(filename))
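If the shell is not actually needed, the same checks can be combined with an argument list, which removes the quoting step altogether. A sketch under that assumption (build_rm_command is a made-up helper; it only builds the list and does not delete anything):

```python
import os


def build_rm_command(filename):
    """Return an argv list for rm that treats filename as a plain file name."""
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if any(sep in filename for sep in separators):
        raise ValueError("Did not expect to find file path separator in file name")
    # "--" ends option parsing, so a name like "-f" is not taken as an option.
    return ["rm", "--", filename]


command = build_rm_command("-f")
print(command)
```

The list could then be run with `subprocess.run(command, check=True)`; for simply deleting a file from Python, `os.remove(filename)` avoids the external command entirely.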
I think these answers are a bad idea for escaping command-line arguments on Windows. Based on the results: people are trying to apply a black-list approach to filtering 'bad' characters, assuming (and hoping) they got them all. Windows is very complex and there could be all manner of characters found in the future that might allow an attacker to hijack command line arguments.
I've already seen some answers neglect to filter basic meta-characters in Windows (like the semi-colon.) The approach I take is far simpler:
Make a list of allowed ASCII characters.
Remove all chars that aren't in that list.
Escape slashes and double-quotes.
Surround entire command with double quotes so the command argument cannot be maliciously broken and commandeered with spaces.
A basic example:
def win_arg_escape(arg, allow_vars=0):
    allowed_list = """'"/\\abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-. """
    if allow_vars:
        allowed_list += "~%$"
    # Filter out anything that isn't a
    # standard character.
    buf = ""
    for ch in arg:
        if ch in allowed_list:
            buf += ch
    # Escape all slashes.
    buf = buf.replace("\\", "\\\\")
    # Escape double quotes.
    buf = buf.replace('"', '""')
    # Surround entire arg with quotes.
    # This avoids spaces breaking a command.
    buf = '"%s"' % (buf)
    return buf
The function has an option to enable use of environment variables and other shell variables. Enabling this poses more risk, so it's disabled by default.
I need a python script to call a bash script on windows.
So basically I must make a subprocess call from Python that calls Cygwin's bash with the -c option, which in turn calls the script I need.
The problem is that this script takes a few arguments, and these arguments are full of spaces, quotes, and slashes.
I'm using code like the following
arq_saida_unix = arq_saida.replace("\\","/")
subprocess.call("C:\\cygwin64\\bin\\bash \".\\retirarVirgula.sh\\ \""+arq_saida+"\"")
Or I'm directly escaping, which sometimes takes me to as many as 8 backslashes in a row, because a backslash must be escaped i) in bash, ii) in cmd.exe, and iii) in Python before it gets to my script.
All of this is error-prone and takes quite some time to get right every time.
Is there a better way of doing it? Ideally I wouldn't have any escaping backslashes, but anything that avoids the triple-slash double quote above would be nice.
I tried to use re.escape, but couldn't figure out exactly how to use it, except as a replacement for .replace("\","/") and similar.
Don't pass a single string to call; instead, pass a list consisting of the command name and one argument per element. This saves you from needing to protect special characters from shell interpretation.
subprocess.call(["retirarVirgula.sh", arq_saida], executable=r"C:\cygwin64\bin\bash")
Note: I'm assuming arq_saida contains the single argument to pass to the script; if the script takes multiple arguments, then arq_saida should probably be built as a list as well:
arq_saida = ["arg", "arg two", "arg three"]
subprocess.call(["retirarVirgula.sh"] + arq_saida, executable=r"C:\cygwin64\bin\bash")
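One caveat, as far as I understand the subprocess documentation: with executable=..., the first list element is still passed to the program as its command name rather than as a regular argument, so bash may not treat it as the script to run. A more conventional spelling puts bash itself first in the list. The sketch below assumes the bash path from the question; build_bash_call is a made-up helper that only builds the list.

```python
# Path taken from the question; adjust to the local Cygwin installation.
BASH = r"C:\cygwin64\bin\bash"


def build_bash_call(script, script_args):
    """Return an argv list that asks bash to run script with script_args."""
    return [BASH, script] + list(script_args)


command = build_bash_call("retirarVirgula.sh", ["arg", "arg two", "arg three"])
print(command)
```

Passing the list to `subprocess.call(command)` would then start bash without going through cmd.exe, so none of the arguments need shell-level escaping on the Python side.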
I am attempting to create a Python script that in turn runs the shell script "js2coffee" to convert some javascript into coffeescript.
From the command line I can run this, and get coffeescript back again...
echo "var myNumber = 100;" | js2coffee
What I need to do is use this same pattern from Python.
In Python, I've come to something like this:
command = "echo '" + myJavascript + "' | js2coffee"
result = os.popen(command).read()
This works sometimes, but there are issues related to special characters (mostly quotes, I think) not being properly escaped in the myJavascript. There has got to be a standard way of doing this. Any ideas? Thanks!
Use the input stream of the process to feed it the data; that way you avoid the shell and don't need to escape your JavaScript. Additionally, you're not vulnerable to shell injection attacks:
pr = subprocess.Popen(['js2coffee'],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
result, stderrdata = pr.communicate('var myNumber = 100;')
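On Python 3 the input has to be bytes (or the pipes opened in text mode), and subprocess.run with its input parameter is a slightly shorter way to write the same thing. In the sketch below a child Python that upper-cases its input stands in for js2coffee, since that tool may not be installed; run_filter is a name invented for the example.

```python
import subprocess
import sys


def run_filter(command, text):
    """Feed text to command on stdin (no shell) and return its stdout as text."""
    result = subprocess.run(
        command,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


# Stand-in for ['js2coffee']: a child process that upper-cases its input.
stand_in = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

# Quotes in the data need no escaping because no shell ever parses them.
result = run_filter(stand_in, "var s = 'it\"s';")
print(result)
```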
The subprocess module is the way to go:
http://docs.python.org/library/subprocess.html#frequently-used-arguments
Please note the following:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names)