I'm trying, without success, to pass a JSON string to a Python script from a PowerShell script (.ps1) to automate this task.
spark-submit `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param
When $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test\"\"}', it works fine: the Python script receives a valid JSON string and parses it correctly.
When I use the & character, as in $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&autoReconnect=true&useSSL=false\"\"}', the string is printed as { "job_start": \jdbc:mysql://127.0.0.1:3307/test? and the rest of the string is recognized as other commands:
'serverTimezone' is not recognized as an internal or external command
'autoReconnect' is not recognized as an internal or external command
'useSSL' is not recognized as an internal or external command
The \"\" is needed to preserve the double quotes in the Python script; I'm not sure why two escaped double quotes are required.
UPDATE:
Now I'm having problems with the ! character; I can't escape it even with ^ or \.
# Only "" doesn't work
$param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\", \"\"password\"\": \"\"testpassword^!123\"\"}'
spark-submit.cmd `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param
# OUTPUT: misses the ! character
{"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC", "password": "testpassword123"}
Thank you all.
tl;dr
Note: The following does not solve the OP's specific problem (the cause of which is still unknown), but hopefully contains information of general interest.
# Use "" to escape " and - in case of delayed expansion - ^! to escape !
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
There are high-profile utilities (CLIs) such as az (Azure) that are Python-based, but on Windows use an auxiliary batch file as the executable that simply relays arguments to a Python script.
Use Get-Command az, for instance, to discover an executable's full file name; batch files, which are processed by cmd.exe, the legacy command processor, have a filename extension of either .cmd or .bat.
To prevent calls to such a batch file from breaking, double quotes embedded in arguments passed from PowerShell must be escaped as ""
Additionally, but only if setlocal enabledelayedexpansion is in effect in a given target batch file or if your computer is configured to use delayed expansion by default, for all batch files:
! characters must be escaped as ^!, which, however, is only effective if cmd.exe considers the ! part of a double-quoted string.
It looks like we have a confluence of two problems:
A PowerShell problem with " chars. embedded in arguments passed to external programs:
In an ideal world, passing JSON text such as '{ "foo": "bar" }' to an external program would work as-is, but due to PowerShell's broken handling of embedded double quotes, that is not enough, and the " chars. must additionally be escaped, for the target program, either as \" (which most programs support), or, in the case of cmd.exe (see below), as "", which Python fortunately recognizes too: '{ ""foo"": ""bar"" }'
Limitations of argument-passing and escaping in cmd.exe batch files:
It sounds like spark-submit is an auxiliary batch file (.cmd or .bat) that passes the arguments through to a Python script.
The problem is that if you use \" for escaping embedded ", cmd.exe doesn't recognize them as escaped, which causes it to consider the & characters unquoted, and they are therefore interpreted as shell metacharacters, i.e. as characters with special syntactic function (command sequencing, in this case).
Additionally, and only if setlocal enabledelayedexpansion is in effect in a given batch file, any literal ! characters in arguments require additional handling:
If cmd.exe thinks the ! is part of an unquoted argument, you cannot escape ! at all.
Inside a quoted argument (which invariably means "..." in cmd.exe), you must escape a literal ! as ^!.
Note that this requirement is the inverse of how all other metacharacters must be escaped (which require ^ when unquoted, but not inside "...").
The unfortunate consequence is that you need to know the implementation details of the target batch file - whether it uses setlocal enabledelayedexpansion or not - in order to formulate your arguments properly.
The same applies if your computer is configured to use delayed expansion by default, for all batch files (and interactively), which is neither common nor advisable. To test whether a given computer is configured that way, check the output of the following command for DelayedExpansion : 1: if there is no output at all, delayed expansion is OFF; if there are one or two outputs, delayed expansion is ON by default only if the first (or only) output reports DelayedExpansion : 1.
Get-ItemProperty -EA Ignore 'registry::HKEY_CURRENT_USER\Software\Microsoft\Command Processor', 'registry::HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor' DelayedExpansion
Workaround:
Since you're technically calling a batch file, use "" to escape literal " chars. inside your single-quoted ('...') PowerShell string.
If you know that the target batch file uses setlocal enabledelayedexpansion or if your computer is configured to use delayed expansion by default, escape ! characters as ^!
Note that this is only effective if cmd.exe considers the ! part of a double-quoted string.
Therefore (note that I've extended the URL to include a token with !, meant to be passed through literally as suffix more!):
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
If you need to escape an existing JSON string programmatically:
# Unescaped JSON string, which in an ideal world you'd be able
# to pass as-is.
$param = '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }'
# Escape the " chars.
$param = $param -replace '"', '""'
# If needed, also escape the ! chars.
$param = $param -replace '!', '^!'
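If the batch file were driven from Python rather than PowerShell (a hypothetical variant of the setup above), the same two substitutions could be sketched as plain string replacements:

```python
# Hypothetical Python counterpart of the two -replace steps above:
# escape " as "" for cmd.exe, and ! as ^! under delayed expansion.
def escape_for_cmd_batch(json_text, delayed_expansion=False):
    escaped = json_text.replace('"', '""')
    if delayed_expansion:
        escaped = escaped.replace('!', '^!')
    return escaped

raw = '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }'
print(escape_for_cmd_batch(raw, delayed_expansion=True))
# { ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }
```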
Ultimately, both problems should be fixed at the source - but that is highly unlikely, because it would break backward compatibility.
With respect to PowerShell, this GitHub issue contains the backstory, technical details, a robust wrapper function to hide the problems, and discussions about how to fix the problem at least on an opt-in basis.
In the question Which characters need to be escaped when using Bash?, you will find all the characters that should be escaped when you want the shell to treat them as ordinary characters; you will also notice that & is one of them.
Now, I understand that if you escape it, the JSON parser you are using will probably fail to parse the string. So one quick workaround would be to replace the & with some other special, non-escapable symbol such as # or %, and add a step in your app that replaces it back with & before parsing. Just make sure the symbol you choose isn't used in your strings and never will be.
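A minimal sketch of that placeholder round-trip, assuming the receiving side is Python and using `__AMP__` as an arbitrary stand-in token that never occurs in the real data:

```python
import json

# Arbitrary token; it just must never appear in the real data.
PLACEHOLDER = '__AMP__'

def encode_for_shell(raw_json):
    """Replace & before the string crosses the shell boundary."""
    return raw_json.replace('&', PLACEHOLDER)

def decode_and_parse(received):
    """Restore & on the receiving side, then parse the JSON."""
    return json.loads(received.replace(PLACEHOLDER, '&'))

raw = '{"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC"}'
sent = encode_for_shell(raw)   # contains no &, so the shell passes it through
print(decode_and_parse(sent)['job_start'])
# jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC
```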
In bash (as started by Python), I want to print this string \033[31m so that I can use a pipe | operator after it, followed by a command to copy that string to the clipboard. This means that in practice, I'm trying to run something like:
os.system('echo \\033[31m | xsel -ib')
...but the xsel -ib part is working fine, so this question is focused specifically on the behavior of echo.
Most of my attempts have been similar to:
echo -e \\033[31m
I have tried it with single quotes, double quotes, no quotes, removing the -e flag, etc. The closest I got was:
echo -n "\\ 033[31m"
which prints this string \ 033[31m
I don't want that space between \ and 0
-n flag is used to not append a new line after the printed string
I use Ubuntu 20.04, and xsel is a selection and clipboard manipulation tool for the X11 Window System (which Ubuntu 20.04 uses).
echo is the wrong tool for the job. It's a shell builtin, and one for which the POSIX sh standard explicitly does not guarantee portable behavior for when escape sequences (such as \033) are present. system() starts /bin/sh instead of bash, so POSIX behavior -- not that of your regular interactive shell -- is expected.
Use subprocess.run() instead of os.system(), and you don't need echo in the first place.
If you want to put an escape sequence into the clipboard (so not \033 but instead the ESC key that this gets converted to by an echo with XSI extensions to POSIX):
# to store \033 as a single escape character, use a regular Python bytestring
subprocess.run(['xsel', '-ib'], input=b'\033[31m')
If you want to put the literal text without being interpreted (so there's an actual backslash and an actual zero), use a raw bytestring instead:
# to store \033 as four separate characters, use a raw string
subprocess.run(['xsel', '-ib'], input=rb'\033[31m')
For a more detailed description of why echo causes problems in this context, see the excellent answer by Stephane to the Unix & Linux Stack Exchange question Why is printf better than echo?.
If you for some reason do want to keep using a shell pipeline, switch to printf instead:
# to store \033 as four separate characters, use %s
subprocess.run(r''' printf '%s\n' '\033[31m' | xsel -ib ''', shell=True)
# to store \033 as a single escape character, use %b
subprocess.run(r''' printf '%b\n' '\033[31m' | xsel -ib ''', shell=True)
I have some app which gets configured using a YAML file. That app does some processing and supports hooks to execute shell code before and after the processing. This is mostly meant to execute external script files doing the real business, but one can e.g. export environment variables as well. It seems like things are simply forwarded to a shell call with the configured string.
The important thing to note is that one hook is specially called in case ANYTHING goes wrong during processing. In that case the app provides some additional error details to the configured shell script. This is done by reading the necessary part of the YAML config and doing a simple string replacement of special keywords to what is actually available on runtime. Those keywords follow the syntax {...}. Things look like the following in my config:
on_error:
- |
export BGM_ERR_VAR_CONFIG_PATH="{configuration_filename}"
export BGM_ERR_VAR_REPO="{repository}"
export BGM_ERR_VAR_ERROR_MSG="{error}"
export BGM_ERR_VAR_ERROR_OUT="{output}"
'/path/to/script.sh 'some_arg' '[...]' [...]
Originally those keywords were expected to be forwarded as arguments in the called script, but my script needs some other arguments already, so I decided to forward things using environment variables. Shouldn't make too much of a difference regarding my problem, though.
The problem is that really ANYTHING can go wrong, and in particular the placeholder {output} can contain arbitrarily complex error messages. It is most likely a mixture of executed shell commands, which mostly use single quotes, and stack traces from the language the app is implemented in, which use double quotes. With my config above, this leads to invalid shell code being executed in the end:
[2021-10-12 07:18:46,073] ERROR: /bin/sh: 13: Syntax error: Unterminated quoted string
The following is what the app logs as being executed at all:
[2021-10-12 07:18:46,070] DEBUG: export BGM_ERR_VAR_CONFIG_PATH="/path/to/some.yaml"
export BGM_ERR_VAR_REPO="HOST:PARENT/CHILD"
export BGM_ERR_VAR_ERROR_MSG="Command 'borg check --prefix arch- --debug --show-rc --umask 0007 HOST:PARENT/CHILD' returned non-zero exit status 2."
export BGM_ERR_VAR_ERROR_OUT="using builtin fallback logging configuration
35 self tests completed in 0.04 seconds
SSH command line: ['ssh', '-F', '/[...]/.ssh/config', 'HOST', 'borg', 'serve', '--umask=007', '--debug']
RemoteRepository: 169 B bytes sent, 66 B bytes received, 3 messages sent
Connection closed by remote host
Traceback (most recent call last):
File "borg/archiver.py", line 177, in wrapper"
'/path/to/script.sh '[...]' '[...]' '[...]' '[...]'
The args to my own script are safe regarding quoting, those are only hard-coded paths, keywords etc., nothing dynamic in any way. The problem should be the double quotes used for the path to the python file throwing the exception. OTOH, if I only use single quotes with my environment variables, those would break because the output shell command invoked uses single quotes as well.
So, how do I implement a safe forwarding of {output} into the environment variable in this context?
I thought of using some subshell ="$(...)" and sed to normalize quotes, but everything I came up with resulted in a command line with exactly the same quoting problems as before. The same goes for printf and its %q to escape quotes. It seems I need something that can deal with arbitrary individual arguments and join them back into some string, or something like that. Additionally, it should not be too complex, so as not to bloat the YAML config.
The following might work, but loses the double quotes:
export BGM_ERR_VAR_ERROR_OUT="$(echo "{output}")"
How about that?
export BGM_ERR_VAR_ERROR_OUT="$(cat << EOT
{output}
EOT
)"
Anything else? Thanks!
To avoid all the replacement problems, I suggest not using replacements, and forwarding the values as environment variables instead. This assumes you have control over the calling code, which I assume is correct from your explanation.
Since environment variables are by convention uppercase, putting your values in lowercase names is quite safe, and then you can simply do
on_error:
- |
export BGM_ERR_VAR_CONFIG_PATH="$configuration_filename"
export BGM_ERR_VAR_REPO="$repository"
export BGM_ERR_VAR_ERROR_MSG="$error"
export BGM_ERR_VAR_ERROR_OUT="$output"
'/path/to/script.sh 'some_arg' '[...]' [...]
The calling code would need to modify the environment accordingly so that it will contain the expected values. This is the safest way to forward the values, since it guarantees not to interpret the values as bash syntax at all.
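A sketch of what the calling-code side could look like, assuming it is Python and a POSIX sh runs the hook (the variable names mirror the question; the sample value is illustrative). Because the hook only ever expands a variable, a value mixing both quote styles passes through untouched:

```python
import os
import subprocess

# The dynamic value travels through the process environment, so the
# hook's source text never contains the value itself.
hook = 'export BGM_ERR_VAR_ERROR_OUT="$output"\nprintf \'%s\' "$BGM_ERR_VAR_ERROR_OUT"'
env = dict(os.environ, output='''mixed "double" and 'single' quotes''')
result = subprocess.run(['sh', '-c', hook], env=env,
                        capture_output=True, text=True, check=True)
print(result.stdout)
# mixed "double" and 'single' quotes
```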
If this is not possible, the next best thing is probably to use a heredoc, albeit one with quotes to avoid processing anything in the content – you can use read to avoid the unnecessary cat:
on_error:
- |
read -r -d '' BGM_ERR_VAR_CONFIG_PATH <<'EOF'
{configuration_filename}
EOF
export BGM_ERR_VAR_CONFIG_PATH
# ... snip: other variables ...
'/path/to/script.sh 'some_arg' '[...]' [...]
The only thing you need to be aware of here is that the content may not include a line reading EOF. The calling code needs to ensure this.
I wrote a Python script to replace "powerline" as a terminal prompt solution for myself here: https://github.com/diogobaeder/dotfiles/blob/master/.bash_prompt.py
Then all I do is to define the prompt from the output of that script:
# in my ~/.bashrc
export PS1="\$(python ~/.bash_prompt.py)"
The script itself works fine and I get the command prompt I want. However, since there's no wrapping for the styles I put there, the terminal (no matter which GUI terminal program I use) doesn't calculate the prompt width correctly, and as I type characters past the end of the line they are not wrapped to a new line at first; instead they overwrite the prompt completely.
Now, I know that when stylizing my bash prompt I need to escape style codes with \[ and \] so that bash takes into consideration that they're escape sequences and calculates the width correctly. However, if I put them as wrappers for my styles in my Python script (see esc_start and esc_end), I can't get them to be properly evaluated by bash as "calculation escape sequences", instead I get literal square brackets printed. If I then escape in Python the backslashes too (\\[ and \\]), then I get unescaped literals outputted (\[ and \]). Bash seems to completely ignore them as escape sequences for calculating the prompt width.
If, however, I remove the backslash in my PS1 command ($(python ~/.bash_prompt.py) instead of \$(python ~/.bash_prompt.py)) and put \[ and \] as escape sequences in my Python script (as esc_start and esc_end), then bash treats them as proper escapes and wraps lines as expected (i.e., when I go past the right border, the line wraps). The problem with this, however, is that removing the backslash from my PS1 definition in .bashrc makes the script run only once per terminal session, not for each prompt line. So, for example, if I'm in a Git working tree and change from one branch to another, it doesn't show the new branch as soon as the command finishes; it still shows the old branch.
Let me give some examples of what I mean, that you can try in your own .bashrc without needing my Python script:
PS1="\[\033[31m\]\u#\h\[\033[0m\]$ " # This wraps lines correctly
PS1="\033[31m\u#\h\033[0m$ " # This makes the line overwrite the prompt
So, any ideas of how to cope with bash and make it understand the \[ and \] sequences correctly when printed by the Python script, while still keeping the script running for each command prompt? Or, if this is a limitation in bash, is there another way to force it to wrap the line when it reaches the right border of the terminal window?
Thanks!
Diogo
[EDIT] Solution (thanks, Grisha Levit!):
This is my bashrc line now:
PROMPT_COMMAND='PS1="$(python ~/.bash_prompt.py)"'
And I re-added the escapes (\[ and \]) to my script, and now it works perfectly! :-)
Bash first interprets the escape sequences in $PS1 and only afterwards handles command substitution, etc.
Bash allows these prompt strings to be customized by inserting a number of backslash-escaped special characters that are decoded as follows [...]
After the string is decoded, it is expanded via parameter expansion, command substitution, arithmetic expansion, and quote removal [...]
--Bash Reference Manual: Controlling the Prompt
This means that any special sequences printed by your command will not be interpreted as colors, etc. The solution is to use $PROMPT_COMMAND to change the value of $PS1, like:
PROMPT_COMMAND='PS1=$(python ~/.bash_prompt.py)'
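A minimal sketch of the output side of such a prompt script (the two-color prompt here is illustrative, not the linked script's actual code): the color codes are wrapped in \[ and \] so bash excludes them when calculating the prompt width.

```python
# The \[ and \] markers tell bash these bytes are zero-width; \u and
# \h are left for bash to expand when the prompt is displayed.
RED, RESET = '\033[31m', '\033[0m'

def prompt():
    return '\\[' + RED + '\\]\\u#\\h\\[' + RESET + '\\]$ '

print(prompt(), end='')
```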
I would like to run a Python print statement remotely over ssh. The following is my test code.
import subprocess
# case1:
command_str = "\"print(\'test\')\""
# case 2:
# command_str = "\\\"print(\'test\')\\\""
ssh_command = ['ssh', 'USER_X#localhost', 'python', '-c']
ssh_command.append(command_str)
process = subprocess.run(ssh_command, stdout=subprocess.PIPE)
print(process.stdout)
Neither case 1 nor case 2 worked. The outputs are as follows:
case 1:
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `python -c print('test')'
b''
case 2:
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `python -c \"print('test')\"'
b''
Please let me know how to make this work.
It should work with
command_str = "'print(\"test\")'"
or equivalently
command_str = '\'print("test")\''
Explanation
The outermost quotes and the escaping are for the local Python. So in either case, the local Python string will be 'print("test")'.
There is no quoting or escaping required for the local shell, as subcommand.run(...) won't invoke it unless shell=True is passed.
Thus the single quotes within the python string are for the remote shell (presumably bash or other sh-compatible shell). The argument passed to the remote Python is thus print("test"). (And the double quotes in there are to signify the string literal to print to the remote python.)
Can we do without escaping (without \)?
As there are three levels involved (local Python, remote shell, remote Python), I don't think so.
Can we do with a single type of quotes?
Yes, with a bit more escaping. Let's build this from behind (or inside-out).
We want to print
test
This needs to be escaped for the remote Python (to form a string literal instead of an identifier):
"test"
Call this with the print() function:
print("test")
Quite familiar so far.
Now we want to pass this as an argument to python -c on a sh-like shell. To protect the ( and ) to be interpreted by that, we quote the whole thing. For the already present " not to terminate the quotation, we escape them:
"print(\"test\")"
You can try this in a terminal:
$> echo "print(\"test\")"
print("test")
Perfect!
Now we have to represent the whole thing in (the local) Python. We wrap another layer of quotes around it, have to escape the four(!) existing quotation marks as well as the two backslashes:
"\"print(\\\"test\\\")\""
(Done. This can also be used as command_str.)
Can we do with only single quotes (') and escaping?
I don't know, but at least not as easily. Why? Because, other than to Python, double and single quotes aren't interchangeable to sh and bash: Within single quotes, these shells assume a raw string without escaping until the closing ' occurs.
My brain hurts!
If literally, go see a doctor. If figuratively, yeah, mine too. And your code's future readers (including yourself) will probably feel the same, when they try to untangle that quoting-escaping-forest.
But there's a painless alternative in our beloved Python standard library!
import shlex
command_str = shlex.quote('print("test")')
This is much easier to understand. The inner quotes (double quotes here, but doesn't really matter: shlex.quote("print('test')") works just as fine) are for the remote Python. The outer quotes are obviously for the local Python. And all the quoting and escaping beyond that for the remote shell is taken care of by this utility function.
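For instance, shlex.quote produces exactly the single layer of quoting the remote sh-compatible shell needs, and the result slots straight into the question's argument list:

```python
import shlex

# shlex.quote wraps the string in single quotes (escaping as needed),
# so no manual backslashes are required.
command_str = shlex.quote('print("test")')
print(command_str)  # 'print("test")'

# Host placeholder kept as in the question:
ssh_command = ['ssh', 'USER_X#localhost', 'python', '-c', command_str]
```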
The correct syntax for python 2 and 3 is:
python -c 'print("test")'
Platform: Windows
Grep: http://gnuwin32.sourceforge.net/packages/grep.htm
Python: 2.7.2
Windows command prompt used to execute the commands.
I am searching for the following pattern, "2345$", in a file.
Contents of the file are as follows:
abcd 2345
2345
abcd 2345$
grep "2345$" file.txt
grep returns 2 lines (first and second) successfully.
When I try to run the above command through python I don't see any output.
Python code snippet is as follows:
temp = open('file.txt', "r+")
grep_cmd = []
grep_cmd.extend([grep, '"2345$"' ,temp.name])
print grep_cmd
p = subprocess.Popen(grep_cmd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
stdoutdata = p.communicate()[0]
print stdoutdata
If I have
grep_cmd.extend([grep, '2345$' ,temp.name])
in my python script, I get the correct answer.
The question is why the grep command with double quotes,
grep_cmd.extend([grep, '"2345$"', temp.name])
executed from Python, fails. Isn't Python supposed to execute
the command as-is?
Thanks
Gudge.
Do not put double quotes around your pattern. It is only needed on the command line to quote shell metacharacters. When calling a program from python, you do not need this.
You also do not need to open the file yourself - grep will do that:
grep_cmd.extend([grep, '2345$', 'file.txt'])
To understand the reason for the double quotes not being needed and causing your command to fail, you need to understand the purpose of the double quotes and how they are processed.
The shell uses double quotes to prevent special processing of some shell metacharacters. Shell metacharacters are those characters that the shell handles specially and does not pass literally to the programs it executes. The most commonly used shell metacharacter is "space". The shell splits a command on space boundaries to build an argument vector to execute a program with. If you want to include a space in an argument, it must be quoted in some way (single or double quotes, backslash, etc). Another is the dollar sign ($), which is used to signify variable expansion.
When you are executing a program without the shell involved, all these rules about quoting and shell metacharacters are not relevant. In python, you are building the argument vector yourself, so the relevant quoting rules are python quoting rules (e.g. to include a double quote inside a double-quoted string, prefix the double quote with a backslash - the backslash will not be in the final string). The characters in each element of the argument vector when you have completed constructing it are the literal characters that will be passed to the program you are executing.
Grep does not treat double quotes as special characters, so if grep gets double quotes in its search pattern, it will attempt to match double quotes from its input.
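To illustrate (in Python 3 syntax, unlike the question's Python 2.7): the pattern is a plain argv element with no surrounding quotes, and grep then matches only the lines that literally end in 2345.

```python
import os
import subprocess
import tempfile

# Reproduce the question's file contents in a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('abcd 2345\n2345\nabcd 2345$\n')
    path = f.name

# The pattern goes in unquoted; grep sees exactly 2345$.
result = subprocess.run(['grep', '2345$', path],
                        capture_output=True, text=True)
print(result.stdout)   # the first and second lines only
os.remove(path)
```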
My original answer's reference to shell=True was incorrect - first I did not notice that you had originally specified shell=True, and secondly I was coming from the perspective of a Unix/Linux implementation, not Windows.
The python subprocess module page has this to say about shell=True and Windows:
On Windows: the Popen class uses CreateProcess() to execute the child program, which operates on strings. If args is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows.
That linked section on converting an argument sequence to a string on Windows does not make sense to me. First, a string is a sequence, and so is a list, yet the Frequently Used Arguments section says this about arguments:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names).
This contradicts the conversion process described in the Python documentation, and given the behaviour you have observed, I'd say the documentation is wrong and only applies to an argument string, not an argument vector. I cannot verify this myself, as I do not have Windows or the source code for Python lying around.
I suspect that if you call subprocess.Popen like:
p = subprocess.Popen(grep + ' "2345$" file.txt', stdout=..., shell=True)
you may find that the double quotes are stripped out as part of the documented argument conversion.
You can use python-textops3 :
from textops import *
print('\n'.join(cat('file.txt') | grep('2345$')))
With python-textops3 you can use Unix-like commands with pipes within Python, so there is no need to fork a process, which is comparatively heavy.
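If you'd rather avoid a third-party dependency, a few lines of the standard library's re module achieve the same in-process filtering (a sketch, not the textops API):

```python
import re

# Filter lines in-process with the standard library instead of
# shelling out to grep or installing a package.
def grep_lines(lines, pattern):
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

sample = ['abcd 2345', '2345', 'abcd 2345$']
print(grep_lines(sample, '2345$'))  # ['abcd 2345', '2345']
```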