Python subprocess using perl for formatting is giving incomplete output - python

I'm having an issue reading output from a python subprocess command.
The bash command whose output I want to read:
pacmd list-sink-inputs | tr '\n' '\r' | perl -pe 's/ *index: ([0-9]+).+?application\.process\.id = "([^\r]+)"\r.+?(?=index:|$)/\2:\1\r/g' | tr '\r' '\n'
When I run this via bash I get the intended output:
4 sink input(s) available.
6249:72
20341:84
20344:86
20350:87
When I try to get its output via Python's subprocess, running any one of these:
subprocess.Popen(cmnd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0].decode('UTF-8')
check_output(cmnd,shell=True).decode('UTF-8')
subprocess.run(cmnd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).stdout.decode('utf-8')
where cmnd = """pacmd list-sink-inputs | tr '\n' '\r' | perl -pe 's/ *index: ([0-9]+).+?application\.process\.id = "([^\r]+)"\r.+?(?=index:|$)/\2:\1\r/g' | tr '\r' '\n'"""
It gives the following output:
'4 sink input(s) available.\n\x02:\x01\n\x02:\x01\n\x02:\x01\n\x02:\x01\n'
This is unintended, as it doesn't have the 6249:72, etc. numbers I want. Even stderr is blank and the return code is 0, as expected.
The only workaround I could find was to redirect the bash output to a text file and then read the text file via Python, which I don't want to do because that's unnecessary file I/O.
I've already gone through Missing output from subprocess command, Python Subprocess Grep, Python subprocess run() is giving abnormal output [duplicate] and many others, but I can't wrap my head around what's going wrong.

You have a quoting issue. """\1""" means chr(0o1). To produce the string \1, you could use """\\1""". The other instances of \ should be \\ as well.
Since all instances of \ need to be escaped, you could also use r"""\1""".
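For example, in an interactive session:
>>> """\1"""
'\x01'
>>> r"""\1"""
'\\1'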
Other issues:
\1 and \2 outside of a regular expression are wrong anyway. You should be using $1 and $2.
There's no need for a multiline literal here; a plain "..." works if you escape the embedded double quotes, or you can keep the triple quotes and just prefix them with r so the backslashes don't need doubling.
The whole tr business can be avoided by using -0777 to cause perl to treat the entire file as one line.
This gives us:
cmnd = "pacmd list-sink-inputs | perl -0777pe's/ *index: (\\d+).+?application\\.process\\.id = "([^\\n]+)"\\n.+?(?=index:|$)/$2:$1\\n/sag'"
or
cmnd = r"pacmd list-sink-inputs | perl -0777pe's/ *index: (\d+).+?application\.process\.id = "([^\n]+)"\n.+?(?=index:|$)/$2:$1\n/sag'"
But why is Perl being used at all here? You could easily do the same thing in Python!
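For instance, something along these lines would do (a sketch only; it assumes the pacmd output contains blocks with an index: line followed by an application.process.id = "..." line, as shown in the question):
import re
import subprocess

# Run pacmd directly; no shell, tr or perl needed.
out = subprocess.run(['pacmd', 'list-sink-inputs'],
                     stdout=subprocess.PIPE, text=True).stdout

# Pair each sink-input index with its application.process.id, matching across lines.
pairs = re.findall(r'index: (\d+).+?application\.process\.id = "([^"]+)"', out, re.S)

for index, pid in pairs:
    print('{}:{}'.format(pid, index))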

Related

echo not printing \033 correctly in pipeline started by os.system()

In bash (as started by Python), I want to print this string \033[31m so that I can use a pipe | operator after it, followed by a command to copy that string to the clipboard. This means that in practice, I'm trying to run something like:
os.system('echo \\033[31m | xsel -ib')
...but the xsel -ib part is working fine, so this question is focused specifically on the behavior of echo.
Most of my attempts have been similar to:
echo -e \\033[31m
I have tried it with single quotes, double quotes, no quotes, removing the -e flag, etc. The closest I got was:
echo -n "\\ 033[31m"
which prints this string \ 033[31m
I don't want that space between \ and 0
The -n flag is used so that a newline is not appended after the printed string.
I use Ubuntu 20.04, and xsel is a selection and clipboard manipulation tool for the X11 Window System (which Ubuntu 20.04 uses).
echo is the wrong tool for the job. It's a shell builtin, and one for which the POSIX sh standard explicitly does not guarantee portable behavior when escape sequences (such as \033) are present. system() starts /bin/sh instead of bash, so POSIX behavior -- not that of your regular interactive shell -- is expected.
Use subprocess.run() instead of os.system(), and you don't need echo in the first place.
If you want to put an escape sequence into the clipboard (so not \033 but instead the ESC key that this gets converted to by an echo with XSI extensions to POSIX):
# to store \033 as a single escape character, use a regular Python bytestring
subprocess.run(['xsel', '-ib'], input=b'\033[31m')
If you want to put the literal text without being interpreted (so there's an actual backslash and an actual zero), use a raw bytestring instead:
# to store \033 as four separate characters, use a raw string
subprocess.run(['xsel', '-ib'], input=rb'\033[31m')
For a more detailed description of why echo causes problems in this context, see the excellent answer by Stephane to the Unix & Linux Stack Exchange question Why is printf better than echo?.
If you for some reason do want to keep using a shell pipeline, switch to printf instead:
# to store \033 as four separate characters, use %s
subprocess.run(r''' printf '%s\n' '\033[31m' | xsel -ib ''', shell=True)
# to store \033 as a single escape character, use %b
subprocess.run(r''' printf '%b\n' '\033[31m' | xsel -ib ''', shell=True)

Invalid argument/option - '|' [duplicate]

This question already has answers here:
How to use `subprocess` command with pipes
(7 answers)
Closed 1 year ago.
When trying to run the tasklist command with grep by using subprocess:
command = ("tasklist | grep edpa.exe | gawk \"{ print $2 }\"")
p = subprocess.Popen(command, stdout=subprocess.PIPE)
text = p.communicate(timeout=600)[0]
print(text)
I get this error:
ERROR: Invalid argument/option - '|'.
Type "TASKLIST /?" for usage.
It works fine when I run the command directly from cmd, but when using subprocess something goes wrong.
How can it be fixed? I need to use the output of the command, so I cannot use os.system.
Two options:
Use the shell=True option of the Popen(); this will pass it through the shell, which is the part that interprets things like the |
Just run tasklist in the Popen(), then do the processing in Python rather than invoking grep and awk
Of the two, the latter is probably the better approach in this particular instance, since these grep and awk commands are easily translated into Python.
Your linters may also complain that shell=True is prone to security issues, although this particular usage would be OK.
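For completeness, the shell=True route only needs that one extra argument on the original code (a sketch; the pipeline itself is unchanged):
command = "tasklist | grep edpa.exe | gawk \"{ print $2 }\""
p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
text = p.communicate(timeout=600)[0]
print(text)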
In the absence of shell=True, subprocess runs a single subprocess. In other words, you are passing | and grep etc as arguments to tasklist.
The simplest fix is to add shell=True; but a much better fix is to do the trivial text processing in Python instead. This also coincidentally gets rid of the useless grep.
for line in subprocess.check_output(['tasklist'], text=True, timeout=600).splitlines():
    if 'edpa.exe' in line:
        text = line.split()[1]
        print(text)
I have assumed you really want to match edpa.exe literally, anywhere in the output line; your regex would match edpa followed by any character followed by exe. The code could be improved by doing the split first and then looking for the search string only in the process name field (if that is indeed your intent).
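That variant could look something like this (a sketch; it assumes the image name is the first column and the PID the second, as in default tasklist output):
for line in subprocess.check_output(['tasklist'], text=True, timeout=600).splitlines():
    fields = line.split()
    # Compare against the image-name field only, not the whole line.
    if fields and fields[0] == 'edpa.exe':
        print(fields[1])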
Perhaps notice also how you generally want to avoid the low-level Popen whenever you can use one of the higher-level functions.

passing text from python to shell | unicode | applying cut on it

I have a python script that essentially parses an xml file, uses the package re and prints text as follows:
string = str(search_compiled.groups(0)[0].encode('utf-8')) + "%" + str(text.encode('utf-8'))
print string
I receive the text in the shell script as follows:
string="$($file.py $arg1 $arg2 $arg3)"
varA="$(echo "$string" | cut -d'%' -f1)"
varB="$(echo "$string" | cut -d'%' -f2)"
echo "$string"
So, in summary, I need the passed string to be cut into two by the delimiter '%' and store the results in varA and varB.
The splitting does not happen.
string shows the entire thingy: part A plus the part B. Here's the catch, the '%' I added in the python script does not get printed though.
Could anyone please help me in understanding what is going wrong?
You can use the pipe and cut commands as you have in the question, but without the quotes on the delimiter character: use -d% instead of -d'%'.
varA=$(echo $string | cut -f1 -d%)
varB=$(echo $string | cut -f2 -d%)
[root@test /tmp]$ eval `echo "aaa%bbb%ccc" | awk -F '%' '{print "a="$1" b="$2}'`
[root@test /tmp]$ echo $a
aaa
[root@test /tmp]$ echo $b
bbb
Explanation
awk -F '%' '{print "a="$1" b="$2}' turns aaa%bbb%ccc into a=aaa b=bbb
eval a=aaa b=bbb is then equivalent to typing this at the terminal:
$ a=aaa
$ b=bbb
I re-read this for a 3rd time, and I think this is the basic problem (from your description):
string shows the entire thingy: part A plus the part B. Here's the catch, the '%' I added in the python script does not get printed though.
The conversion of data to utf-8 then back to string seems suspect to me. Can you change the string creation line in your python program to this:
string = u'{}%{}'.format(search_compiled.groups(0)[0].encode('utf-8'), text.encode('utf-8'))
You might be double encoding, so this could be what you need:
string = u'{}%{}'.format(search_compiled.groups(0)[0], text)
Add this in the shell script before it calls the python script:
export PYTHONIOENCODING=UTF-8

call awk from inside python generate error

I have to run awk from Python. When I run the script from the terminal it gives the desired output, but it shows an error when executed from inside Python.
runAwk = '''awk '{printf $1}{for(i=2;i<=NF;i++)printf "|"$i}{printf "\n"}' final.txt'''
os.system(runAwk)
gives the error:
awk: line 1: runaway string constant " ...
When I searched the web, I found claims that awk cannot be used with the os module, but there isn't much detail. I am confused about how to proceed.
The \n in your runAwk string is being interpreted by Python as a literal newline character, rather than being passed through to awk as the two characters \ and n. If you use a raw string instead, by preceding the opening triple-quotes with an r:
runAwk = r'''awk '{printf $1}{for(i=2;i<=NF;i++)printf "|"$i}{printf "\n"}' final.txt'''
... then Python won't treat \n as meaning "newline", and awk will see the string you intended.
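If you also want to capture the output in Python rather than just have awk print to the terminal, the same raw string can be handed to subprocess (a sketch; it assumes final.txt exists in the current directory):
import subprocess

runAwk = r'''awk '{printf $1}{for(i=2;i<=NF;i++)printf "|"$i}{printf "\n"}' final.txt'''
# shell=True because the command string relies on the shell's quoting.
result = subprocess.run(runAwk, shell=True, stdout=subprocess.PIPE, text=True)
print(result.stdout)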

How to pass escaped string to shell script in Python

I am attempting to create a Python script that in turn runs the shell script "js2coffee" to convert some javascript into coffeescript.
From the command line I can run this, and get coffeescript back again...
echo "var myNumber = 100;" | js2coffee
What I need to do is use this same pattern from Python.
In Python, I've come to something like this:
command = "echo '" + myJavscript + "' | js2coffee"
result = os.popen(command).read()
This works sometimes, but there are issues related to special characters (mostly quotes, I think) not being properly escaped in the myJavascript. There has got to be a standard way of doing this. Any ideas? Thanks!
Use the input stream of a process to feed it the data; that way you can avoid the shell and you don't need to escape your JavaScript. Additionally, you're not vulnerable to shell injection attacks:
pr = subprocess.Popen(['js2coffee'],
                      stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE,
                      text=True)  # text mode so a str can be passed to communicate() on Python 3
result, stderrdata = pr.communicate('var myNumber = 100;')
The subprocess module is the way to go:
http://docs.python.org/library/subprocess.html#frequently-used-arguments
Kindly note the following from that page:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names)
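As a quick hypothetical illustration of that last point, a file name containing a space needs no manual quoting when the arguments are passed as a list:
# Hypothetical file name, purely to illustrate the quoting point.
subprocess.run(['cat', 'my file.js'])
# With a shell string you would have to quote it yourself:
subprocess.run("cat 'my file.js'", shell=True)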
