Grep-ing for lines in Python

I wanted to convert a Bash script to Python for training. The Bash script queries a remote host for its df via ssh and searches the returned line for a specific field, like so:
ssh $_host df 2>/dev/null | grep -i $_filesystem | grep :
In Python, I use subprocess.check_output in combination with shlex.split to get the "df" output:
_cmd = shlex.split("ssh %s df 2>/dev/null" % _host)
_test = subprocess.check_output(_cmd)
Here, _test comes back as one string, and when I iterate over it I get one character at a time rather than one line at a time. subprocess.Popen for pipes doesn't work either. All I want to do is print the line in which my search string is found, for further formatting.
I've already seen several attempts approaching a solution, but most of them iterate through files with defined carriage returns, not strings that are returned by a subprocess call.

Unless I misunderstood your narrative, the problem is that you iterate over the string _test (in code that you don't show). This iterates over one character at a time. So you just need to write
_test = _test.split("\n")
to turn _test into a list of lines, and you'll be getting lines instead of single characters when you iterate over it.
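Putting the answer together, here is a self-contained sketch; the df output below is made up, standing in for what subprocess.check_output() returns in the real script:

```python
# Hypothetical df output; in the real script this string comes from
# subprocess.check_output(shlex.split("ssh %s df" % _host)).
_test = ("Filesystem     1K-blocks    Used Available Use% Mounted on\n"
         "nfshost:/data  104857600  524288 104333312   1% /mnt/data\n"
         "/dev/sda1       41152832 8123456  30915232  21% /\n")

# Split into lines, then keep only lines matching the search string and
# containing a colon, mirroring the `grep -i ... | grep :` pipeline.
matches = [line for line in _test.split("\n")
           if "data" in line.lower() and ":" in line]
for line in matches:
    print(line)
```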


Invalid argument/option - '|' [duplicate]

When trying to run the tasklist command with grep by using subprocess:
command = ("tasklist | grep edpa.exe | gawk \"{ print $2 }\"")
p = subprocess.Popen(command, stdout=subprocess.PIPE)
text = p.communicate(timeout=600)[0]
print(text)
I get this error:
ERROR: Invalid argument/option - '|'.
Type "TASKLIST /?" for usage.
It works fine when I run the command directly from cmd, but when using subprocess something goes wrong.
How can it be fixed? I need to use the output of the command, so I cannot use os.system.
Two options:
Use the shell=True option of the Popen(); this will pass it through the shell, which is the part that interprets things like the |
Just run tasklist in the Popen(), then do the processing in Python rather than invoking grep and awk
Of the two, the latter is probably the better approach in this particular instance, since these grep and awk commands are easily translated into Python.
Your linters may also complain that shell=True is prone to security issues, although this particular usage would be OK.
In the absence of shell=True, subprocess runs a single subprocess. In other words, you are passing | and grep etc as arguments to tasklist.
The simplest fix is to add shell=True; but a much better fix is to do the trivial text processing in Python instead. This also coincidentally gets rid of the useless grep.
for line in subprocess.check_output(['tasklist'], timeout=600, text=True).splitlines():
    if 'edpa.exe' in line:
        text = line.split()[1]
        print(text)
I have assumed you really want to match edpa.exe literally, anywhere in the output line; your regex would match edpa followed by any character followed by exe. The code could be improved by doing the split first and then look for the search string only in the process name field (if that is indeed your intent).
Perhaps notice also how you generally want to avoid the low-level Popen whenever you can use one of the higher-level functions.
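To illustrate the suggested improvement of splitting first and matching only the process-name field, here is a sketch; the sample line is hypothetical tasklist output:

```python
# Hypothetical line of `tasklist` output, to show matching on the
# image-name field instead of anywhere in the line.
line = "edpa.exe                      1234 Console                    1     10,000 K"
fields = line.split()
if fields and fields[0] == 'edpa.exe':
    pid = fields[1]
    print(pid)  # 1234
```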

How to apply string formatting to a bash command (incorporated into Python script via subprocess)?

I would like to add a bash command to my Python script, which linearises a FASTA sequence file while leaving sequence separation intact (hence the specific choice of command). Below is the command, with the example input file of "inputfile.txt":
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < inputfile.txt
The aim is to allow the user to specify the file which is to be modified in the command line, for example:
$ python3 program.py inputfile.txt
I have tried to use string formatting (i.e. %s) in conjunction with sys.argv in order to achieve this. However, I have tried many different locations of " and ', and still cannot get this to work and accept a user input from the command line here.
(The command contains escapes such as \n and so I have tried to counteract this by adding additional backslashes, as well as additional % for the existing %s in the command.)
import sys
import subprocess
path = sys.argv[1]
holder = subprocess.Popen("""awk '/^>/ {printf("\\n%%s\\n",$0);next; } { printf("%%s",$0);} END {printf("\\n");}' < %s""" % path , shell=True, stdout=subprocess.PIPE).stdout.read()
print(holder)
I would very much appreciate any help with identifying the syntax error here, or suggestions for how I could add this user input.
TL;DR: Don't shell out to awk! Just use Python. But let's go step by step...
Your instinct of using triple quotes here is good: that way you don't need to escape the single and double quotes that appear in your shell string.
The next useful device you can use is raw strings, using r'...' or r"..." or r"""...""". Raw strings don't expand backslash escapes, so in that case you can leave the \ns intact.
Last is the %s, which you need to escape if you use the % operator, but here I'm going to suggest that instead of using the shell to redirect input, just use Python's subprocess to send stdin from the file! Much simpler and you end up with no substitution.
I'll also recommend that you use subprocess.check_output() instead of Popen(). It's much simpler to use and it's a lot more robust, since it will check that the command exited successfully (with a zero exit status.)
Putting it all together (so far), you get:
with open(path) as inputfile:
    holder = subprocess.check_output(
        r"""awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}'""",
        shell=True,
        stdin=inputfile)
But here you can go one step further, since you don't really need a shell anymore, it's only being used to split the command line into two arguments, so just do this split in Python (it's almost always possible and easy to do this and it's a lot more robust since you don't have to deal with the shell's word splitting!)
with open(path) as inputfile:
    holder = subprocess.check_output(
        ['awk', r'/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}'],
        stdin=inputfile)
The second string in the list is still a raw string, since you want to preserve the backslash escapes.
I could go into how you can do this without using printf() in awk, using print instead, which should get rid of both \ns and %s, but instead I'll tell you that it's much easier to do what you're doing in Python directly!
In fact, everything that awk (or sed, tr, cut, etc.) can do, Python can do better (or, at least, in a more readable and maintainable way.)
In the case of your particular code:
with open(path) as inputfile:
    for line in inputfile:
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Print the header on its own line, with a blank line before it.
            print()
            print(line)
        else:
            # Concatenate sequence lines without a newline in between.
            print(line, end='')
    # And a final newline at the end.
    print()
Isn't this better?
And you can put this into a function, into a module, and reuse it anywhere you'd like. It's easy to store the result in a string, save it into a variable if you like, much more flexible...
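As a sketch of that suggestion, the linearizing loop can be wrapped in a reusable function that returns a string instead of printing (the function name is made up):

```python
def linearize_fasta(lines):
    """Headers go on their own lines; wrapped sequence lines are joined.
    `lines` is any iterable of text lines, e.g. an open file."""
    out, seq = [], []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            if seq:
                out.append(''.join(seq))
                seq = []
            out.append(line)
        else:
            seq.append(line)
    if seq:
        out.append(''.join(seq))
    return '\n'.join(out)

print(linearize_fasta(['>seq1\n', 'ACGT\n', 'TTAA\n', '>seq2\n', 'GGCC\n']))
```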
Anyways, if you still want to stick to shelling out, see my previous code, I think that's the best you can do while still shelling out, without significantly changing the external command.

Use raw string to run a sub process

This is kind of weird, but I am running a program called scrapebox, scrapebox has an automator plugin that creates a file to automagically run a few things within. In order to run the automator from cmd I would cd into the program directory then type:
Scrapebox.exe "automator:1.sbaf"
It would first launch Scrapebox the program, once open, it would immediately run the automated file.
This is a small piece in a much bigger puzzle. I am trying to call that within a larger Python script.
import os
import subprocess
..........
..........
..........
print "Opening Scrapebox now, please wait."
os.chdir('C:\Users\Admin\DomainDB\Programs\ScrapeBox')
print
print "Current working dir : %s" % os.getcwd()
print
subprocess.call(["Scrapebox.exe"])
#"automator:1.sbaf"
print "Scrapebox finished. Moving on."
When I run it as above, it works and opens scrapebox. But, what I really need to do is something like this:
subprocess.call(["Scrapebox.exe "automator:1.sbaf""])
When I do that it throws a syntax error. So how can I input that maybe as a raw string as though it were being typed into cmd?
If you want to embed double quotes in a string, there are several ways. Also, to pass all arguments as a single string, don't pass them as a list []:
subprocess.call("Scrapebox.exe \"automator:1.sbaf\"")
subprocess.call('Scrapebox.exe "automator:1.sbaf"')
Python can use either single- or double-quotes around a string. You can also triple-quote a string (three single- or double-quotes at the start and end), which allows newlines as well, but it is not needed here.
If you pass a list of arguments, each argument should be an element of the list:
subprocess.call(['Scrapebox.exe','automator:1.sbaf'])
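If you start from a single command string, shlex.split can produce that argument list for you (note it follows POSIX quoting rules, which differ slightly from the Windows shell):

```python
import shlex

# Turn a quoted command string into the argument list subprocess expects.
args = shlex.split('Scrapebox.exe "automator:1.sbaf"')
print(args)  # ['Scrapebox.exe', 'automator:1.sbaf']
```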

How to return multiple variables from python to bash

I have a bash script that calls a python script. At first I was just returning one variable and that is fine, but now I was told to return two variables and I was wondering if there is a clean and simple way to return more than one variable.
archiveID=$(python glacier_upload.py $archive_file_name $CURRENTVAULT)
Is the call I make from bash
print archive_id['ArchiveId']
This returns the archive id to the bash script
Normally I know you can use a return statement in python to return multiple variables, but with it just being a script that is the way I found to return a variable. I could make it a function that gets called but even then, how would I receive the multiple variables that I would be passing back?
From your python script, output one variable per line. Then from your bash script, read one variable per line:
Python
print "foo bar"
print 5
Bash
#! /bin/bash
python main.py | while read line ; do
    echo $line
done
Final Solution:
Thanks Guillaume! You gave me a great starting point for the solution. I am just going to post my solution here for others.
#! /bin/bash
array=()
while read line ; do
    array+=($line)
done < <(python main.py)
echo ${array[@]}
I found the rest of the solution that I needed here
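The process-substitution pattern above can be checked standalone; printf stands in for `python main.py` here:

```shell
#!/bin/bash
# Process substitution keeps the array in the parent shell, unlike a
# plain pipe into `while read`, which would fill the array in a subshell.
array=()
while read -r line ; do
    array+=("$line")
done < <(printf 'foo bar\n5\n')
echo "${array[0]}"   # foo bar
echo "${array[1]}"   # 5
```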
The safest and cleanest way to parse any input effectively in bash is to read it into an array with mapfile:
mapfile -t -d, <<<"$(python your_script.py)"
Now you just need to make sure your script outputs the data with the chosen delimiter, "," in my example (-d selects the delimiter; -t strips that delimiter from each entry; the result lands in the default MAPFILE array). The quotes are non-optional to ensure the shell doesn't split things on spaces.
If you have a tuple of things that do not contain commas, this would be enough:
print(str(your_tuple).strip('()'))
Below are some simpler approaches I used before I was more familiar with Bash:
My favorite way is reading straight into a list:
x=($(python3 -c "print('a','b','c')"))
echo ${x[1]}
b
echo ${x[*]}
a b c
For this reason, if my_python_function returns a tuple, I would use format to make sure I just get space-delimited results:
#Assuming a tuple of length 3 is returned
#Remember to quote in case of a space in a single parameter!
print('"{}" "{}" "{}"'.format(*my_python_function()))
If you want this to be generic you would need to construct the format string:
res = my_python_function()
print(("{} "*len(res)).format(*res))
is one way. No need to worry about the extra space, but you could [:-1] on the format string to get rid of it.
Finally, if you are expecting multi-word arguments (i.e. a space in a single argument), you need to add quotes, and a level of indirection (I am assuming you will only be running your own, "safe", scripts):
#myfile.py
res = my_python_function()
print(('"{}" '*len(res)).format(*res))
#myfile.bash
eval x=($(python3 myfile.py))

How to pass escaped string to shell script in Python

I am attempting to create a Python script that in turn runs the shell script "js2coffee" to convert some javascript into coffeescript.
From the command line I can run this, and get coffeescript back again...
echo "var myNumber = 100;" | js2coffee
What I need to do is use this same pattern from Python.
In Python, I've come to something like this:
command = "echo '" + myJavscript + "' | js2coffee"
result = os.popen(command).read()
This works sometimes, but there are issues related to special characters (mostly quotes, I think) not being properly escaped in the myJavascript. There has got to be a standard way of doing this. Any ideas? Thanks!
Use the input stream of a process to feed it the data, that way you can avoid the shell and you don't need to escape your javascript. Additionally, you're not vulnerable to shell injection attacks;
pr = subprocess.Popen(['js2coffee'],
                      stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE,
                      text=True)  # text=True so communicate() takes and returns str (Python 3)
result, stderrdata = pr.communicate('var myNumber = 100;')
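The same stdin-feeding pattern with the higher-level run(); tr stands in for js2coffee here, since that tool may not be installed:

```python
import subprocess

# Feed the program via stdin and capture stdout, no shell involved.
# `tr a-z A-Z` is a stand-in for js2coffee, just to show the pattern.
result = subprocess.run(['tr', 'a-z', 'A-Z'],
                        input='var myNumber = 100;',
                        capture_output=True, text=True)
print(result.stdout)  # VAR MYNUMBER = 100;
```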
subprocess module is the way to go:
http://docs.python.org/library/subprocess.html#frequently-used-arguments
be kindly noted the following:
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names)
