How to make a python script "pipeable" in bash? - python

I wrote a script and I want it to be pipeable in bash. Something like:
echo "1stArg" | myscript.py
Is it possible? How?

See this simple echo.py:
import sys
if __name__ == "__main__":
    for line in sys.stdin:
        sys.stderr.write("DEBUG: got line: " + line)
        sys.stdout.write(line)
running:
ls | python echo.py 2>debug_output.txt | sort
output:
echo.py
test.py
test.sh
debug_output.txt content:
DEBUG: got line: echo.py
DEBUG: got line: test.py
DEBUG: got line: test.sh

I'll complement the other answers with a grep example that uses fileinput to implement the typical behaviour of UNIX tools: 1) if no arguments are specified, it reads data from stdin; 2) many files can be specified as arguments; 3) a single argument of - means stdin.
import fileinput
import re
import sys
def grep(lines, regexp):
    return (line for line in lines if regexp.search(line))
def main(args):
    if len(args) < 1:
        print("Usage: grep.py PATTERN [FILE...]", file=sys.stderr)
        return 2
    regexp = re.compile(args[0])
    input_lines = fileinput.input(args[1:])
    for output_line in grep(input_lines, regexp):
        sys.stdout.write(output_line)
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
Example:
$ seq 1 20 | python grep.py "4"
4
14
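To illustrate point 2 (many files as arguments), here is a minimal sketch that feeds two throwaway files through the same grep helper. The file names and their contents are hypothetical and exist only for the demonstration; fileinput.input(files=...) is used so the example does not depend on sys.argv.

```python
import fileinput
import os
import re
import tempfile


def grep(lines, regexp):
    """Yield only the lines that match the compiled pattern."""
    return (line for line in lines if regexp.search(line))


# Create two throwaway files (hypothetical data) to stand in for FILE arguments.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "4\n5\n"), ("b.txt", "14\n15\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(text)
    paths.append(path)

# fileinput chains the files into one stream of lines.
with fileinput.input(files=paths) as lines:
    matches = list(grep(lines, re.compile("4")))

print(matches)
```

Passing an empty list (or "-") instead of file names makes fileinput fall back to stdin, which is what makes the script above pipeable.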

In your Python script you simply read from stdin.

Everything that reads from stdin is "pipeable". A pipe simply connects the stdout of the former program to the stdin of the latter.
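As a minimal sketch of that idea, a filter can be written as a function over two text streams. The name upper_filter is hypothetical and only for illustration; in-memory streams stand in for sys.stdin and sys.stdout so the example runs on its own.

```python
import io


def upper_filter(source, sink):
    """Copy every line from source to sink, upper-cased (hypothetical example filter)."""
    for line in source:
        sink.write(line.upper())


# In-memory stand-ins for sys.stdin / sys.stdout so the sketch is self-contained.
fake_stdin = io.StringIO("echo.py\ntest.py\n")
fake_stdout = io.StringIO()
upper_filter(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

In a real script you would call upper_filter(sys.stdin, sys.stdout) instead, and the script could then sit in the middle of a pipeline.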

Related

How to read sys.stdin containing binary data in python (ignore errors)?

How do I read sys.stdin, but ignoring decoding errors?
I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line.
Maybe I can somehow reopen the sys.stdin file but with errors='ignore' option?
Found three solutions from here as Mark Setchell mentioned.
import sys
import io
def first():
    with open(sys.stdin.fileno(), 'r', errors='ignore') as f:
        return f.read()
def second():
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
    return sys.stdin.read()
def third():
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()
print(first())
#print(second())
#print(third())
Usage:
$ printf 'a\200b\n' | python solution.py
ab
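The question asked for line-by-line reading rather than read(). A minimal sketch of the second approach used that way follows; an io.BytesIO object stands in for sys.stdin.buffer (an assumption made so the example runs without a pipe), and the encoding is set to UTF-8 explicitly.

```python
import io

# Stand-in for sys.stdin.buffer: two lines, the first containing an invalid UTF-8 byte.
raw = io.BytesIO(b"a\x80b\nsecond line\n")

# Same wrapping as in second(), but iterating line by line instead of calling read().
text = io.TextIOWrapper(raw, encoding="utf-8", errors="ignore")
lines = [line for line in text]
print(lines)
```

With the real stream, the loop would be "for line in io.TextIOWrapper(sys.stdin.buffer, errors='ignore'):".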
You can set an error handler through the PYTHONIOENCODING environment variable; this affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.
$ cat so73335410.py
import sys
if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$ echo hello | python so73335410.py
hello
$ echo hello hello hello hello | zip > hello.zip
adding: - (deflated 54%)
$
$ cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
-▒
UY HW#'PKv>
▒-PK,-/>PKmPK/>
$
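The same effect can be checked from Python without a shell. This is a minimal sketch, assuming the variable is set only for a child interpreter started with subprocess; the one-line child program is hypothetical and simply echoes stdin back.

```python
import os
import subprocess
import sys

# Child program (hypothetical): echo stdin back to stdout.
child_code = "import sys; sys.stdout.write(sys.stdin.read())"

# Copy the current environment and add the error handler for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8:ignore")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input=b"a\x80b\n",  # 0x80 is not valid UTF-8 on its own
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)
print(result.stdout)
```

The invalid byte is dropped while the child decodes its stdin, so the child never raises a UnicodeDecodeError.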

Python: Read data from STDIN, unless it's not provided

I have a python program which reads from STDIN:
#!/usr/bin/env python
def my_func(data):
    print(data)
if __name__ == '__main__':
    import sys
    data = sys.stdin.read()
    my_func(data)
I see the expected results when I execute this with:
cat file.txt | ./app.py
I want to add some other functionality to the program:
if __name__ == '__main__':
    import sys
    data = sys.stdin.read()
    if data:
        my_func(data)
    else:
        print('I am some other functionality')
However when I execute this with:
./app.py
... the program just hangs, as if it is waiting for STDIN input.
What's the correct way to write this, so that it handles both ways of executing it?
As @Klaus D. suggested, using different command line arguments should be the most straightforward option.
sys.argv[i] returns the argument at index i used when launching the script. Index 0 is always the script name while any other index represents subsequent arguments passed to the script. With that in mind:
if __name__ == '__main__':
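    import sys
    # Fall back to an empty string when no argument is given, to avoid an IndexError.
    objective = sys.argv[1] if len(sys.argv) > 1 else ''
    if objective == 'read':
        data = sys.stdin.read()
        if data:
            my_func(data)
    else:
        print('I am some other functionality')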
Many Unix-style programs look for a dash - to indicate input from stdin. So,
import sys
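# Guard the index so that running without arguments does not raise an IndexError.
if len(sys.argv) > 1 and sys.argv[1] == '-':
    read_from_stdin()
else:
    other_action()
$ ./app.py - < /path/to/input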
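A common alternative that the answers above do not mention is to check whether stdin is an interactive terminal with isatty(). The sketch below combines that check with the dash convention; the helper name wants_stdin and the FakeTerminal test double are hypothetical and exist only for illustration.

```python
import io


def wants_stdin(argv, stream):
    """Decide whether to read from the stream.

    Hypothetical helper: read when "-" is passed explicitly, or when no
    arguments are given and the stream is not an interactive terminal
    (that is, something was piped or redirected in).
    """
    args = argv[1:]
    if args:
        return args[0] == "-"
    return not stream.isatty()


class FakeTerminal(io.StringIO):
    """Test double that pretends to be an interactive terminal."""

    def isatty(self):
        return True


print(wants_stdin(["app.py", "-"], FakeTerminal()))
print(wants_stdin(["app.py"], io.StringIO("piped")))
print(wants_stdin(["app.py"], FakeTerminal()))
```

In the real program the call would be wants_stdin(sys.argv, sys.stdin). Note that isatty() is also false when stdin is redirected from an empty source, in which case read() returns an empty string rather than blocking.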

Why is my output one character per line?

I have this python script which calls a shell script and processes the output.
$ cat cd-and-run.py
#!/usr/bin/env python
import sys, getopt
import subprocess
def run_mock_phantom (test_value):
    aid = 'unknown'
    proc = subprocess.Popen(['./mock-phanton.sh', test_value], stdout=subprocess.PIPE)
    for line in proc.communicate()[0]:
        print line
    return aid
def main(argv):
    app_id = run_mock_phantom ( 'test-one-two' )
    print "app is %s" % (app_id)
if __name__ == "__main__":
    main(sys.argv[1:])
Here is the shell script the above script calls:
$ cat mock-phanton.sh
#!/bin/sh
appid=$1
if [ -z "$appid" ]
then
    echo "oh man you didnt do it right ..."
    exit 0
fi
echo "APP_ID=$appid"
When I run the script I get this output:
$ ./cd-and-run.py
A
P
P
_
I
D
=
t
e
s
t
-
o
n
e
-
t
w
o
app is unknown
What I don't understand is why each character is printed on a separate line, instead of just:
APP_ID=test-one-two
Try changing this:
for line in proc.communicate()[0]:
    print line
to this:
print proc.communicate()[0]
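You're unintentionally iterating over a string: proc.communicate()[0] is the whole output as a single string, and looping over a string yields one character at a time.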
EDIT:
As mentioned in this question, you can iterate over proc.communicate()[0].splitlines() for multiple lines of stdout. Perhaps a cleaner way to do it is like this, described in the same question:
for line in proc.stdout:
    print line
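The code in this thread is Python 2. As a minimal sketch of the same fix in Python 3 (an assumption, since the question does not use it), subprocess.run with text=True returns the output as one string, and splitlines() turns it into whole lines. A child Python process stands in for ./mock-phanton.sh so the example runs anywhere.

```python
import subprocess
import sys

# Stand-in for ./mock-phanton.sh: a child process that prints two lines.
child_code = "print('APP_ID=test-one-two'); print('done')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,  # decode stdout to str instead of bytes
    check=True,
)

# Iterating over result.stdout directly would yield single characters;
# splitlines() yields whole lines.
lines = result.stdout.splitlines()
for line in lines:
    print(line)
```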

Redirect bash output to python script

I am using zbarimg to scan bar codes, I want to redirect the output to a python script. How can I redirect the output of the following command:
zbarimg code.png
to a python script, and what should the script look like?
I tried the following script:
#!/usr/local/bin/python
s = raw_input()
print s
I made it an executable by issuing the following:
chmod +x in.py
Then I ran the following:
zbarimg code.png | in.py
I know it's wrong but I can't figure out anything else!
Use sys.stdin to read from stdin in your python script. For example:
import sys
data = sys.stdin.readlines()
Using the pipe operator | from the command is correct, actually. Did it not work?
You might need to explicitly specify the path for the python script as in
zbarimg code.png | ./in.py
and as @dogbane says, reading from stdin with sys.stdin.readlines() is better than using raw_input.
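As a rough sketch of what the receiving script could do with the piped lines, the helper below splits each line at the first colon. The "TYPE:data" layout is an assumption about what the upstream command prints, and the function name parse_scans and the sample lines are hypothetical; an io.StringIO object stands in for sys.stdin.

```python
import io


def parse_scans(lines):
    """Split 'TYPE:data' lines into (type, data) pairs.

    The 'TYPE:data' layout is an assumption about the upstream output;
    adjust the parsing if the real output differs.
    """
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        kind, _, data = line.partition(":")
        results.append((kind, data))
    return results


# Stand-in for sys.stdin so the sketch runs without a pipe (made-up sample lines).
fake_stdin = io.StringIO("QR-Code:hello world\nEAN-13:9780201379624\n")
scans = parse_scans(fake_stdin)
print(scans)
```

In the real script, parse_scans(sys.stdin) would consume whatever the command on the left side of the pipe writes.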
I had to invoke the Python program as
somecommand | python mypythonscript.py
instead of
somecommand | ./mypythonscript.py
This worked for me; the latter produced errors.
My purpose: sum up the durations of all mp3 files by piping the output of soxi -D *mp3 into Python:
soxi -D *mp3 | python sum_durations.py
Details:
soxi -D *mp3 produces:
122.473016
139.533016
128.456009
307.802993
...
sum_durations.py script:
import sys
# Read one duration in seconds per line from stdin.
total = 0.0  # avoid shadowing the built-in sum()
for line in sys.stdin:
    if line.strip():  # skip blank lines
        total += float(line)
mins = int(total // 60)
secs = int(total) % 60
# Zero-pad the seconds so 5 seconds prints as 05.
print("total duration: {}:{:02d}".format(mins, secs))
