I wrote a script and I want it to be pipeable in bash. Something like:
echo "1stArg" | myscript.py
Is it possible? How?
See this simple echo.py:
import sys
if __name__ == "__main__":
    for line in sys.stdin:
        sys.stderr.write("DEBUG: got line: " + line)
        sys.stdout.write(line)
running:
ls | python echo.py 2>debug_output.txt | sort
output:
echo.py
test.py
test.sh
debug_output.txt content:
DEBUG: got line: echo.py
DEBUG: got line: test.py
DEBUG: got line: test.sh
I'll complement the other answers with a grep example that uses fileinput to implement the typical behaviour of UNIX tools: 1) if no arguments are specified, it reads data from stdin; 2) many files can be specified as arguments; 3) a single argument of - means stdin.
import fileinput
import re
import sys
def grep(lines, regexp):
    return (line for line in lines if regexp.search(line))
def main(args):
    if len(args) < 1:
        print("Usage: grep.py PATTERN [FILE...]", file=sys.stderr)
        return 2
    regexp = re.compile(args[0])
    input_lines = fileinput.input(args[1:])
    for output_line in grep(input_lines, regexp):
        sys.stdout.write(output_line)
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
Example:
$ seq 1 20 | python grep.py "4"
4
14
In your Python script you simply read from stdin.
Everything that reads from stdin is "pipeable". A pipe simply connects the stdout of the former program to the stdin of the latter.
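To make that concrete, here is a rough Python sketch of what the shell does for `a | b`: it starts both programs and connects the first one's stdout to the second one's stdin. The two child commands are hypothetical stand-ins (small inline Python programs), not anything from the question.

```python
import subprocess
import sys

# Hypothetical stand-ins for "former" and "latter" programs.
PRODUCER = [sys.executable, "-c", "print('b'); print('a')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pipe(first_cmd, second_cmd):
    """Run first_cmd | second_cmd and return the second program's stdout as bytes."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    # The second program's stdin is the read end of the first program's stdout.
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy so the first program sees a broken pipe if the second exits early.
    first.stdout.close()
    output = second.communicate()[0]
    first.wait()
    return output
```

Calling `pipe(PRODUCER, CONSUMER)` behaves roughly like `producer | consumer` in the shell: the consumer only ever reads its own stdin and does not need to know where the data comes from.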
How do I read sys.stdin, but ignoring decoding errors?
I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line.
Maybe I can somehow reopen the sys.stdin file but with errors='ignore' option?
Found three solutions from here as Mark Setchell mentioned.
import sys
import io
def first():
    with open(sys.stdin.fileno(), 'r', errors='ignore') as f:
        return f.read()
def second():
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
    return sys.stdin.read()
def third():
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()
print(first())
#print(second())
#print(third())
Usage:
$ printf 'a\x80b\n' | python solution.py
ab
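The question asked for line-by-line reading, while the three functions above read everything at once. A minimal sketch of the line-by-line variant, assuming UTF-8 input; the helper name `iter_lines_ignoring_errors` is hypothetical, and it takes any binary stream so it can be tried without a real pipe:

```python
import io


def iter_lines_ignoring_errors(binary_stream, encoding="utf-8"):
    """Yield decoded lines from a binary stream, silently dropping undecodable bytes."""
    # Same idea as second() above, but iterating instead of calling read().
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, errors="ignore")
    for line in text_stream:
        yield line
```

In a script this would be called as `iter_lines_ignoring_errors(sys.stdin.buffer)`. After `sys.stdin.reconfigure(errors='ignore')` (Python 3.7+), a plain `for line in sys.stdin:` loop should give the same result.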
You can set an error handler through the PYTHONIOENCODING environment variable: this affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.
$ cat so73335410.py
import sys
if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$ echo hello | python so73335410.py
hello
$ echo hello hello hello hello | zip > hello.zip
adding: - (deflated 54%)
$
$ cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
-▒
UY HW#'PKv>
▒-PK,-/>PKmPK/>
$
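The zip output above is hard to read, so here is a smaller check of the same setting, written as a sketch. It starts a child Python process with PYTHONIOENCODING set and feeds it one invalid UTF-8 byte. The helper name `run_with_io_encoding` and the inline child program are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical child program: copy stdin to stdout in text mode.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read())"


def run_with_io_encoding(raw_input_bytes, io_encoding):
    """Run the child with PYTHONIOENCODING=io_encoding and return its raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=raw_input_bytes,
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout
```

With `"utf-8:ignore"` the stray byte `\x80` in `b"a\x80b\n"` is dropped while decoding stdin, so the child writes `ab` instead of failing with a UnicodeDecodeError.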
I have a python program which reads from STDIN:
#!/usr/bin/env python
def my_func(data):
    print (data)
if __name__ == '__main__':
    import sys
    data = sys.stdin.read()
    my_func(data)
I see the expected results when I execute this with:
cat file.txt | ./app.py
I want to add some other functionality to the program:
if __name__ == '__main__':
    import sys
    data = sys.stdin.read()
    if data:
        my_func(data)
    else:
        print ('I am some other functionality')
However when I execute this with:
./app.py
... the program just hangs, as if it is waiting for STDIN input.
What's the correct way to write this, so it will handle both methods of executing.
As @Klaus D. suggested, using different command line arguments should be the most straightforward option.
sys.argv[i] returns the argument at index i used when launching the script. Index 0 is always the script name while any other index represents subsequent arguments passed to the script. With that in mind:
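if __name__ == '__main__':
    import sys
    # Guard against a missing argument so that plain ./app.py does not raise IndexError
    objective = sys.argv[1] if len(sys.argv) > 1 else None
    if objective == 'read':
        data = sys.stdin.read()
        if data:
            my_func(data)
    else:
        print ('I am some other functionality')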
Many Unix-style programs look for a dash - to indicate input from stdin. So,
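import sys
# Check the length first so that running without arguments does not raise IndexError
if len(sys.argv) > 1 and sys.argv[1] == '-':
    read_from_stdin()
else:
    other_action()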
$ ./app.py - < /path/to/input
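Another option, not taken from the answers above, is to check whether stdin is attached to a terminal before reading from it. The sketch below combines that check with the dash convention. The names `should_read_stdin` and `main` are hypothetical. It also assumes that "stdin is not a terminal" means "data was piped in", which may not hold under cron or similar launchers.

```python
import sys


def should_read_stdin(args, stream):
    """Return True if the caller passed '-' or the stream is not an interactive terminal."""
    if args and args[0] == "-":
        return True
    # isatty() is False for pipes and redirected files, True for a terminal.
    return not stream.isatty()


def main(args, stream=None):
    """Pick between the stdin mode and the other functionality; return a status string."""
    stream = sys.stdin if stream is None else stream
    if should_read_stdin(args, stream):
        data = stream.read()
        return "read %d characters" % len(data)
    return "I am some other functionality"
```

A script would finish with `print(main(sys.argv[1:]))`. Running `./app.py` from a terminal then takes the second branch without waiting for input, while `cat file.txt | ./app.py` reads the piped data.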
I have this python script which calls a shell script and processes the output.
$ cat cd-and-run.py
#!/usr/bin/env python
import sys, getopt
import subprocess
def run_mock_phantom (test_value):
    aid = 'unknown'
    proc = subprocess.Popen(['./mock-phanton.sh', test_value], stdout=subprocess.PIPE)
    for line in proc.communicate()[0]:
        print line
    return aid
def main(argv):
    app_id = run_mock_phantom ( 'test-one-two' )
    print "app is %s" % (app_id)
if __name__ == "__main__":
    main(sys.argv[1:])
Here is the shell script the above script calls:
$ cat mock-phanton.sh
#!/bin/sh
appid=$1
if [ -z $appid ]
then
echo "oh man you didnt do it right ..."
exit 0
fi
echo "APP_ID=$appid"
When I run the script I get this output:
$ ./cd-and-run.py
A
P
P
_
I
D
=
t
e
s
t
-
o
n
e
-
t
w
o
app is unknown
What I don't understand is why each character is printed on a separate line, instead of just:
APP_ID=test-one-two
Try changing this:
for line in proc.communicate()[0]:
    print line
to this:
print proc.communicate()[0]
I think you're unintentionally iterating over a string.
EDIT:
As mentioned in this question, you can iterate over proc.communicate()[0].splitlines() for multiple lines of stdout. Perhaps a cleaner way to do it is like this, described in the same question:
for line in proc.stdout:
    print line
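For readers on Python 3, the same point can be sketched as follows. Iterating over a string yields single characters, while `splitlines()` yields whole lines. The stand-in command and the helper names `run_and_get_lines` and `parse_app_id` are hypothetical. The parsing step is an assumption about what the question's `aid` variable was meant to hold, since the original never assigns it anything but 'unknown'.

```python
import subprocess
import sys

# Hypothetical stand-in for ./mock-phanton.sh, so the sketch runs anywhere.
STAND_IN_CMD = [sys.executable, "-c", "print('APP_ID=test-one-two')"]


def run_and_get_lines(cmd):
    """Run cmd and return its stdout as a list of lines without newline characters."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    # splitlines() gives lines; iterating result.stdout directly would give characters.
    return result.stdout.splitlines()


def parse_app_id(lines):
    """Return the value after 'APP_ID=' from the first matching line, else 'unknown'."""
    prefix = "APP_ID="
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):]
    return "unknown"
```

Here `parse_app_id(run_and_get_lines(STAND_IN_CMD))` returns the value after the equals sign, which is presumably what the final "app is ..." line was supposed to print.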
I am using zbarimg to scan bar codes, I want to redirect the output to a python script. How can I redirect the output of the following command:
zbarimg code.png
to a python script, and what should be the script like?
I tried the following script:
#!/usr/local/bin/python
s = raw_input()
print s
I made it an executable by issuing the following:
chmod +x in.py
Then I ran the following:
zbarimg code.png | in.py
I know it's wrong but I can't figure out anything else!
Use sys.stdin to read from stdin in your python script. For example:
import sys
data = sys.stdin.readlines()
Using the pipe operator | from the command is correct, actually. Did it not work?
You might need to explicitly specify the path for the python script as in
zbarimg code.png | ./in.py
and, as @dogbane says, reading from stdin with sys.stdin.readlines() is better than using raw_input.
I had to invoke the python program command as
somecommand | python mypythonscript.py instead of somecommand | ./mypythonscript.py. This worked for me. The latter produced errors.
My purpose: Sum up the durations of all mp3 files by piping output of soxi -D *mp3
into python: soxi -D *mp3 | python sum_durations.py
Details:
soxi -D *mp3 produces:
122.473016
139.533016
128.456009
307.802993
...
sum_durations.py script:
import sys
data = sys.stdin.readlines()
#print(data)
total = 0.0  # avoid shadowing the built-in sum()
for line in data:
    #print(line)
    total += float(line)
mins = int(total // 60)
secs = int(total) % 60
# Zero-pad the seconds so that e.g. 11 min 8 s prints as 11:08
print("total duration: {}:{:02d}".format(mins, secs))