python: read data from stdin and raw_input

I want to pass some data to a python script using echo, and after that prompt the user to input options. I am running into an EOFError, which I think happens because I have already read all of sys.stdin. How do I fix this issue? Thanks!
code.py:
import sys

x = ''
for line in sys.stdin:
    x += line
y = raw_input()
usage:
echo -e -n '1324' | ./code.py
error at raw_input():
EOFError: EOF when reading a line

Use:
{ echo -e -n '1324'; cat; } | ./code.py
First, echo will write the literal string to the pipe; then cat will read from the terminal's standard input and copy that to the pipe. The python script will see all of this as its standard input.
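Note that for this to work, code.py has to read a bounded amount first instead of looping until EOF, since for line in sys.stdin would also swallow everything typed into cat. A minimal sketch, assuming the echoed data is a single newline-terminated line (i.e. dropping echo's -n flag):
import sys

x = sys.stdin.readline()    # the line supplied by echo
y = raw_input('option? ')   # the next line, typed by the user and forwarded by cat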

You just cannot send data through stdin (that's redirection) and then get interactive mode back.
When you run a | b, b can no longer read from the terminal's standard input; its stdin is the pipe.
And when a finishes, the pipe simply reaches EOF; it does not mean that b gets hold of the terminal's stdin again.
Maybe you could change the way you want to do things, example:
echo -n -e '1324' | ./code.py
becomes
./code.py '1324' '5678'
and use sys.argv[] to get the values of 1324, 5678, ...
import sys

x = ''
for line in sys.argv[1:]:
    x += line + "\n"
y = raw_input()
If you have a lot of lines to pass, give the script an argument naming a file, and read the lines from that file:
import sys

x = ''
for line in open(sys.argv[1], "r"):
    x += line
y = raw_input()


Use of subprocess with Linux pipe command

I want to run the following command from a Python script:
strings <FILE NAME> | grep "Version = <VERSION STRING>" > /dev/null
I need to save the command's return code and output for the script logic that follows.
Currently I use this code:
import subprocess

strings_out = subprocess.Popen(('strings', file), stdout=subprocess.PIPE)
grep_output = subprocess.check_output(('grep', "Version = " + version_string), stdin=strings_out.stdout)
strings_out.wait()
I get this error:
subprocess.CalledProcessError: Command '('grep', 'Version = <VERSION STRING>')' returned non-zero exit status 1
My assumption is that check_output ran out of memory.
What is wrong with my use of subprocess?
A non-zero exit status from check_output means that the command itself failed; I don't think you ran out of memory.
On testing myself, I found that if I gave grep a string that exists within a file, I got proper output with your code (I'm not using Version because I don't know what input files you have, but otherwise things are just about the same). I do, however, get the same error you get if I grep for a string that doesn't exist.
Maybe you are running it on a file for which strings doesn't output any instance of "Version = " + version_string. If you are in a loop, it would only take one file without the proper string to get the error.
On another note, if you plan on finishing this line, strings <FILE NAME> | grep "Version = <VERSION STRING>" > /dev/null, with subprocess, you'll be piping the output to /dev/null, so you won't see the output of grep.
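If you want to keep check_output but treat grep's exit status 1 ("no lines matched") as a result rather than a failure, one sketch is to catch subprocess.CalledProcessError, which carries both the status and the captured output:
import subprocess

strings_out = subprocess.Popen(('strings', file), stdout=subprocess.PIPE)
try:
    grep_output = subprocess.check_output(
        ('grep', "Version = " + version_string), stdin=strings_out.stdout)
    returncode = 0
except subprocess.CalledProcessError as e:
    grep_output = e.output      # empty when grep found nothing
    returncode = e.returncode   # 1 means "no match", >1 means a real error
strings_out.stdout.close()      # let strings receive SIGPIPE if grep exited early
strings_out.wait()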
As @samsonjm has mentioned, every successfully run command exits with status 0, which implies that the grep command itself failed. Moreover, there is no indication of an out-of-memory error.
I suspect that the input file to the strings command is large, and hence it could be taking more time to return its result. Therefore, I suspect the strings_out.wait() call should come immediately after the first line above, before feeding the output to the grep command. It is reasonable to think this way, as subprocess executes commands in a child process that might keep running until completion.
strings_out = subprocess.Popen(('strings', file), stdout=subprocess.PIPE)
strings_out.wait()
grep_output = subprocess.check_output(('grep', "Version = " + version_string), stdin=strings_out.stdout)
That's neat, I've never thought to use subprocess stdin/stdout like that before. However, my advice would be to either go pure Python and write a method to search for the string in a file, or get a little fancier with your subprocess line.
Python might look something like:
import os

search_term = bytes("Version = " + version_string, encoding='utf-8')
i = 0
found = False
file_size = os.stat(file_name).st_size
chunk_size = len(search_term) * 10
with open(file_name, 'rb') as f:
    while f.tell() < file_size:
        x = f.read(chunk_size)  # read a small amount of data
        if search_term in x:
            found = True
            break
        i += chunk_size - len(search_term)  # step back a little so we don't miss search_term across a chunk boundary
        f.seek(i)
For subprocess:
import subprocess

cmd = f'strings {file_name} | grep "Version = {version_string}"'
test = subprocess.run(cmd, shell=True, capture_output=True)
print(test.returncode)
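As a usage note for the subprocess sketch above: grep exits 0 on a match and 1 on no match, so returncode distinguishes "not found" from a genuine error:
if test.returncode == 0:
    print("match:", test.stdout.decode())
elif test.returncode == 1:
    print("no match")
else:
    print("grep failed:", test.stderr.decode())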

Send parameters to python from bash

I have a bash script that calls a python script with parameters.
In the bash script, I'm reading a file that contains one row of double-quoted parameters, and then I call the python script with the line I read.
My problem is that Python receives the parameters split on spaces.
The line looks like this: "param_a" "Param B" "Param C"
Code Example:
Bash Script:
LINE=`cat $tmp_file`
id=`python /full_path/script.py $LINE`
Python Script:
import sys

print sys.argv[1]
print sys.argv[2]
print sys.argv[3]
Received output:
"param_a"
"Param
B"
Wanted output:
param_a
Param B
Param C
How can I send the parameters to the Python script the way I need?
Thanks!
What about
id=`python /full_path/script.py $tmp_file`
and
import sys

for line in open(sys.argv[1]):
    print(line)
?
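If you do pass the file name through like this, the standard shlex module can split the quoted line for you; a minimal sketch, assuming the file holds a single line like the one in the question:
import shlex
import sys

with open(sys.argv[1]) as f:
    params = shlex.split(f.read())  # splits on whitespace, honouring quotes
print(params)  # ['param_a', 'Param B', 'Param C']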
The issue is in how bash passes the arguments; Python has nothing to do with it.
So you have to solve all of this before it reaches Python. I decided to use awk and xargs for this (but xargs is the actual MVP here).
LINE=$(cat "$tmp_file")
awk -v ORS="\0" -v FPAT='"[^"]+"' '{for (i=1;i<=NF;i++){print substr($i,2,length($i)-2)}}' <<<"$LINE" |
xargs -0 python ./script.py
First, $(...) is preferred over backticks because it is more readable. You are assigning a variable, after all.
awk only reads from stdin or a file, but you can force it to read from a variable with <<<, also called a "here string".
With awk I loop over all fields (as defined by the regex in the FPAT variable) and print them without the surrounding "".
The output record separator I chose is the NUL character (-v ORS='\0'); xargs will split on this character.
xargs then parses the piped input by separating the arguments on NUL characters (set with -0) and executes the given command with the parsed arguments.
Note: while awk is found on most UNIX systems, I make use of FPAT, which is a GNU awk extension, so GNU awk may not be your default (for example on Ubuntu), but it is usually just an install of gawk away.
Also, the following would be a quick and easy solution, but it is generally considered unsafe, since eval will execute everything it receives; for example, if the file contained "$(rm -rf ~)", eval would actually run that command substitution.
eval "python ./script.py $LINE"
This can be done using bash arrays:
tmp_file='gash.txt'
# Set IFS to " which splits on double quotes and removes them
# Using read is preferable to using the external program cat
# read -a reads into the array called "line"
# UPPERCASE variable names are discouraged because of collisions with bash variables
IFS=\" read -ra line < "$tmp_file"
# That leaves blank and space elements in "line",
# we create a new array called "params" without those elements
declare -a params
for ((i=0; i < ${#line[@]}; i++))
do
    p="${line[i]}"
    if [[ -n "$p" && "$p" != " " ]]
    then
        params+=("$p")
    fi
done
# `backticks` are frowned upon because of poor readability
# I've called the python script "gash.py"
id=$(python ./gash.py "${params[@]}")
echo "$id"
gash.py:
import sys
print "1",sys.argv[1]
print "2",sys.argv[2]
print "3",sys.argv[3]
Gives:
1 param_a
2 Param B
3 Param C

Eagerly return lines from stdin Python

I'm making a script which has some other script's output piped into it. The other script takes a while to complete and prints its progress to the console along with the data I want to parse.
Since I'm piping the result to my script, I want to be able to do two things: as input comes in, echo it to the screen; and after the command completes, have a list of the lines that were passed via stdin.
My first thought was to use a simple
import sys

lines = []
for line in sys.stdin:
    sys.stdout.write(line)
    lines.append(line)
    sys.stdout.flush()
but to my surprise, the loop waits until stdin hits EOF before it starts yielding lines.
My current workaround is this:
line = sys.stdin.readline()
lines = []
while line:
    sys.stdout.write(line.strip() + '\n')
    lines.append(line.strip())
    sys.stdout.flush()
    line = sys.stdin.readline()
But this does not always wait until the whole input is used.
Is there any other way to do this? It seems strange that the for solution behaves the way it does.
edited to answer your question regarding exiting on end of input
The workaround you describe, or something similar like the code below, appears to be necessary:
#!/usr/bin/env python
import sys

lines = []
while True:
    line = sys.stdin.readline()
    if not line:
        break
    line = line.rstrip()
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()
This is explained in the python man page, under the -u option:
-u     Force stdin, stdout and stderr to be totally unbuffered. On
       systems where it matters, also put stdin, stdout and stderr in
       binary mode. Note that there is internal buffering in
       xreadlines(), readlines() and file-object iterators ("for line
       in sys.stdin") which is not influenced by this option. To work
       around this, you will want to use "sys.stdin.readline()" inside
       a "while 1:" loop.
I created a file dummy.py containing the code above, then ran this:
for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
This is the output:
harold_mac:~ harold$ for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
1
2
3
4
5
harold_mac:~ harold$
Python uses buffered input. If you check with python --help you see:
-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
So try the unbuffered option with:
command | python -u your_script.py
Other people have already told you about unbuffered output. I will just add a couple of thoughts:
often it is better to print debug info to stderr, and stderr output is usually unbuffered;
it is simpler to delegate intermediate output to special tools. For example, there is a tee utility that lets you split the stdout of a previous command. Assuming you are in bash, you can print the intermediate output to stdout right away, and use process substitution instead of printing to a file (here awk stands in for your python script):
$ python -c 'for i in range(5): print i+1' | tee >( awk '{print "from awk", $0**2 }')
1
2
3
4
5
from awk 1
from awk 4
from awk 9
from awk 16
from awk 25
You need to make (1) stdin in your python program and (2) stdout on the other side of the pipe both line buffered. To get this:
1) use stdin = os.fdopen(sys.stdin.fileno(), 'r', 1) in your program;
2) use stdbuf -oL to change the buffering mode of the other program's output:
stdbuf -oL otherprogram | python yourscript.py
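A minimal sketch of yourscript.py under that scheme (assuming Python 2, as elsewhere in this thread, and newline-terminated records):
import os
import sys

stdin = os.fdopen(sys.stdin.fileno(), 'r', 1)  # reopen stdin line buffered
lines = []
for line in iter(stdin.readline, ''):  # readline returns '' at EOF
    sys.stdout.write(line)
    sys.stdout.flush()
    lines.append(line.rstrip('\n'))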

File following program

I am trying to build a python program that follows a log file and checks for certain patterns (much like grep does).
Part of the testing code, test.py, reads stdin:
import fileinput

for line in fileinput.input():
    print line
so if I do this in one terminal
tail -f log.txt | python test.py
In another terminal
echo "hello" >> log.txt
you would expect hello to be printed on the first terminal, but it isn't. How do I change the code? I also want to use it like this:
cat log.txt | python test.py
with the same test.py.
Echoing sys.stdin directly seems to work on my Mac OS laptop:
import sys

for line in sys.stdin:
    print line.rstrip()
But interestingly, this didn't work very well on my Linux box. It would print the output from tail -f eventually, but the buffering was definitely making it appear as though the program was not working (it would print out fairly large chunks after several seconds of waiting).
Instead I got more responsive behavior by reading from sys.stdin one byte at a time:
import sys

buf = ''
while True:
    c = sys.stdin.read(1)
    if not c:  # EOF, e.g. when used as: cat log.txt | python test.py
        break
    buf += c
    if buf.endswith('\n'):
        print buf[:-1]
        buf = ''

Pythonic way to send contents of a file to a pipe and count # lines in a single step

Given the >4 GB file myfile.gz, I need to zcat it into a pipe for consumption by Teradata's fastload. I also need to count the number of lines in the file. Ideally, I only want to make a single pass through the file. I use awk to output the entire line ($0) to stdout and, using awk's END clause, write the number of rows (awk's NR variable) to another file (outfile).
I've managed to do this using awk but I'd like to know if a more pythonic way exists.
#!/usr/bin/env python
from subprocess import Popen, PIPE
from os import path
the_file = "/path/to/file/myfile.gz"
outfile = "/tmp/%s.count" % path.basename(the_file)
cmd = ["-c",'zcat %s | awk \'{print $0} END {print NR > "%s"} \' ' % (the_file, outfile)]
zcat_proc = Popen(cmd, stdout = PIPE, shell=True)
The pipe is later consumed by a call to teradata's fastload, which reads from
"/dev/fd/" + str(zcat_proc.stdout.fileno())
This works, but I'd like to know if it's possible to skip awk and take better advantage of Python. I'm also open to other methods. I have multiple large files that I need to process in this manner.
There's no need for either zcat or awk. Counting the lines in a gzipped file can be done with
import gzip
nlines = sum(1 for ln in gzip.open("/path/to/file/myfile.gz"))
If you want to do something else with the lines, such as pass them to a different process, do
nlines = 0
for ln in gzip.open("/path/to/file/myfile.gz"):
    nlines += 1
    # pass the line to the other process
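For instance, here is a sketch that counts the lines and streams them to a consumer's stdin in a single pass (cat is a stand-in; substitute the real consumer and its arguments):
import gzip
import subprocess

proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE)  # stand-in consumer
nlines = 0
for ln in gzip.open("/path/to/file/myfile.gz"):
    nlines += 1
    proc.stdin.write(ln)
proc.stdin.close()   # signal EOF to the consumer
proc.wait()
print "Number of lines", nlines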
Counting lines and unzipping gzip-compressed files can be easily done with Python and its standard library. You can do everything in a single pass:
import gzip, subprocess, os

fifo_path = "path/to/fastload-fifo"
os.mkfifo(fifo_path)
# start the consumer first: opening a FIFO for writing blocks until a reader opens it
fastload = subprocess.Popen(["fastload", "--read-from", fifo_path])
fastload_fifo = open(fifo_path, "w")
with gzip.open("/path/to/file/myfile.gz") as f:
    for i, line in enumerate(f):
        fastload_fifo.write(line)
fastload_fifo.close()
fastload.wait()
print "Number of lines", i + 1
os.unlink(fifo_path)
I don't know how to invoke Fastload -- substitute the correct parameters in the invocation.
This can be done in one simple line of bash:
zcat myfile.gz | tee >(wc -l >&2) | fastload
This will print the line count on stderr. If you want it somewhere else you can redirect the wc output however you like.
Actually, it should not be possible to pipe the data to Fastload at all, so it would be great if somebody could post an exact example here.
From the Teradata documentation on the Fastload configuration (http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/Load_and_Unload_Utilities/B035_2411_071A/2411Ch03.026.028.html#ww1938556):
FILE=filename
Keyword phrase specifying the name of the data source that contains the input data. fileid must refer to a regular file. Specifically, pipes are not supported.
