Identifying end of stream while reading from stdin - python

I am reading input from sys.stdin in Python and I need to perform some extra operations when the last line is encountered. How can I tell whether the line currently being processed is the last one?
for line in sys.stdin:
    if <line is last line>:
        # do some extra operation
    else:
        # rest of stuff

The only way to know that you're at the end of the stream is to try to read from it and find there's nothing there. Any logic placed after the for-loop therefore runs in the end-of-stream case.
If you need to detect end-of-stream in the input stream, before you've finished with the previous record, then your logic can't use a "for"-loop. Instead, you must use "while." You must pre-read the first record, then, "while" the latest-thing-read isn't empty, you must read the next record and then process the current one. Only in this way can you know, before processing the current record, that there will be no records following it.
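The pre-read approach described above can be sketched roughly as follows. This is only an illustration: mark_last is a hypothetical helper name, and io.StringIO stands in for sys.stdin so the example can be run anywhere.

```python
import io

def mark_last(stream):
    """Yield (line, is_last) pairs by always reading one line ahead."""
    current = stream.readline()        # pre-read the first record
    while current:                     # '' means the stream is exhausted
        upcoming = stream.readline()   # look ahead before handing out `current`
        yield current, upcoming == ""
        current = upcoming

if __name__ == "__main__":
    demo = io.StringIO("a\nb\nc\n")    # stand-in for sys.stdin
    for line, is_last in mark_last(demo):
        print(line.rstrip("\n"), "(last)" if is_last else "")
```

With real input you would pass sys.stdin instead of the StringIO object. Note that readline() is used throughout, rather than mixing it with iteration over the stream.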

Before starting the loop, read the first line of the input. Then, in the loop, always process the line previously read. After the loop terminates, you'll still have the last line from sys.stdin in line_prev for your special processing.
import sys
line_prev = sys.stdin.readline()
for line in sys.stdin:
    rest_of_stuff(line_prev)
    line_prev = line
# line_prev now holds the last line, even if the input had only one line
do_some_extra_operation(line_prev)

Why don't you try this:
for a in iter(raw_input, ""):
    pass  # do something with a
The loop ends when the input equals the sentinel (the second argument to iter), which here is an empty line. raw_input is the Python 2 name; in Python 3 the function is called input. You can keep a reference to the last line as well, such as:
last_line = None
for a in iter(raw_input, ""):
    if a == last_line:
        pass  # do stuff
    last_line = a
    # Do more stuff
For your understanding, input in Python is read from sys.stdin by default, so functions such as raw_input read from sys.stdin. Think of it like this:
def raw_input():
    return sys.stdin.readline().rstrip("\n")
It's not exactly like that (the real function raises EOFError at end of stream, for example), but it's similar to that concept.
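A related sketch, assuming Python 3: passing the stream's own readline method to the two-argument form of iter makes the empty string a true end-of-stream sentinel, because a blank input line arrives as "\n" rather than "". The name read_until_eof is hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io

def read_until_eof(stream):
    """Collect lines until readline() returns '' (the real end of stream)."""
    lines = []
    for line in iter(stream.readline, ""):
        lines.append(line)   # blank lines come through as "\n", not ""
    return lines

if __name__ == "__main__":
    demo = io.StringIO("x\n\ny\n")   # note the blank line in the middle
    print(read_until_eof(demo))
```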

Related

What is a Pythonic way to detect that the next read will produce an EOF in Python 3 (and Python 2)

Currently, I am using
def eofapproached(f):
    pos = f.tell()
    near = f.read(1) == ''
    f.seek(pos)
    return near
to detect if a file open in 'r' mode (the default) is "at EOF" in the sense that the next read would produce the EOF condition.
I might use it like so:
f = open('filename.ext') # default 'r' mode
print(eofapproached(f))
FYI, I am working with some existing code that stops when EOF occurs, and I want my code to do some action just before that happens.
I am also interested in any suggestions for a better (e.g., more concise) function name. I thought of eofnear, but that does not necessarily convey as specific a meaning.
Currently, I use Python 3, but I may be forced to use Python 2 (part of a legacy system) in the future.
You can use f.tell() to find out your current position in the file.
The problem is that you also need to find out how big the file is.
The naive (and efficient) solution is to compare os.path.getsize(filepath) with the result of tell(), but getsize returns the size in bytes, which is only relevant if you are reading in binary mode ('rb'), as your file may have multi-byte characters.
Your best solution is to seek to the end and back to find out the size.
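def char_count(f):
    current = f.tell()
    f.seek(0, 2)  # jump to the end of the file
    end = f.tell()
    f.seek(current)  # restore the original position
    return end
def chars_left(f, length=None):
    # test against None so that a length of 0 (empty file) is not recomputed
    if length is None:
        length = char_count(f)
    return length - f.tell()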
Preferably, run char_count once at the beginning, and then pass that into chars_left. Seeking isn't efficient, but you need to know how long your file is in characters and the only way is by reading it.
If you are reading line by line, and want to know before reading the last line, you also have to know how long your last line is to see if you are at the beginning of the last line.
If you are reading line by line, and only want to know if the next line read will result in an EOF, then when chars_left(f, total) == 0 you know you are there (no more lines left to read).
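For a file opened in binary mode, the byte-size comparison mentioned above can be sketched as follows. This is a minimal illustration under the assumption that the file is a regular on-disk file opened with 'rb'; bytes_left is a hypothetical name, and os.fstat is used instead of os.path.getsize so that only the open file object is needed.

```python
import os

def bytes_left(f):
    """Bytes between the current position and the end of a binary-mode file."""
    return os.fstat(f.fileno()).st_size - f.tell()

def next_read_hits_eof(f):
    """True when nothing remains to be read from the file."""
    return bytes_left(f) == 0
```

This does not carry over to text mode, where tell() returns an opaque position rather than a character count.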
I've formulated this code to avoid the use of tell (perhaps using tell is simpler):
import os
class NearEOFException(Exception): pass
def tellMe_before_EOF(filePath, chunk_size):
    fileSize = os.path.getsize(filePath)
    chunks_num = (fileSize // chunk_size) # how many chunks can we read from file?
    reads = 0 # how many chunks we read so far
    f = open(filePath)
    if chunks_num == 0:
        raise NearEOFException("File is near EOF")
    for i in range(chunks_num-1):
        yield f.read(chunk_size)
    else:
        raise NearEOFException("File is near EOF")
if __name__ == "__main__":
    g = tellMe_before_EOF("xyz", 3) # read in chunks of 3 chars
    while True:
        print(next(g), end='') # near EOF raise NearEOFException
The naming of the function is disputed. It's boring to name things, I'm just not good at that.
The function works like this: take the size of the file, work out approximately how many N-sized chunks we can read from it, and store that in chunks_num. This simple division gets us near EOF. The question is where you think "near EOF" is: near the last character, for example, or near the last n characters? Maybe that's something to keep in mind if it matters.
Trace through this code to see how it works.

How to read most recent line from stdin in python

Is there a way to read only the current data from stdin?
I would like to pipe some never-ending input data (from a mouse-like device) into a Python script and grab only the most recent line of data.
The input x,y data looks like this and arrives at 600 lines per second:
0.123,0.123
0.244,0.566
etc.
So far I have tried something like this:
import sys, time
while 1:
    data = sys.stdin.readline()
    my_slow_function(data)
Python seems to buffer the data so nothing is skipped. I would like to skip everything except the current line.
Just spin up a separate thread to read stdin into a global variable. Make it a daemon thread so that you don't have to close it later on. The thread reads the data as it arrives and keeps discarding the old stuff. Have your regular program read last_line when it wants to.
I added an event so that the regular program can wait when no new data is available. If that's not what you want, take it out.
import sys
import threading
last_line = ''
new_line_event = threading.Event()
def keep_last_line():
    global last_line, new_line_event
    for line in sys.stdin:
        last_line = line
        new_line_event.set()
keep_last_line_thread = threading.Thread(target=keep_last_line)
keep_last_line_thread.daemon = True
keep_last_line_thread.start()
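The consuming side is not shown above. One possible way to package the same idea, with the reader thread and the waiting logic kept together, is sketched below. This is a variation, not the answer's exact code: the class name LatestLine and its take method are hypothetical, a lock is added so the line and the event change together, and io.StringIO stands in for sys.stdin.

```python
import io
import threading

class LatestLine:
    """Keep only the newest line of a stream, read on a daemon thread."""

    def __init__(self, stream):
        self._lock = threading.Lock()
        self._line = None
        self.new_line_event = threading.Event()
        self.thread = threading.Thread(target=self._pump, args=(stream,), daemon=True)
        self.thread.start()

    def _pump(self, stream):
        for line in stream:
            with self._lock:
                self._line = line            # overwrite: older lines are dropped
                self.new_line_event.set()

    def take(self, timeout=None):
        """Wait for fresh data; return the newest line, or None on timeout."""
        if not self.new_line_event.wait(timeout):
            return None
        with self._lock:
            self.new_line_event.clear()
            return self._line

if __name__ == "__main__":
    reader = LatestLine(io.StringIO("0.123,0.123\n0.244,0.566\n"))
    reader.thread.join()                     # the demo stream is finite
    print(reader.take(timeout=1))
```

In the real program you would construct LatestLine(sys.stdin) and call take() each time the slow function is ready for more data.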
Keep the current line, and only act on the last line.
buffer = None
for line in sys.stdin:
    buffer = line
my_slow_function(buffer)

How to read line by line from stdin in python

Everyone knows how to count the characters from STDIN in C. However, when I tried to do that in Python 3, I found it puzzling. (counter.py)
import sys
chrCounter = 0
for line in sys.stdin.readline():
    chrCounter += len(line)
print(chrCounter)
Then I tested the program with
python3 counter.py < counter.py
The answer is only the length of the first line, "import sys". In fact, the program ONLY reads the first line from standard input and discards the rest.
It works if I replace sys.stdin.readline() with sys.stdin.read():
import sys
print(len(sys.stdin.read()))
However, that program is obviously NOT suitable for large input. Please give me an elegant solution. Thank you!
It's simpler:
for line in sys.stdin:
    chrCounter += len(line)
The file-like object sys.stdin is automatically iterated over line by line; if you call .readline() on it, you only read the first line (and iterate over that character by character); if you call read(), then you'll read the entire input into a single string and iterate over that character by character.
The answer from Tim Pietzcker is IMHO the correct one. There are 2 similar ways of doing this. Using:
for line in sys.stdin:
and
for line in sys.stdin.readlines():
The second option is closer to your original code. The difference between these two options is made clear by using e.g. the following modification of the for-loop body and using keyboard for input:
for line in sys.stdin.readlines():
    line_len = len(line)
    print('Last line was', line_len, 'chars long.')
    chrCounter += len(line)
If you use the first option (for line in sys.stdin:), then the lines are processed right after you hit enter.
If you use the second option (for line in sys.stdin.readlines():), then the whole file is first read, split into lines and only then they are processed.
If I just wanted a character count, I'd read in blocks at a time instead of lines at a time:
# 4096 chosen arbitrarily. Pick any other number you want to use.
print(sum(iter(lambda:len(sys.stdin.read(4096)), 0)))
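The same block-at-a-time idea, written out as an explicit loop, might look like the sketch below. The function name count_chars and the chunk_size parameter are hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io

def count_chars(stream, chunk_size=4096):
    """Count characters without holding the whole input in memory."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:            # '' signals end of stream
            break
        total += len(chunk)
    return total

if __name__ == "__main__":
    print(count_chars(io.StringIO("import sys\nprint('hi')\n")))
```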

How to idle file-processing program until new data arrives in the file

I have a text file that is being written by another program every 10 seconds.
My code goes through this file and parses the data I want, but at some point the for loop reaches the end of the file and the program closes.
GOAL: I want the program to wait inside the for loop for more data to come, so that it parses the new data too.
I tried using a while loop with a condition on the number of lines left to be read, but for some reason the program just stops a little after exiting the while loop. If I add, say, 25 lines, it processes 9 of them, and then the program exits the for loop and finishes (it does not crash).
QUESTION: Is there a better way to idle the program until new data arrives? What is wrong in this code?
k = -1
with open('epideiksh.txt') as weather_file:
    for line in weather_file:
        k = k+1
        lines_left = count_lines_of('epideiksh.txt') - k
        while ( lines_left <= 10 ):
            print("waiting for more data")
            time.sleep(10)
            pointer = count_lines('epideiksh.txt') - k
        if line.startswith('Heat Index'):
            do_my_thing()
            time.sleep(10)
The simplest, but slightly error-prone, way of simulating tail is:
import io
import time
with open("filename") as input:
    while True:
        for line in input:
            if interesting(line):
                do_something_with(line)
        time.sleep(a_little)
        input.seek(0, io.SEEK_CUR)
In my very limited testing, that seemed to work without the seek. But it shouldn't, since normally you have to do something like that in order to clear the eof flag. One thing to keep in mind is that tell() cannot be used on a (text) file while it is being iterated, and seeking from SEEK_CUR invokes tell(). So in the above code snippet, you could not break out of the for loop and fall into the input.seek() call.
The problem with the above is that it is possible that the readline (implicit in the iterator) will only read part of the line currently being written. So you need to be prepared to abandon and reread partial lines:
with open("filename") as input:
    # where is the end of the last complete line read
    where = input.tell()
    # use readline explicitly because next() and tell() are incompatible
    while True:
        line = input.readline()
        if not line or line[-1] != '\n':
            time.sleep(a_little)
            input.seek(where)
        else:
            where = input.tell()
            if interesting(line):
                do_something_with(line)
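The partial-line handling above can be wrapped in a generator, sketched here under a few assumptions: follow is a hypothetical name, poll_interval replaces a_little, and max_idle_polls is an invented parameter that lets the loop stop after several polls with no new complete line (leave it as None to wait forever, like tail -f).

```python
import time

def follow(path, poll_interval=0.1, max_idle_polls=None):
    """Yield complete lines from a text file that another process appends to."""
    idle = 0
    with open(path) as f:
        where = f.tell()                  # end of the last complete line read
        while max_idle_polls is None or idle < max_idle_polls:
            line = f.readline()
            if line.endswith("\n"):
                where = f.tell()
                idle = 0
                yield line
            else:
                # nothing new, or only part of a line: rewind and retry later
                f.seek(where)
                idle += 1
                time.sleep(poll_interval)
```

A caller would then write something like: for line in follow("filename"): and apply the interesting/do_something_with logic inside the loop body.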

Python - Readline Control-D after non-empty line does not work why?

I am new to Python, and I am sorry if what I am asking seems odd. I want to loop over each line on standard input and write a modified line to standard output immediately. I have code that mostly works, but I do not know how to make it work completely.
I have the following code
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line)
When used interactively, this exits if EOF (Control-D) is entered on a new line. However, if there is text on the line before I type Control-D, I must press it twice before the current line is returned, and then once more before the loop exits.
How do I fix this?
I think my answer from here can be copied directly:
It has to do with what ^D really does: it just stops the current read(2) call.
If the program does int rdbytes = read(fd, buffer, sizeof buffer); and you press ^D in between, read() returns with the bytes read so far in the buffer, returning their number. The same happens on line termination; the \n at the end is always delivered.
So only a ^D at the start of a line or after another ^D has the desired effect of having read() return 0, signaling EOF.
And this behaviour, of course, affects Python code as well.
A strategy suggested in the Python docs is:
for line in sys.stdin:
    sys.stdout.write(line)
See the IO Tutorial.
