How to detect EOF when reading a file with readline() in Python?

I need to read the file line by line with readline() and cannot easily change that. Roughly it is:
with open(file_name, 'r') as i_file:
    while True:
        line = i_file.readline()
        # I need to check that EOF has not been reached, so that readline() really returned something
The real logic is more involved, so I can't read the file at once with readlines() or write something like for line in i_file:.
Is there a way to check readline() for EOF? Does it throw an exception maybe?
It was very hard to find the answer on the internet because the documentation search redirects to something irrelevant (a tutorial rather than the reference, or GNU readline), and the noise on the internet is mostly about the readlines() function.
The solution should work in Python 3.6+.

From the documentation:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
with open(file_name, 'r') as i_file:
    while True:
        line = i_file.readline()
        if not line:
            break
        # do something with line
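To make the documented behavior concrete, here is a minimal sketch that uses io.StringIO as a stand-in for a real file (an assumption made only to keep the example self-contained). It shows that a blank line comes back as '\n', while only the end of the file produces ''.

```python
import io

# io.StringIO stands in for a real text file so the sketch is self-contained.
sample = io.StringIO("first\n\nlast")  # note: no trailing newline

results = []
while True:
    line = sample.readline()
    if line == "":  # empty string: end of file reached
        break
    results.append(line)

print(results)  # ['first\n', '\n', 'last']
```

The blank line in the middle does not end the loop, because it is returned as '\n', which is truthy.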

Using this I suggest:
with open("input") as fp:
    while True:
        nstr = fp.readline()
        if len(nstr) == 0:
            break  # or raise an exception if you want
        # do stuff using nstr
As Barmar mentioned in the comments, readline "returns an empty string at EOF".

The empty string returned at EOF evaluates to False, so this could be a nice use case for the walrus operator (note that := requires Python 3.8 or later, so it does not cover 3.6 and 3.7):
with open(file_name, 'r') as i_file:
    while line := i_file.readline():
        # do something with line
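Since the question asks for Python 3.6+, a version-independent alternative may be useful. The sketch below relies on the two-argument form of the built-in iter(), which keeps calling readline() until it returns the sentinel ''. The helper name read_lines and the io.StringIO stand-in are assumptions for illustration only.

```python
import io

def read_lines(stream):
    """Yield lines from stream by calling readline() until it returns ''."""
    # iter(callable, sentinel) calls stream.readline repeatedly and stops
    # as soon as the return value equals the sentinel (the empty string).
    for line in iter(stream.readline, ""):
        yield line

demo = io.StringIO("a\n\nb\n")
collected = list(read_lines(demo))
```

This still reads with readline() under the hood, so it fits code that cannot switch to for line in i_file.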

Related

Unable to read multiline files in python using readline()

The following code is not working properly. It is unable to read multiline files in python using readline().
myobject=open("myfile.txt",'r')
while ((myobject.readline())):
    print(myobject.readline())
myobject.close()
It just prints the first line and then newlines. I don't understand why?
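It's because readline() reads one line per call, and your code calls it twice per iteration: the line read in the while condition is thrown away. Your code also prints extra blank lines because readline() keeps the trailing newline and print() adds another one.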
The way to fix would be to do this:
with open("myfile.txt", 'r') as f:
    for line in f:
        print(line)
readline() returns the line that it is currently pointing to and moves to the next line. So, the calls to the function in the while condition and in the print statement are not the same. In fact, they are pointing to adjacent lines.
First, store the line in a temporary variable, then check and print.
myobject = open('myfile.txt')
while True:
    line = myobject.readline()
    if line:
        print(line)
    else:
        break
When you open the file in 'r' mode, the file object returned points at the beginning of the file.
Every time you call readline(), a line is read and the object then points to the next line in the file.
Since your loop condition also reads a line and advances the file, you only get the lines at even positions, such as lines 2, 4, 6. Lines 1, 3, 5, ... are read by while ((myobject.readline())): and discarded.
A simple solution will be
myobject = open("myfile.txt",'r')
for line in myobject:
    print(line, end='')
myobject.close()
OR for your case, when you want to use only readline()
myobject = open("myfile.txt",'r')
while True:
    x = myobject.readline()
    if len(x) == 0:
        break
    print(x, end='')
myobject.close()
This code works, because readline behaves in the following way.
According to python documentation, https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
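The skipping behavior described in the answers can be reproduced with a small sketch. It uses io.StringIO in place of myfile.txt (an assumption, so the example runs without a file on disk) and compares the buggy double call with the single-call pattern.

```python
import io

text = "line 1\nline 2\nline 3\nline 4\n"

# Buggy pattern: readline() is called in the condition and again in the body,
# so the line consumed by the condition is thrown away.
buggy = io.StringIO(text)
seen_buggy = []
while buggy.readline():
    seen_buggy.append(buggy.readline())

# Fixed pattern: call readline() once per iteration and keep the result.
fixed = io.StringIO(text)
seen_fixed = []
while True:
    line = fixed.readline()
    if not line:
        break
    seen_fixed.append(line)
```

Here seen_buggy ends up holding only lines 2 and 4, while seen_fixed holds all four lines.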

Remove last line from text file [duplicate]

How can one delete the very last line of a file with python?
Input File example:
hello
world
foo
bar
Output File example:
hello
world
foo
I've created the following code to find the number of lines in the file - but I do not know how to delete the specific line number.
try:
    file = open("file")
except IOError:
    print("Failed to read file.")
countLines = len(file.readlines())
Because I routinely work with many-gigabyte files, looping through as mentioned in the answers didn't work for me. The solution I use:
import os
import sys
with open(sys.argv[1], "r+", encoding = "utf-8") as file:
    # Move the pointer (similar to a cursor in a text editor) to the end of the file
    file.seek(0, os.SEEK_END)
    # This code means the following code skips the very last character in the file -
    # i.e. in the case the last line is null we delete the last line
    # and the penultimate one
    pos = file.tell() - 1
    # Read each character in the file one at a time from the penultimate
    # character going backwards, searching for a newline character
    # If we find a new line, exit the search
    while pos > 0 and file.read(1) != "\n":
        pos -= 1
        file.seek(pos, os.SEEK_SET)
    # So long as we're not at the start of the file, delete all the characters ahead
    # of this position
    if pos > 0:
        file.seek(pos, os.SEEK_SET)
        file.truncate()
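In text mode, seeking to arbitrary offsets is only reliable for positions previously returned by tell(), and multi-byte encodings can make byte-by-byte stepping fail. A binary-mode variant of the same idea is sketched below. The function name remove_last_line, the chunk_size parameter, and the temporary demo file are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

def remove_last_line(path, chunk_size=4096):
    """Truncate the file at `path` so that its last line is removed."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return
        # Ignore one trailing newline so it is not mistaken for the line
        # break in front of the last line.
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1
        pos = end
        # Scan backwards in chunks, looking for the previous newline.
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                # Keep everything up to and including that newline.
                f.truncate(start + idx + 1)
                return
            pos = start
        # No earlier newline found: the file held a single line.
        f.truncate(0)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as out:
        out.write(b"hello\nworld\nfoo\nbar\n")
    remove_last_line(demo_path)
    with open(demo_path, "rb") as check:
        remaining = check.read()
```

Only the tail of the file is read, so the cost depends on the length of the last line rather than on the size of the file.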
You could use the above code and then:
lines = file.readlines()
lines = lines[:-1]
This would give you a list containing all lines but the last one.
This doesn't use Python, but Python is the wrong tool for the job if this is the only task you want. You can use the *nix utility head (the negative line count is a GNU extension), and run
head -n-1 filename > newfile
which will copy all but the last line of filename to newfile.
Assuming you have to do this in Python and that you have a large enough file that list slicing isn't sufficient, you can do it in a single pass over the file:
last_line = None
for line in file:
    if last_line is not None:
        print(last_line, end='')  # or write to a file, call a function, etc.
    last_line = line
Not the most elegant code in the world, but it gets the job done.
Basically it buffers each line of the file in the last_line variable; each iteration outputs the previous iteration's line.
Here is my solution for Linux users:
import os
file_path = 'test.txt'
os.system('sed -i "$ d" {0}'.format(file_path))
No need to read and iterate through the file in Python.
On systems where file.truncate() works, you could do something like this:
with open('file.txt', 'rb') as file:
    pos = next_pos = 0
    for line in file:
        pos = next_pos  # position of beginning of this line
        next_pos += len(line)  # compute position of beginning of next line
with open('file.txt', 'ab') as file:
    file.truncate(pos)
According to my tests, file.tell() doesn't work when reading by line, presumably due to buffering confusing it. That's why this adds up the lengths of the lines to figure out positions. Note that this only works on systems where the line delimiter ends with '\n'.
Here's a more general memory-efficient solution allowing the last 'n' lines to be skipped (like the head command):
import collections, fileinput
def head(filename, lines_to_delete=1):
    queue = collections.deque()
    lines_to_delete = max(0, lines_to_delete)
    for line in fileinput.input(filename, inplace=True, backup='.bak'):
        queue.append(line)
        if lines_to_delete == 0:
            print(queue.popleft(), end='')
        else:
            lines_to_delete -= 1
    queue.clear()
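The buffering idea behind this answer can be separated from the in-place file rewriting. The sketch below is a generator that holds back the last n items of any iterable of lines. The name skip_last and the io.StringIO input are assumptions for illustration.

```python
import collections
import io

def skip_last(lines, n=1):
    """Yield every item of `lines` except the final `n` items."""
    n = max(0, n)
    buffer = collections.deque()
    for line in lines:
        buffer.append(line)
        # Once more than n lines are buffered, the oldest one cannot be
        # among the last n, so it is safe to emit.
        if len(buffer) > n:
            yield buffer.popleft()

source = io.StringIO("hello\nworld\nfoo\nbar\n")
kept = list(skip_last(source, n=1))
```

At most n + 1 lines are held in memory at any time, so this also works for files that are too large for readlines().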
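Inspired by previous posts, I propose this (the file is opened in binary mode because Python 3 text files do not support seeking relative to the current position):
import os
with open('file_name', 'rb+') as f:
    f.seek(0, os.SEEK_END)
    while f.tell() and f.read(1) != b'\n':
        f.seek(-2, os.SEEK_CUR)
    f.truncate()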
Though I have not tested it (please, no hate for that), I believe that there's a faster way of doing it. It's more of a C solution, but quite possible in Python. It's not Pythonic, either. It's a theory, I'd say.
First, you need to know the encoding of the file. Set a variable, say CHARsize, to the number of bytes a character in that encoding uses (1 byte for an ASCII file).
Then grab the size of the file, set FILEsize to it.
Assume you have the address of the file (in memory) in FILEadd.
Add FILEsize to FILEadd.
Move backwards (increment by -1 * CHARsize), testing each CHARsize bytes for a \n (or whatever newline your system uses). When you reach the first \n, you have the position of the beginning of the last line of the file. Replace \n with \x1a (26, the ASCII for EOF, or whatever that is on your system/with the encoding).
Clean up however you need to (change the filesize, touch the file).
If this works as I suspect it would, you're going to save a lot of time, as you don't need to read through the whole file from the beginning. You read from the end instead.
Here's another way, without slurping the whole file into memory:
p = None
f = open("file")
for line in f:
    line = line.strip()
    if p is not None:
        print(p)
    p = line
f.close()

Checking if a text file has another line Python

I'm working on a script to parse text files into a spreadsheet for myself, and in doing so I need to read through them. The issue is finding out when to stop. Java has a method called hasNext() or hasNextLine() that is available when reading. I was wondering if there is something like that in Python? For some reason I can't find this anywhere.
Ex:
with open(f) as file:
    file.readline()
    nextLine = True
    while nextLine:
        file.readline()
        # Do stuff
        if not file.hasNextLine():  # hypothetical method, this is what I'm looking for
            nextLine = False
Just use a for loop to iterate over the file object:
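for line in file:
    #do stuff..
Note that this includes the newline character (\n) at the end of each line string. It can be removed with rstrip('\n'), which is safer than slicing with [:-1] because the last line may not end in a newline, through either:
for line in file:
    line = line.rstrip('\n')
    #do stuff...
or:
for line in (l.rstrip('\n') for l in file):
    #do stuff...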
You can only check if the file has another line by reading it (although you can check if you are at the end of the file without any reading, by comparing file.tell() against the end-of-file position).
This can be done through calling file.readline and checking if the string is not empty or timgeb's method of calling next and catching the StopIteration exception.
So to answer your question exactly, you can check whether a file has another line through:
next_line = file.readline()
if next_line:
    #has next line, do whatever...
or, without modifying the current file pointer:
def has_another_line(file):
    cur_pos = file.tell()
    does_it = bool(file.readline())
    file.seek(cur_pos)
    return does_it
which restores the file pointer, putting the file object back in its original state.
e.g.
$ printf "hello\nthere\nwhat\nis\nup\n" > f.txt
$ python -q
>>> f = open('f.txt')
>>> def has_another_line(file):
...     cur_pos = file.tell()
...     does_it = bool(file.readline())
...     file.seek(cur_pos)
...     return does_it
...
>>> has_another_line(f)
True
>>> f.readline()
'hello\n'
The typical cadence that I use for reading text files is this:
with open('myfile.txt', 'r') as myfile:
    lines = myfile.readlines()
    for line in lines:
        if 'this' in line: #Your criteria here to skip lines
            continue
        #Do something here
Using with will only keep the file open until you have executed all of the code within its block, then the file will be closed. I also think it's valuable to highlight the readlines() method here, which reads all lines in the file and stores them in a list. In terms of handling newline (\n) characters, I would point you to @Joe Iddon's answer.
Python doesn't have an end-of-file (EOF) indicator, but you could get the same effect this way:
with open(f) as file:
    file.seek(0, 2)  # go to end of file
    eof = file.tell()  # get end-of-file position
    file.seek(0, 0)  # go back to start of file
    file.readline()
    nextLine = True  # maybe nextLine = (file.tell() != eof)
    while nextLine:
        file.readline()
        # Do stuff
        if file.tell() == eof:
            nextLine = False
But as others have pointed out, you may do better by treating the file as an iterable, like this:
with open(f) as file:
    next_line = next(file, '')
    # the loop will terminate when next_line is '',
    # i.e., after failing to read another line at end of file
    while next_line:
        # Do stuff
        next_line = next(file, '')
Files are iterators over lines. If all you want to do is check whether a file has a line left, you can issue line = next(file) and catch the StopIteration raised in case there isn't another line. Alternatively you can use line = next(file, default) with a non-string default value (e.g. None) and then check against that.
Note that in most cases, you know that you are done when the for loop over the file ends, as the other answers have explained. So make sure you actually need that kind of fine-grained control with next.
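Both variants described above can be sketched as follows. The helper name drain and the io.StringIO input are assumptions for illustration. None is used as the default because no real line can ever be None.

```python
import io

def drain(stream):
    """Collect all lines using next() with a sentinel default."""
    lines = []
    while True:
        line = next(stream, None)  # returns None instead of raising at EOF
        if line is None:
            break
        lines.append(line)
    return lines

demo_lines = drain(io.StringIO("x\n\ny"))

# Without a default, next() raises StopIteration once the file is exhausted.
exhausted = io.StringIO("")
try:
    next(exhausted)
    raised = False
except StopIteration:
    raised = True
```

Unlike a check on readline(), comparing against None does not depend on the truthiness of the returned string.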
with open(filepath, 'rt+') as f:
    for line in f.readlines():
        #code to process each line
Opening it this way also closes it when it's finished, which is much better for the overall memory usage, though that might not matter depending on the file size.
The first line is comparable to:
f = open(....)
f.readlines() gives you a list of all lines in the file.
The loop will start at the first line and end at the last line, and shouldn't throw any errors regarding EOF, for example.
[Edit]
Notice the 'rt+' in the open call. This opens the file for reading and writing in text mode, i.e. no manual decoding is required.

Read a large text file without reading it into RAM at once

I have a large text file of 2 GB or more. Of course I shouldn't use read().
I think readline() may be a way to do it, but I don't know how to stop the loop at the end of the file.
I've tried this:
with open('test', 'r') as f:
    while True:
        try:
            f.readline()
        except:
            break
But when the end of the file is reached, the loop doesn't stop and keeps printing the empty string ('').
End of file is signaled by readline() returning an empty string. Note that an actual empty line, like every complete line returned by readline(), ends with the line separator.
with open('test', 'r') as f:
    while True:
        line = f.readline()
        if line == "":
            break
But then again, a file object in python is already iterable.
with open('test', 'r') as f:
    for line in f:
        print(line.strip())
strip removes whitespace, including the newline, so you don't print double newlines.
And if you don't like it safe, and want the least code possible:
for l in open("text"): print(l.strip())
EDIT: strip removes all kinds of whitespace from both sides. If you actually just want to get rid of the trailing newline, you can use rstrip("\n").
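The difference between the two calls is easy to see on a line that carries meaningful leading whitespace. The sample string below is made up for illustration.

```python
raw = "  indented value \t\n"

stripped = raw.strip()         # removes whitespace on both sides
no_newline = raw.rstrip("\n")  # removes only trailing newline characters
```

Here stripped is "indented value", while no_newline keeps the leading spaces and the trailing tab and drops only the final "\n".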
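You could just use a for statement instead of a while statement. You could do something like
for line in f.readlines():
    print(line)
Might help. Note that readlines() loads the whole file into a list, so for a file this large iterating with for line in f is the better fit.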

Python: read all text file lines in loop

I want to read a huge text file line by line (and stop if a line containing "str" is found).
How can I check whether the end of the file has been reached?
fn = 't.log'
f = open(fn, 'r')
while not _is_eof(f): ## how to check that end is reached?
    s = f.readline()
    print s
    if "str" in s: break
There's no need to check for EOF in python, simply do:
with open('t.ini') as f:
    for line in f:
        # For Python3, use print(line)
        print line
        if 'str' in line:
            break
Why the with statement:
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
Just iterate over each line in the file. Python automatically checks for the End of file and closes the file for you (using the with syntax).
with open('fileName', 'r') as f:
    for line in f:
        if 'str' in line:
            break
There are situations where you can't use the (quite convincing) with... for... structure. In that case, do the following:
while True:
    line = self.fo.readline()
    if len(line) == 0:
        break  # EOF
    if 'str' in line:
        break
This works because readline() leaves the trailing newline character on every line it returns, whereas EOF is just an empty string.
You can stop the 2-line separation in the output by using
with open('t.ini') as f:
    for line in f:
        print line.strip()
        if 'str' in line:
            break
The simplest way to read a file one line at a time is this:
for line in open('fileName'):
    if 'str' in line:
        break
No need for a with-statement or explicit close. Notice there is no variable 'f' that refers to the file. In this case Python assigns the result of open() to a hidden, temporary variable. When the for loop ends (no matter how -- end-of-file, break or exception), the temporary variable goes out of scope and is deleted; its destructor will then close the file.
This works as long as you don't need to explicitly access the file in the loop, i.e., no need for seek, flush, or similar. It should also be noted that this relies on Python using a reference-counting garbage collector (as CPython does), which deletes an object as soon as its reference count goes to zero.
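For comparison, the with form does not depend on the garbage collector at all. The sketch below writes a small temporary file (its name and contents are made up for illustration), breaks out of the loop early, and then checks the closed attribute of the file object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t.log")
    with open(path, "w") as out:
        out.write("alpha\nhas str here\nomega\n")

    seen = []
    with open(path) as f:
        for line in f:
            seen.append(line)
            if "str" in line:
                break  # leaving the loop early is fine
    # The with block has closed the file, regardless of how the loop ended.
    closed_after_with = f.closed
```

The file is closed as soon as the with block exits, on any Python implementation.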
