Python: read all text file lines in loop - python

I want to read a huge text file line by line (and stop as soon as a line containing "str" is found).
How do I check whether the end of the file has been reached?
fn = 't.log'
f = open(fn, 'r')
while not _is_eof(f): ## how to check that end is reached?
    s = f.readline()
    print s
    if "str" in s: break

There's no need to check for EOF in python, simply do:
with open('t.ini') as f:
    for line in f:
        # For Python3, use print(line)
        print line
        if 'str' in line:
            break
Why the with statement:
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
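As a minimal Python 3 sketch of the loop above, wrapped in a function so it can be reused. The function name first_line_containing and the temporary sample file are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def first_line_containing(path, needle):
    """Return the first line containing `needle`, or None if no line matches."""
    with open(path) as f:        # the file is closed when the block exits
        for line in f:           # iteration ends by itself at end of file
            if needle in line:
                return line.rstrip("\n")
    return None


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("alpha\nbeta str gamma\ndelta\n")
    sample_path = tmp.name

found = first_line_containing(sample_path, "str")
missing = first_line_containing(sample_path, "zzz")
os.remove(sample_path)
```

Returning from inside the with block is fine: the file is still closed on the way out.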

Just iterate over the lines of the file. The for loop stops automatically at the end of the file, and the with statement closes the file for you.
with open('fileName', 'r') as f:
    for line in f:
        if 'str' in line:
            break

There are situations where you can't use the (quite convincing) with... for... structure. In that case, do the following:
while True:
    line = self.fo.readline()
    if len(line) == 0:
        break # end of file
    if 'str' in line:
        break
This works because readline() leaves the trailing newline character on every line it returns, whereas EOF is signalled by an empty string.
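To make the difference between a blank line and end of file concrete, here is a small sketch that uses io.StringIO as a stand-in for a real file (that substitution is my assumption, so the example runs without touching the disk):

```python
import io

# In-memory stand-in for a file: the second line is blank,
# and the last line has no trailing newline.
f = io.StringIO("first\n\nlast")

results = []
while True:
    line = f.readline()
    if line == "":       # only happens at end of file
        break
    results.append(line)

after_eof = f.readline()  # further calls keep returning ''
```

The blank line comes back as '\n', which is truthy, so the loop does not stop early on it.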

You can avoid the blank line after every printed line (print adds its own newline to the one already at the end of line) by using
with open('t.ini') as f:
    for line in f:
        print line.strip()
        if 'str' in line:
            break

The simplest way to read a file one line at a time is this:
for line in open('fileName'):
    if 'str' in line:
        break
No need for a with-statement or explicit close. Notice no variable 'f' that refers to the file. In this case python assigns the result of the open() to a hidden, temporary variable. When the for loop ends (no matter how -- end-of-file, break or exception), the temporary variable goes out of scope and is deleted; its destructor will then close the file.
This works as long as you don't need to explicitly access the file in the loop, i.e., no need for seek, flush, or similar. Note also that this relies on CPython's reference counting, which deletes an object as soon as its reference count goes to zero. On other implementations, such as PyPy, the file may stay open until the garbage collector runs.

Related

How to detect EOF when reading a file with readline() in Python?

I need to read the file line by line with readline() and cannot easily change that. Roughly it is:
with open(file_name, 'r') as i_file:
    while True:
        line = i_file.readline()
        # I need to check that EOF has not been reached, so that readline() really returned something
The real logic is more involved, so I can't read the file at once with readlines() or write something like for line in i_file:.
Is there a way to check readline() for EOF? Does it throw an exception maybe?
It was very hard to find the answer on the internet because the documentation search redirects to something non-relevant (a tutorial rather than the reference, or GNU readline), and the noise on the internet is mostly about readlines() function.
The solution should work in Python 3.6+.
From the documentation:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
with open(file_name, 'r') as i_file:
    while True:
        line = i_file.readline()
        if not line:
            break
        # do something with line
Using this I suggest:
fp = open("input")
while True:
    nstr = fp.readline()
    if len(nstr) == 0:
        break # or raise an exception if you want
    # do stuff using nstr
As Barmar mentioned in the comments, readline "returns an empty string at EOF".
Empty strings returned in case of EOF evaluate to False, so this could be a nice use case for the walrus operator (available from Python 3.8 on):
with open(file_name, 'r') as i_file:
    while line := i_file.readline():
        # do something with line
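Since the question asks for Python 3.6+ and the walrus operator needs 3.8, a similar effect can be had with the two-argument form of iter, which keeps calling readline() until it returns the sentinel ''. This is only a sketch; io.StringIO stands in for the real file here.

```python
import io

# In-memory stand-in for the opened file (an assumption for the demo).
i_file = io.StringIO("one\ntwo\n\nfour\n")

lines = []
# iter(callable, sentinel) calls i_file.readline() repeatedly and
# stops as soon as it returns '' (end of file).
for line in iter(i_file.readline, ""):
    lines.append(line)
```

The blank third line is returned as '\n', so it does not end the loop.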

Unable to read multiline files in python using readline()

The following code is not working properly. It is unable to read multiline files in python using readline().
myobject=open("myfile.txt",'r')
while ((myobject.readline())):
    print(myobject.readline())
myobject.close()
It just prints the first line and then newlines. I don't understand why?
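That's because every readline() call reads a new line, so the call in the while condition consumes a line that is never printed. The extra blank lines appear because readline() keeps the trailing newline and print adds another one.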
The way to fix would be to do this:
with open("myfile.txt", 'r') as f:
    for line in f:
        print(line)
readline() returns the line that it is currently pointing to and moves to the next line. So, the calls to the function in the while condition and in the print statement are not the same. In fact, they are pointing to adjacent lines.
First, store the line in a temporary variable, then check and print.
myobject = open('myfile.txt')
while True:
    line = myobject.readline()
    if line:
        print(line)
    else:
        break
When you open the file in 'r' mode, the file object returned points at the beginning of the file.
Every time you call readline, a line is read, and the object now points to the next line in the file.
Since your loop condition also reads a line and advances the file, you only print the lines at even positions, such as lines 2, 4, 6. Lines 1, 3, 5, ... are read by while ((myobject.readline())): and discarded.
A simple solution will be
myobject = open("myfile.txt",'r')
for line in myobject:
    print(line, end='')
myobject.close()
OR for your case, when you want to use only readline()
myobject = open("myfile.txt",'r')
while True:
    x = myobject.readline()
    if len(x) == 0:
        break
    print(x, end='')
myobject.close()
This code works, because readline behaves in the following way.
According to python documentation, https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
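To see which lines the double readline() call actually loses, here is a small sketch that collects the results in a list instead of printing. The helper names buggy and fixed are mine, and io.StringIO replaces myfile.txt so the example is self-contained.

```python
import io


def buggy(f):
    """Mimics the question's loop: readline() is called twice per iteration."""
    printed = []
    while f.readline():               # reads a line and throws it away
        printed.append(f.readline())  # reads the *next* line
    return printed


def fixed(f):
    """Reads each line once, stores it, then tests it for EOF."""
    printed = []
    while True:
        line = f.readline()
        if not line:
            break
        printed.append(line)
    return printed


text = "1\n2\n3\n"
buggy_out = buggy(io.StringIO(text))
fixed_out = fixed(io.StringIO(text))
```

With three lines, the buggy loop keeps only line 2 plus the empty string returned at end of file.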

Checking if a text file has another line Python

I'm working on a script to parse text files into a spreadsheet for myself, and in doing so I need to read through them. The issue is finding out when to stop. Java has a method attached when reading called hasNext() or hasNextLine() I was wondering if there was something like that in Python? For some reason I can't find this anywhere.
Ex:
open(f) as file:
    file.readline()
    nextLine = true
    while nextLine:
        file.readline()
        Do stuff
        if not file.hasNextLine():
            nextLine = false
Just use a for loop to iterate over the file object:
for line in file:
    #do stuff..
Note that this includes the new line char (\n) at the end of each line string. This can be removed through either:
for line in file:
    line = line.rstrip('\n')
    #do stuff...
or:
for line in (l.rstrip('\n') for l in file):
    #do stuff...
You can only check if the file has another line by reading it (although you can check whether you are at the end of the file without any reading, by comparing file.tell() with the position of the end of the file).
This can be done through calling file.readline and checking if the string is not empty or timgeb's method of calling next and catching the StopIteration exception.
So to answer your question exactly, you can check whether a file has another line through:
next_line = file.readline()
if next_line:
    #has next line, do whatever...
or, without modifying the current file pointer:
def has_another_line(file):
    cur_pos = file.tell()
    does_it = bool(file.readline())
    file.seek(cur_pos)
    return does_it
which restores the file pointer, putting the file object back in its original state.
e.g.
$ printf "hello\nthere\nwhat\nis\nup\n" > f.txt
$ python -q
>>> f = open('f.txt')
>>> def has_another_line(file):
...     cur_pos = file.tell()
...     does_it = bool(file.readline())
...     file.seek(cur_pos)
...     return does_it
...
>>> has_another_line(f)
True
>>> f.readline()
'hello\n'
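If you want something closer to Java's hasNextLine() without seeking back and forth, one option is a small wrapper that reads one line ahead and buffers it. This is only a sketch: the class PeekableLines and its method names are hypothetical, not anything from the standard library.

```python
import io


class PeekableLines:
    """Wrap a file object so the caller can ask whether another line exists."""

    def __init__(self, f):
        self._f = f
        self._buffered = None  # line read ahead but not yet handed out

    def has_next_line(self):
        if self._buffered is None:
            self._buffered = self._f.readline()
        return self._buffered != ""   # '' means end of file

    def next_line(self):
        if not self.has_next_line():
            raise EOFError("no more lines")
        line, self._buffered = self._buffered, None
        return line


# io.StringIO stands in for a real file in this demonstration.
reader = PeekableLines(io.StringIO("hello\nthere\n"))
collected = []
while reader.has_next_line():
    collected.append(reader.next_line())
```

Because the look-ahead line is buffered, this also works on streams that cannot seek, such as pipes.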
The typical pattern that I use for reading text files is this:
with open('myfile.txt', 'r') as myfile:
    lines = myfile.readlines()
for line in lines:
    if 'this' in line: #Your criteria here to skip lines
        continue
    #Do something here
Using with keeps the file open only until all of the code within its block has executed; then the file is closed. I also think it's valuable to highlight the readlines() method here, which reads all lines in the file and stores them in a list. In terms of handling newline (\n) characters, I would point you to @Joe Iddon's answer.
Python doesn't have an end-of-file (EOF) indicator, but you could get the same effect this way:
with open(f) as file:
    file.seek(0, 2) # go to end of file
    eof = file.tell() # get end-of-file position
    file.seek(0, 0) # go back to start of file
    file.readline()
    nextLine = True # maybe nextLine = (file.tell() != eof)
    while nextLine:
        file.readline()
        # Do stuff
        if file.tell() == eof:
            nextLine = False
But as others have pointed out, you may do better by treating the file as an iterable, like this:
with open(f) as file:
    next_line = next(file, '')
    # the loop terminates when next_line is '',
    # i.e., after failing to read another line at end of file
    while next_line:
        # Do stuff
        next_line = next(file, '')
Files are iterators over lines. If all you want to do is check whether a file has a line left, you can issue line = next(file) and catch the StopIteration raised in case there isn't another line. Alternatively you can use line = next(file, default) with a non-string default value (e.g. None) and then check against that.
Note that in most cases, you know that you are done when the for loop over the file ends, as the other answers have explained. So make sure you actually need that kind of fine grained control with next.
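A minimal sketch of the next(file, default) variant, using None as the default so it can never be confused with a real line. The header/rows split is just an invented use case, and io.StringIO stands in for an opened file.

```python
import io

# In-memory stand-in for a file with a header line and two data rows.
f = io.StringIO("header\nrow 1\nrow 2\n")

header = next(f, None)      # None would mean the file is empty
rows = []
while True:
    line = next(f, None)    # returns None instead of raising StopIteration
    if line is None:
        break
    rows.append(line.rstrip("\n"))
```

This is the kind of fine-grained control the answer mentions: one line is consumed separately before the rest are processed.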
with open(filepath, 'rt+') as f:
    for line in f.readlines():
        #code to process each line
Opening the file this way also closes it when the block is finished, which releases the file handle promptly. Note that readlines() still loads every line into memory, which may or may not matter depending on the file size.
The first line is comparable to:
f = open(....)
f.readlines() gives you a list of all lines in the file.
The loop starts at the first line and ends at the last line, and shouldn't raise any EOF-related errors.
[Edit]
Notice the 'rt+' in the open call. It opens the file for both reading and writing ('+') in text mode ('t'), so the lines come back as str and no decoding is required.

Read large text file without read it into RAM at once

I have a large text file and it's 2GB or more. Of course I shouldn't use read().
I think readline() may be a way, but I don't know how to stop the loop at the end of the file.
I've tried this:
with open('test', 'r') as f:
    while True:
        try:
            f.readline()
        except:
            break
But when the end of the file is reached, the loop doesn't stop and keeps printing empty strings ('').
End of file is signalled by readline returning an empty string. Note that an actual empty line, like every line returned by readline, ends with the line separator.
with open('test', 'r') as f:
    while True:
        line = f.readline()
        if line == "":
            break
But then again, a file object in python is already iterable.
with open('test', 'r') as f:
    for line in f:
        print(line.strip())
strip removes whitespace, including the newline, so you don't print double newlines.
And if you don't like it safe, and want the least code possible:
for l in open("text"): print(l.strip())
EDIT: strip removes all kind of whitespaces from both sides. If you actually just want to get rid of ending newlines, you can use rstrip("\n")
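A quick sketch of the difference between the two calls, on a made-up line that has leading indentation and trailing whitespace:

```python
# Hypothetical line as it might come back from iterating over a file.
line = "    indented value \t\n"

stripped = line.strip()            # drops whitespace on both sides
newline_only = line.rstrip("\n")   # drops only the trailing newline
```

If indentation or trailing tabs carry meaning in your file, rstrip("\n") is the safer of the two.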
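You could just use a for statement instead of a while statement. You could do something like
for line in f.readlines():
    print(line)
Might help. Keep in mind that readlines() reads the whole file into memory, so for a 2GB file iterating over f directly is the safer choice.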

How to solve "OSError: telling position disabled by next() call"

I am creating a file editing system and would like to make a line based tell() function instead of a byte based one. This function would be used inside of a "with loop" with the open(file) call. This function is part of a class that has:
self.f = open(self.file, 'a+')
# self.file is a string that has the filename in it
The following is the original function
(It also has a char setting if you wanted line and byte return):
def tell(self, char=False):
    t, lc = self.f.tell(), 0
    self.f.seek(0)
    for line in self.f:
        if t >= len(line):
            t -= len(line)
            lc += 1
        else:
            break
    if char:
        return lc, t
    return lc
The problem I'm having with this is that this returns an OSError and it has to do with how the system is iterating over the file but I don't understand the issue. Thanks to anyone who can help.
I don't know if this was the original error but you can get the same error if you try to call f.tell() inside of a line-by-line iteration of a file like so:
with open(path, "r+") as f:
    for line in f:
        f.tell() #OSError
which can be easily substituted by the following:
with open(path, mode) as f:
    line = f.readline()
    while line:
        f.tell() #returns the location of the next line
        line = f.readline()
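Here is a self-contained sketch that reproduces both behaviours on a temporary file. It assumes CPython 3, where (as far as I can tell) the text-mode iterator raises OSError from tell(); the file contents are made up for the demo.

```python
import os
import tempfile

# Create a small throwaway text file for the demonstration.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    path = tmp.name

# 1) tell() inside a for loop over a text file raises OSError.
tell_failed = False
with open(path, "r") as f:
    for line in f:
        try:
            f.tell()
        except OSError:
            tell_failed = True
        break

# 2) The readline() loop leaves tell() usable after every line.
positions = []
with open(path, "r") as f:
    line = f.readline()
    while line:
        positions.append(f.tell())
        line = f.readline()

os.remove(path)
```

The values in positions are opaque in text mode; they are meant to be passed back to seek() rather than interpreted as character counts.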
I have an older version of Python 3, and I'm on Linux instead of a Mac, but I was able to recreate something very close to your error:
IOError: telling position disabled by next() call
An IO error, not an OS error, but otherwise the same. Bizarrely enough, I couldn't cause it using your open('a+', ...), but only when opening the file in read mode: open('r+', ...).
Further muddling things is that the error comes from _io.TextIOWrapper, a class that appears to be defined in Python's _pyio.py file... I stress "appears", because:
The TextIOWrapper in that file has attributes like _telling that I can't access on the whatever-it-is object calling itself _io.TextIOWrapper.
The TextIOWrapper class in _pyio.py doesn't make any distinction between readable, writable, or random-access files. Either both should work, or both should raise the same IOError.
Regardless, the TextIOWrapper class as described in the _pyio.py file disables the tell method while the iteration is in progress. This seems to be what you're running into (comments are mine):
def __next__(self):
    # Disable the tell method.
    self._telling = False
    line = self.readline()
    if not line:
        # We've reached the end of the file...
        self._snapshot = None
        # ...so restore _telling to whatever it was.
        self._telling = self._seekable
        raise StopIteration
    return line
In your tell method, you almost always break out of the iteration before it reaches the end of the file, leaving _telling disabled (False).
One other way to reset _telling is the flush method, but it also failed if called while the iteration was in progress:
IOError: can't reconstruct logical file position
The way around this, at least on my system, is to call seek(0) on the TextIOWrapper, which restores everything to a known state (and successfully calls flush in the bargain):
def tell(self, char=False):
    t, lc = self.f.tell(), 0
    self.f.seek(0)
    for line in self.f:
        if t >= len(line):
            t -= len(line)
            lc += 1
        else:
            break
    # Reset the file iterator, or later calls to f.tell will
    # raise an IOError or OSError:
    self.f.seek(0)
    if char:
        return lc, t
    return lc
If that's not the solution for your system, it might at least tell you where to start looking.
PS: You should consider always returning both the line number and the character offset. Functions that can return completely different types are hard to deal with; it's a lot easier for the caller to just throw away the value he or she doesn't need.
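Putting those suggestions together, a line-based tell could look roughly like the sketch below. It is not the asker's class: line_tell is a hypothetical free function, it assumes a file opened in binary mode (so offsets and len(line) are both in bytes), it always returns both values, and it restores the original position instead of rewinding to 0.

```python
import io


def line_tell(f):
    """Return (line_number, byte_offset_within_line) for a binary file object."""
    target = f.tell()
    f.seek(0)
    line_no, remaining = 0, target
    # readline() is used instead of a for loop so that tell()/seek()
    # are never touched while the built-in file iterator is active.
    for line in iter(f.readline, b""):
        if remaining >= len(line):
            remaining -= len(line)
            line_no += 1
        else:
            break
    f.seek(target)  # put the file back where the caller left it
    return line_no, remaining


# io.BytesIO stands in for a file opened with mode 'rb'.
buf = io.BytesIO(b"one\ntwo\nthree\n")
buf.seek(5)                 # one byte into the second line
position = line_tell(buf)
restored = buf.tell()
```

Line numbers are zero-based here, matching the counter in the original function.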
Just a quick workaround for this issue:
As you are iterating over the file from the beginning anyways, just keep track of where you are with a dedicated variable:
file_pos = 0
with open('file.txt', 'rb') as f:
    for line in f:
        # process line
        file_pos += len(line)
Now file_pos will always be what f.tell() would tell you. Note that this relies on the file being opened in binary mode ('rb'), because tell and seek work with byte positions and len(line) then counts bytes. Working line by line, it is easy to decode each bytes line into a Unicode string when needed.
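The same counter can also be used to remember where each line starts, so you can seek back to a given line later. A sketch under the assumption of a UTF-8 encoded file; the sample contents and the offsets list are invented for the example.

```python
import os
import tempfile

# Throwaway UTF-8 file; the first line contains a two-byte character.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("naïve\nsecond\nthird\n".encode("utf-8"))
    path = tmp.name

offsets = []   # byte offset at which each line starts
file_pos = 0
with open(path, "rb") as f:
    for line in f:
        offsets.append(file_pos)
        file_pos += len(line)       # len() of bytes counts bytes
    # Jump straight back to the start of the third line.
    f.seek(offsets[2])
    third = f.readline().decode("utf-8")

os.remove(path)
```

Because the lines are bytes, the 'ï' counts as two bytes, which is exactly what seek() expects.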
I had the same error: OSError: telling position disabled by next() call, and solved it by adding the 'rb' mode while opening the file.
The error message is pretty clear, but missing one detail: calling next on a text file object disables the tell method. A for loop repeatedly calls next on iter(f), which happens to be f itself for a file. I ran into a similar issue trying to call tell inside the loop instead of calling your function twice.
An alternative solution is to iterate over the file without using the built-in file iterator. Instead, you can bake a nearly equally efficient iterator from the arcane two-arg form of the iter function:
for line in iter(f.readline, ''):
