End-line characters from lines read from text file, using Python

When reading lines from a text file using Python, the end-line character often needs to be truncated before processing the text, as in the following example:
f = open("myFile.txt", "r")
for line in f:
    line = line[:-1]
    # do something with line
Is there an elegant way or idiom for retrieving text lines without the end-line character?

The idiomatic way to do this in Python is to use rstrip('\n'):
for line in open('myfile.txt'):  # opened in text-mode; all EOLs are converted to '\n'
    line = line.rstrip('\n')
    process(line)
Each of the other alternatives has a gotcha:
open('...').read().splitlines() has to load the whole file into memory at once.
line = line[:-1] silently drops the last character of the file if the last line has no EOL.
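A minimal sketch of the second gotcha, using an in-memory io.StringIO as a stand-in for a real file. The sample text is made up for illustration; its last line deliberately has no EOL:

```python
import io

# Hypothetical sample text: the final line has no trailing newline.
sample = "alpha\nbeta\ngamma"

# Slicing removes the last character unconditionally.
sliced = [line[:-1] for line in io.StringIO(sample)]

# rstrip('\n') removes only trailing newline characters.
stripped = [line.rstrip('\n') for line in io.StringIO(sample)]

print(sliced)    # the last item has lost a real character
print(stripped)  # every line is intact
```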

Simple. Use splitlines():
L = open("myFile.txt", "r").read().splitlines()
for line in L:
    process(line)  # this 'line' will not have '\n' character at the end

What's wrong with your code? I find it to be quite elegant and simple. The only problem is that if the file doesn't end in a newline, the last line returned won't have a '\n' as the last character, and therefore doing line = line[:-1] would incorrectly strip off the last character of the line.
The most elegant way to solve this problem would be to define a generator which took the lines of the file and removed the last character from each line only if that character is a newline:
def strip_trailing_newlines(file):
    for line in file:
        if line[-1] == '\n':
            yield line[:-1]
        else:
            yield line
f = open("myFile.txt", "r")
for line in strip_trailing_newlines(f):
    # do something with line

A long time ago, there was dear, clean, old BASIC code that could run on 16 KB core machines, like this:
if (not open(1,"file.txt")) error "Could not open 'file.txt' for reading"
while(not eof(1))
line input #1 a$
print a$
wend
close
Now, to read a file line by line, with far better hardware and software (Python), we must reinvent the wheel:
def line_input(file):
    for line in file:
        if line[-1] == '\n':
            yield line[:-1]
        else:
            yield line
f = open("myFile.txt", "r")
for line in line_input(f):
    # do something with line
I am induced to think that something has gone the wrong way somewhere...

What do you think about this approach?
with open(filename) as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for line in datalines:
        ...  # do something awesome
The generator expression avoids loading the whole file into memory, and with ensures the file is closed.

You may also consider using line.rstrip() to remove all trailing whitespace from the line, not just the newline.
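To make the difference between the rstrip variants mentioned in these answers concrete, here is a small sketch. The sample line is invented; it ends with a space, a tab and a Windows-style EOL:

```python
# Made-up line with trailing space, tab and '\r\n'.
line = "value \t\r\n"

only_newline = line.rstrip('\n')    # strips '\n' only
crlf = line.rstrip('\r\n')          # strips any trailing '\r' and '\n' characters
all_whitespace = line.rstrip()      # strips every kind of trailing whitespace

print(repr(only_newline))    # 'value \t\r'
print(repr(crlf))            # 'value \t'
print(repr(all_whitespace))  # 'value'
```

Which one is right depends on whether trailing spaces and tabs are meaningful in your data.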

Related

Can you delete all the lines of a .txt file after a specific line?

I am trying to delete all the lines in a text file after a line that contains a specific string. What I am trying to do is find the number of the line in said file and rewrite the whole text up until that line.
The code that I'm trying is the following:
import itertools as it
with open('sampletext.txt', "r") as rf:
    for num, line in enumerate(rf, 1):  # Finds the number of the line in which a specific string is contained
        if 'string' in line:
            print(num)
with open('sampletext_copy.txt', "w") as wf:
    for line in it.islice(rf, 0, num):
        wf.write(line)
Also would appreciate any tips on how to do this. Thank you!
You could do it like this:
with open('sampletext.txt', "r") as rf, open('sampletext_copy.txt', "w") as wf:
    for line in rf:
        if 'string' in line:
            break
        wf.write(line)
Basically, you open both files at the same time, then read the input file line-by-line. If string is in the line, then you're done - otherwise, write it to the output file.
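The same idea can be wrapped in a small function. This is only a sketch: the name copy_until and the keep_marker parameter are hypothetical additions, not part of the answer above. keep_marker covers the case where the matching line itself should survive, as the question asks.

```python
def copy_until(src_path, dst_path, marker, keep_marker=False):
    """Copy lines from src_path to dst_path, stopping at the first line containing marker."""
    with open(src_path, "r") as rf, open(dst_path, "w") as wf:
        for line in rf:
            if marker in line:
                if keep_marker:
                    wf.write(line)  # keep the matching line, drop everything after it
                break
            wf.write(line)
```

Only one line is held in memory at a time, so this works for files of any size.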
If you want to apply the change to the original file instead, you can use the .truncate() method of the file object:
with open(r"sampletext.txt", "r+") as f:
    while line := f.readline():
        if line.rstrip() == "string":  # or: line.startswith("string")
            f.truncate(f.tell())  # removes all content after current position
            break
Here we iterate over the file until we reach the specific line, then resize the file to the number of bytes already read (which .tell() gives us).
Just to complement Donut's answer, if you want to modify the file in place, there's a much more efficient solution:
with open('sampletext.txt', "r+") as f:
    for line in iter(f.readline, ''):  # Can't use for line in f: because it disables
                                       # tell for txt
    # Or for walrus lovers:
    # while line := f.readline():
        if 'string' in line:
            f.seek(0, 1)  # Needed to ensure underlying handle matches logical read
                          # position; f.seek(f.tell()) is logically equivalent
            f.truncate()
            break
If issue #26158 is ever fixed (so calling truncate on a file actually truncates at the logical position, not the arbitrary position of the underlying raw handle that's likely advanced a great deal due to buffering), this simpler code would work:
with open('sampletext.txt', "r+") as f:
    for line in f:
        if 'string' in line:
            f.truncate()
            break
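One way to sidestep the text-mode position bookkeeping is to open the file in binary mode, where tell() returns a plain byte offset, and to pass that offset to truncate() explicitly. The following is a sketch under that assumption, not a replacement for the answers above; the function name truncate_after is hypothetical and the marker must be given as bytes.

```python
def truncate_after(path, marker):
    """Cut the file right after the first line containing marker (a bytes object).

    Returns True if the marker was found, False otherwise.
    """
    with open(path, "rb+") as f:
        for line in iter(f.readline, b""):
            if marker in line:
                # Explicit position: does not rely on truncate()'s default.
                f.truncate(f.tell())
                return True
    return False
```

For example, truncate_after('sampletext.txt', b'string') keeps the matching line and removes everything after it.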

Remove last line from text file [duplicate]

How can one delete the very last line of a file with python?
Input File example:
hello
world
foo
bar
Output File example:
hello
world
foo
I've created the following code to find the number of lines in the file - but I do not know how to delete the specific line number.
try:
    file = open("file")
except IOError:
    print("Failed to read file.")
countLines = len(file.readlines())
Because I routinely work with many-gigabyte files, looping through as mentioned in the answers didn't work for me. The solution I use:
import os
import sys

with open(sys.argv[1], "r+", encoding = "utf-8") as file:
    # Move the pointer (similar to a cursor in a text editor) to the end of the file
    file.seek(0, os.SEEK_END)
    # This code means the following code skips the very last character in the file -
    # i.e. in the case the last line is null we delete the last line
    # and the penultimate one
    pos = file.tell() - 1
    # Read each character in the file one at a time from the penultimate
    # character going backwards, searching for a newline character
    # If we find a new line, exit the search
    while pos > 0 and file.read(1) != "\n":
        pos -= 1
        file.seek(pos, os.SEEK_SET)
    # So long as we're not at the start of the file, delete all the characters ahead
    # of this position
    if pos > 0:
        file.seek(pos, os.SEEK_SET)
        file.truncate()
You could use the code from the question and then:
lines = file.readlines()
lines = lines[:-1]
This would give you a list containing all lines but the last one.
This doesn't use Python, but Python's the wrong tool for the job if this is the only task you want. You can use the standard *nix utility head (the negative line count is a GNU head feature), and run
head -n-1 filename > newfile
which will copy all but the last line of filename to newfile.
Assuming you have to do this in Python and that you have a large enough file that list slicing isn't sufficient, you can do it in a single pass over the file:
last_line = None
for line in file:
    if last_line is not None:
        print(last_line)  # or write to a file, call a function, etc.
    last_line = line
Not the most elegant code in the world, but it gets the job done.
Basically, it buffers each line of the file in the last_line variable; each iteration outputs the previous iteration's line.
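The one-line lag can also be packaged as a generator, so the caller decides what to do with each line. A sketch; the name all_but_last is made up for illustration:

```python
def all_but_last(lines):
    """Yield every item of an iterable except the final one, holding back one item at a time."""
    it = iter(lines)
    try:
        previous = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for current in it:
        yield previous
        previous = current
    # The item left in 'previous' is the last one and is deliberately dropped.
```

It works on any iterable, including an open file object, for example out.writelines(all_but_last(src)) for two open files src and out.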
Here is my solution for Linux users:
import os
file_path = 'test.txt'
os.system('sed -i "$ d" {0}'.format(file_path))
There is no need to read and iterate through the file in Python.
On systems where file.truncate() works, you could do something like this:
file = open('file.txt', 'rb')
pos = next = 0
for line in file:
    pos = next  # position of beginning of this line
    next += len(line)  # compute position of beginning of next line
file = open('file.txt', 'ab')
file.truncate(pos)
According to my tests, file.tell() doesn't work when reading by line, presumably due to buffering confusing it. That's why this adds up the lengths of the lines to figure out positions. Note that this only works on systems where the line delimiter ends with '\n'.
Here's a more general memory-efficient solution allowing the last 'n' lines to be skipped (like the head command):
import collections, fileinput
def head(filename, lines_to_delete=1):
    queue = collections.deque()
    lines_to_delete = max(0, lines_to_delete)
    for line in fileinput.input(filename, inplace=True, backup='.bak'):
        queue.append(line)
        if lines_to_delete == 0:
            print(queue.popleft(), end='')  # inplace=True redirects print to the file
        else:
            lines_to_delete -= 1
    queue.clear()
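The queueing logic can be separated from the file handling. The sketch below is a generic generator (the name skip_last is hypothetical) that holds back at most n items, which is the same trick the function above uses:

```python
import collections

def skip_last(iterable, n=1):
    """Yield all items except the final n, keeping at most n items in memory."""
    n = max(0, n)
    queue = collections.deque()
    for item in iterable:
        queue.append(item)
        if len(queue) > n:
            # Once more than n items are waiting, the oldest cannot be among the last n.
            yield queue.popleft()
```

Whatever remains in the queue when the input ends is exactly the last n items, which are dropped.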
Inspired by previous posts, I propose this:
import os
with open('file_name', 'rb+') as f:  # binary mode: Python 3 text files reject relative seeks
    f.seek(0, os.SEEK_END)
    while f.tell() and f.read(1) != b'\n':
        f.seek(-2, os.SEEK_CUR)
    f.truncate()
Though I have not tested it (please, no hate for that), I believe there's a faster way of doing it. It's more of a C solution, but quite possible in Python. It's not Pythonic, either. It's a theory, I'd say.
First, you need to know the encoding of the file. Set a variable, CHARsize (why not), to the number of bytes a character uses in that encoding (1 byte for an ASCII file).
Then grab the size of the file and set FILEsize to it.
Assume you have the address of the file (in memory) in FILEadd.
Add FILEsize to FILEadd.
Move backwards (increment by -1 * CHARsize), testing each CHARsize bytes for a \n (or whatever newline your system uses). When you reach the first \n, you have the position of the beginning of the last line of the file. Replace that \n with \x1a (26, the ASCII code for EOF, or whatever it is on your system/with the encoding).
Clean up however you need to (change the file size, touch the file).
If this works as I suspect it would, you're going to save a lot of time, as you don't need to read through the whole file from the beginning; you read from the end.
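A rough Python sketch of this read-from-the-end idea follows. It departs from the description in two ways: it truncates the file instead of writing an EOF byte, and it scans backwards in chunks rather than one character at a time. It assumes an ASCII or UTF-8 file with '\n' line endings; the function name drop_last_line and the chunk_size parameter are made up for illustration.

```python
import os

def drop_last_line(path, chunk_size=4096):
    """Remove the last line of a file by scanning backwards for the previous newline."""
    with open(path, "rb+") as f:
        end = f.seek(0, os.SEEK_END)
        if end == 0:
            return  # empty file, nothing to remove
        # Ignore a newline that merely terminates the last line.
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1
        pos = end
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                # Keep everything up to and including the previous line's newline.
                f.truncate(start + idx + 1)
                return
            pos = start
        f.truncate(0)  # no earlier newline: the file had a single line
```

Only the tail of the file is read, so the cost depends on the length of the last line rather than on the size of the file.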
Here's another way, without slurping the whole file into memory:
p = ""
f = open("file")
for line in f:
    line = line.strip()
    print(p)
    p = line
f.close()

How to print line of Text file if Semicolon is at the end

I have a text file.
Test.txt
this is line one; this line one
this is line two;
this is line three
I want to print line which contains semicolon but the semicolon should be at the end of line.
My code
search = open("Test.txt","r")
for line in search:
    if ";" in line:
        semi = line.split(";")
        if semi[-1] == "\n":
            print(line)
Output
this is line two;
My code is working fine, but I want a better way to do this.
Can anyone tell me the shortest and most Pythonic way to do this?
This is certainly easier:
for line in search:
    if line.endswith(';\n'):
        print(line)
And as @IMCoins noted, it's better to use the context manager with, so that your file is closed when you're done working:
with open("Test.txt","r") as test_file:
    for line in test_file:
        if line.endswith(';\n'):
            print(line)
Use the with keyword to open the file in the first place:
with open('foo.txt', 'r') as f:
    for line in f:
        if ';' in line:
            semi = line.split(';')
            if semi[-1] == '\n':
                print(line)
To me, your code is already mostly Pythonic, as you're using built-in functions with a for loop.
if line[-2:] == ';\n':
    print(line)
works correctly in Python 2 and 3, and also works if the line is only a '\n'.
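All of the checks above compare against ';\n', so they miss a final line that ends with a semicolon but has no trailing newline. A sketch that strips the newline first; the function name is hypothetical, and the sample text is the question's Test.txt altered so that the last line ends with ';' and no EOL:

```python
import io

def lines_ending_with_semicolon(lines):
    """Return the lines (without their EOL) whose last character is a semicolon."""
    result = []
    for line in lines:
        text = line.rstrip('\n')
        if text.endswith(';'):
            result.append(text)
    return result

# Altered sample: the last line ends with ';' but has no newline after it.
sample = io.StringIO(
    "this is line one; this line one\nthis is line two;\nthis is line three;"
)
matches = lines_ending_with_semicolon(sample)
print(matches)
```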

Read large text file without read it into RAM at once

I have a large text file of 2GB or more. Of course I shouldn't use read().
I think using readline() may be a way, but I don't know how to stop the loop at the end of the file.
I've tried this:
with open('test', 'r') as f:
    while True:
        try:
            f.readline()
        except:
            break
But when the end of the file is reached, the loop doesn't stop and keeps returning an empty string ('').
End of file is signalled by readline returning an empty string. Note that an actual empty line, like every line returned by readline, ends with the line separator.
with open('test', 'r') as f:
    while True:
        line = f.readline()
        if line == "":
            break
But then again, a file object in Python is already iterable.
with open('test', 'r') as f:
    for line in f:
        print(line.strip())
strip removes whitespace, including the newline, so you don't print double newlines.
And if you don't like to play it safe, and want the least code possible:
for l in open("text"): print(l.strip())
EDIT: strip removes all kinds of whitespace from both sides. If you actually just want to get rid of trailing newlines, you can use rstrip("\n").
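The empty-string sentinel can also be handed to the two-argument form of iter(), which keeps calling readline until it returns ''. A small sketch; the function name is made up, and io.StringIO stands in for a real file:

```python
import io

def read_lines_lazily(f):
    """Yield lines one at a time; readline() returns '' only at end of file."""
    for line in iter(f.readline, ""):
        yield line.rstrip("\n")

# A blank line is read as '\n', not '', so it does not end the loop.
lines = list(read_lines_lazily(io.StringIO("a\n\nb")))
print(lines)
```

In practice list() would defeat the purpose for a 2GB file; it is used here only to show the result.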
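You could just use a for statement instead of a while statement. You could do something like
for line in f.readlines():
    print(line)
Note that readlines() reads the whole file into memory, so for a file this large iterating over f directly is preferable.
Might help.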

Combined effect of reading lines twice?

As practice, I am learning to read a file.
As is hopefully obvious from the code, I have a file in the working directory. I need to read it and print it.
my_file=open("new.txt","r")
lengt=sum(1 for line in my_file)
for i in range(0,lengt-1):
    myline=my_file.readlines(1)[0]
    print(myline)
my_file.close()
This raises an error saying the index is out of range.
The text file simply contains statements like
line one
line two
line three
.
.
.
With everything else the same, I tried myline=my_file.readline(). I get 7 empty lines.
My guess is that by using for line in my_file, I read through all the lines and so reached the end of the document. How do I overcome this to get the result I want?
P.S. If it matters, it's Python 3.3.
No need to count along. Python does it for you:
my_file = open("new.txt","r")
for myline in my_file:
    print(myline)
Details:
my_file is an iterator, a special object that you can iterate over.
You can also access a single line:
line1 = next(my_file)
gives you the first line, assuming you just opened the file. Doing it again:
line2 = next(my_file)
you get the second line. If you now iterate over it:
for myline in my_file:
    # do something
it will start at line 3.
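The behaviour in the question follows from the same property: counting the lines consumes the iterator, so nothing is left to read afterwards. A short sketch, using io.StringIO with invented contents as a stand-in for open("new.txt"); seek(0) rewinds to the start:

```python
import io

# Stand-in for open("new.txt", "r"); the contents are made up.
my_file = io.StringIO("line one\nline two\nline three\n")

length = sum(1 for line in my_file)  # consumes every line
leftover = my_file.readline()        # '' because the position is now at end of file

my_file.seek(0)                      # rewind to the beginning
first = next(my_file)                # first line again
rest = list(my_file)                 # continues after the line already taken

print(length, repr(leftover), repr(first), rest)
```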
Strange extra lines?
print(myline)
will likely print an extra empty line. This is due to a newline read from the file and a newline added by print(). Solution:
Python 3:
print(myline, end='')
Python 2:
print myline, # note the trailing comma.
Playing it safe
Using the with statement like this:
with open("new.txt", "r") as my_file:
    for myline in my_file:
        print(myline)
    # my_file is open here
# my_file is closed here
you don't need to close the file yourself, as it is closed as soon as you leave the context, i.e. as soon as you continue with your code at the same level as the with statement.
You can actually take care of all of this at once by iterating over the file contents:
my_file = open("new.txt", "r")
length = 0
for line in my_file:
    length += 1
    print(line)
my_file.close()
At the end, you will have printed all of the lines, and length will contain the number of lines in the file. (If you don't specifically need to know length, there's really no need for it!)
Another way to do it, which will close the file for you (and, in fact, will even close the file if an exception is raised):
length = 0
with open("new.txt", "r") as my_file:
    for line in my_file:
        length += 1
        print(line)
