How can one delete the very last line of a file with Python?
Input File example:
hello
world
foo
bar
Output File example:
hello
world
foo
I've created the following code to find the number of lines in the file - but I do not know how to delete the specific line number.
try:
    file = open("file")
except IOError:
    print("Failed to read file.")
else:
    countLines = len(file.readlines())
    file.close()
Because I routinely work with many-gigabyte files, looping through as mentioned in the answers didn't work for me. The solution I use:
import os
import sys

# Binary mode keeps seek() offsets exact; b"\n" is a single byte in UTF-8.
with open(sys.argv[1], "rb+") as file:
    # Move the pointer (similar to a cursor in a text editor) to the end of the file
    file.seek(0, os.SEEK_END)
    # Start one byte before the end so that a trailing newline is skipped;
    # otherwise only that final, empty line would be removed
    pos = file.tell() - 1
    # Read one byte at a time, moving backwards from the penultimate byte,
    # and stop at the first newline found
    while pos > 0 and file.read(1) != b"\n":
        pos -= 1
        file.seek(pos, os.SEEK_SET)
    # Unless we reached the start of the file, delete everything from this position on
    if pos > 0:
        file.seek(pos, os.SEEK_SET)
        file.truncate()
You could use the above code and then:
lines = file.readlines()
lines = lines[:-1]
This would give you a list containing every line but the last one.
This doesn't use Python, but Python is the wrong tool for the job if this is the only task you need done. On systems with GNU head (the negative line count is a GNU extension), you can run
head -n -1 filename > newfile
which will copy all but the last line of filename to newfile.
Assuming you have to do this in Python and that you have a large enough file that list slicing isn't sufficient, you can do it in a single pass over the file:
last_line = None
for line in file:
    if last_line is not None:
        print(last_line, end="")  # or write to a file, call a function, etc.
    last_line = line
Not the most elegant code in the world, but it gets the job done.
Basically it buffers each line of the file in the last_line variable; each iteration outputs the previous iteration's line, so the final line is never output.
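The same one-line buffer can be wrapped in a generator, so the result can be written to another file instead of printed. This is only a sketch, assuming Python 3; the names all_but_last and copy_without_last_line are made up for illustration and do not come from the answer above.

```python
def all_but_last(lines):
    """Yield every item of `lines` except the final one, holding one item back."""
    iterator = iter(lines)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for current in iterator:
        yield previous
        previous = current


def copy_without_last_line(src_path, dst_path):
    """Copy src_path to dst_path, leaving out the last line (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(all_but_last(src))
```

Only one line is held in memory at a time, so this should also cope with large files, at the cost of writing a second file.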
Here is my solution for Linux users:
import subprocess
file_path = 'test.txt'
# GNU sed: "$d" deletes the last line, -i edits the file in place
subprocess.run(['sed', '-i', '$d', file_path], check=True)
There is no need to read and iterate through the file in Python.
On systems where file.truncate() works, you could do something like this:
pos = next_pos = 0
with open('file.txt', 'rb') as file:
    for line in file:
        pos = next_pos          # position of beginning of this line
        next_pos += len(line)   # compute position of beginning of next line
with open('file.txt', 'r+b') as file:
    file.truncate(pos)
According to my tests, file.tell() doesn't work when reading by line, presumably due to buffering confusing it. That's why this adds up the lengths of the lines to figure out positions. Note that this only works on systems where the line delimiter ends with '\n'.
Here's a more general memory-efficient solution allowing the last 'n' lines to be skipped (like the head command):
import collections, fileinput

def head(filename, lines_to_delete=1):
    queue = collections.deque()
    lines_to_delete = max(0, lines_to_delete)
    # With inplace=True, anything printed inside the loop is written back to the file
    for line in fileinput.input(filename, inplace=True, backup='.bak'):
        queue.append(line)
        if lines_to_delete == 0:
            print(queue.popleft(), end='')
        else:
            lines_to_delete -= 1
    queue.clear()
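Inspired by previous posts, I propose this:
import os

# Binary mode is needed: text files do not allow seeks relative to the current position
with open('file_name', 'rb+') as f:
    f.seek(0, os.SEEK_END)
    while f.tell() and f.read(1) != b'\n':
        if f.tell() < 2:  # one-byte file: stepping back two bytes would fail
            f.seek(0)
            break
        f.seek(-2, os.SEEK_CUR)
    f.truncate()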
Though I have not tested it (please, no hate for that), I believe that there's a faster way of doing it. It's more of a C solution, but quite possible in Python. It's not Pythonic, either. It's a theory, I'd say.
First, you need to know the encoding of the file. Set a variable, say CHARsize, to the number of bytes a character uses in that encoding (1 byte for an ASCII file).
Then grab the size of the file, set FILEsize to it.
Assume you have the address of the file (in memory) in FILEadd.
Add FILEsize to FILEadd.
Move backwards (increment by -1 * CHARsize), testing each CHARsize bytes for a \n (or whatever newline your system uses). When you reach the first \n, you have the position of the beginning of the last line of the file. Replace \n with \x1a (26, the ASCII SUB character that some systems treat as an EOF marker, or whatever that is on your system/with the encoding).
Clean up however you need to (change the filesize, touch the file).
If this works as I suspect it would, you're going to save a lot of time, because you read from the end instead of reading through the whole file from the beginning.
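The idea of scanning from the end can be sketched in Python roughly as follows. Note the assumptions: the file is opened in binary mode, the encoding is ASCII or UTF-8 (where a newline is always the single byte 0x0A), and the file is shortened with truncate() rather than by writing an EOF marker byte. The function name truncate_last_line and the chunk_size parameter are made up for this example.

```python
import os


def truncate_last_line(path, chunk_size=4096):
    """Drop the last line of `path` by scanning backwards from the end of the file."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return  # empty file, nothing to do
        # Ignore a newline that merely terminates the last line itself.
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1
        pos = end
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                # Keep everything up to and including this newline.
                f.truncate(start + idx + 1)
                return
            pos = start
        # No earlier newline was found: the file held a single line.
        f.truncate(0)
```

Reading in chunks rather than byte by byte keeps the number of read calls small; only the tail of the file is touched, however large the file is.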
Here's another way, without slurping the whole file into memory:
p = None
with open("file") as f:
    for line in f:
        line = line.rstrip("\n")
        if p is not None:  # skip the first pass, when nothing is buffered yet
            print(p)
        p = line
Related
I'm working on a script to parse text files into a spreadsheet for myself, and in doing so I need to read through them. The issue is finding out when to stop. Java has a method for this when reading, called hasNext() or hasNextLine(). I was wondering if there was something like that in Python? For some reason I can't find this anywhere.
Ex:
with open(f) as file:
    file.readline()
    nextLine = True
    while nextLine:
        file.readline()
        # Do stuff
        if not file.hasNextLine():  # hypothetical method, shown only to illustrate the question
            nextLine = False
Just use a for loop to iterate over the file object:
for line in file:
    # do stuff...
Note that each line string includes the newline character (\n) at its end. This can be removed through either:
for line in file:
    line = line.rstrip('\n')
    # do stuff...
or:
for line in (l.rstrip('\n') for l in file):
    # do stuff...
You can only check if the file has another line by reading it (although you can check whether you are at the end of the file without reading, by comparing file.tell() with the end-of-file position).
This can be done by calling file.readline and checking that the returned string is not empty, or with timgeb's method of calling next and catching the StopIteration exception.
So to answer your question exactly, you can check whether a file has another line through:
next_line = file.readline()
if next_line:
    # has next line, do whatever...
or, without modifying the current file pointer:
def has_another_line(file):
    cur_pos = file.tell()
    does_it = bool(file.readline())
    file.seek(cur_pos)
    return does_it
which restores the file pointer, putting the file object back in its original state.
e.g.
$ printf "hello\nthere\nwhat\nis\nup\n" > f.txt
$ python -q
>>> f = open('f.txt')
>>> def has_another_line(file):
...     cur_pos = file.tell()
...     does_it = bool(file.readline())
...     file.seek(cur_pos)
...     return does_it
...
>>> has_another_line(f)
True
>>> f.readline()
'hello\n'
The typical pattern that I use for reading text files is this:
with open('myfile.txt', 'r') as myfile:
    lines = myfile.readlines()
for line in lines:
    if 'this' in line:  # Your criteria here to skip lines
        continue
    # Do something here
Using with keeps the file open only until all of the code within its block has executed; then the file is closed. I also think it's valuable to highlight the readlines() method here, which reads all lines in the file and stores them in a list. In terms of handling newline (\n) characters, I would point you to @Joe Iddon's answer.
Python doesn't have an end-of-file (EOF) indicator, but you could get the same effect this way:
with open(f) as file:
    file.seek(0, 2)      # go to end of file
    eof = file.tell()    # get end-of-file position
    file.seek(0, 0)      # go back to start of file
    file.readline()
    nextLine = True      # maybe nextLine = (file.tell() != eof)
    while nextLine:
        file.readline()
        # Do stuff
        if file.tell() == eof:
            nextLine = False
But as others have pointed out, you may do better by treating the file as an iterable, like this:
with open(f) as file:
    # the '' default avoids a StopIteration exception at end of file
    next_line = next(file, '')
    # the loop terminates when next_line is '',
    # i.e., after failing to read another line at end of file
    while next_line:
        # Do stuff
        next_line = next(file, '')
Files are iterators over lines. If all you want to do is check whether a file has a line left, you can issue line = next(file) and catch the StopIteration raised in case there isn't another line. Alternatively, you can use line = next(file, default) with a non-string default value (e.g. None) and then check against that.
Note that in most cases, you know that you are done when the for loop over the file ends, as the other answers have explained. So make sure you actually need that kind of fine-grained control with next.
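Both variants described above can be tried without a file on disk. In this sketch io.StringIO stands in for an opened text file (an assumption made only to keep the example self-contained), and the variable names are illustrative.

```python
import io

# io.StringIO stands in for a real file opened in text mode.
handle = io.StringIO("first\nsecond\n")

seen = []
line = next(handle, None)          # None signals that no line is left
while line is not None:
    seen.append(line.rstrip("\n"))
    line = next(handle, None)

# Catching StopIteration works as well, here on the now exhausted iterator.
try:
    next(handle)
    exhausted = False
except StopIteration:
    exhausted = True
```

Using None as the default matters because an empty string could otherwise be confused with real data in other kinds of iterators; for files, every line read contains at least a newline or some text.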
with open(filepath, 'rt+') as f:
    for line in f.readlines():
        # code to process each line
Opening it this way also closes the file when the block is finished, which is better for overall resource usage, although that might not matter depending on the file size.
The first line is comparable to:
f = open(....)
f.readlines() gives you a list of all lines in the file.
The loop will start at the first line and end at the last line, and shouldn't throw any errors regarding EOF, for example.
[Edit]
Notice the 'rt+' in the open call. As far as I'm aware, this opens the file in text mode for reading and writing, i.e. no decode is required.
I want to do a lot of boring C# code replacements automatically through a python script. I read all lines of the file, transform them, truncate the whole file, write new strings and close it.
f = open(file, 'r+')
text = f.readlines()
# some changes
f.truncate(0)
for line in text:
    f.write(line)
f.close()
All my changes are written. But some strange characters in the beginning of the file appear. I don't know how to avoid them. Even if I open with encoding='utf-8-sig' it doesn't help.
I tried truncating the whole file except for the first line, like this:
import sys
f.truncate(sys.getsizeof(text[0]))
for index in range(1, len(text), 1):
    f.write(text[index])
But in this case more than the first line is kept, instead of only the first line.
EDIT
I tried this:
f.truncate(len(text[0]))
for index in range(1, len(text), 1):
    f.write(text[index])
Now the first line is written correctly, but the next one has the same issue. So I think these characters come from the end of the file, and I am writing after them.
f = open(file, 'r+')
text = f.readlines()  # After reading all the lines, the pointer is at the end of the file.
# some changes
f.seek(0)  # Bring the pointer back to the start of the file.
f.truncate()  # With no argument, truncate() cuts the file at the current position (here 0).
for line in text:
    f.write(line)
f.close()
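To see where the strange characters come from, the two variants can be compared on a throwaway file. This is a sketch, assuming Python 3; the temporary file and the names broken and fixed are only for illustration. truncate(0) shrinks the file but leaves the position at the old end, so the next write lands there and the gap in front of it is typically filled with NUL bytes.

```python
import os
import tempfile

# Throwaway file so the demo does not touch real data.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("line one\nline two\n")

# Without seek(0): the position stays at the old end of the file.
with open(path, "r+") as f:
    text = f.readlines()
    f.truncate(0)
    f.writelines(text)
with open(path, "rb") as f:
    broken = f.read()

# With seek(0) first, the rewrite starts at offset 0.
with open(path, "w") as f:
    f.write("line one\nline two\n")
with open(path, "r+") as f:
    text = f.readlines()
    f.seek(0)
    f.truncate()
    f.writelines(text)
with open(path, "rb") as f:
    fixed = f.read()

os.remove(path)
print(broken)  # the original text preceded by padding bytes
print(fixed)   # the original text only
```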
I am trying to move each line down to the bottom of the file; this is what the file looks like:
daodaos 12391039
idiejda 94093420
jfijdsf 10903213
....
#completed
So at the end of the parsing, I am planning to have all the entries that are at the top sit under the line that says #completed.
The problem is that I am not sure how to do this in one pass. I know that I can read the whole file, every single line, close the file and then re-open it in write mode, searching for that line, removing it from the file and adding it to the end, but that feels incredibly inefficient.
Is there a way, in one pass, to process the current line and then, in the same for loop, delete the line and append it to the end of the file?
file = open('myfile.txt', 'a')
for items in file:
    # process items line
    # append items line to the end of the file
    # remove items line from the file
I suggest keeping it simple: read, then write back.
with open('myfile.txt') as f:
    lines = f.readlines()
newlines = lines
for idx, line in enumerate(lines):
    # do your stuff, check if completed, rearrange the list
    if line.startswith('#completed'):
        newlines = lines[idx:] + lines[:idx]
        break
with open('myfile.txt', 'w') as f:
    f.write(''.join(newlines))  # write back new lines
Below is another version I could think of, if you insist on modifying the file while reading it:
with open('myfile.txt', 'r+') as f:
    newlines = ''
    line = True
    while line:
        line = f.readline()
        if line.startswith('#completed'):
            # line += f.read()  # uncomment this line if you are interested in the lines after #completed
            f.truncate()
            f.seek(0)
            f.write(line + newlines)
            break
        else:
            newlines += line
Not really.
Your main problem here is that you're iterating on the file at the same time you want to change it. This will Do Bad Things (tm) to your processing, unless you plan to micro-manage the file position pointer.
You do have that power: the seek method lets you move to a given file location, expressed in bytes. seek(0) moves to the start of the file; seek(0, 2) moves to the end. The problem you face is that your for loop trusts that this pointer indicates the next line to read.
One distinct problem is that you can't just remove a line from the middle of the file; something exists in those bytes. Think of it as lines of text on a page, written in pencil. You can erase line 4, but this does not cause lines 5-end to magically float up half a centimeter; they're still in the same physical location.
How to Do It ... sort of
Read all of the lines into a list. You can easily change a list the way you want. When you hit the end, then write the list back to the file -- or use your magic seek and append powers to alter only a little of it.
I'd recommend doing this the simple way: read the whole file and store it in a variable, move the completed lines to another variable and then rewrite your file.
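A rough sketch of that read, rearrange and rewrite approach follows. It assumes the marker line is exactly #completed, as in the sample file, and that the file fits in memory; the function name move_entries_below_marker is invented for this example.

```python
def move_entries_below_marker(path, marker="#completed"):
    """Rewrite `path` so that entries found above `marker` end up after it."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]

    if marker not in lines:
        return  # nothing to rearrange; leave the file untouched

    idx = lines.index(marker)
    pending, done = lines[:idx], lines[idx + 1:]

    # Process each pending entry here, then file it after the completed ones.
    reordered = [marker] + done + pending

    with open(path, "w") as f:
        f.writelines(line + "\n" for line in reordered)
```

The file is read once and written once, which is usually cheap compared with trying to shuffle bytes around in place.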
As practice, I am learning to read a file.
As is hopefully obvious from the code, I have a file in the working (root, whatever) directory. I need to read it and print it.
my_file=open("new.txt","r")
lengt=sum(1 for line in my_file)
for i in range(0,lengt-1):
    myline=my_file.readlines(1)[0]
    print(myline)
my_file.close()
This returns an error saying the index is out of range.
The text file simply contains statements like
line one
line two
line three
.
.
.
With everything else the same, I tried myline=my_file.readline(). I get 7 empty lines.
My guess is that while using for line in my_file, I read up all the lines and so reached the end of the document. How do I overcome this to get the result I want?
P.S. If it matters, it's Python 3.3.
No need to count along. Python does it for you:
my_file = open("new.txt","r")
for myline in my_file:
    print(myline)
Details:
my_file is an iterator. This is a special object that you can iterate over.
You can also access a single line:
line1 = next(my_file)
gives you the first line, assuming you just opened the file. Doing it again:
line2 = next(my_file)
you get the second line. If you now iterate over it:
for myline in my_file:
    # do something
it will start at line 3.
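The same behaviour explains the error in the question: counting the lines with sum(...) consumes the iterator, so later reads return nothing, and readlines(1)[0] then indexes an empty list. A small sketch, in which io.StringIO stands in for open("new.txt") (an assumption, just to keep the example self-contained):

```python
import io

# io.StringIO stands in for the opened text file.
my_file = io.StringIO("line one\nline two\nline three\n")

length = sum(1 for line in my_file)   # consumes every line
leftover = my_file.readline()         # '' because nothing is left to read

my_file.seek(0)                       # rewind to the beginning
first = next(my_file)
rest = list(my_file)                  # continues after the first line
```

Calling seek(0) is the way to read the same file object again after it has been exhausted.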
Strange extra lines?
print(myline)
will likely print an extra empty line. This is due to a newline read from the file and a newline added by print(). Solution:
Python 3:
print(myline, end='')
Python 2:
print myline, # note the trailing comma.
Playing it safe
Using the with statement like this:
with open("new.txt", "r") as my_file:
    for myline in my_file:
        print(myline)
    # my_file is open here
# my_file is closed here
you don't need to close the file, as that is done as soon as you leave the context, i.e. as soon as you continue with your code at the same level as the with statement.
You can actually take care of all of this at once by iterating over the file contents:
my_file = open("new.txt", "r")
length = 0
for line in my_file:
    length += 1
    print(line)
my_file.close()
At the end, you will have printed all of the lines, and length will contain the number of lines in the file. (If you don't specifically need to know length, there's really no need for it!)
Another way to do it, which will close the file for you (and, in fact, will even close the file if an exception is raised):
length = 0
with open("new.txt", "r") as my_file:
    for line in my_file:
        length += 1
        print(line)