Related
Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.
Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.
How could I print the final line of a text file read in with python?
fi=open(inputFile,"r")
for line in fi:
#go to last line and print it
One option is to use file.readlines():
f1 = open(inputFile, "r")
last_line = f1.readlines()[-1]
f1.close()
If you don't need the file after, though, it is recommended to use contexts using with, so that the file is automatically closed after:
with open(inputFile, "r") as f1:
last_line = f1.readlines()[-1]
Do you need to be efficient by not reading all the lines into memory at once? Instead you can iterate over the file object.
with open(inputfile, "r") as f:
for line in f: pass
print line #this is the last line of the file
Three ways to read the last line of a file:
For a small file, read the entire file into memory
with open("file.txt") as file:
lines = file.readlines()
print(lines[-1])
For a big file, read line by line and print the last line
with open("file.txt") as file:
for line in file:
pass
print(line)
For efficient approach, go directly to the last line
import os
with open("file.txt", "rb") as file:
# Go to the end of the file before the last break-line
file.seek(-2, os.SEEK_END)
# Keep reading backward until you find the next break-line
while file.read(1) != b'\n':
file.seek(-2, os.SEEK_CUR)
print(file.readline().decode())
If you can afford to read the entire file in memory(if the filesize is considerably less than the total memory), you can use the readlines() method as mentioned in one of the other answers, but if the filesize is large, the best way to do it is:
fi=open(inputFile, 'r')
lastline = ""
for line in fi:
lastline = line
print lastline
You could use csv.reader() to read your file as a list and print the last line.
Cons: This method allocates a new variable (not an ideal memory-saver for very large files).
Pros: List lookups take O(1) time, and you can easily manipulate a list if you happen to want to modify your inputFile, as well as read the final line.
import csv
lis = list(csv.reader(open(inputFile)))
print lis[-1] # prints final line as a list of strings
If you care about memory this should help you.
last_line = ''
with open(inputfile, "r") as f:
f.seek(-2, os.SEEK_END) # -2 because last character is likely \n
cur_char = f.read(1)
while cur_char != '\n':
last_line = cur_char + last_line
f.seek(-2, os.SEEK_CUR)
cur_char = f.read(1)
print last_line
This might help you.
class FileRead(object):
def __init__(self, file_to_read=None,file_open_mode=None,stream_size=100):
super(FileRead, self).__init__()
self.file_to_read = file_to_read
self.file_to_write='test.txt'
self.file_mode=file_open_mode
self.stream_size=stream_size
def file_read(self):
try:
with open(self.file_to_read,self.file_mode) as file_context:
contents=file_context.read(self.stream_size)
while len(contents)>0:
yield contents
contents=file_context.read(self.stream_size)
except Exception as e:
if type(e).__name__=='IOError':
output="You have a file input/output error {}".format(e.args[1])
raise Exception (output)
else:
output="You have a file error {} {} ".format(file_context.name,e.args)
raise Exception (output)
b=FileRead("read.txt",'r')
contents=b.file_read()
lastline = ""
for content in contents:
# print '-------'
lastline = content
print lastline
I use the pandas module for its convenience (often to extract the last value).
Here is the example for the last row:
import pandas as pd
df = pd.read_csv('inputFile.csv')
last_value = df.iloc[-1]
The return is a pandas Series of the last row.
The advantage of this is that you also get the entire contents as a pandas DataFrame.
I'd like to remove the first column from a file. The file contains 3 columns separated by space and the columns has the following titles:
X', 'Displacement' and 'Force' (Please see the image).
I have came up with the following code, but to my disappointment it doesn't work!
f = open("datafile.txt", 'w')
for line in f:
line = line.split()
del x[0]
f.close()
Any help is much appreciated !
Esan
First of all, you're attempting to read from a file (by iterating through the file contents) that is open for writing. This will give you an IOError.
Second, there is no variable named x in existence (you have not declared/set one in the script). This will generate a NameError.
Thirdly and finally, once you have finished (correctly) reading and editing the columns in your file, you will need to write the data back into the file.
To avoid loading a (potentially large) file into memory all at once, it is probably a good idea to read from one file (line by line) and write to a new file simultaneously.
Something like this might work:
f = open("datafile.txt", "r")
g = open("datafile_fixed.txt", "w")
for line in f:
if line.strip():
g.write("\t".join(line.split()[1:]) + "\n")
f.close()
g.close()
Some reading about python i/o might be helpful, but something like the following should get you on your feet:
with open("datafile.txt", "r") as fin:
with open("outputfile.txt", "w") as fout:
for line in fin:
line = line.split(' ')
if len(line) == 3:
del line[0]
fout.write(line[0] + ' ' + line[1])
else:
fout.write('\n')
EDIT: fixed to work with blank lines
print ''.join([' '.join(l.split()[1:]) for l in file('datafile.txt')])
or, if you want to preserve spaces and you know that the second column always starts at the, say, 10th character:
print ''.join([l[11:] for l in file('datafile.txt')])
I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...
Eventually once I'm comfortable with the process, I'll automate it entirely. For now however, let's assume I'm running this by hand.
What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.
If Python, assume the following interface:
def removeLine(filename, lineno):
Thanks,
-aj
You can have two file objects for the same file at the same time (one for reading, one for writing):
def removeLine(filename, lineno):
fro = open(filename, "rb")
current_line = 0
while current_line < lineno:
fro.readline()
current_line += 1
seekpoint = fro.tell()
frw = open(filename, "r+b")
frw.seek(seekpoint, 0)
# read the line we want to discard
fro.readline()
# now move the rest of the lines in the file
# one line back
chars = fro.readline()
while chars:
frw.writelines(chars)
chars = fro.readline()
fro.close()
frw.truncate()
frw.close()
Modify the file in place, offending line is replaced with spaces so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place if the fix is not longer than the line you are replacing
import os
from mmap import mmap
def removeLine(filename, lineno):
f=os.open(filename, os.O_RDWR)
m=mmap(f,0)
p=0
for i in range(lineno-1):
p=m.find('\n',p)+1
q=m.find('\n',p)
m[p:q] = ' '*(q-p)
os.close(f)
If the other program can be changed to output the fileoffset instead of the line number, you can assign the offset to p directly and do without the for loop
As far as I know, you can't just open a txt file with python and remove a line. You have to make a new file and move everything but that line to it. If you know the specific line, then you would do something like this:
f = open('in.txt')
fo = open('out.txt','w')
ind = 1
for line in f:
if ind != linenumtoremove:
fo.write(line)
ind += 1
f.close()
fo.close()
You could of course check the contents of the line instead to determine if you want to keep it or not. I also recommend that if you have a whole list of lines to be removed/changed to do all those changes in one pass through the file.
If the lines are variable length then I don't believe that there is a better algorithm than reading the file line by line and writing out all lines, except for the one(s) that you do not want.
You can identify these lines by checking some criteria, or by keeping a running tally of lines read and suppressing the writing of the line(s) that you do not want.
If the lines are fixed length and you want to delete specific line numbers, then you may be able to use seek to move the file pointer... I doubt you're that lucky though.
Update: solution using sed as requested by poster in comment.
To delete for example the second line of file:
sed '2d' input.txt
Use the -i switch to edit in place. Warning: this is a destructive operation. Read the help for this command for information on how to make a backup automatically.
def removeLine(filename, lineno):
in = open(filename)
out = open(filename + ".new", "w")
for i, l in enumerate(in, 1):
if i != lineno:
out.write(l)
in.close()
out.close()
os.rename(filename + ".new", filename)
I think there was a somewhat similar if not exactly the same type of question asked here. Reading (and writing) line by line is slow, but you can read a bigger chunk into memory at once, go through that line by line skipping lines you don't want, then writing this as a single chunk to a new file. Repeat until done. Finally replace the original file with the new file.
The thing to watch out for is when you read in a chunk, you need to deal with the last, potentially partial line you read, and prepend that into the next chunk you read.
#OP, if you can use awk, eg assuming line number is 10
$ awk 'NR!=10' file > newfile
I will provide two alternatives based on the look-up factor (line number or a search string):
Line number
def removeLine2(filename, lineNumber):
with open(filename, 'r+') as outputFile:
with open(filename, 'r') as inputFile:
currentLineNumber = 0
while currentLineNumber < lineNumber:
inputFile.readline()
currentLineNumber += 1
seekPosition = inputFile.tell()
outputFile.seek(seekPosition, 0)
inputFile.readline()
currentLine = inputFile.readline()
while currentLine:
outputFile.writelines(currentLine)
currentLine = inputFile.readline()
outputFile.truncate()
String
def removeLine(filename, key):
with open(filename, 'r+') as outputFile:
with open(filename, 'r') as inputFile:
seekPosition = 0
currentLine = inputFile.readline()
while not currentLine.strip().startswith('"%s"' % key):
seekPosition = inputFile.tell()
currentLine = inputFile.readline()
outputFile.seek(seekPosition, 0)
currentLine = inputFile.readline()
while currentLine:
outputFile.writelines(currentLine)
currentLine = inputFile.readline()
outputFile.truncate()