Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        if line.strip("\n") != "nickname_to_delete":
            f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        # strip the trailing newline so the comparison can actually match
        if i.strip("\n") != "line you want to remove...":
            f.write(i)
    f.truncate()
This solution opens the file in read/write mode ("r+"), uses seek to reset the file pointer to the start, and then calls truncate to remove everything after the last write.
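To see why the truncate call matters, here is a minimal Python 3 sketch of the same idea. The helper name remove_line_in_place and the sample data are hypothetical and only serve the illustration.

```python
import os
import tempfile


def remove_line_in_place(path, unwanted):
    """Rewrite `path` without the lines equal to `unwanted` (newline ignored)."""
    with open(path, "r+") as f:
        lines = f.readlines()
        f.seek(0)      # go back to the start before overwriting
        for line in lines:
            if line.rstrip("\n") != unwanted:
                f.write(line)
        f.truncate()   # cut off the leftover tail of the old, longer content


# Demo on a throwaway file (hypothetical sample data).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alice\nbob\ncarol\n")

remove_line_in_place(demo_path, "bob")
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(repr(result))
```

Without the truncate call, the file would keep the last few characters of the original content after the rewritten lines, because the new content is shorter than the old one.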
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
    with open("newfile.txt", "w") as output:
        for line in file_input:
            if line.strip("\n") != "nickname_to_delete":
                output.write(line)
That's it! A single loop does the same job, and because the file is read line by line it never has to be held in memory all at once.
This is a "fork" of @Lother's answer (which I believe should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "snow" not in line:
            f.write(line)
    f.truncate()
Improvements:
with open, which removes the need for f.close()
a clearer if/else for checking that the string is not present in the current line
The issue with reading the lines in a first pass and making changes (deleting specific lines) in a second pass is that if your files are huge, you will run out of RAM. A better approach is to read the lines one by one and write them into a separate file, skipping the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
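The line-by-line approach described above could look roughly like the following Python 3 sketch. The function name delete_matching_lines and the sample data are hypothetical; writing to a temporary file and then swapping it in with os.replace is one possible way to end up with the original file name, not something the answer prescribes.

```python
import os
import tempfile


def delete_matching_lines(path, unwanted):
    """Stream `path` line by line, dropping the lines equal to `unwanted`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary file in the same directory so os.replace
    # does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:  # only one line is held in memory at a time
                if line.rstrip("\n") != unwanted:
                    dst.write(line)
        os.replace(tmp_path, path)  # swap the filtered copy into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial copy on failure
        raise


# Demo on a throwaway file (hypothetical sample data).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alice\nbob\ncarol\n")

delete_matching_lines(demo_path, "bob")
with open(demo_path) as f:
    remaining = f.read()
os.remove(demo_path)
print(repr(remaining))
```

Note that the replacement file carries the permissions of the temporary file, so a real script may want to copy the original file's mode over before the swap.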
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line in fileinput.input('file1.txt', inplace=True):
    if len(line) > 1:
        sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete every line containing dog (here, the first line):
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect the unfiltered data (which is what the readlines() method does), I use two files. One holds the main data, and the second is used for filtering when you delete a specific string. Here is the code:
with open('data_base.txt') as main_file:  # your main database file
    data = main_file.read()
with open('filter_base.txt', 'w') as filter_file:
    filter_file.write(data)
with open('data_base.txt', 'w') as main_file, open('filter_base.txt') as filter_file:
    for line in filter_file:
        if 'your data to delete' not in line:  # skip lines containing the string
            main_file.write(line)  # write every other line back to the database
Hope you will find this useful! :)
I think if you read the file into a list, you can then iterate over the list to look for the nickname you want to get rid of. You can do it much more efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
    nicknames = sourceFile.read().splitlines()
Next, iterate over the list to match your inputs to delete:
for nick in nicknames_to_delete:
    try:
        if nick in nicknames:
            nicknames.pop(nicknames.index(nick))
        else:
            print(nick + " is not found in the file")
    except ValueError:
        pass
Lastly, write the result back to file:
with open("nicknames.csv", "w", newline="") as nicknamesFile:  # "w" empties the file first
    nicknamesWriter = csv.writer(nicknamesFile)
    for name in nicknames:
        nicknamesWriter.writerow([str(name)])
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
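The first special case above (fixed-length, unordered records) can be sketched in Python 3 as follows. The record size of 8 bytes, the function name delete_record and the sample records are all assumptions made for the illustration.

```python
import os
import tempfile

RECORD_SIZE = 8  # assumed fixed record length in bytes


def delete_record(path, index, record_size=RECORD_SIZE):
    """Delete record `index` by copying the last record over it, then truncating."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        count = f.tell() // record_size
        if not 0 <= index < count:
            raise IndexError("record index out of range")
        last_offset = (count - 1) * record_size
        if index != count - 1:
            f.seek(last_offset)
            last = f.read(record_size)
            f.seek(index * record_size)
            f.write(last)        # overwrite the deleted record with the last one
        f.truncate(last_offset)  # drop the now redundant last record


# Demo on a throwaway file with four 8-byte records (hypothetical data).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"aaaaaaa\nbbbbbbb\nccccccc\nddddddd\n")

delete_record(demo_path, 1)
with open(demo_path, "rb") as f:
    data = f.read()
os.remove(demo_path)
print(data)
```

Only one record is rewritten regardless of file size, at the cost of changing the order of the records.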
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace=True):
    line = line.strip()
    if 'UnwantedWord' not in line:
        print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file's lines in a list, remove the line you want to delete from the list, and write the remaining lines to a new file:
with open("file_name.txt", "r") as f:
    lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
    for line in lines:
        new_f.write(line)
Here's another method to remove one or more lines from a file:
src_file = "zzzz.txt"
idx = 0  # example value: zero-based index of the line to remove
with open(src_file, "r") as f:
    contents = f.readlines()
contents.pop(idx)  # remove the line from the list by its zero-based index
with open(src_file, "w") as f:
    f.write("".join(contents))
You can use the re library
This assumes that you can load the whole text file into memory. Define a list of unwanted nicknames and substitute each of them with an empty string "".
# Delete unwanted nicknames
import re
# Read as bytes, then decode (works on Python 2 and 3).
path_to_file = 'data/nicknames.txt'
with open(path_to_file, 'rb') as f:
    text = f.read().decode('utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
# re.escape keeps special characters in a nickname from being read as regex syntax
pattern = "|".join(re.escape(name) for name in unwanted_nickname_list)
text = re.sub(pattern, "", text)
If you want to remove a specific line from a file, use this short and simple snippet. It removes any line that starts with a given sentence or prefix (symbol).
with open("file_name.txt", "r") as f:
    lines = f.readlines()
with open("new_file.txt", "w") as new_f:
    for line in lines:
        if not line.startswith("write any sentence or symbol to remove line"):
            new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
    content = f.readlines()
for line in content:
    file_lines[initial_line] = line.rstrip('\n')  # drop only the newline, keep other whitespace
    initial_line += 1
with open(filename, "w") as f:
    for line_number, line_content in file_lines.items():
        if line_number != line_to_delete:
            f.write('{}\n'.format(line_content))
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.
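The steps just described could be sketched like this in Python 3. The function name drop_line and the sample text are hypothetical, and the sketch uses a list rather than a tuple because a tuple cannot be modified in place.

```python
def drop_line(text, line_number):
    """Return `text` without its 1-based line `line_number`."""
    lines = text.splitlines(keepends=True)  # keep the original line endings
    if not 1 <= line_number <= len(lines):
        raise IndexError("line number out of range")
    del lines[line_number - 1]
    return "".join(lines)


sample = "first\nsecond\nthird\n"
print(repr(drop_line(sample, 2)))
```

To apply it to a file, read the contents with open(...).read(), pass them through the function, and write the returned string back with the file opened in "w" mode.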
I'm hoping you have a solution for the following issue:
I want to compare two lists for common entries (on the basis of column 10) and write common entries to one file and unique entries for the first list into another file. The code I wrote is:
INFILE1 = open ("c:\\python\\test\\58962.filtered.csv", "r")
INFILE2 = open ("c:\\python\\test\\83887.filtered.csv", "r")
OUTFILE1 = open ("c:\\python\\test\\58962_vs_83887.common.csv", "w")
OUTFILE2 = open ("c:\\python\\test\\58962_vs_83887.unique.csv", "w")
for line in INFILE1:
    line = line.rstrip().split(",")
    if line[11] in INFILE2:
        OUTFILE1.write(line)
    else:
        OUTFILE2.write(line)
INFILE1.close()
INFILE2.close()
OUTFILE1.close()
OUTFILE2.close()
The following error appears:
8 OUTFILE1.write(line)
9 else:
---> 10 OUTFILE2.write(line)
11 INFILE1.close()
TypeError: write() argument must be str, not list
Does anybody know how to fix this?
This line
line = line.rstrip().split(",")
replaces the line you read from the file with its split list. You then try to write that list to your file. That's not how the write method works, and the error message tells you exactly that.
Change it to :
for line in INFILE1:
    lineList = line.rstrip().split(",") # don't overwrite line, use lineList
    if lineList[11] in INFILE2: # used lineList
        OUTFILE1.write(line) # corrected indentation
    else:
        OUTFILE2.write(line)
You could easily have found the error yourself by printing the line before and after splitting, or just before writing.
Please read How to debug small programs (#1) and follow it. It's easier to find and fix bugs yourself than to post questions here.
You have some other problem at hand, though:
Files are stream based: they start at position 0, and the position advances as you read parts of the file. Once you are at the end, you won't get anything more from INFILE2.read() or other methods.
So if you want to repeatedly check whether a column from a line of file1 occurs somewhere in file2, you need to read file2 into a list (or other data structure) so that the repeated checks work. In other words, this:
if lineList[11] in INFILE2:
might work once, then the file is consumed and it will return false all the time.
You also might want to change from:
f = open(...., ...)
# do something with f
f.close()
to
with open(name,"r") as f:
    # do something with f, no close needed, closed when leaving block
as it is safer, will close the file even if exceptions happen.
To solve that try this (untested) code:
with open("c:\\python\\test\\83887.filtered.csv", "r") as file2:
    infile2 = file2.readlines()  # read in all lines as a list
with open("c:\\python\\test\\58962.filtered.csv", "r") as INFILE1:
    # the next two lines are one statement; the \ at the end continues the line
    with open("c:\\python\\test\\58962_vs_83887.common.csv", "w") as OUTFILE1, \
         open("c:\\python\\test\\58962_vs_83887.unique.csv", "w") as OUTFILE2:
        for line in INFILE1:
            lineList = line.rstrip().split(",")
            # check whether any line of file 2 contains lineList[11]
            if any(lineList[11] in x for x in infile2):
                OUTFILE1.write(line)
            else:
                OUTFILE2.write(line)
# all files are closed automatically here
Links to read:
the-with-statement
any() and other built-ins
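The point about reading file2 into a data structure can be taken one step further with a set of key values. The following is a minimal Python 3 sketch, not the answer's code: the function name split_common_unique and the sample rows are made up for illustration, and it tests for an exact match on the key column, whereas the any(...) check above tests whether the value occurs anywhere in a line.

```python
import csv
import io

KEY_COLUMN = 11  # zero-based index used in the code above


def split_common_unique(rows1, rows2, key_column=KEY_COLUMN):
    """Split rows1 into rows whose key also occurs in rows2 and rows whose key does not."""
    keys2 = {row[key_column] for row in rows2 if len(row) > key_column}
    common, unique = [], []
    for row in rows1:
        if len(row) > key_column and row[key_column] in keys2:
            common.append(row)
        else:
            unique.append(row)
    return common, unique


# Demo with tiny in-memory "files" and the key in column 1 (hypothetical data).
file1 = io.StringIO("a,1\nb,2\nc,3\n")
file2 = io.StringIO("x,2\ny,3\n")
common, unique = split_common_unique(
    csv.reader(file1), list(csv.reader(file2)), key_column=1
)
print(common)
print(unique)
```

A set lookup takes roughly constant time per row, so the second file is scanned once instead of once per row of the first file.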
Preface - I'm pretty new to Python, having had more experience in another language.
I have a text file with single column list of strings in the generic (but slightly varying) format "./abc123a1/type/1ab2_x_data_type.file.type"
I need to extract the abc123a1 and the 1ab2 portions from all several hundred of the rows and put them under two columns (column a and b) in a csv. Sometimes there may be a "1ab2_a" and a "1ab2_b", but I only want one 1ab2. So I'd want to grab "1ab2_a" and ignore all others.
I have the regex which I THINK will work:
tmp = list()
if re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x):
    tmp = re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x)
elif re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x):
    tmp = re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x)
if len(tmp) == 0:
    return None
elif len(tmp) > 1:
    print "ERROR found multiple matches"
    return "ERROR"
else:
    return tmp[0].upper()
I am trying to make this script step by step and testing things to make sure it works, but it's just not.
import sys
import csv
listOfData = []
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
print listOfData
with open('extracted.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('column a', 'column b'))
    writer.writerows(listOfData)
print listOfData
Still failing to get anything in the csv other than column headers, much less a parsed version!
Does anyone have any better ideas or formats I could do this in? A friend mentioned looking into glob.glob, but I haven't had luck getting that to work either.
IMHO, you were not far from making it work. The problem is that you read the whole file once just to print the lines, and then (once at the end of the file) you try to put them into a list... and get an empty list!
You should read the file only once:
import sys
import csv
listOfData = []
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
print listOfData
with open('extracted.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('column a', 'column b'))
    writer.writerows(listOfData)
print listOfData
Once it works, you still have to use the regex to get the relevant data to put into the csv file.
I am not sure about your regex (it will most probably not work), but the reason your current (non-regex, simple) code does not work is this:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
As you can see, you first iterate over each line in the file and print it, which is fine, but after that loop ends the file pointer is at the end of the file, so iterating over it again produces nothing. You should iterate over it only once and do both the printing and the appending in that loop. Example:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
I think at least part of the problem is the two for loops in the following:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
The first one prints all the lines of f, so there's nothing left for the second one to iterate over unless you first f.seek(0) and rewind the file.
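The effect is easy to reproduce in isolation. The following Python 3 sketch uses io.StringIO as a stand-in for an open text file; the variable names are only for illustration.

```python
import io

f = io.StringIO("one\ntwo\n")  # stands in for an open text file

first_pass = [line for line in f]   # reads every line
second_pass = [line for line in f]  # empty: the position is already at the end
f.seek(0)                           # rewind to the start
third_pass = [line for line in f]   # reads every line again

print(first_pass, second_pass, third_pass)
```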
An alternative would be to simply do this:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
It's hard to tell if your regexes are OK without more than one line of sample input data.
Are you sure you need all of the regular expressions? You seem to be parsing a list of paths and filenames. The path could be split up using a split command, for example:
print "./abc123a1/type/1ab2_a_data_type.file.type".split("/")
Would give:
['.', 'abc123a1', 'type', '1ab2_a_data_type.file.type']
You could then create a tuple consisting of the second entry and the part of the fourth entry up to the first '_', e.g.
('abc123a1', '1ab2')
This could then be used to print only the first entry from each:
import csv
import sys

pairs = set()
# 'wb' is for Python 2; on Python 3 use open('extracted.csv', 'w', newline='')
with open(sys.argv[1], 'r') as in_file, open('extracted.csv', 'wb') as out_file:
    writer = csv.writer(out_file)
    for row in in_file:
        folders = row.split("/")
        col_a = folders[1]
        col_b = folders[3].split("_")[0]
        if (col_a, col_b) not in pairs:
            pairs.add((col_a, col_b))
            writer.writerow([col_a, col_b])
So for an input looking like this:
./abc123a1/type/1ab2_a_data_type.file.type
./abc123a1/type/1ab2_b_data_type.file.type
./abc123a2/type/1ab2_a_data_type.file.type
./abc123a3/type/1ab2_a_data_type.file.type
You would get a CSV file looking like:
abc123a1,1ab2
abc123a2,1ab2
abc123a3,1ab2
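For readers on Python 3, the same idea might look like the sketch below. The function name extract_pairs is hypothetical, the sample lines are the ones shown above, and skipping lines with fewer than four path components is an added assumption about how to treat malformed input.

```python
import csv
import io


def extract_pairs(lines):
    """Yield unique (folder, prefix) pairs in first-seen order."""
    seen = set()
    for line in lines:
        parts = line.strip().split("/")
        if len(parts) < 4:
            continue  # assumption: skip blank or unexpected lines
        pair = (parts[1], parts[3].split("_")[0])
        if pair not in seen:
            seen.add(pair)
            yield pair


sample = [
    "./abc123a1/type/1ab2_a_data_type.file.type\n",
    "./abc123a1/type/1ab2_b_data_type.file.type\n",
    "./abc123a2/type/1ab2_a_data_type.file.type\n",
    "./abc123a3/type/1ab2_a_data_type.file.type\n",
]

out = io.StringIO()  # stands in for open('extracted.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(["column a", "column b"])
writer.writerows(extract_pairs(sample))
print(out.getvalue())
```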
I have a file that I need to write certain contents to a new file.
The current contents is as follows:
send from #1373846594 to pool/10.0.68.61#1374451276 estimated size is 7.83G
send from #1374451276 to pool/10.0.68.61#1375056084 estimated size is 10.0G
I need the new file to show:
#1373846594 --> pool/10.0.68.61#1374451276 --> 7.83G
#1374451276 --> pool/10.0.68.61#1375056084 --> 10.0G
I have tried:
with open("file", "r") as drun:
    for _,_,snap,_,pool_,_,_,size in zip(*[iter(drun)]*9):
        drun.write("{0}\t{1}\t{2}".format(snap,pool,size))
I know I am either way off or just not quite there but I am not sure where to go next with this. Any help would be appreciated.
You want to split your lines using str.split(), and you'll need to write to another file first, then move that back into place; reading and writing to the same file is tricky and should be avoided unless you are working with fixed record sizes.
However, the fileinput module makes in-place file editing easy enough:
import fileinput
for line in fileinput.input(filename, inplace=True):
    components = line.split()
    snap, pool, size = components[2], components[4], components[-1]
    print '\t'.join((snap,pool,size))
The print statement writes to sys.stdout, which fileinput conveniently redirects when inplace=True is set. This means you are writing to the output file (that replaces the original input file), writing a bonus newline on every loop too.
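The redirection described above can be tried out on a throwaway file. This is a Python 3 sketch (so print is a function here); the temporary file and its two sample lines are only there to make the example self-contained.

```python
import fileinput
import os
import tempfile

# Create a throwaway input file with the two sample lines from the question.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("send from #1373846594 to pool/10.0.68.61#1374451276 estimated size is 7.83G\n")
    f.write("send from #1374451276 to pool/10.0.68.61#1375056084 estimated size is 10.0G\n")

with fileinput.input(files=[path], inplace=True) as stream:
    for line in stream:
        parts = line.split()
        # While inplace=True is active, print writes into the file, not the console.
        print("\t".join((parts[2], parts[4], parts[-1])))

with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```

Once the with block ends, standard output is restored, so the final print goes to the console again.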
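# file and outfile hold the input and output paths
inf = open(file)
outf = open(outfile,'w')
for line in inf:
    parts = line.split()
    # add the spaces around --> and the newline that the desired output needs
    outf.write("{0} --> {1} --> {2}\n".format(parts[2], parts[4], parts[8]))
inf.close()
outf.close()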
Perhaps something simple using a regex pattern match:
import re

with open('output_file', 'w') as outFile:
    for line in open('input_file'):
        line = line.split()
        our_patterns = [i for i in line if re.search('^#', i) or \
                        re.search('^pool', i) or \
                        re.search('G$', i)]
        outFile.write(' --> '.join(our_patterns) + '\n')
The pattern matching will extract any parts that begin with # or pool, as well as the final size that ends with G. These parts are then joined with the --> and written to file. Hope this helps
SOURCE, DESTINATION, SIZE = 2, 4, 8
with open('file.txt') as drun, open('log.txt', 'a') as log:  # open the log once, not once per line
    for line in drun:
        pieces = line.split()
        print(pieces[SOURCE], pieces[DESTINATION], pieces[SIZE], sep=' --> ', file=log)