How do I remove lines that start with ">" from a txt file?
For example, the txt file has 250k+ lines, and the code below takes quite some time.
data = ""
with open(fileName) as f:
    for line in f:
        if ">" not in line:
            line = line.replace("\n", "")
            data += line
An example of the txt file is:
> version 1.0125 revision 0... # This is the line to be removed
some random line 1
some random line 2
> version 1.0126 revision 0... # This is the line to be removed
...
I have tried using data = f.read(). It is instant, but the data still contains the lines that start with ">".
Any help is appreciated. Thank you :)
Not knowing what you want to do with the data afterwards, this should be fast and correct:
with open(fileName) as f:
    data = "".join(line for line in f if not line.startswith(">"))
If you just want to remove these lines from the file, I would honestly not do it in Python, but in your shell directly, e.g. on Linux:
$ grep -v '^>' original_file.txt >fixed_file.txt
If you insist on Python, do it on a line-by-line basis:
with open(original_file) as f:
    with open(new_file, "w") as g:
        for line in f:
            if not line.startswith(">"):
                g.write(line)
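To illustrate why the startswith(">") test matters, here is a minimal sketch on in-memory sample lines. The helper name drop_marked_lines and the sample data are assumptions made up for this illustration; the sketch also strips newlines the way the question's code does, which the join-based answer above does not.

```python
def drop_marked_lines(lines, marker=">"):
    """Join the lines that do not start with `marker`, dropping their newlines."""
    return "".join(
        line.rstrip("\n") for line in lines if not line.startswith(marker)
    )

# Hypothetical sample data, modelled on the example in the question
sample = [
    "> version 1.0125 revision 0\n",
    "some random line 1\n",
    "value > threshold\n",  # contains ">" but does not start with it
    "> version 1.0126 revision 0\n",
]

result = drop_marked_lines(sample)
print(result)
```

A check such as `">" not in line` would also have dropped the third sample line, because it looks for ">" anywhere in the line rather than at the start.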
Use two files, one for reading, second for appending:
with open(fileName, 'r') as f, open(fileName.replace('.txt', '_1.txt'), 'a+') as df:
    for line in f.readlines():
        if not line.startswith('>'):
            df.write(line)
I am trying to do what for many will be a very straightforward thing but for me is just infuriatingly difficult.
I am trying to search for a line in a file that contains certain words or phrases and modify that line...that's it.
I have been through the forum and suggested similar questions and have found many hints but none do just quite what I want or are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################
def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)

wrline()
My problem is that the below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
An extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
def wrline():
    lines = open('file.txt', 'r').readlines()
    f2 = open('file.txt', 'w')
    for line in lines:
        if '1st_word' in line and '2nd_word' in line:
            f2.write((line.replace('# ', ' ')).rstrip('\n'))
        else:
            if line != '\n':
                f2.write(line)
    f2.close()
I would keep read and write operations separate.
# read
with open(file, 'r') as f:
    lines = f.readlines()
# parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
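One reason to keep reading and writing separate: a handle opened with "r+" overwrites characters in place, starting at offset 0, and never shifts or truncates what follows. The symptoms in the question (an extra blank line, or a stray trailing "d") are consistent with a shorter line being written over a longer one. The sketch below reproduces that on a throwaway temporary file; the helper name overwrite_start and the sample strings are assumptions for illustration only.

```python
import os
import tempfile

def overwrite_start(original, replacement):
    """Write `replacement` at offset 0 of a file holding `original`; return the result."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" turns off newline translation so character counts stay exact
        with open(path, "w", newline="") as f:
            f.write(original)
        with open(path, "r+", newline="") as f:
            f.write(replacement)  # overwrites in place; the tail is left untouched
        with open(path, newline="") as f:
            return f.read()
    finally:
        os.remove(path)

original = "#\tab\n#\tcd\n"

# The replacement is one character shorter, so the old "\n" survives as a blank line
with_newline = overwrite_start(original, "\tab\n")

# With the newline stripped as well, the old final "b" survives instead
without_newline = overwrite_start(original, "\tab")

print(repr(with_newline))
print(repr(without_newline))
```

Reading all lines first and then reopening the file with "w", as in the answer above, avoids this because "w" truncates the file before anything is written.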
You have not closed the files, and there is no need for the \t.
Also get rid of the rstrip().
Read in the file, replace the data, and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []
# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()
# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    new_data.append(line)
# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
I have a large 11 GB .txt file with email addresses. I would like to save only the part of each line up to the # symbol, one per line. My output only contains the first line. I reused this code from an earlier project. I would like to save the output in a different .txt file. I hope someone can help me out.
my code:
import re

def get_html_string(file,start_string,end_string):
    answer="nothing"
    with open(file, 'rb') as open_file:
        for line in open_file:
            line = line.rstrip()
            if re.search(start_string, line) :
                answer=line
                break
    start=answer.find(start_string)+len(start_string)
    end=answer.find(end_string)
    #print(start,end,answer)
    return answer[start:end]

beginstr=''
end='#'
file='test.txt'
readstring=str(get_html_string(file,beginstr,end))
print readstring
Your file is quite big (11G) so you shouldn't keep all those strings in memory. Instead, process the file line by line and write the result before reading next line.
This should be efficient:
with open('test.txt', 'r') as input_file:
    with open('result.txt', 'w') as output_file:
        for line in input_file:
            prefix = line.split('#')[0]
            output_file.write(prefix + '\n')
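One edge case with the loop above: if a line contains no '#', split('#')[0] returns the whole line including its newline, so adding '\n' produces a blank line in the output. A possible variant, sketched here on in-memory streams, strips the line ending first and uses str.partition. The function name write_prefixes and the sample addresses are assumptions for illustration; with real files you would pass the two open file objects instead.

```python
import io

def write_prefixes(src, dst, sep="#"):
    """Stream `src` line by line, writing the text before `sep` to `dst`."""
    for line in src:
        # Strip the line ending first so lines without `sep` do not keep it
        prefix = line.rstrip("\r\n").partition(sep)[0]
        dst.write(prefix + "\n")

# Hypothetical sample input; the last line has no separator
src = io.StringIO("user#google.com\nuser2#jshds.com\nno-separator-here\n")
dst = io.StringIO()
write_prefixes(src, dst)
print(dst.getvalue())
```

Like the original answer, this holds only one line in memory at a time, which is what matters for an 11 GB input.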
If your file looks like this example:
user#google.com
user2#jshds.com
Useruser#jsnl.com
You can use this:
def get_email_name(file_name):
    with open(file_name) as file:
        lines = file.readlines()
    result = list()
    for line in lines:
        result.append(line.split('#')[0])
    return result

get_email_name('emails.txt')
Out:
['user', 'user2', 'Useruser']
I'm new to Python and I want my code to read a large CSV file line by line and, if the experiment ID is a specific number, write that line to another CSV file. The only problem is that it writes the first instance and then stops. Any suggestions? Thanks.
out = open('new.csv', 'w')
with open('exp.csv','r') as w:
    header =w.readline()
    out.write(header)
    for line in w:
        line = line.strip("\n")
        tokens = line.split(",")
        exp_id = tokens[0]
        if (exp_id=='2243920414'):
            out.write(line)
            continue
out.close()
You can just remove the line = line.strip('\n') and it works fine.
If you remove the newline, you just append all of the matching lines onto a single line which in your eyes looked like it had only matched once.
with open('exp.csv','r') as w, open('new.csv', 'w') as out:
    header = w.readline()
    out.write(header)
    for line in w:
        tokens = line.split(",")
        exp_id = tokens[0]
        if (exp_id=='2243920414'):
            out.write(line)
You should also check the csv module.
It's a great module for everything related to csv files.
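As a rough idea of what the csv module version could look like, here is a minimal sketch using csv.reader and csv.writer on in-memory streams. The function name filter_rows, the column names, and the sample rows are assumptions for illustration; with real files, the csv documentation recommends opening them with newline="".

```python
import csv
import io

def filter_rows(src, dst, exp_id):
    """Copy the header plus every row whose first column equals `exp_id`."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)
    kept = 0
    for row in reader:
        if row and row[0] == exp_id:
            writer.writerow(row)  # the writer adds the line ending itself
            kept += 1
    return kept

# Hypothetical sample data standing in for exp.csv
src = io.StringIO("exp_id,value\n2243920414,a\n111,b\n2243920414,c\n")
dst = io.StringIO()
count = filter_rows(src, dst, "2243920414")
print(count)
print(dst.getvalue())
```

Because the writer terminates each row itself, the missing-newline problem from the question cannot occur, and quoted fields that contain commas are parsed correctly.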
The global variable originalInfo contains
Joe;Bloggs;j.bloggs#anemail.com;0715491874;1
I have written a function to delete that line in a text file containing more information of this type. It works, but it is really clunky and inelegant.
import os

f = open("input.txt",'r') # Input file
t = open("output.txt", 'w') #Temp output file
for line in f:
    if line != originalInfo:
        t.write(line)
f.close()
t.close()
os.remove("input.txt")
os.rename('output.txt', 'input.txt')
Is there a more efficient way of doing this? Thanks
Your solution nearly works, but you need to take care of the trailing newline. This is a slightly shorter version that does what you intend:
import shutil
with open("input.txt",'r') as fin, open("output.txt", 'w') as fout:
    for line in fin:
        if line.strip() != originalInfo:
            fout.write(line)
shutil.move('output.txt', 'input.txt')
The strip() is a bit extra effort but would strip away extra white space.
Alternatively, you could do:
originalInfo += '\n'
and later in the loop:
if line != originalInfo:
You can open the file, read it with readlines(), close it, and then open it again for writing. This way you don't have to create an output file:
with open('input.txt') as file:
    lines = file.readlines()
with open('input.txt', 'w') as file:
    for line in lines:
        if line != originalInfo:
            file.write(line)
But if you want to have an output:
with open('input.txt') as input:
    with open('output.txt', 'w') as output:
        for line in input:
            if line != originalInfo:
                output.write(line)
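If the goal is to replace the original file safely, one possible variant writes to a temporary file in the same directory and then swaps it in with os.replace, so a crash midway does not leave a half-written input file. This is only a sketch: the function name remove_line and the extra "Ann" record are assumptions made up for the demo, and the comparison ignores the trailing newline as discussed above.

```python
import os
import tempfile

def remove_line(path, target):
    """Rewrite `path` without the lines equal to `target` (trailing newline ignored)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                if line.rstrip("\n") != target:
                    tmp.write(line)
        os.replace(tmp_path, path)  # swaps the new file in, overwriting the old one
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on any failure
        raise

# Demo on a throwaway file; the "Ann" record is made-up sample data
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "input.txt")
with open(demo_path, "w") as f:
    f.write("Ann;Other;a.other#anemail.com;0700000000;2\n")
    f.write("Joe;Bloggs;j.bloggs#anemail.com;0715491874;1\n")

remove_line(demo_path, "Joe;Bloggs;j.bloggs#anemail.com;0715491874;1")

with open(demo_path) as f:
    remaining = f.read()
print(remaining)
```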
I am writing a code in Python for searching a string in a huge text file which will occur every 10-15 lines and copying its next line in another text file. I am a beginner in Python so not sure what would be best to do this. I am trying by using the below script:
name = raw_input('Enter file:')
with open(name) as f:
    with open("output.txt", "w") as f1:
        for line in f:
            if "IDENTIFIER" in line:
                f1.write(line)
After this what I need in output file is the entire next line after this string is found.
something like line+1 which I suppose is not available in Python.
How can I jump to the next line and write that line to the output file after the text IDENTIFIER is found?
with open("file_in.txt") as f:
    with open("file_out.txt","w") as f2:
        for line in f:
            if "my_test" in line:
                f2.write(line.rstrip("\n")+next(f)) # the next line :P
You can use a flag variable:
flag = False
name = raw_input('Enter file:')
with open(name) as f:
    with open("output.txt", "w") as f1:
        for line in f:
            if flag:
                f1.write(line + '\n')
                flag = False
            if "IDENTIFIER" in line:
                f1.write(line)
                flag = True
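A compact variant of the next() idea, sketched as a generator that yields only the line following each match. The name lines_after and the sample lines are assumptions for illustration. Passing a default to next() avoids a StopIteration error when the marker appears on the very last line.

```python
def lines_after(lines, marker):
    """Yield the line that directly follows each line containing `marker`."""
    it = iter(lines)
    for line in it:
        if marker in line:
            following = next(it, None)  # None if the match was the last line
            if following is not None:
                yield following

# Hypothetical sample input; the final marker has no following line
sample = [
    "header\n",
    "IDENTIFIER block 1\n",
    "payload 1\n",
    "other\n",
    "IDENTIFIER block 2\n",
]

found = list(lines_after(sample, "IDENTIFIER"))
print(found)
```

An open file can be passed in place of the list, and each yielded line can be written to the output file as it arrives. Note that if two marker lines are adjacent, the second is consumed as the "next line" of the first and is not checked as a marker itself.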