How do you delete a specific line in a text file with Python

How do you delete a specific line from a text file in Python when the line starts with a certain word? For example, if I have the list:
cat | apples
bee | orange
dog | bananas
How do I delete the whole line that starts with 'bee'?
I tried something like del line in a for loop, but it didn't work.

You're probably confusing the del statement with actually modifying a file. del only removes a name or item from memory; deleting data that was read from a file never changes the file on disk. You'll want to read the lines into memory, keep only the ones you need, then write them back to the file. If the lines you need fit into memory, this will work:
lines = []
with open('test.txt') as file:
    for line in file:
        if not line.startswith('bee'):
            lines.append(line)
with open('test.txt', 'w') as file:
    file.writelines(lines)
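If the file is too large to hold in memory, one option is to stream it through a temporary file and then swap that file into place. This is only a sketch under stated assumptions: the helper name `delete_lines_starting_with` is made up for illustration and is not part of the answer above.

```python
import os
import tempfile


def delete_lines_starting_with(path, prefix):
    """Rewrite `path` without the lines that start with `prefix`.

    Hypothetical helper: it streams line by line, so only one line
    is held in memory at a time.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                if not line.startswith(prefix):
                    dst.write(line)
        os.replace(tmp_path, path)  # swap the filtered copy into place
    except BaseException:
        os.remove(tmp_path)  # do not leave the temporary file behind
        raise
```

Calling `delete_lines_starting_with('test.txt', 'bee')` would drop the 'bee' line from the example. Note that the replaced file takes on the temporary file's permissions, which may differ from the original's.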

I would suggest the following:
Read the content of the file into a Python list:
with open('file_name', 'r') as f:
    data = f.readlines()
Filter out the lines that start with "bee":
data = [l for l in data if not l.startswith('bee')]
Write data back to the file.
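Put together, the three steps might look like the sketch below. One caveat worth knowing: `startswith('bee')` also matches a line such as `beetle | kiwi`, so this version compares the whole first field instead. The function name `drop_rows_with_key` and the `|` separator default are assumptions made for illustration.

```python
def drop_rows_with_key(path, key, sep="|"):
    """Remove lines whose first field equals `key` exactly.

    Hypothetical helper; returns the number of lines removed.
    """
    # Step 1: read the file into a list of lines.
    with open(path) as f:
        data = f.readlines()
    # Step 2: keep only the lines whose first field is not `key`.
    kept = [line for line in data if line.split(sep, 1)[0].strip() != key]
    # Step 3: write the remaining lines back to the same file.
    with open(path, "w") as f:
        f.writelines(kept)
    return len(data) - len(kept)
```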

Related

Extracting the data from the same position over multiple lines in a string

Fairly simple question, but I can't figure out where I'm going wrong. I have a text file which I have split into multiple lines. I want to print a certain location from each line, characters 14 to 20, but when I run the code below it prints a blank set of characters.
with open('filetxt', 'r') as file:
    data = file.read().rstrip()
    for line in data:
        print(line[14:20])
If you want to read the file line by line, try:
with open('filetxt', 'r') as file:
    for line in file:
        print(line[14:20])
I think you're using the wrong read method. read() returns the whole file as a single string, so the for loop iterates over individual characters and line[14:20] is always empty. You might want to use readlines(), which returns a list of the lines, i.e.:
with open('filetxt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line[14:20])
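The difference is easy to check on an in-memory string. A small sketch, where the sample text is made up for illustration:

```python
# Two fixed-width sample lines; characters 14 to 19 hold the field of interest.
text = "0123456789abcdTARGET-rest\n0123456789abcdSECOND-rest\n"

# Iterating over a string yields one character at a time,
# so slicing each "line" at [14:20] gives an empty string.
chars = [piece[14:20] for piece in text.rstrip()]

# splitlines() yields whole lines, so the slice picks out the field.
fields = [line[14:20] for line in text.splitlines()]

print(set(chars))  # only the empty string
print(fields)
```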

Using a for loop to add a new line to a table: python

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file which contains the sequence description, the start location of the first regex and the end location of the second regex. I know that the regex section works, it's just creating the \t separated file I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
    for line in file_object:
        print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
    file_object.close()
But this doesn't seem to work (creates empty file). I have also tried to use file_object.write, but again this creates an empty file too.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
infile = sys.argv[1]
for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)
To write a line to a file, you would do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want it tab-separated, you can also add sep="\t". This is why Python 3 made print a function: so you can use the sep, end, file, and flush keyword arguments. :)
Opening a file for appending means the file position starts at the end of the file, so writing to it doesn't overwrite any data (it gets appended to the end of the file), and iterating over it (or otherwise reading from it) yields nothing, as if you had already reached the end of the file.
So instead of iterating over the lines of the file, just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
    print(sequence.description, h.start(), h_rc.end(), sep="\t", file=file_object)
You can also consider opening the file once, before the loop, since opening it once and writing multiple times is more efficient than opening it multiple times. Also, the with block automatically closes the file, so there is no need to do that explicitly.
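The append-mode behaviour described here can be checked with a throwaway file. A minimal sketch; the temporary path and the sample values are assumptions for illustration:

```python
import os
import tempfile

# Throwaway file for the demo, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "hits.bed")

with open(path, "w") as f:
    f.write("existing\n")

with open(path, "a+") as f:
    # The position starts at the end of the file, so there is nothing to read.
    lines_seen = list(f)
    # Writing appends after the existing content; sep="\t" separates the fields.
    print("seq1", 2, 208, sep="\t", file=f)

with open(path) as f:
    content = f.read()

print(lines_seen)
print(repr(content))
```

The loop `for line in file_object:` in the question therefore never runs its body, which is why nothing was written.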
You are trying to open the file in "a+" mode and loop over lines from it, which will not find anything because the file is positioned at the end when you do that. In any case, if this is an output file only, then you would open it in "a" mode to append to it.
You probably just want to open the file once for appending and, inside the with statement, run your main loop, using file_object.write(...) whenever you want to append a string to the file. Note that there is no need for file_object.close() when using the with construct.
with open("Mimp_hits.bed", "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
        # ... etc per original code ...
                    if length < mimp_length:
                        file_object.write("{}\t{}\t{}\n".format(
                            sequence.description, h.start(), h_rc.end()))
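For a version that can be tried without Biopython installed, here is a sketch that takes plain (description, sequence) pairs and writes to any open file object. The function name `write_hits`, the `records` argument, and the short synthetic sequence are assumptions for illustration; the two regular expressions and the length check come from the question.

```python
import io
import re

FORWARD = re.compile(r"CAGTGGG..GCAA[TA]AA")
REVERSE = re.compile(r"TT[TA]TTGC..CCCACTG")


def write_hits(records, out, max_length=400):
    """Write one tab-separated line per (forward, reverse) hit pair.

    `records` is an iterable of (description, sequence) pairs, standing in
    for the records that SeqIO.parse would yield. Returns the line count.
    """
    count = 0
    for description, seq in records:
        for h in FORWARD.finditer(seq):
            for h_rc in REVERSE.finditer(seq):
                length = h_rc.end() - h.start()
                if 0 < length < max_length:
                    out.write(f"{description}\t{h.start()}\t{h_rc.end()}\n")
                    count += 1
    return count


# Synthetic sequence: forward motif at index 2, reverse motif ending at index 42.
demo_seq = "TA" + "CAGTGGGATGCAATAA" + "ACGTACGT" + "TTATTGCATCCCACTG" + "TA"
buffer = io.StringIO()
write_hits([("seq1", demo_seq)], buffer)
print(buffer.getvalue(), end="")
```

With Biopython, the pairs could be built as `((rec.description, str(rec.seq)) for rec in SeqIO.parse(infile, "fasta"))` and `out` could be the file opened once with `open("Mimp_hits.bed", "a")`.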

Edit and save file

I need to edit my file and save it so that I can use it for another program. First I need to put "," in between every word and add a word at the end of every line.
In order to put "," in between every word, I used this code:
for line in open('myfile','r+'):
    for word in line.split():
        new = ",".join(map(str,word))
        print(new)
I'm not too sure how to overwrite the original file or maybe create a new output file for the edited version. I tried something like this:
with open('myfile','r+') as f:
    for line in f:
        for word in line.split():
            new = ",".join(map(str,word))
            f.write(new)
The output is not what I wanted (it differs from what print(new) shows).
Second, I need to add a word at the end of every line. So I tried this:
source = open('myfile','r')
output = open('out','a')
output.write(source.read().replace("\n", "yes\n"))
The code to add the new word works perfectly. But I was thinking there should be an easier way to open a file, do both edits in one go, and save it. I'm not too sure how. I've spent a tremendous amount of time trying to figure out how to overwrite the file, and it's about time I asked for help.
Here you go:
source = open('myfile', 'r')
output = open('out','w')
output.write('yes\n'.join(','.join(line.split()) for line in source.read().split('\n')))
One-liner:
open('out', 'w').write('yes\n'.join(','.join(line.split()) for line in open('myfile', 'r').read().split('\n')))
Or more legibly:
source = open('myfile', 'r')
processed_lines = []
for line in source:
    # split() discards the trailing newline, so add it back along with the extra word
    line = ','.join(line.split()) + 'yes\n'
    processed_lines.append(line)
output = open('out', 'w')
output.write(''.join(processed_lines))
EDIT
Apparently I misread everything, lol.
# It looks like you are writing the word yes to all of the lines, then splitting
# each word into letters and listing each word's letters on their own line?
source = open('myfile','r')
output = open('out','w')
for line in source:
    for word in line.split():
        new = ",".join(word)
        print(new, file=output)
    print('y,e,s', file=output)
How big is this file?
Maybe you could create a temporary list containing everything from the file you want to edit, where every element represents one line.
Editing a list of strings is pretty simple.
After your changes, you can open your file again with
writable = open('configuration', 'w')
and then write each changed line to the file with
writable.write(currentLine + '\n')
Hope that helps, even a little bit. ;)
For the first problem, you could read all the lines in f before overwriting f, assuming f is opened in 'r+' mode. Append all the results into a string, then execute:
f.seek(0) # reset file pointer back to start of file
f.write(new) # new should contain all concatenated lines
f.truncate() # get rid of any extra stuff from the old file
f.close()
For the second problem, the solution is similar: Read the entire file, make your edits, call f.seek(0), write the contents, f.truncate() and f.close().
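Both edits can be combined in one pass with this read, seek, write, truncate pattern. A minimal sketch, assuming the appended word should be comma-separated like the other words (the question does not say); the function name `rewrite_in_place` is hypothetical:

```python
def rewrite_in_place(path, suffix="yes"):
    """Join the words of each line with commas and append `suffix`.

    Hypothetical helper; it reads the whole file before overwriting it.
    """
    with open(path, "r+") as f:
        lines = f.read().splitlines()
        new = "".join(
            ",".join(line.split() + [suffix]) + "\n" for line in lines
        )
        f.seek(0)      # reset the file position to the start of the file
        f.write(new)   # overwrite the old content
        f.truncate()   # drop leftovers if the new content is shorter
```

The `truncate()` call matters whenever the rewritten text is shorter than the original, for example when runs of several spaces collapse into single commas.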

Remove whitespaces in the beginning of every string in a file in python?

How to remove whitespaces in the beginning of every string in a file with python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through every line of the file and remove all the whitespace at the beginning of each line.
I've tried using lstrip, but that's not working for multiple lines, and neither is readlines().
Would using a for loop help?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the leading whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
    lines = ''.join(line_lst)
print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword (in 2.5 it also needs from __future__ import with_statement). For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above.
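One detail to be aware of: with no argument, `lstrip()` strips newlines too, so a line that contains only whitespace becomes an empty string and the blank line disappears from the output. If blank lines should survive, a possible variant strips only spaces and tabs. This is a sketch; the generator name `strip_leading_whitespace` and the sample list are made up for illustration.

```python
def strip_leading_whitespace(lines):
    """Yield each line without leading spaces or tabs, keeping blank lines."""
    for line in lines:
        yield line.lstrip(" \t")


sample = ["  Amazon.inc\n", "Arab emirates\n", "\n", " Zynga\n"]

plain = [line.lstrip() for line in sample]     # the blank line becomes ''
kept = list(strip_leading_whitespace(sample))  # the blank line stays '\n'

print(plain)
print(kept)
```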

How to delete parts of a file in python?

I have a file named a.txt which looks like this:
I'm the first line
I'm the second line.
There may be more lines here.
I'm below an empty line.
I'm a line.
More lines here.
Now, I want to remove the contents above the empty line (including the empty line itself).
How could I do this in a Pythonic way?
Basically you can't delete stuff from the beginning of a file, so you will have to write to a new file.
I think the pythonic way looks like this:
# get an iterator over the lines in the file:
with open("input.txt", 'rt') as lines:
    # while the line is not empty drop it
    for line in lines:
        if not line.strip():
            break
    # now lines is at the point after the first paragraph
    # so write out everything from here
    with open("output.txt", 'wt') as out:
        out.writelines(lines)
Here are some simpler versions of this, without the with statement, for older Python versions:
lines = open("input.txt", 'rt')
for line in lines:
    if not line.strip():
        break
open("output.txt", 'wt').writelines(lines)
and a very straightforward version that simply splits the file at the empty line:
# first, read everything from the old file
text = open("input.txt", 'rt').read()
# split it at the first empty line ("\n\n")
first, rest = text.split('\n\n',1)
# make a new file and write the rest
open("output.txt", 'wt').write(rest)
Note that this can be pretty fragile. For example, Windows often uses \r\n as a single line break, so an empty line would be \r\n\r\n instead. But often you know that the file uses only one kind of line break, so this could be fine.
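If both kinds of line break need to be handled, a regular expression split is one possibility. A sketch under stated assumptions: the function name `after_first_blank_line` is hypothetical, and returning an empty string when there is no blank line is a choice made here (the `split` version above would raise a ValueError on unpacking instead).

```python
import re


def after_first_blank_line(text):
    """Return the text after the first blank line, or '' if there is none."""
    # A blank line is a line break, optional spaces or tabs, then another line break.
    parts = re.split(r"\r?\n[ \t]*\r?\n", text, maxsplit=1)
    return parts[1] if len(parts) == 2 else ""


print(repr(after_first_blank_line("a\nb\n\nc\n")))
print(repr(after_first_blank_line("a\r\nb\r\n\r\nc\r\n")))
```

In Python 3, opening a file in text mode with the default newline setting already translates \r\n to \n on reading, so this mainly matters when the text comes from elsewhere or newline translation is turned off.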
Naive approach, iterating over the lines in the file one by one, top to bottom:
#!/usr/bin/env python
with open("4692065.txt", 'r') as src, open("4692065.cut.txt", "w") as dest:
    keep = False
    for line in src:
        if keep: dest.write(line)
        if line.strip() == '': keep = True
The fileinput module (from the standard library) is convenient for this kind of thing. It sets things up so you can act as though you are editing the file "in-place":
import fileinput
import sys
fileobj = iter(fileinput.input(['a.txt'], inplace=True))
# iterate through the file until you find an empty line.
for line in fileobj:
    if not line.strip():
        break
# Iterators (like `fileobj`) pick up where they left off.
# Starting a new for-loop saves you one `if` statement and boolean variable.
for line in fileobj:
    sys.stdout.write(line)
Any idea how big the file is going to be?
You could read the file into memory:
f = open('your_file', 'r')
lines = f.readlines()
which will read the file line by line and store those lines in a list (lines).
Then, close the file and reopen it with 'w':
f.close()
f = open('your_file', 'w')
for line in lines:
    if your_if_here:
        f.write(line)
f.close()
This will overwrite the current file. Then you can pick and choose which lines from the list you want to write back in. It is probably not a very good idea if the file gets too large, though, since the entire file has to reside in memory. But it doesn't require that you create a second file to dump your output.
from itertools import dropwhile, islice
def content_after_emptyline(file_object):
    return islice(dropwhile(lambda line: line.strip(), file_object), 1, None)
with open("filename") as f:
    for line in content_after_emptyline(f):
        print(line, end='')
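The answer above comes without an explanation, so here is how the dropwhile/islice combination behaves on an in-memory list. The sample lines are made up for illustration:

```python
from itertools import dropwhile, islice

lines = ["first\n", "second\n", "\n", "below\n", "\n", "more\n"]

# dropwhile skips lines for as long as they are non-blank, so its output
# starts at the first blank line; islice(..., 1, None) then skips that
# blank line itself and passes everything after it through unchanged.
after = list(islice(dropwhile(lambda line: line.strip(), lines), 1, None))

print(after)
```

Only the first blank line acts as the cut point; later blank lines are kept as ordinary content.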
You could do a little something like this:
with open('a.txt', 'r') as file:
    lines = file.readlines()
blank_line = lines.index('\n')  # index of the first blank line
lines = lines[blank_line+1:]  # keep only the lines after it
with open('a.txt', 'w') as file:
    file.writelines(lines)  # the lines already end with '\n'
and that makes the job much simpler.
