Remove line from a text file after read - python

I have a text file named 1.txt which contains the following:
123456
011111
02222
03333
I have written Python code that copies the first line into a file named number.txt in each of x folders, then copies the second line into the next x folders, and so on:
import math
progs = int(raw_input( "Folders Number : "))
with open('1.txt', 'r') as f:
    progs2 = f.read().splitlines()
progs3 = int(raw_input( "Copy time for each line : "))
for i in xrange(progs):
    splis = int(math.ceil(float(progs)/len(progs3)))
    with open("{0}/number.txt".format(pathname),'w') as fi:
        fi.write(progs2[i/splis])
I want to change the code so that each line is removed from the file once it has been copied to the specified number of folders. For example, once the code has copied the number 123456, I want it deleted from the file, so that the next time I run the program it continues from the second number.
Any idea about the code?

I'd like to write this as a comment, but I don't have the necessary points to do that, so I'll write an answer instead. Adding to Darren Ringer's answer:
after reading the line, you could close the file and open it again, overwriting
it with the old content except for the line you want to remove,
which has already been described in this answer:
Deleting a specific line in a file (python)
Another option is in-place filtering, using the same filename
for your output, which replaces your old file with the filtered content. This
is essentially the same approach; you just don't have to open and close the file again yourself.
This has also already been answered by 1_CR in the following question and can also
be found at https://docs.python.org/ (Optional in-place filtering section):
Deleting a line from a text file
Adapted to your case it would look something like this:
import fileinput
import sys, os
os.chdir('/Path/to/your/file')
for line_number, line in enumerate(fileinput.input('1.txt', inplace=1)):
    if line_number == 0:
        # do something with the line; while inplace=1 is active, anything
        # written to stdout goes into 1.txt, so keep it (or log to stderr)
        first_line = line
    else:
        sys.stdout.write(line) # Write the remaining lines back to your file
Cheers

You could load all the lines into a list with readlines(); then, when you take a line to work with, remove it from the list and write the list back to the file. You will be rewriting the entire file on every read (rather than just removing the data in place), but as far as I know there is no other way to do it while keeping the file contents up to date.
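A minimal sketch of this readlines() approach for the 1.txt case, assuming each run should take (and remove) only the first line; pop_first_line is a hypothetical helper name, not part of the original code:

```python
def pop_first_line(path):
    """Return the first line of `path` without its newline and rewrite the file without it."""
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        return None  # nothing left to process
    first = lines.pop(0)
    with open(path, "w") as f:  # "w" truncates the file, then the remaining lines go back in
        f.writelines(lines)
    return first.rstrip("\n")
```

Calling pop_first_line('1.txt') at the start of each run gives you the next number to copy, and the file shrinks by one line each time, so a later run continues where the previous one stopped.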

Related

How to modify and overwrite large files?

I want to make several modifications to some lines in the file and overwrite the file. I do not want to create a new file with the changes, and since the file is large (hundreds of MB), I don't want to read it all at once in memory.
datfile = 'C:/some_path/text.txt'
with open(datfile) as file:
    for line in file:
        if line.split()[0] == 'TABLE':
            # if this is true, I want to change the second word of the line
            # something like: line.split()[1] = 'new'
            pass
Please note that an important part of the problem is that the file is big. There are several solutions on the site that address similar problems but do not account for the size of the files.
Is there a way to do this in python?
You can't replace part of a file's contents without rewriting the remainder of the file, regardless of the language you use. Each byte of a file lives at a fixed position on the disk or flash memory. If the text you want to put into the file is shorter or longer than the text it replaces, you will need to move the remainder of the file. If your replacement is longer than the original text, you will probably want to write a new file to avoid overwriting data you haven't read yet.
Given how file I/O works, and the operations you are already performing on the file, making a new file will not be as big a problem as you think. You are already reading the entire file line by line and parsing the content, so doing a buffered write of the replacement data will not be all that expensive.
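To illustrate the one case that does work in place: if the replacement has exactly the same byte length as the original, you can seek to the old bytes and overwrite them without touching the rest of the file. This is a minimal sketch; patch_same_length is a hypothetical name, and it only finds matches that lie within a single line:

```python
def patch_same_length(path, old, new):
    """Overwrite the first occurrence of bytes `old` with `new` in place (equal lengths only)."""
    if len(old) != len(new):
        raise ValueError("an in-place overwrite needs equal-length byte strings")
    with open(path, "r+b") as f:
        offset = 0  # byte offset of the start of the current line
        for line in f:
            idx = line.find(old)
            if idx != -1:
                f.seek(offset + idx)  # jump back to the match...
                f.write(new)          # ...and overwrite just those bytes
                return True
            offset += len(line)
    return False
```

Anything that changes the length, such as replacing 'old' with 'newer', still needs the rewrite-to-a-new-file approach.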
from tempfile import NamedTemporaryFile
from os import remove, replace
from os.path import dirname
datfile = 'C:/some_path/text.txt'
tname = None
try:
    with open(datfile) as file, NamedTemporaryFile(mode='wt', dir=dirname(datfile), delete=False) as output:
        tname = output.name
        for line in file:
            ls = line.split()
            if ls and ls[0] == 'TABLE':
                ls[1] = 'new'
                line = ' '.join(ls) + '\n'
            output.write(line)
except:
    if tname is not None:
        remove(tname)
    raise  # don't silently swallow the error
else:
    # os.replace (Python 3.3+) overwrites datfile even on Windows, where os.rename fails
    replace(tname, datfile)
Passing dir=dirname(datfile) to NamedTemporaryFile should guarantee that the final rename does not have to copy the file from one disk to another in most cases. Using delete=False allows you to do the rename if the operation succeeds. The temporary file is deleted by name if any problem occurs, and renamed to the original file otherwise.

Using a for loop to add a new line to a table: python

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file which contains the sequence description, the start location of the first regex and the end location of the second regex. I know that the regex section works, it's just creating the \t separated file I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
    for line in file_object:
        print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
    file_object.close()
But this doesn't seem to work (creates empty file). I have also tried to use file_object.write, but again this creates an empty file too.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
infile = sys.argv[1]
for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)
To write a line to a file you would do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want it tab-separated, you can also pass sep="\t". Because print is a function in Python 3, it accepts the sep, end, file and flush keyword arguments. :)
Opening a file for appending means the file position starts at the end of the file. Writing to it doesn't overwrite any data (it is appended to the end of the file), and iterating over it (or otherwise reading from it) gives nothing, as if you had already reached the end of the file.
So instead of iterating over the lines of the file you would just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
    print(sequence.description, h.start(), h_rc.end(), sep="\t", file=file_object)
You can also consider opening the file once, before the loop, since opening it once and writing multiple times is more efficient than opening it on every iteration. The with block also closes the file automatically, so there is no need to do that explicitly.
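A small sketch to see the "a+" behaviour described above for yourself (demo_append_plus is just a hypothetical demo function): reading right after opening returns nothing, an explicit seek(0) is needed to read the existing content, and writes still land at the end of the file.

```python
def demo_append_plus(path):
    """Show where the file position sits in 'a+' mode and what reading returns."""
    with open(path, "a+") as f:
        at_open = f.read()     # position starts at end of file, so nothing to read
        f.seek(0)              # rewind explicitly to read the existing content
        after_seek = f.read()
        f.write("third\n")     # in append mode, writes always go to the end
    with open(path) as f:
        final = f.read()
    return at_open, after_seek, final
```

With a file containing "first\nsecond\n", this returns an empty string for the first read, the two existing lines after the seek, and all three lines at the end.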
You are trying to open the file in "a+" mode, and loop over lines from it (which will not find anything because the file is positioned at the end when you do that). In any case, if this is an output file only, then you would open it in "a" mode to append to it.
Probably you just want to open the file once for appending, and inside the with statement, do your main loop, using file_object.write(...) when you want to actually append strings to the file. Note that there is no need for file_object.close() when using this with construct.
with open("Mimp_hits.bed", "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
        # ... etc per original code ...
                    if length < mimp_length:
                        file_object.write("{}\t{}\t{}\n".format(
                            sequence.description, h.start(), h_rc.end()))
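Putting both answers together, a self-contained sketch might look like this. To keep it runnable without Biopython, read_fasta is a hypothetical minimal stand-in for SeqIO.parse that yields (description, sequence) pairs; with Biopython installed you would feed write_hits (also a hypothetical name) pairs of (record.description, str(record.seq)) from SeqIO.parse instead. The output file is opened once, and print(..., sep="\t") produces the tab-separated columns:

```python
import re

# Same patterns as in the question
FORWARD = re.compile(r"CAGTGGG..GCAA[TA]AA")
REVERSE = re.compile(r"TT[TA]TTGC..CCCACTG")


def read_fasta(handle):
    """Minimal FASTA reader yielding (description, sequence); a stand-in for SeqIO.parse."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def write_hits(records, out, mimp_length=400):
    """Write one tab-separated line per forward/reverse hit pair shorter than mimp_length."""
    for description, seq in records:
        for h in FORWARD.finditer(seq):
            for h_rc in REVERSE.finditer(seq):
                length = h_rc.end() - h.start()
                if 0 < length < mimp_length:
                    print(description, h.start(), h_rc.end(), sep="\t", file=out)


# Usage (file names as in the question; each file is opened only once):
# with open(sys.argv[1]) as fasta, open("Mimp_hits.bed", "a") as bed:
#     write_hits(read_fasta(fasta), bed)
```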

python search for string in file return entire line + next line into new text file

I have a very large text file (50,000+ lines) that should always be in the same sequence. In Python I want to search the text file for each of the $INGGA lines and join each one with the subsequent $INHDT line to create a new text file. I need to do this without reading the whole file into memory, as that makes it crash every time. I can find and return the $INGGA line, but I'm not sure of the best way to then get the next line and join them into a new string in a memory-efficient way.
Thanks
Phil
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.05.06 09:11:34 =~=~=~=~=~=~=~=~=~=~=~= > $PRDID,2.15,-0.10,31.87*6E
$INGGA,091124.00,5249.8336,N,00120.9619,W,1,20,0.6,95.0,M,49.4,M,,*50
$INHDT,31.9,T*1E $INZDA,091124.0055,06,05,2016,,*7F
$INVTG,22.0,T,,M,4.4,N,8.1,K,A*24 $PRDID,2.13,-0.06,34.09*6C
$INGGA,091124.20,5249.8338,N,00120.9618,W,1,20,0.6,95.0,M,49.4,M,,*5D
$INHDT,34.1,T*13 $INZDA,091124.2055,06,05,2016,,*7D
$INVTG,24.9,T,,M,4.4,N,8.1,K,A*2B $PRDID,2.16,-0.03,36.24*61
$INGGA,091124.40,5249.8340,N,00120.9616,W,1,20,0.6,95.0,M,49.4,M,,*5A
$INHDT,36.3,T*13 $INZDA,091124.4055,06,05,2016,,*7B
$INVTG,27.3,T,,M,4.4,N,8.1,K,A*22 $PRDID,2.11,-0.05,38.33*68
$INGGA,091124.60,5249.8343,N,00120.9614,W,1,20,0.6,95.1,M,49.4,M,,*58
$INHDT,38.4,T*1A $INZDA,091124.6055,06,05,2016,,*79
$INVTG,29.5,T,,M,4.4,N,8.1,K,A*2A $PRDID,2.09,-0.02,40.37*6D
$INGGA,091124.80,5249.8345,N,00120.9612,W,1,20,0.6,95.1,M,49.4,M,,*56
$INHDT,40.4,T*15 $INZDA,091124.8055,06,05,2016,,*77
$INVTG,31.7,T,,M,4.4,N,8.1,K,A*21 $PRDID,2.09,0.02,42.42*40
$INGGA,091125.00,5249.8347,N,00120.9610,W,1,20,0.6,95.1,M,49.4,M,,*5F
$INHDT,42.4,T*17
You can just read the file line by line and write to a new file.
Like this:
import re
# open the new file for appending and the source file for reading
with open('newfile', 'at') as nf, open('file', 'rt') as f:
    for line in f:
        r = re.match(r'\$INGGA', line)
        if r is not None:
            nf.write(line)
            # next(f) fetches the following ($INHDT) line from the same iterator
            nf.write(next(f, ''))
Mode 'at' appends in text mode, and 'rt' reads in text mode.
I ran it on a 150,000-line file and it worked well!
I suggest using a simple regex that will parse and capture the parts you care about. Here is an example that will capture the piece you care about:
(\$INGGA.*\n\$INHDT.*\n)
https://regex101.com/r/tK1hF0/3
As in my above link, you'll notice that I used the "global" g setting on the regex, telling it to capture all groups that match. Otherwise, it'll stop after the first match.
I also had trouble determining where the actual line breaks exist in your above example file, so you can tweak the above to match exactly where the breaks occur.
Here is some starter python example code:
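import re
with open('file') as f:  # load your file here ('file' is a placeholder name)
    test_str = f.read()  # note: this reads the whole file into memory
p = re.compile(r'(\$INGGA.*\n\$INHDT.*\n)')
matches = p.findall(test_str)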
In the example PuTTY log you gave, it's all on one line, separated by spaces.
So in this case you can use this to replace each space with a newline and get a new file (the \n in the replacement works with GNU sed):
cat large_file | sed 's/ /\n/g' > new_large_file
To iterate over the newline-separated file, run this:
cat new_large_file | python your_script.py
Your script receives the input one line at a time, so your computer should not run out of memory.
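your_script.py:
from __future__ import print_function  # same output on Python 2 and 3
import sys
INGGA_line = ""
for line in sys.stdin:
    line_stripped = line.strip()
    if line_stripped.startswith("$INGGA"):
        INGGA_line = line_stripped  # remember the latest $INGGA line
    elif line_stripped.startswith("$INHDT") and INGGA_line:
        print(INGGA_line, line_stripped)  # join it with the following $INHDT line
        INGGA_line = ""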
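If you'd rather not depend on sed, the same splitting step can be done in Python while still reading lazily, one physical line at a time. This is a sketch; split_sentences is a hypothetical name, and unlike the sed command it keeps only tokens that start with "$", so the PuTTY header words are dropped:

```python
def split_sentences(lines):
    """Yield each '$'-prefixed sentence separately, even when several share one line."""
    for line in lines:  # only one physical line is held in memory at a time
        for token in line.split():  # roughly what sed 's/ /\n/g' does
            if token.startswith("$"):
                yield token
```

Looping over split_sentences(sys.stdin) and printing each sentence reproduces new_large_file on stdout, and the pairing loop from the script above can consume the generator directly instead of sys.stdin.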
This answer is aimed at python 3.
According to this other answer (and the docs), you can iterate your file line-by-line memory-efficiently:
with open(filename, 'r') as f:
    for line in f:
        ...  # process the line
An example of how you could meet the criteria above:
# Target file write-only, source file read-only
with open(targetfile, 'w') as tf, open(sourcefile, 'r') as sf:
    # Flag for whether we are looking for 1st or 2nd part
    look_for_ingga = True
    for line in sf:
        if look_for_ingga:
            if line.startswith('$INGGA,'):
                tf.write(line)
                look_for_ingga = False
        elif line.startswith('$INHDT,'):
            tf.write(line)
            look_for_ingga = True
In the case where you have multiple '$INGGA,' lines before the '$INHDT,', this grabs the first one and disregards the rest. If you want to take only the last '$INGGA,' before the '$INHDT,', store the most recent '$INGGA,' in a variable instead of writing it to disk. Then, when you find your '$INHDT,', write both.
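A sketch of that "last $INGGA wins" variant, assuming the same line-by-line source and target file objects as above (write_last_ingga_pairs is a hypothetical name):

```python
def write_last_ingga_pairs(source, target):
    """For every $INHDT line, write the most recent preceding $INGGA line followed by it."""
    last_ingga = None
    for line in source:
        if line.startswith('$INGGA,'):
            last_ingga = line  # a later $INGGA replaces an earlier one
        elif line.startswith('$INHDT,') and last_ingga is not None:
            target.write(last_ingga)
            target.write(line)
            last_ingga = None  # wait for the next $INGGA before pairing again
```

It is used the same way as the loop above: open targetfile for writing and sourcefile for reading in one with statement, then call write_last_ingga_pairs(sf, tf). Only one line plus the remembered $INGGA line is in memory at any time.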
In case you meant that you want to write to a separate new file for each INGGA-INHDT pair, the target file with-statement should be nested inside for line in sf instead, or the results should be buffered in a list for later storage.
Refer to the docs for introductions to with-statements and file reading/writing.

Writing to the end of specific line in python

I have a text file that contains key value pairs separated by a tab like this:
KEY\tVALUE
I have opened this file in append mode (a+) so I can both read and write. It may happen that a particular key has more than one value. In that case I want to go to that key's line and write the next value beside the original one, separated by some delimiter (e.g. ,).
Here is what I wish to do:
import io
ft = io.open("test.txt",'a+')
ft.seek(0)
for line in ft:
    if (line.split('\t')[0] == "querykey"):
        ft.write(unicode("nextvalue")) #Write the another key value beside the original one
Now there are two problems with it:
I will iterate through the file to see on which line the key is present(Is there a faster way?)
I will write a string to the end of that line.
I would be grateful if I can get help with the second point.
The write function always writes at the end of file. How should I write to the end of a specific line? I have searched and have not got very clear answers as to how to do that
You can read the whole file, make your edit in memory, and write the edited content back to the file.
with open('test.txt') as f:
    lines = f.readlines()
with open('test.txt', 'w') as f:  # open file for write (this truncates it)
    for line in lines:
        if line.split('\t')[0] == "querykey":
            # append the new value at the end of this line, before its newline
            line = line.rstrip('\n') + ',nextvalue\n'
        f.write(line)

Delete a row from a text file with Python

I have a file where each line starts with a number. The user can delete a row by typing in the number of the row the user would like to delete.
The issue I'm having is setting the mode for opening it. When I use a+, the original content is still there. However, tacked onto the end of the file are the lines that I want to keep. On the other hand, when I use w+, the entire file is deleted. I'm sure there is a better way than opening it with w+ mode, deleting everything, and then re-opening it and appending the lines.
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    DeleteItem = raw_input(">") #select a line number to delete
    print "Are You Sure You Want To Delete Number" + DeleteItem + "(y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open(ToDo.filename,"a+") #open the file (tried w+ as well, entire file is deleted)
        FileLines = FILE.readlines() #read and display the lines
        for line in FileLines:
            FILE.truncate()
            if line[0:1] != DeleteItem: #if the number (first character) of the current line doesn't equal the number to be deleted, re-write that line
                FILE.write(line)
    else:
        print "Nothing Deleted"
This is what a typical file may look like
1. info here
2. more stuff here
3. even more stuff here
When you open a file for writing, you clobber the file (delete its current contents and start a new file). You can find this out by reading documentation for the open() command.
When you open a file for appending, you do not clobber the file. But how can you delete just one line? A file is a sequence of bytes stored on a storage device; there is no way for you to delete one line and have all the other lines automatically "slide down" into new positions on the storage device.
(If your data was stored in a database, you could actually delete just one "row" from the database; but a file is not a database.)
So, the traditional way to solve this: you read from the original file, and you copy it to a new output file. As you copy, you perform any desired edits; for example, you can delete a line simply by not copying that one line; or you can insert a line by writing it in the new file.
Then, once you have successfully written the new file, and successfully closed it, if there is no error, you go ahead and rename the new file back to the same name as the old file (which clobbers the old file).
In Python, your code should be something like this:
import os
# "num_to_delete" was specified by the user earlier.
# The sample file numbers its lines as "1. info here", so the
# number is followed by a period and a space.
s_to_delete = str(num_to_delete) + '. '
def want_input_line(line):
    return not line.startswith(s_to_delete)
in_fname = "original_input_filename.txt"
out_fname = "temporary_filename.txt"
with open(in_fname) as in_f, open(out_fname, "w") as out_f:
    for line in in_f:
        if want_input_line(line):
            out_f.write(line)
# On Windows os.rename fails if in_fname already exists; os.replace (Python 3.3+) does not
os.rename(out_fname, in_fname)
Note that if you happen to have a file called temporary_filename.txt it will be clobbered by this code. Really we don't care what the filename is, and we can ask Python to make up some unique filename for us, using the tempfile module.
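A minimal sketch of the tempfile variant, assuming Python 3.3+ for os.replace and the "1. info here" numbering from the question; delete_numbered_line is a hypothetical helper. tempfile.mkstemp creates a uniquely named file in the same directory, so the final os.replace is a rename within one filesystem:

```python
import os
import tempfile


def delete_numbered_line(path, number):
    """Rewrite `path` without the line that starts with "<number>. "."""
    prefix = "{}. ".format(number)
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp picks a unique name, so no existing file gets clobbered
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp", text=True)
    try:
        with os.fdopen(fd, "w") as out_f, open(path) as in_f:
            for line in in_f:
                if not line.startswith(prefix):
                    out_f.write(line)
        os.replace(tmp_path, path)  # unlike os.rename, also overwrites on Windows
    except BaseException:
        os.remove(tmp_path)  # leave the original file untouched on failure
        raise
```

One side effect to be aware of: on POSIX systems mkstemp creates the file readable and writable only by its owner, so the rewritten file ends up with those permissions.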
Any recent version of Python will let you use multiple statements in a single with statement, but if you happen to be using Python 2.6 or something you can nest two with statements to get the same effect:
with open(in_fname) as in_f:
    with open(out_fname, "w") as out_f:
        for line in in_f:
            pass  # ... do the rest of the code
Also, note that I did not use the .readlines() method to get the input lines, because .readlines() reads the entire contents of the file into memory, all at once, and if the file is very large this will be slow or might not even work. You can simply write a for loop using the "file object" you get back from open(); this will give you one line at a time, and your program will work with even really large files.
EDIT: Note that my answer assumes you just want to do one editing step. As @jdi noted in the comments on another answer, if you want to allow "interactive" editing where the user can delete multiple lines, insert lines, or whatever, then the easiest way is in fact to read all the lines into memory using .readlines(), insert/delete/update/whatever on the resulting list, and then write the list out to a file only once, when editing is all done.
def DeleteToDo():
    print ("Which Item Do You Want To Delete?")
    DeleteItem = raw_input(">") #select a line number to delete
    print ("Are You Sure You Want To Delete Number " + DeleteItem + " (y/n)")
    DeleteItem=int(DeleteItem)
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open('data.txt',"r") #open the file for reading
        lines=[x.strip() for x in FILE if int(x[:x.index('.')])!=DeleteItem] #read all the lines except the one whose number matches the line to be deleted
        FILE.close()
        FILE = open('data.txt',"w") #open the file again, for writing (this truncates it)
        for x in lines:
            FILE.write(x+'\n') #write the data to the file
        FILE.close()
    else:
        print ("Nothing Deleted")
DeleteToDo()
Instead of writing the lines out to the file one by one while you read them, read the file into memory with readlines(), delete the line from that list, and then write the list back to disk in one go. That way you get the result you want without extra back-and-forth I/O.
You could mmap the file... after having read the suitable documentation...
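For what it's worth, a sketch of the mmap route, assuming Python 3 and a 0-based line index (delete_line_mmap is a hypothetical name). It still has to move every byte after the deleted line, exactly as the other answers describe; mmap just lets you do the move in place with mmap.move before truncating the leftover tail:

```python
import mmap
import os


def delete_line_mmap(path, line_index):
    """Delete the line at 0-based line_index in place; return False if there is no such line."""
    size = os.path.getsize(path)
    if size == 0:
        return False  # an empty file cannot be mapped
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            start = 0
            for _ in range(line_index):  # skip to the start of the target line
                newline = mm.find(b"\n", start)
                if newline == -1:
                    return False
                start = newline + 1
            if start >= size:
                return False
            newline = mm.find(b"\n", start)
            end = size if newline == -1 else newline + 1
            mm.move(start, end, size - end)  # slide everything after the line down
            mm.flush()
        f.truncate(size - (end - start))  # cut off the now-duplicated tail
    return True
```

Truncating only after the map is closed matters on Windows, where a mapped file cannot be resized.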
You don't need to check for the lines numbers in your file, you can do something like this:
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    item_number = raw_input(">")
    DeleteItem = int(item_number) - 1 # enumerate() counts from 0
    print "Are You Sure You Want To Delete Number " + item_number + " (y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        with open(ToDo.filename,"r") as f:
            lines = ''.join([a for i,a in enumerate(f) if i != DeleteItem])
        with open(ToDo.filename, "w") as f:
            f.write(lines)
    else:
        print "Nothing Deleted"
