start iteration from a specific line in a text file in python


Assume I have a text file (named test.txt) into which my Python script previously wrote 15 lines. Now I want to append some lines to that file. How can I start iterating from line #16 of test.txt and append some new lines to it in Python?

To append at the end of the file, you don't need to "iterate" over it – simply open it in append mode:
with open("my_file", "a") as f:
    f.write("another line\n")
Iterating over a file reads it; it cannot be used to write to it.
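A minimal sketch of the whole scenario from the question, assuming the file is test.txt in the current directory; the helper name `append_lines` is hypothetical. The first block writes 15 lines, and the later append lands after line 15 without any iteration.

```python
from typing import Iterable


def append_lines(path: str, new_lines: Iterable[str]) -> None:
    # "a" puts every write at the current end of the file,
    # so the new lines follow whatever was written before (line 16 onward here)
    with open(path, "a") as f:
        for line in new_lines:
            f.write(line + "\n")


if __name__ == "__main__":
    # first run: write 15 lines ("w" truncates any existing content)
    with open("test.txt", "w") as f:
        for i in range(1, 16):
            f.write(f"line {i}\n")

    # later: append two more lines; they become lines 16 and 17
    append_lines("test.txt", ["line 16", "line 17"])

    with open("test.txt") as f:
        print(len(f.readlines()))  # 17
```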

When you open the file with the conventional
f = open(FILE)
you should also state the mode you are using, in this case append, so:
f = open(FILE, 'a')

Related

Using a for loop to add a new line to a table: python

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file which contains the sequence description, the start location of the first regex and the end location of the second regex. I know that the regex section works, it's just creating the \t separated file I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
    for line in file_object:
        print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
    file_object.close()
But this doesn't seem to work (it creates an empty file). I have also tried file_object.write, but that creates an empty file too.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
infile = sys.argv[1]
for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)

To write a line to a file, you would do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want it tab-separated, you can also pass sep="\t". That is one benefit of print being a function in Python 3: it accepts the sep, end, file, and flush keyword arguments. :)
Opening a file for appending means the file pointer starts at the end of the file. Writing to it therefore doesn't overwrite any data (it is appended to the end of the file), and iterating over it (or otherwise reading from it) gives nothing, as if you had already reached the end of the file. (With plain "a" the file is not readable at all; "a+" is the append mode that also allows reading.)
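A small sketch that makes the file-pointer behaviour visible, assuming a scratch file whose path you pass in; the function name `show_append_modes` is hypothetical. In "a+" mode reading starts at the end, so nothing comes back until you seek(0), and in plain "a" mode the file is not readable at all.

```python
import os
import tempfile


def show_append_modes(path: str):
    with open(path, "w") as f:
        f.write("line 1\nline 2\n")

    with open(path, "a+") as f:
        at_open = f.readlines()      # pointer starts at the end -> []
        f.seek(0)                    # rewind explicitly to read the content
        after_seek = f.readlines()   # ['line 1\n', 'line 2\n']

    with open(path, "a") as f:
        readable = f.readable()      # "a" is write-only -> False

    return at_open, after_seek, readable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(show_append_modes(os.path.join(d, "demo.txt")))
```

This is why the loop in the question never executes its body: the file object opened with "a+" has nothing left to read.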
So instead of iterating over the lines of the file, you would just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
    print(sequence.description, h.start(), h_rc.end(), file=file_object)
You could also open the file once, before the loop, since opening it once and writing many times is more efficient than reopening it for every hit. Also, the with block closes the file automatically, so there is no need to call close() explicitly.

You are trying to open the file in "a+" mode, and loop over lines from it (which will not find anything because the file is positioned at the end when you do that). In any case, if this is an output file only, then you would open it in "a" mode to append to it.
Probably you just want to open the file once for appending, and inside the with statement, do your main loop, using file_object.write(...) when you want to actually append strings to the file. Note that there is no need for file_object.close() when using this with construct.
with open("Mimp_hits.bed", "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
        # ... etc per original code ...
                if length < mimp_length:
                    file_object.write("{}\t{}\t{}\n".format(
                        sequence.description, h.start(), h_rc.end()))
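A self-contained sketch of the same fix, assuming records arrive as (description, sequence) pairs so it runs without Biopython; `write_mimp_hits` and `max_len` are hypothetical names, while the two regexes and the 400 limit come from the question. The caller opens the output file once, and print(..., sep="\t", file=...) writes one tab-separated line per hit.

```python
import re
from typing import Iterable, TextIO, Tuple

# patterns copied from the question
FWD = re.compile(r"CAGTGGG..GCAA[TA]AA")
REV = re.compile(r"TT[TA]TTGC..CCCACTG")


def write_mimp_hits(records: Iterable[Tuple[str, str]], out: TextIO,
                    max_len: int = 400) -> None:
    for description, seq in records:
        for h in FWD.finditer(seq):
            for h_rc in REV.finditer(seq):
                length = h_rc.end() - h.start()
                # keep only pairs where the reverse hit ends after the forward
                # hit starts and the span is shorter than max_len
                if 0 < length < max_len:
                    print(description, h.start(), h_rc.end(), sep="\t", file=out)
```

With Biopython you could pass `((r.description, str(r.seq)) for r in SeqIO.parse(infile, "fasta"))` as `records` and a handle from `open("Mimp_hits.bed", "a")` as `out`, keeping the single with block around the call.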

Python: Deleting lines from a file with certain criteria

I'm trying to delete lines that match certain criteria from a file, but when I run the script it empties the whole file. When I change the script to just read the lines, it returns the lines matching the search criteria, but when I open the file in write mode and change it from printing each line to removing each line, it empties the whole thing.
#!/usr/bin/env python
f = raw_input('Enter filename > ')
with open(f, 'w+') as fobj:
    criteria = raw_input('Enter criteria > ')
    for eachLine in fobj:
        if criteria in eachLine:
            fobj.remove(eachLine)
            break
    fobj.close()

I assume you want to remove the lines matching a particular criterion. You can simply create another file and write the remaining content to it, as follows:
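criteria = 'test'
with open('test.txt', 'r') as f:
    lines = f.readlines()
# keep only the lines that do not contain the criteria string
output = [line for line in lines if criteria not in line]
# text mode ('w') because the lines are str; 'wb' would raise TypeError in Python 3
with open('newfile.txt', 'w') as fout:
    fout.writelines(output)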

From the docs:
w+  Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
a+  Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
So you are truncating the file on the line containing the with open(...). You probably want to write to a new file with a different name and rename it to the original name at the end of your program.
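A sketch of that write-then-rename approach, assuming a text file and a plain substring criterion; `remove_matching_lines` is a hypothetical name. The filtered copy goes to a temporary file in the same directory, and os.replace then swaps it in under the original name, so the input is never truncated while it is being read.

```python
import os
import shutil
import tempfile


def remove_matching_lines(path: str, criteria: str) -> None:
    # temp file in the same directory, so os.replace() is a same-filesystem rename
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:               # streamed line by line
                if criteria not in line:
                    dst.write(line)
        shutil.copymode(path, tmp_path)    # mkstemp creates owner-only files; keep original mode
        os.replace(tmp_path, path)         # filtered file takes the original name
    except BaseException:
        os.remove(tmp_path)                # don't leave the temp file behind on failure
        raise
```

Usage: `remove_matching_lines("test.txt", "test")`. Because the loop never holds more than one line, this also works for files that don't fit in memory.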

Script not writing to file

I would like my script to open alan.txt, search the text for every instance of scholarly_theologian and, if found, add the word "test" under it. When I tried doing it this way:
## Script
with open('alan.txt', 'r+') as f:
    for line in f:
        if "scholarly_theologian" in line:
            f.write("test")
It wouldn't write anything. I'm on Windows 8.1.

You can't modify a file like this. You can only append to it, overwrite existing characters with others, or rewrite it entirely. See "How do I modify a text file in Python?".
What you should do is create another file with the content you want.
EDIT:
Claudio's answer has the code for what I suggested. It has the benefit (over manicphase's code) of not keeping the whole file in memory, which is important if the file is long. manicphase's answer, on the other hand, has the benefit of not creating a second file: it rewrites the original one. Choose the one that fits your needs.

Rewritten answer because the last one was wrong.
To get all the lines as a list, call .readlines() on the file object (f). Then there are a few ways you could insert "test".
## Script
with open('alan.txt', 'r') as f:
    lines = f.readlines()
for i in range(len(lines)):
    if "scholarly_theologian" in lines[i]:
        # readlines() keeps each line's "\n", so add "test" as its own line after it
        if not lines[i].endswith("\n"):
            lines[i] += "\n"
        lines[i] += "test\n"
with open('alan.txt', 'w') as f:
    f.writelines(lines)

This should do the trick:
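with open('output.txt', 'w') as o:
    with open('alan.txt', 'r') as f:
        for line in f:
            o.write(line)
            # str.find() returns -1 (truthy) when there is no match, so test with "in"
            if 'scholarly_theologian' in line:
                if not line.endswith('\n'):
                    o.write('\n')  # the last line may lack a newline
                o.write('test\n')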
As Ella Shar mentioned, you need to create a new file and write the new content into it.
If working with two files is not acceptable, the next step would be to delete the input file and rename the output file to the input file's name.
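If you'd rather not manage the second file yourself, the standard library's fileinput module with inplace=True does the backup-and-rename for you and redirects print() into the rewritten file. A minimal sketch, where the function and parameter names are hypothetical, that adds a line under every match:

```python
import fileinput


def insert_after_matches(path: str, marker: str, new_line: str) -> None:
    # inplace=True: the original is moved to a backup and stdout is redirected
    # into a new file with the original name until the loop finishes
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            print(line, end="")        # copy the existing line unchanged
            if marker in line:
                if not line.endswith("\n"):
                    print()            # last line had no newline; start a new one
                print(new_line)        # the added line, e.g. "test"
```

Usage: `insert_after_matches("alan.txt", "scholarly_theologian", "test")`. Any print() inside the loop goes to the file, not the console, so debug output should go to sys.stderr instead.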

How to efficiently append a new line to the starting of a large file?

I want to add a new line at the start of a 2 GB+ file. I tried the following code, but it fails with an out-of-memory error.
myfile = open(tableTempFile, "r+")
myfile.read() # read everything in the file
myfile.seek(0) # rewind
myfile.write("WRITE IN THE FIRST LINE ")
myfile.close();
How can I write to a file without loading the entire file into memory?
How do I add a new line at the start of the file?

Please note, there's no way to do this with any built-in functions in Python.
You can do this easily on Linux using tail, cat, etc.
To do it in Python you must use an auxiliary file, and for very large files I think this method is an option:
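import fileinput

def add_line_at_start(filename, line_to_be_added):
    # inplace=True redirects print() output into the file being read
    with fileinput.input(filename, inplace=True) as f:
        for xline in f:
            if f.isfirstline():
                print(line_to_be_added.rstrip('\r\n') + '\n' + xline, end='')
            else:
                print(xline, end='')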
NOTE:
Never use read() / readlines() when you are dealing with big files. These methods load the complete file into memory.
In your code, seek(0) takes you back to the start of the file, but everything you write from there overwrites the existing content.

If you can afford having the entire file in memory at once:
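first_line_update = "WRITE IN THE FIRST LINE \n"
with open(tableTempFile, 'r+') as f:
    lines = f.readlines()
    lines[0] = first_line_update  # replaces line 1; use lines.insert(0, ...) to prepend instead
    f.seek(0)        # rewind first, otherwise the lines are appended at the end
    f.writelines(lines)
    f.truncate()     # drop leftover bytes if the new content is shorter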
otherwise:
import os
from shutil import copy
from itertools import islice, chain
# TODO: use a NamedTemporaryFile from the tempfile module
first_line_update = "WRITE IN THE FIRST LINE \n"
with open("inputfile", 'r') as infile, open("tmpfile", 'w') as outfile:
    # replace the first line with the string provided
    # (islice skips the original first line; the rest is streamed, not loaded at once)
    outfile.writelines(chain((first_line_update,), islice(infile, 1, None)))
    # if you don't want to replace the first line but to insert another line before it,
    # this simplifies to:
    # outfile.writelines(chain((first_line_update,), infile))
copy("tmpfile", "inputfile")  # copy the result back over the original file
os.remove("tmpfile")          # remove the temporary file

Generally, you can't do that. A file is a sequence of bytes, not a sequence of lines. This data model doesn't allow for insertions in arbitrary points - you can either replace a byte by another or append bytes at the end.
You can either:
- Replace the first X bytes in the file. This could work for you if you can make sure that the first line's length will never vary.
- Truncate the file, write the first line, then rewrite all the rest after it. If you can't fit the whole file into memory, then:
  - create a temporary file (the tempfile module will help you)
  - write your line to it
  - open your base file in "r" mode and copy its contents after the first line to the temporary file, piece-wise
  - close both files, then replace the input file with the temporary file
(Note that appending to the end of a file is much easier: all you need to do is open the file in append ("a") mode.)
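A sketch of the temporary-file route for a file too large to load, assuming UTF-8 text with "\n" line endings; `prepend_line` and `chunk_size` are hypothetical names. It inserts the new line before the existing content, and shutil.copyfileobj copies the original in fixed-size chunks, so memory use stays bounded whatever the file size.

```python
import os
import shutil
import tempfile


def prepend_line(path: str, line: str, chunk_size: int = 1024 * 1024) -> None:
    # assumption: UTF-8 text with "\n" line endings
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            dst.write(line.rstrip("\r\n").encode("utf-8") + b"\n")  # the new first line
            shutil.copyfileobj(src, dst, chunk_size)  # then the old content, piece-wise
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)       # replace the input file with the temporary file
    except BaseException:
        os.remove(tmp_path)
        raise
```

Working in binary mode avoids decoding the whole 2 GB body; only the new line is encoded. The trade-off is that you temporarily need disk space for a second copy of the file.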

Prepend line to beginning of a file

I can do this using a separate file, but how do I add a line to the beginning of a file?
f=open('log.txt','a')
f.seek(0) #get to the first position
f.write("text")
f.close()
This starts writing from the end of the file since the file is opened in append mode.

In modes 'a' or 'a+', any writing is done at the end of the file, even if the file pointer is not at the end when write() is called: the pointer is moved to the end of the file before any writing. You can do what you want in two ways.
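A quick way to see this for yourself, assuming a throwaway file path; the helper name is hypothetical. seek(0) in "a" mode does not stop the write from landing at the end.

```python
def write_after_seek_in_append_mode(path: str) -> str:
    with open(path, "w") as f:
        f.write("old content\n")
    with open(path, "a") as f:
        f.seek(0)            # move the pointer to the start...
        f.write("text\n")    # ...but in "a" mode the write still goes to the end
    with open(path) as f:
        return f.read()      # "old content\ntext\n"
```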
The first way can be used if loading the whole file into memory is not a problem:
def line_prepender(filename, line):
    with open(filename, 'r+') as f:
        content = f.read()
        f.seek(0, 0)
        f.write(line.rstrip('\r\n') + '\n' + content)
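The second way:
import fileinput

def line_pre_adder(filename, line_to_prepend):
    # inplace=True redirects print() output into the file being read
    with fileinput.input(filename, inplace=True) as f:
        for xline in f:
            if f.isfirstline():
                print(line_to_prepend.rstrip('\r\n') + '\n' + xline, end='')
            else:
                print(xline, end='')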
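I don't know exactly how this method works under the hood or whether it can be used on very big files. The inplace argument (1 / True) is what allows the file to be rewritten "in place". According to the fileinput documentation, the original file is moved to a backup file and standard output is redirected to a new file with the original name, so nothing is shifted inside the file: it is read and rewritten line by line, which keeps memory use small but needs disk space for a second copy.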

In all filesystems that I am familiar with, you can't do this in-place. You have to use an auxiliary file (which you can then rename to take the name of the original file).

To put NPE's answer into code, I think the most efficient way to do this is:
import os
import shutil

def insert(originalfile, string):
    with open(originalfile, 'r') as f:
        with open('newfile.txt', 'w') as f2:
            f2.write(string)
            shutil.copyfileobj(f, f2)  # copy in chunks instead of reading the whole file at once
    os.remove(originalfile)
    os.rename('newfile.txt', originalfile)

Different Idea:
(1) You save the original file as a variable.
(2) You overwrite the original file with new information.
(3) You append the saved original data below the new information.
Code:
filename = "example.txt"  # hypothetical file name
new_information = "new first line\n"  # hypothetical text to put at the top
with open(filename, 'r') as contents:
    save = contents.read()
with open(filename, 'w') as contents:
    contents.write(new_information)
with open(filename, 'a') as contents:
    contents.write(save)

The clear way to do this is as follows, if you do not mind rewriting the file:
one_line = "new first line\n"  # hypothetical line to insert
with open("a.txt", 'r+') as fp:
    lines = fp.readlines()     # lines is a list of lines, each element '...\n'
    lines.insert(0, one_line)  # you can use any index if you know the line index
    fp.seek(0)                 # move the file pointer to the beginning to rewrite the whole file
    fp.writelines(lines)       # write the whole list back to the same file
Note that this is not an in-place replacement; it rewrites the file.
In summary, you read the file into a list, modify the list, and write the list back to the file under the same name.

num = [1, 2, 3]  # list containing integers
with open("ex3.txt", 'r+') as file:
    readcontent = file.read()  # store the current content of ex3.txt
    file.seek(0, 0)  # move the cursor back to the start of the file
    for i in num:  # write the list items one by one
        file.write(str(i) + "\n")  # convert int to str, since write() takes str
    file.write(readcontent)  # then write back the original content

There's no way to do this with any built-in functions, because it would be terribly inefficient. You'd need to shift the existing contents of the file down each time you add a line at the front.
There's a Unix/Linux utility tail which can read from the end of a file. Perhaps you can find that useful in your application.

If the file is too big to hold as a list and you simply want to reverse it, you can initially write the file in reversed order and then read it one line at a time from the end (writing each line to another file) with the file-read-backwards module.

An improvement over the existing solution provided by @eyquem is as follows:
import fileinput
from pathlib import Path
from typing import Union

def prepend_text(filename: Union[str, Path], text: str):
    with fileinput.input(filename, inplace=True) as file:
        for line in file:
            if file.isfirstline():
                print(text)
            print(line, end="")
It is typed, neat, more readable, and uses some improvements Python got in recent years, like context managers :)

I tried a different approach:
I wrote the first line into a header.csv file; body.csv was the second file. Then I used the Windows type command to concatenate them, in order, into final.csv.
import os
os.system(r'type c:\header.csv c:\body.csv > c:\final.csv')  # raw string so backslashes are not escapes

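with open("fruits.txt", "r+") as file:
    file.write("bab111y")  # "r+" starts at position 0, so this overwrites the existing characters at the start instead of inserting
    file.seek(0)
    content = file.read()
    print(content)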
