Strange behavior (maybe just me) from Python file-handling code

I just started learning file handling in Python 3. The code I am writing accepts a user-provided file name, opens the file, then prints the total number of lines and characters in the file.
The problem is that I have to use two different file objects (fh and fl in the following example) for the character count and the line count.
In the following code, which works as expected, if I comment out the fl = open(fname) line and change for line in fl: to for line in fh:, the line count in the output becomes zero (not expected).
fname = input('Enter the file name: ')
fh = open(fname)
text = fh.read()
fl = open(fname)
count = 0
for line in fl:
    count = count + 1
print("line count in", fname, ":", count)
print("character count in", fname, ":", len(text))
Does that mean that, in the future, if I want to apply string functions to the same file, I have to declare a different variable and read the same file multiple times? Is there a way to achieve a "read once, use many times" goal?

When you read from a file, the file object keeps track of your current position (offset) in it. So after you have read the whole file, whether with read() as in text = fh.read() or line by line in a for loop, the position is at the end of the file and further reads return nothing.
You can use fh.seek(0) to set the current position back to the beginning of the file. Then for line in fh: goes through the same contents again, without having to open the file twice.
See file methods.
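A minimal sketch of the rewind approach applied to the question's line and character counts. io.StringIO stands in for the file object returned by open(fname) so the example runs without a file on disk, and the function name count_lines_and_chars is illustrative only:

```python
import io

def count_lines_and_chars(f):
    """Return (line count, character count) using a single file object."""
    text = f.read()                  # reads everything; the position is now at the end
    f.seek(0)                        # rewind to the start of the file
    line_count = sum(1 for _ in f)   # iterating now sees every line again
    return line_count, len(text)

# io.StringIO behaves like a text file opened with open(fname) for this purpose.
sample = io.StringIO("line one\nline two\nline three\n")
print(count_lines_and_chars(sample))  # (3, 29)
```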
Of course, you could also store the contents and count the lines in a single loop as you read through the file, or use the readlines method to get a list of the lines in the file. readlines is a good way to read the file only once and still have its contents in a useful format.
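For the "read once, use many times" goal, a sketch along these lines keeps the list returned by readlines() and derives every count from it. The function name file_stats is hypothetical, and the word count is an extra illustration not asked for in the question:

```python
import io

def file_stats(f):
    """Read the file once with readlines() and compute several counts from the list."""
    lines = f.readlines()   # the only read; each item keeps its trailing '\n'
    return {
        "lines": len(lines),
        "characters": sum(len(line) for line in lines),
        "words": sum(len(line.split()) for line in lines),
    }

# io.StringIO stands in for open(fname).
print(file_stats(io.StringIO("line one\nline two\nline three\n")))
# {'lines': 3, 'characters': 29, 'words': 6}
```

Because the lines live in an ordinary list, any number of further string operations can run on them without touching the file again.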

Related

Using a for loop to add a new line to a table: python

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file that contains the sequence description, the start position of the first regex match and the end position of the second. I know that the regex part works; it's creating the tab-separated file that I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
    for line in file_object:
        print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
    file_object.close()
But this doesn't seem to work (it creates an empty file). I have also tried file_object.write, but that also creates an empty file.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
infile = sys.argv[1]
for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)
To write a line to a file, you can do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want the values tab-separated, also pass sep="\t". Because print is a function in Python 3, it accepts the sep, end, file and flush keyword arguments. :)
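A small sketch of print() with sep="\t" and file=..., using the first row of the desired output as sample values. io.StringIO stands in for the file opened with open("Mimp_hits.bed", "a") so the written text can be inspected:

```python
import io

# io.StringIO stands in for open("Mimp_hits.bed", "a") so the result can be inspected.
out = io.StringIO()
description, start, end = "Focub_II5_mimp_1__contig_1.16(656599:656809)", 2, 208
print(description, start, end, sep="\t", file=out)  # print converts the ints with str()
print(repr(out.getvalue()))
# 'Focub_II5_mimp_1__contig_1.16(656599:656809)\t2\t208\n'
```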
Opening a file for appending means the file position starts at the end of the file. Writing to it therefore doesn't overwrite any data (it is appended to the end), and iterating over it (or otherwise reading from it) yields nothing, as if you had already reached the end of the file.
So instead of iterating over the lines of the file, just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
    print(sequence.description, h.start(), h_rc.end(), file=file_object)
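To see why the loop over an "a+" file never runs, here is a sketch on a throwaway temporary file (the path is generated for the demo, not taken from the question). Reading right after opening returns nothing; after seek(0) the contents become visible, while writes in append mode still go to the end:

```python
import os
import tempfile

# A throwaway file with one existing line.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("existing line\n")

with open(path, "a+") as f:
    before = f.readlines()   # empty: the position starts at the end of the file
    f.write("appended line\n")
    f.seek(0)                # rewind before reading
    after = f.readlines()    # both lines are visible now

print(before)  # []
print(after)   # ['existing line\n', 'appended line\n']
```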
You could also open the file once, before the loop, since opening it once and writing many times is more efficient than opening it on every iteration. Also, the with block closes the file automatically, so there is no need to call close() explicitly.
You are opening the file in "a+" mode and looping over its lines, which finds nothing because the file is positioned at the end when you do that. In any case, if this is an output file only, open it in "a" mode to append to it.
You probably want to open the file once for appending and, inside the with statement, run your main loop, calling file_object.write(...) whenever you want to append a string to the file. Note that file_object.close() is unnecessary with this with construct.
with open("Mimp_hits.bed", "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
        # ... etc per original code ...
                    if length < mimp_length:
                        file_object.write("{}\t{}\t{}\n".format(
                            sequence.description, h.start(), h_rc.end()))
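Putting the pieces together, a self-contained sketch of the "open once, write many times" structure. To keep it runnable without Biopython, it assumes the records arrive as (description, sequence string) pairs; write_bed, FORWARD and REVERSE are hypothetical names, and the toy sequence is made up to contain one hit of each pattern:

```python
import io
import re

FORWARD = re.compile(r"CAGTGGG..GCAA[TA]AA")
REVERSE = re.compile(r"TT[TA]TTGC..CCCACTG")

def write_bed(records, out, max_length=400):
    """Write 'description<TAB>start<TAB>end' for each forward/reverse hit pair.

    records: iterable of (description, sequence string) pairs (an assumption;
    with Biopython they could be built from SeqIO.parse records).
    out: any object with a write() method, e.g. a file opened once in "a" mode.
    """
    for description, seq in records:
        for h in FORWARD.finditer(seq):
            for h_rc in REVERSE.finditer(seq):
                length = h_rc.end() - h.start()
                if 0 < length < max_length:
                    out.write(f"{description}\t{h.start()}\t{h_rc.end()}\n")

# Hypothetical toy record (not real data).
toy_seq = "TA" + "CAGTGGGATGCAAAAA" + "CCCC" + "TTTTTGCATCCCACTG" + "TA"
buffer = io.StringIO()
write_bed([("toy", toy_seq)], buffer)
print(repr(buffer.getvalue()))  # 'toy\t2\t38\n'
```

With real data the call would sit inside a single with open("Mimp_hits.bed", "a") as file_object: block, passing pairs such as (record.description, str(record.seq)) for each record from SeqIO.parse(infile, "fasta").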

Read/Write Loop Text File with Python

I want to open an existing txt file and search for a line of text that appears many times and in different places. Each time the search finds it, I want to insert 2 new rows below it with specified text.
I tried this code but got an AttributeError on the Path.write line ('str' object has no attribute 'write').
Path = '...\\Test.txt'
searchString = '* Start *'
with open(Path, 'r+') as f:
    content = f.readlines()
    nextLine = False
    for line in content:
        if searchString in line:
            nextLine = not nextLine
        else:
            if nextLine:
                Path.write('Name\nDirection')
                nextLine = not nextLine
            else:
                pass
I must also give the 'Direction' line a number, starting at 0 and incrementing by 15 until the whole file is read. So after the first instance is found, two lines are inserted into the existing txt file like this:
...some text in the existing text file....
* Start *
Name
Direction 0
The 0 then changes to 15 on the next instance (i.e. Direction 15), then 30 (i.e. Direction 30), and so on until the end of the file.
EDITED CODE: Simplified code.
Path = '...\\Test.txt'
searchString = '* Start *'
direction_number = 0
#read the original file first (opening it with 'w' before reading would empty it)
with open(Path, 'r') as f:
    content = f.readlines()
#open the output file only after the contents are in memory
Newfile = open(Path, 'w')
#if find special text, write the extra lines after it
for line in content:
    Newfile.write(line)
    if searchString in line:
        Newfile.write('Name\nDirection %d\n' % direction_number)
        direction_number += 15
Newfile.close()
Instead of trying to reopen the original file and insert lines into it, just write a new file: for each line in the old file, write it to the new file, and also write the two additional lines if it contains the text in question.
direction_number = 0
with open("newfile.txt", 'w') as g:
    # Loop through every line of text we've already read from
    # the first file.
    for line in content:
        # Write the line to the new file
        g.write(line)
        # Also, check if the line contains the <searchString> string.
        # If it does, write the "Name" and "Direction [whatever]" line.
        if searchString in line:
            g.write('Name\nDirection %d\n' % direction_number)
            direction_number += 15
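The same loop can be packaged as a small function that takes the list from readlines() and returns the new list of lines, which makes it easy to test. insert_after_matches and its step parameter are illustrative names, not part of the answer; the function also adds a newline to a matching last line that lacks one, so the inserted lines stay on their own lines:

```python
def insert_after_matches(lines, search_string, step=15):
    """Return a new list with 'Name' and 'Direction N' lines after each matching line."""
    result = []
    direction_number = 0
    for line in lines:
        if search_string in line and not line.endswith("\n"):
            line += "\n"             # a match on the last line may lack its newline
        result.append(line)
        if search_string in line:
            result.append("Name\n")
            result.append(f"Direction {direction_number}\n")
            direction_number += step
    return result

content = ["some text\n", "* Start *\n", "more text\n", "* Start *\n"]
print(insert_after_matches(content, "* Start *"))
```

Writing the result is then a single g.writelines(insert_after_matches(content, searchString)) inside the with open("newfile.txt", 'w') as g: block. If the new text must end up under the original name, one option is to write newfile.txt first and then call os.replace("newfile.txt", Path).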
EDIT: To explain this second with open statement a bit more: remember that earlier you used with open(Path, 'r+') as f: to READ your file.
Path holds the name of the file, 'r+' opens it for reading and writing (your code only reads from it, so plain 'r' would do), and f is just a variable that essentially says, "anything we do to f, we do to the file". Likewise, to start working with a new file, I wrote with open("newfile.txt", 'w') as g:. "newfile.txt" is the name of the file, and 'w' opens it for writing instead of reading (if the file doesn't exist, it is created; if it already exists, its contents are completely overwritten). g is just a variable I picked to refer to this file, so g.write(line) writes the next line of text from the first file as the next line of the second file.
You could use f again here, since by this point you've already read all of the lines from the old file. But using a different variable removes any ambiguity about which file you're dealing with, especially if you ever change this so that one file is still open for reading while a second is open for writing.
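Having one file open for reading while another is open for writing could look like the following sketch, which streams line by line instead of calling readlines() first. copy_with_inserts is a hypothetical name, and io.StringIO stands in for both files so the example runs on its own:

```python
import io

def copy_with_inserts(src, dst, search_string, step=15):
    """Stream src to dst line by line, adding the two extra lines after each match."""
    direction_number = 0
    for line in src:                  # reads one line at a time; no readlines() needed
        dst.write(line)
        if search_string in line:
            dst.write(f"Name\nDirection {direction_number}\n")
            direction_number += step

# With real files this would be:
#     with open(Path) as f, open("newfile.txt", "w") as g:
#         copy_with_inserts(f, g, searchString)
src = io.StringIO("x\n* Start *\ny\n* Start *\n")
dst = io.StringIO()
copy_with_inserts(src, dst, "* Start *")
print(dst.getvalue())
```

Streaming keeps only one line in memory at a time, which matters only for very large files; for small files the readlines() version is just as good.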

Python For loop seems to be getting skipped over

This is my first question, so please be nice.
I am learning Python through an online course. I completed this assignment using Trinket, and it worked there. I submitted the code via the online grading system, and it passed there too, so I have already been given 'credit' for this assignment.
When I try it in IDLE or PyCharm, the code does not work: it seems to skip over the for loop completely.
I have looked at other answers on this type of question, but I cannot figure out how to apply them to my situation. Please help me understand why my for loop seems to be getting skipped.
fname = input("Enter file name: ")
fh = open(fname + ".txt")
x = fh.read()
count = 0
for line in fh:
    line = line.rstrip()
    word = line.split()
    if len(word) == 0:
        continue
    if word[0] != "From":
        continue
    else:
        print(word[1])
        count += 1
print("There were", count, "lines in the file with From as the first word")
In the .txt file being used, there are 27 email addresses that print out one by one, and then the final line gives the total count. As I said, it works in Trinket and the online code grader, but not in PyCharm or IDLE.
Thanks in advance.
When you do x = fh.read(), you read the entire content of the file and store it in the variable x.
From Python documentation:
To read a file’s contents, call f.read(size), which reads some
quantity of data and returns it as a string. size is an optional
numeric argument. When size is omitted or negative, the entire
contents of the file will be read and returned; it’s your problem if
the file is twice as large as your machine’s memory. Otherwise, at
most size bytes are read and returned. If the end of the file has been
reached, f.read() will return an empty string ("").
So, once the file has been read completely, fh is already at the end of the file, and iterating over it with a for loop yields no lines. You have two options here:
Change for line in fh: to for line in x.splitlines(): (iterating over the string x directly would give you single characters, not lines), OR
Remove x = fh.read().
You could try replacing
for line in fh:
with
for line in x.splitlines():
or removing x = fh.read(), since it reads the entire file and leaves the file object at its end, so the loop has nothing left to read.
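Both fixes can be checked side by side in a short sketch. from_senders is a hypothetical helper that mirrors the question's loop (it also skips a bare "From" line to avoid an IndexError), the e-mail addresses are made up, and io.StringIO stands in for open(fname + ".txt"):

```python
import io

def from_senders(lines):
    """Return the second word of each line whose first word is 'From'."""
    senders = []
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0] == "From":
            senders.append(words[1])
    return senders

sample = "From a@example.com Sat\nTo b@example.com\nFrom c@example.com Sun\n"

fh = io.StringIO(sample)             # stands in for open(fname + ".txt")
x = fh.read()                        # the file object is now at the end...
print(from_senders(fh))              # [] : ...so the loop body never runs
print(from_senders(x.splitlines()))  # ['a@example.com', 'c@example.com']
print(list(x)[:4])                   # ['F', 'r', 'o', 'm'] : a str iterates by character
```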

How to process filenames separated by new line?

I'm using 2 files:
1° findjava.py, which outputs the names of all Java files in a directory, separated by \n
2° countfile, which receives a single file name and counts its lines
I'm already receiving a string with the file names in countfile (javafile1\njavafile2\njavafile3\n).
How could I run a loop to go through all those file names one by one?
I'd need to read that string until it finds a \n, use that part as a variable to run my script to count the lines, and then keep reading the next file name.
Split on \n
So something like
files = "javafile1\njavafile2\njavafile3\n"
list_of_files = files.split("\n")
for file in list_of_files:
    if not file:  # the trailing "\n" leaves an empty string at the end of the list
        continue
    with open(file) as fh:
        lines = fh.readlines()
You haven't given any example data, but unless it's complicated by other factors, it seems you can just split() the data without further ado:
files = "javafile1\njavafile2\njavafile3\n"
for name in files.split():  # split() with no argument splits on any whitespace and drops empty strings
    print("counting lines in file " + name)
    countfile(name)
In Python you can use a for loop to read a file one line at a time:
with open("file_list") as stream:
    for filename in stream:
        filename = filename.rstrip("\n")  # each line still ends with its newline
        lines = 0
        with open(filename) as f:
            for line in f:
                lines += 1
        print(filename, lines)
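A self-contained sketch of the whole flow, assuming the input is the newline-separated string described in the question. count_lines is a hypothetical stand-in for the question's countfile, and the demo files are throwaway temporary files with made-up names and contents:

```python
import os
import tempfile

def count_lines(path):
    """Count the lines in one file (a stand-in for the question's countfile)."""
    with open(path) as f:
        return sum(1 for _ in f)

def count_lines_in_files(names):
    """names: newline-separated file names, like the string findjava.py produces."""
    counts = {}
    for name in names.splitlines():   # no trailing '' and no '\n' on each name
        if name:                      # skip blank lines
            counts[name] = count_lines(name)
    return counts

# Demo with two throwaway files (hypothetical names and contents).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, body in [("A.java", "class A {\n}\n"), ("B.java", "class B {}\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(body)
    paths.append(path)

listing = "\n".join(paths) + "\n"
result = count_lines_in_files(listing)
print(result)
```

splitlines() is used instead of split("\n") because it does not produce the empty string after the final newline, and unlike split() it keeps file names that contain spaces intact.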

Combined effect of reading lines twice?

As practice, I am learning to read a file.
As is hopefully obvious from the code, I have a file in the working/root (whatever it is called) directory. I need to read it and print it.
my_file=open("new.txt","r")
lengt=sum(1 for line in my_file)
for i in range(0,lengt-1):
    myline=my_file.readlines(1)[0]
    print(myline)
my_file.close()
This raises an error saying the index is out of range.
The text file simply contains statements like
line one
line two
line three
.
.
.
With everything else the same, I tried myline=my_file.readline() instead, and I get 7 empty lines.
My guess is that by using for line in my_file, I read up all the lines and so reached the end of the document. How do I overcome this to get the result I want?
P.S. If it matters, it's Python 3.3.
No need to count the lines yourself. Python does it for you:
my_file = open("new.txt","r")
for myline in my_file:
    print(myline)
Details:
my_file is an iterator. This is a special object that allows you to iterate over it.
You can also access a single line:
line1 = next(my_file)
gives you the first line, assuming you just opened the file. Doing it again:
line2 = next(my_file)
gives you the second line. If you now iterate over it:
for myline in my_file:
    pass  # do something with myline
it will start at line 3.
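A quick sketch of next() followed by a loop on the same iterator. io.StringIO stands in for open("new.txt") and behaves the same way here:

```python
import io

my_file = io.StringIO("line one\nline two\nline three\nline four\n")  # stands in for open("new.txt")
line1 = next(my_file)                   # 'line one\n'
line2 = next(my_file)                   # 'line two\n'
rest = [myline for myline in my_file]   # the loop picks up at line three
print(rest)                             # ['line three\n', 'line four\n']
```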
Strange extra lines?
print(myline)
will print an extra empty line after each line. One newline comes from the file itself and print() adds another. Solution:
Python 3:
print(myline, end='')
Python 2:
print myline, # note the trailing comma.
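A Python 3 sketch comparing the end='' fix with the alternative of stripping the file's newline and letting print() add its own. io.StringIO objects stand in for the file and for the console so the printed text can be compared:

```python
import io

my_file = io.StringIO("line one\nline two\n")   # stands in for open("new.txt")
screen = io.StringIO()                           # captures what print() would show

for myline in my_file:
    print(myline, end="", file=screen)           # the line already ends with '\n'

# Alternative: remove the file's newline and let print() add its own.
my_file.seek(0)
screen2 = io.StringIO()
for myline in my_file:
    print(myline.rstrip("\n"), file=screen2)

print(screen.getvalue() == screen2.getvalue())   # True
```

The two only differ when the last line of the file has no trailing newline: end='' then prints no final newline, while the rstrip version still adds one.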
Playing it safe
Using the with statement like this:
with open("new.txt", "r") as my_file:
    for myline in my_file:
        print(myline)
    # my_file is open here
# my_file is closed here
you don't need to close the file, as that is done as soon as you leave the context, i.e. as soon as your code continues at the same indentation level as the with statement.
You can actually take care of all of this at once by iterating over the file contents:
my_file = open("new.txt", "r")
length = 0
for line in my_file:
    length += 1
    print(line)
my_file.close()
At the end, you will have printed all of the lines, and length will contain the number of lines in the file. (If you don't specifically need to know length, there's really no need for it!)
Another way to do it, which will close the file for you (and, in fact, will even close the file if an exception is raised):
length = 0
with open("new.txt", "r") as my_file:
    for line in my_file:
        length += 1
        print(line)
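If you want the count alongside the printing without updating a counter by hand, enumerate can supply it. A sketch, where print_numbered is an illustrative name and io.StringIO stands in for open("new.txt"):

```python
import io

def print_numbered(f, out=None):
    """Print each line prefixed with its number and return the line count."""
    length = 0                                 # stays 0 for an empty file
    for length, line in enumerate(f, start=1):
        print(f"{length}: {line}", end="", file=out)   # out=None means the console
    return length

print(print_numbered(io.StringIO("line one\nline two\nline three\n")))
```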
