The goal is to write a script that copies a text file and excludes any line beginning with #.
My problem is that I get an IndexError that depends on the order of my if/elif conditions. The only difference between the working and the non-working code (besides the "_Bad" suffix on the non-working function's name) is that the working version tests the "" condition first, while the non-working version tests the "#" condition first.
The base file is created by this script:
>>> testFileObj = open("test.dat","w")
>>> testFileObj.write("#line one\nline one\n#line two\nline two\n")
>>> testFileObj.close()
Code which works:
def copyAndWriteExcludingPoundSigns(origFile, origFileWithOutPounds):
    origFileObj = open(origFile,"r")
    modFileObj = open(origFileWithOutPounds,"w")
    while True:
        textObj = origFileObj.readline()
        if textObj == "":
            break
        elif textObj[0] == "#":
            continue
        else:
            modFileObj.write(textObj)
    origFileObj.close()
    modFileObj.close()
Code which doesn't work:
def copyAndWriteExcludingPoundSigns_Bad(origFile, origFileWithOutPounds):
    origFileObj = open(origFile,"r")
    modFileObj = open(origFileWithOutPounds,"w")
    while True:
        textObj = origFileObj.readline()
        if textObj[0] == "#":
            continue
        elif textObj == "":
            break
        else:
            modFileObj.write(textObj)
    origFileObj.close()
    modFileObj.close()
Which gives me this error:
Traceback (most recent call last):
File "<pyshell#96>", line 1, in <module>
copyAndWriteExcludingPoundSigns_Bad("test.dat","testOutput.dat")
File "<pyshell#94>", line 6, in copyAndWriteExcludingPoundSigns_Bad
if textObj[0] == "#":
IndexError: string index out of range
If you evaluate textObj[0] == "#" when textObj is "", there is no character at index zero because the string is empty, hence the IndexError.
The alternative is to use
if textObj.startswith("#"): which works in both cases.
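A minimal sketch of the difference, using an empty string to stand in for what readline() returns at end of file (the variable names here are illustrative only):

```python
line = ""  # what readline() returns at end of file

# startswith() is safe on an empty string: it simply returns False.
starts_with_pound = line.startswith("#")

# Indexing the same empty string raises IndexError.
try:
    line[0]
    index_failed = False
except IndexError:
    index_failed = True
```

This is why the order of the conditions stops mattering once startswith() replaces the index lookup.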
Some tips (and please, please read PEP 8):
use a 'for' loop instead of a 'while' loop
there is no need to call readline yourself in Python 2.4 and later; iterate over the file object instead
test whether the line is empty before testing its first character
Untested:
def copy_and_write_excluding_pound_signs(original, filtered):
    original_file = open(original,"r")
    filtered_file = open(filtered,"w")
    for line in original_file:
        if line and line[0] == '#':
            continue
        filtered_file.write(line)
    original_file.close()
    filtered_file.close()
You may also want to filter out lines with some white space before the '#':
import re
def copy_and_write_excluding_pound_signs(original, filtered):
    pound_re = re.compile(r'^\s*#')
    original_file = open(original,"r")
    filtered_file = open(filtered,"w")
    for line in original_file:
        if pound_re.match(line):
            continue
        filtered_file.write(line)
    original_file.close()
    filtered_file.close()
You should use line.startswith('#') to check whether the string line starts with '#'. If the line is empty (such as line = ''), there would be no first character, and you'd get this error.
Also, breaking out of the loop by testing for an empty string is easy to get wrong, so it is inadvisable. Files in Python are iterable, so you can simply use a for line in file: loop.
The problem with your non-working code is that readline() returns an empty string at the end of the file, which causes the IndexError when the statement if textObj[0] == "#": is evaluated ([0] refers to the first character of the string). The working code avoids this by checking for the empty string first.
The simplest way I can think of to rewrite your function is to use for line in <fileobj>, so you won't have to worry about line ever being empty. Also, if you use the Python with statement, your files will be closed automatically. Anyway, here's what I suggest:
def copyAndWriteExcludingPoundSigns(origFile, origFileWithOutPounds):
    with open(origFile,"r") as origFileObj:
        with open(origFileWithOutPounds,"w") as modFileObj:
            for line in origFileObj:
                if line[0] != '#':
                    modFileObj.write(line)
The two with statements could be combined, but that would have made for a very long and harder-to-read line of code, so I broke it up.
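For comparison, here is a sketch of the combined form, assuming shorter variable names so the line stays readable. The function name copy_without_pound_lines is made up for this example, and startswith() is swapped in for the index test:

```python
def copy_without_pound_lines(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that start with '#'."""
    # One with statement manages both files; both are closed on exit,
    # even if an exception is raised inside the block.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.startswith("#"):
                dst.write(line)
```

Calling copy_without_pound_lines("test.dat", "testOutput.dat") on the base file from the question should leave only the two lines that do not start with #.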
Related
I am writing this Python code to check a DNA sequence file. The output should be the name of the person whose DNA matches.
This link has the description of the assignment:
https://cs50.harvard.edu/x/2020/psets/6/dna/
But when I try to run the code, it raises a ValueError.
Could someone tell me what is wrong with the code?
I am new to programming.
from sys import argv, exit
import csv
def max_Reptitions_of_substrings(dnaSequences , substring):
    arr = [0] * len(dnaSequences)
    for i in range(len(dnaSequences) - len(substring), -1, -1):
        if dnaSequences[i: i + len(substring)] == substring:
            if i + len(substring) > len(dnaSequences) - 1:
                arr[i] = 1
            else:
                arr[i] = 1 + arr[i + len(substring)]
    return max(arr)
def print_Matching(reading, newdata):
    for i in reading:
        human = i[0]
        value = [int(digit) for digit in i[1:]]
        if value == newdata:
            print(human)
            return
    print("No match")
def main():
    if len(argv) != 3:
        print("Missing Command line Argument")
        exit(1)
with open(argv[1], 'r') as database:
    reading = csv.reader(database)
    sequences = next(reading)[1:]
with open(argv[2], 'r') as sequenceFile:
    dnaSequences = sequenceFile.read()
    newdata = [max_Reptitions_of_substrings(dnaSequences, substr) for substr in sequences]
print_Matching(reading, newdata)
The ValueError shown is:
Traceback (most recent call last):
File "dna.py", line 36, in <module>
print_Matching(reading, newdata)
File "dna.py", line 15, in print_Matching
for i in reading:
ValueError: I/O operation on closed file.
The error message is pretty explicit and spot-on:
ValueError: I/O operation on closed file.
You open your CSV file in a with block and create a CSV reader based on that file. But at the end of the with block, the file is closed. reading now refers to a CSV reader that's linked to a closed file.
Hence the error.
Look at the code indentation; it is very important in Python. The first with should be inside the main function, and the second with, together with the call to print_Matching, should be inside the first with.
Why?
Just look at the code. print_Matching iterates over reading, the csv.reader, which at that point tries to read from a file that has already been closed.
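Another way around the closed-file problem, sketched here as an assumption rather than as what the assignment requires, is to read every row into a list while the file is still open. The function name load_database is made up for this example:

```python
import csv

def load_database(path):
    """Return (header, rows) with every row read before the file closes."""
    with open(path, newline="") as database:
        reader = csv.reader(database)
        header = next(reader)   # first row holds the column names
        rows = list(reader)     # consume the reader while the file is open
    # The file is closed here, but header and rows are ordinary lists.
    return header, rows
```

Because rows is a plain list, it can be passed to a function such as print_Matching after the with block has ended.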
I wrote this Python function. The project spec doesn't allow me to use try/except to handle an error. Per the docstring, I'm supposed to return False on success; on failure I call a declare_error function and return True. The line number is taken care of in a main function.
Note that I'm not allowed to import anything besides sys and os, so regular expressions are off the table.
Here is my code. Any suggestions on what I should use for the if/else statement?
#===============================================================================
def read_subsequent_lines(file_object, line_number, simulation_obj_list):
#===============================================================================
    '''Read and parse the next line of the file, confirm it matches one of the
    line signatures, such as ('sim_object_type=World', 'size=')
    Parameters: file_object, the input file object.
                line_number, the current line number being read in.
                simulation_obj_list, the list of converted lines.
    Returns: False for success, True for failure
    Convert the line in the file to a list of strings, with one string per
    name=value pair (such as "name=Joe"). Make sure that each line matches
    up with one of the line "signatures" in the constant definitions.
    Modify this line_list to only include the value portion of the pair,
    calling extract_line_using_signature (...) to get the each line list.
    Append each modified line_list to the simulation_obj_list( ).
    If success: Return False.
    If failure: Call declare_error (...) and Return True.
    '''
    #List of lists to contain each line of the input file
    line_list = []
    #Read in Lines and append to a list of lines containing strings
    for line in file_object:
        # print(line.strip().split(','))
        line_list.append(line.strip().split())
    #Compare line to signature constants and append to simulation_obj_list
    for i in line_list:
        #World
        if len(i) == NUM_WORLD_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[0][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[0][1]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[0],i))
        #Person
        if len(i) == NUM_PERSON_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[1][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[1][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[1][2]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[1],i))
        #Robot
        if len(i) == NUM_ROBOT_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[2][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[2][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[2][2]) and i[3].startswith(LINE_SIGNATURE_TUPLE[2][3]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[2],i))
The quick, ugly fix is to use elifs and elses (I'm assuming NUM_WORLD_TOKENS, NUM_PERSON_TOKENS, and NUM_ROBOT_TOKENS are all unique values):
    #Compare line to signature constants and append to simulation_obj_list
    for i in line_list:
        #World
        if len(i) == NUM_WORLD_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[0][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[0][1]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[0],i))
            else:
                declare_error()
                return True
        #Person
        elif len(i) == NUM_PERSON_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[1][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[1][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[1][2]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[1],i))
            else:
                declare_error()
                return True
        #Robot
        elif len(i) == NUM_ROBOT_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[2][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[2][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[2][2]) and i[3].startswith(LINE_SIGNATURE_TUPLE[2][3]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[2],i))
            else:
                declare_error()
                return True
    return False
That is very smelly code though. How about using regular expressions?
for line in lines:
    if re.match(WORLD_REGEX, line):
        simulation_obj_list.append(
            extract_line_using_signature(LINE_SIGNATURE_TUPLE[0], line))
    elif re.match(PERSON_REGEX, line):
        # etc
    else:
        declare_error()
        return True
return False
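Since the question rules out importing anything beyond sys and os, the same dispatch can also be sketched without re by looping over the signatures themselves. The SIGNATURES tuple and the find_signature name below are hypothetical stand-ins modelled on the docstring's example, not the project's real constants:

```python
# Hypothetical signatures, modelled on ('sim_object_type=World', 'size=').
SIGNATURES = (
    ("sim_object_type=World", "size="),
    ("sim_object_type=Person", "name=", "location="),
    ("sim_object_type=Robot", "name=", "location=", "color="),
)

def find_signature(tokens, signatures=SIGNATURES):
    """Return the first signature whose prefixes all match tokens, else None."""
    for signature in signatures:
        # Same number of tokens, and each token starts with its prefix.
        if len(tokens) == len(signature) and all(
            token.startswith(prefix) for token, prefix in zip(tokens, signature)
        ):
            return signature
    return None
```

A caller could then treat a None result as the failure case (call declare_error and return True) and otherwise pass the matched signature on to extract_line_using_signature, which removes the repeated if/elif branches.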
In the following loop, once the conditions that identify a line are satisfied, how can I modify a line located "n" lines away from the current one and write the change to the text file that the list was read from?
llll=['aaa','ww','emmm','wiguy','ynof','sijegy']
mfw=open(r"D:\file.txt",'r')
listmfw=mfw.readlines()
for i, line in enumerate(listmfw):
    if line != '\n':
        if not (line.split()[0] in llll):
            if (line.strip('\n') == 'wiguy'):
                i=i-42
                print('zugyt')
                #As the line where I want to write is empty, I tried:
                line=line.replace('\n','zugyt')
But nothing happens
I added what was missing - typing error!
I think you may misunderstand what Python's file interface does. Even if you opened the file in read/write mode and then wrote to it, all that would do is overwrite whatever is in the file at the point where you write. You may want to look at fileinput; the code below replaces every blank line with 'Test\n':
from __future__ import print_function
import fileinput
file = fileinput.input('test.txt', inplace=1)
for line in file:
    if line == '\n':
        print('Test')
    else:
        print(line, end='')
I found a working solution that writes what I need at the position I need, relative to another position found by the specified conditions. Here it is:
mfw=open(r"D:\file.txt",'r')
listmfw=mfw.readlines()
for i, line in enumerate(listmfw):
    if line != '\n':
        #if not (line.split()[0] in llll):
        if (line.strip('\n') == 'wiguy'):
            listmfw[i-43]='bzz'
            out = open(r"D:\Users\cristina\Documents\horia\Linguistics\file.txt", 'w')
            out.writelines(listmfw)
            listmfw[i-31]='vrrr'
            out.writelines(listmfw)
            listmfw[i-19]='dopiu'
            out.writelines(listmfw)
        if (line.strip('\n') == 'emmm'):
            listmfw[i-13]='ommm'
            out.writelines(listmfw)
        if (line.strip('\n') == 'sijegy'):
            listmfw[i-19]='mrrr'
            out.writelines(listmfw)
out.flush()
out.close()
mfw.close()
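The same idea can be sketched as a small helper that edits the list in memory, so the file only needs to be written once at the end. The name replace_relative and its parameters are made up for this example:

```python
def replace_relative(lines, marker, offset, new_text):
    """Return a copy of lines where the entry `offset` positions away from
    each line equal to `marker` (ignoring its newline) is replaced."""
    result = list(lines)
    for i, line in enumerate(lines):
        target = i + offset
        # Skip targets outside the list; a negative index would silently
        # wrap around to the end of the list instead of failing.
        if line.rstrip("\n") == marker and 0 <= target < len(result):
            result[target] = new_text
    return result
```

After all replacements are applied, a single out.writelines(result) writes the modified list, which avoids appending the whole list to the output several times. Note that the replacement text needs its own trailing '\n' if the line structure of the file should be preserved.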
f=open('sequence3.fasta', 'r')
str=''
for line in f:
    line2=line.rstrip('\n')
    if (line2[0]!='>'):
        str=str+line2
    elif (len(line)==0):
        break
str.rstrip('\n')
f.close()
The script is supposed to read 3 DNA sequences and join them into one sequence.
The problem is that I get this error:
IndexError: string index out of range
And when I write it like this:
f=open('sequence3.fasta', 'r')
str=''
for line in f:
    line.rstrip('\n')
    if (line[0]!='>'):
        str=str+line
    elif (len(line)==0):
        break
str.rstrip('\n')
f.close()
It runs, but there are line breaks in between the sequences.
Thanks
The second version doesn't crash because the statement line.rstrip('\n') is a no-op: rstrip returns a new string and doesn't modify the existing one (line), so line keeps its trailing newline and is never empty.
The first version crashes probably because you have empty lines in your input file, so line.rstrip returns an empty string. Try this:
f=open('sequence3.fasta', 'r')
str=''
for line in f:
    line2=line.rstrip('\n')
    if line2 and line2[0]!='>':
        str=str+line2
    elif len(line)==0:
        break
if line2 is equivalent to if len(line2) > 0. Similarly, you could replace your elif len(line)==0 with elif not line.
Your empty-line check is in the wrong place. Try:
for line in f:
    line = line.rstrip('\n')
    if len(line) == 0: # or simply: if not line:
        break
    if line[0] != '>':
        str=str+line
Another solution is to use .startswith: if not line.startswith('>')
line.rstrip('\n')
returns a copy of line, and you do nothing with it. It doesn't change line.
The exception "IndexError: string index out of range" means that line[0] cannot be referenced, so line must be empty. Perhaps you should write it like this:
for line in f:
    line = line.rstrip('\n')
    if line:
        if (line[0]!='>'):
            str=str+line
    else:
        break
You shouldn't use your second code example, where you don't save the return value of rstrip. rstrip doesn't modify the original string that it was called on. From the documentation for rstrip: "Return a copy of the string with trailing characters removed."
Also, the first condition in your if/else statement should be the check for length 0; otherwise you'll get an error for indexing past the end of the string.
Additionally, having a break in your if/else statement will end your loop early if you have an empty line. Instead of breaking, you could just do nothing when the length is 0.
if (len(line2) != 0):
    if (line2[0] != '>'):
        str = str+line2
Also your line near the end str.rstrip('\n') isn't doing anything since the return value of rstrip isn't saved.
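Putting these points together, here is one possible sketch that skips header and blank lines and joins the rest. The function name join_fasta_sequences is made up for this example, and it takes any iterable of lines (such as an open file):

```python
def join_fasta_sequences(lines):
    """Concatenate all sequence lines, skipping '>' headers and blank lines."""
    parts = []
    for line in lines:
        line = line.strip()  # keep the stripped copy; strip() returns a new string
        if line and not line.startswith(">"):
            parts.append(line)
    return "".join(parts)
```

Collecting the pieces in a list and joining once also avoids shadowing the built-in name str, which the original snippets use as a variable.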
I'd like to read a file in python line by line, but in some cases (based on an if condition) I'd also like to read the next line in the file, and then keep reading it the same way.
Example:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
Basically, in this example I am trying to read the file line by line, but when the line does not start with # I'd like to read the next line, print it, and then keep reading from the line after line2. This is just an example that reproduces the error I get from similar code of mine; my goal is as stated in the title.
But I get an error like ValueError: Mixing iteration and read methods would lose data.
Would it be possible to do what I am trying to do in a smarter way?
If you just want to skip over lines not starting with #, there's a much easier way to do this:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] != '#':
        continue
    # now do the regular logic
    print line
Obviously this kind of simplistic logic won't work in all possible cases. When it doesn't, you have to do exactly what the error implies: either use iteration consistently, or use read methods consistently. This is going to be more tedious and error-prone, but it's not that bad.
For example, with readline:
while True:
    line = file_handler.readline()
    if not line:
        break
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
Or, with iteration:
lines = file_handler
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        print line
        print next(file_handler)
However, that last version is sort of "cheating". You're relying on the fact that the iterator in the for loop is the same thing as the iterable it was created from. This happens to be true for files, but not for, say, lists. So really, you should do the same kind of while True loop here, unless you want to add an explicit iter call (or at least a comment explaining why you don't need one).
And a better solution might be to write a generator function that transforms one iterator into another based on your rule, and then print out each value iterated by that generator:
def doublifier(iterable):
    it = iter(iterable)
    while True:
        line = next(it)
        if line.startswith('#'):
            yield line, next(it)
        else:
            yield (line,)
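The generator above is written for Python 2. In Python 3.7 and later, a StopIteration that escapes inside a generator is turned into a RuntimeError (PEP 479), so a Python 3 variant might look like the sketch below. It keeps the same pairing rule and, as an assumption of this example, yields a one-element tuple when a '#' line is the last line:

```python
def doublifier(iterable):
    """Yield (line, next_line) after a '#' line, otherwise (line,)."""
    it = iter(iterable)  # explicit iterator, so lists work as well as files
    for line in it:
        if line.startswith("#"):
            following = next(it, None)  # None when '#' is the last line
            if following is None:
                yield (line,)
            else:
                yield (line, following)
        else:
            yield (line,)
```

Each yielded tuple can then be printed or unpacked by the caller, for example with for group in doublifier(file_handler): print(*group).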
file_handler = open(fname, 'r')
for line in file_handler:
    if line.startswith('#'): # <<< comment 1
        print line
    else:
        line2 = next(file_handler) # <<< comment 2
        print line2
Discussion
Comment 1: Your code used a single equal sign, which is incorrect. It should be a double equal sign for comparison. I recommend using the .startswith() method to make the code clearer.
Comment 2: Use the next() function to advance to the next line, since you are using file_handler as an iterator.
Add a flag value:
if flag is True:
    print line #or whatever
    flag = False
if line[0] == '#':
    flag = True
This is a versatile version :-)
You can save a bit of state information that tells you what to do with the next line:
want_next = False
for line in open(fname):
    if want_next:
        print line
        want_next = False
    elif line[0] == '#':
        print line
        want_next = True
I think what you are looking for is next rather than readline.
A few things. In your code, you use = rather than ==. I will use startswith instead. If you call next on an iterator, it will return the next item or raise a StopIteration exception.
The file
ewolf#~ $cat foo.txt
# zork zap
# woo hoo
here is
some line
# a line
with no haiku
The program
file_handler = open( 'foo.txt', 'r' )
for line in file_handler:
    line = line.strip()
    if line.startswith( '#' ):
        print "Not Skipped : " + line
    elif line is not None:
        try:
            l2 = file_handler.next()
            l2 = l2.strip()
            print "Skipping. Next line is : " + l2
        except StopIteration:
            # End of File
            pass
The output
Not Skipped : # zork zap
Not Skipped : # woo hoo
Skipping. Next line is : some line
Not Skipped : # a line
Skipping. Next line is :
Try if line[0] == "#" instead of line[0] = "#".