Error Handling without using try/except in a Python Function - python

I wrote this Python function. The project spec doesn't allow me to use try/except to handle errors. Per the docstring, I'm supposed to return False on success and True on failure, calling a declare_error function in the failure case. The line number is taken care of in a main function.
Note that I'm not allowed to import anything besides sys and os, so regular expressions are off the table.
Here is my code. Any suggestions on what I should use for the if/else statement?
#===============================================================================
def read_subsequent_lines(file_object, line_number, simulation_obj_list):
#===============================================================================
    '''Read and parse the next line of the file, confirm it matches one of the
    line signatures, such as ('sim_object_type=World', 'size=')
    Parameters: file_object, the input file object.
                line_number, the current line number being read in.
                simulation_obj_list, the list of converted lines.
    Returns: False for success, True for failure
    Convert the line in the file to a list of strings, with one string per
    name=value pair (such as "name=Joe"). Make sure that each line matches
    up with one of the line "signatures" in the constant definitions.
    Modify this line_list to only include the value portion of the pair,
    calling extract_line_using_signature (...) to get the each line list.
    Append each modified line_list to the simulation_obj_list( ).
    If success: Return False.
    If failure: Call declare_error (...) and Return True.
    '''
    #List of lists to contain each line of the input file
    line_list = []
    #Read in Lines and append to a list of lines containing strings
    for line in file_object:
        # print(line.strip().split(','))
        line_list.append(line.strip().split())
    #Compare line to signature constants and append to simulation_obj_list
    for i in line_list:
        #World
        if len(i) == NUM_WORLD_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[0][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[0][1]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[0],i))
        #Person
        if len(i) == NUM_PERSON_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[1][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[1][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[1][2]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[1],i))
        #Robot
        if len(i) == NUM_ROBOT_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[2][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[2][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[2][2]) and i[3].startswith(LINE_SIGNATURE_TUPLE[2][3]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[2],i))

The quick ugly fix is to use elifs and elses (I'm assuming NUM_WORLD_TOKENS, NUM_PERSON_TOKENS, and NUM_ROBOT_TOKENS are all unique values):
    #Compare line to signature constants and append to simulation_obj_list
    for i in line_list:
        #World
        if len(i) == NUM_WORLD_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[0][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[0][1]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[0],i))
            else:
                declare_error()
                return True
        #Person
        elif len(i) == NUM_PERSON_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[1][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[1][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[1][2]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[1],i))
            else:
                declare_error()
                return True
        #Robot
        elif len(i) == NUM_ROBOT_TOKENS:
            if i[0].startswith(LINE_SIGNATURE_TUPLE[2][0]) and i[1].startswith(LINE_SIGNATURE_TUPLE[2][1]) and \
               i[2].startswith(LINE_SIGNATURE_TUPLE[2][2]) and i[3].startswith(LINE_SIGNATURE_TUPLE[2][3]):
                simulation_obj_list.append(extract_line_using_signature(LINE_SIGNATURE_TUPLE[2],i))
            else:
                declare_error()
                return True
    return False
That is very smelly code though. How about using regular expressions?
    for line in lines:
        if re.match(WORLD_REGEX, line):
            simulation_obj_list.append(
                extract_line_using_signature(LINE_SIGNATURE_TUPLE[0], line))
        elif re.match(PERSON_REGEX, line):
            pass  # etc
        else:
            declare_error()
            return True
    return False
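Since the question rules out both try/except and the re module, here is a minimal sketch of a third option: loop over the signatures themselves instead of writing one branch per object type. The signature contents, the declare_error arguments, and the behaviour of extract_line_using_signature are all assumptions made for illustration; the real project constants will differ.

```python
# Hypothetical stand-ins for the constants and helpers the question refers to.
LINE_SIGNATURE_TUPLE = (
    ("sim_object_type=World", "size="),
    ("sim_object_type=Person", "name=", "location="),
    ("sim_object_type=Robot", "name=", "location=", "color="),
)

errors = []


def declare_error(message, line_number):
    """Assumed signature: record the problem instead of raising."""
    errors.append((line_number, message))


def extract_line_using_signature(signature, tokens):
    """Assumed behaviour: keep only the value part of each name=value pair."""
    return [token.split("=", 1)[1] for token in tokens]


def matches_signature(signature, tokens):
    """True when the token count and every name= prefix agree with the signature."""
    if len(tokens) != len(signature):
        return False
    return all(tok.startswith(prefix) for prefix, tok in zip(signature, tokens))


def read_subsequent_lines(file_object, line_number, simulation_obj_list):
    """Return False on success, True on the first line that matches no signature."""
    for line in file_object:
        line_number += 1
        tokens = line.strip().split()
        for signature in LINE_SIGNATURE_TUPLE:
            if matches_signature(signature, tokens):
                simulation_obj_list.append(
                    extract_line_using_signature(signature, tokens))
                break
        else:
            # for/else: this branch runs only when no signature matched.
            declare_error("line matches no signature", line_number)
            return True
    return False
```

Checking the length before indexing is what makes try/except unnecessary here: no token is ever accessed unless the counts already agree. A line whose token count fits no signature is reported as an error instead of being skipped silently.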

Related

Replacement for isAlpha() to include underscores?

I am processing data using Python3 and I need to read a results file that looks like this:
ENERGY_BOUNDS
1.964033E+07 1.733253E+07 1.491825E+07 1.384031E+07 1.161834E+07 1.000000E+07 8.187308E+06 6.703200E+06
6.065307E+06 5.488116E+06 4.493290E+06 3.678794E+06 3.011942E+06 2.465970E+06 2.231302E+06 2.018965E+06
GAMMA_INTERFACE
0
EIGENVALUE
1.219034E+00
I want to search the file for a specific identifier (in this case ENERGY_BOUNDS), begin reading the numeric values after this identifier but not the identifier itself, and stop when I reach the next identifier. However, my problem is that I was using isAlpha to find the next identifier, and some of them contain underscores. Here is my code:
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as read_obj:
        list_of_results = []
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(read_obj)
                while(not nextValue.strip().isalpha()): # Keep on reading until next identifier appears
                    list_of_results.extend(nextValue.split())
                    nextValue = next(read_obj)
    return(list_of_results)
I think I need to use regex, but I am stuck regarding how to phrase it. Any help would be much appreciated!
take = False
with open('path/to/input') as infile:
    for line in infile:
        if line.strip() == "ENERGY_BOUNDS":
            take = True
            continue # we don't actually want this line
        if all(char.isalpha() or char=="_" for char in line.strip()): # we've hit the next section
            take = False
        if take:
            print(line) # or whatever else you want to do with this line
Here's an option for you.
Just iterate over the file until you hit the identifier.
Then iterate over it in another for loop until the next identifier causes a ValueError.
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as f:
        list_of_results = []
        for line in f:
            if identifier in line:
                break
        for line in f:
            try:
                list_of_results.extend(map(float, line.split()))
            except ValueError:
                break
    return list_of_results
You can use this regex: ^[A-Z]+(?:_[A-Z]+)*$
Additionally, you can modify the regex to limit the length of the first segment, like this: ^[A-Z]{2,10}(?:_[A-Z]+)*$, where {2,10} is the {MIN,MAX} length:
You can find this demo here: https://regex101.com/r/9jESAH/35
See this answer for more details.
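Here is a simple function to verify that a string contains only uppercase letters, lowercase letters, and underscores:
import re
RE_PY_VAR_NAME = "^[a-zA-Z_]+$"
def isAlphaUscore(s: str) -> bool:
    assert s is not None, "s cannot be None"
    # re.search returns a match object or None, so convert it to a real bool
    return re.search(RE_PY_VAR_NAME, s) is not None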
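For comparison, here is a small regex-free sketch that combines the ideas from the answers above: treat a line as an identifier when it is made only of letters and underscores, and collect numbers between the requested identifier and the next one. The names is_identifier and read_block are made up for this example, and it takes any iterable of lines so it can be tried without a file.

```python
def is_identifier(text):
    """True for a non-empty string made only of letters and underscores."""
    return bool(text) and all(ch.isalpha() or ch == "_" for ch in text)


def read_block(lines, identifier):
    """Collect the numeric values that follow `identifier`, up to the next identifier."""
    values = []
    taking = False
    for raw in lines:
        line = raw.strip()
        if line == identifier:
            taking = True            # start collecting after this line
        elif is_identifier(line):
            if taking:
                break                # reached the next section header
        elif taking:
            values.extend(float(tok) for tok in line.split())
    return values
```

Because the loop simply ends when the input runs out, the last section of a file (EIGENVALUE in the sample) is handled without any next() call that could raise StopIteration. With a real file, pass the open file object as `lines`.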

Return value in a quite nested for-loop

I want nested loops to test whether all elements match the condition and then to return True. Example:
There's a given text file: file.txt, which includes lines of this pattern:
aaa:bb3:3
fff:cc3:4
Letters, colon, alphanumeric, colon, integer, newline.
Generally, I want to test whether all lines match this pattern. However, in this function I would like to check whether the first column contains only letters.
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
However, the function returns False even if all characters in the first column are letters. I would like to ask you for the explanation, too.
Your code evaluates a blank line in your file, hence False:
Your file most likely contains a blank line (for example after its last data line) that holds whitespace such as a space or a tab. That line does not fulfill your test, which is why you get False no matter what the data lines contain. A completely empty line would pass, because any() over an empty string is False:
aaa:bb3:3
fff:cc3:4
(blank line that does not consist only of letters)
You can fix it by special-casing blank lines when they occur at the end. With the code below, a blank line in between filled ones returns False as well:
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
fff:cc3:4
""")
import string
def opener(file):
    letters = string.ascii_letters
    # Opens a file and creates a list of lines
    with open(file) as fi:
        res = True
        empty_line_found = False
        for i in fi:
            if i.strip(): # only check line if not empty
                if empty_line_found: # we had an empty line and now a filled line: error
                    return False
                #Checks whether any characters in the first column is not a letter
                if any(j not in letters for j in i.strip().split(':')[0]):
                    return False # immediately exit - no need to test the rest of the file
            else:
                empty_line_found = True
    return res # or True
print (opener("t.txt"))
Output:
True
If you use
# example with a file that contains an empty line between data lines - NOT ok
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3

fff:cc3:4
""")
or
# example for a file with a non-letter in the first column (ff2f) - NOT ok,
# even though an empty line after the data would be fine
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
ff2f:cc3:4
""")
you get: False
Colonoscopy
ASCII, and UNICODE, both define character 0x3A as COLON. This character looks like two dots, one over the other: :
ASCII, and UNICODE, both define character 0x3B as SEMICOLON. This character looks like a dot over a comma: ;
You were consistent in your use of the colon in your example: fff:cc3:4 and you were consistent in your use of the word semicolon in your descriptive text: Letters, semicolon, alphanumeric, semicolon, integer, newline.
I'm going to assume you meant colon (':') since that is the character you typed. If not, you should change it to a semicolon (';') everywhere necessary.
Your Code
Here is your code, for reference:
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
Your Problem
The problem you asked about was the function always returning false. The example you gave included a blank line between the first example and the second. I would caution you to watch out for spaces or tabs in those blank lines. You can fix this by explicitly catching blank lines and skipping over them:
for i in fi:
    if i.isspace():
        # skip blank lines
        continue
Some Other Problems
Now here are some other things you might not have noticed:
You provided a nice comment in your function. That should have been a docstring:
def opener(file):
    """ Opens a file and creates a list of lines.
    """
You import string in the middle of your function. Don't do that. Move the import up to the top of the module:
import string # at top of file
def opener(file): # Not at top of file
You opened the file with open() and never closed it. This is exactly why the with keyword was added to Python:
with open(file) as infile:
    fi = infile.read().splitlines()
You opened the file, read its entire contents into memory, then split it into lines discarding the newlines at the end. All so that you could split it by colons and ignore everything but the first field.
It would have been simpler to just call readlines() on the file:
with open(file) as infile:
    fi = infile.readlines()
res = True
for i in fi:
It would have been even easier and even simpler to just iterate on the file directly:
with open(file) as infile:
    res = True
    for i in infile:
It seems like you are building up towards checking the entire format you gave at the beginning. I suspect a regular expression would be (1) easier to write and maintain; (2) easier to understand later; and (3) faster to execute. Both now, for this simple case, and later when you have more rules in place:
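import logging
import re
# Assumed pattern for "letters:alphanumeric:integer"; adjust to your real format
valid_line = re.compile(r'[A-Za-z]+:[A-Za-z0-9]+:\d+\s*$')
def opener(file):
    bad_lines = 0
    with open(file) as infile:
        for line in infile:
            if line.isspace():
                continue
            if re.match(valid_line, line):
                continue
            logging.warning(f"Bad line: {line}")
            bad_lines += 1
    return bad_lines == 0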
Your names are bad. Your function includes the names file, fi, i, j, and res. The only one that barely makes sense is file.
Considering that you are asking people to read your code and help you find a problem, please, please use better names. If you just replaced those names with file (same), infile, line, ch, and result the code gets more readable. If you restructured the code using standard Python best practices, like with, it gets even more readable. (And has fewer bugs!)
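To make the naming and early-return advice concrete, here is a minimal sketch of the first-column check without a regular expression. The function name first_column_is_alpha is invented for this example, and it accepts any iterable of lines (an open file works) so it is easy to test. One assumed difference from the original: a line with an empty first column is rejected here, whereas any() over an empty field let it pass.

```python
def first_column_is_alpha(lines):
    """Return True when every non-blank line starts with a letters-only field."""
    for line in lines:
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        first_field = line.split(":")[0]
        # isascii() keeps the check equivalent to string.ascii_letters
        if not (first_field.isascii() and first_field.isalpha()):
            return False  # no need to look at the rest of the input
    return True
```

With a file on disk this would be called as `with open(path) as infile: result = first_column_is_alpha(infile)`.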

Read a file line by line, sometimes reading the next line within same loop

I'd like to read a file in python line by line, but in some cases (based on an if condition) I'd also like to read the next line in the file, and then keep reading it the same way.
Example:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
basically in this example I am trying to read it line by line, but when the line does not start with # I'd like to read the next line, print it, and then keep reading the line after line2. This is just an example where I got the error for similar stuff I am doing in my code but my goal is as stated in the title.
But I'd get an error like ValueError: Mixing iteration and read methods would lose data.
Would it be possible to do what I am trying to do in a smarter way?
If you just want to skip over lines not starting with #, there's a much easier way to do this:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] != '#':
        continue
    # now do the regular logic
    print line
Obviously this kind of simplistic logic won't work in all possible cases. When it doesn't, you have to do exactly what the error implies: either use iteration consistently, or use read methods consistently. This is going to be more tedious and error-prone, but it's not that bad.
For example, with readline:
while True:
    line = file_handler.readline()
    if not line:
        break
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
Or, with iteration:
lines = file_handler
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        print line
        print next(file_handler)
However, that last version is sort of "cheating". You're relying on the fact that the iterator in the for loop is the same thing as the iterable it was created from. This happens to be true for files, but not for, say, lists. So really, you should do the same kind of while True loop here, unless you want to add an explicit iter call (or at least a comment explaining why you don't need one).
And a better solution might be to write a generator function that transforms one iterator into another based on your rule, and then print out each value iterated by that generator:
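def doublifier(iterable):
    it = iter(iterable)
    for line in it:
        if line.startswith('#'):
            # Pair the line with the one after it; '' if the input ends here.
            # A default avoids StopIteration escaping the generator (PEP 479).
            yield line, next(it, '')
        else:
            yield (line,)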
file_handler = open(fname, 'r')
for line in file_handler:
    if line.startswith('#'): # <<< comment 1
        print line
    else:
        line2 = next(file_handler) # <<< comment 2
        print line2
Discussion
Comment 1: Your code used a single equal sign, which is incorrect; comparison needs a double equal sign. I recommend using the .startswith() method to make the code clearer.
Comment 2: Use the next() function to advance to the next line, since you are using file_handler as an iterator.
Add a flag value:
flag = False
for line in file_handler:
    if flag is True:
        print line #or whatever
        flag = False
    if line[0] == '#':
        flag = True
This is a versatile version :-)
You can save a bit of state information that tells you what to do with the next line:
want_next = False
for line in open(fname):
    if want_next:
        print line
        want_next = False
    elif line[0] == '#':
        print line
        want_next = True
I think what you are looking for is next rather than readline.
A few things. In your code, you use = rather than ==. I will use startswith instead. If you call next on an iterator, it will return the next item or throw a StopIteration exception.
The file
ewolf#~ $cat foo.txt
# zork zap
# woo hoo
here is
some line
# a line
with no haiku
The program
file_handler = open( 'foo.txt', 'r' )
for line in file_handler:
    line = line.strip()
    if line.startswith( '#' ):
        print "Not Skipped : " + line
    elif line is not None:
        try:
            l2 = file_handler.next()
            l2 = l2.strip()
            print "Skipping. Next line is : " + l2
        except StopIteration:
            # End of File
            pass
The output
Not Skipped : # zork zap
Not Skipped : # woo hoo
Skipping. Next line is : some line
Not Skipped : # a line
Skipping. Next line is :
try if line[0] == "#" instead of line[0] = "#"
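The answers above are written for Python 2. As a rough Python 3 sketch of the same idea, the generator below stays with iteration throughout (so the "Mixing iteration and read methods" error cannot occur) and uses next() with a default so reaching the end of the file needs no try/except. The name pair_with_next is made up for this example.

```python
def pair_with_next(lines):
    """Yield (line, following_line) for non-'#' lines and (line, None) for '#' lines."""
    it = iter(lines)
    for line in it:
        if line.startswith("#"):
            yield line, None
        else:
            # next() with a default never raises StopIteration at end of input
            yield line, next(it, None)


# Usage with an in-memory list; an open file object works the same way.
for line, following in pair_with_next(["# a\n", "b\n", "c\n"]):
    print(repr(line), repr(following))
```

Calling iter() explicitly means the function works for lists as well as files, which addresses the "cheating" concern raised earlier about relying on a file being its own iterator.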

Match an element of every line

I have a list of rules for a given input file for my function. If any of them are violated in the file given, I want my program to return an error message and quit.
Every gene in the file should be on the same chromosome
Thus, for lines such as:
NM_001003443 chr11 + 5997152 5927598 5921052 5926098 1 5928752,5925972, 5927204,5396098,
NM_001003444 chr11 + 5925152 5926098 5925152 5926098 2 5925152,5925652, 5925404,5926098,
NM_001003489 chr11 + 5925145 5926093 5925115 5926045 4 5925151,5925762, 5987404,5908098,
etc.
Each line in the file will be variations of this line
Thus, I want to make sure every line in the file is on chr11
Yet I may be given a file with a different list of chr(and any number of numbers). Thus I want to write a function that will make sure whatever number is found on chr in the line is the same for every line.
Should I use a regular expression for this, or what should I do? This is in python by the way.
Such as: chr\d+ ?
I am unsure how to make sure that whatever is matched is the same in every line though...
I currently have:
from re import *
for line in file:
    r = 'chr\d+'
    i = search(r, line)
    if i in line:
but I don't know how to make sure it is the same in every line...
In reference to sajattack's answer
fp = open(infile, 'r')
for line in fp:
    filestring = ''
    filestring +=line
    chrlist = search('chr\d+', filestring)
    chrlist = chrlist.group()
for chr in chrlist:
    if chr != chrlist[0]:
        print('Every gene in file not on same chromosome')
Just read the file and have a while loop check each line to make sure it contains chr11. There are string functions to search for substrings in a string. As soon as you find a line that returns false (does not contain chr11) then break out of the loop and set a flag valid = false.
import re
fp = open(infile, 'r')
fp.readline()
tar = re.findall(r'chr\d+', fp.readline())[0]
for line in fp:
    if (line.find(tar) == -1):
        print("Not valid")
        break
This should search for a number in the line and check for validity.
Is it safe to assume that the first chr is the correct one? If so, use this:
import re
chrlist = re.findall("chr[0-9]+", open('file').read())
# ^ this is a list with all chr(whatever numbers)
for chr in chrlist:
    if chr != chrlist[0]:
        print("Chr does not match")
        break
My solution uses a "match group" to collect the matched numbers from the "chr" string.
import re
pat = re.compile(r'\schr(\d+)\s')
def chr_val(line):
    m = re.search(pat, line)
    if m is not None:
        return m.group(1)
    else:
        return ''
def is_valid(f):
    line = f.readline()
    v = chr_val(line)
    if not v:
        return False
    return all(chr_val(line) == v for line in f)
with open("test.txt", "r") as f:
    print("The file is {0}".format("valid" if is_valid(f) else "NOT valid"))
Notes:
Pre-compiles the regular expression for speed.
Uses a raw string (r'') to specify the regular expression.
The pattern requires white space (\s) on either side of the chr string.
is_valid() returns False if the first line doesn't have a good chr value. Then it returns a Boolean value that is true if all of the following lines match the chr value of the first line.
Your sample code just prints something like The file is True so I made it a bit friendlier.
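If the chromosome is always the second whitespace-separated field, as it is in the sample lines, a regular expression may not be needed at all. The sketch below rests on that assumption: it collects the distinct values in that column and checks that there is exactly one. The function names and the column index are illustrative, not taken from the question.

```python
def chromosomes_in(lines, column=1):
    """Return the set of distinct values found in the chromosome column."""
    found = set()
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # ignore blank or short lines
            found.add(fields[column])
    return found


def all_same_chromosome(lines):
    """True when exactly one chromosome name appears across all lines."""
    return len(chromosomes_in(lines)) == 1
```

Returning the set from chromosomes_in also makes it easy to build an error message that names the offending chromosomes before quitting.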

Python 2.7 using if elif to go through a text file

Goal is to write a script which will copy a text file and exclude any line beginning with #.
My question is I seem to get an index error which is dependent upon the order of my if elif conditions. The only difference between the nonworking code and the working code (besides the suffix "_bad" to the nonworking function name) is that I test the "" condition first (works) vs testing the "#" condition first (doesn't work)
Base file is created by this script:
>>> testFileObj = open("test.dat","w")
>>> testFileObj.write("#line one\nline one\n#line two\nline two\n")
>>> testFileObj.close()
Code which works:
def copyAndWriteExcludingPoundSigns(origFile, origFileWithOutPounds):
    origFileObj = open(origFile,"r")
    modFileObj = open(origFileWithOutPounds,"w")
    while True:
        textObj = origFileObj.readline()
        if textObj == "":
            break
        elif textObj[0] == "#":
            continue
        else:
            modFileObj.write(textObj)
    origFileObj.close()
    modFileObj.close()
Code which doesn't work:
def copyAndWriteExcludingPoundSigns_Bad(origFile, origFileWithOutPounds):
    origFileObj = open(origFile,"r")
    modFileObj = open(origFileWithOutPounds,"w")
    while True:
        textObj = origFileObj.readline()
        if textObj[0] == "#":
            continue
        elif textObj == "":
            break
        else:
            modFileObj.write(textObj)
    origFileObj.close()
    modFileObj.close()
Which gives me this error:
Traceback (most recent call last):
  File "<pyshell#96>", line 1, in <module>
    copyAndWriteExcludingPoundSigns_Bad("test.dat","testOutput.dat")
  File "<pyshell#94>", line 6, in copyAndWriteExcludingPoundSigns_Bad
    if textObj[0] == "#":
IndexError: string index out of range
If you do if textObj[0] == "#": and textObj="" then there is no character at the zero index, because the string is empty, hence the index error.
The alternative is to do
if textObj.startswith("#"): which will work in both cases.
some tips (and please please read PEP8):
use a 'for' instead of a 'while' loop
no need to use readlines after python 2.4
test if the line is empty before testing for the first char
Untested:
def copy_and_write_excluding_pound_signs(original, filtered):
    original_file = open(original,"r")
    filtered_file = open(filtered,"w")
    for line in original_file:
        if line and line[0] == '#':
            continue
        filtered_file.write(line)
    original_file.close()
    filtered_file.close()
You may also want to filter out lines with some white space before the '#':
import re
def copy_and_write_excluding_pound_signs(original, filtered):
    pound_re = re.compile(r'^\s*#')
    original_file = open(original,"r")
    filtered_file = open(filtered,"w")
    for line in original_file:
        if pound_re.match(line):
            continue
        filtered_file.write(line)
    original_file.close()
    filtered_file.close()
You should use line.startswith('#') to check whether the string line starts with '#'. If the line is empty (such as line = ''), there would be no first character, and you'd get this error.
Also, breaking out of the loop by testing for an empty string is fragile. Files in Python are iterable, so you can simply use a for line in file: loop.
The problem with your non-working code is that readline() returns an empty string at the end of the file, which causes the IndexError when the statement if textObj[0] == "#": is evaluated ([0] is a reference to the first element of the string). The working code checks for the empty string first, so it never indexes into it.
The simplest way I can think of to rewrite your function is to use for line in <fileobj>, so you won't have to worry about line ever being empty. Also, if you use the Python with statement, your files will be closed automatically. Anyway, here's what I suggest:
def copyAndWriteExcludingPoundSigns(origFile, origFileWithOutPounds):
    with open(origFile,"r") as origFileObj:
        with open(origFileWithOutPounds,"w") as modFileObj:
            for line in origFileObj:
                if line[0] != '#':
                    modFileObj.write(line)
The two with statements could be combined, but that would have made for a very long and harder-to-read line of code, so I broke it up.
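For reference, here is a short sketch of the combined form, using startswith() as recommended in the other answers. The function name copy_without_pound_lines is made up for this example, and a backslash continuation keeps the single with statement from becoming one very long line.

```python
def copy_without_pound_lines(source_path, target_path):
    """Copy source to target, skipping every line that starts with '#'."""
    # One with statement manages both files; both are closed on exit,
    # even if an error occurs while copying.
    with open(source_path, "r") as source, \
         open(target_path, "w") as target:
        for line in source:
            # startswith() is safe even for an empty string, unlike line[0]
            if not line.startswith("#"):
                target.write(line)
```

Iterating over the file never produces an empty string, so no separate end-of-file test is needed.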
