Greedy execution of statements? - python

I have something like this using BeautifulSoup:
for line in lines:
    code = line.find('span', {'class':'boldHeader'}).text
    coded = line.find('div', {'class':'Description'}).text
    definition = line.find('ul', {'class':'definitions'}).text
    print code, coded, def
However, not all elements exist at all times. I can enclose this in a try except so that it does not break the program execution like this:
for line in lines:
    try:
        code = line.find('span', {'class':'boldHeader'}).text
        coded = line.find('div', {'class':'Description'}).text
        definition = line.find('ul', {'class':'definitions'}).text
        print code, coded, def
    except:
        pass
But how do I execute the statements in a greedy fashion? For instance, if only two of the elements are available, code and coded, I just want to get those and continue with the execution. As of now, even if code and coded exist, if def does not exist, the print command is never executed.
One way of doing this is to put a try...except for every statement like this:
for line in lines:
    try:
        code = line.find('span', {'class':'boldHeader'}).text
    except:
        pass
    try:
        coded = line.find('div', {'class':'Description'}).text
    except:
        pass
    try:
        definition = line.find('ul', {'class':'definitions'}).text
    except:
        pass
    print code, coded, def
But this is an ugly approach and I want something cleaner. Any suggestions?

How about capturing the "ugly" code in a function, and just calling the function as needed:
def get_txt(l, tag, classname):
    try:
        txt = l.find(tag, {'class': classname}).text
    except AttributeError:
        # find() returned None, so there is no .text to read
        txt = None
    return txt
for line in lines:
    code = get_txt(line, 'span', 'boldHeader')
    coded = get_txt(line, 'div', 'Description')
    defn = get_txt(line, 'ul', 'definitions')
    print code, coded, defn
PS. I changed def to defn because def is a Python keyword. Using it as a variable name raises a SyntaxError.
PPS. It's not good practice to use a bare except clause:
try:
    ....
except:
    ...
because it almost always catches more than you intend. Much better to be explicit about what you want to catch:
try:
    ...
except AttributeError as err:
    ...
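To illustrate the point about bare except clauses, here is a small sketch (Python 3, with made-up names such as risky and undefined_name) showing how a bare except hides a plain typo, while a narrow except lets it surface:

```python
def risky():
    # Deliberate typo for the demo: this name does not exist, so NameError is raised.
    return undefined_name


# Bare except: the NameError is silently swallowed, hiding the typo.
try:
    risky()
except:
    outcome_bare = 'swallowed'

# Narrow except: only AttributeError is handled, so the NameError escapes
# the inner handler and is seen by whoever is watching for it.
try:
    try:
        risky()
    except AttributeError:
        outcome_narrow = 'swallowed'
except NameError as err:
    outcome_narrow = 'surfaced: %s' % err

print(outcome_bare)
print(outcome_narrow)
```

With the bare version you would never learn that the code inside the try block was broken for an unrelated reason.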

First of all, you can test for None instead of catching an exception. l.find should return None if it doesn't find your item. Exceptions should be reserved for errors and really extraordinary situations.
Second thing you can do is to create an array of all HTML elements you want to check and then have a nested for loop. Since it's been a while since I've used python, I will outline the code and then (hopefully) edit the answer when I test it.
Something like:
elementsToCheck = [
    [ 'span', {'class':'boldHeader'} ],
    [ 'div', {'class':'Description'} ],
    [ 'ul', {'class':'definitions'} ]]
concatenated = ''
for line in lines:
    for something in elementsToCheck:
        element = line.find(something[0], something[1])
        if element is not None:
            concatenated += element.text
print concatenated
The code above is an untested outline, but you should get the idea. :)
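For what it's worth, here is a runnable sketch of that idea in Python 3. It assumes each item has a find(tag, attrs) method that returns None when nothing matches, as BeautifulSoup's does. FakeNode and FakeElement are hypothetical stand-ins so the example runs without BeautifulSoup installed, and collect_texts is just a name picked for illustration.

```python
# Hypothetical stand-ins so the sketch runs without BeautifulSoup installed.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeNode:
    """Mimics the part of a tag used here: find() returns an element or None."""

    def __init__(self, children):
        # children maps (tag, class name) -> text
        self._children = children

    def find(self, tag, attrs):
        text = self._children.get((tag, attrs.get('class')))
        return FakeElement(text) if text is not None else None


ELEMENTS_TO_CHECK = [
    ('span', {'class': 'boldHeader'}),
    ('div', {'class': 'Description'}),
    ('ul', {'class': 'definitions'}),
]


def collect_texts(node, specs=ELEMENTS_TO_CHECK):
    """Return the text of every element that exists, skipping the missing ones."""
    texts = []
    for tag, attrs in specs:
        element = node.find(tag, attrs)
        if element is not None:  # test for None instead of catching an exception
            texts.append(element.text)
    return texts


lines = [
    FakeNode({('span', 'boldHeader'): 'A1', ('div', 'Description'): 'first'}),
    FakeNode({('span', 'boldHeader'): 'B2', ('ul', 'definitions'): 'second'}),
]
results = [collect_texts(line) for line in lines]
for row in results:
    print(' '.join(row))
```

Each row only contains the pieces that were actually found, so a missing element no longer blocks the print.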

Related

Multiple Try/Except for Validate Config-File

This is my first question on Stack Overflow, and I'm a total Python beginner.
To get familiar with Python I want to write a small backup program. The main part is done, but now I want to make it a bit more "portable" by using a config file, which I want to validate.
My class "getBackupOptions" should return a validated dict, enriched from "GlobalOptions" and "BackupOption", so that I end up with a complete "BackupOption" dict when I access "getBackupOptions.BackupOptions".
My question is how to simplify my code. (This example is easy, because it is only the function that checks whether the path should be searched recursively or not.)
For each possible error I have to write a new try/except block. Can I simplify that?
Is there perhaps another way to validate config files/arrays?
class getBackupOptions:
    def __init__(self,BackupOption,GlobalOptions):
        self.BackupOption = BackupOption
        self.GlobalOptions = GlobalOptions
        self.getRecusive()
    def getRecusive(self):
        try:
            if self.BackupOption['recursive'] != None:
                pass
            else:
                raise KeyError
        except KeyError:
            try:
                if self.GlobalOptions['recursive'] != None:
                    self.BackupOption['recursive'] = self.GlobalOptions['recursive']
                else:
                    raise KeyError
            except KeyError:
                print('Recusive in: ' + str(self.BackupOption) + ' and Global is not set!')
                exit()
At the moment I only catch a KeyError, but what if the key is there but holds something other than "True" or "False"?
Thanks a lot for your help!
You may try this:
class getBackupOptions:
    def __init__(self,BackupOption,GlobalOptions):
        self.BackupOption = BackupOption
        self.GlobalOptions = GlobalOptions
        self.getRecusive()
    def getRecusive(self):
        if self.BackupOption.get('recursive') == 'True' and self.GlobalOptions.get('recursive') == 'True':
            self.BackupOption['recursive'] = self.GlobalOptions['recursive']
        else:
            print('Recusive in: ' + str(self.BackupOption) + ' and Global is not set!')
            exit()
Here the get method is used, so a missing key returns None and no KeyError is raised.
Any text other than True in the field is treated as False.
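A possible way to cover both concerns from the question (the fallback to the global value, and a key that is present but holds something unexpected) is one small helper. This is only a sketch: resolve_bool_option is a made-up name, and it assumes the config values are real booleans rather than the strings 'True'/'False'.

```python
def resolve_bool_option(backup_option, global_options, key='recursive'):
    """Return a validated boolean for key, preferring the per-backup value."""
    for source in (backup_option, global_options):
        value = source.get(key)  # .get() returns None instead of raising KeyError
        if value is None:
            continue  # not set here, try the next source
        if isinstance(value, bool):
            return value
        # The key exists but holds something other than True/False.
        raise ValueError('%r must be True or False, got %r' % (key, value))
    raise KeyError('%r is set neither per backup nor globally' % key)


backup = {'path': '/data'}
defaults = {'recursive': True}
backup['recursive'] = resolve_bool_option(backup, defaults)
print(backup)
```

The caller can then decide in one place what to do with a ValueError or KeyError (print a message and exit, for example), instead of nesting a try/except per option.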

Check if variable is defined using a function

I need to check whether a variable is defined or not. If it is not, the variable should be created as an empty string.
I want to do it with try, and that works fine:
try:
    ident
except:
    ident = ''
But I need to do that using a function, because I will do it many, many times and it will become unreadable.
Doing it like below can't work, because execution never gets into the function if ident does not exist.
def absence_of_tag(ident):
    try:
        ident
    except:
        return ''
I also tried to do it with *args, like this:
def absence_of_tag(*args):
    try:
        args
    except:
        return ''
and then called it with:
ident = absence_of_tag(ident)
I thought it would go into the except inside the function, but it still gave me NameError: name 'ident' is not defined
Do you have any idea how to do that? Is it even possible?
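If you want a one-liner, you can perhaps go with the following (note that it only works if ident is already bound, for example to None; an unbound name still raises NameError):
ident = ident if ident else ' '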
EDIT: This also worked for me:
f=lambda x: x if x else ' '
c=8
f(c)
# output: 8
f(d)
# NameError: name 'd' is not defined
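The NameError is raised at the call site, while the argument is being evaluated, so no function body can catch it. One way around this, sketched below under the assumption that the values can live in a dict (or that you look the name up by string), is to pass the name instead of the variable. value_or_empty is a hypothetical helper name.

```python
def value_or_empty(namespace, name):
    """Return namespace[name] if the name is bound there, otherwise ''."""
    return namespace.get(name, '')


# Keeping the values in a dict avoids the undefined-variable problem entirely.
tags = {'title': 'Intro'}
title = value_or_empty(tags, 'title')
ident = value_or_empty(tags, 'ident')

# The same helper also works on module-level variables via globals().
missing = value_or_empty(globals(), 'not_defined_anywhere')

print(repr(title), repr(ident), repr(missing))
```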

I got a TypeError: 'NoneType' object is not iterable

In the function getLink(urls), I have return (cloud,parent,children).
In the main function, I have (cloud,parent,children) = getLink(urls), and I get an error on this line: TypeError: 'NoneType' object is not iterable
parent and children are both lists of HTTP links. I can't paste them here: parent is a list of about 30 links; children is a list of about 30 items, each of which is about 10-100 links separated by ",".
cloud is a list of about 100 words, like this: ['official store', 'Java Applets Centre', 'About Google', 'Web History'.....]
I don't know why I get this error. Is there anything wrong with how I pass parameters? Or is it because the lists take too much space?
#crawler url: read webpage and return a list of url and a list of its name
def crawler(url):
    try:
        m = urllib.request.urlopen(url)
        msg = m.read()
        ....
        return (list(set(list(links))),list(set(list(titles))) )
    except Exception:
        print("url wrong!")
#this is the function has gone wrong: it throw an exception here, also the error I mentioned, also it will end while before len(parent) reach 100.
def getLink(urls):
    try:
        newUrl=[]
        parent = []
        children =[]
        cloud =[]
        i=0
        while len(parent)<=100:
            url = urls[i]
            if url in parent:
                i += 1
                continue
            (links, titles) = crawler(url)
            parent.append(url)
            children.append(",".join(links))
            cloud = cloud + titles
            newUrl= newUrl+links
            print ("links: ",links)
            i += 1
            if i == len(urls):
                urls = list(set(newUrl))
                newUrl = []
                i = 0
        return (cloud,parent,children)
    except Exception:
        print("can not get links")
def readfile(file):
    #not related, this function will return a list of url
def main():
    file='sampleinput.txt'
    urls=readfile(file)
    (cloud,parent,children) = getLink(urls)
if __name__=='__main__':
    main()
There might be a way that your function ends without reaching the explicit return statement.
Look at the following example code.
def get_values(x):
    if x:
        return 'foo', 'bar'
x, y = get_values(1)
x, y = get_values(0)
When the function is called with 0 as parameter the return is skipped and the function will return None.
You could add an explicit return as the last line of your function. In the example given in this answer it would look like this.
def get_values(x):
    if x:
        return 'foo', 'bar'
    return None, None
Update after seeing the code
When the exception is triggered in getLink, you just print something and fall off the end of the function. There is no return statement on that path, so Python returns None. The calling function then tries to unpack None into three values, and that fails.
Change your exception handling to return a tuple with three values, as you do when everything is fine. Using None for each value is a good idea because it shows you that something went wrong. Additionally, I wouldn't print anything in the function. Don't mix business logic and input/output.
    except Exception:
        return None, None, None
Then in your main function use the following:
cloud, parent, children = getLink(urls)
if cloud is None:
    print("can not get links")
else:
    # do some more work
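The same reasoning applies one level down: crawler also returns None when its except branch runs, so the unpacking inside getLink fails first and getLink's own except then hides that. Below is a simplified sketch in which every path returns the same shape. The fetch parameter and fake_fetch are hypothetical stand-ins for the real urllib-based download, and the loop is deliberately much simpler than the original while loop.

```python
def crawler(url, fetch):
    """Return (links, titles); on failure return two empty lists, never None."""
    try:
        links, titles = fetch(url)
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return [], []
    return sorted(set(links)), sorted(set(titles))


def get_links(urls, fetch):
    """Simplified stand-in for getLink: always returns a 3-tuple."""
    parent, children, cloud = [], [], []
    for url in urls:
        links, titles = crawler(url, fetch)  # always unpackable
        parent.append(url)
        children.append(','.join(links))
        cloud.extend(titles)
    return cloud, parent, children


def fake_fetch(url):
    # Hypothetical replacement for the real download, for demonstration only.
    if url == 'bad':
        raise OSError('unreachable')
    return [url + '/a', url + '/a', url + '/b'], ['Title']


cloud, parent, children = get_links(['http://x', 'bad'], fake_fetch)
print(cloud, parent, children)
```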

Jump into the line instruction that caused an exception in python after the exception is called

Is there any way to go back and repeat the instruction that raised an exception in Python?
E.g. if we get some data with input() and an exception is raised for some reason (e.g. when trying to convert the input string to int), I would like to go back to the same line where the input() is once the exception has been handled.
Note that "continue" is not an option, even inside a loop, because there could be several different input() calls assigning to different variables in different parts of the loop.
So the question again is:
while 1:
    try:
        foo = int(input(">"))
        ...some other code here...
        bar = int(input(">"))
        ...some other code here...
        fred = int(input(">"))
        ...some other code here...
    except Exception:
        ... do something for error handling and ...
        jump_back_and_repeat_last_line_that_caused_the_exception
Imagine that the above code could be in a loop, and the exception can be raised by any instruction (foo... bar... fred... etc., or even any other line). So, if it fails on the "bar" line, it should try the "bar" line again.
Is there any reserved word to do this in Python?
Define a function and handle the exception there.
def read_int():
    while 1:
        try:
            value = int(input('>'))
        except ValueError:
            # Error handling + Jump back to input line.
            continue
        else:
            return value
while 1:
    foo = read_int()
    bar = read_int()
    fred = read_int()
There might be a way to do that, but it will probably result in a very poor design.
If I understand you correctly, your problem is with the exception raised when converting the result of input.
If that is indeed the case, then you should simply implement it in a separate function, which handles the exception properly:
foo = getUserInput()
...some other code here...
bar = getUserInput()
...some other code here...
fred = getUserInput()
...some other code here...
def getUserInput():
    while 1:
        try:
            return int(input(">"))
        except Exception:
            pass
Do nothing in the except block and let the loop ask again:
while 1:
    try:
        a = int(raw_input('input an integer: '))  # on python2 it's "raw_input" instead of "input"
        break
    except ValueError, err:
        # print err
        pass
print 'user input is:', a
output is:
D:\U\ZJ\Desktop> py a.py
input an integer: a
input an integer: b
input an integer: c
input an integer: 123
user input is: 123
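Putting the answers together, a Python 3 sketch of the retry helper might look like this. The read parameter is an assumption added here so the function can be exercised without a keyboard; by default it is the built-in input.

```python
def read_int(prompt='>', read=input):
    """Keep asking until the reply parses as an int, then return it."""
    while True:
        reply = read(prompt)
        try:
            return int(reply)
        except ValueError:
            # Only the failed conversion is retried; nothing else is repeated.
            print('%r is not an integer, try again' % reply)


# Scripted replies stand in for a user typing at the prompt.
replies = iter(['a', '12', 'x', 'y', '7'])
fake_input = lambda prompt: next(replies)

foo = read_int('foo> ', read=fake_input)
bar = read_int('bar> ', read=fake_input)
print(foo, bar)
```

Because each call retries only its own prompt, a bad value for bar never forces foo to be entered again, which is the behaviour the question asks for.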

python function that changes itself to list

So I'm working on a chemistry project for fun, and I have a function that initializes a list from a text file. What I want to do is make it so the function replaces itself with a list. So here's my first attempt at it, which randomly will or won't work, and I don't know why:
def periodicTable():
    global periodicTable
    tableAtoms = open('/Users/username/Dropbox/Python/Chem Project/atoms.csv','r')
    listAtoms = tableAtoms.readlines()
    tableAtoms.close()
    del listAtoms[0]
    atoms = []
    for atom in listAtoms:
        atom = atom.split(',')
        atoms.append(Atom(*atom))
    periodicTable = atoms
It gets called in this way:
def findAtomBySymbol(symbol):
    try:
        periodicTable()
    except:
        pass
    for atom in periodicTable:
        if atom.symbol == symbol:
            return atom
    return None
Is there a way to make this work?
Don't do that. The correct thing to do would be using a decorator that ensures the function is only executed once and caches the return value:
def cachedfunction(f):
    cache = []
    def deco(*args, **kwargs):
        if cache:
            return cache[0]
        result = f(*args, **kwargs)
        cache.append(result)
        return result
    return deco
@cachedfunction
def periodicTable():
    #etc
That said, there's nothing stopping you from replacing the function itself after it has been called, so your approach should generally work. I think the reason it doesn't is because an exception is thrown before you assign the result to periodicTable and thus it never gets replaced. Try removing the try/except block or replacing the blanket except with except TypeError to see what exactly happens.
This is very bad practice.
What would be better is to have your function remember if it has already loaded the table:
def periodicTable(_table=[]):
    if _table:
        return _table
    tableAtoms = open('/Users/username/Dropbox/Python/Chem Project/atoms.csv','r')
    listAtoms = tableAtoms.readlines()
    tableAtoms.close()
    del listAtoms[0]
    atoms = []
    for atom in listAtoms:
        atom = atom.split(',')
        atoms.append(Atom(*atom))
    _table[:] = atoms
    return _table
The first two lines check to see if the table has already been loaded, and if it has it simply returns it.
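On Python 3 the standard library already offers this kind of caching through functools.lru_cache, so a hand-written decorator may not be needed. The sketch below uses a hard-coded tuple as a stand-in for parsing atoms.csv; load_count exists only to show how often the body runs, and the snake_case names are illustrative.

```python
import functools

load_count = 0  # only here to show how often the loader body runs


@functools.lru_cache(maxsize=None)
def periodic_table():
    """Build the table on the first call; later calls return the cached object."""
    global load_count
    load_count += 1
    # Stand-in for parsing atoms.csv; the real loader would go here.
    return (('H', 1), ('He', 2), ('Li', 3))


def find_atom_by_symbol(symbol):
    for atom in periodic_table():
        if atom[0] == symbol:
            return atom
    return None


first = find_atom_by_symbol('He')
second = find_atom_by_symbol('Li')
missing = find_atom_by_symbol('Xx')
print(first, second, missing, load_count)
```

The function stays a function, so callers never need a try/except around it, and the file would only be read once.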
