I have a problem with identifying an exception.
I'm writing a scraper that scrapes a lot of different websites. Some errors I want to handle and some I only want to ignore.
I catch my exceptions like this:
except Exception as e:
Most of the exceptions I can identify like this:
type(e).__name__ == "IOError"
But I have one exception, "[Errno 10054] An existing connection was forcibly closed by the remote host",
that has the name "error", which is too vague, and I'm guessing other errors also have that name. I'm guessing I can somehow get the errno number from my exception and thus identify it, but I don't know how.
First, you should not rely on the exception's class name, but on the class itself - two classes from two different modules can have the same value for the __name__ attribute while being different exceptions. So what you want is:
try:
    something_that_may_raise()
except IOError as e:
    handle_io_error(e)
except SomeOtherError as e:
    handle_some_other_error(e)
etc...
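As for the errno part of your question: the exception instance usually carries the number in its errno attribute, so you can catch the class and then compare the code. Here is a minimal sketch, assuming Python 3, where socket.error is an alias of OSError. I am also assuming that Windows error 10054 (WSAECONNRESET) shows up as errno.ECONNRESET, which is what I would expect but have not checked on every platform. The names fetch_or_skip and flaky_fetch are made up for the illustration.

```python
import errno

def fetch_or_skip(fetch, url):
    """Call fetch(url); return None if the remote host reset the connection."""
    try:
        return fetch(url)
    except OSError as e:
        # e.errno holds the numeric code, e.g. errno.ECONNRESET
        if e.errno == errno.ECONNRESET:
            return None
        raise  # anything else is not ours to swallow

def flaky_fetch(url):
    # hypothetical stand-in for a real download function
    raise OSError(errno.ECONNRESET, "connection reset by remote host")
```

Comparing against the errno constants rather than the literal 10054 keeps the check readable and lets the same code run on platforms where the number differs.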
Then you have two kinds of exceptions: the ones that you can actually handle one way or another, and the others. If the program is only for your personal use, the best way to handle "the others" is usually to not handle them at all - the Python runtime will catch them and display a traceback with all the relevant information (so you know what happened and where, and can add some handling for that case if needed).
If it's a "public" program and/or if you do have some things to clean up before the program crashes, you can add a last "catch all" except clause at the program's top level that will log the error and traceback somewhere so it isn't lost (logging.exception is your friend), clean what has to be cleaned and terminate with a more friendly error message.
There are very few cases where one would really want to just ignore an exception (I mean pretending nothing wrong or unexpected happened and happily continuing). At the very least you will want to notify the user that one of the actions failed and why - in your case that might be a top-level loop iterating over a set of sites to scrape, with an inner try/except block catching "expected" error cases, e.g.:
# config:
config = [
    # ('url', {params})
    ('some.site.tld', {"param1": value1, "param2": value2}),
    ('some.other.tld', {"param1": value1, "answer": 42}),
    # etc
]

def run():
    for url, params in config:
        try:
            results = scrap(url, **params)
        except (SomeKnownError, SomeOtherExceptedException) as e:
            # things that are to be expected and mostly harmless
            #
            # you configured your logger so that warnings only
            # go to stderr
            logger.warning("failed to scrap %s : %s - skipping", url, e)
        except (MoreSeriousError, SomethingIWannaKnowAbout) as e:
            # things that are more annoying and you want to know
            # about but that shouldn't prevent you from continuing
            # with the remaining sites
            #
            # you configured your logger so that exceptions go
            # to both stderr and your email.
            logger.exception("failed to scrap %s : %s - skipping", url, e)
        else:
            do_something_with(results)
Then have a top-level handler around the call to run() that takes care of unexpected errors:
import sys

def main(argv):
    parse_args()
    try:
        set_up_everything()
        run()
        return 0
    except Exception as e:
        logger.exception("oops, something unexpected happened : %s", e)
        return 1
    finally:
        do_some_cleanup()

if __name__ == "__main__":
    sys.exit(main(sys.argv))
Note that the logging module has an SMTPHandler - but since mail can easily fail too, you'd better still have a reliable local log (stderr, plus tee to a file?). The logging module takes some time to learn but it really pays off in the long run.
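A minimal sketch of such a setup, assuming warnings should go to stderr and everything from ERROR up should also land in a local file. The names build_logger and "scraper" and the log file path are placeholders of mine, and the SMTPHandler is left out because its arguments depend on your mail server.

```python
import logging
import sys

def build_logger(logfile, name="scraper"):
    """Send WARNING and above to stderr, ERROR and above to a local file too."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    to_stderr = logging.StreamHandler(sys.stderr)
    to_stderr.setFormatter(formatter)
    logger.addHandler(to_stderr)

    to_file = logging.FileHandler(logfile)
    to_file.setLevel(logging.ERROR)  # logger.exception logs at ERROR level
    to_file.setFormatter(formatter)
    logger.addHandler(to_file)
    return logger
```

Call it once at startup (each call adds handlers again), then use logger.warning for the harmless cases and logger.exception inside except blocks so the traceback ends up in the file.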
I'm building a helper library to call the AdWords (Google Ads) Keyword Planner API and having trouble catching RateExceededError errors when they come up.
The specific error message that I'm getting is below.
GoogleAdsServerFault: RateExceededError <rateName=RATE_LIMIT, rateKey=null, rateScope=ACCOUNT, retryAfterSeconds=30, errorDetails="Quota check failed: QuotaInfo{quotaGroupId=v2-kwi-webapi-global, userId=global}"> Original AdsAPI trace for debugging [
com.google.ads.api.services.common.error.ApiException: RateExceededError <rateName=RATE_LIMIT, rateKey=null, rateScope=ACCOUNT, retryAfterSeconds=30, errorDetails="Quota check failed: QuotaInfo{quotaGroupId=v2-kwi-webapi-global, userId=global}">
Underlying ApiErrors are:
RateExceededError <rateName=RATE_LIMIT, rateKey=null, rateScope=ACCOUNT, retryAfterSeconds=30>
I'm currently working with the below setup to call the API and catch errors, however exceptions are still being raised occasionally. Is there a better way I should catch these errors and just log the exceptions as warnings?
class AdwordsAPIException(Exception):
    pass

def call_adwords_api_client(self, selector):
    try:
        return _adwords_client.get(selector)
    except AdwordsAPIException:
        return None
Many thanks in advance!
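Well, you have made a custom exception class which is never raised, so your except clause never matches anything. To skip all exceptions, try this:
def call_adwords_api_client(self, selector):
    try:
        return _adwords_client.get(selector)
    except Exception:
        # catches every ordinary error, but not KeyboardInterrupt or SystemExit
        return None
Also, the API's error message suggests waiting 30 seconds (retryAfterSeconds=30) before trying again. Good luck.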
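Since the fault carries retryAfterSeconds=30, a retry loop may serve you better than returning None straight away. A rough sketch: RateLimitedError and call_with_retry are hypothetical names, not part of the AdWords client library. You would catch whatever exception class your client actually raises for RateExceededError and read the delay from it.

```python
import logging
import time

logger = logging.getLogger(__name__)

class RateLimitedError(Exception):
    """Hypothetical stand-in for the client library's rate-limit fault."""
    def __init__(self, retry_after_seconds):
        super().__init__("rate exceeded, retry after %ss" % retry_after_seconds)
        self.retry_after_seconds = retry_after_seconds

def call_with_retry(func, max_attempts=3, sleep=time.sleep):
    """Call func(); on a rate-limit error, log a warning, wait, and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitedError as e:
            logger.warning("attempt %d/%d rate limited: %s", attempt, max_attempts, e)
            if attempt == max_attempts:
                return None  # give up quietly, as in the original code
            sleep(e.retry_after_seconds)
```

Usage would look like call_with_retry(lambda: _adwords_client.get(selector)). The sleep parameter is only there so the waiting can be replaced in tests.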
In an except block I want to raise the same exception but without the stack trace and without the information that this exception has been raised as direct cause of another exception. (and without modifying sys.tracebacklimit globally)
Additionally I have a very clumsy exception class which parses and modifies the message text so I can't just reproduce it.
My current approach is
try:
    deeply_nested_function_that_raises_exception()
except ClumsyExceptionBaseClass as exc:
    cls, code, msg = exc.__class__, exc.code, exc.msg

raise cls("Error: %d %s" % (code, msg))
What I'm doing here is decomposing the exception information and reassembling a new exception with a message that the constructor will parse and split into error code and message again. I raise it from outside the except block in order to forget all trace information.
Is there a more pythonic way to do this? All I want is to get rid of the noisy (and, in my case, useless) traceback while keeping the information contained in the exception object.
In Python 3, you can use with_traceback to remove the traceback entries accumulated so far:
try: ...
except Exception as e:
    raise e.with_traceback(None)
In Python 2, it’s just
try: ...
except Exception as e:
    raise e  # not just "raise"
It will of course still show the trace to this line, since that’s added as the exception propagates (again).
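A small Python 3 sketch to see the effect. The function names are made up for the demonstration; frames_seen simply lists the function names recorded in the traceback that a call leaves behind.

```python
import traceback

def deeply_nested():
    raise ValueError("boom")

def reraise_trimmed():
    try:
        deeply_nested()
    except ValueError as e:
        # drop the frames collected so far, then raise the same object again
        raise e.with_traceback(None)

def frames_seen(func):
    """Return the function names recorded in the traceback func() leaves behind."""
    try:
        func()
    except ValueError as e:
        return [frame.name for frame in traceback.extract_tb(e.__traceback__)]
    return []
```

Calling frames_seen(reraise_trimmed) should list reraise_trimmed but no longer deeply_nested, while the exception object itself (class, args, extra attributes) is untouched.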
We use cx_Oracle to connect multiple threads to the database and issue various selects and updates.
However, for unknown reasons, the script is being killed by the system on random database connections. There is no information in the syslog or messages files.
Our error handling tries to write tracebacks to the logfile, but the logfile contains nothing about the crash of the script. Only stdout shows "PID killed" as its last line.
Could it be a problem to make database connections with multiple threads at the same time? There are also other scripts running at the same time that also talk to the database (not multi-threaded) but access other tables.
This is our function that is called for every select. For the updates we have other functions.
def ora_send_sql(logger, statement):
    try:
        dsn = cx_Oracle.makedsn(SQL_IP, SQL_PORT, SQL_SID)
        con = cx_Oracle.connect(SQL_USR, SQL_PWD, dsn)
        cur = con.cursor()
        cur.execute(statement)
        con.commit()
        con.close()
        return 0
    except cx_Oracle.Warning as w:
        logger.warning(" Oracle-Warning: " + str(w).strip())
    except cx_Oracle.Error as e:
        error, = e.args
        logger.error(" Oracle-Error-Code: %s", error.code)
        logger.error(" Oracle-Error-Message: %s", error.message)
        return -1
    except:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.exception(" Got Traceback in ora_send_sql: " + str(exc_type) + " Fname=" + str(fname) + " Lineno=" + str(exc_tb.tb_lineno))
        return -2
I don't suppose you tried turning except: into except Exception as e and then trying to see if the exception is somewhat special. Another possible thing to try perhaps is removing the exception handling completely and letting it crash, then investigating the output. That way you could spot the actual exception that is thrown, because I simply cannot believe it would just "crash". Finally, try investigating dmesg for any segfaults.
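If a segfault inside the C extension is a suspect, the standard library's faulthandler module (Python 3.3+) can dump the Python stack of every thread when the process receives a fatal signal such as SIGSEGV. A small sketch; the function name and the log path you pass in are placeholders of mine. Note that it cannot help with SIGKILL (for example from the kernel's out-of-memory killer), which no process can intercept, so dmesg is still worth checking.

```python
import faulthandler

def enable_crash_dumps(path):
    """Write Python tracebacks for all threads to path on a fatal signal."""
    crash_log = open(path, "a")
    # faulthandler keeps using this file object, so it must stay open
    faulthandler.enable(file=crash_log, all_threads=True)
    return crash_log
```

Call it once at the very start of the script and keep the returned file object alive for the lifetime of the process.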
I think we have fixed the problem by updating cx_Oracle to the latest version.
The updates fixed a few memory leaks.
But that still does not explain why we do not see any information about the process being killed.
You almost certainly need to use threaded=True in the connect() call, see http://cx-oracle.readthedocs.io/en/latest/module.html#cx_Oracle.Connection
I have code as follows:
try:
    schema = lxml.etree.RelaxNG(file=schema_file)
    schema.assertValid(etree)
except lxml.etree.DocumentInvalid as exception:
    print(schema.error_log)
    print(exception.error_log)
    raise exception
It consistently raises a DocumentInvalid error:
DocumentInvalid: Document does not comply with schema
yet prints no errors in either the schema error log or the global error log.
This only happens for some files, others validate correctly, or give a reason for failing to validate. (It's a very long schema, so I won't quote it here.)
I don't even know where to start looking. What could the cause possibly be?
This is how you can get an error message:
try:
    schema = lxml.etree.RelaxNG(file=schema_file)
    schema.assertValid(etree)
except lxml.etree.DocumentInvalid as exception:
    print(exception.error_log.filter_from_errors()[0])
    raise exception
I need to identify which call raised an exception so that I can produce a better error string. Is there a way?
Look at my example:
try:
    os.mkdir('/valid_created_dir')
    os.listdir('/invalid_path')
except OSError as msg:
    # here I want a way to identify who raised the exception
    if is_mkdir_who_raise_an_exception:
        do_some_things()
    if is_listdir_who_raise_an_exception:
        do_other_things()
How can I handle this in Python?
If you have completely separate tasks to execute depending on which function failed, as your code seems to show, then separate try/except blocks, as the existing answers suggest, may be better (though you will probably need to skip the second part if the first one has failed).
If you have many things that you need to do in either case, and only a small amount of work that depends on which function failed, then separating might create a lot of duplication and repetition, so the form you suggested may well be preferable. The traceback module in Python's standard library can help in this case:
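import os, sys, traceback
try:
    os.mkdir('/valid_created_dir')
    os.listdir('/invalid_path')
except OSError as msg:
    tb = sys.exc_info()[-1]
    stk = traceback.extract_tb(tb, 1)
    # each entry is (filename, line number, function name, source text);
    # os.mkdir and os.listdir are C functions without a Python frame, so
    # the function name is that of the enclosing scope, while the source
    # text shows the call that failed (it can be empty if the source file
    # is not available)
    failing_line = stk[0][3]
    print('The failing statement was %s' % failing_line)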
Of course instead of the print you'll use if checks to decide exactly what processing to do.
Wrap each function call in its own try/except:
try:
    os.mkdir('/valid_created_dir')
except Exception as e:
    # do something,
    # quite probably skipping the next try statement
    pass
try:
    os.listdir('/invalid_path')
except OSError as msg:
    # do something
    pass
This will help readability and comprehension anyway.
How about the simple solution:
try:
    os.mkdir('/valid_created_dir')
except OSError as msg:
    # it is mkdir that raised the exception
    do_some_things()
try:
    os.listdir('/invalid_path')
except OSError as msg:
    # it is listdir that raised the exception
    do_other_things()
Here's the clean approach: attach additional information to the exception where it happens, and then use it in a unified place:
import errno, os, sys

def func():
    try:
        os.mkdir('/dir')
    except OSError as e:
        # an already existing directory is fine; anything else is an error
        if e.errno != errno.EEXIST:
            e.action = "creating directory"
            raise
    try:
        os.listdir('/invalid_path')
    except OSError as e:
        e.action = "reading directory"
        raise

try:
    func()
except Exception as e:
    if getattr(e, "action", None):
        text = "Error %s: %s" % (e.action, e)
    else:
        text = str(e)
    sys.exit(text)
In practice, you'd want to create wrappers for functions like mkdir and listdir if you want to do this, rather than scattering small try/except blocks all over your code.
Normally, I don't find this level of detail in error messages so important (the Python message is usually plenty), but this is a clean way to do it.
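One way to write such a wrapper without repeating the try/except each time is a small context manager. This is only a sketch under my own naming (doing and func are not from any library), and it tags OSError only:

```python
import contextlib
import os

@contextlib.contextmanager
def doing(action):
    """Tag any OSError raised inside the block with a description of the step."""
    try:
        yield
    except OSError as e:
        e.action = action
        raise

def func(path):
    with doing("reading directory"):
        return os.listdir(path)
```

The top-level handler can then read e.action exactly as in the code above, and each call site stays a single with line.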