I've got a large bulk-downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:
I'd really rather not wrap every single outbound web call in a try/except block.
Even if I were to do so, there's trouble knowing how to handle the errors once the exception has been thrown. If the code is just
data = browser.response().read()
then I know precisely how to deal with it, namely:
data = None
while (data == None):
    try:
        data = browser.response().read()
    except IOError as e:
        if e.args[1].args[0].errno != errno.ECONNRESET:
            raise
        data = None
but if it's just a random instance of
browser.follow_link(link)
then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?
EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.
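For example, I imagine something like a retry decorator could centralize the check so that individual calls stay unwrapped. A minimal sketch (retry_on_connreset and max_attempts are names I just made up; the nested args check is copied from my snippet above):

import errno
import functools

def retry_on_connreset(max_attempts=5):
    # Hypothetical helper: retry the wrapped call whenever the IOError
    # turns out to wrap an ECONNRESET, and re-raise anything else.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except IOError as e:
                    if e.args[1].args[0].errno != errno.ECONNRESET:
                        raise
            raise IOError("still getting ECONNRESET after %d attempts" % max_attempts)
        return wrapper
    return decorator

Each bundled download helper could then be decorated once instead of wrapping every browser call.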
Perhaps place the try..except block higher up in the chain of command:
import collections

def download_file(url):
    # Bundle together the bunch of browser calls necessary to download one file.
    browser.follow_link(...)
    ...
    response = browser.response()
    data = response.read()

urls = collections.deque(urls)
while urls:
    url = urls.popleft()
    try:
        download_file(url)
    except IOError as err:
        if err.args[1].args[0].errno != errno.ECONNRESET:
            raise
        else:
            # if ECONNRESET error, add the url back to urls to try again later
            urls.append(url)
I hope this is not too opinion-based, but I understand the Python community has some fairly strong guidelines for how to use exception handling, though I couldn't find any commentary that fits my situation.
I'm in the process of writing the main API function of a web-based script. The incoming data is first preprocessed/sanitized, then pushed through the complicated processing, and the result is finally postprocessed/formatted before being sent back to the requester. Each of the three parts can fail for various reasons, so I need to wrap them in try-except blocks.
My question is: is it more Pythonic to wrap the whole process in one try-except block and capture the various types of exceptions there, or to create individual blocks? I.e., of the following two styles, should I prefer one over the other?
Version 1
def main(input_data):
    try:
        parsed_data = parse_incoming_data(input_data)
    except LookupError as e:
        return func.HttpResponse(f"Misformatted data, key not found: {e}", status_code=400)
    try:
        processed_data = big_process(parsed_data)
    except CustomProcessingException as e:
        return func.HttpResponse(f"Main service failed due to {e}.", status_code=500)
    try:
        output = postprocess_result(processed_data)
    except CustomPostprocessingException as e:
        return func.HttpResponse(f"Postprocessing error {e}.", status_code=555)
    return func.HttpResponse(output, status_code=200)
Version 2
def main(input_data):
    try:
        parsed_data = parse_incoming_data(input_data)
        processed_data = big_process(parsed_data)
        output = postprocess_result(processed_data)
    except LookupError as e:
        return func.HttpResponse(f"Misformatted data, key not found: {e}", status_code=400)
    except CustomProcessingException as e:
        return func.HttpResponse(f"Main service failed due to {e}.", status_code=500)
    except CustomPostprocessingException as e:
        return func.HttpResponse(f"Postprocessing error {e}.", status_code=555)
    return func.HttpResponse(output, status_code=200)
This depends on the specifics of the code and the errors being thrown.
With the first way, each step gets its own check, so each of the three failures can be handled in its own way. And because the handlers are separate, a failure in processed_data = big_process(parsed_data) only stops output = postprocess_result(processed_data) from running because the handler explicitly returns; if it merely logged the error, the next step would still run.
With the second way, as soon as any of the three lines fails, execution jumps straight to the except blocks and the remaining lines are skipped automatically. The handling, however, is chosen by exception type alone, so if two steps can raise the same exception type you cannot tell which one failed (unless you manually use flags or some other mechanism to differentiate).
So the answer is, neither is really more Pythonic. The pieces of code are not equivalent, so you should use whichever is more appropriate for your particular case.
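For instance, if both the parsing and the postprocessing step could raise a LookupError, the single-block version would need an extra marker to tell them apart. A sketch (the stage variable is made up for illustration):

def main(input_data):
    stage = "parse"
    try:
        parsed_data = parse_incoming_data(input_data)
        stage = "process"
        processed_data = big_process(parsed_data)
        stage = "postprocess"
        output = postprocess_result(processed_data)
    except LookupError as e:
        # Without the stage marker there is no way to tell whether the
        # missing key came from parsing or from postprocessing.
        if stage == "parse":
            return func.HttpResponse(f"Misformatted data, key not found: {e}", status_code=400)
        return func.HttpResponse(f"Postprocessing error {e}.", status_code=555)
    return func.HttpResponse(output, status_code=200)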
I am scratching my head over what the best practice is for getting the traceback into the logfile only once. Please note that, in general, I know how to get the traceback into the log.
Let's assume I have a big program consisting of various imported modules and functions, so the call stack can get quite deep, and the logger is set up properly.
Whenever an exception may occur I do the following:
try:
    do_something()
except MyError as err:
    log.error("The error MyError occurred", exc_info=err)
    raise
Note that the traceback is written to the log via the option exc_info=err.
My problem is now that when everything gets a bit more complex and nested, I lose control over how often this traceback is written to the log, and it gets quite messy.
An example of the situation with my current solution for this problem is as follows:
from other_module import other_f

def main():
    try:
        # do something
        val = other_f()
    except (AlreadyLoggedError1, AlreadyLoggedError2, AlreadyLoggedError3):
        # The error was caught within other_f() or deeper and
        # already logged with traceback info where it occurred.
        # After logging it was raised like in the above example.
        # I do not want to log it again, so it is just raised.
        raise
    except BroaderException as err:
        # I cannot expect to have thought of all exceptions,
        # so in case something unexpected happened
        # I want to have the traceback logged here,
        # since the error is not logged yet.
        log.error("An unexpected error occurred", exc_info=err)
        raise
The problem with this solution is that I have to keep track of all the exceptions that are already logged myself, and the line except (AlreadyLoggedError1, AlreadyLoggedError2, ...) gets arbitrarily long and has to be repeated at every level between main() and the position where the error actually occurred.
So my question is: is there some better (Pythonic) way of handling this? To be more specific: I want to raise the information that the exception was already logged together with the exception, so that I do not have to account for it via an extra except block like in my example above.
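What I have in mind is something like tagging the exception itself at the point where it is logged, roughly like this sketch (already_logged is just an attribute name I made up), so callers can skip re-logging:

try:
    do_something()
except MyError as err:
    if not getattr(err, "already_logged", False):
        log.error("The error MyError occurred", exc_info=err)
        err.already_logged = True  # hypothetical marker checked by callers
    raise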
The solution normally used for larger applications is for the low-level code to not actually do error handling itself if it's just going to be logged, but to put exception logging/handling at the highest level in the code possible, since exceptions will bubble up as far as needed. For example, libraries that send errors to a service like New Relic and Sentry don't need you to instrument each small part of your code that might throw an error, they are set up to just catch any exception and send it to a remote service for aggregation and tracking.
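A minimal sketch of that layout, assuming the code below main() simply lets exceptions propagate:

import logging

log = logging.getLogger(__name__)

def main():
    # no try/except here or in anything main() calls
    val = other_f()
    # ... do something with val ...

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # The single place where the traceback is logged,
        # no matter how deep the error originated.
        log.exception("Unhandled error")
        raise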
I'm new to Flask. When writing a view, I wonder whether all errors should be caught. If I do so, most of the view code has to be wrapped in try...except, which doesn't seem graceful to me.
For example:
@app.route('/')
def index():
    try:
        API.do()
    except:
        abort(503)
Should I code like this? If not, will the service crash (uwsgi + lnmp)?
You only catch what you can handle. The word "handle" means "do something useful with" not merely "print a message and die". The print-and-die is already handled by the exception mechanism and probably does it better than you will.
For example, this is not handling an exception usefully:
denominator = 0
try:
    y = x / denominator
except ZeroDivisionError:
    abort(503)
There is nothing useful you can do, and the abort is redundant as that's what uncaught exceptions will cause to happen anyway. Here is an example of a useful handling:
try:
    config_file = open('private_config')
except IOError:
    config_file = open('default_config_that_should_always_be_there')
but note that if the second open fails, there is nothing useful to do, so it will travel up the call stack and possibly halt the program. What you should never do is use a bare except:, because it hides information about what faulted where. That will result in much head-scratching when you get a defect report of "all it said was 503" and you have no idea what went wrong in API.do().
Try / except blocks that can't do any useful handling clutter up the code and visually bury the main flow of execution. Languages without exceptions force you to check every call for an error return if only to generate an error return yourself. Exceptions exist in part to get rid of that code noise.
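As a sketch of how the Flask view from the earlier question could look under that rule (TemporaryOutage is a made-up exception standing in for whatever error API.do() can meaningfully signal):

import API  # the module used in the question
from flask import Flask, abort

app = Flask(__name__)

@app.route('/')
def index():
    # Catch only what we can translate into something useful for the client;
    # everything else bubbles up, the framework answers with a 500, and the
    # traceback lands in the logs instead of vanishing behind a bare except.
    try:
        result = API.do()
    except API.TemporaryOutage:  # hypothetical exception type for this sketch
        abort(503)
    return result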
Why is it a bad idea to catch all exceptions in Python?
I understand that catching all exceptions using a bare except: clause will even catch the 'special' Python exceptions: SystemExit, KeyboardInterrupt, and GeneratorExit. So why not just use an except Exception: clause to catch all exceptions?
Because it's terribly nonspecific and it doesn't enable you to do anything interesting with the exception. Moreover, if you're catching every exception, there could be loads of exceptions happening that you don't even know about, which could cause your application to fail without you really knowing why. You should be able to predict (either by reading documentation or by experimentation) specifically which exceptions you need to handle and how to handle them, but if you blindly suppress all of them from the start, you'll never know.
So, by popular request, here's an example. A programmer is writing Python code and she gets an IOError. Instead of investigating further, she decides to catch all exceptions:
def foo():
    try:
        f = open("file.txt")
        lines = f.readlines()
        return lines[0]
    except:
        return None
She doesn't realize the flaw in her approach: what if the file exists and is accessible, but it is empty? Then lines[0] raises an IndexError (the list lines is empty), the bare except swallows it, and the function silently returns None. So she'll spend hours wondering why she's getting None back from this function when the file exists and isn't locked, never spotting something that would have been obvious had she been more specific about which errors to catch: she's accessing data that might not exist.
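Had she caught only the error she was actually anticipating, the unexpected one would have surfaced on its own:

def foo():
    try:
        f = open("file.txt")
        lines = f.readlines()
        return lines[0]
    except IOError:
        # Only the anticipated problem (missing or unreadable file) is handled;
        # an empty file now raises IndexError immediately instead of hiding behind None.
        return None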
Because you probably want to handle each exception differently. Having a KeyboardInterrupt is not the same thing as having an encoding problem or an OS-level one... You can catch specific exceptions one after the other:
try:
    XXX
except TYPE:
    YYY
except TYPE:
    ZZZ
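A concrete (if contrived) version of that pattern, using the kinds of errors mentioned above (the settings.txt path is made up for the example):

path = "settings.txt"  # hypothetical file for this sketch
try:
    with open(path, encoding="utf-8") as f:
        text = f.read()
except FileNotFoundError:
    # OS problem: the file simply is not there, fall back to an empty config
    text = ""
except UnicodeDecodeError:
    # encoding problem: reread with a forgiving decoder
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
# KeyboardInterrupt is deliberately not listed, so Ctrl-C still propagates.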
Some of you might remember a question very similar to this one, as I sought your help writing the original util in C (using libssh2 and OpenSSL).
I'm now trying to port it to Python and got stuck in an unexpected place. I ported about 80% of the core functionality in 30 minutes, then spent 10+ hours without finishing that ONE remaining function, so I'm here again to ask for your help one more time :)
The whole source (~130 lines, should be easily readable, not complex) is available here: http://pastebin.com/Udm6Ehu3
The connecting, switching on SSL, handshaking, authentication, and even sending (encrypted) commands all work fine (I can see from my router's log that I log in with the proper user and password).
The problem is with ftp_read in the tunnel scenario (the else branch of self.proxy is None). One attempt was this:
def ftp_read(self, trim=False):
    if self.proxy is None:
        temp = self.s.read(READBUFF)
    else:
        while True:
            try:
                temp = self.sock.bio_read(READBUFF)
            except Exception, e:
                print type(e)
                if type(e) == SSL.WantReadError:
                    try:
                        self.chan.send(self.sock.bio_read(10240))
                    except Exception, e:
                        print type(e)
                        self.chan.send(self.sock.bio_read(10240))
                elif type(e) == SSL.WantWriteError:
                    self.chan.send(self.sock.bio_read(10240))
But I end up stuck either blocked waiting on a bio read (or a channel read in the ftp_write function), or getting the exception OpenSSL.SSL.WantReadError, which, ironically, is exactly what I'm trying to handle.
If I comment out the ftp_read calls, the proxy scenario works fine (logging in, sending commands, no problem), as mentioned. So of unencrypted read/write and encrypted read/write, the only piece I'm missing is the encrypted read through the tunnel.
I've spent 12+ hours on this now and feel like I'm getting nowhere, so any thoughts are highly appreciated.
EDIT: I'm not asking someone to write the function for me, so if you know a thing or two about SSL (especially BIOs) and can see an obvious flaw in my interaction between the tunnel and the BIO, that will suffice as an answer :) For example: maybe ftp_write returns more data than the 10240 bytes requested (or just sends two texts ("blabla\n", "command done.\n")), so it isn't properly flushed. That might be true, but apparently I can't rely on .want_write()/.want_read() from pyOpenSSL to report anything but 0 bytes available.
Okay, so I think I managed to sort it out.
sarnold, you'll like this updated version:
def ftp_read(self, trim=False):
    if self.proxy is None:
        temp = self.s.read(READBUFF)
    else:
        temp = ""
        while True:
            try:
                temp += self.sock.recv(READBUFF)
                break
            except Exception, e:
                if type(e) == SSL.WantReadError:
                    self.ssl_wants_read()
                elif type(e) == SSL.WantWriteError:
                    self.ssl_wants_write()
where ssl_wants_* is:
def ssl_wants_read(self):
    try:
        self.chan.send(self.sock.bio_read(10240))
    except Exception, e:
        chan_output = None
        chan_output = self.chan.recv(10240)
        self.sock.bio_write(chan_output)

def ssl_wants_write(self):
    self.chan.send(self.sock.bio_read(10240))
Thanks for the input, sarnold. It made things a bit clearer and easier to work with. However, my issue turned out to be one missed piece of error handling (I broke out of the SSL.WantReadError handling too soon).