Error handling: how to properly prioritize exceptions - python

There are two functions: one downloads the excel file (ExcelFileUploadView(APIView)) and the other processes the downloaded file(def parse_excel_rfi_sheet).
Function parse_excel_rfi_sheet is called inside ExcelFileUploadView(APIView)
class ExcelFileUploadView(APIView):
parser_classes = (MultiPartParser, FormParser)
permission_classes = (permissions.AllowAny,)
def put(self, request, format=None):
if 'file' not in request.data:
raise ParseError("Empty content")
f = request.data['file']
filename = f.name
if filename.endswith('.xlsx'):
try:
file = default_storage.save(filename, f)
r = parse_excel_rfi_sheet(file)
status = 200
except:
raise Exception({"general_errors": ["Error during file upload"]})
finally:
default_storage.delete(file)
else:
status = 406
r = {"general_errors": ["Please upload only xlsx files"]}
return Response(r, status=status)
def parse_excel_rfi_sheet(file):
workbook = load_workbook(filename=file)
sheet = workbook["RFI"]
curent_module_coordinate = []
try:
....
curent_module_coordinate.append(sheet['E688'].value)
curent_module_coordinate.append(sheet['E950'].value)
if check_exel_rfi_template_structure(structure=curent_module_coordinate):
file_status = True
else:
file_status = False
except:
raise Exception({"general_errors": ["Error during excel file parsing. Unknown module cell"]})
The problem is that when an error occurs inside the parse_excel_rfi_sheet, I do not see a call of {"general_errors": ["Error during excel file parsing. Unknown module cell"]}
Instead, I always see the call
{"general_errors": ["Error during file upload"]}
That's why I can't understand at what stage the error occurred: at the moment of downloading the file or at the moment of processing.
How to change this?

Since you are calling parse_excel_rfi_sheet from ExcelFileUploadView whenever the exception {"general_errors": ["Error during excel file parsing. Unknown module cell"]} is raised from parse_excel_rfi_sheet function try block from ExcelFileUploadView fails and comes to except and raises the exception {"general_errors": ["Error during file upload"]}.
You can verify this by printing the exception raised by the ExcelFileUploadView function.
Chane the try block to the following:
try:
file = default_storage.save(filename, f)
r = parse_excel_rfi_sheet(file)
status = 200
except Exception as e:
print("Exception raised ", e)
raise Exception({"general_errors": ["Error during file upload"]})

Your problem comes from catching absolutely all exceptions, first in parse_excel_rfi_sheet, then once again in your put method. Both bare except clause (except: whatever_code_here) and large try blocks are antipatterns - you only want to catch the exact exceptions you're expecting at a given point (using except (SomeExceptionType, AnotherExceptionType, ...) as e:, and have as few code as possible in your try blocks so you are confident you know where the exception comes from.
The only exception (no pun intended) to this rule is the case of "catch all" handlers at a higher level, that are use to catch unexpected errors, log them (so you have a trace of what happened), and present a friendly error message to the user - but even then, you don't want a bare except clause but a except Exception as e.
TL;DR: never assume anything about which exception was raised, where and why, and never pass exceptions silently (at least log them - and check your logs).

raise Exception(...) generates a new Exception instance and raises that one.
This means, the try ... except in put effectively throws away the exception it caught and replaces it with a new one with message "Error during file upload", which is why you always see the same message.
A clean way to handle this would be to define a custom subclass of Exception (e.g., InvalidFormatException) and raise that one in parse_excel_rfi_sheet, having two different except cases in put:
class InvalidFormatException(Exception):
pass
[...]
def parse_excel_rfi_sheet(file):
workbook = load_workbook(filename=file)
sheet = workbook["RFI"]
curent_module_coordinate = []
try:
....
curent_module_coordinate.append(sheet['E688'].value)
curent_module_coordinate.append(sheet['E950'].value)
if check_exel_rfi_template_structure(structure=curent_module_coordinate):
file_status = True
else:
file_status = False
except:
raise InvalidFormatException({"general_errors": ["Error during excel file parsing. Unknown module cell"]})
Your put then becomes:
def put(self, request, format=None):
if 'file' not in request.data:
raise ParseError("Empty content")
f = request.data['file']
filename = f.name
if filename.endswith('.xlsx'):
try:
file = default_storage.save(filename, f)
r = parse_excel_rfi_sheet(file)
status = 200
except InvalidFormatException:
raise # pass on the exception
except:
raise Exception({"general_errors": ["Error during file upload"]})
finally:
default_storage.delete(file)
else:
status = 406
r = {"general_errors": ["Please upload only xlsx files"]}
return Response(r, status=status)
Warning: As pointed out in the comments to this answer, note that -although not directly inquired- the OP's code should be further modified to remove the bare except: clause, as this is probably not the expected behaviour.

Related

Putting try and catch in the right place

I have a very simple caching service that caches files on S3. There are times when the file I am trying to cache locally does not exist on AWS S3. As such in one of my files that uses the caching service I prefer to return None if the file i am trying to cache is no found.
However I realize that i will be using the caching service in many other places and as a result I have been told by my peers that cache.cache_file() should still raise an error in this case but a simpler one like FileNotFoundError that doesn't require the caller to do if e.response["Error"]["Code"] == "404"
My Caching Code
import logging
import os
from pathlib import Path
from stat import S_IREAD, S_IRGRP, S_IROTH
from mylib import s3
from mylib.aws.clients import s3_client, s3_resource
logger = logging.getLogger(__name__)
class Cache:
def _is_file_size_equal(self, s3_path_of_file: str, local_path: Path, file_name: str) -> bool:
bucket, key = s3.deconstruct_s3_url(f"{s3_path_of_file}/{file_name}")
s3_file_size = s3_resource().Object(bucket, key).content_length
local_file_size = (local_path / file_name).stat().st_size
return s3_file_size == local_file_size
def cache_file(self, s3_path_of_file: str, local_path: Path, file_name: str) -> None:
bucket, key = s3.deconstruct_s3_url(f"{s3_path_of_file}/{file_name}")
if not (local_path / file_name).exists() or not self._is_file_size_equal(
s3_path_of_file, local_path, file_name
):
os.makedirs(local_path, exist_ok=True)
s3_client().download_file(bucket, key, f"{local_path}/{file_name}")
os.chmod(local_path / file_name, S_IREAD | S_IRGRP | S_IROTH)
else:
logger.info("Cached File is Valid!")
My Code that calls the Caching Code
def get_required_stream(environment: str, proxy_key: int) -> Optional[BinaryIO]:
s3_overview_file_path = f"s3://{TRACK_BUCKET}/{environment}"
overview_file = f"{some_key}.mv"
local_path = _cache_directory(environment)
try:
cache.cache_file(s3_overview_file_path, local_path, overview_file)
overview_file_cache = local_path / f"{proxy_key}.mv"
return overview_file_cache.open("rb")
except botocore.exceptions.ClientError as e:
if e.response["Error"]["Code"] == "404":
return None
else:
raise
Issue
Being new to Python I am a little unsure how this would work. I assume it means that my code that calls the caching service especially the except part would look something like this.
except FileNotFoundError:
return None
And in the caching service where i have s3_client().download_file(bucket, key, f"{local_path}/{file_name}") I would wrap it with a try and catch ?
While this question probably comes across as trivial and it is I thought I would ask it here anyway since it would be good learning opportunity for me and also understand how to write clean code. I would love suggestions on how I can achieve the desired and if my assumption is wrong?
def get_required_stream(environment: str, proxy_key: int) -> Optional[BinaryIO]:
s3_overview_file_path = f"s3://{TRACK_BUCKET}/{environment}"
overview_file = f"{some_key}.mv"
local_path = _cache_directory(environment)
try:
cache.cache_file(s3_overview_file_path, local_path, overview_file)
overview_file_cache = local_path / f"{proxy_key}.mv"
return overview_file_cache.open("rb")
except botocore.exceptions.ClientError as e:
if e.response["Error"]["Code"] == "404":
exc = FileNotFoundError()
exc.filename = overview_file_cache
raise exc
raise
# then you can use your function like this
try:
filedesc = get_required_stream(...)
except FileNotFoundError e:
print(f'{e.filename} not found')
If you do not want calling code to catch botocore.exceptions.ClientError exception, you could wrap your cache_file method in a try except block and throw a specific exception. I would also go a step further and create a simple custom exception object that wraps botocore.exceptions.ClientError and exposes error_code and error_message from boto exception. That way, caller doesn't have to separately catch FileNotFoundError when file not found and then again botocore.exceptions.ClientError for a different type of error (say permission or some network error). They can just catch the custom exception and further inspect for more details.
try:
//do something
except botocore.exceptions.ClientError as ex:
raise YourCustomS3Exception(ex) //YourCustomS3Exception needs to handle ex
One clean way would be an additional file_exists() method in the Cache class, so every user of the cache can check before they attempt the actual caching/download. Just like using pythons filesystem/path functions.
An exception can still occur if the file is deleted/becomes unreachable between the file_exists() call and the download, but I think in this rare case, the botocore exception is just fine.

How to handle multiple exceptions?

I'm a Python learner, trying to handle a few scenarios:
Reading a file.
Formatting Data.
Manipulating/Copying Data.
Writing a file.
So far, I have:
try:
# Do all
except Exception as err1:
print err1
#File Reading error/ File Not Present
except Exception as err2:
print err2
# Data Format is incorrect
except Exception as err3:
print err3
# Copying Issue
except Exception as err4:
print err4
# Permission denied for writing
The idea of implementing in this fashion is to catch the exact error for all different scenarios. I can do it in all separate try/except blocks.
Is this possible? And reasonable?
Your try blocks should be as minimal as possible, so
try:
# do all
except Exception:
pass
is not something you want to do.
The code in your example won't work as you expect it to, because in every except block you are catching the most general exception type Exception. In fact, only the first except block will ever be executed.
What you want to be doing is having multiple try/except blocks, each one responsible for as few things as possible and catching the most specific exception.
For example:
try:
# opening the file
except FileNotFoundException:
print('File does not exist')
exit()
try:
# writing to file
except PermissionError:
print('You do not have permission to write to this file')
exit()
However, sometimes it is appropriate to catch different types of exceptions, in the same except block or in several blocks.
try:
ssh.connect()
except (ConnectionRefused, TimeoutError):
pass
or
try:
ssh.connect()
except ConnectionRefused:
pass
except TimeoutError:
pass
As DeepSpace stated,
Your try blocks should be as minimal as possible.
If you want to achieve
try:
# do all
except Exception:
pass
Then you might as well do something like
def open_file(file):
retval = False
try:
# opening the file succesful?
retval = True
except FileNotFoundException:
print('File does not exist')
except PermissionError:
print('You have no permission.')
return retval
def crunch_file(file):
retval = False
try:
# conversion or whatever logical operation with your file?
retval = True
except ValueError:
print('Probably wrong data type?')
return retval
if __name__ == "__main__":
if open_file(file1):
open(file1)
if open_file(file2) and crunch_file(file2):
print('opened and crunched')
Yes this is possible.
Just say as example:
try:
...
except RuntimeError:
print err1
except NameError:
print err2
...
Just define the exact Error you want to intercept.

python3 re-raising an exception with custom attribute?

here's my code from python2 which needs to be ported:
try:
do_something_with_file(filename)
except:
exc_type, exc_inst, tb = sys.exc_info()
exc_inst.filename = filename
raise exc_type, exc_inst, tb
with above code, I can get the whole exception with the problematic input file by checking whether an exception has 'filename' attribute.
however python3's raise has been changed. this is what 2to3 gave me for above code:
except Exception as e:
et, ei, tb = sys.exc_info()
e.filename = filename
raise et(e).with_traceback(tb)
which gives me another error and I don't think filename attribute is preserved:
in __call__
raise et(e).with_traceback(tb)
TypeError: function takes exactly 5 arguments (1 given)
What I just want is passing exceptions transparently with some information to track the input file. I miss python2's raise [exception_type[,exception_instance[,traceback]]] - How can I do this in python3?
You can set the __traceback__ attribute:
except Exception as e:
et, ei, tb = sys.exc_info()
ei.filename = filename
ei.__traceback__ = tb
raise ei
or call .with_traceback() directly on the old instance:
except Exception as e:
et, ei, tb = sys.exc_info()
ei.filename = filename
raise ei.with_traceback(tb)
However, the traceback is already automatically attached, there is no need to re-attach it, really.
See the raise statement documentation:
A traceback object is normally created automatically when an exception is raised and attached to it as the __traceback__ attribute, which is writable.
In this specific case, perhaps you wanted a different exception instead, with context?
class FilenameException(Exception):
filename = None
def __init__(self, filename):
super().__init__(filename)
self.filename = filename
try:
something(filename)
except Exception as e:
raise FilenameException(filename) from e
This would create a chained exception, where both exceptions would be printed if uncaught, and the original exception is available as newexception.__context__.
You don't need to do anything; in Python 3 re-raised exceptions automatically have a full traceback from where they were originally raised.
try:
do_something_with_file(filename)
except Exception as exc_inst:
exc_inst.filename = filename
raise exc_inst
This works because PyErr_NormalizeException sets the __traceback__ attribute appropriately on catching an exception in an except statement; see http://www.python.org/dev/peps/pep-3134/.

Python exception handling - line number

I'm using python to evaluate some measured data. Because of many possible results it is difficult to handle or possible combinations. Sometimes an error happens during the evaluation. It is usually an index error because I get out of range from measured data.
It is very difficult to find out on which place in code the problem happened. It would help a lot if I knew on which line the error was raised. If I use following code:
try:
result = evaluateData(data)
except Exception, err:
print ("Error: %s.\n" % str(err))
Unfortunately this only tells me that there is and index error. I would like to know more details about the exception (line in code, variable etc.) to find out what happened. Is it possible?
Thank you.
Solution, printing filename, linenumber, line itself and exception description:
import linecache
import sys
def PrintException():
exc_type, exc_obj, tb = sys.exc_info()
f = tb.tb_frame
lineno = tb.tb_lineno
filename = f.f_code.co_filename
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
print 'EXCEPTION IN ({}, LINE {} "{}"): {}'.format(filename, lineno, line.strip(), exc_obj)
try:
print 1/0
except:
PrintException()
Output:
EXCEPTION IN (D:/Projects/delme3.py, LINE 15 "print 1/0"): integer division or modulo by zero
To simply get the line number you can use sys, if you would like to have more, try the traceback module.
import sys
try:
[][2]
except IndexError:
print("Error on line {}".format(sys.exc_info()[-1].tb_lineno))
prints:
Error on line 3
Example from the traceback module documentation:
import sys, traceback
def lumberjack():
bright_side_of_death()
def bright_side_of_death():
return tuple()[0]
try:
lumberjack()
except IndexError:
exc_type, exc_value, exc_traceback = sys.exc_info()
print "*** print_tb:"
traceback.print_tb(exc_traceback, limit=1, file=sys.stdout)
print "*** print_exception:"
traceback.print_exception(exc_type, exc_value, exc_traceback,
limit=2, file=sys.stdout)
print "*** print_exc:"
traceback.print_exc()
print "*** format_exc, first and last line:"
formatted_lines = traceback.format_exc().splitlines()
print formatted_lines[0]
print formatted_lines[-1]
print "*** format_exception:"
print repr(traceback.format_exception(exc_type, exc_value,
exc_traceback))
print "*** extract_tb:"
print repr(traceback.extract_tb(exc_traceback))
print "*** format_tb:"
print repr(traceback.format_tb(exc_traceback))
print "*** tb_lineno:", exc_traceback.tb_lineno
I use the traceback which is simple and robust:
import traceback
try:
raise ValueError()
except:
print(traceback.format_exc()) # or: traceback.print_exc()
Out:
Traceback (most recent call last):
File "catch.py", line 4, in <module>
raise ValueError()
ValueError
The simplest way is just to use:
import traceback
try:
<blah>
except IndexError:
traceback.print_exc()
or if using logging:
import logging
try:
<blah>
except IndexError as e:
logging.exception(e)
Gives you file, lineno, and exception for the last item in the call stack
from sys import exc_info
from traceback import format_exception
def print_exception():
etype, value, tb = exc_info()
info, error = format_exception(etype, value, tb)[-2:]
print(f'Exception in:\n{info}\n{error}')
try:
1 / 0
except:
print_exception()
prints
Exception in:
File "file.py", line 12, in <module>
1 / 0
ZeroDivisionError: division by zero
I would suggest using the python logging library, it has two useful methods that might help in this case.
logging.findCaller()
findCaller(stack_info=False) - Reports just the line number for the previous caller leading to the exception raised
findCaller(stack_info=True) - Reports the line number & stack for the previous caller leading to the exception raised
logging.logException()
Reports the line & stack within the try/except block that raised the exception
For more info checkout the api https://docs.python.org/3/library/logging.html
There are many answers already posted here that show how to get the line number, but it's worth noting that if you want variables containing the "raw data," so to speak, of the stack trace so that you can have more granular control of what you display or how you format it, using the traceback module you can step through the stack frame by frame and look at what's stored in the attributes of the frame summary objects. There are several simple and elegant ways to manipulate the frame summary objects directly. Let's say for example that you want the line number from the last frame in the stack (which tells you which line of code triggered the exception), here's how you could get it by accessing the relevant frame summary object:
Option 1:
import sys
import traceback
try:
# code that raises an exception
except Exception as exc:
exc_type, exc_value, exc_tb = sys.exc_info()
stack_summary = traceback.extract_tb(exc_tb)
end = stack_summary[-1] # or `stack_summary.pop(-1)` if you prefer
Option 2:
import sys
import traceback
try:
# code that raises an exception
except Exception as exc:
tbe = traceback.TracebackException(*sys.exc_info())
end = tbe.stack[-1] # or `tbe.stack.pop(-1)` if you prefer
In either of the above examples, end will be a frame summary object:
&gt&gt&gt type(end)
&ltclass &#39traceback.FrameSummary&#39&gt
which was in turn taken from a stack summary object:
&gt&gt&gt type(stack_summary) # from option 1
&ltclass &#39traceback.StackSummary&#39&gt
&gt&gt&gt type(tbe&#46stack) # from option 2
&ltclass &#39traceback.StackSummary&#39&gt
The stack summary object behaves like a list and you can iterate through all of the frame summary objects in it however you want in order to trace through the error. The frame summary object (end, in this example), contains the line number and everything else you need to locate where in the code the exception occurred:
>>> print(end.__doc__)
A single frame from a traceback.
- :attr:`filename` The filename for the frame.
- :attr:`lineno` The line within filename for the frame that was
active when the frame was captured.
- :attr:`name` The name of the function or method that was executing
when the frame was captured.
- :attr:`line` The text from the linecache module for the
of code that was running when the frame was captured.
- :attr:`locals` Either None if locals were not supplied, or a dict
mapping the name to the repr() of the variable.
And if you capture the exception object (either from except Exception as exc: syntax or from the second object returned by sys.exc_info()), you will then have everything you need to write your own highly customized error printing/logging function:
err_type = type(exc).__name__
err_msg = str(exc)
Putting it all together:
from datetime import datetime
import sys
import traceback
def print_custom_error_message():
exc_type, exc_value, exc_tb = sys.exc_info()
stack_summary = traceback.extract_tb(exc_tb)
end = stack_summary[-1]
err_type = type(exc_value).__name__
err_msg = str(exc_value)
date = datetime.strftime(datetime.now(), "%B %d, %Y at precisely %I:%M %p")
print(f"On {date}, a {err_type} occured in {end.filename} inside {end.name} on line {end.lineno} with the error message: {err_msg}.")
print(f"The following line of code is responsible: {end.line!r}")
print("Please make a note of it.")
def do_something_wrong():
try:
1/0
except Exception as exc:
print_custom_error_message()
if __name__ == "__main__":
do_something_wrong()
Let's run it!
user#some_machine:~$ python example.py
On August 25, 2022 at precisely 01:31 AM, a ZeroDivisionError occured in example.py inside do_something_wrong on line 21 with the error message: division by zero.
The following line of code is responsible: '1/0'
Please make a note of it.
At this point you can see how you could print this message for any place in the stack: end, beginning, anywhere in-between, or iterate through and print it for every frame in the stack.
Of course, the formatting functionality already provided by the traceback module covers most debugging use cases, but it's useful to know how to manipulate the traceback objects to extract the information you want.
I always use this snippet
import sys, os
try:
raise NotImplementedError("No error")
except Exception as e:
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
print(exc_type, fname, exc_tb.tb_lineno)
for different views and possible issues you can refer When I catch an exception, how do I get the type, file, and line number?
All the solutions answer the OPs problem, however, if one is holding onto a specific error instance and the last traceback stack won't do it they will not suffice —a corner scenario example given below.
In this case, the magic attribute __traceback__ of a raised exception instance may be used. This is a system traceback object (exposed as types.TracebackType, see Types) not the module. This traceback instance behaves just one like would expect (but it is always nice to check):
from collections import deque
from typing import (List, )
errors: List[Exception] = deque([], 5)
def get_raised(error_cls: type, msg: str) -> Exception:
# for debugging, an exception instance that is not raised
# ``__traceback__`` has ``None``
try:
raise error_cls(msg)
except Exception as error:
return error
error = get_raised(NameError, 'foo')
errors.append(error)
error = get_raised(ValueError, 'bar')
errors.append(error)
try:
raise get_raised(TypeError, 'bar')
except Exception as error:
errors.append(error)
Now, I can check out the first error's details:
import types
traceback = errors[0].__traceback__ # inadvisable name due to module
line_no: int = traceback.tb_lineno
frame: types.FrameType = traceback.tb_frame
previous: Union[type(None), types.TracebackType] = traceback.tb_next
filename: str = frame.f_code.co_filename
The traceback's previous is None for the second error despite a preceding error as expected, but for the third wherein an error is raised twice it is not.
This below is just a test which makes no sense contextually. A case were it is useful if when an exception is raised in a view of a webapp (a 500 status kind of incident) that gets caught and stored for the admit to inspect akin to Setry.io (but for free). Here is a minimal example where the home page / will raise an error, that gets caught and the route errors will list them. This is using Pyramid in a very concentrated way (multifile is way better) with no logging or authentication and the error logging could be better for the admin to inspect similar to Sentry.io.
from pyramid.config import Configurator
from waitress import serve
from collections import deque
# just for typehinting:
from pyramid.request import Request
from pyramid.traversal import DefaultRootFactory
from pyramid.router import Router
import types
from typing import (List, )
def home_view(context: DefaultRootFactory, request: Request) -> dict:
raise NotImplementedError('I forgot to fill this')
return {'status': 'ok'} # never reached.
def caught_view(error: Exception, request: Request) -> dict:
"""
Exception above is just type hinting.
This is controlled by the context argument in
either the ``add_exception_view`` method of config,
or the ``exception_view_config`` decorator factory (callable class)
"""
# this below is a simplification as URLDecodeError is an attack (418)
request.response.status = 500
config.registry.settings['error_buffer'].append(error)
#logging.exception(error) # were it set up.
#slack_admin(format_error(error)) # ditto
return {'status': 'error', 'message': 'The server crashed!!'}
def format_error(error: Exception) -> str:
traceback = error.__traceback__ # inadvisable name due to module
frame: types.FrameType = traceback.tb_frame
return f'{type(error).__name__}: {error}' +\
f'at line {traceback.tb_lineno} in file {frame.f_code.co_filename}'
def error_view(context: DefaultRootFactory, request: Request) -> dict:
print(request.registry.settings['error_buffer'])
return {'status': 'ok',
'errors':list(map(format_error, request.registry.settings['error_buffer']))
}
with Configurator(settings=dict()) as config:
config.add_route('home', '/')
config.add_route('errors', '/errors')
config.add_view(home_view, route_name='home', renderer='json')
config.add_view(error_view, route_name='errors', renderer='json')
config.add_exception_view(caught_view, context=Exception, renderer='json')
config.registry.settings['error_buffer']: List[Exception] = deque([], 5)
# not in config.registry.settings, not JSON serialisable
# config.add_request_method
app : Router = config.make_wsgi_app()
port = 6969
serve(app, port=port)

closing files properly opened with urllib2.urlopen()

I have following code in a python script
try:
# send the query request
sf = urllib2.urlopen(search_query)
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
sf.close()
except Exception, err:
print("Couldn't get programme information.")
print(str(err))
return
I'm concerned because if I encounter an error on sf.read(), then sf.clsoe() is not called.
I tried putting sf.close() in a finally block, but if there's an exception on urlopen() then there's no file to close and I encounter an exception in the finally block!
So then I tried
try:
with urllib2.urlopen(search_query) as sf:
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
except Exception, err:
print("Couldn't get programme information.")
print(str(err))
return
but this raised a invalid syntax error on the with... line.
How can I best handle this, I feel stupid!
As commenters have pointed out, I am using Pys60 which is python 2.5.4
I would use contextlib.closing (in combination with from __future__ import with_statement for old Python versions):
from contextlib import closing
with closing(urllib2.urlopen('http://blah')) as sf:
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
Or, if you want to avoid the with statement:
try:
sf = None
sf = urllib2.urlopen('http://blah')
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
finally:
if sf:
sf.close()
Not quite as elegant though.
finally:
if sf: sf.close()
Why not just try closing sf, and passing if it doesn't exist?
import urllib2
try:
search_query = 'http://blah'
sf = urllib2.urlopen(search_query)
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
except urllib2.URLError, err:
print(err.reason)
finally:
try:
sf.close()
except NameError:
pass
Given that you are trying to use 'with', you should be on Python 2.5, and then this applies too: http://docs.python.org/tutorial/errors.html#defining-clean-up-actions
If urlopen() has an exception, catch it and call the exception's close() function, like this:
try:
req = urllib2.urlopen(url)
req.close()
print 'request {0} ok'.format(url)
except urllib2.HTTPError, e:
e.close()
print 'request {0} failed, http code: {1}'.format(url, e.code)
except urllib2.URLError, e:
print 'request {0} error, error reason: {1}'.format(url, e.reason)
the exception is also a full response object, you can see this issue message: http://bugs.jython.org/issue1544
Looks like the problem runs deeper than I thought - this forum thread indicates urllib2 doesn't implement with until after python 2.6, and possibly not until 3.1
You could create your own generic URL opener:
from contextlib import contextmanager
#contextmanager
def urlopener(inURL):
"""Open a URL and yield the fileHandle then close the connection when leaving the 'with' clause."""
fileHandle = urllib2.urlopen(inURL)
try: yield fileHandle
finally: fileHandle.close()
Then you could then use your syntax from your original question:
with urlopener(theURL) as sf:
search_soup = BeautifulSoup.BeautifulSoup(sf.read())
This solution gives you a clean separation of concerns. You get a clean generic urlopener syntax that handles the complexities of properly closing the resource regardless of errors that occur underneath your with clause.
Why not just use multiple try/except blocks?
try:
# send the query request
sf = urllib2.urlopen(search_query)
except urllib2.URLError as url_error:
sys.stderr.write("Error requesting url: %s\n" % (search_query,))
raise
try:
search_soup = BeautifulSoup.BeautifulStoneSoup(sf.read())
except Exception, err: # Maybe catch more specific Exceptions here
sys.stderr.write("Couldn't get programme information from url: %s\n" % (search_query,))
raise # or return as in your original code
finally:
sf.close()

Categories