Sniffing Python urllib2.httperror content and passing it along

In some Python code, I am using a library to wrap requests to a web service. The behaviour I intend is that the content of any HTTPError is output with logging.error, along with the status code, and the error is then passed along:
def my_request_thing(api, other_stuff):
    request = make_request_from(api, other_stuff)
    try:
        with closing(urllib2.urlopen(request)) as fd:
            return fd.read()
    except HTTPError as e:
        logging.error("Error from server: %s\n%s", e.code, e.read())
        raise
This code logs the error and passes it along, with one problem: the exception's content is exhausted by e.read(). This code is intended to be used by most clients of the API, substituting things like root paths and HTTP headers...
I may then have another function for more domain specific stuff using this:
def get_my_thing(thing_id, conditions):
    try:
        return json.loads(my_request_thing(<thing_id + conditions into api and stuff...>))
    except HTTPError as e:
        if e.code == 404 and "my thing does not exist" in e.read():
            return False
        else:
            raise
Note that this also tries to get data with e.read(), which is now empty, and it may still re-raise the error. This fails to work because there is no data left in e.read() at this point.
Is there a good way to reraise this exception such that the content is not exhausted, but so I can sniff out particular exception types and log them all on the way?

As per Karel Kubat's comment, why don't you inject the result of e.read() into the exception as a data member upon seeing it for the first time?
For example, derive your own error class from HTTPError with an empty self.content. When catching an exception for the first time, fill self.content from self.read(). Next handlers can inspect e.content.
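The idea above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes Python 3's urllib.error.HTTPError (the successor of urllib2.HTTPError), it attaches the body through a helper instead of a subclass, and the names read_error_body, content, make_fake_error, low_level and high_level are hypothetical. The error is built by hand so that the example runs without a network call.

```python
import io
import logging
from urllib.error import HTTPError


def read_error_body(err):
    """Return the body of an HTTPError, reading the stream at most once."""
    # 'content' is a hypothetical attribute name, not part of HTTPError itself.
    if getattr(err, "content", None) is None:
        err.content = err.read()
    return err.content


def make_fake_error(code, body):
    """Build an HTTPError by hand so the example needs no network access."""
    return HTTPError("http://example.invalid/thing", code, "Not Found", {}, io.BytesIO(body))


def low_level():
    # Generic layer: log every HTTP error, then pass it along unchanged.
    try:
        raise make_fake_error(404, b"my thing does not exist")
    except HTTPError as e:
        logging.error("Error from server: %s\n%s", e.code, read_error_body(e))
        raise


def high_level():
    # Domain layer: the cached body is still available after the first read.
    try:
        return low_level()
    except HTTPError as e:
        if e.code == 404 and b"my thing does not exist" in read_error_body(e):
            return False
        raise
```

Every handler calls read_error_body(e) instead of e.read(), so the order in which handlers see the exception no longer matters.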

Related

Correct way to handle custom API error messages in Requests?

I'm using the Python Requests library to call an API that may return custom error messages in its JSON response, i.e. something like this:
{
    "error": "TOKEN INVALID"
}
I would like to have a custom exception class with this error message as the message, but otherwise use Requests error handling as much as possible. From other questions it appears that the correct way to do this would be something like this:
class MyCustomAPIError(requests.exceptions.HTTPError):
    pass

response = requests.request('GET', url)
try:
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    msg = e.response.json().get('error')
    raise MyCustomAPIError(msg)
This however raises two exceptions, the original HTTPError first and then
During handling of the above exception, another exception occurred:
... my custom error
which doesn't seem very elegant. I could suppress the first exception by using
raise MyCustomAPIError(msg) from None
but that seems rather hacky. Is this the intended way to do this or am I missing a simpler solution?

How to get the exception string from requests.exceptions.RequestException

I have the following Flask code:
from flask import Flask,request,jsonify
import requests
from werkzeug.exceptions import InternalServerError, NotFound
import sys
import json

app = Flask(__name__)
app.config['SECRET_KEY'] = "Secret!"

class InvalidUsage(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv['message'] = self.message
        rv['status_code'] = self.status_code
        return rv

@app.errorhandler(InvalidUsage)
def handle_invalid_usage(error):
    response = jsonify(error.to_dict())
    response.status_code = error.status_code
    return response

@app.route('/test',methods=["GET","POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        response = requests.get(url)
        if response.status_code != 200:
            try:
                response.raise_for_status()
            except requests.exceptions.HTTPError:
                status = response.status_code
                print status
                raise InvalidUsage("An HTTP exception has been raised",status_code=status)
    except requests.exceptions.RequestException as e:
        print e

if __name__ == "__main__":
    app.run(debug=True)
My question is: how do I get the exception string (message) and other relevant parameters from the requests.exceptions.RequestException object e?
Also, what is the best way to log such exceptions? In the case of an HTTPError exception I have the status code to refer to.
But requests.exceptions.RequestException catches all request exceptions. So how do I differentiate between them, and what is the best way to log them apart from using print statements?
Thanks a lot in advance for any answers.

RequestException is a base class for HTTPError, ConnectionError, Timeout, URLRequired, TooManyRedirects and others (the whole list is available at the GitHub page of requests module). Seems that the best way of dealing with each error and printing the corresponding information is by handling them starting from more specific and finishing with the most general one (the base class). This has been elaborated widely in the comments in this StackOverflow topic. For your test() method this could be:
@app.route('/test',methods=["GET","POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        # some code...
    except requests.exceptions.ConnectionError as ece:
        print("Connection Error:", ece)
    except requests.exceptions.Timeout as et:
        print("Timeout Error:", et)
    except requests.exceptions.RequestException as e:
        print("Some Ambiguous Exception:", e)
This way you can firstly catch the errors that inherit from the RequestException class and which are more specific.
And considering an alternative for printing statements - I'm not sure if that's exactly what you meant, but you can log into console or to a file with standard Python logging in Flask or with the logging module itself (here for Python 3).
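As a rough sketch of how the specific-to-general ordering combines with the logging module, the example below uses the standard library's urllib.error hierarchy (HTTPError is a subclass of URLError) in place of the Requests exceptions, so that it runs without third-party packages. The logger name, the describe_failure helper, its string return values and the raise_* stand-ins are all hypothetical and exist only for illustration.

```python
import io
import logging
from urllib.error import HTTPError, URLError

logger = logging.getLogger("api_client")  # hypothetical logger name


def describe_failure(call):
    """Run call() and log any failure, most specific exception first."""
    try:
        return call()
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be listed first.
        logger.error("HTTP error %s: %s", e.code, e.reason)
        return "http"
    except URLError as e:
        # Connection-level problems that produced no HTTP response.
        logger.error("URL error: %s", e.reason)
        return "url"
    except Exception:
        # Anything else: logger.exception also records the traceback.
        logger.exception("Unexpected error")
        return "other"


def raise_http():
    raise HTTPError("http://example.invalid/", 404, "Not Found", {}, io.BytesIO(b""))


def raise_url():
    raise URLError("connection refused")


def raise_other():
    raise ValueError("bad input")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for failing_call in (raise_http, raise_url, raise_other):
        describe_failure(failing_call)
```

If the URLError clause were listed before the HTTPError clause, the HTTPError branch would never run, because the first matching except clause wins.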

This is actually not a question about using the requests library as much as it is a general Python question about how to extract the error string from an exception instance. The answer is relatively straightforward: you convert it to a string by calling str() on the exception instance. Any properly written exception class (in requests or otherwise) implements a __str__() method so that calling str() on an instance gives a useful message. Example below:
import requests

rsp = requests.get('https://httpbin.org/status/404')
try:
    if rsp.status_code >= 400:
        rsp.raise_for_status()
except requests.exceptions.RequestException as e:
    error_str = str(e)
    # log 'error_str' to disk, a database, etc.
    print('The error was:', error_str)
Yes, in this example, we print it, but once you have the string you have additional options. Anyway, saving this to test.py results in the following output given your test URL:
$ python3 test.py
The error was: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

Python HTTP Error 429 with urllib2

I am using the following code to resolve redirects and return a link's final URL:
def resolve_redirects(url):
    return urllib2.urlopen(url).geturl()
Unfortunately I sometimes get HTTPError: HTTP Error 429: Too Many Requests. What is a good way to combat this? Is the following good, or is there a better way?
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        return urllib2.urlopen(url).geturl()
Also, what would happen if there is an exception in the except block?

It would be better to make sure the HTTP code is actually 429 before re-trying.
That can be done like this:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError as e:
        if e.code == 429:
            time.sleep(5)
            return resolve_redirects(url)
        raise
This will also allow arbitrary numbers of retries (which may or may not be desired).
https://docs.python.org/2/howto/urllib2.html#httperror
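If an unbounded number of retries is not desired, a loop with an attempt limit is one alternative. The sketch below is only an illustration under some assumptions: it uses Python 3's urllib.error.HTTPError, the names call_with_retry, fetch, max_attempts, delay and make_flaky_fetch are hypothetical, and the network call is replaced by a stand-in so that the example runs offline.

```python
import io
import time
from urllib.error import HTTPError


def call_with_retry(fetch, max_attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch(); on HTTP 429 wait and retry, making at most max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except HTTPError as e:
            # Re-raise anything that is not a rate limit, and the final failure.
            if e.code != 429 or attempt == max_attempts:
                raise
            sleep(delay)


def make_flaky_fetch(failures):
    """Return a stand-in for a network call that raises 429 'failures' times."""
    state = {"left": failures}

    def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise HTTPError("http://example.invalid/", 429,
                            "Too Many Requests", {}, io.BytesIO(b""))
        return "http://example.invalid/final"

    return fetch
```

In real code the stand-in would be replaced by something like call_with_retry(lambda: urllib.request.urlopen(url).geturl()). Passing sleep as a parameter keeps the waiting behaviour easy to replace in tests.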

This is a fine way to handle the exception, though you should make sure you always sleep for an appropriate amount of time between requests for the given website (for example, Twitter limits the number of requests per minute and states this limit clearly in its API documentation).
To recover from an exception raised inside an except block, you can simply embed another try/except block:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        try:
            return urllib2.urlopen(url).geturl()
        except HTTPError:
            return "Failed twice :S"
Edit: as @jesse-w-at-z points out, you should return a URL in the second error case; the code I posted is just a reference example of how to write a nested try/except.

Adding User-Agent to request header solved my issue:
from urllib import request
from urllib.request import urlopen
url = 'https://www.example.com/abc.json'
req = request.Request(url)
req.add_header('User-Agent', 'abc-bot')
response = request.urlopen(req)

What is the best way to write try except in Python?

Suppose I have a code snippet as following
r = requests.post(url, data=values, files=files)
Since this is making a network request, a bunch of exceptions can be thrown from this line. For completeness of the argument, I could also have file reads, sending emails, etc. To account for such errors I do
try:
    r = requests.post(url, data=values, files=files)
    if r.status_code != 200:
        raise Exception("Could not post to "+ url)
except Exception as e:
    logger.error("Error posting to " + url)
There are two problems which I see with this approach.
1. I have just handled a generic exception and don't know what exact exception would be raised by this line. What is the best way to find that out in Python?
2. This makes the code look ugly, which is non-Pythonic but fine, as long as it's robust and handles all the cases.
I am wondering what would be the best way to handle exceptions in python.

The best way to write try-except -- in Python or anywhere else -- is as narrow as possible. It's a common problem to catch more exceptions than you meant to handle!
In particular, at a minimum, I'd re-write your example code as something like:
try:
    r = requests.post(url, data=values, files=files)
except Exception as e:
    logger.error("Error posting to %r: %s" % (url, e))
    raise
else:
    if r.status_code != 200:
        logger.error("Could not post to %r: HTTP code %s" % (url, r.status_code))
        raise RuntimeError("HTTP code %s trying to post to %r" % (r.status_code, url))
This embodies several best-practices, such as: detailed error messages, always re-raise exceptions you don't know how to specifically handle (after logging error messages with more details as well as the exception), never raise something as generic as Exception, &c -- and, crucially, catch exceptions only on the narrowest part of code you possibly can, that's what the else: clause in try/except is for!-)
If and when you do expect -- and know how to handle -- specific exceptions, so much the better -- you put other except ThisSpecificProblem as e: clauses before the generic except Exception clause which logs and re-raises. But (from the Zen of Python -- import this at a Python interpreter prompt!) -- "Errors should never pass silently. // Unless explicitly silenced."... and you should only "explicitly silence" errors you fully expect, and fully know how to handle!

I have just handled a generic exception and don't know what exact
exception would be raised by this line, what is the best way to find
it in python.
As always, the answer is to look at the documentation:
In the event of a network problem (e.g. DNS failure, refused
connection, etc), Requests will raise a ConnectionError exception.
In the rare event of an invalid HTTP response, Requests will raise an
HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured number of maximum redirections, a
TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from
requests.exceptions.RequestException.
Code that raises exceptions (especially if there are custom exceptions) is documented. You can also have a look at the source if the documentation is not explicit.
Your code is fine, except you should avoid generic except clauses as these can hide other problems with your code. You should catch only those exceptions that you can predict, and then let the others "rise up" until caught/logged.

Well, answering your first question, what exact exception would be raised by this line, you are one step away.
You already call except Exception as e, but you don't use e anywhere. e contains the information about your exception, so just add a little print statement
print e
And it works:
>>> try:
...     x = int(raw_input('Input: '))
... except Exception as e:
...     print e
...
Input: 5t
invalid literal for int() with base 10: '5t'
>>>
I don't exactly see what you're asking in the 2nd, you say it is ugly/non-pythonic, but then you say it is fine. Yes, it is fine, and it is also quite pythonic, in my opinion.

You should try avoiding using except Exception as e: as much as possible.
For clarity you can create a custom exception class which takes care of the scenario where the status code is not 200.
class PostingError(Exception):
    pass
And then raise only PostingError, and try catching only this error. By catching all kinds of errors, you might report the wrong information. For example, even a memory error might be caught and displayed as "Error posting to URL".
So this is how it would finally look:
try:
    r = requests.post(url, data=values, files=files)
    if r.status_code != 200:
        raise PostingError("Could not post to "+ url)
except PostingError as e:
    logger.error(e)

How to print body of response on urllib2.URLError?

In a script I am creating I am posting a lot of data to a REST API.
The script is quite modularized and at the top level somewhere I am catching a URLError. I need to know what is in the body of the response, because there will be an error message in there.
Is there a method on URLError that I can use?
try:
    (calling some function that throws URLError)
except urllib2.URLError as e:
    print "Error: " + str(e.body_or_something)

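Yes, there is, provided the error is actually an HTTPError (a subclass of URLError). In that case you have access to the response via e.readlines(); a plain URLError carries no response body:
try:
    (calling some function that throws URLError)
except urllib2.HTTPError as e:
    print e.readlines()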

See the documentation: https://docs.python.org/2/library/urllib2.html#urllib2.URLError
exception urllib2.URLError The handlers raise this exception (or
derived exceptions) when they run into a problem. It is a subclass of
IOError.
reason The reason for this error. It can be a message string or
another exception instance (socket.error for remote URLs, OSError for
local URLs).
exception urllib2.HTTPError Though being an exception (a subclass of
URLError), an HTTPError can also function as a non-exceptional
file-like return value (the same thing that urlopen() returns). This
is useful when handling exotic HTTP errors, such as requests for
authentication.
code An HTTP status code as defined in RFC 2616. This numeric value
corresponds to a value found in the dictionary of codes as found in
BaseHTTPServer.BaseHTTPRequestHandler.responses.
reason The reason for this error. It can be a message string or
another exception instance.
So you can access the response body when the request raises urllib2.HTTPError.
Try this:
try:
    (calling some function that throws URLError)
except urllib2.HTTPError as e:
    body = e.readlines()
    print e.code, e.reason, body
except urllib2.URLError as e:
    print e.reason
except:
    sys.excepthook(*sys.exc_info())
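The distinction between the two exception types can be checked without a network call. The following is a small sketch that assumes Python 3's urllib.error module (where urllib2's exceptions now live); the helper name error_body is hypothetical, and the errors in the usage note are constructed by hand purely for illustration.

```python
import io
from urllib.error import HTTPError, URLError


def error_body(err):
    """Return the response body for an HTTPError, or None for a plain URLError."""
    if isinstance(err, HTTPError):
        # HTTPError wraps the response stream, so it can be read like a file.
        return err.read()
    # A plain URLError has only a 'reason'; there was no HTTP response to read.
    return None
```

For a hand-built HTTPError("http://example.invalid/", 400, "Bad Request", {}, io.BytesIO(b"bad field")), error_body returns the bytes of the body, while for URLError("name resolution failed") it returns None. This is why the HTTPError clause has to come before the URLError clause when the body is needed.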
