I'm setting up a small Python script so my colleagues can collect data from a certain internal API based on a few inputs using the following code:
url = "https://....."
params = dict(...)
client = BackendApplicationClient(client_id=client_id)
client.prepare_request_body(scope=[])
session = OAuth2Session(client=client)
response = session.get(url=url, params=params, verify=session.verify)
where the params are based on the manual inputs. I can guarantee some of the inputs will not conform to the API's requirements fully (like lower case letters where upper case is needed, etc.). In this case, the API will return a response with status 400:
>> response
<Response [400]>
>> response.text
{"statusCode":400,"errorMessage":"Bad Request","errors": ...}
>> response.status_code
400
I thought I could capture this with response.raise_for_status(), but no Exception is raised, and the returned value is None:
>> response.raise_for_status()
None
Why is this? I thought the raise_for_status function was supposed to raise an Exception based on the response's status_code
raise_for_status() on a response from the requests module raises an HTTPError exception when the HTTP status code is in the 4xx or 5xx range, which includes 400. The behavior you are seeing is a peculiarity of OAuth2Session which you can read about here
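If raise_for_status() cannot be relied on in this setup, one option is to check the status code explicitly. The following is only a minimal sketch, not part of the OAuth2Session API: ApiError and ensure_success are hypothetical names, and the function assumes nothing beyond the status_code and text attributes shown in the question.

```python
class ApiError(Exception):
    """Hypothetical exception carrying the status code and body of a failed call."""

    def __init__(self, status_code, body):
        super().__init__(f"API returned {status_code}: {body}")
        self.status_code = status_code
        self.body = body


def ensure_success(response):
    """Return the response unchanged, or raise ApiError for 4xx/5xx codes."""
    if 400 <= response.status_code < 600:
        raise ApiError(response.status_code, response.text)
    return response
```

Calling ensure_success(response) right after session.get(...) would then surface the 400 together with the API's error body, whatever raise_for_status() does.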
Background and Code
I have the below function to handle rate limiting in Twitter's V2 API based on the HTTP status codes.
from datetime import datetime
from osometweet.utils import pause_until

def manage_rate_limits(response):
    """Manage Twitter V2 Rate Limits

    This method takes in a `requests` response object after querying
    Twitter and uses the headers["x-rate-limit-remaining"] and
    headers["x-rate-limit-reset"] headers objects to manage Twitter's
    most common, time-dependent HTTP errors.

    Wiki Reference: https://github.com/osome-iu/osometweet/wiki/Info:-HTTP-Status-Codes-and-Errors
    Twitter Reference: https://developer.twitter.com/en/support/twitter-api/error-troubleshooting
    """
    while True:
        # The x-rate-limit-remaining parameter is not always present.
        # If it is, we want to use it.
        try:
            # Get number of requests left with our tokens
            remaining_requests = int(response.headers["x-rate-limit-remaining"])

            # If that number is one, we get the reset-time
            # and wait until then, plus 15 seconds (you're welcome Twitter).
            # The regular 429 exception is caught below as well,
            # however, we want to program defensively, where possible.
            if remaining_requests == 1:
                buffer_wait_time = 15
                resume_time = datetime.fromtimestamp(int(response.headers["x-rate-limit-reset"]) + buffer_wait_time)
                print(f"One request from being rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause_until(resume_time)
        except Exception as e:
            print("An x-rate-limit-* parameter is likely missing...")
            print(e)

        # Explicitly checking for time dependent errors.
        # Most of these errors can be solved simply by waiting
        # a little while and pinging Twitter again - so that's what we do.
        if response.status_code != 200:
            # Too many requests error
            if response.status_code == 429:
                buffer_wait_time = 15
                resume_time = datetime.fromtimestamp(int(response.headers["x-rate-limit-reset"]) + buffer_wait_time)
                print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause_until(resume_time)
            # Twitter internal server error
            elif response.status_code == 500:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = datetime.now().timestamp() + 30
                print(f"Internal server error # Twitter. Giving Twitter a break...\n\tResume Time: {resume_time}")
                pause_until(resume_time)
            # Twitter service unavailable error
            elif response.status_code == 503:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = datetime.now().timestamp() + 30
                print(f"Twitter service unavailable. Giving Twitter a break...\n\tResume Time: {resume_time}")
                pause_until(resume_time)
            # If we get this far, we've done something wrong and should exit
            raise Exception(
                "Request returned an error: {} {}".format(
                    response.status_code, response.text
                )
            )

        # Each time we get a 200 response, exit the function and return the response object
        if response.ok:
            return response
This function is fed a response object from a requests call like the below
response = requests.get(
    url,
    headers=self._header,
    params=payload
)
response = manage_rate_limits(response)
In the above call, the parameters are as follows:
url = Twitter's base endpoint URL (in this case, the full-archive academic search)
params/payload = a combination of endpoint search operators (these should be irrelevant, but I can include them if necessary)
headers/self._bearer_token = a user bearer_token in the proper header form below
self._header = {"Authorization": f"Bearer {MY_BEARER_TOKEN}"}
Question & Error:
Using the above code, I get a long-running script that returns the below error from the manage_rate_limits function (in rate_limit_manager.py).
Traceback (most recent call last):
  File "/scratch/mdeverna/Superspreaders/src/get_rts_of_user.py", line 218, in get_rts_of_user
    full_archive_search = True
  File "/nfs/nfs5/home/scratch/mdeverna/osometweet/osometweet/api.py", line 248, in search
    response = self._oauth.make_request(url, payload)
  File "/nfs/nfs5/home/scratch/mdeverna/osometweet/osometweet/oauth.py", line 181, in make_request
    response = manage_rate_limits(response)
  File "/nfs/nfs5/home/scratch/mdeverna/osometweet/osometweet/rate_limit_manager.py", line 67, in manage_rate_limits
    response.status_code, response.text
Exception: Request returned an error: 429 {"title":"Too Many Requests","type":"about:blank","status":429,"detail":"Too Many Requests"}
What I don't understand is that the line that raises this exception is...
# If we get this far, we've done something wrong and should exit
raise Exception(
    "Request returned an error: {} {}".format(
        response.status_code, response.text
    )
)
... and it shows that response.status_code equals 429. However, the conditional earlier in this function checks for exactly this status code but seems to miss it. It seems like the condition that checks whether the status code is 429 is being skipped, only for the exception further down to report that the status code is 429.
What is going on here?
Even if the status code is 429 or 500 or 503, you're going to flow off the bottom of the if/elif/elif sequence and right into the raise. Did you intend to return at the end of each? Or did you mean for the raise to be in an else: clause?
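As a rough sketch of the else: variant, the decision could be pulled into a small helper so that only codes which waiting cannot fix ever reach the raise. The name seconds_to_wait and its now parameter are hypothetical; the 15 and 30 second waits are taken from the question's code.

```python
import time


def seconds_to_wait(status_code, headers, now=None):
    """Return how long to pause before retrying, or raise for other errors."""
    now = time.time() if now is None else now
    if status_code == 429:
        # Too many requests: wait until the reset time plus a 15 second buffer.
        reset = int(headers["x-rate-limit-reset"])
        return max(0.0, reset + 15 - now)
    elif status_code in (500, 503):
        # Server-side trouble: give the service a 30 second break.
        return 30.0
    else:
        # Only codes that waiting cannot fix reach this branch.
        raise RuntimeError(f"Request returned an error: {status_code}")
```

The caller would sleep for the returned time and then send the request again. Note that the function in the question never re-sends the request inside its while loop, so even with the raise moved, the same response object would be inspected on every pass.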
I created a function that transforms some data and sends it to the FB API.
It works perfectly when the FB API responds with a 200 code; otherwise the function returns an internal server error.
I've added raise_for_status(), and now I can return an error message if the FB API responds with a non-200 code.
How can I make my function respond not only with a relevant error message but also with the relevant status code?
response = requests.request("POST", url, headers=headers, data=payload, params=params)
resp = {}
try:
    response.raise_for_status()
except requests.exceptions.HTTPError:
    response.status_code = 400
    resp['message'] = response.text
else:
    resp['message'] = response.text
finally:
    return resp
Add the HTTP status code after your response, like this:
return resp, 403
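To forward whatever status the FB API returned rather than a fixed one, the same tuple form can carry the upstream status code. A minimal sketch, assuming the framework accepts a (body, status) tuple as in the line above; build_reply is a hypothetical helper, not part of any library.

```python
def build_reply(upstream_status, upstream_text):
    """Return a (body, status) tuple that mirrors the upstream API's status code."""
    body = {"message": upstream_text}
    return body, upstream_status
```

In the question's function this would be used as return build_reply(response.status_code, response.text), without overwriting response.status_code first.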
I'm using Python 3.7 with urllib.
Everything works fine, but it does not seem to redirect automatically when it gets an HTTP redirect (307).
This is the error I get:
ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect
I have to handle it with a try-except and manually send another request to the new Location: it works fine, but I don't like it.
This is the piece of code I use to perform the request:
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
self.logger.debug(req.headers)
self.logger.info(req.data)
resp = urllib.request.urlopen(req)
url is an HTTPS resource, and I set headers with some Authorization info and a content type.
req.data is JSON.
From the urllib documentation I understood that redirects are performed automatically by the library itself, but it doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL.
I've also tried to use an opener specifying the default redirect handler, but with the same result:
opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
resp = opener.open(req)
What could be the problem?
The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: given that data has been assigned, get_method automatically returns POST (as per how this method was implemented). Since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
For an experiment, try the following code:
import urllib.error
import urllib.request
import urllib.parse

url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url
print('Response URL -> %s ' % resp.url)  # the final url
Running the code as is may produce the following
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to /get was done automatically, as the subsequent request was a GET request. Commenting out the req.data assignment line will result in the lack of the "Redirected" output line.
Other things to note in the exception handling block: e.read() may be called to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response that could be processed), and urljoin is needed because the Location header may be a relative URL (or simply have the host missing) pointing to the subsequent resource.
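If the redirect target is trusted and the POST should simply be repeated there, the manual handling could be moved into a redirect handler. This is only a sketch under that assumption: RepostRedirectHandler is a hypothetical name, it deliberately sidesteps the "confirmed by the user" condition quoted above, and it forwards the original headers (including any Authorization header) to the new location, which may not be acceptable for every service.

```python
import urllib.request


class RepostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that re-sends a POST body on a 307 redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() == "POST":
            # Rebuild the request with the same method, body, and headers.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=dict(req.headers),
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="POST",
            )
        # Everything else keeps the standard library's default behavior.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# build_opener uses this subclass in place of the default redirect handler.
opener = urllib.request.build_opener(RepostRedirectHandler)
```

With this opener, opener.open(req) would follow the 307 and post the same data again instead of raising HTTPError.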
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that those questions never got any answers. They are as follows:
How to handle 307 redirection using urllib2 from http to https
HTTP Error 307: Temporary Redirect in Python3 - INTRANET
HTTP Error 307 - Temporary redirect in python script
In Python, when an HTTP request is invalid, response is None. In this case, how can I get the response code? The invalid requests in my code have two causes: one is an invalid token, for which I expect to get 401; the other is an invalid parameter, for which I expect to get 400. In both cases response is always None, and I'm not able to get the response code by calling response.getcode(). How can I solve this?
req = urllib2.Request(url)
response = None
try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    res_code = response.getcode()  # AttributeError: 'NoneType' object has no attribute 'getcode'
You can't get the status code when URLError is raised, because when it is raised (e.g. DNS couldn't resolve the domain name), the request hasn't been sent to the server yet, so no HTTP response has been generated.
In your scenario, (for 4xx HTTP status code), urllib2 throws HTTPError so you can derive the status code from it.
The documentation says:
code
An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.
import urllib2

request = urllib2.Request(url)
try:
    response = urllib2.urlopen(request)
    res_code = response.code
except urllib2.HTTPError as e:
    res_code = e.code
Hope this helps.
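In Python 3, urllib2 was split into urllib.request and urllib.error, and the same pattern might look like the sketch below. fetch_status and its opener parameter are hypothetical; the parameter exists only so the function can be exercised without a network. HTTPError is caught first because it is a subclass of URLError.

```python
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with opener(url) as response:
            return response.status
    except HTTPError as e:
        # The server answered with an error status such as 400 or 401.
        return e.code
    except URLError:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None
```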
According to the urllib2 documentation,
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
And yet the following code
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
raises an HTTPError with code 201 (created):
ERROR 2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created
So why is urllib2 throwing HTTPErrors on this successful request?
It's not too much of a pain; I can easily extend the code to:
try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except HTTPError, e:
    if e.code == 201:
        pass  # success! :)
    else:
        pass  # fail! :(
else:
    pass  # when will this happen...?
But this doesn't seem like the intended behavior, based on the documentation and the fact that I can't find similar questions about this odd behavior.
Also, what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, then when does urllib2.urlopen() just return a normal file-like response object like all the urllib2 documentation refers to?
You can write a custom Handler class for use with urllib2 to prevent specific error codes from being raised as HTTPError. Here's one I've used before:
class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201,204,206
    def http_error_201(self, request, response, code, msg, hdrs):
        return response
    def http_error_204(self, request, response, code, msg, hdrs):
        return response
    def http_error_206(self, request, response, code, msg, hdrs):
        return response
Then you can use it like:
opener = urllib2.build_opener(self.BetterHTTPErrorProcessor)
urllib2.install_opener(opener)
req = urllib2.Request(url, data, headers)
urllib2.urlopen(req)
As the actual library documentation mentions:
For 200 error codes, the response object is returned immediately.
For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.
http://docs.python.org/library/urllib2.html#httperrorprocessor-objects
I personally think it was a mistake and very nonintuitive for this to be the default behavior.
It's true that non-2XX codes imply a protocol level error, but turning that into an exception is too far (in my opinion at least).
In any case, I think the most elegant way to avoid this is:
opener = urllib.request.build_opener()
for processor in opener.process_response['https']:  # or http, depending on what you're using
    if isinstance(processor, urllib.request.HTTPErrorProcessor):  # HTTPErrorProcessor also for https
        opener.process_response['https'].remove(processor)
        break  # there's only one such handler by default
response = opener.open('https://www.google.com')
Now you have the response object. You can check its status code, headers, body, etc.
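An alternative that avoids editing the opener's internal lists, sketched here for Python 3, is to pass build_opener a subclass of HTTPErrorProcessor that returns every response untouched; build_opener leaves out a default handler when a subclass of it is supplied. PassThroughErrorProcessor is a hypothetical name. As far as I can tell, redirects are not followed with this approach either, since redirect handling is reached through the same error path.

```python
import urllib.request


class PassThroughErrorProcessor(urllib.request.HTTPErrorProcessor):
    """Hypothetical processor that hands back every response without raising."""

    def http_response(self, request, response):
        return response

    # Use the same pass-through behavior for HTTPS responses.
    https_response = http_response


opener = urllib.request.build_opener(PassThroughErrorProcessor)
```

opener.open(url) then returns the response object for any status code, and the caller decides what counts as an error.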