I'm using FastAPI with aiohttp. I built a singleton for a persistent session, which opens the session at startup and closes it at shutdown.
Requirement: the response body is precious; in case of a failure, I must log it along with the other details.
Because of how raise_for_status behaves, I had to write these ugly functions that handle each HTTP method; this is one of them:
async def post(self, url: str, json: dict, headers: dict) -> ClientResponse:
    response = await self.session.post(url=url, json=json, headers=headers)
    response_body = await response.text()
    try:
        response.raise_for_status()
    except Exception:
        logger.exception('Request failed',
                         extra={'url': url, 'json': json, 'headers': headers, 'body': response_body})
        raise
    return response
If I could count on raise_for_status to also include the body (response.text()),
I could simply initialize the session with ClientSession(raise_for_status=True) and write clean code:
response = await self.session.post(url=url, json=json, headers=headers)
Is there a way to somehow force raise_for_status to also return the payload/body, perhaps when initializing the ClientSession?
Thanks for the help.
This is not possible with aiohttp and raise_for_status. As Andrew Svetlov answered here:
Consider the response as closed after raising an exception.
Technically it can contain a partial body, but there is no guarantee.
There is no reason to read it; the body could be huge, 1 GiB is not a limit.
If you need the response content for a non-200, read it explicitly.
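In other words, the explicit read cannot be avoided, but the per-method duplication can: ClientSession.request accepts the HTTP method as an argument, so a single helper can replace the separate post/get/put functions. A minimal sketch based on the code in the question (the _request name is an assumption):
async def _request(self, method: str, url: str, **kwargs) -> ClientResponse:
    response = await self.session.request(method, url, **kwargs)
    if response.status >= 400:
        # read explicitly for non-200, as the quote above advises
        body = await response.text()
        logger.error('Request failed',
                     extra={'url': url, 'status': response.status, 'body': body})
        response.raise_for_status()
    return response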
Alternatively, consider using the httpx library in the following way (it is widely used in conjunction with FastAPI):
import httpx

async def raise_on_4xx_5xx(response):  # event hooks must be async when used with AsyncClient
    response.raise_for_status()

async with httpx.AsyncClient(event_hooks={'response': [raise_on_4xx_5xx]}) as client:
    try:
        r = await client.get('http://httpbin.org/status/418')
    except httpx.HTTPStatusError as e:
        print(e.response.text)
Prerequisites:
Python 3.9.5
aiohttp 3.7.4.post0
Hello! I am trying to download images from a given URL, and 99% of the time it works just fine. Here is the snippet:
import io
import aiohttp
from PIL import Image

async def download_image(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status != 200:
                raise exceptions.FileNotFound()  # exceptions is the project's own module
            data = await response.read()
            img = Image.open(io.BytesIO(data))
            return img
But sometimes, at the step data = await response.read(), the function throws aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed. The exception can be raised for a certain image, and on the second attempt to load that exact image it works again.
aiohttp documentation states:
This exception can only be raised while reading the response payload
if one of these errors occurs:
invalid compression
malformed chunked encoding
not enough data that satisfy Content-Length HTTP header.
What can I do to debug precisely what raises the exception? To me it seems some data gets corrupted during session.get(url), bits flipping here and there. Is there a better way to retry the image download than catching the error around the download_image call and repeating it?
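For reference, the "catch and repeat" approach described at the end of the question could be wrapped like this; a minimal sketch, with the attempt count and pause length as arbitrary assumptions:
import asyncio

async def download_image_with_retries(url, attempts=3):
    for attempt in range(attempts):
        try:
            return await download_image(url)
        except aiohttp.ClientPayloadError:
            if attempt == attempts - 1:
                raise  # out of attempts, propagate the error
            await asyncio.sleep(1)  # brief pause before retrying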
I am trying to use Celery to send anywhere from 1 to about 25 consecutive requests to a third-party API. The measure of success is whether I get a URL back in the response payload: response_json["data"]["url"]. Celery or not, sometimes I get the data I expect, and sometimes I don't.
I decided to experiment with Celery to retry the API call while taking advantage of the built-in exponential backoff, which sounds perfectly suited to my needs, but I am struggling to implement it.
When the data I am expecting is not available in the response payload, I get a TypeError: 'type' object is not iterable (i.e., there is no such item in the response). But I am also getting an Internal Server Error; I wonder whether I need to handle it or whether I could use it to trigger the retry.
Here is an example of one of several similar approaches I have tried:
@shared_task(autoretry_for=(Exception), retry_backoff=True, retry_backoff_max=120)
def third_party_api_request(payload, api_url, headers):
    response = requests.request("POST", api_url, headers=headers, data=payload)
    response_json = response.json()
    return response_json["data"]["url"]
# output:
Internal Server Error:
-- snip --
autoretry_for = tuple(
TypeError: 'type' object is not iterable
Another approach I have tried:
@shared_task(bind=True)
def third_party_api_request(self, payload, api_url, headers):
    try:
        response = requests.request("POST", api_url, headers=headers, data=payload)
        response_json = response.json()
        return response_json["data"]["url"]
    except TypeError as exc:
        logger.error("Error sending request to API: %s", exc)
        raise self.retry(exc=exc)
# output:
ERROR 2022-04-22 17:31:40,131 tasks 72780 123145528369152 Error sending request to API: 'NoneType' object is not subscriptable
Internal Server Error:
-- snip --
TypeError: 'NoneType' object is not subscriptable
-- snip --
raise ret
celery.exceptions.Retry: Retry in 180s: TypeError("'NoneType' object is not subscriptable")
ERROR 2022-04-22 17:31:40,409 log 72780 123145528369152 Internal Server Error:
-- snip --
TypeError: 'NoneType' object is not subscriptable
-- snip --
raise ret
celery.exceptions.Retry: Retry in 180s: TypeError("'NoneType' object is not subscriptable")
And yet another approach, with similar results:
@shared_task(autoretry_for=(TypeError), retry_backoff=True, retry_backoff_max=120)
def send_http_request_to_proctoru_task(payload, api_url, headers):
    response = requests.request("POST", api_url, headers=headers, data=payload)
    response_json = response.json()
    try:
        return response_json["data"]["url"]
    except TypeError:
        logger.error("API response: %s", response_json)
        raise
The issue was that I had misplaced the return statement. I still have much fine-tuning to do, but the following code solves the issue in the question I posed earlier today. I have been reading posts, documentation, and articles for days, and I wish I could give everyone credit, but this is the blog post that ultimately helped me realize my error: https://testdriven.io/blog/retrying-failed-celery-tasks/.
@shared_task(name="send_http_request_to_proctoru_task", bind=True, max_retries=6)
def send_http_request_to_proctoru_task(self, api_url, headers, payload):
    try:
        response = requests.request("POST", api_url, headers=headers, data=payload)
        response_json = response.json()
        if response_json["response_code"] == 2:
            raise Exception()
        return response_json["data"]["url"]
    except Exception as exc:
        logger.warning("Exception raised. Executing retry %s" % self.request.retries)
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
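As a side note, the TypeError: 'type' object is not iterable in the first attempt comes from autoretry_for=(Exception): parentheses alone don't make a tuple, so Celery receives the bare Exception class. With a trailing comma, the declarative style should also work; a sketch under that assumption:
@shared_task(autoretry_for=(Exception,),  # trailing comma: a one-element tuple
             retry_backoff=True, retry_backoff_max=120, max_retries=6)
def third_party_api_request(payload, api_url, headers):
    response = requests.request("POST", api_url, headers=headers, data=payload)
    return response.json()["data"]["url"]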
I'm currently doing my first "baby-steps" with aiohttp (coming from the requests module).
I tried to simplify the requests a bit so I won't have to use a context manager for each request in my main module.
Therefore I tried this:
async def get(session, url, headers, proxies=None):
    async with session.get(url, headers=headers, proxy=proxies) as response:
        response_object = response
    return response_object
But it resulted in:
<class 'aiohttp.client_exceptions.ClientConnectionError'> - Connection closed
The response is available inside the context manager; when I access it within the context manager in the function above, everything works.
But shouldn't it also be possible to save it in the response_object variable and return it, so I can access it outside of the context manager?
Is there any workaround to this?
If you don't mind the data being loaded during the get method, you could try reading it inside:
async def get(session, url, headers, proxies=None):
    async with session.get(url, headers=headers, proxy=proxies) as response:
        await response.read()
    return response
And then use the body that was read:
resp = await get(session, 'http://python.org', {})
print(await resp.text())
Under the hood, the read method caches the body in a member named _body, and when you later call json (or text), aiohttp first checks whether the body was already read.
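A tiny illustration of that caching behavior (it touches the private _body attribute purely for demonstration; don't rely on it in real code):
resp = await get(session, 'http://python.org', {})
assert resp._body is not None  # the body bytes were cached by read() inside get()
text = await resp.text()       # decodes the cached bytes, no new network I/O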
I'm trying to build a simple proxy using Flask and requests. The code is as follows:
@app.route('/es/<string:index>/<string:type>/<string:id>',
           methods=['GET', 'POST', 'PUT'])
def es(index, type, id):
    elasticsearch = find_out_where_elasticsearch_lives()
    # also handle some authentication
    url = '%s/%s/%s/%s' % (elasticsearch, index, type, id)
    esreq = requests.Request(method=request.method, url=url,
                             headers=request.headers, data=request.data)
    resp = requests.Session().send(esreq.prepare())
    return resp.text
This works, except that it loses the status code from Elasticsearch. I tried returning resp (a requests.models.Response) directly, but this fails with
TypeError: 'Response' object is not callable
Is there another, simple, way to return a requests.models.Response from Flask?
Ok, found it:
If a tuple is returned the items in the tuple can provide extra information. Such tuples have to be in the form (response, status, headers). The status value will override the status code and headers can be a list or dictionary of additional header values.
(Flask docs.)
So
return (resp.text, resp.status_code, resp.headers.items())
seems to do the trick.
Using the text or content property of the Response object will not work if the server returns encoded data (such as content-encoding: gzip) and you return the headers unchanged. This happens because text and content have been decoded, so there will be a mismatch between the header-reported encoding and the actual encoding.
According to the documentation:
In the rare case that you’d like to get the raw socket response from the server, you can access r.raw. If you want to do this, make sure you set stream=True in your initial request.
and
Response.raw is a raw stream of bytes – it does not transform the response content.
So, the following works for gzipped data too:
esreq = requests.Request(method=request.method, url=url,
                         headers=request.headers, data=request.data)
resp = requests.Session().send(esreq.prepare(), stream=True)
return resp.raw.read(), resp.status_code, resp.headers.items()
If you use a shortcut method such as get, it's just:
resp = requests.get(url, stream=True)
return resp.raw.read(), resp.status_code, resp.headers.items()
Flask can return an object of type flask.wrappers.Response.
You can create one of these from your requests.models.Response object r like this:
from flask import Response
return Response(
    response=r.content,
    status=r.status_code,
    headers=dict(r.headers)
)
I ran into the same scenario, except that in my case my requests.models.Response contained an attachment. This is how I got it to work (send_file comes from flask, BytesIO from io):
return send_file(BytesIO(result.content), mimetype=result.headers['Content-Type'], as_attachment=True)
My use case is to call another API in my own Flask API. I'm just propagating unsuccessful requests.get calls through my Flask response. Here's my successful approach:
from requests.exceptions import HTTPError

headers = {
    'Authorization': 'Bearer Muh Token'
}
try:
    response = requests.get(
        '{domain}/users/{id}'.format(domain=USERS_API_URL, id=hit['id']),
        headers=headers)
    response.raise_for_status()
except HTTPError as err:
    logging.error(err)
    flask.abort(flask.Response(response=response.content,
                               status=response.status_code,
                               headers=response.headers.items()))
According to the urllib2 documentation,
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
And yet the following code
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
raises an HTTPError with code 201 (created):
ERROR 2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created
So why is urllib2 throwing HTTPErrors on this successful request?
It's not too much of a pain; I can easily extend the code to:
try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except HTTPError, e:
    if e.code == 201:
        pass  # success! :)
    else:
        pass  # fail! :(
else:
    pass  # when will this happen...?
But this doesn't seem like the intended behavior, based on the documentation and on the fact that I can't find similar questions about this odd behavior.
Also, what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, when does urllib2.urlopen() just return a normal file-like response object, as the urllib2 documentation describes?
You can write a custom Handler class for use with urllib2 to prevent specific error codes from being raised as HTTPError. Here's one I've used before:
class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201, 204, 206

    def http_error_201(self, request, response, code, msg, hdrs):
        return response

    def http_error_204(self, request, response, code, msg, hdrs):
        return response

    def http_error_206(self, request, response, code, msg, hdrs):
        return response
Then you can use it like:
opener = urllib2.build_opener(BetterHTTPErrorProcessor)
urllib2.install_opener(opener)
req = urllib2.Request(url, data, headers)
urllib2.urlopen(req)
As the actual library documentation mentions:
For 200 error codes, the response object is returned immediately.
For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.
http://docs.python.org/library/urllib2.html#httperrorprocessor-objects
I personally think it was a mistake and very unintuitive for this to be the default behavior.
It's true that non-2XX codes imply a protocol-level error, but turning that into an exception goes too far (in my opinion, at least).
In any case, I think the most elegant way to avoid this is:
import urllib.request

opener = urllib.request.build_opener()
for processor in opener.process_response['https']:  # or 'http', depending on what you're using
    if isinstance(processor, urllib.request.HTTPErrorProcessor):  # HTTPErrorProcessor is registered for https too
        opener.process_response['https'].remove(processor)
        break  # there's only one such handler by default

response = opener.open('https://www.google.com')
Now you have the response object, and you can check its status code, headers, body, etc.
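For example, a few of the things you can inspect on the returned http.client.HTTPResponse:
print(response.status)                       # e.g. 200, or 404 without an exception raised
print(response.headers.get('Content-Type'))  # headers behave like a mapping
body = response.read()                       # raw body bytes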