Correctly catch aiohttp TimeoutError when using asyncio.gather - python

This is my first question here on Stack Overflow, so I apologize if I did something stupid or missed something.
I am trying to make asynchronous aiohttp GET requests to many API endpoints at a time to check the status of these pages: the result should be a triple of the form (url, True, "200") in case of a working link and (url, False, response_status) in case of a "problematic link". This is the atomic function for each call:
async def ping_url(url, session, headers, endpoint):
    try:
        async with session.get(url + endpoint, timeout=5, headers=headers) as response:
            return url, (response.status == 200), str(response.status)
    except Exception as e:
        test_logger.info(url + ": " + e.__class__.__name__)
        return url, False, repr(e)
These calls are wrapped into a function using asyncio.gather(), which also creates the aiohttp session:
async def ping_urls(urllist, endpoint):
    headers = ...  # not relevant
    async with ClientSession() as session:
        try:
            results = await asyncio.gather(
                *[ping_url(url, session, headers, endpoint) for url in urllist],
                return_exceptions=True
            )
        except Exception as e:
            print(repr(e))
        return results
The whole thing is called from a main that looks like this:
urls = ...  # not relevant
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(ping_urls(urls, endpoint))
except Exception as e:
    pass
finally:
    loop.close()
This works most of the time, but if the list is pretty long, I noticed that as soon as I get one TimeoutError, the execution stops and I get a TimeoutError for every url after the first one that timed out. If I omit the timeout in the innermost function I get somewhat better results, but then it is not as fast anymore. Is there a way to control the timeout for each single API call instead of one big general timeout for the whole list of urls?
Any kind of help would be extremely appreciated; I am stuck on my bachelor thesis because of this issue.

You may want to try setting a session-level timeout on your client session. This can be done like this:
from aiohttp import ClientSession, ClientTimeout

async def ping_urls(urllist, endpoint):
    headers = ...  # not relevant
    timeout = ClientTimeout(total=TIMEOUT_SECONDS)
    async with ClientSession(timeout=timeout) as session:
        try:
            results = await asyncio.gather(
                *[ping_url(url, session, headers, endpoint) for url in urllist],
                return_exceptions=True
            )
        except Exception as e:
            print(repr(e))
        return results
This sets TIMEOUT_SECONDS as the timeout for every request made through the ClientSession instance. Obviously you will need to set that value to something appropriate!
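If you want to control the timeout for a single call rather than for the whole session, aiohttp also accepts a timeout argument on the individual request, which overrides the session-level setting for that call only. A minimal sketch of the asker's ping_url with an explicit per-request ClientTimeout (assuming aiohttp 3.x; the "timeout" result string is just illustrative):
import asyncio
from aiohttp import ClientTimeout

async def ping_url(url, session, headers, endpoint):
    try:
        # the per-request timeout overrides the session-level one for this call only
        async with session.get(url + endpoint, headers=headers,
                               timeout=ClientTimeout(total=5)) as response:
            return url, response.status == 200, str(response.status)
    except asyncio.TimeoutError:
        # only this call fails; the other gathered tasks keep running
        return url, False, "timeout"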

I struggled with the exceptions as well. I then found the hint that I can also print the type of the exception, and with that create appropriate exception handling.
try:
    ...
except Exception as e:
    print(f'Error: {e} of Type: {type(e)}')
So with this you can find out what kinds of errors occur, and you can catch and handle them individually, e.g.:
try:
    ...
except aiohttp.ClientConnectionError as e:
    ...  # deal with this type of exception
except aiohttp.ClientResponseError as e:
    ...  # handle individually
except asyncio.exceptions.TimeoutError as e:
    ...  # these kinds of errors happened to me as well
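Applied to the ping_url coroutine from the question, that might look like the following sketch (the exception classes are real aiohttp/asyncio names; the returned strings are just illustrative):
import asyncio
import aiohttp

async def ping_url(url, session, headers, endpoint):
    try:
        async with session.get(url + endpoint, timeout=5, headers=headers) as response:
            return url, response.status == 200, str(response.status)
    except aiohttp.ClientConnectionError:
        # DNS failure, refused connection, dropped connection, ...
        return url, False, "connection error"
    except asyncio.TimeoutError:
        # this particular request exceeded its 5 second budget
        return url, False, "timeout"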

Related

Why am I not getting a connection error when my API request fails on dropped wifi?

I am pulling data down from an API that has a limit of 250 records per call. There are a total of 100,000 records I need to pull down, 250 at a time. I run my application leveraging the get_stats function below. It works fine for a while, but when my wifi drops while I am in the middle of a get request, the request hangs and I don't get an exception back, causing the rest of the application to hang as well.
I have tested turning off my wifi when the function is NOT in the middle of the get request, and in that case it does return the ConnectionError exception.
How do I handle the situation where my app is in the middle of the get request and my wifi drops? I am thinking I need a timeout to give my wifi time to reconnect and then retry, but how do I go about doing that? Or is there another way?
import json
import requests

def get_stats(url, version):
    headers = {
        "API_version": version,
        "API_token": "token"
    }
    try:
        r = requests.get(url, headers=headers)
        print(f"Status code: {r.status_code}")
        return json.loads(r.text)
    except requests.exceptions.Timeout:
        # Maybe set up for a retry, or continue in a retry loop
        print("Error here in timeout")
    except requests.exceptions.TooManyRedirects:
        # Tell the user their URL was bad and try a different one
        print("Redirect errors here")
    except requests.exceptions.ConnectionError as r:
        print("Connection error")
        r = "Connection Error"
        return r
    except requests.exceptions.RequestException as e:
        # catastrophic error. bail.
        print("System errors here")
        raise SystemExit(e)
To set a timeout on the request, call requests.get like this:
r = requests.get(url, headers=headers, timeout=10)
The end goal is to get the data, so just make the call again, with a possible sleep after failing.
edit: I would say that the timeout is the sleep.
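A minimal retry sketch along those lines, assuming the same headers as the question; get_stats_with_retry and the retry count are illustrative names, and the timeout itself serves as the pause between attempts:
import requests

def get_stats_with_retry(url, version, retries=3):
    headers = {"API_version": version, "API_token": "token"}
    for attempt in range(retries):
        try:
            r = requests.get(url, headers=headers, timeout=10)
            r.raise_for_status()
            return r.json()
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError):
            # the 10 s timeout doubles as the wait for wifi to come back;
            # give up and re-raise once we are out of attempts
            if attempt == retries - 1:
                raise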

Aiohttp session timeout doesn't cancel the request

I have this piece of code where I send a POST request and set a maximum timeout for it using the aiohttp package:
from aiohttp import ClientTimeout, ClientSession

response_code = None
timeout = ClientTimeout(total=2)
async with ClientSession(timeout=timeout) as session:
    try:
        async with session.post(
            url="some url", json=post_payload, headers=headers,
        ) as response:
            response_code = response.status
    except Exception as err:
        logger.error(err)
That part works; however, the request appears not to be canceled when the timeout is reached and the except clause is triggered - I still receive the request on the other end, even though an exception has been raised. I would like the request to be canceled automatically whenever the timeout is reached. Thanks in advance.

Creating an all in one requests function for shorter easier code in python3

I'm trying to create an all-in-one function that handles all my API requests and cuts down on lots of repeated code, especially all of the error handling for different error codes.
I am using a few different files to achieve this: a.py that connects to "api a", b.py that connects to "api b", and api.py that contains the function.
a.py and b.py both start with
from api import *
and use
login_response = post_api_call(api_url_base + login_url, None, login_data).json()
or similar.
api.py contains the code below, but it will be fleshed out with more error handling, retries, etc., which is what I don't want to keep repeating.
import requests
import logging

def post_api_call(url, headers, data):
    try:
        response = requests.post(url, headers=headers, data=data)
        response.raise_for_status()
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
        logging.warning("Http Error: %s", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
        logging.warning("Error Connecting: %s", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)
        logging.warning("Timeout Error: %s", errt)
    # only use the above if you want to retry certain errors;
    # the below catches all of the above if needed.
    except requests.exceptions.RequestException as err:
        print("OOps: Something Else", err)
        logging.warning("OOps: Something Else %s", err)
        # retry certain errors...
    return response
The above works and isn't an issue.
The issue I'm having is that I'm trying not to have different functions for post/get/put etc.; how can I pass the method through as a variable?
The other issue is that some APIs need the data passed as data=data while others only work when I specify json=data. Others need headers while some don't, but if I pass headers=None as a variable I get 405 errors. The only other way around it that I can think of is long nested if statements, which is nearly as bad as the repeated code.
Am I trying to oversimplify this? Is there a better way?
The scripts make a number of API calls (a minimum of 5) to a number of different APIs (currently 3, but expected to grow); they then combine all the received data, compare it to the database, and run any updates against the necessary APIs.
Imports:
from requests import Request, Session
Method:
def api_request(*args, **kwargs):
    if "session" in kwargs and isinstance(kwargs["session"], Session):
        local_session = kwargs["session"]
        del kwargs["session"]
    else:
        local_session = Session()
    req = Request(*args, **kwargs)
    prepared_req = local_session.prepare_request(req)
    try:
        response = local_session.send(prepared_req)
    except:
        # error handling
        pass
    return response
Usage:
sess = Session()
headers = {
    "Accept-Language": "en-US",
    "User-Agent": "test-app"
}
result = api_request("GET", "http://www.google.com", session=sess, headers=headers)
print(result.text)
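Because Request accepts data=, json=, and headers= as keyword arguments, and the wrapper forwards **kwargs untouched, this should also cover the data-vs-json problem from the question without nested if statements. A hypothetical usage sketch (other_url and payload are made-up names; api_url_base, login_url, and login_data come from the question):
# both styles go through the same wrapper
login_response = api_request("POST", api_url_base + login_url, data=login_data)
other_response = api_request("POST", api_url_base + other_url, json=payload)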

How to get the exception string from requests.exceptions.RequestException

I have the below Flask code:
from flask import Flask, request, jsonify
import requests
from werkzeug.exceptions import InternalServerError, NotFound
import sys
import json

app = Flask(__name__)
app.config['SECRET_KEY'] = "Secret!"

class InvalidUsage(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv['message'] = self.message
        rv['status_code'] = self.status_code
        return rv

@app.errorhandler(InvalidUsage)
def handle_invalid_usage(error):
    response = jsonify(error.to_dict())
    response.status_code = error.status_code
    return response

@app.route('/test', methods=["GET", "POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        response = requests.get(url)
        if response.status_code != 200:
            try:
                response.raise_for_status()
            except requests.exceptions.HTTPError:
                status = response.status_code
                print(status)
                raise InvalidUsage("An HTTP exception has been raised", status_code=status)
    except requests.exceptions.RequestException as e:
        print(e)

if __name__ == "__main__":
    app.run(debug=True)
My question is: how do I get the exception string (message) and other relevant params from the requests.exceptions.RequestException object e?
Also, what is the best way to log such exceptions? In the case of an HTTPError exception I have the status code to refer to.
But requests.exceptions.RequestException catches all request exceptions. So how do I differentiate between them, and what is the best way to log them apart from using print statements?
Thanks a lot in advance for any answers.
RequestException is a base class for HTTPError, ConnectionError, Timeout, URLRequired, TooManyRedirects and others (the whole list is available on the GitHub page of the requests module). It seems that the best way of dealing with each error and printing the corresponding information is to handle them starting from the most specific and finishing with the most general one (the base class). This has been discussed widely in the comments of this StackOverflow topic. For your test() method this could be:
@app.route('/test', methods=["GET", "POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        ...  # some code
    except requests.exceptions.ConnectionError as ece:
        print("Connection Error:", ece)
    except requests.exceptions.Timeout as et:
        print("Timeout Error:", et)
    except requests.exceptions.RequestException as e:
        print("Some Ambiguous Exception:", e)
This way you first catch the more specific errors that inherit from the RequestException class.
And considering an alternative to print statements - I'm not sure if that's exactly what you meant, but you can log to the console or to a file with standard Python logging in Flask, or with the logging module itself (here for Python 3).
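A sketch of that logging alternative (the file name and messages here are illustrative):
import logging
import requests

logging.basicConfig(filename="api_errors.log", level=logging.WARNING)

try:
    requests.get("https://httpbin.org/status/404").raise_for_status()
except requests.exceptions.HTTPError:
    # logging.exception records the message plus the full traceback
    logging.exception("HTTP error while calling the API")
except requests.exceptions.RequestException as e:
    logging.warning("Request failed: %s", e)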
This is actually not so much a question about the requests library as a general Python question about how to extract the error string from an exception instance. The answer is relatively straightforward: you convert it to a string by calling str() on the exception instance. Any properly written exception class (in requests or otherwise) implements an __str__() method, so calling str() on an instance yields a useful message. Example below:
import requests

rsp = requests.get('https://httpbin.org/status/404')
try:
    if rsp.status_code >= 400:
        rsp.raise_for_status()
except requests.exceptions.RequestException as e:
    error_str = str(e)
    # log 'error_str' to disk, a database, etc.
    print('The error was:', error_str)
Yes, in this example we print it, but once you have the string you have additional options. Anyway, saving this to test.py results in the following output given your test URL:
$ python3 test.py
The error was: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

Preferred Python style for exception handling

This is a general, best-practice question. Which of the following try-except examples is better (the function itself is a simple wrapper for requests.get()):
def get(self, url, params=params):
    try:
        response = {}
        response = requests.get(url, params=params)
    except requests.ConnectionError as e:
        log.exception(e)
    finally:
        return response
or
def get(self, url, params=params):
    try:
        return requests.get(url, params=params)
    except requests.ConnectionError as e:
        log.exception(e)
        return {}
Or perhaps both are suboptimal? I seem to write these kinds of wrapper functions fairly often for error logging and would like to know the most Pythonic way of doing this. Any advice would be appreciated.
It is better to return nothing on exception; I agree with Mark that there is no need to return anything in that case.
def get(self, url, params=params):
    try:
        return requests.get(url, params=params)
    except requests.ConnectionError as e:
        log.exception(e)

res = get(...)
if res is not None:
    ...  # process the data
# or
if res is None:
    ...  # abort
The second version looks ok to me, but the first one is slightly broken. For example, if the code inside try-except raises anything but ConnectionError, you'll still return {} since returning from finally suppresses any exceptions. And this latter feature is quite confusing (I had to try it myself before answering).
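A tiny demo of that pitfall (plain Python semantics, not from the original answer): a return inside finally discards whatever exception was in flight.
def broken():
    try:
        raise ValueError("not a ConnectionError")
    except KeyError:
        pass  # the ValueError is not caught here
    finally:
        return {}  # returning from finally swallows the in-flight ValueError

print(broken())  # prints {} - the caller never sees the ValueError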
You can also use an else clause with try:
def get(self, url, params=params):
    try:
        ...  # do some dangerous stuff here
    except requests.ConnectionError as e:
        ...  # handle the exception
    else:  # if nothing happened
        ...  # do some safe stuff here
        return some_result
    finally:
        ...  # do some mandatory stuff
This allows defining the exception scope more precisely.
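A concrete version of that skeleton for the wrapper in the question might look like this (a sketch; log is the question's logger, and the log.debug call in finally is just an example of "mandatory stuff"):
def get(self, url, params=None):
    try:
        response = requests.get(url, params=params)
    except requests.ConnectionError as e:
        log.exception(e)
        return {}
    else:
        # only runs if no exception was raised
        return response
    finally:
        # runs no matter what happened above
        log.debug("GET %s finished", url)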
The second seems clearer to me.
The first version is a little confusing. At first I thought it was an error that you were assigning to the same variable twice. It was only after some thought that I understood why it works.
I'd probably look at writing a context manager.
from contextlib import contextmanager

@contextmanager
def get(url, params=params):
    try:
        yield requests.get(url, params=params)
    except requests.ConnectionError as e:
        log.exception(e)
        yield {}
    except:
        raise  # anything else stays an exception
Then:
with get(...) as res:
    print(res)  # will be the actual response or an empty dict
