Python Requests Module - API Calls - python

I've written a Django web project that makes some API calls. I'd like to build in a mechanism to handle slow and failed calls. Specifically, I'd like to try the API call up to three times with increasing timeouts, breaking out of the loop as soon as a request succeeds. What is a good way to handle this, or is what I've put together acceptable? Below is the code I have in place now.
for x in [0.5, 1, 5]:
    try:
        r = requests.get(api_url, headers=headers, timeout=x)
        break
    except:
        pass

You can use the exceptions provided by requests itself to handle failed API calls. For example, requests raises a ConnectionError exception when a network problem occurs. Refer to this SO post for more details. I am not pasting a link to the requests docs or explaining every exception in detail, since the SO post already answers your question. An example code segment is given below:
try:
    r = requests.get(url, params={'key': 'value'})
except requests.exceptions.ConnectionError as e:
    print(e)

This outlines the procedure I'm talking about. A single API request could end up being a little flaky.
migrateup.com/making-unreliable-apis-reliable-with-python/#
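
As a rough sketch of the retry loop discussed in this thread (not taken from any of the answers), the helper below tries a call once per timeout value and re-raises the last error if every attempt fails, instead of swallowing it with a bare except. The name call_with_escalating_timeouts and its parameters are hypothetical; fetch is any callable that accepts a timeout in seconds, so the helper itself does not depend on requests.

```python
import time


def call_with_escalating_timeouts(fetch, timeouts=(0.5, 1, 5),
                                  retry_on=(Exception,), pause=0.0):
    """Call fetch(timeout) once per timeout value until one call succeeds.

    fetch    -- callable taking a timeout in seconds and returning a result
    retry_on -- exception types that should trigger another attempt
    pause    -- seconds to sleep between attempts
    """
    timeouts = list(timeouts)
    last_error = None
    for index, timeout in enumerate(timeouts):
        try:
            return fetch(timeout)
        except retry_on as exc:
            last_error = exc
            # Only pause if another attempt is still to come.
            if index < len(timeouts) - 1:
                time.sleep(pause)
    if last_error is None:
        raise ValueError("timeouts must contain at least one value")
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error
```

With requests this might be called as call_with_escalating_timeouts(lambda t: requests.get(api_url, headers=headers, timeout=t), retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError)), where api_url and headers are the variables from the question.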

Related

Python Request Package Close Connection Method Does not Work

This is the first time I am working with a REST API in a Jupyter notebook, and I don't know what I am doing wrong. When I execute the following code in a cell, the cell runs forever without throwing any errors. At first I did not call the close method, but then I thought the problem might be the open connection. However, adding the close call did not help either. Do you know what could be the reason?
api_key = "exampletoken"
header = {'authorization':"Bearer {}".format(api_key)}
payload = {}
r = request.post('exampleurl', headers = header, data = payload)
r.close()
Thanks in advance!
"runs forever without throwing any errors."
By default, requests does not time out, so it can wait indefinitely. That could cause the behavior you describe and would mean the server did not respond. To find out whether that is the cause, set a timeout, for example:
r = request.post('exampleurl', headers = header, data = payload, timeout=180)
This raises an exception if the server does not respond within 180 seconds (i.e. 3 minutes). If you want to know more about timeouts in requests, I suggest reading the realpython.com tutorial.
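
To see the difference a timeout makes without depending on a real API, here is a minimal sketch that uses only the standard library (http.client in place of requests, which is a substitution made so that it runs anywhere). The helper names open_silent_listener and fetch_status are hypothetical. The listener completes TCP handshakes but never sends a response, which is the situation that would make a call without a timeout block forever.

```python
import http.client
import socket


def open_silent_listener():
    """Bind a local socket that queues connections but never answers them."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)               # nothing ever calls accept() or replies
    return listener, listener.getsockname()[1]


def fetch_status(host, port, timeout):
    """Return the HTTP status, or None if no response arrives in time."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except socket.timeout:
        # The server stayed silent for longer than `timeout` seconds.
        return None
    finally:
        conn.close()
```

Calling fetch_status("127.0.0.1", port, timeout=0.5) against the silent listener gives up after about half a second instead of hanging.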

delay between requests using github3 in python

I'm using the Python github3 module and I need to set a delay between requests to the GitHub API, because my app puts too much load on the server.
I'm doing things such as:
git = github3.GitHub()
for i in itertools.chain(git.all_repositories(), git.repositories(type='private')):
    do things
I found that the GitHub class uses requests to make requests to the GitHub API:
https://github.com/sigmavirus24/github3.py/blob/3e251f2a066df3c8da7ce0b56d24befcf5eb2d4b/github3/models.py#L233
But I can't figure out what parameter I should pass or what attribute I should change to set a delay between the requests.
Can you advise?
github3.py presently has no options to enforce delays between requests. That said, there is a way to get the request metadata which includes the number of requests you have left in your ratelimit as well as when that ratelimit should reset. I suggest you use git.rate_limit()['resources']['core'] to determine what delays you should set for yourself inside your own loop.
I use the following function when I expect to exceed my query limit:
import logging
import time

logger = logging.getLogger(__name__)

def wait_for_karma(gh, min_karma=25, msg=None):
    while gh:
        core = gh.rate_limit()['resources']['core']
        if core['remaining'] < min_karma:
            # Too few requests left: sleep until the rate limit resets.
            now = time.time()
            nap = max(core['reset'] - now, 0.1)
            logger.info("napping for %s seconds", nap)
            if msg:
                logger.info(msg)
            time.sleep(nap)
        else:
            break
I'll call it before making a call that I believe is "big" (i.e. could require multiple API calls to satisfy). Based on your code sample, you may want to do this at the bottom of your loop:
git = github3.GitHub()
for i in itertools.chain(git.all_repositories(), git.repositories(type='private')):
    do_things()
    wait_for_karma(git, msg="pausing")
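
If a fixed pause is enough, a small generator can space out the iteration itself. This is a sketch under assumptions: throttled and its parameters are hypothetical names, not part of github3.py, and it pauses before fetching every item rather than before every HTTP request, so it is coarser than a true per-request delay.

```python
import time


def throttled(iterable, min_interval, clock=time.monotonic, sleep=time.sleep):
    """Yield items from iterable, fetching them at least min_interval seconds apart.

    clock and sleep are injectable so the pacing can be tested without waiting.
    """
    iterator = iter(iterable)
    last_fetch = None
    while True:
        if last_fetch is not None:
            # Sleep off whatever is left of the interval since the last fetch.
            wait = min_interval - (clock() - last_fetch)
            if wait > 0:
                sleep(wait)
        last_fetch = clock()
        try:
            item = next(iterator)  # a lazy iterator may hit the API here
        except StopIteration:
            return
        yield item
```

In the loop above, that would look like for i in throttled(git.all_repositories(), 1.0): do_things().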

Why does requests.get() not raise when the server can't be found?

In the following code snippet, I know for a fact that https://asdasdasdasd.vm:8080/v2/api-docs does not exist. It fails a DNS lookup. Unfortunately, the get() never seems to return, raise, or time out. I would expect A C D or A B D, but I only ever see A in the logs.
try:
    sys.stderr.write("A")
    resp = requests.get("https://asdasdasdasd.vm:8080/v2/api-docs", timeout=1.0)
    sys.stderr.write("B")
except:
    sys.stderr.write("C")
sys.stderr.write("D")
sys.stderr.flush()
return swag
(That URL is not sanitized for this post. That's actually the URL I'm trying to use while working on this question.)
What am I missing here?
EDIT - I have also tried specifying the timeout as (1.0,1.0) but the behavior did not change.
EDIT2 - Per suggestions below, I ran my code from the python and ipython consoles. The code behaves as I expect (ACD). Of course, in my real application, I am not running this code from the command line. I don't know how this matters, but the method containing the code is being invoked by a web service. Specifically, a Swagger endpoint. With my browser, I hit an endpoint that's supposed to return our Swagger documentation. The endpoint (which uses flask_swagger) invokes init_swagger(...). init_swagger() calls my method with a Swagger object. That's it. How this matters, I cannot say. It doesn't make any sense to me that something outside of my method should somehow be able to mess with my exception handling.
The only thing I can think of is that Swagger has jacked with the requests class. But now it is dinner time and I am going home.
For me, the following code prints A, C, D:
import requests
from requests.exceptions import ConnectionError
try:
    print("A")
    resp = requests.get("https://asdasdasdasd.vm:8080/v2/api-docs", timeout=1.0)
    print("B")
except ConnectionError:
    print("C")
print("D")
This is because the host cannot be resolved for me, if I swap it out for localhost...
resp = requests.get("http://localhost/v2/api-docs", timeout=1.0)
...then I see an A, followed by a period of time before C and D show.
From reading the comments, I think I know what is going on.
builtins has a ConnectionError that can be used without importing anything. Requests doesn't raise that exception; it raises the one found in requests.exceptions. If you wish to catch the ConnectionError, you must catch the correct class, or the exception will propagate past your except clause without executing it.
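
The name clash can be reproduced without a network. In this sketch, RequestException and LibConnectionError are stand-in classes that mirror how requests.exceptions.RequestException derives from IOError and how requests.exceptions.ConnectionError derives from RequestException; they are not the real requests classes, and which_handler_runs is a hypothetical helper.

```python
class RequestException(IOError):
    """Stand-in mirroring requests.exceptions.RequestException."""


class LibConnectionError(RequestException):
    """Stand-in mirroring requests.exceptions.ConnectionError."""


def which_handler_runs(exc):
    """Raise exc and report which except clause catches it."""
    try:
        raise exc
    except ConnectionError:  # the builtin class, unrelated to the stand-in
        return "builtin ConnectionError"
    except RequestException:
        return "library RequestException"


# The library-style error skips the builtin handler entirely.
print(which_handler_runs(LibConnectionError("DNS lookup failed")))
# A genuine builtin subclass is caught by the first clause.
print(which_handler_runs(ConnectionResetError("peer reset")))
```

Both classes happen to share the short name ConnectionError in real code, which is why importing the one from requests.exceptions (or catching requests.exceptions.RequestException) matters.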

Retry loading page on timeout with urllib2?

I am trying to force Python to retry loading the page when I get a timeout error. Is there a way that I can make it retry a specific number of times, possibly after a specific time delay?
Any help would be appreciated.
Thank you.
urllib2 doesn't have anything built-in for that, but you can write it yourself.
The tricky part is that, as the urlopen docs say, no matter what goes wrong, you just get a URLError. So, how do you know whether it was a timeout, or something else?
Well, if you look up URLError, it says it will have a reason which will be a socket.error for remote URLs. And if you look up socket.error it tells you that it's a subclass of either IOError or OSError (depending on your Python version). And if you look up OSError, it tells you that it has an errno that represents the underlying error.
So, which errno value do you get for timeout? I'm willing to bet it's EINPROGRESS, but let's find out for sure:
>>> urllib2.urlopen('http://127.0.0.1', timeout=0)
urllib2.URLError: <urlopen error [Errno 36] Operation now in progress>
>>> errno.errorcode[36]
'EINPROGRESS'
(You could just use the number 36, but that's not guaranteed to be the same across platforms; errno.EINPROGRESS should be more portable.)
So:
import errno
import urllib2
def retrying_urlopen(retries, *args, **kwargs):
    for i in range(retries):
        try:
            return urllib2.urlopen(*args, **kwargs)
        except urllib2.URLError as e:
            # Retry only on the timeout errno; re-raise anything else.
            if e.reason.errno == errno.EINPROGRESS:
                continue
            raise
If you think this sucks and should be a lot less clunky… well, I think everyone agrees. Exceptions have been radically improved twice, with another big one coming up, plus various small changes along the way. But if you stick with 2.7, you don't get the benefits of those improvements.
If moving to Python 3.4 isn't possible, maybe moving to a third-party module like requests or urllib3 is. Both of those libraries have a separate exception type for Timeout, instead of making you grub through the details of a generic URLError.
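
For readers who can move to Python 3, here is a rough port of the same idea using only the standard library. It is a sketch, not the answer's code: is_timeout, the opener parameter, and the delay parameter are hypothetical additions. It assumes a timeout can surface either directly as socket.timeout or wrapped as the reason of a URLError, and checks for both.

```python
import socket
import time
import urllib.error
import urllib.request


def is_timeout(exc):
    """True if exc is a timeout, raised directly or wrapped in a URLError."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.timeout))


def retrying_urlopen(url, retries=3, delay=1.0,
                     opener=urllib.request.urlopen, **kwargs):
    """Open url, retrying up to `retries` times when the attempt times out."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return opener(url, **kwargs)
        except OSError as exc:  # URLError and socket.timeout are both OSErrors
            # Give up on non-timeout errors and on the final attempt.
            if not is_timeout(exc) or attempt == retries:
                raise
            time.sleep(delay)
```

A call such as retrying_urlopen(url, retries=4, delay=5, timeout=10) forwards timeout=10 to urlopen through kwargs.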
Check out the requests library. If you'd like to wait only for a specified amount of time (not for the entire download, just until you get a response from the server), just add the timeout argument to the standard URL request, in seconds:
r = requests.get(url, timeout=10)
If the timeout time is exceeded, it raises a requests.exceptions.Timeout exception, which can be handled however you wish. As an example, you could put the request in a try/except block, catch the exception if it's raised, and retry the connection again for a specified number of times before failing completely.
You might also want to check out requests.adapters.HTTPAdapter, which has a max_retries argument. It's typically used within a Requests Session, and according to the docs, it provides a general-case interface for Requests sessions to contact HTTP and HTTPS urls by implementing the Transport Adapter interface.
I am new to Python too, but I think even a simple solution like this could do the trick.
Begin with stuff set to None, where stuff is the page source. Note that I have only handled the URLError exception; you might want to add more as needed.
import urllib2
import time
stuff = None
max_attempts = 4
r = 0
while stuff is None and r < max_attempts:
    try:
        response = urllib2.urlopen('http://www.google.com/ncr', timeout=10)
        stuff = response.read()
    except urllib2.URLError:
        r = r + 1
        print "Re-trying, attempt -- ", r
        time.sleep(5)
print stuff
Hope that helps.
Regards,
Md. Mohsin

Recovering from ECONNRESET in Python/Mechanize

I've got a large bulk downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:
I'd really rather not wrap every single outbound web call in a try/catch block.
Even if I were to do so, there's trouble with knowing how to handle the errors once the exception has thrown. If the code is just
data = browser.response().read()
then I know precisely how to deal with it, namely:
data = None
while data is None:
    try:
        data = browser.response().read()
    except IOError as e:
        if e.args[1].args[0].errno != errno.ECONNRESET:
            raise
        data = None
but if it's just a random instance of
browser.follow_link(link)
then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?
EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.
Perhaps place the try..except block higher up in the chain of command:
import collections
import errno

def download_file(url):
    # Bundle together the bunch of browser calls necessary to download one file.
    browser.follow_link(...)
    ...
    response = browser.response()
    data = response.read()

urls = collections.deque(urls)
while urls:
    url = urls.popleft()
    try:
        download_file(url)
    except IOError as err:
        if err.args[1].args[0].errno != errno.ECONNRESET:
            raise
        else:
            # if ECONNRESET error, add the url back to urls to try again later
            urls.append(url)
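
One thing the loop above does not guard against is a URL that resets every time, which would be re-queued forever. A possible variation, sketched here in Python 3 under assumptions, caps the attempts per URL and collects the ones that never succeed. The names download_all and max_attempts are hypothetical, and the sketch assumes the error arrives as a plain OSError carrying errno, which is simpler than the nested err.args[1].args[0].errno that Mechanize produced in the answer.

```python
import collections
import errno


def download_all(urls, download_file, max_attempts=3):
    """Try each URL, re-queueing it after ECONNRESET up to max_attempts times.

    Returns the list of URLs that still failed after max_attempts tries.
    """
    queue = collections.deque((url, 1) for url in urls)
    failed = []
    while queue:
        url, attempt = queue.popleft()
        try:
            download_file(url)
        except OSError as err:
            if err.errno != errno.ECONNRESET:
                raise  # not a connection reset: let the caller see it
            if attempt < max_attempts:
                # Put the URL at the back of the queue to try again later.
                queue.append((url, attempt + 1))
            else:
                failed.append(url)
    return failed
```

Here download_file is the same kind of function as in the answer: one callable that bundles all browser calls needed for a single file.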
