I use requests.post(url, headers, timeout=10) and sometimes I receive a ReadTimeout exception:
HTTPSConnectionPool(host='domain.com', port=443): Read timed out. (read timeout=10)
Since I already set the timeout to 10 seconds, why am I still receiving a ReadTimeout exception?
Per https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts, that is the expected behavior. As royhowie mentioned, wrap it in a try/except block
e.g.:

try:
    requests.post(url, headers=headers, timeout=10)
except requests.exceptions.Timeout:
    print("Timeout occurred")
You can wrap it in a try/except block like this. Since you asked only about ReadTimeout:

try:
    # your request goes here
    requests.post(url, headers=headers, timeout=10)
except requests.exceptions.ReadTimeout:
    # Set up for a retry, or continue in a retry loop
    pass

Otherwise, catch all of the requests exceptions:

try:
    # your request goes here
    requests.post(url, headers=headers, timeout=10)
except requests.exceptions.RequestException:
    # Set up for a retry, or continue in a retry loop
    pass
Another thing you can try: at the end of your code block, include the following:

time.sleep(2)

This worked for me. The delay (the argument is in seconds) slows things down, but it might help overcome the issue you're having.
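A minimal sketch of how that delay could be combined with the try/except retry above (the three-attempt limit and the url/headers variables are assumptions for illustration):

import time
import requests

for attempt in range(3):
    try:
        response = requests.post(url, headers=headers, timeout=10)
        break  # request succeeded, stop retrying
    except requests.exceptions.ReadTimeout:
        print("Timeout occurred, retrying...")
        time.sleep(2)  # give the connection a moment before the next attempt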
(Screenshots in the original question show examples of 522, 525, and 504 errors when visiting the webpage manually.)
I am running the following for loop, which goes through a dictionary of subreddits (keys) and URLs (values). The URLs produce a dictionary with all posts from 2022 for a given subreddit. Sometimes the for loop stops and produces an 'HTTP Error 525' or other errors.
I'm wondering how I can check for these errors when reading the URL and then try again until the error is not given before moving to the next subreddit.
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    page = urllib.request.urlopen(url).read()
    dict_last_posts[subredd] = page
I haven't been able to figure it out.
You can put this code in a try/except block like this:
import urllib.request
import urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    while True:
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # exit the while loop if the request succeeded
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print("Encountered HTTP error while reading URL. Retrying...")
            else:
                raise  # re-raise the exception if it's a different error
This code will catch any HTTPError that occurs while reading the URL and check whether the error code is 504, 522, or 525. If it is, it will print a message and try reading the URL again. If it's a different error, it will re-raise the exception so that you can handle it appropriately.
NOTE: This code will retry reading the URL indefinitely until it succeeds or a different error occurs. You may want to add a counter or a timeout to prevent the loop from going on forever in case the error persists.
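For example, a minimal sketch of a bounded retry using a counter (the limit of five attempts is an assumption):

import urllib.request
import urllib.error

MAX_RETRIES = 5  # assumed limit; adjust as needed

for subredd, url in dict_last_subreddit_posts.items():
    for attempt in range(MAX_RETRIES):
        try:
            dict_last_posts[subredd] = urllib.request.urlopen(url).read()
            break  # success, move on to the next subreddit
        except urllib.error.HTTPError as e:
            if e.code not in (504, 522, 525):
                raise  # a different error, re-raise it
            print(f"HTTP {e.code} for {subredd}, retry {attempt + 1}/{MAX_RETRIES}")
    else:
        print(f"Giving up on {subredd} after {MAX_RETRIES} attempts")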
It's unwise to retry a request indefinitely. Set a limit even if it's very high, but don't set it so high that it causes you to be rate limited (HTTP status 429). The backoff_factor will also have an impact on rate limiting.
Use the requests package for this. This makes it very easy to set a custom adapter for all of your requests via Session, and it includes Retry from urllib3 which takes care of retry behavior in an object you can pass to your adapter.
import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[504, 522, 525]
)
s.mount('https://', HTTPAdapter(max_retries=retries))

for subredd, url in dict_last_subreddit_posts.items():
    response = s.get(url)
    dict_last_posts[subredd] = response.content
You can play around with total (maximum number of retries) and backoff_factor (adjusts wait time between retries) to get the behavior you want.
Try something like this:
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    http_response = urllib.request.urlopen(url)
    while http_response.status != 200:
        if http_response.status == 503:
            http_response = urllib.request.urlopen(url)
        elif http_response.status == 523:
            pass  # enter code here
        else:
            pass  # enter code here
    dict_last_posts[subredd] = http_response.read()
But Michael Ruth's answer is better.
I am pulling data down from an API that has a limit of 250 records per call. There are a total of 100,000 records I need to pull down, 250 at a time. I run my application leveraging the get_stats function below. It works fine for a while, but when my Wi-Fi drops while I am in the middle of the GET request, the request hangs and I won't get an exception back, causing the rest of the application to hang as well.
I have tested turning off my Wi-Fi when the function is NOT in the middle of the GET request, and it does return the ConnectionError exception.
How do I go about handling the situation where my app is in the middle of the GET request and my Wi-Fi drops? I am thinking I need to do a timeout to give my Wi-Fi time to reconnect and then retry, but how do I go about doing that? Or is there another way?
import json
import requests

def get_stats(url, version):
    headers = {
        "API_version": version,
        "API_token": "token"
    }
    try:
        r = requests.get(url, headers=headers)
        print(f"Status code: 200")
        return json.loads(r.text)
    except requests.exceptions.Timeout:
        # Maybe set up for a retry, or continue in a retry loop
        print("Error here in timeout")
    except requests.exceptions.TooManyRedirects:
        # Tell the user their URL was bad and try a different one
        print("Redirect errors here")
    except requests.exceptions.ConnectionError as r:
        print("Connection error")
        r = "Connection Error"
        return r
    except requests.exceptions.RequestException as e:
        # catastrophic error. bail.
        print("System errors here")
        raise SystemExit(e)
To set a timeout on the request, call requests.get like this:
r = requests.get(url, headers=headers, timeout=10)
The end goal is to get the data, so just make the call again, possibly with a sleep after failing.
Edit: I would say that the timeout is the sleep.
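A minimal sketch of what that retry-with-timeout could look like (the three-attempt limit, the 10-second timeout, and the 5-second pause are assumptions):

import time
import requests

def get_stats(url, version, retries=3):
    headers = {"API_version": version, "API_token": "token"}
    for attempt in range(retries):
        try:
            r = requests.get(url, headers=headers, timeout=10)
            return r.json()
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError):
            # The Wi-Fi may have dropped; give it a moment to reconnect
            time.sleep(5)
    return None  # all attempts failed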
I am using the following code to resolve redirects and return a link's final URL:
def resolve_redirects(url):
    return urllib2.urlopen(url).geturl()
Unfortunately I sometimes get HTTPError: HTTP Error 429: Too Many Requests. What is a good way to combat this? Is the following good, or is there a better way?
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        return urllib2.urlopen(url).geturl()
Also, what would happen if there is an exception in the except block?
It would be better to make sure the HTTP code is actually 429 before re-trying.
That can be done like this:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError as e:
        if e.code == 429:
            time.sleep(5)
            return resolve_redirects(url)
        raise
This will also allow arbitrary numbers of retries (which may or may not be desired).
https://docs.python.org/2/howto/urllib2.html#httperror
This is a fine way to handle the exception, though you should check to make sure you are always sleeping for the appropriate amount of time between requests for the given website (for example, Twitter limits the number of requests per minute and clearly shows this limit in their API documentation). So just make sure you're always sleeping long enough.
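Many rate-limited APIs also include a Retry-After header in the 429 response that tells you how long to wait. A minimal sketch that honors it when present (written with Python 3's urllib.request; the 5-second fallback and the assumption that Retry-After is given in seconds are mine):

import time
import urllib.request
from urllib.error import HTTPError

def resolve_redirects(url):
    try:
        return urllib.request.urlopen(url).geturl()
    except HTTPError as e:
        if e.code != 429:
            raise
        # Sleep for the server-suggested time if given, else 5 seconds
        wait = int(e.headers.get("Retry-After", 5))
        time.sleep(wait)
        return resolve_redirects(url)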
To recover from an exception within an exception, you can simply embed another try/catch block:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        try:
            return urllib2.urlopen(url).geturl()
        except HTTPError:
            return "Failed twice :S"
Edit: as @jesse-w-at-z points out, you should be returning a URL in the second error case; the code I posted is just a reference example of how to write a nested try/except.
Adding a User-Agent to the request header solved my issue:
from urllib import request
from urllib.request import urlopen
url = 'https://www.example.com/abc.json'
req = request.Request(url)
req.add_header('User-Agent', 'abc-bot')
response = request.urlopen(req)
So I want to check if a URL is reachable from Python, and I got this code from Googling:
import http.client
from urllib.parse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function; the resp.status is 302. However, if I add port 443 at the end of it, https://eurotableau.nomisonline.com:443, it returns False and the resp.status is 400. I tried both URLs in Google Chrome, and both of them work. So my question is: why is this happening? Is there any way I can include the port value and still get a valid resp.status value (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
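A minimal sketch of the same helper using HTTPSConnection (picking the connection class from the URL scheme is an addition for illustration):

import http.client
from urllib.parse import urlparse

def checkUrl(url):
    p = urlparse(url)
    # Pick the connection class that matches the scheme
    if p.scheme == 'https':
        conn = http.client.HTTPSConnection(p.netloc)
    else:
        conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path or '/')
    resp = conn.getresponse()
    conn.close()
    return resp.status < 400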
If you do not require the HEAD method but just wish to check whether the host is available, then why not do:
from urllib2 import urlopen

try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else:
        print "Server is on vacations or was never there!"
    print e
This will establish a connection with the server, but it won't download any data unless you read it. It'll only read a few KB to get the headers (like when using the HEAD method) and wait for you to request more, but you will close it there.
So, you can catch an exception and see what the problem is, or if there is no exception, just close the connection.
urllib2 will handle HTTPS and protocol://user@URL:PORT for you neatly.
No worries about anything.
Suppose I have a code snippet like the following:
r = requests.post(url, data=values, files=files)
Since this is making a network request, a bunch of exceptions can be thrown from this line. For completeness of the argument, I could also have file reads, sending emails, etc. To account for such errors, I do:
try:
    r = requests.post(url, data=values, files=files)
    if r.status_code != 200:
        raise Exception("Could not post to " + url)
except Exception as e:
    logger.error("Error posting to " + url)
There are two problems which I see with this approach.
I have just handled a generic exception and don't know what exact exception would be raised by this line. What is the best way to find it in Python?
This makes the code look ugly, which is non-pythonic but fine, as long as it's robust and handles all the cases.
I am wondering what would be the best way to handle exceptions in Python.
The best way to write try-except -- in Python or anywhere else -- is as narrow as possible. It's a common problem to catch more exceptions than you meant to handle!
In particular, at a minimum, I'd re-write your example code as something like:
try:
    r = requests.post(url, data=values, files=files)
except Exception as e:
    logger.error("Error posting to %r: %s" % (url, e))
    raise
else:
    if r.status_code != 200:
        logger.error("Could not post to %r: HTTP code %s" % (url, r.status_code))
        raise RuntimeError("HTTP code %s trying to post to %r" % (r.status_code, url))
This embodies several best-practices, such as: detailed error messages, always re-raise exceptions you don't know how to specifically handle (after logging error messages with more details as well as the exception), never raise something as generic as Exception, &c -- and, crucially, catch exceptions only on the narrowest part of code you possibly can, that's what the else: clause in try/except is for!-)
If and when you do expect -- and know how to handle -- specific exceptions, so much the better -- you put other except ThisSpecificProblem as e: clauses before the generic except Exception clause which logs and re-raises. But (from the Zen of Python -- import this at a Python interpreter prompt!) -- "Errors should never pass silently. // Unless explicitly silenced."... and you should only "explicitly silence" errors you fully expect, and fully know how to handle!
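For instance, a minimal sketch of that structure with one specific exception handled before the generic log-and-re-raise (treating a Timeout as the "expected" problem is an illustrative assumption; url, values, files, and logger come from the question):

import requests

try:
    r = requests.post(url, data=values, files=files, timeout=10)
except requests.exceptions.Timeout:
    # An exception we expect and know how to handle: schedule a retry,
    # fall back to a cached value, etc.
    logger.warning("Timed out posting to %r, will retry later", url)
except Exception as e:
    # Anything else: log with details and re-raise.
    logger.error("Error posting to %r: %s", url, e)
    raise
else:
    if r.status_code != 200:
        raise RuntimeError("HTTP code %s trying to post to %r" % (r.status_code, url))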
I have just handled a generic exception and don't know what exact exception would be raised by this line, what is the best way to find it in python.
As always, the answer is to look at the documentation:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
In the rare event of an invalid HTTP response, Requests will raise an HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
Code that raises exceptions (especially if there are custom exceptions) is documented. You can also have a look at the source if the documentation is not explicit.
Your code is fine, except you should avoid generic except clauses, as these can hide other problems with your code. You should catch those exceptions that you can predict, and then let the others "rise up" until caught/logged.
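A minimal sketch of that, catching only the documented requests exceptions and letting anything else propagate (url, values, files, and logger are assumed from the question):

import requests

try:
    r = requests.post(url, data=values, files=files)
    r.raise_for_status()  # turn 4xx/5xx responses into an HTTPError
except requests.exceptions.ConnectionError as e:
    logger.error("Network problem posting to %s: %s", url, e)
except requests.exceptions.Timeout:
    logger.error("Request to %s timed out", url)
except requests.exceptions.HTTPError as e:
    logger.error("Bad HTTP response from %s: %s", url, e)
# Any other exception propagates up to be handled/logged elsewhere.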
Well, answering your first question (what exact exception would be raised by this line), you are one step away.
You already call except Exception as e, but you don't use e anywhere. e contains the information about your exception, so just add a little print statement:
print e
And it works:
>>> try:
...     x = int(raw_input('Input: '))
... except Exception as e:
...     print e
...
Input: 5t
invalid literal for int() with base 10: '5t'
>>>
I don't exactly see what you're asking in the second one: you say it is ugly/non-pythonic, but then you say it is fine. Yes, it is fine, and it is also quite pythonic, in my opinion.
You should avoid using except Exception as e: as much as possible.
For clarity, you can create a custom exception class which takes care of your non-200 status code scenario.
class PostingError(Exception):
    pass
And then raise PostingError only, and try catching this error only. By catching all kinds of errors, you might be catching the wrong information. For example, even a memory error might be caught and reported as an "Error posting to URL".
So this is how it would finally look:
try:
    r = requests.post(url, data=values, files=files)
    if r.status_code != 200:
        raise PostingError("Could not post to " + url)
except PostingError as e:
    logger.error(e)