I am writing a small Python app which uses requests to get and post data to an HTML page.
The problem I am having is that if I can't reach the HTML page, the code stops with a "max retries exceeded" error. I want to be able to do some things if I can't reach the server.
Is such a thing possible?
Here is some sample code:
import requests

url = "http://127.0.0.1/"
req = requests.get(url)
if req.status_code == 304:
    pass  # do something
elif req.status_code == 404:
    pass  # do something else
# etc etc
# code here if the server can't be reached for whatever reason
You want to handle the exception requests.exceptions.ConnectionError, like so:
try:
    req = requests.get(url)
except requests.exceptions.ConnectionError as e:
    pass  # Do stuff here
You may also want to set a suitable timeout so the request does not hang too long before ConnectionError (or Timeout) is raised:
url = "http://www.stackoverflow.com"
try:
req = requests.get(url, timeout=2) #2 seconds timeout
except requests.exceptions.ConnectionError as e:
# Couldn't connect
See this answer if you want to change the number of retries.
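For reference, a minimal sketch of capping the retry count yourself, assuming a Session is acceptable here (max_retries accepts either an integer or a urllib3 Retry object; the value 3 is an arbitrary choice):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Retry each failed connection at most 3 times before raising ConnectionError
session.mount("http://", HTTPAdapter(max_retries=3))
session.mount("https://", HTTPAdapter(max_retries=3))

try:
    req = session.get("http://127.0.0.1/", timeout=2)
except requests.exceptions.ConnectionError:
    print("Server could not be reached")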
[Screenshots: examples of 522, 525, and 504 errors seen when loading the URLs manually]
I am running the following for loop, which goes through a dictionary of subreddits (keys) and URLs (values). Each URL returns a dictionary with all posts from 2022 for the given subreddit. Sometimes the for loop stops and produces an 'HTTP error 525' or other errors.
I'm wondering how I can check for these errors when reading the URL and then retry until the error is no longer raised before moving on to the next subreddit.
import urllib.request

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    page = urllib.request.urlopen(url).read()
    dict_last_posts[subredd] = page
I haven't been able to figure it out.
You can put this code in a try/except block, like this:
import urllib.request, urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    while True:
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # exit the while loop if the request succeeded
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print("Encountered HTTP error while reading URL. Retrying...")
            else:
                raise  # re-raise the exception if it's a different error
This code will catch any HTTPError that occurs while reading the URL and check whether the error code is 504, 522, or 525. If it is, it will print a message and try reading the URL again. If it's a different error, it will re-raise the exception so that you can handle it appropriately.
NOTE: This code will retry reading the URL indefinitely until it succeeds or a different error occurs. You may want to add a counter or a timeout to prevent the loop from going on forever in case the error persists.
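For example, a minimal sketch of such a retry limit (max_attempts is an arbitrary value added for illustration):

import urllib.request
import urllib.error

max_attempts = 5  # arbitrary upper bound on retries per subreddit

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    for attempt in range(max_attempts):
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # success, stop retrying
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print(f"HTTP {e.code} on attempt {attempt + 1}, retrying...")
            else:
                raise  # a different error, re-raise it
    else:
        print(f"Giving up on {subredd} after {max_attempts} attempts")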
It's unwise to retry a request indefinitely. Set a limit even if it's very high, but don't set it so high that it causes you to be rate limited (HTTP status 429). The backoff_factor will also have an impact on rate limiting.
Use the requests package for this. It makes it very easy to set a custom adapter for all of your requests via Session, and it includes Retry from urllib3, which encapsulates the retry behavior in an object you can pass to your adapter.
import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[504, 522, 525]
)
s.mount('https://', HTTPAdapter(max_retries=retries))
for subredd, url in dict_last_subreddit_posts.items():
    response = s.get(url)
    dict_last_posts[subredd] = response.content
You can play around with total (maximum number of retries) and backoff_factor (adjusts wait time between retries) to get the behavior you want.
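Note that the adapter above is mounted only for https:// URLs. If any of your subreddit URLs use plain http://, you would presumably want to mount the same adapter for that scheme too (a small assumption on my part, not something the original answer states):

# Reuse the same retry configuration for plain-HTTP URLs as well
s.mount('http://', HTTPAdapter(max_retries=retries))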
Try something like this:
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    http_response = urllib.request.urlopen(url)
    while http_response.status != 200:
        if http_response.status == 503:
            http_response = urllib.request.urlopen(url)
        elif http_response.status == 523:
            pass  # enter code here
        else:
            pass  # enter code here
    dict_last_posts[subredd] = http_response.read()
But Michael Ruth's answer is better.
I am pulling data down from an API that has a limit of 250 records per call. There are a total of 100,000 records I need to pull down, 250 at a time. I run my application using the get_stats function below. It works fine for a while, but when my Wi-Fi drops in the middle of a get request, the request hangs and I don't get an exception back, which causes the rest of the application to hang as well.
I have tested turning off my Wi-Fi when the function is NOT in the middle of a get request, and it does return the ConnectionError exception.
How do I go about handling the situation where my app is in the middle of a get request and my Wi-Fi drops? I am thinking I need to use a timeout to give my Wi-Fi time to reconnect and then retry, but how do I go about doing that? Or is there another way?
import json
import requests

def get_stats(url, version):
    headers = {
        "API_version": version,
        "API_token": "token"
    }
    try:
        r = requests.get(url, headers=headers)
        print(f"Status code: 200")
        return json.loads(r.text)
    except requests.exceptions.Timeout:
        # Maybe set up for a retry, or continue in a retry loop
        print("Error here in timeout")
    except requests.exceptions.TooManyRedirects:
        # Tell the user their URL was bad and try a different one
        print("Redirect errors here")
    except requests.exceptions.ConnectionError as r:
        print("Connection error")
        r = "Connection Error"
        return r
    except requests.exceptions.RequestException as e:
        # catastrophic error. bail.
        print("System errors here")
        raise SystemExit(e)
To set a timeout on the request, call requests.get like this
r = requests.get(url, headers=headers, timeout=10)
The end goal is to get the data, so just make the call again, possibly with a sleep after a failure.
Edit: I would say that the timeout is the sleep.
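A minimal sketch of what that could look like, assuming a fixed number of attempts is acceptable (get_with_retries, max_attempts, and the timeout value are illustrative additions, not part of the original code):

import requests

def get_with_retries(url, headers, max_attempts=3, timeout=10):
    """Retry the GET a few times if the connection drops or times out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return requests.get(url, headers=headers, timeout=timeout)
        except (requests.exceptions.ConnectionError,
                requests.exceptions.Timeout):
            print(f"Attempt {attempt} failed, retrying...")
    raise RuntimeError(f"Request failed after {max_attempts} attempts")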
I am trying to handle the exceptions from the HTTP responses.
The PROBLEM with my code is that I am forced to use an IF condition to catch HTTP error codes:
if page.status_code != requests.codes.ok:
    page.raise_for_status()
I do not believe this is the right way to do it, so I am trying the FOLLOWING:
import requests

url = 'http://someurl.com/404-page.html'
myHeaders = {'User-agent': 'myUserAgent'}
s = requests.Session()
try:
    page = s.get(url, headers=myHeaders)
    #if page.status_code != requests.codes.ok:
    #    page.raise_for_status()
except requests.ConnectionError:
    print("DNS problem or refused to connect")
    # Or Do something with it
except requests.HTTPError:
    print("Some HTTP response error")
    # Or Do something with it
except requests.Timeout:
    print("Error loading...too long")
    # Or Do something with it, perhaps retry
except requests.TooManyRedirects:
    print("Too many redirect")
    # Or Do something with it
except requests.RequestException as e:
    print(e.message)
    # Or Do something with it
else:
    print("nothing happen")
    # Do something if no exception
s.close()
This ALWAYS prints "nothing happen". How would I be able to catch all possible exceptions related to the GET request?
You could catch a RequestException if you want to catch all the exceptions:
import requests

try:
    r = requests.get(........)
except requests.RequestException as e:
    print(e)
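Note that by default requests does not raise an exception for an HTTP error status such as 404, which is why your else branch always runs; calling raise_for_status() inside the try block (as in your commented-out lines) turns error statuses into catchable exceptions. A minimal sketch of that combination:

import requests

url = 'http://someurl.com/404-page.html'
try:
    page = requests.get(url, headers={'User-agent': 'myUserAgent'})
    page.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
except requests.HTTPError as e:
    print(f"HTTP error: {e}")
except requests.RequestException as e:
    print(f"Other request error: {e}")
else:
    print("Request succeeded")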
I have to make a series of requests to my local server and check the response. Basically, I am trying to hit the right URL by brute force. This is my code:
import urllib, urllib2

for i in range(48, 126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk": urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    response = urllib2.urlopen(request)
    status_code = response.getcode()
I have to make requests like: http://127.0.0.1/brute.php?tk=some_val
I am getting an error because the URL is not encoding properly. I get an internal server error 500 even though one of the URLs in the series should give 200; entering that URL manually confirms it. Also, what is the right way to skip 500/400 errors until I get a 200?
When using urllib2 you should always handle any exceptions that are raised as follows:
import urllib, urllib2

for i in range(0x012a40, 0x12a8e):
    url = {"tk": '{:x}'.format(i)}
    data = urllib.urlencode(url)
    print data
    try:
        request = urllib2.Request("http://127.0.0.1/brute.php", data)
        response = urllib2.urlopen(request)
        status_code = response.getcode()
    except urllib2.URLError, e:
        print e.reason
This will display the following when the connection fails, and then continue to try the next connection:
[Errno 10061] No connection could be made because the target machine actively refused it
e.reason will give you the textual reason, and e.errno will give you the error code. So you could still stop if the error was something other than 10061 for example.
Lastly, you seem to be cycling through a range of numbers in hex format, so you might find it easier to work directly with 0x formatting to build your strings.
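To address the "skip 500/400 errors until I get a 200" part, a minimal sketch in the same Python 2 style: catch HTTPError (a subclass of URLError that carries the status code) and move on to the next value whenever the request fails:

import urllib, urllib2

for i in range(0x012a40, 0x12a8e):
    data = urllib.urlencode({"tk": '{:x}'.format(i)})
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        continue  # 4xx/5xx responses raise HTTPError; skip and try the next value
    except urllib2.URLError, e:
        print e.reason
        break  # connection-level failure, stop here
    print "Success:", data, response.getcode()
    break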
It sounds like you will benefit from a try/except block:
import urllib, urllib2

for i in range(48, 126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk": urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    try:
        response = urllib2.urlopen(request)
        status_code = response.getcode()
        print status_code
    except urllib2.HTTPError, e:
        print e.code
You typically also want to catch the error itself:
except Exception, e:
    print e
Or catch specific errors only, for example:
except ValueError:
    #do stuff
Though you wouldn't get a ValueError in your code.
I am using the following code to resolve redirects and return a link's final URL:
def resolve_redirects(url):
    return urllib2.urlopen(url).geturl()
Unfortunately I sometimes get HTTPError: HTTP Error 429: Too Many Requests. What is a good way to combat this? Is the following good, or is there a better way?
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        return urllib2.urlopen(url).geturl()
Also, what would happen if there is an exception in the except block?
It would be better to make sure the HTTP code is actually 429 before re-trying.
That can be done like this:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError, e:
        if e.code == 429:
            time.sleep(5)
            return resolve_redirects(url)
        raise
This will also allow arbitrary numbers of retries (which may or may not be desired).
https://docs.python.org/2/howto/urllib2.html#httperror
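If unlimited retries are not desired, a small variant with a retry cap (max_retries is an arbitrary addition for illustration, not part of the original answer):

import time
import urllib2

def resolve_redirects(url, max_retries=5):
    last_error = None
    for attempt in range(max_retries):
        try:
            return urllib2.urlopen(url).geturl()
        except urllib2.HTTPError, e:
            if e.code != 429:
                raise
            last_error = e
            time.sleep(5)  # back off before the next attempt
    raise last_error  # still 429 after exhausting all retries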
This is a fine way to handle the exception, though you should check to make sure you are always sleeping for the appropriate amount of time between requests for the given website (for example, Twitter limits the number of requests per minute and clearly shows this limit in its API documentation). So just make sure you're always sleeping long enough.
To recover from an exception within an exception, you can simply embed another try/catch block:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        try:
            return urllib2.urlopen(url).geturl()
        except HTTPError:
            return "Failed twice :S"
Edit: as #jesse-w-at-z points out, you should be returning an URL in the second error case, the code I posted is just a reference example of how to write a nested try/catch.
Adding a User-Agent header to the request solved my issue:
from urllib import request

url = 'https://www.example.com/abc.json'
req = request.Request(url)
req.add_header('User-Agent', 'abc-bot')
response = request.urlopen(req)