As seen here, max_retries can be set for requests.Session(), but I only need the HEAD response's status_code to check whether a URL is valid and active.
Is there a way to just issue a HEAD request within a mounted session?
import requests

def valid_active_url(url):
    try:
        site_ping = requests.head(url, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
    try:
        if site_ping.status_code < 400:
            return True
        else:
            return False
    except Exception:
        return False
    return False
Based on the docs, I am thinking I need to either:
see if the session.mount method's results return a status code (which I haven't found yet), or
roll my own retry method, perhaps with a decorator like this or this (a sketch follows this list), or a (less elegant) loop like this.
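For the record, a minimal sketch of the decorator approach (the helper names and retry counts here are my own, not taken from the linked answers):

import time
from functools import wraps

import requests

def retry(times=3, delay=1):
    # Hypothetical helper: retry the wrapped call on connection errors.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except requests.exceptions.ConnectionError:
                    if attempt == times - 1:
                        raise  # out of attempts, let the error propagate
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(times=3, delay=1)
def head_status(url):
    return requests.head(url, allow_redirects=True).status_code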
In terms of the first approach I have tried:
s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
# the mount prefix must match the requested URL for the adapter to apply
s.mount('http://www.redirected-domain.org', a)
resp = s.get('http://www.redirected-domain.org')
resp.status_code
Are we only using s.mount() to get in and set max_retries? That seems redundant, aside from the fact that the HTTP connection would be pre-established.
Also, resp.status_code returns 200 where I am expecting a 301 (which is what requests.head returns).
NOTE: resp.ok might be all I need for my purposes here.
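(Two notes on the above, with a sketch. First, s.mount() matches requests by URL prefix, so mounting the bare scheme makes one adapter cover every URL rather than one domain. Second, get() follows redirects by default while head() does not, which is why the GET shows the final 200 instead of the 301. The domain below is a placeholder.)

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
adapter = HTTPAdapter(max_retries=3)
# Mounting on the bare schemes applies the adapter to any matching URL,
# so there is no need to mount each domain individually.
s.mount('http://', adapter)
s.mount('https://', adapter)

resp = s.head('http://www.example.org', allow_redirects=False)
print(resp.status_code)  # e.g. 301 for a redirecting domain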
After a mere two hours of developing the question, the answer took five minutes:
def valid_url(url):
    if (url.lower() == 'none') or (url == ''):
        return False
    try:
        s = requests.Session()
        a = requests.adapters.HTTPAdapter(max_retries=5)
        s.mount(url, a)
        resp = s.head(url)
        return resp.ok
    except requests.exceptions.MissingSchema:
        # If it's missing the schema, run again with schema added
        return valid_url('http://' + url)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False
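Example usage (the URLs are placeholders):

print(valid_url('www.example.com'))      # MissingSchema is caught, retried as http://...
print(valid_url('http://host.invalid'))  # connection fails, prints a message, False
print(valid_url('None'))                 # guarded against explicitly, False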
Based on this answer it looks like a HEAD request will be slightly less resource-intensive than a GET, particularly if the URL points to a large response body, since HEAD retrieves headers only.
requests.adapters.HTTPAdapter is the built-in adapter for the urllib3 library that underlies Requests.
On another note, I'm not sure what the correct term is for what I'm checking here: a URL could still be valid even if it returns an error code.
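One way to keep the two concerns separate is a purely syntactic check (is this even a URL?) alongside the liveness check above (does it respond?). A sketch using only the standard library:

from urllib.parse import urlparse

def is_well_formed(url):
    # Checks URL syntax only; says nothing about whether the host responds.
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)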
Related
Example of 522 error when I go to the webpage manually
Example of 525 error when I go to the webpage manually
Example of 504 error when I go to the webpage manually
I am running the following for loop, which goes through a dictionary of subreddits (keys) and URLs (values). The URLs produce a dictionary with all posts from 2022 for a given subreddit. Sometimes the for loop stops and produces an 'HTTP Error 525' or other errors.
I'm wondering how I can check for these errors when reading the URL and then retry until the error is gone before moving to the next subreddit.
import urllib.request

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    page = urllib.request.urlopen(url).read()
    dict_last_posts[subredd] = page
I haven't been able to figure it out.
You can put this code in a try/except block, like this:
import urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    while True:
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # exit the while loop if the request succeeded
        except urllib.error.HTTPError as e:
            if e.code == 525 or e.code == 522 or e.code == 504:
                print("Encountered HTTP error while reading URL. Retrying...")
            else:
                raise  # re-raise the exception if it's a different error
This code will catch any HTTPError that occurs while reading the URL and check whether the error code is 525, 522, or 504. If it is, it will print a message and try reading the URL again. If it's a different error, it will re-raise the exception so that you can handle it appropriately.
NOTE: This code will retry reading the URL indefinitely until it succeeds or a different error occurs. You may want to add a counter or a timeout to prevent the loop from going on forever in case the error persists.
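For example, a bounded version of the same loop (the limit of 5 attempts is an arbitrary choice):

import urllib.request
import urllib.error

MAX_ATTEMPTS = 5

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    for attempt in range(MAX_ATTEMPTS):
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # success, stop retrying
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525) and attempt < MAX_ATTEMPTS - 1:
                print("HTTP error {}, retrying...".format(e.code))
            else:
                raise  # out of attempts, or a different error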
It's unwise to retry a request indefinitely. Set a limit, even a very high one, but don't set it so high that it causes you to be rate-limited (HTTP status 429). The backoff_factor will also have an impact on rate limiting.
Use the requests package for this. This makes it very easy to set a custom adapter for all of your requests via Session, and it includes Retry from urllib3 which takes care of retry behavior in an object you can pass to your adapter.
import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[504, 522, 525]
)
s.mount('https://', HTTPAdapter(max_retries=retries))

for subredd, url in dict_last_subreddit_posts.items():
    response = s.get(url)
    dict_last_posts[subredd] = response.content
You can play around with total (maximum number of retries) and backoff_factor (adjusts wait time between retries) to get the behavior you want.
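To get a feel for the timing: urllib3 sleeps roughly backoff_factor * (2 ** (retry_number - 1)) seconds between retries, capped at 120 seconds by default (the exact formula, and whether the first retry sleeps at all, varies between urllib3 versions):

backoff_factor = 0.1
for n in range(1, 6):
    print('retry {}: sleep ~{:.1f}s'.format(n, backoff_factor * 2 ** (n - 1)))
# retry 1: ~0.1s, retry 2: ~0.2s, ... doubling each time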
Try something like this:
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    http_response = urllib.request.urlopen(url)
    while http_response.status != 200:
        if http_response.status == 503:
            http_response = urllib.request.urlopen(url)
        elif http_response.status == 523:
            pass  # enter code here
        else:
            pass  # enter code here
    dict_last_posts[subredd] = http_response.read()
But Michael Ruth's answer is better.
How should I structure the code so I won't get this error:
UnboundLocalError: local variable 'r' referenced before assignment.
If I want to ensure I get a 200 response before returning r.json(), where should I place this check: inside or outside the try block?
if r.status_code == requests.codes.ok:
My function:
def get_req():
    url = 'https://www.example.com/search'
    data = {'p': 'something'}
    try:
        r = requests.get(url, params=data)
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        print(err)
    return r.json()
I don't see any way you can get that error: no reference to r can execute unless the call to requests.get() succeeded and set r properly. Are you sure you're seeing that happen with the code you're showing us?
Why do you want to check for status code 200? You're already calling raise_for_status(), which basically does that: it raises only for error codes (4xx and 5xx) and passes every other status through, and you probably want your code to treat those other success codes as success for your purpose as well. So I don't think you need to check for 200 explicitly.
So by the time you call r.json() and return, that should be what you want to do.
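For reference, raise_for_status() boils down to roughly this (a simplified paraphrase, not the verbatim requests source):

# Simplified sketch of Response.raise_for_status():
if 400 <= r.status_code < 500:
    raise requests.exceptions.HTTPError('client error', response=r)
elif 500 <= r.status_code < 600:
    raise requests.exceptions.HTTPError('server error', response=r)
# otherwise it returns None, i.e. any 1xx/2xx/3xx counts as "not an error"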
UPDATE: Now that you've removed the sys.exit(), you have to do something specific in the error case. In the comments I've given you the four possibilities I see. The simplest one would be to declare your method as returning None if the request fails. This returns the least information to the caller, but you are printing an error already, so that might be fine. For that case, your code would look like this:
def get_req():
    url = 'https://www.example.com/search'
    data = {'p': 'something'}
    try:
        r = requests.get(url, params=data)
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        print(err)
        return None
    return r.json()
This code is "correct" if you define the expected behavior as returning None when the request fails but throwing an exception in some other cases. You might want to catch Exception instead, or as a separate case, if you never want this method to throw. I don't know offhand whether requests.get() can throw some exception other than HTTPError; maybe it can't, in which case this code will never throw as is. IMO, it's better to assume it can and deal with that case explicitly. The code is much more readable that way, and it doesn't require future readers to know which exceptions requests.get() can throw in order to understand this code's behavior in all cases.
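If you never want the method to propagate a requests error, one option is to catch the library's base exception class, requests.exceptions.RequestException, which is the documented parent of HTTPError, ConnectionError, Timeout, and the rest. A sketch of that variant:

def get_req():
    url = 'https://www.example.com/search'
    data = {'p': 'something'}
    try:
        r = requests.get(url, params=data)
        r.raise_for_status()
    except requests.exceptions.RequestException as err:
        # covers HTTPError, ConnectionError, Timeout, and other requests errors
        print(err)
        return None
    return r.json()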
Your function should look like this:
import sys
import requests

def get_req():
    url = 'https://www.example.com/search'
    data = {'p': 'something'}
    try:
        r = requests.get(url, params=data)
        r.raise_for_status()
        if r.status_code == requests.codes.ok:
            return r.json()
    except requests.exceptions.HTTPError as err:
        print(err)
        sys.exit(1)
Hope this helps.
My question is closely related to this one.
I'm using the Requests library to hit an HTTP endpoint.
I want to check if the response is a success.
I am currently doing this:
response = requests.get(url)
if 200 <= response.status_code <= 299:
    # Do something here!
Instead of doing that ugly check for values between 200 and 299, is there a shorthand I can use?
The response has an ok property. Use that:
if response.ok:
    ...
The implementation is just a try/except around Response.raise_for_status, which itself checks the status code.
@property
def ok(self):
    """Returns True if :attr:`status_code` is less than 400, False if not.

    This attribute checks if the status code of the response is between
    400 and 600 to see if there was a client error or a server error. If
    the status code is between 200 and 400, this will return True. This
    is **not** a check to see if the response code is ``200 OK``.
    """
    try:
        self.raise_for_status()
    except HTTPError:
        return False
    return True
I am a Python newbie, but I think the easiest way is:
if response.ok:
    # whatever
The Pythonic way to check for requests success is to optionally raise an exception:
try:
    resp = requests.get(url)
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err)
EAFP: it's Easier to Ask Forgiveness than Permission. Just do what you expect to work, and if an exception might be thrown from the operation, catch it and deal with that fact.
I have to make a series of requests to my local server and check the response. Basically I am trying to hit the right URL by brute-forcing. This is my code:
for i in range(48, 126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk": urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    response = urllib2.urlopen(request)
    status_code = response.getcode()
I have to make requests like: http://127.0.0.1/brute.php?tk=some_val
I am getting an error because the URL is not properly encoded. I get an internal server error 500 even when one of the URLs in the series should give a 200; entering that URL manually confirms it. Also, what is the right way to skip 500/400 errors until I get a 200?
When using urllib2 you should always handle any exceptions that are raised as follows:
import urllib, urllib2

for i in range(0x012a40, 0x12a8e):
    url = {"tk": '{:x}'.format(i)}
    data = urllib.urlencode(url)
    print data
    try:
        request = urllib2.Request("http://127.0.0.1/brute.php", data)
        response = urllib2.urlopen(request)
        status_code = response.getcode()
    except urllib2.URLError, e:
        print e.reason
This will display the following when the connection fails, and then continue to try the next connection:
[Errno 10061] No connection could be made because the target machine actively refused it
e.reason will give you the textual reason, and e.errno will give you the error code. So you could still stop if the error was something other than 10061 for example.
Lastly, you seem to be cycling through a range of numbers in hex format? You might find it easier to work directly with 0x formatting to build your strings.
It sounds like you will benefit from a try/except block:
for i in range(48, 126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk": urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    try:
        response = urllib2.urlopen(request)
        status_code = response.getcode()
        print status_code
    except:
        pass  # ignore failed attempts and continue the loop
You would typically also want to catch the error itself:
except Exception, e:
print e
Or catch specific errors only, for example:
except ValueError:
#do stuff
Though you wouldn't get a ValueError in your code.
I am using the following code to resolve redirects, to return a link's final URL:
def resolve_redirects(url):
    return urllib2.urlopen(url).geturl()
Unfortunately I sometimes get HTTPError: HTTP Error 429: Too Many Requests. What is a good way to combat this? Is the following good, or is there a better way?
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        return urllib2.urlopen(url).geturl()
Also, what would happen if there is an exception in the except block?
It would be better to make sure the HTTP code is actually 429 before retrying.
That can be done like this:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError, e:
        if e.code == 429:
            time.sleep(5)
            return resolve_redirects(url)
        raise
This will also allow arbitrary numbers of retries (which may or may not be desired).
https://docs.python.org/2/howto/urllib2.html#httperror
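A hypothetical bounded variant in the same Python 2 style, passing the remaining retry budget through the recursion:

def resolve_redirects(url, retries=3):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError, e:
        if e.code == 429 and retries > 0:
            time.sleep(5)
            return resolve_redirects(url, retries - 1)
        raise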
This is a fine way to handle the exception, though you should check that you are always sleeping for the appropriate amount of time between requests for the given website (for example, Twitter limits the number of requests per minute and states this amount clearly in its API documentation). So just make sure you're always sleeping long enough; one way to do that, sketched after the example below, is to honor the server's Retry-After header.
To recover from an exception within an exception, you can simply embed another try/catch block:
def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except HTTPError:
        time.sleep(5)
        try:
            return urllib2.urlopen(url).geturl()
        except HTTPError:
            return "Failed twice :S"
Edit: as @jesse-w-at-z points out, you should be returning a URL in the second error case; the code I posted is just a reference example of how to write a nested try/catch.
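One way to sleep "long enough", as mentioned above, is to honor the server's Retry-After header when it is present. A sketch in the same Python 2 style (the header is optional and may also be an HTTP date; this only handles the plain-seconds form, falling back to 5 seconds):

import time
import urllib2

def resolve_redirects(url):
    try:
        return urllib2.urlopen(url).geturl()
    except urllib2.HTTPError, e:
        if e.code == 429:
            wait = int(e.info().get('Retry-After', 5))
            time.sleep(wait)
            return urllib2.urlopen(url).geturl()
        raise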
Adding a User-Agent to the request header solved my issue, presumably because the server rejects requests that use the default Python user agent:
from urllib import request

url = 'https://www.example.com/abc.json'
req = request.Request(url)
req.add_header('User-Agent', 'abc-bot')
response = request.urlopen(req)