As seen here, max-retries can be set for requests.Session(), but I only need the head.status_code to check if a url is valid and active.
Is there a way to just get the head within a mount session?
import requests

def valid_active_url(url):
    try:
        site_ping = requests.head(url, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False
    # Any status below 400 counts as valid and active.
    return site_ping.status_code < 400
Based on the docs, I think I need to either:
see if the session.mount method results return a status code (which I haven't found yet)
roll my own retry method, perhaps with a decorator like this or this or a (less eloquent) loop like this.
In terms of the first approach I have tried:
s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
s.mount('http://redirected-domain.com', a)
resp = s.get('http://www.redirected-domain.org')
resp.status_code
Are we only using s.mount() to get in and set max_retries? Seems to be a redundancy, aside from that the http connection would be pre-established.
Also resp.status_code returns 200 where I am expecting a 301 (which is what requests.head returns).
NOTE: resp.ok might be all I need for my purposes here.
After a mere two hours of developing the question, the answer took five minutes:
def valid_url(url):
    if (url.lower() == 'none') or (url == ''):
        return False
    try:
        s = requests.Session()
        a = requests.adapters.HTTPAdapter(max_retries=5)
        s.mount(url, a)
        resp = s.head(url)
        return resp.ok
    except requests.exceptions.MissingSchema:
        # If it's missing the schema, run again with schema added
        return valid_url('http://' + url)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False
Based on this answer it looks like the head request will be slightly less resource intensive than the get, particularly if the url contains a large amount of data.
requests.adapters.HTTPAdapter is the built-in transport adapter that Requests uses to talk to the underlying urllib3 library.
On another note, I'm not sure what the correct term or phrase for what I'm checking here is. A url could still be valid if it returns an error code.
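On the terminology point, one way to keep "the server answered" apart from "the server answered with a success status" is to name the three outcomes explicitly. This is only a sketch: the UrlState enum and the classify helper are hypothetical names made up for illustration, not part of Requests.

```python
from enum import Enum


class UrlState(Enum):
    """Hypothetical labels; the names are illustrative, not standard terms."""
    OK = "ok"                    # server answered with a status below 400
    ERROR_STATUS = "error"       # server answered, but with a 4xx or 5xx status
    UNREACHABLE = "unreachable"  # no HTTP response at all


def classify(status_code):
    """Map an HTTP status code (or None when the request failed) to a UrlState."""
    if status_code is None:
        return UrlState.UNREACHABLE
    if status_code < 400:
        return UrlState.OK
    return UrlState.ERROR_STATUS
```

With this, a 404 is reported as ERROR_STATUS (the host exists and responds), while a connection failure, passed in as None, is reported as UNREACHABLE.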
I want to check if a website exists. I use the request module to make a get request and check the status code after the request was made.
def check_website_exist(self, url):
    result = True
    request = requests.get("http://"+url)
    print(request.status_code)
    if request.status_code == 200:
        output.info("website found!")
        return result
    else:
        output.error("website not found!")
        result = False
        return result
When I make a request for the site 'www.isdfugpdohsiughsdopiughdsfoiguf.com' I get the status code 200, even though the site doesn't exist. Why do I get a 200 status code when the website doesn't exist?
You are not catching the possible errors. Here is an example of how to do it:
import requests
from requests.exceptions import ConnectionError
try:
    url = "http://www.isdfugpdohsiughsdopiughdsfoiguf.com"
    request_url = requests.get(url)
    print(request_url.status_code)
except ConnectionError:
    print("No exist")
I want to check if a URL is reachable from Python, and I got this code from googling:
import http.client
from urllib.parse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function: resp.status is 302. However, if I add port 443 at the end, https://eurotableau.nomisonline.com:443, it returns False and resp.status is 400. I tried both URLs in Google Chrome and both of them work. So why is this happening? Is there any way I can include the port value and still get a valid resp.status value (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
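A minimal sketch of that idea, assuming Python 3: pick the connection class from the URL scheme and let urlparse split the host from the port. The names connection_for and check_url and the 5-second timeout are assumptions for illustration, not something from the question.

```python
import http.client
from urllib.parse import urlparse


def connection_for(url, timeout=5):
    """Return an unconnected HTTP(S)Connection chosen from the URL scheme."""
    parts = urlparse(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # hostname/port split "host:443" cleanly; port=None means the default port.
    return conn_class(parts.hostname, parts.port, timeout=timeout)


def check_url(url):
    """Send a HEAD request and report whether the status is below 400."""
    parts = urlparse(url)
    conn = connection_for(url)
    try:
        # An empty path is not a valid request target, so fall back to "/".
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Refused connection, DNS failure, TLS error, malformed reply, ...
        return False
    finally:
        conn.close()
```

With this, https://eurotableau.nomisonline.com and https://eurotableau.nomisonline.com:443 both go through HTTPSConnection on port 443, instead of sending plain HTTP to the TLS port.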
If you do not require the HEAD method but just wish to check if host is available then why not do:
from urllib2 import urlopen
try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else:
        print "Server is on vacations or was never there!"
    print e
This establishes a connection with the server but doesn't download any data unless you read it. It only reads a few KB to get the headers (as with the HEAD method) and waits for you to request more, but you close the connection at that point.
So, you can catch an exception and see what the problem is, or if there is no exception, just close the connection.
urllib2 will handle HTTPS and protocol://user#URL:PORT for you neatly.
No worries about anything.
I am writing a small Python app which uses requests to get and post data to an HTML page.
The problem I am having is that if I can't reach the HTML page, the code stops with a "max retries exceeded" error. I want to be able to do some things if I can't reach the server.
Is such a thing possible?
Here is sample code:
import requests

url = "http://127.0.0.1/"
req = requests.get(url)
if req.status_code == 304:
    pass  # do something
elif req.status_code == 404:
    pass  # do something else
# etc etc
# code here if server can't be reached for whatever reason
You want to handle the exception requests.exceptions.ConnectionError, like so:
try:
    req = requests.get(url)
except requests.exceptions.ConnectionError as e:
    pass  # Do stuff here
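You may want to set a suitable timeout when catching ConnectionError. A read timeout raises requests.exceptions.Timeout rather than ConnectionError, so catch both:
url = "http://www.stackoverflow.com"
try:
    req = requests.get(url, timeout=2)  # 2 seconds timeout
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
    pass  # Couldn't connect, or the server took too long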
See this answer if you want to change the number of retries.
Using Python, how can I check if a website is up? From what I read, I need to send an HTTP HEAD request and check for the status code "200 OK", but how do I do that?
Cheers
Related
How do you send a HEAD HTTP request in Python?
You could try to do this with getcode() from urllib
import urllib.request
print(urllib.request.urlopen("https://www.stackoverflow.com").getcode())
200
For Python 2, use
print urllib.urlopen("http://www.stackoverflow.com").getcode()
200
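Note that urlopen(url) on its own sends a GET. If you specifically want a HEAD request with the Python 3 standard library, something along these lines should work. It is a sketch: the function name head_status and the 5-second timeout are my own choices.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5):
    """Send a HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx/5xx status.
        return error.code
    except OSError:
        # URLError and timeouts are OSError subclasses:
        # DNS failure, refused connection, no answer in time, ...
        return None
```

A site could then be treated as up when head_status(url) == 200, and as down (rather than merely "erroring") when it returns None.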
I think the easiest way to do it is by using Requests module.
import requests
def url_ok(url):
    r = requests.head(url)
    return r.status_code == 200
You can use httplib
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/")
r1 = conn.getresponse()
print r1.status, r1.reason
prints
200 OK
Of course, only if www.python.org is up.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
req = Request("http://stackoverflow.com")
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print('Website is working fine')
Works on Python 3
import httplib
import socket
import re
def is_website_online(host):
    """ This function checks to see if a host name has a DNS entry by checking
    for socket info. If the website gets something in return,
    we know it's available to DNS.
    """
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return False
    else:
        return True

def is_page_available(host, path="/"):
    """ This function retrieves the status code of a website by requesting
    HEAD data from the host. This means that it only requests the headers.
    If the host cannot be reached or something else goes wrong, it returns
    False.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        # Treat any 2xx or 3xx status as available.
        return bool(re.match(r"^[23]\d\d$", str(conn.getresponse().status)))
    except StandardError:
        return False
The HTTPConnection object from the httplib module in the standard library will probably do the trick for you. BTW, if you start doing anything advanced with HTTP in Python, be sure to check out httplib2; it's a great library.
If the server is down, urllib on Python 2.7 (x86 Windows) has no timeout and the program hangs. So use urllib2, which accepts one:
import urllib2
import socket

def check_url(url, timeout=5):
    try:
        return urllib2.urlopen(url, timeout=timeout).getcode() == 200
    except urllib2.URLError as e:
        return False
    except socket.timeout as e:
        return False

print check_url("http://google.fr")  # True
print check_url("http://notexist.kc")  # False
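For Python 3, where urllib2 was folded into urllib.request, a roughly equivalent sketch might look like the following. I'm assuming the same function name and default timeout as above; catching OSError is my own simplification, since URLError, HTTPError and socket.timeout are all subclasses of it.

```python
import urllib.request


def check_url(url, timeout=5):
    """Return True only if the URL answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except OSError:
        # Covers HTTP error statuses, refused connections, DNS failures,
        # and a server that accepts the connection but never answers.
        return False
```

The timeout matters most in that last case: without it, a server that accepts the connection and then stays silent would block the call indefinitely.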
I use requests for this, then it is easy and clean.
Instead of the print function you can define and call a new function (notify via email etc.). The try-except block is essential, because if the host is unreachable the request can raise several kinds of exceptions, so you need to catch them all.
import requests
URL = "https://api.github.com"
try:
    response = requests.head(URL)
except Exception as e:
    print(f"NOT OK: {str(e)}")
else:
    if response.status_code == 200:
        print("OK")
    else:
        print(f"NOT OK: HTTP response code {response.status_code}")
You may use requests library to find if website is up i.e. status code as 200
import requests
url = "https://www.google.com"
page = requests.get(url)
print (page.status_code)
>> 200
In my opinion, caisah's answer misses an important part of your question, namely dealing with the server being offline.
Still, using requests is my favorite option, albeit as such:
import requests
try:
    requests.get(url)
except requests.exceptions.ConnectionError:
    print(f"URL {url} not reachable")
If by up you simply mean "the server is serving", then you could use cURL, and if you get a response then it's up.
I can't give you specific advice because I'm not a python programmer, however here is a link to pycurl http://pycurl.sourceforge.net/.
Hi, these functions can run an up test and a speed test for your web page:
from urllib.request import urlopen
from socket import socket
import time

def tcp_test(server_info):
    # server_info is expected in "host:port" form
    cpos = server_info.find(':')
    try:
        sock = socket()
        sock.connect((server_info[:cpos], int(server_info[cpos+1:])))
        sock.close()
        return True
    except Exception as e:
        return False

def http_test(server_info):
    try:
        # TODO : we can use this data after to find sub urls up or down results
        startTime = time.time()
        data = urlopen(server_info).read()
        endTime = time.time()
        speed = endTime - startTime
        return {'status' : 'up', 'speed' : str(speed)}
    except Exception as e:
        return {'status' : 'down', 'speed' : str(-1)}

def server_test(test_type, server_info):
    if test_type.lower() == 'tcp':
        return tcp_test(server_info)
    elif test_type.lower() == 'http':
        return http_test(server_info)
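One caveat with a plain socket().connect() is that it has no timeout by default, so a filtered port can block for a long time. A variant of the TCP check using socket.create_connection might look like this. It is a sketch: the name tcp_port_open, the separate host and port arguments, and the 3-second default are my own choices.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        # create_connection resolves the host and applies the timeout;
        # the with-block closes the socket again without sending anything.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure or timed out.
        return False
```

For example, tcp_port_open("www.python.org", 443) only tells you that something is listening on that port, not that the web application behind it is healthy.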
Requests and httplib2 are great options:
# Using requests.
# (The function names here are illustrative wrappers around the snippets.)
import requests

def check_with_requests(value):
    request = requests.get(value)
    return request.status_code == 200

# Using httplib2.
import httplib2

def check_with_httplib2(value):
    try:
        http = httplib2.Http()
        response = http.request(value, 'HEAD')
        if int(response[0]['status']) == 200:
            return True
    except Exception:
        pass
    return False
If using Ansible, you can use the fetch_url function:
from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.urls import fetch_url
module = AnsibleModule(
    dict(),
    supports_check_mode=True)
try:
    response, info = fetch_url(module, url)
    if info['status'] == 200:
        return True
except Exception:
    pass
return False
My 2 cents:
import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

if getResponseCode(url) != 200:
    print('Wrong URL')
else:
    print('Good URL')
Here's my solution using PycURL and validators
import pycurl, validators
def url_exists(url):
    """
    Check if the given URL really exists
    :param url: str
    :return: bool
    """
    if validators.url(url):
        c = pycurl.Curl()
        c.setopt(pycurl.NOBODY, True)
        c.setopt(pycurl.FOLLOWLOCATION, False)
        c.setopt(pycurl.CONNECTTIMEOUT, 10)
        c.setopt(pycurl.TIMEOUT, 10)
        c.setopt(pycurl.COOKIEFILE, '')
        c.setopt(pycurl.URL, url)
        try:
            c.perform()
            response_code = c.getinfo(pycurl.RESPONSE_CODE)
            c.close()
            return response_code < 400
        except pycurl.error as err:
            # pycurl.error carries (errno, errstr) in its args
            errno, errstr = err.args
            raise OSError('An error occurred: {}'.format(errstr))
    else:
        raise ValueError('"{}" is not a valid url'.format(url))