When I use the following function with the Python 3.2.3 package in Cygwin, it hangs on any request to any HTTPS host. After 60 seconds it fails with this error: [Errno 104] Connection reset by peer.
UPDATE: I thought it was limited to only cygwin, but this also happens in Windows 7 64bit with Python 3.3. I'll try 3.2 right now. The error when using the windows command shell is:
urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host
UPDATE2(Electric-Bugaloo): This is limited to a couple of sites that I'm trying to use. I tested against google and other major sites with no issue. It appears it's related to this bug:
http://bugs.python.org/issue16361
Specifically, the server is hanging after the client-hello. It's due to the version of openssl that shipped with the compiled versions of python3.2 and 3.3. It's mis-identifying the ssl version of the server. Now I need code to auto downgrade my version of ssl to sslv3 when opening a connection to the affected sites like in this post:
How to use urllib2 to get a webpage using SSLv3 encryption
but I can't get it to work.
def worker(url, body=None, bt=None):
'''This function does all the requests to wherever for data
takes in a url, optional body utf-8 encoded please, and optional body type'''
hdrs = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-us,en;q=0.5',
'Accept-Encoding': 'gzip,deflate',
'User-Agent': "My kewl Python tewl!"}
if 'myweirdurl' in url:
hdrs = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-us,en;q=0.5',
'Accept-Encoding': 'gzip,deflate',
'User-Agent': "Netscape 6.0"}
if bt:
hdrs['Content-Type'] = bt
urlopen = urllib.request.urlopen
Request = urllib.request.Request
start_req = time.time()
logger.debug('request start: {}'.format(datetime.now().ctime()))
if 'password' not in url:
logger.debug('request url: {}'.format(url))
req = Request(url, data=body, headers=hdrs)
try:
if body:
logger.debug("body: {}".format(body))
handle = urlopen(req, data=body, timeout=298)
else:
handle = urlopen(req, timeout=298)
except socket.error as se:
logger.error(se)
logger.error(se.errno)
logger.error(type(se))
if getattr(se, 'errno', None) == 60:
logger.error("returning: Request Timed Out")
return 'Request Timed Out'
except URLError as ue:
end_time = time.time()
logger.error(ue)
logger.error(hasattr(ue, 'code'))
logger.error(hasattr(ue, 'errno'))
logger.error(hasattr(ue, 'reason'))
if hasattr(ue, 'code'):
logger.warn('The server couldn\'t fulfill the request.')
logger.error('Error code: {}'.format(ue.code))
if ue.code == 404:
return "Resource Not Found (404)"
elif hasattr(ue, 'reason') :
logger.warn('We failed to reach a server with {}'.format(url))
logger.error('Reason: {}'.format(ue.reason))
logger.error(type(ue.reason))
logger.error(ue.reason.errno)
if ue.reason == 'Operation timed out':
logger.error("Arrggghh, timed out!")
else:
logger.error("Why U no match my reason?")
if ue.reason.errno == 60:
return "Operation timed out"
elif hasattr(ue, 'errno'):
logger.warn(ue.reason)
logger.error('Error code: {}'.format(ue.errno))
if ue.errno == 60:
return "Operation timed out"
logger.error("req time: {}".format(end_time - start_req))
logger.error("returning: Server Error")
return "Server Error"
else:
resp_headers = dict(handle.info())
logger.debug('Here are the headers of the page : {}'.format(resp_headers))
logger.debug("The true URL in case of redirects {}".format(handle.geturl()))
try:
ce = resp_headers['Content-Encoding']
except KeyError as ke:
ce = None
else:
logger.debug('Content-Encoding: {}'.format(ce))
try:
ct = resp_headers['Content-Type']
except KeyError as ke:
ct = None
else:
logger.debug('Content-Type: {}'.format(ct))
if ce == "gzip":
logger.debug("Unzipping payload")
bi = BytesIO(handle.read())
gf = GzipFile(fileobj=bi, mode="rb")
if "charset=utf-8" in ct.lower() or ct == 'text/html' or ct == 'text/plain':
payload = gf.read().decode("utf-8")
else:
logger.debug("Unknown content type: {}".format(ct))
sys.exit()
return payload
else:
if ct is not None and "charset=utf-8" in ct.lower() or ct == 'text/html' or ct == 'text/plain':
return handle.read().decode("utf-8")
else:
logger.debug("Unknown content type: {}".format(ct))
sys.exit()
I figured it out, here's the code block necessary to make this work on Windows:
'''had to add this windows specific block to handle this bug in urllib2:
http://bugs.python.org/issue11220
'''
if "windows" in platform().lower():
if 'my_wacky_url' in url.lower() or 'my_other_wacky_url' in url.lower():
import ssl
ssl_context = urllib.request.HTTPSHandler(
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
opener = urllib.request.build_opener(ssl_context)
urllib.request.install_opener(opener)
#end of urllib workaround
I added this blob right before the first try: block and it worked like a charm. Thanks for the assistance andrean!
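On newer Python versions (3.7 and later) the same idea can be expressed without installing a global opener. This is only a sketch under assumptions: the marker strings are the placeholder names from the block above, the helper names are made up for illustration, and pinning to TLS 1.2 is a guess at what an affected server accepts, so adjust it to whatever the server actually negotiates.

```python
import ssl
import urllib.request

# Placeholder substrings identifying the affected hosts (taken from the post).
AFFECTED_MARKERS = ("my_wacky_url", "my_other_wacky_url")


def needs_tls_workaround(url, markers=AFFECTED_MARKERS):
    """Return True if any marker occurs in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(marker in lowered for marker in markers)


def build_pinned_tls_opener():
    """Build an opener whose HTTPS handler only speaks one TLS version.

    Assumption: TLS 1.2 is the version the problematic server handles.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    handler = urllib.request.HTTPSHandler(context=context)
    # The opener is returned rather than installed, so other requests
    # in the same process keep the default TLS negotiation.
    return urllib.request.build_opener(handler)
```

A caller would then use `build_pinned_tls_opener().open(req, timeout=...)` only when `needs_tls_workaround(url)` is true, and plain `urlopen` otherwise.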
I'm trying to write a small watchdog script that pings a container running nginx.
This is my code:
import requests
import time
url = 'http://localhost:1234/'
response = ''
def checkloop (url, response):
try:
response = requests.get(url)
except requests.ConnectionError:
print("Can't connect to the site, sorry")
else:
response.status_code == 200
print("OK", response.status_code)
while response == '':
checkloop (url, response)
time.sleep(5)
But I can't handle the error when the host is down; the script crashes instead. I always get this error:
"requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=1234): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10332e4c0>: Failed to establish a new connection: [Errno 61] Connection refused'))"
How can I print 403, 404, or whatever other error applies when the host returns something other than HTTP 200? Is that possible?
Or can you recommend an existing watchdog for Python, or a short guide on how to write one?
An HTTP 2xx/3xx/4xx/5xx response from the server requires that you can connect to the server over HTTP in the first place. If you can't connect at the network level at all (host unreachable), you can't connect at the HTTP level either (see the OSI model).
Instead of having your checkloop function update the global variable response, you could have the function tell you whether the server is reachable. Example:
import requests
import time
url = 'http://localhost:1234/'
def can_connect_to(url) -> bool: # `bool` is a type hint for the return type of the function
try:
response = requests.get(url)
except requests.ConnectionError:
print("Can't connect to the site, try again later")
return False # can NOT connect
else:
# do whatever you want
if response.status_code == 200:
print("OK 200")
return True # can connect
else:
print("strange ...", response.status_code)
# return True or False ?
while not can_connect_to(url):
time.sleep(5)
If you need the response content, you can rely on Python's truthy/falsy values for that, which looks like what you were attempting:
import requests
import time
url = 'http://localhost:1234/'
def fetch(url) -> str:  # we expect a string to be returned (any length)
    try:
        response = requests.get(url)
    except requests.ConnectionError:
        print("Can't connect to the site, try again later")
        return ""  # can NOT connect, we don't have any content for now
    # `elif` cannot follow `except`, so branch on the status code after the try block
    if response.status_code == 200:
        print("OK 200")
        return response.text  # can connect
    else:
        print("strange ...", response.status_code)
        return response.text  # may be truthy or falsy ...

# the walrus expression needs its own parentheses after `not` (Python 3.8+)
while not (response_text := fetch(url)):  # call fetch, put the result into `response_text`, and check if it's truthy or falsy
    time.sleep(5)
# at this point, `not response_text` is False, which means that `response_text` is truthy, which for a string means it is non-empty
print(response_text)
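The distinction made above between "no connection at all" and "connected, but got an HTTP error status" can be sketched with the standard library alone. This is a minimal illustration, not a finished watchdog: the `probe` name, its return shape, and the injectable `opener` parameter are assumptions added here so the function can be exercised without a live server.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Classify one check as ("up", status), ("http_error", status)
    or ("unreachable", reason)."""
    try:
        with opener(url, timeout=timeout) as response:
            return ("up", response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status such as 403 or 404.
        return ("http_error", exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP conversation took place: refused, unreachable, timed out...
        return ("unreachable", str(getattr(exc, "reason", exc)))
```

A watchdog loop would call `probe(url)` every few seconds and print or alert on anything other than an "up" result.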
I have been building my own python (version 3.2.1) trading application in a practice account of a Forex provider (OANDA) but I am having some issues in receiving the streaming prices with a Linux debian-based OS.
In particular, I have followed their "Python streaming rates" guide available here: http://developer.oanda.com/rest-live/sample-code/.
I have a thread calling the function 'connect_to_stream' which prints out all the ticks received from the server:
streaming_thread = threading.Thread(target=streaming.connect_to_stream, args=[])
streaming_thread.start()
The streaming.connect_to_stream function is defined as follows:
def connect_to_stream():
[..]#provider-related info are passed here
try:
s = requests.Session()
url = "https://" + domain + "/v1/prices"
headers = {'Authorization' : 'Bearer ' + access_token,
'Connection' : 'keep-alive'
}
params = {'instruments' : instruments, 'accountId' : account_id}
req = requests.Request('GET', url, headers = headers, params = params)
pre = req.prepare()
resp = s.send(pre, stream = True, verify = False)
return resp
except Exception as e:
s.close()
print ("Caught exception when connecting to stream\n%s" % str(e))
if response.status_code != 200:
print (response.text)
return
for line in response.iter_lines(1):
if line:
try:
msg = json.loads(line)
print(msg)
except Exception as e:
print ("Caught exception when connecting to stream\n%s" % str(e))
return
The msg variable contains the tick received for the streaming.
The problem is that I receive ticks for three hours on average after which the connection gets dropped and the script either hangs without receiving any ticks or throws an exception with reason "Connection Reset by Peer".
Could you please share any thoughts on where I am going wrong here? Is it anything related to the requests library (iter_lines maybe)?
I would like to receive ticks indefinitely unless a Keyboard exception is raised.
Thanks
It doesn't seem too weird to me that a service would close connections that have been open for more than 3 hours.
That's probably a safety measure on their side to free their server sockets from ghost clients.
So you should probably just reconnect when you are disconnected.
try:
s = requests.Session()
url = "https://" + domain + "/v1/prices"
headers = {'Authorization' : 'Bearer ' + access_token,
'Connection' : 'keep-alive'
}
params = {'instruments' : instruments, 'accountId' : account_id}
req = requests.Request('GET', url, headers = headers, params = params)
pre = req.prepare()
resp = s.send(pre, stream = True, verify = False)
return resp
except SocketError as e:
if e.errno == errno.ECONNRESET:
pass # connection has been reset, reconnect.
except Exception as e:
pass # other exceptions but you'll probably need to reconnect too.
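The "just reconnect" advice can be sketched as a loop around the connection code. Everything named here is hypothetical: `connect` stands for a caller-supplied function that opens the stream and returns an iterable of lines (for example a wrapper around `connect_to_stream` and `iter_lines`), and the backoff values are arbitrary defaults rather than anything the provider documents.

```python
import time


def stream_with_reconnect(connect, handle_line, max_reconnects=None,
                          initial_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Consume lines from connect() and reconnect whenever the stream drops.

    Returns the number of reconnects performed once max_reconnects is
    reached; with max_reconnects=None it runs until interrupted.
    """
    delay = initial_delay
    reconnects = 0
    while True:
        try:
            for line in connect():
                if line:  # skip blank keep-alive lines
                    handle_line(line)
                    delay = initial_delay  # data is flowing, reset the backoff
        except OSError:
            # Covers ConnectionResetError ("Connection reset by peer") and
            # other socket-level failures. KeyboardInterrupt is not an
            # OSError, so Ctrl-C still stops the loop.
            pass
        if max_reconnects is not None and reconnects >= max_reconnects:
            return reconnects
        reconnects += 1
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The loop also reconnects when the server ends the stream cleanly without an exception, which addresses the case where the script stops receiving ticks after the server closes the connection. A connection that stays open but silent would additionally need a read timeout on the underlying request.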
I have a python script that makes a series of url calls using urllib2. The url is on http, but requires authentication. I am currently trying to run the script such that it will make over 100 calls. Every time I run the script, some calls fail with error code 401, and some pass. All calls are for the same URL using the same username and password. (Each time I run the script it is not the same calls that fail, sometimes the first call fails, sometimes it works.)
Any ideas why a 401 might occur inconsistently?
The error message printed to the screen is...
Here is the method responsible for making the url call:
def simpleExecuteRequest(minX, minY, maxX, maxY, type) :
url = 'http://myhost.com/geowebcache/rest/seed/mylayer.xml'
msgTemplate = """<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<seedRequest>
<name>mylayer</name>
<bounds>
<coords>
<double>%s</double>
<double>%s</double>
<double>%s</double>
<double>%s</double>
</coords>
</bounds>
<gridSetId>nyc</gridSetId>
<zoomStart>0</zoomStart>
<zoomStop>10</zoomStop>
<format>image/png</format>
<type>%s</type>
<threadCount>1</threadCount>
</seedRequest>
"""
message = msgTemplate%(minX, minY, maxX, maxY, type)
headers = { 'User-Agent' : "Python script", 'Content-type' : 'text/xml; charset="UTF-8"', 'Content-length': '%d' % len(message) }
passwordManager = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, url, 'username', 'xxx')
authenticationHandler = urllib2.HTTPBasicAuthHandler(passwordManager)
proxyHandler = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxyHandler, authenticationHandler)
urllib2.install_opener(opener)
try :
request = urllib2.Request(url, message, headers)
response = urllib2.urlopen(request)
content = response.read()
print 'success'
except IOError, e:
print e
Sometimes the output will look like this...
<urlopen error (10053, 'Software caused connection abort')>
success
success
<urlopen error (10053, 'Software caused connection abort')>
<urlopen error (10053, 'Software caused connection abort')>
...
When run 1 minute later it might look like this...
success
<urlopen error (10053, 'Software caused connection abort')>
success
success
<urlopen error (10053, 'Software caused connection abort')>
On both runs the same series of inputs for min/max x/y and type were provided in the same order.
...
The code looks correct to me, so I don't see the issue.
Here are a few thoughts on how to proceed:
I usually work out the HTTP requests at the command line using curl before translating them into a script.
The requests library is easier to use than urllib2.
When you receive a response, print out the headers so you can see what is going on.
Instead of except IOError, e use except IOError as e. The new form protects you from hard-to-find errors.
I presume you redacted the username and password and are using the real ones in your own script ;-)
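The suggestion to print the headers can be sketched as follows. Note that this is written for Python 3's `urllib.request` rather than the `urllib2` used in the question, and the `open_and_report` name and `opener` parameter are illustrative assumptions. The point it shows is that an `HTTPError` still carries the response headers, so a 401 can be inspected rather than only printed.

```python
import urllib.error
import urllib.request


def open_and_report(request, opener=urllib.request.urlopen):
    """Print and return (status, headers) for success and HTTP-error replies."""
    try:
        response = opener(request)
    except urllib.error.HTTPError as exc:
        # HTTPError doubles as a response object: a 401 still has headers
        # such as WWW-Authenticate that say what the server expected.
        status, headers = exc.code, dict(exc.headers.items())
    else:
        with response:
            status, headers = response.status, dict(response.headers.items())
    print(status)
    for name, value in headers.items():
        print("{}: {}".format(name, value))
    return status, headers
```

Comparing the headers of a failing call with those of a succeeding one is usually the quickest way to see whether the intermittent failures come from the server, a proxy, or the client.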
Using Python, how can I check if a website is up? From what I read, I need to send an HTTP HEAD request and check for status code "200 OK", but how do I do so?
Cheers
You could try to do this with getcode() from urllib
import urllib.request
print(urllib.request.urlopen("https://www.stackoverflow.com").getcode())
200
For Python 2, use
print urllib.urlopen("http://www.stackoverflow.com").getcode()
200
I think the easiest way to do it is by using the Requests module.
import requests
def url_ok(url):
r = requests.head(url)
return r.status_code == 200
You can use httplib
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/")
r1 = conn.getresponse()
print r1.status, r1.reason
prints
200 OK
Of course, only if www.python.org is up.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
req = Request("http://stackoverflow.com")
try:
response = urlopen(req)
except HTTPError as e:
print('The server couldn\'t fulfill the request.')
print('Error code: ', e.code)
except URLError as e:
print('We failed to reach a server.')
print('Reason: ', e.reason)
else:
print ('Website is working fine')
Works on Python 3
import httplib
import socket
import re
def is_website_online(host):
    """ This function checks to see if a host name has a DNS entry by checking
    for socket info. If the website gets something in return,
    we know it's available to DNS.
    """
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return False
    else:
        return True

def is_page_available(host, path="/"):
    """ This function retrieves the status code of a website by requesting
    HEAD data from the host. This means that it only requests the headers.
    If the host cannot be reached or something else goes wrong, it returns
    False.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        # a raw string keeps \d from being read as a string escape
        return bool(re.match(r"^[23]\d\d$", str(conn.getresponse().status)))
    except StandardError:
        # match the docstring: any failure means the page is not available
        return False
The HTTPConnection object from the httplib module in the standard library will probably do the trick for you. BTW, if you start doing anything advanced with HTTP in Python, be sure to check out httplib2; it's a great library.
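Since `httplib` only exists in Python 2, here is a rough Python 3 counterpart that sends a HEAD request through `urllib.request`. It is a sketch under assumptions: the `is_up` name, the "any status below 400 counts as up" rule, and the injectable `opener` parameter are choices made for illustration.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to url gets a non-error status."""
    # method="HEAD" asks for headers only, so no body is downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # HTTPError (4xx/5xx), URLError (DNS failure, refused connection)
        # and socket timeouts all derive from OSError.
        return False
```

Some servers reject or mishandle HEAD, so a fallback to GET may be needed for those hosts.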
If the server is down, urllib on Python 2.7 x86 Windows has no timeout and the program hangs, so use urllib2 instead:
import urllib2
import socket
def check_url( url, timeout=5 ):
try:
return urllib2.urlopen(url,timeout=timeout).getcode() == 200
except urllib2.URLError as e:
return False
except socket.timeout as e:
return False
print check_url("http://google.fr") #True
print check_url("http://notexist.kc") #False
I use requests for this, then it is easy and clean.
Instead of the print function you can define and call a new function (notify via email, etc.). The try-except block is essential, because an unreachable host can raise many different exceptions, so you need to catch them all.
import requests
URL = "https://api.github.com"
try:
response = requests.head(URL)
except Exception as e:
print(f"NOT OK: {str(e)}")
else:
if response.status_code == 200:
print("OK")
else:
print(f"NOT OK: HTTP response code {response.status_code}")
You can use the requests library to check whether a website is up, i.e. whether it returns status code 200:
import requests
url = "https://www.google.com"
page = requests.get(url)
print (page.status_code)
>> 200
In my opinion, caisah's answer misses an important part of your question, namely dealing with the server being offline.
Still, using requests is my favorite option, albeit as such:
import requests
try:
requests.get(url)
except requests.exceptions.ConnectionError:
print(f"URL {url} not reachable")
If by up, you simply mean "the server is serving", then you could use cURL, and if you get a response then it's up.
I can't give you specific advice because I'm not a python programmer, however here is a link to pycurl http://pycurl.sourceforge.net/.
Hi, these functions run an availability and speed test for your web page:
from urllib.request import urlopen
from socket import socket
import time
def tcp_test(server_info):
cpos = server_info.find(':')
try:
sock = socket()
sock.connect((server_info[:cpos], int(server_info[cpos+1:])))
sock.close()
return True
except Exception as e:
return False
def http_test(server_info):
try:
# TODO : we can use this data after to find sub urls up or down results
startTime = time.time()
data = urlopen(server_info).read()
endTime = time.time()
speed = endTime - startTime
return {'status' : 'up', 'speed' : str(speed)}
except Exception as e:
return {'status' : 'down', 'speed' : str(-1)}
def server_test(test_type, server_info):
if test_type.lower() == 'tcp':
return tcp_test(server_info)
elif test_type.lower() == 'http':
return http_test(server_info)
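The TCP check above can be tightened a little. The variant below is a sketch, not a drop-in replacement: the `tcp_is_open` name and the 3-second default timeout are assumptions. It uses `socket.create_connection` so that the attempt has a timeout and the socket is always closed.

```python
import socket


def tcp_is_open(server_info, timeout=3.0):
    """Return True if a TCP connection to "host:port" can be established."""
    # rpartition splits on the last colon, so the port is whatever follows it.
    host, _, port = server_info.rpartition(":")
    try:
        # The with block closes the socket even if something fails later.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, unreachable or timed out.
        # ValueError: the port part is missing or not a number.
        return False
```

Without a timeout, a host that silently drops packets can block the check for a long time, which defeats the purpose of an up/down test.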
Requests and httplib2 are great options:
# Using requests.
import requests
request = requests.get(value)
if request.status_code == 200:
return True
return False
# Using httplib2.
import httplib2
try:
http = httplib2.Http()
response = http.request(value, 'HEAD')
if int(response[0]['status']) == 200:
return True
except:
pass
return False
If using Ansible, you can use the fetch_url function:
from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.urls import fetch_url
module = AnsibleModule(
dict(),
supports_check_mode=True)
try:
response, info = fetch_url(module, url)
if info['status'] == 200:
return True
except Exception:
pass
return False
my 2 cents
def getResponseCode(url):
conn = urllib.request.urlopen(url)
return conn.getcode()
if getResponseCode(url) != 200:
print('Wrong URL')
else:
print('Good URL')
Here's my solution using PycURL and validators
import pycurl, validators
def url_exists(url):
"""
Check if the given URL really exists
:param url: str
:return: bool
"""
if validators.url(url):
c = pycurl.Curl()
c.setopt(pycurl.NOBODY, True)
c.setopt(pycurl.FOLLOWLOCATION, False)
c.setopt(pycurl.CONNECTTIMEOUT, 10)
c.setopt(pycurl.TIMEOUT, 10)
c.setopt(pycurl.COOKIEFILE, '')
c.setopt(pycurl.URL, url)
try:
c.perform()
response_code = c.getinfo(pycurl.RESPONSE_CODE)
c.close()
return True if response_code < 400 else False
except pycurl.error as err:
errno, errstr = err.args
raise OSError('An error occurred: {}'.format(errstr))
else:
raise ValueError('"{}" is not a valid url'.format(url))
I have written a script in python that uses cookies and POST/GET. I also included proxy support in my script. However, when one enters a dead proxy, the script crashes. Is there any way to check if a proxy is dead/alive before running the rest of my script?
Furthermore, I noticed that some proxies don't handle cookies/POST headers properly. Is there any way to fix this?
The simplest way is to catch the IOError exception from urllib:
try:
urllib.urlopen(
"http://example.com",
proxies={'http':'http://example.com:8080'}
)
except IOError:
print "Connection error! (Check proxy)"
else:
print "All was fine"
Also, from this blog post - "check status proxy address" (with some slight improvements):
for python 2
import urllib2
import socket
def is_bad_proxy(pip):
try:
proxy_handler = urllib2.ProxyHandler({'http': pip})
opener = urllib2.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib2.install_opener(opener)
req=urllib2.Request('http://www.example.com') # change the URL to test here
sock=urllib2.urlopen(req)
except urllib2.HTTPError, e:
print 'Error code: ', e.code
return e.code
except Exception, detail:
print "ERROR:", detail
return True
return False
def main():
socket.setdefaulttimeout(120)
# two sample proxy IPs
proxyList = ['125.76.226.9:80', '213.55.87.162:6588']
for currentProxy in proxyList:
if is_bad_proxy(currentProxy):
print "Bad Proxy %s" % (currentProxy)
else:
print "%s is working" % (currentProxy)
if __name__ == '__main__':
main()
for python 3
import urllib.request
import socket
import urllib.error
def is_bad_proxy(pip):
try:
proxy_handler = urllib.request.ProxyHandler({'http': pip})
opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
req=urllib.request.Request('http://www.example.com') # change the URL to test here
sock=urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
print('Error code: ', e.code)
return e.code
except Exception as detail:
print("ERROR:", detail)
return True
return False
def main():
socket.setdefaulttimeout(120)
# two sample proxy IPs
proxyList = ['125.76.226.9:80', '25.176.126.9:80']
for currentProxy in proxyList:
if is_bad_proxy(currentProxy):
print("Bad Proxy %s" % (currentProxy))
else:
print("%s is working" % (currentProxy))
if __name__ == '__main__':
main()
Remember this could double the time the script takes if the proxy is down (as you will have to wait for two connection timeouts). Unless you specifically have to know that the proxy is at fault, handling the IOError is far cleaner, simpler, and quicker.
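The Python 3 checker above installs its opener globally with `install_opener`, which changes every later `urlopen` call in the process. A variant that keeps the proxy local to the check could look like the sketch below. The function names and the `opener_factory` parameter are hypothetical, and the test URL is the same placeholder used earlier.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Return an opener that routes HTTP and HTTPS through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener


def proxy_works(proxy, test_url="http://www.example.com", timeout=10.0,
                opener_factory=build_proxy_opener):
    """Return True if test_url can be fetched through the proxy in time."""
    opener = opener_factory(proxy)
    try:
        with opener.open(test_url, timeout=timeout):
            return True
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Passing an explicit `timeout` per check also avoids the process-wide `socket.setdefaulttimeout(120)`.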
You can use the Proxy-checker library, which is as simple as this:
from proxy_checker import ProxyChecker
checker = ProxyChecker()
checker.check_proxy('<ip>:<port>')
Output:
{
"country": "United States",
"country_code": "US",
"protocols": [
"socks4",
"socks5"
],
"anonymity": "Elite",
"timeout": 1649
}
It also lets you generate your own proxies and check them with two lines of code.
You can use an IP-getter website to find the IP from which your request is sent, then check whether that IP is your proxy IP or something else. Here is a script for that:
import requests
proxy_ip = "<IP>"
proxy_port = "<PORT>"
proxy_user = "<USERNAME>"
proxy_pass = "<PASSWORD>"
proxies = {
"http": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/",
"https": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/"
}
url = 'https://api.ipify.org'
try:
response = requests.get(url, proxies=proxies)
assert response.text==proxy_ip
except:
print("Proxy does not work")
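Proxy URLs put the credentials before an `@`, and a username or password containing characters such as `@` or `:` has to be percent-encoded or the URL is parsed incorrectly. The helper below is a small sketch of that; `build_proxies` is a made-up name, and it assumes an HTTP proxy that is used for both schemes, as in the script above.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None):
    """Return a requests-style proxies dict with safely encoded credentials."""
    auth = ""
    if user is not None:
        # safe="" makes quote() encode every reserved character.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = "http://{}{}:{}".format(auth, host, port)
    return {"http": proxy_url, "https": proxy_url}
```

The returned dict can be passed directly as the `proxies=` argument in the script above.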
I think that the better approach is like dbr said, handling the exception.
Another solution that could be better in some cases, is to use an external online proxy checker tool to check if a proxy server is alive and then continue using your script without any modification.
There is a nice package called Grab.
So, if that is OK for you, you can write something like this (a simple generator of valid proxies):
from grab import Grab, GrabError
def get_valid_proxy(proxy_list): #format of items e.g. '128.2.198.188:3124'
g = Grab()
for proxy in proxy_list:
g.setup(proxy=proxy, proxy_type='http', connect_timeout=5, timeout=5)
try:
g.go('google.com')
except GrabError:
#logging.info("Test error")
pass
else:
yield proxy