I have a script that fetches the HTTP headers of many pages on the Internet using httplib in Python.
My problem is that on a specific domain (and probably others), httplib raises an exception, and I don't understand why.
>>> import httplib
>>> http = httplib.HTTPConnection('iswtc.la')
>>> http.request('GET', '/a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/httplib.py", line 914, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.6/httplib.py", line 951, in _send_request
self.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/httplib.py", line 739, in send
self.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
What is different about this specific domain, and how can I handle this?
PS: It's not my code that's at fault, because this works fine:
>>> http = httplib.HTTPConnection('bit.ly')
>>> http.request('GET', '/a')
bit.ly exists, whereas iswtc.la doesn't:
$ nslookup bit.ly
Non-authoritative answer:
Name: bit.ly
Address: 69.58.188.39
Name: bit.ly
Address: 69.58.188.40
$ nslookup iswtc.la
** server can't find iswtc.la: NXDOMAIN
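So the difference is in DNS, not in httplib: `iswtc.la` does not exist (NXDOMAIN), so `getaddrinfo` fails before any HTTP traffic is sent, and httplib surfaces that as `socket.gaierror`. One way to handle it is to catch that exception explicitly. A minimal sketch for Python 3 (`http.client`); the helper name `head_status` is hypothetical:

```python
import http.client
import socket


def head_status(host, path="/", timeout=10):
    """Return the status of a HEAD request, or None if `host` does not resolve.

    Hypothetical helper: only DNS failures are swallowed; refused connections
    and timeouts still propagate to the caller.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except socket.gaierror:
        # Name lookup failed (e.g. NXDOMAIN): no HTTP exchange ever happened.
        return None
    finally:
        conn.close()
```

With this, `head_status("iswtc.la")` returns None for as long as the domain does not resolve, while `head_status("bit.ly")` returns a status code.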
UPDATE: I managed to do a request with urllib2, but I'm still wondering what is happening here.
I would like to make an HTTPS request with Python.
This works fine with the requests module, but I don't want to use external dependencies, so I'd like to use the standard library.
httplib
When I follow this example I don't get a response. I get a timeout instead. I'm out of ideas as to what would cause this.
Code:
import requests
print requests.get('https://python.org')
from httplib import HTTPSConnection
conn = HTTPSConnection('www.python.org')
conn.request('GET', '/index.html')
print conn.getresponse()
Output:
<Response [200]>
Traceback (most recent call last):
File "test.py", line 6, in <module>
conn.request('GET', '/index.html')
File "C:\Python27\lib\httplib.py", line 1069, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1109, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1065, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 892, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 854, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1282, in connect
HTTPConnection.connect(self)
File "C:\Python27\lib\httplib.py", line 831, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
socket.error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
urllib
This fails for a different (but possibly related) reason. Code:
import urllib
print urllib.urlopen("https://python.org")
Output:
Traceback (most recent call last):
File "test.py", line 10, in <module>
print urllib.urlopen("https://python.org")
File "C:\Python27\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 215, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 445, in open_https
h.endheaders(data)
File "C:\Python27\lib\httplib.py", line 1065, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 892, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 854, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1290, in connect
server_hostname=server_hostname)
File "C:\Python27\lib\ssl.py", line 369, in wrap_socket
_context=self)
File "C:\Python27\lib\ssl.py", line 599, in __init__
self.do_handshake()
File "C:\Python27\lib\ssl.py", line 828, in do_handshake
self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:727)
What is requests doing that makes it succeed where both of these libraries fail?
requests.get without a timeout parameter means no timeout at all.
httplib.HTTPSConnection accepts a timeout parameter in Python 2.6 and newer, according to the httplib docs. If your problem was caused by a timeout, setting a high enough timeout should help. Please try replacing:
conn = HTTPSConnection('www.python.org')
with:
conn = HTTPSConnection('www.python.org', timeout=300)
which will give 300 seconds (5 minutes) for processing.
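In Python 3 the module is `http.client`, and `HTTPSConnection` takes the same `timeout` argument. A minimal sketch, assuming you want a timeout reported as a value rather than raised; `https_status` is a hypothetical helper name. Note that the timeout applies to each blocking socket operation (connect, TLS handshake, each read), not to the request as a whole:

```python
import http.client
import socket


def https_status(host, path="/", port=None, timeout=300):
    """HEAD `path` over HTTPS; return the status code, or None on timeout."""
    conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except socket.timeout:
        # Some single blocking operation took longer than `timeout` seconds.
        return None
    finally:
        conn.close()
```

For example, `https_status("www.python.org")` waits up to 300 seconds per operation before giving up.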
I want to get the URL status of websites with the code below. For one website (webscraper.io), I get an error. My script is:
import httplib
url = "http://webscraper.io/"
if 'http' in url:
    url = url.replace('http://', '').strip()
conn = httplib.HTTPConnection(url)
conn.request("GET",'')
r1 = conn.getresponse()
print 'r1.Status code=', r1.status
I get the following error:
Traceback (most recent call last):
File "TestSatusline.py", line 23, in <module>
conn.request("GET",'')
File "/usr/lib/python2.7/httplib.py", line 1017, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 807, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
Does anybody have any idea?
Thanks
After
if 'http' in url:
    url = url.replace('http://', '').strip()
in your code, url is webscraper.io/ (with a trailing slash), but it should be webscraper.io.
Use urlparse:
import httplib
import urlparse
url = "http://webscraper.io/"
o = urlparse.urlparse(url)
conn = httplib.HTTPConnection(o.netloc)
conn.request("GET",'')
r1 = conn.getresponse()
print 'r1.Status code=', r1.status
Output:
r1.Status code= 200
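`HTTPConnection` expects a bare host (optionally `host:port`), so with the trailing slash left in, the name looked up in DNS is literally `webscraper.io/`. The same `urlparse` idea in Python 3, where it lives in `urllib.parse`, also keeping the path and query for the request line; `split_target` is a hypothetical name:

```python
from urllib.parse import urlsplit


def split_target(url):
    """Split a URL into what http.client wants: (host[:port], path[?query])."""
    parts = urlsplit(url)
    # netloc would also include any "user:password@" prefix; plain URLs have none.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


# The replace()-based approach from the question keeps the trailing slash:
broken_host = "http://webscraper.io/".replace("http://", "").strip()
```

The returned host can be passed straight to `http.client.HTTPConnection`, which splits off a `:port` suffix itself.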
You could also take a look at requests: http://docs.python-requests.org/en/master/
I need to send a simple HEAD request to every URL in a large set to check whether they are still available. I wrote the following code:
from http import client
for i, triple in enumerate(catalouge):
    connection = client.HTTPConnection(triple[2].strip('http://'))
    connection.request('HEAD', '/')
    print(connection.getresponse().status + ' on entry ' + str(i+1))
Here catalouge is a set of all the links, where the 3rd element of each entry is the URL to check. The .strip('http://') part is needed because otherwise I receive this error:
http.client.InvalidURL: nonnumeric port:
With this code in place I now receive this error:
Traceback (most recent call last):
[...]
connection.request('HEAD', '/')
File "/usr/lib/python3.4/http/client.py", line 1137, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1182, in _send_request
self.endheaders(body)
File "/usr/lib/python3.4/http/client.py", line 1133, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.4/http/client.py", line 963, in _send_output
self.send(msg)
File "/usr/lib/python3.4/http/client.py", line 898, in send
self.connect()
File "/usr/lib/python3.4/http/client.py", line 871, in connect
self.timeout, self.source_address)
File "/usr/lib/python3.4/socket.py", line 494, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib/python3.4/socket.py", line 533, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Did I miss something? Any suggestions would be much appreciated.
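`str.strip('http://')` does not remove a prefix: it strips any of the characters `h`, `t`, `p`, `:` and `/` from both ends, so it can eat into the hostname (`http://photos.com/` becomes `otos.com`) or leave a path attached; either way the result is not a resolvable hostname. The `nonnumeric port` error quoted above comes from passing the whole URL, because `http.client` treats everything after the last colon as the port. A minimal Python 3 sketch that parses each URL with `urllib.parse.urlsplit`, picks HTTP or HTTPS from the scheme, and records failures per entry instead of aborting the loop; `check_url` and `check_catalogue` are hypothetical names, and `catalouge` is assumed to hold triples as in the question:

```python
import http.client
from urllib.parse import urlsplit


def check_url(url, timeout=10):
    """Send a HEAD request to `url`; return the status code or a short error string."""
    parts = urlsplit(url.strip())  # strip() with no argument only trims whitespace
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        return "unsupported scheme: %r" % parts.scheme
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = None
    try:
        # netloc is "host" or "host:port"; http.client splits off the port itself.
        conn = conn_class(parts.netloc, timeout=timeout)
        conn.request("HEAD", target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers socket.gaierror (DNS failure), refused connections and timeouts.
        return "%s: %s" % (type(exc).__name__, exc)
    finally:
        if conn is not None:
            conn.close()


def check_catalogue(catalouge):
    """Print one line per entry; the URL is the third element of each triple."""
    for i, triple in enumerate(catalouge, start=1):
        print("%s on entry %d" % (check_url(triple[2]), i))
```

Entries without a scheme (e.g. a bare `example.com`) are reported as unsupported rather than guessed at.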
Using the Apache Libcloud docs and valid credentials, I get the error below when trying to list domains on GoDaddy. Does libcloud no longer support GoDaddy?
>>> from libcloud.dns.types import Provider
>>> from libcloud.dns.providers import get_driver
>>> cls = get_driver(Provider.GODADDY)
>>> driver = cls('twst', 'adfadf', 'dsdfsdf')
>>> zones = driver.list_zones()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/libcloud/dns/drivers/godaddy.py", line 146, in list_zones
'/v1/domains/').object
File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 782, in request
headers=headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/local/lib/python2.7/dist-packages/libcloud/httplib_ssl.py", line 266, in connect
self.timeout)
File "/usr/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
>>>
It looks like there was a bug in the driver. The "host" attribute on the connection class was incorrectly set to a URL instead of a hostname.
I pushed a fix for that - https://github.com/apache/libcloud/commit/a3ba6a4751623224f16175df9175ec06b29cdc1a
You can test this change by installing the latest development version from git using pip: pip install git+https://git-wip-us.apache.org/repos/asf/libcloud.git#trunk#egg=apache-libcloud
I have confirmed the change is working locally, but if you encounter any more issues, please let us know.
from libcloud.dns.types import Provider
from libcloud.dns.providers import get_driver
cls = get_driver(Provider.GODADDY)
driver = cls('twst', 'adfadf', 'dsdfsdf')
print driver.list_zones()
...
libcloud.dns.drivers.godaddy.GoDaddyDNSException: <GoDaddyDNSException in MALFORMED_API_KEY: Malformed API key>
In addition, I will push a change so that a friendlier exception is thrown when the "host" attribute is set to a value that is not a hostname.
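A check of the kind described above could look like the following sketch. It is not libcloud's actual code: `ensure_hostname` is a hypothetical helper that rejects values which cannot be hostnames before they ever reach `socket.getaddrinfo`, and suggests the hostname when it was given a URL:

```python
from urllib.parse import urlsplit


def ensure_hostname(value):
    """Return `value` if it looks like a bare hostname; raise ValueError otherwise.

    Hypothetical check: a URL such as "https://api.example.com/" contains "/",
    which never appears in a hostname (IPv6 literals use ":" but not "/").
    """
    if not value or "/" in value or any(ch.isspace() for ch in value):
        message = "expected a hostname, got %r" % (value,)
        suggestion = urlsplit(value).hostname if "://" in value else None
        if suggestion:
            message += "; did you mean %r?" % (suggestion,)
        raise ValueError(message)
    return value
```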
I am using
httplib.HTTPConnection ("http://ipaddr:port")
conn.request("GET", "", params, headers)
I am able to do PUT/GET on ipaddr:port using my Firefox client.
But I see this error when running the script:
File "post_python.py", line 5, in <module>
conn.request("GET", "", params, headers)
File "/usr/lib64/python2.6/httplib.py", line 914, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.6/httplib.py", line 951, in _send_request
self.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/httplib.py", line 739, in send
self.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
Can someone please help me?
Try this instead (without "http://" before the IP address):
conn = httplib.HTTPConnection("x.x.x.x", port)
conn.request("GET", "", params, headers)
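The reason is that `HTTPConnection` takes a host (optionally `host:port`), not a URL. With a scheme in front, everything before the last colon, including `http://`, is kept as the hostname and handed to `getaddrinfo`, which fails. A quick Python 3 check; it only constructs the connection objects and never connects, and `10.0.0.1` is a placeholder address:

```python
import http.client

# Passing a URL: the host that will be resolved still contains the scheme.
bad = http.client.HTTPConnection("http://10.0.0.1:8080")
print(bad.host, bad.port)    # http://10.0.0.1 8080

# Passing host and port separately gives a resolvable host.
good = http.client.HTTPConnection("10.0.0.1", 8080)
print(good.host, good.port)  # 10.0.0.1 8080
```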
You might have a proxy in between that the browser already knows about. If you're on Linux, try setting the http_proxy environment variable.
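Note that httplib (and Python 3's `http.client`) never reads `http_proxy` on its own; higher-level code such as urllib does. To go through a plain-HTTP proxy with `http.client`, one approach is to connect to the proxy and send the absolute URL as the request target. A sketch, assuming the variable holds a full URL such as `http://proxy.example:3128`; `proxied_target` is a hypothetical helper, and `no_proxy` is not consulted:

```python
import urllib.request
from urllib.parse import urlsplit


def proxied_target(url):
    """Return (connect_host, connect_port, request_target) for a plain-HTTP URL.

    With an http proxy configured in the environment, connect to the proxy and
    use the full URL as the request target; otherwise connect directly and send
    only the path. no_proxy is ignored in this sketch.
    """
    parts = urlsplit(url)
    proxy = urllib.request.getproxies().get("http")
    if proxy:
        proxy_parts = urlsplit(proxy)
        return proxy_parts.hostname, proxy_parts.port or 80, url
    return parts.hostname, parts.port or 80, parts.path or "/"
```

The first two values go to `http.client.HTTPConnection(host, port)` and the third to `conn.request("GET", target)`.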
If it's an IPv6 address, you need to surround it with brackets as per RFC 2732. If I recall correctly, that's the error message you get if you don't use brackets.
conn = httplib.HTTPConnection("[::1]:8080")
conn.request("GET", "", params, headers)
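What the brackets change can be seen from how Python 3's `http.client` splits its `host` argument: the text after the last colon is taken as the port unless that colon sits inside `[...]`, and the brackets are stripped afterwards. The objects below are only constructed; no connection is made:

```python
import http.client

# Bracketed literal: the port is parsed after "]" and the brackets are removed.
c = http.client.HTTPConnection("[::1]:8080")
print(c.host, c.port)  # ::1 8080

# Unbracketed literal: the final ":1" is mistaken for the port.
c = http.client.HTTPConnection("::1")
print(c.host, c.port)  # : 1
```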