I am using the common_crawl_index library for Python from here to get some data from S3.
When I run the command cci_lookup com.abc, I get the error below.
I still get the output, which is the list of URLs for the lookup domain, but I do not understand why the error happens.
MainThread:2014-12-19 01:48:16,150:ERROR:utils:224 Caught exception reading instance data
Traceback (most recent call last):
File "/home/deploy/anaconda/lib/python2.7/site-packages/boto/utils.py", line 211, in retry_url
r = opener.open(req)
File "/home/deploy/anaconda/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/home/deploy/anaconda/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/home/deploy/anaconda/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/home/deploy/anaconda/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/home/deploy/anaconda/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
com.abc.www/forum/index.php/groupcp.php:http
com.abc.www/forum/index.php/includes/MLOtvHD:http
com.abc.www/forum/index.php/includes/blog.php:http
com.abc.www/forum/index.php/includes/content.php:http
com.abc.www/forum/index.php/includes/index.php:http
I hope someone can help!
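For what it's worth, the traceback comes from boto (boto/utils.py, retry_url) timing out while probing the EC2 instance-metadata service, which is expected on machines that are not EC2 instances; the lookup itself still completes, which is why the URL list prints anyway. If you just want to silence that log noise, a minimal sketch:
import logging

# boto logs the failed instance-metadata probe at ERROR level;
# raising the bar on its logger hides it without affecting the lookup
logging.getLogger('boto').setLevel(logging.CRITICAL)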
I am trying to connect to an API using Python 2.7.
Code:
from urllib import urlencode
import urllib2
def http_post(url, data):
    post = urlencode(data)
    req = urllib2.Request(url, post)
    response = urllib2.urlopen(req)
    return response.read()
Error:
>>> r = http_post(LOGIN_URL, PARAMS)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in http_post
File "/usr/local/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno -5] No address associated with hostname>
Similar code runs fine in Python 3.5.
It looks like the hostname is not being resolved.
Have you defined LOGIN_URL just above the output we see in your "Error:" extract?
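Since [Errno -5] is a DNS resolution failure, a quick sanity check is to see whether the hostname resolves at all. A minimal sketch (api.example.com is a placeholder; use the hostname from your LOGIN_URL):
import socket

try:
    socket.gethostbyname('api.example.com')  # placeholder hostname
except socket.gaierror as e:
    print e  # the same resolver failure urllib2 wraps in URLError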
I am having trouble accessing HTTPS sites using urllib2. Here's what I've got:
import urllib2
import ssl

proxy = urllib2.ProxyHandler({'https': 'http://username:password@proxy:port',
                              'http': 'http://username:password@proxy:port'})
opener = urllib2.build_opener(urllib2.HTTPHandler(),
                              urllib2.HTTPSHandler(),
                              proxy)
urllib2.install_opener(opener)

url_secure = 'https://www.google.com'
url_nonsecure = 'http://www.google.com'

response = urllib2.urlopen(url_nonsecure)
d = response.read()
response.close()
print d
The above runs without issue. However, if I try the same thing using
response = urllib2.urlopen(url_secure)
I get
Traceback (most recent call last):
File "google_maps.py", line 25, in <module>
response = urllib2.urlopen(url_secure)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 1241, in https_open
context=self._context)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>
From this and this it seemed like the problem could be solved by creating a new context. So I tried both:
ctx=ssl._create_unverified_context()
and
ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
and ran
response = urllib2.urlopen(url_secure, context=ctx)
d = response.read()
response.close()
print d
which both threw the following error:
Traceback (most recent call last):
File "google_maps.py", line 25, in <module>
response = urllib2.urlopen(url_secure, context=ctx)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 1241, in https_open
context=self._context)
File "/Users/username/anaconda2/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
How can I access HTTPS sites?
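One detail that may explain the second error: when you pass context= to urllib2.urlopen, urllib2 builds a fresh opener just for that call, so the proxy opener installed earlier is bypassed and the request tries to go direct; on a proxied network, that direct attempt fails DNS. A sketch (assuming Python 2.7.9+, where HTTPSHandler accepts a context) that keeps the proxy and the custom context in one opener:
import ssl
import urllib2

ctx = ssl._create_unverified_context()  # or a properly configured SSLContext
proxy = urllib2.ProxyHandler({'https': 'http://username:password@proxy:port',
                              'http': 'http://username:password@proxy:port'})
opener = urllib2.build_opener(urllib2.HTTPSHandler(context=ctx), proxy)
urllib2.install_opener(opener)

# urlopen() with no extra arguments now uses both the proxy and the context
response = urllib2.urlopen('https://www.google.com')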
I was building my project on a Mac, and I tried to do the same thing on a BeagleBone Black (BBB).
However, I can't use urllib on the BBB, so I am stuck and cannot move forward (it works fine on my Mac).
I tried this simple code as an example:
import urllib
conn = urllib.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')
Then this error occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
return opener.open(url)
File "/usr/lib/python2.7/urllib.py", line 207, in open
return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 351, in open_http
'got a bad status line', None)
IOError: ('http protocol error', 0, 'got a bad status line', None)
I need to fetch HTML data for my project.
How can I solve this problem? Do you have any ideas?
Thank you.
When I tried urllib2, I got this:
>>> import urllib2
>>> conn = urllib2.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
I also tried this:
curl http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content
curl: (52) Empty reply from server
and this:
wget http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content
Connecting to stackoverflow.com (198.252.206.16:80)
wget: error getting response
but they didn't work either.
At home I also tried and failed, but with a different error:
conn = urllib2.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
Environment:
BBB: Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
Python version: 2.7.3
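Since curl and wget fail from the BBB as well, the problem is likely at the network level rather than in Python specifically, but one cheap thing to rule out is User-Agent filtering, since some servers drop connections from default client strings. A sketch (the header value is arbitrary):
import urllib2

req = urllib2.Request(
    'http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content',
    headers={'User-Agent': 'Mozilla/5.0'})  # pretend to be a browser
print urllib2.urlopen(req).read()[:200]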
I really want to recommend the requests library:
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
http://www.python-requests.org/en/latest/
How to install:
sudo pip install requests
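Applied to the URL from your question, that would look something like:
import requests

r = requests.get('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')
print r.status_code  # 200 if the fetch worked
print r.text[:200]   # first chunk of the HTML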
I'm using this code to get a web site's HTML content:
import urllib.request
import lxml.html as lh
req = urllib.request.Request("http://www.ip-adress.com/ip_tracer/157.123.22.11",
                             headers={'User-Agent': "Magic Browser"})
html = urllib.request.urlopen(req).read()
doc = lh.fromstring(html)
print(''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split()))
I want to get the organization name, "Zenith Data Systems", but it shows this error:
Traceback (most recent call last):
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1135, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 967, in request
self._send_request(method, url, body, headers)
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 1005, in _send_request
self.endheaders(body)
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 963, in endheaders
self._send_output(message_body)
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 808, in _send_output
self.send(msg)
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 746, in send
self.connect()
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 724, in connect
self.timeout, self.source_address)
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 404, in create_connection
raise err
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 395, in create_connection
sock.connect(sa)
socket.error: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ext.py", line 4, in <module>
html = urllib.request.urlopen(req).read()
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 369, in open
response = self._open(req, data)
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 387, in _open
'_open', req)
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 347, in _call_chain
result = func(*args)
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1155, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1138, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
How can I solve it? Thanks.
Basically, "Connection refused" means the server refused the connection: perhaps only registered users are allowed to access the page, the server is under heavy maintenance, or something similar.
Building on your code above, if you want to handle errors you can use try and except, like the code below:
try:
    req = urllib.request.Request("http://www.ip-adress.com/ip_tracer/157.123.22.11",
                                 headers={'User-Agent': "Magic Browser"})
    html = urllib.request.urlopen(req).read()
    doc = lh.fromstring(html)
    print(''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split()))
except urllib.error.URLError as e:
    print(e.reason)
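Note that urllib.error.HTTPError is a subclass of URLError, so the handler above also catches HTTP-level failures; if you want to report them separately, catch HTTPError first. A small variation:
try:
    html = urllib.request.urlopen(req).read()
except urllib.error.HTTPError as e:   # HTTP-level errors such as 403 or 503
    print(e.code, e.reason)
except urllib.error.URLError as e:    # connection-level errors such as [Errno 111]
    print(e.reason)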
I tried running this:
>>> urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
But it gives the error below; can anyone tell me a solution?
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
File "C:\Python26\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python26\lib\urllib2.py", line 391, in open
response = self._open(req, data)
File "C:\Python26\lib\urllib2.py", line 409, in _open
'_open', req)
File "C:\Python26\lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "C:\Python26\lib\urllib2.py", line 1161, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python26\lib\urllib2.py", line 1136, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 11001] getaddrinfo failed>
Double-check whether the domain is accessible.
At the moment, I am getting a 504 Gateway Timeout error for the domain tycho.usno.navy.mil.
It looks like the site is down; downforeveryoneorjustme.com also says:
It's not just you!
http://tycho.usno.navy.mil looks down
from here.
That's why getaddrinfo is failing.
Wrapping it in try..except could help keep it neat:
try:
    urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
except urllib2.URLError:  # qualified with the module so the name is defined
    print "Error opening URL"