I am learning about urllib2 by following this tutorial: http://docs.python.org/howto/urllib2.html#urlerror. Running the code below yields a different outcome from the tutorial.
import urllib2
req = urllib2.Request('http://www.pretend-o-server.org')
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    print e.reason
The Python interpreter spits this back:
Traceback (most recent call last):
  File "urlerror.py", line 8, in <module>
    print e.reason
AttributeError: 'HTTPError' object has no attribute 'reason'
Why is this happening?
UPDATE
When I try to print out the code attribute instead, it works fine:
import urllib2
req = urllib2.Request('http://www.pretend-o-server.org')
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    print e.code
Depending on the error type, the object e may or may not carry that attribute.
In the link you provided, there is a more complete example:
Number 2
from urllib2 import Request, urlopen, URLError
req = Request(someurl)
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'): # <--
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'): # <--
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    # everything is fine
    pass
Because there is no such attribute. Try:
print str(e)
and you will get a nice:
HTTP Error 404: Not Found
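Since HTTPError is a subclass of URLError, another option is to catch it explicitly before the more general case. A minimal sketch in the same Python 2 style, reusing the made-up hostname from the question:

import urllib2

req = urllib2.Request('http://www.pretend-o-server.org')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:   # subclass of URLError, so catch it first
    print 'The server returned error code', e.code
except urllib2.URLError, e:
    print 'Failed to reach the server:', e.reason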
The reason I got the AttributeError is that I was using OpenDNS. Apparently, even when you pass in a bogus URL, OpenDNS treats it as if it exists. After switching to Google's DNS server, I am getting the expected result, which is:
[Errno -2] Name or service not known
I should also mention the traceback I got from running this code (which is everything excluding the try and except):
from urllib2 import Request, urlopen, URLError, HTTPError
req = Request('http://www.pretend_server.com')
urlopen(req)
The traceback is:
Traceback (most recent call last):
  File "urlerror.py", line 5, in <module>
    urlopen(req)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
A kind gentle(wo)man from IRC #python told me this was highly strange, then asked whether I was using OpenDNS, to which I replied yes. They suggested I switch to Google's DNS, which I proceeded to do.
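If you want to check what your resolver is doing independently of urllib2, here is a minimal sketch using the standard socket module (same made-up hostname as above; Python 2 syntax):

import socket

try:
    # A resolver that hijacks NXDOMAIN responses (as OpenDNS can) will
    # print an IP address here instead of raising an error.
    print socket.gethostbyname('www.pretend-o-server.org')
except socket.gaierror, e:
    print 'Name resolution failed:', e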
Related
I'm trying to make a request to the GitHub API with Python 3 urllib to create a release, but I made some mistake and it fails with an exception:
Traceback (most recent call last):
  File "./a.py", line 27, in <module>
    'Authorization': 'token ' + token,
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 422: Unprocessable Entity
GitHub, however, is nice and explains why it failed in the response body, as shown at: 400 vs 422 response to POST of data.
So, how do I read the response body? Is there a way to prevent the exception from being raised?
I've tried to catch the exception and explore it in ipdb, which gives an object of type urllib.error.HTTPError but I couldn't find that body data there, only headers.
The script:
#!/usr/bin/env python3
import json
import os
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen
repo = sys.argv[1]
tag = sys.argv[2]
upload_file = sys.argv[3]
token = os.environ['GITHUB_TOKEN']
url_template = 'https://{}.github.com/repos/' + repo + '/releases'
# Create.
_json = json.loads(urlopen(Request(
    url_template.format('api'),
    json.dumps({
        'tag_namezxcvxzcv': tag,
        'name': tag,
        'prerelease': True,
    }).encode(),
    headers={
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + token,
    },
)).read().decode())
# This is not the tag, but rather some database integer identifier.
release_id = _json['id']
Related: Can someone give a python requests example of uploading a release asset in github?
The HTTPError has a read() method that allows you to read the response body. So in your case, you should be able to do something such as:
import urllib.error

try:
    body = urlopen(Request(
        url_template.format('api'),
        json.dumps({
            'tag_namezxcvxzcv': tag,
            'name': tag,
            'prerelease': True,
        }).encode(),
        headers={
            'Accept': 'application/vnd.github.v3+json',
            'Authorization': 'token ' + token,
        },
    )).read().decode()
except urllib.error.HTTPError as e:
    body = e.read().decode()  # read the body of the error response
_json = json.loads(body)
The docs explain in more detail how the HTTPError instance can be used as a response, and some of its other attributes.
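For illustration, a minimal standalone sketch of what the error object exposes. The URL is a made-up endpoint chosen only because it returns an error, and I'm assuming GitHub's usual JSON error body with a message field:

import json
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('https://api.github.com/no-such-endpoint')
except urllib.error.HTTPError as e:
    print(e.code)                       # the status code, e.g. 404 or 422
    print(e.headers['Content-Type'])    # response headers are available too
    body = e.read().decode()            # the response body the server sent
    print(json.loads(body)['message'])  # GitHub error bodies are JSON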
I've written the following code in Python that goes to each URL in an array and finds specific info about that page - a web scraper of sorts. This one takes in an array of Reddit threads and outputs the score of each thread. The program almost never executes completely. Usually, I'll get through 5 or so iterations before receiving the error message below. Could someone please help me get to the bottom of this?
import urllib2
from bs4 import BeautifulSoup
urls = ['http://www.reddit.com/r/videos/comments/1i12o2/soap_precursor_to_a_lot_of_other_hilarious_shows/', 'http://www.reddit.com/r/videos/comments/1i12nx/kid_reporter_interviews_ryan_reynolds/', 'http://www.reddit.com/r/videos/comments/1i12ml/just_my_two_boys_going_full_derp_shocking_plot/']
for x in urls:
    f = urllib2.urlopen(x)
    data = f.read()
    soup = BeautifulSoup(data)
    span = soup.find('span', attrs={'class':'number'})
    print '{}:{}'.format(x, span.text)
The error message I am getting is:
Traceback (most recent call last):
  File "C:/Users/jlazarus/Documents/YouTubeparse2.py", line 7, in <module>
    f = urllib2.urlopen(x)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown
Ignore it with a try/except rule to catch the error. This is what you want if you just want to skip past the error:
import urllib2
from bs4 import BeautifulSoup
urls = ['http://www.reddit.com/r/videos/comments/1i12o2/soap_precursor_to_a_lot_of_other_hilarious_shows/', 'http://www.reddit.com/r/videos/comments/1i12nx/kid_reporter_interviews_ryan_reynolds/', 'http://www.reddit.com/r/videos/comments/1i12ml/just_my_two_boys_going_full_derp_shocking_plot/']
for x in urls:
    try:
        f = urllib2.urlopen(x)
        data = f.read()
        soup = BeautifulSoup(data)
        span = soup.find('span', attrs={'class':'number'})
        print '{}:{}'.format(x, span.text)
    except urllib2.HTTPError:  # HTTPError lives in the urllib2 namespace
        print 'HTTP Error, continuing'
My code is as follows, but when it runs it throws an error.
search_request = urllib2.Request(url,data=tmp_file_name,headers={'X-Requested-With':'WoMenShi888XMLHttpRequestWin'})
#print search_request.get_method()
search_response = urllib2.urlopen(search_request)
html_data = search_response.read()
the error is:
Traceback (most recent call last):
  File "xx_tmp.py", line 83, in <module>
    print hello_lfi()
  File "xx_tmp.py", line 69, in hello_lfi
    search_response = urllib2.urlopen(search_request)
  File "D:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "D:\Python27\lib\urllib2.py", line 406, in open
    response = meth(req, response)
  File "D:\Python27\lib\urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python27\lib\urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "D:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "D:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
I don't know how to fix it. I mean, when an error happens, how can my code continue to work?
When I try using
try:
    search_response = urllib2.urlopen(search_request)
except urllib2.HTTPError:
    pass
I get a new error:
UnboundLocalError: local variable 'search_response' referenced before assignment
When I use
global search_response
I get this error:
NameError: global name 'search_response' is not defined
You can catch the exception; this will prevent your program from stopping so abruptly:
try:
    search_response = urllib2.urlopen(search_request)
except urllib2.HTTPError:
    print 'There was an error with the request'
If you want to continue, you can simply:
try:
    search_response = urllib2.urlopen(search_request)
except urllib2.HTTPError:
    pass
This will allow your program to continue, but your other statement html_data = search_response.read() won't give you the expected result. To fix this problem permanently, you need to debug your request to see why it's failing; this isn't something specific to Python. One way to keep the rest of the code working in the meantime is sketched below.
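To avoid the UnboundLocalError from the follow-up, one option is to give the variable a default before the try and guard the read. A minimal sketch:

search_response = None
try:
    search_response = urllib2.urlopen(search_request)
except urllib2.HTTPError, e:
    print 'Request failed with code', e.code

if search_response is not None:
    html_data = search_response.read()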
I had the same error when I was trying to send a large post request to my GAE Python server. It turns out the server threw the error because I was trying to write the received POST string into a db.StringProperty(). I changed that to db.TextProperty() and it didn't throw the error anymore.
Source: Overcome appengine 500 byte string limit in python.
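For context, a minimal sketch of that change on the old GAE db API. The model and property names here are my own; db.StringProperty only accepts short strings (on the order of 500 bytes), while db.TextProperty has no such limit but cannot be indexed:

from google.appengine.ext import db

class Submission(db.Model):
    # db.StringProperty rejects long values; db.TextProperty does not,
    # at the cost of not being indexable in queries.
    payload = db.TextProperty()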
I am trying to connect to a REST resource and retrieve the data using a Python script (Python 3.2.3). When I run the script, I get the error HTTP Error 401: Unauthorized. Note that I am able to access the given REST resource using a REST client with Basic Authentication. In the REST client I have specified the hostname, user, and password details (a realm is not required).
Below is the code and complete error. Your help is very much appreciated.
Code:
import urllib.request
# set up authentication info
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
                          uri=r'http://hostname/',
                          user='administrator',
                          passwd='administrator')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
res = opener.open(r'http://hostname:9004/apollo-api/nodes')
nodes = res.read()
Error
Traceback (most recent call last):
  File "C:\Python32\scripts\get-nodes.py", line 12, in <module>
    res = opener.open(r'http://tolowa.wysdm.lab.emc.com:9004/apollo-api/nodes')
  File "C:\Python32\lib\urllib\request.py", line 375, in open
    response = meth(req, response)
  File "C:\Python32\lib\urllib\request.py", line 487, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python32\lib\urllib\request.py", line 413, in error
    return self._call_chain(*args)
  File "C:\Python32\lib\urllib\request.py", line 347, in _call_chain
    result = func(*args)
  File "C:\Python32\lib\urllib\request.py", line 495, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Try giving the correct realm name. You can find it out, for example, by opening the page in a browser: the password prompt should display the realm.
You can also read the realm by catching the exception that was raised:
import urllib.error
import urllib.request
# set up authentication info
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
                          uri=r'http://hostname/',
                          user='administrator',
                          passwd='administrator')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
try:
    res = opener.open(r'http://hostname:9004/apollo-api/nodes')
    nodes = res.read()
except urllib.error.HTTPError as e:
    print(e.headers['www-authenticate'])
You should get the following output:
Basic realm="The realm you are after"
Read the realm from the output above, set it in your add_password call, and it should be good to go.
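Putting it together, a sketch with the realm filled in. The realm string and hostname are placeholders; note also that the uri prefix should match the URL you actually open, including the port:

import urllib.request

auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='The realm you are after',  # from the output above
                          uri=r'http://hostname:9004/',     # prefix must match the request URL
                          user='administrator',
                          passwd='administrator')
opener = urllib.request.build_opener(auth_handler)
res = opener.open(r'http://hostname:9004/apollo-api/nodes')
nodes = res.read()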
I installed Python 2.6.2 earlier on a Windows XP machine and ran the following code:
import urllib2
import urllib
page = urllib2.Request('http://www.python.org/fish.html')
urllib2.urlopen( page )
I get the following error.
Traceback (most recent call last):
  File "C:\Python26\test3.py", line 6, in <module>
    urllib2.urlopen( page )
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 383, in open
    response = self._open(req, data)
  File "C:\Python26\lib\urllib2.py", line 401, in _open
    '_open', req)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 1130, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python26\lib\urllib2.py", line 1105, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 11001] getaddrinfo failed>
import urllib2
response = urllib2.urlopen('http://www.python.org/fish.html')
html = response.read()
You're doing it wrong.
Have a look in the urllib2 source, at the line specified by the traceback:
File "C:\Python26\lib\urllib2.py", line 1105, in do_open
raise URLError(err)
There you'll see the following fragment:
try:
    h.request(req.get_method(), req.get_selector(), req.data, headers)
    r = h.getresponse()
except socket.error, err: # XXX what error?
    raise URLError(err)
So it looks like the source is a socket error, not an HTTP protocol-related error. Possible reasons: you are not online, you are behind a restrictive firewall, your DNS is down, ...
All this is aside from the fact, as mcandre pointed out, that your code is wrong.
Name resolution error.
getaddrinfo is used to resolve the hostname (python.org) in your request. If it fails, it means that the name could not be resolved, because (a quick standalone test is sketched after this list):
It does not exist, or the records are outdated (unlikely; python.org is a well-established domain name)
Your DNS server is down (unlikely; if you can browse other sites, you should be able to fetch that page through Python)
A firewall is blocking Python or your script from accessing the Internet (most likely; Windows Firewall sometimes does not ask you if you want to allow an application)
You live on an ancient voodoo cemetery. (unlikely; if that is the case, you should move out)
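A minimal sketch to test name resolution directly, bypassing urllib2 (Python 2 syntax to match the question):

import socket

try:
    # Succeeds if the name resolves; raises socket.gaierror otherwise.
    # On Windows, errno 11001 corresponds to "host not found".
    print socket.getaddrinfo('www.python.org', 80)
except socket.gaierror, e:
    print 'Name resolution failed:', e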
Windows Vista, python 2.6.2
It's a 404 page, right?
>>> import urllib2
>>> import urllib
>>>
>>> page = urllib2.Request('http://www.python.org/fish.html')
>>> urllib2.urlopen( page )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>>
First, I see no reason to import urllib; I've only ever seen urllib2 used to replace urllib entirely and I know of no functionality that's useful from urllib and yet is missing from urllib2.
Next, I notice that http://www.python.org/fish.html gives a 404 error to me. (That doesn't explain the traceback/exception you're seeing; I get urllib2.HTTPError: HTTP Error 404: Not Found.)
Normally, if you just want to do a default fetch of a web page (without adding special HTTP headers or doing any sort of POST, etc.), then the following suffices:
req = urllib2.urlopen('http://www.python.org/')
html = req.read()
# and req.close() if you want to be pedantic