I keep getting HTTP Error 400: Bad Request from urlopen - python

I'm studying Python on Udacity, but because I'm using a different Python version I got stuck on the profanity-editor exercise. This is my code:
import urllib.request

def readdocument(x):
    document = open(x)
    profanitycheck(document.read())
    document.close()

def profanitycheck(urcontent):
    q = urllib.request.Request("http://www.wdylike.appspot.com/?q=" + urcontent)
    with urllib.request.urlopen(q) as content2:
        output = content2.read()
        print(output)

filelocate = r"C:\Users\Sutthikiat\Desktop\movie_quotes.txt"
readdocument(filelocate)
This is the text file:
-- Houston, we have a problem. (Apollo 13)
-- Mama always said, life is like a box of chocolates. You never know what you are going to get. (Forrest Gump)
-- You cant handle the truth. (A Few Good Men)
-- I believe everything and I believe nothing. (A Shot in the Dark)
But when I create a new text file and test with that, it runs properly, so I don't understand why my code gets this error. Maybe it's something about an exception?
This is the error:
Traceback (most recent call last):
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 13, in <module>
readdocument(filelocate)
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 4, in readdocument
profanitycheck(document.read())
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 8, in profanitycheck
with urllib.request.urlopen(q) as content2:
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

The server thinks the URL is invalid, just as in this example:
urllib.request.urlopen("https://www.baidu.com/s?wd=" + "a\nb")
The URL contains invalid characters such as \n (just like the content you read from the file). You need to quote them:
from urllib.parse import quote
q = urllib.request.urlopen("https://www.baidu.com/s?wd=" + quote("a\nb"))
print(q.url)
# 'https://www.baidu.com/s?wd=a%0Ab'
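Applied to the code in the question, a minimal sketch of the fix could look like this (only the quoting step is new; the rest is the asker's code):

import urllib.parse
import urllib.request

def profanitycheck(urcontent):
    # percent-encode spaces, newlines and other special characters
    # so the query string forms a valid URL
    url = "http://www.wdylike.appspot.com/?q=" + urllib.parse.quote(urcontent)
    with urllib.request.urlopen(url) as content2:
        output = content2.read()
        print(output)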

Related

python 3.x and urllib

Hi, I have a problem with my code. I'm using Python 3.6; I open a .txt file and read the text to send to urllib.request.urlopen(), but I get an error. I know it's because my txt file has spaces and \n characters, but in Python 2.7 it worked perfectly.
Here is my code:
import urllib.request
import urllib.parse

def readtext():
    quotes = open("C:/Users/sdand/Documents/Python/udacity/curse.txt")
    contents_of_files = quotes.read()
    print(contents_of_files)
    quotes.close()
    check_profanity(contents_of_files)

def check_profanity(text):
    req = urllib.request.urlopen("http://www.wdylike.appspot.com/?q=" + text)
    output = req.read()
    req.close()

readtext()
And this is my error:
Traceback (most recent call last):
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 17, in <module>
readtext()
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 9, in readtext
check_profanity(contents_of_files)
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 12, in check_profanity
req = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text)
File "C:\Program Files\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Program Files\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Program Files\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Program Files\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
I appreciate your help, thank you.
Try escaping the query string:
def check_profanity(text):
    req = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + urllib.parse.urlencode([('q', text)]))
    output = req.read()
    req.close()
urllib.request.urlopen sends a GET request to the URL supplied. It does not check whether the string is URL-encoded and does not try to encode it itself.
URLs cannot contain many special characters (like spaces or newlines); those need to be encoded for the URL to be valid (for example, a space becomes +).
So the content you read from the file is not encoded into a proper HTTP URL. That is what urllib.parse.urlencode does.
urllib.parse.urlencode takes a list of (key, value) tuples.
Basically,
urllib.parse.urlencode([('q', 'value'), ('another', 'value with spaces & other *special* chars')])
# equals:
# q=value&another=value+with+spaces+%26+other+%2Aspecial%2A+chars
which is ready to be used in a URL.
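For a single query parameter, urllib.parse.quote_plus produces the same encoding that urlencode does; a small sketch for comparison (the sample text is made up for illustration):

import urllib.parse

text = "line one\nline two & more"
print(urllib.parse.urlencode([('q', text)]))   # q=line+one%0Aline+two+%26+more
print("q=" + urllib.parse.quote_plus(text))    # same result for a single parameter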

Python urllib.request.urlopen() returning error 403

I'm trying to download the HTML of a page (http://www.guangxindai.com in this case) but I'm getting back an error 403. Here is my code:
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
f = opener.open("http://www.guangxindai.com")
f.read()
But I get an error response:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
f = opener.open("http://www.guangxindai.com")
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 513, in error
return self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
I have tried different request headers, but I still cannot get a correct response. I can view the page through a browser, which seems strange to me. I guess the site uses some method to block web spiders. Does anyone know what is happening? How can I get the HTML of the page correctly?
I was having the same problem as you, and I found the answer in this link.
The answer provided by Stefano Sanfilippo is quite simple and worked for me:
from urllib.request import Request, urlopen

url_request = Request("http://www.guangxindai.com",
                      headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(url_request).read()
If your aim is to read the HTML of the page, you can use the following code. It worked for me on Python 2.7:
import urllib
f = urllib.urlopen("http://www.guangxindai.com")
f.read()
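Note that urllib.urlopen does not exist in Python 3; a rough equivalent there (without a User-Agent header, so it may still get the 403 from this particular site) would be:

import urllib.request

with urllib.request.urlopen("http://www.guangxindai.com") as f:
    html = f.read()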

Error 400 with python-amazon-simple-product-api via pythonanywhere

I've been at this for the better part of a day but have been getting the same Error 400 for quite some time. Basically, the application's goal is to parse a book's ISBN from the Amazon referral URL and use it as the reference key to pull images from Amazon's Product Advertising API. The site is written in Python 3.4 and Django 1.8. I spent quite a while researching on here and settled on python-amazon-simple-product-api since it makes parsing Amazon's results a little easier.
Answers like "How to use Python Amazon Simple Product API to get price of a product" make it seem pretty simple, but I haven't quite gotten it to look up a product successfully yet. Here's a console printout of what my method usually does, with the correct ISBN already filled in:
>>> from amazon.api import AmazonAPI
>>> access_key='amazon-access-key'
>>> secret ='amazon-secret-key'
>>> assoc ='amazon-associate-account-name'
>>> amazon = AmazonAPI(access_key, secret, assoc)
>>> product = amazon.lookup(ItemId='1632360705')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/tsuko/.virtualenvs/django17/lib/python3.4/site-packages/amazon/api.py", line 161, in lo
okup
response = self.api.ItemLookup(ResponseGroup=ResponseGroup, **kwargs)
File "/home/tsuko/.virtualenvs/django17/lib/python3.4/site-packages/bottlenose/api.py", line 242, i
n __call__
{'api_url': api_url, 'cache_url': cache_url})
File "/home/tsuko/.virtualenvs/django17/lib/python3.4/site-packages/bottlenose/api.py", line 203, i
n _call_api
return urllib2.urlopen(api_request, timeout=self.Timeout)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Now I guess I'm curious if this is some quirk with PythonAnywhere, or if I've missed a configuration setting in Django? As far as I can tell through AWS and the Amazon Associates page my keys are correct. I'm not too worried about parsing at this point, just getting the object. I've even tried bypassing the API and just using Bottlenose (which the API extends) but I get the same error 400 result.
I'm really new to Django and Amazon's API, any assistance would be appreciated!
You still haven't authorised your account for API access. To do so, you can go to https://affiliate-program.amazon.com/gp/advertising/api/registration/pipeline.html
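If you want to see exactly what Amazon returns with the 400 (the response body usually spells out the problem, e.g. missing Product Advertising API authorisation), one option is to catch the HTTPError shown in the traceback and print its body. This is only a debugging sketch reusing the keys from the console session above, not part of python-amazon-simple-product-api's documented API:

from urllib.error import HTTPError
from amazon.api import AmazonAPI

amazon = AmazonAPI(access_key, secret, assoc)
try:
    product = amazon.lookup(ItemId='1632360705')
except HTTPError as e:
    # the body usually contains Amazon's explanation for the 400
    print(e.code)
    print(e.read().decode('utf-8', errors='replace'))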

Parse.com user login - 404 error

I am fairly inexperienced with user authentication, especially through RESTful APIs. I am trying to use Python to log in with a user that is set up on Parse.com. The following is the code I have:
API_LOGIN_ROOT = 'https://api.parse.com/1/login'
params = {'username':username,'password':password}
encodedParams = urllib.urlencode(params)
url = API_LOGIN_ROOT + "?" + encodedParams
request = urllib2.Request(url)
request.add_header('Content-type', 'application/x-www-form-urlencoded')
# we could use urllib2's authentication system, but it seems like overkill for this
auth_header = "Basic %s" % base64.b64encode('%s:%s' % (APPLICATION_ID, MASTER_KEY))
request.add_header('Authorization', auth_header)
request.add_header('X-Parse-Application-Id', APPLICATION_ID)
request.add_header('X-Parse-REST-API-Key', MASTER_KEY)
request.get_method = lambda: http_verb
# TODO: add error handling for server response
response = urllib2.urlopen(request)
#response_body = response.read()
#response_dict = json.loads(response_body)
This is a modification of an open-source library used to access the Parse REST interface.
I get the following error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 464, in post
url = user.login()
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 313, in login
url = self._executeCall(self.username, self.password, 'GET', data)
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 292, in _executeCall
response = urllib2.urlopen(request)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
Can someone point me to where I am screwing up? I'm not quite sure why I'm getting a 404 instead of an access denied or some other issue.
Make sure the "User" class was created on Parse.com as a special user class. When you are adding the class, make sure to change the Class Type to "User" instead of "Custom". A little user head icon will show up next to the class name on the left hand side.
This stumped me for a long time until Matt from the Parse team showed me the problem.
Please change API_LOGIN_ROOT = 'https://api.parse.com/1/login' to API_LOGIN_ROOT = 'https://api.parse.com/1/login/' (note the trailing slash).
I had the same problem using PHP, adding the / at the end fixed the 404 error.

Python urllib2 URLError exception?

I installed Python 2.6.2 earlier on a Windows XP machine and ran the following code:
import urllib2
import urllib
page = urllib2.Request('http://www.python.org/fish.html')
urllib2.urlopen( page )
I get the following error.
Traceback (most recent call last):
File "C:\Python26\test3.py", line 6, in <module>
urllib2.urlopen( page )
File "C:\Python26\lib\urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python26\lib\urllib2.py", line 383, in open
response = self._open(req, data)
File "C:\Python26\lib\urllib2.py", line 401, in _open
'_open', req)
File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
result = func(*args)
File "C:\Python26\lib\urllib2.py", line 1130, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python26\lib\urllib2.py", line 1105, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 11001] getaddrinfo failed>
import urllib2
response = urllib2.urlopen('http://www.python.org/fish.html')
html = response.read()
You're doing it wrong.
Have a look in the urllib2 source, at the line specified by the traceback:
File "C:\Python26\lib\urllib2.py", line 1105, in do_open
raise URLError(err)
There you'll see the following fragment:
try:
    h.request(req.get_method(), req.get_selector(), req.data, headers)
    r = h.getresponse()
except socket.error, err: # XXX what error?
    raise URLError(err)
So it looks like the source is a socket error, not an HTTP-protocol-related error. Possible reasons: you are not online, you are behind a restrictive firewall, your DNS is down, ...
All this aside from the fact, as mcandre pointed out, that your code is wrong.
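If you want your script to tell the two cases apart, a minimal sketch with urllib2 (assuming the same URL as in the question) could look like this:

import urllib2

try:
    response = urllib2.urlopen('http://www.python.org/fish.html')
except urllib2.HTTPError as e:
    # the server answered, but with an error status such as 404
    print('HTTP error: %s' % e.code)
except urllib2.URLError as e:
    # the request never got a response: DNS failure, no network, firewall, ...
    print('failed to reach the server: %s' % e.reason)
else:
    html = response.read()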
Name resolution error.
getaddrinfo is used to resolve the hostname (python.org) in your request. If it fails, it means the name could not be resolved because:
It does not exist, or the records are outdated (unlikely; python.org is a well-established domain name)
Your DNS server is down (unlikely; if you can browse other sites, you should be able to fetch that page through Python)
A firewall is blocking Python or your script from accessing the Internet (most likely; Windows Firewall sometimes does not ask you if you want to allow an application)
You live on an ancient voodoo cemetery. (unlikely; if that is the case, you should move out)
Windows Vista, Python 2.6.2
It's a 404 page, right?
>>> import urllib2
>>> import urllib
>>>
>>> page = urllib2.Request('http://www.python.org/fish.html')
>>> urllib2.urlopen( page )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python26\lib\urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python26\lib\urllib2.py", line 389, in open
response = meth(req, response)
File "C:\Python26\lib\urllib2.py", line 502, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python26\lib\urllib2.py", line 427, in error
return self._call_chain(*args)
File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
result = func(*args)
File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>>
DJ
First, I see no reason to import urllib; I've only ever seen urllib2 used to replace urllib entirely and I know of no functionality that's useful from urllib and yet is missing from urllib2.
Next, I notice that http://www.python.org/fish.html gives a 404 error for me. (That doesn't explain the traceback/exception you're seeing; I get urllib2.HTTPError: HTTP Error 404: Not Found.)
Normally, if you just want to do a default fetch of a web page (without adding special HTTP headers, doing any sort of POST, etc.), then the following suffices:
req = urllib2.urlopen('http://www.python.org/')
html = req.read()
# and req.close() if you want to be pedantic
