Hi, I have a problem with my code. I'm using Python 3.6: I open a .txt file, read the text, and send it to urllib.request.urlopen(), but I get an error. I know it's because my txt file has spaces and \n characters, but in Python 2.7 it worked perfectly.
Here is my code:
import urllib.request
import urllib.parse

def readtext():
    quotes = open("C:/Users/sdand/Documents/Python/udacity/curse.txt")
    contents_of_files = quotes.read()
    print(contents_of_files)
    quotes.close()
    check_profanity(contents_of_files)

def check_profanity(text):
    req = urllib.request.urlopen("http://www.wdylike.appspot.com/?q=" + text)
    output = req.read()
    req.close()

readtext()
And this is my error:
Traceback (most recent call last):
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 17, in <module>
readtext()
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 9, in readtext
check_profanity(contents_of_files)
File "C:/Users/sdand/Documents/Python/udacity/profanity.py", line 12, in check_profanity
req = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text)
File "C:\Program Files\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Program Files\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Program Files\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Program Files\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
I appreciate your help, thank you.
Try escaping the query string:
def check_profanity(text):
    req = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + urllib.parse.urlencode([('q', text)]))
    output = req.read()
    req.close()
urllib.request.urlopen sends a GET request to the URL supplied. It does not check whether the string is URL-encoded, and it does not try to encode it itself.
URLs cannot contain many special characters (like spaces), and those need to be encoded to form a valid URL (for example, replacing a space with +).
So the content you read from the file is not encoded as a proper HTTP URL; that encoding is what urllib.parse.urlencode does.
urllib.parse.urlencode takes a list of (key, value) tuples.
Basically,
urllib.parse.urlencode([('q', 'value'), ('another', 'value with spaces & other *special* chars')])
# equals:
# q=value&another=value+with+spaces+%26+other+%2Aspecial%2A+chars
Which is ready to be consumed in a URL.
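For reference, here is a minimal sketch of the question's program with that encoding applied (same file path and structure as the original; only the query construction changes):
import urllib.request
import urllib.parse

def readtext():
    quotes = open("C:/Users/sdand/Documents/Python/udacity/curse.txt")
    contents_of_files = quotes.read()
    print(contents_of_files)
    quotes.close()
    check_profanity(contents_of_files)

def check_profanity(text):
    # Encode the query so spaces and newlines become valid URL characters
    query = urllib.parse.urlencode([('q', text)])
    req = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + query)
    output = req.read()
    req.close()

readtext()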
I am attempting to make a program that downloads a series of product pictures from a site using Python. The site stores its images under a certain URL format, https://www.sitename.com/XYZabcde, where XYZ is a three-letter code representing the brand of the product and abcde is a number between 00000 and 30000.
Here is my code:
import urllib.request

def down(i, inp):
    full_path = 'images/image-{}.jpg'.format(i)
    url = "https://www.sitename.com/{}{}.jpg".format(inp, i)
    urllib.request.urlretrieve(url, full_path)
    print("saved")
    return None

inp = input("brand :")
i = 20100
while i <= 20105:
    x = str(i)
    y = x.zfill(5)
    z = "https://www.sitename.com/{}{}.jpg".format(inp, y)
    print(z)
    down(y, inp)
    i += 1
With the code I have written, I can successfully download a series of pictures which I know exist; for example, brand RVL from 20100 to 20105 will successfully download those six pictures.
However, when I broaden the while loop to include links I don't know will give me an image, I get this error:
Traceback (most recent call last):
File "c:/Users/euan/Desktop/university/programming/Python/parser/test - Copy.py", line 20, in <module>
down(y, inp)
File "c:/Users/euan/Desktop/university/programming/Python/parser/test - Copy.py", line 6, in down
urllib.request.urlretrieve(url, full_path)
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
What can I do to check for and avoid any URL that would yield this result?
You cannot know in advance which URLs you don't have access to, but you can surround the download with a try/except:
import urllib.request, urllib.error

...

def down(i, inp):
    full_path = 'images/image-{}.jpg'.format(i)
    url = "https://www.sitename.com/{}{}.jpg".format(inp, i)
    try:
        urllib.request.urlretrieve(url, full_path)
        print("saved")
    except urllib.error.HTTPError as e:
        print("failed:", e)
    return None
In that case it will just print e.g. "failed: HTTP Error 403: Forbidden" whenever a URL cannot be fetched, and the program will continue.
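If you want to treat the missing-image case differently from other failures, one possible variation (not part of the original answer) is to check the status code on the exception inside down():
    try:
        urllib.request.urlretrieve(url, full_path)
        print("saved")
    except urllib.error.HTTPError as e:
        if e.code == 403:
            print("no image at", url)  # the site answers 403 for IDs that don't exist
        else:
            raise  # re-raise anything unexpected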
My current program looks like this:
import os
import urllib.request

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1, 48):
    url = baseUrl % i
    urllib.request.urlretrieve(baseUrl, os.path.basename(url))
I haven't coded Python in a long time, but I wrote this using urllib2 back when I used Python 2.7.
It is supposed to replace the %s in the URL, loop through 1 to 48, and download all the images to the directory the script is in. But I get a lot of errors.
Edit: here is the error that is thrown.
Traceback (most recent call last):
File "download.py", line 9, in <module>
urllib.request.urlretrieve(url, os.path.basename(url))
File "C:\Program Files\Python37\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
urllib.request is only available on Python 3 so you have to run the code in Python 3.
Try using the requests module:
import os
import requests

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1, 48):
    url = baseUrl % i
    response = requests.get(url)
    my_raw_data = response.content
    with open(os.path.basename(url), 'wb') as my_data:
        my_data.write(my_raw_data)
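One thing worth noting (an addition, not part of the original answer): requests.get() does not raise an exception for HTTP errors such as 403 or 404 on its own, so a failed download would quietly write the error page to disk. If you want failures to be visible, you could check the status right after the GET, for example:
response = requests.get(url)
response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx responses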
Just to add: you must use url in the request, not baseUrl as shown in your code:
import os
import urllib.request

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1, 48):
    url = baseUrl % i
    # urllib.request.urlretrieve(baseUrl, os.path.basename(url))
    # Use this line instead:
    urllib.request.urlretrieve(url, os.path.basename(url))
Run this in Python 3
Simple fix, if you pass the correct string:
urllib.request.urlretrieve(url, os.path.basename(url))
The documentation says urlretrieve is a Legacy carryover, so you might want to find a different way to do this.
I found this alternate approach modified from another SO answer:
import os
import requests

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1, 48):
    url = baseUrl % i
    r = requests.get(url)
    open(os.path.basename(url), 'wb').write(r.content)
I'm studying Python from Udacity. Because I use a different Python version, I got stuck programming the profanity editor. This is my code:
import urllib.request

def readdocument(x):
    document = open(x)
    profanitycheck(document.read())
    document.close()

def profanitycheck(urcontent):
    q = urllib.request.Request("http://www.wdylike.appspot.com/?q=" + urcontent)
    with urllib.request.urlopen(q) as content2:
        output = content2.read()
    print(output)

filelocate = r"C:\Users\Sutthikiat\Desktop\movie_quotes.txt"
readdocument(filelocate)
This is the txt file:
-- Houston, we have a problem. (Apollo 13)
-- Mama always said, life is like a box of chocolates. You never know what you are going to get. (Forrest Gump)
-- You cant handle the truth. (A Few Good Men)
-- I believe everything and I believe nothing. (A Shot in the Dark)
But when I create a new text file and test with that, it runs properly, so I don't understand why my code gets this error. Maybe it's something to do with an exception?
This is the error:
Traceback (most recent call last):
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 13, in <module>
readdocument(filelocate)
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 4, in readdocument
profanitycheck(document.read())
File "C:\Users\Sutthikiat\Desktop\cursecheck.py", line 8, in profanitycheck
with urllib.request.urlopen(q) as content2:
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Users\Sutthikiat\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
The server thinks it is invalid, just like this:
urllib.request.urlopen("https://www.baidu.com/s?wd=" + "a\nb")
The URL contains an invalid character, \n (just like the content you read from the file). You need to quote it:
from urllib.parse import quote
q = urllib.request.urlopen("https://www.baidu.com/s?wd=" + quote("a\nb"))
print(q.url)
'https://www.baidu.com/s?wd=a%0Ab'
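Applied to the question's code, the fix would look roughly like this (a sketch keeping the original names; only the quoting is new):
import urllib.request
import urllib.parse

def profanitycheck(urcontent):
    # quote() percent-encodes spaces, newlines and other special characters
    encoded = urllib.parse.quote(urcontent)
    q = urllib.request.Request("http://www.wdylike.appspot.com/?q=" + encoded)
    with urllib.request.urlopen(q) as content2:
        output = content2.read()
    print(output)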
Here's my problem: I'm trying to request a URL from the Rotten Tomatoes API. The thing is that they require movie titles to contain + signs wherever there should be spaces. However, I'm not sure how to implement this on the App Engine side, because whenever I try doing the same thing on App Engine, I get the same error:
Traceback (most recent call last):
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/programming/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/Users/student/Desktop/Movie Rater/MovieRaterBackend/higgsmovies.py", line 12, in get
page = urllib2.urlopen(site)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 400: Bad Request
Here's my code:
title = self.request.get("title")
site = "http://api.rottentomatoes.com/api/public/v1.0/movies.json?apikey=" + constants.ROTTEN_TOMATOES_KEY + "&q=" + title + "&page_limit=1"
page = urllib2.urlopen(site)
soup = BeautifulSoup(page)
self.response.out.write(soup)
constants is just a Python file containing all of my passwords and such, and I'm using BeautifulSoup to clean things up, but I'm sure that's not the problem. This code is accessed by going to the URL myapplication.com/about?title=your+title+goes+here, where myapplication will be the URL of the website, probably some appspot.com URL.
This works for URLs that don't contain + signs.
Any help would be greatly appreciated!
This does not directly answer your question, but have you tried using the URL Fetch service directly? For example:
from google.appengine.api import urlfetch
title = self.request.get("title")
site = "http://api.rottentomatoes.com/api/public/v1.0/movies.json?apikey=" + constants.ROTTEN_TOMATOES_KEY + "&q=" + title + "&page_limit=1"
result = urlfetch.fetch(site)
Plus signs ("+") are part of the standard for encoding form data: application/x-www-form-urlencoded
Query strings, which are everything after the question mark ("?"), are form data -- or, in this case, REST query parameters. So their API is behaving correctly here.
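As an illustration (Python 2, to match the question), building the query string with urllib.urlencode instead of concatenating the raw title would look something like this; the API key here is a placeholder:
import urllib
import urllib2

title = "your title goes here"
params = urllib.urlencode({
    'apikey': 'YOUR_API_KEY',  # placeholder; the question reads this from constants
    'q': title,                # spaces become '+', other special characters become %XX
    'page_limit': 1,
})
site = "http://api.rottentomatoes.com/api/public/v1.0/movies.json?" + params
page = urllib2.urlopen(site)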
I haven't found a way to handle the plus signs, because App Engine seems to infer that these introduce new variables/values. However, using a separator other than '+' is a viable solution to the problem as a whole, as long as the application accessing the URL is able to replace spaces with that separator rather than the normal '+'. Seeing as the intended application of this service is to be a backend for an iPhone application, there should not be too much trouble with this method. I only have to make sure that the separator does not appear in any movie names, and that it is not too long. For web applications using App Engine to forward this kind of data to another online service, there is the possibility of writing a JavaScript script to handle this properly.
I use urllib.urlencode.
Example:
params = {'q': 'value', 'apikey': key_value}
request_url += urllib.urlencode(params)  # assumes request_url already ends with '?'
urllib2.urlopen(request_url)
I am fairly inexperienced with user authentication, especially through RESTful APIs. I am trying to use Python to log in with a user that is set up on parse.com. The following is the code I have:
API_LOGIN_ROOT = 'https://api.parse.com/1/login'
params = {'username':username,'password':password}
encodedParams = urllib.urlencode(params)
url = API_LOGIN_ROOT + "?" + encodedParams
request = urllib2.Request(url)
request.add_header('Content-type', 'application/x-www-form-urlencoded')
# we could use urllib2's authentication system, but it seems like overkill for this
auth_header = "Basic %s" % base64.b64encode('%s:%s' % (APPLICATION_ID, MASTER_KEY))
request.add_header('Authorization', auth_header)
request.add_header('X-Parse-Application-Id', APPLICATION_ID)
request.add_header('X-Parse-REST-API-Key', MASTER_KEY)
request.get_method = lambda: http_verb
# TODO: add error handling for server response
response = urllib2.urlopen(request)
#response_body = response.read()
#response_dict = json.loads(response_body)
This is a modification of an open source library used to access the parse rest interface.
I get the following error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 464, in post
url = user.login()
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 313, in login
url = self._executeCall(self.username, self.password, 'GET', data)
File "/Users/nazbot/src/PantryPal_AppEngine/fridgepal.py", line 292, in _executeCall
response = urllib2.urlopen(request)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
Can someone point me to where I am screwing up? I'm not quite sure why I'm getting a 404 instead of an access denied or some other issue.
Make sure the "User" class was created on Parse.com as a special user class. When you are adding the class, make sure to change the Class Type to "User" instead of "Custom". A little user head icon will show up next to the class name on the left hand side.
This stumped me for a long time until Matt from the Parse team showed me the problem.
Please change API_LOGIN_ROOT = 'https://api.parse.com/1/login' to the following: API_LOGIN_ROOT = 'https://api.parse.com/1/login/' (note the trailing slash).
I had the same problem using PHP; adding the / at the end fixed the 404 error.