I recently came across the Google Testing Tools API - Mobile Friendly Test (https://developers.google.com/webmaster-tools/search-console-api/reference/rest/v1/urlTestingTools.mobileFriendlyTest), but I couldn't get it to work, even when trying it on the site. I used Python and followed the guide (https://developers.google.com/webmaster-tools/search-console-api/v1/samples), making a few changes so it would run on Python 3 (where urllib2 was merged into the urllib package). In the end my code looked like this:
from __future__ import print_function
import urllib
import urllib.request as urllib2
api_key = 'API_KEY'
request_url = 'https://www.google.com/'
service_url = 'https://searchconsole.googleapis.com/v1/urlTestingTools/mobileFriendlyTest:run'
params = {
    'url': request_url,
    'key': api_key,
}
data = urllib.parse.urlencode(params)
content = urllib2.urlopen(url=service_url, data=str.encode(data)).read()
print(content)
And I got the error:
File ".\script2.py", line 14, in <module>
content = urllib2.urlopen(url=service_url, data=str.encode(data)).read()
File "C:\Python\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Python\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Python\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Python\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Python\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
I also tried curl and the online tool (the "Try This API" section, not https://search.google.com/test/mobile-friendly), but neither of them worked.
I actually solved my own problem. I think it was mainly caused by urllib. Here is what I did:
from __future__ import print_function
import urllib.parse as parser
import urllib.request as urllib2
import json
import base64
# api_key and service_url are the same values as in the question above;
# url is the address of the page to test.
request_url = url
params = {
    'url': request_url,
    'key': api_key
}
# urlopen() needs the POST body as bytes, not str
data = bytes(parser.urlencode(params), encoding='utf-8')
content = urllib2.urlopen(url=service_url, data=data).read()
sContent = str(content, encoding='utf-8')  # Shorthand for stringContent
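Since the script imports json, a natural next step is to turn the decoded string into a Python object. The sketch below assumes the service answers with a JSON document. The helper name decode_json_body and the sample payload are made up for illustration and are not the real API response.

```python
import json

def decode_json_body(raw, encoding="utf-8"):
    """Decode the bytes returned by urlopen(...).read() and parse them as JSON."""
    text = raw.decode(encoding)  # same effect as str(raw, encoding=...)
    return json.loads(text)

# Stand-in payload for illustration only; not an actual API response.
sample = b'{"status": "ok", "items": [1, 2, 3]}'
result = decode_json_body(sample)
print(result["status"])
```

In the script above, decode_json_body(content) would replace the manual str(...) conversion.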
Related
I'm trying to use the Pastebin API (docs: https://pastebin.com/doc_api) from Python, using the urllib library: https://docs.python.org/3/library/urllib.html.
import urllib.request
import urllib.parse
def main():
    def pastebinner():
        site = 'https://pastebin.com/api/api_post.php'
        dev_key = 'APIKEYGOESHERE'  # placeholder; the real key is omitted here
        code = "12345678910, test"
        our_data = urllib.parse.urlencode({"api_dev_key": dev_key, "api_option": "paste", "api_paste_code": code})
        our_data = our_data.encode()
        resp = urllib.request.urlopen(site, our_data)
        print(resp.read())
    pastebinner()
if __name__ == "__main__":
    main()
Here's the error I get:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 422: Unprocessable entity
Any ideas regarding the reason for getting this error?
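You are calling urllib.request.urlopen(site, our_data). Note that urllib only defaults to an HTTP GET when no data is passed; when a data argument is supplied, the request is already sent as a POST. The Pastebin API expects a POST, so it is still worth making the method explicit with a Request object, although that alone may not remove the 422.
Please note that the code below is untested.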
import urllib.request
import urllib.parse
def main():
    def pastebinner():
        site = 'https://pastebin.com/api/api_post.php'
        dev_key = 'APIKEYGOESHERE'
        code = "12345678910, test"
        our_data = urllib.parse.urlencode({"api_dev_key": dev_key, "api_option": "paste", "api_paste_code": code})
        our_data = our_data.encode()
        request = urllib.request.Request(site, method='POST')
        resp = urllib.request.urlopen(request, our_data)
        print(resp.read())
    pastebinner()
if __name__ == "__main__":
    main()
The error is very unhelpful. I mean, why not return a teapot response instead?
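One way to check which HTTP method urllib will use, without sending anything, is Request.get_method(). The sketch below is only an illustration: build_form_request is a hypothetical helper and the key is a placeholder. It suggests that passing data already makes the request a POST, so an explicit method='POST' mostly documents intent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_form_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("utf-8")  # the request body must be bytes
    return Request(url, data=body, method="POST")

site = "https://pastebin.com/api/api_post.php"
fields = {
    "api_dev_key": "APIKEYGOESHERE",  # placeholder, not a real key
    "api_option": "paste",
    "api_paste_code": "12345678910, test",
}

explicit = build_form_request(site, fields)
implicit = Request(site, data=urlencode(fields).encode("utf-8"))  # no method given
no_data = Request(site)  # no body at all

print(explicit.get_method(), implicit.get_method(), no_data.get_method())
```

If all three methods look right and the server still answers 422, the cause is more likely in the submitted fields than in the HTTP method.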
Leaving this here in case anyone else runs into this issue. I'm not 100% sure about this and will test it later, but don't use urllib2; use httplib2 instead. I believe that will fix your problem.
My current program looks like this
import os
import urllib.request
baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
for i in range(1,48):
    url = baseUrl % i
    urllib.request.urlretrieve(baseUrl, os.path.basename(url))
I haven't coded in Python in a long time, but I wrote this using urllib2 back when I used Python 2.7.
It is supposed to replace the %s in the URL, loop through 1-48, and download all the images to the directory that the script is in. But I get a lot of errors.
Edit: here is the error that is thrown.
Traceback (most recent call last):
File "download.py", line 9, in <module>
urllib.request.urlretrieve(url, os.path.basename(url))
File "C:\Program Files\Python37\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
urllib.request is only available on Python 3 so you have to run the code in Python 3.
Try using the requests module:
import os
import requests
baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
for i in range(1,48):
    url = baseUrl % i
    response = requests.get(url)
    my_raw_data = response.content
    # The with block closes the file automatically
    with open(os.path.basename(url), 'wb') as my_data:
        my_data.write(my_raw_data)
Just to add, you must pass url to urlretrieve, not baseUrl as shown in your code:
import os
import urllib.request
baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
for i in range(1,48):
    url = baseUrl % i
    #urllib.request.urlretrieve(baseUrl, os.path.basename(url))
    #Use This line :
    urllib.request.urlretrieve(url, os.path.basename(url))
Run this in Python 3
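The fix comes down to formatting the URL once per iteration and using that same string for both the download and the file name. Below is a minimal sketch of just that step, with no network access. numbered_urls is a hypothetical helper, and note that range(1, 48) stops at 47, so range(1, 49) would be needed if image 48 is wanted.

```python
import os

def numbered_urls(base_url, start, stop):
    """Yield (url, filename) pairs for start..stop-1, like range(start, stop)."""
    for i in range(start, stop):
        url = base_url % i                # fill in the %s placeholder
        yield url, os.path.basename(url)  # file name is the last path segment

base_url = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
pairs = list(numbered_urls(base_url, 1, 48))
print(len(pairs), pairs[0], pairs[-1])
```

Each pair can then be passed to urllib.request.urlretrieve(url, filename) or to a requests-based download.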
Simple fix, if you pass the correct string:
urllib.request.urlretrieve(url, os.path.basename(url))
The documentation lists urlretrieve under its legacy interface, so you might want to find a different way to do this.
I found this alternate approach modified from another SO answer:
import os
import requests
baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
for i in range(1,48):
    url = baseUrl % i
    r = requests.get(url)
    # Use a with block so the file is closed after writing
    with open(os.path.basename(url), 'wb') as f:
        f.write(r.content)
I'm trying to make a request to the GitHub API with Python 3 urllib to create a release, but I made some mistake and it fails with an exception:
Traceback (most recent call last):
File "./a.py", line 27, in <module>
'Authorization': 'token ' + token,
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 422: Unprocessable Entity
GitHub, however, is nice and explains why it failed in the response body, as shown at: 400 vs 422 response to POST of data
So, how do I read the response body? Is there a way to prevent the exception from being raised?
I've tried to catch the exception and explore it in ipdb, which gives an object of type urllib.error.HTTPError but I couldn't find that body data there, only headers.
The script:
#!/usr/bin/env python3
import json
import os
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen
repo = sys.argv[1]
tag = sys.argv[2]
upload_file = sys.argv[3]
token = os.environ['GITHUB_TOKEN']
url_template = 'https://{}.github.com/repos/' + repo + '/releases'
# Create.
_json = json.loads(urlopen(Request(
    url_template.format('api'),
    json.dumps({
        'tag_namezxcvxzcv': tag,
        'name': tag,
        'prerelease': True,
    }).encode(),
    headers={
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + token,
    },
)).read().decode())
# This is not the tag, but rather some database integer identifier.
release_id = _json['id']
usage: Can someone give a python requests example of uploading a release asset in github?
The HTTPError has a read() method that allows you to read the response body. So in your case, you should be able to do something such as:
# The script only imports from urllib.request, so HTTPError must be imported too
from urllib.error import HTTPError
try:
    body = urlopen(Request(
        url_template.format('api'),
        json.dumps({
            'tag_namezxcvxzcv': tag,
            'name': tag,
            'prerelease': True,
        }).encode(),
        headers={
            'Accept': 'application/vnd.github.v3+json',
            'Authorization': 'token ' + token,
        },
    )).read().decode()
except HTTPError as e:
    body = e.read().decode()  # Read the body of the error response
_json = json.loads(body)
The docs explain in more detail how the HTTPError instance can be used as a response, and some of its other attributes.
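The same idea can be wrapped in a small helper so that success and error responses are handled in one place. This is a sketch rather than a definitive implementation: fetch_json, its open_fn parameter, and the fake_422 stand-in with its payload are hypothetical names introduced here so the example runs without contacting GitHub.

```python
import io
import json
from email.message import Message
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch_json(request, open_fn=urlopen):
    """Return (status, parsed_body) for both successful and HTTP error responses."""
    try:
        with open_fn(request) as resp:
            return resp.status, json.loads(resp.read().decode())
    except HTTPError as e:
        # HTTPError doubles as a response object: it has .code, .headers and .read()
        return e.code, json.loads(e.read().decode())

def fake_422(request):
    """Stand-in for urlopen that always fails, so the sketch runs offline."""
    payload = io.BytesIO(b'{"message": "Validation Failed"}')  # made-up body
    raise HTTPError("https://example.invalid", 422, "Unprocessable Entity", Message(), payload)

status, body = fetch_json("ignored", open_fn=fake_422)
print(status, body["message"])
```

In real use, open_fn is left at its default and request is the Request object built in the script.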
I'm trying to download the HTML of a page (http://www.guangxindai.com in this case) but I'm getting back a 403 error. Here is my code:
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
f = opener.open("http://www.guangxindai.com")
f.read()
but I get an error response:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
f = opener.open("http://www.guangxindai.com")
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 513, in error
return self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
I have tried different request headers, but still cannot get a correct response. I can view the page in a browser, which seems strange to me. I guess the site uses some method to block web spiders. Does anyone know what is happening? How can I get the HTML of the page correctly?
I was having the same problem as you, and I found the answer in this link.
The answer provided by Stefano Sanfilippo is quite simple and worked for me:
from urllib.request import Request, urlopen
url_request = Request("http://www.guangxindai.com",
                      headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(url_request).read()
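To see what this changes, it can help to compare urllib's default User-Agent with the one set on the Request, without sending anything. This is only an inspection sketch. Whether this particular site rejects the default value is an assumption, not something the check below can confirm.

```python
from urllib.request import Request, build_opener

url = "http://www.guangxindai.com"

# Default opener headers: what urllib sends when no User-Agent is set.
default_headers = dict(build_opener().addheaders)

# Request stores header names capitalized, so the key becomes "User-agent".
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})

print(default_headers["User-agent"])
print(req.get_header("User-agent"))
```

The first line printed starts with Python-urllib/, which some servers treat as a bot signature.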
If your aim is to read the HTML of the page, you can use the following code. It worked for me on Python 2.7 (urllib.urlopen does not exist in Python 3):
import urllib
f = urllib.urlopen("http://www.guangxindai.com")
f.read()
I want to send some HTTP requests to Twitter in Python in order to create a sign in for Twitter users for my app. I am using urllib, and following this link: https://dev.twitter.com/web/sign-in/implementing.
But I am unable to do this. I guess I need to authenticate before requesting a token but I don't know how to do that.
Code:
import urllib.request
req = urllib.request.Request("https://api.twitter.com/oauth/authenticate",
                             headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()  # after this statement I'm getting the error
Error:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
html = urllib.request.urlopen(req).read()
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 469, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 579, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 507, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
If you go to the url with a browser it shows you that you need a key:
Whoa there!
There is no request token for this page. That's the special key we need from applications asking to use your Twitter account. Please go back to the site or application that sent you here and try again; it was probably just a mistake.
If you go to this link, it will let you choose one of your apps and bring you to a signature generator that shows you the request settings.
To get a request_token you can use requests_oauthlib:
import requests
from requests_oauthlib import OAuth1
REQUEST_TOKEN_URL = "https://api.twitter.com/oauth/request_token"
CONSUMER_KEY = "xxxxxxxx"
CONSUMER_SECRET = "xxxxxxxxxxxxxxxxx"
oauth = OAuth1(CONSUMER_KEY, client_secret=CONSUMER_SECRET)
r = requests.post(url=REQUEST_TOKEN_URL, auth=oauth)
print(r.content)
oauth_token=xxxxxxxxxxxxxx&oauth_token_secret=xxxxxxxxxxx&oauth_callback_confirmed=true
You then need to extract the oauth_token and oauth_token_secret:
from urlparse import parse_qs
import webbrowser
data = parse_qs(r.content)
oauth_token = data['oauth_token'][0]
oauth_token_secret = data['oauth_token_secret'][0]
AUTH = "https://api.twitter.com/oauth/authorize?oauth_token={}"
auth = AUTH.format(oauth_token)
webbrowser.open(auth)
A webpage will open asking to Authorize your_app to use your account?
For Python 3 use:
from urllib.parse import parse_qs
data = parse_qs(r.text)
oauth_token = data['oauth_token'][0]
oauth_token_secret = data['oauth_token_secret'][0]
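The reason for switching from r.content to r.text on Python 3 can be seen with parse_qs alone: given bytes it returns bytes keys, so a lookup with a str key fails. The sketch below uses an illustrative response body with made-up token values and makes no network request.

```python
from urllib.parse import parse_qs

# Illustrative response body; real tokens are opaque strings issued by the server.
text_body = "oauth_token=abc123&oauth_token_secret=s3cret&oauth_callback_confirmed=true"

from_text = parse_qs(text_body)                   # str in, str keys and values out
from_bytes = parse_qs(text_body.encode("ascii"))  # bytes in, bytes keys and values out

oauth_token = from_text["oauth_token"][0]
oauth_token_secret = from_text["oauth_token_secret"][0]
print(oauth_token, oauth_token_secret, list(from_bytes)[0])
```

With the bytes result, data['oauth_token'] would raise a KeyError because the key is b'oauth_token'.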