urllib2 and HTTPErrorProcessor Python 3 - python

I'm trying to port my Python code from 2.7 to 3.6.
I'm not very familiar with Python, and I'm getting an error with urllib2:
Error Contents: name 'urllib2' is not defined
So I changed the import to:
from urllib.request import urlopen
That is probably OK, since urllib2 doesn't exist in Python 3?
But I also have this class:
class NoRedirection(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response
    https_response = http_response
What I tried to change it to:
class NoRedirection(urlopen.HTTPErrorProcessor):
But that doesn't work. How do I fix this?
**AttributeError: 'function' object has no attribute 'HTTPErrorProcessor'**

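urlopen is a function, so it has no HTTPErrorProcessor attribute. In Python 3, urllib2 was split into urllib.request and urllib.error. HTTPErrorProcessor is a handler class in urllib.request; urllib.error only holds the exception classes such as HTTPError. What you want is something along these lines:
from urllib.request import HTTPErrorProcessor
class NoRedirection(HTTPErrorProcessor):
    ...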
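As a minimal sketch of how the ported class could be wired up in Python 3, assuming the goal is to get redirects and other non-2xx responses back unchanged: the handler subclasses HTTPErrorProcessor from urllib.request and is installed with build_opener. The name `opener` and the URL in the comment are illustrative only.

```python
from urllib.request import HTTPErrorProcessor, build_opener


class NoRedirection(HTTPErrorProcessor):
    """Return every response as-is instead of passing non-2xx codes to error handlers."""

    def http_response(self, request, response):
        return response

    https_response = http_response


# build_opener leaves out the default HTTPErrorProcessor because a subclass is supplied.
opener = build_opener(NoRedirection)

# Usage (needs network access, so it is left commented out; the URL is a placeholder):
# response = opener.open("http://example.com/some-redirecting-page")
# print(response.getcode())
```

With this opener, a 302 response should come back as a normal response object rather than being followed, so the status code and Location header can be inspected directly.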

Related

AttributeError: 'function' object has no attribute 'text'

Do you know repl.it?
I am coding Python on that site.
My goal is to create a web scraper.
I think this code is clean.
But I'm getting an error:
AttributeError: 'function' object has no attribute 'text'
My code:
import requests
indeed_result = requests.get
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
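I do have the requests package installed.
Please give me some advice.
Because of the line break, indeed_result is bound to the function requests.get itself, and the URL on the next line is a separate expression that is never passed to it. You just need to remove the line break after get, like this: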
import requests
indeed_result = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
If you want to continue on the next line, add a backslash \ as follows:
indeed_result = requests.get\
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
Remove the line break after get. Try this:
import requests
res = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(res.text)
# res.status_code is 200 on success
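To illustrate why the original code produced a function rather than a response, here is a small sketch that uses a stand-in `get` function (hypothetical, defined here only to avoid network access) in place of requests.get. It also shows the common alternative to a backslash: breaking the line inside the parentheses.

```python
def get(url):
    """Stand-in for requests.get (hypothetical, avoids network access)."""
    return "response for " + url


# Broken: the assignment ends at the line break, so `result` is the function itself.
result = get
("https://example.com")  # a separate expression statement; its value is discarded
is_function = callable(result)

# Working: a line break inside the parentheses continues the call.
result = get(
    "https://example.com"
)
print(is_function, result)
```

Breaking inside the parentheses is usually preferred over a trailing backslash, since stray whitespace after a backslash is a syntax error.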

Python - Web Scraping exercise - Attribute Error

I am learning how to scrape web information. Below is a snippet of the actual code solution and output from DataCamp.
On DataCamp this works perfectly fine, but when I try to run it in Spyder (on my own MacBook), it doesn't work.
This is because on DataCamp the URL has already been pre-loaded into a variable named 'response', whereas in Spyder it has to be defined again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to DataCamp's website.
My code looks like:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this in Spyder, I get an error message:
Traceback (most recent call last):
  File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
    this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working in Spyder. Thank you very much.
The value returned by requests.get('https://www.datacamp.com/courses/all') is a requests Response object, and this object has no attribute xpath, hence the error: AttributeError: 'Response' object has no attribute 'xpath'
I assume that response in your tutorial is a different kind of object that supports xpath (for example a Scrapy response, or a tree returned by etree.HTML), not the value returned by requests.get(url).
You can however do this:
import requests
from lxml import etree
response = requests.get('https://www.datacamp.com/courses/all')  # get the Response object
tree = etree.HTML(response.text)  # parse the page source into an element tree
result = tree.xpath('/html/head/title/text()')  # list of matching text nodes
print(response.url)  # url
print(result)  # findings

Passing a list as a url value to urlopen

Motivation
Motivated by this problem: the OP was using urlopen() and accidentally passed the sys.argv list instead of a string as the URL. This error message was thrown:
AttributeError: 'list' object has no attribute 'timeout'
Because of the way urlopen is written, the error message and the traceback are not very informative and may be difficult to understand, especially for a Python newcomer:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    get_category_links(sys.argv)
  File "test.py", line 10, in get_category_links
    response = urlopen(url)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
Problem
Here is the shortened code I'm working with:
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
import sys
def get_category_links(url):
    response = urlopen(url)
    # do smth with response
    print(response)
get_category_links(sys.argv)
I'm trying to think whether this kind of an error can be caught statically with either smart IDEs like PyCharm, static code analysis tools like flake8 or pylint, or with language features like type annotations.
But I'm failing to detect the problem:
flake8 and pylint don't warn about it; it is probably too specific for them to catch.
PyCharm does not warn about sys.argv being passed into urlopen, even though "jump to source" on sys.argv shows it defined as:
argv = [] # real value of type <class 'list'> skipped
If I annotate the function parameter as a string and pass sys.argv, there are no warnings either:
def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response
get_category_links(sys.argv)
Question
Is it possible to catch this problem statically (without actually executing the code)?
Instead of keeping it editor specific, you can use mypy to analyze your code. This way it will run on all dev environments instead of just for those who use PyCharm.
from urllib.request import urlopen
import sys
def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response
get_category_links(sys.argv)
response = urlopen(sys.argv)
The issues pointed out by mypy for the above code:
error: Argument 1 to "get_category_links" has incompatible type List[str]; expected "str"
error: Argument 1 to "urlopen" has incompatible type List[str]; expected "Union[str, Request]"
Mypy can infer the type of sys.argv here from its definition in the typeshed stub file. Right now some standard library modules are still missing from typeshed, though, so you will have to either contribute stubs or ignore the related errors until they get added :-).
When to run mypy?
To catch such errors, you can run mypy on the annotated files alongside your tests in your CI tool. Running it on every file in the project may take some time; for a small project that is your choice.
Alternatively, add a pre-commit hook that runs mypy on staged files and points out issues right away (this could be a little annoying for the developer if it takes a while).
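Once mypy has flagged the call, the fix is to pass a single argument rather than the whole list. A minimal sketch follows, assuming the URL is the first command-line argument; `pick_url` is a hypothetical helper, and the network call is replaced by a print so the sketch runs offline.

```python
from typing import List


def pick_url(argv: List[str]) -> str:
    """Return the first command-line argument, or exit with a usage message."""
    if len(argv) < 2:
        raise SystemExit("usage: {} URL".format(argv[0] if argv else "script"))
    return argv[1]


def get_category_links(url: str) -> None:
    # urlopen(url) would go here; omitted so the sketch runs without network access
    print("would fetch", url)


# In a real script this would be get_category_links(pick_url(sys.argv)).
get_category_links(pick_url(["test.py", "https://example.com"]))
```

Here the annotations line up (List[str] in, str out), so a type checker should have nothing to complain about at the call site.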
First, check whether url is a string; if it is, catch the ValueError that urlopen raises for an invalid URL:
import sys
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
def get_category_links(url):
    if not isinstance(url, str):  # reject lists and other non-string values
        print("Please give string url")
        return
    try:
        response = urlopen(url)
        # do smth with response
        print(response)
    except ValueError:  # url is a string but not a valid URL
        print("Bad URL")
get_category_links(sys.argv)
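A variation on the same runtime check, sketched under the assumption that failing loudly is preferable to printing and returning: `ensure_url` is a hypothetical helper that raises a descriptive exception before urlopen is ever called, so the traceback points at the caller's mistake.

```python
from urllib.parse import urlparse


def ensure_url(url):
    """Return url unchanged if it looks like an absolute http(s) URL, else raise."""
    if not isinstance(url, str):
        raise TypeError("url must be a str, got {}".format(type(url).__name__))
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an absolute http(s) URL: {!r}".format(url))
    return url


# Usage: response = urlopen(ensure_url(url))
```

Passing sys.argv to this helper raises a TypeError that names the offending type, instead of the 'list' object has no attribute 'timeout' message from inside urllib.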

Falcon parsing json error

I'm trying out Falcon for a small API project. Unfortunately, I'm stuck on JSON parsing, and the code from the documentation examples does not work.
I have tried many things I found on Stack Overflow and Google, but nothing changed.
I've tried the following code, which results in the errors below:
import json
import falcon
class JSON_Middleware(object):
    def process_request(self, req, resp):
        raw_json = json.loads(req.stream.read().decode('UTF-8'))
        """Exception: AttributeError: 'str' object has no attribute 'read'"""
        raw_json = json.loads(req.stream.read(), 'UTF-8')
        """Exception: TypeError: the JSON object must be str, not 'bytes'"""
        raw_json = json.loads(req.stream, 'UTF-8')
        """TypeError: the JSON object must be str, not 'Body'"""
I'm close to giving up, but if somebody can tell me why this is happening and how to parse JSON in Falcon, I would be extremely thankful.
Thanks
Environment:
OSX Sierra
Python 3.5.2
Falcon and the other packages are the latest versions from pip
Your code should work if the other pieces are in place. A quick test (filename app.py):
import falcon
import json
class JSON_Middleware(object):
    def process_request(self, req, resp):
        # read() returns bytes; decode them before parsing
        raw_json = json.loads(req.stream.read().decode('utf-8'))
        print(raw_json)
class Test:
    def on_post(self, req, resp):
        pass
app = application = falcon.API(middleware=JSON_Middleware())
t = Test()
app.add_route('/test', t)
Run with: gunicorn app
$ curl -XPOST 'localhost:8000' -d '{"Hello":"wold"}'
You have to invoke decode() on the bytes returned by read(), with something like req.stream.read().decode('utf-8').
This way the bytes are converted to a str, as expected by json.loads().
The other way to avoid all this tedious and error-prone encode/decode and bytes/str handling (which, by the way, differs between Py2 and Py3) is to use simplejson as a replacement for json. It is API compatible, so the only change is to replace import json with import simplejson as json in your code.
In addition, it simplifies the code, since reading the body can be done with json.load(req.bounded_stream), which is much shorter and more readable than json.loads(req.bounded_stream.read().decode('utf-8')).
I now do it this way, and don't use the standard json module any more.
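The bytes-to-str step can be tried without Falcon at all. The sketch below uses io.BytesIO as a stand-in for the request body stream (an assumption made for illustration; in Falcon the stream would be req.stream or req.bounded_stream) and parses it with the standard json module.

```python
import io
import json

# Stand-in for the request body stream: a binary stream holding the JSON payload.
stream = io.BytesIO(b'{"Hello": "world"}')

raw_body = stream.read()                        # bytes, as read() on a binary stream returns
parsed = json.loads(raw_body.decode("utf-8"))   # decode to str, then parse into a dict
print(parsed["Hello"])
```

Calling decode() on bytes yields a str; calling encode() goes the other way, which is why encode() is the wrong direction for this case.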

urllib not taking context as a parameter

I'm trying to pass an ssl.SSLContext to urlopen but keep getting the error:
TypeError: urlopen() got an unexpected keyword argument 'context'
I'm using Python 3 and urllib. The documentation shows a context parameter (https://docs.python.org/2/library/urllib.html), so I don't understand why it throws the error. Either way, this is the code:
try:
    # For Python 3.0 and later
    from urllib.request import urlopen, Request
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen, Request
request = Request(url, content, headers)
request.get_method = lambda: method
if sys.version_info[0] == 2 and sys.version_info[1] < 8:
    result = urlopen(request)
else:
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    result = urlopen(request, context=gcontext)
Can someone explain what I am doing wrong?
According to the urllib.request.urlopen documentation:
Changed in version 3.4.3: context was added.
So the context parameter was only added in Python 3.4.3; you need a fallback for lower versions.
In Python 2.x, it was added in Python 2.7.9 (urllib.urlopen, urllib2.urlopen).
You're also looking at the wrong docs: you were reading the Python 2.x documentation. For Python 3, https://docs.python.org/3/library/urllib.request.html is the page you want.
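The version thresholds quoted above (2.7.9 and 3.4.3) can be turned into a small guard. This is only a sketch: `supports_context` and `urlopen_kwargs` are hypothetical helper names, and ssl.create_default_context() is used here in place of the hand-built SSLContext from the question.

```python
import ssl
import sys


def supports_context(version_info=sys.version_info):
    """True if urlopen accepts context= on this interpreter version."""
    version = tuple(version_info[:3])
    if version[0] == 2:
        return version >= (2, 7, 9)
    return version >= (3, 4, 3)


def urlopen_kwargs():
    """Extra keyword arguments to pass to urlopen (hypothetical helper)."""
    if supports_context():
        return {"context": ssl.create_default_context()}
    return {}


# Usage: result = urlopen(request, **urlopen_kwargs())
```

Compared with the check in the question, this compares the full (major, minor, micro) tuple, so 2.7.0 through 2.7.8 and 3.x releases before 3.4.3 fall back to calling urlopen without context.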
