urllib not taking context as a parameter - python

I'm trying to pass an ssl.SSLContext to the urlopen method but keep getting the error:
TypeError: urlopen() got an unexpected keyword argument 'context'
I'm using Python 3 and urllib. The documentation shows a context parameter (https://docs.python.org/2/library/urllib.html), so I don't understand why it throws the error. Here is the code:
try:
    # For Python 3.0 and later
    from urllib.request import urlopen, Request
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen, Request
request = Request(url, content, headers)
request.get_method = lambda: method
if sys.version_info[0] == 2 and sys.version_info[1] < 8:
    result = urlopen(request)
else:
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    result = urlopen(request, context=gcontext)
Can someone explain what I am doing wrong?

According to urllib.request.urlopen documentation:
Changed in version 3.4.3: context was added.
So the context parameter only exists from Python 3.4.3 onwards. You need a fallback for older versions.
In Python 2.x, it was added in Python 2.7.9 (urllib.urlopen, urllib2.urlopen).
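One way to sketch that fallback is to decide from the interpreter version whether to pass context at all. The helper names supports_context and build_urlopen_kwargs are made up for this illustration, and ssl.create_default_context() is used here in place of the hand-built SSLContext from the question:

```python
import ssl
import sys


def supports_context(version_info=sys.version_info):
    """Return True if urlopen() accepts context= on this interpreter.

    Thresholds come from the docs quoted above: 2.7.9 and 3.4.3.
    """
    version = tuple(version_info[:3])
    if version[0] == 2:
        return version >= (2, 7, 9)
    return version >= (3, 4, 3)


def build_urlopen_kwargs(version_info=sys.version_info):
    """Build the keyword arguments to pass to urlopen() (hypothetical helper)."""
    kwargs = {}
    if supports_context(version_info):
        # Default context with certificate verification enabled.
        kwargs["context"] = ssl.create_default_context()
    return kwargs
```

The call site then becomes urlopen(request, **build_urlopen_kwargs()), which works on both old and new interpreters.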

You're looking at the wrong docs. https://docs.python.org/3.0/library/urllib.request.html are the ones you want. You were using Python 2.X documentation.

Related

urllib2 and HTTPErrorProcessor Python 3

I'm trying to port my Python code from 2.7 to 3.6.
I'm not very familiar with Python, and I get an error with urllib2.
This is the error:
Error Contents: name 'urllib2' is not defined
So I do this:
from urllib.request import urlopen
This may be fine, since urllib2 doesn't exist in Python 3?
But I have this:
class NoRedirection(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response
    https_response = http_response
This is what I tried to change it to:
class NoRedirection(urlopen.HTTPErrorProcessor):
But it doesn't work. How do I fix this?
**AttributeError: 'function' object has no attribute 'HTTPErrorProcessor'**
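In Python 3, HTTPErrorProcessor lives in urllib.request; the separate urllib.error module only holds the exception classes (such as HTTPError). urlopen is a function, which is why it has no such attribute. What you want to do is something along these lines:
from urllib.request import HTTPErrorProcessor
class NoRedirection(HTTPErrorProcessor):
    ...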
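For reference, a minimal Python 3 sketch of the handler from the question, assuming the class is imported from urllib.request (where HTTPErrorProcessor is defined) and installed with build_opener. The opener variable name is only illustrative:

```python
from urllib.request import HTTPErrorProcessor, build_opener


class NoRedirection(HTTPErrorProcessor):
    """Hand every response back untouched, so 3xx replies are not followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


# build_opener() drops the default HTTPErrorProcessor when it is given a subclass.
opener = build_opener(NoRedirection)
```

Calling opener.open(url) should then return the redirect response itself rather than following it.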

Equivalent urllib.parse.unquote() in python 2.7

I import urlparse instead of urllib.parse in Python 2.7, but I get AttributeError: 'function' object has no attribute 'unquote':
  File "./URLDefenseDecode2.py", line 40, in decodev2
    htmlencodedurl = urlparse.unquote(urlencodedurl)
What is the equivalent urllib.parse.unquote() in python 2.7 ?
In Python 2.7, unquote is directly in urllib: urllib.unquote(string)
The semantics of Python 3 urllib.parse.unquote are not the same as Python 2 urllib.unquote, especially when dealing with non-ASCII strings.
The following code should allow you to always use the newer semantics of Python 3 and eventually you can just remove it, when you no longer need to support Python 2.
try:
    from urllib.parse import unquote
except ImportError:
    from urllib import unquote as stdlib_unquote
    # polyfill. This behaves the same as urllib.parse.unquote on Python 3
    def unquote(string, encoding='utf-8', errors='replace'):
        if isinstance(string, bytes):
            raise TypeError("a bytes-like object is required, not '{}'".format(type(string)))
        return stdlib_unquote(string.encode(encoding)).decode(encoding, errors=errors)
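To make the Python 3 semantics concrete, here is a small sketch run on Python 3 only. The sample strings are made up for illustration:

```python
from urllib.parse import unquote

# Percent-escapes are decoded as UTF-8 by default and come back as text (str).
decoded = unquote("caf%C3%A9")

# Bytes that are not valid UTF-8 become U+FFFD, because errors='replace' is the default.
invalid = unquote("%FF")

# Plain ASCII escapes behave the way they did in Python 2.
spaced = unquote("a%20b")
```

The polyfill above aims to give the same results on Python 2 when it is passed a unicode string.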

Passing a list as a url value to urlopen

Motivation
Motivated by this problem - the OP was using urlopen() and accidentally passed a sys.argv list instead of a string as a url. This error message was thrown:
AttributeError: 'list' object has no attribute 'timeout'
Because of the way urlopen was written, the error message and the traceback are not very informative and may be difficult to understand, especially for a Python newcomer:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    get_category_links(sys.argv)
  File "test.py", line 10, in get_category_links
    response = urlopen(url)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
Problem
Here is the shortened code I'm working with:
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
import sys
def get_category_links(url):
    response = urlopen(url)
    # do smth with response
    print(response)
get_category_links(sys.argv)
I'm trying to think whether this kind of an error can be caught statically with either smart IDEs like PyCharm, static code analysis tools like flake8 or pylint, or with language features like type annotations.
But, I'm failing to detect the problem:
it is probably too specific for flake8 and pylint to catch - they don't warn about the problem
PyCharm does not warn about sys.argv being passed into urlopen, even though, if you "jump to source" of sys.argv it is defined as:
argv = [] # real value of type <class 'list'> skipped
if I annotate the function parameter as a string and pass sys.argv, no warnings as well:
def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response
get_category_links(sys.argv)
Question
Is it possible to catch this problem statically (without actually executing the code)?
Instead of keeping it editor specific, you can use mypy to analyze your code. This way it will run on all dev environments instead of just for those who use PyCharm.
from urllib.request import urlopen
import sys
def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response
get_category_links(sys.argv)
response = urlopen(sys.argv)
The issues pointed out by mypy for the above code:
error: Argument 1 to "get_category_links" has incompatible type List[str]; expected "str"
error: Argument 1 to "urlopen" has incompatible type List[str]; expected "Union[str, Request]"
Mypy can infer the type of sys.argv here because of its definition in the typeshed stub file. Right now some standard library modules are still missing from typeshed, though, so you will have to either contribute them or ignore the related errors until they get added :-).
When to run mypy?
To catch such errors you can run mypy on the annotated files alongside your tests in your CI tool. Running it on every file in the project may take some time; for a small project it is your choice.
You can also add a pre-commit hook that runs mypy on staged files and points out issues right away (this could be a little annoying to the dev if it takes a while).
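Once mypy flags the call, the fix is to hand over a single string rather than the whole argument list. A small annotated sketch of that; the helper name pick_url and the usage message are invented for this example:

```python
import sys
from typing import List


def pick_url(argv: List[str]) -> str:
    """Return the first command-line argument, which is assumed to be the URL."""
    if len(argv) < 2:
        raise SystemExit("usage: test.py URL")
    return argv[1]


def get_category_links(url: str) -> None:
    # Placeholder body; the real function would call urlopen(url) here.
    print(url)


if __name__ == "__main__":
    # The argument is now a str, so this call matches the annotation.
    get_category_links(pick_url(sys.argv))
```

With this shape, passing sys.argv directly to get_category_links is still reported by mypy, while the annotated helper keeps the list-to-string step explicit.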
First, check whether url is a string at all; if it is, catch the ValueError that urlopen raises for an invalid URL:
import sys
from urllib2 import urlopen
def get_category_links(url):
    if not isinstance(url, str):  # Check if url is a string or not
        print("Please give string url")
        return
    try:
        response = urlopen(url)
        # do smth with response
        print(response)
    except ValueError:  # If url is a string but invalid
        print("Bad URL")
get_category_links(sys.argv)

Falcon parsing json error

I'm trying out Falcon for a small API project. Unfortunately I'm stuck on the JSON parsing, and the code from the documentation examples does not work.
I have tried many things I found on Stack Overflow and Google, but nothing changes.
I've tried each of the following lines, which result in the errors shown below them:
import json
import falcon
class JSON_Middleware(object):
    def process_request(self, req, resp):
        raw_json = json.loads(req.stream.read().decode('UTF-8'))
        """Exception: AttributeError: 'str' object has no attribute 'read'"""
        raw_json = json.loads(req.stream.read(), 'UTF-8')
        """Exception: TypeError: the JSON object must be str, not 'bytes'"""
        raw_json = json.loads(req.stream, 'UTF-8')
        """TypeError: the JSON object must be str, not 'Body'"""
I'm close to giving up, but if somebody can tell me why this is happening and how to parse JSON in Falcon I would be extremely thankful.
Thanks
Environment:
OSX Sierra
Python 3.5.2
Falcon and the other packages are the latest versions from pip
Your code should work if the other pieces are in place. A quick test (filename app.py):
import falcon
import json
class JSON_Middleware(object):
    def process_request(self, req, resp):
        # Decode the raw bytes first; on Python 3.5 json.loads() only accepts str
        raw_json = json.loads(req.stream.read().decode('utf-8'))
        print(raw_json)
class Test:
    def on_post(self, req, resp):
        pass
app = application = falcon.API(middleware=JSON_Middleware())
t = Test()
app.add_route('/test', t)
run with: gunicorn app
$ curl -XPOST 'localhost:8000' -d '{"Hello":"wold"}'
You have to invoke decode() on the bytes returned by read(), with something like req.stream.read().decode('utf-8').
This way the bytes are converted to a str as expected by json.loads().
The other way not to bother with all this boring and error prone encode/decode and bytes/str stuff (which BTW differs in Py2 and Py3), is to use simplejson as a replacement for json. It is API compatible, so the only change is to replace import json with import simplejson as json in your code.
In addition, it simplifies the code since reading the body can be done with json.load(req.bounded_stream), which is much shorter and more readable than json.loads(req.bounded_stream.read().decode('utf-8')).
I now do it this way, and don't use the standard json module any more.
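The bytes-to-str step can be shown without Falcon at all. In the sketch below, io.BytesIO stands in for the request stream (an assumption made so the example runs on its own), and parse_json_body is a hypothetical helper name:

```python
import io
import json


def parse_json_body(stream, encoding="utf-8"):
    """Read a file-like body and parse it as JSON (hypothetical helper)."""
    raw = stream.read()
    if isinstance(raw, bytes):
        # json.loads() on Python 3.5 needs text, so decode the bytes first.
        raw = raw.decode(encoding)
    return json.loads(raw)


# io.BytesIO plays the role of req.stream / req.bounded_stream here.
body = io.BytesIO(b'{"Hello": "world"}')
payload = parse_json_body(body)
```

Inside the middleware the same helper would be called with the request's stream object instead of the BytesIO stand-in.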

How do I search for text in a page using regular expressions in Python?

I'm trying to create a simple module for phenny, a simple IRC bot framework in Python. The module is supposed to go to http://www.isup.me/websitetheuserrequested to check whether a website is up or down. I assumed I could use regex for the module, seeing as other built-in modules use it too, so I tried creating this simple script, although I don't think I did it right.
import re, urllib
import web
isupuri = 'http://www.isup.me/%s'
check = re.compile(r'(?ims)<span class="body">.*?</span>')
def isup(phenny, input):
    global isupuri
    global cleanup
    bytes = web.get(isupuri)
    quote = check.findall(bytes)
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
    phenny.say(result)
isup.commands = ['isup']
isup.priority = 'low'
isup.example = '.isup google.com'
It imports the required web packages (I think), and defines the string and the text to look for within the page. I really don't know what I did in those four lines, I kinda just ripped the code off another phenny module.
Here is an example of a quotes module that grabs a random quote from some webpage, I kinda tried to use that as a base: http://pastebin.com/vs5ypHZy
Does anyone know what I am doing wrong? If something needs clarified I can tell you, I don't think I explained this enough.
Here is the error I get:
Traceback (most recent call last):
  File "C:\phenny\bot.py", line 189, in call
    try: func(phenny, input)
  File "C:\phenny\modules\isup.py", line 18, in isup
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
IndexError: list index out of range
try this (from http://docs.python.org/release/2.6.7/library/httplib.html#examples):
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD","/index.html")
res = conn.getresponse()
if res.status >= 200 and res.status < 300:
    print "up"
else:
    print "down"
You will also need to add code to follow redirects before checking the response status.
edit
Alternative that does not need to handle redirects but uses exceptions for logic:
import urllib2
request = urllib2.Request('http://google.com')
request.get_method = lambda : 'HEAD'
try:
    response = urllib2.urlopen(request)
    print "up"
    print response.code
except urllib2.URLError, e:
    # failure
    print "down"
    print e
You should do your own tests and choose the best one.
The error means your regexp wasn't found anywhere on the page (the list quote has no element 0).
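A guarded version of the extraction avoids the IndexError by checking for a match before indexing. This is only a sketch: the function name extract_body_text is made up, and the page markup is assumed to look like the pattern in the question:

```python
import re

# Same patterns as in the question: the span to find, and a tag stripper.
SPAN = re.compile(r'(?ims)<span class="body">.*?</span>')
TAG = re.compile(r'<[^>]*?>')


def extract_body_text(html):
    """Return the text of the first matching span, or None if the page has none."""
    match = SPAN.search(html)
    if match is None:
        return None
    return TAG.sub('', match.group(0)).strip()
```

The bot can then report something like "could not parse the page" when None comes back instead of crashing.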
