Equivalent urllib.parse.unquote() in python 2.7 - python

I import urlparse instead of urllib.parse in Python 2.7, but I get AttributeError: 'function' object has no attribute 'unquote':
File "./URLDefenseDecode2.py", line 40, in decodev2
htmlencodedurl = urlparse.unquote(urlencodedurl)
What is the equivalent of urllib.parse.unquote() in Python 2.7?

In Python 2.7, unquote is directly in urllib: urllib.unquote(string)

The semantics of Python 3's urllib.parse.unquote are not the same as those of Python 2's urllib.unquote, especially when dealing with non-ASCII strings.
The following code should let you always use the newer Python 3 semantics, and you can simply remove it once you no longer need to support Python 2.
try:
    from urllib.parse import unquote
except ImportError:
    from urllib import unquote as stdlib_unquote
    # polyfill. This behaves the same as urllib.parse.unquote on Python 3
    def unquote(string, encoding='utf-8', errors='replace'):
        if isinstance(string, bytes):
            raise TypeError("a bytes-like object is required, not '{}'".format(type(string)))
        return stdlib_unquote(string.encode(encoding)).decode(encoding, errors=errors)
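As a small illustration of the Python 3 semantics the polyfill above aims for, the sketch below decodes percent-encoded UTF-8 into text. The sample strings and variable names are made up for the example, and it imports directly from urllib.parse, so it assumes Python 3.

```python
from urllib.parse import unquote

# Percent-escapes are decoded as UTF-8 by default, giving a text string.
decoded = unquote("caf%C3%A9%20au%20lait")

# Bytes that are not valid UTF-8 are replaced rather than raising,
# because the default is errors='replace'.
replaced = unquote("%FF")

# unquote() leaves '+' alone; only unquote_plus() turns it into a space.
plus_kept = unquote("a+b")

print(decoded)
```

On Python 2, plain urllib.unquote would hand back the raw byte values instead of decoded text, which is the difference the polyfill is meant to hide.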

Related

urllib2 and HTTPErrorProcessor Python 3

I'm trying to port my Python code from 2.7 to 3.6.
I'm not familiar with Python, and I get an error with urllib2.
I have this error:
Error Contents: name 'urllib2' is not defined
So I did this:
from urllib.request import urlopen
This is probably OK, because urllib2 doesn't exist in Python 3?
But I also have this:
class NoRedirection(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response
    https_response = http_response
This is what I tried to change it to:
class NoRedirection(urlopen.HTTPErrorProcessor):
But it doesn't work. How do I fix this?
AttributeError: 'function' object has no attribute 'HTTPErrorProcessor'
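urlopen is a function, so it has no HTTPErrorProcessor attribute. In Python 3 the exception classes from urllib2 moved to the separate urllib.error module, but the handler classes, including HTTPErrorProcessor, live in urllib.request. What you want to do is something along these lines:
from urllib.request import HTTPErrorProcessor
class NoRedirection(HTTPErrorProcessor):
    ...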
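A minimal sketch of how the question's handler could be wired up on Python 3, assuming the goal is an opener that returns 3xx responses as they are instead of following them. In Python 3, HTTPErrorProcessor is importable from urllib.request; the opener variable name is illustrative, and no request is sent here.

```python
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Pass every response through untouched, so redirects are not followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


# build_opener() drops a default handler when a subclass of it is supplied,
# so NoRedirection takes the place of the stock HTTPErrorProcessor.
opener = urllib.request.build_opener(NoRedirection)
```

Requests made with opener.open(url) would then hand back the raw response, including its 3xx status code, rather than the redirect target.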

Falcon parsing json error

I'm trying out Falcon for a small API project. Unfortunately, I'm stuck on the JSON parsing, and the code from the documentation examples does not work.
I have tried many things I found on Stack Overflow and Google, but nothing changed.
I've tried the following code, which results in the errors shown below each attempt:
import json
import falcon
class JSON_Middleware(object):
    def process_request(self, req, resp):
        raw_json = json.loads(req.stream.read().decode('UTF-8'))
        """Exception: AttributeError: 'str' object has no attribute 'read'"""
        raw_json = json.loads(req.stream.read(), 'UTF-8')
        """Exception: TypeError: the JSON object must be str, not 'bytes'"""
        raw_json = json.loads(req.stream, 'UTF-8')
        """TypeError: the JSON object must be str, not 'Body'"""
I'm close to giving up, but if somebody can tell me why this is happening and how to parse JSON in Falcon, I would be extremely thankful.
Thanks
Environment:
OSX Sierra
Python 3.5.2
Falcon and the other packages are the latest versions from pip
Your code should work if the other pieces of code are in place. A quick test (filename app.py):
import falcon
import json
class JSON_Middleware(object):
    def process_request(self, req, resp):
        # read() returns bytes; decode before parsing so this also works on Python 3.5
        raw_json = json.loads(req.stream.read().decode('utf-8'))
        print(raw_json)
class Test:
    def on_post(self, req, resp):
        pass
app = application = falcon.API(middleware=JSON_Middleware())
t = Test()
app.add_route('/test', t)
run with: gunicorn app
$ curl -XPOST 'localhost:8000' -d '{"Hello":"wold"}'
You have to invoke decode() on the bytes returned by read(), with something like req.stream.read().decode('utf-8').
This way the bytes are converted to a str, as expected by json.loads().
The other way to avoid all this tedious and error-prone encode/decode and bytes/str handling (which, by the way, differs between Python 2 and Python 3) is to use simplejson as a replacement for json. It is API-compatible, so the only change is to replace import json with import simplejson as json in your code.
In addition, it simplifies the code, since reading the body can be done with json.load(req.bounded_stream), which is much shorter and more readable than json.loads(req.bounded_stream.read().decode('utf-8')).
I now do it this way, and don't use the standard json module any more.
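The bytes-to-str step can be tried without running Falcon at all. In the sketch below, io.BytesIO stands in for the request stream, and read_json_body is a hypothetical helper name, not part of Falcon or the json module.

```python
import io
import json


def read_json_body(stream, encoding="utf-8"):
    """Read all bytes from a file-like object and parse them as JSON."""
    raw = stream.read()            # bytes, as a request body stream returns
    text = raw.decode(encoding)    # bytes -> str
    return json.loads(text)


# Stand-in for the request body; in Falcon this would be the request stream.
fake_stream = io.BytesIO(b'{"Hello": "world"}')
payload = read_json_body(fake_stream)
print(payload)
```

Decoding explicitly keeps the helper working on Python 3.5, where json.loads() accepts only str; from Python 3.6 onward it also accepts bytes.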

AttributeError: 'module' object has no attribute 'loads' while parsing json in python

I am getting this error:
Traceback (most recent call last):
File "C:/Users/Shivam/Desktop/jsparse.py", line 13, in <module>
info = json.loads(str(data))
AttributeError: 'module' object has no attribute 'loads'
Any thoughts on what I am doing wrong here?
This is my code:
import json
import urllib
url = ''
uh = urllib.urlopen(url)
data = uh.read()
info = json.loads(str(data))
The problem is that you're using Python 2.5.x, which doesn't have the json module. If possible, I recommend upgrading to Python 2.7.x, as 2.5.x is badly outdated.
If you need to stick with Python 2.5.x, you'll have to use the simplejson module. This code will work for 2.5.x as well as newer Python versions:
try:
    import json
except ImportError:
    import simplejson as json
Or if you're only using Python 2.5, just do:
import simplejson as json

urllib not taking context as a parameter

I'm trying to pass an ssl.SSLContext to urlopen but keep getting the error:
TypeError: urlopen() got an unexpected keyword argument 'context'
I'm using Python 3 and urllib. The documentation at https://docs.python.org/2/library/urllib.html defines a context parameter, so I don't understand why it throws the error. Either way, this is the code:
try:
    # For Python 3.0 and later
    from urllib.request import urlopen, Request
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen, Request
request = Request(url, content, headers)
request.get_method = lambda: method
if sys.version_info[0] == 2 and sys.version_info[1] < 8:
    result = urlopen(request)
else:
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    result = urlopen(request, context=gcontext)
Can someone explain what I am doing wrong?
According to the urllib.request.urlopen documentation:
Changed in version 3.4.3: context was added.
The context parameter was added in Python 3.4.3, so you need a fallback for lower versions.
In Python 2.x, it was added in Python 2.7.9 (urllib.urlopen, urllib2.urlopen).
You're looking at the wrong docs. https://docs.python.org/3.0/library/urllib.request.html are the ones you want. You were using the Python 2.x documentation.
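One way to express the fallback is to gate the context argument on the full version tuple. Note that the check in the question compares the minor version (7) against 8, so it never looks at the micro version where the change landed. The helper names supports_context and open_kwargs below are hypothetical, and the sketch sends no request.

```python
import ssl
import sys


def supports_context(version=sys.version_info):
    """Return True if urlopen() accepts a context argument on this version."""
    version = tuple(version[:3])
    if version[0] == 2:
        return version >= (2, 7, 9)   # added in Python 2.7.9
    return version >= (3, 4, 3)       # added in Python 3.4.3


def open_kwargs():
    """Build keyword arguments for urlopen(), adding a context only if supported."""
    if supports_context():
        return {"context": ssl.create_default_context()}
    return {}


print(supports_context())
```

A call would then look like urlopen(request, **open_kwargs()). This sketch uses ssl.create_default_context() in place of the question's ssl.SSLContext(ssl.PROTOCOL_TLSv1), since it picks reasonable protocol settings on its own.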

I'm trying to get proxies out of a web page using a Python regex [duplicate]

import urllib.request
import re
page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
re.findall('\d+\.\d+\.\d+\.\d+', page)
I don't understand why it says:
File "C:\Python33\lib\re.py", line 201, in findall
return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
import urllib
import re
page = urllib.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
print re.findall('\d+\.\d+\.\d+\.\d+', page)
This worked for me and gave the result:
['056.249.66.50', '100.44.124.8', '103.31.250.115', ...
Edit: this works for Python 2.7.
The result of reading the file-like object returned by urllib.request.urlopen is a bytes object. You can either decode it into a unicode string and use a unicode regex:
>>> re.findall('\d+\.\d+\.\d+\.\d+', page.decode('utf-8'))
['056.249.66.50', '100.44.124.8', '103.31.250.115', '105.236.180.243', '105.236.21.213', '108.171.162.172', '109.207.61.143', '109.207.61.197', '109.207.61.202', '109.226.199.129', '109.232.112.109', '109.236.220.98', '110.196.42.33', '110.74.197.141', '110.77.183.64', '110.77.199.111', '110.77.200.248', '110.77.219.154', '110.77.219.2', '110.77.221.208']
... or use a bytes regex:
>>> re.findall(b'\d+\.\d+\.\d+\.\d+', page)
[b'056.249.66.50', b'100.44.124.8', b'103.31.250.115', b'105.236.180.243', b'105.236.21.213', b'108.171.162.172', b'109.207.61.143', b'109.207.61.197', b'109.207.61.202', b'109.226.199.129', b'109.232.112.109', b'109.236.220.98', b'110.196.42.33', b'110.74.197.141', b'110.77.183.64', b'110.77.199.111', b'110.77.200.248', b'110.77.219.154', b'110.77.219.2', b'110.77.221.208']
Depending on which datatype you prefer to work with.
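Both options can be compared side by side on a small in-memory sample, without fetching the page. The sample bytes and variable names below are made up for illustration, and a raw string is used for the pattern so the backslashes reach the regex engine unchanged.

```python
import re

# Stand-in for the bytes returned by urlopen(...).read().
page = b"<td>10.0.0.1</td><td>192.168.1.20</td>"

IP_PATTERN = r"\d+\.\d+\.\d+\.\d+"

# Option 1: decode the bytes and match with a str pattern.
as_text = re.findall(IP_PATTERN, page.decode("utf-8"))

# Option 2: keep the bytes and match with a bytes pattern.
as_bytes = re.findall(IP_PATTERN.encode("ascii"), page)

print(as_text)
print(as_bytes)
```

The pattern type and the input type must match: a str pattern with str input, or a bytes pattern with bytes input. Mixing them raises the TypeError shown in the question.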
