urllib.unquote not properly decoding url [duplicate]

This question already has an answer here:
URLDecoding requests
(1 answer)
Closed 7 years ago.
I am able to do the following in the python shell:
>>> import urllib
>>> s='https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql'
>>> print urllib.unquote(s)
https://www.microsoft.com/de-at/store/movies/american-pie-präsentiert-nackte-tatsachen/8d6kgwzl63ql
However, if I do this within a python program, it improperly decodes the url:
url = res.history[0].url if res.history else res.url
print '1111', url
print '2222', urllib.unquote(url)
1111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
2222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql
Why is this decoded properly in my Python shell but not in the program?

The following worked to fix the issue:
url = urllib.unquote(str(res.url)).decode('utf-8', 'ignore')
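res.url was a unicode string, and urllib.unquote does not handle that well in Python 2: each percent-escaped byte is turned into its own code point instead of the byte sequence being decoded as UTF-8, which is where the Ã¤ comes from. So the solution was to first convert it to a byte string with str (as it was in the Python interpreter), unquote it, and then decode the result into Unicode.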
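
For readers on Python 3, here is a minimal sketch of the same effect. It assumes urllib.parse.unquote, which decodes the escaped bytes as UTF-8 by default; passing encoding="latin-1" is only a way to reproduce the garbled output from the question, not what the original program did internally.

```python
from urllib.parse import unquote

url = ("https://www.microsoft.com/de-at/store/movies/"
       "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql")

# Default behaviour: the escaped bytes C3 A4 are decoded together as UTF-8.
decoded = unquote(url)

# Decoding the same bytes one by one as Latin-1 reproduces the mojibake.
garbled = unquote(url, encoding="latin-1")

print(decoded)
print(garbled)
```

The two results differ only in how the bytes C3 A4 are interpreted: as one UTF-8 character or as two Latin-1 characters.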

Related

Hex to plain ASCII? Python 2.7.13 [duplicate]

This question already has answers here:
Decode escaped characters in URL
(5 answers)
Closed 2 years ago.
I'm trying to turn this:
%73%6c%61%70%72%69%73%65%40%6c%69%65%6e%6d%75%6c%74%69%6d%65%64%69%61%2e%63%6f%6d
into this:
slaprise@lienmultimedia.com
and my brain is exploding.
Any help would be appreciated.
Thank you

Python 2.7.17 (should work for Python 2.7.13)
import urllib2
url = urllib2.unquote("%73%6c%61%70%72%69%73%65%40%6c%69%65%6e%6d%75%6c%74%69%6d%65%64%69%61%2e%63%6f%6d")
print(url)
# slaprise@lienmultimedia.com

You are trying to URL-decode that string; use urllib:
from urllib import unquote
url = unquote("%73%6c%61%70%72%69%73%65%40%6c%69%65%6e%6d%75%6c%74%69%6d%65%64%69%61%2e%63%6f%6d")
# url = slaprise@lienmultimedia.com
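
As a sketch for Python 3 (where unquote lives in urllib.parse), the same string can be decoded either with unquote or, because every character here happens to be escaped, by stripping the % signs and reading the rest as hex. The variable names are just for illustration, and the manual route is only valid when the input contains nothing but %XX escapes.

```python
from urllib.parse import unquote

encoded = ("%73%6c%61%70%72%69%73%65%40%6c%69%65%6e%6d%75%6c%74%69"
           "%6d%65%64%69%61%2e%63%6f%6d")

# General approach: URL-decode the percent escapes (%40 is "@").
via_unquote = unquote(encoded)

# Manual approach: drop the "%" signs and interpret the rest as hex bytes.
# This only works because every character in this input is escaped.
via_hex = bytes.fromhex(encoded.replace("%", "")).decode("ascii")

print(via_unquote)
print(via_hex)
```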

Python json.dumps() doesn't encode emojis properly [duplicate]

This question already has answers here:
python: json.dumps can't handle utf-8?
(3 answers)
Closed 8 months ago.
Why does json.dumps() encode emojis as Unicode escape sequences? See code and output below:
import json
obj = {"key": "hello 😀"}
print(obj)
{'key': 'hello 😀'}
print(json.dumps(obj))
{"key": "hello \ud83d\ude00"}
I have tried print(json.dumps(obj)).encode('utf-8') and some variants (.decode()...) but it didn't change the output much. I'm working on Python 3.6.1.

print(json.dumps(obj, ensure_ascii=False))
However, the ASCII variant is more portable, since you are almost guaranteed not to run into encoding problems. See the json.dumps documentation.
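
A small sketch of the difference, reusing the obj from the question. Both outputs are valid JSON and parse back to the same dict; the only assumption is that whatever consumes the ensure_ascii=False output handles UTF-8.

```python
import json

obj = {"key": "hello 😀"}

# Default: non-ASCII characters are written as \uXXXX escapes
# (a surrogate pair for characters outside the Basic Multilingual Plane).
escaped = json.dumps(obj)

# ensure_ascii=False keeps the emoji as a literal character.
literal = json.dumps(obj, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms decode to the same Python object.
print(json.loads(escaped) == json.loads(literal))
```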

regex getting url link [duplicate]

This question already has answers here:
Why can't Python parse this JSON data? [closed]
(3 answers)
Closed 5 years ago.
I am trying to extract the link from the output of a curl command. The curl command returns a string:
{"success":true,"key":"Syv77d","link":"https://file.io/Syv77d","expiry":"14 days"}
My code below matches https://file.io/Syv77d","expiry":"14 days"} instead:
link = re.search('https://.*$',fileIO)
What I wanted was just https://file.io/Syv77d
The link will vary, so I need the URL without the double quotes. I think I am missing something in my regex.

The string is JSON, so parse it into a Python dict and read the link key instead of using a regex.
Example:
import json
jData = json.loads('{"success":true,"key":"Syv77d","link":"https://file.io/Syv77d","expiry":"14 days"}')
jData["link"]
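
For comparison, a sketch that shows both routes on the sample response: parsing the JSON, and a regex that stops at the closing double quote. The variable name file_io stands in for the fileIO string from the question, and the regex assumes the URL itself contains no double quote.

```python
import json
import re

file_io = ('{"success":true,"key":"Syv77d",'
           '"link":"https://file.io/Syv77d","expiry":"14 days"}')

# Preferred: parse the JSON and read the field directly.
link = json.loads(file_io)["link"]

# Regex alternative: match up to, but not including, the next double quote.
match = re.search(r'https://[^"]+', file_io)
regex_link = match.group(0) if match else None

print(link)
print(regex_link)
```

The original pattern https://.*$ is greedy and runs to the end of the string, which is why it also captured the expiry field.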

Python Request in unicode [duplicate]

This question already has answers here:
urllib.quote() throws KeyError
(3 answers)
Closed 7 years ago.
I am trying to make a program that sends a request to Steam to get the cheapest price for an item. For this I will be using StatTrak™ P250 | Supernova (Factory New) as an example.
The problem is that the request needs a URL like this:
http://www.steamcommunity.com/market/priceoverview/?country=SG&currency=13&appid=730&market_hash_name=StatTrak™%20P250%20%7C%20Supernova%20%28Factory%20New%29
Afterwards, (I am using the requests module) I do this:
url = "http://www.steamcommunity.com/market/priceoverview/?country=SG&currency=13&appid=730&market_hash_name=StatTrak™%20P250%20%7C%20Supernova%20%28Factory%20New%29"
requests.get(url)
However, the server returns an error.
I can't seem to find a way to encode ™. I have tried %2122. In Python I tried using u'\u084a', but that didn't work either. The problem is that Python sends \u084a literally in the request. Is there any way to solve this?

Just use URL encoding. You can't put raw Unicode characters in URLs; they have to be percent-encoded as UTF-8 bytes.
>>> import urllib
>>> f = {'market_hash_name': 'StatTrak™'}
>>> urllib.urlencode(f)
'market_hash_name=StatTrak%E2%84%A2'
Also possible:
>>> urllib.quote_plus('StatTrak™')

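In Python 3 the same functions live in urllib.parse. The sketch below builds the full query string from the question's parameters; passing quote_via=quote is a choice made here so that spaces become %20 as in the question's URL (the default, quote_plus, would emit + instead).

```python
from urllib.parse import urlencode, quote

base = "http://www.steamcommunity.com/market/priceoverview/"
params = {
    "country": "SG",
    "currency": "13",
    "appid": "730",
    "market_hash_name": "StatTrak™ P250 | Supernova (Factory New)",
}

# Percent-encode every value as UTF-8; "™" becomes %E2%84%A2.
query = urlencode(params, quote_via=quote)
url = base + "?" + query

print(url)
```

With the requests module it should also be enough to pass the dict as requests.get(base, params=params) and let the library do the encoding.
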
Convert str with percents (url) to usual str [duplicate]

This question already has answers here:
Decode escaped characters in URL
(5 answers)
Closed 8 years ago.
I have strings like C%2B%2B_name.zip which are supposed to be URL-encoded. How do I convert them to C++_name.zip?
Python 3.x.

For Python 3, you will need to use:
urllib.parse.unquote('C%2B%2B_name.zip')
See urllib.parse.unquote.

In Python 2, all you need is the urllib library:
import urllib
print urllib.unquote('C%2B%2B_name.zip')
If the names contain non-ASCII characters, you can add .decode('utf8') to the result.
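
If the same script has to run on both Python 2 and Python 3, one possible sketch is to try the Python 3 import first and fall back to the Python 2 location. The unquote_plus line is included only to show the difference in how a literal + is treated.

```python
try:
    from urllib.parse import unquote, unquote_plus  # Python 3
except ImportError:
    from urllib import unquote, unquote_plus  # Python 2

# %2B is an escaped "+", so this yields C++_name.zip.
name = unquote("C%2B%2B_name.zip")

# unquote leaves a literal "+" alone; unquote_plus turns it into a space,
# which is what form-encoded query strings expect.
plain = unquote("a+b")
form = unquote_plus("a+b")

print(name)
print(plain)
print(form)
```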
