Python json.dumps() doesn't encode emojis properly [duplicate] - python

This question already has answers here:
python: json.dumps can't handle utf-8?
(3 answers)
Closed 8 months ago.
Why does json.dumps() turn emojis into \uXXXX escape sequences? See the code and output below:
import json
obj = {"key": "hello 😀"}
print(obj)
{'key': 'hello 😀'}
print(json.dumps(obj))
{"key": "hello \ud83d\ude00"}
I have tried print(json.dumps(obj).encode('utf-8')) and some variants (.decode()...), but it didn't change the output much. I'm working on Python 3.6.1.

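Pass ensure_ascii=False so that non-ASCII characters are written as-is instead of being escaped:
print(json.dumps(obj, ensure_ascii=False))
However, the default ASCII-only output is more portable, since you are almost guaranteed not to run into encoding problems. See the json.dumps docs.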
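A minimal sketch comparing the two output forms, assuming Python 3. The variable names escaped and literal are only for illustration. Both strings are valid JSON and load back to the same object; the difference is only in how the emoji is written.

```python
import json

obj = {"key": "hello 😀"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
# An emoji outside the Basic Multilingual Plane is written as a surrogate pair.
escaped = json.dumps(obj)

# ensure_ascii=False: the emoji is kept as a literal character.
literal = json.dumps(obj, ensure_ascii=False)

print(escaped)
print(literal)

# Either form decodes back to the original object.
print(json.loads(escaped) == json.loads(literal) == obj)
```

If the non-ASCII form is written to a file or socket, the encoding (typically UTF-8) has to be chosen explicitly, which is the portability trade-off mentioned above.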

Related

How Python decodes UTF8 Encoding in String Format [duplicate]

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
I have a string that contains UTF-8 bytes written as octal escapes:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but so far I can only do it this way:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This is not safe, so is there a better way?
Use ast.literal_eval() instead. Unlike eval(), it only evaluates literals, so it is safe.
You also don't need to call bytes(), since it already returns a bytes object.
import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
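This might be what you are hoping to get. Note that unicode-escape alone turns each escaped byte into its own character, so the result has to be re-encoded as Latin-1 and decoded as UTF-8:
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape').encode('latin1').decode('utf-8')
Note that s.decode("utf8") does not work here: in Python 3 a str has no decode() method, and the escape sequences have to be interpreted first.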
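The two approaches from the answers can be compared side by side. This is a sketch assuming Python 3 and assuming the input contains only ASCII characters and backslash escapes (a stray quote would break the literal_eval variant, and real non-ASCII characters would break the codec chain). The names via_literal and via_codecs are illustrative.

```python
import ast

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

# Approach 1: build a bytes literal and let ast.literal_eval parse it.
via_literal = ast.literal_eval(f"b'{s}'").decode('utf-8')

# Approach 2: interpret the escapes, map the resulting code points
# back to bytes via Latin-1, then decode those bytes as UTF-8.
via_codecs = (
    s.encode('ascii')
     .decode('unicode-escape')
     .encode('latin1')
     .decode('utf-8')
)

print(via_literal)
print(via_codecs)
```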

Changing string to ascii in python [duplicate]

This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters
output: 'Lodz'
I can't import any library like unicodedata.
I need to do it in pure Python.
I've tried to encode and then decode, but nothing worked.
Well, a simple method would be to map and replace. This also does not require any special imports.
name = 'Łódź'
name = name.replace('Ł', 'L')
name = name.replace('ó', 'o')
name = name.replace('ź', 'z')
print(name)
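The same map-and-replace idea can be written with a translation table, which avoids chaining one replace() call per character. This is a sketch, not a complete transliteration: the TO_ASCII table and the to_ascii helper are hypothetical names, and the table only covers the characters listed in it.

```python
# Hypothetical mapping table; extend it with whatever characters are needed.
TO_ASCII = str.maketrans({
    'Ł': 'L', 'ł': 'l',
    'Ó': 'O', 'ó': 'o',
    'Ź': 'Z', 'ź': 'z',
})


def to_ascii(text):
    """Replace every character found in TO_ASCII; leave the rest untouched."""
    return text.translate(TO_ASCII)


name = 'Łódź'
print(to_ascii(name))
```

Characters missing from the table pass through unchanged, so the output is only guaranteed to be ASCII if the table covers the whole input alphabet.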

urllib.unquote not properly decoding url [duplicate]

This question already has an answer here:
URLDecoding requests
(1 answer)
Closed 7 years ago.
I am able to do the following in the python shell:
>>> import urllib
>>> s='https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql'
>>> print urllib.unquote(s)
https://www.microsoft.com/de-at/store/movies/american-pie-präsentiert-nackte-tatsachen/8d6kgwzl63ql
However, if I do this within a python program, it improperly decodes the url:
url = res.history[0].url if res.history else res.url
print '1111', url
print '2222', urllib.unquote(url)
1111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
2222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql
Why isn't this being properly decoded in the program but it is in my python shell?
The following worked to fix the issue:
url = urllib.unquote(str(res.url)).decode('utf-8', 'ignore')
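res.url was a unicode string. In Python 2, urllib.unquote on a unicode object turns each %XX escape into a separate character instead of a byte, so %C3%A4 comes out as two characters rather than one UTF-8 encoded "ä". The solution was to first convert the URL to a byte string (as it was in the Python interpreter), unquote it, and then decode the result from UTF-8 into Unicode.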
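For comparison, a sketch of the same decoding in Python 3, where urllib.parse.unquote takes a str and decodes the percent-escaped bytes as UTF-8 by default. Passing encoding='latin-1' is used here only to reproduce the kind of mis-decoding described above; the names good and bad are illustrative.

```python
from urllib.parse import unquote

url = ('https://www.microsoft.com/de-at/store/movies/'
       'american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql')

# Default: percent-escaped bytes are decoded as UTF-8.
good = unquote(url)

# Decoding the same bytes as Latin-1 maps each byte to its own character,
# which produces the garbled form.
bad = unquote(url, encoding='latin-1')

print(good)
print(bad)
```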

Convert str with percents (url) to usual str [duplicate]

This question already has answers here:
Decode escaped characters in URL
(5 answers)
Closed 8 years ago.
I have strings like C%2B%2B_name.zip which are supposed to be URL-encoded. How can I convert them to C++_name.zip?
I'm using Python 3.x.
For Python 3, you will need to use:
import urllib.parse
urllib.parse.unquote('C%2B%2B_name.zip')
See urllib.parse.unquote.
In Python 2, all you need is the urllib module:
import urllib
print urllib.unquote('C%2B%2B_name.zip')
If the names contain non-ASCII characters, you can append .decode('utf8') to the result to get a unicode string.
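A short Python 3 sketch of the round trip, plus the difference between unquote and unquote_plus, which matters when a literal "+" may appear in the input. The sample string 'a+b%2Bc' is made up for illustration.

```python
from urllib.parse import quote, unquote, unquote_plus

name = unquote('C%2B%2B_name.zip')
print(name)

# quote() escapes '+' by default, so the round trip restores the input.
round_trip = quote(name)
print(round_trip)

# unquote() leaves a literal '+' alone; unquote_plus() treats it as a space,
# which is the convention for HTML form data and query strings.
plain = unquote('a+b%2Bc')
form_style = unquote_plus('a+b%2Bc')
print(plain)
print(form_style)
```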

Converting \x escaped string to UTF-8 [duplicate]

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 9 years ago.
How can I convert a string that looks like '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82' to something readable?
In Python 2.7
>>> print '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
привет
>>> print '\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'.decode('string-escape')
привет
>>> print r'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'.decode('string-escape')
привет
In python 3.x
>>> br'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'.decode('unicode-escape').encode('latin1').decode('utf-8')
'привет'
For reading from a file in Python 2, you may use codecs.open() instead of open() (the string-escape codec does not exist in Python 3):
import codecs
with codecs.open('filename', 'r', 'string-escape') as f:
    data = f.read()
The data is decoded as the file is read.
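In Python 3, one way to get a similar effect for files is to read the raw bytes and apply the same decode chain as in the one-liner above. This is a sketch under the assumption that the file contains only ASCII text and backslash escapes; read_escaped is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def read_escaped(path):
    """Read a file of literal \\xNN escapes and decode them as UTF-8 bytes."""
    with open(path, 'rb') as f:
        raw = f.read()
    # Interpret the escapes, map code points back to bytes, decode as UTF-8.
    return raw.decode('unicode-escape').encode('latin1').decode('utf-8')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'escaped.txt')
    with open(path, 'w', encoding='ascii') as f:
        f.write(r'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82')
    data = read_escaped(path)

print(data)
```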
