This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
I have a list of dictionaries containing Arabic words, like
data = [{'name': 'آدَم'}, {'name': 'آزَر'}]
print(json.dumps(data), file=open('data.json', 'a', encoding="utf-8"))
Output:
[{"name": "\u0622\u0632\u064e\u0631"}...]
I don't want the Arabic text to be escaped when the data.json file is created. If I do not use json.dumps, the text is written correctly, but the output then uses single quotes ' instead of double quotes ".
Pass the parameter ensure_ascii=False:
json.dumps(data, ensure_ascii=False)
From the json.dumps documentation:
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
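A minimal sketch of the file-writing case from the question, assuming Python 3. The temporary directory and the `path` variable are only illustrative; the point is that `encoding="utf-8"` on `open` and `ensure_ascii=False` on `json.dump` have to be used together.

```python
import json
import os
import tempfile

data = [{"name": "آدَم"}, {"name": "آزَر"}]

# Hypothetical output location; a temp directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# encoding="utf-8" controls the bytes written to disk; ensure_ascii=False
# stops json from replacing non-ASCII characters with \uXXXX escapes.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Read the file back to inspect what was actually written.
with open(path, encoding="utf-8") as f:
    written = f.read()

print(written)
```

The `with` block also closes the file, which the `print(..., file=open(...))` form in the question leaves to the garbage collector.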
I'm trying to dump everything from a Scrapy crawler into a JSON file, but äåö is changed to something like \u00f6.
How do I fix that?
Use ensure_ascii=False
Ex:
import json
data = {"Hello": "äåö"}
print(json.dumps(data, ensure_ascii=False)) # --> {"Hello": "äåö"}
Note: If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
I know this has been asked before on Stack Overflow and on other sites, but I cannot seem to save a JSON file using escaped Unicode characters (Python 3). I have read a lot of tutorials.
What am I missing? I have tried a lot of things but nothing works. I have also tried encoding/decoding in UTF-8 but I am obviously missing something.
Just to be clear, I have managed to get it working for other characters like й (0439), but I am having trouble with a single quote being encoded.
If I have the following dict:
import json
data = {"key": "Test \u0027TEXT\u0027 around"}
I want to save it exactly as it is in a new JSON file, but no matter what I do, each escape sequence always ends up as the single character it encodes.
The following 2 blocks print the exact same thing: {"key": "Test 'TEXT' around"}.
print(json.dumps(data))
print(json.dumps(data, ensure_ascii=False))
Is there any way to keep the Unicode string literal? I want to have that very string as a value: "Test \u0027TEXT\u0027 around"
The behavior you are describing has nothing to do with JSON. This is simply how Python 3 handles strings. Open the shell and write:
>>> "Test \u0027TEXT\u0027 around"
"Test 'TEXT' around"
If you do not want Python to interpret the special characters, you should use raw strings (or maybe even byte sequences):
>>> r"Test \u0027TEXT\u0027 around"
'Test \\u0027TEXT\\u0027 around'
Reference:
https://docs.python.org/2.0/ref/strings.html
https://docs.python.org/3/library/stdtypes.html#binaryseq
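A short sketch, assuming Python 3, of what happens at each stage. The variable names are illustrative. It shows that Python resolves the escape before json ever sees the string, that a raw string makes json escape the backslash itself, and that a JSON parser treats \u0027 and ' as the same character anyway.

```python
import json

# Python resolves \u0027 while parsing the literal, so json only ever sees '.
interpreted = "Test \u0027TEXT\u0027 around"

# In a raw string the backslash stays a real character.
raw = r"Test \u0027TEXT\u0027 around"

# json has to escape that backslash, so the file would contain \\u0027.
dumped = json.dumps({"key": raw})
print(dumped)

# Hand-written JSON text containing a single-backslash \u0027 escape.
json_text = '{"key": "Test \\u0027TEXT\\u0027 around"}'

# A JSON parser resolves the escape, giving the plain apostrophe again.
decoded = json.loads(json_text)
print(decoded["key"])
```

Since both spellings decode to the same value, any JSON consumer will read `'` and `\u0027` identically.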
I want to log responses from requests to a file, but when I use getLogger().warning(string), the string is written unicode-escaped.
Example:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result))
This code writes a unicode-escaped string to the log file, where "Ошибка" is written as "\u041e\u0448\u0438\u0431\u043a\u0430".
But I want to see these characters as they are.
Your issue is that json.dumps is converting the Unicode to ASCII by escaping it. This can be avoided by adding an extra parameter, ensure_ascii=False, to the .dumps function:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result, ensure_ascii=False))
You can check the documentation for other arguments to the json.dumps function.
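A self-contained sketch of the difference, assuming Python 3. The logger name `unicode_demo` is hypothetical, and an in-memory `io.StringIO` stream stands in for the log file so the result can be inspected directly.

```python
import io
import json
import logging

stream = io.StringIO()                      # stands in for the log file
logger = logging.getLogger("unicode_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False                    # keep the demo output off the root logger

result = {"error": "Ошибка 1"}

logger.warning(json.dumps(result))                      # escaped (default)
logger.warning(json.dumps(result, ensure_ascii=False))  # characters kept as-is

lines = stream.getvalue().splitlines()
print(lines[0])
print(lines[1])
```

When logging to a real file, `logging.FileHandler` accepts an `encoding` argument, so passing `encoding="utf-8"` there is probably also needed for the characters to be stored correctly.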
This question already has answers here:
What does a leading `\x` mean in a Python string `\xaa`
(2 answers)
Closed 7 years ago.
text="\xe2\x80\x94"
print re.sub(r'(\\(?<=\\)x[a-z0-9]{2})+',"replacement_text",text)
output is —
How can I handle the hexadecimal characters in this situation?
Your input doesn't have backslashes. It has 3 bytes, the UTF-8 encoding for the U+2014 EM DASH character:
>>> text = "\xe2\x80\x94"
>>> len(text)
3
>>> text[0]
'\xe2'
>>> text.decode('utf8')
u'\u2014'
>>> print text.decode('utf8')
—
You either need to match those UTF-8 bytes directly, or decode from UTF-8 to unicode and match the codepoint. The latter is preferable; always try to deal with text as Unicode to simplify how many characters you have to transform at a time.
Also note that Python's repr() output (which is used implicitly when echoing in the interactive interpreter or when printing lists, dicts or other containers) uses \xhh escape sequences to represent any non-printable character. For UTF-8 strings, that includes anything outside the ASCII range. You could just replace anything outside that range with:
re.sub(r'[\x80-\xff]+', "replacement_text", text)
Take into account that this'll match multiple UTF-8-encoded characters in a row, and replace these together as a group!
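The session above is Python 2. As a rough Python 3 equivalent (an assumption about the reader's environment, not part of the original answer), both approaches can be sketched like this, with `bytes` playing the role of the Python 2 `str`:

```python
import re

raw = b"\xe2\x80\x94"        # three UTF-8 bytes
text = raw.decode("utf-8")   # one code point: U+2014 EM DASH

# Preferred: decode first, then match the code point.
by_codepoint = re.sub("\u2014", "replacement_text", text)

# Alternative: match the non-ASCII bytes directly with a bytes pattern.
by_bytes = re.sub(rb"[\x80-\xff]+", b"replacement_text", raw)

# Adjacent encoded characters are replaced together as one group.
grouped = re.sub(rb"[\x80-\xff]+", b"replacement_text", b"a" + raw + raw + b"b")

print(len(raw), len(text))
print(by_codepoint, by_bytes, grouped)
```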
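Your string does not contain the literal characters \, x, e, 2 and so on; it contains three bytes whose values are written in hex.
\x is just the way to say that the following two characters should be interpreted as a hexadecimal byte value.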
This was explained in this post.
Dumping a string that contains Unicode characters as JSON produces \u escape sequences instead of the characters themselves:
text = "⌂⚘いの法嫁"
print(text) # output: ⌂⚘いの法嫁
import json
json_text = json.dumps(text)
print(json_text) # output: "\u2302\u2698\u3044\u306e\u6cd5\u5ac1"
I'd like to get this output instead:
"⌂⚘いの法嫁"
How can I dump unicode characters as characters instead of escape sequences?
Call json.dumps with ensure_ascii=False:
json_string = json.dumps(json_dict, ensure_ascii=False)
On Python 2, the return value will be unicode instead of str, so you might want to encode it before doing anything else with it.
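For comparison, a small sketch assuming Python 3, where `json.dumps` always returns `str` and encoding to bytes is a separate, explicit step. The `payload` name is illustrative.

```python
import json

text = "⌂⚘いの法嫁"

# On Python 3 the result is a str whether or not ensure_ascii is set.
json_text = json.dumps(text, ensure_ascii=False)
print(json_text)

# Encode explicitly only when bytes are needed (socket, binary file, ...).
payload = json_text.encode("utf-8")
print(type(json_text).__name__, type(payload).__name__)
```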