This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 1 year ago.
I'm trying to dump everything from a Scrapy crawler into a JSON file, but äåö is written as escape sequences such as \u00f6.
How do I fix that?
Use ensure_ascii=False
Ex:
import json
data = {"Hello": "äåö"}
print(json.dumps(data, ensure_ascii=False)) # --> {"Hello": "äåö"}
Note: If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
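To make the difference concrete, here is a minimal sketch comparing the two settings on the same dictionary. It also checks that both outputs decode to identical data, so the escaped form is valid JSON and only harder to read.

```python
import json

data = {"Hello": "äåö"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(data)
print(escaped)   # {"Hello": "\u00e4\u00e5\u00f6"}

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(data, ensure_ascii=False)
print(readable)  # {"Hello": "äåö"}

# Both strings are valid JSON and decode to the same dictionary.
assert json.loads(escaped) == json.loads(readable) == data
```

If the file is produced by Scrapy's own feed exports rather than by calling json.dumps yourself, setting FEED_EXPORT_ENCODING = "utf-8" in the project settings should have the equivalent effect; that assumes a feed-export setup the question does not show.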
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 1 year ago.
I know this has been asked before on Stack Overflow and on other sites, but I cannot seem to save a JSON file using escaped Unicode characters (Python 3). I have read a lot of tutorials.
What am I missing? I have tried a lot of things but nothing works. I have also tried encoding/decoding in UTF-8 but I am obviously missing something.
Just to be clear, I have managed to get it working for other characters like й (0439), but I am having trouble with a single quote being encoded.
If I have the following dict:
import json
data = {"key": "Test \u0027TEXT\u0027 around"}
I want to save it exactly as it is in a new JSON file, but no matter what I do, each escape sequence ends up as the single character it encodes.
Both of the following calls print exactly the same thing: {"key": "Test 'TEXT' around"}.
print(json.dumps(data))
print(json.dumps(data, ensure_ascii=False))
Is there any way to keep the Unicode string literal? I want to have that very string as a value: "Test \u0027TEXT\u0027 around"
The behavior you are describing has nothing to do with JSON. This is simply how Python 3 handles strings. Open the shell and write:
>>> "Test \u0027TEXT\u0027 around"
"Test 'TEXT' around"
If you do not want Python to interpret the special characters, you should use raw strings (or maybe even byte sequences):
>>> r"Test \u0027TEXT\u0027 around"
'Test \\u0027TEXT\\u0027 around'
Reference:
https://docs.python.org/2.0/ref/strings.html
https://docs.python.org/3/library/stdtypes.html#binaryseq
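A short sketch of what this means for json.dumps. A raw string keeps the backslash, but JSON then escapes the backslash itself, so the file contains \\u0027 rather than \u0027. If the goal is literally \u0027 in the output, one possible workaround (an assumption on my part, not something the json module offers as an option) is to post-process the dumped text; a JSON parser treats \u0027 and ' as the same character, so the data is unchanged.

```python
import json

# Python has already turned \u0027 into an apostrophe by the time the dict exists.
data = {"key": "Test 'TEXT' around"}

# Raw string: the backslash survives, but json.dumps escapes it as \\.
raw = json.dumps({"key": r"Test \u0027TEXT\u0027 around"})
print(raw)      # {"key": "Test \\u0027TEXT\\u0027 around"}

# Workaround sketch: json.dumps only ever emits apostrophes inside string
# values, so replacing each one with its escape keeps the text valid JSON.
escaped = json.dumps(data).replace("'", "\\u0027")
print(escaped)  # {"key": "Test \u0027TEXT\u0027 around"}

# The escaped text still decodes to the original data.
assert json.loads(escaped) == data
```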
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 4 years ago.
I want to log request responses to a file, but when I use getLogger().warning(string), the string is written Unicode-escaped.
Example:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result))
This code writes a Unicode-escaped string to the log file, where "Ошибка" appears as "\u041e\u0448\u0438\u0431\u043a\u0430".
But I want to see these characters as they are.
Your issue is that json.dumps is converting the Unicode to ASCII by escaping it. This can be avoided by adding an extra parameter, ensure_ascii=False, to the .dumps function:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result, ensure_ascii=False))
You can check the documentation for other arguments to the json.dumps function.
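For a self-contained illustration, here is a sketch that logs a dictionary to a file and reads the line back. The logger name, the log path, and the log_json helper are hypothetical, and the requests call is left out so that the example runs offline. Opening the handler with encoding="utf-8" is an extra precaution of mine, so that the platform default encoding does not affect how the characters are written.

```python
import json
import logging
import os
import tempfile


def log_json(logger, payload):
    """Log payload as JSON, keeping non-ASCII characters readable."""
    logger.warning(json.dumps(payload, ensure_ascii=False))


# Hypothetical setup: a dedicated logger writing to a UTF-8 encoded file.
log_path = os.path.join(tempfile.mkdtemp(), "responses.log")
handler = logging.FileHandler(log_path, encoding="utf-8")
logger = logging.getLogger("responses_demo")
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

log_json(logger, {"error": "Ошибка 1"})

# Flush and detach the handler before reading the file back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as fh:
    logged = fh.read()
print(logged)
```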
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
I have a list of dictionaries containing Arabic words, like
data = [{'name': 'آدَم'}, {'name': 'آزَر'}]
print(json.dumps(data), file=open('data.json', 'a', encoding="utf-8"))
Output:
[{"name": "\u0622\u0632\u064e\u0631"}...]
I don't want the Arabic text to be escaped when creating the data.json file. If I do not use json.dumps it works fine, but then the output has single quotes ' instead of double quotes ".
Pass the parameter ensure_ascii=False:
json.dumps(data, ensure_ascii=False)
From the json.dumps documentation:
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
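A sketch of the full write, assuming the goal is a single valid JSON file. It uses json.dump with a context manager so the file is closed properly, and a temporary directory stands in for the real location of data.json. Note that it opens the file in "w" mode; appending with "a", as in the question, would put several top-level values in one file, which most JSON parsers reject.

```python
import json
import os
import tempfile

data = [{"name": "آدَم"}, {"name": "آزَر"}]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# The file encoding and ensure_ascii=False together keep the Arabic text as-is.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(data, fh, ensure_ascii=False)

# Read the file back to confirm what was written.
with open(path, encoding="utf-8") as fh:
    content = fh.read()
print(content)
```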
This question already has answers here:
Decode escaped characters in URL
(5 answers)
Url decode UTF-8 in Python
(5 answers)
Closed 9 years ago.
I have a link like below
http%253A%252F.....25252520.doc
How do I convert this to a normal link in Python? The link has lots of encoded stuff.
Apply urllib.unquote twice (this is Python 2; in Python 3 the function is urllib.parse.unquote):
>>> import urllib
>>> strs = urllib.unquote("http%253A%252F.....25252520.doc")
>>> urllib.unquote(strs)
'http:/.....25252520.doc'
Use urllib.unquote():
Replace %xx escapes by their single-character equivalent.
It looks as if you have a double or even triple encoded URL; the http:// part has been encoded to http%253A%252F, which decodes to http%3A%2F, which in turn becomes http:/. The URL itself may contain another stage of encoding, but you didn't share enough of the actual URL with us to determine that.
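When the number of encoding layers is unknown, one option is to keep decoding until the string stops changing. The sketch below uses Python 3's urllib.parse.unquote; the fully_unquote helper and the example URL are made up for illustration, since the original link is truncated. Be aware that this can over-decode a URL that legitimately contains a literal percent sequence.

```python
from urllib.parse import unquote


def fully_unquote(url, max_rounds=10):
    """Apply unquote repeatedly until the result no longer changes."""
    for _ in range(max_rounds):
        decoded = unquote(url)
        if decoded == url:
            break  # nothing left to decode
        url = decoded
    return url


# Hypothetical URL: the scheme part is encoded twice, the space three times.
encoded = "http%253A%252F%252Fexample.com%252Fa%252520b.doc"
print(fully_unquote(encoded))  # http://example.com/a b.doc
```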
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
Dumping a string that contains unicode characters as json produces weird unicode escape sequences:
text = "⌂⚘いの法嫁"
print(text) # output: ⌂⚘いの法嫁
import json
json_text = json.dumps(text)
print(json_text) # output: "\u2302\u2698\u3044\u306e\u6cd5\u5ac1"
I'd like to get this output instead:
"⌂⚘いの法嫁"
How can I dump unicode characters as characters instead of escape sequences?
Call json.dumps with ensure_ascii=False:
json_string = json.dumps(json_dict, ensure_ascii=False)
On Python 2, the return value will be unicode instead of str, so you might want to encode it before doing anything else with it.
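Applied to the string from the question, a minimal Python 3 sketch looks like this. The final encode step only matters if you need bytes, for example to write to a binary stream; it is the Python 3 counterpart of the encoding advice above.

```python
import json

text = "⌂⚘いの法嫁"

# Keep the characters as-is instead of \uXXXX escapes.
json_text = json.dumps(text, ensure_ascii=False)
print(json_text)  # "⌂⚘いの法嫁"

# Encode explicitly when bytes are needed; json.loads accepts UTF-8 bytes too.
json_bytes = json_text.encode("utf-8")
assert json.loads(json_bytes) == text
```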