Python logging: Unicode symbols are unicode-escaped [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 4 years ago.
I want to log requests responses to a file, but when I call getLogger().warning(string), the string is written unicode-escaped.
Example:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result))
This code writes a unicode-escaped string to the log file, where "Ошибка" is written as "\u041e\u0448\u0438\u0431\u043a\u0430".
But I want to see these characters as they are.

Your issue is that json.dumps, by default, escapes every non-ASCII character so that its output is pure ASCII. This can be avoided by passing an extra parameter, ensure_ascii=False, to the .dumps function:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result, ensure_ascii=False))
You can check the documentation for other arguments to the json.dumps function.
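The following is a minimal Python 3 sketch of the same fix writing to an actual log file. The payload, the logger name "unicode_demo", and the temporary file path are assumptions made for illustration; the handler is opened with encoding="utf-8" so that the unescaped characters reach the file regardless of the platform default encoding.

```python
import json
import logging
import os
import tempfile

# Hypothetical payload standing in for json.loads(r.text).
result = {"error": "Ошибка 1"}

# Hypothetical log location; a real application would use its own path.
log_path = os.path.join(tempfile.mkdtemp(), "requests.log")

# Open the log file as UTF-8 so non-ASCII text is stored as-is.
handler = logging.FileHandler(log_path, encoding="utf-8")
logger = logging.getLogger("unicode_demo")
logger.addHandler(handler)

# ensure_ascii=False keeps the Cyrillic characters unescaped.
logger.warning(json.dumps(result, ensure_ascii=False))

# Flush and release the file before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
print(logged)
```

Without ensure_ascii=False the same call would log the \u escapes instead, which is still valid JSON but harder to read.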

Related

How to save äåö characters in Json file? [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 1 year ago.
I'm trying to dump everything from a Scrapy crawler into a JSON file, but äåö turns into something like \u00f6.
How do I fix that?
Use ensure_ascii=False
Ex:
import json
data = {"Hello": "äåö"}
print(json.dumps(data, ensure_ascii=False)) # --> {"Hello": "äåö"}
Note: If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
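Since the question is about a file rather than printed output, here is a small Python 3 sketch of the same option used with json.dump. The file name items.json and the temporary directory are assumptions for illustration; the file is opened with an explicit encoding so the characters are stored as UTF-8.

```python
import json
import os
import tempfile

data = {"Hello": "äåö"}

# Hypothetical output path; replace with the real destination.
out_path = os.path.join(tempfile.mkdtemp(), "items.json")

# Write the characters as-is, encoded as UTF-8 on disk.
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Read the raw text back to confirm nothing was escaped.
with open(out_path, encoding="utf-8") as f:
    raw = f.read()
print(raw)
```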

How to delete "u'" before the attributes [duplicate]

This question already has answers here:
What's the u prefix in a Python string?
(5 answers)
Closed 6 years ago.
When I load the JSON file and print it, I get a "u'" before each attribute.
How can I get rid of it?
import codecs
import json

try:
    with codecs.open('graphe.json', 'r', 'utf-8') as json_data:
        c = json.load(json_data)
    print c
except IOError, e:
    print 'IOError : No file in input'
{u'ressourcepath': u'D:\Stage_ete_2016\DjangoProject\resources\',
u'Nodes': [{u'title': [u'npq1', u'npq3', u'npq2'],....
the JSON
{"ressourcepath": "D:\Stage_ete_2016\DjangoProject\resources\",
"Nodes": [{"title": ["npq1", "npq3", "npq2"],...
The problem is that I use this dictionary to write JavaScript code (a template), and I must respect the JavaScript syntax (Vis js).
The u prefix means that those strings are unicode rather than 8-bit strings; it is only part of the representation that print shows for the dictionary, not part of the data. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit (for ASCII-only text; anything else needs an explicit .encode('utf-8')), so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
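Because the dictionary is meant to end up in a JavaScript template, one alternative (a sketch, not something the answer above proposes) is to serialize it back with json.dumps instead of printing the Python dict. JSON output is a valid JavaScript literal and never carries a u prefix. The dictionary below is a shortened, hypothetical stand-in for the loaded file, and the variable name graph is an assumption.

```python
import json

# Shortened stand-in for the data loaded from graphe.json.
c = {"Nodes": [{"title": ["npq1", "npq3", "npq2"]}]}

# json.dumps produces double-quoted JSON text, with no u'' prefixes.
js_literal = json.dumps(c)

# Hypothetical template line embedding the data in JavaScript.
js_snippet = "var graph = " + js_literal + ";"
print(js_snippet)
```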

How to use Python convert a unicode string to the real string [duplicate]

This question already has answers here:
Chinese and Japanese character support in python
(3 answers)
Closed 7 years ago.
I have used Python to get some info through urllib2, but the info comes back as a string of \u escape sequences.
I've tried something like below:
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")
but all results are the same:
\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
And I want to get the following Chinese text:
方法,删除存储在
You need to convert the string to a unicode string.
First of all, in a Python 2 byte-string literal \u is not an escape sequence, so the backslashes in a are kept as literal backslashes:
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
a # Prints '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
So playing with the encoding / decoding of this escaped string makes no difference.
You can either use unicode literal or convert the string into a unicode string.
To use unicode literal, just add a u in the front of the string:
a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
To convert existing string into a unicode string, you can call unicode, with unicode_escape as the encoding parameter:
print unicode(a, encoding='unicode_escape') # Prints 方法,删除存储在
I bet you are getting the string from a JSON response, so the second method is likely to be what you need.
BTW, the unicode_escape encoding is a Python-specific encoding which is used to "produce a string that is suitable as Unicode literal in Python source code".
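The answer above targets Python 2. On Python 3, where the unicode built-in no longer exists, the same decoding might be sketched as follows. The raw string is an assumption standing in for text that arrived with literal backslashes; a normal Python 3 literal would already interpret the \u sequences.

```python
# Raw string: the backslashes are literal characters, as in text
# received from a response.
a = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# unicode_escape decodes bytes, so encode the ASCII-only text first.
# encode("ascii") raises if the input already contains non-ASCII
# characters, which this codec would otherwise mangle.
decoded = a.encode("ascii").decode("unicode_escape")
print(decoded)
```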
Where are you getting this data from? Perhaps you could share the method by which you are downloading and extracting it.
Anyway, it kind of looks like a remnant of some JSON encoded string? Based on that assumption, here is a very hacky (and not entirely serious) way to do it:
>>> a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
>>> a
'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
>>> s = '"{}"'.format(a)
>>> s
'"\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"'
>>> import json
>>> json.loads(s)
u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
>>> print json.loads(s)
方法,删除存储在
This involves recreating a valid JSON-encoded string by wrapping the string a in double quotes, then decoding that JSON string into a Python unicode string.
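A Python 3 rendering of the same trick could look like this. It is only a sketch: the raw string is an assumption standing in for the downloaded data, and the approach breaks if the text contains unescaped double quotes or other characters that are not valid inside a JSON string.

```python
import json

# Raw string so the backslashes stay literal, as in the Python 2 session.
a = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Wrap the text in double quotes to form a JSON string literal.
s = '"{}"'.format(a)

# The JSON decoder interprets the \uXXXX escapes.
decoded = json.loads(s)
print(decoded)
```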

how to normalize or decode an URL in python? [duplicate]

This question already has answers here:
Decode escaped characters in URL
(5 answers)
Url decode UTF-8 in Python
(5 answers)
Closed 9 years ago.
I have a link like below
http%253A%252F.....25252520.doc
How do I convert this to a normal link in Python? The link has lots of encoded stuff.
Apply urllib.unquote twice:
>>> import urllib
>>> strs = urllib.unquote("http%253A%252F.....25252520.doc")
>>> urllib.unquote(strs)
'http:/.....25252520.doc'
Use urllib.unquote():
Replace %xx escapes by their single-character equivalent.
It looks as if you have a double or even triple encoded URL; the http:// part has been encoded to http%253A%252F, which decodes to http%3A%2F, which in turn becomes http:/. The URL itself may contain another stage of encoding, but you didn't share enough of the actual URL with us to determine that.
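When the number of encoding layers is unknown, one possible approach is to unquote repeatedly until the string stops changing. The sketch below uses Python 3, where the function lives in urllib.parse. The helper name unquote_fully, the max_rounds limit, and the example.com URL are all assumptions for illustration, since the real URL was not shared.

```python
from urllib.parse import unquote


def unquote_fully(url, max_rounds=10):
    """Apply unquote until the result is stable (hypothetical helper)."""
    for _ in range(max_rounds):
        decoded = unquote(url)
        if decoded == url:
            # Nothing left to decode.
            break
        url = decoded
    return url


# Made-up URL with two to three layers of percent-encoding.
encoded = "http%253A%252F%252Fexample.com%252Fa%252520b.doc"
print(unquote_fully(encoded))
```

Note that this over-decodes any URL that legitimately contains a literal percent sign followed by two hex digits, so it is only appropriate when the input is known to be over-encoded.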

Python JSON loads/dumps breaks Unicode? [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
Dumping a string that contains unicode characters as json produces weird unicode escape sequences:
text = "⌂⚘いの法嫁"
print(text) # output: ⌂⚘いの法嫁
import json
json_text = json.dumps(text)
print(json_text) # output: "\u2302\u2698\u3044\u306e\u6cd5\u5ac1"
I'd like to get this output instead:
"⌂⚘いの法嫁"
How can I dump unicode characters as characters instead of escape sequences?
Call json.dumps with ensure_ascii=False:
json_text = json.dumps(text, ensure_ascii=False)
On Python 2, the return value will be unicode instead of str, so you might want to encode it before doing anything else with it.
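To illustrate that nothing is actually "broken", here is a short Python 3 sketch comparing both forms. The escaped and unescaped outputs are different spellings of the same JSON string and decode to the same text; the variable names escaped and literal are chosen for this example only.

```python
import json

text = "⌂⚘いの法嫁"

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(text)

# ensure_ascii=False: characters are written as-is.
literal = json.dumps(text, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms round-trip to the original string.
print(json.loads(escaped) == json.loads(literal) == text)
```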
