This question already has answers here:
What's the u prefix in a Python string?
(5 answers)
Closed 6 years ago.
When I load the JSON file and print it, I get a "u'" prefix before each attribute. How can I get rid of it?
import codecs
import json

try:
    with codecs.open('graphe.json', 'r', 'utf-8') as json_data:
        # json.load returns unicode objects for all JSON strings in Python 2
        c = json.load(json_data)
        print c
except IOError, e:
    print 'IOError : No file in input'
{u'ressourcepath': u'D:\Stage_ete_2016\DjangoProject\resources\',
u'Nodes': [{u'title': [u'npq1', u'npq3', u'npq2'],....
The JSON file:
{"ressourcepath": "D:\Stage_ete_2016\DjangoProject\resources\",
"Nodes": [{"title": ["npq1", "npq3", "npq2"],...
The problem is that I use this dictionary to generate JavaScript code (a Vis.js template), so I must respect JavaScript syntax:
The u prefix means that those strings are unicode rather than 8-bit strings. The best way to avoid the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert unicode to 8-bit, so you can loop recursively over the result and convert each unicode value. However, it is probably best just to leave the strings as unicode.
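If you do go the conversion route, here is a minimal Python 2 sketch of that recursive loop. The helper name convert is just illustrative, and it uses .encode('utf-8') rather than str() so that non-ASCII values don't raise a UnicodeEncodeError:

def convert(obj):
    # Recursively walk dicts and lists, turning unicode values into UTF-8 byte strings.
    if isinstance(obj, dict):
        return {convert(key): convert(value) for key, value in obj.iteritems()}
    if isinstance(obj, list):
        return [convert(item) for item in obj]
    if isinstance(obj, unicode):
        return obj.encode('utf-8')
    return obj

c = convert(c)  # c was loaded by json.load above
print c  # no more u'' prefixes in the printed dict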
This question already has answers here:
getting bytes from unicode string in python
(6 answers)
Closed 2 years ago.
I have a string which I get from a function:
>>> example = Some_function()
Some_function returns a very long mix of Unicode and ASCII characters, like 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.
My problem is that when I try to convert this unicode string to bytes, I get an error that \ud919 cannot be encoded by utf-8. I tried:
>>> further = bytes(example, encoding='utf-8')
Note: I cannot ignore this \ud919. Is there a way to solve this, or to convert 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123' to 'gn1\ud123a\ud123\ud123\ud123\\ud919\ud123\ud123' so that \ud919 is treated as a plain string, not unicode?
First check what you actually have; the inspection call depends on the Python version.
Python 2.x: print type(unicode_string), repr(unicode_string)
Python 3.x: print(type(unicode_string), ascii(unicode_string))
\ud919 is a surrogate character; one does not simply convert it. Use the surrogatepass error handler:
>>> 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.encode('utf-8', 'surrogatepass')
b'gn1\xed\x84\xa3a\xed\x84\xa3\xed\x84\xa3\xed\x84\xa3\xed\xa4\x99\xed\x84\xa3\xed\x84\xa3'
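If the receiving side later needs the original string back, the same error handler works in the other direction as well. A small sketch, assuming Python 3:

encoded = 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.encode('utf-8', 'surrogatepass')
# Decoding with the same handler restores the original string, surrogates included.
original = encoded.decode('utf-8', 'surrogatepass')
print(original == 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123')  # True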
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 2 years ago.
I'm scraping some data from a website, and using regex I was able to extract some strings in UTF-16 format. Using this site I'm able to decode the strings I extract, but I want to do it all in Python.
The extracted text is a string, not bytes, so a simple .encode() doesn't work.
For example:
String: \u0074\u0065\u0073\u0074 --> String: test
I can think of solving this by treating the string as a bytes object, but I have no idea how to do that.
EDIT: The data chunk I've extracted using regex:
I = new Array();
I[0] = new Array();
I[0][1] = new Array();
I[0][1][0] = new Array();
I[0][1][0][0] = '\u0074\u0065\u0073\u0074';
I[0][2]='';
Any help is appreciated.
Thanks
If you don't care about security, the simplest way is eval:
eval('"' + yourstringhere + '"')
But the correct way to do this would be:
x = bytes(yourstringhere, "utf-8").decode("unicode_escape")
The variable yourstringhere should contain the string you mentioned before running this, obviously.
First we convert the string to bytes using the bytes function. Then we decode using a special codec called unicode_escape, which parses the \u.... sequences into actual unicode characters.
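Applied to the example string from the question, the approach looks like this (a quick sketch; the raw-string literal simply stands in for the text you scraped):

# The scraped text contains literal backslash-u sequences, not real escapes.
extracted = r'\u0074\u0065\u0073\u0074'
decoded = bytes(extracted, "utf-8").decode("unicode_escape")
print(decoded)  # test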
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 4 years ago.
I want to log requests responses to a file, but when I use getLogger().warning(string), the string is written unicode-escaped.
Example:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result))
This code writes a unicode-escaped string to the log file, where "Ошибка" ends up looking like "\u0417\u0430\u043a\u0430\u0437..."
But I want to see these characters as-is.
Your issue is that json.dumps is converting the Unicode to ASCII by escaping it. This can be avoided by adding an extra parameter, ensure_ascii=False, to the .dumps function:
r = requests.post(...)
result = json.loads(r.text) # Now result == {"error": "Ошибка 1"}
getLogger().warning(json.dumps(result, ensure_ascii=False))
You can check the documentation for other arguments to the json.dumps function.
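A self-contained illustration, assuming Python 3 and the standard logging module (the payload is hard-coded here instead of coming from requests):

import json
import logging

logging.basicConfig(level=logging.WARNING)
result = {"error": "Ошибка 1"}

# Logged escaped: {"error": "\u041e\u0448\u0438\u0431\u043a\u0430 1"}
logging.getLogger().warning(json.dumps(result))
# Logged as-is: {"error": "Ошибка 1"}
logging.getLogger().warning(json.dumps(result, ensure_ascii=False))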
This question already has answers here:
Chinese and Japanese character support in python
(3 answers)
Closed 7 years ago.
I have used Python to get some info through urllib2, but the info is a unicode string.
I've tried something like the following:
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")
but all results are the same:
\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
And I want to get the following Chinese text:
方法,删除存储在
You need to convert the string to a unicode string.
First of all, the \u sequences in a are not interpreted in a byte-string literal; the backslashes are stored literally (repr shows them doubled):
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
a # Prints '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
So playing with the encoding / decoding of this escaped string makes no difference.
You can either use a unicode literal or convert the existing string into a unicode string.
To use a unicode literal, just add a u in front of the string:
a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
To convert an existing string into a unicode string, call unicode with unicode_escape as the encoding parameter:
print unicode(a, encoding='unicode_escape') # Prints 方法,删除存储在
I bet you are getting the string from a JSON response, so the second method is likely to be what you need.
BTW, unicode_escape is a Python-specific encoding which is used to "Produce a string that is suitable as Unicode literal in Python source code".
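An equivalent spelling uses the string method directly (a small sketch, assuming Python 2, where str.decode is available):

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
# str.decode with the same codec yields the same unicode string as unicode(a, ...).
print a.decode('unicode_escape')  # prints the Chinese text from the question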
Where are you getting this data from? Perhaps you could share the method by which you are downloading and extracting it.
Anyway, it kind of looks like a remnant of some JSON encoded string? Based on that assumption, here is a very hacky (and not entirely serious) way to do it:
>>> a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
>>> a
'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
>>> s = '"{}"'.format(a)
>>> s
'"\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"'
>>> import json
>>> json.loads(s)
u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
>>> print json.loads(s)
方法,删除存储在
This involves recreating a valid JSON-encoded string by wrapping the given string in double quotes, then decoding the JSON string into a Python unicode string.
This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
Dumping a string that contains unicode characters as JSON produces weird unicode escape sequences:
text = "⌂⚘いの法嫁"
print(text) # output: ⌂⚘いの法嫁
import json
json_text = json.dumps(text)
print(json_text) # output: "\u2302\u2698\u3044\u306e\u6cd5\u5ac1"
I'd like to get this output instead:
"⌂⚘いの法嫁"
How can I dump unicode characters as characters instead of escape sequences?
Call json.dumps with ensure_ascii=False:
json_text = json.dumps(text, ensure_ascii=False)
On Python 2, the return value will be unicode instead of str, so you might want to encode it before doing anything else with it.
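For example, when writing the result to a file (a minimal sketch, assuming Python 3; out.json is just a placeholder path):

import json

text = "⌂⚘いの法嫁"
# ensure_ascii=False keeps the characters as-is; the file is opened as UTF-8 explicitly.
with open("out.json", "w", encoding="utf-8") as f:
    json.dump(text, f, ensure_ascii=False)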