I've got a script which makes a JSON request that may return text in any script, then outputs the text (I don't have any control over the text being returned).
It works fine with Latin characters, but other scripts come out as mojibake, and I'm not sure what's going wrong.
In the response, the problematic characters are encoded using \u syntax. In particular, I have a string containing \u00d0\u00b8\u00d1\u0081\u00d0\u00bf\u00d1\u008b\u00d1\u0082\u00d0\u00b0\u00d0\u00bd\u00d0\u00b8\u00d0\u00b5 which should output as испытание but instead outputs as иÑпÑÑание.
Obviously this has something to do with how Python deals with Unicode and UTF-8, but despite all I've read I don't understand what's going on well enough to know how to solve it.
I've tried to extract the salient points from the code below:
response = requests.get(url, params=params, cookies=self.cookies, auth=self.auth)
text = response.text
print text
status = json.loads(text)
print status
for folder in status['folders']:
    print folder['name']
Output:
{ "folders": [ { "name": "\u00d0\u00b8\u00d1\u0081\u00d0\u00bf\u00d1\u008b\u00d1\u0082\u00d0\u00b0\u00d0\u00bd\u00d0\u00b8\u00d0\u00b5" } ] }
{u'folders': [{ u'name': u'\xd0\xb8\xd1\x81\xd0\xbf\xd1\x8b\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb8\xd0\xb5' }]}
иÑпÑÑание
I've also tried
status = response.json()
for folder in status['folders']:
    print folder['name']
With the same result.
Note, I'm really passing the string to a GTKMenuItem to be displayed, but the output from printing the string is the same as from showing it in the menu.
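As @Ricardo Cárdenes said in the comments, the server is sending an incorrect response. The response that you've got is double-encoded: the text was encoded to UTF-8, and each of those bytes was then treated as a Latin-1 character and escaped on its own. Reversing that step recovers the text: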
>>> u = u'\xd0\xb8\xd1\x81\xd0\xbf\xd1\x8b\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb8\xd0\xb5'
>>> print u.encode('latin-1').decode('utf-8')
испытание
The correct string would look like:
>>> s = {u"name": u"испытание"}
>>> import json
>>> print json.dumps(s)
{"name": "\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435"}
>>> print s['name']
испытание
>>> print s['name'].encode('unicode-escape')
\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435
>>> print s['name'].encode('utf-8')
испытание
>>> s['name'].encode('utf-8')
'\xd0\xb8\xd1\x81\xd0\xbf\xd1\x8b\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb8\xd0\xb5'
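If the server cannot be fixed, the encode/decode round trip above can be wrapped in a small helper that leaves already-correct text alone. This is only a sketch, written for Python 3 (the session above is Python 2). The name repair_mojibake is made up for illustration, and it assumes the damage is exactly the "UTF-8 bytes read as Latin-1" kind shown here.

```python
def repair_mojibake(text):
    """Undo one layer of UTF-8-read-as-Latin-1 damage, if present."""
    try:
        # Map each character back to the single byte it came from, then
        # decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters above U+00FF, or the bytes are
        # not valid UTF-8: it was not damaged this way, so keep it as-is.
        return text


broken = "\xd0\xb8\xd1\x81\xd0\xbf\xd1\x8b\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb8\xd0\xb5"
print(repair_mojibake(broken))       # the repaired Cyrillic word
print(repair_mojibake("испытание"))  # already correct, returned unchanged
```

A heuristic like this can misfire on genuine Latin-1 text whose bytes happen to form valid UTF-8, so it is best applied only to fields known to come from the misbehaving server.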
I am making a REST API project that gets data from a Python script and prints it via Node.js. The data is sent from the Python script to Node.js with the following code:
json_dump = json.dumps(data)
print(data.encode("utf-8", "replace"))
And js gets the data with the following code:
PythonShell.run('script.py', options, function (err, data) {
if (err) throw err;
res.json(JSON.parse(data));
});
But I get the following error:
Unexpected token b in JSON at position 0
The JSON arrives correctly but starts with a 'b', and many characters are either not printed or are printed like this: "\xf0\x9f\xa4\x91". What can I do?
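Remove the .encode("utf-8", "replace") call. It converts the string to a bytes object, and printing a bytes object prints its representation, which starts with b"..." and shows non-ASCII bytes as \x escapes. Print the string itself instead: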
json_dump = json.dumps(data)
print(json_dump)
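To see where the b comes from, here is a small sketch (Python 3, with a made-up one-key payload) comparing what print would show for the encoded bytes with what it shows for the plain string. By default json.dumps escapes non-ASCII characters as \uXXXX, so the plain string is already safe to print.

```python
import json

# Hypothetical payload containing one emoji (U+1F911).
data = {"mood": "\U0001F911"}

json_dump = json.dumps(data)  # a str; non-ASCII is escaped as \uXXXX
as_bytes = json_dump.encode("utf-8", "replace")

# print() calls str() on its argument. For bytes that is the b'...'
# representation, which JSON.parse on the Node side cannot read.
shown_for_bytes = str(as_bytes)
shown_for_str = str(json_dump)
print(shown_for_bytes)
print(shown_for_str)

# With ensure_ascii=False the emoji stays literal, and encoding it gives
# the raw UTF-8 bytes that appear as \xf0\x9f\xa4\x91 in the bytes repr.
raw = json.dumps(data, ensure_ascii=False).encode("utf-8")
print(repr(raw))
```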
I have JSON which looks like the below. When I validate it, I get an "Invalid JSON" error, since it uses single quotes, and there is also a u in front of each string that I can't explain.
{u'domain': u'127.0.0.1', u'user_id': u'example.com', u'sender': u'shop_1'}
The JSON above is invalid. How can I make it use double quotes and remove the u from the response to get valid JSON?
PS: Beginner with Python
You can use json.dumps() for that:
>>> import json
>>> my_json = {u'domain': u'127.0.0.1', u'user_id': u'example.com', u'sender': u'shop_1'}
>>> print(json.dumps(my_json))
{"domain": "127.0.0.1", "user_id": "example.com", "sender": "shop_1"}
The u that you see at the start of each string indicates that it is a unicode string:
What's the u prefix in a Python string?
https://docs.python.org/2/tutorial/introduction.html#unicode-strings
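If all you have is the printed text rather than the dictionary itself, one possible approach is to parse the Python literal first and then serialize it. A minimal sketch, assuming the text is a plain dict literal like the one in the question. ast.literal_eval only accepts literals, so it does not run arbitrary code, and Python 3 still accepts the u prefix:

```python
import ast
import json

# The text exactly as it was printed by Python 2.
printed = "{u'domain': u'127.0.0.1', u'user_id': u'example.com', u'sender': u'shop_1'}"

# Turn the literal back into a real dict, then serialize it as JSON.
parsed = ast.literal_eval(printed)
as_json = json.dumps(parsed)
print(as_json)
```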
I am using the code below which I found here, but it always returns the same quote. I can see that data itself has more quotes, but not sure how to parse them out one at a time. Thank you.
import requests
url = "http://www.forbes.com/forbesapi/thought/uri.json?enrich=true&query=1&relatedlimit=5"
response = requests.get(url)
data = response.json()
quote=data['thought']['quote'].strip()
Returns:
u'Teach self-denial and make its practice pleasure, and you can create for the world a destiny more sublime that ever issued from the brain of the wildest dreamer.'
type(data) returns a dict. Then data.keys() returns [u'thought']. data.items() returns a bunch of text. Why is there only one key if there is seemingly more than one quote inside? And why does data = response.json() return a dict, according to type(data), rather than a JSON object?
I pasted your JSON into http://jsonviewer.stack.hu/ to see the structure of the data. Once you understand the structure, you can pull out each group of quotes by its key:
import requests
url = "http://www.forbes.com/forbesapi/thought/uri.json?enrich=true&query=1&relatedlimit=5"
response = requests.get(url)
data = response.json()
# print the most recent quote from thought
print '*'*10
quote = data['thought']['quote']
print quote
# print quotes from related authors
print '*'*10
quotes_of_related_authors = data['thought']['relatedAuthorThoughts']
for i in quotes_of_related_authors:
    print i.get('quote')
# print quotes from related theme thoughts
print '*'*10
quotes_of_related_theme_thoughts = data['thought']['relatedThemeThoughts']
for i in quotes_of_related_theme_thoughts:
    print i.get('quote')
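The same walk can be written as a function that gathers every quote into one list. This is a sketch in Python 3 syntax, run against a tiny hand-made dictionary rather than the live API: only the key names come from the code above, and the sample values are placeholders. It also shows why type(data) is dict, since response.json() parses the JSON text into ordinary Python dicts and lists.

```python
def collect_quotes(data):
    """Return the main quote followed by all related quotes."""
    thought = data["thought"]
    quotes = [thought["quote"].strip()]
    for key in ("relatedAuthorThoughts", "relatedThemeThoughts"):
        # .get() with a default keeps this working if a list is absent.
        for item in thought.get(key, []):
            quote = item.get("quote")
            if quote:
                quotes.append(quote.strip())
    return quotes


# Placeholder data shaped like the structure described above.
sample = {
    "thought": {
        "quote": " main quote ",
        "relatedAuthorThoughts": [{"quote": "author quote"}],
        "relatedThemeThoughts": [{"quote": "theme quote"}, {"id": 7}],
    }
}

for quote in collect_quotes(sample):
    print(quote)
```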
run_report returns a Python dictionary, which is then serialized to a JSON string and printed so that the client can consume it as JSON. The run_report function also creates a .json file that I can access later:
print "Content-type: application/json\n"
json_data = run_report(sites_list, tierOne, dateFrom, dateTo, filename, file_extension)
print json.dumps(json_data, indent=4, sort_keys=True)
However, when it prints, I receive this output:
..{
"data": {
"FR": 1424068
},
"tierone": {
"countries": [
"US",
"BR",
...
],
"ratio": 100.0,
"total": 1424068,
"total_countries": 1
},
"total": 1424068,
"total_countries": 1
}
What I don't understand is how those leading dots even show up. The dots, however, do not show up if I open one of the .json files created by the run_report function and print the data read from it.
def open_file(file_extension, json_file):
    with open(file_extension + json_file) as data_file:
        data = json.load(data_file)
    return json.dumps(data)
json_data = open_file(file_extension, filename)
print json_data
Something else is producing those . characters; the json.dumps() function never adds those.
Make sure nothing else is writing to sys.stdout; everything you send to sys.stdout is sent to the browser (print writes to sys.stdout by default).
From your comments I understand you wanted to write additional information to the server logs; do not use sys.stdout for that; write to sys.stderr instead.
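A minimal sketch of that separation, in Python 3 and with a made-up helper name (emit_report) and a made-up log message: the response body goes to one stream and diagnostics go to another, so progress output can never leak into the JSON. The two streams are parameters only so the example can be run without a web server.

```python
import io
import json
import sys


def emit_report(report, out=sys.stdout, log=sys.stderr):
    """Write the HTTP response to `out` and diagnostics to `log`."""
    # Diagnostics: in a CGI setup these end up in the server's error log.
    log.write("report built, %d top-level keys\n" % len(report))
    # Response: header block, blank line, then the JSON body.
    out.write("Content-Type: application/json\n\n")
    out.write(json.dumps(report, indent=4, sort_keys=True))
    out.write("\n")


# Capture both streams in memory to show that they stay separate.
body = io.StringIO()
diagnostics = io.StringIO()
emit_report({"total": 1424068, "total_countries": 1}, out=body, log=diagnostics)
print(body.getvalue())
```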
I've done some coding in RoR, and in Rails, when I return a JSON object via an API call, it returns as
{ "id" : "1", "name" : "Dan" }.
However in Python (with Flask and Flask-SQLAlchemy), when I return a JSON object via json.dumps or jsonpickle.encode it is returned as
"{ \"id\" : \"1\", \"name\": \"Dan\" }" which seems very unwieldy as it can't easily be parsed on the other end (by an iOS app in this case - Obj-C).
What am I missing here, and what should I do to return it as a JSON literal, rather than a JSON string?
This is what my code looks like:
people = models.UserRelationships.query.filter_by(user_id=user_id, active=ACTIVE_RECORD)
friends = people.filter_by(friends=YES)
json_object = jsonpickle.encode(friends.first().as_dict(), unpicklable=False, keys=True)
print(json_object) # this prints here, i.e. { "id" : "1", "name" : "Dan" }
return json_object # this returns "{ \"id\" : \"1\", \"name\": \"Dan\" }" to the browser
What is missing in your understanding here is that when you use the JSON modules in Python, you're not working with a JSON object. JSON is by definition just a string that matches a certain standard.
Let's say you have the string:
friends = '{"name": "Fred", "id": 1}'
If you want to work with this data in python, you will want to load it into a python object:
import json
friends_obj = json.loads(friends)
At this point friends_obj is a python dictionary.
If you want to convert it (or any other Python dictionary or list) back into a JSON string, this is where json.dumps comes in handy:
friends_str = json.dumps(friends_obj)
print friends_str
{"name": "Fred", "id": 1}
However if we attempt to "dump" the original friends string you'll see you get a different result:
dumped_str = json.dumps(friends)
print dumped_str
"{\"name\": \"Fred\", \"id\": 1}"
This is because you're basically attempting to encode an ordinary string as JSON and it is escaping the characters. I hope this helps make sense of things!
Cheers
Looks like you are using Django here, in which case do something like
from django.utils import simplejson as json
...
return HttpResponse(json.dumps(friends.first().as_dict()))
This is almost always a sign that you're double-encoding your data somewhere. For example:
>>> obj = { "id" : "1", "name" : "Dan" }
>>> j = json.dumps(obj)
>>> jj = json.dumps(j)
>>> print(obj)
{'id': '1', 'name': 'Dan'}
>>> print(j)
{"id": "1", "name": "Dan"}
>>> print(jj)
"{\"id\": \"1\", \"name\": \"Dan\"}"
Here, jj is a perfectly valid JSON string representation, but it's not a representation of obj; it's a representation of the string j, which is useless.
Normally you don't do this directly; instead, either you started with a JSON string rather than an object in the first place (e.g., you got it from a client request or from a text file), or you called some function in a library like requests or jsonpickle that implicitly calls json.dumps with an already-encoded string. But either way, it's the same problem, with the same solution: Just don't double-encode.
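When it is unclear whether a payload has been double-encoded, one way to check is to decode it repeatedly and count how many passes it takes to reach a non-string value. This is a rough diagnostic sketch with a hypothetical helper name (count_json_layers). It is meant for finding the bug, not as a substitute for removing the extra json.dumps call, and it will over-count for payloads that are legitimately JSON strings containing JSON-looking text.

```python
import json


def count_json_layers(payload, max_layers=5):
    """Decode `payload` until it is no longer a str; return (value, passes)."""
    value, layers = payload, 0
    while isinstance(value, str) and layers < max_layers:
        try:
            value = json.loads(value)
        except ValueError:
            # Not JSON any more: this is an ordinary string, so stop here.
            break
        layers += 1
    return value, layers


obj = {"id": "1", "name": "Dan"}
j = json.dumps(obj)   # encoded once, as it should be
jj = json.dumps(j)    # encoded twice, the bug described above

print(count_json_layers(j)[1])   # 1
print(count_json_layers(jj)[1])  # 2
```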
You should be using flask.jsonify, which will not only encode correctly, but also set the content-type headers accordingly.
people = models.UserRelationships.query.filter_by(user_id=user_id, active=ACTIVE_RECORD)
friends = people.filter_by(friends=YES)
return jsonify(friends.first().as_dict())