I tried to convert this C# code (encoding and serialization) to Python, but the results are different. Why?
var dictt = new Dictionary<string, object>
{
{ "aaa", "6mjDx3Cya4JvbTLMenPpXA==" },
{ "bbb", "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="},
};
JavaScriptSerializer serialzr = new JavaScriptSerializer();
return Convert.ToBase64String(Encoding.UTF8.GetBytes(serialzr.Serialize(dictt))); //eyJpdiI6IjZtakR4M0N5YTRKdmJUTE1lblBwWEE9PSIsInZhbHVlIjoiNFU1TStWMnlvSUE3cldqNDZyZGhUQmdwRWpmMXpZSzBtMTFsRE03RFJDST0ifQ==
dictt = {
"aaa": "6mjDx3Cya4JvbTLMenPpXA==",
"bbb": "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="
}
y = json.dumps(dictt)
#y= {"aaa": "6mjDx3Cya4JvbTLMenPpXA==", "bbb": "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="}
json_object = json.loads(y) #convert it to json, like serialize in C#
# json_object = {'aaa': '6mjDx3Cya4JvbTLMenPpXA==', 'bbb': '4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI='}
json_object_utf8_encoded = str(json_object).encode('utf8') #encode utf8
#json_object_utf8_encoded = b"{'aaa': '6mjDx3Cya4JvbTLMenPpXA==', 'bbb': '4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI='}"
json_base64 = base64.b64encode(json_object_utf8_encoded) #convert to base64 string
#json_base64 = "b'eydpdic6ICc2bWpEeDNDeWE0SnZiVExNZW5QcFhBPT0nLCAndmFsdWUnOiAnNFU1TStWMnlvSUE3cldqNDZyZGhUQmdwRWpmMXpZSzBtMTFsRE03RFJDST0nfQ=='"
json_base64_str = json_base64.decode("utf-8")
return json_base64_str
#eydpdic6ICc2bWpEeDNDeWE0SnZiVExNZW5QcFhBPT0nLCAndmFsdWUnOiAnNFU1TStWMnlvSUE3cldqNDZyZGhUQmdwRWpmMXpZSzBtMTFsRE03RFJDST0nfQ==
There's a small difference between the following two Python snippets, resulting in two different outputs: " vs. '.
While this one:
serialized = json.dumps(dict)
print(serialized)
outputs:
{"aaa": "6mjDx3Cya4JvbTLMenPpXA==", "bbb": "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="}
This one:
serialized = json.dumps(dict)
deserialized = json.loads(serialized)
print(str(deserialized))
outputs:
{'aaa': '6mjDx3Cya4JvbTLMenPpXA==', 'bbb': '4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI='}
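You are working with the latter: you're unnecessarily deserializing the JSON and then calling str on the resulting dict, which gives Python's repr (single quotes) rather than JSON (double quotes). Just drop the json.loads step and encode the string returned by json.dumps.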
EDIT:
The returned values (C# vs Python) will still not match, because the C# serializer (whether it's JavaScriptSerializer or the more common JsonConvert.SerializeObject of Newtonsoft) outputs JSON with no spaces:
{"aaa":"6mjDx3Cya4JvbTLMenPpXA==","bbb":"4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="}
while Python's json.dumps, with its default separators, outputs JSON with spaces:
{"aaa": "6mjDx3Cya4JvbTLMenPpXA==", "bbb": "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI="}
Passing an indentation argument to json.dumps (e.g., json.dumps(dict, indent=2)) doesn't help either: JavaScriptSerializer can't indent at all, and although Newtonsoft's JsonConvert.SerializeObject can, its indented output uses CRLF line endings (on Windows), while Python's uses LF.
That being said, it shouldn't matter, because the logical output is the same: A Base64 encoding of the serialized dictionary.
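If the two outputs do need to match byte for byte, one option is to ask json.dumps for compact separators. The sketch below assumes the C# side emits the keys in insertion order and applies no extra escaping to these particular values, which agrees with the space-free output shown above but is otherwise an assumption.

```python
import base64
import json

dictt = {
    "aaa": "6mjDx3Cya4JvbTLMenPpXA==",
    "bbb": "4U5M+V2yoIA7rWj46rdhTBgpEjf1zYK0m11lDM7DRCI=",
}

# separators=(",", ":") removes the spaces json.dumps inserts by default
compact = json.dumps(dictt, separators=(",", ":"))

# UTF-8 encode the JSON text, then Base64 encode the bytes
encoded = base64.b64encode(compact.encode("utf-8")).decode("ascii")

print(compact)
print(encoded)
```

Decoding `encoded` with base64.b64decode gives back exactly the compact JSON text, so the result can be compared directly with what the C# side produces.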
I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing: announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a key in the first dict of the list of dicts, whose value is a string of JSON data
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
    ...
Next,
l = k['announcement']
Now, l is a string containing JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'announcement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.
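When iterating over every entry instead of taking only the first, it may help to guard the decoding step, since fields such as announcement_shop can be empty strings and json.loads raises on those. The following is a minimal sketch on a trimmed-down stand-in for the response; the helper name decode_embedded and the shortened data are made up for illustration.

```python
import json

# Build a small stand-in for the response: the inner document is serialized
# first, then embedded as a plain string inside the outer document.
inner = json.dumps({"announcement_id": "6", "content": "FOLLOW ME PLEASE"})
raw = json.dumps({
    "status": "200",
    "data": {"video_info": [{"announcement": inner, "announcement_shop": ""}]},
})

data = json.loads(raw)


def decode_embedded(value):
    """Decode a JSON document stored as a string; return None if empty or invalid."""
    if not isinstance(value, str) or not value:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


for info in data["data"]["video_info"]:
    announcement = decode_embedded(info["announcement"])
    if announcement is not None:
        print(announcement["content"])
```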
Sorry if it's too much of a noob question.
I have a dictionary where the keys are bytes (like b'access_token' ) instead of strings.
{
b'access_token': [b'b64ssscba8c5359bac7e88cf5894bc7922xxx'],
b'token_type': [b'bearer']
}
usually I access the elements of a dictionary by data_dict.get('key'), but in this case I was getting NoneType instead of the actual value.
How do I access them or is there a way to convert this bytes keyed dict to string keyed dict?
EDIT: I actually get this dict from parsing a query string like this access_token=absdhasd&scope=abc by urllib.parse.parse_qs(string)
You can use str.encode() and bytes.decode() to swap between the two (optionally, providing an argument that specifies the encoding. 'UTF-8' is the default). As a result, you can take your dict:
my_dict = {
b'access_token': [b'b64ssscba8c5359bac7e88cf5894bc7922xxx'],
b'token_type': [b'bearer']
}
and just do a comprehension to swap all the keys:
new_dict = {k.decode(): v for k,v in my_dict.items()}
# {
# 'access_token': [b'b64ssscba8c5359bac7e88cf5894bc7922xxx'],
# 'token_type': [b'bearer']
# }
Similarly, you can just use .encode() when accessing the dict in order to get a bytes object from your string:
my_key = 'access_token'
my_value = my_dict[my_key.encode()]
# [b'b64ssscba8c5359bac7e88cf5894bc7922xxx']
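Since the dict comes from urllib.parse.parse_qs, it may be simpler to avoid bytes in the first place: parse_qs returns bytes keys and values when given bytes, and str keys and values when given str. The sketch below uses the query string from the question and also shows a comprehension that decodes the values along with the keys; the variable names are illustrative.

```python
from urllib.parse import parse_qs

raw = b"access_token=absdhasd&scope=abc"

# bytes in -> bytes keys and bytes values out
bytes_dict = parse_qs(raw)

# str in -> str keys and str values out
str_dict = parse_qs(raw.decode("utf-8"))

# Converting an existing bytes-keyed dict, decoding the list values too
converted = {
    key.decode("utf-8"): [value.decode("utf-8") for value in values]
    for key, values in bytes_dict.items()
}

print(bytes_dict)
print(str_dict)
print(converted.get("access_token"))
```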
Most probably, you are making some silly mistake.
It is working fine in my tests.
Perhaps you forgot to add the prefix b when trying to index the dictionary
d={
b'key1': [b'val1'],
b'key2': [b'val2']
}
d[b'key1'] # --> returns [b'val1']
d.get(b'key2') # --> returns [b'val2']
Perhaps this could be something you're looking for?
data_dict = {
b'access_token': [b'b64ssscba8c5359bac7e88cf5894bc7922xxx'],
b'token_type': [b'bearer']
}
print(data_dict.get(b'access_token'))
So I have a file with a json that looks like:
{
"a":{
"ab":2,
"cd":3
},
"b":{
"ef":2,
"gh":3
},
"c":{
"ij":2,
"kl":3
}
}
So in python, I would like to import this json from the file, and then from that break it into separate jsons, each in a separate variable, such that each variable would look like:
json1 = {
"a":{
"ab":2,
"cd":3
}
}
##etc.
And these json variables should function as variables that can be converted to json objects, via methods like json.load, or json.dump.
How can this be done?
Once you've imported the file with json.load, what you have is just a plain old Python dict:
import json
with open('bigfile.json') as f:
    bigd = json.load(f)
And if you iterate over items() for a dict, what you get is key-value pairs.
for key, value in bigd.items():
And turning a key-value pair back into a single-item dict is trivial.
    smalld = {key: value}
At which point you have a dict again, so you can json.dump it. Note that json.dump takes the object first and the file second.
    with open(f'smallfile-{key}.json', 'w') as f:
        json.dump(smalld, f)
Or whatever else you want to do with them. For example, append each smalld to a listodicts, or convert its repr to ASCII art and send it to /dev/lpr0, or whatever.
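Put together, the steps above can be sketched as one runnable script. It writes the sample document from the question into a temporary directory first so that it is self-contained; the temporary directory and the smalls list are illustrative additions, and the file names follow the ones used above.

```python
import json
import tempfile
from pathlib import Path

# Create the input file in a scratch directory so the sketch runs anywhere.
workdir = Path(tempfile.mkdtemp())
big_path = workdir / "bigfile.json"
big_path.write_text(json.dumps({
    "a": {"ab": 2, "cd": 3},
    "b": {"ef": 2, "gh": 3},
    "c": {"ij": 2, "kl": 3},
}))

# json.load takes an open file object, not a file name.
with open(big_path) as f:
    bigd = json.load(f)

smalls = []
for key, value in bigd.items():
    smalld = {key: value}          # single-item dict for this top-level key
    smalls.append(smalld)
    with open(workdir / f"smallfile-{key}.json", "w") as f:
        json.dump(smalld, f)       # object first, file second

print(smalls)
```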
So I use the Java Debugger JSON in my python program because a few months ago I was told that this was the best way of opening a text file and making it into a dictionary and also saving the dictionary to a text file. However I am not sure how it works.
Below is how I am using it within my program:
with open("totals.txt", 'r') as f30:
    totaldict = json.load(f30)
and
with open("totals.txt", 'w') as f29:
    json.dump(totaldict, f29)
I need to explain how it works for my project so could anyone explain for me how exactly json works when loading a text file into dictionary format and when dumping contents into the text file?
Thanks.
Edit: please don't just post links to other articles as I have tried to look at these and they have offered me not much help as they are not in my context of using JSON for dictionaries and a bit overwhelming as I am only a beginner.
JSON is JavaScript Object Notation. It works in Python like it does anywhere else, by giving you a syntax for describing arbitrary things as objects.
Most JSON is primarily composed of JavaScript arrays, which look like this:
[1, 2, 3, 4, 5]
Or lists of key-value pairs describing an object, which look like this:
{"key1": "value1", "key2": "value2"}
These can also be nested in either direction:
[{"object1": "data1"}, {"object2": "data2"}]
{"object1": ["list", "of", "data"]}
Naturally, Python can very easily treat these types as lists and dicts, which is exactly what the json module tries to do.
>>> import json
>>> json.loads('[{"object1": "data1"}, {"object2": "data2"}]')
[{'object1': 'data1'}, {'object2': 'data2'}]
>>> json.dumps(_)
'[{"object1": "data1"}, {"object2": "data2"}]'
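The file-based functions from the question do the same thing with a file in between: json.dump converts the dictionary to JSON text and writes it to the open file, and json.load reads that text back and rebuilds the dictionary. Here is a minimal sketch; the contents of totaldict and the temporary path are made up for illustration.

```python
import json
import os
import tempfile

totaldict = {"apples": 3, "pears": 5}
path = os.path.join(tempfile.mkdtemp(), "totals.txt")

# json.dump serializes the dict to JSON text and writes it to the file.
with open(path, "w") as f:
    json.dump(totaldict, f)

# The file now holds ordinary text in JSON syntax.
with open(path) as f:
    text = f.read()
print(text)

# json.load parses that text and returns an equivalent dict.
with open(path) as f:
    loaded = json.load(f)
print(loaded == totaldict)
```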
Try this: Python Module of the Week
The json module provides an API similar to pickle for converting in-memory Python objects to a serialized representation known as JavaScript Object Notation (JSON). Unlike pickle, JSON has the benefit of having implementations in many languages (especially JavaScript).
Encoding and Decoding Simple Data Types
The encoder understands Python’s native types by default (string, unicode, int, float, list, tuple, dict).
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
data_string = json.dumps(data)
print 'JSON:', data_string
Values are encoded in a manner very similar to Python’s repr() output.
$ python json_simple_types.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
Encoding, then re-decoding may not give exactly the same type of object.
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
data_string = json.dumps(data)
print 'ENCODED:', data_string
decoded = json.loads(data_string)
print 'DECODED:', decoded
print 'ORIGINAL:', type(data[0]['b'])
print 'DECODED :', type(decoded[0]['b'])
In particular, strings are converted to unicode and tuples become lists.
$ python json_simple_types_decode.py
ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED : <type 'list'>
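The listings above are written for Python 2. A rough Python 3 equivalent of the round trip is sketched below; there, str is already Unicode, so the only type change that remains is tuples coming back as lists.

```python
import json

data = [{"a": "A", "b": (2, 4), "c": 3.0}]

# Encode to JSON text, then decode it again.
data_string = json.dumps(data)
decoded = json.loads(data_string)

print("ENCODED:", data_string)
print("DECODED:", decoded)
print("ORIGINAL:", type(data[0]["b"]))
print("DECODED :", type(decoded[0]["b"]))
```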