Python 3: taking a json and splitting it into smaller jsons - python

So I have a file with a json that looks like:
{
"a":{
"ab":2,
"cd":3
},
"b":{
"ef":2,
"gh":3
},
"c":{
"ij":2,
"kl":3
}
}
So in python, I would like to import this json from the file, and then from that break it into separate jsons, each in a separate variable, such that each variable would look like:
json1 = {
"a":{
"ab":2,
"cd":3
}
}
##etc.
And these json variables should function as variables that can be converted to json objects, via methods like json.load, or json.dump.
How can this be done?

Once you've imported the file with json.load, what you have is just a plain old Python dict:
with open('bigfile.json') as f:
bigd = json.load('bigfile.json')
And if you iterate over items() for a dict, what you get is key-value pairs.
for key, value in bigd.items():
And turning a key-value pair back into a single-item dict is trivial.
smalld = {key: value}
At which point you have a dict again, so you can json.dump it.
with open(f'smallfile-{key}.json', 'w') as f:
json.dump(f, smalld)
Or whatever else you want to do with them. For example, append each smalld to a listodicts, or convert its repr to ASCII art and send it to /dev/lpr0, or whatever.

Related

get value by key from json data with python [duplicate]

I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a dict in the first dict of the list of dicts
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'annoucement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.

How to append the list of dictionary to same list in python?

I'm having a JSON with nested values. I need to remove the key of the nested field and need to keep the values as plain JSON.
JSON(Structure of my JSON)
[
{
"id":"101",
"name":"User1",
"place":{
"city":"City1",
"district":"District1",
"state":"State1",
"country":"Country1"
},
"access":[{"status":"available"}]
}
]
I need to get the JSON output as:
Expected Output:
[
{
"id":"101",
"name":"User1",
"city":"City1",
"district":"District1",
"state":"State1",
"country":"Country1"
"access":[{"status":"available"}]
}
]
What i need is:
I need to parse the JSON
Get the Placefield out of the JSON
Remove the key and brackets and append the values to existing
Python
for i in range(0,len(json_data)):
place_data = json_data[i]['place']
print(type(place_data)) #dict
del place_data['place']
Any approach to get the expected output in python.?
One way to accomplish this could be by
for i in json_data:
i.update(i.pop("place"))
Another way to accomplish this with multiple "keys" updated...
This would only work for a single nested level as described in the original question
def expand(array):
flatten = list()
for obj in array:
temp = {}
for key, value in obj.items():
if isinstance(value, dict):
temp.update(value)
else:
temp.update({key:value})
flatten.append(temp)
return flatten

Python3 - Parse list of strings inside nested json

Python Noob here. I saw many similar questions but none of it my exact use case. I have a simple nested json, and I'm trying to access the element name present inside metadata. Below is my sample json.
{
"items": [{
"metadata": {
"name": "myname1"
}
},
{
"metadata": {
"name": "myname1"
}
}
]
}
Below is the code That I have tried so far, but not successfull.
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for i in data['items']:
for j in i['metadata']:
print (j['name'])
It errors out stating below
File "pythonjson.py", line 8, in
print (j['name']) TypeError: string indices must be integers
When I printed print (type(j)) I received the following o/p <class 'str'>. So I can see that it is a list of strings and not an dictinoary. So now How can I parse through a list of strings? Any official documentation or guide would be much helpful to know the concept of this.
Your json is bad, and the python exception is clear and unambiguous. You have the basic string "name" and you are trying to ... do a lookup on that?
Let's cut out all the json and look at the real issue. You do not know how to iterate over a dict. You're actually iterating over the keys themselves. If you want to see their values too, you're going to need dict.items()
https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
metadata = {"name": "myname1"}
for key, value in metadata.items():
if key == "name":
print ('the name is', value)
But why bother if you already know the key you want to look up?
This is literally why we have dict.
print ('the name is', metadata["name"])
You likely need:
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for item in data['items']:
print(item["metadata"]["name"]
Your original JSON is not valid (colons missing).
to access contents of name use "i["metadata"].keys()" this will return all keys in "metadata".
Working code to access all values of the dictionary in "metadata".
for i in data['items']:
for j in i["metadata"].keys():
print (i["metadata"][j])
**update:**Working code to access contents of "name" only.
for i in data['items']:
print (i["metadata"]["name"])

Remove entire JSON object if it contains a specified phrase (from a list in python)

Have a JSON file output similar to:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
},
"object2": {
"json_data": {json data information}",
"tables_data": ""
}
}
Essentially, if there is an empty string for tables_data as shown in object2 (eg. "tables_data": ""), I want the entire object to be removed so that the output would look like:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
}
}
What is the best way to go about doing this? Each of these objects correspond to a separate index in a list that I've appended called summary[].
To achieve this, you could iterate through the JSON dictionary and test the tables_data values, adding the objectX elements to a new dictionary if their tables_data value passes the test:
new_dict = {k: v for k, v in json_dict.items()
if v.get("tables_data", "") != ""}
If your JSON objectX is stored in a list as you say, these could be processed as follows using a list comprehension:
filtered_summary = [object_dict for object_dict in summary
if object_dict.get("tables_data", "") != ""]
Unless you have compelling reasons to do otherwise, the pythonic way to filter out a list or dict (or any iterable) is not to change it in place but to create a new filtered one. For your case this would look like
raw_data = YOUR_DICT_HERE_WHEREVER_IT_COMES_FROM
# NB : empty string have a false value in a boolean test
cleaned_data = {k:v for k, v in raw_data.items() if not v["table_data"]}

How can I access the nested data in this complex JSON, which includes another JSON document as one of the strings?

I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a dict in the first dict of the list of dicts
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'annoucement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.

Categories