Python3 - Parse list of strings inside nested json - python

Python noob here. I saw many similar questions, but none of them match my exact use case. I have a simple nested json, and I'm trying to access the element name present inside metadata. Below is my sample json.
{
    "items": [{
            "metadata": {
                "name": "myname1"
            }
        },
        {
            "metadata": {
                "name": "myname1"
            }
        }
    ]
}
Below is the code that I have tried so far, without success.
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for i in data['items']:
    for j in i['metadata']:
        print (j['name'])
It errors out with the following:
File "pythonjson.py", line 8, in <module>
    print (j['name'])
TypeError: string indices must be integers
When I ran print (type(j)) I received the following output: <class 'str'>. So I can see that it is a list of strings and not a dictionary. So how can I parse through a list of strings? Any official documentation or guide that explains this concept would be very helpful.

Your json is bad, and the python exception is clear and unambiguous. You have the basic string "name" and you are trying to ... do a lookup on that?
Let's cut out all the json and look at the real issue. You do not know how to iterate over a dict. You're actually iterating over the keys themselves. If you want to see their values too, you're going to need dict.items()
https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
metadata = {"name": "myname1"}
for key, value in metadata.items():
    if key == "name":
        print ('the name is', value)
But why bother if you already know the key you want to look up?
This is literally why we have dict.
print ('the name is', metadata["name"])
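To make the point above concrete, here is a minimal sketch that reproduces the error on a one-key dict and then does the direct lookup. The variable names keys_seen and error_message are only illustrative and are not from the question.

```python
metadata = {"name": "myname1"}

# Iterating a dict directly yields its keys, which here are plain strings.
keys_seen = [key for key in metadata]

# Indexing one of those strings with another string raises the TypeError
# from the question.
try:
    keys_seen[0]["name"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Looking the key up on the dict itself returns the value.
name = metadata["name"]
print(keys_seen, error_message, name)
```

The exact wording of the TypeError message varies slightly between Python versions, but it always starts with "string indices must be integers".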

You likely need:
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for item in data['items']:
    print(item["metadata"]["name"])
Your original JSON is not valid (colons missing).

To access the contents of "metadata", use i["metadata"].keys(); this returns all the keys in "metadata".
Working code to access all values of the dictionary in "metadata".
for i in data['items']:
    for j in i["metadata"].keys():
        print (i["metadata"][j])
Update: Working code to access the contents of "name" only.
for i in data['items']:
    print (i["metadata"]["name"])
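Putting the answers together, a self-contained sketch might look like the following. It parses the sample from an inline string so it runs without myfile.json; the names raw and names are illustrative. For a real file, `with open('./myfile.json') as f: data = json.load(f)` should give the same data and also closes the file for you.

```python
import json

# Inline copy of the sample document from the question.
raw = """
{
  "items": [
    {"metadata": {"name": "myname1"}},
    {"metadata": {"name": "myname1"}}
  ]
}
"""

data = json.loads(raw)

# data["items"] is a list of dicts; each dict has a "metadata" dict,
# so the name is reached with two plain key lookups per item.
names = [item["metadata"]["name"] for item in data["items"]]
print(names)
```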

Related

get value by key from json data with python [duplicate]

I have some JSON data like:
{
    "status": "200",
    "msg": "",
    "data": {
        "time": "1515580011",
        "video_info": [
            {
                "announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
                "announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing: announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
    'data': {
        'time': '1515580011',
        'video_info': [{
            'announcement': ( # ***
                """{
                    "announcement_id": "6",
                    "name": "INS\\u8d26\\u53f7",
                    "icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
                    "icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
                    "videoid": "15154610218328614178",
                    "content": "FOLLOW ME PLEASE",
                    "x_coordinate": "0.22",
                    "y_coordinate": "0.23"
                }"""),
            'announcement_shop': ''
        }]
    },
    'msg': '',
    'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a JSON string stored in the first dict of that list
content, a key inside that JSON string once it has been decoded
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
    ...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
FOLLOW ME PLEASE
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'announcement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.
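The two-stage decode described in these answers can be sketched end to end as below. The document here is a cut-down stand-in built with json.dumps (only announcement_id and content are kept from the question's data), and the names inner, outer, raw_text and announcement_text are illustrative.

```python
import json

# Build a small stand-in for the response: the "announcement" value is
# itself a JSON document that has been serialized into a string.
inner = {"announcement_id": "6", "content": "FOLLOW ME PLEASE"}
outer = {
    "status": "200",
    "data": {
        "video_info": [
            {"announcement": json.dumps(inner), "announcement_shop": ""}
        ]
    },
}
raw_text = json.dumps(outer)

# First decode: the outer document becomes dicts and lists.
raw_replay_data = json.loads(raw_text)
announcement_text = raw_replay_data["data"]["video_info"][0]["announcement"]

# The embedded document is still a str at this point, so it needs
# its own json.loads call before it can be indexed by key.
announcement = json.loads(announcement_text)
content = announcement["content"]
print(content)
```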

Using Python to Display a particular value through key in Json file

I used Python to get a JSON response from a website. The JSON is as follows:
{
    "term":"albany",
    "moresuggestions":490,
    "autoSuggestInstance":null,
    "suggestions":[
        {
            "group":"CITY_GROUP",
            "entities":[
                {
                    "geoId":"1000000000000000355",
                    "destinationId":"1508137",
                    "landmarkCityDestinationId":null,
                    "type":"CITY",
                    "caption":"<span class='highlighted'>Albany</span>, Albany County, United States of America",
                    "redirectPage":"DEFAULT_PAGE",
                    "latitude":42.650249,
                    "longitude":-73.753578,
                    "name":"Albany"
                },
                {},
                {},
                {},
                {},
                {}
            ]
        },
        {},
        {},
        {}
    ]
}
I used the following script to display the values according to a key:
import json
a =['']
data = json.loads(a)
print data["suggestions"]
This displays everything under 'suggestions' from the json file. However, if I want to go one or two levels further down, it throws an error. For example, I wanted to display the value of "caption". I searched for a solution but could not find what I need. I even tried calling:
print data["suggestions"]["entities"]
But the above syntax throws an error. What am I missing here?
data["suggestions"] is a list of dictionaries. You either need to provide an index (i.e. data["suggestions"][0]["entities"]) or use a loop:
for suggestion in data["suggestions"]:
    print suggestion["entities"]
Keep in mind that "entities" is also a list, so the same will apply:
for suggestion in data["suggestions"]:
    for entity in suggestion["entities"]:
        print entity["caption"]
The value under "suggestions" is an array, so you should read it by index, like below:
print data["suggestions"][0]["entities"]
print data["suggestions"][0]["entities"][0]["caption"]
The "suggestions" key holds a list of dicts.
You can access it by index like this, as long as the positions of the dictionaries stay the same.
data["suggestions"][0]["entities"][0]["caption"]
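If the empty {} entries in the sample are really empty (they may just be collapsed in the question), entity["caption"] would raise KeyError on them. A hedged Python 3 sketch that tolerates missing keys with dict.get, on a trimmed copy of the data, could look like this; the list name captions is illustrative.

```python
# Trimmed copy of the response from the question, already decoded.
data = {
    "term": "albany",
    "suggestions": [
        {
            "group": "CITY_GROUP",
            "entities": [
                {
                    "caption": "<span class='highlighted'>Albany</span>, "
                               "Albany County, United States of America",
                    "name": "Albany",
                },
                {},  # assumed to be genuinely empty for this sketch
            ],
        },
        {},  # a suggestion without an "entities" key
    ],
}

captions = []
for suggestion in data["suggestions"]:
    # .get with a default avoids KeyError when "entities" is absent.
    for entity in suggestion.get("entities", []):
        caption = entity.get("caption")
        if caption is not None:
            captions.append(caption)

print(captions)
```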

how to parse json nested dict in python?

I'm trying to work with a json file stored locally. It is formatted as below:
{
    "all":{
        "variables":{
            "items":{
                "item1":{
                    "one":{
                        "size":"1"
                    },
                    "two":{
                        "size":"2"
                    }
                }
            }
        }
    }
}
I'm trying to get the value of the size key using the following code.
with open('path/to/file.json','r') as file:
    data = json.load(file)
itemParse(data["all"]["variables"]["items"]["item1"])
def itemParse(data):
    for i in data:
        # also tried for i in data.iterkeys():
        # data has type dict while i has type unicode
        print i.get('size')
        # also tried print i['size']
I got different errors and nothing seems to work. Any suggestions?
Also, I tried using json.loads and got the error expect string or buffer.
When you iterate over data you get only the keys. There are two ways to solve it.
def itemParse(data):
    for i, j in data.iteritems():
        print j.get('size')
or
def itemParse(data):
    for i in data:
        print data[i].get('size')
First, use json.loads().
data = json.loads(open('path/to/file.json','r').read())
Second, your for loop should be changed to this
for k,v in data.iteritems():
    print data[k]['size']
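Regarding the error expect string or buffer: json.loads() expects a string, so this error usually means it was given the file object instead of the file's contents. Also, do you have permissions to read the json file?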
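The answers above use Python 2 syntax (iteritems and the print statement). A rough Python 3 equivalent, assuming the same document shape and parsing it from an inline string so the sketch runs without the file, might be the following; item_sizes is a hypothetical helper name, not from the question.

```python
import json

# Inline copy of the document from the question.
raw = """
{"all": {"variables": {"items": {"item1": {
    "one": {"size": "1"},
    "two": {"size": "2"}
}}}}}
"""
data = json.loads(raw)


def item_sizes(item):
    """Map each entry name to its "size" value (None if it has none)."""
    # dict.items() yields (key, value) pairs, so the inner dict is
    # available directly instead of only its key.
    return {name: entry.get("size") for name, entry in item.items()}


sizes = item_sizes(data["all"]["variables"]["items"]["item1"])
print(sizes)
```

Defining the function before the call also avoids the NameError that the ordering in the question would otherwise trigger.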

Parsing Python JSON with multiple same strings with different values

I am stuck on an issue where I am trying to parse for the id string in JSON that exists more than 1 time. I am using the requests library to pull json from an API. I am trying to retrieve all of the values of "id" but have only been able to successfully pull the one that I define. Example json:
{
    "apps": [{
        "id": "app1",
        "id": "app2",
        "id": "new-app"
    }]
}
So what I have done so far is turn the json response into a dictionary so that I can actually parse the first occurrence of "id". I have tried to create for loops but have been getting KeyError when trying to find string id, or TypeError: list indices must be integers or slices, not str. The only thing that I have been able to do successfully is define which id locations to output.
(data['apps'][N]['id']) -> where N = 0, 1 or 2
This would work if there was only going to be one "id" string at a time, but there will always be multiple and the location will change from time to time.
So how do I return the values of all strings for "id" from this single json output? Full code below:
import requests
url = "http://x.x.x.x:8080/v2/apps/"
response = requests.get(url)
# Error if not 200 and exit
if response.status_code != 200:
    print("Status:", response.status_code, "Check URL. Exiting")
    exit()
# Turn response into a dict and parse for ids
data = response.json()
for n in data:
    print(data['apps'][0]['id'])
OUTPUT:
app1
UPDATE:
Was able to get resolution thanks to Robᵩ. Here is what I ended up using:
def list_hook(pairs):
    result = {}
    for name, value in pairs:
        if name == 'id':
            result.setdefault(name, []).append(value)
            print(value)
data = response.json(object_pairs_hook = list_hook)
Also, the API that I posted as an example is not a real API. It was just supposed to be a visual representation of what I was trying to achieve. I am actually using Mesosphere's Marathon API, trying to build a Python listener for port mapping containers.
Your best choice is to contact the author of the API and let him know that his data format is silly.
Your next-best choice is to modify the behavior of the JSON parser by passing in a hook function. Something like this should work:
def list_hook(pairs):
    result = {}
    for name, value in pairs:
        if name == 'id':
            result.setdefault(name, []).append(value)
        else:
            result[name] = value
    return result
data = response.json(object_pairs_hook = list_hook)
for i in range(3):
    print(i, data['apps'][0]['id'][i])
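The same hook can be tried without a network call by passing it to json.loads directly, which accepts object_pairs_hook as a keyword argument. This is only a sketch on the example document from the question; the names raw, plain and ids are illustrative.

```python
import json

# The example document from the question, with a repeated "id" key.
raw = '{"apps": [{"id": "app1", "id": "app2", "id": "new-app"}]}'


def list_hook(pairs):
    """Collect every value of a repeated "id" key into a list."""
    result = {}
    for name, value in pairs:
        if name == "id":
            result.setdefault(name, []).append(value)
        else:
            result[name] = value
    return result


# Default behaviour: later duplicates overwrite earlier ones.
plain = json.loads(raw)

# With the hook, the parser hands over each object's (key, value) pairs
# in document order, so no duplicate is lost.
data = json.loads(raw, object_pairs_hook=list_hook)
ids = data["apps"][0]["id"]
print(plain["apps"][0]["id"], ids)
```

Iterating over ids directly, rather than over range(3), avoids hard-coding how many duplicates there are.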
