Flatten Nested JSON in Python

I'm new to Python and I'm quite stuck (I've gone through multiple other Stack Overflow questions and other sites and still can't get this to work).
I have the JSON below coming out of an API connection:
{
"results":[
{
"group":{
"mediaType":"chat",
"queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
},
"data":[
{
"interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
"metrics":[
{
"metric":"nOffered",
"qualifier":null,
"stats":{
"max":null,
"min":null,
"count":14,
"count_negative":null,
"count_positive":null,
"sum":null,
"current":null,
"ratio":null,
"numerator":null,
"denominator":null,
"target":null
}
}
],
"views":null
}
]
}
]
}
What I'm mainly looking to get out of it is this (or at least something as close to it as possible):
MediaType | QueueId | NOffered
Chat | 67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d | 14
Is something like that possible? I've tried multiple things and I either get the whole of this out in one line or just get different errors.

The error you got indicates that you missed that some of your values are actually dictionaries inside lists (arrays).
Assuming you want to flatten your JSON to retrieve the keys mediaType, queueId and count, you can use the following sample code:
import json

# path_to_json_file is a placeholder for wherever the API response was saved
with open(path_to_json_file, 'r') as f:
    json_dict = json.load(f)

for result in json_dict.get("results"):
    media_type = result.get("group").get("mediaType")
    queue_id = result.get("group").get("queueId")
    # count lives inside the "stats" dict of each metric, not on the metric itself
    n_offered = result.get("data")[0].get("metrics")[0].get("stats").get("count")
If data and metrics can contain more than one element, you will need for loops over them to retrieve every count value.
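If data or metrics can hold several entries, the loops mentioned above could look like the following minimal sketch. It assumes every metric keeps its value under stats.count and uses the metric name (for example "nOffered") as the output key; the inline payload is a trimmed copy of the question's JSON standing in for json.load(f).

```python
import json

# Trimmed copy of the question's response; in practice use json.load(f)
# or the parsed API response instead of this inline string.
payload = json.loads("""
{"results": [{"group": {"mediaType": "chat",
                        "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"},
              "data": [{"interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                        "metrics": [{"metric": "nOffered", "qualifier": null,
                                     "stats": {"count": 14}}],
                        "views": null}]}]}
""")

rows = []
for result in payload.get("results") or []:
    # "or {}" / "or []" also covers keys that are present but null
    group = result.get("group") or {}
    for data_item in result.get("data") or []:
        for metric in data_item.get("metrics") or []:
            stats = metric.get("stats") or {}
            rows.append({
                "mediaType": group.get("mediaType"),
                "queueId": group.get("queueId"),
                # assumption: the metric name becomes the column name
                metric.get("metric"): stats.get("count"),
            })

print(rows)
```

Each (result, data entry, metric) combination produces one row, so several metrics or intervals simply yield several rows.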

Assuming the format of the API response is always the same, have you considered hardcoding the extraction of the data you want?
This should work, with response defined as the API output (JSON null becomes None in a Python literal):
response = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
            },
            "data": [
                {
                    "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                    "metrics": [
                        {
                            "metric": "nOffered",
                            "qualifier": None,
                            "stats": {
                                "max": None,
                                "min": None,
                                "count": 14,
                                "count_negative": None,
                                "count_positive": None,
                                "sum": None,
                                "current": None,
                                "ratio": None,
                                "numerator": None,
                                "denominator": None,
                                "target": None
                            }
                        }
                    ],
                    "views": None
                }
            ]
        }
    ]
}
You can extract the results as follows:
results = response["results"][0]
{
    "mediaType": results["group"]["mediaType"],
    "queueId": results["group"]["queueId"],
    "nOffered": results["data"][0]["metrics"][0]["stats"]["count"]
}
which gives
{
    'mediaType': 'chat',
    'queueId': '67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d',
    'nOffered': 14
}
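If the structure is not fixed, a generic alternative to hardcoding the paths is to flatten everything into dotted keys. This is a minimal recursive sketch (the function name flatten and the "." separator are my own choices, not part of any library); list positions become part of the key, and empty dicts or lists produce no entries.

```python
def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts/lists into a single-level dict.

    Dict keys are joined with `sep`; list elements use their index.
    Empty dicts/lists produce no entries.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        # Leaf value (str, int, float, bool, None): store it under the full path
        items[parent_key] = obj
    return items


# Trimmed version of the response above
response = {"results": [{"group": {"mediaType": "chat",
                                   "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"},
                         "data": [{"metrics": [{"metric": "nOffered",
                                                "stats": {"count": 14, "sum": None}}]}]}]}

flat = flatten(response["results"][0])
print(flat["group.mediaType"], flat["data.0.metrics.0.stats.count"])  # chat 14
```

Indexing the flat dict by path then replaces the chained brackets, and missing paths can be handled with flat.get(...).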

Related

JSON response list select

I'm fiddling with an API, and this is the response I get:
{
"name1": {
},
"name2": {
"something1": 213,
"something2": [
{
"info1": 123,
"info2": 324
}
]
}
}
I've tried using
r.json()['name2']['something2']['info2'][0]
and
r.json()['name2']['something2'][0]
The first one raises an error, while the second one prints the whole dictionary inside "something2", but I only need specific values from it. How can I do that?
I think you should use the following code
data = r.json()
data['name2']['something2'][0]['info2']
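You need to put [0] after something2 because something2 is a list of dictionaries: [0] selects the first dictionary, and ['info2'] then reads the key from it. In your first attempt, ['info2'] is applied to the list itself, which raises a TypeError (list indices must be integers or slices, not str).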
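When "something2" can contain more than one dictionary, looping over it is safer than hardcoding [0]. A small sketch, with data standing in for r.json() and copied from the question:

```python
# Stand-in for data = r.json(), copied from the question
data = {
    "name1": {},
    "name2": {
        "something1": 213,
        "something2": [{"info1": 123, "info2": 324}],
    },
}

# "something2" is a list of dicts: index (or loop over) it before using a key
first_info2 = data["name2"]["something2"][0]["info2"]

# Collect "info2" from every element, skipping elements that lack the key
all_info2 = [item["info2"] for item in data["name2"]["something2"] if "info2" in item]

print(first_info2, all_info2)  # 324 [324]
```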

Convert nested dictionary inside dictionary into relational and add missing keys using Python

I am trying to convert the JSON records below into a relational (tabular) format, but I am not getting the expected output.
Filename.json:
{
"SampleRecord":{
"SampleRules":[
{
"Scaler_id":"1",
"family_min_samples_percentage":5,
"original_number_of_clusters":4,
"Results":[
{
"eps_value":0.1,
"min_samples":5,
"number_of_clusters":9,
"number_of_noise_samples":72,
"scores":{
"adjusted_rand_index":0.001,
"adjusted_mutual_info_score":0.009
}
}
],
"isnegative":"False",
"comment":[
"#Comment"
],
"enable":"enabled",
"additional_value":{
"type":[
{
"value":"AAA"
}
],
"uid":[
{
"value":"BBB"
}
],
"options":[
{
"value":"CCC"
},
{
"value":"DDD"
}
],
"scope":[
{
"value":"EEE"
}
]
}
},
{
"Scaler_id":"2",
"family_min_samples_percentage":5,
"original_number_of_clusters":4,
"Results":[
{
"eps_value":0.1,
"min_samples":5,
"number_of_clusters":9,
"number_of_noise_samples":72,
"scores":{
"adjusted_rand_index":0.001,
"adjusted_mutual_info_score":0.009
}
}
],
"isnegative":"False",
"comment":[
"#Comment"
],
"enable":"enabled",
"additional_value":{
"type":[
{
"value":"AAA"
}
],
"uid":[
{
"value":"BBB"
}
],
"options":[
{
"value":"CCC"
}
]
}
}
]
}
}
Expected output:
Scaler_id~original_number_of_clusters~Results_eps_value~Results_Scores_adjusted_rand_index~Results_Scores_avies_bouldin_score~isnegative~comment~additional_value_type~additional_value_uid~additional_value_options
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC~EEE
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~DDD~EEE
2~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC~Null
import json
import pandas as pd

with open('Filename.json') as inputfile:
    data = json.load(inputfile)

df1 = pd.json_normalize(data['SampleRecord'], 'SampleRules', sep='_')
df1.to_csv('Sample1.txt', encoding='utf-8', index=False, sep='~', na_rep='')
Output (Sample1.txt):
Scaler_id~original_number_of_clusters~Results~isnegative~comment~additional_value_type~additional_value_uid~additional_value_options
1~4~[{0.1},{0.001},{1.70}]~False~#comment~[{AAA}]~[{BBB}]~[{CCC},{DDD}]~[{EEE}]
2~4~[{0.1},{0.001},{1.70}]~False~#comment~[{AAA}]~[{BBB}]~[{CCC}]~
df2 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'Results'], [['SampleRules', 'Scaler_id'], ['SampleRules', 'original_number_of_clusters'], ['SampleRules', 'isnegative'], ['SampleRules', 'comment']], record_prefix='Results', sep='_', max_level=None, errors='ignore')
df3 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'additional_value', 'type'], [['SampleRules', 'Scaler_id']], record_prefix='additional_value', sep='_', max_level=None, errors='ignore')
df23 = pd.merge(df2, df3, how='inner', on='SampleRules_Scaler_id')
df23.to_csv('Sample2.txt', encoding='utf-8', index=False, sep='~', na_rep='')
Current output (Sample2.txt):
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~DDD
2~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC
df4 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'additional_value', 'scope'], [['SampleRules', 'Scaler_id']], record_prefix='additional_value', sep='_', max_level=None, errors='ignore')  # raises KeyError: 'scope' (this key is missing in some records)
I tried to use get(), since it returns None by default, but it didn't work:
df4=pd.json_normalize(data['SampleRecord']['SampleRules'],['additional_value'],['scope'].get('value'),['SampleRules','Scaler_id'],record_prefix='additional_value',sep='_',max_level=None,errors='ignore')
#TypeError : list indices must be integers or slices,not str
Problems:
1) How can I get the values of the nested dictionary (additional_value) with a single json_normalize call, without explicitly defining df2, df3, df4 for each sub-dictionary?
2) How can I get Null for a key that is missing entirely from a JSON record, and avoid the KeyError?
I have already looked at the following questions, but with no luck:
How to fill missing json keys with key and null value?
If key not in JSON then set value to null and insert into dataframe
Python JSON TypeError list indices must be integers or slices, not str
python dictionary keyError
I am a beginner to Python. Any suggestions would be of great help.
Thanks in advance!
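Both problems can be sketched in plain Python instead of chaining several json_normalize calls: .get() with a default avoids the KeyError, a missing sub-key becomes [None] so it still yields one row with an empty field, and itertools.product expands list-valued keys (such as the two options of Scaler_id 1) into one row per combination. This is a minimal sketch, not a pandas feature; ADDITIONAL_KEYS is an assumption listing every sub-key that may appear under additional_value, and the score column avies_bouldin_score from the expected output is left out because it does not appear in the sample JSON.

```python
import csv
import io
import itertools

# Trimmed copy of the question's Filename.json; in practice use json.load(f)
data = {"SampleRecord": {"SampleRules": [
    {"Scaler_id": "1", "original_number_of_clusters": 4,
     "Results": [{"eps_value": 0.1, "scores": {"adjusted_rand_index": 0.001}}],
     "isnegative": "False", "comment": ["#Comment"],
     "additional_value": {"type": [{"value": "AAA"}], "uid": [{"value": "BBB"}],
                          "options": [{"value": "CCC"}, {"value": "DDD"}],
                          "scope": [{"value": "EEE"}]}},
    {"Scaler_id": "2", "original_number_of_clusters": 4,
     "Results": [{"eps_value": 0.1, "scores": {"adjusted_rand_index": 0.001}}],
     "isnegative": "False", "comment": ["#Comment"],
     "additional_value": {"type": [{"value": "AAA"}], "uid": [{"value": "BBB"}],
                          "options": [{"value": "CCC"}]}},  # no "scope" here
]}}

# Assumption: every sub-key that may appear under additional_value
ADDITIONAL_KEYS = ["type", "uid", "options", "scope"]

rows = []
for rule in data["SampleRecord"]["SampleRules"]:
    extra = rule.get("additional_value") or {}
    # One list of values per sub-key; a missing key becomes [None]
    value_lists = [[v.get("value") for v in extra.get(k) or []] or [None]
                   for k in ADDITIONAL_KEYS]
    for result in rule.get("Results") or [{}]:
        scores = result.get("scores") or {}
        # product() yields one tuple per combination of sub-key values
        for combo in itertools.product(*value_lists):
            row = {
                "Scaler_id": rule.get("Scaler_id"),
                "original_number_of_clusters": rule.get("original_number_of_clusters"),
                "Results_eps_value": result.get("eps_value"),
                "Results_scores_adjusted_rand_index": scores.get("adjusted_rand_index"),
                "isnegative": rule.get("isnegative"),
                "comment": ";".join(rule.get("comment") or []),
            }
            row.update({"additional_value_" + k: v for k, v in zip(ADDITIONAL_KEYS, combo)})
            rows.append(row)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="~")
writer.writeheader()
writer.writerows(rows)  # None is written as an empty field
print(out.getvalue())
```

If a DataFrame is still needed afterwards, pd.DataFrame(rows) turns the list of dicts into one, with None shown as a missing value.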

Flatten json in Python from XHR-response

Update: the XHR response shown earlier was not correct.
I'm failing to flatten the JSON from an XHR response correctly.
I have expanded just one item below to make it more readable.
I am using Python, and this is what I have tried, with an incorrect outcome:
u = "URL"
SE_units = requests.get(u,headers=h).json()
dp = pd.json_normalize(SE_units,[SE_units,"Items"])
SE_dp_list.append(dp)
From the XHR response below, I would like to get the Items information into a CSV, but when I export it with to_csv I can see that it hasn't been flattened correctly.
{"Content":{
"PaginationCount":12,"FilterValues":null,"Items":
[{
"Id":258370,
"OriginalType":"BostadObjectPage",
"PublishDate":null,
"Title":"02 Skogsvagen",
"Image":
{
"description":null,
"alt":null,
"externalUrl":"/abc.jpg"
},
"StaticMapImage":null,
"Url":"/abcd/",
"HideReadMore":false,
"ProjectData":null,
"ObjectData":
{
"BuildingTypeLabel":"Rad-/Kedje-/Parhus",
"ObjectStatus":"SalesInProgress",
"ObjectStatusLabel":"Till salu",
"ObjectNumber":"02",
"City":"staden",
"RoomInterval":"2-3",
"LivingArea":"101",
"SalesPrice":"2 150 000",
"MonthlyFee":null,
"Elevator":false,
"Balcony":false,
"Terrace":true
},
"FastighetProjectData":null,
"FastighetObjectData":null,
"OfficeData":null
},
{
"Id":258372,
"OriginalType":"BostadObjectPage",
"PublishDate":null,
....."same structure as above"
"OfficeData":null
}],
"NoResultsMessage":null,
"SimplifiedBuildingType":null,
"NextIndex":-1,
"TotalCount":12,
"Heading":null,
"ShowMoreLabel":null,
"DataColumns":null,
"Error":null},
"ObjectSearchData":
{
"BuildingVariantId":"Houses",
"BuildingsFoundLabel":" {count}",
"BuildingTypeIds":[400],
"BuildingsAvailableForSale":12,
"BuildingNoResultsLabel":""
}
}
Expected output format after writing to CSV
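In the attempt above, record_path is [SE_units, "Items"], which passes the whole response dict as the first path element. record_path is meant to list the key names leading to the list of records, so for this response it would presumably be ["Content", "Items"]. A sketch under that assumption; the first item's values come from the question, and the second item's values are made-up placeholders.

```python
import pandas as pd

# Stand-in for SE_units = requests.get(u, headers=h).json(), trimmed to a few fields.
# Item 1 uses values from the question; item 2 uses placeholder values.
SE_units = {
    "Content": {
        "PaginationCount": 12,
        "Items": [
            {"Id": 258370, "Title": "02 Skogsvagen", "Url": "/abcd/",
             "Image": {"alt": None, "externalUrl": "/abc.jpg"},
             "ObjectData": {"City": "staden", "RoomInterval": "2-3",
                            "LivingArea": "101", "SalesPrice": "2 150 000"}},
            {"Id": 258372, "Title": "placeholder", "Url": "/placeholder/",
             "Image": {"alt": None, "externalUrl": "/placeholder.jpg"},
             "ObjectData": {"City": "placeholder", "RoomInterval": "placeholder",
                            "LivingArea": "placeholder", "SalesPrice": "placeholder"}},
        ],
    },
    "ObjectSearchData": {"BuildingsAvailableForSale": 12},
}

# record_path walks SE_units["Content"]["Items"]; nested dicts inside each item
# are expanded into columns joined with sep, e.g. "ObjectData_City".
dp = pd.json_normalize(SE_units, record_path=["Content", "Items"], sep="_")

csv_text = dp.to_csv(index=False)  # or dp.to_csv("SE_units.csv", index=False)
print(dp.columns.tolist())
```

Each element of Items becomes one CSV row, and keys such as ObjectData or Image turn into prefixed columns instead of dict-valued cells.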

How to get values of keys for changing Json

I am using Python 2.7.
The JSON I pull changes every time I request it.
I need to pull out Animal_Target_DisplayName under Term7 under Relation6 in my dict.
The problem is that the object Relation6 is sometimes in another part of the JSON: it can be nested deeper or appear in a different order.
I am trying to write code that simply exports the values of the key Animal_Target_DisplayName, but nothing is working. It won't even loop down the nested dict.
This works if I pull it out with something like ['view']['Term0'][0]['Relation6'], but remember that the JSON is never returned in the same structure.
This is the code I am using to get the values of the key Animal_Target_DisplayName, but it doesn't seem to loop through my dict and find all the values with that key:
array = []
for d in dict.values():
    row = d['Animal_Target_DisplayName']
    array.append(row)
JSON Below:
dict = {
"view":{
"Term0":[
{
"Id":"b0987b91-af12-4fe3-a56f-152ac7a4d84d",
"DisplayName":"Dog",
"FullName":"Dog",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"ee74a59d-fb74-4052-97ba-9752154f015d",
"DisplayName":"Dog2",
"FullName":"Dog",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"eb548eae-da6f-41e8-80ea-7e9984f56af6",
"DisplayName":"Dog3",
"FullName":"Dog3",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"cfac6dd4-0efa-4417-a2bf-0333204f8a42",
"DisplayName":"Animal Set",
"FullName":"Animal Set",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0001-000400000001",
}
],
"StringAttribute2":[
{
"StringAttribute_00000000-0000-0000-0000-000000003114_Id":"00a701a8-be4c-4b76-a6e5-3b0a4085bcc8",
"StringAttribute_00000000-0000-0000-0000-000000003114_Value":"Desc"
}
],
"StringAttribute3":[
{
"StringAttribute_00000000-0000-0000-0000-000000000262_Id":"a81adfb4-7528-4673-8c95-953888f3b43a",
"StringAttribute_00000000-0000-0000-0000-000000000262_Value":"meow"
}
],
"BooleanAttribute4":[
{
"BooleanAttribute_00000000-0000-0000-0001-000500000001_Id":"932c5f97-c03f-4a1a-a0c5-a518f5edef5e",
"BooleanAttribute_00000000-0000-0000-0001-000500000001_Value":"true"
}
],
"SingleValueListAttribute5":[
{
"SingleValueListAttribute_00000000-0000-0000-0001-000500000031_Id":"ef51dedd-6f25-4408-99a6-5a6cfa13e198",
"SingleValueListAttribute_00000000-0000-0000-0001-000500000031_Value":"Blah"
}
],
"Relation6":[
{
"Animal_Id":"2715ca09-3ced-4b74-a418-cef4a95dddf1",
"Term7":[
{
"Animal_Target_Id":"88fd0090-4ea8-4ae6-b7f0-1b13e5cf3d74",
"Animal_Target_DisplayName":"Animaltheater",
"Animal_Target_FullName":"Animaltheater"
}
]
},
{
"Animal_Id":"6068fe78-fc8e-4542-9aee-7b4b68760dcd",
"Term7":[
{
"Animal_Target_Id":"4e87a614-2a8b-46c0-90f3-8a0cf9bda66c",
"Animal_Target_DisplayName":"Animaltitle",
"Animal_Target_FullName":"Animaltitle"
}
]
},
{
"Animal_Id":"754ec0e6-19b6-4b6b-8ba1-573393268257",
"Term7":[
{
"Animal_Target_Id":"a8986ed5-3ec8-44f3-954c-71cacb280ace",
"Animal_Target_DisplayName":"Animalcustomer",
"Animal_Target_FullName":"Animalcustomer"
}
]
},
{
"Animal_Id":"86b3ffd1-4d54-4a98-b25b-369060651bd6",
"Term7":[
{
"Animal_Target_Id":"89d02067-ebe8-4b87-9a1f-a6a0bdd40ec4",
"Animal_Target_DisplayName":"Animalfact_transaction",
"Animal_Target_FullName":"Animalfact_transaction"
}
]
},
{
"Animal_Id":"ea2e1b76-f8bc-46d9-8ebc-44ffdd60f213",
"Term7":[
{
"Animal_Target_Id":"e398cd32-1e73-46bd-8b8f-d039986d6de0",
"Animal_Target_DisplayName":"Animalfact_transaction",
"Animal_Target_FullName":"Animalfact_transaction"
}
]
}
],
"Relation10":[
{
"TargetRelation_b8b178ff-e957-47db-a4e7-6e5b789d6f03_Id":"aff80bd0-a282-4cf5-bdcc-2bad35ddec1d",
"Term11":[
{
"AnimalId":"3ac22167-eb91-469a-9d94-315aa301f55a",
"AnimalDisplayName":"Animal",
"AnimalFullName":"Animal"
}
]
}
],
"Tag12":[
{
"Tag_Id":"75968ea6-4c9f-43c9-80f7-dfc41b24ec8f",
"Tag_Name":"AnimalAnimaltitle"
},
{
"Tag_Id":"b1adbc00-aeef-415b-82b6-a3159145c60d",
"Tag_Name":"Animal2"
},
{
"Tag_Id":"5f78e4dc-2b37-41e0-a0d3-cec773af2397",
"Tag_Name":"AnimalDisplayName"
}
]
}
]
}
}
The output I am trying to get is a list of all the values of the key Animal_Target_DisplayName, like this: ['Animaltheater','Animaltitle', 'Animalcustomer', 'Animalfact_transaction', 'Animalfact_transaction']. Remember that the nested structure of this JSON always changes, but its keys are always the same.
I guess your only option is to walk through the entire dict and collect the values of the Animal_Target_DisplayName key. I propose the following recursive solution:
def run_json(dict_):
    animal_target_sons = []
    if type(dict_) is list:
        for element in dict_:
            animal_target_sons.append(run_json(element))
    elif type(dict_) is dict:
        for key in dict_:
            if key == "Animal_Target_DisplayName":
                animal_target_sons.append([dict_[key]])
            else:
                animal_target_sons.append(run_json(dict_[key]))
    return [x for sublist in animal_target_sons for x in sublist]

run_json(dict_)
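Calling run_json then returns a list with what you want. By the way, I recommend renaming your JSON variable from dict to, for example, dict_: dict is the name of Python's built-in dictionary type, and if your data is bound to that name, the check type(dict_) is dict inside run_json compares against your data instead of the type, so the recursion never descends into dictionaries.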
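The same recursive search can also be written as a generator, which avoids building and flattening intermediate lists. This is a sketch of that variant (find_values is a name chosen here, not a library function); it avoids yield from so it should run on both Python 2.7 and Python 3. Lists keep their order, but on Python 2.7 the order of keys within a dict is arbitrary.

```python
def find_values(obj, target_key):
    """Yield every value stored under target_key, at any depth."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target_key:
                yield value
            else:
                # Recurse into the value; explicit loop instead of "yield from"
                for found in find_values(value, target_key):
                    yield found
    elif isinstance(obj, list):
        for element in obj:
            for found in find_values(element, target_key):
                yield found


# Trimmed excerpt of the question's JSON; the depth of Relation6 does not matter
doc = {"view": {"Term0": [{"Relation6": [
    {"Term7": [{"Animal_Target_DisplayName": "Animaltheater"}]},
    {"Term7": [{"Animal_Target_DisplayName": "Animaltitle"}]},
]}]}}

print(list(find_values(doc, "Animal_Target_DisplayName")))
```

Because it only matches on the key name, duplicates such as the two Animalfact_transaction entries are kept, as in the requested output.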
Since you're getting JSON, why not make use of the json module? That will do the parsing for you and allow you to use dictionary functions+features to get the information you need.
#!/usr/bin/python2.7
from __future__ import print_function
import json

# _somehow_ get your JSON in as a string. I'm calling it "jstr" for this
# example.

# Use the module to parse it
jdict = json.loads(jstr)

# our dict has keys...
# view -> Term0 -> keys-we're-interested-in
templist = jdict["view"]["Term0"]
results = {}
for term in templist:
    if term["FullName"] == "Animal Set":
        # this is the one we're interested in - and it's another list
        for relation in term["Relation6"]:
            term7 = relation["Term7"][0]
            displayName = term7["Animal_Target_DisplayName"]
            fullName = term7["Animal_Target_FullName"]
            results[fullName] = displayName

print("{0}".format(results))
Then you can dump the results dict plain, or with pretty-printing:
>>> print(json.dumps(results, indent=4))
{
"Animaltitle2": "Animaltitle2",
"Animalcustomer3": "Animalcustomer3",
"Animalfact_transaction4": "Animalfact_transaction4",
"Animaltheater1": "Animaltheater1"
}

Error while parsing json from IBM watson using python

I am trying to parse a JSON download using Python; here is the download I have:
{
"document_tone":{
"tone_categories":[
{
"tones":[
{
"score":0.044115,
"tone_id":"anger",
"tone_name":"Anger"
},
{
"score":0.005631,
"tone_id":"disgust",
"tone_name":"Disgust"
},
{
"score":0.013157,
"tone_id":"fear",
"tone_name":"Fear"
},
{
"score":1.0,
"tone_id":"joy",
"tone_name":"Joy"
},
{
"score":0.058781,
"tone_id":"sadness",
"tone_name":"Sadness"
}
],
"category_id":"emotion_tone",
"category_name":"Emotion Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"analytical",
"tone_name":"Analytical"
},
{
"score":0.0,
"tone_id":"confident",
"tone_name":"Confident"
},
{
"score":0.0,
"tone_id":"tentative",
"tone_name":"Tentative"
}
],
"category_id":"language_tone",
"category_name":"Language Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"openness_big5",
"tone_name":"Openness"
},
{
"score":0.571,
"tone_id":"conscientiousness_big5",
"tone_name":"Conscientiousness"
},
{
"score":0.936,
"tone_id":"extraversion_big5",
"tone_name":"Extraversion"
},
{
"score":0.978,
"tone_id":"agreeableness_big5",
"tone_name":"Agreeableness"
},
{
"score":0.975,
"tone_id":"emotional_range_big5",
"tone_name":"Emotional Range"
}
],
"category_id":"social_tone",
"category_name":"Social Tone"
}
]
}
}
I am trying to parse out 'tone_name' and 'score' from the above file, and I am using the following code:
import urllib
import json
url = urllib.urlopen('https://watson-api-explorer.mybluemix.net/tone-analyzer/api/v3/tone?version=2016-05-19&text=I%20am%20happy')
data = json.load(url)
for item in data['document_tone']:
    print item["tone_name"]
I keep running into an error saying that tone_name is not defined.
As jonrsharpe said in a comment:
data['document_tone'] is a dictionary, but 'tone_name' is a key in dictionaries much further down the structure.
You need to access the dictionary that tone_name is in. If I am understanding the JSON correctly, tone_name is a key within tones, within tone_categories, within document_tone. You would then want to change your code to go to that level, like so:
for item in data['document_tone']['tone_categories']:
    # item is an anonymous dictionary
    for thing in item['tones']:
        print(thing['tone_name'])
More than one for loop is needed because of the mix of lists and dictionaries in the file. 'tone_categories' is a list of dictionaries, so the outer loop visits each one of those. The inner loop then iterates through the list 'tones' inside each of them, which is full of more dictionaries. Those dictionaries are the ones that contain 'tone_name', so it prints the value of 'tone_name'.
If this does not work, let me know. I was unable to test it since I could not get the rest of the code to work on my computer.
You are walking the structure incorrectly. The root node has a single document_tone key, whose value only has the tone_categories key. Each category has a list of tones and its name. Here is how you would print it out (adjust as needed):
for cat in data['document_tone']['tone_categories']:
    print('Category:', cat['category_name'])
    for tone in cat['tones']:
        print('-', tone['tone_name'])
The result of this is:
Category: Emotion Tone
- Anger
- Disgust
- Fear
- Joy
- Sadness
Category: Language Tone
- Analytical
- Confident
- Tentative
Category: Social Tone
- Openness
- Conscientiousness
- Extraversion
- Agreeableness
- Emotional Range
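The question also asks for 'score', which neither loop above prints. The same two-level walk (categories, then tones) can collect both values; a sketch using a trimmed copy of the response above, written with str.format so it should run on Python 2.7 and 3.

```python
# Trimmed copy of the Tone Analyzer response from the question
data = {
    "document_tone": {
        "tone_categories": [
            {"category_id": "emotion_tone", "category_name": "Emotion Tone",
             "tones": [{"score": 0.044115, "tone_id": "anger", "tone_name": "Anger"},
                       {"score": 1.0, "tone_id": "joy", "tone_name": "Joy"}]},
            {"category_id": "social_tone", "category_name": "Social Tone",
             "tones": [{"score": 0.571, "tone_id": "conscientiousness_big5",
                        "tone_name": "Conscientiousness"}]},
        ]
    }
}

# One (category, tone_name, score) tuple per tone: categories -> tones
pairs = [
    (cat["category_name"], tone["tone_name"], tone["score"])
    for cat in data["document_tone"]["tone_categories"]
    for tone in cat["tones"]
]

for category, name, score in pairs:
    print("{0:<15} {1:<18} {2:.3f}".format(category, name, score))
```

The comprehension's two for clauses run in the same order as the nested loops, so the tuples come out category by category.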
