Navigate dict based on its structure - python

I have Python code that interacts with multiple APIs. All of the APIs return JSON, but each has a different structure. Let's say I'm looking for people's names in all these JSON responses:
json_a = {
"people": [
{"name": "John"},
{"name": "Peter"}
]
}
json_b = {
"humans": {
"names": ["Adam", "Martin"]
}
}
As you can see above the dictionaries from jsons have arbitrary structures. I'd like to define something that will serve as a "blueprint" for navigating each json, something like this:
all_jsons = {
"json_a": {
"url": "http://endpoint",
"json_structure": "people -> list -> name"
},
"json_b": {
"url": "http://someotherendpoint",
"json_structure": "humans -> names -> list"
}
}
That way, if I'm working with json_a, I can just look at all_jsons["json_a"]["json_structure"] to find out how to navigate that particular JSON. What would be the best way to achieve this?

Why not define concrete retrieval functions for each api:
def retrieve_a(data):
    return [d["name"] for d in data["people"]]
def retrieve_b(data):
    return data["humans"]["names"]
and store them for each endpoint:
all_jsons = {
"json_a": {
"url": "http://endpoint",
"retrieve": retrieve_a
},
"json_b": {
"url": "http://someotherendpoint",
"retrieve": retrieve_b
}
}
I have found this approach more workable than trying to express code-logic by configuration. Then you can easily collect names:
for dct in all_jsons.values():
    data = ...  # requests.get(dct["url"]).json()  # or similar
    names = dct["retrieve"](data)

To get a value from a dictionary when the key may not exist, use dict.get(key). To tell whether a value is a list, a dict, or something else, check it with type(val) or isinstance(val, list). Combining the two should solve your problem.
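As a rough illustration of combining dict.get with a type check, here is a minimal sketch of a path-driven lookup. The function name extract and the choice to express the "blueprint" as a list of keys are assumptions made for this example, not something from the question; lists are handled by applying the remaining path to every element.

```python
def extract(data, path):
    """Follow `path` (a list of dict keys) through nested dicts and lists.

    Hypothetical helper: whenever the current value is a list, the
    remaining path is applied to each element and results are flattened.
    """
    if isinstance(data, list):
        found = []
        for item in data:
            found.extend(extract(item, path))
        return found
    if not path:
        # Path used up: whatever we are looking at is a result.
        return [data]
    if isinstance(data, dict):
        child = data.get(path[0])  # None when the key is missing
        if child is None:
            return []
        return extract(child, path[1:])
    # A scalar was reached before the path was used up.
    return []


json_a = {"people": [{"name": "John"}, {"name": "Peter"}]}
json_b = {"humans": {"names": ["Adam", "Martin"]}}

names_a = extract(json_a, ["people", "name"])
names_b = extract(json_b, ["humans", "names"])
print(names_a, names_b)
```

With this shape, the per-endpoint configuration would only need to store the key list, at the cost of being less flexible than a dedicated retrieval function per API.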

Related

Flatten Nested JSON in Python

I'm new to Python and I'm quite stuck (I've gone through multiple other stackoverflows and other sites and still can't get this to work).
I've the below json coming out of an API connection
{
"results":[
{
"group":{
"mediaType":"chat",
"queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
},
"data":[
{
"interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
"metrics":[
{
"metric":"nOffered",
"qualifier":null,
"stats":{
"max":null,
"min":null,
"count":14,
"count_negative":null,
"count_positive":null,
"sum":null,
"current":null,
"ratio":null,
"numerator":null,
"denominator":null,
"target":null
}
}
],
"views":null
}
]
}
]
}
and what I'm mainly looking to get out of it is (or at least something as close as)
MediaType | QueueId | NOffered
Chat | 67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d | 14
Is something like that possible? I've tried multiple things and I either get the whole of this out in one line or just get different errors.
The error you got indicates you missed that some of your values are actually a dictionary within an array.
Assuming you want to flatten your json file to retrieve the following keys: mediaType, queueId, count.
These can be retrieved by the following sample code:
import json
with open(path_to_json_file, 'r') as f:
    json_dict = json.load(f)
for result in json_dict.get("results"):
    media_type = result.get("group").get("mediaType")
    queue_id = result.get("group").get("queueId")
    # "count" is nested under "stats" inside each metric
    n_offered = result.get("data")[0].get("metrics")[0].get("stats").get("count")
If your data and metrics lists contain more than one entry, you will have to use a for loop to retrieve every count value.
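A minimal sketch of that loop, assuming the response has already been parsed into a dict. The function name flatten_results and the trimmed-down sample payload are made up for illustration; only the key names come from the question.

```python
# Trimmed sample in the shape shown in the question (assumed already parsed).
response = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d",
            },
            "data": [
                {"metrics": [{"metric": "nOffered", "stats": {"count": 14}}]}
            ],
        }
    ]
}


def flatten_results(payload):
    """Return one flat row per metric found under results -> data -> metrics."""
    rows = []
    for result in payload.get("results", []):
        group = result.get("group", {})
        # `or []` guards against keys that are present but null (None).
        for entry in result.get("data") or []:
            for metric in entry.get("metrics") or []:
                rows.append({
                    "mediaType": group.get("mediaType"),
                    "queueId": group.get("queueId"),
                    "metric": metric.get("metric"),
                    "count": (metric.get("stats") or {}).get("count"),
                })
    return rows


rows = flatten_results(response)
print(rows)
```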
Assuming that the format of the API response is always the same, have you considered hardcoding the extraction of the data you want?
This should work, with response defined as the API output:
response = {
"results":[
{
"group":{
"mediaType":"chat",
"queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
},
"data":[
{
"interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
"metrics":[
{
"metric":"nOffered",
"qualifier":None,
"stats":{
"max":None,
"min":None,
"count":14,
"count_negative":None,
"count_positive":None,
"sum":None,
"current":None,
"ratio":None,
"numerator":None,
"denominator":None,
"target":None
}
}
],
"views":None
}
]
}
]
}
You can extract the results as follows:
results = response["results"][0]
{
"mediaType": results["group"]["mediaType"],
"queueId": results["group"]["queueId"],
"nOffered": results["data"][0]["metrics"][0]["stats"]["count"]
}
which gives
{
'mediaType': 'chat',
'queueId': '67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d',
'nOffered': 14
}

How to create a tree using BFS in python?

So I have a flattened tree in JSON like this, as array of objects:
[{
aid: "id3",
data: ["id1", "id2"]
},
{
aid: "id1",
data: ["id3", "id2"]
},
{
aid: "id2",
nested_data: {aid: "id4", atype: "nested", data: ["id1", "id3"]},
data: []
}]
I want to gather that tree and resolve ids into data with recursion loops into something like this (say we start from "id3"):
{
"aid":"id3",
"payload":"1",
"data":[
{
"id1":{
"aid":"id1",
"data":[
{
"id3":null
},
{
"id2":null
}
]
}
},
{
"id2":{
"aid":"id2",
"nested_data":{
"aid":"id4",
"atype":"nested",
"data":[
{
"id1":null
},
{
"id3":null
}
]
},
"data":[
]
}
}
]
}
That is, we would do a breadth-first search and resolve each id into "value": "object with that field" on first entrance and "value": null on any later entrance.
How to do such a thing in python 3?
Apart from the syntax problems in your structure (identifiers must be within quotes, etc.), the code below will give you the requested result.
But you should think carefully about what you are doing, and take the following into account:
Following the relations expressed in the flat structure you provide leads to endless recursion, because some items include other items that in turn include the first ones (like id3 including id1, which in turn includes id3). So you have to define a stop criterion, or be sure that this does not occur in your flat structure.
Your initial flat structure would be better as a dictionary than as a list of pairs {id, data}. That is why the first thing the code below does is transform it.
Your final, desired structure contains a lot of redundant information. Consider simplifying it.
Finally, you said nothing about the "nested_data" nodes and how they should be treated. I simply assumed that, where they exist, further expansion is required.
Please consider providing a bit of context in your questions and some real data examples (I believe the data provided is not real data, hence the inconsistencies and redundancies), and try it yourself and show your efforts; that's the only way to learn.
from pprint import pprint
def reformat_flat_info(flat):
    reformatted = {}
    for o in flat:
        key = o["aid"]
        del o["aid"]
        reformatted[key] = o
    return reformatted
def expand_data(aid, flat, lvl=0):
    obj = flat[aid]
    if obj is None: return {aid: obj}
    obj.update({"aid": aid})
    if lvl > 1:
        return {aid: None}
    for nid,id in enumerate(obj["data"]):
        obj["data"][nid] = expand_data(id, flat, lvl=lvl+1)
    if "nested_data" in obj:
        for nid,id in enumerate(obj["nested_data"]["data"]):
            obj["nested_data"]["data"][nid] = expand_data(id, flat, lvl=lvl+1)
    return {aid: obj}
# Provide the flat information structure
flat_info = [
{
"aid": "id3",
"data": ["id1", "id2"]
}, {
"aid": "id1",
"data": ["id3", "id2"]
}, {
"aid": "id2",
"nested_data": {"aid": "id4", "atype": "nested", "data": ["id1", "id3"]},
"data": []
}
]
pprint(flat_info)
print('-'*80)
# Reformat the flat information structure
new_flat_info = reformat_flat_info(flat=flat_info)
pprint(new_flat_info)
print('-'*80)
# Generate the result
starting_id = "id3"
result = expand_data(aid=starting_id, flat=new_flat_info)
pprint(result)
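The code above stops at a fixed depth rather than walking breadth-first. As a hedged alternative, here is a minimal sketch of a queue-based traversal that uses a visited set as the stop criterion: an id is expanded the first time it is reached and mapped to None on every later occurrence. The function name build_tree_bfs, the handling of unknown ids (mapped to None), and the treatment of "nested_data" as belonging to its parent node are assumptions for this example.

```python
import copy
from collections import deque
from pprint import pprint

flat_info = [
    {"aid": "id3", "data": ["id1", "id2"]},
    {"aid": "id1", "data": ["id3", "id2"]},
    {
        "aid": "id2",
        "nested_data": {"aid": "id4", "atype": "nested", "data": ["id1", "id3"]},
        "data": [],
    },
]


def build_tree_bfs(flat, start):
    """Expand ids breadth-first; each id is resolved only on first entrance."""
    by_id = {item["aid"]: item for item in flat}
    root = copy.deepcopy(by_id[start])  # copies keep the input untouched
    visited = {start}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Id lists to resolve: the node's own "data" plus, if present,
        # the "data" list inside its "nested_data".
        id_lists = [node.get("data", [])]
        if "nested_data" in node:
            id_lists.append(node["nested_data"].get("data", []))
        for ids in id_lists:
            for pos, child_id in enumerate(ids):
                if child_id in visited or child_id not in by_id:
                    # Already expanded elsewhere (or unknown): do not recurse.
                    ids[pos] = {child_id: None}
                else:
                    visited.add(child_id)
                    child = copy.deepcopy(by_id[child_id])
                    ids[pos] = {child_id: child}
                    queue.append(child)
    return root


tree = build_tree_bfs(flat_info, "id3")
pprint(tree)
```

Because every id is expanded at most once, the traversal terminates even when the flat structure contains cycles.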

Error while parsing json from IBM watson using python

I am trying to parse out a JSON download using python and here is the download that I have:
{
"document_tone":{
"tone_categories":[
{
"tones":[
{
"score":0.044115,
"tone_id":"anger",
"tone_name":"Anger"
},
{
"score":0.005631,
"tone_id":"disgust",
"tone_name":"Disgust"
},
{
"score":0.013157,
"tone_id":"fear",
"tone_name":"Fear"
},
{
"score":1.0,
"tone_id":"joy",
"tone_name":"Joy"
},
{
"score":0.058781,
"tone_id":"sadness",
"tone_name":"Sadness"
}
],
"category_id":"emotion_tone",
"category_name":"Emotion Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"analytical",
"tone_name":"Analytical"
},
{
"score":0.0,
"tone_id":"confident",
"tone_name":"Confident"
},
{
"score":0.0,
"tone_id":"tentative",
"tone_name":"Tentative"
}
],
"category_id":"language_tone",
"category_name":"Language Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"openness_big5",
"tone_name":"Openness"
},
{
"score":0.571,
"tone_id":"conscientiousness_big5",
"tone_name":"Conscientiousness"
},
{
"score":0.936,
"tone_id":"extraversion_big5",
"tone_name":"Extraversion"
},
{
"score":0.978,
"tone_id":"agreeableness_big5",
"tone_name":"Agreeableness"
},
{
"score":0.975,
"tone_id":"emotional_range_big5",
"tone_name":"Emotional Range"
}
],
"category_id":"social_tone",
"category_name":"Social Tone"
}
]
}
}
I am trying to parse out 'tone_name' and 'score' from the above file and I am using following code:
import urllib
import json
url = urllib.urlopen('https://watson-api-explorer.mybluemix.net/tone-analyzer/api/v3/tone?version=2016-05-19&text=I%20am%20happy')
data = json.load(url)
for item in data['document_tone']:
    print item["tone_name"]
I keep running into an error saying that tone_name is not defined.
As jonrsharpe said in a comment:
data['document_tone'] is a dictionary, but 'tone_name' is a key in dictionaries much further down the structure.
You need to access the dictionary that tone_name is in. If I am understanding the JSON correctly, tone_name is a key within tones, within tone_categories, within document_tone. You would then want to change your code to go to that level, like so:
for item in data['document_tone']['tone_categories']:
    # item is an anonymous dictionary
    for thing in item['tones']:
        print(thing['tone_name'])
More than one for loop is needed because the file mixes lists and dictionaries. 'tone_categories' is a list of dictionaries, so the outer loop visits each of them. The inner loop then iterates through the 'tones' list inside each one, which is full of more dictionaries. Those dictionaries are the ones that contain 'tone_name', so it prints the value of 'tone_name'.
If this does not work, let me know. I was unable to test it since I could not get the rest of the code to work on my computer.
You are walking the structure incorrectly. The root node has a single document_tone key, the value of which only has the tone_categories key. Each of the categories has a list of tones and its name. Here is how you would print it out (adjust as needed):
for cat in data['document_tone']['tone_categories']:
    print('Category:', cat['category_name'])
    for tone in cat['tones']:
        print('-', tone['tone_name'])
The result of this is:
Category: Emotion Tone
- Anger
- Disgust
- Fear
- Joy
- Sadness
Category: Language Tone
- Analytical
- Confident
- Tentative
Category: Social Tone
- Openness
- Conscientiousness
- Extraversion
- Agreeableness
- Emotional Range
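Since the question asks for both 'tone_name' and 'score', the same two-level walk can collect them as pairs. This is a small sketch using a shortened copy of the sample response (already parsed into a dict); the variable name scores is chosen for the example.

```python
# Shortened copy of the response from the question, already parsed.
data = {
    "document_tone": {
        "tone_categories": [
            {
                "category_id": "emotion_tone",
                "category_name": "Emotion Tone",
                "tones": [
                    {"score": 0.044115, "tone_id": "anger", "tone_name": "Anger"},
                    {"score": 1.0, "tone_id": "joy", "tone_name": "Joy"},
                ],
            }
        ]
    }
}

# Outer loop: categories (a list); inner loop: the tones inside each one.
scores = [
    (tone["tone_name"], tone["score"])
    for category in data["document_tone"]["tone_categories"]
    for tone in category["tones"]
]

for name, score in scores:
    print(name, score)
```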

merging json files by keys in python (or some other easy scripting language)

I would like to merge multiple json files into a single object or file. An example JSON object could be
{
"food": {
"ingredents": {
"one": "this",
"two": "that"
}
}
}
and
{
"food": {
"amount": {
"tablespoons": "3"
}
}
}
I would like to be able to merge these so that we get
{
"food": {
"ingredents": {
"one": "this",
"two": "that"
},
"amount": {
"tablespoons": "3"
}
}
}
so that they are all combined via the parent keys instead of just as a list where "food" would repeat itself. Also I would like the outgoing file to replace anything that is repeated, such as if ingredients "one" : "this" existed somewhere else it would only appear once.
Any help would be greatly appreciated. Specifically something that iterates through a list of JSON files and applies the method would be ideal.
I have tried using glob and iterating through the JSON files, such as:
import glob
import json
ar = []
for f in glob.glob("*.json"):
    with open(f, "rb") as filename:
        ar.append(json.load(filename))
with open("outfile.json", "w") as outfile:
    json.dump(ar, outfile)
yet this just gives me a list of JSON objects without connecting them by key.
I could likely write one solution where it collects the key data and uses conditionals to determine where to place an object inside a certain key, but this would require a lot more work especially since I am dealing with a large amount of files. If there is a simpler solution that would be amazing.
Not sure which libraries you've tried that you say didn't work correctly for your needs, but I would recommend lodash. It's an incredibly fast, tiny, robust library to handle these kinds of operations. For this specific case you could easily accomplish it with lodash merge https://lodash.com/docs#merge
example:
var users = {
'data': [{ 'user': 'barney' }, { 'user': 'fred' }]
};
var ages = {
'data': [{ 'age': 36 }, { 'age': 40 }]
};
_.merge(users, ages);
// → { 'data': [{ 'user': 'barney', 'age': 36 }, { 'user': 'fred', 'age': 40 }] }
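The lodash example is JavaScript. Since the question asks for Python, here is a minimal sketch of a comparable recursive merge, assuming each file holds a valid JSON object at the top level. The names deep_merge and merge_files are made up for this example. Note one deliberate difference: lodash merges arrays index by index, whereas this sketch appends list items that are not already present, which is just one possible way to avoid repeats.

```python
import copy
import glob
import json


def deep_merge(target, source):
    """Merge `source` into `target` in place, key by key, and return `target`."""
    for key, value in source.items():
        existing = target.get(key)
        if isinstance(value, dict) and isinstance(existing, dict):
            # Both sides are objects: merge their keys recursively.
            deep_merge(existing, value)
        elif isinstance(value, list) and isinstance(existing, list):
            # Both sides are lists: append only items not already present.
            for item in value:
                if item not in existing:
                    existing.append(copy.deepcopy(item))
        else:
            # New key, or conflicting types: the later value wins.
            target[key] = copy.deepcopy(value)
    return target


def merge_files(pattern):
    """Merge every JSON file matching `pattern` into a single dict."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as handle:
            deep_merge(merged, json.load(handle))
    return merged


a = {"food": {"ingredents": {"one": "this", "two": "that"}}}
b = {"food": {"amount": {"tablespoons": "3"}}}
merged = deep_merge(deep_merge({}, a), b)
print(json.dumps(merged, indent=2))
```

Calling merge_files("*.json") and passing the result to json.dump would then write the combined object to a single output file.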

How to search nested dict/array structure to add a subdict in python?

If I have the following python dict "mydict":
mydict = {
"folder": "myfolder",
"files": [
{ "folder": "subfolder",
"files": []
},
{ "folder": "anotherfolder",
"files": [
{ "folder": "subsubfolder",
"files": []
},
{ "folder": "subotherfolder",
"files": []
}
]
},
]
}
How can I make it such that if I have another dict "newdict":
newdict = {
"folder":"newfolder",
"files":[]
}
Is it possible to write a function that takes three arguments ("current dictionary to be added to", "dict I want to add", "foldername-to-insert-under"), and if so, how? After calling the function with:
desiredfunction(mydict, newdict, "subsubfolder")
I would like the function to search my existing "mydict" for the appropriate "files" array to append "newdict".
mydict = {
"folder": "myfolder",
"files": [
{ "folder": "subfolder",
"files": []
},
{ "folder": "anotherfolder",
"files": [
{ "folder": "subsubfolder",
"files": [
{
"folder":"newfolder",
"files":[]
}
]
},
{ "folder": "subotherfolder",
"files": []
}
]
},
]
}
What is the best way to do this? I have not searched an existing dictionary structure before, across multiple levels of nesting, to insert a new dict in the right place, and I do not know how to do it or whether it is possible. Help is appreciated.
If you know the shape (amount of nested levels, types, etc.) of the data structure, you can just write a procedure containing the proper amount of for loops to iterate over the data structure and find the key.
If you don't know the shape of the data structure, you can recursively traverse the data structure until it's empty or the appropriate key has been found.
EDIT: Oh, you want to add a new dict inside the old dict. I thought you wanted to add something from the old dict to the new one.
For example, if you want to insert 'something' under the key 'z' in a dict, and you know what the dict is going to look like at all times:
example = {0: {'a': 'dummy'}, 1: {'z': 'dummy'}}
for thing1 in example:
    # iterate over the keys of the inner dict, not over the outer key itself
    for thing2 in example[thing1]:
        if thing2 == 'z':
            example[thing1][thing2] = 'something'
If you don't know what the dict is going to look like, or if you don't want to hardcode the for loops:
example = {0: {'a': 'dummy'}, 1: {'b': {'z': 'dummy'}}}
replacement = 'something'
path = [1, 'b', 'z']
parent = example
while path:
    key = path.pop(0)
    if path:
        # not at the last key yet: descend one level
        parent = parent[key]
    else:
        # last key: assign the replacement here
        parent[key] = replacement
print(example)
where path can be passed as an argument to a recursive call of a recursive procedure that traverses the dict.
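For the folder structure in the question, the recursive traversal could look roughly like the sketch below. The function name insert_under is hypothetical (the question calls it desiredfunction), and the sketch assumes every node is a dict with a "folder" name and a "files" list, and that inserting under the first match found depth-first is acceptable.

```python
def insert_under(tree, new_entry, folder_name):
    """Append `new_entry` to the "files" list of the first folder named
    `folder_name`, searching depth-first. Returns True if inserted."""
    if tree.get("folder") == folder_name:
        tree.setdefault("files", []).append(new_entry)
        return True
    for child in tree.get("files", []):
        # Only dict entries can be folders; skip anything else.
        if isinstance(child, dict) and insert_under(child, new_entry, folder_name):
            return True
    return False


mydict = {
    "folder": "myfolder",
    "files": [
        {"folder": "subfolder", "files": []},
        {
            "folder": "anotherfolder",
            "files": [
                {"folder": "subsubfolder", "files": []},
                {"folder": "subotherfolder", "files": []},
            ],
        },
    ],
}
newdict = {"folder": "newfolder", "files": []}

inserted = insert_under(mydict, newdict, "subsubfolder")
print(inserted, mydict["files"][1]["files"][0])
```

The boolean return value lets the caller detect the case where no folder with the given name exists.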
