Parsing through nested JSON keys - python

I have a JSON file that looks like this:
data = {
    "x": {
        "y": {
            "key": {},
            "w": {}
        }
    }
}
I have converted it into a dict in Python so that I can then search it for keys, using the following code:
entry = input("Search JSON for the following: ")  # search for "key"
if entry in data:
    print(entry)
else:
    print("Not found.")
However, even when I input "key" as entry, it still prints "Not found." Do I need to control the depth of the search in data? What if I don't know the location of "key" but still want to search for it?

Your method does not work because "key" is not a top-level key of data. data has exactly one key, "x", and the in operator only checks the top level. So check whether the key is in the current dictionary; if it is not, pass each nested dictionary back to the same function recursively. This will find the first matching key:
data = {
    "x": {
        "y": {
            "key": "some value",
            "w": {}
        }
    }
}
key = "key"

def findValue(key, d):
    if key in d:
        return d[key]
    for v in d.values():
        if isinstance(v, dict):
            found = findValue(key, v)
            if found is not None:
                return found

findValue(key, data)
# 'some value'
It returns None if the key is not found, so a key whose stored value is None cannot be told apart from a missing key.
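The function above only descends into dictionaries, while parsed JSON often contains lists as well. Here is a minimal sketch of a variant that also walks lists and uses a sentinel object so that a stored None is not mistaken for "not found". The names find_value, _find and _MISSING are illustrative and not part of the answer above:

```python
_MISSING = object()  # hypothetical sentinel meaning "key not found"


def find_value(key, node, default=None):
    """Return the first value stored under `key` anywhere in `node`."""
    found = _find(key, node)
    return default if found is _MISSING else found


def _find(key, node):
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return _MISSING  # scalars have nothing to search
    for child in children:
        found = _find(key, child)
        if found is not _MISSING:
            return found
    return _MISSING


data = {"x": [{"y": {"key": "some value", "w": {}}}]}
print(find_value("key", data))            # some value
print(find_value("absent", data, "n/a"))  # n/a
```

The search is depth-first, so "first" means first in that traversal order, not the shallowest match.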

Here's an approach that collects all matching values from a nested dict when the key is repeated at different levels of nesting. It's very similar to the above answer, just wrapped in an outer function whose list is shared with the nested helper to hold the results:
def foo(mydict, mykey):
    result = []
    num_recursive_calls = 0

    def explore(mydict, mykey):
        # nonlocal result  # not needed: the list is mutated, never rebound.
        # nonlocal is needed for the call counter, which += rebinds:
        nonlocal num_recursive_calls
        num_recursive_calls += 1
        for key in mydict.keys():  # get all keys from that level of nesting
            if mykey == key:
                print(f"Found {key}")
                result.append({key: mydict[key]})
            elif isinstance(mydict.get(key), dict):
                print(f"Found nested dict under {key}, exploring")
                explore(mydict[key], mykey)

    explore(mydict, mykey)
    print(f"explore called {num_recursive_calls} times")  # see above
    return result
For example, with
data = {'x': {'y': {'key': {}, 'w': {}}}, 'key': 'duplicate'}
Calling foo(data, 'key') will return:
[{'key': {}}, {'key': 'duplicate'}]

Related

How can I get RecursiveSearch() to return the modified data dictionary?

Disclaimer: I've been at this for about a week, and it's entirely possible that I've come up with the solution, but I missed it in my troubleshooting. Also, the INI files can be over 200 lines long and 10 deep with combinations of dictionaries and lists.
Situation: I maintain a couple dozen applications, and each application has a JSON formatted INI file that tracks certain system settings. On my computer, I aggregated all those INI files into a single file and then collapsed the structure. That collapsed structure is the unique keys from all those INI files, followed by all the possible values that each key has, and then what I may want each value to be replaced with (see examples below).
Goal: When I need to make configuration changes in those applications, I want to instead make the value changes in my JSON file and then use a Python script to replace the matching key-value pairs in all those other system files.
Simplifications:
I understand opening and writing the files, my problem is the parsing.
The recursion will always end with a key-value pair where the type(value) is str.
Sample INI file from one of those applications
{
"Version": "3.24.2",
"Package": [
{
"ID": "42",
"Display": "4",
"Driver": "E10A"
}, {
"ID": "50",
"Display": "1",
"Driver": "E12A"
}
]
}
My change file
Example use: If I want to replace all instances of {"Display":"1"} with {"Display":"10"}, then all I have to do is put a 10 between the double quotes below ... {"Display": {"1": ""}} to {"Display": {"1": "10"}}
{
    "Version": {
        "3.24.2": "",
        "42.1": "",
        "2022-10-1": ""
    },
    "ID": {
        "42": "",
        "50": ""
    },
    "Display": {
        "1": "",
        "4": ""
    },
    "Driver": {
        "01152003.1": "",
        "E10A": "",
        "E12A": ""
    }
}
Attempt 1
I read that Python assigns values like a C *pointer, but that was not my experience with this attempt. There are no errors, and the data variable never changed.
def RecursiveSearch(val, key=None):
    if isinstance(val, dict):
        for k, v in val.items():
            RecursiveSearch(v, k)
    elif isinstance(val, list):
        for v in val:
            RecursiveSearch(v, key)
    elif isinstance(val, str):
        # Is the key being tracked in my change file
        if key in ChangeFile:
            # Is that key's value being tracked in my change file
            if val in ChangeFile[key].keys():
                # Find the matching key-value and apply the replacement value
                for k, v in ChangeFile[key].items():
                    # Only replace the value if it has something to replace it with
                    if k == val and v != "":
                        key[val] = v

data = open('config.ini', 'w', encoding='UTF-8', errors='ignore')
data = convertJSON(data)
ChangeFile = open('change.json', 'r', encoding='UTF-8', errors='ignore')
ChangeFile = convertJSON(data)
data = RecursiveSearch(val=data, key=None)
print(data)
Attempt 2
Same code but with return values. In this attempt the data is completely replaced with the last key-value pair the recursion looked at.
def RecursiveSearch(val, key=None):
    if isinstance(val, dict):
        for k, v in val.items():
            tmp = RecursiveSearch(v, k)
            if tmp != {k: v}:
                return tmp
        return val
    elif isinstance(val, list):
        for v in val:
            tmp = RecursiveSearch(v, key)
            if v != tmp:
                return tmp
        return val
    elif isinstance(val, str):
        # Is the key being tracked in my change file
        if key in ChangeFile:
            # Is that key's value being tracked in my change file
            if val in ChangeFile[key].keys():
                # Find the matching key-value and apply the replacement value
                for k, v in ChangeFile[key].items():
                    # Only replace the value if it has something to replace it with
                    if k == val and v != "":
                        return v
                    else: return val
                else: return val
            else: return val
        else: return val
    # Return edited data after the recursion uncoils
    return {key: val}

data = open('config.ini', 'w', encoding='UTF-8', errors='ignore')
data = convertJSON(data)
ChangeFile = open('change.json', 'r', encoding='UTF-8', errors='ignore')
ChangeFile = convertJSON(data)
data = RecursiveSearch(val=data, key=None)
print(data)
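Python strings are immutable, and rebinding a local name inside a function does not change the container that held the old value, which is one reason an in-place attempt can leave data untouched. One possible alternative, sketched here under the assumption that the change file has the shape shown above, is to build and return a new structure at every level. The name apply_changes and the sample values are illustrative only:

```python
def apply_changes(node, changes, key=None):
    """Return a copy of `node` with tracked string values replaced."""
    if isinstance(node, dict):
        return {k: apply_changes(v, changes, k) for k, v in node.items()}
    if isinstance(node, list):
        # List items inherit the key of the list that holds them.
        return [apply_changes(item, changes, key) for item in node]
    if isinstance(node, str):
        # An empty replacement means "leave this value alone".
        replacement = changes.get(key, {}).get(node, "")
        return replacement if replacement != "" else node
    return node


config = {
    "Version": "3.24.2",
    "Package": [
        {"ID": "42", "Display": "4", "Driver": "E10A"},
        {"ID": "50", "Display": "1", "Driver": "E12A"},
    ],
}
changes = {"Display": {"1": "10", "4": ""}, "ID": {"42": "", "50": ""}}

updated = apply_changes(config, changes)
print(updated["Package"][1]["Display"])  # 10
```

Because every level returns a fresh dict or list, the original config is left unchanged and the caller decides what to write back to disk.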

How can I get a value of a specific json key only on a certain level?

I have a huge nested JSON file and I want to get the values of "text", but only at a certain level, as there are many "text" keys deeper in the file. The level I mean is the "text":"Hi" that follows "event":"user".
The file looks like this:
{
    "_id": {
        "$oid": "123"
    },
    "events": [
        {
            "event": "action",
            "metadata": {
                "model_id": "12"
            },
            "action_text": null,
            "hide_rule_turn": false
        },
        {
            "event": "user",
            "text": "Hi",
            "parse_data": {
                "intent": {
                    "name": "greet",
                    "confidence": {
                        "$numberDouble": "0.9601748585700989"
                    }
                },
                "entities": [],
                "text": "Hi",
                "metadata": {},
                "text_tokens": [
                    [
                        {
                            "$numberInt": "0"
                        },
                        {
                            "$numberInt": "2"
                        }
                    ]
                ],
                "selector": {
                    "ideas": {
                        "response": {
                            "responses": [
                                {
                                    "text": "yeah"
                                },
                                {
                                    "text": "No"
                                },
                                {
                                    "text": "Goo"
                                }
                            ]
                        },
First I used this function to get the text data, but of course it gave me all of them:
def json_extract(obj, key):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
I also tried to access only the second level with this loop, but it gave me a KeyNotFound error:
for i in data["events"][0]:
    print(i["text"])
Maybe because that key is not in every nested list? I really don't know what else I could do.
Since events is a list, you can write a list comprehension (if you need multiple items), or you can use the next function to get the first matching element from an iterator:
event = next(e for e in data.get('events', list()) if e.get('event')=='user')
print(event.get('text', ''))
Using the get method means no exception is raised when a key is missing from a dictionary. Note that next itself raises StopIteration if no event matches.
Edit:
If you need this for all events:
all_events = [e for e in data.get('events', list()) if e.get('event')=='user']
for event in all_events:
    print(event.get('text', ''))
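The single-event lookup can also be written so that it never raises when there is no "user" event, by passing a default as the second argument to next. A small sketch; the helper name first_user_text and the sample data are made up for illustration:

```python
data = {
    "events": [
        {"event": "action", "action_text": None},
        {"event": "user", "text": "Hi", "parse_data": {"text": "Hi"}},
    ]
}


def first_user_text(doc, default=""):
    """Text of the first top-level event whose "event" field is "user"."""
    events = doc.get("events", [])
    # The second argument to next() is returned when nothing matches.
    match = next((e for e in events if e.get("event") == "user"), None)
    return default if match is None else match.get("text", default)


print(first_user_text(data))  # Hi
```

Only the dictionaries directly inside events are inspected, so the deeper "text" keys under parse_data are never reached.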
Convert your JSON to a Python dictionary (e.g., with json.load or json.loads, depending on how you're accessing the JSON). Then just pass a reference to the dictionary to this:
def json_extract(jdata):
    assert isinstance(jdata, dict)
    arr = []

    def _extract(d, arr):
        if 'event' in d and (t := d.get('text')):
            arr.append(t)
        for k, v in d.items():
            if k not in {'event', 'text'}:
                if isinstance(v, list):
                    for e in v:
                        if isinstance(e, dict):
                            _extract(e, arr)
                elif isinstance(v, dict):
                    _extract(v, arr)
        return arr

    return _extract(jdata, arr)
This returns a list of all non-empty values associated with the key 'text', provided that key is found in a dictionary that also has an 'event' key. The := operator requires Python 3.8 or later.

Drift management of JSON configurations by comparing with dictionary data

I am trying to write Python code for drift management that compares an application's configuration in JSON with a predefined dictionary of key-value pairs.
Ex: Application configuration in JSON:
{
    "location": "us-east-1",
    "properties": [
        {
            "type": "t2.large",
            "os": "Linux"
        }
    ],
    "sgs": {
        "sgid": "x-1234"
    }
}
Ex: Dictionary with desired values to compare:
{
"os": "Windows",
"location": "us-east-1"
}
Expected output:
Difference is:
{
"os": "Windows"
}
I have been trying to convert the entire JSON (including sub dicts) into a single dict without sub dicts, and then iterate over it with each value of the desired dict. I am able to print all the keys and values line by line, but couldn't convert them into a dict.
Is there a better way to do this? Or any references that could help me out here?
import json

def openJsonFile(file):
    with open(file) as json_data:
        workData = json.load(json_data)
    return workData

def recursive_iter(obj):
    if isinstance(obj, dict):
        for item in obj.items():
            yield from recursive_iter(item)
    elif any(isinstance(obj, t) for t in (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj

data = openJsonFile('file.json')
for item in recursive_iter(data):
    print(item)
Expected output:
{
    "location": "us-east-1",
    "type": "t2.large",
    "os": "Linux",
    "sgid": "x-1234"
}
I think this will do what you say you want. I used the dictionary flattening code in this answer with a small modification: I changed it to not concatenate the keys of parent dictionaries with those of the nested ones, since that seems to be what you want. This assumes that the keys used in all nested dictionaries are distinct from one another; if two nested dictionaries share a key, the later value silently overwrites the earlier one, which in my opinion is a weakness of your approach.
You asked for references that could help you: Searching this website for related questions is often a productive way to find solutions to your own problems. This is especially true when what you want to know is something that has probably been asked before (such as how to flatten nested dictionaries).
Also note that I have written the code to closely follow the PEP 8 - Style Guide for Python Code guidelines, which I strongly suggest you read (and start following as well).
import json

desired = {
    "os": "Windows",
    "location": "us-east-1"
}

def read_json_file(file):
    with open(file) as json_data:
        return json.load(json_data)

def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out

def check_drift(desired, config):
    drift = {}
    for key, value in desired.items():
        if config[key] != value:
            drift[key] = value
    return drift

if __name__ == '__main__':
    from pprint import pprint

    config = flatten(read_json_file('config.json'))
    print('Current configuration (flattened):')
    pprint(config, width=40, sort_dicts=False)
    drift = check_drift(desired, config)
    print()
    print('Drift:')
    pprint(drift, width=40, sort_dicts=False)
This is the output it produces:
Current configuration (flattened):
{'location': 'us-east-1',
'type': 't2.large',
'os': 'Linux',
'sgid': 'x-1234'}
Drift:
{'os': 'Windows'}
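If keys can repeat across nested dictionaries, one alternative is to keep the parent keys and flatten to dotted paths, so nothing gets overwritten. This is only a sketch under that assumption: the desired dictionary then has to be written with full paths, and the names flatten_paths and the "properties.0.os" path convention are choices made here, not part of the question. It also treats a missing path as drift instead of raising KeyError:

```python
def flatten_paths(node, prefix=""):
    """Flatten nested dicts/lists into {dotted_path: scalar}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)  # list positions become path segments
    else:
        return {prefix: node}
    out = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        out.update(flatten_paths(value, path))
    return out


def check_drift(desired, flat_config):
    """Desired entries whose path is missing or holds a different value."""
    return {path: want for path, want in desired.items()
            if flat_config.get(path) != want}


config = {
    "location": "us-east-1",
    "properties": [{"type": "t2.large", "os": "Linux"}],
    "sgs": {"sgid": "x-1234"},
}
desired = {"properties.0.os": "Windows", "location": "us-east-1"}

flat = flatten_paths(config)
print(check_drift(desired, flat))  # {'properties.0.os': 'Windows'}
```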

Python 3 dictionary find all elements and their key path

Given a dictionary like below, what I'd like to do is find all elements (keys+data) and their full root key path. If the data is a string it returns the data and the full key root. If the data is a dict, then it returns the first key inside that dict as the data and the full key root.
a = {
    "level1_key": {
        "level2_key": {
            "level3_key": {
                "status": "down"
            }
        }
    }
}
For example, the first key has no root, and its data is a dict, so return the current key as data and no parent keys.
Key = None
Data = level1_key
The data for level2_key is a dict, so return the current key and its parents.
Key = level1_key
Data = level2_key
Key = level1_key.level2_key
Data = level3_key
Key = level1_key.level2_key.level3_key
Data = status
The last key's data is a string, so return the string as data, and all its keys.
Key = level1_key.level2_key.level3_key.status
Data = down
Because there are 5 elements in the dict (4 keys and 1 string) I would end up with 5 tuples of key paths and data.
The reason behind this is that each element represents configuration, if say the "status" needs changing to "down" what actually needs changing is: level1_key.level2_key.level3_key.status
Here is a non-working example I wrote. It gets all the strings, their keys, and partial root paths, but it doesn't quite get all the root keys:
current_key = ""
key_list = []

def thing(data):
    global current_key
    global key_list
    if isinstance(data, dict):
        for key, value in data.items():
            current_key = key
            key_list.append(key)
            thing(value)
    elif isinstance(data, str):
        print(f"Key: {current_key}")
        print(f"Key List: {key_list}")
        print(f"Data: {data}")
        print("##########")
        key_list = []
Data is:
{
    "100": {
        "status": "down"
    },
    "200": {
        "status1": "up",
        "status2": "down",
        "status3": {
            "nested": "more_data"
        }
    }
}
Key: status
Key List: [100, 'status']
Data: down
##########
Key: status1
Key List: [200, 'status1']
Data: up
##########
Key: status2
Key List: ['status2']
Data: down
##########
Key: nested
Key List: ['status3', 'nested']
Data: more_data
##########
The last bit "more_data" is missing a root key of 200 for example.
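Root keys go missing in the attempt above because key_list is a global that is reset after every string, so parents such as 200 are lost for later siblings. One possible approach, sketched here assuming all keys are strings (as they are after json.load), is to pass the path down as an argument and yield the pairs. The name walk is illustrative:

```python
def walk(data, path=()):
    """Yield (parent_path, data) pairs; parent_path is None at the root."""
    for key, value in data.items():
        # Every key is reported as data under the path of its parents.
        yield (".".join(path) or None, key)
        if isinstance(value, dict):
            yield from walk(value, path + (key,))
        else:
            # A leaf value is reported under its full key path.
            yield (".".join(path + (key,)), value)


a = {"level1_key": {"level2_key": {"level3_key": {"status": "down"}}}}

for key_path, item in walk(a):
    print(f"Key = {key_path}")
    print(f"Data = {item}")
```

For the dictionary a this yields five pairs, ending with level1_key.level2_key.level3_key.status and down. Each recursive call gets its own path tuple, so no shared state needs resetting.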

pythonic way to check if my dict contains prototyped key hierarchy

I have a dict, let's say mydict.
I also know about this json, let's say myjson:
{
"actor":{
"name":"",
"type":"",
"mbox":""
},
"result":{
"completion":"",
"score":{ "scaled":"" },
"success":"",
"timestamp":""
},
"verb":{
"display":{
"en-US":""
},
"id":""
},
"context":{
"location":"",
"learner_id": "",
"session_id": ""
},
"object":{
"definition":{
"name":{
"en-US":""
}
},
"id":"",
"activity_type":""
}
}
I want to know if ALL of myjson's keys (with the same hierarchy) are in mydict. I don't care if mydict has more data in it (it can have more data). How do I do this in Python?
Make a dictionary of myjson:
import json
with open('myjson.json') as j:
    new_dict = json.loads(j.read())
Then go through each key of that dictionary, and confirm that the value of that key is the same in both dictionaries:
def compare_dicts(new_dict, mydict):
    for key in new_dict:
        if key in mydict and mydict[key] == new_dict[key]:
            continue
        else:
            return False
    return True
EDIT:
A little more complex, but something like this should suit your needs:
def compare(n, m):
for key in n:
if key in m:
if m[key] == n[key]:
continue
elif isinstance(n[key], dict) and isinstance(m[key],dict):
if compare(n[key], m[key]):
continue
else:
return False
else:
return False
return True
If you want every top-level key-value pair of myjson to appear in mydict, you can do this:
>>> all(v in mydict.items() for v in myjson.items())
True
This is True if every top-level key-value pair of myjson is also in mydict, even if mydict has other keys.
Edit: If you only care about the keys, use this:
>>> all(v in mydict.keys() for v in myjson.keys())
True
This returns True if every top-level key of myjson is in mydict, even if they point to different values. Nested keys are not checked.
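Since the question asks about keys with the same hierarchy and not about values, a recursive check that ignores leaf values may be closer to what is wanted. A minimal sketch; the function name has_key_hierarchy and the sample data are made up here:

```python
def has_key_hierarchy(template, candidate):
    """True if every key in `template` exists at the same depth in `candidate`."""
    if not isinstance(candidate, dict):
        return False
    for key, sub in template.items():
        if key not in candidate:
            return False
        # Only descend where the template itself nests further.
        if isinstance(sub, dict) and not has_key_hierarchy(sub, candidate[key]):
            return False
    return True


myjson = {"actor": {"name": "", "mbox": ""}, "verb": {"display": {"en-US": ""}}}
mydict = {
    "actor": {"name": "Ann", "mbox": "a@example.com", "extra": 1},
    "verb": {"display": {"en-US": "did"}, "id": "x"},
    "other": True,
}

print(has_key_hierarchy(myjson, mydict))                # True
print(has_key_hierarchy(myjson, {"actor": {"name": ""}}))  # False
```

Extra keys in mydict are ignored at every level, and leaf values in the template are never compared.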
