I want to get the value of a specific key in a nested json file, without knowing the exact location. So basically looking through all the keys (and nested keys) until it finds the match, and return a dictionary {match: "value"}
Nested json_data:
{
"$id": "1",
"DataChangedEntry": {
"$id": "2",
"PathProperty": "/",
"Metadata": null,
"PreviousValue": null,
"CurrentValue": {
"CosewicWsRefId": {
"Value": "QkNlrjq2HL9bhTQqU8-qH"
},
"Date": {
"Value": "2022-05-20T00:00:00Z"
},
"YearSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"DateSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"Order": null,
"Type": {
"Value": "REGULAR"
},
"ReportType": {
"Value": "NEW"
},
"Stage": {
"Value": "ASSESSED"
},
"State": {
"Value": "PUBLISHED"
},
"StatusAndCriteria": {
"Status": {
"Value": "EXTINCT"
},
"StatusComment": {
"EnglishText": null,
"FrenchText": null
},
"StatusChange": {
"Value": "NOT_INITIALIZED"
},
"StatusCriteria": {
"EnglishText": null,
"FrenchText": null
},
"ApplicabilityOfCriteria": {
"ApplicabilityCriteriaList": []
}
},
"Designation": null,
"Note": null,
"DomainEvents": [],
"Version": {
"Value": 1651756761385.1248
},
"Id": {
"Value": "3z3XlCkaXY9xinAbK5PrU"
},
"CreatedAt": {
"Value": 1651756761384
},
"ModifiedAt": {
"Value": 1651756785274
},
"CreatedBy": {
"Value": "G#a"
},
"ModifiedBy": {
"Value": "G#a"
}
}
},
"EventAction": "Create",
"EventDataChange": {
"$ref": "2"
},
"CorrelationId": "3z3XlCkaXY9xinAbK5PrU",
"EventId": "WGxlewsUAHayLHZ2LHvFk",
"EventTimeUtc": "2022-05-06T13:15:31.7463355Z",
"EventDataVersion": "1.0.0",
"EventType": "AssessmentCreatedInfrastructure"
}
Desired return is the value from json_data["DataChangedEntry"]["CurrentValue"]["Date"]["Value"]:
"2022-05-20T00:00:00Z"
So far I've tried a recursive function, but it keeps returning None:
match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            return recursive_json(v,attr,m_dict)
print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
RETURN: None
I tried removing the second return statement, and it now prints the value I want in the function, but still returns None:
match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            recursive_json(v,attr,m_dict)
print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
IF: {'Date', '2022-05-20T00:00:00Z'}
RETURN: None
I don't get why it keeps returning None. Is there a better way to return the value I want?
The underlying question is: how can we make multiple recursive calls in a loop, return the recursive result if any of them returns something useful, and fail otherwise?
If we blindly return inside the loop, then only one recursive call can be made. Whatever it returns, gets returned at this level. If it didn't find the useful result, we don't get a useful result.
If we blindly don't return inside the loop, then the values that were returned don't matter. Nothing in the current call makes use of them, so we will finish looping, make all the recursive calls, reach the end of the function... and thus implicitly return None.
The way around this, of course, is to check whether the recursive call returned something useful. If it did, we can return that; otherwise, we keep going. If we reach the end, then we signal that we couldn't find anything useful - that way, if we are being recursively called, the caller can do the right thing.
Assuming that None cannot be a "useful" value, we can naturally use that as the signal. We don't even have to return it explicitly at the end.
After fixing some other typos (we should not overwrite the global built-in dict name, and anyway we don't need to name the dict that we pass in at the start, and the parameter should be m_dict so that it's properly defined when we make the recursive call), we get:
def recursive_json(data, attr, m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ', m_dict)
                return m_dict
        elif isinstance(v,dict):
            result = recursive_json(v, attr, m_dict)
            if result:
                return result
# call it:
recursive_json(json_data, "Date", {})
We can see that the debug trace is printed, and the value is also returned.
Let's improve this a bit:
First off, the inner for k2,v2 in v.items(): loop doesn't make any sense. Again, we can only return once per call, so this would skip any values in the dict after the first. We would be better served just returning v directly. Also, the m_dict parameter doesn't actually help implement the logic; we don't modify it between calls. It doesn't make sense to use a set for our return value, since it's fundamentally unordered; we care about the order here. Finally, we don't need the debug trace any more. That gives us:
def recursive_json(data, attr):
    for k, v in data.items():
        if k == attr:
            return attr, v
        elif isinstance(v,dict):
            result = recursive_json(v, attr)
            if result:
                return result
To get fancier, we can separate the base case from the recursive case, and use more elegant tools for each. To check if any of the keys matches, we can simply check with the in operator. To recurse and return the first fruitful result, the built-in next is useful. We get:
def recursive_json(data, attr):
    if not isinstance(data, dict):
        # reached a leaf, can't search in here.
        return None
    if attr in data:
        # k is not defined in this version, so use attr itself.
        return attr, data[attr]
    candidates = (recursive_json(v, attr) for v in data.values())
    try:
        # the first non-None candidate, if any.
        return next(c for c in candidates if c is not None)
    except StopIteration:
        return None # all candidates were None.
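The versions above rely on None never being a "useful" value. In the question's data some keys (for example "Order") do map to null, so here is a minimal sketch of the same search using a private sentinel instead of None. The names `_MISSING`, `find_key` and `_search` are made up for this illustration, and the function returns the bare value rather than an (attr, value) pair.

```python
_MISSING = object()  # hypothetical sentinel meaning "key not found anywhere"

def _search(data, attr):
    """Depth-first search; returns the value for attr, or _MISSING."""
    if not isinstance(data, dict):
        return _MISSING  # reached a leaf, nothing to search here
    if attr in data:
        return data[attr]
    for value in data.values():
        found = _search(value, attr)
        if found is not _MISSING:
            return found  # propagate the first real hit upwards
    return _MISSING

def find_key(data, attr, default=None):
    """Return the value of the first attr key found, else default."""
    result = _search(data, attr)
    return default if result is _MISSING else result

sample = {"a": {"Order": None, "Date": {"Value": "2022-05-20T00:00:00Z"}}}
print(find_key(sample, "Date"))             # {'Value': '2022-05-20T00:00:00Z'}
print(find_key(sample, "Order", "absent"))  # None (found, and the value is None)
print(find_key(sample, "Nope", "absent"))   # absent
```

Because the sentinel is a fresh object compared with `is`, no value that can appear in parsed JSON will ever be mistaken for "not found".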
It seems like you're trying to write something like this:
from json import loads
from typing import Any, Tuple
test_json = """
{
    "a": {
        "b": {
            "value": 1
        }
    },
    "b": {
        "value": 2
    },
    "c": {
        "b": {
            "value": 3
        },
        "c": {
            "value": 4
        }
    },
    "d": {}
}
"""
json_data = loads(test_json)
def find_value(data: dict, attr: str, depth_first: bool=True) -> Tuple[bool, Any]:
    # assumes data is a dict, with 'value' attributes for the attr to be found
    # returns [whether value was found]: bool, [actual value]: Any
    for k, v in data.items():
        if k == attr and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            if (t := find_value(v, attr, depth_first))[0]:
                return t
    if not depth_first:
        for _, v in data.items():
            if isinstance(v, dict) and (t := find_value(v, attr, depth_first))[0]:
                return t
    return False, None
# returns True, 1 - first 'b' with a 'value', depth-first
print(find_value(json_data, 'b'))
# returns True, 2 - first 'b' with a 'value', breadth-first
print(find_value(json_data, 'b', False))
# returns True, 4 - first 'c' with a 'value' - the 'c' at the root level has no 'value'
print(find_value(json_data, 'c'))
# returns False, None - no 'd' with a value
print(find_value(json_data, 'd'))
# returns False, None - no 'e' in data
print(find_value(json_data, 'e'))
Your own function can return None because you don't actually return the value a recursive call would return. And the default return value for a function is None.
However, your code also doesn't account for the case where there is nothing to be found.
(Note: this solution only works in Python 3.8 or later, due to its use of the walrus operator :=. Of course it's not that hard to write it without, but that's left as an exercise for the reader.)
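For readers on Python older than 3.8, one possible way to write the same search without the walrus operator is sketched below. It is a rewrite for illustration, not the answerer's code: the temporary name `found` and the extra `isinstance(v, dict)` guard on the match are my additions, and `sample` is a made-up copy of the test data.

```python
from typing import Any, Tuple

def find_value(data: dict, attr: str, depth_first: bool = True) -> Tuple[bool, Any]:
    """Return (True, value) for the first attr that has a 'value', else (False, None)."""
    for k, v in data.items():
        if k == attr and isinstance(v, dict) and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            found = find_value(v, attr, depth_first)  # plain assignment, no :=
            if found[0]:
                return found
    if not depth_first:
        # Only descend after every key at this level has been checked.
        for v in data.values():
            if isinstance(v, dict):
                found = find_value(v, attr, depth_first)
                if found[0]:
                    return found
    return False, None

sample = {
    "a": {"b": {"value": 1}},
    "b": {"value": 2},
    "c": {"b": {"value": 3}, "c": {"value": 4}},
    "d": {},
}
print(find_value(sample, 'b'))         # (True, 1)
print(find_value(sample, 'b', False))  # (True, 2)
print(find_value(sample, 'c'))         # (True, 4)
print(find_value(sample, 'e'))         # (False, None)
```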
If I wanted to use an array to get a value from a dictionary, I would do something like this:
def get_dict_with_arr(d, arr):
    accumulator = d
    for elem in arr:
        accumulator = accumulator[elem]
    return accumulator
and use it like this:
test_dict = {
'this': {
'is': {
'it': 'test'
}
}
}
get_dict_with_arr(test_dict, ['this', 'is', 'it']) # returns 'test'
My question is, how may I write a function that sets the value instead of getting it? Basically I want to write a set_dict_with_arr(d, arr, value) function.
Try:
def set_dict_with_arr(d, arr, value):
    cur_d = d
    for v in arr[:-1]:
        cur_d.setdefault(v, {})
        cur_d = cur_d[v]
    cur_d[arr[-1]] = value
    return d
test_dict = {"this": {"is": {"it": "test"}}}
test_dict = set_dict_with_arr(test_dict, ["this", "is", "it"], "new value")
print(test_dict)
Prints:
{'this': {'is': {'it': 'new value'}}}
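Since `dict.setdefault` returns the value stored under the key, the walk down the path can also be folded into a single `functools.reduce` call. The sketch below is only an alternative spelling of the same idea; `set_by_path` is a name invented here, and it assumes the path is non-empty and that every intermediate value is a dict.

```python
from functools import reduce

def set_by_path(d, path, value):
    """Set d[path[0]][path[1]]...[path[-1]] = value, creating dicts as needed."""
    *parents, last = path  # raises ValueError if path is empty
    # setdefault returns the existing child dict, or the new {} it just inserted.
    target = reduce(lambda cur, key: cur.setdefault(key, {}), parents, d)
    target[last] = value
    return d

print(set_by_path({"this": {"is": {"it": "test"}}}, ["this", "is", "it"], "new value"))
# {'this': {'is': {'it': 'new value'}}}
print(set_by_path({}, ["a", "b", "c"], 1))
# {'a': {'b': {'c': 1}}}
```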
I've searched and found "Append a dictionary to a dictionary", but that approach clobbers keys from b if they exist in a.
I'd like to essentially recursively append 1 dictionary to another, where:
keys are unique (obviously, it's a dictionary), but each dictionary is fully represented in the result such that a.keys() and b.keys() are both subsets of c.keys()
if the same key is in both dictionaries, the resulting key contains a list of values from both, such that a[key] and b[key] are in c[key]
the values could be another dictionary, (but nothing deeper than 1 level), in which case the same logic should apply (append values) such that a[key1][key2] and b[key1][key2] are in c[key][key2]
The basic example is where the two dictionaries have keys that don't overlap, and I can accomplish that in multiple ways (c = {**a, **b}, for example), so I haven't covered that below.
A trickier case:
a = {
    "key1": "value_a1",
    "key2": "value_a2"
}
b = {
    "key1": "value_b1",
    "key3": "value_b3"
}
c = combine(a, b)
c >> {
    "key1": ["value_a1", "value_b1"],
    "key2": "value_a2",
    "key3": "value_b3"
}
An even trickier case
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
c >> {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2", "sub_value_b1"], #sub_value_a1 is not duplicated
"sub_key_2": ["sub_value_a3", "sub_value_b3"]
},
"key2": "value_a2",
"key3": "value_b3" # ["value_b3"] this would be okay, following from the code comment above
}
Caveats:
Python 3.6
The examples show lists being created as needed, but I'm okay with every non-dict value being a list, as mentioned in the code comments
The values within the lists will always be strings
I tried to explain as best I could but can elaborate more if needed. Been working on this for a few days and keep getting stuck on the sub key part
There is no simple built-in way of doing this, but you can recreate the logic in Python.
from typing import Union

def combine_lists(a: list, b: list) -> list:
    # keep a's order, then append the items of b that a does not already have
    return a + [i for i in b if i not in a]

def combine_strs(a: str, b: str) -> Union[str, list]:
    # equal strings stay a single string; different strings become a list
    if a == b:
        return a
    return [a, b]

class EMPTY:
    "A sentinel representing an empty value."

def combine_dicts(a: dict, b: dict) -> dict:
    output = {}
    keys = list(a) + [k for k in b if k not in a]
    for key in keys:
        aval = a.get(key, EMPTY)
        bval = b.get(key, EMPTY)
        if isinstance(aval, list) and isinstance(bval, list):
            output[key] = combine_lists(aval, bval)
        elif isinstance(aval, str) and isinstance(bval, str):
            output[key] = combine_strs(aval, bval)
        elif isinstance(aval, dict) and isinstance(bval, dict):
            output[key] = combine_dicts(aval, bval)
        elif bval is EMPTY:
            output[key] = aval
        elif aval is EMPTY:
            output[key] = bval
        else:
            raise RuntimeError(
                f"Cannot combine types: {type(aval)} and {type(bval)}"
            )
    return output
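The question says it would be acceptable for every non-dict value to end up as a list. Under that assumption, a simpler variant is possible that also copes with a string on one side and a list on the other, which the version above rejects with a RuntimeError. This is only a sketch; `as_list` and `combine_as_lists` are names made up for the illustration.

```python
def as_list(value):
    """Wrap a non-list value in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]

def combine_as_lists(a, b):
    """Merge two dicts; every non-dict value becomes a de-duplicated list."""
    out = {}
    for key in list(a) + [k for k in b if k not in a]:
        values = [d[key] for d in (a, b) if key in d]
        if all(isinstance(v, dict) for v in values):
            merged = {}
            for v in values:  # recurse so nested values are normalised too
                merged = combine_as_lists(merged, v)
            out[key] = merged
        elif any(isinstance(v, dict) for v in values):
            raise TypeError(f"Cannot combine a dict with a non-dict at key {key!r}")
        else:
            merged = []
            for v in values:
                for item in as_list(v):
                    if item not in merged:  # keep first occurrence, preserve order
                        merged.append(item)
            out[key] = merged
    return out

a = {"key1": {"sub_key_1": ["sub_value_a1", "sub_value_a2"], "sub_key_2": "sub_value_a3"},
     "key2": "value_a2"}
b = {"key1": {"sub_key_1": ["sub_value_a1", "sub_value_b1"], "sub_key_2": "sub_value_b3"},
     "key3": "value_b3"}
print(combine_as_lists(a, b))
# {'key1': {'sub_key_1': ['sub_value_a1', 'sub_value_a2', 'sub_value_b1'],
#           'sub_key_2': ['sub_value_a3', 'sub_value_b3']},
#  'key2': ['value_a2'], 'key3': ['value_b3']}
```

Neither input dict is modified, and the f-string keeps the sketch within the question's Python 3.6 constraint.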
Sounds like you want a specialised version of dict. So, you could subclass it to give you the behaviour you want. Being a bit of a Python noob, I started with the answer here : Subclassing Python dictionary to override __setitem__
Then I added the behaviour in your couple of examples.
I also added a MultiValue class which is a subclass of list. This makes it easy to tell if a value in the dict already has multiple values. Also it removes duplicates, as it looks like you don't want them.
class MultiValue(list):
    # Class to hold multiple values for a dictionary key. Prevents duplicates.
    def append(self, value):
        if isinstance(value, MultiValue):
            for v in value:
                if v not in self:
                    super(MultiValue, self).append(v)
        else:
            super(MultiValue, self).append(value)

class MultiValueDict(dict):
    # dict which converts a key's value to a MultiValue when the key already exists.
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def __setitem__(self, key, value):
        # optional processing here
        if key in self:
            existing_value = self[key]
            if isinstance(existing_value, MultiValueDict) and isinstance(value, dict):
                existing_value.update(value)
                return
            if isinstance(existing_value, MultiValue):
                existing_value.append(value)
                value = existing_value
            else:
                value = MultiValue([existing_value, value])
        super(MultiValueDict, self).__setitem__(key, value)
    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, "
                                "got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]
    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]
Example 1:
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
# combine by creating a MultiValueDict then using update to add b to it.
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': ['value_a1', 'value_b1'], 'key2': 'value_a2', 'key3': 'value_b3'}
Example 2: The value for key1 is created as a MultiValueDict and the value for sub_key_1 is a MultiValue, so this may not fit what you're trying to do. It depends on how you're building your data set.
a = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_a2"]),
"sub_key_2": "sub_value_a3"
}),
"key2": "value_a2"
}
b = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_b1"]),
"sub_key_2": "sub_value_b3"
}),
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': {'sub_key_1': ['sub_value_a1', 'sub_value_a2', 'sub_value_b1'], 'sub_key_2': ['sub_value_a3', 'sub_value_b3']}, 'key2': 'value_a2', 'key3': 'value_b3'}
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
def appendValues(ax,cx):
    if type(ax)==list: # is the key's value in a a list?
        cx.extend(ax) # if it is a list then extend
    else: # the key's value in a is not a list
        cx.append(ax) # so use append
    cx=list(set(cx)) # make values unique with set (order is not preserved)
    return cx
def combine(a,b):
    c={}
    for x in b: # first copy b's keys and values to c
        c[x]=b[x]
    for x in a: # now combine a with c
        if not x in c: # this key is not in c
            c[x]=a[x] # so add it
        else: # key exists in c
            if type(c[x])==list: # is the key's value in c a list?
                c[x]=appendValues(a[x],c[x])
            elif type(c[x])==dict: # is the key's value in c a dictionary?
                c[x]=combine(c[x],a[x]) # combine dictionaries
            else: # so the key's value is not a list or dict
                c[x]=[c[x]] # make the value a list
                c[x]=appendValues(a[x],c[x])
    return c
c = combine(a, b)
print(c)
print("==========================")
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
print(c)
I am trying to get a JSON sub-schema's "name" based on its contents. This is kind of hard to explain, so an example would be better:
{
"dummy_name_1": {
"dummy_key_1": "unique_dummy_value_1",
"dummy_key_2": "dummy_value_2"
},
"dummy_name_2": {
"dummy_key_1": "unique_dummy_value_2",
"dummy_key_2": "dummy_value_2"
}
}
I want to get the name of dummy_name_1 (which would be "dummy_name_1") given the value of the key "dummy_key_1" (which would be "unique_dummy_value_1"). Basically, if I give the Python function I want "dummy_key_1" and "unique_dummy_value_1" as parameters, I want it to return the string "dummy_name_1".
Something like this? structure being your dict.
def get_dummy_name(dummy_key, dummy_value):
    for dummy_name, content in structure.items():
        if dummy_key in content.keys() and content[dummy_key] == dummy_value:
            return dummy_name
Try this:
def get_category_name(key_name, key_value):
    dictionary = {
        "dummy_name_1": {
            "dummy_key_1": "unique_dummy_value_1",
            "dummy_key_2": "dummy_value_2"
        },
        "dummy_name_2": {
            "dummy_key_1": "unique_dummy_value_2",
            "dummy_key_2": "dummy_value_2"
        }
    }
    for elem in dictionary.items():
        if key_name in elem[1] and elem[1][key_name] == key_value:
            return elem[0]
    return False
response = get_category_name('dummy_key_1', 'unique_dummy_value_1')
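Both answers read the dictionary from inside or around the function. A small variation, sketched here under the assumption that the data is passed in as an argument, uses the built-in `next` with a default so that "not found" comes back as None. The name `find_name` is hypothetical.

```python
def find_name(structure, key, value):
    """Return the first top-level name whose sub-dict has structure[name][key] == value."""
    return next(
        (name for name, content in structure.items()
         if isinstance(content, dict) and key in content and content[key] == value),
        None,  # default when no sub-dict matches
    )

structure = {
    "dummy_name_1": {"dummy_key_1": "unique_dummy_value_1", "dummy_key_2": "dummy_value_2"},
    "dummy_name_2": {"dummy_key_1": "unique_dummy_value_2", "dummy_key_2": "dummy_value_2"},
}
print(find_name(structure, "dummy_key_1", "unique_dummy_value_1"))  # dummy_name_1
print(find_name(structure, "dummy_key_1", "no_such_value"))         # None
```

If the value is not unique (as with "dummy_key_2" here), only the first matching name in insertion order is returned.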
What is the easiest way to tell whether the key div exists or not?
di = {
'resp': {
u'frame': {
'html': {
'div': [
u'test1'
]
}
}
}
}
di.get("div","Not found") # prints not found
You need to make a function that recursively checks the nested dictionary.
def exists(d, key):
    return isinstance(d, dict) and \
        (key in d or any(exists(d[k], key) for k in d))
Example:
>>> di = {
... 'resp': {
... u'frame': {
... 'html': {
... 'div': [
... u'test1'
... ]
... }
... }
... }
... }
>>>
>>> exists(di, 'div')
True
>>> exists(di, 'html')
True
>>> exists(di, 'body') # Not exist
False
>>> exists(di, 'test1') # Not a dictionary key.
False
In this precise case, you could use
if 'div' in di['resp'][u'frame']['html']:
More generally, if you don't know (or care) where 'div' is within di, you will need a function to search through the various sub-dictionaries.
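As a rough sketch of such a search function, the version below returns the list of keys leading to the first match instead of just True or False, which also tells you where 'div' lives. The name `find_key_path` and its `path` parameter are made up for this example, and it only descends into dicts, not into lists.

```python
def find_key_path(d, key, path=()):
    """Return the list of keys leading to the first occurrence of key, or None."""
    if not isinstance(d, dict):
        return None  # leaves (strings, lists, numbers) cannot contain keys
    if key in d:
        return list(path) + [key]
    for k, v in d.items():
        found = find_key_path(v, key, path + (k,))
        if found is not None:
            return found
    return None

di = {'resp': {'frame': {'html': {'div': ['test1']}}}}
print(find_key_path(di, 'div'))    # ['resp', 'frame', 'html', 'div']
print(find_key_path(di, 'body'))   # None
print(find_key_path(di, 'test1'))  # None (a list item, not a dictionary key)
```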
You must do a deep search for it.
def rec_search(d):
    for key in d.keys():
        if key == 'div': return True
    for value in d.values():
        if isinstance(value, dict) and rec_search(value): return True
    return False
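First, flatten the dictionary into key paths:
def flatten_dict(d):
    for k, v in d.items():
        if isinstance(v, dict) and v:
            for item in flatten_dict(v):
                yield [k] + item
        else:
            # yield the key itself; yielding v would drop the innermost key
            yield [k]
Now check membership in the yielded key paths, for example any('div' in path for path in flatten_dict(di)). Note this will not tell you how many instances of div there are, just that at least one is present.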
Here is an attempt using a regex, which is not really the right way to solve your problem, but it is fast.
#!/usr/bin/python
di = {
'resp': {
'frame': {
'html': {
'div': [
'test1'
]
}
}
}
}
import re
def check(k):
    key = di.keys()
    string = str(di.values())
    if k in key:
        return True
    try:
        # look for the key right after an opening brace in the repr of the values
        m = re.findall('({[\"\']%s[\"\'])' % k, string)[0]
        if m and re.match('{', m):
            return True
        else:
            return False
    except IndexError:
        # findall returned no matches
        return False
for i in ['resp', 'abc', 'frame', 'div', 'yopy', 'python', 'test1']:
    print(i, check(i))
Output:
resp True
abc False
frame True
div True
yopy False
python False
test1 False