I'm trying to print my JSON content. I know how to print just the keys and values, but I also want access to the objects nested within the keys. This is my code:
json_mini = json.loads('{"one" : {"testing" : 39, "this": 17}, "two" : "2", "three" : "3"}')
for index, value in json_mini.items():
    print index, value
    if value.items():
        for ind2, val2 in value.items():
            print ind2, val2
which gives me this error: AttributeError: 'unicode' object has no attribute 'items'
How do I iterate over it so that I can process each key and value separately?
Recursive example:
import json
def func(data):
    for index, value in data.items():
        print index, value
        if isinstance(value, dict):
            func(value)
json_mini = json.loads('{"one" : {"testing" : 39, "this": 17}, "two" : "2", "three" : "3"}')
func(json_mini)
Here's a recursive way that works in Python 2 and 3, which doesn't use isinstance(). It instead uses exceptions to determine whether a given element is a sub-object or not.
import json
def func(obj, name=''):
    try:
        for key, value in obj.items():
            func(value, key)
    except AttributeError:
        print('{}: {}'.format(name, obj))
json_mini = json.loads('''{
"three": "3",
"two": "2",
"one": {
"this": 17,
"testing": 39
}
}''')
func(json_mini)
Output:
this: 17
testing: 39
three: 3
two: 2
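Neither version above descends into JSON arrays, and both print only the innermost key. A minimal sketch for Python 3, assuming the goal is to visit every leaf value together with the full path that leads to it; `walk` is a hypothetical helper name, not something taken from the answers above:

```python
import json


def walk(node, path=()):
    """Yield (path, value) for every leaf in a decoded JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    else:
        # Strings, numbers, booleans and None are treated as leaves.
        yield path, node


json_mini = json.loads('{"one": {"testing": 39, "this": [1, 2]}, "two": "2"}')
for path, value in walk(json_mini):
    print(path, value)
```

Empty dicts and empty lists produce no output in this sketch, since they contain no leaves.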
Related
I want to write a function that checks the keys of dict1 (the base dict) and compares them to the keys of dict2 (a list of nested dictionaries, which can contain one or more), such that it checks for the mandatory key and then for the optional keys (if any are present), and returns the difference as a list.
dict1 = {"name": str, #mandatory
"details" : { #optional
"class" : str, #optional
"subjects" : { #optional
"english" : bool, #optional
"maths" : bool #optional
}
}}
dict2 = [{"name": "SK",
"details" : {
"class" : "A"}
},
{"name": "SK",
"details" : {
"class" : "A",
"subjects" :{
"english" : True,
"science" : False
}
}}]
After comparing dict2 with dict1, the expected output is:
pass #no difference in keys in 1st dictionary
["science"] #the different key in second dictionary of dict2
Try out this recursive check function:
def compare_dict_keys(d1, d2, diff: list):
    if isinstance(d2, dict):
        for key, expected_value in d2.items():
            try:
                actual_value = d1[key]
                compare_dict_keys(actual_value, expected_value, diff)
            except KeyError:
                diff.append(key)
    else:
        pass
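dict1 vs dict2 (the outputs below assume dict2 is the second dictionary from the question's list, not the list itself)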
difference = []
compare_dict_keys(dict1, dict2, difference)
print(difference)
# Output: ['science']
dict2 vs dict1
difference = []
compare_dict_keys(dict2, dict1, difference)
print(difference)
# Output: ['maths']
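Because the question's dict2 is a list of dictionaries, the check has to run once per element. A minimal sketch, assuming dict1 acts as the schema and that only keys missing from it should be reported; `unexpected_keys` is a hypothetical name, and this variant returns a new list instead of filling one that is passed in. It does not verify that the mandatory "name" key is present:

```python
def unexpected_keys(schema, candidate):
    """Return keys found in candidate (at any depth) that schema lacks."""
    found = []
    if isinstance(candidate, dict):
        for key, value in candidate.items():
            if not isinstance(schema, dict) or key not in schema:
                found.append(key)
            else:
                found.extend(unexpected_keys(schema[key], value))
    return found


dict1 = {"name": str,
         "details": {"class": str,
                     "subjects": {"english": bool, "maths": bool}}}
dict2 = [{"name": "SK", "details": {"class": "A"}},
         {"name": "SK", "details": {"class": "A",
                                    "subjects": {"english": True,
                                                 "science": False}}}]

# One report per dictionary in the list; an empty result prints "pass".
for entry in dict2:
    print(unexpected_keys(dict1, entry) or "pass")
```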
I've searched and found this: Append a dictionary to a dictionary, but that clobbers keys from b if they exist in a.
I'd like to essentially recursively append 1 dictionary to another, where:
keys are unique (obviously, it's a dictionary), but each dictionary is fully represented in the result such that a.keys() and b.keys() are both subsets of c.keys()
if the same key is in both dictionaries, the resulting key contains a list of values from both, such that a[key] and b[key] are in c[key]
the values could be another dictionary, (but nothing deeper than 1 level), in which case the same logic should apply (append values) such that a[key1][key2] and b[key1][key2] are in c[key][key2]
The basic example is where the 2 dictionaries have keys that don't overlap, and I can accomplish that in multiple ways, c = {**a, **b} for example, so I haven't covered that below.
A trickier case:
a = {
    "key1": "value_a1",
    "key2": "value_a2"
}
b = {
    "key1": "value_b1",
    "key3": "value_b3"
}
c = combine(a, b)
c >> {
"key1": ["value_a1", "value_b1"],
"key2": "value_a2",
"key3": "value_b3"
}
An even trickier case
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
c >> {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2", "sub_value_b1"], #sub_value_a1 is not duplicated
"sub_key_2": ["sub_value_a3", "sub_value_b3"]
},
"key2": "value_a2",
"key3": "value_b3" # ["value_b3"] this would be okay, following from the code comment above
}
Caveats:
Python 3.6
The examples show lists being created as_needed, but I'm okay with every non-dict value being a list, as mentioned in the code comments
The values within the lists will always be strings
I tried to explain as best I could but can elaborate more if needed. Been working on this for a few days and keep getting stuck on the sub key part
There is no simple built-in way of doing this, but you can recreate the logic in python.
from typing import List, Union

def combine_lists(a: list, b: list) -> list:
    # Keep a's order, then add the items of b that a does not already have.
    return a + [i for i in b if i not in a]

def combine_strs(a: str, b: str) -> Union[str, List[str]]:
    # Identical strings collapse to one; different strings become a list.
    if a == b:
        return a
    return [a, b]

class EMPTY:
    "A sentinel representing an empty value."

def combine_dicts(a: dict, b: dict) -> dict:
    output = {}
    keys = list(a) + [k for k in b if k not in a]
    for key in keys:
        aval = a.get(key, EMPTY)
        bval = b.get(key, EMPTY)
        if isinstance(aval, list) and isinstance(bval, list):
            output[key] = combine_lists(aval, bval)
        elif isinstance(aval, str) and isinstance(bval, str):
            output[key] = combine_strs(aval, bval)
        elif isinstance(aval, dict) and isinstance(bval, dict):
            output[key] = combine_dicts(aval, bval)
        elif bval is EMPTY:
            output[key] = aval
        elif aval is EMPTY:
            output[key] = bval
        else:
            raise RuntimeError(
                f"Cannot combine types: {type(aval)} and {type(bval)}"
            )
    return output
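The question allows every non-dict value to become a list, which removes the need for the mixed-type error branch. A minimal sketch of that simpler variant, assuming values are strings, lists of strings, or dicts, and that a key holds the same kind of value on both sides; `as_list` and `combine` are hypothetical names here, not the functions above:

```python
def as_list(value):
    """Wrap a scalar in a list; return a copy of an existing list."""
    return list(value) if isinstance(value, list) else [value]


def combine(a, b):
    """Merge a and b into a new dict whose non-dict values are de-duplicated lists."""
    result = {}
    for key in list(a) + [k for k in b if k not in a]:
        in_a, in_b = key in a, key in b
        if in_a and in_b and isinstance(a[key], dict) and isinstance(b[key], dict):
            result[key] = combine(a[key], b[key])
        elif in_a and in_b:
            merged = as_list(a[key])
            merged += [v for v in as_list(b[key]) if v not in merged]
            result[key] = merged
        else:
            only = a[key] if in_a else b[key]
            # Normalise values that exist on one side only.
            result[key] = combine(only, {}) if isinstance(only, dict) else as_list(only)
    return result


a = {"key1": {"sub_key_1": ["sub_value_a1", "sub_value_a2"],
              "sub_key_2": "sub_value_a3"},
     "key2": "value_a2"}
b = {"key1": {"sub_key_1": ["sub_value_a1", "sub_value_b1"],
              "sub_key_2": "sub_value_b3"},
     "key3": "value_b3"}
print(combine(a, b))
```

Neither input is modified, because `as_list` copies any list before it is extended.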
Sounds like you want a specialised version of dict. So, you could subclass it to give you the behaviour you want. Being a bit of a Python noob, I started with the answer here : Subclassing Python dictionary to override __setitem__
Then I added the behaviour in your couple of examples.
I also added a MultiValue class which is a subclass of list. This makes it easy to tell if a value in the dict already has multiple values. Also it removes duplicates, as it looks like you don't want them.
class MultiValue(list):
    # Class to hold multiple values for a dictionary key.
    # Skips duplicates when another MultiValue is appended.
    def append(self, value):
        if isinstance(value, MultiValue):
            for v in value:
                if v not in self:
                    super(MultiValue, self).append(v)
        else:
            super(MultiValue, self).append(value)

class MultiValueDict(dict):
    # dict which converts a key's value to a MultiValue when the key already exists.
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # optional processing here
        if key in self:
            existing_value = self[key]
            if isinstance(existing_value, MultiValueDict) and isinstance(value, dict):
                existing_value.update(value)
                return
            if isinstance(existing_value, MultiValue):
                existing_value.append(value)
                value = existing_value
            else:
                value = MultiValue([existing_value, value])
        super(MultiValueDict, self).__setitem__(key, value)

    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, "
                                "got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]
Example 1:
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
# combine by creating a MultiValueDict then using update to add b to it.
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': ['value_a1', 'value_b1'], 'key2': 'value_a2', 'key3': 'value_b3'}
Example 2: The value for key1 is created as a MultiValueDict and the value for sub_key_1 is a MultiValue, so this may not fit what you're trying to do. It depends on how you're building your data set.
a = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_a2"]),
"sub_key_2": "sub_value_a3"
}),
"key2": "value_a2"
}
b = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_b1"]),
"sub_key_2": "sub_value_b3"
}),
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': {'sub_key_1': ['sub_value_a1', 'sub_value_a2', 'sub_value_b1'], 'sub_key_2': ['sub_value_a3', 'sub_value_b3']}, 'key2': 'value_a2', 'key3': 'value_b3'}
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
def appendValues(ax, cx):
    if type(ax) == list:  # is key's value in a, a list?
        cx.extend(ax)  # if it is a list then extend
    else:  # key's value in a is not a list
        cx.append(ax)  # so use append
    cx = list(set(cx))  # make values unique with set (order is not preserved)
    return cx

def combine(a, b):
    c = {}
    for x in b:  # first copy b keys and values to c
        c[x] = b[x]
    for x in a:  # now combine a with c
        if x not in c:  # this key is not in c
            c[x] = a[x]  # so add it
        else:  # key exists in c
            if type(c[x]) == list:  # is key's value in c a list?
                c[x] = appendValues(a[x], c[x])
            elif type(c[x]) == dict:  # is key's value in c a dictionary?
                c[x] = combine(c[x], a[x])  # combine dictionaries
            else:  # so key's value is not a list or dict
                c[x] = [c[x]]  # make value a list
                c[x] = appendValues(a[x], c[x])
    return c
c = combine(a, b)
print(c)
print("==========================")
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
print(c)
I have a Python script, which uses a function from a previous Stack Overflow solution.
from pandas import json_normalize
from collections.abc import MutableMapping as mm
def flatten(dictionary, parent_key=False, separator='.'):
    items = []
    for key, value in dictionary.items():
        new_key = str(parent_key) + separator + key if parent_key else key
        if isinstance(value, mm):
            items.extend(flatten(value, new_key, separator).items())
        elif isinstance(value, list):
            for k, v in enumerate(value):
                items.extend(flatten({str(k): v}, new_key).items())
        else:
            items.append((new_key, value))
    return dict(items)
d = {
"_id" : 1,
"labelId" : [
6422
],
"levels" : [
{
"active" : "true",
"level" : 3,
"actions" : [
{
"isActive" : "true"
}]
}]
}
x = flatten(d)
x = json_normalize(x)
print(x)
Current Output:
_id labelId.0 levels.0.active levels.0.level levels.0.actions.0.isActive
0 1 6422 true 3 true
The issue I am having is that the numeric list indices get included in the column names. Is there a way I can amend my code to achieve my desired output?
Desired Output:
_id labelId levels.active levels.level levels.actions.isActive
0 1 6422 true 3 true
First of all, using parent_key as a bool and then assigning it a value of another type is not the best practice. It works, but it can become messy. I modified the code a bit: parent_key is now a separate bool argument that says whether the parent key should be prefixed, and p_key carries the string you wanted. Here is the snippet:
from pandas import json_normalize
from collections.abc import MutableMapping as mm
def flatten(dictionary, p_key=None, parent_key=False, separator='.'):
    items = []
    for key, value in dictionary.items():
        if parent_key:
            new_key = f"{str(p_key)}{separator}{key}"
        else:
            new_key = p_key if p_key else key
        if isinstance(value, mm):
            items.extend(flatten(
                dictionary=value,
                p_key=new_key,
                parent_key=True,
                separator=separator).items())
        elif isinstance(value, list):
            for k, v in enumerate(value):
                items.extend(flatten(
                    dictionary={str(k): v},
                    p_key=new_key,
                    parent_key=False,
                    separator=separator).items())
        else:
            items.append((new_key, value))
    return dict(items)
d = {
"_id" : 1,
"labelId" : [
6422
],
"levels" : [
{
"active" : "true",
"level" : 3,
"actions" : [
{
"isActive" : "true"
}]
}]
}
x = flatten(d)
x = json_normalize(x)
print(x)
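Dropping the list index means that a list with more than one element maps several values to the same column name, and with dict(items) only the last one survives. A minimal sketch of one way around that, assuming it is acceptable for repeated keys to hold a list of values; `flatten_no_index` and `collect` are hypothetical helpers, not part of pandas or of the answer above:

```python
from collections.abc import Mapping


def flatten_no_index(node, prefix="", separator="."):
    """Yield (dotted_key, value) pairs, ignoring positions inside lists."""
    if isinstance(node, Mapping):
        for key, value in node.items():
            new_prefix = f"{prefix}{separator}{key}" if prefix else str(key)
            yield from flatten_no_index(value, new_prefix, separator)
    elif isinstance(node, list):
        for item in node:
            # List elements reuse the parent's key, so no index appears.
            yield from flatten_no_index(item, prefix, separator)
    else:
        yield prefix, node


def collect(pairs):
    """Group values by key; keys seen once keep a scalar, repeats become lists."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


d = {"_id": 1,
     "labelId": [6422, 6423],
     "levels": [{"active": "true", "level": 3,
                 "actions": [{"isActive": "true"}]}]}
flat = collect(flatten_no_index(d))
print(flat)
```

The resulting flat dict can then be handed to json_normalize in the same way as in the snippet above.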
I have a JSON file that looks like this:
data = {
"x": {
"y": {
"key": {
},
"w": {
}
}}}
I have converted it into a dict in Python to then parse through it and look for keys, using the following code:
entry = input("Search JSON for the following: ")  # search for "key"
if entry in data:
    print(entry)
else:
    print("Not found.")
However, even when I input "key" as the entry, it still prints "Not found." Do I need to control the depth of data? What if I don't know the location of "key" but still want to search for it?
Your method is not working because key is not a key in data. data has one key: x. So you need to look at the dictionary and see if the key is in it. If not, you can pass the next level dictionaries back to the function recursively. This will find the first matching key:
data = {
"x": {
"y": {
"key": "some value",
"w": {}
}}}
key = "key"
def findValue(key, d):
    if key in d:
        return d[key]
    for v in d.values():
        if isinstance(v, dict):
            found = findValue(key, v)
            if found is not None:
                return found
findValue(key, data)
# 'some value'
It will return None if your key is not found
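Returning None for "not found" makes a missing key indistinguishable from a key whose stored value is None (JSON null). A minimal sketch that uses a private sentinel instead and also looks inside lists; `_MISSING`, `_search` and `find_value` are hypothetical names, and raising KeyError is an assumed design choice rather than part of the answer above:

```python
_MISSING = object()  # unique marker meaning "key not present anywhere"


def _search(key, node):
    """Depth-first search through dicts and lists; returns _MISSING on failure."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return _MISSING
    for child in children:
        found = _search(key, child)
        if found is not _MISSING:
            return found
    return _MISSING


def find_value(key, data):
    """Return the first value stored under key, or raise KeyError."""
    found = _search(key, data)
    if found is _MISSING:
        raise KeyError(key)
    return found


data = {"x": {"y": {"key": None, "w": {}}}, "z": [{"deep": 1}]}
print(find_value("key", data))
print(find_value("deep", data))
```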
Here's an approach which allows you to collect all the values from a nested dict, if the keys are repeated at different levels of nesting. It's very similar to the above answer, just wrapped in a function with a nonlocal list to hold the results:
def foo(mydict, mykey):
    result = []
    num_recursive_calls = 0

    def explore(mydict, mykey):
        #nonlocal result #allow successive recursive calls to write to list
        #actually this is unnecessary in this case! Here
        #is where we would need it, for a call counter:
        nonlocal num_recursive_calls
        num_recursive_calls += 1
        for key in mydict.keys(): #get all keys from that level of nesting
            if mykey == key:
                print(f"Found {key}")
                result.append({key:mydict[key]})
            elif isinstance(mydict.get(key), dict):
                print(f"Found nested dict under {key}, exploring")
                explore(mydict[key], mykey)

    explore(mydict, mykey)
    print(f"explore called {num_recursive_calls} times") #see above
    return result
For example, with
data = {'x': {'y': {'key': {}, 'w': {}}}, 'key': 'duplicate'}
This will return:
[{'key': {}}, {'key': 'duplicate'}]
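The function above stops descending once a key matches and does not look inside lists. A minimal sketch of a generator-based variant that keeps going in both cases, assuming every occurrence is wanted; `find_all` is a hypothetical name:

```python
def find_all(node, wanted):
    """Yield every value stored under `wanted`, at any depth, in dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted:
                yield value
            # Keep descending even after a match.
            yield from find_all(value, wanted)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, wanted)


data = {'x': {'y': {'key': {'key': 'inner'}, 'w': [{'key': 1}]}},
        'key': 'duplicate'}
print(list(find_all(data, 'key')))
```

Being a generator, it needs no shared result list and therefore no nonlocal declaration.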
Say there are two dictionaries in Python:
Dict1
mydict1 = {
"Person" :
{
"FName" : "Rakesh",
"LName" : "Roshan",
"Gender" : "Male",
"Status" : "Married",
"Age" : "60",
"Children" :
[
{
"Fname" : "Hrithik",
"Lname" : "Roshan",
"Gender" : "Male",
"Status" : "Married",
"Children" : ["Akram", "Kamal"],
},
{
"Fname" : "Pinky",
"Lname" : "Roshan",
"Gender" : "Female",
"Status" : "Married",
"Children" : ["Suzan", "Tina", "Parveen"]
}
],
"Movies" :
{
"The Last Day" :
{
"Year" : 1990,
"Director" : "Mr. Kapoor"
},
"Monster" :
{
"Year" : 1991,
"Director" : "Mr. Khanna"
}
}
}
}
Dict2
mydict2 = {
"Person" :
{
"FName" : "Rakesh",
"LName" : "Roshan",
"Gender" : "Male",
"Status" : "Married",
"Children" :
[
{
"Fname" : "Hrithik",
"Lname" : "Losan",
"Gender" : "Male",
"Status" : "Married",
"Children" : ["Akram", "Ajamal"],
},
{
"Fname" : "Pinky",
"Lname" : "Roshan",
"Gender" : "Female",
"Status" : "Married",
"Children" : ["Suzan", "Tina"]
}
]
}
}
I want to compare two dictionaries and print the difference in report format as below -
MISMATCH 1
==========
MATCH DICT KEY : Person >> Children >> LName
EXPECTED : Roshan
ACTUAL : Losan
MISMATCH 2
==========
MATCH LIST ITEM : Person >> Children >> Children
EXPECTED : Kamal
ACTUAL : Ajamal
MISMATCH 3
==========
MATCH LIST ITEM : Person >> Children >> Children
EXPECTED : Parveen
ACTUAL : NOT_FOUND
MISMATCH 4
==========
MATCH DICT KEY : Person >> Age
EXPECTED : 60
ACTUAL : NOT_FOUND
MISMATCH 5
==========
MATCH DICT KEY : Person >> Movies
EXPECTED : { Movies : {<COMPLETE DICT>} }
ACTUAL : NOT_FOUND
I tried the Python module datadiff, which does not give me pretty output in a dictionary format. To generate the report I have to traverse the dictionary and find the '+' and '-' keys. If the dictionary is too complex, it is hard to traverse.
UPDATE: I've updated the code to deal with lists in a more appropriate way. I've also commented the code to make it more clear if you need to change it.
This answer is not 100% general right now, but it can be expanded upon easily to fit what you need.
# print() and items() are used so the code runs on both Python 2 and 3.
def print_error(exp, act, path=[]):
    if path != []:
        print('MATCH LIST ITEM: %s' % '>>'.join(path))
    print('EXPECTED: %s' % str(exp))
    print('ACTUAL: %s' % str(act))
    print('')

def copy_append(lst, item):
    # Return a new list so the caller's path is never mutated
    foo = lst[:]
    foo.append(str(item))
    return foo

def deep_check(comp, compto, path=[], print_errors=True):
    # Total number of errors found, is needed for when
    # testing the similarity of dicts
    errors = 0

    if isinstance(comp, list):
        # If the types are not the same then it is probably a critical error
        # return a number to represent how important this is
        if not isinstance(compto, list):
            if print_errors:
                print_error(comp, 'NOT_LIST', path)
            return 1

        # We don't want to destroy the original lists
        comp_copy = comp[:]
        compto_copy = compto[:]

        # Remove items that are both in comp and compto
        # and find items that are only in comp
        for item in comp_copy[:]:
            try:
                compto_copy.remove(item)
                # Only is removed if the item is in compto_copy
                comp_copy.remove(item)
            except ValueError:
                # dicts need to be handled differently
                if isinstance(item, dict):
                    continue
                if print_errors:
                    print_error(item, 'NOT_FOUND', path)
                errors += 1

        # Find non-dicts that are only in compto
        for item in compto_copy[:]:
            if isinstance(item, dict):
                continue
            compto_copy.remove(item)
            if print_errors:
                print_error('NOT_FOUND', item, path)
            errors += 1

        # Now compto_copy only has dicts

        # This is the part that compares dicts with the minimum
        # errors between them, it is expensive since each dict in comp_copy
        # has to be compared against each dict in compto_copy
        for c in comp_copy:
            lowest_errors = None
            lowest_value = None
            for ct in compto_copy:
                errors_in = deep_check(c, ct, path, print_errors=False)
                # Get and store the minimum errors
                # (None is checked first; int < None fails on Python 3)
                if lowest_errors is None or errors_in < lowest_errors:
                    lowest_errors = errors_in
                    lowest_value = ct
            if lowest_errors is not None:
                errors += lowest_errors
                # Has to have print_errors passed in case the list of dicts
                # contains a list of dicts
                deep_check(c, lowest_value, path, print_errors)
                compto_copy.remove(lowest_value)
        return errors

    if not isinstance(compto, dict):
        # If the types are not the same then it is probably a critical error
        # return a number to represent how important this is
        if print_errors:
            print_error(comp, 'NOT_DICT')
        return 1

    for key, value in compto.items():
        try:
            comp[key]
        except KeyError:
            if print_errors:
                print_error('NO_KEY', key, copy_append(path, key))
            errors += 1

    for key, value in comp.items():
        try:
            tovalue = compto[key]
        except KeyError:
            if print_errors:
                print_error(value, 'NOT_FOUND', copy_append(path, key))
            errors += 1
            continue

        if isinstance(value, (list, dict)):
            errors += deep_check(value, tovalue, copy_append(path, key), print_errors)
        else:
            if value != tovalue:
                if print_errors:
                    print_error(value, tovalue, copy_append(path, key))
                errors += 1
    return errors
With your dicts as input I get:
MATCH LIST ITEM: Person>>Age
EXPECTED: 60
ACTUAL: NOT_FOUND
MATCH LIST ITEM: Person>>Movies
EXPECTED: {'The Last Day': {'Director': 'Mr. Kapoor', 'Year': 1990}, 'Monster': {'Director': 'Mr. Khanna', 'Year': 1991}}
ACTUAL: NOT_FOUND
MATCH LIST ITEM: Person>>Children>>Lname
EXPECTED: Roshan
ACTUAL: Losan
MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: Kamal
ACTUAL: NOT_FOUND
MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: NOT_FOUND
ACTUAL: Ajamal
MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: Parveen
ACTUAL: NOT_FOUND
The way lists are compared has been updated so that these two lists:
['foo', 'bar']
['foo', 'bing', 'bar']
Will only raise an error about 'bing' not being in the first list. With string values, the value can either be in the list or not, but an issue arises when you are comparing a list of dicts. You'll end up with dicts from the list that do not match to varying degrees, and knowing which dicts to compare from those is not straightforward.
My implementation solves this by assuming that pairs of dicts that create the lowest number of errors are the ones that need to be compared together. For example:
test1 = {
"Name": "Org Name",
"Members":
[
{
"Fname": "foo",
"Lname": "bar",
"Gender": "Neuter",
"Roles": ["President", "Vice President"]
},
{
"Fname": "bing",
"Lname": "bang",
"Gender": "Neuter",
"Roles": ["President", "Vice President"]
}
]
}
test2 = {
"Name": "Org Name",
"Members":
[
{
"Fname": "bing",
"Lname": "bang",
"Gender": "Male",
"Roles": ["President", "Vice President"]
},
{
"Fname": "foo",
"Lname": "bar",
"Gender": "Female",
"Roles": ["President", "Vice President"]
}
]
}
Produces this output:
MATCH LIST ITEM: Members>>Gender
EXPECTED: Neuter
ACTUAL: Female
MATCH LIST ITEM: Members>>Gender
EXPECTED: Neuter
ACTUAL: Male
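The pairing rule described above (match each dict with the remaining dict that produces the fewest errors) can be looked at in isolation. A minimal Python 3 sketch, assuming flat dicts and a simple per-key difference count instead of the recursive deep_check; `count_diffs` and `pair_by_fewest_diffs` are hypothetical names:

```python
_ABSENT = object()  # marker for "key missing on this side"


def count_diffs(a, b):
    """Count keys whose values differ, including keys present on one side only."""
    return sum(1 for key in set(a) | set(b)
               if a.get(key, _ABSENT) != b.get(key, _ABSENT))


def pair_by_fewest_diffs(expected, actual):
    """Greedily pair each expected dict with the closest remaining actual dict."""
    remaining = list(actual)
    pairs = []
    for exp in expected:
        if not remaining:
            pairs.append((exp, None))  # nothing left to compare against
            continue
        best = min(remaining, key=lambda act: count_diffs(exp, act))
        remaining.remove(best)
        pairs.append((exp, best))
    return pairs, remaining


expected = [{"Fname": "foo", "Lname": "bar", "Gender": "Neuter"},
            {"Fname": "bing", "Lname": "bang", "Gender": "Neuter"}]
actual = [{"Fname": "bing", "Lname": "bang", "Gender": "Male"},
          {"Fname": "foo", "Lname": "bar", "Gender": "Female"}]

pairs, leftover = pair_by_fewest_diffs(expected, actual)
for exp, act in pairs:
    print(exp["Fname"], "<->", act["Fname"], "diffs:", count_diffs(exp, act))
```

Like the loop in deep_check, this is greedy: each dict takes the best match still available, so the overall pairing is not guaranteed to be the one with the fewest total differences.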