Except equivalent in Python for nested dicts

I have two large nested dictionaries in the form
dictOne = "key1": {
"key2": {}
"key3": {
"key4" : {data...}
}
}
dictTwo = "key1": {
"key2": {}
}
The real dictionaries are thousands of lines long, and some of the dicts are nested 10-15 levels deep.
I want to find a way to combine them together similar to an EXCEPT in SQL. I want any keys that show up in dictTwo to be deleted from dictOne, but only if the dict under the key doesn't have children.
So in this case the resulting dict would be
dictRes = "key1": {
"key3": {
"key4" : {data...}
}
}
I am assuming there is no easy way to do this, but I was hoping someone could point me in the right direction towards writing a function that could accomplish this.

Sounds like you need a recursive option.
def dict_parser(target, pruning):
    d = {}
    for k, v in target.items():
        if (not v) and (k in pruning):
            continue
        if isinstance(v, dict):
            d[k] = dict_parser(v, pruning.get(k, {}))
        else:
            d[k] = v
    return d
DEMO:
dictOne = {"key1": {
    "key2": {},
    "key3": {
        "key4": {"some stuff"}
    }
}}
dictTwo = {"key1": {
    "key2": {}
}}
dict_parser(dictOne, dictTwo)
# gives:
# {'key1': {
#     'key3': {
#         'key4': {'some stuff'}}}}
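One caveat with the function above: `not v` is true for every falsy value (0, '', None, an empty list), not only for an empty dict, and `pruning.get` raises an error when the matching value in `pruning` is not a dict. Here is a minimal sketch of a stricter variant, assuming "doesn't have children" means exactly an empty dict. The name `prune_empty` and the sample data are hypothetical.

```python
def prune_empty(target, pruning):
    """Return a copy of target without keys that are in pruning and map to an empty dict."""
    result = {}
    for key, value in target.items():
        is_empty_dict = isinstance(value, dict) and not value
        if is_empty_dict and key in pruning:
            continue  # childless dict that also appears in pruning: drop it
        if isinstance(value, dict):
            sub_pruning = pruning.get(key)
            if not isinstance(sub_pruning, dict):
                sub_pruning = {}  # nothing to prune below this key
            result[key] = prune_empty(value, sub_pruning)
        else:
            result[key] = value  # non-dict values are always kept
    return result


# Hypothetical sample data: "count" is falsy but is not an empty dict.
dict_one = {"key1": {"key2": {}, "key3": {"key4": {"data": 1}}, "count": 0}}
dict_two = {"key1": {"key2": {}, "count": 0}}
pruned = prune_empty(dict_one, dict_two)
print(pruned)
```

With this variant, "key2" is removed while "count" survives, because only empty dicts are treated as childless.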


Python: recursively append dictionary to another

I've searched and found "Append a dictionary to a dictionary", but that approach clobbers keys from b if they exist in a.
I'd like to essentially recursively append 1 dictionary to another, where:
keys are unique (obviously, it's a dictionary), but each dictionary is fully represented in the result such that a.keys() and b.keys() are both subsets of c.keys()
if the same key is in both dictionaries, the resulting key contains a list of values from both, such that a[key] and b[key] are in c[key]
the values could be another dictionary, (but nothing deeper than 1 level), in which case the same logic should apply (append values) such that a[key1][key2] and b[key1][key2] are in c[key][key2]
The basic example is where the 2 dictionaries have keys that don't overlap, and I can accomplish that in multiple ways, c = {**a, **b} for example, so I haven't covered that below.
A trickier case:
a = {
    "key1": "value_a1",
    "key2": "value_a2"
}
b = {
    "key1": "value_b1",
    "key3": "value_b3"
}
c = combine(a, b)
c >> {
"key1": ["value_a1", "value_b1"],
"key2": "value_a2",
"key3": "value_b3"
}
An even trickier case
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
c >> {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2", "sub_value_b1"], #sub_value_a1 is not duplicated
"sub_key_2": ["sub_value_a3", "sub_value_b3"]
},
"key2": "value_a2",
"key3": "value_b3" # ["value_b3"] this would be okay, following from the code comment above
}
Caveats:
Python 3.6
The examples show lists being created as_needed, but I'm okay with every non-dict value being a list, as mentioned in the code comments
The values within the lists will always be strings
I tried to explain as best I could but can elaborate more if needed. Been working on this for a few days and keep getting stuck on the sub key part
There is no simple built-in way of doing this, but you can recreate the logic in python.
def combine_lists(a: list, b: list) -> list:
    return a + [i for i in b if i not in a]
def combine_strs(a: str, b: str):
    # Returns the string itself when both are equal, otherwise a two-item list.
    if a == b:
        return a
    return [a, b]
class EMPTY:
    "A sentinel representing an empty value."
def combine_dicts(a: dict, b: dict) -> dict:
    output = {}
    keys = list(a) + [k for k in b if k not in a]
    for key in keys:
        aval = a.get(key, EMPTY)
        bval = b.get(key, EMPTY)
        if isinstance(aval, list) and isinstance(bval, list):
            output[key] = combine_lists(aval, bval)
        elif isinstance(aval, str) and isinstance(bval, str):
            output[key] = combine_strs(aval, bval)
        elif isinstance(aval, dict) and isinstance(bval, dict):
            output[key] = combine_dicts(aval, bval)
        elif bval is EMPTY:
            output[key] = aval
        elif aval is EMPTY:
            output[key] = bval
        else:
            raise RuntimeError(
                f"Cannot combine types: {type(aval)} and {type(bval)}"
            )
    return output
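The question says it is acceptable for every non-dict value to end up as a list. Under that assumption the type dispatch gets simpler, because strings and lists can be treated alike. The following is only a sketch of that variant, not a replacement for the code above; the names `as_list` and `combine_as_lists` are hypothetical.

```python
def as_list(value):
    """Wrap a single value in a list; return a copy of an existing list."""
    return list(value) if isinstance(value, list) else [value]


def combine_as_lists(a, b):
    """Merge a and b into a new dict; every non-dict value becomes a de-duplicated list."""
    output = {}
    for key in list(a) + [k for k in b if k not in a]:
        in_a, in_b = key in a, key in b
        if in_a and in_b and isinstance(a[key], dict) and isinstance(b[key], dict):
            output[key] = combine_as_lists(a[key], b[key])
        elif in_a and in_b:
            if isinstance(a[key], dict) or isinstance(b[key], dict):
                raise TypeError(f"Cannot combine dict and non-dict at {key!r}")
            merged = as_list(a[key])
            merged += [v for v in as_list(b[key]) if v not in merged]
            output[key] = merged
        else:
            # Key exists on one side only: normalise it to the same shape.
            value = a[key] if in_a else b[key]
            output[key] = combine_as_lists(value, {}) if isinstance(value, dict) else as_list(value)
    return output


a = {
    "key1": {"sub_key_1": ["sub_value_a1", "sub_value_a2"], "sub_key_2": "sub_value_a3"},
    "key2": "value_a2",
}
b = {
    "key1": {"sub_key_1": ["sub_value_a1", "sub_value_b1"], "sub_key_2": "sub_value_b3"},
    "key3": "value_b3",
}
c = combine_as_lists(a, b)
print(c)
```

The trade-off is that callers never have to check whether a value is a string or a list, at the cost of single values being wrapped in one-item lists.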
Sounds like you want a specialised version of dict. So, you could subclass it to give you the behaviour you want. Being a bit of a Python noob, I started with the answer here : Subclassing Python dictionary to override __setitem__
Then I added the behaviour in your couple of examples.
I also added a MultiValue class which is a subclass of list. This makes it easy to tell if a value in the dict already has multiple values. Also it removes duplicates, as it looks like you don't want them.
class MultiValue(list):
    # Class to hold multiple values for a dictionary key. Prevents duplicates.
    def append(self, value):
        if isinstance(value, MultiValue):
            for v in value:
                if v not in self:
                    super(MultiValue, self).append(v)
        else:
            super(MultiValue, self).append(value)
class MultiValueDict(dict):
    # dict which converts a key's value to a MultiValue when the key already exists.
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def __setitem__(self, key, value):
        # optional processing here
        if key in self:
            existing_value = self[key]
            if isinstance(existing_value, MultiValueDict) and isinstance(value, dict):
                existing_value.update(value)
                return
            if isinstance(existing_value, MultiValue):
                existing_value.append(value)
                value = existing_value
            else:
                value = MultiValue([existing_value, value])
        super(MultiValueDict, self).__setitem__(key, value)
    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, "
                                "got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]
    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]
Example 1:
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
# combine by creating a MultiValueDict then using update to add b to it.
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': ['value_a1', 'value_b1'], 'key2': 'value_a2', 'key3': 'value_b3'}
Example 2: The value for key1 is created as a MultiValueDict and the value for sub_key_1 is a MultiValue, so this may not fit what you're trying to do. It depends on how you're building your data set.
a = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_a2"]),
"sub_key_2": "sub_value_a3"
}),
"key2": "value_a2"
}
b = {
"key1": MultiValueDict({
"sub_key_1": MultiValue(["sub_value_a1", "sub_value_b1"]),
"sub_key_2": "sub_value_b3"
}),
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = MultiValueDict(a)
c.update(b)
print(c)
# gives {'key1': {'sub_key_1': ['sub_value_a1', 'sub_value_a2', 'sub_value_b1'], 'sub_key_2': ['sub_value_a3', 'sub_value_b3']}, 'key2': 'value_a2', 'key3': 'value_b3'}
a = {
"key1": "value_a1",
"key2": "value_a2"
}
b = {
"key1": "value_b1",
"key3": "value_b3"
}
def appendValues(ax, cx):
    if type(ax) == list:  # is the key's value in a a list?
        cx.extend(ax)  # if it is a list, extend
    else:  # the key's value in a is not a list
        cx.append(ax)  # so use append
    cx = list(set(cx))  # make values unique with set (order is not preserved)
    return cx
def combine(a, b):
    c = {}
    for x in b:  # first copy b's keys and values to c
        c[x] = b[x]
    for x in a:  # now combine a with c
        if x not in c:  # this key is not in c
            c[x] = a[x]  # so add it
        else:  # key exists in c
            if type(c[x]) == list:  # is the key's value in c a list?
                c[x] = appendValues(a[x], c[x])
            elif type(c[x]) == dict:  # is the key's value in c a dictionary?
                c[x] = combine(c[x], a[x])  # combine dictionaries
            else:  # the key's value is neither a list nor a dict
                c[x] = [c[x]]  # make the value a list
                c[x] = appendValues(a[x], c[x])
    return c
c = combine(a, b)
print(c)
print("==========================")
a = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_a2"],
"sub_key_2": "sub_value_a3"
},
"key2": "value_a2"
}
b = {
"key1": {
"sub_key_1": ["sub_value_a1", "sub_value_b1"],
"sub_key_2": "sub_value_b3"
},
"key3": "value_b3" # I'm okay with converting this to a list even if it's not one
}
c = combine(a, b)
print(c)

Python Set all dictionary values to lower case with specific key

I have a problem. I want to set all the values from my dictionary that are connected with the ["key1"] to lowercase. I started to create a test dictionary:
# Define test devices
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
item2 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
collection = []
collection.append(item1)
collection.append(item2)
After that, I tried to set every value to lowercase like this:
for item in collection:
    item = dict((k, v.lower()) for k,v in item.items())
But when I printed the collection afterwards, nothing had changed.
Why are all my values not lowercase and how can I set it for a specific key?
When you use a for loop over a dictionary, you iterate through its keys. All you need to do is assign to the corresponding key using dictionary[key] = .... The dictionary[key] on the right-hand side fetches the value associated with the key, on which you can call lower().
This will fix the issue:
for key in dictionary:
    dictionary[key] = dictionary[key].lower()
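To see why the loop in the question has no visible effect, it can help to put rebinding and mutation side by side. This is a small sketch using the question's data; the `unchanged` snapshot variable is added here only for the comparison.

```python
collection = [
    {"key1": "VALUE1", "key2": "VALUE2"},
    {"key1": "VALUE1", "key2": "VALUE2"},
]

# Rebinding: `item` is pointed at a brand-new dict; the list still holds the old ones.
for item in collection:
    item = {k: v.lower() for k, v in item.items()}
unchanged = [dict(d) for d in collection]  # snapshot taken after the rebinding loop

# Mutating: assigning through the key changes the dict that the list refers to.
for item in collection:
    item["key1"] = item["key1"].lower()

print(unchanged)
print(collection)
```

The first print still shows the uppercase values, while the second shows key1 lowercased in both dictionaries.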
You can handle a specific key with this approach:
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
for k, v in item1.items():
    if k == 'key1':
        item1.update({k: v.lower()})
print(item1)
output
{'key1': 'value1', 'key2': 'VALUE2'}
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
for key, value in item1.items():
    item1[key] = value.lower()
print(item1)
output
{'key1': 'value1', 'key2': 'value2'}
for item in collection:
    for key in item:
        if key == "key1":
            item[key] = item[key].lower()
Why?
In Python, strings are immutable. Calling string.lower() returns a new, lowercased copy and leaves the original string unchanged. The key therefore has to be reassigned to the new string; otherwise the dictionary keeps pointing at the old one.
Here is a working example
# Define test devices
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
item2 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
collection = (item1,item2)
for item in collection:
    for k, v in item.items():
        item[k] = v.lower()
To lowercase only the value under key1, use this loop instead:
for i in collection:
    i["key1"] = i["key1"].lower()
Here is the complete script:
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
item2 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
collection = []
collection.append(item1)
collection.append(item2)
print("Before operation", collection)
for i in collection:
    i["key1"] = i["key1"].lower()
print("After Operation", collection)
Output
Before operation [{'key1': 'VALUE1', 'key2': 'VALUE2'}, {'key1': 'VALUE1', 'key2': 'VALUE2'}]
After Operation [{'key1': 'value1', 'key2': 'VALUE2'}, {'key1': 'value1', 'key2': 'VALUE2'}]
There is just a small misconception about how updating a dictionary works.
Inside the loop, item = dict(...) only rebinds the loop variable to a new dictionary; the dictionaries stored in the list are left untouched. Instead, use the key to assign into the original dictionary:
# Define test devices
item1 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
item2 = {
"key1": "VALUE1",
"key2": "VALUE2"
}
collection = []
collection.append(item1)
collection.append(item2)
for item in collection:
    for k, v in item.items():
        if k == "key1":
            item[k] = v.lower()
print(collection)
You can update the list using the below code:
for item in collection:
    item.update(dict((k, v.lower()) for k, v in item.items() if k == 'key1'))
print(collection)
Output
[{'key1': 'value1', 'key2': 'VALUE2'}, {'key1': 'value1', 'key2': 'VALUE2'}]

Removing nulls and empty objects of mixed data types from a dictionary with a nested structure

Off the back of a similar question I asked, how would one go about cleaning a dictionary containing a variety of datatypes: nulls, empty lists, empty dicts etc, at varying levels of nesting E.g.
{
"key":"value",
"key1": {},
"key2": [],
"key3": True,
"key4": False,
"key5": None,
"key6": [1,2,3],
"key7": {
"subkey": "subvalue"
},
"key8": {
"subdict": {
"subdictkey": "subdictvalue",
"subdictkey1": {},
"subdictkey2": [],
"subdictkey3": None
}
}
}
Becomes:
{
"key":"value",
"key3": True,
"key4": False,
"key6": [1,2,3],
"key7": {
"subkey": "subvalue"
},
"key8": {
"subdict": {
"subdictkey": "subdictvalue"
}
}
}
The solution should be good for n levels of nesting (not just 1 level). Obviously I want to avoid nested loops (particularly as n could equal 3 or 4), is the only solution flattening the structure? Is there a more elegant way of going about it?
Edit:
Building on @Ch3steR's answer and accounting for an issue I encountered with lists containing a null, this is the final working function:
def recur(n_dict,new_d={}):
    global counter  # counter must already be defined at module level
    for key,val in n_dict.items():
        if val or isinstance(val,bool) or (isinstance(val,list) and any(elem is not None for elem in val)):
            if (isinstance(val,list) and any(elem is None for elem in val)):
                counter=counter+1
            else:
                new_d={**new_d,**{key:val}}
                if isinstance(val,dict):
                    new_d[key]=recur(val)
    return new_d
You can use Recursion when you are dealing with an arbitrarily nested dictionary.
Try this.
def recur(n_dict,new_d={}):
    for key,val in n_dict.items():
        if val or isinstance(val,bool):
            new_d={**new_d,**{key:val}}
            if isinstance(val,dict):
                new_d[key]=recur(val)
    return new_d
a={
"key":"value",
"key1": {},
"key2": [],
"key3": True,
"key4": False,
"key5": None,
"key6": [1,2,3],
"key7": {
"subkey": "subvalue"
},
"key8": {
"subdict": {
"subdictkey": "subdictvalue",
"subdictkey1": {},
"subdictkey2": [],
"subdictkey3": None
}
}
}
print(recur(a))
{'key': 'value',
'key3': True,
'key4': False,
'key6': [1, 2, 3],
'key7': {'subkey': 'subvalue'},
'key8': {'subdict': {'subdictkey': 'subdictvalue'}}}
Note that recur(n_dict,new_d={}) uses a mutable default argument.
Note:
Never mutate new_d in place, or the shared default object will carry state over from one call to the next.
One way to check whether your default argument has changed is to inspect __defaults__:
>>> recur(a)
>>> recur.__defaults__
({},)
>>> recur(a)
>>> recur.__defaults__
({},)
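The mutable default can be avoided altogether by returning a freshly built structure from each call. Below is a minimal sketch under a few assumptions that go beyond the answer above: only None, empty dicts and empty lists count as "empty" (so False, 0 and '' are kept), lists are cleaned element by element, and a container that becomes empty after cleaning is dropped as well. The names `is_empty` and `clean` and the sample data are hypothetical.

```python
def is_empty(value):
    """True only for None, an empty dict, or an empty list."""
    return value is None or (isinstance(value, (dict, list)) and not value)


def clean(value):
    """Return a cleaned copy; dicts and lists are processed recursively."""
    if isinstance(value, dict):
        cleaned = {k: clean(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not is_empty(v)}
    if isinstance(value, list):
        cleaned = [clean(v) for v in value]
        return [v for v in cleaned if not is_empty(v)]
    return value  # scalars (including False, 0 and '') are kept as-is


sample = {
    "key": "value",
    "key1": {},
    "key3": True,
    "key4": False,
    "key5": None,
    "key6": [1, None, 3],
    "key8": {"subdict": {"subdictkey": "subdictvalue", "subdictkey2": [], "subdictkey3": None}},
}
result = clean(sample)
print(result)
```

Because no state is shared between calls, there is no default argument to watch, and the input dictionary is left unmodified.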

Python - Getting the intersection of two Json-Files

I'm looking for a way to calculate the intersection of two JSON files. I searched and found that I can use sets for my problem. This works "okay", but I need a more detailed view of the intersection, and this is where the problems start.
How I calculate the intersection:
# Assumed imports (the question does not show them): j is json, js is jsonschema.
import json as j
from collections import OrderedDict
import jsonschema as js

def calcIntersect(ValidationFile, json_object1, json_object2):
    with open(ValidationFile) as schema_file:
        schema = j.load(schema_file)
    js.Draft4Validator.check_schema(schema)
    with open(json_object1) as spec_file:
        spec1 = j.load(spec_file, object_pairs_hook=OrderedDict)
    js.validate(spec1, schema)
    with open(json_object2) as spec_file:
        spec2 = j.load(spec_file, object_pairs_hook=OrderedDict)
    js.validate(spec2, schema)
    x = set(spec1) & set(spec2)
    print(x)
Example Data1:
{
"Car":{
"Brand":"Audi",
"Nationality":"Germany",
"Modelname":"A6"
},
"Engine":{
"cubic capacity":"2967",
"Enginetype":"V6",
"Fuel":"Diesel",
"MaxSpeed":"250"
},
"Colors":{
"Carcolor":"Black",
"Interiorrcolor":"white"
}
}
Example Data2:
{
"Car":{
"Brand":"Audi",
"Nationality":"USA",
"Modelname":"A6"
},
"Engine":{
"cubic capacity":"2995",
"Enginetype":"V6",
"Fuel":"Petrol",
"MaxSpeed":"250"
},
"Colors":{
"Carcolor":"Black",
"Interiorrcolor":"Black"
}
}
Example-Output:
{'Car', 'Colors', 'Engine'}
These are just the keys, but I need the dictionaries. At the moment it only gives me the keys to say that there is an intersection somewhere in them. For example, under 'Car' both files contain "Audi", while the nationality differs because one car is produced in America and the other in Germany. But it still returns 'Car' and not "Audi".
I hope I was able to describe my problem well enough. It's my first question.
The following lines, inspired by @likeon's answer, give you a dictionary whose keys are the keys present in both specs and whose values are two-item lists holding the corresponding object from each spec.
intersect = {key: [o, spec2[key]] for key, o in spec1.iteritems()
             if key in spec2}
Edit:
If you are using Python 3, you must use items instead of iteritems:
intersect = {key: [o, spec2[key]] for key, o in spec1.items()
             if key in spec2}
Why don't you just iterate over spec1 and compare its values with spec2, like this (on Python 3, use items() instead of iteritems()):
x = {k: v for k, v in spec1.iteritems() if k in spec2 and spec2[k] == v}
You'll need a recursive solution:
json1 = {
"Car": {
"Brand": "Audi",
"Nationality": "Germany",
"Modelname": "A6"
},
"Engine": {
"cubic capacity": "2967",
"Enginetype": "V6",
"Fuel": "Diesel",
"MaxSpeed": "250"
},
"Colors": {
"Carcolor": "Black",
"Interiorrcolor": "white"
}
}
json2 = {
"Car": {
"Brand": "Audi",
"Nationality": "USA",
"Modelname": "A6"
},
"Engine": {
"cubic capacity": "2995",
"Enginetype": "V6",
"Fuel": "Petrol",
"MaxSpeed": "250"
},
"Colors": {
"Carcolor": "Black",
"Interiorrcolor": "Black"
}
}
def common_dict(d1, d2):
    output = {}
    for k in set(d1) & set(d2):
        o1, o2 = d1[k], d2[k]
        if isinstance(o1, dict) and isinstance(o2, dict):
            output[k] = common_dict(o1, o2)
        elif o1 == o2:
            output[k] = o1
    return output
print(common_dict(json1, json2))
# {'Engine': {'MaxSpeed': '250', 'Enginetype': 'V6'}, 'Car': {'Brand': 'Audi', 'Modelname': 'A6'}, 'Colors': {'Carcolor': 'Black'}}
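If a flat report is easier to read than a nested result, the same recursion can yield the path to every shared leaf instead. This is only a sketch for Python 3; the function name `common_leaves` is hypothetical, and the two inline JSON strings are shortened stand-ins for the example files.

```python
import json


def common_leaves(d1, d2, path=()):
    """Yield (path, value) for every leaf that has the same value in both dicts."""
    for key in d1:
        if key not in d2:
            continue
        v1, v2 = d1[key], d2[key]
        if isinstance(v1, dict) and isinstance(v2, dict):
            # Descend, extending the path with the current key.
            yield from common_leaves(v1, v2, path + (key,))
        elif v1 == v2:
            yield path + (key,), v1


# Shortened stand-ins for the two example files.
spec1 = json.loads('{"Car": {"Brand": "Audi", "Nationality": "Germany"}, "Colors": {"Carcolor": "Black"}}')
spec2 = json.loads('{"Car": {"Brand": "Audi", "Nationality": "USA"}, "Colors": {"Carcolor": "Black"}}')
shared = dict(common_leaves(spec1, spec2))
print(shared)
```

Each key of `shared` is a tuple such as ('Car', 'Brand'), so it is clear where in the structure a matching value was found.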

Check existence of a key recursively and append to array of dict

I've a dict as follows
{
"key1" : "value1",
"key2" : "value2",
"key3" : "value3",
"key4" : {
"key5" : "value5"
}
}
If the dict has key1==value1, I'll append the dict into a list.
Suppose key1==value1 is not present in the first key value pair, whereas it is inside nested dict as follows:
{
"key2" : "value2",
"key3" : "value3",
"key4" : {
"key5" : "value5",
"key1" : "value1",
"key6" : {
"key7" : "value7",
"key1" : "value1"
}
},
"key8" : {
"key9" : "value9",
"key10" : {
"key11" : "value11",
"key12" : "value12",
"key1" : "value1"
}
}
}
In the above dict, I first have to check whether key1=value1 is present. If not, I have to traverse the nested dicts, and if it is found in a nested dict, I have to append that dict to the list. If the nested dict itself contains nested dicts but key1=value1 is found at its top level, there is no need to check the inner dicts (e.g. key4 has key1=value1 at its top level, so there is no need to check the inner one even though key6 also has key1=value1).
So finally, I'll have the list as follows.
[
{
"key5" : "value5",
"key1" : "value1",
"key6" : {
"key7" : "value7",
"key1" : "value1"
}
},
{
"key11" : "value11",
"key12" : "value12",
"key1" : "value1"
}
]
How to achieve this?
Note: The depth of the dict may vary
If a dict contains key1 == value1, we add it to the list and stop.
If not, we go into every value of the dict that is itself a dict and apply the same logic.
l = []
def append_dict(d):
    if d.get("key1") == "value1":
        l.append(d)
        return
    for k, v in d.items():
        if isinstance(v, dict):
            append_dict(v)
append_dict(d)
print(l)
An iterative solution puts each dict that still has to be checked into a queue:
from Queue import Queue  # Python 2; on Python 3 use: from queue import Queue
q = Queue()
l = []
q.put(d)
while not q.empty():
    d = q.get()
    if d.get("key1") == "value1":
        l.append(d)
        continue
    for k, v in d.items():
        if isinstance(v, dict):
            q.put(v)
print(l)
As @shashank noted, using a stack instead of a queue also works.
The difference is BFS vs DFS when searching the dictionary.
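The queue and stack variants can share one loop if the pending dicts are kept in a `collections.deque`, which supports popping from either end. This is a sketch for Python 3; the function name `find_matching` and its parameters are hypothetical, and the sample data is the nested dict from the question.

```python
from collections import deque


def find_matching(root, key="key1", value="value1", breadth_first=True):
    """Collect the outermost dicts that contain key == value."""
    found = []
    pending = deque([root])
    while pending:
        # popleft() gives queue (BFS) order, pop() gives stack (DFS) order.
        current = pending.popleft() if breadth_first else pending.pop()
        if current.get(key) == value:
            found.append(current)
            continue  # do not descend into a dict that already matched
        pending.extend(v for v in current.values() if isinstance(v, dict))
    return found


data = {
    "key2": "value2",
    "key3": "value3",
    "key4": {"key5": "value5", "key1": "value1", "key6": {"key7": "value7", "key1": "value1"}},
    "key8": {"key9": "value9", "key10": {"key11": "value11", "key12": "value12", "key1": "value1"}},
}
bfs = find_matching(data)
dfs = find_matching(data, breadth_first=False)
print(bfs)
print(dfs)
```

Both runs find the same two dicts; only the order differs. `queue.Queue` is designed for passing items between threads, so for a single-threaded traversal like this a deque is enough.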
