How to find repeats or duplicates in nested dictionary?

I have a nested dictionary and I'm trying to find duplicates within it. For example, if I have:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
The return value would be something like:
True
because this dictionary contains duplicates.
I was able to do this quite easily with a regular dictionary, and I thought this would work well with this case too:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
rev_dictionary = {}
for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)
print(rev_dictionary)
for key,values in dictionary.items():
    if len(values) > 1:
        values = True
    else:
        values = False
which throws the following error:
TypeError: unhashable type: 'dict'
How can I get this working?
Thanks for the help!
Note: I'd prefer a solution without using libraries if possible

I am assuming you are defining duplicates by value and not by key. In that case, you can flatten the nested dict with a helper such as:
def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out
and then check whether the number of distinct values differs from the number of values:
v = flatten(dictionary).values()
len(set(v))!=len(v)
which evaluates to True for the sample dictionary.
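For comparison, here is a minimal library-free sketch of the same flatten-then-compare idea that walks the values with a generator instead of building a flattened dict. The names iter_leaf_values and has_duplicate_values are made up for this illustration, and the sketch assumes every leaf value is hashable.

```python
def iter_leaf_values(d):
    """Yield every non-dict value in a nested dict, depth first."""
    for value in d.values():
        if isinstance(value, dict):
            # Recurse into sub-dicts instead of yielding them.
            yield from iter_leaf_values(value)
        else:
            yield value


def has_duplicate_values(d):
    """Return True if any leaf value occurs more than once."""
    leaves = list(iter_leaf_values(d))
    return len(set(leaves)) != len(leaves)


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(has_duplicate_values(dictionary))  # True: the value 3 appears twice
```

Because only the values are collected, the key-joining step of the flattening helper is not needed for a plain yes/no answer.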

I wrote a simple solution:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
def get_dups(a, values=None):
    if values is None: values = []
    if (a in values): return True
    values.append(a)
    if isinstance(a, dict):
        for i in a.values():
            if (get_dups(i, values=values)):
                return True
    return False
print(get_dups(dictionary))
How it works
We start by saving every value in a list, which we pass along to each recursive call.
On each call we check whether the current value is already in that list, and return True as soon as there is a duplicate:
if (a in values): return True
Next, if the current value is itself a dictionary, we loop through its values and run get_dups on each of them.

You can recursively add item values of sub-dicts to a set, and if any item value is already "seen" in the set, raise an exception so that the wrapper can return True to indicate that a dupe is found:
def has_dupes(d):
    def values(d):
        seen = set()
        for k, v in d.items():
            if isinstance(v, dict):
                s = values(v)
                if seen & s:
                    raise RuntimeError()
                seen.update(s)
            else:
                if v in seen:
                    raise RuntimeError()
                seen.add(v)
        return seen
    try:
        values(d)
    except RuntimeError:
        return True
    return False
so that given your sample input, has_dupes(dictionary) would return: True
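If it also matters where the duplicates sit, the same kind of recursive walk can record the key path of every value. The following is only a sketch under the assumption that leaf values are hashable; find_duplicate_paths is a hypothetical name, not part of any answer above.

```python
def find_duplicate_paths(d):
    """Map each repeated leaf value to the key paths where it occurs."""
    paths_by_value = {}

    def walk(node, path):
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value, path + (key,))
            else:
                paths_by_value.setdefault(value, []).append(path + (key,))

    walk(d, ())
    # Keep only the values that were seen at more than one path.
    return {v: p for v, p in paths_by_value.items() if len(p) > 1}


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(find_duplicate_paths(dictionary))
# {3: [('hello',), ('world', 'is', 'a')]}
```

An empty result means there are no duplicates, so bool(find_duplicate_paths(d)) gives the same True/False answer as has_dupes.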

I think all you need is to flatten the dict before passing it to your duplicate-detection pipeline:
import pandas as pd
def flatten_dict(d):
    # pandas 1.0+ exposes this function as pd.json_normalize; the pd.io.json path is deprecated there.
    df = pd.io.json.json_normalize(d, sep='_')
    return df.to_dict(orient='records')[0]
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
dictionary = flatten_dict(dictionary)
print('flattened')
print(dictionary)
rev_dictionary = {}
for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)
print('reversed')
print(rev_dictionary)
is_duplicate = False
for key, values in rev_dictionary.items():
    if len(values) > 1:
        is_duplicate = True
        break
print('is duplicate?', is_duplicate)
The result:
flattened
{'hello': 3, 'world_is_a': 3, 'world_is_dict': None, 'world_this': 5}
reversed
{3: {'hello', 'world_is_a'}, None: {'world_is_dict'}, 5: {'world_this'}}
is duplicate? True
Code for flattening a dict borrowed from: Flatten nested Python dictionaries, compressing keys.

Convert the nested dictionary into nested lists of their values:
def nested_values(v):
    # A list comprehension (rather than map) so the result is a real list on Python 3 too.
    return [nested_values(x) for x in v.values()] if isinstance(v, dict) else v
Then flatten the nested lists into one list of all values in the dictionaries,
and then check the flattened list of values for duplicates:
def flatten(nested):
    # Recurse into sub-lists only, so string values are not split into characters.
    for item in nested:
        if isinstance(item, list):
            for sub in flatten(item):
                yield sub
        else:
            yield item
def is_duplicated_value(d):
    flat = list(flatten(nested_values(d)))
    return len(flat) != len(set(flat))
Test:
print(is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} ))
print(is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} ))
Outputs:
True
False
Depending on your use case and the size of the dictionaries, you may want to recast these steps into a single recursive function that adds each value to a set, checking first whether the value is already there. It returns True as soon as a duplicate is found, or False once the dictionary is exhausted.
class Duplicated(ValueError): pass
def is_dup(d):
    values = set()
    def add(v):
        if isinstance(v, dict):
            # An explicit loop: map() is lazy on Python 3 and would never call add.
            for sub in v.values():
                add(sub)
        else:
            if v in values:
                raise Duplicated
            else:
                values.add(v)
    try:
        add(d)
        return False
    except Duplicated:
        return True
Test:
print(is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} ))
print(is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} ))
Outputs:
True
False

Related

Unable to return value from a recursive function

I'm trying to get the value of a key in a nested dictionary regardless of how deep it sits, but the function always returns None.
def nested(d,key):
    for i in d.keys():
        if i == key:
            return d[i]
        elif isinstance(d[i], dict):
            nested(d[i],key)
j = {'hello': {'foo': {'bar': {'world':'yeay'}}}}
print(nested(j,'world'))
The expected answer is yeay, of course. What am I missing?
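A likely cause is that the result of the recursive call is discarded, so the outer call falls off the end of the loop and returns None. The sketch below returns the recursive result only when something was found, so the search still continues through sibling keys. It assumes that None is never stored as a legitimate value.

```python
def nested(d, key):
    for k, v in d.items():
        if k == key:
            return v
        if isinstance(v, dict):
            found = nested(v, key)
            # Propagate a hit upward; otherwise keep checking sibling keys.
            if found is not None:
                return found
    return None


j = {'hello': {'foo': {'bar': {'world': 'yeay'}}}}
print(nested(j, 'world'))  # yeay
```

A plain `return nested(v, key)` would also fix the sample input, but it would stop at the first sub-dictionary even when the key lives in a later one.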

search for any string value in an ordered dictionary

I have a nested ordered dictionary, i.e. an ordered dictionary some of whose values are ordered dictionaries that contain further ordered dictionaries.
When I use the collections module and use the following call
yOrDict = yaml_ordered_dict.load('ApplyForm.yml')
then I get an ordered dictionary yOrDict from the yml file ApplyForm.yml.
Say that yOrDict has just one key (ApplyForm) and a value which in turn is an ordered dictionary.
A further call gives another ordered dictionary:
yOrDictLevel1 = yOrDict.get('ApplyForm')
Say that yOrDictLevel1 has six keys whose values contain ordered dictionaries, and that somewhere one of those ordered dictionaries has a value named Block.
Is there a call/function/method in Python with which I can check whether Block appears anywhere in the top-level ordered dictionary yOrDict?
In other words, I want to check whether the string "Block" occurs at all in yOrDict.

This should do the trick.
def find_value(needle, container):
    # Already found the object. Return.
    if needle == container:
        return True
    values = None
    if isinstance(container, dict):
        values = container.values()
    # Strings are iterable on Python 3; skip them to avoid endless recursion.
    elif hasattr(container, '__iter__') and not isinstance(container, str):
        values = iter(container)
    if values is None:
        return False
    # Check deeper in the container, if needed.
    for val in values:
        if find_value(needle, val):
            return True
    # No match found.
    return False
Usage:
In [3]: d = { 'test': ['a', 'b', 'c'], 'd1': { 'd2': 'Block', 'a': 'b'} }
In [4]: find_value('Block', d)
Out[4]: True
Edit: testing whether a value contains needle:
def find_value(needle, container):
    # Already found the object. Return.
    # (On Python 2, test against basestring instead of str.)
    if isinstance(container, str) and needle in container:
        return True
    values = None
    if isinstance(container, dict):
        values = container.values()
    # Strings are iterable on Python 3; skip them to avoid endless recursion.
    elif hasattr(container, '__iter__') and not isinstance(container, str):
        values = iter(container)
    if values is None:
        return False
    # Check deeper in the container, if needed.
    for val in values:
        if find_value(needle, val):
            return True
    # No match found.
    return False
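If the location of the match matters as well, the same recursion can carry the path along. This is only a sketch: find_path is a hypothetical helper, and it assumes the containers are dicts, lists or tuples (OrderedDict is a dict subclass, so it is covered).

```python
def find_path(needle, container, path=()):
    """Return the tuple of keys/indexes leading to needle, or None."""
    if container == needle:
        return path
    if isinstance(container, dict):
        children = container.items()
    elif isinstance(container, (list, tuple)):
        children = enumerate(container)
    else:
        # A scalar that did not match: nothing further to search.
        return None
    for key, child in children:
        found = find_path(needle, child, path + (key,))
        if found is not None:
            return found
    return None


d = {'test': ['a', 'b', 'c'], 'd1': {'d2': 'Block', 'a': 'b'}}
print(find_path('Block', d))  # ('d1', 'd2')
print(find_path('c', d))      # ('test', 2)
```

Checking isinstance against concrete container types, rather than looking for `__iter__`, sidesteps the question of whether strings count as iterable.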

if "Block" in yOrDict.get('ApplyForm').values():
    print("Found Block value")
It isn't explicitly a method of the higher-level meta-dictionary, since it does a get and searches through the values, but it's concise and does the trick.
Edit, in response to your comment:
if any("Beijing" in val for val in yOrDict.get('ApplyForm').values()):
    print("Found Beijing")

return key by value in dictionary [duplicate]

This question already has answers here:
Inverse dictionary lookup in Python
(13 answers)
Closed 9 years ago.
I am trying to return the key in a dictionary given a value.
In this case, if 'b' is in the dictionary, I want it to return the key at which 'b' is stored (i.e. 2).
def find_key(input_dict, value):
    if value in input_dict.values():
        return UNKNOWN #This is a placeholder
    else:
        return "None"
print(find_key({1:'a', 2:'b', 3:'c', 4:'d'}, 'b'))
The answer I want to get is the key 2, but I am unsure what to put in place of the placeholder. Any help would be much appreciated.

Return first matching key:
def find_key(input_dict, value):
    return next((k for k, v in input_dict.items() if v == value), None)
Return all matching keys as a set:
def find_key(input_dict, value):
    return {k for k, v in input_dict.items() if v == value}
Values in a dictionary are not necessarily unique. The first option returns None if there is no match; the second returns an empty set in that case.
Before Python 3.7 the order of dictionaries was arbitrary (dependent on what keys were used and the insertion and deletion history), so what counted as the 'first' key was arbitrary too. From Python 3.7 on, dictionaries iterate in insertion order, so 'first' means the earliest-inserted matching key.
Demo:
>>> def find_key(input_dict, value):
...     return next((k for k, v in input_dict.items() if v == value), None)
...
>>> find_key({1:'a', 2:'b', 3:'c', 4:'d'}, 'b')
2
>>> find_key({1:'a', 2:'b', 3:'c', 4:'d'}, 'z') is None
True
>>> def find_key(input_dict, value):
...     return {k for k, v in input_dict.items() if v == value}
...
>>> find_key({1:'a', 2:'b', 3:'c', 4:'d'}, 'b')
set([2])
>>> find_key({1:'a', 2:'b', 3:'c', 4:'d', 5:'b'}, 'b')
set([2, 5])
>>> find_key({1:'a', 2:'b', 3:'c', 4:'d'}, 'z')
set([])
Note that we need to loop over all the items each time we search for matching keys. This is not the most efficient way to go about this, especially if you need to match values to keys often. In that case, create a reverse index:
from collections import defaultdict
values_to_keys = defaultdict(set)
for key, value in input_dict.items():
    values_to_keys[value].add(key)
Now you can ask for the set of keys directly in O(1) (constant) time:
keys = values_to_keys.get(value)
This uses sets: the keys are unique and their relative order carries no meaning for this lookup, so sets make a little more sense here.
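The reverse index can be tried out end to end with a small self-contained sketch. The wrapper name build_reverse_index is made up for this illustration, and all values are assumed to be hashable.

```python
from collections import defaultdict


def build_reverse_index(input_dict):
    """Map every value to the set of keys that hold it."""
    values_to_keys = defaultdict(set)
    for key, value in input_dict.items():
        values_to_keys[value].add(key)
    return values_to_keys


index = build_reverse_index({1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'b'})
print(index.get('b'))         # {2, 5}
print(index.get('z', set()))  # set()
```

Looking values up with .get() rather than index['z'] matters with a defaultdict: subscripting a missing value would silently add an empty set to the index.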

Amend your function as follows:
def find_key_for(input_dict, value):
    for k, v in input_dict.items():
        if v == value:
            yield k
Then to get the first key (or None if not present):
print(next(find_key_for(your_dict, 'b'), None))
To get all matching keys:
keys = list(find_key_for(your_dict, 'b'))
Or, to get at most 'n' keys (here 5):
from itertools import islice
keys = list(islice(find_key_for(your_dict, 'b'), 5))
Note that the keys you get are the first 'n' in the order the dictionary is iterated.
If you're doing this more often than not (and your values are hashable), then you may wish to transpose your dict:
from collections import defaultdict
dd = defaultdict(list)
for k, v in d.items():
    dd[v].append(k)
print(dd['b'])

Python looping over dictionary to substitute keys with the values from another dictionary

I have a dictionary, and I want to change all the keys in that dictionary to the values in another dictionary.
For example:
X = {"apple" : 42}
Y = {"apple" : "Apples"}
After converting:
X = {"Apples" : 42}
def convert(items, ID):
    for key, value in items.items():
        for keys, values in ID.items():
            if keys == key:
                key = values
    return items
I've written the above code in order to do this; however, after executing this function, I print the dictionary and the keys have not changed.

This is because you're assigning new values to local variables, not assigning new keys in the dictionary.
To get the desired result, I would suggest doing what others have already suggested and creating a new dictionary, setting each value explicitly by dictionary assignment:
def convert(X, Y):
    new_dict = {}
    for x_key, x_value in X.items():
        for y_key, y_value in Y.items():
            if x_key == y_key:
                new_dict[y_value] = x_value
    return new_dict

Inside your first loop, you're iterating over the (key, value) pairs. Changing the value of the key variable will not update it in the dictionary.
What you have to do instead is assign the value to the new key (values) and del the old key. This example works on a copy, so it doesn't modify the original dict in place. I've also removed the inner for loop, since in Python you can just check whether a key is in a dict, without iterating over all of them, by using if key in dictionary.
def convert(items, id):
    new_dict = items.copy()
    for key, value in items.items():
        if key in id:
            new_key = id[key]
            new_dict[new_key] = items[key] # Copy the value
            del new_dict[key]
    return new_dict
Example ipython session:
In [1]: items = {'apple': 42, 'orange': 17}
In [2]: new_keys = {'apple': 'banana', 'orange': 'tangerine'}
In [3]: def convert(items, ID):
...
In [13]: convert(items, new_keys)
Out[13]: {'banana': 42, 'tangerine': 17} # Updated dict returned
In [14]: items
Out[14]: {'apple': 42, 'orange': 17} # Original dict stays untouched

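The loop variable key is just a local name bound to each of the dictionary's keys in turn (on Python 2, items.items() even builds a separate list of (key, value) pairs).
Thus, when you assign to key, you're rebinding that name, not changing the original dictionary. To rename a key, add the new key and remove the old one:
def convert(items, ID):
    # Iterate over a snapshot of the keys so the dict can be changed safely.
    for key in list(items):
        if key in ID:
            items[ID[key]] = items.pop(key)
    return items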

Approach
Calculate the shared keys using set intersection.
Code for pre-2.7
def convert(items, ID):
    # Find the shared keys
    dst, src = set(items.keys()), set(ID.keys())
    same_keys, diff_keys = dst.intersection(src), dst.difference(src)
    # Make a new dictionary using the shared keys
    new_values = [(ID[key], items[key]) for key in same_keys]
    old_values = [(key, items[key]) for key in diff_keys]
    return dict(new_values + old_values)
Code for 2.7+
from functools import reduce  # reduce is not a builtin on Python 3
def convert(items, ID):
    # Find the shared keys
    dst, src = set(items.keys()), set(ID.keys())
    same_keys, diff_keys = dst.intersection(src), dst.difference(src)
    # Make a new dictionary using the shared keys
    new_values = {ID[key]: items[key] for key in same_keys}
    old_values = {key: items[key] for key in diff_keys}
    return reduce(lambda dst, src: dst.update(src) or dst, [new_values, old_values], {})
Test for pre-2.7
>>> def convert(items, ID):
...     # Find the shared keys
...     dst, src = set(items.keys()), set(ID.keys())
...     same_keys, diff_keys = dst.intersection(src), dst.difference(src)
...     # Make a new dictionary using the shared keys
...     new_values = [(ID[key], items[key]) for key in same_keys]
...     old_values = [(key, items[key]) for key in diff_keys]
...     return dict(new_values + old_values)
...
>>> convert({"apple" : 42, "pear": 38}, {"apple" : "Apples", "peach": 31})
{'pear': 38, 'Apples': 42}
Test for 2.7+
>>> from functools import reduce  # reduce is not a builtin on Python 3
>>> def convert(items, ID):
...     # Find the shared keys
...     dst, src = set(items.keys()), set(ID.keys())
...     same_keys, diff_keys = dst.intersection(src), dst.difference(src)
...     # Make a new dictionary using the shared keys
...     new_values = {ID[key]: items[key] for key in same_keys}
...     old_values = {key: items[key] for key in diff_keys}
...     return reduce(lambda dst, src: dst.update(src) or dst, [new_values, old_values], {})
...
>>> convert({"apple" : 42, "pear": 38}, {"apple" : "Apples", "peach": 31})
{'pear': 38, 'Apples': 42}
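On Python 3 the explicit set() conversions are not strictly needed, because dictionary key views already support set operations such as & and -. A minimal sketch of the same approach under that assumption (Python 3 only) could look like this:

```python
def convert(items, ID):
    shared = items.keys() & ID.keys()     # keys that get a new name
    untouched = items.keys() - ID.keys()  # keys that stay as they are
    result = {ID[key]: items[key] for key in shared}
    result.update({key: items[key] for key in untouched})
    return result


print(convert({"apple": 42, "pear": 38}, {"apple": "Apples", "peach": 31}))
# {'Apples': 42, 'pear': 38}
```

Both operations return ordinary sets, so the rest of the logic is unchanged from the versions above.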

Is there a reason why you need to modify the existing dictionary rather than just create a new dictionary?
To do this same task by creating a new dictionary, try this:
def convert(items, ID):
    result = {}
    for key, value in items.items():
        if key in ID.keys():
            result[ID[key]] = value
        else:
            result[key] = value
    return result
If you really do want to modify the original dictionary, you'll want to create a temporary new dictionary anyway, then fill the original with the contents of the new dictionary, like this:
def convert(items, ID):
    result = {}
    for key, value in items.items():
        if key in ID.keys():
            result[ID[key]] = value
        else:
            result[key] = value
    items.clear()
    for key, value in result.items():
        items[key] = value
    return items
If you don't do it this way then you have to worry about overwriting values, i.e. renaming a key to something that is already there. Here's an example of what I mean:
items = {"apples": 10, "bananas": 15}
ID = {"apples": "bananas", "bananas": "oranges"}
convert(items, ID)
print(items)
I assume that the behavior you want is to end up with {"bananas": 10, "oranges": 15}. What if it first renames "apples" to "bananas"? Then you have {"bananas": 10}, which will then become {"oranges": 10}.
The worst part about this is that it entirely depends on the order in which Python iterates through the keys, which depends on the order in which you added them in the first place. If this ever changes in a future version of Python, then the behavior of your program could change, which is something you DEFINITELY want to avoid.
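The hazard is easy to reproduce. In the sketch below, rename_in_place_naively and rename_via_new_dict are hypothetical names used only for this comparison; the first renames keys one at a time inside the original dictionary, the second computes every new key from the original key.

```python
def rename_in_place_naively(items, ID):
    # Renames one key at a time; a renamed key can overwrite an existing
    # key or be renamed again by a later step.
    for key in list(items):
        if key in ID:
            items[ID[key]] = items.pop(key)
    return items


def rename_via_new_dict(items, ID):
    # Every new key is derived from the original key, so the order in
    # which keys are visited cannot change the result.
    return {ID.get(key, key): value for key, value in items.items()}


ID = {"apples": "bananas", "bananas": "oranges"}
print(rename_in_place_naively({"apples": 10, "bananas": 15}, ID))
# {'oranges': 10}
print(rename_via_new_dict({"apples": 10, "bananas": 15}, ID))
# {'bananas': 10, 'oranges': 15}
```

The naive version loses the value 15 entirely, which is the overwrite scenario described above.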

Your problem is that you were working on local copies of the dictionary's keys and values.
If the goal is to return a new dictionary, this does it in a single line:
def convert(x,y):
    return dict( (y.get(k,k), x[k]) for k in x )
x={'a':10, 'b':5}
y={'a':'A'}
print(convert(x,y))
And in Python 2.7+ you can even use a dict comprehension:
def convert(x,y):
    return { y.get(k,k): x[k] for k in x }
But if you want to modify the input dictionary in place:
def convert(x,y):
    r={ y.get(k,k): x[k] for k in x }
    x.clear()  # empty the original before refilling it
    x.update(r)
x={'a':10, 'b':5}
y={'a':'A'}
convert(x,y)
print(x)

How to recursively replace character in keys of a nested dictionary?

I'm trying to create a generic function that replaces dots in the keys of a nested dictionary. I have a non-generic function that goes 3 levels deep, but there must be a way to do this generically. Any help is appreciated! My code so far:
output = {'key1': {'key2': 'value2', 'key3': {'key4 with a .': 'value4', 'key5 with a .': 'value5'}}}
def print_dict(d):
    new = {}
    for key,value in d.items():
        new[key.replace(".", "-")] = {}
        if isinstance(value, dict):
            for key2, value2 in value.items():
                new[key][key2] = {}
                if isinstance(value2, dict):
                    for key3, value3 in value2.items():
                        new[key][key2][key3.replace(".", "-")] = value3
                else:
                    new[key][key2.replace(".", "-")] = value2
        else:
            new[key] = value
    return new
print(print_dict(output))
UPDATE: to answer my own question, I made a solution using json object_hooks:
import json
def remove_dots(obj):
    # list() takes a snapshot of the keys so entries can be added and deleted inside the loop.
    for key in list(obj.keys()):
        new_key = key.replace(".","-")
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj
output = {'key1': {'key2': 'value2', 'key3': {'key4 with a .': 'value4', 'key5 with a .': 'value5'}}}
new_json = json.loads(json.dumps(output), object_hook=remove_dots)
print(new_json)

Yes, there is a better way:
def print_dict(d):
    new = {}
    for k, v in d.items():
        if isinstance(v, dict):
            v = print_dict(v)
        new[k.replace('.', '-')] = v
    return new
(Edit: It's recursion, more on Wikipedia.)

Actually, all of the answers contain a mistake that may lead to wrong types in the result.
I'd take the answer of @ngenain and improve it a bit below.
My solution takes care of the types derived from dict (OrderedDict, defaultdict, etc.) and handles not only list, but also set and tuple types.
I also do a simple type check at the beginning of the function for the most common types to reduce the number of comparisons (which may give a bit of speed on large amounts of data).
Works for Python 3. Replace obj.items() with obj.iteritems() for Py2.
def change_keys(obj, convert):
    """
    Recursively goes through the dictionary obj and replaces keys with the convert function.
    """
    if isinstance(obj, (str, int, float)):
        return obj
    if isinstance(obj, dict):
        new = obj.__class__()
        for k, v in obj.items():
            new[convert(k)] = change_keys(v, convert)
    elif isinstance(obj, (list, set, tuple)):
        new = obj.__class__(change_keys(v, convert) for v in obj)
    else:
        return obj
    return new
If I understand the need correctly, most users want to convert the keys in order to use them with MongoDB, which does not allow dots in key names.

I used the code by @horejsek, but I adapted it to accept nested dictionaries with lists and a function that replaces the string.
I had a similar problem to solve: I wanted to convert keys from the lowercase underscore convention to the camel case convention and vice versa.
def change_dict_naming_convention(d, convert_function):
    """
    Convert a nested dictionary from one convention to another.
    Args:
        d (dict): dictionary (nested or not) to be converted.
        convert_function (func): function that takes the string in one convention and returns it in the other one.
    Returns:
        Dictionary with the new keys.
    """
    new = {}
    for k, v in d.items():
        new_v = v
        if isinstance(v, dict):
            new_v = change_dict_naming_convention(v, convert_function)
        elif isinstance(v, list):
            new_v = list()
            for x in v:
                # Only dicts are converted; other list items are kept as they are.
                new_v.append(change_dict_naming_convention(x, convert_function) if isinstance(x, dict) else x)
        new[convert_function(k)] = new_v
    return new

Here's a simple recursive solution that deals with nested lists and dictionaries.
def change_keys(obj, convert):
    """
    Recursively goes through the dictionary obj and replaces keys with the convert function.
    """
    if isinstance(obj, dict):
        new = {}
        for k, v in obj.items():
            new[convert(k)] = change_keys(v, convert)
    elif isinstance(obj, list):
        new = []
        for v in obj:
            new.append(change_keys(v, convert))
    else:
        return obj
    return new

You have to remove the original key, but you can't do it while iterating over the dictionary itself because it will throw RuntimeError: dictionary changed size during iteration.
To solve this, iterate through a copy of the keys, but modify the original object:
def change_keys(obj):
    # list(obj) is a snapshot of the keys; obj itself is modified below.
    for k in list(obj):
        if isinstance(obj[k], dict):
            change_keys(obj[k])
        if '.' in k:
            obj[k.replace('.', '$')] = obj[k]
            del obj[k]
>>> foo = {'foo': {'bar': {'baz.121': 1}}}
>>> change_keys(foo)
>>> foo
{'foo': {'bar': {'baz$121': 1}}}

You can dump everything to a JSON string, run the replacement over the whole string, and load the JSON back:
import json
def nested_replace(data, old, new):
    json_string = json.dumps(data)
    replaced = json_string.replace(old, new)
    fixed_json = json.loads(replaced)
    return fixed_json
Or use a one-liner:
def short_replace(data, old, new):
    return json.loads(json.dumps(data).replace(old, new))
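One thing to keep in mind with the whole-string replacement is that it cannot tell keys from values, so matching characters inside values are rewritten too. The sketch below contrasts it with a keys-only variant built on the object_hook parameter of json.loads; replace_in_keys is a hypothetical name, and the data is assumed to be JSON-serializable with string keys.

```python
import json


def short_replace(data, old, new):
    # Replaces every occurrence in the serialized text, keys and values alike.
    return json.loads(json.dumps(data).replace(old, new))


def replace_in_keys(data, old, new):
    # object_hook is called with each decoded JSON object (a dict),
    # and its return value is used in place of that dict.
    def hook(obj):
        return {key.replace(old, new): value for key, value in obj.items()}
    return json.loads(json.dumps(data), object_hook=hook)


data = {'a.b': {'c.d': 'v1.0'}}
print(short_replace(data, '.', '-'))    # {'a-b': {'c-d': 'v1-0'}}
print(replace_in_keys(data, '.', '-'))  # {'a-b': {'c-d': 'v1.0'}}
```

Which behavior is wanted depends on the data; for version strings, file names or numbers with a decimal point stored as text, the keys-only variant is the safer choice.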

jllopezpino's answer works, but only when the top-level object is a dictionary; mine works whether the original variable is a list or a dict.
import re
def fix_camel_cases(data):
    def convert(name):
        # https://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case
        s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
        return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
    if isinstance(data, dict):
        new_dict = {}
        for key, value in data.items():
            value = fix_camel_cases(value)
            snake_key = convert(key)
            new_dict[snake_key] = value
        return new_dict
    if isinstance(data, list):
        new_list = []
        for value in data:
            new_list.append(fix_camel_cases(value))
        return new_list
    return data

Here's a 1-liner variant of @horejsek's answer using a dict comprehension, for those who prefer:
def print_dict(d):
    return {k.replace('.', '-'): print_dict(v) for k, v in d.items()} if isinstance(d, dict) else d
I've only tested this in Python 2.7

I am guessing you have the same issue as I have, inserting dictionaries into a MongoDB collection, encountering exceptions when trying to insert dictionaries that have keys with dots (.) in them.
This solution is essentially the same as most other answers here, but it is slightly more compact, and perhaps less readable in that it uses a single statement and calls itself recursively. For Python 3.
def replace_keys(my_dict):
    return { k.replace('.', '(dot)'): replace_keys(v) if type(v) == dict else v for k, v in my_dict.items() }
