search for any string value in an ordered dictionary

I have a nested ordered dictionary, i.e. an ordered dictionary some of whose values are ordered dictionaries that in turn contain further ordered dictionaries.
When I use the collections module and make the following call
yOrDict = yaml_ordered_dict.load('ApplyForm.yml')
I get an ordered dictionary yOrDict from the YAML file ApplyForm.yml.
Say that yOrDict has just one key (ApplyForm), whose value is in turn an ordered dictionary.
A further call gives another ordered dictionary:
yOrDictLevel1 = yOrDict.get('ApplyForm')
Say that yOrDictLevel1 has six keys whose values contain ordered dictionaries, and that somewhere one of those ordered dictionaries has a value named Block.
Is there a call, function, or method in Python by which I can check whether Block appears anywhere in the top-level ordered dictionary yOrDict?
In other words, I want to check whether the string "Block" occurs at all in yOrDict.

This should do the trick.
def find_value(needle, container):
    # Already found the object. Return.
    if needle == container:
        return True
    # Strings are leaves; iterating into them would recurse forever on Python 3.
    if isinstance(container, str):
        return False
    values = None
    if isinstance(container, dict):
        values = container.values()
    elif hasattr(container, '__iter__'):
        values = iter(container)
    if values is None:
        return False
    # Check deeper in the container, if needed.
    for val in values:
        if find_value(needle, val):
            return True
    # No match found.
    return False
Usage:
In [3]: d = { 'test': ['a', 'b', 'c'], 'd1': { 'd2': 'Block', 'a': 'b'} }
In [4]: find_value('Block', d)
Out[4]: True
Edit: testing whether a value contains needle:
def find_value(needle, container):
    # Strings are leaves: test for a substring match and stop here.
    # (On Python 2, test against basestring instead of str.)
    if isinstance(container, str):
        return needle in container
    values = None
    if isinstance(container, dict):
        values = container.values()
    elif hasattr(container, '__iter__'):
        values = iter(container)
    if values is None:
        return False
    # Check deeper in the container, if needed.
    for val in values:
        if find_value(needle, val):
            return True
    # No match found.
    return False
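
As a hedged sketch of how the same recursion could also report where the match sits, the variant below yields the key path to every string value that contains the needle. It assumes Python 3; the name find_paths and the sample data are illustrative and not part of the original answer.

```python
from collections import OrderedDict


def find_paths(needle, container, path=()):
    """Yield the key/index path to every string that contains needle."""
    if isinstance(container, str):
        # Strings are leaves; never iterate into them.
        if needle in container:
            yield path
    elif isinstance(container, dict):
        for key, value in container.items():
            yield from find_paths(needle, value, path + (key,))
    elif isinstance(container, (list, tuple)):
        for index, value in enumerate(container):
            yield from find_paths(needle, value, path + (index,))


# Hypothetical stand-in for the parsed ApplyForm.yml document.
form = OrderedDict([
    ("ApplyForm", OrderedDict([
        ("header", OrderedDict([("kind", "Block")])),
        ("rows", ["a", "b"]),
    ])),
])

paths = list(find_paths("Block", form))
print(paths)
```

An empty result means the string does not occur anywhere; any(find_paths(...)) gives the plain yes/no answer asked for in the question.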

if "Block" in yOrDict.get('ApplyForm').values():
    print("Found Block value")
It isn't explicitly a method of the higher-level dictionary, since it does a get and then searches the values one level down, but it's concise and does the trick.
Edit, in response to your comment:
if any("Beijing" in val for key, val in yOrDict.get('ApplyForm').items()):
    print("Found Beijing")
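
Note that a bare generator expression used as an if condition is always truthy, so a containment test over several values needs any(). A minimal sketch of the difference, using made-up city names:

```python
values = ["Shanghai", "Tokyo"]

# A generator expression is an object, so it is truthy even if nothing matches.
always_true = bool("Beijing" in v for v in values)

# any() consumes the generator and reports whether some element matched.
really_found = any("Beijing" in v for v in values)

print(always_true, really_found)
```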

Related

Converting Keys and Values of the same length into Dictionary form

I have a function with two arguments:
keys (e.g. ['a','b','c'])
values to be put into a dictionary (e.g. [1,2,3])
If the lists are the same length, I put them in dictionary form and return the dictionary. Otherwise, the function returns None.
def dict_build(keys,values):
    keys = ['']
    values = []
    dictionary = dict(zip(keys, values))
    if len(keys) == len(values):
        return dictionary
    else:
        return None
print(dict_build(['a','b','c'],[1,2,3])== { 'a': 1,'b': 2,'c': 3 })
# False
The output here should be True as the lists have the same length. Where am I going wrong here?

Your function immediately overwrites the keys and values parameters (with [''] and []), so their lengths never match and the function always returns None, whatever you pass in.
Just deleting the first two lines of the function should fix it. You could also shorten it further to just:
def dict_build(keys,values):
    if len(keys) == len(values):
        return dict(zip(keys, values))
    else:
        return None
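
The length check matters because zip() stops at the shorter input without complaining. A small sketch of that behaviour next to the corrected function (the sample lists are only illustrative):

```python
def dict_build(keys, values):
    """Return a dict pairing keys with values, or None on a length mismatch."""
    if len(keys) != len(values):
        return None
    return dict(zip(keys, values))


# zip() on its own stops at the shorter input and drops the rest.
truncated = dict(zip(["a", "b", "c"], [1, 2]))
print(truncated)

print(dict_build(["a", "b", "c"], [1, 2, 3]))
print(dict_build(["a", "b", "c"], [1, 2]))
```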

How to find repeats or duplicates in nested dictionary?

I have a nested dictionary and I'm trying to find duplicates within it. For example, if I have:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
The return value would be something like:
True
because this dictionary contains duplicates.
I was able to do this quite easily with a regular dictionary, and I thought this would work well with this case too:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
rev_dictionary = {}
for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)
print(rev_dictionary)
for key,values in dictionary.items():
    if len(values) > 1:
        values = True
    else:
        values = False
which throws the following error:
TypeError: unhashable type: 'dict'
How can I get this working?
Thanks for the help!
Note: I'd prefer a solution without using libraries if possible

I am assuming you define duplicates by value and not by key. In that case, you can flatten the nested dict with a helper like this:
def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out
and then check the condition
v = list(flatten(dictionary).values())
len(set(v)) != len(v)
which results in True.

I wrote a simple solution:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
def get_dups(a, values=None):
    if values is None: values = []
    if (a in values): return True
    values.append(a)
    if type(a) == dict:
        for i in a.values():
            if (get_dups(i, values=values)):
                return True
    return False
print(get_dups(dictionary))
How it works
We keep every value seen so far in a list that is passed along through the recursive calls.
On each call we check whether the current value is already in that list, and return True as soon as there is a duplicate:
    if (a in values): return True
Then, if the current value is itself a dictionary, we loop through its values and run get_dups on each of them.
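
A list-based membership test scans every previously seen value on each call. If all leaf values are hashable, a set gives the same answer with constant-time lookups. The sketch below assumes hashable leaves and counts only leaf values (not whole sub-dictionaries) as duplicates; has_duplicate_leaf is a name chosen for illustration.

```python
def has_duplicate_leaf(d, seen=None):
    """Return True if any non-dict value occurs more than once in d."""
    if seen is None:
        seen = set()
    for value in d.values():
        if isinstance(value, dict):
            # Share the same set across all nesting levels.
            if has_duplicate_leaf(value, seen):
                return True
        elif value in seen:
            return True
        else:
            seen.add(value)
    return False


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(has_duplicate_leaf(dictionary))
```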

You can recursively add item values of sub-dicts to a set, and if any item value is already "seen" in the set, raise an exception so that the wrapper can return True to indicate that a dupe is found:
def has_dupes(d):
    def values(d):
        seen = set()
        for k, v in d.items():
            if isinstance(v, dict):
                s = values(v)
                if seen & s:
                    raise RuntimeError()
                seen.update(s)
            else:
                if v in seen:
                    raise RuntimeError()
                seen.add(v)
        return seen
    try:
        values(d)
    except RuntimeError:
        return True
    return False
so that given your sample input, has_dupes(dictionary) would return: True

I think all you need is to flatten the dict before passing to your duplication-detection pipeline:
import pandas as pd
def flatten_dict(d):
    # pd.json_normalize is the public name in pandas 1.0+;
    # older versions exposed it as pd.io.json.json_normalize.
    df = pd.json_normalize(d, sep='_')
    return df.to_dict(orient='records')[0]
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
dictionary = flatten_dict(dictionary)
print('flattened')
print(dictionary)
rev_dictionary = {}
for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)
print('reversed')
print(rev_dictionary)
is_duplicate = False
for key, values in rev_dictionary.items():
    if len(values) > 1:
        is_duplicate = True
        break
print('is duplicate?', is_duplicate)
The result:
flattened
{'hello': 3, 'world_is_a': 3, 'world_is_dict': None, 'world_this': 5}
reversed
{3: {'hello', 'world_is_a'}, None: {'world_is_dict'}, 5: {'world_this'}}
is duplicate? True
Code for flattening a dict borrowed from: Flatten nested Python dictionaries, compressing keys.

Walk the nested dictionary recursively and yield every non-dict value:
def nested_values(v):
    for value in v.values():
        if isinstance(value, dict):
            yield from nested_values(value)
        else:
            yield value
Then collect the values into one flat list,
and check the flattened list of values for duplicates:
def is_duplicated_value(d):
    flat = list(nested_values(d))
    return len(flat) != len(set(flat))
Test:
print(is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} ))
print(is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} ))
Outputs:
True
False
Depending on your use case and the size of the dictionaries, you may want to recast these steps into a recursive function that adds each value to a set, checking whether the value is already in the set before adding it. It returns True as soon as a duplicate is found, or False once the dictionary is exhausted.
class Duplicated(ValueError): pass
def is_dup(d):
    values = set()
    def add(v):
        if isinstance(v, dict):
            # An explicit loop, because map() is lazy on Python 3.
            for item in v.values():
                add(item)
        elif v in values:
            raise Duplicated
        else:
            values.add(v)
    try:
        add(d)
        return False
    except Duplicated:
        return True
Test:
print(is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} ))
print(is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} ))
Outputs:
True
False

Compare list with dictionary (that contains wildcards), return values

I have a list that contains several strings and a dictionary with strings (that contain wildcards) as keys and integers as values.
For example like this:
list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*':'3', 'shirt*':'7', 'snowboard*':'1'}
I would like to go through list1 and see if there is a key in dict1 that (with the wildcard) matches the string from list1 and get the respective value from dict1. So in this case 3 for 'tomato*'.
Is there a way to iterate over list1, see if one of the dict1 keys (with wildcards) matches with this particular string and return the value from dict1?
I know I could iterate over dict1 and compare the keys with the elements in list1 this way. But in my case, the dict is very large and in addition, I have a lot of lists to go through. So it would take too much time to loop through the dictionary every time.
I thought about turning the keys into a list as well and get wildcard matches with a list comprehension and fnmatch(), but the returned match wouldn't be able to find the value in the dict (because of the wildcard).

Here is a trie data structure, built only on the standard library's collections.defaultdict, that can help:
from collections import defaultdict
class Trie(defaultdict):
    def __init__(self, value=None):
        super().__init__(lambda: Trie(value))  # Trie is essentially hash-table within hash-table
        self.__value = value
    def __getitem__(self, key):
        node = self
        if len(key) > 1:  # allows you to access the trie like this trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key:
                node = node[char]
            return node
        else:  # actual getitem routine
            return defaultdict.__getitem__(self, key)
    def __setitem__(self, key, value):
        node = self
        if len(key) > 1:  # allows you to access the trie like this trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key[:-1]:
                node = node[char]
            node[key[-1]] = value
        else:  # actual setitem routine
            if type(value) is int:
                value = Trie(int(value))
            defaultdict.__setitem__(self, key, value)
    def __str__(self):
        return str(self.__value)
d = Trie()
d["ab"] = 3
print(d["abcde"])
3
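
If every wildcard is a trailing *, as in the question's example, a plain dictionary keyed by the bare prefix may already be enough: for each word, try its prefixes from longest to shortest. This is a sketch under that assumption, not a general fnmatch replacement; build_prefix_table and lookup are hypothetical helper names.

```python
def build_prefix_table(patterns):
    """Map each 'prefix*' pattern to its value, keyed by the bare prefix."""
    return {pattern.rstrip("*"): value for pattern, value in patterns.items()}


def lookup(word, table):
    """Return the value of the longest prefix of word found in table, else None."""
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in table:
            return table[prefix]
    return None


dict1 = {"tomato*": "3", "shirt*": "7", "snowboard*": "1"}
list1 = ["i", "like", "tomatoes"]

table = build_prefix_table(dict1)
matches = {word: lookup(word, table) for word in list1}
print(matches)
```

The number of dictionary lookups per word depends on the length of the word rather than on the number of patterns, which is the point when the pattern dictionary is very large.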

Finding if there are distinct elements in a python dictionary

I have a python dictionary containing n key-value pairs, out of which n-1 values are identical and 1 is not. I need to find the key of the distinct element.
For example, consider the Python list [{a:1},{b:1},{c:2},{d:1}]. I need to get 'c' as the output.
I can use a for loop to compare consecutive elements and then use two more for loops to compare those elements with the other elements. But is there a more efficient way to go about it or perhaps a built-in function which I am unaware of?

If you have a dictionary you can quickly check and find the first value which is different from the next two values cycling around the keys of your dictionary.
Here's an example:
def find_different(d):
    k = d.keys()
    for i in xrange(0, len(k)):
        if d[k[i]] != d[k[(i+1)%len(k)]] and d[k[i]] != d[k[(i+2)%len(k)]]:
            return k[i]
>>> mydict = {'a':1, 'b':1, 'c':2, 'd':1}
>>> find_different(mydict)
'c'
Otherwise, if what you have is a list of single-key dictionaries, then you can do it quite nicely mapping your list with a function which "extracts" the values from your elements, then check each one using the same logic.
Here's another working example:
def find_different(l):
    mask = map(lambda x: x[x.keys()[0]], l)
    for i in xrange(0, len(l)):
        if mask[i] != mask[(i+1)%len(l)] and mask[i] != mask[(i+2)%len(l)]:
            return l[i].keys()[0]
>>> mylist = [{'a':1},{'b':1},{'c':2},{'d':1}]
>>> find_different(mylist)
'c'
NOTE: these solutions do not work in Python 3 as the map function doesn't return a list and neither does the .keys() method of dictionaries.
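
For Python 3, one possible alternative (not the answer's own method) is to count the values with collections.Counter and return the key whose value occurs once. The sketch assumes the values are hashable and, for the list form, that the single-key dictionaries have distinct keys.

```python
from collections import Counter


def find_different(d):
    """Return the key whose value occurs exactly once in d, or None."""
    counts = Counter(d.values())
    for key, value in d.items():
        if counts[value] == 1:
            return key
    return None


mydict = {"a": 1, "b": 1, "c": 2, "d": 1}
print(find_different(mydict))

# The list-of-single-key-dicts form can be merged into one dict first.
mylist = [{"a": 1}, {"b": 1}, {"c": 2}, {"d": 1}]
merged = {k: v for item in mylist for k, v in item.items()}
print(find_different(merged))
```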

Assuming that your "list of pairs" (actually list of dictionaries, sigh) cannot be changed:
from collections import defaultdict
def get_pair(d):
    return (d.keys()[0], d.values()[0])
def extract_unique(l):
    d = defaultdict(list)
    for key, value in map(get_pair, l):
        d[value].append(key)
    return filter(lambda (v,l): len(l) == 1, d.items())[0][1]

If you already have your dictionary, you can make a list of all of its keys: key_list = list(yourDic.keys()). Using that list, you can then loop through your dictionary. This is easier if you know one of the values, but below I assume that you do not.
yourDic = {'a':1, 'b':4, 'c':1, 'd':1}
key_list = list(yourDic.keys())
previous_value = yourDic[key_list[0]]  # Making it so loop gets past first test
count = 0
for key in key_list:
    test_value = yourDic[key]
    if test_value != previous_value:
        # A mismatch at the second key is ambiguous: if the third value agrees
        # with the second, the first key is the odd one out.
        if count == 1 and len(key_list) > 2 and yourDic[key_list[2]] == test_value:
            print(key_list[0])
        else:
            print(key)
        break
    previous_value = test_value
    count += 1
So, once you find the value that is different, it will print the key. If you want the value too, you just need to print the corresponding value as well.

python check multi-level dict key existence

Many SO posts show you how to efficiently check the existence of a key in a dictionary, e.g., Check if a given key already exists in a dictionary
How do I do this for a multi level key? For example, if d["a"]["b"] is a dict, how can I check if d["a"]["b"]["c"]["d"] exists without doing something horrendous like this:
if "a" in d and isinstance(d["a"], dict) and "b" in d["a"] and isinstance(d["a"]["b"], dict) and ...
Is there some syntax like
if "a"/"b"/"c"/"d" in d
What I am actually using this for: we have jsons, parsed into dicts using simplejson, that I need to extract values from. Some of these values are nested three and four levels deep; but sometimes the value doesn't exist at all. So I wanted something like:
val = None if not d["a"]["b"]["c"]["d"] else d["a"]["b"]["c"]["d"] #here d["a"]["b"] may not even exist
EDIT: prefer not to crash if some subkey exists but is not a dictionary, e.g, d["a"]["b"] = 5.

Sadly, there isn't any builtin syntax or a common library to query dictionaries like that.
However, I believe the simplest (and, I think, efficient enough) thing you can do is:
d.get("a", {}).get("b", {}).get("c")
Edit: It's not very common, but there is: https://github.com/akesterson/dpath-python
Edit 2: Examples:
>>> d = {"a": {"b": {}}}
>>> d.get("a", {}).get("b", {}).get("c")
>>> d = {"a": {}}
>>> d.get("a", {}).get("b", {}).get("c")
>>> d = {"a": {"b": {"c": 4}}}
>>> d.get("a", {}).get("b", {}).get("c")
4
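
The chained .get() calls raise AttributeError when an intermediate value exists but is not a dictionary (for example d["a"]["b"] = 5). A hedged sketch of a helper that tolerates that case, using functools.reduce; deep_get and _MISSING are illustrative names, not part of any library mentioned here.

```python
from functools import reduce

_MISSING = object()  # sentinel, so a stored None is not mistaken for "absent"


def deep_get(d, keys, default=None):
    """Follow keys through nested dicts; return default if the path breaks."""
    def step(current, key):
        # Only descend into real dictionaries that contain the key.
        if isinstance(current, dict) and key in current:
            return current[key]
        return _MISSING

    result = reduce(step, keys, d)
    return default if result is _MISSING else result


d = {"a": {"b": {"c": {"d": 1}}}}
print(deep_get(d, ["a", "b", "c", "d"]))
print(deep_get({"a": {"b": 5}}, ["a", "b", "c", "d"]))
```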

This probably isn't a good idea and I wouldn't recommend using it in prod. However, if you're just doing it for learning purposes, then the below might work for you.
def rget(dct, keys, default=None):
    """
    >>> rget({'a': 1}, ['a'])
    1
    >>> rget({'a': {'b': 2}}, ['a', 'b'])
    2
    """
    # Split without mutating the caller's list.
    key, rest = keys[0], keys[1:]
    try:
        elem = dct[key]
    except (KeyError, TypeError):
        # KeyError: the key is missing. TypeError: dct is not subscriptable,
        # e.g. an int. Beware of sequences when your keys are integers.
        return default
    if not rest:
        return elem
    return rget(elem, rest, default)

UPDATE: I ended up writing my own open-source, pippable library that allows one to do this: https://pypi.python.org/pypi/dictsearch

A non-recursive version, quite similar to @Meitham's solution, which does not mutate the looked-for key. Returns True/False depending on whether the exact structure is present in the source dictionary.
def subkey_in_dict(dct, subkey):
    """ Returns True if the given subkey is present within the structure of the source dictionary, False otherwise.
    The format of the subkey is parent_key:sub_key1:sub_sub_key2 (etc.) - description of the dict structure, where the
    character ":" is the delimiter.
    :param dct: the dictionary to be searched in.
    :param subkey: the target keys structure, which should be present.
    :returns Boolean: is the keys structure present in dct.
    :raises AttributeError: if subkey is not a string.
    """
    keys = subkey.split(':')
    work_dict = dct
    while keys:
        target = keys.pop(0)
        if isinstance(work_dict, dict):
            if target in work_dict:
                if not keys:  # this is the last element in the input, and it is in the dict
                    return True
                else:  # not the last element of subkey, change the temp var
                    work_dict = work_dict[target]
            else:
                return False
        else:
            return False
The structure that is checked is in the form parent_key:sub_key1:sub_sub_key2, where the : char is the delimiter. The match is case-sensitive, and the search stops (returns False) if there is a list within the dictionary.
Sample usage:
dct = {'a': {'b': {'c': {'d': 123}}}}
print(subkey_in_dict(dct, 'a:b:c:d')) # prints True
print(subkey_in_dict(dct, 'a:b:c:d:e')) # False
print(subkey_in_dict(dct, 'a:b:d')) # False
print(subkey_in_dict(dct, 'a:b:c')) # True

This is what I usually use
def key_in_dict(_dict: dict, key_lookup: str, separator='.'):
    """
    Searches for a nested key in a dictionary and returns its value, or None if nothing was found.
    key_lookup must be a string where each key is separated by a given "separator" character, which by default is a dot
    """
    keys = key_lookup.split(separator)
    subdict = _dict
    for k in keys:
        # Stop if we hit a non-dict value or a missing key.
        if not isinstance(subdict, dict) or k not in subdict:
            return None
        subdict = subdict[k]
    return subdict
Returns the value if the key exists, or None if it doesn't:
key_in_dict({'test': {'test': 'found'}}, 'test.test')  # 'found'
key_in_dict({'test': {'test': 'found'}}, 'test.not_a_key')  # None
