Check for a key pattern in a dictionary in Python

dict1=({"EMP$$1":1,"EMP$$2":2,"EMP$$3":3})
How can I check whether a key starting with EMP exists in the dictionary using Python? Something like:
dict1.get("EMP##") ??

It's not entirely clear what you want to do.
You can loop through the keys in the dict selecting keys using the startswith() method:
>>> for key in dict1:
...     if key.startswith("EMP$$"):
...         print("Found", key)
...
Found EMP$$1
Found EMP$$2
Found EMP$$3
You can use a list comprehension to get all the values that match:
>>> [value for key,value in dict1.items() if key.startswith("EMP$$")]
[1, 2, 3]
If you just want to know if a key matches you could use the any() function:
>>> any(key.startswith("EMP$$") for key in dict1)
True

This approach strikes me as contrary to the intent of a dictionary.
A dictionary is made up of hash keys which have had values associated with them. The benefit of this structure is that it provides very fast lookups (on the order of O(1)). By searching through the keys, you're negating that benefit.
I would suggest reorganizing your dictionary.
dict1 = {"EMP$$": {"1": 1, "2": 2, "3": 3} }
Then, finding "EMP$$" is as simple as
if "EMP$$" in dict1:
    pass  # etc...
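If the flat dictionary already exists, the reorganization can be done once up front. The following is a rough sketch that assumes every key contains the separator "$$"; the function name group_by_prefix and its sep parameter are made up for this example.

```python
from collections import defaultdict

def group_by_prefix(flat, sep="$$"):
    """Nest a flat dict by splitting each key at the first occurrence of sep.

    Assumes every key contains sep (an assumption of this sketch).
    """
    nested = defaultdict(dict)
    for key, value in flat.items():
        prefix, _, rest = key.partition(sep)
        nested[prefix + sep][rest] = value
    return dict(nested)

dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}
nested = group_by_prefix(dict1)
print(nested)             # {'EMP$$': {'1': 1, '2': 2, '3': 3}}
print("EMP$$" in nested)  # True
```

The conversion itself is O(n), but every later prefix check is a single hash lookup.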

You need to be a lot more specific with what you want to do. However, assuming the dictionary you gave:
dict1={"EMP$$1":1, "EMP$$2":2, "EMP$$3":3}
If you wanted to know if a specific key was present before trying to request it you could:
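'EMP$$1' in dict1
True
Returns True because dict1 has the key EMP$$1. (The older spelling dict1.has_key('EMP$$1') works only in Python 2; has_key() was removed in Python 3.)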
You could also forget about checking for keys and rely on the default return value of dict1.get():
dict1.get('EMP$$5',0)
0
Returns 0 as default given dict1 doesn't have a key EMP$$5.
In a similar way, you could also use a try/except structure to catch and handle missing keys:
try:
    dict1['EMP$$5']
except KeyError as e:
    # Code to deal with the key error
    print('Trapped key error in dict1 looking for %s' % e)
The other answers to this question are also great, but we need more info to be more precise.

There's no way to match dictionary keys like this. I suggest you rethink your data structure for this problem. If this has to be extra quick you could use something like a suffix tree.
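A full suffix tree is more than a prefix check needs. As a lighter-weight sketch (a swapped-in technique, not a suffix tree), keeping a sorted list of the keys allows a prefix test by binary search with the standard bisect module. The helper name has_key_with_prefix is hypothetical, the approach handles prefixes only (not arbitrary substrings), and the sorted list has to be rebuilt or updated whenever the dictionary's keys change.

```python
import bisect

def has_key_with_prefix(sorted_keys, prefix):
    """Return True if any string in sorted_keys starts with prefix.

    sorted_keys must be a sorted list of strings. Every string that starts
    with prefix sorts at or after prefix itself, and before any other string
    that is greater than prefix, so checking one position is enough.
    """
    i = bisect.bisect_left(sorted_keys, prefix)
    return i < len(sorted_keys) and sorted_keys[i].startswith(prefix)

dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}
keys = sorted(dict1)
print(has_key_with_prefix(keys, "EMP$$"))  # True
print(has_key_with_prefix(keys, "AMP$$"))  # False
```

Each check costs O(log n) comparisons instead of a scan over all keys.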

You can use the in operator, which checks whether one string occurs inside another. Iterating over dict1 yields its keys, so you can test "EMP$$" against each key. Note that this matches "EMP$$" anywhere in the key, not only at the start.
dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}
print(any("EMP$$" in i for i in dict1))
# True
# testing for item that doesn't exist
print(any("AMP$$" in i for i in dict1))
# False

Related

Is it okay to nest a dict.get() inside another or is this bad design?

So, I am working on a code base where a dictionary contains some key information. At some point in the development process the name of one of the keys was changed, but the older key still exists in a lot of places. Let's call the keys new and old for reference.
In order to make it compatible with the older version, I am doing something like:
dict_name.get(new_key,dict_name.get(old_key,None))
Is this bad design or is it okay? Why/Why not?
Example for clarification: (Based on input by @Alexander)
There are two dictionaries d1 and d2.
d1={k1:v1,old_key:some_value}
d2={k1:v1,new_key:some_value}
The function which I am designing right now could get either d1 or d2 like dictionary as an argument. My function should be able to pick up some_value, regardless of whether old_key or new_key is present.
That is a reasonable approach. The only downside is that it will perform the get for both keys, which will not affect performance in most situations.
My only notes are nitpicks:
dict is the name of a built-in type (not a reserved word, but shadowing it hides the built-in), so don't use it as a variable name
None is the default, so it can be dropped for old_key, e.g.:
info.get('a', info.get('b'))
In response to "Is there a way to prevent the double call?": Yup, several reasonable ways exist =).
The one-liner would probably look like:
info['a'] if 'a' in info else info.get('b')
which starts to get difficult to read if your keys are longer.
A more verbose way would be to expand it out into full statements:
val = None
if 'a' in info:
    val = info['a']
elif 'b' in info:
    val = info['b']
And finally a generic option (a keyword argument such as default placed after *keys only works in Python 3):
def multiget(info, *keys, default=None):
    '''Try multiple keys in order, or default if not present'''
    for k in keys:
        if k in info:
            return info[k]
    return default
which would let you resolve multiple invocations cleanly, e.g.:
option_1 = multiget(info, 'a', 'b')
option_2 = multiget(info, 'x', 'y', 'z', default=10)
If this is somehow a pandemic of multiple api versions or something (?) you could even go so far as wrapping dict, though it is likely to be overkill:
>>> class MultiGetDict(dict):
...     def multiget(self, *keys, default=None):
...         for k in keys:
...             if k in self:
...                 return self[k]
...         return default
...
>>> d = MultiGetDict({1: 2})
>>> d.multiget(1)
2
>>> d.multiget(0, 1)
2
>>> d.multiget(0, 2)
>>> d.multiget(0, 2, default=3)
3
dict.get is there for exactly this reason, so you can fall back on default values if the keys are not in there.
Having a double fallback is very much OK. For example:
d = {}
result = d.get('new_key',d.get('old_key', None))
This would mean that result is None in the worst case, but there is no error (which is the goal of get in the first place).
In other words, it will get the value of new_key as a first priority, old_key as the second priority, and None as a third.
Also worth noting that get(key, None) is the same as get(key) so you might want to shorten that line:
result = d.get('new_key', d.get('old_key'))
If you want to avoid calling get multiple times (for example, if you have to do more than 2 of those, it will be unreadable) you can do something like this:
priority = ('new_key', 'old_key', 'older_key', 'oldest_key')
for key in priority:
    result = d.get(key)
    if result is not None:
        break
And result becomes whatever is encountered first in that loop, or None otherwise
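One caveat with the loop above: it treats a key that is present with the value None the same as a missing key. If that distinction matters, a private sentinel object can stand in for "missing". This is only a sketch; first_present and _MISSING are hypothetical names.

```python
_MISSING = object()  # unique marker that can never be a stored value

def first_present(d, keys, default=None):
    """Return the value of the first key in keys that exists in d, else default."""
    for key in keys:
        value = d.get(key, _MISSING)
        if value is not _MISSING:
            return value  # may legitimately be None
    return default

priority = ('new_key', 'old_key', 'older_key', 'oldest_key')
d = {'old_key': None, 'older_key': 5}
print(first_present(d, priority))               # None (old_key is present)
print(first_present({}, priority, default=10))  # 10
```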
Based on the sample dictionary provided, I would argue that this is bad design...
Let's say your original dictionary is:
d1 = {'k1': 1, 'k2': 2}
If I understand you correctly, you then 'update' one of the keys, e.g.:
d1 = {'k3': 1, 'k2': 2}
If you try to access via:
d1.get('k3', d1.get('k1')) # 'k3' is new key, 'k1' is old key.
then the first lookup will always be present and the second lookup will never be used.
If you meant that the new dictionary would look like:
d2 = {'k1': 1, 'k2': 2, 'k3': 1}
then you are storing the 'same' data in two different locations in your dictionary, which will surely lead to trouble (much like denormalized data in a database). For example, if the value of 'k3' was updated to 3, then the value of 'k1' would need to be updated as well.
Given the dictionaries provided in your example:
d1={k1: v1, old_key: some_value}
d2={k1: v1, new_key: some_value}
I assume that some_value is intended to be equal in both, i.e. d1[old_key] == d2[new_key]. If so, then you could use d2.get(new_key, d1.get(old_key)). However, it just seems like a mess.
If some_value needs to be updated, for example, it must be updated in both dictionaries.
You are wasting memory by storing the some_value twice.
Your new_key in d2 may accidentally clobber an existing key in d1.
I would recommend not changing the key names in the first place.
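When the rename has already happened and both spellings are in circulation, one way to contain the mess is to translate old names to new ones once, at the point where the data enters the function, so the rest of the code only ever sees new_key. A minimal sketch, with normalize_keys and the RENAMES mapping as hypothetical names:

```python
RENAMES = {"old_key": "new_key"}  # hypothetical mapping: legacy name -> current name

def normalize_keys(d, renames=RENAMES):
    """Return a copy of d in which legacy key names are replaced by current ones."""
    result = dict(d)
    for old, new in renames.items():
        if old in result:
            value = result.pop(old)
            # If both spellings are present, the current name wins.
            result.setdefault(new, value)
    return result

d1 = {"k1": "v1", "old_key": "some_value"}
d2 = {"k1": "v1", "new_key": "some_value"}
print(normalize_keys(d1) == normalize_keys(d2))  # True
```

The input dictionary is left untouched, and no value is stored under two names.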

Python set dictionary nested key with dot delineated string

If I have a dictionary that is nested, and I pass in a string like "key1.key2.key3" which would translate to:
myDict["key1"]["key2"]["key3"]
What would be an elegant way to be able to have a method where I could pass on that string and it would translate to that key assignment? Something like
myDict.set_nested('key1.key2.key3', someValue)
Using only builtin stuff:
def set_nested(my_dict, key_string, value):
    """Given `foo`, 'key1.key2.key3', 'something', set foo['key1']['key2']['key3'] = 'something'"""
    # Start off pointing at the original dictionary that was passed in.
    here = my_dict
    # Turn the string of key names into a list of strings.
    keys = key_string.split(".")
    # For every key *before* the last one, we concentrate on navigating through the dictionary.
    for key in keys[:-1]:
        # Try to find here[key]. If it doesn't exist, create it with an empty dictionary. Then,
        # update our `here` pointer to refer to the thing we just found (or created).
        here = here.setdefault(key, {})
    # Finally, set the final key to the given value
    here[keys[-1]] = value
myDict = {}
set_nested(myDict, "key1.key2.key3", "some_value")
assert myDict == {"key1": {"key2": {"key3": "some_value"}}}
This traverses myDict one key at a time, ensuring that each sub-key refers to a nested dictionary.
You could also solve this recursively, but then you risk RecursionError exceptions without any real benefit.
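The same iterative walk works for reading a value back. The companion below is a sketch rather than part of the answer above; the name get_nested and the default parameter are assumptions. It returns the default as soon as a key is missing or an intermediate value is not a dictionary.

```python
def get_nested(my_dict, key_string, default=None):
    """Follow a dot-delimited key string through nested dicts; return default if the path breaks."""
    here = my_dict
    for key in key_string.split("."):
        # Stop if we hit a non-dict value or a missing key before the path ends.
        if not isinstance(here, dict) or key not in here:
            return default
        here = here[key]
    return here

data = {"key1": {"key2": {"key3": "some_value"}}}
print(get_nested(data, "key1.key2.key3"))     # some_value
print(get_nested(data, "key1.missing.key3"))  # None
```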
There are a number of existing modules that will already do this, or something very much like it. For example, the jmespath module will resolve jmespath expressions, so given:
>>> mydict={'key1': {'key2': {'key3': 'value'}}}
You can run:
>>> import jmespath
>>> jmespath.search('key1.key2.key3', mydict)
'value'
The jsonpointer module does something similar, although it uses / as the separator instead of a dot.
Given the number of pre-existing modules I would avoid trying to write your own code to do this.
EDIT: OP's clarification makes it clear that this answer isn't what he's looking for. I'm leaving it up here for people who find it by title.
I implemented a class that did this a while back... it should serve your purposes.
I achieved this by overriding the default getattr/setattr functions for an object.
Check it out! AndroxxTraxxon/cfgutils
This lets you do some code like the following...
from cfgutils import obj
a = obj({
    "b": 123,
    "c": "apple",
    "d": {
        "e": "nested dictionary value"
    }
})
print(a.d.e)
# prints: nested dictionary value

Python dictionary check if the values match with other key values

I have created a python dictionary with a structure like :-
mydict = {'2018-08' : [32124,4234,23,2323,32423,342342],
'2018-07' : [13123,23424,2,3,4343,4232,2342],
'2018-06' : [1231,12,12313,12331,3123131313,434546,232]}
I want to check whether any of the values under the key '2018-08' also appears in the values of any other key. Is there a short way to write this?
You can simply take the expected values from mydict and check whether any of them is present in the values of one of the other keys.
You can use the idiom if item in some_list to check whether item is present in some_list.
target = '2018-08'
expected_values = mydict[target]
found = False
for key, values in mydict.items():
    if key == target:
        continue  # don't compare the target month with itself
    if any(expected in values for expected in expected_values):
        found = True
        break
Bear in mind that this is a brute-force algorithm, so it may not be the best solution for larger dictionaries.
The question is vague, so I am assuming that you want the values in the target month (e.g. 2018-08) that are contained somewhere within the other months.
Sets are much faster for testing membership compared to list iteration.
target = '2018-08'
s = set()
for k, v in mydict.items():
    if k != target:
        s.update(v)
matches = set(mydict[target]) & s
You can use itertools.chain to flatten the values of all the other keys into one long list:
import itertools
for key, value in mydict.items():
    temp_dict = mydict.copy()
    temp_dict.pop(key)
    big_value_list = list(itertools.chain(*temp_dict.values()))
    print(key, set(value) & set(big_value_list))
A dry run with your inputs modified so that some values overlap:
mydict = {'2018-08' : [32124,4234,23,2323,32423,342342],
'2018-07' : [13123,23424,2,3,4343,4232,2342],
'2018-06' : [1231,12,12313,12331,3123131313,434546,232,342342,2342]}
Output (as printed by Python 2):
('2018-08', set([342342]))
('2018-07', set([2342]))
('2018-06', set([342342, 2342]))
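If it also matters which other months a value is shared with, an inverted index (value to the set of keys containing it) answers that in one pass. This is a sketch using the modified inputs from the dry run above; the function name shared_values is hypothetical.

```python
from collections import defaultdict

def shared_values(mydict, target):
    """Map each value of mydict[target] to the other keys that also contain it."""
    # Inverted index: value -> set of keys whose list contains that value.
    index = defaultdict(set)
    for key, values in mydict.items():
        for value in values:
            index[value].add(key)
    shared = {}
    for value in set(mydict[target]):
        others = index[value] - {target}
        if others:
            shared[value] = others
    return shared

mydict = {'2018-08': [32124, 4234, 23, 2323, 32423, 342342],
          '2018-07': [13123, 23424, 2, 3, 4343, 4232, 2342],
          '2018-06': [1231, 12, 12313, 12331, 3123131313, 434546, 232, 342342, 2342]}
print(shared_values(mydict, '2018-08'))  # {342342: {'2018-06'}}
```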

Creating a "dictionary of sets"

I need to efficiently store data in something that would resemble a "dictionary of sets" e.g. have a dictionary with multiple (unique) values matching each unique key. The source of my data would be a (not very well) structured XML.
My idea is:
I will look through a number of elements and find keys. If a key does not exist, I add it to the dictionary; if it already exists, I just add the new value to the corresponding key.
And the result would be something like:
{
    'key1': {'1484', '1487', '1488', ...},
    'key2': {'1485', '1486', '1489', ...},
    'key3': {'1490', '1491', '1492', ...},
    ...
}
I need to add new keys on the go.
I need to push unique values into each set.
I need to be able to iterate through the whole dictionary.
I am not sure if this is even feasible, but if anybody could push me in the right direction, I would be more than thankful.
I'm not going to benchmark this but in my experience native dicts are faster
store = {}
for key, value in yoursource:
    try:
        store[key].add(value)
    except KeyError:
        store[key] = {value}
from collections import defaultdict
mydict = defaultdict(set)
mydict["key1"] |= {'1484', '1487', '1488'}
Iteration is just like the normal dict.
Using dict.setdefault() to create the key if it doesn't exist, and initialising it with an empty set:
store = {}
for key, value in yoursource:
    store.setdefault(key, set()).add(value)
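To see the three requirements from the question together (new keys on the go, unique values per key, iteration over everything), here is a small end-to-end sketch. The pairs list is invented sample data standing in for whatever is extracted from the XML.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs; note the repeated ("key1", "1484").
pairs = [("key1", "1484"), ("key2", "1485"), ("key1", "1487"), ("key1", "1484")]

store = defaultdict(set)
for key, value in pairs:
    # A missing key gets an empty set on first access; duplicates are ignored by the set.
    store[key].add(value)

# Iterate like any other dict; sorted() is only there to make the output stable.
for key, values in store.items():
    print(key, sorted(values))
# key1 ['1484', '1487']
# key2 ['1485']
```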

How to retrieve from python dict where key is only partially known?

I have a dict that has string-type keys whose exact values I can't know (because they're generated dynamically elsewhere). However, I know that the key I want contains a particular substring, and that a single key with this substring is definitely in the dict.
What's the best, or "most pythonic" way to retrieve the value for this key?
I thought of two strategies, but both irk me:
for k,v in some_dict.items():
    if 'substring' in k:
        value = v
        break
-- OR --
value = [v for (k,v) in some_dict.items() if 'substring' in k][0]
The first method is bulky and somewhat ugly, while the second is cleaner, but the extra step of indexing into the list comprehension (the [0]) irks me. Is there a better way to express the second version, or a more concise way to write the first?
There is an option to write the second version with the performance attributes of the first one.
Use a generator expression instead of a list comprehension (this is Python 2; on Python 3, replace iteritems() with items()):
value = next(v for (k,v) in some_dict.iteritems() if 'substring' in k)
The expression inside the parentheses returns an iterator, which you then ask for its next, i.e. first, element. No further elements are processed.
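The question guarantees that a matching key exists. If that ever stops being true, next() raises StopIteration, and the two-argument form of next() avoids this by returning a default instead. A small Python 3 sketch follows; the wrapper name first_match and the sample keys are only for illustration.

```python
some_dict = {"abc4589": 4578, "abc7812": 798, "kjuy45763": 1002}

def first_match(d, substring, default=None):
    """Return the value of the first key containing substring, or default if none does."""
    # The generator is evaluated lazily, so iteration stops at the first hit.
    return next((v for k, v in d.items() if substring in k), default)

print(first_match(some_dict, "7812"))     # 798
print(first_match(some_dict, "nothere"))  # None
```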
How about this (Python 2 syntax; on Python 3, pass the generator to next() and use items() instead):
value = (v for (k,v) in some_dict.iteritems() if 'substring' in k).next()
It stops as soon as it finds the first match.
But it still has O(n) complexity, where n is the number of key-value pairs. You need something like a suffix list or a suffix tree to speed up searching.
If there are many keys but the full key is easy to reconstruct from the substring, then reconstructing it can be faster. For example, you often know the start of the key but not the datestamp that has been appended to it, so you may only have to try 365 dates rather than iterate through millions of keys.
It's unlikely to be the case but I thought I would suggest it anyway.
e.g.
>>> names={'bob_k':32,'james_r':443,'sarah_p':12}
>>> firstname='james' #you know the substring james because you have a list of firstnames
>>> for c in "abcdefghijklmnopqrstuvwxyz":
...     name="%s_%s"%(firstname,c)
...     if name in names:
...         print(name)
...
james_r
Another option is to subclass dict so that indexing matches on a substring of the key (Python 2 code: it uses iteritems(), iterkeys() and the print statement):
class MyDict(dict):
    def __init__(self, *args):
        dict.__init__(self, *args)
    def __getitem__(self, x):
        return next(v for (k,v) in self.iteritems() if x in k)
# Defining several dicts ---------------------------------------------------
some_dict = {'abc4589':4578,'abc7812':798,'kjuy45763':1002}
another_dict = {'boumboum14':'WSZE x478',
                'tagada4783':'ocean11',
                'maracuna102455':None}
still_another = {12:'jfg',45:'klsjgf'}
# Selecting the dicts whose __getitem__ method will be changed -------------
name,obj = None,None
selected_dicos = [ (name,obj) for (name,obj) in globals().iteritems()
                   if type(obj)==dict
                   and all(type(x)==str for x in obj.iterkeys())]
print 'names of selected_dicos ==',[ name for (name,obj) in selected_dicos]
# Transforming the selected dicts into instances of class MyDict -----------
for k,v in selected_dicos:
    globals()[k] = MyDict(v)
# Example of getting a value -----------------------------------------------
print "some_dict['7812'] ==",some_dict['7812']
Result:
names of selected_dicos == ['another_dict', 'some_dict']
some_dict['7812'] == 798
I prefer the first version, although I'd use some_dict.iteritems() (if you're on Python 2) because then you don't have to build an entire list of all the items beforehand. Instead you iterate through the dict and break as soon as you're done.
On Python 3, some_dict.items() already returns a dictionary view, so it can be iterated directly without building a list first.
