Best idiom to get and set a value in a python dict

I use a dict as a short-term cache. I want to get a value from the dictionary, and if the dictionary didn't already have that key, set it, e.g.:
val = cache.get('the-key', calculate_value('the-key'))
cache['the-key'] = val
In the case where 'the-key' was already in cache, the second line is not necessary. Is there a better, shorter, more expressive idiom for this?

Yes, use setdefault:
val = cache.setdefault('the-key', calculate_value('the-key'))
An example in the shell:
>>> cache = {'a': 1, 'b': 2}
>>> cache.setdefault('a', 0)
1
>>> cache.setdefault('b', 0)
2
>>> cache.setdefault('c', 0)
0
>>> cache
{'a': 1, 'c': 0, 'b': 2}
See: http://docs.python.org/release/2.5.2/lib/typesmapping.html

Readability matters!
if 'the-key' not in cache:
    cache['the-key'] = calculate_value('the-key')
val = cache['the-key']
If you really prefer a one-liner:
val = cache['the-key'] if 'the-key' in cache else cache.setdefault('the-key', calculate_value('the-key'))
Another option is to define __missing__ in the cache class:
class Cache(dict):
    def __missing__(self, key):
        return self.setdefault(key, calculate_value(key))
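A runnable sketch of the __missing__ approach, written for Python 3. The factory argument and the misses counter are illustrative additions (not part of the answer above), and len merely stands in for an expensive calculation:

```python
class Cache(dict):
    """dict subclass that computes and stores a value on the first miss."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory  # hypothetical one-argument function
        self.misses = 0         # illustrative counter, only for the demo

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent.
        self.misses += 1
        value = self[key] = self.factory(key)
        return value


cache = Cache(len)          # len stands in for an expensive calculation
first = cache['the-key']    # miss: computed and stored
second = cache['the-key']   # hit: __missing__ is not called
print(first, second, cache.misses)  # 7 7 1
```

Note that only cache[key] triggers __missing__; cache.get(key) and key in cache do not.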

Have a look at the Python Decorator Library, and more specifically Memoize, which acts as a cache. That way you can just decorate calculate_value with the Memoize decorator.
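For readers who have not seen the pattern, here is a hand-written sketch of a memoizing decorator in Python 3. It is not the Memoize recipe from the Python Decorator Library, just a minimal illustration of the idea; the calls list exists only to show how often the function body runs. The standard library's functools.lru_cache offers similar behaviour.

```python
import functools

calls = []  # records each real invocation, for demonstration only


def memoize(func):
    """Cache results of func, keyed by its (hashable) positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoize
def calculate_value(key):
    calls.append(key)
    return key * 2


first = calculate_value('ab')
second = calculate_value('ab')  # served from the cache
print(first, second, len(calls))  # abab abab 1
```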

The approach with
cache.setdefault('the-key', calculate_value('the-key'))
is fine if calculate_value is cheap, but the argument is evaluated on every call, even when the key is already cached. So if you have to read from a DB, open a file or network connection, or do anything else "expensive", use the following structure instead:
try:
    val = cache['the-key']
except KeyError:
    val = calculate_value('the-key')
    cache['the-key'] = val
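The difference can be checked with a small Python 3 sketch. Here calculate_value is a stand-in that records its calls, and get_cached is a hypothetical helper wrapping the try/except structure:

```python
calls = []  # records every invocation of the stand-in "expensive" function


def calculate_value(key):
    calls.append(key)
    return key.upper()


def get_cached(cache, key):
    """Return cache[key], computing and storing it only on a miss."""
    try:
        return cache[key]
    except KeyError:
        value = cache[key] = calculate_value(key)
        return value


cache = {}

# setdefault evaluates its second argument before the lookup happens,
# so calculate_value runs on both calls, even on the cache hit.
cache.setdefault('the-key', calculate_value('the-key'))
cache.setdefault('the-key', calculate_value('the-key'))
eager_calls = len(calls)

# The try/except form only computes on a miss.
calls.clear()
cache.clear()
get_cached(cache, 'the-key')
get_cached(cache, 'the-key')
lazy_calls = len(calls)

print(eager_calls, lazy_calls)  # 2 1
```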

You might want to take a look at (the entire page at) "Code Like a Pythonista" http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#dictionary-get-method
It covers the setdefault() technique described above, and the defaultdict technique is also very handy for making dictionaries of sets or arrays for example.

You can also use defaultdict to do something similar:
>>> from collections import defaultdict
>>> d = defaultdict(int) # will default values to 0
>>> d["a"] = 1
>>> d["a"]
1
>>> d["b"]
0
>>>
You can assign any default you want by supplying your own factory function and itertools.repeat:
>>> from itertools import repeat
>>> def constant_factory(value):
...     return repeat(value).next
...
>>> default_value = "default"
>>> d = defaultdict(constant_factory(default_value))
>>> d["a"]
'default'
>>> d["b"] = 5
>>> d["b"]
5
>>> d.keys()
['a', 'b']
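The session above is Python 2: iterators have no .next attribute in Python 3 (it became __next__), and keys() returns a view rather than a list. A sketch of the same constant-default idea for Python 3, using a lambda as the factory instead of itertools.repeat:

```python
from collections import defaultdict

default_value = "default"
d = defaultdict(lambda: default_value)  # the factory takes no arguments

missing = d["a"]   # the factory runs and the key is stored
d["b"] = 5
print(missing, d["b"], sorted(d))  # default 5 ['a', 'b']
```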

Use the setdefault method.
If the key is not already present, setdefault creates it with the value given as the second argument; if the key is already present, it returns the existing value.
val = cache.setdefault('the-key', value)

Use get to extract the value, or None if the key is missing.
Combining None with or lets you chain another operation (setdefault):
def get_or_add(cache, key, value_factory):
    return cache.get(key) or cache.setdefault(key, value_factory())
Usage (to make it lazy, the function expects a callable as the third parameter):
get_or_add(cache, 'the-key', lambda: calculate_value('the-key'))
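One thing to watch with the or version: a cached value that is falsy (0, '', None) makes cache.get(key) look like a miss, so the factory is called again even though the stored value is what ends up being returned. A possible variant, sketched here with a sentinel object (the _MISSING name and the demo factory are illustrative, not from the answer):

```python
_MISSING = object()  # unique sentinel; cannot collide with a stored value


def get_or_add(cache, key, value_factory):
    """Return cache[key]; on a miss, call value_factory() and store the result."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = cache[key] = value_factory()
    return value


factory_calls = []


def factory():
    factory_calls.append(1)
    return 42


cache = {'zero': 0}
hit = get_or_add(cache, 'zero', factory)   # falsy value, but still a hit
miss = get_or_add(cache, 'new', factory)   # miss: the factory runs once
print(hit, miss, len(factory_calls))  # 0 42 1
```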

Related

Can someone explain what this does "defaultdict(lambda:0)" [duplicate]

In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.

I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written
x = defaultdict(int)
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.

You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
y = defaultdict(lambda: defaultdict(lambda: 0))
print y['k1']['k2'] # 0
print dict(y['k1']) # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}

defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.

All the answers are good, but I am adding one with more information:
"defaultdict requires an argument that is callable. The return value of that callable is the default value that the dictionary returns when you try to access it with a key that does not exist."
Here's an example
>>> SAMPLE = {'Age': 28, 'Salary': 2000}
>>> SAMPLE = defaultdict(lambda: 0, SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']
28
>>> SAMPLE['Phone']  # non-existing key, so the default 0 is returned
0

y = defaultdict(lambda: defaultdict(lambda: 0))
is helpful when you want to write something like y['a']['b'] += 1 without creating either level first.
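As a concrete sketch of that use (Python 3; the pairs list is made-up sample data), counting occurrences of two-level keys needs no existence checks:

```python
from collections import defaultdict

# Hypothetical data: (outer, inner) pairs to be counted.
pairs = [('a', 'b'), ('a', 'b'), ('a', 'c'), ('x', 'y')]

y = defaultdict(lambda: defaultdict(int))
for outer, inner in pairs:
    # Both levels are created on first access, so no key checks are needed.
    y[outer][inner] += 1

# Convert to plain dicts for readable output.
plain = {k: dict(v) for k, v in y.items()}
print(plain)  # {'a': {'b': 2, 'c': 1}, 'x': {'y': 1}}
```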

How to check if a key-value pair is present in a dictionary?

Is there a smart pythonic way to check if there is an item (key,value) in a dict?
a={'a':1,'b':2,'c':3}
b={'a':1}
c={'a':2}
b in a:
--> True
c in a:
--> False

Use the short circuiting property of and. In this way if the left hand is false, then you will not get a KeyError while checking for the value.
>>> a={'a':1,'b':2,'c':3}
>>> key,value = 'c',3 # Key and value present
>>> key in a and value == a[key]
True
>>> key,value = 'b',3 # value absent
>>> key in a and value == a[key]
False
>>> key,value = 'z',3 # Key absent
>>> key in a and value == a[key]
False

You can check a tuple of the key, value against the dictionary's .items().
test = {'a': 1, 'b': 2}
print(('a', 1) in test.items())  # True

You've tagged this 2.7, as opposed to 2.x, so you can check whether the tuple is in the dict's viewitems:
(key, value) in d.viewitems()
Under the hood, this basically does key in d and d[key] == value.
In Python 3, viewitems is just items, but don't use items in Python 2! That'll build a list and do a linear search, taking O(n) time and space to do what should be a quick O(1) check.

>>> a = {'a': 1, 'b': 2, 'c': 3}
>>> b = {'a': 1}
>>> c = {'a': 2}
First here is a way that works for Python2 and Python3
>>> all(k in a and a[k] == b[k] for k in b)
True
>>> all(k in a and a[k] == c[k] for k in c)
False
In Python3 you can also use
>>> b.items() <= a.items()
True
>>> c.items() <= a.items()
False
For Python2, the equivalent is
>>> b.viewitems() <= a.viewitems()
True
>>> c.viewitems() <= a.viewitems()
False

Converting my comment into an answer :
Use the dict.get method which is already provided as an inbuilt method (and I assume is the most pythonic)
>>> dict = {'Name': 'Anakin', 'Age': 27}
>>> dict.get('Age')
27
>>> dict.get('Gender', 'None')
'None'
>>>
As per the docs -
get(key, default) -
Return the value for key if key is in the dictionary, else default.
If default is not given, it defaults to None, so that this method
never raises a KeyError.

Using get:
# this doesn't work if `None` is a possible value
# but you can use a different sentinel value in that case
a.get('a') == 1
Using try/except:
# more verbose than using `get`, but more foolproof also
a = {'a':1,'b':2,'c':3}
try:
has_item = a['a'] == 1
except KeyError:
has_item = False
print(has_item)
Other answers suggesting items in Python3 and viewitems in Python 2.7 are easier to read and more idiomatic, but the suggestions in this answer will work in both Python versions without any compatibility code and will still run in constant time. Pick your poison.

a.get('a') == 1
=> True
a.get('a') == 2
=> False
If None is a valid value:
{'x': None}.get('x', object()) is None
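The sentinel idea can be wrapped in a small helper. This is only a sketch; has_item and _MISSING are hypothetical names, not something the standard library provides:

```python
_MISSING = object()  # never equal to any value stored in the dict


def has_item(d, key, value):
    """True if key is in d and maps to value, even when value is None."""
    return d.get(key, _MISSING) == value


d = {'a': 1, 'x': None}
print(has_item(d, 'a', 1))           # True
print(has_item(d, 'a', 2))           # False
print(has_item(d, 'x', None))        # True
print(has_item(d, 'missing', None))  # False
```

With a plain d.get('missing') == None the last check would wrongly report True.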

Using .get is usually the best way to check if a key-value pair exists.
if my_dict.get('some_key'):
    # Do something
There is one caveat: if the key exists but its value is falsy, the test fails, which may not be what you want. Keep in mind this is rarely the case. The inverse is a more frequent problem, that is, using in to test for the presence of a key. I have run into this frequently when reading CSV files.
Example
# csv looks something like this:
a,b
1,1
1,
# now the code
import csv
with open('path/to/file', 'r') as fh:
    reader = csv.DictReader(fh)  # reader is basically a list of dicts
    for row_d in reader:
        if 'b' in row_d:
            # On the second iteration of this loop, b maps to the empty string but
            # passes this condition, which most of the time you won't want.
            # Using .get would be better for most things here.
            pass
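The difference between the two tests can be checked without a file. In this Python 3 sketch an io.StringIO object stands in for the CSV file, and the two counters are illustrative:

```python
import csv
import io

# In-memory stand-in for the CSV content shown in the answer.
data = io.StringIO("a,b\n1,1\n1,\n")

present = 0    # rows where the key 'b' exists
non_empty = 0  # rows where 'b' has a non-empty value
for row in csv.DictReader(data):
    if 'b' in row:
        present += 1
    if row.get('b'):
        non_empty += 1

print(present, non_empty)  # 2 1
```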

For Python 3.x, use if key in dict.
See the sample code:
#!/usr/bin/python
a={'a':1,'b':2,'c':3}
b={'a':1}
c={'a':2}
mylist = [a, b, c]
for obj in mylist:
    if 'b' in obj:
        print(obj['b'])
Output: 2

access value of a python dict() without knowing the keys

I have a dictionary of a list of dictionaries. something like below:
x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
The length of the lists (values) is the same for all keys of dict x.
I want to get the length of any one value i.e. a list without having to go through the obvious method -> get the keys, use len(x[keys[0]]) to get the length.
My code for this as of now:
val = None
for key in x.keys():
    val = x[key]
    break
    # break after the first iteration as the length of the lists is the same for any key
try:
    what_i_Want = len(val)
except TypeError:
    print "val wasn't set"
I am not happy with this; I believe it can be made more 'pythonic'.

This is the most efficient way, since we don't create any intermediate lists.
print len(x[next(iter(x))]) # 2
Note: For this method to work, the dictionary should have at least one key in it.

What about this:
val = x[x.keys()[0]]
or alternatively:
val = x.values()[0]
and then your answer is
len(val)
Some of the other solutions (posted by thefourtheye and gnibbler) are better because they are not creating an intermediate list. I added this response merely as an easy to remember and obvious option, not a solution for time-efficient usage.

Works ok in Python2 or Python3
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> next(len(i) for i in x.values())
2
This is better for Python2 as it avoids making a list of the values. Works well in Python3 too
>>> next(len(x[k]) for k in x)
2

Using next and iter:
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> val = next(iter(x.values()), None) # Use `itervalues` in Python 2.x
>>> val
[{'q': 2, 'p': 1}, {'q': 5, 'p': 4}]
>>> len(val)
2
>>> x = {}
>>> val = next(iter(x.values()), None) # `None`: default value
>>> val is None
True

>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> len(x.values()[0])
2
Here, x.values() gives you a list of all the values (in Python 2), and you can then take the length of any one of them.

Multi-level defaultdict with variable depth?

I have a large list like:
[A][B1][C1]=1
[A][B1][C2]=2
[A][B2]=3
[D][E][F][G]=4
I want to build a multi-level dict like:
A
--B1
-----C1=1
-----C2=2
--B2=3
D
--E
----F
------G=4
I know that if I use a recursive defaultdict I can write table[A][B1][C1]=1, table[A][B2]=3, but this works only if I hardcode those insert statements.
While parsing the list, I don't know beforehand how many []'s I need to call table[key1][key2][...].

You can do it without even defining a class:
from collections import defaultdict
nested_dict = lambda: defaultdict(nested_dict)
nest = nested_dict()
nest[0][1][2][3][4][5] = 6
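To handle key paths whose length is only known while parsing, one option is to walk the nested structure in a loop. The sketch below assumes each parsed entry is available as a list of keys plus a value; set_path and the entries list are hypothetical names introduced for illustration:

```python
from collections import defaultdict


def nested_dict():
    return defaultdict(nested_dict)


def set_path(root, keys, value):
    """Walk root along keys[:-1], creating levels as needed, then set the leaf."""
    node = root
    for key in keys[:-1]:
        node = node[key]       # missing levels are created by the defaultdict
    node[keys[-1]] = value


# Hypothetical parsed input: (list of keys, value) for each line of the list.
entries = [
    (['A', 'B1', 'C1'], 1),
    (['A', 'B1', 'C2'], 2),
    (['A', 'B2'], 3),
    (['D', 'E', 'F', 'G'], 4),
]

table = nested_dict()
for keys, value in entries:
    set_path(table, keys, value)

print(table['A']['B1']['C2'])     # 2
print(table['D']['E']['F']['G'])  # 4
```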

Your example says that at any level there can be a value, and also a dictionary of sub-elements. That is called a tree, and there are many implementations available for them. This is one:
from collections import defaultdict
class Tree(defaultdict):
    def __init__(self, value=None):
        super(Tree, self).__init__(Tree)
        self.value = value
root = Tree()
root.value = 1
root['a']['b'].value = 3
print root.value
print root['a']['b'].value
print root['c']['d']['f'].value
Outputs:
1
3
None
You could do something similar by writing the input in JSON and using json.load to read it as a structure of nested dictionaries.
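As a rough illustration of the JSON suggestion (Python 3): the string below is a hypothetical JSON encoding of the question's data, and json.loads is used instead of json.load only so that the example needs no file.

```python
import json

# Hypothetical JSON encoding of the structure from the question.
text = '{"A": {"B1": {"C1": 1, "C2": 2}, "B2": 3}, "D": {"E": {"F": {"G": 4}}}}'

table = json.loads(text)  # plain nested dicts, not defaultdicts
print(table["A"]["B1"]["C2"])     # 2
print(table["D"]["E"]["F"]["G"])  # 4
```

The result consists of ordinary dicts, so looking up a missing key raises KeyError rather than creating a new level.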

I think the simplest implementation of a recursive dictionary is this. Only leaf nodes can contain values.
# Define recursive dictionary
from collections import defaultdict
tree = lambda: defaultdict(tree)
Usage:
# Create instance
mydict = tree()
mydict['a'] = 1
mydict['b']['a'] = 2
mydict['c']
mydict['d']['a']['b'] = 0
# Print
import prettyprint
prettyprint.pp(mydict)
Output:
{
  "a": 1,
  "b": {
    "a": 2
  },
  "c": {},
  "d": {
    "a": {
      "b": 0
    }
  }
}

I'd do it with a subclass of dict that defines __missing__:
>>> class NestedDict(dict):
...     def __missing__(self, key):
...         self[key] = NestedDict()
...         return self[key]
...
>>> table = NestedDict()
>>> table['A']['B1']['C1'] = 1
>>> table
{'A': {'B1': {'C1': 1}}}
You can't do it directly with defaultdict because defaultdict expects the factory function at initialization time, but at initialization time, there's no way to describe the same defaultdict. The above construct does the same thing that defaultdict does, but since it's a named class (NestedDict), it can reference itself as missing keys are encountered. It is also possible to subclass defaultdict and override __init__.

This is equivalent to the above, but avoids lambda notation. Perhaps easier to read?
def dict_factory():
    return defaultdict(dict_factory)
your_dict = dict_factory()
Also, from the comments: if you'd like to update from an existing dict, you can simply call
your_dict[0][1][2].update({"some_key":"some_value"})
to add values to the dict.

Dan O'Huiginn posted a very nice solution on his journal in 2010:
http://ohuiginn.net/mt/2010/07/nested_dictionaries_in_python.html
>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())
...
>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}

You may achieve this with a recursive defaultdict.
from collections import defaultdict
def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
It is important to protect the default factory name, the_tree here, in a closure ("private" local function scope). Avoid using a one-liner lambda version, which is bugged due to Python's late binding closures, and implement this with a def instead.
The accepted answer, using a lambda, has a flaw where instances must rely on the nested_dict name existing in an outer scope. If for whatever reason the factory name can not be resolved (e.g. it was rebound or deleted) then pre-existing instances will also become subtly broken:
>>> nested_dict = lambda: defaultdict(nested_dict)
>>> nest = nested_dict()
>>> nest[0][1][2][3][4][6] = 7
>>> del nested_dict
>>> nest[8][9] = 10
# NameError: name 'nested_dict' is not defined

To add to @Hugo's answer, to have a max depth:
l=lambda x:defaultdict(lambda:l(x-1)) if x>0 else defaultdict(dict)
arr = l(2)
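Written with def instead of a lambda, the same idea might look like the sketch below (limited is a hypothetical name). With a depth of 2, three levels of keys are created automatically, the third holding a plain dict, so going one level deeper through a missing key raises KeyError:

```python
from collections import defaultdict


def limited(depth):
    """Nested defaultdict that stops auto-creating levels once depth reaches 0."""
    if depth > 0:
        return defaultdict(lambda: limited(depth - 1))
    return defaultdict(dict)  # innermost factory makes plain dicts


arr = limited(2)
arr['a']['b']['c']['d'] = 1  # 'a', 'b', 'c' are auto-created; 'c' maps to a plain dict

try:
    arr['a']['b']['c']['x']['y'] = 2  # 'x' is missing from the plain dict
except KeyError:
    too_deep = True
else:
    too_deep = False

print(too_deep)  # True
```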

A slightly different possibility that allows regular dictionary initialization:
from collections import defaultdict
def superdict(arg=()):
    update = lambda obj, arg: obj.update(arg) or obj
    return update(defaultdict(superdict), arg)
Example:
>>> d = {"a":1}
>>> sd = superdict(d)
>>> sd["b"]["c"] = 2

You could use a NestedDict.
from ndicts.ndicts import NestedDict
nd = NestedDict()
nd[0, 1, 2, 3, 4, 5] = 6
The result as a dictionary:
>>> nd.to_dict()
{0: {1: {2: {3: {4: {5: 6}}}}}}
To install ndicts
pip install ndicts

how to rewrite python dicts to get default values

I want to rewrite Python's dictionary access mechanism "getitem" to be able to return default values.
The functionality I am looking for is something like
a = dict()
a.setdefault_value(None)
print a[100] #this would return none
any hints ?
Thanks

There is already a collections.defaultdict:
from collections import defaultdict
a = defaultdict(lambda:None)
print a[100]

There is a defaultdict class in the collections module, available since Python 2.5. The constructor takes a function which will be called when a value is not found. This gives more flexibility than simply returning None.
from collections import defaultdict
a = defaultdict(lambda: None)
print a[100] #gives None
The lambda is just a quick way to define a one-line function with no name. This code is equivalent:
def nonegetter():
    return None
a = defaultdict(nonegetter)
print a[100] #gives None
Another very useful pattern is defaultdict(int), which gives you a mapping from each unique object to its count. Using a normal dict, you would need special cases to avoid KeyError.
counts = defaultdict(int)
for obj in mylist:
    counts[obj] += 1

use a defaultdict (http://docs.python.org/library/collections.html#collections.defaultdict)
import collections
a = collections.defaultdict(lambda:None)
where the argument to the defaultdict constructor is a function which returns the default value.
Note that if you access an unset entry, it actually sets it to the default:
>>> print a[100]
None
>>> a
defaultdict(<function <lambda> at 0x38faf0>, {100: None})
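If that insertion is not wanted, get() can be used for the lookup, since it does not call the default factory. A short Python 3 sketch of the difference:

```python
from collections import defaultdict

a = defaultdict(lambda: None)

looked_up = a.get(100)    # get() does not call the default factory
size_after_get = len(a)   # the dict is still empty

indexed = a[100]          # indexing a missing key inserts it
size_after_index = len(a)

print(size_after_get, size_after_index)  # 0 1
```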

If you really don't want to use defaultdict, you need to define your own subclass of dict, like so:
class MyDefaultDict(dict):
    def setdefault_value(self, default):
        self.__default = default
    def __getitem__(self, key):
        try:
            # call the base class directly; self[key] would recurse forever
            return dict.__getitem__(self, key)
        except KeyError:  # dict raises KeyError, not IndexError
            return self.__default
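A variant of the same idea can be sketched with the __missing__ hook, which dict calls for absent keys, so __getitem__ does not need to be overridden. DefaultingDict is a hypothetical name; unlike defaultdict, this version returns the default without storing it:

```python
class DefaultingDict(dict):
    """dict that returns a fixed default for missing keys without inserting them."""

    def __init__(self, default=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent; nothing is stored.
        return self.default


a = DefaultingDict(None, {'x': 1})
print(a['x'])    # 1
print(a[100])    # None
print(100 in a)  # False
```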

I wasn't aware of defaultdict, and that's probably the best way to go. If you are opposed to it for some reason, I've written a small wrapper function for this purpose in the past. It has slightly different functionality that may or may not be better for you.
def makeDictGet(d, defaultVal):
    return lambda key: d[key] if key in d else defaultVal
And using it...
>>> d1 = {'a':1,'b':2}
>>> d1Get = makeDictGet(d1, 0)
>>> d1Get('a')
1
>>> d1Get(5)
0
>>> d1['newAddition'] = 'justAddedThisOne' #changing dict after the fact is fine
>>> d1Get('newAddition')
'justAddedThisOne'
>>> del d1['a']
>>> d1Get('a')
0
>>> d1GetDefaultNone = makeDictGet(d1, None) #having more than one such function is fine
>>> print d1GetDefaultNone('notpresent')
None
>>> d1Get('notpresent')
0
>>> f = makeDictGet({'k1':'val1','pi':3.14,'e':2.718},False) #just put new dict as arg if youre ok with being unable to change it or access directly
>>> f('e')
2.718
>>> f('bad')
False
