What exactly does the TYPE lambda do when used with defaultdict? I have this example, and it works fine with int, list, and a lambda as the argument:
d = defaultdict(int)
d['one'] = lambda x:x*x
d['one'](2)
4
d = defaultdict(list)
d['one'] = lambda x:x*x
d['one'](2)
4
d = defaultdict(lambda: None)
d['one'] = lambda x:x*x
d['one'](2)
4
I get the same result each time. So what is the main reason to initialize with a lambda, as in defaultdict(lambda: None)? It looks like defaultdict does not care what TYPE of argument is passed in.
Your example only makes sense when you access keys that are not explicitly added to the dictionary:
>>> d = defaultdict(int)
>>> d['one']
0
>>> d = defaultdict(list)
>>> d['one']
[]
>>> d = defaultdict(lambda: None)
>>> d['one'] is None
True
As you can see, using a default dict will give every key you try to access a default value. That default value is obtained by calling the function you pass to the constructor. So passing int will set int() as the default value (which is 0); passing list will set list() as the default value (which is an empty list []); and passing lambda: None will set (lambda: None)() as the default value (which is None).
That’s what the default dictionary does. Nothing else.
The idea is that this way, you can set up defaults which you don’t need to manually set up the first time you want to access the key. So for example something like this:
d = {}
for item in some_source_for_items:
    if item['key'] not in d:
        d[item['key']] = []
    d[item['key']].append(item)
which just sets up a new empty list for every dictionary item when it is accessed, can be reduced to just this:
d = defaultdict(list)
for item in some_source_for_items:
    d[item['key']].append(item)
And the defaultdict will make sure to initialize the list correctly.
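To try the grouping loop end to end, here is a minimal sketch. The sample data is made up for illustration (the original only names some_source_for_items), and it appends the item's name rather than the whole item to keep the output short.

```python
from collections import defaultdict

# Hypothetical sample data standing in for some_source_for_items.
some_source_for_items = [
    {'key': 'fruit', 'name': 'apple'},
    {'key': 'veg', 'name': 'carrot'},
    {'key': 'fruit', 'name': 'pear'},
]

grouped = defaultdict(list)
for item in some_source_for_items:
    # A missing key gets a fresh empty list before .append() runs.
    grouped[item['key']].append(item['name'])

print(dict(grouped))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```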
You are not using the default value factory. You won't see a difference if all you do is assign to keys, rather than try and retrieve a key that isn't in the dictionary yet.
The default value factory (the first argument to defaultdict()) is not a type declaration. It is instead called whenever you try and access a key that isn't in the dictionary yet:
>>> from collections import defaultdict
>>> def demo_factory():
...     print('Called the factory for a missing key')
...     return 'Default value'
...
>>> d = defaultdict(demo_factory)
>>> list(d) # list the keys
[]
>>> d['foo']
Called the factory for a missing key
'Default value'
>>> list(d)
['foo']
>>> d['foo']
'Default value'
>>> d['bar'] = 'spam' # assignment is not the same thing
>>> list(d)
['foo', 'bar']
>>> d['bar']
'spam'
Only the first time when I tried to access the key 'foo' was the factory called to produce a default value, which is then stored in the dictionary for future access.
So for each of your different examples, what varies between them is what default value will be produced for each. You never access this functionality, because you directly assigned to the 'one' key.
Had you accessed a non-existing key you'd have created an integer with value 0, an empty list or None, respectively.
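One related detail worth checking for yourself: only indexing with d[key] triggers the factory. Membership tests and .get() leave the dictionary untouched. A small sketch:

```python
from collections import defaultdict

d = defaultdict(list)

# Membership tests and .get() do not call the factory or add the key.
print('missing' in d)    # False
print(d.get('missing'))  # None
size_before = len(d)
print(size_before)       # 0

# Indexing does call the factory and stores the result.
print(d['missing'])      # []
size_after = len(d)
print(size_after)        # 1
```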
I am trying to understand some code and I found the following script:
defaultdict(lambda: defaultdict(lambda: 0))
I am not familiar with defaultdict or with lambda functions. I suspect that this is equivalent to initializing a dictionary whose values are also dictionaries. Am I right?
A lambda expression defines a function in-place. Arguments go before the :, and the result of the function goes after it. For example:
>>> inc = lambda x: x+1
>>> inc(3)
4
>>> add = lambda x, y: x + y
>>> add(19, 23)
42
>>> zero = lambda: 0
>>> zero()
0
defaultdict is a dict that creates a default value any time you access it with a non-existent key. It does this by calling the function you pass to it. A common use case is to create a counter by having a defaultdict that automatically creates zero values that can then be incremented:
>>> foo = defaultdict(lambda: 0)
>>> foo["bar"]
0
>>> foo["bar"] += 1
>>> foo["bar"]
1
Since the function used by a defaultdict can be anything, we can nest them by giving an outer dict a function that returns an inner defaultdict:
>>> foo = defaultdict(lambda: defaultdict(lambda: 0))
>>> foo["bar"]["baz"]
0
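As a rough illustration of where the nested form is handy, here is a sketch that counts pairs. The events list and the names user/page are invented for the example.

```python
from collections import defaultdict

# Hypothetical data: (user, page) visit events.
events = [('ann', 'home'), ('bob', 'home'), ('ann', 'home'), ('ann', 'about')]

visits = defaultdict(lambda: defaultdict(lambda: 0))
for user, page in events:
    # Both levels are created on demand, so no setup code is needed.
    visits[user][page] += 1

print(visits['ann']['home'])   # 2
print(visits['bob']['about'])  # 0 (and 'about' is now stored under 'bob')
```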
In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.
I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written as
x = defaultdict(int)
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.
You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
y = defaultdict(lambda: defaultdict(lambda: 0))
print(y['k1']['k2'])  # 0
print(dict(y['k1']))  # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}
defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.
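A short sketch of that second part, using int for the inner factory. It also shows why the lambda wrapper is needed: the factory has to be callable, and a defaultdict instance is not.

```python
from collections import defaultdict

# The factory must be callable, so the inner defaultdict is wrapped in a lambda.
y = defaultdict(lambda: defaultdict(int))
y['ham']['spam'] += 1
print(y['ham']['spam'])  # 1

# Passing an instance instead of a callable is rejected.
error = None
try:
    defaultdict(defaultdict(int))
except TypeError as exc:
    error = str(exc)
    print(error)
```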
All the answers are good, but I am adding one to provide more information:
"defaultdict requires an argument that is callable. The return value of that callable object is the default value that the dictionary returns when you try to access the dictionary with a key that does not exist."
Here's an example
>>> SAMPLE = {'Age': 28, 'Salary': 2000}
>>> SAMPLE = defaultdict(lambda: 0, SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']    # returns 28
>>> SAMPLE['Phone']  # returns 0, the default for a key that does not exist in SAMPLE
y = defaultdict(lambda:defaultdict(lambda:0))
is helpful when you want to do something like y['a']['b'] += 1 without creating y['a'] first.
This question relates to the solution provided here; it involves the following code:
# MutableMapping lives in collections.abc; the old collections alias was removed in Python 3.10.
from collections.abc import MutableMapping

def set_value(d, keys, newkey, newvalue, default_factory=dict):
    """
    Equivalent to `reduce(dict.get, keys, d)[newkey] = newvalue`
    if all `keys` exists and corresponding values are of correct type
    """
    for key in keys:
        try:
            val = d[key]
        except KeyError:
            val = d[key] = default_factory()
        else:
            if not isinstance(val, MutableMapping):
                val = d[key] = default_factory()
        d = val
    d[newkey] = newvalue
I'm hoping someone can explain why this code works. I'm confused about how the passed-in dict d doesn't get constantly overwritten at d = val. How does d keep gaining further nested dictionaries without ever indexing to the next node? Sorry if that doesn't make sense; I don't understand how this works.
Thanks for your help!
d is rebound; the variable is updated to point to val in each loop.
For each key in keys, either the key is found (val = d[key] succeeds) or the default_factory() is used to create a new value for that key.
If the key was found but the value was not a MutableMapping type, the found value is replaced with a new default_factory() result.
Once the new value has been determined for this level, d is told to forget about the old dictionary and pointed to the new instead.
Rebinding does not change the old value. It merely stops referring to that old value.
Let's use a simple example:
>>> d = {'foo': {}}
>>> keys = ['foo']
>>> newkey = 'bar'
>>> newval = 'eggs'
>>> original = d
At the start, original and d are the same object. Think of names here as paper labels, and their values as balloons. The labels are tied with string to the balloons. In the above example, the d and original labels are both tied to the same dictionary balloon.
When we enter the for key in keys loop, the d[key] lookup succeeds and val is tied to the result of d['foo'], an empty dictionary:
>>> key = keys[0]
>>> key
'foo'
>>> val = d[key]
>>> val
{}
This is a regular Python dictionary, and isinstance(val, MutableMapping) is True. The next line rebinds the d label to that dictionary. The string is simply untied from the original dictionary and now attached to the same balloon val is tied to:
>>> d = val
>>> d
{}
>>> original
{'foo': {}}
>>> d is val
True
>>> d is original
False
The original dictionary was not altered by the rebinding!
Having run out of keys (there was only one in keys), the next part then assigns newval to d[newkey]:
>>> d[newkey] = newval
>>> d
{'bar': 'eggs'}
However, d is not the only label attached to this dictionary balloon. Dictionaries themselves contain keys and values, both of which are labels that are tied to balloons too! The original label is still tied to the outer dictionary balloon, and the value associated with its foo key is tied to a nested dictionary. It is this nested dictionary we just changed:
>>> original
{'foo': {'bar': 'eggs'}}
The algorithm merely followed along labels via strings to new dictionaries.
Using more complex key combinations just means more strings are being followed, with perhaps an extra dictionary being pumped up to be tied in.
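To see the whole walk with more than one key, here is a sketch that repeats the function from the question and runs it on a small dictionary. The config data and key names are made up for the example.

```python
from collections.abc import MutableMapping

def set_value(d, keys, newkey, newvalue, default_factory=dict):
    for key in keys:
        try:
            val = d[key]
        except KeyError:
            val = d[key] = default_factory()
        else:
            if not isinstance(val, MutableMapping):
                val = d[key] = default_factory()
        d = val  # rebinds the local name only; the caller's dict is untouched
    d[newkey] = newvalue

# Hypothetical data for illustration.
config = {'db': {'host': 'localhost'}}
set_value(config, ['db', 'options'], 'timeout', 30)  # 'options' is created
set_value(config, ['cache'], 'size', 128)            # 'cache' is created
print(config)
# {'db': {'host': 'localhost', 'options': {'timeout': 30}}, 'cache': {'size': 128}}
```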
I think your question boils down to:
Why does d[newkey] = newvalue modify the object, while d = val does not do anything to the object?
It is just the case that in Python, you can modify a mutable object in a function, but you can't change what object the outer name refers to.
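A tiny sketch of that difference, with two throwaway functions (rebind and mutate are names invented here):

```python
def rebind(d):
    d = {'replaced': True}  # the local name now points elsewhere; the caller is unaffected

def mutate(d):
    d['changed'] = True     # modifies the object the caller also refers to

outer = {}
rebind(outer)
after_rebind = dict(outer)
print(after_rebind)  # {}

mutate(outer)
print(outer)         # {'changed': True}
```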
I use a dict as a short-term cache. I want to get a value from the dictionary, and if the dictionary didn't already have that key, set it, e.g.:
val = cache.get('the-key', calculate_value('the-key'))
cache['the-key'] = val
In the case where 'the-key' was already in cache, the second line is not necessary. Is there a better, shorter, more expressive idiom for this?
Yes, use:
val = cache.setdefault('the-key', calculate_value('the-key'))
An example in the shell:
>>> cache = {'a': 1, 'b': 2}
>>> cache.setdefault('a', 0)
1
>>> cache.setdefault('b', 0)
2
>>> cache.setdefault('c', 0)
0
>>> cache
{'a': 1, 'c': 0, 'b': 2}
See: http://docs.python.org/release/2.5.2/lib/typesmapping.html
Readability matters!
if 'the-key' not in cache:
    cache['the-key'] = calculate_value('the-key')
val = cache['the-key']
If you really prefer a one-liner:
val = cache['the-key'] if 'the-key' in cache else cache.setdefault('the-key', calculate_value('the-key'))
Another option is to define __missing__ in the cache class:
class Cache(dict):
    def __missing__(self, key):
        return self.setdefault(key, calculate_value(key))
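Here is a sketch of how such a class behaves in use. calculate_value is a stand-in written for this example (the original never defines it), and the calls list is only there to show how often it runs.

```python
calls = []

def calculate_value(key):
    # Hypothetical stand-in for an expensive computation.
    calls.append(key)
    return key.upper()

class Cache(dict):
    def __missing__(self, key):
        # Only reached when cache[key] finds no entry.
        value = self[key] = calculate_value(key)
        return value

cache = Cache()
print(cache['the-key'])  # THE-KEY
print(cache['the-key'])  # THE-KEY (served from the dict this time)
print(calls)             # ['the-key']
```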
Have a look at the Python Decorator Library, and more specifically Memoize, which acts as a cache. That way you can just decorate calculate_value with the Memoize decorator.
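The Memoize recipe itself is not reproduced here. As a swapped-in alternative, the standard library's functools.lru_cache (Python 3.2+) gives the same kind of decorator-based caching. A sketch, with a made-up calculate_value:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def calculate_value(key):
    # Hypothetical stand-in for an expensive computation.
    calls.append(key)
    return len(key)

print(calculate_value('the-key'))  # 7
print(calculate_value('the-key'))  # 7 (cached, the body does not run again)
print(calls)                       # ['the-key']
```

Note that lru_cache requires the arguments to be hashable.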
The approach with
cache.setdefault('the-key',calculate_value('the-key'))
is fine if calculate_value is not costly, because it is evaluated on every call, even when the key is already in the cache. So if you have to read from a DB, open a file or network connection, or do anything else "expensive", use the following structure instead:
try:
    val = cache['the-key']
except KeyError:
    val = calculate_value('the-key')
    cache['the-key'] = val
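A quick way to convince yourself of the difference is to count the calls. In this sketch calculate_value is a made-up stand-in and the key is already cached:

```python
calls = []

def calculate_value(key):
    # Hypothetical stand-in for an expensive lookup.
    calls.append(key)
    return key.upper()

cache = {'the-key': 'cached'}

# The argument is evaluated before setdefault runs, even on a cache hit.
val = cache.setdefault('the-key', calculate_value('the-key'))
calls_after_setdefault = len(calls)
print(val, calls_after_setdefault)  # cached 1

# The try/except form only computes on a miss.
try:
    val2 = cache['the-key']
except KeyError:
    val2 = cache['the-key'] = calculate_value('the-key')
print(val2, len(calls))  # cached 1
```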
You might want to take a look at (the entire page at) "Code Like a Pythonista" http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#dictionary-get-method
It covers the setdefault() technique described above, and the defaultdict technique is also very handy for making dictionaries of sets or arrays for example.
You can also use defaultdict to do something similar:
>>> from collections import defaultdict
>>> d = defaultdict(int) # will default values to 0
>>> d["a"] = 1
>>> d["a"]
1
>>> d["b"]
0
>>>
You can assign any default you want by supplying your own factory function and itertools.repeat:
>>> from itertools import repeat
>>> def constant_factory(value):
...     return repeat(value).__next__  # spelled .next in Python 2
...
>>> default_value = "default"
>>> d = defaultdict(constant_factory(default_value))
>>> d["a"]
'default'
>>> d["b"] = 5
>>> d["b"]
5
>>> list(d)
['a', 'b']
Use the setdefault method.
If the key is not already present, setdefault creates it with the value provided in the second argument and returns that value; if the key is already present, it returns the existing value of that key.
val = cache.setdefault('the-key',value)
Use get to extract the value or to get None.
Combining None with or will let you chain another operation (setdefault)
def get_or_add(cache, key, value_factory):
    return cache.get(key) or cache.setdefault(key, value_factory())
Usage:
To make it lazy, the function expects a callable as the third parameter:
get_or_add(cache, 'the-key', lambda: calculate_value('the-key'))
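One caveat with the `or` trick: a cached value that is falsy (0, '', [], None) looks like a miss, so the factory runs again even though the key is present. A sketch showing this, plus a membership-based variant (get_or_add_strict is a name invented here):

```python
def get_or_add(cache, key, value_factory):
    return cache.get(key) or cache.setdefault(key, value_factory())

def get_or_add_strict(cache, key, value_factory):
    # Checks for the key itself, so falsy cached values count as hits.
    if key not in cache:
        cache[key] = value_factory()
    return cache[key]

calls = []

def factory():
    calls.append(1)
    return 'computed'

cache = {'zero': 0}

result = get_or_add(cache, 'zero', factory)
calls_after_or = len(calls)
print(result, calls_after_or)     # 0 1  (right value, but the factory ran anyway)

strict = get_or_add_strict(cache, 'zero', factory)
print(strict, len(calls))         # 0 1  (no extra factory call)
```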
I want to override Python's dictionary access mechanism (__getitem__) so that it can return default values.
The functionality I am looking for is something like
a = dict()
a.setdefault_value(None)
print(a[100])  # this would return None
Any hints?
Thanks
There is already a collections.defaultdict:
from collections import defaultdict
a = defaultdict(lambda:None)
print(a[100])
defaultdict has been available in the collections module since Python 2.5. The constructor takes a function which will be called when a value is not found. This gives more flexibility than simply returning None.
from collections import defaultdict
a = defaultdict(lambda: None)
print(a[100])  # gives None
The lambda is just a quick way to define a one-line function with no name. This code is equivalent:
def nonegetter():
    return None
a = defaultdict(nonegetter)
print(a[100])  # gives None
This is a very useful pattern which gives you a hash showing the count of each unique object. Using a normal dict, you would need special cases to avoid KeyError.
counts = defaultdict(int)
for obj in mylist:
    counts[obj] += 1
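For comparison, here is a sketch of the special-casing a plain dict needs next to the defaultdict version. mylist is filled with made-up sample data.

```python
from collections import defaultdict

mylist = ['a', 'b', 'a', 'c', 'a']  # hypothetical sample data

# Plain dict: each new key has to be initialised by hand.
plain = {}
for obj in mylist:
    if obj not in plain:
        plain[obj] = 0
    plain[obj] += 1

# defaultdict(int): missing keys start at int(), i.e. 0.
counts = defaultdict(int)
for obj in mylist:
    counts[obj] += 1

print(plain == counts)  # True
print(dict(counts))     # {'a': 3, 'b': 1, 'c': 1}
```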
Use a defaultdict (http://docs.python.org/library/collections.html#collections.defaultdict):
import collections
a = collections.defaultdict(lambda:None)
where the argument to the defaultdict constructor is a function which returns the default value.
Note that if you access an unset entry, it actually sets it to the default:
>>> print(a[100])
None
>>> a
defaultdict(<function <lambda> at 0x38faf0>, {100: None})
If you really don't want to use defaultdict, you need to define your own subclass of dict, like so:
class MyDefaultDict(dict):
    def setdefault_value(self, default):
        self.__default = default

    def __getitem__(self, key):
        try:
            # Call the base class directly; self[key] would recurse forever.
            return dict.__getitem__(self, key)
        except KeyError:  # a missing dict key raises KeyError, not IndexError
            return self.__default
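A sketch of how a subclass along these lines behaves next to defaultdict. This version is written for the example (it uses super().__getitem__ and catches KeyError), so treat it as one possible spelling rather than the definitive one. The main behavioural difference is that the subclass returns the default without storing it.

```python
from collections import defaultdict

class MyDefaultDict(dict):
    def setdefault_value(self, default):
        self._default = default

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self._default

a = MyDefaultDict()
a.setdefault_value(None)
print(a[100])    # None
print(100 in a)  # False: the default is returned but not stored

b = defaultdict(lambda: None)
print(b[100])    # None
print(100 in b)  # True: defaultdict stores the default on first access
```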
I wasn't aware of defaultdict, and that's probably the best way to go. If you are opposed to it for some reason, I've written a small wrapper function for this purpose in the past. It has slightly different functionality that may or may not be better for you.
def makeDictGet(d, defaultVal):
    # Test membership in the wrapped dict d, not in the dict type itself.
    return lambda key: d[key] if key in d else defaultVal
And using it...
>>> d1 = {'a':1,'b':2}
>>> d1Get = makeDictGet(d1, 0)
>>> d1Get('a')
1
>>> d1Get(5)
0
>>> d1['newAddition'] = 'justAddedThisOne' #changing dict after the fact is fine
>>> d1Get('newAddition')
'justAddedThisOne'
>>> del d1['a']
>>> d1Get('a')
0
>>> d1GetDefaultNone = makeDictGet(d1, None) #having more than one such function is fine
>>> print(d1GetDefaultNone('notpresent'))
None
>>> d1Get('notpresent')
0
>>> f = makeDictGet({'k1':'val1','pi':3.14,'e':2.718},False) #just pass a new dict as the argument if you're ok with being unable to change it or access it directly
>>> f('e')
2.718
>>> f('bad')
False