In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (in a statement like v = x[k]), the key-value pair (k, 0) will be automatically added to the dictionary, as if the statement x[k] = 0 were executed first. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean, concretely? I tried to play around with it in the Python shell, but couldn't figure out exactly what it does.
I think the first line means that when I call x[k] for a nonexistent key k (in a statement like v = x[k]), the key-value pair (k, 0) will be automatically added to the dictionary, as if the statement x[k] = 0 were executed first.
That's right. This is more idiomatically written
x = defaultdict(int)
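For example, a quick interactive check (my own illustration, not from the original post):

>>> from collections import defaultdict
>>> x = defaultdict(int)
>>> x['missing']   # int() is called for the missing key and returns 0
0
>>> dict(x)        # the lookup inserted the key as a side effect
{'missing': 0}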
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.
You are correct about what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of it as a nested dictionary. Consider the following example:
from collections import defaultdict

y = defaultdict(lambda: defaultdict(lambda: 0))
print y['k1']['k2']    # 0
print dict(y['k1'])    # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}
defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.
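So an equivalent, slightly shorter spelling of the second line would be:
y = defaultdict(lambda: defaultdict(int))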
The existing answers are good; I am just adding a bit more information.
defaultdict requires an argument that is callable. The return value of that callable is the default value that the dictionary returns when you try to access it with a key that does not exist.
Here's an example
>>> from collections import defaultdict
>>> SAMPLE = {'Age': 28, 'Salary': 2000}
>>> SAMPLE = defaultdict(lambda: 0, SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']     # existing key: returns 28
28
>>> SAMPLE['Phone']   # non-existent key: returns the default 0
0
y = defaultdict(lambda: defaultdict(lambda: 0))
is helpful when you want to write y['a']['b'] += 1 without having to initialise y['a'] first.
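For example, a minimal sketch of tallying pairs with no explicit initialisation:

>>> from collections import defaultdict
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> for outer, inner in [('a', 'b'), ('a', 'b'), ('a', 'c')]:
...     y[outer][inner] += 1
...
>>> {k: dict(v) for k, v in y.items()}
{'a': {'b': 2, 'c': 1}}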
Related
I currently have this code:
lst = [1,2,3,4]
c = dict((el,0) for el in lst)
for key in lst:
    c[key] += increase_val(key)
Is there a more Pythonic way to do it, like using map? This code works, but I would prefer a one-liner or perhaps a better way of writing it.
In my opinion, that is a very clean, readable way of updating the dictionary in the way you wanted.
However, if you are looking for a one-liner, here's one:
new_dict = {x: y + increase_val(x) for x, y in old_dict.items()}
What's different is that this creates a new dictionary instead of updating the original one. If you want to mutate the dictionary in place, I think the plain old for-loop would be the most readable alternative.
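For example, with a stand-in increase_val (the question does not show its definition):

def increase_val(key):   # hypothetical stand-in, for illustration only
    return key * 10

old_dict = dict((el, 0) for el in [1, 2, 3, 4])
new_dict = {x: y + increase_val(x) for x, y in old_dict.items()}
print(new_dict)          # {1: 10, 2: 20, 3: 30, 4: 40}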
In your case there is no need for the c = dict((el,0) for el in lst) statement, because it only creates a dictionary in which the value of every key is 0.
In the next for loop you then add the increment to that 0 (i.e. 0 + 100 = 100), so the addition is unnecessary as well.
You can write code like:
lst = [1,2,3,4]
c = {}
for key in lst:
    c[key] = increase_val(key)
collections.Counter()
Use collections.Counter() to avoid the separate pass over the list that only builds a dictionary of zeros, since the default value of every key in your case is 0.
Import the module first: import collections
Demo:
>>> lst = [1,2,3,4]
>>> data = collections.Counter()
>>> for key in lst:
...     data[key] += increase_val(key)
...
collections.defaultdict()
We can also use collections.defaultdict: just use data = collections.defaultdict(int) in the code above. Here the default value is zero.
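For instance, with the question's increase_val assumed to be defined elsewhere:

>>> data = collections.defaultdict(int)   # missing keys start at 0
>>> for key in lst:
...     data[key] += increase_val(key)
...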
But if we want to set the default value to some other constant, say 100, we can use a lambda to do it:
Demo:
>>> data = {}
>>> data["any"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'any'
You get a KeyError because there is no 'any' key in the dictionary.
>>> data1 = collections.defaultdict(lambda:0, data)
>>> data1["any"]
0
>>> data1 = collections.defaultdict(lambda: 100, data)
>>> data1["any"]
100
I've hit a bit of a problem with creating empty dictionaries within dictionaries while using fromkeys(); they all link to the same one.
Here's a quick bit of code to demonstrate what I mean:
a = dict.fromkeys( range( 3 ), {} )
for key in a:
    a[key][0] = key
The output I'd want is a[0][0] = 0, a[1][0] = 1, a[2][0] = 2, yet they all end up as 2, since the code is editing the same dictionary three times.
If I were to define the dictionary like a = {0: {}, 1: {}, 2: {}}, it works, but that's not very practical if you need to build it from a bigger list.
With fromkeys I've tried {}, dict(), dict.copy() and b = {}; b.copy(). How would I go about doing this?
The problem is that fromkeys receives {} as a single value, not as a factory, so every key shares that one mutable dict rather than getting its own copy.
defaultdict is one way to create a dict that has a builtin factory.
from collections import defaultdict as dd
from pprint import pprint as pp
a = dd(dict)
for key in range(3):
    a[key][0] = key
pp(a)
If you want something more strictly evaluated, you will need to use a dict comprehension or map.
a = {key: {} for key in range(3)}
But then, if you're going to do that, you may as well build the whole thing at once:
a = {key: {0: key} for key in range(3)}
Just iterate over keys and insert a dict for each key:
{k: {0: k} for k in keys}
Here, keys is an iterable of hashable values such as range(3) in your example.
I have a dictionary whose values are lists of dictionaries, something like below:
x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
The length of the lists (values) is the same for all keys of dict x.
I want to get the length of any one value (i.e. a list) without going through the obvious route: get the keys, then use len(x[keys[0]]).
My code for this as of now:
val = None
for key in x.keys():
    val = x[key]
    break  # break after the first iteration, as the length of the lists is the same for any key

try:
    what_i_Want = len(val)
except TypeError:
    print "val wasn't set"
I am not happy with this; it can be made more 'pythonic', I believe.
This is the most efficient way, since we don't create any intermediate lists.
print len(x[next(iter(x))]) # 2
Note: For this method to work, the dictionary should have at least one key in it.
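If the dictionary might be empty, a small variation (not part of the original answer) avoids the error by supplying a default:

length = len(next(iter(x.values()), []))   # 0 for an empty dict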
What about this:
val = x[x.keys()[0]]
or alternatively:
val = x.values()[0]
and then your answer is
len(val)
Some of the other solutions (posted by thefourtheye and gnibbler) are better because they do not create an intermediate list; note also that x.keys()[0] and x.values()[0] only work in Python 2, where those methods return lists. I added this response merely as an easy-to-remember, obvious option, not as a time-efficient solution.
Works ok in Python2 or Python3
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> next(len(i) for i in x.values())
2
This is better for Python2 as it avoids making a list of the values. Works well in Python3 too
>>> next(len(x[k]) for k in x)
2
Using next and iter:
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> val = next(iter(x.values()), None) # Use `itervalues` in Python 2.x
>>> val
[{'q': 2, 'p': 1}, {'q': 5, 'p': 4}]
>>> len(val)
2
>>> x = {}
>>> val = next(iter(x.values()), None) # `None`: default value
>>> val is None
True
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> len(x.values()[0])
2
Here, x.values() gives you a list of all the values (in Python 2), so you can take the length of any one of them.
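In Python 3, values() returns a view object that cannot be indexed, so you would need something like:

len(list(x.values())[0])      # builds an intermediate list
len(next(iter(x.values())))   # avoids the intermediate list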
This is a question about the solution provided here; it involves the following code:
from collections import MutableMapping

def set_value(d, keys, newkey, newvalue, default_factory=dict):
    """
    Equivalent to `reduce(dict.get, keys, d)[newkey] = newvalue`
    if all `keys` exist and the corresponding values are of the correct type
    """
    for key in keys:
        try:
            val = d[key]
        except KeyError:
            val = d[key] = default_factory()
        else:
            if not isinstance(val, MutableMapping):
                val = d[key] = default_factory()
        d = val
    d[newkey] = newvalue
I'm hoping someone can explain why this code works. I'm confused about how the passed-in dict d doesn't get constantly overwritten where d = val. How does the dict d keep acquiring further nested dictionaries without ever indexing into the next node? Sorry if that doesn't make sense; I don't understand how this works.
Thanks for your help!
d is rebound; the variable is updated to point to val in each iteration of the loop.
For each key in keys, either the key is found (val = d[key] succeeds) or the default_factory() is used to create a new value for that key.
If the key was found but the value was not a MutableMapping type, the found value is replaced with a new default_factory() result.
Once the new value has been determined for this level, d is told to forget about the old dictionary and pointed to the new instead.
Rebinding does not change the old value. It merely stops referring to that old value.
Let's use a simple example:
>>> d = {'foo': {}}
>>> keys = ['foo']
>>> newkey = 'bar'
>>> newval = 'eggs'
>>> original = d
At the start, original and d are the same object. Think of names here as paper labels, and their values as balloons. The labels are tied with string to the balloons. In the above example, the d and original labels are both tied to the same dictionary balloon.
When we enter the for key in keys loop, the d[key] lookup succeeds and val is tied to the result of d['foo'], an empty dictionary:
>>> key = keys[0]
>>> key
'foo'
>>> val = d[key]
>>> val
{}
This is a regular Python dictionary, and isinstance(val, MutableMapping) is True. The next line rebinds the d label to that dictionary. The string is simply untied from the original dictionary and now attached to the same balloon val is tied to:
>>> d = val
>>> d
{}
>>> original
{'foo': {}}
>>> d is val
True
>>> d is original
False
The original dictionary was not altered by the rebinding!
Having run out of keys (there was only one in keys), the next part then assigns newval to d[newkey]:
>>> d[newkey] = newval
>>> d
{'bar': 'eggs'}
However, d is not the only label attached to this dictionary balloon. Dictionaries themselves contain keys and values, both of which are labels that are tied to balloons too! The original label is still tied to the outer dictionary balloon, whose foo key is associated with a value that was tied to a nested dictionary, and it is this nested dictionary we just changed:
>>> original
{'foo': {'bar': 'eggs'}}
The algorithm merely followed along labels via strings to new dictionaries.
Using more complex key combinations just means more strings are being followed, with perhaps an extra dictionary being pumped up to be tied in.
I think your question boils down to:
Why does d[newkey] = newvalue modify the object, while d = val does not do anything to the object?
It is just the case that in Python, you can modify a mutable object in a function, but you can't change what object the outer name refers to.
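A minimal illustration of that difference:

def mutate(d):
    d['x'] = 1       # modifies the object the caller also sees

def rebind(d):
    d = {'x': 1}     # only rebinds the local name; the caller's dict is untouched

outer = {}
mutate(outer)        # outer is now {'x': 1}
rebind(outer)        # outer is still {'x': 1}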
I use a dict as a short-term cache. I want to get a value from the dictionary, and if the dictionary didn't already have that key, set it, e.g.:
val = cache.get('the-key', calculate_value('the-key'))
cache['the-key'] = val
In the case where 'the-key' was already in cache, the second line is not necessary. Is there a better, shorter, more expressive idiom for this?
Yes, use:
val = cache.setdefault('the-key', calculate_value('the-key'))
An example in the shell:
>>> cache = {'a': 1, 'b': 2}
>>> cache.setdefault('a', 0)
1
>>> cache.setdefault('b', 0)
2
>>> cache.setdefault('c', 0)
0
>>> cache
{'a': 1, 'c': 0, 'b': 2}
See: http://docs.python.org/release/2.5.2/lib/typesmapping.html
Readability matters!
if 'the-key' not in cache:
    cache['the-key'] = calculate_value('the-key')
val = cache['the-key']
If you really prefer a one-liner:
val = cache['the-key'] if 'the-key' in cache else cache.setdefault('the-key', calculate_value('the-key'))
Another option is to define __missing__ in the cache class:
class Cache(dict):
    def __missing__(self, key):
        return self.setdefault(key, calculate_value(key))
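Usage, with a hypothetical calculate_value for illustration:

>>> def calculate_value(key):
...     return len(key)
...
>>> cache = Cache()
>>> cache['the-key']   # __missing__ computes, stores and returns the value
7
>>> cache
{'the-key': 7}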
Have a look at the Python Decorator Library, and more specifically Memoize, which acts as a cache. That way you can just decorate your calls to calculate_value with the Memoize decorator.
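A similar effect is available from the standard library's functools.lru_cache (Python 3.2+); this is not the Decorator Library's Memoize, but it follows the same idea:

from functools import lru_cache

@lru_cache(maxsize=None)           # cache the result for every distinct argument
def calculate_value(key):
    # hypothetical expensive computation, for illustration only
    return len(key) * 2

val = calculate_value('the-key')   # computed on the first call
val = calculate_value('the-key')   # served from the cache afterwards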
The approach with
cache.setdefault('the-key',calculate_value('the-key'))
is fine if calculate_value is cheap, because it is evaluated on every call, even when the key is already present. So if you have to read from a database, open a file or network connection, or do anything else "expensive", use the following structure instead:
try:
    val = cache['the-key']
except KeyError:
    val = calculate_value('the-key')
    cache['the-key'] = val
You might want to take a look at (the entire page at) "Code Like a Pythonista" http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#dictionary-get-method
It covers the setdefault() technique described above; the defaultdict technique is also very handy for making dictionaries of sets or lists, for example.
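For example, grouping values without checking for the key first (a small sketch, not from the linked page):

>>> from collections import defaultdict
>>> groups = defaultdict(list)
>>> for word in ['apple', 'avocado', 'banana']:
...     groups[word[0]].append(word)
...
>>> dict(groups)
{'a': ['apple', 'avocado'], 'b': ['banana']}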
You can also use defaultdict to do something similar:
>>> from collections import defaultdict
>>> d = defaultdict(int) # will default values to 0
>>> d["a"] = 1
>>> d["a"]
1
>>> d["b"]
0
>>>
You can assign any default you want by supplying your own factory function and itertools.repeat:
>>> from itertools import repeat
>>> def constant_factory(value):
... return repeat(value).next
...
>>> default_value = "default"
>>> d = defaultdict(constant_factory(default_value))
>>> d["a"]
'default'
>>> d["b"] = 5
>>> d["b"]
5
>>> d.keys()
['a', 'b']
Use the setdefault method.
If the key is not already present, setdefault creates it with the value provided as the second argument; if the key is already present, it returns the existing value for that key.
val = cache.setdefault('the-key',value)
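A quick demonstration of both cases:

>>> cache = {'present': 1}
>>> cache.setdefault('present', 99)   # key already there: returns the stored 1
1
>>> cache.setdefault('absent', 99)    # key missing: inserts it and returns 99
99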
Use get to extract the value, or None if the key is missing.
Because None is falsy, combining it with or lets you chain another operation (setdefault):
def get_or_add(cache, key, value_factory):
    # note: a falsy cached value (0, '', None, ...) would be recomputed, since `or` treats it like a miss
    return cache.get(key) or cache.setdefault(key, value_factory())
Usage: to keep it lazy, the function expects a callable as the third parameter:
get_or_add(cache, 'the-key', lambda: calculate_value('the-key'))
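A quick check of the laziness, with a hypothetical calculate_value for illustration:

>>> def calculate_value(key):
...     print('computing %s' % key)
...     return 42
...
>>> cache = {}
>>> get_or_add(cache, 'the-key', lambda: calculate_value('the-key'))
computing the-key
42
>>> get_or_add(cache, 'the-key', lambda: calculate_value('the-key'))
42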