The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?
You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)
# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append(value)
    else:
        new[key] = [value]
# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # key might exist already
    group.append(value)
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # all keys have a default already
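For data that is already sorted (or cheap to sort) by key, the itertools.groupby route mentioned above could look roughly like this. The sample `data` list is made up for illustration. groupby only merges consecutive equal keys, which is why the sort comes first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (key, value) pairs in no particular order
data = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# groupby only merges *consecutive* equal keys, so sort by key first
# (sorted() is stable, so values keep their original relative order)
grouped = {
    key: [value for _, value in items]
    for key, items in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))
}
print(grouped)  # {'a': [1, 4], 'b': [2, 3]}
```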
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Suppose you are using something HTTP-ish with many headers: some are optional, but you want defaults for them:
headers = parse_headers(msg)  # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)
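A self-contained version of the header example, with a hypothetical parsed `headers` dict and `optional_headers` list standing in for `parse_headers(msg)`. It shows that headers already present in the message are left alone and only the missing ones are filled in.

```python
# Hypothetical parsed headers and optional-header defaults, for illustration only
headers = {"Host": "example.com", "Accept": "text/html"}
optional_headers = [("Accept", "*/*"), ("Connection", "close")]

for headername, defaultvalue in optional_headers:
    # Only fills in headers the message did not already provide
    headers.setdefault(headername, defaultvalue)

print(headers)
# {'Host': 'example.com', 'Accept': 'text/html', 'Connection': 'close'}
```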
I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)
It's great for tweaking arguments in wrappers around functions that take keyword arguments.
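The same idea can be generalized into a decorator that fills in keyword arguments the caller did not pass. This is a sketch; `with_defaults` and `connect` are hypothetical names, not part of any library.

```python
import functools

def with_defaults(**defaults):
    """Decorator that fills in keyword arguments the caller did not pass."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name, value in defaults.items():
                kwargs.setdefault(name, value)  # caller-supplied values win
            return func(*args, **kwargs)
        return wrapper
    return decorate

@with_defaults(timeout=30, retries=3)
def connect(host, **kwargs):
    return host, kwargs

print(connect("db", retries=5))  # ('db', {'retries': 5, 'timeout': 30})
```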
defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for myStr in generated_strings:  # hypothetical iterable standing in for "lots of complicated stuff"
    # myStr: an unpredictable, possibly already seen str
    strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I built that automatically increments an int and returns its value:
class intGen:
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i
If someone has a way to do this with defaultdict I'd love to see it.
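One possible defaultdict approach, offered as a sketch: defaultdict calls its default_factory with no arguments each time a missing key is looked up, so any zero-argument callable that returns a new value per call should work, such as the bound `__next__` of an itertools.count (on Python 2 the method is spelled `count(1).next`). An intGen instance is also such a callable. A side benefit: because `myDict.setdefault(myStr, nextID())` evaluates `nextID()` on every call, the IDs it hands out can skip numbers, whereas here the counter only advances for genuinely new keys.

```python
from collections import defaultdict
from itertools import count

# default_factory is called with no arguments once per *missing* key,
# so a counter's __next__ hands out 1, 2, 3, ... only to new strings
ids = defaultdict(count(1).__next__)

str_ids = [ids[s] for s in ["spam", "eggs", "spam", "ham"]]
print(str_ids)    # [1, 2, 1, 3]
print(dict(ids))  # {'spam': 1, 'eggs': 2, 'ham': 3}
```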
As most answers state, setdefault or defaultdict lets you set a default value when a key doesn't exist. However, I would like to point out a small caveat regarding the use cases of setdefault. When the Python interpreter executes setdefault, it always evaluates the second argument before calling the method, even if the key already exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for an optimization such as memoization. If you pass a recursive function call as the second argument to setdefault, you won't get any performance benefit, because Python always evaluates that call (and therefore recurses) before setdefault even checks for the key.
Since memoization was mentioned: if you are considering adding memoization to a function, the functools.lru_cache decorator is a better alternative. lru_cache handles the caching requirements of a recursive function better.
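A minimal lru_cache sketch on a recursive Fibonacci function (the function is just an illustration). Note that lru_cache is only available from Python 3.2 onward, not in the 2.6/2.7 versions the question asks about.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; lru_cache needs Python 3.2+
def fib(n):
    # Each distinct n is computed once; every repeated call is served from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                  # 832040
print(fib.cache_info().misses)  # 31: one miss per n in 0..30
```

Unlike `memo.setdefault(n, fib(n - 1) + fib(n - 2))`, the decorated function body only runs when the argument is not already cached.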
I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
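For example, grouping values while keeping keys in first-seen order with OrderedDict.setdefault (the `pairs` input is hypothetical):

```python
from collections import OrderedDict

pairs = [("b", 1), ("a", 2), ("b", 3)]  # hypothetical input
groups = OrderedDict()
for key, value in pairs:
    # setdefault supplies the missing list while OrderedDict keeps first-seen key order
    groups.setdefault(key, []).append(value)

print(list(groups.items()))  # [('b', [1, 3]), ('a', [2])]
```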
As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
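A minimal dict-of-dicts trie sketch along those lines: insertion uses setdefault to create missing sub-nodes, while lookup uses get so that querying never grows the trie. The `END` marker key is an assumption of this sketch (it would clash with a literal "$" character in a word).

```python
END = "$"  # hypothetical marker key meaning "a word ends here"

def insert(trie, word):
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # create missing sub-nodes while inserting
    node[END] = True

def contains(trie, word):
    node = trie
    for ch in word:
        node = node.get(ch)  # never creates nodes while querying
        if node is None:
            return False
    return END in node

trie = {}
insert(trie, "tea")
insert(trie, "ten")
print(contains(trie, "tea"), contains(trie, "te"), contains(trie, "toe"))  # True False False
```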
Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up in the standard library (Python 2.6, _threading_local.py):
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library, and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writable at any time after the object's creation. It is also a dictionary, not a defaultdict. It would not be sensible for objects in the general case to have __dict__ as a defaultdict, because that would make every object appear to have every legal identifier as an attribute. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it was deemed not useful.
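A small illustration of __dict__.setdefault on an ordinary instance: the attribute is created on first use, and later calls return the existing object. The `Session` class is hypothetical.

```python
class Session:
    """Hypothetical class with no 'widgets' attribute defined up front."""
    pass

s = Session()
# Create the attribute on first use only; the second call returns the existing list
s.__dict__.setdefault("widgets", []).append("button")
s.__dict__.setdefault("widgets", []).append("slider")
print(s.widgets)  # ['button', 'slider']
```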
I rewrote the accepted answer to make it easier for newcomers.
# break it down and understand it intuitively
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = []  # this is the core of setdefault: equivalent to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)
# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # sets new[key] = [] only if key is missing, then returns new[key]
    group.append(value)
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # every missing key gets a default value of an empty list []
Additionally, I categorized the methods for reference:
dict_methods_11 = {
    'views': ['keys', 'values', 'items'],
    'add': ['update', 'setdefault'],
    'remove': ['pop', 'popitem', 'clear'],
    'retrieve': ['get'],
    'copy': ['copy', 'fromkeys'],
}
One drawback of defaultdict compared with dict (dict.setdefault) is that a defaultdict object creates a new item EVERY time a nonexistent key is looked up (e.g. in a comparison with == or in print). Also, the defaultdict class is generally much less common than the dict class, and in my experience it is more difficult to serialize.
P.S. IMO, functions or methods not meant to mutate an object should not mutate an object.
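A quick demonstration of the first point: merely reading a missing key from a defaultdict inserts it, while a membership test or .get() does not call the default factory.

```python
from collections import defaultdict

counts = defaultdict(int)
print(counts["missing"] == 0)  # True, but the lookup itself inserted the key
print("missing" in counts)     # True
print(dict(counts))            # {'missing': 0}

# A membership test or .get() does not trigger the default factory
print("other" in counts, counts.get("other"))  # False None
print(dict(counts))            # still {'missing': 0}
```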
Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass  # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])
I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'
It's worth noting that you can also use the resulting variable:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that's less necessary than it was before defaultdicts existed.
Here is another use case that I don't think has been mentioned above.
Sometimes you keep a cache dict of objects by their id, where the primary instance lives in the cache, and you want to populate the cache when an entry is missing:
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time. For example when object attributes get updated in memory and saving to storage is deferred.
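A small identity-map sketch built around that one-liner. `Record` and `Repository` are hypothetical classes; the point is that the first instance registered for an id is the one every later caller gets back.

```python
class Record:
    """Hypothetical domain object identified by an id."""
    def __init__(self, record_id, name):
        self.id = record_id
        self.name = name

class Repository:
    def __init__(self):
        self.objects_by_id = {}

    def register(self, obj):
        # Keep the first instance seen for this id; later duplicates get the cached one
        return self.objects_by_id.setdefault(obj.id, obj)

repo = Repository()
a = repo.register(Record(1, "first load"))
b = repo.register(Record(1, "second load"))
print(a is b, b.name)  # True first load
```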
One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading
class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()
    def __eq__(self, other):
        return self is other
    def __hash__(self):
        return hash(self.value)
seen = set()
class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))
threads = []
for i in range(8):
    threads.append(cycle_enum())
for t in threads:
    t.start()
for t in threads:
    t.join()
len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.
In addition to what has been suggested, setdefault might be useful in situations where you don't want to modify a value that has already been set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated key that has already been set, you won't update the value of that key; you keep the first value encountered. It is as if you were processing each repeated key only once.
Here's a code example of recording the index for the keys/elements of a sorted list:
nums = [2, 2, 2, 2, 2]
d = {}
for idx, num in enumerate(sorted(nums)):
    # Plain assignment overwrites on every repeat, keeping the index of the last repeated key:
    # d[num] = idx  # Result (sorted_indices): [4, 4, 4, 4, 4]
    # With setdefault, repeated keys don't update the value;
    # only the first encountered key's index is stored
    d.setdefault(num, idx)  # Result (sorted_indices): [0, 0, 0, 0, 0]
sorted_indices = [d[i] for i in nums]
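With all-equal input the two variants only differ by the constant index, so here is the same comparison on a hypothetical list with one repeated value, keeping both dicts side by side:

```python
nums = [3, 1, 3, 2]  # hypothetical input with one repeated value
first = {}
last = {}
for idx, num in enumerate(sorted(nums)):  # sorted: [1, 2, 3, 3]
    first.setdefault(num, idx)  # keeps the index of the first occurrence
    last[num] = idx             # plain assignment keeps the last occurrence

print([first[n] for n in nums])  # [2, 0, 2, 1]
print([last[n] for n in nums])   # [3, 0, 3, 1]
```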
[Edit] This is very wrong! setdefault would always trigger long_computation, because Python evaluates arguments eagerly.
Expanding on Tuttle's answer: for me, the best use case is a cache mechanism. Instead of:
if x not in memo:
    memo[x] = long_computation(x)
return memo[x]
which takes 3 lines and 2 or 3 lookups, I would happily write:
return memo.setdefault(x, long_computation(x))
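Since the setdefault one-liner always calls long_computation, one way to keep a single-lookup cache that really is lazy is a dict subclass with `__missing__`, which dict only calls when a key is absent. `Memo` and the `long_computation` stand-in below are hypothetical; this is a sketch, not the answer's original code.

```python
class Memo(dict):
    """dict that computes and stores missing values lazily via __missing__."""
    def __init__(self, func):
        super().__init__()
        self.func = func

    def __missing__(self, key):
        # Only called when key is absent, so func runs at most once per key
        value = self[key] = self.func(key)
        return value

calls = []

def long_computation(x):  # hypothetical stand-in for the expensive function
    calls.append(x)
    return x * x

memo = Memo(long_computation)
print(memo[4], memo[4], calls)  # 16 16 [4]
```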
I like the answer given here:
http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html
In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).
A different use case for setdefault() is when you don't want to overwrite the value of an already set key. Plain item assignment overwrites (on a dict and a defaultdict alike), while setdefault() does not. For nested dictionaries, it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the existing sub-dictionary. This is when you use setdefault().
Example with defaultdict:
>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
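For the nested-dictionary case, a short sketch with a hypothetical `config` dict: setdefault hands back the existing sub-dictionary instead of replacing it, whereas `config["db"] = {}` would wipe the existing "host" entry.

```python
config = {"db": {"host": "localhost"}}  # hypothetical nested settings

# setdefault returns the existing sub-dict instead of replacing it
config.setdefault("db", {})["port"] = 5432
config.setdefault("cache", {})["ttl"] = 60

print(config)
# {'db': {'host': 'localhost', 'port': 5432}, 'cache': {'ttl': 60}}
```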
Another use case for setdefault in CPython is that it is atomic (for keys with built-in __hash__ and __eq__, such as str and int), whereas defaultdict will not be atomic if you use a default value created from a lambda.
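import threading
cache = {}
def get_user_roles(user_id):
    # setdefault stores exactly one lock dict per user_id, even under concurrent calls
    entry = cache.setdefault(user_id, {'lock': threading.Lock()})
    with entry['lock']:
        if 'roles' not in entry:  # another thread may have filled it while we waited
            entry['roles'] = query_roles_from_database(user_id)  # hypothetical database helper
        return entry['roles']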
If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.
If instead you used a defaultdict:
cache = defaultdict(lambda: {'lock': threading.Lock()})
This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.
Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user python code like a lambda):
import threading
gil = threading.Lock()  # stands in for CPython's global interpreter lock
def setdefault(d, key, value):
    with gil:
        if key in d:
            return d[key]
        d[key] = value
        return value
Conceptually, defaultdict basically behaves like this (only when using python code like a lambda - this is not true if you use an empty list):
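import threading
gil = threading.Lock()  # stands in for CPython's global interpreter lock
def __getitem__(d, key, value_func):
    with gil:
        if key in d:
            return d[key]
    value = value_func()  # Python code such as a lambda may let other threads run here
    with gil:
        d[key] = value  # may overwrite a value another thread stored in the meantime
    return value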
I have the following tuples:
ReadElement = namedtuple('ReadElement', 'address value size')
LookupElement = namedtuple('LookupElement', ReadElement._fields[0:2] + ('lookups', ReadElement._fields[2]))
and I want to iterate through them as follows:
mytuples = [ReadElement(1,2,3), LookupElement(1,2,3,4)]
for address, value, lookups?, size in mytuples:
    if lookups is not None:
        addLookups(lookups)
    print address, value, lookups?, size
def addLookups(*items):
    return sum(items)
How could I iterate through similar tuples using the same piece of code?
I think what I am looking for is a Union type of the two named tuples, so that that union type preserves the order of the tuples in the loop.
From laike9m's post I can see how I can use the isinstance function without having to unpack the tuples in the loop; however, I would like to avoid special-casing the data and just go straight through without any if statements.
If these were objects, I could do something like mytuples[0].execute() without having to worry about what type they were, as long as they were subclassed from the same parent and had that method implemented.
It seems that my question may be a variant of the following: Why are Super-class and Sub-class reversed? In the case above I only have two items, one subclass and one superclass, which are very similar to each other and therefore could also be made into a single class.
First, your namedtuple definition is wrong, should be:
LookupElement = namedtuple('LookupElement', ReadElement._fields[0:2] + ('lookups', ReadElement._fields[2]))
Second, you don't need to worry about all that:
>>> for nt in mytuples:
...     print(nt)
...
ReadElement(address=1, value=2, size=3)
LookupElement(address=1, value=2, lookups=3, size=4)
I'm going to sleep, so I may not be able to answer your further questions. I think the best way is to check whether the field you want exists before using it.
I don't know exactly what you want; here's what I'd do:
mytuples = [ReadElement(1, 2, 3), LookupElement(1, 2, 3, 4)]
for nt in mytuples:
    if 'lookups' in nt._fields:
        print(nt.address, nt.value, nt.lookups, nt.size)
    else:
        print(nt.address, nt.value, nt.size)
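If the goal is to go straight through without any if statements, one option is `getattr` with a default, which treats a missing field as None. This is a sketch using the question's two namedtuple definitions; the `rows` list is only there to collect the unpacked values.

```python
from collections import namedtuple

ReadElement = namedtuple('ReadElement', 'address value size')
LookupElement = namedtuple('LookupElement', ReadElement._fields[0:2] + ('lookups', ReadElement._fields[2]))

mytuples = [ReadElement(1, 2, 3), LookupElement(1, 2, 3, 4)]

rows = []
for nt in mytuples:
    # getattr with a default treats a missing field as None, so no type checks are needed
    lookups = getattr(nt, 'lookups', None)
    rows.append((nt.address, nt.value, lookups, nt.size))
    print(nt.address, nt.value, lookups, nt.size)
```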
I am currently writing a script that extracts data from an xml and writes it into an html file for easy viewing on a webpage.
Each piece of data has 2 pieces of "sub data": Owner and Type.
In order for the html to work properly I need the "owner" string and the "type" string to be written in the correct place. If it was just a single piece of data then I would use dictionaries and just use the data name as the key and then write the value to html, however there are 2 pieces of data.
My question is, can a dictionary have 2 values (in my case owner and type) assigned to a single key?
Any object can be a value in a dictionary, so you can use any collection to hold more than one value against the same key. To expand my comments into some code samples, in order of increasing complexity (and, in my opinion, readability):
Tuple
The simplest option is a two-tuple of strings, which you can access by index:
>>> d1 = {'key': ('owner', 'type')}
>>> d1['key'][0]
'owner'
>>> d1['key'][1]
'type'
Dictionary
Next up is a sub-dictionary, which allows you to access the values by key name:
>>> d2 = {'key': {'owner': 'owner', 'type': 'type'}}
>>> d2['key']['owner']
'owner'
>>> d2['key']['type']
'type'
Named tuple
Finally the collections module provides namedtuple, which requires a little setup but then allows you to access the values by attribute name:
>>> from collections import namedtuple
>>> MyTuple = namedtuple('MyTuple', ('owner', 'type'))
>>> d3 = {'key': MyTuple('owner', 'type')}
>>> d3['key'].owner
'owner'
>>> d3['key'].type
'type'
Using named keys/attributes makes your subsequent access to the values clearer (d3['key'].owner and d2['key']['owner'] are less ambiguous than d1['key'][0]).
As long as keys are hashable, you can have keys of any format. Note that tuples are hashable, so that would be a possible solution to your problem.
Make a tuple of case-owner and type and use it as a key to your dictionary.
Note, generally all objects that are hashable should also be immutable, but not vice-versa. So
Let's say I have this code:
my_dict = {}
default_value = {'surname': '', 'age': 0}
# get info about john, or a default dict
item = my_dict.get('john', default_value)
# edit the data
item['surname'] = 'smith'
item['age'] = 68
my_dict['john'] = item
The problem becomes clear if we now check the value of default_value:
>>> default_value
{'age': 68, 'surname': 'smith'}
It is obvious that my_dict.get() did not return the value of default_value, but a pointer (?) to it.
The problem could be worked around by changing the code to:
item = my_dict.get('john', {'surname': '', 'age': 0})
but that doesn't seem to be a nice way to do it. Any ideas, comments?
item = my_dict.get('john', default_value.copy())
You're always passing a reference in Python.
This doesn't matter for immutable objects like str, int, tuple, etc. since you can't change them, only point a name at a different object, but it does for mutable objects like list, set, and dict. You need to get used to this and always keep it in mind.
Edit: Zach Bloom and Jonathan Sternberg both point out methods you can use to avoid the call to copy on every lookup. You should use either the defaultdict method, something like Jonathan's first method, or:
def my_dict_get(key):
    try:
        item = my_dict[key]
    except KeyError:
        item = default_value.copy()
    return item
This will be faster than the if-based approach when the key nearly always exists in my_dict and the dict is large. You don't have to wrap it in a function, but you probably don't want those four lines every time you access my_dict.
See Jonathan's answer for timings with a small dict. The get method performs poorly at all sizes I tested, but the try method does better at large sizes.
Don't use get. You could do:
item = my_dict.get('john', default_value.copy())
But this requires a dictionary to be copied even if the dictionary entry exists. Instead, consider just checking if the value is there.
item = my_dict['john'] if 'john' in my_dict else default_value.copy()
The only problem with this is that it will perform two lookups for 'john' instead of just one. If you're willing to use an extra line (and None is not a possible value you could get from the dictionary), you could do:
item = my_dict.get('john')
if item is None:
    item = default_value.copy()
EDIT: I thought I'd do some speed comparisons with timeit. The default_value and my_dict were globals. I timed each one both when the key was present and when it was missing.
Using exceptions:
def my_dict_get():
    try:
        item = my_dict['key']
    except KeyError:
        item = default_value.copy()
# key present: 0.4179
# key absent: 3.3799
Using get and checking if it's None.
def my_dict_get():
    item = my_dict.get('key')
    if item is None:
        item = default_value.copy()
# key present: 0.57189
# key absent: 0.96691
Checking its existence with the conditional (if/else) expression
def my_dict_get():
    item = my_dict['key'] if 'key' in my_dict else default_value.copy()
# key present: 0.39721
# key absent: 0.43474
Naively copying the dictionary.
def my_dict_get():
    item = my_dict.get('key', default_value.copy())
# key present: 0.52303 (this may be lower than it should be as the dictionary I used was one element)
# key absent: 0.66045
For the most part, all approaches except the one using exceptions (which is slow when the key is missing) perform very similarly. The conditional (if/else) expression seems to have the lowest time for some reason (no idea why).
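The timing harness itself is not shown above; a minimal reconstruction with timeit might look like this. The `my_dict` contents, the function names, and the call count are assumptions, and absolute numbers will differ by machine and Python version.

```python
import timeit

default_value = {'surname': '', 'age': 0}
my_dict = {'key': {'surname': 'smith', 'age': 68}}  # hypothetical one-entry dict

def with_exception(key):
    try:
        return my_dict[key]
    except KeyError:
        return default_value.copy()

def with_get_none(key):
    item = my_dict.get(key)
    return default_value.copy() if item is None else item

def with_in(key):
    return my_dict[key] if key in my_dict else default_value.copy()

def with_get_copy(key):
    return my_dict.get(key, default_value.copy())

for func in (with_exception, with_get_none, with_in, with_get_copy):
    for key in ('key', 'absent'):
        # The lambda is timed immediately, so binding func/key late is harmless here
        seconds = timeit.timeit(lambda: func(key), number=100_000)
        print(f"{func.__name__:15} {key:7} {seconds:.3f}s")
```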
In Python, dicts are both objects (so they are always passed as references) and mutable (meaning they can be changed without being recreated).
You can copy your dictionary each time you use it:
my_dict.get('john', default_value.copy())
You can also use the defaultdict collection:
from collections import defaultdict
def factory():
    return {'surname': '', 'age': 0}
my_dict = defaultdict(factory)
my_dict['john']
The main thing to realize is that everything in Python is pass-by-reference. A variable name in a C-style language is usually shorthand for an object-shaped area of memory, and assigning to that variable makes a copy of another object-shaped area... in Python, variables are just keys in a dictionary (locals()), and the act of assignment just stores a new reference. (Technically, everything is a pointer, but that's an implementation detail).
This has a number of implications, the main one being that an implicit copy of an object is never made because you passed it to a function, assigned it, etc. The only way to get a copy is to make one explicitly. The Python standard library offers a copy module with copy() and deepcopy() functions for when you want to explicitly copy something. Also, some types expose a .copy() method of their own, but this is not standardized or consistently implemented. Some immutable types offer a .replace() method, which returns a modified copy.
In the case of your code, passing in the original instance obviously doesn't work, and making a copy ahead of time (when you may not need to) is wasteful. So the simplest solution is probably...
item = my_dict.get('john')
if item is None:
    item = default_value.copy()
It would be useful in this case if .get() supported passing in a default value constructor function, but that's probably over-engineering a base class for a border case.
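Such a helper is easy to write yourself, though. A sketch, where `get_or_make` is a hypothetical name and not a dict method: it calls the factory only on a miss and never modifies the dict.

```python
def get_or_make(d, key, factory):
    """Return d[key] if present, otherwise a fresh factory() result (d is not modified)."""
    try:
        return d[key]
    except KeyError:
        return factory()

def default_factory():
    return {'surname': '', 'age': 0}

my_dict = {}
item = get_or_make(my_dict, 'john', default_factory)
item['surname'] = 'smith'
print(default_factory(), my_dict)  # {'surname': '', 'age': 0} {}
```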
Because my_dict.get('john', default_value.copy()) creates a copy of the default dict each time get is called (even when 'john' is present and returned), it is faster, and perfectly fine, to use this try/except option:
try:
    return my_dict['john']
except KeyError:
    return {'surname': '', 'age': 0}
Alternatively, you can also use a defaultdict:
import collections
def default_factory():
    return {'surname': '', 'age': 0}
my_dict = collections.defaultdict(default_factory)