This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.
Related
This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.
What am I doing wrong here?
a = set().add(1)
print a # Prints `None`
I'm trying to add the number 1 to the empty set.
It is a convention in Python that methods that mutate sequences return None.
Consider:
>>> a_list = [3, 2, 1]
>>> print a_list.sort()
None
>>> a_list
[1, 2, 3]
>>> a_dict = {}
>>> print a_dict.__setitem__('a', 1)
None
>>> a_dict
{'a': 1}
>>> a_set = set()
>>> print a_set.add(1)
None
>>> a_set
set([1])
Some may consider this convention "a horrible misdesign in Python", but the Design and History FAQ gives the reasoning behind this design decision (with respect to lists):
Why doesn’t list.sort() return the sorted list?
In situations where performance matters, making a copy of the list
just to sort it would be wasteful. Therefore, list.sort() sorts the
list in place. In order to remind you of that fact, it does not return
the sorted list. This way, you won’t be fooled into accidentally
overwriting a list when you need a sorted copy but also need to keep
the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it
and returns it.
Your particular problems with this feature come from a misunderstanding of good ways to create a set rather than a language misdesign. As Lattyware points out, in Python versions 2.7 and later you can use a set literal a = {1} or do a = set([1]) as per Sven Marnach's answer.
Parenthetically, I like Ruby's convention of placing an exclamation point after methods that mutate objects, but I find Python's approach acceptable.
The add() method adds an element to the set, but it does not return the set again -- it returns None.
a = set()
a.add(1)
or better
a = set([1])
would work.
Because add() is modifing your set in place returning None:
>>> empty = set()
>>> print(empty.add(1))
None
>>> empty
set([1])
Another way to do it that is relatively simple would be:
a = set()
a = set() | {1}
this creates a union between your set a and a set with 1 as the element
print(a) yields {1} then because a would now have all elements of both a and {1}
You should do this:
a = set()
a.add(1)
print a
Notice that you're assigning to a the result of adding 1, and the add operation, as defined in Python, returns None - and that's what is getting assigned to a in your code.
Alternatively, you can do this for initializing a set:
a = set([1, 2, 3])
The add method updates the set, but returns None.
a = set()
a.add(1)
print a
You are assigning the value returned by set().add(1) to a. This value is None, as add() does not return any value, it instead acts in-place on the list.
What you wanted to do was this:
a = set()
a.add(1)
print(a)
Of course, this example is trivial, but Python does support set literals, so if you really wanted to do this, it's better to do:
a = {1}
print(a)
The curly brackets denote a set (although be warned, {} denotes an empty dict, not an empty set due to the fact that curly brackets are used for both dicts and sets (dicts are separated by the use of the colon to separate keys and values.)
Alternatively to a = set() | {1} consider "in-place" operator:
a = set()
a |= {1}
A variable is set. Another variable is set to the first. The first changes value. The second does not. This has been the nature of programming since the dawn of time.
>>> a = 1
>>> b = a
>>> b = b - 1
>>> b
0
>>> a
1
I then extend this to Python lists. A list is declared and appended. Another list is declared to be equal to the first. The values in the second list change. Mysteriously, the values in the first list, though not acted upon directly, also change.
>>> alist = list()
>>> blist = list()
>>> alist.append(1)
>>> alist.append(2)
>>> alist
[1, 2]
>>> blist
[]
>>> blist = alist
>>> alist.remove(1)
>>> alist
[2]
>>> blist
[2]
>>>
Why is this?
And how do I prevent this from happening -- I want alist to be unfazed by changes to blist (immutable, if you will)?
Python variables are actually not variables but references to objects (similar to pointers in C). There is a very good explanation of that for beginners in http://foobarnbaz.com/2012/07/08/understanding-python-variables/
One way to convince yourself about this is to try this:
a=[1,2,3]
b=a
id(a)
68617320
id(b)
68617320
id returns the memory address of the given object. Since both are the same for both lists it means that changing one affects the other, because they are, in fact, the same thing.
Variable binding in Python works this way: you assign an object to a variable.
a = 4
b = a
Both point to 4.
b = 9
Now b points to somewhere else.
Exactly the same happens with lists:
a = []
b = a
b = [9]
Now, b has a new value, while a has the old one.
Till now, everything is clear and you have the same behaviour with mutable and immutable objects.
Now comes your misunderstanding: it is about modifying objects.
lists are mutable, so if you mutate a list, the modifications are visible via all variables ("name bindings") which exist:
a = []
b = a # the same list
c = [] # another empty one
a.append(3)
print a, b, c # a as well as b = [3], c = [] as it is a different one
d = a[:] # copy it completely
b.append(9)
# now a = b = [3, 9], c = [], d = [3], a copy of the old a resp. b
What is happening is that you create another reference to the same list when you do:
blist = alist
Thus, blist referes to the same list that alist does. Thus, any modifications to that single list will affect both alist and blist.
If you want to copy the entire list, and not just create a reference, you can do this:
blist = alist[:]
In fact, you can check the references yourself using id():
>>> alist = [1,2]
>>> blist = []
>>> id(alist)
411260888
>>> id(blist)
413871960
>>> blist = alist
>>> id(blist)
411260888
>>> blist = alist[:]
>>> id(blist)
407838672
This is a relevant quote from the Python docs.:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
Based on this post:
Python passes references-to-objects by value (like Java), and
everything in Python is an object. This sounds simple, but then you
will notice that some data types seem to exhibit pass-by-value
characteristics, while others seem to act like pass-by-reference...
what's the deal?
It is important to understand mutable and immutable objects. Some
objects, like strings, tuples, and numbers, are immutable. Altering
them inside a function/method will create a new instance and the
original instance outside the function/method is not changed. Other
objects, like lists and dictionaries are mutable, which means you can
change the object in-place. Therefore, altering an object inside a
function/method will also change the original object outside.
So in your example you are making the variable bList and aList point to the same object. Therefore when you remove an element from either bList or aList it is reflected in the object that they both point to.
The short answer two your question "Why is this?": Because in Python integers are immutable, while lists are mutable.
You were looking for an official reference in the Python docs. Have a look at this section:
http://docs.python.org/2/reference/simple_stmts.html#assignment-statements
Quote from the latter:
Assignment statements are used to (re)bind names to values and to
modify attributes or items of mutable objects
I really like this sentence, have never seen it before. It answers your question precisely.
A good recent write-up about this topic is http://nedbatchelder.com/text/names.html, which has already been mentioned in one of the comments.
I am running into this situation while using chained methods in python.
Suppose, I have the following code
hash = {}
key = 'a'
val = 'A'
hash[key] = hash.get(key, []).append(val)
The hash.get(key, []) returns a [] and I was expecting that dictionary would be {'a': ['A']}. However the dictionary is set as {'a': None}.
On looking this up further, I realised that this was occurring due to python lists.
list_variable = []
list_variable.append(val)
sets the list_variable as ['A']
However, setting a list in the initial declaration
list_variable = [].append(val)
type(list_variable)
<type 'NoneType'>
What is wrong with my understanding and expectation that list_variable should contain ['A']
Why are the statements behaving differently?
The .append() function alters the list in place and thus always returns None. This is normal and expected behaviour. You do not need to have a return value as the list itself has already been updated.
Use the dict.setdefault() method to set a default empty list object:
>>> hash = {}
>>> hash.setdefault('a', []).append('A')
>>> hash
{'a': ['A']}
You may also be interested in the collections.defaultdict class:
>>> from collections import defaultdict
>>> hash = defaultdict(list)
>>> hash['a'].append('A')
>>> hash
defaultdict(<type 'list'>, {'a': ['A']})
If you want to return a new list with the extra item added, use concatenation:
lst = lst + ['val']
append operates in-place. However you can utilize setdefault in this case:
hash.setdefault(key, []).append(val)
What am I doing wrong here?
a = set().add(1)
print a # Prints `None`
I'm trying to add the number 1 to the empty set.
It is a convention in Python that methods that mutate sequences return None.
Consider:
>>> a_list = [3, 2, 1]
>>> print a_list.sort()
None
>>> a_list
[1, 2, 3]
>>> a_dict = {}
>>> print a_dict.__setitem__('a', 1)
None
>>> a_dict
{'a': 1}
>>> a_set = set()
>>> print a_set.add(1)
None
>>> a_set
set([1])
Some may consider this convention "a horrible misdesign in Python", but the Design and History FAQ gives the reasoning behind this design decision (with respect to lists):
Why doesn’t list.sort() return the sorted list?
In situations where performance matters, making a copy of the list
just to sort it would be wasteful. Therefore, list.sort() sorts the
list in place. In order to remind you of that fact, it does not return
the sorted list. This way, you won’t be fooled into accidentally
overwriting a list when you need a sorted copy but also need to keep
the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it
and returns it.
Your particular problems with this feature come from a misunderstanding of good ways to create a set rather than a language misdesign. As Lattyware points out, in Python versions 2.7 and later you can use a set literal a = {1} or do a = set([1]) as per Sven Marnach's answer.
Parenthetically, I like Ruby's convention of placing an exclamation point after methods that mutate objects, but I find Python's approach acceptable.
The add() method adds an element to the set, but it does not return the set again -- it returns None.
a = set()
a.add(1)
or better
a = set([1])
would work.
Because add() is modifing your set in place returning None:
>>> empty = set()
>>> print(empty.add(1))
None
>>> empty
set([1])
Another way to do it that is relatively simple would be:
a = set()
a = set() | {1}
this creates a union between your set a and a set with 1 as the element
print(a) yields {1} then because a would now have all elements of both a and {1}
You should do this:
a = set()
a.add(1)
print a
Notice that you're assigning to a the result of adding 1, and the add operation, as defined in Python, returns None - and that's what is getting assigned to a in your code.
Alternatively, you can do this for initializing a set:
a = set([1, 2, 3])
The add method updates the set, but returns None.
a = set()
a.add(1)
print a
You are assigning the value returned by set().add(1) to a. This value is None, as add() does not return any value, it instead acts in-place on the list.
What you wanted to do was this:
a = set()
a.add(1)
print(a)
Of course, this example is trivial, but Python does support set literals, so if you really wanted to do this, it's better to do:
a = {1}
print(a)
The curly brackets denote a set (although be warned, {} denotes an empty dict, not an empty set due to the fact that curly brackets are used for both dicts and sets (dicts are separated by the use of the colon to separate keys and values.)
Alternatively to a = set() | {1} consider "in-place" operator:
a = set()
a |= {1}