Is there a Python function that can clone objects? [duplicate] - python

I would like to create a copy of an object. I want the new object to possess all properties of the old object (values of the fields). But I want to have independent objects. So, if I change values of the fields of the new object, the old object should not be affected by that.

To get a fully independent copy of an object you can use the copy.deepcopy() function.
For more details about shallow and deep copying please refer to the other answers to this question and the nice explanation in this answer to a related question.
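As a minimal sketch of what the question asks for, the snippet below deep-copies an instance of a made-up class. `Point`, `coords`, and `label` are hypothetical names chosen for illustration and do not come from the question.

```python
import copy

class Point:
    """Hypothetical example class; not part of the original question."""
    def __init__(self, coords, label):
        self.coords = coords   # mutable field (a list)
        self.label = label     # immutable field (a string)

original = Point([1, 2], "start")
clone = copy.deepcopy(original)

# Mutating or rebinding the clone's fields leaves the original untouched.
clone.coords.append(3)
clone.label = "end"

print(original.coords, original.label)  # [1, 2] start
print(clone.coords, clone.label)        # [1, 2, 3] end
```

With copy.copy() instead of copy.deepcopy(), the coords list would be shared between the two objects, so the append would show up in both.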

How can I create a copy of an object in Python?
So, if I change values of the fields of the new object, the old object should not be affected by that.
You mean a mutable object then.
In Python 3, lists get a copy method (in 2, you'd use a slice to make a copy):
>>> a_list = list('abc')
>>> a_copy_of_a_list = a_list.copy()
>>> a_copy_of_a_list is a_list
False
>>> a_copy_of_a_list == a_list
True
Shallow Copies
Shallow copies are just copies of the outermost container.
list.copy is a shallow copy:
>>> list_of_dict_of_set = [{'foo': set('abc')}]
>>> lodos_copy = list_of_dict_of_set.copy()
>>> lodos_copy[0]['foo'].pop()
'c'
>>> lodos_copy
[{'foo': {'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]
You don't get a copy of the interior objects. They're the same object - so when they're mutated, the change shows up in both containers.
Deep copies
Deep copies are recursive copies of each interior object.
>>> import copy
>>> lodos_deep_copy = copy.deepcopy(list_of_dict_of_set)
>>> lodos_deep_copy[0]['foo'].add('c')
>>> lodos_deep_copy
[{'foo': {'c', 'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]
Changes are not reflected in the original, only in the copy.
Immutable objects
Immutable objects do not usually need to be copied. In fact, if you try to, Python will just give you the original object:
>>> a_tuple = tuple('abc')
>>> tuple_copy_attempt = a_tuple.copy()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'copy'
Tuples don't even have a copy method, so let's try it with a slice:
>>> tuple_copy_attempt = a_tuple[:]
But we see it's the same object:
>>> tuple_copy_attempt is a_tuple
True
Similarly for strings:
>>> s = 'abc'
>>> s0 = s[:]
>>> s == s0
True
>>> s is s0
True
and for frozensets, even though they have a copy method:
>>> a_frozenset = frozenset('abc')
>>> frozenset_copy_attempt = a_frozenset.copy()
>>> frozenset_copy_attempt is a_frozenset
True
When to copy immutable objects
Immutable objects should be copied if you need a mutable interior object copied.
>>> tuple_of_list = [],
>>> copy_of_tuple_of_list = tuple_of_list[:]
>>> copy_of_tuple_of_list[0].append('a')
>>> copy_of_tuple_of_list
(['a'],)
>>> tuple_of_list
(['a'],)
>>> deepcopy_of_tuple_of_list = copy.deepcopy(tuple_of_list)
>>> deepcopy_of_tuple_of_list[0].append('b')
>>> deepcopy_of_tuple_of_list
(['a', 'b'],)
>>> tuple_of_list
(['a'],)
As we can see, when the interior object of the deep copy is mutated, the original does not change.
Custom Objects
Custom objects usually store data in a __dict__ attribute or in __slots__ (a tuple-like memory structure.)
To make a copyable object, define __copy__ (for shallow copies) and/or __deepcopy__ (for deep copies).
from copy import copy, deepcopy
class Copyable:
    __slots__ = 'a', '__dict__'
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __copy__(self):
        return type(self)(self.a, self.b)
    def __deepcopy__(self, memo): # memo is a dict of id's to copies
        id_self = id(self)        # memoization avoids unnecessary recursion
        _copy = memo.get(id_self)
        if _copy is None:
            _copy = type(self)(
                deepcopy(self.a, memo),
                deepcopy(self.b, memo))
            memo[id_self] = _copy
        return _copy
Note that deepcopy keeps a memoization dictionary of id(original) (or identity numbers) to copies. To enjoy good behavior with recursive data structures, make sure you haven't already made a copy, and if you have, return that.
So let's make an object:
>>> c1 = Copyable(1, [2])
And copy makes a shallow copy:
>>> c2 = copy(c1)
>>> c1 is c2
False
>>> c2.b.append(3)
>>> c1.b
[2, 3]
And deepcopy now makes a deep copy:
>>> c3 = deepcopy(c1)
>>> c3.b.append(4)
>>> c1.b
[2, 3]
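To see why the memo dictionary matters, here is a small sketch with a reference cycle. It relies on the default deepcopy behavior for plain classes rather than a custom __deepcopy__; the `Node` class and its attributes are hypothetical names used only for this illustration.

```python
import copy

class Node:
    """Hypothetical class used only for this illustration."""
    def __init__(self, name):
        self.name = name
        self.neighbors = []

a = Node("a")
b = Node("b")
a.neighbors.append(b)
b.neighbors.append(a)   # a -> b -> a is a reference cycle

# deepcopy terminates because it records each object it has already copied.
a_copy = copy.deepcopy(a)
b_copy = a_copy.neighbors[0]

print(a_copy is a)                    # False
print(b_copy is b)                    # False
print(b_copy.neighbors[0] is a_copy)  # True: the cycle is preserved in the copy
```

The copied nodes point at each other, not back at the originals, which is the behavior the memo lookup is meant to guarantee.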

Shallow copy with copy.copy()
#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
# It copies.
c = C()
d = copy.copy(c)
d.x = [3]
assert c.x == [1]
assert d.x == [3]
# It's shallow.
c = C()
d = copy.copy(c)
d.x[0] = 3
assert c.x == [3]
assert d.x == [3]
Deep copy with copy.deepcopy()
#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
c = C()
d = copy.deepcopy(c)
d.x[0] = 3
assert c.x == [1]
assert d.x == [3]
Documentation: https://docs.python.org/3/library/copy.html
Tested on Python 3.6.5.

I believe the following should work with many well-behaved classes in Python:
def copy(obj):
    return type(obj)(obj)
(Of course, I am not talking here about "deep copies," which is a different story, and which may be not a very clear concept -- how deep is deep enough?)
According to my tests with Python 3, for immutable objects, like tuples or strings, it returns the same object (because there is no need to make a shallow copy of an immutable object), but for lists or dictionaries it creates an independent shallow copy.
Of course this method only works for classes whose constructors behave accordingly. Possible use cases: making a shallow copy of a standard Python container class.
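A rough sketch of where this trick does and does not apply is shown below. The helper is renamed `shallow_copy_via_constructor` to avoid shadowing the copy module, and `NeedsTwoArgs` is a hypothetical class added to show the failure case.

```python
def shallow_copy_via_constructor(obj):
    # Assumes type(obj) accepts an instance of itself as its only argument.
    return type(obj)(obj)

a_list = [[1], [2]]
a_dict = {"k": [1]}

list_copy = shallow_copy_via_constructor(a_list)
dict_copy = shallow_copy_via_constructor(a_dict)

print(list_copy is a_list)        # False: a new outer list
print(list_copy[0] is a_list[0])  # True: inner lists are shared (shallow)
print(dict_copy is a_dict)        # False: a new dict
print(dict_copy["k"] is a_dict["k"])  # True: the value is shared

class NeedsTwoArgs:
    """Hypothetical class whose constructor does not take an instance."""
    def __init__(self, a, b):
        self.a, self.b = a, b

try:
    shallow_copy_via_constructor(NeedsTwoArgs(1, 2))
    failed = False
except TypeError:
    failed = True   # the constructor signature does not fit the trick
print(failed)       # True
```

For arbitrary classes, copy.copy() is usually the safer choice because it does not depend on the constructor signature.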


Python allows adding a list and a dictionary if using += [duplicate]

The += operator in python seems to be operating unexpectedly on lists. Can anyone tell me what is going on here?
class foo:
    bar = []
    def __init__(self,x):
        self.bar += [x]
class foo2:
    bar = []
    def __init__(self,x):
        self.bar = self.bar + [x]
f = foo(1)
g = foo(2)
print(f.bar)
print(g.bar)
f.bar += [3]
print(f.bar)
print(g.bar)
f.bar = f.bar + [4]
print(f.bar)
print(g.bar)
f = foo2(1)
g = foo2(2)
print(f.bar)
print(g.bar)
OUTPUT
[1, 2]
[1, 2]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]
[1]
[2]
foo += bar seems to affect every instance of the class, whereas foo = foo + bar seems to behave in the way I would expect things to behave.
The += operator is called an "augmented assignment operator".
The general answer is that += tries to call the __iadd__ special method, and if that isn't available it tries to use __add__ instead. So the issue is with the difference between these special methods.
The __iadd__ special method is for an in-place addition, that is it mutates the object that it acts on. The __add__ special method returns a new object and is also used for the standard + operator.
So when the += operator is used on an object which has an __iadd__ defined the object is modified in place. Otherwise it will instead try to use the plain __add__ and return a new object.
That is why for mutable types like lists += changes the object's value, whereas for immutable types like tuples, strings and integers a new object is returned instead (a += b becomes equivalent to a = a + b).
For types that support both __iadd__ and __add__ you therefore have to be careful which one you use. a += b will call __iadd__ and mutate a, whereas a = a + b will create a new object and assign it to a. They are not the same operation!
>>> a1 = a2 = [1, 2]
>>> b1 = b2 = [1, 2]
>>> a1 += [3] # Uses __iadd__, modifies a1 in-place
>>> b1 = b1 + [3] # Uses __add__, creates new list, assigns it to b1
>>> a2
[1, 2, 3] # a1 and a2 are still the same list
>>> b2
[1, 2] # whereas only b1 was changed
For immutable types (where you don't have an __iadd__) a += b and a = a + b are equivalent. This is what lets you use += on immutable types, which might seem a strange design decision until you consider that otherwise you couldn't use += on immutable types like numbers!
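The fallback from __iadd__ to __add__ can be illustrated with two small user-defined classes. `InPlace` and `NoInPlace` are hypothetical names invented for this sketch; only the special methods themselves are part of Python.

```python
class InPlace:
    """Hypothetical mutable type that defines __iadd__."""
    def __init__(self, items):
        self.items = list(items)
    def __add__(self, other):
        return InPlace(self.items + other.items)
    def __iadd__(self, other):
        self.items.extend(other.items)
        return self   # the returned object is what gets rebound to the name

class NoInPlace:
    """Hypothetical type with __add__ only, so += falls back to it."""
    def __init__(self, items):
        self.items = list(items)
    def __add__(self, other):
        return NoInPlace(self.items + other.items)

x = alias_x = InPlace([1])
x += InPlace([2])
print(x is alias_x, alias_x.items)   # True [1, 2]

y = alias_y = NoInPlace([1])
y += NoInPlace([2])
print(y is alias_y, alias_y.items)   # False [1]
```

In the first case the alias sees the change because the same object was mutated; in the second, y was rebound to a new object and the alias still refers to the old one.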
For the general case, see Scott Griffiths' answer. When dealing with lists like you are, though, the += operator is a shorthand for someListObject.extend(iterableObject). See the documentation of extend().
The extend function will append all elements of the parameter to the list.
When doing foo += something you're modifying the list foo in place, thus you don't change the reference that the name foo points to, but you're changing the list object directly. With foo = foo + something, you're actually creating a new list.
This example code will explain it:
>>> l = []
>>> id(l)
13043192
>>> l += [3]
>>> id(l)
13043192
>>> l = l + [3]
>>> id(l)
13059216
Note how the reference changes when you reassign the new list to l.
As bar is a class variable instead of an instance variable, modifying in place will affect all instances of that class. But when redefining self.bar, the instance will have a separate instance variable self.bar without affecting the other class instances.
The problem here is, bar is defined as a class attribute, not an instance variable.
In foo, the class attribute is modified in the init method, that's why all instances are affected.
In foo2, an instance variable is defined using the (empty) class attribute, and every instance gets its own bar.
The "correct" implementation would be:
class foo:
    def __init__(self, x):
        self.bar = [x]
Of course, class attributes are completely legal. In fact, you can access and modify them without creating an instance of the class like this:
class foo:
    bar = []
foo.bar = [1]
There are two things involved here:
1. class attributes and instance attributes
2. difference between the operators + and += for lists
+ operator calls the __add__ method on a list. It takes all the elements from its operands and makes a new list containing those elements maintaining their order.
+= operator calls __iadd__ method on the list. It takes an iterable and appends all the elements of the iterable to the list in place. It does not create a new list object.
In class foo the statement self.bar += [x] first calls
self.bar.__iadd__([x]) # modifies the class attribute
which modifies the list in place and acts like the list method extend. The returned list (the same object) is then assigned back to self.bar, so the instance ends up referring to the very list that the class attribute refers to.
In class foo2, on the contrary, the assignment statement in the init method
self.bar = self.bar + [x]
can be deconstructed as:
The instance has no attribute bar (there is a class attribute of the same name, though) so it accesses the class attribute bar and creates a new list by appending x to it. The statement translates to:
self.bar = self.bar.__add__([x]) # bar on the rhs is the class attribute
Then it creates an instance attribute bar and assigns the newly created list to it. Note that bar on the rhs of the assignment is different from the bar on the lhs.
For instances of class foo, bar refers to the same list object as the class attribute. Hence any in-place change to that list will be reflected for all instances.
On the contrary, each instance of the class foo2 has its own instance attribute bar which is different from the class attribute of the same name bar.
f = foo2(4)
print(f.bar) # accessing the instance attribute. prints [4]
print(f.__class__.bar) # accessing the class attribute. prints []
Hope this clears things.
Although much time has passed and many correct things were said, there is no answer which bundles both effects.
You have 2 effects:
a "special", maybe unnoticed behaviour of lists with += (as stated by Scott Griffiths)
the fact that class attributes as well as instance attributes are involved (as stated by Can Berk Büder)
In class foo, the __init__ method modifies the class attribute. It is because self.bar += [x] translates to self.bar = self.bar.__iadd__([x]). __iadd__() is for inplace modification, so it modifies the list and returns a reference to it.
Note that the instance dict is modified although this would normally not be necessary as the class dict already contains the same assignment. So this detail goes almost unnoticed - except if you do a foo.bar = [] afterwards. Here the instance's bar stays the same thanks to the said fact.
In class foo2, however, the class's bar is used, but not touched. Instead, [x] is added to it, forming a new object, as self.bar.__add__([x]) is called here, which doesn't modify the object. The result is then put into the instance dict, giving the instance the new list as its own attribute, while the class's attribute stays unmodified.
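The two effects described above can be checked directly by inspecting the instance dict. This is a minimal sketch in Python 3 syntax; the classes are renamed `Foo` and `Foo2` here so they do not clash with the question's code.

```python
class Foo:
    bar = []                 # class attribute, shared until shadowed

    def __init__(self, x):
        self.bar += [x]      # in-place extend, then self.bar = <same list>

class Foo2:
    bar = []

    def __init__(self, x):
        self.bar = self.bar + [x]   # builds a new list for the instance

f = Foo(1)
print("bar" in vars(f))      # True: an instance attribute was created
print(f.bar is Foo.bar)      # True: it is the same list as the class's

Foo.bar = []                 # rebind the class attribute
print(f.bar)                 # [1]: the instance still holds the old list
print(Foo.bar)               # []

g = Foo2(1)
print(g.bar is Foo2.bar)     # False: the instance got its own list
print(Foo2.bar)              # []: the class attribute was never modified
```

Rebinding Foo.bar is the one situation where the otherwise invisible instance attribute created by += becomes observable.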
The distinction between ... = ... + ... and ... += ... affects as well the assignments afterwards:
f = foo(1) # adds 1 to the class's bar and assigns f.bar to this as well.
g = foo(2) # adds 2 to the class's bar and assigns g.bar to this as well.
# Here, foo.bar, f.bar and g.bar refer to the same object.
print(f.bar) # [1, 2]
print(g.bar) # [1, 2]
f.bar += [3] # adds 3 to this object
print(f.bar) # As these still refer to the same object,
print(g.bar) # the output is the same.
f.bar = f.bar + [4] # Construct a new list with the values of the old ones, 4 appended.
print(f.bar) # Print the new one
print(g.bar) # Print the old one.
f = foo2(1) # Here a new list is created on every call.
g = foo2(2)
print(f.bar) # So these all only have one element.
print(g.bar)
You can verify the identity of the objects with print id(foo.bar), id(f.bar), id(g.bar) (don't forget the additional ()s if you are on Python3).
BTW: The += operator is called "augmented assignment" and generally is intended to do inplace modifications as far as possible.
The other answers would seem to pretty much have it covered, though it seems worth quoting and referring to the Augmented Assignments PEP 203:
They [the augmented assignment operators] implement the same operator
as their normal binary form, except that the operation is done
`in-place' when the left-hand side object supports it, and that the
left-hand side is only evaluated once.
...
The idea behind augmented
assignment in Python is that it isn't just an easier way to write the
common practice of storing the result of a binary operation in its
left-hand operand, but also a way for the left-hand operand in
question to know that it should operate `on itself', rather than
creating a modified copy of itself.
>>> elements=[[1],[2],[3]]
>>> subset=[]
>>> subset+=elements[0:1]
>>> subset
[[1]]
>>> elements
[[1], [2], [3]]
>>> subset[0][0]='change'
>>> elements
[['change'], [2], [3]]
>>> a=[1,2,3,4]
>>> b=a
>>> a+=[5]
>>> a,b
([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
>>> a=[1,2,3,4]
>>> b=a
>>> a=a+[5]
>>> a,b
([1, 2, 3, 4, 5], [1, 2, 3, 4])
>>> a = 89
>>> id(a)
4434330504
>>> a = 89 + 1
>>> print(a)
90
>>> id(a)
4430689552 # this is different from before!
>>> test = [1, 2, 3]
>>> id(test)
48638344L
>>> test2 = test
>>> id(test)
48638344L
>>> test2 += [4]
>>> id(test)
48638344L
>>> print(test, test2)
([1, 2, 3, 4], [1, 2, 3, 4])
>>> id(test2)
48638344L # ID is the same here
We see that when we attempt to modify an immutable object (integer in this case), Python simply gives us a different object instead. On the other hand, we are able to make changes to a mutable object (a list) and have it remain the same object throughout.
ref : https://medium.com/#tyastropheus/tricky-python-i-memory-management-for-mutable-immutable-objects-21507d1e5b95
Also refer below url to understand the shallowcopy and deepcopy
https://www.geeksforgeeks.org/copy-python-deep-copy-shallow-copy/
listname.extend() works great for this purpose :)

Python: accidentally created a reference but not sure how

I imagine this is one in a very long list of questions from people who have inadvertently created references in python, but I've got the following situation. I'm using scipy minimize to set the sum of the top row of an array to 5 (as an example).
import scipy.optimize as spo
class problem_test:
    def __init__(self):
        test_array = [[1,2,3,4,5,6,7],
                      [4,5,6,7,8,9,10]]
        def set_top_row_to_five(x, array):
            array[0] = array[0] + x
            return abs(sum(array[0]) - 5)
        adjustment = spo.minimize(set_top_row_to_five,0,args=(test_array))
        print(test_array)
        print(adjustment.x)
ptest = problem_test()
However, the optimization is altering the original array (test_array):
[array([-2.03, -1.03, -0.03, 0.97, 1.97, 2.97, 3.97]), [4, 5, 6, 7, 8, 9, 10]]
[-0.00000001]
I realize I can solve this using, for example, deepcopy, but I'm keen to learn why this is happening so I don't do the same in future by accident.
Thanks in advance!
Names are references to objects. What matters is whether an object (including one passed as an argument) is itself modified or a new object is created. An example would be:
>>> l1 = list()
>>> l2 = l1
>>> l2.append(0) # this modifies the object currently referenced by l1 and l2
>>> print(l1)
[0]
Whereas:
>>> l1 = list()
>>> l2 = list(l1) # New list object has been created with initial values from l1
>>> l2.append(0)
>>> print(l1)
[]
Or:
>>> l1 = list()
>>> l2 = l1
>>> l2 = [0] # New list object has been created and assigned to l2
>>> l2.append(0)
>>> print(l1)
[]
Similarly assuming l = [1, 2, 3]:
>>> def f1(list_arg):
...     return list_arg.reverse()
>>> print(f1(l), l)
None [3, 2, 1]
We have just printed the None returned by the list.reverse method and reversed l (in place). However:
>>> def f2(list_arg):
...     ret_list = list(list_arg)
...     ret_list.reverse()
...     return ret_list
>>> print(f2(l), l)
[3, 2, 1] [1, 2, 3]
The function returns a new, reversed list initialized from l, which remains unchanged (NOTE: in this example the built-in reversed or slicing would of course make more sense.)
When nested, one must not forget that for instance:
>>> l = [1, 2, 3]
>>> d1 = {'k': l}
>>> d2 = dict(d1)
>>> d1 is d2
False
>>> d1['k'] is d2['k']
True
Dictionaries d1 and d2 are two different objects, but their k item is only one (and shared) instance. This is the case when copy.deepcopy might come in handy.
Care needs to be taken when passing objects around to make sure they are modified or copy is used as wanted and expected. It might be helpful to return None or similar generic value when making in place changes and return the resulting object when working with a copy so that the function/method interface itself hints what the intention was and what is actually going on here.
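The convention suggested above (return None for in-place changes, return the result when working on a copy) might look like the following sketch. Both function names are hypothetical, and plain lists are used instead of the question's scipy setup to keep the example self-contained.

```python
def add_offset_in_place(rows, offset):
    """Mutates rows[0]; returns None to signal the in-place change."""
    rows[0] = [value + offset for value in rows[0]]

def add_offset_copy(rows, offset):
    """Leaves rows untouched; returns a new outer list."""
    new_rows = list(rows)   # shallow copy of the outer list
    new_rows[0] = [value + offset for value in rows[0]]
    return new_rows

data = [[1, 2, 3], [4, 5, 6]]

shifted = add_offset_copy(data, 10)
print(data)     # [[1, 2, 3], [4, 5, 6]]
print(shifted)  # [[11, 12, 13], [4, 5, 6]]

result = add_offset_in_place(data, 10)
print(result)   # None
print(data)     # [[11, 12, 13], [4, 5, 6]]
```

A shallow copy is enough in add_offset_copy because only the outer list's first slot is rebound; the untouched second row is still shared with the caller's data.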
When immutable objects (as the name suggests) are being "modified" a new object would actually be created and assigned to a new or back to the original name/reference:
>>> s = 'abc'
>>> print('0x{:x} {}'.format(id(s), s))
0x7f4a9dbbfa78 abc
>>> s = s.upper()
>>> print('0x{:x} {}'.format(id(s), s))
0x7f4a9c989490 ABC
Note though, that even immutable type could include reference to a mutable object. For instance for l = [1, 2, 3]; t1 = (l,); t2 = t1, one can t1[0].append(4). This change would also be seen in t2[0] (for the same reason as d1['k'] and d2['k'] above) while both tuples themselves remained unmodified.
One extra caveat (possible gotcha). When defining default argument values (using mutable types), that default argument, when function is called without passing an object, behaves like a "static" variable:
>>> def f3(arg_list=[]):
...     arg_list.append('x')
...     print(arg_list)
>>> f3()
['x']
>>> f3()
['x', 'x']
Since this is often not a behavior people assume at first glance, using mutable objects as default argument value is usually better avoided.
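A common way to avoid the shared default is to use None as a sentinel and create the list inside the function. The sketch below shows that idiom; `append_x` is a hypothetical name.

```python
def append_x(arg_list=None):
    # Create a fresh list on each call when the caller passes nothing.
    if arg_list is None:
        arg_list = []
    arg_list.append('x')
    return arg_list

print(append_x())        # ['x']
print(append_x())        # ['x']  (no state carried over between calls)

mine = ['a']
print(append_x(mine))    # ['a', 'x']  (a list passed in is still modified in place)
```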
Similar would be true for class attributes where one object would be shared between all instances:
>>> class C(object):
...     a = []
...     def m(self):
...         self.a.append('x') # We actually modify value of an attribute of C
...         print(self.a)
>>> c1 = C()
>>> c2 = C()
>>> c1.m()
['x']
>>> c2.m()
['x', 'x']
>>> c1.m()
['x', 'x', 'x']
Note what the behavior would be with an immutable class attribute in a similar example:
>>> class C(object):
...     a = 0
...     def m(self):
...         self.a += 1 # We assign new object to an attribute of self
...         print(self.a)
>>> c1 = C()
>>> c2 = C()
>>> c1.m()
1
>>> c2.m()
1
>>> c1.m()
2
All the fun details can be found in the documentation: https://docs.python.org/3.6/reference/datamodel.html

Shallow/Deep copy in python [duplicate]

From my understanding of deep/shallow copying. Shallow copying assigns a new identifier to point at the same object.
>>> x = [1,2,3]
>>> y = x
>>> x,y
([1, 2, 3], [1, 2, 3])
>>> x is y
True
>>> x[1] = 14
>>> x,y
([1, 14, 3], [1, 14, 3])
Deep copying creates a new object with equivalent value :
>>> import copy
>>> x = [1,2,3]
>>> y = copy.deepcopy(x)
>>> x is y
False
>>> x == y
True
>>> x[1] = 14
>>> x,y
([1, 14, 3], [1, 2, 3])
My confusion is if x=y creates a shallow copy and the copy.copy() function also creates a shallow copy of the object then:
>>> import copy
>>> x = [1,2,3]
>>> y = x
>>> z = copy.copy(x)
>>> x is y
True
>>> x is z
False
>>> id(x),id(y),id(z)
(4301106640, 4301106640, 4301173968)
Why is it creating a new object if it is supposed to be a shallow copy?
A shallow copy creates a new list object and copies across all the references contained in the source list. A deep copy creates new objects recursively.
You won't see the difference with just immutable contents. Use nested lists to see the difference:
>>> import copy
>>> a = ['foo', 'bar', 'baz']
>>> b = ['spam', 'ham', 'eggs']
>>> outer = [a, b]
>>> copy_of_outer = copy.copy(outer)
>>> outer is copy_of_outer
False
>>> outer == copy_of_outer
True
>>> outer[0] is a
True
>>> copy_of_outer[0] is a
True
>>> outer[0] is copy_of_outer[0]
True
A new copy of the outer list was created, but the contents of the original and the copy are still the same objects.
>>> deep_copy_of_outer = copy.deepcopy(outer)
>>> deep_copy_of_outer[0] is a
False
>>> outer[0] is deep_copy_of_outer[0]
False
The deep copy doesn't share contents with the original; the a list has been recursively copied as well.
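Since the question mixes up plain assignment with shallow copying, a side-by-side sketch of all three operations may help. The variable names are chosen for illustration only.

```python
import copy

x = [[1, 2], [3, 4]]
alias = x                  # no copy at all: a second name for the same list
shallow = copy.copy(x)     # new outer list, shared inner lists
deep = copy.deepcopy(x)    # new outer list and new inner lists

x.append([5, 6])           # changes the outer list
x[0].append(99)            # changes a nested list

print(alias)    # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

The alias sees both changes, the shallow copy sees only the change to the shared inner list, and the deep copy sees neither.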

Does a slicing operation give me a deep or shallow copy?

The official Python docs say that using the slicing operator and assigning in Python makes a shallow copy of the sliced list.
But when I write code for example:
o = [1, 2, 4, 5]
p = o[:]
And when I write:
id(o)
id(p)
I get different id's and also appending to one list does not reflect in the other list. Isn't it creating a deep copy or is there somewhere I am going wrong?
You are creating a shallow copy, because nested values are not copied, merely referenced. A deep copy would create copies of the values referenced by the list too.
Demo:
>>> lst = [{}]
>>> lst_copy = lst[:]
>>> lst_copy[0]['foo'] = 'bar'
>>> lst_copy.append(42)
>>> lst
[{'foo': 'bar'}]
>>> id(lst) == id(lst_copy)
False
>>> id(lst[0]) == id(lst_copy[0])
True
Here the nested dictionary is not copied; it is merely referenced by both lists. The new element 42 is not shared.
Remember that everything in Python is an object, and names and list elements are merely references to those objects. A copy of a list creates a new outer list, but the new list merely receives references to the exact same objects.
A proper deep copy creates new copies of each and every object contained in the list, recursively:
>>> from copy import deepcopy
>>> lst_deepcopy = deepcopy(lst)
>>> id(lst_deepcopy[0]) == id(lst[0])
False
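Slicing is only one of several common ways to copy a list, and all of them are shallow. The following sketch (Python 3, since it uses list.copy) compares them against a single nested dict.

```python
import copy

inner = {}
lst = [inner]

# Four common shallow-copy idioms for lists.
copies = {
    "slice": lst[:],
    "list()": list(lst),
    ".copy()": lst.copy(),
    "copy.copy": copy.copy(lst),
}

for name, candidate in copies.items():
    # Each candidate is a new outer list that shares the nested dict.
    print(name, candidate is lst, candidate[0] is inner)   # <name> False True

deep = copy.deepcopy(lst)
print(deep[0] is inner)   # False: only deepcopy duplicates the nested dict
```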
You should know that tests using is or id can be misleading about whether a true copy is being made when the objects are immutable and interned, such as strings, integers and tuples that contain immutables.
Consider an easily understood example of interned strings:
>>> l1=['one']
>>> l2=['one']
>>> l1 is l2
False
>>> l1[0] is l2[0]
True
Now make a shallow copy of l1 and test the immutable string:
>>> l3=l1[:]
>>> l3 is l1
False
>>> l3[0] is l1[0]
True
Now make a copy of the string contained by l1[0]:
>>> s1=l1[0][:]
>>> s1
'one'
>>> s1 is l1[0] is l2[0] is l3[0]
True # they are all the same object
Try a deepcopy where every element should be copied:
>>> from copy import deepcopy
>>> l4=deepcopy(l1)
>>> l4[0] is l1[0]
True
In each case, the string 'one' is interned into Python's internal cache of immutable strings, so the is operator shows that they are the same object (they have the same id). What gets interned, and when, depends on the implementation and version, so you cannot depend on it. Interning can be a substantial memory and performance enhancement.
You can force an example that does not get interned instantly:
>>> s2=''.join(c for c in 'one')
>>> s2==l1[0]
True
>>> s2 is l1[0]
False
And then you can use the Python intern function (sys.intern in Python 3) to cause that string to refer to the cached object if found:
>>> l1[0] is s2
False
>>> s2=intern(s2)
>>> l1[0] is s2
True
Same applies to tuples of immutables:
>>> t1=('one','two')
>>> t2=t1[:]
>>> t1 is t2
True
>>> t3=deepcopy(t1)
>>> t3 is t2 is t1
True
And mutable lists of immutables (like integers) can have the list members interned:
>>> li1=[1,2,3]
>>> li2=deepcopy(li1)
>>> li2 == li1
True
>>> li2 is li1
False
>>> li1[0] is li2[0]
True
So you may use python operations that you KNOW will copy something but the end result is another reference to an interned immutable object. The is test is only a dispositive test of a copy being made IF the items are mutable.
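For Python 3 readers, the interning step can be sketched with sys.intern, together with a mutable probe that avoids the pitfall described above. Whether the runtime-built string starts out as a distinct object is implementation dependent, so the sketch makes no claim about it.

```python
import copy
import sys

s1 = 'one'
s2 = ''.join(c for c in 'one')   # built at runtime; equal in value to s1
print(s1 == s2)                  # True

# After interning, equal strings map to one shared object.
print(sys.intern(s1) is sys.intern(s2))   # True

# To check that deepcopy really copies, probe with a mutable element.
original = [['one']]
duplicate = copy.deepcopy(original)
print(duplicate[0] is original[0])        # False: the inner list was copied
print(duplicate == original)              # True: the values are still equal
```

Testing identity on the inner list gives a reliable answer because lists are never interned or cached.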

