Does a slicing operation give me a deep or shallow copy? - python

The official Python docs say that using the slicing operator and assigning in Python makes a shallow copy of the sliced list.
But when I write code for example:
o = [1, 2, 4, 5]
p = o[:]
And when I write:
id(o)
id(p)
I get different ids, and appending to one list is not reflected in the other. Doesn't that mean it is creating a deep copy, or am I going wrong somewhere?

You are creating a shallow copy, because nested values are not copied, merely referenced. A deep copy would create copies of the values referenced by the list too.
Demo:
>>> lst = [{}]
>>> lst_copy = lst[:]
>>> lst_copy[0]['foo'] = 'bar'
>>> lst_copy.append(42)
>>> lst
[{'foo': 'bar'}]
>>> id(lst) == id(lst_copy)
False
>>> id(lst[0]) == id(lst_copy[0])
True
Here the nested dictionary is not copied; it is merely referenced by both lists. The new element 42 is not shared.
Remember that everything in Python is an object, and names and list elements are merely references to those objects. A copy of a list creates a new outer list, but the new list merely receives references to the exact same objects.
A proper deep copy creates new copies of each and every object contained in the list, recursively:
>>> from copy import deepcopy
>>> lst_deepcopy = deepcopy(lst)
>>> id(lst_deepcopy[0]) == id(lst[0])
False
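As a small self-contained sketch of the same point (the variable names are illustrative and not from the question), the common shallow-copy spellings all behave alike: each builds a new outer list whose elements are the very same objects as the original's, and only deepcopy duplicates the nested objects.

```python
import copy

original = [[1, 2], [3, 4]]

# Four spellings of a shallow copy: a new outer list, the same inner lists.
shallow_copies = [
    original[:],
    list(original),
    original.copy(),
    copy.copy(original),
]

# Every shallow copy is a distinct outer list that shares its elements.
outer_is_new = all(c is not original for c in shallow_copies)
inner_is_shared = all(c[0] is original[0] for c in shallow_copies)

# A deep copy duplicates the inner lists as well.
deep = copy.deepcopy(original)
deep_inner_is_shared = deep[0] is original[0]

# Mutating a nested list through one shallow copy is visible via the original,
# but not via the deep copy.
shallow_copies[0][0].append(99)

print(outer_is_new, inner_is_shared, deep_inner_is_shared)
print(original[0], deep[0])
```

Appending to the outer list of any of the copies, by contrast, never affects the original, which is what the question observed.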

Be aware that tests using is or id can be misleading about whether a true copy has been made when the objects involved are immutable and interned, such as strings, integers and tuples that contain only immutables.
Consider an easily understood example of interned strings:
>>> l1=['one']
>>> l2=['one']
>>> l1 is l2
False
>>> l1[0] is l2[0]
True
Now make a shallow copy of l1 and test the immutable string:
>>> l3=l1[:]
>>> l3 is l1
False
>>> l3[0] is l1[0]
True
Now make a copy of the string contained by l1[0]:
>>> s1=l1[0][:]
>>> s1
'one'
>>> s1 is l1[0] is l2[0] is l3[0]
True # they are all the same object
Try a deepcopy where every element should be copied:
>>> from copy import deepcopy
>>> l4=deepcopy(l1)
>>> l4[0] is l1[0]
True
In each case, the string 'one' is interned in Python's internal cache of immutable strings, so the is test shows that they are the same object (they have the same id). What gets interned, and when, depends on the implementation and version, so you cannot rely on it. Interning can be a substantial memory and performance optimization.
You can force an example that does not get interned instantly:
>>> s2=''.join(c for c in 'one')
>>> s2==l1[0]
True
>>> s2 is l1[0]
False
And then you can use the intern function (a builtin in Python 2; it is sys.intern in Python 3) to make that string refer to the cached object if one is found:
>>> l1[0] is s2
False
>>> s2=intern(s2)
>>> l1[0] is s2
True
The same applies to tuples of immutables:
>>> t1=('one','two')
>>> t2=t1[:]
>>> t1 is t2
True
>>> t3=deepcopy(t1)
>>> t3 is t2 is t1
True
And mutable lists of immutables (like integers) can have their members interned (CPython caches small integers):
>>> li1=[1,2,3]
>>> li2=deepcopy(li1)
>>> li2 == li1
True
>>> li2 is li1
False
>>> li1[0] is li2[0]
True
So you may use python operations that you KNOW will copy something but the end result is another reference to an interned immutable object. The is test is only a dispositive test of a copy being made IF the items are mutable.
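A hedged Python 3 sketch of the two points above, using sys.intern in place of the Python 2 builtin. It compares immutables by value instead of by identity, and checks the independence of mutable contents by mutating one side. Whether the run-time-built string starts out as the same object as the literal is implementation dependent, so the sketch does not assert it.

```python
import copy
import sys

# Build an equal string at run time; it may or may not be the same object
# as the literal, depending on the implementation.
literal = 'one'
built = ''.join(['o', 'n', 'e'])

values_equal = built == literal  # True: same characters

# sys.intern returns the canonical interned object for a string value,
# so two equal strings map to one and the same object.
same_after_intern = sys.intern(built) is sys.intern(literal)

# For mutable contents, independence can be checked by mutating one side.
nested = [[1, 2, 3]]
shallow = nested[:]
deep = copy.deepcopy(nested)
shallow[0].append(4)  # visible through nested, not through deep

print(values_equal, same_after_intern)
print(nested, deep)
```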

Related

Is there a Python function that can clone objects? [duplicate]

I would like to create a copy of an object. I want the new object to possess all properties of the old object (values of the fields). But I want to have independent objects. So, if I change values of the fields of the new object, the old object should not be affected by that.
To get a fully independent copy of an object you can use the copy.deepcopy() function.
For more details about shallow and deep copying please refer to the other answers to this question and the nice explanation in this answer to a related question.
How can I create a copy of an object in Python?
So, if I change values of the fields of the new object, the old object should not be affected by that.
You mean a mutable object then.
In Python 3, lists get a copy method (in 2, you'd use a slice to make a copy):
>>> a_list = list('abc')
>>> a_copy_of_a_list = a_list.copy()
>>> a_copy_of_a_list is a_list
False
>>> a_copy_of_a_list == a_list
True
Shallow Copies
Shallow copies are just copies of the outermost container.
list.copy is a shallow copy:
>>> list_of_dict_of_set = [{'foo': set('abc')}]
>>> lodos_copy = list_of_dict_of_set.copy()
>>> lodos_copy[0]['foo'].pop()
'c'
>>> lodos_copy
[{'foo': {'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]
You don't get a copy of the interior objects. They're the same object - so when they're mutated, the change shows up in both containers.
Deep copies
Deep copies are recursive copies of each interior object.
>>> import copy
>>> lodos_deep_copy = copy.deepcopy(list_of_dict_of_set)
>>> lodos_deep_copy[0]['foo'].add('c')
>>> lodos_deep_copy
[{'foo': {'c', 'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]
Changes are not reflected in the original, only in the copy.
Immutable objects
Immutable objects do not usually need to be copied. In fact, if you try to, Python will just give you the original object:
>>> a_tuple = tuple('abc')
>>> tuple_copy_attempt = a_tuple.copy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'copy'
Tuples don't even have a copy method, so let's try it with a slice:
>>> tuple_copy_attempt = a_tuple[:]
But we see it's the same object:
>>> tuple_copy_attempt is a_tuple
True
Similarly for strings:
>>> s = 'abc'
>>> s0 = s[:]
>>> s == s0
True
>>> s is s0
True
and for frozensets, even though they have a copy method:
>>> a_frozenset = frozenset('abc')
>>> frozenset_copy_attempt = a_frozenset.copy()
>>> frozenset_copy_attempt is a_frozenset
True
When to copy immutable objects
Immutable objects should be copied if you need a mutable interior object copied.
>>> tuple_of_list = [],
>>> copy_of_tuple_of_list = tuple_of_list[:]
>>> copy_of_tuple_of_list[0].append('a')
>>> copy_of_tuple_of_list
(['a'],)
>>> tuple_of_list
(['a'],)
>>> deepcopy_of_tuple_of_list = copy.deepcopy(tuple_of_list)
>>> deepcopy_of_tuple_of_list[0].append('b')
>>> deepcopy_of_tuple_of_list
(['a', 'b'],)
>>> tuple_of_list
(['a'],)
As we can see, when the interior object of the copy is mutated, the original does not change.
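The same contrast can be sketched with the copy module instead of a slice. This is an illustration under the assumption of CPython's copy module, where a shallow copy of a tuple is expected to hand back the tuple itself, while a deep copy has to build a new tuple as soon as one element needed copying.

```python
import copy

tuple_of_list = ([],)
tuple_of_ints = (1, 2, 3)

# Shallow copy of a tuple: nothing is gained by duplicating the container.
shallow = copy.copy(tuple_of_list)

# Deep copy: the inner list must be duplicated, so a new tuple is built.
deep = copy.deepcopy(tuple_of_list)

# With only immutable contents there is nothing that needs duplicating.
deep_of_ints = copy.deepcopy(tuple_of_ints)

# Mutate the inner list of the deep copy only.
deep[0].append('only in the deep copy')

print(shallow is tuple_of_list)
print(deep is tuple_of_list, deep[0] is tuple_of_list[0])
print(deep_of_ints is tuple_of_ints)
print(tuple_of_list, deep)
```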
Custom Objects
Custom objects usually store data in a __dict__ attribute or in __slots__ (a tuple-like memory structure).
To make a copyable object, define __copy__ (for shallow copies) and/or __deepcopy__ (for deep copies).
from copy import copy, deepcopy

class Copyable:
    __slots__ = 'a', '__dict__'
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __copy__(self):
        return type(self)(self.a, self.b)
    def __deepcopy__(self, memo): # memo is a dict of id's to copies
        id_self = id(self)        # memoization avoids unnecessary recursion
        _copy = memo.get(id_self)
        if _copy is None:
            _copy = type(self)(
                deepcopy(self.a, memo),
                deepcopy(self.b, memo))
            memo[id_self] = _copy
        return _copy
Note that deepcopy keeps a memoization dictionary of id(original) (or identity numbers) to copies. To enjoy good behavior with recursive data structures, make sure you haven't already made a copy, and if you have, return that.
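To illustrate why the memo matters for recursive structures, here is a minimal sketch with a hypothetical Node class (not part of the answer above). It differs from the Copyable class in one respect, which is an assumption of this sketch: the clone is registered in memo before descending into the children, so a cycle that leads back to the same node finds the clone instead of recursing again.

```python
import copy

class Node:
    """Hypothetical graph node used only for this illustration."""

    def __init__(self, value):
        self.value = value
        self.neighbours = []

    def __deepcopy__(self, memo):
        clone = type(self)(copy.deepcopy(self.value, memo))
        # Record the clone before descending, so a cycle that leads back
        # to this node is resolved from memo instead of recursing forever.
        memo[id(self)] = clone
        clone.neighbours = copy.deepcopy(self.neighbours, memo)
        return clone

a = Node('a')
b = Node('b')
a.neighbours.append(b)
b.neighbours.append(a)  # a -> b -> a is a cycle

a_copy = copy.deepcopy(a)
b_copy = a_copy.neighbours[0]

print(a_copy is a, b_copy is b)
print(b_copy.neighbours[0] is a_copy)  # the cycle is reproduced in the copy
```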
So let's make an object:
>>> c1 = Copyable(1, [2])
And copy makes a shallow copy:
>>> c2 = copy(c1)
>>> c1 is c2
False
>>> c2.b.append(3)
>>> c1.b
[2, 3]
And deepcopy now makes a deep copy:
>>> c3 = deepcopy(c1)
>>> c3.b.append(4)
>>> c1.b
[2, 3]
Shallow copy with copy.copy()
#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
# It copies.
c = C()
d = copy.copy(c)
d.x = [3]
assert c.x == [1]
assert d.x == [3]
# It's shallow.
c = C()
d = copy.copy(c)
d.x[0] = 3
assert c.x == [3]
assert d.x == [3]
Deep copy with copy.deepcopy()
#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
c = C()
d = copy.deepcopy(c)
d.x[0] = 3
assert c.x == [1]
assert d.x == [3]
Documentation: https://docs.python.org/3/library/copy.html
Tested on Python 3.6.5.
I believe the following should work with many well-behaved classes in Python:
def copy(obj):
    return type(obj)(obj)
(Of course, I am not talking here about "deep copies," which is a different story, and which may be not a very clear concept -- how deep is deep enough?)
According to my tests with Python 3, for immutable objects, like tuples or strings, it returns the same object (because there is no need to make a shallow copy of an immutable object), but for lists or dictionaries it creates an independent shallow copy.
Of course this method only works for classes whose constructors behave accordingly. Possible use cases: making a shallow copy of a standard Python container class.
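A short sketch of where this constructor trick does and does not apply. The helper name and the Point class are hypothetical, chosen only for illustration.

```python
def shallow_copy_via_constructor(obj):
    """Hypothetical helper: rebuild obj by passing it to its own type."""
    return type(obj)(obj)

nested = [[1], [2]]
copied_list = shallow_copy_via_constructor(nested)
copied_dict = shallow_copy_via_constructor({'k': nested})
copied_set = shallow_copy_via_constructor({1, 2, 3})

class Point:
    """A class whose constructor does not accept an instance of itself."""

    def __init__(self, x, y):
        self.x, self.y = x, y

# The trick fails for classes whose constructor has a different signature.
try:
    shallow_copy_via_constructor(Point(1, 2))
    constructor_copy_worked = True
except TypeError:
    constructor_copy_worked = False

print(copied_list is nested, copied_list[0] is nested[0])
print(constructor_copy_worked)
```

For arbitrary classes, copy.copy() is the more general tool, since it does not depend on the constructor's signature.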

Understanding iterable types in comparisons

Recently I ran into cosmologicon's pywats and am now trying to understand the part about fun with iterators:
>>> a = 2, 1, 3
>>> sorted(a) == sorted(a)
True
>>> reversed(a) == reversed(a)
False
OK, sorted(a) returns a list, so sorted(a) == sorted(a) is just a comparison of two lists. But reversed(a) returns a reversed object. So why are these reversed objects different? And the comparison of ids confuses me even more:
>>> id(reversed(a)) == id(reversed(a))
True
The basic reason why id(reversed(a)) == id(reversed(a)) returns True, whereas reversed(a) == reversed(a) returns False, can be seen from the example below using custom classes -
>>> class CA:
...     def __del__(self):
...         print('deleted', self)
...     def __init__(self):
...         print('inited', self)
...
>>> CA() == CA()
inited <__main__.CA object at 0x021B8050>
inited <__main__.CA object at 0x021B8110>
deleted <__main__.CA object at 0x021B8050>
deleted <__main__.CA object at 0x021B8110>
False
>>> id(CA()) == id(CA())
inited <__main__.CA object at 0x021B80F0>
deleted <__main__.CA object at 0x021B80F0>
inited <__main__.CA object at 0x021B80F0>
deleted <__main__.CA object at 0x021B80F0>
True
As you can see, when you did customobject == customobject, the objects that were created on the fly were not destroyed until after the comparison occurred, because both objects were required for the comparison.
But in the case of id(co) == id(co), the custom object was passed to the id() function, and only the result of id() is required for the comparison. The object therefore had no references left and was garbage collected, and when the interpreter created a new object for the right-hand side of the == operation, it reused the memory that had just been freed. Hence, the id came out the same for both.
This behavior is an implementation detail of CPython (it may or may not differ in other implementations of Python), and you should never rely on the equality of ids. For example, in the case below it gives the wrong result -
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> id(reversed(a)) == id(reversed(b))
True
The reason for this is again as explained above (garbage collection of the reversed object created for reversed(a) before creation of reversed object for reversed(b)).
If the lists are large, I think the most memory efficient and most probably the fastest method to compare equality for two iterators would be to use all() built-in function along with zip() function for Python 3.x (or itertools.izip() for Python 2.x).
Example for Python 3.x -
all(x==y for x,y in zip(aiterator,biterator))
Example for Python 2.x -
from itertools import izip
all(x==y for x,y in izip(aiterator,biterator))
This is because all() short-circuits at the first False value it encounters, and zip() in Python 3.x returns an iterator that yields the corresponding elements from both iterators. This does not need to create a separate list in memory.
Demo -
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> all(x==y for x,y in zip(reversed(a),reversed(b)))
False
>>> all(x==y for x,y in zip(reversed(a),reversed(a)))
True
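One caveat with the all()/zip() idiom: zip() stops at the shorter input, so two iterators of different lengths compare as equal whenever their common prefix matches. A minimal sketch that also accounts for length, using itertools.zip_longest with a private sentinel (the helper name iterators_equal is hypothetical):

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking "this iterator ran out"

def iterators_equal(first, second):
    """Hypothetical helper: True if both iterables yield equal items in order."""
    return all(
        x is not _MISSING and y is not _MISSING and x == y
        for x, y in zip_longest(first, second, fillvalue=_MISSING)
    )

a = [1, 2, 3]
b = [4, 5, 6]

print(iterators_equal(reversed(a), reversed(a)))
print(iterators_equal(reversed(a), reversed(b)))
# Different lengths: plain zip() would report these as equal.
print(all(x == y for x, y in zip([1, 2], [1, 2, 3])))
print(iterators_equal([1, 2], [1, 2, 3]))
```

It still short-circuits at the first mismatch and does not build intermediate lists.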
sorted returns a list, whereas reversed returns a reversed object, and each call returns a different object. If you convert the results of reversed to lists before comparing, they will be equal.
In [8]: reversed(a)
Out[8]: <reversed at 0x2c98d30>
In [9]: reversed(a)
Out[9]: <reversed at 0x2c989b0>
reversed returns an iterable that doesn't implement a specific __eq__ operator and therefore is compared using identity.
The confusion about id(reversed(a)) == id(reversed(a)) is because after evaluating the first id(...) call the iterable can be disposed (nothing references it) and the second iterable may be reallocated at the very same memory address when the second id(...) call is done. This is however just a coincidence.
Try
ra1 = reversed(a)
ra2 = reversed(a)
and compare id(ra1) with id(ra2) and you will see they are different numbers (because in this case the iterable objects cannot be deallocated as they're referenced by ra1/ra2 variables).
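Written out as a runnable sketch (the result names are illustrative), the suggestion looks like this. While both iterators are referenced at the same time, their ids are guaranteed to differ.

```python
a = (2, 1, 3)

# Both iterators are kept alive by names, so they cannot share an address.
ra1 = reversed(a)
ra2 = reversed(a)

ids_differ = id(ra1) != id(ra2)
same_object = ra1 is ra2
equal_by_default = ra1 == ra2  # no custom __eq__: falls back to identity

# Comparing the contents requires consuming the iterators.
contents_equal = list(ra1) == list(ra2)

print(ids_differ, same_object, equal_by_default, contents_equal)
```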
You may try list(reversed(a)) == list(reversed(a)), which will return True:
>>> list(reversed(a))
[3, 2, 1]
Try it once:
>>> v = id(reversed(a))
>>> n = id(reversed(a))
>>> v == n
False
And again:
>>> v = id(reversed(a))
>>> n = id(reversed(a))
>>> n1 = id(reversed(a))
>>> v == n1
True

Shallow/Deep copy in python [duplicate]

From my understanding of deep/shallow copying, shallow copying assigns a new identifier to point at the same object:
>>> x = [1,2,3]
>>> y = x
>>> x,y
([1, 2, 3], [1, 2, 3])
>>> x is y
True
>>> x[1] = 14
>>> x,y
([1, 14, 3], [1, 14, 3])
Deep copying creates a new object with an equivalent value:
>>> import copy
>>> x = [1,2,3]
>>> y = copy.deepcopy(x)
>>> x is y
False
>>> x == y
True
>>> x[1] = 14
>>> x,y
([1, 14, 3], [1, 2, 3])
My confusion is this: if y = x creates a shallow copy and the copy.copy() function also creates a shallow copy of the object, then:
>>> import copy
>>> x = [1,2,3]
>>> y = x
>>> z = copy.copy(x)
>>> x is y
True
>>> x is z
False
>>> id(x),id(y),id(z)
(4301106640, 4301106640, 4301173968)
Why is it creating a new object if it is supposed to be a shallow copy?
A shallow copy creates a new list object and copies across all the references contained in the source list. A deep copy creates new objects recursively.
You won't see the difference with just immutable contents. Use nested lists to see the difference:
>>> import copy
>>> a = ['foo', 'bar', 'baz']
>>> b = ['spam', 'ham', 'eggs']
>>> outer = [a, b]
>>> copy_of_outer = copy.copy(outer)
>>> outer is copy_of_outer
False
>>> outer == copy_of_outer
True
>>> outer[0] is a
True
>>> copy_of_outer[0] is a
True
>>> outer[0] is copy_of_outer[0]
True
A new copy of the outer list was created, but the contents of the original and the copy are still the same objects.
>>> deep_copy_of_outer = copy.deepcopy(outer)
>>> deep_copy_of_outer[0] is a
False
>>> outer[0] is deep_copy_of_outer[0]
False
The deep copy doesn't share contents with the original; the a list has been recursively copied as well.
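Since the question treats plain assignment as a shallow copy, it may help to see all three cases side by side. A minimal sketch with illustrative names: assignment only adds a second name for the same list, copy.copy() makes a new outer list, and copy.deepcopy() makes new inner lists too.

```python
import copy

x = [[1, 2], [3, 4]]
alias = x                # plain assignment: a second name, no copy at all
shallow = copy.copy(x)   # new outer list, shared inner lists
deep = copy.deepcopy(x)  # new outer list, new inner lists

x.append([5, 6])         # changes the outer list: seen only via alias
x[0].append(99)          # changes an inner list: seen via alias and shallow

print(alias)    # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```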

Reference or Copy

I just started with Python 3. In a Python book I read that the interpreter can be forced to make a copy of an instance, instead of creating a reference, by using slice notation.
This should create a reference to the existing instance of s1:
s1 = "Test"
s2 = s1
print(s1 == s2)
print(s1 is s2)
This should create a new instance:
s1 = "Test"
s2 = s1[:]
print(s1 == s2)
print(s1 is s2)
When running the samples above, both print the same result: s2 is a reference to s1.
Can somebody explain why it's not working as described in the book? Is it an error of mine, or an error in the book?
This is true for mutable datatypes such as lists (i.e. works as you expect with s1 = [1,2,3]).
However, strings (and also, for example, tuples) in Python are immutable, meaning they can't be changed; you can only create new instances. There is no reason for a Python interpreter to create copies of such objects, as you cannot influence s2 via s1 or vice versa; you can only let s1 or s2 point to a different string.
Since strings are immutable, there's no great distinction between a copy of a string and a new reference to a string. The ability to create a copy of a list is important, because lists are mutable, and you might want to change one list without changing the other. But it's impossible to change a string, so there's nothing to be gained by making a copy.
Strings are immutable in Python. This means that once a string has been created and written to memory, it cannot be modified at all. Every time you "modify" a string, a new string object is created.
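A brief sketch of what "modifying" a string actually does (the result name mutated is illustrative): an augmented assignment builds a new string and rebinds the name, and any other name bound to the old string is unaffected.

```python
s1 = "Test"
s2 = s1       # a second name for the same string object

s1 += "ing"   # builds a new string and rebinds s1; s2 is untouched

print(s1, s2)
print(s1 is s2)

# In-place item assignment is not supported on strings at all.
try:
    s2[0] = "B"
    mutated = True
except TypeError:
    mutated = False
print(mutated)
```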
You may notice another interesting behaviour of CPython:
>>> s1 = 'Test'
>>> s2 = 'Test'
>>> id(s1) == id(s2)
True
As you can see, in CPython s1 and s2 refer to the same address in the memory.
If you need a full copy of a mutable object (e.g. a list), check the copy module from the standard library. E.g.
>>> import copy
>>> l1 = [1,2,3]
>>> l2 = l1
>>> id(l2) == id(l1) # a reference
True
>>> l3 = copy.deepcopy(l1)
>>> id(l3) == id(l1)
False
The slice-as-copy idiom gives you a new copy of the object only if it is applied to a list:
>>> t = ()
>>> t is t[:]
True
>>> l = []
>>> l is l[:]
False
>>> s = ''
>>> s is s[:]
True
There was already some discussion in the python-dev mailing list of a way to make slice-as-copy work with strings but the feature was rejected due to performance issues.

Python list is not the same reference

This is the code:
>>> L=[1,2]
>>> L is L[:]
False
Why is this False?
L[:] (slice notation) means: Make a copy of the entire list, element by element.
So you have two lists that have identical content, but are separate entities. Since is evaluates object identity, it returns False.
L == L[:] returns True.
When in doubt ask for id ;)
>>> li = [1,2,4]
>>> id(li)
18686240
>>> id(li[:])
18644144
>>>
The __getslice__ method of list (Python 2; in Python 3, slicing goes through __getitem__ with a slice object), which is called when you slice L, returns a list; so when you call it with the ':' argument, it doesn't behave differently: it returns a new list with the same elements as the original.
>>> id(L)
>>> id(L[:])
>>> L[:] == L
True
>>> L[:] is L
False
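In Python 3 terms, the slice syntax can be spelled out explicitly. This is a sketch assuming Python 3, where L[:] is shorthand for indexing with a slice object, and every spelling returns a new list with the same elements.

```python
L = [1, 2]

via_syntax = L[:]
via_slice_object = L[slice(None)]  # what the ':' inside the brackets stands for
via_method = L.__getitem__(slice(None, None, None))

copies = (via_syntax, via_slice_object, via_method)
all_equal = all(c == L for c in copies)
all_new = all(c is not L for c in copies)

print(all_equal, all_new)
```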
