Reference or Copy - python

I just started with Python 3. In a Python book I read that the interpreter can be forced to make a copy of an instance, instead of creating a reference, by using slice notation.
This should create a reference to the existing instance of s1:
s1 = "Test"
s2 = s1
print(s1 == s2)
print(s1 is s2)
This should create a new instance:
s1 = "Test"
s2 = s1[:]
print(s1 == s2)
print(s1 is s2)
When I run the samples above, both print the same result (True and True), so in both cases s2 appears to be a reference to s1.
Can somebody explain why it's not working as described in the book? Is it an error on my part, or an error in the book?

This is true for mutable data types such as lists (i.e. it works as you expect with s1 = [1,2,3]).
However, strings, and also e.g. tuples, in Python are immutable (meaning they can't be changed; you can only create new instances). There is no reason for a Python interpreter to create copies of such objects, as you cannot influence s2 via s1 or vice versa; you can only make s1 or s2 point to a different string.
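A minimal sketch contrasting the two cases. Whether the full slice of a string returns the very same object is a CPython implementation detail, so only the list behaviour should be relied on:

```python
s1 = "Test"
s2 = s1[:]           # full slice of an immutable str
print(s1 == s2)      # True
print(s1 is s2)      # CPython happens to return the same object here

l1 = [1, 2, 3]
l2 = l1[:]           # full slice of a mutable list: always a new list
print(l1 is l2)      # False
l2.append(4)         # changing the copy...
print(l1)            # ...leaves the original alone: [1, 2, 3]
```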

Since strings are immutable, there's no great distinction between a copy of a string and a new reference to a string. The ability to create a copy of a list is important, because lists are mutable, and you might want to change one list without changing the other. But it's impossible to change a string, so there's nothing to be gained by making a copy.
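To illustrate, a small sketch showing that an operation that looks like it changes a string actually builds a new string and rebinds only the name on the left, so another reference to the old string is unaffected:

```python
s1 = "Test"
s2 = s1            # second name for the same str object
s1 += "ing"        # creates a new str and rebinds s1 only
print(s1)          # Testing
print(s2)          # Test
```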

Strings are immutable in Python. This means that once a string has been created and written to memory, it cannot be modified at all. Every time you "modify" a string, a new string object is created.
You may notice another interesting behaviour of CPython:
>>> s1 = 'Test'
>>> s2 = 'Test'
>>> id(s1) == id(s2)
True
As you can see, in CPython s1 and s2 refer to the same object in memory. This is an implementation detail, not something to rely on.
If you need a full copy of a mutable object (e.g. a list), check the copy module from the standard library. E.g.
>>> import copy
>>> l1 = [1, 2, 3]
>>> l2 = l1
>>> id(l2) == id(l1)  # a reference
True
>>> l3 = copy.deepcopy(l1)
>>> id(l3) == id(l1)
False
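deepcopy only makes a visible difference when the list contains mutable objects. A minimal sketch comparing copy.copy (a shallow copy) with copy.deepcopy on a nested list:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)    # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0].append(99)           # mutate an inner list of the original
print(shallow[0])              # [1, 2, 99]: shared with the original
print(deep[0])                 # [1, 2]: independent copy
```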

Slice-as-copy gives you a new object only when it is applied to a list; for tuples and strings CPython returns the original object:
>>> t = ()
>>> t is t[:]
True
>>> l = []
>>> l is l[:]
False
>>> s = ''
>>> s is s[:]
True
There was already some discussion in the python-dev mailing list of a way to make slice-as-copy work with strings but the feature was rejected due to performance issues.

Related

Why does copy.deepcopy() behave differently for tuples than lists?

My understanding of deep copies is that they replace references to objects with copies as new objects. Then consider this:
>>> from copy import deepcopy
>>> o = [1, 2, 3]
>>> l = [o]
>>> c = deepcopy(l)
>>> c[0] is l[0]
False
Compared to this:
>>> o = (1, 2, 3)
>>> l = [o]
>>> c = deepcopy(l)
>>> c[0] is l[0]
True
Why is the behaviour different?
deepcopy is redundant for immutable objects, because there's no practical way to tell the difference between a copy and the original. Yes, you can use is or id(), but those don't tell you much about the object itself.
A tuple can only be treated as fully immutable if all of the elements it contains are immutable. Numbers and strings are immutable, so they make good tuple members. A list is never immutable.
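A quick check of this with the standard copy module as implemented in CPython: deepcopy hands back the original tuple when copying its elements changes nothing, but builds a new tuple once a mutable element has to be copied.

```python
from copy import deepcopy

flat = (1, 2, 3)        # only immutable elements
mixed = ([1, 2], 3)     # contains a mutable list

flat_copy = deepcopy(flat)
mixed_copy = deepcopy(mixed)

print(flat_copy is flat)          # True: nothing needed copying
print(mixed_copy is mixed)        # False: a new tuple was built
print(mixed_copy[0] is mixed[0])  # False: the inner list was copied
```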
A class may implement a method __deepcopy__ and if it does, that function will be called to make the copy. That function may return the original object or a new object depending on the properties of the class.
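A minimal sketch of both options. The classes Token and Box are hypothetical examples, not part of any library: Token declares itself immutable and returns the original, while Box is a mutable container that builds a new instance and deep-copies its contents, passing the memo dictionary along as the copy module expects.

```python
import copy

class Token:
    """Hypothetical value-like class that treats its instances as immutable."""
    def __init__(self, name):
        self.name = name

    def __deepcopy__(self, memo):
        # Safe to share because a Token is never mutated after creation.
        return self

class Box:
    """Hypothetical mutable container that deep-copies its contents."""
    def __init__(self, items):
        self.items = items

    def __deepcopy__(self, memo):
        new = Box([])
        memo[id(self)] = new  # register early so self-references resolve to the copy
        new.items = copy.deepcopy(self.items, memo)
        return new

tok = Token("x")
box = Box([tok, [1, 2]])
box_copy = copy.deepcopy(box)
print(box_copy is box)                    # False: Box makes a new instance
print(box_copy.items[0] is tok)           # True: Token returned itself
print(box_copy.items[1] is box.items[1])  # False: the inner list was copied
```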
As John Gordon pointed out in the comments, this has nothing to do with deepcopy. Looks like your Python implementation re-uses the same object for equal tuples of literals, no matter where they appear. As in:
a = (1, 2, 3)
b = (1, 2, 3)
a is b # True
This is an implementation detail that you cannot rely on. In CPython, the most common implementation, that line used to evaluate to False, and nobody guarantees that it won't return False again next year. I plugged the same code into an online interpreter that relies on the Skulpt implementation, and it returns False too.
Just use == for comparison!
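A small sketch of the recommended approach: == compares contents, so it gives the same answer whether or not the implementation happens to share tuple objects.

```python
a = (1, 2, 3)
b = tuple([1, 2, 3])   # built at run time from a list
print(a == b)          # True: compares contents
print(a is b)          # implementation detail: do not rely on this result
```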
Why does CPython store equal tuples in the same memory location? I can only speculate but it's probably to conserve memory.

Understanding iterable types in comparisons

Recently I ran into cosmologicon's pywats and am now trying to understand the part about fun with iterators:
>>> a = 2, 1, 3
>>> sorted(a) == sorted(a)
True
>>> reversed(a) == reversed(a)
False
OK, sorted(a) returns a list, so sorted(a) == sorted(a) is just a comparison of two lists. But reversed(a) returns a reversed object. So why are these reversed objects different? And comparing their ids confuses me even more:
>>> id(reversed(a)) == id(reversed(a))
True
The basic reason why id(reversed(a)) == id(reversed(a)) returns True, whereas reversed(a) == reversed(a) returns False, can be seen from the example below using a custom class:
>>> class CA:
...     def __del__(self):
...         print('deleted', self)
...     def __init__(self):
...         print('inited', self)
...
>>> CA() == CA()
inited <__main__.CA object at 0x021B8050>
inited <__main__.CA object at 0x021B8110>
deleted <__main__.CA object at 0x021B8050>
deleted <__main__.CA object at 0x021B8110>
False
>>> id(CA()) == id(CA())
inited <__main__.CA object at 0x021B80F0>
deleted <__main__.CA object at 0x021B80F0>
inited <__main__.CA object at 0x021B80F0>
deleted <__main__.CA object at 0x021B80F0>
True
As you can see, when you did customobject == customobject, the objects created on the fly were not destroyed until after the comparison, because both were needed for the comparison.
But in the case of id(co) == id(co), the custom object is passed to id(), and only the result of id() is needed for the comparison. Once id() returns, nothing references the object any more, so it is freed (in CPython this happens immediately through reference counting). When the interpreter then creates a new object for the right side of ==, it reuses the memory that was just freed, so both ids come out the same.
This behavior is an implementation detail of CPython (other Python implementations may or may not behave the same way), and you should never rely on the equality of ids of short-lived objects. For example, the following case gives a misleading result:
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> id(reversed(a)) == id(reversed(b))
True
The reason for this is again as explained above (garbage collection of the reversed object created for reversed(a) before creation of reversed object for reversed(b)).
If the iterables are large, I think the most memory-efficient, and most probably the fastest, way to compare two iterators for equality is to use the all() built-in together with zip() in Python 3.x (or itertools.izip() in Python 2.x). Keep in mind that zip() stops at the end of the shorter input, so this check does not notice a difference in length.
Example for Python 3.x -
all(x==y for x,y in zip(aiterator,biterator))
Example for Python 2.x -
from itertools import izip
all(x==y for x,y in izip(aiterator,biterator))
This works because all() short-circuits at the first False value it encounters, and zip() in Python 3.x returns an iterator that lazily yields corresponding elements from both iterators, so no separate list needs to be created in memory.
Demo -
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> all(x==y for x,y in zip(reversed(a),reversed(b)))
False
>>> all(x==y for x,y in zip(reversed(a),reversed(a)))
True
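Because zip() truncates to the shorter input, the all()/zip() check above reports [1, 2] and [1, 2, 3] as equal. A minimal sketch of a length-aware variant, swapping in itertools.zip_longest with a private sentinel; the names iter_equal and _MISSING are hypothetical helpers, not library functions:

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; by default equal only to itself

def iter_equal(xs, ys):
    """Lazily compare two iterables element by element, including their lengths."""
    # zip_longest pads the shorter input with _MISSING, which compares unequal
    # to ordinary values, so a length mismatch makes all() return False.
    return all(x == y for x, y in zip_longest(xs, ys, fillvalue=_MISSING))

a = [1, 2, 3]
print(iter_equal(reversed(a), reversed(a)))            # True
print(all(x == y for x, y in zip([1, 2], [1, 2, 3])))  # True (misleading)
print(iter_equal([1, 2], [1, 2, 3]))                   # False
```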
sorted returns a list, whereas reversed returns a reversed object and is a different object. If you were to cast the result of reversed to a list before comparison, they will be equal.
In [8]: reversed(a)
Out[8]: <reversed at 0x2c98d30>
In [9]: reversed(a)
Out[9]: <reversed at 0x2c989b0>
reversed returns an iterator that doesn't implement its own __eq__ method, so two instances are compared by identity.
The confusion about id(reversed(a)) == id(reversed(a)) is because after evaluating the first id(...) call the iterable can be disposed (nothing references it) and the second iterable may be reallocated at the very same memory address when the second id(...) call is done. This is however just a coincidence.
Try
ra1 = reversed(a)
ra2 = reversed(a)
and compare id(ra1) with id(ra2) and you will see they are different numbers (because in this case the iterable objects cannot be deallocated as they're referenced by ra1/ra2 variables).
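A runnable version of that experiment. Two objects that are alive at the same time are guaranteed to have different ids, and since reversed objects fall back to identity for ==, only converting them to lists compares their contents:

```python
a = 2, 1, 3
ra1 = reversed(a)
ra2 = reversed(a)

print(id(ra1) == id(ra2))       # False: both objects are alive at the same time
print(ra1 == ra2)               # False: no custom __eq__, so identity is used
print(list(ra1) == list(ra2))   # True: compares the produced elements
```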
You can try list(reversed(a)) == list(reversed(a)), which returns True:
>>> list(reversed(a))
[3, 2, 1]
Now try:
>>> v = id(reversed(a))
>>> n = id(reversed(a))
>>> v == n
False
And again:
>>> v = id(reversed(a))
>>> n = id(reversed(a))
>>> n1 = id(reversed(a))
>>> v == n1
True

Iteratively add elements to a list

I am trying to add elements of a list in Python and thereby generate a list of lists. Suppose I have two lists a = [1,2] and b = [3,4,5]. How can I construct the following list:
c = [[1,2,3],[1,2,4],[1,2,5]] ?
In my futile attempts to generate c, I stumbled on an erroneous preconception of Python which I would like to describe in the following. I would be grateful for someone elaborating a bit on the conceptual question posed at the end of the paragraph. I tried (among other things) to generate c as follows:
c = []
for i in b:
    temp = a
    temp.extend([i])
    c += [temp]
What puzzled me was that a seems to be overwritten by temp. Why does this happen? It seems that the = operator is used in the mathematical sense by Python but not as an assignment (in the sense of := in mathematics).
You are not creating a copy; temp = a merely makes temp reference the same list object. As a result, temp.extend([i]) extends the same list object a references:
>>> a = []
>>> temp = a
>>> temp.extend(['foo', 'bar'])
>>> a
['foo', 'bar']
>>> temp is a
True
You can build c with a list comprehension:
c = [a + [i] for i in b]
By concatenating instead of extending, you create a new list object each iteration.
You could instead also have made an actual copy of a with:
temp = a[:]
where the identity slice (slicing from beginning to end) creates a new list containing a shallow copy.
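Putting that fix into the original loop gives a sketch like this: each iteration copies a before extending it, so a itself is never modified and every element of c is a distinct list.

```python
a = [1, 2]
b = [3, 4, 5]

c = []
for i in b:
    temp = a[:]        # new list each iteration (shallow copy of a)
    temp.append(i)     # extend the copy, not a
    c.append(temp)

print(c)  # [[1, 2, 3], [1, 2, 4], [1, 2, 5]]
print(a)  # [1, 2]
```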

Does a slicing operation give me a deep or shallow copy?

The official Python docs say that using the slicing operator and assigning in Python makes a shallow copy of the sliced list.
But when I write code for example:
o = [1, 2, 4, 5]
p = o[:]
And when I write:
id(o)
id(p)
I get different ids, and appending to one list does not affect the other. Isn't it creating a deep copy, or am I going wrong somewhere?
You are creating a shallow copy, because nested values are not copied, merely referenced. A deep copy would create copies of the values referenced by the list too.
Demo:
>>> lst = [{}]
>>> lst_copy = lst[:]
>>> lst_copy[0]['foo'] = 'bar'
>>> lst_copy.append(42)
>>> lst
[{'foo': 'bar'}]
>>> id(lst) == id(lst_copy)
False
>>> id(lst[0]) == id(lst_copy[0])
True
Here the nested dictionary is not copied; it is merely referenced by both lists. The new element 42 is not shared.
Remember that everything in Python is an object, and names and list elements are merely references to those objects. A copy of a list creates a new outer list, but the new list merely receives references to the exact same objects.
A proper deep copy creates new copies of each and every object contained in the list, recursively:
>>> from copy import deepcopy
>>> lst_deepcopy = deepcopy(lst)
>>> id(lst_deepcopy[0]) == id(lst[0])
False
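Slicing is only one way to spell a shallow copy of a list. A short sketch checking that the common alternatives behave the same way: each builds a new outer list whose elements are the very same objects as in the original.

```python
import copy

original = [[1, 2], {"key": "value"}]

# Each of these builds a new outer list that shares the same element objects.
copies = [
    original[:],          # slicing
    list(original),       # list constructor
    original.copy(),      # list.copy() (Python 3.3+)
    copy.copy(original),  # copy module
]

for dup in copies:
    print(dup is original, dup[0] is original[0])  # False True
```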
You should know that tests using is or id can be misleading about whether a true copy is being made when the objects involved are immutable and interned, such as strings, integers, and tuples that contain only immutables.
Consider an easily understood example of interned strings:
>>> l1=['one']
>>> l2=['one']
>>> l1 is l2
False
>>> l1[0] is l2[0]
True
Now make a shallow copy of l1 and test the immutable string:
>>> l3=l1[:]
>>> l3 is l1
False
>>> l3[0] is l1[0]
True
Now make a copy of the string contained by l1[0]:
>>> s1=l1[0][:]
>>> s1
'one'
>>> s1 is l1[0] is l2[0] is l3[0]
True # they are all the same object
Try a deepcopy where every element should be copied:
>>> from copy import deepcopy
>>> l4=deepcopy(l1)
>>> l4[0] is l1[0]
True
In each case, the string 'one' has been interned into Python's internal cache of immutable strings, which is why they all show as the same object (they have the same id). What gets interned, and when, depends on the implementation and version, so you cannot depend on it. Interning can be a substantial memory and performance enhancement.
You can force an example that does not get interned instantly:
>>> s2=''.join(c for c in 'one')
>>> s2==l1[0]
True
>>> s2 is l1[0]
False
And then you can use sys.intern (a built-in called intern in Python 2) to make that string refer to the cached object if one is found:
>>> l1[0] is s2
False
>>> import sys
>>> s2 = sys.intern(s2)
>>> l1[0] is s2
True
Same applies to tuples of immutables:
>>> t1=('one','two')
>>> t2=t1[:]
>>> t1 is t2
True
>>> t3=deepcopy(t1)
>>> t3 is t2 is t1
True
And mutable lists of immutables (like integers) can have their members interned (CPython caches small integers):
>>> li1=[1,2,3]
>>> li2=deepcopy(li1)
>>> li2 == li1
True
>>> li2 is li1
False
>>> li1[0] is li2[0]
True
So you may use python operations that you KNOW will copy something but the end result is another reference to an interned immutable object. The is test is only a dispositive test of a copy being made IF the items are mutable.

Why are lists linked in Python in a persistent way?

A variable is set. Another variable is set to the first. The first changes value. The second does not. This has been the nature of programming since the dawn of time.
>>> a = 1
>>> b = a
>>> b = b - 1
>>> b
0
>>> a
1
I then extend this to Python lists. A list is declared and appended. Another list is declared to be equal to the first. The values in the second list change. Mysteriously, the values in the first list, though not acted upon directly, also change.
>>> alist = list()
>>> blist = list()
>>> alist.append(1)
>>> alist.append(2)
>>> alist
[1, 2]
>>> blist
[]
>>> blist = alist
>>> alist.remove(1)
>>> alist
[2]
>>> blist
[2]
>>>
Why is this?
And how do I prevent this from happening -- I want alist to be unfazed by changes to blist (immutable, if you will)?
Python variables are actually not variables but references to objects (similar to pointers in C). There is a very good explanation of that for beginners in http://foobarnbaz.com/2012/07/08/understanding-python-variables/
One way to convince yourself about this is to try this:
>>> a = [1, 2, 3]
>>> b = a
>>> id(a)
68617320
>>> id(b)
68617320
In CPython, id returns the memory address of the given object. Since the id is the same for both names, changing the list through one of them affects the other, because they are, in fact, the same object.
Variable binding in Python works this way: you assign an object to a variable.
a = 4
b = a
Both point to 4.
b = 9
Now b points to somewhere else.
Exactly the same happens with lists:
a = []
b = a
b = [9]
Now, b has a new value, while a has the old one.
So far everything is clear, and the behaviour is the same for mutable and immutable objects.
Now comes your misunderstanding: it is about modifying objects.
Lists are mutable, so if you mutate a list, the modifications are visible via all variables ("name bindings") that refer to it:
a = []
b = a # the same list
c = [] # another empty one
a.append(3)
print(a, b, c)  # a and b are [3]; c is [] because it is a different list
d = a[:]  # copy it completely (a shallow copy)
b.append(9)
# now a = b = [3, 9], c = [], and d = [3], a copy of a (and b) taken before the append
What is happening is that you create another reference to the same list when you do:
blist = alist
Thus, blist refers to the same list that alist does, so any modification to that single list affects both alist and blist.
If you want to copy the entire list, and not just create a reference, you can do this:
blist = alist[:]
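Applied to the example from the question, a minimal sketch: after the slice, removing an element through blist leaves alist untouched. The copy is shallow, so if the list held mutable objects (such as nested lists) you would reach for copy.deepcopy instead.

```python
alist = [1, 2]
blist = alist[:]      # new list object holding the same elements
blist.remove(1)
print(alist)          # [1, 2]: unaffected
print(blist)          # [2]
```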
In fact, you can check the references yourself using id():
>>> alist = [1,2]
>>> blist = []
>>> id(alist)
411260888
>>> id(blist)
413871960
>>> blist = alist
>>> id(blist)
411260888
>>> blist = alist[:]
>>> id(blist)
407838672
This is a relevant quote from the Python docs:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
Based on this post:
Python passes references-to-objects by value (like Java), and
everything in Python is an object. This sounds simple, but then you
will notice that some data types seem to exhibit pass-by-value
characteristics, while others seem to act like pass-by-reference...
what's the deal?
It is important to understand mutable and immutable objects. Some
objects, like strings, tuples, and numbers, are immutable. Altering
them inside a function/method will create a new instance and the
original instance outside the function/method is not changed. Other
objects, like lists and dictionaries are mutable, which means you can
change the object in-place. Therefore, altering an object inside a
function/method will also change the original object outside.
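A short sketch of the distinction drawn in that quote, using hypothetical function names: rebinding a parameter inside a function never affects the caller, whereas mutating the object the parameter refers to does.

```python
def rebind(n):
    n = n + 1          # rebinds the local name only

def mutate(lst):
    lst.append(4)      # changes the caller's list object in place

def rebind_list(lst):
    lst = lst + [4]    # builds a new list; the caller's list is unaffected

x = 1
rebind(x)
nums = [1, 2, 3]
mutate(nums)
other = [1, 2, 3]
rebind_list(other)
print(x, nums, other)  # 1 [1, 2, 3, 4] [1, 2, 3]
```

Note that even for a mutable list, rebinding the parameter (rebind_list) leaves the caller unaffected; only in-place mutation is visible outside.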
So in your example you are making the variables blist and alist point to the same object. Therefore, when you remove an element through either blist or alist, the change is reflected in the object that they both point to.
The short answer to your question "Why is this?": because in Python integers are immutable, while lists are mutable.
You were looking for an official reference in the Python docs. Have a look at this section:
http://docs.python.org/2/reference/simple_stmts.html#assignment-statements
Quote from the latter:
Assignment statements are used to (re)bind names to values and to
modify attributes or items of mutable objects
I really like this sentence, have never seen it before. It answers your question precisely.
A good recent write-up about this topic is http://nedbatchelder.com/text/names.html, which has already been mentioned in one of the comments.
