Python confusion -- convention, name and value

I am a beginner and got confused while learning Python. If I have the following Python code:
import numpy as np
X = np.array([1,0,0])
Y = X
X[0] = 2
print(Y)
Y will be shown to be array([2, 0, 0])
However, if I do the following:
import numpy as np
X = np.array([1,0,0])
Y = X
X = 2*X
print(Y)
Y is still array([1,0,0])
What is going on?

Think of it this way:
The equals sign in Python assigns references.
Y = X makes Y point to the same object that X points to.
X[0] = 2 makes X[0] point to 2; since X and Y are the same object, the change is visible through Y as well.
X = 2*X makes X point to a new object, but Y still points to the original object, so Y is unchanged.
This isn't exactly true, but it's close enough to understand the principle.
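A minimal sketch of the same idea, assuming a plain list in place of the NumPy array so that it runs without NumPy. List multiplication repeats rather than doubles, but the point about mutation versus rebinding is the same; the helper names (same_before and so on) are only for illustration.

```python
# A plain list stands in for the NumPy array (an assumption for this sketch).
x = [1, 0, 0]
y = x                      # y is bound to the same list object as x
same_before = x is y       # True

x[0] = 2                   # mutates the shared object; visible through y too
seen_through_y = list(y)   # snapshot of what y shows: [2, 0, 0]

x = x * 2                  # builds a new list and rebinds the name x to it
same_after = x is y        # False; y still names the original list

print(same_before, seen_through_y, same_after, y)
```

The is operator is the quickest way to check whether two names currently refer to one object.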

That's because X and Y are references to the same object, np.array([1,0,0]). This means that whether a call is made through X or through Y, the result is the same, but changing what one of the names refers to has no effect on the other.
If you write:
X = np.array([1,0,0])
Y = X
basically what happens is that there are two local variables X and Y that refer to the same object. So the memory looks like:
     +--------+
Y -> |np.array| <- X
     +--------+
     |[1,0,0] |
     +--------+
Now if you do X[0] = 2 that is basically short for:
X.__setitem__(0,2)
so you call a method on the object. So now the memory looks like:
     +--------+
Y -> |np.array| <- X
     +--------+
     |[2,0,0] |
     +--------+
If you however write:
X = 2*X
First 2*X is evaluated. Now 2*X is short for:
X.__rmul__(2)
(Python first tries (2).__mul__(X), but since int does not know how to multiply by an array, that call returns NotImplemented and Python falls back to X.__rmul__.) X.__rmul__ does not change X: it leaves X intact, constructs a new array, and returns it.
This creates a new array object, array([4, 0, 0]), and the name X is then rebound to that new object. So now the memory looks like:
     +--------+       +--------+
Y -> |np.array|  X -> |np.array|
     +--------+       +--------+
     |[2,0,0] |       |[4,0,0] |
     +--------+       +--------+
But as you can see, Y still references the old object.
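The fallback from __mul__ to __rmul__ can be observed directly. The sketch below assumes a hypothetical minimal class, Vec, that stands in for the array; it is not how NumPy is implemented, only an illustration of the operator dispatch.

```python
class Vec:
    """Hypothetical minimal vector type, used only to show operator dispatch."""

    def __init__(self, data):
        self.data = list(data)

    def __rmul__(self, scalar):
        # Called for `scalar * vec` once the left operand has declined.
        # Returns a new Vec and leaves self untouched.
        return Vec(scalar * v for v in self.data)


v = Vec([2, 0, 0])
declined = (2).__mul__(v)   # int does not know Vec, so it returns NotImplemented
w = 2 * v                   # Python then falls back to v.__rmul__(2)

print(declined is NotImplemented, w.data, v.data, w is v)
```

Note that the left operand signals "I cannot handle this" by returning the NotImplemented singleton rather than by raising an exception.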

This is more about convention and names than reference and value.
When you assign:
Y = X
Then the name Y refers to the object that the name X points to. In a sense, the names X and Y point to the same object:
X is Y # True
The is operator checks whether the names point to the same object!
Then it gets tricky: you do some operations on the arrays.
X[0] = 2
That's called "item assignment" and calls
X.__setitem__(0, 2)
What __setitem__ should do (convention) is to update some value in the container X. So X should still point to the same object afterwards.
However, X * 2 is "multiplication", and the convention states that this should create a new object (again, this is only a convention; you can change that behaviour by overriding X.__mul__). So when you do
X = X * 2
The name X now refers to the new object that X * 2 created:
X is Y # False
Common libraries normally follow these conventions, but it's important to highlight that you can completely change this!
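To illustrate that this really is only a convention, here is a sketch with a hypothetical class, Stubborn, whose __mul__ deliberately mutates in place and returns the same object. This is not recommended practice; it only shows that the language does not enforce the convention.

```python
class Stubborn:
    """Hypothetical container whose __mul__ deliberately breaks the convention."""

    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, scalar):
        # Unconventional: mutate in place and hand back the same object.
        self.data = [scalar * v for v in self.data]
        return self


x = Stubborn([1, 0, 0])
y = x
x = x * 2            # x is rebound, but to the very same object
print(x is y, y.data)
```

With such a class, X = X * 2 would change what Y shows, which is exactly why well-behaved libraries avoid it.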

When you say X = np.array([1, 0, 0]), you create an object that has some methods and some internal buffers that contain the actual data and other information in it.
Doing Y = X sets Y to refer to the same actual object. This is called binding to a name in Python. You have bound the same object that was bound to X to the name Y as well.
Doing X[0] = 2 calls the object's __setitem__ method, which does some stuff to the underlying buffers. It modifies the object in place. Now when you print the values of either X or Y, the numbers that come out of that object's buffers are 2, 0, 0.
Doing X = 2 * X translates to X.__rmul__(2). This method does not modify X in place. It creates and returns a new array object, each of whose elements is twice the corresponding element of X. Then you bind the new object to the name X. However, the name Y is still bound to the original array because you have not done anything to change that. As an aside, X.__rmul__ is used because (2).__mul__(X) returns NotImplemented: int does not know how to multiply by an array. NumPy arrays naturally define multiplication to be commutative, so X.__mul__ and X.__rmul__ should do the same thing.
It is interesting to note that you can also do X *= 2, which will propagate the changes to Y. This is because the *= operator translates to the __imul__ method, which does modify the input in place.
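The same split between in-place and rebinding behaviour shows up in built-in types. The sketch below uses a list (which defines __imul__) and a tuple (which does not) rather than NumPy arrays, so the arithmetic is repetition instead of doubling; the variable names are only for illustration.

```python
a = [1, 0, 0]
b = a
a *= 2                      # list.__imul__ extends the existing list in place
list_shared = a is b        # True: b sees the change
b_after = list(b)           # [1, 0, 0, 1, 0, 0]

c = (1, 0, 0)
d = c
c *= 2                      # tuples have no __imul__, so this is c = c * 2
tuple_shared = c is d       # False: d still names the original tuple

print(list_shared, b_after, tuple_shared, d)
```

So whether *= propagates to other names depends on whether the type implements __imul__ as an in-place operation.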

Related

Details about how a=b=c works

From this answer: How do chained assignments work?, I understand that chained assignment in Python:
x = y = z # (1)
is equivalent to:
temp = z
x = temp
y = temp
But is (1) also equivalent to:
x = z
y = x
?
Or is there a slight difference (for example when z = some_function())? If so, what is the difference?

In the very example you give, yes, the effects of the two approaches are practically identical because both involve simply assigning the same reference to a number of names.
Be aware, however, that if the expressions in the assignment targets involve more complex evaluations, the two approaches could be different.
For example, consider the following chain expression, where x is initialized as a dict and expensive_func is a time-consuming function that returns a key:
x[expensive_func()] = y = some_function()
While it would indeed be equivalent to the following:
temp = some_function()
x[expensive_func()] = temp
y = temp
it would not be equivalent to the second approach:
x[expensive_func()] = some_function()
y = x[expensive_func()]
since expensive_func would then have to be called twice, doubling the time taken, and triggering the side effect of the function twice, if it has any.
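The difference in call counts can be checked with a small sketch. The bodies of expensive_func and some_function below are stand-ins (assumptions) that simply record each call.

```python
calls = []

def expensive_func():
    # Stand-in for a slow key-producing function; records each call.
    calls.append("expensive_func")
    return "key"

def some_function():
    calls.append("some_function")
    return 42

# Chained form: the right-hand side is evaluated once, then each target
# is assigned from left to right.
x = {}
x[expensive_func()] = y = some_function()
chained_calls = list(calls)

# Two-statement form: the key expression is evaluated a second time.
calls.clear()
x = {}
x[expensive_func()] = some_function()
y = x[expensive_func()]
two_step_calls = list(calls)

print(chained_calls)
print(two_step_calls)
```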
Also, consider the following code:
obj = []
x = []
x[:] = y = obj
print(id(obj), id(x), id(y))
where the output would show that y gets assigned the same reference as obj, while x is different.
That code is then indeed equivalent to:
obj = []
x = []
temp = obj
x[:] = temp
y = temp
print(id(obj), id(x), id(y))
But not equivalent to:
obj = []
x = []
x[:] = obj
y = x[:]
print(id(obj), id(x), id(y))
The latter of which would show y getting a different reference from both obj and x.

I always find using examples to be the best way to understand things (in general).
Let's say we have a func:
def function_sample():
    print(20)
If you print it:
print(function_sample)
you get the function object:
<function function_sample at 0x7f8f840a01f0>
You can assign the function to a variable without parentheses (that is, without calling it):
x = function_sample
print(x)
you will get the same message: <function function_sample at 0x7f8f840a01f0>
However, if you call it (with parentheses):
print(x())
you will see:
20
None
Why None? It's because Python functions have a default return value, which is None if no return expression is given, or return is given on its own.
Another sample:
def another_sample(some):
    print(some)
y = another_sample
print(y)
As you have probably guessed, this prints: <function another_sample at 0x7f8f7e747700>
If you try to print y() you will get a TypeError because the some argument is missing.
But if we add one:
print(y(5))
5
None
One last example:
def third_sample():
    return 20
aa = third_sample # without running the func
bb = third_sample() # calling/running the func
print(aa) # function object
print(bb) # 20

Both approaches you have shown work for chaining plain names, and in that case there is no difference between them.
When assigning the same value to several variables, instead of doing the typical:
x = 0
y = 0
or using the tuple unpacking approach:
(x, y) = 0, 0
you could just use chained assignment, as you have:
x = y = 0
This works with any object (or call) on the right-hand side, so:
x = y = some_object()
is the same as:
tmp = some_object()
x = tmp
y = tmp
If you then del tmp, only the name tmp is removed; x and y still refer to the object.
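It may be worth checking what del does to the other names. In the sketch below, object() stands in for some_object() (an assumption); del removes only the name tmp and leaves the object, and the names x and y, alone.

```python
tmp = object()        # stand-in for some_object()
x = y = tmp
del tmp               # removes only the name tmp, not the object

still_shared = x is y

try:
    tmp
except NameError:     # the name is gone
    tmp_gone = True
else:
    tmp_gone = False

print(still_shared, tmp_gone)
```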

Assignment of variables inside function changes assignment outside - Python

I moved from Matlab to Python, and variable assignment inside functions is confusing me.
I have the following code:
a = [1,1,1]
def keeps(x):
    y = x[:]
    y[1] = 2
    return y
def changes(x):
    y = x
    y[1] = 2
    return y
aout = keeps(a)
print(a, aout)
aout = changes(a)
print(a, aout)
The first print statement gives [1, 1, 1] [1, 2, 1], while
the second one gives [1, 2, 1] [1, 2, 1].
I had an understanding (coming from Matlab) that operations on a variable within a function are local. But here, if I don't make a copy of the variable inside the function, the values change outside the function as well. It's almost as if the variable were defined as global.
It would be very helpful if someone could explain how the variables are handled differently in the two functions, and what the best practice is if one wants to pass a variable to a function without affecting its value outside the function. Thanks.

Argument passing is done by assignment. In changes, the first thing that happens implicitly when you call changes(a) is
x = a. Since assignment NEVER copies data, you mutate a.
In keeps you are not mutating the argument list because x[:] creates a (shallow) copy, to which the name y is then assigned.
I highly recommend watching Facts and Myths about Python names and values.
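As a rough sketch of the "best practice" part of the question: copy inside the function if the caller's list must stay untouched. The version below uses copy.copy, which for a list has the same effect as x[:]; the function bodies are simplified from the question.

```python
import copy

def changes(x):
    x[1] = 2             # x and the caller's argument are one object
    return x

def keeps(x):
    y = copy.copy(x)     # shallow copy, same effect as x[:] for a list
    y[1] = 2
    return y

a = [1, 1, 1]
kept = keeps(a)
a_after_keeps = list(a)  # snapshot: a is still [1, 1, 1] here
changed = changes(a)     # now a itself has been modified

print(a_after_keeps, kept, a, changed is a)
```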

Let's look at your code, but first, we will move the function declarations to the top, so that the order of execution becomes clearer.
def keeps(x):
    y = x[:]  # Here you are creating a modifiable copy of the original x list and referencing it with y
    y[1] = 2
    return y
def changes(x):
    y = x  # Here you are just referencing x itself with a new name y
    y[1] = 2
    return y
a = [1,1,1]
aout = keeps(a)
print(a, aout)
aout = changes(a)
print(a, aout)
Basically, if you just assign another variable name to a list, you are giving two names to the same object, so any change to its contents is visible through both "lists". When you use y = x[:] you are in fact creating a new copy of the x list in memory, through list slicing, and assigning the new variable name y to that new copy of the list.
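One caveat: the slice copy is shallow, so it copies only the outer list. A minimal sketch with a nested list (the data here is made up for illustration) shows where x[:] is not enough and copy.deepcopy may be needed.

```python
import copy

nested = [[1, 1], [1, 1]]
shallow = nested[:]              # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

shallow[0][0] = 99               # reaches through to nested[0]
deep[1][1] = -1                  # touches only the deep copy

print(nested, shallow, deep)
```

For a flat list of numbers, as in the question, the shallow copy is sufficient.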

Variables and aliases with Python's code.interact

This behavior has me puzzled:
import code
class foo():
    def __init__(self):
        self.x = 1
    def interact(self):
        v = globals()
        v.update(vars(self))
        code.interact(local=v)
c = foo()
c.interact()
Python 2.6.6 (r266:84292, Sep 11 2012, 08:34:23)
(InteractiveConsole)
>>> id(x)
29082424
>>> id(c.x)
29082424
>>> x
1
>>> c.x
1
>>> x=2
>>> c.x
1
Why doesn't 'c.x' behave like an alias for 'x'? If I understand the id() function correctly, they are both at the same memory address.

Small integers from -5 to 256 are cached in CPython, i.e. their id() is always going to be the same.
From the docs:
The current implementation keeps an array of integer objects for all
integers between -5 and 256, when you create an int in that range you
actually just get back a reference to the existing object.
>>> x = 1
>>> y = 1 #same id() here as integer 1 is cached by python.
>>> x is y
True
Update:
If two identifiers return the same value of id(), it doesn't mean they can act as aliases of
each other; it depends entirely on the type of the object they point to.
For an immutable object you cannot create an alias in Python. "Modifying" one of the references to an immutable object simply makes it point to a new object, while the other references to the old object remain intact.
>>> x = y = 300
>>> x is y # x and y point to the same object
True
>>> x += 1 # modify x
>>> x # x now points to a different object
301
>>> y #y still points to the old object
300
A mutable object can be modified through any of its references, but those modifications must be in-place modifications.
>>> x = y = []
>>> x is y
True
>>> x.append(1) # list.append is an in-place operation
>>> y.append(2) # in-place operation
>>> x
[1, 2]
>>> y #works fine so far
[1, 2]
>>> x = x + [1] #not an in-place operation
>>> x
[1, 2, 1] #assigns a new object to x
>>> y #y still points to the same old object
[1, 2]

code.interact simply did (effectively) x=c.x for you. So when you checked their ids, they were pointing to the exact same object. But x=2 creates a new binding for the variable x. It is not an alias. Python does not have aliases, as far as I am aware.
Yes, in CPython id(x) is the memory address of the object x points to. It is not the memory address of the variable x itself (which is, after all, just a key in a dictionary).
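Roughly what happens can be sketched without code.interact at all. The dictionary named namespace below is an assumption standing in for the dict passed as local=; assigning x at the prompt amounts to replacing one entry in that dict.

```python
class Foo:
    def __init__(self):
        self.x = 1

c = Foo()
namespace = {"c": c}
namespace.update(vars(c))                # copies the binding x -> 1 into the dict

shared_before = namespace["x"] is c.x    # True: both refer to the same int object
namespace["x"] = 2                       # roughly what typing x = 2 at the prompt does

print(shared_before, namespace["x"], c.x)
```

The update call copies bindings, not a live link, so c.x is unaffected by later changes to the dict entry.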

If I understand the id() function correctly, they are both at the same memory address.
You don't understand it correctly. id returns an integer for which the following is guaranteed: if id(x) == id(y) then x is y (and vice versa), as long as both objects are alive at the same time.
Accordingly, id tells you about the objects (values) that variables point to, not about the variables themselves.
Any relationship to memory addresses is purely an implementation detail. Python, unlike, e.g., C, does not assume any particular relationship to the underlying machine (whether physical or virtual). Variables in Python are both opaque and not language accessible (i.e. not first class).

Python closures and cells (closed-over values)

What is the Python mechanism that makes it so that
[lambda: x for x in range(5)][2]()
is 4?
What is the usual trick for binding a copy of x to each lambda expression so that the above expression will equal 2?
My final solution:
for template, model in zip(model_templates, model_classes):
    def create_known_parameters(known_parms):
        return lambda self: [getattr(self, p.name)
                             for p in known_parms]
    model.known_parameters = create_known_parameters(template.known_parms)

>>> [lambda x=x: x for x in range(5)][2]()
2

I usually use functools.partial:
from functools import partial
[partial(lambda x: x, x) for x in range(5)]
Or, of course, you can do it yourself:
[ (lambda x: (lambda: x))(x) for x in range(5) ]

Since no one's answered the "what is the mechanism" part, and this surprised me when I first read it, here's a go:
This:
ls = [lambda: x for x in range(5)]
Is a bit like this:
ls = []
x = 0
ls.append(lambda: x)
x = 1
ls.append(lambda: x)
x = 2
ls.append(lambda: x)
x = 3
ls.append(lambda: x)
x = 4
ls.append(lambda: x)
Each of those lambdas has its own scope, but none of those scopes contain an x. So they're all going to be reading the value of x by looking in an outer scope, so of course they must all be referring to the same object. By the time any of them are called, the loop is done and that object is the integer 4.
So even though these lambdas look like functions involving only immutable values, they can still be affected by side effects, because they depend on the bindings in an outer scope, and that scope can change.
You could of course further change what these lambdas return by rebinding x, or make them throw an error by unbinding it. In the list comprehension version though, the x is only bound in a private scope inside the list comprehension, so you can't mess with it (as easily).
The solution is of course to arrange things so that each lambda has an x in its local scope (or at least some outer scope that is not shared between the lambdas), so that they can all refer to different objects. Ways to do that have been shown in the other answers.
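A short sketch of the mechanism and of one fix, wrapped in a hypothetical helper named build so that everything runs in an ordinary function scope. It also peeks at the closure cell through the function's __closure__ attribute.

```python
def build():
    late = [lambda: x for x in range(5)]        # every lambda reads the same x
    early = [lambda x=x: x for x in range(5)]   # default argument captures each value
    return late, early

late, early = build()
late_results = [f() for f in late]
early_results = [f() for f in early]

# Each late-binding lambda holds a cell for x; by now the cell contains 4.
cell_value = late[0].__closure__[0].cell_contents

print(late_results, early_results, cell_value)
```

The default-argument version works because defaults are evaluated once, when each lambda is created, not when it is called.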

Pointers in Python? ` x.pointerDest = y.pointerDest`?

I am breaking my old question into parts because it is a very messy beast. This question is related to this answer and this answer. I am trying to understand pointers, and I am not even sure whether they exist in Python.
# Won't change x with y=4
>>> x = 0; y = x; y = 4; x
0
# Won't change y
>>> x = 0; y = x; x = 2; y
0
#so how can I change pointers? Do they even exist in Python?
x = 0
y.pointerDestination = x.pointerDestination #How? By which command?
x = 2
# y should be 0, how?
[Update 2: Solved]
Perhaps these are contradictory statements about their existence: "There are no pointers in Python." and "Python does not have the concept of a "pointer" to a simple scalar value." Does the last one imply that there are pointers to something else, nullifying the first statement?

Scalar objects in Python are immutable. If you use a non-scalar object, such as a list, you can do this:
>>> x = [0]
>>> y = x
>>> y[0] = 4
>>> y
[4]
>>> x
[4]
>>> x is y
True
Python does not have the concept of a "pointer" to a simple scalar value.

Don't confuse pointers with references. They are not the same thing. A pointer is simply an address of an object. You don't really have access to the address of an object in Python, only references to objects.
When you assign an object to a variable, you are assigning a reference to some object to the variable.
x = 0
# x is a reference to an object `0`
y = [0]
# y is a reference to an object `[0]`
Some objects in python are mutable, meaning you can change properties of the object. Others are immutable, meaning you cannot change the properties of the object.
int (a scalar object) is immutable. There isn't a property of an int that you could change (i.e. mutate).
# suppose ints had a `value` property which stores the value
x.value = 20 # does not work
list (a non-scalar object) on the other hand is mutable. You can change individual elements of the list to refer to something else.
y[0] = 20 # changes the 0th element of the list to `20`
In the examples you've shown:
>>> x = [0]
>>> y = [x]
you're not dealing with pointers, you're dealing with references to lists with certain values. x is a list that contains a single integer 0. y is a list that contains a reference to whatever x refers to (in this case, the list [0]).
You can change the contents of x like so:
>>> print(x)
[0]
>>> x[0] = 2
>>> print(x)
[2]
You can change the contents of the list referenced by x through y's reference:
>>> print(x)
[2]
>>> print(y)
[[2]]
>>> y[0][0] = 5
>>> print(x)
[5]
>>> print(y)
[[5]]
You can change the contents of y to reference something else:
>>> print(y)
[[5]]
>>> y[0] = 12345
>>> print(x)
[5]
>>> print(y)
[12345]
It's basically the same semantics as a language such as Java or C#. You don't use pointers to objects directly (though you do indirectly, since the implementations use pointers behind the scenes), but references to objects.

There are no pointers in Python. There are things called references (which, like C++ references, happen to be commonly implemented with pointers, but unlike C++ references don't imply pass-by-reference). Every variable stores a reference to an object allocated somewhere else (on the heap). Every collection stores references to objects allocated somewhere else. Every member of an object stores a reference to an object allocated somewhere else.
The simple expression x evaluates to the reference stored in x; whoever uses it has no way to determine that it came from a variable. There's no way to get a link to a variable (as opposed to the contents of it) that could be used to track changes of that variable. Item (x[y] = ...) and member (x.y = ...) assignments are different in one regard: they invoke methods and mutate existing objects instead of overwriting a local variable. This difference is mainly important when dealing with scoping, but you can use either of those to emulate mutability for immutable types (as shown by @Greg Hewgill) and to share state changes across function boundaries (def f(x): x = 0 doesn't change anything, but def g(x): x.x = 0 does). It's not fully up to emulating pass-by-reference though, unless you replace every variable by a wrapper object whose sole purpose is to hold a mutable val property. This would be the equivalent of emulating pass-by-reference through pointers in C, but much more cumbersome.
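The wrapper-object idea can be sketched as follows. The class name Ref and the function names rebind and mutate are hypothetical; only the val attribute comes from the description above.

```python
class Ref:
    """Hypothetical wrapper whose only job is to hold a mutable attribute, val."""

    def __init__(self, val):
        self.val = val

def rebind(x):
    x = 0            # rebinds the local name only; the caller sees nothing

def mutate(ref):
    ref.val = 0      # attribute assignment mutates the shared object

n = 5
rebind(n)            # n is still 5 afterwards

r = Ref(5)
mutate(r)            # r.val is now 0

print(n, r.val)
```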
