Would somebody please clear up some technicalities for me?
In my course, it says that a variable doesn't contain a value per se, but a reference in computer memory where the value can be found.
For example:
a = [1, 2, 3]
a contains the reference to the location in computer memory where [1, 2, 3] can be found, sort of like an address.
Does this mean that in my computer, this value of [1, 2, 3] already exists in memory, or am I creating this value [1, 2, 3] on the spot?
a = [1, 2, 3]
causes the following actions by the Python interpreter:
1. Construct a list containing the elements 1, 2, 3.
2. Create the variable a.
3. Make the variable from step 2 refer to the list from step 1.
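To make those steps a bit more concrete, here is a minimal sketch (the names make, first and second are only for illustration). It shows that evaluating the list display builds a new list object each time, while plain assignment of an existing name builds nothing:

```python
def make():
    # Each call evaluates the list display again and builds a fresh list object.
    return [1, 2, 3]

first = make()
second = make()

same_contents = first == second   # True: equal element by element
same_object = first is second     # False: two separate list objects

a = first                         # binds the name a to the existing list
alias_is_same = a is first        # True: no new list was built
```

So the value is created on the spot when the right-hand side is evaluated, and the name just ends up referring to it.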
A disassembly of the function might actually be enlightening in this case. Note: this answer is specific to the implementation and version of Python. This was generated with CPython 3.8.9.
Consider the function:
def x():
    a = [1,2,3]
Very simple. Assign the list [1,2,3] to a local variable a.
Now let's look at the byte code that Python generated for this function:
import dis
dis.dis(x)
  2           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               2 (2)
              4 LOAD_CONST               3 (3)
              6 BUILD_LIST               3
              8 STORE_FAST               0 (a)
             10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
I won't get into detail what all these byte codes mean, but you can see the list of instructions the python compiler has turned that simple function into. It loads three constants (1, 2, and 3), onto Python's stack, and uses the BUILD_LIST 3 operation to build a list from three items on the stack, and replaces them with a reference to the new list. This reference is then STOREd in the local variable 0 (which the programmer named a). Further code would use this.
So, the compiler actually translates your function into, roughly, the commands for "build a new list with contents 1, 2, 3" and "store the reference into a".
So, for a local variable, it is a "slot" (that the programmer has named 'a', and the compiler has named 0) with a reference to a list it just built.
Side note: The constants 1, 2, and 3 that are loaded onto the stack actually exist as references to integer objects in Python, which can have their own functions. For efficiency, CPython keeps a cache of common small numbers so there aren't copies. This is not true of many other programming languages. For example, C and Java can both have variables that contain just an integer without being a reference to an object.
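As a small illustration of the side note, the sketch below only relies on language-level behaviour (integers being ordinary objects with methods). It deliberately does not assert anything about the small-number cache, since that is a CPython implementation detail:

```python
n = 3

is_object = isinstance(n, object)   # True: an int is an object like any other
type_name = type(n).__name__        # 'int'
bits = n.bit_length()               # 2, because 3 is 0b11

m = n                               # a second name for the same int object
shared = m is n                     # True: assignment never copies
```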
In my course, it says that a variable doesn't contain a value per se, but a reference in computer memory where the value can be found.
That's true, depending on the definition of "memory" and definition of "value".
Memory might refer to virtual memory or RAM (physical memory). You don't have access to physical RAM directly in Python.
The definition of memory might include CPU registers or it might not. Some might argue that a CPU register is not memory. But still, a value may be stored there. Not in Python, though.
"value" may be an "address" again.
sort of like an address.
IMHO, it's good enough to think of it being an address. It doesn't behave like a pointer in C++, though. You can't directly write to it and the address may change over time.
this value of [1, 2, 3] already exists in memory
First, it exists in your PY file on disk. The Python interpreter will load that file into memory, so yes, those "values" exist in memory as text characters.
The interpreter may then
find that those values already exist in memory and reuse those
find a place for these values and store them in a different format (not as text characters but as int objects).
or am I creating this value [1, 2, 3] on the spot?
As mentioned before, it's kinda both. They exist in memory as text before and are then created again as their proper data types or not created because they already exist.
In addition to the excellent answers by @Thomas Weller and @Barmar, you can see where the objects are stored in memory by calling id() on each object once it is bound to a variable (in CPython, id() returns the object's memory address).
a = [1,2,3]
hex(id(a))
'0x1778bff00'
Furthermore, as @Max points out in their comment, this list object is itself just storing references to multiple int objects, each of which has its own memory location. These can be checked by the same logic -
[hex(id(i)) for i in a]
['0x10326e930', '0x10326e950', '0x10326e970']
Now, if you create another list object b which stores the 3 int objects and the previously defined list object a, you can see these refer to the same memory locations -
b = [1,2,3,a]
[hex(id(i)) for i in b]
['0x10326e930', '0x10326e950', '0x10326e970', '0x1778bff00']
A name can also appear on the right-hand side of its own assignment. For this, b has to be defined already, since the right-hand side is evaluated first: without an existing object bound to b there is nothing to store. Note that the new list stores a reference to the old b, and only then is the name b rebound to the new list, which is why the fourth element's id below differs from id(b)
b = [1,2,3,b]
hex(id(b)) #'0x1590ad480'
[hex(id(i)) for i in b]
['0x10326e930', '0x10326e950', '0x10326e970', '0x17789f380']
However, if you assign lists with the same elements to two different variables, the int objects are still shared, but the two lists themselves are different objects at different memory locations, as expected -
d = [1,2,3]
e = [1,2,3]
print('d ->',hex(id(d)))
print('e ->',hex(id(e)))
print('elements of d ->',[hex(id(i)) for i in d])
print('elements of e ->',[hex(id(i)) for i in e])
d -> 0x16a838840
e -> 0x16a37d880
elements of d -> ['0x10326e930', '0x10326e950', '0x10326e970']
elements of e -> ['0x10326e930', '0x10326e950', '0x10326e970']
Reassigning a variable to a new list with the same elements keeps the same int objects, but the variable now refers to a new list object at a different memory location.
d = [1,2,3]
print('old d ->',hex(id(d)))
d = [1,2,3]
print('new d ->',hex(id(d)))
old d -> 0x16a5e7040
new d -> 0x16a839080
What does a variable actually contain?
In short, that's not a question that even makes sense to ask about Python.
In another sense, that depends on the Python implementation. The semantics of variable in Python are simple, but somewhat abstract: a variable associates an object with a name. That's it.
In a = [1,2,3], the name is a and the object is a value of type list. Until the name a goes out of scope, is deleted with del a, or is assigned a new value, a refers to the list [1,2,3].
There is no deeper level, like "a is an address in memory where the list can be found". Python doesn't have a concept of an address space that you can access by location: it just has names for objects that exist... somewhere. Where that somewhere might be isn't important, and Python doesn't provide any way to find out. The only two things you can do with a name are 1) look up its value and 2) make it refer to some other value.
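A minimal sketch of what "a variable associates an object with a name" means in practice (the names here are just for illustration): rebinding or deleting a name affects the name only, never the object it used to refer to.

```python
a = [1, 2, 3]
b = a                      # a second name for the same list object

a = "something else"       # rebinding a does not touch the list
still_there = b            # the list is still reachable through b

del b                      # removes the name b, not (directly) the object
try:
    b
except NameError:
    name_gone = True       # the name no longer exists in this scope
else:
    name_gone = False
```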
I've found this statement in one of the answers to this question.
What does it mean? I would have no problem if the statement were "Python never implicitly copies dictionary objects". I believe tuples, lists, sets etc are considered "object" in python but the problem with dictionary as described in the question doesn't arise with them.
The statement in the linked answer is broader than it should be. Implicit copies are rare in Python, and in the cases where they happen, it is arguable whether Python is performing the implicit copy, but they happen.
What is definitely true is that the default rules of name assignment do not involve a copy. By default,
a = b
will not copy the object being assigned to a. This default can be overridden by a custom local namespace object, which can happen when using exec or a metaclass with a __prepare__ method, but doing so is extremely rare.
As for cases where implicit copies do happen, the first that comes to mind is that the multiprocessing standard library module performs implicit copies all over the place, which is one of the reasons that multiprocessing causes a lot of confusion. Assignments other than name assignment may also involve copies; a.b = c, a[b] = c, and a[b:c] = d may all involve copies, depending on what a is. a[b:c] = d is particularly likely to involve copying d's data, although it will usually not involve producing an object that is a copy of d.
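A short sketch of the two cases for plain built-in lists (other container types may behave differently, as noted above): name assignment copies nothing, while slice assignment copies the element references out of d without creating a copy of d itself.

```python
b = [1, 2, 3]
a = b
no_copy = a is b           # True: name assignment just binds another name

target = [0, 0, 0, 0]
d = [7, 8]
target[1:3] = d            # the references held by d are copied into target

d.append(9)                # later changes to d are not visible in target
```

After this runs, target is [0, 7, 8, 0] and d is [7, 8, 9].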
Python has many built-in types. They fall into two groups:
1) immutable (cannot be changed in place) - integer, string, tuple
2) mutable (can be changed in place) - list, dictionary
For example:
- immutable
x = 10
For this 'x', Python creates a new object of type int at some memory address, say 0x0001f0a
x += 1 # x = x + 1
Python binds 'x' to a different int object at another address, say 0x1003c00
- mutable
x = [1, 2, 'spam']
For this 'x', Python creates a new object of type list at some memory address, say 0x0001f0a
y = x
Python copies the reference from 'x' to 'y', so both names refer to the same list
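The same two cases as a runnable sketch. The extra names old and items are assumptions added only so that the result can be checked with "is" instead of comparing addresses:

```python
# Immutable: += on an int produces a different object and rebinds the name.
x = 10
old = x
x += 1
rebound = x is not old     # True: x now refers to the int 11, old still to 10

# Mutable: y = items copies the reference, so both names see the change.
items = [1, 2, 'spam']
y = items
y.append('eggs')
```

After this runs, old is still 10 and items is [1, 2, 'spam', 'eggs'].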
For instance:
a = some_process_that_generates_integer_result()
b = a
Someone told me that b and a will point to same chunk of integer object, thus b would modify the reference count of that object. The code is executed in function PyObject* ast2obj_expr(void* _o) in Python-ast.c:
static PyObject* ast2obj_object(void *o)
{
    if (!o)
        o = Py_None;
    Py_INCREF((PyObject*)o);
    return (PyObject*)o;
}
......
    case Num_kind:
        result = PyType_GenericNew(Num_type, NULL, NULL);
        if (!result) goto failed;
        value = ast2obj_object(o->v.Num.n);
        if (!value) goto failed;
        if (PyObject_SetAttrString(result, "n", value) == -1)
            goto failed;
        Py_DECREF(value);
        break;
However, I think modifying reference count without ownership change is really futile. What I expect is that each variable holding primitive values (floats, integers, etc.) always have their own value, instead of referring to a same object.
And in the execution of my simple test code, I found the break point in the above Num_kind branch is never reached:
def some_function(x, y):
    return (x+y)*(x-y)
a = some_function(666666,66666)
print a
b = a
print a
print b
b = a + 999999
print a
print b
b = a
print a
print b
I'm using the python2.7-dbg program provided by Debian. I'm sure the program and the source code match, because many other break points work properly.
So, what does CPython actually do on primitive type objects?
First of all, there are no “primitive objects” in Python. Everything is an object, of the same kind, and they are all handled in the same way on the language level. As such, the following assignments work the same way regardless of the values which are assigned:
a = some_process_that_generates_integer_result()
b = a
In Python, assignments are always reference copies. So whatever the function returns, its reference is copied into the variable a. And then in the second line, the reference is again copied into the variable b. As such, both variables will refer to the exact same object.
You can easily verify this by using the id() function which will tell you the identity of an object:
print id(a)
print id(b)
This will print the same identifying number twice. Note though, that while doing just this, you copied the reference two more times: It’s not variables that are passed to functions but copies of references.
This is different from other languages where you often differentiate between “call by value” and “call by reference”. The former means that you create a copy of the value and pass it to a function, which means that new memory is allocated for that value; the latter means that the actual reference is passed and changes to that reference affect the original variable as well.
What Python does is often called “call by assignment”: every function call where you pass arguments is essentially an assignment into new variables (which are then available to the function). And an assignment copies the reference.
When everything is an object, this is actually a very simple strategy. And as I said above, what happens with integers is then no different to what happens to other objects. The only “special” thing about integers is that they are immutable, so you cannot change their values. This means that an integer object always refers to the exact same value. This makes it easy to share the object (in memory) between multiple variables. Every operation that yields a new result gives you a different object, so when you do a series of arithmetic operations, you are actually changing what object a variable is pointing to all the time.
The same happens with other immutable objects too, for example strings. Every operation that yields a changed string gives you a different string object.
Assignments with mutable objects however are the same too. It’s just that changing the value of those objects is possible, so they appear different. Consider this example:
a = [1] # creates a new list object
b = a # copies the reference to that same list object
c = [2] # creates a new list object
b = a + c # concats the two lists and creates a new list object
d = b
# at this point, we have *three* list objects
d.append(3) # mutates the list object
print(d)
print(b) # same result since b and d reference the same list object
Now coming back to your question and the C code you cite there, you are actually looking at the wrong part of CPython to get an explanation there. AST is the abstract syntax tree that the parser creates when parsing a file. It reflects the syntax structure of a program but says nothing about the actual run-time behavior yet.
The code you showed for the Num_kind is actually responsible for creating Num AST objects. You can get an idea of this when using the ast module:
>>> import ast
>>> doc = ast.parse('foo = 5')
# the document contains an assignment
>>> doc.body[0]
<_ast.Assign object at 0x0000000002322278>
# the target of that assignment has the id `foo`
>>> doc.body[0].targets[0].id
'foo'
# and the value of that assignment is the `Num` object that was
# created in that C code, with that `n` property containing the value
>>> doc.body[0].value
<_ast.Num object at 0x00000000023224E0>
>>> doc.body[0].value.n
5
If you want to get an idea of the actual evaluation of Python code, you should first look at the byte code. The byte code is what is being executed at run-time by the virtual machine. You can use the dis module to see byte code for Python code:
>>> def test():
...     foo = 5
...
>>> import dis
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (foo)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
As you can see, there are two major byte code instructions here: LOAD_CONST and STORE_FAST. LOAD_CONST will just load a constant value onto the evaluation stack. In this example, we just load a constant number, but we could also load the value from a function call instead (try playing with the dis module to figure out how it works).
The assignment itself is made using STORE_FAST. The byte code interpreter does the following for that instruction:
TARGET(STORE_FAST)
{
    v = POP();
    SETLOCAL(oparg, v);
    FAST_DISPATCH();
}
So it essentially gets the value (the reference to the integer object) from the stack, and then calls SETLOCAL which essentially will just assign the value to local variable.
Note though, that this does not increase the reference count of that value. That’s what happens with LOAD_CONST, or any other byte code instruction that gets a value from somewhere:
TARGET(LOAD_CONST)
{
    x = GETITEM(consts, oparg);
    Py_INCREF(x);
    PUSH(x);
    FAST_DISPATCH();
}
So tl;dr: Assignments in Python are always reference copies. References are also copied whenever a value is used (but in many other situations that copied reference only exists for a short time). The AST is responsible for creating an object representation of parsed programs (only the syntax), while the byte code interpreter runs the previously compiled byte code to do actual stuff at run-time and deal with real objects.
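If you want to watch the reference copies happen without a debugger, sys.getrefcount gives a rough view. This is only a sketch and is CPython-specific: the absolute numbers include the temporary reference created by the call itself, so only the differences are meaningful. The class Thing is a made-up placeholder.

```python
import sys

class Thing:
    """Placeholder class, used only to get a fresh object with its own count."""

a = Thing()
count_one_name = sys.getrefcount(a)

b = a                                  # copies the reference, not the object
count_two_names = sys.getrefcount(a)   # one higher than before

del b                                  # drops that extra reference again
count_after_del = sys.getrefcount(a)
```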
I want to clean up some code I've written, in order to scale the magnitude of what I'm trying to do. In order to do so, I'd like to ideally create a list of references to objects, so that I can systematically set the objects, using a loop, without actually have to put the objects in list. I've read about the way Python handles references and pass-by, but haven't quite found a way to do this effectively.
To better demonstrate what I'm trying to do:
I'm using bokeh, and would like to set up a large number of select boxes. Each box looks like this
select_one_name = Select(
    title = 'test',
    value = 'first_value',
    options = ['first_value', 'second_value', 'etc']
)
Setting up each select is fine, when I only have a few, but when I have 20, my code gets very long and unwieldy. What I'd like to be able to do, is have a list of sample_list = [select_one_name, select_two_name, etc] that I can then loop through, to set the values of each select_one_name, select_two_name, etc. However, I want to have my reference select_one_name still point to the correct value, rather than necessarily refer to the value by calling sample_list[0].
I'm not sure if this is do-able--if there's a better way to do this, than creating a list of references, please let me know. I know that I could just create a list of objects, but I'm trying to avoid that.
For reference, I'm on Python 2.7, Anaconda distribution, Windows 7. Thanks!
To follow up on @Alex Martelli's post below:
The reason why I thought this might not work is that when I tried a mini-test with a list of lists, I didn't get the results I wanted. To demonstrate:
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]
test[0].append(1)
Results in x = [1, 2, 3, 1] but if instead, I use test[0] = [1, 2], then x remains [1, 2, 3], although test itself reflects the change.
Drawing a parallel back to my original example, I thought that I would see the same results as from setting to equal. Is this not true?
Every Python list always is internally an array of references (in CPython, which is no doubt what you're using, at the C level it's an array of PyObject* -- "pointers to Python objects").
No copies of the objects get made implicitly: rather (again, in CPython) each object's reference count gets incremented when you add "the object" (actually a reference to it) to the list. In fact when you do want an object's copy you need to specifically ask for one (with the copy module in general, or sometimes with type-specific copy methods).
Multiple references to the same object are internally pointers to exactly the same memory. If an object is mutable, then mutating it gets reflected through all the references to it. Of course, there are immutable objects (strings, numbers, tuples, ...) to which such mutation cannot apply.
So when you do, e.g,
sample_list = [select_one_name, select_two_name, etc]
each of the names (as long as it's in scope) still refers to exactly the same object as the corresponding item in sample_list.
In other words, using sample_list[0] and select_one_name is totally equivalent as long as both references to the same object exist.
IOW squared, your stated purpose is already accomplished by Python's most fundamental semantics. Now, please edit the Q to clarify which behavior you're observing that seems to contradict this, versus which behavior you think you should be observing (and desire), and we may be able to help further -- because to this point all the above observations amount to "you're getting exactly the semantics you ask for" so "steady as she goes" is all I can practically suggest!-)
Added (better here in the answer than just below in comments:-): note the focus on mutating operation. The OP tried test[0]= somelist followed by test[0].append and saw somelist mutated accordingly; then tried test[0] = [1, 2] and was surprised to see somelist not changed. But that's because assignment to a reference is not a mutating operation on the object that said reference used to indicate! It just re-seats the reference, decrements the previously-referred-to object's reference count, and that's it.
If you want to mutate an existing object (which needs to be a mutable one in the first place, but, a list satisfies that), you need to perform mutating operations on it (through whatever reference, doesn't matter). For example, besides append and many other named methods, one mutating operation on a list is assignment to a slice, including the whole-list slice denoted as [:]. So, test[0][:] = [1,2] would in fact mutate somelist -- very different from test[0] = [1,2] which assigns to a reference, not to a slice.
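Putting the three cases side by side, using the lists from the question (a sketch, nothing here is specific to bokeh):

```python
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]

test[0].append(1)          # mutating call: the list that x names changes too
x_after_append = list(x)   # snapshot: [1, 2, 3, 1]

test[0] = [1, 2]           # re-seats slot 0 only; x is left alone
slot_rebound = test[0] is not x

test[1][:] = [9]           # slice assignment mutates the list that y names
```

After this runs, x is [1, 2, 3, 1], test[0] is [1, 2], and y is [9].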
This is not recommended, but it works.
sample_list = ["select_one_name", "select_two_name", "select_three_name"]
for select in sample_list:
    locals()[select] = Select(
        title = 'test', value = 'first_value',
        options = ['first_value', 'second_value', 'etc']
    )
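You can use select_one_name, select_two_name, etc. directly because they're set in the current scope through the special locals() dictionary. Note that this only works reliably at module level, where locals() is the same dictionary as globals(); inside a function, writing to locals() does not create local variables.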
A cleaner approach is to use a dictionary, e.g.
selects = {
    'select_one_name': Select(...),
    'select_two_name': Select(...),
    'select_three_name': Select(...)
}
And reference selects['select_one_name'] in your code and you can iterate over selects.keys() or selects.items().
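Here is a rough sketch of the dictionary approach with a loop. FakeSelect is a hypothetical stand-in I made up so the example runs without bokeh installed; with bokeh you would presumably construct the real Select the same way as in the question.

```python
class FakeSelect:
    """Hypothetical stand-in for bokeh's Select widget (assumption, not bokeh API)."""

    def __init__(self, title, value, options):
        self.title = title
        self.value = value
        self.options = options


names = ['select_one_name', 'select_two_name', 'select_three_name']
options = ['first_value', 'second_value', 'etc']

# Build every widget in one place instead of writing each one out by hand.
selects = {
    name: FakeSelect(title='test', value=options[0], options=list(options))
    for name in names
}

# Set a value on all of them in a loop.
for widget in selects.values():
    widget.value = 'second_value'

# A plain name can still refer to the very same object held in the dict.
select_one_name = selects['select_one_name']
```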
Since Python doesn't have pointers, I am wondering how I can pass a reference to an object through to a function instead of copying the entire object. This is a very contrived example, but say I am writing a function like this:
def some_function(x):
    c = x/2 + 47
    return c
y = 4
z = 12
print some_function(y)
print some_function(z)
From my understanding, when I call some_function(y), Python allocates new space to store the argument value, then erases this data once the function has returned c and it's no longer needed. Since I am not actually altering the argument within some_function, how can I simply reference y from within the function instead of copying y when I pass it through? In this case it doesn't matter much, but if y was very large (say a giant matrix), copying it could eat up some significant time and space.
Your understanding is, unfortunately, completely wrong. Python does not copy the value, nor does it allocate space for a new one. It passes a value which is itself a reference to the object. If you modify that object (rather than rebinding its name), then the original will be modified.
Edit
I wish you would stop worrying about memory allocation: Python is not C++, almost all of the time you don't need to think about memory.
It's easier to demonstrate rebinding via the use of something like a list:
def my_func(foo):
    foo.append(3) # now the source list also has the number 3
    foo = [3] # we've re-bound 'foo' to something else, severing the relationship
    foo.append(4) # the source list is unaffected
    return foo
original = [1, 2]
new = my_func(original)
print original # [1, 2, 3]
print new # [3, 4]
It might help if you think in terms of names rather than variables: inside the function, the name "foo" starts off being a reference to the original list, but then we change that name to point to a new, different list.
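On the worry about a giant matrix being copied: a quick way to convince yourself is to compare identities inside and outside the function. A minimal sketch (identity_inside and big are made-up names, and a large list stands in for the matrix):

```python
def identity_inside(arg):
    # Report the identity of whatever object the parameter refers to.
    return id(arg)

big = list(range(100000))

# The function sees the very same object, so nothing was copied on the way in.
passed_without_copy = identity_inside(big) == id(big)
```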
Python parameters are always "references".
The way parameters in Python work, and the way they are explained in the docs, can be confusing and misleading to newcomers to the language, especially if you have a background in other languages that allow you to choose between "pass by value" and "pass by reference".
In Python terms, a "reference" is just a pointer with some more metadata to help the garbage collector do its job. And every variable and every parameter is always a "reference".
So, internally, Python passes a "pointer" for each parameter. You can easily see this in this example:
>>> def f(L):
...     L.append(3)
...
>>> X = []
>>> f(X)
>>> X
[3]
The variable X points to a list, and the parameter L is a copy of the "pointer" of the list, and not a copy of the list itself.
Take care to note that this is not the same as "pass-by-reference" as C++ with the & qualifier, or pascal with the var qualifier.
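To see why this is not C++-style pass-by-reference, compare mutating through the parameter with rebinding the parameter. A small sketch, with function names chosen just for this example:

```python
def mutate(L):
    L.append(3)    # changes the object the caller also refers to

def rebind(L):
    L = [99]       # rebinds the local name only; the caller never sees this

X = []
rebind(X)
after_rebind = list(X)   # snapshot: still []

mutate(X)
after_mutate = list(X)   # snapshot: [3]
```

With a true reference parameter (C++ & or Pascal var), the assignment inside rebind would have replaced the caller's X as well.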
There is this code:
# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!
# assignment behaviour for class object
class Klasa:
    def __init__(self, num):
        self.num = num
a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!
Questions:
1. Why does the assignment operator work differently for fundamental types and class objects (for fundamental types it copies by value, for class objects it copies by reference)?
2. How can I copy class objects by value only?
3. How can I make references to fundamental types, like int& b = a in C++?
This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.
Let's take the first case. When you say a = b = 0, a new int object is created with value 0 and two references to it are created (one is a and another is b). These two variables point to the same object (the integer which we created). Now, we run a = 4. A new int object of value 4 is created and a is made to point to that. This means, that the number of references to 4 is one and the number of references to 0 has been reduced by one.
Compare this with a = 4 in C where the area of memory which a "points" to is written to. a = b = 4 in C means that 4 is written to two pieces of memory - one for a and another for b.
Now the second case, a = Klasa(2) creates an object of type Klasa, increments its reference count by one and makes a point to it. b = a simply takes what a points to, makes b point to the same thing and increments the reference count of the thing by one. It's the same as what would happen if you did a = b = Klasa(2). Printing a.num and b.num gives the same result since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now, you change the object by assigning a value to one of its attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
Now, for the answers to your questions.
1. The assignment operator doesn't work differently for these two. All it does is add a reference to the RValue and makes the LValue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than simple assignments).
2. If you want copies of objects, use the copy module.
3. As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
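For the copy module, a brief sketch of the difference between a shallow and a deep copy. The tags attribute is an assumption added to the question's class so that the difference becomes visible:

```python
import copy

class Klasa:
    def __init__(self, num):
        self.num = num
        self.tags = []     # extra mutable attribute, added for this example

a = Klasa(2)
shallow = copy.copy(a)      # new Klasa, but attributes refer to the same objects
deep = copy.deepcopy(a)     # new Klasa with recursively copied attributes

a.num = 3                   # rebinding an attribute on a affects a only
a.tags.append('x')          # mutating the list shows up in the shallow copy

shares_list = shallow.tags is a.tags   # True for the shallow copy
```

After this runs, shallow.num and deep.num are both still 2, shallow.tags is ['x'], and deep.tags is [].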
Quoting from Data Model
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects. (In
a sense, and in conformance to Von Neumann’s model of a “stored
program computer,” code is also represented by objects.)
From Python's point of view, a fundamental data type is fundamentally different from one in C/C++. The term is used to map C/C++ data types to Python, so let's leave it out of the discussion for the time being and consider the fact that all data are objects and are instances of some class. Every object has an ID (somewhat like an address), a value, and a type.
All objects are copied by reference. For example:
>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>
The only way to have a new instance is by creating one.
>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False
This may sound complicated at first, but to simplify things a bit, Python made some types immutable. For example, you can't change a string in place. You have to slice it and create a new string object.
Copying by reference often gives unexpected results. For example,
x=[[0]*8]*8 may give you the feeling that it creates a two-dimensional list of 0s. In fact it creates a list of eight references to the same inner list object [0]*8. So assigning to x[1][1] would end up changing every row at the same time.
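A smaller 3x3 version of that pitfall, next to the usual fix with a list comprehension (a sketch; the names shared and independent are only for illustration):

```python
# Three references to one and the same inner list.
shared = [[0] * 3] * 3
shared[1][1] = 5
all_rows_same = all(row is shared[0] for row in shared)

# A list comprehension evaluates [0] * 3 once per row, giving three lists.
independent = [[0] * 3 for _ in range(3)]
independent[1][1] = 5
```

After this runs, shared is [[0, 5, 0], [0, 5, 0], [0, 5, 0]] while independent is [[0, 0, 0], [0, 5, 0], [0, 0, 0]].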
The copy module provides a function called deepcopy that creates a new, fully independent instance of the object rather than a shallow copy. This is useful when you intend to have two distinct objects and manipulate them separately, just as you intended in your second example.
To extend your example
>>> import copy
>>> class Klasa:
...     def __init__(self, num):
...         self.num = num
...
>>> a = Klasa(2)
>>> b = copy.deepcopy(a)
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 2 - different!
3 2
It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.
Integers, by the way, are immutable. You can't modify their value. All you can do is make a new integer and rebind your reference. (like you did in your first example)
Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.
Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.
In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.
The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in-place, and not the reference itself; that can never be "reseated" once the reference is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in-place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.