I know that "variable assignment" in python is in fact a binding / re-bindign of a name (the variable) to an object.
This brings up the question: is it possible to have proper assignment in Python, e.g. make an object equal to another object?
I guess there is no need for that in python:
Immutable objects cannot be 'assigned to', since they can't be changed.
Mutable objects could potentially be assigned to, since they can change, and this could be useful: you may want to manipulate a copy of a dictionary separately from the original one. However, in these cases the Python philosophy is to offer a cloning method on the mutable object, so you can bind a copy rather than the original.
So I guess the answer is that there is no assignment in Python; the best way to mimic it is binding to a cloned object.
I simply wanted to share the question in case I'm missing something important here
Thanks
EDIT:
Both Lie Ryan's and Sven Marnach's answers are good; I guess the overall answer is a mix of both:
For user-defined types, use the idiom:
a.__dict__ = dict(b.__dict__)
(I guess this has problems as well if the assigned class has redefined attribute access methods, but let's not be fussy :))
For mutable built-ins (lists and dicts), use the cloning / copying methods they provide (e.g. slices, update); see the sketch below this list.
Finally, immutable built-ins can't be changed, so they can't be assigned to.
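A minimal sketch of these idioms (the Point class and its attributes are made up purely for illustration):

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

a, b = Point(1, 2), Point(3, 4)
a.__dict__ = dict(b.__dict__)   # "assign" b's state to a (shallowly)

src = [1, 2, 3]
dst = src[:]                    # clone a list via a slice

d = {'k': 1}
e = {}
e.update(d)                     # copy a dict's items into another dict

x = 10
x = x + 1                       # immutables can only be rebound, never changed in place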
I'll choose Lie Ryan because it's an elegant idiom that I hadn't thought of.
Thanks!
I think you are right with your characterization of assignment in Python -- I just would like to add a different method of cloning and ways of assignment in special cases.
"Copy-constructing" a mutable built-in Python object will yield a (shallow) copy of that object:
l = [2, 3]
m = list(l)
l is m
--> False
[Edit: As pointed out by Paul McGuire in the comments, the behaviour of a "copy constructor" (forgive me the C++ terminology) for an immutable built-in Python object is implementation-dependent -- you might get a copy or just the same object. But because the object is immutable anyway, you shouldn't care.]
The copy constructor could be called generically by y = type(x)(x), but this seems a bit cryptic. And of course, there is the copy module which allows for shallow and deep copies.
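For instance, a small sketch with the copy module:

import copy

original = {'numbers': [1, 2, 3]}
shallow = copy.copy(original)       # new dict, same inner list
deep = copy.deepcopy(original)      # new dict and a new inner list

original['numbers'].append(4)
print(shallow['numbers'])           # [1, 2, 3, 4] -- shares the inner list
print(deep['numbers'])              # [1, 2, 3]    -- fully independent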
Some Python objects allow assignment. For example, you can assign to a list without creating a new object:
l = [2, 3]
m = l
l[:] = [3, 4, 5]
m
--> [3, 4, 5]
For dictionaries, you could use the clear() method followed by update(otherdict) to assign to a dictionary without creating a new object. For a set s, you can use
s.clear()
s |= otherset
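A small sketch of this in-place "assignment" pattern for dicts and sets:

d = {'a': 1}
alias = d                 # second reference to the same dict
otherdict = {'b': 2}
d.clear()
d.update(otherdict)       # the dict object itself now holds otherdict's items
print(alias)              # {'b': 2} -- every reference sees the change

s = {1, 2}
otherset = {3, 4}
s.clear()
s |= otherset
print(s)                  # {3, 4}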
This brings up the question: is it possible to have proper assignment in Python, e.g. make an object equal to another object?
Yes you can:
a.__dict__ = dict(b.__dict__)
will mimic the default assignment semantics of C/C++ (i.e. a shallow assignment).
The problem with such generalized assignment is that it never works for everybody. In C++, you can override the assignment operator precisely because you always have to pick whether you want a fully shallow assignment, a fully deep assignment, or any shade in between.
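To see how shallow that assignment is, here is a minimal sketch (the Box class is made up for illustration):

class Box:
    def __init__(self, items):
        self.items = items

a = Box([1, 2])
b = Box([3, 4])

a.__dict__ = dict(b.__dict__)   # copies the attribute references, not the objects
a.items.append(5)
print(b.items)                  # [3, 4, 5] -- the inner list is now shared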
I don't think you are missing anything.
I like to picture variables in Python as names written on 'labels' that are attached to boxes but can be moved to other boxes by assignment, whereas in other languages assignment changes a box's contents (and the assignment operator can be overloaded).
Beginners can write quite complex applications without being aware of that, but they are usually messy programs.
In Python in a Nutshell it says:
Assignment statements can be plain or augmented.
Plain assignment to a variable (e.g., name=value) is how you create a new variable or rebind an existing variable to a new value. Plain assignment to an object attribute (e.g., x.attr=value) is a request to object x to create or rebind attribute 'attr'. Plain assignment to an item in a container (e.g., x[k]=value) is a request to container x to create or rebind the item with index or key k.
Augmented assignment (e.g., name+=value) cannot, per se, create new references. Augmented assignment can rebind a variable, ask an object to rebind one of its existing attributes or items, or request the target object to modify itself. When you make a request to an object, it is up to the object to decide whether and how to honor the request, and whether to raise an exception.
...
In an augmented assignment, just as in a plain one, Python first evaluates the RHS expression. Then, when the LHS refers to an object that has a special method for the appropriate in-place version of the operator, Python calls the method with the RHS value as its argument. It is up to the method to modify the LHS object appropriately and return the modified object ("Special Methods" on page 123 covers special methods). When the LHS object has no appropriate in-place special method, Python applies the corresponding binary operator to the LHS and RHS objects, then rebinds the target reference to the operator's result. For example, x+=y is like x=x.__iadd__(y) when x has special method __iadd__ for in-place addition. Otherwise, x+=y is like x=x+y.
Augmented assignment never creates its target reference; the target must already be bound when augmented assignment executes. Augmented assignment can rebind the target reference to a new object, or modify the same object to which the target reference was already bound. Plain assignment, in contrast, can create or rebind the LHS target reference, but it never modifies the object, if any, to which the target reference was previously bound. The distinction between objects and references to objects is crucial here. For example, x=x+y does not modify the object to which name x was originally bound. Rather, it rebinds the name x to refer to a new object. x+=y, in contrast, modifies the object to which the name x is bound, when that object has special method __iadd__; otherwise, x+=y rebinds the name x to a new object, just like x=x+y.
Is the difference between an assignment that performs in-place modification and one that does not (e.g. augmented assignment vs. plain assignment) an implementation detail of Python which programmers don't have to know, or is it part of the semantics which programmers need to know? Note: in-place modification means changing the value in a memory region, while a not-in-place assignment allocates a new memory region.
If it is not just an implementation detail, why do Python programmers need to know the difference? Is there any situation where they need to be aware of it?
I suspect that the difference is an implementation detail, and that Python programmers don't need to know it but only need to know the semantics of assignment.
Thanks.
The docs say, regarding the __i<method>__ special methods:
These methods are called to implement the augmented arithmetic assignments (+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if x is an instance of a class with an __iadd__() method, x += y is equivalent to x = x.__iadd__(y). Otherwise, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y. In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += ['item'] raise an exception when the addition works?), but this behavior is in fact part of the data model.
To answer the question you pose:
Is the difference between in-place modification and not-in-place assignment (e.g. augmented assignment vs. plain assignment) an implementation detail of Python which programmers don't have to know?
Yes, you need to be aware of this when implementing the data model for custom objects.
As a user of such objects, you would also better understand what you're doing when using augmented assignment if you understand this.
Why? If you don't implement in-place behavior, then when performing augmented assignment, the name or lookup gets reassigned to an object that is the result of the standard implementation of the operation.
As an implementer, and as a user, you'll need to know this.
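One quick way to observe the difference is to watch the object's identity before and after the augmented assignment (a minimal sketch):

x = [1, 2]
before = id(x)
x += [3]                  # list defines __iadd__: the object is modified in place
print(id(x) == before)    # True -- same object, same memory

t = (1, 2)
before = id(t)
t += (3,)                 # tuple has no __iadd__: falls back to t = t + (3,)
print(id(t) == before)    # False -- t was rebound to a new object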
follow-on question from asker:
My question is about whether an assignment performs in-place modification or not-in-place assignment, not about which method is invoked. In-place modification means changing the value in a memory region, while a not-in-place assignment creates a new memory region. I was wondering whether that difference is an implementation detail which programmers don't need to know, or part of the semantics which programmers need to know.
In Python everything is an object. Every object has a header with some details. Any object that "contains" other objects does not actually contain them; rather, it holds pointers, or references, to the memory locations of the objects it holds. Rebinding one of those contained items replaces the old reference with a new one.
The memory of the old object only gets reclaimed when the count of non-weak references to that object goes to zero. You can think of this as an implementation detail, but knowing it helps one be a more confident user of the language.
Users rarely need to be concerned with these details, but when you do, you'll be glad you understand it.
Again, your question:
My question is about whether an assignment performs in-place modification or not-in-place assignment, not about which method is invoked.
Which method is invoked determines the behavior. Therefore you need to know which method is being invoked - either from the documented semantics of the objects you are using, or from your own knowledge of the Python data model - to answer your question.
Is there any situation where programmers in Python need to be aware of the difference?
Python is a dynamic language that gives you lots of polymorphism for free (duck typing). If a function is written to work on lists, it likely will work on many list-like things. Augmented assignment throws a wrench into that. Suppose a function adds data to a collection:
>>> def add_data(collection):
...     collection += ('tuple',)
...
>>> l = []
>>> add_data(l)
>>> l
['tuple']
>>> t = tuple()
>>> add_data(t)
>>> t
()
It fails silently in the second case. This is a risk generally when you have multiple references to an object and an augmented assignment is applied to one of them. It's like a box of chocolates, but in a bad way.
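The same pitfall appears outside functions whenever two names share one mutable object; a small sketch:

a = [1]
b = a            # two references to the same list
b += [2]         # in-place: mutates the shared object
print(a)         # [1, 2] -- a changed too

c = (1,)
d = c
d += (2,)        # rebinds d to a new tuple
print(c)         # (1,) -- c is untouched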
I have recently discovered that lists in python are automatically passed by reference (unless the notation array[:] is used). For example, these two functions do the same thing:
def foo(z):
    z.append(3)

def bar(z):
    z.append(3)
    return z
x = [1, 2]
y = [1, 2]
foo(x)
bar(y)
print(x, y)
Before now, I always returned arrays that I manipulated, because I thought I had to. Now I understand it's superfluous (and perhaps inefficient), but it seems like returning values is generally good practice for code readability. My question is: are there any issues with either of these approaches, and what are the best practices? Is there a third option that I am missing? I'm sorry if this has been asked before, but I couldn't find anything that really answers my question.
This answer works on the assumption that the decision as to whether to modify your input in-place or return a copy has already been made.
As you noted, whether or not to return a modified object is a matter of opinion, since the result is functionally equivalent. In general, it is considered good form to not return a list that is modified in-place. According to the Zen of Python (item #2):
Explicit is better than implicit.
This is borne out in the standard library. List methods are notorious for this on SO: list.append, insert, extend, list.sort, etc.
Numpy also uses this pattern frequently, since it often deals with large data sets that would be impractical to copy and return. A common example is the array method numpy.ndarray.sort, not to be confused with the top-level function numpy.sort, which returns a new copy.
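The same split exists for plain lists: the in-place method returns None, while the copying function returns a new list:

data = [3, 1, 2]
print(sorted(data))   # [1, 2, 3] -- new list; data is unchanged
print(data.sort())    # None      -- sorts data in place and returns nothing
print(data)           # [1, 2, 3]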
The idea is something that is very much a part of the Python way of thinking. Here is an excerpt from Guido's email that explains the whys and wherefores:
I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The second [unchained] form makes it clear that each of these calls acts on the same object, and so even if you don't know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else.
Python built-ins, as a rule, will not do both, to avoid confusion over whether the function/method modifies its argument in place or returns a new value. When modifying in place, no return is performed (making it implicitly return None). The exceptions are cases where a mutating function returns something other than the object mutated (e.g. dict.pop, dict.setdefault).
It's generally a good idea to follow the same pattern, to avoid confusion.
The "best practice" is technically to not modify the thing at all:
def baz(z):
    return z + [3]
x = [1, 2]
y = baz(x)
print(x, y)
but in general it's clearer if you restrict yourself to either returning a new object or modifying an object in-place, but not both at once.
There are examples in the standard library that both modify an object in-place and return something (the foremost example being list.pop()), but that's a special case because it's not returning the object that was modified.
There's no strict rule, of course. However, a function should either do something or return something. So you'd better either modify the list in place without returning anything, or return a new one, leaving the original unchanged.
Note: the list is not exactly passed by reference. It's the value of the reference that is actually passed. Keep that in mind if you re-assign the parameter inside the function.
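For example, rebinding the parameter inside the function does not affect the caller's list, while mutating it does (a minimal sketch):

def rebind(z):
    z = [9, 9]       # rebinds the local name only

def mutate(z):
    z[:] = [9, 9]    # mutates the object the caller passed in

x = [1, 2]
rebind(x)
print(x)             # [1, 2] -- unchanged
mutate(x)
print(x)             # [9, 9] -- changed in place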
I've found this statement in one of the answers to this question: "Python never implicitly copies objects."
What does it mean? I would have no problem if the statement were "Python never implicitly copies dictionary objects". I believe tuples, lists, sets, etc. are also considered "objects" in Python, but the problem with dictionaries described in the question doesn't arise with them.
The statement in the linked answer is broader than it should be. Implicit copies are rare in Python, and in the cases where they happen, it is arguable whether Python is performing the implicit copy, but they happen.
What is definitely true is that the default rules of name assignment do not involve a copy. By default,
a = b
will not copy the object being assigned to a. This default can be overridden by a custom local namespace object, which can happen when using exec or a metaclass with a __prepare__ method, but doing so is extremely rare.
As for cases where implicit copies do happen, the first that comes to mind is that the multiprocessing standard library module performs implicit copies all over the place, which is one of the reasons that multiprocessing causes a lot of confusion. Assignments other than name assignment may also involve copies; a.b = c, a[b] = c, and a[b:c] = d may all involve copies, depending on what a is. a[b:c] = d is particularly likely to involve copying d's data, although it will usually not involve producing an object that is a copy of d.
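For instance, slice assignment copies the right-hand side's item references into the target, but the target does not become a copy of it (a small sketch):

a = [0, 1, 2, 3]
d = [9, 9]
a[1:3] = d       # a's slots 1..2 are rebound to d's items
print(a)         # [0, 9, 9, 3]
d.append(7)
print(a)         # [0, 9, 9, 3] -- a only copied d's item references, not d itself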
Python has a lot of different types. They divide into two groups:
1) immutable (can't change) - integer, string, tuple
2) mutable (can change) - list, dictionary
for example:
- immutable
x = 10
For this x, Python creates a new int object at some memory address, e.g. 0x0001f0a.
x += 1  # x = x + 1
Python creates a new object at a new memory address, e.g. 0x1003c00, and rebinds x to it.
- mutable
x = [1, 2, 'spam']
For this x, Python creates a new list object at some memory address.
y = x
Python copies the reference from x to y, so both names now refer to the same list.
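You can watch this happen with id() (a small sketch; the actual addresses will differ on your machine):

x = 10
print(id(x))             # some address
x += 1
print(id(x))             # a different address: a new int object was bound to x

y = [1, 2, 'spam']
z = y                    # copies only the reference
print(id(y) == id(z))    # True -- both names refer to the same list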
I'm new to Python and object-oriented programming, and have a very basic 101 question:
I see some methods return a modified object, and preserve the original:
In: x="hello"
In: x.upper()
Out: 'HELLO'
In: x
Out: 'hello'
I see other methods modify and overwrite the original object:
In: y=[1,2,3]
In: y.pop(0)
Out: 1
In: y
Out: [2, 3]
Are either of these the norm? Is there a way to know which case I am dealing with for a given class and method?
Your examples show the difference between immutable built-in objects (e.g., strings and tuples) and mutable objects (e.g., lists, dicts, and sets).
In general, if a class (object) is described as immutable, you should expect the former behavior, and the latter for mutable objects.
Both of these are idiomatic in Python, although list.pop() is a slightly special case.
In general, methods in Python either mutate the object or return a value. list.pop() is a little unusual in that, by definition, it must do both: remove an item from the list, and return it to you.
What is not common in Python, although it is in other languages, is to mutate an object and then return that same object - which would allow for methods to be chained together like so:
shape.stretch(x=2).move(3, 5)
... but can cause programs to be harder to debug.
If an object is immutable, like a string, you can be sure that a method won't mutate it (because, by definition, it can't). Failing that, the only way to tell whether a method mutates its object is to read the documentation (normally excellent for Python's built-in and standard library objects), or, of course, the source.
I want to clean up some code I've written in order to scale up what I'm trying to do. To do so, I'd ideally like to create a list of references to objects, so that I can systematically set the objects using a loop, without actually having to put the objects in a list. I've read about the way Python handles references and argument passing, but haven't quite found a way to do this effectively.
To better demonstrate what I'm trying to do:
I'm using bokeh, and would like to set up a large number of select boxes. Each box looks like this
select_one_name = Select(
    title = 'test',
    value = 'first_value',
    options = ['first_value', 'second_value', 'etc']
)
Setting up each select is fine, when I only have a few, but when I have 20, my code gets very long and unwieldy. What I'd like to be able to do, is have a list of sample_list = [select_one_name, select_two_name, etc] that I can then loop through, to set the values of each select_one_name, select_two_name, etc. However, I want to have my reference select_one_name still point to the correct value, rather than necessarily refer to the value by calling sample_list[0].
I'm not sure if this is do-able--if there's a better way to do this, than creating a list of references, please let me know. I know that I could just create a list of objects, but I'm trying to avoid that.
For reference, I'm on Python 2.7, Anaconda distribution, Windows 7. Thanks!
To follow up on @Alex Martelli's post below:
The reason why I thought this might not work, is because when I tried a mini-test with a list of lists, I didn't get the results I wanted. To demonstrate
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]
test[0].append(1)
This results in x = [1, 2, 3, 1], but if instead I use test[0] = [1, 2], then x remains [1, 2, 3], although test itself reflects the change.
Drawing a parallel back to my original example, I thought that I would see the same results when assigning with =. Is this not true?
Every Python list always is internally an array of references (in CPython, which is no doubt what you're using, at the C level it's an array of PyObject* -- "pointers to Python objects").
No copies of the objects get made implicitly: rather (again, in CPython) each object's reference count gets incremented when you add "the object" (actually a reference to it) to the list. In fact, when you do want an object's copy you need to specifically ask for one (with the copy module in general, or sometimes with type-specific copy methods).
Multiple references to the same object are internally pointers to exactly the same memory. If an object is mutable, then mutating it gets reflected through all the references to it. Of course, there are immutable objects (strings, numbers, tuples, ...) to which such mutation cannot apply.
So when you do, e.g.,
sample_list = [select_one_name, select_two_name, etc]
each of the names (as long as it's in scope) still refers to exactly the same object as the corresponding item in sample_list.
In other words, using sample_list[0] and select_one_name is totally equivalent as long as both references to the same object exist.
IOW squared, your stated purpose is already accomplished by Python's most fundamental semantics. Now, please edit the Q to clarify which behavior you're observing that seems to contradict this, versus which behavior you think you should be observing (and desire), and we may be able to help further -- because to this point all the above observations amount to "you're getting exactly the semantics you ask for" so "steady as she goes" is all I can practically suggest!-)
Added (better here in the answer than just below in comments:-): note the focus on mutating operations. The OP tried test[0] = somelist followed by test[0].append and saw somelist mutated accordingly; then tried test[0] = [1, 2] and was surprised to see somelist not changed. But that's because assignment to a reference is not a mutating operation on the object that said reference used to indicate! It just re-seats the reference, decrements the previously-referred-to object's reference count, and that's it.
If you want to mutate an existing object (which needs to be a mutable one in the first place, but, a list satisfies that), you need to perform mutating operations on it (through whatever reference, doesn't matter). For example, besides append and many other named methods, one mutating operation on a list is assignment to a slice, including the whole-list slice denoted as [:]. So, test[0][:] = [1,2] would in fact mutate somelist -- very different from test[0] = [1,2] which assigns to a reference, not to a slice.
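A minimal sketch of that distinction:

somelist = [1, 2, 3]
test = [somelist]

test[0] = [1, 2]      # re-seats the reference held in test[0]
print(somelist)       # [1, 2, 3] -- untouched

test[0] = somelist    # point test[0] back at somelist
test[0][:] = [1, 2]   # whole-list slice assignment: mutates somelist itself
print(somelist)       # [1, 2]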
This is not recommended, but it works.
sample_list = ["select_one_name", "select_two_name", "select_three_name"]
for select in sample_list:
locals()[select] = Select(
title = 'test',value = 'first_value',
options = ['first_value', 'second_value', 'etc']
)
You can use select_one_name, select_two_name, etc. directly because they're set in the local scope via the special locals() dictionary.
A cleaner approach is to use a dictionary, e.g.
selects = {
    'select_one_name': Select(...),
    'select_two_name': Select(...),
    'select_three_name': Select(...)
}
Then reference selects['select_one_name'] in your code, and you can iterate over selects.keys() or selects.items().
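For example, assuming each bokeh Select exposes a value attribute (as the Select objects in the question do), you could then set them all in one loop:

for name, sel in selects.items():
    sel.value = 'first_value'    # set every select in a single loop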