Creating a list using mutable sequence methods in Python fails - python

These work:
>>> print [1,2,3] + [4,5]
[1, 2, 3, 4, 5]
>>> 'abc'.upper()
'ABC'
This doesn't:
>>> print [1,2,3].extend([4,5])
None
Why? You can use string methods on bare strings, so why can't you use methods on bare sequence types such as lists? Even this doesn't work:
>>> print list([1,2,3]).extend([4,5])
None
N.B. For a colloquial meaning of 'work'. Of course there'll be a good reason why my expected behaviour is incorrect. I'm just curious what it is.
P.S. I've accepted soulcheck's answer below, and he is right, but on investigating how the addition operator is implemented I just found that the following works:
>>> [1,2,3].__add__([4,5])
[1, 2, 3, 4, 5]
But presumably the add method doesn't modify the underlying object and creates a new one to return, like string methods.

extend works; it just doesn't return a value, because it modifies the object it is called on. It is a Python convention that methods which mutate their object return None.
Try:
a = [1,2,3]
a.extend([4,5])
print a
edit:
As for why: one reason could be that it's sometimes ambiguous what the method should return. Should list.extend return the extended list? A boolean? The new list size?
It's also an implementation of command-query separation, i.e. if it returned something it would be a query and a command in one, and CQS says that's evil (although sometimes handy - see list.pop()).
So generally you shouldn't return anything from mutators, but as with almost everything in Python it's just a guideline.
list.__add__() is the method called when you do + on lists - it returns a new list.
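The difference between the two can be sketched as follows. This is a minimal illustration in Python 3 syntax (print is a function here, unlike in the question), and the variable names are only for demonstration:

```python
a = [1, 2, 3]
result = a.extend([4, 5])   # mutates a in place
print(result)               # None
print(a)                    # [1, 2, 3, 4, 5]

b = [1, 2, 3]
c = b + [4, 5]              # calls b.__add__([4, 5]) and builds a new list
print(b)                    # [1, 2, 3]
print(c)                    # [1, 2, 3, 4, 5]
print(c is b)               # False
```

So extend is the call to use when you want to change an existing list, and + when you want a new list as the value of an expression.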

This is due to the design of the interface. Extend does extend the list but does not return any value.
l = [1, 2, 3]
l.extend([4, 5])
print l

Related

Should I ever return a list that was passed by reference and modified?

I have recently discovered that lists in python are automatically passed by reference (unless the notation array[:] is used). For example, these two functions do the same thing:
def foo(z):
    z.append(3)
def bar(z):
    z.append(3)
    return z
x = [1, 2]
y = [1, 2]
foo(x)
bar(y)
print(x, y)
Before now, I always returned arrays that I manipulated, because I thought I had to. Now, I understand it's superfluous (and perhaps inefficient), but it seems like returning values is generally good practice for code readability. My question is, are there any issues for doing either of these methods/ what are the best practices? Is there a third option that I am missing? I'm sorry if this has been asked before but I couldn't find anything that really answers my question.
This answer works on the assumption that the decision as to whether to modify your input in-place or return a copy has already been made.
As you noted, whether or not to return a modified object is a matter of opinion, since the result is functionally equivalent. In general, it is considered good form to not return a list that is modified in-place. According to the Zen of Python (item #2):
Explicit is better than implicit.
This is borne out in the standard library. List methods are notorious for this on SO: list.append, insert, extend, list.sort, etc.
Numpy also uses this pattern frequently, since it often deals with large data sets that would be impractical to copy and return. A common example is the array method numpy.ndarray.sort, not to be confused with the top-level function numpy.sort, which returns a new copy.
The idea is something that is very much a part of the Python way of thinking. Here is an excerpt from Guido's email that explains the whys and wherefores:
I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The second [unchained] form makes it clear that each of these calls acts on the same object, and so even if you don't know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else.
Python built-ins, as a rule, will not do both, to avoid confusion over whether the function/method modifies its argument in place or returns a new value. When modifying in place, no return is performed (making it implicitly return None). The exceptions are cases where a mutating function returns something other than the object mutated (e.g. dict.pop, dict.setdefault).
It's generally a good idea to follow the same pattern, to avoid confusion.
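A short sketch of that pattern with built-ins, in Python 3 syntax. The variable names are illustrative only:

```python
data = [3, 1, 2]

new_list = sorted(data)     # returns a new list, leaves data alone
print(data, new_list)       # [3, 1, 2] [1, 2, 3]

ret = data.sort()           # sorts in place, returns None
print(ret, data)            # None [1, 2, 3]

d = {"a": 1}
value = d.pop("a")          # mutates d, but returns the removed value, not d
print(value, d)             # 1 {}
```

sorted and list.sort are the "return a new value" and "modify in place" halves of the same operation, while dict.pop is one of the exceptions mentioned above: it mutates and returns something, but not the mutated object.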
The "best practice" is technically to not modify the thing at all:
def baz(z):
    return z + [3]
x = [1, 2]
y = baz(x)
print(x, y)
but in general it's clearer if you restrict yourself to either returning a new object or modifying an object in-place, but not both at once.
There are examples in the standard library that both modify an object in-place and return something (the foremost example being list.pop()), but that's a special case because it's not returning the object that was modified.
There's no strict rule, of course. However, a function should either do something or return something. So you'd better either modify the list in place without returning anything, or return a new one and leave the original unchanged.
Note: the list is not exactly passed by reference. It's the value of the reference that is actually passed. Keep that in mind if you re-assign
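A minimal sketch of the difference between mutating the passed object and rebinding the parameter name. The functions mutate and rebind are hypothetical, written only for this illustration:

```python
def mutate(z):
    z.append(3)      # changes the object the caller also refers to

def rebind(z):
    z = z + [3]      # binds the local name z to a new list; caller unaffected

x = [1, 2]
y = [1, 2]
mutate(x)
rebind(y)
print(x)   # [1, 2, 3]
print(y)   # [1, 2]
```

Inside rebind, the assignment only changes what the local name z refers to, so the caller's list is never touched.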

Python tuple ... is not a tuple? What does the comma do? [duplicate]

I was looking at code in my course material and had to write a function which adds the value 99 to either a list or tuple. The final code looks like this:
def f(l):
    print(l)
    l += 99,
    print(l)
f([1,2,3])
f((1,2,3))
This was used to show something different, but I'm getting somewhat hung up on the line l += 99,. What it does is create an iterable that contains the 99; both list and tuple support the simple "addition" of such an object to create a new instance or add a new element.
What I don't really get is what exactly is created by the syntax element,. If I do an assignment like x = 99, then type(x) is tuple, but if I try to run x = tuple(99) it fails because 99 is not iterable. So is there:
Some kind of intermediate iterable object created using the syntax element,?
Is there a special function defined that would allow the calling of tuple without an iterable and somehow , is mapped to that?
Edit:
In case anyone wonders why the accepted answer is the one it is: the explanation for my second question made it. I should've been clearer with my question, but that += is what actually got me confused, and this answer includes information on this.
If the left-hand argument of = is a simple name, the type of argument currently bound to that name is irrelevant. tuple(99) fails because tuple's argument is not iterable; it has nothing to do with whether or not x already refers to an instance of tuple.
99, creates a tuple with a single element; parentheses are only necessary to separate it from other uses of commas. For example, foo((99,100)) calls foo with a single tuple argument, while foo(99,100) calls foo with two distinct int arguments.
The syntax element, simply creates an "intermediate" tuple, not some other kind of object (though a tuple is of course iterable).
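The points above can be checked with a small sketch in Python 3 syntax. The helper count_args is hypothetical and only stands in for foo:

```python
x = 99,                       # the comma makes the tuple
y = (99)                      # parentheses alone do not
print(isinstance(x, tuple))   # True
print(isinstance(y, int))     # True

def count_args(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)

print(count_args((99, 100)))  # 1  (one tuple argument)
print(count_args(99, 100))    # 2  (two int arguments)

raised = False
try:
    tuple(99)                 # 99 is not iterable
except TypeError:
    raised = True
print(raised)                 # True
```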
However, sometimes you need to use parentheses in order to avoid ambiguity. For this reason, you'll often see this:
l += (99,)
...even though the parentheses are not syntactically necessary. I also happen to think that is easier to read. But the parentheses ARE syntactically necessary in other situations, which you have already discovered:
list((99,))
tuple((99,))
set((99,))
You can also do these, since [] makes a list:
list([99])
tuple([99])
set([99])
...but you can't do these, since 99, is not a tuple object in these situations:
list(99,)
tuple(99,)
set(99,)
To answer your second question, no, there is not a way to make the tuple() function receive a non-iterable. In fact this is the purpose of the element, or (element,) syntax - very similar to [] for list and {} for dict and set (since the list, dict, and set functions all also require iterable arguments):
[99] #list
(99,) #tuple - note the comma is required
{99} #set
As discussed in the question comments, it is surprising that you can increment (+=) a list using a tuple object. Note that you cannot do this:
l = [1]
l + (2,) # error
This is inconsistent, so it is probably something that should not have been allowed. Instead, you would need to do one of these:
l += [2]
l += list((2,))
However, fixing it would create problems for people (not to mention remove a ripe opportunity for confusion exploitation by evil computer science professors), so they didn't.
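A sketch of the asymmetry, in Python 3 syntax with illustrative names. For lists, += behaves like extend and so accepts any iterable, while + requires another list. For tuples, += builds a new tuple and rebinds the name:

```python
l = [1]
alias = l
l += (2,)                  # in-place extend; works with any iterable
print(l, alias is l)       # [1, 2] True

plus_failed = False
try:
    l + (3,)               # list + tuple is not allowed
except TypeError:
    plus_failed = True
print(plus_failed)         # True

t = (1,)
t_alias = t
t += (2,)                  # tuples are immutable: a new tuple is created
print(t, t_alias)          # (1, 2) (1,)
```

This is also why the function f in the question changes the caller's list but not the caller's tuple.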
The tuple constructor requires an iterable (like it says in your error message) so in order to do x = tuple(99), you need to include it in an iterable like a list:
x = tuple([99])
or
x = tuple((99,))

Python methods: modify original vs return a different object

I'm new to Python and object-oriented programming, and have a very basic 101 question:
I see some methods return a modified object, and preserve the original:
In: x="hello"
In: x.upper()
Out: 'HELLO'
In: x
Out: 'hello'
I see other methods modify and overwrite the original object:
In: y=[1,2,3]
In: y.pop(0)
Out: 1
In: y
Out: [2, 3]
Is either of these the norm? Is there a way to know which case I am dealing with for a given class and method?
Your examples show the difference between immutable built-in objects (e.g., strings and tuples) and mutable objects (e.g., lists, dicts, and sets).
In general, if a class (object) is described as immutable, you should expect the former behavior, and the latter for mutable objects.
Both of these are idiomatic in Python, although list.pop() is a slightly special case.
In general, methods in Python either mutate the object or return a value. list.pop() is a little unusual in that, by definition, it must do both: remove an item from the list, and return it to you.
What is not common in Python, although it is in other languages, is to mutate an object and then return that same object - which would allow for methods to be chained together like so:
shape.stretch(x=2).move(3, 5)
... but can cause programs to be harder to debug.
If an object is immutable, like a string, you can be sure that a method won't mutate it (because, by definition, it can't). Failing that, the only way to tell whether a method mutates its object is to read the documentation (normally excellent for Python's built-in and standard library objects), or, of course, the source.
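When the documentation is not at hand, a quick experiment often settles it. A minimal sketch in Python 3 syntax with illustrative names, comparing the object before and after the call and looking at the return value:

```python
s = "hello"
upper = s.upper()
print(s, upper)           # hello HELLO   (s is untouched, new string returned)

items = [1, 2, 3]
first = items.pop(0)
print(first, items)       # 1 [2, 3]      (mutates and returns the removed item)

ret = items.reverse()
print(ret, items)         # None [3, 2]   (mutates, returns None)
```

A return value of None from a method on a mutable object is a strong hint that the method did its work in place.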

Python: Create List of Object References

I want to clean up some code I've written, in order to scale the magnitude of what I'm trying to do. In order to do so, I'd like to ideally create a list of references to objects, so that I can systematically set the objects, using a loop, without actually having to put the objects in a list. I've read about the way Python handles references and pass-by, but haven't quite found a way to do this effectively.
To better demonstrate what I'm trying to do:
I'm using bokeh, and would like to set up a large number of select boxes. Each box looks like this
select_one_name = Select(
    title = 'test',
    value = 'first_value',
    options = ['first_value', 'second_value', 'etc']
)
Setting up each select is fine, when I only have a few, but when I have 20, my code gets very long and unwieldy. What I'd like to be able to do, is have a list of sample_list = [select_one_name, select_two_name, etc] that I can then loop through, to set the values of each select_one_name, select_two_name, etc. However, I want to have my reference select_one_name still point to the correct value, rather than necessarily refer to the value by calling sample_list[0].
I'm not sure if this is do-able--if there's a better way to do this, than creating a list of references, please let me know. I know that I could just create a list of objects, but I'm trying to avoid that.
For reference, I'm on Python 2.7, Anaconda distribution, Windows 7. Thanks!
To follow up on @Alex Martelli's post below:
The reason why I thought this might not work, is because when I tried a mini-test with a list of lists, I didn't get the results I wanted. To demonstrate
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]
test[0].append(1)
Results in x = [1, 2, 3, 1] but if instead, I use test[0] = [1, 2], then x remains [1, 2, 3], although test itself reflects the change.
Drawing a parallel back to my original example, I thought that I would see the same results as from setting to equal. Is this not true?
Every Python list always is internally an array of references (in CPython, which is no doubt what you're using, at the C level it's an array of PyObject* -- "pointers to Python objects").
No copies of the objects get made implicitly: rather (again, in CPython) each object's reference count gets incremented when you add "the object" (actually a reference to it) to the list. In fact when you do want an object's copy you need to specifically ask for one (with the copy module in general, or sometimes with type-specific copy methods).
Multiple references to the same object are internally pointers to exactly the same memory. If an object is mutable, then mutating it gets reflected through all the references to it. Of course, there are immutable objects (strings, numbers, tuples, ...) to which such mutation cannot apply.
So when you do, e.g.,
sample_list = [select_one_name, select_two_name, etc]
each of the names (as long as it's in scope) still refers to exactly the same object as the corresponding item in sample_list.
In other words, using sample_list[0] and select_one_name is totally equivalent as long as both references to the same object exist.
IOW squared, your stated purpose is already accomplished by Python's most fundamental semantics. Now, please edit the Q to clarify which behavior you're observing that seems to contradict this, versus which behavior you think you should be observing (and desire), and we may be able to help further -- because to this point all the above observations amount to "you're getting exactly the semantics you ask for" so "steady as she goes" is all I can practically suggest!-)
Added (better here in the answer than just below in comments:-): note the focus on mutating operation. The OP tried test[0] = somelist followed by test[0].append and saw somelist mutated accordingly; then tried test[0] = [1, 2] and was surprised to see somelist not changed. But that's because assignment to a reference is not a mutating operation on the object that said reference used to indicate! It just re-seats the reference, decrements the previously-referred-to object's reference count, and that's it.
If you want to mutate an existing object (which needs to be a mutable one in the first place, but, a list satisfies that), you need to perform mutating operations on it (through whatever reference, doesn't matter). For example, besides append and many other named methods, one mutating operation on a list is assignment to a slice, including the whole-list slice denoted as [:]. So, test[0][:] = [1,2] would in fact mutate somelist -- very different from test[0] = [1,2] which assigns to a reference, not to a slice.
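The three cases can be put side by side, reusing the lists from the question. This is a minimal sketch in Python 3 syntax:

```python
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]

test[0].append(1)       # mutates the list that x also refers to
print(x)                # [1, 2, 3, 1]

test[0][:] = [1, 2]     # slice assignment: still mutates that same list
print(x)                # [1, 2]
print(test[0] is x)     # True

test[0] = [7, 8]        # re-seats slot 0 of test; x is not involved
print(x)                # [1, 2]
print(test[0] is x)     # False
```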
This is not recommended, but it works.
sample_list = ["select_one_name", "select_two_name", "select_three_name"]
for select in sample_list:
    locals()[select] = Select(
        title = 'test', value = 'first_value',
        options = ['first_value', 'second_value', 'etc']
    )
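You can then use select_one_name, select_two_name, etc. directly, because assigning into the dictionary returned by locals() creates those names in the current scope. Note that in CPython this only works reliably at module level, where locals() is the same dictionary as globals(); inside a function, writes to locals() do not create local variables.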
A cleaner approach is to use a dictionary, e.g.
selects = {
    'select_one_name': Select(...),
    'select_two_name': Select(...),
    'select_three_name': Select(...)
}
And reference selects['select_one_name'] in your code and you can iterate over selects.keys() or selects.items().
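A runnable sketch of the dictionary approach. FakeSelect is a hypothetical stand-in for bokeh's Select, used only so the example runs without bokeh installed, and it assumes nothing about the real widget beyond the three keyword arguments shown in the question:

```python
class FakeSelect:
    """Hypothetical stand-in for bokeh's Select widget."""
    def __init__(self, title, value, options):
        self.title = title
        self.value = value
        self.options = options

names = ["select_one_name", "select_two_name", "select_three_name"]
options = ["first_value", "second_value", "etc"]

# Build all widgets in one place instead of writing each one out by hand.
selects = {
    name: FakeSelect(title="test", value="first_value", options=list(options))
    for name in names
}

# Set a value on every widget in a loop.
for widget in selects.values():
    widget.value = "second_value"

# A plain name can still refer to the same object as the dictionary entry.
select_one_name = selects["select_one_name"]
print(select_one_name is selects["select_one_name"])   # True
print(select_one_name.value)                           # second_value
```

Because the name and the dictionary entry refer to the same object, changes made through either one are visible through the other.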

[].append(x) behaviour

This executes as I'd expect:
>>> x = []
>>> x.append(3)
>>> x
[3]
Why does the following return None?
>>> x = [].append(3)
>>> x
>>>
Because list.append changes the list itself and returns None ;)
You can use help to see the docstring of a function or method:
In [11]: help(list.append)
Help on method_descriptor:
append(...)
L.append(object) -- append object to end
EDIT:
This is explained in docs of python3:
Some collection classes are mutable. The methods that add, subtract, or rearrange
their members in place, and don’t return a specific item, never return the
collection instance itself but None.
If the question is "why is x None?", then it's because list.append returns None as others say.
If the question is "why was append designed to return None instead of self?" then ultimately it is because of Guido van Rossum's decision, which he explains here as it applies to a related case:
https://mail.python.org/pipermail/python-dev/2003-October/038855.html
I'd like to explain once more why I'm so adamant that sort() shouldn't
return 'self'.
This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability;
In short, he doesn't like methods that return self because he doesn't like method chaining. To prevent you from chaining methods like x.append(3).append(4).append(5) he returns None from append.
I speculate that this could perhaps be considered a specific case of the more general principle that distinguishes between:
pure functions, that have no side-effects and return values
procedures, that have side-effects and do not return values
Of course the Python language doesn't make any such distinction, and the Python libraries do not apply the general principle. For example list.pop() has a side-effect and returns a value, but since the value it returns isn't (necessarily) self it doesn't violate GvR's more specific rule.
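What happens when you try to chain anyway can be sketched like this, in Python 3 syntax with illustrative names:

```python
x = []
chain_failed = False
try:
    x.append(3).append(4)   # x.append(3) returns None, so .append(4) fails
except AttributeError:
    chain_failed = True

print(chain_failed)   # True
print(x)              # [3]  (the first append still took effect)

# The unchained form makes it clear that every call acts on x.
x.append(4)
x.append(5)
print(x)              # [3, 4, 5]
```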
The append method of lists returns None. It only modifies the list it is bound to. The same happens with:
x = {}.update(a=3)
The method list.append() changes the list in place and returns no result (that is, it returns None).
Many list methods work on the list in place, so you don't need to assign a new list to replace the old one.
>>> lst = []
>>> id(lst)
4294245644L
>>> lst.append(1)
>>> id(lst)
4294245644L # <-- same object, doesn't change.
With [].append(1), you're adding 1 to a freshly created list that you have no reference to. So once the append is done, you have lost the list (and it will be collected by the garbage collector).
By the way, a fun fact that backs up my answer:
>>> id([].append(1))
1852276280
>>> id(None)
1852276280
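The same check reads more directly with an identity test than by comparing id() values. A small sketch in Python 3 syntax, with two ways to get the list itself from a single expression (the names are illustrative):

```python
result = [].append(1)
print(result is None)    # True: the temporary list is already gone

literal = [1]            # write the element into the literal
combined = [] + [1]      # or use +, which returns a new list
print(literal, combined) # [1] [1]
```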
