As I only now noticed after commenting on this answer, slices in Python 3 return shallow copies of whatever they're slicing rather than views. Why is this still the case? Even leaving aside numpy's usage of views rather than copies for slicing, the fact that dict.keys, dict.values, and dict.items all return views in Python 3, and that there are many other aspects of Python 3 geared towards greater use of iterators, makes it seem that there would have been a movement towards slices becoming similar. itertools does have an islice function that makes iterative slices, but that's more limited than normal slicing and does not provide view functionality along the lines of dict.keys or dict.values.
As well, the fact that you can use assignment to slices to modify the original list, but slices are themselves copies and not views, is a contradictory aspect of the language and seems like it violates several of the principles illustrated in the Zen of Python.
That is, the fact you can do
>>> a = [1, 2, 3, 4, 5]
>>> a[::2] = [0, 0, 0]
>>> a
[0, 2, 0, 4, 0]
But not
>>> a = [1, 2, 3, 4, 5]
>>> a[::2][0] = 0
>>> a
[0, 2, 3, 4, 5]
or something like
>>> a = [1, 2, 3, 4, 5]
>>> b = a[::2]
>>> b
view(a[::2] -> [1, 3, 5]) # numpy doesn't explicitly state that its slices are views, but it would probably be a good idea to do it in some way for regular Python
>>> b[0] = 0
>>> b
view(a[::2] -> [0, 3, 5])
>>> a
[0, 2, 3, 4, 5]
Seems somewhat arbitrary/undesirable.
I'm aware of http://www.python.org/dev/peps/pep-3099/ and the part where it says "Slices and extended slices won't go away (even if the __getslice__ and __setslice__ APIs may be replaced) nor will they return views for the standard object types.", but the linked discussion provides no mention of why the decision about slicing with views was made; in fact, the majority of the comments on that specific suggestion out of the suggestions listed in the original post seemed to be positive.
What prevented something like this from being implemented in Python 3.0, which was specifically designed to not be strictly backwards-compatible with Python 2.x and thus would have been the best time to implement such a change in design, and is there anything that may prevent it in future versions of Python?
As well, the fact that you can use assignment to slices to modify the original list, but slices are themselves copies and not views.
Hmm.. that's not quite right; although I can see how you might think that. In other languages, a slice assignment, something like:
a[b:c] = d
is equivalent to
tmp = a.operator[](slice(b, c)) # which returns some sort of reference
tmp.operator=(d) # which has a special meaning for the reference type.
But in python, the first statement is actually converted to this:
a.__setitem__(slice(b, c), d)
Which is to say that an item assignment is actually specially recognized in python to have a special meaning, separate from item lookup and assignment; they may be unrelated. This is consistent with python as a whole, because python doesn't have concepts like the "lvalues" found in C/C++; There's no way to overload the assignment operator itself; only specific cases when the left side of the assignment is not a plain identifier.
Suppose lists did have views; And you tried to use it:
myView = myList[1:10]
yourList = [1, 2, 3, 4]
myView = yourList
In languages besides python, there might be a way to shove yourList into myList, but in python, since the name myView appears as a bare identifier, it can only mean a variable assignemnt; the view is lost.
Well it seems I found a lot of the reasoning behind the views decision, going by the thread starting with http://mail.python.org/pipermail/python-3000/2006-August/003224.html (it's primarily about slicing strings, but at least one e-mail in the thread mentions mutable objects like lists), and also some things from:
http://mail.python.org/pipermail/python-3000/2007-February/005739.html
http://mail.python.org/pipermail/python-dev/2008-May/079692.html and following e-mails in the thread
Looks like the advantages of switching to this style for base Python would be vastly outweighed by the induced complexity and various undesirable edge cases. Oh well.
...And as I then started wondering about the possibility of just replacing the current way slice objects are worked with with an iterable form a la itertools.islice, just as zip, map, etc. all return iterables instead of lists in Python 3, I started realizing all the unexpected behavior and possible problems that could come out of that. Looks like this might be a dead end for now.
On the plus side, numpy's arrays are fairly flexible, so in situations where this sort of thing might be necessary, it wouldn't be too hard to use one-dimensional ndarrays instead of lists. However, it seems ndarrays don't support using slicing to insert additional items within arrays, as happens with Python lists:
>>> a = [0, 0]
>>> a[:1] = [2, 3]
>>> a
[2, 3, 0]
I think the numpy equivalent would instead be something like this:
>>> a = np.array([0, 0]) # or a = np.zeros([2]), but that's not important here
>>> a = np.hstack(([2, 3], a[1:]))
>>> a
array([2, 3, 0])
A slightly more complicated case:
>>> a = [1, 2, 3, 4]
>>> a[1:3] = [0, 0, 0]
>>> a
[1, 0, 0, 0, 4]
versus
>>> a = np.array([1, 2, 3, 4])
>>> a = np.hstack((a[:1], [0, 0, 0], a[3:]))
>>> a
array([1, 0, 0, 0, 4])
And, of course, the above numpy examples don't store the result in the original array as happens with the regular Python list expansion.
Related
I am trying to sort a list by frequency of its elements.
>>> a = [5, 5, 4, 4, 4, 1, 2, 2]
>>> a.sort(key = a.count)
>>> a
[5, 5, 4, 4, 4, 1, 2, 2]
a is unchanged. However:
>>> sorted(a, key = a.count)
[1, 5, 5, 2, 2, 4, 4, 4]
Why does this method not work for .sort()?
What you see is the result of a certain CPython implementation detail of list.sort. Try this again, but create a copy of a first:
a.sort(key=a.copy().count)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
.sort modifies a internally, so a.count is going to produce un-predictable results. This is documented as an implementation detail.
What copy call does is it creates a copy of a and uses that list's count method as the key. You can see what happens with some debug statements:
def count(x):
print(a)
return a.count(x)
a.sort(key=count)
[]
[]
[]
...
a turns up as an empty list when accessed inside .sort, and [].count(anything) will be 0. This explains why the output is the same as the input - the predicates are all the same (0).
OTOH, sorted creates a new list, so it doesn't have this problem.
If you really want to sort by frequency counts, the idiomatic method is to use a Counter:
from collections import Counter
a.sort(key=Counter(a).get)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
It doesn't work with the list.sort method because CPython decides to "empty the list" temporarily (the other answer already presents this). This is mentioned in the documentation as implementation detail:
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.
The source code contains a similar comment with a bit more explanation:
/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/
The explanation isn't straight-forward but the problem is that the key-function and the comparisons could change the list instance during sorting which is very likely to result in undefined behavior of the C-code (which may crash the interpreter). To prevent that the list is emptied during the sorting, so that even if someone changes the instance it won't result in an interpreter crash.
This doesn't happen with sorted because sorted copies the list and simply sorts the copy. The copy is still emptied during the sorting but there's no way to access it, so it isn't visible.
However you really shouldn't sort like this to get a frequency sort. That's because for each item you call the key function once. And list.count iterates over each item, so you effectively iterate the whole list for each element (what is called O(n**2) complexity). A better way would be to calculate the frequency once for each element (can be done in O(n)) and then just access that in the key.
However since CPython has a Counter class that also supports most_common you could really just use that:
>>> from collections import Counter
>>> [item for item, count in reversed(Counter(a).most_common()) for _ in range(count)]
[1, 2, 2, 5, 5, 4, 4, 4]
This may change the order of the elements with equal counts but since you're doing a frequency count that shouldn't matter to much.
I came across a code snippet where I could not understand two of the statements, though I could see the end result of each.
I will create a variable before giving the statements:
train = np.random.random((10,100))
One of them read as :
train = train[:-1, 1:-1]
What does this slicing mean? How to read this? I know that that -1 in slicing denotes from the back. But I cannot understand this.
Another statement read as follows:
la = [0.2**(7-j) for j in range(1,t+1)]
np.array(la)[:,None]
What does slicing with None as in [:,None] mean?
For the above two statements, along with how each statement is read, it will be helpful to have an alternative method along, so that I understand it better.
One of Python's strengths is its uniform application of straightforward principles. Numpy indexing, like all indexing in Python, passes a single argument to the indexed object's (i.e., the array's) __getitem__ method, and numpy arrays were one of the primary justifications for the slicing mechanism (or at least one of its very early uses).
When I'm trying to understand new behaviours I like to start with a concrete and comprehensible example, so rather than 10x100 random values I'll start with a one-dimensional 4-element vector and work up to 3x4, which should be big enough to understand what's going on.
simple = np.array([1, 2, 3, 4])
train = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
The interpreter shows these as
array([1, 2, 3, 4])
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
The expression simple[x] is equivalent to (which is to say the interpreter ends up executing) simple.__getitem__(x) under the hood - note this call takes a single argument.
The numpy array's __getitem__ method implements indexing with an integer very simply: it selects a single element from the first dimension. So simple[1] is 2, and train[1] is array([5, 6, 7, 8]).
When __getitem__ receives a tuple as an argument (which is how Python's syntax interprets expressions like array[x, y, z]) it applies each element of the tuple as an index to successive dimensions of the indexed object. So result = train[1, 2] is equivalent (conceptually - the code is more complex in implementation) to
temp = train[1] # i.e. train.__getitem__(1)
result = temp[2] # i.e. temp.__getitem__(2)
and sure enough we find that result comes out at 7. You could think of array[x, y, z] as equivalent to array[x][y][z].
Now we can add slicing to the mix. Expressions containing a colon can be regarded as slice literals (I haven't seen a better name for them), and the interpreter creates slice objects for them. As the documentation notes, a slice object is mostly a container for three values, start, stop and slice, and it's up to each object's __getitem__ method how it interprets them. You might find this question helpful to understand slicing further.
With what you now know, you should be able to understand the answer to your first question.
result = train[:-1, 1:-1]
will call train.__getitem__ with a two-element tuple of slices. This is equivalent to
temp = train[:-1]
result = temp[..., 1:-1]
The first statement can be read as "set temp to all but the last row of train", and the second as "set result to all but the first and last columns of temp". train[:-1] is
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
and applying the [1:-1] subscripting to the second dimension of that array gives
array([[2, 3],
[6, 7]])
The ellipsis on the first dimension of the temp subscript says "pass everything," so the subscript expression[...]can be considered equivalent to[:]. As far as theNonevalues are concerned, a slice has a maximum of three data points: _start_, _stop_ and _step_. ANonevalue for any of these gives the default value, which is0for _start_, the length of the indexed object for _stop_, and1for _step. Sox[None:None:None]is equivalent tox[0:len(x):1]which is equivalent tox[::]`.
With this knowledge under your belt you should stand a bit more chance of understanding what's going on.
I'm at the very beginning of learning Python 3. Getting to know the language basics. There is a method to the list data type:
list.append(x)
and in the tutorial it is said to be equivalent to this expression:
a[len(a):] = [x]
Can someone please explain this expression? I can't grasp the len(a): part. It's a slice right? From the last item to the last? Can't make sense of it.
I'm aware this is very newbie, sorry. I'm determined to learn Python for Blender scripting and the Game Engine, and want to understand well all the constructs.
Think back to how slices work: a[beginning:end].
If you do not supply one of them, then you get all the list from beginning or all the way to end.
What that means is if I ask for a[2:], I will get the list from the index 2 all the way to the end of the list and len(a) is an index right after the last element of the array... so a[len(a):] is basically an empty array positioned right after the last element of the array.
Say you have a = [0,1,2], and you do a[3:] = [3,4,5], what you're telling Python is that right after [0,1,2 and right before ], there should be 3,4,5.
Thus a will become [0,1,2,3,4,5] and after that step a[3:] will indeed be equal to [3,4,5] just as you declared.
Edit: as chepner commented, any index greater than or equal to len(a) will work just as well. For instance, a = [0,1,2] and a[42:] = [3,4,5] will also result in a becoming [0,1,2,3,4,5].
One could generally state that l[len(l):] = [1] is similar to append, and that is what is stated in the docs, but, that is a special case that holds true only when the right hand side has a single element.
In the more general case it is safer to state that it is equivalent to extend for the following reasons:
Append takes an object and appends that to the end; with slice assignment you extend a list with the given iterable on the right hand side:
l[len(l):] = [1, 2, 3]
is equivalent to:
l.extend([1, 2, 3])
The same argument to append would cause [1, 2, 3] to be appended as an object at the end of l. In this scenario len(l) is simply used in order for the extending of the list to be performed at the end of l.
Some examples to illustrate their difference:
l = [1, 2]
l[len(l):] = [1, 2] # l becomes [1, 2, 1, 2]
l.extend([1, 2]) # l becomes [1, 2, 1, 2, 1, 2]
l.append([1, 2]) # l becomes [1, 2, 1, 2, 1, 2, [1, 2]]
As you note, l.append(<iterable>) doesn't actually append each value in the iterable, it appends the iterable itself.
I know, there are plenty of threads about list vs. array but I've got a slightly different problem.
Using Python, I find myself converting between np.array and list quite often as I want to use attributes like
remove, append, extend, sort, index, … for lists
and on the other hand modify the content by things like
*, /, +, -, np.exp(), np.sqrt(), … which only works for arrays.
It must be pretty messy to switch between data types with list(array) and np.asarray(list), I assume. But I just can't think of a proper solution. I don't really want to write a loop every time I want to find and remove something from my array.
Any suggestions?
A numpy array:
>>> A=np.array([1,4,9,2,7])
delete:
>>> A=np.delete(A, [2,3])
>>> A
array([1, 4, 7])
append (beware: it's O(n), unlike list.append which is O(1)):
>>> A=np.append(A, [5,0])
>>> A
array([1, 4, 7, 5, 0])
sort:
>>> np.sort(A)
array([0, 1, 4, 5, 7])
index:
>>> A
array([1, 4, 7, 5, 0])
>>> np.where(A==7)
(array([2]),)
I recently started coding in Python 2.7. I'm a molecular biologist.
I'm writing a script that involves creating lists like this one:
mylist = [[0, 4, 6, 1], 102]
These lists are incremented by adding an item to mylist[0] and summing a value to mylist[1].
To do this, I use the code:
def addres(oldpep, res):
return [oldpep[0] + res[0], oldpep[1] + res[1]]
Which works well. Since mylist[0] can become a bit long, and I have millions of these lists to take care of, I thought that using append or extend might make my code faster, so I tried:
def addres(pep, res):
pep[0].extend(res[0])
pep[1] += res[1]
return pep
Which in my mind should give the same result. It does give the same result when I try it on an arbitrary list. But when I feed it to the million of lists, it gives me a very different result. So... what's the difference between the two? All the rest of the script is exactly the same.
Thank you!
Roberto
The difference is that the second version of addres modifies the list that you passed in as pep, where the first version returns a new one.
>>> mylist = [[0, 4, 6, 1], 102]
>>> list2 = [[3, 1, 2], 205]
>>> addres(mylist, list2)
[[0, 4, 6, 1, 3, 1, 2], 307]
>>> mylist
[[0, 4, 6, 1, 3, 1, 2], 307]
If you need to not modify the original lists, I don't think you're going to really going to get a faster Python implementation of addres than the first one you wrote. You might be able to deal with the modification, though, or come up with a somewhat different approach to speed up your code if that's the problem you're facing.
List are objects in python which are passed by reference.
a=list()
This doesn't mean that a is the list but a is pointing towards a list just created.
In first example, you are using list element and creating a new list, an another object while in the second one you are modifying the list content itself.