Consider this:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
These are correct statements in Python to remove elements:
numbers[0:2] = []
numbers[3:5] = []
However, the statement below is not allowed:
numbers[::2] = []
ValueError: attempt to assign sequence of size 0 to extended slice of size 5
What prevents such a statement in Python?
It is noted in the documentation that the replacement must have the same length for the case where there is an explicit step (which is 2 in your case).
Operation | Result | Notes
s[i:j] = t | slice of s from i to j is replaced by the contents of the iterable t |
s[i:j:k] = t | the elements of s[i:j:k] are replaced by those of t | (1)
(1) t must have the same length as the slice it is replacing.
The correct way is also documented there.
Operation | Result | Notes
del s[i:j] | same as s[i:j] = [] |
del s[i:j:k] | removes the elements of s[i:j:k] from the list |
Code:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
del numbers[::2]
print(numbers)
Output:
[1, 3, 5, 7, 9]
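The length rule from the first documentation table can also be seen directly. In this small sketch (the replacement strings are arbitrary illustration values), an extended slice accepts a replacement only when it has exactly as many elements as the slice selects, while a simple slice can shrink or grow the list.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Extended slice (explicit step): the right-hand side must supply exactly
# one value per selected position. numbers[::2] selects 5 positions.
numbers[::2] = ["a", "b", "c", "d", "e"]
print(numbers)  # ['a', 1, 'b', 3, 'c', 5, 'd', 7, 'e', 9]

# A length mismatch is rejected instead of guessing what should happen.
try:
    numbers[::2] = []
except ValueError as exc:
    print(exc)  # attempt to assign sequence of size 0 to extended slice of size 5

# A simple slice (no step) may be replaced by a sequence of any length,
# so the list can grow or shrink.
numbers[0:2] = []
print(numbers)  # ['b', 3, 'c', 5, 'd', 7, 'e', 9]
```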
There is no need for numbers[::2] = [] to have the behaviour of deleting every second element, because you can already do that by writing numbers[:] = numbers[1::2]. If instead you want to (e.g.) replace every second element with the value 1, you can write one of the below, which are explicit about their behaviour.
for i in range(0, len(numbers), 2):
    numbers[i] = 1
# or:
numbers[::2] = [1] * len(numbers[::2])
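Both `del numbers[::2]` and `numbers[:] = numbers[1::2]` change the existing list object, which matters when another name refers to the same list; rebinding the name (`numbers = numbers[1::2]`) does not. A small sketch, where `alias` is just an illustrative second reference:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
alias = numbers  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list object,
# so every reference sees the change (same effect as del numbers[::2]).
numbers[:] = numbers[1::2]
print(alias)  # [1, 3, 5, 7, 9]

# Rebinding only makes the name point at a new list; alias keeps the old one.
numbers = numbers[1::2]
print(numbers)  # [3, 7]
print(alias)  # [1, 3, 5, 7, 9]
```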
It is not obvious what the correct behaviour should be for assigning m elements to a non-contiguous slice of n list locations when m != n. In the comments you propose a possible behaviour, but your proposal is not consistent with how slice assignment works in other cases (normally, each element on the right-hand side gets used once on the left-hand side) and certainly doesn't fulfil the principle of least astonishment. In cases like this, I think there is no (non-raising) behaviour that most people would expect, so raising an error is the best option.
You have to create a new list with the slicing already mentioned in other answers:
numbers = numbers[1::2]
If you delete the elements from the same list one at a time, you incur a heavy performance penalty, because inserting or deleting (not appending!) an element in a list is O(N). Since each deletion costs O(N) and you perform N/2 of them, the total cost is O(N**2). You really don't want that cost.
Creating a new list from the output of the slicing, on the other hand, costs just O(N) in total.
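To make the cost argument concrete, here is a sketch of both approaches; the function names are made up for illustration. The one-by-one version deletes from the highest even index downwards so that earlier deletions don't shift the indices still to be deleted. It produces the same result as the slice, only with O(N**2) work instead of O(N).

```python
def drop_even_indices_one_by_one(lst):
    """Delete the elements at even indices in place, one at a time.

    Each `del` shifts every later element left by one, so the N/2
    deletions cost O(N**2) in total.
    """
    # Walk from the highest even index down to 0 so that deleting an
    # element never shifts an index we have yet to visit.
    for i in range((len(lst) - 1) // 2 * 2, -1, -2):
        del lst[i]


def drop_even_indices_by_slicing(lst):
    """Return a new list of the elements at odd indices: O(N)."""
    return lst[1::2]


numbers = list(range(10))
slow = numbers.copy()
drop_even_indices_one_by_one(slow)
fast = drop_even_indices_by_slicing(numbers)
print(slow, fast)  # [1, 3, 5, 7, 9] [1, 3, 5, 7, 9]
```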
Related
I would like to perform an operation on each element in my list and, when a certain condition is met, skip to the last element of the list.
Here is an MWE where I print all the items in a list until I reach my condition (item == 4), after which I manually repeat the print statement on the final element. The desired output is to print 0, 1, 2, 3, 4, 7:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
breaked_out = False
for item in my_list:
    print(item)
    if item == 4:
        breaked_out = True
        break
if breaked_out:
    print(my_list[-1])
I have this ugly use of a flag (breaked_out) and also need to repeat my print() command. This isn't particularly legible either.
I have a slightly better implementation in mind that uses a while loop instead:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
i = 0
while i < len(my_list):
    item = my_list[i]
    print(item)
    if item == 4:
        i = len(my_list) - 1
    else:
        i += 1
Here I'm not repeating my operation (in this case, print()) but I have to do this unpythonic index accounting.
Is there a more readable way to get this sort of iteration? Other things to add about my situation:
This is iterating on a list, not a generator, so I have access to len().
I need to loop through in this order, so I can't use reversed(my_list) and treat the special case first.
There is no Pythonic way to do exactly what you want
Not what you asked for, but less ugly
The minimal-change, not-so-ugly solution, which doesn't actually advance the iterator, is to put the handling of the last element inside the loop instead of using a flag variable:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
for item in my_list:
    print(item)
    if item == 4:
        print(my_list[-1])  # Handling of last element inlined
        break
# Optionally, an else: block can run to do something special when you didn't break
# which might be important if the item that's equal to 4 is the last or second
# to last item, where the former does the work for the final element twice,
# while the latter does it only once, but looks like it never found the element
# (processing all items without breaking looking the same as processing elements
# 0 through n - 1, then processing n separately, then breaking)
else:
    print("Processed them all!")
Or, to avoid processing the final element twice when it is the first element meeting the test criterion, use enumerate to track your position:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
for i, item in enumerate(my_list, 1):  # Lets us test against len(my_list) rather than len(my_list) - 1
    print(item)
    if item == 4 and i < len(my_list):  # Don't process last item if we just processed it!
        print(my_list[-1])  # Handling of last element inlined
        break
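The enumerate version can be wrapped in a function to check the edge cases mentioned above. This is a sketch: `process_with_skip` is a hypothetical name, and items are collected into a list instead of printed so the output is easy to compare.

```python
def process_with_skip(items, target):
    """Collect items in order; at the first item equal to target, also
    handle the last item and stop. If target is the last item itself,
    it is handled only once."""
    processed = []
    for i, item in enumerate(items, 1):
        processed.append(item)  # stand-in for print(item)
        if item == target and i < len(items):  # don't redo the last item
            processed.append(items[-1])
            break
    return processed


print(process_with_skip([0, 1, 2, 3, 4, 5, 6, 7], 4))  # [0, 1, 2, 3, 4, 7]
print(process_with_skip([0, 1, 2, 3, 4, 5, 6, 7], 6))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(process_with_skip([0, 1, 2, 3, 4, 5, 6, 7], 7))  # [0, 1, 2, 3, 4, 5, 6, 7]
```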
What you asked for:
There are only two ways I know of to do this. Both involve converting the list to an iterator first, so that you can manipulate the iterator inside the loop and make it skip to the last element on the next iteration. In both cases, your original code changes to:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
lstiter = iter(my_list)
for item in lstiter:
    print(item)
    if item == 4:
        pass  # iterator advance goes here
where that placeholder line at the bottom is what changes.
The documented, but slow approach
Use the consume recipe from the itertools documentation to advance the iterator to near the end. You need to know where you are, so the for loop changes to:
for i, item in enumerate(lstiter, 1): # Starting from 1 avoids needing an extra - 1 in
# the length check and consume's second argument
and the placeholder is filled with:
if i < len(my_list):
    # i items have been produced, so len(my_list) - i remain; skip all but the last
    consume(lstiter, len(my_list) - i - 1)
Downside: Advancing an arbitrary iterator manually like this is O(n) (it has to produce and discard all the values, which can take time for a large list).
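For reference, here is the consume recipe as given in the itertools documentation, plugged into the loop. Items are collected into a list instead of printed (an illustrative change). Note the `- 1` in the count: after the i-th item, `len(my_list) - i` items remain, and we want to discard all of them except the last.

```python
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely.
    (Recipe from the itertools documentation.)"""
    if n is None:
        deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)


my_list = [0, 1, 2, 3, 4, 5, 6, 7]
seen = []  # collected instead of printed, so the result is easy to check
lstiter = iter(my_list)
for i, item in enumerate(lstiter, 1):
    seen.append(item)
    if item == 4 and i < len(my_list):
        # Discard every remaining item except the final one.
        consume(lstiter, len(my_list) - i - 1)
print(seen)  # [0, 1, 2, 3, 4, 7]
```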
The efficient, but undocumented approach
List iterators provide a __setstate__ method (it's used when pickling them, so they can be unpickled at the same offset in the underlying list). You can abuse this to move the iterator to any position you like. For this, keep the for loop without enumerate and fill the placeholder with:
lstiter.__setstate__(len(my_list) - 1)
which directly moves the iterator so that the next element it produces is the final element of the list. It's efficient and simple, but non-obvious, and I doubt any part of the spec requires that __setstate__ be provided at all, let alone implemented in this useful way (there are a bazillion ways to implement pickling, and they could have chosen another one). That said, the implementation is effectively required for all pickle protocol versions to date for compatibility reasons (if __setstate__ were removed, pickles produced on older Pythons would not be readable on newer ones), so it should be fairly reliable.
A warning:
If the final element of your list matches the condition, this turns into an infinite loop, unlike the other solutions (the break solution double-processes the final element in that case, and consume processes each element at most once). Breaking explicitly doesn't re-enter the loop, so that's safe, and the consume recipe can't move an iterator backwards, so that's safe too. But since __setstate__ sets the position, it can set it back to the same position forever. If this is a possibility, I'd recommend the explicit break (using enumerate to check indices to avoid double-processing the final element); failing that, you can add even more hackery by checking the length hint of the iterator to see whether you are already at the end (and therefore should not adjust the position):
from operator import length_hint # At top of file
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
lstiter = iter(my_list)
for item in lstiter:
    print(item)
    if item == 4 and length_hint(lstiter) > 1:  # Don't set when finished or reaching last item anyway
        lstiter.__setstate__(len(my_list) - 1)
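The length_hint guard can be exercised as a function. `skip_to_last_with_setstate` is a hypothetical name, and the sketch still relies on the undocumented `list_iterator.__setstate__` behaviour described above. When the last element matches, the guard prevents the jump, so the loop terminates instead of repeating forever.

```python
from operator import length_hint


def skip_to_last_with_setstate(items, target):
    """Collect items in order; after the first match, jump to the last item.
    Relies on list_iterator.__setstate__, an implementation detail used for pickling."""
    seen = []
    lstiter = iter(items)
    for item in lstiter:
        seen.append(item)  # stand-in for print(item)
        # length_hint gives the number of items still to come; only jump when
        # at least two remain, so the position never moves back or stays put.
        if item == target and length_hint(lstiter) > 1:
            lstiter.__setstate__(len(items) - 1)
    return seen


print(skip_to_last_with_setstate([0, 1, 2, 3, 4, 5, 6, 7], 4))  # [0, 1, 2, 3, 4, 7]
print(skip_to_last_with_setstate([0, 1, 2, 3, 4, 5, 6, 7], 7))  # [0, 1, 2, 3, 4, 5, 6, 7]
```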
As an alternative to checking length_hint and similar hackery, you could use a flag variable that is set to True when the condition passes and prevents entering the if block a second time, e.g.:
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
lstiter = iter(my_list)
skipped = False
for i, item in enumerate(lstiter, 1):
    print(item)
    if item == 4 and not skipped and i < len(my_list):  # Don't set when finished or reaching last item anyway
        lstiter.__setstate__(len(my_list) - 1)
        skipped = True
but this is straying further and further from Pythonic with every change. :-)
Don't do too much for a simple task
my_list = [0, 1, 2, 3, 4, 5, 6, 7]
for item in my_list:
    print(item)
    if item == 4:
        print(my_list[-1])
        break
# task here
lst = ['apple', 'orange', 'kiwi', 'ananas',
'tea', 'coffee', 'milk', 'love', 'peace']
for i in range(len(lst)):
    if (i + 1) % 2 == 0:
        lst.append(lst[i])
        lst.pop(i)
Basically, I want the items with an even index to be moved to the end of this list.
It works for the second item but still doesn't for the rest of them.
You can use Python's extended slicing (slices with a step):
lst = lst[1::2] + lst[0::2]
The right-hand side of the plus says "grab every 2nd element starting from the first" and the left-hand side says "grab every 2nd element starting from the second". This reconstructs the list with the elements at odd indices first and those at even indices last.
It also avoids the expensive pops that make your original algorithm O(n^2).
The problem with your approach is that the elements shift after you move the first element. So when you reach the next element with an "even" index, the element there was originally at an odd index. Thus, after you shift the first element, you can directly continue with the element at the next index, which previously was two indices away, then the next one, and so on, for half the indices in the list.
Here's an example, using a list of numbers so it's easier to see what happens. If you want odd indices instead, use range(1, len(lst)//2+1).
lst = list(range(10))
for i in range(len(lst)//2):
    lst.append(lst.pop(i))
# [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]
However, even if this works, modifying a list while iterating over it is generally a very bad idea that leads to many headaches. Also, the repeated pop(i) makes the whole operation O(n²).
Instead, it would be much faster and saner to just combine two slices of the list:
lst = list(range(10))
lst = lst[1::2] + lst[0::2]
# [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]
(If you need to change the list "in-place", e.g. because of other references pointing to that list, you can replace the content of the list using an assignment to a slice: lst[:] = .... This would still not be "in-place" in the sense of not using additional memory. But if the list is so big that this is a problem, then the O(n²) running time will probably be a bigger problem anyway.)
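A minimal sketch of the slice-assignment variant, where `same_list` stands in for some other reference to the list (e.g. one held by a caller): the right-hand side builds a new list first, then its contents replace the contents of the existing list object.

```python
lst = list(range(10))
same_list = lst  # another reference to the same list object

# Assigning to the full slice replaces the contents of the existing list,
# so every reference sees the reordered elements.
lst[:] = lst[1::2] + lst[0::2]
print(same_list)  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]
print(same_list is lst)  # True
```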
A simple way would be to build a new list by using comprehensions:
lst2 = [v for i, v in enumerate(lst) if i % 2 == 0] + \
       [v for i, v in enumerate(lst) if i % 2 != 0]
But it is also possible to change the list in place. The trick is to start from the end of the list, so that removing an element does not change the parity of the indices still to be visited.
last = len(lst) - 1 # when an element is popped, the list loses one element
for i in range(len(lst), 0, -1):
    if (i % 2) == 0:
        val = lst.pop(i - 1)  # remove the element at even (1-based) position i
        lst.insert(last, val)  # insert it before the last inserted element
        last -= 1
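The in-place loop can be checked against the comprehension version above. This sketch wraps it in a function (`move_odd_indices_to_end` is a made-up name) and compares the results for several lengths. Each pop/insert still shifts elements, so this version is O(n²) like the question's approach: it is correct, not faster.

```python
def move_odd_indices_to_end(lst):
    """In-place: elements at odd (0-based) indices end up after the
    even-index ones, each group keeping its relative order."""
    last = len(lst) - 1
    for i in range(len(lst), 0, -1):
        if i % 2 == 0:
            val = lst.pop(i - 1)  # i is 1-based, so this pops 0-based index i - 1
            lst.insert(last, val)  # place it before the previously moved element
            last -= 1


for n in range(8):
    data = list(range(n))
    expected = [v for i, v in enumerate(data) if i % 2 == 0] + \
               [v for i, v in enumerate(data) if i % 2 != 0]
    move_odd_indices_to_end(data)
    assert data == expected, (n, data, expected)

fruits = ['apple', 'orange', 'kiwi', 'ananas', 'tea']
move_odd_indices_to_end(fruits)
print(fruits)  # ['apple', 'kiwi', 'tea', 'orange', 'ananas']
```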
I've been working on implementing common sorting algorithms into Python, and whilst working on selection sort I ran into a problem finding the minimum value of a sublist and swapping it with the first value of the sublist, which from my testing appears to be due to a problem with how I am using min() in my program.
Here is my code:
def selection_sort(li):
    for i in range(0, len(li)):
        a, b = i, li.index(min(li[i:]))
        li[a], li[b] = li[b], li[a]
    return li
This works fine for lists that have zero duplicate elements within them:
>>> selection_sort([9,8,7,6,5,4,3,2,1])
[1, 2, 3, 4, 5, 6, 7, 8, 9]
However, it completely fails when there are duplicate elements within the list.
>>> selection_sort([9,8,8,7,6,6,5,5,5,4,2,1,1])
[8, 8, 7, 6, 6, 5, 5, 5, 4, 2, 9, 1, 1]
I tried to solve this problem by examining what min() is doing on line 3 of my code, and found that min() returns the index value of the smallest element inside the sublist as intended, but the index is of the element within the larger list rather than of the sublist, as I hope this experiment illustrates:
>>> a = [1,2,1,1,2]
>>> min(a)
1 # expected
>>> a.index(min(a))
0 # also expected
>>> a.index(min(a[1:]))
0 # should be 1?
I'm not sure what is causing this behaviour; it could be possible to copy li[i:] into a temporary variable b and then do b.index(min(b)), but copying li[i:] into b for each loop might require a lot of memory, and selection sort is an in-place algorithm so I am uncertain as to whether this approach is ideal.
You're not quite getting the concept correctly!
li.index(item) will return the first appearance of that item in the list li.
What you should do instead: if you're finding the minimum element in the sublist, search for that element in the sublist as well, instead of in the whole list. When searching in the sliced list, you get the index relative to the sublist, but you can easily fix that by adding the sublist's starting index to the index returned.
A small fix for your problem would be:
def selection_sort(li):
    for i in range(0, len(li)):
        a, b = i, i + li[i:].index(min(li[i:]))
        li[a], li[b] = li[b], li[a]
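On the memory concern in the question: one way to avoid both the slice copy and the duplicate problem (a different technique from the fix above, not the answerer's) is to take min() over the indices themselves, keyed by the list value. min() returns the first of several equal minima, so duplicates are handled. A sketch:

```python
def selection_sort(li):
    """Sort li in place and return it. The index of the smallest remaining
    element is found with min() over the indices i..len(li)-1, keyed by the
    value at each index, so no slice copy is made."""
    for i in range(len(li)):
        j = min(range(i, len(li)), key=li.__getitem__)
        li[i], li[j] = li[j], li[i]
    return li


print(selection_sort([9, 8, 8, 7, 6, 6, 5, 5, 5, 4, 2, 1, 1]))
# [1, 1, 2, 4, 5, 5, 5, 6, 6, 7, 8, 8, 9]
```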
When you write a.index(min(a[1:])), you are searching for the first occurrence of the minimum of a[1:], but you are searching in the original list. That's why you get 0 as a result.
By the way, the function you are looking for is generally called argmin. It is not built into pure Python, but the numpy module provides it (numpy.argmin).
One way you can do it is using list comprehension:
idxs = [i for i, val in enumerate(a) if val == min(a)]
Or even better, write your own code, which is faster asymptotically:
idxs = []
minval = None
for i, val in enumerate(a):
    if minval is None or minval > val:
        idxs = [i]
        minval = val
    elif minval == val:
        idxs.append(i)
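Wrapped in a function, the loop above can be checked against the list from the question; `argmin_all` is an illustrative name. Indices found in a slice are relative to that slice, the same effect discussed for index() above.

```python
def argmin_all(a):
    """Return the indices of every occurrence of the minimum of a, in one pass."""
    idxs = []
    minval = None
    for i, val in enumerate(a):
        if minval is None or minval > val:
            idxs = [i]  # new minimum: restart the index list
            minval = val
        elif minval == val:
            idxs.append(i)  # another occurrence of the current minimum
    return idxs


a = [1, 2, 1, 1, 2]
print(argmin_all(a))      # [0, 2, 3]
print(argmin_all(a[1:]))  # [1, 2] (indices relative to the slice a[1:])
```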
I would like to ask what the following does in Python.
It was taken from http://danieljlewis.org/files/2010/06/Jenks.pdf
I have entered comments telling what I think is happening there.
# Seems to be a function that returns a float vector
# dataList seems to be a vector of float.
# numClass seems to be an int
def getJenksBreaks( dataList, numClass ):
    # dataList seems to be a vector of float. "Sort" seems to sort it in ascending order
    dataList.sort()
    # create a 1-dimensional vector
    mat1 = []
    # "in range" seems to be something like "for i = 0 to len(dataList)+1"
    for i in range(0,len(dataList)+1):
        # create a 1-dimensional vector?
        temp = []
        for j in range(0,numClass+1):
            # append a zero to the vector?
            temp.append(0)
        # append the vector to a vector??
        mat1.append(temp)
    (...)
I am a little confused because the PDF has no explicit variable declarations. However, I think (and hope) I guessed the variable types correctly.
Yes, the method append() adds elements to the end of the list. I think your interpretation of the code is correct.
But note the following:
x = [1, 2, 3, 4]
x.append(5)
print(x)
[1, 2, 3, 4, 5]
while
x.append([6,7])
print(x)
[1, 2, 3, 4, 5, [6, 7]]
If you want something like
[1, 2, 3, 4, 5, 6, 7]
you may use extend()
x.extend([6,7])
print(x)
[1, 2, 3, 4, 5, 6, 7]
Python doesn't have explicit variable declarations. It's dynamically typed, variables are whatever type they get assigned to.
Your assessment of the code is pretty much correct.
One detail: the range function goes up to, but does not include, its stop value. So the +1 in the second argument to range makes the last iterated values len(dataList) and numClass, respectively. Since the ranges start at zero, the loops perform len(dataList) + 1 and numClass + 1 iterations, which looks suspicious.
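A quick check of the iteration counts, with made-up values for dataList and numClass:

```python
dataList = [3.0, 1.0, 2.0]  # illustrative data
numClass = 2

outer = list(range(0, len(dataList) + 1))
inner = list(range(0, numClass + 1))
print(outer)  # [0, 1, 2, 3] -> len(dataList) + 1 == 4 iterations
print(inner)  # [0, 1, 2]    -> numClass + 1 == 3 iterations
```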
dataList.sort() sorts the list in place, modifying the original dataList (and returns None); that is the standard behavior of list.sort().
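One consequence worth knowing: because list.sort() works in place, getJenksBreaks reorders the list its caller passes in, whereas sorted() returns a new list and leaves the original alone. A small sketch, where `sort_in_function` is a hypothetical stand-in for getJenksBreaks:

```python
def sort_in_function(values):
    values.sort()  # sorts the caller's list object in place; returns None


data = [3.5, 1.25, 2.0]
result = sort_in_function(data)
print(data)    # [1.25, 2.0, 3.5]  (the caller's list was changed)
print(result)  # None

other = [3.5, 1.25, 2.0]
print(sorted(other))  # [1.25, 2.0, 3.5]  (a new sorted list)
print(other)          # [3.5, 1.25, 2.0]  (original left untouched)
```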
It is indeed appending the new vector to the initial one. If you look at the full source code, there are several blocks that continue appending more vectors to mat1.
append is a list method that adds a value to the end of the list.
mat1 and temp together create a 2D array (e.g. [[0, 0], [0, 0], [0, 0]]), i.e. an m x n matrix,
where m = len(dataList) + 1 and n = numClass + 1.
The resulting matrix is a zero matrix, since all its values are 0.
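The same zero matrix can be built with a list comprehension; this sketch uses made-up values for dataList and numClass. The loop in the question creates a fresh `temp` list for every row, which is why the rows are independent; multiplying an outer list would instead repeat one shared row object.

```python
dataList = [4.0, 1.0, 3.0, 2.0]  # illustrative data
numClass = 3

# Same shape as mat1: (len(dataList) + 1) rows, (numClass + 1) columns of zeros.
mat1 = [[0] * (numClass + 1) for _ in range(len(dataList) + 1)]
print(len(mat1), len(mat1[0]))  # 5 4

# Pitfall: multiplying the outer list repeats *the same* row object.
shared = [[0] * (numClass + 1)] * (len(dataList) + 1)
shared[0][0] = 1
print(shared[1][0])  # 1, because every row is the same list
```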
In Python, variables are implicitly declared. When you type this:
i = 1
i is set to a value of 1, which happens to be an integer. So we will talk of i as being an integer, although i is only a reference to an integer value. The consequence of that is that you don't need type declarations as in C++ or Java.
As for the comments, your understanding is mostly correct. [] creates an empty list. You can think of it as a linked list, although its actual implementation is closer to a C++ std::vector (a dynamic array).
As Python variables are only references to objects in general, lists are effectively lists of references, and can potentially hold any kind of values. This is valid Python:
# A vector of numbers
vect = [1.0, 2.0, 3.0, 4.0]
But this is perfectly valid code as well:
# The list of my objects:
objects = [1, [2, "a"], True, 'foo', object()]  # avoid naming it "list", which would shadow the built-in
This list contains an integer, another list, a boolean... In Python, you usually rely on duck typing for your variable types, so this is not a problem.
Finally, one of the methods of list is sort, which sorts it in-place, as you correctly guessed, and the range function generates a range of numbers.
The syntax for x in L: ... iterates over the contents of L (assuming it is iterable) and sets the variable x to each of the successive values in turn. For example:
>>> for x in ['a', 'b', 'c']:
...     print(x)
...
a
b
c
Since range generates a range of numbers, this is effectively the idiomatic way to generate a for i = 0; i < N; i += 1 type of loop:
>>> for i in range(4):  # list(range(4)) == [0, 1, 2, 3]
...     print(i)
...
0
1
2
3
I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i) #works
del L[bisect_left(L, i)] #doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
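Putting both parts together, a minimal sketch. `remove_first` is a hypothetical helper that checks whether X is present before deleting, since bisect_left alone only returns an insertion point. The search is O(log n), but the insert and the del still shift elements, so each operation remains O(n) overall.

```python
import bisect

L = [1, 3, 3, 5, 8]

# Insert while keeping L sorted (binary search + list insert).
bisect.insort(L, 4)
print(L)  # [1, 3, 3, 4, 5, 8]


def remove_first(sorted_list, x):
    """Delete the first occurrence of x from sorted_list.
    Raises ValueError if x is absent (mirroring list.remove)."""
    i = bisect.bisect_left(sorted_list, x)
    if i == len(sorted_list) or sorted_list[i] != x:
        raise ValueError(f"{x!r} not in list")
    del sorted_list[i]


remove_first(L, 3)
print(L)  # [1, 3, 4, 5, 8]
```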
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
You could use Raymond Hettinger's IndexableSkiplist. It performs three operations in O(log n) expected time:
insert value
remove value
lookup value by rank
import skiplist
import random
random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
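data = list(range(N))  # a list, since random.shuffle needs a mutable sequence
random.shuffle(data)
for num in data:
    skip.insert(num)
print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for num in data[:N//2]:
    skip.remove(num)
print(list(skip))
# e.g. [0, 3, 4, 6, 9]; the exact values depend on the shuffle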