Trying to understand insertion sort algorithm - python

I'm reading some books on Python, data structures, and analysis and design of algorithms. I want to really understand the ins and outs of coding, and become an efficient programmer. It's difficult to ask the book to clarify, hence my question on Stack Overflow. I'm really finding algorithms and recursion challenging ... I posted some code (insertion sort) below, and I'm trying to understand exactly what's happening in it. I understand, generally, what is supposed to happen, but I'm not really getting the how and why.
From trying to analyze pieces of the code in Python's IDLE, I know that:
key (holds variables) = 8, 2, 4, 9, 3, 6
and that:
i (holds the length) = 7 ( 1, 2, 3, 4, 5, 6, 7)
I don't know why 1 is used in the first line: range(1, len(mylist)). Any help is appreciated.
mylist = [8, 2, 4, 9, 3, 6]
for j in range(1, len(mylist)):
    key = mylist[j]
    i = j
    while i > 0 and mylist[i-1] > key:
        mylist[i] = mylist[i - 1]
        i -= 1
    mylist[i] = key

Let me try to break this down.
Start by considering a list. It is "almost" sorted. That is, the first few elements are sorted, but the last element is not sorted. So it looks something like this:
[10, 20, 30, 50, 15]
Obviously, the 15 is in the wrong place. So how do we move it?
key = mylist[4]
mylist[4] = mylist[3]
mylist[3] = key
That'll switch around the 15 and the 50 so now the list looks like:
[10, 20, 30, 15, 50]
But we'd like to do this several times in a loop. To do that we can do:
while ???:
    key = mylist[i]
    mylist[i] = mylist[i-1]
    mylist[i-1] = key
    i -= 1
That loop will go back one position at a time swapping the two elements. That'll move the out of order position one place back each time. But how do we know when to stop?
Let's look again at our list and the moves we want to make:
[10, 20, 30, 50, 15]
[10, 20, 30, 15, 50]
[10, 20, 15, 30, 50]
[10, 15, 20, 30, 50]
# stop! we are sorted now!
But what is different that last time around? Every time we move the number one place back, it is because the 15 is less than the element on its left, meaning it's not sorted. When that is no longer true we should stop moving. But we can easily deal with that:
key = mylist[i]
while key < mylist[i-1]:
    mylist[i] = mylist[i-1]
    mylist[i-1] = key
    i -= 1
OK, but what happens if we now try to sort this list:
[10, 20, 1]
[10, 1, 20]
[1, 10, 20]
# ERROR!!
At this point something bad happens. We try to check whether key < mylist[i-1], but when we've reached the beginning, i = 0, so i-1 is -1 and this checks the end of the list. But we should stop moving left at this point...
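The wrap-around described above comes from Python's negative indexing, where index -1 refers to the last element. A minimal illustration, reusing the three-element list from the example (the variable name neighbour is only for this sketch):

```python
mylist = [1, 10, 20]   # state after the 1 has reached the front
i = 0
key = mylist[i]

# i - 1 is -1, which Python treats as "last element", not as an error.
neighbour = mylist[i - 1]
print(neighbour)         # 20
print(key < neighbour)   # True, so the unguarded loop would keep going
```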
If we reach the beginning of the list, we can't move our pivot/key further so we should stop. We update our while loop to handle that:
key = mylist[i]
while i > 0 and key < mylist[i-1]:
    mylist[i] = mylist[i-1]
    mylist[i-1] = key
    i -= 1
So now we have a technique for sorting an almost sorted list. But how can we use that to sort a whole list? We sort parts of the list at a time.
[8, 2, 4, 9, 3, 6]
First we sort the first 1 elements:
[8, 2, 4, 9, 3, 6]
Then we sort the first 2 elements:
[2, 8, 4, 9, 3, 6]
Then we sort the first 3 elements
[2, 4, 8, 9, 3, 6]
And so on:
[2, 4, 8, 9, 3, 6]
[2, 3, 4, 8, 9, 6]
[2, 3, 4, 6, 8, 9]
But how do we do that? With a for loop:
for j in range(len(mylist)):
    i = j
    key = mylist[i]
    while i > 0 and key < mylist[i-1]:
        mylist[i] = mylist[i-1]
        mylist[i-1] = key
        i -= 1
But we can skip the first time through, because a list of one element is obviously already sorted.
for j in range(1, len(mylist)):
    i = j
    key = mylist[i]
    while i > 0 and key < mylist[i-1]:
        mylist[i] = mylist[i-1]
        mylist[i-1] = key
        i -= 1
A few minor changes that make no difference to the result (shifting elements instead of swapping them, and writing key only once at the end) bring us back to your original code:
for j in range(1, len(mylist)):
    key = mylist[j]
    i = j
    while i > 0 and key < mylist[i-1]:
        mylist[i] = mylist[i-1]
        i -= 1
    mylist[i] = key
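For reference, the final loop can be wrapped in a function and run on the list from the question. This is only a sketch: the name insertion_sort and the decision to sort a copy rather than the caller's list are assumptions made here, not part of the original code.

```python
def insertion_sort(values):
    """Return a sorted copy of values using insertion sort."""
    result = list(values)            # work on a copy (an assumption of this sketch)
    for j in range(1, len(result)):
        key = result[j]              # element to insert into the sorted prefix
        i = j
        # Shift larger elements one slot right until key's position is found.
        while i > 0 and key < result[i - 1]:
            result[i] = result[i - 1]
            i -= 1
        result[i] = key
    return result


print(insertion_sort([8, 2, 4, 9, 3, 6]))   # [2, 3, 4, 6, 8, 9]
```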

The insertion sort algorithm works by trying to build up a sorted list of increasing length at the start of the array. The idea is that if you start off by building a one-element sorted list at the beginning, then a two-element list, then a three-element list, etc., that once you've built up an n-element sorted list, you have sorted the entire array and are done.
For example, given the array
3 1 4
We can split this into a zero-element sorted list and a three-element unsorted list:
| 3 1 4
Now, we add 3 to our sorted list. Since that list is now only one element long, it's automatically sorted:
3 | 1 4
Now, we want to add 1 to our sorted list. If we just add 1 to the end of the list like this:
3 1 | 4
then the sorted list is no longer sorted. To fix this, the inner loop of the insertion sort code works by repeatedly swapping the 1 with the element before it until it's in the proper position. In our case, we swap the 1 and the 3:
1 3 | 4
and since the 1 is now at the beginning of the array, we don't need to move it any more. This is why the inner loop runs while i > 0; once the index of the new element (i) is at the start of the array, there's nothing before it that could be any bigger.
Finally, we update the array by adding 4 to the sorted list. Since it's in sorted position, we're done:
1 3 4
And our array is now in sorted order.
Now, to your original question: why does the outer loop start at 1? This is a cute optimization trick. The idea is that any one-element array must automatically be sorted. This means that the algorithm can start off by saying that the first element of the array is a one-element sorted list. For example, given the array
2 7 1 8
The insertion sort algorithm could try splitting this array like this, putting an empty sorted list at the front:
| 2 7 1 8
But a marginally faster option is to split the list like this:
2 | 7 1 8
which is guaranteed to be safe because any one-element list is automatically sorted.
This is really an optimization of the algorithm on the part of the authors. The algorithm would work perfectly fine if the outer loop started at zero, but they've just decided to start it at one to avoid an unnecessary loop iteration.
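A quick way to convince yourself of this is to make the starting index a parameter and compare both choices. The function name insertion_sort_from and its start parameter are hypothetical, introduced only for this sketch:

```python
def insertion_sort_from(values, start):
    """Insertion sort whose outer loop begins at index `start` (0 or 1)."""
    result = list(values)
    for j in range(start, len(result)):
        key = result[j]
        i = j
        # When j == 0 this loop body never runs, because i > 0 is False.
        while i > 0 and result[i - 1] > key:
            result[i] = result[i - 1]
            i -= 1
        result[i] = key
    return result


data = [2, 7, 1, 8]
print(insertion_sort_from(data, 0))   # [1, 2, 7, 8]
print(insertion_sort_from(data, 1))   # [1, 2, 7, 8]
```

Starting at 0 only adds one iteration in which the first element is written back onto itself.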
Hope this helps!

Have a look at the while loop. It starts with i having the value of 1, but then i is decreased. So in the last line, the minimum value of i could be 0, which is the first element in the list. If you would start with 0, i would become -1, which is valid in Python but means the last element. Therefore the range has to start with 1.
I would like to mention that you are asking for insertion sort. I don't think that your code implements insertion sort. Looks rather like bubble sort or something like that.

The reason is that:
i = j
and that mylist is accessed like:
mylist[i - 1]
Therefore the first value is 0. If the range had started at 0, it would cause mylist to be accessed at position -1.


Later on i = j is set, and myList[i-1] is accessed. So, j must be j >= 1.
Added: setting j = 0 is logically wrong because in the loop myList[j-1] is accessed - this is just by doing static analysis of the code (and knowing i = j). Even if this cannot happen during runtime because of while i > 0, it is at least meaningless. If the expression myList[j-1] appears in the code, then it must surely be j >= 1.

The j-th iteration inserts the j-th element into the sorted elements before j. So it makes no sense to start with j=0. In the case j=1 the sublist below is myList[0:1], which is always sorted, and the loop inserts myList[1] into the sublist myList[0:2].
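The "sorted elements before j" property can be checked directly with an assertion at the top of each outer iteration. This is a sketch for experimentation only; the function name insertion_sort_checked is made up here, and the assert costs extra time on every pass:

```python
def insertion_sort_checked(mylist):
    """Insertion sort that verifies its loop invariant on every pass."""
    for j in range(1, len(mylist)):
        # Invariant: everything before index j is already sorted.
        assert mylist[:j] == sorted(mylist[:j])
        key = mylist[j]
        i = j
        while i > 0 and mylist[i - 1] > key:
            mylist[i] = mylist[i - 1]
            i -= 1
        mylist[i] = key
    return mylist


print(insertion_sort_checked([8, 2, 4, 9, 3, 6]))   # [2, 3, 4, 6, 8, 9]
```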

Related

Find index of minimum value in a Python sublist - min() returns index of minimum value in list

I've been working on implementing common sorting algorithms into Python, and whilst working on selection sort I ran into a problem finding the minimum value of a sublist and swapping it with the first value of the sublist, which from my testing appears to be due to a problem with how I am using min() in my program.
Here is my code:
def selection_sort(li):
    for i in range(0, len(li)):
        a, b = i, li.index(min(li[i:]))
        li[a], li[b] = li[b], li[a]
    return li
This works fine for lists that have zero duplicate elements within them:
>>> selection_sort([9,8,7,6,5,4,3,2,1])
[1, 2, 3, 4, 5, 6, 7, 8, 9]
However, it completely fails when there are duplicate elements within the list.
>>> selection_sort([9,8,8,7,6,6,5,5,5,4,2,1,1])
[8, 8, 7, 6, 6, 5, 5, 5, 4, 2, 9, 1, 1]
I tried to solve this problem by examining what min() is doing on line 3 of my code. I found that I do get the index of the smallest element inside the sublist, but it is the index of that element within the larger list rather than within the sublist, which I hope this experiment helps to illustrate more clearly:
>>> a = [1,2,1,1,2]
>>> min(a)
1 # expected
>>> a.index(min(a))
0 # also expected
>>> a.index(min(a[1:]))
0 # should be 1?
I'm not sure what is causing this behaviour; it could be possible to copy li[i:] into a temporary variable b and then do b.index(min(b)), but copying li[i:] into b for each loop might require a lot of memory, and selection sort is an in-place algorithm so I am uncertain as to whether this approach is ideal.
You're not quite getting the concept correctly!
li.index(item) will return the first appearance of that item in the list li.
What you should do instead is this: if you're finding the minimum element in the sublist, search for that element in the sublist as well, instead of searching for it in the whole list. When searching in the sliced list, you get the index relative to the sublist, but you can easily fix that by adding the start offset of the slice to the index returned.
A small fix for your problem would be:
def selection_sort(li):
    for i in range(0, len(li)):
        a, b = i, i + li[i:].index(min(li[i:]))
        li[a], li[b] = li[b], li[a]
    return li
When you write a.index(min(a[1:])) you are searching for the first occurrence of the min of a[1:], but you are searching in the original list. That's why you get 0 as a result.
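One way to keep the search inside the tail without adding an offset by hand is the optional start argument of list.index, which begins the search at that position but still returns an index into the whole list. A sketch under that approach (the local names smallest and b are chosen here for illustration):

```python
def selection_sort(li):
    """In-place selection sort that searches only the unsorted tail."""
    for i in range(len(li)):
        smallest = min(li[i:])        # note: the slice still copies the tail
        b = li.index(smallest, i)     # search from position i onward
        li[i], li[b] = li[b], li[i]
    return li


a = [1, 2, 1, 1, 2]
print(a.index(min(a[1:]), 1))   # 2
print(selection_sort([9, 8, 8, 7, 6, 6, 5, 5, 5, 4, 2, 1, 1]))
```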
By the way, the function you are looking for is generally called argmin. It is not built into pure Python, but the numpy module has it.
One way you can do it is using list comprehension:
idxs = [i for i, val in enumerate(a) if val == min(a)]
Or even better, write your own code, which is faster asymptotically:
idxs = []
minval = None
for i, val in enumerate(a):
    if minval is None or minval > val:
        idxs = [i]
        minval = val
    elif minval == val:
        idxs.append(i)
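The loop above can be wrapped in a small helper, shown here next to a one-line variant for the case where only the first minimum index is needed. Both function names (argmin_all, argmin_first) are hypothetical and not part of any library:

```python
def argmin_all(a):
    """Return (minimum value, list of every index where it occurs)."""
    idxs = []
    minval = None
    for i, val in enumerate(a):
        if minval is None or val < minval:
            idxs = [i]          # new minimum: restart the index list
            minval = val
        elif val == minval:
            idxs.append(i)      # another occurrence of the current minimum
    return minval, idxs


def argmin_first(a):
    """Return the index of the first smallest element, without slicing."""
    return min(range(len(a)), key=a.__getitem__)


a = [1, 2, 1, 1, 2]
print(argmin_all(a))     # (1, [0, 2, 3])
print(argmin_first(a))   # 0
```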

python pop() giving unexpected results [duplicate]

I was practicing python 'list variable' with 'for loop', but was surprised to see that the order of the items in the list changed.
xlist = [1, 2, 3, 4, 5]
print(xlist)
# loop over all items in xlist
for item in xlist:
    print(item)
    # multiply each item by 5
    xlist[xlist.index(item)] = item * 5
    # print the list
    print(xlist)
I was expecting the list order to be [5,10,15,20,25] but instead I got [25, 10, 15, 20, 5].
I am using Python 3.8 (32-bit) with the PyCharm IDE.
Can anyone clarify why the order of the list has changed?
You are not using the .index method correctly. There are two problems. First, semantically, it doesn't mean what you think it means: it gives you the first index of some object in a list. So note that on your last iteration:
xlist.index(5) == 0
Because on your first iteration, you set:
xlist[0] = 1 * 5
The correct way to do this is to maintain an index as you iterate, either manually by using something like index = 0 outside the loop and incrementing it, or by iterating over a range and extracting the item using that index. But the pythonic way to do this is to use enumerate, which automatically provides a counter when you loop:
for index, item in enumerate(xlist):
    xlist[index] = item*5
The other problem is even if your items were all unique and the index returned was correct, using .index in a loop is unnecessarily making your algorithm quadratic time, since .index takes linear time.
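Both problems (the wrong index and the linear scan on every iteration) disappear if no index lookup is done at all. As a sketch of one alternative, a list comprehension builds the new values in a single pass, and slice assignment writes them back into the same list object; the name alias exists only to show that the object is unchanged:

```python
xlist = [1, 2, 3, 4, 5]
alias = xlist                  # second name for the same list object

# Build the scaled values in one pass, then overwrite the contents in place.
xlist[:] = [item * 5 for item in xlist]

print(xlist)            # [5, 10, 15, 20, 25]
print(alias is xlist)   # True
```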
The index method returns the index of the first occurrence of the item you have passed as an argument (assuming it exists). So, by the time you reach the last element, i.e. 5 at index 4, the item at index 0 is also 5, so you get 5 * 5 at index 0 in the final result.
When the index method is searching for the 5th number (5) it locates the first index that has that value. At this point in time, index 0 (the 1st number) is also 5 so it multiplies index 0 by 5. A better way to loop through is to use the enumerate function to get each index together with its item and modify the number at that index, rather than finding the index afterwards. This eliminates the troubles with the index method.
xlist = [1, 2, 3, 4, 5]
print(xlist)
# loop over all items in xlist
for i, item in enumerate(xlist):
    print(item)
    # multiply each item by 5
    xlist[i] *= 5
    # print the list
    print(xlist)
Results:
[1, 2, 3, 4, 5]
1
[5, 2, 3, 4, 5]
2
[5, 10, 3, 4, 5]
3
[5, 10, 15, 4, 5]
4
[5, 10, 15, 20, 5]
5
[5, 10, 15, 20, 25]

Why is the range loop in bubble sort reversed?

I am new to Python and learning data structure in Python. I am trying to implement a bubble sort algorithm in python and I did well but I was not getting a correct result. Then I found some tutorial and there I saw that they are first setting a base range for checking.
So the syntax of range in python is:
range([start], stop[, step])
And the bubble sort algorithm is:
def bubbleSort(alist):
    for i in range(len(alist) - 1, 0, -1):
        for j in range(i):
            if alist[j] > alist[j+1]:
                temp = alist[j]
                alist[j] = alist[j+1]
                alist[j+1] = temp
    return alist
print(bubbleSort([5, 1, 2, 3, 9, 8, 0]))
I understood all the other logic of the algorithm but I am not able to get why the loop is starting from the end of the list and going till first element of the list:
for i in range(len(alist) - 1, 0, -1):
Why is this traversing the list in reverse? The main purpose of this loop is setting the range condition only so why can't we traverse from the first element to len(list) - 1 like this:
for i in range(0, len(alist) - 1, 1):
In your code, the index i is the largest index that the inner loop will consider when swapping the elements. The way bubble sort works is by swapping adjacent elements to move the largest element to the right.
This means that after the first outer iteration (or the first full cycle of the inner loop), the largest element of your list is positioned at the far end of the list. So it’s already in its correct place and does not need to be considered again. That’s why for the next iteration, i is one less: it skips the last element and only looks at the items at indices 0 through len(alist) - 2.
Then in the next iteration, the last two elements will be sorted correctly, so it only needs to look at the items at indices 0 through len(alist) - 3, and so on.
So you want to decrement i since more and more elements at the end of the list will already be in their correct positions and don’t need to be looked at any longer. You don’t have to do that; you could also just always have the inner loop go up to the very end, but you don’t need to, so you can skip a few iterations by not doing it.
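To see how many iterations the shrinking bound saves, here is a small experiment that counts comparisons for both variants. The function bubble_sort_counting and its shrink flag are invented for this sketch; it sorts a copy so the same input can be reused:

```python
def bubble_sort_counting(alist, shrink=True):
    """Bubble sort a copy of alist and report how many comparisons were made."""
    alist = list(alist)
    comparisons = 0
    last = len(alist) - 1
    for i in range(last, 0, -1):
        # Either stop at the unsorted boundary i, or always scan to the end.
        upper = i if shrink else last
        for j in range(upper):
            comparisons += 1
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]
    return alist, comparisons


data = [5, 1, 2, 3, 9, 8, 0]
print(bubble_sort_counting(data, shrink=True))    # ([0, 1, 2, 3, 5, 8, 9], 21)
print(bubble_sort_counting(data, shrink=False))   # ([0, 1, 2, 3, 5, 8, 9], 36)
```

Both variants sort the list; the shrinking bound simply avoids re-checking the tail that is already in place.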
I asked why we are going reverse in the list like len(list)-1,0. Why are we not going forward way like 0,len(list)-1?
I was hoping that the above explanation would already cover that but let’s go into detail. Try adding a print(i, alist) at the end of the outer loop. So you get the result for every iteration of i:
>>> bubbleSort([5, 1, 3, 9, 2, 8, 0])
6 [1, 3, 5, 2, 8, 0, 9]
5 [1, 3, 2, 5, 0, 8, 9]
4 [1, 2, 3, 0, 5, 8, 9]
3 [1, 2, 0, 3, 5, 8, 9]
2 [1, 0, 2, 3, 5, 8, 9]
1 [0, 1, 2, 3, 5, 8, 9]
As you can see, the list will be sorted from the right to the left. This works well for our index i which will limit how far the inner loop will go: For i = 4, for example, the last 2 elements are already in place, so the inner loop only has to look at the first 5 elements (indices 0 through 4).
Now, let’s try changing the range to go in the other direction. The loop will be for i in range(0, len(alist)). Then we get this result:
>>> bubbleSort([5, 1, 3, 9, 2, 8, 0])
0 [5, 1, 3, 9, 2, 8, 0]
1 [1, 5, 3, 9, 2, 8, 0]
2 [1, 3, 5, 9, 2, 8, 0]
3 [1, 3, 5, 9, 2, 8, 0]
4 [1, 3, 5, 2, 9, 8, 0]
5 [1, 3, 2, 5, 8, 9, 0]
6 [1, 2, 3, 5, 8, 0, 9]
As you can see, this is not sorted at all. But why? i still limits how far the inner loop will go, so at i = 1, the loop will only look at the first pair and sort that; the rest will stay the same. At i = 2, the loop will look at the first two pairs and swap those (once!); the rest will stay the same. And so on. By the time the inner loop can reach the last element (which is only on the final iteration), there aren’t enough iterations left to swap the zero (which also happens to be the smallest element) to the very left.
This is again because bubble sort works by sorting the largest elements to the rightmost side first. So we have to start the algorithm by making the inner loop be able to reach that right side completely. Only when we are certain that those elements are in the right position, we can stop going that far.
There is one way to use an incrementing outer loop: by sorting the smallest elements first. But this also means that we have to start the inner loop on the far right side to make sure that we check all elements as we look for the smallest element. So we really have to make those loops go in opposite directions.
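A sketch of that mirrored variant, with an incrementing outer loop and an inner loop that walks from the right end toward the left. The function name bubble_sort_min_first is an assumption made for illustration:

```python
def bubble_sort_min_first(alist):
    """Bubble sort that moves the smallest remaining element to the left."""
    n = len(alist)
    for i in range(n - 1):               # i grows: positions before i are final
        for j in range(n - 1, i, -1):    # inner loop walks right to left
            if alist[j - 1] > alist[j]:
                alist[j - 1], alist[j] = alist[j], alist[j - 1]
    return alist


print(bubble_sort_min_first([5, 1, 3, 9, 2, 8, 0]))   # [0, 1, 2, 3, 5, 8, 9]
```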
It's because when you bubble from the start of the list to the end, the final result is that the last item in the list will be sorted (you've bubbled the largest item to the end). As a result, you don't want to include the last item in the list when you do the next bubble (you know it's already in the right place). This means the list you need to sort gets shorter, starting at the end and going down towards the start. In this code, i is always the last index of the remaining unsorted part of the list.
You can use this for loop:
for i in range(0,len(alist)-1,1):
but consequently you should change your second loop, so that j+1 never runs past the end of the list:
for j in range(0,len(alist)-i-1,1):
I think the purpose of using reverse iteration in the first line is to simplify the second loop. This is the advantage of using Python.
As @Jeremy McGibbon's answer explains, the logic behind bubble sort is to keep j from reaching the sorted part at the end of the list. In the example code, the range of j shrinks as the value of i decreases. When you change i to increase instead, you have to handle the j loop differently.
You can write the code as follows:
lst = [9,6,5,7,8,3,2,1,0,4]
lengthOfArray = len(lst) - 1
for i in range(lengthOfArray):
    for j in range(lengthOfArray - i):
        if lst[j] > lst[j + 1]:
            lst[j], lst[j + 1] = lst[j + 1], lst[j]
print(lst)

Iterating over a list python program

I want to know how to iterate over a list in my code. I want to get rid of every value that is equal to 10, but after completing the program I got an index out of range error. I want to know what that means and how I could refine my code so that it removes every value equal to ten and then returns the new list without the 10 values.
Here is my code:
mylist = [10,10,10,10,10,10,9,9,9,9,9,0,0,0]
for i in range(len(mylist)):
    if mylist[i] == 10:
        mylist.pop(i)
print(mylist)
An index out of range error occurs when you try to access a position past the number of items in a data structure. For example, if a list has 3 items and you try to access a 4th, you get an IndexError.
The issue you are having is because you are mutating the list as you iterate over it.
It's generally not a good idea to edit the data you are currently looping over. A better way might be to use a list comprehension like this:
mylist = [10,10,10,10,10,10,9,9,9,9,9,0,0,0]
mylist = [x for x in mylist if x != 10]
>>> mylist
[9, 9, 9, 9, 9, 0, 0, 0]
Try the following if your goal is just to eliminate the items with value of 10:
>>> mylist = [10,10,10,10,10,10,9,9,9,9,9,0,0,0]
>>> [x for x in mylist if x != 10]
[9, 9, 9, 9, 9, 0, 0, 0]
Why you are getting Index Out Of Range: At the start of your program, len(mylist) = 14, but as you remove elements from the list, the length of your list decreases, and you end up accessing indices that do not exist.
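The failure can be reproduced and inspected by catching the exception. This is a diagnostic sketch only; the variable failed_at is introduced here to record where the loop broke:

```python
mylist = [10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 0, 0, 0]
failed_at = None
try:
    for i in range(len(mylist)):   # range(14) is fixed before the loop starts
        if mylist[i] == 10:
            mylist.pop(i)          # the list shrinks and later items shift left
except IndexError:
    failed_at = i

print(failed_at, len(mylist))   # 11 11
print(mylist)                   # [10, 10, 10, 9, 9, 9, 9, 9, 0, 0, 0]
```

Besides the error, note that three of the 10s were never removed: each pop shifts the next 10 into a position the loop has already passed.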
How to fix it: You can try accessing the elements in reverse order and removing each 10 as you encounter it.
a = [10,10,10,10,10,10,9,9,9,9,9,0,0,0]
for i, e in reversed(list(enumerate(a))):
    if e == 10:
        a.pop(i)
print(a)
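Two other in-place variants, sketched here as alternatives rather than as the answer's own method: walking the indices backwards with range, which avoids building the intermediate list of (index, value) pairs, and slice assignment, which replaces the contents of the existing list object in one step.

```python
a = [10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 0, 0, 0]
for i in range(len(a) - 1, -1, -1):   # last index down to 0
    if a[i] == 10:
        del a[i]                      # only shifts items that were already visited
print(a)   # [9, 9, 9, 9, 9, 0, 0, 0]

b = [10, 9, 10, 0]
b[:] = [x for x in b if x != 10]      # slice assignment mutates b in place
print(b)   # [9, 0]
```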

Python. Greedy algorithm, 2 for loops iterating over lists

I am trying to write a greedy algorithm where I have a constraint on how many items I can fit into a box and I have to fit as many items in a box as possible before moving on to the next one (ie maximizing the weight of each box).
I've been trying to solve this by creating two identical lists, let's say a_list and b_list.
a_list = [9, 8, 6, 4, 3, 2, 2, 2]
b_list = [9, 8, 6, 4, 3, 2, 2, 2]
The constraint on each box is 10 here, so for example I can only fit the first item (9) into one before moving onto the next box. The following box should contain 8 + 2.
Each box is a list within the main list ie
list_ = [[9], [8,2],[6,4].....]
I can only move on to next box once the current one cannot have further items fitted into it.
When I am trying to iterate through the two lists I don't know how to delete items to avoid them appearing multiple times in list_.
I'm close but I have a couple of items coming up twice while one doesn't come up at all.
It is also the case that despite my sorting the lists in descending order, not all my boxes are optimal, one of them only has one item with value '2' in it. I know it's to do with the loop but I don't understand why it's not going through the items in descending order.
limit = 10
list_ = [[]]
for i in a_list:
    for j in b_list:
        if sum(list_[-1]) + i + j <= limit:
            list_[-1].append(i)
            list_[-1].append(j)
            b_list.remove(j)
        elif sum(list_[-1]) + j <= limit:
            list_[-1].append(j)
            b_list.remove(j)
        else:
            list_.append([])
The only reason I think that you're using an a_list and a b_list is that you assume you need to pick two items per box, which need not be the case.
I think you should use a single list, and use a list index based approach to track which items are added.
You also will have issues with deletes, so try setting items that are added to -1 and filter them out after each pass, to avoid confusions with deletes while looping.
I'm resisting sharing the solution code here, but ping me if you need it.
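One possible reading of the suggestion above, offered only as a sketch and not as the answerer's own solution: keep a single list, mark each item that has been placed with -1, and filter the marks out after every pass. The function name pack_boxes, the assumption that all weights are positive, and the ValueError for an oversized item are choices made here.

```python
def pack_boxes(items, limit):
    """Greedily fill one box at a time, largest items first."""
    remaining = sorted(items, reverse=True)
    boxes = []
    while remaining:
        box = []
        for idx, weight in enumerate(remaining):
            if sum(box) + weight <= limit:
                box.append(weight)
                remaining[idx] = -1       # mark as used instead of deleting
        if not box:
            # Nothing fit into an empty box, so some item exceeds the limit.
            raise ValueError("an item is larger than the box limit")
        boxes.append(box)
        # Drop the marked items only after the pass is finished.
        remaining = [w for w in remaining if w != -1]
    return boxes


print(pack_boxes([9, 8, 6, 4, 3, 2, 2, 2], 10))
# [[9], [8, 2], [6, 4], [3, 2, 2]]
```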
Changing a list as you iterate over it is always a challenge. Here is one solution that uses a while loop, which I generally don't endorse, but it is a simple enough algorithm that it should work here with no issues.
The while loop checks if there are any elements left in the initial list. It then pops (removes and saves) the first element of the list and iterates over the rest of the list looking for additional elements that keep the sum at or below the constraint. If an element is found, it is appended to the sub-list and its index is recorded. At the end of the for loop the sub-list is appended to the output list, and then the recorded indices are removed in reverse order.
a_list = [9, 8, 6, 4, 3, 2, 2, 2]
constraint = 10
out = []
while a_list:
    # grab the first element of a_list and reset the list of
    # indices to pop from a_list
    sub_out = [a_list.pop(0)]
    pop_list = []
    for i, a in enumerate(a_list):
        if a + sum(sub_out) <= constraint:
            sub_out.append(a)
            pop_list.append(i)
    # append the sub-list to the output list
    out.append(sub_out)
    # remove each item in pop_list in reverse order
    for i in reversed(pop_list):
        a_list.pop(i)
# output:
>>> out
[[9], [8, 2], [6, 4], [3, 2, 2]]
