QuickSort with Pivot Function issue? - python

I have been looking at this for the past couple hours and can still not understand where I have messed up. I keep getting Index out of bounds errors like below:
Each small edit or change i have done, has run me into another error, then i end back up here after trying to simplify my code.
def quickSort(alist):
firstList = []
secondList = []
thirdList = []
if(len(alist) > 1):
#pivot = pivot_leftmost(alist)
#pivot = pivot_best_of_three(alist)
pivot = pivot_ninther(alist)
#pivot = pivot_random(alist)
for item in alist:
if(item < pivot):
firstList.append(item)
if(item == pivot):
secondList.append(item)
if(item > pivot):
thirdList.append(item)
sortedList = quickSort(firstList) + secondList + quickSort(thirdList)
return sortedList
else:
print("list:", alist)
print("sorted, nothing to do") #debug
print("") #debug
return alist
def pivot_ninther(alist):
listLength = int(len(alist))
third = int(listLength / 3)
leftList = alist[:third]
midlist = alist[third:(third * 2)]
lastlist = alist[(third * 2):(third * 3)]
leftBest = pivot_best_of_three(leftList)
midBest = pivot_best_of_three(midlist)
lastBest = pivot_best_of_three(lastlist)
pivots = [leftBest, midBest, lastBest]
return pivot_best_of_three(pivots)
I am pretty sure a fresh pair of eyes can easily find it, but i have been lookig at it for hours. Thanks!
UPDATE: (My Best_of_three function)
def pivot_best_of_three(alist):
leftmost = 0
middle = int(len(alist) / 2)
rightmost = len(alist) - 1
if (alist[leftmost] <= alist[middle] <= alist[rightmost] or alist[rightmost] <= alist[middle] <= alist[leftmost]):
return alist[middle]
elif (alist[middle] <= alist[leftmost] <= alist[rightmost] or alist[rightmost] <= alist[leftmost] <= alist[middle]):
return alist[leftmost]
else:
return alist[rightmost]

The IndexError occurs when pivot_best_of_three tries to find the rightmost member of a list of zero items. The simple way to fix that is to simply not pass it such lists. :)
Here are slightly modified versions of those functions. I've tested these versions with lists of various lengths, down to length zero, and they appear to function correctly.
def pivot_ninther(alist):
listLength = len(alist)
if listLength < 3:
return alist[0]
third = listLength // 3
leftList = alist[:third]
midlist = alist[third:-third]
lastlist = alist[-third:]
leftBest = pivot_best_of_three(leftList)
midBest = pivot_best_of_three(midlist)
lastBest = pivot_best_of_three(lastlist)
pivots = [leftBest, midBest, lastBest]
return pivot_best_of_three(pivots)
def pivot_best_of_three(alist):
leftmost = alist[0]
middle = alist[len(alist) // 2]
rightmost = alist[-1]
if (leftmost <= middle <= rightmost) or (rightmost <= middle <= leftmost):
return middle
elif (middle <= leftmost <= rightmost) or (rightmost <= leftmost <= middle):
return leftmost
else:
return rightmost
As you can see, I've simplified pivot_best_of_three so it doesn't index alist multiple times for the same value.
But it can be simplified further by using a simple sorting network:
def sort3(a, b, c):
if c < b: b, c = c, b
if b < a: a, b = b, a
if c < b: b, c = c, b
return a, b, c
def pivot_best_of_three(alist):
leftmost = alist[0]
middle = alist[len(alist) // 2]
rightmost = alist[-1]
return sort3(leftmost, middle, rightmost)[1]

Try a smaller count (<= 20) and check to see what happens in pivot_ninther() when third == 0 (at the deepest level of recursion)? Seems like it would create empty arrays and then try to index them.
The code should check to make sure length >= 9 before calling pivot_ninther(), then >= 3 if calling ...best_of_three(). If just one or two items, pick one.
Suggestion, after you get the quicksort to work, back up the source code and rather than make new arrays, the pivot function should work with the original array and first / middle / last indexes.
You can use swaps to simplify finding the median of 3. This will help in cases like starting with a reversed order array.
// median of 3
i = lo, j = (lo + hi)/2, k = hi;
if (a[k] < a[i])
swap(a[k], a[i]);
if (a[j] < a[i])
swap(a[j], a[i]);
if (a[k] < a[j])
swap(a[k], a[j]);
pivot = a[j];
wiki article:
http://en.wikipedia.org/wiki/Quicksort#Choice_of_pivot

Related

Number of inversions with merge sort: Why do I not need to initialize a global variable here?

I implemented merge sort in python and am trying to get the number of inversions of a given array. An inversion is defined as a pair of indices 0 ≤ 𝑖 < 𝑗 < 𝑛 such that 𝑎𝑖 > 𝑎𝑗. This problem is also described as 'how far away is an array from being sorted'.
Here is my code:
def merge(a,b):
c = [] # Output array
i, j = 0, 0 # Indices of the left and right split arrays
global num_inversions
# It looks like there is no need to initialize this? I don't understand this
while i < len(a) and j < len(b):
if a[i] <= b[j]:
c.append(a[i])
i += 1
else:
c.append(b[j])
j += 1
num_inversions = num_inversions + len(a) - i
if i == len(a):
c.extend(b[j:])
else:
c.extend(a[i:])
return c
def mergesort(a):
# Base case
if len(a) <= 1:
return a
mid = len(a) // 2
left_array = mergesort(a[:mid])
right_array = mergesort(a[mid:])
return merge(left_array, right_array)
if __name__ == '__main__':
#inp = input()
#n, *a = list(map(int, inp.split()))
a = [6, 5, 4, 3, 2, 1] # Sample input. Gives the correct answer 15
a_sorted = mergesort(a)
print(num_inversions)
I seem to be getting the correct result. What I don't understand is why the global variable num_inversions can be declared in the function, and why it doesn't need to be initialized.
Initializing it in the function is obviously wrong because the value will keep getting rewritten each time the function is called recursively.
Edit: Sorry, this question was incredibly stupid. I realized I did initialize it at some point while testing it, and forgot.

Finding first pair of numbers in array that sum to value

Im trying to solve the following Codewars problem: https://www.codewars.com/kata/sum-of-pairs/train/python
Here is my current implementation in Python:
def sum_pairs(ints, s):
right = float("inf")
n = len(ints)
m = {}
dup = {}
for i, x in enumerate(ints):
if x not in m.keys():
m[x] = i # Track first index of x using hash map.
elif x in m.keys() and x not in dup.keys():
dup[x] = i
for x in m.keys():
if s - x in m.keys():
if x == s-x and x in dup.keys():
j = m[x]
k = dup[x]
else:
j = m[x]
k = m[s-x]
comp = max(j,k)
if comp < right and j!= k:
right = comp
if right > n:
return None
return [s - ints[right],ints[right]]
The code seems to produce correct results, however the input can consist of array with up to 10 000 000 elements, so the execution times out for large inputs. I need help with optimizing/modifying the code so that it can handle sufficiently large arrays.
Your code inefficient for large list test cases so it gives timeout error. Instead you can do:
def sum_pairs(lst, s):
seen = set()
for item in lst:
if s - item in seen:
return [s - item, item]
seen.add(item)
We put the values in seen until we find a value that produces the specified sum with one of the seen values.
For more information go: Referance link
Maybe this code:
def sum_pairs(lst, s):
c = 0
while c<len(lst)-1:
if c != len(lst)-1:
x= lst[c]
spam = c+1
while spam < len(lst):
nxt= lst[spam]
if nxt + x== s:
return [x, nxt]
spam += 1
else:
return None
c +=1
lst = [5, 6, 5, 8]
s = 14
print(sum_pairs(lst, s))
Output:
[6, 8]
This answer unfortunately still times out, even though it's supposed to run in O(n^3) (since it is dominated by the sort, the rest of the algorithm running in O(n)). I'm not sure how you can obtain better than this complexity, but I thought I might put this idea out there.
def sum_pairs(ints, s):
ints_with_idx = enumerate(ints)
# Sort the array of ints
ints_with_idx = sorted(ints_with_idx, key = lambda (idx, num) : num)
diff = 1000000
l = 0
r = len(ints) - 1
# Indexes of the sum operands in sorted array
lSum = 0
rSum = 0
while l < r:
# Compute the absolute difference between the current sum and the desired sum
sum = ints_with_idx[l][1] + ints_with_idx[r][1]
absDiff = abs(sum - s)
if absDiff < diff:
# Update the best difference
lSum = l
rSum = r
diff = absDiff
elif sum > s:
# Decrease the large value
r -= 1
else:
# Test to see if the indexes are better (more to the left) for the same difference
if absDiff == diff:
rightmostIdx = max(ints_with_idx[l][0], ints_with_idx[r][0])
if rightmostIdx < max(ints_with_idx[lSum][0], ints_with_idx[rSum][0]):
lSum = l
rSum = r
# Increase the small value
l += 1
# Retrieve indexes of sum operands
aSumIdx = ints_with_idx[lSum][0]
bSumIdx = ints_with_idx[rSum][0]
# Retrieve values of operands for sum in correct order
aSum = ints[min(aSumIdx, bSumIdx)]
bSum = ints[max(aSumIdx, bSumIdx)]
if aSum + bSum == s:
return [aSum, bSum]
else:
return None

python3 binary search not working [duplicate]

I am trying to implement the binary search in python and have written it as follows. However, I can't make it stop whenever needle_element is larger than the largest element in the array.
Can you help? Thanks.
def binary_search(array, needle_element):
mid = (len(array)) / 2
if not len(array):
raise "Error"
if needle_element == array[mid]:
return mid
elif needle_element > array[mid]:
return mid + binary_search(array[mid:],needle_element)
elif needle_element < array[mid]:
return binary_search(array[:mid],needle_element)
else:
raise "Error"
It would be much better to work with a lower and upper indexes as Lasse V. Karlsen was suggesting in a comment to the question.
This would be the code:
def binary_search(array, target):
lower = 0
upper = len(array)
while lower < upper: # use < instead of <=
x = lower + (upper - lower) // 2
val = array[x]
if target == val:
return x
elif target > val:
if lower == x: # these two are the actual lines
break # you're looking for
lower = x
elif target < val:
upper = x
lower < upper will stop once you have reached the smaller number (from the left side)
if lower == x: break will stop once you've reached the higher number (from the right side)
Example:
>>> binary_search([1,5,8,10], 5) # return 1
1
>>> binary_search([1,5,8,10], 0) # return None
>>> binary_search([1,5,8,10], 15) # return None
Why not use the bisect module? It should do the job you need---less code for you to maintain and test.
array[mid:] creates a new sub-copy everytime you call it = slow. Also you use recursion, which in Python is slow, too.
Try this:
def binarysearch(sequence, value):
lo, hi = 0, len(sequence) - 1
while lo <= hi:
mid = (lo + hi) // 2
if sequence[mid] < value:
lo = mid + 1
elif value < sequence[mid]:
hi = mid - 1
else:
return mid
return None
In the case that needle_element > array[mid], you currently pass array[mid:] to the recursive call. But you know that array[mid] is too small, so you can pass array[mid+1:] instead (and adjust the returned index accordingly).
If the needle is larger than all the elements in the array, doing it this way will eventually give you an empty array, and an error will be raised as expected.
Note: Creating a sub-array each time will result in bad performance for large arrays. It's better to pass in the bounds of the array instead.
You can improve your algorithm as the others suggested, but let's first look at why it doesn't work:
You're getting stuck in a loop because if needle_element > array[mid], you're including element mid in the bisected array you search next. So if needle is not in the array, you'll eventually be searching an array of length one forever. Pass array[mid+1:] instead (it's legal even if mid+1 is not a valid index), and you'll eventually call your function with an array of length zero. So len(array) == 0 means "not found", not an error. Handle it appropriately.
This is a tail recursive solution, I think this is cleaner than copying partial arrays and then keeping track of the indexes for returning:
def binarySearch(elem, arr):
# return the index at which elem lies, or return false
# if elem is not found
# pre: array must be sorted
return binarySearchHelper(elem, arr, 0, len(arr) - 1)
def binarySearchHelper(elem, arr, start, end):
if start > end:
return False
mid = (start + end)//2
if arr[mid] == elem:
return mid
elif arr[mid] > elem:
# recurse to the left of mid
return binarySearchHelper(elem, arr, start, mid - 1)
else:
# recurse to the right of mid
return binarySearchHelper(elem, arr, mid + 1, end)
def binary_search(array, target):
low = 0
mid = len(array) / 2
upper = len(array)
if len(array) == 1:
if array[0] == target:
print target
return array[0]
else:
return False
if target == array[mid]:
print array[mid]
return mid
else:
if mid > low:
arrayl = array[0:mid]
binary_search(arrayl, target)
if upper > mid:
arrayu = array[mid:len(array)]
binary_search(arrayu, target)
if __name__ == "__main__":
a = [3,2,9,8,4,1,9,6,5,9,7]
binary_search(a,9)
Using Recursion:
def binarySearch(arr,item):
c = len(arr)//2
if item > arr[c]:
ans = binarySearch(arr[c+1:],item)
if ans:
return binarySearch(arr[c+1],item)+c+1
elif item < arr[c]:
return binarySearch(arr[:c],item)
else:
return c
binarySearch([1,5,8,10,20,50,60],10)
All the answers above are true , but I think it would help to share my code
def binary_search(number):
numbers_list = range(20, 100)
i = 0
j = len(numbers_list)
while i < j:
middle = int((i + j) / 2)
if number > numbers_list[middle]:
i = middle + 1
else:
j = middle
return 'the index is '+str(i)
If you're doing a binary search, I'm guessing the array is sorted. If that is true you should be able to compare the last element in the array to the needle_element. As octopus says, this can be done before the search begins.
You can just check to see that needle_element is in the bounds of the array before starting at all. This will make it more efficient also, since you won't have to do several steps to get to the end.
if needle_element < array[0] or needle_element > array[-1]:
# do something, raise error perhaps?
It returns the index of key in array by using recursive.
round() is a function convert float to integer and make code fast and goes to expected case[O(logn)].
A=[1,2,3,4,5,6,7,8,9,10]
low = 0
hi = len(A)
v=3
def BS(A,low,hi,v):
mid = round((hi+low)/2.0)
if v == mid:
print ("You have found dude!" + " " + "Index of v is ", A.index(v))
elif v < mid:
print ("Item is smaller than mid")
hi = mid-1
BS(A,low,hi,v)
else :
print ("Item is greater than mid")
low = mid + 1
BS(A,low,hi,v)
BS(A,low,hi,v)
Without the lower/upper indexes this should also do:
def exists_element(element, array):
if not array:
yield False
mid = len(array) // 2
if element == array[mid]:
yield True
elif element < array[mid]:
yield from exists_element(element, array[:mid])
else:
yield from exists_element(element, array[mid + 1:])
Returning a boolean if the value is in the list.
Capture the first and last index of the list, loop and divide the list capturing the mid value.
In each loop will do the same, then compare if value input is equal to mid value.
def binarySearch(array, value):
array = sorted(array)
first = 0
last = len(array) - 1
while first <= last:
midIndex = (first + last) // 2
midValue = array[midIndex]
if value == midValue:
return True
if value < midValue:
last = midIndex - 1
if value > midValue:
first = midIndex + 1
return False

Python function to split and add lists recursively

I'm having trouble writing a function in python that will take a list, split it into 2 equal sides and then recursively add each element in each half. In the end returning the sum of both halves.
def sumLists(aList):
half = len(aList)//2
leftHalf = aList[half:]
rightHalf = aList[:half]
if len(aList) == 1:
return aList[0]
else:
sumOfLeft = sumLists(leftHalf[1:])
resultLeft = leftHalf[0] + sumOfLeft
sumOfRight = sumLists(rightHalf[1:])
resultRight = rightHalf[0] + sumOfRight
return resultLeft + resultRight
Any tips are appreciated, thanks!
You're overcomplicating the else block. You don't need to call sumLists on leftHalf[1:] and rightHalf[1:] and manually add the first respective elements; it suffices to call sumLists on the complete lists.
This slicing is what's causing your RuntimeError. A leftHalf with length one will have a leftHalf[1:] of length zero. But your function recurses forever for lengths of length zero because you didn't write an if case for that scenario.
You could rewrite your else so that it doesn't require slicing:
def sumLists(aList):
half = len(aList)//2
leftHalf = aList[half:]
rightHalf = aList[:half]
if len(aList) == 1:
return aList[0]
else:
return sumLists(leftHalf) + sumLists(rightHalf)
... Or you could add a special case for empty lists:
def sumLists(aList):
half = len(aList)//2
leftHalf = aList[half:]
rightHalf = aList[:half]
if len(aList) == 0:
return 0
elif len(aList) == 1:
return aList[0]
else:
sumOfLeft = sumLists(leftHalf[1:])
resultLeft = leftHalf[0] + sumOfLeft
sumOfRight = sumLists(rightHalf[1:])
resultRight = rightHalf[0] + sumOfRight
return resultLeft + resultRight
Or both:
def sumLists(aList):
half = len(aList)//2
leftHalf = aList[half:]
rightHalf = aList[:half]
if len(aList) == 0:
return 0
if len(aList) == 1:
return aList[0]
else:
return sumLists(leftHalf) + sumLists(rightHalf)
I think aList[half:] is the right side, and aList[:half] is the left side, is it right?
the follow code will sum the list left and right side, hope this can solve your problem.
def sumlist(l):
if not l or not isinstance(l, list):
return 0 # or return other default value.
if len(l) == 1:
return l[0]
half = len(l) // 2
left = l[:half] # left
right = l[-half:] # right
return sumlist(left) + sumlist(right)
test:
l = [1,2,3,4,5,6,7,8,9]
result = sumlist(l)
print(result) # 40
Maybe I misread your question, but do you actually need to do both things in one function?
To sum a list recursively, you could define that the empty list (your base case) has the sum 0. Any non-empty list has the sum of the first element plus the sum of the remainder (the "tail") of the list:
def sumList(a):
return 0 if not a else a[0] + sumList(a[1:])
To split a list recursively, you could keep track of the "left" and "right" list. The left list starts out being your input, the right list is empty. You then recursively prepend the last element of the left list to the right list until the right list is no longer shorter than the left:
def splitList(a, b=None):
b = b or []
return (a, b) if len(a) <= len(b) else splitList(a[:-1], [a[-1]] + b)
I suspect that passing around slices of lists is very inefficient in Python, so it might be better to rather pass around indices, e.g.
def sumList(a, idx=None):
idx = idx or 0
return 0 if idx >= len(a) else a[idx] + sumList(a, idx+1)

Implementing Median of Median Selection Algorithm in Python

Okay. I give up. I've been trying to implement the median of medians algorithm but I am continually given the wrong result. I know there is a lot of code below, but I can't find my error, and each chunk of code has a fairly process design. Quicksort is what I use to sort the medians I get from the median of medians pivot selection. Should be a straightforward quicksort implementation. getMean simply returns the mean of a given list.
getPivot is what I use to select the pivot. It runs through the list, taking 5 integers at a time, finding the mean of those integers, placing that mean into a list c, then finding the median of c. That is the pivot I use for the dSelect algorithm.
The dSelect algorithm is simple in theory. The base case returns an item when the list is 1 item long. Otherwise, much like in quickSort, I iterate over the list. If the number I am currently on, j, is less than the pivot, I move it to the left of the list, i, and increment i. If it is larger, I move it to the right of the list, i + 1, and do not increment i. After this loops through the entire list, I should have the pivot in its proper index, and print statements indicate that I do. At this point, I recurse to the left or the right depending on whether the pivot is greater than or less than the position I am trying to find.
I am not sure what other print statements to test at this point, so I'm turning to anyone dedicated enough to take a stab at this code. I know there are related topics, I know I could do more print statements, but believe me, I've tried. What should be a simple algo has got me quite stumped.
def quickSort(m, left, right):
if right - left <= 1:
return m
pivot = m[left]
i = left + 1
j = left + 1
for j in range(j, right):
if m[j] < pivot:
m[j], m[i] = m[i], m[j]
i += 1
elif m[j] == pivot:
m[j], m[i] = m[i], m[j]
print m
m[left], m[i-1] = m[i-1], m[left]
m = quickSort(m, left, i-1)
m = quickSort(m, i, right)
print m
return m
def getMedian(list):
length = len(list)
if length <= 1:
return list[0]
elif length % 2 == 0:
i = length/2
return list[i]
else:
i = (length + 1)/2
return list[i]
def getPivot(m):
c = []
i = 0
while i <= len(m) - 1:
tempList = []
j = 0
while j < 5 and i <= len(m) - 1:
tempList.append(m[i])
i = i + 1
j = j + 1
tempList = quickSort(tempList, 0, len(tempList) - 1)
c.append(getMedian(tempList))
c = quickSort(c, 0, len(c) - 1)
medianOfMedians = getMedian(c)
return medianOfMedians
def dSelect(m, position):
pivot = getPivot(m)
i = 0
j = 0
if len(m) <= 1:
return m[0]
for j in range(0, len(m)):
if m[j] < pivot:
m[j], m[i] = m[i], m[j]
i += 1
elif m[j] == pivot:
m[j], m[i] = m[i], m[j]
print "i: " + str(i)
print "j: " + str(j)
print "index of pivot: " + str(m.index(pivot))
print "pivot: " + str(pivot) + " list: " + str(m)
if m.index(pivot) == position:
return pivot
elif m.index(pivot) > position:
return dSelect(m[0:i], position)
else:
return dSelect(m[i:], position - i)
The biggest issue is with this line here:
i = (length + 1)/2
if list = [1, 2, 3, 4, 5, 6, 7] the answer should be 4 which is list[3]. Your version looks like the following:
i = (7 + 1) / 2
and so i is equal to 4 instead of 3. Similar problem with the even length list part although that shouldn't be as big of an issue.

Categories