Quick Sort algorithm with three way partition - python

I am new to algorithms and was working on implementing the Quick Sort algorithm with a 3-way partition such that it works fast even on sequences containing many equal elements. The following was my implementation:
def randomized_quick_sort(a, l, r):
if l >= r:
return
k = random.randint(l, r)
a[l], a[k] = a[k], a[l]
#use partition3
m1,m2 = partition3(a, l, r)
randomized_quick_sort(a, l, m1 - 1);
randomized_quick_sort(a, m2 + 1, r);
def partition3(a, l, r):
x, j, t = a[l], l, r
for i in range(l + 1, t+1):
if a[i] < x:
j += 1
a[i], a[j] = a[j], a[i]
elif a[i]>x:
a[i],a[t]=a[t],a[i]
i-=1
t-=1
a[l], a[j] = a[j], a[l]
return j,t
It does not generate correctly sorted lists. I found the correct implementation of the partition code here in Stack Overflow.
def partition3(a, l, r):
x, j, t = a[l], l, r
i = j
while i <= t :
if a[i] < x:
a[j], a[i] = a[i], a[j]
j += 1
elif a[i] > x:
a[t], a[i] = a[i], a[t]
t -= 1
i -= 1 # remain in the same i in this case
i += 1
return j,t
Can someone please explain to me how the incorrect partition implementation was failing?
Thanks in advance

Related

Python MergeSort not sorting correctly

I am trying to implement python Merge Sort, but for some reason when merging it does not sort it correctly at all. I am trying to turn Pseudo code into Python code and I am failing miserably. If anyone can help, I'd really appreciate it. I have tried to debug it and I am just confused.
def mergeSort(A, p, r):
if p < r:
q = int(((p + (r-1)) / 2))
mergeSort(A, p, q)
mergeSort(A, q + 1, r)
merge(A, p, q, r)
def merge(A, p, q, r): # Issue with sorting
n1 = q - p + 1
n2 = r - q
L = [0] * n1
R = [0] * n2
for i in range(0, n1):
L[i] = A[p + i]
for j in range(0, n2):
R[j] = A[q + j]
L.append(infinity)
R.append(infinity)
i = 0
j = 0
for k in range(p,r):
if L[i] <= R[j]:
A[k] = L[i]
i = i + 1
else:
A[k] = R[j]
j = j + 1
Example Input:
['date', 'apple', 'banana', 'cucumber', 'acorn', 'aaaa']
Example Output:
['banana', 'acorn', 'aaaa', 'cucumber', 'date', 'date']
mergeSort psuedocode
merge pseudocode
The issue with your implementation is that you are not properly updating the indices when merging the two sublists back together. In the following lines:
for k in range(p,r):
if L[i] <= R[j]:
A[k] = L[i]
i = i + 1
else:
A[k] = R[j]
j = j + 1
you should be incrementing k instead of i and j. Also the condition inside of the if statement should be L[i] <= R[j] instead of L[i] >= R[j].
Here is the corrected version of the merge function:
def merge(A, p, q, r):
n1 = q - p + 1
n2 = r - q
L = [0] * n1
R = [0] * n2
for i in range(0, n1):
L[i] = A[p + i]
for j in range(0, n2):
R[j] = A[q + j]
i = 0
j = 0
k = p
while i < n1 and j < n2:
if L[i] <= R[j]:
A[k] = L[i]
i += 1
else:
A[k] = R[j]
j += 1
k += 1
while i < n1:
A[k] = L[i]
i += 1
k += 1
while j < n2:
A[k] = R[j]
j += 1
k += 1
Also, you have to import the math library and use math.inf instead of infinity.

How can I convert pseudocode to mergesort algorithm?

I need to convert pseudocode into a merge sort algorithm that mirrors that pseudocode. I am new to pseudocode so I'm having trouble with this. Can anyone tell me what is wrong with my algorithm? Please note that the arrays in the pseudocode are 1-indexed.
PSEUDOCODE:
MergeSort(A[1 .. n]):
if n > 1
m ← bn/2c
MergeSort(A[1 .. m])
MergeSort(A[m + 1 .. n])
Merge(A[1 .. n], m)
Merge(A[1 .. n], m):
i ← 1; j ← m + 1
for k ← 1 to n
if j > n
B[k] ← A[i]; i ← i + 1
else if i > m
B[k] ← A[j]; j ← j + 1
else if A[i] < A[ j]
B[k] ← A[i]; i ← i + 1
else
B[k] ← A[j]; j ← j + 1
for k ← 1 to n
A[k] ← B[k]
MY CODE
def mergeSort(arr):
n = len(arr)
if n > 1:
m = n//2
mergeSort(arr[:m])
mergeSort(arr[m:])
merge(arr, m)
def merge(arr, m):
n = len(arr)
i = 0
j = m
b = [0] * n
for k in range(n):
if j >= n:
b[k] = arr[i]
i += 1
elif i > m-1:
b[k] = arr[j]
j += 1
elif arr[i] < arr[j]:
b[k] = arr[i]
i += 1
else:
b[k] = arr[j]
j += 1
for k in range(n):
arr[k] = b[k]
The thing is that in the pseudo code version, the notation A[1..m] is supposed to mean a partition of the array, but in-place, not as a new array (slice): it is like a window on a part of the array, with its own indexing, but not copied.
The translation to list slicing in Python does not reflect this. arr[:m] creates a new list, and so whatever mergeSort(arr[:m]) does with that new list, it doesn't touch arr itself: all that work is for nothing, as it doesn't mutate arr, but a sliced copy of it, which we lose access to.
A solution is to not create slices, but to pass the start/end indices of the intended partition to the function call.
Here is the adapted code:
def mergeSort(arr):
mergeSortRec(arr, 0, len(arr))
def mergeSortRec(arr, start, end):
n = end - start
if n > 1:
m = start + n//2
mergeSortRec(arr, start, m)
mergeSortRec(arr, m, end)
merge(arr, start, m, end)
def merge(arr, start, m, end):
n = end - start
i = start
j = m
b = [0] * n
for k in range(n):
if j >= end:
b[k] = arr[i]
i += 1
elif i >= m:
b[k] = arr[j]
j += 1
elif arr[i] < arr[j]:
b[k] = arr[i]
i += 1
else:
b[k] = arr[j]
j += 1
for k in range(n):
arr[start + k] = b[k]
This code is working fine for me, however your operations are being applied in place, so you just need to call the function with the array to sort rather than getting the return value (which will always be None, because you provide no return in the function mergeSort)
arr = np.random.uniform(1, 10, 10)
print(arr)
[2.10748505 9.47408117 5.4620788 1.5585025 9.57387679 4.13719947
1.28671255 4.150946 2.84044402 6.56294717]
mergeSort(arr)
print(arr)
[1.28671255 1.5585025 2.10748505 2.84044402 4.13719947 4.150946
5.4620788 6.56294717 9.47408117 9.57387679]

Key comparisons in a merge-insertion hybrid sort

I was given the task with the merge-insertion sort described as(paraphrased):
Starting off with merge sort, once a threshold S(small positive integer) is reached, the algorithm will then sort the sub arrays with insertion sort.
We are tasked to find the optimal S value for varying length of inputs to achieve minimum key comparisons. I implemented the code by modifying what was available online to get:
def mergeSort(arr, l, r, cutoff):
if l < r:
m = l+(r-l)//2
if len(arr[l:r+1]) > cutoff:
return mergeSort(arr, l, m, cutoff)+mergeSort(arr, m+1, r, cutoff)+merge(arr, l, m, r)
else:
return insertionSort(arr, l, r+1)
return 0
def merge(arr, l, m, r):
comp = 0
n1 = m - l + 1
n2 = r - m
L = [0] * (n1)
R = [0] * (n2)
for i in range(0, n1):
L[i] = arr[l + i]
for j in range(0, n2):
R[j] = arr[m + 1 + j]
i = 0
j = 0
k = l
while i < n1 and j < n2:
if L[i] <= R[j]:
arr[k] = L[i]
i += 1
else:
arr[k] = R[j]
j += 1
k += 1
comp +=1
while i < n1:
arr[k] = L[i]
i += 1
k += 1
while j < n2:
arr[k] = R[j]
j += 1
k += 1
return comp
def insertionSort(arr, l, r):
comp = 0
for i in range(l+1, r):
key = arr[i]
j = i-1
while j >= l:
if key >= arr[j]:
comp += 1
break
arr[j + 1] = arr[j]
j -= 1
comp += 1
arr[j + 1] = key
return comp
However the graph I get for the minimum value of S against length is:
This means that a near-pure mergesort is almost always preferred over the hybrid. Which is against what is available online, saying that insertion sort will perform faster than mergesort at low values of S(~10-25). I can't seem to find any error with my code, so is hybrid sort really better than merge sort?
IMO the question is flawed.
Mergesort always performs N Lg(N) key comparisons, while Insertionsort takes N²/2 of them. Hence as of N=2, the comparison count favors Mergesort in all cases. (This is only approximate, as N does not always divide evenly).
But the number of moves as well as the overhead will tend to favor Insertionsort. So a more relevant metric is the actual running time which, unfortunately, will depend on the key length and type.

Merge sort in python: slicing vs iterating - impact on complexity

I want to check that my understanding of how python handles slices is correct.
Here's my implementation of merge sort:
def merge_sort(L):
def merge(a, b):
i, j = 0, 0
c = []
while i < len(a) and j < len(b):
if a[i] < b[j]:
c.append(a[i])
i += 1
elif b[j] < a[i]:
c.append(b[j])
j += 1
if a[i:]:
c.extend(a[i:])
if b[j:]:
c.extend(b[j:])
return c
if len(L) <= 1:
return L
else:
mid = len(L) // 2
left = merge_sort(L[:mid])
right = merge_sort(L[mid:])
return merge(left, right)
Am I right in thinking that I could replace this:
if a[i:]:
c.extend(a[i:])
if b[j:]:
c.extend(b[j:])
With this:
while i < len(a):
c.append(a[i])
i += 1
while j < len(b):
c.append(b[j])
j += 1
And have the exact same level of complexity? My understanding of slicing is that its complexity is equivalent to slice length? Is that correct?
Does the fact that I'm calling a slice twice (first in the condition, second time inside of it) make it 2x complexity?
Your implementation of mergesort has problems:
in the merge function's main loop, you do nothing if the values in a[i] and b[j] are equal, or more precisely if you have neither a[i] < b[i] nor a[i] > b[i]. This causes an infinite loop.
there is no need to define merge as a local function, actually there is no need to make it a separate function, you could inline the code and save the overhead of a function call.
Here is a modified version:
def merge_sort(L):
if len(L) <= 1:
return L
else:
mid = len(L) // 2
a = merge_sort(L[:mid])
b = merge_sort(L[mid:])
i, j = 0, 0
c = []
while i < len(a) and j < len(b):
if a[i] <= b[j]:
c.append(a[i])
i += 1
else:
c.append(b[j])
j += 1
if a[i:]:
c.extend(a[i:])
else:
c.extend(b[j:])
return c
Regarding performance, slicing or iterating has no impact on complexity since both operations have linear time cost.
Regarding performance, here are directions to try:
replace the test if a[i:] with if i < len(a). Creating the slice twice is costly.
perform the sort in place, avoiding the append operations
restructure the main loop to have a single test per iteration
Here is a modified version:
def merge_sort(L):
if len(L) <= 1:
return L
else:
mid = len(L) // 2
a = merge_sort(L[:mid])
b = merge_sort(L[mid:])
i, j, k = 0, 0, 0
while True:
if a[i] <= b[j]:
L[k] = a[i]
k += 1
i += 1
if (i == len(a)):
L[k:] = b[j:]
return L
else:
L[k] = b[j]
k += 1
j += 1
if (j == len(b)):
L[k:] = a[i:]
return L

Count number of comparisons for QuickSort

I want to count the number of comparisons in quicksort. In order to do so, I introduced a counting variable c. Although I think the implementation is correct, the counter is significantly higher than with insertion sort, which should not be the case. Have I done something wrong?
Here is my code.
def quick_sort(a):
c = 0
c = quickSortImpl(a, 0, len(a)-1, c)
return c
def quickSortImpl(a, l, r, c):
if r > l:
k, c = partition(a, l, r, c)
c = quickSortImpl(a, l, k-1, c)
c = quickSortImpl(a, k+1, r, c)
return c
def partition(a, l, r, c):
pivot = a[r]
i = l
j = r - 1
while True:
c += 1
while i < r and a[i] <= pivot:
c += 1
i += 1
c += 1
while j > l and a[j] >= pivot:
c += 1
j -= 1
if i < j:
a[i], a[j] = a[j], a[i]
else:
break
a[r] = a[i]
a[i] = pivot
return i, c
Click here for comparison between Insertion Sort vs Quick Sort

Categories