Functional implementation of recursive merge sort?

Functional implementation of recursive merge sort? - python

it has been several days of trying to implement a functional recursive merge sort in Python. Aside from that, I want to be able to print out each step of the sorting algorithm. Is there any way to make this Python code run in a functional-paradigm way? This is what I have so far...
def merge_sort(arr, low, high):
# If low is less than high, proceed. Else, array is sorted
if low < high:
mid = (low + high) // 2 # Get the midpoint
merge_sort (arr, low, mid) # Recursively split the left half
merge_sort (arr, mid+1, high) # Recursively split the right half
return merge(arr, low, mid, high) # merge halves together
def merge (arr, low, mid, high):
temp = []
# Copy all the values into a temporary array for displaying
for index, elem in enumerate(arr):
temp.append(elem)
left_p, right_p, i = low, mid+1, low
# While left and right pointer still have elements to read
while left_p <= mid and right_p <= high:
if temp[left_p] <= temp[right_p]: # If the left value is less than the right value. Shift left pointer by 1 unit to the right
arr[i] = temp[left_p]
left_p += 1
else: # Else, place the right value into target array. Shift right pointer by 1 unit to the right
arr[i] = temp[right_p]
right_p += 1
i += 1 # Increment target array pointer
# Copy the rest of the left side of the array into the target array
while left_p <= mid:
arr[i] = temp[left_p]
i += 1
left_p += 1
print(*arr) # Display the current form of the array
return arr
def main():
# Get input from user
arr = [int(input()) for x in range(int(input("Input the number of elements: ")))]
print("Sorting...")
sorted_arr = merge_sort(arr.copy(), 0, len(arr)-1)
print("\nSorted Array")
print(*sorted_arr)
if __name__ == "__main__":
main()
Any help would be appreciated! Thank you.

In a purely functional merge sort, we don't want to mutate any values.
We can define a nice recursive version with zero mutation like so:
def merge(a1, a2):
if len(a1) == 0:
return a2
if len(a2) == 0:
return a1
if a1[0] <= a2[0]:
rec = merge(a1[1:], a2)
return [a1[0]] + rec
rec = merge(a1, a2[1:])
return [a2[0]] + rec
def merge_sort(arr):
if len(arr) <= 1:
return arr
halfway = len(arr) // 2
left = merge_sort(arr[:halfway])
right = merge_sort(arr[halfway:])
return merge(left, right)
You can add a print(arr) to the top of merge_sort to print step-by-step, however technically side-effects will make it impure (although still referentially transparent, in this case). In python, however, you can't separate out the side effects from the pure computation using monads, so if you want to really avoid this print, you'll have to return the layers, and print them at the end :)
Also, this version is technically doing a lot of copies of the list, so it's relatively slow. This can be fixed by using a linked list, and consing / unconsing it. However that's out of scope.

Related

Finding Pivot element in a sorted array, How do I optimize this code?

Is there a way to optimize this code?
I'm trying to find an element in the list where the sorted list is rotated. [8,9,10,11,12,6,7] Here the element I'm trying to find is 6 with an index 5.
class Solution:
def search(self, nums: List[int]) -> int:
lenn = len(nums)-1
pivot, mid = 0, (0 + lenn)//2
if(nums[0] > nums[lenn]):
while True:
if(nums[mid] > nums[mid+1]):
pivot = mid+1
break
if(nums[mid] < nums[mid-1]):
pivot = mid
break
if(nums[mid] > nums[lenn]):
mid += 1
if(nums[mid] < nums[lenn]):
mid -= 1
return pivot

Your solution is fast for small lists but not for big ones as it perform a linear-time search. To be faster, you can use a binary search. To do that, you need to use a keep a [start;end] range of values that contains the "pivot". You just need to take the middle item and find which part (left or right) contains increasing items.
Moreover, you can micro-optimize the code. The default implementation of Python is CPython which is a slow interpreter. It optimize almost nothing compared to compilers (of languages like C, Java, Rust) so you should do the optimizations yourself if you want a fast code. The idea is to use a linear search for few items and a binary search for many ones, and to store values in temporary variables not to compute them twice.
Here is the resulting implementation:
def search(nums):
first, last = nums[0], nums[-1]
# Trivial case
if first <= last:
return 0
lenn = len(nums)-1
# Micro-optimized fast linear search for few items (optional)
if lenn < 10:
pivot, mid = 0, lenn//2
while True:
middle = nums[mid]
if middle > nums[mid+1]:
return mid+1
elif middle < nums[mid-1]:
return mid
elif middle > last:
mid += 1
elif middle < last:
mid -= 1
# Binary search for many items
start, end = 0, lenn
while start+1 < end:
mid = (start + end) // 2
midVal = nums[mid]
if midVal < nums[start]:
end = mid
elif midVal > nums[end]:
start = mid
else:
assert False
return start+1

Recursive Quicksort using Lambdas

I'm trying to implement a recursive quicksort algorithm using two methods (swap, partition) while running the main algorithm using recursion in a lambda expression. I'm getting an infinite recursion error and honestly I can't find the syntax error. Can someone help me out? Thanks :)
def swap(array, a, b):
array[a], array[b] = array[b], array[a]
def partition(array, high, low):
pivot = array[high]
i = low
for x in range(low, high-1):
if array[x] < pivot:
i+=1
swap(array, array[x], array[high])
return i
g = lambda array, low, high: g(array,low,partition(array,high,low)-1)+g(array,partition(array,high,low)+1,high) if low < high else print("not sorted")

There are several issues in partition:
The call to swap is passing values from your list, instead of indices.
Even when the previous mistake is corrected, it will either move the pivot value to the low+1 index, or it will not move at all.
The returned index i, should be the one where the pivot was moved. In a correct implementation that means i is the last index to which a value was moved, which was the value at index high. This is not what is happening, as already with the first swap the pivot value is moved.
The swap should be of the current value with the value at i, so that all values up to the one at index i are less or equal to the pivot value.
Here is the corrected partition function:
def partition(array, high, low):
pivot = array[high]
i = low - 1
for x in range(low, high+1):
if array[x] <= pivot:
i+=1
swap(array, x, i)
return i
These are the issues in the function g:
It is supposed to perform the sort in-place, so the + operator for lists should not occur here, as that would create a new list. Moreover, the base case (in else) does not return anything, so the + operator will fail with an error
partition(array,high,low) is called twice, which is not only a waste, but the second call will in most cases return a different result, because the pivot can be different. This means the second call of g will potentially not work with an adjacent partition, but will either leave an (unsorted) gap, or work on an overlapping partition.
Here is a correction for the function g:
def g(array, low, high):
if low < high:
i = partition(array, high, low)
g(array, low, i-1)
g(array, i+1, high)
You should also consider using a better name than g, and change the order of the high/low parameters for partition: that reversed order is a good way to confuse the readers of your code.

Here is Hoare's quicksort algorithm implemented in Python -
def quicksort(A, lo, hi):
if lo >= 0 and hi >= 0 and lo < hi:
p = partition(A, lo, hi)
quicksort(A, lo, p)
quicksort(A, p + 1, hi)
def partition(A, lo, hi):
pivot = A[(hi + lo) // 2]
i = lo
j = hi
while True:
while A[i] < pivot:
i += 1
while A[j] > pivot:
j -= 1
if i >= j:
return j
swap(A, i, j)
def swap(A, i, j):
A[i], A[j] = A[j], A[i]
You can write g using lambda if you wish, but I would recommend to define an ordinary function instead -
g = lambda a: quicksort(a, 0, len(a) - 1)
Given a sample input, x -
x = [5,0,9,7,4,2,8,3,1,6]
g(x)
print(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
See this related Q&A if you would like to count the number of comparisons and swaps used.

python3 binary search not working [duplicate]

I am trying to implement the binary search in python and have written it as follows. However, I can't make it stop whenever needle_element is larger than the largest element in the array.
Can you help? Thanks.
def binary_search(array, needle_element):
mid = (len(array)) / 2
if not len(array):
raise "Error"
if needle_element == array[mid]:
return mid
elif needle_element > array[mid]:
return mid + binary_search(array[mid:],needle_element)
elif needle_element < array[mid]:
return binary_search(array[:mid],needle_element)
else:
raise "Error"

It would be much better to work with a lower and upper indexes as Lasse V. Karlsen was suggesting in a comment to the question.
This would be the code:
def binary_search(array, target):
lower = 0
upper = len(array)
while lower < upper: # use < instead of <=
x = lower + (upper - lower) // 2
val = array[x]
if target == val:
return x
elif target > val:
if lower == x: # these two are the actual lines
break # you're looking for
lower = x
elif target < val:
upper = x
lower < upper will stop once you have reached the smaller number (from the left side)
if lower == x: break will stop once you've reached the higher number (from the right side)
Example:
>>> binary_search([1,5,8,10], 5) # return 1
1
>>> binary_search([1,5,8,10], 0) # return None
>>> binary_search([1,5,8,10], 15) # return None

Why not use the bisect module? It should do the job you need---less code for you to maintain and test.

array[mid:] creates a new sub-copy everytime you call it = slow. Also you use recursion, which in Python is slow, too.
Try this:
def binarysearch(sequence, value):
lo, hi = 0, len(sequence) - 1
while lo <= hi:
mid = (lo + hi) // 2
if sequence[mid] < value:
lo = mid + 1
elif value < sequence[mid]:
hi = mid - 1
else:
return mid
return None

In the case that needle_element > array[mid], you currently pass array[mid:] to the recursive call. But you know that array[mid] is too small, so you can pass array[mid+1:] instead (and adjust the returned index accordingly).
If the needle is larger than all the elements in the array, doing it this way will eventually give you an empty array, and an error will be raised as expected.
Note: Creating a sub-array each time will result in bad performance for large arrays. It's better to pass in the bounds of the array instead.

You can improve your algorithm as the others suggested, but let's first look at why it doesn't work:
You're getting stuck in a loop because if needle_element > array[mid], you're including element mid in the bisected array you search next. So if needle is not in the array, you'll eventually be searching an array of length one forever. Pass array[mid+1:] instead (it's legal even if mid+1 is not a valid index), and you'll eventually call your function with an array of length zero. So len(array) == 0 means "not found", not an error. Handle it appropriately.

This is a tail recursive solution, I think this is cleaner than copying partial arrays and then keeping track of the indexes for returning:
def binarySearch(elem, arr):
# return the index at which elem lies, or return false
# if elem is not found
# pre: array must be sorted
return binarySearchHelper(elem, arr, 0, len(arr) - 1)
def binarySearchHelper(elem, arr, start, end):
if start > end:
return False
mid = (start + end)//2
if arr[mid] == elem:
return mid
elif arr[mid] > elem:
# recurse to the left of mid
return binarySearchHelper(elem, arr, start, mid - 1)
else:
# recurse to the right of mid
return binarySearchHelper(elem, arr, mid + 1, end)

def binary_search(array, target):
low = 0
mid = len(array) / 2
upper = len(array)
if len(array) == 1:
if array[0] == target:
print target
return array[0]
else:
return False
if target == array[mid]:
print array[mid]
return mid
else:
if mid > low:
arrayl = array[0:mid]
binary_search(arrayl, target)
if upper > mid:
arrayu = array[mid:len(array)]
binary_search(arrayu, target)
if __name__ == "__main__":
a = [3,2,9,8,4,1,9,6,5,9,7]
binary_search(a,9)

Using Recursion:
def binarySearch(arr,item):
c = len(arr)//2
if item > arr[c]:
ans = binarySearch(arr[c+1:],item)
if ans:
return binarySearch(arr[c+1],item)+c+1
elif item < arr[c]:
return binarySearch(arr[:c],item)
else:
return c
binarySearch([1,5,8,10,20,50,60],10)

All the answers above are true , but I think it would help to share my code
def binary_search(number):
numbers_list = range(20, 100)
i = 0
j = len(numbers_list)
while i < j:
middle = int((i + j) / 2)
if number > numbers_list[middle]:
i = middle + 1
else:
j = middle
return 'the index is '+str(i)

If you're doing a binary search, I'm guessing the array is sorted. If that is true you should be able to compare the last element in the array to the needle_element. As octopus says, this can be done before the search begins.

You can just check to see that needle_element is in the bounds of the array before starting at all. This will make it more efficient also, since you won't have to do several steps to get to the end.
if needle_element < array[0] or needle_element > array[-1]:
# do something, raise error perhaps?

It returns the index of key in array by using recursive.
round() is a function convert float to integer and make code fast and goes to expected case[O(logn)].
A=[1,2,3,4,5,6,7,8,9,10]
low = 0
hi = len(A)
v=3
def BS(A,low,hi,v):
mid = round((hi+low)/2.0)
if v == mid:
print ("You have found dude!" + " " + "Index of v is ", A.index(v))
elif v < mid:
print ("Item is smaller than mid")
hi = mid-1
BS(A,low,hi,v)
else :
print ("Item is greater than mid")
low = mid + 1
BS(A,low,hi,v)
BS(A,low,hi,v)

Without the lower/upper indexes this should also do:
def exists_element(element, array):
if not array:
yield False
mid = len(array) // 2
if element == array[mid]:
yield True
elif element < array[mid]:
yield from exists_element(element, array[:mid])
else:
yield from exists_element(element, array[mid + 1:])

Returning a boolean if the value is in the list.
Capture the first and last index of the list, loop and divide the list capturing the mid value.
In each loop will do the same, then compare if value input is equal to mid value.
def binarySearch(array, value):
array = sorted(array)
first = 0
last = len(array) - 1
while first <= last:
midIndex = (first + last) // 2
midValue = array[midIndex]
if value == midValue:
return True
if value < midValue:
last = midIndex - 1
if value > midValue:
first = midIndex + 1
return False

Optimizing solution to Three Sum

I am trying to solve the 3 Sum problem stated as:
Given an array S of n integers, are there elements a, b, c in S such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero.
Note: The solution set must not contain duplicate triplets.
Here is my solution to this problem:
def threeSum(nums):
"""
:type nums: List[int]
:rtype: List[List[int]]
"""
nums.sort()
n = len(nums)
solutions = []
for i, num in enumerate(nums):
if i > n - 3:
break
left, right = i+1, n-1
while left < right:
s = num + nums[left] + nums[right] # check if current sum is 0
if s == 0:
new_solution = [num, nums[left], nums[right]]
# add to the solution set only if this triplet is unique
if new_solution not in solutions:
solutions.append(new_solution)
right -= 1
left += 1
elif s > 0:
right -= 1
else:
left += 1
return solutions
This solution works fine with a time complexity of O(n**2 + k) and space complexity of O(k) where n is the size of the input array and k is the number of solutions.
While running this code on LeetCode, I am getting TimeOut error for arrays of large size. I would like to know how can I further optimize my code to pass the judge.
P.S: I have read the discussion in this related question. This did not help me resolve the issue.

A couple of improvements you can make to your algorithm:
1) Use sets instead of a list for your solution. Using a set will insure that you don't have any duplicate and you don't have to do a if new_solution not in solutions: check.
2) Add an edge case check for an all zero list. Not too much overhead but saves a HUGE amount of time for some cases.
3) Change enumerate to a second while. It is a little faster. Weirdly enough I am getting better performance in the test with a while loop then a n_max = n -2; for i in range(0, n_max): Reading this question and answer for xrange or range should be faster.
NOTE: If I run the test 5 times I won't get the same time for any of them. All my test are +-100 ms. So take some of the small optimizations with a grain of salt. They might NOT really be faster for all python programs. They might only be faster for the exact hardware/software config the tests are running on.
ALSO: If you remove all the comments from the code it is a LOT faster HAHAHAH like 300ms faster. Just a funny side effect of however the tests are being run.
I have put in the O() notation into all of the parts of your code that take a lot of time.
def threeSum(nums):
"""
:type nums: List[int]
:rtype: List[List[int]]
"""
# timsort: O(nlogn)
nums.sort()
# Stored val: Really fast
n = len(nums)
# Memory alloc: Fast
solutions = []
# O(n) for enumerate
for i, num in enumerate(nums):
if i > n - 3:
break
left, right = i+1, n-1
# O(1/2k) where k is n-i? Not 100% sure about this one
while left < right:
s = num + nums[left] + nums[right] # check if current sum is 0
if s == 0:
new_solution = [num, nums[left], nums[right]]
# add to the solution set only if this triplet is unique
# O(n) for not in
if new_solution not in solutions:
solutions.append(new_solution)
right -= 1
left += 1
elif s > 0:
right -= 1
else:
left += 1
return solutions
Here is some code that won't time out and is fast(ish). It also hints at a way to make the algorithm WAY faster (Use sets more ;) )
class Solution(object):
def threeSum(self, nums):
"""
:type nums: List[int]
:rtype: List[List[int]]
"""
# timsort: O(nlogn)
nums.sort()
# Stored val: Really fast
n = len(nums)
# Hash table
solutions = set()
# O(n): hash tables are really fast :)
unique_set = set(nums)
# covers a lot of edge cases with 2 memory lookups and 1 hash so it's worth the time
if len(unique_set) == 1 and 0 in unique_set and len(nums) > 2:
return [[0, 0, 0]]
# O(n) but a little faster than enumerate.
i = 0
while i < n - 2:
num = nums[i]
left = i + 1
right = n - 1
# O(1/2k) where k is n-i? Not 100% sure about this one
while left < right:
# I think its worth the memory alloc for the vars to not have to hit the list index twice. Not sure
# how much faster it really is. Might save two lookups per cycle.
left_num = nums[left]
right_num = nums[right]
s = num + left_num + right_num # check if current sum is 0
if s == 0:
# add to the solution set only if this triplet is unique
# Hash lookup
solutions.add(tuple([right_num, num, left_num]))
right -= 1
left += 1
elif s > 0:
right -= 1
else:
left += 1
i += 1
return list(solutions)

I benchamrked the faster code provided by PeterH but I found a faster solution, and the code is simpler too.
class Solution(object):
def threeSum(self, nums):
res = []
nums.sort()
length = len(nums)
for i in xrange(length-2): #[8]
if nums[i]>0: break #[7]
if i>0 and nums[i]==nums[i-1]: continue #[1]
l, r = i+1, length-1 #[2]
while l<r:
total = nums[i]+nums[l]+nums[r]
if total<0: #[3]
l+=1
elif total>0: #[4]
r-=1
else: #[5]
res.append([nums[i], nums[l], nums[r]])
while l<r and nums[l]==nums[l+1]: #[6]
l+=1
while l<r and nums[r]==nums[r-1]: #[6]
r-=1
l+=1
r-=1
return res
https://leetcode.com/problems/3sum/discuss/232712/Best-Python-Solution-(Explained)

Majority Element Python

I'm having trouble getting the right output on my "Majority Element" divide and conquer algorithm implementation in Python 3.
This should be relatively correct; however, it would still appear that I'm missing something or it is slightly off and I cannot figure out why that is.
I've tried some debugging statement and different things. It looks like on the particular case listed in the comment above the code, it's resolving -1 for "left_m" and 941795895 for "right_m" when it does the recursive calls. When it compares the element at each index to those variables, the counter will obviously never increment.
Am I going about this the wrong way? Any help would be greatly appreciated.
Thanks.
# Input:
# 10
# 2 124554847 2 941795895 2 2 2 2 792755190 756617003
# Your output:
# 0
#
# Correct output:
# 1
def get_majority_element(a, left, right):
if left == right:
return -1
if left + 1 == right:
return a[left]
left_m = get_majority_element(a, left, (left + right - 1)//2)
right_m = get_majority_element(a, (left + right - 1)//2 + 1, right)
left_count = 0
for i in range(0, right):
if a[i] == left_m:
left_count += 1
if left_count > len(a)//2:
return left_m
right_count = 0
for i in range(0, right):
if a[i] == right_m:
right_count += 1
if right_count > len(a)//2:
return right_m
return -1
if __name__ == '__main__':
input = sys.stdin.read()
n, *a = list(map(int, input.split()))
if get_majority_element(a, 0, n) != -1:
print(1)
else:
print(0)

When counting the appearance of left and right major elements, your loops go over the range(0, right). Instead, they should be going over the range(left, right). Starting from 0 may, and will cause returning incorrect major elements in the smaller subproblems.
Also, in addition to the problem of having incorrect starting indices in ranges your for loops cover, you seem to have a problem in your recursive call arguments, probably due to your intuition causing you to overlook some details. Throughout the get_majority_element function, you treat parameter right to be the first index that is not in the list, as opposed to right being the index of the rightmost element in the list.
However, in your first recursive call, you give the third argument as if it is the last element included in that list. If right is the index of the first element not in that list, it should actually be the same with the second parameter of the second recursive call you're making in the following line. So, the third argument of the first recursive call you're making is less than it should be, by 1, causing you to overlook 1 element each time you recursively go down.
Thirdly, you have an error in the if statements following the for loops, similar to the problem you had with loop ranges. You are dividing the occurances of the element for all len(a) elements, although you should only care about the subproblem you are currently working on. So, you should divide it by the number of elements in that subproblem, rather than len(a). (i.e. (right - left)//2)
You can find the working code below, and look here in order to observe it's execution.
def get_majority_element(a, left, right):
if left == right:
return -1
if left + 1 == right:
return a[left]
left_m = get_majority_element(a, left, (left + right - 1)//2 + 1)
right_m = get_majority_element(a, (left + right - 1)//2 + 1, right)
left_count = 0
for i in range(left, right):
if a[i] == left_m:
left_count += 1
if left_count > (right-left)//2:
return left_m
right_count = 0
for i in range(left, right):
if a[i] == right_m:
right_count += 1
if right_count > (right-left)//2:
return right_m
return -1
if __name__ == '__main__':
input = sys.stdin.read()
n, *a = list(map(int, input.split()))
print("n=" + str(n))
if get_majority_element(a, 0, len(a)) != -1:
print(1)
else:
print(0)

I am trying to use Boyers and Moore's algorithm to find the majority element among a list. I am using a inbuilt function, count; so, if the majority element is greater than half size of the list then it gives output of 1, otherwise 0. You can find more in this link about Boyers and Moore's Algorithm on finding majority Algorithm information here
# Uses python3
import sys
def get_majority_element(a,n):
maximum = a[0]
amount = 1
for i in (a[1:]):
if not maximum == i:
if amount >= 1:
amount = amount - 1
else:
maximum = i
amount = 1
else:
amount = amount + 1
output = a.count(maximum)
if output > n//2:
return 1
return 0
if __name__ == '__main__':
input = sys.stdin.read()
n, *a = list(map(int, input.split()))
print (get_majority_element(a,n))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Functional implementation of recursive merge sort? - python

Related

Finding Pivot element in a sorted array, How do I optimize this code?

Recursive Quicksort using Lambdas

python3 binary search not working [duplicate]

Optimizing solution to Three Sum

Majority Element Python

Categories

Resources