How to use multithreading with divide and conquer? - python

I am new at Python and have been trying to use multithreading. There already exists an in-depth comment on Stackoverflow on the topic, but I still have some questions.
The goal of my program is to create and populate an array (although I guess that technically it would have to be called a "list" in Python) and have it sorted by a "divide and conquer" algorithm. Unfortunately, the terms "list" and "array" seem to be conflated by many a user, even though they are not the same. If "array" is used in my comment, please bear in mind that I have posted different code from various resources and have, for the sake of respecting the original author/s, not changed its content.
My code for populating the list count was quite simple
#!/usr/bin/env python3
count = []
i = 149
while i >= 0:
count.append(i)
print(i)
i -= 1
After that I used this very handy guide on the topic of "divide and conquer" to create two lists for sorting, which were merged later on. My main concern is now how to use those lists correctly with multithreading.
In the earlier mentioned post it was argued that, basically, just a few lines of code were needed to use multithreading:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
as well as
results = pool.starmap(function, zip(list_a, list_b))
to pass multiple lists.
I have tried to adapt the code but failed. My function's parameters are def merge(count, l, m, r) (used to divide the list count into a left and a right part) and the two temporarily created lists are called L and R.
def merge(arr, l, m, r):
n1 = m - l + 1
n2 = r- m
# create temp arrays
L = [0] * (n1)
R = [0] * (n2)
But every time I run the program, it just responds with the following error message:
Traceback (most recent call last):
File "./DaCcountdownTEST.py", line 71, in <module>
results = pool.starmap(merge,zip(L,R))
NameError: name 'L' is not defined
I don't know the cause for my problem.
Any help is greatly appreciated!

I'm not sure exactly what's going wrong with your code, but here's a complete working example of a multi-threaded version of the mergeSort code you linked to:
from multiprocessing.dummy import Pool as ThreadPool
# Merges two subarrays of arr[].
# First subarray is arr[l..m]
# Second subarray is arr[m+1..r]
def merge(arr, l, m, r):
n1 = m - l + 1
n2 = r- m
# create temp arrays
L = [0] * (n1)
R = [0] * (n2)
# Copy data to temp arrays L[] and R[]
for i in range(0 , n1):
L[i] = arr[l + i]
for j in range(0 , n2):
R[j] = arr[m + 1 + j]
# Merge the temp arrays back into arr[l..r]
i = 0 # Initial index of first subarray
j = 0 # Initial index of second subarray
k = l # Initial index of merged subarray
while i < n1 and j < n2 :
if L[i] <= R[j]:
arr[k] = L[i]
i += 1
else:
arr[k] = R[j]
j += 1
k += 1
# Copy the remaining elements of L[], if there
# are any
while i < n1:
arr[k] = L[i]
i += 1
k += 1
# Copy the remaining elements of R[], if there
# are any
while j < n2:
arr[k] = R[j]
j += 1
k += 1
# l is for left index and r is right index of the
# sub-array of arr to be sorted
def mergeSort(arr,l=0,r=None):
if r is None:
r = len(arr) - 1
if l < r:
# Same as (l+r)/2, but avoids overflow for
# large l and h
m = (l+(r-1))//2
# Sort first and second halves
pool = ThreadPool(2)
pool.starmap(mergeSort, zip((arr, arr), (l, m+1), (m, r)))
pool.close()
pool.join()
merge(arr, l, m, r)
Here's a short test of the code:
arr = np.random.randint(0,100,10)
print(arr)
mergeSort(arr)
print(arr)
which produces the following output:
[93 56 55 60 0 28 17 77 84 2]
[ 0 2 17 28 55 56 60 77 84 93]
Sadly, it does seem to be quite a bit slower than the single threaded version. However, this kind of slowdown is par for the course when it comes to multithreading compute-bound tasks in Python.

Related

Time limit excedded for program - find Kth smallest element

I am looking at GeeksForGeeks problem Kth smallest element:
Given an array arr[] and an integer K where K is smaller than size of array, the task is to find the Kth smallest element in the given array. It is given that all array elements are distinct.
Expected Time Complexity: O(n)
Expected Auxiliary Space: O(log(n))
Constraints:
1 <= N <= 105
1 <= arr[i] <= 105
1 <= K <= N
My Code:
class Solution:
def kthSmallest(self,arr, l, r, k):
'''
arr : given array
l : starting index of the array i.e 0
r : ending index of the array i.e size-1
k : find kth smallest element and return using this function
'''
arr2=arr[:k]
arr2.insert(0,None)
for i in range(k//2,0,-1):
arr2=self.heapify(arr2,i,k-1)
for i in arr[k:]:
if i <arr2[1]:
arr2[1]=i
arr2=self.heapify(arr2,1,k-1)
return arr2[1]
def heapify(self,arr, i, r):
if 2 * i <= r + 1 and arr[2 * i] > arr[i]:
arr[2 * i], arr[i] = arr[i], arr[i * 2]
arr = self.heapify(arr, 2 * i, r)
if 2 * i + 1 <= r + 1 and arr[2 * i + 1] > arr[i]:
arr[2 * i + 1], arr[i] = arr[i], arr[i * 2 + 1]
arr = self.heapify(arr, 2 * i + 1, r)
return arr
I made a sub array of first K elements in the array, and max heapified it.
Then for the rest of the elements in the array, if the element is smaller than the first element of the heap, I replaced the top element and then max heapified the top element. I am getting time limit exceeded error. Any idea?
The problem is that your heapify function is not efficient. In the worst case it makes two recursive calls at the same recursion depth. This may even happen at several recursion depths, so that the number of times heapify is called recursively could become quite large. The goal is to have this only call heapify once (at the most) per recursion level.
It should first find the greatest child, and only then determine whether heapify should be called again, and make that single call if needed.
Some other remarks:
Instead of making heapify recursive, use an iterative solution. This will also save some execution time.
It is strange to pass k-1 to heapify as last argument, when the last element sits at index k, and so you get the weird comparison <= r + 1 in that function. It is more intuitive to pass k as argument, and work with <= r inside the function.
As arr is mutated by heapify it is not needed to return it. This is just overhead that is useless.
2 * i is calculated several times. It is better to calcuate this only once.
arr[k:] makes a copy of that part of the list. This is not really needed. You could just iterate over the range and take the corresponding value from the array in the loop.
It is not clear why the main function needs to get l and r as arguments, since in comments it is explained that l will be 0 and r the index of the last element. But in my opinion, since you get them, you should use them. So you should not assume l is 0,... etc.
I would use a more descriptive name for arr2. Why not name it heap?
Here is the improvement of your code:
class Solution:
def kthSmallest(self,arr, l, r, k):
'''
arr : given array
l : starting index of the array i.e 0
r : ending index of the array i.e size-1
k : find kth smallest element and return using this function
'''
heap = [arr[i] for i in range(l, r + 1)]
heap.insert(0, None)
for i in range(k//2, 0, -1):
self.heapify(heap, i, k)
for i in range(l + k, r + 1):
val = arr[i]
if val < heap[1]:
heap[1] = val
self.heapify(heap, 1, k)
return heap[1]
def heapify(self, arr, i, r):
child = 2 * i
while child <= r:
if child + 1 <= r and arr[child + 1] > arr[child]:
child += 1
if arr[child] <= arr[i]:
break
arr[child], arr[i] = arr[i], arr[child]
i = child
child = 2 * i
Finally, there is a heapq module you can use, which simplifies your code:
from heapq import heapify, heapreplace
class Solution:
def kthSmallest(self, arr, l, r, k):
'''
arr : given array
l : starting index of the array i.e 0
r : ending index of the array i.e size-1
k : find kth smallest element and return using this function
'''
heap = [-arr[i] for i in range(l, l + k)]
heapify(heap)
for i in range(l + k, r + 1):
val = -arr[i]
if val > heap[0]:
heapreplace(heap, val)
return -heap[0]
The unary minus that occurs here and there is to make the native minheap work as a maxheap.

I was asked to make a program in python to check whether a sorting algorithm is stable or not. Need some help regarding that

So I decided to approach this problem by creating another array (lets say arraycount) with the same length of the input array(lets say arraynumber) which is to be sorted. For the program to really check the stability of the sorting function, I decided to use test cases of integer array of repeated occurrences eg arraynumber = {10,8,7,8,2,2,2,1,6}. I then use a function to calculate the occurrences of the number in the array and store them in the arraycount[] which will result as arraycount={1,1,1,2,1,2,3,1,1}.
Naturally after sorting in increasing order they become arraynumber={1,2,2,2,6,7,8,8,10} and the result of the arraycount decides that its stable or not. Hence if its stable arraycount={1,1,2,3,1,1,1,2,1} otherwise it won't be same.
This is the code I have been working on. Help and feedback is appreciated. Main function and input is not taken yet. I will take another array copy of the original arraycount which will be compared afterward to check if it's retained the original position. Sorry if I made any trivial mistakes. Thank you.
def merge(arr1, l, m, r,arr2):
n1 = m - l + 1
n2 = r- m
# create temp arrays
L = [0] * (n1)
R = [0] * (n2)
# Copy data to temp arrays L[] and R[]
for i in range(0 , n1):
L[i] = arr1[l + i]
for j in range(0 , n2):
R[j] = arr1[m + 1 + j]
# Merge the temp arrays back into arr[l..r]
i = 0 # Initial index of first subarray
j = 0 # Initial index of second subarray
k = l # Initial index of merged subarray
while i < n1 and j < n2 :
if L[i] <= R[j]:
arr1[k] = L[i]
arr2[k]=L[i]
i += 1
else:
arr1[k] = R[j]
arr2[k]=R[j]
j += 1
k += 1
# Copy the remaining elements of L[], if there
# are any
while i < n1:
arr1[k] = L[i]
arr2[k]=L[i]
i += 1
k += 1
# Copy the remaining elements of R[], if there
# are any
while j < n2:
arr1[k] = R[j]
arr2[k]=R[j]
j += 1
k += 1
# l is for left index and r is right index of the
# sub-array of arr to be sorted
def mergeSort(arr1,l,r,arr2):
if l < r:
# Same as (l+r)/2, but avoids overflow for
# large l and h
m = (l+(r-1))/2
# Sort first and second halves
mergeSort(arr1, l, m,arr2)
mergeSort(arr1, m+1, r,arr2)
merge(arr1, l, m, r,arr2)
# This function takes last element as pivot, places
# the pivot element at its correct position in sorted
# array, and places all smaller (smaller than pivot)
# to left of pivot and all greater elements to right
# of pivot
def partition(arr1,low,high,arr2):
i = ( low-1 ) # index of smaller element
pivot = arr1[high] # pivot
for j in range(low , high):
# If current element is smaller than or
# equal to pivot
if arr1[j] <= pivot:
# increment index of smaller element
i = i+1
arr1[i],arr1[j] = arr1[j],arr1[i]
arr2[i],arr2[j]=arr2[j],arr2[i]
arr1[i+1],arr1[high] = arr1[high],arr1[i+1]
arr2[i+1],arr2[high]=arr2[high],arr2[i+1]
return ( i+1 )
# The main function that implements QuickSort
# arr1[] --> Array to be sorted,
# arr2[] --> Second Array of the count to be sorted along with first array
# low --> Starting index,
# high --> Ending index
# Function to do Quick sort
def quickSort(arr1,low,high,arr2):
if low < high:
# pi is partitioning index, arr[p] is now
# at right place
pi = partition(arr1,low,high,arr2)
# Separately sort elements before
# partition and after partition
quickSort(arr1, low, pi-1,arr2)
quickSort(arr1, pi+1, high,arr2)
def countArray(arr1,arr2):
i=0,j=1,k=0
while i<range(len(arr1[])):
while j<range(len(arr1[])):
if arr1[i]==arr1[j]:
arr2[i]=k+1
j=j+1
k=0
i=i+1
arraylist=[]
arraycount=[]
def isStable(arr1,arr2):
k=0
while i < range(len(arr1[])):
if arr[i]==arr2[i]:
k=k+1
i=i+1
if k==len(arr2):
print("Stable sort")
else:
print("Unstable sort")

merge sort python implementation methodology

I'm learning the well known sorting algorithms and implementing them myself. I have recently done merge sort and the code I have is:
def merge(l, r, direction):
# print('merging')
# print(l, r)
# adding infinity to end of list so we know when we've hit the bottom of one pile
l.append(inf)
r.append(inf)
A = []
i, j = 0, 0
while (i < len(l)) and (j < len(r)):
if l[i] <= r[j]:
A.append(l[i])
i += 1
else:
A.append(r[j])
j += 1
# removing infinity from end of list
A.pop()
return(A)
def merge_sort(num_lst, direction='increasing', level=0):
if len(num_lst) > 1:
mid = len(num_lst)//2
l = num_lst[:mid]
r = num_lst[mid:]
l = merge_sort(l, level=level + 1)
r = merge_sort(r, level=level + 1)
num_lst = merge(l, r, direction)
return num_lst
What I have seen in other implementations that differs from my own is in the merging of lists. Where I just create a blank list and append elements in numerical order other pass the preexisting into merge and overwrite each element to create a list in numerical order. Something like:
def merge(arr, l, m, r):
n1 = m - l + 1
n2 = r- m
# create temp arrays
L = [0] * (n1)
R = [0] * (n2)
# Copy data to temp arrays L[] and R[]
for i in range(0 , n1):
L[i] = arr[l + i]
for j in range(0 , n2):
R[j] = arr[m + 1 + j]
# Merge the temp arrays back into arr[l..r]
i = 0 # Initial index of first subarray
j = 0 # Initial index of second subarray
k = l # Initial index of merged subarray
while i < n1 and j < n2 :
if L[i] <= R[j]:
arr[k] = L[i]
i += 1
else:
arr[k] = R[j]
j += 1
k += 1
# Copy the remaining elements of L[], if there
# are any
while i < n1:
arr[k] = L[i]
i += 1
k += 1
# Copy the remaining elements of R[], if there
# are any
while j < n2:
arr[k] = R[j]
j += 1
k += 1
I'm curious about the following:
Problem
Is using a append() on a blank list a bad idea? By my understanding when python creates a list it grabs a certain size chunk of memory and if our list grows beyond that copies the list to a different and larger section of memory (which seems it would be pretty high cost if it happened even once for a large list let alone repeatedly).
Is there just a higher cost to using append() compared to accessing a list by index? It would seemed to me append would be able to do things at a fairly low cost.
When you instantiate a list Python will allocate the memory necessary for holding the items plus some extra memory for future appends / extends. When you put too much extra stuff in the list it eventually needs to be reallocated which potentially slows down the program (see this part of the source code). The reallocation happens here and the size is calculated as:
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
As the comment states the growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ....
So if the size of the final list is known in advance and allocated from the beginning this will save you some extra reallocations (and thus compute time).
from time import time
append_arr = []
index_arr = [None] * 10240*1024
t0 = time()
for x in range(10240*1024):
append_arr.append(x)
t1 = time()
t2 = time()
for i in range(10240*1024):
index_arr[i] = i
t3 = time()
print(str(t1-t0))
print(str(t3-t2))
Looks like appending is slower.

Merge Sort in Python - RuntimeError: maximum recursion depth exceeded

I am working on merge sort in Python.
I have checked merge function. The arrays are merging fine.
But in the mergeSort Function an error is coming :---
RuntimeError: maximum recursion depth exceeded.
RuntimeError Traceback (most recent call
last) in ()
63 print(arr[i]),
64
---> 65 mergeSort(arr,0,n-1)
66 print(" ")
67 print("Sorted Array is")
in mergeSort(arr, l, r)
53 m = l+(r-1)/2
54 mergeSort(arr,l,m)
---> 55 mergeSort(arr,m+1,r)
56 merge(arr,l,m,r)
57
What can be the possible cause of this?
def merge(arr,l,m,r):
n1 = m-l+1
n2 = r-m
L = [0] * n1
R = [0] * n2
print("First List")
for i in range(0,n1):
L[i] = arr[i+l]
print(L[i]),
print("")
print("Second List")
for j in range(0,n2):
R[j] = arr[j+m+1]
print(R[j]),
#Merging the temp arrays
i = 0
j = 0
k = 0
print("")
print("Merged List ---------->")
while i < n1 and j < n2:
if L[i] <= R[j]:
arr[k] = L[i]
print(arr[k]),
i+=1
else:
arr[k] = R[j]
print(arr[k]),
j+=1
k+=1
while i<n1:
arr[k] = L[i]
i+=1
k+=1
while j<n2:
arr[k] = R[j]
print(arr[k])
j+=1
k+=1
def mergeSort(arr,l,r):
if l<r:
m = l+(r-1)/2
mergeSort(arr,l,m)
mergeSort(arr,m+1,r)
merge(arr,l,m,r)
arr = [0,12,13,0,1,22]
n = len(arr)
mergeSort(arr,0,n-1)
print(" ")
print("Sorted Array is")
for i in range(0,n-1):
print(arr[i])
The way m is calculated in mergeSort is wrong(you need to divide in half the whole expression, not only (r-1)). Change it to:
m = (l+(r-1))/2
As you were calculating that incorrectly your method was calling itself recursively over and over again until the maximum method stack depth was exceeded and thus crashing.
You did the basic mistake .Python is dynamically type language
Before going to why you got that Run Time RecurssionError ,you need to make one change.
Instead of m = l+(r-1)/2 write m = (l+r)/2
Because when you called mergeSort(arr,0,n-1) at that time n-1 is last index value and that's why there is no need of r-1 in m = l+(r-1)/2
Floor Division operator(//) VS Normal division operator(/)
Now coming to the main point why did you got that RecurssionError here is the answer
When you are doing operation like
m = (l+r)/2
Divide operation will give fractional part i.e. m will be treated as floating variable
so m would never be 0 it would be any floating number say something 0.123 or 2.122 or 1.0025
Because of floating point as m would never be 0 the if condition if l<r: would always true and the mergeSort() function goes into Infinite loop and that's why you got that RunTime RecurssionERROR
You want m as integer and not float so instead of m=(l+r)/2
Write m=(l+r)//2 the floor division operator(//) will give you integer value and your mergeSort() function will not go under Infinite Loop
Your code will get executed this time without any Error just make one change m=(l+r)//2
BUT BUT BUT your merge() function algo is not good it does not give sorted array also the data is lost and something else gets printed
Its because when you are calling mergesort function, you are passing l = 0 and r= lenght of list.
And no matter what your calculation going to be in every case you l will be less than r.
m = l+(r-1)/2

Optimal Search Tree Using Python - Code Analysis

First of all, sorry about the naive question. But I couldn't find help elsewhere
I'm trying to create an Optimal Search Tree using Dynamic Programing in Python that receives two lists (a set of keys and a set of frequencies) and returns two answers:
1 - The smallest path cost.
2 - The generated tree for that smallest cost.
I basically need to create a tree organized by the most accessed items on top (most accessed item it's the root), and return the smallest path cost from that tree, by using the Dynamic Programming solution.
I've the following implemented code using Python:
def optimalSearchTree(keys, freq, n):
#Create an auxiliary 2D matrix to store results of subproblems
cost = [[0 for x in xrange(n)] for y in xrange(n)]
#For a single key, cost is equal to frequency of the key
#for i in xrange (0,n):
# cost[i][i] = freq[i]
# Now we need to consider chains of length 2, 3, ... .
# L is chain length.
for L in xrange (2,n):
for i in xrange(0,n-L+1):
j = i+L-1
cost[i][j] = sys.maxint
for r in xrange (i,j):
if (r > i):
c = cost[i][r-1] + sum(freq, i, j)
elif (r < j):
c = cost[r+1][j] + sum(freq, i, j)
elif (c < cost[i][j]):
cost[i][j] = c
return cost[0][n-1]
def sum(freq, i, j):
s = 0
k = i
for k in xrange (k,j):
s += freq[k]
return s
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
print(optimalSearchTree(keys, freq, n))
I'm trying to output the answer 1. The smallest cost for that tree should be 142 (the value stored on the Matrix Position [0][n-1], according to the Dynamic Programming solution). But unfortunately it's returning 0. I couldn't find any issues in that code. What's going wrong?
You have several very questionable statements in your code, definitely inspired by C/Java programming practices. For instance,
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
I think you think you calculate the number of items in the list. However, n is not 3:
sys.getsizeof(keys)/sys.getsizeof(keys[0])
3.142857142857143
What you need is this:
n = len(keys)
One more find: elif (r < j) is always True, because r is in the range between i (inclusive) and j (exclusive). The elif (c < cost[i][j]) condition is never checked. The matrix c is never updated in the loop - that's why you always end up with a 0.
Another suggestion: do not overwrite the built-in function sum(). Your namesake function calculates the sum of all items in a slice of a list:
sum(freq[i:j])
import sys
def optimalSearchTree(keys, freq):
#Create an auxiliary 2D matrix to store results of subproblems
n = len(keys)
cost = [[0 for x in range(n)] for y in range(n)]
storeRoot = [[0 for i in range(n)] for i in range(n)]
#For a single key, cost is equal to frequency of the key
for i in range (0,n):
cost[i][i] = freq[i]
# Now we need to consider chains of length 2, 3, ... .
# L is chain length.
for L in range (2,n+1):
for i in range(0,n-L+1):
j = i + L - 1
cost[i][j] = sys.maxsize
for r in range (i,j+1):
c = (cost[i][r-1] if r > i else 0)
c += (cost[r+1][j] if r < j else 0)
c += sum(freq[i:j+1])
if (c < cost[i][j]):
cost[i][j] = c
storeRoot[i][j] = r
return cost[0][n-1], storeRoot
if __name__ == "__main__" :
keys = [10,12,20]
freq = [34,8,50]
print(optimalSearchTree(keys, freq))

Categories