Creating min heap from array - 2 methods

I am working on a problem about building a min heap from an array. I have 2 approaches - the first is recursion and the second is using a while loop. The recursion approach passed the tests on the online grader, but the while loop version doesn't seem to work. I generated some random stress tests in my code below and found that the 2 methods gave different answers as well.
May I know what's the mistake in my second method? The question is as follows:
Input Format. The first line of the input contains a single integer n. The next line contains n space-separated integers a_i.
Constraints. 1 ≤ n ≤ 100 000; 0 ≤ i, j ≤ n − 1; 0 ≤ a_0, a_1, ..., a_{n−1} ≤ 10^9. All a_i are distinct.
Output Format. The first line of the output should contain a single integer m, the total number of swaps. m must satisfy 0 ≤ m ≤ 4n. The next m lines should contain the swap operations used to convert the array a into a heap. Each swap is described by a pair of integers i, j: the 0-based indices of the elements to be swapped. After applying all the swaps in the specified order the array must become a heap, that is, for each i where 0 ≤ i ≤ n − 1 the following conditions must be true:
If 2i + 1 ≤ n − 1, then a_i < a_{2i+1}.
If 2i + 2 ≤ n − 1, then a_i < a_{2i+2}.
Note that all the elements of the input array are distinct. Note that any sequence of swaps that has length at most 4n and after which your initial array becomes a correct heap will be graded as correct.
My code:
# python3
from random import randint
swaps = []
def sift_down(i, n, data):
    min_index = i
    left_child = 2*i + 1
    right_child = 2*i + 2
    if left_child < n and data[left_child] < data[min_index]:
        min_index = left_child
    if right_child < n and data[right_child] < data[min_index]:
        min_index = right_child
    if i != min_index:
        swaps.append([i, min_index])
        data[i], data[min_index] = data[min_index], data[i]
        sift_down(min_index, n, data)
def build_heap(data):
    n = len(data)
    for i in range(n//2, -1, -1):
        sift_down(i, n, data)
    return swaps
# wrong answer using while loop instead of recursion
def build_heap2(data):
    swap = []
    for i in range(len(data)-1, 0, -1):
        current_node = i
        prev_node = i // 2 if i % 2 != 0 else i // 2 - 1
        while data[prev_node] > data[current_node] and current_node != 0:
            swap.append((prev_node, current_node))
            data[prev_node], data[current_node] = data[current_node], data[prev_node]
            current_node = prev_node
            prev_node = current_node // 2 if current_node % 2 != 0 else current_node // 2 - 1
    return swap
def main():
    # n = int(input())
    # data = list(map(int, input().split()))
    # assert len(data) == n
    while True:
        n = randint(1, 100000)
        data = []
        data2 = []
        for i in range(n):
            data.append(randint(0, 10**9))  # 10**9 is a power; 10^9 would be XOR (== 3)
        data2 = data.copy()
        swaps = build_heap(data)
        swaps2 = build_heap2(data2)
        if swaps != swaps2:
            print("recursion")
            print(data[0], len(data), len(swaps))
            print("loop:")
            print(data2[0], len(data2), len(swaps2))
            break
        else:
            print("success")
    swaps = build_heap(data)
    print(len(swaps))
    for i, j in swaps:
        print(i, j)
if __name__ == "__main__":
    main()

Your build_heap2 implements an idea that is not correct. It starts from the bottom of the tree (correct), but then bubbles values up the tree (wrong), in the upper part of the tree that has not been heapified yet. This is not good. Not only can it report the wrong number of swaps, it will not always deliver a valid heap. For instance, for [3, 1, 2, 4, 0] the result after the swaps is still not a heap, as the value 1 ends up as a child of 3.
The purpose is to build little heaps at the bottom of the tree. Once the children of a parent node have been turned into heaps, the value in that parent node is sifted down into one of these child heaps. This works because the moving value now travels within a subtree that is already heapified. As a result, the parent of these two little heaps becomes the root of a valid heap itself, and so at the end of the algorithm the root will be the root of a valid heap.
So instead of swapping values upwards in the tree, you need to swap downwards (choosing the child with the least value).
Here is the corrected version:
def build_heap(data):
    swap = []
    # We can start at the deepest parent:
    for i in range(len(data) // 2 - 1, -1, -1):
        current_node = i
        while True:
            child_node = current_node * 2 + 1
            if child_node >= len(data):
                break
            if child_node + 1 < len(data) and data[child_node + 1] < data[child_node]:
                child_node += 1
            if data[current_node] < data[child_node]:
                break
            # swap the current value DOWN, with the least of both child values
            swap.append((child_node, current_node))
            data[child_node], data[current_node] = data[current_node], data[child_node]
            current_node = child_node
    return swap
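The sketch below checks the corrected version against the counterexample [3, 1, 2, 4, 0]. It replays the reported swaps on a copy of the input, the way a grader would, and then tests the strict heap condition from the problem statement. The helper names `is_min_heap` and `apply_swaps` are hypothetical, and the code assumes the values are distinct, as the problem guarantees.

```python
def build_heap(data):
    # corrected version from above: sift each parent DOWN, deepest parent first
    swap = []
    for i in range(len(data) // 2 - 1, -1, -1):
        current_node = i
        while True:
            child_node = current_node * 2 + 1
            if child_node >= len(data):
                break
            if child_node + 1 < len(data) and data[child_node + 1] < data[child_node]:
                child_node += 1
            if data[current_node] < data[child_node]:
                break
            swap.append((child_node, current_node))
            data[child_node], data[current_node] = data[current_node], data[child_node]
            current_node = child_node
    return swap


def is_min_heap(data):
    # every parent must be smaller than each child that exists
    n = len(data)
    return all(data[i] < data[c]
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)


def apply_swaps(data, swaps):
    # replay the reported swaps on a copy of the original input
    result = list(data)
    for i, j in swaps:
        result[i], result[j] = result[j], result[i]
    return result


original = [3, 1, 2, 4, 0]
heap = list(original)
swaps = build_heap(heap)
print(heap, swaps)  # [0, 1, 2, 4, 3] [(4, 1), (1, 0), (4, 1)]
print(is_min_heap(apply_swaps(original, swaps)), len(swaps) <= 4 * len(original))
```

Applied to the array that `build_heap2` leaves behind for the same input, [0, 3, 2, 4, 1], `is_min_heap` returns False because 1 sits below 3.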

There are (at least) two ways to build a heap.
The O(N) solution works backwards from the middle of the dataset towards the beginning, ensuring that each successive element is the correct root of the subtree at that point:
def build_heap_down(data):
    n = len(data)
    for subtree in range(n // 2 - 1, -1, -1):
        sift_down(subtree, n, data)
The other solution, which is O(N log N), just adds each element in turn to a successively larger heap:
def build_heap_up(data):
    for new_element in range(1, len(data)):
        sift_up(new_element, data)
Since build_heap_up() is log-linear in the worst case (which I believe is reverse-sorted input), it probably doesn't satisfy the needs of your assignment, which impose a linear bound on the number of swaps. Still, some experimentation is worth doing. Perhaps that's the point of this assignment.
def swap(i, j, data):
    # exchange two elements of the list in place
    data[i], data[j] = data[j], data[i]

def sift_up(elt, data):
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]: return
        swap(parent, elt, data)
        elt = parent

def sift_down(elt, limit, data):
    while True:
        kid = 2 * elt + 1
        if kid >= limit: return
        if kid + 1 < limit and data[kid + 1] < data[kid]: kid += 1
        if data[elt] <= data[kid]: return
        swap(elt, kid, data)
        elt = kid
The key insight here is that both sift_up and sift_down require that the array they are working with be a heap except for the element being sifted. sift_down works with the array from the sifted element to the end, so doing it correctly on the entire array requires working backwards. sift_up works with the array from the beginning to the sifted element, so the iteration has to work forwards.
Your build_heap does build_heap_down, as far as I can see correctly. Although it uses recursion, it does the same thing as my loop above (and the version from @trincot); recursion at the very end of a function can always be turned into a simple loop using tail call elimination. (Some languages automatically perform this program transformation, but Python isn't one of them.)
Your build_heap2 is an incorrect version of build_heap_up because it works backwards instead of working forwards. That's easy to fix. But don't expect it to produce the same heap, much less the same list of swaps. There are many possible heaps which can be built from a given list of numbers, which is why it's possible to find an O(N) algorithm for build_heap and not for sort.
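The sketch below follows the suggestion to experiment. It counts the swaps each strategy makes on a descending input, which the answer presumes is a bad case for `build_heap_up`. The counting variants `sift_up_count` and `sift_down_count` and the helpers `swaps_down` and `swaps_up` are hypothetical. They mirror the loops above but return a swap count instead of calling `swap`.

```python
def sift_up_count(elt, data):
    # bubble data[elt] up towards the root; return the number of swaps made
    swaps = 0
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]:
            break
        data[parent], data[elt] = data[elt], data[parent]
        swaps += 1
        elt = parent
    return swaps


def sift_down_count(elt, limit, data):
    # push data[elt] down towards the smaller child; return the number of swaps made
    swaps = 0
    while True:
        kid = 2 * elt + 1
        if kid >= limit:
            return swaps
        if kid + 1 < limit and data[kid + 1] < data[kid]:
            kid += 1
        if data[elt] <= data[kid]:
            return swaps
        data[elt], data[kid] = data[kid], data[elt]
        swaps += 1
        elt = kid


def swaps_down(data):
    # build_heap_down: sift every parent down, last parent first
    n = len(data)
    return sum(sift_down_count(i, n, data) for i in range(n // 2 - 1, -1, -1))


def swaps_up(data):
    # build_heap_up: grow the heap one element at a time, left to right
    return sum(sift_up_count(i, data) for i in range(1, len(data)))


n = 1000
descending = list(range(n, 0, -1))
down_data, up_data = list(descending), list(descending)
down, up = swaps_down(down_data), swaps_up(up_data)
print(down, up, 4 * n)
```

On this input each new element in `build_heap_up` is the smallest so far, so it climbs all the way to the root. The total is the sum of floor(log2(i + 1)) for i = 1..999, which is 7987, about twice the 4n = 4000 budget. `build_heap_down` stays below n swaps, because the sum of the node heights in a heap-shaped tree is at most n − 1.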

Related

Getting all subsets from subset sum problem on Python using Dynamic Programming

I am trying to extract all subsets from a list of elements which add up to a certain value.
Example -
List = [1,3,4,5,6]
Sum - 9
Output Expected = [[3,6],[5,4]]
I have tried different approaches and get the expected output, but on a huge list of elements it takes a significant amount of time.
Can this be optimized using Dynamic Programming or any other technique?
Approach-1
def subset(array, num):
    result = []
    def find(arr, num, path=()):
        if not arr:
            return
        if arr[0] == num:
            result.append(path + (arr[0],))
        else:
            find(arr[1:], num - arr[0], path + (arr[0],))
        find(arr[1:], num, path)
    find(array, num)
    return result
numbers = [2, 2, 1, 12, 15, 2, 3]
x = 7
subset(numbers,x)
Approach-2
def isSubsetSum(arr, subset, N, subsetSize, subsetSum, index , sum):
    global flag
    if (subsetSum == sum):
        flag = 1
        for i in range(0, subsetSize):
            print(subset[i], end = " ")
        print("")
    else:
        for i in range(index, N):
            subset[subsetSize] = arr[i]
            isSubsetSum(arr, subset, N, subsetSize + 1,
                        subsetSum + arr[i], i + 1, sum)

If you want to output all subsets, you can't do better than a sluggish O(2^n) complexity. In the worst case that is roughly the size of your output, and time complexity is lower-bounded by output size. (The decision version, subset sum, is also a well-known NP-complete problem.) But suppose that, rather than a list of all subsets, you just want a boolean value indicating whether the target sum is achievable, or just one subset summing to the target (if one exists). Then you can use dynamic programming for a pseudo-polynomial O(nK) time solution, where n is the number of elements and K is the target integer.
The DP approach involves filling in an (n+1) x (K+1) table, with the sub-problems corresponding to the entries of the table being:
DP[i][k] = subset(A[i:], k) for 0 <= i <= n, 0 <= k <= K
That is, subset(A[i:], k) asks, 'Can I sum to (little) k using the suffix of A starting at index i?' Once you fill in the whole table, the answer to the overall problem, subset(A[0:], K), will be at DP[0][K].
The base cases are for i=n: they indicate that you can't sum to anything except 0 if you're working with the empty suffix of your array:
subset(A[n:], k>0) = False, subset(A[n:], k=0) = True
The recursive cases to fill in the table are:
subset(A[i:], k) = subset(A[i+1:], k) OR (A[i] <= k AND subset(A[i+1:], k-A[i]))
This simply relates the idea that you can use the current array suffix to sum to k either by skipping over the first element of that suffix and using the answer you already had in the previous row (when that first element wasn't in your array suffix), or by using A[i] in your sum and checking if you could make the reduced sum k-A[i] in the previous row. Of course, you can only use the new element if it doesn't itself exceed your target sum.
ex: subset(A[i:] = [3,4,1,6], k = 8)
would check: could I already sum to 8 with the previous suffix (A[i+1:] = [4,1,6])? No. Or, could I use the 3 which is now available to me to sum to 8? That is, could I sum to k = 8 - 3 = 5 with [4,1,6]? Yes. Because at least one of the conditions was true, I set DP[i][8] = True
Because all the base cases are for i=n, and the recurrence relation for subset(A[i:], k) relies on the answers to the smaller sub-problems subset(A[i+1:],...), you start at the bottom of the table, where i = n, fill out every k value from 0 to K for each row, and work your way up to row i = 0, ensuring you have the answers to the smaller sub-problems when you need them.
def subsetSum(A: list[int], K: int) -> bool:
    N = len(A)
    DP = [[None] * (K+1) for x in range(N+1)]
    DP[N] = [True if x == 0 else False for x in range(K+1)]
    for i in range(N-1, -1, -1):
        Ai = A[i]
        DP[i] = [DP[i+1][k] or (Ai <=k and DP[i+1][k-Ai]) for k in range(0, K+1)]
    # print result
    print(f"A = {A}, K = {K}")
    print('Ai,k:', *range(0,K+1), sep='\t')
    for (i, row) in enumerate(DP): print(A[i] if i < N else None, *row, sep='\t')
    print(f"DP[0][K] = {DP[0][K]}")
    return DP[0][K]
subsetSum([1,4,3,5,6], 9)
If you want to return an actual possible subset alongside the bool indicating whether or not it's possible to make one, then for every True flag in your DP you should also store the k index for the previous row that got you there. It will be either the current k index or k-A[i], depending on which table lookup returned True, and that tells you whether or not A[i] was used. Then you walk backwards from DP[0][K] after the table is filled to get a subset. This makes the code messier, but it's definitely doable. You can't get all subsets this way though (at least not without increasing your time complexity again) because the DP table compresses information.
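Below is a minimal sketch of that walk. It assumes non-negative integers. Rather than storing the predecessor k index, it re-derives each choice from the finished table: if DP[i+1][k] is already True, A[i] can be skipped; otherwise A[i] must have been used. `find_one_subset` is a hypothetical name.

```python
def find_one_subset(A, K):
    N = len(A)
    # DP[i][k] is True if some subset of the suffix A[i:] sums to k
    DP = [[False] * (K + 1) for _ in range(N + 1)]
    DP[N][0] = True  # the empty suffix can only make 0
    for i in range(N - 1, -1, -1):
        for k in range(K + 1):
            DP[i][k] = DP[i + 1][k] or (A[i] <= k and DP[i + 1][k - A[i]])
    if not DP[0][K]:
        return None  # no subset sums to K
    # walk from DP[0][K] down the rows, deciding for each A[i] whether it was used
    subset, k = [], K
    for i in range(N):
        if DP[i + 1][k]:
            continue  # k is still reachable without A[i], so skip it
        subset.append(A[i])  # otherwise A[i] is needed
        k -= A[i]
    return subset


print(find_one_subset([1, 4, 3, 5, 6], 9))  # [3, 6]
```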

Here is the optimized solution to the problem with a complexity of O(n^2).
def get_subsets(data: list, target: int):
    # initialize final result which is a list of all subsets summing up to target
    subsets = []
    # records the difference between the target value and a group of numbers
    differences = {}
    for number in data:
        prospects = []
        # iterate through every record in differences
        for diff in differences:
            # the number complements a record in differences, i.e. a desired subset is found
            if number - diff == 0:
                new_subset = [number] + differences[diff]
                new_subset.sort()
                if new_subset not in subsets:
                    subsets.append(new_subset)
            # the number fell short to reach the target; add to prospect instead
            elif number - diff < 0:
                prospects.append((number, diff))
        # update the differences record
        for prospect in prospects:
            new_diff = target - sum(differences[prospect[1]]) - prospect[0]
            differences[new_diff] = differences[prospect[1]] + [prospect[0]]
        differences[target - number] = [number]
    return subsets

Longest Arithmetic Progression

Given a list of numbers arr (not sorted), find the Longest Arithmetic Progression in it.
Input: an array of integers.
Constraints: 1 ≤ arr.size() ≤ 10^3 and -10^9 ≤ arr[i] ≤ 10^9.
Examples:
arr = [7,6,1,9,7,9,5,6,1,1,4,0] -------------- output = [7,6,5,4]
arr = [4,4,6,7,8,13,45,67] -------------- output = [4,6,8]
from itertools import combinations
def arithmeticProgression2(a):
    n=len(a)
    diff = ((y-x, x) for x, y in combinations(a, 2))
    dic=[]
    for d, n in diff:
        k = []
        seq=a
        while n in seq:
            k.append(n)
            i=seq.index(n)
            seq=seq[i+1:]
            n += d
        dic.append(k)
    maxx=max([len(k) for k in dic])
    for x in dic:
        if len(x)==maxx:
            return x
When arr.size() is large, my code runs for more than 4000 ms.
Example:
arr = [randint(-10**9,10**9) for i in range(10**3)]
runtime > 4000 ms
How can I reduce the time complexity of the above solution?

One of the things that makes the code slow is that you build series from scratch for each pair, which is not necessary:
you don't actually need to build k each time. If you just keep the step, the length and the start (or end) value of a progression, you know enough. Only build the progression explicitly for the final result.
by doing this for each pair, you also create series whose start point is in fact in the middle of a longer series with the same step. Part of that work is duplicated and useless, because the progression that starts earlier will evidently be longer than the one currently analysed.
It makes your code run in O(n³) time instead of the possible O(n²).
The following seems to return the result much faster in O(n²), using dynamic programming:
def longestprogression(data):
    if len(data) < 3:
        return data
    maxlen = 0  # length of longest progression so far
    endvalue = None  # last value of longest progression
    beststep = None  # step of longest progression
    # progressions ending in index i, keyed by their step size,
    # with the progression length as value
    dp = [{} for _ in range(len(data))]
    # iterate all possible ending pairs of progressions
    for j in range(1, len(data)):
        for i in range(j):
            step = data[j] - data[i]
            if step in dp[i]:
                curlen = dp[i][step] + 1
            else:
                curlen = 2
            dp[j][step] = curlen
            if curlen > maxlen:
                maxlen = curlen
                endvalue = data[j]
                beststep = step
    # rebuild the longest progression from the values we maintained
    # (a list comprehension instead of range(), so that a step of 0 also works)
    return [endvalue - beststep * k for k in range(maxlen - 1, -1, -1)]

Is there a python function that returns the first positive int that does not occur in list?

I'm trying to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine but has a high order of complexity. Is there another solution that reduces the order of complexity?
Note: 10000000 is the range of the integers in array A. I tried the sort function, but does it reduce the complexity?
def solution(A):
    for i in range(1, 10000000):
        if A.count(i) == 0:
            return i

The following is O(n log n):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
    print(1)
else:
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            print(a[i - 1] + 1)
            break
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
    if a[i] > a[i - 1] + 1:
        print(a[i - 1] + 1)
        break
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there are no negative numbers in the input.
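One way to close both caveats is sketched below, assuming the input is a plain list of ints. It keeps only the positive values, adds the 0 sentinel from the second version, and returns one past the largest value when no gap is found. That fallback also covers empty input and inputs such as [1, 2, 3], where the loops above print nothing. `first_missing_positive` is a hypothetical name.

```python
def first_missing_positive(a):
    # positives only (deduplicated by the set), plus the 0 sentinel
    values = sorted({x for x in a if x > 0} | {0})
    for prev, cur in zip(values, values[1:]):
        if cur > prev + 1:
            return prev + 1  # first gap found
    # no gap: one past the largest value (1 if there were no positives at all)
    return values[-1] + 1


print(first_missing_positive([2, 1, 10, 3, 2, 15]))  # 4
```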

O(n) time and O(n) space:
def solution(A):
    count = [0] * len(A)
    for x in A:
        if 0 < x <= len(A):
            count[x-1] = 1  # count[0] is to count 1
    for i in range(len(count)):
        if count[i] == 0:
            return i+1
    return len(A)+1  # only if A = [1, 2, ..., len(A)]

This should be O(n). Utilizes a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]
# use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
    if i > 0:
        temp_set.add(i)
# iterate from 1 up to length of set + 1 (to ensure edge case is handled)
for i in range(1, len(temp_set) + 2):
    if i not in temp_set:
        print(i)
        break

My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater than or equal to pivot) and decides which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers such that lo <= n < hi contains "gaps" only if it has fewer than (hi - lo) elements.
The input sequence must not contain duplicates. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
    # cmin = possible minimum not ruled out yet
    size = len(cseq)
    if size <= 1:
        return cmin+1 if cmin in cseq else cmin
    lt = []
    ge = []
    pivot = cmin + size // 2
    for n in cseq:
        (lt if n < pivot else ge).append(n)
    return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)
test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1

Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
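import numpy
def firstNotPresent(l):
    positive = numpy.fromiter(set(l), dtype=int)  # deduplicate
    positive = positive[positive > 0]  # only keep positive numbers
    positive.sort()
    top = positive.size + 1
    if top == 1:  # no positive numbers (includes the empty list)
        return 1
    sequence = numpy.arange(1, top)
    try:
        # sequence holds 1..positive.size; return the sequence VALUE at the first
        # position where it falls below the sorted list (the 0-based position is off by one)
        return int(sequence[numpy.where(sequence < positive)[0][0]])
    except IndexError:  # no numbers are missing, top is next
        return top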
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the index is less than the list value, the index value is missing from the list, and hence is the lowest positive number missing from the list.
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log n), because of the sort. It's just that the sort is so fast comparatively speaking that it looked linear overall.

Trying to write a code that will find the sum of the even numbers of the Fibonacci sequence?

I'm new to programming and I'm trying to write a program in Python that will find the sum of the even numbers of the numbers below 4,000,000 in the Fibonacci sequence. I'm not sure what I'm doing wrong but nothing will print. Thanks for any help.
def fib():
    listx = []
    for x in range(4000000):
        if x == 0:
            return 1
        elif x == 1:
            return 1
        else:
            listx.append(fib(x - 1) + fib(x - 2))
    return listx
def evens(fib):
    y = 0
    for x in fib():
        if x % 2 == 0:
            y += x
        else:
            continue
    print (y)

Here's an approach that uses a generator to keep memory usage to a minimum:
def fib_gen(up_to):
    n, m = 0, 1
    while n <= up_to:
        yield n
        n, m = m, n + m
total = 0
for f in fib_gen(4000000):
    if f % 2 == 0:
        total += f
Another option:
def fib_gen(up_to, filter):
    n, m = 0, 1
    while n <= up_to:
        if filter(n):
            yield n
        n, m = m, n + m
sum(fib_gen(4000000, lambda f: f % 2 == 0)) # sum of evens
sum(fib_gen(4000000, lambda f: f % 2)) # sum of odds

First things first, there appears to be some contention between your requirements and the code you've delivered :-) The text of your question (presumably taken from an assignment, or Euler #2) requests the ...
sum of the even numbers of the numbers below 4,000,000 in the Fibonacci sequence.
Your code is summing the even numbers from the first four million Fibonacci numbers which is vastly different. The four millionth Fibonacci number has, according to Binet's formula, north of 800,000 digits in it (as opposed to the seven digits in the highest one below four million).
So, assuming the text to be more correct than the code, you don't actually need to construct a list and then evaluate every item in it; that's rather wasteful on memory.
The Fibonacci numbers can be generated on the fly and then simply accumulated if they're even. It's also far more useful to be able to use an arbitrary method to accumulate the numbers, something like the following:
def sumFibWithCond(limit, callback):
    # Set up initial conditions.
    grandparent, parent, child = 0, 0, 1
    accum = 0
    # Loop until number is at or beyond limit.
    while child < limit:
        # Add any suitable number to the accumulator.
        accum = accum + callback(child)
        # Set up next Fibonacci cycle.
        grandparent, parent = parent, child
        child = grandparent + child
    # Return accumulator when done.
    return accum
def accumulateEvens(num):
    # Return even numbers as-is, zero for odd numbers.
    if num % 2 == 0:
        return num
    return 0
sumEvensBelowFourMillion = sumFibWithCond(4000000, accumulateEvens)
Of special note is the initial conditions. The numbers are initialised to 0, 0, 1 since we want to ensure we check every Fibonacci number (in child) for the accumulating condition. This means the initial value of child should be one assuming, as per the question, that's the first number you want.
This doesn't make any difference in the current scenario since one is not even but, were you to change the accumulating condition to "odd numbers" (or any other condition that allowed for one), it would make a difference.
And, if you'd prefer to subscribe to the Fibonacci sequence starting with zero, the starting values should be 0, 1, 0 instead.

Maybe this will help you.
def sumOfEvenFibs():
    # a,b,c in the Fibonacci sequence
    a = 1
    b = 1
    result = 0
    while b < 4000000:
        if b % 2 == 0:
            result += b
        c = a + b
        a = b
        b = c
    return result
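The following refinement does not come from the answers above. Only every third Fibonacci number is even (2, 8, 34, 144, ...), and those even terms satisfy E(k) = 4*E(k-1) + E(k-2), so the odd terms never need to be generated. The sketch below assumes the limit is exclusive, matching "below 4,000,000". `sum_even_fibs` is a hypothetical name.

```python
def sum_even_fibs(limit):
    # even Fibonacci numbers F(3), F(6), F(9), ... = 2, 8, 34, 144, ...
    # satisfy E(k) = 4 * E(k-1) + E(k-2), starting from E(0) = 0, E(1) = 2
    total = 0
    prev, cur = 0, 2
    while cur < limit:  # strictly below the limit
        total += cur
        prev, cur = cur, 4 * cur + prev
    return total


print(sum_even_fibs(4000000))  # 4613732
```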

Interviewstreet's Insertion sort program

I tried to program Interviewstreet's Insertion sort challenge in Python; my code is shown below.
The program runs fine up to some number of input elements (I'm not sure exactly what the limit is), but returns wrong output for larger inputs. Can anyone tell me what I'm doing wrong?
# This program tries to identify number of times swapping is done to sort the input array
"""
=>Get input values and print them
=>Get number of test cases and get inputs for those test cases
=>Complete Insertion sort routine
=>Add a variable to count the swaps
"""
def sort_swap_times(nums):
    """ This function takes a list of elements and then returns the number of times
    swapping was necessary to complete the sorting
    """
    times_swapped = 0L
    # perform the insertion sort routine
    for j in range(1, len(nums)):
        key = nums[j]
        i = j - 1
        while i >= 0 and nums[i] > key:
            # perform swap and update the tracker
            nums[i + 1] = nums[i]
            times_swapped += 1
            i = i - 1
        # place the key value in the position identified
        nums[i + 1] = key
    return times_swapped
# get the upper limit.
limit = int(raw_input())
swap_count = []
# get the length and elements.
for i in range(limit):
    length = int(raw_input())
    elements_str = raw_input() # returns a list of strings
    # convert the given elements from str to int
    elements_int = map(int, elements_str.split())
    # pass integer elements list to perform the sorting
    # get the number of times swapping was needed and append the return value to swap_count list
    swap_count.append(sort_swap_times(elements_int))
# print the swap counts for each input array
for x in swap_count:
    print x

Your algorithm is correct, but this is a naive approach to the problem and will give you a Time Limit Exceeded verdict on large test cases (i.e., len(nums) > 10000). Let's analyze the run-time complexity of your algorithm.
for j in range(1, len(nums)):
    key = nums[j]
    i = j - 1
    while i >= 0 and nums[i] > key:
        # perform swap and update the tracker
        nums[i + 1] = nums[i]
        times_swapped += 1
        i = i - 1
    # place the key value in the position identified
    nums[i + 1] = key
In the worst case (a reverse-sorted list), the number of steps required in the above snippet is proportional to 1 + 2 + ... + (len(nums)-1), or len(nums)*(len(nums)-1)/2 steps, which is O(len(nums)^2).
Hint:
Use the fact that all values will be within [1, 10^6]. What you are really doing here is finding the number of inversions in the list, i.e. all pairs i < j such that nums[i] > nums[j]. Think of a data structure that lets you find the number of swaps needed for each insert operation in logarithmic time. Of course, there are other approaches.
Spoiler:
Binary Indexed Trees
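Below is a minimal Binary Indexed Tree (Fenwick tree) sketch of that hint. It assumes every value lies in [1, max_value], with max_value = 10^6 as stated. For each element it counts how many previously seen values are strictly greater. That count equals the number of shifts insertion sort performs for that element, so the total takes O(n log max_value) time. `count_inversions` is a hypothetical name, and the sketch is Python 3, unlike the question's Python 2 code.

```python
def count_inversions(nums, max_value=10**6):
    # Fenwick tree indexed by value (1..max_value); tree[v] stores a partial count
    tree = [0] * (max_value + 1)

    def add(v):
        # record one occurrence of value v
        while v <= max_value:
            tree[v] += 1
            v += v & -v

    def count_le(v):
        # how many recorded values are <= v
        total = 0
        while v > 0:
            total += tree[v]
            v -= v & -v
        return total

    inversions = 0
    for seen, x in enumerate(nums):
        # earlier values strictly greater than x = shifts insertion sort makes for x
        inversions += seen - count_le(x)
        add(x)
    return inversions


print(count_inversions([2, 1, 3, 1, 2]))  # 4
```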

We Keep Coding
