Finding Missing Element in an Array - Python

I have an interesting problem: given two sorted arrays, a with n elements and b with n-1 elements, where b has all the elements of a except that one element is missing, how do I find that element in O(log n) time?
I have tried this code:
def lostElements2(a, b):
    if len(a) < len(b):
        a, b = b, a
    l, r = 0, len(a)-1
    while l < r:
        m = l + (r-l)//2
        if a[m] == b[m]:
            l = m+1
        else:
            r = m - 1
    return a[r]

print(lostElements2([-1,0,4,5,7,9], [-1,0,4,5,9]))
I am not sure what the function should return: should it be a[l] or a[r]?
I understand what the logic inside the function should be: if the mid values of both arrays match, then b up to the mid point is the same as a, and hence the missing element must be to the right of mid.
But I am not able to produce a final solution: when should the loop stop, and what should be returned? How is it guaranteed that a[l] or a[r] is indeed the missing element?

The invariant for l and r should be that l is always a position where the lists are equal, while r is always a position where they differ, i.e.
a[l]==b[l] and a[r]!=b[r]
The only mistake in the code is updating r to m-1 instead of m. If we know that a[m]!=b[m], we can safely set r=m. But setting it to m-1 risks getting a[r]==b[r], which breaks the invariant.
def lostElements2(a, b):
    if len(a) < len(b):
        a, b = b, a
    if a[0] != b[0]:
        return a[0]
    l, r = 0, len(a)-1
    while l < r:
        m = l + (r-l)//2
        if a[m] == b[m]:
            l = m+1
        else:
            r = m  # The only change
    return a[r]
(As @btilly points out, this algorithm fails if we allow for repeated values.)
Edit from @btilly:
To fix that potential flaw, when the values at the probed index are equal, we search for the whole range sharing that value. To do that we walk forward in steps of size 1, 2, 4, 8 and so on until the value switches, then do a binary search; and we walk backwards the same way. Then we look for a difference at each edge of the run.
The effort required for that search is O(log(k)), where k is the length of the run of the repeated value. So we are now replacing O(log(n)) lookups with searches. If there is an upper bound K on the length of that search, the overall running time is O(log(n)log(K)). That makes the worst case running time O(log(n)^2). If K is close to sqrt(n), it is easy to actually hit that worst case.
I claimed in a comment that if at most K elements are repeated more than K times then the running time is O(log(n)log(K)). On further analysis, that claim is wrong. If K = log(n) and the log(n) runs of length sqrt(n) are placed to hit all the choices of the search, then you get running time O(log(n)^2) and not O(log(n)log(log(n))).
However if at most log(K) elements are repeated more than K times, then you DO get a running time of O(log(n)log(K)). Which should be good enough for most cases. :-)
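The galloping search described above can be sketched as follows. run_bounds is a hypothetical helper (the name and signature are mine, not @btilly's): given a sorted list and an index, it returns the extent of the run of equal values around that index in O(log k) time for a run of length k.

```python
from bisect import bisect_left, bisect_right

def run_bounds(arr, i):
    # Return the half-open range [lo, hi) of the run of values equal
    # to arr[i] in the sorted list arr.
    v = arr[i]
    # Gallop right in steps 1, 2, 4, ... until the value switches,
    # then binary-search inside the last window.
    step, hi = 1, i
    while hi + step < len(arr) and arr[hi + step] == v:
        hi += step
        step *= 2
    hi = bisect_right(arr, v, hi, min(len(arr), hi + step))
    # Same walk going backwards.
    step, lo = 1, i
    while lo - step >= 0 and arr[lo - step] == v:
        lo -= step
        step *= 2
    lo = bisect_left(arr, v, max(0, lo - step), lo)
    return lo, hi
```

For example, run_bounds([1, 2, 2, 2, 3], 2) gives (1, 4); comparing a and b at both edges of that run then tells you on which side the arrays first disagree.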

The principle of this problem is simple, the details are hard.
You have arranged that array a is the longer one. Good, that simplifies life. Now you need to return the value of a at the first position where the value of a differs from the value of b.
Now you need to be sure to deal with the following edge cases.
The differing value is the last (i.e. at a position where only array a has a value).
The differing value is the very first. (Binary search algorithms are easy to screw up for this case.)
There is a run of identical values. That is, a = [1, 1, 2, 2, 2, 2, 3] while b = [1, 2, 2, 2, 2, 3]: when you land in the middle, the fact that the values match can mislead you!
Good luck!
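The third edge case can be seen concretely with the arrays above: the midpoints match even though the missing element lies to the left.

```python
a = [1, 1, 2, 2, 2, 2, 3]
b = [1, 2, 2, 2, 2, 3]
m = (0 + len(a) - 1) // 2   # m == 3
# a[m] == b[m] == 2, yet the missing element (the extra 1) sits at
# index 1, to the LEFT of m, so "values match => search right" fails here.
assert a[m] == b[m] == 2
```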

Your code does not handle the case where the missing element is at index m itself. The if/else clause that follows will always move the bounds of where the missing element can be so that they exclude m.
You could fix this by including an additional check:
if a[m]==b[m]:
    l = m+1
elif m==0 or a[m-1]==b[m-1]:
    return a[m]
else:
    r = m - 1
An alternative would be to store the last value of m:
last_m = 0
...
else:
    last_m = m
    r = m - 1
...
return a[last_m]
which would cause it to return the value recorded the last time a mismatch was detected.
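For completeness, here is a runnable sketch combining the loop with the extra check (like the original code, this assumes values are not repeated):

```python
def lostElements2(a, b):
    # Binary search with the extra "first mismatch" check.
    if len(a) < len(b):
        a, b = b, a
    l, r = 0, len(a) - 1
    while l < r:
        m = l + (r - l) // 2
        if a[m] == b[m]:
            l = m + 1
        elif m == 0 or a[m-1] == b[m-1]:
            # a and b agree just before m, so a[m] is the missing element
            return a[m]
        else:
            r = m - 1
    return a[l]

print(lostElements2([-1, 0, 4, 5, 7, 9], [-1, 0, 4, 5, 9]))  # 7
```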

Sum of two squares in Python

I have written code based on the two-pointer algorithm to find a sum of two squares. My problem is that I run into a memory error when running this code for the input n = 55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look = tuple(range(n))
    i = 0
    j = len(look)-1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j, i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None

n = 55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem I'm trying to solve using the two-pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).
You're creating a tuple of 55555**2 + 66666**2 = 7530713581 elements.
So even if each element of the tuple took only one byte, the tuple would take up 7.01 GiB.
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
Specifically for this problem:
Why use a tuple at all?
You create the variable look, which is just a tuple of integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?
As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1

def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple (larger number first).
    # Otherwise return None
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n - x)
        if r >= 0:
            return (r, i)
        i, x, y = i + 1, x + y, y + 2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only ever need the current numbers i and j. Also, as the friendly commenter suggested, you can start with sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    while i < j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
    return None

Bear in mind that for very large n the search can still take a while. Be patient. And no, NumPy won't help. There is nothing here to vectorize.

Trying to understand the time complexity of this dynamic recursive subset sum

# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
    # construct a unique key from dynamic elements of the input
    key = (n, k)
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
        # assign true if we get subset by including or excluding the current item
        lookup[key] = include or exclude
    # return solution to the current subproblem
    return lookup[key]

if __name__ == '__main__':
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
    # create a dictionary to store solutions to subproblems
    lookup = {}
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')
It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why.
Can someone help me? A wordy explanation or a recurrence relation, anything is fine.
The simplest explanation I can give is to realize that when lookup[(n, k)] has a value, it is True or False and indicates whether some subset of A[:n+1] sums to k.
Imagine a naive algorithm that just fills in all the elements of lookup row by row.
lookup[(0, i)] (for 0 ≤ i ≤ total) is true for just two values, i = A[0] and i = 0; all the other entries are false.
lookup[(1, i)] (for 0 ≤ i ≤ total) is true if lookup[(0, i)] is true, or if i ≥ A[1] and lookup[(0, i - A[1])] is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those.
...
lookup[(r, i)] (for 0 ≤ i ≤ total) is true if lookup[(r - 1, i)] is true, or if i ≥ A[r] and lookup[(r - 1, i - A[r])] is true.
Filling in the table this way, it is clear that we can completely fill the rows 0 ≤ row < len(A) in time O(len(A) * total), since filling in each element takes constant time. And our final answer is just checking whether the entry (len(A) - 1, k) is True in the table.
Your program is doing the exact same thing, but calculating the value of entries of lookup as they are needed.
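The row-by-row filling described above can be written out directly. subset_sum_table below is an illustrative bottom-up sketch (the name is mine, not from the question):

```python
def subset_sum_table(A, total):
    # table[r][i] is True iff some subset of A[:r+1] sums to i.
    n = len(A)
    table = [[False] * (total + 1) for _ in range(n)]
    table[0][0] = True                # the empty subset
    if A[0] <= total:
        table[0][A[0]] = True         # the subset {A[0]}
    for r in range(1, n):
        for i in range(total + 1):
            # Either reach i without A[r], or use A[r] on top of i - A[r].
            table[r][i] = table[r-1][i] or (i >= A[r] and table[r-1][i - A[r]])
    return table[n-1][total]

print(subset_sum_table([7, 3, 2, 5, 8], 14))  # True (7 + 2 + 5)
```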
Sorry for submitting two answers. I think I came up with a slightly simpler explanation.
Take your code and imagine putting the three lines inside if key not in lookup: into a separate function, calculateLookup(A, n, k, lookup). Define the cost of calling calculateLookup for a specific n and k to be the total time spent in the call to calculateLookup(A, n, k, lookup), excluding any recursive calls to calculateLookup.
The key insight is that, as defined above, the cost of calling calculateLookup() for any n and k is O(1). Since we exclude recursive calls from the cost, and there are no loops, the cost of calculateLookup is just the cost of executing a few tests.
The entire algorithm does a fixed amount of work, calls calculateLookup, and then does a small amount of work. Hence the time spent in our code comes down to asking: how many times do we call calculateLookup?
Now we're back to the previous answer. Because of the lookup table, every call to calculateLookup has a different value of (n, k). We also know that we check the bounds of n and k before each call to calculateLookup, so 1 ≤ k ≤ sum and 0 ≤ n ≤ len(A). So calculateLookup is called at most (len(A) * sum) times.
In general, for these algorithms that use memoization/caching, the easiest thing to do is to separately calculate and then sum:
How long things take assuming all values you need are cached.
How long it takes to fill the cache.
The algorithm you presented is just filling up the lookup cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.
The code would be slightly faster with
lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)
Doesn't change the O() of the code in the worst case, but can avoid some unnecessary calculations.

No Negative Prefix

def minOperation(A, N):
    operations = 0
    for i in range(N):
        if sum(A[:i+1]) < 0:
            A[i] = A[i]+1
            operations += 1
    return operations
What am I doing wrong in this code?
The question says:
Given an array A[] of N integers. In each operation, the person can increase the ith element by 1 (i.e. set A[i] = A[i] + 1). The task is to calculate the minimum number of operations required such that there is no prefix in the array A[] whose sum is less than zero. (i.e. for all i, This condition should be satisfied A[1] + A[2] + .. + A[i] >= 0).
Your code assumes that a prefix sum that becomes negative can always be fixed by just adding one to the last term, but you might suddenly need to add a one to every term visited so far. For example:
A = [1, 1, 1, 1, -9]
Your code will not add a one to any of the first 4 terms, and will then only change -9 to -8, which is not enough. At that moment you should really consider that all the previous terms should get an extra point, so that the prefix sum becomes 2+2+2+2-8, which is still OK.
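Restating the question's function (with indentation) and running it on this example shows the failure:

```python
def minOperation(A, N):
    # The question's approach: bump the last term whenever a prefix
    # sum goes negative.
    operations = 0
    for i in range(N):
        if sum(A[:i+1]) < 0:
            A[i] = A[i] + 1
            operations += 1
    return operations

A = [1, 1, 1, 1, -9]
print(minOperation(A, len(A)))  # prints 1: only -9 got bumped, to -8
print(sum(A))                   # prints -4: the full prefix sum is still negative
```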
So the idea is to keep track of how many ones you have added; this also tells you at which index the next one could be added, if needed. Each time your running total becomes negative, you know how many additional operations you need. If that many are available, use them to bring the total back to zero.
Code:
def minOperations(lst):
    operations = 0
    total = 0
    for i, val in enumerate(lst):
        total += val
        if total < 0:
            # Add the number of operations to make the total 0
            operations += -total
            total = 0
            # If more operations are needed than available...
            if operations > i + 1:
                return -1  # ...it cannot be solved
    return operations

# Example run
lst = [1, 1, -5, 3, 2, -4, -1]
print(minOperations(lst))  # 3
NB: it is not efficient to recalculate the sum of the subarray at each iteration. Just keep adding the current value to a running sum.
Walk through the array and calculate the prefix sums. If for every prefix sum the inequality
-PrefixSum[i] <= i
holds (with 1-based indices), then the result is
max(0, -min(PrefixSum))
Otherwise the result is None (we cannot add enough ones).
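That formulation translates almost line for line into code (min_operations_prefix is an illustrative name; infeasible inputs yield None):

```python
def min_operations_prefix(A):
    # Feasible iff -prefix[i] <= i for every 1-based i;
    # the answer is then max(0, -min(prefix sums)).
    prefix = 0
    min_prefix = 0
    for i, val in enumerate(A, start=1):
        prefix += val
        if -prefix > i:
            return None      # cannot add enough ones
        min_prefix = min(min_prefix, prefix)
    return max(0, -min_prefix)

print(min_operations_prefix([1, 1, -5, 3, 2, -4, -1]))  # 3
```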

How to generate natural products in order?

You want to have a list of the ordered products n x m such that both n and m are natural numbers and 1 < (n x m) < upper_limit, say upper_limit = 100. Also, both n and m cannot be bigger than the square root of the upper limit (therefore n <= 10 and m <= 10).
The most straightforward thing to do would be to generate all the products with a generator expression and then sort the result:
sorted(n*m for n in range(1, 11) for m in range(1, n+1))
However, when upper_limit becomes very big this is not very efficient, especially if the objective is to find only one number matching certain criteria (e.g. find the max product such that ...). In that case I would want to generate the products in descending order, test them, and stop the whole process as soon as I find the first one that meets the criteria.
So, how to generate these products in order?
The first thing I did was to start from upper_limit and go backwards one by one, making a double test:
- checking if the number can be a product of n and m
- checking the criteria
Again, this is not very efficient...
Any algorithm that solves this problem?
I found a slightly more efficient solution to this problem.
For a and b being natural numbers:
S = a + b
D = abs(a - b)
If S is constant, the smaller D is, the bigger a*b is.
For each S (taken in decreasing order) it is therefore possible to iterate through all the possible tuples (a, b) with increasing D.
First I apply the external condition: if the product a*b satisfies it, I then iterate through other (a, b) tuples with smaller S and increasing D, to check whether some other number satisfies the same condition but has a bigger a*b. I repeat the iteration until I reach a tuple with D == 0 or 1 (because in that case there cannot be a tuple with smaller S that has a higher product).
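The relationship between S, D and the product follows from the identity a*b = (S**2 - D**2) / 4: for a fixed S, a smaller D means a bigger product. A quick brute-force check of the identity:

```python
# Verify a*b == (S**2 - D**2) / 4 for all pairs up to 10.
for a in range(1, 11):
    for b in range(1, 11):
        S, D = a + b, abs(a - b)
        assert a * b == (S * S - D * D) // 4
print("identity holds")
```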
The following code will check all the possible combinations without repetition and will stop when the condition is met. If the break is executed in the inner loop, the break statement in the outer loop is executed as well; otherwise, the continue statement is executed.
from math import sqrt

n = m = round(sqrt(int(input("Enter upper limit"))))
for i in range(n, 0, -1):
    for j in range(i - 1, 0, -1):
        if <required condition>:
            n = i
            m = j
            break
    else:
        continue
    break
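A different way to get the products in non-increasing order (not part of this answer, just a sketch) is a max-heap of candidate pairs, starting from the largest one. Popping always yields the largest remaining product, so you can stop at the first product that satisfies your criterion.

```python
import heapq

def products_descending(limit):
    # Yield (product, n, m) with n >= m, products in non-increasing order.
    r = int(limit ** 0.5)
    heap = [(-r * r, r, r)]     # max-heap via negated products
    seen = {(r, r)}
    while heap:
        neg, n, m = heapq.heappop(heap)
        yield -neg, n, m
        # Children (n-1, m) and (n, m-1) have smaller products, so the
        # heap always pops the largest remaining product next.
        for nn, mm in ((n - 1, m), (n, m - 1)):
            if nn >= mm >= 1 and (nn, mm) not in seen:
                seen.add((nn, mm))
                heapq.heappush(heap, (-nn * mm, nn, mm))
```

For limit = 100 this yields 100, 90, 81, 80, 72, 70, ... and the loop can be abandoned as soon as a product passes the test.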

Heap sort not working python

Referring to chapter 4 of MIT's OpenCourseWare algorithms course, I created heap sort according to the pseudocode given:
def heap_sort(A):
    build_max_heap(A)
    curr_heap_size = len(A)
    while curr_heap_size:
        A[0],A[curr_heap_size-1] = A[curr_heap_size-1],A
        curr_heap_size -= 1
        max_heapify(A,0)
    return A
build_max_heap is guaranteed to be correct, as I checked it against Python's heapq library.
However, heap_sort does not seem to work correctly:
test_array = [1,5,3,6,49,2,4,5,6]
heap_sort(test_array)
print(test_array) # --> [6,5,5,4,3,49,6,2,1]
Completely stumped here, I cross checked with Heap sort Python implementation and it appears to be the same...
Would appreciate help, thank you!
EDIT: Code for max_heapify and build_max_heap:
def max_heapify(A,i):
    heap_size = len(A)
    l,r = i*2+1,i*2+2
    largest = l
    if l < heap_size and A[l] > A[largest]:
        largest = l
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i],A[largest] = A[largest],A[i]
        max_heapify(A,largest)

def build_max_heap(A):
    array_size = len(A)
    for i in range(array_size // 2 ,-1,-1):
        max_heapify(A,largest)
You have a few mistakes in your code that made it harder to reproduce your case and find the solution to your particular issue, but here it goes.
First, your code includes a syntax error in heap_sort function, specifically when you try to swap the first and the last elements of A. On the right hand side of that assignment, the second value is A, even though it should be A[0].
Secondly, your usage of the variable largest in build_max_heap implies either that largest is a global variable whose declaration you did not provide in your question, or that you meant to use i instead. I assumed the second case, and since I have a working heap_sort based on the code you provided, I reckon my assumption was correct.
Third, in max_heapify, you initialize largest to l, even though you should initialize it to i. I believe you will find this to be a trivial mistake, as further down in that same function you clearly expect the value of largest to equal that of i.
Finally, your most crucial error is that you keep passing down the entire array and use an array length that never decreases (i.e. it is always the initial length of test_array). The algorithm you use finds the maximum element of the given array and excludes it from the remainder of the structure. That way, you have an array that keeps decreasing in size while sending its largest element to the end (i.e. just beyond its reach/length). But since you never decrease the size of the array, and its length is continually computed as len(test_array), it will never work as expected.
There are two approaches that can solve your issue. Option 1 is passing down to max_heapify a shorter version of A in heap_sort; specifically, you would pass A[:curr_heap_size] at each iteration of the loop. This can work, but it is not really space efficient, as you make a new list each time. Instead, you can pass curr_heap_size down as an argument to build_max_heap and max_heapify, and treat that as the length (as opposed to len(A)).
Below is a working heap_sort implementation, based on your code. All I did was fix the mistakes listed above.
def max_heapify(A, heap_size, i):
    l, r = i*2+1, i*2+2
    largest = i
    if l < heap_size and A[l] > A[largest]:
        largest = l
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, heap_size, largest)

def build_max_heap(A, array_size):
    for i in range(array_size // 2, -1, -1):
        max_heapify(A, array_size, i)

def heap_sort(A):
    curr_heap_size = len(A)
    build_max_heap(A, curr_heap_size)
    while curr_heap_size > 0:
        A[0], A[curr_heap_size-1] = A[curr_heap_size-1], A[0]
        curr_heap_size -= 1
        max_heapify(A, curr_heap_size, 0)
    return A
