Two sorted arrays, find the k-th smallest number - Python

Two sorted arrays A, B with lengths n, m (n <= m) are given, along with a k where k >= log n.
In O(log(nm)) time we can find the k-th smallest number in the union of these two arrays.
I have a solution (problem 2 here), but my question is: why do the two given conditions "(n <= m)" and "k >= log n" not affect this algorithm?

First assumption: n <= m is an assumption "without loss of generality". If n >= m, then just swap A and B in your head. They included this assumption even though it was not needed, because they felt it was "free" to make this assumption.
Second assumption: A trivial algorithm to find the kth smallest element is to iterate over A and B simultaneously, advancing in the array that has the smaller current element of the two. This would be exactly like running the "Merge" function from mergesort, but stopping once you've merged the first k elements. The complexity would be O(k). They wanted you to find a more complex algorithm, so they "ruled out" this algorithm by stating that k >= log(n), which implies that complexity O(k) is never better than O(log(n)). Technically, if they wanted to thoroughly rule this algorithm out, they should also have stated (roughly) that k <= n + m - log(n), otherwise you could run the "merge" function from the end: merge the n + m - k + 1 largest elements and return the last one merged (the (n + m - k + 1)-th largest), which is the same as the kth smallest element.
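For reference, here is a minimal sketch of that trivial O(k) merge-style approach (my own illustration, not the linked solution; k is 1-based and assumed to satisfy 1 <= k <= len(A) + len(B)):

def kth_smallest_merge(A, B, k):
    # Advance two pointers as in mergesort's merge step and stop after k elements.
    i = j = 0
    for _ in range(k - 1):
        if j >= len(B) or (i < len(A) and A[i] <= B[j]):
            i += 1
        else:
            j += 1
    if j >= len(B) or (i < len(A) and A[i] <= B[j]):
        return A[i]
    return B[j]

print(kth_smallest_merge([1, 3, 5, 7], [2, 4, 6], 5))  # 5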


Count all possible triangles

I have 3 lists each containing some integers.
Now I have to find out how many triangles I can make such that each side of the triangle is from a different list.
A = [3,1,2]
B = [1,5]
C = [2,4,1]
Possible Triangles:
3,5,4
1,1,1
2,1,2
2,5,4
So answer should be 4.
I tried using three loops and the triangle property that the sum of any two sides is always greater than the third. But the time complexity of this is O(n^3). I want something faster.
count = 0
for i in range(len(A)):
    for j in range(len(B)):
        for k in range(len(C)):
            if (A[i] + B[j] > C[k]) and (A[i] + C[k] > B[j]) and (C[k] + B[j] > A[i]):
                count += 1
print(count)
What's the O of this code?
import itertools as it
[sides for sides in it.product(A, B, C) if max(sides) < sum(sides) / 2.]
Edit: Jump to the bottom for an O(n^2) solution I found later, but which builds on the ideas in my previous solutions.
Solution 1: use binary search to get O(n^2 log(n))
Pseudo code:
C_s = sort C in ascending order
for a in A:
    for b in B:
        use binary search to find the highest index c_low in C_s s.t. C_s[c_low] <= |a-b|
        use binary search to find lowest index c_high in C_s s.t. C_s[c_high] >= a+b
        count += (c_high - c_low - 1)
Explanation and demonstration of correctness:
given the pair (a, b):
    values of c from C_s[0] to C_s[c_low] are too short to make a triangle
    values of c from C_s[c_high] to C_s[end] are too large
    the c_high - c_low - 1 values of c strictly between positions c_low and c_high will make a valid triangle with sides (a, b, c)
I assume the standard binary search behaviour of returning an index just past either end if all values in C_s are > |a-b| or all are < a+b, as if C_s[-1] = -infinity and C_s[end+1] = +infinity.
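Here is a minimal Python sketch of Solution 1 (my own transcription of the pseudo code) using the standard bisect module; the per-pair count is hi - lo rather than c_high - c_low - 1 because bisect returns insertion points directly:

from bisect import bisect_left, bisect_right

def count_triangles(A, B, C):
    # O(n^2 log n): for each pair (a, b), the valid third sides satisfy
    # |a - b| < c < a + b; count them with two binary searches on sorted C.
    C_s = sorted(C)
    count = 0
    for a in A:
        for b in B:
            lo = bisect_right(C_s, abs(a - b))  # first index with C_s[lo] > |a - b|
            hi = bisect_left(C_s, a + b)        # first index with C_s[hi] >= a + b
            count += max(0, hi - lo)
    return count

print(count_triangles([3, 1, 2], [1, 5], [2, 4, 1]))  # 4, matching the example above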
Solution 2: cache values to get best case O(n^2) but worst case O(n^2 log n)
The algorithm above can be improved with a cache for the results of the binary searches. If you cache them in an associative array with insert and lookup both O(1), then this is faster in the best case, and probably also in the average case, depending on the distribution of inputs:
C_s = sort C in ascending order
low_cache = empty associative array
high_cache = empty associative array
for a in A:
    for b in B:
        if |a-b| in low_cache:
            c_low = low_cache{|a-b|}
        else:
            use binary search to find the highest index c_low in C_s s.t. C_s[c_low] <= |a-b|
            set low_cache{|a-b|} = c_low
        if a+b in high_cache:
            c_high = high_cache{a+b}
        else:
            use binary search to find lowest index c_high in C_s s.t. C_s[c_high] >= a+b
            set high_cache{a+b} = c_high
        count += (c_high - c_low - 1)
Best case analysis: if A and B are both {1 .. n}, i.e. the n numbers between 1 and n, then the binary search for c_low will be called n times, and the binary search for c_high will be called 2*n-1 times, and the rest of the time the results will be found in the cache.
So the inner loop does 2*n^2 cache lookups with total time O(2*n^2 * 1) = O(n^2), plus 3*n-1 binary searches with total time O((3*n-1) log n) = O(n log n), with an overall best-case time of O(n^2).
Worst case analysis: the cache misses every time, we're back to the original algorithm with its O(n^2 log n).
Average case: this all depends on the distribution of inputs.
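A sketch of Solution 2 along the same lines: the identical loop, with the two binary-search results memoized in plain dicts keyed by |a-b| and a+b:

from bisect import bisect_left, bisect_right

def count_triangles_cached(A, B, C):
    C_s = sorted(C)
    low_cache, high_cache = {}, {}
    count = 0
    for a in A:
        for b in B:
            d, s = abs(a - b), a + b
            if d not in low_cache:                      # cache miss: one binary search
                low_cache[d] = bisect_right(C_s, d)
            if s not in high_cache:
                high_cache[s] = bisect_left(C_s, s)
            count += max(0, high_cache[s] - low_cache[d])
    return count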
More ideas I thought of
You might be able to make it faster by sorting A and B too. EDIT: see solution 3 below for how to get rid of the log n factor by sorting B. I haven't yet found a way to improve it further by sorting A, though.
For example, if you sort B, then as you move to the next b, you know the next c_low is greater than or equal to the previous one if |a-b| is greater, and vice versa. Similar logic applies to c_high. EDIT: the merge idea in solution 3 takes advantage of exactly this fact.
An easy optimization if A, B and C are of different sizes: make C the largest one of the three lists.
Another idea that doesn't actually help, but that I think is worth describing to guide further thought:
calculate all the values of |a-b| and a+b and store them in an array (time: O(n^2))
sort that array (O((n^2)log(n^2))=O(n^2 log n)) -- this is the step that breaks the idea and brings us back to O(n^2 log n),
use a merge algorithm between that sorted array and C_sorted (O(n^2 + n) = O(n^2)) to extract the info from the results.
That is very fuzzy pseudo code, I know, but if sorting the array of n^2 elements could somehow be made faster, maybe by constructing that array in a more clever way by sorting A and B first, maybe the whole thing could come down to O(n^2), at the cost of significantly increased complexity. However, thinking about this overnight, I don't see how to get rid of the log n factor here: you could make O(n) sorted lists of O(n) elements and merge them, but merging O(n) lists brings the log n factor right back in - it may be somewhat faster, but only by a constant, the complexity class stays the same.
Solution 3: use merge to bring it down to O(n^2)
Trying to figure out how I could take advantage of sorting B, I finally realized the idea was simple: use the merging step of merge sort, which can merge two sorted lists of n elements in 2*n - 1 comparisons, but modify it to calculate c_low and c_high (as defined in solution 1) for all elements of B in O(n) operations.
For c_high, this is simple, because {a+b for b in B} sorts in the same order as B, but for c_low it will take two merging passes, since {|a-b| for b in B} is descending as b increases up to b<=a, and ascending again for b>=a.
Pseudo code:
B_s = sort B in ascending order
C_s = sort C in ascending order
C_s_rev = reverse C_s
for a in A:
    use binary search to find the highest index mid where B_s[mid] <= a
    use an adapted merge over B_s[0..mid] and C_s_rev to calculate c_low_i for i in 0 .. mid
    use an adapted merge over B_s[mid+1..end_B] and C_s to calculate c_low_i for i in mid+1 .. end_B
    use an adapted merge over B_s and C_s to calculate c_high_i for i in range(len(B_s))
    for i in 0 .. end_B:
        count += c_high_i - c_low_i - 1
Big-Oh analysis: the body of the loop takes O(log(n) + 2n + 2n + 2n + n) = O(n) and runs n times (over the elements of A) yielding a total complexity class of O(n^2).
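A Python sketch of Solution 3 (again my own transcription): B and C are sorted once, and for each a the c_low / c_high counts for every b come from two-pointer passes instead of per-b binary searches, so each a costs O(n) plus one O(log n) search for the split point:

from bisect import bisect_right

def count_triangles_merge(A, B, C):
    B_s, C_s = sorted(B), sorted(C)
    nB, nC = len(B_s), len(C_s)
    count = 0
    for a in A:
        mid = bisect_right(B_s, a)      # B_s[:mid] holds the b values with b <= a
        low = [0] * nB                  # low[i]  = #{c in C_s : c <= |a - B_s[i]|}
        high = [0] * nB                 # high[i] = #{c in C_s : c <  a + B_s[i]}
        # pass 1: for b <= a, |a - b| shrinks as b grows, so walk that part backwards
        p = 0
        for i in range(mid - 1, -1, -1):
            d = a - B_s[i]
            while p < nC and C_s[p] <= d:
                p += 1
            low[i] = p
        # pass 2: for b > a, |a - b| = b - a grows with b, so walk forwards
        p = 0
        for i in range(mid, nB):
            d = B_s[i] - a
            while p < nC and C_s[p] <= d:
                p += 1
            low[i] = p
        # pass 3: a + b grows with b, so one forward pass gives all the high[i]
        p = 0
        for i in range(nB):
            s = a + B_s[i]
            while p < nC and C_s[p] < s:
                p += 1
            high[i] = p
        count += sum(high[i] - low[i] for i in range(nB))
    return count

print(count_triangles_merge([3, 1, 2], [1, 5], [2, 4, 1]))  # 4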

How to generate natural products in order?

You want to have a list of the ordered products n x m such that both n and m are natural numbers and 1 < (n x m) < upper_limit, say upper_limit = 100. Also, neither n nor m can be bigger than the square root of the upper limit (therefore n <= 10 and m <= 10).
The most straightforward thing to do would be to generate all the products with a list comprehension and then sort the result.
sorted(n*m for n in range(1, 11) for m in range(1, n + 1))
However, when upper_limit becomes very big, this is not very efficient, especially if the objective is to find only one number given certain criteria (e.g. find the max product such that ...; I would want to generate the products in descending order, test them, and stop the whole process as soon as I find the first one that respects the criteria).
So, how to generate these products in order?
The first thing I did was to start from upper_limit and go backwards one by one, making a double test:
- checking if the number can be a product of n and m
- checking for the criteria
Again, this is not very efficient ...
Any algorithm that solves this problem?
I found a slightly more efficient solution to this problem.
For a and b being natural numbers:
S = a + b
D = abs(a - b)
If S is constant, the smaller D is, the bigger a*b is (since a*b = (S^2 - D^2) / 4).
For each S (taken in decreasing order) it is therefore possible to iterate through all the possible tuples (a, b) with increasing D.
First I apply the external condition, and if the product ab respects the condition, I then iterate through other (a, b) tuples with decreasing S and increasing D to check whether I find other numbers that respect the same condition but have a bigger ab. I repeat the iteration until I find a number with D == 0 or 1 (because in that case there cannot be tuples with smaller S that have a higher product).
The following code will check all the possible combinations without repetition and will stop when the condition is met. In this code, if the break is executed in the inner loop, the break statement in the outer loop is executed as well (thanks to the for/else); otherwise the continue statement is executed.
from math import sqrt

n = m = round(sqrt(int(input("Enter upper limit"))))
for i in range(n, 0, -1):
    for j in range(i - 1, 0, -1):
        if <required condition>:  # placeholder for the external criterion
            n = i
            m = j
            break
    else:
        continue
    break
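To make the sum/difference idea above concrete, here is a rough sketch of that search (my own illustration, not the answer's code): cond is a hypothetical stand-in for the external condition, a and b range over 1..n with n = isqrt(upper_limit), and the stopping rule is expressed as a bound on the best product still reachable for the current S rather than the D == 0 or 1 test:

from math import isqrt

def max_product_satisfying(cond, upper_limit):
    # For each sum S (decreasing), try differences D (increasing, same parity as S).
    # For a fixed S the product a*b = (S*S - D*D) / 4 only shrinks as D grows,
    # so the first hit for a given S is the best one for that S.
    n = isqrt(upper_limit)
    best = None
    for S in range(2 * n, 2, -1):
        if best is not None and (S * S) // 4 <= best:
            break                   # no smaller S can beat the best product found so far
        for D in range(S % 2, S - 1, 2):
            a, b = (S + D) // 2, (S - D) // 2
            if a > n:
                break               # a only grows with D, so the rest of this S is out of range
            p = a * b
            if best is not None and p <= best:
                break               # products only shrink as D grows
            if 1 < p < upper_limit and cond(p):
                best = p
                break
    return best

# Example with a hypothetical condition: the largest product below 100 divisible by 7.
print(max_product_satisfying(lambda p: p % 7 == 0, 100))  # 70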

Consider an array of n integers A = [a1, a2, a3, ..., an]. Find and print the total number of pairs such that ai*aj <= max(ai, ai+1, ..., aj), where i < j

Can anyone please help me with the above question.
We have to find combinations of elements in the array, (a1,a2), (a1,a3), (a1,a4), ... and so on, and pick those combinations which satisfy the condition (ai*aj) <= max(A), where A is the array, and return the number of combinations possible.
Example : input array A = [1,1,2,4,2] and it returns 8 as the combinations are :
(1,1),(1,2),(1,4),(1,2),(1,2),(1,4),(1,2),(2,2).
It's easy to solve this using nested for loops, but that would be very time consuming (O(n^2)).
Naive algorithm:
array = [1, 1, 2, 4, 2]
result = []
for i in range(len(array)):
    for j in range(len(array)):
        if array[i] * array[j] <= max(array):
            if (array[j], array[i]) not in result:
                result.append((array[i], array[j]))
print(len(result))
What should be the approach when we encounter such problems?
What I understand from your problem description is that you want to find the total count of pairs whose product is less than or equal to the maximum element in the range between them, i.e. ai*aj <= max(ai, ai+1, ..., aj).
The naive approach suggested by Thomas is easy to understand, but it is still of time complexity O(n^2). We can optimize this to reduce the time complexity to O(n log^2 n). Let's discuss it in detail.
At first, for each index i, we can find the range, say {l, r}, in which the element at index i is greater than or equal to all the elements from l to i, and greater than all the elements from i + 1 to r. This can be calculated in O(n) time overall using the idea from the histogram problem (a monotonic stack).
Now, for each index i with its range {l, r}, if we traverse only over the shorter of the two sides, i.e. min(i - l, r - i), then overall we traverse O(n log n) indices for the whole array. While traversing over the shorter side, if we encounter some element, say x, then we have to find out how many elements exist in the other side with values at most ai / x. This can be solved using offline processing with a Fenwick tree data structure in O(log n) time per query. Hence, we can solve the above problem with an overall time complexity of O(n log^2 n).
What about sorting the array, then iterating up: for each element, e, binary search the closest element to floor(max(A) / e) that's lower than or equal to e. Add the number of elements to the left of that index. (If there are many duplicates, hash their count, present only two of them in the sorted array, and use prefix sums to return the correct number of items to the left of any index.)
input:  1 1 2 4 2
sorted: 1 1 2 2 4
counts: 0 1 2 3 2   (total: 0 + 1 + 2 + 3 + 2 = 8)
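A sketch of that idea (my own transcription), under the simplified interpretation ai*aj <= max(A) used by the question's naive code, and assuming positive integers; the duplicate handling described above is not needed here because the binary search is simply bounded at the current index:

from bisect import bisect_right

def count_pairs_global_max(A):
    A_s = sorted(A)
    M = A_s[-1]
    total = 0
    for j in range(1, len(A_s)):
        # elements left of j are all <= A_s[j]; count those with
        # A_s[i] <= M // A_s[j], i.e. A_s[i] * A_s[j] <= M
        total += bisect_right(A_s, M // A_s[j], 0, j)
    return total

print(count_pairs_global_max([1, 1, 2, 4, 2]))  # 8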
It's easy to solve this using nested for loops but that would be very time consuming (O(n^2)).
Since i < j, we can cut this in half:
for i in range(len(array)):
    for j in range(i + 1, len(array)):
        ...
Now let's get rid of this part, if (array[j],array[i]) not in result:, as it does not reflect your results: (1,1),(1,2),(1,4),(1,2),(1,2),(1,4),(1,2),(2,2) - here you have dupes.
The next expensive step we can get rid of is max(array), which is not only wrong (max(ai,ai+1,…aj) translates to max(array[i:j+1])) but also has to iterate over a whole section of the array in each iteration. Since the array doesn't change, the only thing that may change this maximum is array[j], the new value you're processing.
Let's store that in a variable:
array = [1, 1, 2, 4, 2]
result = []
for i in range(len(array)):
    maxValue = array[i]
    for j in range(i + 1, len(array)):
        if array[j] > maxValue:
            maxValue = array[j]
        if array[i] * array[j] <= maxValue:
            result.append((array[i], array[j]))
print(len(result))
Still a naive algorithm, but imo we've made some improvements.
Another thing we could do is not only store the maxValue, but also a pivot = maxValue / array[i], and thereby replace the multiplication by a simple comparison, if array[j] <= pivot. This rests on the assumption that the multiplication would be executed far more often than the maxValue (and therefore the pivot) changes.
But since I'm not very experienced in Python, I'm not sure whether this would make any difference in Python or whether I'm on the road to pointless micro-optimizations with this.
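For what it's worth, a rough sketch of that pivot idea, counting matches instead of building the result list, and using integer division (assuming positive integers) so the comparison stays exact:

array = [1, 1, 2, 4, 2]
count = 0
for i in range(len(array)):
    maxValue = array[i]
    pivot = maxValue // array[i]        # array[j] <= pivot  <=>  array[i] * array[j] <= maxValue
    for j in range(i + 1, len(array)):
        if array[j] > maxValue:
            maxValue = array[j]
            pivot = maxValue // array[i]
        if array[j] <= pivot:
            count += 1
print(count)  # 8 for this input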

permutation calculation runtime complexity with some changes

I have a question about the runtime complexity of the standard permutation finding algorithm. Consider a list A, find (and print) all permutations of its elements.
Here's my recursive implementation, where printperm() prints every permutation:
def printperm(A, p):
    if len(A) == len(p):
        print("".join(p))
        return
    for i in range(0, len(A)):
        if A[i] != 0:
            tmp = A[i]         # remember the ith character
            A[i] = 0           # mark character i as used
            p.append(tmp)      # select character i for this permutation
            printperm(A, p)    # solve the subproblem, which is smaller because character i is marked as used
            p.pop()            # done with selecting character i for this permutation
            A[i] = tmp         # restore character i in preparation for selecting the next available character

printperm(['a', 'b', 'c', 'd'], [])
The runtime complexity appears to be O(n!), where n is the size of A. This is because at each recursion level, the number of available choices decreases by 1. So the top recursion level has n choices, the next level n-1, the next n-2, and so on, giving a total of n*(n-1)*(n-2)*...*1 = n! complete permutations.
Now the problem is the print("".join(p)) statement. Every time this line runs, it iterates through the entire list, which is complexity n. There are n! permutations of a list of size n, so the total work done by the print("".join(p)) statement is n!*n.
Does the presence of the print("".join(p)) statement then increase the runtime complexity to O(n * n!)? But this doesn't seem right, because I'm not running the print statement on every recursion call. Where does my logic for getting O(n * n!) break down?
You're basically right! The possible confusion comes in your "... and the next level is n-2, and so on". "And so on" is glossing over that at the very bottom level of the recursion, you're not doing O(1) work, but rather O(n) work to do the print. So the total complexity is proportional to
n * (n-1) * (n-2) ... * 2 * n
which equals n! * n. Note that the .join() doesn't really matter to this. It would also take O(n) work to simply print(p).
EDIT: But that's not really right, for a different reason. At all levels above the print level, you're doing
for i in range(0, len(A)):
and len(A) doesn't change. So every level is doing O(n) work. To be sure, the deeper the level the more zeroes there are in A, and so the less work the loop does, but it's nevertheless still O(n) merely to iterate over range(n) at all.
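As a quick empirical check of the n * n! printing term (my own instrumentation of the original function), count the characters that would be joined instead of printing them:

from math import factorial

def count_print_work(A):
    work = 0
    def rec(A, p):
        nonlocal work
        if len(A) == len(p):
            work += len(p)          # cost of "".join(p) at a leaf
            return
        for i in range(len(A)):
            if A[i] != 0:
                tmp = A[i]
                A[i] = 0
                p.append(tmp)
                rec(A, p)
                p.pop()
                A[i] = tmp
    rec(list(A), [])
    return work

for n in range(1, 7):
    print(n, count_print_work(list("abcdefg"[:n])), n * factorial(n))  # the two counts match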
Permutations can be generated from combinations using a divide and conquer technique to achieve better time complexity. However, the generated output is not in order.
Suppose we want to generate permutations for n=8 {0,1,2,3,4,5,6,7}.
Below are instances of permutations
01234567
01234576
03216754
30126754
Note that the first 4 items in each arrangement come from the same set {0,1,2,3} and the last 4 items come from the same set {4,5,6,7}.
This means that given sets A = {0,1,2,3} and B = {4,5,6,7}, the permutations of A number 4! and those of B number 4!.
It follows that taking one permutation of A and one permutation of B, we get a correct permutation of the universal set {0,1,2,3,4,5,6,7}.
The total number of permutations, given sets A and B such that A union B equals the universal set and A intersection B is the empty set, is 2*|A|!*|B|! (the factor of 2 comes from swapping the order of A and B).
In this case we get 2*4!*4! = 1152.
So to generate all permutations when n=8, we simply need to generate the possible combinations of sets A and B that satisfy the conditions explained earlier.
Generating combinations of 8 elements taken 4 at a time in lexicographical order is pretty fast. We only need to generate the first half of the combinations; each one is assigned to set A, and the corresponding set B is found using a set difference.
For each pair A and B we apply the divide and conquer algorithm.
A and B don't need to be of equal sizes, and the trick can be applied recursively for higher values of n.
Instead of using 8! = 40320 steps, we can use 8!/(2*4!*4!) * (4! + 4! + 1) = 35 * 49 = 1715 steps with this method. I ignore the cross product operation when joining the set A and set B permutations because it is a straightforward operation.
In a real implementation this method runs approximately 20 times faster than a naive algorithm and can take advantage of parallelism. For example, different pairs of sets A and B can be grouped and processed on different threads.
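A rough sketch of the splitting idea (my own illustration, not the answer's implementation), for an even-length list of distinct items; each unordered split {A, B} is generated once by forcing the smallest item into A, and the factor of 2 comes from emitting both the A-first and B-first concatenations:

from itertools import combinations, permutations

def perms_by_halves(items):
    # Assumes distinct items and an even length (e.g. n = 8 split into 4 + 4).
    # The output is not in lexicographic order.
    items = sorted(items)
    n = len(items)
    k = n // 2
    first, rest = items[0], items[1:]
    for tail in combinations(rest, k - 1):
        A = (first,) + tail                             # canonical half: contains the smallest item
        in_tail = set(tail)
        B = tuple(x for x in rest if x not in in_tail)  # the complementary half
        for pa in permutations(A):
            for pb in permutations(B):
                yield pa + pb                           # A-half first
                yield pb + pa                           # B-half first: the "swap A and B" factor of 2

perms = list(perms_by_halves(range(8)))
print(len(perms), len(set(perms)))  # 40320 40320: all 8! permutations, no duplicates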

What is the complexity of this algorithm (searching for two equal integers in an array)?

I have a question: what is the complexity of this algorithm?
def search(t):
    i = 0
    find = False
    while not find and i < len(t):
        j = i + 1
        while not find and j < len(t):
            if t[j] == t[i]:
                find = True
            j += 1
        i += 1
    return find
Thanks
Assuming t is a list, it's quadratic (O(n^2), where n is the length of the list).
You know it is because it iterates through t (first while loop), and in each of these iterations, it iterates through t again. Which means it iterates through len(t) elements, len(t) times. Therefore, O(len(t)**2).
You can bring the complexity of that algorithm down to O(len(t)) and exactly one line of code by using the appropriate data structure:
def search(t):
    return len(set(t)) != len(t)
For more info about how sets work, see https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset
The best case complexity is O(1), as the search may succeed immediately.
The worst case complexity is O(N²), achieved in case the search fails (there are (N-1)+(N-2)+...+2+1 comparisons made, i.e. N(N-1)/2 in total).
The average case can be estimated as follows: assuming that the array contains K entries that are not unique and are spread uniformly, the first of these is located after N/K elements on average, so the outer loop will run N/K times, with a cost of (N-1)+(N-2)+....+(N-N/K) comparisons. In the last iteration of the outer loop, the inner loop will run about 2N/K times.
Roughly, the expected time is O(N²/K).
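As a side note, a set-based variant (not from the answers above) keeps the O(1) best case of the original loop while giving an expected O(N) worst case, at the cost of O(N) extra space:

def search_early_exit(t):
    # Stop at the first repeated value instead of building the whole set.
    seen = set()
    for x in t:
        if x in seen:
            return True
        seen.add(x)
    return False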
