Creating a heap with heapify vs heappush. Which one is faster? - python

Question
I have to create a priority queue storing distances. To build the heap I am thinking about the following two possibilities:
from heapq import heapify, heappush

n = 35000  # input size

# way A: using heapify
dist = []
for i in range(n):
    dist.append(distance)  # distance is computed in O(1) time
heapify(dist)

# way B: using heappush
dist = []
for i in range(n):
    heappush(dist, distance)  # distance is computed in O(1) time
Which one is faster?
Reasoning
According to the docs heapify() runs in linear time, and I'm guessing heappush() runs in O(log n) time. Therefore, the running time for each way would be:
A: O(2n) = O(n)
B: O(n log n)
However, it is counterintuitive to me that A is faster than B. Am I missing something? Is A really faster than B?
EDIT:
I've been testing with different inputs and different sizes of the array, and I am still not sure which one is faster.
After reading the link of the comment by Elisha, I understand how heapify() runs in linear time. However, I still don't know if using heappush() could be faster depending on the input.
I mean, heappush() has a worst-case running time of O(log n), but on average it will probably be smaller, depending on the input. Its best-case running time is actually O(1). On the other hand, heapify() has a best-case running time of O(n), and must be called after filling the array, which also takes O(n). That makes a best case of O(2n).
So heappush() could be as fast as linear or as slow as O(n log n), whereas heapify() is going to take 2n time in any case. If we look at the worst case, heapify() will be better. But what about an average case?
Can we even be sure that one is faster than the other?

Yes, we can be certain that one is faster than the other.
heappush builds the heap from the bottom up. Each item is added to the end of the array and then "bubbled up" to its correct position. If you were building a min-heap and you presented the items in reverse order, then every item you inserted would require log(n) comparisons (n being the current size of the heap). So the worst case for building a heap by insertion is O(n log n).
Imagine starting with an empty heap and adding 127 items in reverse order (i.e. 127, 126, 125, 124, etc.). Each new item is smaller than all the other items, so every item will require the maximum number of swaps to bubble up from the last position to the top. The first item that's added makes zero swaps. The next two items make one swap each. The next four items make two swaps each. Eight items make three swaps, 16 items make four swaps, 32 items make five swaps, and 64 items make six swaps. It works out to:
0 + 2*1 + 4*2 + 8*3 + 16*4 + 32*5 + 64*6
0 + 2 + 8 + 24 + 64 + 160 + 384 = 642 swaps
The worst case for build-heap is O(n) swaps. Consider that same array of 127 items. The leaf level contains 64 nodes. build-heap starts at the halfway point and works its way backwards, moving things down as required. The next-to-last level has 32 nodes that at worst will move down one level. The next level up has 16 nodes that can't move down more than two levels. If you add it up, you get:
64*0 + 32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6
0 + 32 + 32 + 24 + 16 + 10 + 6 = 120 swaps
That's the absolute worst case for build-heap. It's O(n).
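A quick way to check those two sums (a throwaway snippet, not part of the original answer; it just reproduces the arithmetic above for a 7-level, 127-item heap):

levels = 7

# heappush worst case: the 2**k items on level k each bubble up k levels
push_swaps = sum(2**k * k for k in range(levels))                      # 642

# build-heap worst case: the 2**(levels-1-k) items k levels above the
# leaves each sift down at most k levels
heapify_swaps = sum(2**(levels - 1 - k) * k for k in range(levels))    # 120

print(push_swaps, heapify_swaps)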
If you profile those two algorithms on an array of, say, a million items, you'll see a huge difference in the running time, with build-heap being much faster.
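If you want to try that yourself, here is a rough timing sketch (timings will vary by machine; reverse-sorted input is the worst case for heappush):

import timeit
from heapq import heapify, heappush

def build_with_heapify(items):
    heap = list(items)
    heapify(heap)
    return heap

def build_with_heappush(items):
    heap = []
    for x in items:
        heappush(heap, x)
    return heap

data = list(range(1_000_000, 0, -1))  # reverse order: worst case for heappush
print("heapify: ", timeit.timeit(lambda: build_with_heapify(data), number=3))
print("heappush:", timeit.timeit(lambda: build_with_heappush(data), number=3))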

Related

How do you take randomness into account when finding the complexity of a function?

I've been trying to understand complexities, but all the online material has left me confused, especially the part where they create actual mathematical functions. I have a for loop and a while loop. My confusion arises from the while loop. I know the complexity of the for loop is O(n), but the while loop is based on randomness: a random number is picked, and if this number is not in the list, it is added and the while loop breaks. But my confusion arises here: the while loop may run in the worst case (in my thoughts) for some m number of times until it is done. So I was thinking the complexity would then be O(n*m)?
I'm just really lost, and need some help.
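For reference, a minimal sketch of the loop being described (the original code isn't shown, so the names and the range [1, n] here are assumptions):

import random

def fill_unique(n):
    picked = []                      # the for loop runs n times: O(n)
    for _ in range(n):
        while True:                  # retries until an unseen number comes up
            x = random.randint(1, n)
            if x not in picked:      # O(len(picked)) membership test on a list
                picked.append(x)
                break
    return picked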
Technically the worst-case complexity is unbounded: random.randint, if we consider it a true random generator (it isn't, of course), can produce an arbitrarily long sequence of repeated elements. However, we can estimate the "average-case" complexity. It isn't the real average-case complexity (best, worst and average cases must be defined by the input, not by randomness), but it shows how many iterations the program will do if we run it for a fixed n multiple times and take the average of the results.
Note that the list works as a set here (you never add a repeated number), so I'd use a not in set check instead, which is O(1) (while not in list is O(i)), to remove that source of complexity and simplify things a bit: now the iteration count and the complexity can be estimated with the same big-O bounds. A single trial here is a draw from the uniform integer distribution on [1, n]; a success is drawing a number that is not in the set yet.
So what's the expected number of trials before getting an item that is not in the set? The set size before each step is i in your code, and we can pick any of n-i numbers, so the probability of success is p_i = (n-i)/n (since the distribution is uniform). Each outer iteration is an instance of the geometric distribution: the number of trials up to the first success. So the expected number of while iterations is n_i = 1/p_i = n/(n-i). To get the final complexity we sum these counts over the for iterations: sum(n_i for i in range(n)). This is equal to n * Harmonic(n), where Harmonic(n) is the n-th harmonic number (the sum of the reciprocals of the first n natural numbers). Harmonic(n) ~ O(log n), so the "average-case" complexity of this code is O(n log n).
For a list it would be sum(i * n/(n-i) for i in range(n)) ~ O(n^2 log n) (the proof of this equality is a little longer).
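A quick sanity check of the n * Harmonic(n) estimate (a throwaway simulation, assuming the set-based variant of the loop sketched above):

import random

def count_trials(n):
    picked = set()
    trials = 0
    for _ in range(n):
        while True:
            x = random.randint(1, n)
            trials += 1
            if x not in picked:      # O(1) membership test on a set
                picked.add(x)
                break
    return trials

n = 10_000
harmonic = sum(1 / k for k in range(1, n + 1))
print("measured:", count_trials(n), "predicted n*H(n):", round(n * harmonic))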
Big-O notation is usually applied to worst-case scenarios.
Figure out what the worst case for the given loop could be.
Express it as a function of n, take the highest power of n and ignore any constants; that gives you the time complexity.

Time complexity with a logarithmic recursive function

Could someone please explain the time complexity of the following bit of code:
def fn(n):
    if n == 0:
        linear_time_fn(n)  # some function that does work in O(n) time
    else:
        linear_time_fn(n)
        fn(n // 5)
I was under the impression that the complexity of the code is O(n log n), while the actual complexity is O(n). How is this function different from one like merge sort, which has O(n log n) complexity? Thanks.
It's O(n) because n is smaller in each recursive level. So you have O(log n) calls to the function, but you don't do n units of work each time. The first call is O(n), the second call is O(n//5), the next call is O(n//5//5), and so on.
When you combine these, it's O(n).
You are correct that this is O(n). The difference between this and merge sort is that this makes one recursive call, while merge sort makes two.
So for this code, you have
One problem of size n
One problem of size n/2
One problem of size n/4
...
With merge sort, you have
One problem of size n
which yields two problems of size n/2
which yields four problems of size n/4
...
which yields n problems of size 1.
In the first case, you have n + n/2 + n/4 + ..., which is at most 2n, so it's O(n). (This code actually divides by 5, so the series shrinks even faster; see the sketch below.)
In the second case, every level of the recursion does a total of about n work, and after log2(n) levels you reach the end, giving n * log2(n).
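A rough way to compare the two totals (a sketch that counts "units of work" rather than measuring time; the names are just for illustration):

import math

def single_call_work(n):
    # fn-style recursion: one call per level, problem size divided by 5 each time
    total = 0
    while n > 0:
        total += n          # linear_time_fn(n) does about n work at this level
        n //= 5
    return total

def merge_sort_work(n):
    # merge-sort-style: every level does about n total work, over ~log2(n) levels
    return n * max(1, math.ceil(math.log2(n)))

for n in (1_000, 1_000_000):
    print(n, single_call_work(n), merge_sort_work(n))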

Can you help me with the time complexity of this Python code?

I have written this code and I think its time complexity is O(n+m), as the time depends on both inputs. Am I right? Is there a better algorithm you can suggest?
The function returns the length of the union of both inputs.
class Solution:
    def getUnion(self, a, b):
        p = 0
        lower, greater = a, b
        if len(a) > len(b):
            lower, greater = b, a
        while p < len(lower):  # O(n+m)
            if lower[p] in greater:
                greater.remove(lower[p])
            p += 1
        return len(lower + greater)

print(Solution().getUnion([1, 2, 3, 4, 5], [2, 3, 4, 54, 67]))
Assuming m is the shorter length and n the longer (or both are equal), the while loop will iterate m times.
Inside a single iteration of that loop an in greater check is executed, which has a time complexity of O(n) for each individual execution.
So the total time complexity is O(mn).
The correctness of this algorithm depends on whether we can assume that the input lists only contain unique values (each).
You can do better using a set:
return len(set(a + b))
Building a set is O(m + n), and getting its length is a constant-time operation, so this is O(m + n).
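A full version of that idea, as a sketch (note that set(a + b) also avoids mutating the inputs, unlike the remove-based loop above):

class Solution:
    def getUnion(self, a, b):
        # Building the set is O(m + n); len() is O(1).
        return len(set(a + b))

print(Solution().getUnion([1, 2, 3, 4, 5], [2, 3, 4, 54, 67]))  # prints 7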

Python generator time complexity log(n)

In Python 3, range is built with the help of generators.
Logarithmic Time - O(log n)
An algorithm is said to have logarithmic time complexity when it reduces the size of the input data in each step. For example, if we are printing the first 10 digits with the help of generators, first we get one element so 9 elements remain to be processed, then the second element so 8 elements remain to be processed:
for index in range(0, len(data)):
    print(data[index])
When I check the URL python generators time complexity confusion, it says O(n).
Since each step generates only one output (because we need to call __next__), it costs 1 unit every time.
Can I get an explanation of this?
That explanation of logarithmic time complexity is wrong.
You get logarithmic complexity if you reduce the size of the input by a fraction, not by a fixed amount. For instance, binary search divides the size by 2 on each iteration, so it's O(log n). If the input size is 8 it takes at most 4 iterations; doubling the size to 16 only increases that to 5.
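For contrast, here is a minimal binary search sketch; the halving of the search range on every iteration is what makes it O(log n):

def binary_search(sorted_data, target):
    lo, hi = 0, len(sorted_data) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # search range halves on every iteration
        if sorted_data[mid] == target:
            return mid
        if sorted_data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search(list(range(16)), 13))  # index 13, found in at most 5 probes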

Complexity O Notation

I am a bit confused on this question and would appreciate some guidance on it:
An O(n^2) function takes approx 1 second to run when N is 10000.
How long will it take when N is 30000?
I was thinking that it would either be 1 second as well or 3 seconds since it is three times the size, but I am not sure if my logic is correct.
Thank you.
From Wikipedia:
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the string representing the input.
This way, if the complexity is O(n^2) and the input is 3 times larger, then the running time is 3^2 = 9 times longer. The running time is 9 seconds.
There are many problems with the question.
First problem: time complexity does not, in general, measure time in seconds. For example, the time complexity of a sorting algorithm might refer to the number of comparisons (or swaps), and the time complexity of a hash table lookup might also refer to the number of comparisons performed. It's debatable whether the actual runtime is proportional to these measurements.
Second problem: the definition of big-O is this:
f(n) = O(g(n)) if there's N and k such that n > N implies f(n) < k*g(n).
That's a problem because even if the runtime in this case is measured in seconds, applying the definition to O(n^2) says only that for large enough n that the function is bounded above by some multiple of n^2.
So there's no guarantee that 10000 and 30000 are big enough to qualify for "big enough", and even if they were, you can't begin to estimate k from a single data point. And even with that estimate, you only get an upper bound.
What the question probably meant to ask was this:
Suppose that a function runs in time approximately proportional to n^2. It takes 1 second when n=10000. Approximately how long does it take when n=30000?
Then, one can solve the equations:
1 sec ~= k * 10000^2
answer ~= k * 30000^2
= 3^2 * k * 10000^2
~= 3^2 * 1 sec
= 9 sec
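The same estimate as a one-off calculation (variable names are just for illustration):

old_n, new_n = 10_000, 30_000
old_time = 1.0                                # seconds, measured at old_n
new_time = old_time * (new_n / old_n) ** 2    # assumes time ~ k * n^2
print(new_time)                               # 9.0 seconds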
