Can you help me with the time complexity of this Python code?

I have written this code and I think its time complexity is O(n+m), since the running time depends on both inputs. Am I right? Is there a better algorithm you can suggest?
The function returns the length of the union of the two input lists.
class Solution:
    def getUnion(self, a, b):
        p = 0
        lower, greater = a, b
        if len(a) > len(b):
            lower, greater = b, a
        while p < len(lower):  # O(n+m)
            if lower[p] in greater:
                greater.remove(lower[p])
            p += 1
        return len(lower + greater)

print(Solution().getUnion([1, 2, 3, 4, 5], [2, 3, 4, 54, 67]))

Assuming m is the shorter length and n the longer (or both are equal), the while loop will iterate m times.
Inside a single iteration of that loop an `in greater` membership test is executed, which has a time complexity of O(n) per execution; the `greater.remove(...)` call is O(n) as well.
So the total time complexity is O(mn).
The correctness of this algorithm depends on whether we can assume that the input lists each contain only unique values.
You can do better using a set:
return len(set(a + b))
Building a set is O(m + n), and getting its length is a constant-time operation, so this is O(m + n) overall.
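For completeness, here is a minimal sketch of the whole method rewritten around a set (my own illustration; the name get_union_length is hypothetical, not part of the original class). Note that, unlike the original, this also deduplicates values within each input list:

def get_union_length(a, b):
    # building each set is linear; the union is O(m + n) on average,
    # and len() on a set is O(1)
    return len(set(a) | set(b))

print(get_union_length([1, 2, 3, 4, 5], [2, 3, 4, 54, 67]))  # 7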


What is the complexity of these two functions with respect to the input size of an integer?

Assume that given a number n, our function iterates n times. What is its complexity with respect to the input size assuming all the other operations are constant-time? Why?
My answer: O(2^s), where s is the size of the input; the size of the input is s = log2(n), and the function iterates n = 2^s times.
Assume that given a number n, our function iterates n**2 times. What is its complexity with respect to the input size assuming all the other operations are constant-time? Why?
My answer: O(2^(2^s)), where s is the size of the input, with s = log2(n).
I am not sure I understand the questions correctly, especially for question 2. Are my answers correct for both questions?
If all operations within the iterations are constant-time, the time complexity is just the number of iterations. Measured against the value n, the first case is O(n) and the second case is O(n**2). Measured against the input size s = log2(n), we have n = 2^s, so those iteration counts become O(2^s) and O(2^(2s)) = O((2^s)^2) respectively. Your first answer is therefore right, but the second should be 2^(2s), not 2^(2^s).
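To make the distinction concrete, here is a small sketch (my own, not from the question): a loop that runs n times performs a number of steps that is exponential in the number of bits needed to write n down.

def count_iterations(n):
    # all work inside the loop is constant-time
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

n = 1024
s = n.bit_length()  # input size in bits: 11, since n == 2**(s - 1)
print(count_iterations(n), 2 ** (s - 1))  # both print 1024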

What is the time complexity of a while loop that uses random.shuffle (python) inside of it?

First of all, can we even measure it, since we don't know how many times random.shuffle will shuffle the array until it reaches the desired outcome?
def sort(numbers):
    import random
    while not sort(numbers) == numbers:
        random.shuffle(numbers)
    return numbers
First of all, I assume the function is not actually meant to be named sort, as the unconditional recursive call to itself would lead to infinite recursion before anything else happened. I am assuming this function instead:
import random

def random_sort(numbers):
    while not sorted(numbers) == numbers:
        random.shuffle(numbers)
    return numbers
Without looking at the implementation too much, I would assume O(n) for the inner shuffle random.shuffle(numbers), where n is the number of elements in numbers.
Then we have the while loop. It stops when the array is sorted. Each call to shuffle returns one of all possible permutations of numbers, and the loop only terminates on the sorted one; assuming distinct elements, that is just one permutation out of all of them.
Since stopping is probabilistic, we technically need to define which complexity we are speaking of. This is where best case, worst case, and average case come in.
Best case
The numbers we get are already sorted. Then we have the cost of sorted(numbers) and the comparison == numbers. Sorting an already-sorted array is O(n) (Python's Timsort detects sorted runs), so our best-case complexity is O(n).
Worst case
The shuffle never gives us the right permutation. This is definitely possible, and the algorithm would then never terminate. So the worst case is unbounded, O(∞).
Average case
This is probably the most interesting case. First we need to establish how many shuffles we expect: with distinct elements, shuffle returns one of the n! permutations uniformly at random, and an approximation for the expected number of attempts is given as e · n!, which is O(n!) (please check).
Now the question is: on average, when does our loop stop? The number of attempts follows a geometric distribution (please check), whose expectation is 1/p, where p is the probability of success on a single try. In our case p = 1/(e · n!), so we need on average e · n! tries.
For each try we need to sort, O(n log(n)); compare, O(n); and compute the shuffle, O(n). The shuffle uses the Fisher–Yates algorithm, which has a complexity of O(n).
So we have O(n! · n log(n)) for the average complexity.
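Here is a quick empirical sketch (my own, under the assumptions above) that counts shuffles until the list is sorted; for small n, the average number of tries grows roughly like n!:

import random

def random_sort_counting(numbers):
    # count how many shuffles are needed before the list is sorted
    tries = 0
    while sorted(numbers) != numbers:
        random.shuffle(numbers)
        tries += 1
    return tries

for n in range(2, 7):
    trials = [random_sort_counting(list(range(n, 0, -1))) for _ in range(200)]
    print(n, sum(trials) / len(trials))  # average tries, on the order of n!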

Time complexity with a logarithmic recursive function

Could someone please explain the time complexity of the following bit of code:
def fn(n):
    if n == 0:
        linear_time_fn(n)  # some function that does work in O(n) time
    else:
        linear_time_fn(n)
        fn(n // 5)
I was under the impression that the complexity of this code is O(n log n), while the actual complexity is O(n). How is this function different from one like merge sort, which has O(n log n) complexity? Thanks.
It's O(n) because n shrinks at each recursive level. You have O(log n) calls to the function, but you don't do n units of work each time: the first call is O(n), the second call is O(n/5), the next call is O(n/25), and so on.
When you combine these, the geometric series sums to at most (5/4)n, so it's O(n).
You are correct that this is O(n). The difference between this and merge sort is that this makes one recursive call, while merge sort makes two.
So for this code, you have
One problem of size n
One problem of size n/2
One problem of size n/4
...
With merge sort, you have
One problem of size n
which yields two problems of size n/2
which yields four problems of size n/4
...
which yields n problems of size 1.
In the first case, you have n + n/2 + n/4 + ... ≤ 2n, which is O(n).
In the second case, every level of the recursion costs O(n) in total, and after log2(n) halvings you reach problems of size 1, so the total is O(n log n).
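A small sketch (my own) that tallies the work units in each pattern makes the difference visible:

def single_branch_work(n):
    # fn-style: linear work plus one recursive call on a smaller input
    if n == 0:
        return 1
    return n + single_branch_work(n // 5)

def double_branch_work(n):
    # merge-sort-style: linear work plus two recursive calls on n // 2
    if n <= 1:
        return 1
    return n + 2 * double_branch_work(n // 2)

for n in (10, 100, 1000, 10000):
    print(n, single_branch_work(n), double_branch_work(n))
    # the first result column stays within a constant factor of n,
    # while the second grows like n * log2(n)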

Time complexity analysis of two algorithms contradicts empirical results

I wrote the following simple function that checks whether str1 is a permutation of str2:
def is_perm(str1, str2):
    return True if sorted(str1) == sorted(str2) else False
Assuming that sorted(str) has a time complexity of O(n*logn), we can expect a time complexity of O(2*n*logn)=O(n*logn). The following function is an attempt to achieve a better time complexity:
def is_perm2(str1, str2):
    dict1 = {}
    dict2 = {}
    for char in str1:
        if char in dict1:
            dict1[char] += 1
        else:
            dict1[char] = 1
    for char in str2:
        if char in dict2:
            dict2[char] += 1
        else:
            dict2[char] = 1
    if dict1 == dict2:
        return True
    else:
        return False
Each for-loop iterates n times. Assuming that dictionary lookups and both kinds of dictionary update have constant time complexity, I expect an overall complexity of O(2n) = O(n). However, timeit measurements show the following contradictory results. Why is is_perm2 slower than is_perm even after 10,000,000 executions, even though its time complexity looks better? Are my assumptions wrong?
import timeit
print(timeit.timeit('is_perm("helloworld","worldhello")', 'from __main__ import is_perm', number=10000000))
print(timeit.timeit('is_perm2("helloworld","worldhello")', 'from __main__ import is_perm2', number=10000000))
# output of first print-call: 12.4199592999993934 seconds
# output of second print-call: 37.13826630001131 seconds
There is no guarantee that an algorithm with a time complexity of O(nlogn) will be slower than one with a time complexity of O(n) for a given input. The O(n) algorithm could, for instance, have a large constant overhead that makes it slower for input sizes below (say) 100000.
In your test the input size is 10 ("helloworld"), which doesn't tell us much. Repeating that test doesn't make a difference, even if repeated 10000000 times. The repetition only gives a more precise estimate of the average time spent on that particular input.
You would need to feed the algorithm with increasingly large inputs. If memory allows, that would eventually bring us to an input size for which the O(nlogn) algorithm takes more time than the O(n) algorithm.
In this case, I found that the input size had to be really large in comparison with available memory, and I only barely managed to find a case where the difference showed:
import random
import string
import timeit

def shuffled_string(str):
    lst = list(str)
    random.shuffle(lst)
    return "".join(lst)

def random_string(size):
    return "".join(random.choices(string.printable, k=size))

str1 = random_string(10000000)
str2 = shuffled_string(str1)

print("start")
print(timeit.timeit(lambda: is_perm(str1, str2), number=5))
print(timeit.timeit(lambda: is_perm2(str1, str2), number=5))
After the initial set up of the strings (which each have a size of 10 million characters), the output on repl.it was:
54.72847577700304
51.07616817899543
The reason why the input has to be so large to see this happen, is that sorted is doing all the hard work in lower-level, compiled code (often C), while the second solution does all the looping and character reading in Python code (often interpreted). It is clear that the overhead of the second solution is huge in comparison with the first.
Improving the second solution
Although not your question, we could improve the implementation of the second algorithm by relying on Counter from the standard library's collections module:

from collections import Counter

def is_perm3(str1, str2):
    return Counter(str1) == Counter(str2)
With the same test set up as above, the timing for this implementation on repl.it is:
24.917681352002546
"Assuming that dictionary lookup and both dictionary updates have constant time complexity, ..."

A Python dictionary is a hash map, so, strictly speaking, dictionary lookups and updates cost O(n) in the worst case (when many keys collide).
So the total average-case time complexity of is_perm2 is O(n), but its worst-case time complexity is O(n^2).
If you want a guaranteed O(n) time complexity, use a list (not a dictionary) to store the frequency of characters:
you can convert each character to its ASCII code and use it as an index into the list, as sketched below.
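Here is a minimal sketch of that list-based approach (my own illustration, assuming the strings contain only ASCII characters):

def is_perm_list(str1, str2):
    # a fixed-size table indexed by character code: O(1) per update
    # and O(n) overall, with no hashing involved
    if len(str1) != len(str2):
        return False
    counts = [0] * 128  # ASCII range
    for char in str1:
        counts[ord(char)] += 1
    for char in str2:
        counts[ord(char)] -= 1
    return all(c == 0 for c in counts)

print(is_perm_list("helloworld", "worldhello"))  # True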

How to find complexity for the following program?

So, I am not a CS major and have a hard time answering questions about a program's big-O complexity.
I wrote the following routine to output the pairs of numbers in an array which sum to 0:
asd = [-3, -2, -3, 2, 3, 2, 4, 5, 8, -8, 9, 10, -4]

def sum_zero(asd):
    for i in range(len(asd)):
        for j in range(i, len(asd)):
            if asd[i] + asd[j] == 0:
                print asd[i], asd[j]
Now if someone asks the complexity of this method, I know that since the first loop goes through all n items it will be more than O(n) (unless I am wrong), but can someone explain how to find the correct complexity?
And is there a better, more efficient way of solving this?
I won't give you a full solution, but will try to guide you.
You should get a pencil and a paper, and ask yourself:
How many times does the statement print asd[i], asd[j] execute? (In the worst case, meaning you shouldn't really care about the if-condition there.)
You'll find that it really depends on the loop above it, which gets executed len(asd) (denote it by n) times.
The only thing you need to know is how many times the inner loop executes, given that the outer loop has n iterations (i runs from 0 up to n).
If you are still not sure about the result, just take a real example, say n = 20, and count how many times the innermost statement is executed; this will give you a very good indication of the answer.
def sum_zero(asd):
    for i in range(len(asd)):        # len called once = O(1), range called once = O(1)
        for j in range(i, len(asd)): # len called once per i = O(n), range called once per i = O(n)
            if asd[i] + asd[j] == 0: # asd[i] and asd[j] fetched once each per j = O(2*n²)
                                     # adding is called once per j = O(n²)
                                     # comparing with 0 is called once per j = O(n²)
                print asd[i], asd[j] # asd[i] and asd[j] fetched once each per j = O(2*n²)

sum_zero(asd)                        # called once, O(1)
Assuming the worst-case scenario (the if-condition always being true), the totals are:
O(1) * 3
O(n) * 2
O(n²) * 6
which gives O(6n² + 2n + 3).
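As a quick sanity check of that tally (my own sketch, written for Python 3 unlike the Python 2 snippets above), counting executions of the innermost statement directly reproduces the dominant n(n+1)/2 term:

def count_inner(n):
    # how many times the body of the inner loop runs
    count = 0
    for i in range(n):
        for j in range(i, n):
            count += 1
    return count

print(count_inner(13), 13 * 14 // 2)  # both print 91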
A simple program to demonstrate the complexity:
target = []
quadratic = []
linear = []
for x in xrange(1, 100):
    linear.append(x)
    target.append(6 * (x ** 2) + 2 * x + 3)
    quadratic.append(x ** 2)

import matplotlib.pyplot as plt
plt.plot(linear, label="Linear")
plt.plot(target, label="Target Function")
plt.plot(quadratic, label="Quadratic")
plt.ylabel('Operations')
plt.xlabel('Input size n')
plt.legend(loc=2)
plt.show()
EDIT:
As pointed out by @Micah Smith, the above counts worst-case operations; the big-O is actually O(n^2), since the constants and lower-order terms are omitted.
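As for the asker's second question (a more efficient way): here is a hedged sketch, my own and written for Python 3, that finds the complementary values in O(n) average time with a hash-based Counter. It reports each zero-sum value pair once, rather than every index pair as the original does:

from collections import Counter

def sum_zero_pairs(asd):
    # one O(n) pass to count values, then one pass over distinct values;
    # assumes we want pairs of opposite nonzero values (a 0 would only
    # pair with another 0, which would need counts[0] >= 2)
    counts = Counter(asd)
    for x in counts:
        if x > 0 and -x in counts:
            print(-x, x)  # counts[-x] * counts[x] index pairs exist

sum_zero_pairs([-3, -2, -3, 2, 3, 2, 4, 5, 8, -8, 9, 10, -4])
# prints: -2 2, -3 3, -4 4, -8 8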
