Time complexity of Python's sorted() method inside a loop - python

from collections import Counter

def findEquals(words, word):
    wordCounts = sorted(Counter(word).values())
    equals = []
    for word in words:  # note: the loop variable shadows the parameter
        wordsCounts = sorted(Counter(word).values())
        if wordCounts == wordsCounts:
            equals.append(word)
    return equals
So I have this loop inside my code, where words is a list of words. For each word in the list, I store the frequency of its letters in a Counter and sort the values with sorted(). I know that sorted() has a worst-case time complexity of O(n log n). Am I right to say that the worst-case time complexity of the whole code is O(n^2 log n), since sorted() is used inside a for loop?

Am I right to say that the worst time complexity of the whole code is O(n^2 log(n)) since sorted() is used inside a for loop?
Not necessarily. It's true that a for loop is O(N), and that sorted is O(N log N), but the "N"s in those expressions refer to different values. The N in O(N) refers to the length of words, and the N in O(N log N) refers to the length of word.
It would only be correct to say that the total complexity of the algorithm is O(N^2 log N) if the average length of each word string is equal to the length of the words list. For instance, if words contained five words, each having five letters. But it is unlikely that this is the case. A more general conclusion to make might be that the algorithm is O(m * n log n), with m corresponding to the size of words, and n corresponding to the average size of a word.
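To make the two parameters concrete, here is a hypothetical sizing (the values are illustrative, not from the answer):
words = ["apple", "banana", "cherry", "damson"]  # m = 4 entries in the list
word = "strawberry"                              # n = 10 letters
# The loop body runs m times, and each sorted(Counter(...).values()) call
# costs O(n log n) in the word's length, so the total work is O(m * n log n).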

The asymptotic analysis here will be a little weird because as you've written it, it's actually a function of three inputs: the haystack, the length of each word in the haystack, and the needle.
To make it more simple, you can cache the sorted values of the needle: these are static (and this operation will be overwhelmed by the iteration through the haystack).
I've also simplified the code to use filtering, which will abstract out the loop iteration. This won't have a theoretical impact on the performance of the algorithm, but returning an iterator will yield lazy results, which might improve real performance.
Because the frequencies will be integers, they can be sorted in O(n) with respect to the number of frequencies.
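For illustration, a counting sort over the integer frequencies could look like the sketch below (an assumption about how the O(n) sort might be done, not part of the original answer):
def counting_sort_frequencies(freqs):
    # Counting sort: O(n + k), where k = max(freqs). This works because
    # the values being sorted are small non-negative integers (letter counts).
    if not freqs:
        return []
    buckets = [0] * (max(freqs) + 1)
    for f in freqs:
        buckets[f] += 1
    out = []
    for value, count in enumerate(buckets):
        out.extend([value] * count)
    return out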
Thus, for every element in the haystack, you'll sort and compare with the needle's frequency. This will be O(n*m) where n is the size of haystack and m is the size of each element in haystack.
Therefore, your code can be simplified:
from collections import Counter

def find_same_frequency_distribution(haystack, needle):
    needle_counts = sorted(Counter(needle).values())
    return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
>>> from collections import Counter
>>> def find_same_frequency_distribution(haystack, needle):
...     needle_counts = sorted(Counter(needle).values())
...     return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
...
>>> li = ["daddy", "mummy", "dddya", "foosa"]
>>> for e in find_same_frequency_distribution(li, "babdb"):
...     print(e)
...
daddy
mummy
dddya

Related

What is the time complexity of "set" and "if item in array" in Python?

I need to check whether a number and its double exist in an array. This code uses a set to solve it. However, I am not sure the time complexity is better than O(N^2). I use a for loop and "if 2 * item in s" as shown below. Isn't checking whether an item is in an array another O(N) operation, which would mean O(N^2) in total? If this is optimal, how can I implement it in C without using a nested loop?
Thanks a lot!
def checkIfExist(arr) -> bool:
    s = set(arr)
    for item in s:
        if 2 * item in s and item != 0:
            return True
    if arr.count(0) >= 2:
        return True
    return False
The time complexity of the 'in' operator for sets in Python is O(1) on average and O(N) only in the worst case, since Python sets use a hash table internally.
So your function's time complexity is O(N) on average and O(N^2) only in the worst case, where N is the length of the array.
More here https://wiki.python.org/moin/TimeComplexity
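As a rough illustration of the difference (a hypothetical benchmark, not part of the original answer), compare membership tests on a list and on a set built from it:
import timeit

data = list(range(100_000))
as_set = set(data)

# A list scan is O(N) per lookup; a set lookup is O(1) on average.
print(timeit.timeit(lambda: 99_999 in data, number=100))
print(timeit.timeit(lambda: 99_999 in as_set, number=100))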

How do I calculate Time Complexity for this particular algorithm?

I know there are many other questions out there asking for a general guide on how to calculate time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the time complexity is N * N, where N is the size of the input. (Please correct me if this is also wrong.) (Edited once after being corrected by an answer.)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
    if a % i == 0 and b % i == 0:
        list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, making it more than a simple O(n) = N^2? For example, in the second loop, where I was finding the common divisors of both a and b (a % i == 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity of this specific loop?
I hope the question makes sense; apologies if it is not clear enough.
Thanks for answering!
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely those asymptotically bounded from above by the function f(n) = n), whereas f(N) = N*N is one function; a formal definition follows these hints. By abuse of notation, we conventionally write n*n = O(n) to mean n*n ∈ O(n) (which is a mathematically false statement), but switching the sides (O(n) = n*n) is undefined. A mathematically correct statement would be n = O(n*n).
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
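For reference, the formal definition behind that hint (standard textbook notation, not part of the original answer):
O(f) = { g | ∃ c > 0, ∃ n0, ∀ n ≥ n0 : g(n) ≤ c · f(n) }
That is, g ∈ O(f) when g is eventually bounded above by a constant multiple of f.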
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split())   # O(1)
list = []                          # O(1)
for i in range(1, a+b+1):          # O(a+b), multiplied by what's inside the loop
    if a % i == 0 and b % i == 0:  # O(1)
        list.append(i)             # O(1) (amortized)
n = len(list)                      # O(1)
print(list[n-1])                   # O(log(a+b))
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of your input input() as the input parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
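To see why a+b is exponential in the input length, here is a small illustrative check (an added sketch, with N taken as the number of decimal digits):
# An N-digit number can be as large as 10**N - 1, so the worst-case
# iteration count a + b grows exponentially in the input length N.
for n_digits in range(1, 6):
    a = b = 10 ** n_digits - 1   # largest N-digit values
    print(n_digits, a + b)       # loop iterations for this input size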
One thing to keep in mind, and you have the right idea, is that higher-degree terms take precedence. So if you have a step that is constant, O(1), but happens n times, O(N), then the result is O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know, a simple loop like that is O(N): its running time increases linearly as N increases.
Now if you had a nested loop in which both loops grew with n, then it would be O(n^2).
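For example (an illustrative sketch, not from the answer):
def count_pairs(items):
    # Two loops over the same input: n iterations each, so n * n
    # pieces of O(1) work in total, i.e. O(n^2).
    pairs = 0
    for x in items:
        for y in items:
            pairs += 1
    return pairs

print(count_pairs([1, 2, 3]))  # 9, i.e. 3 * 3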

What's the time complexity of this Python function?

def hasPairWithSum(arr, target):
    for i in range(len(arr)):
        if (target - arr[i]) in arr[i+1:]:
            return True
    return False
In Python, is the time complexity of this function O(n) or O(n^2)? In other words, is '(target - arr[i]) in arr[i+1:]' effectively another for loop or not?
Also, what about the following function: is it also O(n^2), and if not, why?
def hasPairWithSum2(arr, target):
    seen = set()
    for num in arr:
        num2 = target - num
        if num2 in seen:
            return True
        seen.add(num)
    return False
Thanks!
The first version has O(n²) time complexity:
Indeed, arr[i+1:] creates a new list with n-1-i elements, which is not a constant-time operation. The in operator then scans that new list and, in the worst case, visits each of its values.
If we count the number of elements copied into a new list (by arr[i+1:]) and sum those counts over the iterations of the outer loop, we get:
(n-1) + (n-2) + (n-3) + ... + 1 + 0
This is a triangular number, and it equals n(n-1)/2, which is O(n²).
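A quick sanity check of that formula (an added snippet, not part of the original answer):
def sliced_elements(n):
    # Total elements copied by arr[i+1:] across all outer iterations.
    return sum(n - 1 - i for i in range(n))

assert sliced_elements(10) == 10 * 9 // 2  # n(n-1)/2 with n = 10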
Second version
The second version, using a set, runs in O(n) average time complexity.
There is no list slicing here, and the in operator on a set (unlike on a list) has average constant time complexity. So now every action within the loop has an average constant time complexity, giving the algorithm an average time complexity of O(n).
According to the Python docs, the in operator for a set can have an amortised worst-case time complexity of O(n). So you would still get a worst-case time complexity of O(n²) for your algorithm.
It's O(n^2).
The first loop runs n times, and for each pass the inner scan runs (n-m) times, where m is the current position in the array.
So the whole thing runs n(n-m) times, that is, n^2 - nm. Knowing that m < n, we know it has a complexity of O(n^2).
But if it's critical to something, you might consult someone who is better at this.

Calculating complexity of an algorithm (Big-O)

I'm currently doing some work around Big-O complexity and calculating the complexity of algorithms.
I seem to be struggling to work out the steps to calculate the complexity and was looking for some help to tackle this.
The function:
index = 0
while index < len(self.items):
    if self.items[index] == item:
        self.items.pop(index)
    else:
        index += 1
The actual challenge is to rewrite this function so that it has O(n) worst-case complexity.
My problem with this is, as far as I thought, assignment statements and if statements have a complexity of O(1), whereas the while loop has a complexity of O(n), and in the worst case any statements within the while loop could execute n times. So I work this out as 1 + n + 1 = 2 + n = O(n).
I figure I must be working this out incorrectly as there'd be no point in rewriting the function otherwise.
Any help with this is greatly appreciated.
If self.items is a list, the pop(index) operation has complexity O(k), where k is the index, so the only reason this is not O(N) overall is the pop operation.
Probably the exercise wants you to use some other method of iterating over and removing from the list.
To make it O(N) you can do:
self.items = [x for x in self.items if x != item]
If you are using Python's built-in list data structure, the pop(index) operation is not constant time in the worst case; it is O(N). So your overall complexity is O(N^2). You would need some other data structure, like a linked list, if you cannot use auxiliary space.
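One way to stay O(N) without auxiliary space is a two-pointer, in-place compaction (a sketch of one alternative, not the answer's linked-list suggestion):
def remove_in_place(items, item):
    # Shift the kept elements toward the front in a single pass:
    # O(n) time, O(1) extra space.
    write = 0
    for read in range(len(items)):
        if items[read] != item:
            items[write] = items[read]
            write += 1
    del items[write:]

data = [1, 2, 3, 2, 4, 2]
remove_in_place(data, 2)
print(data)  # [1, 3, 4]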
With no arguments, pop is O(1).
With an argument to pop:
Average time complexity: O(k) (k represents the index passed in as an argument to pop)
Amortized worst-case time complexity: O(k)
Worst-case time complexity: O(n)
Time Complexity - Python Wiki
So to make your code efficient, pop from the end of the list, for example:
def pop(lst):
    return lst.pop(-1)  # popping the last element is O(1)
Since you are passing index to self.items.pop(index), it's NOT O(1).

Time complexity of variable loops

I want to try to calculate the O(n) of my program (in Python). There are two problems:
1: I have a very basic knowledge of O(n) [aka: I know O(n) has to do with time and calculations]
and
2: all of the loops in my program are not set to any particular value; they are based on the input data.
The n in O(n) means precisely the input size. So, if I have this code:
def findmax(l):
    maybemax = 0
    for i in l:
        if i > maybemax:
            maybemax = i
    return maybemax
Then I'd say that the complexity is O(n) -- how long it takes is proportional to the input size (since the loop loops as many times as the length of l).
If I had
def allbigger(l, m):
    for el in l:
        for el2 in m:
            if el < el2:
                return False
    return True
then, in the worst case (that is, when I return True), I have one loop of length len(l) and inside it, one of length len(m), so I say that it's O(l * m) or O(n^2) if the lists are expected to be about the same length.
Try this out to start, then head to the wiki:
Plain English Explanation of Big O Notation
