I want to try to calculate the O(n) of my program (in Python). There are two problems:
1: I have a very basic knowledge of O(n) (aka: I know O(n) has to do with time and calculations),
and
2: all of the loops in my program are not set to any particular value; they are based on the input data.
The n in O(n) means precisely the input size. So, if I have this code:
def findmax(l):
    maybemax = 0            # note: assumes the values in l are non-negative
    for i in l:
        if i > maybemax:
            maybemax = i
    return maybemax
Then I'd say that the complexity is O(n) -- how long it takes is proportional to the input size (since the loop loops as many times as the length of l).
If I had
def allbigger(l, m):
    for el in l:
        for el2 in m:
            if el < el2:
                return False
    return True
then, in the worst case (that is, when I return True), I have one loop of length len(l) and, inside it, one of length len(m), so I say that it's O(len(l) * len(m)), or O(n^2) if the lists are expected to be about the same length.
Try this out to start, then head to wiki:
Plain English Explanation of Big O Notation
I know there are many other questions out there asking for the general guide of how to calculate the time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the time complexity is N * N, where N is the size of the input. (Please correct me if this is also wrong.) (Edited once after being corrected by an answer.)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
    if a % i == 0 and b % i == 0:
        list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, making it more than a simple O(n) = N^2? For example, in the second loop, where I was finding the common divisors of both a and b (a % i == 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity, in this specific loop?
I hope the question makes sense; apologies if it is not clear enough.
Thanks for answering!
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely, those asymptotically bounded from above by the function f(n) = n), whereas f(N) = N*N is one particular function. By abuse of notation, we conventionally write f(n) = O(g(n)) to mean f(n) ∈ O(g(n)); under that convention, n*n = O(n) is a (false) mathematical statement, but switching the sides (O(n) = n*n) is simply undefined. A mathematically correct statement would be n = O(n*n).
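For reference, the standard definition: f ∈ O(g) if and only if there exist constants c > 0 and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀. For example, n = O(n*n) holds with c = 1 and n₀ = 1, whereas n*n = O(n) fails for every choice of c (take any n > c).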
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split())     # O(1)
list = []                            # O(1)
for i in range(1, a+b+1):            # O(a+b) iterations, multiplied by what's inside the loop
    if a % i == 0 and b % i == 0:    # O(1)
        list.append(i)               # O(1) (amortized)
n = len(list)                        # O(1)
print(list[n-1])                     # O(log(a+b)): printing an integer takes time proportional to its number of digits
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of the input string returned by input() as the parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
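As a side note, since a common divisor of two positive integers can never exceed min(a, b), the loop bound could be tightened, bringing the loop down to O(min(a, b)) in those terms. A sketch, assuming positive inputs:

a, b = map(int, input().split())
divisors = []
for i in range(1, min(a, b) + 1):    # a common divisor can't exceed min(a, b)
    if a % i == 0 and b % i == 0:
        divisors.append(i)
print(divisors[-1])                  # the greatest common divisor (1 always divides both)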
One thing to keep in mind, and you have the right idea, is that higher-degree terms take precedence. So you can have a step that is constant, O(1), but if it happens n times, that is O(N), and the total is O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know a simple loop like that is O(N) because it increases linearly as n increases.
Now if you had a nested loop that had both loops increasing as n increased, then it would be O(n^2).
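For instance, a minimal sketch of such a nested loop (all_pairs is a name introduced here, not from the question):

def all_pairs(l):
    pairs = []
    for a in l:                  # n iterations
        for b in l:              # n iterations for each a
            pairs.append((a, b))
    return pairs                 # n * n appends in total, so O(n^2)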
I am trying to understand Big-O notation, so I was making my own example of O(n) using a while loop, since I find while loops a bit confusing to reason about in Big-O notation. I defined a function called linear_example that takes in a list. The example is in Python.
So my code is:
def linear_example(l):
    n = 10
    while n > 1:
        n -= 1
        for i in l:
            print(i)
My thought process is that the code in the for loop runs in constant time, O(1), and the code in the while loop runs in O(n) time. So therefore it would be O(1) + O(n), which would evaluate to O(n).
Feedback?
Think of a simple for-loop:
for i in l:
    print(i)
This will be O(n) since you’re iterating through the list for however many items exist in l. (Where n == len(l))
Now we add a while loop which repeats that same pass a constant number of times, nine as written (n counts down from 10 to 2), so:
n + n + ... + n (x9)
And the complexity is O(9n).
Since this is still a polynomial with degree one, we can simplify this down to O(n), yes.
Not quite. First of all, the n in your code is a fixed value (10), not a parameter of the input, so saying O(n) for it is meaningless. Let's make it a given parameter M instead, changing the first two lines:
def linear_example(l, M):
    n = M
The code in the for loop does run in O(1) time, provided that each element i of l is of finite bounded print time. However, the loop iterates len(l) times, so the loop complexity is O(len(l)).
Now, that loop runs once entirely through for each value of n in the while loop, a total of M times. Therefore, the complexity is the product of the loop complexities: O(M * len(l)).
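Putting the correction together, the fully parameterized version would look like this sketch:

def linear_example(l, M):
    n = M
    while n > 1:           # the body runs about M times
        n -= 1
        for i in l:        # len(l) iterations per pass
            print(i)       # O(1) per element, assuming bounded print time
# overall: O(M * len(l))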
from collections import Counter

def findEquals(words, word):
    wordCounts = sorted(Counter(word).values())
    equals = []
    for word in words:
        wordsCounts = sorted(Counter(word).values())
        if wordCounts == wordsCounts:
            equals.append(word)
    return equals
So I have this loop inside my code, where words is a list of words. For each word in the list, I store the frequency of its letters in a Counter, whose values are then sorted using sorted(). I know that sorted() has a worst-case time complexity of O(n log n). Am I right to say that the worst-case time complexity of the whole code is O(n^2 log n), since sorted() is used inside a for loop?
Am I right to say that the worst-case time complexity of the whole code is O(n^2 log n), since sorted() is used inside a for loop?
Not necessarily. It's true that a for loop is O(N), and that sorted is O(N log N), but the "N"s in those expressions refer to different values. The N in O(N) refers to the length of words, and the N in O(N log N) refers to the length of word.
It would only be correct to say that the total complexity of the algorithm is O(N^2 log N), if the average length of each word string is equal to the length of the words list. For instance, if words contained five words, each having five letters. But it is unlikely that this is the case. A more general conclusion to make might be that the algorithm is O(m * n log n), with m corresponding to the size of words, and n corresponding to the average size of a word.
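To make the two different sizes explicit, one could annotate the original loop like this (a sketch, writing m = len(words) and k = the average word length; the loop variable is renamed to w to avoid shadowing the parameter):

from collections import Counter

def findEquals(words, word):
    wordCounts = sorted(Counter(word).values())    # O(k log k), done once
    equals = []
    for w in words:                                # m iterations
        wCounts = sorted(Counter(w).values())      # O(k log k) per word
        if wordCounts == wCounts:                  # O(k)
            equals.append(w)                       # O(1) amortized
    return equals                                  # total: O(m * k log k)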
The asymptotic analysis here will be a little weird because as you've written it, it's actually a function of three inputs: the haystack, the length of each word in the haystack, and the needle.
To make it more simple, you can cache the sorted values of the needle: these are static (and this operation will be overwhelmed by the iteration through the haystack).
I've also simplified the code to use filtering, which will abstract out the loop iteration. This won't have a theoretical impact on the performance of the algorithm, but returning an iterator will yield lazy results, which might improve real performance.
Because the frequencies will be small integers, they can be sorted in O(n) with respect to the number of frequencies (via a counting sort; see the sketch after the example below).
Thus, for every element in the haystack, you'll sort and compare with the needle's frequency. This will be O(n*m) where n is the size of haystack and m is the size of each element in haystack.
Therefore, your code can be simplified:
from collections import Counter

def find_same_frequency_distribution(haystack, needle):
    needle_counts = sorted(Counter(needle).values())
    return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
A quick check in the interpreter, with the function above defined:
>>> li = ["daddy", "mummy", "dddya", "foosa"]
>>> for e in find_same_frequency_distribution(li, "babdb"):
... print(e)
...
daddy
mummy
dddya
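As an aside, the O(n) integer sort mentioned above is a counting sort, which works here because the letter frequencies are small non-negative integers bounded by the word length. A minimal sketch (counting_sort_small_ints is a name introduced here):

def counting_sort_small_ints(values, max_value):
    # O(len(values) + max_value): tally each value, then replay the tallies in order
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    return [v for v in range(max_value + 1) for _ in range(counts[v])]

For instance, counting_sort_small_ints(list(Counter('babdb').values()), 5) returns [1, 1, 3].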
I have a question: what is the complexity of this algorithm?
def search(t):
    i = 0
    find = False
    while not find and i < len(t):
        j = i + 1
        while not find and j < len(t):
            if t[j] == t[i]:
                find = True
            j += 1
        i += 1
    return find
Thanks
Assuming t is a list, it's quadratic: O(n^2), where n is the length of the list.
You know it is because it iterates through t (the outer while loop), and in each of these iterations it iterates through t again. That means it looks at up to len(t) elements, len(t) times. Therefore, O(len(t)**2).
You can bring the complexity of that algorithm down to O(len(t)) and exactly one line of code by using the appropriate data structure:
def search(t):
    return len(set(t)) != len(t)
For more info about how sets work, see https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset
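If you also want the possibility of stopping early, a set-based variant (a sketch; search_early_exit is a name introduced here) is still O(len(t)) in the worst case but can return at the first duplicate:

def search_early_exit(t):
    seen = set()
    for x in t:
        if x in seen:
            return True    # first duplicate found: stop immediately
        seen.add(x)
    return False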
The best case complexity is O(1), as the search may succeed immediately.
The worst case complexity is O(N²), achieved in case the search fails (there are (N-1)+(N-2)+...+2+1 comparisons made, i.e. N(N-1)/2 in total).
The average case can be estimated as follows: assuming that the array contains K entries that are not unique and are spread uniformly, the first of these is located after N/K elements on average, so the outer loop will run N/K times, with a cost of (N-1)+(N-2)+...+(N-N/K) comparisons. In the last iteration of the outer loop, the inner loop will run about 2N/K times.
Roughly, the expected time is O(N²/K).
I am creating a fast method of generating a list of primes in the range(0, limit+1). In the function I end up removing all integers in the list named removable from the list named primes. I am looking for a fast and pythonic way of removing the integers, knowing that both lists are always sorted.
I might be wrong, but I believe list.remove(n) iterates over the list comparing each element with n, meaning that the following code runs in O(n^2) time.
# removable and primes are both sorted lists of integers
for composite in removable:
    primes.remove(composite)
Based on my assumption (which could be wrong, so please confirm whether or not it is correct) and the fact that both lists are always sorted, I would think that the following code runs faster, since it only loops over the lists once, for O(n) time. However, it is not at all pythonic or clean.
i = 0
j = 0
while i < len(primes) and j < len(removable):
    if primes[i] == removable[j]:
        primes = primes[:i] + primes[i+1:]
        j += 1
    else:
        i += 1
Is there perhaps a built in function or simpler way of doing this? And what is the fastest way?
Side notes: I have not actually timed the functions or code above. Also, it doesn't matter if the list removable is changed/destroyed in the process.
For anyone interested, the full function is below:
import math

# returns a list of primes in range(0, limit+1)
def fastPrimeList(limit):
    if limit < 2:
        return list()
    sqrtLimit = int(math.ceil(math.sqrt(limit)))
    primes = [2] + list(range(3, limit+1, 2))    # list(...) needed in Python 3
    index = 1
    while primes[index] <= sqrtLimit:
        removable = list()
        index2 = index
        while primes[index] * primes[index2] <= limit:
            composite = primes[index] * primes[index2]
            removable.append(composite)
            index2 += 1
        for composite in removable:
            primes.remove(composite)
        index += 1
    return primes
This is quite fast and clean: it does O(n) set membership checks and runs in O(n) amortized time (the first line is O(n) amortized; the second is O(n * 1) amortized, because each membership check is O(1) amortized):
removable_set = set(removable)
primes = [p for p in primes if p not in removable_set]
Here is the modification of your 2nd solution. It does O(n) basic operations (worst case):
tmp = []
i = j = 0
while i < len(primes) and j < len(removable):
    if primes[i] < removable[j]:
        tmp.append(primes[i])
        i += 1
    elif primes[i] == removable[j]:
        i += 1
    else:
        j += 1
primes[:i] = tmp
del tmp
Please note that constants also matter. The Python interpreter is quite slow (i.e. with a large constant) to execute Python code. The 2nd solution has lots of Python code, and it can indeed be slower for small practical values of n than the solution with sets, because the set operations are implemented in C, thus they are fast (i.e. with a small constant).
If you have multiple working solutions, run them on typical input sizes and measure the time. You may be surprised at their relative speed; often it is not what you would predict.
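For example, a minimal timing harness (a sketch: set_based is a stand-in name for the set solution above, and the data below is synthetic, not real primes):

import timeit

def set_based(primes, removable):
    removable_set = set(removable)
    return [p for p in primes if p not in removable_set]

primes = list(range(3, 10**5, 2))    # stand-in data of a typical size
removable = primes[::10]
print(timeit.timeit(lambda: set_based(primes, removable), number=100))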
The most important thing here is to remove the quadratic behavior. You have this for two reasons.
First, calling remove searches the entire list for values to remove. Doing this takes linear time, and you're doing it once for each element in removable, so your total time is O(NM) (where N is the length of primes and M is the length of removable).
Second, removing elements from the middle of a list forces you to shift the whole rest of the list up one slot. So, each one takes linear time, and again you're doing it M times, so again it's O(NM).
How can you avoid these?
For the first, you either need to take advantage of the sorting, or just use something that allows you to do constant-time lookups instead of linear-time, like a set.
For the second, you either need to create a list of indices to delete and then do a second pass to move each element up the appropriate number of indices all at once, or just build a new list instead of trying to mutate the original in-place.
So, there are a variety of options here. Which one is best? It almost certainly doesn't matter; changing your O(NM) time to just O(N+M) will probably be more than enough of an optimization that you're happy with the results. But if you need to squeeze out more performance, then you'll have to implement all of them and test them on realistic data.
The only one of these that I think isn't obvious is how to "use the sorting". The idea is to use the same kind of staggered-zip iteration that you'd use in a merge sort, like this:
def sorted_subtract(seq1, seq2):
    # yield the elements of seq1 that do not appear in seq2 (both sorted)
    i1, i2 = 0, 0
    while i1 < len(seq1):
        if i2 == len(seq2) or seq1[i1] < seq2[i2]:
            # seq1[i1] cannot appear later in seq2, so keep it
            yield seq1[i1]
            i1 += 1
        elif seq1[i1] == seq2[i2]:
            # match: drop this element of seq1
            i1 += 1
        else:
            # seq2[i2] < seq1[i1]: advance in seq2
            i2 += 1
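Used on the original lists, the call would be, for example:

primes = list(sorted_subtract(primes, removable))    # O(len(primes) + len(removable))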