time complexity of a double for loop - python

I'm trying to work out the time and space complexity of the following code. It is intended to spit out anagrams from a list of words.
def ana(input_):
    results = []
    for key, word1 in enumerate(input_):
        a = set([i for i in word1])
        for word2 in input_[key+1:]:
            b = set([i for i in word2])
            if len(set(a) - set(b)) == 0 and len(set(b) - set(a)) == 0:
                if word1 not in results:
                    results.append(word1)
                if word2 not in results:
                    results.append(word2)
For time complexity, the first for loop indicates there are at least N iterations. I'm confused by the second for loop. It definitely does fewer than N iterations, but more than log N.
How should this be noted?
In terms of space complexity:
this is just O(N) to store the results. Is this correct?

TLDR: A nested loop doing N - 1 + N - 2 + ... + 1 + 0 iterations in total is still O(N^2).
The outer loop iterates through input_ completely -- for a size N input_ the loop contributes a factor of O(N). That means the function is at least O(N) in total.
The inner loop iterates through input_[key+1:], with key ranging from 0 to N - 1. That means it does O(N) work in the first iteration of the outer loop, but only O(1) in the last. So the function is between O(N) and O(N^2) in total.
You can already estimate the complexity quickly. The first few outer iterations contribute O(N) + O(N - 1) + ... and so on. Even if we only look at the first half of all iterations, the last of these still contributes O(N/2). Since O(N - 1) = O(N) and O(N/2) = O(N), we definitely have N/2 iterations of O(N) each. That gives O(N / 2 * N) = O(N^2) complexity.
You can also count the iterations exactly -- this situation corresponds to the triangular numbers. The inner loop performs a total of (N - 1) + (N - 2) + ... + 1 + 0 iterations. This can be re-ordered into pairs of the smallest and largest remaining terms: ((N - 1) + 0) + ((N - 2) + 1) + ...
Each pair has the value (N - 1 + 0), or just (N - 1).
Rearranging our initial N terms into pairs gives us N / 2 pairs.
This means the sum of all pairs is (N - 1) * N / 2.
In other words, the inner loop does a total of (N - 1) * N / 2 iterations. That is O((N - 1) * N / 2) = O(N * N) = O(N^2) complexity.
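The triangular-number count can be checked empirically by instrumenting a loop of the same shape as ana() with a counter (a small sketch; nested_iterations is my name, not from the question):

```python
def nested_iterations(n):
    """Count inner-loop iterations for the same loop shape as ana()."""
    count = 0
    for key in range(n):              # outer loop: word1 = input_[key]
        for _ in range(key + 1, n):   # inner loop: word2 in input_[key+1:]
            count += 1
    return count

# The count matches the closed form (N - 1) * N / 2 exactly:
for n in (1, 2, 10, 100):
    assert nested_iterations(n) == (n - 1) * n // 2
```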

Related

Runtime of two nested for loops

I am calculating the run time complexity of the below Python code as n^2, but it seems that is incorrect. The correct answer shown is n(n-1)/2. Can anyone help me understand why the inner loop runs n(n-1)/2 times rather than n*n times?
for i in range(n):
    for j in range(i):
        val += 1
On the first run of the inner loop, when i = 0, the for loop does not execute at all. (There are zero elements to iterate over in range(0)).
On the second run of the inner loop, when i = 1, the for loop executes for one iteration. (There is one element to iterate over in range(1)).
Then, there are two elements in the third run, three in the fourth, following this pattern, up until i = n - 1.
The total number of iterations is 0 + 1 + ... + (n - 1). This summation has a well-known form*: for any natural m, 0 + 1 + ... + m = m * (m + 1) / 2. In this case we have m = n - 1, so there are n * (n - 1) / 2 iterations in total.
You're right that the asymptotic time complexity of this algorithm is O(n^2). However, given that you've said that the "correct answer" is n * (n - 1) / 2, the question most likely asked for the exact number of iterations (rather than an asymptotic bound).
* See here for a proof of this closed form.
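Because val is incremented exactly once per inner-loop iteration, the closed form can also be checked directly by running the code (a quick sanity check; count_val is just a wrapper name I'm using):

```python
def count_val(n):
    # val is incremented once per inner-loop iteration
    val = 0
    for i in range(n):
        for j in range(i):
            val += 1
    return val

# equals n * (n - 1) / 2 for any n
for n in (0, 1, 5, 50):
    assert count_val(n) == n * (n - 1) // 2
```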

What is the time complexity of a recursive function that spawns n - (i + 2) function calls on each iteration?

I'm working on a DP problem called house robber, I've solved the problem using a DP approach but my initial thought was to use the following recursive function:
def rec(self, start, possible):
    if len(possible) == 0:
        return start
    money = start
    for i, num in enumerate(possible):
        nextMoney = self.rec(start + num, possible[i+2:])
        money = max(money, nextMoney)
    return money

def rob(self, nums: List[int]) -> int:
    # Base Case
    if len(nums) == 0:
        return 0
    elif len(nums) == 1:
        return nums[0]
    path1 = self.rec(nums[0], nums[2:])
    path2 = self.rec(nums[1], nums[3:])
    # Recursion
    return max(path1, path2)
My DP solution is O(n), but I'm struggling to determine the time complexity of the algorithm described above. My instinct says it's of some exponential order, perhaps O(n^log(n)).
If anyone can point me in the right direction here that would be highly appreciated. Thanks.
Problem for reference: https://leetcode.com/problems/house-robber/
The code enumerates all subsets of 1..n with no two adjacent numbers. You also do lots of slicing of possible, which adds up to an O(n^2) cost per call, so the recurrence relation is:
T(n) = n^2 + sum(T(i) for i=0..n-2)
Subtracting T(n-1) from T(n):
T(n) - T(n-1) = n^2 - (n-1)^2 + T(n-2)
T(n) = 2n - 1 + T(n-1) + T(n-2)
Let U(n) = T(n) + 2n + 5, so T(n) = U(n) - 2n - 5. Substituting for T(n), T(n-1) and T(n-2) we get:
U(n) - 2n - 5 = 2n - 1 + U(n-1) - 2(n-1) - 5 + U(n-2) - 2(n-2) - 5
U(n) = U(n-1) + U(n-2) (simplifying)
so U(n) satisfies the Fibonacci recurrence, i.e. U(n) grows like Fib(n) (the Fibonacci numbers), and T(n) = U(n) - 2n - 5.
So your runtime is Theta(Fib(n)), which is Theta(phi^n), where phi is the golden ratio.
[An interesting note is that if you removed the list slicing that causes the O(n^2) cost per call, the complexity class of your code would be the same -- the cost of slicing is lost in the otherwise exponential cost of the code].
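The Fibonacci growth can be observed empirically by counting calls rather than timing them. The sketch below (calls is my name; it is not part of the original solution) mirrors the branching structure of rec: a call on a list of length n recurses once per loop index i, on a slice of length max(n - i - 2, 0):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def calls(n):
    """Number of rec() invocations triggered by a `possible` list of length n."""
    if n == 0:
        return 1  # the call itself; the loop body never runs
    # the call itself, plus one recursive call per loop index i,
    # each on a slice of length max(n - i - 2, 0)
    return 1 + sum(calls(max(n - i - 2, 0)) for i in range(n))

counts = [calls(n) for n in range(10)]
# counts grow like the Fibonacci numbers: 1, 2, 3, 5, 8, 13, ...
ratio = calls(25) / calls(24)
# the ratio of consecutive counts approaches the golden ratio phi ≈ 1.618
```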

How do I calculate the time complexity of a nested 'for' loop?

Consider:
def fun(n):
    for i in range(1, n+1):
        for j in range(1, n, i):
            print(i, ",", j)
I'm having trouble with the nested for loop. Is it 2n^2 + 2n + 1?
The inner loop runs from 1 (inclusive) to n (exclusive) in hops of i. That means it makes ⌈(n-1)/i⌉ steps.
The outer loop makes n runs, where i ranges from 1 to n. We can slightly overestimate the total number of steps by summing (n-1)/i over all i:

    sum(i=1..n) (n-1)/i = (n-1) * sum(i=1..n) 1/i

The remaining sum is the harmonic series, which can be approximated by an integral: the integral of 1/x over [1, n] lies between the sum of the 1/(i+1) terms and the sum of the 1/i terms. So:

    sum(i=1..n) 1/i ≈ integral from 1 to n of 1/x dx = log(n) - log(1) = log(n)

That means the total number of steps is, in terms of big oh, O(n log n).
The complexity of your code is O(n log n). For each i, the inner loop does O(n/i) work, so in total we have:

    O(n/1) + O(n/2) + ... + O(n/i) + ... + O(n/n)

which is equal to:

    n * O(1 + 1/2 + ... + 1/n) = n * O(log(n)) = O(n log(n))
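Both derivations can be sanity-checked by counting the inner-loop iterations directly and comparing against n log n (a rough empirical sketch; total_steps is my name):

```python
from math import log

def total_steps(n):
    """Total number of inner-loop iterations performed by fun(n)."""
    return sum(len(range(1, n, i)) for i in range(1, n + 1))

n = 10_000
ratio = total_steps(n) / (n * log(n))
# the ratio stays bounded by a small constant as n grows,
# consistent with a Θ(n log n) iteration count
```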

What's the time complexity of this method to find the number of inversions in an array (python)?

inv = 0
for j in range(n):
    inv = inv + sum((x < arr[j]) for x in arr[j:])
For every element I am counting the number of smaller elements occurring after it in the array (arr[j:]).
It is O(n^2). Here is how you can compute this:
for the 1st element, you need to compare with the next n-1 elements.
for the 2nd element, you need to compare with the next n-2 elements.
...
for the nth element, you need to compare with the next 0 elements.
Therefore, in total you are making (n-1) + (n-2) + ... + 1 + 0 = n(n-1) / 2 comparisons, which is quadratic in n.
More efficient approaches do exist. For example, by using a divide and conquer based strategy, you can count them in O(n log(n)). See this nice link!
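For completeness, the O(n log(n)) divide-and-conquer approach piggybacks on merge sort: whenever an element from the right half is merged before remaining elements of the left half, it forms one inversion with each of them. A minimal sketch of that idea (not the code from the linked article):

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every remaining left element:
            # that is len(left) - i inversions at once
            inv += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv
```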
inv = 0
for j in range(n):
    inv = inv + sum((x < arr[j]) for x in arr[j:])
Let's break this code into three parts.
1: inv = 0
This is a constant-time operation; call its cost T1.
2: for j in range(n):
Here we are running a loop n times, so the total time so far is T1 + N * f(a), where f(a) is the time taken by the body of the loop. For simplicity we can drop the constant factor, so the complexity is N * f(a).
Now comes the tricky part: what is f(a)?
3: inv = inv + sum((x < arr[j]) for x in arr[j:])
Concentrate on sum((x < arr[j]) for x in arr[j:]).
sum adds one for each value below arr[j] produced by the loop for x in arr[j:], so it scans len(arr[j:]) elements.
That means f(a) takes the values N, N - 1, N - 2, ..., down to 1 as j runs from 0 to N - 1.
Adding this all together, the total work is N + (N - 1) + (N - 2) + ... + 1 = N * (N + 1) / 2, which is O(N^2).
Hope you get it.

Why is this algorithm worse?

In Wikipedia this is one of the given algorithms to generate prime numbers:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)
    # Loop over the candidates, marking out each multiple.
    for i in range(2, fin + 1):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)
    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I changed the algorithm slightly.
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)
    # Loop over the candidates, marking out each multiple.
    candidates[4::2] = [None] * (n // 2 - 1)
    for i in range(3, fin + 1, 2):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)
    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I first marked off all the multiples of 2, and then considered odd numbers only. When I timed both algorithms (tried 40,000,000) the first one was always better (albeit very slightly). I don't understand why. Can somebody please explain?
P.S.: When I try 100,000,000, my computer freezes. Why is that? I have a Core Duo E8500, 4 GB RAM, Windows 7 Pro 64-bit.
Update 1: This is Python 3.
Update 2: This is how I timed:
start = time.time()
a = eratosthenes_sieve(40000000)
end = time.time()
print(end - start)
UPDATE: Upon valuable comments (especially by nightcracker and Winston Ewert) I managed to code what I intended in the first place:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only c below sqrt(n) need be checked.
    c = [i for i in range(3, n + 1, 2)]
    fin = int(n ** 0.5) // 2
    # Loop over the c, marking out each multiple.
    for i in range(fin):
        if not c[i]:
            continue
        c[c[i] + i::c[i]] = [None] * ((n // c[i]) - (n // (2 * c[i])) - 1)
    # Filter out non-primes and return the list.
    return [2] + [i for i in c if i]
This algorithm improves the original algorithm (mentioned at the top) by (usually) 50%. (Still, worse than the algorithm mentioned by nightcracker, naturally).
A question to Python Masters: Is there a more Pythonic way to express this last code, in a more "functional" way?
UPDATE 2: I still couldn't decode the algorithm mentioned by nightcracker. I guess I'm too stupid.
The question is, why would it even be faster? In both examples you are filtering multiples of two, the hard way. It doesn't matter whether you hardcode candidates[4::2] = [None] * (n // 2 - 1) or that it gets executed in the first loop of for i in range(2, fin + 1):.
If you are interested in an optimized sieve of Eratosthenes, here you go:
def primesbelow(N):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    # """ Input N >= 6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = (N, N-1, N+4, N+3, N+2, N+1)[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
Explanation here: Porting optimized Sieve of Eratosthenes from Python to C++
The original source is here, but there was no explanation. In short this primesieve skips multiples of 2 and 3 and uses a few hacks to make use of fast Python assignment.
You do not save a lot of time avoiding the evens. Most of the computation time within the algorithm is spent doing this:
candidates[i + i::i] = [None] * (n // i - 1)
That line causes a lot of action on the part of the computer. Whenever the number in question is even, this is not run as the loop bails on the if statement. The time spent running the loop for even numbers is thus really really small. So eliminating those even rounds does not produce a significant change in the timing of the loop. That's why your method isn't considerably faster.
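This can be made concrete by instrumenting the original loop: count the cells written by the marking line versus the iterations that merely bail out on an already-marked (e.g. even) i. A rough sketch with counters I've added:

```python
def sieve_costs(n):
    """Count slice-assignment cell writes vs. cheap skipped iterations."""
    candidates = list(range(n + 1))
    fin = int(n ** 0.5)
    writes = skips = 0
    for i in range(2, fin + 1):
        if not candidates[i]:
            skips += 1       # O(1): the loop bails immediately
            continue
        cells = n // i - 1   # cells touched by the marking line
        candidates[i + i::i] = [None] * cells
        writes += cells
    return writes, skips

writes, skips = sieve_costs(100_000)
# the slice writes outnumber the cheap skipped iterations by orders of
# magnitude, which is why special-casing the evens barely moves the needle
```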
When Python produces numbers for range, it uses the formula start + index * step. Multiplying by two, as in your case, is going to be slightly more expensive than multiplying by one, as in the original case.
There is also quite possibly a small overhead to having a longer function.
Neither of those is a really significant speed issue, but together they outweigh the very small benefit your version brings.
It's probably slightly slower because you are performing extra setup to do something that was done in the first case anyway (marking off the multiples of two). That setup time might be what you're seeing, if the difference is as slight as you say.
Your extra step is unnecessary: it traverses about half of the whole collection of n elements up front to do that 'get rid of evens' operation, rather than just operating on candidates up to n^(1/2).
