I have two functions, both of which flatten an arbitrarily nested list of lists in Python.
I am trying to figure out the time complexity of both, to see which is more efficient, but I haven't found anything definitive on SO anything so far. There are lots of questions about lists of lists, but not to the nth degree of nesting.
function 1 (iterative)
def flattenIterative(arr):
i = 0
while i < len(arr):
while isinstance(arr[i], list):
if not arr[i]:
arr.pop(i)
i -= 1
break
else:
arr[i: i + 1] = arr[i]
i += 1
return arr
function 2 (recursive)
def flattenRecursive(arr):
if not arr:
return arr
if isinstance(arr[0], list):
return flattenRecursive(arr[0]) + flattenRecursive(arr[1:])
return arr[:1] + flattenRecursiveweb(arr[1:])
My thoughts are below:
function 1 complexity
I think that the time complexity for the iterative version is O(n * m), where n is the length of the initial array, and m is the amount of nesting. I think space complexity of O(n) where n is the length of the initial array.
function 2 complexity
I think that the time complexity for the recursive version will be O(n) where n is the length of the input array. I think space complexity of O(n * m) where n is the length of the initial array, and m is the depth of nesting.
summary
So, to me it seems that the iterative function is slower, but more efficient with space. Conversely, the recursive function is faster, but less efficient with space. Is this correct?
I don't think so. There are N elements, so you will need to visit each element at least once. Overall, your algorithm will run for O(N) iterations. The deciding factor is what happens per iteration.
Your first algorithm has 2 loops, but if you observe carefully, it is still iterating over each element O(1) times per iteration. However, as #abarnert pointed out, the arr[i: i + 1] = arr[i] moves every element from arr[i+1:] up, which is O(N) again.
Your second algorithm is similar, but you are adding lists in this case (in the previous case, it was a simple slice assignment), and unfortunately, list addition is linear in complexity.
In summary, both your algorithms are quadratic.
Related
I know there are many other questions out there asking for the general guide of how to calculate the time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the Time complexity is N * N where N is the size of input. (please correct me if this is also wrong) (Edited once after being corrected by an answer)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
if a % i == 0 and b % i == 0:
list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, that will make it more than a simple O(n) = N^2 ? For example, in the second loop where I was finding the common divisors of both a and b (a%i = 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity, in this specific loop?
I wish the question is making sense, apologise if it is not clear enough.
Thanks for answering
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely those asymptotically bounded from above by the function f(n) = n, whereas f(N) = N*N is one function. By abuse of notation, we conventionally write n*n = O(n) to mean n*n ∈ O(n) (which is a mathematically false statement), but switching the sides (O(n) = n*n) is undefined. A mathematically correct statement would be n = O(n*n).
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split()) # O(1)
list = [] # O(1)
for i in range(1, a+b+1): # O(a+b) multiplied by what's inside the loop
if a % i == 0 and b % i == 0: # O(1)
list.append(i) # O(1) (amortized)
n = len(list) # O(1)
print(list[n-1]) # O(log(a+b))
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of your input input() as the input parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
One thing to keep in mind, and you have the right idea, is that higher degree take precedence. So you can have a step that’s constant O(1) but happens n times O(N) then it will be O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know a simple loop like that is O(N) because it increases linearly as n increases.
Now if you had a nested loop that had both loops increasing as n increased, then it would be O(n^2).
I'm currently doing some work around Big-O complexity and calculating the complexity of algorithms.
I seem to be struggling to work out the steps to calculate the complexity and was looking for some help to tackle this.
The function:
index = 0
while index < len(self.items):
if self.items[index] == item:
self.items.pop(index)
else:
index += 1
The actual challenge is to rewrite this function so that is has O(n) worst case complexity.
My problem with this is, as far as I thought, assignment statements and if statements have a complexity of O(1) whereas the while loop has a complexity of (n) and in the worst case any statements within the while loop could execute n times. So i work this out as 1 + n + 1 = 2 + n = O(n)
I figure I must be working this out incorrectly as there'd be no point in rewriting the function otherwise.
Any help with this is greatly appreciated.
If self.items is a list, the pop operation has complexity "k" where k is the index,
so the only way this is not O(N) is because of the pop operation.
Probably the exercise is done in order for you to use some other method of iterating and removing from the list.
To make it O(N) you can do:
self.items = [x for x in self.items if x == item]
If you are using Python's built in list data structure the pop() operation is not constant in the worst case and is O(N). So your overall complexity is O(N^2). You will need to use some other data structure like linked list if you cannot use auxiliary space.
With no arguments to pop its O(1)
With an argument to pop:
Average time Complexity O(k) (k represents the number passed in as an argument for pop
Amortized worst case time complexity O(k)
Worst case time complexity O(n)
Time Complexity - Python Wiki
So to make your code effective, allow user to pop from the end of the list:
for example:
def pop():
list.pop(-1)
Reference
Since you are passing index to self.items.pop(index), Its NOT O(1).
A variation of the DNF is as follows:
def dutch_flag_partition(pivot_index , A):
pivot = A[pivot_index]
# First pass: group elements smaller than pivot.
for i in range(len(A)):
# Look for a smaller element.
for j in range(i + 1, len(A)):
if A[j] < pivot:
A[i], A[j] = A[j], A[i]
break
# Second pass: group elements larger than pivot.
for i in reversed(range(len(A))):
if A[i] < pivot:
break
# Look for a larger element. Stop when we reach an element less than
# pivot , since first pass has moved them to the start of A.
for j in reversed(range(i)):
if A[j] > pivot:
A[i], A[j] = A[j], A[i]
break
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length? And time complexity, given as O(N^2), is it so due to the nested loops? Thanks
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length?
No. Swapping, in fact, takes no extra space at all.
More importantly, you can't just look for one thing and say however much that thing takes, that's the complexity. You have to look over all the things, and the largest one determines the complexity. So, look over all the things you're creating:
pivot is just a reference to one of the list members, which is constant size.
a range is constant size.
an iterator over a range is constant-size.
the i and j integer values returned by the range iterator are constant size.1
…
Since nothing is larger than constant size, the total size is constant.
And time complexity, given as O(N^2), is it so due to the nested loops?
Well, yes, but you have to get a bit more detailed than that. Two nested loops don't necessarily mean quadratic. Two nested loops that do linear work inside the nested loop would be cubic. Two nested loops that combine so that the size of the inner loop depends inversely on the outer loop are linear. And so on.
And again, you have to add up everything, not just pick one thing and guess.
So, the first pass does:
A plain list indexing and assignment, constant.
A loop over the input length.
… with a loop over the input length
… with some list indexing, comparisons, and assignments, all constant
… which also breaks early in some cases… which we can come back to.
So, if the break doesn't help at all, that's O(1 + N * N * 1), which is O(N * N).
And the second pass is similarly O(N * (1 + N * 1)), which is again O(N * N).
And if you add O(N * N + N * N), you get O(N * N).
Also, even if the break made the first pass log-linear or something, O(N * log N + N * N) is still O(N * N), so it wouldn't matter.
So the time is quadratic.
1. Technically, this isn't quite true. Integers are variable-sized, and the memory they take is the log of their magnitude. So, i and j, and the stop attributes of the range objects, and probably some other stuff are all log N. But, unless you're dealing with huge-int arithmetic, like in crypto algorithms that multiply huge prime factors, people usually ignore this, and get away with it.
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length?
As you are "just" swapping there is no new data being created or generated, you are just reassigning values you already have, thus why the space complexity is constant.
And time complexity, given as O(N^2), is it so due to the nested loops?
True. It's a second order polynomial time complexity because you have two for loops nested.
You have a break in them, so in more favorable cases your time complexity will be below N^2. However, as big-O is worst case then it's ok to say it's of degree 2.
This is the function:
c = []
def badsort(l):
v = 0
m = len(l)
while v<m:
c.append(min(l))
l.remove(min(l))
v+=1
return c
Although I realize that this is a very inefficient way to sort, I was wondering what the time complexity of such a function would be, as although it does not have nested loops, it repeats the loop multiple times.
Terminology
Assume that n = len(l).
Iteration Count
The outer loops runs n times. The min() in the inner loop runs twice over l (room for optimisation here) but for incrementally decreasing numbers (for each iteration of the loop, the length of l decrements because you remove an item from the list every time).
That way the complexity is 2 * (n + (n-1) + (n-2) + ... + (n-n)).
This equals 2 * (n^2 - (1 + 2 + 3 + ... + n)).
The second term in the parenthesis is a triangular number and diverges to n*(n+1)/2.
Therefore your complexity equals 2*(n^2 - n*(n+1)/2)).
This can be expanded to 2*(n^2 - n^2/2 - n/2),
and simplified to n^2 - n.
BigO Notation
BigO notation is interested in the overall growth trend, rather than the precise growth rate of the function.
Drop Constants
In BigO notation, the constants are dropped. This leaves us still with n^2 - n since there are no constants.
Retain Only Dominant Terms
Also, in BigO notation only the dominant terms are considered. n^2 is dominant over n, so n is dropped.
Result
That means the answer in BigO is O(n) = n^2, i.e. quadratic complexity.
Here are a couple of useful points to help you understand how to find the complexity of a function.
Measure the number of iterations
Measure the complexity of each operation at each iteration
For the first point, you see the terminating condition is v < m, where v is 0 initially, and m is the size of the list. Since v increments by one at each iteration, the loop runs at most (at least) N times, where N is the size of the list.
Now, for the second point. Per iteration we have -
c.append(min(l))
Where min is a linear operation, taking O(N) time. append is a constant operation.
Next,
l.remove(min(l))
Again, min is linear, and so is remove. So, you have O(N) + O(N) which is O(N).
In summary, you have O(N) iterations, and O(N) per iteration, making it O(N ** 2), or quadratic.
The time compexity for this problem is O(n^2). While the code itself has only one obvious loop, the while loop, the min and max functions are both O(n) by implementation, because at worst case, it would have to scan the entire list to find the corresponding minimum or maximum value. list.remove is O(n) because it too has to traverse the list until it finds the first target value, which at worst case, could be at the end. list.append is amortized O(1), due to a clever implementation of the method, because list.append is technically O(n)/n = O(1) for n objects pushed:
def badsort(l):
v = 0
m = len(l)
while v<m: #O(n)
c.append(min(l)) #O(n) + O(1)
l.remove(min(l)) #O(n) + O(n)
v+=1
return c
Thus, there is:
Outer(O(n)) * Inner(O(n)+O(n)+O(n)) = Outer(O(n)) * Inner(O(n))
O(n)+O(n)+O(n) can be combined to simply O(n) because big o measures worst case. Thus, by combining the outer and inner compexities, the final complexity is O(n^2).
I am creating a fast method of generating a list of primes in the range(0, limit+1). In the function I end up removing all integers in the list named removable from the list named primes. I am looking for a fast and pythonic way of removing the integers, knowing that both lists are always sorted.
I might be wrong, but I believe list.remove(n) iterates over the list comparing each element with n. meaning that the following code runs in O(n^2) time.
# removable and primes are both sorted lists of integers
for composite in removable:
primes.remove(composite)
Based off my assumption (which could be wrong and please confirm whether or not this is correct) and the fact that both lists are always sorted, I would think that the following code runs faster, since it only loops over the list once for a O(n) time. However, it is not at all pythonic or clean.
i = 0
j = 0
while i < len(primes) and j < len(removable):
if primes[i] == removable[j]:
primes = primes[:i] + primes[i+1:]
j += 1
else:
i += 1
Is there perhaps a built in function or simpler way of doing this? And what is the fastest way?
Side notes: I have not actually timed the functions or code above. Also, it doesn't matter if the list removable is changed/destroyed in the process.
For anyone interested the full functions is below:
import math
# returns a list of primes in range(0, limit+1)
def fastPrimeList(limit):
if limit < 2:
return list()
sqrtLimit = int(math.ceil(math.sqrt(limit)))
primes = [2] + range(3, limit+1, 2)
index = 1
while primes[index] <= sqrtLimit:
removable = list()
index2 = index
while primes[index] * primes[index2] <= limit:
composite = primes[index] * primes[index2]
removable.append(composite)
index2 += 1
for composite in removable:
primes.remove(composite)
index += 1
return primes
This is quite fast and clean, it does O(n) set membership checks, and in amortized time it runs in O(n) (first line is O(n) amortized, second line is O(n * 1) amortized, because a membership check is O(1) amortized):
removable_set = set(removable)
primes = [p for p in primes if p not in removable_set]
Here is the modification of your 2nd solution. It does O(n) basic operations (worst case):
tmp = []
i = j = 0
while i < len(primes) and j < len(removable):
if primes[i] < removable[j]:
tmp.append(primes[i])
i += 1
elif primes[i] == removable[j]:
i += 1
else:
j += 1
primes[:i] = tmp
del tmp
Please note that constants also matter. The Python interpreter is quite slow (i.e. with a large constant) to execute Python code. The 2nd solution has lots of Python code, and it can indeed be slower for small practical values of n than the solution with sets, because the set operations are implemented in C, thus they are fast (i.e. with a small constant).
If you have multiple working solutions, run them on typical input sizes, and measure the time. You may get surprised about their relative speed, often it is not what you would predict.
The most important thing here is to remove the quadratic behavior. You have this for two reasons.
First, calling remove searches the entire list for values to remove. Doing this takes linear time, and you're doing it once for each element in removable, so your total time is O(NM) (where N is the length of primes and M is the length of removable).
Second, removing elements from the middle of a list forces you to shift the whole rest of the list up one slot. So, each one takes linear time, and again you're doing it M times, so again it's O(NM).
How can you avoid these?
For the first, you either need to take advantage of the sorting, or just use something that allows you to do constant-time lookups instead of linear-time, like a set.
For the second, you either need to create a list of indices to delete and then do a second pass to move each element up the appropriate number of indices all at once, or just build a new list instead of trying to mutate the original in-place.
So, there are a variety of options here. Which one is best? It almost certainly doesn't matter; changing your O(NM) time to just O(N+M) will probably be more than enough of an optimization that you're happy with the results. But if you need to squeeze out more performance, then you'll have to implement all of them and test them on realistic data.
The only one of these that I think isn't obvious is how to "use the sorting". The idea is to use the same kind of staggered-zip iteration that you'd use in a merge sort, like this:
def sorted_subtract(seq1, seq2):
i1, i2 = 0, 0
while i1 < len(seq1):
if seq1[i1] != seq2[i2]:
i2 += 1
if i2 == len(seq2):
yield from seq1[i1:]
return
else:
yield seq1[i1]
i1 += 1