Problems measuring complexity in Python scripts

I have problems measuring complexity with Python. Given the following two scripts:
1 def program1(L):
2     multiples = []
3     for x in L:
4         for y in L:
5             multiples.append(x*y)
6     return multiples
1 def program3(L1, L2):
2     intersection = []
3     for elt in L1:
4         if elt in L2:
5             intersection.append(elt)
6     return intersection
In the first one, the best case (the minimum number of steps to run the script) occurs with an empty list L, so only the second and the sixth lines are executed. The answer for the best case is therefore 2.
In the worst case, L is a long list and the loop for x in L runs n times.
The inner loop has three operations (assignment of a value to y, x*y, and list appending), so the inner loop executes 3*n times on each iteration of the outer loop. Thus the nested loop structure is executed n * (3*n + 1) = 3*n**2 + n times. Adding the second and the sixth line we get the answer 3n²+n+2.
But my question is: where does the 1 in n(3n+1) come from?
By my count the answer should be n(3n)+2 = 3n²+2, versus the given answer of n(3n+1)+2 = 3n²+n+2.
Meanwhile, in the second one the worst case is n²+2n+2, but I don't understand why there is a quadratic term when there is only one loop.

According to you, there are three instructions in the innermost (y) loop of program1.
Assign to y.
Compute x*y.
Append to list.
By that same logic, there is one instruction in the outermost (x) loop:
Assign to x.
Run the inner loop (see above).
That would make the outer loop:
n * (1 {assign to x} + n * 3 {assign, multiply, append})
Or:
n * (1 + 3n)
Adding the init/return instructions gives:
2 + n + 3n²
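To see where the n(3n+1) comes from concretely, here is a minimal sketch (not from the original answer) that tallies steps under the same convention, charging one step per loop-variable assignment and one per operation in the body:

def count_steps_program1(L):
    # Convention from above: init and return cost 1 each; each outer
    # iteration costs 1 (assign to x); each inner iteration costs 3
    # (assign to y, multiply, append).
    n = len(L)
    steps = 1                # multiples = []
    for _ in L:
        steps += 1           # assign to x
        for _ in L:
            steps += 3       # assign to y, x*y, append
    steps += 1               # return
    return steps

for n in range(6):
    assert count_steps_program1(list(range(n))) == 3*n**2 + n + 2

The extra n in 3n² + n + 2 is exactly the once-per-outer-iteration assignment to x.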
In program3, there is a similar situation with a "hidden loop":
2 instructions for init/return, plus ...
Then you run for elt in L1, which is going to be n iterations (n is size of L1). Your inner code is an if statement. In the worst case, the if body always runs. In the best case, it never runs.
The if condition is testing elt in L2, which is going to run an iterative function, type(L2).__contains__() on L2. The simple case will be an O(m) operation, where m is the length of L2. It is possible that L2 is not a list but some type where the in operation does not require a linear scan. For example, it might be a B-tree, or a dict, or a set, or who knows what? So you could assume that the best-case scenario is that elt in L2 is O(1) and the answer is no, while the worst-case is that elt in L2 is O(m) and the answer is yes.
Best case: 2 + n * (1 {assign to elt} + 1 {search L2})
Best case if L2 is a list: 2 + n * (1 {assign to elt} + m {search L2})
Worst case: 2 + n * (1 {assign to elt} + m {search L2} + 1 {append})
Which gives you 2 + 2n best case, 2 + n + nm best case if L2 is a list, and 2 + 2n + nm worst case.
You may be inclined to treat m as equal to n. That's your call, but if you're counting assignment statements, I'd argue against it.
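As a quick illustration of the container point above, membership testing in a list is a linear scan, while in a set it is a hash lookup. A small sketch (my own, with sizes chosen arbitrarily):

import timeit

m = 100_000
as_list = list(range(m))
as_set = set(as_list)
probe = m - 1  # worst case for the list: the element sits at the end

# The list scan is O(m); the set lookup is O(1) on average.
print(timeit.timeit(lambda: probe in as_list, number=100))
print(timeit.timeit(lambda: probe in as_set, number=100))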


Execution time difference between x += y and x = x + y [duplicate]

I was trying to submit my solution to a LeetCode problem in which x and y are lists. Using
x = x + y
gave me Time Limit Exceeded, whereas using
x += y
passed the test cases and got Accepted.
What is the difference in execution time between the two, and in the way they are executed?
For list objects,
temp = temp + []
creates a new list, taking time linear in the size of the resulting list. Importantly, it re-creates the entire list. If done in a loop, e.g.
x = []
for i in range(N):
    x = x + [i]
the entire algorithm takes quadratic time, O(N^2).
On the other hand, temp += [] works in place: it does not create a new list. It is O(K), where K is the size of the list on the right, i.e. the number of elements added. This works because Python list objects are implemented as array lists that over-allocate, so the list does not have to be reallocated every time it grows. Simply put, appending an item to the end of the list takes amortized constant time. Importantly, this makes:
x = []
for i in range(N):
    x += [i]
linear time, i.e. O(N).
To see this behavior empirically, you could use the following script:
import pandas as pd
import matplotlib.pyplot as plt
import time

def concatenate(N):
    result = []
    for i in range(N):
        result = result + [i]

def inplace(N):
    result = []
    for i in range(N):
        result += [i]

def time_func(N, f):
    start = time.perf_counter()
    f(N)
    stop = time.perf_counter()
    return stop - start

NS = range(0, 100_001, 10_000)
inplc = [time_func(n, inplace) for n in NS]
concat = [time_func(n, concatenate) for n in NS]
df = pd.DataFrame({"in-place": inplc, "concat": concat}, index=NS)
df.plot()
plt.savefig('in-place-vs-new-list-loop.png')
Notice that at N == 100_000 the concatenation version takes over 10 seconds, whereas the in-place extend version takes about 0.01 seconds. It is several orders of magnitude slower, and the gap keeps growing quadratically as N increases.
To understand this behavior, here is an informal treatment of the time complexity:
For concat, at each iteration, x = x + [i] does about i units of work, proportional to the length of the resulting array. So the whole loop does 0 + 1 + 2 + ... + N units of work. Using the handy formula for the Nth partial sum of this well-known series, the loop requires N*(N+1)/2 total work.
N*(N + 1) / 2 == N^2/2 + N/2 which is simply O(N^2)
On the other hand, with the in-place extend version, each iteration of
temp += [i]
requires only a constant (amortized) amount of work. So the whole loop costs
1 + 1 + ... + 1 (N times)
for N total work, which is O(N).
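You can also count the copies directly instead of timing them. A small sketch (my own, not from the answer above) that tallies how many elements x = x + [i] copies in total:

def copies_in_concat_loop(N):
    # Each iteration copies the whole current list plus the one new element.
    x, copied = [], 0
    for i in range(N):
        copied += len(x) + 1
        x = x + [i]
    return copied

# The total matches the closed form N*(N+1)//2, hence O(N^2).
for N in (10, 100, 1000):
    assert copies_in_concat_loop(N) == N * (N + 1) // 2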
The expression a = a + b does the following:
Allocate a new list big enough to hold both a and b.
Copy a to the new buffer.
Copy b to the new buffer.
Bind the name a to the new buffer (which is what list.__add__ returns).
The allocation and copies are inevitable in this case, regardless of the fact that b is empty.
The expression a += b is roughly equivalent to list.extend with a rebinding at the end:
Extend the buffer of a by enough elements to hold b. This does not always involve a reallocation, because lists over-allocate as they grow, which makes appends amortized constant time in the long run.
Copy b to the end of a.
Rebind the name a to the same object (which is what list.__iadd__ returns).
Notice that in this case the first step is not always a reallocation, so the elements of a are copied only when the memory actually moves. Since b is empty in your case, nothing is reallocated or copied at all.
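The rebinding difference is easy to observe with identity checks; a short sketch:

a = b = [1, 2, 3]
a = a + [4]       # builds a brand-new list and rebinds the name a
print(a is b)     # False: a now names a different object; b is unchanged

a = b = [1, 2, 3]
a += [4]          # extends the existing list in place
print(a is b)     # True: a still names the same object
print(b)          # [1, 2, 3, 4] -- b sees the change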

What is the Big-O and exact runtime

def alg3(n):
    x = 0
    i = 1
    while i < n:
        for j in range(0, n**3, n*3):
            x += 1
        i *= 3
    return x
I don't really get the Big-O and exact runtime of this code. I first thought the Big-O was O(n^3 * log n) because of the n**3, but the n*3 step confuses me. Could someone please explain this problem? Thanks.
In order to compute the complexity we have to decompose the problem into two sub-problems:
the inner for loop: range(0, n**3, n*3) runs n³/(3n) times, which is ~O(n²)
the outer while loop: how many steps does i need to reach n?
first step: i = 1; second step: i = 3; third step: i = 3*3 = 9; kth step: i = 3^(k-1)
hence i reaches n after about log3(n) steps, so the second sub-problem is ~O(log(n))
the final complexity is the first sub-problem's complexity multiplied by the second's: O(n²log(n))
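You could sanity-check this empirically by counting the inner-loop executions. A rough sketch (the constant factors only agree approximately, but the order of growth matches):

import math

def count_iterations(n):
    # Replicates alg3's loop structure, counting body executions.
    total = 0
    i = 1
    while i < n:                           # runs ~ log3(n) times
        total += len(range(0, n**3, n*3))  # ~ n**2 / 3 iterations each pass
        i *= 3
    return total

for n in (10, 100, 1000):
    estimate = (n**2 / 3) * math.log(n, 3)
    print(n, count_iterations(n), round(estimate))  # same order of growth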

Python - Complexity of O(n^2) number of jumps in list

If a list has 3 elements, how is it that O(n^2) means we have to do 9 jumps to navigate a collection of 3 elements? If a jump is just one element, how can it be 9 jumps?
list = [1, 2, 3]
I don't understand what the author means by 9 jumps. Can someone explain?
The code example provided below is O(n) because it uses a for loop to iterate over n elements.
n = len(list)  # n = the length of the list, here 3
for i in range(0, n):  # this runs n = 3 times, where n is the size of the list: O(n)
    print(i)
Now consider a nested for loop, meaning a second loop within your current for loop.
This is O(n^2), because the inner statement executes n*n times.
n = len(list)  # n = the length of the list, here 3
for i in range(0, n):  # this runs n = 3 times
    for j in range(0, n):  # this runs n = 3 times for each value of i
        print(j)
Think of it this way: the inner loop prints 0, 1, 2 once for each iteration of the outer loop, in this case 3 times. Therefore at the end 0, 1, 2, 0, 1, 2, 0, 1, 2 will have been printed, which is exactly n*n = 3*3 = 9 elements.
O(n^2) means that to reach the desired output you go through all n elements of the input once for each of the n elements, for example when your solution consists of a nested loop in which both the outer and the inner loop go from 0 to n (the size of the input data).
Big O notation gives us an estimate of the number of operations we have to perform to run an algorithm on N elements; if the Big O is n, then the function needs on the order of n operations.
Linear search in a list takes Big O of n.
Why?
def linearsearch(numbers, s):
    for i in range(len(numbers)):
        if numbers[i] == s:
            return i
    return -1
Consider the worst case, where s is not in the list. Then we perform all the constant-time operations n times, where n is the list size.
As n increases, the time increases linearly, since the time taken by linearsearch depends on the size of the list.
If all the constant operations together take time c, then searching 5 elements takes 5c and searching 10 elements takes 10c (again assuming the worst case, where s is not in the list).
Now let's see O(n²).
If we have n elements, the time taken will be the square of n.
Let's look at the following linear search in a 2D list:
numbers = [[1, 2, 3], [4, 5, 6]]
def linearsearch(numbers, s):
    for inner in range(len(numbers)):
        for i in range(len(numbers[inner])):
            if numbers[inner][i] == s:
                return i
    return -1
With the above code, make some simplifying assumptions to illustrate n²: each inner list has n elements, and the outer list also has n inner lists.
To search for an element s across the inner lists, we need to do those constant operations n² times.
If we have 2 elements per inner list and 2 inner lists in the outer list, then the worst-case time is 4c, where c is the time taken for the constant operations; with 3 and 3 it is 9c.
So if we have n elements in each inner list and n inner lists, the time complexity is O(n²).
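To make the n² count concrete, here is a tiny sketch (my own addition) that counts the comparisons the 2D search performs in its worst case, when s is absent:

def count_comparisons(numbers, s):
    # Same traversal as the 2D linearsearch above, but tallies == comparisons
    # in the worst case (s absent, so we never return early).
    comparisons = 0
    for inner in range(len(numbers)):
        for i in range(len(numbers[inner])):
            comparisons += 1        # one comparison: numbers[inner][i] == s
    return comparisons

grid = [[0] * 3 for _ in range(3)]  # n = 3 inner lists of n = 3 elements each
print(count_comparisons(grid, -1))  # 9, i.e. n*n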

How is the Big O of this function O(n^3)?

I am stuck on how the Big O of this function is supposed to be O(n^3). Where did my thought process go wrong?
I know that a nested for loop is O(n^2), and that the while loop is probably O(n log n), because the for loop inside it is O(n) and the while-loop counter is multiplied by two each pass, which makes it O(log n). That said, the answer is stated to be O(n^3), and I'm confused about how this came to be, unless the recursive part of the function has something to do with it.
def do_stuff2(n, x=1.23):
    if n <= 0:
        return 0
    val = 1
    for i in range(n//2):
        for j in range(n//4):
            x += 2*x + j/2 + i*1.2
    while val <= n:
        for i in range(n):
            x += val**2 + i//2
        val *= 2
    x += do_stuff2(n - 1, x/2)
    return x
I believe that x does not affect the Big O notation, because it is not used in deciding how many times any of the loops run.
So again, I expected the complexity of the function to be O(n^2), but the actual answer is O(n^3).
Your function has two nested for loops, that's O(n^2):
for i in range(n//2):
    for j in range(n//4):
        x += 2*x + j/2 + i*1.2
But on top of that, your do_stuff2() function takes an argument n and calls itself with n - 1 until n <= 0, so that's one more factor of O(n). Each of those O(n) recursive calls performs the O(n^2) nested-loop work, which gives O(n^3) overall.
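To check the O(n^3) claim without running the float arithmetic, here is a sketch (mine, not from the answer) that counts loop-body executions across all the recursive calls:

def count_ops(n):
    # The recursion visits n, n-1, ..., 1; a call with argument k does
    # (k//2)*(k//4) nested-loop iterations plus ~k*log2(k) while-loop work.
    ops = 0
    for k in range(1, n + 1):
        ops += (k // 2) * (k // 4)  # the two nested for loops: ~ k^2/8
        val = 1
        while val <= k:             # ~ log2(k) passes
            ops += k                # the inner for loop
            val *= 2
    return ops

# Summing ~k^2/8 over k = 1..n grows like n^3/24, so the count is cubic;
# the while loops only add a lower-order n^2*log(n) term.
for n in (10, 100, 1000):
    print(n, count_ops(n), n**3 // 24)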

Time Complexity of a recursive function

Need help proving the time complexity of a recursive function.
Supposedly it's 2^n. I need to prove that this is the case.
def F(n):
    if n == 0:
        return 0
    else:
        result = 0
        for i in range(n):
            result += F(i)
        return n*result+n
Here's another version that does the same thing. The assignment said to use an array to store values in an attempt to reduce the time complexity, so what I did was this:
def F2(n, array):
    if n < len(array):
        answer = array[n]
    elif n == 0:
        answer = 0
        array.append(answer)
    else:
        result = 0
        for i in range(n):
            result += F2(i, array)
        answer = n*result+n
        array.append(answer)
    return answer
Again, what I am looking for is an explanation of how to find the complexities of the two snippets of code; I'm not interested in just knowing the answer.
Any and all help greatly appreciated.
Okay, the first function you wrote is an example of Exhaustive Search, where you explore every possible branch that can be formed from the set of whole numbers up to n (which you pass as the argument and iterate over with the for loop). To explain the time complexity, I am going to treat the recursion call stack as a tree (to represent a recursive function call stack you can use either a stack or an n-ary tree).
Let's call your first function F1.
F1(3): three branches will be formed, one for each number in the set S (the whole numbers up to n). I have taken n = 3 because it is easy to draw the diagram for it. You can try other, larger numbers and observe the recursion call stack.
        3
      / | \
     0  1  2    ----> the leftmost node returns 0 because (n == 0) is the base case
        |  | \
        0  0  1
              |
              0 ----> returns 0
So here you have explored every possible branch. If you try to write the recursive equation for the above problem:
T(n) = 1;                                     n is 0
     = T(n-1) + T(n-2) + T(n-3) + ... + T(1); otherwise
Here,
T(n-1) = T(n-2) + T(n-3) + ... + T(1).
So, T(n-1) + T(n-2) + T(n-3) + ... + T(1) = T(n-1) + T(n-1).
So the recursive equation becomes:
T(n) = 1;        n is 0
     = 2*T(n-1); otherwise
Now you can easily solve this recurrence relation by unrolling it. You will get the time complexity as O(2^n).
Solving the recurrence relation:
T(n) = 2T(n-1)
     = 2(2T(n-2)) = 4T(n-2)
     = 4(2T(n-3)) = 8T(n-3)
     = 2^k T(n-k), for some integer k   ----> equation 1
Now we are given the base case where n is 0, so let
n-k = 0, i.e. k = n.
Put k = n in equation 1:
T(n) = 2^n * T(n-n)
     = 2^n * T(0)
     = 2^n * 1    // as T(0) is 1
     = 2^n
So, T.C = O(2^n)
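As a quick sanity check of the recurrence (my own addition), you can count the calls directly and watch them double:

def count_calls(n):
    # Number of times F is invoked when computing F(n).
    calls = 1                    # this invocation
    for i in range(n):
        calls += count_calls(i)  # F(n) calls F(0), ..., F(n-1)
    return calls

print([count_calls(n) for n in range(8)])
# [1, 2, 4, 8, 16, 32, 64, 128] -- doubling each time, i.e. 2^n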
So this is how you get the time complexity for your first function. Next, if you observe the recursion tree formed above (each node in the tree is a subproblem of the main problem), you will see that the nodes repeat (i.e. the subproblems repeat). So in your second function F2 you used memory to store the already computed values, and whenever a subproblem occurs again (i.e. a repeating subproblem) you use the pre-computed value, which saves the time of computing the subproblems over and over. This approach is also known as Dynamic Programming.
Now let's see the second function. Here you return answer, but if you look at your function, you are building a list named array, and the main time cost goes there. Calculating its time complexity is simple because there is only ever one extra level of recursion involved (casually, you could say no recursion is involved): every number i in range(n) is less than n, so by the time it is needed its value is already in array, the first if condition triggers, and control returns immediately from F2. So no call can go deeper than two levels in the call stack.
So,
time complexity of the second function = time taken to build the array
    = 1 comparison + 1 comparison + 2 comparisons + ... + (n-1) comparisons
    = 1 + 2 + 3 + ... + (n-1)
    = O(n^2)
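As a quick confirmation of the quadratic count (again my own sketch, with the memo simplified to just record which n values are done):

def count_calls_f2(n, array=None):
    # Counts invocations of the memoized F2; cached hits return immediately.
    if array is None:
        array = []
    calls = 1
    if n < len(array):
        return calls
    for i in range(n):
        calls += count_calls_f2(i, array)
    array.append(n)   # mark n as computed (the value itself doesn't matter here)
    return calls

print([count_calls_f2(n) for n in range(8)])
# [1, 2, 4, 7, 11, 16, 22, 29] -- grows like n^2/2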
Let me give you a simple way to observe such recursions more deeply. You can print the recursion stack on the console and observe how the function calls are being made. Below I have written your code where I am printing the function calls.
Code:
def indent(n):
    for i in range(n):
        print(' ' * i, end=' ')

# second argument rec_cnt is just taken to print the indented function calls properly
def F(n, rec_cnt):
    indent(rec_cnt)
    print('F(' + str(n) + ')')
    if n == 0:
        return 0
    else:
        result = 0
        for i in range(n):
            result += F(i, rec_cnt+1)
        return n*result+n

# third argument is just taken to print the indented function calls properly
def F2(n, array, rec_cnt):
    indent(rec_cnt)
    print('F2(' + str(n) + ')')
    if n < len(array):
        answer = array[n]
    elif n == 0:
        answer = 0
        array.append(answer)
    else:
        result = 0
        for i in range(n):
            result += F2(i, array, rec_cnt+1)
        answer = n*result+n
        array.append(answer)
    return answer

print(F(4, 1))
lis = []
print(F2(4, lis, 1))
Now observe the output:
F(4)
  F(0)
  F(1)
    F(0)
  F(2)
    F(0)
    F(1)
      F(0)
  F(3)
    F(0)
    F(1)
      F(0)
    F(2)
      F(0)
      F(1)
        F(0)
96
F2(4)
  F2(0)
  F2(1)
    F2(0)
  F2(2)
    F2(0)
    F2(1)
  F2(3)
    F2(0)
    F2(1)
    F2(2)
96
In the first function's call stack, i.e. F1, you can see that each call is explored down to 0, i.e. we explore every possible branch down to the base case; that is why we call it Exhaustive Search.
In the second function's call stack, you can see that the calls go at most two levels deep, i.e. they use the pre-computed values to solve the repeated subproblems. Thus its time complexity is lower than that of F1.
