def f2(L):
    sum = 0
    i = 1
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
    return sum
Let n be the size of the list L passed to this function. Which of the following most accurately describes how the runtime of this function grows as n grows?
(a) It grows linearly, like n does.
(b) It grows quadratically, like n^2 does.
(c) It grows less than linearly.
(d) It grows more than quadratically.
I don't understand how you figure out the relationship between the runtime of the function and the growth of n. Can someone please explain this to me?
ok, since this is homework:
this is the code:
def f2(L):
    sum = 0
    i = 1
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
    return sum
It is obviously dependent on len(L).
So let's see, for each line, what it costs:
sum = 0
i = 1
# [...]
return sum
Those are obviously constant time, independent of L.
In the loop we have:
sum = sum + L[i] # time to lookup L[i] (`timelookup(L)`) plus time to add to the sum (obviously constant time)
i = i * 2 # obviously constant time
and how many times is the loop executed?
It's obviously dependent on the size of L.
Let's call that loops(L).
so we got an overall complexity of
loops(L) * (timelookup(L) + const)
Being the nice guy I am, I'll tell you that list lookup is constant time in Python, so it boils down to
O(loops(L)) (constant factors ignored, as big-O convention implies)
And how often do you loop, based on the len() of L?
(a) as often as there are items in the list (b) quadratically as often as there are items in the list?
(c) less often as there are items in the list (d) more often than (b) ?
I am not a computer science major and I don't claim to have a strong grasp of this kind of theory, but I thought it might be relevant for someone from my perspective to try and contribute an answer.
Your function will always take time to execute, and if it is operating on a list argument of varying length, then the time it takes to run that function will be relative to how many elements are in that list.
Let's assume it takes 1 unit of time to process a list of length 1. What the question is asking is the relationship between the list getting bigger and the increase in time it takes this function to execute.
This link breaks down some basics of Big O notation: http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
If it were O(1) complexity (which is not actually one of your A-D options), then it would mean the runtime never grows, regardless of the size of L. Obviously, in your example there is a while loop whose counter i grows in relation to the length of L. I would focus on the fact that i is being multiplied, to see the relationship between how long it takes to get through that while loop and the length of L. Basically, try to compare how many iterations the while loop needs to perform at various values of len(L), and that will determine your complexity. 1 unit of time can be 1 iteration through the while loop.
Hopefully I have made some form of contribution here, with my own lack of expertise on the subject.
Update
To clarify based on the comment from ch3ka: if you were doing more than what you currently have inside your while loop, then you would also have to consider the added complexity of each loop iteration. But because your list lookup L[i] is constant complexity, as is the math that follows it, we can ignore those in terms of the complexity.
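If you want to convince yourself that list indexing really is constant time regardless of the list size, here is a rough timeit sketch (numbers will vary by machine; this is just illustrative):

import timeit

small = [0] * 10
large = [0] * 10000000

print(timeit.timeit(lambda: small[5], number=1000000))
print(timeit.timeit(lambda: large[5000000], number=1000000))
# both take about the same time: indexing a list does not depend on len(L)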
Here's a quick-and-dirty way to find out:
import matplotlib.pyplot as plt

def f2(L):
    sum = 0
    i = 1
    times = 0
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
        times += 1  # track how many times the loop gets called
    return times

def main():
    i = range(1200)
    f_i = [f2([1]*n) for n in i]
    plt.plot(i, f_i)
    plt.show()  # display the plot

if __name__ == "__main__":
    main()
... which results in a plot where the horizontal axis is the size of L and the vertical axis is how many times the function loops; the big-O should be pretty obvious from it.
Consider what happens with an input of length n=10. Now consider what happens if the input size is doubled to 20. Will the runtime double as well? Then it's linear. If the runtime grows by factor 4, then it's quadratic. Etc.
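If you want to check that heuristic empirically, here is a rough sketch (assuming the f2 from the question and Python's time.perf_counter; timings for such a cheap function are noisy, so treat the ratio as indicative only):

import time

def f2(L):
    sum = 0
    i = 1
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
    return sum

def measure(n, repeats=100000):
    # total time for `repeats` calls on a list of length n
    L = [1] * n
    start = time.perf_counter()
    for _ in range(repeats):
        f2(L)
    return time.perf_counter() - start

t10, t20 = measure(10), measure(20)
print(t20 / t10)  # a ratio near 1 suggests sub-linear growth; near 2 would suggest linear; near 4 quadratic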
When you look at the function, you have to determine how the size of the list will affect the number of loops that will occur.
In your specific situation, lets increment n and see how many times the while loop will run.
n = 0, loop = 0 times
n = 1, loop = 0 times
n = 2, loop = 1 time
n = 3, loop = 2 times
n = 4, loop = 2 times
See the pattern? Now answer your question, does it:
(a) It grows linearly, like n does. (b) It grows quadratically, like n^2 does.
(c) It grows less than linearly. (d) It grows more than quadratically.
Checkout Hugh's answer for an empirical result :)
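If you prefer to generate that table programmatically, here is a minimal sketch that just counts how many times the while loop body in f2 would run for each list length (the loop structure is copied from the question):

def loop_count(n):
    # mirrors the while loop in f2, counting how many times the body runs
    count = 0
    i = 1
    while i < n:
        i = i * 2
        count += 1
    return count

for n in range(9):
    print(n, loop_count(n))
# note how slowly the count grows as n grows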
It's O(log(len(L))), as list lookup is a constant time operation, independent of the size of the list.
Related
I just got this as an interview question but got stumped on the mod. My answer was O(log10) because x is divided by 10 every time, but I couldn't explain it. So I switched my answer to O(n), but the interviewer asked what n represents, and I was going to say x, but that didn't make sense: if x is 1000, the code doesn't run 1000 times.
x is any number 1-10000000
def code(x):
    count = 0
    while x > 0:
        x = x // 10
        result = x % 7
        if (result % 7) == 0:
            count += 1
    return count
The while loop executes about log10(x) times. The rest is constant-time operations. Therefore, your time complexity is O(log x). Notice that in big-O notation, the base of the logarithm does not matter.
A more complicated question is what value count will have with respect to x. But that is not what you asked. Notice that these calculations (assuming constant-time arithmetic operations) take constant time, just like the line x = x // 10.
Edit: If x is bounded above, it turns into a philosophical question whether the time complexity is O(log n) or O(1). It depends on how you view it (is the bound a part of the algorithm, or do you view the algorithm as general and the bound just something you know about the input you will get to it).
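A quick, hedged way to check the log10 claim empirically is to instrument the loop from the question and compare the iteration count with the number of decimal digits of x:

import math

def loop_count(x):
    # same loop structure as code(x), but counting iterations instead
    iterations = 0
    while x > 0:
        x = x // 10
        iterations += 1
    return iterations

for x in (9, 99, 1000, 123456, 10**7):
    print(x, loop_count(x), math.floor(math.log10(x)) + 1)
# the iteration count equals the number of decimal digits of x, i.e. about log10(x)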
I know there are many other questions out there asking for the general guide of how to calculate the time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the time complexity is N * N, where N is the size of the input. (Please correct me if this is also wrong.) (Edited once after being corrected by an answer.)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
    if a % i == 0 and b % i == 0:
        list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, making it more than a simple O(n) = N^2? For example, in the second loop, where I was finding the common divisors of both a and b (a % i == 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity, of this specific loop?
I hope the question makes sense; apologies if it is not clear enough.
Thanks for answering
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely those asymptotically bounded from above by the function f(n) = n), whereas f(N) = N*N is one function. By abuse of notation we conventionally write n*n = O(n) to mean n*n ∈ O(n) (which is a mathematically false statement), but switching the sides (O(n) = n*n) is undefined. A mathematically correct statement would be n = O(n*n).
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split())   # O(1)
list = []                          # O(1)
for i in range(1, a+b+1):          # O(a+b) iterations, multiplied by what's inside the loop
    if a % i == 0 and b % i == 0:  # O(1)
        list.append(i)             # O(1) (amortized)
n = len(list)                      # O(1)
print(list[n-1])                   # O(log(a+b))
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of your input input() as the input parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
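If you want to see the O(a+b) behaviour concretely, here is a small sketch (with made-up inputs) that counts how many times the loop body in your program runs:

def count_iterations(a, b):
    # mirrors the loop in the question: range(1, a+b+1) always runs a+b times,
    # no matter how many common divisors end up in the list
    iterations = 0
    divisors = []
    for i in range(1, a + b + 1):
        iterations += 1
        if a % i == 0 and b % i == 0:
            divisors.append(i)
    return iterations, divisors[-1]

print(count_iterations(12, 18))   # (30, 6): 30 iterations, greatest common divisor 6
print(count_iterations(100, 75))  # (175, 25)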
One thing to keep in mind, and you have the right idea, is that higher-degree terms take precedence. So you can have a step that is constant, O(1), but if it happens n times, O(N), then the total is O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know a simple loop like that is O(N) because it increases linearly as n increases.
Now if you had a nested loop that had both loops increasing as n increased, then it would be O(n^2).
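For contrast, here is a minimal sketch that counts iterations of a single loop versus a nested loop over the same n:

def single_loop(n):
    count = 0
    for i in range(n):
        count += 1
    return count  # n iterations: O(N)

def nested_loop(n):
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count  # n * n iterations: O(n^2)

print(single_loop(100), nested_loop(100))  # 100 10000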
I'm trying to find out the time complexity (Big-O) of functions and trying to provide appropriate reason.
First function goes:
r = 0
# Assignment is constant time. Executed once. O(1)
for i in range(n):
    for j in range(i+1, n):
        for k in range(i, j):
            r += 1
            # Assignment and access are O(1). Executed n^3
like this.
I see that this is a triple nested loop, so it must be O(n^3), but I think my reasoning here is very weak. I don't really get what is going on inside the triple nested loop.
Second function is:
i = n
# Assignment is constant time. Executed once. O(1)
while i > 0:
    k = 2 + 2
    i = i // 2
    # i is reduced by the line above per iteration,
    # so the assignment and access, which are O(1),
    # are executed log n times ??
I found this algorithm to be O(1), but like with the first function, I don't see what is going on in the while loop. Can someone explain the time complexity of the two functions thoroughly? Thanks!
For such a simple case, you could find the number of iterations of the innermost loop as a function of n exactly:
sum_{i=0}^{n-1} sum_{j=i+1}^{n-1} sum_{k=i}^{j-1} 1 = n(n^2 - 1)/6
i.e., Θ(n**3) time complexity (see Big Theta). This assumes that r += 1 is O(1) even though r may have O(log n) digits (the model has words with log n bits).
The second loop is even simpler: i //= 2 is i >>= 1. n has Θ(log n) digits, and each iteration drops one binary digit (shift right), therefore the whole loop is Θ(log n) time complexity, if we assume that the i >> 1 shift of log(n) digits is an O(1) operation (same model as in the first example).
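If you want an empirical sanity check of both claims, here is a short sketch that counts the iterations of each loop and compares them with the closed form n(n^2 - 1)/6 and with the number of binary digits of n, respectively:

def triple_loop_count(n):
    # the first function, instrumented: r ends up equal to the iteration count
    r = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(i, j):
                r += 1
    return r

def halving_loop_count(n):
    # the second function, instrumented
    i, steps = n, 0
    while i > 0:
        i = i // 2
        steps += 1
    return steps

for n in (4, 10, 20):
    print(n, triple_loop_count(n), n * (n * n - 1) // 6)  # matches the closed form
for n in (1, 8, 1000, 10**6):
    print(n, halving_loop_count(n), n.bit_length())       # matches the number of binary digits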
Well, first of all, for the first function, the time complexity seems to be closer to O(N log N) because the parameters of each loop decrease each time.
Also, for the second function, the runtime is O(log2 N). Except, say i == n == 2. After one run i is 1. After another i is 0.5. After another i is 0.25. And so on... I assume you would want int(i).
For a rigorous mathematical approach to each function, you can go to https://www.coursera.org/course/algo. It's a great course for this sort of thing. I was sort of sloppy in my calculations.
I am working on Project Euler Problem 50, which states:
The prime 41, can be written as the sum of six consecutive primes:
41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
For determining the terms of prime P (if it can at all be written as a sum of primes), I use a sliding window over all the primes (in increasing order) up to (but not including) P, and calculate the sum of each of these windows; if the sum is equal to the prime considered, I record the length of the window...
This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memoization would help; when calculating the sums of the sliding windows, a lot of double work is done... (right?)
So I found the standard memoization implementation on the net and just pasted it into my code. Is this correct? (I have no idea how it is supposed to work here...)
primes = tuple(n for n in range(1, 10**6) if is_prime(n) == True)
count_best = 0

## http://docs.python.org/release/2.3.5/lib/itertools-example.html:
## Slightly modified (first for loop)
from itertools import islice

def window(seq):
    for n in range(2, len(seq) + 1):
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result
        for elem in it:
            result = result[1:] + (elem,)
            yield result

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

@memoize
def find_lin_comb(prime):
    global count_best
    for windows in window(primes[0:primes.index(prime)]):
        if sum(windows) == prime and len(windows) > count_best:
            count_best = len(windows)
            print('Prime: ', prime, 'Terms: ', count_best)

## Find them:
for x in primes[::-1]: find_lin_comb(x)
(btw, the tuple of prime numbers is generated "decently" fast)
All input is appreciated. I am just a hobby programmer, so please don't get too advanced on me.
Thank you!
Edit: here is a working code paste that doesn't have ruined indentation:
http://pastebin.com/R1NpMqgb
This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memoization would help; when calculating the sums of the sliding windows, a lot of double work is done... (right?)
Yes, right. And of course it's slow for the primes up to 10**6.
Say you have n primes up to N, numbered in increasing order, p_1 = 2, p_2 = 3, .... When considering whether prime no. k is the sum of consecutive primes, you consider all windows [p_i, ..., p_j], for pairs (i,j) with i < j < k. There are (k-1)*(k-2)/2 of them. Going through all k up to n, you examine about n³/6 windows in total (counting multiplicity; you're examining w(i,j) in total n-j times). Even ignoring the cost of creating the window and summing it, you can see how it scales badly:
For N = 1000, there are n = 168 primes and about 790000 windows to examine (counting multiplicity).
For N = 10**6, there are n = 78498 primes and about 8.3*10**13 windows to examine.
Now factor in the work for creating and summing the windows; estimating it low at j-i+1 for summing the j-i+1 primes in w(i,j), the work for p_k is about k³/6, and the total work becomes roughly n**4/24. Something like 33 million steps for N = 1000, peanuts, but nearly 1.6*10**18 for N = 1000000.
A year contains about 3.1*10**7 seconds; with a ~3GHz CPU, that's roughly 10**17 clock cycles. So we're talking about an operation needing something like 100 CPU-years (may be a factor of 10 off or so).
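To make that arithmetic concrete, here is a tiny back-of-the-envelope sketch using the prime counts quoted above (168 and 78498) and the same rough cost model; the factor-of-10 caveat applies:

# rough cost model from above: about n**4 / 24 elementary steps in total
for N, n in ((1000, 168), (10**6, 78498)):
    steps = n ** 4 / 24
    seconds = steps / 3e9  # assuming ~3 GHz and one step per clock cycle
    print(f"N={N}: ~{steps:.1e} steps, ~{seconds:.1e} s, ~{seconds / 3.1e7:.1e} years")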
You aren't willing to wait that long, I suppose;)
Now, with memoisation, you still look at each window multiple times, but you do the computation of each window only once. That means you need about n³/6 work for the computation of the windows, and look about n³/6 times at any window.
Problem 1: You still need to look at windows about 8.3*10**13 times; that's several hours even if each look costs only one cycle.
Problem 2: There are about 8.3*10**13 windows to memoise. You don't have that much memory, unless you can use a really large HD.
You can circumvent the memory problem by throwing away data you don't need anymore and only calculating data for the windows when it is needed, but once you know which data you may throw away when, you should be able to see a much better approach.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
What does this tell you about the window generating that sum? Where can it start, where can it stop? How can you use that information to create an efficient algorithm to solve the problem?
The memoize decorator adds a wrapper to a function to cache the return value for each value of the argument (each combination of values in case of multiple arguments). It is useful when the function is called multiple times with the same arguments. You can only use it with a pure function, i.e.
The function has no side effects. Changing a global variable and doing output are examples of side effects.
The return value depends only on the values of the arguments, not on some global variables that may change values between calls.
Your find_lin_comb function does not satisfy the above criteria. For one thing, it is called with a different argument every time, for another, the function does not return a value.
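For contrast, here is a hedged example of a function where this kind of decorator does pay off: a pure function that gets called many times with the same arguments (a classic memoized Fibonacci):

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args not in cache:
            cache[args] = function(*args)
        return cache[args]
    return decorated_function

@memoize
def fib(n):
    # pure: the result depends only on n, and there are no side effects
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # fast, because each fib(k) is computed only once and then cached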
I have a very long loop, and I would like to check the status every N iterations; in my specific case, I have a loop of 10 million elements and I want to print a short report every millionth iteration.
So, currently I am doing just (n is the iteration counter):
if n % 1000000 == 0:
    print('Progress report...')
but I am worried I am slowing down the process by computing the modulus at each iteration, as one iteration lasts just few milliseconds.
Is there a better way to do this? Or shouldn't I worry at all about the modulus operation?
How about keeping a counter and resetting it to zero when you reach the wanted number? Adding and checking equality is faster than modulo.
printcounter = 0
# Whatever a while loop is in Python
while (...):
    ...
    if printcounter == 1000000:
        print('Progress report...')
        printcounter = 0
    ...
    printcounter += 1
It's quite possible that the compiler is doing some sort of optimization like this for you already... but this may give you some peace of mind.
1. Human-language declarations for x and n:
let x be the number of iterations that have been examined at any given time.
let n be the multiple of iterations upon which your code will be executed.
Example 1: "After x iterations, how many times was n done?"
Example 2: "It is the xth iteration and the action has occurred every nth time, so far."
2. What we're doing:
The first code block (Block A) uses only one variable, x (defined above), and uses 5 (an integer) rather than the variable n (defined above).
The second code block (Block B) uses both of the variables (x and n) that are defined above. The integer, 5, will be replaced by the variable, n. So, Block B literally performs an action at each nth iteration.
Our goal is to do one thing on every iteration and two things on every nth iteration.
We are going through 100 iterations.
m. Easy-to-understand attempt:
Block A, minimal variables:
for x in range(100):
    pass  # what to do every time (100 times in total): replace this line with your every-iteration functions
    if x % 5 == 0:
        pass  # what to do every 5th time: replace this line with your nth-iteration functions
Block B, generalization.
n = 5
for x in range(100):
    pass  # what to do every time (100 times in total): replace this line with your every-iteration functions
    if x % n == 0:
        pass  # what to do every nth time: replace this line with your nth-iteration functions
Please, let me know if you have any issues because I haven't had time to test it after writing it here.
3. Exercises
If you've done this properly, see if you can use it with the turtle.Pen() and turtle.forward() function. For example, move the turtle forward 4 times and then right and forward once?
See if you can use this program with the turtle.circle() function. For example, draw a circle with radius+1 four times and a circle of a new color with radius+1 once?
Check out the reading (seen below) to attempt to improve the programs from exercises 1 and 2. I can't think of a good reason to be doing this: I just feel like it might be useful!
About modulo and other basic operators:
https://docs.python.org/2/library/stdtypes.html
http://www.tutorialspoint.com/python/python_basic_operators.htm
About turtle:
https://docs.python.org/2/library/turtle.html
https://michael0x2a.com/blog/turtle-examples
Is it really slowing down? You have to try and see for yourself. It won't be much of a slowdown, but if we're talking about nanoseconds it may be considerable. Alternatively, you can convert one 10-million-iteration loop into two nested smaller loops:
m = 1000000
for i in range(10):
    for j in range(m):
        pass  # do something
    print("Progress report")
It's difficult to know how your system will optimize your code without testing.
You could simplify the relational part by realizing that zero is evaluated as false.
if not n % 1000000:
    pass  # do stuff
Something like this, perhaps:
for n in xrange(1000000, 11000000, 1000000):
    for i in xrange(n-1000000, n):
        x = 10/2
    print 'Progress at ' + str(i)
result
Progress at 999999
Progress at 1999999
Progress at 2999999
Progress at 3999999
Progress at 4999999
Progress at 5999999
Progress at 6999999
Progress at 7999999
Progress at 8999999
Progress at 9999999
EDIT
Better:
for n in xrange(0, 10000000, 1000000):
    for i in xrange(n, n+1000000):
        x = 10/2
    print 'Progress at ' + str(i)
And inspired by pajton:
m = 1000000
for n in xrange(0, 10*m, m):
    for i in xrange(n, n+m):
        x = 10/2
    print 'Progress at ' + str(i+1)
I prefer this one, which I find more immediately readable than pajton's solution. It keeps the displayed value dependent on i.
I'd do some testing to see how much time your modulus calls are consuming. You can use timeit for that. If your results indicate a need for time reduction, another approach which eliminates your modulus calculation:
for m in xrange(m_min, m_max):
    for n in xrange(n_min, n_max):
        pass  # do_n_stuff
    print('Progress report...')
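For example, a rough timeit sketch (Python 3 here) comparing a bare loop against one with the modulo check; the numbers will vary by machine, and this is just to get a feel for the overhead:

import timeit

def plain_loop():
    for n in range(10000000):
        pass

def loop_with_modulo():
    for n in range(10000000):
        if n % 1000000 == 0:
            pass

print(timeit.timeit(plain_loop, number=1))
print(timeit.timeit(loop_with_modulo, number=1))
# on a typical machine the difference is small compared to any real per-iteration work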
It's fast enough that I wouldn't worry about it.
If you really wanted to speed it up, you could do this to avoid the modulus
if n == 1000000:
    n = 0
    print('Progress report...')
This makes the inner loop lean, and m does not have to be divisible by interval.
m = 10000000
interval = 1000000
i = 0
while i < m:
    checkpoint = min(m, i+interval)
    for j in xrange(i, checkpoint):
        pass  # do something
    i = checkpoint
    print "progress"
When I'm doing timing or reports based on counted iterations, I just divide my counter by the desired interval and check whether the result is an integer. So:
if n/1000000 == int(n/1000000):
    print(report)