I have this Python code for finding the longest common substring of two strings. I'm trying to figure out its asymptotic run time, and I've arrived at an answer, but I'm not sure it's correct. Here is the code:
def longest_substring(s, t):
    best = ' '
    for s_start in range(0, len(s)):
        for s_end in range(s_start, len(s)+1):
            for t_start in range(0, len(t)):
                for t_end in range(t_start, len(t)+1):
                    if s[s_start:s_end] == t[t_start:t_end]:
                        current = s[s_start:s_end]
                        if len(current) > len(best):
                            best = current
    return best
Obviously this function has a very slow run time; it was designed that way. My reasoning was that because there is a for loop with three more nested for loops, the run time is something like O(n^4). I'm not sure this is correct, since not every loop iterates over the whole input. Also, assume len(s) = len(t) = n (the input size). Any ideas?
The run time is actually O(n^5), not O(n^4). If you're not convinced, try calculating how many iterations you run through for string s alone (i.e. the outer two loops). When s_start == 0, the inner loop runs n + 1 times; when s_start == 1, it runs n times, and so on, until s_start == n - 1, for which it runs twice.
The sum
(n + 1) + (n) + (n - 1) + ... + 2
is an arithmetic series for which the formula is
((n + 1) + 2) * n / 2
which is O(n^2). The two inner loops over t contribute the same O(n^2) factor for every (s_start, s_end) pair, so the innermost body runs O(n^4) times.
An additional factor of n comes from the slice comparison s[s_start:s_end] == t[t_start:t_end], which is O(n), giving O(n^5) overall.
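As a quick sanity check of the counting (a sketch that only counts iterations of the four loops; the O(n) cost of each slice comparison multiplies on top of it):

def count_iterations(n):
    total = 0
    for s_start in range(0, n):
        for s_end in range(s_start, n + 1):
            for t_start in range(0, n):
                for t_end in range(t_start, n + 1):
                    total += 1
    return total

for n in (4, 8, 16):
    pairs = n * (n + 3) // 2                    # the arithmetic series (n + 1) + n + ... + 2
    print(n, count_iterations(n), pairs ** 2)   # both counts agree: Theta(n^4) iterations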
I am calculating the run-time complexity of the Python code below as n^2, but apparently that is incorrect. The correct answer given is n(n-1)/2. Can anyone help me understand why the inner loop body does not run n*n times but rather n(n-1)/2 times?
for i in range(n):
    for j in range(i):
        val += 1
On the first run of the inner loop, when i = 0, the for loop does not execute at all. (There are zero elements to iterate over in range(0)).
On the second run of the inner loop, when i = 1, the for loop executes for one iteration. (There is one element to iterate over in range(1)).
Then, there are two elements in the third run, three in the fourth, following this pattern, up until i = n - 1.
The total number of iterations is 0 + 1 + ... + (n - 1). This summation has a well known form*: for any natural m, 0 + 1 + ... + m = m * (m + 1) / 2. In this case, we have m = n - 1, so there are n * (n - 1) / 2 iterations in total.
You're right that the asymptotic time complexity of this algorithm is O(n^2). However, given that you've said that the "correct answer" is n * (n - 1) / 2, the question most likely asked for the exact number of iterations (rather than an asymptotic bound).
* See here for a proof of this closed form.
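A quick empirical check of that count (a sketch that just uses val as an iteration counter):

def inner_iterations(n):
    val = 0
    for i in range(n):
        for j in range(i):
            val += 1
    return val

for n in (5, 10, 100):
    assert inner_iterations(n) == n * (n - 1) // 2   # exact match, not just asymptotic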
def s_r(n):
    if(n==1):
        return 1
    temp = 1
    for i in range(int(n)):
        temp += i
    return temp * s_r(n/2) * s_r(n/2) * s_r(n/2) * s_r(n/2)
Using a recursion tree, what is the Big O?
Also, how do we rewrite this function so that it makes only one recursive call?
I could only do the first part, for which I got O(n^2). This was one of my exam questions, and I'd like to know the answer and how to get there. Thank you.
First, note that the program is incorrect: n/2 is floating-point division in Python (since Python 3), so the recursion only reaches the base case n == 1 when n is a power of 2; for other n it skips past 1 and never reaches the base case. The corrected version of the program, which works for all integers n >= 1, is this:
def s_r(n):
    if n == 1:
        return 1
    temp = 1
    for i in range(n):
        temp += i
    return temp * s_r(n//2) * s_r(n//2) * s_r(n//2) * s_r(n//2)
If T(n) is the number of arithmetic operations performed by the function, then we have the recurrence relation:
T(1) = 0
T(n) = n + 4T(n//2)
T(n) is Theta(n^2) -- for n a power of 2, telescoping gives: n + 4(n/2) + 16(n/4) + ... = 1n + 2n + 4n + ... + (n/2)n = (n - 1)n
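A quick check of that closed form (a sketch, with T defined exactly by the recurrence above):

def T(n):
    return 0 if n == 1 else n + 4 * T(n // 2)

for k in range(1, 10):
    n = 2 ** k
    assert T(n) == (n - 1) * n   # the telescoped sum, Theta(n^2)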
We can rewrite the program to use Theta(log n) arithmetic operations. First, the temp variable is 1 + 0+1+...+(n-1) = n(n-1)/2 + 1. Second, we can avoid making the same recursive call 4 times.
def s_r(n):
    return (1 + n * (n-1) // 2) * s_r(n//2) ** 4 if n > 1 else 1
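As a quick sanity check that the rewrite computes the same value (a sketch; the two functions below are renamed copies of the corrected four-call version and of the one-call version above):

def s_r_four_calls(n):
    if n == 1:
        return 1
    temp = 1
    for i in range(n):
        temp += i
    return temp * s_r_four_calls(n // 2) * s_r_four_calls(n // 2) * s_r_four_calls(n // 2) * s_r_four_calls(n // 2)

def s_r_one_call(n):
    return (1 + n * (n - 1) // 2) * s_r_one_call(n // 2) ** 4 if n > 1 else 1

assert all(s_r_four_calls(n) == s_r_one_call(n) for n in range(1, 200))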
Back to complexity, I have been careful to say that the first function does Theta(n^2) arithmetic operations and the second Theta(log n) arithmetic operations, rather than use the expression "time complexity". The result of the function grows FAST, so it's not practical to assume arithmetic operations run in O(1) time. If we print the length of the result and the time taken to compute it (using the second, faster version of the code) for powers of 2 using this...
import math
import timeit

def s_r(n):
    return (1 + n * (n-1) // 2) * s_r(n//2) ** 4 if n > 1 else 1

for i in range(16):
    n = 2 ** i
    start = timeit.default_timer()
    r = s_r(n)
    end = timeit.default_timer()
    print('n=2**%d,' % i, 'digits=%d,' % (int(math.log(r))+1), 'time=%.3gsec' % (end - start))
... we get this table, in which you can see the number of digits in the result grows FAST (s_r(2**14) has 101 million digits for example), and the time measured does not grow with log(n), but when n is doubled the time increases by something like a factor of 10, so it grows something like n^3 or n^4.
n=2**0, digits=1, time=6e-07sec
n=2**1, digits=1, time=1.5e-06sec
n=2**2, digits=5, time=9e-07sec
n=2**3, digits=23, time=1.1e-06sec
n=2**4, digits=94, time=2.1e-06sec
n=2**5, digits=382, time=2.6e-06sec
n=2**6, digits=1533, time=3.8e-06sec
n=2**7, digits=6140, time=3.99e-05sec
n=2**8, digits=24569, time=0.000105sec
n=2**9, digits=98286, time=0.000835sec
n=2**10, digits=393154, time=0.00668sec
n=2**11, digits=1572628, time=0.0592sec
n=2**12, digits=6290527, time=0.516sec
n=2**13, digits=25162123, time=4.69sec
n=2**14, digits=100648510, time=42.1sec
n=2**15, digits=402594059, time=377sec
Note that it's not wrong to say the time complexity of the original function is O(n^2) and that of the improved version O(log n); it's just that these describe a particular measure of the program (arithmetic operations) and are not at all useful as an estimate of actual program running time.
As #kcsquared said in the comments, I also believe this function is O(n²). The code can be refactored to make a single recursive call by storing the result of the recursive call (or just by applying some math). You can also simplify the range sum by using the built-in sum:
def s_r(n):
    if n == 1:
        return 1
    return (sum(range(n)) + 1) * s_r(n // 2) ** 4
I have a little problem which looks like a “combinatorics” problem.
We know that 1+2+3+…+k = (k^2 + k)/2; so, let’s take the set of numbers S = {1,2,3,4,…,(k^2 + k)/2} and divide this collection into k parts:
The 1st part has 1 element: 1; the 2nd has 2 elements: 2, 3; the 3rd has 3 elements: 4, 5, 6; …and so on…; the kth has k elements: (k^2 - k + 2)/2, …, (k^2 + k)/2.
Then I have to draw a random integer in S, say i = random.randint(1, (k**2 + k)//2), and do some operations according to which element was drawn:
if i == 1:
    `something`
elif 2 <= i <= 3:
    `something else`
elif 4 <= i <= 6:
    `something else`
…
else:  # last branch, when `i` is in the last (kth) part
    `something else`
The number k I have to use is variable, so I can't actually write the above program, because I don't know a priori where it should stop...
It seems to me that the best would be to define a function:
def cases(k):
    i = random.randint(1, (k**2 + k)//2)
    if i == 1:
        `something`
    elif 2 <= i <= 3:
        `something else`
    … and so on…
But the problem remains: how could I write such a function without a specific k? There may be a trick in Python to do this, but I don't see how.
All ideas will be welcome.
Sorry for the indenting; I acknowledge it's wrong. Following Zach Munro's comment, I'm allowing myself an answer to better clarify the idea I had in mind, which was not very clear.
Below is the kind of program I was thinking of; the something and something else branches are actually similar, since they use the same function, just with a different domain each time:
def cases(start, end, k):
    delta = (end - start) / k
    i = random.randint(1, (k**2 + k)//2)
    if i == 1:
        # random in the 1st part
        x = random.uniform(start, start + 1*delta)
    elif 2 <= i <= 3:
        # random in the 2nd part
        x = random.uniform(start + 1*delta, start + 2*delta)
    elif 4 <= i <= 6:
        # random in the 3rd part
        x = random.uniform(start + 2*delta, start + 3*delta)
    ...
    elif (k**2 - k + 2)//2 <= i <= (k**2 + k)//2:
        # random in the kth part
        x = random.uniform(start + (k-1)*delta, start + k*delta)
The question remains how to get rid of the ... in the program and write something that actually runs once the parameters are provided (start is the beginning of an interval, end is the end of the interval, and k is the number of parts into which we split it).
Basically, we sample more and more often as we move away from the origin of the interval.
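One possible way to avoid the explicit chain of elif branches (a sketch, assuming the goal is exactly the scheme above: part j is chosen with probability proportional to j, then a point is drawn uniformly inside the j-th sub-interval) is to compute the part index directly from i, since part j covers the integers from (j^2 - j + 2)/2 up to (j^2 + j)/2:

import math
import random

def cases(start, end, k):
    delta = (end - start) / k
    # Draw i from 1 .. k(k+1)/2, exactly as in the question.
    i = random.randint(1, (k**2 + k) // 2)
    # Part j (1-based) contains the integers (j-1)*j/2 + 1 .. j*(j+1)/2,
    # so j can be recovered in closed form instead of with k elif branches.
    j = (math.isqrt(8 * i - 7) + 1) // 2
    # Uniform draw inside the j-th of the k equal sub-intervals.
    return random.uniform(start + (j - 1) * delta, start + j * delta)

Equivalently, one could skip i altogether and pick j directly with random.choices(range(1, k + 1), weights=range(1, k + 1))[0], which gives part j the same probability j / (k(k+1)/2).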
I'm studying recursive functions and I came across the question: "Print the sum of 1 to n with no 'for' or 'while'."
For example, n = 10 gives the answer 55, and n = 100 gives 5050.
So I coded:
import sys
sys.setrecursionlimit(1000000)

sum = 0
def count(n):
    global sum
    sum += n
    if n != 0:
        count(n-1)

count(n = int(input()))
print(sum)
I know it's not a good way to get the right answer, but there was also this solution:
n = int(input())

def f(x):
    if x == 1:
        return 1
    else:
        return ((x+1)//2)*((x+1)//2) + f(x//2)*2

print(f(n))
and it works super well, but I really don't know how a human can come up with that logic, and I have no idea how it works.
Can you guys explain how it works? Even when I stare at the formula, I don't see why it is written that way.
I also wonder whether there are other solutions (that's really important to me).
I'm really a noob at Python and coding, so I need your help. Thank you for reading this.
Here is a recursive solution.
def rsum(n):
    if n == 1:   # BASE CASE
        return 1
    else:        # RECURSIVE CASE
        return n + rsum(n-1)
You can also use range and sum to do so.
n = 100
sum_1_to_n = sum(range(n+1))
you can try this:
def f(n):
    if n == 1:
        return 1
    return n + f(n - 1)

print(f(10))
This function basically goes from n down to 1, adding the current value of n at each step; in the end it returns the sum n + (n - 1) + ... + 1.
In order to get at a recursive solution, you have to (re)define your problems in terms of finding the answer based on the result of a smaller version of the same problem.
In this case you can think of the result sumUpTo(n) as adding n to the result of sumUpTo(n-1). In other words: sumUpTo(n) = n + sumUpTo(n-1).
This only leaves the problem of finding a value of n for which you know the answer without relying on your sumUpTo function. For example sumUpTo(0) = 0. That is called your base condition.
Translating this to Python code, you get:
def sumUpTo(n): return 0 if n==0 else n + sumUpTo(n-1)
Recursive solutions are often very elegant but require a different way of approaching problems. All recursive solutions can be converted to non-recursive (a.k.a. iterative) ones, and they are generally slower than their iterative counterparts.
The second solution is based on the formula ∑1..n = n*(n+1)/2. To understand this formula, take a number (let's say 7) and pair up the sequence up to that number in increasing order with the same sequence in decreasing order, then add up each pair:
1 2 3 4 5 6 7 = 28
7 6 5 4 3 2 1 = 28
-- -- -- -- -- -- -- --
8 8 8 8 8 8 8 = 56
Every pair will add up to n+1 (8 in this case) and you have n (7) of those pairs. If you add them all up you get n*(n+1) = 56 which correspond to adding the sequence twice. So the sum of the sequence is half of that total n*(n+1)/2 = 28.
The recursion in the second solution reduces the number of iterations, but it is a bit artificial: it only serves to compensate for the error introduced by applying the integer division by 2 to each term instead of to the result of n*(n+1). Obviously (n//2) * ((n+1)//2) isn't the same as n*(n+1)//2, since one of the terms loses its remainder before the multiplication takes place. But given that the mathematical formula for the result is already part of the solution, doing more than one iteration is pointless.
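One way to read the recursive formula: ((x+1)//2)**2 is the sum of the odd numbers up to x (there are (x+1)//2 of them, and the sum of the first m odd numbers is m**2), while f(x//2)*2 is twice the sum 1 + 2 + ... + x//2, i.e. the sum of the even numbers up to x. A quick check (a sketch) that the function matches the closed formula:

def f(x):
    # The second solution from the question.
    if x == 1:
        return 1
    return ((x + 1) // 2) ** 2 + 2 * f(x // 2)

assert all(f(x) == x * (x + 1) // 2 for x in range(1, 1000))
print(f(10), f(100))   # 55 5050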
There are 2 ways to find the answer
1. Recursion
def sum(n):
    if n == 1:
        return 1
    if n <= 0:
        return 0
    else:
        return n + sum(n-1)

print(sum(100))
This is a simple recursive snippet that applies the recurrence F_n = n + F_(n-1) to find the answer.
2. Formula
Let S = 1 + 2 + 3 + ... + n
Then let's do something like this
S = 1 + 2 + 3 + ... + n
S = n + (n - 1) + (n - 2) + ... + 1
Let's combine them and we get
2S = (n + 1) + (n + 1) + ... + (n + 1), i.e. n copies of (n + 1)
From that you get
S = ((n + 1) * n) / 2
So for n = 100, you get
S = 101 * 100 / 2 = 5050
So in Python, you will get something like
sum = lambda n: (n + 1) * n // 2
print(sum(100))
This is one of the algorithms for generating prime numbers given on Wikipedia:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)

    # Loop over the candidates, marking out each multiple.
    for i in range(2, fin + 1):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)

    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I changed the algorithm slightly.
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)

    # Loop over the candidates, marking out each multiple.
    candidates[4::2] = [None] * (n // 2 - 1)
    for i in range(3, fin + 1, 2):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)

    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I first marked off all the multiples of 2, and then considered odd numbers only. When I timed both algorithms (I tried 40,000,000), the first one was always faster (albeit very slightly). I don't understand why. Can somebody please explain?
P.S.: When I try 100,000,000, my computer freezes. Why is that? I have a Core Duo E8500, 4 GB RAM, Windows 7 Pro 64-bit.
Update 1: This is Python 3.
Update 2: This is how I timed:
import time

start = time.time()
a = eratosthenes_sieve(40000000)
end = time.time()
print(end - start)
UPDATE: Following valuable comments (especially by nightcracker and Winston Ewert), I managed to code what I intended in the first place:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only c below sqrt(n) need be checked.
    c = [i for i in range(3, n + 1, 2)]
    fin = int(n ** 0.5) // 2

    # Loop over the c, marking out each multiple.
    for i in range(fin):
        if not c[i]:
            continue
        c[c[i] + i::c[i]] = [None] * ((n // c[i]) - (n // (2 * c[i])) - 1)

    # Filter out non-primes and return the list.
    return [2] + [i for i in c if i]
This algorithm improves the original algorithm (mentioned at the top) by (usually) 50%. (Still, worse than the algorithm mentioned by nightcracker, naturally).
A question to Python Masters: Is there a more Pythonic way to express this last code, in a more "functional" way?
UPDATE 2: I still couldn't decode the algorithm mentioned by nightcracker. I guess I'm too stupid.
The question is, why would it even be faster? In both examples you are filtering multiples of two the hard way. It doesn't matter whether you hardcode candidates[4::2] = [None] * (n // 2 - 1) or whether it gets executed in the first iteration of for i in range(2, fin + 1):.
If you are interested in an optimized sieve of Eratosthenes, here you go:
def primesbelow(N):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    # """ Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = (N, N-1, N+4, N+3, N+2, N+1)[N % 6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N//6 - (k*k + 4*k - 2*k*(i%2))//6 - 1)//k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
Explanation here: Porting optimized Sieve of Eratosthenes from Python to C++
The original source is here, but there was no explanation. In short this primesieve skips multiples of 2 and 3 and uses a few hacks to make use of fast Python assignment.
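For example, a quick check (using the primesbelow definition above):

print(primesbelow(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]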
You do not save a lot of time avoiding the evens. Most of the computation time within the algorithm is spent doing this:
candidates[i + i::i] = [None] * (n // i - 1)
That line causes a lot of work on the part of the computer. Whenever the number in question is even, that line is not run, because the loop bails out at the if statement. The time spent running the loop for even numbers is therefore really, really small, so eliminating those even rounds does not produce a significant change in the timing of the loop. That's why your method isn't considerably faster.
When Python produces numbers for range it uses the formula start + index * step. Multiplying by a step of two, as in your case, is slightly more expensive than multiplying by one, as in the original.
There is also quite possibly a small overhead to having a longer function.
Neither of those is a really significant speed issue, but together they outweigh the very small benefit your version brings.
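A rough way to see the first point (a hypothetical instrumentation sketch, not the benchmark from the question): accumulate the time spent on iterations that merely bail out at the if statement versus the time spent in the slice assignments.

import time

def instrumented_sieve(n):
    candidates = list(range(n + 1))
    fin = int(n ** 0.5)
    bail_time = mark_time = 0.0
    for i in range(2, fin + 1):
        t0 = time.perf_counter()
        if not candidates[i]:
            # Already marked (e.g. an even number): just a lookup and a skip.
            bail_time += time.perf_counter() - t0
            continue
        candidates[i + i::i] = [None] * (n // i - 1)
        mark_time += time.perf_counter() - t0
    print('bail-outs: %.4fs, marking passes: %.4fs' % (bail_time, mark_time))
    return [i for i in candidates[2:] if i]

primes = instrumented_sieve(10_000_000)   # the marking passes dominate; the bail-outs are noise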
It's probably slightly slower because you are performing extra setup to do something that was done in the first case anyway (marking off the multiples of two). That setup time might be what you are seeing, if the difference is as slight as you say.
Your extra step is unnecessary and actually traverses the whole collection (of size n) once to do that 'get rid of evens' operation, rather than just operating up to n^(1/2).