What is the time complexity of this short program? I'm thinking it's linear, but then again, in the if statement there's a Python slice. I'm wondering how it would contribute to the complexity:
def count_s(string, sub):
    i = 0
    j = len(sub) - 1
    count = 0
    while j < len(string):
        if string[i:j + 1] == sub:
            count = count + 1
            j = j + 1
            i = i + 1
        else:
            i = i + 1
            j = j + 1
    return count
Slicing a string in Python is O(k), where k is the length of the slice, so your function is O(n * k), with n the length of string: not linear, but still polynomial. Alternatively, you could use Python's built-in count method: string.count(sub). Note, though, that str.count counts non-overlapping occurrences, while your function counts overlapping ones, so the results can differ.
Or, if you're set on using your own code, it could be slimmed down to this for readability's sake:
def count_s(string, sub):
    i = count = 0
    j = len(sub) - 1
    while j < len(string):
        count += string[i:j + 1] == sub
        j += 1
        i += 1
    return count
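For what it's worth, here is a quick check of the overlap difference mentioned above: the sliding window advances one character at a time, so it counts overlapping matches, whereas str.count skips past each match it finds.

```python
def count_s(string, sub):
    # sliding window: advances one character per step, so overlaps count
    i = count = 0
    j = len(sub) - 1
    while j < len(string):
        count += string[i:j + 1] == sub
        j += 1
        i += 1
    return count

print(count_s("aaa", "aa"))  # 2: overlapping matches at indices 0 and 1
print("aaa".count("aa"))     # 1: str.count counts non-overlapping matches
```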
Could someone please help with finding the complexity of the following code?
def mystery(n):
    sum = 0
    if n % 2 == 0:
        for i in range(n + 10000):
            sum += 1
    elif n % 3 == 0:
        i, j = 0, 0
        while i <= n:
            while j <= n:
                sum += j - 1
                j += 1
            i += 1
    else:
        sum = n**3
Would the time complexity of this code be O(n^2)? In the worst case the elif branch executes, so the outer while loop runs n times, while the nested while loop executes its body n times only on the first pass, because we never reset j. Therefore we would have O(n^2 + n), and since the leading term is n^2, the complexity would be O(n^2)?
In the elif section:
when i = 0, the while j <= n: loop is O(n).
when i > 0, j is already n + 1, so the while j <= n: loop is O(1).
So the elif section is O(n) + O(n * 1) = O(n + n) = O(n).
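You can verify this by instrumenting the elif branch and counting how many times the inner loop body actually runs (a quick sketch, with the sum replaced by an operation counter):

```python
def elif_ops(n):
    # counts executions of the inner loop body of the elif branch
    ops = 0
    i, j = 0, 0
    while i <= n:
        while j <= n:
            ops += 1
            j += 1  # j is never reset, so the inner loop only runs fully once
        i += 1
    return ops

print(elif_ops(9))    # 10, i.e. n + 1: the section is O(n), not O(n^2)
print(elif_ops(999))  # 1000
```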
I'm trying to solve the Hackerrank Project Euler Problem #14 (Longest Collatz sequence) using Python 3. Following is my implementation.
cache_limit = 5000001
lookup = [0] * cache_limit
lookup[1] = 1

def collatz(num):
    if num == 1:
        return 1
    elif num % 2 == 0:
        return num >> 1
    else:
        return (3 * num) + 1

def compute(start):
    global cache_limit
    global lookup
    cur = start
    count = 1
    while cur > 1:
        count += 1
        if cur < cache_limit:
            retrieved_count = lookup[cur]
            if retrieved_count > 0:
                count = count + retrieved_count - 2
                break
            else:
                cur = collatz(cur)
        else:
            cur = collatz(cur)
    if start < cache_limit:
        lookup[start] = count
    return count

def main(tc):
    test_cases = [int(input()) for _ in range(tc)]
    bound = max(test_cases)
    results = [0] * (bound + 1)
    start = 1
    maxCount = 1
    for i in range(1, bound + 1):
        count = compute(i)
        if count >= maxCount:
            maxCount = count
            start = i
        results[i] = start
    for tc in test_cases:
        print(results[tc])

if __name__ == "__main__":
    tc = int(input())
    main(tc)
There are 12 test cases. The above implementation passes up to test case #8 but fails test cases #9 through #12 with the following reason:
Terminated due to timeout
I've been stuck on this for a while now and am not sure what else can be done here.
What else can be optimized so that I stop getting timed out?
Any help will be appreciated :)
Note: Using the above implementation, I'm able to solve the actual Project Euler Problem #14; it only times out on those 4 HackerRank test cases.
Yes, there are things you can do to your code to optimize it. But I think, more importantly, there is a mathematical observation you need to consider which is at the heart of the problem:
whenever n is odd, then 3 * n + 1 is always even.
Given this, one can always divide (3 * n + 1) by 2. And that saves one a fair bit of time...
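As an illustrative sketch (not the poster's code), folding that guaranteed even step into the odd case roughly halves the number of loop iterations while still counting both steps:

```python
def collatz_length(n):
    # number of terms in the Collatz sequence starting at n
    count = 1
    while n > 1:
        if n % 2 == 0:
            n //= 2
            count += 1
        else:
            n = (3 * n + 1) // 2  # 3n + 1 is always even: take both steps at once
            count += 2
    return count

print(collatz_length(27))  # 112, the well-known chain length for 27
```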
Here is an improvement (it takes 1.6 seconds): there is no need to compute the full sequence of every number. You can create a dictionary and store the sequence length for each starting number. As soon as the sequence drops below its starting number, the rest has already been computed, so the total is dic[original_number] = dic[n] + count - 1. This saves a lot of time.
import time

start = time.time()

def main(n, dic):
    '''Counts the elements of the sequence starting at n and finishing at 1'''
    count = 1
    original_number = n
    while True:
        if n < original_number:
            dic[original_number] = dic[n] + count - 1  # -1 because n is otherwise counted twice
            break
        if n == 1:
            dic[original_number] = count
            break
        if n % 2 == 0:
            n = n // 2  # integer division: n / 2 would turn n into a float
        else:
            n = 3 * n + 1
        count += 1
    return dic

limit = 10**6
dic = {n: 0 for n in range(1, limit + 1)}

if __name__ == '__main__':
    n = 1
    while n < limit:
        dic = main(n, dic)
        n += 1
    print('Longest chain: ', max(dic.values()))
    print('Number that gives the longest chain: ', max(dic, key=dic.get))
    end = time.time()
    print('Time taken:', end - start)
The trick to this question is to compute the answers only up to the largest input and save the results as a lookup table for all smaller inputs, rather than computing up to the extreme upper bound.
Here is my implementation, which passes all the test cases (Python 3):
MAX = int(5 * 1e6)
ans = [0]
steps = [0] * (MAX + 1)

def solve(N):
    if N < MAX + 1:
        if steps[N] != 0:
            return steps[N]
    if N == 1:
        return 0
    else:
        if N % 2 != 0:
            result = 1 + solve(3 * N + 1)  # recursion
        else:
            result = 1 + solve(N >> 1)  # recursion
        if N < MAX + 1:
            steps[N] = result  # memoization
        return result

inputs = [int(input()) for _ in range(int(input()))]
largest = max(inputs)
mx = 0
collatz = 1
for i in range(1, largest + 1):
    curr_count = solve(i)
    if curr_count >= mx:
        mx = curr_count
        collatz = i
    ans.append(collatz)
for _ in inputs:
    print(ans[_])
Here is my brute-force take:
# counters for the best chain found so far
C = 0
N = 0
for i in range(1, 1000001):
    n = i
    c = 0
    while n != 1:
        if n % 2 == 0:
            _next = n // 2
        else:
            _next = 3 * n + 1
        c = c + 1
        n = _next
    if c > C:
        C = c
        N = i
print(N, C)
Here's my implementation (for the question on the Project Euler website specifically):
num = 1
limit = int(input())
seq_list = []
while num < limit:
    sequence_num = 0
    n = num
    if n == 1:
        sequence_num = 1
    else:
        while n != 1:
            if n % 2 == 0:
                n = n // 2
                sequence_num += 1
            else:
                n = 3 * n + 1
                sequence_num += 1
        sequence_num += 1  # count the final 1
    seq_list.append(sequence_num)
    num += 1
k = seq_list.index(max(seq_list))
print(k + 1)
Will it be O(n) or greater? Here n is the length of the list, and a is a very long list of integers:
final_count = 0
while n > 1:
    i = 0
    j = 1
    while a[i + 1] == a[i]:
        i = i + 1
        j = j + 1
        if i == n - 1:
            break
    for k in range(j):
        a.pop(0)
    final_count = final_count + j * (n - j)
    n = n - j
The way I see it, your code would be O(n) if it weren’t for the a.pop(0) part. Since lists are implemented as arrays in memory, removing an element at the front means every remaining element has to be moved down. So removing from the front of a list is O(n). You do that in a loop over j, and as far as I can tell the sum of all the js equals n, so you end up removing an item from the list n times in total, making this part quadratic (O(n²)).
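As an aside, if removing from the front really were required, collections.deque offers an O(1) popleft (this is a general note, not a change to the code below):

```python
from collections import deque

a = deque([1, 1, 2, 2, 2, 3])
first = a.popleft()  # O(1); list.pop(0) would shift every remaining element
print(first, list(a))
```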
You can avoid this though by not modifying your list, and just keeping track of the initial index. This not only removes the need for the pop, but also the loop over j, making the complexity calculation a bit more straight-forward:
final_count = 0
offset = 0
while n > 1:
    i = 0
    j = 1
    while a[offset + i + 1] == a[offset + i]:
        i += 1
        j += 1
        if i == n - 1:
            break
    offset += j
    final_count += j * (n - j)
    n = n - j
By the way, it’s not exactly clear to me why you keep track of j, since j = i + 1 at all times, so you can get rid of it, making the code a bit simpler. And at the very end, j * (n - j) is exactly j * n if you adjust n first:
final_count = 0
offset = 0
while n > 1:
    i = 0
    while a[offset + i + 1] == a[offset + i]:
        i += 1
        if i == n - 1:
            break
    offset += i + 1
    n = n - (i + 1)
    final_count += (i + 1) * n
One final note: As you may notice, offset is now counting from zero to the length of your list, while n is doing the reverse, counting from the length of your list to zero. You could probably combine this, so you don’t need both.
I wrote some code for Project Euler Problem 35:
# Project Euler: Problem 35
import time
start = time.time()

def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2, n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed
I am wondering why this code (above) runs so much faster when I set b = set(primes) in the circularPrimes function. The running time for this code is about 8 seconds. Initially, I did not set b = set(primes) and my circularPrimes function was this:
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
My initial code (without b = set(primes)) ran so long that I didn't wait for it to finish. I am curious why there is such a large discrepancy in running time between the two pieces of code, as I don't believe primes has any duplicates that would make iterating through it take so much longer than iterating through set(primes). Maybe my idea of set() is wrong. Any help is welcome.
I believe the culprit here is if int(a) not in b:. Sets are implemented internally as hash tables, so a membership check is O(1) on average, whereas a membership check on a list is O(n): it has to scan the list element by element.
You can check out the innards of sets here.
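You can see the gap directly with a rough timing sketch (absolute numbers vary by machine, but the ratio is dramatic):

```python
import timeit

items = list(range(100000))
as_set = set(items)
needle = 99999  # worst case for the list: it must scan to the end

list_time = timeit.timeit(lambda: needle in items, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)
print(list_time, set_time)  # the set lookup is typically orders of magnitude faster
```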
I've created this script to compute the string similarity in python. Is there any way I can make it run any faster?
tries = input()
while tries > 0:
    mainstr = raw_input()
    tot = 0
    ml = len(mainstr)
    for i in xrange(ml):
        j = 0
        substr = mainstr[i:]
        ll = len(substr)
        for j in xrange(ll):
            if substr[j] != mainstr[j]:
                break
            j = j + 1
        tot = tot + j
    print tot
    tries = tries - 1
EDIT: After applying some optimizations, this is the code, but it's still not enough!
tries = int(raw_input())
while tries > 0:
    mainstr = raw_input()
    tot = 0
    ml = len(mainstr)
    for i in xrange(ml):
        for j in xrange(ml - i):
            if mainstr[i + j] != mainstr[j]:
                break
            j += 1
        tot += j
    print tot
    tries = tries - 1
EDIT 2: Here is the third version of the code. It's still not enough!
def mf():
    tries = int(raw_input())
    for _ in xrange(tries):
        mainstr = raw_input()
        tot = 0
        ml = len(mainstr)
        for i in xrange(ml):
            for j in xrange(ml - i):
                if mainstr[i + j] != mainstr[j]:
                    break
                j += 1
            tot += j
        print tot

mf()
You could improve it by a constant factor by using i = mainstr.find(mainstr[0], i + 1) instead of checking every i. A special case for i == 0 could also help.
Put the code inside a function; that also might speed things up by a constant factor.
Use for ... else: j += 1 to avoid incrementing j at each step.
Try to find a better-than-O(n**2) algorithm that exploits the fact that you are comparing all suffixes of the string.
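One such better-than-O(n**2) approach is the Z-algorithm, which computes the longest common prefix of the string with each of its suffixes in O(n) total; the similarity is then just n plus the sum of the Z-values. A sketch of the standard algorithm (written in Python 3):

```python
def z_function(s):
    # z[i] = length of the longest common prefix of s and s[i:]
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r) is the rightmost matched segment found so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # reuse work from the earlier match
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def string_similarity(s):
    return len(s) + sum(z_function(s))  # the string matches itself fully

print(string_similarity("ababaa"))  # 11
print(string_similarity("aa"))      # 3
```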
The most straightforward C implementation is 100 times faster than CPython (PyPy is 10-30 times faster) and passes the challenge:
import os

def string_similarity(string, _cp=os.path.commonprefix):
    return sum(len(_cp([string, string[i:]])) for i in xrange(len(string)))

for _ in xrange(int(raw_input())):
    print string_similarity(raw_input())
The above optimizations give only a few percent improvement, which is not enough to pass the challenge in CPython (the time limit for Python is only 8 times larger).
There is almost no difference (in CPython) between:
def string_similarity(string):
    len_string = len(string)
    total = len_string  # similarity with itself
    for i in xrange(1, len_string):
        for n, c in enumerate(string[i:]):
            if c != string[n]:
                break
        else:
            n += 1
        total += n
    return total
And:
def string_similarity(string):
    len_string = len(string)
    total = len_string  # similarity with itself
    i = 0
    while True:
        i = string.find(string[0], i + 1)
        if i == -1:
            break
        n = 0
        for n in xrange(1, len_string - i):
            if string[i + n] != string[n]:
                break
        else:
            n += 1
        total += n
    return total
You can skip the memory allocation inside the loop: substr = mainstr[i:] allocates a new string unnecessarily. You only use it in substr[j] != mainstr[j], which is equivalent to mainstr[i + j] != mainstr[j], so you don't need to build substr at all.
Memory allocations are expensive, so you'll want to avoid them in tight loops.
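To make the equivalence concrete (illustrative values only):

```python
mainstr = "abcabd"
i, j = 3, 2
substr = mainstr[i:]  # allocates a new string ("abd") on every outer iteration
# indexing into the original string gives the same comparison with no copy
print(substr[j] != mainstr[j], mainstr[i + j] != mainstr[j])
```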
For such simple numeric scripts there are just two things you have to do:
Use PyPy (it has no complex dependencies and will be massively faster).
Put most of the code in a function; that speeds things up quite drastically for both CPython and PyPy. Instead of:
    some_code
do:
    def main():
        some_code

    if __name__ == '__main__':
        main()
That's pretty much it.
Cheers,
fijal
Here's mine. It passes the test case, but may not be the absolute fastest.
import sys

def simstring(string, other):
    val = 0
    for l, r in zip(string, other):
        if l != r:
            return val
        val += 1
    return val

dsize = sys.stdin.readline()
for i in range(int(dsize)):
    ss = 0
    string = sys.stdin.readline().strip()
    suffix = string
    while suffix:
        ss += simstring(string, suffix)
        suffix = suffix[1:]
    sys.stdout.write(str(ss) + "\n")