Related
I have been working on the LeetCode problem 5. Longest Palindromic Substring:
Given a string s, return the longest palindromic substring in s.
But I kept getting time limit exceeded on large test cases.
I used dynamic programming as follows:
dp[(i, j)] = True implies that s[i] to s[j] is a palindrome. So if s[i] == str[j] and dp[(i+1, j-1]) is set to True, that means S[i] to S[j] is also a palindrome.
How can I improve the performance of this implementation?
class Solution:
def longestPalindrome(self, s: str) -> str:
dp = {}
res = ""
for i in range(len(s)):
# single character is always a palindrome
dp[(i, i)] = True
res = s[i]
#fill in the table diagonally
for x in range(len(s) - 1):
i = 0
j = x + 1
while j <= len(s)-1:
if s[i] == s[j] and (j - i == 1 or dp[(i+1, j-1)] == True):
dp[(i, j)] = True
if(j-i+1) > len(res):
res = s[i:j+1]
else:
dp[(i, j)] = False
i += 1
j += 1
return res
I think the judging system for this problem is kind of too tight, it took some time to make it pass, improved version:
class Solution:
def longestPalindrome(self, s: str) -> str:
dp = {}
res = ""
for i in range(len(s)):
dp[(i, i)] = True
res = s[i]
for x in range(len(s)): # iterate till the end of the string
for i in range(x): # iterate up to the current state (less work) and for loop looks better here
if s[i] == s[x] and (dp.get((i + 1, x - 1), False) or x - i == 1):
dp[(i, x)] = True
if x - i + 1 > len(res):
res = s[i:x + 1]
return res
Here is another idea to improve the performance:
The nested loop will check over many cases where the DP value is already False for smaller ranges. We can avoid looking at large spans, by looking for palindromes from inside-out and stop extending the span as soon as it no longer is a palindrome. This process should be repeated at every offset in the source string, but this could still save some processing.
The inputs for which then most time is wasted, are those where there are lots of the same letters after each other, like "aaaaaaabcaaaaaaa". These lead to many iterations: each "a" or "aa" could be the center of a palindrome, but "growing" each of them is a waste of time. We should just consider all consecutive "a" together from the start and expand from there onwards.
You can specifically deal with these cases by first grouping consecutive letters which are the same. So the above example would be turned into 4 groups: a(7)b(1)c(1)a(7)
Then let each group in turn be taken as the center of a palindrome. For each group, "fan out" to potentially include one or more neighboring groups at both sides in "tandem". Continue fanning out until either the outside groups are not about the same letter, or they have a different group size. From that result you can derive what the largest palindrome is around that center. In particular, when the case is that the letters of the outer groups are the same, but not their sizes, you still include that letter at the outside of the palindrome, but with a repetition that corresponds to the least of these two mismatching group sizes.
Here is an implementation. I used named tuples to make it more readable:
from itertools import groupby
from collections import namedtuple
Group = namedtuple("Group", "letter,size,end")
class Solution:
def longestPalindrome(self, s: str) -> str:
longest = ""
x = 0
groups = [Group(group[0], len(group), x := x + len(group)) for group in
("".join(group[1]) for group in groupby(s))]
for i in range(len(groups)):
for j in range(0, min(i+1, len(groups) - i)):
if groups[i - j].letter != groups[i + j].letter:
break
left = groups[i - j]
right = groups[i + j]
if left.size != right.size:
break
size = right.end - (left.end - left.size) - abs(left.size - right.size)
if size > len(longest):
x = left.end - left.size + max(0, left.size - right.size)
longest = s[x:x+size]
return longest
Alternatively, you can try this approach, it seems to be faster than 96% Python submission.
def longestPalindrome(self, s: str) -> str:
N = len(s)
if N == 0:
return 0
max_len, start = 1, 0
for i in range(N):
df = i - max_len
if df >= 1 and s[df-1: i+1] == s[df-1: i+1][::-1]:
start = df - 1
max_len += 2
continue
if df >= 0 and s[df: i+1] == s[df: i+1][::-1]:
start= df
max_len += 1
return s[start: start + max_len]
If you want to improve the performance, you should create a variable for len(s) at the beginning of the function and use it. That way instead of calling len(s) 3 times, you would do it just once.
Also, I see no reason to create a class for this function. A simple function will outrun a class method, albeit very slightly.
I am trying to count the number of unique numbers in a sorted array using binary search. I need to get the edge of the change from one number to the next to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
start = 0
end = len(x)-1
count =0
# This is the current number we are looking for
item = x[start]
while start <= end:
middle = (start + end)//2
if item == x[middle]:
start = middle+1
elif item < x[middle]:
end = middle -1
#when item item greater, change to next number
count+=1
# if the number
return count
unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit is negligent from o(n), what is my binary search missing? It's confusing when not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for given example).
As discussed in comments, complexity is about O(k*log(n)) where k is number of unique items, so this approach works well when k is small compared with n, and might become worse than linear scan in case of k ~ n
def countuniquebs(A):
n = len(A)
t = A[0]
l = 1
count = 0
while l < n - 1:
r = n - 1
while l < r:
m = (r + l) // 2
if A[m] > t:
r = m
else:
l = m + 1
count += 1
if l < n:
t = A[l]
return count
print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
len = end-start
if len < 1:
return 0
if A[start] == A[end-1]:
return 1
if len < 3:
return 2
mid = start + len//2
return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1
A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A,0,len(A)))
I would like to know if my implementation is efficient.
I have tried to find the simplest and low complex solution to that problem using python.
def count_gap(x):
"""
Perform Find the longest sequence of zeros between ones "gap" in binary representation of an integer
Parameters
----------
x : int
input integer value
Returns
----------
max_gap : int
the maximum gap length
"""
try:
# Convert int to binary
b = "{0:b}".format(x)
# Iterate from right to lift
# Start detecting gaps after fist "one"
for i,j in enumerate(b[::-1]):
if int(j) == 1:
max_gap = max([len(i) for i in b[::-1][i:].split('1') if i])
break
except ValueError:
print("Oops! no gap found")
max_gap = 0
return max_gap
let me know your opinion.
I do realize that brevity does not mean readability nor efficiency.
However, ability to spell out solution in spoken language and implement it in Python in no time constitutes efficient use of my time.
For binary gap: hey, lets convert int into binary, strip trailing zeros, split at '1' to list, then find longest element in list and get this element lenght.
def binary_gap(N):
return len(max(format(N, 'b').strip('0').split('1')))
Your implementation converts the integer to a base two string then visits each character in the string. Instead, you could just visit each bit in the integer using << and &. Doing so will avoid visiting each bit twice (first to convert it to a string, then to check if if it's a "1" or not in the resulting string). It will also avoid allocating memory for the string and then for each substring you inspect.
You can inspect each bit of the integer by visiting 1 << 0, 1 << 1, ..., 1 << (x.bit_length).
For example:
def max_gap(x):
max_gap_length = 0
current_gap_length = 0
for i in range(x.bit_length()):
if x & (1 << i):
# Set, any gap is over.
if current_gap_length > max_gap_length:
max_gap_length = current_gap_length
current_gap_length = 0
else:
# Not set, the gap widens.
current_gap_length += 1
# Gap might end at the end.
if current_gap_length > max_gap_length:
max_gap_length = current_gap_length
return max_gap_length
def max_gap(N):
xs = bin(N)[2:].strip('0').split('1')
return max([len(x) for x in xs])
Explanation:
Both leading and trailing zeros are redundant with binary gap finding
as they are not bounded by two 1's (left and right respectively)
So step 1 striping zeros left and right
Then splitting by 1's yields all sequences of 0'z
Solution: The maximum length of 0's sub-strings
As suggested in the comments, itertools.groupby is efficient in grouping elements of an iterable like a string. You could approach it like this:
from itertools import groupby
def count_gap(x):
b = "{0:b}".format(x)
return max(len(list(v)) for k, v in groupby(b.strip("0")) if k == "0")
number = 123456
print(count_gap(number))
First we strip all zeroes from the ends, because a gap has to have on both ends a one. Then itertools.groupby groups ones and zeros and we extract the key (i.e. "0" or "1") together with a group (i.e. if we convert it into a list, it looks like "0000" or "11"). Next we collect the length for every group v, if k is zero. And from this we determine the largest number, i.e. the longest gap of zeroes amidst the ones.
I think the accepted answer dose not work when the input number is 32 (100000). Here is my solution:
def solution(N):
res = 0
st = -1
for i in range(N.bit_length()):
if N & (1 << i):
if st != -1:
res = max(res, i - st - 1)
st = i
return res
def solution(N):
# write your code in Python 3.6
count = 0
gap_list=[]
bin_var = format(N,"b")
for bit in bin_var:
if (bit =="1"):
gap_list.append(count)
count =0
else:
count +=1
return max(gap_list)
Here is my solution:
def solution(N):
num = binary = format(N, "06b")
char = str(num)
find=False
result, conteur=0, 0
for c in char:
if c=='1' and not find:
find = True
if find and c=='0':
conteur+=1
if c=='1':
if result<conteur:
result=conteur
conteur=0
return result
this also works:
def binary_gap(n):
max_gap = 0
current_gap = 0
# Skip the tailing zero(s)
while n > 0 and n % 2 == 0:
n //= 2
while n > 0:
remainder = n % 2
if remainder == 0:
# Inside a gap
current_gap += 1
else:
# Gap ends
if current_gap != 0:
max_gap = max(current_gap, max_gap)
current_gap = 0
n //= 2
return max_gap
Old question, but I would solve it using generators.
from itertools import dropwhile
# a generator that returns binary
# representation of the input
def getBinary(N):
while N:
yield N%2
N //= 2
def longestGap(N):
longestGap = 0
currentGap = 0
# we want to discard the initial 0's in the binary
# representation of the input
for i in dropwhile(lambda x: not x, getBinary(N)):
if i:
# a new gap is found. Compare to the maximum
longestGap = max(currentGap, longestGap)
currentGap = 0
else:
# extend the previous gap or start a new one
currentGap+=1
return longestGap
Can be done using strip() and split() function :
Steps:
Convert to binary (Remove first two characters )
Convert int to string
Remove the trailing and starting 0 and 1 respectively
Split with 1 from the string to find the subsequences of strings
Find the length of the longest substring
Second strip('1') is not mandatory but it will decrease the cases to be checked and will improve the time complexity
Worst case T
def solution(N):
return len(max(bin(N)[2:].strip('0').strip('1').split('1')))
Solution using bit shift operator (100%). Basically the complexity is O(N).
def solution(N):
# write your code in Python 3.6
meet_one = False
count = 0
keep = []
while N:
if meet_one and N & 1 == 0:
count+=1
if N & 1:
meet_one = True
keep.append(count)
count = 0
N >>=1
return max(keep)
def solution(N):
# write your code in Python 3.6
iterable_N = "{0:b}".format(N)
max_gap = 0
gap_positions = []
for index, item in enumerate(iterable_N):
if item == "1":
if len(gap_positions) > 0:
if (index - gap_positions[-1]) > max_gap:
max_gap = index - gap_positions[-1]
gap_positions.append(index)
max_gap -= 1
return max_gap if max_gap >= 0 else 0
this also works:
def solution(N):
bin_num = str(bin(N)[2:])
list1 = bin_num.split('1')
max_gap =0
if bin_num.endswith('0'):
len1 = len(list1) - 1
else:
len1 = len(list1)
if len1 != 0:
for i in range(len1):
if max_gap < len(list1[i]):
max_gap = len(list1[i])
return max_gap
def solution(number):
bits = [int(digit) for digit in bin(number)[2:]]
occurences = [i for i, bit in enumerate(bits) if(bit==1)]
res = [occurences[counter+1]-a-1 for counter, a in enumerate(occurences) if(counter+1 < len(occurences))]
if(not res):
print("Gap: 0")
else:
print("Gap: ", max(res))
number = 1042
solution(number)
This works
def solution(number):
# convert number to binary then strip trailing zeroes
binary = ("{0:b}".format(number)).strip("0")
longest_gap = 0
current_gap = 0
for c in binary:
if c is "0":
current_gap = current_gap + 1
else:
current_gap = 0
if current_gap > longest_gap:
longest_gap = current_gap
return longest_gap
def max_gap(N):
bin = '{0:b}'.format(N)
binary_gap = []
bin_list = [bin[i:i+1] for i in range(0, len(bin), 1)]
for i in range(len(bin_list)):
if (bin_list[i] == '1'):
# print(i)
# print(bin_list[i])
# print(binary_gap)
gap = []
for j in range(len(bin_list[i+1:])):
# print(j)
# print(bin_list[j])
if(bin_list[i+j+1]=='1'):
binary_gap.append(j)
# print(j)
# print(bin_list[j])
# print(binary_gap)
break
elif(bin_list[i+j+1]=='0'):
# print(i+j+1)
# print(bin_list[j])
# print(binary_gap)
continue
else:
# print(i+j+1)
# print(bin_list[i+j])
# print(binary_gap)
break
else:
# print(i)
# print(bin_list[i])
# print(binary_gap)
binary_gap.append(0)
return max(binary_gap)
pass
def find(s, ch):
return [i for i, ltr in enumerate(s) if ltr == ch]
def solution(N):
get_bin = lambda x: format(x, 'b')
binary_num = get_bin(N)
print(binary_num)
binary_str = str(binary_num)
list_1s = find(binary_str,'1')
diffmax = 0
for i in range(len(list_1s)-1):
if len(list_1s)<1:
diffmax = 0
break
else:
diff = list_1s[i+1] - list_1s[i] - 1
if diff > diffmax:
diffmax = diff
return diffmax
pass
def solution(N: int) -> int:
binary = bin(N)[2:]
longest_gap = 0
gap = 0
for char in binary:
if char == '0':
gap += 1
else:
if gap > longest_gap:
longest_gap = gap
gap = 0
return longest_gap
Here's a solution using iterators and generators that will handle edge cases such as the binary gap for the number 32 (100000) being 0 and the binary gap for 0 being 0. It doesn't create a list, instead relying on iterating and processing elements of the bit string one step at a time for a memory efficient solution.
def solution(N):
def counter(n):
count = 0
preceeding_one = False
for x in reversed(bin(n).lstrip('0b')):
x = int(x)
if x == 1:
count = 0
preceeding_one = True
yield count
if preceeding_one and x == 0:
count += 1
yield count
yield count
return(max(counter(N)))
Here is one more that seems to be easy to understand.
def count_gap(x):
binary_str = list(bin(x)[2:].strip('0'))
max_gap = 0
n = len(binary_str)
pivot_point = 0
for i in range(pivot_point, n):
zeros = 0
for j in range(i + 1, n):
if binary_str[j] == '0':
zeros += 1
else:
pivot_point = j
break
max_gap = max(max_gap, zeros)
return max_gap
This is really old, I know. But here's mine:
def solution(N):
gap_list = [len(gap) for gap in bin(N)[2:].strip("0").split("1") if gap != ""]
return max(gap_list) if gap_list else 0
Here is another efficient solution. Hope it may helps you. You just need to pass any number in function and it will return longest Binary gap.
def LongestBinaryGap(num):
n = int(num/2)
bin_arr = []
for i in range(0,n):
if i == 0:
n1 = int(num/2)
bin_arr.append(num%2)
else:
bin_arr.append(n1%2)
n1 = int(n1/2)
if n1 == 0:
break
print(bin_arr)
result = ""
count = 0
count_arr = []
for i in bin_arr:
if result == "found":
if i == 0:
count += 1
else:
if count > 0:
count_arr.append(count)
count = 0
if i == 1:
result = 'found'
else:
pass
if len(count_arr) == 0:
return 0
else:
return max(count_arr)
print(LongestBinaryGap(1130)) # Here you can pass any number.
My code in python 3.6 scores 100
Get the binary Number .. Get the positions of 1
get the abs differennce between 1.. sort it
S = bin(num).replace("0b", "")
res = [int(x) for x in str(S)]
print(res)
if res.count(1) < 2 or res.count(0) < 1:
print("its has no binary gap")
else:
positionof1 = [i for i,x in enumerate(res) if x==1]
print(positionof1)
differnce = [abs(j-i) for i,j in zip(positionof1, positionof1[1:])]
differnce[:] = [differnce - 1 for differnce in differnce]
differnce.sort()
print(differnce[-1])
def solution(N):
return len(max(bin(N).strip('0').split('1')[1:]))
def solution(N):
maksimum = 0
zeros_list = str(N).split('1')
if zeros_list[-1] != "" :
zeros_list.pop()
for item in zeros_list :
if len(item) > maksimum :
maksimum = len(item)
return(maksimum)
def solution(N):
# Convert the number to bin
br = bin(N).split('b')[1]
sunya=[]
groupvalues=[]
for i in br:
count = i
if int(count) == 1:
groupvalues.append(len(sunya))
sunya=[]
if int(count) == 0:
sunya.append('count')
return max(groupvalues)
def solution(N):
bin_num = str(bin(N)[2:])
bin_num = bin_num.rstrip('0')
bin_num = bin_num.lstrip('0')
list1 = bin_num.split('1')
max_gap = 0
for i in range(len(list1)):
if len(list1[i]) > max_gap:
max_gap = len(list1[i])
return (max_gap)
l = [[i, i, 1] for i in range(1,1000000)]
def collatz(li):
for el in li:
if el[1] == 1:
li.remove(el)
elif el[1] % 2 == 0:
el[1] = el[1] / 2
el[2] += 1
elif el[1] % 2 == 1:
el[1] = 3*el[1] + 1
el[2] += 1
return li
while len(collatz(l)) >= 2:
l = collatz(l)
print l
Hi, this is a (partial) solution to Euler problem 14, written in Python.
Longest Collatz sequence
Problem 14
The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
I wrote partial because it does not really output the solution since I can't really run it in the whole 1 - 1000000 range. It's way too slow - taking more than 20 minutes the last time I killed the process. I have barely just started with python and programming in general (about 2 weeks) and I am looking to understand what's the obvious mistake I am making in terms of efficiency. I googled some solutions and even the average ones are orders of magnitude faster than mine. So what am I missing here? Any pointers to literature to avoid making the same mistakes in the future?
a little improvement upon sara's answer
import time
start = time.time()
def collatz(n):
k = n
length = 1
nList = []
nList.append(n)
while n != 1:
if n not in dic:
n = collatzRule(n)
nList.append(n)
length += 1
else:
# we dont need the values but we do need the real length for the for-loop
nList.extend([None for _ in range(dic[n] - 1)])
length = (length - 1) + dic[n]
break
for seq in nList:
if seq not in dic:
dic[seq] = len(nList) - nList.index(seq)
return length
def collatzRule(n):
if n % 2 == 0:
return n // 2
else:
return 3 * n + 1
longestLen = 0
longestNum = 0
dic = {}
for n in range(2, 1000001):
prsntLen = collatz(n)
if prsntLen > longestLen:
longestLen = prsntLen
longestNum = n
# print(f'{n}: {prsntLen}')
print(f'The starting num is: {longestNum} with the longest chain having: {longestLen} terms.')
print(f'time taken: {time.time() - start}')
Sara's answer is great, but can be more efficient.
If the value we return from the function is len(seq), why not just counting the number of iterations instead of conducting a list first?
I have changed the code slightly, and the performance improvement is significant
def collatz(x):
count = 1
temp = x
while temp > 1:
if temp % 2 == 0:
temp = int(temp/2)
if temp in has2: # calculate temp and check if in cache
count += has2[temp]
break
else:
count += 1
else:
temp = 3*temp + 1
if temp in has2:
count += has2[temp]
break
else:
count += 1
has2[x] = count
return count
837799 has 525 elements. calculation time =1.97099995613 seconds.
Compared to the original version
837799 has 525 elements. calculation time =11.3389999866 seconds.
Using list of int rather than building the whole list is ~80% faster.
the problem is you use brute force algorithm that is inefficient.this is my solution to problem 14 from project Euler. it takes a few second to run. the key is you should save previous results in a dictionary so you don't have to compute those results again.:
#problem 14 project euler
import time
start=time.time()
has2={}
def collatz(x):
seq=[]
seq.append(x)
temp=x
while(temp>1):
if temp%2==0:
temp=int(temp/2)
if temp in has2:
seq+=has2[temp]
break
else:
seq.append(temp)
else:
temp=3*temp+1
if temp in has2:
seq+=has2[temp]
break
else:
seq.append(temp)
has2[x]=seq
return len(seq)
num=0
greatest=0
for i in range(1000000):
c=collatz(i)
if num<c:
num=c
greatest=i
print('{0} has {1} elements. calculation time ={2} seconds.'.format(greatest,num,time.time()-start))
As #Sara says you could use dictionary to save previous results and then look them up for making program run faster. But I don't quite understand your results, taking more than 20 mins sounds like you have some problem.
By using bruteforce i get code to run about at 16 sec.
#!/bin/python3
########################
# Collatz Conjecture #
# Written by jeb 2015 #
########################
import time
current = 0
high = 0
# While number is not one, either divide it by 2
# or multiply with 3 and add one
# Returns number of iterations
def NonRecursiveCollatz(i):
counter = 1
while i != 1:
counter = counter + 1
if i%2 == 0:
i = i / 2
else:
i = 3*i + 1
return counter
time_start = time.time()
# Test all numbers between 1 and 1.000.000
# If number returned is higher than last one, store it nd remember
# what number we used as input to the function
for i in range(1,1000000):
current = NonRecursiveCollatz(i)
if current > high:
high = current
number = i
elapsed_time = time.time() - time_start
print "Highest chain"
print high
print "From number "
print number
print "Time taken "
print elapsed_time
With the output:
Highest chain
525
From number
837799
Time taken
16.730340004
//Longest Colletz Sequence
public class Problem14 {
static long getLength(long numb) {
long length = 0;
for(long i=numb; i>=1;) {
length++;
if(i==1)
break;
if(i%2==0)
i = i/2;
else
i = (3*i)+1;
}
return length;
}
static void solution(long numb) {
long number = numb;
long maxLength = getLength(number);
for(long i=numb; i>=1; i--) {
if(getLength(i)>=maxLength) {
maxLength = getLength(i);
number = i;
}
}
System.out.println("`enter code here`Length of "+number+" is : "+maxLength);
}
public static void main(String args[]) {
long begin = System.currentTimeMillis();
solution(1000000);
long end = System.currentTimeMillis();
System.out.println("Time : "+(end-begin));
}
}
output :
Length of 837799 is : 525
Time : 502
I've created this script to compute the string similarity in python. Is there any way I can make it run any faster?
tries = input()
while tries > 0:
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
j = 0
substr = mainstr[i:]
ll = len(substr)
for j in xrange(ll):
if substr[j] != mainstr[j]:
break
j = j + 1
tot = tot + j
print tot
tries = tries - 1
EDIT: After applying some optimization this is the code, but it's not enough!
tries = int(raw_input())
while tries > 0:
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
for j in xrange(ml-i):
if mainstr[i+j] != mainstr[j]:
break
j += 1
tot += j
print tot
tries = tries - 1
EDIT 2: The third version of the code. It's still no go!
def mf():
tries = int(raw_input())
for _ in xrange(tries):
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
for j in xrange(ml-i):
if mainstr[i+j] != mainstr[j]:
break
j += 1
tot += j
print tot
mf()
You could improve it by a constant factor if you use i = mainstr.find(mainstr[0], i+1) instead of checking all i. Special case for i==0 also could help.
Put the code inside a function. It also might speed up things by a constant factor.
Use for ... else: j += 1 to avoid incrementing j at each step.
Try to find a better than O(n**2) algorithm that exploits the fact that you compare all suffixes of the string.
The most straight-forward C implementation is 100 times faster than CPython (Pypy is 10-30 times faster) and passes the challenge:
import os
def string_similarity(string, _cp=os.path.commonprefix):
return sum(len(_cp([string, string[i:]])) for i in xrange(len(string)))
for _ in xrange(int(raw_input())):
print string_similarity(raw_input())
The above optimizations give only several percents improvement and they are not enough to pass the challenge in CPython (Python time limit is only 8 time larger).
There is almost no difference (in CPython) between:
def string_similarity(string):
len_string = len(string)
total = len_string # similarity with itself
for i in xrange(1, len_string):
for n, c in enumerate(string[i:]):
if c != string[n]:
break
else:
n += 1
total += n
return total
And:
def string_similarity(string):
len_string = len(string)
total = len_string # similarity with itself
i = 0
while True:
i = string.find(string[0], i+1)
if i == -1:
break
n = 0
for n in xrange(1, len_string-i):
if string[i+n] != string[n]:
break
else:
n += 1
total += n
return total
You can skip the memory allocation inside the loop. substr = mainstr[i:] allocates a new string unnecessarily. You only use it in substr[j] != mainstr[j], which is equivalent to mainstr[i + j] != mainstr[j], so you don't need to build substr.
Memory allocations are expensive, so you'll want to avoid them in tight loops.
For such simple numeric scripts there are just two things you have to do:
Use PyPy (it does not have complex dependencies and will be massively faster)
Put most of the code in a function. That speeds up stuff for both CPython and PyPy quite drastically. Instead of:
some_code
do:
def main():
some_code
if __name__ == '__main__':
main()
That's pretty much it.
Cheers,
fijal
Here's mine. It passes the test case, but may not be the absolute fastest.
import sys
def simstring(string, other):
val = 0
for l, r in zip(string, other):
if l != r:
return val
val += 1
return val
dsize = sys.stdin.readline()
for i in range(int(dsize)):
ss = 0
string = sys.stdin.readline().strip()
suffix = string
while suffix:
ss += simstring(string, suffix)
suffix = suffix[1:]
sys.stdout.write(str(ss)+"\n")