string comparison time complexity for advice - python

I'm working on a problem: find the shortest substring that, repeated whole, makes up a given string; if there is no such substring, return the length of the string.
My main idea comes from Juliana's answer here (Check if string is repetition of an unknown substring); I rewrote the algorithm in Python 2.7.
I think it is O(n^2), but I'm not confident. My reasoning: the outer loop tries each candidate start offset, which is O(n) iterations, and the inner loop compares characters one by one, which is O(n) comparisons per iteration. So the overall time complexity is O(n^2)? If I'm wrong, please correct me; if I'm right, please confirm. :)
Input and output example
catcatcat => 3
aaaaaa => 1
aaaaaba => 7
My code,
def rorate_solution(word):
    for i in range(1, len(word)//2 + 1):
        j = i
        k = 0
        while k < len(word):
            if word[j] != word[k]:
                break
            j+=1
            if j == len(word):
                j = 0
            k+=1
        else:
            return i
    return len(word)
if __name__ == "__main__":
    print rorate_solution('catcatcat')
    print rorate_solution('catcatcatdog')
    print rorate_solution('abaaba')
    print rorate_solution('aaaaab')
    print rorate_solution('aaaaaa')

Your assessment of the runtime of your re-write is correct.
But you can do better: the preprocessing step of KMP (the failure function) alone is enough to find the shortest period of a string, in O(n) time.
(The re-write could be simpler:
def shortestPeriod(word):
    """the length of the shortest prefix p of word
    such that word is a repetition of p
    """
    # try prefixes of increasing length
    for i in xrange(1, len(word)//2 + 1):
        if len(word) % i:  # p must divide word evenly
            continue
        j = i
        while word[j] == word[j-i]:
            j += 1
            if len(word) <= j:
                return i
    return len(word)
if __name__ == "__main__":
    for word in ('catcatcat', 'catcatcatdog',
                 'abaaba', 'ababbbababbbababbbababbb',
                 'aaaaab', 'aaaaaa'):
        print shortestPeriod(word)
- yours compares word[0:i] to word[i:2*i] twice in a row.)
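One way the KMP-preprocessing idea might look, as a minimal Python 3 sketch. The function name shortest_repeating_unit is made up for illustration and is not taken from the linked answer. It builds the failure table, takes n - fail[n-1] as the candidate period, and accepts that candidate only if it divides the length evenly.

```python
def shortest_repeating_unit(word):
    """Length of the shortest prefix that, repeated whole, gives word.

    Hypothetical helper: returns len(word) when no shorter unit exists.
    """
    n = len(word)
    if n == 0:
        return 0
    # fail[i] = length of the longest proper prefix of word[:i+1]
    # that is also a suffix of it (the KMP failure function).
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and word[i] != word[k]:
            k = fail[k - 1]
        if word[i] == word[k]:
            k += 1
        fail[i] = k
    period = n - fail[-1]
    # The period only counts if the word is made of whole copies of it.
    return period if n % period == 0 else n


for w in ('catcatcat', 'aaaaaa', 'aaaaaba'):
    print(w, shortest_repeating_unit(w))
```

Each character is visited a bounded number of times while the table is built, so the whole check runs in linear time rather than quadratic.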

Related

Python: how to optimize this code for better performance

I use this code to split a list of strings into syllables. I need to speed it up to pass some tests. Any ideas on how to improve this code and make it faster?
def separate(word):
    s = ""
    L = []
    voc = ['a','e','i','o','u','y','j']
    for letter in word:
        if len(s) < 1:
            s += letter
        elif s[len(s) - 1] not in voc:
            s += letter
        elif s.startswith('aeiouyj'):
            s +=letter
        elif letter in voc:
            s += letter
        else:
            L.append(s)
            s = letter
    L.append(s)
    return L
I made a few small adjustments:
def separate(word):
    s=''
    L = []
    for letter in word:
        if s=='':
            s = letter
        elif s[-1] not in 'aeiouyj' or letter in 'aeiouyj':
            s += letter
        else:
            L.append(s)
            s = letter
    L.append(s)
    return L
I'm not sure s.startswith('aeiouyj') is of any use in the original code: it tests whether s starts with the literal string 'aeiouyj', so in practice it is never going to be True.
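To illustrate the point about startswith: given a single string argument, str.startswith checks for that whole string as a prefix. If the intent was "starts with any one of these letters" (an assumption about what the asker meant, not something stated in the question), passing a tuple of prefixes does that. The helper names below are made up for the example.

```python
VOWELS = 'aeiouyj'


def starts_with_whole_string(s):
    # True only if s literally begins with the seven characters 'aeiouyj'.
    return s.startswith(VOWELS)


def starts_with_any_letter(s):
    # str.startswith accepts a tuple and matches any one of its items.
    return s.startswith(tuple(VOWELS))


print(starts_with_whole_string('apple'))  # False
print(starts_with_any_letter('apple'))    # True
```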
def separate(word):
    s = ""
    L = []
    voc = ['a','e','i','o','u','y','j']
    s=word[0]
    for letter in word[1:]:
        if s[len(s) - 1] not in voc:
            s += letter
        elif letter in voc or s.startswith('aeiouyj'):
            s +=letter
        else:
            L.append(s)
            s = letter
    L.append(s)
    return L
Analysing the code, we can see that the first if is only taken once, at the very beginning, so we can handle the first character before the loop and skip that test. You can also move the last elif (letter in voc) up to second place. Running the original on a Harry Potter paragraph, the five branches were entered 1, 2436, 0, 98 and 959 times respectively. The third branch can probably be eliminated too, since it was never entered in my case.
You could also rewrite the loop recursively or in a functional style, but note that in CPython recursion is generally slower than an equivalent loop and is bounded by the recursion limit, so it is unlikely to help with speed here.

Print odd index in same line

I'm trying to complete a challenge on HackerRank (Day 6: Let's Review!). So far I have only managed to print the even-indexed characters on one line; I can't work out how to print the odd-indexed characters that are also needed to complete the challenge.
This is my code:
word_check = input()
for index, char in enumerate (word_check):
    if (index % 2 == 0):
        print( char ,end ="" )
This is the specific task:
Given a string, S, of length N that is indexed from 0 to N-1, print its even-indexed and odd-indexed characters as space-separated strings on a single line.
Thanks!!!
RavDev
You can use slice notation for indexing the original string:
word_check[::2] + " " + word_check[1::2]
[::2] means "start at the beginning and skip every second element until we reach the end" and [1::2] means "start at the second element and skip every second element until we reach the end". Leaving out either start or stop arguments of the slice implies beginning or end of the sequence respectively. Leaving out the step argument implies a step size of 1.
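As a small runnable illustration of the two slices (Python 3; the function name split_even_odd is made up for the example):

```python
def split_even_odd(word):
    # word[::2] takes indexes 0, 2, 4, ...; word[1::2] takes 1, 3, 5, ...
    return word[::2] + " " + word[1::2]


print(split_even_odd("Hacker"))  # Hce akr
print(split_even_odd("Rank"))    # Rn ak
```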
Slice notation is a better approach, but if you want to use for loop and stick to your approach, you can do in this way:
even =''
odd=''
for index, char in enumerate (word_check):
    if (index % 2 == 0):
        even += char
    else: odd += char
print (even, odd)
I am currently trying to solve the same problem. To get your answers on the same line, initiate two strings: one for even and one for odd. If the character's index is even, add it to the even string and vice versa. Here is my working code so far:
def indexes(word,letter):
    result = list()
    for i,x in enumerate(word):
        if x == letter:
            result.append(i)
    return result
T = int(input())
if T <= 10 and T>= 1:
    for i in range(T):
        evenstring = ""
        oddstring = ""
        lastchar = False
        S = input()
        if len(S) >= 2 and len(S) <= 10000:
            for index, char in enumerate (S):
                if (index % 2 == 0):
                    evenstring += char
                else: oddstring += char
            if len(indexes(S, char)) > 1:
                # str.replace returns a new string; the result is discarded here,
                # so these two calls leave evenstring and oddstring unchanged
                evenstring.replace(evenstring[evenstring.rfind(char)], '')
                oddstring.replace(oddstring[oddstring.rfind(char)], '')
            print(evenstring, oddstring)
Your next problem now is trying to remove any reoccurrences of duplicate letters from your final answer (they show up in other test cases)

How do I go about ending this loop?

I am trying to find the length of the longest substring that is in alphabetical order.
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
    if s[i] <= s[i+1]:
        current += 1
    else:
        if current > longest:
            longest = current
        current = 0
    i += 1
print longest
For this specific string, current ends up at the correct length, 4, but longest is never modified.
EDIT: The following code now runs into an error:
s = 'abcv'
current = 1
biggest = 0
for i in range(len(s) - 1):
    while s[i] <= s[i+1]:
        current += 1
        i += 1
    if current > biggest:
        biggest = current
    current = 0
print biggest
It seems my logic is correct, but I run into errors for certain strings. :(
Although there is code available on the internet that prints the longest substring, I can't seem to find how to print the longest length.
break jumps to the first statement after the loop (at the same indentation as the for statement). continue jumps back to the start of the loop and begins the next iteration.
Your logic in the else: statement does not work - you need to indent it one level less.
if s[i] <= s[i+1]:
checks whether the current character is less than or equal to the next one. If it is, you need to increment your internal counter and update longest if the counter is now larger.
You can also get into trouble with s[i+1]. In the for version the loop stops at len(s)-2, so for "jfjfjf" (length 6) i runs from 0 to 4 and s[i+1] never goes past s[5]. In the while version from your edit, however, i is incremented inside the while with no bound check, so it can reach s[6], which does not exist.
A different approach that avoids explicit indexes and splits the work into two responsibilities (get the list of alphabetical substrings, then order them longest first):
# split string into list of substrings that internally are alphabetically ordered (<=)
def getAlphabeticalSplits(s):
    result = []
    temp = ""
    for c in s: # just use all characters in s
        # if temp is empty or the last char in it is less/equal to current char
        if temp == "" or temp[-1] <= c:
            temp += c # append it to the temp substring
        else:
            result.append(temp) # else add it to the list of substrings
            temp = c # and start the next substring with the current char
    if temp:
        result.append(temp) # don't forget the last substring
    # done with all chars, return list of substrings
    return result
# return the splitted list as copy after sorting reverse by length
def SortAlphSplits(sp, rev = True):
    return sorted(sp, key=lambda x: len(x), reverse=rev)
splitter = getAlphabeticalSplits("akdsfabcdemfjklmnopqrjdhsgt")
print(splitter)
sortedSplitter = SortAlphSplits(splitter)
print (sortedSplitter)
print(len(sortedSplitter[0]))
Output:
['ak', 'ds', 'f', 'abcdem', 'fjklmnopqr', 'j', 'dhs', 'gt']
['fjklmnopqr', 'abcdem', 'dhs', 'ak', 'ds', 'gt', 'f', 'j']
10
This one returns the list of splits and sorts them by length, descending. In a memory-critical environment this costs more memory than your approach: you only keep a few numbers, whereas this approach fills a list and copies it into a sorted one.
To solve the index problem in your code, change your logic slightly:
Start at the second character and test whether the one before it is less than or equal to it. That way you only ever compare a character with its predecessor, and never index past the end.
s = 'abcvabcdefga'
current = 1  # a run always contains at least one character
biggest = 1
for i in range(1,len(s)): # compares index 1 with 0, 2 with 1 etc
    if s[i] >= s[i-1]: # this char is bigger/equal last char
        current += 1
        biggest = max(current,biggest)
    else:
        current = 1
print biggest
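The same "compare each character with the one before it" idea can be sketched as a reusable Python 3 function. The name longest_nondecreasing_run is made up for the example, and returning 0 for an empty string is an assumption the question does not cover.

```python
def longest_nondecreasing_run(s):
    """Length of the longest substring whose characters never decrease."""
    if not s:
        return 0  # assumption: an empty string has no run at all
    current = longest = 1
    # Pair every character with its successor instead of indexing.
    for prev, cur in zip(s, s[1:]):
        current = current + 1 if cur >= prev else 1
        longest = max(longest, current)
    return longest


print(longest_nondecreasing_run('abcv'))          # 4
print(longest_nondecreasing_run('abcvabcdefga'))  # 7
```

Because zip stops at the shorter of its arguments, there is no index that can run past the end of the string.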
You have to take the comparison out of the else branch. Consider the case where current has just exceeded longest: with current = 3 and longest = 3, incrementing makes current 4, and you still want the if current > longest check to run at that point. Only reset current when the order breaks.
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
    if s[i] <= s[i+1]:
        current += 1
    else:
        current = 1  # order broken: a new run starts at s[i+1]
    if current > longest:  # checked on every iteration, not only in the else
        longest = current
print longest
Use a while loop with a condition; then you can easily define the condition under which your loop is done.
If you want quality code for the long term:
A while loop is better practice than a break, because you see the looping condition in one place. A bare break is often harder to spot in the middle of the loop body.
At the end of the loop, current is the length of the last substring in ascending order. Assigning it to longest is not right as the last substring in ascending is not necessarily the longest.
So longest=max(current,longest) instead of longest=current after the loop, should solve it for you.
Edit: ^ was for before the edit. You just need to add longest=max(current,longest) after the for loop, for the same reason (the last ascending substring is not considered). Something like this:
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
    if s[i] <= s[i+1]:
        current += 1
    else:
        if current > longest:
            longest = current
        current = 1  # the new run already contains s[i+1]
longest=max(current,longest) #extra
print longest
A loop's body ends at the first line that is no longer indented under it, so technically your loop has already ended.

Python: "IndexError: string index out of range" Beginner

I know, I know, this question has been asked plenty of times before. But I can't figure out how to fix it here, in this particular instance. When I subtract 2, which is what was recommended, I still get the same error in the if statement. Thanks
The code should (at least) take a string s, measure it against the alphabet order, and then output the longest substring of s that is in alphabetical order.
order = "abcdefghijklmnopqrstuvwxyz"
s = 'abcbcdabc'
match = ""
for i in range(len(s)):
    for j in range(len(order)):
        if (((i + j ) - 2) < len(order) and order[i] == s[j]):
            match += s[i]
print("Longest substring in alphabetical order is: " + match)
That is because you are using j, the index into order, to access s. j can be greater than or equal to len(s), hence the IndexError.
I don't know what you are trying to achieve with the code, but in any case here is what you can change to make it run: index s only with i and order only with j, i.e. test order[j] == s[i], and then use match += s[i] OR match += order[j].
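For the stated goal (the longest substring of s in alphabetical order), the order string is not strictly needed, because characters can be compared directly. A minimal Python 3 sketch follows; the function name is made up for the example, and returning the first substring on ties is an assumption, since the question does not say which one to prefer.

```python
def longest_alphabetical_substring(s):
    """Return the first longest substring of s with non-decreasing characters."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    start = 0  # index where the current ordered run begins
    for i in range(1, len(s)):
        if s[i] < s[i - 1]:
            start = i  # order broken: a new run begins here
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]


print("Longest substring in alphabetical order is: "
      + longest_alphabetical_substring('abcbcdabc'))
```

The loop only ever reads s[i] and s[i - 1] with i between 1 and len(s) - 1, so no index can go out of range.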

I need to count like characters in the same position in python, but I have no idea how to get it right, as i am new to the program

Write a Python script that asks the user to enter two DNA sequences with the same length. If the two sequences have different lengths then output "Invalid input, the length must be the same!" If inputs are valid, then compute how many DNA bases at the same position are equal in these two sequences and output the answer "x positions of these two sequences have the same character". x is the actual number, depending on the user's input.
Below is what I have so far.
g=input('Enter DNA Sequence: ')
h=input('Enter Second DNA Sequence: ')
i=0
count=0
if len(g)!=len(h):
    print('Invalid')
else:
    while i<=len(g):
        if g[i]==h[i]:
            count+=1
        i+=1
    print(count)
Replace your while loop with this (choose better variable names in your actual code):
for i, j in zip(g, h):
    if i == j:
        count += 1
OR replace the loop entirely with
count = sum(1 for i, j in zip(g, h) if i == j)
This will fix your index error. In general, you shouldn't be indexing lists in python, but looping over them. If you really want to index them, the i <= len(g) was the problem... it should be changed to i < len(g).
If you wanted to be really tricky, you could use the fact that True == 1 and False == 0:
count = sum(int(i == j) for i, j in zip(g, h))
The issue here is your loop condition. Your code gives you an IndexError; this means that you tried to access a character of a string, but there is no character at that index. What it means here is that i is greater than the len(g) - 1.
Consider this code:
while i<=len(g):
    print(i)
    i+=1
For g = "abc", it prints
0
1
2
3
Those are four numbers, not three! Since you start from 0, you must omit the last number, 3. You can adjust your condition as such:
while i < len(g):
    # do things
But in Python, you should avoid using while loops when a for-loop will do. Here, you can use a for-loop to iterate through a sequence, and zip to combine two sequences into one.
for i, j in zip(g, h):
    # i is the character of g, and j is the character of h
    if i == j:
        count += 1
You'll notice that you avoid the possibility of index errors and don't have to type so many [i]s.
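Putting the length check and the zip loop together, a minimal Python 3 sketch might look like this. The function name count_matching_positions and the choice to raise ValueError on a length mismatch are assumptions made for the example; the assignment itself only asks for the message to be printed.

```python
def count_matching_positions(g, h):
    """Count positions at which two equal-length sequences agree."""
    if len(g) != len(h):
        # Assumption: signal invalid input with an exception.
        raise ValueError("Invalid input, the length must be the same!")
    # a == b is True or False, and sum counts each True as 1.
    return sum(a == b for a, b in zip(g, h))


x = count_matching_positions("ACGT", "ACCT")
print("{} positions of these two sequences have the same character".format(x))
```

In the real script the two arguments would come from input() calls, and the ValueError could be caught to print the required message.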
i<=len(g) - replace this with i<len(g), because index counting starts from 0, not 1. This is the error you are facing. But in addition, your code is not very pretty...
First way to simplify it, keeping your structure:
for i in range(len(g)):
    if g[i]==h[i]:
        count+=1
Even better, you can actually make it a one-liner:
sum(g[i]==h[i] for i in range(len(g)))
Here the fact that True is evaluated to 1 in Python is used.
g = raw_input('Enter DNA Sequence: ')
h = raw_input('Enter Second DNA Sequence: ')
c = 0
count = 0
if len(g) != len(h):
    print('Invalid')
else:
    for i in g:
        if g[c] != h[c]:
            print "string does not match at : " + str(c)
            count = count + 1
        c = c + 1
    print(count)
if(len(g)==len(h)):
    print sum([1 for a,b in zip(g,h) if a==b])
Edit: Fixed the unclosed parens. Thanks for the comments, will definitely look at the generator solution and learn a bit - thanks!
