Optimize python code to avoid runtime error - python

Given a string that may contain multiple occurrences of the same character, return the index of the closest identical character for any indicated character in the string.
You are given the string s and n queries. In each query, you are given an index a (where 0 <= a <= |s|) of a character, and you need to print the index of the closest same character. If there are multiple answers, print the smallest one. If there is no other occurrence, print -1.
For example, for the string s = 'youyouy' and the query 3, there are two matching characters at indices 0 and 6, each 3 away, so we choose the smallest one, which is 0.
Here is my plan:
I put the string in a dictionary: the keys are the distinct letters in the string, and the values are the lists of indexes at which each letter occurs. When given a query, I find the corresponding letter in the dictionary and return the value closest to the query.
import math

def closest(s, queries):
    res = []
    dict2 = {}
    # dict2 maps each letter to the list of its indexes
    for i in range(len(s)):
        if s[i] not in dict2:
            dict2[s[i]] = [i]
        else:
            dict2[s[i]].append(i)
    for num in queries:
        # closet denotes the closest letter index
        closet = math.inf
        # num is out of range, append -1
        if num > (len(s) - 1):
            res.append(-1)
            continue
        # this is the only such letter, append -1
        letter = s[num]
        if len(dict2[letter]) == 1:
            res.append(-1)
            continue
        # temp = list of indexes for that letter
        temp = dict2[s[num]]
        index = temp.index(num)  # position of num within that list
        if index == 0:
            closet = temp[1]
        elif index == (len(temp) - 1):
            closet = temp[index - 1]
        else:
            distance1 = num - temp[index - 1]  # left
            distance2 = temp[index + 1] - num  # right
            if distance1 <= distance2:
                closet = temp[index - 1]
            else:
                closet = temp[index + 1]
        if closet == math.inf:
            res.append(-1)
        else:
            res.append(closet)
    return res
I got two runtime errors. Could you help me reduce the run time?
I am also open to other suggestions. I have used Python for a while, and I am looking for a job (university new grad). Does Java usually run faster than Python? Should I switch to Java?

I'm trying to keep this as simple as I can, though it may look a bit complex. Although your question is about avoiding the runtime error, I want to present my idea:
s = 'oooyyouoy'
#    012345678  (index ruler for s)
def closest(string, pos):
    c = string[pos]
    p1, p2 = string[:pos], string[pos+1:]
    # reverse the left part and find the nearest match; -1 if there is none
    i = p1[::-1].find(c)
    l = pos - 1 - i if i != -1 else -1
    # search the right part as-is; add pos + 1 because p2 starts at pos + 1
    j = p2.find(c)
    r = pos + 1 + j if j != -1 else -1
    # if only one side matched, return it; if neither matched, this is -1
    if l == -1 or r == -1:
        return max(l, r)
    # judge which one is closer; on a tie choose the left one
    return l if (pos - l) <= (r - pos) else r
print(closest(s, 4))
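If the runtime errors are time limits, one likely culprit is temp.index(num), which scans the whole index list on every query. A minimal sketch of an alternative, assuming the queries are 0-based indexes as in the question: record the nearest same character to the left and to the right of every position in two passes, so each query becomes a constant-time lookup. The name closest_precomputed is made up for this illustration.

```python
def closest_precomputed(s, queries):
    """Answer each query with the index of the nearest identical character, or -1."""
    n = len(s)
    left = [-1] * n    # left[i]: nearest index < i holding the same character
    right = [-1] * n   # right[i]: nearest index > i holding the same character

    last_seen = {}
    for i, ch in enumerate(s):
        left[i] = last_seen.get(ch, -1)
        last_seen[ch] = i

    last_seen = {}
    for i in range(n - 1, -1, -1):
        right[i] = last_seen.get(s[i], -1)
        last_seen[s[i]] = i

    res = []
    for q in queries:
        if not 0 <= q < n:          # out-of-range query
            res.append(-1)
            continue
        l, r = left[q], right[q]
        if l == -1 or r == -1:      # at most one side has a match
            res.append(max(l, r))
        elif q - l <= r - q:        # tie goes to the smaller index
            res.append(l)
        else:
            res.append(r)
    return res

print(closest_precomputed('youyouy', [3]))
```

The preprocessing is linear in the length of the string, so the total work is proportional to len(s) plus the number of queries.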

Related

Recursive Decompression of Strings

I'm trying to decompress strings using recursion. For example, the input:
3[b3[a]]
should output:
baaabaaabaaa
but I get:
baaaabaaaabaaaabbaaaabaaaabaaaaa
I have the following code but it is clearly off. The first find_end function works as intended. I am absolutely new to using recursion and any help understanding / tracking where the extra letters come from or any general tips to help me understand this really cool methodology would be greatly appreciated.
def find_end(original, start, level):
    if original[start] != "[":
        message = "ERROR in find_error, must start with [:", original[start:]
        raise ValueError(message)
    indent = level * " "
    index = start + 1
    count = 1
    while count != 0 and index < len(original):
        if original[index] == "[":
            count += 1
        elif original[index] == "]":
            count -= 1
        index += 1
    if count != 0:
        message = "ERROR in find_error, mismatched brackets:", original[start:]
        raise ValueError(message)
    return index - 1
def decompress(original, level):
    # set the result to an empty string
    result = ""
    # for any character in the string we have not looked at yet
    for i in range(len(original)):
        # if the character at the current index is a digit
        if original[i].isnumeric():
            # the character of the current index is the number of repetitions needed
            repititions = int(original[i])
            # start = the next index containing the '[' character
            x = 0
            while x < (len(original)):
                if original[x].isnumeric():
                    start = x + 1
                    x = len(original)
                else:
                    x += 1
            # last = the index of the matching ']'
            last = find_end(original, start, level)
            # calculate a substring using `original[start + 1:last]
            sub_original = original[start + 1 : last]
            # RECURSIVELY call decompress with the substring
            # sub = decompress(original, level + 1)
            # concatenate the result of the recursive call times the number of repetitions needed to the result
            result += decompress(sub_original, level + 1) * repititions
            # set the current index to the index of the matching ']'
            i = last
        # else
        else:
            # concatenate the letter at the current index to the result
            if original[i] != "[" and original[i] != "]":
                result += original[i]
    # return the result
    return result
def main():
    passed = True
    ORIGINAL = 0
    EXPECTED = 1
    # The test cases
    provided = [
        ("3[b]", "bbb"),
        ("3[b3[a]]", "baaabaaabaaa"),
        ("3[b2[ca]]", "bcacabcacabcaca"),
        ("5[a3[b]1[ab]]", "abbbababbbababbbababbbababbbab"),
    ]
    # Run the provided tests cases
    for t in provided:
        actual = decompress(t[ORIGINAL], 0)
        if actual != t[EXPECTED]:
            print("Error decompressing:", t[ORIGINAL])
            print(" Expected:", t[EXPECTED])
            print(" Actual: ", actual)
            print()
            passed = False
    # print that all the tests passed
    if passed:
        print("All tests passed")
if __name__ == '__main__':
    main()
From what I gathered from your code, it probably gives the wrong result because of the approach you've taken to find the last matching closing brace at a given level (I'm not 100% sure, the code was a lot). However, I can suggest a cleaner approach using stacks (almost similar to DFS, without the complications):
def decomp(s):
    stack = []
    for i in s:
        if i.isalnum():
            stack.append(i)
        elif i == "]":
            temp = stack.pop()
            count = stack.pop()
            if count.isnumeric():
                stack.append(int(count)*temp)
            else:
                stack.append(count+temp)
    for i in range(len(stack)-2, -1, -1):
        if stack[i].isnumeric():
            stack[i] = int(stack[i])*stack[i+1]
        else:
            stack[i] += stack[i+1]
    return stack[0]
print(decomp("3[b]")) # bbb
print(decomp("3[b3[a]]")) # baaabaaabaaa
print(decomp("3[b2[ca]]")) # bcacabcacabcaca
print(decomp("5[a3[b]1[ab]]")) # abbbababbbababbbababbbababbbab
This works on a simple observation: rather than evaluating a substring as soon as you read a [, evaluate it after encountering the matching ]. That allows you to build the result AFTER the pieces have been evaluated individually. (This is similar to prefix/postfix expression evaluation.)
(You can add error checking to this as well, if you wish. It would be easier to check that the string is well formed in one pass and evaluate it in another pass, rather than doing both in one go.)
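The separate checking pass suggested above could look something like this. It is only a sketch under my own assumptions about what counts as valid input (letters, ASCII digits and square brackets, with every '[' directly preceded by its repeat count); the function name is_well_formed is hypothetical.

```python
DIGITS = set("0123456789")

def is_well_formed(text):
    """Return True if brackets are balanced and every '[' directly follows a count."""
    depth = 0
    prev = ""
    for ch in text:
        # a repeat count must be followed by more digits or by '['
        if prev in DIGITS and not (ch in DIGITS or ch == "["):
            return False
        if ch == "[":
            if prev not in DIGITS:
                return False        # group without a repeat count
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False        # closing bracket without an opener
        elif not (ch in DIGITS or ch.isalpha()):
            return False            # unexpected character
        prev = ch
    # everything opened must be closed, and no count may be left dangling
    return depth == 0 and prev not in DIGITS

print(is_well_formed("3[b3[a]]"), is_well_formed("3[b"))
```

Once the input has passed a check like this, the evaluation pass does not need any error handling of its own.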
Here is a solution based on a similar idea:
We go through the string, pushing everything onto a stack until we find a ']'. Then we pop back to the matching '[', collecting everything we take off, read the number in front of it, multiply, and push the result back onto the stack.
It is much less costly because we do not concatenate strings but work with lists.
Note: the repeat count can't be more than 9, because we parse it as a single-character string.
def decompress(string):
    stack = []
    letters = []
    for i in string:
        if i != ']':
            stack.append(i)
        elif i == ']':
            letter = stack.pop()
            while letter != '[':
                letters.append(letter)
                letter = stack.pop()
            word = ''.join(letters[::-1])
            letters = []
            stack.append(''.join([word for j in range(int(stack.pop()))]))
    return ''.join(stack)
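The single-digit limitation noted above can be lifted by collecting digits until the '[' arrives. This is a sketch of that variation, not the answerer's code; it assumes well-formed input, and decompress_multi is a name I made up.

```python
def decompress_multi(text):
    """Expand k[...] groups, where the count k may have several digits."""
    counts = []     # repeat counts of the groups that are currently open
    pieces = [[]]   # pieces[-1] collects the output of the innermost open group
    number = ""     # digits of the count read so far
    for ch in text:
        if ch.isdigit():
            number += ch
        elif ch == "[":
            counts.append(int(number))
            number = ""
            pieces.append([])
        elif ch == "]":
            inner = "".join(pieces.pop())
            pieces[-1].append(inner * counts.pop())
        else:
            pieces[-1].append(ch)
    return "".join(pieces[0])

print(decompress_multi("3[b3[a]]"))
print(decompress_multi("12[a]"))
```

Like the version above, it collects list items and joins them once per group rather than growing a string one character at a time.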

How to find the most amount of shared characters in two strings? (Python)

yamxxopd
yndfyamxx
Output: 5
I am not quite sure how to find the length of the longest run of characters shared between two strings. For example, for the strings above, the longest shared run is "yamxx", which is 5 characters long.
"xx" would not be a solution because it is not the longest shared run. In this case the longest is "yamxx", which is 5 characters long, so the output would be 5.
I am quite new to Python and Stack Overflow, so any help would be much appreciated!
Note: the characters must appear in the same order in both strings.
Here is a simple, efficient solution using dynamic programming.
def longest_substring(X, Y):
    m,n = len(X), len(Y)
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    print (result )
longest_substring("abcd", "arcd") # prints 2
longest_substring("yammxdj", "nhjdyammx") # prints 5
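The table above only ever reads the previous row, so the same recurrence can be run with two rows, and it is easy to return the substring itself instead of printing its length. A sketch along those lines (the function name longest_common_substring is my own, not from the answer):

```python
def longest_common_substring(x, y):
    """Return the longest run of characters that appears contiguously in both x and y."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)   # common-suffix lengths for the previous character of x
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    # best_end is the slice end in x of the best match found
    return x[best_end - best_len:best_end]

match = longest_common_substring("yamxxopd", "yndfyamxx")
print(match, len(match))
```

The running time is still proportional to len(x) * len(y), but the memory drops to two rows.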
This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1                                 # Let's start with the longest possible sub-string
while (not found) and sub_length:                # Loop, over decreasing lengths of sub-string
    for start in range(l_1 - sub_length + 1):    # Loop, over all start-positions of sub-string
        sub_str = s_1[start:(start+sub_length)]  # Get the sub-string at that start-position
        if sub_str in s_2:                       # If found a match for the sub-string, in s_2
            found = True                         # Stop trying with smaller lengths of sub-string
            break                                # Stop trying with this length of sub-string
    else:                                        # If no matches found for this length of sub-string
        sub_length -= 1                          # Let's try a smaller length for the sub-strings
print (f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5
s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# creating and initializing a string without repetition
s = ""
for x in s1:
    if x not in s:
        s = s + x
for x in s:
    if x in s2:
        counter = counter + 1
# display the number of distinct characters of s1 that also occur in s2
# (this ignores order and adjacency; it only happens to equal the expected answer here)
print(counter) # display 5

Infinite string

We are given N words, each of length at most 50. All words consist of lowercase letters and digits, and we concatenate all N words to form a bigger string A. An infinite string S is built by performing infinitely many steps on A recursively: in the ith step, A is concatenated with '$' i times, followed by the reverse of A. E.g., let N be 3 and the words be '1', '2' and '3'. After concatenating we get A = 123, the reverse of A is 321, and after the first recursion it will be
A = 123$321; after the second recursion it will be A = 123$321$$123$321, and so on. The infinite string thus obtained is S. Now, after the ith recursion, we have to find the character at a given index k. The number of recursions can be as large as pow(10,4), N can be as large as pow(10,4), and the length of each word is at most 50, so in the worst case our starting string can have a length of 5*(10**5), which is huge, so recursing and concatenating the strings won't work.
What I came up with is that the string is a palindrome after the 1st recursion, so if I can calculate the position of '$'*i, I can calculate any index, since the string before and after it is a palindrome. I came up with a pattern that
looks like this:
string='123'
k=len(string)
recursion=100
lis=[]
for i in range(1,recursion+1):
    x=(2**(i-1))
    y=x*(k+1)+(x-i)
    lis.append(y)
print(lis[:10])
Output:
[4, 8, 17, 36, 75, 154, 313, 632, 1271, 2550]
Now I have two problems with it. First, I also want to add the positions of the adjacent '$' characters to the list: at position 8, which is after the 2nd recursion, there is (recursion-1) = 1 more '$' at position 9, and likewise for position 17, which is the 3rd recursion, there are (3-1) = 2 more '$' at positions 18 and 19. This continues until the ith recursion, and for that I would have to insert a while loop, which would make my algorithm exceed the time limit (TLE):
string='123'
k=len(string)
recursion=100
lis=[]
for i in range(1,recursion+1):
    x=(2**(i-1))
    y=x*(k+1)+(x-i)
    lis.append(y)
    count=1
    while(count<i):
        y=y+1
        lis.append(y)
        count+=1
print(lis[:10])
Output: [4, 8, 9, 17, 18, 19, 36, 37, 38, 39]
The idea behind finding the position of $ is that the string before and after it is a palindrome: if the index of $ is odd, the element before and after it would be the last element of the string, and if it is even, the element before and after it would be the first element of the string.
The number of dollar signs that S will have in each group of them follows the following sequence:
1 2 1 3 1 2 1 4 1 2 1 ...
This corresponds to the number of trailing zeroes that i has in its binary representation, plus one:
bin(i) | dollar signs
--------+-------------
00001 | 1
00010 | 2
00011 | 1
00100 | 3
00101 | 1
00110 | 2
... ...
With that information you can use a loop that subtracts from k the size of the original words and then subtracts the number of dollars according to the above observation. This way you can detect whether k points at a dollar or within a word.
Once k has been "normalised" to an index within the limits of the original total words length, there only remains a check to see whether the characters are in their normal order or reversed. This depends on the number of iterations done in the above loop, and corresponds to i, i.e. whether it is odd or even.
This leads to this code:
def getCharAt(words, k):
    size = sum([len(word) for word in words]) # sum up the word sizes
    i = 0
    while k >= size:
        i += 1
        # Determine number of dollars: corresponds to one more than the
        # number of trailing zeroes in the binary representation of i
        b = bin(i)
        dollars = len(b) - b.rindex("1")
        k -= size + dollars
        if k < 0:
            return '$'
    if i%2: # if i is odd, then look in reversed order
        k = size - 1 - k
    # Get the character at the k-th index
    for word in words:
        if k < len(word):
            return word[k]
        k -= len(word)
You would call it like so:
print (getCharAt(['1','2','3'], 13)) # outputs 3
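For small inputs, the dollar-group observation can be checked against a brute-force build of the string. The following is only a sanity-check sketch with names of my own choosing (build_prefix, dollars_in_group); it also shows a bit trick, i & -i, as an alternative way to count trailing zeroes, which is not what the code above uses.

```python
import re

def build_prefix(a, steps):
    """Build the string after `steps` recursion steps by brute force (small inputs only)."""
    s = a
    for i in range(1, steps + 1):
        s = s + "$" * i + s[::-1]
    return s

def dollars_in_group(i):
    """Trailing zeroes of i, plus one: i & -i isolates the lowest set bit of i."""
    return (i & -i).bit_length()

prefix = build_prefix("123", 3)
# sizes of the consecutive runs of '$' found in the brute-force string
group_sizes = [len(g) for g in re.findall(r"\$+", prefix)]
predicted = [dollars_in_group(i) for i in range(1, len(group_sizes) + 1)]
print(prefix)
print(group_sizes == predicted)
```

The brute-force string doubles in length with every step, so this is only useful for checking a handful of steps.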
Generator Version
When you need to request multiple characters like that, it might be more interesting to create a generator, which just keeps producing the next character as long as you keep iterating:
def getCharacters(words):
    i = 0
    while True:
        i += 1
        if i%2:
            for word in words:
                yield from word
        else:
            for word in reversed(words):
                yield from reversed(word)
        b = bin(i)
        dollars = len(b) - b.rindex("1")
        yield from "$" * dollars
If for instance you want the first 80 characters from the infinite string that would be built from "a", "b" and "cd", then call it like this:
import itertools
print ("".join(itertools.islice(getCharacters(['a', 'b', 'cd']), 80)))
Output:
abcd$dcba$$abcd$dcba$$$abcd$dcba$$abcd$dcba$$$$abcd$dcba$$abcd$dcba$$$abcd$dcba$
Here is my solution to the problem (indexing starts at 1 for findIndex). I basically count recursively to find the value of the element at findIndex.
def findInd(k,n,findIndex,orientation):
    temp = k # no. of characters covered.
    tempRec = n # no. of dollars to be added
    bool = True # keeps track of if dollar or reverse of string is to be added.
    while temp < findIndex:
        if bool:
            temp += tempRec
            tempRec += 1
            bool = not bool
        else:
            temp += temp - (tempRec - 1)
            bool = not bool
    # print(temp,findIndex)
    if bool:
        if findIndex <= k:
            if orientation: # checks if string must be reversed.
                return A[findIndex - 1]
            else:
                return A[::-1][findIndex - 1] # the string reverses when there is a single dollar so this is necessary
        else:
            # integer division (//) keeps the index an int on Python 3 as well
            if tempRec-1 == 1:
                return findInd(k,1,findIndex - (temp+tempRec-1)//2,False) # we send a false for orientation as we want a reverse in case we encounter a single dollar sign.
            else:
                return findInd(k,1,findIndex - (temp+tempRec-1)//2,True)
    else:
        return "$"
A = "123" # change to suit your need
findIndex = 24 # the index to be found # change to suit your need
k = len(A) # length of the string.
print(findInd(k,1,findIndex,True))
I think this will also satisfy your time constraint, as I do not go through each element.

extract substring pattern

I have a long file with about 1200 sequences:
>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
I want to extract every pattern that has a cysteine (C) in the middle, preceded by five characters and followed by five more, such as xxxxxCxxxxx.
The output should look like this:
QDIQLCGMGIL
ILPEHCIIDIT
TISDNCVVIFS
FSKTSCSYCTM
This program only gives the positions of C. It does not work the way I want:
pos=[]
def find(ch,string1):
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
            return pos
z=find('C','AWERQRTCWERTYCTAAAACTTCTTT')
print z
You need to return outside the loop. You are returning on the first match, so you only ever get a single index in your list:
def find(ch,string1):
    pos = []
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
    return pos # outside
You can also use enumerate with a list comp in place of your range logic:
def indexes(ch, s1):
    return [index for index, char in enumerate(s1) if char == ch and 5 <= index <= len(s1) - 6]
Each index in the list comp is the character index and each char is the actual character so we keep each index where char is equal to ch.
If you want the five chars that are both sides:
In [24]: s="CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP"
In [25]: inds = indexes("C",s)
In [26]: [s[i-5:i+6] for i in inds]
Out[26]: ['QDIQLCGMGIL', 'ILPEHCIIDIT']
I added checking the index as we obviously cannot get five chars before C if the index is < 5 and the same from the end.
You can do it all in a single function, yielding a slice when you find a match:
def find(ch, s):
    ln = len(s)
    for i, char in enumerate(s):
        if ch == char and 5 <= i <= ln - 6:
            yield s[i- 5:i + 6]
Where presuming the data in your question is actually two lines from your file like:
s=""">3fm8|A|A0JLQ2CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDALYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCY"""
Running:
for line in s.splitlines():
    print(list(find("C" ,line)))
would output:
['0JLQ2CFLVNL', 'QDIQLCGMGIL', 'ILPEHCIIDIT']
['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
This gives six matches, not four as your expected output suggests, so I presume you did not include all possible matches.
You can also speed up the code using str.find, starting at the last match index + 1 for each subsequent match
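def find(ch, s):
    ln, i = len(s) - 6, s.find(ch)
    while 0 <= i <= ln:  # stop when there is no match left or too few chars remain after it
        if i >= 5:  # skip (rather than stop at) a match with fewer than five chars before it
            yield s[i - 5:i + 6]
        i = s.find(ch, i + 1)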
Which will give the same output. Of course if the strings cannot overlap you can start looking for the next match much further in the string each time.
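In the file shown in the question, each sequence sits under a '>' header and is wrapped over several lines, so a window can straddle a line break. A rough sketch of joining the lines per record before searching, assuming that FASTA-like layout; read_fasta and windows are hypothetical helper names, and the flank parameter is my own addition.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining the wrapped sequence lines of each record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def windows(seq, ch="C", flank=5):
    """Yield every slice with `flank` characters on each side of an occurrence of ch."""
    i = seq.find(ch)
    while i != -1:
        if flank <= i <= len(seq) - flank - 1:
            yield seq[i - flank:i + flank + 1]
        i = seq.find(ch, i + 1)

sample = [">record1", "AAAAAC", "BBBBB", ">record2", "CAAAA"]
for name, seq in read_fasta(sample):
    print(name, list(windows(seq)))
```

With a real file you would pass the open file object instead of the sample list, since iterating over a file yields its lines.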
My solution is based on regex, and shows all possible solutions using regex and a while loop. Thanks to @Smac89 for improving it by transforming it into a generator:
import re
string = """CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""
# Generator
def find_cysteine2(string):
    # Create a loop that will utilize regex multiple times
    # in order to capture matches within groups
    while True:
        # Find a match
        data = re.search(r'(\w{5}C\w{5})',string)
        # If match exists, let's collect the data
        if data:
            # Collect the string
            yield data.group(1)
            # Shrink the string to not include
            # the previous result
            location = data.start() + 1
            string = string[location:]
        # If there are no matches, stop the loop
        else:
            break
print [x for x in find_cysteine2(string)]
# ['QDIQLCGMGIL', 'ILPEHCIIDIT', 'TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
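The repeated search-and-shrink loop can also be written as a single pass by wrapping the pattern in a lookahead, which lets matches overlap. This is a sketch of that alternative for Python 3, not the answerer's code; cysteine_windows and its flank parameter are names I introduced.

```python
import re

def cysteine_windows(seq, flank=5):
    """Return every window of `flank` characters, a C, and `flank` more characters."""
    # The lookahead (?=...) matches without consuming text, so overlapping
    # windows are all found; the inner group captures the window itself.
    pattern = r"(?=(\w{%d}C\w{%d}))" % (flank, flank)
    return re.findall(pattern, seq)

print(cysteine_windows("FSKTSCSYCTMAKK"))
```

Because nothing is sliced off the string between matches, the sequence is not copied repeatedly.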

Why won't my for loop work? (Python)

Yes, this is homework. I'm just trying to understand why this doesn't seem to work.
I'm trying to find the longest substring in a string that's in alphabetical order. I make a list of random letters, and say the length is 19. When I run my code, it prints out indices 0 through 17. (I know this happens because I subtract 1 from the range) However, when I leave off that -1, it tells me the "string index is out of range." Why does that happen?
s = 'cntniymrmbhfinjttbiuqhib'
sub = ''
longest = []
for i in range(len(s) - 1):
    if s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif s[i-1] <= s[i]:
        sub += s[i]
        longest.append(sub)
        sub = ' '
    else:
        sub = ' '
print(longest)
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))
I've also tried a few other methods
If I just say:
for i in s:
it throws an error, saying "string indices must be integers, not str." This seems like a much simpler way to iterate through the string, but how would I compare individual letters this way?
This is Python 2.7 by the way.
Edit: I'm sure my if/elif statements could be improved but that's the first thing I could think of. I can come back to that later if need be.
The issue is the line if s[i] <= s[i+1]:. If i=18 (the final iteration of your loop without the -1 in it), then i+1=19 is out of bounds.
Note that the line elif s[i-1] <= s[i]: is also probably not doing what you want it to. When i=0 we have i-1 = -1. Python allows negative indices to mean counting from the back of the indexed object, so s[-1] is the last character in the string (s[-2] would be the second to last, etc.).
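Both effects are easy to see on a short string. A small illustrative sketch (written for Python 3, unlike the 2.7 code in the question; the variable names are mine):

```python
s = "abcde"

# Indexing one past the end raises IndexError, which is what s[i+1] does
# on the final iteration when the loop runs over range(len(s)).
try:
    s[len(s)]
    past_end_ok = True
except IndexError:
    past_end_ok = False

# A negative index does not raise: it silently wraps around to the end.
wrapped = s[0 - 1]

# Comparing neighbours over range(len(s) - 1) never leaves the string.
pairs = [(s[i], s[i + 1]) for i in range(len(s) - 1)]
print(past_end_ok, wrapped, pairs)
```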
A simpler way to get the previous and next character is to use zip whilst slicing the string to count from the first and second characters respectively.
zip works like this if you haven't seen it before:
>>> for char, x in zip(['a','b','c'], [1,2,3,4]):
...     print char, x
...
a 1
b 2
c 3
So you can just do:
for previous_char, char, next_char in zip(string, string[1:], string[2:]):
To iterate over all the triples of characters without messing up at the ends.
However there is a much simpler way to do this. Instead of comparing the current character in the string to other characters in the string you should compare it with the last character in the current string of alphabetised characters for example:
s = "abcdabcdefa"
longest = [s[0]]
current = [s[0]]
for char in s[1:]:
    if char >= current[-1]: # current[-1] == current[len(current)-1]
        current.append(char)
    else:
        current=[char]
    if len(longest) < len(current):
        longest = current
print longest
This avoids having to do any fancy indexing.
"I'm sure my if/elif statements could be improved but that's the first thing I could think of. I can come back to that later if need be."
@or1426's solution creates a list of the currently longest sorted sequence and copies it over to longest whenever a longer sequence is found. This creates a new list every time a longer sequence is found, and appends to a list for every character. This is actually very fast in Python, but see below.
@Deej's solution keeps the currently longest sorted sequence in a string variable, and every time a longer substring is found (even if it's a continuation of the current sequence) the substring is saved to a list. The list ends up having all sorted substrings of the original string, and the longest is found by using a call to max.
Here is a faster solution that only keeps track of the indices of the currently largest sequence, and only makes changes to longest when it finds a character that is not in sorted order:
def bjorn4(s):
    # we start out with s[0] being the longest sorted substring (LSS)
    longest = (0, 1)  # the slice-indices of the longest sorted substring
    longlen = 1       # the length of longest
    cur_start = 0     # the slice-indices of the *current* LSS
    cur_stop = 1
    for ch in s[1:]:       # skip the first ch since we handled it above
        end = cur_stop-1   # cur_stop is a slice index, subtract one to get the last ch in the LSS
        if ch >= s[end]:   # if ch >= then we're still in sorted order..
            cur_stop += 1  # just extend the current LSS by one
        else:
            # we found a ch that is not in sorted order
            if longlen < (cur_stop-cur_start):
                # if the current LSS is longer than longest, then..
                longest = (cur_start, cur_stop)    # store current in longest
                longlen = longest[1] - longest[0]  # precompute longlen
            # since we can't add ch to the current LSS we must create a new current around ch
            cur_start, cur_stop = cur_stop, cur_stop+1
    # if the LSS is at the end, then we'll not enter the else part above, so
    # check for it after the for loop
    if longlen < (cur_stop - cur_start):
        longest = (cur_start, cur_stop)
    return s[longest[0]:longest[1]]
How much faster? It's almost twice as fast as or1426 and three times faster than deej. As always, that depends on your input. The more chunks of sorted substrings that exist, the faster the above algorithm will be compared to the others. E.g. on an input string of length 100000 containing alternating 100 random chars and 100 in-order chars, I get:
bjorn4: 2.4350001812
or1426: 3.84699988365
deej : 7.13800001144
if I change it to alternating 1000 random chars and 1000 sorted chars, then I get:
bjorn4: 23.129999876
or1426: 38.8380000591
deej : MemoryError
Update:
Here is a further optimized version of my algorithm, with the comparison code:
import random, string
from itertools import izip_longest
import timeit
def _randstr(n):
    ls = []
    for i in range(n):
        ls.append(random.choice(string.lowercase))
    return ''.join(ls)
def _sortstr(n):
    return ''.join(sorted(_randstr(n)))
def badstr(nish):
    res = ""
    for i in range(nish):
        res += _sortstr(i)
        if len(res) >= nish:
            break
    return res
def achampion(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best
def bjorn(s):
    cur_start = 0
    cur_stop = 1
    long_start = cur_start
    long_end = cur_stop
    for ch in s[1:]:
        if ch < s[cur_stop-1]:
            if (long_end-long_start) < (cur_stop-cur_start):
                long_start = cur_start
                long_end = cur_stop
            cur_start = cur_stop
        cur_stop += 1
    if (long_end-long_start) < (cur_stop-cur_start):
        return s[cur_start:cur_stop]
    return s[long_start:long_end]
def or1426(s):
    longest = [s[0]]
    current = [s[0]]
    for char in s[1:]:
        if char >= current[-1]: # current[-1] == current[len(current)-1]
            current.append(char)
        else:
            current=[char]
        if len(longest) < len(current):
            longest = current
    return ''.join(longest)
if __name__ == "__main__":
    print 'achampion:', round(min(timeit.Timer(
        "achampion(rstr)",
        setup="gc.enable();from __main__ import achampion, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)
    print 'bjorn:', round(min(timeit.Timer(
        "bjorn(rstr)",
        setup="gc.enable();from __main__ import bjorn, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)
    print 'or1426:', round(min(timeit.Timer(
        "or1426(rstr)",
        setup="gc.enable();from __main__ import or1426, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)
With output:
achampion: 0.274
bjorn: 0.253
or1426: 0.486
changing the data to be random:
achampion: 0.350
bjorn: 0.337
or1426: 0.565
and sorted:
achampion: 0.262
bjorn: 0.245
or1426: 0.503
"no, no, it's not dead, it's resting"
Now that Deej has an answer, I feel more comfortable posting answers to homework.
Just reordering @Deej's logic a little, you can simplify to:
sub = ''
longest = []
for i in range(len(s)-1): # -1 simplifies the if condition
    sub += s[i]
    if s[i] <= s[i+1]:
        continue # Keep adding to sub until condition fails
    longest.append(sub) # Only add to longest when condition fails
    sub = ''
longest.append(sub + s[-1]) # The loop never reaches the last character, so close the final run here
max(longest, key=len)
But as mentioned by @thebjorn, this has the issue of keeping every ascending partition in a list (in memory). You could fix this by using a generator; I only put the rest here for instructional purposes:
def alpha_partition(s):
    sub = ''
    for i in range(len(s)-1):
        sub += s[i]
        if s[i] <= s[i+1]:
            continue
        yield sub
        sub = ''
    yield sub + s[-1] # the loop never reaches the last character, so close the final run here
max(alpha_partition(s), key=len)
This certainly won't be the fastest solution (string construction and indexing), but it's quite simple to change: use zip to avoid indexing into the string, and indexes to avoid string construction and addition:
from itertools import izip_longest # For py3.X use zip_longest
def alpha_partition(s):
    start = end = 0
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        yield s[start:end]
        start = end
max(alpha_partition(s), key=len)
This should operate pretty efficiently and be only slightly slower than the iterative indexing approach from @thebjorn, due to the generator overhead.
Using s*100
alpha_partition(): 1000 loops, best of 3: 448 µs per loop
@thebjorn: 1000 loops, best of 3: 389 µs per loop
For reference turning the generator into an iterative function:
from itertools import izip_longest # For py3.X use zip_longest
def best_alpha_partition(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best
best_alpha_partition(s)
best_alpha_partition(): 1000 loops, best of 3: 306 µs per loop
I personally prefer the generator form because you would use exactly the same generator for finding the minimum, the top 5, etc. very reusable vs. the iterative function which only does one thing.
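To illustrate the reuse, here is a sketch of the same generator driving three different queries. It is written for Python 3 (zip_longest instead of izip_longest), and the use of heapq.nlargest for the "top N" case is my own choice of tool, not something from the timings above.

```python
import heapq
from itertools import zip_longest

def alpha_partition(s):
    """Yield each maximal run of s whose characters are in non-decreasing order."""
    start = end = 0
    for c1, c2 in zip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue          # still ascending, keep extending the run
        yield s[start:end]
        start = end

s = "abcdabcdefa"
longest = max(alpha_partition(s), key=len)
shortest = min(alpha_partition(s), key=len)
top_two = heapq.nlargest(2, alpha_partition(s), key=len)
print(longest, shortest, top_two)
```

Each query creates a fresh generator, so none of them has to hold every partition in memory at once.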
ok, so after reading your responses and trying all kinds of different things, I finally came up with a solution that gets exactly what I need. It's not the prettiest code, but it works. I'm sure the solutions mentioned would work as well, however I couldn't figure them out. Here's what I did:
s = 'inaciaebganawfiaefc'
sub = ''
longest = []
for i in range(len(s)):
    if (i+1) < len(s) and s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif i >= 0 and s[i-1] <= s[i]:
        sub += s[i]
        longest.append(sub)
        sub = ''
    else:
        sub = ''
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))
