Longest sequence of consecutive letters

Longest sequence of consecutive letters - python

Suppose I have a string of lower case letters, e.g.
'ablccmdnneofffpg'
And my aim is to find the longest sequence of the consecutive numbers inside this string which in this case is:
'abcdefg'
The intuitive attempt is to find loop around each letter and obtain the longest sequence starting from that letter. One possible solution is
longest_length = 0
start = None
current_start = 0
while current_start < len(word) - longest_length:
current_length = 1
last_in_sequence = ord(word[current_start])
for i in range(current_start + 1, len(word)):
if ord(word[i]) - last_in_sequence == 1:
current_length += 1
last_in_sequence = ord(word[i])
if current_length > longest_length:
longest_length = current_length
start = current_start
while (current_start < len(word) - 1 and
ord(word[current_start + 1]) - ord(word[current_start]) == 1):
current_start += 1
current_start += 1
Are there any other ways of solving the problem with fewer lines, or even using some pythonic methods?

You could keep track of all subsequences of consecutive characters as seen in the string using a dictionary, and then take the one with the largest length.
Each subsequence is keyed by the next candidate in the alphabet so that once the anticipated candidate is reached in the string, it is used to update the value of the corresponding subsequence in the dictionary and added as a new dictionary value keyed by the next alphabet:
def longest_sequence(s):
d = {}
for x in s:
if x in d:
d[chr(ord(x)+1)] = d[x] + x
else:
d[chr(ord(x)+1)] = x
return max(d.values(), key=len)
print(longest_sequence('ablccmdnneofffpg'))
# abcdefg
print(longest_sequence('ba'))
# b
print(longest_sequence('sblccmtdnneofffpgtuyvgmmwwwtxjyuuz'))
# stuvwxyz

A solution that trades memory for (some) time:
It keeps track of all sequences seen and then at the end prints the longest found (although there could be more than one).
from contextlib import suppress
class Sequence:
def __init__(self, letters=''):
self.letters = letters
self.last = self._next_letter(letters[-1:])
def append(self, letter):
self.letters += letter
self.last = self._next_letter(letter)
def _next_letter(self, letter):
with suppress(TypeError):
return chr(ord(letter) + 1)
return 'a'
def __repr__(self):
return 'Sequence({}, {})'.format(repr(self.letters),
repr(self.last))
word = 'ablccmdnneofffpg'
sequences = []
for letter in word:
for s in sequences:
if s.last == letter:
s.append(letter)
break
else:
sequences.append(Sequence(letters=letter))
sequences = list(sorted(sequences, key=lambda s: len(s.letters), reverse=True))
print(sequences[0].letters)

You are basically asking for the longest increasing subsequence, which is a well-studied problem. Have a look at the pseudo code in Wikipedia.

Similar to MosesKoledoye's solution, but only stores the lengthes for the ordinals of the chars and only builts the solution string in the end. This should therefore be a little more space-efficient:
def longest_seq(s):
d = {}
for c in s:
c, prev_c = ord(c), ord(c) - 1
d[c] = max(d.get(c, 0), d.pop(prev_c, 0) + 1)
c, l = max(d.items(), key=lambda i: i[1])
return ''.join(map(chr, range(c-l+1, c+1)))

Related

Code for consecutive strings works but can't pass random tests

In this problem, I'm given an array(list) strarr of strings and an integer k. My task is to return the first longest string consisting of k consecutive strings taken in the array. My code passed all the sample tests from CodeWars but can't seem to pass the random tests.
Here's the link to the problem.
I did it in two days. I found the max consecutively combined string first. Here's the code for that.
strarr = []
def longest_consec(strarr, k):
strarr.append('')
length = len(strarr)
cons_list = []
end = k
start = 0
freq = -length/2
final_string = []
largest = max(strarr, key=len, default='')
if k == 1:
return largest
elif 1 < k < length:
while(freq <= 1):
cons_list.append(strarr[start:end])
start += k-1
end += k-1
freq += 1
for index in cons_list:
final_string.append(''.join(index))
return max(final_string, key=len, default='')
else:
return ""
Since that didn't pass all the random tests, I compared the combined k strings on both sides of the single largest string. But, this way, the code doesn't account for the case when the single largest string is in the middle. Please help.
strarr = []
def longest_consec(strarr, k):
strarr.append('')
length = len(strarr)
largest = max(strarr, key=len, default='')
pos = int(strarr.index(largest))
if k == 1:
return largest
elif 1 < k < length:
prev_string = ''.join(strarr[pos+1-k:pos+1])
next_string = ''.join(strarr[pos:pos+k])
if len(prev_string) >= len(next_string):
res = prev_string
else:
res = next_string
return res
else:
return ""
print(longest_consec(["zone", "abigail", "theta", "form", "libe"], 2))

Let's start from the first statement of your function:
if k == 1:
while(p <= 1):
b.append(strarr[j:i])
j += 1
i += 1
p += 1
for w in b:
q.append(''.join(w))
return max(q, key=len)
Here q is finally equal strarr so you can shorten this code to:
if k == 1:
return max(strarr, key=len)
I see that second statement's condition checks if k value is between 1 and length of string array inclusive:
elif k > 1 and k <= 2*a:
...
If you want no errors remove equality symbol, last element of every array has index lesser than its length (equal exactly length of it minus 1).
Ceiling and division is not necessary in a definition, so you can shorten this:
a = ceil(len(strarr)/2)
into this:
a = len(strarr)
then your elif statement may look like below:
elif 1 < k < a: # Same as (k > 1 and k < a)
...
again, I see you want to concatenate (add) the longest string to k next strings using this code:
while(p <= 1):
b.append(strarr[j:i])
j += k-1
i += k-1
p += 1
for w in b:
q.append(''.join(w))
return max(q, key=len)
the more clearer way of doing this:
longest = max(strarr, key=len) # Longest string in array.
index = 0 # Index of the current item.
for string in strarr:
# If current string is equal the longest one ...
if string == longest:
# Join 'k' strings from current index (longest string index).
return ''.join(strarr[index:index + k])
index += 1 # Increase current index.
And the last statement which is:
elif k > 2*a or k<1:
return ""
if all previous statements failed then value is invalid so you can instead write:
return "" # Same as with else.
Now everything should work. I advice you learning the basics (especially lists, strings and slices), and please name your variables wisely so they are more readable.

You can try this as well
this has passed all the test cases on the platform you suggested.
def longest_consec(strarr, k):
i = 0
max_ = ""
res = ""
if (k<=0) or (k>len(strarr)):
return ""
while i<=(len(strarr)-k):
start = "".join(strarr[i:i+k])
max_ = max(max_, start, key=len)
if max_==start:
res=strarr[i:i+k]
i+=1
return max_
#output: ["zone", "abigail", "theta", "form", "libe", "zas", "theta", "abigail"], 2 -> abigailtheta
#output: ["zones", "abigail", "theta", "form", "libe", "zas", "theta", "abigail"],2 -> zonesabigail

Leetcode 5: Longes Palindrome Substring

I have been working on the LeetCode problem 5. Longest Palindromic Substring:
Given a string s, return the longest palindromic substring in s.
But I kept getting time limit exceeded on large test cases.
I used dynamic programming as follows:
dp[(i, j)] = True implies that s[i] to s[j] is a palindrome. So if s[i] == str[j] and dp[(i+1, j-1]) is set to True, that means S[i] to S[j] is also a palindrome.
How can I improve the performance of this implementation?
class Solution:
def longestPalindrome(self, s: str) -> str:
dp = {}
res = ""
for i in range(len(s)):
# single character is always a palindrome
dp[(i, i)] = True
res = s[i]
#fill in the table diagonally
for x in range(len(s) - 1):
i = 0
j = x + 1
while j <= len(s)-1:
if s[i] == s[j] and (j - i == 1 or dp[(i+1, j-1)] == True):
dp[(i, j)] = True
if(j-i+1) > len(res):
res = s[i:j+1]
else:
dp[(i, j)] = False
i += 1
j += 1
return res

I think the judging system for this problem is kind of too tight, it took some time to make it pass, improved version:
class Solution:
def longestPalindrome(self, s: str) -> str:
dp = {}
res = ""
for i in range(len(s)):
dp[(i, i)] = True
res = s[i]
for x in range(len(s)): # iterate till the end of the string
for i in range(x): # iterate up to the current state (less work) and for loop looks better here
if s[i] == s[x] and (dp.get((i + 1, x - 1), False) or x - i == 1):
dp[(i, x)] = True
if x - i + 1 > len(res):
res = s[i:x + 1]
return res

Here is another idea to improve the performance:
The nested loop will check over many cases where the DP value is already False for smaller ranges. We can avoid looking at large spans, by looking for palindromes from inside-out and stop extending the span as soon as it no longer is a palindrome. This process should be repeated at every offset in the source string, but this could still save some processing.
The inputs for which then most time is wasted, are those where there are lots of the same letters after each other, like "aaaaaaabcaaaaaaa". These lead to many iterations: each "a" or "aa" could be the center of a palindrome, but "growing" each of them is a waste of time. We should just consider all consecutive "a" together from the start and expand from there onwards.
You can specifically deal with these cases by first grouping consecutive letters which are the same. So the above example would be turned into 4 groups: a(7)b(1)c(1)a(7)
Then let each group in turn be taken as the center of a palindrome. For each group, "fan out" to potentially include one or more neighboring groups at both sides in "tandem". Continue fanning out until either the outside groups are not about the same letter, or they have a different group size. From that result you can derive what the largest palindrome is around that center. In particular, when the case is that the letters of the outer groups are the same, but not their sizes, you still include that letter at the outside of the palindrome, but with a repetition that corresponds to the least of these two mismatching group sizes.
Here is an implementation. I used named tuples to make it more readable:
from itertools import groupby
from collections import namedtuple
Group = namedtuple("Group", "letter,size,end")
class Solution:
def longestPalindrome(self, s: str) -> str:
longest = ""
x = 0
groups = [Group(group[0], len(group), x := x + len(group)) for group in
("".join(group[1]) for group in groupby(s))]
for i in range(len(groups)):
for j in range(0, min(i+1, len(groups) - i)):
if groups[i - j].letter != groups[i + j].letter:
break
left = groups[i - j]
right = groups[i + j]
if left.size != right.size:
break
size = right.end - (left.end - left.size) - abs(left.size - right.size)
if size > len(longest):
x = left.end - left.size + max(0, left.size - right.size)
longest = s[x:x+size]
return longest

Alternatively, you can try this approach, it seems to be faster than 96% Python submission.
def longestPalindrome(self, s: str) -> str:
N = len(s)
if N == 0:
return 0
max_len, start = 1, 0
for i in range(N):
df = i - max_len
if df >= 1 and s[df-1: i+1] == s[df-1: i+1][::-1]:
start = df - 1
max_len += 2
continue
if df >= 0 and s[df: i+1] == s[df: i+1][::-1]:
start= df
max_len += 1
return s[start: start + max_len]

If you want to improve the performance, you should create a variable for len(s) at the beginning of the function and use it. That way instead of calling len(s) 3 times, you would do it just once.
Also, I see no reason to create a class for this function. A simple function will outrun a class method, albeit very slightly.

count characters occurences in string

I want to find out how often does "reindeer" (in any order) come in a random string and what is the left over string after "reindeer" is removed. I need to preserve order of the left over string
So for example
"erindAeer" -> A (reindeer comes 1 time)
"ierndeBeCrerindAeer" -> ( 2 reindeers, left over is BCA)
I thought of sorting and removing "reindeer", but i need to preserve the order . What's a good way to do this?

We can replace those letters after knowing how many times they repeat, and Counter is convenient for counting elements.
from collections import Counter
def leftover(letter_set, string):
lcount, scount = Counter(letter_set), Counter(string)
repeat = min(scount[l] // lcount[l] for l in lcount)
for l in lcount:
string = string.replace(l, "", lcount[l] * repeat)
return f"{repeat} {letter_set}, left over is {string}"
print(leftover("reindeer", "ierndeBeCrerindAeer"))
print(leftover("reindeer", "ierndeBeCrerindAeere"))
print(leftover("reindeer", "ierndeBeCrerindAee"))
Output:
2 reindeer, left over is BCA
2 reindeer, left over is BCAe
1 reindeer, left over is BCerindAee

Here is a rather simple approach using collections.Counter:
from collections import Counter
def purge(pattern, string):
scount, pcount = Counter(string), Counter(pattern)
cnt = min(scount[x] // pcount[x] for x in pcount)
scount.subtract(pattern * cnt)
return cnt, "".join(scount.subtract(c) or c for c in string if scount[c])
>>> purge("reindeer", "ierndeBeCrerindAeer")
(2, 'BCA')

Here is the code in Python:
def find_reindeers(s):
rmap = {}
for x in "reindeer":
if x not in rmap:
rmap[x] = 0
rmap[x] += 1
hmap = {key: 0 for key in "reindeer"}
for x in s:
if x in "reindeer":
hmap[x] += 1
total_occ = min([hmap[x]//rmap[x] for x in "reindeer"])
left_over = ""
print(hmap, rmap)
for x in s:
if (x in "reindeer" and hmap[x] > total_occ * rmap[x]) or (x not in "reindeer"):
left_over += x
return total_occ, left_over
print(find_reindeers("ierndeBeCrerindAeer"))
Output for ierndeBeCrerindAeer:
(2, "BCA")

You can do it by using count and replace string function:
import queue
word = "reindeer"
given_string = "ierndeBeCrerindAeer"
new_string = ""
counter = 0
tmp = ""
letters = queue.Queue()
for i in given_string:
if not i in word:
new_string += i
else:
letters.put(i)
x = 0
while x < len(word):
while not letters.empty():
j = letters.get()
if j == word[x]:
tmp += j
# print(tmp)
break
else:
letters.put(j)
x = x +1
if tmp == word:
counter += 1
tmp = ""
x = 0
print(f"The word {word} occurs {counter} times in the string {given_string}.")
print("The left over word is",new_string)
Output will be:
The word reindeer occurs 2 times in the string ierndeBeCrerindAeer.
The left over word is BCA
It's easy to use queue here so that we don't repeat the elements that are already present or found.
Hope this answers your question, Thank you!

Python Optimization : Find the most occured sequence of 4 letters inside a 1000 letters string randomly generated

I'm here to ask help about my program.
I realise a program that raison d'être is to find the most occured four letters string on a x letters bigger string which have been generated randomly.
As example, if you would know the most occured sequence of four letters in 'abcdeabcdef' it's pretty easy to understand that is 'abcd' so the program will return this.
Unfortunately, my program works very slow, I mean, It take 119.7 seconds, for analyze all possibilities and display the results for only a 1000 letters string.
This is my program, right now :
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
string = ''
for _ in range(1000):
string += str(chars[random.randint(0, 25)])
print(string)
number = []
for ____ in range(0,26):
print(____)
for ___ in range(0,26):
for __ in range(0, 26):
for _ in range(0, 26):
test = chars[____] + chars[___] + chars[__] + chars[_]
print('trying :',test, end = ' ')
number.append(0)
for i in range(len(string) -3):
if string[i: i+4] == test:
number[len(number) -1] += 1
print('>> finished')
_max = max(number)
for i in range(len(number)-1):
if number[i] == _max :
j, k, l, m = i, 0, 0, 0
while j > 25:
j -= 26
k += 1
while k > 25:
k -= 26
l += 1
while l > 25:
l -= 26
m += 1
Result = chars[m] + chars[l] + chars[k] + chars[j]
print(str(Result),'occured',_max, 'times' )
I think there is ways to optimize it but at my level, I really don't know. Maybe the structure itself is not the best. Hope you'll gonna help me :D

You only need to loop through your list once to count the 4-letter sequences. You are currently looping n*n*n*n. You can use zip to make a four letter sequence that collects the 997 substrings, then use Counter to count them:
from collections import Counter
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
s = "".join([chars[random.randint(0, 25)] for _ in range(1000)])
it = zip(s, s[1:], s[2:], s[3:])
counts = Counter(it)
counts.most_common(1)
Edit:
.most_common(x) returns a list of the x most common strings. counts.most_common(1) returns a single item list with the tuple of letters and number of times it occurred like; [(('a', 'b', 'c', 'd'), 2)]. So to get a string, just index into it and join():
''.join(counts.most_common(1)[0][0])

Even with your current approach of iterating through every possible 4-letter combination, you can speed up a lot by keeping a dictionary instead of a list, and testing whether the sequence occurs at all first before trying to count the occurrences:
counts = {}
for a in chars:
for b in chars:
for c in chars:
for d in chars:
test = a + b + c + d
print('trying :',test, end = ' ')
if test in s: # if it occurs at all
# then record how often it occurs
counts[test] = sum(1 for i in range(len(s)-4)
if test == s[i:i+4])
The multiple loops can be replaced with itertools.permutations, though this improves readability rather than performance:
length = 4
for sequence in itertools.permutations(chars, length):
test = "".join(sequence)
if test in s:
counts[test] = sum(1 for i in range(len(s)-length) if test == s[i:i+length])
You can then display the results like this:
_max = max(counts.values())
for k, v in counts.items():
if v == _max:
print(k, "occurred", _max, "times")
Provided that the string is shorter or around the same length as 26**4 characters, then it is much faster still to iterate through the string rather than through every combination:
length = 4
counts = {}
for i in range(len(s) - length):
sequence = s[i:i+length]
if sequence in counts:
counts[sequence] += 1
else:
counts[sequence] = 1
This is equivalent to the Counter approach already suggested.

Return the number of words over min length and Replace ones that arent with word + spaces

For any name whose length is less than the min_length, replace that item of the list with a new string containing the original name with the space(s) added to the right-hand side to achieve the minimum length
example: min_length = 5 /// 'dog' after the change = 'dog '
and also Return the amount of names that were originally over the min_length in the list
def pad_names(list_of_names, min_length):
'''(list of str, int) -> int
Return the number of names that are longer than the minimum length.
>>> pad_names(list_of_names, 5)
2
'''
new_list = []
counter = 0
for word in list_of_names:
if len(word) > min_length:
counter = counter + 1
for word in list_of_names:
if len(word) < min_length:
new_list.append(word.ljust(min_length))
else:
new_list.append(word)
return(counter)
I currently am getting 5 errors on this function:
all errors are #incorrect return value
test_02: pad_names_one_name_much_padding_changes_list
Cause: The list is not changed correctly for the case of 1 name needing padding
test_04: pad_names_one_name_needs_one_pad_changes_list
Cause: one name needs one pad changes list
test_10: pad_names_all_padded_by_3_changes_list
Cause:all padded by 3 changes list
test_12: pad_names_all_padded_different_amounts_changes_list
Cause: all padded different amounts changes list
test_20: pad_names_one_empty_name_list_changed
Cause: one empty name list changed
This function does not need to be efficient just needs to pass those tests without creating more problems

Keeping in spirit with how you've written this, I'm guessing you want to be modifying the list of names so the result can be checked (as well as returning the count)...
def test(names, minlen):
counter = 0
for idx, name in enumerate(names):
newname = name.ljust(minlen)
if newname != name:
names[idx] = newname
else:
counter += 1
return counter

Not sure I completely understand your question, but here is a fairly pythonic way to do what I think you want:
data = ['cat', 'snake', 'zebra']
# pad all strings with spaces to 5 chars minimum
# uses string.ljust for padding and a list comprehension
padded_data = [d.ljust( 5, ' ' ) for d in data]
# get the count of original elements whose length was 5 or great
ct = sum( len(d) >= 5 for d in data )

You mean this?
def pad_names(list_of_names, min_length):
counter = 0
for i, val in enumerate(list_of_names):
if len(val) >= min_length:
counter += 1
elif len(val) < min_length:
list_of_names[i] = val.ljust(min_length)
return counter
OR:
def pad_names(words, min_length):
i, counter = 0, 0
for val in words:
if len(val) >= min_length:
counter += 1
elif len(val) < min_length:
words[i] = val.ljust(min_length)
i += 1
return counter

string.ljust() is a quick and simple way to pad strings to a minimum length.
Here's how I would write the function:
def pad_names(list_of_names, min_length):
# count strings that meets the min_length requirement
count = sum(1 for s in list_of_names if len(s) > min_length)
# replace list with updated content, using ljust() to pad strings
list_of_names[:] = [s.ljust(min_length) for s in list_of_names]
return count # return num of strings that exceeds min_length
While succinct, that may not be the most efficient approach for large datasets since we're essentially creating a new list and copying it over the new one.
If performance is an issue, I'd go with a loop and only update the relevant entries.
def pad_names(list_of_names, min_length):
count = 0
for i, s in enumerate(list_of_names):
if len(s) > min_length: # should it be >= ??
count += 1 # increment counter
else:
list_of_names[i] = s.ljust(min_length) # update original list
return count

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Longest sequence of consecutive letters - python

You are basically asking for the longest increasing subsequence, which is a well-studied problem. Have a look at the pseudo code in Wikipedia.

Related

Code for consecutive strings works but can't pass random tests

Leetcode 5: Longes Palindrome Substring

count characters occurences in string

Python Optimization : Find the most occured sequence of 4 letters inside a 1000 letters string randomly generated

Return the number of words over min length and Replace ones that arent with word + spaces

Categories

Resources