Keep one of two consecutive chars in a string

I want to repeat a word n times in my function, but I want to eliminate the duplicated characters where the end of one copy overlaps the start of the next.
For example, repete("amanha", 2) = "amanhamanha".
My function:
def repete(palavra, n):
    a = []
    b = ""
    for n in range(0, n):
        a.append(palavra)
    b = b.join(a)
    return b

The first step is to determine the longest overlap between the start and end of the word. The next() function can be used to get the number of characters to skip: it returns the first match when checking from the longest candidate substring down to the shortest, and defaults to zero if there is no overlap. The repetition can then be performed on the remaining part of the word (i.e. skipping the length of the common part).
def repeat(w, n):
    # length of the longest proper prefix of w that is also a suffix (0 if none)
    skip = next((i for i in range(len(w) - 1, 0, -1) if w[:i] == w[-i:]), 0)
    return w + (n - 1) * w[skip:]

print(repeat("amanha", 2))  # amanhamanha
print(repeat("abc", 2))     # abcabc
print(repeat("abcdab", 2))  # abcdabcdab
You could also use the max() function to get the length to skip (not as efficient as next() but shorter to write):
def repeat(w, n):
    skip = max(range(len(w)), key=lambda i: i * (w[:i] == w[-i:]))
    return w + (n - 1) * w[skip:]

Related

Time Complexity for LeetCode 3. Longest Substring Without Repeating Characters

Problem: Given a string s, find the length of the longest substring without repeating characters.
Example: Input: s = "abcabcbb" Output: 3 Explanation: The answer is "abc", with the length of 3.
My solution:
class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        seen = set()
        l = r = curr_len = max_len = 0
        n = len(s)
        while l < n:
            if r < n and s[r] not in seen:
                # extend the window to the right
                seen.add(s[r])
                curr_len += 1
                max_len = max(curr_len, max_len)
                r += 1
            else:
                # duplicate (or end of string): restart from the next left position
                l += 1
                r = l
                curr_len = 0
                seen.clear()
        return max_len
I know this is not an efficient solution, but I am having trouble figuring out its time complexity.
I visit every character in the string but, for each one of them, the window expands until it finds a repeated char. So every char ends up being visited multiple times, but I'm not sure if it is enough times to justify an O(n^2) time complexity, and, obviously, it's way worse than O(n).

You could claim the algorithm to be O(n) if you know the size of the character set your input can be composed of. The length your window can expand to is limited by the number of different characters you could pass over before encountering a duplicate, and this is capped by the size of the character set you're working with, which is a constant independent of the length of the string. For example, if you are only working with lowercase alphabetic characters, the algorithm is O(26n) = O(n).
To be more exact, you could say that it runs in O(n * min(m, n)), where n is the length of the string and m is the number of characters in the alphabet of the string. The reason for the min is that even if you're somehow working with an alphabet of unlimited unique characters, at worst you're doing a double for loop to the end of the string. That means, however, that if the number of possible characters you can encounter in the string exceeds the string's length, you have a worst case of O(n^2) performance (which occurs when every character of the string is unique).
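For comparison, here is a minimal sketch of a single-pass sliding window that avoids restarting the scan after each duplicate. It is not the asker's method; the function name length_of_longest_substring and the last_seen dictionary are illustrative choices. Each character is visited once, so the running time is O(n) regardless of the alphabet size (assuming constant-time dictionary operations).

```python
def length_of_longest_substring(s):
    """Length of the longest substring of s without repeating characters."""
    last_seen = {}  # character -> index of its most recent occurrence
    start = 0       # left edge of the current duplicate-free window
    best = 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the left edge past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best


print(length_of_longest_substring("abcabcbb"))  # 3
```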

Time limit exceeded error. Word Ladder leetcode

I am trying to solve this LeetCode problem (https://leetcode.com/problems/word-ladder/description/):
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
You may assume no duplicates in the word list.
You may assume beginWord and endWord are non-empty and are not the same.
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
5
Explanation:
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", return its length 5.
import queue

class Solution:
    def isadjacent(self, a, b):
        # True if a and b differ in exactly one position
        count = 0
        n = len(a)
        for i in range(n):
            if a[i] != b[i]:
                count += 1
                if count > 1:
                    return False
        if count == 1:
            return True

    def ladderLength(self, beginWord, endWord, wordList):
        word_queue = queue.Queue(maxsize=0)
        word_queue.put((beginWord, 1))
        while word_queue.qsize() > 0:
            queue_last = word_queue.get()
            index = 0
            # scan every remaining word for neighbours of the dequeued word
            while index != len(wordList):
                if self.isadjacent(queue_last[0], wordList[index]):
                    new_len = queue_last[1] + 1
                    if wordList[index] == endWord:
                        return new_len
                    word_queue.put((wordList[index], new_len))
                    wordList.pop(index)
                    index -= 1
                index += 1
        return 0
This gives a "Time Limit Exceeded" error. Can someone suggest how to optimise it and prevent the error?

The basic idea is to find the adjacent words faster. Instead of considering every word in the list (even one that has already been filtered by word length), construct each possible neighbor string and check whether it is in the dictionary. To make those lookups fast, make sure the word list is stored in something like a set that supports fast membership tests.
To go even faster, you could store two sorted word lists, one sorted by the reverse of each word. Then look for possibilities involving changing a letter in the first half in the reversed list and for the latter half in the normal list. All the existing neighbors can then be found without making any non-word strings. This can even be extended to n lists, each sorted by omitting one letter from all the words.
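The first suggestion above (generate each possible neighbor and test it against a set) could look roughly like the sketch below. The function name ladder_length and the use of collections.deque are illustrative assumptions, and the sketch relies on the problem's guarantee that all words are lowercase and of equal length. It does not implement the sorted-lists refinement.

```python
from collections import deque
from string import ascii_lowercase


def ladder_length(begin_word, end_word, word_list):
    """Breadth-first search that builds candidate neighbours instead of scanning the list."""
    remaining = set(word_list)  # unvisited words, with O(1) average membership tests
    if end_word not in remaining:
        return 0
    remaining.discard(begin_word)
    queue = deque([(begin_word, 1)])
    while queue:
        word, length = queue.popleft()
        for i in range(len(word)):
            for ch in ascii_lowercase:
                if ch == word[i]:
                    continue
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in remaining:
                    if candidate == end_word:
                        return length + 1
                    remaining.remove(candidate)  # mark as visited
                    queue.append((candidate, length + 1))
    return 0


print(ladder_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
```

Each dequeued word produces 25 * len(word) candidates, so the work per word no longer grows with the size of the word list.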

CodeEval Hard Challenge 6 - LONGEST COMMON SUBSEQUENCE - python

I am trying to solve the Longest Common Subsequence challenge in Python. My solution runs and appears to work, but when I submit it, it is scored as only 50% (partially) complete. I'm not sure what I'm missing here; any help is appreciated.
CHALLENGE DESCRIPTION:
You are given two sequences. Write a program to determine the longest common subsequence between the two strings (each string can have a maximum length of 50 characters). NOTE: This subsequence need not be contiguous. The input file may contain empty lines, these need to be ignored.
INPUT SAMPLE:
The first argument will be a path to a filename that contains two strings per line, semicolon delimited. You can assume that there is only one unique subsequence per test case. E.g.:
XMJYAUZ;MZJAWXU
OUTPUT SAMPLE:
The longest common subsequence. Ensure that there are no trailing empty spaces on each line you print. E.g.:
MJAU
My code is
# LONGEST COMMON SUBSEQUENCE
import argparse


def get_longest_common_subsequence(strings):
    # here we will store the subsequence list
    subsequences_list = list()
    # split the strings in 2 different variables and limit them to 50 characters
    first = strings[0]
    second = strings[1]
    startpos = 0
    # we need to start from each index in the first string so we can find the longest subsequence
    # therefore we do a loop with the length of the first string, incrementing the start every time
    for start in range(len(first)):
        # here we will store the current subsequence
        subsequence = ''
        # store the index of the found character
        idx = -1
        # loop through all the characters in the first string, starting at the 'start' position
        for i in first[start:50]:
            # search for the current character in the second string
            pos = second[0:50].find(i)
            # if the character was found and is in the correct sequence add it to the subsequence and update the index
            if pos > idx:
                subsequence += i
                idx = pos
        # if we have a subsequence, add it to the subsequences list
        if len(subsequence) > 0:
            subsequences_list.append(subsequence)
        # increment the start
        startpos += 1
    # sort the list of subsequences with the longest at the top
    subsequences_list.sort(key=len, reverse=True)
    # return the longest subsequence
    return subsequences_list[0]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    args = parser.parse_args()
    # read file as the first argument
    with open(args.filename) as f:
        # loop through each line
        for line in f:
            # if the line is empty it means it's not valid. otherwise print the common subsequence
            if line.strip() not in ['\n', '\r\n', '']:
                strings = line.replace('\n', '').split(';')
                if len(strings[0]) > 50 or len(strings[1]) > 50:
                    break
                print(get_longest_common_subsequence(strings))
    return 0


if __name__ == '__main__':
    main()

The following solution prints, for each semicolon-separated pair of strings, the characters the two strings have in common, in no particular order. Because it uses a set intersection, the order of the characters is not preserved, so the output is not guaranteed to be a valid subsequence of either string. If a string from the pair is longer than 50 characters, the pair is skipped (it's not difficult to trim it to length 50 if that is desired).
Note: if sorting/ordering is desired it can be implemented (alphabetic order, the order of the first string, or the order of the second string).
with open('filename.txt') as f:
    for line in f:
        line = line.strip()
        # two strings of at most 50 characters plus the separator
        if line and ';' in line and len(line) <= 101:
            a, b = line.split(';')
            a = set(a.strip())
            b = set(b.strip())
            common = a & b  # intersection
            if common:
                print(''.join(common))
Also note: if both strings contain internal whitespace (e.g. ABC DE; ZM YCA), then the whitespace will be part of the output because it is not stripped. If that is not desired, you can replace the line a = set(a.strip()) with a = {char for char in a if char.strip()}, and likewise for b.

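def lcs_recursive(xlist, ylist):
    # an empty sequence has no common subsequence with anything
    if not xlist or not ylist:
        return []
    x, xs, y, ys = xlist[0], xlist[1:], ylist[0], ylist[1:]
    if x == y:
        # matching heads belong to the result
        return [x] + lcs_recursive(xs, ys)
    else:
        # otherwise drop one head or the other and keep the longer result
        return max(lcs_recursive(xlist, ys), lcs_recursive(xs, ylist), key=len)

s1 = 'XMJYAUZ'
s2 = 'MZJAWXU'
print(lcs_recursive(s1, s2))
This gives the correct answer, MJAU (as a list of characters, since the function returns a list). X and Z are not part of the answer because the result must be a subsequence: its characters have to appear in the same relative order in both strings.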
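The plain recursion above recomputes the same subproblems many times, which may be slow for strings near the 50-character limit. As a hedged alternative, here is a sketch of the standard dynamic-programming formulation; the function name lcs and the table layout are illustrative choices, not part of the original answer.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to rebuild the subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(lcs("XMJYAUZ", "MZJAWXU"))  # MJAU
```

This fills an (m + 1) by (n + 1) table once, so the running time is O(m * n).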

Writing a recursive function that returns the digit with longest consecutive sequence

How do I write a recursive function that takes an int value and returns the digit with the longest consecutive sequence?
For example, f(1122333) returns 3 and f(1223) returns 2.
I have no idea how to approach this problem, and I'm kind of new to recursion in general.

Something like this. Not tested. Was fun to think about though.
Pseudo code:
(Assumes integer division)
Def number helperLongest(number myNum):
    Return longest(myNum, -1, 0, -1, 0)
Def number longest(number myNum, number prevLongest, number numOfPrevLong, number currentLongest, number numOfLongest):
    If (myNum/10 < 1) //base case
        If (myNum == currentLongest)
            numOfLongest++
        Else //deal with corner case of < 10 input
            If (numOfLongest > numOfPrevLong)
                prevLongest = currentLongest
                numOfPrevLong = numOfLongest
            currentLongest = myNum
            numOfLongest = 1
        Return (numOfLongest > numOfPrevLong) ? currentLongest : prevLongest
    Else //recurse
        If (myNum%10 == currentLongest)
            numOfLongest++
        Else //have to break the chain
            If (numOfLongest > numOfPrevLong)
                prevLongest = currentLongest
                numOfPrevLong = numOfLongest
            currentLongest = myNum%10
            numOfLongest = 1
        myNewNum = myNum/10
        Return longest(myNewNum, prevLongest, numOfPrevLong, currentLongest, numOfLongest)
In words: go through the number digit by digit, starting from the end. If the current last digit matches the one before it, increment the counter. If it doesn't, and it's bigger than the previous maximum, save it. Reset the current digit to the current last digit and reset the counter. Chop off the last digit. Feed the smaller number and all of this information back into the function until you get down to one final digit (the first digit in the original number). Compare your current counter with the maximum stored, and return the larger.
One note: in case of a tie the first substring of matching numbers (which is actually the last substring in the original number) would be returned. If the other behavior is desired, then interchange the two > with >=.
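A possible Python rendering of the digit-by-digit idea described above is sketched here. The names longest_run_digit and helper are hypothetical, the base and recursive cases are merged for brevity, and the sketch assumes a non-negative integer input.

```python
def longest_run_digit(num):
    """Return the digit with the longest run of consecutive repeats in num (num >= 0)."""

    def helper(n, best_digit, best_len, cur_digit, cur_len):
        last = n % 10
        if last == cur_digit:
            cur_len += 1
        else:
            # The current run is broken: remember it if it beats the best so far.
            if cur_len > best_len:
                best_digit, best_len = cur_digit, cur_len
            cur_digit, cur_len = last, 1
        if n < 10:  # base case: that was the leading digit
            return cur_digit if cur_len > best_len else best_digit
        return helper(n // 10, best_digit, best_len, cur_digit, cur_len)

    return helper(num, -1, 0, -1, 0)


print(longest_run_digit(1122333))  # 3
print(longest_run_digit(1223))     # 2
```

As in the pseudocode, digits are consumed from the right, so on a tie the run nearest the end of the number is returned.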

The easiest thing I can think of is to do this via tail recursion. Within the function, I would have a private function that we would use for recursion. First, I would convert the integer into a list where each digit is an individual element. The recursive private function takes in the list of digits, the digit we are currently investigating, the digit that currently holds the longest consecutive sequence, the length of that longest sequence, and a count of how many times in a row we have seen the current digit. Taking the list as an input is important, because we can simply provide a list with one less element for each call by skipping over the first element of this list. Eventually, we will get down to only one number in the list, which is the base case.
In other words, with any recursive algorithm, you need the base case, which is where we will stop and return something, and the recursive case where we need to call the function with the inputs modified.
The base case is when we provide a number with a single digit. This means that we've reached the end of the number. If this is the case, what we need to do is check to see if this value is equal to the current value that is currently considered to be consecutive. If this value matches, we increment current consecutive count by 1. Should this value exceed the current longest consecutive count, we will return this single digit as the number that pertains to the longest consecutive sequence. If not, then we simply return what this value was before we decided to do this check.
The recursive case is slightly more complicated. Given a digit we're looking at, we check to see if this digit is equal to the digit that is being considered as part of the consecutive stream. If it is, increment the count of this digit by 1, and we check to see if this count is larger than the current largest consecutive count. If it is, then we need to update the current longest value to this value and also update the longest count. If this doesn't match, we reset the count back to 1, as this is the first digit of its kind to be encountered. The current value to match will be this value, and we will recurse where we submit a list of values that starts from the second index onwards, with the other variables updated.
As we keep recursing and specifying values of the list from the second index onwards, we would effectively be searching linearly in the list from the beginning up until the end when we finally reach the last element of the list, and this is where we stop.
Without further ado, this is what I wrote. The function is called longest_consecutive_value and it takes in an integer:
# Function that determines the value that has the longest consecutive sequence
def longest_consecutive_value(value):
    # Recursive function
    def longest_recursive(list_val, current_val, current_longest_val, longest_count, count):
        # Base case
        if len(list_val) == 1:
            # If single digit is equal to the current value in question,
            # increment count
            if list_val[0] == current_val:
                count += 1
            # If current count is longer than the longest count, return
            # the single digit
            if count > longest_count:
                return list_val[0]
            # Else, return the current longest value
            else:
                return current_longest_val
        # Recursive case
        else:
            # If the left most digit is equal to the current value in question...
            if list_val[0] == current_val:
                # Increment count
                count += 1
                # If count is larger than longest count...
                if count > longest_count:
                    # Update current longest value
                    current_longest_val = list_val[0]
                    # Update longest count
                    longest_count = count
            # If not equal, reset counter to 1
            else:
                count = 1
                # Current value is the left most digit
                current_val = list_val[0]
            # Recurse on the modified parameters
            return longest_recursive(list_val[1:], current_val, current_longest_val, longest_count, count)

    # Set up - Convert the integer into a list of numbers
    # (list() is needed on Python 3, where map() returns an iterator)
    list_num = list(map(int, str(value)))
    # Call private recursive function with initial values
    return longest_recursive(list_num, list_num[0], list_num[0], 0, 0)
Here are some example cases (using IPython):
In [4]: longest_consecutive_value(1122333)
Out[4]: 3
In [5]: longest_consecutive_value(11223)
Out[5]: 1
In [6]: longest_consecutive_value(11223334444555555)
Out[6]: 5
In [7]: longest_consecutive_value(11111111122)
Out[7]: 1
In [8]: longest_consecutive_value(1122334444)
Out[8]: 4
Note that if there are multiple digits that share the same amount of consecutive numbers, only the first number that produced that length of consecutive numbers is what is output. As noted by Ron Thompson in his post, if you desire the most recent or last consecutive digit that satisfies the requirements, then use >= instead of > when checking for the counts.

Find the longest substring with contiguous characters, where the string may be jumbled

Given a string, find the longest substring whose characters are contiguous (i.e. they are consecutive letters) but possibly jumbled (i.e. out of order). For example:
Input : "owadcbjkl"
Output: "adcb"
We consider adcb as contiguous as it forms abcd.
(This is an interview question.)
I have an idea of running a while loop with 2 conditions, one that checks for continuous characters using Python's ord and another condition to find the minimum and maximum and check if all the following characters fall in this range.
Is there any way this problem could be solved with low running time complexity? The best I can achieve is O(N^2) where N is the length of the input string and ord() seems to be a slow operation.

If the substring is defined as ''.join(sorted(substr)) in alphabet then:
there are no duplicates in the substring, and therefore the size of the longest substring is less than (or equal to) the size of the alphabet
(ord(max(substr)) - ord(min(substr)) + 1) == len(substr), where ord() returns position in the alphabet (+/- constant) (builtin ord() can be used for lowercase ascii letters)
Here's an O(n*m*m)-time, O(m)-space solution, where n is len(input_string) and m is len(alphabet):
from itertools import count

def longest_substr(input_string):
    maxsubstr = input_string[0:0]  # empty slice (to accept subclasses of str)
    for start in range(len(input_string)):  # O(n)
        for end in count(start + len(maxsubstr) + 1):  # O(m)
            substr = input_string[start:end]  # O(m)
            if len(set(substr)) != (end - start):  # found duplicates or EOS
                break
            if (ord(max(substr)) - ord(min(substr)) + 1) == len(substr):
                maxsubstr = substr
    return maxsubstr
Example:
print(longest_substr("owadcbjkl"))
# -> adcb
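One way to shave a factor of m off the approach above is to keep the set of seen characters and the running minimum and maximum code points up to date while the window grows, instead of recomputing them from each slice. The following is a sketch under that assumption; the name longest_contiguous_substr is illustrative, and like the original it relies on ord() giving alphabet positions up to a constant.

```python
def longest_contiguous_substr(s):
    """Longest substring whose characters are distinct and form a contiguous range."""
    best = s[0:0]
    for start in range(len(s)):
        seen = set()
        lo = hi = ord(s[start])
        for end in range(start, len(s)):
            ch = s[end]
            if ch in seen:  # a duplicate can never be part of a valid window
                break
            seen.add(ch)
            lo = min(lo, ord(ch))
            hi = max(hi, ord(ch))
            length = end - start + 1
            # Distinct characters spanning exactly `length` code points are contiguous.
            if hi - lo + 1 == length and length > len(best):
                best = s[start:end + 1]
    return best


print(longest_contiguous_substr("owadcbjkl"))  # adcb
```

The inner loop stops at the first duplicate, so it runs at most min(n, m) times per start position, giving O(n * min(n, m)) time overall.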
