String manipulation algorithm to find string greater than original string - python

I have few words(strings) like 'hefg','dhck','dkhc','lmno' which is to be converted to new words by swapping some or all the characters such that the new word is greater than the original word lexicographically also the new word is the least of all the words greater than the original word.
for e.g 'dhck'
should output 'dhkc' and not 'kdhc','dchk' or any other.
i have these inputs
hefg
dhck
dkhc
fedcbabcd
which should output
hegf
dhkc
hcdk
fedcbabdc
I have tried with this code in python it worked for all except 'dkhc' and 'fedcbabcd'.
I have figured out that the first character in case of 'fedcbabcd' is the max so, it is not getting swapped.and
Im getting "ValueError: min() arg is an empty sequence"
How can I modify the algorithm To fix the cases?
list1=['d','k','h','c']
list2=[]
maxVal=list1.index(max(list1))
for i in range(maxVal):
temp=list1[maxVal]
list1[maxVal]=list1[i-1]
list1[i-1]=temp
list2.append(''.join(list1))
print(min(list2))

You can try something like this:
iterate the characters in the string in reverse order
keep track of the characters you've already seen, and where you saw them
if you've seen a character larger than the curent character, swap it with the smallest larger character
sort all the characters after the that position to get the minimum string
Example code:
def next_word(word):
word = list(word)
seen = {}
for i in range(len(word)-1, -1, -1):
if any(x > word[i] for x in seen):
x = min(x for x in seen if x > word[i])
word[i], word[seen[x]] = word[seen[x]], word[i]
return ''.join(word[:i+1] + sorted(word[i+1:]))
if word[i] not in seen:
seen[word[i]] = i
for word in ["hefg", "dhck", "dkhc", "fedcbabcd"]:
print(word, next_word(word))
Result:
hefg hegf
dhck dhkc
dkhc hcdk
fedcbabcd fedcbabdc

The max character and its position doesn't influence the algorithm in the general case. For example, for 'fedcbabcd', you could prepend an a or a z at the beginning of the string and it wouldn't change the fact that you need to swap the final two letters.
Consider the input 'dgfecba'. Here, the output is 'eabcdfg'. Why? Notice that the final six letters are sorted in decreasing order, so by changing anything there, you get a smaller string lexicographically, which is no good. It follows that you need to replace the initial 'd'. What should we put in its place? We want something greater than 'd', but as small as possible, so 'e'. What about the remaining six letters? Again, we want a string that's as small as possible, so we sort the letters lexicographically: 'eabcdfg'.
So the algorithm is:
start at the back of the string (right end);
go left while the symbols keep increasing;
let i be the rightmost position where s[i] < s[i + 1]; in our case, that's i = 0;
leave the symbols on position 0, 1, ..., i - 1 untouched;
find the position among i+1 ... n-1 containing the least symbol that's greater than s[i]; call this position j; in our case, j = 3;
swap s[i] and s[j]; in our case, we obtain 'egfdcba';
reverse the string s[i+1] ... s[n-1]; in our case, we obtain 'eabcdfg'.

Your problem can we reworded as finding the next lexicographical permutation of a string.
The algorithm in the above link is described as follow:
1) Find the longest non-increasing suffix
2) The number left of the
suffix is our pivot
3) Find the right-most successor of the pivot in
the suffix
4) Swap the successor and the pivot
5) Reverse the suffix
The above algorithm is especially interesting because it is O(n).
Code
def next_lexicographical(word):
word = list(word)
# Find the pivot and the successor
pivot = next(i for i in range(len(word) - 2, -1, -1) if word[i] < word[i+1])
successor = next(i for i in range(len(word) - 1, pivot, -1) if word[i] > word[pivot])
# Swap the pivot and the successor
word[pivot], word[successor] = word[successor], word[pivot]
# Reverse the suffix
word[pivot+1:] = word[-1:pivot:-1]
# Reform the word and return it
return ''.join(word)
The above algorithm will raise a StopIteration exception if the word is already the last lexicographical permutation.
Example
words = [
'hefg',
'dhck',
'dkhc',
'fedcbabcd'
]
for word in words:
print(next_lexicographical(word))
Output
hegf
dhkc
hcdk
fedcbabdc

Related

Recursively searching for a string in a list of characters

I have a problem to solve which is to recursively search for a string in a list (length of string and list is atleast 2) and return it's positions. for example: if we had ab with the list ['a','b','c'], the function should return '(0,2)', as ab starts at index 0 and ends at 1 (we add one more).
if we had bc with the same list the function should return '(1,3)'.
if we had ac with the same list the function should return not found.
Note that I'm solving a bigger problem which is to recursively search for a string in a matrix of characters (that appears from up to down, or left to right only), but I am nowhere near the solution, so I'm starting by searching for a word in a row of a matrix on a given index (as for searching for a word in a normal list), so my code might have char_mat[idx], treat it as a normal list like ['c','d','e'] for example.
Note that my code is full of bugs and it doesn't work, so I explained what I tried to do under it.
def search_at_idx(search_word, char_mat, idx, start, end):
if len(char_mat[idx]) == 2:
if ''.join(char_mat[idx]) == search_word:
return 0,2
else:
return 'not found', 'not found'
start, end = search_at_idx(search_word, char_mat[idx][1:], idx, start+1, end)
return start, end
The idea of what I tried to do here is to find the base of the recursion (when the length of the list reaches 2), and with that little problem I just check if my word is equal to the chars when joined together as a string, and return the position of the string if it's equal else return not found
Then for the recursion step, I send the list without the first character, and my start index +1, so if this function does all the job for me (as the recursion hypothesis), I need to check the last element in the list so my recursion works. (but I don't know really if this is the way to do it since the last index can be not in the word, so I got stuck). Now I know that I made alot of mistakes and I'm nowhere near the correct answer,I would really appreciate any explanation or help in order to understand how to do this problem and move on to my bigger problem which is finding the string in a matrix of chars.
I've thrown together a little example that should get you a few steps ahead
char_mat = [['c', 'e', 'l', 'k', 'v'],]
search_word = 'lk'
def search_at_idx(search_word, char_mat, idx, start=0):
if len(char_mat[idx]) < len(search_word):
return 'not', 'found'
if ''.join(char_mat[idx][:len(search_word)]) == search_word:
return start, start+len(search_word)
char_mat[idx] = char_mat[idx][1:]
start, end = search_at_idx(search_word, char_mat, idx, start+1)
return start, end
print(search_at_idx(search_word, char_mat, 0))
To point out a few errors of yours:
In your recursion, you use char_mat[idx][1:]. This will pass a slice of the list and not the modified matrix. That means your next call to char_mat[idx] will check the letter at that index in the array. I'll recommend using the debugger and stepping through the program to check the contents of your variables
Instead of using start and end, you can always assume that the found word has the same length as the word you are searching for. So the distance you have to look is always start + len(search_word)
If you have any additional questions about my code, please comment.
Here's an example for list comprehension if that counts as loophole:
foundword = list(map("".join, list(zip(*([char_mat[idx][i:] + list(char_mat[idx][i-1]) for i in range(len(search_word))])))[:-1])).index(search_word)
print((foundword, foundword + len(search_word)) if foundword else 'Not found')
l = ["a","b","c"]
def my_indexes(pattern, look_list, indx_val):
if pattern == "".join(look_list)[:2]:
return indx_val, indx_val+1
else:
if len(look_list) == 2:
return None
return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("bc",l,0))
Two options:
1.We find the case we are looking for, so the first two elements of our list are "ab", or
2. "a" and "b" are not first two elements of our list. call the same function without first element of the list,and increase indx_val so our result will be correct.We stop doing this when the len(list) = 2 and we didn't find a case. (assuming we're looking for length of 2 chars)
edit: for all lengths
l = ["a","b","c","d"]
def my_indexes(pattern, look_list, indx_val):
if pattern == "".join(look_list)[:len(pattern)]:
return indx_val, indx_val+len(pattern) # -1 to match correct indexes
else:
if len(look_list) == len(pattern):
return None
return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("cd",l,0))

Time limit exceeded error. Word Ladder leetcode

I am trying to solve leetcode problem(https://leetcode.com/problems/word-ladder/description/):
Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
You may assume no duplicates in the word list.
You may assume beginWord and endWord are non-empty and are not the same.
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
5
Explanation:
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" ->
"cog", return its length 5.
import queue
class Solution:
def isadjacent(self,a, b):
count = 0
n = len(a)
for i in range(n):
if a[i] != b[i]:
count += 1
if count > 1:
return False
if count == 1:
return True
def ladderLength(self,beginWord, endWord, wordList):
word_queue = queue.Queue(maxsize=0)
word_queue.put((beginWord,1))
while word_queue.qsize() > 0:
queue_last = word_queue.get()
index = 0
while index != len(wordList):
if self.isadjacent(queue_last[0],wordList[index]):
new_len = queue_last[1]+1
if wordList[index] == endWord:
return new_len
word_queue.put((wordList[index],new_len))
wordList.pop(index)
index-=1
index+=1
return 0
Can someone suggest how to optimise it and prevent the error!
The basic idea is to find the adjacent words faster. Instead of considering every word in the list (even one that has already been filtered by word length), construct each possible neighbor string and check whether it is in the dictionary. To make those lookups fast, make sure the word list is stored in something like a set that supports fast membership tests.
To go even faster, you could store two sorted word lists, one sorted by the reverse of each word. Then look for possibilities involving changing a letter in the first half in the reversed list and for the latter half in the normal list. All the existing neighbors can then be found without making any non-word strings. This can even be extended to n lists, each sorted by omitting one letter from all the words.

Consecutive values in strings, getting indices

The following is a python string of length of approximately +1000.
string1 = "XXXXXXXXXXXXXXXXXXXXXAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBB........AAAAXXXXX"
len(string1) ## 1311
I would like to know the index of where the consecutive X's end and the non-X characters begin. Reading this string from left to right, the first non-X character is at index location 22, and the first non-X character from the right is at index location 1306.
How does one find these indices?
My guess would be:
for x in string1:
if x != "X":
print(string.index(x))
The problem with this is it outputs all indices that are not X. It does not give me the index where the consecutive X's end.
Even more confusing for me is how to "check" for consecutive X's. Let's say I have this string:
string2 = "XXXXAAXAAAAAAAAAAAAAAABBBBBBBBBBBBBB........AAAAXXXXX"
Here, the consecutive X's end at index 4, not index 7. How could I check several characters ahead whether this is really no longer consecutive?
using regex, split the first & last group of Xs, get their lengths to construct the indices.
import re
mystr = 'XXXXAAXAAAAAAAAAAAAAAABBBBBBBBBBBBBB........AAAAXXXXX'
xs = re.split('[A-W|Y-Z]+', mystr)
indices = (len(xs[0]), len(mystr) - len(xs[-1]) - 1)
# (4, 47)
I simply need the outputs for the indices. I'm then going to put them in randint(first_index, second_index)
Its possible to pass the indices to the function like this
randint(*indices)
However, I suspect that you want to use the output of randint(first_index, last_index) to select a random character from the middle, this would be a shorter alternative.
from random import choice
randchar = choice(mystr.strip('X'))
If I understood well your question, you just do:
def getIndexs(string):
lst =[]
flag = False
for i, char in enumerate(string):
if char == "x":
flag = True
if ((char != "x") and flag):
lst.append(i-1)
flag = False
return lst
print(getIndexs("xxxxbbbxxxxaaaxxxbb"))
[3, 10, 16]
If the sequences are, as you say, only in the beginning and at the end of your string, a simple loop / reversed loop would suffice:
string1 = "XXXXXXXXXXXXXXXXXXXXXAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBB........AAAAXXXXX"
left_index = 0
for char in string1:
left_index += 1
if char != "X":
break
right_index = len(string1)
for char in reversed(string1):
if char != "X":
break
right_index -= 1
print(left_index) # 22
print(right_index) # 65
Regex can lookahead and identify characters that don't match the pattern:
>>>[match.span() for match in re.finditer(r'X{2,}((?=[^X])|$)', string2)]
[(0, 4), (48, 53)]
Breaking this down:
X - the character we're matching
{2,} - need to see at least two in a row to consider a match
((?=[^X])|$) - two conditions will satisfy the match
(?=[^X]) - lookahead for anything but an X
$ - the end of the string
As a result, finditer returns each instance where there are multiple X's, followed by a non-X or an end of line. match.span() extracts the position information from each match from the string.
This will give you the first index and last index (of non-'X' character).
s = 'XXABCDXXXEFGHXXXXX'
first_index = len(s) - len(s.lstrip('X'))
last_index = len(s.rstrip('X')) - len(s) - 1
print first_index, last_index
2 -6
How it works:
For first_index:
We strip all the 'X' characters at the beginning of our string. Finding the difference in length between the original and shortened string gives us the index of the first non-'X' character.
For last_index:
Similarly, we strip the 'X' characters at the end of our string. We also subtract 1 from the difference, since reverse indexing in Python starts from -1.
Note:
If you just want to randomly select one of the characters between first_index and last_index, you can do:
import random
shortened_s = s.strip('X')
random.choice(shortened_s)

python recursion with bubble sort

So, i have this problem where i recieve 2 strings of letters ACGT, one with only letters, the other contain letters and dashes "-".both are same length. the string with the dashes is compared to the string without it. cell for cell. and for each pairing i have a scoring system. i wrote this code for the scoring system:
for example:
dna1: -ACA
dna2: TACG
the scoring is -1. (because dash compared to a letter(T) gives -2, letter compared to same letter gives +1 (A to A), +1 (C to C) and non similar letters give (-1) so sum is -1.
def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
""""""
score = 0
for index in range(len(dna1)):
if dna1[index] is dna2[index]:
score += match
elif dna1[index] is not dna2[index]:
if "-" not in (dna1[index], dna2[index]):
score += mismatch
else:
score += gap
this is working fine.
now i have to use recursion to give the best possible score for 2 strings.
i recieve 2 strings, they can be of different sizes this time. ( i cant change the order of letters).
so i wrote this code that adds "-" as many times needed to the shorter string to create 2 strings of same length and put them in the start of list. now i want to start moving the dashes and record the score for every dash position, and finally get the highest posibble score. so for moving the dashes around i wrote a litle bubble sort.. but it dosnt seem to do what i want. i realize its a long quesiton but i'd love some help. let me know if anything i wrote is not understood.
def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,\
score=[], count=0):
""""""
diff = abs(len(dna1) - len(dna2))
if len(dna1) is len(dna2):
short = []
elif len(dna1) < len(dna2):
short = [base for base in iter(dna1)]
else:
short = [base for base in iter(dna2)]
for i in range(diff):
short.insert(count, "-")
for i in range(diff+count, len(short)-1):
if len(dna1) < len(dna2):
score.append((get_score(short, dna2),\
''.join(short), dna2))
else:
score.append((get_score(dna1, short),\
dna1, ''.join(short)))
short[i+1], short[i] = short[i], short[i+1]
if count is min(len(dna1), len(dna2)):
return score[score.index(max(score))]
return best_score(dna1, dna2, 1, -1, -2, score, count+1)
First, if I correctly deciephered your cost function, your best score value do not depend on gap, as number of dashes is fixed.
Second, it is lineary dependent on number of mismatches and so doesn't depend on match and mismatch exact values, as long as they are positive and negative respectively.
So your task reduces to lookup of a longest subsequence of longest string letters strictly matching subsequence of letters of the shortest one.
Third, define by M(string, substr) function returnin length of best match from above. If you smallest string fisrt letter is S, that is substr == 'S<letters>', then
M(string, 'S<letters>') = \
max(1 + M(string[string.index(S):], '<letters>') + # found S
M(string[1:], '<letters>')) # letter S not found, placed at 1st place
latter is an easy to implement recursive expression.
For a pair string, substr denoting m=M(string, substr) best score is equal
m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
It is straightforward, storing what value was max in recursive expression, to find what exactly best match is.

Find the longest substring with contiguous characters, where the string may be jumbled

Given a string, find the longest substring whose characters are contiguous (i.e. they are consecutive letters) but possibly jumbled (i.e. out of order). For example:
Input : "owadcbjkl"
Output: "adcb"
We consider adcb as contiguous as it forms abcd.
(This is an interview question.)
I have an idea of running a while loop with 2 conditions, one that checks for continuous characters using Python's ord and another condition to find the minimum and maximum and check if all the following characters fall in this range.
Is there any way this problem could be solved with low running time complexity? The best I can achieve is O(N^2) where N is the length of the input string and ord() seems to be a slow operation.
If the substring is defined as ''.join(sorted(substr)) in alphabet then:
there is no duplicates in the substring and therefore the size of
the longest substring is less than (or equal to) the size of the alphabet
(ord(max(substr)) - ord(min(substr)) + 1) == len(substr), where
ord() returns position in the alphabet (+/- constant) (builtin
ord() can be used for lowercase ascii letters)
Here's O(n*m*m)-time, O(m)-space solution, where n is len(input_string) and m is len(alphabet):
from itertools import count
def longest_substr(input_string):
maxsubstr = input_string[0:0] # empty slice (to accept subclasses of str)
for start in range(len(input_string)): # O(n)
for end in count(start + len(maxsubstr) + 1): # O(m)
substr = input_string[start:end] # O(m)
if len(set(substr)) != (end - start): # found duplicates or EOS
break
if (ord(max(substr)) - ord(min(substr)) + 1) == len(substr):
maxsubstr = substr
return maxsubstr
Example:
print(longest_substr("owadcbjkl"))
# -> adcb

Categories