I am trying to solve this LeetCode problem (https://leetcode.com/problems/word-ladder/description/):
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
You may assume no duplicates in the word list.
You may assume beginWord and endWord are non-empty and are not the same.
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
5
Explanation:
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" ->
"cog", return its length 5.
import queue

class Solution:
    def isadjacent(self,a, b):
        count = 0
        n = len(a)
        for i in range(n):
            if a[i] != b[i]:
                count += 1
            if count > 1:
                return False
        if count == 1:
            return True

    def ladderLength(self,beginWord, endWord, wordList):
        word_queue = queue.Queue(maxsize=0)
        word_queue.put((beginWord,1))
        while word_queue.qsize() > 0:
            queue_last = word_queue.get()
            index = 0
            while index != len(wordList):
                if self.isadjacent(queue_last[0],wordList[index]):
                    new_len = queue_last[1]+1
                    if wordList[index] == endWord:
                        return new_len
                    word_queue.put((wordList[index],new_len))
                    wordList.pop(index)
                    index-=1
                index+=1
        return 0
Can someone suggest how to optimise it and prevent the error?
The basic idea is to find the adjacent words faster. Instead of considering every word in the list (even one that has already been filtered by word length), construct each possible neighbor string and check whether it is in the dictionary. To make those lookups fast, make sure the word list is stored in something like a set that supports fast membership tests.
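A minimal sketch of that idea, assuming lowercase words of equal length as the problem states. The function name ladder_length and the variable names are illustrative, not taken from the question's code. Each neighbor is built by swapping in one letter, and a set gives fast membership tests:

```python
from collections import deque
from string import ascii_lowercase


def ladder_length(begin_word, end_word, word_list):
    """Breadth-first search that generates candidate neighbors
    instead of scanning the whole word list for each word."""
    words = set(word_list)          # fast membership tests
    if end_word not in words:
        return 0
    words.discard(begin_word)       # never revisit the start word
    queue = deque([(begin_word, 1)])
    while queue:
        word, steps = queue.popleft()
        for i in range(len(word)):
            for letter in ascii_lowercase:
                if letter == word[i]:
                    continue
                candidate = word[:i] + letter + word[i + 1:]
                if candidate not in words:
                    continue
                if candidate == end_word:
                    return steps + 1
                words.remove(candidate)   # mark as visited
                queue.append((candidate, steps + 1))
    return 0
```

Each dequeued word costs about 26 * len(word) set lookups, regardless of how many words are in the list.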
To go even faster, you could store two sorted word lists, one sorted by the reverse of each word. Then look for possibilities involving changing a letter in the first half in the reversed list and for the latter half in the normal list. All the existing neighbors can then be found without making any non-word strings. This can even be extended to n lists, each sorted by omitting one letter from all the words.
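The n sorted lists are not implemented here. As a related sketch, swapping in a hash-based variant of the same "omit one letter" idea: group words under patterns in which one position is replaced by a placeholder. The helper names build_buckets and neighbors are hypothetical, and the "*" placeholder is assumed never to occur in a word.

```python
from collections import defaultdict


def build_buckets(word_list):
    """Map each pattern with one position blanked out (e.g. 'h*t')
    to the set of words that match it."""
    buckets = defaultdict(set)
    for word in word_list:
        for i in range(len(word)):
            buckets[word[:i] + "*" + word[i + 1:]].add(word)
    return buckets


def neighbors(word, buckets):
    """Return all listed words that differ from `word` in exactly one letter."""
    found = set()
    for i in range(len(word)):
        found |= buckets.get(word[:i] + "*" + word[i + 1:], set())
    found.discard(word)   # a word is not its own neighbor
    return found
```

As with the sorted lists, only real words are ever produced, at the cost of building the buckets once up front.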
I have a problem to solve: recursively search for a string in a list (the string and the list each have length at least 2) and return its position. For example, given ab and the list ['a','b','c'], the function should return '(0,2)', since ab starts at index 0 and ends at index 1 (we add one more to the end).
If we had bc with the same list, the function should return '(1,3)'.
If we had ac with the same list, the function should return not found.
Note that I'm solving a bigger problem, which is to recursively search for a string in a matrix of characters (where it appears top to bottom or left to right only), but I am nowhere near the solution. So I'm starting by searching for a word in one row of the matrix at a given index (the same as searching for a word in a normal list). My code might use char_mat[idx]; treat it as a normal list such as ['c','d','e'].
Note that my code is full of bugs and doesn't work, so I explained what I tried to do underneath it.
def search_at_idx(search_word, char_mat, idx, start, end):
    if len(char_mat[idx]) == 2:
        if ''.join(char_mat[idx]) == search_word:
            return 0,2
        else:
            return 'not found', 'not found'
    start, end = search_at_idx(search_word, char_mat[idx][1:], idx, start+1, end)
    return start, end
The idea here is to find the base case of the recursion (when the length of the list reaches 2). For that small problem I just check whether my word is equal to the chars joined together as a string, and return the position of the string if it is, or not found otherwise.
Then for the recursive step, I send the list without the first character and my start index + 1. If this function does the whole job for me (the recursion hypothesis), I still need to check the last element in the list for my recursion to work. (I don't really know if this is the way to do it, since the last index might not be part of the word, so I got stuck.) I know that I made a lot of mistakes and I'm nowhere near the correct answer. I would really appreciate any explanation or help in understanding how to solve this problem so I can move on to my bigger problem, which is finding the string in a matrix of chars.
I've thrown together a little example that should get you a few steps ahead
char_mat = [['c', 'e', 'l', 'k', 'v'],]
search_word = 'lk'

def search_at_idx(search_word, char_mat, idx, start=0):
    if len(char_mat[idx]) < len(search_word):
        return 'not', 'found'
    if ''.join(char_mat[idx][:len(search_word)]) == search_word:
        return start, start+len(search_word)
    char_mat[idx] = char_mat[idx][1:]
    start, end = search_at_idx(search_word, char_mat, idx, start+1)
    return start, end

print(search_at_idx(search_word, char_mat, 0))
To point out a few errors of yours:
In your recursion, you use char_mat[idx][1:]. This passes a slice of the row, not the modified matrix, so in the next call char_mat[idx] picks out the single letter at that index of the row instead of a whole row. I recommend using a debugger and stepping through the program to check the contents of your variables.
Instead of tracking both start and end, you can assume that the found word has the same length as the word you are searching for, so the end position is always start + len(search_word).
If you have any additional questions about my code, please comment.
Here's an example using a list comprehension, if that counts as a loophole:
row = char_mat[idx]
windows = ["".join(row[i:i + len(search_word)]) for i in range(len(row) - len(search_word) + 1)]
foundword = windows.index(search_word) if search_word in windows else None
print((foundword, foundword + len(search_word)) if foundword is not None else 'Not found')
l = ["a","b","c"]

def my_indexes(pattern, look_list, indx_val):
    if pattern == "".join(look_list)[:2]:
        return indx_val, indx_val+2  # end index is exclusive, as in the question
    else:
        if len(look_list) == 2:
            return None
        return my_indexes(pattern, look_list[1:],indx_val+1)

print(my_indexes("bc",l,0))
Two options:
1. We find the case we are looking for, so the first two elements of our list are "ab", or
2. "a" and "b" are not the first two elements of our list. We call the same function without the first element of the list and increase indx_val so our result will be correct. We stop doing this when len(list) == 2 and we haven't found a match (assuming we're looking for a pattern of 2 chars).
edit: for all lengths
l = ["a","b","c","d"]

def my_indexes(pattern, look_list, indx_val):
    if pattern == "".join(look_list)[:len(pattern)]:
        return indx_val, indx_val+len(pattern)  # end index is exclusive
    else:
        if len(look_list) == len(pattern):
            return None
        return my_indexes(pattern, look_list[1:],indx_val+1)

print(my_indexes("cd",l,0))
I have a few words (strings) like 'hefg', 'dhck', 'dkhc', 'lmno' which are to be converted to new words by swapping some or all of the characters, such that the new word is lexicographically greater than the original word and is also the smallest of all the words greater than the original.
For example, 'dhck'
should output 'dhkc' and not 'kdhc', 'dchk' or any other.
I have these inputs:
hefg
dhck
dkhc
fedcbabcd
which should output
hegf
dhkc
hcdk
fedcbabdc
I have tried this code in Python; it worked for all except 'dkhc' and 'fedcbabcd'.
I have figured out that in the case of 'fedcbabcd' the first character is the max, so it is not getting swapped, and
I'm getting "ValueError: min() arg is an empty sequence".
How can I modify the algorithm to fix these cases?
list1=['d','k','h','c']
list2=[]
maxVal=list1.index(max(list1))
for i in range(maxVal):
    temp=list1[maxVal]
    list1[maxVal]=list1[i-1]
    list1[i-1]=temp
    list2.append(''.join(list1))
print(min(list2))
You can try something like this:
iterate over the characters in the string in reverse order
keep track of the characters you've already seen, and where you saw them
if you've seen a character larger than the current character, swap the current character with the smallest of those larger characters
sort all the characters after that position to get the minimum string
Example code:
def next_word(word):
    word = list(word)
    seen = {}
    for i in range(len(word)-1, -1, -1):
        if any(x > word[i] for x in seen):
            x = min(x for x in seen if x > word[i])
            word[i], word[seen[x]] = word[seen[x]], word[i]
            return ''.join(word[:i+1] + sorted(word[i+1:]))
        if word[i] not in seen:
            seen[word[i]] = i

for word in ["hefg", "dhck", "dkhc", "fedcbabcd"]:
    print(word, next_word(word))
Result:
hefg hegf
dhck dhkc
dkhc hcdk
fedcbabcd fedcbabdc
The max character and its position don't influence the algorithm in the general case. For example, for 'fedcbabcd', you could prepend an a or a z to the string and it wouldn't change the fact that you need to swap the final two letters.
Consider the input 'dgfecba'. Here, the output is 'eabcdfg'. Why? Notice that the final six letters are sorted in decreasing order, so by changing anything there, you get a smaller string lexicographically, which is no good. It follows that you need to replace the initial 'd'. What should we put in its place? We want something greater than 'd', but as small as possible, so 'e'. What about the remaining six letters? Again, we want a string that's as small as possible, so we sort the letters lexicographically: 'eabcdfg'.
So the algorithm is:
start at the back of the string (right end);
go left while the symbols keep increasing;
let i be the rightmost position where s[i] < s[i + 1]; in our case, that's i = 0;
leave the symbols on position 0, 1, ..., i - 1 untouched;
find the position among i+1 ... n-1 containing the least symbol that's greater than s[i]; call this position j; in our case, j = 3;
swap s[i] and s[j]; in our case, we obtain 'egfdcba';
reverse the string s[i+1] ... s[n-1]; in our case, we obtain 'eabcdfg'.
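The steps above can be sketched as follows. The function name next_permutation and the choice to return None when no larger arrangement exists are assumptions for illustration, not part of the answer:

```python
def next_permutation(s):
    """Return the smallest rearrangement of s that is greater than s,
    or None if s is already the largest one (assumed convention)."""
    chars = list(s)
    n = len(chars)

    # Walk left from the end while the symbols keep increasing (or stay equal).
    i = n - 2
    while i >= 0 and chars[i] >= chars[i + 1]:
        i -= 1
    if i < 0:
        return None  # the whole string is non-increasing

    # The suffix chars[i+1:] is non-increasing, so the rightmost symbol
    # greater than chars[i] is also the least such symbol.
    j = n - 1
    while chars[j] <= chars[i]:
        j -= 1

    chars[i], chars[j] = chars[j], chars[i]
    # Reversing the non-increasing suffix sorts it in ascending order.
    chars[i + 1:] = reversed(chars[i + 1:])
    return "".join(chars)
```

For 'dgfecba' this picks i = 0 and j = 3, matching the walkthrough.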
Your problem can be reworded as finding the next lexicographical permutation of a string.
The algorithm in the above link is described as follows:
1) Find the longest non-increasing suffix
2) The element to the left of the suffix is our pivot
3) Find the right-most successor of the pivot in the suffix
4) Swap the successor and the pivot
5) Reverse the suffix
The above algorithm is especially interesting because it is O(n).
Code
def next_lexicographical(word):
    word = list(word)
    # Find the pivot and the successor
    pivot = next(i for i in range(len(word) - 2, -1, -1) if word[i] < word[i+1])
    successor = next(i for i in range(len(word) - 1, pivot, -1) if word[i] > word[pivot])
    # Swap the pivot and the successor
    word[pivot], word[successor] = word[successor], word[pivot]
    # Reverse the suffix
    word[pivot+1:] = word[-1:pivot:-1]
    # Reform the word and return it
    return ''.join(word)
The above algorithm will raise a StopIteration exception if the word is already the last lexicographical permutation.
Example
words = [
    'hefg',
    'dhck',
    'dkhc',
    'fedcbabcd'
]

for word in words:
    print(next_lexicographical(word))
Output
hegf
dhkc
hcdk
fedcbabdc
Given a string str and a list of variable-length prefixes p, I want to find all possible prefixes found at the start of str, allowing for up to k mismatches and wildcards (dot character) in str.
I only want to search at the beginning of the string and need to do this efficiently for len(p) <= 1000; k <= 5 and millions of strs.
So for example:
str = 'abc.efghijklmnop'
p = ['abc', 'xxx', 'xbc', 'abcxx', 'abcxxx']
k = 1
result = ['abc', 'xbc', 'abcxx'] #but not 'xxx', 'abcxxx'
Is there an efficient algorithm for this, ideally with a python implementation already available?
My current idea would be to walk through str character by character and keep a running tally of each prefix's mismatch count.
At each step, I would calculate a new list of candidates which is the list of prefixes that do not have too many mismatches.
If I reach the end of a prefix it gets added to the returned list.
So something like this:
def find_prefixes_with_mismatches(str, p, k):
    p_with_end = [prefix+'$' for prefix in p]
    candidates = list(range(len(p)))
    mismatches = [0 for _ in candidates]
    result = []
    for char_ix in range(len(str)):
        #at each iteration we build a new set of candidates
        new_candidates = []
        for prefix_ix in candidates:
            #have we reached the end?
            if p_with_end[prefix_ix][char_ix] == '$':
                #then this is a match
                result.append(p[prefix_ix])
                #do not add to new_candidates
            else:
                #do we have a mismatch
                if str[char_ix] != p_with_end[prefix_ix][char_ix] and str[char_ix] != '.' and p_with_end[prefix_ix][char_ix] != '.':
                    mismatches[prefix_ix] += 1
                    #only add to new_candidates if the number is still not >k
                    if mismatches[prefix_ix] <= k:
                        new_candidates.append(prefix_ix)
                else:
                    #if not, this remains a candidate
                    new_candidates.append(prefix_ix)
        #update candidates
        candidates = new_candidates
    return result
But I'm not sure if this will be any more efficient than simply searching one prefix after the other, since it requires rebuilding this list of candidates at every step.
I do not know of something that does exactly this.
But if I were to write it, I'd try constructing a trie of all possible decision points, with an attached vector of all states you wound up in. You would then take each string, walk the trie until you hit a final matched node, then return the precompiled vector of results.
If you've got a lot of prefixes and have set k large, that trie may be very big. But if you're amortizing creating it against running it on millions of strings, it may be worthwhile.
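This is not the precompiled structure described above, but a simpler related sketch: a plain prefix trie built once and then walked for each string with a mismatch budget. The names build_trie and find_prefixes are hypothetical, and "$" is assumed never to occur inside a prefix.

```python
def build_trie(prefixes):
    """Build a nested-dict trie; the '$' key stores the complete prefix."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node["$"] = prefix  # end marker (assumed not to appear in prefixes)
    return root


def find_prefixes(text, trie, k):
    """Return every stored prefix matching the start of text with at most
    k mismatches; '.' on either side matches any character."""
    results = []
    stack = [(trie, 0, 0)]  # (trie node, position in text, mismatches so far)
    while stack:
        node, pos, mismatches = stack.pop()
        if "$" in node:
            results.append(node["$"])
        if pos == len(text):
            continue
        for ch, child in node.items():
            if ch == "$":
                continue
            cost = 0 if (ch == text[pos] or ch == "." or text[pos] == ".") else 1
            if mismatches + cost <= k:
                stack.append((child, pos + 1, mismatches + cost))
    return results
```

Prefixes that share leading characters share trie nodes, so their common part is compared once per string rather than once per prefix. The trie is built once and reused for all the strings.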
We've started doing lists in our class and I'm a bit confused, so I'm coming here since previous questions and answers have helped me in the past.
The first question was to sum up all the negative numbers in a list. I think I got it right but just want to double check.
import random

def sumNegative(lst):
    sum = 0
    for e in lst:
        if e < 0:
            sum = sum + e
    return sum

lst = []
for i in range(100):
    lst.append(random.randrange(-1000, 1000))
print(sumNegative(lst))
For the 2nd question, I'm a bit stuck on how to write it. The question was:
Count how many words occur in a list up to and including the first occurrence of the word “sap”. I'm assuming it's a random list, but I wasn't given much info so I'm just going off that.
I know the ending would be similar, but I have no idea how the initial part would work since it uses strings as opposed to numbers.
I wrote code for an in-class problem, which was to count how many odd numbers are in a list (it was a random list there, so I'm assuming it's random for this question as well), and got:
import random

def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 = 0:
            odd = odd + 1
    return odd

lst = []
for i in range(100):
    lst.append(random.randint(0, 1000))
print(countOdd(lst))
How exactly would I change this to fit the criteria for the 2nd question? I'm just confused on that part. Thanks.
The code to sum negative numbers looks fine! I might suggest testing it on a list that you can check manually, such as:
print(sumNegative([1, -1, -2]))
The same logic would apply to your random list.
A note about your countOdd function, it appears that you are missing an = (== checks for equality, = is for assignment) and the code seems to count even numbers, not odd. The code should be:
def countOdd(lst):
    odd = 0
    for e in lst:
        if e%2 == 1: # Odd%2 == 1
            odd = odd + 1
    return odd
As for your second question, you can use a very similar function:
def countWordsBeforeSap(inputList):
    numWords = 0
    for word in inputList:
        if word.lower() != "sap":
            numWords = numWords + 1
        else:
            return numWords

inputList = ["trees", "produce", "sap"]
print(countWordsBeforeSap(inputList))
To explain the above, the countWordsBeforeSap function:
Starts iterating through the words.
If the word is anything other than "sap", it increments the counter and continues.
If the word IS "sap", it returns early from the function, so "sap" itself is not counted. Add 1 to the result if the count should include it.
The function could be more general by passing in the word that you wanted to check for:
def countWordsBefore(inputList, wordToCheckFor):
    numWords = 0
    for word in inputList:
        if word.lower() != wordToCheckFor:
            numWords = numWords + 1
        else:
            return numWords

inputList = ["trees", "produce", "sap"]
print(countWordsBefore(inputList, "sap"))
If the words that you are checking come from a single string then you would initially need to split the string into individual words like so:
inputString = "Trees produce sap"
inputList = inputString.split(" ")
Which splits the initial string into words that are separated by spaces.
Hope this helps!
Tom
def count_words(lst, end="sap"):
    """Note that I added an extra input parameter.
    This input parameter has a default value of "sap" which is the actual question.
    However you can change this input parameter to any other word if you want to by
    just doing count_words(lst, "another_word").
    """
    words = []
    # First we need to loop through each item in the list.
    for item in lst:
        # We append the item to our "words" list first thing in this loop,
        # as this will make sure we will count up to and INCLUDING.
        words.append(item)
        # Now check if we have reached the 'end' word.
        if item == end:
            # Break out of the loop prematurely, as we have reached the end.
            break
    # Our 'words' list now has all the words up to and including the 'end' variable.
    # 'len' will return how many items there are in the list.
    return len(words)

lst = ["something", "another", "woo", "sap", "this_wont_be_counted"]
print(count_words(lst))
Hope this helps you understand lists better!
You can make effective use of list/generator comprehensions. The snippets below are fast and memory efficient.
1. Sum of negatives:
print(sum(i for i in lst if i < 0))
2. Count of words before sap: like your sample list, this assumes there are no numbers in the list (add 1 to include "sap" itself).
print(lst.index('sap'))
If it's a random list, filter out the non-strings first, then find the index of sap:
l = ['a','b',1,2,'sap',3,'d']
l = [x for x in l if isinstance(x, str)]
print(l.index('sap'))
3. Count of odd numbers:
print(sum(i%2 != 0 for i in lst))
I'd like to compute the edits required to transform one string, A, into another string B using only inserts and deletions, with the minimum number of operations required.
So something like "kitten" -> "sitting" would yield a list of operations something like ("delete at 0", "insert 's' at 0", "delete at 4", "insert 'i' at 3", "insert 'g' at 6")
Is there an algorithm to do this? Note that I don't want the edit distance; I want the actual edits.
I had an assignment similar to this at one point. Try using an A* variant. Construct a graph of possible 'neighbors' for a given word and search outward using A*, with the distance heuristic being the number of letters that need to change in the current word to reach the target. It should be clear why this is a good heuristic: it's always going to underestimate. You can think of a neighbor as a word that can be reached from the current word using only one operation. With slight modification, this algorithm should solve your problem optimally.
I tried to make something that works, at least for your precise case.
word_before = "kitten"
word_after = "sitting"

# If the strings aren't the same length, we stuff the smallest one with spaces
if len(word_before) > len(word_after):
    word_after += " "*(len(word_before)-len(word_after))
elif len(word_before) < len(word_after):
    word_before += " "*(len(word_after)-len(word_before))

operations = []
for idx, char in enumerate(word_before):
    if char != word_after[idx]:
        if char != " ":
            operations += ["delete at "+str(idx)]
        operations += ["insert '"+word_after[idx]+"' at "+str(idx)]
print(operations)
This should be what you're looking for. It uses itertools.zip_longest to zip the two strings together and iterate over them in pairs, compares each pair, and applies the appropriate operation. Each operation is appended to a list, and after each one the strings are compared: the loop breaks out if they match and continues if they don't.
from itertools import zip_longest

a = "kitten"
b = "sitting"

def transform(a, b):
    ops = []
    for i, j in zip_longest(a, b, fillvalue=''):
        if i == j:
            pass
        else:
            index = a.index(i)
            print(a, b)
            ops.append('delete {} '.format(i)) if i != '' else ''
            a = a.replace(i, '')
            if a == b:
                break
            ops[-1] += 'insert {} at {},'.format(j, index if i not in b else b.index(j))
    return ops

result = transform(a, b)
print(result, ' {} operation(s) was carried out'.format(len(result)))
Since you only have delete and insert operations, this is an instance of the Longest Common Subsequence problem: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
Indeed, two strings S and T, of lengths n and m, have a common subsequence of length k if and only if you can transform S into T with m+n-2k insert and delete operations. As intuition: the order of the letters is preserved both when adding and deleting letters and when taking a subsequence.
EDIT: since you asked for the list of edits, one possible way to do them is to first remove all the characters of S that are not in the common subsequence, and then insert all the characters of T that are not in the common subsequence.
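A minimal sketch of recovering the actual edits from the standard LCS dynamic-programming table. The names edit_script and apply_ops and the (operation, position, character) tuple format are assumptions for illustration. Unlike the two-pass order described above, this version interleaves deletes and inserts in one left-to-right pass; positions refer to the string as it stands when each operation is applied.

```python
def edit_script(s, t):
    """Return a minimal list of ('insert' | 'delete', position, char) edits
    that turns s into t when applied in order."""
    n, m = len(s), len(t)
    # lcs[i][j] = length of the longest common subsequence of s[i:] and t[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    ops = []
    i = j = pos = 0  # pos = index in the partially edited string
    while i < n or j < m:
        if i < n and j < m and s[i] == t[j]:
            # Shared character: keep it and move on.
            i, j, pos = i + 1, j + 1, pos + 1
        elif j < m and (i == n or lcs[i][j + 1] >= lcs[i + 1][j]):
            ops.append(("insert", pos, t[j]))
            j, pos = j + 1, pos + 1
        else:
            ops.append(("delete", pos, s[i]))
            i += 1
    return ops


def apply_ops(s, ops):
    """Apply the edits in order, to check that the script is valid."""
    chars = list(s)
    for op, pos, ch in ops:
        if op == "insert":
            chars.insert(pos, ch)
        else:
            del chars[pos]
    return "".join(chars)
```

For "kitten" and "sitting" the longest common subsequence has length 4 ("ittn"), so the script has 6 + 7 - 2*4 = 5 operations.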