find set of words in list

find set of words in list - python

In the example I made up below, I am trying to get the words STEM Employment from the text list. If I find those set of words in order, I would like to find the index number of the first word so I can then use that same index number for the width and height lists since they are parallel (meaning their len is always the same dynamic value)
teststring = ("STEM Employment").split(" ")
data = {"text":["some","more","STEM","Employment","data"],
"width":[100,45,50,90,354],
"height":[500,320,320,432,554]}
so for this example, the answer would be 50 and 320 because the first word is STEM. However I am not just looking for STEM I have to make sure that Employment follows right after STEM in the list.
I tried writing a forloop for this but my forloop stops short when it confirms the first word STEM. I am not sure how to fix it:
testchecker = 0
for testword in range(len(data)):
print(data["text"][testword])
for m in teststring:
# print(m)
print(testchecker)
if m in data["text"][testword]:
print("true")
testchecker = testchecker + 1
if testchecker == len(teststring):
print("match")
print(testword-testchecker+1)
pass
else:
testchecker = 0

You can make data["text"] a string with join and check for "STEM Employment" in that. Then find the index of "STEM".
teststring = "STEM Employment"
data = {"text":["some","more","STEM","Employment","data"],
"width":[100,45,50,90,354],
"height":[500,320,320,432,554]}
if teststring in " ".join(data["text"]):
idx = data["text"].index(teststring.split(' ')[0])
print(data["width"][idx], data["height"][idx])
Output:
50 320
Another option:
teststring = "STEM Employment".split(' ')
# Make sure all words in testring are in data["text"]
if all(s in data["text"] for s in teststring):
# Get the indexes of each word
indexes = [data["text"].index(s) for s in teststring]
# Make sure all indexes are sequential
if all(b - a == 1 for a, b in zip(indexes, indexes[1:])):
print(data["width"][indexes[0]], data["height"][indexes[0]])

Related

Find most common substring in a list of strings?

I have a Python list of string names where I would like to remove a common substring from all of the names.
And after reading this similar answer I could almost achieve the desired result using SequenceMatcher.
But only when all items have a common substring:
From List:
string 1 = myKey_apples
string 2 = myKey_appleses
string 3 = myKey_oranges
common substring = "myKey_"
To List:
string 1 = apples
string 2 = appleses
string 3 = oranges
However I have a slightly noisy list that contains a few scattered items that don't fit the same naming convention.
I would like to remove the "most common" substring from the majority:
From List:
string 1 = myKey_apples
string 2 = myKey_appleses
string 3 = myKey_oranges
string 4 = foo
string 5 = myKey_Banannas
common substring = ""
To List:
string 1 = apples
string 2 = appleses
string 3 = oranges
string 4 = foo
string 5 = Banannas
I need a way to match the "myKey_" substring so I can remove it from all names.
But when I use the SequenceMatcher the item "foo" causes the "longest match" to be equal to blank "".
I think the only way to solve this is to find the "most common substring". But how could that be accomplished?
Basic example code:
from difflib import SequenceMatcher
names = ["myKey_apples",
"myKey_appleses",
"myKey_oranges",
#"foo",
"myKey_Banannas"]
string2 = names[0]
for i in range(1, len(names)):
string1 = string2
string2 = names[i]
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(string1[match.a: match.a + match.size]) # -> myKey_

Given names = ["myKey_apples", "myKey_appleses", "myKey_oranges", "foo", "myKey_Banannas"]
An O(n^2) solution I can think of is to find all possible substrings and storing them in a dictionary with the number of times they occur :
substring_counts={}
for i in range(0, len(names)):
for j in range(i+1,len(names)):
string1 = names[i]
string2 = names[j]
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
matching_substring=string1[match.a:match.a+match.size]
if(matching_substring not in substring_counts):
substring_counts[matching_substring]=1
else:
substring_counts[matching_substring]+=1
print(substring_counts) #{'myKey_': 5, 'myKey_apples': 1, 'o': 1, '': 3}
And then picking the maximum occurring substring
import operator
max_occurring_substring=max(substring_counts.iteritems(), key=operator.itemgetter(1))[0]
print(max_occurring_substring) #myKey_

Here's a overly verbose solution to your problem:
def find_matching_key(list_in, max_key_only = True):
"""
returns the longest matching key in the list * with the highest frequency
"""
keys = {}
curr_key = ''
# If n does not exceed max_n, don't bother adding
max_n = 0
for word in list(set(list_in)): #get unique values to speed up
for i in range(len(word)):
# Look up the whole word, then one less letter, sequentially
curr_key = word[0:len(word)-i]
# if not in, count occurance
if curr_key not in keys.keys() and curr_key!='':
n = 0
for word2 in list_in:
if curr_key in word2:
n+=1
# if large n, Add to dictionary
if n > max_n:
max_n = n
keys[curr_key] = n
# Finish the word
# Finish for loop
if max_key_only:
return max(keys, key=keys.get)
else:
return keys
# Create your "from list"
From_List = [
"myKey_apples",
"myKey_appleses",
"myKey_oranges",
"foo",
"myKey_Banannas"
]
# Use the function
key = find_matching_key(From_List, True)
# Iterate over your list, replacing values
new_From_List = [x.replace(key,'') for x in From_List]
print(new_From_List)
['apples', 'appleses', 'oranges', 'foo', 'Banannas']
Needless to say, this solution would look a lot neater with recursion. Thought I'd sketch out a rough dynamic programming solution for you though.

I would first find the starting letter with the most occurrences. Then I would take each word having that starting letter, and take while all these words have matching letters. Then in the end I would remove the prefix that was found from each starting word:
from collections import Counter
from itertools import takewhile
strings = ["myKey_apples", "myKey_appleses", "myKey_oranges", "berries"]
def remove_mc_prefix(words):
cnt = Counter()
for word in words:
cnt[word[0]] += 1
first_letter = list(cnt)[0]
filter_list = [word for word in words if word[0] == first_letter]
filter_list.sort(key = lambda s: len(s)) # To avoid iob
prefix = ""
length = len(filter_list[0])
for i in range(length):
test = filter_list[0][i]
if all([word[i] == test for word in filter_list]):
prefix += test
else: break
return [word[len(prefix):] if word.startswith(prefix) else word for word in words]
print(remove_mc_prefix(strings))
Out: ['apples', 'appleses', 'oranges', 'berries']

To find the most-common-substring from list of python-string
I already tested on python-3.10.5 I hope it will work for you.
I have the same use case but a different kind of task, I just need to find one common-pattern-string from a list of more than 100s files. To use as a regular-expression.
Your Basic example code is not working in my case. because 1st checking with 2nd, 2nd with 3rd, 3rd with 4th and so on. So, I change it to the most common substring and will check with each one.
The downside of this code is that if something is not common with the most common substring, the final most common substring will be an empty one.
But in my case, it is working.
from difflib import SequenceMatcher
for i in range(1, len(names)):
if i==1:
string1, string2 = names[0], names[i]
else:
string1, string2 = most_common_substring, names[i]
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
most_common_substring = string1[match.a: match.a + match.size]
print(f"most_common_substring : {most_common_substring}")
python python-3python-difflib

extract substring pattern

I have long file like 1200 sequences
>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
I want to read each possible pattern has cysteine in middle and has in the beginning five string and follow by other five string such as xxxxxCxxxxx
the output should be like this:
QDIQLCGMGIL
ILPEHCIIDIT
TISDNCVVIFS
FSKTSCSYCTM
this is the pogram only give position of C . it is not work like what I want
pos=[]
def find(ch,string1):
for i in range(len(string1)):
if ch == string1[i]:
pos.append(i)
return pos
z=find('C','AWERQRTCWERTYCTAAAACTTCTTT')
print z

You need to return outside the loop, you are returning on the first match so you only ever get a single character in your list:
def find(ch,string1):
pos = []
for i in range(len(string1)):
if ch == string1[i]:
pos.append(i)
return pos # outside
You can also use enumerate with a list comp in place of your range logic:
def indexes(ch, s1):
return [index for index, char in enumerate(s1)if char == ch and 5 >= index <= len(s1) - 6]
Each index in the list comp is the character index and each char is the actual character so we keep each index where char is equal to ch.
If you want the five chars that are both sides:
In [24]: s="CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP"
In [25]: inds = indexes("C",s)
In [26]: [s[i-5:i+6] for i in inds]
Out[26]: ['QDIQLCGMGIL', 'ILPEHCIIDIT']
I added checking the index as we obviously cannot get five chars before C if the index is < 5 and the same from the end.
You can do it all in a single function, yielding a slice when you find a match:
def find(ch, s):
ln = len(s)
for i, char in enumerate(s):
if ch == char and 5 <= i <= ln - 6:
yield s[i- 5:i + 6]
Where presuming the data in your question is actually two lines from yoru file like:
s="""">3fm8|A|A0JLQ2CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDALYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCY"""
Running:
for line in s.splitlines():
print(list(find("C" ,line)))
would output:
['0JLQ2CFLVNL', 'QDIQLCGMGIL', 'ILPEHCIIDIT']
['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
Which gives six matches not four as your expected output suggest so I presume you did not include all possible matches.
You can also speed up the code using str.find, starting at the last match index + 1 for each subsequent match
def find(ch, s):
ln, i = len(s) - 6, s.find(ch)
while 5 <= i <= ln:
yield s[i - 5:i + 6]
i = s.find(ch, i + 1)
Which will give the same output. Of course if the strings cannot overlap you can start looking for the next match much further in the string each time.

My solution is based on regex, and shows all possible solutions using regex and while loop. Thanks to #Smac89 for improving it by transforming it into a generator:
import re
string = """CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""
# Generator
def find_cysteine2(string):
# Create a loop that will utilize regex multiple times
# in order to capture matches within groups
while True:
# Find a match
data = re.search(r'(\w{5}C\w{5})',string)
# If match exists, let's collect the data
if data:
# Collect the string
yield data.group(1)
# Shrink the string to not include
# the previous result
location = data.start() + 1
string = string[location:]
# If there are no matches, stop the loop
else:
break
print [x for x in find_cysteine2(string)]
# ['QDIQLCGMGIL', 'ILPEHCIIDIT', 'TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']

Cesar Cipher on Python beginner level

''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
for i in word:
r = chr(ord(i)+shift)
if r > "z":
r = chr(ord(i) - 26 + shift)
word = word.replace(i, r)
return word
if __name__ == "__main__": print encrypt("programming", 3)
This gives me wrong answers on shifts higher than 1 and words longer then 2. I can't figure out why. Any help please?

Thilo explains the problem exactly. Let's step through it:
''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
for i in word:
r = chr(ord(i)+shift)
if r > "z":
r = chr(ord(i) - 26 + shift)
word = word.replace(i, r)
return word
Try encrypt('abc', 1) and see what happens:
First loop:
i = 'a'
r = chr(ord('a')+1) = 'b'
word = 'abc'.replace('a', 'b') = 'bbc'
Second loop:
i = 'b'
r = chr(ord('b')+1) = 'c'
word = 'bbc'.replace('b', 'c') = 'ccc'
Third loop:
i = 'c'
r = chr(ord('c')+1) = 'd'
word = 'ccc'.replace('c', 'd') = 'ddd'
You don't want to replace every instance of i with r, just this one. How would you do this? Well, if you keep track of the index, you can just replace at that index. The built-in enumerate function lets you get each index and each corresponding value at the same time.
for index, ch in enumerate(word):
r = chr(ord(ch)+shift)
if r > "z":
r = chr(ord(ch) - 26 + shift)
word = new_word_replacing_one_char(index, r)
Now you just have to write that new_word_replacing_one_char function, which is pretty easy if you know slicing. (If you haven't learned slicing yet, you may want to convert the string into a list of characters, so you can just say word[index] = r, and then convert back into a string at the end.)

I don't know how Python likes replacing characters in the word while you are iterating over it, but one thing that seems to be a problem for sure is repeated letters, because replace will replace all occurrences of the letter, not just the one you are currently looking at, so you will end up shifting those repeated letters more than once (as you hit them again in a later iteration).
Come to think of it, this will also happen with non-repeated letters. For example, shifting ABC by 1 will become -> BBC -> CCC -> DDD in your three iterations.

I had this assignment as well. The hint is you have to keep track of where the values wrap, and use that to your advantage. I also recommend using the upper function call so everything is the same case, reduces the number of checks to do.

In Python, strings are immutable - that is they cannot be changed. Lists, however, can be. So to use your algorithm, use a list instead:
''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
# Convert the word to a list
word = list(word)
# Iterate over the word by index
for i in xrange(len(word)):
# Get the character at i
c = word[i]
# Apply shift algorithm
r = chr(ord(c)+shift)
if r > "z":
r = chr(ord(c) - 26 + shift)
# Replace the character at i
word[i] = r
# Convert the list back to a string
return ''.join(word)
if __name__ == "__main__": print encrypt("programming", 3)

Python: How to replace N random string occurrences in text?

Say that I have 10 different tokens, "(TOKEN)" in a string. How do I replace 2 of those tokens, chosen at random, with some other string, leaving the other tokens intact?

>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'
In function form:
>>> def random_replace(text, token, replace, num_replacements):
num_tokens = text.count(token)
points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'
Test:
>>> for i in range(0,9):
print random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i)
....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....

If you need exactly two, then:
Detect the tokens (keep some links to them, like index into the string)
Choose two at random (random.choice)
Replace them

What are you trying to do, exactly? A good answer will depend on that...
That said, a brute-force solution that comes to mind is to:
Store the 10 tokens in an array, such that tokens[0] is the first token, tokens[1] is the second, ... and so on
Create a dictionary to associate each unique "(TOKEN)" with two numbers: start_idx, end_idx
Write a little parser that walks through your string and looks for each of the 10 tokens. Whenever one is found, record the start/end indexes (as start_idx, end_idx) in the string where that token occurs.
Once done parsing, generate a random number in the range [0,9]. Lets call this R
Now, your random "(TOKEN)" is tokens[R];
Use the dictionary in step (3) to find the start_idx, end_idx values in the string; replace the text there with "some other string"

My solution in code:
import random
s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2
def random_replace(s, replace_from, replace_to, amount_to_replace):
parts = s.split(replace_from)
indices = random.sample(xrange(len(parts) - 1), amount_to_replace)
replaced_s_parts = list()
for i in xrange(len(parts)):
replaced_s_parts.append(parts[i])
if i < len(parts) - 1:
if i in indices:
replaced_s_parts.append(replace_to)
else:
replaced_s_parts.append(replace_from)
return "".join(replaced_s_parts)
#TEST
for i in xrange(5):
print random_replace(s, replace_from, replace_to, 2)
Explanation:
Splits string into several parts using replace_from
Chooses indexes of tokens to replace using random.sample. This returned list contains unique numbers
Build a list for string reconstruction, replacing tokens with generated index by replace_to.
Concatenate all list elements into single string

Try this solution:
import random
def replace_random(tokens, eqv, n):
random_tokens = eqv.keys()
random.shuffle(random_tokens)
for i in xrange(n):
t = random_tokens[i]
tokens = tokens.replace(t, eqv[t])
return tokens
Assuming that a string with tokens exists, and a suitable equivalence table can be constructed with a replacement for each token:
tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'
equivalences = {
'(TOKEN1)' : 'REPLACEMENT1',
'(TOKEN2)' : 'REPLACEMENT2',
'(TOKEN3)' : 'REPLACEMENT3',
'(TOKEN4)' : 'REPLACEMENT4',
'(TOKEN5)' : 'REPLACEMENT5',
'(TOKEN6)' : 'REPLACEMENT6',
'(TOKEN7)' : 'REPLACEMENT7',
'(TOKEN8)' : 'REPLACEMENT8',
'(TOKEN9)' : 'REPLACEMENT9',
'(TOKEN10)' : 'REPLACEMENT10'
}
You can call it like this:
replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'

There are lots of ways to do this. My approach would be to write a function that takes the original string, the token string, and a function that returns the replacement text for an occurrence of the token in the original:
def strByReplacingTokensUsingFunction(original, token, function):
outputComponents = []
matchNumber = 0
unexaminedOffset = 0
while True:
matchOffset = original.find(token, unexaminedOffset)
if matchOffset < 0:
matchOffset = len(original)
outputComponents.append(original[unexaminedOffset:matchOffset])
if matchOffset == len(original):
break
unexaminedOffset = matchOffset + len(token)
replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
outputComponents.append(replacement)
matchNumber += 1
return ''.join(outputComponents)
(You could certainly change this to use shorter identifiers. My style is somewhat more verbose than typical Python style.)
Given that function, it's easy to replace two random occurrences out of ten. Here's some sample input:
sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
The random module has a handy method for picking random items from a population (not picking the same item twice):
import random
replacementIndexes = random.sample(range(10), 2)
Then we can use the function above to replace the randomly-chosen occurrences:
sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
(lambda matchNumber, token, **keywords:
'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print sampleOutput
And here's some test output:
a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k
Here's another run:
a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k

from random import sample
mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'
def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=10, index=0):
choices = sorted(sample(xrange(tokens),n_repl))
for i in xrange(choices[-1]+1):
index = mystr.index(substr, index) + 1
if i in choices:
mystr = mystr[:index-1] + mystr[index-1:].replace(substr,replacement,1)
return mystr
print replace(mystr,'(TOKEN)',2)

The Alphabet and Recursion

I'm almost done with my program, but I've made a subtle mistake. My program is supposed to take a word, and by changing one letter at a time, is eventually supposed to reach a target word, in the specified number of steps. I had been trying at first to look for similarities, for example: if the word was find, and the target word lose, here's how my program would output in 4 steps:
['find','fine','line','lone','lose]
Which is actually the output I wanted. But if you consider a tougher set of words, like Java and work, the output is supposed to be in 6 steps.
['java', 'lava', 'lave', 'wave', 'wove', 'wore', 'work']
So my mistake is that I didn't realize you could get to the target word, by using letters that don't exist in the target word or original word.
Here's my Original Code:
import string
def changeling(word,target,steps):
alpha=string.ascii_lowercase
x=word##word and target has been changed to keep the coding readable.
z=target
if steps==0 and word!= target:##if the target can't be reached, return nothing.
return []
if x==z:##if target has been reached.
return [z]
if len(word)!=len(target):##if the word and target word aren't the same length print error.
print "error"
return None
i=1
if lookup
if lookup(z[0]+x[1:]) is True and z[0]+x[1:]!=x :##check every letter that could be from z, in variations of, and check if they're in the dictionary.
word=z[0]+x[1:]
while i!=len(x):
if lookup(x[:i-1]+z[i-1]+x[i:]) and x[:i-1]+z[i-1]+x[i:]!=x:
word=x[:i-1]+z[i-1]+x[i:]
i+=1
if lookup(x[:len(x)-1]+z[len(word)-1]) and x[:len(x)-1]+z[len(x)-1]!=x :##same applies here.
word=x[:len(x)-1]+z[len(word)-1]
y = changeling(word,target,steps-1)
if y :
return [x] + y##used to concatenate the first word to the final list, and if the list goes past the amount of steps.
else:
return None
Here's my current code:
import string
def changeling(word,target,steps):
alpha=string.ascii_lowercase
x=word##word and target has been changed to keep the coding readable.
z=target
if steps==0 and word!= target:##if the target can't be reached, return nothing.
return []
if x==z:##if target has been reached.
return [z]
holderlist=[]
if len(word)!=len(target):##if the word and target word aren't the same length print error.
print "error"
return None
i=1
for items in alpha:
i=1
while i!=len(x):
if lookup(x[:i-1]+items+x[i:]) is True and x[:i-1]+items+x[i:]!=x:
word =x[:i-1]+items+x[i:]
holderlist.append(word)
i+=1
if lookup(x[:len(x)-1]+items) is True and x[:len(x)-1]+items!=x:
word=x[:len(x)-1]+items
holderlist.append(word)
y = changeling(word,target,steps-1)
if y :
return [x] + y##used to concatenate the first word to the final list, and if the/
list goes past the amount of steps.
else:
return None
The differences between the two is that the first checks every variation of find with the letters from lose. Meaning: lind, fond, fisd, and fine. Then, if it finds a working word with the lookup function, it calls changeling on that newfound word.
As opposed to my new program, which checks every variation of find with every single letter in the alphabet.
I can't seem to get this code to work. I've tested it by simply printing what the results are of find:
for items in alpha:
i=1
while i!=len(x):
print (x[:i-1]+items+x[i:])
i+=1
print (x[:len(x)-1]+items)
This gives:
aind
fand
fiad
fina
bind
fbnd
fibd
finb
cind
fcnd
ficd
finc
dind
fdnd
fidd
find
eind
fend
fied
fine
find
ffnd
fifd
finf
gind
fgnd
figd
fing
hind
fhnd
fihd
finh
iind
find
fiid
fini
jind
fjnd
fijd
finj
kind
fknd
fikd
fink
lind
flnd
fild
finl
mind
fmnd
fimd
finm
nind
fnnd
find
finn
oind
fond
fiod
fino
pind
fpnd
fipd
finp
qind
fqnd
fiqd
finq
rind
frnd
fird
finr
sind
fsnd
fisd
fins
tind
ftnd
fitd
fint
uind
fund
fiud
finu
vind
fvnd
fivd
finv
wind
fwnd
fiwd
finw
xind
fxnd
fixd
finx
yind
fynd
fiyd
finy
zind
fznd
fizd
finz
Which is perfect! Notice that each letter in the alphabet goes through my word at least once. Now, what my program does is use a helper function to determine if that word is in a dictionary that I've been given.
Consider this, instead of like my first program, I now receive multiple words that are legal, except when I do word=foundword it means I'm replacing the previous word each time. Which is why I'm trying holderlist.append(word).
I think my problem is that I need changeling to run through each word in holderlist, and I'm not sure how to do that. Although that's only speculation.
Any help would be appreciated,
Cheers.

I might be slightly confused about what you need, but by borrowing from this post I belive I have some code that should be helpful.
>>> alphabet = 'abcdefghijklmnopqrstuvwxyz'
>>> word = 'java'
>>> splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
>>> splits
[('', 'java'), ('j', 'ava'), ('ja', 'va'), ('jav', 'a'), ('java', '')]
>>> replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
>>> replaces
['aava', 'bava', 'cava', 'dava', 'eava', 'fava', 'gava', 'hava', 'iava', 'java', 'kava', 'lava', 'mava', 'nava', 'oava', 'pava', 'qava', 'rava', 'sava', 'tava', 'uava', 'vava', 'wav
a', 'xava', 'yava', 'zava', 'java', 'jbva', 'jcva', 'jdva', 'jeva', 'jfva', 'jgva', 'jhva', 'jiva', 'jjva', 'jkva', 'jlva', 'jmva', 'jnva', 'jova', 'jpva', 'jqva', 'jrva', 'jsva', '
jtva', 'juva', 'jvva', 'jwva', 'jxva', 'jyva', 'jzva', 'jaaa', 'jaba', 'jaca', 'jada', 'jaea', 'jafa', 'jaga', 'jaha', 'jaia', 'jaja', 'jaka', 'jala', 'jama', 'jana', 'jaoa', 'japa'
, 'jaqa', 'jara', 'jasa', 'jata', 'jaua', 'java', 'jawa', 'jaxa', 'jaya', 'jaza', 'java', 'javb', 'javc', 'javd', 'jave', 'javf', 'javg', 'javh', 'javi', 'javj', 'javk', 'javl', 'ja
vm', 'javn', 'javo', 'javp', 'javq', 'javr', 'javs', 'javt', 'javu', 'javv', 'javw', 'javx', 'javy', 'javz']
Once you have a list of all possible replaces, you can simply do
valid_words = [valid for valid in replaces if lookup(valid)]
Which should give you all words that can be formed by replacing 1 character in word. By placing this code in a separate method, you could take a word, obtain possible next words from that current word, and recurse over each of those words. For example:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def next_word(word):
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
return [valid for valid in replaces if lookup(valid)]
Is this enough help? I think your code could really benefit by separating tasks into smaller chunks.

Fixed your code:
import string
def changeling(word, target, steps):
alpha=string.ascii_lowercase
x = word #word and target has been changed to keep the coding readable.
z = target
if steps == 0 and word != target: #if the target can't be reached, return nothing.
return []
if x == z: #if target has been reached.
return [z]
holderlist = []
if len(word) != len(target): #if the word and target word aren't the same length print error.
raise BaseException("Starting word and target word not the same length: %d and %d" % (len(word),
i = 1
for items in alpha:
i=1
while i != len(x):
if lookup(x[:i-1] + items + x[i:]) is True and x[:i-1] + items + x[i:] != x:
word = x[:i-1] + items + x[i:]
holderlist.append(word)
i += 1
if lookup(x[:len(x)-1] + items) is True and x[:len(x)-1] + items != x:
word = x[:len(x)-1] + items
holderlist.append(word)
y = [changeling(pos_word, target, steps-1) for pos_word in holderlist]
if y:
return [x] + y #used to concatenate the first word to the final list, and if the list goes past the amount of steps.
else:
return None
Where len(word) and len(target), it'd be better to raise an exception than print something obscure, w/o a stack trace and non-fatal.
Oh and backslashes(\), not forward slashes(/), are used to continue lines. And they don't work on comments

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

find set of words in list - python

Related

Find most common substring in a list of strings?

extract substring pattern

Cesar Cipher on Python beginner level

Python: How to replace N random string occurrences in text?

The Alphabet and Recursion

Categories

Resources