The Alphabet and Recursion - python

I'm almost done with my program, but I've made a subtle mistake. The program is supposed to take a word and, by changing one letter at a time, reach a target word in a specified number of steps. At first I looked for similarities between the two words. For example, if the word was find and the target word lose, here's what my program would output in 4 steps:
['find','fine','line','lone','lose']
That is the output I wanted. But for a tougher pair of words, like java and work, the output is supposed to take 6 steps:
['java', 'lava', 'lave', 'wave', 'wove', 'wore', 'work']
So my mistake is that I didn't realize you could get to the target word, by using letters that don't exist in the target word or original word.
Here's my Original Code:
import string
def changeling(word,target,steps):
    alpha=string.ascii_lowercase
    x=word##word and target has been changed to keep the coding readable.
    z=target
    if steps==0 and word!= target:##if the target can't be reached, return nothing.
        return []
    if x==z:##if target has been reached.
        return [z]
    if len(word)!=len(target):##if the word and target word aren't the same length print error.
        print "error"
        return None
    i=1
    if lookup(z[0]+x[1:]) is True and z[0]+x[1:]!=x :##check every letter that could be from z, in variations of, and check if they're in the dictionary.
        word=z[0]+x[1:]
    while i!=len(x):
        if lookup(x[:i-1]+z[i-1]+x[i:]) and x[:i-1]+z[i-1]+x[i:]!=x:
            word=x[:i-1]+z[i-1]+x[i:]
        i+=1
    if lookup(x[:len(x)-1]+z[len(word)-1]) and x[:len(x)-1]+z[len(x)-1]!=x :##same applies here.
        word=x[:len(x)-1]+z[len(word)-1]
    y = changeling(word,target,steps-1)
    if y :
        return [x] + y##used to concatenate the first word to the final list, and if the list goes past the amount of steps.
    else:
        return None
Here's my current code:
import string
def changeling(word,target,steps):
    alpha=string.ascii_lowercase
    x=word##word and target has been changed to keep the coding readable.
    z=target
    if steps==0 and word!= target:##if the target can't be reached, return nothing.
        return []
    if x==z:##if target has been reached.
        return [z]
    holderlist=[]
    if len(word)!=len(target):##if the word and target word aren't the same length print error.
        print "error"
        return None
    i=1
    for items in alpha:
        i=1
        while i!=len(x):
            if lookup(x[:i-1]+items+x[i:]) is True and x[:i-1]+items+x[i:]!=x:
                word =x[:i-1]+items+x[i:]
                holderlist.append(word)
            i+=1
        if lookup(x[:len(x)-1]+items) is True and x[:len(x)-1]+items!=x:
            word=x[:len(x)-1]+items
            holderlist.append(word)
    y = changeling(word,target,steps-1)
    if y :
        return [x] + y##used to concatenate the first word to the final list, and if the/
list goes past the amount of steps.
    else:
        return None
The difference between the two is that the first checks every variation of find with the letters from lose, meaning lind, fond, fisd, and fine. Then, if it finds a working word with the lookup function, it calls changeling on that newfound word.
As opposed to my new program, which checks every variation of find with every single letter in the alphabet.
I can't seem to get this code to work. I've tested it by simply printing what the results are of find:
for items in alpha:
    i=1
    while i!=len(x):
        print (x[:i-1]+items+x[i:])
        i+=1
    print (x[:len(x)-1]+items)
This gives:
aind
fand
fiad
fina
bind
fbnd
fibd
finb
cind
fcnd
ficd
finc
dind
fdnd
fidd
find
eind
fend
fied
fine
find
ffnd
fifd
finf
gind
fgnd
figd
fing
hind
fhnd
fihd
finh
iind
find
fiid
fini
jind
fjnd
fijd
finj
kind
fknd
fikd
fink
lind
flnd
fild
finl
mind
fmnd
fimd
finm
nind
fnnd
find
finn
oind
fond
fiod
fino
pind
fpnd
fipd
finp
qind
fqnd
fiqd
finq
rind
frnd
fird
finr
sind
fsnd
fisd
fins
tind
ftnd
fitd
fint
uind
fund
fiud
finu
vind
fvnd
fivd
finv
wind
fwnd
fiwd
finw
xind
fxnd
fixd
finx
yind
fynd
fiyd
finy
zind
fznd
fizd
finz
Which is perfect! Notice that each letter in the alphabet goes through my word at least once. Now, what my program does is use a helper function to determine if that word is in a dictionary that I've been given.
Unlike my first program, I now get multiple legal words, and doing word=foundword replaces the previous word each time. That is why I'm trying holderlist.append(word).
I think my problem is that I need changeling to run through each word in holderlist, and I'm not sure how to do that. Although that's only speculation.
Any help would be appreciated,
Cheers.

I might be slightly confused about what you need, but by borrowing from this post I believe I have some code that should be helpful.
>>> alphabet = 'abcdefghijklmnopqrstuvwxyz'
>>> word = 'java'
>>> splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
>>> splits
[('', 'java'), ('j', 'ava'), ('ja', 'va'), ('jav', 'a'), ('java', '')]
>>> replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
>>> replaces
['aava', 'bava', 'cava', 'dava', 'eava', 'fava', 'gava', 'hava', 'iava', 'java', 'kava', 'lava', 'mava', 'nava', 'oava', 'pava', 'qava', 'rava', 'sava', 'tava', 'uava', 'vava', 'wava', 'xava', 'yava', 'zava',
 'java', 'jbva', 'jcva', 'jdva', 'jeva', 'jfva', 'jgva', 'jhva', 'jiva', 'jjva', 'jkva', 'jlva', 'jmva', 'jnva', 'jova', 'jpva', 'jqva', 'jrva', 'jsva', 'jtva', 'juva', 'jvva', 'jwva', 'jxva', 'jyva', 'jzva',
 'jaaa', 'jaba', 'jaca', 'jada', 'jaea', 'jafa', 'jaga', 'jaha', 'jaia', 'jaja', 'jaka', 'jala', 'jama', 'jana', 'jaoa', 'japa', 'jaqa', 'jara', 'jasa', 'jata', 'jaua', 'java', 'jawa', 'jaxa', 'jaya', 'jaza',
 'java', 'javb', 'javc', 'javd', 'jave', 'javf', 'javg', 'javh', 'javi', 'javj', 'javk', 'javl', 'javm', 'javn', 'javo', 'javp', 'javq', 'javr', 'javs', 'javt', 'javu', 'javv', 'javw', 'javx', 'javy', 'javz']
Once you have a list of all possible replaces, you can simply do
valid_words = [valid for valid in replaces if lookup(valid)]
Which should give you all words that can be formed by replacing 1 character in word. By placing this code in a separate method, you could take a word, obtain possible next words from that current word, and recurse over each of those words. For example:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def next_word(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    return [valid for valid in replaces if lookup(valid)]
Is this enough help? I think your code could really benefit by separating tasks into smaller chunks.
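To make the "recurse over each of those words" idea concrete, here is a minimal Python 3 sketch. The WORDS set and the lookup function are hypothetical stand-ins for the dictionary helper mentioned in the question, and next_words is only one possible way to generate candidates, so treat this as an illustration rather than the definitive solution.

```python
import string

# Hypothetical stand-in for the dictionary helper from the question.
WORDS = {"find", "fine", "line", "lone", "lose", "fond", "bind"}

def lookup(word):
    return word in WORDS

def next_words(word):
    """Dictionary words that differ from `word` in exactly one position."""
    candidates = (
        word[:i] + letter + word[i + 1:]
        for i in range(len(word))
        for letter in string.ascii_lowercase
    )
    return sorted({c for c in candidates if c != word and lookup(c)})

def changeling(word, target, steps):
    """Return a path using at most `steps` single-letter changes, or None."""
    if len(word) != len(target):
        raise ValueError("words must have the same length")
    if word == target:
        return [word]
    if steps == 0:
        return None
    # Try every candidate, not just the last one that was found.
    for candidate in next_words(word):
        rest = changeling(candidate, target, steps - 1)
        if rest is not None:
            return [word] + rest
    return None

print(changeling("find", "lose", 4))
# ['find', 'fine', 'line', 'lone', 'lose']
```

The search may revisit words it has already seen, but the steps counter bounds the recursion depth, so it always terminates.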

Fixed your code:
import string
def changeling(word, target, steps):
    alpha=string.ascii_lowercase
    x = word #word and target has been changed to keep the coding readable.
    z = target
    if steps == 0 and word != target: #if the target can't be reached, return nothing.
        return []
    if x == z: #if target has been reached.
        return [z]
    holderlist = []
    if len(word) != len(target): #if the word and target word aren't the same length, raise an error.
        raise ValueError("Starting word and target word not the same length: %d and %d" % (len(word), len(target)))
    i = 1
    for items in alpha:
        i=1
        while i != len(x):
            if lookup(x[:i-1] + items + x[i:]) is True and x[:i-1] + items + x[i:] != x:
                word = x[:i-1] + items + x[i:]
                holderlist.append(word)
            i += 1
        if lookup(x[:len(x)-1] + items) is True and x[:len(x)-1] + items != x:
            word = x[:len(x)-1] + items
            holderlist.append(word)
    for pos_word in holderlist: #try each candidate in turn and keep the first one that reaches the target.
        y = changeling(pos_word, target, steps-1)
        if y:
            return [x] + y #used to concatenate the first word to the final list.
    return None
Where len(word) and len(target) differ, it'd be better to raise an exception than to print something obscure that is non-fatal and has no stack trace.
Oh, and backslashes (\), not forward slashes (/), are used to continue lines. And they don't work on comments.

Related

find set of words in list

In the example I made up below, I am trying to get the words STEM Employment from the text list. If I find those set of words in order, I would like to find the index number of the first word so I can then use that same index number for the width and height lists since they are parallel (meaning their len is always the same dynamic value)
teststring = ("STEM Employment").split(" ")
data = {"text":["some","more","STEM","Employment","data"],
"width":[100,45,50,90,354],
"height":[500,320,320,432,554]}
so for this example, the answer would be 50 and 320 because the first word is STEM. However I am not just looking for STEM I have to make sure that Employment follows right after STEM in the list.
I tried writing a for loop for this, but it stops short when it confirms the first word, STEM. I am not sure how to fix it:
testchecker = 0
for testword in range(len(data)):
    print(data["text"][testword])
    for m in teststring:
        # print(m)
        print(testchecker)
        if m in data["text"][testword]:
            print("true")
            testchecker = testchecker + 1
            if testchecker == len(teststring):
                print("match")
                print(testword-testchecker+1)
                pass
        else:
            testchecker = 0
You can make data["text"] a string with join and check for "STEM Employment" in that. Then find the index of "STEM".
teststring = "STEM Employment"
data = {"text":["some","more","STEM","Employment","data"],
"width":[100,45,50,90,354],
"height":[500,320,320,432,554]}
if teststring in " ".join(data["text"]):
    idx = data["text"].index(teststring.split(' ')[0])
    print(data["width"][idx], data["height"][idx])
Output:
50 320
Another option:
teststring = "STEM Employment".split(' ')
# Make sure all words in teststring are in data["text"]
if all(s in data["text"] for s in teststring):
    # Get the indexes of each word
    indexes = [data["text"].index(s) for s in teststring]
    # Make sure all indexes are sequential
    if all(b - a == 1 for a, b in zip(indexes, indexes[1:])):
        print(data["width"][indexes[0]], data["height"][indexes[0]])
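Both options above rely on list.index, which returns only the first occurrence of a word, so they may miss a match if STEM also appears earlier in the list on its own. A sliding-window comparison avoids that. The sketch below is one possible way to do it; find_phrase is a hypothetical helper name, not something from the answers.

```python
def find_phrase(words, phrase):
    """Return the index where the word list `phrase` starts in `words`, or -1."""
    n = len(phrase)
    for start in range(len(words) - n + 1):
        # Compare a window of n consecutive words against the phrase.
        if words[start:start + n] == phrase:
            return start
    return -1

data = {"text": ["some", "more", "STEM", "Employment", "data"],
        "width": [100, 45, 50, 90, 354],
        "height": [500, 320, 320, 432, 554]}

idx = find_phrase(data["text"], "STEM Employment".split())
if idx != -1:
    print(data["width"][idx], data["height"][idx])  # 50 320
```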

Ignoring Changed Index Check (Python)

I have made a script:
our_word = "Success"
def duplicate_encode(word):
    char_list = []
    final_str = ""
    changed_index = []
    base_wrd = word.lower()
    for k in base_wrd:
        char_list.append(k)
    for i in range(0, len(char_list)):
        count = 0
        for j in range(i + 1, len(char_list)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
                else:
                    continue
        if count > 0:
            char_list[i] = ")"
        else:
            char_list[i] = "("
    print(changed_index)
    print(char_list)
    final_str = "".join(char_list)
    return final_str
print(duplicate_encode(our_word))
Essentially, the purpose of this script is to convert a string to a new string where each character in the new string is "(" if that character appears only once in the original string, or ")" if it appears more than once. I have made a rather layered script (I am relatively new to Python, so I didn't want to use any helpful built-in functions) that attempts to do this. My issue is that the check for whether the current index has previously been edited (to prevent it from changing again) seems to be ignored. So instead of the intended )())()) I get )()((((. I'd really appreciate an insightful answer on why I am getting this issue and ways to work around it, since I'm trying to build an intuitive understanding of Python. Thanks!
word = "Success"
print(''.join([')' if word.lower().count(c) > 1 else '(' for c in word.lower()]))
The issue here has nothing to do with your understanding of Python. It's purely algorithmic. If you retain this 'layered' algorithm, it is essential that you add one more check in the "i" loop.
our_word = "Success"
def duplicate_encode(word):
    char_list = list(word.lower())
    changed_index = []
    for i in range(len(word)):
        count = 0
        for j in range(i + 1, len(word)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
        if i not in changed_index: # the new important check to avoid reversal of already assigned ')' to '('
            char_list[i] = ")" if count > 0 else "("
    return "".join(char_list)
print(duplicate_encode(our_word))
Your algorithm can be greatly simplified if you avoid using char_list as both the input and output. Instead, you can create an output list of the same length filled with ( by default, and then only change an element when a duplicate is found. The loops will simply walk along the entire input list once for each character looking for any matches (other than self-matches). If one is found, the output list can be updated and the inner loop will break and move on to the next character.
The final code should look like this:
def duplicate_encode(word):
    char_list = list(word.lower())
    output = list('(' * len(word))
    for i in range(len(char_list)):
        for j in range(len(char_list)):
            if i != j and char_list[i] == char_list[j]:
                output[i] = ')'
                break
    return ''.join(output)
for our_word in (
    'Success',
    'ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ',
):
    result = duplicate_encode(our_word)
    print(our_word)
    print(result)
Output:
Success
)())())
ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ
))(()(()))))())))()()((())))()))))
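If built-in helpers are acceptable, the nested loops can be replaced by counting each character once up front. This is a sketch using collections.Counter from the standard library. It is an alternative to the approaches above, not the asker's original algorithm.

```python
from collections import Counter

def duplicate_encode(word):
    lowered = word.lower()
    # Count every character once, then look the counts up per position.
    counts = Counter(lowered)
    return "".join(")" if counts[ch] > 1 else "(" for ch in lowered)

print(duplicate_encode("Success"))  # )())())
```

Because the output is built separately from the input, there is no risk of a ")" written earlier being compared against later characters.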

How to remove pair of small and capital letters in a string?

Basically what I'm trying to do is create a code that removes a pair of lower and capital letters. e.g. :
AbBax -» x
cCdatabasacCADde -» database
I've tried doing this but it gives me an error, maybe my train of thought is wrong.
def decode(c_p):
    t_cp=[]
    for i in c_p:
        t_cp+=[I,]
    #here I added each character from the string to a list so it would be easier to analyse each character
    new_c_p=""
    for c in range(len(t_cp)-1):
        if not t_cp[c]==chr(ord(c)) and t_cp[c+1]==chr(ord(c) + 32) or not t_cp[c]==chr(ord(c) + 32) and t_cp[c+1]==chr(ord(c)) :
            #here I analyse the index c and c+1 to know if the first character corresponds to the next in capital or vice-versa, if doesn't correspond, I add that character into new_c_p
            new_c_p+=c
    return new_c_p
Here's a slightly simpler approach:
def decode(c_p):
    while True:
        for i, pair in enumerate(zip(c_p, c_p[1:])):
            up, lo = sorted(pair)
            if up.lower() == lo and up == lo.upper():
                c_p = c_p[:i] + c_p[i+2:]
                break
        else:
            return c_p
decode("cCdatabasacCADde")
# 'database'
And here is an even better one that does not start all the way from the beginning every time and has actually linear time and space complexity:
def decode(c_p):
    stack = []
    for c in c_p:
        if not stack:
            stack.append(c)
        else:
            up, lo = sorted((stack[-1], c))
            if up.lower() == lo and lo.upper() == up:
                stack.pop()
            else:
                stack.append(c)
    return "".join(stack)
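The same stack idea can be written without sorting each pair, by testing whether the new character is the case-swapped form of the character on top of the stack. This variant is a sketch based on str.swapcase and is an assumption of mine, not part of the answers above.

```python
def decode(text):
    stack = []
    for ch in text:
        # A pair cancels when the two characters differ only in case.
        if stack and stack[-1] != ch and stack[-1] == ch.swapcase():
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

print(decode("AbBax"))             # x
print(decode("cCdatabasacCADde"))  # database
```

The `stack[-1] != ch` check keeps characters without a case, such as digits, from cancelling against themselves.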

Infer Spaces: Ignore Numbers and Special Characters

I'm trying to pass over everything that isn't a letter (apostrophes, etc), and then continue on afterwards. The number should be in its respective place in the result. This is from this accepted answer, and the word list is here.
The string is "thereare7deadlysins"
The code below outputs "there are 7 d e a d l y s i n s"
I'm trying to get "there are 7 deadly sins"
I tried (below), but I receive IndexError: 'string index out of range'
# Backtrack to recover the minimal-cost string.
out = []
i = len(s)
while i>0:
    if isinstance(s[i], int):
        continue
    c,k = best_match(i)
    assert c == cost[i]
    out.append(s[i-k:i])
    i -= k
The entire thing is:
from math import log
import string
# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("/Users/.../Desktop/wordlist.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)
table = string.maketrans("","")
l = "".join("thereare7deadlysins".split()).lower()
def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""
    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)
    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)
    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k
    return " ".join(reversed(out))
def test_trans(s):
    return s.translate(table, string.punctuation)
s = test_trans(l)
print(infer_spaces(s))
EDIT: Based on the accepted answer the following solved my problem:
1. Remove single letters from the wordlist (except a, e, i)
2. Added the following below wordcost.
nums = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
for n in nums:
wordcost[n] = log(2)
The suggestion to change wordcost to (below) did not produce optimal results.
wordcost = dict( (k, (i+1)*log(1+len(k))) for i,k in enumerate(words) )
Example:
String: "Recall8importantscreeningquestions"
Original wordcost: "recall 8 important screening questions"
Suggested wordcost: "re call 8 important s c re e n in g question s"
Note that the word list contains all 26 individual letters as words.
With just the following modifications, your algorithm will correctly infer the spaces for the input string "therearesevendeadlysins" (i.e. "7" changed to "seven"):
1. Remove the single-letter words from the word list (perhaps except for "a" and "i").
2. As @Pm 2Ring noted, change the definition of wordcost to:
wordcost = dict( (k, (i+1)*log(1+len(k))) for i,k in enumerate(words) )
So there is something about non-letters that is goofing up your algorithm. Since you have already removed punctuation, perhaps you should treat a string of non-letters as a single word.
For instance, if you add:
wordcost["7"] = log(2)
(in addition to changes 1 and 2 above) your algorithm works on the original test string.
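Use
i = len(s) - 1
to avoid IndexError: 'string index out of range' (s[len(s)] is one position past the end of the string), and
if s[i].isdigit():
is the test you're looking for, because s[i] is a one-character string, never an int.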
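The effect of giving digits their own cost can be tried on a small scale. The following Python 3 sketch uses a hypothetical miniature word list in place of the real word list file, and it records the best match length at each position instead of recomputing best_match during the backtrack. It is a simplified illustration of the same dynamic-programming idea, not the original code.

```python
from math import log

# Hypothetical miniature word list, most frequent first (the real list is a file).
words = ["there", "are", "the", "a", "i", "deadly", "sins", "sin", "dead"]
wordcost = {w: log((rank + 1) * log(len(words))) for rank, w in enumerate(words)}
# Treat every digit as a cheap one-character word, as in the edit above.
for digit in "0123456789":
    wordcost[digit] = log(2)
maxword = max(len(w) for w in wordcost)

def infer_spaces(s):
    cost = [0.0]   # cost[i] is the best cost for the first i characters
    length = [0]   # length[i] is the length of the last word in that split
    for i in range(1, len(s) + 1):
        best_cost, best_len = min(
            (cost[i - k] + wordcost.get(s[i - k:i], float("inf")), k)
            for k in range(1, min(i, maxword) + 1)
        )
        cost.append(best_cost)
        length.append(best_len)
    # Backtrack using the recorded lengths.
    out = []
    i = len(s)
    while i > 0:
        k = length[i]
        out.append(s[i - k:i])
        i -= k
    return " ".join(reversed(out))

print(infer_spaces("thereare7deadlysins"))  # there are 7 deadly sins
```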

Cesar Cipher on Python beginner level

''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    for i in word:
        r = chr(ord(i)+shift)
        if r > "z":
            r = chr(ord(i) - 26 + shift)
        word = word.replace(i, r)
    return word
if __name__ == "__main__": print encrypt("programming", 3)
This gives me wrong answers for shifts higher than 1 and words longer than 2 letters. I can't figure out why. Any help, please?
Thilo explains the problem exactly. Let's step through it:
''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    for i in word:
        r = chr(ord(i)+shift)
        if r > "z":
            r = chr(ord(i) - 26 + shift)
        word = word.replace(i, r)
    return word
Try encrypt('abc', 1) and see what happens:
First loop:
i = 'a'
r = chr(ord('a')+1) = 'b'
word = 'abc'.replace('a', 'b') = 'bbc'
Second loop:
i = 'b'
r = chr(ord('b')+1) = 'c'
word = 'bbc'.replace('b', 'c') = 'ccc'
Third loop:
i = 'c'
r = chr(ord('c')+1) = 'd'
word = 'ccc'.replace('c', 'd') = 'ddd'
You don't want to replace every instance of i with r, just this one. How would you do this? Well, if you keep track of the index, you can just replace at that index. The built-in enumerate function lets you get each index and each corresponding value at the same time.
for index, ch in enumerate(word):
    r = chr(ord(ch)+shift)
    if r > "z":
        r = chr(ord(ch) - 26 + shift)
    word = new_word_replacing_one_char(index, r)
Now you just have to write that new_word_replacing_one_char function, which is pretty easy if you know slicing. (If you haven't learned slicing yet, you may want to convert the string into a list of characters, so you can just say word[index] = r, and then convert back into a string at the end.)
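For reference, the slicing version of that helper might look like the sketch below (Python 3). The name replace_at and its three-argument signature are my own choice for illustration, and the shift logic is kept exactly as in the question.

```python
def replace_at(word, index, new_char):
    """Return a copy of `word` with the character at `index` replaced."""
    return word[:index] + new_char + word[index + 1:]

def encrypt(word, shift):
    word = word.lower()
    # enumerate() iterates over the original string object, so rebinding
    # `word` inside the loop does not disturb the iteration.
    for index, ch in enumerate(word):
        r = chr(ord(ch) + shift)
        if r > "z":
            r = chr(ord(ch) - 26 + shift)
        word = replace_at(word, index, r)
    return word

print(encrypt("abc", 1))          # bcd
print(encrypt("programming", 3))  # surjudpplqj
```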
I don't know how Python likes replacing characters in the word while you are iterating over it, but one thing that seems to be a problem for sure is repeated letters, because replace will replace all occurrences of the letter, not just the one you are currently looking at, so you will end up shifting those repeated letters more than once (as you hit them again in a later iteration).
Come to think of it, this will also happen with non-repeated letters. For example, shifting ABC by 1 will become -> BBC -> CCC -> DDD in your three iterations.
I had this assignment as well. The hint is that you have to keep track of where the values wrap, and use that to your advantage. I also recommend calling upper() so everything is the same case, which reduces the number of checks you need.
In Python, strings are immutable - that is they cannot be changed. Lists, however, can be. So to use your algorithm, use a list instead:
''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    # Convert the word to a list
    word = list(word)
    # Iterate over the word by index
    for i in xrange(len(word)):
        # Get the character at i
        c = word[i]
        # Apply shift algorithm
        r = chr(ord(c)+shift)
        if r > "z":
            r = chr(ord(c) - 26 + shift)
        # Replace the character at i
        word[i] = r
    # Convert the list back to a string
    return ''.join(word)
if __name__ == "__main__": print encrypt("programming", 3)
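The wrap-around can also be handled with modular arithmetic on the letter's position in the alphabet, which copes with shifts larger than 26 and with negative shifts. This Python 3 sketch is a different technique from the list-based version above. Leaving non-letters unchanged is an assumption on my part, since the question does not say how they should be treated.

```python
import string

def encrypt(word, shift):
    alphabet = string.ascii_lowercase
    result = []
    for ch in word.lower():
        if ch in alphabet:
            # Shift the letter's position and wrap around with modulo 26.
            result.append(alphabet[(alphabet.index(ch) + shift) % 26])
        else:
            # Assumption: characters outside a-z are passed through as-is.
            result.append(ch)
    return "".join(result)

print(encrypt("programming", 3))  # surjudpplqj
```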
