Where is the fault in this function? - python

with open("german.txt") as f:
    words = f.read().split()
for word in words:
    color = word.lower().replace("o", "0").replace("i", "1").replace("s", "5").replace("t", "7")
    if len(word) == 3 or len(word) == 6:
        ok = True
        for c in color:
            if c not in "abcdef0123456789":
                ok = False
                break
        if ok:
            print(word, "#" + color)
This program works, but why doesn't it work anymore when I add a function structure to it?
with open("german.txt") as f:
    words = f.read().split()
def replace_letters_with_numbers(word):
    color = word.lower().replace("o", "0").replace("i", "1").replace("s", "5").replace("t", "7")
def is_valid_color(word):
    if len(word) == 3 or len(word) == 6:
        ok = True
        for c in color:
            if c not in "abcdef0123456789":
                ok = False
                break
        if ok:
            print(word, "#" + color)
for word in words:
    replace_letters_with_numbers(word)
    is_valid_color(word)
Thanks in advance!

There are a few different issues.
Scoping
Return values
With your top-down approach, all the variables are defined at module level, so every line can see them.
Your function-based version, however, does not do anything useful. For example, the replace_letters_with_numbers function just assigns the new word to a local variable and returns nothing, so the result is discarded when the function ends. One way to solve this is to return the updated word.
def replace_letters_with_numbers(word):
    return word.lower().replace("o", "0").replace("i", "1").replace("s", "5").replace("t", "7")
Another issue is that when you run is_valid_color, the variable color does not exist in that function's scope. If we assume that the word being passed into this function has already had its letters replaced, we can change the variable color to word inside the function.
With these changes we can also change how the for loop is executed, by assigning a variable color when we call replace_letters_with_numbers, resulting in:
for word in words:
    color = replace_letters_with_numbers(word)
    is_valid_color(color)
There are other improvements that could be made; however, the main issue was that the variable color was not defined.
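Putting both fixes together, a minimal sketch of the function-based version could look like the following. It is one possible arrangement, not the only one: the helper find_color_words and the constant HEX_DIGITS are names introduced here for illustration, and the words are passed in as a list rather than read from german.txt so the example runs on its own.

```python
HEX_DIGITS = set("abcdef0123456789")  # hypothetical helper constant


def replace_letters_with_numbers(word):
    # Return the converted word instead of storing it in a local variable.
    return (word.lower()
            .replace("o", "0").replace("i", "1")
            .replace("s", "5").replace("t", "7"))


def is_valid_color(color):
    # Receives the already converted word, so nothing is read from outer scope.
    return len(color) in (3, 6) and all(c in HEX_DIGITS for c in color)


def find_color_words(words):
    # Hypothetical driver: collect (original word, hex color) pairs.
    result = []
    for word in words:
        color = replace_letters_with_numbers(word)
        if is_valid_color(color):
            result.append((word, "#" + color))
    return result
```

With the original file, the call would be find_color_words(f.read().split()) inside the with block, followed by a loop that prints each pair.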

What I find wrong with your code:
Your first function is correct, but it does not return anything. This means you cannot use the word resulting from this function in the subsequent functions, so add:
    return color
In your second function you must pass the returned value instead of word, because word will still be the raw word you read.
Also, based on my observation, I suggest splitting the code into 3 functions for clarity, as below. From your code, this is my understanding (correct me if I got this wrong):
read a series of words and replace specific letters with numbers
filter only words of length 3 and 6
filter words from 2 with characters that are within the alpha-numeric characters defined by you.
The below code does this exactly.
def color_word(word):
    color = word.lower().replace("o", "0").replace("i", "1").replace("s", "5").replace("t", "7")
    return color
def word_length_is_3_and_6(word):
    ok = False
    if len(word) == 3 or len(word) == 6:
        ok = True
        print(f'{word} has {len(word)} letters \n resulting color from word : {color_word(word)}')
    return ok
def check_alpha_numeric_state(word):
    ok=True
    for each_char in word:
        if each_char not in "abcdef0123456789":
            ok = False
            print('all letters in word are not alphanumeric - [abcdef0123456789]')
            break
    else:
        # runs only if the loop finished without a break
        print(word, "#" + color_word(word))
if __name__ == '__main__':
    words = ['happil', 'nos', 'Trusts', 'shon']
    for each_word in words:
        colorword = color_word(each_word) #returns the word with replaced letters and numbers
        word_length = word_length_is_3_and_6(each_word) #boolean
        if word_length:
            check_alpha_numeric_state(colorword)

Related

Find words in a string of text (where letters aren't consecutive)

I'd like to write code to find specific instances of words in a long string of text, where the letters making up the word are not adjacent but appear in order.
The string I use will be thousands of characters long, but as a shorter example: I want to find instances of the word "chair" within the following string, where each letter is no more than 10 characters from the previous.
djecskjwidhl;asdjakimcoperkldrlkadkj
To avoid the problem of finding many instances in a large string, I'd prefer to limit the distance between every two letters to 10. So the word chair in the string abcCabcabcHabcAabdIabcR would count. But the word chair in the string abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR would not count.
Can I do this with python code? If so I'd appreciate an example that I could work with.
Maybe paste the string of text or use an input file? Have it search for the word or words I want, and then identify if those words are there?
Thanks.
This code below will do what you want:
will_find = "aaaaaaaaaaaaaaaaaaaaaaaabcCabcabcHabcAabdIabcR"
wont_find = "abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR"
looking_for = "CHAIR"
max_look = 10
def find_word(characters, word):
    i = characters.find(word[0])
    if i == -1:
        print("I couldnt find the first character ...")
        return False
    for symbol in word:
        print(characters[i:i + max_look+1])
        if symbol in characters[i:i + max_look+1]:
            i += characters[i: i + max_look+1].find(symbol)
            print("{} is in the range of {} [{}]".format(symbol, characters[i:i+ max_look], i))
            continue
        else:
            print("Couldnt find {} in {}".format(symbol, characters[i: i + max_look]))
            return False
    return True
find_word(will_find, looking_for)
print("--------")
find_word(wont_find, looking_for)
An alternative, this may also work for you.
long_string = 'djecskjwidhl;asdjakimcoperkldrlkadkj'
check_word = 'chair'
def substringChecker(longString, substring):
    starting_index = []
    n , derived_word = 0, substring[0]
    for i, char in enumerate(longString[:-11]):
        if char == substring[n] and substring[n + 1] in longString[i : i + 11]:
            n += 1
            derived_word += substring[n]
            starting_index.append(i)
            if len(derived_word) == len(substring):
                return derived_word == substring, starting_index[0]
    return False
print(substringChecker(long_string, check_word))
(True, 3)
To check if the word is there:
string = "abccabcabchabcaabdiabcr"
word = "chair"
while word:
    # look for the next letter within the next 10 characters
    index = string[:10].find(word[0])
    if index == -1:
        break
    string = string[index+1:]
    word = word[1:]
if not word:
    print("found")
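The approaches above are greedy: they commit to the first matching letter they see and give up if the rest of the word does not follow from there. As a sketch of an alternative, the function below tries every candidate position inside the allowed window and backtracks when a choice leads nowhere. The name find_with_max_gap and its return value (a list of indices, or None) are choices made for this example, it assumes a non-empty word, and the match is case-sensitive.

```python
def find_with_max_gap(text, word, max_gap=10):
    """Return the indices of the letters of `word` in `text`, or None.

    Each letter must lie at most `max_gap` positions after the previous one.
    """
    def search(pos, k):
        # pos: index where letter k-1 was matched; k: index of the next letter
        if k == len(word):
            return []
        window_end = min(len(text), pos + max_gap + 1)
        for j in range(pos + 1, window_end):
            if text[j] == word[k]:
                rest = search(j, k + 1)
                if rest is not None:
                    return [j] + rest
        return None

    # Try every occurrence of the first letter as a starting point.
    for start, ch in enumerate(text):
        if ch == word[0]:
            rest = search(start, 1)
            if rest is not None:
                return [start] + rest
    return None


print(find_with_max_gap("abcCabcabcHabcAabdIabcR", "CHAIR"))
print(find_with_max_gap("abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR", "CHAIR"))
```

For a case-insensitive search, lower-case both arguments before the call. Backtracking can be slow on adversarial input, but with a window of 10 and short words it should be fine for strings of a few thousand characters.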

Find out if string contains a combination of letters in a specific order

I am attempting to write a program to find words in the English language that contain 3 letters of your choice, in order, but not necessarily consecutively. For example, the letter combination EJS would output, among others, the word EJectS. You supply the letters, and the program outputs the words.
However, the program does not give the letters in the right order, and does not work at all with double letters, like the letters FSF or VVC. I hope someone can tell me how I can fix this error.
Here is the full code:
with open("words_alpha.txt") as words:
    wlist = list(words)
while True:
    elim1 = []
    elim2 = []
    elim3 = []
    search = input("input letters here: ")
    for element1 in wlist:
        element1 = element1[:-1]
        val1 = element1.find(search[0])
        if val1 > -1:
            elim1.append(element1)
    for element2 in elim1:
        val2 = element2[(val1):].find(search[2])
        if val2 > -1:
            elim2.append(element2)
    for element3 in elim2:
        val3 = element3[((val1+val2)):].find(search[1])
        if val3 > -1:
            elim3.append(element3)
    print(elim3)
You are making this very complicated for yourself. To test whether a word contains the letters E, J and S in that order, you can match it with the regex E.*J.*S:
>>> import re
>>> re.search('E.*J.*S', 'EJectS')
<_sre.SRE_Match object; span=(0, 6), match='EJectS'>
>>> re.search('E.*J.*S', 'JEt engineS') is None
True
So here's a simple way to write a function which tests for an arbitrary combination of letters:
import re
def contains_letters_in_order(word, letters):
    regex = '.*'.join(map(re.escape, letters))
    return re.search(regex, word) is not None
Examples:
>>> contains_letters_in_order('EJectS', 'EJS')
True
>>> contains_letters_in_order('JEt engineS', 'EJS')
False
>>> contains_letters_in_order('ABra Cadabra', 'ABC')
True
>>> contains_letters_in_order('Abra CadaBra', 'ABC')
False
If you want to test every word in a wordlist, it is worth doing pattern = re.compile(regex) once, and then pattern.search(word) for each word.
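As a sketch of that suggestion, the pattern can be compiled once and reused for the whole word list. The function name words_containing is made up for this example, and the word list is passed in as a plain list of strings.

```python
import re


def words_containing(words, letters):
    # Build the pattern once, e.g. 'e.*j.*s' for the letters 'ejs'.
    pattern = re.compile('.*'.join(map(re.escape, letters)))
    # search() succeeds if the letters occur in order anywhere in the word.
    return [word for word in words if pattern.search(word)]


print(words_containing(["ejects", "jest", "subjects"], "ejs"))
```

Because each letter becomes its own piece of the pattern, repeated letters such as 'fsf' need no special handling.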
You need to read the file correctly with read(), and since there is a newline between each word, call split('\n') to properly create the word list. The logic is simple. If all the letters are in the word, get the index for each letter, and check that the order of the indexes matches the order of the letters.
with open('words_alpha.txt') as file:
    word_list = file.read().split('\n')
search = input("input letters here: ").lower()
found = []
for word in word_list:
    if all(x in word for x in search):
        i = word.find(search[0])
        j = word.find(search[1], i + 1)
        k = word.find(search[2], j + 1)
        if i < j < k:
            found.append(word)
print(found)
Using a function:
def get_words_with_letters(word_list, search):
    search = search.lower()
    for word in word_list:
        if all(x in word for x in search):
            i = word.find(search[0])
            j = word.find(search[1], i + 1)
            k = word.find(search[2], j + 1)
            if i < j < k:
                yield word
words = list(get_words_with_letters(word_list, 'fsf'))
The issue with your code is that you're using val1 from a specific word in your first loop for a different word in your second loop. So val1 will be the wrong value most of the time: for every word in your second loop you're using the position of the first letter in the last word you checked in your first loop.
There are a lot of ways to solve what you're trying to do. However, my code below should be fairly close to what you had in mind with your solution. I have tried to explain everything that's going on in the comments:
# Read words from file
with open("words_alpha.txt") as f:
    words = f.readlines()
# Begin infinite loop
while True:
    # Get user input
    search = input("Input letters here: ")
    # Loop over all words
    for word in words:
        # Remove newline characters at the end
        word = word.strip()
        # Start looking for the letters at the beginning of the word
        position = -1
        # Check position for each letter
        for letter in search:
            # Search from just past the previous match; find() with a start
            # argument returns an index into the whole word, not into a slice
            position = word.find(letter, position + 1)
            # Break out of loop if letter not found
            if position < 0:
                break
        # If there was no `break` in the loop, the word contains all letters
        else:
            print(word)
For every new letter we start looking beginning at position + 1 where position is the position of the previously found letter. (That's why we have to do position = -1, so we start looking for the first letter at -1 + 1 = 0.)
You should ideally move the removal of \n outside of the loop, so you will have to do it once and not for every search. I just left it inside the loop for consistency with your code.
Also, by the way, there's no handling of uppercase/lowercase for now. So, for example, should the search for abc be different from Abc? I'm not sure what you need there.
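The same "keep searching after the previous match" idea can also be written with an iterator, and that version works for any number of letters. This is only a sketch: the name is_subsequence is chosen for the example, and lower-casing both sides is an assumption about how the case question should be resolved.

```python
def is_subsequence(letters, word):
    # A single iterator over the word: each `ch in it` test consumes
    # characters up to and including the match, so the next letter is
    # only searched for in what remains.
    it = iter(word.lower())
    return all(ch in it for ch in letters.lower())


print(is_subsequence("EJS", "ejects"))
print(is_subsequence("abb", "xxab"))
```

The second call is the kind of repeated-letter case that is easy to get wrong, since "xxab" contains only one b.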

Fastest way to determine if two strings differ by a single letter in a large set of strings

I am trying to compare two strings and add one of the strings to a list if they are almost equal (differ by a single letter). What would be the fastest way to do this as my set of words is over 90k and doing this often takes too long?
EDIT: one of the words (comparison_word in code below) does not change.
EDIT2: the words must be of equal length
This is my current code:
for word in set_of_words:
    amount = 0
    if len(word) == len(comparison_word):
        for i in range(len(word)):
            if comparison_word[i] != word[i]:
                amount += 1
        if amount == 1:
            list_of_words.append(word)
return list_of_words
You might find zip is more efficient than indexing:
def almost_equal(set_of_words,comp):
    ln = len(comp)
    for word in set_of_words:
        count = 0
        if len(word) == ln:
            for a, b in zip(word, comp):
                count += a != b
                if count == 2:
                    break
            else:
                # no break: at most one difference; skip identical words
                if count == 1:
                    yield word
Demo:
In [5]: list(almost_equal(["foo","bar","foob","foe"],"foa"))
Out[5]: ['foo', 'foe']
The following searches my dictionary of 61K words in about 25 msec.
import re
def search(word, text):
ws = [r'\b{}[^{}]{}\b'.format(w[:i],w[i],w[i+1:]) for i in range(len(word))]
for mo in re.finditer('|'.join(ws), text):
yield mo.group()
with open("/12dicts/5desk.txt") as f:
text = f.read()
for hit in search('zealoos', text):
print(hit) #prints zealous
Presuming that the list of strings is in a file, one string per line, read it in as one long string and use a regular expression to search the string for matches.
search() takes a word like 'what' and turns it into a regular expression like this:
\b[^w]hat\b|\bw[^h]at\b|\bwh[^a]t\b|\bwha[^t]\b
It then scans all the words and finds all the near misses, at C speed.
The idea is to reduce the amount of work being done:
n_comparison_word = len(comparison_word)
for word in set_of_words:
    amount = 0
    n_word = len(word)
    if n_word != n_comparison_word:
        continue
    for i in range(n_word):
        if comparison_word[i] != word[i]:
            amount += 1
            if amount == 2:
                break
    if amount == 1:
        list_of_words.append(word)
return list_of_words
Some notes:
The value of len(comparison_word) needs to be computed only once (ever).
The value of len(word) needs to be computed once (per iteration of the loop).
You know you can stop looking at a word when amount reaches the value 2 (or more - in any case that word can no longer be part of the result).
It may be worth reading this part of the Python documentation regarding the continue and break statements which are both used in the code.
Haven't done exhaustive testing, but if comparison_word is not too long (fewer than 6 letters), and your set_of_words can change, then it might be worth it to compute all acceptable words, store those in a set, and simply iterate through set_of_words and test for word in acceptable_words.
If not, here's my take on your code:
for word in set_of_words:
    different_letter_exists = False
    length = len(word)
    if length == len(comparison_word):
        for i, letter in enumerate(word):
            if letter != comparison_word[i]:
                if different_letter_exists:
                    break
                else:
                    different_letter_exists = True
        else:
            if different_letter_exists:
                list_of_words.append(word)
Essentially: for every word, once you encounter a different letter, different_letter_exists is set to True. If you encounter another one, you break out of the loop. The else clause of the for loop runs only if the loop gets all the way to the end without a break, so a word is added only when exactly one different letter exists.
Good luck :)
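The idea mentioned above of computing all acceptable words up front can be sketched as follows. It assumes the words use only lowercase ASCII letters, and the function names are invented for this example. Since the word collection is described as a set, the sketch uses a set intersection instead of an explicit loop.

```python
from string import ascii_lowercase


def one_letter_variants(word, alphabet=ascii_lowercase):
    # Every string that differs from `word` in exactly one position.
    return {
        word[:i] + ch + word[i + 1:]
        for i in range(len(word))
        for ch in alphabet
        if ch != word[i]
    }


def near_matches(set_of_words, comparison_word):
    # Set intersection keeps only the variants that are real words.
    return sorted(one_letter_variants(comparison_word) & set(set_of_words))


print(near_matches({"foo", "bar", "foob", "foe", "foa"}, "foa"))
```

This generates 25 candidates per letter of comparison_word, so the cost depends on the length of that word rather than on the size of the word set.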

exercise 12.4 from "think python", the script doesnt get to the end

What is the longest English word, that remains a valid English word, as you remove its
letters one at a time?
Now, letters can be removed from either end, or the middle, but you can’t rearrange any
of the letters. Every time you drop a letter, you wind up with another English word. If
you do that, you’re eventually going to wind up with one letter and that too is going
to be an English word—one that’s found in the dictionary. I want to know what’s the
longest word and how many letters does it have?
I’m going to give you a little modest example: Sprite. Ok? You start off with sprite,
you take a letter off, one from the interior of the word, take the r away, and we’re left
with the word spite, then we take the e off the end, we’re left with spit, we take the s off,
we’re left with pit, it, and I.
I wrote it by defining two functions:
reduced - for a given word returning all the reduced words that are
real words. if the word is "a" or "i" that means the word is the
minimum length word so it returns True. if the word does not have any
real reducible words it returns False.
create - for a given word (it actually gets a one word string), returns True if the word is reducible till it gets to "a" or "i"
def reduced(words):
    ''' creating a list of reduced words True : 'a' or 'i' False : no reduced words'''
    words = list()
    if word == 'a' or word == 'i':
        return True
    for letter in range(len(word)):
        reduced_word = word[:letter] + word[letter+1:]
        if reduced_word in world_list:
            words.append(reduced_word)
    if len(words) == 0:
        return False
    return words
def create(root):
    '''
    getting a list type!
    return True : root reducable till 0
    return False: else'''
    if root == True:
        return True
    elif root == False:
        return False
    else:
        for word in root:
            word = reduced(word)
            return create(word)
fin = open("words.txt")
world_list = list() #world list
reducable_words = list() #list of reducable words
longest = "" # the longest reducable word
# Creating world_list
for line in fin:
    word = line.strip()
    world_list.append(word)
# Creating tuples list
for word in world_list:
    if (create([word])) == True:
        reducable_words.append(word)
print(reducable_words)
The problem is, the script never gets to the last line, there is a problem with the second for loop. The world_list is correctly appended, so I can't see why this isn't working.
Your create() function seems of no value, I suggest tossing it altogether and focusing on your reduced() function which is very close. I've made some small changes to it: world_list -> word_list; instead of True or False, it returns an empty or non-empty list; it only returns non-empty if a complete reduction can be found (but ignores multiple possible reductions.)
def reduced(word):
    ''' returns a list of reduced words or an empty list if no reduced words '''
    if word == 'a' or word == 'i':
        return list(word)
    words = list()
    for letter in range(len(word)):
        reduced_word = word[:letter] + word[letter + 1:]
        if reduced_word in word_list:
            words = reduced(reduced_word)
            if words:
                return [word] + words
    return words
Using this slight rework of reduced(), you should be able to finish off the remainder of your program.
('daunt', '->', ['daunt', 'aunt', 'ant', 'at', 'a'])
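One way the rest of the program could be finished is sketched below. It is not the answerer's code: it stores the words in a set, since testing membership in a list of many thousands of words on every recursive call is slow and may be part of why the original script appears never to finish, and it caches results per word. The function name longest_reducible is hypothetical, and 'a' and 'i' are added to the word set as an assumption in case the word file lacks single-letter words.

```python
def longest_reducible(word_list):
    # Set membership is O(1) on average; a list would be scanned each time.
    words = set(word_list) | {'a', 'i'}
    memo = {}  # word -> True/False, so each word is analysed only once

    def is_reducible(word):
        if word in ('a', 'i'):
            return True
        if word not in memo:
            children = (word[:i] + word[i + 1:] for i in range(len(word)))
            memo[word] = any(
                child in words and is_reducible(child) for child in children
            )
        return memo[word]

    reducible = [word for word in words if is_reducible(word)]
    return max(reducible, key=len)


print(longest_reducible(["sprite", "spite", "spit", "pit", "it", "i", "daunt"]))
```

With the real file, the argument would be the stripped lines of words.txt.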

Small issue with Palindrome program

I've been working on this Palindrome program and am really close to completing it. Close to the point that it's driving me a bit crazy, haha.
The program is supposed to check each 'phrase' to determine if it is a Palindrome or not and return a lowercase version with white space and punctuation removed if it is in fact a Palindrome. Otherwise, if not, it's supposed to return None.
I'm just having an issue with bringing my test data into the function. I can't seem to think of the correct way of dealing with it. It's probably pretty simple...Any ideas?
Thanks!
import string
def reverse(word):
    newword = ''
    letterflag = -1
    for numoletter in word:
        newword += word[letterflag]
        letterflag -= 1
    return newword
def Palindromize(phrase):
    for punct in string.punctuation:
        phrase= phrase.replace(punct,'')
        phrase = str(phrase.lower())
    firstindex = 0
    secondindex = len(phrase) - 1
    flag = 0
    while firstindex != secondindex and firstindex < secondindex:
        char1 = phrase[firstindex]
        char2 = phrase[secondindex]
        if char1 == char2:
            flag += 1
        else:
            break
        firstindex += 1
        secondindex -= 1
    if flag == len(phrase) // (2):
        print phrase.strip()
    else:
        print None
def Main():
    data = ['Murder for a jar of red rum',12321, 'nope', 'abcbA', 3443, 'what',
            'Never odd or even', 'Rats live on no evil star']
    for word in data:
        word == word.split()
        Palindromize(word)
if __name__ == '__main__':
    Main()
Maybe this line is causing the problems.
for word in data:
    word == word.split() # This line.
    Palindromize(word)
You're testing for equality here rather than reassigning the variable word, which can be done using word = word.split(). word then becomes a list, and you might want to iterate over the list using
for elem in word:
    Palindromize(elem)
Also, you seem to be calling the split method on an int, which is not possible; try converting the numbers to strings first.
Also, why do you convert the phrase to lower case inside the for loop? Doing it once will suffice.
At the "core" of your program, you could do much better in Python, using filter for example. Here is a quick demonstration:
>>> phrase = 'Murder for a jar of red rum!'
>>> normalized = filter(str.isalnum, phrase.lower())
>>> normalized
'murderforajarofredrum'
>>> reversed = normalized[-1::-1]
>>> reversed
'murderforajarofredrum'
# Test if it is a palindrome
>>> reversed == normalized
True
Before you go bananas, let's rethink the problem:
You have already pointed out that Palindromes only make sense in strings without punctuation, whitespace, or mixed case. Thus, you need to convert your input string, either by removing the unwanted characters or by picking the allowed ones. For the latter, one can imagine:
import string
clean_data = [ch for ch in original_data if ch in string.ascii_letters]
clean_data = ''.join(clean_data).lower()
Having the cleaned version of the input, one might consider the third parameter in slicing of strings, particularly when it's -1 ;)
Does a comparison like
if clean_data[::-1] == clean_data:
    ....
ring a bell?
One of the primary errors that I spotted is here:
for word in data:
    word==word.split()
Here, there are two mistakes:
1. The double equals sign is a comparison, not an assignment, so this line has no effect.
2. If you wish to split the contents of each item of data, doing it like this doesn't change the original list, since you would only be rebinding the loop variable word. To modify the list itself, do:
for i in range(len(data)):
    data[i]=data[i].split()
This may clear your errors.
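Pulling the suggestions from these answers together, a Python 3 sketch of the whole check might look like this. The lowercase name palindromize is chosen for the example. It converts each item with str() so the integers in the test data are accepted, and it keeps letters and digits only, which is an assumption about how numbers such as 12321 should be treated.

```python
def palindromize(phrase):
    # str() lets integers such as 12321 through; isalnum() drops
    # whitespace and punctuation in one pass.
    cleaned = ''.join(ch for ch in str(phrase).lower() if ch.isalnum())
    # Return the cleaned text for a palindrome, otherwise None.
    return cleaned if cleaned == cleaned[::-1] else None


data = ['Murder for a jar of red rum', 12321, 'nope', 'abcbA', 3443, 'what',
        'Never odd or even', 'Rats live on no evil star']
for item in data:
    print(palindromize(item))
```

Returning the value instead of printing it inside the function keeps the check reusable, and no call to split() is needed because the whole phrase is cleaned at once.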
