Compare a permutation to a list of words - Python

I'm trying to compare the output of a permutation of a string to a txt file containing pretty much every word in the dictionary. The function itself is an anagram solver.
The function takes word as a parameter. Here's basically what I have:
def anagram(word):
    {c for c in permutations(word, len(word))}
This gives me a set of the possible orderings of word.
If word = 'dog', the output will be:
{('d', 'o', 'g'), ('g', 'o', 'd')}
plus the other four combinations.
I want to compare the result of this permutation to a list of words and then return the word(s) which are anagrams of the original word.
So
if result (god or dog or dgo or gdo...) is in word_list:
    return result
Thanks in advance!
EDIT
Sorry, I didn't explicitly say that the word list had already been read in as a set/list.
The code for that is:
def load_words():
    name = 'words.2-10.txt'
    if isfile(name):
        all_words = [l.rstrip() for l in open(name, 'r')]
        as_lists = {}
        for size in range(2, 11):
            as_lists[size] = [word for word in all_words if len(word) == size]
        as_sets = {size: (set(words) if words else None) for size, words in as_lists.iteritems()}
        return as_lists, as_sets
    return None, None

word_lists, word_sets = load_words()
Apologies!

First, you can get all the words from the file and form a set using a set comprehension, like this:
with open("strings.txt") as strings_file:
    words = {line.strip() for line in strings_file}
And then, when you generate the permutations, just join them with "".join, like this:
from itertools import permutations

def anagram(word):
    return {"".join(c) for c in permutations(word, len(word))}
and then you can simply do a set intersection operation, like this:
print words & anagram("dog")
Now, you can use the same set of words to compare against any number of permutations, like this
print words & anagram("cabbage")
print words & anagram("Jon")
print words & anagram("Ffisegydd")

import itertools

def anagram(word):
    for w in itertools.permutations(word):
        yield ''.join(w)

def main():
    word = input("Enter a word: ")
    listOfWords = ['some', 'list', 'of', 'words']
    for w in anagram(word):
        if w in listOfWords:
            print(w, 'is in the list')

Here is an approach with Python sets. If you need a more efficient way than sets, you should look into generators (yield) and the filter function.
from itertools import permutations

word_list = set(["god", "dog", "bla"])

def anagram(word):
    perm = [''.join(p) for p in permutations(word)]
    return set(perm)

dog = anagram("dog")
print word_list.intersection(dog)

Related

How to print palindromes of a given list of words using a FOR loop?

I have been searching the internet for quite a while without success. I have a list of words:
words = ["kayak", "pas", "pes", "reviver", "caj", "osel", "racecar","bizon", "zubr", "madam"]
And I need to print only words that are palindromes. I found this solution:
words = ["kayak", "pas", "pes", "reviver", "caj", "osel", "racecar","bizon", "zubr", "madam"]
palindrome = list(filter(lambda x: (x == "".join(reversed(x))), words))
print(palindrome)
But I don't like it. I have to use a FOR loop somehow and I don't know how. I tried many things but still don't get it.
Thank you.
Here is the example with explanations:
words = ['kayak', 'pas', 'pes', 'reviver', 'caj', 'osel', 'racecar', 'bizon', 'zubr', 'madam']

# Start with an empty list
palindromes = []

# Iterate over each word
for word in words:
    # Check if the word is equal to the same word in reverse order
    is_palindrome = (word == word[::-1])
    # Append to the results if needed
    if is_palindrome:
        palindromes.append(word)

print(palindromes)
# => ['kayak', 'reviver', 'racecar', 'madam']
With a for loop:
palindrome = []
for i in words:
    if i == "".join(reversed(i)):
        palindrome.append(i)
print(palindrome)
Or list comprehension:
palindrome = [i for i in words if i == "".join(reversed(i))]
print(palindrome)
Output:
['kayak', 'reviver', 'racecar', 'madam']

Looking through a text file

I've been trying to create a function which gives a hint to the user who plays hangman.
The idea behind the function is that I have a list of 5k-plus words and I need to filter it by several indicators: the word should match the pattern (if the pattern is a___le, I should only look for words of that same shape), and if the user has made wrong guesses, words that include those letters should not be considered.
I'm aware that I didn't do it in the most pythonic or elegant way, but can someone tell me what is going wrong here? I'm always getting an empty list, or a list containing the words with the same length while the other conditions are constantly ignored.
def filter_words_list(words, pattern, wrong_guess_lst):
    """
    :param words: The words I received from the main function
    :param pattern: the pattern of the word in search such as p__pl_
    :param wrong_guess_lst: the set of wrong letters the user has guessed
    :return: the function returns the words which match the conditions.
    """
    list(wrong_guess_lst)  # Since I am receiving it as a set I'm converting it to a list.
    words_suggestions = []  # The list I'd like to put my suggested words in.
    for i in range(0, len(words)):  # First loop matching the length of the pattern and the words
        if len(words[i]) == len(pattern):
            for j in range(0, len(pattern)):
                if pattern[j] != '_':
                    if pattern[j] == words[i][j]:  # Checking if the letters of the words match.
                        for t in range(0, len(wrong_guess_lst)):
                            if wrong_guess_lst[t] != words[i][j]:  # Does the same as before but only with the wrong guess lst.
                                words_suggestions.append(words[i])
    return words_suggestions
I think this is what you are looking for (explanation in code comments):
def get_suggestions(words: list, pattern: str, exclude: list) -> list:
    """Finds the pattern and returns all words matching it."""
    # get the length of the pattern for filtering
    length = len(pattern)
    # create a filtered generator so that memory is not taken up;
    # it only gives the items from the word list that match the
    # conditions, i.e. the same length as the pattern and not in the excluded words
    filter_words = (word for word in words
                    if len(word) == length and word not in exclude)
    # create a mapping of all the letters and their corresponding index in the string
    mapping = {i: letter for i, letter in enumerate(pattern) if letter != '_'}
    # return a list comprehension made of the words in the filtered words that
    # match the condition that all letters are in the same place and have the
    # same value as the mapping
    return [word for word in filter_words
            if all(word[i] == v for i, v in mapping.items())]

word_list = [
    'peach', 'peace', 'great', 'good', 'food',
    'reach', 'race', 'face', 'competent', 'completed'
]
exclude_list = ['good']
word_pattern = 'pe___'
suggestions = get_suggestions(word_list, word_pattern, exclude_list)
print(suggestions)
# output:
# ['peach', 'peace']

# a bit of testing
# the order of items in each list is important -
# it should be the same as in word_list
patterns_and_answers = {
    '_oo_': ['food'],  # 'good' is in the excluded words
    '_omp_____': ['competent', 'completed'],
    '__ce': ['race', 'face'],
    'gr_a_': ['great'],
    '_a_e': ['race', 'face'],
    '_ea__': ['peach', 'peace', 'reach']
}
for p, correct in patterns_and_answers.items():
    assert get_suggestions(word_list, p, exclude_list) == correct
print('all test cases successful')

Optimization of Scrabble Algorithm

I'm trying to write an algorithm that, given a bunch of letters, gives you all the words that can be constructed from those letters; for instance, given 'car' it should return a list containing [arc, car, a, etc...] and from that return the best scrabble word. The problem is in finding the list which contains all the words.
I've got a giant txt file dictionary, line-delimited, and I've tried this so far:
from collections import Counter

def find_optimal(bunch_of_letters: str):
    words_to_check = []
    c1 = Counter(bunch_of_letters.lower())
    for word in load_words():
        c2 = Counter(word.lower())
        if c2 & c1 == c2:
            words_to_check.append(word)
    max_word = max_word_value(words_to_check)
    return max_word, calc_word_value(max_word)
max_word_value - returns the word with the maximum value from the list given.
calc_word_value - returns the word's score in scrabble.
load_words - returns a list of the dictionary words.
I'm currently using Counters to do the trick, but the problem is that I'm at about 2.5 seconds per search and I don't know how to optimize this. Any thoughts?
Try this:
def find_optimal(bunch_of_letters):
    bunch_of_letters = ''.join(sorted(bunch_of_letters))
    words_to_check = [word for word in load_words() if ''.join(sorted(word)) in bunch_of_letters]
    max_word = max_word_value(words_to_check)
    return max_word, calc_word_value(max_word)
I've just used (or at least tried to use) a list comprehension. Essentially, words_to_check will (hopefully!) be a list of all of the words from your text file that can be formed from your letters.
On a side note, if you don't want to use a gigantic text file for the words, check out enchant!
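A note of caution on the sorted-substring test above: the sorted letters of a playable word need not appear contiguously in the sorted rack (e.g. ''.join(sorted('ar')) is 'ar', which is not a substring of 'acr'), so valid words can be rejected. A multiset comparison, like the Counter approach in the question, avoids this. A minimal sketch (the word list here is illustrative, not a real dictionary):

```python
from collections import Counter

def find_formable(words, rack_letters):
    """Return the words that can be spelled from rack_letters, using each letter at most once."""
    rack = Counter(rack_letters.lower())
    playable = []
    for word in words:
        need = Counter(word.lower())
        # a word is playable iff it needs no letter more often than the rack supplies it
        if all(need[ch] <= rack[ch] for ch in need):
            playable.append(word)
    return playable

print(find_formable(['arc', 'car', 'ar', 'cart'], 'car'))  # ['arc', 'car', 'ar']
```

Note that 'ar' is kept here, while the sorted-substring test would drop it.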
from itertools import permutations

theword = 'car'  # or we can use input('Type in a word: ')
mylist = [permutations(theword, i) for i in range(1, len(theword) + 1)]
for generator in mylist:
    for word in generator:
        print(''.join(word))
        # instead of ''.join, just print(word) for a tuple
Output:
c
a
r
ca
cr
...
ar
rc
ra
car
cra
acr
arc
rca
rac
This will give us all the possible permutations of the word.
If you're looking to see whether a generated word is an actual word in the English dictionary, we can use This Answer:
import enchant

d = enchant.Dict("en_US")
# note: the generators in mylist are exhausted once iterated,
# so rebuild mylist before running this loop
for generator in mylist:
    for word in generator:
        word = ''.join(word)
        print(d.check(word), word)
Conclusion:
If we want to generate all the permutations of the word, we use this code:
from itertools import permutations

word = 'word'  # or we can use input('Type in a word: ')
solution = permutations(word, 4)
for i in solution:
    print(''.join(i))  # just print(i) if you want a tuple

For each word in the text file, extract surrounding 5 words

For each occurrence of a certain word, I need to display the context by showing about 5 words preceding and following the occurrence of the word.
Example output for the word 'stranger' in a text file when you enter occurs('stranger', 'movie.txt'):
My code so far:
def occurs(word, filename):
    infile = open(filename, 'r')
    lines = infile.read().splitlines()
    infile.close()
    wordsString = ''.join(lines)
    words = wordsString.split()
    print(words)
    for i in range(len(words)):
        if words[i].find(word):
            # stuck here
I'd suggest slicing words depending on i:
print(words[i-5:i+6])
(This would go where your comment is)
Alternatively, to print as shown in your example:
print("...", " ".join(words[i-5:i+6]), "...")
To account for the word being in the first 5:
if i > 5:
    print("...", " ".join(words[i-5:i+6]), "...")
else:
    print("...", " ".join(words[0:i+6]), "...")
Additionally, find is not doing what you think it is. If find() doesn't find the string, it returns -1, which evaluates to True when used in an if statement. Try:
if word in words[i].lower():
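To see why the truthiness of find() misfires, consider (the sample string is illustrative):

```python
s = "the stranger arrives"

# find() returns the match index (0 for a match at the start) or -1 if absent
print(s.find("the"))    # 0  -> falsy, although "the" was found
print(s.find("zzz"))    # -1 -> truthy, although "zzz" was not found

# membership testing expresses the intent directly
print("stranger" in s)  # True
print("zzz" in s)       # False
```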
This retrieves the index of every occurrence of the word in words, which is a list of all words in the file. Then slicing is used to get a list of the matched word and the 5 words before and after.
def occurs(word, filename):
    infile = open(filename, 'r')
    lines = infile.read().splitlines()
    infile.close()
    wordsString = ''.join(lines)
    words = wordsString.split()
    matches = [i for i, w in enumerate(words) if w.lower().find(word) != -1]
    for m in matches:
        l = " ".join(words[m-5:m+6])
        print(f"... {l} ...")
Consider the more_itertools.adjacent tool.
Given
import more_itertools as mit
s = """\
But we did not answer him, for he was a stranger and we were not used to, strangers and were shy of them.
We were simple folk, in our village, and when a stranger was a pleasant person we were soon friends.
"""
word, distance = "stranger", 5
words = s.splitlines()[0].split()
Demo
neighbors = list(mit.adjacent(lambda x: x == word, words, distance))
" ".join(word for bool_, word in neighbors if bool_)
# 'him, for he was a stranger and we were not used'
Details
more_itertools.adjacent returns an iterable of (bool, item) tuples. A True boolean is returned for words in the string that satisfy the predicate. Example:
>>> neighbors
[(False, 'But'),
...
(True, 'a'),
(True, 'stranger'),
(True, 'and'),
...
(False, 'to,')]
Neighboring words are filtered from the results given a distance from the target word.
Note: more_itertools is a third-party library. Install it via pip install more_itertools.
Whenever I see rolling views of files, I think collections.deque
import collections
def occurs(needle, fname):
with open(fname) as f:
lines = f.readlines()
words = iter(''.join(lines).split())
view = collections.deque(maxlen=11)
# prime the deque
for _ in range(10): # leaves an 11-length deque with 10 elements
view.append(next(words, ""))
for w in words:
view.append(w)
if view[5] == needle:
yield list(view.copy())
Note that this approach intentionally does not handle any edge cases where the needle appears in the first 5 words or the last 5 words of the file. The question is ambiguous as to whether matching the third word should give the first through ninth words, or something different.

How many common English words of 4 letters or more can you make from the letters of a given word (each letter can only be used once)

On the back of a block calendar I found the following riddle:
How many common English words of 4 letters or more can you make from the letters
of the word 'textbook' (each letter can only be used once).
My first solution that I came up with was:
from itertools import permutations

with open('/usr/share/dict/words') as f:
    words = f.readlines()
words = map(lambda x: x.strip(), words)

given_word = 'textbook'
found_words = []

ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))
for p in ps:
    for word in map(''.join, p):
        if word in words and word != given_word:
            found_words.append(word)

print set(found_words)
This gives the result set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook']) but took more than 7 minutes on my machine.
My next iteration was:
with open('/usr/share/dict/words') as f:
    words = f.readlines()
words = map(lambda x: x.strip(), words)

given_word = 'textbook'
print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]
Which returns an answer almost immediately, but gives as answer: ['book', 'oboe', 'text', 'toot']
What is the fastest, correct and most pythonic solution to this problem?
(edit: added my earlier permutations solution and its different output).
I thought I'd share this slightly interesting trick, although it isn't really "pythonic". It takes a good bit more code than the other solutions, but should be rather quick judging by the timings the others need.
We do a bit of preprocessing to speed up the computations. The basic approach is the following: we assign every letter in the alphabet a prime number, e.g. A = 2, B = 3, and so on. We then compute a hash for every word in the dictionary, which is simply the product of the prime representations of each character in the word. We then store every word in a dictionary indexed by its hash.
Now if we want to find out which words are equivalent to, say, 'textbook', we only have to compute the same hash for that word and look it up in our dictionary. Usually (say, in C++) we'd have to worry about overflows, but in Python it's even simpler than that: every word in the list with the same index will contain exactly the same characters.
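Because multiplication is commutative, anagrams necessarily multiply out to the same hash. A toy sketch of the idea, with an assumed three-letter prime table:

```python
# Toy illustration: map each letter to a prime and multiply.
# Multiplication is commutative, so anagrams get the same hash.
primes = {'d': 2, 'o': 5, 'g': 3}  # assumed toy mapping, not the full table

def toy_hash(word):
    h = 1
    for ch in word:
        h *= primes[ch]
    return h

print(toy_hash('dog'))  # 30
print(toy_hash('god'))  # 30
print(toy_hash('dog') == toy_hash('god'))  # True
```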
Here's the code, with the slight optimization that in our case we only have to worry about characters that also appear in the given word, which means we can get by with a much smaller prime table than otherwise (the obvious optimization would be to only assign values to characters that appear in the word at all; it was fast enough anyhow, so I didn't bother, and this way we can preprocess only once and reuse it for several words). A prime-generating algorithm is useful often enough that you should have one yourself anyhow ;)
from collections import defaultdict

PRIMES = list(gen_primes(256))  # some arbitrary prime generator

def get_dict(path):
    res = defaultdict(list)
    with open(path, "r") as file:
        for line in file.readlines():
            word = line.strip().upper()
            hash = compute_hash(word)
            res[hash].append(word)
    return res

def compute_hash(word):
    hash = 1
    for char in word:
        try:
            hash *= PRIMES[ord(char) - ord(' ')]
        except IndexError:
            # contains some character out of range - always 0 for our purposes
            return 0
    return hash

def get_result(path, given_word):
    words = get_dict(path)
    given_word = given_word.upper()
    result = set()
    powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
    for word in (word for word in powerset(given_word) if len(word) >= 4):
        hash = compute_hash(word)
        for equiv in words[hash]:
            result.add(equiv)
    return result

if __name__ == '__main__':
    path = "dict.txt"
    given_word = "textbook"
    result = get_result(path, given_word)
    print(result)
This runs rather quickly on my Ubuntu word list (98k words), but it isn't what I'd call pythonic since it's basically a port of a C++ algorithm. It's useful if you want to compare more than one word that way.
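The listing above leaves gen_primes unspecified ("some arbitrary prime generator"). One minimal Sieve of Eratosthenes that satisfies the PRIMES = list(gen_primes(256)) call might look like:

```python
def gen_primes(limit):
    """Yield all primes below limit using a simple Sieve of Eratosthenes."""
    is_prime = [True] * limit
    for n in range(2, limit):
        if is_prime[n]:
            yield n
            # cross off every multiple of n starting at n*n
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False

print(list(gen_primes(20)))  # [2, 3, 5, 7, 11, 13, 17, 19]
```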
How about this?
from itertools import permutations, chain

with open('/usr/share/dict/words') as fp:
    words = set(fp.read().split())

given_word = 'textbook'
perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)
print matches
which gives
>>> print matches
set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
There is a generator, itertools.permutations, with which you can gather all permutations of a sequence with a specified length. That makes it easier:
from itertools import permutations

GIVEN_WORD = 'textbook'
with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]
print len(filter(lambda x: ''.join(x) in words, permutations(GIVEN_WORD, 4)))
Edit #1: Oh! It says "4 or more" ;) Forget what I said!
Edit #2: This is the second version I came up with:
LETTERS = set('textbook')
with open('/usr/share/dict/words') as f:
    WORDS = filter(lambda x: len(x) >= 4, [l.strip() for l in f])
matching = filter(lambda x: set(x).issubset(LETTERS) and all([x.count(c) == 1 for c in x]), WORDS)
print len(matching)
Create the whole power set, then check whether the dictionary word is in the set (order of the letters doesn't matter):
powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
pw = map(lambda x: sorted(x), powerset(given_word))
filter(lambda x: sorted(x) in pw, words)
The following just checks each word in the dictionary to see if it is of the appropriate length, and then if it is a permutation of 'textbook'. I borrowed the permutation check from
Checking if two strings are permutations of each other in Python
but changed it slightly.
given_word = 'textbook'
with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]

matches = []
for word in words:
    if word != given_word and 4 <= len(word) <= len(given_word):
        if all(word.count(char) <= given_word.count(char) for char in word):
            matches.append(word)
print sorted(matches)
This finishes almost immediately and gives the correct result.
Permutations get very big for longer words. Try counterrevolutionary, for example.
I would filter the dict for words from 4 to len(word) letters (8 for textbook).
Then I would filter with a regular expression, like "oboe".matches("[textbook]+").
The remaining words I would sort and compare with a sorted version of your word, ("beoo", "bekoottx"), jumping to the next index of a matching character to find mismatching counts of characters:
("beoo", "bekoottx")
("eoo", "ekoottx")
("oo", "koottx")
("oo", "oottx")
("o", "ottx")
("", "ttx") => matched
("bbo", "bekoottx")
("bo", "ekoottx") => mismatch
Since I don't talk python, I leave the implementation as an exercise to the audience.
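For the audience, then: a sketch in Python of the sorted-comparison walk described above (just the matching step; the length and regex filters are left out):

```python
def is_subanagram(candidate, letters):
    """Walk sorted(candidate) against sorted(letters), consuming each
    letter at most once, as in the ("beoo", "bekoottx") example above."""
    cand = sorted(candidate)
    pool = sorted(letters)
    i = 0
    for ch in cand:
        # jump forward through the pool to the next matching character
        while i < len(pool) and pool[i] < ch:
            i += 1
        if i >= len(pool) or pool[i] != ch:
            return False  # mismatching number of characters
        i += 1  # consume the matched letter
    return True

print(is_subanagram('oboe', 'textbook'))  # True
print(is_subanagram('bbo', 'textbook'))   # False
```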
