How to print words containing specific letters - python

I have a file with words, one word per line. What I am trying to do is ask the user for some letters and search for the words that contain all the letters the user has entered.
I have worked on it for a few days but can't get lines 7 and 8 to run properly; I either get various errors or no results at all.
letters = input('letters: ')
words = open('thesewords').read().splitlines()
print (words)
print(".......................")
for word in words:
    if all(letters) in word:
        print(word)

You are using all() incorrectly.
all(letters) is always True for a string letters, and True in <string> raises a TypeError.
What you should do is:
all(x in word for x in letters)
So, it becomes:
for word in words:
    if all(x in word for x in letters):
        print(word)
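To see why the original condition fails and the corrected one works, here is a minimal sketch. The word list and the letters are made-up sample data standing in for the file contents and the user's input.

```python
letters = "ae"                                # assumed sample input
words = ["apple", "pear", "kiwi", "grape"]    # assumed sample data

# all() on a non-empty string is True, because every character is truthy
print(all(letters))

# so the original test turns into `True in word`, which raises TypeError
try:
    all(letters) in "apple"
    raised = False
except TypeError:
    raised = True
print(raised)

# the corrected test checks each letter against the word
matches = [word for word in words if all(x in word for x in letters)]
print(matches)
```

Only the words that contain both "a" and "e" end up in matches.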

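A simpler solution that omits all() would be the following. Note that it only matches words containing the letters as one contiguous substring, in the order typed:
letters = input('letters: ')
words_in_file = open('thesewords').read().splitlines()
for word in words_in_file:
    if letters in word:
        print(word)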

Try this:
letters = input('letters: ')
# Make sure you include the full file name and close the string
# Also, .readlines() is shorter than .read().splitlines(), but it keeps the trailing newline on each word
words = open('thesewords.txt').readlines()
# I'll assume these are the words:
words = ['spam', 'eggs', 'cheese', 'foo', 'bar']
print(words)
print(".......................")
for word in words:
    if all(x in word for x in letters):
        print(word)

Since there are several errors in the code, I have rewritten it based on a rough sketch of your objective. I hope the code below does what you need.
letters = input("letters: ")
words = open("thesewords.txt", "r").read().split()  # read every word from the file
for word in words:
    print(word)
print(".......................")
for wrd in words:
    # substring test: matches only if the letters appear together and in order
    if letters in wrd:
        print(wrd)
    else:
        continue
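If the order of the letters does not matter, a set-based check is another option. This is only a sketch with made-up sample words in place of the file; set.issubset() accepts any iterable, so each word can be passed to it directly.

```python
letters = "tse"                               # assumed sample input
words = ["these", "words", "test", "set"]     # assumed sample data

required = set(letters)
# keep a word only if every required letter occurs somewhere in it
matches = [word for word in words if required.issubset(word)]
print(matches)
```

Unlike the substring test, this accepts "these" even though "tse" never appears in it as a contiguous run.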

Related

Receive a string and return only the English words

here is my code
def words_only(sentence):
    wordlist1 = sentence.split()
    wordlist2 = []
    for word in wordlist1:
        modified = ''
        for char in word:
            if char in '_-!,.?":;0123456789':
                char = ''
            modified += char
        wordlist2.append(modified)
    return wordlist2
And the description is: words_only receives a string as an argument and returns a list with all the words that were in the sentence. For the purposes of this function, words are sequences of only letters (either lower case or upper case).
the input is
words_only("two-fold will count as 2 words.")
However, I failed the last test. My output is
['twofold', 'will', 'count', 'as', '', 'words']
the correct output should be
["two", "fold", "will", "count", "as", "words"]
How can I fix my code so that the colon will disappear and "two-fold" will count as 2 words? There is also an empty string that caused the error.
The problem is that when a word consists only of the symbols/numbers in your list, you end up with an empty string, and judging by your expected output this empty string should not be added to the final list. You can fix it by adding an if statement before wordlist2.append(modified).
Write:
if modified:
    wordlist2.append(modified)
An empty string is considered False, so it will not be added, whereas if the symbol/number is within a word, it is removed from the word and the corrected word is then added.
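When using Python, you should utilize its features as much as possible, namely list comprehensions and built-in functions:
[word for word in sentence.split() if word.isalpha()]
Note that this keeps only tokens made up entirely of letters, so "two-fold" and "words." are dropped rather than split or cleaned.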
This happens because you don't split words that contain the character '-'.
You can change the code to:
def words_only(sentence):
    wordlist1 = sentence.split()
    wordlist2 = []
    for word in wordlist1:
        modified = ''
        for char in word:
            if char in '-_0123456789,.[];{}':
                # a separator ends the current word, if there is one
                if modified: wordlist2.append(modified)
                modified = ''
            else:
                modified += char
        if modified: wordlist2.append(modified)
    return wordlist2
hope this will be helpful!
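For comparison, the splitting could also be sketched with a regular expression. This assumes that a word is any maximal run of ASCII letters, which is my reading of the description rather than something the exercise states in these terms.

```python
import re

def words_only(sentence):
    # every maximal run of ASCII letters counts as one word;
    # digits and punctuation act as separators and are discarded
    return re.findall(r"[A-Za-z]+", sentence)

result = words_only("two-fold will count as 2 words.")
print(result)
```

With this pattern, "two-fold" is split at the hyphen and the lone "2" produces no entry at all, so no empty strings appear in the result.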

How to limit the results of a Python if-in statement when checking if a string is found in another string?

I wrote a Python for loop that goes through each word in the English language (from nltk.corpus import words), and prints words made only of 6 letters provided by the user. The 6 user inputs are stored in a list named characters, so the for loop compares the items from the list to each string (english words).
The problem is that the printed words include ones that use the same character more than once. For example, if the characters are 'u, l, c, i, e, n', words with repeated letters such as "icicle" are returned. How do I prevent the script from returning words with duplicate letters?
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    if len(word) == 3:
        if word[0] in characters and word[1] in characters and word[2] in characters:
            print(word)
    elif len(word) == 4:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters:
            print(word)
    elif len(word) == 5:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters:
            print(word)
    elif len(word) == 6:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters and word[5] in characters:
            print(word)
I know the code is inefficiently written, so I'd appreciate tips on improvement as well. An example of the results of the above script is:
eel
eileen
eli
ell
elle
ellen
ellice
encell
ennui
eunice
ice
iceni
icicle
ilicic
ilicin
ill
inn
inulin
This is untested since I have no test data, but should do:
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    isIn = True
    for c in word:
        if c not in characters or word.count(c) != 1:
            isIn = False
    if isIn:
        print(word)
I don't know this package, but it sounds like your word list is big.
You should use a keyword tree instead of looping through the whole list every time new letters are given. The package may contain a better data structure for accessing those words; if not, you should transform the list into a trie. This is a one-time task, and after it, lookups are faster for every input.
To answer your question, you can build a dictionary that maps each input letter to the number of times it may be used. For example:
input = {'a':1, 'b':2, 'c':1}
Then, if you loop over each word, you can count each of its letters and compare the counts against this dictionary, although that is costly. If you use a trie, you only need to iterate over a node's children and make a recursive call if
input[child's letter] != 0
Before the recursive call you decrement that value, and after the call you increment it again.
This way, you only visit words that start with your letters instead of going over every word every time.
Hope it helps :)
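The trie idea might look roughly like the sketch below. The nested-dict trie, the '$' end-of-word marker, and the function names build_trie and find_words are all assumptions made for illustration; nothing here comes from nltk, and the word list is made-up sample data.

```python
from collections import Counter

def build_trie(words):
    """Build a nested-dict trie; the key '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root

def find_words(node, budget, prefix=''):
    """Yield every stored word that uses at most budget[ch] copies of each letter."""
    if '$' in node:
        yield prefix
    for ch, child in node.items():
        if ch != '$' and budget.get(ch, 0) > 0:
            budget[ch] -= 1      # spend one copy before descending
            yield from find_words(child, budget, prefix + ch)
            budget[ch] += 1      # give it back after the recursive call

# assumed sample data in place of the nltk word list
trie = build_trie(['ice', 'icicle', 'uncle', 'line', 'eel'])
matches = sorted(find_words(trie, Counter('ulcien')))
print(matches)
```

Branches whose next letter has no remaining budget are never entered, so "icicle" and "eel" are pruned as soon as their second "i" or "e" is reached.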
You can use collections.Counter.
from collections import Counter
Then, to get Counter objects (essentially multisets) which count how many times each character occurs in the word and in the inputted allowed characters:
word_counter = Counter(word)
characters_counter = Counter(characters)
To check that the word is a subset of the characters, and print if so, do
if word_counter & characters_counter == word_counter:
    print(word)
(& means intersection)
Very simple, and quick, because it uses standard-library hash maps that are optimized (and probably written in C) instead of costly nested list loops with lookups, additions, and removals. It also has the added benefit that if a user enters the same character multiple times, words with that character repeated are allowed, up to however many times the user entered it.
For example, if the user entered "i, i, c, c, l, e" then the word "icicle" would still be printed, whereas if they entered "i, i, c, z, l, e" then "icicle" would not be printed.
from collections import Counter
# input characters, get words...
characters_counter = Counter(characters)
for word in word_list:
    word_counter = Counter(word)
    if word_counter & characters_counter == word_counter:
        print(word)
Done!
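The two "icicle" cases described above can be checked directly. The helper name fits is made up for this sketch; it just wraps the same Counter comparison.

```python
from collections import Counter

def fits(word, characters):
    """Return True if word can be spelled using each character at most as often as given."""
    word_counter = Counter(word)
    # the intersection keeps the smaller count per letter, so it equals
    # word_counter only when no letter of the word exceeds its allowance
    return word_counter & Counter(characters) == word_counter

with_two_c = fits('icicle', ['i', 'i', 'c', 'c', 'l', 'e'])
with_one_c = fits('icicle', ['i', 'i', 'c', 'z', 'l', 'e'])
print(with_two_c, with_one_c)
```

The second call fails because the word needs two "c" characters and only one is available.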
My first thought about the efficiency is:
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters: # Does everything in 2 lines :)
            return False
    return True
This function returns False if the word has letters not in the list "characters", and True otherwise.
The reason I used a function is simply because it is neater and you can run the code from any point in your program easily. Make sure you use a copy of the list "characters" if you need to use it in the future:
copy_of_chars = characters.copy()
test_word(word, copy_of_chars)
About the duplicate letters- I would delete any letter in the list that has been "found":
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:
            return False
        characters.pop(characters.index(word[i])) # Removes the letter from the list "characters"
    return True
This function will return False if the word has characters not in the list characters, or if it has multiple letters when only one can be found in the list "characters". Otherwise it will return True.
Hope this helps!
Didn't test it:
for word in word_list:
    if len(word) <= 6:
        if all(letter in characters for letter in word.lower()):
            print(word)

Printing only words from a list that contain characters from another list?

I am working on a small problem for fun, sent to me by a friend. The problem requires me to populate an array with common words from a text file, and then print all the words from this list containing certain characters provided by the user. I am able to populate my array no problem, but it seems the part of the code that actually compares the two lists is not working. Below is the function I've written to compare the 2 lists.
#Function that prompts user for the set of letters to match and then compares that list of letters to each word in our wordList.
def getLetters():
    #Prompt user for list of letters and convert that string into a list of characters
    string = input("Enter your target letters: ")
    letterList = list(string)
    #For each word in the wordList, loop through each character in the word and check to see if the character is in our letter list, if it is increase matchCount by 1.
    for word in wordList:
        matchCount = 0
        for char in word:
            if char in letterList:
                matchCount+=1
        #If matchCount is equal to the length of the word, all of the characters in the word are present in our letter list and the word should be added to our matchList.
        if matchCount == len(word):
            matchList.append(word)
    print(matchList)
The code runs just fine, I don't get any error output, but once the user enters their list of letters, nothing happens. To test I've tried a few inputs matching up with words I know are in my wordList (e.g. added, axe, tree, etc). But nothing ever prints after I enter my letter string.
This is how I populate my wordList:
def readWords(filename):
    try:
        with open(filename) as file:
            #Load entire file as string, split string into word list using whitespace as delimiter
            s = file.read()
            wordList = s.split(" ")
            getLetters()
    #Error handling for invalid filename. Just prompts the user for filename again. Should change to use ospath.exists. But does the job for now
    except FileNotFoundError:
        print("File does not exist, check directory and try again. Dictionary file must be in program directory because I am bad and am not using ospath.")
        getFile()
Edit: Changed the function to reset matchCount to 0 before it starts looping characters, still no output.
Your code only needs a simple change:
Pass wordList as a parameter to getLetters. If you like, you could also use all() to check whether every letter of the word is in the letter list.
def getLetters(wordList):
    string = input("Enter your target letters: ")
    letterList = list(string)
    matchList = []
    for word in wordList:
        if all([letter in letterList for letter in word]):
            matchList.append(word)
    return matchList
And in readWords:
def readWords(filename):
    try:
        with open(filename) as file:
            s = file.read()
            wordList = s.split(" ")
            result = getLetters(wordList)
    except FileNotFoundError:
        print("...")
    else:
        # No exceptions.
        return result
Edit: add a global declaration to modify your list from inside a function:
wordList = [] #['axe', 'tree', 'etc']
def readWords(filename):
    try:
        with open(filename) as file:
            s = file.read()
            global wordList # must add to modify global list
            wordList = s.split(" ")
    except:
        pass
Here is a working example:
wordList = ['axe', 'tree', 'etc']
# Function that prompts user for the set of letters to match and then compares that list of letters to each word in our wordList.
def getLetters():
    # Prompt user for list of letters and convert that string into a list of characters
    string = input("Enter your target letters: ")
    letterList = list(string)
    # For each word in the wordList, loop through each character in the word and check to see if the character is in our letter list, if it is increase matchCount by 1.
    matchList = []
    for word in wordList:
        matchCount = 0
        for char in word:
            if char in letterList:
                matchCount += 1
        # If matchCount is equal to the length of the word, all of the characters in the word are present in our letter list and the word should be added to our matchList.
        if matchCount == len(word):
            matchList.append(word)
    print(matchList)
getLetters()
output:
Enter your target letters: xae
['axe']
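One more thing that may be worth checking, assuming the dictionary file contains line breaks (the question does not say): split(" ") splits only on single spaces, so a newline stays inside a word and that word can then never match the typed letters. The sample text below is made up to show the difference.

```python
text = "added axe\ntree"          # assumed sample file contents

by_space = text.split(" ")        # splits on single spaces only
by_whitespace = text.split()      # splits on any run of whitespace

print(by_space)
print(by_whitespace)
```

With split(" "), "axe" and "tree" stay glued together around the newline, while split() with no argument separates them.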

Finding the number of words with all vowels

I am given a text file that is stored in a list called words_list:
if __name__ == "__main__":
    words_file = open('words.txt')
    words_list = []
    for w in words_file:
        w = w.strip().strip('\n')
        words_list.append(w)
That's what the list of strings look like (it's a really, really long list of words)
I have to find "all the words" with all of the vowels; so far I have:
def all_vowel(words_list):
    count = 0
    for w in words_list:
        if all_five_vowels(w): # this function just returns true
            count = count + 1
    if count == 0:
        print '<None found>'
    else:
        print count
The problem with this is that count adds 1 every time it sees a vowel, whereas I want it to add 1 only if the entire word has all of the vowels.
Simply test whether the set of vowels is a subset of each word's letters:
vowels = set('aeiou')
with open('words.txt') as words_file:
    for word in words_file:
        word = word.strip()
        if vowels.issubset(word):
            print word
set.issubset() accepts any iterable (including strings):
>>> set('aeiou').issubset('word')
False
>>> set('aeiou').issubset('education')
True
Assuming the word_list variable is an actual list, probably your "all_five_vowels" function is wrong.
This could be an alternative implementation:
def all_five_vowels(word):
    vowels = ['a','e','o','i','u']
    for letter in word:
        if letter in vowels:
            vowels.remove(letter)
            if len(vowels) == 0:
                return True
    return False
@Martijn Peters has already posted a solution that is probably the fastest solution in Python. For completeness, here is another good way to solve this in Python:
vowels = set('aeiou')
with open('words.txt') as words_file:
    for word in words_file:
        word = word.strip()
        if all(v in word for v in vowels):
            print word
This uses the built-in function all() with a generator expression, and it's a handy pattern to learn. This reads as "if all the vowels are in the word, print the word." Python also has any(), which could be used for checks like "if any character in the word is a vowel, print the word".
More discussion of any() and all() here: "exists" keyword in Python?
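Since the question asks for the number of such words, here is a small counting sketch. It is written for Python 3, uses a made-up sample list in place of words.txt, and the function name count_all_vowel_words is an assumption for illustration.

```python
VOWELS = set('aeiou')

def count_all_vowel_words(words):
    """Count the words that contain every one of the five vowels."""
    return sum(1 for word in words if VOWELS.issubset(word.lower()))

sample = ['education', 'word', 'sequoia', 'queue']   # assumed sample data
total = count_all_vowel_words(sample)
print(total if total else '<None found>')
```

Each word contributes at most 1 to the total, regardless of how many vowels it contains.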

Code to detect all words that start with a capital letter in a string

I'm writing a small snippet that grabs all words that start with a capital letter in Python. Here's my code:
def WordSplitter(n):
    list1=[]
    words=n.split()
    print words
    #print all([word[0].isupper() for word in words])
    if ([word[0].isupper() for word in words]):
        list1.append(word)
    print list1
WordSplitter("Hello How Are You")
Now, when I run the above code, I'm expecting that the list will contain all the elements from the string, since all of the words in it start with a capital letter.
But here's my output:
#ubuntu:~/py-scripts$ python wordsplit.py
['Hello', 'How', 'Are', 'You']
['You']# Im expecting this list to contain all words that start with a capital letter
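You're only evaluating the condition once: the list comprehension produces a non-empty list of True values, which is truthy, so the if body runs a single time and appends only the last word (the loop variable left over from the comprehension).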
print [word for word in words if word[0].isupper() ]
or
for word in words:
    if word[0].isupper():
        list1.append(word)
You can take advantage of the filter function:
l = ['How', 'are', 'You']
print filter(str.istitle, l)
I have written the following Python snippet to store the words that start with a capital letter as keys in a dictionary, with the number of appearances of each word as the value.
#!/usr/bin/env python
import sys
import re
hash = {} # initialize an empty dictionary
for line in sys.stdin.readlines():
    for word in line.strip().split(): # removing newline char at the end of the line
        x = re.match(r"[A-Z]", word) # match() anchors at the start of the word
        if x:
            #if word[0].isupper():
            if word in hash:
                hash[word] += 1
            else:
                hash[word] = 1
for word, cnt in hash.iteritems(): # iterating over the dictionary items
    sys.stdout.write("%d %s\n" % (cnt, word))
In the above code, I have shown both ways: checking the first character by index, and using a regular expression. Any further suggestions for improving the performance or simplicity of the above code are welcome.
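As one possible simplification, collections.Counter can do the tallying. This is a Python 3 sketch; the function name count_capitalized and the sample sentence are made up for illustration, and it reads from a string rather than from stdin.

```python
from collections import Counter

def count_capitalized(text):
    """Count how often each word starting with an uppercase letter appears."""
    # split() never yields empty strings, so word[0] is always safe here
    return Counter(word for word in text.split() if word[0].isupper())

counts = count_capitalized("Hello how Are you Hello")
for word, cnt in counts.items():
    print("%d %s" % (cnt, word))
```

Counter is a dict subclass, so the result can be used wherever the hand-built dictionary was used before.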
