Finding the number of words with all vowels - python

I am given a text file whose words are stored in a list called words_list:
if __name__ == "__main__":
    words_file = open('words.txt')
    words_list = []
    for w in words_file:
        w = w.strip()
        words_list.append(w)
That's how the list of strings is built (it's a really, really long list of words).
I have to find "all the words" with all of the vowels; so far I have:
def all_vowel(words_list):
    count = 0
    for w in words_list:
        if all_five_vowels(w):  # this function just returns true
            count = count + 1
    if count == 0:
        print '<None found>'
    else:
        print count
The problem with this is that count adds 1 every time it sees a vowel, whereas I want it to add 1 only if the entire word has all of the vowels.

Simply test whether the set of vowels is a subset of each word:
vowels = set('aeiou')
with open('words.txt') as words_file:
    for word in words_file:
        word = word.strip()
        if vowels.issubset(word):
            print word
set.issubset() works on any iterable (including strings):
>>> set('aeiou').issubset('word')
False
>>> set('aeiou').issubset('education')
True
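To get the count the question asks for, the same test can be wrapped in a small function. This is a minimal sketch in Python 3; the name count_all_vowel_words and the sample words are made up for illustration, and lowercasing the word first is an assumption about how capital letters should be treated:

```python
VOWELS = set('aeiou')

def count_all_vowel_words(words):
    """Count the words that contain every one of the five vowels."""
    return sum(1 for word in words if VOWELS.issubset(word.lower()))

sample = ['education', 'word', 'sequoia', 'Python']
print(count_all_vowel_words(sample))
```

Only 'education' and 'sequoia' contain all five vowels, so the sample list gives a count of 2.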

Assuming the words_list variable is an actual list, your "all_five_vowels" function is probably wrong.
This could be an alternative implementation:
def all_five_vowels(word):
    vowels = ['a','e','o','i','u']
    for letter in word:
        if letter in vowels:
            vowels.remove(letter)
            if len(vowels) == 0:
                return True
    return False

@Martijn Pieters has already posted a solution that is probably the fastest in Python. For completeness, here is another good way to solve this in Python:
vowels = set('aeiou')
with open('words.txt') as words_file:
    for word in words_file:
        word = word.strip()
        if all(v in word for v in vowels):
            print word
This uses the built-in function all() with a generator expression, and it's a handy pattern to learn. This reads as "if all the vowels are in the word, print the word." Python also has any(), which could be used for checks like "if any character in the word is a vowel, print the word".
More discussion of any() and all() here: "exists" keyword in Python?
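The direction of the membership test inside all() is easy to mix up, so here is a small Python 3 sketch that contrasts the possible checks. The helper names are hypothetical and exist only for this illustration:

```python
VOWELS = set('aeiou')

def contains_all_vowels(word):
    # Every vowel must appear somewhere in the word.
    return all(v in word for v in VOWELS)

def only_vowels(word):
    # Every character of the word must be a vowel.
    return all(ch in VOWELS for ch in word)

def has_any_vowel(word):
    # At least one character of the word is a vowel.
    return any(ch in VOWELS for ch in word)

for w in ['education', 'aeiou', 'rhythm']:
    print(w, contains_all_vowels(w), only_vowels(w), has_any_vowel(w))
```

Only the first helper answers "does the word have all of the vowels"; the second one would reject 'education' because it also contains consonants.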

Related

How to print words containing specific letters

I have a file with words, one word per line. What I am trying to do is ask the user for some letters and search for the words that contain all of the letters the user has entered.
I have worked on it for a few days but can't get lines 7 and 8 to run properly; I either get various errors or no results at all.
letters = input('letters: ')
words = open('thesewords').read().splitlines()
print(words)
print(".......................")
for word in words:
    if all(letters) in word:
        print(word)
You are using all() incorrectly.
all(letters) is always True for a string letters, and True in <string> raises a TypeError.
What you should do is:
all(x in word for x in letters)
So, it becomes:
for word in words:
    if all(x in word for x in letters):
        print(word)
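As a self-contained illustration of the corrected test (Python 3; the function name words_with_letters and the sample list are assumptions, not part of the original code):

```python
def words_with_letters(words, letters):
    """Return the words that contain every character in letters."""
    return [word for word in words if all(ch in word for ch in letters)]

sample_words = ['spam', 'eggs', 'cheese', 'maps']
print(words_with_letters(sample_words, 'ms'))

# all() on a plain string just checks that each character is truthy,
# which is the case for every character, so the result is True.
print(all('ms'))

# Testing a bool for membership in a string is what raises the TypeError.
try:
    True in 'spam'
except TypeError as exc:
    print('TypeError:', exc)
```

With the sample list, only 'spam' and 'maps' contain both 'm' and 's'.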
A simpler solution without all() works only if the letters must appear in the word together, as one contiguous substring:
letters = input('letters: ')
words_in_file = open('thesewords').read().splitlines()
for word in words_in_file:
    if letters in word:
        print(word)
Try this:
letters = input('letters: ')
# Make sure you include the full file name and close the string
# .readlines() also works, but it keeps the trailing newline on each word
words = open('thesewords.txt').readlines()
# I'll assume these are the words:
words = ['spam', 'eggs', 'cheese', 'foo', 'bar']
print(words)
print(".......................")
for word in words:
    if all(x in word for x in letters):
        print(word)
Since there are several errors in the code, I have rewritten it as a rough sketch of what you seem to be after. Note that letters in wrd only matches when the letters appear together, in order, inside the word. I hope the code below does what you need.
letters = input("letters: ")
words = open("thesewords.txt", "r").read().split()
for word in words:
    print(word)
print(".......................")
for wrd in words:
    if letters in wrd:
        print(wrd)

How does this code only print out the initials of a string?

This is the function:
def initials(phrase):
    words = phrase.split()
    result = ""
    for word in words:
        result += word[0]
    return result.upper()
This is an exercise in my online course. The objective is to return the initials of a string, capitalized. For example, initials("Universal Serial Bus") should return "USB".
phrase is a str object.
str objects have methods that can be called on them. split is a method that returns a list of str objects, one per whitespace-separated word. This list is stored in words.
for word in words takes each element of words in turn and assigns it to the variable word for that iteration of the loop.
The += operator appends the first letter of word to result; word[0] accesses the first character of the str.
Finally, the upper method is applied to result.
I hope this clears it up for you.
def initials(phrase):
    words = phrase.split()
    result = ""
    for word in words:
        result += word[0]
    return result.upper()
This:
Splits the phrase on whitespace with phrase.split(). .split() returns a list, which is assigned to words.
Iterates through the list words and adds the first letter of each word (word[0]) to the result variable.
Returns result converted to uppercase (result.upper()).
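The three steps above can also be written as a single expression. This is just an alternative sketch in Python 3, not the course's reference solution; it relies on the fact that split() with no arguments never produces empty strings, so word[0] is always safe:

```python
def initials(phrase):
    """Return the uppercased first letter of each whitespace-separated word."""
    return "".join(word[0] for word in phrase.split()).upper()

print(initials("Universal Serial Bus"))
print(initials("  local   area network "))
```

Extra spaces between or around the words do not matter, and an empty phrase simply gives an empty string.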
def initials(phrase):
    words = phrase.split()
    result = ""
    for word in words:
        result += word[0].upper()
    return result
print(initials("Active Teens Taking Initiative To Understand Driving Experiences"))  # Should be: ATTITUDE
def initials(phrase):
    words = phrase.split()
    result = ""
    for word in words:
        result += word[0].upper()
    return result
print(initials("Universal Serial Bus")) # Should be: USB
print(initials("local area network")) # Should be: LAN
print(initials("Operating system")) # Should be: OS
Here is output:
USB
LAN
OS
This:
Splits the phrase on whitespace with phrase.split(), which returns a list that is assigned to words.
Iterates through the list words and adds the uppercased first letter of each word (word[0].upper()) to the result variable.
Returns result.
def initials(phrase):
    words = phrase.split()
    result = ""
    for word in words:
        result += word[0].upper()
    return result

How to limit the results of a Python if-in statement when checking if a string is found in another string?

I wrote a Python for loop that goes through each English word (from nltk.corpus import words) and prints the words made up only of 6 letters provided by the user. The 6 user inputs are stored in a list named characters, so the for loop compares the items from the list to each string (the English words).
The problem is that words containing the same character several times are printed as well. For example, if the characters are 'u, l, c, i, e, n', words with repeated letters such as "icicle" are returned. How do I prevent the script from returning words with duplicate letters?
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    if len(word) == 3:
        if word[0] in characters and word[1] in characters and word[2] in characters:
            print(word)
    elif len(word) == 4:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters:
            print(word)
    elif len(word) == 5:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters:
            print(word)
    elif len(word) == 6:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters and word[5] in characters:
            print(word)
I know the code is inefficiently written, so I'd appreciate tips on improvement as well. An example of the results of the above script is:
eel
eileen
eli
ell
elle
ellen
ellice
encell
ennui
eunice
ice
iceni
icicle
ilicic
ilicin
ill
inn
inulin
This is untested since I have no test data, but should do:
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    isIn = True
    for c in word:
        if c not in characters or word.count(c) != 1:
            isIn = False
    if isIn:
        print(word)
I don't know this package, but it sounds like your word list is big.
You should use a keyword tree instead of looping through the whole list every time new letters are given. The package may already contain a better data structure for accessing those words; if not, you should transform the list into a trie. That is a one-time task, and afterwards lookups are faster for every input.
To answer your question, you can build a dictionary that maps the input letters to their quantities. For example:
input = {'a':1, 'b':2, 'c':1}
If you loop over every word, you then have to count the letters of each word, which is costly. If you use a trie, you only need to go over the children of the current node and make a recursive call if
input[child's letter] != 0
Before the recursive call you decrement that value, and after the call you increment it again.
This way, you only follow prefixes that can still be built from your letters instead of going over every word, every time.
Hope it helps :)
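As a rough sketch of the trie idea described above (Python 3, not tested against the nltk word list): the nested-dict layout, the '$' end-of-word marker, and the names build_trie and find_words are all assumptions made for this illustration, and it presumes no word contains '$'.

```python
def build_trie(words):
    """Build a nested-dict trie; the key '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root


def find_words(node, budget, prefix=''):
    """Yield every word in the trie that can be spelled from the letter budget."""
    if '$' in node:
        yield prefix
    for ch, child in node.items():
        if ch != '$' and budget.get(ch, 0) > 0:
            budget[ch] -= 1  # use one copy of this letter
            for found in find_words(child, budget, prefix + ch):
                yield found
            budget[ch] += 1  # give it back after the recursive call


trie = build_trie(['ice', 'icicle', 'uncle', 'nice', 'ill'])
budget = {'u': 1, 'l': 1, 'c': 1, 'i': 1, 'e': 1, 'n': 1}
print(sorted(find_words(trie, budget)))
```

With one copy of each letter available, 'icicle' and 'ill' are never reached because the search stops as soon as a letter's count hits zero.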
You can use collections.Counter.
from collections import Counter
Then, to get Counter objects (essentially multisets) which count how many times each character occurs in the word and in the inputted allowed characters:
word_counter = Counter(word)
characters_counter = Counter(characters)
To check that the word is a subset of the characters, and print it if so, do
if word_counter & characters_counter == word_counter:
    print(word)
(& means intersection)
Very simple, and quick, because Counter is a standard-library class built on the optimized built-in dict, instead of costly multi-level list loops with finds, additions, and removals. It also has the added benefit that if a user enters the same character multiple times, it will allow words with that character repeated, up to however many times the user entered it.
For example, if the user entered "i, i, c, c, l, e" then the word "icicle" would still be printed, whereas if they entered "i, i, c, z, l, e" then "icicle" would not be printed.
from collections import Counter
# input characters, get words...
characters_counter = Counter(characters)
for word in word_list:
    word_counter = Counter(word)
    if word_counter & characters_counter == word_counter:
        print(word)
Done!
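The same multiset test can be packaged as a function. The sketch below (Python 3) uses Counter subtraction instead of intersection, which is a swapped-in but equivalent check: subtraction keeps only positive counts, so an empty result means the word needs nothing beyond the allowed characters. The name can_build is hypothetical.

```python
from collections import Counter

def can_build(word, characters):
    """True if word uses each character no more often than it occurs in characters."""
    # Counter subtraction drops zero and negative counts, so the result
    # is empty (and therefore falsy) exactly when nothing is missing.
    return not (Counter(word) - Counter(characters))

characters = ['u', 'l', 'c', 'i', 'e', 'n']
for word in ['uncle', 'icicle', 'nice', 'eel']:
    print(word, can_build(word, characters))
```

Here 'uncle' and 'nice' pass, while 'icicle' and 'eel' are rejected because they repeat a letter that was entered only once.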
My first thought about the efficiency is:
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:  # Does everything in 2 lines :)
            return False
    return True
This function returns False if the word has letters not in the list "characters", and True otherwise.
The reason I used a function is simply that it is neater and you can run the code from any point in your program easily.
About the duplicate letters: I would delete any letter in the list that has been "found":
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:
            return False
        characters.remove(word[i])  # Removes the letter from the list "characters"
    return True
This function will return False if the word has characters not in the list characters, or if it repeats a letter more often than it appears in the list "characters". Otherwise it will return True.
Because this version removes letters from the list, make sure you pass it a copy of "characters" if you need the list again later:
copy_of_chars = characters.copy()
test_word(word, copy_of_chars)
Hope this helps!
Didn't test it:
for word in word_list:
    if len(word) <= 6:
        # Note: this does not reject words that repeat a letter
        if all(letter in characters for letter in word.lower()):
            print(word)

Code to detect all words that start with a capital letter in a string

I'm writing a small snippet that grabs all words that start with a capital letter in Python. Here's my code:
def WordSplitter(n):
    list1=[]
    words=n.split()
    print words
    #print all([word[0].isupper() for word in words])
    if ([word[0].isupper() for word in words]):
        list1.append(word)
    print list1
WordSplitter("Hello How Are You")
Now when I run the above code, I'm expecting that the list will contain all the elements from the string, since all of the words in it start with a capital letter.
But here's my output:
#ubuntu:~/py-scripts$ python wordsplit.py
['Hello', 'How', 'Are', 'You']
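['You']  # I'm expecting this list to contain all words that start with a capital letter
You're only evaluating the condition once. The list comprehension produces a non-empty list of booleans, which is always truthy, so the if body runs a single time; in Python 2 the comprehension variable word leaks out and still holds the last item, which is what gets appended.
print [word for word in words if word[0].isupper()]
or
for word in words:
    if word[0].isupper():
        list1.append(word)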
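To see why the original if always ran, it helps to look at the truth value of the list itself. A minimal Python 3 sketch, where the function name capitalized_words and the sample strings are made up for illustration:

```python
def capitalized_words(text):
    """Return the words whose first character is uppercase."""
    return [word for word in text.split() if word[0].isupper()]

# Even a list that contains only False values is truthy, because it is non-empty.
flags = [word[0].isupper() for word in "hello how are you".split()]
print(flags)
print(bool(flags))

print(capitalized_words("Hello how Are you"))
```

Filtering inside the comprehension tests each word separately, which is what the original code intended.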
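You can take advantage of the filter function. Note that str.istitle is stricter than a first-letter check; for example, 'USB'.istitle() is False: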
l = ['How', 'are', 'You']
print filter(str.istitle, l)
I have written the following Python snippet to store the words that start with a capital letter in a dictionary, with the word as the key and its number of appearances as the value.
#!/usr/bin/env python
import sys
import re
counts = {}  # initialize an empty dictionary
for line in sys.stdin.readlines():
    for word in line.strip().split():  # removing newline char at the end of the line
        x = re.match(r"[A-Z]", word)  # match() anchors at the start of the word
        if x:
        #if word[0].isupper():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
for word, cnt in counts.items():  # iterating over the dictionary items
    sys.stdout.write("%d %s\n" % (cnt, word))
In the above code, I have shown both ways: indexing the first character to check for an uppercase letter, and using a regular expression. Any further suggestions for improving the performance or simplicity of the above code are welcome.
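One possible simplification, offered as a sketch rather than a drop-in replacement: collections.Counter can do the bookkeeping that the if/else on the dictionary does by hand. This version is Python 3, reads from a string instead of sys.stdin to stay self-contained, and the name count_capitalized is hypothetical.

```python
from collections import Counter

def count_capitalized(text):
    """Count how often each word starting with an uppercase letter appears."""
    return Counter(word for word in text.split() if word[0].isupper())

counts = count_capitalized("Hello world Hello Python and hello again")
for word, cnt in counts.most_common():
    print(cnt, word)
```

most_common() lists the words from the highest count down, which the plain dictionary loop does not guarantee.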

How do I match vowels?

I am having trouble with a small component of a bigger program I am working on. Basically, I need to have a user input a word, and I need to print the index of the first vowel.
word= raw_input("Enter word: ")
vowel= "aeiouAEIOU"
for index in word:
    if index == vowel:
        print index
However, this isn't working. What's wrong?
Try:
word = raw_input("Enter word: ")
vowels = "aeiouAEIOU"
for index,c in enumerate(word):
    if c in vowels:
        print index
        break
for .. in iterates over the actual characters in a string, not the indexes. enumerate returns indexes as well as characters, which makes referring to both easier.
Just to be different:
import re
def findVowel(s):
    match = re.match('([^aeiou]*)', s, flags=re.I)
    if match:
        index = len(match.group(1))
        if index < len(s):
            return index
    return -1 # not found
The same idea using list comprehension:
word = raw_input("Enter word: ")
res = [i for i,ch in enumerate(word) if ch.lower() in "aeiou"]
print(res[0] if res else None)
index == vowel asks if the letter index is equal to the entire vowel string. What you want to know is whether it is contained in the vowel string. See some of the other answers for how in works.
One alternative solution, and arguably a more elegant one, is to use the re library.
import re
word = raw_input('Enter a word:')
try:
    print re.search('[aeiou]', word, re.I).start()
except AttributeError:
    print 'No vowels found in word'
In essence, the re library implements a regular expression matching engine. re.search() searches for the regular expression specified by the first string in the second one and returns the first match. [aeiou] means "match a or e or i or o or u" and re.I tells re.search() to make the search case-insensitive.
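The same search can be wrapped in a function that checks the match object instead of catching AttributeError. This is a Python 3 sketch; the name first_vowel_index and the choice of -1 for "no vowel" are assumptions made for this example:

```python
import re

def first_vowel_index(word):
    """Return the index of the first vowel in word, or -1 if there is none."""
    match = re.search('[aeiou]', word, re.I)  # re.I makes the search case-insensitive
    return match.start() if match else -1

for w in ['string', 'Apple', 'rhythm']:
    print(w, first_vowel_index(w))
```

re.search() returns None when nothing matches, which is why the result has to be tested before calling start().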
for i in range(len(word)):
    if word[i] in vowel:
        print i
        break
will do what you want.
"for index in word" loops over the characters of word rather than the indices. (You can loop over the indices and characters together using the "enumerate" function; I'll let you look that up for yourself.)
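For reference, the enumerate approach mentioned above can be condensed with next(), which stops at the first hit. A minimal Python 3 sketch; the function name index_of_first_vowel and the None default for "no vowel" are choices made for this illustration:

```python
def index_of_first_vowel(word, vowels="aeiouAEIOU"):
    """Return the index of the first vowel in word, or None if it has no vowel."""
    # enumerate yields (index, character) pairs; next() takes the first match.
    return next((i for i, ch in enumerate(word) if ch in vowels), None)

for w in ['string', 'Apple', 'rhythm']:
    print(w, index_of_first_vowel(w))
```

The second argument to next() is the value returned when the generator is exhausted without finding a vowel.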
