Python make a list of words from a file

I'm trying to make a list of words from a file that includes only words with no duplicate letters: 'hello' would be excluded, but 'helo' would be included.
My code works perfectly when I use a list that I create by typing in words. However, when I try it with the list from the file, it just prints all the words, even if they include duplicate letters.
words = []
length = 5
file = open('dictionary.txt')
for word in file:
    if len(word) == length+1:
        words.insert(-1, word.rstrip('\n'))
alpha = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
x = 0
while x in range(0, len(alpha)):
    i = 0
    while i in range(0, len(words)):
        if words[i].count(alpha[x]) > 1:
            del(words[i])
            i = i - 1
        else:
            i = i + 1
    x = x + 1
print(words)

This snippet adds words, removing duplicated letters before inserting them:
words = []
length = 5
file = open('dictionary.txt')
for word in file:
    clean_word = word.strip('\n')
    # the newline is already stripped, so compare against length itself
    if len(clean_word) == length:
        words.append(''.join(set(clean_word)))
We convert the string to a set, which removes duplicates, and then we join the set into a string again (sets are unordered, so the letters may come out in a different order):
>>> word = "helloool"
>>> set(word)
set(['h', 'e', 'l', 'o'])
>>> ''.join(set(word))
'helo'
I am not 100% sure how you want to remove duplicates, so I've assumed that no letter may appear more than once in the word (as your question specifies "duplicate letter" and not "double letter").

What does your dictionary.txt look like? Your code should work so long as each word is on a separate line (for x in file iterates through lines) and at least some of the words have 5 non-repeating letters.
Also, a couple of tips:
You can read lines from a file into a list by calling file.readlines()
You can check for repeats in a list or string by using sets. Sets remove all duplicate elements, so checking if len(word) == len(set(word)) will tell you if there are duplicate letters in much less code :)
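The set-based check from the second tip can be sketched as follows. This is a minimal example, not the asker's code: the names has_unique_letters and words_without_repeats and the sample list are illustrative assumptions.

```python
def has_unique_letters(word):
    """Return True if no letter occurs more than once in word."""
    return len(word) == len(set(word))


def words_without_repeats(lines, length):
    """Keep words of the given length that have no repeated letters."""
    result = []
    for line in lines:
        word = line.strip()  # drop the trailing newline before measuring
        if len(word) == length and has_unique_letters(word):
            result.append(word)
    return result


# Hypothetical sample standing in for the lines of dictionary.txt
sample_lines = ["hello\n", "world\n", "apple\n", "crane\n", "cat\n"]
print(words_without_repeats(sample_lines, 5))  # ['world', 'crane']
```

Passing an open file object instead of sample_lines should work the same way, since iterating over a file yields its lines.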

Related

Find out if string contains a combination of letters in a specific order

I am attempting to write a program to find words in the English language that contain 3 letters of your choice, in order, but not necessarily consecutively. For example, the letter combination EJS would output, among others, the word EJectS. You supply the letters, and the program outputs the words.
However, the program does not give the letters in the right order, and does not work at all with double letters, like the letters FSF or VVC. I hope someone can tell me how I can fix this error.
Here is the full code:
with open("words_alpha.txt") as words:
    wlist = list(words)
while True:
    elim1 = []
    elim2 = []
    elim3 = []
    search = input("input letters here: ")
    for element1 in wlist:
        element1 = element1[:-1]
        val1 = element1.find(search[0])
        if val1 > -1:
            elim1.append(element1)
    for element2 in elim1:
        val2 = element2[(val1):].find(search[2])
        if val2 > -1:
            elim2.append(element2)
    for element3 in elim2:
        val3 = element3[((val1+val2)):].find(search[1])
        if val3 > -1:
            elim3.append(element3)
    print(elim3)

You are making this very complicated for yourself. To test whether a word contains the letters E, J and S in that order, you can match it with the regex E.*J.*S:
>>> import re
>>> re.search('E.*J.*S', 'EJectS')
<_sre.SRE_Match object; span=(0, 6), match='EJectS'>
>>> re.search('E.*J.*S', 'JEt engineS') is None
True
So here's a simple way to write a function which tests for an arbitrary combination of letters:
import re
def contains_letters_in_order(word, letters):
    regex = '.*'.join(map(re.escape, letters))
    return re.search(regex, word) is not None
Examples:
>>> contains_letters_in_order('EJectS', 'EJS')
True
>>> contains_letters_in_order('JEt engineS', 'EJS')
False
>>> contains_letters_in_order('ABra Cadabra', 'ABC')
True
>>> contains_letters_in_order('Abra CadaBra', 'ABC')
False
If you want to test every word in a wordlist, it is worth doing pattern = re.compile(regex) once, and then pattern.search(word) for each word.
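A minimal sketch of that compile-once approach, assuming the words are already in a list (the function name filter_words and the sample words are illustrative, not part of the original answer):

```python
import re


def filter_words(words, letters):
    """Return the words containing `letters` in order, not necessarily adjacent."""
    # Build and compile the pattern once, e.g. 'e.*j.*s' for 'ejs'
    pattern = re.compile('.*'.join(map(re.escape, letters)))
    return [word for word in words if pattern.search(word)]


# Hypothetical sample words; in practice these would come from the word list file
sample = ['ejects', 'jets', 'rejoices', 'eggs']
print(filter_words(sample, 'ejs'))  # ['ejects', 'rejoices']
```

Repeated letters such as 'fsf' need no special handling here, because the pattern simply becomes 'f.*s.*f'.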

You need to read the file correctly with read(), and since there is a newline between each word, call split('\n') to properly create the word list. The logic is simple. If all the letters are in the word, get the index for each letter, and check that the order of the indexes matches the order of the letters.
with open('words_alpha.txt') as file:
    word_list = file.read().split('\n')
search = input("input letters here: ").lower()
found = []
for word in word_list:
    if all(x in word for x in search):
        i = word.find(search[0])
        j = word.find(search[1], i + 1)
        k = word.find(search[2], j + 1)
        if i < j < k:
            found.append(word)
print(found)
Using a function:
def get_words_with_letters(word_list, search):
    search = search.lower()
    for word in word_list:
        if all(x in word for x in search):
            i = word.find(search[0])
            j = word.find(search[1], i + 1)
            k = word.find(search[2], j + 1)
            if i < j < k:
                yield word
# word_list is the list read from the file above
words = list(get_words_with_letters(word_list, 'fsf'))

The issue with your code is that you're using val1 from a specific word in your first loop for another word in your second loop. So val1 will be the wrong value most of the time, as you're using the position of the first letter in the last word you checked in your first loop for every word in your second loop.
There are a lot of ways to solve what you're trying to do. However, my code below should be fairly close to what you had in mind with your solution. I have tried to explain everything that's going on in the comments:
# Read words from file
with open("words_alpha.txt") as f:
    words = f.readlines()
# Begin infinite loop
while True:
    # Get user input
    search = input("Input letters here: ")
    # Loop over all words
    for word in words:
        # Remove newline characters at the end
        word = word.strip()
        # Start looking for the letters at the beginning of the word
        position = -1
        # Check position for each letter
        for letter in search:
            # Passing a start index keeps position relative to the whole word
            position = word.find(letter, position + 1)
            # Break out of loop if letter not found
            if position < 0:
                break
        # If there was no `break` in the loop, the word contains all letters
        else:
            print(word)
For every new letter we start looking beginning at position + 1 where position is the position of the previously found letter. (That's why we have to do position = -1, so we start looking for the first letter at -1 + 1 = 0.)
You should ideally move the removal of \n outside of the loop, so you will have to do it once and not for every search. I just left it inside the loop for consistency with your code.
Also, by the way, there's no handling of uppercase/lowercase for now. So, for example, should the search for abc be different from Abc? I'm not sure, what you need there.
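If case should be ignored, the same letter-by-letter search can be wrapped in a reusable function. This is a minimal sketch under that assumption; the name contains_in_order is illustrative. It passes a start index to str.find so that every returned position is an index into the whole word.

```python
def contains_in_order(word, letters):
    """Return True if `letters` occur in `word` in order, ignoring case."""
    word = word.lower()
    position = -1
    for letter in letters.lower():
        # Search only after the previously matched letter
        position = word.find(letter, position + 1)
        if position < 0:
            return False
    return True


print(contains_in_order('EJectS', 'ejs'))       # True
print(contains_in_order('JEt engineS', 'EJS'))  # False
print(contains_in_order('offside', 'fsf'))      # False
```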

How to take out punctuation from string and find a count of words of a certain length?

I am trying to create a function that opens a .txt file and counts the words that have the same length as the number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close
    count = 0
    for words in lstLines:
        words = words.split()
    for i in words:
        if len(i) == number:
            count += 1
    return count

You can try calling replace() on the string: pass in the punctuation you want to remove and replace it with an empty string ("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")

I have written some sample code to remove punctuation and count the number of words. Modify it according to your requirements.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the number of words. Hope this helps!!
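To count only the words of a given length, which is what the question asks for, the same substitution can be combined with a length check. A minimal sketch, with the hypothetical function name count_words_of_length and the question's sample text passed in as a string rather than read from sample.txt:

```python
import re


def count_words_of_length(text, number):
    """Count the words in `text` that are exactly `number` characters long."""
    # Drop everything that is neither a word character nor whitespace
    cleaned = re.sub(r'[^\w\s]', '', text)
    return sum(1 for word in cleaned.split() if len(word) == number)


text = ("This is a random text document. How many words have a length of one? "
        "How many words have the length three? We have the power to figure it out! "
        "Is a function capable of doing this?")
print(count_words_of_length(text, 4))  # 8
```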

Instead of cascading replace() calls, just call strip() once with all the punctuation characters.
Edit: a cleaner version
pl = '?!."\'' # punctuation list
def samplePractice(number):
    with open('sample.txt', 'r') as fin:
        words = fin.read().split()
    # clean words
    words = [w.strip(pl) for w in words]
    count = 0
    for word in words:
        if len(word) == number:
            print(word, end=', ')
            count += 1
    return count
result = samplePractice(4)
print('\nResult:', result)
output:
This, text, many, have, many, have, have, this,
Result: 8
Your code is almost OK; the second for block is just in the wrong position:
pl = '?!."\'' # punctuation list
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            i = i.strip(pl) # clean the word by strip
            if len(i) == number:
                count += 1
    return count
result = samplePractice(4)
print(result)
output:
8

Print values from list based from separate text file

How do I print a list of words from a separate text file? I want to print all the words unless the word has a length of 4 characters.
words.txt file looks like this:
abate chicanery disseminate gainsay latent aberrant coagulate dissolution garrulous laud
It has 334 total words in it. I'm trying to display the list until it reaches a word with a length of 4 and stops.
wordsFile = open("words.txt", 'r')
words = wordsFile.read()
wordsFile.close()
wordList = words.split()
#List outputs length of words in list
lengths= [len(i) for i in wordList]
for i in range(10):
    if i >= len(lengths):
        break
    print(lengths[i], end = ' ')
# While loop displays names based on length of words in list
while words != 4:
    if words in wordList:
        print("\nSelected words are:", words)
        break
output
5 9 11 7 6 8 9 11 9 4
sample desired output
Selected words are:
Abate
Chicanery
disseminate
gainsay
latent
aberrant
coagulate
dissolution
garrulous

Given that you only want the first 10 words, there isn't much point reading all 4 lines. You can safely read just the first and save yourself some time.
#from itertools import chain
with open('words.txt') as f:
    # could raise `StopIteration` if file is empty
    words = next(f).strip().split()
    # to read all lines
    #words = []
    #for line in f:
    #    words.extend(line.strip().split())
    # more functional way
    # words = list(chain.from_iterable(line.strip().split() for line in f))
print("Selected words are:")
for word in words[:10]:
    if len(word) != 4:
        print(word)
There are a few alternative methods I left in there but commented out.
Edit: using a while loop.
i = 0
while i < 10:
    if len(words[i]) != 4:
        print(words[i])
    i += 1
Since you know how many iterations you can do, you can hide the mechanics of the iteration using a for loop. A while does not facilitate this very well and is better used when you don't know how many iterations you will do.

To read all words from a text file, and print each of them unless they have a length of 4:
with open("words.txt","r") as wordsFile:
    words = wordsFile.read()
wordsList = words.split()
print ("Selected words are:")
for word in wordsList:
    if len(word) != 4: # ..unless it has a length of 4
        print (word)
Later in your question you write, "I'm trying to display the first 10 words...". If so, add a counter, and add a condition to print only if its value is <= 10.
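One reading of that counter suggestion, as a minimal sketch: the counter tracks how many words have been examined, and the loop stops after the first ten. The function name first_words_except_length and its parameters are illustrative assumptions.

```python
def first_words_except_length(words, limit=10, skip_length=4):
    """Return up to `limit` leading words, leaving out those of `skip_length`."""
    selected = []
    counter = 0
    for word in words:
        counter += 1
        if counter > limit:
            break  # only the first `limit` words are considered
        if len(word) != skip_length:
            selected.append(word)
    return selected


# The ten sample words shown in the question
sample = ("abate chicanery disseminate gainsay latent "
          "aberrant coagulate dissolution garrulous laud").split()
print("Selected words are:")
for word in first_words_except_length(sample):
    print(word)
```

With the sample above this prints the first nine words and leaves out "laud", the only four-letter word.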

While I'd use a for or a while loop, like Paul Rooney suggested, you can also adapt your code.
When you create the list lengths[], you create a list with ALL the lengths of the words contained in wordList.
You then cycle through the first 10 lengths in lengths[] with the for loop.
If you need to use this method, you can use a for loop that compares words and lengths by index:
#lengths[] contains all the lengths of the words in wordList
lengths= [len(i) for i in wordList]
#foo[] contains all the words in wordList
foo = [i for i in wordList]
#for the first 10 elements of lengths, if the element isn't 4 chars long
#print the foo[] element with the same index
for i in range(10):
    # stop before indexing past the end of a short list
    if i >= len(lengths):
        break
    if lengths[i] != 4:
        print(foo[i])
I hope this is clear and that it's the answer you were looking for.

How to print words that only contain letters from a list?

Hello, I have recently been trying to create a program in Python 3 which will read a text file which contains 23005 words. The user will then enter a string of 9 characters, which the program will use to create words and compare them to the ones in the text file.
I want to print words which contains between 4-9 letters and that also contains the letter in the middle of my list. For example if the user enters the string "anitsksem" then the fifth letter "s" must be present in the word.
Here is how far I have gotten on my own:
# Open selected file & read
filen = open("svenskaOrdUTF-8.txt", "r")
# Read all rows and store them in a list
wordList = filen.readlines()
# Close File
filen.close()
# letterList index
i = 0
# List of letters that user will input
letterList = []
# List of words that are our correct answers
solvedList = []
# User inputs 9 letters that will be stored in our letterList
string = input(str("Enter nine letters: "))
userInput = False
# Checks if user input is correct
while userInput == False:
    # if the string is equal to 9 letters
    # insert letter into our letterList.
    # also set userInput to True
    if len(string) == 9:
        userInput = True
        for char in string:
            letterList.insert(i, char)
            i += 1
    # If string not equal to 9 ask user for a new input
    elif len(string) != 9:
        print("You have not entered nine letters")
        string = input(str("Enter nine letters: "))
# For each word in wordList
# and for each char within that word
# check if said word contains a letter from our letterList
# if it does and meets the requirements to be a correct answer
# add said word to our solvedList
for word in wordList:
    for char in word:
        if char in letterList:
            if len(word) >= 4 and len(word) <= 9 and letterList[4] in word:
                print("Char:", word)
                solvedList.append(word)
The issue that I run into is that instead of printing words which only contain letters from my letterList, it prints out words which contain at least one letter from my letterList. This also means that some words are printed out multiple times, for example if the word contains multiple letters from letterList.
I've been trying to solve these problems for a while but I just can't seem to figure it out. I have also tried using permutations to create all possible combinations of the letters in my list and then comparing them to my wordlist; however, I felt that solution was too slow given the number of combinations which must be created.
Also, since I'm kinda new to Python, if you have any general tips to share, I would really appreciate it.

You get multiple words mainly because you iterate through each character in a given word and if that character is in the letterList you append and print it.
Instead, iterate on a word basis and not on a character basis, while also using a with context manager to automatically close the file:
with open('american-english') as f:
    for w in f:
        w = w.strip()
        cond = all(i in letterList for i in w) and letterList[4] in w
        # 9 >= ... so that nine-letter words are included, as the question asks
        if 9 >= len(w) >= 4 and cond:
            print(w)
Here cond is used to trim down the if statement, all(..) is used to check if every character in the word is in letterList, w.strip() is to remove any redundant white-space.
Additionally, to populate your letterList when the input is 9 letters, don't use insert. Instead, just supply the string to list and the list will be created in a similar, but noticeably faster, fashion:
This:
if len(string) == 9:
    userInput = True
    for char in string:
        letterList.insert(i, char)
        i += 1
Can be written as:
if len(string) == 9:
    userInput = True
    letterList = list(string)
With these changes, the initial open and readlines are not needed, neither is the initialization of letterList.
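One caveat: all(i in letterList for i in w) checks membership only, so a word may use a letter more often than it appears among the nine input letters. If each input letter may be used only as often as it was entered (an assumption; the question does not say so), collections.Counter can compare letter counts. A minimal sketch with the hypothetical function name solve and a small English sample in place of the Swedish word list:

```python
from collections import Counter


def solve(words, letters, min_len=4, max_len=9):
    """Return words that can be built from `letters` and contain the middle letter."""
    available = Counter(letters)
    middle = letters[len(letters) // 2]  # fifth letter of a nine-letter input
    solved = []
    for word in words:
        word = word.strip()
        if not (min_len <= len(word) <= max_len) or middle not in word:
            continue
        # Counter subtraction keeps only positive counts, so an empty
        # result means no letter is used more often than it is available
        if not Counter(word) - available:
            solved.append(word)
    return solved


# Hypothetical English sample; the question uses a Swedish word list
print(solve(["mistake\n", "skate\n", "meat\n", "masks\n"], "anitsksem"))
# ['mistake', 'skate', 'masks']
```

Here "meat" is rejected because it lacks the middle letter "s", while "masks" is accepted because the input contains "s" twice.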

You can try this logic:
for word in wordList:
    # remove the trailing newline left by readlines()
    word = word.strip()
    # if not a valid word, skip - moving this check outside the inner for-each will improve performance
    if len(word) < 4 or len(word) > 9 or letterList[4] not in word:
        continue
    # find the number of matching letters
    match_count = 0
    for char in word:
        if char in letterList:
            match_count += 1
    # check if the total number of matches is equal to the word length
    if match_count == len(word):
        print("Char:", word)
        solvedList.append(word)

You can use a list comprehension to get this done.
I am just putting up a proof of concept here and leave it to you to convert it into a complete solution.
with open("test.text", "r") as filen:
    word_list = filen.read().split()
print("Enter your string")
# the fifth letter of the input must be present in every match
search_letter = input()[4]
solved_list = [word for word in word_list if 4 <= len(word) <= 9 and search_letter in word]
print(solved_list)

Reading and checking the consecutive words in a file

I want to read the words in a file and, for example, check whether a word is "1". If the word is "1", I have to check whether the next word is "two". After that I have to do some other task. Can you help me check for the occurrence of "1" and "two" consecutively?
I have used
filne = raw_input("name of existing file to be processed:")
f = open(filne, 'r+')
for word in f.read().split():
    for i in xrange(len(word)):
        print word[i]
        print word[i+1]
but it's not working.

The easiest way to deal with consecutive items is with zip:
with open(filename, 'r') as f: # better way to open file
    for line in f: # for each line
        words = line.strip().split() # all words on the line
        for word1, word2 in zip(words, words[1:]): # iterate through pairs
            if word1 == '1' and word2 == 'crore': # test the pair
                pass # do the other task here
At the moment, your indices (i and i+1) are within each word (i.e. characters) not for words within the list.
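A self-contained sketch of the zip pairing, written for Python 3 and run on in-memory lines instead of a file (the function name find_consecutive and the sample lines are illustrative assumptions):

```python
def find_consecutive(lines, first, second):
    """Yield (line_number, first, second) wherever `second` directly follows `first`."""
    for line_number, line in enumerate(lines, start=1):
        words = line.split()
        # zip pairs each word with its successor on the same line
        for word1, word2 in zip(words, words[1:]):
            if word1 == first and word2 == second:
                yield line_number, word1, word2


# Hypothetical sample lines standing in for the file contents
sample = ["pay 1 crore now", "1 lakh or 1 crore"]
print(list(find_consecutive(sample, '1', 'crore')))
# [(1, '1', 'crore'), (2, '1', 'crore')]
```

Because the pairing is done line by line, a pair split across a line break is not detected; splitting the whole file content at once would cover that case.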

I think you want to print two consecutive words from the file.
In your code you are iterating over each character instead of each word in the file, which is probably not what you intend.
You can do it in the following way:
f = open('yourFileName')
str1 = f.read().split()
for i in xrange(len(str1)-1): # -1 otherwise it will be index out of range error
    print str1[i]
    print str1[i+1]
If you want to check whether some word is present and then check the word next to it, use
if 'wordYouWantToCheck' in str1:
    index=str1.index('wordYouWantToCheck')
Now you have the index of the word you are looking for, and you can check the word next to it using str1[index+1].
But the index function returns only the first occurrence of the word. To find every occurrence, you can use the enumerate function.
indices = [i for i,x in enumerate(str1) if x == "1"]
This will return a list containing the indices of all occurrences of the word '1'.
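Building on the enumerate idea, the check of the following word can be folded into the same comprehension. A minimal Python 3 sketch; the function name indices_followed_by and the sample words are illustrative assumptions:

```python
def indices_followed_by(words, first, second):
    """Return the indices i where words[i] == first and words[i + 1] == second."""
    return [
        i for i, word in enumerate(words[:-1])  # stop before the last word
        if word == first and words[i + 1] == second
    ]


# Hypothetical sample standing in for f.read().split()
words = "1 two 3 1 one 1 two".split()
print(indices_followed_by(words, "1", "two"))  # [0, 5]
```

Iterating over words[:-1] avoids an index error when the last word is "1" and has no successor.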
