I'm making a script that reads a dictionary and picks out words that fit a search criterion. The code runs fine, but the problem is that it doesn't write any words to the file "wow" or print them out. The source for the dictionary is https://github.com/dwyl/english-words/blob/master/words.zip.
I've tried changing the opening of the file to "w+" instead of "a+" but it didn't make a difference. I checked if there just weren't any words that fitted the criteria but that isn't the issue.
listExample = [] #creates a list
with open("words.txt") as f: #opens the "words" text file
    for line in f:
        listExample.append(line)
x = 0
file = open("wow.txt","a+") #opens "wow" so I can save the right words to it
while True:
    if x < 5000: # limits the search because I don't want to wait too long
        if len(listExample[x]) == 11: #this loop iterates through all words
            word = listExample[x] #if the words is 11 letters long
            lastLetter = word[10]
            print(x)
            if lastLetter == "t": #and the last letter is t
                file.write(word) #it writes the word to the file "wow"
                print("This word is cool!",word) #and prints it
            else:
                print(word) #or it just prints it
        x += 1 #iteration
    else:
        file.close()
        break #breaks after 5000 to keep it short
It created the "wow" file but it is empty. How can I fix this issue?
This fixes your problem. Each line read from the file still ends with a line break (and maybe a space too), so a line of length 11 is really a 10-letter word plus "\n", and word[10] is that line break, never "t". I've put in .strip() to get rid of the surrounding whitespace. I've also defined lastLetter as word[-1] to get the final letter regardless of the word's length.
P.S. Thanks to Ocaso Protal for suggesting strip instead of replace.
listExample = [] #creates a list
with open("words.txt") as f: #opens the "words" text file
    for line in f:
        listExample.append(line)
x = 0
file = open("wow.txt","a+") #opens "wow" so I can save the right words to it
while True:
    if x < 5000: # limits the search because I don't want to wait too long
        word = listExample[x].strip()
        if len(word) == 11:
            lastLetter = word[-1]
            print(x)
            if lastLetter == "t": #and the last letter is t
                file.write(word + '\n') #it writes the word to the file "wow"
                print("This word is cool!",word) #and prints it
            else:
                print(word) #or it just prints it
        x += 1 #iteration
    else:
        print('closing')
        file.close()
        break #breaks after 5000 to keep it short
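The same filter can be sketched more compactly. This is a rough illustration rather than a drop-in replacement: the helper name words_ending_with and its parameters are made up for this example, and it accepts any iterable of lines, so an open file works as well as a list.

```python
from itertools import islice

def words_ending_with(lines, length=11, last_letter="t", limit=5000):
    """Yield stripped words of the given length that end with last_letter.

    Only the first `limit` lines are examined, mirroring the 5000-line cap.
    """
    for line in islice(lines, limit):
        word = line.strip()  # drop the trailing newline before measuring
        if len(word) == length and word.endswith(last_letter):
            yield word

# In-memory demonstration; a real run would pass an open file instead.
sample = ["development\n", "apple\n", "achievement\n", "abandonment \n"]
matches = list(words_ending_with(sample))
print(matches)
```

With a real file, something like `with open("words.txt") as f, open("wow.txt", "w") as out: out.writelines(w + "\n" for w in words_ending_with(f))` would write the matches, and the `with` statement closes both files automatically.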
New here!
I am searching for the following or the next word for the word "I". Ex "I am new here" -> the next word is "am".
import re
word = 'i'
with open('tedtalk.txt', 'r') as words:
    pat = re.compile(r'\b{}\b \b(\w+)\b'.format(word))
    print(pat.findall(words))
with open('tedtalk.txt','r') as f:
    for line in f:
        phrase = 'I'
        if phrase in line:
            next(f)
This is the code I have developed so far, but I am stuck. Thanks in advance!
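You have two options.
First, with split (this finds the word after the first occurrence only, and raises ValueError if the search word is absent):
with open('tedtalk.txt','r') as f:
    data = f.read()
search_word = "I"
list_of_words = data.split()
next_word = list_of_words[list_of_words.index(search_word) + 1]
Second, with regex (findall needs a string, so read the whole file with read() rather than readlines()):
import re
regex = re.compile(r"\bI\b\s*?\b(\w+)\b")
with open('tedtalk.txt','r') as f:
    data = f.read()
result = regex.findall(data)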
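If every occurrence is wanted rather than only the first, one possible approach is to pair each word with its successor. This is only a sketch: next_words is a hypothetical helper name, and it compares whole whitespace-separated tokens, so a token such as "I," with punctuation attached would not match "I".

```python
def next_words(text, search_word="I"):
    """Return the word following each occurrence of search_word in text."""
    words = text.split()
    # zip(words, words[1:]) pairs every word with the one after it,
    # so a search word in the very last position is silently skipped.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur == search_word]

sample = "I am new here and I like it. So do I"
print(next_words(sample))
```

Because the pairing stops one word before the end, there is no IndexError when the search word happens to be the last word of the text.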
In your first piece of code, words is a file object, not a string, so findall cannot be applied to it directly, and checking line by line has a problem of its own: a match that spans a line break is missed. For example, in the following case, am2 may not be found.
tedtalk.txt
I am1 new here, I
am2 new here, I am3 new here
So I modified the program to read 4096 characters at a time, which keeps memory use bounded even when the file is very large.
To avoid missing a match when a read cuts the data off right after an I, each chunk is searched from the end for a trailing I; if one is found, the data from that point on is cut off and put in front of the next read.
import re
regex = re.compile(r"\bI\b\s*?\b(\w+)\b")
def find_index(data, target_value="I"):
    """Look for spaces from the back, the intention is to find the value between two space blocks and check if it is equal to `target_value`"""
    index = data.rfind(" ")
    if index != -1:
        index2 = index
        while True:
            index2 = data.rfind(" ", 0, index2)
            if index2 == -1:
                break
            t = index - index2
            # two adjacent spaces
            if t == 1:
                continue
            elif t == 2 and data[index2 + 1: index] == target_value:
                return index2
result = []
with open('tedtalk.txt', 'r') as f:
    # Save data that might have been truncated last time.
    prev_data = ""
    while True:
        chunk = f.read(4096)
        once_read_data = prev_data + chunk
        if not chunk:
            # End of file: search whatever was carried over, then stop.
            result += regex.findall(once_read_data)
            break
        index = find_index(once_read_data)
        if index is not None:
            # Slicing based on the found index.
            prev_data = once_read_data[index:]
            once_read_data = once_read_data[:index]
        else:
            prev_data = ""
        result += regex.findall(once_read_data)
print(result)
Output:
['am1', 'am2', 'am3']
search_word = 'I'
prev_data = ""
result = []
with open('tedtalk.txt', 'r') as f:
    while True:
        data = prev_data + f.readline()
        if data == prev_data: # reached eof
            break
        list_of_words = data.split()
        for word_pos, word in enumerate(list_of_words[:-1]):
            if word == search_word:
                result.append(list_of_words[word_pos+1])
        prev_data = list_of_words[-1] + ' '
print(result)
I modified the code to read the text file line by line, so it should handle arbitrarily large files. The code also addresses the case where the search word is the last word on a line, by taking the first word of the next line as the next word.
If you would rather treat each line independently and ignore the search word when it is the last word on the line, the code can be simplified as follows:
search_word = 'I'
result = []
with open('tedtalk.txt', 'r') as f:
    while True:
        data = f.readline()
        if not data: # reached eof
            break
        list_of_words = data.split()
        for word_pos, word in enumerate(list_of_words[:-1]):
            if word == search_word:
                result.append(list_of_words[word_pos+1])
print(result)
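A further variant, offered only as a sketch: flatten the file into a lazy stream of words, so that line boundaries stop mattering and only one line is held in memory at a time. The names iter_words and following_words are invented for this illustration, and io.StringIO stands in for the real file.

```python
import io

def iter_words(lines):
    """Yield words one at a time from an iterable of lines."""
    for line in lines:
        yield from line.split()

def following_words(lines, search_word="I"):
    """Yield the word after each occurrence of search_word, across line breaks."""
    previous = None
    for word in iter_words(lines):
        if previous == search_word:
            yield word
        previous = word

# io.StringIO stands in for an open file such as tedtalk.txt.
fake_file = io.StringIO("I am1 new here, I\nam2 new here, I am3 new here\n")
found = list(following_words(fake_file))
print(found)
```

Blank lines need no special handling here, because splitting an empty line simply yields no words.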
I am tasked with building a program that asks the user to input a word and then searches for that word in a dictionary (which I have already composed).
[My hint is: you will find the first character of the word. Get the list of words that starts with that character.
Traverse the list to find the word.]
So far I have the following code:
Word = input ("Search word: ")
my_file = open("input.txt",'r')
d = {}
for line in my_file:
    key = line[0]
    if key not in d:
        d[key] = [line.strip("\n")]
    else:
        d[key].append(line.strip("\n"))
I have gotten close, but I am stuck. Thank you in advance!
user_word=input("Search word: ")
def file_records():
    with open("input.txt",'r') as fd:
        for line in fd:
            yield line.strip()
for record in file_records():
    if record == user_word:
        print ("Word is found")
        break
else:
    # Runs only if the loop finished without hitting break.
    print ("Word is not found")
You could do something like this:
words = []
with open("input.txt",'r') as fd:
    words = [w.strip() for w in fd.readlines()]
user_word in words #will return True or False. Eg. "americophobia" in ["americophobia",...]
fd.readlines() reads all the lines in the file into a list, and w.strip() strips all leading and trailing whitespace (including the newline). If that does not work, try w.strip(" \r\n\t").
[w.strip() for w in fd.readlines()] is called a list comprehension in Python.
This should work as long as the file is not too large. If there are millions of records, you might want to consider creating a generator function to read the file. Something like:
def file_records():
    with open("input.txt",'r') as fd:
        for line in fd:
            yield line.strip()
#and then call this function as
for record in file_records():
    if record == user_word:
        print(user_word + " is found")
        break
else:
    print(user_word + " is not found")
PS: Not sure why you would need a Python dictionary. Your professor probably meant an English dictionary :)
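If the hint in the question is meant literally (group the words by first character, then traverse only that group), it can be sketched with a Python dict as well. This is one possible reading of the hint, not the assignment's official solution; build_index and lookup are hypothetical names, and the word list is supplied in memory rather than read from input.txt.

```python
from collections import defaultdict

def build_index(words):
    """Map each first character to the list of words starting with it."""
    index = defaultdict(list)
    for raw in words:
        word = raw.strip()
        if word:  # skip blank lines, which have no first character
            index[word[0]].append(word)
    return index

def lookup(index, word):
    """Return True if word is present in the bucket for its first character."""
    if not word:
        return False
    # .get() avoids creating an empty bucket for unseen letters.
    return word in index.get(word[0], [])

index = build_index(["apple\n", "avocado\n", "banana\n", "\n"])
print(lookup(index, "avocado"), lookup(index, "cherry"))
```

Only the words sharing the search word's first letter are traversed, which is the point of the hint; for a plain membership test a set would be simpler still.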
Hey, my program has this code, which I think should work. I have gone through the whole program and everything seems fine, but somehow it doesn't split the words in the list by spaces.
I use lists and dictionaries in my code, and they seem to work. When I input a .txt file, the program works perfectly, but when you input your own words, it keeps words separated by spaces together (for example, if I input house phone keyboard, the program adds all 3 words as if they were one).
My question is: how can I split the words by spaces when the user inputs the words themselves instead of using a .txt file?
#These are the helper variables
wordsList = []
wordsDict = {}
words = 0
typeFile = 0
textFile = 0
text = 0
sortValue = 0
#Here we ask the person if they want to input their own words or use a .txt file
print("Input 'a' to input your own words or input 'b' to input text from a .txt file")
typeFile = input()
#Here we make an if statement if they chose to input their own words
if typeFile == "a":
    print("enter 'x' when finished")
    while True:
        #Here we ask the user to input their words
        print("Input your words")
        words = input()
        #This will finish asking for input when the user enters x
        if words == 'x':
            break
        #this puts the words on the list
        wordsList.append(words)
#Here we make an if statement if they chose to use a .txt file
if typeFile == "b":
    #This asks for the .txt file they want to open
    words = input("input which .txt file you would like to use (text_file.txt)")
    #This opens the file
    textFile = open(words, 'r')
    text = textFile.read()
    #This adds the words of the file into the list and separates each word by spaces
    wordsList = text.split()
#This sorts the list
for text in range(len(wordsList)):
    #This looks if the word has been added to the dictionary
    if wordsList[text] in wordsDict.keys():
        #this will add a +1 value to the word if it is in already in the dictionary
        wordsDict[wordsList[text]] += 1
    #This looks if the word has been added to the dictionary
    if wordsList[text] not in wordsDict.keys():
        #this adds the word to the dictionary if it isnt already in
        wordsDict[wordsList[text]] = 1
#This sorts the words from lowest to highest
sortValue = sorted(wordsDict.items(), key = lambda t:t[1])
#These will print the top 5 used words in the list
print("These are the most used 5 words")
for num in range(1,6):
    print(num,':',sortValue[num*-1])
words = input("Press return to exit.")
I have tested your code, and it works fine as long as: after choosing option a, you enter one word at a time, and then press enter. If you want to add multiple words together, with spaces in between them before pressing enter, then modify the 'if typeFile == "a":' section as follows:
if typeFile == "a":
    print("enter 'x' when finished")
    while True:
        #Here we ask the user to input their words
        print("Input your words")
        words = input()
        #This will finish asking for input when the user enters x
        if words == 'x':
            break
        #this puts the words on the list
        if ' ' in words:
            words=words.split()
            wordsList.extend(words)
        else:
            wordsList.append(words)
Your program should then work, provided you supply at least 5 different words for the dictionary. If you do not, you will get a 'list index out of range' error. To solve this, modify the section after "#These will print the top 5 used words in the list" as follows:
#These will print the top 5 used words in the list
print("These are the most used 5 words")
if len(wordsDict)<5:
    highest=len(wordsDict)+1 #range() excludes its end value
else:
    highest=6
for num in range(1,highest):
    print(num,':',sortValue[num*-1])
words = input("Press return to exit.")
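As an alternative sketch, the standard library's collections.Counter handles both the counting and the "fewer than five words" case, because most_common(n) simply returns fewer items when fewer exist. The function name top_words is made up for illustration.

```python
from collections import Counter

def top_words(words, n=5):
    """Return up to n (word, count) pairs, most frequent first."""
    return Counter(words).most_common(n)

words = "house phone keyboard house phone house".split()
for rank, (word, count) in enumerate(top_words(words), start=1):
    print(rank, ":", word, count)
```

Since most_common never indexes past the end of the data, no length check is needed before printing.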
I'm using python 3.6.3 and the following code works fine for me.
import os
cmd = None
words_list = list()
words_dict = dict()
while cmd != 'x':
    cmd = input('What you want to do? [a] to enter your own sentence; [i]mport a file; e[x]it: ')
    if cmd == 'a':
        os.system('clear')
        sentence = input('Enter your sentence: ')
        split_sentence = sentence.split(' ')
        words_list.append(split_sentence)
    elif cmd == 'i':
        # Add code to read each line in file.
        os.system('clear')
        print('Importing file...')
    elif cmd != 'x':
        os.system('clear')
        print(f'{cmd} is an invalid command.')
    else:
        os.system('clear')
        print('Good Bye!')
def list_to_dict(sentence_list):
    for i in sentence_list:
        for j in i:
            if j in words_dict:
                words_dict[j] += 1
            else:
                words_dict[j] = 1
    return words_dict
print(list_to_dict(words_list))
I have to create a python file that prompts the user for a file path to a text document and then convert it into pig Latin and do a line/word count.
• A function to generate the pig Latin version of a single word
• A function to print line and word counts to standard output
• Correct pig Latin output with identical formatting as the original text file
• Correct line and word counts
I can't figure out why the pig latin is coming out wrong. My teacher said that I need another string.strip("\n") because it is making the words convert wrong but I have no idea where I am supposed to put that.
Also my line counter is broken. It counts but it always says 222 lines.
How can I make it just count the lines with words ?
#Step 1: User enters text file.
#Step 2: Pig Latin function rewrites file and saves as .txt.
#Step 3: Tracks how many lines and words it rewrites.
vowels = ("A", "a", "E", "e", "I", "i", "O", "o", "U", "u")
# Functions
def pig_word(string):
    line = string.strip("\n")
    for word in string.split(" "):
        first_letter = word[0]
        if first_letter in vowels:
            return word + "way"
        else:
            return word[1:] + first_letter + "ay"
def pig_sentence(sentence):
    word_list = sentence.split(" ")
    convert = " "
    for word in word_list:
        convert = convert + pig_word(word)
        convert = convert + " "
    return convert
def line_counter(s):
    line_count = 0
    for line in s:
        line_count += 1
    return line_count
def word_counter(line):
    word_count = 0
    list_of_words = line.split()
    word_count += len(list_of_words)
    return word_count
# File path conversion
text = raw_input("Enter the path of a text file: ")
file_path = open(text, "r")
out_file = open("pig_output.txt", "w")
s = file_path.read()
pig = pig_sentence(s)
out_file.write(pig+" ")
out_file.write("\n")
linecount = line_counter(s)
wordcount = word_counter(s)
file_path.close()
out_file.close()
# Results
print "\n\n\n\nTranslation finished and written to pig_output.txt"
print "A total of {} lines were translated successfully.".format(linecount)
print "A total of {} words were translated successfully.".format(wordcount)
print "\n\n\n\n"
Your first problem is here:
def pig_word(string):
    line = string.strip("\n") #!!!! line is NEVER USED !!!
    for word in string.split(" "): #you want *line*.split here
The second issue is caused by iterating over a string: it goes through every character instead of every line, as a file would:
>>> for i in "abcd":
...     print(i)
...
a
b
c
d
So in your line_counter, instead of doing:
for line in s:
    line_count += 1
you just need to do:
for line in s.split("\n"):
    line_count += 1
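To count only the lines that actually contain words, which the question also asks about, one option is to skip lines that are empty once stripped. A minimal sketch, with count_nonblank_lines as an assumed name:

```python
def count_nonblank_lines(text):
    """Count the lines in text that contain at least one non-whitespace character."""
    # splitlines() does not produce a phantom empty entry after a trailing newline.
    return sum(1 for line in text.splitlines() if line.strip())

sample = "Hello there.\n\nMy name is Roxy.\n   \nHow are you?\n"
print(count_nonblank_lines(sample))
```

Here the empty line and the whitespace-only line are both ignored, so only the three lines with text are counted.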
The first reason why you're not getting the output you want is that, in your pig_word(string) function, the return sits inside the for loop, so only the first word in the string is ever returned. Also, your teacher was talking about taking all the lines into the function and iterating over each line via str.split('\n'). \n represents the "new-line" character.
You can try something like this to correct that.
def pig_sentence(string):
    lines = []
    for line in string.split('\n'):
        new_words = []
        for word in line.split():
            first_letter = word[0]
            if first_letter in vowels:
                new_words.append(word + "way")
            else:
                new_words.append(word[1:] + first_letter + "ay")
        lines.append(" ".join(new_words))
    return lines
The Changes Made
Initialized a new list lines that we can append to throughout the loops.
Iterate over each line in the passed in string.
For each line, create a new list new_words.
Use your code, but instead of returning we add each converted word to new_words; once the line is finished, we join the words with spaces and append the result to our list of new lines, lines.
Note that this removes the need for two functions. Also note that I renamed pig_word to pig_sentence.
The second error is in your function line_counter(s). You are iterating over each character rather than each line. Here, add that str.split('\n') again to get the output you want: split the string into a list of lines, then iterate over the list.
Here is the modified function:
def line_counter(s):
    line_count = 0
    for _ in s.split('\n'):
        line_count += 1
    return line_count
(Since there is nothing erroneous with your file i.o., I'm just going to use a string literal here for the testing.)
Test
paragraph = """\
Hello world
how are you
pig latin\
"""
lines = line_counter(paragraph)
words = sum([word_counter(line) for line in paragraph.split('\n')])
out = pig_sentence(paragraph)
print(lines, words, out)
The output is what we expect!
3 7 ['elloHay orldway', 'owhay areway ouyay', 'igpay atinlay']
split(" ") splits only on single spaces, so end-of-line characters stay attached to the words. You need to split on all whitespace instead (punctuation, if it matters, has to be stripped separately). Replace
split(" ")
with
split()
Your sentence list is the equivalent of
sentence = 'Hello there.\nMy name is Roxy.\nHow are you?'
If you print the result after split(" ") and after split(), you will see the difference, and you will get the results that you expect.
Additionally, you will get incorrect results because there is translated to heretay. You need to loop so that it comes out as erethay.
That is, move every leading consonant to the end before adding 'ay', so that the new word starts with a vowel.
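A sketch of that rule, moving the whole leading consonant cluster rather than just one letter. The function name pig_latin_word is hypothetical, punctuation is left attached to the word, and treating y as a consonant is an assumption of this example.

```python
VOWELS = "aeiouAEIOU"

def pig_latin_word(word):
    """Convert one word: vowel-initial words get 'way', others move leading consonants."""
    if not word:
        return word
    if word[0] in VOWELS:
        return word + "way"
    # Find the first vowel; everything before it moves to the end.
    for position, letter in enumerate(word):
        if letter in VOWELS:
            return word[position:] + word[:position] + "ay"
    return word + "ay"  # no vowel at all, e.g. "hmm"

converted = [pig_latin_word(w) for w in "there are pigs".split()]
print(converted)
```

Applied per word after a plain split(), this gives erethay for there, as described above.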
I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
My code is below.
All it does is create the new file 'Repeated_word.txt' however it doesn't write the number of times the word from the wordlist appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
    #open the file
    fin = open(filename, "r")
    #create output file
    fout = open("Repeated_word.txt", "w")
    #loop through each word of the file
    for line in fin:
        #split the lines into words
        words = line.split()
        for word in words:
            #check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    #count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close
repeatedWords(fin, word_list)
These lines,
for i in range(len(word_list)):
    if word == i:
should be
for i in range(len(word_list)):
    if word == word_list[i]:
or
for i in word_list:
    if word == i:
word is a string, whereas i is an integer, the way you have it right now. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
words = fin.read().split()
for word in word_list:
    #write() takes a single string, so format the word and its count together
    fout.write("{} {}\n".format(word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word from fin with an integer from the range. [line 2]
Then you are writing the count to fout upon every match per line. [line 5] I guess you should keep the counts (e.g. in a dict) and write them all at the end of parsing the input file.
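Putting both points together, a sketch in Python 3 might accumulate the counts first and write them once at the end. The function names and the output format (one "word count" pair per line) are assumptions, since the assignment text does not specify them, and io.StringIO stands in for the real input and output files.

```python
import io
from collections import Counter

def count_listed_words(lines, word_list):
    """Count how often each word in word_list occurs in the given lines."""
    wanted = set(word_list)
    counts = Counter()
    for line in lines:
        for word in line.split():
            if word in wanted:
                counts[word] += 1
    return counts

def write_counts(counts, word_list, out):
    """Write one 'word count' line per listed word, including zero counts."""
    for word in word_list:
        # A Counter returns 0 for words it has never seen.
        out.write("{} {}\n".format(word, counts[word]))

word_list = ["Emma", "Woodhouse", "father"]
text = io.StringIO("Emma Woodhouse was clever\nEmma loved her father\n")
out = io.StringIO()  # stands in for open("Repeated_word.txt", "w")
write_counts(count_listed_words(text, word_list), word_list, out)
print(out.getvalue())
```

With real files, both would be opened in a `with` statement so that they are closed even if an error occurs, which also sidesteps the missing parentheses on fout.close in the question.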