I'm trying to create a simple program that opens a file, splits it into single word lines (for ease of use) and creates a dictionary with the words, the key being the word and the value being the number of times the word is repeated. This is what I have so far:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] =+1
infile.close()
word_dictionary
The line word_dictionary prints nothing, meaning that the lines are not being put into a dictionary. Any help?
The paragraph.txt file contains this:
This is a sample text file to be used for a program. It should have nothing important in here or be used for anything else because it is useless. Use at your own will, or don't because there's no point in using it.
I want the dictionary to do something like this, but I don't care too much about the formatting.
Two things. First, the shorthand for
    num = num + 1
is
    num += 1
not
    num =+ 1
which parses as num = (+1) and so resets the value to 1 every time. The corrected code:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] += 1
infile.close()
print(word_dictionary)
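Secondly, you need to print word_dictionary explicitly with print(); a bare expression only displays its value in the interactive interpreter, not when the script is run.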
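As an alternative to the manual if/else, the standard library's collections.Counter does the same counting. This is only a minimal sketch: the count_words helper and the sample string are made up for illustration, and it works on a string rather than on paragraph.txt.

```python
from collections import Counter

def count_words(text):
    # split() with no arguments splits on any run of whitespace
    return Counter(text.split())

sample = "a sample text file to be used for a program"
counts = count_words(sample)
print(counts["a"])        # number of times "a" occurs in the sample
print(counts["missing"])  # a Counter reports 0 for words it never saw
```

To use it on the file, pass infile.read() as the text.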
Sorry if this is a silly question, but I am new to Python. I have a piece of code that opens a text file, reads it, creates a list of words, and then builds a dictionary from that list mapping each word to the number of times it appears. The code worked and printed the dictionary correctly. However, when I put it in a function and called the function, it returned a dictionary with only one entry. Any ideas why? Any help is much appreciated.
def createDict():
    wordlist = []
    with open('superman.txt','r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\]^_`{|}~0123456789'“”-''—", 47*' ' )
    lenght = len(wordlist)
    i = 0
    while i < lenght:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    word_dict = {}
    for item in wordlist:
        if item in word_dict.keys():
            word_dict[item] += 1
    else:
        word_dict[item] = 1
    return(word_dict)
Try initializing the dictionary outside of the function and then using global inside the function. Is the one item in the dictionary from the last iteration?
Fix the indentation of the loop over wordlist. It should read:
for item in wordlist:
    if item in word_dict.keys():
        word_dict[item] += 1
    else:
        word_dict[item] = 1
This seems to be an indentation and whitespace issue. Make sure the if and else statements near the end of your function are at the same level.
Below is the code I got working, with the indentation at the correct level and comments added to explain the thought process.
def createDict():
    wordlist = []
    with open('superman.txt','r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\]^_`{|}~0123456789'“”-''—", 47*' ' )
    lenght = len(wordlist)
    i = 0
    while i < lenght:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    # print(len(wordlist)) # check to see if wordlist is fine. Indeed it is
    word_dict = {}
    for item in wordlist:
        # For dictionaries, don't worry about using the dict.keys()
        # method. You can use the shorter "if [key] in [dict]" conditional.
        # The issue in your code was the whitespace and indentation
        # of the else statement.
        # Please make sure that if and else are at the same indentation level.
        # Python reads blocks by indentation and whitespace because it
        # doesn't use curly brackets like other languages such as JavaScript.
        if item in word_dict:
            word_dict[item] += 1
        else:
            word_dict[item] = 1
    return word_dict # print here too
Please let me know if you have any questions. Cheers!
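To see why a misaligned else can leave only one entry, here is a small sketch. It assumes the original else had ended up aligned with the for, which makes it a for/else clause; the function names and word list are invented for the demonstration.

```python
def count_misaligned(words):
    counts = {}
    for item in words:
        if item in counts:  # never true, because nothing is ever added in the loop
            counts[item] += 1
    else:
        # for/else: this runs once, after the loop finishes without a break,
        # and item still holds the last word of the list
        counts[item] = 1
    return counts

def count_aligned(words):
    counts = {}
    for item in words:
        if item in counts:
            counts[item] += 1
        else:
            # if/else: this runs for every word not seen before
            counts[item] = 1
    return counts

words = ["up", "up", "and", "away"]
print(count_misaligned(words))
print(count_aligned(words))
```

The first function returns a dictionary holding only the last word, which matches the symptom described in the question.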
I am trying to create a function that opens a .txt file and counts the words whose length equals a number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close
    count = 0
    for words in lstLines:
        words = words.split()
    for i in words:
        if len(i) == number:
            count += 1
    return count
You can try using the replace() on the string and pass in the desired punctuation and replace it with an empty string("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")
I have written a sample code to remove punctuations and to count the number of words. Modify according to your requirement.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the number of words. Hope this helps!!
Instead of chaining replace() calls, use a single strip() call, which removes the given characters from both ends of each word.
Edit: a cleaner version
pl = '?!."\'' # punctuation list
def samplePractice(number):
    with open('sample.txt', 'r') as fin:
        words = fin.read().split()
    # clean words
    words = [w.strip(pl) for w in words]
    count = 0
    for word in words:
        if len(word) == number:
            print(word, end=', ')
            count += 1
    return count
result = samplePractice(4)
print('\nResult:', result)
output:
This, text, many, have, many, have, have, this,
Result: 8
Your code is almost OK; the second for block is just in the wrong position. It needs to be nested inside the first loop:
pl = '?!."\'' # punctuation list
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            i = i.strip(pl) # clean the word by strip
            if len(i) == number:
                count += 1
    return count
result = samplePractice(4)
print(result)
output:
8
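The same idea can be sketched without a hand-written punctuation list by using string.punctuation from the standard library. The helper name count_words_of_length is hypothetical, and the sketch takes the text as a string instead of reading sample.txt, so it runs on its own.

```python
import string

def count_words_of_length(text, number):
    count = 0
    for word in text.split():
        # strip() only removes punctuation from the ends of the word,
        # so an apostrophe inside a word such as "don't" is kept
        if len(word.strip(string.punctuation)) == number:
            count += 1
    return count

sample = "How many words have the length three? We have the power to figure it out!"
print(count_words_of_length(sample, 3))
```

To use it with the file, pass fin.read() as the text.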
I am tasked with building a program that will ask for an input for a word. I am to write a program to search the word in a dictionary. (I already have composed)
[My hint is: you will find the first character of the word. Get the list of words that starts with that character.
Traverse the list to find the word.]
So far I have the following code:
Word = input ("Search word: ")
my_file = open("input.txt",'r')
d = {}
for line in my_file:
    key = line[0]
    if key not in d:
        d[key] = [line.strip("\n")]
    else:
        d[key].append(line.strip("\n"))
I have gotten close, but I am stuck. Thank you in advance!
user_word = input("Search word: ")
def file_records():
    with open("input.txt",'r') as fd:
        for line in fd:
            yield line.strip()
for record in file_records():
    if record == user_word:
        print("Word is found")
        break
else:
    # the else of a for loop runs only if the loop ended without break
    print("Word is not found")
You could do something like this,
words = []
with open("input.txt",'r') as fd:
    words = [w.strip() for w in fd.readlines()]
user_word in words #will return True or False. Eg. "americophobia" in ["americophobia",...]
fd.readlines() reads all the lines in the file into a list, and w.strip() strips leading and trailing whitespace (including the newline). If that is not enough, try w.strip(" \r\n\t").
[w.strip() for w in fd.readlines()] is called list comprehension in python
This should work as long as the file is not too large. If there are millions of records, you might want to create a generator function to read the file instead. Something like:
def file_records():
    with open("input.txt",'r') as fd:
        for line in fd:
            yield line.strip()
#and then call this function as
for record in file_records():
    if record == user_word:
        print(user_word + " is found")
        break
else:
    print(user_word + " is not found")
PS: Not sure why you would need a python dictionary. Your professor would have meant English dictionary :)
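If the assignment hint really does mean grouping words by their first character, the lookup could be sketched like this. build_index and find_word are hypothetical names, and the word list stands in for the lines of input.txt.

```python
def build_index(words):
    # map each first character to the list of words that start with it
    index = {}
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines
        index.setdefault(word[0], []).append(word)
    return index

def find_word(index, target):
    if not target:
        return False
    # only the words sharing the target's first character are traversed
    return target in index.get(target[0], [])

index = build_index(["apple\n", "avocado\n", "banana\n"])
print(find_word(index, "avocado"))
print(find_word(index, "cherry"))
```

With the real file, build_index(open("input.txt")) would build the same structure, since iterating over a file yields its lines.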
I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = [‘Emma’, ‘Woodhouse’, ‘father’, ‘Taylor’, ‘Miss’, ‘been’, ‘she’, ‘her’]
My code is below.
All it does is create the new file 'Repeated_word.txt' however it doesn't write the number of times the word from the wordlist appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
    #open the file
    fin = open(filename, "r")
    #create output file
    fout = open("Repeated_word.txt", "w")
    #loop through each word of the file
    for line in fin:
        #split the lines into words
        words = line.split()
        for word in words:
            #check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    #count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close
repeatedWords(fin, word_list)
These lines,
    for i in range(len(word_list)):
        if word == i:
should be
    for i in range(len(word_list)):
        if word == word_list[i]:
or
    for i in word_list:
        if word == i:
word is a string, whereas i is an integer, the way you have it right now. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
words = fin.read().split()
for word in word_list:
    # write() takes a single string, so format the word and its count first
    fout.write("%s %d\n" % (word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2] if word == i:
[3] #count number of times word is in word
[4] count = words.count(word)
[5] fout.write(word, count)
First, you are comparing the word from fin with an integer from the range. [line 2]
Then you are writing the count to fout on every match within each line. [line 5] You should probably keep the counts (e.g. in a dict) and write them all after parsing the whole input file.
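Keeping the counts in a dict and writing them once at the end might look roughly like the sketch below. The function names are invented, it is written for Python 3 rather than the Python 2 of the question, and io.StringIO stands in for the real output file so the example runs without touching disk.

```python
import io

def count_listed_words(lines, word_list):
    # start every listed word at 0 so words that never occur are still reported
    counts = {word: 0 for word in word_list}
    for line in lines:
        for word in line.split():
            if word in counts:
                counts[word] += 1
    return counts

def write_counts(counts, fout):
    # one "word count" pair per line, written after all input is parsed
    for word, count in counts.items():
        fout.write("%s %d\n" % (word, count))

lines = ["Emma Woodhouse and her father\n", "Miss Taylor and Emma\n"]
counts = count_listed_words(lines, ["Emma", "Taylor", "she"])
out = io.StringIO()
write_counts(counts, out)
print(out.getvalue())
```

In the real program, lines would be the open input file and fout the file opened as "Repeated_word.txt". Note that split() leaves punctuation attached, so "Emma," would not match "Emma" unless it is stripped first.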
The task is to write a program which prompts for a filename and then produces a concordance of that file.
Ex. A concordance is an alphabetical index that shows the lines in a document where each word occurs. For example, a concordance for this paragraph might appear as:
Word            Line Number
a               1 1 2
alphabetical    1
an              1
appear          2
Here I make a list so that I can sort the words.
I have this code:
f = open(raw_input("Enter a filename: "), "r")
myDict = {}
linenum = 0
for line in f:
    line = line.strip()
    line = line.lower()
    line = line.split()
    linenum += 1
    for word in line:
        word = word.strip()
        word = word.lower()
        myDict[word] = linenum
        if word in myDict:
            myDict.sort()
        else:
            myDict.append(word)
print "%-15s %-15s" %("Word", "Line Number")
print "%-15s %-15d" %(myDict.keys(), myDict.values())
When I run the program now it says 'dict' has no attribute 'sort'. Can you explain this please?
The file is the same as the example and the output should also be the example from above. I'm very new at python please help :[
I think it makes sense to use a dict, but you'll have to add a key along with each value you add to the dict. For example:
>>> dict = {}
>>> dict["apple"] = "red"
>>> dict["banana"] = "yellow"
>>> dict
{'apple': 'red', 'banana': 'yellow'}
In this example, the keys are "apple" and "banana", and the values are "red" and "yellow". Since this is homework, I'll leave it up to you to determine appropriate keys and values for your assignment.
Also, this line is problematic:
for word in line:
line is a string, so you're actually looking at each character in line, rather than each word. You'll have to find some way to transform line into a list of words...
Lastly, your final statement will only print the last word read. You're building a dict, but you're not printing the dict, you're printing a single value. Once you build the dict, you should print the dict itself.
    myDict[word] = linenum
    if word in myDict:
        myDict.sort()
    else:
        myDict.append(word)
You're on the right path, but sorting the dictionary isn't the right way to handle words that appear more than once (furthermore, dict doesn't have a sort method, which is why you're getting an error, but even if it did, you wouldn't need it here). Also, once you assign a value to a key, it's added to the dictionary, so it's already been "appended".
In your example, the word a appears 3 times, and the output lists each line it appears in, so you'll need a way to store a list of lines for each word.
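One way to store a list of lines per word is a defaultdict(list), sketched here in Python 3 (the question itself uses Python 2). build_concordance and the two sample lines are made up for illustration.

```python
from collections import defaultdict

def build_concordance(lines):
    # each word maps to the list of line numbers it occurs on
    concordance = defaultdict(list)
    for linenum, line in enumerate(lines, start=1):
        for word in line.lower().split():
            concordance[word].append(linenum)
    return concordance

lines = ["a concordance is an index", "a word might appear twice a line"]
concordance = build_concordance(lines)
print("%-15s %s" % ("Word", "Line Number"))
for word in sorted(concordance):
    # join the line numbers so a repeated word shows every occurrence
    print("%-15s %s" % (word, " ".join(str(n) for n in concordance[word])))
```

A word that occurs twice on one line gets that line number twice, as in the "a 1 1 2" row of the example output.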
Do you want myDict to just be a list? If so, declare it as myDict = []. A list has sort and append functions, but a dictionary doesn’t.
You can print the dictionary in sorted order this way:
f = open(raw_input("Enter a filename: "), "r")
myDict = {}
linenum = 0
for line in f:
    line = line.strip()
    line = line.lower()
    line = line.split()
    linenum += 1
    for word in line:
        word = word.strip()
        word = word.lower()
        if not word in myDict:
            myDict[word] = []
        myDict[word].append(linenum)
print "%-15s %-15s" %("Word", "Line Number")
for key in sorted(myDict):
    # myDict[key] is a list of line numbers, so join them into one string
    print '%-15s %s' % (key, ' '.join(str(n) for n in myDict[key]))
Hope it helps
Jordi