Sorry if this is a silly question, but I am new to Python. I have a piece of code that opens a text file, reads it, builds a list of words, and then creates a dictionary from that list mapping each word to the number of times it appears. This code was working fine and printed the dictionary correctly, but when I put it in a function and called the function, it returned the dictionary with only one entry. Any ideas why? Any help is much appreciated.
def createDict():
    wordlist = []
    with open('superman.txt','r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\]^_`{|}~0123456789'“”-''—", 47*' ' )
    lenght = len(wordlist)
    i = 0
    while i < lenght:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    word_dict = {}
    for item in wordlist:
        if item in word_dict.keys():
            word_dict[item] += 1
    else:
        word_dict[item] = 1
    return(word_dict)
Try initializing the dictionary outside of the function and then using global inside the function. Is the one item in the dictionary from the last iteration?
Fix the indentation where you iterate over wordlist. It should read:
for item in wordlist:
    if item in word_dict.keys():
        word_dict[item] += 1
    else:
        word_dict[item] = 1
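The comment above asks whether the single entry comes from the last iteration. A minimal sketch, assuming the posted code had the else aligned with the for rather than with the if: Python then treats it as a for...else clause, which runs once after the loop finishes, so only the last word is stored. The function names are hypothetical.

```python
def count_words_buggy(words):
    word_dict = {}
    for item in words:
        if item in word_dict:
            word_dict[item] += 1
    else:
        # This else belongs to the for loop, not the if: it runs once,
        # after the loop finishes, using the last value of item.
        word_dict[item] = 1
    return word_dict


def count_words_fixed(words):
    word_dict = {}
    for item in words:
        if item in word_dict:
            word_dict[item] += 1
        else:  # aligned with the if, so it runs for every new word
            word_dict[item] = 1
    return word_dict


sample = ["up", "up", "and", "away"]
print(count_words_buggy(sample))  # {'away': 1}
print(count_words_fixed(sample))  # {'up': 2, 'and': 1, 'away': 1}
```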
This seems to be an indentation and whitespace issue. Make sure the if and else statements near the end of your function are at the same indentation level.
Below is the code I got working, with the indentation at the correct level. I've also added comments to explain the thought process.
def createDict():
    wordlist = []
    with open('superman.txt', 'r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    # '\\]' is a literal backslash followed by ']'; both strings are 47 characters long
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\\]^_`{|}~0123456789'“”-''—", 47*' ')
    length = len(wordlist)
    i = 0
    while i < length:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    # print(len(wordlist))  # check to see if wordlist is fine. Indeed it is
    word_dict = {}
    for item in wordlist:
        # For dictionaries, you don't need the dict.keys() method.
        # The shorter "if key in dict" conditional does the same check.
        # The issue in your code was the whitespace and indentation
        # of the else statement.
        # Please make sure that if and else are at the same indentation level.
        # Python uses indentation and whitespace to define blocks because it
        # doesn't use curly brackets like other languages such as JavaScript.
        if item in word_dict:
            word_dict[item] += 1
        else:
            word_dict[item] = 1
    return word_dict  # print here too
Please let me know if you have any questions. Cheers!
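For comparison, the same count can be written more compactly with dict.get(), which returns a default (here 0) when the key is missing. This is a sketch under a few assumptions: it takes a string instead of opening superman.txt so it runs standalone, and it builds the translation table from string.punctuation and string.digits rather than the hand-written table, so curly quotes and the em dash character are not stripped. The name count_words_in_text is hypothetical.

```python
import string


def count_words_in_text(text):
    # Map every ASCII punctuation character and digit to a space
    unwanted = string.punctuation + string.digits
    table = str.maketrans(unwanted, ' ' * len(unwanted))
    word_dict = {}
    for word in text.translate(table).lower().split():
        # get() returns 0 the first time a word is seen
        word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict


print(count_words_in_text("Up, up, and away! It's Superman."))
# {'up': 2, 'and': 1, 'away': 1, 'it': 1, 's': 1, 'superman': 1}
```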
Related
I'm trying to create a simple program that opens a file, splits it into single word lines (for ease of use) and creates a dictionary with the words, the key being the word and the value being the number of times the word is repeated. This is what I have so far:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] =+1
infile.close()
word_dictionary
The line word_dictionary prints nothing, meaning that the lines are not being put into a dictionary. Any help?
The paragraph.txt file contains this:
This is a sample text file to be used for a program. It should have nothing important in here or be used for anything else because it is useless. Use at your own will, or don't because there's no point in using it.
I want the dictionary to do something like this, but I don't care too much about the formatting.
Two things. First, the shorthand for
num = num + 1
is
num += 1
not
num =+ 1
(Python reads num =+ 1 as num = +1, so it simply assigns 1 every time.)
Corrected code:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] += 1
infile.close()
print(word_dictionary)
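Second, you need to print word_dictionary. A bare expression on its own line only displays its value in the interactive interpreter, not when you run the file as a script.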
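To see the =+ versus += difference in isolation, here is a small sketch that uses a hard-coded sentence instead of paragraph.txt (an assumption so it runs without the file). With =+ every repeated word is reset to 1; with += it is incremented.

```python
buggy = {}
for word in "to be or not to be".split():
    if word not in buggy:
        buggy[word] = 1
    else:
        buggy[word] =+ 1   # same as buggy[word] = +1, so the count never exceeds 1
print(buggy)  # {'to': 1, 'be': 1, 'or': 1, 'not': 1}

fixed = {}
for word in "to be or not to be".split():
    if word not in fixed:
        fixed[word] = 1
    else:
        fixed[word] += 1   # augmented assignment: adds 1 to the current value
print(fixed)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```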
I am trying to model a bag of words. I am having some trouble incrementing the counter inside my dictionary when the word is found in my data (type series):
def build_voc(self, data):
    for document in data:
        for word in document.split(' '):
            if word in self.voc:
                self.voc_ctr[word] = self.voc_ctr[word] + 1
            else:
                self.voc.append(word)
                self.voc_ctr = 1
I tried indexing it as well this way just to test where the error was:
self.voc_ctr[word][0] = self.voc_ctr[word][0] + 1
But it still gives me the same error at that line:
TypeError: 'int' object is not subscriptable
Note that this is a method of the same class, in which self.voc and self.voc_ctr are defined:
class BV:
    def __init__(self):
        self.voc = []
        self.voc_ctr = {}

    def build_voc(self, data):
        for document in data:
            for word in document.split(' '):
                if word in self.voc:
                    self.voc_ctr[word] = self.voc_ctr[word] + 1
                else:
                    self.voc.append(word)
                    self.voc_ctr = 1
The error seems to say self.voc_ctr is an int object, but I defined it as a dictionary, so I don't know where I went wrong.
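Your code doesn't reach your "if" branch first. For the first word it goes into your "else", which rebinds self.voc_ctr to the integer 1. The next time a word repeats, self.voc_ctr[word] tries to index that int, which raises the TypeError.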
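The rebinding can be reproduced in a few lines. This minimal sketch uses a plain variable named voc_ctr instead of the class attribute; the behaviour is the same.

```python
voc_ctr = {}   # starts out as a dictionary, as in __init__
voc_ctr = 1    # what the else branch does: rebinds the name to an int
try:
    voc_ctr["word"] + 1
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)  # 'int' object is not subscriptable
```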
It look like you have more going on than just a counter not working. In this part of code:
if word in self.voc:
    self.voc_ctr[word] = self.voc_ctr[word] + 1
...you're saying "If the word is in my list, read its current count from the dictionary and add 1." That only works if the else branch created the dictionary entry when the word was first seen. If you fix the initial 'int' error just by deleting self.voc_ctr = 1, the entry is never created, so self.voc_ctr[word] raises a KeyError the first time a word repeats.
To implement a counter for each word, try doing this:
if word in self.voc:
    self.voc_ctr[word] += 1
else:
    self.voc.append(word)
    self.voc_ctr[word] = 1
I don't know what else you have to do with this program, but this will solve your counter issue.
def build_voc(self, data):
    for document in data:
        for word in document.split(' '):
            if word in self.voc:
                self.voc_ctr[word] = self.voc_ctr[word] + 1
            else:
                self.voc.append(word)
                self.voc_ctr = 1  ## <-------- The function fails here
The way you are doing it is not the best way: you don't need a list to check for the word first and then add it to a dictionary.
The dictionary itself is the best way to check whether the word exists.
Try this modified version:
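voc_ctr = {}
def build_voc(data):
    for document in data:
        for word in document.split(' '):
            if word in voc_ctr:      # check the dictionary itself; no separate list needed
                voc_ctr[word] += 1
            else:
                voc_ctr[word] = 1    # create the entry instead of rebinding voc_ctr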
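Putting the fixes together, here is a sketch of the BV class that keeps the original structure (a list for first-seen order and a dictionary for counts). It uses a plain list of strings as sample data instead of a Series (an assumption for a self-contained run); iterating a pandas Series also yields its values, so the loop should behave the same way.

```python
class BV:
    def __init__(self):
        self.voc = []       # words in first-seen order
        self.voc_ctr = {}   # word -> number of occurrences

    def build_voc(self, data):
        for document in data:
            for word in document.split(' '):
                if word in self.voc_ctr:      # dict lookup is O(1); list lookup is O(n)
                    self.voc_ctr[word] += 1
                else:
                    self.voc.append(word)
                    self.voc_ctr[word] = 1    # add a key; don't replace the dict


bv = BV()
bv.build_voc(["the cat sat", "the dog sat"])
print(bv.voc)      # ['the', 'cat', 'sat', 'dog']
print(bv.voc_ctr)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```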
I'd like to know how to achieve the same result as the code below without using any collections, or for someone to explain what goes on inside the Counter collection (in code or in a way that isn't confusing), since I can't seem to find it anywhere. This code is meant to read a text file called juliet.txt. I am trying to make it count the number of letters and spaces in the document and then print the result.
Code:
from collections import Counter
text = open('juliet.txt', 'r').read()
letters = 0
counter = Counter(text)
spacesAndNewlines = counter[' '] + counter['\n']
while letters < len(text):
    print (text[letters])
    letters += 1
while letters == len(text):
    print (letters)
    letters += 1
print (spacesAndNewlines)
Sounds like a homework question to me, in which case you won't get any benefit from me answering you.
letters = {}
with open('juliet.txt') as fh:
    data = fh.read()
for char in data:
    if char in letters:
        letters[char] += 1   # seen before: increment
    else:
        letters[char] = 1    # first occurrence: start at 1
print(letters)
This uses a standard dictionary. Normally I would use a defaultdict, but for some weird reason you don't like collections. With a defaultdict you wouldn't need the laborious test to see whether the char is already in the dictionary.
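To answer the other half of the question: conceptually, Counter(text) performs the same tally as a plain-dictionary loop. Counter is a dict subclass with extras (missing keys read as 0, most_common(), arithmetic between counters), so the sketch below is a simplification, not its actual source. The defaultdict variant shows the shortcut mentioned above, and the assert checks that all three agree on a small made-up string.

```python
from collections import Counter, defaultdict


def count_chars(text):
    # Plain-dict version of the tally Counter(text) produces
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    return counts


def count_chars_default(text):
    counts = defaultdict(int)  # a missing key starts at int(), which is 0
    for char in text:
        counts[char] += 1
    return dict(counts)


text = "O Romeo,\nRomeo"
assert count_chars(text) == count_chars_default(text) == dict(Counter(text))

counts = count_chars(text)
spaces_and_newlines = counts.get(' ', 0) + counts.get('\n', 0)
print(spaces_and_newlines)  # 2
```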
Struggling with this exercise which must use a dictionary and count the number of times each word appears in a number of user inputs. It works in a fashion, but does not atomise each word from each line of user input. So instead of counting an input of 'happy days' as 1 x happy and 1 x days, it gives me 1 x happy days. I have tried split() along with the lower() but this converts the input to a list and I am struggling with then pouring that list into a dictionary.
As you may have guessed, I'm a bit of a novice, so all help would be greatly appreciated!
occurrences = {}
while True:
    word = input('Enter line: ')
    word = word.lower() #this is also where I have tried a split()
    if word == '':
        break
    occurrences[word] = occurrences.get(word, 0) + 1
for word in (occurrences):
    print(word, occurrences[word])
EDIT
Cheers for the responses. This ended up being the final solution. They weren't worried about case and wanted the final results sorted.
occurrences = {}
while True:
    words = input('Enter line: ')
    if words == '':
        break
    for word in words.split():
        occurrences[word] = occurrences.get(word, 0) + 1
for word in sorted(occurrences):
    print(word, occurrences[word])
What you have is almost there; you just need to loop over the words when adding them to the dict:
occurrences = {}
while True:
    words = input('Enter line: ')
    words = words.lower()
    if words == '':
        break
    for word in words.split():
        occurrences[word] = occurrences.get(word, 0) + 1
for word in (occurrences):
    print(word, occurrences[word])
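Because input() makes this hard to try non-interactively, here is the same logic wrapped in a function that takes a list of lines (the name count_words and the sample lines are hypothetical). An empty string still ends the input, and the results are printed sorted as in the final solution.

```python
def count_words(lines):
    occurrences = {}
    for line in lines:
        if line == '':
            break  # an empty line ends input, as in the loop above
        for word in line.lower().split():
            occurrences[word] = occurrences.get(word, 0) + 1
    return occurrences


counts = count_words(["Happy days", "happy happy", ""])
for word in sorted(counts):
    print(word, counts[word])
# days 1
# happy 3
```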
This line does not get executed: occurrences[word]=occurrences.get(word,0)+1
If execution enters the if, it hits the break first and never reaches that line. To move the line outside the if, don't indent it under the if.
In general, the indentation of the posted code is messed up; I guess it isn't really like that in your actual code.
Do you want line-by-line stats or overall stats? I'm guessing line by line, but you can also get overall stats easily by uncommenting a few lines (and commenting out the per-line dict creation) in the following code:
# occurrences = dict()  # create the dict here (and comment out the one in the loop) for cumulative overall stats
while True:
    words = input('Enter line: ')
    if words == '':
        break
    word_list = words.lower().split()
    print(word_list)
    occurrences = dict()  # create the dict here if you want line-by-line stats
    for word in word_list:
        occurrences[word] = occurrences.get(word, 0) + 1
    ## use the lines below if you want line-by-line stats
    for k, v in occurrences.items():
        print(k, " X ", v)
## use the lines below if you want overall stats
# for k, v in occurrences.items():
#     print(k, " X ", v)
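If you want both kinds of stats at once rather than toggling comments, one option is to keep a fresh dictionary per line alongside a running total. A sketch, assuming the lines are passed in as a list instead of read with input(); the function name is hypothetical.

```python
def line_and_overall_stats(lines):
    per_line = []   # one dict of counts per input line
    overall = {}    # running totals across all lines
    for line in lines:
        line_counts = {}
        for word in line.lower().split():
            line_counts[word] = line_counts.get(word, 0) + 1
            overall[word] = overall.get(word, 0) + 1
        per_line.append(line_counts)
    return per_line, overall


per_line, overall = line_and_overall_stats(["Happy days", "happy happy"])
print(per_line)  # [{'happy': 1, 'days': 1}, {'happy': 2}]
print(overall)   # {'happy': 3, 'days': 1}
```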
Here is my code:
def detLoser(frag, a):
    word = frag + a
    if word in wordlist:
        lost = True
    else:
        for words in wordlist:
            if words[:len(word) == word:
                return #I want this to break out.
            else:
                lost = True
Where I have the return, I've tried putting both return and break, and both give me the same error: SyntaxError: invalid syntax. Any ideas? What is the best way to handle this?
You've omitted the ] from the list slice. But what is the code trying to achieve, anyway?
foo[ : len( foo ) ] == foo
always!
I assume this isn't the complete code; otherwise, where is wordlist defined? (Is it a list? Testing containment is much faster with a set.)
def detLoser(frag, a):
    word = frag + a
    if word in wordlist:
        lost = True
    else:
        for words in wordlist:
            if words.startswith(word):  # same test as words[:len(word)] == word
                return #I want this to break out.
            else:
                lost = True
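With the missing ] restored, the question's slice test words[:len(word)] == word is the same check as words.startswith(word): the candidate from wordlist is the receiver and the fragment is the argument. A quick check with made-up sample words:

```python
word = "ca"
results = {}
for words in ["cat", "car", "dog", "c"]:
    sliced = words[:len(word)] == word        # the question's test, with ']' restored
    assert sliced == words.startswith(word)   # identical check via startswith
    results[words] = sliced
print(results)  # {'cat': True, 'car': True, 'dog': False, 'c': False}

# Swapping receiver and argument asks a different question:
print("ca".startswith("c"))  # True, even though "c" does not start with "ca"
```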
You can probably rewrite the for loop using any() or all(), e.g. (you should use a set instead of a list for wordlist, though):
def detLoser(frag, a):
    word = frag + a
    return word in wordlist or any(w.startswith(word) for w in wordlist)
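A self-contained version of the any() rewrite, for trying it out. Here wordlist is passed in as a parameter (an assumption; the original relies on a wordlist defined elsewhere) and stored as a set, and the snake_case name det_loser is just for illustration. Since every word starts with itself, the word in wordlist test is technically redundant, but with a set it is a cheap O(1) check that can short-circuit the scan.

```python
def det_loser(frag, a, wordlist):
    """Return True if frag + a is a complete word or the start of one."""
    word = frag + a
    # any() stops at the first match, like the early return in the loop version
    return word in wordlist or any(w.startswith(word) for w in wordlist)


words = {"cat", "cattle", "dog"}
print(det_loser("ca", "t", words))  # True: 'cat' is a word
print(det_loser("ca", "r", words))  # False: no word starts with 'car'
print(det_loser("do", "", words))   # True: 'dog' starts with 'do'
```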