Counting occurrences of word in a text file - python

I have to write a program that asks for a filename, counts the number of characters and words in that file, and finally counts how many times a user-supplied word occurs in it.

You finish iterating over the file before you attempt to count the occurrences of the specific word. Reorganizing your code so that all of the counting happens inside the file iteration should fix it.
    numLines = 0
    numWords = 0
    numChars = 0
    count = 0
    filename = input("Which file would you like to work with?: ")
    freq_word = input("Which word would you like to find the frequency for?: ")
    with open(filename, 'r') as fin:
        for line in fin:
            words = line.split()
            for word in words:
                if word == freq_word:
                    count +=1
            numWords += len(words)
            numChars += len(line)
    print(filename, "contains: ", numChars, "characters and total amount of words is: ", numWords)
    print(freq_word, "occurs ", count, "number of time")
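The same single-pass idea can be wrapped in a function so it is easy to try without typing a filename. This is only a minimal sketch, assuming whitespace-separated words and exact, case-sensitive matches; the name file_stats is made up for illustration and is not part of the answer above.

```python
import io

def file_stats(lines, target):
    """Return (num_chars, num_words, target_count) for an iterable of lines.

    `file_stats` is a hypothetical helper name. `lines` can be an open
    file object or any other iterable of strings.
    """
    num_chars = 0
    num_words = 0
    target_count = 0
    for line in lines:
        words = line.split()                 # split on any whitespace
        num_chars += len(line)               # includes the trailing newline
        num_words += len(words)
        target_count += words.count(target)  # exact, case-sensitive matches
    return num_chars, num_words, target_count

# Example with an in-memory "file" instead of a real path
sample = io.StringIO("the cat sat\non the mat\n")
print(file_stats(sample, "the"))  # prints (23, 6, 2)
```

With a real file, the call would look like file_stats(fin, freq_word) inside the with open(filename) block.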

Related

why is my code returning 0 even though word exists in file

I'm trying to take a word from the user and count how many lines of the file contain it, printing "not found" if no line does. However, when I enter a word that I know exists in the file, it returns 0, and it doesn't even print "not found" like I want it to. Here is my code:
    response = input('Please enter words: ')
    letters = response.split()
    count = 0
    with open("alice.txt", "r", encoding="utf-8") as program:
        for line in program:
            if letters in line:
                count += 1
                if(count < 1):
                    print("not found")
    print(count)
What you're doing isn't going to work: the split function returns a list of strings, and you're checking that list against a single string.
Is this what you wanted to do?
    response = input("Please enter a word: ")
    count = 0
    with open("alice.txt", 'r') as program:
        for line in program:
            if response in line:
                count += 1
    if count == 0:
        print("not found")
    print(count)
You don't need the split function, and the if condition is in the wrong place in your code. Please refer to the code below.
    response = input('Please enter word: ')
    count = 0
    with open("alice.txt", "r", encoding="utf-8") as program:
        for line in program:
            if response in line:
                count += 1
    if count == 0:
        print('Not found')
    else:
        print(count)
Iterating over the file object already gives you one line at a time, so ".readlines()" is optional here; it just reads all the lines into a list up front.
The actual fix is the inner loop: I go through each input word and search for it in the 'line' variable, instead of testing the whole list of words against the line at once.
    response = input('Please enter words: ')
    letters = response.split()
    count = 0
    foo = open(
        "alice.txt", "r",
        encoding="utf-8").readlines()
    for line in foo:
        for word in letters:
            if word in line:
                count += 1
    if(count < 1):
        print("not found")
    else:
        print(count)
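For comparison, the "count the lines that contain the word" logic from the answers above can be sketched as a small function. This assumes a plain substring test (so "cat" also matches "category") and case-sensitive matching; count_matching_lines and the sample lines are made up for illustration.

```python
def count_matching_lines(lines, word):
    """Count how many lines contain `word` as a substring (case-sensitive)."""
    return sum(1 for line in lines if word in line)

# Made-up sample lines standing in for the contents of alice.txt
lines = [
    "Alice was beginning\n",
    "to get very tired\n",
    "Alice sighed\n",
]

count = count_matching_lines(lines, "Alice")
# Decide between the count and "not found" once, after all lines are checked
print(count if count else "not found")  # prints 2
```

Because the function accepts any iterable of strings, an open file object can be passed in directly in place of the list.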

Counting several instances of the same word from a text file

Complete beginner here; I searched a lot of threads but couldn't find a solution that fits.
I have a text file, python_example.txt, which contains some words. On line four, the word hello appears twice in a row, like "hello hello".
My code is supposed to find the word the user inputs and count how many times it appears. It works, but as I said, not when the same word appears multiple times on the same line. There are 2 hellos on line 4 and one on line 13, but it only finds a total of 2 hellos. Fixes? Thanks,
    user_input = input("Type in the word you are searching for: ")
    word_count = 0
    line_count = 0
    with open ("python_example.txt", "r+") as f:
        for line in f:
            line_count += 1
            if user_input in line:
                word_count += 1
                print("found " + user_input + " on line " + str(line_count))
            else:
                print ("nothing on line " + str(line_count))
    print ("\nfound a total of " + str(word_count) + " words containing " + "'" + user_input + "'")
You can use str.count:
    word_count += line.count(user_input)
instead of:
    word_count += 1
It counts every occurrence of user_input in the current line of the file.
The issue is with these two lines:
    if user_input in line:
        word_count += 1
You increase the count by 1 if the input appears on the line, regardless of whether it appears more than once.
This should do the job:
    user_input = input("Type in the word you are searching for: ")
    word_count = 0
    with open("python_example.txt") as f:
        for line_num, line in enumerate(f, start=1):
            line_inp_count = line.count(user_input)
            if line_inp_count:
                word_count += line_inp_count
                print(f"input {user_input} appears {line_inp_count} time(s) on line {line_num}")
            else:
                print(f"nothing on line {line_num}")
    print(f"the input appeared a total of {word_count} times in {line_num} lines.")
Let me know if you have any questions :)
One option is to use a library to parse the words in your text file rather than iterating one line at a time. There are several classes in nltk.tokenize which are easy to use.
    import nltk.tokenize.regexp
    def count_word_in_file(filepath, word):
        """Give the number of times word appears in text at filepath."""
        tokenizer = nltk.tokenize.regexp.WordPunctTokenizer()
        with open(filepath) as f:
            tokens = tokenizer.tokenize(f.read())
        return tokens.count(word)
This handles awkward cases like the substring 'hell' appearing in 'hello', as mentioned in a comment, and is also a route towards case-insensitive matching, stemming, and other refinements.
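If installing nltk is not an option, a similar whole-word count can be approximated with the standard library's re module. This is a different technique from the tokenizer above, not a drop-in equivalent: it is a rough sketch that assumes the search word is made of ordinary word characters, so that the \b word boundaries behave as expected. The function name count_whole_word and the sample text are made up for illustration.

```python
import re

def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` in `text`, matching whole words only."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '.' in the word from acting as regex syntax
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))

text = "hello hello\nshell\nHello there\n"

print(text.count("hell"))                                  # 3: plain substring matches
print(count_whole_word(text, "hell"))                      # 0: never a word on its own
print(count_whole_word(text, "hello"))                     # 2: both on the first line
print(count_whole_word(text, "hello", ignore_case=True))   # 3: also counts "Hello"
```

The first print shows why str.count alone can overcount: it treats 'hell' inside 'hello' and 'shell' as matches.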

How to fix the counting of strings to include the duplicate entries

I'm having problems figuring out how to get the total count of email addresses there are. The code I have written only comes up with the non-duplicate addresses, where the assignment is asking for the total number including the duplicates.
I've tried the for loop, and just setting count to the len() function and got the same result. I reread the materials and I am completely stumped as to how to include the duplicate entries.
    fname = input("Enter file name: ")
    if len(fname) == 0:
        fname = "mbox-short.txt"
    fh = open(fname)
    for line in fh:
        line = line.rstrip()
        if not line.startswith('From '):
            continue
        words = line.split()
        print(words[1])
        count = len(words[1])
    print("There were", count, "lines in the file with From as the first word")
Expected result: There were 27 lines in the file with From as the first word
Actual Result: There were 14 lines in the file with From as the first word
Increment a counter variable in the loop that's reading from the file.
    count = 0
    for line in fh:
        line = line.rstrip()
        if line.startswith('From '):
            words = line.split()
            print(words[1])
            count += 1
    print("There were", count, "lines in the file with From as the first word")
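To see the difference between the total and the number of distinct addresses, the senders can be collected in a list: its length includes duplicates, while a set of the same items does not. The sketch below is only an illustration; from_senders is a made-up name, and the sample lines are invented in the style of an mbox file rather than taken from mbox-short.txt.

```python
def from_senders(lines):
    """Return the sender of every line that starts with 'From ', duplicates included."""
    senders = []
    for line in lines:
        if line.startswith("From "):
            words = line.split()
            if len(words) > 1:       # skip a malformed 'From ' line with no address
                senders.append(words[1])
    return senders

# Invented sample data; note that the 'From:' header line does not match 'From '
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "From bob@example.com Sat Jan  5 10:00:00 2008\n",
    "From alice@example.com Sat Jan  5 11:00:00 2008\n",
]

senders = from_senders(sample)
print("There were", len(senders), "lines in the file with From as the first word")
print("Distinct senders:", len(set(senders)))
```

Here the list has three entries and the set has two, which is the kind of gap the question describes.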

Counting Paragraph and Most Frequent Words in Python Text File

I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
    filename = input("enter file name: ")
    inf = open(filename, 'r')
    #frequent words
    wordcount={}
    for word in inf.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    for key in wordcount.keys():
        print ("%s %s " %(key , wordcount[key]))
    #Count Paragraph(s)
    linecount = 0
    for i in inf:
        paragraphcount = 0
        if '\n' in i:
            linecount += 1
        if len(i) < 2: paragraphcount *= 0
        elif len(i) > 2: paragraphcount = paragraphcount + 1
        print('%-4d %4d %s' % (paragraphcount, linecount, i))
    inf.close()
    filename = input("enter file name: ")
    wordcount={}
    paragraphcount = 0
    linecount = 0
    with open(filename, 'r') as ftext:
        for line in ftext.readlines():
            if line in ('\n', '\r\n'):
                # first blank line after text marks the end of a paragraph
                if linecount == 0:
                    paragraphcount = paragraphcount + 1
                linecount = linecount + 1
            else:
                linecount = 0
                #frequent words
                for word in line.split():
                    wordcount[word] = wordcount.get(word,0) + 1
    print(wordcount)
    print(paragraphcount)
When you read a file, a cursor tracks the position of the byte you are currently reading. Your code tries to read the file twice and gets strange behavior, which should have been a hint that something is wrong.
What is the correct way?
Read the file once, store every line, then compute the word count and the paragraph count from that same store, rather than trying to read the file twice.
What is happening in the current code?
After the first read, the cursor sits at the end of the file. When you then try to read the lines, you get nothing back, because the read starts at the end of the file. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. Still, you should focus on implementing the approach from the first section instead.
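The "read once, reuse the stored text" approach could look roughly like the sketch below. It assumes that a paragraph is a run of consecutive non-blank lines, which is one possible definition rather than the only one; the function name words_and_paragraphs and the sample text are made up for illustration.

```python
from collections import Counter

def words_and_paragraphs(text):
    """Return (word_counts, paragraph_count) computed from one read of the text."""
    word_counts = Counter(text.split())   # word -> number of occurrences

    paragraph_count = 0
    in_paragraph = False
    for line in text.splitlines():
        if line.strip():                  # non-blank line
            if not in_paragraph:          # first line of a new paragraph
                paragraph_count += 1
                in_paragraph = True
        else:                             # blank line ends the current paragraph
            in_paragraph = False
    return word_counts, paragraph_count

# Stand-in for text obtained from a single inf.read() call
sample = "one two two\nthree\n\n\nfour one\n"
counts, paragraphs = words_and_paragraphs(sample)
print(counts.most_common(2))   # the two most frequent words with their counts
print(paragraphs)              # prints 2
```

Because both results come from the same string, the file is read only once and there is no need for inf.seek(0).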

How to read a text file in Python

I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
My code is below.
All it does is create the new file 'Repeated_word.txt' however it doesn't write the number of times the word from the wordlist appears in the file.
    #obtain the name of the file
    filename = raw_input("What is the file being used?: ")
    fin = open(filename, "r")
    #create list of words to see if repeated
    word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
    def repeatedWords(fin, word_list):
        #open the file
        fin = open(filename, "r")
        #create output file
        fout = open("Repeated_word.txt", "w")
        #loop through each word of the file
        for line in fin:
            #split the lines into words
            words = line.split()
            for word in words:
                #check if word in words is equal to a word from word_list
                for i in range(len(word_list)):
                    if word == i:
                        #count number of times word is in word
                        count = words.count(word)
                        fout.write(word, count)
        fout.close
    repeatedWords(fin, word_list)
These lines,
    for i in range(len(word_list)):
        if word == i:
should be
    for i in range(len(word_list)):
        if word == word_list[i]:
or
    for i in word_list:
        if word == i:
word is a string, whereas i is an integer, the way you have it right now. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
    words = fin.read().split()
    for word in word_list:
        # write() takes a single string, so format the word and its count first
        fout.write("%s %d\n" % (word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word from fin with an integer from the range. [line 2]
Then you are writing the count to fout upon every match per line. [line 5] I guess you should keep the counts (e.g. in a dict) and write them all at the end of parsing the input file.
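The "keep the counts and write them at the end" suggestion might be sketched as follows. This is not the assignment's required solution, only one possible shape: the function name write_repeated_words, the "word count" line format, and the sample sentence are all assumptions, since the assignment text does not specify an output format.

```python
import os
import tempfile
from collections import Counter

def write_repeated_words(in_path, word_list, out_path="Repeated_word.txt"):
    """Count each word of word_list in the input file and write one line per word."""
    with open(in_path, "r") as fin:
        counts = Counter(fin.read().split())      # counts for every word in the file
    with open(out_path, "w") as fout:
        for word in word_list:
            # write() takes a single string, so format the word and count together
            fout.write("%s %d\n" % (word, counts[word]))
    return {word: counts[word] for word in word_list}

# Small demonstration using temporary files and an invented sample sentence
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "Repeated_word.txt")
    with open(src, "w") as f:
        f.write("Emma Woodhouse and her father\nEmma and Miss Taylor\n")
    result = write_repeated_words(src, ["Emma", "her", "she"], dst)
    with open(dst) as f:
        written = f.read()

print(written)
```

Using with blocks also avoids the fout.close line in the question, which never actually closes the file because the parentheses are missing.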
