Complete beginner here. I searched a lot of threads but couldn't find a solution that fits my case.
I have a text file, python_example.txt, which contains some words. On line four, the word hello appears twice in a row, like "hello hello".
My code is supposed to find the word the user inputs and count how many times it appears. It works, but as I said, not when the same word appears multiple times on the same line. There are 2 hellos on line 4 and one on line 13, but it only finds a total of 2 hellos. Any fixes? Thanks.
user_input = input("Type in the word you are searching for: ")
word_count = 0
line_count = 0
with open("python_example.txt", "r+") as f:
    for line in f:
        line_count += 1
        if user_input in line:
            word_count += 1
            print("found " + user_input + " on line " + str(line_count))
        else:
            print("nothing on line " + str(line_count))
print("\nfound a total of " + str(word_count) + " words containing " + "'" + user_input + "'")
You can use str.count:
word_count += line.count(user_input)
instead of:
word_count += 1
It counts every occurrence of user_input in the line, not just whether the line contains it.
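To see the difference on a single line, here is a small sketch. The sample string is made up for illustration and is not taken from the asker's file.

```python
# Hypothetical sample line, standing in for line four of the file.
line = "hello hello world\n"
word = "hello"

# `in` only answers yes/no, so adding 1 per matching line undercounts.
contains = word in line

# str.count returns the number of non-overlapping occurrences.
occurrences = line.count(word)

print(contains, occurrences)  # True 2
```

Note that str.count matches substrings, so searching for "hell" would also count the two hellos here.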
The issue is with these two lines:
if user_input in line:
    word_count += 1
You increase the count by 1 if the input appears on the line, regardless of whether it appears more than once.
This should do the job:
user_input = input("Type in the word you are searching for: ")
word_count = 0
with open("python_example.txt") as f:
    for line_num, line in enumerate(f, start=1):
        # number of times the input occurs on this line
        line_inp_count = line.count(user_input)
        if line_inp_count:
            word_count += line_inp_count
            print(f"input {user_input} appears {line_inp_count} time(s) on line {line_num}")
        else:
            print(f"nothing on line {line_num}")
print(f"the input appeared a total of {word_count} times in {line_num} lines.")
Let me know if you have any questions :)
One option is to use a library to parse the words in your text file rather than iterating one line at a time. There are several classes in nltk.tokenize which are easy to use.
import nltk.tokenize.regexp

def count_word_in_file(filepath, word):
    """Give the number of times word appears in text at filepath."""
    tokenizer = nltk.tokenize.regexp.WordPunctTokenizer()
    with open(filepath) as f:
        # split the whole file into word and punctuation tokens
        tokens = tokenizer.tokenize(f.read())
    return tokens.count(word)
This handles awkward cases like the substring 'hell' appearing in 'hello', as mentioned in a comment, and is also a route towards case-insensitive matching, stemming, and other refinements.
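If installing nltk is not an option, the whole-word idea can also be sketched with the standard library's re module. This is a different technique from the tokenizer above, not a drop-in equivalent; the function name count_whole_word and the sample text are made up for illustration.

```python
import re

def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` in `text` as a whole word only."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b marks a word boundary; re.escape keeps the user's input literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))

sample = "Hello hello, what the hell"
print(count_whole_word(sample, "hell"))         # 1
print(count_whole_word(sample, "hello"))        # 1
print(count_whole_word(sample, "hello", True))  # 2
```

The \b anchors assume the search word starts and ends with a letter, digit, or underscore.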
This is my attempt at finding a word the user inputs and counting how many lines contain it, printing "not found" if no lines contain the word. However, when I input a word that I know exists in the file, it returns 0, and it doesn't even print "not found" like I want it to. Here is my code:
response = input('Please enter words: ')
letters = response.split()
count = 0
with open("alice.txt", "r", encoding="utf-8") as program:
    for line in program:
        if letters in line:
            count += 1
            if(count < 1):
                print("not found")
What you're doing isn't going to work: the split function returns a list of strings, and you're checking that list against a single string.
Is this what you wanted to do?
response = input("Please enter a word: ")
count = 0
with open("alice.txt", 'r') as program:
    for line in program:
        if response in line:
            count += 1
if count == 0:
    print("not found")
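The list-versus-string mismatch described in this answer can be reproduced in a few lines. The input and the sample line below are made up; they stand in for the user's response and one line of alice.txt.

```python
response = "white rabbit"   # stand-in for the user's input
letters = response.split()  # ['white', 'rabbit'], a list
line = "The white rabbit ran past.\n"

error = None
try:
    letters in line         # a list on the left of `in`, a str on the right
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Checking each word separately works as intended.
matches = [word for word in letters if word in line]
print(matches)  # ['white', 'rabbit']
```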
You don't need the split function, and the if condition is in the wrong place in your code. Please refer to the code below.
response = input('Please enter word: ')
count = 0
with open("alice.txt", "r", encoding="utf-8") as program:
    for line in program:
        if response in line:
            count += 1
if count == 0:
    print('Not found')
Your code compared the whole list of input words against each line, instead of checking each word on its own.
Adding ".readlines()" reads the file into a list of individual lines (looping over the file object directly also gives you one line at a time, so this part is optional).
I also loop over the individual lines as 'line', and then search for each input word in that 'line' variable.
response = input('Please enter words: ')
letters = response.split()
count = 0
foo = open(
    "alice.txt", "r",
    encoding="utf-8").readlines()
for line in foo:
    # check every input word against the current line
    for word in letters:
        if word in line:
            count += 1
if(count < 1):
    print("not found")
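As a variation on the code above, here is a sketch that counts each line at most once, even when several of the input words appear on it. It uses io.StringIO as a stand-in for the opened file so it runs without alice.txt; the text and word list are made up.

```python
import io

# io.StringIO stands in for open("alice.txt", ...); the text is made up.
fake_file = io.StringIO("Alice was tired\nThe rabbit ran\nAlice followed\n")
letters = ["Alice", "rabbit"]

# Iterating a file object already yields one line at a time.
# any() counts a line once even if several of the words appear in it.
count = sum(1 for line in fake_file if any(word in line for word in letters))

print(count if count else "not found")  # 3
```

Whether a line with two matching words should count once or twice depends on what the assignment asks for; the nested loops above count it twice.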
I'm having problems figuring out how to get the total count of email addresses. The code I have written only comes up with the non-duplicate addresses, whereas the assignment asks for the total number including the duplicates.
I've tried a for loop, and also just setting count to the result of len(), and got the same result. I reread the materials and I am completely stumped as to how to include the duplicate entries.
fname = input("Enter file name: ")
if len(fname) == 0:
    fname = "mbox-short.txt"
fh = open(fname)
for line in fh:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    count = len(words[1])
print("There were", count, "lines in the file with From as the first word")
Expected result: There were 27 lines in the file with From as the first word
Actual Result: There were 14 lines in the file with From as the first word
Increment a counter variable in the loop that's reading from the file.
count = 0
for line in fh:
    line = line.rstrip()
    if line.startswith('From '):
        words = line.split()
        count += 1
print("There were", count, "lines in the file with From as the first word")
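To make the duplicate-versus-unique distinction concrete, here is a small sketch on made-up mbox-style data. The addresses are hypothetical and io.StringIO stands in for the opened file.

```python
import io

# Made-up mbox-style sample; the addresses are hypothetical.
fh = io.StringIO(
    "From a@example.com Sat Jan  5 09:14:16 2008\n"
    "From: a@example.com\n"
    "From b@example.com Sat Jan  5 10:00:00 2008\n"
    "From a@example.com Sat Jan  5 11:00:00 2008\n"
)

# The trailing space in 'From ' skips the 'From:' header lines.
senders = [line.split()[1] for line in fh if line.startswith("From ")]

total = len(senders)        # every 'From ' line, duplicates included
unique = len(set(senders))  # distinct addresses only
print(total, unique)  # 3 2
```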
I have to write a program that asks for a specific filename on the computer, counts the number of characters and words in the file, and finally counts how many times a given word (from user input) occurs.
You finish iterating over the file before you attempt to count the occurrences of the specific word. Reorganizing your code so that all of the counting happens inside the file iteration should fix it.
numLines = 0
numWords = 0
numChars = 0
count = 0
filename = input("Which file would you like to work with?: ")
freq_word = input("Which word would you like to find the frequency for?: ")
with open(filename, 'r') as fin:
    for line in fin:
        words = line.split()
        for word in words:
            if word == freq_word:
                count += 1
        numWords += len(words)
        numChars += len(line)
print(filename, "contains: ", numChars, "characters and total amount of words is: ", numWords)
print(freq_word, "occurs ", count, "number of time")
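The same single-pass idea can be sketched with collections.Counter, which tallies every word so the frequency of any word can be looked up afterwards. This is an alternative to the explicit inner loop, not the answer's own code; io.StringIO and the sample text stand in for the user's file.

```python
import io
from collections import Counter

fin = io.StringIO("the cat sat\non the mat\n")  # stand-in for the opened file

word_freq = Counter()
num_chars = 0
for line in fin:
    num_chars += len(line)          # includes the newline characters
    word_freq.update(line.split())  # tally every whitespace-separated word

num_words = sum(word_freq.values())
print(num_chars, num_words, word_freq["the"])  # 23 6 2
```

A Counter returns 0 for words it has never seen, so looking up a missing word does not raise an error.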
I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount = {}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
filename = input("enter file name: ")
wordcount = {}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            # first blank line after some text closes a paragraph
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word, 0) + 1
print(wordcount)
print(paragraphcount)
When you read a file, a cursor tracks which byte you are currently reading. Your code tries to read the file twice and runs into strange behavior, which should have been a hint that something is wrong. On to the solution.
What is the correct way?
Read the file once, store every line, then compute the word count and paragraph count from that stored copy, rather than trying to read the file twice.
What is happening in the current code?
After the first read, the byte cursor sits at the end of the file. When you then try to read lines, it returns an empty list because it is reading from the end of the file. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read lines. But instead of this, you should focus on implementing the method I described in the first section.
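The cursor behavior and the file pointer reset described above can be reproduced without a real file. In this sketch io.StringIO stands in for open(filename), and the text is made up.

```python
import io

# Stand-in for open(filename); the content is made up.
inf = io.StringIO("first paragraph\n\nsecond paragraph\n")

words = inf.read().split()  # the first pass leaves the cursor at the end
leftover = inf.readlines()  # nothing left to read, so this is []

inf.seek(0)                 # move the cursor back to the start
lines = inf.readlines()     # now all three lines are returned

print(len(words), leftover, len(lines))  # 4 [] 3
```

Once lines is stored, both the word count and the paragraph count can be computed from it, which is the read-once approach recommended above.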
import string

def showCounts(fileName):
    lineCount = 0
    wordCount = 0
    numCount = 0
    comCount = 0
    dotCount = 0
    with open(fileName, 'r') as f:
        for line in f:
            for char in line:
                if char.isdigit() == True:
                    numCount += 1
                elif char == '.':
                    dotCount += 1
                elif char == ',':
                    comCount += 1
                #i know formatting below looks off but it's right
                words = line.split()
                lineCount += 1
                wordCount += len(words)
            for word in words:
                # text = word.translate(string.punctuation)
                exclude = set(string.punctuation)
                text = ""
                text = ''.join(ch for ch in text if ch not in exclude)
                try:
                    if int(text) >= 0 or int(text) < 0:
                        numCount += 1
                except ValueError:
                    pass
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
print("Number count: " + str(numCount))
print("Comma count: " + str(comCount))
print("Dot count: " + str(dotCount) + "\n")
I have it read a .txt file containing words, lines, dots, commas, and numbers. It gives me the correct number of dots, commas, and numbers, but the word and line counts are each much, much higher than they actually are. Anyone know why? Thanks guys.
I don't know if this is actually the answer, but my reputation isn't high enough to comment, so I'm putting it here. You obviously don't need to accept it as the final answer if it doesn't solve the issue.
So, I think it might have something to do with the fact that all of your print statements are actually outside of the showCounts() function. Try indenting the print statements.
I hope this helps.
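Another possible cause, offered only as an assumption since the question's indentation is hard to read: if the line and word increments sit inside the per-character loop, they run once per character instead of once per line. The sketch below shows that effect on made-up data; the variable names inner and outer are hypothetical.

```python
lines = ["one, two.\n", "three\n"]  # made-up sample lines

inner = 0  # incremented inside the per-character loop
outer = 0  # incremented once per line

for line in lines:
    for char in line:
        inner += 1  # runs for every character, so it grows far too fast
    outer += 1      # runs once per line, which is the intended count

print(inner, outer)  # 16 2
```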