How to fix the counting of strings to include the duplicate entries

I'm having trouble getting the total count of email addresses in the file. My code only comes up with the number of non-duplicate addresses, but the assignment asks for the total number, including duplicates.
I've tried a for loop, and I've tried setting count to the result of len(), and got the same result both ways. I reread the materials and am completely stumped as to how to include the duplicate entries.
fname = input("Enter file name: ")
if len(fname) == 0:
    fname = "mbox-short.txt"
fh = open(fname)
for line in fh:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    print(words[1])
    count = len(words[1])
print("There were", count, "lines in the file with From as the first word")
Expected result: There were 27 lines in the file with From as the first word
Actual Result: There were 14 lines in the file with From as the first word

len(words[1]) gives the number of characters in a single address, not the number of matching lines. Instead, increment a counter variable inside the loop that reads from the file.
count = 0
for line in fh:
    line = line.rstrip()
    if line.startswith('From '):
        words = line.split()
        print(words[1])
        count += 1
print("There were", count, "lines in the file with From as the first word")
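
The counter approach can be tried without the mbox file. This is only a minimal sketch: the function name count_from_lines and the sample lines are made up for illustration and are not from the assignment.

```python
def count_from_lines(lines):
    """Count lines that start with 'From ' (duplicate addresses included)."""
    count = 0
    for line in lines:
        # 'From:' header lines do not match because of the trailing space.
        if line.startswith("From "):
            count += 1
    return count


# Hypothetical sample data standing in for the contents of the mbox file.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "Subject: hello",
    "From bob@example.com Fri Jan  4 18:10:48 2008",
    "From alice@example.com Fri Jan  4 16:10:39 2008",
]
print("There were", count_from_lines(sample), "lines with From as the first word")
```

Because the counter goes up once per matching line, the repeated address is counted both times.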

Related

why is my code returning 0 even though word exists in file

I'm trying to take a word that the user inputs and count how many lines in a file contain it, printing "not found" if no line contains the word. However, when I enter a word that I know exists in the file, it returns 0, and it doesn't even print "not found" as I want it to. Here is my code:
response = input('Please enter words: ')
letters = response.split()
count = 0
with open("alice.txt", "r", encoding="utf-8") as program:
    for line in program:
        if letters in line:
            count += 1
            if(count < 1):
                print("not found")
print(count)

What you're doing isn't going to work: the split function returns a list of strings, and you're checking that list against a single string.
Is this what you wanted to do?
response = input("Please enter a word: ")
count = 0
with open("alice.txt", 'r') as program:
    for line in program:
        if response in line:
            count += 1
if count == 0:
    print("not found")
print(count)
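
To make the list-versus-string point concrete, here is a small sketch that runs on an in-memory list of lines instead of alice.txt. The helper name count_matching_lines and the sample lines are assumptions for illustration only.

```python
def count_matching_lines(lines, word):
    """Return how many lines contain `word` as a substring."""
    return sum(1 for line in lines if word in line)


# Hypothetical stand-in for the lines of the text file.
lines = [
    "Alice was beginning to get very tired\n",
    "of sitting by her sister on the bank\n",
    "Alice thought to herself\n",
]

# A single string on the left of `in` works as a substring test.
print(count_matching_lines(lines, "Alice"))

# A list on the left of `in` is not a valid substring test against a str.
try:
    ["Alice"] in lines[0]
except TypeError as exc:
    print("TypeError:", exc)
```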

You don't need the split function, and the if condition is in the wrong place in your code. Please refer to the code below.
response = input('Please enter word: ')
count = 0
with open("alice.txt", "r", encoding="utf-8") as program:
    for line in program:
        if response in line:
            count += 1
if count == 0:
    print('Not found')
else:
    print(count)

This version reads the file into a list of individual lines with ".readlines()" (iterating over the file object directly would also give you one line at a time).
I also set each individual line as 'line' and then search for each input word in that 'line' variable, so the list from split() is never tested against a string directly.
response = input('Please enter words: ')
letters = response.split()
count = 0
foo = open(
    "alice.txt", "r",
    encoding="utf-8").readlines()
for line in foo:
    for word in letters:
        if word in line:
            count += 1
if(count < 1):
    print("not found")
else:
    print(count)

Counting several instances of the same word from a text file

Complete beginner here. I searched a lot of threads but couldn't find a solution that fits my case.
I have a text file, python_example.txt, which contains some words. On line four, the word hello appears twice in a row, like "hello hello".
My code is supposed to find the word the user inputs and count how many times it appears. It works, but not when the same word appears multiple times on the same line. There are 2 hellos on line 4 and one on line 13, but it only finds a total of 2 hellos. Any fixes? Thanks,
user_input = input("Type in the word you are searching for: ")
word_count = 0
line_count = 0
with open ("python_example.txt", "r+") as f:
    for line in f:
        line_count += 1
        if user_input in line:
            word_count += 1
            print("found " + user_input + " on line " + str(line_count))
        else:
            print ("nothing on line " + str(line_count))
print ("\nfound a total of " + str(word_count) + " words containing " + "'" + user_input + "'")

You can use str.count:
word_count += line.count(user_input)
instead of:
word_count += 1
It counts every non-overlapping occurrence of user_input in the line.
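
A quick way to see the difference between the membership test and str.count, as a sketch on made-up lines (count_occurrences is a hypothetical helper name, not part of the question's code):

```python
def count_occurrences(lines, word):
    """Sum the non-overlapping occurrences of `word` over all lines."""
    return sum(line.count(word) for line in lines)


# Hypothetical file contents: two hellos on one line, one on another.
lines = ["hello hello\n", "nothing here\n", "hello again\n"]

print("hello" in lines[0])         # True: only says the word is present
print(lines[0].count("hello"))     # counts both occurrences on the line
print(count_occurrences(lines, "hello"))
```

Note that str.count matches substrings, so searching for "hell" would also count every "hello".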

The issue is with these two lines:
if user_input in line:
    word_count += 1
You increase the count by 1 if the input appears on the line, regardless of whether it appears more than once.
This should do the job:
user_input = input("Type in the word you are searching for: ")
word_count = 0
with open("python_example.txt") as f:
    for line_num, line in enumerate(f, start=1):
        line_inp_count = line.count(user_input)
        if line_inp_count:
            word_count += line_inp_count
            print(f"input {user_input} appears {line_inp_count} time(s) on line {line_num}")
        else:
            print(f"nothing on line {line_num}")
print(f"the input appeared a total of {word_count} times in {line_num} lines.")
Let me know if you have any questions :)

One option is to use a library to parse the words in your text file rather than iterating one line at a time. There are several classes in nltk.tokenize which are easy to use.
import nltk.tokenize.regexp
def count_word_in_file(filepath, word):
    """Give the number of times word appears in text at filepath."""
    tokenizer = nltk.tokenize.regexp.WordPunctTokenizer()
    with open(filepath) as f:
        tokens = tokenizer.tokenize(f.read())
    return tokens.count(word)
This handles awkward cases like the substring 'hell' appearing in 'hello', as mentioned in a comment, and is also a route towards case-insensitive matching, stemming, and other refinements.
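
If installing nltk is not an option, the 'hell' versus 'hello' problem can also be approached with the standard library. This is a different technique from the tokenizer above: a sketch using re with word boundaries, where count_whole_word and its ignore_case parameter are hypothetical names chosen for illustration.

```python
import re


def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` that are delimited by word boundaries."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '.' in the input from acting as regex syntax.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))


text = "hello hell Hello, hello!"
print(count_whole_word(text, "hello"))
print(count_whole_word(text, "hello", ignore_case=True))
print(count_whole_word(text, "hell"))
```

The \b boundary assumes the search term starts and ends with a letter, digit, or underscore; terms that begin or end with punctuation would need a different pattern.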

Counting occurrences of word in a text file

I have to write a program that asks for a specific filename on the computer and counts the number of characters and words in the file. Finally, the program should be able to count how many times a given word (entered by the user) occurs.

You finish iterating over the file before you attempt to count the occurrences of the specific word. Reorganizing your code so that all of the counting happens inside the loop over the file should fix it.
numLines = 0
numWords = 0
numChars = 0
count = 0
filename = input("Which file would you like to work with?: ")
freq_word = input("Which word would you like to find the frequency for?: ")
with open(filename, 'r') as fin:
    for line in fin:
        words = line.split()
        for word in words:
            if word == freq_word:
                count += 1
        numWords += len(words)
        numChars += len(line)
print(filename, "contains:", numChars, "characters and total amount of words is:", numWords)
print(freq_word, "occurs", count, "time(s)")
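
The same single-pass idea can be sketched with collections.Counter, which keeps a tally for every word so any word can be looked up afterwards. The function name file_stats and the sample lines are assumptions for illustration; in a real program the lines would come from the opened file.

```python
from collections import Counter


def file_stats(lines, target):
    """Return (character count, word count, occurrences of target) in one pass."""
    num_chars = 0
    word_counts = Counter()
    for line in lines:
        num_chars += len(line)            # newline characters are included
        word_counts.update(line.split())  # whitespace-separated words
    return num_chars, sum(word_counts.values()), word_counts[target]


# Hypothetical stand-in for the lines of the chosen file.
sample = ["the cat\n", "the dog\n"]
chars, words, hits = file_stats(sample, "the")
print(chars, "characters,", words, "words,", hits, "occurrence(s) of 'the'")
```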

Python - location of a word in a file wrt beginning of the file

I want to get the position of a word in a file, measured from the beginning of the file. The position should also count \n, \r and \t characters. I tried the index and find methods, but they work on an individual line and do not give the location from the beginning of the file. Can someone suggest anything? I have written the code below:
def findword(filename:str, word:str):
    try:
        file = open(filename, 'r')
        count = 0
        occurances = []
        for line in file:
            if word in line:
                print("word found")
                occurance = line.find(word)
                occurances.append(occurance)
                print(occurance)
                count = count + 1
        print(count)
        occurances.sort()
        print(occurances)
        if count == 0:
            print("word not found")
    except FileNotFoundError:
        print("warning: file not found")
        return -1
Thank you.
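
One possible approach, sketched under the assumption that the file is small enough to read into memory at once: search the whole text rather than each line, so the offsets are relative to the start of the file. The name find_offsets and the sample string are made up for illustration.

```python
def find_offsets(text, word):
    """Return the character offsets of every occurrence of `word` in `text`."""
    offsets = []
    start = text.find(word)
    while start != -1:
        offsets.append(start)
        # Continue searching just past the start of the previous match.
        start = text.find(word, start + 1)
    return offsets


# Hypothetical file contents containing a tab and a \r\n line ending.
sample = "one\ttwo\r\nthree two\n"
print(find_offsets(sample, "two"))

# With a real file, newline="" keeps \r\n untranslated so \r is counted too:
# with open(filename, newline="") as f:
#     print(find_offsets(f.read(), word))
```

These are character offsets; for byte offsets the file would have to be opened in binary mode and searched with a bytes pattern.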

Counting Paragraph and Most Frequent Words in Python Text File

I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()

filename = input("enter file name: ")
wordcount={}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word,0) + 1
print(wordcount)
print(paragraphcount)

When you read a file, there is a cursor that indicates which byte you are currently reading. In your code, you try to read the file twice and run into strange behavior, which should have been a hint that something is wrong.
What is the correct way?
You should read the file once, store every line, and then compute both the word count and the paragraph count from that same stored data, rather than trying to read the file twice.
What is happening in the current code?
After you read the file the first time, the cursor is at the end of the file. When you then try to read lines, you get nothing back, because the read starts from the end of the file. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. But instead of doing this, you should focus on implementing the method I mentioned in the first section.
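
Both points can be tried out without a real file. The sketch below uses io.StringIO as a stand-in for the opened file, and analyze_text is a hypothetical helper that assumes paragraphs are separated by one or more blank lines.

```python
import io
import re
from collections import Counter


def analyze_text(text):
    """Return (word counts, paragraph count) from text that was read once."""
    words = Counter(text.split())
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return words, len(paragraphs)


# Hypothetical file contents with three paragraphs.
text = "a b a\nc\n\nd a\n\n\ne\n"

fh = io.StringIO(text)   # behaves like an open text file
content = fh.read()      # the cursor is now at the end
print(list(fh))          # nothing left to iterate over
fh.seek(0)               # reset the cursor to the start
print(len(list(fh)))     # the lines are available again

words, paragraph_count = analyze_text(content)
print(words.most_common(1), paragraph_count)
```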
