Counting Paragraphs and Most Frequent Words in a Text File with Python

I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()

filename = input("enter file name: ")
wordcount = {}
paragraphcount = 0
linecount = 0  # number of consecutive blank lines seen so far
with open(filename, 'r') as ftext:
    for line in ftext:
        if line in ('\n', '\r\n'):
            # count each run of blank lines only once
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word, 0) + 1
print(wordcount)
print(paragraphcount)

When you read a file, a cursor tracks the position of the byte you are currently reading. Your code tries to read the file twice, and the strange behavior you ran into is a hint that something is wrong with that approach.
What is the correct way?
Read the file once, store every line, then compute the word count and the paragraph count from that same stored data, rather than trying to read the file twice.
What is happening in the current code?
After the first read, the cursor sits at the end of the file. When you then try to read the lines, you get nothing back because there is nothing left to read. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. Still, the better fix is the single-read approach described in the first section.
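The single-read approach described above can be sketched as follows. This is only an illustration: the function name count_words_and_paragraphs is made up here, and it assumes that paragraphs are separated by one or more blank lines.

```python
from collections import Counter


def count_words_and_paragraphs(path):
    """Read the file once and derive both counts from the stored lines."""
    with open(path, "r") as f:
        lines = f.read().splitlines()  # single read; the lines are reused below

    # Word frequencies: split every stored line on whitespace.
    word_counts = Counter(word for line in lines for word in line.split())

    # Paragraphs (assumption: separated by blank lines). A paragraph starts
    # at a non-blank line that follows a blank line or the start of the file.
    paragraphs = 0
    previous_blank = True
    for line in lines:
        is_blank = not line.strip()
        if not is_blank and previous_blank:
            paragraphs += 1
        previous_blank = is_blank

    return word_counts, paragraphs
```

Calling word_counts.most_common(5) on the returned Counter gives the five most frequent words with their counts.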

Related

Counting several instances of the same word from a text file

Complete beginner here; I searched a lot of threads but couldn't find a solution that fits my case.
I have a text file, python_examples.txt, which contains some words. On line four, the word hello appears twice in a row, like "hello hello".
My code is supposed to find the word the user inputs and count how many times it appears. It works, but not when the same word appears multiple times on the same line. There are two hellos on line 4 and one on line 13, but it only finds a total of 2 hellos. Fixes? Thanks,
user_input = input("Type in the word you are searching for: ")
word_count = 0
line_count = 0
with open ("python_example.txt", "r+") as f:
    for line in f:
        line_count += 1
        if user_input in line:
            word_count += 1
            print("found " + user_input + " on line " + str(line_count))
        else:
            print ("nothing on line " + str(line_count))
print ("\nfound a total of " + str(word_count) + " words containing " + "'" + user_input + "'")
You can use str.count:
word_count += line.count(user_input)
instead of:
word_count += 1
It counts every (non-overlapping) occurrence of user_input in the line.
The issue is with these two lines:
if user_input in line:
    word_count += 1
You increase the count by 1 if the input appears on the line, regardless of whether it appears more than once.
This should do the job:
user_input = input("Type in the word you are searching for: ")
word_count = 0
with open("python_example.txt") as f:
    for line_num, line in enumerate(f, start=1):
        line_inp_count = line.count(user_input)
        if line_inp_count:
            word_count += line_inp_count
            print(f"input {user_input} appears {line_inp_count} time(s) on line {line_num}")
        else:
            print(f"nothing on line {line_num}")
print(f"the input appeared a total of {word_count} times in {line_num} lines.")
Let me know if you have any questions :)
One option is to use a library to parse the words in your text file rather than iterating one line at a time. There are several classes in nltk.tokenize which are easy to use.
import nltk.tokenize.regexp
def count_word_in_file(filepath, word):
    """Give the number of times word appears in text at filepath."""
    tokenizer = nltk.tokenize.regexp.WordPunctTokenizer()
    with open(filepath) as f:
        tokens = tokenizer.tokenize(f.read())
    return tokens.count(word)
This handles awkward cases like the substring 'hell' appearing in 'hello' as mentioned in a comment, and is also a route towards case-insensitive matching, stemming, and other refinements.
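If installing a tokenizer library is not an option, whole-word matching can also be approximated with the standard re module. The sketch below is not part of the answers above; the helper name count_whole_word is hypothetical, and it assumes the search word consists of letters, digits, or underscores so that the \b word boundaries behave as expected.

```python
import re


def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of word in text, matching whole words only."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps any special characters in the user's input literal;
    # \b anchors the match at word boundaries, so 'hell' does not match 'hello'.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))
```

Passing ignore_case=True treats "Hello" and "hello" as the same word.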

Read First 10 Lines in a File; If Shorter Only Read Those Lines

I want to open a file and read its first 10 lines. If the file has fewer than 10 lines, it should read as many lines as it has. Each line has to be numbered, whether it's text or whitespace. Because I have to strip each line, I can't differentiate between an empty string and the end of the file. For example, if I read a file with only three lines, it prints out lines 1 - 10, with lines 4 - 10 being empty, but I would like it to stop after reaching the 3rd line, as that is the end of the file. I would really appreciate any help, thank you.
def get_file_name():
    fileName = input('Input File Name: ')
    return fileName
def top(fileName):
    try:
        file = open(fileName, 'r')
        line = 'text'
        cnt = 1
        while cnt <= 10:
            if line != '':
                line = file.readline()
                line = line.rstrip('\n')
                print(str(cnt) + '.', line)
                cnt += 1
            else:
                line = file.readline()
                line = line.rstrip('\n')
                print(str(cnt) + '.', line)
                cnt += 1
        file.close()
    except IOError:
        print('FILE NOT FOUND ERROR:', fileName)
def main():
    fileName = get_file_name()
    top(fileName)
main()
def read_lines():
    f = open("file-name.txt","r")
    num = 1
    for line in f:
        if num > 10:
            break
        print("LINE NO.",num, ":",line)
        num = num + 1
    f.close()
Here, the loop exits at the end of the file. So if you only had 7 lines, it will exit automatically after the 7th line.
However, if you have 10 or more lines, the "num" variable stops the loop after the 10th line.
EDIT: I have edited the print statement to include the line count as well and started the line count with 1.
with open(filename, 'r') as f:
    cnt = 1
    for line in f:
        if cnt <= 10:
            print(str(cnt) + '.', line, end='')
            cnt += 1
        else:
            break
This should do exactly what you need. You can always remove the if/else and then it will read exactly however many lines are in the file. Example:
with open(filename, 'r') as f:
    cnt = 1
    for line in f:
        print(str(cnt) + '.', line, end='')
        cnt += 1
You can also load all the lines into a list, count them, and then print at most the first 10, for example with a loop like for i in range(min(10, len(lines))):. Note that range(0, 9) would stop after nine lines, and a fixed range fails on files shorter than that.
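Another way to express "at most the first 10 lines" without loading the whole file is itertools.islice, which stops either at the limit or at the end of the file, whichever comes first. This is a sketch rather than a replacement for the answers above; the function names first_lines and print_numbered are made up for illustration.

```python
from itertools import islice


def first_lines(path, limit=10):
    """Return up to `limit` lines from the file, without trailing newlines."""
    with open(path, "r") as f:
        # islice stops after `limit` items or when the file ends, so blank
        # lines inside the file are kept and nothing is printed past the end.
        return [line.rstrip("\n") for line in islice(f, limit)]


def print_numbered(lines):
    """Print each line prefixed with its 1-based number."""
    for number, line in enumerate(lines, start=1):
        print(f"{number}. {line}")
```

Because the end of the file ends the iteration itself, an empty line in the middle of the file is never confused with the end of the file.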

How to fix the counting of strings to include the duplicate entries

I'm having problems figuring out how to get the total count of email addresses there are. The code I have written only comes up with the non-duplicate addresses, where the assignment is asking for the total number including the duplicates.
I've tried the for loop, and just setting count to the len() function and got the same result. I reread the materials and I am completely stumped as to how to include the duplicate entries.
fname = input("Enter file name: ")
if len(fname) == 0:
    fname = "mbox-short.txt"
fh = open(fname)
for line in fh:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    print(words[1])
    count = len(words[1])
print("There were", count, "lines in the file with From as the first word")
Expected result: There were 27 lines in the file with From as the first word
Actual Result: There were 14 lines in the file with From as the first word
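len(words[1]) is the number of characters in the last address, not the number of matching lines. Instead, increment a counter variable in the loop that's reading from the file.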
count = 0
for line in fh:
    line = line.rstrip()
    if line.startswith('From '):
        words = line.split()
        print(words[1])
        count += 1
print("There were", count, "lines in the file with From as the first word")
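The same counter can be written as a small function that works on any iterable of lines, which makes it easy to check against a few sample lines. This is just a sketch; the name count_from_lines and the sample addresses are invented for illustration.

```python
def count_from_lines(lines):
    """Count the lines that start with 'From ' (duplicates included)."""
    # Every matching line adds 1, so repeated addresses are counted each time.
    return sum(1 for line in lines if line.startswith("From "))


sample = [
    "From alice@example.com Sat Jan  5",
    "From: alice@example.com",  # header line, not counted ('From:' != 'From ')
    "From alice@example.com Sun Jan  6",  # duplicate address, still counted
    "Subject: hello",
]
total = count_from_lines(sample)
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.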

Counting occurrences of word in a text file

I have to write a program that asks for a specific filename on the computer, counts the number of characters and words in the file and, finally, counts how often a user-supplied word occurs.
You finish iterating over the file before you attempt to count the occurrences of the specific word. Reorganizing your code so that all of the counting happens inside the loop over the file should fix it.
numLines = 0
numWords = 0
numChars = 0
count = 0
filename = input("Which file would you like to work with?: ")
freq_word = input("Which word would you like to find the frequency for?: ")
with open(filename, 'r') as fin:
    for line in fin:
        words = line.split()
        for word in words:
            if word == freq_word:
                count +=1
        numWords += len(words)
        numChars += len(line)
print(filename, "contains: ", numChars, "characters and total amount of words is: ", numWords)
print(freq_word, "occurs ", count, "number of time")

Python - I read a file but it shows me erroneous values?

def showCounts(fileName):
    lineCount = 0
    wordCount = 0
    numCount = 0
    comCount = 0
    dotCount = 0
    with open(fileName, 'r') as f:
        for line in f:
            for char in line:
                if char.isdigit() == True:
                    numCount+=1
                elif char == '.':
                    dotCount+=1
                elif char == ',':
                    comCount+=1
                #i know formatting below looks off but it's right
                words = line.split()
                lineCount += 1
                wordCount += len(words)
                for word in words:
                    # text = word.translate(string.punctuation)
                    exclude = set(string.punctuation)
                    text = ""
                    text = ''.join(ch for ch in text if ch not in exclude)
                    try:
                        if int(text) >= 0 or int(text) < 0:
                            numCount += 1
                    except ValueError:
                        pass
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
print("Number count: " + str(numCount))
print("Comma count: " + str(comCount))
print("Dot count: " + str(dotCount) + "\n")
I have it read a .txt file containing words, lines, dots, commas, and numbers. It gives me the correct number of dots, commas, and numbers, but the word and line counts are each much, much higher than they actually are. Anyone know why? Thanks guys.
I don't know if this is actually the answer, but my reputation isn't high enough to comment, so I'm putting it here. You obviously don't need to accept it as the final answer if it doesn't solve the issue.
So, I think it might have something to do with the fact that all of your print statements are actually outside of the showCounts() function. Try indenting the print statements.
I hope this helps.
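Another possible cause, offered as a guess because the original indentation cannot be recovered from the post: if the line and word updates run inside the per-character loop, they fire once per character instead of once per line, which would inflate exactly those two counts. A sketch of one way to arrange the counters follows. The name show_counts is hypothetical, and "numbers" is taken here to mean digit characters.

```python
def show_counts(file_name):
    """Count lines, words, digit characters, commas, and dots in a file."""
    line_count = word_count = digit_count = comma_count = dot_count = 0
    with open(file_name, "r") as f:
        for line in f:
            # Per-line counters: updated exactly once for each line.
            line_count += 1
            word_count += len(line.split())
            # Per-character counters: updated inside the inner loop only.
            for char in line:
                if char.isdigit():
                    digit_count += 1
                elif char == ".":
                    dot_count += 1
                elif char == ",":
                    comma_count += 1
    return {
        "lines": line_count,
        "words": word_count,
        "digits": digit_count,
        "commas": comma_count,
        "dots": dot_count,
    }
```

Returning a dictionary instead of printing keeps the results inside the function's scope, which also avoids the problem of print statements that sit outside the function and cannot see its local variables.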
