Not counting correct characters from file - python

I have the following code, which runs but reports the wrong number of characters and the wrong length (in characters) of the longest line. The following is my code:
def stats(file_name):
    n_chars = 0
    n_words = 0
    n_lines = 0
    longest_line = 0
    with open(file_name) as f:
        lines = f.readlines()
        n_lines = len(lines)
        longest_line = max([len(line) for line in lines])
        words = []
        line_words = [line.split() for line in lines]
        for line in line_words:
            for word in line:
                words.append(word)
        n_words = len(words)
        chars = []
        line_chars = [list(word) for word in words]
        for line in line_chars:
            for char in line:
                chars.append(char)
        n_chars = len(chars)
        f.close()
    return n_chars, n_words, n_lines, longest_line
Can you see anything that would make the code count the wrong number of characters? The longest line also always comes out as one more than the correct answer.
The input is the following:
BEAUTIFUL Soup, so rich and green,
Waiting in a hot tureen!
Who for such dainties would not stoop?
Soup of the evening, beautiful Soup!
Soup of the evening, beautiful Soup!
Beau--ootiful Soo-oop!
Beau--ootiful Soo-oop!
Soo--oop of the e--e--evening,
Beautiful, beautiful Soup!
Beautiful Soup! Who cares for fish,
Game, or any other dish?
Who would not give all else for two
Pennyworth only of Beautiful Soup?
Pennyworth only of beautiful Soup?
Beau--ootiful Soo-oop!
Beau--ootiful Soo-oop!
Soo--oop of the e--e--evening,
Beautiful, beauti--FUL SOUP!
The expected output is the following:
Characters: 553
Words: 81
Lines: 21
Longest line: 38
The actual (failed) output:
characters: 469
words: 81
lines: 21
longest: 39

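You only count the non-whitespace characters. The expected number of characters probably includes whitespace. The longest line comes out one too long because each line read from the file still ends with its newline character, so strip that before measuring.
def stats(file_name):
    n_chars = 0
    n_words = 0
    n_lines = 0  # stays 0 if the file is empty
    longest_line = 0
    with open(file_name) as lines:
        for n_lines, line in enumerate(lines, 1):
            # Measure the line without its trailing newline.
            longest_line = max(longest_line, len(line.rstrip('\n')))
            # Count every character, including whitespace and the newline.
            n_chars += len(line)
            n_words += len(line.split())
    return n_chars, n_words, n_lines, longest_line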

To get the correct number of characters, you have to count spaces as well as the other characters. Otherwise you'll get a much smaller value. Something like:
n_chars = sum(len(line) for line in lines)
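Both discrepancies can be reproduced on a tiny in-memory sample. The sketch below is only illustrative: `compare_counts` and the sample lines are hypothetical, not part of the original code.

```python
def compare_counts(lines):
    """Return character and longest-line counts computed both ways."""
    # Counting only the characters of the split words drops all whitespace.
    non_whitespace = sum(len(word) for line in lines for word in line.split())
    # Counting whole lines keeps spaces and the trailing newline.
    all_chars = sum(len(line) for line in lines)
    # len(line) includes the trailing "\n", hence the off-by-one.
    longest_with_newline = max(len(line) for line in lines)
    longest_without_newline = max(len(line.rstrip("\n")) for line in lines)
    return non_whitespace, all_chars, longest_with_newline, longest_without_newline


# Hypothetical sample, shaped like what f.readlines() returns.
sample = ["ab cd\n", "efg\n"]
print(compare_counts(sample))  # (7, 10, 6, 5)
```

The first and third values correspond to what the question's code computes; the second and fourth are what the expected output appears to want.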

Related

How to count total words in a text file without using rstrip() in Python?

Good day!
I have the following snippet:
words_count = 0
lines_count = 0
line_max = None
lines = []
file = open("alice.txt", "r")
for line in file:
    line = line.rstrip("\n")
    words = line.split()
    words_count += len(words)
    if line_max is None or len(words) > len(line_max.split()):
        line_max = line
    lines.append(line)
file.close()
This uses the rstrip method to remove the trailing newline from each line, but my exam unit does not allow rstrip since it has not been introduced yet. My question is: is there any other way to get the same result, a total word count of 26466, without using rstrip?
Thank you guys!
Interestingly, this works for me without using str.rstrip:
import requests
wc = 0
content = requests.get('https://files.catbox.moe/dz39pw.txt').text
for line in content.split('\n'):
    # line = line.rstrip("\n")
    words = line.split()
    wc += len(words)
assert wc == 26466
Note that a one-liner way of doing that in Python could be:
wc = sum(len(line.split()) for line in content.split('\n'))
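This works because `str.split()` called with no arguments splits on any run of whitespace and discards leading and trailing whitespace, so a trailing newline never ends up inside a word. A minimal sketch of counting words from lines without `rstrip` (`count_words` and the sample lines are hypothetical):

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    total = 0
    for line in lines:
        # split() with no arguments ignores the trailing "\n" by itself.
        total += len(line.split())
    return total


print("hello world\n".split())  # ['hello', 'world']
print(count_words(["Down the rabbit hole\n", "\n", "Alice was beginning\n"]))  # 7
```

Since an open file object yields its lines when iterated, `count_words(file)` should work the same way on the file from the question.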

How to take out punctuation from string and find a count of words of a certain length?

I am trying to create a function that opens a .txt file and counts the words whose length equals a number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
    for i in words:
        if len(i) == number:
            count += 1
    return count
You can try calling replace() on the string once for each punctuation character you want to remove, replacing it with an empty string ("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")
I have written some sample code that removes punctuation and counts the number of words. Modify it according to your requirements.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the number of words. Hope this helps!!
Instead of chaining replace() calls, you can make a single strip() call per word. Note that strip() only removes the listed characters from the start and end of the word.
Edit: a cleaner version
pl = '?!."\''  # punctuation list
def samplePractice(number):
    with open('sample.txt', 'r') as fin:
        words = fin.read().split()
    # clean words
    words = [w.strip(pl) for w in words]
    count = 0
    for word in words:
        if len(word) == number:
            print(word, end=', ')
            count += 1
    return count
result = samplePractice(4)
print('\nResult:', result)
output:
This, text, many, have, many, have, have, this,
Result: 8
Your code is almost OK; the second for block is just in the wrong position. It has to be nested inside the first one.
pl = '?!."\''  # punctuation list
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            i = i.strip(pl)  # clean the word by strip
            if len(i) == number:
                count += 1
    return count
result = samplePractice(4)
print(result)
output:
8
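If punctuation inside words should be removed as well, one alternative (not taken from the answers above) is `str.translate` with `string.punctuation`. The sketch below takes the text as a string instead of reading `sample.txt`, and `count_words_of_length` is a hypothetical name.

```python
import string

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words_of_length(text, number):
    """Count the words in text that have exactly `number` characters
    once ASCII punctuation has been removed."""
    cleaned = text.translate(REMOVE_PUNCTUATION)
    return sum(1 for word in cleaned.split() if len(word) == number)


text = (
    "This is a random text document. How many words have a length of one?\n"
    "How many words have the length three? We have the power to figure it out!\n"
    "Is a function capable of doing this?\n"
)
print(count_words_of_length(text, 4))  # 8
print(count_words_of_length(text, 1))  # 3
```

Unlike `strip()`, this also turns a word such as `it's` into `its`, which may or may not be what you want.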

Output features of a file based on its longest line

I want to write a program file_stats.py that when run on the command line, accepts a text file name as an argument and outputs the number of characters, words, lines, and the length (in characters) of the longest line in the file. Does anyone know the proper syntax to do something like this if I want the output to look like this:
Characters: 553
Words: 81
Lines: 21
Longest line: 38
Assuming your file path is a string, something like this should work:
file = "pathtofile.txt"
with open(file, "r") as f:
    text = f.read()
lines = text.split("\n")
longest_line = 0
for l in lines:
    if len(l) > longest_line:
        longest_line = len(l)
print("Longest line: {}".format(longest_line))
The whole program:
n_chars = 0
n_words = 0
n_lines = 0
longest_line = 0
with open('my_text_file') as f:
    lines = f.readlines()
# Find the number of lines
n_lines = len(lines)
# Find the longest line, ignoring each line's trailing newline
longest_line = max((len(line.rstrip('\n')) for line in lines), default=0)
# Find the number of words
words = []
line_words = [line.split() for line in lines]
for line in line_words:
    for word in line:
        words.append(word)
n_words = len(words)
# Find the number of characters, counting whitespace and newlines too
n_chars = sum(len(line) for line in lines)
print("Characters:", n_chars)
print("Words:", n_words)
print("Lines:", n_lines)
print("Longest line:", longest_line)
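The question also asks for the file name to come from the command line. A minimal sketch of how `file_stats.py` might read it from `sys.argv` follows; the function names and the usage message are assumptions, and the character count here includes whitespace and newlines.

```python
import sys


def file_stats(path):
    """Return (characters, words, lines, longest line length) for a text file."""
    n_chars = n_words = n_lines = longest = 0
    with open(path) as f:
        for line in f:
            n_lines += 1
            n_chars += len(line)             # includes spaces and the newline
            n_words += len(line.split())
            longest = max(longest, len(line.rstrip("\n")))
    return n_chars, n_words, n_lines, longest


def main(argv):
    # argv[0] is the script name, argv[1] the file to analyse.
    if len(argv) != 2:
        print("usage: python file_stats.py FILE")
        return 1
    try:
        n_chars, n_words, n_lines, longest = file_stats(argv[1])
    except (OSError, UnicodeDecodeError) as exc:
        print(f"cannot read {argv[1]}: {exc}")
        return 1
    print(f"Characters: {n_chars}")
    print(f"Words: {n_words}")
    print(f"Lines: {n_lines}")
    print(f"Longest line: {longest}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

It would be run as `python file_stats.py some_file.txt`.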

How to search words from txt file to python

How can I show the words of length 20 in a text file?
To list all the words, I know I can use the following code:
# Program for finding the words of length 20 in the words.txt file
def main():
    file = open("words.txt", "r")
    lines = file.readlines()
    file.close()
    for line in lines:
        print(line)
    return
main()
But I am not sure how to pick out and show only the words with 20 letters.
Big thanks
If each line contains several words rather than a single word, you first have to split it, which returns a list of the words:
words = line.split(' ')
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
    if len(word) == 20:
        # Do what you want to do here
        print(word)
If each line has a single word, you can just operate on line directly and skip the for loop. You may need to strip the trailing end-of-line character first, though: word = line.strip('\n'). If you just want to collect them all, you can do this:
words_of_length_20 = []
for word in words:
    if len(word) == 20:
        words_of_length_20.append(word)
If your file has only one word per line and you want only the words with 20 letters, you can simply use:
with open("words.txt", "r") as f:
    words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
You can then print the list or print each word separately.
You can try this:
f = open('file.txt')
new_file = f.read().splitlines()
# Filter the lines already read; f itself is exhausted after read().
words = [i for i in new_file if len(i) == 20]
f.close()
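If the file may contain several words per line, the splitting and the length check can be combined in one pass. A minimal sketch, where `words_of_length` is a hypothetical helper and the sample lines are made up:

```python
def words_of_length(lines, n):
    """Return every whitespace-separated word with exactly n characters."""
    return [word for line in lines for word in line.split() if len(word) == n]


# Made-up sample: one 20-character word, one short word, one 19-character word.
sample_lines = ["x" * 20 + " short\n", "y" * 19 + "\n"]
print(words_of_length(sample_lines, 20) == ["x" * 20])  # True
```

An open file can be passed in place of `sample_lines`, since iterating over it yields its lines.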

word counter || python

I want to print the number of words in a txt file that have 1-20 letters.
I tried this, but it prints 20 zeroes instead. Any idea?
Edit: in the end the program should print 20 numbers, each one being the number of words in the file with that many letters (1 to 20).
fin = open('words.txt')
for i in range(20):
    counter = 0
    for line in fin:
        word = line.strip()
        if len(word) == i:
            counter = counter + 1
    print(counter, end=' ')
EDIT
To produce individual counts for each word length you can use a collections.Counter:
from collections import Counter
def word_lengths(f):
    for line in f:
        for word in line.split():  # does not ignore punctuation
            yield len(word)
with open('words.txt') as fin:
    counts = Counter(length for length in word_lengths(fin) if length <= 20)
This uses a generator to read the file and produce a sequence of word lengths. The filtered word lengths are fed into a Counter. You could perform the length filtering on the Counter instead.
If you want to ignore punctuation you could look at using str.translate() to remove unwanted characters, or possibly re.split(r'\W+', line) instead of line.split().
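One possible way to turn this into the 20 numbers the question asks for is sketched below. It uses `re.findall(r"\w+", line)` to ignore punctuation, which is a variation on the suggestion above rather than code from the answer, and `length_histogram` is a hypothetical name.

```python
import re
from collections import Counter


def length_histogram(lines, max_length=20):
    """Return a list whose item i-1 is the number of words with i letters."""
    counts = Counter(
        len(word)
        for line in lines
        for word in re.findall(r"\w+", line)  # runs of letters, digits, underscores
    )
    # A Counter returns 0 for missing keys, so absent lengths need no special case.
    return [counts[length] for length in range(1, max_length + 1)]


print(length_histogram(["a bb, ccc!\n", "dd\n"], max_length=4))  # [1, 2, 1, 0]
```

Be aware that `\w+` splits a word such as `don't` into `don` and `t`, so the pattern may need adjusting for text with apostrophes or hyphens.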
Try it like this:
with open('words.txt') as fin:
    counter = 0
    for line in fin:
        for word in line.split():
            if len(word) <= 20:
                counter = counter + 1
print(counter)
This could be simplified to:
with open('words.txt') as fin:
    counter = sum([1 for line in fin
                   for word in line.split() if len(word) <= 20])
but that's playing code golf.
You can also use a collections.Counter if it is practical to read the entire file into memory:
from collections import Counter
with open('words.txt') as fin:
    c = Counter(fin.read().split())
counter = sum(c[k] for k in c if len(k) <= 20)
And no doubt there are many other ways to do it. None of the above expects or handles punctuation.
It should be like this: the counter shouldn't be reset inside the for loop, and you can use len() to get the length of each word:
with open("test") as f:
    counter = 0
    for line in f:
        for word in line.split():
            if len(word) <= 20:
                counter += 1
print(counter)
Or my way:
import re
with open("file") as f:
    # Split on newlines and spaces, then count the non-empty words of at most 20 characters.
    print(len([w for w in re.split(r'\n| ', f.read()) if 0 < len(w) <= 20]))
Hope this helps.
Using regular expressions:
import re
REGEX = r"(\b\S{1,20}\b)"
finder = re.compile(REGEX)
with open("words.txt") as out:
    data = out.read()
matches = finder.findall(data)
# lst[0] counts 1-letter words, ..., lst[19] counts 20-letter words
lst = [0 for _ in range(20)]
for m in matches:
    lst[len(m) - 1] += 1
print(lst)
