Calculating Statistics From File

Calculating Statistics From File - python

Write a function named file_stats that takes one string parameter (in_file), the name of an existing text file. The function should calculate three statistics about in_file (the number of lines, the number of words, and the number of characters) and print them on separate lines. For example, the following would be correct input and output. (Hint: the number of characters may vary depending on the platform you are working on.)
file_stats('created_equal.txt')
lines 2
words 13
characters 72
Below is what I have:
fileName = "C:\Users\Jeff Hardy\Desktop\index.txt"
chars = 0
words = 0
lines = 0
def file_stats(in_file):
    global lines, words, chars
    with open(in_file, 'r') as fd:
        for line in fd:
            lines += 1
            wordsList = line.split()
            words += len(wordsList)
            for word in wordsList:
                chars += len(word)
file_stats(fileName)
print("Number of lines: {0}".format(lines))
print("Number of words: {0}".format(words))
print("Number of chars: {0}".format(chars))
The code is giving me the following error:
(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The error comes from the path string, not from the contents of the file: in a normal string literal, \U (as in "C:\Users") starts a Unicode escape. Double the backslashes, or use a raw string by prefixing the literal with r:
fileName = "C:\\Users\\Jeff Hardy\\Desktop\\index.txt"
Also, the instructions ask you to print inside the function rather than modify global variables, and the counts have to be updated inside the loop, not after it (indentation matters):
def file_stats(in_file):
    lines = words = chars = 0
    with open(in_file, 'r', encoding="utf-8") as fd:
        for line in fd:
            lines += 1
            words += len(line.split())  # If you split "x , y" is the comma a word?
            chars += len(line)  # Are spaces considered a character?
    print("lines {0}".format(lines))
    print("words {0}".format(words))
    print("characters {0}".format(chars))
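The escaping problem can be checked without touching the file system. A minimal sketch, assuming a made-up Windows-style path (the path below is illustrative and is not the asker's file): a doubled-backslash string and a raw string produce the same text, so either spelling avoids the \U escape error.

```python
# Hypothetical path, used only for illustration.
escaped = "C:\\Users\\example\\Desktop\\index.txt"  # every backslash doubled
raw = r"C:\Users\example\Desktop\index.txt"         # raw string: backslashes kept literally

# Both spellings describe the same string.
print(escaped == raw)

# Forward slashes are also accepted by open() on Windows.
forward = "C:/Users/example/Desktop/index.txt"
print(forward.replace("/", "\\") == raw)
```

A raw string is usually the least error-prone choice for Windows paths, since nothing in the path has to be rewritten by hand.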

Related

Why do two of the same strings not return as being the same when compared?

I have the following code:
file = open('AdjectivesList.txt', 'r')
lines = file.readlines()
file.close()
for word in words:
    wordLowercase = word.lower()
    for x, lol in enumerate(lines):
        gg = (lines[x].lower())
        if wordLowercase == gg:
            print('identified')
Even when wordLowercase does equal gg, the string "identified" is not being printed. Why is this the case?

.readlines() includes the newline character at the end of every line in the text file. This is most likely the cause of your problem. You can remove the newline character (and any whitespace characters from the left and right of the string) by using .strip().
gg = lines[x].lower().strip()
Reference
https://www.tutorialspoint.com/python/file_readlines.htm
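To see the effect without a file, here is a small sketch in which a hard-coded list stands in for the result of readlines(). The words in it are made up for illustration.

```python
# Simulated result of file.readlines(); each entry keeps its trailing newline.
lines = ["Happy\n", "Sad\n", "Angry\n"]
word = "happy"

# Without strip() the comparison is "happy" against "happy\n".
print(word == lines[0].lower())

# With strip() the newline is gone and the strings match.
print(word == lines[0].lower().strip())

# Stripping once up front and using a set keeps the lookup simple.
adjectives = {line.strip().lower() for line in lines}
print(word in adjectives)
```

Building the set once also avoids re-lowercasing every line for every word that is checked.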

I am finding the next line after matching a line using startswith in python but i have multiple spaces between lines

Escape character is '^]'.
abc-2#terminal length 0
I am reading multiple files with the content shown above. I am trying to find the line that follows "Escape character is '^]'.", and every file has a different number of blank lines between the two lines.
I wrote the code below, but it prints an empty line:
with open(report_file_path, "r") as in_file:
    for line in in_file:
        abc="Escape character is '^]'."
        if line.strip() == abc:
            result= next(in_file)
            print result
#Output should be : abc-2#terminal length 0
but I get an empty line instead.

Use a while loop to keep reading until the next line has some content.
Ex:
with open(filename2, "r") as in_file:
    for line in in_file:
        abc="Escape character is '^]'."
        if line.strip()==abc:
            while True:
                result= next(in_file)
                if result.strip():
                    break
            print(result)
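One caveat with the loop above: if the marker is followed only by blank lines, next() raises StopIteration. A possible variation, sketched here with a hypothetical helper name (line_after) and io.StringIO standing in for the opened report file, returns None in that case.

```python
import io

def line_after(marker, lines):
    """Return the first non-blank line after a line equal to marker, or None."""
    lines = iter(lines)
    for line in lines:
        if line.strip() == marker:
            # Skip blank lines; the default None covers running out of input.
            return next((l.strip() for l in lines if l.strip()), None)
    return None

# io.StringIO stands in for a file opened with open(report_file_path, "r").
sample = io.StringIO("Escape character is '^]'.\n\n   \n\nabc-2#terminal length 0\n")
print(line_after("Escape character is '^]'.", sample))
```

Because the helper accepts any iterable of lines, the same function works on a real file object.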

trying to print to a text file with words that only have two or more occurring vowels

import re
twovowels=re.compile(r".*[aeiou].*[aeiou].*", re.I)
nonword=re.compile(r"\W+", re.U)
text_file = open("twoVoweledWordList.txt", "w")
file = open("FirstMondayArticle.html","r")
for line in file:
    for word in nonword.split(line):
        if twovowels.match(word): print word
        text_file.write('\n' + word)
text_file.close()
file.close()
This is my Python code. I am trying to print only the words that have two or more vowels. When I run it, everything is written to my text file, including words and numbers that have no vowels, but the Python shell shows only the words that have two or more vowels. How do I change that?

You can remove the vowels with str.translate and compare lengths (the two-argument translate(None, ...) form used here is Python 2 only). If the length difference after removing the vowels is > 1, the word has at least two vowels:
with open("FirstMondayArticle.html") as f, open("twoVoweledWordList.txt", "w") as out:
    for line in f:
        for word in line.split():
            if len(word) - len(word.lower().translate(None,"aeiou")) > 1:
                out.write("{}\n".format(word.rstrip()))
In your own code the word is always written, because text_file.write('\n' + word) is outside the if block. This is a good lesson in why you should not put multiple statements on one line. Your code is equivalent to:
if twovowels.match(word):
    print(word)
text_file.write('\n' + word) # <- outside the if
Your code with the if in the correct location, some changes to your naming convention, adding some spaces between assignments and using with which closes your files for you:
import re
with open("FirstMondayArticle.html") as f, open("twoVoweledWordList.txt", "w") as out:
    two_vowels = re.compile(r".*[aeiou].*[aeiou].*", re.I)
    non_word = re.compile(r"\W+", re.U)
    for line in f:
        for word in non_word.split(line):
            if two_vowels.match(word):
                print(word)
                out.write("{}\n".format(word.rstrip()))
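The translate(None, "aeiou") call in the first snippet of this answer does not work on Python 3 strings. A rough Python 3 equivalent, assuming a helper name (has_two_vowels) that is not part of the original answer, builds a deletion table with str.maketrans:

```python
# Translation table that deletes the five vowels (third argument of str.maketrans).
DROP_VOWELS = str.maketrans("", "", "aeiou")

def has_two_vowels(word):
    """True if word contains at least two vowels (a, e, i, o, u), ignoring case."""
    lowered = word.lower()
    return len(lowered) - len(lowered.translate(DROP_VOWELS)) > 1

# Sample words chosen for illustration only.
for w in ["tree", "sky", "a", "Audio", "2024"]:
    print(w, has_two_vowels(w))
```

The length comparison is the same idea as in the answer; only the way the vowels are removed changes.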

Because it is outside the if condition. This is what the code should look like:
for line in file:
    for word in nonword.split(line):
        if twovowels.match(word):
            print word
            text_file.write('\n' + word)
text_file.close()
file.close()
Here is a sample program on Tutorialspoint showing the code above is correct.

I would suggest an alternate, and simpler, method, not using re:
def twovowels(word):
    count = 0
    for char in word.lower():
        if char in "aeiou":
            count = count + 1
            if count > 1:
                return True
    return False
with open("FirstMondayArticle.html") as file, \
     open("twoVoweledWordList.txt", "w") as text_file:
    for line in file:
        for word in line.split():
            if twovowels(word):
                print word
                text_file.write(word + "\n")

counting characters and lines from a file python 2.7

I'm writing a program that counts all lines, words and characters from a file given as input.
import string
def main():
    print "Program determines the number of lines, words and chars in a file."
    file_name = raw_input("What is the file name to analyze? ")
    in_file = open(file_name, 'r')
    data = in_file.read()
    words = string.split(data)
    chars = 0
    lines = 0
    for i in words:
        chars = chars + len(i)
    print chars, len(words)
main()
To some extent, the code is ok.
I don't know however how to count 'spaces' in the file. My character counter counts only letters, spaces are excluded.
Plus I'm drawing a blank when it comes to counting lines.

You can just use len(data) for the character length.
You can split data by lines using the .splitlines() method, and length of that result is the number of lines.
But, a better approach would be to read the file line by line:
chars = words = lines = 0
with open(file_name, 'r') as in_file:
    for line in in_file:
        lines += 1
        words += len(line.split())
        chars += len(line)
Now the program will work even if the file is very large; it won't hold more than one line at a time in memory (plus a small buffer that python keeps to make the for line in in_file: loop a little faster).
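Both approaches described above can be compared side by side. The sketch below is written for Python 3 and uses a temporary file with made-up contents; the function names stats_whole and stats_streaming are placeholders, not part of the original answer.

```python
import os
import tempfile

def stats_whole(path):
    """Read the whole file at once, then count lines, words, characters."""
    with open(path, "r") as f:
        data = f.read()
    return len(data.splitlines()), len(data.split()), len(data)

def stats_streaming(path):
    """Count line by line, so only one line is held in memory at a time."""
    lines = words = chars = 0
    with open(path, "r") as f:
        for line in f:
            lines += 1
            words += len(line.split())
            chars += len(line)
    return lines, words, chars

# Write a small sample file (contents are illustrative).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("four score and seven\nyears ago\n")
    path = tmp.name

whole = stats_whole(path)
streaming = stats_streaming(path)
print(whole)
print(streaming)
os.remove(path)
```

For ordinary text both functions should agree; the streaming version is the one to prefer when the file may not fit in memory.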

Very Simple:
If you want to print the number of lines, words, and characters in the file, including spaces, I think this is the shortest answer (Python 2):
import string
data = open('diamond.txt', 'r').read()
print len(data.splitlines()), len(string.split(data)), len(data)
Keep coding buddies...

read file-
d=fp.readlines()
characters-
sum([len(i)-1 for i in d])
lines-
len(d)
words-
sum([len(i.split()) for i in d])

This is one crude way of counting words without using any keywords:
# count number of words in file
fp = open("hello1.txt", "r+")
data = fp.read()
word_count = 1
for i in data:
    if i == " ":
        word_count = word_count + 1
    # end if
# end for
print("number of words are:", word_count)
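Counting spaces is only accurate when words are separated by exactly one space and nothing else. As a quick illustration with a made-up string (not taken from any file in the thread), the two counts diverge as soon as there is a double space, a newline, or a trailing space:

```python
# Sample text with a double space, a newline, and a trailing space.
text = "one  two\nthree four "

# The crude method: start at 1 and add 1 for every space character.
crude = 1
for ch in text:
    if ch == " ":
        crude += 1

# str.split() with no arguments splits on any run of whitespace.
by_split = len(text.split())

print(crude)
print(by_split)
```

Here the crude count comes out higher than the number of words, and on text where words are separated only by newlines it would come out lower.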

Iterable error in word count Python 3.3 program

I'm trying to complete a simple word-count program, which keeps track of the number of words, characters, and lines in a connected file.
# This program counts the number of lines, words, and characters in a file, entered by the user.
# The file is test text from a standard lorem ipsum generator.
import string
def wc():
    # Sets the count of normal lines, words, and characters to 0 for proper iterative operation.
    lines = 0
    words = 0
    chars = 0
    print("This program will count the number of lines, words, and characters in a file.")
    # Stores a variable as a string for more graceful coding and no errors experienced previously.
    filename =("test.txt")
    # Opens file and stores it as new variable, and loops through each line once the connection with file is made.
    with open(filename, 'r') as fileObject:
        for l in fileObject:
            # Splits text file into each individual word for word count.
            words = l.split()
            lines += 1
            words += len(words)
            chars += len(l)
    print("Lines:", lines)
    print("Words:", words)
    print("Characters:", chars)
wc()
while 1:
    pass
Now, if all goes well, it should be printing the total number of lines, letters, and words in the file, but all I get is this message:
"words += len(words)
TypeError: 'int' object is not iterable
"
What is wrong?
SOLVED! New code:
# This program counts the number of lines, words, and characters in a file, entered by the user.
# The file is test text from a standard lorem ipsum generator.
import string
def wc():
    # Sets the count of normal lines, words, and characters to 0 for proper iterative operation.
    lines = 0
    words = 0
    chars = 0
    print("This program will count the number of lines, words, and characters in a file.")
    # Stores a variable as a string for more graceful coding and no errors experienced previously.
    filename =("test.txt")
    # Opens file and stores it as new variable, and loops through each line once the connection with file is made.
    with open(filename, 'r') as fileObject:
        for l in fileObject:
            # Splits text file into each individual word for word count.
            wordsFind = l.split()
            lines += 1
            words += len(wordsFind)
            chars += len(l)
    print("Lines:", lines)
    print("Words:", words)
    print("Characters:", chars)
wc()
while 1:
    pass

It looks like you're using the variable name words for your count, and also for the result of l.split(). You need to differentiate these by using different variable names for them.
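The failure can be reproduced in a few lines. In this sketch (the sample line is made up), words starts as an int, is rebound to the list returned by split(), and the following += then tries to extend that list with an int:

```python
line = "lorem ipsum dolor\n"

words = 0  # intended as the running count
try:
    words = line.split()  # rebinding: words is now a list, the count is lost
    words += len(words)   # list += int raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

# Using separate names keeps the counter an int.
word_count = 0
tokens = line.split()
word_count += len(tokens)
print(word_count)
```

This is why the message mentions an int not being iterable: list += expects something it can iterate over.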
