How do you count characters without spaces? I am not getting the right number. The correct value of num_charsx is 1761.
num_words = 0
num_chars = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        words = line.split('\n')
        num_words += len(words)
        num_chars += len(line)
num_charsx = num_chars - line.count(' ')
print(num_charsx)
2064
words = line.split('\n')
num_words += len(words)
doesn't do what you think it does. In the loop
for line in f:
line is a string that ends in '\n', so line.split('\n') is a two-item list, with the first item containing all the characters of the line apart from the terminating '\n'; the second item in that list is the empty string. Example:
line = 'This is a test\n'
words = line.split('\n')
print(words, len(words))
output
['This is a test', ''] 2
So your num_words += len(words) doesn't actually count words, it just gets twice the count of the number of lines.
To get an actual list of the words in line you need
words = line.split()
Your penultimate line
num_charsx = num_chars - line.count(' ')
is outside the for loop so it subtracts the space count of the last line of the file from the total num_chars, but I assume you really want to subtract the total space count of the whole file from num_chars.
Here's a repaired version of your code.
num_words = 0
num_chars = 0
num_spaces = 0
with open(fname, 'r') as f:
    for num_lines, line in enumerate(f, 1):
        num_words += len(line.split())
        num_chars += len(line) - 1
        num_spaces += line.count(' ')
num_charsx = num_chars - num_spaces
print(num_lines, num_words, num_chars, num_spaces, num_charsx)
I've modified the line reading loop to use enumerate. That's an efficient way to get the line number and the line contents without having to maintain a separate line counter.
In num_chars += len(line) - 1 the -1 is so we don't include the terminating '\n' of each line in the char count.
Note that on Windows, text file lines are (normally) terminated with '\r\n', but that terminator is converted to '\n' when you read a file opened in text mode. So on Windows the actual byte size of the file is num_chars + 2 * num_lines, assuming a single-byte encoding and that the last line has a '\r\n' terminator. It may not, in which case the - 1 in the loop also drops one real character from that last line, so the actual size will be 1 byte less than that formula gives.
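The relationship between the character count and the byte size can be checked with a small experiment. This is only a sketch: the temporary file, its contents and the ASCII encoding are assumptions made for illustration, not part of the original question.

```python
import os
import tempfile

# Write two lines with Windows-style terminators.
# newline='' disables newline translation, so '\r\n' is written as-is.
with tempfile.NamedTemporaryFile('w', newline='', encoding='ascii',
                                 suffix='.txt', delete=False) as tmp:
    tmp.write('abc\r\nde\r\n')
    path = tmp.name

# Text mode with the default newline handling converts '\r\n' to '\n'.
with open(path, 'r', encoding='ascii') as f:
    lines = list(f)

num_lines = len(lines)
num_chars = sum(len(line) - 1 for line in lines)  # exclude each '\n'
byte_size = os.path.getsize(path)
os.remove(path)

print(lines)
print(num_chars, num_lines, byte_size)
```

Here every line has a terminator, so byte_size equals num_chars + 2 * num_lines.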
You may want to try splitting the lines on ' ' instead of '\n', since the for loop already splits the file at each '\n'.
The other option, if you just want a character count, is to use the replace method to remove ' ' and then take the length of the resulting string.
num_chars = len(line.replace(' ', ''))
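A minimal sketch of the replace idea applied to every line rather than a single one. The function name count_non_space_chars and the decision to drop each line's terminating newline are assumptions made for illustration.

```python
def count_non_space_chars(lines):
    """Count characters, excluding spaces and each line's trailing newline.

    `lines` is any iterable of strings, for example an open text file.
    """
    total = 0
    for line in lines:
        # Drop the trailing newline first, then remove the spaces.
        total += len(line.rstrip('\n').replace(' ', ''))
    return total


# A list of lines stands in for an open file here.
sample = ['This is a test\n', 'two  spaces\n', 'no newline']
print(count_non_space_chars(sample))
```

With a real file, the same function can be called as count_non_space_chars(f) inside a with open(...) as f: block.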
You could also try this:
num_chars = 0
num_spaces = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        num_chars += len(line.split('\n')[0])
        num_spaces += line.count(' ')  # count spaces on every line, not just the last
num_charsx = num_chars - num_spaces
print(num_charsx)
Related
I am currently trying to count the characters per line and the first 1,000 characters of a txt file. My code counts the characters of the file and produces no error, but does not stop the count at 1,000. I know there needs to be a break to fix this problem, but I do not know what I'm doing wrong. This is my first post and would like to apologize in advance if I'm not being succinct or clear enough.
This keeps the values of characters at over 1,000 but does not print them:
with open('myfile', 'r') as f:
    characters = 0
    for lines in f.readlines():
        length = len(lines) - lines.count('\n')
        characters += sum(len(length) for length in lines)
        if characters >= 1000:
            break
        print('the number of characters in this line is: %s' % length)
        print('the total number of characters is: %s' % characters)
and this also keeps the values of characters at over 1,000 and prints them:
with open('myfile', 'r') as f:
    characters = 0
    for lines in f.readlines():
        length = len(lines) - lines.count('\n')
        characters += sum(len(length) for length in lines)
        print('the number of characters in this line is: %s' % length)
        print('the total number of characters is: %s' % characters)
with open('myfile', 'r') as f:
    ch = 0
    for line in f.readlines():
        length = len(line)
        ch = ch + len(line)
        print('the number of characters in this line is: %d' % length)
        # stop once the running total reaches 1,000 characters
        if ch >= 1000:
            break
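If the count should stop at exactly 1,000 rather than overshoot on the last line, one option is to cap how much of each line is counted. This is a sketch only: the function name count_up_to and the choice to ignore newlines are assumptions, not something stated in the question.

```python
def count_up_to(lines, limit=1000):
    """Return (per-line counts, total), stopping once `limit` characters are counted."""
    lengths = []
    total = 0
    for line in lines:
        length = len(line.rstrip('\n'))
        # Only take as many characters as still fit under the limit.
        taken = min(length, limit - total)
        lengths.append(taken)
        total += taken
        if total >= limit:
            break
    return lengths, total


# Three lines of 600, 600 and 10 characters stand in for a file.
lengths, total = count_up_to(['a' * 600 + '\n', 'b' * 600 + '\n', 'c' * 10 + '\n'])
print(lengths, total)
```

The second line contributes only the 400 characters that fit, and the third line is never read.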
I want to write a program file_stats.py that when run on the command line, accepts a text file name as an argument and outputs the number of characters, words, lines, and the length (in characters) of the longest line in the file. Does anyone know the proper syntax to do something like this if I want the output to look like this:
Characters: 553
Words: 81
Lines: 21
Longest line: 38
Assuming your file path is a string, something like this should work
file = "pathtofile.txt"
with open(file, "r") as f:
    text = f.read()
lines = text.split("\n")
longest_line = 0
for l in lines:
    if len(l) > longest_line:
        longest_line = len(l)
print("Longest line: {}".format(longest_line))
The whole program
n_chars = 0
n_words = 0
n_lines = 0
longest_line = 0
with open('my_text_file') as f:
    lines = f.readlines()
# Find the number of Lines
n_lines = len(lines)
# Find the Longest line (the trailing newline is not counted)
longest_line = max([len(line.rstrip('\n')) for line in lines], default=0)
# Find the number of Words
words = []
line_words = [line.split() for line in lines]
for line in line_words:
    for word in line:
        words.append(word)
n_words = len(words)
# Find the number of Characters (whitespace is not counted)
chars = []
line_chars = [list(word) for word in words]
for line in line_chars:
    for char in line:
        chars.append(char)
n_chars = len(chars)
print("Characters: ", n_chars)
print("Words: ", n_words)
print("Lines: ", n_lines)
print("Longest: ", longest_line)
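The question also asks for the file name to be taken from the command line, which could be sketched with sys.argv as below. The names stats_from_lines and main are hypothetical, and the question does not say whether "characters" includes whitespace; this version counts spaces but not newlines, which is an assumption.

```python
import sys


def stats_from_lines(lines):
    """Return (characters, words, lines, longest line) for an iterable of lines."""
    n_chars = n_words = n_lines = longest = 0
    for line in lines:
        text = line.rstrip('\n')      # assumption: newlines are not counted
        n_lines += 1
        n_chars += len(text)
        n_words += len(text.split())
        longest = max(longest, len(text))
    return n_chars, n_words, n_lines, longest


def main(argv):
    """Print the statistics for the file named in argv[1]."""
    if len(argv) != 2:
        print("usage: python file_stats.py FILENAME")
        return
    try:
        with open(argv[1]) as f:
            n_chars, n_words, n_lines, longest = stats_from_lines(f)
    except OSError as exc:
        print("cannot read file:", exc)
        return
    print("Characters:", n_chars)
    print("Words:", n_words)
    print("Lines:", n_lines)
    print("Longest line:", longest)


if __name__ == "__main__":
    main(sys.argv)
```

Saved as file_stats.py, this would be run as python file_stats.py followed by the name of the text file.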
I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()
filename = raw_input("enter file name: ")
wordcount={}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word,0) + 1
print wordcount
print paragraphcount
When you read a file, a cursor tracks which byte you are currently reading. Your code tries to read the file twice and ran into strange behavior, which should have been a hint that something was wrong. On to the solution.
What is the correct way?
You should read the file once, store every line, and then compute the word count and the paragraph count from that same store, rather than trying to read the file twice.
What is happening in the current code?
When you first read the file, the cursor ends up at the end of the file. When you then try to read lines, you get nothing back because the read starts at the end of the file. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. But instead of this, you should focus on implementing the method I mentioned in the first section.
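The read-once approach could look roughly like the sketch below. The function name words_and_paragraphs is hypothetical, and treating a paragraph as a run of non-blank lines is an assumption about what the question means by "paragraph".

```python
from collections import Counter


def words_and_paragraphs(lines):
    """Return (Counter of words, number of paragraphs) from an iterable of lines.

    A paragraph is assumed to be a run of consecutive non-blank lines.
    """
    wordcount = Counter()
    paragraphs = 0
    in_paragraph = False
    for line in lines:
        words = line.split()
        if words:
            if not in_paragraph:
                paragraphs += 1   # first non-blank line of a new paragraph
            in_paragraph = True
            wordcount.update(words)
        else:
            in_paragraph = False  # a blank line ends the current paragraph
    return wordcount, paragraphs


# With a real file, `lines` would come from a single f.readlines() call.
lines = ['one two\n', 'two three\n', '\n', 'four\n']
wordcount, paragraphs = words_and_paragraphs(lines)
print(wordcount.most_common(1), paragraphs)
```

Both results come from one pass over the stored lines, so the file position never has to be reset.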
This is what I have so far:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))
When executed, it reports the number of lines properly, but reports 0 for the word and character counts. Not sure why...
You can iterate through the file once and count lines, words and chars without seeking back to the beginning multiple times, which you would need to do with your approach because you exhaust the iterator when counting lines:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
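Alternatively, use the side effect that the file position is at the end after the loop to get chars. Note that in text mode tell() returns an opaque position, in practice usually a byte offset, so it matches the character count only for single-byte encodings with '\n' line endings: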
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
You need to go back to the beginning of the file with infile.seek(0). After a read, the position is at the end of the file; seek(0) resets it to the start so that you can read again.
infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:
line count: 2
word count: 19
character counter: 113
Another way of doing it:
from collections import Counter
from itertools import chain
infile = open('data')
lines = infile.readlines()
cnt_lines = len(lines)
words = list(chain.from_iterable([x.split() for x in lines]))
cnt_words = len(words)
cnt_chars = len([ c for word in words for c in word])
# show words frequency
print(Counter(words))
You have exhausted the iterator after your call to readlines. You can seek back to the start, but you really don't need to read the whole file into memory at all:
def stats(filename):
    chars, words, dupes = 0, 0, False
    seen = set()
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if dupes or not seen.isdisjoint(spl):
                dupes = True
            elif not dupes:
                seen.update(spl)
    return i, chars, words, dupes
Then assign the values by unpacking:
no_lines, no_chars, no_words, has_dupes = stats("your_file")
You may want to use chars += len(line.rstrip()) if you don't want to include the line endings (note that rstrip() also drops trailing spaces and tabs; line.rstrip("\n") removes only the newline). The code stores only the data it needs. Using readlines, read, dicts of the full data, etc. means your code won't be very practical for large files.
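The difference between the stripping options can be seen on a single line. This is a small illustration with a made-up sample string, not part of the original answer.

```python
line = 'two words   \n'  # three trailing spaces before the newline

with_newline = len(line)                  # counts everything
without_newline = len(line.rstrip('\n'))  # drops only the line ending
without_trailing_ws = len(line.rstrip())  # also drops the trailing spaces

print(with_newline, without_newline, without_trailing_ws)
```

Which of the three is the "right" character count depends on whether trailing whitespace should count.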
File_Name = 'file.txt'
line_count = 0
word_count = 0
char_count = 0
with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the list.
    data = fh.readlines()
    # Count of total number for list elements == total number of lines.
    line_count = len(data)
    for line in data:
        word_count = word_count + len(line.split())
        char_count = char_count + len(line)
print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)
I am not able to strip the spaces and newlines. Any idea what might have gone wrong?
line_count = 0
word_count = 0
for fline in fh:
    line = repr(fline)
    line = line.strip()
    print line
    line_count += 1
    word_count += len(line.split())
result['size'] = filesize
result['line'] = line_count
result['words'] = word_count
output
'value of $input if it is\n'
' larger than or equal to ygjhg\n'
' that number. Otherwise assigns the value of \n'
' \n'
' '
Your strings are surrounded by literal quote characters because of repr(), so the newline is no longer at the end of the string:
>>> x = 'hello\n'
>>> repr(x)
"'hello\\n'"
>>> repr(x).strip()
"'hello\\n'"
>>>
Here is your edited code:
line_count = 0
word_count = 0
for fline in fh:
    line = repr(fline.strip())  # strip fline first, then take the repr
    print line
    line_count += 1
    word_count += len(line.split())
result['size'] = filesize
result['line'] = line_count
result['words'] = word_count
If fline is a string, then calling repr with it as the argument would enclose it in literal quotes. Thus:
foo\n
becomes
"foo\n"
Since the newline isn't at the end of the string anymore, strip won't remove it. Maybe consider not calling repr unless you desperately need to, or calling it after calling strip.
From what the others have mentioned, just change
line = repr(fline)
line = line.strip()
to
line = fline.strip()
line = repr(line)
Note that you might want .rstrip() or even .rstrip("\n") instead.
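The effect of the call order can be checked directly. This is a small sketch in Python 3 syntax with a made-up sample line, whereas the question's code uses Python 2 print statements.

```python
fline = '  hello world \n'

# repr first: the result starts and ends with a quote character, so
# strip() finds no whitespace at either end and removes nothing.
wrong = repr(fline).strip()

# strip first, then repr: the whitespace is gone before the quotes are added.
right = repr(fline.strip())

print(wrong)
print(right)
```

The first value still contains the leading spaces and the escaped newline; the second is the quoted, stripped text.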