I am currently trying to count the characters per line within the first 1,000 characters of a txt file. My code counts the characters of the file and produces no error, but does not stop the count at 1,000. I know there needs to be a break to fix this problem, but I do not know what I'm doing wrong. This is my first post, and I would like to apologize in advance if I'm not being succinct or clear enough.
This keeps the values of characters at over 1,000 but does not print them:
with open('myfile', 'r') as f:
    characters = 0
    for lines in f.readlines():
        length = len(lines) - lines.count('\n')
        characters += sum(len(length) for length in lines)
        if characters >= 1000:
            break
            print('the number of characters in this line is: %s' % length)
            print('the total number of characters is: %s' % characters)
and this also keeps the values of characters at over 1,000 and prints them:
with open('myfile', 'r') as f:
    characters = 0
    for lines in f.readlines():
        length = len(lines) - lines.count('\n')
        characters += sum(len(length) for length in lines)
        print('the number of characters in this line is: %s' % length)
        print('the total number of characters is: %s' % characters)
with open('myfile', 'r') as f:
    ch = 0
    for line in f:
        length = len(line)
        ch = ch + length
        print('the number of characters in this line is: %d' % length)
        # Stop once the running total reaches 1,000 characters.
        if ch >= 1000:
            break
# No f.close() is needed: the with block closes the file.
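If the count should stop at exactly 1,000 rather than at the end of whichever line crosses it, one option is to cap the last line's contribution. This is only a sketch: the function name count_until and the limit parameter are made up for illustration, and it assumes the trailing newline should not be counted.

```python
def count_until(path, limit=1000):
    """Return (per-line counts, total), stopping once total reaches limit."""
    lengths = []
    total = 0
    with open(path, 'r') as f:
        for line in f:
            # Length of the line without its trailing newline.
            length = len(line.rstrip('\n'))
            # Count only as many characters as still fit under the limit.
            counted = min(length, limit - total)
            lengths.append(counted)
            total += counted
            if total >= limit:
                break
    return lengths, total
```

Calling count_until('myfile') gives back the list of per-line counts and the total, which you can then print however you like.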
Related
I want to open a file and read its first 10 lines. If the file has fewer than 10 lines, it should read as many lines as it has. Each line has to be numbered, whether it's text or whitespace. Because I have to strip each line, I can't differentiate between an empty string and the end of the file. For example, if I read a file with only three lines, it prints lines 1 - 10, with lines 4 - 10 being empty, but I would like it to stop after the 3rd line, as that is the end of the file. I would really appreciate any help, thank you.
def get_file_name():
    fileName = input('Input File Name: ')
    return fileName

def top(fileName):
    try:
        file = open(fileName, 'r')
        line = 'text'
        cnt = 1
        while cnt <= 10:
            if line != '':
                line = file.readline()
                line = line.rstrip('\n')
                print(str(cnt) + '.', line)
                cnt += 1
            else:
                line = file.readline()
                line = line.rstrip('\n')
                print(str(cnt) + '.', line)
                cnt += 1
        file.close()
    except IOError:
        print('FILE NOT FOUND ERROR:', fileName)

def main():
    fileName = get_file_name()
    top(fileName)

main()
def read_lines():
    f = open("file-name.txt","r")
    num = 1
    for line in f:
        if num > 10:
            break
        print("LINE NO.",num, ":",line)
        num = num + 1
    f.close()
Here, the loop exits at the end of the file. So if you only have 7 lines, it will exit automatically after the 7th line.
However, if you have 10 or more lines, the "num" variable stops the loop after the 10th.
EDIT: I have edited the print statement to include the line count as well and started the line count with 1.
with open(filename, 'r') as f:
    cnt = 1
    for line in f:
        if cnt <= 10:
            print(str(cnt) + '.', line, end='')
            cnt += 1
        else:
            break
This should do exactly what you need. You can always remove the if/else and then it will read exactly however many lines are in the file. Example:
with open(filename, 'r') as f:
    cnt = 1
    for line in f:
        print(str(cnt) + '.', line, end='')
        cnt += 1
You can try to load all the lines into a list, count them, and then loop over the smaller of 10 and that total, for example for i in range(min(10, len(lines))):, to print the lines.
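Another way to cap the read at 10 lines without a manual counter is itertools.islice, which simply stops early when the file is shorter. A minimal sketch, where numbered_head is a made-up helper name and the "1. text" format is copied from the question:

```python
from itertools import islice

def numbered_head(path, n=10):
    """Return up to the first n lines of the file, each prefixed with its number."""
    with open(path, 'r') as f:
        # islice stops after n lines, or earlier if the file runs out.
        return ['%d. %s' % (i, line.rstrip('\n'))
                for i, line in enumerate(islice(f, n), 1)]
```

Because the loop ends when the file does, a blank line in the middle is still numbered and is never confused with the end of the file.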
How do you count characters without spaces? I am not getting the right number. The right value of num_charsx is 1761.
num_words = 0
num_chars = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        words = line.split('\n')
        num_words += len(words)
        num_chars += len(line)
num_charsx = num_chars - line.count(' ')
print(num_charsx)
2064
words = line.split('\n')
num_words += len(words)
doesn't do what you think it does. In the loop
for line in f:
line is a string that ends in '\n', so line.split('\n') is a two-item list, with the first item containing all the characters of the line apart from the terminating '\n'; the second item in that list is the empty string. Example:
line = 'This is a test\n'
words = line.split('\n')
print(words, len(words))
output
['This is a test', ''] 2
So your num_words += len(words) doesn't actually count words, it just gets twice the count of the number of lines.
To get an actual list of the words in line you need
words = line.split()
Your penultimate line
num_charsx = num_chars - line.count(' ')
is outside the for loop so it subtracts the space count of the last line of the file from the total num_chars, but I assume you really want to subtract the total space count of the whole file from num_chars.
Here's a repaired version of your code.
num_words = 0
num_chars = 0
num_spaces = 0
with open(fname, 'r') as f:
    for num_lines, line in enumerate(f, 1):
        num_words += len(line.split())
        num_chars += len(line) - 1
        num_spaces += line.count(' ')
num_charsx = num_chars - num_spaces
print(num_lines, num_words, num_chars, num_spaces, num_charsx)
I've modified the line reading loop to use enumerate. That's an efficient way to get the line number and the line contents without having to maintain a separate line counter.
In num_chars += len(line) - 1 the -1 is so we don't include the terminating '\n' of each line in the char count.
Note that on Windows text file lines are (normally) terminated with '\r\n' but that terminator gets converted to '\n' when you read a file opened in text mode. So on Windows the actual byte size of the file is num_chars + 2 * num_lines, assuming the last line has a '\r\n' terminator; it may not, in which case the actual size will be 2 bytes less than that.
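To see the newline translation for yourself, here is a small sketch that writes raw bytes to a temporary file and compares the number of characters read in text mode with the size on disk. The helper name text_vs_bytes is made up, and it assumes plain ASCII content so that one character is one byte.

```python
import os
import tempfile

def text_vs_bytes(raw):
    """Write raw bytes to a temp file; return (chars read in text mode, bytes on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        # Default text mode translates '\r\n' into '\n' while reading.
        with open(path, 'r', encoding='ascii') as f:
            text = f.read()
        return len(text), os.path.getsize(path)
    finally:
        os.remove(path)
```

For example, text_vs_bytes(b'ab\r\ncd\r\n') reports 6 characters but 8 bytes: one extra byte per line for the '\r'.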
You may want to try splitting the lines on ' ' instead of '\n', since the for loop already splits the file on '\n'.
The other option if you just want a character count is you could just use the replace method to remove ' ' and then count the length of the string.
num_chars = len(line.replace(' ', ''))
You could also try this:
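num_chars = 0
num_spaces = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        text = line.split('\n')[0]  # the line without its trailing '\n'
        num_chars += len(text)
        num_spaces += text.count(' ')  # accumulate the spaces of every line
num_charsx = num_chars - num_spaces
print(num_charsx)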
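Putting the points from this thread together, the count can be kept per line so that no spaces are missed. This is just a sketch: count_non_space is a made-up name, and it assumes only the space character ' ' and the trailing newline are to be excluded (tabs would still be counted).

```python
def count_non_space(path):
    """Count characters in the file, ignoring spaces and line-ending newlines."""
    total = 0
    with open(path, 'r') as f:
        for line in f:
            text = line.rstrip('\n')
            # Subtract this line's spaces right away, inside the loop.
            total += len(text) - text.count(' ')
    return total
```

You would call it as count_non_space("C:/Python33/fire.txt") and print the result.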
I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')

#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))

#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()
filename = input("enter file name: ")
wordcount={}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            # First blank line after some text closes a paragraph.
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word,0) + 1
print(wordcount)
print(paragraphcount)
When you read a file, a cursor tracks the position you are currently reading from. Your code tries to read the file twice, and the empty result of the second read is the hint that something is wrong.
What is the correct way?
Read the file once, store the lines, then compute both the word count and the paragraph count from that stored data, rather than trying to read the file twice.
What is happening in the current code?
The first read (inf.read()) leaves the cursor at the end of the file. When you then loop over the lines, there is nothing left to read, so the loop body never runs. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. But rather than relying on this, you should focus on the read-once approach described in the first section.
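A rough sketch of the read-once idea: read the text a single time, then derive both results from the same string. The function name words_and_paragraphs is made up, and it assumes a paragraph is any run of non-blank lines separated by one or more blank lines.

```python
from collections import Counter

def words_and_paragraphs(text):
    """Return (Counter of words, number of paragraphs) for already-read text."""
    word_counts = Counter(text.split())
    paragraphs = 0
    in_paragraph = False
    for line in text.splitlines():
        if line.strip():
            # The first non-blank line after a gap starts a new paragraph.
            if not in_paragraph:
                paragraphs += 1
            in_paragraph = True
        else:
            in_paragraph = False
    return word_counts, paragraphs
```

You would read the file with something like text = open(filename).read() inside a with block, pass text in, and use word_counts.most_common(10) to list the most frequent words.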
This is what I have so far:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))
When executed, it returns the number of lines properly, but returns 0 for the word and character counts. Not sure why...
You can iterate through the file once and count lines, words and chars without seeking back to the beginning multiple times, which you would need to do with your approach because you exhaust the iterator when counting lines:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words)) # Returns True if duplicate words
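Or alternatively use the side effect that the file position is at the end after the loop. Note that for a text file tell() returns an opaque position rather than a guaranteed character count, so treat the result as approximate (it can differ with multi-byte encodings or '\r\n' line endings):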
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words)) # Returns True if duplicate words
You need to go back to the beginning of the file with infile.seek(0). After a read, the position is at the end; seek(0) resets it to the start so that you can read again.
infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:
line count: 2
word count: 19
character counter: 113
Another way of doing it:
from collections import Counter
from itertools import chain
infile = open('data')
lines = infile.readlines()
cnt_lines = len(lines)
words = list(chain.from_iterable([x.split() for x in lines]))
cnt_words = len(words)
cnt_chars = len([ c for word in words for c in word])
# show words frequency
print(Counter(words))
You have exhausted the iterator after your call to readlines. You can seek back to the start, but really you don't need to read the whole file into memory at all:
def stats(filename):
    chars, words, dupes = 0, 0, False
    seen = set()
    i = 0  # stays 0 for an empty file
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if not dupes:
                # A duplicate is a word seen on an earlier line
                # or repeated within this line.
                if len(set(spl)) < len(spl) or not seen.isdisjoint(spl):
                    dupes = True
                else:
                    seen.update(spl)
    return i, chars, words, dupes
Then assign the values by unpacking:
no_lines, no_chars, no_words, has_dupes = stats("your_file")
You may want to use chars += len(line.rstrip('\n')) if you don't want to include the line endings (a bare rstrip() would also drop trailing spaces and tabs). The code stores only the data it needs; using readlines, read, or dicts of the full data means your code won't be very practical for large files.
File_Name = 'file.txt'
line_count = 0
word_count = 0
char_count = 0
with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the list.
    data = fh.readlines()
    # Count of total number for list elements == total number of lines.
    line_count = len(data)
    for line in data:
        word_count = word_count + len(line.split())
        char_count = char_count + len(line)
print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)
I have a code that relies on me reading a text file, printing off the numbers where there are numbers, printing off specific error messages where there are strings instead of numbers, then summing ALL the numbers up and printing their sum (then saving ONLY the numbers to a new text file).
I have been attempting this problem for hours, and I have what is written below.
I do not know why my code does not seem to be summing up properly.
And the python code:
f=open("C:\\Users\\Emily\\Documents\\not_just_numbers.txt", "r")
s=f.readlines()
p=str(s)

for line in s:
    printnum=0
    try:
        printnum+=float(line)
        print("Adding:", printnum)
    except ValueError:
        print("Invalid Literal for Int() With Base 10:", ValueError)

for line in s:
    if p.isdigit():
        total=0
        for number in s:
            total+=int(number)
            print("The sum is:", total)
I have a code that relies on me reading a text file, printing off the
numbers where there are numbers, printing off specific error messages
where there are strings instead of numbers, then summing ALL the
numbers up and printing their sum (then saving ONLY the numbers to a
new text file).
So you have to do the following:
Print numbers
Print a message when there isn't a number
Sum the numbers and print the sum
Save only the numbers to a new file
Here is one approach:
total = 0
with open('input.txt', 'r') as inp, open('output.txt', 'w') as outp:
    for line in inp:
        try:
            num = float(line)
            total += num
            outp.write(line)
        except ValueError:
            print('{} is not a number!'.format(line))
print('Total of all numbers: {}'.format(total))
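This is a very short way to sum all numbers in your file. Note that the pattern '[0-9]+' only picks up runs of digits, so it ignores minus signs, splits decimals such as 3.5 into 3 and 5, and also picks up digits embedded in words; it does not report the non-numeric lines either, so you will still have to add try and except for that: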
import re
print(sum(float(num) for num in re.findall('[0-9]+', open("C:\\Users\\Emily\\Documents\\not_just_numbers.txt", 'r').read())))
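If you do want a regex version, one possible refinement is to match whole lines only, so that signs and decimals survive and digits inside words are skipped. This is a sketch under my own assumptions: the pattern, the name NUMBER_LINE and the function sum_numeric_lines are all made up for illustration, and exponent notation such as 1e3 is not handled.

```python
import re

# One optional minus sign, digits, and an optional decimal part,
# alone on a line apart from spaces or tabs.
NUMBER_LINE = re.compile(r'^[ \t]*(-?\d+(?:\.\d+)?)[ \t]*$', re.MULTILINE)

def sum_numeric_lines(text):
    """Sum every line of text that consists of a single number."""
    return sum(float(match) for match in NUMBER_LINE.findall(text))
```

For text made of the lines 3.5, -2, abc7 and 10, the simple '[0-9]+' pattern finds '3', '5', '2' and '7', while the function above adds up 3.5, -2 and 10.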
You are checking the wrong condition:
for line in s:
if p.isdigit():
p is this:
s=f.readlines()
p=str(s)
Being a stringified version of a list, it will start with a '[', and hence p.isdigit() will always be false. You instead want to check each line (stripped, because the trailing '\n' also makes isdigit() false), and you want to only initialise total once instead of each time around the loop:
total = 0
for line in f:
    if line.strip().isdigit():
        total += int(line)
Note that by iterating over f directly, you also don't need to ever call readlines().
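One thing to watch with isdigit(): a line read from a file still carries its trailing '\n', and isdigit() is also false for negative numbers and decimals. A small sketch of the alternative, letting float() decide (parse_number is a made-up helper name):

```python
def parse_number(line):
    """Return the line as a float, or None if it is not a number."""
    try:
        # float() ignores surrounding whitespace, including the trailing '\n'.
        return float(line)
    except ValueError:
        return None

# isdigit() is stricter than it looks:
checks = ['42\n'.isdigit(), '42\n'.strip().isdigit(), '-3'.isdigit(), '3.5'.isdigit()]
```

Here checks comes out as [False, True, False, False], whereas parse_number accepts '42\n', '-3\n' and '3.5\n' and returns None for 'hello\n'.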
Here is what you can do:
data.txt:
1
2
hello
3
world
4
code:
total = 0
with open('data.txt') as infile:
    with open('results.txt', 'w') as outfile:
        for line in infile:
            try:
                num = int(line)
                total += num
                print(num, file=outfile)
            except ValueError:
                print(
                    "'{}' is not a number".format(line.rstrip())
                )
print(total)
--output:--
'hello' is not a number
'world' is not a number
10
$ cat results.txt
1
2
3
4
you can also try this:
f=open("C:\\Users\\Emily\\Documents\\not_just_numbers.txt", "r")
ww=open("C:\\Users\\Emily\\Documents\\not_just_numbers_out.txt", "w")
s=f.readlines()
p=str(s)
for line in s:
    #printnum=0
    try:
        #printnum+=float(line)
        print("Adding:", float(line))
        ww.write(line)
    except ValueError:
        print("Invalid Literal for Int() With Base 10:", ValueError)
total=0
for line in s:
    if line.strip().isdigit():
        total += int(line)
print("The sum is:", total)
# Close both files so the output is flushed to disk.
f.close()
ww.close()
here str.strip([chars]) means
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
Every time you enter a new line you reset the total to zero if the number is a digit.
You might want your total to initialize before you enter the loop.
I tried debugging the for loop using isdigit and isalpha.
Apparently a line is never considered a digit or alphabetic string, so these always evaluate to false (each line still ends with its '\n').
As it turns out, you don't need the second for loop; you've done most of the program with your try/except statement.
Here's how I did it on my system.
f = open("/home/david/Desktop/not_just_numbers.txt", 'r')
s = f.readlines()
p = str(s)
total = 0
for line in s:
    #print(int(line))
    printnum = 0
    try:
        printnum += float(line)
        total += printnum
        #print("Adding: ", printnum)
    except ValueError:
        print("Invalid Literal for Int() With Base 10:", ValueError)
print("The sum is: ", total)
$ echo -e '1\n2\n3\n4\n5' | python -c "import sys; print(sum(int(l) for l in sys.stdin))"
Read from file containing numbers separated by new lines:
total = 0
with open("file_with_numbers.txt", "r") as f:
    for line in f:
        total += int(line)
print(total)