counting characters and lines from a file python 2.7 - python

I'm writing a program that counts all lines, words and characters from a file given as input.
import string
def main():
    print "Program determines the number of lines, words and chars in a file."
    file_name = raw_input("What is the file name to analyze? ")
    in_file = open(file_name, 'r')
    data = in_file.read()
    words = string.split(data)
    chars = 0
    lines = 0
    for i in words:
        chars = chars + len(i)
    print chars, len(words)
main()
To some extent, the code is ok.
I don't know however how to count 'spaces' in the file. My character counter counts only letters, spaces are excluded.
Plus I'm drawing a blank when it comes to counting lines.

You can just use len(data) for the character length.
You can split data by lines using the .splitlines() method, and length of that result is the number of lines.
But, a better approach would be to read the file line by line:
chars = words = lines = 0
with open(file_name, 'r') as in_file:
    for line in in_file:
        lines += 1
        words += len(line.split())
        chars += len(line)
Now the program will work even if the file is very large; it won't hold more than one line at a time in memory (plus a small buffer that python keeps to make the for line in in_file: loop a little faster).
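As a minimal sketch of the line-by-line approach, here is the same loop wrapped in a function. It is written for Python 3, whereas the question targets 2.7, and the function name `count_file` and the returned tuple are illustrative choices rather than part of the answer.

```python
def count_file(path):
    """Return (lines, words, chars) for the text file at path."""
    lines = words = chars = 0
    with open(path, 'r') as in_file:
        for line in in_file:
            lines += 1
            words += len(line.split())  # split() with no argument splits on any whitespace
            chars += len(line)          # counts spaces and the trailing newline too
    return lines, words, chars
```

Returning the counts instead of printing them keeps the function easy to reuse; the caller can format the output however it likes.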

Very simple:
If you want to print the number of lines, words and characters in the file, including spaces, I feel mine is the shortest answer:
import string
data = open('diamond.txt', 'r').read()
print len(data.splitlines()), len(string.split(data)), len(data)
Keep coding buddies...

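Read the file:
d = fp.readlines()
Characters (not counting the newline at the end of each line; this assumes every line, including the last, ends with one):
sum([len(i) - 1 for i in d])
Lines:
len(d)
Words:
sum([len(i.split()) for i in d])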

This is one crude way of counting words without using any keywords:
#count number of words in file
fp = open("hello1.txt", "r+")
data = fp.read()
word_count = 1
for i in data:
    if i == " ":
        word_count = word_count + 1
    # end if
# end for
print ("number of words are:", word_count)

Related

Calculating Statistics From File

Write a function named file_stats that takes one string parameter (in_file) that is the name of an existing text file. The function file_stats should calculate three statistics about in_file: the number of lines it contains, the number of words and the number of characters, and print the three statistics on separate lines. For example, the following would be correct input and output. (Hint: the number of characters may vary depending on what platform you are working on.)
file_stats('created_equal.txt')
lines 2
words 13
characters 72
Below is what I have:
fileName = "C:\Users\Jeff Hardy\Desktop\index.txt"
chars = 0
words = 0
lines = 0
def file_stats(in_file):
    global lines, words, chars
    with open(in_file, 'r') as fd:
        for line in fd:
            lines += 1
            wordsList = line.split()
            words += len(wordsList)
            for word in wordsList:
                chars += len(word)
file_stats(fileName)
print("Number of lines: {0}".format(lines))
print("Number of words: {0}".format(words))
print("Number of chars: {0}".format(chars))
The code is giving me the following error:
(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The error comes from the path string, not from the encoding of your file: in a normal string literal, \U starts a Unicode escape. Escape the backslashes (or use a raw string, r"..."):
fileName = "C:\\Users\\Jeff Hardy\\Desktop\\index.txt"
Also, the instructions ask you to print within the function rather than modify global variables, and the counts need to be updated within the loop, not after it (indentation matters):
def file_stats(in_file):
    lines = words = chars = 0
    with open(in_file, 'r', encoding="utf-8") as fd:
        for line in fd:
            lines += 1
            words += len(line.split()) # If you split "x , y" is the comma a word?
            chars += len(line) # Are spaces considered a character?
    print("lines {0}".format(lines))
    print("words {0}".format(words))
    print("characters {0}".format(chars))
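To illustrate the path fix, here is a short sketch. The path is a made-up example, not the asker's. In a Python 3 string literal, `\U` begins an eight-digit Unicode escape, so a Windows path either needs doubled backslashes or has to be written as a raw string. Both spellings produce the same string.

```python
# Hypothetical Windows path, for illustration only
escaped = "C:\\Users\\someone\\Desktop\\index.txt"  # backslashes doubled
raw = r"C:\Users\someone\Desktop\index.txt"         # raw string: backslashes kept literally

print(escaped == raw)
```

Forward slashes are another option that Windows generally accepts in file paths, but the two forms above stay closest to the original code.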

trying to print to a text file with words that only have two or more occurring vowels

import re
twovowels=re.compile(r".*[aeiou].*[aeiou].*", re.I)
nonword=re.compile(r"\W+", re.U)
text_file = open("twoVoweledWordList.txt", "w")
file = open("FirstMondayArticle.html","r")
for line in file:
    for word in nonword.split(line):
        if twovowels.match(word): print word
        text_file.write('\n' + word)
text_file.close()
file.close()
This is my Python code. I am trying to print only the words that have two or more vowels. When I run this code, it writes everything to my text file, including the words and numbers that do not have vowels, while the Python shell correctly shows only the words that have two or more vowels. How do I change that?
You can remove the vowels with str.translate and compare lengths. If after removing the letters the length difference is > 1 you have at least two vowels:
with open("FirstMondayArticle.html") as f, open("twoVoweledWordList.txt", "w") as out:
    for line in f:
        for word in line.split():
            # Python 2 only: translate(None, chars) deletes the given characters
            if len(word) - len(word.lower().translate(None,"aeiou")) > 1:
                out.write("{}\n".format(word.rstrip()))
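The two-argument `translate(None, "aeiou")` call only works on Python 2 byte strings. A rough Python 3 equivalent, assuming the same idea of deleting the vowels and comparing lengths, might look like the sketch below; the helper name `has_two_vowels` is made up for this example.

```python
# Translation table that deletes the five vowels (third argument of str.maketrans)
_DROP_VOWELS = str.maketrans("", "", "aeiou")

def has_two_vowels(word):
    """True if word contains at least two of a, e, i, o, u, ignoring case."""
    lowered = word.lower()
    stripped = lowered.translate(_DROP_VOWELS)
    return len(lowered) - len(stripped) > 1

print([w for w in ["Apple", "sky", "cat", "AUDIO"] if has_two_vowels(w)])
```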
In your own code you always write the word, because text_file.write('\n' + word) is outside the if block. This is a good lesson in why you should not put multiple statements on one line. Your code is equivalent to:
if twovowels.match(word):
    print(word)
text_file.write('\n' + word) # <- outside the if
Your code with the if in the correct location, some changes to your naming convention, adding some spaces between assignments and using with which closes your files for you:
import re
with open("FirstMondayArticle.html") as f, open("twoVoweledWordList.txt", "w") as out:
    two_vowels = re.compile(r".*[aeiou].*[aeiou].*", re.I)
    non_word = re.compile(r"\W+", re.U)
    for line in f:
        for word in non_word.split(line):
            if two_vowels.match(word):
                print(word)
                out.write("{}\n".format(word.rstrip()))
Because the write call is outside of the if condition. This is what the code should look like:
for line in file:
    for word in nonword.split(line):
        if twovowels.match(word):
            print word
            text_file.write('\n' + word)
text_file.close()
file.close()
Here is a sample program on Tutorialspoint showing the code above is correct.
I would suggest an alternate, and simpler, method, not using re:
def twovowels(word):
    count = 0
    for char in word.lower():
        if char in "aeiou":
            count = count + 1
            if count > 1:
                return True
    return False
# The backslash continues the with statement onto the next line
with open("FirstMondayArticle.html") as file, \
        open("twoVoweledWordList.txt", "w") as text_file:
    for line in file:
        for word in line.split():
            if twovowels(word):
                print word
                text_file.write(word + "\n")

Reading a very large file word by word in Python

I have some pretty large text files (>2g) that I would like to process word by word. The files are space-delimited text files with no line breaks (all words are in a single line). I want to take each word, test if it is a dictionary word (using enchant), and if so, write it to a new file.
This is my code right now:
with open('big_file_of_words', 'r') as in_file:
    with open('output_file', 'w') as out_file:
        words = in_file.read().split(' ')
        for word in words:
            if d.check(word) == True:
                out_file.write("%s " % word)
I looked at lazy method for reading big file in python, which suggests using yield to read in chunks, but I am concerned that using chunks of predetermined size will split words in the middle. Basically, I want chunks to be as close to the specified size while splitting only on spaces. Any suggestions?
Combine the last word of one chunk with the first of the next:
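def read_words(filename):
    last = ""
    with open(filename) as inp:
        while True:
            buf = inp.read(10240)
            if not buf:
                break
            words = (last + buf).split()
            # Hold back the final word only if the chunk may have cut it in half
            if words and not buf[-1].isspace():
                last = words.pop()
            else:
                last = ""
            for word in words:
                yield word
    if last:
        yield last
# The output file must be opened for writing
with open('output.txt', 'w') as output:
    for word in read_words('input.txt'):
        if check(word):
            output.write("%s " % word)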
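One way to check that chunk boundaries never split a word is to run the same idea on an in-memory stream with a deliberately tiny chunk size. This is a sketch for Python 3; the name `iter_words` and the `chunk_size` parameter are assumptions added for the test and are not part of the answer.

```python
import io

def iter_words(stream, chunk_size=10240):
    """Yield whitespace-separated words from a text stream, chunk_size characters at a time."""
    tail = ""
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        words = (tail + buf).split()
        # The last word may continue in the next chunk unless this chunk ended on whitespace
        if words and not buf[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        yield from words
    if tail:
        yield tail

sample = io.StringIO("alpha beta  gamma delta")
print(list(iter_words(sample, chunk_size=4)))
```

With a chunk size of 4, every word in the sample straddles at least one boundary, so the output should match a plain `split()` of the whole string.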
You might be able to get away with something similar to an answer on the question you've linked to, but combining re and mmap, eg:
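import mmap
import re
with open('big_file_of_words', 'r') as in_file, open('output_file', 'w') as out_file:
    mf = mmap.mmap(in_file.fileno(), 0, access=mmap.ACCESS_READ)
    # On Python 3 the pattern must be bytes (rb'\w+') because mmap exposes bytes
    for match in re.finditer(r'\w+', mf):
        word = match.group()
        # do something with word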
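A self-contained sketch of the mmap approach for Python 3 follows. It assumes words are runs of non-whitespace bytes and simply counts them; the function name `count_words_mmap` is hypothetical, and a real program would run its dictionary check where the counter is incremented.

```python
import mmap
import re

def count_words_mmap(path):
    """Count runs of non-whitespace bytes in the file at path using a read-only memory map."""
    total = 0
    with open(path, 'rb') as in_file:
        # Length 0 maps the whole file; note that an empty file cannot be mapped
        with mmap.mmap(in_file.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            for match in re.finditer(rb'\S+', mf):
                word = match.group()  # bytes; decode it if text is needed
                total += 1
    return total
```

The operating system pages the file in on demand, so the whole file does not have to be read into a Python string first.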
Fortunately, Petr Viktorin has already written code for us. The following code reads a chunk from a file, then yields each word it contains. A word that spans two chunks is handled as well.
# Wrapped in a generator function so that the yield statements are valid
def read_words(input_file):
    line = ''
    while True:
        word, space, line = line.partition(' ')
        if space:
            # A word was found
            yield word
        else:
            # A word was not found; read a chunk of data from file
            next_chunk = input_file.read(1000)
            if next_chunk:
                # Add the chunk to our line
                line = word + next_chunk
            else:
                # No more data; yield the last word and return
                yield word.rstrip('\n')
                return
https://stackoverflow.com/a/7745406/143880

Iterable error in word count Python 3.3 program

I'm trying to complete a simple word-count program, which keeps track of the number of words, characters, and lines in a connected file.
# This program counts the number of lines, words, and characters in a file, entered by the user.
# The file is test text from a standard lorem ipsum generator.
import string
def wc():
    # Sets the count of normal lines, words, and characters to 0 for proper iterative operation.
    lines = 0
    words = 0
    chars = 0
    print("This program will count the number of lines, words, and characters in a file.")
    # Stores a variable as a string for more graceful coding and no errors experienced previously.
    filename =("test.txt")
    # Opens file and stores it as new variable, and loops through each line once the connection with file is made.
    with open(filename, 'r') as fileObject:
        for l in fileObject:
            # Splits text file into each individual word for word count.
            words = l.split()
            lines += 1
            words += len(words)
            chars += len(l)
    print("Lines:", lines)
    print("Words:", words)
    print("Characters:", chars)
wc()
while 1:
    pass
Now, if all goes well, it should be printing the total number of lines, letters, and words in the file, but all I get is this message:
"words += len(words)
TypeError: 'int' object is not iterable
"
What is wrong?
SOLVED! New code:
# This program counts the number of lines, words, and characters in a file, entered by the user.
# The file is test text from a standard lorem ipsum generator.
import string
def wc():
    # Sets the count of normal lines, words, and characters to 0 for proper iterative operation.
    lines = 0
    words = 0
    chars = 0
    print("This program will count the number of lines, words, and characters in a file.")
    # Stores a variable as a string for more graceful coding and no errors experienced previously.
    filename =("test.txt")
    # Opens file and stores it as new variable, and loops through each line once the connection with file is made.
    with open(filename, 'r') as fileObject:
        for l in fileObject:
            # Splits text file into each individual word for word count.
            wordsFind = l.split()
            lines += 1
            words += len(wordsFind)
            chars += len(l)
    print("Lines:", lines)
    print("Words:", words)
    print("Characters:", chars)
wc()
while 1:
    pass
It looks like you're using the variable name words for your count, and also for the result of l.split(). You need to differentiate these by using different variable names for them.
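A small sketch of why the original version fails (the helper name `demo_name_clash` is invented for this illustration): once `words` is rebound to the list returned by `split()`, the statement `words += len(words)` tries to extend a list with an integer, which raises the `TypeError` quoted in the question.

```python
def demo_name_clash(line):
    """Reproduce the error caused by reusing one name for a counter and a list."""
    words = 0                # intended as the running word count
    words = line.split()     # the name now refers to a list instead
    try:
        words += len(words)  # list += int: Python tries to iterate over the int
    except TypeError as exc:
        return str(exc)
    return None

print(demo_name_clash("lorem ipsum dolor"))
```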

Two simple questions about python

I have 2 simple questions about Python:
1. How do I get the number of lines of a file in Python?
2. How do I easily move the position of a file object to the last line?
Lines are just data delimited by the newline char '\n'.
1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:
count = 0
for line in open('myfile'):
    count += 1
print count, line # it will be the last line
2) Reading a chunk from the end of the file is the fastest method to find the last newline char.
import os
def seek_newline_backwards(file_obj, eol_char='\n', buffer_size=200):
    if not file_obj.tell():
        return # already at the beginning of the file
    # All lines end with \n, including the last one, so assuming we are just
    # after one end of line char
    file_obj.seek(-1, os.SEEK_CUR)
    while file_obj.tell():
        amount = min(buffer_size, file_obj.tell())
        file_obj.seek(-amount, os.SEEK_CUR)
        data = file_obj.read(amount)
        eol_pos = data.rfind(eol_char)
        if eol_pos != -1:
            file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
            break
        file_obj.seek(-len(data), os.SEEK_CUR)
You can use that like this:
f = open('some_file.txt')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print f.tell(), repr(f.readline())
Let's not forget
f = open("myfile.txt")
lines = f.readlines()
numlines = len(lines)
lastline = lines[-1]
NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.
The easiest way is simply to read the file into memory. eg:
f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]
However for big files, this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line. eg:
f = open('filename.txt')
num_lines = sum(1 for line in f)
This is more efficient, since it won't load the entire file into memory, but only look at a line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers by:
f = open('filename.txt')
num_lines = 0
last_line = None
for line in f:
    num_lines += 1
    last_line = line
print "There were %d lines. The last was: %s" % (num_lines, last_line)
One final possible improvement, if you need only the last line, is to start at the end of the file and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need the line count as well, though, there's no alternative to iterating through all lines in the file.
For small files that fit in memory, how about using str.count() to get the number of lines of a file:
line_count = open("myfile.txt").read().count('\n')
I'd like to add to the other solutions that some of them (those that look for \n) will not work with files that use OS 9-style line endings (\r only), and that the count may include an extra blank line at the end, because many text editors append a trailing newline, so you might or might not want to add a check for it.
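The difference between the counting methods shows up on small inputs. The sketch below uses made-up in-memory strings rather than files. `str.splitlines()` treats `\r`, `\n` and `\r\n` (among other line boundaries) as separators, whereas counting `'\n'` misses a final line that has no trailing newline and finds no lines at all in `\r`-only text.

```python
def count_by_newline(text):
    """Number of '\n' characters in text."""
    return text.count('\n')

def count_by_splitlines(text):
    """Number of lines as seen by str.splitlines()."""
    return len(text.splitlines())

samples = {
    "trailing newline": "a\nb\n",
    "no trailing newline": "a\nb",
    "carriage returns only": "a\rb\r",
}
for label, text in samples.items():
    print(label, count_by_newline(text), count_by_splitlines(text))
```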
The only way to count lines [that I know of] is to read all lines, like this:
count = 0
for line in open("file.txt"): count = count + 1
After the loop, count will have the number of lines read.
For the first question there are already a few good answers. I'd suggest @Brian's as the best (most Pythonic, independent of the line-ending character, and memory efficient):
f = open('filename.txt')
num_lines = sum(1 for line in f)
For the second one, I like @nosklo's, but modified to be more general it would be:
import os
f = open('myfile')
f.seek(0, os.SEEK_END)
to = f.tell() # seek() returns None on Python 2, so ask for the position explicitly
found = -1
while found == -1 and to > 0:
    fro = max(0, to-1024)
    f.seek(fro)
    chunk = f.read(to-fro)
    found = chunk.rfind("\n")
    to -= 1024
if found != -1:
    found += fro
It searches in chunks of 1 KB from the end of the file until it finds a newline character or reaches the start of the file. At the end of the code, found is the index of the last newline character (or -1 if there is none).
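For Python 3, a hedged sketch of the same backward search that returns the last line itself could look as follows. It opens the file in binary mode so that byte offsets are exact. The name `last_line` and the `block_size` parameter are assumptions made for this example, and `\n` is assumed to be the line terminator.

```python
import os

def last_line(path, block_size=1024):
    """Return the last line of the file at path as bytes, without its trailing newline."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        # Ignore a single trailing newline so it is not mistaken for the line separator
        if end > 0:
            f.seek(end - 1)
            if f.read(1) == b'\n':
                end -= 1
        pos = end
        tail = b''
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            tail = f.read(pos - start) + tail
            newline_at = tail.rfind(b'\n')
            if newline_at != -1:
                return tail[newline_at + 1:]
            pos = start
        return tail  # no newline found: the whole file is one line
```

Only the blocks needed to reach the previous newline are read, so the cost depends on the length of the last line rather than on the size of the file.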
Answer to the first question (beware of poor performance on large files when using this method):
f = open("myfile.txt").readlines()
print len(f)
Answer to the second question:
f = open("myfile.txt").read()
print f.rfind("\n")
P.S. Yes I do understand that this only suits for small files and simple programs. I think I will not delete this answer however useless for real use-cases it may seem.
Answer 1:
x = open("file.txt")  # opens the file; x is now associated with file.txt
y = x.readlines()     # returns all lines as a list
length = len(y)       # the length of the list, i.e. the number of lines
Or in one line:
length = len(open("file.txt").readlines())
Answer 2:
last = y[-1]          # the last element of the list, i.e. the last line
Approach:
Open the file in read mode and assign the file object to a variable named “file”.
Assign 0 to the counter variable.
Read the content of the file using the read function and assign it to a variable named “Content”.
Create a list of the content by splitting it wherever an “\n” occurs.
Traverse the list using a for loop and increment the counter for each non-empty element.
Finally, display the value of the variable Counter, which is the required result.
Python program to count the number of lines in a text file
# Opening a file
file = open("filename", "file mode")  # file mode like r, w, a...
Counter = 0
# Reading from file
Content = file.read()
CoList = Content.split("\n")
for i in CoList:
    if i:
        Counter += 1
print("This is the number of lines in the file")
print(Counter)
The above code prints the number of non-empty lines in a file. Replace filename with the file name including its extension, and file mode with 'r' for reading.
