Python - Counting blank lines in a text file - python

Let's say I have a file with the following content (every even line is blank):
Line 1

Line 2

Line 3
...
I tried to read the file in 2 ways:
count = 0
for line in open("myfile.txt"):
    if line == '': #or if len(line) == 0
        count += 1
and
count = 0
file = open('myfile.txt')
lines = file.readlines()
for line in lines:
    if line == '': #or if len(line) == 0
        count += 1
But count always remains 0. How can I count the number of blank lines?

In a simpler and more Pythonic way:
with open(filename) as fd:
    count = sum(1 for line in fd if len(line.strip()) == 0)
This keeps linear time complexity and constant memory usage.
And, most of all, it gets rid of count as a manually incremented variable.

When you use the readlines() function, it doesn't automatically remove the EOL characters for you. So you either compare against the end of line, something like:
if line == '\n':
    count += 1
or you strip the line (as suggested in @khelwood's comment on your question) and compare against '' as you are doing.
Note that comparing against os.linesep is not reliable here: a file opened in text mode has its line endings translated to '\n' by default, so os.linesep ('\r\n' on Windows) would never match. If you open the file with newline='', the original endings are preserved, and they may differ from those of the OS you are running on, e.g. a file written on Windows and read on Linux. So to check for all cases you have to do something like:
if line == '\n' or line == '\r' or line == '\r\n':
    count += 1
Hope this helps.
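To make the line-ending point concrete, here is a minimal sketch. The helper name count_blank_lines and the sample data are made up for illustration. It assumes Python 3, where text mode translates '\r\n' and '\r' to '\n' by default, and it treats whitespace-only lines as blank.

```python
import os
import tempfile


def count_blank_lines(path):
    """Count lines that are empty or contain only whitespace."""
    with open(path) as fd:  # text mode: universal newlines by default
        return sum(1 for line in fd if not line.strip())


# Hypothetical sample data: three text lines separated by blank lines,
# written with Windows-style line endings.
sample = b"Line 1\r\n\r\nLine 2\r\n\r\nLine 3\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

blank = count_blank_lines(tmp.name)
os.remove(tmp.name)
print(blank)
```

Because strip() removes any trailing '\r' or '\n', the same function works whichever line endings the file uses.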

Every line read from the file ends with a newline character '\n' (except possibly the last one). Note that it is only one character.
An easy workaround is to check whether the line equals '\n', or whether its length is 1, not 0.

You can use count from itertools, which returns an iterator. Furthermore, I used just strip instead of checking the length.
from itertools import count
counter = count()
with open('myfile.txt', 'r') as f:
    for line in f:
        if not line.strip():
            next(counter)
# count() starts at 0, so the next value equals the number of blank lines seen
print(next(counter))

Related

I want to replace words from a file by the line no using python i have a list of line no?

if I have a file like:
Flower
Magnet
5001
100
0
and I have a list containing line number, which I have to change.
list =[2,3]
How can I do this using python and the output I expect is:
Flower
Most
Most
100
0
Code that I've tried:
f = open("your_file.txt","r")
line = f.readlines()[2]
print(line)
if line=="5001":
    print "yes"
else:
    print "no"
But it is not able to match.
I want to overwrite the file that I am reading.
You may simply loop through the list of indices that you have to replace in your file (my original answer needlessly looped through all lines in the file):
with open('test.txt') as f:
    data = f.read().splitlines()
replace = {1,2}
for i in replace:
    data[i] = 'Most'
print('\n'.join(data))
Output:
Flower
Most
Most
100
0
To overwrite the file you have opened with the replacements, you may use the following:
with open('test.txt', 'r+') as f:
    data = f.read().splitlines()
    replace = {1,2}
    for i in replace:
        data[i] = 'Most'
    f.seek(0)
    f.write('\n'.join(data))
    f.truncate()
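The same idea can be wrapped in a small function that takes 1-based line numbers, as in the question's list. This is only a sketch: the name replace_lines and the temporary demo file are assumptions made for illustration, and out-of-range numbers are silently ignored.

```python
import os
import tempfile


def replace_lines(path, line_numbers, new_text):
    """Overwrite the given 1-based line numbers in place with new_text."""
    with open(path, "r+") as f:
        data = f.read().splitlines()
        for n in line_numbers:
            if 1 <= n <= len(data):  # ignore out-of-range numbers
                data[n - 1] = new_text
        f.seek(0)
        f.write("\n".join(data) + "\n")
        f.truncate()  # drop leftover bytes if the new content is shorter


# Hypothetical demo file with the contents from the question.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
    tmp.write("Flower\nMagnet\n5001\n100\n0\n")

replace_lines(tmp.name, [2, 3], "Most")
with open(tmp.name) as f:
    result = f.read().splitlines()
os.remove(tmp.name)
print(result)
```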
The reason that you're having this problem is that when you take a line from a file opened in Python, you also get the newline character (\n) at the end. To solve this, you could use the str.strip() method, which removes leading and trailing whitespace, including the newline.
E.g.
f = open("your_file.txt","r")
line = f.readlines()
lineToCheck = line[2].strip()
if lineToCheck == "5001":
    print("yes")
else:
    print("no")

Find the line number a string is on in an external text file

I am trying to create a program where it gets input from a string entered by the user and searches for that string in a text file and prints out the line number. If the string is not in the text file, it will print that out. How would I do this? Also I am not sure if even the for loop that I have so far would work for this so any suggestions / help would be great :).
What I have so far:
file = open('test.txt', 'r')
string = input("Enter string to search")
for string in file:
    print("") #print the line number
You can implement this algorithm:
1. Initialize a counter
2. Read lines one by one
3. If the line matches the target, return the current count
4. Increment the count
5. If reached the end without returning, the line is not in the file
For example:
def find_line(path, target):
    with open(path) as fh:
        count = 1
        for line in fh:
            if line.strip() == target:
                return count
            count += 1
    return 0
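The counter can also be handled by enumerate. The variant below is a sketch, not a replacement for the answer above: it takes any iterable of lines rather than a path, and returns None instead of 0 when nothing matches (both choices are assumptions made for the example).

```python
import io


def find_line(lines, target):
    """Return the 1-based number of the first line equal to target, else None."""
    for number, line in enumerate(lines, start=1):
        if line.rstrip("\n") == target:
            return number
    return None


# io.StringIO stands in for an open text file in this demo.
sample = io.StringIO("alpha\nbeta\ngamma\n")
found = find_line(sample, "beta")
sample.seek(0)
missing = find_line(sample, "delta")
print(found, missing)
```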
A text file differs from the data structures used in programs (such as dictionaries and arrays) in that it is sequential. Much like the old tapes used for storage a long, long time ago, there's no way to find a specific line without reading through all prior lines (or somehow knowing its exact byte offset). Your best option is just to create a for loop that iterates through each line until it finds the one it's looking for, returning the number of lines traversed until that point.
file = open('test.txt', 'r')
string = input("Enter string to search")
lineCount = 0
for line in file:
    lineCount += 1
    if string == line.rstrip(): # remove trailing newline
        print(lineCount)
        break
filepath = 'test.txt'
substring = "aaa"
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    flag = False
    while line:
        if substring in line:
            print("string found in line {}".format(cnt))
            flag = True
            break
        line = fp.readline()
        cnt += 1
    if not flag:
        print("string not found in file")
If the string will match a line exactly, we can do this in one line (note that index() is zero-based, so add 1 if you want the first line to count as line 1):
print(open('test.txt').read().split("\n").index(input("Enter string to search")))
Well, the above kind of works, except it won't print "no match" if there isn't one. For that, we can just add a little try:
try:
    print(open('test.txt').read().split("\n").index(input("Enter string to search")))
except ValueError:
    print("no match")
Otherwise, if the string is just somewhere in one of the lines, we can do:
string = input("Enter string to search")
for i, l in enumerate(open('test.txt').read().split("\n")):
    if string in l:
        print("Line number", i) # i is zero-based as well
        break
else:
    print("no match")

set current position in a text file one line back

I want to set the current position in a textfile one line back.
Example:
I search in a textfile for a word "x".
Textfile:
Line: qwe qwe
Line: x
Line: qwer
Line: qwefgdg
If I find that word, the current position of the file object shall be set back one line.
(In the example I find the word in the 2nd line, so the position shall be set to the beginning of the 1st line.)
I tried to use fseek, but I wasn't that successful.
This is not how you do it in Python. You should just iterate over the file, test the current line and never worry about file pointers. If you need to retrieve the content of the previous line, just store it.
>>> with open('test.txt') as f: print(f.read())
a
b
c
d
e
f
>>> needle = 'c\n'
>>> with open('test.txt') as myfile:
...     previous = None
...     position = 0
...     for line in myfile:
...         if line == needle:
...             print("Previous line is {}".format(repr(previous)))
...             break
...         position += len(line) if line else 0
...         previous = line
...
Previous line is 'b\n'
>>> position
4
If you really need the byte position of the previous line, be aware that the tell/seek methods don't blend well with iteration, so reopen the file to be safe.
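If repositioning the file really is required, one possible approach is to read with readline() and record offsets with tell() between lines. The sketch below makes several assumptions: the function name seek_to_line_before is made up, the file is opened in binary mode so that offsets are plain byte positions, and a match on the very first line leaves the position at the start of the file.

```python
import io


def seek_to_line_before(f, word):
    """Position f at the start of the line before the first line containing word.

    f must be opened in binary mode. Returns True if word was found.
    """
    prev_start = 0  # offset where the previous line starts
    while True:
        line_start = f.tell()
        line = f.readline()  # readline() instead of a for loop, so tell() can be used
        if not line:
            return False  # reached end of file without a match
        if word in line:
            f.seek(prev_start)
            return True
        prev_start = line_start


# io.BytesIO stands in for open('test.txt', 'rb') in this demo.
f = io.BytesIO(b"qwe qwe\nx\nqwer\nqwefgdg\n")
found = seek_to_line_before(f, b"qwer")
position = f.tell()
line = f.readline()
print(found, position, line)
```

After the call, the next readline() returns the line just before the match.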
f = open('filename').readlines()
i = 0
while i < len(f):
    if 'x' in f[i] and i > 0:
        print(f[i - 1]) # the line before the match
    i += 1
Be careful with a hand-written while loop like this: if you step i back without moving it forward again, or leave out the exit condition, it will loop forever.

Read a file line by line, sometimes reading the next line within same loop

I'd like to read a file in python line by line, but in some cases (based on an if condition) I'd also like to read the next line in the file, and then keep reading it the same way.
Example:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
basically in this example I am trying to read it line by line, but when the line does not start with # I'd like to read the next line, print it, and then keep reading the line after line2. This is just an example where I got the error for similar stuff I am doing in my code but my goal is as stated in the title.
But I'd get an error like ValueError: Mixing iteration and read methods would lose data.
Would it be possible to do what I am trying to do in a smarter way?
If you just want to skip over lines not starting with #, there's a much easier way to do this:
file_handler = open(fname, 'r')
for line in file_handler:
    if line[0] != '#':
        continue
    # now do the regular logic
    print line
Obviously this kind of simplistic logic won't work in all possible cases. When it doesn't, you have to do exactly what the error implies: either use iteration consistently, or use read methods consistently. This is going to be more tedious and error-prone, but it's not that bad.
For example, with readline:
while True:
    line = file_handler.readline()
    if not line:
        break
    if line[0] == '#':
        print line
    else:
        line2 = file_handler.readline()
        print line2
Or, with iteration:
lines = file_handler
for line in file_handler:
    if line[0] == '#':
        print line
    else:
        print line
        print next(file_handler)
However, that last version is sort of "cheating". You're relying on the fact that the iterator in the for loop is the same thing as the iterable it was created from. This happens to be true for files, but not for, say, lists. So really, you should do the same kind of while True loop here, unless you want to add an explicit iter call (or at least a comment explaining why you don't need one).
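And a better solution might be to write a generator function that transforms one iterator into another based on your rule, and then print out each value iterated by that generator:
def doublifier(iterable):
    it = iter(iterable)
    # a for loop ends cleanly at end of input; a bare next(it) would let
    # StopIteration escape, which is an error inside generators on Python 3.7+
    for line in it:
        if line.startswith('#'):
            yield line, next(it, '') # '' if there is no following line
        else:
            yield (line,)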
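Here is a self-contained sketch of how a generator of this kind could be used. The name paired_lines is made up, and the rule follows the question rather than the code above: a line that does not start with '#' pulls in the line after it.

```python
import io


def paired_lines(iterable):
    """Yield 1-tuples for '#' lines and (line, next_line) pairs otherwise."""
    it = iter(iterable)
    for line in it:
        if line.startswith('#'):
            yield (line,)
        else:
            # next(it, None) avoids StopIteration leaking out of the generator
            following = next(it, None)
            yield (line,) if following is None else (line, following)


# io.StringIO stands in for an open file in this demo.
sample = io.StringIO("# a\nb\nc\n# d\ne\n")
groups = [tuple(s.rstrip("\n") for s in group) for group in paired_lines(sample)]
print(groups)
```

The calling loop then only has to deal with tuples, and the question of when to read an extra line stays inside the generator.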
file_handler = open(fname, 'r')
for line in file_handler:
    if line.startswith('#'): # <<< comment 1
        print line
    else:
        line2 = next(file_handler) # <<< comment 2
        print line2
Discussion
Comment 1: Your code used a single equal sign, which is incorrect. It should be a double equal sign for comparison. I recommend using the .startswith() method to enhance code clarity.
Comment 2: Use the next() function to advance to the next line, since you are using file_handler as an iterator.
Add a flag value:
flag = False
for line in file_handler:
    if flag:
        print line #or whatever
        flag = False
    if line[0] == '#':
        flag = True
This is a versatile version :-)
You can save a bit of state information that tells you what to do with the next line:
want_next = False
for line in open(fname):
    if want_next:
        print line
        want_next = False
    elif line[0] == '#':
        print line
        want_next = True
I think what you are looking for is next rather than readline.
A few things. In your code, you use = rather than ==. I will use startswith instead. If you call next on an iterator, it will return the next item or raise a StopIteration exception.
The file
ewolf@~ $ cat foo.txt
# zork zap
# woo hoo
here is
some line
# a line
with no haiku
The program
file_handler = open( 'foo.txt', 'r' )
for line in file_handler:
    line = line.strip()
    if line.startswith( '#' ):
        print "Not Skipped : " + line
    else:
        try:
            l2 = next(file_handler)
            l2 = l2.strip()
            print "Skipping. Next line is : " + l2
        except StopIteration:
            # End of File
            pass
The output
Not Skipped : # zork zap
Not Skipped : # woo hoo
Skipping. Next line is : some line
Not Skipped : # a line
Skipping. Next line is :
try if line[0] == "#" instead of line[0] = "#"

Two simple questions about python

I have 2 simple questions about python:
1. How to get the number of lines of a file in Python?
2. How to move the position in a file object to the last line easily?
Lines are just data delimited by the newline char '\n'.
1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:
count = 0
for line in open('myfile'):
    count += 1
print(count, line) # line will be the last line
2) Reading a chunk from the end of the file is the fastest method to find the last newline char.
import os
def seek_newline_backwards(file_obj, eol_char=b'\n', buffer_size=200):
    # file_obj must be opened in binary mode ('rb'): Python 3 text files
    # do not support seeking relative to the current position
    if not file_obj.tell(): return # already in beginning of file
    # All lines end with \n, including the last one, so assuming we are just
    # after one end of line char
    file_obj.seek(-1, os.SEEK_CUR)
    while file_obj.tell():
        amount = min(buffer_size, file_obj.tell())
        file_obj.seek(-amount, os.SEEK_CUR)
        data = file_obj.read(amount)
        eol_pos = data.rfind(eol_char)
        if eol_pos != -1:
            file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
            break
        file_obj.seek(-len(data), os.SEEK_CUR)
You can use that like this:
f = open('some_file.txt', 'rb')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print(f.tell(), repr(f.readline()))
Let's not forget
f = open("myfile.txt")
lines = f.readlines()
numlines = len(lines)
lastline = lines[-1]
NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.
The easiest way is simply to read the file into memory, e.g.:
f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]
However, for big files this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line, e.g.:
f = open('filename.txt')
num_lines = sum(1 for line in f)
This is more efficient, since it won't load the entire file into memory, but only look at one line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers:
f = open('filename.txt')
num_lines = 0
last_line = None
for line in f:
    num_lines += 1
    last_line = line
print("There were %d lines. The last was: %s" % (num_lines, last_line))
One final possible improvement, if you need only the last line, is to start at the end of the file and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need the line count as well, though, there's no alternative except to iterate through all lines in the file.
For small files that fit memory,
how about using str.count() for getting the number of lines of a file:
line_count = open("myfile.txt").read().count('\n')
I'd like to add to the other solutions that some of them (those that look for \n) will not work with files that use classic Mac OS (OS 9) style line endings (\r only). Also, a file may or may not end with a final newline, and many text editors append one automatically, so a count based on '\n' can be off by one; you might or might not want to add a check for that.
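As a rough illustration of a count that does not depend on the line-ending style, bytes.splitlines() can be used on data that fits in memory. The function name count_lines and the sample byte strings are made up for this sketch.

```python
def count_lines(data):
    """Count lines in a bytes object, accepting \\n, \\r\\n and \\r endings."""
    # bytes.splitlines() splits on \n, \r\n and \r, and does not produce
    # an extra empty entry for a trailing line ending.
    return len(data.splitlines())


unix = count_lines(b"a\nb\nc\n")
windows = count_lines(b"a\r\nb\r\nc\r\n")
old_mac = count_lines(b"a\rb\rc")  # no trailing line ending
print(unix, windows, old_mac)
```

Like read().count('\n'), this loads everything into memory, so it suits small files only.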
The only way to count lines [that I know of] is to read all lines, like this:
count = 0
for line in open("file.txt"): count = count + 1
After the loop, count will have the number of lines read.
For the first question there are already a few good answers. I'd suggest @Brian's as the best (most Pythonic, line-ending-character proof and memory efficient):
f = open('filename.txt')
num_lines = sum(1 for line in f)
For the second one, I like @nosklo's, but modified to be more general it should be:
import os
f = open('myfile', 'rb') # binary mode, so that offsets are byte positions
to = f.seek(0, os.SEEK_END)
found = -1
while found == -1 and to > 0:
    fro = max(0, to-1024)
    f.seek(fro)
    chunk = f.read(to-fro)
    found = chunk.rfind(b"\n")
    to -= 1024
if found != -1:
    found += fro
It searches in chunks of 1 KB from the end of the file, until it finds a newline character or reaches the beginning of the file. At the end of the code, found is the index of the last newline character.
Answer to the first question (beware of poor performance on large files when using this method):
f = open("myfile.txt").readlines()
print(len(f))
Answer to the second question:
f = open("myfile.txt").read()
print(f.rfind("\n"))
P.S. Yes, I do understand that this only suits small files and simple programs. I think I will not delete this answer, however useless for real use cases it may seem.
Answer 1:
x = open("file.txt") # opens the file; x is now associated with file.txt
y = x.readlines() # returns all lines as a list
length = len(y) # the length of the list is the number of lines
Or in one line:
length = len(open("file.txt").readlines())
Answer 2:
last = y[-1] # the last element of the list is the last line
Approach:
1. Open the file in read mode and assign the file object to a variable named "file".
2. Assign 0 to the counter variable.
3. Read the content of the file using the read function and assign it to a variable named "Content".
4. Create a list of the content by splitting it wherever an "\n" is encountered.
5. Traverse the list using a for loop and increment the counter variable for each non-empty element.
6. Finally, display the value of the variable Counter, which is the required result.
Python program to count the number of lines in a text file
# Opening a file
file = open("filename", "r") # replace filename with the name of your file
Counter = 0
# Reading from file
Content = file.read()
CoList = Content.split("\n")
for i in CoList:
    if i:
        Counter += 1
print("This is the number of lines in the file")
print(Counter)
The above code prints the number of non-empty lines present in the file (blank lines are skipped by the "if i" check). Replace filename with the name of the file, including its extension.
