find a matching line in a txt with python - python

I can test whether a string is in a text file with python like the following:
if(pdfname in open(filename).read()):
//do something
But how I can I verify it? I want the matching line show up. Thanks.

You can print out the relevant lines with the second for statement below, which I've added to your existing code for illustration:
pdfname = "statementConcentrator"
if (pdfname in open("line56.good").read()):
print "Found it"
lineNum = 0
for line in open("line56.good").readlines():
lineNum = lineNum + 1
if pdfname in line:
print "#%5d: %s"%(lineNum, line[:-1])
This outputs your current line plus my output for verification:
Found it
# 115: statementConcentrator=0
and, checking that file, that is indeed the line it's found on.
Note that you can simply use a script as follows to do both jobs in a single loop:
pdfname = "statementConcentrator"
lineNum = 0
count = 0
for line in open("line56.good").readlines():
lineNum = lineNum + 1
if pdfname in line:
print "#%5d: %s"%(lineNum, line[:-1])
count = count + 1
print "Found on %d line(s)."%(count)

Just loop over the file:
handle = open(filename, 'r')
for line in handle:
if pdfname in line:
print line
handle.close()

Related

Lines missing in python

I am writing a code in python where I am removing all the text after a specific word but in output lines are missing. I have a text file in unicode which have 3 lines:
my name is test1
my name is
my name is test 2
What I want is to remove text after word "test" so I could get the output as below
my name is test
my name is
my name is test
I have written a code but it does the task but also removes the second line "my name is"
My code is below
txt = ""
with open(r"test.txt", 'r') as fp:
for line in fp.readlines():
splitStr = "test"
index = line.find(splitStr)
if index > 0:
txt += line[:index + len(splitStr)] + "\n"
with open(r"test.txt", "w") as fp:
fp.write(txt)
It looks like if there is no keyword found the index become -1.
So you are avoiding the lines w/o keyword.
I would modify your if by adding the condition as follows:
txt = ""
with open(r"test.txt", 'r') as fp:
for line in fp.readlines():
splitStr = "test"
index = line.find(splitStr)
if index > 0:
txt += line[:index + len(splitStr)] + "\n"
elif index < 0:
txt += line
with open(r"test.txt", "w") as fp:
fp.write(txt)
No need to add \n because the line already contains it.
Your code does not append the line if the splitStr is not defined.
txt = ""
with open(r"test.txt", 'r') as fp:
for line in fp.readlines():
splitStr = "test"
index = line.find(splitStr)
if index != -1:
txt += line[:index + len(splitStr)] + "\n"
else:
txt += line
with open(r"test.txt", "w") as fp:
fp.write(txt)
In my solution I simulate the input file via io.StringIO. Compared to your code my solution remove the else branch and only use one += operater. Also splitStr is set only one time and not on each iteration. This makes the code more clear and reduces possible errore sources.
import io
# simulates a file for this example
the_file = io.StringIO("""my name is test1
my name is
my name is test 2""")
txt = ""
splitStr = "test"
with the_file as fp:
# each line
for line in fp.readlines():
# cut somoething?
if splitStr in line:
# find index
index = line.find(splitStr)
# cut after 'splitStr' and add newline
line = line[:index + len(splitStr)] + "\n"
# append line to output
txt += line
print(txt)
When handling with files in Python 3 it is recommended to use pathlib for that like this.
import pathlib
file_path = pathlib.Path("test.txt")
# read from wile
with file_path.open('r') as fp:
# do something
# write back to the file
with file_path.open('w') as fp:
# do something
Suggestion:
for line in fp.readlines():
i = line.find('test')
if i != -1:
line = line[:i]

Python text processing/finding data

I am trying to parse/process some information from a text file using Python. This file contains names, employee numbers and other data. I do not know the names or employee numbers ahead of time. I do know that after the names there is the text: "Per End" and before the employee number there is the text: "File:". I can find these items using the .find() method. But, how do I ask Python to look at the information that comes before or after "Per End" and "File:"? In this specific case the output should be the name and employee number.
The text looks like this:
SMITH, John
Per End: 12/10/2016
File:
002013
Dept:
000400
Rate:10384 60
My code is thus:
file = open("Register.txt", "rt")
lines = file.readlines()
file.close()
countPer = 0
for line in lines:
line = line.strip()
print (line)
if line.find('Per End') != -1:
countPer += 1
print ("Per End #'s: ", countPer)
file = open("Register.txt", "rt")
lines = file.readlines()
file.close()
for indx, line in enumerate(lines):
line = line.strip()
print (line)
if line.find('Per End') != -1:
print lines[indx-1].strip()
if line.find('File:') != -1:
print lines[indx+1].strip()
enumerate(lines) gives access to indices and line as well, there by you can access previous and next lines as well
here is my stdout directly ran in python shell:
>>> file = open("r.txt", "rt")
>>> lines = file.readlines()
>>> file.close()
>>> lines
['SMITH, John\n', 'Per End: 12/10/2016\n', 'File:\n', '002013\n', 'Dept:\n', '000400\n', 'Rate:10384 60\n']
>>> for indx, line in enumerate(lines):
... line = line.strip()
... if line.find('Per End') != -1:
... print lines[indx-1].strip()
... if line.find('File:') != -1:
... print lines[indx+1].strip()
SMITH, John
002013
Here is how I would do it.
First, some test data.
test = """SMITH, John\n
Per End: 12/10/2016\n
File:\n
002013\n
Dept:\n
000400\n
Rate:10384 60\n"""
text = [line for line in test.splitlines(keepends=False) if line != ""]
Now for the real answer.
count_per, count_num = 0, 0
Using enumerate on an iterable gives you an index automagically.
for idx, line in enumerate(text):
# Just test whether what you're looking for is in the `str`
if 'Per End' in line:
print(text[idx - 1]) # access the full set of lines with idx
count_per += 1
if 'File:' in line:
print(text[idx + 1])
count_num += 1
print("Per Ends = {}".format(count_per))
print("Files = {}".format(count_num))
yields for me:
SMITH, John
002013
Per Ends = 1
Files = 1

How do I count the number of lines that are full-line comments in python?

I'm trying to create a function that accepts a file as input and prints the number of lines that are full-line comments (i.e. the line begins with #followed by some comments).
For example a file that contains say the following lines should print the result 2:
abc
#some random comment
cde
fgh
#another random comment
So far I tried along the lines of but just not picking up the hash symbol:
infile = open("code.py", "r")
line = infile.readline()
def countHashedLines(filename) :
while line != "" :
hashes = '#'
value = line
print(value) #here you will get all
#if(value == hashes): tried this but just wasn't working
# print("hi")
for line in value:
line = line.split('#', 1)[1]
line = line.rstrip()
print(value)
line = infile.readline()
return()
Thanks in advance,
Jemma
I re-worded a few statements for ease of use (subjective) but this will give you the desired output.
def countHashedLines(lines):
tally = 0
for line in lines:
if line.startswith('#'): tally += 1
return tally
infile = open('code.py', 'r')
all_lines = infile.readlines()
num_hash_nums = countHashedLines(all_lines) # <- 2
infile.close()
...or if you want a compact and clean version of the function...
def countHashedLines(lines):
return len([line for line in lines if line.startswith('#')])
I would pass the file through standard input
import sys
count = 0
for line in sys.stdin: """ Note: you could also open the file and iterate through it"""
if line[0] == '#': """ Every time a line begins with # """
count += 1 """ Increment """
print(count)
Here is another solution that uses regular expressions and will detect comments that have white space in front.
import re
def countFullLineComments(infile) :
count = 0
p = re.compile(r"^\s*#.*$")
for line in infile.readlines():
m = p.match(line)
if m:
count += 1
print(m.group(0))
return count
infile = open("code.py", "r")
print(countFullLineComments(infile))

Evaluating the next line in a For Loop while in the current iteration

Here is what I am trying to do:
I am trying to solve an issue that has to do with wrapping in a text file.
I want to open a txt file, read a line and if the line contains what I want it to contain, check the next line to see if it does not contain what is in the first line. If it does not, add the line to the first line.
import re
stuff = open("my file")
for line in stuff:
if re.search("From ", line):
first = line
print first
if re.search('From ', handle.next()):
continue
else: first = first + handle.next()
else: continue
I have looked a quite a few things and cannot seem to find an answer. Please help!
I would try to do something like this, but this is invalid for triples of "From " and not elegant at all.
lines = open("file", 'r').readlines()
lines2 = open("file2", 'w')
counter_list=[]
last_from = 0
for counter, line in enumerate(lines):
if "From " in line and counter != last_from +1:
last_from = counter
current_count = counter
if current_count+1 == counter:
if "From " in line:
counter_list.append(current_count+1)
for counter, line in enumerate(lines):
if counter in counter_list:
lines2.write(line)
else:
lines2.write(line, '\n')
Than you can check the lines2 if its helped.
You could also revert order of lines, then check in next line not in previous. That would solve your problem in one loop.
Thank you Martjin for helping me reset my mind frame! This is what I came up with:
handle = open("my file")
first = ""
second = ""
sent = ""
for line in handle:
line = line.rstrip()
if len(first) > 0:
if line.startswith("From "):
if len(sent) > 0:
print sent
else: continue
first = line
second = ""
else:
second = second + line
else:
if line.startswith("From "):
first = line
sent = first + second
It is probably crude, but it definitely got the job done!

How to change back a line during file read

In my code I have a line length print like this:
line = file.readline()
print("length = ", len(line))
after that I start to scan the lines by doing this:
for i in range(len(line)):
if(file.read(1) == 'b'):
print("letter 'b' found.")
The problem is that the for loop starts reading on line 2 of the file.
How can I make it start reading at line 1 without closing and reopening the file?
It is possible to use file.seek to move the position of the next read, but that's inefficient. You've already read in the line, so you can just process
line without having to read it in a second time.
with open(filename,'r') as f:
line = f.readline()
print("length = ", len(line))
if 'b' in line:
print("letter 'b' found.")
for line in f:
...
It seems that you need to handle the first line specially.
lineno = 1
found = False
for line in file:
if 'b' in line:
found = True
if lineno == 1:
print("length of first line: %d" % len(line))
lineno += 1
if found:
print("letter 'b' found.")
It sounds like you want something like this:
with open('file.txt', 'r') as f:
for line in f:
for character in line:
if character == "b":
print "letter 'b' found."
or if you just need the number:
with open('file.txt', 'r') as f:
b = sum(1 for line in f for char in line if char == "b")
print "found %d b" % b
#! usr/bin/env python
#Open the file , i assumed its called somefile.txt
file = open('somefile.txt.txt','r')
#Lets loop through the lines ...
for line in file.readlines():
#test if letter 'b' is in each line ...
if 'b' in line:
#print that we found a b in the line
print "letter b found"

Categories