Copy each line 3 lines previous to each match - python

I am trying to copy lines four lines before a line that contains a specific keyword.
if line.find("keyword") == 0:
    f.write(line -3)
I don't need the line where I found the keyword, but the line 4 lines before it. Since the write method doesn't work with line numbers, I got stuck.

If you're already using two files, it's as simple as keeping a buffer and writing out the last 3 entries in it when you encounter a match:
buf = [] # your buffer
with open("in_file", "r") as f_in, open("out_file", "w") as f_out: # open the in/out files
    for line in f_in: # iterate the input file line by line
        if "keyword" in line: # the current line contains a keyword
            f_out.writelines(buf[-3:]) # write the last 3 lines (or less if not available)
            f_out.write(line) # write the current line, omit if not needed
            buf = [] # reset the buffer
        else:
            buf.append(line) # add the current line to the buffer

You can just use a list: append each line to it and truncate it to the last 3 entries. When you reach the target line, the first element of the list is the line you want.
last_3 = []
with open("the_dst_file", "w") as fw: # open the destination for writing
    with open("the_source_file") as fr:
        for line in fr:
            if line.find("keyword") == 0: # the line starts with the keyword
                fw.write(last_3[0]) # the line already ends with a newline
                last_3 = []
                continue
            last_3.append(line)
            last_3 = last_3[-3:] # keep only the last 3 lines
If the format of the file guarantees that "keyword" always has at least 3 lines preceding it, and at least 3 lines between instances, then the above is good. If not, guard the write by checking that len(last_3) == 3 before pulling off the first element.
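As a minimal sketch of the same buffering idea with the length guard built in, the function below uses collections.deque with a fixed maximum length. The name lines_before_matches and the sample data are hypothetical, chosen only for illustration. Unlike the answers above, this sketch does not reset the buffer after a match, so matching lines themselves count as preceding lines.

```python
from collections import deque

def lines_before_matches(lines, keyword, offset=3):
    """Yield the line `offset` lines before each line containing `keyword`."""
    window = deque(maxlen=offset)  # holds at most the last `offset` lines
    for line in lines:
        # Only yield when a full `offset` lines of history is available.
        if keyword in line and len(window) == offset:
            yield window[0]  # the oldest entry is exactly `offset` lines back
        window.append(line)

# Hypothetical sample input; a file object opened for reading works the same way.
sample = ["a\n", "b\n", "c\n", "keyword here\n", "d\n", "keyword again\n"]
result = list(lines_before_matches(sample, "keyword"))
print(result)
```

Passing an open file object instead of the list keeps memory use bounded, since only the last offset lines are held at any time.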

Related

Deleting a specific number of lines from text file using Python

Suppose I have a text file that goes like this:
AAAAAAAAAAAAAAAAAAAAA #<--- line 1
BBBBBBBBBBBBBBBBBBBBB #<--- line 2
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
FFFFFFFFFFFFFFFFFFFFF #<--- line 6
GGGGGGGGGGGGGGGGGGGGG #<--- line 7
HHHHHHHHHHHHHHHHHHHHH #<--- line 8
Ignore "#<--- line...", it's just for demonstration
Assumptions
I don't know what line 3 is going to contain (because it changes
all the time)...
The first 2 lines have to be deleted...
After the first 2 lines, I want to keep 3 lines...
Then, I want to delete all lines after the 3rd line.
End Result
The end result should look like this:
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
Lines deleted: First 2 + Everything after the next 3 (i.e. after line 5)
Required
All Pythonic suggestions are welcome! Thanks!
Reference Material
https://thispointer.com/python-how-to-delete-specific-lines-in-a-file-in-a-memory-efficient-way/
import os
def delete_multiple_lines(original_file, line_numbers):
    """In a file, delete the lines at line number in given list"""
    is_skipped = False
    counter = 0
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exist in list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1
    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)
Then...
delete_multiple_lines('sample.txt', [0,1,2])
The problem with this method might be that, if your file had 1-100 lines on top to delete, you'll have to specify [0,1,2...100]. Right?
Answer
Courtesy of @sandes
The following code will:
delete the first 63
get you the next 95
ignore the rest
create a new file
with open("sample.txt", "r") as f:
    lines = f.readlines()
skip, keep = 63, 95
new_lines = []
idx_lines_wanted = range(skip, skip + keep)
# delete first 63, then get the next 95
for i, line in enumerate(lines):
    if i >= skip + keep:
        break
    if i in idx_lines_wanted:
        new_lines.append(line)
with open("sample2.txt", "w") as f:
    for line in new_lines:
        f.write(line)
EDIT: iterating directly over f
based on @Kenny's comment and @chepner's suggestion
with open("your_file.txt", "r") as f:
    new_lines = []
    for idx, line in enumerate(f):
        if idx in range(2, 5): # indices 2, 3, 4
            new_lines.append(line)
with open("your_new_file.txt", "w") as f:
    for line in new_lines:
        f.write(line)
This is really something that's better handled by an actual text editor.
import subprocess
subprocess.run(['ed', original_file], input=b'1,2d\n+3,$d\nwq\n')
A crash course in ed, the POSIX standard text editor.
ed opens the file named by its argument. It then proceeds to read commands from its standard input. Each command is a single character, with some commands taking one or two "addresses" to indicate which lines to operate on.
After each command, the "current" line number is set to the line last affected by a command. This is used with relative addresses, as we'll see in a moment.
1,2d means to delete lines 1 through 2; the current line is then set to the line after the deleted range, which is now line 1 (the original line 3)
+3,$d deletes all the lines from the new line 4 (current line is 1, so 1 + 3 == 4; this is the original line 6) through the end of the file ($ is a special address indicating the last line of the file)
wq writes all changes to disk and quits the editor.
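For comparison, here is a minimal pure-Python sketch of the same "skip the first N lines, keep the next M" operation using itertools.islice. The function name keep_line_range and the sample data are assumptions made for illustration; they do not come from any of the answers above.

```python
from itertools import islice

def keep_line_range(lines, skip, keep):
    """Return `keep` lines after discarding the first `skip` lines."""
    # islice stops consuming the input once skip + keep items have been read.
    return list(islice(lines, skip, skip + keep))

# Hypothetical sample: eight lines, as in the question.
sample = [f"line {n}\n" for n in range(1, 9)]
kept = keep_line_range(sample, skip=2, keep=3)
print(kept)  # ['line 3\n', 'line 4\n', 'line 5\n']
```

Because islice works on any iterable, an open file object can be passed in place of the list, and the result written to a new file with writelines.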

Python - deleting lines and previous lines (matching patterns)

I want to find the lines which start with a word from a list. If the word is found, I want the line it stands in and the previous line to be deleted.
I am able to get the line and the previous one and print them, but I cannot get my head around how to keep them out of my output file.
For example:
Input:
This is not supposed to be deleted.
This shall be deleted.
Titel
This is not supposed to be deleted.
This is not supposed to be deleted
Output:
This is not supposed to be deleted.
This is not supposed to be deleted.
This is not supposed to be deleted
I tried it with this code, but I keep getting a TypeError: 'str' object does not support item assignment
with open(file1) as f_in, open(file2, 'w') as f_out:
    lines = f_in.read().splitlines()
    for i, line in enumerate(lines):
        clean = True
        if line.startswith(('Text', 'Titel')):
            for (line[i-1]) in lines:
                clean = False
            for line in lines:
                clean =False
        if clean == True:
            f_out.write(line)
You don't have to read the whole file at once. Read the lines one at a time and hold on to the current line, writing it out only after the next line has been read and turns out not to be a match.
with open("file1") as finp, open("file2","w") as fout:
    lprev=""
    for line in finp:
        if line.startswith("Titel") or line.startswith("Text"):
            lprev=""
            continue
        if lprev:
            fout.write(lprev)
        lprev=line
    if lprev:
        fout.write(lprev) # write out the last line if needed
First keep track of which lines you want to copy:
lines_to_keep = []
with open(file1) as f_in:
    deleted_previous_line = True
    for line in f_in:
        if line.startswith(('Text', 'Titel')):
            if not deleted_previous_line:
                del lines_to_keep[-1]
            deleted_previous_line = True
            continue
        deleted_previous_line = False
        lines_to_keep.append(line)
The trick with the deleted_previous_line is necessary to ensure it does not delete too many lines if consecutive lines start with 'Text' or 'Titel'.
Then write it to your output file
with open(file2, 'w') as f_out:
    f_out.writelines(lines_to_keep)

How to read a specific line which is above the current line from a file using python

I have a file, after reading a line from the file I have named it current_line, I want to fetch the 4th line above the current_line. How can this be done using python?
line 1
line 2
line 3
line 4
line 5
line 6
Now say I have fetched line 6 and I have set
current_line = line 6
Now I want the 4th line above it, i.e. line 2:
output_line = line 2
PS: I don't want to read the file from the bottom.
You can keep a list of the last 4 lines while iterating over the lines of your file. A good way to do it is to use a deque with a maximum length of 4:
from collections import deque
last_lines = deque(maxlen=4)
with open('test.txt') as f:
    for line in f:
        if line.endswith('6\n'): # Your real condition here
            print(last_lines[0])
        last_lines.append(line)
# Output:
# line 2
Once a bounded length deque is full, when new items are added, a
corresponding number of items are discarded from the opposite end.
We read the file line by line and only keep the needed lines in memory.
Imagine we have just read line 10. We have lines 6 to 9 in the queue.
If the condition is met, we retrieve line 6 at the start of the queue and use it.
We append line 10 to the deque, the first item (line 6) gets pushed out, as we are sure that we won't need it anymore, we now have lines 7 to 10 in the queue.
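The walk-through above can be reproduced in a few lines. This is only an illustrative sketch that uses integers in place of file lines; the variable names are made up for the example.

```python
from collections import deque

window = deque(maxlen=4)
for n in range(6, 10):       # pretend we have already read lines 6 to 9
    window.append(n)

before = list(window)        # contents just before handling line 10
oldest = window[0]           # the entry 4 lines above line 10
window.append(10)            # appending evicts the oldest entry
after = list(window)

print(before, oldest, after)  # [6, 7, 8, 9] 6 [7, 8, 9, 10]
```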
My approach would be converting the contents to a list splitting on \n and retrieving required line by index.
lines = '''line 1
line 2
line 3
line 4
line 5
line 6'''
s = lines.split('\n')
current_line = 'line 6'
output_line = s[s.index(current_line) - 4]
# line 2
Since you are reading from a file, you don't need to explicitly split on \n. You could read the file as a list of lines using readlines, stripping the trailing newline from each line so that the comparison matches:
with open('path/to/your_file') as f:
    lines = [line.rstrip('\n') for line in f.readlines()]
current_line = 'line 6'
output_line = lines[lines.index(current_line) - 4]
# line 2
You can use enumerate for your open(). For example:
with open('path/to/your.file') as f:
    for i, line in enumerate(f):
        # Do something with line
        # And you have the i as index.
To go back to the i-4 line, you may think about using while.
But why do you need to go back?
You can do:
with open("file.txt") as f:
    lines = f.readlines()
for nbr_line, line in enumerate(lines):
    if line == ...:
        output_line = lines[nbr_line - 4] # !!! nbr_line - 4 may be < 0
As far as I can see, you are reading the file line by line. I suggest reading the whole file into a list, as in the example below.
with open("filename.txt","r") as fd:
    lines=fd.readlines() # this will read each line and append it to lines list
lines[line_number] will give you the respective line.
f.readlines is not an efficient solution. If you work on a huge file, why would you want to read the whole file into memory?
def getNthLine(i):
    if i<1:
        return 'NaN'
    else:
        with open('temp.text', 'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    line = line.strip()
                    return line
f = open('temp.text','r')
for i,line in enumerate(f):
    print(line.strip())
    print(getNthLine(i-1))
There are not many other options for solving this kind of problem.
You could also use the tell and seek methods to play around, but generally there is no need for ninja tricks :).
If you are working on a huge file, just do not forget to use enumerate.
This is how you could do it with a generator, avoids reading the whole file into memory.
Update: used collections.deque (deque stands for "double ended queue") as recommended by Thierry Lathuille.
import collections
def file_generator(filepath):
    with open(filepath) as file:
        for l in file:
            yield l.rstrip()
def get_n_lines_previous(filepath, n, match):
    file_gen = file_generator(filepath)
    stored_lines = collections.deque('',n)
    for line in file_gen:
        if line == match:
            return stored_lines[0]
        stored_lines.append(line)
if __name__ == "__main__":
    print(get_n_lines_previous("lines.txt", 4, "line 6"))

Find unique entries in files

guess you have a solution concerning the following issue:
I want to compare two lists for common entries (on the basis of column 10) and write common entries to one file and unique entries for the first list into another file. The code I wrote is:
INFILE1 = open ("c:\\python\\test\\58962.filtered.csv", "r")
INFILE2 = open ("c:\\python\\test\\83887.filtered.csv", "r")
OUTFILE1 = open ("c:\\python\\test\\58962_vs_83887.common.csv", "w")
OUTFILE2 = open ("c:\\python\\test\\58962_vs_83887.unique.csv", "w")
for line in INFILE1:
    line = line.rstrip().split(",")
    if line[11] in INFILE2:
        OUTFILE1.write(line)
    else:
        OUTFILE2.write(line)
INFILE1.close()
INFILE2.close()
OUTFILE1.close()
OUTFILE2.close()
The following error appears:
8 OUTFILE1.write(line)
9 else:
---> 10 OUTFILE2.write(line)
11 INFILE1.close()
TypeError: write() argument must be str, not list
Does somebody know about help for this?
Best
This line
line = line.rstrip().split(",")
replaces the line you read from the file with its split list. You then try to write that list to your file. That's not how the write method works, and the error tells you exactly that.
Change it to:
for line in INFILE1:
    lineList = line.rstrip().split(",") # dont overwrite line, use lineList
    if lineList[11] in INFILE2: # used lineList
        OUTFILE1.write(line) # corrected indentation
    else:
        OUTFILE2.write(line)
You could have easily found your error yourself by printing out the line before and after splitting, or just before writing.
Please read How to debug small programs and follow it. It's easier to find and fix bugs yourself than to post questions here.
You have another problem at hand, though:
Files are stream based; they start at position 0 in the file. The position advances as you access parts of the file. Once at the end, you won't get anything from INFILE2.read() or other methods.
So if you want to repeatedly check whether a column of some line of file1 is somewhere in file2, you need to read file2 into a list (or other data structure) so your repeated checks work. In other words, this:
if lineList[11] in INFILE2:
might work once, then the file is consumed and it will return False all the time.
You also might want to change from:
f = open(...., ...)
# do something with f
f.close()
to
with open(name,"r") as f:
    # do something with f, no close needed, closed when leaving block
as it is safer, will close the file even if exceptions happen.
To solve that, try this (untested) code:
with open("c:\\python\\test\\83887.filtered.csv", "r") as file2:
    infile2 = file2.readlines() # read in all lines as list
with open("c:\\python\\test\\58962.filtered.csv", "r") as INFILE1:
    # next 2 lines are 1 line, \ at end signifies line continues
    with open("c:\\python\\test\\58962_vs_83887.common.csv", "w") as OUTFILE1, \
         open("c:\\python\\test\\58962_vs_83887.unique.csv", "w") as OUTFILE2:
        for line in INFILE1:
            lineList = line.rstrip().split(",")
            if any(lineList[11] in x for x in infile2): # check the list of lines if
                                                         # any contains lineList[11]
                OUTFILE1.write(line)
            else:
                OUTFILE2.write(line)
# all files are autoclosed here
Links to read:
the-with-statement
any() and other built-ins
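As a hedged alternative to scanning the whole second file for every line, the sketch below collects the key column of the second file into a set first, so each lookup is a single membership test. Note the assumption: it compares the column for exact equality, whereas the any(...) check above is a substring test against whole lines. The function name split_common_unique, the key_index parameter, and the sample rows are hypothetical.

```python
def split_common_unique(rows1, rows2, key_index):
    """Split rows1 into (common, unique) by comparing one column with rows2."""
    # Build the set of key values present in the second list once.
    keys2 = {row.rstrip("\n").split(",")[key_index] for row in rows2}
    common, unique = [], []
    for row in rows1:
        key = row.rstrip("\n").split(",")[key_index]
        (common if key in keys2 else unique).append(row)
    return common, unique

# Hypothetical two-column sample data; real files would be read with open().
first = ["a,1\n", "b,2\n", "c,3\n"]
second = ["x,2\n", "y,9\n"]
common, unique = split_common_unique(first, second, key_index=1)
print(common, unique)
```

The original rows are kept unmodified, so they can be passed straight to writelines on the two output files.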

Two simple questions about python

I have 2 simple questions about python:
1. How to get the number of lines of a file in python?
2. How to easily move the position of a file object to the last line?
Lines are just data delimited by the newline char '\n'.
1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:
count = 0
for line in open('myfile'):
    count += 1
print(count, line) # it will be the last line
2) Reading a chunk from the end of the file is the fastest method to find the last newline char. The file must be opened in binary mode, because Python 3 does not allow nonzero relative seeks on text files.
import os
def seek_newline_backwards(file_obj, eol_char=b'\n', buffer_size=200):
    if not file_obj.tell(): return # already in beginning of file
    # All lines end with \n, including the last one, so assuming we are just
    # after one end of line char
    file_obj.seek(-1, os.SEEK_CUR)
    while file_obj.tell():
        amount = min(buffer_size, file_obj.tell())
        file_obj.seek(-amount, os.SEEK_CUR)
        data = file_obj.read(amount)
        eol_pos = data.rfind(eol_char)
        if eol_pos != -1:
            file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
            break
        file_obj.seek(-len(data), os.SEEK_CUR)
You can use that like this:
f = open('some_file.txt', 'rb')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print(f.tell(), repr(f.readline()))
Let's not forget
f = open("myfile.txt")
lines = f.readlines()
numlines = len(lines)
lastline = lines[-1]
NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.
The easiest way is simply to read the file into memory. eg:
f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]
However for big files, this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line. eg:
f = open('filename.txt')
num_lines = sum(1 for line in f)
This is more efficient, since it won't load the entire file into memory, but only look at a line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers by:
f = open('filename.txt')
num_lines = 0
last_line = None
for line in f:
    num_lines += 1
    last_line = line
print("There were %d lines. The last was: %s" % (num_lines, last_line))
One final possible improvement, if you need only the last line, is to start at the end of the file and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need the line count as well, though, there's no alternative except to iterate through all lines in the file.
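A minimal sketch of that backwards search, assuming the file is opened in binary mode and uses \n line endings. The function name read_last_line and the chunk_size parameter are made up for this example, and the temporary file exists only to make the snippet self-contained.

```python
import os
import tempfile

def read_last_line(path, chunk_size=1024):
    """Return the last line of a file as bytes, reading backwards in chunks."""
    with open(path, "rb") as f:  # binary mode permits arbitrary seeks
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # Ignore one trailing newline so it is not mistaken for the break.
            body = data[:-1] if data.endswith(b"\n") else data
            newline = body.rfind(b"\n")
            if newline != -1:
                return data[newline + 1:]
        return data  # the file has a single line, or is empty

# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
last = read_last_line(tmp.name, chunk_size=4)
os.remove(tmp.name)
print(last)  # b'third\n'
```

The small chunk_size in the demonstration is only there to exercise the loop; for real files a larger chunk means fewer reads.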
For small files that fit memory,
how about using str.count() for getting the number of lines of a file:
line_count = open("myfile.txt").read().count('\n')
I'd like to add to the other solutions that some of them (those that look for \n) will not work with files with OS 9-style line endings (\r only), and that the file may contain an extra blank line at the end, because lots of text editors append one for some curious reason, so you might or might not want to add a check for it.
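One possible way to sidestep the line-ending issue for text that fits in memory is str.splitlines, which recognises \n, \r and \r\n (along with a few less common separators) and does not produce an extra empty entry for a trailing newline. The helper name count_lines below is hypothetical.

```python
def count_lines(text):
    """Count lines in a string regardless of the line-ending convention."""
    return len(text.splitlines())

unix_count = count_lines("a\nb\n")         # trailing newline, no extra line
old_mac_count = count_lines("a\rb\rc")     # \r-only line endings
windows_count = count_lines("a\r\nb\r\n")  # \r\n line endings
print(unix_count, old_mac_count, windows_count)  # 2 3 2
```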
The only way to count lines [that I know of] is to read all lines, like this:
count = 0
for line in open("file.txt"): count = count + 1
After the loop, count will have the number of lines read.
For the first question there are already a few good ones, I'll suggest @Brian's one as the best (most pythonic, line ending character proof and memory efficient):
f = open('filename.txt')
num_lines = sum(1 for line in f)
For the second one, I like @nosklo's one, but modified to be more general it should be:
import os
f = open('myfile')
to = f.seek(0, os.SEEK_END)
found = -1
while found == -1 and to > 0:
    fro = max(0, to-1024)
    f.seek(fro)
    chunk = f.read(to-fro)
    found = chunk.rfind("\n")
    to -= 1024
if found != -1:
    found += fro
It searches in chunks of 1 KB from the end of the file, until it finds a newline character or reaches the start of the file. At the end of the code, found is the index of the last newline character.
Answer to the first question (beware of poor performance on large files when using this method):
f = open("myfile.txt").readlines()
print len(f) - 1
Answer to the second question:
f = open("myfile.txt").read()
print f.rfind("\n")
P.S. Yes, I do understand that this is only suitable for small files and simple programs. I will not delete this answer, however useless it may seem for real use cases.
Answer1:
x = open("file.txt")
opens the file and associates x with file.txt
y = x.readlines()
returns all lines in list
length = len(y)
stores the length of the list in length
Or in one line
length = len(open("file.txt").readlines())
Answer2 :
last = y[-1]
returns the last element of list
Approach:
Open the file in read mode and assign the file object to a variable named “file”.
Assign 0 to the counter variable.
Read the content of the file using the read function and assign it to a variable named “Content”.
Create a list of the content, splitting it wherever an “\n” is encountered.
Traverse the list using a for loop and increment the counter variable for each non-empty element.
Display the value now present in the variable Counter, which is the required result of this program.
Python program to count the number of lines in a text file
# Opening a file
file = open("filename","file mode")#file mode like r,w,a...
Counter = 0
# Reading from file
Content = file.read()
CoList = Content.split("\n")
for i in CoList:
    if i:
        Counter += 1
print("This is the number of lines in the file")
print(Counter)
The above code prints the number of non-empty lines in the file, since empty strings in the list are skipped. Replace filename with the name of the file, including its extension, and file mode with 'r' for reading.
