Suppose I have a text file that goes like this:
AAAAAAAAAAAAAAAAAAAAA #<--- line 1
BBBBBBBBBBBBBBBBBBBBB #<--- line 2
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
FFFFFFFFFFFFFFFFFFFFF #<--- line 6
GGGGGGGGGGGGGGGGGGGGG #<--- line 7
HHHHHHHHHHHHHHHHHHHHH #<--- line 8
Ignore "#<--- line...", it's just for demonstration
Assumptions
I don't know what line 3 is going to contain (because it changes
all the time)...
The first 2 lines have to be deleted...
After the first 2 lines, I want to keep the next 3 lines...
Then, I want to delete everything after those 3 lines.
End Result
The end result should look like this:
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
Lines deleted: First 2 + Everything after the next 3 (i.e. after line 5)
Required
All Pythonic suggestions are welcome! Thanks!
Reference Material
https://thispointer.com/python-how-to-delete-specific-lines-in-a-file-in-a-memory-efficient-way/
import os

def delete_multiple_lines(original_file, line_numbers):
    """In a file, delete the lines at the line numbers in the given list."""
    is_skipped = False
    counter = 0
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read-only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line, copy data from original file to dummy file
        for line in read_obj:
            # If the current line number exists in the list, skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1
    # If any line was skipped, rename the dummy file as the original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)
Then...
delete_multiple_lines('sample.txt', [0,1,2])
The problem with this method might be that, if your file had 100 lines on top to delete, you'd have to specify [0, 1, 2, ..., 99]. Right?
Answer
Courtesy of @sandes
The following code will:
delete the first 63 lines
get you the next 95 lines
ignore the rest
create a new file
with open("sample.txt", "r") as f:
    lines = f.readlines()

new_lines = []
# skip the first 63 lines, then keep the next 95 (indices 63..157)
idx_lines_wanted = list(range(63, 63 + 95))

for i, line in enumerate(lines):
    if i > idx_lines_wanted[-1]:
        break
    if i in idx_lines_wanted:
        new_lines.append(line)

with open("sample2.txt", "w") as f:
    for line in new_lines:
        f.write(line)
EDIT: iterating directly over f
based on @Kenny's comment and @chepner's suggestion
with open("your_file.txt", "r") as f:
    new_lines = []
    for idx, line in enumerate(f):
        if idx in range(2, 5):  # i.e. [2, 3, 4]
            new_lines.append(line)

with open("your_new_file.txt", "w") as f:
    for line in new_lines:
        f.write(line)
This is really something that's better handled by an actual text editor.
import subprocess
subprocess.run(['ed', original_file], input=b'1,2d\n+3,$d\nwq\n')
A crash course in ed, the POSIX standard text editor.
ed opens the file named by its argument. It then proceeds to read commands from its standard input. Each command is a single character, with some commands taking one or two "addresses" to indicate which lines to operate on.
After each command, the "current" line number is set to the line last affected by a command. This is used with relative addresses, as we'll see in a moment.
1,2d means to delete lines 1 through 2; the remaining lines are renumbered immediately, and the current line is set to the line after the deleted ones (the new line 1, formerly line 3)
+3,$d deletes everything from three lines past the current line (the new line 4, i.e. the old line 6) through the end of the file ($ is a special address indicating the last line of the file)
wq writes all changes to disk and quits the editor.
Related
I'm stuck trying to fix a CSV file. The CSV has 8 columns, but every value is on its own line. I need to be able to add a "," (already did that) and then bring every 8 values up onto the same line.
Example :
data
data
data
data
data
data
data, data, data, data, data, data
Every 8 Lines.
import pandas as pd

filepath = "file.txt"
with open(filepath) as f:
    lines = f.read().splitlines()

with open(filepath, "w") as f:
    for line in lines:
        f.write(line + ",\n")

dataframe1 = pd.read_csv("file.txt")

# storing this dataframe in a csv file
dataframe1.to_csv('Exported.csv', index=None)
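For the actual goal of joining every 8 physical lines into one comma-separated row, a plain-Python sketch may be simpler than the pandas round-trip; join_groups is a hypothetical helper (any short final group is joined as-is):

```python
def join_groups(lines, size=8):
    """Join consecutive groups of `size` lines into comma-separated rows."""
    return [", ".join(lines[i:i + size]) for i in range(0, len(lines), size)]

# Demo on in-memory data; for a real file, pass f.read().splitlines() instead.
rows = join_groups(["data"] * 16, size=8)
print(len(rows))  # 2 rows of 8 comma-separated values each
```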
If you want to modify every 8th line of the input file:
open the input file for reading and create a new file for writing
iterate the input file, line by line
every 8th line, transform the line (before writing)
write the line
For purpose of illustrating how to get some fine control over this, I'm going to fix every 4th line of this file:
R01_C1 R01_C2 R01_C3 R01_C4
R02_C1 R02_C2 R02_C3 R02_C4
R03_C1 R03_C2 R03_C3 R03_C4
R04_C1 R04_C2 R04_C3 R04_C4
R05_C1 R05_C2 R05_C3 R05_C4
R06_C1 R06_C2 R06_C3 R06_C4
R07_C1 R07_C2 R07_C3 R07_C4
R08_C1 R08_C2 R08_C3 R08_C4
R09_C1 R09_C2 R09_C3 R09_C4
R10_C1 R10_C2 R10_C3 R10_C4
R11_C1 R11_C2 R11_C3 R11_C4
R12_C1 R12_C2 R12_C3 R12_C4
I run this:
with open("bad-file.txt") as in_f, open("good-file.txt", "w") as out_f:
    for i, line in enumerate(in_f):
        if i % 4 == 0:
            line = line.replace(" ", ",")
        out_f.write(line)
and end up with this:
R01_C1,R01_C2,R01_C3,R01_C4 <-- line 1
R02_C1 R02_C2 R02_C3 R02_C4
R03_C1 R03_C2 R03_C3 R03_C4
R04_C1 R04_C2 R04_C3 R04_C4
R05_C1,R05_C2,R05_C3,R05_C4 <-- line 5
R06_C1 R06_C2 R06_C3 R06_C4
R07_C1 R07_C2 R07_C3 R07_C4
R08_C1 R08_C2 R08_C3 R08_C4
R09_C1,R09_C2,R09_C3,R09_C4 <-- line 9
R10_C1 R10_C2 R10_C3 R10_C4
R11_C1 R11_C2 R11_C3 R11_C4
R12_C1 R12_C2 R12_C3 R12_C4
You can control where the replacement starts by playing with the start value of enumerate():
for i, line in enumerate(in_f, 1):
yields
R01_C1 R01_C2 R01_C3 R01_C4
R02_C1 R02_C2 R02_C3 R02_C4
R03_C1 R03_C2 R03_C3 R03_C4
R04_C1,R04_C2,R04_C3,R04_C4 <-- line 4
R05_C1 R05_C2 R05_C3 R05_C4
R06_C1 R06_C2 R06_C3 R06_C4
R07_C1 R07_C2 R07_C3 R07_C4
R08_C1,R08_C2,R08_C3,R08_C4 <-- line 8
R09_C1 R09_C2 R09_C3 R09_C4
R10_C1 R10_C2 R10_C3 R10_C4
R11_C1 R11_C2 R11_C3 R11_C4
R12_C1,R12_C2,R12_C3,R12_C4 <-- line 12
I'm reading a 15 GB file in Python; my code looks like this:
infile = open(file, "r")
count = 0
line = infile.readline()
num_lines = int(sum(1 for line in open(file)))
while line:
    if count % 2 == 0:
        if count > num_lines:
            break
        fields = line.split(";")
        tr = int(fields[0].split(",")[1])
        for ff in fields[1:]:
            ffsplit = ff.split(",")
            address = int(ffsplit[0])
            amount = int(ffsplit[1])
            if address not in add_balance:
                add_balance[address] = -amount
            else:
                add_balance[address] -= amount
            if address not in de_send:
                de_send[address] = 1
            else:
                de_send[address] += 1
    else:
        fields = line.split(";")
        for ff in fields:
            ffsplit = ff.split(",")
            address = int(ffsplit[0])
            amount = int(ffsplit[1])
            if address not in add_balance:
                add_balance[address] = amount
            else:
                add_balance[address] += amount
            if address not in de_rec:
                de_rec[address] = 1
            else:
                de_rec[address] += 1
    count += 1
    line = infile.readline()
Now, when tr is in a certain range ([100000, 200000], [200000, 300000], and so on) I need to create a networkx graph for that range (adding tr and the addresses in the range as nodes) and do some other operations while still updating the dictionaries.
tr works like an index: starting from 1, it increases by 1 every two lines (that's the reason for the count % 2 == 0 check).
I tried to write a createGraph function that creates nodes in that range while reading the file. My problem is that every time I create the graph I start reading the file from the beginning, which is obviously not computationally efficient.
How can I, starting from a certain tr (let's say 100000), create a graph every 100000 tr inside the while loop?
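One pattern that avoids re-reading from the beginning is to keep the single pass and flush a chunk of records every N of them; stream_in_chunks and the sizes below are hypothetical names, sketching where the networkx graph for each tr range would be built:

```python
def stream_in_chunks(records, chunk_size, on_chunk):
    """Single pass over (tr, payload) records; call on_chunk every chunk_size records."""
    chunk = []
    for tr, payload in records:
        chunk.append((tr, payload))
        if len(chunk) == chunk_size:
            on_chunk(chunk)   # e.g. build a networkx graph for this tr range
            chunk = []
    if chunk:                 # flush the final partial chunk
        on_chunk(chunk)

# Demo: record the tr values seen in each chunk.
seen = []
stream_in_chunks(((tr, None) for tr in range(1, 11)), 4,
                 lambda c: seen.append([t for t, _ in c]))
print(seen)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

In the real loop, records would be the (tr, fields) pairs parsed from the file, with chunk_size=100000.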
If the file never changes, you can precompute the position of the desired line using the .tell method, and then use the .seek method to move to that position and start working from there:
>>> with open("test.txt", "w") as file:  # demonstration file
...     for n in range(10):
...         print("line", n, file=file)
...
>>> desire_line = 4
>>> position_line = 0
>>> with open("test.txt") as file:  # get the line position
...     for i, n in enumerate(iter(file.readline, "")):
...         if i == desire_line:
...             break
...     position_line = file.tell()
...
>>> with open("test.txt") as file:
...     file.seek(position_line)
...     for line in file:
...         print(line, end="")
...
40
line 5
line 6
line 7
line 8
line 9
>>>
If the file does change, in particular in the lines prior to your desired point (which would mess up the seek), you can use the itertools module to help you get there:
>>> import itertools
>>> with open("test.txt") as file:
...     for line in itertools.islice(file, 5, None):
...         print(line, end="")
...
line 5
line 6
line 7
line 8
line 9
>>>
For more alternatives check this answer
I have test.txt that looks like the screenshot
PS: There are leading spaces in the second line, so it is < space > line 2
in result we have to get:
line 1
line 2
line 3
This is what I have so far
with open("test", 'r+') as fd:
    lines = fd.readlines()
    fd.seek(0)
    fd.writelines(line for line in lines if line.strip())
    fd.truncate()
But it is not handling cases where the line starts with a space (in the example, line 2). How do I modify my code? I want to use Python.
Test.txt file:
first line
second line
third line
Python Code:
#// Imports
import os

#// Global vars
fileName: str = "test.txt"

#// Logic
def FileCorrection(file: str):
    #// Read the original file and write a temporary file alongside it
    with open(file, "r") as r, open(f"temp_{file}", "w") as w:
        # Get a line from the original file
        line = r.readline()
        # While we still have lines
        while line:
            # Make a temporary line without spaces at the end or the front of the line (in case there are any)
            tempLine: str = line.strip()
            #// Check if the line is empty
            if tempLine == "":
                Line: tuple = (False, "Empty line...")
            #// If not, keep the line
            else:
                Line: tuple = (True, tempLine)
            #// Print/show the line if it was kept; only kept lines are written to the new file
            if Line[0]:
                print(Line[1])
                w.write(f"{Line[1]}\n")
            line = r.readline()
    # The with blocks guarantee both files are closed here.
    # Now replace the original file with the temporary one:
    # delete the original, then rename the temporary file to the original name
    os.remove(file)
    os.rename(f"temp_{file}", file)

if __name__ == "__main__":
    FileCorrection(fileName)
    # Show when it is done
    print(">> DONE!")
Console out:
first line
second line
third line
>> DONE!
Process finished with exit code 0
P.S.: The code was updated/optimized!
I would suggest formatting the input (a screenshot of the text file would do). Assuming your input looks like this, you can use strip to handle lines that begin with a space.
#Code
with open(r"demo.txt", "r") as f:
    data = f.read()

data_list = [s.strip() for s in data.split("\n") if s.strip()]
print("\n".join(data_list))
I am trying to copy lines four lines before a line that contains a specific keyword.
if line.find("keyword") == 0:
    f.write(line - 3)
I don't need the line where I found the keyword, but 4 lines before it. Since the write method doesn't work with line numbers, I got stuck
If you're already using two files, it's as simple as keeping a buffer and writing out the last 3 entries in it when you encounter a match:
buf = []  # your buffer
with open("in_file", "r") as f_in, open("out_file", "w") as f_out:  # open the in/out files
    for line in f_in:  # iterate the input file line by line
        if "keyword" in line:  # the current line contains the keyword
            f_out.writelines(buf[-3:])  # write the last 3 lines (or fewer if not available)
            f_out.write(line)  # write the current line; omit if not needed
            buf = []  # reset the buffer
        else:
            buf.append(line)  # add the current line to the buffer
You can just use a list: append each line to it (truncating to the last 3). When you reach the target line, you are done.
last_3 = []
with open("the_dst_file", "w") as fw:
    with open("the_source_file") as fr:
        for line in fr:
            if line.find("keyword") == 0:
                fw.write(last_3[0])
                last_3 = []
                continue
            last_3.append(line)
            last_3 = last_3[-3:]
If the format of the file is known such that "keyword" will always have at least 3 lines preceding it, and at least 3 lines between instances, then the above is fine. If not, you would need to guard the write by checking that the length of last_3 is == 3 before pulling off the first element.
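With that guard added, the idea can be sketched on in-memory lines (lines_before_keyword is a made-up name; it returns the line n positions before each keyword line and skips matches that lack a full window):

```python
def lines_before_keyword(lines, keyword, n=3):
    """Return the line n positions before each keyword line, guarding short buffers."""
    out = []
    buf = []
    for line in lines:
        if keyword in line:
            if len(buf) == n:      # guard: only emit when a full window exists
                out.append(buf[0])
            buf = []
            continue
        buf.append(line)
        buf = buf[-n:]             # keep only the last n lines
    return out

print(lines_before_keyword(["a", "b", "c", "d", "keyword here", "x"], "keyword"))
# ['b']
```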
I have a file; after reading a line from the file I have named it current_line. I want to fetch the line 4 lines above current_line. How can this be done using Python?
line 1
line 2
line 3
line 4
line 5
line 6
Now say I have fetched line 6 and made
current_line = line 6
Now I want the 4th line above it, i.e. I now want line 2:
output_line = line 2
PS: I don't want to read the file from the bottom.
You can keep a list of the last 4 lines while iterating over the lines of your file. A good way to do it is to use a deque with a maximum length of 4:
from collections import deque

last_lines = deque(maxlen=4)

with open('test.txt') as f:
    for line in f:
        if line.endswith('6\n'):  # Your real condition here
            print(last_lines[0])
        last_lines.append(line)
# Output:
# line 2
Once a bounded length deque is full, when new items are added, a
corresponding number of items are discarded from the opposite end.
We read the file line by line and only keep the needed lines in memory.
Imagine we have just read line 10; we have lines 6 to 9 in the queue.
If the condition is met, we retrieve line 6 at the start of the queue and use it.
We then append line 10 to the deque, and the first item (line 6) gets pushed out, as we are sure we won't need it anymore; we now have lines 7 to 10 in the queue.
My approach would be converting the contents to a list by splitting on \n and retrieving the required line by index.
lines = '''line 1
line 2
line 3
line 4
line 5
line 6'''
s = lines.split('\n')
current_line = 'line 6'
output_line = s[s.index(current_line) - 4]
# line 2
Since you are reading from a file, you don't need to explicitly split on \n. You can build the same list of lines (without trailing newlines) using splitlines:
with open('path/to/your_file') as f:
    lines = f.read().splitlines()

current_line = 'line 6'
output_line = lines[lines.index(current_line) - 4]
# line 2
You can use enumerate for your open(). For example:
with open('path/to/your.file') as f:
    for i, line in enumerate(f):
        # Do something with line
        # And you have i as the index.
To go back to the i-4 line, you might think about using a while loop.
But why do you need to go back?
you can do:
with open("file.txt") as f:
    lines = f.readlines()

for nbr_line, line in enumerate(lines):
    if line == ...:
        output_line = lines[nbr_line - 4]  # !!! nbr_line - 4 may be < 0
As I can see, you are reading the file line by line. I suggest you read the whole file into a list, as in the example below.
with open("filename.txt", "r") as fd:
    lines = fd.readlines()  # this reads each line and appends it to the lines list
lines[line_number] will give you the respective line.
f.readlines is not an effective solution. If you work with a huge file, why would you want to read the whole file into memory?
def getNthLine(i):
    if i < 1:
        return 'NaN'
    with open('temp.text', 'r') as f:
        for index, line in enumerate(f):
            if index == i:
                return line.strip()

f = open('temp.text', 'r')
for i, line in enumerate(f):
    print(line.strip())
    print(getNthLine(i - 1))
There are not many more options for solving this kind of problem.
You could also use the tell and seek methods to play around, but generally there is no need for such tricks.
If you are working with a huge file, just don't forget to use enumerate.
This is how you could do it with a generator; it avoids reading the whole file into memory.
Update: used collections.deque (deque stands for "double-ended queue") as recommended by Thierry Lathuille.
import collections

def file_generator(filepath):
    with open(filepath) as file:
        for l in file:
            yield l.rstrip()

def get_n_lines_previous(filepath, n, match):
    file_gen = file_generator(filepath)
    stored_lines = collections.deque('', n)
    for line in file_gen:
        if line == match:
            return stored_lines[0]
        stored_lines.append(line)

if __name__ == "__main__":
    print(get_n_lines_previous("lines.txt", 4, "line 6"))