Check if log file has been updated / only parse new data - Python

I have a log file and I want to check, every n seconds (or whenever it has been modified), whether new data has been appended to it.
I count the number of lines; if the previous count is less than the new one, I start parsing the new data at the index equal to the previous line count.
import time

def parse_file(path, index_to_start_parsing):
    # Parse the file starting from the given line index
    ...

file_path = r"my_log_path"
previous_lines_count = 0
check_seconds = 5

while True:
    time.sleep(check_seconds)
    with open(file_path) as f:
        current_lines_count = sum(1 for _ in f)
    if current_lines_count > previous_lines_count:
        data = parse_file(file_path, previous_lines_count)
        previous_lines_count = current_lines_count
It works, but I'm looking for a more optimized approach.
How can I check whether the file has changed (I read about watchdog), and how can I parse only the newly appended data in a more efficient way?
EDIT:
I use os.stat('somefile.txt').st_size to check if the file changed.

Use f.tell() to discover the current offset of the file pointer when you're done reading in your initial pass. On future passes, use f.seek() to advance the file pointer to that previous point in the file, and continue reading as normal.
offset = 0
while True:
    f = open(file)
    f.seek(offset)
    while line := f.readline():
        # process...
        pass
    offset = f.tell()
    f.close()
    # wait until next iteration
This is an old C trick.
Checking if the file has been modified is usually as easy as looking at the file size and seeing if it's the same as the last time you looked.
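As a rough illustration, here is a minimal polling sketch that combines the size check with tell()/seek(); the log path and the parse step are placeholders, and the offset reset assumes the file may get truncated or rotated:
import os
import time

LOG_PATH = "my_log_path"      # placeholder path
CHECK_SECONDS = 5

def parse_new_data(chunk):
    # placeholder: handle only the newly appended text
    print(chunk, end="")

offset = 0
last_size = 0
while True:
    time.sleep(CHECK_SECONDS)
    size = os.stat(LOG_PATH).st_size
    if size == last_size:
        continue                  # nothing appended since the last check
    if size < last_size:
        offset = 0                # file was truncated/rotated, start over
    with open(LOG_PATH) as f:
        f.seek(offset)            # skip everything already parsed
        parse_new_data(f.read())
        offset = f.tell()         # remember where we stopped
    last_size = size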


How to quickly get the last line of a huge csv file (48M lines)? [duplicate]

I have a csv file that grows until it reaches approximately 48 million lines.
Before adding new lines to it, I need to read the last line.
I tried the code below, but it got too slow and I need a faster alternative:
def return_last_line(filepath):
    with open(filepath, 'r') as file:
        for x in file:
            pass
        return x

return_last_line('lala.csv')
Here is my take, in Python:
I created a function that lets you choose how many of the last lines you want, because the last lines may be empty.
def get_last_line(file, how_many_last_lines=1):
    # open your file using with: safety first, kids!
    with open(file, 'r') as file:
        # find the position of the end of the file: end of the file stream
        end_of_file = file.seek(0, 2)
        # set your stream at the end: seek the final position of the file
        file.seek(end_of_file)
        # trace back each character of your file in a loop
        for num in range(end_of_file + 1):
            file.seek(end_of_file - num)
            # save the last characters of your file as a string: last_line
            last_line = file.read()
            # count how many '\n' you have in your string:
            # if you have 1, you are in the last line; if you have 2, you have the two last lines
            if last_line.count('\n') == how_many_last_lines:
                return last_line

get_last_line('lala.csv', 2)
This lala.csv has 48 million lines, as in your example. It took practically 0 seconds to get the last line.
Here is code for finding the last line of a file using mmap. It should work on Unix and derivatives and Windows alike (I've tested this on Linux only, please tell me if it works on Windows too ;), i.e. pretty much everywhere it matters. Since it uses memory-mapped I/O it can be expected to be quite performant.
It expects that you can map the entire file into the address space of a process; that should be OK for a 50 MB file everywhere, but for a 5 GB file you'd need a 64-bit machine or some extra slicing.
import mmap

def iterate_lines_backwards(filename):
    with open(filename, "rb") as f:
        # memory-map the file, size 0 means whole file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = len(mm)
            while start > 0:
                start, prev = mm.rfind(b"\n", 0, start), start
                slice = mm[start + 1:prev + 1]
                # if the last character in the file was a '\n',
                # technically the empty string after that is not a line.
                if slice:
                    yield slice.decode()

def get_last_nonempty_line(filename):
    for line in iterate_lines_backwards(filename):
        if stripped := line.rstrip("\r\n"):
            return stripped

print(get_last_nonempty_line("datafile.csv"))
As a bonus there is a generator iterate_lines_backwards that would efficiently iterate over the lines of a file in reverse for any number of lines:
print("Iterating the lines of datafile.csv backwards")
for l in iterate_lines_backwards("datafile.csv"):
print(l, end="")
This is generally a rather tricky thing to do. A very efficient way of getting a chunk that includes the last lines is the following:
import os

def get_last_lines(path, offset=500):
    """ An efficient way to get the last lines of a file.

    IMPORTANT:
    1. Choose offset to be greater than
       max_line_length * number of lines that you want to recover.
    2. This will throw an os.OSError if the file is shorter than
       the offset.
    """
    with path.open("rb") as f:
        f.seek(-offset, os.SEEK_END)
        while f.read(1) != b"\n":
            f.seek(-2, os.SEEK_CUR)
        return f.readlines()
You need to know the maximum line length though and ensure that the file is at least one offset long!
To use it, do the following:
from pathlib import Path

n_last_lines = 10
last_bit_of_file = get_last_lines(Path("/path/to/my/file"))
real_last_n_lines = last_bit_of_file[-n_last_lines:]
Now finally you need to decode the binary to strings:
real_last_n_lines_non_binary = [x.decode() for x in real_last_n_lines]
Probably all of this could be wrapped in one more convenient function.
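For example, a possible wrapper (a sketch only; the helper name tail_lines is made up, and the same assumptions about offset versus line length apply):
import os
from pathlib import Path

def tail_lines(path, n=10, offset=500):
    """Return the last n lines of a file as decoded strings.

    Assumes offset > n * max_line_length and that the file is at least
    offset bytes long (otherwise an OSError is raised).
    """
    with Path(path).open("rb") as f:
        f.seek(-offset, os.SEEK_END)
        # step forward to the start of the next complete line
        while f.read(1) != b"\n":
            f.seek(-2, os.SEEK_CUR)
        lines = f.readlines()
    return [line.decode() for line in lines[-n:]]

# e.g. print(tail_lines("/path/to/my/file", n=10))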
If you are running your code in a Unix-based environment, you can execute the tail shell command from Python to read the last line:
import subprocess
subprocess.run(['tail', '-n', '1', '/path/to/lala.csv'])
You could additionally store the last line in a separate file, which you update whenever you add new lines to the main file.
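If you want tail's output back in Python rather than printed to the console, a hedged sketch using capture_output (available since Python 3.7) might look like this:
import subprocess

result = subprocess.run(
    ['tail', '-n', '1', '/path/to/lala.csv'],
    capture_output=True, text=True, check=True,
)
last_line = result.stdout.rstrip('\n')
print(last_line)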
This works well for me:
https://pypi.org/project/file-read-backwards/
from file_read_backwards import FileReadBackwards

with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
    # getting lines by lines starting from the last line up
    for l in frb:
        if l:
            print(l)
            break
An easy way to do this is with deque:
from collections import deque

def return_last_line(filepath):
    with open(filepath, 'r') as f:
        q = deque(f, 1)
    return q[0]
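The second argument to deque is maxlen, so the same idea extends to the last N lines with bounded memory (a small sketch; the function name is made up):
from collections import deque

def return_last_lines(filepath, n=5):
    # the deque keeps only the final n lines as the file is iterated
    with open(filepath, 'r') as f:
        return list(deque(f, n))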
Since seek() returns the position that it moved to, you can use it to move backward and position the cursor at the beginning of the last line.
with open("test.txt") as f:
p = f.seek(0,2)-1 # ignore trailing end of line
while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
p = f.seek(p-1,0) # search backward
lastLine = f.read().strip() # read from start of last line
print(lastLine)
To get the last non-empty line, you can add a while loop around the search:
with open("test.txt") as f:
p,lastLine = f.seek(0,2),"" # start from end of file
while p and not lastLine: # want last non-empty line
while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
p = f.seek(p-1,0) # search backward
lastLine = f.read().strip() # read from start of last line
Based on @kuropan.
Faster and shorter:
# 60.lastlinefromlargefile.py
# juanfc 2021-03-17

import os

def get_last_lines(fileName, offset=500):
    """ An efficient way to get the last lines of a file.

    IMPORTANT:
    1. Choose offset to be greater than
       max_line_length * number of lines that you want to recover.
    2. This will throw an os.OSError if the file is shorter than
       the offset.
    """
    with open(fileName, "rb") as f:
        f.seek(-offset, os.SEEK_END)
        return f.read().decode('utf-8').rstrip().split('\n')[-1]

print(get_last_lines('60.lastlinefromlargefile.py'))

Is there a way to read file in reverse using with open using Python

I'm trying to read a file.out server log, but I only need to read the latest data within a datetime range.
Is it possible to read a file in reverse using with open() and one of its modes (methods)?
The a+ mode gives access to the end of the file:
"a+" Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of the file, irrespective of any intervening fseek(3) or similar.
Is there a way to use a+ or other modes (methods) to access the end of the file and read a specific range?
The regular r mode reads the file from the beginning:
with open('file.out','r') as file:
I have tried using reversed():
for line in reversed(list(open('file.out').readlines())):
but it returns no rows for me.
Or are there other ways to reverse-read a file... help
EDIT
What I got so far:
import os
import time
from datetime import datetime as dt

start_0 = dt.strptime('2019-01-27', '%Y-%m-%d')
stop_0 = dt.strptime('2019-01-27', '%Y-%m-%d')
start_1 = dt.strptime('09:34:11.057', '%H:%M:%S.%f')
stop_1 = dt.strptime('09:59:43.534', '%H:%M:%S.%f')

os.system("touch temp_file.txt")
process_start = time.clock()
count = 0
print("reading data...")

for line in reversed(list(open('file.out'))):
    try:
        th = dt.strptime(line.split()[0], '%Y-%m-%d')
        tm = dt.strptime(line.split()[1], '%H:%M:%S.%f')
        if (th == start_0) and (th <= stop_0):
            if (tm > start_1) and (tm < stop_1):
                count += 1
                print("%d occurancies" % (count))
                os.system("echo '" + line.rstrip() + "' >> temp_file.txt")
        if (th == start_0) and (tm < start_1):
            break
    except KeyboardInterrupt:
        print("\nLast line before interrupt:%s" % (str(line)))
        break
    except IndexError as err:
        continue
    except ValueError as err:
        continue

process_finish = time.clock()
print("Done:" + str(process_finish - process_start) + " seconds.")
I'm adding these limits so that when I find the rows it can at least print that the occurrences appeared and then just stop reading the file.
The problem is that it's reading, but it's way too slow.
EDIT 2
(2019-04-29 9.34am)
All the answers I received work well for reverse-reading logs, but in my case (and maybe other people's), when you have an n GB log file, Rocky's answer below suited me best.
The code that works for me (I only added a for loop to Rocky's code):
import collections

number_of_rows = 1000  # example value; set to however many trailing lines you need

log_lines = collections.deque()
for line in open("file.out", "r"):
    log_lines.appendleft(line)
    if len(log_lines) > number_of_rows:
        log_lines.pop()

log_lines = list(log_lines)
for line in log_lines:
    print(str(line).split("\n"))
Thanks people, all the answers work.
-lpkej
There's no way to do it with open parameters, but if you want to read the last part of a large file without loading the whole file into memory (which is what reversed(list(fp)) will do), you can use a two-pass solution:
LINES_FROM_END = 1000
with open(FILEPATH, "r") as fin:
    s = 0
    while fin.readline():  # fixed typo, readlines() will read everything...
        s += 1
    fin.seek(0)
    mylines = []
    for i, e in enumerate(fin):
        if i >= s - LINES_FROM_END:
            mylines.append(e)
This won't keep your file in memory; you can also reduce it to one pass by using collections.deque:
# one pass (a lot faster):
import collections

mylines = collections.deque()
for line in open(FILEPATH, "r"):
    mylines.appendleft(line)
    if len(mylines) > LINES_FROM_END:
        mylines.pop()

mylines = list(mylines)
# mylines will contain #LINES_FROM_END count of lines from the end.
Sure there is:
filename = 'data.txt'
for line in reversed(list(open(filename))):
    print(line.rstrip())
EDIT:
As mentioned in comments this will read the whole file into memory. This solution should not be used with large files.
Another option is to mmap.mmap the file and then use rfind from the end to search for the newlines and then slice out the lines.
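A minimal sketch of that mmap/rfind idea, just for the last line (the filename is a placeholder):
import mmap

def last_line_mmap(filename):
    with open(filename, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # ignore a trailing newline so we find the start of the last real line
            if end and mm[end - 1:end] == b"\n":
                end -= 1
            start = mm.rfind(b"\n", 0, end) + 1  # rfind returns -1 if no newline, so start becomes 0
            return mm[start:end].decode()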
Hey m8, I have made this code and it works for me; I can read my file in reversed order. Hope it helps :)
I start by creating a new text file, so I don't know how important that is for you.
def main():
    f = open("Textfile.txt", "w+")
    for i in range(10):
        f.write("line number %d\r\n" % (i + 1))
    f.close()

def readReversed():
    for line in reversed(list(open("Textfile.txt"))):
        print(line.rstrip())

main()
readReversed()

Read only one random row from CSV file and move to another CSV

I'm facing a problem reading random rows from a large csv file and moving them to another CSV file, using pandas 0.18.1 and Python 2.7.10 on Windows.
I want to load only the randomly selected rows into memory and move them to another CSV. I don't want to load the entire content of the first CSV into memory.
This is the code I used:
import random

file_size = 100
f = open("customers.csv", 'r')
o = open("train_select.csv", 'w')
for i in range(0, 50):
    offset = random.randrange(file_size)
    f.seek(offset)
    f.readline()
    random_line = f.readline()
    o.write(random_line)
The current output looks something like this:
2;flhxu-name;tum-firstname; 17520;buo-city;1966/04/24;wfyz-street; 96;GA;GEORGIA
1;jwcdf-name;fsj-firstname; 13520;oem-city;1954/02/07;amrb-street; 145;AK;ALASKA
1;jwcdf-name;fsj-firstname; 13520;oem-city;1954/02/07;amrb-street; 145;AK;ALASKA
My problems are two-fold:
I want to see the header in the second csv as well, not just the rows.
A row should be selected by the random function only once.
The output should be something like this:
id;name;firstname;zip;city;birthdate;street;housenr;stateCode;state
2;flhxu-name;tum-firstname; 17520;buo-city;1966/04/24;wfyz-street; 96;GA;GEORGIA
1;jwcdf-name;fsj-firstname; 13520;oem-city;1954/02/07;amrb-street; 145;AK;ALASKA
You can do this more simply:
First, read the customers file fully; the title line is a special case, so keep it out.
Shuffle the list of lines (that's what you were looking for).
Write back the title + shuffled lines.
Code:
import random

with open("customers.csv", 'r') as f:
    title = f.readline()
    lines = f.readlines()

random.shuffle(lines)

with open("train_select.csv", 'w') as f:
    f.write(title)
    f.writelines(lines)
EDIT: if you don't want to hold the whole file in memory, here's an alternative. The only drawback is that you have to read the file once (but not store it in memory) to compute the line offsets:
import random

input_file = "customers.csv"
line_offsets = list()

# just read the title, then record line offsets
with open(input_file, 'r') as f:
    title = f.readline()
    # store offset of the first line
    while True:
        # store offset of the next line start
        line_offsets.append(f.tell())
        line = f.readline()
        if line == "":
            break

    # now shuffle the offsets
    random.shuffle(line_offsets)

    # and write the output file
    with open("train_select.csv", 'w') as fw:
        fw.write(title)
        for offset in line_offsets:
            # seek to a line start
            f.seek(offset)
            fw.write(f.readline())
At OP's request, and since my two previous implementations had to read the input file, here is a more complex implementation where the file is not read in advance.
It uses bisect to store the pairs of line offsets, and a minimum line length (to be configured) so that the random offset list doesn't grow longer than necessary.
Basically, the program generates randomly ordered offsets ranging from the offset of the second line (the title line is skipped) to the end of the file, in steps of the minimum line length.
For each offset, it checks whether the line has already been read (using bisect, which is fast, but further testing is complex because of the corner cases):
- if not read yet, skip back to find the previous linefeed (that means reading the file, can't do otherwise), write the line to the output file, and store the start/end offsets in the pair list;
- if already read, skip it.
The code:
import random, os, bisect

input_file = "csv2.csv"
input_size = os.path.getsize(input_file)
smallest_line_len = 4
line_offsets = []

# just read the title
with open(input_file, 'r') as f, open("train_select.csv", 'w') as fw:
    # read title and write it back
    title = f.readline()
    fw.write(title)
    # generate offset list, starting from current pos to the end of file
    # with a step of min line len to avoid generating too many numbers
    # (this can be 1 but that will take a while)
    offset_list = list(range(f.tell(), input_size, smallest_line_len))
    # shuffle the list at random
    random.shuffle(offset_list)
    # now loop through the offsets
    for offset in offset_list:
        # look if the offset is already contained in the list of sorted tuples
        insertion_point = bisect.bisect(line_offsets, (offset, 0))
        if len(line_offsets) > 0 and insertion_point == len(line_offsets) and line_offsets[-1][1] > offset:
            # bisect tells to insert at the end: check if within last couple boundary: if so, already processed
            continue
        elif insertion_point < len(line_offsets) and (offset == line_offsets[insertion_point][0] or
                (0 < insertion_point and line_offsets[insertion_point - 1][0] <= offset <= line_offsets[insertion_point - 1][1])):
            # offset is already known, line has already been processed: skip
            continue
        else:
            # offset is not known: rewind until we meet an end of line
            f.seek(offset)
            while True:
                c = f.read(1)
                if c == "\n":
                    # we found the line terminator of the previous line: OK
                    break
                offset -= 1
                f.seek(offset)
            # now store the current position: start of the current line
            line_start = offset + 1
            # now read the line fully
            line = f.readline()
            # now compute line end (approx..)
            line_end = f.tell() - 1
            # and insert the "line" in the sorted list
            line_offsets.insert(insertion_point, (line_start, line_end))
            fw.write(line)

Prevent closing file in Python

I have a problem reading characters from a file. I have a file called fst.fasta and I want to know the number of occurrences of the letters A and T.
This is the first code sample:
f = open("fst.fasta","r")
a = f.read().count("A")
t = f.read().count("T")
print "nbr de A : ", a
print "nbr de T : ", t
The result:
nbr of A : 255
nbr of T : 0
Even if there are Ts, I always get 0.
But after that, I tried this:
f = open("fst.fasta","r")
a = f.read().count("A")
f = open("fst.fasta","r")
t = f.read().count("T")
print "nbr de A : ", a
print "nbr de T : ", t
This worked! Is there any other way to avoid repeating f = open("fst.fasta","r")?
You're dealing with the fact that read() has a side effect (to use the term really loosely): it reads through the file and as it does so sets a pointer to where it is in that file. When it returns you can expect that pointer to be set to the last position. Therefore, executing read() again starts from that position and doesn't give you anything back. This is what you want:
f = open("fst.fasta","r")
contents = f.read()
a = contents.count("A")
t = contents.count("T")
The documentation also indicates other ways you can use read:
next_value = f.read(1)
if next_value == "":
    # We have reached the end of the file
What has happened in the code above is that, instead of getting all the characters in the file, the file handler has only returned 1 character. You could replace 1 with any number, or even a variable to get a certain chunk of the file. The file handler remembers where the above-mentioned pointer is, and you can pick up where you left off. (Note that this is a really good idea for very large files, where reading it all into memory is prohibitive.)
Only once you call f.close() does the file handler 'forget' where it is - but it also forgets the file, and you'd have to open() it again to start from the beginning.
There are other functions provided (such as seek() and readline()) that will let you move around a file using different semantics. f.tell() will tell you where the pointer is in the file currently.
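As a sketch of that chunked-read idea applied to the question (counts A and T without holding the whole file in memory; the chunk size is an arbitrary choice):
def count_bases(path, chunk_size=64 * 1024):
    a = t = 0
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)  # read a bounded piece of the file
            if chunk == "":             # empty string means end of file
                break
            a += chunk.count("A")
            t += chunk.count("T")
    return a, t

# a, t = count_bases("fst.fasta")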
Each time you call f.read(), it consumes the entire remaining contents of the file and returns them. You then use that data only to count the As, and then attempt to read data that's already been used. There are two solutions:
Option 1: Use f.seek(0)
a = f.read().count("A")
f.seek(0)
t = f.read().count("T")
The f.seek call sets the position of the file back to the beginning.
Option 2. Store the result of f.read():
data = f.read()
a = data.count("A")
t = data.count("T")
f.seek(0) before the second f.read() will reset the file pointer to the beginning of the file. Or more sanely, save the result of f.read() to a variable, and you can then call .count on that variable to your heart's content without rereading the file pointlessly.
Try the with construct:
with open("fst.fasta","r") as f:
file_as_string = f.read()
a = file_as_string.count("A")
t = file_as_string.count("T")
This keeps the file open until you exit the block.
Read it into a string:
f = open ("fst.fasta")
allLines = f.readlines()
f.close()
# At this point, you are no longer using the file handler.
for line in allLines:
print (line.count("A"), " ", line.count("T"))

Python3 - dumping a JSON data into penultimate line of a file [duplicate]

Is there a way to do this? Say I have a file that's a list of names that goes like this:
Alfred
Bill
Donald
How could I insert the third name, "Charlie", at line x (in this case 3), and automatically send all others down one line? I've seen other questions like this, but they didn't get helpful answers. Can it be done, preferably with either a method or a loop?
This is one way of doing the trick.
with open("path_to_file", "r") as f:
contents = f.readlines()
contents.insert(index, value)
with open("path_to_file", "w") as f:
contents = "".join(contents)
f.write(contents)
index and value are the line and value of your choice, lines starting from 0.
If you want to search a file for a substring and add new text on the next line, one of the elegant ways to do it is the following:
import os, fileinput

old = "A"
new = "B"

for line in fileinput.FileInput(file_path, inplace=True):
    if old in line:
        line += new + os.linesep
    print(line, end="")
There is a combination of techniques which I found useful in solving this issue:
with open(file, 'r+') as fd:
    contents = fd.readlines()
    contents.insert(index, new_string)  # new_string should end in a newline
    fd.seek(0)  # readlines consumes the iterator, so we need to start over
    fd.writelines(contents)  # No need to truncate as we are increasing filesize
In our particular application, we wanted to add it after a certain string:
with open(file, 'r+') as fd:
    contents = fd.readlines()
    if match_string in contents[-1]:  # Handle last line to prevent IndexError
        contents.append(insert_string)
    else:
        for index, line in enumerate(contents):
            if match_string in line and insert_string not in contents[index + 1]:
                contents.insert(index + 1, insert_string)
                break
    fd.seek(0)
    fd.writelines(contents)
If you want it to insert the string after every instance of the match, instead of just the first, remove the else: (and properly unindent) and the break.
Note also that the and insert_string not in contents[index + 1]: prevents it from adding more than one copy after the match_string, so it's safe to run repeatedly.
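For reference, a sketch of that every-match variant; it builds a new list instead of inserting while iterating, and uses the same file/match_string/insert_string assumptions as above:
with open(file, 'r+') as fd:
    contents = fd.readlines()
    new_contents = []
    for index, line in enumerate(contents):
        new_contents.append(line)
        # insert after every match, unless the insert already follows it
        if match_string in line and (index + 1 >= len(contents)
                                     or insert_string not in contents[index + 1]):
            new_contents.append(insert_string)
    fd.seek(0)
    fd.writelines(new_contents)
Since the rewritten content can only stay the same length or grow, no truncate() call is needed here either.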
You can just read the data into a list and insert the new record where you want.
names = []
with open('names.txt', 'r+') as fd:
    for line in fd:
        names.append(line.split(' ')[-1].strip())
    names.insert(2, "Charlie")  # element 2 will be 3. in your list
    fd.seek(0)
    fd.truncate()
    for i in range(len(names)):
        fd.write("%d. %s\n" % (i + 1, names[i]))
The accepted answer has to load the whole file into memory, which doesn't work nicely for large files. The following solution writes the file contents with the new data inserted into the right line to a temporary file in the same directory (so on the same file system), only reading small chunks from the source file at a time. It then overwrites the source file with the contents of the temporary file in an efficient way (Python 3.8+).
from pathlib import Path
from shutil import copyfile
from tempfile import NamedTemporaryFile

sourcefile = Path("/path/to/source").resolve()
insert_lineno = 152  # The line to insert the new data into.
insert_data = "..."  # Some string to insert.

with sourcefile.open(mode="r") as source:
    destination = NamedTemporaryFile(mode="w", dir=str(sourcefile.parent))
    lineno = 1
    while lineno < insert_lineno:
        destination.file.write(source.readline())
        lineno += 1
    # Insert the new data.
    destination.file.write(insert_data)
    # Write the rest in chunks.
    while True:
        data = source.read(1024)
        if not data:
            break
        destination.file.write(data)

# Finish writing data.
destination.flush()
# Overwrite the original file's contents with that of the temporary file.
# This uses a memory-optimised copy operation starting from Python 3.8.
copyfile(destination.name, str(sourcefile))
# Delete the temporary file.
destination.close()
EDIT 2020-09-08: I just found an answer on Code Review that does something similar to above with more explanation - it might be useful to some.
You don't show us what the output should look like, so one possible interpretation is that you want this as the output:
Alfred
Bill
Charlie
Donald
(Insert Charlie, then add 1 to all subsequent lines.) Here's one possible solution:
def insert_line(input_stream, pos, new_name, output_stream):
    inserted = False
    for line in input_stream:
        number, name = parse_line(line)
        if number == pos:
            print(format_line(number, new_name), file=output_stream)
            inserted = True
        print(format_line(number if not inserted else (number + 1), name), file=output_stream)

def parse_line(line):
    number_str, name = line.strip().split()
    return (get_number(number_str), name)

def get_number(number_str):
    return int(number_str.split('.')[0])

def format_line(number, name):
    return add_dot(number) + ' ' + name

def add_dot(number):
    return str(number) + '.'

input_stream = open('input.txt', 'r')
output_stream = open('output.txt', 'w')
insert_line(input_stream, 3, 'Charlie', output_stream)
input_stream.close()
output_stream.close()
1. Parse the file into a Python list using file.readlines() or file.read().split('\n').
2. Identify the position where you have to insert a new line, according to your criteria.
3. Insert a new list element there using list.insert().
4. Write the result to the file.
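A minimal sketch of those four steps, using the names from the question (the index 2 is just the example position for "Charlie"):
# 1. parse the file into a list of lines
with open("names.txt", "r") as f:
    lines = f.readlines()

# 2./3. pick the position and insert the new line there
lines.insert(2, "Charlie\n")  # 0-based index: becomes the third line

# 4. write the result back to the file
with open("names.txt", "w") as f:
    f.writelines(lines)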
location_of_line = 0

with open(filename, 'r') as file_you_want_to_read:
    # readlines in file and put in a list
    contents = file_you_want_to_read.readlines()
    # find location of what line you want to insert after
    for index, line in enumerate(contents):
        if line.startswith('whatever you are looking for'):
            location_of_line = index

# now you have a list of every line in that file
contents.insert(location_of_line, "whatever you want to append to middle of file")

with open(filename, 'w') as file_to_write_to:
    file_to_write_to.writelines(contents)
That is how I ended up getting whatever data I want inserted into the middle of the file.
This is just pseudo code, as I was having a hard time finding a clear explanation of what is going on.
Essentially you read the entire file into a list, insert the lines you want into that list, and then rewrite the list to the same file.
I am sure there are better ways to do this and it may not be efficient, but it makes more sense to me at least; I hope it makes sense to someone else.
A simple but not efficient way is to read the whole content, change it and then rewrite it:
line_index = 3
lines = None

with open('file.txt', 'r') as file_handler:
    lines = file_handler.readlines()

lines.insert(line_index, 'Charlie\n')  # include the newline so the next line stays separate

with open('file.txt', 'w') as file_handler:
    file_handler.writelines(lines)
I am writing this in order to reuse/correct martincho's answer (the accepted one).
! IMPORTANT: This code loads the whole file into RAM and rewrites the content to the file.
The variables index and value may be whatever you desire, but pay attention to making value a string that ends with '\n' if you don't want it to mess with existing data.
with open("path_to_file", "r+") as f:
# Read the content into a variable
contents = f.readlines()
contents.insert(index, value)
# Reset the reader's location (in bytes)
f.seek(0)
# Rewrite the content to the file
f.writelines(contents)
See the Python docs about the file.seek method.
Below is a slightly awkward solution for the special case in which you are creating the original file yourself and happen to know the insertion location (e.g. you know ahead of time that you will need to insert a line with an additional name before the third line, but won't know the name until after you've fetched and written the rest of the names). Reading, storing and then re-writing the entire contents of the file as described in other answers is, I think, more elegant than this option, but may be undesirable for large files.
You can leave a buffer of invisible null characters ('\0') at the insertion location to be overwritten later:
num_names = 1_000_000  # Enough data to make storing in a list unideal
max_len = 20           # The maximum allowed length of the inserted line
line_to_insert = 2     # The third line is at index 2 (0-based indexing)

with open(filename, 'w+') as file:
    for i in range(line_to_insert):
        name = get_name(i)  # Returns 'Alfred' for i = 0, etc.
        file.write(F'{i + 1}. {name}\n')
    insert_position = file.tell()  # Position to jump back to for insertion
    file.write('\0' * max_len + '\n')  # Buffer will show up as a blank line
    for i in range(line_to_insert, num_names):
        name = get_name(i)
        file.write(F'{i + 2}. {name}\n')  # Line numbering now bumped up by 1.

# Later, once you have the name to insert...
with open(filename, 'r+') as file:  # Must use 'r+' to write to middle of file
    file.seek(insert_position)  # Move stream to the insertion line
    name = get_bonus_name()  # This lucky winner jumps up to 3rd place
    new_line = F'{line_to_insert + 1}. {name}'
    file.write(new_line[:max_len])  # Slice so you don't overwrite next line
Unfortunately there is no way to delete-without-replacement any excess null characters that did not get overwritten (or in general any characters anywhere in the middle of a file), unless you then re-write everything that follows. But the null characters will not affect how your file looks to a human (they have zero width).
