I have a problem reading characters from a file. I have a file called fst.fasta and I want to know the number of occurrences of the letters A and T.
This is the first code sample :
f = open("fst.fasta","r")
a = f.read().count("A")
t = f.read().count("T")
print "nbr de A : ", a
print "nbr de T : ", t
The result:
nbr of A : 255
nbr of T : 0
Even if there are Ts i get always 0
But after that, I tried this :
f = open("fst.fasta","r")
a = f.read().count("A")
f = open("fst.fasta","r")
t = f.read().count("T")
print "nbr de A : ", a
print "nbr de T : ", t
This worked! Is there any other way to avoid repeating f = open("fst.fasta","r") ?
You're dealing with the fact that read() has a side effect (to use the term really loosely): it reads through the file and as it does so sets a pointer to where it is in that file. When it returns you can expect that pointer to be set to the last position. Therefore, executing read() again starts from that position and doesn't give you anything back. This is what you want:
f = open("fst.fasta","r")
contents = f.read()
a = contents.count("A")
t = contents.count("T")
The documentation also indicates other ways you can use read:
next_value = f.read(1)
if next_value == "":
# We have reached the end of the file
What has happened in the code above is that, instead of getting all the characters in the file, the file handler has only returned 1 character. You could replace 1 with any number, or even a variable to get a certain chunk of the file. The file handler remembers where the above-mentioned pointer is, and you can pick up where you left off. (Note that this is a really good idea for very large files, where reading it all into memory is prohibitive.)
Only once you call f.close() does the file handler 'forget' where it is - but it also forgets the file, and you'd have to open() it again to start from the beginning.
There are other functions provided (such as seek() and readline()) that will let you move around a file using different semantics. f.tell() will tell you where the pointer is in the file currently.
Each time you call f.read(), it consumes the entire remaining contents of the file and returns it. You then use that data only to count the as, and then attempt to read the data thats already been used. There are two solutions"
Option 1: Use f.seek(0)
a = f.read().count("A")
f.seek(0)
t = f.read().count("T")
The f.seek call sets the psoition of the file back to the beginning.
Option 2. Store the result of f.read():
data = f.read()
a = data.count("A")
t = data.count("T")
f.seek(0) before the second f.read() will reset the file pointer to the beginning of the file. Or more sanely, save the result of f.read() to a variable, and you can then call .count on that variable to your heart's content without rereading the file pointlessly.
Try the with construct:
with open("fst.fasta","r") as f:
file_as_string = f.read()
a = file_as_string.count("A")
t = file_as_string.count("T")
This keeps the file open until you exit the block.
Read it into a string:
f = open ("fst.fasta")
allLines = f.readlines()
f.close()
# At this point, you are no longer using the file handler.
for line in allLines:
print (line.count("A"), " ", line.count("T"))
Related
def antipreamble(file_name):
"""Removes the preamble from a text file"""
try:
fin = open(file_name, "r")
print(f"Opened {file_name} successfully")
except:
print(f"Sorry - could not open {file_name}")
i = 0
for line in fin:
if "*** START OF" in line:
text_start = i
if "*** END OF" in line:
text_end = i
i += 1
fin.seek(0)
i = 0
newfile_name = file_name[:-4] +"_new.txt"
try:
fout = open(newfile_name, "r+")
print(f"Opened {newfile_name} successfully")
except:
print(f"Sorry - could not open {newfile_name}")
i = 0
for lines in fin:
if i > text_start and i < text_end:
fout.write(lines)
i += 1
fin.close()
fout.close()
tried adding
global fout
but did nothing.
You wrote
fout = open(newfile_name, "r+")
I don't recommend that.
Using a write mode of "w" would more
clearly express your intent.
Avoid explicit .close(), and prefer to
routinely use a context manager:
with open(newfile_name, "w") as fout:
...
fout.write( ... )
Definitely avoid try statements that
do not actually handle / fix a specific error condition.
Much better to let the error bubble up the call
stack and display a helpful diagnostic message.
It's unclear exactly what format the input file is in.
Notice that this condition is exclusive:
if i > text_start and i < text_end:
Prefer inclusive:
if i >= text_start and i <= text_end:
if you intend to display lines that begin
with those START / END OF markers.
Identifier names really matter.
They strongly affect how we reason about code.
You wrote this perfectly sensible line:
for line in fin:
Later you wrote:
for lines in fin:
Prefer singular here.
The plural identifier you chose would be more
appropriate in an assignment like lines = fin.readlines().
Seeking to start-of-file is a nice technique,
and so is maintaining that pair of line indexes.
But we could produce the same effect with just
a single pass and a boolean flag.
Init selected = False.
Open files for read and for write.
Scan each input line.
If it's a START OF marker then set selected = True.
If selected, then write current line to the output file.
If we scan an END OF marker, then clear the selected flag.
This approach allows handling input files that
have multiple selected chunks, that is,
that have more than one START OF marker.
I would like to insert a string at a specific column of a specific line in a file.
Suppose I have a file file.txt
How was the English test?
How was the Math test?
How was the Chemistry test?
How was the test?
I would like to change the last line to say How was the History test? by adding the string History at line 4 column 13.
Currently I read in every line of the file and add the string to the specified position.
with open("file.txt", "r+") as f:
# Read entire file
lines = f.readlines()
# Update line
lino = 4 - 1
colno = 13 -1
lines[lino] = lines[lino][:colno] + "History " + lines[lino][colno:]
# Rewrite file
f.seek(0)
for line in lines:
f.write(line)
f.truncate()
f.close()
But I feel like I should be able to simply add the line to the file without having to read and rewrite the entire file.
This is possibly a duplicate of below SO thread
Fastest Way to Delete a Line from Large File in Python
In above it's a talk about delete, which is just a manipulation, and yours is more of a modification. So the code would get updated like below
def update(filename, lineno, column, text):
fro = open(filename, "rb")
current_line = 0
while current_line < lineno - 1:
fro.readline()
current_line += 1
seekpoint = fro.tell()
frw = open(filename, "r+b")
frw.seek(seekpoint, 0)
# read the line we want to update
line = fro.readline()
chars = line[0: column-1] + text + line[column-1:]
while chars:
frw.writelines(chars)
chars = fro.readline()
fro.close()
frw.truncate()
frw.close()
if __name__ == "__main__":
update("file.txt", 4, 13, "History ")
In a large file it make sense to not make modification till the lineno where the update needs to happen, Imagine you have file with 10K lines and update needs to happen at 9K, your code will load all 9K lines of data in memory unnecessarily. The code you have would work still but is not the optimal way of doing it
The function readlines() reads the entire file. But it doesn't have to. It actually reads from the current file cursor position to the end, which happens to be 0 right after opening. (To confirm this, try f.tell() right after with statement.) What if we started closer to the end of the file?
The way your code is written implies some prior knowledge of your file contents and layouts. Can you place any constraints on each line? For example, given your sample data, we might say that lines are guaranteed to be 27 bytes or less. Let's round that to 32 for "power of 2-ness" and try seeking backwards from the end of the file.
# note the "rb+"; need to open in binary mode, else seeking is strictly
# a "forward from 0" operation. We need to be able to seek backwards
with open("file.txt", "rb+") as f:
# caveat: if file is less than 32 bytes, this will throw
# an exception. The second parameter, 2, says "from end of file"
f.seek(-32, 2)
last = f.readlines()[-1].decode()
At which point the code has only read the last 32 bytes of the file.1 readlines() (at the byte level) will look for the line end byte (in Unix, \n or 0x0a or byte value 10), and return the before and after. Spelled out:
>>> last = f.readlines()
>>> print( last )
[b'hemistry test?\n', b'How was the test?']
>>> last = last[-1]
>>> print( last )
b'How was the test?'
Crucially, this works robustly under UTF-8 encoding by exploiting the UTF-8 property that ASCII byte values under 128 do not occur when encoding non-ASCII bytes. In other words, the exact byte \n (or 0x0a) only ever occurs as a newline and never as part of a character. If you are using a non-UTF-8 encoding, you will need to check if the code assumptions still hold.
Another note: 32 bytes is arbitrary given the example data. A more realistic and typical value might be 512, 1024, or 4096. Finally, to put it back to a working example for you:
with open("file.txt", "rb+") as f:
# caveat: if file is less than 32 bytes, this will throw
# an exception. The second parameter, 2, says "from end of file"
f.seek(-32, 2)
# does *not* read while file, unless file is exactly 32 bytes.
last = f.readlines()[-1]
last_decoded = last.decode()
# Update line
colno = 13 -1
last_decoded = last_decoded[:colno] + "History " + last_decoded[colno:]
last_line_bytes = len( last )
f.seek(-last_line_bytes, 2)
f.write( last_decoded.encode() )
f.truncate()
Note that there is no need for f.close(). The with statement handles that automatically.
1 The pedantic will correctly note that the computer and OS will likely have read at least 512 bytes, if not 4096 bytes, relating to the on-disk or in-memory page size.
You can use this piece of code :
with open("test.txt",'r+') as f:
# Read the file
lines=f.readlines()
# Gets the column
column=int(input("Column:"))-1
# Gets the line
line=int(input("Line:"))-1
# Gets the word
word=input("Word:")
lines[line]=lines[line][0:column]+word+lines[line][column:]
# Delete the file
f.seek(0)
for i in lines:
# Append the lines
f.write(i)
This answer will only loop through the file once and only write everything after the insert. In cases where the insert is at the end there is almost no overhead and where the insert at the beginning it is no worse than a full read and write.
def insert(file, line, column, text):
ln, cn = line - 1, column - 1 # offset from human index to Python index
count = 0 # initial count of characters
with open(file, 'r+') as f: # open file for reading an writing
for idx, line in enumerate(f): # for all line in the file
if idx < ln: # before the given line
count += len(line) # read and count characters
elif idx == ln: # once at the line
f.seek(count + cn) # place cursor at the correct character location
remainder = f.read() # store all character afterwards
f.seek(count + cn) # move cursor back to the correct character location
f.write(text + remainder) # insert text and rewrite the remainder
return # You're finished!
I'm not sure whether you were having problems changing your file to contain the word "History", or whether you wanted to know how to only rewrite certain parts of a file, without having to rewrite the whole thing.
If you were having problems in general, here is some simple code which should work, so long as you know the line within the file that you want to change. Just change the first and last lines of the program to read and write statements accordingly.
fileData="""How was the English test?
How was the Math test?
How was the Chemistry test?
How was the test?""" # So that I don't have to create the file, I'm writing the text directly into a variable.
fileData=fileData.split("\n")
fileData[3]=fileData[3][:11]+" History"+fileData[3][11:] # The 3 referes to the line to add "History" to. (The first line is line 0)
storeData=""
for i in fileData:storeData+=i+"\n"
storeData=storeData[:-1]
print(storeData) # You can change this to a write command.
If you wanted to know how to change specific "parts" to a file, without rewriting the whole thing, then (to my knowledge) that is not possible.
Say you had a file which said Ths is a TEST file., and you wanted to correct it to say This is a TEST file.; you would technically be changing 17 characters and adding one on the end. You are changing the "s" to an "i", the first space to an "s", the "i" (from "is") to a space, etc... as you shift the text forward.
A computer can't actually insert bytes between other bytes. It can only move the data, to make room.
Is there a way to do this? Say I have a file that's a list of names that goes like this:
Alfred
Bill
Donald
How could I insert the third name, "Charlie", at line x (in this case 3), and automatically send all others down one line? I've seen other questions like this, but they didn't get helpful answers. Can it be done, preferably with either a method or a loop?
This is a way of doing the trick.
with open("path_to_file", "r") as f:
contents = f.readlines()
contents.insert(index, value)
with open("path_to_file", "w") as f:
contents = "".join(contents)
f.write(contents)
index and value are the line and value of your choice, lines starting from 0.
If you want to search a file for a substring and add a new text to the next line, one of the elegant ways to do it is the following:
import os, fileinput
old = "A"
new = "B"
for line in fileinput.FileInput(file_path, inplace=True):
if old in line :
line += new + os.linesep
print(line, end="")
There is a combination of techniques which I found useful in solving this issue:
with open(file, 'r+') as fd:
contents = fd.readlines()
contents.insert(index, new_string) # new_string should end in a newline
fd.seek(0) # readlines consumes the iterator, so we need to start over
fd.writelines(contents) # No need to truncate as we are increasing filesize
In our particular application, we wanted to add it after a certain string:
with open(file, 'r+') as fd:
contents = fd.readlines()
if match_string in contents[-1]: # Handle last line to prevent IndexError
contents.append(insert_string)
else:
for index, line in enumerate(contents):
if match_string in line and insert_string not in contents[index + 1]:
contents.insert(index + 1, insert_string)
break
fd.seek(0)
fd.writelines(contents)
If you want it to insert the string after every instance of the match, instead of just the first, remove the else: (and properly unindent) and the break.
Note also that the and insert_string not in contents[index + 1]: prevents it from adding more than one copy after the match_string, so it's safe to run repeatedly.
You can just read the data into a list and insert the new record where you want.
names = []
with open('names.txt', 'r+') as fd:
for line in fd:
names.append(line.split(' ')[-1].strip())
names.insert(2, "Charlie") # element 2 will be 3. in your list
fd.seek(0)
fd.truncate()
for i in xrange(len(names)):
fd.write("%d. %s\n" %(i + 1, names[i]))
The accepted answer has to load the whole file into memory, which doesn't work nicely for large files. The following solution writes the file contents with the new data inserted into the right line to a temporary file in the same directory (so on the same file system), only reading small chunks from the source file at a time. It then overwrites the source file with the contents of the temporary file in an efficient way (Python 3.8+).
from pathlib import Path
from shutil import copyfile
from tempfile import NamedTemporaryFile
sourcefile = Path("/path/to/source").resolve()
insert_lineno = 152 # The line to insert the new data into.
insert_data = "..." # Some string to insert.
with sourcefile.open(mode="r") as source:
destination = NamedTemporaryFile(mode="w", dir=str(sourcefile.parent))
lineno = 1
while lineno < insert_lineno:
destination.file.write(source.readline())
lineno += 1
# Insert the new data.
destination.file.write(insert_data)
# Write the rest in chunks.
while True:
data = source.read(1024)
if not data:
break
destination.file.write(data)
# Finish writing data.
destination.flush()
# Overwrite the original file's contents with that of the temporary file.
# This uses a memory-optimised copy operation starting from Python 3.8.
copyfile(destination.name, str(sourcefile))
# Delete the temporary file.
destination.close()
EDIT 2020-09-08: I just found an answer on Code Review that does something similar to above with more explanation - it might be useful to some.
You don't show us what the output should look like, so one possible interpretation is that you want this as the output:
Alfred
Bill
Charlie
Donald
(Insert Charlie, then add 1 to all subsequent lines.) Here's one possible solution:
def insert_line(input_stream, pos, new_name, output_stream):
inserted = False
for line in input_stream:
number, name = parse_line(line)
if number == pos:
print >> output_stream, format_line(number, new_name)
inserted = True
print >> output_stream, format_line(number if not inserted else (number + 1), name)
def parse_line(line):
number_str, name = line.strip().split()
return (get_number(number_str), name)
def get_number(number_str):
return int(number_str.split('.')[0])
def format_line(number, name):
return add_dot(number) + ' ' + name
def add_dot(number):
return str(number) + '.'
input_stream = open('input.txt', 'r')
output_stream = open('output.txt', 'w')
insert_line(input_stream, 3, 'Charlie', output_stream)
input_stream.close()
output_stream.close()
Parse the file into a python list using file.readlines() or file.read().split('\n')
Identify the position where you have to insert a new line, according to your criteria.
Insert a new list element there using list.insert().
Write the result to the file.
location_of_line = 0
with open(filename, 'r') as file_you_want_to_read:
#readlines in file and put in a list
contents = file_you_want_to_read.readlines()
#find location of what line you want to insert after
for index, line in enumerate(contents):
if line.startswith('whatever you are looking for')
location_of_line = index
#now you have a list of every line in that file
context.insert(location_of_line, "whatever you want to append to middle of file")
with open(filename, 'w') as file_to_write_to:
file_to_write_to.writelines(contents)
That is how I ended up getting whatever data I want to insert to the middle of the file.
this is just pseudo code, as I was having a hard time finding clear understanding of what is going on.
essentially you read in the file to its entirety and add it into a list, then you insert your lines that you want to that list, and then re-write to the same file.
i am sure there are better ways to do this, may not be efficient, but it makes more sense to me at least, I hope it makes sense to someone else.
A simple but not efficient way is to read the whole content, change it and then rewrite it:
line_index = 3
lines = None
with open('file.txt', 'r') as file_handler:
lines = file_handler.readlines()
lines.insert(line_index, 'Charlie')
with open('file.txt', 'w') as file_handler:
file_handler.writelines(lines)
I write this in order to reutilize/correct martincho's answer (accepted one)
! IMPORTANT: This code loads all the file into ram and rewrites content to the file
Variables index, value may be what you desire, but pay attention to making value string and end with '\n' if you don't want it to mess with existing data.
with open("path_to_file", "r+") as f:
# Read the content into a variable
contents = f.readlines()
contents.insert(index, value)
# Reset the reader's location (in bytes)
f.seek(0)
# Rewrite the content to the file
f.writelines(contents)
See the python docs about file.seek method: Python docs
Below is a slightly awkward solution for the special case in which you are creating the original file yourself and happen to know the insertion location (e.g. you know ahead of time that you will need to insert a line with an additional name before the third line, but won't know the name until after you've fetched and written the rest of the names). Reading, storing and then re-writing the entire contents of the file as described in other answers is, I think, more elegant than this option, but may be undesirable for large files.
You can leave a buffer of invisible null characters ('\0') at the insertion location to be overwritten later:
num_names = 1_000_000 # Enough data to make storing in a list unideal
max_len = 20 # The maximum allowed length of the inserted line
line_to_insert = 2 # The third line is at index 2 (0-based indexing)
with open(filename, 'w+') as file:
for i in range(line_to_insert):
name = get_name(i) # Returns 'Alfred' for i = 0, etc.
file.write(F'{i + 1}. {name}\n')
insert_position = file.tell() # Position to jump back to for insertion
file.write('\0' * max_len + '\n') # Buffer will show up as a blank line
for i in range(line_to_insert, num_names):
name = get_name(i)
file.write(F'{i + 2}. {name}\n') # Line numbering now bumped up by 1.
# Later, once you have the name to insert...
with open(filename, 'r+') as file: # Must use 'r+' to write to middle of file
file.seek(insert_position) # Move stream to the insertion line
name = get_bonus_name() # This lucky winner jumps up to 3rd place
new_line = F'{line_to_insert + 1}. {name}'
file.write(new_line[:max_len]) # Slice so you don't overwrite next line
Unfortunately there is no way to delete-without-replacement any excess null characters that did not get overwritten (or in general any characters anywhere in the middle of a file), unless you then re-write everything that follows. But the null characters will not affect how your file looks to a human (they have zero width).
Is there a way to do this? Say I have a file that's a list of names that goes like this:
Alfred
Bill
Donald
How could I insert the third name, "Charlie", at line x (in this case 3), and automatically send all others down one line? I've seen other questions like this, but they didn't get helpful answers. Can it be done, preferably with either a method or a loop?
This is a way of doing the trick.
with open("path_to_file", "r") as f:
contents = f.readlines()
contents.insert(index, value)
with open("path_to_file", "w") as f:
contents = "".join(contents)
f.write(contents)
index and value are the line and value of your choice, lines starting from 0.
If you want to search a file for a substring and add a new text to the next line, one of the elegant ways to do it is the following:
import os, fileinput
old = "A"
new = "B"
for line in fileinput.FileInput(file_path, inplace=True):
if old in line :
line += new + os.linesep
print(line, end="")
There is a combination of techniques which I found useful in solving this issue:
with open(file, 'r+') as fd:
contents = fd.readlines()
contents.insert(index, new_string) # new_string should end in a newline
fd.seek(0) # readlines consumes the iterator, so we need to start over
fd.writelines(contents) # No need to truncate as we are increasing filesize
In our particular application, we wanted to add it after a certain string:
with open(file, 'r+') as fd:
contents = fd.readlines()
if match_string in contents[-1]: # Handle last line to prevent IndexError
contents.append(insert_string)
else:
for index, line in enumerate(contents):
if match_string in line and insert_string not in contents[index + 1]:
contents.insert(index + 1, insert_string)
break
fd.seek(0)
fd.writelines(contents)
If you want it to insert the string after every instance of the match, instead of just the first, remove the else: (and properly unindent) and the break.
Note also that the and insert_string not in contents[index + 1]: prevents it from adding more than one copy after the match_string, so it's safe to run repeatedly.
You can just read the data into a list and insert the new record where you want.
names = []
with open('names.txt', 'r+') as fd:
for line in fd:
names.append(line.split(' ')[-1].strip())
names.insert(2, "Charlie") # element 2 will be 3. in your list
fd.seek(0)
fd.truncate()
for i in xrange(len(names)):
fd.write("%d. %s\n" %(i + 1, names[i]))
The accepted answer has to load the whole file into memory, which doesn't work nicely for large files. The following solution writes the file contents with the new data inserted into the right line to a temporary file in the same directory (so on the same file system), only reading small chunks from the source file at a time. It then overwrites the source file with the contents of the temporary file in an efficient way (Python 3.8+).
from pathlib import Path
from shutil import copyfile
from tempfile import NamedTemporaryFile
sourcefile = Path("/path/to/source").resolve()
insert_lineno = 152 # The line to insert the new data into.
insert_data = "..." # Some string to insert.
with sourcefile.open(mode="r") as source:
destination = NamedTemporaryFile(mode="w", dir=str(sourcefile.parent))
lineno = 1
while lineno < insert_lineno:
destination.file.write(source.readline())
lineno += 1
# Insert the new data.
destination.file.write(insert_data)
# Write the rest in chunks.
while True:
data = source.read(1024)
if not data:
break
destination.file.write(data)
# Finish writing data.
destination.flush()
# Overwrite the original file's contents with that of the temporary file.
# This uses a memory-optimised copy operation starting from Python 3.8.
copyfile(destination.name, str(sourcefile))
# Delete the temporary file.
destination.close()
EDIT 2020-09-08: I just found an answer on Code Review that does something similar to above with more explanation - it might be useful to some.
You don't show us what the output should look like, so one possible interpretation is that you want this as the output:
Alfred
Bill
Charlie
Donald
(Insert Charlie, then add 1 to all subsequent lines.) Here's one possible solution:
def insert_line(input_stream, pos, new_name, output_stream):
inserted = False
for line in input_stream:
number, name = parse_line(line)
if number == pos:
print >> output_stream, format_line(number, new_name)
inserted = True
print >> output_stream, format_line(number if not inserted else (number + 1), name)
def parse_line(line):
number_str, name = line.strip().split()
return (get_number(number_str), name)
def get_number(number_str):
return int(number_str.split('.')[0])
def format_line(number, name):
return add_dot(number) + ' ' + name
def add_dot(number):
return str(number) + '.'
input_stream = open('input.txt', 'r')
output_stream = open('output.txt', 'w')
insert_line(input_stream, 3, 'Charlie', output_stream)
input_stream.close()
output_stream.close()
Parse the file into a python list using file.readlines() or file.read().split('\n')
Identify the position where you have to insert a new line, according to your criteria.
Insert a new list element there using list.insert().
Write the result to the file.
location_of_line = 0
with open(filename, 'r') as file_you_want_to_read:
#readlines in file and put in a list
contents = file_you_want_to_read.readlines()
#find location of what line you want to insert after
for index, line in enumerate(contents):
if line.startswith('whatever you are looking for')
location_of_line = index
#now you have a list of every line in that file
context.insert(location_of_line, "whatever you want to append to middle of file")
with open(filename, 'w') as file_to_write_to:
file_to_write_to.writelines(contents)
That is how I ended up getting whatever data I want to insert to the middle of the file.
this is just pseudo code, as I was having a hard time finding clear understanding of what is going on.
essentially you read in the file to its entirety and add it into a list, then you insert your lines that you want to that list, and then re-write to the same file.
i am sure there are better ways to do this, may not be efficient, but it makes more sense to me at least, I hope it makes sense to someone else.
A simple but not efficient way is to read the whole content, change it and then rewrite it:
line_index = 3
lines = None
with open('file.txt', 'r') as file_handler:
lines = file_handler.readlines()
lines.insert(line_index, 'Charlie')
with open('file.txt', 'w') as file_handler:
file_handler.writelines(lines)
I write this in order to reutilize/correct martincho's answer (accepted one)
! IMPORTANT: This code loads all the file into ram and rewrites content to the file
Variables index, value may be what you desire, but pay attention to making value string and end with '\n' if you don't want it to mess with existing data.
with open("path_to_file", "r+") as f:
# Read the content into a variable
contents = f.readlines()
contents.insert(index, value)
# Reset the reader's location (in bytes)
f.seek(0)
# Rewrite the content to the file
f.writelines(contents)
See the python docs about file.seek method: Python docs
Below is a slightly awkward solution for the special case in which you are creating the original file yourself and happen to know the insertion location (e.g. you know ahead of time that you will need to insert a line with an additional name before the third line, but won't know the name until after you've fetched and written the rest of the names). Reading, storing and then re-writing the entire contents of the file as described in other answers is, I think, more elegant than this option, but may be undesirable for large files.
You can leave a buffer of invisible null characters ('\0') at the insertion location to be overwritten later:
num_names = 1_000_000 # Enough data to make storing in a list unideal
max_len = 20 # The maximum allowed length of the inserted line
line_to_insert = 2 # The third line is at index 2 (0-based indexing)
with open(filename, 'w+') as file:
for i in range(line_to_insert):
name = get_name(i) # Returns 'Alfred' for i = 0, etc.
file.write(F'{i + 1}. {name}\n')
insert_position = file.tell() # Position to jump back to for insertion
file.write('\0' * max_len + '\n') # Buffer will show up as a blank line
for i in range(line_to_insert, num_names):
name = get_name(i)
file.write(F'{i + 2}. {name}\n') # Line numbering now bumped up by 1.
# Later, once you have the name to insert...
with open(filename, 'r+') as file: # Must use 'r+' to write to middle of file
file.seek(insert_position) # Move stream to the insertion line
name = get_bonus_name() # This lucky winner jumps up to 3rd place
new_line = F'{line_to_insert + 1}. {name}'
file.write(new_line[:max_len]) # Slice so you don't overwrite next line
Unfortunately there is no way to delete-without-replacement any excess null characters that did not get overwritten (or in general any characters anywhere in the middle of a file), unless you then re-write everything that follows. But the null characters will not affect how your file looks to a human (they have zero width).
I am reading a csv file several times, but cutting its size every time I go through it. So, once I've reached the bottom, I am writing a new csv file which is, say, the bottom half of the .csv file. I then wish to change the csv reader to use this new file instead, but it doesn't seem to be working... Here's what I've done.
sent = open(someFilePath)
r_send = csv.reader(sent)
try:
something = r_send.next()
except StopIteration:
sent.seek(0)
sent.close()
newFile = cutFile(someFilePath, someLineNumber)
sent = open(newFile, "r")
r_send = csv.reader(sent)
where cutFile does..
def cutFile(sender, lines):
sent = open(sender, "r")
new_sent = open(sender + ".temp.csv", "w")
counter = 0
for line in sent:
counter = counter + 1
if counter >= lines:
print >> new_sent, ",".join(line)
new_sent.close()
return sender + ".temp.csv"
Why is this not working?
Is something = r_send.next() in some kind of loop? The way you wrote it, it's only going to read one line.
Why do you need ",".join(line)? You can simply print line itself, and it should work.
Plus, there really is no need to seek(0) before closing a file.
I suggest the following:
Use for something in r_send: rather than something = r_send.next(); you won't even need the try... except blocks, as you'll just put the stuff closing the original file outside that loop (as someone else mentioned, you aren't even looping through the original file in your current code). Then you'll probably want to wrap all that in another loop so it keeps continuing until the file has been fully manipulated.
Use new_sent.write(line) instead of print >> new_sent, ",".join(line). Not that it makes that much of a difference besides the ",".join bit (which you don't need since you aren't using the csv module to write to a file), which you shouldn't be using here anyway, but it makes the fact that you're writing to a file more evident.
So...
sent = open(someFilePath)
r_send = csv.reader(sent)
someLineNumber = len(sent.readlines())
while someLineNumber > 0:
for something in r_send:
# do stuff
someLineNumber /= 2 # //= 2 in Python 3
sent.close()
newFile = cutFile(someFilePath, someLineNumber)
sent = open(newFile, "r")
r_send = csv.reader(sent)
Something like that.