Python: read a line and write back to that same line

I am using Python to make a template updater for HTML. I read a line and compare it with the template file to see if there are any changes that need to be applied. Then I want to write any changes back to the same line I just read from.
After a readline(), my file pointer is positioned on the next line. Is there any way I can write back to the same line without having to open two file handles for reading and writing?
Here is a code snippet of what I want to do:
cLine = fp.readline()
if cLine != templateLine:
    # Here is where I would like to write back to the line I read from
    # in cLine

Updating lines in place in a text file - very difficult
Many questions on SO try to read a file and update it at the same time.
While this is technically possible, it is very difficult.
Text files are not organized on disk by lines, but by bytes.
The problem is that the number of bytes in the old line is very often different from the number in the new one, and this messes up the resulting file.
Update by creating a new file
While it sounds inefficient, it is the most effective way from a programming point of view.
Just read from the file on one side, write to another file on the other side, close both files, and copy the content of the newly created file over the old one.
Or build the new content in memory and write it over the old file after you have closed it.
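A minimal sketch of the in-memory variant, assuming the file fits comfortably in memory. The name `update_lines` and the `transform` callback are made up for illustration and are not from the question:

```python
def update_lines(path, transform):
    """Rewrite `path`, replacing each line with transform(line).

    Returns the number of lines that changed.
    """
    # Read everything first; the file is closed before it is reopened for writing.
    with open(path, "r") as src:
        lines = src.readlines()

    new_lines = [transform(line) for line in lines]
    changed = sum(1 for old, new in zip(lines, new_lines) if old != new)

    # Only touch the file when something actually differs.
    if changed:
        with open(path, "w") as dst:
            dst.writelines(new_lines)
    return changed
```

For the template updater, `transform` would be whatever maps a line to its updated form; lines that already match come back unchanged and the file is left alone.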

At the OS level, things are a bit different from how they look from Python. From Python, a file looks almost like a list of strings, each of arbitrary length, so it seems easy to swap a line for something else without affecting the rest of the lines:
l = ["Hello", "world"]
l[0] = "Good bye"
In reality, though, any file is just a stream of bytes, with strings following each other without any "padding". So you can only overwrite the data in-place if the resulting string has exactly the same length as the source string - otherwise it'll simply overwrite the following lines.
If that is the case (your processing guarantees not to change the length of strings), you can "rewind" the file to the start of the line and overwrite the line with new data. The below script converts all lines in file to uppercase in-place:
def eof(f):
    # Compare the current position with the end-of-file position
    cur_loc = f.tell()
    f.seek(0, 2)
    eof_loc = f.tell()
    f.seek(cur_loc, 0)
    return cur_loc >= eof_loc
with open('testfile.txt', 'r+t') as fp:
    while True:
        # Remember where the line starts so we can rewind to it
        last_pos = fp.tell()
        line = fp.readline()
        new_line = line.upper()
        fp.seek(last_pos)
        fp.write(new_line)
        print("Read %s, Wrote %s" % (line, new_line))
        if eof(fp):
            break
Somewhat related: Undo a Python file readline() operation so file pointer is back in original state
This approach is only justified when your output lines are guaranteed to have the same length, and when, say, the file you're working with is really huge so you have to modify it in place.
In all other cases it would be much easier and more performant to just build the output in memory and write it back at once. Another option is to write to a temporary file, then delete the original and rename the temporary file so it replaces the original file.

Related

Replacing a line on Python

I'm trying to convert PHP code to Python, and I have problems with replacing lines. Although I find it easier to do using Python, I'm absolutely lost; I can find the line to replace, I can add something to the end of the line, but I can't write the line again on the file.
file = open("cache.ucb", 'rb')
for line in file:
    if line.split('~!')[0] == ex[4]:
        line += "~!" + mask[0]
        line = line.rstrip() + "\n"
        # Write on the file here!
Basically, the file uses ~! as a separator, and I read each line. If the first token separated with ~! of the line starts with ex[4], which could be for example Catbuntu, I want to append mask[0], which could be Bousie, on the end of that line. Then I remove the new line characters and add one to the end.
And there's the problem. I want to write the file as it was, but changing only that line. Is that possible?
Assuming you're on python >=2.7, the following should work a treat
original = open(filename)
newfile = []
for line in original:
    if line.split('~!')[0] == ex[4]:
        # Strip the old newline before appending the new token
        line = line.rstrip() + "~!" + mask[0] + "\n"
    newfile.append(line)
original.close()
amended = open(filename, "w")
amended.writelines(newfile)
amended.close()
If for whatever reason you are on python 2.6 or lower, replace the second to last line with:
amended.write("".join(newfile))
EDIT: Fixed to replace a mistake copied from the question, factor out a filename.
You cannot modify a file in-place, at least not if you want to insert characters to a line. You'll just end up overwriting the start of the next line.
There are two different ways to do this:
1. Read the file into memory, close it, then write back the new version.
2. Write a new temporary file as you go along, then move it over the original version.
So, how do you choose between them? I'll try to summarize the differences, ordered so that each one typically trumps the ones below if it's important (but that's just "typically"—you have to think through your own use case):
2 doesn't require holding the entire thing in memory. If your file is, say, 20GB long, this is obviously a huge win; if it's 16KB, it doesn't matter.
2 makes the entire operation atomic. Even if it fails halfway through, or some other process tries to read the file while you're in the middle of changing it, there is no way anyone can see some invalid half-modified file; they will see either the original file, or the new one.
2 requires some free disk space (because there are, temporarily, two copies of the file at the same time).
2 is a huge pain in the neck if you care about both Windows and POSIX.
2 can involve copying across filesystems if the original file and the temp directory are on different filesystems, unless you're careful about it.
2 is simpler if neither of the above two are an issue.
Drakekin's answer tells you how to do #1.
Here's how to do #2 if you don't care about Windows or about cross-filesystem issues:
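import os
import tempfile
# Text mode on both sides so the string operations below also work on Python 3
infile = open("cache.ucb", 'r')
outfile = tempfile.NamedTemporaryFile('w', delete=False)
for line in infile:
    if line.split('~!')[0] == ex[4]:
        # Strip the old newline before appending the new token
        line = line.rstrip() + "~!" + mask[0] + "\n"
    outfile.write(line)
infile.close()
# Close (and flush) the temp file before moving it over the original
outfile.close()
os.rename(outfile.name, "cache.ucb")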
You can solve the cross-filesystem problem by, e.g., passing dir=os.path.dirname(original path) to the NamedTemporaryFile constructor, but only if you're sure you'll always have permissions to create a new file alongside the original (which isn't always guaranteed, just because you have permission to rewrite the original—UNIX permissions, Windows ACLs, the OS X sandbox, etc. all give ways that can be false).
To solve the Windows problem… well, start with Is an atomic file rename (with overwrite) possible on Windows, and similar discussions all over the internet.
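On Python 3.3 and later, `os.replace` overwrites an existing destination on both POSIX and Windows, which takes care of much of the portability pain. Here is a rough sketch of approach #2 built on it, assuming you are allowed to create files next to the original. The function name `atomic_rewrite` and the `transform` callback are illustrative only:

```python
import os
import tempfile


def atomic_rewrite(path, transform):
    """Write transform(line) for every line to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:
                dst.write(transform(line))
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the half-written temp file and re-raise.
        os.unlink(tmp_path)
        raise
```

Note that the replacement file gets the permissions `mkstemp` gives it, not those of the original, and that the rename is only guaranteed to be atomic on POSIX; treat both as things to check for your own use case.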
Open the file in mode 'wb' and put file.write(line) at the end of your loop.
You don't have your file open for writing.
file = open("cache.ucb", 'rb')
This line opens a file for reading in binary mode. You need to open it for writing also.
Try opening the file in write mode, 'w' and writing the line back.
Or you can simply open your file for read/write at the beginning and write inside your loop:
file = open("cache.ucb", 'a+')

file pointer down then over

The Task
I am writing a program in Python that runs SAP2000 by importing a new .s2k file into it each time; a new file is then generated from the results of the previous run by exporting the data.
The file is about 1,500 lines containing arbitrary words and numbers. (For a better understanding, see this: http://pastebin.com/8ptYacJz, which is the file I am dealing with.)
I'm required to replace one number in the file.
That number is somewhere in the middle of line 800.
The Question
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
What I've Tried
Regular expressions did not work, because there can be more than one instance of the same number.
So I came up with the solution of templating the file and writing a new file each time with the number to be changed as a template parameter.
This solution does work, but the person insists that I can move the file pointer down to line 800, then over to the middle of the line to replace the number.
Here is the only code I have for the problem. It moves through the file to a line, but then jumps back to the beginning when I try to seek over.
import sys
import os
#open file
f = open("output.$2k")
#this will go to line 883 in text file
count = 0;
while count < 883:
    line = f.readline()
    count = count+1
#this would seek over to middle of file DOESN'T WORK
f.seek(0,0)
line = f.readline()
print(line)
f.close()
Yes and no. Consider:
f=open('output.$2k','r+')
f.seek(300)
f.write('\n')
f.close()
This script just overwrites the character at offset 300 (the 301st character) of your ASCII file with a newline. Now the tricky part is that there is no way to know the length of a line in an ASCII file short of reading until you get to a newline. So, locating the particular character in the file at the middle of the 800th line is non-trivial. However, if you can make guarantees (due to the way the file was written) about the line length, you can calculate the position without any problem. Also note that replacing 1 with 100 won't work here. You need to replace 1 character with 1 character.
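If the line lengths are not known in advance, one scan of the file can still give you the byte offset where a line starts. A small sketch, assuming a single-byte encoding such as ASCII so that bytes and characters coincide; both helper names are invented for this example:

```python
def line_start_offset(path, line_number):
    """Return the byte offset where 1-based `line_number` starts."""
    offset = 0
    with open(path, "rb") as f:
        for current, line in enumerate(f, start=1):
            if current == line_number:
                return offset
            offset += len(line)
    raise ValueError("file has fewer than %d lines" % line_number)


def overwrite_at(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes starting at `offset`."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
```

With these, `overwrite_at(path, line_start_offset(path, 800) + column, b"7")` would replace one character in line 800, where `column` is the 0-based position within the line. The replacement still has to be exactly as long as what it replaces.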
And just for all the other *NIX users in the world ... please don't put $ in your filename. That's just a nightmare...
OK, i'm not a professional programmer, but my (stupid) approach would be: If it's always line 800, read the file line by line while tracking the line numbers. Write then directly to a new file. Read line 800, change it, write it. Then write the rest. Dumb and not elegant but it should work-unless i miss something which i probably do. And there goes my meager reputation :D
No. Read in the line, manipulate it, then write it out to the new file you've previously opened for writing (and have been writing the other lines to, unmodified).
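The copy-and-change approach described in the last two answers might look roughly like this. The function and parameter names are placeholders, and `change` stands for whatever edits the one line you care about:

```python
def copy_with_line_changed(src_path, dst_path, line_number, change):
    """Copy src to dst, passing only 1-based `line_number` through change()."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for current, line in enumerate(src, start=1):
            if current == line_number:
                line = change(line)
            dst.write(line)
```

Only one line is held in memory at a time, and the replacement is free to be longer or shorter than the original.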
A first thing:
#this would seek over to middle of file DOESN'T WORK
f.seek(0,0)
this is not true. This seeks to the beginning of the file.
To your actual question:
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
In general, no. You'd need to rewrite the file. For example like this:
# open the file in read-and-update mode
with open("file", 'r+') as f:
    # read all lines
    lines = f.readlines()
    # update 800'th line
    my_line = lines[799].split()
    my_line[5] = "%s" % my_number # TODO: put in index of number and updated number
    # split() dropped the trailing newline, so add it back
    lines[799] = " ".join(my_line) + "\n"
    # rewind, then truncate and rewrite file
    f.seek(0)
    f.truncate()
    f.writelines(lines)
You can do it, if the starting position of the number in the file is predictable (e.g. number_starting_pos = 1234 from the beginning of the file) and the size of the string representation is also predictable (e.g. 20).
Then you could rewrite the number and make sure you fill up the padding with whitespace again to overwrite any content of the previous entry.
Similar to this:
with open("file", 'r+') as f:
    # seek to the number starting position
    f.seek(number_starting_pos, 0)
    # update number field, assuming width (20), arbitrary space-padding allowed
    my_number_string = "%19s " % my_number
    # make sure the string is indeed exactly of the specific size (it may be longer)
    assert len(my_number_string) == 20, "file writing would fail! aborting!"
    f.write(my_number_string)
For this to work, you'd need to look at the docs of your SAP-thingy and check that whitespace really doesn't matter.
However, both approaches are based on a lot of assumptions. Depending on your use case they may easily break, e.g. if a line, or even a single character, is inserted before the number field.

Python open() modes and file writing

I'm learning PyGTK and I'm making a Text Editor (That seems to be the hello world of pygtk :])
Anyways, I have a "Save" function that writes the TextBuffer to a file. Looks something like
try:
    f = open(self.working_file_path, "rw+")
    buff = self._get_buffer()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
Basically, if the file exists, write the buffer to it; if not, bring up the Save As dialog.
My question is: What is the best way to "update" a file. I seem to only be able to append to the end of a file. I've tried various file modes, but I'm sure I'm missing something.
Thanks in advance!
You can open a file in "r+" mode, which allows you to both read and write to the file, and to seek to particular positions and write there. This probably doesn't help you do what I think you want though; it sounds like you're wanting to only write out the changed data?
Remember that on the disk the file isn't stored as a series of extensible lines, it's just a sequence of bytes; some of those bytes indicate line-endings, but the next line follows on immediately. So if you edit the first line in the file and you write the new first line out, unless the new one happens to be exactly the same length as the old one the second line now won't be in the right place, so you'll need to move it (and have taken a copy of it first if the new line you wrote out was longer than the original). And this now means that the next line isn't in the right position either... and so on until you've had to read in and write out the entire rest of the file.
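To see the effect concretely, here is a small experiment on a throwaway temporary file (the file contents are made up for the demonstration). It overwrites a 6-byte first line with a 9-byte one without moving anything else:

```python
import os
import tempfile

# Build a throwaway two-line file to experiment on.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Hello\nworld\n")

# Overwrite the first line with a longer one, without moving anything.
with open(path, "r+b") as f:
    f.write(b"Good bye\n")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)

print(result)  # b'Good bye\nld\n': the second line lost its first three bytes
```

The extra three bytes of the new first line landed on top of "wor", which is exactly the kind of damage described above.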
In practice you almost never write only part of an existing file unless you can simply append more data; if you need to "alter" a file you read it in, alter it in memory, and write it back out or you read in the file in pieces (often line by line) and then write out to a new file as you go (and then possibly move the new file over the top of the original). The first approach is easiest, the second is better for not having to hold the whole thing in memory at once.
At the point where you write to the file, your location is at the end of the file, so you need to seek back to the beginning. Then, you will overwrite the file, but this may leave old content at the end, so you also need to truncate the file.
Additionally, the mode you're specifying ('rw+') is invalid, and I get IOErrors when I try to do some operations on files opened with it. I believe that you want mode 'r+' ("Open for reading and writing. The stream is positioned at the beginning of the file."). 'w+' is similar, but would create the file if it didn't exist and truncate it if it did.
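A quick way to convince yourself of the difference between the two modes is to try them on scratch files. The file names below are arbitrary and live in a temporary directory:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
missing = os.path.join(workdir, "missing.txt")
existing = os.path.join(workdir, "existing.txt")
with open(existing, "w") as f:
    f.write("old content\n")

# 'r+' refuses to open a file that does not exist.
try:
    open(missing, "r+")
    r_plus_failed = False
except IOError:  # IOError is an alias of OSError on Python 3
    r_plus_failed = True

# 'r+' keeps the existing content and starts at offset 0.
with open(existing, "r+") as f:
    kept = f.read()

# 'w+' creates the file if needed and empties it if it exists.
with open(existing, "w+") as f:
    after_w_plus = f.read()
```

After running this, `r_plus_failed` is true, `kept` holds the old content, and `after_w_plus` is empty. That is why 'r+' fits the "bring up Save As if the file is missing" logic, while 'w+' would silently create the file.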
So, what you're looking for might be code like this:
try:
    f = open(self.working_file_path, "r+")
    buff = self._get_buffer()
    f.seek(0)
    f.truncate()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
However, you may want to modify this code to correctly catch and handle errors while truncating and writing the file, rather than assuming that all IOErrors in this section are nonexistent-file errors from the call to open.
Read the file in as a list, add an element to the start of it, write it all out. Something like this.
f = open(self.working_file_path, "r+")
flist = f.readlines()
flist.insert(0, self._get_text())
f.seek(0)
f.writelines(flist)

How do I modify the last line of a file?

The last line of my file is:
29-dez,40,
How can I modify that line so that it reads:
29-Dez,40,90,100,50
Note: I don't want to write a new line. I want to take the same line and put new values after 29-Dez,40,
I'm new at python. I'm having a lot of trouble manipulating files and for me every example I look at seems difficult.
Unless the file is huge, you'll probably find it easier to read the entire file into a data structure (which might just be a list of lines), and then modify the data structure in memory, and finally write it back to the file.
On the other hand maybe your file is really huge - multiple GBs at least. In which case: the last line is probably terminated with a new line character, if you seek to that position you can overwrite it with the new text at the end of the last line.
So perhaps:
import os
# "r+b" keeps the existing content; "wb" would truncate the file on open
f = open("foo.file", "r+b")
f.seek(-len(os.linesep), os.SEEK_END)
f.write(("new text at end of last line" + os.linesep).encode())
f.close()
(Modulo line endings on different platforms)
To expand on what Doug said, in order to read the file contents into a data structure you can use the readlines() method of the file object.
The below code sample reads the file into a list of "lines", edits the last line, then writes it back out to the file:
#!/usr/bin/python
MYFILE="file.txt"
# read the file into a list of lines
lines = open(MYFILE, 'r').readlines()
# now edit the last line of the list of lines
new_last_line = (lines[-1].rstrip() + ",90,100,50")
lines[-1] = new_last_line
# now write the modified list back out to the file
open(MYFILE, 'w').writelines(lines)
If the file is very large then this approach will not work well, because this reads all the file lines into memory each time and writes them back out to the file, which is very inefficient. For a small file however this will work fine.
Don't work with files directly, make a data structure that fits your needs in form of a class and make read from/write to file methods.
I recently wrote a script to do something very similar to this. It would traverse a project, find all module dependencies and add any missing import statements. I won't clutter this post up with the entire script, but I'll show how I went about modifying my files.
import os
from mmap import mmap
def insert_import(filename, text):
    # text must be a byte string (bytes on Python 3)
    if len(text) < 1:
        return
    f = open(filename, 'r+')
    m = mmap(f.fileno(), os.path.getsize(filename))
    origSize = m.size()
    m.resize(origSize + len(text))
    pos = 0
    while True:
        l = m.readline()
        if l.startswith((b'import', b'from')):
            continue
        else:
            pos = m.tell() - len(l)
            break
    m[pos+len(text):] = m[pos:origSize]
    m[pos:pos+len(text)] = text
    m.close()
    f.close()
Summary: This snippet takes a filename and a blob of text to insert. It finds the last import statement already present, and sticks the text in at that location.
The part I suggest paying most attention to is the use of mmap. It lets you work with files in the same manner you may work with a string. Very handy.
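When the replacement has exactly the same length as what it replaces, mmap makes the edit even simpler, since nothing has to be shifted or resized. A sketch under that assumption; the function name `replace_in_place` is invented here, and the file must not be empty because an empty file cannot be mapped:

```python
import mmap


def replace_in_place(path, old, new):
    """Replace the first occurrence of bytes `old` with same-length `new`.

    Returns the byte offset of the replacement, or -1 if `old` is absent.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length")
    with open(path, "r+b") as f:
        m = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
        try:
            pos = m.find(old)
            if pos != -1:
                m[pos:pos + len(new)] = new
                m.flush()
            return pos
        finally:
            m.close()
```

For the file in the question, `replace_in_place("file.txt", b"dez", b"Dez")` would fix the capitalisation, but appending `90,100,50` changes the length and still needs one of the rewrite approaches above.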

Python truncate lines as they are read

I have an application that reads lines from a file and runs its magic on each line as it is read. Once the line is read and properly processed, I would like to delete the line from the file. A backup of the removed line is already being kept. I would like to do something like
file = open('myfile.txt', 'rw+')
for line in file:
    processLine(line)
    file.truncate(line)
This seems like a simple problem, but I would like to do it right rather than a whole lot of complicated seek() and tell() calls.
Maybe all I really want to do is remove a particular line from a file.
After spending far too long on this problem I decided that everyone was probably right and this is just not a good way to do things. It just seemed like such an elegant solution. What I was looking for was something akin to a FIFO that would just let me pop lines out of a file.
Remove all lines after you've done with them:
with open('myfile.txt', 'r+') as file:
    for line in file:
        processLine(line)
    file.truncate(0)
Remove each line independently:
lines = open('myfile.txt').readlines()
for line in lines[::-1]: # process lines in reverse order
    processLine(line)
    del lines[-1] # remove the [last] line
open('myfile.txt', 'w').writelines(lines)
You can leave only those lines that cause exceptions:
import fileinput, sys
for line in fileinput.input(['myfile.txt'], inplace=1):
    try: processLine(line)
    except Exception:
        sys.stdout.write(line) # it prints to 'myfile.txt'
In general, as other people have already said, what you are trying to do is a bad idea.
You can't. It is just not possible with actual text file implementations on current filesystems.
Text files are sequential, because the lines in a text file can be of any length.
Deleting a particular line would mean rewriting the entire file from that point on.
Suppose you have a file with the following 4 lines:
'line1\nline2reallybig\nline3\nlast line'
To delete the second line you'd have to move the third and fourth lines on the disk. The only way would be to store the third and fourth lines somewhere, truncate the file at the second line, and rewrite the missing lines.
If you know the size of every line in the text file, you can truncate the file at any position using .truncate(line_size * line_number) but even then you'd have to rewrite everything after the line.
You're better off keeping an index into the file so that you can start where you stopped last, without destroying part of the file. Something like this would work:
try:
    for index, line in enumerate(file):
        processLine(line)
except:
    # Failed, start from this line number next time.
    print(index)
    raise
Truncating the file as you read it seems a bit extreme. What if your script has a bug that doesn't cause an error? In that case you'll want to restart at the beginning of your file.
How about having your script print the line number it breaks on and having it take a line number as a parameter so you can tell it which line to start processing from?
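A sketch of that idea, with the starting line passed in as a parameter. The names `process_from` and `handler` are placeholders, and where the returned line number gets stored between runs (a command-line argument, a small state file) is left to the caller:

```python
from itertools import islice


def process_from(path, start_line, handler):
    """Run handler(line) on every line from 1-based `start_line` onward.

    Returns None when the whole file was processed, otherwise the number
    of the line whose handler raised, so the caller can resume there.
    """
    with open(path, "r") as f:
        # Skip the lines that were already handled in an earlier run.
        numbered = enumerate(islice(f, start_line - 1, None), start=start_line)
        for number, line in numbered:
            try:
                handler(line)
            except Exception:
                return number
    return None
```

The file itself is never modified, so a bug in the handler cannot destroy unprocessed input.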
First of all, calling the operation truncate is probably not the best pick. If I understand the problem correctly, you want to delete everything up to the current position in file. (I would expect truncate to cut everything from the current position up to the end of the file. This is how the standard Python truncate method works, at least if I Googled correctly.)
Second, I am not sure it is wise to modify the file while iterating on it using the for loop. Wouldn’t it be better to save the number of lines processed and delete them after the main loop has finished, exception or not? The file iterator supports in-place filtering, which means it should be fairly simple to drop the processed lines afterwards.
P.S. I don’t know Python, take this with a grain of salt.
A related post has what seems a good strategy to do that, see
How can I run the first process from a list of processes stored in a file and immediately delete the first line as if the file was a queue and I called "pop"?
I have used it as follows:
import os
# 'rw' is not a valid mode; plain 'r' is enough to read the first line
tasklist_file = open(tasklist_filename, 'r')
first_line = tasklist_file.readline()
temp = os.system("sed -i -e '1d' " + tasklist_filename) # remove first line from task file
I'm not sure it works on Windows.
Tried it on a mac and it did do the trick.
This is what I use for file based queues. It returns the first line and rewrites the file with the rest. When it's done it returns None:
def pop_a_text_line(filename):
    with open(filename,'r') as f:
        S = f.readlines()
    if len(S) > 0:
        pop = S[0]
        with open(filename,'w') as f:
            f.writelines(S[1:])
    else:
        pop = None
    return pop
