I have a problem whereby I am trying to first check a text file for the existence of a known string, and based on this, loop over the file and insert a different line.
For some reason, after calling file.read() to check for the test string, the for loop appears not to work. I have tried calling file.seek(0) to get back to the start of the file, but this has not helped.
My current code is as follows:
try:
    f_old = open(text_file)
    f_new = open(text_file + '.new','w')
except:
    print 'Unable to open text file!'
    logger.info('Unable to open text file, exiting')
    sys.exit()
wroteOut = False
# first check if file contains a test string
if '<dir>' in f_old.read():
    #f_old.seek(0) # <-- do we need to do this??
    for line in f_old: # loop thru file
        print line
        if '<test string>' in line:
            line = ' <found the test string!>'
        if '<test string2>' in line:
            line = ' <found test string2!>'
        f_new.write(line) # write out the line
        wroteOut = True # set flag so we know it worked
f_new.close()
f_old.close()
You already know the answer:
#f_old.seek(0) # <-- do we need to do this??
Yes, you need to seek back to the start of the file before you can read the contents again.
All file operations work with the current file position. Using file.read() reads all of the file, leaving the current position set to the end of the file. If you wanted to re-read data from the start of the file, a file.seek(0) call is required. The alternatives are to:
Not read the file again. You just read all of the data, so use that information instead. File operations are slow; reusing the same data from memory is much, much faster:
contents = f_old.read()
if '<dir>' in contents:
    for line in contents.splitlines():
        # ....
Re-open the file. Opening a file in read mode puts the current file position back at the start.
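The first two options can be sketched as follows, in Python 3 syntax (the question itself uses Python 2). The function names `rewrite_from_memory` and `rewrite_with_seek` and the `marker` parameter are made up for illustration, and the per-line replacement logic from the question is left out to keep the sketch short.

```python
def rewrite_from_memory(src_path, dst_path, marker='<dir>'):
    """Read the file once and loop over the in-memory copy."""
    with open(src_path) as f_old:
        contents = f_old.read()          # position is now at end of file
    if marker not in contents:
        return False
    with open(dst_path, 'w') as f_new:
        # keepends=True keeps each line's newline so it can be written back as-is
        for line in contents.splitlines(keepends=True):
            f_new.write(line)
    return True


def rewrite_with_seek(src_path, dst_path, marker='<dir>'):
    """Read the file for the check, then rewind and iterate over it again."""
    with open(src_path) as f_old, open(dst_path, 'w') as f_new:
        if marker not in f_old.read():
            return False
        f_old.seek(0)                    # without this, the loop below sees no lines
        for line in f_old:
            f_new.write(line)
    return True
```

Both functions return True when the marker was found and the copy was written. The in-memory version avoids touching the disk a second time, at the cost of holding the whole file in memory.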
I want to write a file that says "hello guys how are you", with each word as an item of a list. Here is my code. It shows nothing the first time I run it; the second time, it shows the words item by item as I want. But when I open the text file, the content is written twice.
with open('stavanger.txt','r+') as f: # file closes itself with with open as filename command
    words = ['hello\n','guys\n','how\n', 'are\n','you\n']
    f.writelines(words)
    for i in f:
        x=i.rstrip().split(',') # turn text file into list and we separate list items by comma.
        print(x)
The problem is that writing to a file uses a buffer. So after the line f.writelines(words) nothing has really happened to the file; only the buffer has changed.
In effect, the file still hasn't changed and the file pointer is still at the beginning of the file. So the second time you run your code you see the content printed, which leaves the file pointer at the end of the file; only then is the buffer passed to the file, and you end up with the duplicated content.
Simply use mode='w' if you just want to write to a file...
You start reading the file from where the writing stopped. It is better to open the file first for writing, then for reading.
Something like this:
with open('stavanger.txt', 'w') as f: # file closes itself with with open as filename command
    words = ['hello\n', 'guys\n', 'how\n', 'are\n', 'you\n']
    f.writelines(words)
with open('stavanger.txt', 'r') as f:
    for i in f:
        x = i.rstrip().split(',') # turn text file into list and we separate list items by comma.
        print(x)
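If a single file object really has to do both jobs, one possible sketch (Python 3) is to open the file in 'w+' mode and rewind with seek(0) before reading. The function name `write_then_read` is hypothetical; the split on commas mirrors the code above.

```python
def write_then_read(path, words):
    """Write words to path, then read them back through the same file object."""
    with open(path, 'w+') as f:      # 'w+' truncates the file, then allows reading too
        f.writelines(words)
        f.seek(0)                    # rewind; after writing, the position is at the end
        return [line.rstrip().split(',') for line in f]
```

For example, `write_then_read('stavanger.txt', ['hello\n', 'guys\n'])` returns one single-item list per word. Opening the file twice, as shown above, is still the simpler option when nothing forces a single handle.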
I am trying to move each line down to the bottom of the file; this is what the file looks like:
daodaos 12391039
idiejda 94093420
jfijdsf 10903213
....
#completed
So at the end of the parsing, I am planning to have all the entries that are at the top placed under the line that says #completed.
The problem is that I am not sure how to do this in one pass. I know that I can read the whole file, every single line, close the file, and then re-open it in write mode, searching for each line, removing it from the file, and adding it to the end; but that feels incredibly inefficient.
Is there a way, in one pass, to process the current line and then, in the same for loop, delete the line and append it at the end of the file?
file = open('myfile.txt', 'a')
for items in file:
    #process items line
    #append items line to the end of the file
    #remove items line from the file
I suggest keeping it simple: read everything, then write it back.
with open('myfile.txt') as f:
    lines = f.readlines()
newlines = lines
for idx, line in enumerate(lines):
    # do your stuff, check if completed, rearrange the list
    if line.startswith('#completed'):
        newlines = lines[idx:] + lines[:idx]
        break
with open('myfile.txt', 'w') as f:
    f.write(''.join(newlines)) # write back new lines
Below is another version I could think of, if you insist on modifying the file while reading it:
with open('myfile.txt', 'r+') as f:
    newlines = ''
    line = True
    while line:
        line = f.readline()
        if line.startswith('#completed'):
            # line += f.read() # uncomment this line if you are interested in the lines after #completed
            f.truncate()
            f.seek(0)
            f.write(line + newlines)
            break
        else:
            newlines += line
Not really.
Your main problem here is that you're iterating on the file at the same time you want to change it. This will Do Bad Things (tm) to your processing, unless you plan to micro-manage the file position pointer.
You do have that power: the seek method lets you move to a given file location, expressed in bytes. seek(0) moves to the start of the file; seek(0, 2) moves to the end. The problem you face is that your for loop trusts that this pointer indicates the next line to read.
One distinct problem is that you can't just remove a line from the middle of the file; something exists in those bytes. Think of it as lines of text on a page, written in pencil. You can erase line 4, but this does not cause lines 5-end to magically float up half a centimeter; they're still in the same physical location.
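The pencil-and-page picture can be checked with a small experiment. This is only a sketch on a throwaway temporary file (Python 3, binary mode so that byte offsets are exact); the file contents are made up.

```python
import os
import tempfile

# Build a small throwaway file to experiment on.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'line1\nline2\nline3\n')

with open(path, 'r+b') as f:
    f.seek(6)            # byte offset of the start of "line2"
    f.write(b'XX')       # overwrites 2 bytes; nothing is inserted or shifted

with open(path, 'rb') as f:
    result = f.read()

size = os.path.getsize(path)
os.remove(path)
print(repr(result), size)   # → b'line1\nXXne2\nline3\n' 18
```

The file is still 18 bytes long: writing in the middle replaces bytes in place, and the lines after it stay exactly where they were.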
How to Do It ... sort of
Read all of the lines into a list. You can easily change a list the way you want. When you hit the end, then write the list back to the file -- or use your magic seek and append powers to alter only a little of it.
I recommend doing this the simple way: read the whole file into a variable, move the completed lines into another variable, and then rewrite your file.
I want to be able to open a file, append some text to the end, and then read only the first line. I know exactly how long the first line of the file is, and the file is large enough that I don't want to read it into memory all at once. I've tried using:
with open('./output files/log.txt', 'a+') as f:
    f.write('This is example text')
    content = f.readline()
    print(content)
but the print statement is blank. When I try using open('./output files/log.txt') or open('./output files/log.txt', 'r+') instead of open('./output files/log.txt', 'a+') this works, so I know it has to do with the 'a+' argument. My problem is that I have to append to the file. How can I append to the file and still get the first line without using something like:
with open('./output files/log.txt', 'a+') as f_1:
    f_1.write('This is example text')
with open('./output files/log.txt') as f_2:
    content = f_2.readline()
    print(content)
When you open a file with the append flag a, it moves the file descriptor's pointer to the end of the file, so that the write call will add to the end of the file.
The readline() function reads from the current pointer of the file until the next '\n' character it reads. So when you open a file with append, and then call readline, it will try to read a line starting from the end of the file. This is why your print call is coming up blank.
You can see this in action by looking at where the file object is currently pointing, using the tell() function.
To read the first line, you'd have to make sure the file's pointer is back at the beginning of the file, which you can do using the seek function. seek takes two arguments: offset and from_what. If you omit the second argument, offset is taken from the beginning of the file. So to jump to the beginning of the file, do: seek(0).
If you want to jump back to the end of the file, you can include the from_what option. from_what=2 means take the offset from the end of the file. So to jump to the end: seek(0, 2).
Demonstration of file pointers when opened in append mode:
Example using a text file that looks like this:
the first line of the file
and the last line
Code:
with open('example.txt', 'a+') as fd:
    print fd.tell() # at end of file
    fd.write('example line\n')
    print fd.tell() # at new end of the file after writing
    # jump to the beginning of the file:
    fd.seek(0)
    print fd.readline()
    # jump back to the end of the file
    fd.seek(0, 2)
    fd.write('went back to the end')
console output:
45
58
the first line of the file
new contents of example.txt:
the first line of the file
and the last line
example line
went back to the end
You need to go back to the start of the file using seek(0), like so:
with open('./output files/log.txt', 'a+') as f_1:
    f_1.write('This is example text')
    f_1.seek(0)
    print(f_1.readline())
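Wrapped up as a reusable helper, the same idea might look like the sketch below (Python 3). The function name `append_and_read_first_line` is made up for illustration; only one line is ever read, so the size of the file does not matter.

```python
import os

def append_and_read_first_line(path, text):
    """Append text to the file at path and return the file's first line."""
    with open(path, 'a+') as f:
        f.write(text)            # in append mode, writes always go to the end
        f.seek(0)                # rewind for reading
        first = f.readline()     # reads a single line, not the whole file
        f.seek(0, os.SEEK_END)   # optional: put the position back at the end
    return first
```

Calling `append_and_read_first_line('./output files/log.txt', 'This is example text')` would append the text and hand back the first line, including its trailing newline.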
I want to insert a new line at the start of a 2GB+ file. I tried the following code, but it fails with an out-of-memory error.
myfile = open(tableTempFile, "r+")
myfile.read() # read everything in the file
myfile.seek(0) # rewind
myfile.write("WRITE IN THE FIRST LINE ")
myfile.close();
How can I write to a file without loading the entire file into memory?
How can I add a new line at the start of the file?
Please note, there's no way to do this with any built-in functions in Python.
You can do this easily on Linux using tail, cat, etc.
To do it in Python we must use an auxiliary file. For very large files, I think this approach is a possibility:
import fileinput

def add_line_at_start(filename,line_to_be_added):
    f = fileinput.input(filename,inplace=1)
    for xline in f:
        if f.isfirstline():
            print line_to_be_added.rstrip('\r\n') + '\n' + xline,
        else:
            print xline,  # trailing comma: xline already ends with a newline
NOTE:
Never use the read() / readlines() functions when you are dealing with big files. These methods try to load the complete file into memory.
In your code, the seek call takes you back to the starting point, but everything you then write overwrites the existing content.
If you can afford having the entire file in memory at once:
first_line_update = "WRITE IN THE FIRST LINE \n"
with open(tableTempFile, 'r+') as f:
    lines = f.readlines()
    lines[0] = first_line_update  # replaces the first line; use lines.insert(0, ...) to insert instead
    f.seek(0)  # rewind; readlines() left the position at the end of the file
    f.writelines(lines)
    f.truncate()  # drop leftover bytes if the new content is shorter
otherwise:
from shutil import copy
from itertools import islice, chain
# TODO: use a NamedTemporaryFile from the tempfile module
first_line_update = "WRITE IN THE FIRST LINE \n"
with open("inputfile", 'r') as infile, open("tmpfile", 'w+') as outfile:
    # replace the first line with the string provided:
    outfile.writelines(chain((first_line_update,), islice(infile, 1, None)))
    # if you don't want to replace the first line but to insert another line before
    # this simplifies to:
    #outfile.writelines(chain((first_line_update,), infile))
# copy only after both files are closed, so everything has been flushed to disk
copy("tmpfile", "inputfile")
# TODO: remove temporary file
Generally, you can't do that. A file is a sequence of bytes, not a sequence of lines. This data model doesn't allow for insertions in arbitrary points - you can either replace a byte by another or append bytes at the end.
You can either:
Replace the first X bytes in the file. This could work for you if you can make sure that the first line's length will never vary.
Truncate the file, write the first line, then rewrite all the rest after it. If you can't fit all your file into the memory, then:
create a temporary file (the tempfile module will help you)
write your line to it
open your base file in r and copy its contents after the first line to the temporary file, piece-wise
close both files, then replace the input file by the temporary file
(Note that appending to the end of a file is much easier - all you need to do is open the file in the append a mode.)
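The temporary-file steps above can be sketched as follows (Python 3). This is an illustration under stated assumptions, not a definitive implementation: the function name `prepend_line` is made up, the new line is passed as bytes, and the temporary file is created next to the original so that the final rename stays on one filesystem.

```python
import os
import shutil
import tempfile

def prepend_line(path, line):
    """Insert line (bytes) before the existing contents of the file at path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as tmp, open(path, 'rb') as src:
            tmp.write(line.rstrip(b'\r\n') + b'\n')   # the new first line
            shutil.copyfileobj(src, tmp)              # copy the rest piece-wise, in chunks
        os.replace(tmp_path, path)                    # swap the new file into place
    except BaseException:
        os.remove(tmp_path)                           # clean up the partial temp file
        raise
```

Only one chunk of the original is in memory at any time, so this should cope with files far larger than RAM. Note that the replacement file gets the permissions mkstemp assigns, not those of the original; restoring them is left out of this sketch.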
I have a file where each line starts with a number. The user can delete a row by typing in the number of the row the user would like to delete.
The issue I'm having is setting the mode for opening it. When I use a+, the original content is still there. However, tacked onto the end of the file are the lines that I want to keep. On the other hand, when I use w+, the entire file is deleted. I'm sure there is a better way than opening it with w+ mode, deleting everything, and then re-opening it and appending the lines.
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    DeleteItem = raw_input(">") #select a line number to delete
    print "Are You Sure You Want To Delete Number" + DeleteItem + "(y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open(ToDo.filename,"a+") #open the file (tried w+ as well, entire file is deleted)
        FileLines = FILE.readlines() #read and display the lines
        for line in FileLines:
            FILE.truncate()
            if line[0:1] != DeleteItem: #if the number (first character) of the current line doesn't equal the number to be deleted, re-write that line
                FILE.write(line)
    else:
        print "Nothing Deleted"
This is what a typical file may look like
1. info here
2. more stuff here
3. even more stuff here
When you open a file for writing, you clobber the file (delete its current contents and start a new file). You can find this out by reading documentation for the open() command.
When you open a file for appending, you do not clobber the file. But how can you delete just one line? A file is a sequence of bytes stored on a storage device; there is no way for you to delete one line and have all the other lines automatically "slide down" into new positions on the storage device.
(If your data was stored in a database, you could actually delete just one "row" from the database; but a file is not a database.)
So, the traditional way to solve this: you read from the original file, and you copy it to a new output file. As you copy, you perform any desired edits; for example, you can delete a line simply by not copying that one line; or you can insert a line by writing it in the new file.
Then, once you have successfully written the new file, and successfully closed it, if there is no error, you go ahead and rename the new file back to the same name as the old file (which clobbers the old file).
In Python, your code should be something like this:
import os
# "num_to_delete" was specified by the user earlier.
# I'm assuming that the number to delete is set off from
# the rest of the line with a space.
s_to_delete = str(num_to_delete) + ' '
def want_input_line(line):
    return not line.startswith(s_to_delete)
in_fname = "original_input_filename.txt"
out_fname = "temporary_filename.txt"
with open(in_fname) as in_f, open(out_fname, "w") as out_f:
    for line in in_f:
        if want_input_line(line):
            out_f.write(line)
os.rename(out_fname, in_fname)
Note that if you happen to have a file called temporary_filename.txt it will be clobbered by this code. Really we don't care what the filename is, and we can ask Python to make up some unique filename for us, using the tempfile module.
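A possible variant using the tempfile module, written for Python 3, is sketched below. The function name `delete_numbered_line` is hypothetical, and the sketch assumes each line starts with its number followed by a period, as in the sample file from the question ("1. info here").

```python
import os
import tempfile

def delete_numbered_line(in_fname, num_to_delete):
    """Copy in_fname to a uniquely named temp file, skipping one numbered line."""
    s_to_delete = str(num_to_delete) + '.'     # assumes the "N. text" layout
    directory = os.path.dirname(os.path.abspath(in_fname))
    # delete=False keeps the temp file on disk after the with block closes it
    with open(in_fname) as in_f, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as out_f:
        for line in in_f:
            if not line.startswith(s_to_delete):
                out_f.write(line)
    os.replace(out_f.name, in_fname)           # overwrite the original in one step
```

os.replace is used instead of os.rename because it also overwrites an existing destination on Windows.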
Any recent version of Python will let you use multiple statements in a single with statement, but if you happen to be using Python 2.6 or something you can nest two with statements to get the same effect:
with open(in_fname) as in_f:
    with open(out_fname, "w") as out_f:
        for line in in_f:
            ... # do the rest of the code
Also, note that I did not use the .readlines() method to get the input lines, because .readlines() reads the entire contents of the file into memory, all at once, and if the file is very large this will be slow or might not even work. You can simply write a for loop using the "file object" you get back from open(); this will give you one line at a time, and your program will work with even really large files.
EDIT: Note that my answer assumes that you just want to do one editing step. As @jdi noted in comments on another answer, if you want to allow for "interactive" editing where the user can delete multiple lines, or insert lines, or whatever, then the easiest way is in fact to read all the lines into memory using .readlines(), insert/delete/update/whatever on the resulting list, and then write the list out to a file only once, when editing is all done.
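A rough sketch of that in-memory approach, in Python 3, with a made-up function name `apply_edits`: all deletions are collected first, and the file is rewritten a single time. Line numbers here are 1-based positions in the file, which is an assumption of this sketch.

```python
def apply_edits(path, numbers_to_delete):
    """Load every line, drop the requested 1-based line numbers, write once."""
    with open(path) as f:
        lines = f.readlines()
    doomed = set(numbers_to_delete)
    kept = [line for n, line in enumerate(lines, start=1) if n not in doomed]
    with open(path, 'w') as f:
        f.writelines(kept)
    return len(lines) - len(kept)    # how many lines were actually removed
```

This is only reasonable when the file fits comfortably in memory, which is usually the case for a to-do list.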
def DeleteToDo():
    print ("Which Item Do You Want To Delete?")
    DeleteItem = raw_input(">") #select a line number to delete
    print ("Are You Sure You Want To Delete Number" + DeleteItem + "(y/n)")
    DeleteItem=int(DeleteItem)
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open('data.txt',"r") #open the file for reading
        lines=[x.strip() for x in FILE if int(x[:x.index('.')])!=DeleteItem] #read all the lines first except the line which matches the line number to be deleted
        FILE.close()
        FILE = open('data.txt',"w")#open the file again
        for x in lines:FILE.write(x+'\n') #write the data to the file
        FILE.close()
    else:
        print ("Nothing Deleted")
DeleteToDo()
Instead of writing out all lines one by one to the file, delete the line from memory (to which you read the file using readlines()) and then write the memory back to disk in one shot. That way you will get the result you want, and you won't have to clog the I/O.
You could mmap the file... after having read the relevant documentation...
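For what it is worth, an mmap-based deletion might look roughly like the sketch below (Python 3). The function name `delete_line_mmap` and its bytes `prefix` parameter are assumptions for illustration. The remaining bytes are slid up over the deleted line with mmap's move method, and the file is shortened with truncate once the mapping is closed.

```python
import mmap
import os

def delete_line_mmap(path, prefix):
    """Remove the first line starting with prefix (bytes); return True if one was removed."""
    with open(path, 'r+b') as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return False                    # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0) as mm:
            if mm[:len(prefix)] == prefix:
                start = 0                   # the match is the very first line
            else:
                hit = mm.find(b'\n' + prefix)
                if hit == -1:
                    return False
                start = hit + 1
            nl = mm.find(b'\n', start)
            end = size if nl == -1 else nl + 1
            tail = size - end
            if tail:
                mm.move(start, end, tail)   # slide the remaining bytes up
            mm.flush()
        f.truncate(size - (end - start))    # cut off the now-duplicated tail
    return True
```

This avoids a second file, but it still moves every byte after the deleted line, so it is not obviously cheaper than the copy-and-rename approach.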
You don't need to check for the line numbers in your file; you can do something like this:
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    DeleteItem = int(raw_input(">")) - 1  # convert to a zero-based index
    print "Are You Sure You Want To Delete Number" + str(DeleteItem + 1) + "(y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        with open(ToDo.filename,"r") as f:
            lines = ''.join([a for i,a in enumerate(f) if i != DeleteItem])
        with open(ToDo.filename, "w") as f:
            f.write(lines)
    else:
        print "Nothing Deleted"