I am trying to loop over the lines of a text file which is verifiably non-empty and I am running into problems with my script. In my attempt to debug what I wrote, I figured I would make sure my script is properly reading from the file, so I am currently trying to print every line in it.
At first I tried using the usual way of doing this in Python i.e.:
with open('file.txt') as fo:
for line in fo:
print line
but my script is not printing anything. I then tried storing all of the lines in a list like so:
with open('file.txt') as fo:
flines = fo.readlines()
print flines
and yet my program still outputs an empty list (i.e. []). I have also tried making sure that my file pointer is pointing to the beginning of the file using fo.seek(0) before attempting to read from it, yet that also does not work.
I have spent some time reading solutions to similar questions posted on here, but so far nothing I have tried has worked. I do not know how such an elementary I/O operation is giving me so much trouble, but I must be missing something basic and would thus really appreciate any help/suggestions.
EDIT: Here is the part of my script which is causing the problem:
import subprocess as sbp
with open('conf_15000.xyz','w') as fo:
p1 =sbp.Popen(['head','-n', '300000','nnp-pos-1.xyz'],stdout=sbp.PIPE)
p2 = sbp.Popen(['tail','-n', '198'],stdin=p1.stdout,stdout=fo)
with open('conf_15000.xyz','r') as fp:
fp.seek(0)
flines = fp.readlines()
print flines
And here is an exerpt from the nnp-pos-1.xyz file (all lines have the same format and there are 370642 of them in total):
Ti 32.9136715924 28.5387609200 24.6554922872
O 39.9997000300 35.1489480846 22.8396092714
O 33.7314699265 30.3398473499 23.8866085372
Ti 27.7756767925 31.3455930970 25.9779887743
O 31.1520937719 29.0752315770 25.4786577758
O 26.1870965535 32.4876155555 26.3346205619
Ti 38.4478275543 25.5734609650 22.0654953429
O 24.1328940232 31.3858060129 28.8575469919
O 38.6506317714 27.3779871011 22.6552032123
Ti 40.5617501289 27.5095900385 22.8436684314
O 38.2400600469 29.1828342919 20.7853056680
O 38.8481088254 27.2704154737 26.9590081202
When running the script, the file being read from (conf_15000.xyz) gets written to properly, however I cannot seem to be able to read from it at runtime.
EDIT-2: Following sudonym's recommendation I am using the absolute file path and am checking whether or not the file is empty before reading from it by adding the following unindented lines between the two with statements I wrote in my previous edit:
print os.path.isfile(r'full/path/to/file')
print (os.stat(r'full/path/to/file').st_size != 0)
The first boolean evaluates to True (meaning the file exists) while the second evaluates to False (meaning the file is empty). This is very strange because both of these lines are added after I close the file pointer fo which writes to the file and also because the file being written to (and subsequently read from with fp) is not empty after I execute the script (in fact, it contains all the lines it is supposed to).
EDIT-3: Turns out the reason why my script saw the file it needed to read as empty is because it did not wait for the subprocess (p2 in the example above) that writes to it to stop executing, meaning it would execute the lines after my first with statement before the file pointer was actually closed (i. e. before the file was done being written to). The fix was therefore to add the statement p2.wait() at the end of the first with statement like so:
import subprocess as sbp
with open('conf_15000.xyz','w') as fo:
p1 =sbp.Popen(['head','-n', '300000','nnp-pos-1.xyz'],stdout=sbp.PIPE)
p2 = sbp.Popen(['tail','-n', '198'],stdin=p1.stdout,stdout=fo)
p2.wait()
with open('conf_15000.xyz','r') as fp:
fp.seek(0)
flines = fp.readlines()
print flines
Now everything works the way it is supposed to.
You probably need to flush() the buffers first (and maybe call os.fsync() too) - after writing and before reading.
See file.flush() and this post.
first, include the absolute path.
Second, check if the file actually exists and is not empty:
import os
FILEPATH = r'path\to\file.txt' # full path as raw string
if os.path.isfile(FILEPATH) and (os.stat(FILEPATH).st_size != 0):
with open(FILEPATH) as fo:
flines = fo.readlines()
print flines
else:
print FILEPATH, "doesn't exist or is empty"
Related
Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problems. In the example above, openFile is the object used to open file. I have no problems if I want write to it the first time. If I want to use the same openFile to read files or append something to it. It doesn't happen or an error is given. I have to declare the same/different open file object before I can perform another read/write action to the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here or is it just a Pythong thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, using write method will write the string object to the file based on where the pointer is. In your case, it will append the string "Test abc" to the start of the file. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and Writing happens where the current file pointer is and it advances with each read/write.
In your particular case, writing to the openFile, causes the file-pointer to point to the end of file. Trying to read from the end would result EOF.
You need to reset the file pointer, to point to the beginning of the file before through seek(0) before reading from it
You can read, modify and save to the same file in python but you have actually to replace the whole content in file, and to call before updating file content:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
for file_name in files:
file_path = os.path.join(directories, file_name)
# open file for reading and writing
with io.open(file_path, "r+", encoding="utf-8") as edit_file:
for current_line in edit_file:
if condition in current_line:
# update current line
current_line = current_line.replace('john', 'jack')
new_file_content += current_line
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
# delete actual file content
edit_file.truncate()
# rewrite updated file content
edit_file.write(new_file_content)
# empties new content in order to set for next iteration
new_file_content = ""
edit_file.close()
I'm trying to convert PHP code to Python, and I have problems with replacing lines. Although I find it easier to do using Python, I'm absolutely lost; I can find the line to replace, I can add something to the end of the line, but I can't write the line again on the file.
file = open("cache.ucb", 'rb')
for line in file:
if line.split('~!')[0] == ex[4]:
line += "~!" + mask[0]
line = line.rstrip() + "\n"
# Write on the file here!
Basically, the file uses ~! as a separator, and I read each line. If the first token separated with ~! of the line starts with ex[4], which could be for example Catbuntu, I want to append mask[0], which could be Bousie, on the end of that line. Then I remove the new line characters and add one to the end.
And there's the problem. I want to write the file as it was, but changing only that line. Is that possible?
Assuming you're on python >=2.7, the following should work a treat
original = open(filename)
newfile = []
for line in original:
if line.split('~!')[0] == ex[4]:
line += "~!" + mask[0]
line = line.rstrip() + "\n"
newfile.append(line)
original.close()
amended.open(filename, "w")
amended.writeLines(newfile)
amended.close()
If for whatever reason you are on python 2.6 or lower, replace the second to last line with:
amended.write("".join(newfile))
EDIT: Fixed to replace a mistake copied from the question, factor out a filename.
You cannot modify a file in-place, at least not if you want to insert characters to a line. You'll just end up overwriting the start of the next line.
There are two different ways to do this:
Read the file into memory, close it, then write back the new version.
Write a new temporary file as you go along, then move it over the original version.
So, how do you choose between them? I'll try to summarize the differences, ordered so that each one typically trumps the ones below if it's important (but that's just "typically"—you have to think through your own use case):
2 doesn't require holding the entire thing in memory. If your file is, say, 20GB long, this is obviously a huge win; if it's 16KB, it doesn't matter.
2 makes the entire operation atomic. Even if it fails halfway through, or some other process tries to read the file while you're in the middle of changing it, there is no way anyone can see some invalid half-modified file; they will see either the original file, or the new one.
2 requires some free disk space (because there are, temporarily, two copies of the file at the same time).
2 is a huge pain in the neck if you care about both Windows and POSIX.
2 can involve copying across filesystems if the original file and the temp directory are on different filesystems, unless you're careful about it.
2 is simpler if neither of the above two are an issue.
Drakekin's answer tells you how to do #1.
Here's how to do #2 if you don't care about Windows or about cross-filesystem issues:
infile = open("cache.ucb", 'rb')
outfile = tempfile.NamedTemporaryFile(delete=False)
for line in infile:
if line.split('~!')[0] == ex[4]:
line += "~!" + mask[0]
line = line.rstrip() + "\n"
outfile.write(line)
infile.close()
os.rename(outfile.name, "cache.ucb")
outfile.close()
You can solve the cross-filesystem problem by, e.g., passing dir=os.path.dirname(original path) to the NamedTemporaryFile constructor, but only if you're sure you'll always have permissions to create a new file alongside the original (which isn't always guaranteed, just because you have permission to rewrite the original—UNIX permissions, Windows ACLs, the OS X sandbox, etc. all give ways that can be false).
To solve the Windows problem… well, start with Is an atomic file rename (with overwrite) possible on Windows, and similar discussions all over the internet.
Open the file in mode 'wb' and put file.write(line) at the end of your loop.
You don't have your file open for writing.
file = open("cache.ucb", 'rb')
This line opens a file for reading in binary mode. You need to open it for writing also.
Try opening the file in write mode, 'w' and writing the line back.
Or you can simply open your file for read/write at the beginning and write inside your loop:
file = open("cache.ucb", 'a+')
I have a python script that runs a subprocess to get some data and then process it. What I'm trying to achieve is have the data written to a file, and then use the data from the file to do the processing (the reason is that the subprocess is slow, but can change based on the date, time, and parameters I use, and I need to run the script frequently)
I've tried various methods, including opening the file as w+ and trying to seek to the beginning after the write is done, but nothing seems to work - the file is written, but when I try to read back from it (using file.readline()) i get EOF back.
This is what I'm essentially trying to accomplish:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
pass # do stuff
myFile.close()
But even though the file is correctly written (after the script runs, i can see the contents of the file), readline never returns a valid line. Like I said I also tried using the same file object, and doing seek(0) on it, to no luck. This only worked when opening the file as r+, which fails when the file doesn't already exist.
Any help would be appreciated. Also if there's a cleaner way to do this, i'm open to it :)
PS: I realize I can Popen and stdout to a pipe, read from the pipe and then write line by line the data to the file as I do that, but I'm trying to separate the creation of the data file from the reading.
The subprocess almost certainly isn't finishing before you try to read from the file. In fact, it's likely that the subprocess isn't even writing anything before you try to read from the file. For true separation you're going to have to have the subprocess write to a temporary file then replace the file you read from, so that you either read the previous version or the new version but never get to see the partially-written file from the new version.
You can do this in a number of ways; the easiest would be to change the subprocess, but I don't know if that's an option for you here. Alternatively, you can wrap it in your own separate script to manage the files. You probably don't want to call the subprocess in the script that analyses the output file either; you'll want a cronjob or something to regenerate periodically.
This should work as is provided the subprocess is finishing in time (see James's answer).
If you want to wait for it to finish, add p.wait() after the Popen invocation.
What is your actual while loop, though? while myFile.readline() makes it seem as you're not actually saving the line for anything. Try this:
myFile = open(fileName, "r")
print myFile.readlines()
myFile.close()
Or, if you want to interactively examine the state of your program:
myFile = open(fileName, "r")
import pdb; pdb.set_trace()
myFile.close()
Then you can do things like print myFile.readlines() after it stops.
#James Aylett pointed me to the right path, it appears that my problem was that subprocess.Popen wasn't finished running when I call .flush().
The solution, is to call p.wait() right after the subprocess.Popen call, to allow for the underlying command to finish. After doing that, .flush does the right thing (since all the data is there), and I can proceed to read from the file.
So the above code becomes:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
p.wait() # <-- Missing line
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
pass # do stuff
myFile.close()
And then it all works!
The last line of my file is:
29-dez,40,
How can I modify that line so that it reads:
29-Dez,40,90,100,50
Note: I don't want to write a new line. I want to take the same line and put new values after 29-Dez,40,
I'm new at python. I'm having a lot of trouble manipulating files and for me every example I look at seems difficult.
Unless the file is huge, you'll probably find it easier to read the entire file into a data structure (which might just be a list of lines), and then modify the data structure in memory, and finally write it back to the file.
On the other hand maybe your file is really huge - multiple GBs at least. In which case: the last line is probably terminated with a new line character, if you seek to that position you can overwrite it with the new text at the end of the last line.
So perhaps:
f = open("foo.file", "wb")
f.seek(-len(os.linesep), os.SEEK_END)
f.write("new text at end of last line" + os.linesep)
f.close()
(Modulo line endings on different platforms)
To expand on what Doug said, in order to read the file contents into a data structure you can use the readlines() method of the file object.
The below code sample reads the file into a list of "lines", edits the last line, then writes it back out to the file:
#!/usr/bin/python
MYFILE="file.txt"
# read the file into a list of lines
lines = open(MYFILE, 'r').readlines()
# now edit the last line of the list of lines
new_last_line = (lines[-1].rstrip() + ",90,100,50")
lines[-1] = new_last_line
# now write the modified list back out to the file
open(MYFILE, 'w').writelines(lines)
If the file is very large then this approach will not work well, because this reads all the file lines into memory each time and writes them back out to the file, which is very inefficient. For a small file however this will work fine.
Don't work with files directly, make a data structure that fits your needs in form of a class and make read from/write to file methods.
I recently wrote a script to do something very similar to this. It would traverse a project, find all module dependencies and add any missing import statements. I won't clutter this post up with the entire script, but I'll show how I went about modifying my files.
import os
from mmap import mmap
def insert_import(filename, text):
if len(text) < 1:
return
f = open(filename, 'r+')
m = mmap(f.fileno(), os.path.getsize(filename))
origSize = m.size()
m.resize(origSize + len(text))
pos = 0
while True:
l = m.readline()
if l.startswith(('import', 'from')):
continue
else:
pos = m.tell() - len(l)
break
m[pos+len(text):] = m[pos:origSize]
m[pos:pos+len(text)] = text
m.close()
f.close()
Summary: This snippet takes a filename and a blob of text to insert. It finds the last import statement already present, and sticks the text in at that location.
The part I suggest paying most attention to is the use of mmap. It lets you work with files in the same manner you may work with a string. Very handy.
I have an application that reads lines from a file and runs its magic on each line as it is read. Once the line is read and properly processed, I would like to delete the line from the file. A backup of the removed line is already being kept. I would like to do something like
file = open('myfile.txt', 'rw+')
for line in file:
processLine(line)
file.truncate(line)
This seems like a simple problem, but I would like to do it right rather than a whole lot of complicated seek() and tell() calls.
Maybe all I really want to do is remove a particular line from a file.
After spending far to long on this problem I decided that everyone was probably right and this it just not a good way to do things. It just seemed so elegant solution. What I was looking for was something akin to a FIFO that would just let me pop lines out of a file.
Remove all lines after you've done with them:
with open('myfile.txt', 'r+') as file:
for line in file:
processLine(line)
file.truncate(0)
Remove each line independently:
lines = open('myfile.txt').readlines()
for line in lines[::-1]: # process lines in reverse order
processLine(line)
del lines[-1] # remove the [last] line
open('myfile.txt', 'w').writelines(lines)
You can leave only those lines that cause exceptions:
import fileinput, sys
for line in fileinput.input(['myfile.txt'], inplace=1):
try: processLine(line)
except Exception:
sys.stdout.write(line) # it prints to 'myfile.txt'
In general, as other people already said it is a bad idea what you are trying to do.
You can't. It is just not possible with actual text file implementations on current filesystems.
Text files are sequential, because the lines in a text file can be of any length.
Deleting a particular line would mean rewriting the entire file from that point on.
Suppose you have a file with the following 3 lines;
'line1\nline2reallybig\nline3\nlast line'
To delete the second line you'd have to move the third and fourth lines' positions in the disk. The only way would be to store the third and fourth lines somewhere, truncate the file on the second line, and rewrite the missing lines.
If you know the size of every line in the text file, you can truncate the file in any position using .truncate(line_size * line_number) but even then you'd have to rewrite everything after the line.
You're better off keeping a index into the file so that you can start where you stopped last, without destroying part of the file. Something like this would work :
try :
for index, line in enumerate(file) :
processLine(line)
except :
# Failed, start from this line number next time.
print(index)
raise
Truncating the file as you read it seems a bit extreme. What if your script has a bug that doesn't cause an error? In that case you'll want to restart at the beginning of your file.
How about having your script print the line number it breaks on and having it take a line number as a parameter so you can tell it which line to start processing from?
First of all, calling the operation truncate is probably not the best pick. If I understand the problem correctly, you want to delete everything up to the current position in file. (I would expect truncate to cut everything from the current position up to the end of the file. This is how the standard Python truncate method works, at least if I Googled correctly.)
Second, I am not sure it is wise to modify the file while iterating on in using the for loop. Wouldn’t it be better to save the number of lines processed and delete them after the main loop has finished, exception or not? The file iterator supports in-place filtering, which means it should be fairly simple to drop the processed lines afterwards.
P.S. I don’t know Python, take this with a grain of salt.
A related post has what seems a good strategy to do that, see
How can I run the first process from a list of processes stored in a file and immediately delete the first line as if the file was a queue and I called "pop"?
I have used it as follows:
import os;
tasklist_file = open(tasklist_filename, 'rw');
first_line = tasklist_file.readline();
temp = os.system("sed -i -e '1d' " + tasklist_filename); # remove first line from task file;
I'm not sure it works on Windows.
Tried it on a mac and it did do the trick.
This is what I use for file based queues. It returns the first line and rewrites the file with the rest. When it's done it returns None:
def pop_a_text_line(filename):
with open(filename,'r') as f:
S = f.readlines()
if len(S) > 0:
pop = S[0]
with open(filename,'w') as f:
f.writelines(S[1:])
else:
pop = None
return pop