Consider the following piece of Python (2.x) code:
for line in open('foo').readlines():
    print line.rstrip()
I assume that since the open file remains unreferenced, it has to be closed automatically. I have read about the garbage collector in Python, which frees memory allocated by unused objects. Is the GC general enough to handle files too?
UPDATE
For current versions of Python, the clear recommendation is to close files explicitly or to use a with statement. There is no longer any indication that the GC will close the file for you. So now the answer should be: maybe, but there is no guarantee. Always use close() or a with statement.
In the Python 3.8 docs the text has been updated to:
If you’re not using the with keyword, then you should call f.close() to close the file and immediately free up any system resources used by it.
Warning: Calling f.write() without using the with keyword or calling f.close() might result in the arguments of f.write() not being completely written to the disk, even if the program exits successfully.
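For illustration, a minimal sketch of the recommended pattern (the file name is just a placeholder):

with open('out.txt', 'w') as f:
    f.write('some data\n')
# at this point the file has been flushed and closed,
# even if an exception was raised inside the block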
Old Answer:
Taken from the Python 3.6 docs:
If you’re not using the with keyword, then you should call f.close() to close the file and immediately free up any system resources used by it. If you don’t explicitly close a file, Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while. Another risk is that different Python implementations will do this clean-up at different times.
So yes, the file will be closed automatically, but in order to be in control of the process you should do so yourself or use a with statement:
with open('foo') as foo_file:
    for line in foo_file.readlines():
        print line.rstrip()

foo_file will be closed once the with block ends.
In the Python 2.7 docs, the wording was different:
When you’re done with a file, call f.close() to close it and free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will automatically fail.
So I assume that you should not depend on the garbage collector closing files for you automatically; just do it manually or use with.
I often use open without with, so I ran a little test. For the test I used Python 3.9, so I'm not speaking for earlier versions, but for 3.9 at least we do not need with to get a clean file close.
In one terminal (bash):

inotifywait -m "testfile"

In another, run the following with python3.9:

from time import sleep

lines = [line for line in open("testfile")]
sleep(5)
for line in lines:
    print(line)
Watch the inotifywait window and run the Python script. Before the sleep, the final event will be CLOSE_NOWRITE,CLOSE, and there will be no other events from that file for the rest of the Python script's run.
It depends on what you do; check out this description of how it works.
In general I would recommend using the file's context manager:
with open("foo", "r") as f:
for line in f.readlines():
# ....
which is similar to (for basic understanding):
file_context_manager = open("foo", "r").__enter__()
for line in file_context_manager.readlines():
    ...  # ....
file_context_manager.__exit__(None, None, None)
The first version is a lot more readable, and the with statement calls the exit method automatically (plus a bit more context handling).
The file will be closed automatically when the scope of the with statement is left.
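For instance, a quick check (assuming a readable file named "foo" exists):

with open("foo") as f:
    first_line = f.readline()

print(f.closed)  # prints True: the with block closed the file on exit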
Related
I'm writing a program which, among other things, works with a temporary file created using the tempfile library.
The temporary file is created and filled inside a function:
def func():
    mod_script = tempfile.NamedTemporaryFile(dir='special')
    dest = open(mod_script.name, 'w')
    # filling dest
    return mod_script
(I use open() and not with open() because I execute the temporary file after calling func())
After some operations with mod_script outside func(), I call mod_script.close(). And all works fine.
But I have one problem: if my program fails (or if I interrupt it), the temporary file isn't removed.
How do I fix this?
I really don't want to write try...except...finally clauses, because I'd have to write them in so many places (there are many points where my program can fail).
First, use a with statement, and pass delete=False to the constructor.
Then you need to put the necessary error handling in your program: catch exceptions (see try...finally) and clean up during program exit, whether it succeeds or crashes.
Alternatively, keep the file open while executing it so that the automatic delete-on-close does not remove it before you have executed it. This may cause issues on Windows, which tends to have conflicts when using files that are still open.
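A minimal sketch of the first suggestion; the 'special' directory is taken from the question (it must exist), the written content is a placeholder, and the atexit hook is just one way to do the cleanup-on-exit that works for normal exit, unhandled exceptions, and Ctrl-C:

import atexit
import os
import tempfile

def func():
    # delete=False: the file survives close(), so it can still be executed later
    with tempfile.NamedTemporaryFile('w', dir='special', delete=False) as mod_script:
        mod_script.write('echo hello\n')  # filling the script (placeholder content)
    # remove the file on interpreter exit, whatever the reason for exiting
    atexit.register(lambda path=mod_script.name: os.path.exists(path) and os.unlink(path))
    return mod_script.name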
I want to create an empty file using a Python script in a Unix environment. I have seen different ways of achieving this mentioned. What are the benefits/pitfalls of one over the other?
os.system('touch abc')
open('abc','a').close()
open('abc','a')
subprocess.call(['touch','abc'])
Well, for a start, the ones that rely on touch are not portable. They won't work under standard Windows, for example, without the installation of Cygwin, GNUWin32, or some other package providing a touch utility.
They also involve the creation of a separate process for doing the work, something that's totally unnecessary in this case.
Of the four, I would probably use open('abc','a').close() if the intent is to try and just create the file if it doesn't exist. In my opinion, that makes the intent clear.
But, if you're trying to create an empty file, I'd probably be using the w write mode rather than the a append mode.
In addition, you probably also want to catch the exception if, for example, you cannot actually create the file.
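For example, a minimal sketch for Python 3 (on Python 2, catch IOError instead); how you handle the failure is up to you:

try:
    open('abc', 'a').close()
except OSError as e:
    print('could not create abc: {}'.format(e))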
TLDR: use
open('abc','a').close()
(or 'w' instead of 'a' if the intent is to truncate the file if it already exists).
Invoking a separate process to do something Python can do itself is wasteful, and non-portable to platforms where the external command is not available. (Additionally, os.system uses two processes -- one more for a shell to parse the command line -- and is being deprecated in favor of subprocess.)
Not closing an open filehandle when you're done with it is bad practice, and could cause resource depletion in a larger program (you run out of filehandles if you open more and more files and never close them).
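A rough illustration of that last point (don't actually do this; it assumes 'abc' exists, and the exact limit and error message vary by system):

handles = []
for _ in range(100000):
    # handles are never closed; this eventually fails with
    # OSError: ... Too many open files
    handles.append(open('abc'))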
To create an empty file on Unix in Python:
import os

try:
    os.close(os.open('abc', os.O_WRONLY | os.O_CREAT | os.O_EXCL |
                     getattr(os, "O_CLOEXEC", 0) |
                     os.O_NONBLOCK | os.O_NOCTTY))
except OSError:
    pass  # decide what to consider an error in your case and reraise
          # 1. is it an error if the 'abc' entry already exists?
          # 2. is it an error if 'abc' is a directory or a symlink to a directory?
          # 3. is it an error if 'abc' is a named pipe?
          # 4. it is probably an error if the parent directory is not writable
          #    or the filesystem is read-only (can't create a file)
Or a more portable variant:

try:
    open('abc', 'ab', 0).close()
except OSError:
    pass  # see the comments above
Without the explicit .close() call, non-reference-counting Python implementations such as PyPy or Jython may delay closing the file until garbage collection runs (which may exhaust the available file descriptors for your process).
The latter example may get stuck on a FIFO, and it follows symlinks. On my system, it is equivalent to:
from os import *

open("abc", O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0o666)
In addition, the touch command updates the access and modification times of existing files to the current time.
In more recent Python 3 versions, we have Path.touch() from pathlib. This will create an empty file if it doesn't exist, and update the mtime if it does, in the same way as your example os.system('touch abc'), but it's much more portable:
from pathlib import Path
abc = Path('abc')
abc.touch()
Example:
subprocess.call(cmd, stdout=open('status_grid','a'), cwd = folder)
is the file status_grid closed automatically?
No, it isn't:
import subprocess
f = open('b','a')
subprocess.call('ls', stdout=f)
print f.closed
Output:
False
Now, a better answer might come from unutbu: you don't keep a reference to your open file, so once your subprocess completes, it's up to the garbage collector how much longer the file stays open.
One way to be sure is
with open('status_grid', 'a') as my_file:
    subprocess.call(cmd, stdout=my_file, cwd=folder)
If not done explicitly, the file will be closed when it is garbage collected. When the file is garbage collected is not specified by the Python language per se.
In CPython, the file is garbage collected when there are no more references to the file object.
With other implementations of Python, such as Jython, garbage collection may happen completely differently:
Jython has "true" garbage collection whereas CPython uses reference
counting. This means that in Jython users don't need to worry about
handling circular references as these are guaranteed to be collected
properly. On the other hand, users of Jython have no guarantees of
when an object will be finalized -- this can cause problems for people
who use open("foo", 'r').read() excessively. Both behaviors are
acceptable -- and highly unlikely to change.
As EMS and Charles Salvia point out, to be sure when the file is closed, it is best to not leave it up to the garbage collector. The best way to do that is to use a with statement, which guarantees the file will be closed when Python leaves the with-suite:
with open('status_grid','a') as f:
    subprocess.call(cmd, stdout=f, cwd=folder)
No, it's not. You can wrap your call in a with statement to ensure the file closes automatically:

with open('status_grid','a') as myfile:
    subprocess.call(cmd, stdout=myfile, cwd=folder)
Note: with current CPython implementations based on reference counting, the file will be closed when the reference count reaches 0, which will happen immediately in the code you posted. However, this is just an implementation detail of CPython. Other implementations may leave the file open indefinitely. Use the with statement to ensure you've written portable code.
In Python (> 2.7), does the code:
open('tick.001', 'w').write('test')
have the same result as:
ftest = open('tick.001', 'w')
ftest.write('test')
ftest.close()
And where can I find documentation about the 'close' for this inline functionality?
The close() here happens when the file object is deallocated from memory, as part of its deletion logic. Because modern Pythons on other virtual machines — like Java and .NET — cannot control when an object is deallocated from memory, it is no longer considered good Python to open() like this without a close(). The recommendation today is to use a with statement, which explicitly requests a close() when the block is exited:
with open('myfile') as f:
    ...  # use the file
# when you get back out to this level of code, the file is closed
If you do not need a name f for the file, then you can omit the as clause from the statement:
with open('myfile'):
    ...  # use the file
# when you get back out to this level of code, the file is closed
In Python, and in general - does a close() operation on a file object imply a flush() operation?
Yes. It uses the underlying close() function which does that for you (source).
NB: close() and flush() won't ensure that the data is actually safe on the disk. They just ensure that the OS has the data, i.e. that it isn't buffered inside the process.
You can try sync or fsync to get the data written to the disk.
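For example, a minimal sketch (the file name is just a placeholder):

import os

with open('data.txt', 'w') as f:
    f.write('important data\n')
    f.flush()              # push Python's userspace buffer to the OS
    os.fsync(f.fileno())   # ask the OS to commit the data to the disk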
Yes. In Python 3 this is finally in the official documentation, but it was already the case in Python 2 (see Martin's answer).
As a complement to this question: yes, Python flushes before close; however, if you want to ensure that data is written properly to disk, this is not enough.
This is how I would write a file so that it is atomically updated on a UNIX/Linux server, whether or not the target file already exists. Note that some filesystems will implicitly commit data to disk on close+rename (ext3 with data=ordered (the default), and ext4 initially uncovered many application flaws before adding detection of write-close-rename patterns and syncing data before metadata in those cases [1]).
import json
import os
import tempfile

# Write destfile, using a temporary name .<name>_XXXXXXXX
# (destfile is assumed to already hold the path of the file to update)
base, name = os.path.split(destfile)
tmpname = os.path.join(base, '.{}_'.format(name))  # This is the tmpfile prefix
with tempfile.NamedTemporaryFile('w', prefix=tmpname, delete=False) as fd:
    # Replace the prefix with the actual file path/name
    tmpname = str(fd.name)
    try:
        # Write to fd here... e.g.:
        json.dump({}, fd)
        # We want to fdatasync before closing, so we need to flush before close anyway
        fd.flush()
        os.fdatasync(fd)
        # Since we're using a tmpfile, we also need to set the proper permissions
        if os.path.exists(destfile):
            # Copy the destination file's mode
            os.fchmod(fd.fileno(), os.stat(destfile).st_mode)
        else:
            # Set the mode based on the current umask value
            umask = os.umask(0o22)
            os.umask(umask)
            os.fchmod(fd.fileno(), 0o666 & ~umask)  # 0o777 for dirs and executable files
        # Now we can close and rename the file (overwriting any existing one)
        fd.close()
        os.rename(tmpname, destfile)
    except:
        # On error, try to clean up the temporary file
        try:
            os.unlink(tmpname)
        except OSError:
            pass
        raise
IMHO it would have been nice if Python provided simple methods around this... At the same time, I guess if you care about data consistency it's probably best to really understand what is going on at a low level, especially since there are many differences across various operating systems and filesystems.
Also note that this does not guarantee the written data can be recovered, only that you will get a consistent copy of the data (old or new). To ensure the new data is safely written and accessible when returning, you need to use os.fsync(...) after the rename, and even then, if you have unsafe caches in the write path, you could still lose data. This is common on consumer-grade hardware, although any system can be configured for unsafe writes, which boosts performance too. At least, even with unsafe caches, the method above should still guarantee that whichever copy of the data you get is valid.
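One common way to do that last step, continuing the variables from the example above (base is the directory part of destfile, computed earlier): fsyncing the containing directory after the rename is what makes the new directory entry itself durable. os.O_DIRECTORY is Unix-specific, so treat this as a sketch:

import os

# after os.rename(tmpname, destfile):
dir_fd = os.open(base or '.', os.O_DIRECTORY)
try:
    os.fsync(dir_fd)  # persist the directory entry pointing at destfile
finally:
    os.close(dir_fd)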
filehandle.close does not necessarily flush. Surprisingly, filehandle.flush doesn't help either: the data can still get stuck in the OS buffers while Python is running. Observe this session where I wrote to a file, closed it, pressed Ctrl-Z to get back to the shell prompt, and examined the file:
$ cat xyz
ghi
$ fg
python
>>> x=open("xyz","a")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.flush
<built-in method flush of file object at 0x7f58e0044660>
>>> x.close
<built-in method close of file object at 0x7f58e0044660>
>>>
[1]+ Stopped python
$ cat xyz
ghi
Subsequently I can reopen the file, and that necessarily syncs it (because, in this case, I open it in append mode). As the others have said, the sync syscall (available from the os package) should flush all buffers to disk, but it has possible system-wide performance implications (it syncs all files on the system).
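For completeness, a sketch of those two options on a Unix system (os.sync() requires Python 3.3+); fsync targets one file, sync targets everything:

import os

x = open("xyz", "a")
x.write("morestuff\n")
x.flush()              # move Python's buffer into the OS
os.fsync(x.fileno())   # flush this one file's data to disk
# or, system-wide (syncs every file on the system):
os.sync()
x.close()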