When determining whether or not a file exists, how does using the try statement avoid a "race condition"?
I'm asking because a highly upvoted answer (update: it was deleted) seems to imply that using os.path.exists() creates an opportunity that would not exist otherwise.
The example given is:
try:
    with open(filename): pass
except IOError:
    print 'Oh dear.'
But I'm not understanding how that avoids a race condition compared to:
if not os.path.exists(filename):
    print 'Oh dear.'
How does calling os.path.exists(filename) allow the attacker to do something with the file that they could not already do?
The race condition is, of course, between your program and some other code that operates on the file (a race condition always requires at least two parallel processes or threads). That means using open() instead of exists() can really help only in two situations:
You check for the existence of a file that is created or deleted by some background process. (Note that if you run inside a web server, there are often many copies of your process running in parallel to handle HTTP requests, so for web apps a race condition is possible even when no other programs are involved.)
There may be some malicious program running that is trying to crash your code by destroying the file at the moment you expect it to exist.
exists() performs just a single check. If the file exists, it may be deleted a microsecond after exists() returned True. If the file is absent, it may be created immediately.
open(), however, does not just test for the file's existence; it also opens the file, and it does these two things atomically, so nothing can happen between the check and the opening. Usually a file cannot be deleted while someone holds it open (strictly true on Windows; on POSIX systems the name can be unlinked, but the data stays accessible through the open handle). That means that inside the with block you can be completely sure the file really exists, since it is open. This is true only inside the with block, and the file may still be deleted immediately after the block exits, but putting the code that needs the file to exist inside the with block guarantees that this code will not fail for that reason.
Here's an example of usage:
try:
    with open('filename') as f:
        do_stuff_that_depends_on_the_existence_of_the_file(f)
except IOError as e:
    print 'Trouble opening file'
If you are opening the file with any access at all, then the OS will guarantee that the file exists, or else it will fail with an error. If the access is exclusive, any other process in contention for the file will either be blocked by you, or block you.
The try is just a way to detect the error or success of the act of opening the file, since file I/O APIs in Python typically do not have return codes (exceptions are used instead). So to really answer your question: it's not the try that avoids the race condition, it's the open. It's basically the same in C (the language CPython is implemented in), but without exceptions.
Note that you would probably want to execute code that depends on access to the file inside the try block. Once you close the file, its existence is no longer guaranteed.
Calling os.path.exists merely gives a snapshot at a moment in time when the file may or may not exist, and you have no knowledge of the existence of the file once os.path.exists returns. Malevolent code or unexpected logic may delete or change the file when you are not expecting it. It is akin to turning your head to check that a road is clear before driving into it. Once you turn your head back, you have nothing but a guess about what is going on where you are no longer looking. Holding the file open guarantees an extended consistent state, something not possible (for good or ill) when driving. :)
Your suggestion of checking that a file does not exist rather than using try/open is still insufficient because of the snapshot nature of os.path.exists. Unfortunately I know of no way to prevent files from being created in a directory in all cases, so I think it is best to check for the positive existence of a file, rather than its absence.
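That said, for the specific case of "create this file only if it does not already exist", the OS does offer a race-free primitive: os.open with the O_EXCL flag performs the check and the creation as one atomic operation. A minimal sketch (the name 'lockfile' is just an illustration):
import os, errno

try:
    # O_CREAT | O_EXCL: create the file, failing with EEXIST if it
    # already exists; the check and the creation happen atomically.
    fd = os.open('lockfile', os.O_CREAT | os.O_EXCL | os.O_WRONLY)
except OSError as e:
    if e.errno == errno.EEXIST:
        print('lockfile already exists')
    else:
        raise
else:
    os.write(fd, b'mine')
    os.close(fd)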
I think what you're asking is the particular race condition where:
file is opened
context is switched and the file is deleted
context is switched back and file operations are attempted on the "opened" file
The way you're "protected" in this case is by putting all the file handling code in a try block, if at any point the file becomes inaccessible/corrupt your file operations will be able to fail "gracefully" via the catch block.
Note of course modern OS's this can't happen anyway, when a file is "deleted" the delete won't take place until all open handles on the file are resolved (released)
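To see the POSIX half of that, a minimal sketch (assumes a POSIX system; on Windows the os.remove call would fail because the file is still open):
import os

with open('demo.txt', 'w') as f:
    f.write('hello')

f = open('demo.txt')
os.remove('demo.txt')   # unlink: the directory entry disappears...
print(f.read())         # ...but the open handle still reads 'hello'
f.close()               # only now is the data actually released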
Related
The standard answer to "how can I ensure a file is closed in Python" is to wrap the commands in a "with" statement, so that the file is automatically closed upon exiting the "with" block.
But what about a case where you can't do that because the file handle needs to remain open across a large swath of code? For example, you open the file handle in an object's constructor, save it to an object attribute, and then refer to the file handle in many object methods.
It would be possible to move the opening of the file handle to the methods themselves, but basically in that case I'd be opening/closing the file every time a method is called, which is far less efficient.
I have tried placing a "close" command in the object destructor (the "__del__" method), but this does not work.
A dirty but easy win is to keep a record of file names when you open them, and make sure file.close(...) appears at the end. Keeping a list, or wrapping the open() function (as suggested by this post), may do the job.
The post "check what files are open in Python" suggests several solutions, like wrapping the built-in file object, command-line methods, the psutil module, etc.; maybe some of them fit your situation.
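One more option, sketched here with hypothetical names (SettingsFile, settings.conf): make the object itself a context manager, so the handle stays open across all method calls but is still closed deterministically by a with statement:
class SettingsFile(object):
    """Hypothetical wrapper holding one handle open across methods."""

    def __init__(self, path):
        self._fh = open(path)

    def first_line(self):
        self._fh.seek(0)
        return self._fh.readline()

    # The context-manager protocol: __enter__/__exit__ let callers
    # put the whole object in a with statement.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._fh.close()

# The handle stays open for every method call inside the block and
# is closed exactly once when the block exits.
with SettingsFile('settings.conf') as s:
    print(s.first_line())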
E.g., if I am trying to open a file, can I not simply check os.path.exists(myfile) instead of using try/except? I think the answer to why I should not rely on os.path.exists(myfile) is that there may be a number of other reasons why the file may not open.
Is that the logic behind why error handling using try/except should be used?
Is there a general guideline on when to use exceptions in Python?
Race conditions.
In the time between checking whether a file exists and performing an operation on it, the file might have been deleted, edited, renamed, etc.
On top of that, an exception gives you the OS error code, so you get a more precise reason why the operation failed.
Finally, it's considered Pythonic to ask for forgiveness rather than to ask for permission.
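To make that concrete, here is a small sketch of the two styles (the file name is hypothetical): "look before you leap" leaves a gap between the check and the open, while "easier to ask forgiveness than permission" makes the open itself the check:
import os

# LBYL: racy, the file can vanish between the two lines
if os.path.exists('settings.conf'):
    f = open('settings.conf')   # may still raise IOError
    f.close()

# EAFP: the open is the check, and the exception explains any failure
try:
    f = open('settings.conf')
except IOError as e:
    print('Could not open settings.conf: %s' % e)
else:
    f.close()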
Generally you use try/except when you handle things that are outside of the parameters that you can influence.
Within your script you can check variables for type, lists for length, etc., and you can be sure the result is sufficient, since you are the only one handling these objects. As soon as you handle files in the file system, connect to remote hosts, and so on, you can neither influence nor check all parameters any more, and you cannot be sure that the result of a check stays valid.
As you said,
the file might exist but you don't have access rights
you might be able to ping a host address but a connection is declined
There are too many factors that could go wrong to check them all separately; and even if you did, they might still change before you actually perform your command.
With try/except you can generally catch every exception and handle the most important errors individually. You make sure the error is handled even when a preliminary test succeeds but things fail once your commands actually run.
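A sketch of that pattern, catching the important cases individually via errno and letting anything unexpected propagate (the file name is hypothetical):
import errno

try:
    f = open('remote_dump.dat')
except IOError as e:
    if e.errno == errno.ENOENT:
        print('the file does not exist')
    elif e.errno == errno.EACCES:
        print('the file exists but you lack access rights')
    else:
        raise   # something unexpected: let it propagate
else:
    f.close()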
I am writing a program in Python 2.7 that retrieves remote files and dumps them in a directory that can be specified by the user. Currently, in order to verify that the program can in fact write to that directory, I do the following (assuming that dumpdir is a string containing the name of the directory to check):
try:
    os.mkdir(dumpdir+'/.mwcrawler')
    os.rmdir(dumpdir+'/.mwcrawler')
except:
    logging.error('Could not open %s for writing, using default', dumpdir)
But this feels even more hackish than my usual code. What's the correct way to go about this? Maybe some sort of assertion on privileges?
In general, it's better to ask for forgiveness than permission—you have to handle errors in writing each file anyway, so why check in advance?
But, when you have a user interface—even a command-line interface, where you may read a prefs file long before you get to any writing—it's often much more convenient to the user to return errors as soon as possible. As long as that's the only reason you're doing this check, there's nothing wrong with it.
However, there are many little ways you could improve the way you do the check.
First, you should almost never just use except: without specifying anything. Besides the fact that this catches different things in different versions of Python (and therefore also confuses human readers used to other versions), it means you have no way of distinguishing between not writable, a bad Unicode character, or even a typo in the code.
Plus, because of that bare except, your error message blames writability even when the real failure was something else entirely, which is pretty odd.
Second, unless you're sure nobody will ever have a file named .mwcrawler (e.g., because you refuse to transfer files starting with '.' or something), using any fixed name is just asking for trouble. A better solution is to use, e.g., tempfile.mkdtemp.
Also, you should avoid using string manipulation for paths if you want to be portable. That's what os.path (and higher-level utilities) are for—so you don't have to learn or think about Windows, zOS, etc.
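For instance, a hypothetical sketch of the portable spelling of the path from the question:
import os.path

dumpdir = '/var/tmp'   # hypothetical user-supplied directory
path = os.path.join(dumpdir, '.mwcrawler')   # instead of dumpdir + '/.mwcrawler'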
Putting it all together:
try:
    d = tempfile.mkdtemp(prefix='.mwcrawler', dir=dumpdir)
except Exception as e:
    logging.error('Could not open %s for writing (%s), using default', dumpdir, e)
else:
    os.rmdir(d)
The documentation for os.access describes a function specifically created for this need.
It also explains a better way of approaching rights checking.
As also rightfully mentioned in the comments, os.access has issues in a few specific cases, so just to be totally sure, the "hit-and-run" approach is actually better: try writing, catch the exception, and see what happened; go from there.
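For reference, a minimal sketch of the os.access pre-flight check (the directory name is hypothetical). Note that os.access answers for the process's real uid/gid rather than the effective ones, which is one reason its verdict can disagree with what open() actually does, for example under setuid or on some network file systems:
import os

dumpdir = '/var/tmp'   # hypothetical directory to check

# W_OK: may we write entries?  X_OK: may we traverse into it?
if os.access(dumpdir, os.W_OK | os.X_OK):
    print('%s looks writable' % dumpdir)
else:
    print('%s does not look writable' % dumpdir)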
Say you have:
def my_func():
    fh = open(...)
    try:
        print fh.read()
    finally:
        fh.close()
My first question is: Is it worth having the try/finally (or with) statement? Isn't the file closed anyway when the function terminates (via garbage collection)?
I came across this after reading a recipe from Martelli's "Python Cookbook", where
all_the_text = open('thefile.txt').read()
comes with the comment: "When you do so, you no longer have a reference to the file object as soon as the reading operation finishes. In practice, Python notices the lack of a reference at once, and immediately closes the file."
My function example is almost the same. You do have a reference, it's just that the reference has a very short life.
My second question is: What does "immediately" in Martelli's statement mean? Even though you don't have a reference at all, doesn't the file closing happen at garbage collection time anyway?
It is good practice to close the file yourself. Using the with statement leads to clean code and it automatically closes the file (which is a Good Thing).
Even though Python is a high-level programming language, you still need to be in control of what you're doing. As a rule of thumb: if you open a file, it also needs to be closed. There's never a good reason to be sloppy in your code :-)
Regarding your second question: in CPython, the file object is deallocated, and the file closed, as soon as its reference count drops to zero, which is what Martelli's "immediately" means; the cyclic garbage collector (see the gc module) only comes into play for objects caught in reference cycles. There are some interesting articles on garbage collection in Python that cover this in detail.
Note, however, that this is a CPython implementation detail. Implementations without reference counting (PyPy, Jython) only close the file when the garbage collector decides to run, and since collection is triggered by allocation thresholds rather than by elapsed time, such an implementation might hold the file open much longer than necessary.
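A small sketch of the difference (the refcounting behaviour noted in the comments is CPython-specific):
with open('thefile.txt', 'w') as f:
    f.write('some text')

f = open('thefile.txt')
text = f.read()
print(f.closed)        # False: we still hold a reference
del f                  # CPython: refcount hits zero, file closes now;
                       # PyPy/Jython: stays open until the GC runs

# The portable, deterministic way:
with open('thefile.txt') as f:
    text = f.read()
print(f.closed)        # True: closed on leaving the with block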
I am writing a wrapper for the ConfigParser in Python to provide an easy interface for storing and retrieving application settings.
The wrapper has two methods, read and write, and a set of properties for the different application settings.
The write method is just a wrapper for the ConfigParser's write method with the addition of also creating the file object needed by the ConfigParser. It looks like this:
def write(self):
    f = open(self.path, "w")
    try:
        self.config_parser.write(f)
    finally:
        f.close()
I would like to write a unit test that asserts that this method raises an IOError if the file could not be written to and in the other case that the write method of the config parser was called.
The second test is quite easy to handle with a mock object. But the open call makes things a little tricky. Eventually I have to create a file object to pass to the config parser. The fact that a file will actually be created when running this code doesn't make it very useful for a unit test. Is there some strategy for mocking file creation? Can this piece of code be tested in some way? Or is it just too simple to be tested?
First, you don't actually need to unit test open(), since it's pretty reasonable to assume that the standard library is correct.
Next, you don't want to do file system manipulations to get open() to generate the error you want, because then you're not unit testing, you're doing a functional/integration test by including the file system.
So you could perhaps replace open() in the global namespace with a surrogate that just raises an IOError. Though you'd probably need to make sure you put things back if execution continues.
But in the end, what value does the test have? There's so little in that code snippet that's your own system. Even replacing open() really just ends up being a test that says "does the try and finally statement in Python work?"
My suggestion? Just add a statement to the docstring that records your expectation. "Raises an IOError if the file can't be written." Then move on. You can add a unit test later if this method gains some complexity (and merit for testing).
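A sketch of that suggestion, reusing the write method from the question with the expectation recorded in the docstring:
def write(self):
    """Write the settings to self.path.

    Raises an IOError if the file can't be written.
    """
    f = open(self.path, "w")
    try:
        self.config_parser.write(f)
    finally:
        f.close()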
Actually, only open could throw an exception in your code. The docs for write() don't say anything about exceptions. Possibly only a ValueError or the like for a bad file pointer (as a result of open failing, which can't be the case here).
Making an IOError for open is easy: just point the path somewhere you can't write, for example a directory that doesn't exist, or change the file's permissions so you don't have access.
You might want to use the with statement here though; it will handle the closing itself.
In Python 2.5 you need the first line of the snippet below; in later versions you don't.
from __future__ import with_statement # python 2.5 only

def write(self):
    with open(self.path, 'w') as f:
        self.config_parser.write(f)
The write method is guaranteed to be called if open succeeds, and won't be called if open raises an IOError. I don't know why you'd need a test to see if write was called. The code says that it does. Don't overdo your testing. ;)
Remember you don't have to test that open() or ConfigParser work—they're not part of your code—you just have to test that you use them correctly. You can monkeypatch the module with your own open(), just as for the instance attribute, and can return a mock from it that helps you test.
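A sketch of that idea using the mock library (pip install mock on Python 2.7; it ships as unittest.mock from Python 3.3, where the patch target would be 'builtins.open'). The wrapper class here is a hypothetical reconstruction of the code in the question:
import mock

class SettingsWriter(object):
    """Hypothetical stand-in for the ConfigParser wrapper."""
    def __init__(self, path, config_parser):
        self.path = path
        self.config_parser = config_parser

    def write(self):
        f = open(self.path, "w")
        try:
            self.config_parser.write(f)
        finally:
            f.close()

parser = mock.MagicMock()

# Patch the built-in open so no real file is touched; mock_open()
# supplies a fake file object as the return value.
with mock.patch('__builtin__.open', mock.mock_open()) as fake_open:
    SettingsWriter('settings.ini', parser).write()

fake_open.assert_called_once_with('settings.ini', 'w')
parser.write.assert_called_once_with(fake_open.return_value)
fake_open.return_value.close.assert_called_once_with()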
However, unit tests are not my only tool, and this is one function that's simple enough to analyze and "prove"† that it works.
†Less rigorously than mathematicians would like, I'm sure, but good enough for me.