For example, if I am trying to open a file, can I not simply check os.path.exists(myfile) instead of using try/except? I think the answer to why I should not rely on os.path.exists(myfile) is that there may be a number of other reasons why the file may not open.
Is that the logic behind why error handling using try/except should be used?
Is there a general guideline on when to use exceptions in Python?
Race conditions.
In the time between checking whether a file exists and operating on it, the file might have been deleted, edited, renamed, etc.
On top of that, an exception gives you the OS error code, which tells you more precisely why the operation failed.
Finally, it's considered Pythonic to ask for forgiveness, rather than ask for permission.
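For illustration, here is a minimal sketch of the two styles (myfile is a hypothetical path):

import errno
import os

myfile = "data.txt"  # hypothetical path

# LBYL: racy -- the file can vanish between the check and the open
if os.path.exists(myfile):
    with open(myfile) as f:
        data = f.read()

# EAFP: the check and the open are a single atomic operation, and the
# exception carries the OS-level reason for the failure
try:
    with open(myfile) as f:
        data = f.read()
except OSError as e:
    print(e.errno == errno.ENOENT, e.strerror)  # e.g. True, 'No such file or directory'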
Generally you use try/except when you handle things that are outside of the parameters that you can influence.
Within your script you can check variables for type, lists for length, etc., and you can be sure that the result will be sufficient, since you are the only one handling these objects. As soon as you handle files in the file system, connect to remote hosts, etc., you can neither influence nor check all parameters anymore, nor can you be sure that the result of a check stays valid.
As you said,
the file might exist but you don't have access rights
you might be able to ping a host address but a connection is declined
There are too many factors that could go wrong to check them all separately, and even if you did, they might still change before you actually perform your command.
With try/except you can generally catch every exception and handle the most important errors individually. You make sure that the error is handled even if a check succeeds at first but things fail after you start running your commands.
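A sketch of that pattern, with the most important errors handled individually and a catch-all after them (path is a hypothetical name):

path = "config.txt"  # hypothetical file

try:
    with open(path) as f:
        data = f.read()
except FileNotFoundError:
    print("no such file:", path)
except PermissionError:
    print("no permission to read:", path)
except OSError as e:
    # everything else that can go wrong at the OS level
    print("could not read %s: %s" % (path, e))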
Related
I've always had trouble understanding the real purpose of catching exceptions in any programming language.
Generally, when you can catch an exception, you can also prevent the exception from ever occurring. Here's an example.
What's the point of doing this:
import os
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("filename", type=str, help="the filename")
    args = parser.parse_args()
    try:
        # "r+" opens an existing file for writing; a missing file raises FileNotFoundError
        with open(args.filename, "r+") as fs:
            fs.write("test")
    except FileNotFoundError:
        print("that file was not found")
If I can do this:
import os
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("filename", type=str, help="the filename")
    args = parser.parse_args()
    if os.path.exists(args.filename):
        # same "r+" fix; note the file can still vanish between this check and the open
        with open(args.filename, "r+") as fs:
            fs.write("test")
    else:
        print("that file was not found")
What is the added value of catching anything as an exception, when it is often more feasible to prevent the exception from ever happening in the first place?
Exceptions are for exceptional situations: they are thrown when an undesired, erroneous issue has happened. It is true that in many cases the exception pattern is unnecessary, since you can check whether the operation can be performed before giving it a chance to err. But sometimes that's not possible.
For instance, consider running a database query: you cannot know whether it will succeed without actually running it. So there are cases when you cannot prevent exceptions by doing validations.
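As a minimal sketch with the standard library's sqlite3 module (the table and the query are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # no way to know this fails without running it: the table may not exist
    rows = conn.execute("SELECT name FROM users").fetchall()
except sqlite3.OperationalError as e:
    print("query failed:", e)  # e.g. "no such table: users"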
Also, there are cases when there could be many exceptions at many levels. Your code would become very difficult to read and work with if you validated everything. Consider the case where you have a method that receives an object and calls 100 other methods, passing the same object.
When do you validate whether that object was properly initialized? Do you validate it only in the method that calls the other 100? But then somebody else might later call one of those 100 methods from another place without validating it. So, if we are to only validate, we end up writing the same validating code in 101 methods, instead of catching the exception at a single place, regardless of which method throws it (see the sketch below).
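Here is a rough sketch of that idea, with hypothetical helper functions standing in for the 100 methods:

def step_one(obj):
    obj["total"] = sum(obj["values"])

def step_two(obj):
    obj["mean"] = obj["total"] / len(obj["values"])

def process(obj):
    # none of the steps validates obj itself
    step_one(obj)
    step_two(obj)

try:
    process({"values": []})  # malformed input: an empty list
except (KeyError, ZeroDivisionError) as e:
    # one except clause covers a bad object no matter which step trips on it
    print("bad input object:", repr(e))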
You will also need to use third-party libraries. Are they perfect? Probably not quite and not all of them. How do you validate everything in the code written by someone else?
Summary:
sometimes there is no way to know whether an operation succeeds before running it
third-party libraries will come as they are, possibly with errors; you cannot apply validation to them unless you get into their code and refactor the whole thing (you might as well write the whole library yourself)
doing validation only may lead to code repetition, unreadable code, code that is difficult to maintain, and very long refactoring sessions when the duplicated validation needs to be changed
you cannot think of all the possible errors, so you need at least a layer that catches the problems you didn't foresee
when you upgrade some version, the Python version for example, it is quite possible that something will no longer work
Validating operations is often a good idea, but it cannot substitute for exception handling; they go hand in hand. You validate what makes sense to be validated (often a subjective decision), but if an exception still occurs, you need to handle it properly.
I am writing a program in Python 2.7 that retrieves remote files and dumps them in a directory that can be specified by the user. Currently, in order to verify that the program can in fact write to that directory, I do the following (assuming that dumpdir is a string containing the name of the directory to check):
try:
    os.mkdir(dumpdir+'/.mwcrawler')
    os.rmdir(dumpdir+'/.mwcrawler')
except:
    logging.error('Could not open %s for reading, using default', dumpdir)
But this feels even more hackish than my usual code. What's the correct way to go about this? Maybe some sort of assertion on privileges?
In general, it's better to ask for forgiveness than permission—you have to handle errors in writing each file anyway, so why check in advance?
But, when you have a user interface—even a command-line interface, where you may read a prefs file long before you get to any writing—it's often much more convenient to the user to return errors as soon as possible. As long as that's the only reason you're doing this check, there's nothing wrong with it.
However, there are many little ways you could improve the way you do the check.
First, you should almost never use a bare except: without specifying anything. Besides the fact that it catches different things in different versions of Python (and therefore also confuses human readers used to other versions), it leaves you no way to distinguish between a directory that isn't writable, a bad Unicode character, or even a typo in the code.
Plus, your error message says "not readable" if it can't write, which is pretty odd.
Second, unless you're sure nobody will ever have a file named .mwcrawler (e.g., because you refuse to transfer files starting with '.' or something), using any fixed name is just asking for trouble. A better solution is to use, e.g., tempfile.mkdtemp.
Also, you should avoid using string manipulation for paths if you want to be portable. That's what os.path (and higher-level utilities) are for—so you don't have to learn or think about Windows, zOS, etc.
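For instance, with the question's dumpdir (the value here is hypothetical):

import os

dumpdir = "/tmp/dumps"                        # hypothetical value
bad = dumpdir + '/.mwcrawler'                 # assumes '/' as the separator
good = os.path.join(dumpdir, '.mwcrawler')    # portable across platforms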
Putting it all together:
import os
import logging
import tempfile

try:
    d = tempfile.mkdtemp(prefix='.mwcrawler', dir=dumpdir)
except Exception as e:
    logging.error('Could not open %s for writing (%s), using default', dumpdir, e)
else:
    os.rmdir(d)
This link describes the usage of os.access, a method specifically created for your needs.
It also explains a better way of approaching rights checking.
As also rightly mentioned in the comments, os.access will have issues in a few specific cases, so to be totally sure the "hit-n-run" approach is actually better: try writing, catch the exception, see what happened, and go from there.
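A hedged sketch of both approaches, assuming dumpdir names the target directory:

import os
import tempfile

dumpdir = "/tmp/dumps"  # hypothetical path

# LBYL: quick, but subject to the caveats above (and to the usual race conditions)
print(os.access(dumpdir, os.W_OK))

# EAFP / "hit-n-run": actually try to create a file there and see what happens
try:
    with tempfile.NamedTemporaryFile(dir=dumpdir):
        pass  # the file is removed automatically when the block exits
except OSError as e:
    print("cannot write to %s: %s" % (dumpdir, e))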
When determining whether or not a file exists, how does using the try statement avoid a "race condition"?
I'm asking because a highly upvoted answer (update: it was deleted) seems to imply that using os.path.exists() creates an opportunity that would not exist otherwise.
The example given is:
try:
    with open(filename): pass
except IOError:
    print('Oh dear.')
But I'm not understanding how that avoids a race condition compared to:
if not os.path.exists(filename):
    print('Oh dear.')
How does calling os.path.exists(filename) allow the attacker to do something with the file that they could not already do?
The race condition is, of course, between your program and some other code that operates on the file (a race condition always requires at least two parallel processes or threads; see this for details). That means using open() instead of exists() may really help only in two situations:
You check for the existence of a file that is created or deleted by some background process (however, if you run inside a web server, that often means there are many copies of your process running in parallel to handle HTTP requests, so for web apps a race condition is possible even if no other programs are involved).
There may be some malicious program running that is trying to crash your code by destroying the file at the moments you expect it to exist.
exists() just performs a single check. If the file exists, it may be deleted a microsecond after exists() returns True; if it is absent, it may be created immediately.
open(), however, does not just test for existence: it also opens the file, and it performs these two actions atomically, so nothing can happen between the check and the opening. Usually a file cannot be deleted while someone holds it open (and even on systems where the name can be unlinked, the open file handle remains valid). That means that inside the with block you can be completely sure the file really exists, since it is open. This is true only inside the with block, and the file may still be deleted immediately after the block exits, but putting the code that needs the file inside the with block guarantees that this code will not fail.
Here's an example of usage:
try:
    with open('filename') as f:
        do_stuff_that_depends_on_the_existence_of_the_file(f)
except IOError as e:
    print('Trouble opening file')
If you are opening the file with any access at all, then the OS will guarantee that the file exists, or else it will fail with an error. If the access is exclusive, any other process in contention for the file will either be blocked by you, or block you.
The try is just a way to detect the error or success of the act of opening the file, since Python's file I/O APIs typically do not have return codes (exceptions are used instead). So, to really answer your question: it's not the try that avoids the race condition, it's the open. It's basically the same in C (in which Python's reference implementation is written), but without exceptions. Read this for more information.
Note that you would probably want to execute code that depends on access to the file inside the try block. Once you close the file, its existence is no longer guaranteed.
Calling os.path.exists merely gives a snapshot at a moment in time when the file may or may not exist, and you have no knowledge of the existence of the file once os.path.exists returns. Malevolent code or unexpected logic may delete or change the file when you are not expecting it. It is akin to turning your head to check that a road is clear before driving into it. Once you turn your head back, you have nothing but a guess about what is going on where you are no longer looking. Holding the file open guarantees an extended consistent state, something not possible (for good or ill) when driving. :)
Your suggestion of checking that a file does not exist rather than using try/open is still insufficient because of the snapshot nature of os.path.exists. Unfortunately I know of no way to prevent files from being created in a directory in all cases, so I think it is best to check for the positive existence of a file, rather than its absence.
I think what you're asking is the particular race condition where:
file is opened
context is switched and the file is deleted
context is switched back and file operations are attempted on the "opened" file
The way you're "protected" in this case is by putting all the file-handling code in a try block: if at any point the file becomes inaccessible or corrupt, your file operations will be able to fail "gracefully" via the except block.
Note, of course, that on modern OSes this can't happen anyway: when a file is "deleted" while handles to it are still open, the underlying data isn't actually removed until all of those handles are released.
So, I'm new to programming and my question is:
Is it considered bad practice to use an exception handler to override the error-raising behaviour of a language's default methods with custom functionality? I mean, is it acceptable to use something like this (Python):
def index(p, val):
    try:
        return p.index(val)
    except ValueError:
        return -1
Maybe I wasn't precise enough. What I meant is: is it a normal or a not-recommended practice to treat thrown exceptions (well, I guess this isn't applicable everywhere) as legitimate, valid case statements?
Like, the idea of the example given above is not to produce a custom error message, but to suppress possible errors without warning either users or other program modules that something is going wrong.
I think that doing something like this is OK as long as you use function names which make it clear that the user isn't using a built-in. If the user thinks they're using a builtin and all of a sudden index returns -1, imagine the bugs that could happen ... They do:
a[index(a,'foo')]
and all of a sudden they get the last element in the list (which isn't foo).
As a very important rule, though: only handle exceptions that you know what to do with. Your example above does this nicely. Kudos.
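To make the naming point concrete, here is a sketch with a clearer, str.find-style name (find_index is a hypothetical name):

def find_index(seq, val):
    # returns -1 when val is absent, like str.find, instead of raising
    try:
        return seq.index(val)
    except ValueError:
        return -1

a = ['bar', 'baz']
print(find_index(a, 'foo'))  # -1, and no one mistakes it for list.index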
This is perfectly fine, but it depends on what kind of condition you are checking. It is the developer's responsibility to check for these conditions. Some exceptions are fatal for the program and some are not; it all depends on the context of the method.
With a language like Python, I would argue it is much better to give a custom error message from the function than the generic ValueError exception.
However, for your own applications, having this functionality inside your methods can make code easier to read and maintain.
For other languages the same is true, but you should try to make sure that you don't mimic another function with different behaviour while hiding the exceptions.
If you know exactly where your errors will occur, and their cause, then there is nothing wrong with this kind of handling: you are just taking appropriate action for something that you know can go wrong.
For example: if you are trying to divide two numbers and you know that the denominator may be 0, in which case you can't divide, then you can use a custom message to denote the problem, as in the sketch below.
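A minimal sketch of that case (the names are made up):

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("cannot divide %r by zero" % a)
        return None

divide(10, 2)   # 5.0
divide(10, 0)   # prints the custom message and returns None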
Suppose I'm creating a class to validate a number, like the "Social Security" number in the US (just as an example of a country-based ID). There are some rules to validate this number, which comes from an input in an HTML form on a website.
I'm thinking about creating a simple class in Python with a public validate method. This validate method simply returns True or False. It will call other small private methods (for example, a different rule for the first 'x' digits), each one returning True or False as well.
Since this is really simple, I'm thinking of using boolean status codes only (whether it's valid or not; I don't need meaningful messages about what is wrong).
I've been reading some articles about using exceptions, and I would like to know your opinion on my situation: would using exceptions be a good idea?
This is a very old question but since the only answer - IMO - is not applicable to Python, here comes my take on it.
Exceptions in Python are something many programmers new to the language have difficulty dealing with. Compared to other languages, Python differs significantly in how exceptions are used: in fact, Python routinely uses exceptions for flow control.
The canonical example is the for loop: you will certainly agree that there is nothing "uniquely bizarre" about a loop exhausting its iterations (indeed, that's what all loops do unless broken)... yet rather than checking in advance whether there are still values to process, Python keeps on reading values from the iterable and, failing that, raises the StopIteration exception, which in turn is caught by the for statement and makes the code exit the loop (see the sketch below).
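Roughly, this is what the for statement does under the hood (a sketch, not the exact implementation):

items = [1, 2]
it = iter(items)
while True:
    try:
        x = next(it)
    except StopIteration:
        break           # the for loop catches this and simply ends
    print(x)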
Furthermore, it is idiomatic in Python to follow EAFP (it's Easier to Ask for Forgiveness than Permission, i.e. try/except) rather than LBYL (Look Before You Leap, i.e. if not A, B or C then ...).
In this regard, csj's answer is correct for C or Java but is irrelevant for Python (whose exceptions are seldom "exceptional" in nature).
Another factor to consider, though, is the scenario in which user data is invalid but you fail to act on the validation function's outcome:
with a return statement, failing to process the False value results in your non-valid data being sent down the pipeline;
contrarily, if you raise an exception, failing to catch it results in the exception propagating up your stack and eventually halting your code.
While the second option might seem scary at first, it is still the right road to take: if data is invalid, there is no sense in passing it further down the line... it will most probably introduce difficult-to-track bugs later on in the flow, and you will also have missed the chance to fix the bug in your code (failing to act on non-valid data).
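A small sketch contrasting the two options (the validation rules are made up):

def validate_ssn(ssn):
    # hypothetical rules; real validation is more involved
    if not (len(ssn) == 9 and ssn.isdigit()):
        raise ValueError("invalid SSN: %r" % ssn)

try:
    validate_ssn("12345678X")
except ValueError as e:
    print(e)  # the failure cannot slip through unnoticed

# whereas a forgotten `if not validate(ssn): ...` check would let the
# bad value continue silently down the pipeline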
Again: using exceptions is the Pythonic way to go (though it does not apply to most other languages), as also stated in this other answer and in the Zen of Python:
Errors should never pass silently.
Unless explicitly silenced.
HTH!
If an input is either valid or not, then just return the boolean. There's nothing exceptional about a validation test encountering an invalid value.
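A minimal sketch of that approach, matching the class described in the question (the rules are hypothetical):

class SSNValidator:
    def validate(self, number):
        return self._is_nine_digits(number) and self._has_valid_prefix(number)

    def _is_nine_digits(self, number):
        return number.isdigit() and len(number) == 9

    def _has_valid_prefix(self, number):
        # real SSN rules are more involved; this is just for illustration
        return not number.startswith('000')

print(SSNValidator().validate('123456789'))  # True
print(SSNValidator().validate('000456789'))  # False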