Should I make my python code less fool-proof to improve readability? - python

I try to make my code fool-proof, but I've noticed that it takes a lot of time to type everything out, and the resulting code takes more time to read.
Instead of:
class TextServer(object):
    def __init__(self, text_values):
        self.text_values = text_values
        # <more code>
    # <more methods>
I tend to write this:
class TextServer(object):
    def __init__(self, text_values):
        for text_value in text_values:
            assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'
            assert 2 <= len(text_value), u'All text_values should be at least two characters long.'
        self.__text_values = frozenset(text_values)  # <They shouldn't change.>
        # <more code>

    @property
    def text_values(self):
        # <'text_values' shouldn't be replaced.>
        return self.__text_values

    # <more methods>
Is my python coding style too paranoid? Or is there a way to improve readability while keeping it fool-proof?
Note 1: I've added the comments between < and > just for clarification.
Note 2: The main fool I try to prevent to abuse my code is my future self.

Here's some good advice on Python idioms from this page:
Catch errors rather than avoiding them to avoid cluttering your code with special cases. This idiom is called EAFP ('easier to ask forgiveness than permission'), as opposed to LBYL ('look before you leap'). This often makes the code more readable. For example:
Worse:
# check whether int conversion will raise an error
if not isinstance(s, str) or not s.isdigit():
    return None
elif len(s) > 10:  # too many digits for int conversion
    return None
else:
    return int(s)
Better:
try:
    return int(s)
except (TypeError, ValueError, OverflowError):  # int conversion failed
    return None
(Note that in this case, the second version is much better, since it correctly handles leading + and -, and also values between 2 and 10 billion (for 32-bit machines). Don't clutter your code by anticipating all the possible failures: just try it and use appropriate exception handling.)

"Is my python coding style too paranoid? Or is there a way to improve readability while keeping it fool-proof?"
Who's the fool you're protecting yourself from?
You? Are you worried that you didn't remember the API you wrote?
A peer? Are you worried that someone in the next cubicle will actively make an effort to pass the wrong things through an API? You can talk to them to resolve this problem. It saves a lot of code if you provide documentation.
A total sociopath who will download your code, refuse to read the API documentation, and then call all the methods with improper arguments? What possible help can you provide to them?
The "fool-proof" coding isn't really very helpful, since all of these scenarios are more easily addressed another way.
If you're fool-proofing against yourself, perhaps that's not really sensible.
If you're fool-proofing for a co-worker or peer, you should -- perhaps -- talk to them and make sure they understand the API docs.
If you're fool-proofing against some hypothetical sociopathic programmer who's out to subvert the API, there's nothing you can do. It's Python. They have the source. Why would they go through the effort to misuse the API when they can just edit the source to break things?

It's unusual in Python to use a private instance attribute, and then expose it through a property as you have. Just use self.text_values.
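For instance, a minimal sketch of the plainer style (keeping the frozenset from the question, but dropping the name mangling and the property):

class TextServer(object):
    def __init__(self, text_values):
        # Plain public attribute. If validation or read-only behaviour is
        # ever genuinely needed, a @property can be retrofitted later
        # without changing any calling code.
        self.text_values = frozenset(text_values)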

Your code is too paranoid (especially when you want to protect only against yourself).
In Python circles, LBYL is generally (but not always) frowned upon. But there's also the (often unstated) assumption that one has (good) unit tests.
Me personally? I think readability is of paramount importance. I mean, if you yourself think it's hard to read, what will others think? And less readable code is also more likely to harbor bugs. Not to mention that working on it becomes harder and more time consuming (you have to dig through all that LBYLing to find out what the code actually does).

If you try to make your code totally foolproof, someone will invent a better fool. Seriously, a good rule of thumb is to guard against likely errors, but don't clutter your code trying to think of every conceivable way a caller could break you.

Instead of spending time on asserts and private variables, I prefer to spend time on documentation and test cases. I prefer to read documentation, and when I have to read code, I prefer to read tests. This becomes more true as the code grows. At the same time, tests give you fool-proof code and useful use cases.

I base the need for error checking code on the consequences of the errors that it's checking for. If crap data gets into my system, how long will it be before I discover it, how hard will it be to determine that the problem was crap data, and how difficult will it be to fix? For cases like the one you've posted, the answers are generally "not long," "not hard," and "not difficult."
But data that's going to be persisted somewhere and then used as the input to a complicated algorithm in six weeks? I'll check the hell out of that.
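A hedged sketch of that distinction, with a hypothetical db.save() standing in for whatever persistence layer is actually in use:

def save_readings(db, readings):
    # Validate aggressively at the persistence boundary, where bad data
    # would otherwise only surface weeks later in some downstream algorithm.
    for r in readings:
        if not isinstance(r, float) or not 0.0 <= r <= 100.0:
            raise ValueError('refusing to persist bad reading: %r' % (r,))
    db.save(readings)  # hypothetical storage API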

I don't think this is specific to Python. I'm a firm believer in Design by Contract: ideally, all functions should have clear pre- and post-conditions; unfortunately, most languages (Eiffel being the canonical exception) don't provide particularly convenient ways to achieve this, which contributes to the apparent conflict between clarity and correctness.
As a practical matter, one approach is to write a 'checkValues' method so as to avoid cluttering __init__. You can even compress it to:
def __init__(self, text_values):
    self.text_values = checkValues(text_values)

def checkValues(text_values):
    for text_value in text_values:
        assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'
        assert 2 <= len(text_value), u'All text_values should be at least two characters long.'
    return frozenset(text_values)
Another approach would be using a folding text editor that can hide/show the pre-conditions with the aid of some commenting conventions; this would also be useful for auto-generating documentation.

Take all the energy you're putting into argument checking and channel it instead into writing clear, concise doc-strings.
Unless you're writing code for nuclear reactors, in which case I would appreciate you doing both.

Another way to think about it: you only need to catch errors that you can correct. If you're checking the input just to abort with an AssertionError, you're better off allowing the code to raise the appropriate exception later so you can debug correctly.
This line in particular is pretty bad since it stops duck-typing:
assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'
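As a concrete illustration (a sketch assuming the TextServer class from the question is in scope), a string-like wrapper would work fine everywhere the values are actually used, yet the isinstance assert rejects it:

from UserString import UserString  # collections.UserString in Python 3

class Label(UserString):
    """A value that quacks like a string everywhere it is used."""

server = TextServer([Label("hello")])  # AssertionError, although Label would have worked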

Related

Recipe for anonymous functions in python?

I'm looking for the best recipe to allow inline definition of functions, or multi-line lambdas, in Python.
For example, I'd like to do the following:
def callfunc(func):
    func("Hello")

>>> callfunc(define('x', '''
... print x, "World!"
... '''))
Hello World!
I've found an example for the define function in this answer:
def define(arglist, body):
    g = {}
    exec("def anonfunc({0}):\n{1}".format(
        arglist,
        "\n".join("    {0}".format(line) for line in body.splitlines())), g)
    return g["anonfunc"]
This is one possible solution, but it is not ideal. Desirable features would be:
be smarter about indentation,
hide the innards better (e.g. don't have anonfunc in the function's scope)
provide access to variables in the surrounding scope / captures
better error handling
and some things I haven't thought of. I had a really nice implementation once that did most of the above, but I lost it, unfortunately. I'm wondering if someone else has made something similar.
Disclaimer:
I'm well aware this is controversial among Python users, and regarded as a hack or unpythonic. I'm also aware of the discussions regarding multi-line lambdas on the python-dev mailing list, and that a similar feature was omitted on purpose.
I'm not asking whether this is a good idea or not, but instead: Given that one has decided to implement this, (either out of fun and curiosity, madness, genuinely thinking this is a nice idea, or being held at gunpoint) how to make anonymous define work as close as possible to def using python's (2.7 or 3.x) current facilities?
Examples:
A bit more as to why: this can be really handy for callbacks in GUIs:
# gtk example:
self.ntimes = 0
button.connect('clicked', define('*a', '''
self.ntimes += 1
label.set_text("Button has been clicked %d times" % self.ntimes)
'''))
The benefit over defining a function with def is that your code is in a more logical order. This is simplified code taken from a Twisted application:
# twisted example:
def sayHello(self):
    d = self.callRemote(HelloCommand)
    def handle_response(response):
        # do something, this happens after (x)!
        pass
    d.addCallback(handle_response)  # (x)
Note how it seems out of order. I usually break stuff like this up, to keep the code order == execution order:
def sayHello_d(self):
    d = self.callRemote(HelloCommand)
    d.addCallback(self._sayHello_2)
    return d

def _sayHello_2(self, response):
    # handle response
    pass
This is better wrt. ordering but more verbose. Now, with the anonymous functions trick:
d = self.callRemote(HelloCommand)
d.addCallback(define('response', '''
print "callback"
print "got response from", response["name"]
'''))
If you come from a JavaScript or Ruby background, Python's abilities to deal with anonymous functions may indeed seem limited, but this is for a reason: the Python designers decided that clarity of code is more important than conciseness. If you don't like that, you probably don't like Python at all. There's nothing wrong with that; there are many other choices, so why not try a language that tastes better to you?
Putting chunks of code into strings and interpreting them on the fly is definitely the wrong way to "extend" a language, because none of the tools you're working with - from syntax highlighters to the Python interpreter itself - would be able to deal with "stringified" code in a sensible way.
To answer the question as asked: what you're doing there is essentially an attempt to construct some better-than-Python programming language and compile it to Python on the fly. The idea is not new in the world of scripting languages and can be productive or not (CoffeeScript is an example of a successful implementation), but your very approach is wrong. format() is not the tool you're looking for when working with code. If you're writing a compiler, do it properly: use a parser (e.g. pyparsing) to read your code into an AST, walk through the AST to generate Python code (or even bytecode), catch syntax errors as you go, and take measures to provide better runtime feedback (e.g. error context, line numbers etc). Finally, make sure your compiler works across different Python versions and implementations.
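As a minimal illustration of the "catch syntax errors as you go" advice (a sketch only, not a full compiler): parsing the generated source with the ast module surfaces syntax errors immediately, with usable positions, instead of letting them explode somewhere inside exec():

import ast

def define(arglist, body):
    src = "def _anonfunc(%s):\n%s" % (
        arglist,
        "\n".join("    " + line for line in body.splitlines()) or "    pass")
    # Parse first: a bad body fails here with a SyntaxError carrying
    # line/offset information, before anything is executed.
    tree = ast.parse(src, filename="<define>")
    namespace = {}
    exec(compile(tree, "<define>", "exec"), namespace)
    return namespace["_anonfunc"]

greet = define('name', '''
print("Hello, %s!" % name)
''')
greet("World")  # prints: Hello, World!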
Or just use ruby.

What is your strategy to avoid dynamic typing errors in Python (NoneType has no attribute x)?

I'm not sure if I like Python's dynamic-ness. It often results in me forgetting to check a type, trying to call an attribute, and getting the "NoneType (or any other) has no attribute x" error. A lot of them are pretty harmless, but if not handled correctly they can bring down your entire app/process/etc.
Over time I got better at predicting where these could pop up and adding explicit type checking, but because I'm only human I miss one occasionally, and then some end-user finds it.
So I'm interested in your strategy to avoid these. Do you use type-checking decorators? Maybe special object wrappers?
Please share...
forgetting to check a type
This doesn't make much sense. You so rarely need to "check" a type. You simply run unit tests and if you've provided the wrong type object, things fail. You never need to "check" much, in my experience.
trying to call an attribute and getting the NoneType (or any other) has no attribute x error.
Unexpected None is a plain-old bug. 80% of the time, I omitted the return. Unit tests always reveal these.
Of those that remain, 80% of the time, they're plain old bugs due to an "early exit" which returns None because someone wrote an incomplete return statement. These if foo: return structures are easy to detect with unit tests. In some cases, they should have been if foo: return somethingMeaningful, and in still other cases, they should have been if foo: raise Exception("Foo").
The rest are dumb mistakes from misreading the APIs. Generally, mutator functions don't return anything. Sometimes I forget. Unit tests find these quickly, since basically nothing works right.
That covers the "unexpected None" cases pretty solidly. Easy to unit test for. Most of the mistakes involve fairly trivial-to-write tests for some pretty obvious species of mistakes: wrong return; failure to raise an exception.
Other "has no attribute X" errors are really wild mistakes where a totally wrong type was used. That's either really wrong assignment statements or really wrong function (or method) calls. They always fail elaborately during unit testing, requiring very little effort to fix.
A lot of them are pretty harmless but if not handled correctly they can bring down your entire app/process/etc.
Um... Harmless? If it's a bug, I pray that it brings down my entire app as quickly as possible so I can find it. A bug that doesn't crash my app is the most horrible situation imaginable. "Harmless" isn't a word I'd use for a bug that fails to crash my app.
If you write good unit tests for all of your code, you should find the errors very quickly when testing code.
You can also use decorators to enforce the types of arguments and return values.
>>> @accepts(int, int, int)
... @returns(float)
... def average(x, y, z):
...     return (x + y + z) / 2
...
>>> average(5.5, 10, 15.0)
TypeWarning: 'average' method accepts (int, int, int), but was given
(float, int, float)
15.25
>>> average(5, 10, 15)
TypeWarning: 'average' method returns (float), but result is (int)
15
I'm not really a fan of them, but I can see their usefulness.
One tool that can help you keep your pieces fitting together well is interfaces. zope.interface is the most notable package in the Python world for using interfaces. Check out http://wiki.zope.org/zope3/WhatAreInterfaces and http://glyph.twistedmatrix.com/2009/02/explaining-why-interfaces-are-great.html to start to get an idea how interfaces and z.i in particular work. Interfaces can prove very useful in large Python codebases.
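A small sketch of the idea, using the declarative zope.interface spelling (the IGreeter/Greeter names are made up for illustration):

from zope.interface import Interface, implementer
from zope.interface.verify import verifyObject

class IGreeter(Interface):
    def greet(name):
        """Return a greeting for name."""

@implementer(IGreeter)
class Greeter(object):
    def greet(self, name):
        return "Hello, %s" % name

# Raises an error if Greeter does not actually satisfy the contract.
verifyObject(IGreeter, Greeter())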
Interfaces are no substitute for testing. Reasonably comprehensive testing is especially important in highly dynamic languages like Python, where there are types of bugs that could not exist in a statically typed language. Tests will also help you catch the sorts of bugs that are not unique to dynamic languages. Fortunately, developing in Python means that testing is easy (due to the flexibility), and you have plenty of time to write tests with all the time you saved by using Python.
One advantage of TDD is that you end up writing code that is easier to write tests for.
Writing code first and then the tests can result in code that superficially works the same, but is much harder to write 100% coverage tests for.
Each case is likely to be different
It might make sense to have a decorator to check whether a particular parameter is None (or some other unexpected value) if you use it in a bunch of places.
Maybe it is appropriate to use the Null pattern - if the code is blowing up because you are setting the initial value to None, you could instead set the initial value to a null version of the object.
More and more wrappers can add up to quite a performance hit, though, so it's always better to write code from the start that avoids the corner cases.
forgetting to check a type
With duck typing, it shouldn't be necessary to check a type. But that's theory; in reality you will often want to validate input parameters (e.g. checking a UUID with a regex). For that purpose, I created some handy decorators for simple type and return-type checking, which are called like this:
@decorators.params(0, int, 2, str)    # first parameter must be an integer / third a string
@decorators.returnsOrNone(int, long)  # must return an int/long value or None
def doSomething(integerParam, noMatterWhatParam, stringParam):
    ...
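The answer doesn't show the decorator bodies; purely as a guess at what such a params decorator might look like (the name and behaviour are assumptions, not the author's actual code):

from functools import wraps

def params(*pairs):
    """Check (argument index, allowed type(s)) pairs on every call.
    Keyword arguments are not checked in this sketch."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for i in range(0, len(pairs), 2):
                index, types = pairs[i], pairs[i + 1]
                if not isinstance(args[index], types):
                    raise TypeError("argument %d of %s() must be %r, got %r"
                                    % (index, func.__name__, types,
                                       type(args[index])))
            return func(*args, **kwargs)
        return wrapper
    return decorator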
For everything else I mostly use assertions. Of course one often forgets to check a parameter, so it's necessary to test and to test often.
trying to call an attribute
Happens to me very seldom. Actually I often use methods instead of direct access to attributes (the "good" old getter/setter approach sometimes).
because I'm only human I miss one occasionally and then some end-user finds it
"Software is always completed at the customers'." - An anti-pattern which you should solve with unit tests that handle all possible cases in a function. Easier said than done, but it helps...
As for other common Python mistakes (mistyped names, wrong imports, ...), I'm using Eclipse with PyDev for projects (not for small scripts). PyDev warns you about most of the simple kinds of mistakes.
I haven’t done a lot of Python programming, but I’ve done no programming at all in statically typed languages, so I don’t tend to think about things in terms of variable types. That might explain why I haven’t come across this problem much. (Although the small amount of Python programming I’ve done might explain that too.)
I do enjoy Python 3’s revised handling of strings (i.e. all strings are unicode, everything else is just a stream of bytes), because in Python 2 you might not notice TypeErrors until dealing with unusual real world string values.
You can hint your IDE via function docstrings, for example: http://www.pydev.org/manual_adv_type_hints.html; in JavaScript, JSDoc helps in a similar way.
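For example, something along the lines of the Sphinx-style convention described on that page (check the link for the exact formats PyDev accepts; add_tags is a made-up example):

def add_tags(item, tags):
    """Attach tags to an item.

    :type item: dict
    :type tags: list
    :rtype: dict
    """
    item['tags'] = list(tags)
    return item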
But at some point you will face errors that a typed language would avoid immediately without unit tests (via the IDE compilation and the types/inference).
Of course this does not remove the benefit of unit tests, static analysis and assertions. For larger project I tend to use statically typed languages because they have very good IDE support (excellent autocompletion, heavy refactoring...). You can still use scripting or a DSL for some sub part of the project.
Something you can use to simplify your code is using the Null Object Design Pattern (to which I was introduced in Python Cookbook).
Roughly, the goal with Null objects is to provide an 'intelligent' replacement for the often used primitive data type None in Python or Null (or Null pointers) in other languages. These are used for many purposes, including the important case where one member of some group of otherwise similar elements is special for whatever reason. Most often this results in conditional statements to distinguish between ordinary elements and the primitive Null value.
This object just swallows the missing-attribute error, so you can avoid checking whether attributes exist.
It's nothing more than
class Null(object):
    def __init__(self, *args, **kwargs):
        "Ignore parameters."
        return None

    def __call__(self, *args, **kwargs):
        "Ignore method calls."
        return self

    def __getattr__(self, mname):
        "Ignore attribute requests."
        return self

    def __setattr__(self, name, value):
        "Ignore attribute setting."
        return self

    def __delattr__(self, name):
        "Ignore deleting attributes."
        return self

    def __repr__(self):
        "Return a string representation."
        return "<Null>"

    def __str__(self):
        "Convert to a string and return it."
        return "Null"
With this, if you do Null("any", "params", "you", "want").attribute_that_doesnt_exist(), it won't explode, but just silently become the equivalent of pass.
Normally you'd do something like
if obj.attr:
    obj.attr()
With this, you just do:
obj.attr()
and forget about it. Beware that extensive use of the Null object can potentially hide bugs in your code.
I tend to use
if x is None:
    raise ValueError('x cannot be None')
But this will only work with the actual None value.
A more general approach is to test for the necessary attributes before you try to use them. For example:
def write_data(f):
    # Here we expect f is a file-like object. But what if it's not?
    if not hasattr(f, 'write'):
        raise ValueError('write_data requires a file-like object')
    # Now we can do stuff with f that assumes it is a file-like object
The point of this code is that instead of getting an error message like "NoneType has no attribute write", you get "write_data requires a file-like object". The actual bug isn't in write_data(), and isn't really a problem with NoneType at all. The actual bug is in the code that calls write_data(). The key is to communicate that information as directly as possible.

Is it better to use an exception or a return code in Python?

You may know this recommendation from Microsoft about the use of exceptions in .NET:
Performance Considerations
...
Throw exceptions only for extraordinary conditions, ...
In addition, do not throw an exception when a return code is sufficient...
(See the whole text at http://msdn.microsoft.com/en-us/library/system.exception.aspx.)
As a point of comparison, would you recommend the same for Python code?
The pythonic thing to do is to raise and handle exceptions. The excellent book "Python in a nutshell" discusses this in 'Error-Checking Strategies' in Chapter 6.
The book discusses EAFP ("it's easier to ask forgiveness than permission") vs. LBYL ("look before you leap").
So to answer your question:
No, I would not recommend the same for python code. I suggest you read chapter 6 of Python in a nutshell.
The best way to understand exceptions is "if your method can't do what its name says it does, throw." My personal opinion is that this advice should be applied equally to both .NET and Python.
The key difference is where you have methods that frequently can't do what their name says they should do, for instance, parsing strings as integers or retrieving a record from a database. The C# style is to avoid an exception being thrown in the first place:
int i;
if (Int32.TryParse(myString, out i)) {
    doWhatever(i);
}
else {
    doWhatever(0);
}
whereas Python is much more at ease with this kind of thing:
try:
    i = int(myString)
except ValueError:
    i = 0
doWhatever(i)
Usually, Python is geared towards expressiveness.
I would apply the same principle here: usually, you expect a function to return a result (in line with its name!) and not an error code.
For this reason, it is usually better raising an exception than returning an error code.
However, what is stated in the MSDN article applies to Python as well, and it's not really connected to returning an error code instead of an exception.
In many cases, you can see exception handling used for normal flow control, and for handling expected situations. In certain environments, this has a huge impact on performance; in all environments it has a big impact on program expressiveness and maintainability.
Exceptions are for exceptional situations that are outside of normal program flow; if you expect something will happen, then you should handle it directly, and raise anything that you cannot expect or handle.
Of course, this is not a recipe, but only a heuristic; the final decision is always up to the developer and the context, and cannot be stated in a fixed set of guidelines - and this is much truer for exception handling.
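A small sketch of that heuristic (load_config is a made-up example): the expected situation is handled directly, while anything genuinely unexpected propagates as an exception:

import json
import os

def load_config(path):
    # Expected on a first run: no config file yet. Handle it directly.
    if not os.path.exists(path):
        return {}
    # Unexpected problems (unreadable file, corrupt JSON) propagate as
    # exceptions to a caller that is in a position to deal with them.
    with open(path) as f:
        return json.load(f)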
In Python, exceptions are not very expensive like they are in some other languages, so I wouldn't recommend trying to avoid exceptions. But if you do throw an exception, you would usually want to catch it somewhere in your code - the exception being when a fatal error occurs.
I think whether to return an error code or throw an exception is something very valid to think about, and a cross-linguistic comparison may be helpful and informative. I guess the very generalized answer to this concern is simply this: the set of legal return values for any function should be made as small as possible, and as large as necessary.
Generally, this will mean that if a given method returns an integer number in a single test case, users can rightfully expect the method to always return an integer number or throw an exception. But, of course, the conceptually simplest way is not always the best way to handle things.
The return-value-of-least-surprise is usually None; and if you look into it, you’ll see that it’s the very semantics of None that license its usage across the board: it is a singleton, immutable value that, in a lot of cases, evaluates to False or prohibits further computation - no concatenation, no arithmetic. So if you choose to write a frob(x) method that returns a number for a string input, and None for non-numeric strings and any other input, and you use that inside an expression like a=42+frob('foo'), you still get an exception very close to the point where bogus things happened. Of course, if you stuff frob('foo') into a database column that has not been defined with NOT NULL, you might run into problems perhaps months later. This may or may not be justifiable.
So in most cases where you e.g. want to derive a number from a string, using something like a bare float(x) or int(x) is the way to go, as these built-ins will raise an exception when not given a digestible input. If that doesn’t suit your use case, consider returning None from a custom method; basically, this return value tells consumers: ‘Sorry, I was unable to understand your input.’ But you only want to do this if you positively know that going on in your program does make sense from that point onwards.
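A sketch of the frob() example from the text, under the stated convention (None means "I could not understand the input"):

def frob(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return None  # 'Sorry, I was unable to understand your input.'

a = 42 + frob('3.5')  # fine: 45.5
b = 42 + frob('foo')  # TypeError raised right here, close to the bogus call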
You see, I just found out how to turn each notice, warning, and error message into a potentially show-stopping exception in, uhm, PHP. It just drives me crazy that a typo in a variable name generates, in the standard PHP configuration, nothing but a notice to the user. This is so bad. The program just goes on doing things with program code that does not make sense at all! I can’t believe people find this a feature.
Likewise, one should view it like this: if, at any given point in time, it can be asserted with reasonable costs that the execution of a piece of code no longer makes sense - since values are missing, are out of bounds, or are of an unexpected type, or when resources like a database connection have gone down - it is imperative, to minimize debugging headaches, to break execution and hand control up to any level in the code which feels entitled to handle the mishap.
Experience shows that refraining from early action and allowing bogus values to creep into your data is good for nothing but making your code harder to debug. So are many examples of over-zealous type-casting: allowing integers to be added to floats is reasonable. To allow a string with nothing but digits to be added to a number is a bogus practice that is likely to create strange, unlocalized errors that may pop up on any given line that happens to be processing that data.
I did a simple experiment to compare the performance of raising exceptions with the following code:
from functools import wraps
from time import time

def timed(foo):
    @wraps(foo)
    def bar(*a, **kw):
        s = time()
        foo(*a, **kw)
        e = time()
        print '%f sec' % (e - s)
    return bar

class SomeException(Exception):
    pass

def somefunc(_raise=False):
    if _raise:
        raise SomeException()
    else:
        return

@timed
def test1(_reps):
    for i in xrange(_reps):
        try:
            somefunc(True)
        except SomeException:
            pass

@timed
def test2(_reps):
    for i in xrange(_reps):
        somefunc(False)

def main():
    test1(1000000)
    test2(1000000)

if __name__ == '__main__':
    main()
With the following results:
Raising exceptions: 3.142000 sec
Using return: 0.383000 sec
Exceptions are about 8 times slower than using return.

"else" considered harmful in Python?

In an answer (by S.Lott) to a question about Python's try...else statement:
Actually, even on an if-statement, the else: can be abused in truly terrible ways creating bugs that are very hard to find. [...]
Think twice about else:. It is generally a problem. Avoid it except in an if-statement, and even then consider documenting the else-condition to make it explicit.
Is this a widely held opinion? Is else considered harmful?
Of course you can write confusing code with it, but that's true of any other language construct. Even Python's for...else seems to me a very handy thing to have (less so for try...else).
S.Lott has obviously seen some bad code out there. Haven't we all? I do not consider else harmful, though I've seen it used to write bad code. In those cases, all the surrounding code has been bad as well, so why blame poor else?
No, it is not harmful; it is necessary.
There should always be a catch-all statement. All switches should have a default. All pattern matching in an ML language should have a default.
The argument that it is impossible to reason about what is true after a series of if statements is a fact of life. The computer is the biggest finite state machine out there, and it is silly to enumerate every single possibility in every situation.
If you are really afraid that unknown errors go unnoticed in else statements, is it really that hard to raise an exception there?
Saying that else is considered harmful is a bit like saying that variables or classes are harmful. Heck, it's even like saying that goto is harmful. Sure, things can be misused. But at some point, you just have to trust programmers to be adults and be smart enough not to.
What it comes down to is this: if you're willing to not use something because an answer on SO or a blog post or even a famous paper by Dijkstra told you not to, you need to consider if programming is the right profession for you.
I wouldn't say it is harmful, but there are times when the else statement can get you into trouble. For instance, if you need to do some processing based on an input value and there are only two valid input values. Only checking for one could introduce a bug.
For example, suppose the only valid inputs are 1 and 2:
if (input == 1)
{
    // do processing
    ...
}
else
{
    // do processing
    ...
}
In this case, using the else would allow all values other than 1 to be processed when it should only be for values 1 and 2.
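A sketch of the safer shape (process_one and process_two are placeholders): check both valid values explicitly and turn the else into a guard:

def handle(input_value):
    if input_value == 1:
        process_one()    # placeholder
    elif input_value == 2:
        process_two()    # placeholder
    else:
        raise ValueError("input must be 1 or 2, got %r" % (input_value,))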
To me, the whole concept of certain popular language constructs being inherently bad is just plain wrong. Even goto has its place. I've seen very readable, maintainable code by the likes of Walter Bright and Linus Torvalds that uses it. It's much better to just teach programmers that readability counts and to use common sense than to arbitrarily declare certain constructs "harmful".
If you write:
if foo:
    # ...
elif bar:
    # ...
# ...
then the reader may be left wondering: what if neither foo nor bar is true? Perhaps you know, from your understanding of the code, that either foo or bar must be true. I would prefer to see:
if foo:
    # ...
else:
    # at this point, we know that bar is true.
    # ...
# ...
or:
if foo:
    # ...
else:
    assert bar
    # ...
# ...
This makes it clear to the reader how you expect control to flow, without requiring the reader to have intimate knowledge of where foo and bar come from.
(in the original case, you could still write a comment explaining what is happening, but I think I would then wonder: "Why not just use an else: clause?")
I think the point is not that you shouldn't use else:; rather, that an else: clause can allow you to write unclear code and you should try to recognise when this happens and add a little comment to help out any readers.
Which is true about most things in programming languages, really :-)
Au contraire... In my opinion, there MUST be an else for every if. Granted, you can do stupid things, but you can abuse any construct if you try hard enough. You know the saying: "a real programmer can write FORTRAN in any language".
What I do a lot of the time is write the else part as a comment, describing why there's nothing to be done.
Else is most useful when documenting assumptions about the code. It ensures that you have thought through both sides of an if statement.
Always using an else clause with each if statement is even a recommended practice in "Code Complete".
The rationale behind including the else statement (of try...else) in Python in the first place was to only catch the exceptions you really want to. Normally when you have a try...except block, there's some code that might raise an exception, and then there's some more code that should only run if the previous code was successful. Without an else block, you'd have to put all that code in the try block:
try:
    something_that_might_raise_error()
    do_this_only_if_that_was_ok()
except ValueError:
    pass  # whatever
The issue is, what if do_this_only_if_that_was_ok() raises a ValueError? It would get caught by the except statement, when you might not have wanted it to. That's the purpose of the else block:
try:
    something_that_might_raise_error()
except ValueError:
    pass  # whatever
else:
    do_this_only_if_that_was_ok()
I guess it's a matter of opinion to some extent, but I personally think this is a great idea, even though I use it very rarely. When I do use it, it just feels very appropriate (and besides, I think it helps clarify the code flow a bit)
Seems to me that, for any language and any flow-control statement where there is a default scenario or side effect, that scenario needs the same level of consideration. The logic in if or switch or while is only as good as the condition in if(x), while(x), or for(...). Therefore the statement is not harmful, but the logic in its condition can be.
Therefore, as developers, it is our responsibility to code with the wide scope of the else in mind. Too many developers treat it as 'if not the above', when in fact it can ignore all common sense, because the only logic in it is the negation of the preceding logic, which is often incomplete (an algorithm design error in itself).
I don't then consider 'else' any more harmful than off-by-ones in a for() loop or bad memory management. It's all about the algorithms. If your automaton is complete in its scope and possible branches, and all are concrete and understood, then there is no danger. The danger is misuse of the logic behind the expressions by people not realizing the impact of wide-scope logic. Computers are stupid; they do what they are told by their operator (in theory).
I do consider try and catch to be dangerous, because it can delegate handling to an unknown quantity of code. Branching above the raise may contain a bug, highlighted by the raise itself. This can be non-obvious. It is like turning a sequential set of instructions into a tree or graph of error handling, where each component is dependent on the branches in the parent. Odd. Mind you, I love C.
There is a so-called "dangling else" problem, encountered in C-family languages, as follows:
if (a==4)
    if (b==2)
        printf("here!");
else
    printf("which one");
This innocent code can be understood in two ways:
if (a==4)
    if (b==2)
        printf("here!");
    else
        printf("which one");
or
if (a==4)
    if (b==2)
        printf("here!");
else
    printf("which one");
The problem is that the "else" is "dangling", one can confuse the owner of the else. Of course the compiler will not make this confusion, but it is valid for mortals.
Thanks to Python's mandatory indentation, we cannot have a dangling else problem in Python, since we have to write either
if a==4:
    if b==2:
        print "here!"
    else:
        print "which one"
or
if a==4:
    if b==2:
        print "here!"
else:
    print "which one"
so the human eye catches it. And, nope, I do not think "else" is harmful; it is as harmful as "if".
In the example posited as being hard to reason about, the conditions can be written explicitly, but the else is still necessary.
E.g.
if a < 10:
    pass  # condition stated explicitly
elif a > 10 and b < 10:
    pass  # condition confusing but at least explicit
else:
    # Exactly what is true here?
    # Can be hard to reason out what condition is true
    pass
Can be written
if a < 10:
    pass  # condition stated explicitly
elif a > 10 and b < 10:
    pass  # condition confusing but at least explicit
elif a > 10 and b >= 10:
    pass  # else condition
else:
    pass  # Handle edge case with error?
I think the point with respect to try...except...else is that it is an easy mistake to use it to create inconsistent state rather than fix it. It is not that it should be avoided at all costs, but it can be counter-productive.
Consider:
try:
    file = open('somefile', 'r')
except IOError:
    logger.error("File not found!")
else:
    # Some file operations
    file.close()
# Some code that no longer explicitly references 'file'
It would be real nice to say that the above block prevented code from trying to access a file that didn't exist, or a directory for which the user has no permissions, and to say that everything is encapsulated because it is within a try...except...else block. But in reality, a lot of code in the above form really should look like this:
try:
    file = open('somefile', 'r')
except IOError:
    logger.error("File not found!")
    return False
# Some file operations
file.close()
# Some code that no longer explicitly references 'file'
You are often fooling yourself by saying that because file is no longer referenced in scope, it's okay to go on coding after the block, but in many cases something will come up where it just isn't okay. Or maybe a variable will later be created within the else block that isn't created in the except block.
This is how I would differentiate the if...else from try...except...else. In both cases, one must make the blocks parallel in most cases (variables and state set in one ought to be set in the other) but in the latter, coders often don't, likely because it's impossible or irrelevant. In such cases, it often will make a whole lot more sense to return to the caller than to try and keep working around what you think you will have in the best case scenario.

How much input validation should I be doing on my python functions/methods?

I'm interested in how much up front validation people do in the Python they write.
Here are a few examples of simple functions:
def factorial(num):
    """Computes the factorial of num."""

def isPalindrome(inputStr):
    """Tests to see if inputStr is the same backwards and forwards."""

def sum(nums):
    """Same as the built-in sum()... computes the sum of all the numbers passed in."""
How thoroughly do you check the input values before beginning computation, and how do you do your checking? Do you throw some kind of proprietary exception if input is faulty (BadInputException defined in the same module, for example)? Do you just start your calculation and figure it will throw an exception at some point if bad data was passed in ("asd" to factorial, for example)?
When the passed in value is supposed to be a container do you check not only the container but all the values inside it?
What about situations like factorial, where what's passed in might be convertible to an int (e.g. a float) but you might lose precision when doing so?
I assert what's absolutely essential.
Important: What's absolutely essential. Some people over-test things.
def factorial(num):
    assert int(num)
    assert num > 0
Isn't completely correct. long is also a legal possibility.
def factorial(num):
    assert type(num) in (int, long)
    assert num > 0
Is better, but still not perfect. Many Python types (like rational numbers, or number-like objects) can also work in a good factorial function. It's hard to assert that an object has basic integer-like properties without being too specific and eliminating future unthought-of classes from consideration.
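One hedged way to loosen the check without enumerating concrete classes is the numbers ABCs from PEP 3141 (available since Python 2.6); this is an addition to the answer, not something it proposed:

import numbers

def factorial(num):
    # Accepts int, long, bool and any third-party type registered as
    # Integral, without naming the classes one by one.
    assert isinstance(num, numbers.Integral), 'num must be an integer type'
    assert num > 0, 'num must be positive'
    result = 1
    for i in range(2, num + 1):
        result *= i
    return result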
I never define unique exceptions for individual functions. I define a unique exception for a significant module or package. Usually, however, just an Error class or something similar. That way the application says except somelibrary.Error,e: which is about all you need to know. Fine-grained exceptions get fussy and silly.
I've never done this, but I can see places where it might be necessary.
assert all(type(i) in (int, long) for i in someList)
Generally, however, the ordinary Python built-in type checks work fine. They find almost all of the exceptional situations that matter almost all the time. When something isn't the right type, Python raises a TypeError that always points at the right line of code.
BTW. I only add asserts at design time if I'm absolutely certain the function will be abused. I sometimes add assertions later when I have a unit test that fails in an obscure way.
For calculations like sum, factorial, etc., Python's built-in type checks will do fine. The calculations will end up calling __add__, __mul__, etc. for the types, and if they break, they will throw the correct exception anyway. By enforcing your own checks, you may invalidate otherwise working input.
I try to write docstrings stating what type of parameter is expected and accepted, and I don't check it explicitly in my functions.
If someone wants to use my function with any other type, it's his responsibility to check whether his type emulates the one I accept well enough. Maybe your factorial can be used with some custom long-like type to obtain something you wouldn't think of? Or maybe your sum can be used to concatenate strings? Why should you disallow it by type checking? It's not C, anyway.
I basically try to convert the variable to what it should be and pass up or throw the appropriate exception if that fails.
def factorial(num):
    """Computes the factorial of num."""
    try:
        num = int(num)
    except ValueError, e:
        print e
    else:
        ...
It rather depends on what I'm writing, and how the output gets there. Python doesn't have the public/private protections of other OO-languages. Instead there are conventions. For example, external code should only call object methods that are not prefixed by an underscore.
Therefore, if I'm writing a module, I'd validate anything that is not generated from my own code, i.e. any calls to publicly-accessible methods/functions. Sometimes, if I know the validation is expensive, I make it togglable with a kwarg:
def publicly_accessible_function(arg1, validate=False):
    if validate:
        do_validation(arg1)
    do_work()
Internal methods can do validation via the assert statement, which can be disabled altogether when the code goes out of development and into production.
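To illustrate that last point (a sketch; _merge_settings is a made-up internal helper): asserts run normally in development but are stripped entirely when the interpreter is started with python -O:

def _merge_settings(defaults, overrides):
    # These asserts guard development runs; under "python -O" they are
    # compiled away and cost nothing in production.
    assert isinstance(defaults, dict), 'defaults must be a dict'
    assert isinstance(overrides, dict), 'overrides must be a dict'
    merged = dict(defaults)
    merged.update(overrides)
    return merged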
I almost never enforce any kind of a check, unless I think there's a possibility that someone might think they can pass some X which would produce completely crazy results.
The other time I check is when I accept several types for an argument; for example, a function that takes a list might accept an arbitrary object and just wrap it in a list (if it's not already a list). So in that case I check the type - not to enforce anything - just because I want the function to be flexible in how it's used.
Only bother to check if you have a failing unit-test that forces you to.
Also consider "EAFP"... It's the Python way!
A bit of perspective on how another language handles it might add some value. For Perl, I remember using this module - http://search.cpan.org/dist/Params-Validate/ - which offloads a lot of parameter validation from the developer. I was searching for something similar in Python and came across this: http://www.voidspace.org.uk/python/validate.html. I haven't tried it out, but I guess aiming for a standard way of validating params across the entire codebase sets parameter-validation expectations up front across the entire team.
