Python: Preventing "if" staircases? - python

Whenever I'm coding something that requires a lot of conditionals, I end up doing this:
if foo:
    if bar:
        if foobar:
            if barfoo:
                if foobarfoo:
                    if barfoobar:
                        # And forever and ever and ever
I can't write if foo and bar and foobar and ... because I check the values of list elements (if foo[1] == 'bar') inside of an if somewhere down the line, and if the list index doesn't exist, I get an error.
Is there a shortcut to conditionally checking things like this, or an alternative method? Thanks.

I can't write if foo and bar and foobar and ... because I call list elements inside of an if somewhere down the line, and if the list index doesn't exist, I get an error.
In Python, "and" short-circuits: if the left side of the expression is false, the right side is not evaluated at all.
foo = dict()
if 'bar' in foo and foo['bar']:
    doSomething()
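For the list case from the question, the same short-circuit works with a length check in front of the index. A minimal sketch, assuming foo is a list; has_bar_at_1 is a hypothetical helper name, not something from the question:

```python
def has_bar_at_1(foo):
    # len(foo) > 1 is evaluated first; foo[1] is only touched when it exists.
    return len(foo) > 1 and foo[1] == 'bar'


print(has_bar_at_1([]))             # False, and no IndexError is raised
print(has_bar_at_1(['x', 'bar']))   # True
```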

Fail fast:
if not foo:
    return
if not foobar:
    return
and so forth.
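Spelled out as a complete function, the fail-fast style might look like the sketch below. The function name, the expected shape of record and the returned strings are all assumptions made for illustration:

```python
def describe(record):
    # Hypothetical example: record is expected to look like ['name', 'bar'].
    if not record:
        return 'empty'
    if len(record) < 2:
        return 'too short'
    if record[1] != 'bar':
        return 'no bar'
    # Every guard passed, and the main logic sits at one indentation level.
    return 'ok'
```

Each guard can rely on the ones before it, so record[1] is never reached for a list that is too short.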

There's also
if all((foo, bar, foobar, barfoo, foobarfoo, barfoobar)):
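    print "oh yeah"
all also short-circuits at the first false item, but the tuple passed to it is built first, so every expression in it is still evaluated before all runs.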
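If the later conditions must not be evaluated at all (the foo[1] case from the question), one option is to hand all a generator of deferred checks instead of a tuple of already-computed values. This is only a sketch; the function name and the three checks are made up for the example:

```python
def second_item_is_bar(foo):
    # Each check is wrapped in a lambda so nothing runs until all() asks for it.
    checks = (
        lambda: bool(foo),
        lambda: len(foo) > 1,
        lambda: foo[1] == 'bar',   # only reached if the earlier checks passed
    )
    # The generator expression calls one check at a time; all() stops at the
    # first falsy result, so the remaining lambdas are never called.
    return all(check() for check in checks)
```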

Forgive me if I'm stating the obvious -- but if you're checking for many different conditions in advance of one or two operations, you might be better off using try/except -- especially for those conditions (if any) that are clear error conditions.
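As a rough illustration of the try/except approach for the list-index case in the question (second_is_bar is a hypothetical name, and treating None like a missing element is an assumption of this sketch):

```python
def second_is_bar(foo):
    # Attempt the operation directly and treat a missing index (or a
    # non-indexable value such as None) as "condition not met".
    try:
        return foo[1] == 'bar'
    except (IndexError, TypeError):
        return False
```

Catching only the specific exceptions you expect keeps unrelated bugs from being silently swallowed.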

See if you can't break some of that out into a function that includes some of the conditionals (assuming some are in common with your various cases).

Break it up into several sub-components where appropriate. As for where to draw the dividing lines, that's really up to you. While a huge staircase of conditional statements isn't great, neither is a massive if-statement with so many predicates that they wrap several lines. Instead, try to group your conditions into logical bunches.
You might write it as:
if foo and bar and foobar:
    ...
if barfoo and foobarfoo and barfoobar:
    ...
I also suggest introducing helper methods along the way. Even if those helper methods are called only from this code, that's fine.
def handle_bar():
    if barfoo and foobarfoo and barfoobar:
        ...
if foo and bar and foobar:
    ...
    handle_bar()
If scopes get confusing or you find yourself passing around too much state as function arguments, wrap it in a class and use member variables where it's conceptually appropriate.
Overall, my advice is to separate concepts into individual pieces of code at an appropriate granularity. If you don't do it at all, you get a long piece of code that requires lots of scrolling to see the big picture. If you over-do it, you force the reader to jump around your code too much.

If you have more than 3 to 5 tests, consider keeping your conditions in a dictionary, list or tuple, then testing that data structure. This is much cleaner than many individually named variables.
If you are testing "truth" against a named list of variables of unknown length or a sequence data structure (like a list or tuple) you can do this:
def all_true(*args):
    for test in args:
        if bool(test) is False: return False
    return True
foo=bar=foobar=barfoo=foobarfoo=barfoobar=1
if foo:
    if bar:
        if foobar:
            if barfoo:
                if foobarfoo:
                    if barfoobar:
                        print "True by Stairs!"
if all_true(foo,bar,foobar,barfoo,foobarfoo,barfoobar):
    print "True by function!"
t=(foo,bar,foobar,barfoo,foobarfoo,barfoobar)
if all_true(*t): print "The tuple is true!"
l=[foo,bar,foobar,barfoo,foobarfoo,barfoobar]
if all_true(*l): print "list is true!"
bar=0
# run the same tests...
The all_true() function will short-circuit against the first false it finds.

Related

How to refer to "True-like" and "False-like" when documenting a function?

Strongly related question.
When writing docstrings for my functions in python, I sometimes want to write something like this about argument specifications:
def foo(bar):
    """
    Do some stuff.
    bar : callable
        must return True if XXX and False otherwise.
    """
    if bar(...):
        ... # code goes here
However, this is not perfectly precise, because in this example bar could be returning any object that will be evaluated to True in the if-statement when the condition XXX is fulfilled. Such a callable would be a perfectly valid argument to pass to foo.
How should I formulate my documentation to reflect that foo does not strictly require bar's output to be a boolean?
My first move was to write something like "[...] must return an object that will be evaluated to True if ...", but I find it obfuscated.
This is a slang question! I've always wanted to answer one of these!
Ahem.
The term for something that evaluates to True when used in an if statement is "truthy". The term for something that evaluates to False when used in an if statement is "falsy" or "falsey". So, your documentation can be written like this:
def foo(bar):
    """
    Do some stuff.
    bar : callable
        must return a truthy value iff XXX.
    """
    if bar(...):
        ... # code goes here
"Iff" is more slang, this time from the mathematical world. It means "if and only if". These words are commonly used in programming contexts, so I expect most programmers will understand them; if not, truthy, falsy, falsey and iff all come up with their respective correct meanings when searched in a search engine.
I'm not sure if there's a standard for this, but based on python's Truth Value Testing you would probably be safe writing something like
def foo(bar):
    """
    Do some stuff.
    bar : callable
        when tested for truth value, return value should evaluate to True if and only if XXX.
    """
    if bar(...):
        ... # code goes here
I would suggest that it is just fine to say "must return True if XXX and False otherwise." Because of duck typing, I read this the way you intend. Further, this seems to be standard in how the Python standard library documents things.
In fact, if you do strictly require the value to be True or False and use this language, I will have a very bad day tracking down this bug!
As others have said, "truthy" and "falsy" are fine, well-understood alternatives if you are still concerned.
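To make the truthy/falsy distinction concrete, here is a small sketch of a function that accepts any predicate returning a truthy value rather than a strict bool. count_matching is a hypothetical name used only for this illustration:

```python
def count_matching(items, predicate):
    """Count the items for which predicate returns a truthy value."""
    return sum(1 for item in items if predicate(item))


# len returns an int, not a bool, yet it works as a predicate:
# 0 is falsy and any other length is truthy.
print(count_matching(['', 'a', 'bc'], len))   # prints 2
```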

Best practice for using parentheses in Python function returns?

I'm learning Python and, so far, I absolutely love it. Everything about it.
I just have one question about a seeming inconsistency in function returns, and I'm interested in learning the logic behind the rule.
If I'm returning a literal or variable in a function return, no parentheses are needed:
def fun_with_functions(a, b):
    total = a + b
    return total
However, when I'm returning the result of another function call, the function is wrapped around a set of parentheses. To wit:
def lets_have_fun():
    return(fun_with_functions(42, 9000))
This is, at least, the way I've been taught, using the A Smarter Way to Learn Python book. I came across this discrepancy and it was given without an explanation. You can see the online exercise here (skip to Exercise 10).
Can someone explain to me why this is necessary? Is it even necessary in the first place? And are there other similar variations in parenthetical syntax that I should be aware of?
Edit: I've rephrased the title of my question to reflect the responses. Returning a result within parentheses is not mandatory, as I originally thought, but it is generally considered best practice, as I have now learned.
It's not necessary. Parentheses are used for several reasons; one of them is code style:
example = some_really_long_function_name() or another_really_function_name()
so you can use:
example = (some_really_long_function_name()
           or
           another_really_function_name())
Another use is the same as in maths: to force the order of evaluation, so that whatever is between the parentheses is evaluated first. I imagine that when a function returns the result of another one, wrapping the call in parentheses is just seen as a way to make sure the inner call is executed first, but it's not necessary.
I don't think it is mandatory. I tried in both python2 and python3, and a function defined without parentheses in the return clause of lets_have_fun() works just fine. So as jack6e says, it's just a preference.
If you
return ("something",) # the , is important, the ( ) are optional, thx @roganjosh
you are returning a tuple.
If you are returning
return someFunction(4,9)
or
return (someFunction(4,9))
makes no difference. To test, use:
def f(i,g):
    return i * g
def r():
    return f(4,6)
def q():
    return (f(4,6))
print (type(r()))
print (type(q()))
Output:
<type 'int'>
<type 'int'>
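The difference between grouping parentheses and a tuple can be checked side by side. A small sketch (the function names are invented for the example):

```python
def plain():
    return 42


def parenthesized():
    return (42)     # parentheses only group; the result is still an int


def one_tuple():
    return (42,)    # the trailing comma is what makes a tuple


def bare_tuple():
    return 42,      # also a tuple; the parentheses were never required


print(type(parenthesized()) is int)   # True
print(one_tuple() == bare_tuple())    # True
```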

Recommended way to initialize variable in if block

I have the following code (minus some other operations):
def foobar():
    msg=None
    if foo:
        msg='foo'
    else:
        msg='bar'
    return msg
Is the following better practice for the msg variable?
def foobar():
    if foo:
        msg='foo'
    else:
        msg='bar'
    return msg
I'm aware that I could simplify the above functions to ternary expressions, however there are operations in each if-else block that I've left out.
Either should be fine but I would probably do:
def foobar():
    msg='bar'
    if foo:
        msg='foo'
    return msg
Just for completeness, here are some one-line alternatives to if/else blocks:
msg = 'foo' if foo else 'bar'
msg = foo and 'foo' or 'bar'
msg = ('bar', 'foo')[bool(foo)]
The first of those is definitely the most clear, if you don't like the one-liner I would suggest using your second method or thagorn's answer. The bool() call is only necessary in the last one if foo is not already a bool (or 0/1).
Obviously in your example function you could just return this immediately without even using a msg variable:
def foobar():
    return 'foo' if foo else 'bar'
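One caveat about the and/or one-liner listed above: it only matches the conditional expression when the "true" value is itself truthy. A short sketch showing where the two differ (the helper names are hypothetical):

```python
def pick_conditional(flag, if_true, if_false):
    return if_true if flag else if_false


def pick_and_or(flag, if_true, if_false):
    # Older idiom; falls through to if_false whenever if_true is falsy.
    return flag and if_true or if_false


print(repr(pick_conditional(True, '', 'bar')))   # ''
print(repr(pick_and_or(True, '', 'bar')))        # 'bar'
```

With 'foo' and 'bar' as in the question both forms agree, which is why the difference is easy to miss.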
In Python there's no great advantage to initializing before a conditional as in your first example. You just need to be sure that the variable is initialized before it's returned. That assumes (based on your examples) that you're using the "single exit point" paradigm. In some cases in Python it's appropriate, but other times you get cleaner code by exiting early when possible.
def earlyReturn(mycheck):
    if not mycheck:
        return 'You forgot something.'
    # code here if the test passes without needing an extra level of indentation.
I realize that there are some things left out, but if you don't actually need to manipulate msg, I imagine you could just return the intended contents, without ever needing a variable; return 'foo'
I would definitely say that the latter is better. There is no recommendation in Python to initialize variables. Therefore it should be avoided unless it adds something of value to the code, like a fallback value, or makes the code more readable, which it doesn't in this case.
Edit: By fallback value I mean the same as thagorn and mikebabcock have suggested.
If what you've shown is all that msg is involved in, then initializing it doesn't do anything for you, and the second solution is better.
If that's the entire logic, why not do:
def foobar():
    msg='bar'
    if foo:
        msg='foo'
    return msg

pythonic way to rewrite an assignment in an if statement

Is there a pythonic preferred way to do this that I would do in C++:
for s in str:
    if r = regex.match(s):
        print r.groups()
I really like that syntax, imo it's a lot cleaner than having temporary variables everywhere. The only other way that's not overly complex is
for s in str:
    r = regex.match(s)
    if r:
        print r.groups()
I guess I'm complaining about a pretty pedantic issue. I just miss the former syntax.
How about
for r in [regex.match(s) for s in str]:
    if r:
        print r.groups()
or a bit more functional
for r in filter(None, map(regex.match, str)):
    print r.groups()
Perhaps it's a bit hacky, but using a function object's attributes to store the last result allows you to do something along these lines:
def fn(regex, s):
    fn.match = regex.match(s) # save result
    return fn.match
for s in strings:
    if fn(regex, s):
        print fn.match.groups()
Or more generically:
def cache(value):
    cache.value = value
    return value
for s in strings:
    if cache(regex.match(s)):
        print cache.value.groups()
Note that although the "value" saved can be a collection of a number of things, this approach is limited to holding only one such at a time, so more than one function may be required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. So, in accordance with the DRY principle, rather than writing each one, a factory function can help:
def Cache():
    def cache(value):
        cache.value = value
        return value
    return cache
cache1 = Cache()
for s in strings:
    if cache1(regex.match(s)):
        # use another at same time
        cache2 = Cache()
        if cache2(somethingelse) != cache1.value:
            process(cache2.value)
        print cache1.value.groups()
        ...
There's a recipe to make an assignment expression but it's very hacky. Your first option doesn't compile so your second option is the way to go.
## {{{ http://code.activestate.com/recipes/202234/ (r2)
import sys
def set(**kw):
    assert len(kw)==1
    a = sys._getframe(1)
    a.f_locals.update(kw)
    return kw.values()[0]
#
# sample
#
A=range(10)
while set(x=A.pop()):
    print x
## end of http://code.activestate.com/recipes/202234/ }}}
As you can see, production code shouldn't touch this hack with a ten foot, double bagged stick.
This might be an overly simplistic answer, but would you consider this:
for s in str:
    if regex.match(s):
        print regex.match(s).groups()
There is no pythonic way to do something that is not pythonic. It's that way for a reason. First, allowing statements in the conditional part of an if statement would make the grammar pretty ugly. For instance, if you allowed assignment statements in if conditions, why not also allow if statements? How would you actually write that? C-like languages don't have this problem, because they don't have assignment statements. They make do with just assignment expressions and expression statements.
The second reason is that
if foo = bar:
    pass
looks very similar to
if foo == bar:
    pass
Even if you are clever enough to type the correct one, and even if most of the members of your team are sharp enough to notice it, are you sure that the one you are looking at now is exactly what is supposed to be there? It's not unreasonable for a new dev to see this and just "fix" it (one way or the other), and now it's definitely wrong.
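For readers on newer interpreters: Python 3.8 later added assignment expressions (PEP 572), which use a separate := operator precisely so that they cannot be confused with ==. A minimal sketch of the loop from the question in that style; matched_groups and its parameters are hypothetical names, and the code needs Python 3.8 or later:

```python
import re


def matched_groups(strings, pattern):
    """Collect the groups of every string that matches pattern."""
    regex = re.compile(pattern)
    results = []
    for s in strings:
        # := assigns and yields the value in one expression (Python 3.8+).
        if (r := regex.match(s)):
            results.append(r.groups())
    return results


print(matched_groups(['a1', 'b', 'c22'], r'([a-z])(\d+)'))
# [('a', '1'), ('c', '22')]
```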
Whenever I find that my loop logic is getting complex I do what I would with any other bit of logic: I extract it to a function. In Python it is a lot easier than some other languages to do this cleanly.
So extract the code that just generates the items of interest:
def matching(strings, regex):
    for s in strings:
        r = regex.match(s)
        if r: yield r
and then when you want to use it, the loop itself is as simple as they get:
for r in matching(strings, regex):
    print r.groups()
Yet another answer is to use the "Assign and test" recipe for allowing assigning and testing in a single statement published in O'Reilly Media's July 2002 1st edition of the Python Cookbook and also online at Activestate. It's object-oriented, the crux of which is this:
# from http://code.activestate.com/recipes/66061
class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
This can optionally be modified slightly by adding the custom __call__() method shown below to provide an alternative way to retrieve instances' values -- which, while less explicit, seems like a completely logical thing for a 'DataHolder' object to do when called, I think.
    def __call__(self):
        return self.value
Allowing your example to be re-written:
r = DataHolder()
for s in strings:
    if r.set(regex.match(s)):
        print r.get().groups()
        # or
        print r().groups()
As also noted in the original recipe, if you use it a lot, adding the class and/or an instance of it to the __builtin__ module to make it globally available is very tempting despite the potential downsides:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = DataHolder()
As I mentioned in my other answer to this question, it must be noted that this approach is limited to holding only one result/value at a time, so more than one instance is required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. That doesn't mean you should use it or the other answer, just that more effort will be required.

Can I be warned when I used a generator function by accident

I was working with generator functions and private functions of a class. I am wondering:
Why, when yielding in __someFunc (which in my case was by accident), does this function appear not to be called from within __someGenerator? Also, what is the terminology I want to use when referring to these aspects of the language?
Can the Python interpreter warn of such instances?
Below is an example snippet of my scenario.
class someClass():
    def __init__(self):
        pass
    #Copy and paste mistake where yield ended up in a regular function
    def __someFunc(self):
        print "hello"
        #yield True #if yielding in this function it isn't called
    def __someGenerator (self):
        for i in range(0, 10):
            self.__someFunc()
            yield True
        yield False
    def someMethod(self):
        func = self.__someGenerator()
        while func.next():
            print "next"
sc = someClass()
sc.someMethod()
I got burned on this and spent some time trying to figure out why a function just wasn't getting called. I finally discovered I was yielding in a function I didn't want to yield in.
A "generator" isn't so much a language feature, as a name for functions that "yield." Yielding is pretty much always legal. There's not really any way for Python to know that you didn't "mean" to yield from some function.
This PEP http://www.python.org/dev/peps/pep-0255/ talks about generators, and may help you understand the background better.
I sympathize with your experience, but compilers can't figure out what you "meant for them to do", only what you actually told them to do.
I'll try to answer the first of your questions.
A regular function, when called like this:
val = func()
executes its inside statements until it ends or a return statement is reached. Then the return value of the function is assigned to val.
If a compiler recognizes the function to actually be a generator and not a regular function (it does that by looking for yield statements inside the function -- if there's at least one, it's a generator), the scenario when calling it the same way as above has different consequences. Upon calling func(), no code inside the function is executed, and a special <generator> value is assigned to val. Then, the first time you call val.next(), the actual statements of func are being executed until a yield or return is encountered, upon which the execution of the function stops, value yielded is returned and generator waits for another call to val.next().
That's why, in your example, function __someFunc didn't print "hello" -- its statements were not executed, because you haven't called self.__someFunc().next(), but only self.__someFunc().
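The behaviour described above can be observed directly. In this sketch (the names are invented, and the built-in next() is used so that it runs on Python 3 as well as Python 2.6+), a list records when each body actually executes:

```python
calls = []


def regular():
    calls.append('regular ran')


def accidental_generator():
    calls.append('generator ran')
    yield True


regular()                       # the body runs immediately
gen = accidental_generator()    # the body does NOT run; a generator object comes back
calls_before_next = list(calls)
next(gen)                       # only now does the body run, up to the first yield

print(calls_before_next)        # ['regular ran']
print(calls)                    # ['regular ran', 'generator ran']
```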
Unfortunately, I'm pretty sure there's no built-in warning mechanism for programming errors like yours.
Python doesn't know whether you want to create a generator object for later iteration or call a function. But python isn't your only tool for seeing what's going on with your code. If you're using an editor or IDE that allows customized syntax highlighting, you can tell it to give the yield keyword a different color, or even a bright background, which will help you find your errors more quickly, at least. In vim, for example, you might do:
:syntax keyword Yield yield
:highlight yield ctermbg=yellow guibg=yellow ctermfg=blue guifg=blue
Those are horrendous colors, by the way. I recommend picking something better. Another option, if your editor or IDE won't cooperate, is to set up a custom rule in a code checker like pylint. An example from pylint's source tarball:
from pylint.interfaces import IRawChecker
from pylint.checkers import BaseChecker
class MyRawChecker(BaseChecker):
    """check for line continuations with '\' instead of using triple
    quoted string or parenthesis
    """
    __implements__ = IRawChecker
    name = 'custom_raw'
    msgs = {'W9901': ('use \\ for line continuation',
                     ('Used when a \\ is used for a line continuation instead'
                      ' of using triple quoted string or parenthesis.')),
           }
    options = ()
    def process_module(self, stream):
        """process a module
        the module's content is accessible via the stream object
        """
        for (lineno, line) in enumerate(stream):
            if line.rstrip().endswith('\\'):
                self.add_message('W9901', line=lineno)
def register(linter):
    """required method to auto register this checker"""
    linter.register_checker(MyRawChecker(linter))
The pylint manual is available here: http://www.logilab.org/card/pylint_manual
And vim's syntax documentation is here: http://www.vim.org/htmldoc/syntax.html
Because the return keyword is applicable in both generator functions and regular functions, there's nothing you could possibly check (as @Christopher mentions). The return keyword in a generator indicates that a StopIteration exception should be raised.
In Python 2, if you try to return with a value from within a generator (which doesn't make sense there, since return just means "stop iteration"), the compiler will complain at compile time, which may catch some copy-and-paste mistakes (Python 3.3 and later accept such a return, so this check no longer applies there):
>>> def foo():
...     yield 12
...     return 15
...
  File "<stdin>", line 3
SyntaxError: 'return' with argument inside generator
I personally just advise against copy and paste programming. :-)
From the PEP:
Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.
We do this.
Generators have "generate" or "gen" in their names, and each has a yield statement in its body. This is pretty easy to check visually, since no method is much over 20 lines of code.
Other methods don't have "gen" in their name.
Also, we do not ever use __ (double underscore) names under any circumstances. 32,000 lines of code. No __ names.
The "generator vs. non-generator" method function is entirely a design question: what did the programmer "intend" to happen? The compiler can't easily validate your intent; it can only validate what you actually typed.
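A naming convention like this can also be checked mechanically, for example in a unit test, using inspect.isgeneratorfunction from the standard library. The sketch below is only one possible way to do it; misnamed_generators, the marker argument and the Example class are hypothetical, and it assumes Python 3, where methods looked up on a class are plain functions:

```python
import inspect


def misnamed_generators(cls, marker='gen'):
    """Return names of methods whose generator-ness disagrees with their name."""
    problems = []
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        is_gen = inspect.isgeneratorfunction(func)
        if is_gen != (marker in name.lower()):
            problems.append(name)
    return problems


class Example(object):
    def gen_items(self):
        yield 1

    def helper(self):
        yield 2     # accidental yield in a method not named as a generator

    def total(self):
        return 3


print(misnamed_generators(Example))   # ['helper']
```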
