Is there any value to a Switch / Case implementation in Python? - python

Recently, I saw some discussions online about how there is no good "switch / case" equivalent in Python. I realize that there are several ways to do something similar - some with lambda, some with dictionaries. There have been other StackOverflow discussions about the alternatives. There were even two PEPs (PEP 0275 and PEP 3103) discussing (and rejecting) the integration of switch / case into the language.
I came up with what I think is an elegant way to do switch / case.
It ends up looking like this:
from switch_case import switch, case   # note the import style
x = 42
switch(x)                              # note the switch statement
if case(1):                            # note the case statement
    print(1)
if case(2):
    print(2)
if case():                             # note the case with no args
    print("Some number besides 1 or 2")
So, my questions are: Is this a worthwhile creation? Do you have any suggestions for making it better?
I put the include file on github, along with extensive examples. (I think the entire include file is about 50 executable lines, but I have 1500 lines of examples and documentation.) Did I over-engineer this thing, and waste a bunch of time, or will someone find this worthwhile?
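For context, here is a minimal sketch of how a helper with this interface could be built using module-level state. This is not the actual switch_case module from the repository, only an illustration of the mechanism behind the basic example above (limit= and between() are omitted):

# Hypothetical sketch, NOT the real switch_case module: a module-level
# "current value" plus a "matched" flag covers the basic example.
_current = None
_matched = False

def switch(value):
    """Start a new switch block for `value`."""
    global _current, _matched
    _current = value
    _matched = False

def case(*values):
    """Return True if the switched value matches; case() with no args is the default."""
    global _matched
    if not values:                 # default case: fires only if nothing matched yet
        return not _matched
    if _current in values:
        _matched = True
        return True
    return False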
Edit:
Trying to explain why this is different from other approaches:
1) Multiple paths are possible (executing two or more cases),
which is harder in the dictionary method.
2) can do checking for comparisons other than "equals"
(such as case(less_than(1000))).
3) More readable than the dictionary method, and possibly if/elif method
4) can track how many True cases there were.
5) can limit how many True cases are permitted. (i.e. execute the
first 2 True cases of...)
6) allows for a default case.
Here's a more elaborate example:
from switch_case import switch, case, between
x = 12
switch(x, limit=1)               # only execute the FIRST True case
if case(between(10, 100)):       # note the "between" case function
    print("%d has two digits." % x)
if case(*range(0, 100, 2)):      # note that this is an if, not an elif!
    print("%d is even." % x)     # doesn't get executed for 2-digit numbers,
                                 # because limit is 1; previous case was True.
if case():
    print("Nothing interesting to say about %d" % x)
# Running this program produces this output:
12 has two digits.
Here's an example attempting to show how switch_case can be more clear and concise than conventional if/else:
# conventional if/elif/else:
if (status_code == 2 or status_code == 4 or (11 <= status_code < 20)
        or status_code == 32):
    [block of code]
elif status_code == 25 or status_code == 45:
    [block of code]
if status_code <= 100:
    [block can get executed in addition to above blocks]

# switch_case alternative (assumes import already)
switch(status_code)
if case(2, 4, between(11, 20), 32):   # significantly shorter!
    [block of code]
elif case(25, 45):
    [block of code]
if case(le(100)):
    [block can get executed in addition to above blocks]
The big savings is in long if statements where the same switch is repeated over and over. Not sure how frequent of a use-case that is, but there seems to be certain cases where this makes sense.
The example file on github has even more examples.

So, my questions are: Is this a worthwhile creation?
No.
Do you have any suggestions for making it better?
Yes. Don't bother. What has it saved? Seriously? You have actually made the code more obscure by removing the variable x from each elif condition. Also, by replacing the obvious elif with if, you have created intentional confusion for all Python programmers, who will now think that the cases are independent.
This creates confusion.
The big savings is in long if statements where the same switch is repeated over and over. Not sure how frequent of a use-case that is, but there seems to be certain cases where this makes sense.
No. It's very rare, very contrived and very hard to read. Seeing the actual variable(s) involved is essential. Eliding the variable name makes things intentionally confusing. Now I have to go find the owning switch() function to interpret the case.
When there are two or more variables, this completely collapses.

There have been a plethora of discussions that address this issue on Stackoverflow. You can use the search function at the top to look for some other discussions.
However, I fail to see how your solution is better than a basic dictionary:
def switch(x):
    return {
        1: 1,
        2: 2,
    }[x]
Although adding a default clause is non-trivial with this method. However, your example seems to replicate a complex if/else statement anyway, so I'm not sure I would include an external library for this.
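One common workaround for the missing default clause is dict.get; a small sketch (not from the answer above):

def switch(x):
    return {
        1: "one",
        2: "two",
    }.get(x, "default")   # second argument is returned when the key is missing

print(switch(1))    # one
print(switch(42))   # default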

IMHO, the main reason for the switch statement to exist is so it can be translated/compiled into a (very fast) jump table. How would your proposed implementation accomplish that goal? Python's dictionaries do it today, as other posters have shown.
Secondarily, I guess a switch statement might read more clearly than the alternatives in some languages, but in python's case I think if/elif/else wins on clarity.
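As a sketch of what "dispatch table" means here, a dictionary of callables plays the role of the jump table (the handler names are made up for illustration):

def handle_one():
    return "got 1"

def handle_two():
    return "got 2"

def handle_default():
    return "something else"

dispatch = {1: handle_one, 2: handle_two}   # the "jump table"

x = 42
print(dispatch.get(x, handle_default)())    # something else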

from pyswitch import Switch   # pyswitch can be found on PyPI

myswitch = Switch()

@myswitch.case(42)
def case42(value):
    print "I got 42!"

@myswitch.case(range(10))
def caseRange10(value):
    print "I got a number from 0-9, and it was %d!" % value

@myswitch.caseIn('lo')
def caseLo(value):
    print "I got a string with 'lo' in it; it was '%s'" % value

@myswitch.caseRegEx(r'\b([Pp]y\w)\b')
def caseReExPy(matchOb):
    print r"I got a string that matched the regex '\b[Pp]y\w\b', and the match was '%s'" % matchOb.group(1)

@myswitch.default
def caseDefault(value):
    print "Hey, default handler here, with a value of %r." % value

myswitch(5)                     # prints: I got a number from 0-9, and it was 5!
myswitch('foobar')              # prints: Hey, default handler here, with a value of foobar.
myswitch('The word is Python')  # prints: I got a string that matched the regex '\b[Pp]y\w\b', and the match was 'Python'
You get the idea. Why? Yep, dispatch tables are the way to go in Python. I just got tired of writing them over and over, so I wrote a class and some decorators to handle it for me.

I have always just used dictionaries, if/elses, or lambdas for my switch-like statements. Reading through your code, though =)
docs: "Why isn't there a switch or case statement in Python?" (the Python design FAQ entry why-isn-t-there-a-switch-or-case-statement-in-python)

Update 2021: match-case introduced in Python 3.10
This hotly debated topic can now be closed.
In fact Python 3.10 released in October 2021 introduces structural pattern matching which brings a match-case construct to the language.
See this related answer for details.
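For reference, here is roughly what the basic example from the question looks like with the new match-case syntax (Python 3.10+):

x = 42
match x:
    case 1:
        print(1)
    case 2:
        print(2)
    case _:        # the wildcard pattern acts as the default
        print("Some number besides 1 or 2")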

Related

Zen of Python 'Explicit is better than implicit'

I'm trying to understand what 'implicit' and 'explicit' really means in the context of Python.
a = []

# my understanding is that this is implicit
if not a:
    print("list is empty")

# my understanding is that this is explicit
if len(a) == 0:
    print("list is empty")
I'm trying to follow the Zen of Python rules, but I'm curious to know if this applies in this situation or if I am over-thinking it?
The two statements have very different semantics. Remember that Python is dynamically typed.
For the case where a = [], both not a and len(a) == 0 are equivalent. A valid alternative might be to check not len(a). In some cases, you may even want to check for both emptiness and listness by doing a == [].
But a can be anything. For example, a = None. The check not a is fine, and will return True. But len(a) == 0 will not be fine at all. Instead you will get TypeError: object of type 'NoneType' has no len(). This is a totally valid option, but the if statements do very different things and you have to pick which one you want.
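A quick illustration of that difference:

a = None
print(not a)             # True -- the truthiness check is happy with None
try:
    print(len(a) == 0)
except TypeError as e:
    print(e)             # object of type 'NoneType' has no len()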
(Almost) everything has a __bool__ method in Python, but not everything has __len__. You have to decide which one to use based on the situation. Things to consider are:
Have you already verified whether a is a sequence?
Do you need to?
Do you mind if your if statement crashed on non-sequences?
Do you want to handle other falsy objects as if they were empty lists?
Remember that making the code look pretty takes second place to getting the job done correctly.
Though this question is old, I'd like to offer a perspective.
In a dynamic language, my preference is to always describe the expected type and purpose of a variable, to make its intent clearer. Then use knowledge of the language to be succinct and increase readability where possible (in Python, an empty list's boolean result is false). Thus the code:
lst_colours = []
if not lst_colours:
print("list is empty")
Even better to convey meaning is using a variable for very specific checks.
lst_colours = []
b_is_list_empty = not lst_colours
if b_is_list_empty:
    print("list is empty")
Checking whether a list is empty is something you would commonly do many times in a code base, so even better is to put such checks in a separate helper-function library, isolating common checks and reducing code duplication.
lst_colours = []
if b_is_list_empty(lst_colours):
    print("list is empty")

def b_is_list_empty(lst):
    ......
Most importantly, add meaning as much as possible, have an agreed company standard to chose how to tackle the simple things, like variable naming and implicit/explicit code choices.
Try to think of:
if not a:
    ...
as shorthand for:
if len(a) == 0:
    ...
I don't think this is a good example of a gotcha with Python's Zen rule of "explicit" over "implicit". The choice here is mostly about readability. It's not that the second one is bad and the other is good; it's just that the first one is more idiomatic. If you understand the boolean nature of lists in Python, I think you'll find the first more readable, and readability counts in Python.

Variable as loop's expression

Let's say I have a string which changes according to input:
expression=True
or
expression="a>1"
How can I use this variable as loop's expression in the way, so I won't need to repeat myself writing double loop. (and without using eval)?
Well pseudo code:
expression = "a<2"
a = 1
while expression:
    print a,
    a += 0.1
would print something like that:
1 1.1 1.2 <...> 1.9
EDIT:
No, I don't want to print numbers, I want to change loop condition(expression) dynamically.
CODE THAT works:
a = "b==2"
b = 2
while eval(a):
    # do things
Sample code:
somevar = 3
expression = lambda: somevar < 5
while expression():
    ...
    if continue_if_even:
        expression = lambda: (somevar % 2) == 0
    ...
Using a lambda might be the solution to your problem, and it's way better (more elegant, less bug-prone, more secure) than using eval.
Of course, there are some very special cases where eval is still needed.
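For example, here is a runnable sketch of the pseudo-code from the question, with the condition held in a lambda rather than a string (variable names follow the question):

a = 1
expression = lambda: a < 2      # the loop condition, swappable at runtime
while expression():
    print(a, end=" ")
    a = round(a + 0.1, 1)       # round to keep the float increments tidy
# prints: 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9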
You're asking how to run user input. The answer is eval (or - not here, but generally - exec). Of course this is a bad answer, but it's the only answer. And if the only answer is bad, the question is bad.
What are you really trying to do? There are few programs (most notably programming language implementations) that need to give the user this much power. Yours probably doesn't. Chances are you can do what you want to do without running user input. But we need to know what you're trying to do to suggest viable alternatives.
You seem to want to change the condition of the loop dynamically, but you're not providing a very good use case so it's hard to understand why. If you just want to print the numbers between 1 and 1.9 with an increment of 0.1, there are easy ways to do that:
for x in xrange(10):
    print "1.%d" % x
is one. There's no need for this dynamic expression magic. Also, you seem to want the same value (a) to have two very different meanings at the same time, both the value to print and the expression that controls how many values to print. That's perhaps a source of some of the confusion.

Should I make my python code less fool-proof to improve readability?

I try to make my code fool-proof, but I've noticed that it takes a lot of time to type things out and it takes more time to read the code.
Instead of:
class TextServer(object):
    def __init__(self, text_values):
        self.text_values = text_values
        # <more code>
    # <more methods>
I tend to write this:
class TextServer(object):
    def __init__(self, text_values):
        for text_value in text_values:
            assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'
            assert 2 <= len(text_value), u'All text_values should be at least two characters long.'
        self.__text_values = frozenset(text_values)  # <They shouldn't change.>
        # <more code>

    @property
    def text_values(self):
        # <'text_values' shouldn't be replaced.>
        return self.__text_values

    # <more methods>
Is my python coding style too paranoid? Or is there a way to improve readability while keeping it fool-proof?
Note 1: I've added the comments between < and > just for clarification.
Note 2: The main fool I try to prevent to abuse my code is my future self.
Here's some good advice on Python idioms from this page:
Catch errors rather than avoiding them to avoid cluttering your code with special cases. This idiom is called EAFP ('easier to ask forgiveness than permission'), as opposed to LBYL ('look before you leap'). This often makes the code more readable. For example:
Worse:
# check whether int conversion will raise an error
if not isinstance(s, str) or not s.isdigit():
    return None
elif len(s) > 10:    # too many digits for int conversion
    return None
else:
    return int(s)
Better:
try:
    return int(s)
except (TypeError, ValueError, OverflowError):  # int conversion failed
    return None
(Note that in this case, the second version is much better, since it correctly handles leading + and -, and also values between 2 and 10 billion (for 32-bit machines). Don't clutter your code by anticipating all the possible failures: just try it and use appropriate exception handling.)
"Is my python coding style too paranoid? Or is there a way to improve readability while keeping it fool-proof?"
Who's the fool you're protecting yourself from?
You? Are you worried that you didn't remember the API you wrote?
A peer? Are you worried that someone in the next cubicle will actively make an effort to pass the wrong things through an API? You can talk to them to resolve this problem. It saves a lot of code if you provide documentation.
A total sociopath who will download your code, refuse to read the API documentation, and then call all the methods with improper arguments? What possible help can you provide to them?
The "fool-proof" coding isn't really very helpful, since all of these scenarios are more easily addressed another way.
If you're fool-proofing against yourself, perhaps that's not really sensible.
If you're fool-proofing for a co-worker or peer, you should -- perhaps -- talk to them and make sure they understand the API docs.
If you're fool-proofing against some hypothetical sociopathic programmer who's out to subvert the API, there's nothing you can do. It's Python. They have the source. Why would they go through the effort to misuse the API when they can just edit the source to break things?
It's unusual in Python to use a private instance attribute, and then expose it through a property as you have. Just use self.text_values.
Your code is too paranoid (especially when you want to protect only against yourself).
In Python circles, LBYL is generally (but not always) frowned upon. But there's also the (often unstated) assumption that one has (good) unit tests.
Me personally? I think readability is of paramount importance. I mean, if you yourself think it's hard to read, what will others think? And less-readable code is also more likely to conceal bugs, not to mention making working on it harder and more time-consuming (you have to dig to find what the code actually does amid all that LBYLing).
If you try to make your code totally foolproof, someone will invent a better fool. Seriously, a good rule of thumb is to guard against likely errors, but don't clutter your code trying to think of every conceivable way a caller could break you.
Instead of spending time on asserts and private variables, I prefer to spend time on documentation and test cases. I prefer to read documentation, and when I have to read code, I prefer to read tests. This becomes truer as the code grows. At the same time, tests give you fool-proof code and useful use cases.
I base the need for error checking code on the consequences of the errors that it's checking for. If crap data gets into my system, how long will it be before I discover it, how hard will it be to determine that the problem was crap data, and how difficult will it be to fix? For cases like the one you've posted, the answers are generally "not long," "not hard," and "not difficult."
But data that's going to be persisted somewhere and then used as the input to a complicated algorithm in six weeks? I'll check the hell out of that.
I don't think this is specific to python. I'm a firm believer in Design by Contract: Ideally all functions should have clear pre- and post-conditions; unfortunately most languages (Eiffel being the canonical exception) don't provide particularly convenient ways to achieve this which contributes to the apparent conflict between clarity and correctness.
As a practical matter, one approach is to write a 'checkValues' method so as to avoid cluttering __init__. You can even compress it to:
def __init__(self, text_values):
    self.text_values = checkValues(text_values)

def checkValues(text_values):
    for text_value in text_values:
        assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'
        assert 2 <= len(text_value), u'All text_values should be at least two characters long.'
    return frozenset(text_values)
Another approach would be using a folding text editor that can hide/show the pre-conditions with the aid of some commenting conventions; this would also be useful for auto-generating documentation.
Take all the energy you're putting into argument checking and channel it instead into writing clear, concise doc-strings.
Unless you're writing code for nuclear reactors; in which case I would appreciate you doing both.
Another way to think is: you only need to catch errors that you can correct. If you're checking the input just for aborting with an AssertionError, you're better off just allowing the code to raise the appropriate exception later so you can debug correctly.
This line in particular is pretty bad since it stops duck-typing:
assert isinstance(text_value, basestring), u'All text_values should be str or unicode.'

Common pitfalls in Python [duplicate]

This question already has answers here:
Python 2.x gotchas and landmines [closed]
(23 answers)
Closed 3 years ago.
Today I was bitten again by mutable default arguments after many years. I usually don't use mutable default arguments unless needed, but I think with time I forgot about that. Today in the application I added tocElements=[] in a PDF generation function's argument list and now "Table of Contents" gets longer and longer after each invocation of "generate pdf". :)
What else should I add to my list of things to MUST avoid?
Always import modules the same way, e.g. from y import x and import x are treated as different modules.
Do not use range in place of lists, because range() will become a lazy object anyway; the following will fail:
myIndexList = [0, 1, 3]
isListSorted = myIndexList == range(3) # will fail in 3.0
isListSorted = myIndexList == list(range(3)) # will not
Same thing can be mistakenly done with xrange:
myIndexList == xrange(3)
Be careful catching multiple exception types:
try:
    raise KeyError("hmm bug")
except KeyError, TypeError:
    print TypeError
This prints "hmm bug", though it is not a bug; it looks like we are catching exceptions of both types, but instead we are catching only KeyError, bound to the name TypeError. Use this instead:
try:
    raise KeyError("hmm bug")
except (KeyError, TypeError):
    print TypeError
Don't use index to loop over a sequence
Don't:
for i in range(len(tab)):
    print tab[i]
Do:
for elem in tab:
    print elem
for will automate most iteration operations for you.
Use enumerate if you really need both the index and the element:
for i, elem in enumerate(tab):
    print i, elem
Be careful when using "==" to check against True or False
if (var == True):
    # this will execute if var is True or 1, 1.0, 1L
if (var != True):
    # this will execute if var is neither True nor 1
if (var == False):
    # this will execute if var is False or 0 (or 0.0, 0L, 0j)
if (var == None):
    # only execute if var is None
if var:
    # execute if var is a non-empty string/list/dictionary/tuple, non-0, etc.
if not var:
    # execute if var is "", {}, [], (), 0, None, etc.
if var is True:
    # only execute if var is boolean True, not 1
if var is False:
    # only execute if var is boolean False, not 0
if var is None:
    # same as var == None
Do not check if you can, just do it and handle the error
Pythonistas usually say "It's easier to ask for forgiveness than permission".
Don't:
if os.path.isfile(file_path):
    file = open(file_path)
else:
    # do something
Do:
try:
    file = open(file_path)
except OSError as e:
    # do something
Or even better, with Python 2.6+ / 3:
with open(file_path) as file:
    # do something
It is much better because it's much more general. You can apply "try / except" to almost anything. You don't need to care about what to do to prevent a failure, just about the error you are risking.
Do not check against type
Python is dynamically typed, therefore checking for type makes you lose flexibility. Instead, use duck typing by checking behavior. E.g., if you expect a string in a function, use str() to convert any object to a string; if you expect a list, use list() to convert any iterable to a list.
Don't:
def foo(name):
    if isinstance(name, str):
        print name.lower()

def bar(listing):
    if isinstance(listing, list):
        listing.extend((1, 2, 3))
        return ", ".join(listing)
Do:
def foo(name):
    print str(name).lower()

def bar(listing):
    l = list(listing)
    l.extend((1, 2, 3))
    return ", ".join(l)
Using the last way, foo will accept any object. Bar will accept strings, tuples, sets, lists and much more. Cheap DRY :-)
Don't mix spaces and tabs
Just don't. You would cry.
Use object as first parent
This is tricky, but it will bite you as your program grows. There are old and new classes in Python 2.x. The old ones are, well, old. They lack some features and can have awkward behavior with inheritance. To be usable, any of your classes should be "new style". To do so, make them inherit from object:
Don't:
class Father:
    pass

class Child(Father):
    pass
Do:
class Father(object):
    pass

class Child(Father):
    pass
In Python 3.x all classes are new style, so declaring class Father: is fine.
Don't initialize class attributes outside the __init__ method
People coming from other languages find it tempting because that is how you do the job in Java or PHP: you write the class name, then list your attributes and give them a default value. It seems to work in Python too; however, it doesn't work the way you think.
Doing that sets up class attributes (static attributes). When you then look up the attribute on an instance, Python gives you the instance value unless none has been set, in which case it returns the class attribute.
It implies two big hazards:
If the class attribute is changed, the initial value is changed for every instance that hasn't set its own.
If you set a mutable object as a default value, you'll get the same object shared across instances.
Don't (unless you want static):
class Car(object):
    color = "red"
    wheels = [Wheel(), Wheel(), Wheel(), Wheel()]
Do:
class Car(object):
    def __init__(self):
        self.color = "red"
        self.wheels = [Wheel(), Wheel(), Wheel(), Wheel()]
When you need a population of arrays you might be tempted to type something like this:
>>> a=[[1,2,3,4,5]]*4
And sure enough it will give you what you expect when you look at it
>>> from pprint import pprint
>>> pprint(a)
[[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
But don't expect the elements of your population to be separate objects:
>>> a[0][0] = 2
>>> pprint(a)
[[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5]]
Unless this is what you need...
It is worth mentioning a workaround:
a = [[1,2,3,4,5] for _ in range(4)]
Python Language Gotchas -- things that fail in very obscure ways
Using mutable default arguments.
Leading zeroes mean octal. 09 is a very obscure syntax error in Python 2.x
Misspelling overridden method names in a superclass or subclass. The superclass misspelling mistake is worse, because none of the subclasses override it correctly.
Python Design Gotchas
Spending time on introspection (e.g. trying to automatically determine types or superclass identity or other stuff). First, it's obvious from reading the source. More importantly, time spent on weird Python introspection usually indicates a fundamental failure to grasp polymorphism. 80% of the Python introspection questions on SO are failure to get Polymorphism.
Spending time on code golf. Just because your mental model of your application is four keywords ("do", "what", "I", "mean"), doesn't mean you should build a hyper-complex introspective decorator-driven framework to do that. Python allows you to take DRY to a level that is silliness. The rest of the Python introspection questions on SO attempts to reduce complex problems to code golf exercises.
Monkeypatching.
Failure to actually read through the standard library, and reinventing the wheel.
Conflating interactive type-as-you go Python with a proper program. While you're typing interactively, you may lose track of a variable and have to use globals(). Also, while you're typing, almost everything is global. In proper programs, you'll never "lose track of" a variable, and nothing will be global.
Mutating a default argument:
def foo(bar=[]):
    bar.append('baz')
    return bar
The default value is evaluated only once, and not every time the function is called. Repeated calls to foo() would return ['baz'], ['baz', 'baz'], ['baz', 'baz', 'baz'], ...
If you want to mutate bar do something like this:
def foo(bar=None):
    if bar is None:
        bar = []
    bar.append('baz')
    return bar
Or, if you like arguments to be final:
def foo(bar=[]):
    not_bar = bar[:]
    not_bar.append('baz')
    return not_bar
I don't know whether this is a common mistake, but while Python doesn't have increment and decrement operators, double signs are allowed, so
++i
and
--i
are syntactically correct code, but don't do anything "useful" or what you may be expecting.
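For example:

# ++i and --i parse as stacked unary operators, so they are no-ops:
i = 5
print(++i)   # 5   i.e. +(+i)
print(--i)   # 5   i.e. -(-i)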
Rolling your own code before looking in the standard library. For example, writing this:
def repeat_list(items):
    while True:
        for item in items:
            yield item
When you could just use this:
from itertools import cycle
Examples of frequently overlooked modules (besides itertools) include:
optparse for creating command line parsers
ConfigParser for reading configuration files in a standard manner
tempfile for creating and managing temporary files
shelve for storing Python objects to disk, handy when a full fledged database is overkill
Avoid using keywords as your own identifiers.
Also, it's always good to not use from somemodule import *.
Not using functional tools. This isn't just a mistake from a style standpoint, it's a mistake from a speed standpoint because a lot of the functional tools are optimized in C.
This is the most common example:
temporary = []
for item in itemlist:
    temporary.append(somefunction(item))
itemlist = temporary
The correct way to do it:
itemlist = map(somefunction, itemlist)
The just as correct way to do it:
itemlist = [somefunction(x) for x in itemlist]
And if you only need the processed items available one at a time, rather than all at once, you can save memory and improve speed by using the iterable equivalents
# itertools-based iterator
itemiter = itertools.imap(somefunction, itemlist)
# generator expression-based iterator
itemiter = (somefunction(x) for x in itemlist)
Surprised that nobody said this:
Mix tab and spaces when indenting.
Really, it's a killer. Believe me. In particular, if it runs.
If you're coming from C++, realize that variables declared in a class definition are static. You can initialize nonstatic members in the init method.
Example:
from random import random   # import added so the example runs

class MyClass:
    static_member = 1

    def __init__(self):
        self.non_static_member = random()
Code Like a Pythonista: Idiomatic Python
Normal copying (assigning) is done by reference, so filling a container by mutating the same object and inserting it repeatedly ends up with a container full of references to the last state of that object.
Use copy.deepcopy instead.
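A small sketch of the pitfall and the fix (the dict and names are illustrative):

import copy

template = {"count": 0}
rows = []
for i in range(3):
    template["count"] = i
    rows.append(template)                 # every entry references the same dict
print(rows)                               # [{'count': 2}, {'count': 2}, {'count': 2}]

rows = []
for i in range(3):
    template["count"] = i
    rows.append(copy.deepcopy(template))  # append an independent copy instead
print(rows)                               # [{'count': 0}, {'count': 1}, {'count': 2}]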
Importing re and using the full regular expression approach to string matching/transformation, when perfectly good string methods exist for every common operation (e.g. capitalisation, simple matching/searching).
Using the %s formatter in error messages. In almost every circumstance, %r should be used.
For example, imagine code like this:
try:
    get_person(person)
except NoSuchPerson:
    logger.error("Person %s not found." % (person))
Printed this error:
ERROR: Person wolever not found.
It's impossible to tell if the person variable is the string "wolever", the unicode string u"wolever" or an instance of the Person class (which has __str__ defined as def __str__(self): return self.name). Whereas, if %r was used, there would be three different error messages:
...
logger.error("Person %r not found." %(person))
Would produce the much more helpful errors:
ERROR: Person 'wolever' not found.
ERROR: Person u'wolever' not found.
ERROR: Person <Person object at 0x...> not found.
Another good reason for this is that paths are a whole lot easier to copy/paste. Imagine:
try:
    stuff = open(path).read()
except IOError:
    logger.error("Could not open %s" % (path))
If path is some path/with 'strange' "characters", the error message will be:
ERROR: Could not open some path/with 'strange' "characters"
Which is hard to visually parse and hard to copy/paste into a shell.
Whereas, if %r is used, the error would be:
ERROR: Could not open 'some path/with \'strange\' "characters"'
Easy to visually parse, easy to copy-paste, all around better.
Don't write large output messages to standard output.
Strings are immutable - build them not with the "+" operator but with the str.join() function.
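For instance (a small sketch; the word list is illustrative):

words = ["never", "odd", "or", "even"]

# Repeated "+" builds a brand-new string on every iteration:
sentence = ""
for w in words:
    sentence = sentence + w + " "

# Collect the pieces and join once instead:
sentence = " ".join(words)
print(sentence)   # never odd or even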
read those articles:
python gotchas
things to avoid
Gotchas for Python users
Python Landmines
The last link is the original one; this SO question is a duplicate of it.
A bad habit I had to train myself out of was using X and Y or Z for inline logic.
Unless you can 100% always guarantee that Y will be a true value, even when your code changes in 18 months time, you set yourself up for some unexpected behaviour.
Thankfully, in later versions you can use Y if X else Z.
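A sketch of the failure mode (values are illustrative):

x = True
y = 0                   # a legitimate "true branch" value that happens to be falsy
z = 99

print(x and y or z)     # 99 -- not the 0 you wanted
print(y if x else z)    # 0  -- the conditional expression gets it right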
I would stop using deprecated methods in 2.6, so that your app or script will be ready and easier to convert to Python 3.
I've started learning Python as well, and one of the biggest mistakes I made was constantly using a C++/C#-style indexed "for" loop. Python doesn't have a for(i; i < length; i++) type of loop, and for a good reason: most of the time there are better ways to do the same thing.
Example:
I had a method that iterated over a list and returned the indexes of selected items:
for i in range(len(myList)):
    if myList[i].selected:
        retVal.append(i)
Instead Python has list comprehension that solves the same problem in a more elegant and easy to read way:
retVal = [index for index, item in enumerate(myList) if item.selected]
++n and --n may not work as expected by people coming from C or Java background.
++n is positive of a positive number, which is simply n.
--n is negative of a negative number, which is simply n.
Some personal opinions, but I find it best NOT to:
use deprecated modules (use warnings for them)
overuse classes and inheritance (typical of static languages legacy maybe)
explicitly use declarative algorithms (as iteration with for vs use of itertools)
reimplement functions from the standard lib, "because I don't need all of those features"
using features for the sake of it (reducing compatibility with older Python versions)
using metaclasses when you really don't have to and more generally make things too "magic"
avoid using generators
(more personal) try to micro-optimize CPython code on a low-level basis. Better spend time on algorithms and then optimize by making a small C shared lib called by ctypes (it's so easy to gain 5x perf boosts on an inner loop)
use unnecessary lists when iterators would suffice
code a project directly for 3.x before the libs you need are all available (this point may be a bit controversial now!)
import this
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
import not_this
Write ugly code.
Write implicit code.
Write complex code.
Write nested code.
Write dense code.
Write unreadable code.
Write special cases.
Strive for purity.
Ignore errors and exceptions.
Write optimal code before releasing.
Every implementation needs a flowchart.
Don't use namespaces.
Never assume that having a multi-threaded Python application and a SMP capable machine (for instance one equipped with a multi-core CPU) will give you the benefit of introducing true parallelism into your application. Most likely it will not because of GIL (Global Interpreter Lock) which synchronizes your application on the byte-code interpreter level.
There are some workarounds like taking advantage of SMP by putting the concurrent code in C API calls or using multiple processes (instead of threads) via wrappers (for instance like the one available at http://www.parallelpython.org) but if one needs true multi-threading in Python one should look at things like Jython, IronPython etc. (GIL is a feature of the CPython interpreter so other implementations are not affected).
According to Python 3000 FAQ (available at Artima) the above still stands even for the latest Python versions.
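A minimal sketch of the multiple-processes workaround using the standard multiprocessing module (the work function is illustrative):

from multiprocessing import Pool

def cpu_bound(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:                          # four worker processes, no GIL contention
        print(pool.map(cpu_bound, [10 ** 5] * 4))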
Somewhat related to the default mutable argument, how one checks for the "missing" case results in differences when an empty list is passed:
def func1(toc=None):
    if not toc:
        toc = []
    toc.append('bar')

def func2(toc=None):
    if toc is None:
        toc = []
    toc.append('bar')

def demo(toc, func):
    print func.__name__
    print ' before:', toc
    func(toc)
    print ' after:', toc

demo([], func1)
demo([], func2)
Here's the output:
func1
before: []
after: []
func2
before: []
after: ['bar']
You've mentioned default arguments... One that's almost as bad as mutable default arguments: default values which aren't None.
Consider a function which will cook some food:
def cook(breakfast="spam"):
    arrange_ingredients_for(breakfast)
    heat_ingredients_for(breakfast)
    serve(breakfast)
Because it specifies a default value for breakfast, it is impossible for some other function to say "cook your default breakfast" without a special-case:
def order(breakfast=None):
    if breakfast is None:
        cook()
    else:
        cook(breakfast)
However, this could be avoided if cook used None as a default value:
def cook(breakfast=None):
    if breakfast is None:
        breakfast = "spam"

def order(breakfast=None):
    cook(breakfast)
A good example of this is Django bug #6988. Django's caching module had a "save to cache" function which looked like this:
def set(key, value, timeout=0):
    if timeout == 0:
        timeout = settings.DEFAULT_TIMEOUT
    _caching_backend.set(key, value, timeout)
But, for the memcached backend, a timeout of 0 means "never timeout"… Which, as you can see, would be impossible to specify.
Don't modify a list while iterating over it.
odd = lambda x: bool(x % 2)
numbers = range(10)
for i in range(len(numbers)):
    if odd(numbers[i]):
        del numbers[i]
One common suggestion to work around this problem is to iterate over the list in reverse:
for i in range(len(numbers) - 1, -1, -1):
    if odd(numbers[i]):
        del numbers[i]
But even better is to use a list comprehension to build a new list to replace the old:
numbers[:] = [n for n in numbers if not odd(n)]
Common pitfall: default arguments are evaluated once:
def x(a, l=[]):
    l.append(a)
    return l

print x(1)
print x(2)
prints:
[1]
[1, 2]
i.e. you always get the same list.
The very first mistake before you even start: don't be afraid of whitespace.
When you show someone a piece of Python code, they are impressed until you tell them that they have to indent correctly. For some reason, most people feel that a language shouldn't force a certain style on them while all of them will indent the code nonetheless.
my_variable = <something>
...
my_varaible = f(my_variable)
...
then you use my_variable, thinking it contains the result of f rather than the initial value.
Python won't warn you in any way that on the second assignment you misspelled the variable name and created a new one.
Creating a local module with the same name as one from the stdlib. This is almost always done by accident (as reported in this question), but usually results in cryptic error messages.
Promiscuous Exception Handling
This is something that I see a surprising amount in production code and it makes me cringe.
try:
    do_something()   # do_something can raise a lot of errors, e.g. files, sockets
except:
    pass             # who cares, we'll just ignore it
Was the exception the one you wanted to suppress, or was it something more serious? There are more subtle cases, too, that can make you pull your hair out trying to figure out what went wrong:
try:
    foo().bar().baz()
except AttributeError:   # baz() may return None or an incompatible *duck type*
    handle_no_baz()
The problem is that foo or baz could be the culprits too. I think this can be more insidious in that this is idiomatic Python, where you are checking your types for proper methods. But each method call has a chance to return something unexpected and to suppress bugs that should be raising exceptions.
Knowing what exceptions a method can throw is not always obvious. For example, urllib and urllib2 use socket, which has its own exceptions that percolate up and rear their ugly head when you least expect it.
Exception handling is a productivity boon in handling errors over system level languages like C. But I have found suppressing exceptions improperly can create truly mysterious debugging sessions and take away a major advantage interpreted languages provide.
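A narrower version of the first snippet above, assuming the risky call is file I/O (do_something here is a stand-in):

import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

def do_something():
    open("/nonexistent/path")          # stand-in for the risky call

try:
    do_something()
except OSError as exc:                 # catch only the failure you actually expect...
    logger.warning("do_something failed: %r", exc)   # ...and record it rather than pass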

"else" considered harmful in Python?

In an answer (by S.Lott) to a question about Python's try...else statement:
Actually, even on an if-statement, the else: can be abused in truly terrible ways, creating bugs that are very hard to find. [...]
Think twice about else:. It is generally a problem. Avoid it except in an if-statement, and even then consider documenting the else-condition to make it explicit.
Is this a widely held opinion? Is else considered harmful?
Of course you can write confusing code with it but that's true of any other language construct. Even Python's for...else seems to me a very handy thing to have (less so for try...else).
S.Lott has obviously seen some bad code out there. Haven't we all? I do not consider else harmful, though I've seen it used to write bad code. In those cases, all the surrounding code has been bad as well, so why blame poor else?
No it is not harmful, it is necessary.
There should always be a catch-all statement. All switches should have a default. All pattern matching in an ML language should have a default.
The argument that it is impossible to reason what is true after a series of if statements is a fact of life. The computer is the biggest finite state machine out there, and it is silly to enumerate every single possibility in every situation.
If you are really afraid that unknown errors go unnoticed in else statements, is it really that hard to raise an exception there?
Saying that else is considered harmful is a bit like saying that variables or classes are harmful. Heck, it's even like saying that goto is harmful. Sure, things can be misused. But at some point, you just have to trust programmers to be adults and be smart enough not to.
What it comes down to is this: if you're willing to not use something because an answer on SO or a blog post or even a famous paper by Dijkstra told you not to, you need to consider if programming is the right profession for you.
I wouldn't say it is harmful, but there are times when the else statement can get you into trouble. For instance, if you need to do some processing based on an input value and there are only two valid input values. Only checking for one could introduce a bug.
eg:
The only valid inputs are 1 and 2:
if(input == 1)
{
    //do processing
    ...
}
else
{
    //do processing
    ...
}
In this case, using the else would allow all values other than 1 to be processed when it should only be for values 1 and 2.
To me, the whole concept of certain popular language constructs being inherently bad is just plain wrong. Even goto has its place. I've seen very readable, maintainable code by the likes of Walter Bright and Linus Torvalds that uses it. It's much better to just teach programmers that readability counts and to use common sense than to arbitrarily declare certain constructs "harmful".
If you write:
if foo:
    # ...
elif bar:
    # ...
# ...
then the reader may be left wondering: what if neither foo nor bar is true? Perhaps you know, from your understanding of the code, that it must be the case that either foo or bar. I would prefer to see:
if foo:
    # ...
else:
    # at this point, we know that bar is true.
    # ...
# ...
or:
if foo:
    # ...
else:
    assert bar
    # ...
# ...
This makes it clear to the reader how you expect control to flow, without requiring the reader to have intimate knowledge of where foo and bar come from.
(in the original case, you could still write a comment explaining what is happening, but I think I would then wonder: "Why not just use an else: clause?")
I think the point is not that you shouldn't use else:; rather, that an else: clause can allow you to write unclear code and you should try to recognise when this happens and add a little comment to help out any readers.
Which is true about most things in programming languages, really :-)
Au contraire... In my opinion, there MUST be an else for every if. Granted, you can do stupid things, but you can abuse any construct if you try hard enough. You know the saying: "a real programmer can write FORTRAN in any language".
What I do lots of time is to write the else part as a comment, describing why there's nothing to be done.
Else is most useful when documenting assumptions about the code. It ensures that you have thought through both sides of an if statement.
Always using an else clause with each if statement is even a recommended practice in "Code Complete".
The rationale behind including the else statement (of try...else) in Python in the first place was to only catch the exceptions you really want to. Normally when you have a try...except block, there's some code that might raise an exception, and then there's some more code that should only run if the previous code was successful. Without an else block, you'd have to put all that code in the try block:
try:
    something_that_might_raise_error()
    do_this_only_if_that_was_ok()
except ValueError:
    # whatever
The issue is, what if do_this_only_if_that_was_ok() raises a ValueError? It would get caught by the except statement, when you might not have wanted it to. That's the purpose of the else block:
try:
    something_that_might_raise_error()
except ValueError:
    # whatever
else:
    do_this_only_if_that_was_ok()
I guess it's a matter of opinion to some extent, but I personally think this is a great idea, even though I use it very rarely. When I do use it, it just feels very appropriate (and besides, I think it helps clarify the code flow a bit)
Seems to me that, for any language and any flow-control statement where there is a default scenario or side-effect, that scenario needs to have the same level of consideration. The logic in if or switch or while is only as good as the condition if(x) while(x) or for(...). Therefore the statement is not harmful but the logic in their condition is.
Therefore, as developers, it is our responsibility to code with the wide scope of the else in mind. Too many developers treat it as "if not the above", when in fact it can defy all common sense, because the only logic in it is the negation of the preceding logic, which is often incomplete (an algorithm design error in itself).
I don't consider 'else' any more harmful than off-by-ones in a for() loop or bad memory management. It's all about the algorithms. If your automaton is complete in its scope and possible branches, and all are concrete and understood, then there is no danger. The danger is misuse of the logic behind the expressions by people not realizing the impact of wide-scope logic. Computers are stupid; they do what they are told by their operator (in theory).
I do consider try and catch to be dangerous, because it can delegate handling to an unknown quantity of code. Branching above the raise may contain a bug, highlighted by the raise itself. This can be non-obvious. It is like turning a sequential set of instructions into a tree or graph of error handling, where each component is dependent on the branches in the parent. Odd. Mind you, I love C.
There is a so called "dangling else" problem which is encountered in C family languages as follows:
if (a==4)
    if (b==2)
        printf("here!");
else
    printf("which one");
This innocent code can be understood in two ways:
if (a==4)
    if (b==2)
        printf("here!");
    else                        /* else binds to the inner if */
        printf("which one");
or
if (a==4)
    if (b==2)
        printf("here!");
else                            /* else binds to the outer if */
    printf("which one");
The problem is that the "else" is "dangling", one can confuse the owner of the else. Of course the compiler will not make this confusion, but it is valid for mortals.
Thanks to Python, we can not have a dangling else problem in Python since we have to write either
if a==4:
    if b==2:
        print "here!"
    else:
        print "which one"
or
if a==4:
    if b==2:
        print "here!"
else:
    print "which one"
So the human eye catches it. And, nope, I do not think "else" is harmful; it is as harmful as "if".
In the example posited as being hard to reason about, it can be written explicitly, but the else is still necessary.
E.g.
if a < 10:
    # condition stated explicitly
elif a > 10 and b < 10:
    # condition confusing but at least explicit
else:
    # Exactly what is true here?
    # Can be hard to reason out what condition is true
Can be written
if a < 10:
    # condition stated explicitly
elif a > 10 and b < 10:
    # condition confusing but at least explicit
elif a > 10 and b >= 10:
    # else condition
else:
    # Handle edge case with error?
I think the point with respect to try...except...else is that it is an easy mistake to use it to create inconsistent state rather than fix it. It is not that it should be avoided at all costs, but it can be counter-productive.
Consider:
try:
    file = open('somefile', 'r')
except IOError:
    logger.error("File not found!")
else:
    # Some file operations
    file.close()
# Some code that no longer explicitly references 'file'
It would be real nice to say that the above block prevented code from trying to access a file that didn't exist, or a directory for which the user has no permissions, and to say that everything is encapsulated because it is within a try...except...else block. But in reality, a lot of code in the above form really should look like this:
try:
    file = open('somefile', 'r')
except IOError:
    logger.error("File not found!")
    return False
# Some file operations
file.close()
# Some code that no longer explicitly references 'file'
You are often fooling yourself by saying that because file is no longer referenced in scope, it's okay to go on coding after the block, but in many cases something will come up where it just isn't okay. Or maybe a variable will later be created within the else block that isn't created in the except block.
This is how I would differentiate the if...else from try...except...else. In both cases, one must make the blocks parallel in most cases (variables and state set in one ought to be set in the other) but in the latter, coders often don't, likely because it's impossible or irrelevant. In such cases, it often will make a whole lot more sense to return to the caller than to try and keep working around what you think you will have in the best case scenario.
