Common pitfalls in Python [duplicate] - python

This question already has answers here:
Python 2.x gotchas and landmines [closed]
(23 answers)
Closed 3 years ago.
Today I was bitten again by mutable default arguments after many years. I usually don't use mutable default arguments unless needed, but I think with time I forgot about that. Today in the application I added tocElements=[] in a PDF generation function's argument list and now "Table of Contents" gets longer and longer after each invocation of "generate pdf". :)
What else should I add to my list of things to MUST avoid?
Always import modules the same way, e.g. from y import x and import x are treated as different modules.
Do not use range in place of lists because range() will become an iterator anyway, the following will fail:
myIndexList = [0, 1, 3]
isListSorted = myIndexList == range(3) # will fail in 3.0
isListSorted = myIndexList == list(range(3)) # will not
Same thing can be mistakenly done with xrange:
myIndexList == xrange(3)
Be careful catching multiple exception types:
try:
raise KeyError("hmm bug")
except KeyError, TypeError:
print TypeError
This prints "hmm bug", though it is not a bug; it looks like we are catching exceptions of both types, but instead we are catching KeyError only as variable TypeError, use this instead:
try:
raise KeyError("hmm bug")
except (KeyError, TypeError):
print TypeError

Don't use index to loop over a sequence
Don't :
for i in range(len(tab)) :
print tab[i]
Do :
for elem in tab :
print elem
For will automate most iteration operations for you.
Use enumerate if you really need both the index and the element.
for i, elem in enumerate(tab):
print i, elem
Be careful when using "==" to check against True or False
if (var == True) :
# this will execute if var is True or 1, 1.0, 1L
if (var != True) :
# this will execute if var is neither True nor 1
if (var == False) :
# this will execute if var is False or 0 (or 0.0, 0L, 0j)
if (var == None) :
# only execute if var is None
if var :
# execute if var is a non-empty string/list/dictionary/tuple, non-0, etc
if not var :
# execute if var is "", {}, [], (), 0, None, etc.
if var is True :
# only execute if var is boolean True, not 1
if var is False :
# only execute if var is boolean False, not 0
if var is None :
# same as var == None
Do not check if you can, just do it and handle the error
Pythonistas usually say "It's easier to ask for forgiveness than permission".
Don't :
if os.path.isfile(file_path) :
file = open(file_path)
else :
# do something
Do :
try :
file = open(file_path)
except OSError as e:
# do something
Or even better with python 2.6+ / 3:
with open(file_path) as file :
It is much better because it's much more generical. You can apply "try / except" to almost anything. You don't need to care about what to do to prevent it, just about the error you are risking.
Do not check against type
Python is dynamically typed, therefore checking for type makes you lose flexibility. Instead, use duck typing by checking behavior. E.G, you expect a string in a function, then use str() to convert any object in a string. You expect a list, use list() to convert any iterable in a list.
Don't :
def foo(name) :
if isinstance(name, str) :
print name.lower()
def bar(listing) :
if isinstance(listing, list) :
listing.extend((1, 2, 3))
return ", ".join(listing)
Do :
def foo(name) :
print str(name).lower()
def bar(listing) :
l = list(listing)
l.extend((1, 2, 3))
return ", ".join(l)
Using the last way, foo will accept any object. Bar will accept strings, tuples, sets, lists and much more. Cheap DRY :-)
Don't mix spaces and tabs
Just don't. You would cry.
Use object as first parent
This is tricky, but it will bite you as your program grows. There are old and new classes in Python 2.x. The old ones are, well, old. They lack some features, and can have awkward behavior with inheritance. To be usable, any of your class must be of the "new style". To do so, make it inherit from "object" :
Don't :
class Father :
pass
class Child(Father) :
pass
Do :
class Father(object) :
pass
class Child(Father) :
pass
In Python 3.x all classes are new style so you can declare class Father: is fine.
Don't initialize class attributes outside the __init__ method
People coming from other languages find it tempting because that what you do the job in Java or PHP. You write the class name, then list your attributes and give them a default value. It seems to work in Python, however, this doesn't work the way you think.
Doing that will setup class attributes (static attributes), then when you will try to get the object attribute, it will gives you its value unless it's empty. In that case it will return the class attributes.
It implies two big hazards :
If the class attribute is changed, then the initial value is changed.
If you set a mutable object as a default value, you'll get the same object shared across instances.
Don't (unless you want static) :
class Car(object):
color = "red"
wheels = [wheel(), Wheel(), Wheel(), Wheel()]
Do :
class Car(object):
def __init__(self):
self.color = "red"
self.wheels = [wheel(), Wheel(), Wheel(), Wheel()]

When you need a population of arrays you might be tempted to type something like this:
>>> a=[[1,2,3,4,5]]*4
And sure enough it will give you what you expect when you look at it
>>> from pprint import pprint
>>> pprint(a)
[[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
But don't expect the elements of your population to be seperate objects:
>>> a[0][0] = 2
>>> pprint(a)
[[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5],
[2, 2, 3, 4, 5]]
Unless this is what you need...
It is worth mentioning a workaround:
a = [[1,2,3,4,5] for _ in range(4)]

Python Language Gotchas -- things that fail in very obscure ways
Using mutable default arguments.
Leading zeroes mean octal. 09 is a very obscure syntax error in Python 2.x
Misspelling overridden method names in a superclass or subclass. The superclass misspelling mistake is worse, because none of the subclasses override it correctly.
Python Design Gotchas
Spending time on introspection (e.g. trying to automatically determine types or superclass identity or other stuff). First, it's obvious from reading the source. More importantly, time spent on weird Python introspection usually indicates a fundamental failure to grasp polymorphism. 80% of the Python introspection questions on SO are failure to get Polymorphism.
Spending time on code golf. Just because your mental model of your application is four keywords ("do", "what", "I", "mean"), doesn't mean you should build a hyper-complex introspective decorator-driven framework to do that. Python allows you to take DRY to a level that is silliness. The rest of the Python introspection questions on SO attempts to reduce complex problems to code golf exercises.
Monkeypatching.
Failure to actually read through the standard library, and reinventing the wheel.
Conflating interactive type-as-you go Python with a proper program. While you're typing interactively, you may lose track of a variable and have to use globals(). Also, while you're typing, almost everything is global. In proper programs, you'll never "lose track of" a variable, and nothing will be global.

Mutating a default argument:
def foo(bar=[]):
bar.append('baz')
return bar
The default value is evaluated only once, and not every time the function is called. Repeated calls to foo() would return ['baz'], ['baz', 'baz'], ['baz', 'baz', 'baz'], ...
If you want to mutate bar do something like this:
def foo(bar=None):
if bar is None:
bar = []
bar.append('baz')
return bar
Or, if you like arguments to be final:
def foo(bar=[]):
not_bar = bar[:]
not_bar.append('baz')
return not_bar

I don't know whether this is a common mistake, but while Python doesn't have increment and decrement operators, double signs are allowed, so
++i
and
--i
is syntactically correct code, but doesn't do anything "useful" or that you may be expecting.

Rolling your own code before looking in the standard library. For example, writing this:
def repeat_list(items):
while True:
for item in items:
yield item
When you could just use this:
from itertools import cycle
Examples of frequently overlooked modules (besides itertools) include:
optparse for creating command line parsers
ConfigParser for reading configuration files in a standard manner
tempfile for creating and managing temporary files
shelve for storing Python objects to disk, handy when a full fledged database is overkill

Avoid using keywords as your own identifiers.
Also, it's always good to not use from somemodule import *.

Not using functional tools. This isn't just a mistake from a style standpoint, it's a mistake from a speed standpoint because a lot of the functional tools are optimized in C.
This is the most common example:
temporary = []
for item in itemlist:
temporary.append(somefunction(item))
itemlist = temporary
The correct way to do it:
itemlist = map(somefunction, itemlist)
The just as correct way to do it:
itemlist = [somefunction(x) for x in itemlist]
And if you only need the processed items available one at a time, rather than all at once, you can save memory and improve speed by using the iterable equivalents
# itertools-based iterator
itemiter = itertools.imap(somefunction, itemlist)
# generator expression-based iterator
itemiter = (somefunction(x) for x in itemlist)

Surprised that nobody said this:
Mix tab and spaces when indenting.
Really, it's a killer. Believe me. In particular, if it runs.

If you're coming from C++, realize that variables declared in a class definition are static. You can initialize nonstatic members in the init method.
Example:
class MyClass:
static_member = 1
def __init__(self):
self.non_static_member = random()

Code Like a Pythonista: Idiomatic Python

Normal copying (assigning) is done by reference, so filling a container by adapting the same object and inserting, ends up with a container with references to the last added object.
Use copy.deepcopy instead.

Importing re and using the full regular expression approach to string matching/transformation, when perfectly good string methods exist for every common operation (e.g. capitalisation, simple matching/searching).

Using the %s formatter in error messages. In almost every circumstance, %r should be used.
For example, imagine code like this:
try:
get_person(person)
except NoSuchPerson:
logger.error("Person %s not found." %(person))
Printed this error:
ERROR: Person wolever not found.
It's impossible to tell if the person variable is the string "wolever", the unicode string u"wolever" or an instance of the Person class (which has __str__ defined as def __str__(self): return self.name). Whereas, if %r was used, there would be three different error messages:
...
logger.error("Person %r not found." %(person))
Would produce the much more helpful errors:
ERROR: Person 'wolever' not found.
ERROR: Person u'wolever' not found.
ERROR: Person not found.
Another good reason for this is that paths are a whole lot easier to copy/paste. Imagine:
try:
stuff = open(path).read()
except IOError:
logger.error("Could not open %s" %(path))
If path is some path/with 'strange' "characters", the error message will be:
ERROR: Could not open some path/with 'strange' "characters"
Which is hard to visually parse and hard to copy/paste into a shell.
Whereas, if %r is used, the error would be:
ERROR: Could not open 'some path/with \'strange\' "characters"'
Easy to visually parse, easy to copy-paste, all around better.

don't write large output messages to standard output
strings are immutable - build them not using "+" operator but rather
using str.join() function.
read those articles:
python gotchas
things to avoid
Gotchas for Python users
Python Landmines
Last link is the original one, this SO question is an duplicate.

A bad habit I had to train myself out of was using X and Y or Z for inline logic.
Unless you can 100% always guarantee that Y will be a true value, even when your code changes in 18 months time, you set yourself up for some unexpected behaviour.
Thankfully, in later versions you can use Y if X else Z.

I would stop using deprecated methods in 2.6, so that your app or script will be ready and easier to convert to Python 3.

I've started learning Python as well and one of the bigest mistakes I made is constantly using C++/C# indexed "for" loop. Python have for(i ; i < length ; i++) type loop and for a good reason - most of the time there are better ways to do there same thing.
Example:
I had a method that iterated over a list and returned the indexes of selected items:
for i in range(len(myList)):
if myList[i].selected:
retVal.append(i)
Instead Python has list comprehension that solves the same problem in a more elegant and easy to read way:
retVal = [index for index, item in enumerate(myList) if item.selected]

++n and --n may not work as expected by people coming from C or Java background.
++n is positive of a positive number, which is simply n.
--n is negative of a negative number, which is simply n.

Some personal opinions, but I find it best NOT to:
use deprecated modules (use warnings for them)
overuse classes and inheritance (typical of static languages legacy maybe)
explicitly use declarative algorithms (as iteration with for vs use of
itertools)
reimplement functions from the standard lib, "because I don't need all of those features"
using features for the sake of it (reducing compatibility with older Python versions)
using metaclasses when you really don't have to and more generally make things too "magic"
avoid using generators
(more personal) try to micro-optimize CPython code on a low-level basis. Better spend time on algorithms and then optimize by making a small C shared lib called by ctypes (it's so easy to gain 5x perf boosts on an inner loop)
use unnecessary lists when iterators would suffice
code a project directly for 3.x before the libs you need are all available (this point may be a bit controversial now!)

import this
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
import not_this
Write ugly code.
Write implicit code.
Write complex code.
Write nested code.
Write dense code.
Write unreadable code.
Write special cases.
Strive for purity.
Ignore errors and exceptions.
Write optimal code before releasing.
Every implementation needs a flowchart.
Don't use namespaces.

Never assume that having a multi-threaded Python application and a SMP capable machine (for instance one equipped with a multi-core CPU) will give you the benefit of introducing true parallelism into your application. Most likely it will not because of GIL (Global Interpreter Lock) which synchronizes your application on the byte-code interpreter level.
There are some workarounds like taking advantage of SMP by putting the concurrent code in C API calls or using multiple processes (instead of threads) via wrappers (for instance like the one available at http://www.parallelpython.org) but if one needs true multi-threading in Python one should look at things like Jython, IronPython etc. (GIL is a feature of the CPython interpreter so other implementations are not affected).
According to Python 3000 FAQ (available at Artima) the above still stands even for the latest Python versions.

Somewhat related to the default mutable argument, how one checks for the "missing" case results in differences when an empty list is passed:
def func1(toc=None):
if not toc:
toc = []
toc.append('bar')
def func2(toc=None):
if toc is None:
toc = []
toc.append('bar')
def demo(toc, func):
print func.__name__
print ' before:', toc
func(toc)
print ' after:', toc
demo([], func1)
demo([], func2)
Here's the output:
func1
before: []
after: []
func2
before: []
after: ['bar']

You've mentioned default arguments... One that's almost as bad as mutable default arguments: default values which aren't None.
Consider a function which will cook some food:
def cook(breakfast="spam"):
arrange_ingredients_for(breakfast)
heat_ingredients_for(breakfast)
serve(breakfast)
Because it specifies a default value for breakfast, it is impossible for some other function to say "cook your default breakfast" without a special-case:
def order(breakfast=None):
if breakfast is None:
cook()
else:
cook(breakfast)
However, this could be avoided if cook used None as a default value:
def cook(breakfast=None):
if breakfast is None:
breakfast = "spam"
def order(breakfast=None):
cook(breakfast)
A good example of this is Django bug #6988. Django's caching module had a "save to cache" function which looked like this:
def set(key, value, timeout=0):
if timeout == 0:
timeout = settings.DEFAULT_TIMEOUT
_caching_backend.set(key, value, timeout)
But, for the memcached backend, a timeout of 0 means "never timeout"… Which, as you can see, would be impossible to specify.

Don't modify a list while iterating over it.
odd = lambda x : bool(x % 2)
numbers = range(10)
for i in range(len(numbers)):
if odd(numbers[i]):
del numbers[i]
One common suggestion to work around this problem is to iterate over the list in reverse:
for i in range(len(numbers)-1,0,-1):
if odd(numbers[i]):
del numbers[i]
But even better is to use a list comprehension to build a new list to replace the old:
numbers[:] = [n for n in numbers if not odd(n)]

Common pitfall: default arguments are evaluated once:
def x(a, l=[]):
l.append(a)
return l
print x(1)
print x(2)
prints:
[1]
[1, 2]
i.e. you always get the same list.

The very first mistake before you even start: don't be afraid of whitespace.
When you show someone a piece of Python code, they are impressed until you tell them that they have to indent correctly. For some reason, most people feel that a language shouldn't force a certain style on them while all of them will indent the code nonetheless.

my_variable = <something>
...
my_varaible = f(my_variable)
...
use my_variable and thinking it contains the result from f, and not the initial value
Python won't warn you in any way that on the second assignment you misspelled the variable name and created a new one.

Creating a local module with the same name as one from the stdlib. This is almost always done by accident (as reported in this question), but usually results in cryptic error messages.

Promiscuous Exception Handling
This is something that I see a surprising amount in production code and it makes me cringe.
try:
do_something() # do_something can raise a lot errors e.g. files, sockets
except:
pass # who cares we'll just ignore it
Was the exception the one you want suppress, or is it more serious? But there are more subtle cases. That can make you pull your hair out trying to figure out.
try:
foo().bar().baz()
except AttributeError: # baz() may return None or an incompatible *duck type*
handle_no_baz()
The problem is foo or baz could be the culprits too. I think this can be more insidious in that this is idiomatic python where you are checking your types for proper methods. But each method call has chance to return something unexpected and suppress bugs that should be raising exceptions.
Knowing what exceptions a method can throw are not always obvious. For example, urllib and urllib2 use socket which has its own exceptions that percolate up and rear their ugly head when you least expect it.
Exception handling is a productivity boon in handling errors over system level languages like C. But I have found suppressing exceptions improperly can create truly mysterious debugging sessions and take away a major advantage interpreted languages provide.

Related

In simple cases, is it better to except & recover or to avoid the exception?

Consider:
categories = {'foo':[4], 'mer':[2, 9, 0]}
key = 'bar'
value = 5
We could safely append to a list stored in a dictionary in either of the following ways:
Being cautious, we always check whether the list exists before appending to it.
if not somedict.has_key(key):
somedict[key] = []
somedict[key].append(value)
Being direct, we simply clean up if there is an exception.
try:
somedict[key].append(value)
except KeyError:
somedict[key] = [value]
In both cases, the result could be:
{'foo':[4], 'mer':[2, 9, 0], 'bar':[5]}
To restate my question: In simple instances like this, is it better (in terms of style, efficiency, & philosophy) to be cautious or direct?
What you'll find is that your option 1 "being cautious" is often remarkably slow. Also, it's subject to obscure errors because the test you tried to write to "avoid" the exception is incorrect.
What you'll find is that your option 2 "being direct" is often much faster. It's also more likely to be correct, as well as faster and easier for people to read.
Why? Internally, Python often implements things like "contains" or "has_key" as an exception test.
def has_key( self, some_key ):
try:
self[some_key]
except KeyError:
return False
return True
Since this is typically how a has_key type of method is implemented, there's no reason for you code do waste time doing this in addition to what Python will already do.
More fundamentally, there's a correctness issue. Many attempts to prevent or avoid an exception are incomplete are incorrect.
For example, trying to establish if a string is potentially a float-point number is fraught with numerous exceptions and special cases. About the only way to do it correctly is this.
try:
x= float( some_string )
except ValueError:
# not a floating-point value
Just do the algorithm without worrying about "preventing" or "avoiding" exceptions.
In the general case, EFAP ("easier to ask for forgiveness than permission") is preferred in Python. Of course the rule of thumb "exceptions should be for exceptional cases" still holds (if you expect an exception to occur frequently, you propably should "look before you leap") - i.e. it depends. Efficiency-wise, it shouldn't make too much of a difference in most cases - when it does, consider that try blocks without exceptions are cheap and conditions are always checked.
Note that neither is necessary (at least you don't have to do it yourself/epplicitly) some cases, including this example - here, you should just use collections.defaultdict
You don't need a strong, compelling reason to use exceptions--they're not very expensive in Python. Here are some possible reasons to prefer one or the other, for your particular example:
The exception version requires a simpler API. Any container that supports item lookup and assignment (__getitem__ and __setitem__) will work. The non-exception version additionally requires that has_key be implemented.
The exception version may be slightly faster if the key usually exists, since it only requires a single dict lookup. The has_key version requires at least two--one for has_key and one for the actual lookup.
The non-exception version has a more consistent code path: it always puts the value in the array in the same place. By comparison, the exception version has a separate code path for each case.
Unless performance is particularly important (in which case you'd be benchmarking and profiling), none of these are very strong reasons; just use whichever seems more natural.
try is fast enough, except (if it happens) may not be. If the average length of those lists is going to be 1.1, use the check-first method. If it's going to be in the thousands, use try/except. If you are really worried, benchmark the alternatives.
Ensure that you are benchmarking the best alternatives. d.has_key(k) is a slow old has_been; you don't need the attribute lookup and the function call. Use k in d instead. Also use else to save a wasted append on the first trip:
Instead of:
if not somedict.has_key(key):
somedict[key] = []
somedict[key].append(value)
do this:
if key in somedict:
somedict[key].append(value)
else:
somedict[key] = [value]
You can use setdefault for this specific case:
somedict.setdefault(key, []).append(value)
See here: http://docs.python.org/library/stdtypes.html#mapping-types-dict
It depends, for exemple if the key is a paramenter of a function that will be used by an other programer, I would use the second approach, because I can't control the input, and the exception information it's actually usefull for a programer. But if its just a process inside a function and the key it's just some input from a database for exemple, the first approach it's better, then if something goes wrong, maybe show the exception information isn't helpfull at all. Use the exception approach if you want to do someting with the exception information.
EFAP is a good habit to get into for Python.
One reason is that it avoids the race condition if someone wants to use your code in a multithreaded app

Python 3 and static typing

I didn't really pay as much attention to Python 3's development as I would have liked, and only just noticed some interesting new syntax changes. Specifically from this SO answer function parameter annotation:
def digits(x:'nonnegative number') -> "yields number's digits":
# ...
Not knowing anything about this, I thought it could maybe be used for implementing static typing in Python!
After some searching, there seemed to be a lot discussion regarding (entirely optional) static typing in Python, such as that mentioned in PEP 3107, and "Adding Optional Static Typing to Python" (and part 2)
..but, I'm not clear how far this has progressed. Are there any implementations of static typing, using the parameter-annotation? Did any of the parameterised-type ideas make it into Python 3?
Thanks for reading my code!
Indeed, it's not hard to create a generic annotation enforcer in Python. Here's my take:
'''Very simple enforcer of type annotations.
This toy super-decorator can decorate all functions in a given module that have
annotations so that the type of input and output is enforced; an AssertionError is
raised on mismatch.
This module also has a test function func() which should fail and logging facility
log which defaults to print.
Since this is a test module, I cut corners by only checking *keyword* arguments.
'''
import sys
log = print
def func(x:'int' = 0) -> 'str':
'''An example function that fails type checking.'''
return x
# For simplicity, I only do keyword args.
def check_type(*args):
param, value, assert_type = args
log('Checking {0} = {1} of {2}.'.format(*args))
if not isinstance(value, assert_type):
raise AssertionError(
'Check failed - parameter {0} = {1} not {2}.'
.format(*args))
return value
def decorate_func(func):
def newf(*args, **kwargs):
for k, v in kwargs.items():
check_type(k, v, ann[k])
return check_type('<return_value>', func(*args, **kwargs), ann['return'])
ann = {k: eval(v) for k, v in func.__annotations__.items()}
newf.__doc__ = func.__doc__
newf.__type_checked = True
return newf
def decorate_module(module = '__main__'):
'''Enforces type from annotation for all functions in module.'''
d = sys.modules[module].__dict__
for k, f in d.items():
if getattr(f, '__annotations__', {}) and not getattr(f, '__type_checked', False):
log('Decorated {0!r}.'.format(f.__name__))
d[k] = decorate_func(f)
if __name__ == '__main__':
decorate_module()
# This will raise AssertionError.
func(x = 5)
Given this simplicity, it's strange at the first sight that this thing is not mainstream. However, I believe there are good reasons why it's not as useful as it might seem. Generally, type checking helps because if you add integer and dictionary, chances are you made some obvious mistake (and if you meant something reasonable, it's still better to be explicit than implicit).
But in real life you often mix quantities of the same computer type as seen by compiler but clearly different human type, for example the following snippet contains an obvious mistake:
height = 1.75 # Bob's height in meters.
length = len(sys.modules) # Number of modules imported by program.
area = height * length # What's that supposed to mean???
Any human should immediately see a mistake in the above line provided it knows the 'human type' of variables height and length even though it looks to computer as perfectly legal multiplication of int and float.
There's more that can be said about possible solutions to this problem, but enforcing 'computer types' is apparently a half-solution, so, at least in my opinion, it's worse than no solution at all. It's the same reason why Systems Hungarian is a terrible idea while Apps Hungarian is a great one. There's more at the very informative post of Joel Spolsky.
Now if somebody was to implement some kind of Pythonic third-party library that would automatically assign to real-world data its human type and then took care to transform that type like width * height -> area and enforce that check with function annotations, I think that would be a type checking people could really use!
As mentioned in that PEP, static type checking is one of the possible applications that function annotations can be used for, but they're leaving it up to third-party libraries to decide how to do it. That is, there isn't going to be an official implementation in core python.
As far as third-party implementations are concerned, there are some snippets (such as http://code.activestate.com/recipes/572161/), which seem to do the job pretty well.
EDIT:
As a note, I want to mention that checking behavior is preferable to checking type, therefore I think static typechecking is not so great an idea. My answer above is aimed at answering the question, not because I would do typechecking myself in such a way.
"Static typing" in Python can only be implemented so that the type checking is done in run-time, which means it slows down the application. Therefore you don't want that as a generality. Instead you want some of your methods to check it's inputs. This can be easily done with plain asserts, or with decorators if you (mistakenly) think you need it a lot.
There is also an alternative to static type checking, and that is to use an aspect oriented component architecture like The Zope Component Architecture. Instead of checking the type, you adapt it. So instead of:
assert isinstance(theobject, myclass)
you do this:
theobject = IMyClass(theobject)
If theobject already implements IMyClass nothing happens. If it doesn't, an adapter that wraps whatever theobject is to IMyClass will be looked up, and used instead of theobject. If no adapter is found, you get an error.
This combined the dynamicism of Python with the desire to have a specific type in a specific way.
This is not an answer to question directly, but I found out a Python fork that adds static typing: mypy-lang.org, of course one can't rely on it as it's still small endeavor, but interesting.
Sure, static typing seems a bit "unpythonic" and I don't use it all the time. But there are cases (e.g. nested classes, as in domain specific language parsing) where it can really speed up your development.
Then I prefer using beartype explained in this post*. It comes with a git repo, tests and an explanation what it can and what it can't do ... and I like the name ;)
* Please don't pay attention to Cecil's rant about why Python doesn't come with batteries included in this case.

How safe is expression evaluation using eval?

I am building a website where I have a need that user should be able to evaluate some expression based from the value in DB tables, instead of using tools like pyparsing etc, I am thinking of using python itself, and have come up with a solution which is sufficient for my purpose. I am basically using eval to evaluate the expression and passing globals dict with empty __builtins__ so that nothing can be accessed and a locals dict for values from DB, if user will need some functions I can pass those too e.g.
import datetime
def today():
return datetime.datetime.now()
expression = """ first_name.lower() == "anurag" and today().year == 2010 """
print eval(expression, {'__builtins__':{}}, {'first_name':'Anurag', 'today':today})
So my question is how safe it would be , I have three criteria
Can user access current state of my program or table etc someshow?
Can user have access to os level calls?
Can user halt my system by looping or using much memory e.g. by doing range(10*8), in some cases he can e.g 100**1000 etc so 3 is not so much of a problem. i may check such op with tokenize and anyway I will be using GAE so it is not not much of concern.
Edit: IMO this is not the duplicate of Q:661084 because where it ends this one starts, I want to know even with __builtins__ blocked, can user do bad things?
It's completely unsafe to use eval, even with built-ins emptied and blocked -- the attacker can start with a literal, get its __class__, etc, etc, up to object, its __subclasses__, and so forth... basically, Python introspection is just too strong to stand up to a skilled, determined attacker.
ast.literal_eval is safe, if you can live by its limitations...
Certainly it's possible to consume all available memory or create an infinite loop even without the builtins. There are many ways to do it such as 'a'*999999*999999 or to make an infinite loop:
>>> print eval('[[x.append(a) for a in x] for x in [[0]]]',
... {'__builtins__':{}}, {'first_name':'Anurag', 'today':today})
As for 1) and 2), I'm not sure but it looks risky. Here is one thing that I tried that I thought would work, but it seems that someone else already considered that line of attack and blocked it:
>>> import datetime
>>> def today():
>>> return datetime.datetime.now()
>>>
>>> print eval('today.func_globals', {'__builtins__':{}}, {'first_name':'Anurag', 'today':today})
RuntimeError: restricted attribute
I was half expecting to get this instead:
{'__builtins__': <module '__builtin__' (built-in)>, ...
So I think it's probably a bad idea. You only need one tiny hole and you give access to your entire system. Have you considered other methods that don't use eval? What is wrong with them?
It is possible to get create and invoke any class defined in the program, which includes ones that can exit the Python interpreter. In addition, you can create and execute arbitrary strings of bytecode, which can segfault the interpreter. See Eval really is dangerous for all the details.

What are the important language features (idioms) of Python to learn early on [duplicate]

This question already has answers here:
The Zen of Python [closed]
(22 answers)
Python: Am I missing something? [closed]
(16 answers)
Closed 8 years ago.
I would be interested in knowing what the StackOverflow community thinks are the important language features (idioms) of Python. Features that would define a programmer as Pythonic.
Python (pythonic) idiom - "code expression" that is natural or characteristic to the language Python.
Plus, Which idioms should all Python programmers learn early on?
Thanks in advance
Related:
Code Like a Pythonista: Idiomatic Python
Python: Am I missing something?
Python is a language that can be described as:
"rules you can fit in the
palm of your hand with a huge bag of
hooks".
Nearly everything in python follows the same simple standards. Everything is accessible, changeable, and tweakable. There are very few language level elements.
Take for example, the len(data) builtin function. len(data) works by simply checking for a data.__len__() method, and then calls it and returns the value. That way, len() can work on any object that implements a __len__() method.
Start by learning about the types and basic syntax:
Dynamic Strongly Typed Languages
bool, int, float, string, list, tuple, dict, set
statements, indenting, "everything is an object"
basic function definitions
Then move on to learning about how python works:
imports and modules (really simple)
the python path (sys.path)
the dir() function
__builtins__
Once you have an understanding of how to fit pieces together, go back and cover some of the more advanced language features:
iterators
overrides like __len__ (there are tons of these)
list comprehensions and generators
classes and objects (again, really simple once you know a couple rules)
python inheritance rules
And once you have a comfort level with these items (with a focus on what makes them pythonic), look at more specific items:
Threading in python (note the Global Interpreter Lock)
context managers
database access
file IO
sockets
etc...
And never forget The Zen of Python (by Tim Peters)
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
This page covers all the major python idioms: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
An important idiom in Python is docstrings.
Every object has a __doc__ attribute that can be used to get help on that object. You can set the __doc__ attribute on modules, classes, methods, and functions like this:
# this is m.py
""" module docstring """
class c:
"""class docstring"""
def m(self):
"""method docstring"""
pass
def f(a):
"""function f docstring"""
return
Now, when you type help(m), help(m.f) etc. it will print the docstring as a help message.
Because it's just part of normal object introspection this can be used by documention generating systems like epydoc or used for testing purposes by unittest.
It can also be put to more unconventional (i.e. non-idiomatic) uses such as grammars in Dparser.
Where it gets even more interesting to me is that, even though doc is a read-only attribute on most objects, you can use them anywhere like this:
x = 5
""" pseudo docstring for x """
and documentation tools like epydoc can pick them up and format them properly (as opposed to a normal comment which stays inside the code formatting.
Decorators get my vote. Where else can you write something like:
def trace(num_args=0):
def wrapper(func):
def new_f(*a,**k):
print_args = ''
if num_args > 0:
print_args = str.join(',', [str(x) for x in a[0:num_args]])
print('entering %s(%s)' %(f.__name__,print_args))
rc = f(*a,**k)
if rc is not None:
print('exiting %s(%s)=%s' %(f.__name__,str(rc)))
else:
print('exiting %s(%s)' %(f.__name__))
return rc
return new_f
return wrapper
#trace(1)
def factorial(n):
if n < 2:
return 1
return n * factorial(n-1)
factorial(5)
and get output like:
entering factorial(5)
entering factorial(4)
entering factorial(3)
entering factorial(2)
entering factorial(1)
entering factorial(0)
exiting factorial(0)=1
exiting factorial(1)=1
exiting factorial(2)=2
exiting factorial(3)=6
exiting factorial(4)=24
exiting factorial(5)=120
Everything connected to list usage.
Comprehensions, generators, etc.
Personally, I really like Python syntax defining code blocks by using indentation, and not by the words "BEGIN" and "END" (as in Microsoft's Basic and Visual Basic - I don't like these) or by using left- and right-braces (as in C, C++, Java, Perl - I like these).
This really surprised me because, although indentation has always been very important to me, I didn't make to much "noise" about it - I lived with it, and it is considered a skill to be able to read other peoples, "spaghetti" code. Furthermore, I never heard another programmer suggest making indentation a part of a language. Until Python! I only wish I had realized this idea first.
To me, it is as if Python's syntax forces you to write good, readable code.
Okay, I'll get off my soap-box. ;-)
From a more advanced viewpoint, understanding how dictionaries are used internally by Python. Classes, functions, modules, references are all just properties on a dictionary. Once this is understood it's easy to understand how to monkey patch and use the powerful __gettattr__, __setattr__, and __call__ methods.
Here's one that can help. What's the difference between:
[ foo(x) for x in range(0, 5) ][0]
and
( foo(x) for x in range(0, 5) ).next()
answer:
in the second example, foo is called only once. This may be important if foo has a side effect, or if the iterable being used to construct the list is large.
Two things that struck me as especially Pythonic were dynamic typing and the various flavors of lists used in Python, particularly tuples.
Python's list obsession could be said to be LISP-y, but it's got its own unique flavor. A line like:
return HandEvaluator.StraightFlush, (PokerCard.longFaces[index + 4],
PokerCard.longSuits[flushSuit]), []
or even
return False, False, False
just looks like Python and nothing else. (Technically, you'd see the latter in Lua as well, but Lua is pretty Pythonic in general.)
Using string substitutions:
name = "Joe"
age = 12
print "My name is %s, I am %s" % (name, age)
When I'm not programming in python, that simple use is what I miss most.
Another thing you cannot start early enough is probably testing. Here especially doctests are a great way of testing your code by explaining it at the same time.
doctests are simple text file containing an interactive interpreter session plus text like this:
Let's instantiate our class::
>>> a=Something(text="yes")
>>> a.text
yes
Now call this method and check the results::
>>> a.canify()
>>> a.text
yes, I can
If e.g. a.text returns something different the test will fail.
doctests can be inside docstrings or standalone textfiles and are executed by using the doctests module. Of course the more known unit tests are also available.
I think that tutorials online and books only talk about doing things, not doing things in the best way. Along with the python syntax i think that speed in some cases is important.
Python provides a way to benchmark functions, actually two!!
One way is to use the profile module, like so:
import profile
def foo(x, y, z):
return x**y % z # Just an example.
profile.run('foo(5, 6, 3)')
Another way to do this is to use the timeit module, like this:
import timeit
def foo(x, y, z):
return x**y % z # Can also be 'pow(x, y, z)' which is way faster.
timeit.timeit('foo(5, 6, 3)', 'from __main__ import *', number = 100)
# timeit.timeit(testcode, setupcode, number = number_of_iterations)

Why should functions always return the same type?

I read somewhere that functions should always return only one type
so the following code is considered as bad code:
def x(foo):
if 'bar' in foo:
return (foo, 'bar')
return None
I guess the better solution would be
def x(foo):
if 'bar' in foo:
return (foo, 'bar')
return ()
Wouldn't it be cheaper memory wise to return a None then to create a new empty tuple or is this time difference too small to notice even in larger projects?
Why should functions return values of a consistent type? To meet the following two rules.
Rule 1 -- a function has a "type" -- inputs mapped to outputs. It must return a consistent type of result, or it isn't a function. It's a mess.
Mathematically, we say some function, F, is a mapping from domain, D, to range, R. F: D -> R. The domain and range form the "type" of the function. The input types and the result type are as essential to the definition of the function as is the name or the body.
Rule 2 -- when you have a "problem" or can't return a proper result, raise an exception.
def x(foo):
if 'bar' in foo:
return (foo, 'bar')
raise Exception( "oh, dear me." )
You can break the above rules, but the cost of long-term maintainability and comprehensibility is astronomical.
"Wouldn't it be cheaper memory wise to return a None?" Wrong question.
The point is not to optimize memory at the cost of clear, readable, obvious code.
It's not so clear that a function must always return objects of a limited type, or that returning None is wrong. For instance, re.search can return a _sre.SRE_Match object or a NoneType object:
import re
match=re.search('a','a')
type(match)
# <type '_sre.SRE_Match'>
match=re.search('a','b')
type(match)
# <type 'NoneType'>
Designed this way, you can test for a match with the idiom
if match:
# do xyz
If the developers had required re.search to return a _sre.SRE_Match object, then
the idiom would have to change to
if match.group(1) is None:
# do xyz
There would not be any major gain by requiring re.search to always return a _sre.SRE_Match object.
So I think how you design the function must depend on the situation and in particular, how you plan to use the function.
Also note that both _sre.SRE_Match and NoneType are instances of object, so in a broad sense they are of the same type. So the rule that "functions should always return only one type" is rather meaningless.
Having said that, there is a beautiful simplicity to functions that return objects which all share the same properties. (Duck typing, not static typing, is the python way!) It can allow you to chain together functions: foo(bar(baz))) and know with certainty the type of object you'll receive at the other end.
This can help you check the correctness of your code. By requiring that a function returns only objects of a certain limited type, there are fewer cases to check. "foo always returns an integer, so as long as an integer is expected everywhere I use foo, I'm golden..."
Best practice in what a function should return varies greatly from language to language, and even between different Python projects.
For Python in general, I agree with the premise that returning None is bad if your function generally returns an iterable, because iterating without testing becomes impossible. Just return an empty iterable in this case, it will still test False if you use Python's standard truth testing:
ret_val = x()
if ret_val:
do_stuff(ret_val)
and still allow you to iterate over it without testing:
for child in x():
do_other_stuff(child)
For functions that are likely to return a single value, I think returning None is perfectly acceptable, just document that this might happen in your docstring.
Here are my thoughts on all that and I'll try to also explain why I think that the accepted answer is mostly incorrect.
First of all programming functions != mathematical functions. The closest you can get to mathematical functions is if you do functional programming but even then there are plenty of examples that say otherwise.
Functions do not have to have input
Functions do not have to have output
Functions do not have to map input to output (because of the previous two bullet points)
A function in terms of programming is to be viewed simply as a block of memory with a start (the function's entry point), a body (empty or otherwise) and exit point (one or multiple depending on the implementation) all of which are there for the purpose of reusing code that you've written. Even if you don't see it a function always "returns" something. This something is actually the address of next statement right after the function call. This is something you will see in all of its glory if you do some really low-level programming with an Assembly language (I dare you to go the extra mile and do some machine code by hand like Linus Torvalds who ever so often mentions this during his seminars and interviews :D). In addition you can also take some input and also spit out some output. That is why
def foo():
pass
is a perfectly correct piece of code.
So why would returning multiple types be bad? Well...It isn't at all unless you abuse it. This is of course a matter of poor programming skills and/or not knowing what the language you're using can do.
Wouldn't it be cheaper memory wise to return a None then to create a new empty tuple or is this time difference too small to notice even in larger projects?
As far as I know - yes, returning a NoneType object would be much cheaper memory-wise. Here is a small experiment (returned values are bytes):
>> sys.getsizeof(None)
16
>> sys.getsizeof(())
48
Based on the type of object you are using as your return value (numeric type, list, dictionary, tuple etc.) Python manages the memory in different ways including the initially reserved storage.
However you have to also consider the code that is around the function call and how it handles whatever your function returns. Do you check for NoneType? Or do you simply check if the returned tuple has length of 0? This propagation of the returned value and its type (NoneType vs. empty tuple in your case) might actually be more tedious to handle and blow up in your face. Don't forget - the code itself is loaded into memory so if handling the NoneType requires too much code (even small pieces of code but in a large quantity) better leave the empty tuple, which will also avoid confusion in the minds of people using your function and forgetting that it actually returns 2 types of values.
Speaking of returning multiple types of value this is the part where I agree with the accepted answer (but only partially) - returning a single type makes the code more maintainable without a doubt. It's much easier to check only for type A then A, B, C, ... etc.
However Python is an object-oriented language and as such inheritance, abstract classes etc. and all that is part of the whole OOP shenanigans comes into play. It can go as far as even generating classes on-the-fly, which I have discovered a few months ago and was stunned (never seen that stuff in C/C++).
Side note: You can read a little bit about metaclasses and dynamic classes in this nice overview article with plenty of examples.
There are in fact multiple design patterns and techniques that wouldn't even exists without the so called polymorphic functions. Below I give you two very popular topics (can't find a better way to summarize both in a single term):
Duck typing - often part of the dynamic typing languages which Python is a representative of
Factory method design pattern - basically it's a function that returns various objects based on the input it receives.
Finally whether your function returns one or multiple types is totally based on the problem you have to solve. Can this polymorphic behaviour be abused? Sure, like everything else.
I personally think it is perfectly fine for a function to return a tuple or None. However, a function should return at most 2 different types and the second one should be a None. A function should never return a string and list for example.
If x is called like this
foo, bar = x(foo)
returning None would result in a
TypeError: 'NoneType' object is not iterable
if 'bar' is not in foo.
Example
def x(foo):
if 'bar' in foo:
return (foo, 'bar')
return None
foo, bar = x(["foo", "bar", "baz"])
print foo, bar
foo, bar = x(["foo", "NOT THERE", "baz"])
print foo, bar
This results in:
['foo', 'bar', 'baz'] bar
Traceback (most recent call last):
File "f.py", line 9, in <module>
foo, bar = x(["foo", "NOT THERE", "baz"])
TypeError: 'NoneType' object is not iterable
Premature optimization is the root of all evil. The minuscule efficiency gains might be important, but not until you've proven that you need them.
Whatever your language: a function is defined once, but tends to be used at any number of places. Having a consistent return type (not to mention documented pre- and postconditions) means you have to spend more effort defining the function, but you simplify the usage of the function enormously. Guess whether the one-time costs tend to outweigh the repeated savings...?

Categories