When are function-local variables created? For example, in the following code, is the dictionary d1 created each time the function f1 is called, or only once when it is compiled?
def f1():
    d1 = {1: 2, 3: 4}
    return id(d1)
d2 = {1: 2, 3: 4}

def f2():
    return id(d2)
Is it faster, in general, to define a dictionary within function scope or to define it globally (assuming the dictionary is used only in that function)? I know it is slower to look up global symbols than local ones, but what if the dictionary is large?
Much Python code I've seen seems to define these dictionaries globally, which would seem suboptimal. But in the case where you have a class with multiple 'encoding' methods, each with a unique (large-ish) lookup dictionary, it is also awkward to have the code and data spread throughout the file.
Local variables are created when assigned to, i.e., during the execution of the function.
If every execution of the function needs (and does not modify!-) the same dict, creating it once, before the function is ever called, is faster. As an alternative to a global variable, a fake argument with a default value is even (marginally) faster, since it's accessed as fast as a local variable but also created only once (at def time):
def f(x, y, _d={1:2, 3:4}):
I'm using the name _d, with a leading underscore, to point out that it's meant as a private implementation detail of the function. Nevertheless it's a bit fragile, as a bumbling caller might accidentally and erroneously pass three arguments (the third one would be bound as _d within the function, likely causing bugs), or the function's body might mistakenly alter _d, so this is only recommended as an optimization to use when profiling reveals it's really needed. A global dict is also subject to erroneous alterations, so, even though it's faster than building a local dict afresh on every call, you might still pick the latter possibility to achieve higher robustness (although the global dict solution, plus good unit tests to catch any "oops"es in the function, are the recommended alternative;-).
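If you want to see the difference for yourself, here is a minimal sketch (the function names and iteration count are purely illustrative; absolute timings depend on your interpreter and version) comparing the three variants with timeit:

import timeit

GLOBAL_D = {1: 2, 3: 4}

def local_dict():
    d = {1: 2, 3: 4}      # rebuilt on every call
    return d[1]

def global_dict():
    return GLOBAL_D[1]    # one global name lookup per call

def default_dict(_d={1: 2, 3: 4}):
    return _d[1]          # built once at def time, accessed like a local

for fn in (local_dict, global_dict, default_dict):
    print(fn.__name__, timeit.timeit(fn, number=1000000))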
If you look at the disassembly with the dis module you'll see that the creation and filling of d1 is done on every execution. Given that dictionaries are mutable this is unlikely to change anytime soon, at least until good escape analysis comes to Python virtual machines. On the other hand lookup of global constants will get speculatively optimized with the next generation of Python VM's such as unladen-swallow (the speculation part is that they are constant).
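You can check this yourself with a short sketch (the exact opcode names vary across CPython versions, but the dict construction shows up either way):

import dis

def f1():
    d1 = {1: 2, 3: 4}
    return id(d1)

dis.dis(f1)
# The output contains a map-building opcode (e.g. BUILD_MAP or BUILD_CONST_KEY_MAP),
# executed on every call -- the dict is not stored as a precomputed constant.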
Speed is relative to what you're doing. If you're writing a database-intensive application, I doubt your application is going to suffer one way or another from your choice of global versus local variables. Use a profiler to be sure. ;-)
As Alex noted, the locals are initialized when the function is called. An easy way to demonstrate this for yourself:
import random

def f():
    d = [random.randint(1, 100), random.randint(100, 1000)]
    print(d)
f()
f()
f()
Related
I often find myself writing code that takes a set of parameters, does some calculation, and then returns the result to another function, which also requires some of the parameters to do some other manipulation, and so on. I end up with a lot of functions where I have to pass around parameters, such as f(x, y, N, epsilon), which then calls g(y, N, epsilon), and so on. All the while I have to include the parameters N and epsilon in every function call and not lose track of them, which is quite tedious.
What I want is to prevent this endless passing around of parameters, while still being able, within a single for loop, to change some of these parameters, e.g.
for epsilon in [1, 2, 3]:
    f(..., epsilon)
I usually have around 10 parameters to keep track of (these are physics problems) and do not know beforehand which I have to vary and which I can keep to a default.
The options I thought of are
Creating a global settings = {'epsilon': 1, 'N': 100} object, which is used by every function. However, I have always been told that putting stuff in the global namespace is bad. I am also afraid that this will not play nice with modifying the settings object within the for loop.
Passing around a settings object as a parameter in every function. This means that I can keep track of the object as it is passed around, and it plays nicely with the for loop. However, it is still passing around, which seems stupid to me.
Is there another, third option that I have not considered? Most of the solutions I can find are for the case where your settings are set only once, as you start up the program, and are then unchanged.
I believe this is primarily a matter of preference among coding styles. I'm going to offer my opinion on the ones you posted as well as some other alternatives.
First, creating a global settings variable is not bad in itself. Problems arise if global settings are treated as mutable state rather than being immutable. Since you want to modify parameters on the fly, it's a dangerous option.
Second, passing the settings around is quite common in functional languages and it's not stupid, although it can look clumsy if you're not used to it. The advantage of passing state this way is that you can isolate changes in the settings dictionary you pass around without corrupting the original one; the bad thing is that Python messes a bit with immutability because of shared references, and you can end up making many deepcopy calls to prevent data races, which is totally inefficient. Unless your dict is flat (not nested), I would not go that way.
import copy

settings = {'epsilon': 1, 'N': 100}

# Unsafe but OK for a flat dict: build a modified shallow copy each iteration.
for x in [1, 2, 3]:
    f(..., dict(settings, epsilon=x))

# Safe way: mutate a deep copy so the original settings are never touched.
ephemeral = copy.deepcopy(settings)
for x in [1, 2, 3]:
    ephemeral['epsilon'] = x
    f(..., ephemeral)
Now, there's another option which kind of mixes the other two, and probably the one I'd take. Make a global immutable settings variable and write your function signatures to accept optional keyword arguments. This way you get the advantages of both: the ability to avoid continually passing variables around, and the ability to modify values on the fly without data races:
def f(..., **kwargs):
    epsilon = kwargs.get('epsilon', settings['epsilon'])
    ...
You may also create a function that encapsulates the aforementioned behavior to decouple variable extraction from function definition. There are many possibilities.
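For example, here is a minimal sketch of such a helper (the decorator name with_settings is made up for illustration; it simply fills in any keyword arguments you did not pass from the global settings dict):

import functools

settings = {'epsilon': 1, 'N': 100}

def with_settings(func):
    """Fill missing keyword arguments from the global settings dict."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for key, value in settings.items():
            kwargs.setdefault(key, value)
        return func(*args, **kwargs)
    return wrapper

@with_settings
def f(x, epsilon=None, N=None):
    return x * epsilon + N

print(f(2))             # falls back to settings: 2 * 1 + 100 = 102
print(f(2, epsilon=3))  # override on the fly:    2 * 3 + 100 = 106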
Hope this helps.
I had a very difficult time understanding the root cause of a problem in an algorithm. Then, by simplifying the functions step by step, I found out that evaluation of default arguments in Python doesn't behave as I expected.
The code is as follows:
class Node(object):
    def __init__(self, children=[]):
        self.children = children
The problem is that every instance of the Node class shares the same children attribute if the attribute is not given explicitly, such as:
>>> n0 = Node()
>>> n1 = Node()
>>> id(n1.children)
25000176
>>> id(n0.children)
25000176
I don't understand the logic behind this design decision. Why did the Python designers decide that default arguments are to be evaluated at definition time? This seems very counter-intuitive to me.
The alternative would be quite heavyweight -- storing "default argument values" in the function object as "thunks" of code to be executed over and over again every time the function is called without a specified value for that argument -- and would make it much harder to get early binding (binding at def time), which is often what you want. For example, in Python as it exists:
def ack(m, n, _memo={}):
    key = m, n
    if key not in _memo:
        if m == 0: v = n + 1
        elif n == 0: v = ack(m - 1, 1)
        else: v = ack(m - 1, ack(m, n - 1))
        _memo[key] = v
    return _memo[key]
...writing a memoized function like the above is quite an elementary task. Similarly:
for i in range(len(buttons)):
    buttons[i].onclick(lambda i=i: say('button %s', i))
...the simple i=i, relying on the early-binding (definition time) of default arg values, is a trivially simple way to get early binding. So, the current rule is simple, straightforward, and lets you do all you want in a way that's extremely easy to explain and understand: if you want late binding of an expression's value, evaluate that expression in the function body; if you want early binding, evaluate it as the default value of an arg.
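A stripped-down illustration of that difference (nothing here is specific to buttons; the lists are just for demonstration):

late = [lambda: i for i in range(3)]        # late binding: every lambda sees the final i
early = [lambda i=i: i for i in range(3)]   # early binding via the default-arg trick

print([f() for f in late])    # [2, 2, 2]
print([f() for f in early])   # [0, 1, 2]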
The alternative, forcing late binding for both situations, would not offer this flexibility, and would force you to go through hoops (such as wrapping your function into a closure factory) every time you needed early binding, as in the above examples -- yet more heavyweight boilerplate forced on the programmer by this hypothetical design decision (beyond the "invisible" ones of generating and repeatedly evaluating thunks all over the place).
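The closure-factory hoop would look roughly like this sketch (make_handler and the string formatting are illustrative only):

def make_handler(i):
    # freeze the current value of i by making it a local of the factory
    def handler():
        return 'button %s' % i
    return handler

handlers = [make_handler(i) for i in range(3)]
print([h() for h in handlers])   # ['button 0', 'button 1', 'button 2']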
In other words, "There should be one, and preferably only one, obvious way to do it [1]": when you want late binding, there's already a perfectly obvious way to achieve it (since all of the function's code is only executed at call time, obviously everything evaluated there is late-bound); having default-arg evaluation produce early binding gives you an obvious way to achieve early binding as well (a plus!-) rather than giving TWO obvious ways to get late binding and no obvious way to get early binding (a minus!-).
[1]: "Although that way may not be obvious at first unless you're Dutch."
The issue is this.
It's too expensive to evaluate a function as an initializer every time the function is called.
0 is a simple literal. Evaluate it once, use it forever.
int is a function (like list) that would have to be evaluated each time it's required as an initializer.
The construct [] is a literal, like 0, that means "this exact object".
The problem is that some people expect it to mean list, as in "evaluate this function for me, please, to get the object that is the initializer".
It would be a crushing burden to add the necessary if statement to do this evaluation all the time. It's better to take all argument defaults as literal values and not do any additional function evaluation as part of calling the function.
Also, more fundamentally, it's technically impossible to implement argument defaults as function evaluations.
Consider, for a moment, the recursive horror of this kind of circularity. Let's say that instead of default values being literals, we allowed them to be functions which are evaluated each time a parameter's default value is required.
[This would parallel the way collections.defaultdict works; see the short sketch after this example.]
def aFunc(a=another_func):
    return a * 2

def another_func(b=aFunc):
    return b * 3
What is the value of another_func()? To get the default for b, it must evaluate aFunc, which requires an eval of another_func. Oops.
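For reference, the lazily-evaluated-factory behaviour mentioned above is exactly what collections.defaultdict provides, but there the factory is called once per missing key rather than on every function call:

from collections import defaultdict

d = defaultdict(list)   # list() is evaluated each time a missing key is accessed
d['a'].append(1)
d['b'].append(2)
print(d['a'], d['b'])   # [1] [2] -- two distinct lists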
Of course, in your situation it is difficult to understand. But you must see that evaluating default args on every call would lay a heavy runtime burden on the system.
Also, you should know that this problem can occur with container types -- but you can circumvent it by making the thing explicit:
def __init__(self, children=None):
    if children is None:
        children = []
    self.children = children
The workaround for this, discussed here (and very solid), is:
class Node(object):
    def __init__(self, children=None):
        self.children = [] if children is None else children
As for why: look for an answer from von Löwis, but it's likely because the function definition makes a code object due to the architecture of Python, and there might not be a facility for working with reference types like this in default arguments.
I thought this was counterintuitive too, until I learned how Python implements default arguments.
A function is an object. When the def statement is executed, Python creates the function object, evaluates the defaults in the def statement, puts them into a tuple, and adds that tuple as an attribute of the function named func_defaults. Then, when the function is called, if the call doesn't provide a value, Python grabs the default value out of func_defaults.
For instance:
>>> class C():
...     pass
...
>>> def f(x=C()):
...     pass
...
>>> f.func_defaults
(<__main__.C instance at 0x0298D4B8>,)
So all calls to f that don't provide an argument will use the same instance of C, because that's the default value.
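(Side note: func_defaults is the Python 2 spelling; in Python 3 the same tuple is exposed under a dunder name, so the check becomes:

>>> f.__defaults__
(<__main__.C object at 0x0298D4B8>,)

The instance address is, of course, whatever your interpreter happens to print.)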
As far as why Python does it this way: well, that tuple could contain functions that would get called every time a default argument value was needed. Apart from the immediately obvious problem of performance, you start getting into a universe of special cases, like storing literal values instead of functions for non-mutable types to avoid unnecessary function calls. And of course there are performance implications galore.
The actual behavior is really simple. And there's a trivial workaround, in the case where you want a default value to be produced by a function call at runtime:
def f(x=None):
    if x is None:
        x = g()
This comes from Python's emphasis on syntax and execution simplicity. A def statement occurs at a certain point during execution. When the Python interpreter reaches that point, it evaluates the code in that line, and then creates a code object from the body of the function, which will be run later, when you call the function.
It's a simple split between function declaration and function body. The declaration is executed when it is reached in the code. The body is executed at call time. Note that the declaration is executed every time it is reached, so you can create multiple functions by looping.
funcs = []
for x in xrange(5):
    def foo(x=x, lst=[]):
        lst.append(x)
        return lst
    funcs.append(foo)

for func in funcs:
    print "1: ", func()
    print "2: ", func()
Five separate functions have been created, with a separate list created each time the function declaration was executed. On each pass through funcs, the same function is executed twice, using the same list each time. This gives the results:
1: [0]
2: [0, 0]
1: [1]
2: [1, 1]
1: [2]
2: [2, 2]
1: [3]
2: [3, 3]
1: [4]
2: [4, 4]
Others have given you the workaround of using param=None and assigning a list in the body if the value is None, which is fully idiomatic Python. It's a little ugly, but the simplicity is powerful, and the workaround is not too painful.
Edited to add: For more discussion on this, see effbot's article here: http://effbot.org/zone/default-values.htm, and the language reference, here: http://docs.python.org/reference/compound_stmts.html#function
I'll provide a dissenting opinion, by addressing the main arguments in the other posts.
Evaluating default arguments when the function is executed would be bad for performance.
I find this hard to believe. If default argument assignments like foo='some_string' really add an unacceptable amount of overhead, I'm sure it would be possible to identify assignments to immutable literals and precompute them.
If you want a default assignment with a mutable object like foo = [], just use foo = None, followed by foo = foo or [] in the function body.
While this may be unproblematic in individual instances, as a design pattern it's not very elegant. It adds boilerplate code and obscures default argument values. Patterns like foo = foo or ... don't work if foo can be an object like a numpy array with undefined truth value. And in situations where None is a meaningful argument value that may be passed intentionally, it can't be used as a sentinel and this workaround becomes really ugly.
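In those cases the usual escape hatch is an even more verbose private sentinel object; a minimal sketch (the names _MISSING and record are purely illustrative):

_MISSING = object()   # a unique sentinel no caller can produce by accident

def record(value, tag=_MISSING):
    # None is a legitimate tag here, so it cannot double as the "not given" marker
    if tag is _MISSING:
        tag = 'default'
    return (value, tag)

print(record(1))        # (1, 'default')
print(record(1, None))  # (1, None) -- an intentionally passed None is preserved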
The current behaviour is useful for mutable default objects that should be shared across function calls.
I would be happy to see evidence to the contrary, but in my experience this use case is much less frequent than mutable objects that should be created anew every time the function is called. To me it also seems like a more advanced use case, whereas accidental default assignments with empty containers are a common gotcha for new Python programmers. Therefore, the principle of least astonishment suggests default argument values should be evaluated when the function is executed.
In addition, it seems to me that there exists an easy workaround for mutable objects that should be shared across function calls: initialise them outside the function.
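That is, something along these lines (a tiny sketch; the names are illustrative):

_cache = {}   # shared across calls because it lives at module level

def cached_square(n):
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]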
So I would argue that this was a bad design decision. My guess is that it was chosen because its implementation is actually simpler and because it has a valid (albeit limited) use case. Unfortunately, I don't think this will ever change, since the core Python developers want to avoid a repeat of the amount of backwards incompatibility that Python 3 introduced.
Python function definitions are just code, like all the other code; they're not "magical" in the way that some languages are. For example, in Java you could refer "now" to something defined "later":
public static void foo() { bar(); }
public static void main(String[] args) { foo(); }
public static void bar() {}
but in Python
def foo(): bar()
foo() # boom! "bar" has no binding yet
def bar(): pass
foo() # ok
So, the default argument is evaluated at the moment that line of code is evaluated!
Because if they had, then someone would post a question asking why it wasn't the other way around :-p
Suppose now that they had. How would you implement the current behaviour if needed? It's easy to create new objects inside a function, but you cannot "uncreate" them (you can delete them, but it's not the same).
A pure function is a function similar to a Mathematical function, where there is no interaction with the "Real world" nor side-effects. From a more practical point of view, it means that a pure function can not:
Print or otherwise show a message
Be random
Depend on system time
Change global variables
And others
All these limitations make it easier to reason about pure functions than about non-pure ones. The majority of functions should therefore be pure, so that the program can have fewer bugs.
In languages with a rich type system like Haskell's, the reader can know right from the start whether a function is pure or not, which makes subsequent reading easier.
In Python this information may be emulated by a @pure decorator put on top of the function. I would also like that decorator to actually do some validation work. My problem lies in the implementation of such a decorator.
Right now I simply look through the source code of the function for buzzwords such as global or random or print and complain if I find one of them.
import inspect

def pure(function):
    source = inspect.getsource(function)
    for non_pure_indicator in ('random', 'time', 'input', 'print', 'global'):
        if non_pure_indicator in source:
            raise ValueError("The function {} is not pure as it uses `{}`".format(
                function.__name__, non_pure_indicator))
    return function
However, it feels like a weird hack that may or may not work depending on your luck. Could you please help me write a better decorator?
I kind of see where you are coming from but I don't think this can work. Let's take a simple example:
def add(a, b):
    return a + b
So this probably looks "pure" to you. But in Python the + here is an arbitrary function which can do anything, just depending on the bindings in force when it is called. So that a + b can have arbitrary side effects.
But it's even worse than that. Even if this is just doing standard integer +, there's more 'impure' stuff going on.
The + is creating a new object. Now if you are sure that only the caller has a reference to that new object then there is a sense in which you can think of this as a pure function. But you can't be sure that, during the creation process of that object, no reference to it leaked out.
For example:
class RegisteredNumber(int):

    numbers = []

    def __new__(cls, *args, **kwargs):
        self = int.__new__(cls, *args, **kwargs)
        self.numbers.append(self)
        return self

    def __add__(self, other):
        return RegisteredNumber(super().__add__(other))

c = RegisteredNumber(1) + 2
print(RegisteredNumber.numbers)
This will show that the supposedly pure add function has actually changed the state of the RegisteredNumber class. This is not a stupidly contrived example: in my production code base we have classes which track each created instance, for example, to allow access via key.
The notion of purity just doesn't make much sense in Python.
(not an answer, but too long for a comment)
So if a function can return different values for the same set of arguments, it is not pure?
Remember that functions in Python are objects, so you want to check the purity of an object...
Take this example:
def foo(x):
    ret, foo.x = x*x + foo.x, foo.x + 1
    return ret

foo.x = 0
calling foo(3) repeatedly gives:
>>> foo(3)
9
>>> foo(3)
10
>>> foo(3)
11
...
Moreover, reading globals does not require the global statement or the globals() builtin inside your function. Global variables might change somewhere else, affecting the purity of your function.
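A small example of that (illustrative names only):

factor = 2

def scale(x):
    return x * factor   # reads a global with no `global` statement in sight

print(scale(3))   # 6
factor = 10
print(scale(3))   # 30 -- same argument, different result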
All of the above situations might be difficult to detect at runtime.
If I do this:
newvar = raw_input()
globals()[newvar] = 4
It is clear that the resulting variable is created at runtime, simply because it's the only possibility. However, if I do this:
globals()['y'] = 3
It seems that y is also created at runtime. Why is that the case? Where does the dynamic behavior come from?
PS: I am aware that this is a bad practice, I just want to understand.
Your module's (or exec context's, etc.) globals are a dict, and globals() just returns that dict. After that, the ['y'] = 3 part is just like any other dictionary assignment.
If you're asking why Python doesn't optimize this into a static assignment… well, think about what it would have to do.
First, detecting that 'y' is a literal is pretty easy; that information is right there in the AST.
But detecting that the dict is your module's global dictionary is a lot harder. globals isn't a keyword or anything else magical, it's just a regular function in builtins. You could hide it with a global, nonlocal, or local name, or monkeypatch builtins, or even replace the builtins for your globals so it's not accessible. So, it would have to do sufficient analysis to determine that there's no way the name lookup on globals could possibly return anything but the appropriate global dict.
And, not only that, in order to make this useful, the language would have to require every implementation to make the same optimization. Otherwise, you could get different semantics from some programs based on whether the optimization took place.
It's also worth keeping in mind that CPython doesn't do anything beyond basic peephole optimization, so you'd have to build the infrastructure for a more complicated optimizer from scratch just to add this one minor change.
On top of that, references to the same global dictionary are stored all over the place. So, even with this optimization, you could still trick Python just as easily:
g = globals()
g['y'] = 3

import builtins
builtins.globals()['y'] = 3   # the builtins module holds another reference to globals

def f(): pass
f.__globals__['y'] = 3

import inspect
inspect.currentframe().f_globals['y'] = 3
I've just run into the following code. It works fine, but it seems weird to me since it's not even a closure. I'm wondering whether it's the right way to code in terms of performance and best practices, or whether all this should be replaced with a regular for loop with all the logic inside.
mylist = [
    {'one': 20, 'two': 4},
    {'one': -6, 'two': 64},
    {'one': 18, 'two': 1},
    {'one': 16, 'two': 100},
    # ...
]
def business_function(a_list):
    def compute_function(row):
        """
        suppose some more complex computations + appending extra values
        than this dummy example
        """
        row['total'] = row['one'] + row['two']
        return row

    def filter_function(item):
        """
        suppose some complex logic here
        """
        return item['one'] > 5

    # suppose there is some code here ...
    filtered_list = [compute_function(item) for item in a_list if filter_function(item)]
    # and some more code here ...
    return filtered_list
print business_function(mylist)
I see no problems with using locally-scoped functions like these.
Unless the outer function is going to be called a lot, the performance impact would be minimal; the code for both functions is already compiled by the time the outer function is called, for instance. All that happens extra is that the code object constant is loaded, attached to a function object, and that function is stored in a local variable.
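If you want to see that extra work, here's a quick sketch with dis, using a trimmed-down version of the function above (exact opcodes depend on your CPython version):

import dis

def business_function(a_list):
    def compute_function(row):
        return row
    return [compute_function(item) for item in a_list]

dis.dis(business_function)
# Look for LOAD_CONST of a <code object compute_function ...> followed by
# MAKE_FUNCTION -- that is the per-call cost of the nested definition.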
By keeping them locally scoped you make it abundantly clear that their utility is to the business_function scope only.
I'm going to say no, simply because there are better ways to do what this approach is trying to achieve.
If it's to prevent namespace collisions, put the functions in a separate module.
If it's to associate them with a single function, put all of them in a separate module.
If it's to slow down execution, use a sleep function instead.
Otherwise, just make them normal functions.
There's a minor downside to this, in that the inner function objects will be created each time the enclosing function is called, which is a small performance penalty. However, this is rarely if ever a problem, and the improved encapsulation of the code may make it worthwhile.
An alternative would be to create a class, but that's not going to reduce overhead.
I don't like this use of nested definitions. The author may have done this to improve readability or to prevent other people from using his private functions.
If the author wanted to "mark" these functions as private he should have prefixed their names with an underscore. In this way he could also have reused these functions in some other part of his code, without copying them.
If he did this to improve readability... well, why not put them outside that function?
If they did some real computation and filtering, then they'd deserve to be "top level" functions with their own documentation and comments.
Closures should probably only be used in decorators, or in some other really rare cases.
If the functions don't use local variables of the outer function and they are not trivial one-liners, then I'd put them at the global scope.
Otherwise a reader of your code is forced to read all the function definitions (in case they are closures that use or change outer local variables) to understand what the outer function does.
If they are defined outside, then a name and possibly the corresponding docstring might be enough to understand their roles in the outer function.