Coming from much less dynamic C++, I have some trouble understanding the behaviour of this Python (2.7) code.
Note: I am aware that this is bad programming style / evil, but I would like to understand it nonetheless.
vals = [1,2,3]
def f():
    vals[0] = 5
    print 'inside', vals
print 'outside', vals
f()
print 'outside', vals
This code runs without error, and f manipulates the (seemingly) global list. This is contrary to my prior understanding that global variables that are to be manipulated (and not only read) in a function must be declared as global ....
On the other hand, if I replace vals[0] = 5 with vals += [5,6], execution fails with an UnboundLocalError unless I add a global vals to f. This is what I would have expected to happen in the first case as well.
Could you explain this behaviour?
Why can I manipulate vals in the first case? Why does the second type of manipulation fail while the first does not?
Update:
It was remarked in a comment that vals.extend(...) works without global. This adds to my confusion - why is += treated differently from a call to extend?
global is only needed when you are trying to change which object the variable references, that is, when you rebind the name. Because vals[0] = 5 changes the actual object rather than the reference, no error is raised. However, vals += [5, 6] is an assignment to vals, so without a global statement the interpreter treats vals as a local variable, and that local has no value yet.
The confusing thing is that using the += operator with a list modifies the original list, like vals[0] = 5. And whereas vals += [5, 6] fails, vals.extend([5, 6]) works. We can enlist the help of dis.dis to lend us some clues.
>>> def a(): v[0] = 1
>>> def b(): v += [1]
>>> def c(): v.extend([1])
>>> import dis
>>> dis.dis(a)
1 0 LOAD_CONST 1 (1)
3 LOAD_GLOBAL 0 (v)
6 LOAD_CONST 2 (0)
9 STORE_SUBSCR
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> dis.dis(b)
1 0 LOAD_FAST 0 (v)
3 LOAD_CONST 1 (1)
6 BUILD_LIST 1
9 INPLACE_ADD
10 STORE_FAST 0 (v)
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
>>> dis.dis(c)
1 0 LOAD_GLOBAL 0 (v)
3 LOAD_ATTR 1 (extend)
6 LOAD_CONST 1 (1)
9 BUILD_LIST 1
12 CALL_FUNCTION 1
15 POP_TOP
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
We can see that functions a and c use LOAD_GLOBAL, whereas b tries to use LOAD_FAST. We can see now why using += won't work: the compiler treats any assignment to a name, including an augmented one, as making that name local, so the interpreter tries to load v as a local variable. Because it can't know whether v is a list or not, it has to treat the line as a rebinding of v, much like v = v + [1].
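A small sketch of the same three cases, written for Python 3 (an assumption; the session above is Python 2) and using hypothetical function names. It suggests that the local-versus-global decision is made when the function is compiled: only the function containing += lists v among its local names.

```python
v = [1, 2, 3]

def item_assign():
    v[0] = 5          # mutates the global list; the name v is only read

def augmented():
    v += [6]          # an assignment to v, so v is local to this function

def call_extend():
    v.extend([7])     # method call on the global list; no rebinding

item_assign()
call_extend()

try:
    augmented()
    raised = False
except UnboundLocalError:
    raised = True

# co_varnames holds the names the compiler decided are local.
local_names = {
    fn.__name__: fn.__code__.co_varnames
    for fn in (item_assign, augmented, call_extend)
}

print(v)            # [5, 2, 3, 7]
print(raised)       # True
print(local_names)  # only 'augmented' has ('v',)
```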
global is needed when you want to assign to a variable in the outer scope. If you don't use global, Python will consider vals as a local variable when doing assignments.
+= is an assignment (an augmented assignment), and vals += [5, 6] is equivalent to reading vals, appending [5, 6] to that value and assigning the resulting list back to vals. Because the function containing vals += [5,6] has no global statement, Python sees the assignment and treats vals as local. You never created a local variable called vals, yet you try to append to it, hence the UnboundLocalError.
But for reading it is not necessary to use global. The variable is looked up locally first; if it's not found in the local scope, it's looked up in the outer scope, and so on. And since you are dealing with a reference type, you get back a reference when you do the read. You can change the content of the object through that reference.
That's why .extend() works (because it's called on the reference and acts on the object itself) while vals += [5, 6] fails (because vals is neither local nor marked global).
Here is a modified example to try out (using a local vals clears the UnboundLocalError):
vals = [1, 2, 3]
def f():
    vals = []
    vals += [5,6]
    print 'inside', vals
print 'outside', vals
f()
print 'outside', vals
As long as you do not rebind the name, Python keeps using the global object. Compare:
In [114]: vals = [1,2,3]
In [116]: id(vals)
Out[116]: 144255596
In [118]: def func():
   .....:     vals[0] = 5
   .....:     return id(vals)
   .....:
In [119]: func()
Out[119]: 144255596
In [120]: def func_update():
   .....:     vals = vals
   .....:     return id(vals)
   .....:
In [121]: func_update()
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
/homes/markg/<ipython-input-121-f1149c600a85> in <module>()
----> 1 func_update()
/homes/markg/<ipython-input-120-257ba6ff792a> in func_update()
1 def func_update():
----> 2 vals = vals
3 return id(vals)
UnboundLocalError: local variable 'vals' referenced before assignment
The moment you try assignment, Python regards vals as local variable - and (oops) it's not there!
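For comparison, a minimal sketch (Python 3 assumed, names are illustrative) of what happens once vals is declared global. The augmented assignment is then allowed, and because += on a list extends it in place, the global name should still refer to the same list object afterwards.

```python
vals = [1, 2, 3]
original_id = id(vals)

def f():
    global vals        # assignments to vals now target the module-level name
    vals += [5, 6]     # extends the list in place, then rebinds the global name

f()
same_object = id(vals) == original_id

print(vals)         # [1, 2, 3, 5, 6]
print(same_object)  # True
```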
Is it possible to assign a value to an "eval expression" without manipulating the evaluation string? Example: The user writes the expression
"globalPythonArray[10]"
which would evaluate to the current value of item 10 of globalPythonArray. But the goal is to set the value of item 10 to a new value instead of getting the old value. A dirty workaround would be to define a temporary variable "newValue" and extend the evaluation string to
"globalPythonArray[10] = newValue"
and compile and evaluate that modified string. Are there some low level Python C API functions that I can use such that I don't have to manipulate the evaluation string?
I'd say probably not, since accessing and storing subscriptions are different opcodes:
>>> dis.dis(compile('globalPythonArray[10]', 'a', 'exec'))
1 0 LOAD_NAME 0 (globalPythonArray)
2 LOAD_CONST 0 (10)
4 BINARY_SUBSCR
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
>>> dis.dis(compile('globalPythonArray[10] = myValue', 'a', 'exec'))
1 0 LOAD_NAME 0 (myValue)
2 LOAD_NAME 1 (globalPythonArray)
4 LOAD_CONST 0 (10)
6 STORE_SUBSCR
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
Also, insert the usual warning about user input and eval() here:
globalPythonArray[__import__('os').system('rm -rf /')]
It's possible to "assign" a value to an eval expression by manipulating its abstract syntax tree (AST). It's not necessary to modify the evaluation string directly and if the type of the new value is not too complicated (e.g. numeric or string), you can hard code it into the AST:
1. Compile eval expression to an AST.
2. Replace Load context of expression at root node by Store.
3. Create a new AST with an Assign statement at the root node.
   - Set target to the expression node of the modified eval AST.
   - Set value to the value.
4. Compile the new AST to byte code and execute it.
Example:
import ast
import numpy as np

def eval_assign_num(expression, value, global_dict, local_dict):
    expr_ast = ast.parse(expression, 'eval', 'eval')
    expr_node = expr_ast.body
    expr_node.ctx = ast.Store()
    assign_ast = ast.Module(body=[
        ast.Assign(
            targets=[expr_node],
            # ast.Constant replaces ast.Num, which was removed in Python 3.12
            value=ast.Constant(value=value)
        )
    ], type_ignores=[])  # type_ignores is a required field from Python 3.8 on
    ast.fix_missing_locations(assign_ast)
    c = compile(assign_ast, 'assign', 'exec')
    exec(c, global_dict, local_dict)

class TestClass:
    arr = np.array([1, 2])
    x = 6

testClass = TestClass()
arr = np.array([1, 2])
eval_assign_num('arr[0]', 10, globals(), locals())
eval_assign_num('testClass.arr[1]', 20, globals(), locals())
eval_assign_num('testClass.x', 30, globals(), locals())
eval_assign_num('newVarName', 40, globals(), locals())
print('arr', arr)
print('testClass.arr', testClass.arr)
print('testClass.x', testClass.x)
print('newVarName', newVarName)
Output:
arr [10 2]
testClass.arr [ 1 20]
testClass.x 30
newVarName 40
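For values that cannot easily be hard coded into the AST, one possible variation is to make the assignment read from a temporary name in the namespace instead of embedding a constant. The sketch below assumes Python 3.8 or later; eval_assign and the temporary name __eval_assign_value__ are hypothetical and not part of the answer above. It only handles single targets such as names, attributes and subscripts.

```python
import ast

def eval_assign(expression, value, namespace, tmp_name="__eval_assign_value__"):
    # Parse the user's expression and turn its root node into a store target.
    target = ast.parse(expression, mode="eval").body
    target.ctx = ast.Store()
    # Build "<expression> = <tmp_name>" so that any Python object can be assigned.
    assign = ast.Module(
        body=[ast.Assign(targets=[target],
                         value=ast.Name(id=tmp_name, ctx=ast.Load()))],
        type_ignores=[],
    )
    ast.fix_missing_locations(assign)
    namespace[tmp_name] = value
    try:
        exec(compile(assign, "<eval_assign>", "exec"), namespace)
    finally:
        del namespace[tmp_name]   # do not leave the helper name behind

ns = {"data": [0, 0, 0], "cfg": {}}
eval_assign("data[1]", {"any": "object"}, ns)
eval_assign("cfg['k']", [1, 2], ns)
print(ns["data"])  # [0, {'any': 'object'}, 0]
print(ns["cfg"])   # {'k': [1, 2]}
```

The warning about eval() on user input applies here just as much, since the index expression is still executed.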
First, I fully understand what the global statement means and how to use it.
Now, let's look at this:
x = 100
def f():
    global x
    global xxx
    x = 99
    return x
print(f())
# >>> 99
print(x)
# >>> 99
You can see that by using global x, I successfully changed the value of x in the global environment.
But xxx does not exist at all. Why am I allowed to declare it global without getting any error, even when the function is executed?
global x does not define, declare, or otherwise create x. It simply states that if and when x is assigned to in the current function scope (whether that assignment comes before or after the global statement, which is why it is strongly recommended that global statements be used at the beginning of the function), the assignment is made to a global variable of that name, not a local variable. The actual creation is still the job of an actual assignment.
Put another way, global doesn't generate any byte code by itself; it simply modifies what byte code other assignment statements might generate. Consider these two functions:
def f():
    global x
    x = 99
def g():
    x = 99
The only difference in the byte code for these two functions is that f uses STORE_GLOBAL as a result of the global statement, while g uses STORE_FAST.
>>> dis.dis(f)
5 0 LOAD_CONST 1 (99)
3 STORE_GLOBAL 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
>>> dis.dis(g)
8 0 LOAD_CONST 1 (99)
3 STORE_FAST 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
If you were to add an "unused" global statement, such as in
def h():
    global xxx
    x = 99
the resulting byte code is indistinguishable from g:
>>> dis.dis(h)
3 0 LOAD_CONST 1 (99)
2 STORE_FAST 0 (x)
4 LOAD_CONST 0 (None)
6 RETURN_VALUE
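The same point can be checked at run time. In this sketch (Python 3 assumed; demo_name and both function names are made up for illustration), declaring a name global creates nothing, and the name only appears in the module namespace once an assignment actually runs.

```python
import dis

def declare_only():
    global demo_name      # declaration only; no byte code is generated for it

def declare_and_assign():
    global demo_name
    demo_name = 1         # this assignment is what creates the global

declare_only()
exists_after_declare = "demo_name" in globals()

declare_and_assign()
exists_after_assign = "demo_name" in globals()

# Collect the store instructions the compiler emitted for the assignment.
store_ops = [ins.opname for ins in dis.get_instructions(declare_and_assign)
             if ins.opname.startswith("STORE")]

print(exists_after_declare)  # False
print(exists_after_assign)   # True
print(store_ops)             # ['STORE_GLOBAL']
```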
This code doesn't work:
def lol():
    i = 1
    def _lol():
        i += 1
    _lol()
lol()
Error:
local variable 'i' referenced before assignment
But, the following code works fine:
def lol():
    i = [1]
    def _lol():
        i[0] += 1
    _lol()
lol()
Why is that?
Python scopes fit into 3 categories -- local, nonlocal and global. By default, a function can only change a reference in the local scope (references are created with the assignment operator).
You're free to mutate an object that you have a reference to, which is why the second example works (i is a reference to the list [1], then you change/mutate its first item). In short, you're mutating the object that i references; you're not trying to change the reference. Note that you can give a function access to change the reference in the global scope via the global keyword:
i = 1
def func():
    global i  # If you comment this out, i doesn't get changed in the global scope
    i = 2
func()
print(i)  # 2 -- 1 if the global statement is commented out.
Note that python3.x adds the nonlocal keyword. It does the same thing as global but to the non-local scope. e.g.
def foo():
    i = 1  # nonlocal to bar
    def bar():
        nonlocal i
        print(i)
        i += 1
    return bar
bar1 = foo()
bar1()  # 1
bar1()  # 2
bar1()  # 3
bar2 = foo()
bar2()  # 1
bar2()  # 2
bar1()  # 4 bar2 doesn't influence bar1 at all.
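Applied to the code from the question, this might look like the following sketch (Python 3 only; the return statement is added here just to make the effect visible).

```python
def lol():
    i = 1
    def _lol():
        nonlocal i    # rebind i in the enclosing function's scope
        i += 1
    _lol()
    return i

result = lol()
print(result)  # 2
```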
augmented operators
This is a bit more advanced, but provided to hopefully help answer questions regarding operators like +=. Consider the case:
x = []
def func():
    x += [1]
You might expect this to work -- after all, x += [1] for a list x is really just x.extend([1]), right? Unfortunately, it's not quite. We can disassemble func using dis.dis to see a little more of what's going on.
>>> dis.dis(func)
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 BUILD_LIST 1
9 INPLACE_ADD
10 STORE_FAST 0 (x) ### IMPORTANT!
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
Notice the byte-code instruction STORE_FAST? That basically says, store the result of INPLACE_ADD in the name x in the local dictionary. In other words, you write:
x += [1]
but python executes (see footnote 1):
x = x.__iadd__([1])
Why? __iadd__ should operate in place so why does it need to rebind the name to __iadd__'s return value? The rebinding part is the problem -- i.e., this code would work:
x = []
def func():
    x.__iadd__([1])
The answer is that python has immutable objects and augmented assignment needs to work with them too. Because of this, __iadd__ can return an object other than "self". This ends up being incredibly useful. Consider i = 1; i += 1. This only works because the statement is allowed to rebind i to a new integer (int defines no __iadd__ at all, so Python falls back to i = i + 1).
Footnote 1: Discussing this in even more depth is actually my all-time most upvoted answer on StackOverflow and can be found here
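The difference between a mutable and an immutable operand can be observed directly. This is only an illustrative sketch (variable names are made up): with a list, += keeps the same object, while with a tuple, which has no __iadd__, the name ends up bound to a new object.

```python
a = [1]
alias = a
a += [2]              # list.__iadd__ extends in place and returns the same list
list_same = a is alias

t = (1,)
t_alias = t
t += (2,)             # no tuple.__iadd__; falls back to t = t + (2,)
tuple_same = t is t_alias

print(list_same, alias)     # True [1, 2]
print(tuple_same, t_alias)  # False (1,)
```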
This is more of a conceptual question. I recently saw a piece of code in Python (it worked in 2.7, and it might have been run in 2.5 as well) in which a for loop used the same name for both the list that was being iterated over and the item in the list, which strikes me as both bad practice and something that should not work at all.
For example:
x = [1,2,3,4,5]
for x in x:
    print x
print x
Yields:
1
2
3
4
5
5
Now, it makes sense to me that the last value printed would be the last value assigned to x from the loop, but I fail to understand why you'd be able to use the same variable name for both your parts of the for loop and have it function as intended. Are they in different scopes? What's going on under the hood that allows something like this to work?
What does dis tell us:
Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dis import dis
>>> dis("""x = [1,2,3,4,5]
... for x in x:
... print(x)
... print(x)""")
1 0 LOAD_CONST 0 (1)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (3)
9 LOAD_CONST 3 (4)
12 LOAD_CONST 4 (5)
15 BUILD_LIST 5
18 STORE_NAME 0 (x)
2 21 SETUP_LOOP 24 (to 48)
24 LOAD_NAME 0 (x)
27 GET_ITER
>> 28 FOR_ITER 16 (to 47)
31 STORE_NAME 0 (x)
3 34 LOAD_NAME 1 (print)
37 LOAD_NAME 0 (x)
40 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
43 POP_TOP
44 JUMP_ABSOLUTE 28
>> 47 POP_BLOCK
4 >> 48 LOAD_NAME 1 (print)
51 LOAD_NAME 0 (x)
54 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
57 POP_TOP
58 LOAD_CONST 5 (None)
61 RETURN_VALUE
The key bits are sections 2 and 3 - we load the value out of x (24 LOAD_NAME 0 (x)) and then we get its iterator (27 GET_ITER) and start iterating over it (28 FOR_ITER). Python never goes back to load the iterator again.
Aside: It wouldn't make any sense to do so, since it already has the iterator, and as Abhijit points out in his answer, Section 7.3 of Python's specification actually requires this behavior.
When the name x gets overwritten to point at each value inside of the list formerly known as x, Python doesn't have any problems finding the iterator because it never needs to look at the name x again to finish the iteration protocol.
Using your example code as the core reference
x = [1,2,3,4,5]
for x in x:
    print x
print x
I would like to refer you to section 7.3, "The for statement", in the manual.
Excerpt 1
The expression list is evaluated once; it should yield an iterable
object. An iterator is created for the result of the expression_list.
What it means is that your variable x, which is a symbolic name for the list object [1,2,3,4,5], is evaluated once to an iterable object. Even if the variable (the symbolic reference) changes its allegiance, the expression list is not evaluated again, so there is no impact on the iterable object that has already been evaluated and generated.
Note
Everything in Python is an object and has an identity, attributes and methods.
Variables are symbolic names, each a reference to one and only one object at any given instant.
Variables can change their allegiance at run time, i.e. they can refer to some other object.
Excerpt 2
The suite is then executed once for each item provided by the
iterator, in the order of ascending indices.
Here the suite refers to the iterator and not to the expression-list. So, for each iteration, the iterator is executed to yield the next item instead of referring to the original expression-list.
It is necessary for it to work this way, if you think about it. The expression for the sequence of a for loop could be anything:
binaryfile = open("file", "rb")
for byte in binaryfile.read(5):
    ...
We can't query the sequence on each pass through the loop, or here we'd end up reading from the next batch of 5 bytes the second time. Naturally Python must in some way store the result of the expression privately before the loop begins.
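A quick way to convince yourself that the expression is evaluated only once is to count the evaluations. The helper make_sequence below is hypothetical and exists only for this sketch (Python 3 syntax).

```python
calls = []

def make_sequence():
    calls.append("evaluated")   # record every time the expression is evaluated
    return [10, 20, 30]

total = 0
for item in make_sequence():    # make_sequence() runs once, before the first pass
    total += item

print(calls)  # ['evaluated']
print(total)  # 60
```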
Are they in different scopes?
No. To confirm this you could keep a reference to the original scope dictionary (locals()) and notice that you are in fact using the same variables inside the loop:
x = [1,2,3,4,5]
loc = locals()
for x in x:
    print locals() is loc # True
    print loc["x"] # 1
    break
What's going on under the hood that allows something like this to work?
Sean Vieira showed exactly what is going on under the hood, but to describe it in more readable python code, your for loop is essentially equivalent to this while loop:
it = iter(x)
while True:
    try:
        x = it.next()
    except StopIteration:
        break
    print x
This is different from the traditional indexing approach to iteration you would see in older versions of Java, for example:
for (int index = 0; index < x.length; index++) {
    x = x[index];
    ...
}
This approach would fail when the item variable and the sequence variable are the same, because the sequence x would no longer be available to look up the next index after the first time x was reassigned to the first item.
With the former approach, however, the first line (it = iter(x)) requests an iterator object which is what is actually responsible for providing the next item from then on. The sequence that x originally pointed to no longer needs to be accessed directly.
It's the difference between a variable (x) and the object it points to (the list). When the for loop starts, Python grabs an internal reference to the object pointed to by x. It uses the object and not what x happens to reference at any given time.
If you reassign x, the for loop doesn't change. If x points to a mutable object (e.g., a list) and you change that object (e.g., delete an element), results can be unpredictable.
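The two situations can be put side by side in a short sketch (Python 3 syntax, illustrative names). Rebinding the name leaves the iteration untouched, whereas removing an element from the list being iterated makes the loop skip an item.

```python
items = [1, 2, 3]
seen_rebind = []
for n in items:
    items = []             # rebinding the name: the loop keeps its own iterator
    seen_rebind.append(n)

items = [1, 2, 3, 4]
seen_mutate = []
for n in items:
    seen_mutate.append(n)
    if n == 2:
        items.remove(2)    # mutating the list the iterator is walking over

print(seen_rebind)  # [1, 2, 3]
print(seen_mutate)  # [1, 2, 4]  (3 is skipped)
```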
Basically, the for loop takes in the list x and then, storing that as a temporary variable, reassigns x to each value in that temporary variable. Thus, x is now the last value in the list.
>>> x = [1, 2, 3]
>>> [x for x in x]
[1, 2, 3]
>>> x
3
>>>
Just like in this:
>>> def foo(bar):
...     return bar
...
>>> x = [1, 2, 3]
>>> for x in foo(x):
...     print x
...
1
2
3
>>>
In this example, x is stored in foo() as bar, so although x is being reassigned, it still exist(ed) in foo() so that we could use it to trigger our for loop.
x no longer refers to the original x list, and so there's no confusion. Basically, python remembers it's iterating over the original x list, but as soon as you start assigning the iteration value (0,1,2, etc) to the name x, it no longer refers to the original x list. The name gets reassigned to the iteration value.
In [1]: x = range(5)
In [2]: x
Out[2]: [0, 1, 2, 3, 4]
In [3]: id(x)
Out[3]: 4371091680
In [4]: for x in x:
   ...:     print id(x), x
   ...:
140470424504688 0
140470424504664 1
140470424504640 2
140470424504616 3
140470424504592 4
In [5]: id(x)
Out[5]: 140470424504592
Suppose I want to execute code, for example
value += 5
inside a namespace of my own (so the result is essentially mydict['value'] += 5). There's a function exec(), but I have to pass a string there:
exec('value += 5', mydict)
and passing statements as strings seems strange (e.g. it's not colorized that way).
Can it be done like:
def block():
    value += 5
???(block, mydict)
? The obvious candidate for last line was exec(block.__code__, mydict), but no luck: it raises UnboundLocalError about value. I believe it basically executes block(), not the code inside block, so assignments aren't easy – is that correct?
Of course, another possible solution would be to disassemble block.__code__...
FYI, I got the question because of this thread. Also, this is why some (me undecided) call for new syntax
using mydict:
    value += 5
Note how this doesn't throw an error but doesn't change mydict either:
def block(value = 0):
    value += 5
block(**mydict)
You can pass bytecode instead of a string to exec, you just need to make the right bytecode for the purpose:
>>> bytecode = compile('value += 5', '<string>', 'exec')
>>> mydict = {'value': 23}
>>> exec(bytecode, mydict)
>>> mydict['value']
28
Specifically, ...:
>>> import dis
>>> dis.dis(bytecode)
1 0 LOAD_NAME 0 (value)
3 LOAD_CONST 0 (5)
6 INPLACE_ADD
7 STORE_NAME 0 (value)
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
the load and store instructions must be of the _NAME persuasion, and this compile makes them so, while...:
>>> def f(): value += 5
...
>>> dis.dis(f.func_code)
1 0 LOAD_FAST 0 (value)
3 LOAD_CONST 1 (5)
6 INPLACE_ADD
7 STORE_FAST 0 (value)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
...code in a function is optimized to use the _FAST versions, and those don't work on a dict passed to exec. If you started somehow with a bytecode using the _FAST instructions, you could patch it to use the _NAME kind instead, e.g. with bytecodehacks or some similar approach.
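The contrast can be reproduced in a few lines. The sketch below assumes Python 3 (so __code__ instead of func_code) and simply runs both variants against the same dictionary: the function's code object fails because it uses fast locals, while the module-style code object compiled from a string updates the dictionary.

```python
def block():
    value += 5       # compiled with LOAD_FAST / STORE_FAST

mydict = {"value": 23}

try:
    exec(block.__code__, mydict)
    function_code_ok = True
except UnboundLocalError:
    function_code_ok = False     # the fast local 'value' was never assigned

# Code compiled in 'exec' mode uses LOAD_NAME / STORE_NAME instead.
module_code = compile("value += 5", "<string>", "exec")
exec(module_code, mydict)

print(function_code_ok)  # False
print(mydict["value"])   # 28
```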
Use the global keyword to force dynamic scoping on any variables you want to modify from within the block:
def block():
    global value
    value += 5
mydict = {"value": 42}
exec(block.__code__, mydict)
print(mydict["value"])
Here is a crazy decorator to create such a block that uses "custom locals". In reality it is a quick hack to turn all variable access inside the function to global access, and evaluate the result with the custom locals dictionary as environment.
# Python 2 only: relies on func_code, string.maketrans and the old CodeType signature.
import dis
import functools
import types
import string

def withlocals(func):
    """Decorator for executing a block with custom "local" variables.

    The decorated function takes one argument: its scope dictionary.

    >>> @withlocals
    ... def block():
    ...     counter += 1
    ...     luckynumber = 88
    >>> d = {"counter": 1}
    >>> block(d)
    >>> d["counter"]
    2
    >>> d["luckynumber"]
    88
    """
    def opstr(*opnames):
        return "".join([chr(dis.opmap[N]) for N in opnames])

    # Map the local load/store opcodes onto their global counterparts.
    translation_table = string.maketrans(
        opstr("LOAD_FAST", "STORE_FAST"),
        opstr("LOAD_GLOBAL", "STORE_GLOBAL"))

    c = func.func_code
    newcode = types.CodeType(c.co_argcount,
                             0,  # co_nlocals
                             c.co_stacksize,
                             c.co_flags,
                             c.co_code.translate(translation_table),
                             c.co_consts,
                             c.co_varnames,  # co_names, name of global vars
                             (),  # co_varnames
                             c.co_filename,
                             c.co_name,
                             c.co_firstlineno,
                             c.co_lnotab)

    @functools.wraps(func)
    def wrapper(mylocals):
        return eval(newcode, mylocals)
    return wrapper

if __name__ == '__main__':
    import doctest
    doctest.testmod()
This is just a monkey-patching adaption of someone's brilliant recipe for a goto decorator
From S.Lott's comment above, I think I get the idea for an answer that creates a new class.
class _(metaclass=change(mydict)):
    value += 1
    ...
where change produces a metaclass whose __prepare__ reads the dictionary and whose __new__ updates the dictionary.
For reuse, the snippet below would work, but it's kind of ugly:
def increase_value(d):
    class _(metaclass=change(d)):
        value += 1
        ...
increase_value(mydict)
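The answer does not show change itself. One way it could be written is sketched below, assuming Python 3; the body of change is a guess at the idea, not the author's code. __prepare__ hands the class body a copy of the dictionary, and __new__ copies the results back while skipping the dunder names that the class machinery adds.

```python
def change(namespace):
    """Hypothetical factory: returns a metaclass bound to `namespace`."""
    class Meta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwargs):
            # The class body reads and writes names in this mapping.
            return dict(namespace)

        def __new__(mcls, name, bases, body, **kwargs):
            # Copy results back, ignoring __module__, __qualname__ and the like.
            for key, val in body.items():
                if not (key.startswith("__") and key.endswith("__")):
                    namespace[key] = val
            return super().__new__(mcls, name, bases, body)
    return Meta

mydict = {"value": 1}

class _(metaclass=change(mydict)):
    value += 5        # class bodies use name lookups, so this works
    extra = "new"

print(mydict)  # {'value': 6, 'extra': 'new'}
```

This works because a class body, unlike a function body, is compiled with the _NAME instructions, so reads and writes go through the mapping returned by __prepare__.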