First, I fully understand what the global statement means and how to use it.
Now, let's look at this:
x = 100
def f():
    global x
    global xxx
    x = 99
    return x
print(f())
# >>> 99
print(x)
# >>> 99
You can see that by using global x, I successfully changed the value of x in the global environment.
But xxx does not exist at all. Why am I allowed to declare it global, and why is no error raised even when the function is executed?
global x does not define, declare, or otherwise create x. It simply states that if and when x is assigned to in the current function scope, the assignment is made to a global variable of that name, not a local variable. (In Python 2 this applied even to an assignment placed before the global statement, with only a SyntaxWarning; since Python 3.6 that ordering is a SyntaxError, so global statements belong at the beginning of the function.) The actual creation is still the job of an actual assignment.
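A quick way to check this is to ask globals() whether the name exists after the function has run. This is only a small sketch; the names result and xxx_exists are made up for illustration.

```python
x = 100

def f():
    global x      # assignments to x inside f target the module-level x
    global xxx    # legal, but nothing is ever assigned to xxx
    x = 99
    return x

result = f()
xxx_exists = "xxx" in globals()

print(result)      # 99
print(xxx_exists)  # False: the global statement alone created nothing
```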
Put another way, global doesn't generate any byte code by itself; it simply modifies what byte code other assignment statements might generate. Consider these two functions:
def f():
    global x
    x = 99
def g():
    x = 99
The only difference in the byte code for these two functions is that f uses STORE_GLOBAL as a result of the global statement, while g uses STORE_FAST.
>>> dis.dis(f)
5 0 LOAD_CONST 1 (99)
3 STORE_GLOBAL 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
>>> dis.dis(g)
8 0 LOAD_CONST 1 (99)
3 STORE_FAST 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
If you were to add an "unused" global statement, such as in
def h():
    global xxx
    x = 99
the resulting byte code is indistinguishable from g:
>>> dis.dis(h)
3 0 LOAD_CONST 1 (99)
2 STORE_FAST 0 (x)
4 LOAD_CONST 0 (None)
6 RETURN_VALUE
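The same comparison can be made programmatically instead of by reading the listings. The sketch below assumes Python 3.4 or later (for dis.get_instructions); the helper name opnames is made up for illustration.

```python
import dis

def opnames(func):
    """Return the set of opcode names that appear in func's bytecode."""
    return {ins.opname for ins in dis.get_instructions(func)}

def f():
    global x
    x = 99

def g():
    x = 99

def h():
    global xxx   # declared global but never assigned
    x = 99

print("STORE_GLOBAL" in opnames(f))  # True
print("STORE_GLOBAL" in opnames(g))  # False
print("STORE_GLOBAL" in opnames(h))  # False: the unused global leaves no trace
```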
Here's a simple file depicting some inconsistent Python (3.6) behavior. Why is it possible that Case 1 and Case 2 run but Case 3 fails, even though Case 3 is just a merger of the first two cases?
I have provided the dis output of the first two cases.
import dis # Python bytecode disassembler
class A(object):
    def __init__(self):
        self.x # In cases 2 and 3, getting x results in a function call (because x is a property); in case 3 that call fails when instantiating A because y is undefined. Case 1 evaluates the reference to a function without calling it and so it does not raise an exception.
    # CASE 1: Legal
    def x(self):
        y
        pass
    '''
    # CASE 2: Legal
    @property
    def x(self):
        pass
    '''
    '''
    # CASE 3: Illegal:
    @property
    def x(self):
        y
        pass
    '''
if __name__ == '__main__':
    a = A()
    dis.dis(A)
Case 1 bytecode:
Disassembly of __init__:
5 0 LOAD_FAST 0 (self)
2 LOAD_ATTR 0 (x)
4 POP_TOP
6 LOAD_CONST 0 (None)
8 RETURN_VALUE
Disassembly of x:
9 0 LOAD_GLOBAL 0 (y)
2 POP_TOP
10 4 LOAD_CONST 0 (None)
6 RETURN_VALUE
Case 2 bytecode:
Disassembly of __init__:
5 0 LOAD_FAST 0 (self)
2 LOAD_ATTR 0 (x)
4 POP_TOP
6 LOAD_CONST 0 (None)
8 RETURN_VALUE
There is no inconsistency here.
When you instantiate a = A() in case 3, __init__ is called, which evaluates self.x; because x is a property there, that executes the body of x. At that point, there is no y in scope, so you get an exception.
Thanks to @chepner's comment:
In case 1, you aren't calling anything; self.x is a function reference
that isn't used. In case 3, self.x actually calls the defined getter
for x, which presumably is then trying to access an undefined global
name.
The behavior caused by the line self.x in case 3 is fundamentally different from case 1 because case 1 doesn't call anything -- it just evaluates a reference to a function.
On the other hand, self.x in case 3 executes the body of the x method, resulting in the undefined y error.
In order to confirm @chepner's comment, I ran a.x() with case 1 and got the same error as in case 3.
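The difference can be reproduced in isolation. In this sketch the class names WithMethod and WithProperty and the name undefined_name are hypothetical, chosen only to mirror cases 1 and 3.

```python
class WithMethod:
    def x(self):
        return undefined_name   # only looked up if x is actually called

class WithProperty:
    @property
    def x(self):
        return undefined_name   # looked up on every attribute access

m = WithMethod()
bound = m.x             # just a bound-method object; the body does not run
print(callable(bound))  # True

try:
    WithProperty().x    # plain attribute access runs the getter
except NameError as exc:
    print("NameError:", exc)
```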
I am trying to analyze some messy code that happens to use global variables quite heavily within functions (I am trying to refactor the code so that functions only use local variables). Is there any way to detect global variables within a function?
For example:
def f(x):
    x = x + 1
    z = x + y
    return z
Here the global variable is y since it isn't given as an argument, and neither is it created within the function.
I tried to detect global variables within the function using string parsing, but it was getting a bit messy; I was wondering if there was a better way to do this?
Edit: If anyone is interested this is the code I am using to detect global variables (based on kindall's answer and Paolo's answer to this question: Capture stdout from a script in Python):
from dis import dis
def capture(f):
    """
    Decorator to capture standard output
    """
    def captured(*args, **kwargs):
        import sys
        from cStringIO import StringIO
        # setup the environment
        backup = sys.stdout
        try:
            sys.stdout = StringIO() # capture output
            f(*args, **kwargs)
            out = sys.stdout.getvalue() # release output
        finally:
            sys.stdout.close() # close the stream
            sys.stdout = backup # restore original stdout
        return out # captured output wrapped in a string
    return captured
def return_globals(f):
    """
    Prints all of the global variables in function f
    """
    x = dis_(f)
    for i in x.splitlines():
        if "LOAD_GLOBAL" in i:
            print i
dis_ = capture(dis)
dis_(f)
dis by default does not return output, so if you want to manipulate the output of dis as a string, you have to use the capture decorator written by Paolo and posted here: Capture stdout from a script in Python
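On Python 3 the stdout juggling is probably not needed, since dis.dis accepts a file argument (available since 3.4). A minimal sketch, assuming Python 3; the helper name dis_text is made up here.

```python
import dis
import io

def dis_text(func):
    """Return the disassembly of func as a string instead of printing it."""
    buf = io.StringIO()
    dis.dis(func, file=buf)
    return buf.getvalue()

def f(x):
    x = x + 1
    z = x + y
    return z

# Keep only the lines that mention a global load.
global_lines = [line for line in dis_text(f).splitlines() if "LOAD_GLOBAL" in line]
print(len(global_lines))  # 1
```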
Inspect the bytecode.
from dis import dis
dis(f)
Result:
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (x)
3 10 LOAD_FAST 0 (x)
13 LOAD_GLOBAL 0 (y)
16 BINARY_ADD
17 STORE_FAST 1 (z)
4 20 LOAD_FAST 1 (z)
23 RETURN_VALUE
The global variables will have a LOAD_GLOBAL opcode instead of LOAD_FAST. (If the function changes any global variables, there will be STORE_GLOBAL opcodes as well.)
With a little work, you could even write a function that scans the bytecode of a function and returns a list of the global variables it uses. In fact:
from dis import HAVE_ARGUMENT, opmap
def getglobals(func):
    GLOBAL_OPS = opmap["LOAD_GLOBAL"], opmap["STORE_GLOBAL"]
    EXTENDED_ARG = opmap["EXTENDED_ARG"]
    func = getattr(func, "im_func", func)
    code = func.func_code
    names = code.co_names
    op = (ord(c) for c in code.co_code)
    globs = set()
    extarg = 0
    for c in op:
        if c in GLOBAL_OPS:
            globs.add(names[next(op) + next(op) * 256 + extarg])
        elif c == EXTENDED_ARG:
            extarg = (next(op) + next(op) * 256) * 65536
            continue
        elif c >= HAVE_ARGUMENT:
            next(op)
            next(op)
        extarg = 0
    return sorted(globs)
print getglobals(f) # ['y']
As mentioned in the LOAD_GLOBAL documentation:
LOAD_GLOBAL(namei)
Loads the global named co_names[namei] onto the stack.
This means you can inspect the code object for your function to find globals:
>>> f.__code__.co_names
('y',)
Note that this isn't sufficient for nested functions (nor is the dis.dis method in @kindall's answer). In that case, you will need to look at constants too:
# Define a function containing a nested function
>>> def foo():
...     def bar():
...         return some_global
# It doesn't contain LOAD_GLOBAL, so .co_names is empty.
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object bar at 0x2b70440c84b0, file "<ipython-input-106-77ead3dc3fb7>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (bar)
9 LOAD_CONST 0 (None)
12 RETURN_VALUE
# Instead, we need to walk the constants to find nested functions:
# (if bar contained a nested function too, we'd need to recurse)
>>> from types import CodeType
>>> for constant in foo.__code__.co_consts:
...     if isinstance(constant, CodeType):
...         print constant.co_names
('some_global',)
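Putting the two answers together, a Python 3 sketch could read the instructions with dis.get_instructions and recurse into nested code objects. The function name global_names is hypothetical. Note that builtins such as print or len are also loaded with LOAD_GLOBAL, so they would show up in the result as well.

```python
import dis
from types import CodeType

def global_names(code):
    """Collect names accessed through *_GLOBAL opcodes, including nested functions."""
    names = set()
    for ins in dis.get_instructions(code):
        if ins.opname in ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"):
            names.add(ins.argval)
    # Nested functions are stored as code objects among the constants.
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names |= global_names(const)
    return names

def f(x):
    x = x + 1
    z = x + y
    return z

def foo():
    def bar():
        return some_global
    return bar

print(global_names(f.__code__))    # {'y'}
print(global_names(foo.__code__))  # {'some_global'}
```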
Sometimes, some values/strings are hard-coded in functions. For example in the following function, I define a "constant" comparing string and check against it.
def foo(s):
    c_string = "hello"
    if s == c_string:
        return True
    return False
Without discussing too much about why it's bad to do this, and how it should be defined in the outer scope, I'm wondering what happens behind the scenes when it is defined this way.
Does the string get created each call?
If instead of the string "hello" it was the list: [1,2,3] (or a list with mutable content if it matters) would the same happen?
Because the string is immutable (as a tuple would be), it is stored with the bytecode object for the function. It is loaded by a very simple and fast index lookup. This is actually faster than a global lookup.
You can see this in a disassembly of the bytecode, using the dis.dis() function:
>>> import dis
>>> def foo(s):
...     c_string = "hello"
...     if s == c_string:
...         return True
...     return False
...
>>> dis.dis(foo)
2 0 LOAD_CONST 1 ('hello')
3 STORE_FAST 1 (c_string)
3 6 LOAD_FAST 0 (s)
9 LOAD_FAST 1 (c_string)
12 COMPARE_OP 2 (==)
15 POP_JUMP_IF_FALSE 22
4 18 LOAD_GLOBAL 0 (True)
21 RETURN_VALUE
5 >> 22 LOAD_GLOBAL 1 (False)
25 RETURN_VALUE
>>> foo.__code__.co_consts
(None, 'hello')
The LOAD_CONST opcode loads the string object from the co_consts array that is part of the code object for the function; the reference is pushed to the top of the stack. The STORE_FAST opcode takes the reference from the top of the stack and stores it in the locals array, again a very simple and fast operation.
For mutable literals ({..}, [..]) special opcodes build the object, with the contents still treated as constants as much as possible (more complex structures just follow the same building blocks):
>>> def bar(): return ['spam', 'eggs']
...
>>> dis.dis(bar)
1 0 LOAD_CONST 1 ('spam')
3 LOAD_CONST 2 ('eggs')
6 BUILD_LIST 2
9 RETURN_VALUE
The BUILD_LIST opcode creates the new list object, using two constant string objects.
Interesting fact: If you used a list object for a membership test (something in ['option1', 'option2', 'option3']), Python knows the list object will never be mutated and will convert it to a tuple for you at compile time (a so-called peephole optimisation). The same applies to a set literal, which is converted to a frozenset() object, but only in Python 3.2 and newer. See Tuple or list when using 'in' in an 'if' clause?
Note that your sample function is using booleans rather verbosely; you could just have used:
def foo(s):
    c_string = "hello"
    return s == c_string
for the exact same result, avoiding the LOAD_GLOBAL calls in Python 2 (Python 3 made True and False keywords so the values can also be stored as constants).
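The difference between the constant string and the freshly built list can also be observed at runtime. This is a sketch that relies on CPython behaviour (constants living in co_consts); the variable names first and second are only for illustration.

```python
def foo(s):
    c_string = "hello"
    return s == c_string

def bar():
    return ['spam', 'eggs']

# The string literal is stored once, with the code object.
print("hello" in foo.__code__.co_consts)  # True

first, second = bar(), bar()
print(first is second)        # False: a new list is built on every call
print(first[0] is second[0])  # True: both lists hold the same constant string
```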
Coming from much less dynamic C++, I have some trouble understanding the behaviour of this Python (2.7) code.
Note: I am aware that this is bad programming style / evil, but I would like to understand it nonetheless.
vals = [1,2,3]
def f():
    vals[0] = 5
    print 'inside', vals
print 'outside', vals
f()
print 'outside', vals
This code runs without error, and f manipulates the (seemingly) global list. This is contrary to my prior understanding that global variables that are to be manipulated (and not only read) in a function must be declared as global.
On the other hand, if I replace vals[0] = 5 with vals += [5,6], execution fails with an UnboundLocalError unless I add a global vals to f. This is what I would have expected to happen in the first case as well.
Could you explain this behaviour?
Why can I manipulate vals in the first case? Why does the second type of manipulation fail while the first does not?
Update:
It was remarked in a comment that vals.extend(...) works without global. This adds to my confusion - why is += treated differently from a call to extend?
global is only needed when you rebind the name, that is, make the variable refer to a different object. Because vals[0] = 5 mutates the existing object rather than rebinding the name, no error is raised. However, vals += [5, 6] includes an assignment to vals, so the interpreter looks for a local variable, which has no value yet.
The confusing thing is that using the += operator with a list modifies the original list, like vals[0] = 5. And whereas vals += [5, 6] fails, vals.extend([5, 6]) works. We can enlist the help of dis.dis to lend us some clues.
>>> def a(): v[0] = 1
>>> def b(): v += [1]
>>> def c(): v.extend([1])
>>> import dis
>>> dis.dis(a)
1 0 LOAD_CONST 1 (1)
3 LOAD_GLOBAL 0 (v)
6 LOAD_CONST 2 (0)
9 STORE_SUBSCR
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> dis.dis(b)
1 0 LOAD_FAST 0 (v)
3 LOAD_CONST 1 (1)
6 BUILD_LIST 1
9 INPLACE_ADD
10 STORE_FAST 0 (v)
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
>>> dis.dis(c)
1 0 LOAD_GLOBAL 0 (v)
3 LOAD_ATTR 1 (extend)
6 LOAD_CONST 1 (1)
9 BUILD_LIST 1
12 CALL_FUNCTION 1
15 POP_TOP
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
We can see that functions a and c use LOAD_GLOBAL, whereas b tries to use LOAD_FAST. We can see now why using += won't work: the interpreter tries to load v as a local variable because of its default behaviour with in-place addition. Because it can't know whether v is a list or not, it essentially assumes that the line means the same as v = v + [1].
global is needed when you want to assign to a variable in the outer scope. If you don't use global, Python will consider vals as a local variable when doing assignments.
+= is an assignment (an augmented assignment), and vals += [5, 6] is equivalent to reading vals, appending [5, 6] to that value, and assigning the resulting list back to vals. Because the function has no global statement, Python sees the assignment and treats vals as local. You never created a local variable called vals, yet you try to append to it, hence the UnboundLocalError.
But for reading it is not necessary to use global. The variable is looked up locally first; then, if it's not found in the local scope, it's looked up in the outer scope, and so on. And since you are dealing with a reference type, you get back a reference when you do the read. You can change the content of the object through that reference.
That's why .extend() works (because it's called on the reference and acts on the object itself) while vals += [5, 6] fails (because vals is neither local nor marked global).
Here is a modified example to try out (using a local vals clears the UnboundLocalError):
vals = [1, 2, 3]
def f():
    vals = []
    vals += [5,6]
    print 'inside', vals
print 'outside', vals
f()
print 'outside', vals
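For comparison, here is a small sketch (written in Python 3 syntax, with made-up function names) of the failing += next to the version that declares vals global. It also checks that += on a list extends the existing object in place rather than creating a new one.

```python
vals = [1, 2, 3]
original_id = id(vals)

def augmented_without_global():
    vals += [5, 6]          # the assignment makes vals local, so the read fails

def augmented_with_global():
    global vals
    vals += [5, 6]          # extends the list in place, then rebinds the global name

try:
    augmented_without_global()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

augmented_with_global()
print(vals)                     # [1, 2, 3, 5, 6]
print(id(vals) == original_id)  # True: still the same list object
```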
As long as you do not rebind the name, Python keeps using the global object. Compare:
In [114]: vals = [1,2,3]
In [116]: id(vals)
Out[116]: 144255596
In [118]: def func():
   .....:     vals[0] = 5
   .....:     return id(vals)
   .....:
In [119]: func()
Out[119]: 144255596
In [120]: def func_update():
   .....:     vals = vals
   .....:     return id(vals)
   .....:
In [121]: func_update()
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
/homes/markg/<ipython-input-121-f1149c600a85> in <module>()
----> 1 func_update()
/homes/markg/<ipython-input-120-257ba6ff792a> in func_update()
1 def func_update():
----> 2 vals = vals
3 return id(vals)
UnboundLocalError: local variable 'vals' referenced before assignment
The moment you assign to vals inside the function, Python regards it as a local variable, and (oops) it's not there yet!
Global variables cannot be accessed within a function without using the global keyword; (fine) but I did not expect the following:
Case 1:
a = 1
def f():
    a += 1
    print(a)
>>> f()
...
UnboundLocalError: local variable 'a' referenced before assignment
Reason I presume: Function could not find variable a in its local scope.
Case 2:
a = 1
def f():
    print(a)
>>> f()
1
But now the function does find variable a.
This contradicts the reason I presumed in the previous case.
Why is this happening?
This is because for your second example, the compiler tries to help you out with a bit of magic. In the case of
def f():
    print(a)
Here's the bytecode that it reduces to:
dis.dis(f)
2 0 LOAD_GLOBAL 0 (print)
3 LOAD_GLOBAL 1 (a) #aha!
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
You can see that since you never assigned to a within the scope of f, the compiler knows you must be referring to a global.
When you start assigning to a, now the compiler won't try to help you out. Unless you've explicitly told it that a is a global, it will treat it as a local.
def g():
    x += 1
dis.dis(g)
2 0 LOAD_FAST 0 (x) #note no assumption that it is global
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
And now you can ask the question, okay, but why does the compiler not help you out in your first example? One way of explaining it is of course "explicit is better than implicit". Though of course the compiler is implicitly doing stuff in your second example, so maybe that's not a satisfying explanation :).
Mostly it comes down to This Is How Python Works, I'd say. If you assign to a variable, python will treat it as local to that scope unless you tell it otherwise. So your statement:
Global variables cannot be accessed within a function without using
the global keyword
is not quite correct. You can access variables from an outer scope but you cannot assign to said variables without explicitly declaring you want to.
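That the decision is made at compile time, before anything runs, can be seen on the code objects themselves. A minimal sketch with hypothetical function names; co_varnames lists the names the compiler treats as local, co_names the ones it looks up globally.

```python
a = 1

def read_only():
    return a        # never assigned here, so a is looked up as a global

def broken():
    a += 1          # assigned here, so a is compiled as a local

def fixed():
    global a        # explicit declaration: the assignment targets the global
    a += 1
    return a

print(broken.__code__.co_varnames)         # ('a',)
print(read_only.__code__.co_varnames)      # ()
print("a" in read_only.__code__.co_names)  # True

result = fixed()
print(result)  # 2
```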
Sidenote, this is perfectly legal:
x = [1]
def f():
    x[0] += 1
f()
#x is now [2]
Which may be confusing :-). This is because in this context you are not rebinding the name x; x[0] += 1 reads the element with the list's __getitem__ method and stores the result back with its __setitem__ method, so only the list's contents change. In Python 2 this trick is frequently used as a workaround for the fact that there is no nonlocal keyword.
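To see which methods x[0] += 1 actually goes through, one can use a list subclass that records the calls. This is only an illustrative sketch (Python 3 syntax); LoggingList is a made-up class, not part of any library.

```python
class LoggingList(list):
    """A list that records which item-access methods are invoked on it."""
    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __getitem__(self, index):
        self.calls.append("__getitem__")
        return super().__getitem__(index)

    def __setitem__(self, index, value):
        self.calls.append("__setitem__")
        super().__setitem__(index, value)

x = LoggingList([1])

def f():
    x[0] += 1   # no global statement needed: the name x is never rebound

f()
print(x)        # [2]
print(x.calls)  # ['__getitem__', '__setitem__']
```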