Python: Why is global needed only on assignment and not on reads?


If a function needs to modify a variable declared in global scope, it needs to use the global declaration. However, if the function just needs to read a global variable, it can do so without using a global declaration:
X = 10
def foo():
    global X
    X = 20 # Needs global declaration
def bar():
    print( X ) # Does not need global
My question is about the design of Python: why is Python designed to allow the read of global variables without using the global declaration? That is, why only force assignment to have global, why not force global upon reads too? (That would make it even and elegant.)
Note: I can see that there is no ambiguity while reading, but while assigning it is not clear if one intends to create a new local variable or assign to the global one. But, I am hoping there is a better reason or intention to this uneven design choice by the BDFL.

With nested scopes, the variable lookups are easy. They occur in a chain starting with locals, through enclosing defs, to module globals, and then builtins. The rule is the first match found wins. Accordingly, you don't need a "global" declaration for lookups.
In contrast, with writes you need to specify which scope to write to. There is otherwise no way to determine whether "x = 10" in function would mean "write to a local namespace" or "write to a global namespace."
Executive summary: with writes you have a choice of namespace, but with lookups the first-found rule suffices. Hope this helps :-)
Edit: Yes, it is this way "because the BDFL said so", but it isn't unusual in other languages without type declarations to have a first-found rule for lookups and to only require a modifier for nonlocal writes. When you think about it, those two rules lead to very clean code since the scope modifiers are only needed in the least common case (nonlocal writes).
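To make the lookup chain concrete, here is a minimal sketch of the first-match-wins rule. The function and variable names are made up for illustration; only the lookup order (local, enclosing def, module global, builtins) comes from the answer above.

```python
name = "global"          # module (global) scope

def outer():
    name = "enclosing"   # scope of the enclosing def

    def inner_reads_enclosing():
        return name      # no local 'name', so the enclosing one is found first

    def inner_reads_local():
        name = "local"   # assignment makes 'name' local to this function
        return name

    return inner_reads_local(), inner_reads_enclosing()

def reads_global():
    return name          # nothing local or enclosing, so the global is used

def reads_builtin():
    return len("abc")    # 'len' is neither local nor global; found in builtins

result = outer() + (reads_global(), reads_builtin())
print(result)
```

None of the reads needs a declaration, because each one simply stops at the first scope that has the name.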

Look at this code:
from module import function
def foo(x):
    return function(x)
The name function here is a global. It would get awfully tedious if I had to say global function to get this code to work.
Before you say that your X and my function are different (because one is a variable and the other is an imported function), remember that all names in Python are treated the same: when used, their value is looked up in the scope hierarchy. If you needed global X then you'd need global function. Ick.
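A small runnable variant of the same point, using math.sqrt as a stand-in for the placeholder import above (the hypotenuse function is only an illustration):

```python
from math import sqrt   # binds the name 'sqrt' in the module's global namespace

def hypotenuse(a, b):
    # 'sqrt' is read here with no declaration; the lookup finds it in the globals
    return sqrt(a * a + b * b)

print(hypotenuse(3, 4))
```

If reads required a declaration, this function would need a global sqrt line just to call an imported function.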

Because explicit is better than implicit.
There's no ambiguity when you read a variable. You always get the first one found when searching scopes up from local until global.
When you assign, there are only two scopes the interpreter may unequivocally assume you are assigning to: local and global. Since assigning to local is the most common case and assigning to global is actually discouraged, local is the default. To assign to global you have to do it explicitly, telling the interpreter that wherever you use that variable in this scope, it should go straight to global scope and you know what you're doing. On Python 3 you can also assign to the nearest enclosing scope with 'nonlocal'.
Remember that when you assign to a name in Python, this new assignment has nothing to do with that name previously existing assigned to something else. Imagine if there was no default to local and Python searched up all scopes trying to find a variable with that name and assigning to it as it does when reading. Your functions' behavior could change based not only on your parameters, but on the enclosing scope. Life would be miserable.
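For the Python 3 'nonlocal' case mentioned above, a minimal sketch might look like this (make_counter and increment are illustrative names, not from the question):

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count   # rebind 'count' in make_counter instead of creating a local
        count += 1
        return count

    return increment

counter = make_counter()
first, second = counter(), counter()
print(first, second)
```

Without the nonlocal line, count += 1 would make count local to increment and the call would fail with UnboundLocalError.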

You say it yourself that with reads there is no ambiguity and with writes there is. Therefore you need some mechanism for resolving the ambiguity with writes.
One option (possibly actually used by much older versions of Python, IIRC) is to just say writes always go to the local scope. Then there's no need for a global keyword, and no ambiguity. But then you can't write to global variables at all (without using things like globals() to get at them in a round-about way), so that wouldn't be great.
Another option, used by languages that statically declare variables, is to communicate to the language implementation up-front for every scope which names are local (the ones you declare in that scope) and which names are global (names declared at the module scope). But Python doesn't have declared variables, so this solution doesn't work.
Another option would be to have x = 3 assign to a local variable only if there isn't already a name in some outer scope with name x. Seems like it would intuitively do the right thing? It would lead to some seriously nasty corner cases though. Currently, where x = 3 will write to is statically determined by the parser; either there's no global x in the same scope and it's a local write, or there is a global x and it's a global write. But if what it will do depends on the global module scope, you have to wait until runtime to determine where the write goes which means it can change between invocations of a function. Think about that. Every time you create a global in a module, you would alter the behaviour of all functions in the module that happened to be using that name as a local variable name. Do some module scope computation that uses tmp as a temporary variable and say goodbye to using tmp in all functions in the module. And I shudder to think of the obscure bugs involving assigning an attribute on a module you've imported and then calling a function from that module. Yuck.
And another option is to communicate to the language implementation on each assignment whether it should be local or global. This is what Python has gone with. Given that there's a sensible default that covers almost all cases (write to a local variable), we have local assignment as the default and explicitly mark out global assignments with global.
There is an ambiguity with assignments that needs some mechanism to resolve it. global is one such mechanism. It's not the only possible one, but in the context of Python, it seems that all the alternative mechanisms are horrible. I don't know what sort of "better reason" you're looking for.
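As a rough illustration of the options discussed above, this sketch compares a plain assignment, an assignment under a global declaration, and the round-about globals() route. The function names are invented for the example.

```python
x = 1

def write_local():
    x = 2                  # no declaration: creates a local, module-level x is untouched

def write_global():
    global x               # explicit marker: the assignment targets the module-level x
    x = 3

def write_via_globals():
    globals()["x"] = 4     # round-about write through the module's namespace dict

write_local()
after_local = x
write_global()
after_global = x
write_via_globals()
after_dict = x
print(after_local, after_global, after_dict)
```

Where each assignment goes is fixed by the text of the function alone, which is the property the answer argues is worth keeping.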

Related

Why does my module not behave like a singleton?

I have a JSON file that I am using as a datastore in a small game I am using as a way to learn Python.
I am proficient in a number of other languages.
I have several classes that want read access to the JSON so I want to load the JSON from the file into a variable and then allow the other classes to access the variable via getters and setters, because each class wants different parts of the JSON.
This sounds like a job for a Singleton. I understood that a Python Module behaves like a singleton.
However, when I import the Module into my classes the variable resets?
Here is a very cut down example:
Module:- state_manager
x=45
def set_x(value):
    x=value
def get_x():
    return x
Class:- Game
import Player
import state_manager
value = state_manager.get_x()
Class:- Player
import state_manager
state_manager.set_x(12)
By setting breakpoints I can see that when Player is imported by Game that Player sets the value of x in state_manager to 12.
But when I look at the value of x returned to Game using state_manager.get_x() I get 45.
Why is this?
What is the correct way in Python to create a Module or Object that can be shared among other classes?
I realise I can construct a Singleton myself but I thought I'd use the features of Python.

By setting breakpoints I can see that when Player is imported by Game that Player sets the value of x in state_manager to 12.
I am fairly sure that you're doing something wrong in your inspection, because the set_x function, at least as you quoted it...
x=45
def set_x(value):
    x=value
...does not do what you think it does. Since x is being assigned to in the scope of set_x, it does not refer to the global (module-level) variable x, but to a local variable x that is immediately discarded as part of the stack frame when set_x returns. The existence of static assignments is effectively how local variables are declared in Python. The fix is to declare x as referring to the global variable:
x=45
def set_x(value):
    global x
    x=value

You need to declare x global in any function that attempts to set it globally:
def set_x(value):
    global x
    x=value
Without the global declaration, x is just a function-local variable.
In general, if a function assigns to a variable, anywhere in the function, then that variable is local unless it is explicitly declared global (or nonlocal). If a function only reads a variable, without setting it, then the variable is taken from a higher scope (e.g., a global, or an up-level reference).

UnboundLocalError causes

Does UnboundLocalError occur only when we try to use assignment operators (e.g. x += 5) on a non-local variable inside a function or there other cases? I tried using methods and functions (e.g. print) and it worked. I also tried to define a local variable (y) using a global variable (x)
(y = x + 5) and it worked too.

Yes - the presence of the assignment operator is what creates a new variable (of the same name) in the current scope. Calling mutating methods on the old object is not a problem, nor is doing something with that old value, since there's no question (if only a single assignment was ever used) which value you're talking about.
The concern here is not the modification of a value. The concern is the ambiguity of the variable used. This can also be solved using the global keyword, which specifically tells Python to use the global version, eliminating the ambiguity.
Remember also that Python variables (or globals) are sort of hoisted, like in JavaScript. Any variable used inside a specific scope is a variable in that scope from the beginning of that scope. That means a variable used inside a function is a variable in that scope from the start of the function, regardless of if it shows up half way through.
A really good reference for this is here. Some more specifics here.
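A short sketch of the cases from the question, plus one more: any statement that binds a name makes it local, not only the assignment operators. A for loop target is one such binding (the helper names below are made up for the example).

```python
x = 5
items = []

def augmented_assignment():
    x += 5               # assignment makes x local, so reading it first fails

def loop_target():
    print(x)             # fails: the for loop below binds x, making it local
    for x in range(3):
        pass

def mutating_call():
    items.append(1)      # no binding of 'items': the global list is mutated

def read_only():
    y = x + 5            # x is only read, so the global x is used
    return y

def raises_unbound(func):
    """Return True if calling func raises UnboundLocalError."""
    try:
        func()
    except UnboundLocalError:
        return True
    return False

results = (raises_unbound(augmented_assignment),
           raises_unbound(loop_target),
           raises_unbound(mutating_call),
           read_only())
print(results, items)
```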

A variable is in the scope of only some of these Python nested function definitions but not all, why?

In the code below:
def f():
    a = 'x'
    def g():
        print(a)
        if a == 'x':
            return True
        return False
    def h():
        print(a)
        def i():
            a = a + a
            return a
        a = i()
        return a
    if g():
        return h()
Why is a accessible in function g, but not in function h or i?
I don't want to use nonlocal since I don't want to modify a in any of the inner functions, however I don't see why a itself is not accessible.

Short answer: because you assigned to a (by writing a = a + a and a = i()), you created local variables. The fact that you use variables before assignment does not matter.
Python determines the scope by checking assignments. If you write an assignment like a =, a +=, etc. anywhere in the function, the function treats a as a local variable.
So in case you write:
a = 2
def f():
    print(a)
    a = 3
Even if you access a before you assign to a, Python will still see a as a local variable in f. Python does not do codepath analysis here. Calling f() raises an error saying that you fetch a before it is actually assigned.
In case a variable is not defined locally, Python will iteratively inspect the outer scopes until it finds an a.
The only ways to access a variable from an outer scope when you also assign to that name in the inner scope are nonlocal or global (or, of course, passing it in as a parameter).
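Since the asker does not want nonlocal, the parameter route could look roughly like this. It is a sketch of one possible rewrite of the question's code, not the only fix; the print calls are dropped to keep it short.

```python
def f():
    a = 'x'

    def g():
        return a == 'x'      # read-only access: sees f's a

    def h(a):                # a arrives as a parameter, so it is local and already bound
        def i(a):
            a = a + a        # rebinding the parameter only affects i's own a
            return a
        return i(a)

    if g():
        return h(a)

print(f())
```

f's own a is never modified here; h and i work on their own local names that merely start out with the same value.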

The other answer is great for explaining what's going wrong. I'm adding my own answer to try to explain some of the reasons behind the issue (i.e. the "why" rather the "what").
First you need to understand Python's architecture a little bit. We often describe Python as an "interpreted" language rather than a "compiled" language like C, but that's not really the whole story. While Python doesn't compile directly to machine code, the interpreter doesn't run on the raw source code when the program is running. Rather, there's an intermediate step where the source code is compiled to bytecode. The compiling happens automatically when a module is loaded, so you may not even be aware of it (though you may have seen the .pyc files that the compiler writes to cache the bytecode).
Anyway, to get back to your scope issue: Python's compiler uses a different bytecode instruction to tell the interpreter to access a local variable than it uses for accessing a global variable (and a third, different instruction is used for accessing a variable from an enclosing function's scope). Since the bytecode is written by the compiler, the bytecode instruction to use needs to be decided at a function's compile time, not when the function is called. The choice of instruction is tricky though for ambiguous code like this:
a = 1
def foo():
    if bar():
        a = 2
    print(a)
Does the access of a for the print call use the bytecode instruction that reads the global variable a or the instruction that accesses the local variable a? There's no way for the compiler to know in advance if bar will return a true value or not, so there's no possible answer that will let the function work in all situations.
To avoid ambiguity, Python's designers chose that the scope of a variable should be constant throughout each function (so the compiler can just pick one bytecode instruction and stick with it). That is, a name like a can refer to local or a global (or a closure cell) but only one of those in any given function.
The compiler defaults to using local variables (which are the fastest to access) for any name used as the target of an assignment anywhere in the function's code. Since inner functions are compiled at the same time as the functions that contain them, non-local lookups can also be detected at compile time and the appropriate instruction used. If the name isn't found in either the local or the enclosing scopes, the compiler assumes it is a global variable (which doesn't need to be defined yet). The global and nonlocal statements allow you to explicitly tell the compiler to use a specific scope (overriding what it would pick on its own).
You can explore the different ways the compiler handles variable lookups in different scopes using the dis module from the standard library. The dis module disassembles bytecode into a more readable format. Try calling dis.dis on functions like these:
a = 1
def load_global():
    print(a) # access the global variable "a"
def load_fast():
    a = 2
    print(a) # access the local variable "a", which shadows the global variable
def closure():
    a = 2
    def load_deref():
        print(a) # access the variable "a" from the enclosing scope
    return load_deref
load_deref = closure() # both dis.dis(closure) and dis.dis(load_deref) are interesting
The full details of how to interpret the output of dis.dis are beyond the scope (no pun intended) of this answer, but the main things to look for are the LOAD_... bytecode instructions that deal with (a) as their target. You'll see three different LOAD_... instructions in the different functions above, corresponding to the three different kinds of scopes they're reading from (each function is named for the corresponding instruction).
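If you would rather check this programmatically than read the disassembly by eye, a rough sketch with dis.get_instructions follows. Exact opcode names vary between CPython versions (newer releases have variants of LOAD_FAST, for example), so the helper, which is made up for this example, only looks for the instruction family by substring.

```python
import dis

a = 1

def read_global():
    return a               # compiled as a global lookup

def read_local():
    a = 2
    return a               # compiled as a fast local lookup

def make_closure():
    a = 3
    def read_enclosing():
        return a           # compiled as a closure-cell lookup
    return read_enclosing

def load_kinds(func):
    """Return which families of LOAD_* instructions appear in func's bytecode."""
    families = ("LOAD_GLOBAL", "LOAD_FAST", "LOAD_DEREF")
    opnames = [ins.opname for ins in dis.get_instructions(func)]
    return {fam for fam in families if any(fam in name for name in opnames)}

for func in (read_global, read_local, make_closure()):
    print(func.__name__, sorted(load_kinds(func)))
```

Each function ends up with exactly one family of load instruction for a, chosen when the function was compiled rather than when it runs.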

What does it mean that a scope is determined statically and used dynamically?

This is an excerpt of Python docs for Classes I'm struggling to understand:
A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace.
Although scopes are determined statically, they are used dynamically.
I didn't quite comprehend what the author meant by a scope from this definition, what's a textual region of a program, and what it means that scopes are statically determined and dynamically used. I have an intuitive understanding of a scope, but would love to fully appreciate the docs definition. If someone would be so kind as to elaborate what author had in mind it would be greatly appreciated.

"Defined Statically"
There is global scope and local scope (let's ignore the third one).
Whether a variable is global or local in some function is determined before the function is called, i.e. statically.
For example:
a = 1
b = 2
def func1():
    c = 3
print(func1.__code__.co_varnames) # prints ('c',)
It is determined statically that func1 has one local variable and that its name is c. Statically, because it is done as soon as the function is created, not later when some local variable is actually accessed.
What are the consequences of that? Well, for example, this function fails:
a = 1
def func2():
    print(a) # raises an exception
    a = 2
If scopes were dynamic in Python, func2 would have printed 1. Instead, at the line print(a) it is already known that a is a local variable, so the global a will not be used. The local a won't be used either, because it is not yet initialized.
"Used Dynamically"
From the same document:
On the other hand, the actual search for names is done dynamically, at run time — however, the language definition is evolving towards static name resolution, at “compile” time, so don’t rely on dynamic name resolution! (In fact, local variables are already determined statically.)
Global variables are stored in a dictionary. When global variable a is accessed, the interpreter looks for key a in that dictionary. That is dynamic usage.
Local variables are not used that way. The interpreter knows beforehand how many variables a function has, so it can give each of them a fixed location. Then, accessing local variable xy can be optimized by simply taking "the second local variable" or "the fifth local variable", without actually using the variable name.
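Both halves can be observed in a few lines. In this sketch (all names are illustrative) the global is looked up by name at call time, so it may be defined after the function, while the set of local names is already fixed on the code object.

```python
def uses_late_global():
    return late_value      # looked up by name in the globals dict at call time

def has_locals():
    first = 1
    second = 2
    return first + second

try:
    uses_late_global()     # the global does not exist yet
    failed_early = False
except NameError:
    failed_early = True

late_value = 42            # define the global after the function was created
local_names = has_locals.__code__.co_varnames
print(failed_early, uses_late_global(), local_names)
```

The same call fails or succeeds depending on the state of the globals dictionary at that moment, whereas local_names is known without ever calling has_locals.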

Globals as function input instead of arguments

I'm just learning about how Python works and after reading a while I'm still confused about globals and proper function arguments. Consider the case globals are not modified inside functions, only referenced.
Can globals be used instead of function arguments?
I've heard that using globals is considered bad practice. Would it be so in this case?
Calling function without arguments:
def myfunc():
    print(myvalue)
myvalue = 1
myfunc()
Calling function with arguments
def myfunc(arg):
    print(arg)
myvalue = 1
myfunc(myvalue)

I've heard that using globals is considered bad practice. Would it be so in this case?
It depends on what you're trying to achieve. If myfunc() is supposed to print any value, then...
def myfunc(arg):
    print(arg)
myfunc(1)
...is better, but if myfunc() should always print the same value, then...
myvalue = 1
def myfunc():
    print(myvalue)
myfunc()
...is better, although with an example so simple, you may as well factor out the global, and just use...
def myfunc():
    print(1)
myfunc()

Yes. Making a variable global works in these cases instead of passing it in as a function argument. But the problem is that as soon as you start writing bigger functions, you quickly run out of names, and globally defined variables are hard to maintain. If you don't need to edit your variable and only want to read it, there is no need to declare it as global in the function.
Read about the cons of the global variables here - Are global variables bad?

There are several reasons why using function arguments is better than using globals:
It eliminates possible confusion: once your program gets large, it will become really hard to keep track of which global is used where. Passing function arguments lets you be much more clear about which values the function uses.
There's a particular mistake you WILL make eventually if you use globals, which will look very strange until you understand what's going on. It has to do with both modifying and reading a global variable in the same function. More on this later.
Global variables all live in the same namespace, so you will quickly run into the problem of overlapping names. What if you want two different variables named "index"? Calling them index1 and index2 is going to get real confusing, real fast. Using local variables, or function parameters, means that they all live in different namespaces, and the potential for confusion is greatly reduced.
Now, I mentioned modifying and reading a global variable in the same function, and a confusing error that can result. Here's what it looks like:
record_count = 0 # Global variable
def func():
    print("Record count:", record_count)
    # Do something, maybe read a record from a database
    record_count = record_count + 1 # Would normally use += 1 here, but it's easier to see what's happening with the "n = n + 1" syntax
This will FAIL: UnboundLocalError: local variable 'record_count' referenced before assignment
Wait, what? Why is record_count being treated as a local variable, when it's clearly global? Well, if you never assigned to record_count in your function, then Python would use the global variable. But when you assign a value to record_count, Python has to guess what you mean: whether you want to modify the global variable, or whether you want to create a new local variable that shadows (hides) the global variable, and deal only with the local variable. And Python will default to assume that you're being smart with globals (i.e., not modifying them without knowing exactly what you're doing and why), and assume that you meant to create a new local variable named record_count.
But if you're accessing a local variable named record_count inside your function, Python won't let you access the global variable with the same name inside the function. This is to spare you some really nasty, hard-to-track-down bugs. Which means that if this function has a local variable named record_count -- and it does, because of the assignment statement -- then all access to record_count is considered to be accessing the local variable. Including the access in the print statement, before the local variable's value is defined. Thus, the UnboundLocalError exception.
Now, an exercise for the reader. Remove the print statement and notice that the UnboundLocalError exception is still thrown. Can you figure out why? (Hint: before assigning to a variable, the value on the right-hand side of the assignment has to be calculated.)
Now: if you really want to use the global record_count variable in your function, the way to do it is with Python's global statement, which says "Hey, this variable name I'm about to specify? Don't ever make it a local variable, even if I assign to it. Assign to the global variable instead." The way it works is just global record_count (or any other variable name), at the start of your function. Thus:
record_count = 0 # Global variable
def func():
    global record_count
    print("Record count:", record_count)
    # Do something, maybe read a record from a database
    record_count = record_count + 1 # Again, you would normally use += 1 here
This will do what you expected in the first place. But hopefully now you understand why it will work, and the other version won't.
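For comparison, here is a sketch of the same counter written without any global statement, in line with the earlier advice to prefer function arguments. The process_record name is hypothetical: the count goes in as an argument and the new value comes back as the return value.

```python
def process_record(record_count):
    # Hypothetical helper: takes the current count and returns the updated one
    # Do something, maybe read a record from a database
    return record_count + 1

record_count = 0
record_count = process_record(record_count)
record_count = process_record(record_count)
print(record_count)
```

Inside the function, record_count is an ordinary parameter, so there is no question about which scope it belongs to.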

It depends on what you want to do.
If you need to change the value of a variable that is declared outside of the function, passing it as an argument is not enough: the parameter is a new local name inside the function's scope, so assigning to it does not change the outer variable.
However, if you only want to work with the value of a variable, you should pass it as an argument. The advantage of this is that you can't mess up the global variable by accident.
Also, you should define global variables before they are used.
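A quick sketch of what that means in practice (rebind and mutate are made-up names): assigning to a parameter leaves the caller's variable alone, while mutating a passed-in object is visible to the caller, because both names refer to the same object.

```python
def rebind(value):
    value = value + 1      # rebinds the local name only

def mutate(values):
    values.append(1)       # changes the object the caller also refers to

number = 1
numbers = []
rebind(number)
mutate(numbers)
print(number, numbers)
```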
