Confusing use of global in Mark Lutz's "Learning Python"

Confusing use of global in Mark Lutz's "Learning Python" - python

On page 551 of the 5th edition, there is the following file, named thismod.py:
var = 99
def local():
var = 0
def glob1():
global var
var+=1
def glob2():
var = 0
import thismod
thismod.var+=1
def glob3():
var = 0
import sys
glob = sys.modules['thismod']
glob.var+=1
def test():
print(var)
local(); glob1(); glob2(); glob3()
print(var)
After which the test is run in the terminal as follows:
>>> import thismod
>>> thismod.test()
99
102
>>> thismod.var
102
The use of the local() function makes perfect sense, as python makes a variable var in the local scope of the function local(). However I am lost when it comes to any uses of the global variables.
If I make a function as follows:
def nonglobal():
var+=1
I get an UnboundLocalError when running this function in the terminal. My current understanding is that the function would run, and first search the local scope of thismod.nonglobal, then, being unable to find an assignment to var in nonglobal(), would then search the global scope of the thismod.py file, wherein it would find thismod.var with the value of 99, but it does not. Running thismod.var immediately after in the terminal, however, yields the value as expected. Thus I clearly haven't understood what has happened with the global var declaration in glob1().
I had expected the above to happen also for the var = 0 line in glob2(), but instead this acts only as a local variable (to glob2()?), despite having had the code for glob1() run prior to the assignment. I had also thought that the import thismod line in glob2() would reset var to 99, but this is another thing that caught me off guard; if I comment out this line, the next line fails.
In short I haven't a clue what's going on with the global scope/imports here, despite having read this chapter of the book. I felt like I understood the discussion on globals and scope, but this example has shown me I do not. I appreciate any explanation and/or further reading that could help me sort this out.
Thanks.

Unless imported with the global keyword, variables in the global scope can only be used in a read-only capacity in any local function. Trying to write to them will produce an error.
Creating a local variable with the same name as a global variable, using the assignment operator =, will "shadow" the global variable (i.e. make the global variable unaccessible in favor of the local variable, with no other connection between them).
The arithmetic assignment operators (+=, -=, /=, etc.) play by weird rules as far as this scope is concerned. On one hand you're assigning to a variable, but on the other hand they're mutative, and global variables are read-only. Thus you get an error, unless you bring the global variable into local scope by using global first.
Admittedly, python has weird rules for this. Using global variables for read-only purposes is okay in general (you don't have to import them as global to use their value), except for when you shadow that variable at any point within the function, even if that point is after the point where you would be using its value. This probably has to do with how the function defines itself, when the def statement is executed:
var = 10
def a():
var2 = var
print(var2)
def b():
var2 = var # raises an error on this line, not the next one
var = var2
print(var)
a() # no errors, prints 10 as expected
b() # UnboundLocalError: local variable 'var' referenced before assignment
Long story short, you can use global variables in a read-only capacity all you like without doing anything special. To actually modify global variables (by reassignment; modifying the properties of global variables is fine), or to use global variables in any capacity while also using local variables which have the same name, you have to use the global keyword to bring them in.
As far as glob2() goes - the name thismod is not in the namespace (i.e. scope) until you import it into the namespace. Since thismod.var is a property of what is now a local variable, and not a global read-only variable, we can write to it just fine. The read-only restriction only really applies to global variables within the same file.
glob3() does effectively the same thing as glob2, except it references sys's list of imported modules rather than using the import keyword. This is basically the same behavior that import exhibits (albeit a gross oversimplification) - in general, modules are only loaded once, and trying to import them again (from anywhere, e.g. in multiple files) will get you the same instance that was imported the first time. importlib has ways to mess with that, but that's not immediately relevant.

Related

How interpreter of Python recognize undefined global variable in function?

How interpreter of Python recognize undefined global variable (a) in function in the following code ?
def show():
print(a)
a = 1
show()
Python is interactive language, so it processes each line of code line by line.
Given this, it should throw an error at the line with undefined variable (print(a)).
However, it works without error.
How does interpreter of Python recognize the undefined variable (a) ?
Or is it just recognized as letters until show function is called?
I converted the above code to bytecode, but I didn't understand well it...

When you define your function inside a python interpreter, python treats it as a sort of black box. It initializes the variables that are not defined inside and outside the function as free variables. Then, it stores a reference to the function inside the global table (you can access it using globals()). This global table holds the values for global variables and global function references.
When you define the variable a python stores it inside the global dictionary as well. Just like the function before it.
After that when you call your function, python sees a variable a. It knows that the variable is free, therefore, must be declared inside the global variable by now. Then it looks up the global table and uses the value that is stored.

Python is run line by line, and in saying that, it will skip over the function until the function is called. So even though it's line by line, it's still running afterwards.

Use of global keyword:
To access a global variable inside a function, there is no need to use global keyword. As a is not assigned inside the function, python will look at the global scope.
We use global keyword to assign a new value to the global variable.
This example throws an error -
def show():
a = a + 5
print(a)
a = 1
show()
Error:
UnboundLocalError: local variable 'a' referenced before assignment

A variable is in the scope of only some of thees python nested function definitions but not all, why?

In the code below:
def f():
a = 'x'
def g():
print(a)
if a == 'x':
return True
return False
def h():
print(a)
def i():
a = a + a
return a
a = i()
return a
if g():
return h()
Why is a accessible in function g, but not in function h or i?
I don't want to use nonlocal since I don't want to modify a in any of the inner functions, however I don't see why a itself is not accessible.

Short answer: because you assigned to a (by writing a = a + a and a = i()), you created local variables. The fact that you use variables before assignment does not matter.
Python checks the scope by checking assignments. If you somewhere write an assignment like a =, a +=, etc. regardless where you write it in the function, the function sees a as a local scope variable.
So in case you write:
a = 2
def f():
print(a)
a = 3
Even if you access a before you assign to a, it will still see a as a local variable. Python does not do codepath analysis here.
it sees a a a local variable in f. It will error if you call f(), since it will say you fetch a before it is actually assigned.
In case a variable is not defined locally, Python will iteratively inspect the outer scopes until it finds an a.
The only ways to access a variable from an outer scope if you assign to in a scope is by working with nonlocal or global (or of course passing it as a parameter).

The other answer is great for explaining what's going wrong. I'm adding my own answer to try to explain some of the reasons behind the issue (i.e. the "why" rather the "what").
First you need to understand Python's architecture a little bit. We often describe Python as an "interpreted" language rather than a "compiled" language like C, but that's not really the whole story. While Python doesn't compile directly to machine code, the interpreter doesn't run on the raw source code when the program is running. Rather, there's an intermediate step where the source code is compiled to to bytecode. The compiling happens automatically when a module is loaded, so you may not even be aware of it (though you may have seen the .pyc files that the compiler writes to cache the bytecode).
Anyway, to get back to your scope issue: Python's compiler uses a bytecode instruction to tell the interpreter to access a local variable rather than it uses for accessing a global variable (and third different instruction is sued for to accessing a variable from an enclosing function's scope). Since the bytecode is written by the compiler, the bytecode instruction to use needs to be decided at a function's compile time, not when the function is called. The choice of instruction is tricky though for ambiguous code like this:
a = 1
def foo():
if bar():
a = 2
print(a)
Does the access of a for the print call use the bytecode instruction that reads the global variable a or the instruction that accesses the local variable a? There's no way for the compiler to know in advance if bar will return a true value or not, so there's no possible answer that will let the function work in all situations.
To avoid ambiguity, Python's designers chose that the scope of a variable should be constant throughout each function (so the compiler can just pick one bytecode instruction and stick with it). That is, a name like a can refer to local or a global (or a closure cell) but only one of those in any given function.
The compiler defaults to using local variables (which are the fastest to access) for any name used as the target of an assignment anywhere in the function's code. Since inner functions are compiled at the same time as the functions that contain them, non-local lookups can also be detected at compile time and the appropriate instruction used. If the name isn't found in either the local or the enclosing scopes, the compiler assumes it is a global variable (which doesn't need to be defined yet). The global and nonlocal statements allow you to explicitly tell the compiler to use a specific scope (overriding what it would pick on its own).
You can explore the different ways the compiler handles variable lookups in different scopes using the dis module from the standard library. The dis module disassembles bytecode into a more readable format. Try calling dis.dis on functions like these:
a = 1
def load_global():
print(a) # access the global variable "a"
def load_fast():
a = 2
print(a) # access the local variable "a", which shadows the global variable
def closure():
a = 2
def load_dref():
print(a) # access the variable "a" from the enclosing scope
return load_dref
load_dref = closure() # both dis.dis(closure) and dis.dis(load_dref) are interesting
The full details of how to interpret the output of dis.dis are beyond the scope (no pun intended) of this answer, but the main things to look for are the LOAD_... bytecode instructions that deal with (a) as their target. You'll see three different LOAD_... instructions in the different functions above, corresponding to the three different kinds of scopes they're reading from (each function is named for the corresponding instruction).

python static variables and methods

I know there have been several posts about this, but I am still confused. Am trying to use a static variable with initialization, and don't know how to do it. So what I have is a package 'config', which has a module the_config.py. What I would like is for this to be something like
# the_config.py
import yaml
user_settings=None
def initialize(user_settings_file)
with open(user_settings_file) as yaml_handle:
user_settings = yaml.safe_load(user_settings_file)
Then there would be a calling module as pipeline.py
#pipeline.py
import config.the_config as my_config
def main(argv):
...
my_config.intialize(user_settings_file)
print my_config.user_settings['Output_Dir']
But this doesn't work. How should I be doing this please?
Thanks in advance.

When you assign to user_settings, it is automatically treated as a local variable in the initialize function. To tell Python that the assignment is intended to change the global variable instead, you need to write
global user_settings
at the beginning of initialize.

In Python any variable that is assigned in the body of a function is considered a local variable, unless it's has been explicitly declared differently with either global or nonlocal declarations.
Python considers also assignment any "augmented-assignment" operator like += or /=.
The mandatory declaration of global that are modified is a (little) price to pay to the fact that in Python there is no need to declare variables.
It's also assumed that your code doesn't rely too much on mutating state in that is kept global variables so if your code requires a lot of global declarations then there's probably something wrong.

I can propose You some way to solve this.
First of all the root of your problem is creation of new local variable in your initialize function
user_settings = yaml.safe_load(user_settings_file)
As soon as there is equal sign right to variable name python create new variable in corresponding scope (in this case local for initialize function
to avoid this one can use following:
use global declaration
def initialize(user_settings_file)
global user_settings # here it is
with open(user_settings_file) as yaml_handle:
user_settings = yaml.safe_load(user_settings_file)
modify existing variable but not create new one
user_settings = {}
def initialize(user_settings_file)
with open(user_settings_file) as yaml_handle:
user_settings.update(yaml.safe_load(user_settings_file)) # here we modify existing user_settings
operate with module attribute (this one is quite tricky)
user_settings = {}
def initialize(user_settings_file)
with open(user_settings_file) as yaml_handle:
import the_config
the_config.user_settings = yaml.safe_load(user_settings_file)

Globals as function input instead arguments

I'm just learning about how Python works and after reading a while I'm still confused about globals and proper function arguments. Consider the case globals are not modified inside functions, only referenced.
Can globals be used instead function arguments?
I've heard about using globals is considered a bad practice. Would it be so in this case?
Calling function without arguments:
def myfunc() :
print myvalue
myvalue = 1
myfunc()
Calling function with arguments
def myfunc(arg) :
print arg
myvalue = 1
myfunc(myvalue)

I've heard about using globals is considered a bad practice. Would it be so in this case?
It depends on what you're trying to achieve. If myfunc() is supposed to print any value, then...
def myfunc(arg):
print arg
myfunc(1)
...is better, but if myfunc() should always print the same value, then...
myvalue = 1
def myfunc():
print myvalue
myfunc()
...is better, although with an example so simple, you may as well factor out the global, and just use...
def myfunc():
print 1
myfunc()

Yes. Making a variable global works in these cases instead of passing them in as a function argument. But, the problem is that as soon as you start writing bigger functions, you quickly run out of names and also it is hard to maintain the variables which are defined globally. If you don't need to edit your variable and only want to read it, there is no need to define it as global in the function.
Read about the cons of the global variables here - Are global variables bad?

There are several reasons why using function arguments is better than using globals:
It eliminates possible confusion: once your program gets large, it will become really hard to keep track of which global is used where. Passing function arguments lets you be much more clear about which values the function uses.
There's a particular mistake you WILL make eventually if you use globals, which will look very strange until you understand what's going on. It has to do with both modifying and reading a global variable in the same function. More on this later.
Global variables all live in the same namespace, so you will quickly run into the problem of overlapping names. What if you want two different variables named "index"? Calling them index1 and index2 is going to get real confusing, real fast. Using local variables, or function parameters, means that they all live in different namespaces, and the potential for confusion is greatly reduced.
Now, I mentioned modifying and reading a global variable in the same function, and a confusing error that can result. Here's what it looks like:
record_count = 0 # Global variable
def func():
print "Record count:", record_count
# Do something, maybe read a record from a database
record_count = record_count + 1 # Would normally use += 1 here, but it's easier to see what's happening with the "n = n + 1" syntax
This will FAIL: UnboundLocalError: local variable 'record_count' referenced before assignment
Wait, what? Why is record_count being treated as a local variable, when it's clearly global? Well, if you never assigned to record_count in your function, then Python would use the global variable. But when you assign a value to record_count, Python has to guess what you mean: whether you want to modify the global variable, or whether you want to create a new local variable that shadows (hides) the global variable, and deal only with the local variable. And Python will default to assume that you're being smart with globals (i.e., not modifying them without knowing exactly what you're doing and why), and assume that you meant to create a new local variable named record_count.
But if you're accessing a local variable named record_count inside your function, Python won't let you access the global variable with the same name inside the function. This is to spare you some really nasty, hard-to-track-down bugs. Which means that if this function has a local variable named record_count -- and it does, because of the assignment statement -- then all access to record_count is considered to be accessing the local variable. Including the access in the print statement, before the local variable's value is defined. Thus, the UnboundLocalError exception.
Now, an exercise for the reader. Remove the print statement and notice that the UnboundLocalError exception is still thrown. Can you figure out why? (Hint: before assigning to a variable, the value on the right-hand side of the assignment has to be calculated.)
Now: if you really want to use the global record_count variable in your function, the way to do it is with Python's global statement, which says "Hey, this variable name I'm about to specify? Don't ever make it a local variable, even if I assign to it. Assign to the global variable instead." The way it works is just global record_count (or any other variable name), at the start of your function. Thus:
record_count = 0 # Global variable
def func():
global record_count
print "Record count:", record_count
# Do something, maybe read a record from a database
record_count = record_count + 1 # Again, you would normally use += 1 here
This will do what you expected in the first place. But hopefully now you understand why it will work, and the other version won't.

It depends on what you want to do.
If you need to change the value of a variable that is declared outside of the function then you can't pass it as an argument since that would create a "copy" of that variable inside the functions scope.
However if you only want to work with the value of a variable you should pass it as an argument. The advantage of this is that you can't mess up the global variable by accident.
Also you should declare global variable before they are used.

How to create module-wide variables in Python? [duplicate]

This question already has answers here:
Using global variables in a function
(25 answers)
Closed 3 years ago.
The community reviewed whether to reopen this question 4 months ago and left it closed:
Original close reason(s) were not resolved
Is there a way to set up a global variable inside of a module? When I tried to do it the most obvious way as appears below, the Python interpreter said the variable __DBNAME__ did not exist.
...
__DBNAME__ = None
def initDB(name):
if not __DBNAME__:
__DBNAME__ = name
else:
raise RuntimeError("Database name has already been set.")
...
And after importing the module in a different file
...
import mymodule
mymodule.initDB('mydb.sqlite')
...
And the traceback was:
...
UnboundLocalError: local variable 'DBNAME' referenced before assignment
...
Any ideas? I'm trying to set up a singleton by using a module, as per this fellow's recommendation.

Here is what is going on.
First, the only global variables Python really has are module-scoped variables. You cannot make a variable that is truly global; all you can do is make a variable in a particular scope. (If you make a variable inside the Python interpreter, and then import other modules, your variable is in the outermost scope and thus global within your Python session.)
All you have to do to make a module-global variable is just assign to a name.
Imagine a file called foo.py, containing this single line:
X = 1
Now imagine you import it.
import foo
print(foo.X) # prints 1
However, let's suppose you want to use one of your module-scope variables as a global inside a function, as in your example. Python's default is to assume that function variables are local. You simply add a global declaration in your function, before you try to use the global.
def initDB(name):
global __DBNAME__ # add this line!
if __DBNAME__ is None: # see notes below; explicit test for None
__DBNAME__ = name
else:
raise RuntimeError("Database name has already been set.")
By the way, for this example, the simple if not __DBNAME__ test is adequate, because any string value other than an empty string will evaluate true, so any actual database name will evaluate true. But for variables that might contain a number value that might be 0, you can't just say if not variablename; in that case, you should explicitly test for None using the is operator. I modified the example to add an explicit None test. The explicit test for None is never wrong, so I default to using it.
Finally, as others have noted on this page, two leading underscores signals to Python that you want the variable to be "private" to the module. If you ever do an import * from mymodule, Python will not import names with two leading underscores into your name space. But if you just do a simple import mymodule and then say dir(mymodule) you will see the "private" variables in the list, and if you explicitly refer to mymodule.__DBNAME__ Python won't care, it will just let you refer to it. The double leading underscores are a major clue to users of your module that you don't want them rebinding that name to some value of their own.
It is considered best practice in Python not to do import *, but to minimize the coupling and maximize explicitness by either using mymodule.something or by explicitly doing an import like from mymodule import something.
EDIT: If, for some reason, you need to do something like this in a very old version of Python that doesn't have the global keyword, there is an easy workaround. Instead of setting a module global variable directly, use a mutable type at the module global level, and store your values inside it.
In your functions, the global variable name will be read-only; you won't be able to rebind the actual global variable name. (If you assign to that variable name inside your function it will only affect the local variable name inside the function.) But you can use that local variable name to access the actual global object, and store data inside it.
You can use a list but your code will be ugly:
__DBNAME__ = [None] # use length-1 list as a mutable
# later, in code:
if __DBNAME__[0] is None:
__DBNAME__[0] = name
A dict is better. But the most convenient is a class instance, and you can just use a trivial class:
class Box:
pass
__m = Box() # m will contain all module-level values
__m.dbname = None # database name global in module
# later, in code:
if __m.dbname is None:
__m.dbname = name
(You don't really need to capitalize the database name variable.)
I like the syntactic sugar of just using __m.dbname rather than __m["DBNAME"]; it seems the most convenient solution in my opinion. But the dict solution works fine also.
With a dict you can use any hashable value as a key, but when you are happy with names that are valid identifiers, you can use a trivial class like Box in the above.

Explicit access to module level variables by accessing them explicity on the module
In short: The technique described here is the same as in steveha's answer, except, that no artificial helper object is created to explicitly scope variables. Instead the module object itself is given a variable pointer, and therefore provides explicit scoping upon access from everywhere. (like assignments in local function scope).
Think of it like self for the current module instead of the current instance !
# db.py
import sys
# this is a pointer to the module object instance itself.
this = sys.modules[__name__]
# we can explicitly make assignments on it
this.db_name = None
def initialize_db(name):
if (this.db_name is None):
# also in local function scope. no scope specifier like global is needed
this.db_name = name
# also the name remains free for local use
db_name = "Locally scoped db_name variable. Doesn't do anything here."
else:
msg = "Database is already initialized to {0}."
raise RuntimeError(msg.format(this.db_name))
As modules are cached and therefore import only once, you can import db.py as often on as many clients as you want, manipulating the same, universal state:
# client_a.py
import db
db.initialize_db('mongo')
# client_b.py
import db
if (db.db_name == 'mongo'):
db.db_name = None # this is the preferred way of usage, as it updates the value for all clients, because they access the same reference from the same module object
# client_c.py
from db import db_name
# be careful when importing like this, as a new reference "db_name" will
# be created in the module namespace of client_c, which points to the value
# that "db.db_name" has at import time of "client_c".
if (db_name == 'mongo'): # checking is fine if "db.db_name" doesn't change
db_name = None # be careful, because this only assigns the reference client_c.db_name to a new value, but leaves db.db_name pointing to its current value.
As an additional bonus I find it quite pythonic overall as it nicely fits Pythons policy of Explicit is better than implicit.

Steveha's answer was helpful to me, but omits an important point (one that I think wisty was getting at). The global keyword is not necessary if you only access but do not assign the variable in the function.
If you assign the variable without the global keyword then Python creates a new local var -- the module variable's value will now be hidden inside the function. Use the global keyword to assign the module var inside a function.
Pylint 1.3.1 under Python 2.7 enforces NOT using global if you don't assign the var.
module_var = '/dev/hello'
def readonly_access():
connect(module_var)
def readwrite_access():
global module_var
module_var = '/dev/hello2'
connect(module_var)

For this, you need to declare the variable as global. However, a global variable is also accessible from outside the module by using module_name.var_name. Add this as the first line of your module:
global __DBNAME__

You are falling for a subtle quirk. You cannot re-assign module-level variables inside a python function. I think this is there to stop people re-assigning stuff inside a function by accident.
You can access the module namespace, you just shouldn't try to re-assign. If your function assigns something, it automatically becomes a function variable - and python won't look in the module namespace.
You can do:
__DB_NAME__ = None
def func():
if __DB_NAME__:
connect(__DB_NAME__)
else:
connect(Default_value)
but you cannot re-assign __DB_NAME__ inside a function.
One workaround:
__DB_NAME__ = [None]
def func():
if __DB_NAME__[0]:
connect(__DB_NAME__[0])
else:
__DB_NAME__[0] = Default_value
Note, I'm not re-assigning __DB_NAME__, I'm just modifying its contents.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Confusing use of global in Mark Lutz's "Learning Python" - python

Related

How interpreter of Python recognize undefined global variable in function?

A variable is in the scope of only some of thees python nested function definitions but not all, why?

python static variables and methods

Globals as function input instead arguments

How to create module-wide variables in Python? [duplicate]

Categories

Resources