I would like to clarify how globals from different modules are scoped.
I failed to find relevant documentation on this, so I am relying on observation, but I would like more insight into whether my findings are pure implementation coincidence or whether I can trust them.
Testcase:
module_1.py:
global_var1 = 'm1_v1'
global_var2 = 'm1_v2'
def f():
    print('f: {}'.format(global_var1))
    print('f: {}'.format(global_var2))
module_2.py:
import module_1
global_var1 = 'm2_v1'
def g():
    print('g: {}'.format(global_var1))
    print('g: {}'.format(global_var2))
module_1.f()
g()
$ python3 module_2.py
f: m1_v1
f: m1_v2
g: m2_v1
Traceback (most recent call last):
File "module_2.py", line 11, in <module>
g()
File "module_2.py", line 7, in g
print('g: {}'.format(global_var2))
NameError: name 'global_var2' is not defined
Conclusion:
Thus, my conclusion is that a function will use the globals in this order:
the module where the function is used
the module where the function is defined (EDIT: this is the only answer!)
Globals are not bleeding through from imported modules.
EDIT: Functions imported from other modules bring their module globals like a closure.
Question:
I would like to see some comprehensive documentation on this matter (which I failed to find...).
While testing this out is nice, I have no idea whether this is a freak coincidence that should never be relied on, or the expected behavior.
Also, what if a function is imported through a third module? What if the function is a class method? Etc.
If you can't point me to documentation but you know a guideline "for sure", I am interested in that as well.
I won't go to lengths about why and how I want to use this; I am primarily interested in better understanding the workings of Python. However, if you do know a better solution given the information at hand, I am interested to see that as well - it will NOT be accepted as an answer, though.
In case there is a difference between Python 2 and Python 3, my main interest is Python 3, but it is nice to have the Python 2 info as well.
Each module has its own global scope. Let's look at the second module:
import module_1
global_var1 = 'm2_v1'
def g():
    print('g: {}'.format(global_var1))
    print('g: {}'.format(global_var2))
module_1.f()
g()
Since global_var1 and global_var2 aren't local variables, they are looked up in the global scope. Since module_2.global_var2 isn't defined, you get a NameError.
You would either need to create module_2.global_var2 with something like from module_1 import global_var2, or change the definition of g to
def g():
    print('g: {}'.format(global_var1))
    print('g: {}'.format(module_1.global_var2))
import module_1 only adds one name to the global scope of module_2, namely module_1, which refers to the module object created by the import statement. from module_1 import global_var2 also adds only one name, global_var2, initialized to the current value of module_1.global_var2 (though it still imports module_1 as a side effect, without binding that name). In this case, module_1.global_var2 and module_2.global_var2 are two independent bindings: rebinding one has no effect on the other.
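For illustration, here is a minimal sketch of that difference, reusing the question's module names (so it assumes the module_1.py from the test case above):
import module_1
from module_1 import global_var2

print(module_1.global_var2)    # 'm1_v2', read through the module object
print(global_var2)             # 'm1_v2', a separate binding in this module

global_var2 = 'changed here'   # rebinds only this module's name...
print(module_1.global_var2)    # ...module_1.global_var2 is still 'm1_v2'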
TL;DR: Python has used lexical (static) scoping since Python 2.1, including in Python 3 (see the language reference on name resolution and PEP 227).
Lexical scoping means functions only have access to the enclosing scopes in which they are defined, up to their module’s global namespace. Only the builtins are visible in all modules. The global namespace of modules is not shared, and importing or calling a function in another module does not give it access to that namespace.
Some peculiarities worth pointing out:
Local/function scopes are defined when the function is parsed. Names are either local, closures/nonlocal, or global.
Global/module scopes are mutable. Names can be added, removed or changed at any time.
Classes introduce short-lived scopes that are not visible to functions or other classes defined inside them.
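A small illustration of the last point (my own sketch, not from the original post):
class C:
    x = 10                 # lives in the class namespace, not in an enclosing scope
    def via_self(self):
        return self.x      # fine: look it up as an attribute
    def bare(self):
        return x           # NameError: the class scope is not visible inside methods

C().via_self()             # 10
C().bare()                 # raises NameError: name 'x' is not defined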
Related
I'm creating a function dynamically, and trying to pass the handle to a class for pickling:
def my_func():
    exec("""def my_collate_fn():
        pass""")
    loader = DataLoader(collate_fn=my_collate_fn)
The code above throws an error saying that my_collate_fn is not defined. The weird thing is that during debugging the handle did exist and I could see it in the local scope, yet it throws the error at runtime. Is there something I missed?
For context, I'm strongly avoiding lambdas since Pytorch's DataLoader class can't pickle them if the number of workers is greater than 0.
When you call exec you may pass two additional parameters: dictionaries representing the global and local namespaces where the code is run.
When one creates a function with the def statement, its name is bound in the local namespace. If only globals is given, locals defaults to the same dictionary.
If you do not pass a globals parameter to exec, it uses the global namespace of the place it is called from - the function will be set in the running context, just as if it were typed inline, and you can just use the name you used inside the exec string. Every linter on earth and some other tools will yell at you, though.
If you simply pass an ordinary dictionary as the globals parameter, you can retrieve your function from there:
from textwrap import dedent as D
# use of dedent allows the indentation inside the string
# to match the indentation of the surrounding code

def my_func():
    namespace = {}
    exec(D("""\
        def my_collate_fn():
            pass
        """), namespace)
    return namespace["my_collate_fn"]
The bad news: this is even less picklable than a lambda (if that is possible).
If you have to pass functions around as arguments to sub-processes (for which the internal mechanism is pickling the function), just declare a plain, named function at global scope with def. Pickle will do its best to find the function and pass it around by its __qualname__, and it should work in most cases - just keep it simple.
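A minimal sketch of that advice (function names here are illustrative): module-level functions pickle by reference, nested ones do not:
import pickle

def top_level():
    return 42              # defined at module scope; pickled by reference via its qualified name

def factory():
    def nested():
        return 42          # defined inside another function
    return nested

pickle.dumps(top_level)    # works
pickle.dumps(factory())    # raises: Can't pickle local object 'factory.<locals>.nested'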
I usually don't think too hard about variable scope in python, but I wanted to see if there's a clean explanation for this. Given two files called main.py and utils.py:
utils.py
def run():
    print(L)
main.py
import utils
def run():
    print(L)

if __name__ == '__main__':
    L = [1,2]
    run()
    utils.run()
The first run() call in main.py runs fine despite L not being fed into run(), and the utils.run() call raises a NameError. Is L a global variable available to all functions defined in main.py?
If I imported utils with from utils import * instead of import utils, would that change anything?
It's module-level scope. A global variable defined in a module is available to all functions defined in the same module (unless it is shadowed). Functions in another module don't have access to that module's variables unless they import them.
About "If I imported utils with from utils import * instead of import utils, would that change anything?":
No. The scope is determined at parsing time.
Check the Python tutorial's section on Python Scopes and Namespaces for more information. Notably:
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module's namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time [...]
So the global scope of each function is the module it is defined in, regardless of where it is called from. One of those modules later defines the global variable the function uses; the other does not. When each function runs and looks the name up in its own module's globals, one finds it and the other does not.
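As a quick sketch of the star-import sub-question (a hypothetical rewrite of main.py): even with from utils import *, the imported run still resolves L in utils' own globals:
# main.py (hypothetical variant)
from utils import *         # binds utils.run into this module under the name run

L = [1, 2]
run()                        # still NameError: run's globals are utils.__dict__, which has no L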
See Python's FAQ. Their implementation of scope is a compromise between convenience and the dangers of globals.
Variables are treated as globals if a function only reads them; they need to be explicitly declared global (e.g. global foo) inside the function body if you want to rebind them. If you edit run() to also assign to L without such a declaration, the whole function treats L as local, and the print(L) raises UnboundLocalError.
What's happening here is that your code imports utils and then calls run(). The function sees that you're looking up a name L that is neither local nor a parameter, and checks its module's global namespace.
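A short sketch of that read/assign distinction (function names are illustrative):
L = [1, 2]

def read_only():
    print(L)       # fine: L is only read, so it is looked up in the module globals

def rebind():
    L = [3, 4]     # assignment makes L local to rebind; the global L is untouched

def broken():
    print(L)       # UnboundLocalError: the assignment below makes L local to the whole function
    L = [3, 4]

def fixed():
    global L       # explicitly opt in to rebinding the module-level L
    L = [3, 4]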
I put a method in a file mymodule.py:
def do_something():
    global a
    a=1
If I try
>>> execfile('mymodule.py')
>>> do_something()
>>> print a
I get "1" as I expect. But if I import the module instead,
>>> from mymodule import *
and then run do_something(), then the python session knows nothing about the variable "a".
Can anyone explain the difference to me? Thanks.
Without the globals and locals arguments, execfile executes the file's content in the current namespace (the same namespace that calls execfile).
import, by contrast, executes the specified module in a separate namespace and binds the name mymodule in the local namespace.
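A rough Python 3 sketch of the difference (execfile is gone in Python 3, so exec(open(...).read()) stands in for it here):
# runs mymodule.py in the *current* globals, so do_something's 'global a' lands here
exec(open('mymodule.py').read())
do_something()
print(a)               # 1

# import runs the file in mymodule's own namespace instead
import mymodule
mymodule.do_something()
print(mymodule.a)      # 1, but there is no bare name 'a' in this namespace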
In the second part, where you import mymodule, the reason a isn't showing up is that a is created as a global of mymodule's own namespace, not yours.
Try:
print mymodule.a
This prints:
1
As expected.
As per the Python documentation:
The global statement is a declaration which holds for the entire current code block. It means that the listed identifiers are to be interpreted as globals. It would be impossible to assign to a global variable without global, although free variables may refer to globals without being declared global.
Names listed in a global statement must not be used in the same code block textually preceding that global statement.
Names listed in a global statement must not be defined as formal parameters or in a for loop control target, class definition, function definition, or import statement.
In Python 3.3.1, this works:
i = 76

def A():
    global i
    i += 10

print(i) # 76
A()
print(i) # 86
This also works:
def enclosing_function():
    i = 76
    def A():
        nonlocal i
        i += 10
    print(i) # 76
    A()
    print(i) # 86

enclosing_function()
But this doesn't work:
i = 76

def A():
    nonlocal i # "SyntaxError: no binding for nonlocal 'i' found"
    i += 10

print(i)
A()
print(i)
The documentation for the nonlocal keyword states (emphasis added):
The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope.
In the third example, the "nearest enclosing scope" just happens to be the global scope. So why doesn't it work?
PLEASE READ THIS BIT
I do notice that the documentation goes on to state (emphasis added):
The [nonlocal] statement allows encapsulated code to rebind variables outside of the local scope besides the global (module) scope.
but, strictly speaking, this doesn't mean that what I'm doing in the third example shouldn't work.
The search order for names is LEGB, i.e. Local, Enclosing, Global, Builtin. So the global scope is not an enclosing scope.
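A compact sketch of the four levels (names are illustrative):
x = 'global'                  # G: module scope

def outer():
    x = 'enclosing'           # E: enclosing function scope
    def inner():
        x = 'local'           # L: the local binding wins
        print(x, len(x))      # len is found in B, the builtin scope
    inner()

outer()                       # prints: local 5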
EDIT
From the docs:
The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope. This is important because the default behavior for binding is to search the local namespace first. The statement allows encapsulated code to rebind variables outside of the local scope besides the global (module) scope.
Why is a module's scope considered global and not an enclosing one? It's still not global to other modules (well, unless you do from module import *), is it?
If you put a name into a module's namespace, it is visible from any module that imports that module, i.e., it is global for the whole Python process.
In general, your application should use as few mutable globals as possible. See Why globals are bad?:
Non-locality
No Access Control or Constraint Checking
Implicit coupling
Concurrency issues
Namespace pollution
Testing and Confinement
Therefore it would be bad if nonlocal allowed you to create globals by accident. If you want to modify a global variable, you can use the global keyword directly.
global is the most destructive: may affect all uses of the module anywhere in the program
nonlocal is less destructive: limited by the outer() function scope (the binding is checked at compile time)
no declaration (local variable) is the least destructive option: limited by inner() function scope
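A sketch illustrating those three options side by side (the outer/inner names are illustrative):
x = 'module'

def outer():
    x = 'outer'
    def set_local():
        x = 'inner'              # new local binding; nothing outside changes
    def set_nonlocal():
        nonlocal x
        x = 'changed outer'      # rebinds outer's x only
    def set_global():
        global x
        x = 'changed module'     # rebinds the module-level x
    set_local()
    set_nonlocal()
    set_global()
    print(x)                     # changed outer

outer()
print(x)                         # changed module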
You can read about the history and motivation behind nonlocal in PEP 3104 - Access to Names in Outer Scopes.
It comes down to some boundary cases:
nonlocal comes with some sensitive areas we need to be aware of. First, unlike the global statement, nonlocal names really must already have been assigned in an enclosing def's scope when the nonlocal is evaluated, or else you'll get an error - you cannot create them dynamically by assigning them anew in the enclosing scope. In fact, they are checked at function definition time, before either the enclosing or the nested function is called:
>>> def tester(start):
...     def nested(label):
...         nonlocal state   # nonlocals must already exist in an enclosing def!
...         state = 0
...         print(label, state)
...     return nested
...
SyntaxError: no binding for nonlocal 'state' found
>>> def tester(start):
...     def nested(label):
...         global state     # globals don't have to exist yet when declared
...         state = 0        # this creates the name in the module now
...         print(label, state)
...     return nested
...
>>> F = tester(0)
>>> F('abc')
abc 0
>>> state
0
Second, nonlocal restricts the scope lookup to just enclosing defs; nonlocals are not looked up in the enclosing module's global scope or the built-in scope outside all defs, even if they are already there.
For example:
>>> spam = 99
>>> def tester():
...     def nested():
...         nonlocal spam    # must be in a def, not the module!
...         print('current=', spam)
...         spam += 1
...     return nested
...
SyntaxError: no binding for nonlocal 'spam' found
These restrictions make sense once you realize that Python would not otherwise know which enclosing scope to create a brand-new name in. In the prior listing, should spam be assigned in tester, or in the module outside it? Because this is ambiguous, Python must resolve nonlocals at function creation time, not function call time.
The answer is that the global scope does not enclose anything - it is global to everything. Use the global keyword in such a case.
Historical reasons
In 2.x, nonlocal didn't exist yet. It wasn't considered necessary to be able to modify enclosing, non-global scopes; the global scope was seen as a special case. After all, the concept of a "global variable" is a lot easier to explain than lexical closures.
The global scope works differently
Because functions are objects, and in particular because a nested function could be returned from its enclosing function (producing an object that persists after the call to the enclosing function), Python needs to implement lookup into enclosing scopes differently from lookup into either local or global scopes. Specifically, in the reference implementation of 3.x, Python will attach a __closure__ attribute to the inner function, which is a tuple of cell instances that work like references (in the C++ sense) to the closed-over variables. (These are also references in the reference-counting garbage-collection sense; they keep the call frame data alive so that it can be accessed after the enclosing function returns.)
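A small sketch of what that looks like in CPython:
def enclosing():
    i = 76
    def inner():
        return i
    return inner

f = enclosing()
print(f.__code__.co_freevars)            # ('i',) -- names resolved through the closure
print(f.__closure__)                     # a one-element tuple of cell objects
print(f.__closure__[0].cell_contents)    # 76, still alive after enclosing() returned
print(f())                               # 76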
By contrast, global lookup works by doing a chained dictionary lookup: there's a dictionary that implements the global scope, and if that fails, a separate dictionary for the builtin scope is checked. (Of course, writing a global only writes to the global dict, not the builtin dict; there is no builtin keyword.)
Theoretically, of course, there's no reason why the implementation of nonlocal couldn't fall back on a lookup in the global (and then builtin) scope, in the same way that a lookup in the global scope falls back to builtins. Stack Overflow is not the right place to speculate on the reason behind the design decision. I can't find anything relevant in the PEP, so it may simply not have been considered.
The best I can offer is: like with local variable lookup, nonlocal lookup works by determining at compile time what the scope of the variable will be. If you consider builtins as simply pre-defined, shadow-able globals (i.e. the only real difference between the actual implementation and just dumping them into the global scope ahead of time, is that you can recover access to the builtin with del), then so does global lookup. As they say, "simple is better than complex" and "special cases aren't special enough to break the rules"; so, no fallback behaviour.
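A tiny sketch of that globals-then-builtins fallback, including recovering a shadowed builtin with del:
print(len('abc'))     # 3: len is found in the builtin scope

len = lambda s: -1    # a module global now shadows the builtin
print(len('abc'))     # -1

del len               # deleting the global exposes the builtin again
print(len('abc'))     # 3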
I've gotten myself in trouble a few times now with accidentially (unintentionally) referencing global variables in a function or method definition.
My question is: is there any way to disallow python from letting me reference a global variable? Or at least warn me that I am referencing a global variable?
x = 123

def myfunc():
    print x # throw a warning or something!!!
Let me add that the typical situation where this arises for me is using IPython as an interactive shell. I use execfile to execute a script that defines a class. In the interpreter, I access the class variable directly to do something useful, then decide I want to add that as a method in my class. While I was in the interpreter, I was referencing the class variable; however, once it becomes a method, it needs to reference self. Here's an example.
class MyClass:
    a = 1
    b = 2
    def add(self):
        return a+b

m = MyClass()
Now in my interpreter I run the script with execfile('script.py'), inspect my class, type m.a * m.b, and decide that would be a useful method to have. So I modify my code, with an unintentional copy/paste error:
class MyClass:
    a = 1
    b = 2
    def add(self):
        return a+b
    def mult(self):
        return m.a * m.b # I really meant this to be self.a * self.b
This of course still executes in IPython, but it can really confuse me since it is now referencing the previously defined global variable!
Maybe someone has a suggestion given my typical IPython workflow.
First, you probably don't want to do this. As Martijn Pieters points out, many things, like top-level functions and classes, are globals.
You could filter this for only non-callable globals. Functions, classes, builtin-function-or-methods that you import from a C extension module, etc. are callable. You might also want to filter out modules (anything you import is a global). That still won't catch cases where you, say, assign a function to another name after the def. You could add some kind of whitelisting for that (which would also allow you to create global "constants" that you can use without warnings). Really, anything you come up with will be a very rough guide at best, not something you want to treat as an absolute warning.
Also, no matter how you do it, trying to detect implicit global access, but not explicit access (with a global statement) is going to be very hard, so hopefully that isn't important.
There is no obvious way to detect all implicit uses of global variables at the source level.
However, it's pretty easy to do with reflection from inside the interpreter.
The documentation for the inspect module has a nice chart that shows you the standard members of various types. Note that some of them have different names in Python 2.x and Python 3.x.
This function will get you a list of all the global names accessed by a bound method, unbound method, function, or code object in both versions:
def get_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    thing = getattr(thing, '__func__', thing)
    thing = getattr(thing, 'func_code', thing)
    thing = getattr(thing, '__code__', thing)
    return thing.co_names
If you want to only handle non-callables, you can filter it:
def get_callable_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    func_globals = getattr(thing, 'func_globals', {})
    thing = getattr(thing, 'func_code', thing)
    return [name for name in thing.co_names
            if callable(func_globals.get(name))]
This isn't perfect (e.g., if a function's globals have a custom builtins replacement, we won't look it up properly), but it's probably good enough.
A simple example of using it:
>>> def foo(myparam):
...     myglobal
...     mylocal = 1
...
>>> print get_globals(foo)
('myglobal',)
And you can pretty easily import a module and recursively walk its callables and call get_globals() on each one, which will work for the major cases (top-level functions, and methods of top-level and nested classes), although it won't work for anything defined dynamically (e.g., functions or classes defined inside functions).
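A rough sketch of that walk under those caveats (Python 3, using the get_globals helper above; note that getmembers also picks up names merely imported into the module):
import inspect

def report_module_globals(module):
    # top-level functions
    for name, func in inspect.getmembers(module, inspect.isfunction):
        print(name, get_globals(func))
    # methods of top-level classes (plain functions when fetched from the class in Python 3)
    for cls_name, cls in inspect.getmembers(module, inspect.isclass):
        for meth_name, meth in inspect.getmembers(cls, inspect.isfunction):
            print('{}.{}'.format(cls_name, meth_name), get_globals(meth))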
If you only care about CPython, another option is to use the dis module to scan all the bytecode in a module, or .pyc file (or class, or whatever), and log each LOAD_GLOBAL op.
One major advantage of this over the inspect method is that it will find functions that have been compiled, even if they haven't been created yet.
The disadvantage is that there is no way to look up the names (how could there be, if some of them haven't even been created yet?), so you can't easily filter out callables. You can try to do something fancy, like connecting up LOAD_GLOBAL ops to corresponding CALL_FUNCTION (and related) ops, but… that's starting to get pretty complicated.
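A minimal sketch of the bytecode-scanning idea (Python 3.4+, where dis.get_instructions exists), without any callable filtering:
import dis

def find_global_loads(code):
    names = set()
    for instr in dis.get_instructions(code):
        if instr.opname == 'LOAD_GLOBAL':
            names.add(instr.argval)
        elif hasattr(instr.argval, 'co_code'):        # nested code object (e.g. an inner def)
            names |= find_global_loads(instr.argval)
    return names

def foo(myparam):
    myglobal
    mylocal = 1

print(find_global_loads(foo.__code__))                # {'myglobal'}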
Finally, if you want to hook things dynamically, you can always replace globals with a wrapper that warns every time you access it. For example:
class GlobalsWrapper(collections.MutableMapping):
    def __init__(self, globaldict):
        self.globaldict = globaldict
    # ... implement at least __setitem__, __delitem__, __iter__, __len__
    # in the obvious way, by delegating to self.globaldict
    def __getitem__(self, key):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
        return self.globaldict[key]

globals_wrapper = GlobalsWrapper(globals())
Again, you can filter on non-callables pretty easily:
def __getitem__(self, key):
    value = self.globaldict[key]
    if not callable(value):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
    return value
Obviously for Python 3 you'd need to change the print statement to a print function call.
You can also raise an exception instead of warning pretty easily. Or you might want to consider using the warnings module.
You can hook this into your code in various different ways. The most obvious one is an import hook that gives each new module a GlobalsWrapper around its normally-built globals. I'm not sure how that will interact with C extension modules, but my guess is that it will either work or be harmlessly ignored, either of which is probably fine. The only problem is that this won't affect your top-level script. If that's important, you can write a wrapper script that execfiles the main script with a GlobalsWrapper, or something like that.
I've been struggling with a similar challenge (especially in Jupyter notebooks) and created a small package to limit the scope of functions.
>>> from localscope import localscope
>>> a = 'hello world'
>>> @localscope
... def print_a():
...     print(a)
Traceback (most recent call last):
  ...
ValueError: `a` is not a permitted global
The @localscope decorator uses Python's disassembler to find all instructions in the decorated function that use a LOAD_GLOBAL (global variable access) or LOAD_DEREF (closure access) opcode. If the variable to be loaded is a builtin function, is explicitly listed as an exception, or satisfies a predicate, the variable is permitted. Otherwise, an exception is raised.
Note that the decorator analyses the code statically. Consequently, it does not have access to the values of variables accessed by closure.