I'm a bit confused about globals when it comes to packages using other packages. A quick Google search turned up little that explains it.
Simply put: at what level is a variable "globalized" when using global? Is it at the module level, package level, or interpreter level? For example, in a setup such as this:
<Package>
|- __init__.py
|- Module.py
|- Module2.py
and there is a global statement used within Module.py, is the variable globalized for just that module, for the entire package (including Module2.py and/or __init__.py), or at the interpreter level (for anything being run in the interpreter)?
Also, if in __init__.py I from .Module import *, would the imported functions containing a global statement "properly globalize" any said variable for that file?
How global are global variables?
So-called "global" variables in Python are really module-level. There are actually-global globals, which live in the __builtin__ module in Python 2 or builtins in Python 3, but you shouldn't touch those. (Also, note the presence or lack of an s. __builtins__ is its own weird thing.)
What does the global statement do?
The global statement means that, for just the function in which it appears, the specified variable name or names refer to the "global" (module-level) variable(s), rather than local variables.
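A minimal sketch of the difference, assuming a module-level variable named counter (the name and both functions are made up for illustration):

```python
counter = 0  # module-level ("global") variable

def bump_without_global():
    # Plain assignment creates a local name; the module-level counter is untouched.
    counter = 100
    return counter

def bump_with_global():
    # The global statement makes the assignment target the module-level name.
    global counter
    counter += 1
    return counter

local_result = bump_without_global()
after_local = counter      # still 0
global_result = bump_with_global()
after_global = counter     # now 1

print(local_result, after_local, global_result, after_global)  # 100 0 1 1
```

The declaration only affects the function it appears in; every other function in the module still needs its own global statement before assigning to counter.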
What about import *?
Oh god, don't do that. Globals are bad enough, but importing them is worse, and doing it with import * is just about the worst way you could do it. What the import system does with global variables is horribly surprising to new programmers and almost never what you want.
When you do import *, that doesn't mean that your module starts looking in the imported module's global variables for variable lookup. It means that Python looks in the imported module, finds its "public" global variables*, and assigns their current values to identically-named new global variables in the current module.
That means that any assignments to your new global variables won't affect the originals, and any assignments to the originals won't affect your new variables. Any functions you imported with import * are still looking at the original variables, so they won't see changes you make to your copies, and you won't see changes they make to theirs. The results are a confusing mess.
Seriously, if you absolutely must use another module's global variables, import it with the import othermodule syntax and access the globals with othermodule.whatever_global.
*If the module defines an __all__ list, the "public" globals are the variables whose names appear in that list. Otherwise, they're the variables whose names don't start with an underscore. Functions are stored in ordinary variables, so they get included under the same criteria as other variables.
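Here is a rough, self-contained sketch of that copying behaviour. To avoid needing files on disk, both modules are built in memory with types.ModuleType; the names othermodule, importer, setting and show are hypothetical and exist only for this illustration.

```python
import sys
import types

# In-memory stand-in for a real file; "othermodule", "setting" and "show"
# are hypothetical names used only for this illustration.
othermodule = types.ModuleType("othermodule")
exec(
    "setting = 'original'\n"
    "def show():\n"
    "    return setting\n",
    othermodule.__dict__,
)
sys.modules["othermodule"] = othermodule

# A second in-memory module plays the part of the file doing the star import.
importer = types.ModuleType("importer")
exec("from othermodule import *", importer.__dict__)

# The star import created new names in importer, bound to the same objects.
copied_names = sorted(n for n in importer.__dict__ if not n.startswith("__"))

# Rebinding importer's name leaves othermodule's variable alone ...
importer.setting = "changed in importer"
seen_by_function = importer.show()  # show() still reads othermodule's globals

# ... and rebinding othermodule's variable leaves importer's name alone.
othermodule.setting = "changed in othermodule"
seen_after = importer.show()

print(copied_names)
print(seen_by_function, "|", seen_after, "|", importer.setting)
```

The imported show() keeps reading othermodule's namespace throughout, which is why going through othermodule.setting is the only way for both sides to see the same value.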
Related
I have a module which I called entities.py - there are 2 classes within it and 2 global variables as in below pattern:
FIRST_VAR = ...
SECOND_VAR = ...
class FirstClass:
    [...]
class SecondClass:
    [...]
I also have another module (let's call it main.py for now) where I import both classes and constants as like here:
from entities import FirstClass, SecondClass, FIRST_VAR, SECOND_VAR
In the same "main.py" module I have another constant: THIRD_VAR = ..., and another class, in which all of imported names are being used.
Now, I have a function, which is being called only if a certain condition is met (passing config file path as CLI argument in my case). As my best bet, I've written it as following:
def update_consts_from_config(config: ConfigParser):
    global FIRST_VAR
    global SECOND_VAR
    global THIRD_VAR
    FIRST_VAR = ...
    SECOND_VAR = ...
    THIRD_VAR = ...
This works perfectly fine, although PyCharm indicates two issues, which at least I don't consider accurate.
from entities import FirstClass, SecondClass, FIRST_VAR, SECOND_VAR - here it warns me that FIRST_VAR and SECOND_VAR are unused imports, but from my understanding and testing they are used and not re-declared elsewhere unless function update_consts_from_config is invoked.
Also, under update_consts_from_config function:
global FIRST_VAR - at this and next line, it says
Global variable FIRST_VAR is undefined at the module level
My question is: should I really care about those warnings (as I think the code is correct and clear), or am I missing something important and should come up with something different here?
I know I can do something as:
import entities
from entities import FirstClass, SecondClass
FIRST_VAR = entities.FIRST_VAR
SECOND_VAR = entities.SECOND_VAR
and work from there, but this looks like overkill to me. The entities module contains only what I have to import in main.py, which also strictly depends on it, so I would rather stick to importing those names explicitly than referencing them with the entities. prefix just for that reason.
What do you think would be best practice here? I would like my code to be clear, unambiguous and reasonably optimal.
Import only entities, then refer to variables in its namespace to access/modify them.
Note: this pattern, modifying constants in other modules (which then, to purists, aren't so much constants as globals) can be justified. I have tons of cases where I use constants, rather than magic variables, as module level configuration. However, for example for testing, I might reach in and modify these constants. Say to switch a cache expiry from 2 days to 0.1 seconds to test caching. Or like you propose, to override configuration. Tread carefully, but it can be useful.
main.py:
import entities
def update_consts_from_config(FIRST_VAR):
    entities.FIRST_VAR = FIRST_VAR
firstclass = entities.FirstClass()
print(f"{entities.FIRST_VAR=} before override")
firstclass.debug()
entities.debug()
update_consts_from_config("override")
print(f"{entities.FIRST_VAR=} after override")
firstclass.debug()
entities.debug()
entities.py:
FIRST_VAR = "ori"
class FirstClass:
    def debug(self):
        print(f"entities.py:{FIRST_VAR=}")
def debug():
    print(f"making sure no closure/locality effects after object instantation {FIRST_VAR=}")
$ python main.py
entities.FIRST_VAR='ori' before override
entities.py:FIRST_VAR='ori'
making sure no closure/locality effects after object instantation FIRST_VAR='ori'
entities.FIRST_VAR='override' after override
entities.py:FIRST_VAR='override'
making sure no closure/locality effects after object instantation FIRST_VAR='override'
Now, if FIRST_VAR weren't a string, int, or another immutable type, you should, I think, be able to import it separately and mutate it, like SECOND_VAR.append("config override") in main.py. But assigning to a global in main.py will only affect the main.py binding, so if you want to share actual state between main.py, entities, and other modules, everyone, not just main.py, needs to import entities and then access entities.FIRST_VAR.
Oh, and if you had:
class SecondClass:
    def __init__(self):
        self.FIRST_VAR = FIRST_VAR
then the instance-level value of that immutable string/int would not be affected by any overrides done after the instance is created. Mutables like lists or dictionaries would be affected by in-place changes, because all the different bindings point to the same object.
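A small sketch of the rebinding-versus-mutation distinction, under the assumption that SECOND_VAR is a list. The entities module is recreated in memory here so the snippet runs on its own, and giving SecondClass a SECOND_VAR attribute is an addition for illustration, not something from the question.

```python
import sys
import types

# Stand-in for entities.py, built in memory so the sketch runs on its own.
entities = types.ModuleType("entities")
exec(
    "FIRST_VAR = 'ori'\n"
    "SECOND_VAR = ['ori']\n"
    "class SecondClass:\n"
    "    def __init__(self):\n"
    "        self.FIRST_VAR = FIRST_VAR\n"
    "        self.SECOND_VAR = SECOND_VAR\n",
    entities.__dict__,
)
sys.modules["entities"] = entities

from entities import SECOND_VAR  # a second name for the same list object

instance = entities.SecondClass()

SECOND_VAR.append("config override")  # mutation: one list, visible via every name
entities.FIRST_VAR = "override"       # rebinding: only the module attribute changes

print(entities.SECOND_VAR)   # ['ori', 'config override']
print(instance.SECOND_VAR)   # ['ori', 'config override']
print(instance.FIRST_VAR)    # ori
print(entities.FIRST_VAR)    # override
```

The instance keeps the string it captured in __init__, while the list change shows up everywhere because nothing was rebound.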
Last, regarding those "tricky" namespaces: global in your original code means "don't treat FIRST_VAR as a variable to assign in update_consts_from_config's local namespace; instead, assign it in main.py's global, script-level namespace".
It does not mean "assign it to some global state magically shared between entities.py and main.py". __builtins__ might be that beast but modifying it is considered extremely bad form in Python.
When we do from modulename import function or variable, that function or variable gets loaded into the calling module's namespace (local). Any change to the variable is not global but only within the calling module (local).
When we do import modulename, the functions and variables get set up in the modulename namespace (global). Any change made to a variable, for example, is globally visible.
I want to understand where exactly the variable values and functions are held. Is it in the sys.modules dictionary, or is there a namespace dictionary that holds the variable values and functions? Any articles or reference links on this subject would also be very helpful.
Consider this code snippet:
global open
print(open)
which gives the following result:
<built-in function open>
My question is: Does the name open belong to the built-in or the global scope in this example?
I thought that the global declaration will force the name open to be mapped to the global scope (and, thus, will lead us to an error), which is not happening here. Why?
First, the direct answer:
The name open belongs to the top-level namespace. Which essentially means "look up in globals, fallback to builtins; assign to globals".
Adding global open just forces it to belong to the top-level namespace, where it already was. (I'm assuming this is top-level code, not inside a function or class.)
Does that seem to contradict what you read? Well, it's a bit more complicated.
According to the reference docs:
The global statement is a declaration which holds for the entire current code block. It means that the listed identifiers are to be interpreted as globals.
But, despite what other parts of the docs seem to imply, "interpreted as globals" doesn't actually mean "searched in the global namespace", but "searched in the top-level namespace", as documented in Resolution of names:
Names are resolved in the top-level namespace by searching the global namespace, i.e. the namespace of the module containing the code block, and the builtins namespace, the namespace of the module builtins. The global namespace is searched first. If the name is not found there, the builtins namespace is searched.
And "as globals" means "the same way that names in the global namespace are looked up", aka "in the top-level namespace".
And, of course, assignment to the top-level namespace always goes to globals, not builtins. (That's why you can shadow the builtin open with the global open in the first place.)
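A quick sketch of that lookup-versus-assignment asymmetry. The three helper functions are made up for this example; the shadow is removed again at the end so the real open stays usable.

```python
import builtins

def current_open():
    # No local named "open", so this is a global lookup:
    # module globals first, then the builtins namespace.
    return open

def shadow_open():
    global open
    open = "module-level shadow"  # assignment lands in module globals

def unshadow_open():
    global open
    del open                      # removes the global; the builtin is untouched

before = current_open() is builtins.open      # found via the builtins fallback
shadow_open()
during = current_open()                       # the global now wins the lookup
builtin_untouched = callable(builtins.open)   # builtins.open was never rebound
unshadow_open()
after = current_open() is builtins.open       # the fallback applies again

print(before, during, builtin_untouched, after)
```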
Also, notice that, as explained in the exec and eval docs, even this isn't quite true for code run through exec:
If the globals dictionary does not contain a value for the key __builtins__, a reference to the dictionary of the built-in module builtins is inserted under that key. That way you can control what builtins are available to the executed code by inserting your own __builtins__ dictionary into globals before passing it to exec().
And exec is, ultimately, how modules and scripts get executed.
So, what really happens—at least by default—is that the global namespace is searched; if the name is not found, the global namespace is searched for a __builtins__ value; if that's a module or a mapping, it's searched.
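The __builtins__ key can be seen in action with a tiny experiment. This is only an illustration of the lookup path, not a security sandbox; the names limited and default_ns are made up here.

```python
# A globals dict with a hand-made __builtins__ mapping that only exposes len.
limited = {"__builtins__": {"len": len}}
exec("size = len('abc')", limited)

# open is in neither the globals dict nor the custom builtins mapping.
try:
    exec("open", limited)
    open_visible = True
except NameError:
    open_visible = False

# Without the key, exec inserts a reference to the real builtins itself.
default_ns = {}
exec("size = len('abc')", default_ns)
has_builtins_key = "__builtins__" in default_ns

print(limited["size"], open_visible, has_builtins_key)  # 3 False True
```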
If you're curious how this works in CPython in particular:
At compile time:
The compiler builds a symbol table for a function, separating names out into freevars (nonlocals), cellvars (locals that are used as nonlocals by nested functions), locals (any other locals) and globals (which of course technically means "top-level namespace" variables). This is where the global statement comes into play: it forces the name to be added to the global symbol table instead of a different one.
Then it compiles the code, and emits LOAD_GLOBAL instructions for the globals. (And it stores the various names in tuple members on the code object, like co_names for globals and co_cellvars for cellvars and so on.)
At runtime:
When a function object gets created from compiled code, it gets __globals__ attached to it as an attribute.
When a function gets called, its __globals__ becomes the f_globals for the frame.
The interpreter's eval loop then handles each LOAD_GLOBAL instruction by doing exactly what you'd expect with that f_globals, including the fallback to __builtins__ as described in the exec docs.
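You can poke at most of this from Python with the dis module. A small sketch, assuming CPython (opcode names are an implementation detail and may differ between versions); the function names and some_name are invented for the example.

```python
import dis

def read_it():
    return some_name   # never assigned here, so the compiler treats it as a global

def write_local():
    some_name = 1      # plain assignment: a local variable

def write_global():
    global some_name
    some_name = 1      # global statement: targets the module globals

def opnames(func):
    # Collect just the opcode names of a function's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)]

read_ops = opnames(read_it)
local_ops = opnames(write_local)
global_ops = opnames(write_global)

print("LOAD_GLOBAL" in read_ops)                                # True
print("STORE_GLOBAL" in local_ops, "STORE_GLOBAL" in global_ops)  # False True
print("some_name" in read_it.__code__.co_names)                 # True
print(read_it.__globals__ is write_global.__globals__)          # True
```

The last line shows that functions defined in the same module share one __globals__ dict, which is the module's namespace.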
As I understand it python has the following outermost namespaces:
Builtin - This namespace is global across the entire interpreter and all scripts running within an interpreter instance.
Globals - This namespace is global across a module, ie across a single file.
I am looking for a namespace in between these two, where I can share a few variables declared within the main script to modules called by it.
For example, script.py:
import Log from Log
import foo from foo
log = Log()
foo()
foo.py:
def foo():
    log.Log('test') # I want this to refer to the caller's log object
I want to be able to call script.py multiple times and in each case, expose the module level log object to the foo method.
Any ideas if this is possible?
It won't be too painful to pass down the log object, but I am working with a large chunk of code that has been ported from Javascript. I also understand that this places constraints on the caller of foo to expose its log object.
Thanks,
Paul
There is no namespace "between" builtins and globals -- but you can easily create your own namespaces and insert them with a name in sys.modules, so any other module can "import" them (ideally not using the from ... import syntax, which carries a load of problems, and definitely not using the import ... from syntax you've invented, which just gives a syntax error). For example, in script.py:
import sys
import types
sys.modules['yay'] = types.ModuleType('yay')
import yay  # found in sys.modules; binds the name yay in this script
import Log
import foo
yay.log = Log.Log()
foo.foo()
and in foo.py
import yay
def foo():
    yay.log.Log('test')
Do not fear qualified names -- they're goodness! Or as the last line of the Zen of Python (AKA import this) puts it:
Namespaces are one honking great idea -- let's do more of those!
You can make and use "more of those" most simply -- just qualify your names (situating them in the proper namespace they belong in!) rather than insisting on barenames where they're just not a good fit. There's a bazillion things that are quite easy with qualified names and anywhere between seriously problematic and well-nigh unfeasible for those who're stuck on barenames!-)
There is no such scope. You will need to either add to the builtins scope, or pass the relevant object.
Actually, I did figure out what I was looking for.
This hack is actually used in PLY, and that is where I stumbled across it.
The library code can raise a runtime exception, which then gives access to the caller's stack.
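A rough sketch of the idea, not PLY's actual code: the exception is raised and caught only to get a traceback, whose frame leads back to the caller. The names foo, script_main and log are placeholders, and a plain list stands in for the log object. Frame objects are a CPython implementation detail, and sys._getframe is the more direct way to reach the same frame there.

```python
import sys

def foo():
    # Raise and catch an exception purely to obtain a traceback;
    # its tb_frame is this function's frame, and f_back is the caller's.
    try:
        raise RuntimeError
    except RuntimeError:
        this_frame = sys.exc_info()[2].tb_frame
    caller = this_frame.f_back

    # Look for "log" in the caller's locals first, then in its globals.
    log = caller.f_locals.get("log")
    if log is None:
        log = caller.f_globals.get("log")
    log.append("test")

def script_main():
    log = []   # stand-in for the caller's log object
    foo()      # foo finds log without it being passed in
    return log

result = script_main()
print(result)  # ['test']
```

This couples foo tightly to whatever its caller happens to name its variables, so passing the object explicitly is usually easier to maintain.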