Does __all__ have to be a list? [duplicate] - python

In many large projects, even in such as Django, and in official Python documentation use list to list the "available from outside" module components in the __init__.py file:
__all__ = [foo, bar, other]
However, the record
__all__ = (foo, bar, other)
it will also work, and in theory it does not give a significant, but an increase in code performance.
Why, then use it to list?
Maybe there is some magic PEP that I don't know about?

There is no binding reason to use either list or tuple. However, list idiomatically represents a sequence of same kind of items but tuple represents a sequence of different kind of items. This is also encoded by type hints, which have some list[str | int] but positional fields inside tuple[str, int, int].
As such, list more accurately represents "arbitrary sequence of names".
PEP 8 repeatedly makes mention of __all__ being a list:
To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute. Setting __all__ to an empty list indicates that the module has no public API.
"""This is the example module.
This module does stuff.
"""
...
__all__ = ['a', 'b', 'c']

The language reference says:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module.
A sequence is something that supports iteration (for x in __all__) and access using integer indices (__all__[i]). So it can be a list, or a tuple.

From a practical standpoint, it's somewhat common to add elements to __all__ semi-dynamically. It happens when you want to expose functions defined deep in the package structure at the top level. This is much easier to do with a list.
A couple of examples of modules that do this are numpy and pyserial. I strongly suspect that Django does this too in places, but am not familiar enough with it to know for sure.
The idiom looks something like this in __init__.py:
__all__ = [] # or might have some initial names
from .subpackage import (name1, name2, name3)
__all__.extend(['name1', 'name2', 'name3']) # or append 1-by-1 or +=
I've even seen a slightly sloppier approach, although arguably more maintainable under certain circumstances, that looks like this:
__all__ = []
from .subpackage import *
from .subpackage import __all__ as _all
__all__.extend(_all)
del _all
Clearly this is greatly simplified by having a mutable __all__. There is no substantial benefit to turning it into a tuple after the fact or "appending" to a tuple using +=.
Another way a mutable __all__ is useful is when your API depends on optional external packages. It's much easier to enable or disable names in a list than a tuple.
Here is an example of a module that enables additional functionality if a library called optional_dependency is installed:
# Core API
__all__ = ['name', 'name2', 'name3']
from .sub1 import name1
from .sub2 import name2, name3
try:
import optional_dependency
except ImportError:
# Let it be, it maybe issue a warning
pass
else:
from .opt_support import name4, name5
__all__ += ['name4', 'name5']

Just wanted to document a little error I ran into relevant to this post: note that you need a trailing comma to create a single element tuple. So this:
__all__ = ['foo'] # I am list with one element
Is not the same as this:
__all__ = ('foo') # I am a string
Here's an example of this going wrong. In the second case, if you try to import with the wildcard*:
from mymodule import *
You get the confusing error:
AttributeError: module 'mypackage.mymodule' has no attribute 'f'
What is 'f'?!? Well, it is the first element of __all__, which is pointing to the string 'foo', not the single-element tuple ('foo',).
* Using from x import * is maybe what is more to blame here, as opposed to the tuple vs. list choice. But this still seems to be a relatively common pattern in __init__.py files, and makes me lean towards preferring lists.

Related

Unload after from package import *

Is there a way to clear the namespace after an import like:
from pandas import *
Ps: I know it's the worse way possible. It's for educational purposes.
You can clear globals entirely (built-ins remain, but nothing else you've defined or imported) with:
globals().clear()
globals() returns the dict representing the global namespace, and like all dicts, it has a clear method to remove all mappings from it.
If you want to limit it to what came from pandas only, assuming it defines __all__ (I don't know if pandas specifically does), you could do something like:
import pandas
for name in pandas.__all__:
del globals()[name]
since from SOMEMODULE import *, for a package/module defining __all__, definitionally imports the names listed in __all__, so this unmaps those names specifically. If it doesn't, you're stuck with a slightly uglier heuristic for the case when __all__ is not defined, which I believe is just "does it start with an underscore?", so you could do:
import pandas
for name in vars(pandas):
if not name.startswith('_'):
del globals()[name]

Design of Python conditional imports [duplicate]

This question already has answers here:
Python, doing conditional imports the right way
(4 answers)
Closed last month.
I'm new to conditional importing in Python, and am considering two approaches for my module design. I'd appreciate input on why I might want to go with one vs. the other (or if a better alternative exists).
The problem
I have a program that will need to call structurally identical but distinct modules under different conditions. These modules all have the same functions, inputs, outputs, etc., the only difference is in what they do within their functions. For example,
# module_A.py
def get_the_thing(input):
# do the thing specific to module A
return thing
# module_B.py
def get_the_thing(input):
# do the thing specific to module B
return thing
Option 1
Based on an input value, I would just conditionally import the appropriate module, in line with this answer.
if val == 'A':
import module_A
if val == 'B':
import module_B
Option 2
I use the input variable to generate the module name as a string, then I call the function from the correct module based on that string using this method. I believe this requires me to import all the modules first.
import module_A
import module_B
in_var = get_input() # Say my input variable is 'A', meaning use Module A
module_nm = 'module_' + in_var
function_nm = 'get_the_thing'
getattr(globals()[module_nm], function_nm)(my_args)
The idea is this would call module_A.get_the_thing() by generating the module and function names at runtime. This is a frivolous example for only one function call, but in my actual case I'd be working with a list of functions, just wanted to keep things simple.
Any thoughts on whether either design is better, or if something superior exists to these two? Would appreciate any reasons why. Of course, A is more concise and probably more intuitive, but wasn't sure this necessarily equated to good design or differences in performance.
I'd go with Option 1. It's significantly neater, and you aren't needing to fiddle around with strings to do lookups. Dealing with strings, at the very least, will complicate refactoring. If you ever change any of the names involved, you must remember to update the strings as well; especially since even smart IDEs won't be able to help you here with typical shift+F6 renaming. The less places that you have difficult to maintain code like that, the better.
I'd make a minor change to 1 though. With how you have it now, each use of the module will still require using a qualified name, like module_A.do_thing(). That means whenever you want to call a function, you'll need to first figure out which was imported in the first place, which leads to more messy code. I'd import them under a common name:
if val == 'A':
import module_A as my_module
if val == 'B':
import module_B as my_module
. . .
my_module.do_thing() # The exact function called will depend on which module was imported as my_module
You could also, as suggested in the comments, use a wildcard import to avoid needing to use a name for the module:
if val == 'A':
from module_A import *
if val == 'B':
from module_B import *
. . .
do_thing()
But this is discouraged by PEP8:
Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
It also pollutes the namespace that you're importing into, making it easier to accidentally shadow a name from the imported file.

globals()['variable']=value set in a .py file does not reflect value changes when used in other .py file

globalEx1.py:
globals()['a']='100'
def setvalue(val):
globals()['a'] = val
globalEx2.py:
from globalEx1 import *
print a
setvalue('200')
print a
On executing globalEx2.py:
Output:
100
100
How can I change value of globals['a'] using a function, so that it reflects across the .py files?
Each module has its own globals. Python is behaving exactly as expected. Updating globalEx1's a to point to something else isn't going to affect where globalEx2's a is pointing.
There are various ways around this, depending on exactly what you want.
re-import a after the setvalue() call
return a and assign it, like a = setvalue().
import globalEx1 and use globalEx1.a instead of a. (Or use import globalEx1 as and a shorter name.)
pass globalEx2's globals() as an argument to setvalue and set the value on that instead.
make a a mutable object containing your value, like a list, dict or types.SimpleNamespace, and mutate it in setvalue.
use inspect inside setvalue to get the caller's globals from its stack frame. (Convenient, but brittle.)
Last option looks suitable for me.. it will do the job with minimal code change but can I update globals of multiple modules using same way? or it only gives me the caller's globals?
Option 6 is actually the riskiest. The caller itself basically becomes a hidden parameter to the function, so something like a decorator from another module can break it without warning. Option 4 just makes that hidden parameter explicit, so it's not so brittle.
If you need this to work across more than two modules, option 6 isn't good enough, since it only gives you the current call stack. Option 3 is probably the most reliable for what you seem to be trying to do.
How does option 1 work? I mean is it about running again -> "from globalEx1 import *" because I have many variables like 'a'.
A module becomes an object when imported the first time and it's saved in the sys.modules cache, so importing it again doesn't execute the module again. A from ... import (even with the *) just gets attributes from that module object and adds them to the local scope (which is the module globals if done at the top level, that is, outside of any definition.)
The module object's __dict__ is basically its globals, so any function that alters the module's globals will affect the resulting module object's attrs, even if it's done after the module was imported.
We cannot do from 'globalEx1 import *' from a python function, any alternative to this?
The star syntax is only allowed at the top level. But remember that it's just reading attributes from the module object. So you can get a dict of all the module attributes like
return vars(globalEx1)
This will give you more than * would. It doesn't return names that begin with an _ by default, or the subset specified in __all__ otherwise. You can filter the resulting dict with a dict comprehension, and even .update() the globals dict for some other module with the result.
But rather than re-implementing this filtering logic, you could just use exec to make it the top level. Then the only weird key you'd get is __builtins__
namespace = {}
exec('from globalEx1 import *', namespace)
del namespace['__builtins__']
return namespace
Then you can globals().update(namespace) or whatever.
Using exec like this is probably considered bad form, but then so is import * to begin with, honestly.
This is an interesting problem, related to the fact that strings are immutable. The line from globalEx1 import * creates two references in the globalEx2 module: a and setvalue. globalEx2.a initially refers to the same string object as globalEx1.a, since that's how imports work.
However, once you call setvalue, which operates on the globals of globalEx1, the value referenced by globalEx1.a is replaced by another string object. Since strings are immutable, there is no way to do this in place. The value of globalEx2.a remains bound to the original string object, as it should.
You have a couple of workarounds available here. The most pythonic is to fix the import in globalEx2:
import globalEx1
print globalEx1.a
globalEx1.setvalue('200')
print globalEx1.a
Another option would be to use a mutable container for a, and access that:
globals()['a']=['100']
def setvalue(val):
globals()['a'][0] = val
from globalEx1 import *
print a[0]
setvalue('200')
print a[0]
A third, and wilder option, is to make globalEx2's setvalue a copy of the original function, but with its __globals__ attribute set to the namespace of globalEx2 instead of globalEx1:
from functools import update_wrapper
from types import FunctionType
from globalEx1 import *
_setvalue = FunctionType(setvalue.__code__, globals(), name=setvalue.__name__,
argdefs=setvalue.__defaults__,
closure=setvalue.__closure__)
_setvalue = functools.update_wrapper(_setvalue, setvalue)
_setvalue.__kwdefaults__ = f.__kwdefaults__
setvalue = _setvalue
del _setvalue
print a
...
The reason you have to make the copy is that __globals__ is a read-only attribute, and also you don't want to mess with the function in globalEx1. See https://stackoverflow.com/a/13503277/2988730.
Globals are imported only once at the beginning with the import statement. Thus, if the global is an immutable object like str, int, etc, any update will not be reflected. However, if the global is a mutable object like list, etc, updates will be reflected. For example,
globalEx1.py:
globals()['a']=[100]
def setvalue(val):
globals()['a'][0] = val
The output will be changed as expected:
[100]
[200]
Aside
It's easier to define globals like normal variables:
a = [100]
def setvalue(value):
a[0] = value
Or when editing value of immutable objects:
a = 100
def setvalue(value):
global a
a = value

Whats the difference between namespaces and names?

I'm assuming the namespace is the allotted place in memory in which the name is to be stored. Or are they the same thing?
A namespace is a theoretical space in which the link between names and objects are situated: that is what is called a mapping between the names and the objects.
Names are the identifiers written in a script.
Objects are structures of bits lying in the memory.
The data structure that implements this theoretical namespace is a dictionnary. "Implements" means that it is the object that holds this data in the bits of the memory.
But the objects that this dictionary references are not grouped all together in a delimited portion of the memory, they are lying everywhere in the memory, it's the role of the dictionary to know how to find any of them with just a name at start when a name is encountered by the interpreter.
That's why I wrote it is a theoretical space, though it has a concrete existence in the memory. It is theoretical because the fact that several objects disseminated at different places in the memory can be considered to belong to one namespace is the result of the under-the-hood functionning of the Python interpreter, that is to say its data model and its execution model
In fact, things are more complex, under the hood there is a symbol table in the game. But I'm not enough competent concerning the C implementation of Python to say more. And by the way, people rarely allude to the symbol table.
However, I hope that the above explanation will shed some light in your mind concerning the subject of namespace.
Names are what can be traditionally thought of as "variables".
a = 1
b = 2
Both a and b are "names". If you try to reference a name that hasn't been set yet, you'll get a NameError:
>>> print c
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'c' is not defined
Namespaces are the places where names live. Typically this is a module. Each module (file) has it's own namespace -- Or it's own set of names which map to a corresponding value (a->1, b -> 2 in my example above). You can "merge" module namespaces in different ways using python's import functionality.
import foo # bring foo into the current namespace
from foo import * # bring all of the names in foo's namespace into the current namespace
Classes can also be considered a namespace although you can't import the names directly into your module namespace without some work.
class Namespace(object):
pass
namespace = Namespace()
namespace.a = 1
namespace.b = 2
As pointed out in some of the discussion in the comments, I ought to mention that namespaces in python are usually implemented by a dictionary since those are well suited to mapping names to values/objects. On a class, (or class instance), you can access the dictionary via ClassName.__dict__ or instance.__dict__ and you can do the same with a module: module.__dict__ if module is imported.
At a generic/language agnostic level, I think of a namespace as a way to bind a group of related names (and their associated values) together and to prevent those names from getting mixed up with similar names in a different namespace. e.g. I can have foo.something and bar.something. Both names are something, but they live in different namespaces (foo and bar respectively) so we can tell them apart.
'I'm assuming the namespace is the allotted place in memory in which the name is to be stored.'
yep.

How to find out what methods, properties, etc a python module possesses

Lets say I import a module. In order for me to make the best use of it, I would like to know what properties, methods, etc. that I can use. Is there a way to find that out?
As an example: Determining running programs in Python
In this line:
os.system('WMIC /OUTPUT:C:\ProcessList.txt PROCESS get Caption,Commandline,Processid')
Let's say I wanted to also print out the memory consumed by the processes. How do I find out if that's possible? And what would be the correct 'label' for it? (just as the author uses 'Commandline', 'ProcessId')
Similarly, in this:
import win32com.client
def find_process(name):
objWMIService = win32com.client.Dispatch("WbemScripting.SWbemLocator")
objSWbemServices = objWMIService.ConnectServer(".", "root\cimv2")
colItems = objSWbemServices.ExecQuery(
"Select * from Win32_Process where Caption = '{0}'".format(name))
return len(colItems)
print find_process("SciTE.exe")
How would I make the function also print out the memory consumed, the executable path, etc.?
As for Python modules, you can do
>>> import module
>>> help(module)
and you'll get a list of supported methods (more exactly, you get the docstring, which might not contain every single method). If you want that, you can use
>>> dir(module)
although now you'd just get a long list of all properties, methods, classes etc. in that module.
In your first example, you're calling an external program, though. Of course Python has no idea which features wmic.exe has. How should it?
dir(module) returns the names of the attributes of the module
module.__dict__ is the mapping between the keys and the attributes objects themselves
module.__dict__.keys() and dir(module) are lists having the same elements, though they are not equals because the elements aren't in same order in them
it seems that help(module) iswhat you really need
Python has a build in function called dir(). I'm not sure if this is what you are referring to, but fire up a interactive python console and type:
import datetime
dir(datetime)
This should give you a list of methods, properties and submodules
#ldmvcd
Ok, excuse me, I think you are a beginner and you don't see to what fundamental notions I am refering.
Objects are Python’s abstraction for
data. All data in a Python program is
represented by objects or by relations
between objects.
http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy
I don't understand why it is called "abstraction": for me an object is something real in the machine, a series of bits organized according certain rules to represent conceptual data or functionning.
Names refer to objects. Names are
introduced by name binding operations.
Each occurrence of a name in the
program text refers to the binding of
that name established in the innermost
function block containing the use.
http://docs.python.org/reference/executionmodel.html#naming-and-binding
.
A namespace is a mapping from names to
objects. Most namespaces are currently
implemented as Python dictionaries,
but that’s normally not noticeable in
any way (except for performance), and
it may change in the future. Examples
of namespaces are: the set of built-in
names (containing functions such as
abs(), and built-in exception names);
the global names in a module; and the
local names in a function invocation.
In a sense the set of attributes of an
object also form a namespace.
http://docs.python.org/tutorial/classes.html#a-word-about-names-and-objects
.
By the way, I use the word attribute
for any name following a dot — for
example, in the expression z.real,
real is an attribute of the object z.
Strictly speaking, references to names
in modules are attribute references:
in the expression modname.funcname,
modname is a module object and
funcname is an attribute of it. In
this case there happens to be a
straightforward mapping between the
module’s attributes and the global
names defined in the module: they
share the same namespace!
http://docs.python.org/tutorial/classes.html#a-word-about-names-and-objects
.
Namespaces are created at different
moments and have different lifetimes.
http://docs.python.org/tutorial/classes.html#a-word-about-names-and-objects
.
The namespace for a module is
automatically created the first time a
module is imported. The main module
for a script is always called
main. http://docs.python.org/reference/executionmodel.html#naming-and-binding
.
Well, a Python programm is a big machine that plays with objects, references to these objects , names of these objects, and namespaces in which are binded the names and the objects , namespaces being implemented as dictionaries.
So, you're right: when I refer to keys , I refer to names being the keys in the diverse namespaces. Names are arbitrary or not , according if the objects they have been created to name are user's objects or built-in objects.
I give advise you to read thoroughly the parts
3.1. Objects , values and types
http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy
and
4.1. Naming and binding
http://docs.python.org/reference/executionmodel.html#naming-and-binding

Categories