Do imports somehow transcend module namespaces? - python

I have a module (say module A) that for one of its functions, returns a BeautifulSoup object. I am writing a second module (module B) that calls this function and stores that BeautifulSoup object. I am confused how I can call BeautifulSoup functions on the object that was returned by module A in module B without module B importing anything from bs4, or having to access those BS4 functions through module A. Is the import basically putting module_a and all of its imports in the package, and so the BeautifulSoup class is visible to module_b?
module_a.py
from bs4 import BeautifulSoup

def function():
    some_xml = "<name>Namespaces are strange.</name>"
    return BeautifulSoup(some_xml, "xml")

module_b.py
import module_a

def main():
    # How does this line know what to do with .find()? or .string?
    print(module_a.function().find("name").string)

This is the beauty of a dynamic language like Python. module_a.function().find("name").string tells Python to:
go to module_a
look up a function called "function" and call it
take the returned object, look up "find" and call it
take the object that call returns and look up its "string" attribute
Since all of these lookups happen dynamically, at the moment the function or method is called, module_b doesn't need a predefined interface for bs4 or for the string attribute. It's just looking for a method called "find" on whatever object it was handed, and that method could come from anywhere. In fact, module_a could swap in some other implementation and, as long as the methods being called are still there, it still works.
from bs4 import BeautifulSoup imports the bs4 module (and any submodules it imports) into Python. You can see the module in sys.modules when it's done. It then reaches into bs4, looks up BeautifulSoup and adds that class to module_a's namespace. The module is now loaded for the whole process, so any other module could reach it; they just don't have a name for it because they haven't imported it. Modules that only use the objects bs4 produces never need to see it directly.
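The same behaviour can be reproduced without bs4. The sketch below is only an illustration: it builds a stand-in module_a in memory with types.ModuleType and uses fractions.Fraction in place of BeautifulSoup (both substitutions are assumptions made so the example runs with the standard library alone). A plain dict plays the role of module_b's namespace.

```python
import sys
import types

# Hypothetical stand-in for module_a: it imports Fraction and returns an instance.
module_a = types.ModuleType("module_a")
exec(
    "from fractions import Fraction\n"
    "def function():\n"
    "    return Fraction(3, 4)\n",
    module_a.__dict__,
)

# Hypothetical stand-in for module_b's namespace: it only knows about module_a.
module_b_namespace = {"module_a": module_a}
exec("result = module_a.function().numerator", module_b_namespace)

print(module_b_namespace["result"])      # the attribute is found on the object itself
print("Fraction" in module_b_namespace)  # False: module_b never received that name
print("Fraction" in vars(module_a))      # True: module_a bound it with its import
print("fractions" in sys.modules)        # True: the module is loaded process-wide
```

The attribute lookup succeeds because the returned object carries a reference to its own class; the caller's namespace is never searched for it.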

Related

Does Python import copy all the code into the file

When we import a module in a Python script, does this copy all the required code into the script, or does it just let the script know where to find it?
What happens if we don't use the module then in the code, does it get optimized out somehow, like in C/C++?
None of those things are the case.
An import does two things. First, if the requested module has not previously been loaded, the import loads the module. This mostly boils down to creating a new global scope and executing the module's code in that scope to initialize the module. The new global scope is used as the module's attributes, as well as for global variable lookup for any code in the module.
Second, the import binds whatever names were requested. import whatever binds the whatever name to the whatever module object. import whatever.thing also binds the whatever name to the whatever module object. from whatever import somefunc looks up the somefunc attribute on the whatever module object and binds the somefunc name to whatever the attribute lookup finds.
Unused imports cannot be optimized out, because both the module loading and the name binding have effects that some other code might be relying on.
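As a rough illustration of the name-binding step, the sketch below runs each import form in a fresh, empty namespace and lists the names it created. The helper bound_names and the choice of json and xml.dom as sample modules are assumptions made for the example.

```python
import sys

# Each exec runs one import form in its own empty namespace, so the
# bindings that form creates are easy to see.
plain = {}
exec("import json", plain)

dotted = {}
exec("import xml.dom", dotted)

from_form = {}
exec("from json import dumps", from_form)


def bound_names(namespace):
    """Return the names an import created, ignoring the injected __builtins__."""
    return sorted(name for name in namespace if name != "__builtins__")


print(bound_names(plain))      # ['json']
print(bound_names(dotted))     # ['xml']  (the top-level package, not 'dom')
print(bound_names(from_form))  # ['dumps']
print("json" in sys.modules)   # True: loaded even where only 'dumps' was bound
```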

Python import module sharing name with a function in __init__.py

My tree looks like
parent/
|--__init__.py
\--a.py
And the content of __init__.py is
import parent.a as _a
a = 'some string'
When I start a Python interpreter at the top level and import parent.a, I get the string instead of the module. For example import parent.a as the_a; type(the_a) == str.
So I think OK, import is probably importing the name from the parent namespace, and it's now overridden. So I figure I can use import parent._a as a_module. But this doesn't work, as there is "No module named _a".
This is very confusing. A function can override a module with the same name, but a module cannot take on a new name and "reexport".
Is there any explanation I'm not aware of? Or is this a documented feature?
Even more confusing, if I remove the import statement in __init__.py, everything is back to normal again (import parent.a; type(parent.a) is module). But why is this different? The a name in parent namespace is still a string.
(I ran on Python 3.5.3 and 2.7.13 with the same results)
In an import statement, the module reference never uses attribute lookups. The statements
import parent.a # as ...
and
from parent.a import ... # as ...
will always look for parent.a in the sys.modules namespace before trying to further initiate module loading from disk.
However, for from ... import name statements, Python does look at attributes of the resolved module to find name, before looking for submodules.
Module globals and the attributes on a module object are the same thing. On import, Python adds submodules as attributes (so globals) to the parent module, but you are free to overwrite those attributes, as you did in your code. However, when you then use an import with the parent.a module path, attributes do not come into play.
From the Submodules section of the Python import system reference documentation:
When a submodule is loaded using any mechanism [...] a binding is placed in the parent module’s namespace to the submodule object. For example, if package spam has a submodule foo, after importing spam.foo, spam will have an attribute foo which is bound to the submodule.
Your import parent.a as _a statement adds two names to the parent namespace; first a is added pointing to the parent.a submodule, and then _a is also set, pointing to the same object.
Your next line replaces the name a with a binding to the 'some string' object.
The Searching section of the same documentation details how Python goes about finding a module when you import:
To begin the search, Python needs the fully qualified name of the module [...] being imported.
[...]
This name will be used in various phases of the import search, and it may be the dotted path to a submodule, e.g. foo.bar.baz. In this case, Python first tries to import foo, then foo.bar, and finally foo.bar.baz. If any of the intermediate imports fail, a ModuleNotFoundError is raised.
then further on
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So if foo.bar.baz was previously imported, sys.modules will contain entries for foo, foo.bar, and foo.bar.baz. Each key will have as its value the corresponding module object.
During import, the module name is looked up in sys.modules and if present, the associated value is the module satisfying the import, and the process completes. [...] If the module name is missing, Python will continue searching for the module.
So when trying to import parent.a all that matters is that sys.modules['parent.a'] exists. sys.modules['parent'].a is not consulted.
Only from module import ... would ever look at attributes. From the import statement documentation:
The from form uses a slightly more complex process:
find the module specified in the from clause, loading and initializing it if necessary;
for each of the identifiers specified in the import clauses:
check if the imported module has an attribute by that name
if not, attempt to import a submodule with that name and then check the imported module again for that attribute
[...]
So from parent import _a would work, as would from parent import a, and you'd get the parent.a submodule and the 'some string' object, respectively.
Note that sys.modules is writable. If you must have import parent._a work, you can always just alter sys.modules directly:
import sys
import parent.a # make sure the submodule is loaded
sys.modules['parent._a'] = sys.modules['parent.a'] # make parent._a an alias for parent.a
import parent._a # works now
I think I have a coherent understanding of this problem now, just documenting my findings in case others run into this.
What Martijn said above is mostly true. Expanding on that answer: import parent.a as _a is a two-step process. The first step is the module lookup of parent.a, which never goes through attribute lookup. It stores the module in sys.modules and then binds the module to the attribute a on parent. In fact, this is all you get if you only use import parent.a. This part is described thoroughly in the previous answer.
The second part, as _a, does an attribute lookup of a on parent and binds the result to the name _a. So to answer my original question: if I start an interactive Python interpreter outside the package, parent.a has by then been overwritten with the string in __init__.py, so import parent.a as the_a; the_a gets me the string. In fact, this is the same as import parent.a; parent.a. Both the_a and parent.a are the results of attribute lookup. I can still get the submodule through parent._a or sys.modules["parent.a"].
To answer my follow-up question:
Even more confusing, if I remove the import statement in __init__.py, everything is back to normal again (import parent.a; type(parent.a) is module). But why is this different? The a name in parent namespace is still a string.
When I import parent.a in the outside interactive interpreter, Python first evaluates __init__.py, which sets parent.a to a string. But the import hasn't finished yet: it goes on to import the submodule parent.a, and since we are still inside the import machinery, no attribute lookups happen, so the correct submodule is found. When all this is done, the submodule is bound to a on parent, overwriting the string and making everything correct again.
This sounds very confusing, but remember (https://docs.python.org/3/reference/import.html#submodules):
When a submodule is loaded using any mechanism (e.g. importlib APIs, the import or import-from statements, or built-in __import__()) a binding is placed in the parent module’s namespace to the submodule object. For example, if package spam has a submodule foo, after importing spam.foo, spam will have an attribute foo which is bound to the submodule.
An import parent.a first runs all the module set-up code, and then binds the name.
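The ordering described here can be observed directly. The sketch below assumes a variant of the package whose __init__.py only assigns the string; it is named parent_noimport (a hypothetical name) so it cannot clash with a parent package that is already loaded.

```python
import importlib
import pathlib
import sys
import tempfile
import types

# Variant of the package without the import statement in __init__.py.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "parent_noimport"
pkg.mkdir()
(pkg / "a.py").write_text("")
(pkg / "__init__.py").write_text("a = 'some string'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import parent_noimport
before = type(parent_noimport.a)  # __init__.py has run: a is the string

import parent_noimport.a
after = type(parent_noimport.a)   # loading the submodule rebinds a on the package

print(before)                              # <class 'str'>
print(issubclass(after, types.ModuleType))  # True
```

Here the submodule is loaded after __init__.py has finished, so the binding placed by the import machinery is the last write to a and wins.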

monkeypatch function in module whose namespace was overwritten

I am trying to monkeypatch a function in an external module I use, but monkeypatch can't seem to access the function because the namespace of the module gets overwritten on import.
Concretely, I use a Bio.PDB.PDBList.PDBList object (biopython module) in my code, and I am trying to patch _urlretrieve in Bio.PDB.PDBList to prevent calls to the internet and instead get files from a local directory, without having to mock the instance methods of PDBList which would be substantially more work. But when I try the naïve:
m.setattr("Bio.PDB.PDBList._urlretrieve", mock_retrieve)
pytest complains:
AttributeError: 'type' object at Bio.PDB.PDBList has no attribute '_urlretrieve'
On further inspection of Bio.PDB, I can see that the module namespace .PDBList seems to be overwritten by the class .PDBList.PDBList:
# Download from the PDB
from .PDBList import PDBList
So that would explain why pytest sees Bio.PDB.PDBList as a type object with no attribute _urlretrieve. My question is, is there any way to get monkeypatch to patch this 'hidden' function?
Concrete example of usage of PDBList class:
from Bio.PDB.PDBList import PDBList
_pdblist = PDBList()
downloaded_file = _pdblist.retrieve_pdb_file('2O8B', pdir='./temp', file_format='pdb')
You are right - since the PDBList class has the same name as the module Bio.PDB.PDBList, after import Bio.PDB.PDBList you won't be able to access the module by its name (shadowing problem). However, you can still grab the imported module object from the loaded modules cache and monkeypatch that:
import sys
from unittest.mock import Mock
import Bio.PDB.PDBList

def test_spam(monkeypatch):
    assert isinstance(Bio.PDB.PDBList, type)
    with monkeypatch.context() as m:
        m.setattr(sys.modules['Bio.PDB.PDBList'], '_urlretrieve', Mock())
        ...
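The shadowing and the sys.modules workaround can be tried without Biopython or pytest. The sketch below is an assumption-laden stand-in: shadowpkg, Thing and _helper are hypothetical names mirroring Bio.PDB, PDBList and _urlretrieve, and unittest.mock.patch.object is used in place of pytest's monkeypatch fixture.

```python
import importlib
import pathlib
import sys
import tempfile
from unittest import mock

# Hypothetical package whose __init__.py shadows a submodule with a class
# of the same name, the way Bio.PDB does with PDBList.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "shadowpkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("from .Thing import Thing\n")
(pkg / "Thing.py").write_text(
    "def _helper():\n"
    "    return 'real'\n"
    "\n"
    "class Thing:\n"
    "    def run(self):\n"
    "        return _helper()\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import shadowpkg.Thing

shadowed = shadowpkg.Thing                     # the class, not the module
real_module = sys.modules["shadowpkg.Thing"]   # the module is still cached

# Patch the module-level function on the real module object.
with mock.patch.object(real_module, "_helper", return_value="patched"):
    patched_result = shadowpkg.Thing().run()
restored_result = shadowpkg.Thing().run()

print(isinstance(shadowed, type))        # True
print(patched_result, restored_result)   # patched real
```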

Python: when I import a module from cmd, I don't get all of its attributes

When I import the urllib module in cmd I only get the magic functions, but when I import the same module from Spyder I get all the attributes.
I'm adding a screenshot of this.
Why can't I import all the attributes?
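To import all public names of a module into the global namespace, use from module import *.
While dir() with no argument lists the names in the current (global) namespace, dir(module) lists the attributes of that specific module.
Note that urllib is a package, and import urllib on its own does not load its submodules. A submodule such as urllib.request only shows up as an attribute once something has imported it, so import it explicitly with import urllib.request. Spyder has most likely already imported those submodules itself, which would explain why you see more attributes there.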
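A short sketch of the difference between the package's attributes and a submodule's attributes, assuming Python 3, where urllib is a package:

```python
# Importing the submodule explicitly loads it and binds it on the package.
import urllib.request

package_attrs = dir(urllib)            # names on the urllib package object
submodule_attrs = dir(urllib.request)  # names defined inside urllib.request

print("request" in package_attrs)      # True, because of the import above
print("urlopen" in submodule_attrs)    # True
print("urlopen" in package_attrs)      # False: urlopen lives in the submodule
```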

How does Python's "import" work internally?

When you import a module, then reimport it again, will it get reimported/overwritten, or skipped?
When you import module "a" and "b", but also have module "b" imported in module "a", what happens? Is it safe to do this? For example if that module "b" has a class instantiated in it, will you end up instantiating it twice?
import loads the matching .py, .pyc or .pyo file, creates a module object, and stores it with its fully qualified ("dotted") name in the sys.modules dictionary. If a second import finds the module to import in this dictionary, it will return it without loading the file again.
To answer your questions:
When you import a module, then reimport it again, will it get reimported/overwritten, or skipped?
It will get skipped. To explicitly re-import a module, use importlib.reload() (in Python 2, the reload() built-in function).
When you import module "a" and "b", but also have module "b" imported in module "a", what happens?
import a will load a from a.py[c], import b will return the module sys.modules['b'] already loaded by a.
Is it safe to do this?
Yes, absolutely.
For example if that module "b" has a class instantiated in it, will you end up instantiating it twice?
Nope.
The module will only be instantiated once. It is safe to import the same module in multiple other modules. If there is a class instance (object) that is created in the module itself, the very same object will be accessed from all modules that import it.
You can, if you like, have a look at all imported modules:
import sys
print(sys.modules)
sys.modules is a dictionary that maps module names to module objects. The first thing the import statement does is look in sys.modules; if it cannot find the module there, the module is loaded and added to sys.modules for future imports.
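The caching behaviour can be checked with two small throwaway modules. In the sketch below, demo_a and demo_b are hypothetical names standing in for modules "a" and "b" from the question, written to a temporary directory so the example is self-contained.

```python
import importlib
import pathlib
import sys
import tempfile

# demo_b creates an instance at import time; demo_a imports demo_b.
root = pathlib.Path(tempfile.mkdtemp())
(root / "demo_b.py").write_text("class Config:\n    pass\n\ninstance = Config()\n")
(root / "demo_a.py").write_text("import demo_b\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_a  # loads demo_a, which in turn loads demo_b
import demo_b  # found in sys.modules, so the file is not executed again

same_module = demo_a.demo_b is demo_b is sys.modules["demo_b"]
same_instance = demo_a.demo_b.instance is demo_b.instance

first_instance = demo_b.instance
importlib.reload(demo_b)  # re-executes the file inside the same module object
new_instance_after_reload = demo_b.instance is not first_instance

print(same_module, same_instance, new_instance_after_reload)  # True True True
```

Only the explicit reload re-runs the module body; ordinary repeated imports hand back the cached module object and therefore the same instance.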
See this page for more details:
http://effbot.org/zone/import-confusion.htm (see "What Does Python Do to Import a Module?")
