How does Python manage sys.modules? [duplicate] - python

Importing the standard "logging" module pollutes sys.modules with a bunch of dummy entries:
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
>>> import sys
>>> import logging
>>> sorted(x for x in sys.modules.keys() if 'log' in x)
['logging', 'logging.atexit', 'logging.cStringIO', 'logging.codecs',
'logging.os', 'logging.string', 'logging.sys', 'logging.thread',
'logging.threading', 'logging.time', 'logging.traceback', 'logging.types']
# and perhaps even more surprising:
>>> import traceback
>>> traceback is sys.modules['logging.traceback']
False
>>> sys.modules['logging.traceback'] is None
True
So importing this package puts extra names into sys.modules, except that they are not modules, just references to None. Other modules (e.g. xml.dom and encodings) have this issue as well. Why?
Edit: Building on bobince's answer, there are pages describing the origin (see section "Dummy Entries in sys.modules") and future of the feature.

None values in sys.modules are cached failures of relative lookups.
So when you're in package foo and you import sys, Python looks first for a foo.sys module, and if that fails goes to the top-level sys module. To avoid having to check the filesystem for foo/sys.py again on further relative imports, it stores None in the sys.modules to flag that the module didn't exist and a subsequent import shouldn't look there again, but go straight to the loaded sys.
This is a cPython implementation detail you can't usefully rely on, but you will need to know it if you're doing nasty magic import/reload hacking.
It happens to all packages, not just logging. For example, import xml.dom and see xml.dom.xml in the module list as it tries to import xml from inside xml.dom.
As Python moves towards absolute import this ugliness will happen less.

Related

importing a class from a module starting with number

I need to import a single class (not the whole file) from a python file starting with number.
There was a topic on importing a whole module and it works, but can't find my way around this one.
(In python, how to import filename starts with a number)
Normally it would be:
from uni_class import Student
though the file is called 123_uni_class.
Tried different variations of
importlib.import_module("123_uni_class")
and
uni_class=__import__("123_uni_class")
Error:
from 123_uni_class import Student
^
SyntaxError: invalid decimal literal
importlib.import_module("123_uni_class") returns the module after importing it, you must give it a valid name in order to reuse it:
import importlib
my_uni_class = importlib.import_module("123_uni_class")
Then you can access your module under the name 'my_uni_class'.
This would be equivalent to import 123_uni_class as my_uni_class if 123_uni_class were valid in this context.
It works for me:
Python 3.6.8 (default, Feb 14 2019, 22:09:48)
[GCC 7.4.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.import_module('123_a')
<module '123_a' from '/path/to/123_a.py'>
>>> __import__('123_a')
<module '123_a' from '/path/to/123_a.py'>
You would not see an actual syntax error containing the literal text "from 123_uni_class import ..." unless you actually have some source code containing that line.
If you must, you can also bypass the import system entirely by reading the contents of the file and exec()-ing them, possibly into a namespace you provide. For example:
mod = {}
with open('123_uni_class.py') as fobj:
exec(fobj.read(), mod)
Student = mod['Student']
This general technique is used, for example, to read config files written in Python and things like that. I would discourage it though for normal usage and suggest you just use a valid module name.

what is the difference between 'import a.b as b' and 'from a import b' in python [duplicate]

This question already has answers here:
from ... import OR import ... as for modules
(6 answers)
Closed 4 years ago.
I've always used from a import b but recently a team at work decided to move a module into a new namespace, and issued a warning notice telling people to replace import b with import a.b as b.
I've never used import as and the only documentation I can find seems to suggest it doesn't support import a.b as b, though clearly it does.
but is there actually a difference, and if so what?
As far as I know, correct me if I am wrong.
First, import a.b must import a module(a file) or a package(a directory contains __init__.py).
For example, you can import tornado.web as web but you cannot import flask.Flask as Flask as Flask is an object in package flask.
Second, import a.b also import the namespace a which from a import b won't. You can check it by globals().
So what's the influence? For example:
import tornado.web as web
Now you have access to namespace tornado, but you cannot access tornado.options even though tornado has this module. But as python's global package management, if you from tornado import options, you will not only get access to options but also add it to namespace tornado. So now you can also access options by tornado.options.
I am going to provide only a partial answer, and I will speculate a bit.
1) Sometimes I have observed that the second way works while the first does not. On my system:
Python 3.6.3 (default, Oct 3 2017, 21:45:48)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow import keras # <- this works
>>>
>>> import tensorflow.keras as K # <- this fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'tensorflow.keras'
>>>
2) I usually don't see a difference between these two approaches to importing. I haven't investigated WHY there is a difference with TensorFlow. It may have to do with what names are imported to the top level by the various TensorFlow subfolder init.py files (which are completely empty in most cases, but the one in ../dist_packages/tensorflow/python is pretty long).

from the same module import two thing, is the module executed twice?

Demo:
from module1 import func1
from module1 import func2
Ques:
Is the module1 executed twice?
Is this the good and correct way to import?
Thx!
In SO2.py:
print 'SO2 is running'
a = 'cabbage'
b = 'python-is-awesome'
In SO.py:
from SO2 import a
from SO2 import b
When running the latter:
SO2 is running
Only once ;).
Per PEP 8, it's okay to do:
from SO2 import a, b
Modules are cached in the import system, and thus are only loaded once unless they are explicitly reloaded. When python tries to import a module it first tries to locate it in sys.modules, which contains the references to all previously loaded Modules. This is the same for packages as well. For a more in-depth explanation, I'd recommend reading the excellent documentation of the import machinery.
More generally, Python only loads a module once, the first time that it is needed--whether it's an "import form" or plain import statement that does the "needing". Any subsequent imports will just note that the module has already been loaded.
The module initialization is only done once, on load.
If you need to refresh the contents of a module, you can reload it. On Python 2, there's a reload(m) built-in function to reload module m. In Python 3 this has been moved to the imp module, and imp.reload(m). In Python 3.4, imp is being deprecated in favor of a new module (as of 3.1) named "importlib". The initialization will get run again when a module is reloaded.
See:
http://docs.python.org/2.7/library/functions.html#reload
http://docs.python.org/3.3/library/imp.html
http://docs.python.org/3.3/library/importlib.html#module-importlib
As far as "best practices" go, I'm not always "pythonic". I agree with the PEP 8 position on one module per import line, but would prefer that modules be also listed alphabetically.
I don't especially like "import from" statements. If module.func1 is going to be used a lot in some particular function in the current source, I prefer adding:
def myfunc(*a, **k):
func1 = module.func1
... etc.
...so it's clear that in that function only that func1 means module.func1.

Python dependencies?

Is it possible to programmatically detect dependencies given a python project residing in SVN?
Here is a twist which adds some precision, and which might be useful if you find you're frequently checking dependencies of miscellaneous code:
Catches only import statements executed by the code being analyzed.
Automatically excludes all system-loaded modules, so you don't have to weed through it.
Also reports the symbols imported from each module.
Code:
import __builtin__
import collections
import sys
IN_USE = collections.defaultdict(set)
_IMPORT = __builtin__.__import__
def _myimport(name, globs=None, locs=None, fromlist=None, level=-1):
global IN_USE
if fromlist is None:
fromlist = []
IN_USE[name].update(fromlist)
return _IMPORT(name, globs, locs, fromlist, level)
# monkey-patch __import__
setattr(__builtin__, '__import__', _myimport)
# import and run the target project here and run the routine
import foobar
foobar.do_something()
# when it finishes running, dump the imports
print 'modules and symbols imported by "foobar":'
for key in sorted(IN_USE.keys()):
print key
for name in sorted(IN_USE[key]):
print ' ', name
Example foobar module:
import byteplay
import cjson
def _other():
from os import path
from sys import modules
def do_something():
import hashlib
import lxml
_other()
Output:
modules and symbols imported by "foobar":
_hashlib
array
array
byteplay
cStringIO
StringIO
cjson
dis
findlabels
foobar
hashlib
itertools
lxml
opcode
*
__all__
operator
os
path
sys
modules
types
warnings
Absolutely! If you are working from a UNIX or Linux shell, a simple combination of grep and awk would work; basically, all you want to do is search for lines containing the "import" keyword.
However, if you are working from any environment, you could just write a small Python script to do the searching for you (don't forget that strings are treated as immutable sequences, so you can do something like if "import" in line: ....
The one sticky spot, would be associating those imported modules to their package name (the first one that comes to mind is the PIL module, in Ubuntu it's provided by the python-imaging package).
Python code can import modules using runtime-constructed strings, so the only surefire way would be to run the code. Real-world example: when you open a database with SQLAlchemy's dbconnect, the library will load one or more db-api modules depending on the content of your database string.
If you're willing to run the code, here is a relatively simple way to do this by examining sys.modules when it finishes:
>>> from sys import modules
>>> import codeofinterest
>>> execute_code_of_interest()
>>> print modules
[ long, list, of, loaded, modules ]
Here, too, you should keep in mind that this could theoretically fail if execute_code_of_interest() modifies sys.modules, but I believe that's quite rare in production code.

How to load compiled python modules from memory?

I need to read all modules (pre-compiled) from a zipfile (built by py2exe compressed) into memory and then load them all.
I know this can be done by loading direct from the zipfile but I need to load them from memory.
Any ideas? (I'm using python 2.5.2 on windows)
TIA Steve
It depends on what exactly you have as "the module (pre-compiled)". Let's assume it's exactly the contents of a .pyc file, e.g., ciao.pyc as built by:
$ cat>'ciao.py'
def ciao(): return 'Ciao!'
$ python -c'import ciao; print ciao.ciao()'
Ciao!
IOW, having thus built ciao.pyc, say that you now do:
$ python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('ciao.pyc', 'rb').read()
>>> len(b)
200
and your goal is to go from that byte string b to an importable module ciao. Here's how:
>>> import marshal
>>> c = marshal.loads(b[8:])
>>> c
<code object <module> at 0x65188, file "ciao.py", line 1>
this is how you get the code object from the .pyc binary contents. Edit: if you're curious, the first 8 bytes are a "magic number" and a timestamp -- not needed here (unless you want to sanity-check them and raise exceptions if warranted, but that seems outside the scope of the question; marshal.loads will raise anyway if it detects a corrupt string).
Then:
>>> import types
>>> m = types.ModuleType('ciao')
>>> import sys
>>> sys.modules['ciao'] = m
>>> exec c in m.__dict__
i.e: make a new module object, install it in sys.modules, populate it by executing the code object in its __dict__. Edit: the order in which you do the sys.modules insertion and exec matters if and only if you may have circular imports -- but, this is the order Python's own import normally uses, so it's better to mimic it (which has no specific downsides).
You can "make a new module object" in several ways (e.g., from functions in standard library modules such as new and imp), but "call the type to get an instance" is the normal Python way these days, and the normal place to obtain the type from (unless it has a built-in name or you otherwise have it already handy) is from the standard library module types, so that's what I recommend.
Now, finally:
>>> import ciao
>>> ciao.ciao()
'Ciao!'
>>>
...you can import the module and use its functions, classes, and so on. Other import (and from) statements will then find the module as sys.modules['ciao'], so you won't need to repeat this sequence of operations (indeed you don't need this last import statement here if all you want is to ensure the module is available for import from elsewhere -- I'm adding it only to show it works;-).
Edit: If you absolutely must import in this way packages and modules therefrom, rather than "plain modules" as I just showed, that's doable, too, but a bit more complicated. As this answer is already pretty long, and I hope you can simplify your life by sticking to plain modules for this purpose, I'm going to shirk that part of the answer;-).
Also note that this may or may not do what you want in cases of "loading the same module from memory multiple times" (this rebuilds the module each time; you might want to check sys.modules and just skip everything if the module's already there) and in particular when such repeated "load from memory" occurs from multiple threads (needing locks -- but, a better architecture is to have a single dedicated thread devoted to performing the task, with other modules communicating with it via a Queue).
Finally, there's no discussion of how to install this functionality as a transparent "import hook" which automagically gets involved in the mechanisms of the import statement internals themselves -- that's feasible, too, but not exactly what you're asking about, so here, too, I hope you can simplify your life by doing things the simple way instead, as this answer outlines.
Compiled Python file consist of
magic number (4 bytes) to determine type and version of Python,
timestamp (4 bytes) to check whether we have newer source,
marshaled code object.
To load module you have to create module object with imp.new_module(), execute unmashaled code in new module's namespace and put it in sys.modules. Below in sample implementation:
import sys, imp, marshal
def load_compiled_from_memory(name, filename, data, ispackage=False):
if data[:4]!=imp.get_magic():
raise ImportError('Bad magic number in %s' % filename)
# Ignore timestamp in data[4:8]
code = marshal.loads(data[8:])
imp.acquire_lock() # Required in threaded applications
try:
mod = imp.new_module(name)
sys.modules[name] = mod # To handle circular and submodule imports
# it should come before exec.
try:
mod.__file__ = filename # Is not so important.
# For package you have to set mod.__path__ here.
# Here I handle simple cases only.
if ispackage:
mod.__path__ = [name.replace('.', '/')]
exec code in mod.__dict__
except:
del sys.modules[name]
raise
finally:
imp.release_lock()
return mod
Update: the code is updated to handle packages properly.
Note that you have to install import hook to handle imports inside loaded modules. One way to do this is adding your finder into sys.meta_path. See PEP302 for more information.

Categories