How to understand Python's module lookup

I created two new files, random.py and main.py, in the same directory. The code is as follows:
# random.py
if __name__ == "__main__":
    print("random")

# main.py
import random

if __name__ == "__main__":
    print(random.choice([1, 2, 3]))
When I run the main.py file, the program reports an error.
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print(random.choice([1, 2, 3]))
AttributeError: module 'random' has no attribute 'choice'
main.py imports my own random module instead of the standard library one.
However, if I create a new sys.py file and a main.py file in the same directory, the code is as follows:
# sys.py
if __name__ == "__main__":
    print("sys")

# main.py
import sys

if __name__ == "__main__":
    print(sys.path)
When I run the main.py file, it runs successfully.
main.py imports the built-in sys module.
Why is there such a clear difference?
The directory relationship of the script file is as follows:
C:.
    main.py
    random.py
    sys.py
Thank you very much for your answer.
Forgive my poor English.

sys is a built-in module, meaning it's compiled directly into the Python executable itself. Built-in modules outprioritize external files when Python is looking for modules. The standard random module isn't built-in, so it doesn't get that treatment.
Quoting the docs:
When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module...
Python’s default sys.meta_path has three meta path finders, one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an import path (i.e. the path based finder).
Since the finder for built-in modules comes before the finder that searches the import path, built-in modules will be found before anything on the import path.
You can see a tuple of the names of all the modules your Python has compiled in by inspecting sys.builtin_module_names.
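For instance, a quick inspection in a plain CPython 3.x session (output abbreviated; exact contents vary by version and build) might look like this:
import sys

# The three default meta path finders, queried in this order:
print(sys.meta_path)
# [<class '_frozen_importlib.BuiltinImporter'>,
#  <class '_frozen_importlib.FrozenImporter'>,
#  <class '_frozen_importlib_external.PathFinder'>]

print('sys' in sys.builtin_module_names)     # True
print('random' in sys.builtin_module_names)  # False: random comes from a .py file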
That said, while any built-in module would outprioritize a module loaded from a file, sys has its own special handling. sys is one of the foundational building blocks of Python, and much of the sys module's setup needs to happen before the import system is functional enough for the normal import process to work. sys gets explicitly created during interpreter setup in a way that bypasses the normal import system, and then future imports for sys find it in sys.modules without hitting any meta path finders.
How and where sys is created is an implementation detail that varies from Python version to Python version (and is wildly different in different Python implementations), but in the CPython 3.7.4 code, you can see it beginning on line 755 in Python/pylifecycle.c.
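You can observe the bypass directly; a minimal check in any recent CPython 3.x:
# Even the first "import sys" in a program runs no finder: the module object
# was created during interpreter startup and placed in sys.modules, so the
# import statement just fetches it from the cache.
import sys
print(sys.modules['sys'] is sys)  # True: the very same pre-built module object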

tl;dr Caching
sys is somewhat of a special case among Python modules because it gets loaded at program start, unconditionally (presumably because a lot of the constants, functions, and data within, such as the streams stdout and stderr, are used by the Python interpreter itself). As @user2357112 noted in the other answer, this is partly because it's built into the Python executable, but also because it's necessary for a substantial amount of Python's core functionality (see below how it needs to be loaded for imports to work). random is part of the standard library, but it doesn't get loaded automatically when you start the interpreter, which for our purposes is the primary relevant difference between it and sys.
Looking at Python's documentation on the subject clarifies how Python resolves imports:
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths.
...
During import, the module name is looked up in sys.modules and if present, the associated value is the module satisfying the import, and the process completes. However, if the value is None, then a ModuleNotFoundError is raised. If the module name is missing, Python will continue searching for the module.
As for where it looks for the module, your observed behavior gives the answer: it searches the local directory first, and only afterwards the "usual places".
The reason for the discrepancy between how sys and random are handled is caching: sys is cached (so Python doesn't even check the path to import it), whereas random is not cached (so Python does search the path, and finds your local file first).
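You can watch the difference from the asker's directory (the one containing the local random.py); a small demonstration script:
import sys

print('random' in sys.modules)   # False in a fresh script: random isn't cached

import random                    # cache miss, so the path search runs...
print(random.__file__)           # ...and finds the local random.py, not the stdlib one
print('random' in sys.modules)   # True now: later imports will hit the cache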
There are a few ways you can change this behavior.
First, if you must have a local module called sys, you can use importlib to import it by file path, sidestepping the ambiguity with the sys that's already cached. I have no idea how this would affect other modules that independently try to import sys, and you really shouldn't be naming your files the same as standard library modules anyway.
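A minimal sketch of that importlib approach, assuming the local sys.py sits in the current working directory (the module name local_sys is made up here to dodge the clash):
import importlib.util

# Load ./sys.py from disk under a non-clashing name, never consulting the
# cached built-in sys.
spec = importlib.util.spec_from_file_location("local_sys", "sys.py")
local_sys = importlib.util.module_from_spec(spec)
spec.loader.exec_module(local_sys)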
Alternatively, if you want Python to search the standard locations before the local directory, you should be able to arrange that by modifying sys.path, which lists the order in which paths are searched for imports (much like the $PATH environment variable, or any similar language-specific equivalent). The first element of sys.path is usually going to be an empty string '' (or the script's own directory), which is what causes the current directory to be searched first. So you can simply move that entry to the back of sys.path, to have it searched last instead of first:
sys.path.append(sys.path.pop(0))
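With that reorder in place (run before the first import of the clashing name, since later imports just hit the cache, and assuming sys.path[0] really is the local-directory entry, as discussed above), the asker's original example should work; a hypothetical session:
import sys
sys.path.append(sys.path.pop(0))  # demote the local-directory entry

import random                     # now resolves to the stdlib module
print(random.choice([1, 2, 3]))   # works despite the local random.py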

Related

Custom Module with Custom Python Package - Module not found error

I wrote a custom python package for Ansible to handle business logic for some servers I manage. I have multiple files and they reference each other by re-importing the package.
So my package named <MyCustomPackage> has functions <Function1>, <Function2>, <Function3>, etc., all in their own files... Some of these functions reference functions in the same package, so to do that the file has:
import MyCustomPackage
at the top. I did it this way instead of a relative import because I'm also unit testing these, and mocking would not work with relative paths because of an __init__.py file in the test directory which was needed for test discovery. The only way I could mock was through importing the package itself. Seemed simple enough.
The problem is with Ansible. These packages are in module_utils. I import them with:
from ansible.module_utils.MyCustomPackage import MyCustomPackage
but when I use the commands I get module-not-found errors, which I traced back to the import MyCustomPackage statement in the package itself.
So - how should I be structuring my package? Should I try again with relative file imports, or have the package modify the path so it's found with the friendly name?
Any tips would be helpful! Or if someone has a module they've written with Python modules in module_utils and unit tests that they'd be willing to share, that'd be great also!
Many people have problems with relative imports and imports in general in Python because they are ambiguous and surprisingly depend on your current working directory (and other things).
Thus I've created an experimental, new import library: ultraimport
It gives you more control over your imports and lets you do file system based, relative imports.
Given that you have a file function1.py, to import a function from function2.py, you would then write:
import ultraimport
Function2 = ultraimport('__dir__/function2.py', 'Function2')
This will always work, no matter how you run your code. It also does not force you to a specific package structure. You can just have any files you like.

Does importing a Python file also import the imported files into shell?

I am running Python 3.6.2 and trying to import other files into my shell prompt as needed. I have the following code inside my_file.py.
import numpy as np

def my_file(x):
    s = 1 / (1 + np.exp(-x))
    return s
From my 3.6.2 shell prompt I call
from my_file import my_file
But in my shell prompt, if I want to use numpy I still have to import it myself, even though I have imported a file that imports numpy. Is this behavior by design? Or is there a way to import numpy once?
import has three completely separate effects:
1. If the module has not yet been imported in the current process (by any script or module), execute its code (usually from disk) and store a module object with the resulting classes, functions, and variables.
2. If the module is in a package, (import the package first, and) store the new module as an attribute on the containing package (so that references like scipy.special work).
3. Assign the module ultimately imported to a variable in the invoking scope. (import foo.bar assigns foo; import baz.quux as frob assigns baz.quux to the name frob.)
The first two effects are shared among all clients, while the last is completely local. This is by design, as it avoids accidentally using a dependency of an imported module without making sure it’s available (which would break later if the other modules changed what they imported). It also lets different clients use different shorthands.
As hpaul noted, you can use another module's imports with a qualified name, but this is abusing the module's interface just like any other use of a private name, unless the module intends to publish names for other modules (like six.moves, for example, or os.path, which is really an alias for posixpath or ntpath).
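A small sketch of effects 1 and 3 using the asker's my_file.py (the session is hypothetical):
from my_file import my_file  # effect 1 runs for my_file (and numpy), but
                             # effect 3 binds only the function name my_file

print(my_file(0))    # 0.5 -- inside my_file.py, the name np is still bound
# np.exp(1)          # would raise NameError: nothing bound np in *this* scope

import numpy as np   # cheap: numpy is already in sys.modules, this just binds np
import my_file as mf
print(mf.np is np)   # True -- the qualified-name access discussed above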

Python Standard Library Import Relationships

I am writing an application in C# with Visual Studio and am using IronPython to write some Python scripts for my application. However, it does not support the entire standard library by default. So to import some modules (such as os) I need to point my C# code to where the os module actually is. I also understand that it will still be limited to libraries implemented in pure Python.
Ultimately I want to have something that can be installed on another machine. My current workaround is to include a copy of https://github.com/python/cpython/tree/2.7/Lib in the Debug folder where the executable is running, but it seems excessive/unnecessary to include the entire thing. I tried placing just the files I need (for example os.py) there, but of course they import other modules, which import other modules, etc... I would have to re-run the code to see which module it couldn't find, add it, and repeat, which was getting too tedious.
I was wondering if there was any sort of resource that specifies the relationships between standard library modules and could tell me exactly what files to copy. Essentially what I'm looking for is the graph of the standard library imports. So if I want to import os in these scripts I know to copy os.py, ntpath.py, ...
Thanks
You probably don't need the imports as a tree, but as a simple list, so you can just copy the needed files. You can get that from sys.modules after you import everything that your script needs: it will contain all modules needed by the ones you imported, e.g.:
import sys   # built-in module: adds no file to the list, but needed for sys.modules
import os
import time
# import whatever else your script needs

# this gives a list of (module, file) tuples
m = [(z, x.__file__) for z, x in sys.modules.items() if hasattr(x, "__file__")]
for x in m:
    print x[0], x[1]   # Python 2 print statement, matching the 2.7 Lib in question

IPython import local file

I've come across a weird disparity between IPython and the default Python interpreter. I have a Python file that shadows a standard library module's name: logging.py. Say it has a simple function foo().
If I start up the default python interpreter and call import logging it imports the local file and I can access logging.foo().
If I start up IPython and call import logging it imports the python built-in module. If I change the name to a non-shadow (e.g. import my_logging) then the import will work as expected.
Which is the expected behaviour? The current directory is at the start of sys.path for both interpreters but they differ in which imports have priority.
import sys
print(sys.modules)
IPython starts up with many standard library modules already imported, including logging. Those modules appear to be imported from their full stdlib paths.
Speculation: IPython already imported those libraries for its own internal use; when you then run import logging yourself, Python sees that the module is already in sys.modules, regardless of which path it came from, and does nothing.
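You can confirm which file satisfied the import in each interpreter; a quick check, run from the directory containing the local logging.py:
import sys
import logging

print(logging.__file__)          # plain python: ./logging.py; IPython: the stdlib path
print('logging' in sys.modules)  # True either way: whichever import ran first wins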

Reference objects in containing package?

I have a package, spam, that contains the variable _eggs in its __init__.py. In the same package, in boiler.py, I have the class Boiler.
In Boiler, I want to refer to _eggs in the package’s __init__.py file. Is there a way that I can do this?
The most appropriate way to retrieve that value is via an explicit relative import:
from . import _eggs
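For concreteness, a minimal sketch of the layout from the question (the value of _eggs is invented here):
# spam/__init__.py
_eggs = 12

# spam/boiler.py
from . import _eggs  # explicit relative import: fetches _eggs from spam/__init__.py

class Boiler:
    def count(self):
        return _eggs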
However, one thing to keep in mind is that the following command line invocation will then fail to work:
python spam/boiler.py
The reason this won't work is that the interpreter doesn't recognise any directly executed file as part of a package, so the relative import will fail.
However, with your current working directory set to the one containing the "spam" folder, you can instead execute the module as:
python -m spam.boiler
This gives the interpreter sufficient information to recognise where boiler.py sits in the module hierarchy and resolve the relative imports correctly.
This will only work with Python 2.6 or later - previous versions couldn't deal with explicit relative imports from __main__ at all (see PEP 366 for the gory details).
If you are simply doing import spam.boiler from another file, then that should work for any Python version that allows explicit relative imports (although it's possible Python 2.5 may need from __future__ import absolute_import to correctly enable this feature)
