Python import nuances

Could someone please shed some light on this behavior in the Python interpreter:
from os import path # success
type(path) # <class 'module'>
from path import * # complains that no module called 'path' exists
type(os.path) # complains that the name 'os' is not defined, yet:
from os.path import * # works just fine
As a side-question, I wonder what is the mechanism that allows a statement such as 'from os import path' to work while os itself remains undefined? Isn't os executed at the time of the from...import, and as such shouldn't it be "known" as a module? Am I right to say that keeping os out of the known names is simply a convention, intended to prevent the "pollution" of the namespace with symbols that have not been imported directly (as in 'import os')?

This is not specific to Python 3; you'd see the same behavior in Python 2. Importing a name adds it to the namespace, nothing more.
This line:
from path import *
Means:
"Try to find a module called path in any directory that is in
PYTHONPATH, and attempt to import all names from it to the current
namespace."
Since there is no such module in the current working directory, and more importantly not in any directory that's in PYTHONPATH, the import fails. Note that the search does not descend into subdirectories of the directories on PYTHONPATH.
type(os.path)
This line fails because the name os does not exist in the current namespace (it was never bound there).
I wonder what is the mechanism that allows a statement such as 'from os import path' to work while os itself remains undefined?
Importing searches the paths defined in PYTHONPATH for modules; see this article on effbot for more clarification on how importing works.
"Undefined" simply means the name doesn't exist in the namespace.
Isn't os executed at the time of the from...import, and as such shouldn't it be "known" as a module?
Not quite: when you do from x import y, Python does import and execute x (and caches it in sys.modules), but only the name y is bound in your namespace; the name x is never bound.
Am I right to say that keeping os out of the known names is simply a convention, intended to prevent the "pollution" of the namespace with symbols that have not been imported directly (as in 'import os')?
No, it is not just a convention: from os import path simply never binds the name os. The os module itself is imported and cached in sys.modules; it just isn't added to your namespace.
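A quick interpreter session (not from the original answer, but easy to verify) makes the distinction visible: the os module is imported and cached, but the name os is never bound.
>>> import sys
>>> from os import path
>>> 'os' in sys.modules   # the module was imported and cached
True
>>> path                  # 'path' was bound in this namespace
<module 'posixpath' from '...'>
>>> os                    # ...but the name 'os' never was
Traceback (most recent call last):
  ...
NameError: name 'os' is not defined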

Related

When importing functions from standard-library modules like os or sys, is it good practice to import them as protected?

In myModule.py I am importing environ from os, like
from os import environ
since I am only using environ. But when I do dir(myModule) it shows environ as publicly visible. Should it be imported as protected instead, given that some other project may also have its own environ function?
If you're doing from os import environ, then you'll reference it as environ.
If you do import os, it's os.environ.
So depending on your needs, the second option might be better. The first will look better and read easier, whereas the second avoids namespace pollution.
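If you really do want environ kept out of the module's public surface, one common idiom (not from the original answers; the names below are made up for illustration) is to bind it to an underscore-prefixed alias, which signals "internal" and also keeps it out of from myModule import *:
# myModule.py -- hypothetical example
from os import environ as _environ  # leading underscore marks it as internal

def database_url():
    # use the private alias internally
    return _environ.get("DATABASE_URL", "sqlite:///local.db")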
Expanding on @mgilson's comment: when you do dir(somemodule), everything you see is namespaced to that module. In other words, you have to use . (attribute access) to "reach" those items.
So, in myModule.py you have the following lines:
from os import environ
a = 4
In some other module, or the Python prompt, you have the following statements:
import myModule
dir(myModule)
Now, in order to get to a or environ that is inside myModule, you'd have to explicitly define its scope:
print(a) # this won't work
print(myModule.a) # this will print 4
In Python as a general rule, there is no explicit hiding/protecting. Python expects its users to be consenting adults and "know what they are doing".
However, developers can control what happens when someone tries to import everything from a module (from myModule import *), but this isn't strictly enforced. You can still get to everything inside myModule by prefixing the module name.
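The usual mechanism for that is the __all__ list. A minimal sketch (reusing the toy myModule from above):
# myModule.py
from os import environ

__all__ = ["a"]   # "from myModule import *" will bind only 'a'
a = 4
After this, from myModule import * imports only a, while myModule.environ remains reachable through explicit attribute access.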

Restrict import to specific path, without restricting the imported module's path

Is it possible to import a module from a specific directory, without affecting the import path of the imported module?
If I was to temporarily replace sys.path with the desired directory, the imported module would not be able to import anything outside of that directory.
I don't want to just prepend sys.path with the directory, because I don't want importing to fall back to another source.
The standard library's imp module allows you to search a list of paths to find and import a module without altering sys.path. For example:
import imp
search_paths = [path_to_spam]
modfile, modpath, description = imp.find_module('spam', search_paths)
with modfile:
    spam = imp.load_module('spam', modfile, modpath, description)
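Note that imp has been deprecated since Python 3.4. A rough modern equivalent, sketched with importlib (assuming Python 3.5+ and a module named spam living in path_to_spam), is:
import importlib.machinery
import importlib.util

search_paths = [path_to_spam]          # only these directories are searched
spec = importlib.machinery.PathFinder.find_spec('spam', search_paths)
if spec is None:
    raise ImportError("spam not found in the given paths")
spam = importlib.util.module_from_spec(spec)
spec.loader.exec_module(spam)          # sys.path is never consulted or modified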

Creating aliases for Python packages?

I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One odd thing is that there's no mention of the imp.load_package function in the docs at all.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create an __init__.py file inside Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module:
#from importlib import abc
import imp
import os
import sys
import logging

logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug

class MyImporter(object):
    def __init__(self, path):
        self.path = path
        self.names = {}

    def find_module(self, fullname, path=None):
        dprint("find_module({fullname},{path})".format(**locals()))
        ml = imp.find_module(fullname, path)
        dprint(repr(ml))
        raise ImportError

    def load_module(self, fullname):
        dprint("load_module({fullname})".format(**locals()))
        return imp.load_module(fullname)
        raise ImportError

def load_storage(path, modname=None):
    if modname is None:
        modname = os.path.basename(path)
    mod = imp.new_module(modname)
    sys.modules[modname] = mod
    assert mod.__name__ == modname
    mod.__path__ = [path]
    #sys.meta_path.append(MyImporter(path))
    mod.__loader__ = MyImporter(path)
    return mod

if __name__ == "__main__":
    load_storage("arbitrary-path-to-code/Storage")
    from Storage import plain
    from Storage import mypkg
Then when you import Storage.mypackage, Python will immediately use your importer without bothering to look on sys.path.
That doesn't quite work yet. The code above does import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the __loader__ attribute mentioned in PEP 302.
If I uncomment the sys.meta_path line, 3.1 dies with a stack overflow, and 2.6 dies with ImportError. Hmm, I'm out of time for now, but may look at it later.
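For reference, on modern Python 3 the same idea can be expressed with a sys.meta_path finder built on importlib. This is a sketch added for illustration (the alias mapping and paths are hypothetical), not part of the original answer:
import os
import sys
import importlib.abc
import importlib.util

class AliasFinder(importlib.abc.MetaPathFinder):
    """Maps friendly top-level names to real package directories."""

    def __init__(self, aliases):
        # e.g. {"mypackage": "/abs/path/to/Storage/mypackage-xxyyzzww"}
        self.aliases = aliases

    def find_spec(self, fullname, path=None, target=None):
        pkg_dir = self.aliases.get(fullname)
        if pkg_dir is None:
            return None  # not one of ours; let the next finder try
        return importlib.util.spec_from_file_location(
            fullname,
            os.path.join(pkg_dir, "__init__.py"),
            submodule_search_locations=[pkg_dir],
        )

sys.meta_path.insert(0, AliasFinder({"mypackage": "/abs/path/to/Storage/mypackage-xxyyzzww"}))
import mypackage  # resolved through AliasFinder; no symlinks or PYTHONPATH needed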
Packages are just entries in the namespace. You should not give your path components names that are not legal Python identifiers.

Why does importing a dotted module name fail when upper level is matched by a module in current directory?

I was trying out one of the Python standard library modules, let's call it foo.bar.baz.
So I wrote a little script starting with
import foo.bar.baz
and saved it as foo.py.
When I executed the script I got an ImportError. It took me a while (I'm still learning Python), but I finally realized the problem was how I named the script. Once I renamed foo.py to something else, the problem went away.
So I understand that the import foo statement will look for the script foo.py before looking for the standard library foo, but it's not clear to me what it was looking for when I said import foo.bar.baz. Is there some way that foo.py could have the content for that statement to make sense? And if not, why didn't the Python interpreter move on to look for a directory hierarchy like foo/bar with the appropriate __init__.py's?
An import statement like import foo.bar.baz first imports foo, then asks it for bar, and then asks foo.bar for baz. Whether foo will, once imported, be able to satisfy the request for bar or bar.baz is unimportant to the import of foo. It's just a module. There is only one foo module. Both import foo and import foo.bar.baz will find the same module -- just like any other way of importing the foo module.
There is actually a way for foo to be a single module, rather than a package, and still be able to satisfy a statement like import foo.bar.baz: it can add "foo.bar" and "foo.bar.baz" to the sys.modules dict. This is exactly what the os module does with os.path: it imports the right "path" module for the platform (posixpath, ntpath, os2path, etc), and assigns it to the path attribute. Then it does sys.modules["os.path"] = path to make that module importable as os.path, so a statement like import os.path works. There isn't really a reason to do this -- os.path is available without importing it as well -- but it's possible.
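As a toy illustration of that mechanism (module names made up; json just stands in for the "real" implementation), a single-file foo.py could make import foo.bar work like this:
# foo.py -- a plain module, not a package (hypothetical example)
import sys
import json as _bar_impl            # pretend this is the implementation of "foo.bar"

bar = _bar_impl                     # attribute access: foo.bar
sys.modules["foo.bar"] = _bar_impl  # makes "import foo.bar" succeed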

How to retrieve a module's path?

I want to detect whether a module has changed. Using inotify is simple; you just need to know the directory you want to get notifications from.
How do I retrieve a module's path in Python?
import a_module
print(a_module.__file__)
will actually give you the path to the .pyc file that was loaded, at least on Mac OS X. So you can do:
import os
path = os.path.abspath(a_module.__file__)
You can also try:
path = os.path.dirname(a_module.__file__)
To get the module's directory.
There is the inspect module in Python.
Official documentation
The inspect module provides several useful functions to help get
information about live objects such as modules, classes, methods,
functions, tracebacks, frame objects, and code objects. For example,
it can help you examine the contents of a class, retrieve the source
code of a method, extract and format the argument list for a function,
or get all the information you need to display a detailed traceback.
Example:
>>> import os
>>> import inspect
>>> inspect.getfile(os)
'/usr/lib64/python2.7/os.pyc'
>>> inspect.getfile(inspect)
'/usr/lib64/python2.7/inspect.pyc'
>>> os.path.dirname(inspect.getfile(inspect))
'/usr/lib64/python2.7'
As the other answers have said, the best way to do this is with __file__ (demonstrated again below). However, there is an important caveat, which is that __file__ does NOT exist if you are running the module on its own (i.e. as __main__).
For example, say you have two files (both of which are on your PYTHONPATH):
#/path1/foo.py
import bar
print(bar.__file__)
and
#/path2/bar.py
import os
print(os.getcwd())
print(__file__)
Running foo.py will give the output:
/path1 # "import bar" causes the line "print(os.getcwd())" to run
/path2/bar.py # then "print(__file__)" runs
/path2/bar.py # then the import statement finishes and "print(bar.__file__)" runs
HOWEVER if you try to run bar.py on its own, you will get:
/path2 # "print(os.getcwd())" still works fine
Traceback (most recent call last): # but __file__ doesn't exist if bar.py is running as main
File "/path2/bar.py", line 3, in <module>
print(__file__)
NameError: name '__file__' is not defined
Hope this helps. This caveat cost me a lot of time and confusion while testing the other solutions presented.
I will try tackling a few variations on this question as well:
finding the path of the called script
finding the path of the currently executing script
finding the directory of the called script
(Some of these questions have been asked on SO, but have been closed as duplicates and redirected here.)
Caveats of Using __file__
For a module that you have imported:
import something
something.__file__
will return the absolute path of the module. However, given the following script foo.py:
#foo.py
print '__file__', __file__
Calling it with python foo.py will simply return 'foo.py'. If you add a shebang:
#!/usr/bin/python
#foo.py
print '__file__', __file__
and call it using ./foo.py, it will return './foo.py'. If you call it from a different directory (e.g. put foo.py in directory bar), then calling either
python bar/foo.py
or adding a shebang and executing the file directly:
bar/foo.py
will return 'bar/foo.py' (the relative path).
Finding the directory
Now going from there to get the directory, os.path.dirname(__file__) can also be tricky. At least on my system, it returns an empty string if you call it from the same directory as the file. For example:
# foo.py
import os
print '__file__ is:', __file__
print 'os.path.dirname(__file__) is:', os.path.dirname(__file__)
will output:
__file__ is: foo.py
os.path.dirname(__file__) is:
In other words, it returns an empty string, so this does not seem reliable if you want to use it for the current file (as opposed to the file of an imported module). To get around this, you can wrap it in a call to abspath:
# foo.py
import os
print 'os.path.abspath(__file__) is:', os.path.abspath(__file__)
print 'os.path.dirname(os.path.abspath(__file__)) is:', os.path.dirname(os.path.abspath(__file__))
which outputs something like:
os.path.abspath(__file__) is: /home/user/bar/foo.py
os.path.dirname(os.path.abspath(__file__)) is: /home/user/bar
Note that abspath() does NOT resolve symlinks. If you want to do this, use realpath() instead. For example, making a symlink file_import_testing_link pointing to file_import_testing.py, with the following content:
import os
print 'abspath(__file__)',os.path.abspath(__file__)
print 'realpath(__file__)',os.path.realpath(__file__)
executing the symlink will print absolute paths something like:
abspath(__file__) /home/user/file_import_testing_link
realpath(__file__) /home/user/file_import_testing.py
file_import_testing_link -> file_import_testing.py
Using inspect
@SummerBreeze mentions using the inspect module.
This seems to work well, and is quite concise, for imported modules:
import os
import inspect
print 'inspect.getfile(os) is:', inspect.getfile(os)
obediently returns the absolute path. For finding the path of the currently executing script:
inspect.getfile(inspect.currentframe())
(thanks @jbochi)
inspect.getabsfile(inspect.currentframe())
gives the absolute path of the currently executing script (thanks @Sadman_Sakib).
I don't get why no one is talking about this, but to me the simplest solution is using imp.find_module("modulename"):
import imp
imp.find_module("os")
It gives a tuple with the path in second position:
(<open file '/usr/lib/python2.7/os.py', mode 'U' at 0x7f44528d7540>,
'/usr/lib/python2.7/os.py',
('.py', 'U', 1))
The advantage of this method over the "inspect" one is that you don't need to import the module to make it work, and you can use a string in input. Useful when checking modules called in another script for example.
EDIT:
In Python 3, the importlib module should do it.
Doc of importlib.util.find_spec:
Return the spec for the specified module.
First, sys.modules is checked to see if the module was already imported. If so, then sys.modules[name].__spec__ is returned. If that happens to be
set to None, then ValueError is raised. If the module is not in
sys.modules, then sys.meta_path is searched for a suitable spec with the
value of 'path' given to the finders. None is returned if no spec could
be found.
If the name is for submodule (contains a dot), the parent module is
automatically imported.
The name and package arguments work the same as importlib.import_module().
In other words, relative module names (with leading dots) work.
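A minimal usage sketch (module names chosen arbitrarily):
import importlib.util

spec = importlib.util.find_spec("os")
print(spec.origin)                                  # e.g. /usr/lib/python3.x/os.py
print(importlib.util.find_spec("no_such_module"))   # None when nothing is found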
This was trivial.
Each module has a __file__ attribute that holds the path it was loaded from (on older Python versions it may be relative to the current working directory).
Therefore, getting the directory of the module in order to watch it is as simple as:
os.path.dirname(__file__)
import os
path = os.path.abspath(__file__)
dir_path = os.path.dirname(path)
import module
print module.__path__
Packages support one more special attribute, __path__. This is
initialized to be a list containing the name of the directory holding
the package’s __init__.py before the code in that file is executed.
This variable can be modified; doing so affects future searches for
modules and subpackages contained in the package.
While this feature is not often needed, it can be used to extend the
set of modules found in a package.
Source
If you want to retrieve the module path without loading it:
import importlib.util
print(importlib.util.find_spec("requests").origin)
Example output:
/usr/lib64/python3.9/site-packages/requests/__init__.py
Command Line Utility
You can turn this into a command line utility:
python-which <package name>
Create /usr/local/bin/python-which
#!/usr/bin/env python
import importlib
import os
import sys
args = sys.argv[1:]
if len(args) > 0:
    module = importlib.import_module(args[0])
    print(os.path.dirname(module.__file__))
Make it executable
sudo chmod +x /usr/local/bin/python-which
You can just import your module, then evaluate its name at the prompt and you'll get its full path:
>>> import os
>>> os
<module 'os' from 'C:\\Users\\Hassan Ashraf\\AppData\\Local\\Programs\\Python\\Python36-32\\lib\\os.py'>
>>>
So I spent a fair amount of time trying to do this with py2exe.
The problem was to get the base folder of the script whether it was being run as a Python script or as a py2exe executable, and to have it work whether it was being run from the current folder, another folder or (this was the hardest) from the system's path.
Eventually I used this approach, using sys.frozen as an indicator of running in py2exe:
import os, sys

if hasattr(sys, 'frozen'):  # only when running in py2exe does this exist
    base = sys.prefix
else:  # otherwise this is a regular python script
    base = os.path.dirname(os.path.realpath(__file__))
If you want to retrieve the package's root path from any of its modules, the following works (tested on Python 3.6):
from . import __path__ as ROOT_PATH
print(ROOT_PATH)
The main __init__.py path can also be referenced by using __file__ instead.
Hope this helps!
When you import a module, you have access to plenty of information. Check out dir(a_module). As for the path, there is a dunder for that: a_module.__path__. You can also just print the module itself.
>>> import a_module
>>> print(dir(a_module))
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> print(a_module.__path__)
['/.../.../a_module']
>>> print(a_module)
<module 'a_module' from '/.../.../a_module/__init__.py'>
If you would like to know the absolute path from your script, you can use a Path object:
from pathlib import Path
print(Path().absolute())
print(Path('.').resolve())
print(Path().cwd())
cwd() method
Return a new path object representing the current directory (as returned by os.getcwd())
resolve() method
Make the path absolute, resolving any symlinks. A new path object is returned:
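Note that all three calls above report the current working directory, not the script's own location. If what you want is the directory of the file itself, a short pathlib sketch (my addition, not from the original answer) would be:
from pathlib import Path

this_file = Path(__file__).resolve()  # absolute path of the current file
this_dir = this_file.parent           # directory containing it
print(this_file, this_dir)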
If you installed it using pip, "pip show" works great (see the 'Location' field):
$ pip show detectron2
Name: detectron2
Version: 0.1
Summary: Detectron2 is FAIR next-generation research platform for object detection and segmentation.
Home-page: https://github.com/facebookresearch/detectron2
Author: FAIR
Author-email: None
License: UNKNOWN
Location: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages
Requires: yacs, tabulate, tqdm, pydot, tensorboard, Pillow, termcolor, future, cloudpickle, matplotlib, fvcore
Update:
$ python -m pip show mymodule
(author: wisbucky)
If the only caveat of using __file__ is that the resulting relative directory is blank (i.e. when the script is run from the same directory where it lives), then a trivial solution is:
import os.path
mydir = os.path.dirname(__file__) or '.'
full = os.path.abspath(mydir)
print __file__, mydir, full
And the result:
$ python teste.py
teste.py . /home/user/work/teste
The trick is in the or '.' after the dirname() call. It sets the dir to ., which means the current directory and is a valid directory for any path-related function.
Thus, using abspath() is not strictly needed. But if you use it anyway, the trick is not needed: abspath() accepts blank paths and properly interprets them as the current directory.
I'd like to contribute with one common scenario (in Python 3) and explore a few approaches to it.
The built-in function open() accepts either a relative or an absolute path as its first argument. A relative path is treated as relative to the current working directory, though, so it is recommended to pass an absolute path to the file.
Simply said, if you run a script file with the following code, it is not guaranteed that the example.txt file will be created in the same directory where the script file is located:
with open('example.txt', 'w'):
    pass
To fix this code we need to get the path to the script and make it absolute. To ensure the path to be absolute we simply use the os.path.realpath() function. To get the path to the script there are several common functions that return various path results:
os.getcwd()
os.path.realpath('example.txt')
sys.argv[0]
__file__
Both functions os.getcwd() and os.path.realpath() return path results based on the current working directory, which is generally not what we want. The first element of the sys.argv list is the path of the root script (the script you run), regardless of whether you call the list in the root script itself or in any of its modules. It might come in handy in some situations. The __file__ variable contains the path of the module from which it has been called.
The following code correctly creates a file example.txt in the same directory where the script is located:
import os

filedir = os.path.dirname(os.path.realpath(__file__))
filepath = os.path.join(filedir, 'example.txt')
with open(filepath, 'w'):
    pass
From within the modules of a Python package I had to refer to a file that resided in the same directory as the package. For example:
some_dir/
    maincli.py
    top_package/
        __init__.py
        level_one_a/
            __init__.py
            my_lib_a.py
            level_two/
                __init__.py
                hello_world.py
        level_one_b/
            __init__.py
            my_lib_b.py
So in the above I had to call maincli.py from the my_lib_a.py module, knowing that top_package and maincli.py are in the same directory. Here's how I get the path to maincli.py:
import sys
import os
import imp
class ConfigurationException(Exception):
    pass

# inside of my_lib_a.py
def get_maincli_path():
    maincli_path = os.path.abspath(imp.find_module('maincli')[1])

    # top_package = __package__.split('.')[0]
    # mod = sys.modules.get(top_package)
    # modfile = mod.__file__
    # pkg_in_dir = os.path.dirname(os.path.dirname(os.path.abspath(modfile)))
    # maincli_path = os.path.join(pkg_in_dir, 'maincli.py')

    if not os.path.exists(maincli_path):
        err_msg = 'This script expects that "maincli.py" be installed to the '\
                  'same directory: "{0}"'.format(maincli_path)
        raise ConfigurationException(err_msg)
    return maincli_path
Based on a posting by PlasmaBinturong I modified the code.
If you wish to do this dynamically in a "program", try this code.
My point is, you may not know the exact name of the module in advance to "hardcode" it.
It may be selected from a list, or it may not be currently running, so __file__ can't be used.
(I know, this will not work in Python 3.)
global modpath
modname = 'os' #This can be any module name on the fly
#Create a file called "modname.py"
f=open("modname.py","w")
f.write("import "+modname+"\n")
f.write("modpath = "+modname+"\n")
f.close()
#Call the file with execfile()
execfile('modname.py')
print modpath
<module 'os' from 'C:\Python27\lib\os.pyc'>
I tried to get rid of the "global" issue but found cases where it did not work.
I think execfile() can be emulated in Python 3.
Since this is in a program, it can easily be put in a method or module for reuse.
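For what it's worth, on Python 3 the same dynamic lookup can be done without writing a temporary file, using importlib (a sketch added for illustration; the module name is arbitrary):
import importlib

modname = 'os'          # any module name chosen at runtime
mod = importlib.import_module(modname)
print(mod)              # e.g. <module 'os' from '.../os.py'>
print(mod.__file__)     # the path the module was loaded from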
Here is a quick bash script in case it's useful to anyone. I just want to be able to set an environment variable so that I can pushd to the code.
#!/bin/bash
module=${1:?"I need a module name"}
python << EOI
import $module
import os
print os.path.dirname($module.__file__)
EOI
Shell example:
[root@sri-4625-0004 ~]# export LXML=$(get_python_path.sh lxml)
[root@sri-4625-0004 ~]# echo $LXML
/usr/lib64/python2.7/site-packages/lxml
[root@sri-4625-0004 ~]#
If your import is a site package (e.g. pandas), I recommend this to get its directory (it does not work if the import is a plain module, e.g. pathlib):
from importlib import resources # part of core Python
import pandas as pd
package_dir = resources.path(package=pd, resource="").__enter__()
In general importlib.resources can be considered when a task is about accessing paths/resources of a site package.
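Calling __enter__() by hand works but leaves the context manager dangling; on Python 3.9+ a cleaner sketch (assuming pandas is installed) is:
from importlib import resources
import pandas as pd

package_dir = resources.files(pd)  # Traversable pointing at the package directory
print(package_dir)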
If you used pip, then you can call pip show, but you must call it using the specific version of python that you are using. For example, these could all give different results:
$ python -m pip show numpy
$ python2.7 -m pip show numpy
$ python3 -m pip show numpy
Location: /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python
Don't simply run $ pip show numpy, because there is no guarantee that it is the pip associated with the Python version you are actually using.
