Should I use `import os.path` or `import os`? - python

According to the official documentation, os.path is a module. Thus, what is the preferred way of importing it?
# Should I always import it explicitly?
import os.path
Or...
# Is importing os enough?
import os
Please DON'T answer "importing os works for me". I know, it works for me too right now (as of Python 2.6). What I want to know is any official recommendation about this issue. So, if you answer this question, please post your references.

os.path works in a funny way. It looks like os should be a package with a submodule path, but in reality os is a normal module that does magic with sys.modules to inject os.path. Here's what happens:
When Python starts up, it loads a bunch of modules into sys.modules. They aren't bound to any names in your script, but you can access the already-created modules when you import them in some way.
sys.modules is a dict in which modules are cached. When you import a module, if it already has been imported somewhere, it gets the instance stored in sys.modules.
os is among the modules that are loaded when Python starts up. It assigns its path attribute to an os-specific path module.
It injects sys.modules['os.path'] = path so that you're able to do "import os.path" as though it was a submodule.
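The steps above can be checked from an interpreter. This is a minimal sketch, assuming CPython on a POSIX or Windows platform; the variable names are illustrative only.

```python
import os
import sys

# os.path is registered in the module cache under its dotted name ...
same_object = sys.modules["os.path"] is os.path

# ... but it is really a platform-specific module such as posixpath or ntpath.
implementation = os.path.__name__

# A real package has a __path__ attribute; os is a plain module and has none.
os_is_package = hasattr(os, "__path__")

print(same_object, implementation, os_is_package)
```

The missing `__path__` attribute is what distinguishes os from a true package, even though `import os.path` succeeds.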
I tend to think of os.path as a module I want to use rather than a thing in the os module, so even though it's not really a submodule of a package called os, I import it sort of like it is one and I always do import os.path. This is consistent with how os.path is documented.
Incidentally, I think this sort of structure leads to a lot of Python programmers' early confusion about modules, packages, and code organization. There are really two reasons:
If you think of os as a package and know that you can do import os and have access to the submodule os.path, you may be surprised later when you can't do import twisted and automatically access twisted.spread without importing it.
It is confusing that os.name is a normal thing, a string, and os.path is a module. I always structure my packages with empty __init__.py files so that at the same level I always have one type of thing: a module/package or other stuff. Several big Python projects take this approach, which tends to make more structured code.

As per PEP-20 by Tim Peters, "Explicit is better than implicit" and "Readability counts". If all you need from the os module is under os.path, import os.path would be more explicit and let others know what you really care about.
Likewise, PEP-20 also says "Simple is better than complex", so if you also need stuff that resides under the more-general os umbrella, import os would be preferred.

Definitive answer: import os and use os.path. Do not import os.path directly.
From the documentation of the module itself:
>>> import os
>>> help(os.path)
...
Instead of importing this module directly, import os and refer to
this module as os.path. The "os.path" name is an alias for this
module on Posix systems; on other systems (e.g. Mac, Windows),
os.path provides the same operations in a manner specific to that
platform, and is an alias to another module (e.g. macpath, ntpath).
...

Interestingly enough, importing os.path will import all of os. Try the following in the interactive prompt:
import os.path
dir(os)
The result will be the same as if you had just imported os. This is because importing a dotted name always imports the parent first, so import os.path imports os and binds the name os; os in turn decides which platform-specific module to load for path, based on your operating system.
With some modules, saying import foo will not expose foo.bar, so I guess it really depends on the design of the specific module.
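To see which name each statement actually binds, one can run both in empty namespaces. This is a small sketch; the helper `public_names` is made up for the illustration.

```python
# Run each import statement in its own empty namespace and see what it binds.
ns_os_path = {}
exec("import os.path", ns_os_path)

ns_os = {}
exec("import os", ns_os)


def public_names(namespace):
    # exec() adds "__builtins__" to the namespace; ignore dunder entries.
    return sorted(name for name in namespace if not name.startswith("__"))


names_from_os_path = public_names(ns_os_path)
names_from_os = public_names(ns_os)
same_module = ns_os_path["os"] is ns_os["os"]

print(names_from_os_path, names_from_os, same_module)
```

Both statements bind only the name `os`, and both refer to the same module object.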
In general, just importing the explicit modules you need should be marginally faster. On my machine:
import os.path: 7.54285810068e-06 seconds
import os: 9.21904878972e-06 seconds
These times are close enough to be fairly negligible. Your program may need to use other parts of os either now or at a later time, so it usually makes sense to sacrifice the two microseconds and use import os, which avoids having to change the import later. I usually side with just importing os as a whole, but I can see why some would prefer import os.path, both to be technically more efficient and to convey to readers of the code that it is the only part of the os module that will be used. In my mind it essentially boils down to a style question.
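A measurement like the one above could be reproduced roughly as follows. This is only a sketch using the standard timeit module; the repeat count is arbitrary and the absolute numbers will differ from machine to machine.

```python
import timeit

REPEATS = 100_000  # arbitrary; chosen only to average out noise

# After the first iteration at the latest, both modules are in sys.modules,
# so this measures the cost of the import statement itself (a cache lookup
# plus a name binding), not the cost of loading files from disk.
per_import_os = timeit.timeit("import os", number=REPEATS) / REPEATS
per_import_os_path = timeit.timeit("import os.path", number=REPEATS) / REPEATS

print(f"import os:      {per_import_os:.3e} seconds")
print(f"import os.path: {per_import_os_path:.3e} seconds")
```

Because the modules are cached, differences at this scale mostly reflect interpreter overhead and timing noise.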

Common sense works here: os is a module, and os.path is a module, too. So just import the module you want to use:
If you want to use functionalities in the os module, then import os.
If you want to use functionalities in the os.path module, then import os.path.
If you want to use functionalities in both modules, then import both modules:
import os
import os.path
For reference:
Lib/idlelib/rpc.py uses os and imports os.
Lib/idlelib/idle.py uses os.path and imports os.path.
Lib/ensurepip/__init__.py uses both and imports both.

I couldn't find any definitive reference, but I see that the example code for os.walk uses os.path but only imports os.

Related

Import a module once and use it globally in python

I have done some research and learned that Python's import statement only imports something once; when used again, it just checks whether the module was already imported. I'm working on a bigger project and noticed that the same thing is imported in multiple files, which apparently doesn't affect performance but leaves the code a bit polluted, in my opinion. My question is: is there a way to import something only once and use it everywhere in the directory without calling the import statement over and over?
Here are some of the modules that I'm importing in various files:
from PyQt5.QtWebEngineWidgets import QWebEngineView
from PyQt5.QtCore import *
Each module (.py file) that needs to have those imported names in scope, will have to have its own import statements. This is the standard convention in Python. However, it's not recommended to import * but rather to import only the names you will actually use from the package.
It is possible to put your package import statements in a __init__.py file in the directory instead of in each .py file, but then you will still need a relative import statement to import those names from your package, as described here: Importing external package once in my module without it being added to the namespace
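The advice against import * can be illustrated by comparing what each form puts into a namespace. This sketch uses os.path from the standard library as a stand-in for the PyQt5 modules in the question, so that it runs without extra packages; `public_names` is a made-up helper.

```python
star_namespace = {}
exec("from os.path import *", star_namespace)

explicit_namespace = {}
exec("from os.path import join", explicit_namespace)


def public_names(namespace):
    # exec() adds "__builtins__" to the namespace; ignore dunder entries.
    return sorted(name for name in namespace if not name.startswith("__"))


star_names = public_names(star_namespace)
explicit_names = public_names(explicit_namespace)

print(len(star_names), "names from the star import")
print(explicit_names, "from the explicit import")
```

The explicit form makes it obvious where each name comes from, which is the main argument for repeating a short import list in every file.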

Add path to sys.path vs. PEP E402

In order to import a project specific module somewhere located on your disk, one can easily append this directory to sys.path:
import sys
sys.path.append(some_module_path)
import some_module
However, the latter import now violates the PEP 8 style check E402 ("module level import not at top of file"). At least Spyder tells me so. Is Spyder too picky here?
In Spyder there is the principal idea of a "project", where I assumed the environment can be adjusted specifically for this project. However, I have no clue how to modify e.g. sys.path depending on a Spyder project.
How can I modify sys.path in a Spyder project? Or is there a general Python way of solving this issue?
You could put the sys.path extension in a separate module, e.g. _paths.py.
Contents of _paths.py:
import sys
sys.path.append(some_module_path)
sys.path.append(some_other_module_path)
# ...and so on...
And then in your main application:
import sys
import _paths
import some_module
some_module.some_func()
This solution puts your "project configuration" nicely in a single place (which makes it easy to maintain in the future), and complies with at least PEP8 (including E402) and pylint rules.
As an alternative to the answer with a separate module, I found this to be a working solution for me:
import sys
from pathlib import Path
try:
    # sys.path entries should be strings, so convert the Path explicitly
    sys.path.append(str(Path(__file__).parent.parent))
except IndexError:
    pass
If I just use sys.path.append(...), I get the warning, but wrapping it in the try/except block does not produce a warning.
I know this doesn't answer the question, but it may be helpful information.
You can import that module by directly specifying its path, without using sys.path.append
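In Python versions before 3.12 this is as simple as the following (note that the imp module has been deprecated since Python 3.4 and was removed in Python 3.12; importlib.util is its replacement):
import imp
some_module = imp.load_source('some_module', '/path/to/some_module.py')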
More information here: How to import a module given the full path?
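On current Python versions the same idea can be expressed with importlib.util. This is a minimal sketch: the helper name `load_module_from_path` is made up here, and the module file is created in a temporary directory only so that the example is self-contained.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load the file at `path` as a module called `name` (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register it so later imports find it
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /path/to/some_module.py from the answer above.
    source = Path(tmp) / "some_module.py"
    source.write_text("def some_func():\n    return 'hello from some_module'\n")

    some_module = load_module_from_path("some_module", str(source))
    result = some_module.some_func()

print(result)
```

Since no import statement follows a sys.path modification, this approach also sidesteps the E402 warning discussed in the question.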

Python Standard Library Import Relationships

I am writing an application in C# with VisualStudio and am using IronPython to write some Python scripts for my application. However, it does not have the entire standard library support by default. So to import some modules (such as os) I need to point my C# code to where the os module actually is. I also understand that it will still be limited to libraries implemented in pure python.
Ultimately I want to have something that can be installed on another machine. My current workaround is to include a copy of https://github.com/python/cpython/tree/2.7/Lib in the Debug folder where the executable is running and it seems excessive/unnecessary to have to include the entire thing. I tried just placing the files I need (for example os.py) here but obviously it imports other modules, which import other modules, etc... I would have to re-run the code to get the error for which module it couldn't find and add them in 1 by 1 and it was getting too tedious.
I was wondering if there was any sort of resource that specifies the relationships between standard library modules and could tell me exactly what files to copy. Essentially what I'm looking for is the graph of the standard library imports. So if I want to import os in these scripts I know to copy os.py, ntpath.py, ...
Thanks
You probably don't need the imports as a tree, but as a simple list, so you can just copy the needed files. You can get that from sys.modules after you import everything that your script needs; it will contain all modules needed by those that you imported, e.g.:
import sys  # built-in, adds no file to the list; needed to get sys.modules
import os
import time
# import whatever else your scripts need

# Build a list of (module name, file) tuples for modules loaded from a file
m = [(z, x.__file__) for z, x in sys.modules.items() if hasattr(x, "__file__")]
for x in m:
    print("%s %s" % (x[0], x[1]))  # works on Python 2.7 (IronPython) and 3
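If a static view is preferred over inspecting sys.modules at runtime, the standard library's modulefinder module can scan a script for the modules it would import. This is a sketch under assumptions: it is written for CPython 3, it has not been checked against IronPython, and the script name `probe_script.py` is made up.

```python
import modulefinder
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical script that imports what the embedded scripts need.
    script = Path(tmp) / "probe_script.py"
    script.write_text("import os\n")

    finder = modulefinder.ModuleFinder()
    finder.run_script(str(script))

# Keep only modules backed by a file on disk (built-ins have no __file__).
needed_files = sorted(
    (name, module.__file__)
    for name, module in finder.modules.items()
    if module.__file__
)

for name, filename in needed_files:
    print(name, filename)
```

Because the scan is static, it also lists modules that are only imported inside functions or platform-specific branches, so the result can be larger than what sys.modules reports after a real run.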

Properly importing modules in Python

How do I set up module imports so that each module can access the objects of all the others?
I have a medium size Python application with modules files in various subdirectories. I have created modules that append these subdirectories to sys.path and imports a group of modules, using import thisModule as tm. Module objects are referred to with that qualification. I then import that module into the others with from moduleImports import *. The code is sloppy right now and has several of these things, which are often duplicative.
First, the application is failing because some module references aren't assigned. This same code does run when unit tested.
Second, I'm worried that I'm causing a problem with recursive module imports. Importing moduleImports imports thisModule, which imports moduleImports . . . .
What is the right way to do this?
"I have a medium size Python application with modules files in various subdirectories."
Good. Make absolutely sure that each directory includes an __init__.py file, so that it's a package.
"I have created modules that append these subdirectories to sys.path"
Bad. Use PYTHONPATH or install the whole structure into Lib/site-packages. Don't update sys.path dynamically. It's a bad thing: hard to manage and maintain.
"imports a group of modules, using import thisModule as tm."
Doesn't make sense. Perhaps you have one import thisModule as tm for each module in your structure. This is typical, standard practice: import just the modules you need, no others.
"I then import that module into the others with from moduleImports import *"
Bad. Don't blanket import a bunch of random stuff.
Each module should have a longish list of the specific things it needs.
import this
import that
import package.module
Explicit list. No magic. No dynamic change to sys.path.
My current project has 100's of modules, a dozen or so packages. Each module imports just what it needs. No magic.
A few pointers:
You may have already split the functionality into various modules. If this is done correctly, most of the time you will not fall into circular import problems (e.g. if module a depends on b and b on a, you can make a third module c to remove the circular dependency). As a last resort, import b in a, but in b import a only at the point where a is needed, e.g. inside a function.
Once the functionality is properly in modules, group them in packages under a subdirectory and add an __init__.py file to it so that you can import the package. Keep such packages in a folder, e.g. lib, and then either add it to sys.path or set the PYTHONPATH environment variable.
from module import * may not be a good idea. Instead, import whatever is needed. It may be fully qualified; it doesn't hurt to be verbose, e.g. from packageA.moduleB import CoolClass.
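The "last resort" from the first pointer, importing inside a function, could look like the following sketch. The module names `mod_a` and `mod_b` are invented for the illustration, and the files are written to a temporary directory only to keep the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that depend on each other.
SOURCES = {
    "mod_a.py": (
        "import mod_b\n"
        "\n"
        "def name():\n"
        "    return 'a'\n"
        "\n"
        "def describe():\n"
        "    return mod_b.describe_both()\n"
    ),
    "mod_b.py": (
        "def describe_both():\n"
        "    import mod_a  # deferred: resolved at call time, not at import time\n"
        "    return mod_a.name() + '+b'\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for filename, source in SOURCES.items():
        Path(tmp, filename).write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mod_a = importlib.import_module("mod_a")
        result = mod_a.describe()
    finally:
        # Clean up so the demonstration leaves no trace in the interpreter.
        sys.path.remove(tmp)
        sys.modules.pop("mod_a", None)
        sys.modules.pop("mod_b", None)

print(result)
```

By the time `describe_both` runs, `mod_a` is fully initialized, so the deferred import simply picks it up from the module cache.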
The way to do this is to avoid magic. In other words, if your module requires something from another module, it should import it explicitly. You shouldn't rely on things being imported automatically.
As the Zen of Python (import this) has it, explicit is better than implicit.
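You won't get infinite recursion on imports, because Python caches each module in sys.modules and won't reload one it already has. A circular import can still see a partially initialized module, though, which is another reason to keep dependencies explicit.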
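The caching behaviour can be observed directly. This is a minimal sketch; the module name `cached_demo` is made up and the file lives in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The module creates a fresh object each time its code is executed.
    Path(tmp, "cached_demo.py").write_text("TOKEN = object()\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        first = importlib.import_module("cached_demo")
        second = importlib.import_module("cached_demo")
        same_module = first is second
        same_token = first.TOKEN is second.TOKEN
        in_cache = sys.modules["cached_demo"] is first
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("cached_demo", None)

print(same_module, same_token, in_cache)
```

The second import returns the cached module object, so the module body ran only once and `TOKEN` is the same object both times.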
