Hi i am new to python and i want to understand the following in detail:
I've written a script, say 'foo.py', which uses the python html parser, i.e.
#!/usr/lib/python
from html.parser import HTMLParser # <-- executes ./tokenize.py ?!
...
Accidentally, in the current directory lies an other python script called 'tokenize.py'. By executing foo.py, the import line triggers tokenize.py to be executed as well. I guess the local directory has priority and the html.parser module has a tokenize.py as well...
But what exactly happens?
And what is a proper way to avoid such conflicts in the future?
thx
EDIT: I run python 3.3.2
You are right, modules in the current directory are loaded first.
The proper way to avoid this is always importing your modules by full name. In your case, in html.parser you should import tokenize like this:
from html.parser import tokenize
Instead of:
import tokenize
If html.parser is an external module which you don't control, just rename your tokenize.py to something else, e.g.:
from html.parser import tokenize as ext_tokenize
Related
I created two new files, random.py and main.py, in the directory. The code is as follows:
# random.py
if __name__ == "__main__":
print("random")
# main.py
import random
if __name__ == "__main__":
print(random.choice([1, 2, 3]))
When I run the main.py file, the program reports an error.
Traceback (most recent call last):
File "main.py", line 8, in <module>
print(random.choice([1, 2, 3]))
AttributeError: module 'random' has no attribute 'choice'
Main.py imports my own defined random module.
However, if I create a new sys.py file and a main.py file in the same directory, the code is as follows:
# sys.py
if __name__ == "__main__":
print("sys")
# main.py
import sys
if __name__ == "__main__":
print(sys.path)
When I run the main.py file, successfully.
main.py imports the built-in modules sys.
Why is there such a clear difference?
The directory relationship of the script file is as follows:
C:.
main.py
random.py
sys.py
Thank you very much for your answer.
Forgive my poor english.
sys is a built-in module, meaning it's compiled directly into the Python executable itself. Built-in modules outprioritize external files when Python is looking for modules. The standard random module isn't built-in, so it doesn't get that treatment.
Quoting the docs:
When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module...
Python’s default sys.meta_path has three meta path finders, one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an import path (i.e. the path based finder).
Since the finder for built-in modules comes before the finder that searches the import path, built-in modules will be found before anything on the import path.
You can see a tuple of the names of all modules your Python has built-in in sys.builtin_module_names.
That said, while any built-in module would outprioritize a module loaded from a file, sys has its own special handling. sys is one of the foundational building blocks of Python, and much of the sys module's setup needs to happen before the import system is functional enough for the normal import process to work. sys gets explicitly created during interpreter setup in a way that bypasses the normal import system, and then future imports for sys find it in sys.modules without hitting any meta path finders.
How and where sys is created is an implementation detail that varies from Python version to Python version (and is wildly different in different Python implementations), but in the CPython 3.7.4 code, you can see it beginning on line 755 in Python/pylifecycle.c.
tl;dr Caching
sys is somewhat of a special case among other python modules because it gets loaded at program start, unconditionally (presumably because a lot of the constants, functions, and data within - such as the streams stdout and stderr - are used by the python interpreter). As #user2357112 noted in the other answer, this is partly because it's built-in to the python executable, but also because it's necessary for running a substantial amount of python's core functionality (see below how it needs to be loaded for imports to work). random is part of the standard library, but it doesn't get loaded automatically when you execute, which is the primary relevant difference between it and sys, for our purposes
Looking at python's documentation on the subject clarifies how python resolves imports:
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths.
...
During import, the module name is looked up in sys.modules and if present, the associated value is the module satisfying the import, and the process completes. However, if the value is None, then a ModuleNotFoundError is raised. If the module name is missing, Python will continue searching for the module.
As for where it looks for the module, you can see in your observed behavior that it looks in the local directory first. That is, it searches the local directory first and then the "usual places" afterwards.
The reason for the discrepancy between how sys is handled and how random is handled is caching - sys is cached (so python doesn't even check the path to import), whereas random is not cached (so python does check the path to import it, and imports locally).
There are a few ways you can change this behavior.
First, if you must have a local module called sys, you can use importlib to import it in relative or absolute terms, without running into the ambiguity with the sys that's already cached. I have no idea how this would affect other modules that independently try to import sys, and you really shouldn't be naming your files the same as standard library modules anyway.
Alternatively, if you want the code to check python's built-in modules before checking the local directory, then you should be able to do that by modifying sys.path, which shows the order in which paths are searched for input (the same as the $PATH environment variable, or any other similar language-specific one). The first element of sys.path is usually going to be an empty string '', that would result in searching the current working directory. So you can simply move that to the back of sys.path, to have it searched last instead of first:
sys.path.append(sys.path.pop(0))
I am using the following command to run tests:
nosetests --with-coverage --cover-html --cover-package mypackage
I would like the coverage report to be updated, even if a developer adds new, untested, code to the package.
For example, imagine a developer adds a new module to the package but forgets to write tests for it. Since the tests may not import the new module, the code coverage may not reflect the uncovered code. Obviously this is something which could be prevented at the code review stage but it would be great to catch it even earlier.
My solution was to write a simple test which dynamically imports all modules under the top-level package. I used the following code snippet to do this:
import os
import pkgutil
for loader, name, is_pkg in pkgutil.walk_packages([pkg_dirname]):
mod = loader.find_module(name).load_module(name)
Dynamically importing sub-packages and sub-modules like this does not get picked up by the code coverage plugin in nose.
Can anyone suggest a better way to achieve this type of thing?
The problem seems to be the method for dynamically importing all packages/modules under the top-level package.
Using the method defined here seems to work. The key difference being the use of importlib instead of pkgutil. However, importlib was introduced in python 2.7 and 3.1 so this solution is not appropriate for older versions of python.
I have updated the original code snippet to use __import__ instead of the ImpLoader.load_module method. This also seems to do the trick.
import os
import pkgutil
for loader, name, is_pkg in pkgutil.walk_packages([pkg_dirname]):
mod = loader.find_module(name)
__import__(mod.fullname)
I am writing a series of python scripts to extend the QGIS software suite. As such I am trying to follow their naming conventions. In this instance the file I am trying to import is called "r.sun.distributed". Ive tried using import r.sun.distributed and import("r.sun.distributed") However, python says it can't find a module by that name in both cases.
Is there a work around, or do I need to rename my scripts so they fit Python's conventions?
You can use the imp module to import modules without .py extension:
import imp
imp.load_source(path)
you have to convert it to .pyc to call it.
import py_compile
py_compile.compile('filename.py')
First of all let me tell you that I'm a new user and I'm just starting to learn Python in College so my apologies if this question is answered in other topic, but I searched and I can't seem to find it.
I received a file work.pyc from my teacher and he says I have to import it in my Wing IDE using the command from work import *, the question is I don't know where to put the file to import it.
It just says ImportError: No module named work.
Thank you
There are several options for this.
The most straightforward is to just place it in the same folder as the py file that wants to import it.
You may also want to have a look at this
if you're using the python interpreter (the one that lets you directly input python code into it and executes) you'll have to do this:
sys.path.append('newpath')
from work import *
where newpath is the path on your filesystem containing your work.pyc file
If you're working on a script called main.py in the folder project, one option is to place it at project/work.pyc
This will make the module importable because it's in the same working directory as your code.
The way Python resolves import statements works like this (simplified):
The Python interpreter you're using (/usr/bin/python2.6 for example, there can be several on your system) has a list of search paths where it looks for importable code. This list is in sys.path and you can look at it by firing up your interpreter and printing it out like this:
>>> import sys
>>> from pprint import pprint
>>> pprint(sys.path)
sys.path usually contains the path to modules from the standard library, additional installed packages (usually in site-packages) and possibly other 3rd party modules.
When you do something like import foo, Python will first look if there is a module called foo.py in the directory your script lives. If not, it will search sys.path and try to import it from there.
As I said, this explanation is a bit simplified. The details are explained in the section about the module search path.
Note 1:
The *.pyc you got handed is compiled Python bytecode. That means it's contents are binary, it contains instructions to be executed by a Python virtual machine as opposed to source code in *.py that you will normally deal with.
Note 2:
The advice your teacher gave you to do from work import * is rather bad advice. It might be ok to do this for testing purposes in the interactive interpreter, but your should never do that in actual code. Instead you should do something like from work import chop, hack
Main reasons:
Namespace pollution. You're likely to import things you don't need but still pollute your global namespace.
Readability. If you ever read someone elses code and wonder where foo came from, just scroll up and look at the imports, and you'll see exactly where it's being imported from. If that person used import *, you can't do that.
I have a module that conflicts with a built-in module. For example, a myapp.email module defined in myapp/email.py.
I can reference myapp.email anywhere in my code without issue. However, I need to reference the built-in email module from my email module.
# myapp/email.py
from email import message_from_string
It only finds itself, and therefore raises an ImportError, since myapp.email doesn't have a message_from_string method. import email causes the same issue when I try email.message_from_string.
Is there any native support to do this in Python, or am I stuck with renaming my "email" module to something more specific?
You will want to read about Absolute and Relative Imports which addresses this very problem. Use:
from __future__ import absolute_import
Using that, any unadorned package name will always refer to the top level package. You will then need to use relative imports (from .email import ...) to access your own package.
NOTE: The above from ... line needs to be put into any 2.x Python .py files above the import ... lines you're using. In Python 3.x this is the default behavior and so is no longer needed.