I'm studying Python as a beginner, so please don't blast me.
I've just learned that a module in Python is a collection of classes and functions, while a package is just a way to organize modules into directories and subdirectories. So a package in Python should not contain any classes or functions, and NumPy should be called a "module". Am I correct?
The thing is, the official NumPy documentation says:
NumPy is the fundamental package for scientific computing with Python
NumPy is a package. A package is represented by the file __init__.py:
>>> import numpy as np
>>> np
<module 'numpy' from '.../site-packages/numpy/__init__.py'>
When you look into this file, which is pretty long, you will find a lot of imports:
from . import core
from .core import *
from . import compat
from . import lib
from .lib import *
Therefore, all the names imported directly into __init__.py are available at the package level:
>>> np.array
<function numpy.core.multiarray.array>
But as you can see, the function array is actually located deep down in the package directory hierarchy. Since always typing numpy.core.multiarray.array is pretty tiresome, importing this function into __init__.py makes a lot of sense, because now you can type np.array instead.
These terms are often used quite loosely, but in theory, yes - a module is a collection of classes and functions, while a package is a collection of (one or more) modules. However, there are very few cases where a package contains only modules and no supporting code of its own, since almost any package wants to provide things like __version__ or __all__, or, if it has subpackages, helper functions related to imports.
So numpy is definitely a package, since it includes several sub-packages (doc, random, fft etc). Of course, it is also a module since it has 'top-level' classes and functions (e.g. numpy.array).
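A quick way to see both sides of this at the interpreter (the exact path in the output will vary with your installation):

>>> import numpy
>>> type(numpy)          # an imported package is exposed as a module object
<class 'module'>
>>> numpy.__path__       # only packages have a __path__ attribute
['.../site-packages/numpy']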
As the other answers make clear, numpy is technically a package (a directory of importable things), but in this case I think the sentence you cite is using the term in its other sense: A package is a thing you can install. PyPI is the Python Package Index, and pip stands for Pip Installs Packages. Both PyPI and pip can deal with things that are single-file modules. Package in this case is the general term for anything installable into your Python environment.
Related
I wrote a custom python package for Ansible to handle business logic for some servers I manage. I have multiple files and they reference each other by re-importing the package.
So my package named <MyCustomPackage> has functions <Function1>, <Function2>, <Function3>, etc., all in their own files... Some of these functions reference functions in the same package, so to do that the file has:
import MyCustomPackage
at the top. I did it this way instead of a relative import because I'm also unit testing these, and mocking would not work with relative paths because of an __init__.py file in the test directory that was needed for test discovery. The only way I could mock was by importing the package itself. Seemed simple enough.
The problem is with Ansible. These packages are in module_utils. I import them with:
from ansible.module_utils.MyCustomPackage import MyCustomPackage
but when I use the commands I get module not found errors - and traced it back to the import MyCustomPackage statement in the package itself.
So - how should I be structuring my package? Should I try again with relative file imports, or have the package modify the path so it's found with the friendly name?
Any tips would be helpful! Or if someone has a module they've written with Python modules in module_utils and unit tests that they'd be willing to share, that'd be great also!
Many people have problems with relative imports and imports in general in Python because they are ambiguous and surprisingly depend on your current working directory (and other things).
Thus I've created an experimental, new import library: ultraimport
It gives you more control over your imports and lets you do file system based, relative imports.
Given that you have a file function1.py, to import a function from function2.py, you would then write:
import ultraimport
Function2 = ultraimport('__dir__/function2.py', 'Function2')
This will always work, no matter how you run your code. It also does not force you to a specific package structure. You can just have any files you like.
Under normal circumstances, external Python modules such as scipy and numpy are compiled into shared objects when installed (the parts written in C). When Python runs import scipy, it dynamically loads these shared objects.
Now I am working on a platform which does not support any dynamic loading function. As a result, I have to link those modules statically with python.
My current approach is to compile all source code of scipy/numpy with python, and call the module initialization function when python initializes.
Py_InitializeEx() {
    ...
    // init scipy modules statically
    // below are the scipy module init functions
    init_comb();
    init_cython_special();
    ...
}
However, this brings another problem. I found that many module initialization functions, especially those auto-generated by Cython, contain code that imports their parent packages. For example, initializing cython_special triggers import scipy, but at the point where it is called, the scipy initialization is not yet complete.
My question is: is there an easy way to link these modules statically? What are your suggestions for solving this problem?
Thanks.
PyImport_AppendInittab - this tells Python, ahead of time, about a module initialization function associated with a specific name. You'd identify all the compiled modules you need, link them statically, and then, before calling Py_Initialize, add them to the inittab.
Nothing happens until the module is imported at runtime, at which point the correct initialization function is run.
If I understood you right, what you could do is add the path of the directory where the modules are located:
import sys
sys.path.insert(0,'/path/to/modules')
from module1 import *
from module2 import *
etc.
I have created a Python package which builds on the structure indicated in Kenneth Reitz' "Repository Structure and Python" (1). The main package path is:
/projects-folder (not site-packages)
/package
/package
__init__.py
Datasets.py
Draw.py
Gmaps.py
ShapeSVG.py
project.py
__init__.py
setup.py
With the current structure, I must use the following module import syntax:
import package.package.Datasets
I would prefer to type the following:
import package.Datasets
I am capable of typing the same word twice, of course, but it feels wrong in a deeper sense, i.e., I am structuring my package incorrectly or misunderstanding how Python interprets that structure.
The outer __init__.py is required for Python to detect this package at all, per the docs (2). But that sets up /package/ as the top level of the package and /package/package/ as a sub-package, forcing me into the unwieldy import syntax above.
To avoid this, it seems that my options are to:
Create a package in which the outer folder contains the top level of package modules.
Add the inner folder to my PYTHONPATH environment variable.
Yet both of these seem like suboptimal workarounds for something that shouldn't be an issue in the first place. What should I do?
You've misunderstood. You have two nested folders both named package for some reason, but the source you cite never said to do that. The outer folder, the one with setup.py, is not supposed to be a package at all.
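In other words, the layout the guide intends looks roughly like this (note there is no __init__.py next to setup.py):

/projects-folder (not site-packages)
    /package            <- just a project folder, not a package
        setup.py
        /package        <- the actual importable package
            __init__.py
            Datasets.py
            Draw.py
            Gmaps.py
            ShapeSVG.py
            project.py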
It sounds like you're running Python in projects-folder and trying to import your package from there. That's not what you should be doing. You have several options to get your package into the import system. (I'll refer to the folder with setup.py in it as setupfolder, to distinguish it from the inner folder):
Build your package with setup.py, for example python setup.py bdist_wheel --universal, and install the built package with pip.
Skip the build step and just run pip install path/to/setupfolder. Building the package produces an installable artifact, which is useful if you want to distribute your package, but maybe you don't need that.
"Install" the package's source tree in development mode with pip install -e path/to/setupfolder, so the Python import system will locate the package's source tree when performing imports. This is handy because you don't have to rebuild and reinstall if you edit the source repository, although you'll still want to restart any running Python processes that are using the package.
Run Python from directly inside the setupfolder.
Any of these options will cause your package to be importable directly as package instead of package.package, as it should be.
While I do not entirely agree with your package structure, you can make use of __all__ and possibly the one legitimate use for star imports I've seen so far. __init__.py can serve more purposes than just identifying your folder as a package or sub-package.
Using a Star Import
In package/package/__init__.py, add a variable __all__ that declares all the public elements you want to export:
__all__ = ['Datasets', 'Draw', 'Gmaps', 'ShapeSVG', 'project']
In package/__init__.py do from package.package import *. Now all the attributes that were available as package.package.x will also be available as package.x.
If you want to directly copy package.package.__all__ to package.__all__ (which is optional, but will allow you to do from package import * properly), you can do something like
from package.package import *
from package.package import __all__ as _all
__all__ = _all
del _all
Not Using a Star Import
You can accomplish the same thing without using package.package.__all__ at all. Just add __all__ directly to package/__init__.py and use from package.package import x-style imports:
from package.package import (
Datasets, Draw, Gmaps, ShapeSVG, project
)
# As before, package.__all__ is optional
__all__ = ['Datasets', 'Gmaps', 'ShapeSVG', 'project']
I would still recommend having a package.package.__all__ variable, but it is optional for this particular purpose.
Pros and Cons
Both approaches are pretty legitimate and I have seen both used in major projects. The first approach reduces redundancy. You only define the public exports in one place: package.package.__all__. The star imports and package.__all__ reference that definition directly, leading to one place that you really have to maintain. On the other hand, there are times when you want to separate the "full" package.package.x API from what you expose via package.x to the casual user. In that case, go with the second option. The only downside here is that you have to be more careful to keep package.__all__ and the corresponding imports synchronized properly.
Note
A number of projects I've seen (numpy especially comes to mind) export the attributes of the individual modules to the top level using this technique. For example, if you had a function package.package.Datasets.get_data, it would be listed in package.package.Datasets.__all__, which would be imported into package.package.__init__, appended to package.package.__all__, and then referenced by the top-level package and package.__all__.
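A minimal sketch of that chain, using a hypothetical get_data function (the names are illustrative, not numpy's actual layout):

# package/package/Datasets.py
__all__ = ['get_data']

def get_data():
    return []

# package/package/__init__.py
from package.package.Datasets import *
from package.package.Datasets import __all__ as _datasets_all
__all__ = list(_datasets_all)

# package/__init__.py
from package.package import *
from package.package import __all__ as _all
__all__ = _all
del _all

After this, package.get_data and package.package.Datasets.get_data refer to the same function.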
Maybe it's not possible (I'm more used to Ruby, where this sort of thing is fine). I'm writing a library that provides additional functionality to docker-py, which provides the docker package, so you just import docker and then you get access to docker.Client etc.
Because it seemed a logical naming scheme, I wanted users to pull in my project with import docker.mymodule, so I've created a directory called docker with an __init__.py, and mymodule.py inside it.
When I try to access docker.Client, Python can't see it, as if my docker package has hidden it:
import docker
import docker.mymodule
docker.Client() # AttributeError: 'module' object has no attribute 'Client'
Is this possible, or do all top-level package names have to differ between source trees?
This would only be possible if docker was set up as a namespace package (which it isn't).
See zope.schema, zope.interface, etc. for an example of a namespace package (zope is the namespace package here). Because zope is declared as a namespace package in setup.py, it means that zope doesn't refer to a particular module or directory on the file system, but is a namespace shared by several packages. This also means that the result of import zope is pretty much undefined - it will simply import the top-level module of the first zope.* package found in the import path.
Therefore, when dealing with namespace packages, you need to explicitly import a specific one, with import zope.schema or from zope import schema.
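For illustration, the old setuptools-style declaration looks roughly like this, assuming a hypothetical docker namespace (docker-py does not actually do this; on Python 3.3+ you can instead use implicit namespace packages by simply omitting __init__.py):

# docker/__init__.py, identical in every distribution that shares the namespace
__import__('pkg_resources').declare_namespace(__name__)

# setup.py of each such distribution
from setuptools import setup

setup(
    name='docker-mymodule',                  # hypothetical distribution name
    version='0.1',
    packages=['docker', 'docker.mymodule'],
    namespace_packages=['docker'],
)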
Unfortunately, namespace packages aren't that well documented. As noted by @Bakuriu in the comments, these resources contain some helpful information:
Stackoverflow: How do I create a namespace package in Python?
Built-in support for namespace packages in Python 3.3
Namespace packages in the setuptools documentation
Post about namespace packages at sourceweaver.com
How do I set up module imports so that each module can access the objects of all the others?
I have a medium-sized Python application with module files in various subdirectories. I have created modules that append these subdirectories to sys.path and import a group of modules, using import thisModule as tm. Module objects are referred to with that qualification. I then import that module into the others with from moduleImports import *. The code is sloppy right now and has several of these things, which are often duplicative.
First, the application is failing because some module references aren't assigned. This same code does run when unit tested.
Second, I'm worried that I'm causing a problem with recursive module imports. Importing moduleImports imports thisModule, which imports moduleImports . . . .
What is the right way to do this?
"I have a medium size Python application with modules files in various subdirectories."
Good. Make absolutely sure that each directory includes an __init__.py file, so that it's a package.
"I have created modules that append these subdirectories to sys.path"
Bad. Use PYTHONPATH or install the whole structure in Lib/site-packages. Don't update sys.path dynamically. It's a bad thing: hard to manage and maintain.
"imports a group of modules, using import thisModule as tm."
Doesn't make sense. Perhaps you have one import thisModule as tm for each module in your structure. This is typical, standard practice: import just the modules you need, no others.
"I then import that module into the others with from moduleImports import *"
Bad. Don't blanket import a bunch of random stuff.
Each module should have a longish list of the specific things it needs.
import this
import that
import package.module
Explicit list. No magic. No dynamic change to sys.path.
My current project has 100's of modules, a dozen or so packages. Each module imports just what it needs. No magic.
A few pointers:

You may have already split functionality into various modules. If done correctly, most of the time you will not run into circular import problems (e.g. if module a depends on b and b on a, you can make a third module c to remove that circular dependency). As a last resort, in a import b, but in b import a only at the point where a is needed, e.g. inside a function (see the sketch after these pointers).

Once functionality is properly split into modules, group them in packages under a subdirectory and add an __init__.py file to it so that you can import the package. Keep such packages in a folder, e.g. lib, and then either add it to sys.path or set the PYTHONPATH environment variable.

from module import * may not be a good idea. Instead, import whatever is needed. It may be fully qualified; it doesn't hurt to be verbose, e.g. from packageA.moduleB import CoolClass.
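A minimal sketch of that last-resort pattern, using two hypothetical modules a and b:

# a.py
import b

def use_b():
    return b.helper()

# b.py
def helper():
    return 42

def needs_a():
    import a   # deferred import: runs only when needed, after a has finished loading
    return a.use_b()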
The way to do this is to avoid magic. In other words, if your module requires something from another module, it should import it explicitly. You shouldn't rely on things being imported automatically.
As the Zen of Python (import this) has it, explicit is better than implicit.
You won't get recursion on imports because Python caches each module and won't reload one it already has.
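A quick way to see that caching at work, using the standard json module:

>>> import sys
>>> import json
>>> 'json' in sys.modules   # the module object is cached here after the first import
True
>>> import json             # a second import statement just reuses the cached object
>>> sys.modules['json'] is json
True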