What does `__import__('pkg_resources').declare_namespace(__name__)` do? - python

In the __init__.py files of some packages I have seen this single line:
__import__('pkg_resources').declare_namespace(__name__)
What does it do, and why do people use it? I suppose it's related to dynamic importing and creating a namespace at runtime.

It boils down to two things:
__import__ is a built-in Python function that imports a module or package whose name is given as a string and returns the resulting module object. So foo = __import__('bar') imports a package named bar and binds the module object to the local variable foo.
From the setuptools pkg_resources documentation, declare_namespace() "Declare[s] that the dotted package name name is a "namespace package" whose contained packages and modules may be spread across multiple distributions."
So __import__('pkg_resources').declare_namespace(__name__) will import the 'pkg_resources' package into a temporary and call the declare_namespace function stored in that temporary (the __import__ function is likely used rather than the import statement so that there is no extra symbol left over named pkg_resources). If this code were in my_namespace/__init__.py, then __name__ is my_namespace and this module will be included in the my_namespace namespace package.
See the setuptools documentation for more details.
See this question for discussion on the older mechanism for achieving the same effect.
See PEP 420 for the standardized mechanism that provides similar functionality beginning with Python 3.3.
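As a small illustration of the first point, here is a sketch that uses the standard-library json module in place of pkg_resources (json is only a stand-in so the snippet runs anywhere). It shows that __import__ returns the module object, so an attribute can be called on it without binding a name for the module:

```python
# __import__ takes the module name as a string and returns the module object.
json_mod = __import__("json")
encoded = json_mod.dumps({"a": 1})

# Same shape as the one-liner in the question: call something on the freshly
# imported module without leaving a module-level name such as "json" behind.
encoded_inline = __import__("json").dumps({"a": 1})

# Caveat: for a dotted name, __import__ returns the top-level package.
top = __import__("os.path")

print(encoded, encoded_inline, top.__name__)
```

The last line is why the one-liner only works cleanly for a top-level module like pkg_resources; for dotted names importlib.import_module is usually the more convenient call.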

This is a way to declare the so called "namespace packages" in Python.
What are these and what is the problem:
Imagine you distribute a software product which has a lot of functionality, and not all people want all of it, so you split it into pieces and ship as optional plugins.
You want people to be able to do
import your_project.plugins.plugin1
import your_project.plugins.plugin2
...
Which is fine if your directory structure is exactly as above, namely
your_project/
    __init__.py
    plugins/
        __init__.py
        plugin1.py
        plugin2.py
But what if you ship those two plugins as separate python packages so they are located in two different directories? Then you might want to put __import__('pkg_resources').declare_namespace(__name__) in each package's __init__.py so that Python knows those packages are part of a bigger "namespace package", in our case it's your_project.plugins.
Please refer to the documentation for more info.
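To see the effect without installing setuptools, here is a rough sketch that uses the standard library's pkgutil.extend_path, which is the pkgutil-style counterpart of the pkg_resources line rather than pkg_resources itself. The two temporary directories stand in for two separately installed distributions, and the helper make_distribution is made up for this example:

```python
import importlib
import os
import sys
import tempfile

# pkgutil-style namespace declaration (stdlib stand-in for the pkg_resources line).
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, plugin_name):
    """Create root/your_project/plugins/<plugin_name>.py plus namespace __init__.py files."""
    project_dir = os.path.join(root, "your_project")
    plugins_dir = os.path.join(project_dir, "plugins")
    os.makedirs(plugins_dir)
    for pkg_dir in (project_dir, plugins_dir):
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(NAMESPACE_INIT)
    with open(os.path.join(plugins_dir, plugin_name + ".py"), "w") as f:
        f.write("NAME = %r\n" % plugin_name)


# Two separate directories, each shipping one plugin.
dist1 = tempfile.mkdtemp()
dist2 = tempfile.mkdtemp()
make_distribution(dist1, "plugin1")
make_distribution(dist2, "plugin2")

sys.path[:0] = [dist1, dist2]
importlib.invalidate_caches()

# Both imports succeed although the plugins live in different directories.
import your_project.plugins.plugin1
import your_project.plugins.plugin2

print(your_project.plugins.plugin1.NAME, your_project.plugins.plugin2.NAME)
```

Without the namespace line in each __init__.py, the first your_project directory found on sys.path would win and plugin2 would not be importable.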

Related

Python modules confusion

I got a bit confused about how Python modules work when I started looking at the PyMySQL repository, see here: https://github.com/PyMySQL/PyMySQL?files=1
1) Why is there no pymysql.py file, given that it is imported like: import pymysql? Isn't it required to have such a file?
2) I cannot find the method connect, used like: pymysql.connect(...), anywhere. Is it possible to rename exported methods somehow?
There's a directory pymysql there. A directory can also be imported as a module*, with the advantage that it can contain submodules. Classically, there's a __init__.py file in the directory that controls what's in the top-level pymysql.* namespace.
So, the connect method you're missing will either be defined directly in pymysql/__init__.py, or defined in one of its siblings in that directory, and then imported from there by pymysql/__init__.py.
*Strictly speaking, a directory that you import like a module is really called a "package". I like to avoid that term—it's potentially confusing because the term is overloaded: what you install with pip is also called a "package" in sense 2, and that might actually contain multiple "packages" in sense 1.
See What is __init__.py for? and the official docs
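The pattern described above can be sketched as follows. The package name mydb_demo, the submodule connections.py and the body of connect are all invented for the illustration; this is not PyMySQL's actual code:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mydb_demo")  # hypothetical package directory
os.makedirs(pkg_dir)

# The function is defined in a submodule of the package...
with open(os.path.join(pkg_dir, "connections.py"), "w") as f:
    f.write("def connect(host):\n    return 'connected to ' + host\n")

# ...and __init__.py re-exports it into the package's top-level namespace.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .connections import connect\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import mydb_demo  # imports the directory; there is no mydb_demo.py file

print(mydb_demo.connect("localhost"))
print(mydb_demo.connect.__module__)  # shows where the function really lives
```

Checking the __module__ attribute of a function is a quick way to find the file that actually defines it.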

Python package cannot be imported

I have a Python module called util. I would like to import a script in this package _util.py from another script in scripts folder.
Even if the util package has an empty __init__.py file it does not appear as a Python package but a normal directory, without the small dot on folder image.
How can I import this module?
A preliminary answer to your question is that modules or methods beginning with underscores are meant to be used internally.
_my_method() should only be referenced from within the module holding it
_my_module should only be referenced from within the package holding it
That being said, this convention is meant to be a hint to other developers, not a strict prohibition. Perhaps the first step you can take to solve the import issue is to rename _util.py to util.py and proceed from there.

Use cases for __init__.py in python 3.3+

Now that __init__.py is no longer required to make a directory recognized as a package, is it best practice to avoid them entirely if possible? Or are there still well-accepted use cases for __init__.py in python 3.3+?
From what I understand, __init__.py were very commonly used to run code at module import time (for example to encapsulate internal file structure of the package or to perform some initialization steps). Are these use cases still relevant with python 3.3+?
There's a very good discussion of this in this answer, and you should probably be familiar with PEP 420 to clarify the difference between regular packages (which use __init__.py) and namespace packages (which don't).
What I offer by way of answer is a combination of reading, references, and opinion. No claims to being "canonical" or "pythonic" here.
Are [initialization] use cases still relevant with python 3.3+?
Yes. Take your example as a use case, where the package author wants to bring several things into the root package namespace so the user doesn't have to concern themselves with its internal structure.
Another case is creating a hierarchy of modules. That reference (O'Reilly) actually says:
The purpose of the __init__.py files is to include optional initialization code that runs as different levels of a package are encountered.
They do consider namespace packages in that discussion, but continue:
All things being equal, include the __init__.py files if you’re just starting out with the creation of a new package.
So, for your second question,
is it best practice to avoid __init__.py entirely if possible?
No, unless your intent is to create a namespace package rather than a regular package, in which case you must not use __init__.py.
Why might you want that? The O'Reilly reference has the clearest example I've seen about why namespace packages are cool, which is being able to collapse namespaces from separate, independently-maintained packages:
foo-package/
    spam/
        blah.py
bar-package/
    spam/
        grok.py
Which allows
>>> import sys
>>> sys.path.extend(['foo-package', 'bar-package'])
>>> import spam.blah
>>> import spam.grok
>>>
So anyone can extend the namespace with their own code. Cool.
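For anyone who wants to try this, here is a self-contained sketch that builds the same layout in a temporary directory and then does the imports. The WHO constant written into each module is made up for the demo:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# Build foo-package/spam/blah.py and bar-package/spam/grok.py.
# Note: no __init__.py anywhere, so "spam" is a PEP 420 namespace package.
for dist, module in (("foo-package", "blah"), ("bar-package", "grok")):
    spam_dir = os.path.join(root, dist, "spam")
    os.makedirs(spam_dir)
    with open(os.path.join(spam_dir, module + ".py"), "w") as f:
        f.write("WHO = %r\n" % module)

sys.path.extend([os.path.join(root, "foo-package"),
                 os.path.join(root, "bar-package")])
importlib.invalidate_caches()

import spam.blah
import spam.grok

print(spam.blah.WHO, spam.grok.WHO)
print(list(spam.__path__))  # one entry per contributing directory
```

If either spam directory contained an __init__.py, it would become a regular package and hide the other portion.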

Python File Structure on GitHub

I've been looking around at some open source projects on Python, and I'm seeing a lot of files and patterns that I'm not familiar with.
First of all, a lot of projects just have a file called setup.py, which usually contains one function:
setup(blah, blah, blah)
Second, a lot contain a file that is simply called __init__.py and contains next to no information.
Third, some .py files contain a statement similar to this:
if __name__ == "__main__":
Finally, I'm wondering if there are any "best practices" for dividing Python files up in a git repository. With Java, the idea of file division comes pretty naturally because of the class structure. With Python, many scripts have no classes at all, and sometimes a program will have OOP aspects, but a class by class division does not make that much sense. Is it just "whatever makes the code the most readable," or are there some guidelines somewhere about this?
The setup.py is part of Python’s module distribution using the distribution utilities. It allows for easy installation of the Python module and is useful when, well, you want to distribute your project as a whole Python module.
The __init__.py is used for Python’s package system. An empty file is usually enough to make Python recognize the directory it is in as a package, but you can also define different things in it.
Finally, the __name__ == '__main__' check is to ensure that the current script is run directly (e.g. from the command line) and it is not just imported into some other script. During a Python script execution only a single module’s __name__ property will be equal to __main__. See also my answer here or the more general question on that topic.
The setup.py is part of the distutils setup process. You'll want to have one of those if you're distributing a module instead of just a basic script (and even for a script it's a good idea to have one, so you can easily expand into a module later).
The __init__.py is part of the Python module import process:
Files named __init__.py are used to mark directories on disk as Python package directories. If you have the files
mydir/spam/__init__.py
mydir/spam/module.py
and mydir is on your path, you can import the code in module.py as:
import spam.module
or
from spam import module
If you remove the __init__.py file, Python will no longer look for submodules inside that directory, so attempts to import the module will fail.
if __name__ == "__main__" is a way to indicate code that would be executed if the file was run directly instead of imported.
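A minimal sketch of that guard, with the function names greet and main chosen just for the example:

```python
def greet(name):
    """Reusable function: available to importers and to the script itself."""
    return "Hello, " + name


def main():
    print(greet("world"))


# True only when this file is run directly (e.g. "python thisfile.py"),
# not when another module imports it.
if __name__ == "__main__":
    main()
```

Importing this file from elsewhere makes greet available without printing anything, while running it directly prints the greeting.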
To answer how to lay out your code, the distutils documentation has a good guide on this.
In addition to @poke's answer, see this related question on what the directory structure of a python project should be. Here is another useful tutorial on how to make your project easily runnable.

python module layout

I'm just starting to get to the point in my python projects where I need to start using multiple packages, and I'm a little confused about exactly how everything is supposed to work together. What exactly should go into the __init__.py of the package? Some projects I see just have blank inits and all of their code is in modules in that package. Other projects implement what seems to be the majority of the package's classes and functions inside the init.
Is there a document or style guide or something that describes what the python authors had in mind for the use of packages and the __init__ file and such?
Edit:
I know the point of having the __init__.py file in the simplest sense that it makes a folder a package. But why would I put a function there instead of a module in that same folder(package)?
__init__.py can be empty, but what it really does is make sure Python treats your directories correctly, provide any initialization you might need when your package is imported (configuring the environment or something along those lines), and optionally define __all__ so that Python knows what to do when someone uses from package import *.
Most everything you need to know is described in the docs on Packages. Dive Into Python also has a piece on packaging.
You already know, I guess, that __init__.py files are required to make Python treat the directories as containing packages.
In the above model __init__.py can remain empty.
You can also execute initialization code for the package.
You can also set the __all__ variable.
[Edit: learnings]
When you do "from package import *", the variable __all__ determines which names are imported. An explicit "from package import item" is not restricted by __all__.
See : http://docs.python.org/tutorial/modules.html
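Here is a small sketch of how __all__ behaves. The package shapes_demo and its two functions are invented for the example:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shapes_demo")  # hypothetical package
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "__all__ = ['area']\n"
        "def area(w, h):\n"
        "    return w * h\n"
        "def helper():\n"
        "    return 'not exported by *'\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star-import in a scratch namespace so we can inspect what it brings in.
namespace = {}
exec("from shapes_demo import *", namespace)
print("area" in namespace, "helper" in namespace)

# __all__ only limits "import *"; an explicit import of another name still works.
from shapes_demo import helper
print(helper())
```

So __all__ is a declaration of the public names for star-imports, not an access restriction.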
