I have a package mypack with modules mod_a and mod_b in it. I intend the package itself and mod_a to be imported freely:
import mypack
import mypack.mod_a
However, I'd like to keep mod_b for the exclusive use of mypack. That's because it exists merely to organize the latter's internal code.
My first question is, is it an accepted practice in Python programming to have 'private' modules like this?
If yes, my second question is, what is the best way to convey this intention to the client? Do I prefix the name with an underscore (i.e. _mod_b)? Or would it be a good idea to declare a sub-package private and place all such modules there?
I prefix private modules with an underscore to communicate the intent to the user. In your case, this would be mypack._mod_b
This is in the same spirit as (but not completely analogous to) the PEP 8 recommendation to name a C-extension module with a leading underscore when it is wrapped by a Python module, e.g. _socket and socket.
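A minimal sketch of that layout, assuming a throwaway copy of the package written to a temporary directory. The function names greet, helper and run are hypothetical, chosen only for illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway "mypack" package to a temporary directory.
# greet(), helper() and run() are hypothetical names used only for this demo.
root = Path(tempfile.mkdtemp())
pkg = root / "mypack"
pkg.mkdir()
(pkg / "__init__.py").write_text(
    "from . import _mod_b  # internal module, underscore-prefixed\n"
    "\n"
    "def run():\n"
    "    return _mod_b.helper()\n"
)
(pkg / "mod_a.py").write_text("def greet():\n    return 'hello from mod_a'\n")
(pkg / "_mod_b.py").write_text("def helper():\n    return 'internal helper'\n")

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mypack
import mypack.mod_a

print(mypack.mod_a.greet())  # public module, meant to be imported freely
print(mypack.run())          # public function backed by the internal module
```

Nothing stops a client from writing import mypack._mod_b; the underscore only signals that doing so is unsupported.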
The solution I've settled on is to create a sub-package 'private' and place all the modules I wish to hide in there. This way they stay stowed away, leaving mypack's module list cleaner and easier to parse.
To me, this doesn't look unpythonic either.
While there is no explicit private keyword, there is a convention that private functions start with a single underscore. A double leading underscore triggers name mangling, but only for names used inside a class body: it makes a class attribute harder to reach from outside the class and has no such effect on module-level functions. See the following from PEP 8:
- _single_leading_underscore: weak "internal use" indicator. E.g. "from M
import *" does not import objects whose name starts with an underscore.
- single_trailing_underscore_: used by convention to avoid conflicts with
Python keyword, e.g.
Tkinter.Toplevel(master, class_='ClassName')
- __double_leading_underscore: when naming a class attribute, invokes name
mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
- __double_leading_and_trailing_underscore__: "magic" objects or
attributes that live in user-controlled namespaces. E.g. __init__,
__import__ or __file__. Never invent such names; only use them
as documented.
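To illustrate the difference between the single and double leading underscore described in the list above, here is a small sketch. FooBar and __boo come from the PEP 8 text; _hint and __module_level are hypothetical names added for the demonstration:

```python
class FooBar:
    def __init__(self):
        self.__boo = 1   # mangled: stored on the instance as _FooBar__boo
        self._hint = 2   # single underscore: stored exactly as written


def __module_level():
    # Defined at module level, outside any class: the name is not mangled.
    return "not mangled"


obj = FooBar()
instance_names = sorted(vars(obj))
module_level_result = __module_level()

print(instance_names)
print(obj._FooBar__boo)      # the mangled name is still reachable
print(module_level_result)
```

Mangling only makes accidental access less likely; the attribute remains available under its mangled name.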
To make an entire module private, don't include it in the package's __init__.py file.
One thing to be aware of in this scenario is indirect imports. If in mypack you
from mypack._mod_b import foo
foo()
Then a user can
from mypack import foo
foo()
and be none the wiser. I recommend importing as
from mypack import _mod_b
_mod_b.foo()
then a user will immediately see a red flag when they try to
from mypack import _mod_b
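The leak can be reproduced in a simplified form. This sketch assumes flat in-memory modules instead of a real package, and the module names _impl_demo, leaky_demo and tidy_demo are hypothetical:

```python
import sys
import types

# Hypothetical internal module that defines foo().
impl = types.ModuleType("_impl_demo")
exec("def foo():\n    return 'called foo'\n", impl.__dict__)
sys.modules["_impl_demo"] = impl

# Style 1: pull foo straight into the public namespace.
leaky = types.ModuleType("leaky_demo")
exec("from _impl_demo import foo\n", leaky.__dict__)

# Style 2: import the underscore-prefixed module and call through it.
tidy = types.ModuleType("tidy_demo")
exec("import _impl_demo\n", tidy.__dict__)


def public_names(module):
    # Names without a leading underscore, i.e. what looks public to a user.
    return sorted(name for name in vars(module) if not name.startswith("_"))


print(public_names(leaky))  # ['foo']
print(public_names(tidy))   # []
```

With the second style the only new name in the namespace carries an underscore, so a user who reaches for it can see that it is internal.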
As for actual directory structure, you could even extend Jeremy's answer into a _package_of_this_kind package, where anything in that can have any 'access modifiers' on it you like - users will know there be dragons
Python doesn't strictly know or support "private" or "protected" methods or classes. There's a convention that methods prefixed with a single underscore aren't part of an official API, but I wouldn't do this on classes or files - it's ugly.
If someone really needs to subclass or access mod_b, why prevent them from doing so? You can always supply a preferred API in your documentation and note in your module that users shouldn't access it directly and should use mypack instead.
I tried:
__all__ = ['SpamPublicClass']
But, of course that's just for:
from spammodule import *
Is there a way to block importing of a class? I'm worried about confusion at the API level of my code, where somebody will write:
from spammodule import SimilarSpamClass
and it'll cause debugging mayhem.
The convention is to use a _ as a prefix:
class PublicClass(object):
    pass

class _PrivateClass(object):
    pass
The following:
from module import *
Will not import the _PrivateClass.
But this will not prevent them from importing it. They could still import it explicitly.
from module import _PrivateClass
Start the names of private classes with an underscore, so that it will be clear just by the name that it is not for public use. That will not actually prevent anybody from importing the class, but it shouldn't happen by accident. It's a well established convention that names starting with an underscore are "internal".
There is no way to actually block access to the contents of a module, or the contents of a class for that matter, in Python. This sort of thing is handled by convention: name your class _SimilarSpamClass (with a leading underscore) to indicate to callers that this is an implementation detail of your module and not part of the published API.
To mark something as "private" in Python, properly document your public API so other developers know how to use your module correctly, and follow the standard naming conventions so that users of your module easily notice when they have strayed from your API into your implementation.
The Python PEP 8 style guide gives the following guidance for a single leading underscore in method names:
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose names start with an underscore.
What constitutes "internal use"?
Is this for methods only called within a given class?
class MyClass:
    def _internal_method(self):
        pass  # do something

    def public_method(self):
        self._internal_method()
What about inherited methods - are they still considered "internal"?
class BaseClass:
    def _internal_method(self):
        pass  # do something

class MyClass(BaseClass):
    def public_method(self):
        self._internal_method()  # or super()._internal_method()
What about if the inheritance is from another module within a software package?
# file1.py
class BaseClass:
    def _internal_method(self):
        pass  # do something

# file2.py
from file1 import BaseClass

class MyClass(BaseClass):
    def public_method(self):
        self._internal_method()  # or super()._internal_method()
All these examples are fine technically, but are they all acceptable stylistically? At what point do you say the leading underscore is not necessary/helpful?
A single leading underscore is Python's convention for "private" and "protected" variables, which some other languages enforce in the language itself.
The "internal use" language just says that you, as the developer, are reserving that name for your own code to use as you see fit, and that other users of your module/code can't rely on whatever is bound to that name behaving the same way in future versions, or even existing. It is the same use case as "protected" attributes, but without enforcement by the language runtime: users are expected to know that the attribute/function/method can change without prior warning.
So yes, as long as the other classes using your _-prefixed methods are in the same code package, even if in another file or folder (a distinct sub-package), it is OK to use them.
If you have different Python packages, even if closely related, it would not be advisable to call directly on the internal stuff on the other package, style-wise.
And as for limits, sometimes there are entire modules and classes that are not supposed to be used by users of your class - and it would be somewhat impairing to prefix everything on those modules with an _ - I'd say that it is enough to document what public interfaces to your package users are supposed to call, and add on the docs that certain parts (modules/classes/functions) are designed for "internal use and may change without note" - no need to meddle with their names.
As an illustration, I am currently developing a set of tools/library for text-art on the terminal - I put everything users should call as public names in its __init__.py - the remaining names are meant to be "internal".
Is it possible to avoid importing a file with from file import function?
Someone told me I would need to put an underscore as a prefix, like _function, but it isn't working.
I'm using Python 2.6 because of legacy code.
There are ways you can prevent the import, but they're generally hacks and you want to avoid them. The normal method is to just use the underscore:
def _function():
    pass
Then, when you import,
from my_module import *
You'll notice that _function is not imported because it begins with an underscore. However, you can always do this:
# In Python culture, this is considered rude
from my_module import _function
But you're not supposed to do that. Just don't do that, and you'll be fine. Python's attitude is that we're all adults. There are a lot of other things you're not supposed to do which are far worse, like
import my_module
# Remove the definition for _function!
del my_module._function
There is no privacy in Python. There are only conventions governing what external code should consider publicly accessible and usable.
Importing a module for the first time triggers the creation of a module object and the execution of all top-level code in the module. The module object contains the global namespace with the result of that code having run.
Because Python is dynamic you can always introspect the module namespace; you can see all names defined, all objects those names reference, and you can access and alter everything. It doesn't matter here if those names start with underscores or not.
So the only reason to use a leading _ underscore for a name is to document that the name is internal to the implementation of the module, and that external code should not rely on that name existing in a future version. The from module import * syntax will ignore such names for that reason alone. But you can't prevent a determined programmer from accessing such a name anyway. They simply do so at their own risk; it is not your responsibility to keep them from that access.
If you have functions or other objects that are only needed to initialise the module, you are of course free to delete those names at the end.
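A minimal sketch of that clean-up, assuming a hypothetical helper _build_squares that is needed only while the module initialises:

```python
def _build_squares(limit):
    # Helper needed only while the module is being initialised.
    return {n: n * n for n in range(limit)}


SQUARES = _build_squares(5)

# Drop the helper so it no longer appears in the module namespace.
del _build_squares

helper_still_defined = "_build_squares" in globals()
print(SQUARES)
print(helper_still_defined)  # False
```

After the del, from my_module import _build_squares fails with an ImportError because the name no longer exists, while SQUARES stays available.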
I've been reading through the source for the cpython HTTP package for fun and profit, and noticed that in server.py they have the __all__ variable set but also use a leading underscore for the function _quote_html(html).
Isn't this redundant? Don't both serve to limit what's imported by from HTTP import *?
Why do they do both?
Aside from the "private-by-convention" functions with _leading_underscores, there are:
Quite a few imported names;
Four class names;
Three function names without leading underscores;
Two string "constants"; and
One local variable (nobody).
If __all__ wasn't defined to cover only the classes, all of these would also be added to your namespace by a wildcard from server import *.
Yes, you could just use one method or the other, but I think the leading underscore is a stronger sign than the exclusion from __all__; the latter says "you probably won't need this often", the former says "keep out unless you know what you're doing". They both have their place.
__all__ indeed serves as a limit when doing from HTTP import *; prefixing _ to the name of a function or method is a convention for informing the user that that item should be considered private and thus used at his/her own risk.
This is mostly a documentation thing, in a similar vein to comments. A leading underscore is a clearer indication to a person reading the code that particular functions or variables aren't part of the public API than having that person check each name against __all__. PEP8 explicitly recommends using both conventions in this way:
To better support introspection, modules should explicitly declare
the names in their public API using the __all__ attribute. Setting
__all__ to an empty list indicates that the module has no public API.
Even with __all__ set appropriately, internal interfaces (packages,
modules, classes, functions, attributes or other names) should still
be prefixed with a single leading underscore.
This is a conceptual question rather than an actual problem, I wanted to ask the great big Internet crowd for feedback.
We all know imported modules end up in the namespace of that module:
# Module a:
import b
__all__ = ['f']
f = lambda: None
That allows you to do this:
import a
a.b # <- Valid attribute
Sometimes that's great, but most imports are side effects of the feature your module provides. In the example above I don't mean to expose b as a valid interface for callers of a.
To counteract that we could do:
import b as _b
This marks the import as private. But I can't find that practice described anywhere nor does PEP8 talk about using aliasing to mark imports as private. So I take it it's not common practice. But from a certain angle I'd say it's definitely semantically clearer, because it cleans up the exposed bits of your module leaving only the relevant interfaces you actually mean to expose. Working with an IDE with autocomplete it makes the suggested list much slimmer.
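The effect on the visible namespace can be seen in a short sketch. It assumes two hypothetical in-memory modules, with the standard-library json module standing in for b:

```python
import types


def public_names(module):
    # Names without a leading underscore, roughly what autocomplete would offer.
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Plain import: the imported module becomes a visible attribute.
plain = types.ModuleType("plain_demo")
exec("import json\nf = lambda: None\n", plain.__dict__)

# Aliased import: only the intended interface stays visible.
aliased = types.ModuleType("aliased_demo")
exec("import json as _json\nf = lambda: None\n", aliased.__dict__)

print(public_names(plain))    # ['f', 'json']
print(public_names(aliased))  # ['f']
```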
My question boils down to if you've seen that pattern in use? Does it have a name? What arguments would go against using it?
I have not had success using the __all__ functionality to hide the b import. I'm using PyCharm and do not see the autocomplete list change.
E.g. from some module I can do:
import a
And the autocomplete box shows both b and f.
While Martijn Pieters says that no one actually uses underscore-hiding module imports, that's not exactly true. The traces of this technique can be easily seen in the Python's standard library itself (see a related question). Let's check it:
$ git clone --depth 1 git@github.com:python/cpython.git
$ cd cpython/Lib
$ find -iname '*.py' | xargs grep 'as \+_' | wc -l
183
$ find -iname '*.py' | xargs grep '^import' | wc -l
4578
So, about 4% of all imports are underscore-prefixed — not a majority, but yet far from “no one”. There also are some examples in numpy and matplotlib packages.
For me, underscoring the import is the only right way to import a module without exposing it publicly. Unfortunately, it ruins the appearance of the code, so many developers avoid it. But it has some advantages over the __all__ approach:
A library user can tell whether a name is private just by looking at it, without consulting the documentation. Looking at __all__ alone is not enough to tell private from public, as some public names may not be listed there.
No need to maintain a refactoring-unfriendly list of code entity names.
In conclusion, both _name and __all__ are just plain evil, but the thing that actually needs fixing is Python's module system, designed under the impression of the “simple is better than complex” mantra. Compare, for example, how modules behave in Haskell.
UPD:
It looks like PEP 8 has already answered this question in its “Public and Internal Interfaces” section:
Even with __all__ set appropriately, internal interfaces (packages, modules, classes, functions, attributes or other names) should still be prefixed with a single leading underscore.
No one uses that pattern, and it is not named.
That's because the proper method to use is to explicitly mark your exported names with the __all__ variable. IDEs will honour this variable, as do tools like help().
Quoting the import statement documentation:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. If __all__ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_'). __all__ should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module).
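The rule in that passage can be sketched as a small helper. public_names is a hypothetical function written for this illustration, not part of the standard library, and the two modules are built in memory:

```python
import types


def public_names(module):
    """Names a wildcard import would bind, following the rule quoted above."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        # __all__ is defined: it alone decides what is public.
        return sorted(declared)
    # No __all__: every name that does not start with an underscore.
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Two hypothetical modules with the same body; only one sets __all__.
body = "import os\n\ndef f():\n    return None\n"
without_all = types.ModuleType("without_all_demo")
exec(body, without_all.__dict__)
with_all = types.ModuleType("with_all_demo")
exec(body + "__all__ = ['f']\n", with_all.__dict__)

print(public_names(without_all))  # ['f', 'os']
print(public_names(with_all))     # ['f']
```

Without __all__ the imported os module counts as public, which is exactly the accidental export the documentation warns about.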
(Emphasis mine).
Also see Can someone explain __all__ in Python?