I know that classes in Python are typically cased using camelCase.
Is it also the normal convention to have the file that contains the class also be camelCase'd especially if the file only contains the class?
For example, should class className also be stored in className.py instead of class_name.py?
The following answer is largely sourced from this answer.
If you're going to follow PEP 8, you should stick to all-lowercase names, with optional underscores.
To quote PEP 8's naming conventions for packages & modules:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
And for classes:
Class names should normally use the CapWords convention.
See this answer for the difference between a module, class and package:
A Python module is simply a Python source file, which can expose classes, functions and global variables.
The official convention is to use all lower case for file names (as others have already stated). The reason, however, has not been mentioned...
Because Python is cross-platform (and is commonly used that way) while file systems differ in how they treat letter case, it is better to avoid mixed case altogether. On Linux, for instance, MyClass.py and myclass.py can coexist in the same directory. That is not so on Windows!
On a related note, if you have MyClass.py and myclass.py in a git repo, or even just change the casing of a single file, git can act funky when you push and pull between Linux and Windows machines.
And, while barely on topic but in the same vein, SQL has the same issue: depending on the standard and configuration, uppercase letters in table names may or may not be allowed.
Personally, I find TitleCase / camelCase more pleasant to read, even in file names, but when anything has to work cross-platform it's safest not to use them.
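The case-sensitivity point above is easy to check on your own machine. This is a minimal sketch (the helper name is_case_sensitive is hypothetical, not a standard function): it creates MyClass.py in a temporary directory and asks whether myclass.py exists. On a case-insensitive file system both names resolve to the same file.

```python
import tempfile
from pathlib import Path


def is_case_sensitive(directory: Path) -> bool:
    """Hypothetical helper: True if `directory` is on a case-sensitive file system."""
    # Create a file with a CapWords name...
    (directory / "MyClass.py").write_text("")
    # ...then look it up with a lowercase name. On a case-insensitive file
    # system both names refer to the same file, so the lookup succeeds.
    return not (directory / "myclass.py").exists()


with tempfile.TemporaryDirectory() as tmp:
    sensitive = is_case_sensitive(Path(tmp))
print("case-sensitive file system:", sensitive)
```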
There is a difference between the naming convention for a class and for the file that contains it. This misunderstanding might come from languages like Java, where it is common to have one file per class.
In Python you can have several classes per module (a simple .py file). The classes in this module/file should be named according to the class naming convention: Class names should normally use the CapWords convention.
The file containing these classes should follow the module naming convention: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
=> A class CamelCase should live in the file camelcase.py (or camel_case.py if necessary).
My question is, is it also the normal convention to have the file that contains the class also be camelCase'd especially if the file only contains the class
Short answer: No.
Longer answer: file names should be all lowercase, with underscores as needed.
From PEP8 "Package and Module Names":
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
If you're unclear what a module is:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
First of all, as mentioned above, class names should be CapWords, e.g.:
class SampleClass:
    ...
BEWARE: giving a file (module) and a class the same name causes confusion.
Example 1: Say you have the following module structure:
src/
    __init__.py
    SampleClass.py
main.py
Your SampleClass.py is:
class SampleClass:
    ...
Your main.py is:
from src import SampleClass
instance = SampleClass()
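Will this code work? No: in main.py the name SampleClass refers to the module src/SampleClass.py, not to the class, so calling it raises TypeError: 'module' object is not callable. You would have to write either from src.SampleClass import SampleClass or instance = SampleClass.SampleClass(). Awkward code, isn't it?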
You can also fix it by adding the following content to __init__.py:
from .SampleClass import SampleClass
This leads to Example 2.
Example 2: Say you develop a package:
src/
    __init__.py
    BaseClass.py
    ConcreteClass.py
main.py
BaseClass.py content:
class BaseClass:
    ...
ConcreteClass.py content:
from src import BaseClass
class ConcreteClass(BaseClass):
    ...
And your __init__.py content:
from .ConcreteClass import ConcreteClass
from .BaseClass import BaseClass
And main.py content is:
from src import ConcreteClass
instance = ConcreteClass()
The code fails with an error:
class ConcreteClass(BaseClass):
TypeError: module() takes at most 2 arguments (3 given)
It took me a while to understand the error and why I could not inherit from the class, because in the previous example everything worked once I added the exports to the __init__.py file. Using snake_case file names does not fix the problem, but the error is a bit easier to understand:
ImportError: cannot import name 'BaseClass' from partially initialized module 'src'
To fix the code, change the import in ConcreteClass.py to: from .BaseClass import BaseClass.
One last caveat: if in the original code you swap the order of the imports in __init__.py so that it looks like:
from .BaseClass import BaseClass
from .ConcreteClass import ConcreteClass
The initial code then works, but you really don't want code that depends on the order of imports. If someone changes the order or runs a tool such as isort to organize the imports, good luck fixing those bugs.
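To see both outcomes of Example 2 side by side, here is a minimal sketch that writes the package to a temporary directory and imports it. The package names src_broken and src_fixed, and the helper build_package, are hypothetical stand-ins for src; the only difference between the two copies is the import line in ConcreteClass.py. The exact TypeError message varies between Python versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, name: str, base_import: str) -> None:
    """Hypothetical helper: recreate the Example 2 layout under root/name."""
    pkg = root / name
    pkg.mkdir()
    # __init__.py imports ConcreteClass *first*, as in the original example.
    (pkg / "__init__.py").write_text(
        "from .ConcreteClass import ConcreteClass\n"
        "from .BaseClass import BaseClass\n"
    )
    (pkg / "BaseClass.py").write_text("class BaseClass:\n    pass\n")
    (pkg / "ConcreteClass.py").write_text(
        f"{base_import}\n"
        "class ConcreteClass(BaseClass):\n"
        "    pass\n"
    )


root = Path(tempfile.mkdtemp())
# Two separate copies so they don't share sys.modules entries.
build_package(root, "src_broken", "from src_broken import BaseClass")  # original import
build_package(root, "src_fixed", "from .BaseClass import BaseClass")   # fixed import
sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("src_broken")
    broken_error = None
except TypeError as exc:
    # BaseClass resolved to the *module* src_broken.BaseClass, not the class,
    # so "class ConcreteClass(BaseClass)" tries to subclass a module.
    broken_error = exc
print("broken:", type(broken_error).__name__, broken_error)

fixed = importlib.import_module("src_fixed")
print("fixed:", issubclass(fixed.ConcreteClass, fixed.BaseClass))
```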
Python Double-Underscore methods are hiding everywhere and behind everything in Python! I am curious about how this is specifically working with the interpreter.
import some_module as sm
From my current understanding:
Import searches for requested module
It binds result to the local assignment (if given)
It utilizes the __init__.py . . . ???
There seems to be something going on that is larger than my scope of understanding. I understand we use __init__() for class initialization. It is functioning as a constructor for our class.
I do not understand how calling import is then utilizing the __init__.py.
What exactly is happening when we run import?
How is __init__.py different from other dunder methods?
Can we manipulate this dunder method (if we really wanted to?)
import some_module is going to look for one of two things. It's either going to look for a some_module.py in the search path or a some_module/__init__.py. Only one of those should exist. The only thing __init__.py means when it comes to modules is "this is the module that represents this folder". So consider this folder structure.
foo/
    __init__.py
    module1.py
bar.py
Then the three modules available are foo (which corresponds to foo/__init__.py), foo.module1 (which corresponds to foo/module1.py), and bar (which corresponds to bar.py). By convention, foo/__init__.py will usually import important names from module1.py and re-export some of them for convenience, but this is by no means a requirement.
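The following sketch, using only the standard library, recreates the foo/ and bar.py layout in a temporary directory. It shows that importing the package executes foo/__init__.py, that the package's __file__ points at that file, and that packages (unlike plain modules) carry a __path__. The greet function is a hypothetical name added for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "foo").mkdir()
# foo/__init__.py: runs when `foo` is imported and re-exports a name from module1.
(root / "foo" / "__init__.py").write_text(
    "print('running foo/__init__.py')\n"
    "from .module1 import greet\n"
)
(root / "foo" / "module1.py").write_text(
    "def greet():\n"
    "    return 'hi from module1'\n"
)
(root / "bar.py").write_text("VALUE = 1\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import foo          # finds foo/__init__.py and executes it (which imports module1)
import foo.module1  # already in sys.modules, so this only binds names
import bar          # finds bar.py

print(Path(foo.__file__).name)   # __init__.py
print(Path(bar.__file__).name)   # bar.py
print(hasattr(foo, "__path__"))  # True: packages have a __path__, plain modules don't
print(foo.greet())               # hi from module1
```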
The Python documentation for the import statement (link) contains the following:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module.
The Python documentation for modules (link) contains what is seemingly a contradictory statement:
if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.
It then gives an example where an __init__.py file imports nothing, and simply defines __all__ to be some of the names of modules in that package.
I have tested both ways of using __all__, and both seem to work; indeed one can mix and match within the same __all__ value.
For example, consider the directory structure
foopkg/
    __init__.py
    foo.py
Where __init__.py contains
# Note no imports
def bar():
    print("BAR")
__all__ = ["bar", "foo"]
NOTE: I know one shouldn't define functions in an __init__.py file. I'm just doing it to illustrate that the same __all__ can export both names that do exist in the current namespace, and those which do not.
The following code runs, seemingly auto-importing the foo module:
>>> from foopkg import *
>>> dir()
[..., 'bar', 'foo']
Why does the __all__ attribute have this strange double-behaviour?
The docs seem really unclear on how it is supposed to be used, only mentioning one of its two sides in each place I linked. I understand the overall purpose is to explicitly set the names imported by a wildcard import, but am confused by the additional, seemingly auto-importing behaviour. Is this just a magic shortcut that avoids having to write the import out as well?
The documentation is a bit hard to parse because it does not mention that packages generally also have the behavior of modules, including their __all__ attribute. The behavior of packages is necessarily a superset of the behavior of modules, because packages, unlike modules, can have sub-packages and sub-modules. Behaviors not related to that feature are identical between the two as far as the end-user is concerned.
The Python docs can be minimalistic at times. They did not bother to mention that:
- A package's __init__.py performs all the module-like code for a package, including support for star-import of direct attributes via __all__, just like a module does.
- Modules support all the features of a package's __init__.py, except that they can't have sub-packages or sub-modules.
It goes without saying that for a name to refer to a sub-module, the sub-module has to be imported; hence the apparent, but not real, double standard.
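A minimal sketch reproducing the question's foopkg layout in a temporary directory: before the star-import, the submodule foo is not an attribute of the package; afterwards it is, because the star-import had to import it to satisfy __all__. The hello function in foo.py is a hypothetical addition.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the question's layout: foopkg/__init__.py and foopkg/foo.py.
root = Path(tempfile.mkdtemp())
pkg = root / "foopkg"
pkg.mkdir()
(pkg / "__init__.py").write_text(
    "# Note no imports\n"
    "def bar():\n"
    "    print('BAR')\n"
    "__all__ = ['bar', 'foo']\n"
)
(pkg / "foo.py").write_text("def hello():\n    return 'hello from foo'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

foopkg = importlib.import_module("foopkg")
before = "foo" in vars(foopkg)
print(before)  # False: nothing has imported the submodule yet

namespace = {}
exec("from foopkg import *", namespace)  # same statement as in the question
after = "foo" in vars(foopkg)
print(after)   # True: importing foopkg.foo also bound it on the package
print(namespace["foo"].hello())  # hello from foo
```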
Update: how does from M import * actually work?
The __all__ in the __init__.py of the folder foopkg works the same way as __all__ in a module foopkg.py.
You can see why it auto-imports foo here: https://stackoverflow.com/a/54799108/12565014
The most important thing is to look at the CPython implementation: https://github.com/python/cpython/blob/fee552669f21ca294f57fe0df826945edc779090/Python/ceval.c#L5152
It basically loops through __all__ and tries to import each element in __all__.
That's why it auto-imports foo and also achieves whitelisting.
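A simplified Python model of the behavior described above, written as a hypothetical helper star_import. It is not CPython's actual implementation and skips edge cases. The standard library's xml package makes a convenient test, since its __init__.py only defines __all__ and does not import the subpackages it lists.

```python
import importlib
import types


def star_import(module_name: str) -> dict:
    """Hypothetical, simplified model of `from module_name import *`.

    Returns the names the statement would bind. Not CPython's real code.
    """
    module = importlib.import_module(module_name)
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: every name that doesn't start with an underscore.
        return {k: v for k, v in vars(module).items() if not k.startswith("_")}
    result = {}
    for name in names:
        if not hasattr(module, name) and hasattr(module, "__path__"):
            # Package with a missing name: try it as a submodule. Importing
            # a submodule also sets it as an attribute of the package.
            importlib.import_module(f"{module_name}.{name}")
        # Raises AttributeError if the name still doesn't exist.
        result[name] = getattr(module, name)
    return result


# xml/__init__.py defines __all__ but doesn't import the subpackages it lists.
names = star_import("xml")
print(sorted(names))
print(all(isinstance(m, types.ModuleType) for m in names.values()))
```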
The Python PEP 8 style guide gives the following guidance for a single leading underscore in method names:
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose names start with an underscore.
What constitutes "internal use"?
Is this for methods only called within a given class?
class MyClass:
    def _internal_method(self):
        ...  # do something
    def public_method(self):
        self._internal_method()
What about inherited methods - are they still considered "internal"?
class BaseClass:
    def _internal_method(self):
        ...  # do something
class MyClass(BaseClass):
    def public_method(self):
        self._internal_method()  # or super()._internal_method()
What about if the inheritance is from another module within a software package?
# file1.py
class BaseClass:
    def _internal_method(self):
        ...  # do something
# file2.py
from file1 import BaseClass
class MyClass(BaseClass):
    def public_method(self):
        self._internal_method()  # or super()._internal_method()
All these examples are fine technically, but are they all acceptable stylistically? At what point do you say the leading underscore is not necessary/helpful?
A single leading underscore is Python's convention for "private" and "protected" variables, which some other languages enforce at the language level.
The "internal use" wording just means that you, as the developer, reserve that name to be used by your code as you see fit, and other users of your module/code can't rely on the object bound to that name to behave the same way in future versions, or even to exist. It is the use case for "protected" attributes, but without enforcement by the language runtime: users are expected to know that the attribute/function/method can change without prior warning.
So, yes, as long as the other classes using your _-prefixed methods are in the same code package (even if they live in another file, or in a folder that forms a distinct sub-package), it is fine to use them.
If you have separate Python packages, even closely related ones, calling directly into the other package's internals is not advisable, style-wise.
As for limits: sometimes there are entire modules and classes that are not meant to be used by users of your package, and prefixing everything in those modules with an _ would be cumbersome. I'd say it is enough to document which public interfaces your package users are supposed to call, and to state in the docs that certain parts (modules/classes/functions) are designed for "internal use and may change without notice"; there is no need to meddle with their names.
As an illustration, I am currently developing a set of tools/library for text-art on the terminal - I put everything users should call as public names in its __init__.py - the remaining names are meant to be "internal".
I've got a really complex singleton object. I've decided to rewrite it as a separate module with module-wide global variables that store the data.
Are there some pitfalls of this approach? I just feel, like that's a little bit hacky, and that there may be some problems I cannot see now.
Maybe someone has done this or has an opinion :) Thanks in advance for any help.
Minimal, Complete, and Verifiable example:
"""
This is __init__.py of the module, that could be used as a singleton:
I need to set and get value of IMPORTANT_VARIABLE from different places in my code.
Folder structure:
--singleton_module
|
-__init__.py
Example of usage:
import singleton_module as my_singleton
my_singleton.set_important_variable(3)
print(my_singleton.get_important_variable())
"""
IMPORTANT_VARIABLE = 0
def set_important_variable(value):
global IMPORTANT_VARIABLE
IMPORTANT_VARIABLE = value
def get_important_variable():
return IMPORTANT_VARIABLE
Technically, Python modules ARE singletons, so from this point of view there's no particular issue with your code (apart from the usual issues with singletons, that is). I'd just spell the variable in all_lower (ALL_UPPER denotes a pseudo-constant) and prefix it with either a single ("protected") or double ("really private") leading underscore to make clear it's not part of the public API (standard Python naming convention).
Now whether singletons are a good idea is another debate but that's not the point here...
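To illustrate the point that modules behave as singletons, here is a sketch that writes the question's singleton_module (docstring omitted) to a temporary directory and then imports it a second time inside a function; the read_from_function helper is hypothetical. Both imports yield the same module object, so state set through one name is visible through the other.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "singleton_module"
pkg.mkdir()
# Same code as the question's __init__.py, without the docstring.
(pkg / "__init__.py").write_text(
    "IMPORTANT_VARIABLE = 0\n"
    "def set_important_variable(value):\n"
    "    global IMPORTANT_VARIABLE\n"
    "    IMPORTANT_VARIABLE = value\n"
    "def get_important_variable():\n"
    "    return IMPORTANT_VARIABLE\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import singleton_module as my_singleton

my_singleton.set_important_variable(3)


def read_from_function():
    # A function-local import still returns the cached object from sys.modules.
    import singleton_module
    return singleton_module, singleton_module.get_important_variable()


module_again, value = read_from_function()
print(module_again is my_singleton, value)  # True 3
```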
e.g. that in some situation I might lose data, or that the module could be imported twice in different places in the code, so it would not be a singleton if imported inside the scope of a function or something like that.
A module is only instantiated once per process (the first time it's imported); subsequent imports get it directly from sys.modules. The only case where you could have two distinct instances of the same module is when the module is imported via two different paths, which can only happen if you have a somewhat broken sys.path, i.e. something like this:
src/
    foo/
        __init__.py
        bar/
            __init__.py
            baaz/
                __init__.py
                mymodule.py
with both "src" and "foo" in sys.path, then importing mymodule once as from foo.bar.baaz import mymodule and a second time as from bar.baaz import mymodule
Needless to say that it's a degenerate case, but it can happens and lead to hard to diagnose bugs. Note that when you have this case, you do have quite a few other things that breaks, like identity testing anything from mymodule.
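A sketch of the degenerate case described above, assuming the same src/foo/bar/baaz layout is created in a temporary directory (the Thing class is a hypothetical addition). With both src and src/foo on sys.path, the same file is loaded twice under two module names, and the two copies of the class are unrelated.

```python
import importlib
import sys
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp()) / "src"
baaz = src / "foo" / "bar" / "baaz"
baaz.mkdir(parents=True)
for package_dir in (src / "foo", src / "foo" / "bar", baaz):
    (package_dir / "__init__.py").write_text("")
(baaz / "mymodule.py").write_text("class Thing:\n    pass\n")

# The broken setup: both src and src/foo at the front of sys.path.
sys.path[:0] = [str(src), str(src / "foo")]
importlib.invalidate_caches()

from foo.bar.baaz import mymodule as first
from bar.baaz import mymodule as second

print(first.__name__, second.__name__)          # foo.bar.baaz.mymodule bar.baaz.mymodule
print(first is second)                          # False: two separate module objects
print(isinstance(first.Thing(), second.Thing))  # False: two separate classes
```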
Also, I am not sure how would using object instead of module increase security
It doesn't.
And I am just asking: if it's not a bad practice, maybe someone has done this and found some problems. This is probably not a popular pattern.
Well, quite the contrary: you'll often find advice to use modules as singletons instead of classes with only staticmethods, classmethods and class attributes (another way of implementing a singleton in Python). This mostly concerns stateless classes used as namespaces, while your example does have state, but that doesn't make much practical difference.
Now, what you won't get are all the nice OO features like computed attributes, inheritance, magic methods etc., but I assume you already understand this.
As far as I'm concerned, depending on the context, I might rather use a plain class but only expose a single instance of it as the module's API, i.e.:
# mymodule.py
__all__ = ["mysingleton"]
def check_value(value):
    # Hypothetical stand-in for the "imaginary validation".
    if value is None:
        raise ValueError("variable must not be None")
class __MySingletonLike(object):
    def __init__(self):
        self._variable = 42
    @property
    def variable(self):
        return self._variable
    @variable.setter
    def variable(self, value):
        check_value(value)  # imaginary validation
        self._variable = value
mysingleton = __MySingletonLike()
but that's only when I have special concerns about the class (implementation reuse, proper testability, other special features requiring a class etc).
I have a package mypack with modules mod_a and mod_b in it. I intend the package itself and mod_a to be imported freely:
import mypack
import mypack.mod_a
However, I'd like to keep mod_b for the exclusive use of mypack. That's because it exists merely to organize the latter's internal code.
My first question is, is it an accepted practice in Python programming to have 'private' modules like this?
If yes, my second question is, what is the best way to convey this intention to the client? Do I prefix the name with an underscore (i.e. _mod_b)? Or would it be a good idea to declare a sub-package private and place all such modules there?
I prefix private modules with an underscore to communicate the intent to the user. In your case, this would be mypack._mod_b
This is in the same spirit (but not completely analogous to) the PEP8 recommendation to name C-extension modules with a leading underscore when it’s wrapped by a Python module; i.e., _socket and socket.
The solution I've settled on is to create a sub-package 'private' and place all the modules I wish to hide in there. This way they stay stowed away, leaving mypack's module list cleaner and easier to parse.
To me, this doesn't look unpythonic either.
While there is no explicit private keyword, there is a convention of starting private functions with a single underscore. A double leading underscore goes further only for class attributes: it triggers name mangling, which makes them harder to reach from outside the class (it has no such effect on module-level functions). See the following from PEP 8:
- _single_leading_underscore: weak "internal use" indicator. E.g. "from M
import *" does not import objects whose name starts with an underscore.
- single_trailing_underscore_: used by convention to avoid conflicts with
Python keyword, e.g.
Tkinter.Toplevel(master, class_='ClassName')
- __double_leading_underscore: when naming a class attribute, invokes name
mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
- __double_leading_and_trailing_underscore__: "magic" objects or
attributes that live in user-controlled namespaces. E.g. __init__,
__import__ or __file__. Never invent such names; only use them
as documented.
To make an entire module private, don't import it in the package's __init__.py file.
One thing to be aware of in this scenario is indirect imports. If in mypack/__init__.py you do
from mypack._mod_b import foo
foo()
Then a user can
from mypack import foo
foo()
and be none the wiser. I recommend importing as
from mypack import _mod_b
_mod_b.foo()
then a user will immediately see a red flag when they try to
from mypack import _mod_b
As for actual directory structure, you could even extend Jeremy's answer into a _package_of_this_kind package, where anything in that can have any 'access modifiers' on it you like - users will know there be dragons
Python doesn't strictly know or support "private" or "protected" methods or classes. There's a convention that methods prefixed with a single underscore aren't part of an official API, but I wouldn't do this on classes or files - it's ugly.
If someone really needs to subclass or access mod_b, why prevent them from doing so? You can always supply a preferred API in your documentation, and note in your module that it shouldn't be accessed directly and that mypack should be used instead.