Python Double-Underscore methods are hiding everywhere and behind everything in Python! I am curious about how this is specifically working with the interpreter.
import some_module as sm
From my current understanding:
Import searches for requested module
It binds result to the local assignment (if given)
It utilizes the __init__.py . . . ???
There seems to be something going on that is larger than my scope of understanding. I understand we use __init__() for class initialization. It is functioning as a constructor for our class.
I do not understand how calling import is then utilizing the __init__.py.
What exactly is happening when we run import?
How is __init__.py different from other dunder methods?
Can we manipulate this dunder method (if we really wanted to?)
import some_module is going to look for one of two things. It's either going to look for a some_module.py in the search path or a some_module/__init__.py. Only one of those should exist. The only thing __init__.py means when it comes to modules is "this is the module that represents this folder". So consider this folder structure.
foo/
__init__.py
module1.py
bar.py
Then the three modules available are foo (which corresponds to foo/__init__.py), foo.module1 (which corresponds to foo/module1.py), and bar (which corresponds to bar.py). By convention, foo/__init__.py will usually import important names from module1.py and reexport some of them for convenience, but this is by no means a requirement.
I have an example file structure provided below.
/.git
/README.md
/project
/Operation A
generateinsights.py
insights.py
/Operation B
generatetargets.py
targets.py
generateinsights.py is run; it references insights.py to get the definition of an insight object. Next, generatetargets.py is run; it refrences targets.py to get the definition of a target object. The issue that I have, is generatetargets.py also needs to understand what an insight object is. How can I set up my imports so that insights.py and targets.py can be referenced by anything in the project directory? It seems like I should use _ init _.py for this, but I can't get it to work properly.
Firstly, you have to rename Operation A and Operation B so that they are composed of only letters, numbers and underscores, for example Operation_A - this is needed to be able to use these in an import statement without raising a SyntaxError.
Then, put an __init__.py file into the project, Operation_A and Operation_B folders. You can leave it empty, but you can also for example define additional attributes for your module.
Finally, you need to make Python find your modules - for this, either:
set your PYTHONPATH environment variable so that it includes the folder containing project or
put the package folder somewhere into Python's default import directories, for example in ยด/usr/lib/python3/site-packages` (requires root permissions)
After that you can import both targets.py and insights.py from any place like this:
from project.Operation_A import insights
from project.Operation_B import targets
Say I have a file called A.py and for whatever reason I decided to consolidate it and some other files into B.py
Users previously did from A import func. What's the correct way to handle this while keeping backwards compatibility?
I was considering keeping A.py and in there, putting from .B import func
Let's say that we have a directory with the following structure:
tests/
|-- __init__.py
|-- test_foo.py
where package foo is tested. In test_foo.py, variable bar is defined (and modified) to later be used.
Now imagine that instead of one file, we have around 20 test_fooX.py, where bar is initialized in every test.
Is it good practice to initiate bar in __init__.py and import it directly in every test? E.g.
from __init__ import bar
The Zen of python mentions that:
Explicit is better than implicit.
Defining bar in every single script would be what the explicit way. However, importing variables improves the structure of the tests/project.
A real scenario would be a logger (imported from foo), whose logging level needs to be changed; or the location of a specific directory instead of defining it every time.
There's nothing really implicit about __init__.py. A package is a module. Because a package is implemented by a directory containing a file named __init__.py, that file contains the contents of the module tests, with other files implementing submodules belonging to the same package.
I know that classes in Python are typically cased using camelCase.
Is it also the normal convention to have the file that contains the class also be camelCase'd especially if the file only contains the class?
For example, should class className also be stored in className.py instead of class_name.py?
The following answer is largely sourced from this answer.
If you're going to follow PEP 8, you should stick to all-lowercase names, with optional underscores.
To quote PEP 8's naming conventions for packages & modules:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
And for classes:
Class names should normally use the CapWords convention.
See this answer for the difference between a module, class and package:
A Python module is simply a Python source file, which can expose classes, functions and global variables.
The official convention is to use all lower case for file names (as others have already stated). The reason, however, has not been mentioned...
Since Python works cross platform (and it is common to use it in that manner), but file systems vary in the use of casing, it is better to just eliminate alternate cases. In Linux, for instance, it is possible to have MyClass.py and myclass.py in the same directory. That is not so in Windows!
On a related note, if you have MyClass.py and myclass.py in a git repo, or even just change the casing on the same file, git can act funky when you push/pull across from Linux and Windows.
And, while barely on topic, but in the same vein, SQL has these same issues where different standards and configurations may or may not allow UpperCases on table names.
I, personally, find it more pleasant to read TitleCasing / camelCasing even on filenames, but when you do anything that can work cross platform it's safest not to.
There is a difference in the naming convention of the class name and the file that contains this class. This missunderstanding might come from languages like java where it is common to have one file per class.
In python you can have several classes per modul (a simple .py file). The classes in this module/file should be called according to the class naming convention: Class names should normally use the CapWords convention.
The file containing this classes should follow the modul naming convention: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
=> CamelCase should in the file camelcase.py (or camel_case.py if neccessary)
My question is, is it also the normal convention to have the file that
contains the class also be camelCase'd especially if the file only
contains the class
Short answer: No.
Longer answer: should be all lower case and underscores as needed.
From PEP8 "Package and Module Names":
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability. Python packages
should also have short, all-lowercase names, although the use of
underscores is discouraged.
If you're unclear what a module is:
A module is a file containing Python definitions and statements. The
file name is the module name with the suffix .py appended.
First of all, as mentioned above, class names should be CapWords, e.g.:
class SampleClass:
...
BEWARE: Having the same name for file (module) and class creates confusions.
Example 1: Say you have the following module structure:
src/
__init__.py
SampleClass.py
main.py
Your SampleClass.py is:
class SampleClass:
...
Your main.py is:
from src import SampleClass
instance = SampleClass()
Will this code work? NO, cause you should've done either from src.SampleClass import SampleClass or instance = SampleClass.SampleClass(). Awkward code, isn't it?
You can also fix it by adding the following content to __init__.py:
from .SampleClass import SampleClass
Which leads to the Example 2.
Example 2: Say you develop a module:
src/
__init__.py
BaseClass.py
ConcreteClass.py
main.py
BaseClass.py content:
class BaseClass:
...
ConcreteClass.py content:
from src import BaseClass
class ConcreteClass(BaseClass):
...
And your __init__.py content:
from .ConcreteClass import ConcreteClass
from .BaseClass import BaseClass
And main.py content is:
from src import ConcreteClass
instance = ConcreteClass()
The code fails with an error:
class ConcreteClass(BaseClass):
TypeError: module() takes at most 2 arguments (3 given)
It took me a while to understand the error and why I cannot inherit from the class, cause in previous example when I added exports to __init__.py file everything worked. If you use snake case file names it does not fix the problem but the error is a bit easier to understand:
ImportError: cannot import name 'BaseClass' from partially initialized module 'src'
To fix the code you need to fix the import in ConcreteClass.py to be: from .BaseClass import BaseClass.
Last caveat, if in original code you would switch places imports in __init__.py so it looks like:
from .BaseClass import BaseClass
from .ConcreteClass import ConcreteClass
Initial code works, but you really don't want anyone to write a code that will depend on the order of imports. If someone changes the order or applies isort tool to organize imports, good luck fixing those bugs.