Python Comprehension - Importing & Dunder Methods

Python double-underscore (dunder) methods are hiding everywhere and behind everything in Python! I am curious about how this specifically works with the interpreter.
import some_module as sm
From my current understanding:
Import searches for requested module
It binds result to the local assignment (if given)
It utilizes the __init__.py . . . ???
There seems to be something going on that is larger than my scope of understanding. I understand we use __init__() for class initialization; it functions as a constructor for our class.
I do not understand how calling import is then utilizing the __init__.py.
What exactly is happening when we run import?
How is __init__.py different from other dunder methods?
Can we manipulate this dunder method (if we really wanted to?)

import some_module is going to look for one of two things. It's either going to look for a some_module.py in the search path or a some_module/__init__.py. Only one of those should exist. The only thing __init__.py means when it comes to modules is "this is the module that represents this folder". So consider this folder structure.
foo/
    __init__.py
    module1.py
bar.py
Then the three modules available are foo (which corresponds to foo/__init__.py), foo.module1 (which corresponds to foo/module1.py), and bar (which corresponds to bar.py). By convention, foo/__init__.py will usually import important names from module1.py and reexport some of them for convenience, but this is by no means a requirement.
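For example, a minimal sketch of such a re-export (useful_function is a made-up name, not something from the question):

# foo/__init__.py  -- hypothetical contents, assuming module1.py defines useful_function
from .module1 import useful_function   # re-export for convenience

__all__ = ["useful_function"]           # optional: what `from foo import *` exposes

A caller can then write from foo import useful_function instead of from foo.module1 import useful_function; both continue to work.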

Related

Is it right to import variables from __init__ into a unit-test script?

Let's say that we have a directory with the following structure:
tests/
|-- __init__.py
|-- test_foo.py
where package foo is tested. In test_foo.py, variable bar is defined (and modified) to later be used.
Now imagine that instead of one file, we have around 20 test_fooX.py, where bar is initialized in every test.
Is it good practice to initiate bar in __init__.py and import it directly in every test? E.g.
from __init__ import bar
The Zen of python mentions that:
Explicit is better than implicit.
Defining bar in every single script would be the explicit way. However, importing variables improves the structure of the tests/project.
A real scenario would be a logger (imported from foo), whose logging level needs to be changed; or the location of a specific directory instead of defining it every time.
There's nothing really implicit about __init__.py. A package is a module. Because a package is implemented by a directory containing a file named __init__.py, that file contains the contents of the module tests, with other files implementing submodules belonging to the same package.
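As an illustration (the value bound to bar is invented for the example; the real bar might be a logger or a directory path as described above), the shared object can live in the package module and be imported from the package by name rather than via from __init__ import bar:

# tests/__init__.py
bar = {"log_level": "DEBUG"}   # hypothetical shared test fixture

# tests/test_foo1.py
from tests import bar          # or, relative to the package: from . import bar

def test_uses_bar():
    assert bar["log_level"] == "DEBUG"

Importing from tests (or from .) makes it explicit which package the name comes from, which from __init__ import bar does not.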

Why does __all__ work differently in packages than in modules?

The Python documentation for the import statement (link) contains the following:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module.
The Python documentation for modules (link) contains what is seemingly a contradictory statement:
if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.
It then gives an example where an __init__.py file imports nothing, and simply defines __all__ to be some of the names of modules in that package.
I have tested both ways of using __all__, and both seem to work; indeed one can mix and match within the same __all__ value.
For example, consider the directory structure
foopkg/
__init__.py
foo.py
Where __init__.py contains
# Note no imports
def bar():
    print("BAR")

__all__ = ["bar", "foo"]
NOTE: I know one shouldn't define functions in an __init__.py file. I'm just doing it to illustrate that the same __all__ can export both names that do exist in the current namespace, and those which do not.
The following code runs, seemingly auto-importing the foo module:
>>> from foopkg import *
>>> dir()
[..., 'bar', 'foo']
Why does the __all__ attribute have this strange double-behaviour?
The docs seem really unclear on how it is supposed to be used, only mentioning one of its two sides in each place I linked. I understand the overall purpose is to explicitly set the names imported by a wildcard import, but am confused by the additional, seemingly auto-importing behaviour. Is this just a magic shortcut that avoids having to write the import out as well?
The documentation is a bit hard to parse because it does not mention that packages generally also have the behavior of modules, including their __all__ attribute. The behavior of packages is necessarily a superset of the behavior of modules, because packages, unlike modules, can have sub-packages and sub-modules. Behaviors not related to that feature are identical between the two as far as the end-user is concerned.
The Python docs can be minimalistic at times. They did not bother to mention that:
Package __init__ performs all the module-like code for a package, including support for star-import for direct attributes via __all__, just like a module does.
Modules support all the features of a package __init__.py, except that they can't have a sub-package or sub-module.
It goes without saying that to make a name refer to a sub-module, that sub-module has to be imported, hence the apparent, but not actual, double standard.
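To make that concrete, here is a sketch of an __init__.py for the same foopkg that imports its submodule explicitly instead of leaving it to the star-import machinery:

# foopkg/__init__.py  -- explicit variant of the example above
from . import foo      # foopkg.foo now exists as soon as foopkg is imported

def bar():
    print("BAR")

__all__ = ["bar", "foo"]   # "foo" already names the imported submodule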
Update: how does from M import * actually work?
The __all__ in __init__.py of folder foopkg works the same way as __all__ in foopkg.py
Why it'll auto-import foo you can see here: https://stackoverflow.com/a/54799108/12565014
The most important thing is to look at the CPython implementation: https://github.com/python/cpython/blob/fee552669f21ca294f57fe0df826945edc779090/Python/ceval.c#L5152
It basically loops through __all__ and tries to import each element of __all__.
That's why it auto-imports foo and also achieves whitelisting.
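Put roughly as Python (a simplified sketch; the real logic is split between importlib._bootstrap._handle_fromlist and the ceval.c code linked above), the star-import behaves something like this:

import importlib

def star_import(package_name, target_namespace):
    # rough sketch of `from <package_name> import *` when __all__ is defined
    module = importlib.import_module(package_name)
    for name in getattr(module, "__all__", []):
        if not hasattr(module, name):
            # not a plain attribute: try it as a submodule (this is the "auto-import")
            importlib.import_module(package_name + "." + name)
        target_namespace[name] = getattr(module, name)

# usage sketch:
# star_import("foopkg", globals())   # binds bar and foo, importing foopkg.foo on demand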

Should Python class filenames also be camelCased?

I know that classes in Python are typically cased using camelCase.
Is it also the normal convention to have the file that contains the class also be camelCase'd especially if the file only contains the class?
For example, should class className also be stored in className.py instead of class_name.py?
The following answer is largely sourced from this answer.
If you're going to follow PEP 8, you should stick to all-lowercase names, with optional underscores.
To quote PEP 8's naming conventions for packages & modules:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
And for classes:
Class names should normally use the CapWords convention.
See this answer for the difference between a module, class and package:
A Python module is simply a Python source file, which can expose classes, functions and global variables.
The official convention is to use all lower case for file names (as others have already stated). The reason, however, has not been mentioned...
Since Python works cross-platform (and it is commonly used that way), but file systems vary in their handling of case, it is better to just avoid alternate casing altogether. On Linux, for instance, it is possible to have MyClass.py and myclass.py in the same directory. That is not so on Windows!
On a related note, if you have MyClass.py and myclass.py in a git repo, or even just change the casing of the same file, git can act funky when you push/pull between Linux and Windows.
And, while barely on topic but in the same vein, SQL has the same issues, where different standards and configurations may or may not allow uppercase in table names.
I, personally, find it more pleasant to read TitleCasing / camelCasing even on filenames, but when you do anything that can work cross platform it's safest not to.
There is a difference between the naming convention for the class name and for the file that contains this class. This misunderstanding might come from languages like Java, where it is common to have one file per class.
In Python you can have several classes per module (a simple .py file). The classes in this module/file should be named according to the class naming convention: Class names should normally use the CapWords convention.
The file containing these classes should follow the module naming convention: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
=> class CamelCase should go in the file camelcase.py (or camel_case.py if necessary)
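A minimal illustration of the two conventions side by side (the file and class names here are made up):

# sample_widget.py  -- module name: short, lowercase, underscores if needed
class SampleWidget:    # class name: CapWords
    pass

# elsewhere:
# from sample_widget import SampleWidget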
My question is, is it also the normal convention to have the file that
contains the class also be camelCase'd especially if the file only
contains the class
Short answer: No.
Longer answer: should be all lower case and underscores as needed.
From PEP8 "Package and Module Names":
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability. Python packages
should also have short, all-lowercase names, although the use of
underscores is discouraged.
If you're unclear what a module is:
A module is a file containing Python definitions and statements. The
file name is the module name with the suffix .py appended.
First of all, as mentioned above, class names should be CapWords, e.g.:
class SampleClass:
    ...
BEWARE: Having the same name for the file (module) and the class creates confusion.
Example 1: Say you have the following module structure:
src/
    __init__.py
    SampleClass.py
    main.py
Your SampleClass.py is:
class SampleClass:
    ...
Your main.py is:
from src import SampleClass
instance = SampleClass()
Will this code work? NO, because you should have done either from src.SampleClass import SampleClass or instance = SampleClass.SampleClass(). Awkward code, isn't it?
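Spelled out, those two workarounds would look like this:

# main.py -- variant 1: import the class out of the SampleClass module
from src.SampleClass import SampleClass
instance = SampleClass()

# main.py -- variant 2: keep the module import and qualify the class
from src import SampleClass
instance = SampleClass.SampleClass()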
You can also fix it by adding the following content to __init__.py:
from .SampleClass import SampleClass
Which leads to the Example 2.
Example 2: Say you develop a module:
src/
    __init__.py
    BaseClass.py
    ConcreteClass.py
    main.py
BaseClass.py content:
class BaseClass:
    ...
ConcreteClass.py content:
from src import BaseClass

class ConcreteClass(BaseClass):
    ...
And your __init__.py content:
from .ConcreteClass import ConcreteClass
from .BaseClass import BaseClass
And main.py content is:
from src import ConcreteClass
instance = ConcreteClass()
The code fails with an error:
    class ConcreteClass(BaseClass):
TypeError: module() takes at most 2 arguments (3 given)
It took me a while to understand the error and why I could not inherit from the class, because in the previous example, when I added the exports to the __init__.py file, everything worked. If you use snake-case file names it does not fix the problem, but the error is a bit easier to understand:
ImportError: cannot import name 'BaseClass' from partially initialized module 'src'
To fix the code you need to fix the import in ConcreteClass.py to be: from .BaseClass import BaseClass.
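For completeness, the corrected ConcreteClass.py:

# src/ConcreteClass.py
from .BaseClass import BaseClass   # import the class itself, not the module

class ConcreteClass(BaseClass):
    ...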
Last caveat: if, in the original code, you swap the order of the imports in __init__.py so it looks like:
from .BaseClass import BaseClass
from .ConcreteClass import ConcreteClass
The initial code works, but you really don't want anyone to write code that depends on the order of imports. If someone changes the order or applies the isort tool to organize imports, good luck fixing those bugs.

Python namespace elements visibility (vs proper package structure)

Python3
I tried to find an answer but failed. First I'll present the snippet, then I'll explain why I wanted to do it this way and what I wanted to achieve. Maybe it'll turn out this approach is "the bad one".
Hence this semi-double topic, because first I'd like to know why this snippet isn't working, and second I'd like to know whether this approach is right.
So:
class Namespace:
    def some_function():
        pass

    class SomeClass:
        fcnt = some_function
This won't work due to:
NameError: name 'some_function' is not defined
What I want to achieve is code and file structure readability.
The above example is a snippet like the one I use (not this exact one, but it looks like this) in a Pyramid project.
My project tree looks like this:
my_project
├── models
│   ├── __init__.py
│   └── some_model.py
├── schemas
│   ├── __init__.py
│   ├── some_schema.py
│   └── some_other_schema.py
...
├── views
│   ├── __init__.py
│   └── some_view.py
└── __init__.py
What I wanted to achieve is clean schema/model/view importing.
In some_schema.py file resides class SomeSchema, in some_other_schema.py class SomeOtherSchema.
With the above snippet I can do:
from my_project.schemas.some_schema import Schema
and use it like Schema.SomeSchema()
I've got a little bit lost with packages and imports. How could one make a clean structure (one schema per file) and still be able to use a Schema namespace? (In C++ I'd just put each of those classes in a Schema namespace; that's why I did this in the snippet above. But! What works in C++ maybe shouldn't be used in Python, right?)
Thanks for answer in advance.
EDIT:
Ok, I've done some testing (I thought that I'd already done it, but apparently not...).
1. Using from my_project.schemas.some_schema import Schema together with from my_project.schemas.some_other_schema import Schema causes the second import to shadow the first one. So if after the first import I'd be able to use x = Schema.SomeSchema(), then after the second import I'd be unable to do this, because the class Schema gets overridden. Right, so as Erik said - classes aren't namespaces. GOT IT!
2. In my very first snippet, yes, I should've used fcnt = Namespace.some_function. What's weird - it works. I have the same statement in my Pyramid code, with one difference: some_function has the @colander.deferred decorator. In fact it looks like this:
class Schema:
    @colander.deferred
    def deferred_some_function(node, kw):
        something = kw.get("something", [])
        return deform.widget.SelectWidget(values=something,
                                          multiple=True)

    class SomeSchema(colander.MappingSchema):
        somethings = colander.SchemaNode(colander.Set(),
                                         widget=Schema.deferred_some_function)
And I get NameError: name 'Schema' is not defined
3. Getting back to the package format. With this:
### another/file.py
from foo.bar.schema import SomeSchema
# do something with SomeSchema:
smth = SomeSchema()
smth.fcnt()
I have to make one module foo/bar/schema.py in which I'd have to put all my SomeXSchema classes. And if I have lots of them, then there's the unreadability issue which I wanted to get rid of by splitting SomeXSchema - one per file. Can I accomplish this somehow? I want to call this class, for example, User. And here's the THING. Maybe I'm doing it wrong? I'd like to have a class named User in the schema namespace and a class named User in the model namespace. Shouldn't I? Maybe I ought to use a prefix? Like class SchemaUser and class ModelUser? I wanted to avoid that by using modules/packages.
If I used import foo.bar.schema, then I'd have to use it like x = foo.bar.schema.User(), right? There is no way to use it like x = schema.User()? Sorry, I just got stuck; my brain locked up. Maybe I need a little break to take a fresh look?
ANOTHER EDIT (FOR POINT 3 ONLY)
I did some more research. The answer here would be to make it like this:
## file: myproject/schemas/__init__.py
from .some_schema import SomeSchema
from .some_other_schema import SomeOtherSchema
then usage would be like this:
## some file using it
import myproject.schemas as schema
s1 = schema.SomeSchema()
s2 = schema.SomeOtherSchema()
Would it be lege artis?
If anyone thinks that topic should be changed - go ahead, give me something more meaningful, I'd appreciate it.
You are swimming upstream by trying to do what you are trying to do.
Classes are meant for defining new data types, not as a means to group related parts of code together. Modules are perfectly suited for that, and I presume you know that well because of the "(vs proper package structure)" part in the question title.
Modules can also be imported as objects, so to achieve what you want:
### foo/bar/schema.py
def some_function():
    pass

class SomeSchema:
    fcnt = staticmethod(some_function)   # wrapped so it can also be called on instances

### another/file.py
from foo.bar import schema

# do something with SomeSchema:
smth = schema.SomeSchema()
smth.fcnt()
...although it's also typical to import classes directly into the scope like this (i.e. being able to refer to SomeSchema after the import as opposed to schema.SomeSchema):
### another/file.py
from foo.bar.schema import SomeSchema
# do something with SomeSchema:
smth = SomeSchema()
smth.fcnt()
(Also note that module names should be lowercase as suggested by PEP8 and only class names should use PascalCase)
This, by the way, applies to programming in general, not just Python. There are a few languages such as Java and C# which require that functions be declared inside of classes as statics because they disallow writing of code outside of classes for some weird reason, but even these languages have modules/proper namespaces for structuring your code; i.e. classes are not normally put inside other classes (they sometimes are, but for wholly different reasons/goals than yours).
So basically "class" means a "type" or "a set of objects having similar behavior"; once you ignore that principle/definition, you're writing bad code by definition.
PS. if you are using Python 2.x, you should be inheriting your classes from object so as to get new-style classes.
PPS. in any case, even technically speaking, what you are trying to do won't work cleanly in Python:
class fake_namespace:
    def some_function():
        pass

    class RealClass:
        some_function  # <-- that name is not even visible here;
                       # you'd have to use fake_namespace.some_function instead
...and this is the reason for the exception reported above: NameError: name 'some_function' is not defined.
EDIT AS PER YOUR EDITS:
I'm not really sure why you're making it so complicated; also some of your statements are false:
If I'd use : import foo.bar.schema then I'd have to use it like x = foo.bar.schema.User right?
No. Please learn how Python modules work.
I'd like to have class named User in Schema namespace and class named User in Model namespace. Shouldn't I? Maybe I ought to use prefix? Like class SchemaUser and class ModelUser
please note that namespaces a.k.a. modules should be lowercase not PascalCase.
And if I have lots of them, then there's the unreadability issue which I wanted to get rid of by splitting SomeXSchema - one per file. Can I accomplish this somehow?
Yes; you can put your classes in individual submodules, e.g. schema/class1.py, schema/class2.py, etc.; then you can "collect" them into schema/__init__.py so that you can import them directly from schema:
# schema/__init__.py
from .class1 import Class1
from .class2 import Class2

__all__ = ["Class1", "Class2"]  # optional
General note: you can name your schema modules differently, e.g. schema1, schema2, etc; then you could just use them like this:
from somewhere import schema1
from somewhere_else import schema2
s1_user = schema1.User()
s2_user = schema2.User()
# etc
For more information on how Python modules work, refer to http://docs.python.org/2/tutorial/modules.html
Naming and binding
You can read the Python documentation on naming and binding to understand how Python namespaces work.
A scope defines the visibility of a name within a block. If a local variable is defined in a block, its scope includes that block. If the definition occurs in a function block, the scope extends to any blocks contained within the defining one, unless a contained block introduces a different binding for the name. The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods. This includes generator expressions, since they are implemented using a function scope.
BTW, using globals() and locals() can help when debugging variable binding.
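A tiny sketch of that rule (the names here are invented): a name assigned in a class block is not visible as a bare name inside the methods of that class; it has to be reached through the class or the instance.

class Config:
    default_name = "guest"            # defined in the class block

    def greet(self):
        # print(default_name)         # NameError: not in scope here
        print("hello,", Config.default_name)   # reach it through the class (or self)
        # print(locals())             # handy when debugging what is actually in scope

Config().greet()                      # -> hello, guest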
The User Problem
You can try this instead:
from model import User as modelUser
from foo.bar.schema import User as schemaUser

Python - when is 'import' required?

mod1.py
import mod2

class Universe:
    def __init__(self):
        pass

    def answer(self):
        return 42

u = Universe()
mod2.show_answer(u)
mod2.py
# import mod1 -- not necessary
def show_answer(thing):
    print(thing.answer())
Coming from a C++ background I had the feeling it was necessary to import the module containing the Universe class definition before the show_answer function would work. I.e. everything had to be declared before it could be used.
Am I right in thinking this isn't necessary? This is duck typing, right? So if an import isn't required to see the methods of a class, I'd at least need it for the class definition itself and the top level functions of a module?
In one script I've written, I even went as far as writing a base class to declare an interface with a set of methods, and then deriving concrete classes to inherit that interface, but I think I get it now - that's just wrong in Python, and whether an object has a particular method is checked at runtime at the point where the call is made?
I realise Python is so much more dynamic than C++, it's taken me a while to see how little code you actually need to write!
I think I know the answer to this question, but I just wanted to get clarification and make sure I was on the right track.
UPDATE: Thanks for all the answers, I think I should clarify my question now:
Does mod2.show_answer() need an import (of any description) to know that thing has a method called answer(), or is that determined dynamically at runtime?
In this case you're right: show_answer() is given an object, of which it calls the method "answer". As long as the object given to show_answer() has such a method, it doesn't matter where the object comes from.
If, however, you wanted to create an instance of Universe inside mod2, you'd have to import mod1, because Universe is not in the mod2 namespace, even after mod2 has been imported by mod1.
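As a sketch of that second case (this variant is not from the original answer): one common way is to defer the import into the function that actually needs Universe, which also sidesteps the circular-import problem discussed below.

# mod2.py -- hypothetical variant that constructs a Universe itself
def show_answer(thing):
    print(thing.answer())

def make_universe_and_show():
    import mod1                 # deferred import: avoids problems while mod1 is still loading
    u = mod1.Universe()
    show_answer(u)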
import is all about names -- mostly "bare names" that are bound at top level (AKA global level, AKA module-level names) in a certain module, say mod2. When you've done import mod2, you get the mod2 namespace as an available name (top-level in your own module, if you're doing the import itself as top level, as is most common; but a local import within a function would make mod2 a local variable of that function, etc); and therefore you can use mod2.foobar to access the name foobar that's bound at top level in mod2. If you have no need to access such names, then you have no need to import mod2 in your own module.
Think of import as being more like the linker.
With import mod2 you are simply telling Python that it can find the function in the file mod2.py.
Actually, in this case, importing mod1 in mod2.py should not work.
Would it not create a circular reference?
In fact, according to this explanation, the circular import will not work the way you want it to work: if you uncomment import mod1, the second module will still not know about the Universe.
I think this is quite reasonable. If both of your files need access to the type of some specific object, like Universe, you have several choices:
if your program is small, just use one file
if it's big, you need to decide if your files both need to know how Universe is implemented, perhaps passing an object of not-yet-known type to show_answer is fine
if that doesn't work for you, by all means put Universe in a separate module and load it first.
import in Python loads the module into the given namespace. As such, it is as if def show_answer actually existed in the mod1.py module. Because of this, mod2.py does not need to know of the Universe class, and thus you do not need to import mod1 from mod2.py.
I don't know much about C++, so can't directly compare it, but..
import basically loads the other Python script (mod2.py) into the current script (the top level of mod1.py). It's not so much a link, it's closer to an eval
For example, in Python-ish pseudo-code:
eval("mod2.py")
is the same as..
from mod2 import *
..it executes mod2.py, and makes the functions/classes defined accessible in the current script.
Both of the above snippets would allow you to call show_answer() (well, eval doesn't quite work like that, which is why I called it pseudo-code!)
import mod2
..is basically the same, but instead of bringing in all the functions into the "top level", it brings them into the mod2 module, so you call show_answer by doing..
mod2.show_answer
Am I right in thinking [the import in mod2.py] isn't necessary?
Absolutely. In fact, if you try to import mod1 from mod2 you get a circular dependency error (since mod1 imports mod2, which imports mod1, and so on...).
