I often end up in a situation where one package needs to use a sibling package. I want to clarify that I'm not asking about how Python allows you to import sibling packages, which has been asked many times. Instead, my question is about a best practice for writing maintainable code.
Let's say we have a tools package, and the function tools.parse_name() depends on tools.split_name(). Initially, both might live in the same file where everything is easy:
# tools/__init__.py
from .name import parse_name, split_name
# tools/name.py
def parse_name(name):
    splits = split_name(name)  # Can access from same file.
    return do_something_with_splits(splits)

def split_name(name):
    return do_something_with_name(name)
Now, at some point we decide that the functions have grown and split them into two files:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
import tools
def parse_name(name):
    splits = tools.split_name(name)  # Won't work because of import order!
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
The problem is that parse_name.py can't just import the tools package that it is itself part of. At least, this won't allow it to use names listed below its own line in tools/__init__.py.
The technical solution is to import tools.split_name rather than tools:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
import tools.split_name as tools_split_name
def parse_name(name):
    splits = tools_split_name.split_name(name)  # Works but ugly!
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
This solution technically works, but it quickly becomes messy if more than one sibling module is used. Moreover, renaming the package tools to utilities would be a nightmare, since all the module aliases would have to change as well.
I would like to avoid importing functions directly and instead import packages, so that it is clear where a function comes from when reading the code. How can I handle this situation in a readable and maintainable way?
I could literally ask you what syntax you need and provide it. I won't, but you can do it yourself too.
"The problem is that parse_name.py can't just import the tools package which is part of itself."
That looks like a wrong and strange thing to do, indeed.
"At least, this won't allow it to use tools listed below its own line in tools/__init__.py"
Agreed, but again, we don't need that, if things are structured properly.
To simplify the discussion and reduce the degrees of freedom, I assume several things in the example below.
You can then adapt it to different but similar scenarios, because you can modify the code to fit your import syntax requirements.
I give some hints for changes at the end.
Scenario:
You want to build an import package named tools.
You have a lot of functions in there that you want to make available to client code in client.py. That file uses the package tools by importing it. To keep things simple, I make all the functions (from everywhere) available under the tools namespace by using the from ... import * form. That is dangerous and should be changed in a real scenario to prevent name clashes with and between subpackage names.
You organize the functions together by grouping them in import packages inside your tools package (subpackages).
The subpackages have (by definition) their own folder and at least an __init__.py inside. I chose to put each subpackage's code in a single module in its folder, besides the __init__.py. You can have more modules and/or inner packages.
.
├── client.py
└── tools
├── __init__.py
├── splitter
│ ├── __init__.py
│ └── splitter.py
└── formatter
├── __init__.py
└── formatter.py
I keep the __init__.py files empty, except for the outermost one, which is responsible for making all the wanted names available, in the tools namespace, to the importing client code.
This can be changed of course.
# tools/__init__.py
# note that relative imports avoid using the outer package name
# which is good if you later change your mind about its name
from .splitter.splitter import *
from .formatter.formatter import *
# client.py
# this is user code
import tools
text = "foo bar"
splits = tools.split(text) # the two funcs come from different subpackages
text = tools.titlefy(text)
print(splits)
print(text)
# tools/formatter/formatter.py
from ..splitter import splitter # tools.formatter's sibling subpackage splitter, module splitter
def titlefy(name):
    splits = splitter.split(name)
    return ' '.join([s.title() for s in splits])
# tools/splitter/splitter.py
def split(name):
    return name.split()
You can actually tailor the import syntax to your taste, to answer your comment about what the imports look like.
The from ... import form is needed for relative imports. Otherwise, use absolute imports by prefixing the path with tools.
The __init__.py files can be used to adjust the names imported into the importer code, or to initialize the module. They can also be empty, or even start out as the only file in the subpackage, with all the code in it, and later be split into other modules, though I don't like this "everything in __init__.py" approach as much.
They are just code that runs on import.
You can also avoid repeated names in import paths (such as splitter.splitter) by using different names, by putting everything in __init__.py and dropping the module with the repeated name, or by using aliases or name assignments in the __init__.py imports. You may also limit what gets exported when the importer uses the * form by assigning names to an __all__ list.
A change you might want for safer readability is to force client.py to specify the subpackage when using names, that is:
name1 = tools.splitter.split('foo bar')
Change the __init__.py to import only the submodules, like this:
from .splitter import splitter
from .formatter import formatter
I'm not proposing this to be actually used in practice, but just for fun, here is a solution using pkgutil and inspect:
import inspect
import os
import pkgutil
def import_siblings(filepath):
    """Import and combine names from all sibling packages of a file."""
    path = os.path.dirname(os.path.abspath(filepath))
    merged = type('MergedModule', (object,), {})
    for importer, module, _ in pkgutil.iter_modules([path]):
        if module + '.py' == os.path.basename(filepath):
            continue
        sibling = importer.find_module(module).load_module(module)
        for name, member in inspect.getmembers(sibling):
            if name.startswith('__'):
                continue
            if hasattr(merged, name):
                message = "Two sibling packages define the same name '{}'."
                raise KeyError(message.format(name))
            setattr(merged, name, member)
    return merged
The example from the question becomes:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
tools = import_siblings(__file__)
def parse_name(name):
    splits = tools.split_name(name)  # Same usage as if this was an external module.
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
Related
I am developing a program that I plan to distribute to multiple users. This consists of a folder containing the following files:
convert_units.py
fit_parameters.py
plot_results.py
top_level_module.py
The purpose of "top_level_module.py" is to import the other modules and call them as needed, as shown below:
from convert_units import convert_units
from fit_parameters import fit_parameters
from plot_results import plot_results
input_data = "somefile.csv"
intermediate_result = convert_units(input_data)
final_result = fit_parameters(intermediate_result)
plot_results(final_result)
The goal of naming the above file "top_level_module" is to help users who wish to inspect the code. Specifically, my hope is that when users see this filename they will immediately realise that this is the highest level module, and hence the correct file to read first. However, the name "top_level_module" seems verbose, and I am wondering if another name is already in common use for this purpose.
So my question is: Does anyone know if there is a convention for naming the top level module? Or if any other name would be more widely intuitive?
One approach that is used often is to have a folder named after your module, with the submodules (convert_units.py etc.) as files in that folder, and an __init__.py file where you put the code from the main module instead of in a separate top_level_module.py file.
Your submodules can even have their own submodules, and so on, for bigger projects. For this you would have, instead of the convert_units.py file, a convert_units folder that again contains an __init__.py file with the module-level code, plus other files or folders for the submodules of this submodule convert_units.
Your file structure could in the end look something like this:
my_fancy_library # name folder after what you want to call your module
├── __init__.py # module-level code goes here (<- top_level_module.py)
├── fit_parameters.py # smaller submodule that fits inside one file
├── plot_results.py
├── convert_units # bigger submodule with its own submodules
│ ├── __init__.py # module-level code of submodule
│ ├── validator.py
│ ├── custom_exceptions.py
...
The __init__.py file of your main module can then look like this to implement your example:
from .convert_units import convert_units
from .fit_parameters import fit_parameters
from .plot_results import plot_results
def fit_data_and_plot(input_data):
    intermediate_result = convert_units(input_data)
    final_result = fit_parameters(intermediate_result)
    plot_results(final_result)
It is better to wrap your code inside a function, otherwise it will be executed during the import. You then use this function in another module by importing it (from my_fancy_library import fit_data_and_plot) and calling it with the input path (fit_data_and_plot(input_data)).
You can also look at the source code of other libraries to get an idea. This can be really confusing for bigger libraries like numpy, so I would start with a smaller example like dateutil.
I am writing a pytest plugin that should test software that's designed to work inside a set of specific environments.
The software I'm writing is run inside a bigger framework, which makes certain Python modules available only when running my Python software inside the framework.
In order to test my software, I'm required to "mock" or fake an entire module (actually, quite a few). I'll need to implement its functionality in some kind of similar-looking way, but my question is how should I make this fake Python module available to my software's code, using a py.test plugin?
For example, let's assume I have the following code in one of my source files:
import fwlib
def fw_sum(a, b):
    return fwlib.sum(a, b)
However, the fwlib module is only made available by the framework I run my software from, and I cannot test inside it.
How would I make sure, from within a pytest plugin, that a module named fwlib is already defined in sys.modules? Granted, I'll need to implement fwlib.sum myself. I'm looking for recommendations on how to do just that.
pytest provides a fixture for this use-case: monkeypatch.syspath_prepend.
You may prepend a path to sys.path list of import locations. Write a fake fwlib.py and include it in your tests, appending the directory as necessary. Like the other test modules, it needn't be included with the distribution.
After playing with this myself, I couldn't actually figure out how to get the fixture to mock module-level imports correctly from the library code. By the time the tests run, the library code has already been imported, and then it is too late to patch.
However, I can offer a different solution that works: you may inject the name from within conftest.py, which gets imported first. The subsequent import statement within the code under test will just re-use the object already present in sys.modules.
Package structure:
$ tree .
.
├── conftest.py
├── lib
│ └── my_lib.py
└── tests
└── test_my_lib.py
2 directories, 3 files
Contents of files:
# conftest.py
import sys
def fwlib_sum(a, b):
    return a + b
module = type(sys)('fwlib')
module.sum = fwlib_sum
sys.modules['fwlib'] = module
library file:
# lib/my_lib.py
import fwlib
def fw_sum(a, b):
    return fwlib.sum(a, b)
test file:
# tests/test_my_lib.py
import my_lib
def test_sum():
    assert my_lib.fw_sum(1, 2) == 3
Just to provide a little more detail to wim's good answer, you can use it with submodules too, like so:
import sys
module = type(sys)("my_module_name")
module.submodule = type(sys)("my_submodule_name")
module.submodule.something = something
sys.modules["my_module_name"] = module
sys.modules["my_module_name.my_submodule_name"] = module.submodule
I'd think that this could be answered easily, but it isn't. For as long as I've been searching for an answer, I keep thinking that I'm overlooking something simple.
I have a python workspace with the following package structure:
MyTestProject
    /src
        /TestProjectNamespace
            __init__.py
            Module_A.py
            Module_B.py
SecondTestProject
    /src
        /SecondTestProjectNamespace
            __init__.py
            Module_1.py
            Module_2.py
            ...
            Module_10.py
Note that TestProjectNamespace has a reference to SecondTestProjectNamespace.
In TestProjectNamespace, I need to import everything from SecondTestProjectNamespace. I could import one module at a time with the following statement(s):
from SecondTestProjectNamespace.Module_1 import *
from SecondTestProjectNamespace.Module_2 import *
...but this isn't practical if the SecondTestProject has 50 modules in it.
Does Python support a way to import everything in a namespace / package? Any help would be appreciated.
Thanks in advance.
Yes, you can roll this using pkgutil.
Here's an example that lists all packages under twisted (except tests), and imports them:
# -*- Mode: Python -*-
# vi:si:et:sw=4:sts=4:ts=4
import pkgutil
import twisted
for importer, modname, ispkg in pkgutil.walk_packages(
        path=twisted.__path__,
        prefix=twisted.__name__ + '.',
        onerror=lambda x: None):
    # skip tests
    if modname.find('test') > -1:
        continue
    print(modname)
    # gloss over import errors
    try:
        __import__(modname)
    except Exception:
        print('Failed importing', modname)

# show that we actually imported all these, by showing one subpackage is imported
print(twisted.python)
I have to agree with the other posters that star imports are a bad idea.
No. It is possible to set up SecondTestProject to automatically import everything in its submodules, by putting code in __init__.py to do the from ... import * you mention. It's also possible to automate this to some extent using the __import__ function and/or the imp module. But there is no quick and easy way to take a package that isn't set up this way and make it work this way.
It's probably not a good idea anyway. If you have 50 modules, importing everything from all of them into your global namespace is going to cause a proliferation of names, and very likely conflicts among those names.
As others have put it, it might not be a good idea. But there are ways of keeping your namespaces (and therefore avoiding naming conflicts) while still having all the modules/sub-packages of a package available to the package user with a single import.
Let's suppose I have a package named "pack", and within it a module named "a.py" defining a "b" variable. All I want to do is:
>>> import pack
>>> pack.a.b
1
One way of doing this is to put a line in pack/__init__.py that says from . import a (under Python 2, a plain import a also works via implicit relative imports); in your case you'd need fifty such lines, and would have to keep them up to date.
Not that bad.
However, the documentation at http://docs.python.org/tutorial/modules.html#importing-from-a-package - says that if you have a string list named __all__ in your __init__.py file, all module/sub-package names in that list are imported when one does from pack import *
That alone would half-work - but would require users of your package to perform the not-recommended "from x import *" form.
But -- you can do the "... import *" inside __init__.py itself, after defining the __all__ variable - so all you have to do is to keep the __all__ up to date:
With SecondTestProjectNamespace/__init__.py being like this:
__all__ = ["Module_1", "Module_2", ...]
from SecondTestProjectNamespace import *
your users would have
SecondTestProjectNamespace.Module_1 (and the others) available upon import of SecondTestProjectNamespace.
And, of course - you could automate the creation of __all__ - it is just a variable, after all - but I would not recommend that.
Does Python support a way to import everything in a namespace / package?
No. A package is not a super-module -- it's a collection of modules grouped together.
At least part of the reason is that it's not trivial to determine what 'everything' means inside a folder: there are problems like network drives, soft links, hard links, ...
Assume I have this barebones structure:
project/
    main.py
    providers/
        __init__.py
        acme1.py
        acme2.py
        acme3.py
        acme4.py
        acme5.py
        acme6.py
Assume that main.py contains (partial):
if complexcondition():
    print(providers.acme5.get())
Where __init__.py is empty and acme*.py contain (partial):
def get():
    value = complexcalculation()
    return value
How do I change these files to work?
Note: If the answer is "import acme1", "import acme2", and so on in __init__.py, is there a way to accomplish that without listing them all by hand?
Hey! Two years later, but... maybe this could be helpful to someone.
Make your providers/__init__.py like this:
import os
import glob
module_path = os.path.dirname(__file__)
files = glob.glob(os.path.join(module_path, 'acme*.py'))
__all__ = [os.path.basename(f)[:-3] for f in files]
You don't have to change it later if you add or remove any providers/acme*.py files.
Then use from providers import * in main.py.
If I'm reading your question correctly, it looks like you're not trying to do any dynamic importing (like in the question that Van Gale mentioned) but are actually trying to just import all of the modules in the providers package. If that's the case, in __init__.py you would want to have this statement:
__all__ = ["acme1", "acme2", "acme3", "acme4", "acme5", "acme6"]
Then to import everything you would use from ... import *
from providers import *
And then instead of using the package name explicitly in the code, you would just call the imported modules
acme1.get()
acme2.get()
If you have enough modules in the providers package that it becomes a problem populating the __all__ list, you may want to look into breaking them up into smaller packages or storing the data some other way. I personally wouldn't want to have to deal with dynamic importing shenanigans every time I wanted to re-use the package.
This question asked today, Dynamic Loading of Python Modules, should have your answer.
I'm taking a look at how the model system in django works and I noticed something that I don't understand.
I know that you create an empty __init__.py file to specify that the current directory is a package. And that you can set some variable in __init__.py so that import * works properly.
But django adds a bunch of from ... import ... statements and defines a bunch of classes in __init__.py. Why? Doesn't this just make things look messy? Is there a reason that requires this code in __init__.py?
All imports in __init__.py are made available when you import the package (directory) that contains it.
Example:
./dir/__init__.py:
import something
./test.py:
import dir
# can now use dir.something
EDIT: forgot to mention, the code in __init__.py runs the first time you import any module from that directory. So it's normally a good place to put any package-level initialisation code.
EDIT2: dgrant pointed out a possible confusion in my example. In __init__.py, import something can import any module, not necessarily one from the package. For example, we can replace it with import datetime; then, in our top-level test.py, both of these snippets will work:
import dir
print(dir.datetime.datetime.now())
and
import dir.some_module_in_dir
print(dir.datetime.datetime.now())
The bottom line is: all names assigned in __init__.py, be it imported modules, functions or classes, are automatically available in the package namespace whenever you import the package or a module in the package.
It's just personal preference really, and has to do with the layout of your python modules.
Let's say you have a module called erikutils. There are two ways that it can be a module, either you have a file called erikutils.py on your sys.path or you have a directory called erikutils on your sys.path with an empty __init__.py file inside it. Then let's say you have a bunch of modules called fileutils, procutils, parseutils and you want those to be sub-modules under erikutils. So you make some .py files called fileutils.py, procutils.py, and parseutils.py:
erikutils
    __init__.py
    fileutils.py
    procutils.py
    parseutils.py
Maybe you have a few functions that just don't belong in the fileutils, procutils, or parseutils modules. And let's say you don't feel like creating a new module called miscutils. AND, you'd like to be able to call the functions like so:
erikutils.foo()
erikutils.bar()
rather than doing
erikutils.miscutils.foo()
erikutils.miscutils.bar()
So because the erikutils module is a directory, not a file, we have to define its functions inside the __init__.py file.
In django, the best example I can think of is django.db.models.fields. ALL the django *Field classes are defined in the __init__.py file in the django/db/models/fields directory. I guess they did this because they didn't want to cram everything into a hypothetical django/db/models/fields.py module, so they split it out into a few submodules (related.py and files.py, for example) and stuck the *Field definitions themselves in the fields module itself (hence, __init__.py).
Using the __init__.py file allows you to make the internal package structure invisible from the outside. If the internal structure changes (e.g. because you split one fat module into two) you only have to adjust the __init__.py file, but not the code that depends on the package. You can also make parts of your package invisible, e.g. if they are not ready for general usage.
Note that you can use the del command, so a typical __init__.py may look like this:
from somemodule import some_function1, some_function2, SomeObject
del somemodule
Now if you decide to split somemodule the new __init__.py might be:
from somemodule1 import some_function1, some_function2
from somemodule2 import SomeObject
del somemodule1
del somemodule2
From the outside the package still looks exactly as before.
"We recommend not putting much code in an __init__.py file, though. Programmers do not expect actual logic to happen in this file, and much like with from x import *, it can trip them up if they are looking for the declaration of a particular piece of code and can't find it until they check __init__.py. "
-- Python Object-Oriented Programming, Fourth Edition, by Steven F. Lott and Dusty Phillips