I am developing a program that I plan to distribute to multiple users. This consists of a folder containing the following files:
convert_units.py
fit_parameters.py
plot_results.py
top_level_module.py
The purpose of "top_level_module.py" is to import the other modules and call them as needed, as shown below:
from convert_units import convert_units
from fit_parameters import fit_parameters
from plot_results import plot_results
input_data = "somefile.csv"
intermediate_result = convert_units(input_data)
final_result = fit_parameters(intermediate_result)
plot_results(final_result)
The goal of naming the above file "top_level_module" is to help users who wish to inspect the code. Specifically, my hope is that when users see this filename they will immediately realise that this is the highest level module, and hence the correct file to read first. However, the name "top_level_module" seems verbose, and I am wondering if another name is already in common use for this purpose.
So my question is: Does anyone know if there is a convention for naming the top level module? Or if any other name would be more widely intuitive?
One approach that is often used is to have a folder named after your package, with the submodules (convert_units.py etc.) as files in that folder, plus an __init__.py file where you put the code from your main module instead of keeping it in a separate top_level_module.py file.
For bigger projects, your submodules can even have their own submodules, and so on. In that case, instead of a convert_units.py file you would have a convert_units folder, again containing an __init__.py file with the module-level code, plus other files or folders for the submodules of convert_units.
In the end, your file structure could look something like this:
my_fancy_library # name folder after what you want to call your module
├── __init__.py # module-level code goes here (<- top_level_module.py)
├── fit_parameters.py # smaller submodule that fits in one file
├── plot_results.py
├── convert_units # bigger submodule with its own submodules
│   ├── __init__.py # module-level code of the submodule
│   ├── validator.py
│   ├── custom_exceptions.py
...
The __init__.py file of your main module can then look like this to implement your example:
from .convert_units import convert_units
from .fit_parameters import fit_parameters
from .plot_results import plot_results
def fit_data_and_plot(input_data):
    intermediate_result = convert_units(input_data)
    final_result = fit_parameters(intermediate_result)
    plot_results(final_result)
It is better to wrap your code in a function; otherwise it will be executed during the import. You then use this function from another module by importing it (from my_fancy_library import fit_data_and_plot) and calling it with the input path (fit_data_and_plot(input_data)).
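For example, a minimal usage sketch from a user's script could look like this (run_analysis.py is just a placeholder name, and it assumes my_fancy_library is importable from where the script runs):
# run_analysis.py
from my_fancy_library import fit_data_and_plot

fit_data_and_plot("somefile.csv")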
You can also look at the source code of other libraries to get an idea. This can be really confusing for bigger libraries like numpy, so I would start with a smaller example like dateutil.
Related
I have a problem: I need to use some external code as part of my own project. This seems to be a relatively common use case in machine learning research when trying to build new experiments on top of previously published solutions/models.
The file structure is as follows:
project_root/
├── external/
│   └── SomeoneElsesCode/
│       └── src/
│           └── dir1/
│               └── subdir1/
│                   ├── codeineed.py
│                   └── anciliarycode.py
└── src/
    └── MyModule/
        └── mycode.py
When trying to run the line
from external.SomeoneElsesCode.src.dir1.subdir1.codeineed import NeededClass
in mycode.py, I run into a problem with the line
from src.dir1.subdir1.anciliarycode import AncilliaryClass
in codeineed.py, since the external code uses absolute import paths. Since I do not control the code in SomeoneElsesCode, I cannot simply adjust all the import paths there. Is there any way to tell the Python interpreter to "relativize" all paths below SomeoneElsesCode? If not, is there a recommended way of dealing with including external code in Python projects?
I believe that I have found an answer, though I hope that someone can suggest a better approach.
The solution seems to be to replace the built-in __import__() function with a wrapper that preprocesses the module path. This can be done by placing the following __init__.py file inside the external directory:
import builtins

default_import = builtins.__import__

class ExternalImport:
    def __init__(self, default_import) -> None:
        self.default_import = default_import

    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
        if (
            ("src" in name)
            and (globals is not None)
            and (hasattr(globals, "__getitem__"))
            and (globals.get("__package__", None) is not None)
            and ("external" in globals.get("__package__", ""))
        ):
            package = globals["__package__"]
            externalLib = package.replace("external.", "").split(".")[0]
            name = f"external.{externalLib}.{name}"
        return self.default_import(name, globals, locals, fromlist, level)

builtins.__import__ = ExternalImport(default_import)
Essentially, I filter all import statements, looking for those that contain src and originate from the external package, and adjust their names. This is not ideal, since it requires all external packages to store their code in a src directory; however, I have not found a better way of distinguishing these imports from imports of actual packages (both built-in and installed).
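For example, tracing the wrapper through the failing import in codeineed.py (values derived from the code and the directory layout above):
# inside codeineed.py the wrapper sees roughly:
#   name        = "src.dir1.subdir1.anciliarycode"
#   __package__ = "external.SomeoneElsesCode.src.dir1.subdir1"
# so externalLib = "SomeoneElsesCode" and the import is rewritten to
#   "external.SomeoneElsesCode.src.dir1.subdir1.anciliarycode"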
I'm 'reposting' this question with more detail because I feel it was misunderstood the first time. I have a folder structure that looks like so:
folder w space
├── folder1
│   └── subfolder1
│       └── file_1.py
└── folder2
    └── folder w space2
        ├── file_2.py
        └── __init__.py
I need file_1.py to import the methods from file_2.py. Notice that file_2.py, relative to file_1.py, is 3 directories up and then 3 directories down. In theory, I would write the relative import like so:
from ...folder2 import folder w space2.file2
However, this is not valid due to the spaces in the folder name. An absolute import is even worse because the base folder contains spaces too:
from folder w space.folder2.folder w space2.file2
With this, how can I access the contents of file_2.py without:
Renaming the folders (I don't own them, so I can't even if I wanted to)
Using sys.path.append() (it does not work well in our production environment)
Moving file_2.py (for organizational reasons it must stay where it is)
Any help would be immensely appreciated!
You can use importlib.import_module
# bar.py
import importlib
importlib.import_module("folder with spaces.foo")
# folder with spaces/foo.py
print('Hello World')
The built-in __import__() function makes it possible to import module names that contain spaces:
sample_with_spaces = __import__("sample with spaces.foo")
sample_with_spaces.foo.hello()
Found it in this question:
https://stackoverflow.com/a/9123555/6180150
The filename is used as the identifier for imported modules (i.e. foo.py will be imported as foo), and since Python identifiers can't have spaces, this isn't supported by the import statement.
So instead of letting the import statement set the identifier, you do that manually by assigning __import__("sample with spaces.foo") to the identifier you select yourself. In my example it is the identifier sample_with_spaces.
EDIT:
I think that in your case, when executing the script from within a subfolder, the easiest way is to update the working directory as shown below.
import os
os.chdir("..") # move up as much as needed
sample_with_spaces = __import__("sample with spaces.foo")
sample_with_spaces.foo.hello()
I have a plugins package that contains several modules, each defining one class (each class is a plugin).
My package structure looks like this:
plugins
├ __init__.py
├ first_plugin.py
├ second_plugin.py
└ third_plugin.py
And a plugin file typically looks like this, only containing a class definition (and a few imports if necessary):
# in first_plugin.py
class MyFirstPlugin:
    ...
I would like the end user to be able to import a plugin like so:
from plugins import FirstPlugin
instead of having to also type the module name (which is what is currently required):
from plugins.first_plugin import FirstPlugin
Is there a way to achieve this by re-exporting the modules' classes directly in the __init__.py file, without having to import everything module by module like so (which becomes cumbersome when there are lots of modules):
# in __init__.py
from .first_plugin import FirstPlugin
from .second_plugin import SecondPlugin
from .third_plugin import ThirdPlugin
I do not think this is possible in Python. However, you can re-export entire modules so you do not have to import each class individually.
For example, in __init__.py:
from .first_plugin import *
allowing you to do:
from plugins import FirstPlugin  # or anything else defined in first_plugin
It's kind of a pain, but writing libraries is not easy (wait till you use CMake with C/C++, where you have to specify every single file in your source tree :D)
I think you could build on the answers to this post: How to import all submodules?
For example, with pkgutil.walk_packages(__path__) you get a list of the package's modules. Then you could use dir() on each loaded module and import the results (filtering out the names that start with __).
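A rough sketch of that idea could look like the following (untested, and it assumes every public name defined in each plugin module should be re-exported):
# plugins/__init__.py
import importlib
import pkgutil

for _finder, _name, _is_pkg in pkgutil.walk_packages(__path__):
    # import each submodule of this package and copy its public names
    _module = importlib.import_module("." + _name, __package__)
    for _attr in dir(_module):
        if not _attr.startswith("__"):
            globals()[_attr] = getattr(_module, _attr)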
I often end up in a situation where one package needs to use a sibling package. I want to clarify that I'm not asking about how Python allows you to import sibling packages, which has been asked many times. Instead, my question is about a best practice for writing maintainable code.
Let's say we have a tools package, and the function tools.parse_name() depends on tools.split_name(). Initially, both might live in the same file where everything is easy:
# tools/__init__.py
from .name import parse_name, split_name
# tools/name.py
def parse_name(name):
    splits = split_name(name)  # Can access from same file.
    return do_something_with_splits(splits)

def split_name(name):
    return do_something_with_name(name)
Now, at some point, we decide that the functions have grown, and we split them into two files:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
import tools
def parse_name(name):
    splits = tools.split_name(name)  # Won't work because of import order!
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
The problem is that parse_name.py can't simply import the tools package that it is itself part of. At least, doing so won't allow it to use names that are imported below its own line in tools/__init__.py.
The technical solution is to import tools.split_name rather than tools:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
import tools.split_name as tools_split_name
def parse_name(name):
    splits = tools_split_name.split_name(name)  # Works but ugly!
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
This solution technically works, but it quickly becomes messy when more than one sibling module is used. Moreover, renaming the package tools to utilities would be a nightmare, since all the module aliases would have to change as well.
I would like to avoid importing functions directly and instead import packages, so that it is clear where a function came from when reading the code. How can I handle this situation in a readable and maintainable way?
I could literally ask you what syntax you need and provide it. I won't, but you can do it yourself too.
"The problem is that parse_name.py can't just import the tools package which is part of itself."
That looks like a wrong and strange thing to do, indeed.
"At least, this won't allow it to use tools listed below its own line in tools/__init__.py"
Agreed, but again, we don't need that, if things are structured properly.
To simplify the discussion and reduce the degrees of freedom, I assumed several things in the example below.
You can then adapt to different but similar scenarios, because you can modify the code to fit your import syntax requirements.
I give some hints for changes in the end.
Scenario:
You want to build an import package named tools.
You have a lot of functions in there that you want to make available to client code in client.py. This file uses the package tools by importing it. To keep things simple, I make all the functions (from everywhere) available under the tools namespace by using the from ... import * form. That is dangerous and should be changed in a real scenario to prevent name clashes with and between subpackage names.
You organize the functions together by grouping them in import packages inside your tools package (subpackages).
The subpackages have (by definition) their own folder and at least an __init__.py inside. I chose to put each subpackage's code in a single module inside its folder, besides the __init__.py. You can have more modules and/or nested packages.
.
├── client.py
└── tools
    ├── __init__.py
    ├── splitter
    │   ├── __init__.py
    │   └── splitter.py
    └── formatter
        ├── __init__.py
        └── formatter.py
I keep the __init__.py files empty, except for the outermost one, which is responsible for making all the wanted names available to the importing client code, in the tools namespace.
This can be changed of course.
# tools/__init__.py
# note that relative imports avoid using the outer package name
# which is good if later you change your mind for its name
from .splitter.splitter import *
from .formatter.formatter import *
# client.py
# this is user code
import tools
text = "foo bar"
splits = tools.split(text)  # the two funcs came from different subpackages
text = tools.titlefy(text)
print(splits)
print(text)
# tools/formatter/formatter.py
from ..splitter import splitter  # tools.formatter's sibling subpackage splitter, module splitter
def titlefy(name):
    splits = splitter.split(name)
    return ' '.join([s.title() for s in splits])
# tools/splitter/splitter.py
def split(name):
    return name.split()
You can actually tailor the import syntax to your taste, to answer your comment about what the imports look like.
The from form is needed for relative imports. Otherwise, use absolute imports by prefixing the path with tools.
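For instance, the relative import used in formatter.py above could also be written as an absolute import (assuming the tools package is importable from the project root):
# tools/formatter/formatter.py, absolute-import variant
from tools.splitter import splitter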
__init__.py files can be used to adjust which names the importer sees, or to initialize the package. They can also be empty, or start out as the only file in the subpackage, with all the code in it, and later be split into other modules, though I don't like this "everything in __init__.py" approach as much.
They are just code that runs on import.
You can also avoid repeated names in import paths by using different names, by putting everything in __init__.py and dropping the module with the repeated name, or by using aliases or name assignments in the __init__.py imports. You can also limit what gets exported when the importer uses the * form by assigning names to an __all__ list.
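As a small sketch of the __all__ idea, splitter.py could declare which names the * form exports (normalize is a hypothetical helper added only for illustration):
# tools/splitter/splitter.py
__all__ = ['split']  # only split is re-exported by "from ... import *"

def split(name):
    return name.split()

def normalize(name):  # hypothetical helper; not exported, since it is not in __all__
    return name.strip()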
A change you might want for safer readability is to force client.py to specify the subpackage when using names, that is,
name1 = tools.splitter.split('foo bar')
Change the __init__.py to import only the submodules, like this:
from .splitter import splitter
from .formatter import formatter
I'm not proposing this to be actually used in practice, but just for fun, here is a solution using pkgutil and inspect:
import inspect
import os
import pkgutil
def import_siblings(filepath):
    """Import and combine names from all sibling packages of a file."""
    path = os.path.dirname(os.path.abspath(filepath))
    merged = type('MergedModule', (object,), {})
    for importer, module, _ in pkgutil.iter_modules([path]):
        if module + '.py' == os.path.basename(filepath):
            continue
        sibling = importer.find_module(module).load_module(module)
        for name, member in inspect.getmembers(sibling):
            if name.startswith('__'):
                continue
            if hasattr(merged, name):
                message = "Two sibling packages define the same name '{}'."
                raise KeyError(message.format(name))
            setattr(merged, name, member)
    return merged
The example from the question becomes:
# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name
# tools/parse_name.py
tools = import_siblings(__file__)
def parse_name(name):
    splits = tools.split_name(name)  # Same usage as if this was an external module.
    return do_something_with_splits(splits)
# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
I currently have a Python file that needs to import another Python file from the parent directory. The problem is that the file I want to import has a name that starts with a number.
The structure of the files is as follows:
parent/
├── 123.py
└── child/
    └── my_file.py
I want to import the 123.py file, is there a way to achieve this?
No. That is not a valid Python module name. Call it something else, beginning with a letter.
The previous answer is good advice in that, if you are naming a module or variable, you shouldn't start it with a number. But if you have to import a module named like that, it is still possible.
For example, to access variable x from module 123.py:
>>> from importlib import import_module
>>> onetwothree = import_module('123')
>>> onetwothree.x
Regarding the other part of your question, the module being in the parent directory, you can first append that directory to sys.path:
>>> import sys
>>> sys.path.append('path/to/dir')
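Putting the two snippets together, my_file.py could look roughly like this (the path handling is an assumption based on the directory tree in the question, and it assumes 123.py defines a variable x):
# child/my_file.py
import os
import sys
from importlib import import_module

# make the parent directory (the one containing 123.py) importable
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

onetwothree = import_module('123')
print(onetwothree.x)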
Read more about naming conventions in PEP 8.
As mentioned above, it's not preferable to name modules with numbers because it makes things more complicated than they should be. PEP 8 also recommends that module names be lowercase, with underscores used to improve readability. PEP 8 is not a set of rules, but guidelines. If it's absolutely necessary for the module name to be a number, you can follow Lgiro's answer.
Another way you can import is using __init__.py files and relative imports.
parent/
├── __init__.py
├── parent.py
└── child/
    ├── __init__.py
    └── child.py
Then in child.py
from .. import parent
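Note that for this relative import to resolve, child.py needs to be run as a module within the package, for example with python -m parent.child.child from the directory that contains parent/.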
Question related to relative import