relative import from utility script in subdirectory - python

From reading over other answers, it seems that possibly my layout is "un-Pythonic," although I'm really not quite sure. If so that would be helpful to know, along with a suggestion for the better layout.
Here is my script layout:
/
    __init__.py
    main_prog.py
    utilities.py
    /support_scripts
        support_utility1.py
        support_utility2.py
        ...
The support utilities contain functionality that is related to main_prog.py but is best kept in separate scripts. Since there are many of them, I have moved them into their own directory, but they still use some of the same functionality from utilities.py.
When I try to import using from .. import utilities I get the error message "ValueError: attempted relative import beyond top-level package"
Now my first question would simply be: Is it considered bad to try to place additional scripts in subdirectories like this? Knowing a general principle like that would go a long way to solving my problems. And of course if you have any specific suggestions that will help too.
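For what it's worth, here is a hedged sketch of one arrangement under which the relative import does work. The top-level name "myproject" and the extra __init__.py in support_scripts are assumptions, not part of the layout above:

# support_scripts/support_utility1.py -- a sketch, assuming the whole tree lives
# in a named top-level package (hypothetically "myproject") and that
# support_scripts/ also contains an __init__.py:
#
#   myproject/
#       __init__.py
#       main_prog.py
#       utilities.py
#       support_scripts/
#           __init__.py
#           support_utility1.py
#
# The relative import then resolves one level up, provided the utility is run as
# part of the package (python -m myproject.support_scripts.support_utility1,
# launched from the directory containing myproject/), not as a loose script.
from .. import utilities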


what is the correct way to structure a python package for multiple functionalities?
I have a case where my project can be broken into 3 completely separate, logical chunks that are entirely decoupled. As such, I have broken the sub-projects into respective folders in my python package project. This has led to a file structure like this:
foobartoo
|_ setup.py
|_ foobartoo
   |_ foo
   |  |_ foo.py
   |
   |_ bar
   |  |_ bar.py
   |
   |_ too
      |_ too.py
This method keeps things nicely grouped yet separate, but after installing I have noticed a definite problem. The only way to access one of the files is to do
from foobartoo.foo.foo import <method/class>
or
import foobartoo.foo.foo as <something>
<something>.<method/class>
This seems extremely impractical.
I can see two alternatives:

1. Scrapping the folder system and having foo.py, bar.py and too.py in the same directory under foobartoo. This seems bad because it will be impossible to know which file/code belongs to which of the projects.
2. Breaking the single package into multiple packages. This seems OK, but it would be better if the 3 things stayed together, since they are intended to be used together.
I have looked at numpy and its source code, and somehow they seem to have a lot of their functionality behind many folders, yet you don't need to reference any of the folders when importing. This seems ideal: just being able to do something like
import foobartoo
foobartoo.<classname/methodname>()
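For what it's worth, the way numpy-style packages usually get that flat namespace is by re-exporting names in the package's __init__.py. A minimal sketch against the foobartoo layout above (the imported names are illustrative, not from the question):

# foobartoo/__init__.py -- re-export the interesting names so that importing the
# top-level package makes them available as foobartoo.<name>.
from .foo.foo import some_foo_function     # hypothetical names
from .bar.bar import SomeBarClass
from .too.too import some_too_function

After that, import foobartoo; foobartoo.some_foo_function() works without spelling out the folders.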
You can append additional paths to sys.path in your 'main' script so that Python will automatically search those directories for imports:
import sys

sys.path.append(example_sub_dir)   # example_sub_dir: path of the directory to search
import sub_script                  # now resolvable from that directory
sys.path.remove(example_sub_dir)   # optionally remove it again afterwards
Note that you can add many directories to sys.path for searching, but be aware that import time will rise accordingly.
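The snippet above leaves example_sub_dir undefined; a hedged sketch of one way to build it so the lookup does not depend on the current working directory (the folder name "foo" is just an example):

import os
import sys

# Resolve the subdirectory relative to this file, not the working directory.
example_sub_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "foo")
sys.path.append(example_sub_dir)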

Import from parent directory for a test sub-directory without using packaging, Python 2.7

TL;DR
For a fixed and unchangeable non-package directory structure like this:
some_dir/
mod.py
test/
test_mod.py
example_data.txt
what is a non-package way to enable test_mod.py to import from mod.py?
I am restricted to using Python 2.7 in this case.
I want to write a few tests for the functions in mod.py. I want to create a new directory test that sits alongside mod.py and inside it there is one test file test/test_mod.py which should import the functions from mod.py and test them.
Because of the well-known limitations of relative imports, which rely on package-based naming, you can't do this in the straightforward way. Yet all advice on the topic suggests building the script as a package and then using relative imports, which is impossible for my use case.
In my case, it is not allowable for mod.py to be built as a package and I cannot require users of mod.py to install it. They are instead free to merely check the file out from version control and just begin using it however they wish, and I am not able to change that circumstance.
Given this, what is a way to provide a simple, straightforward test directory?
Note: not just a test file that sits alongside mod.py, but an actual test directory since there will be other assets like test data that come with it, and the sub-directory organization is critical.
I apologize if this is a duplicate, but out of the dozen or so permutations of this question I've seen in my research before posting, I haven't seen a single one that addresses how to do this. They all say to use packaging, which is not a permissible option for my case.
Based on @mgilson's comment, I added a file import_helper.py to the test directory.
some_dir/
    mod.py
    test/
        test_mod.py
        import_helper.py
        example_data.txt
Here is the content of import_helper.py:
import sys as _sys
import os.path as _ospath
import inspect as _inspect
from contextlib import contextmanager as _contextmanager

@_contextmanager
def enable_parent_import():
    path_appended = False
    try:
        current_file = _inspect.getfile(_inspect.currentframe())
        current_directory = _ospath.dirname(_ospath.abspath(current_file))
        parent_directory = _ospath.dirname(current_directory)
        _sys.path.insert(0, parent_directory)
        path_appended = True
        yield
    finally:
        if path_appended:
            _sys.path.pop(0)
and then in the import section of test_mod.py, prior to an attempt to import mod.py, I have added:
import unittest

from import_helper import enable_parent_import

with enable_parent_import():
    from mod import some_mod_function_to_test
It is unfortunate to need to mangle sys.path manually, but writing it as a context manager helps a little, and it restores sys.path to the state it had prior to the parent-directory modification.
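For comparison, a hedged sketch of the inline boilerplate the helper replaces, pasted directly at the top of test_mod.py:

import os
import sys

# Insert the parent of the test/ folder (i.e. some_dir/) onto sys.path before
# importing mod; this is the un-encapsulated version of the helper above.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from mod import some_mod_function_to_test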
In order for this solution to scale across multiple instances of this problem (say tomorrow I am asked to write a widget.py module for some unrelated tasks and it also cannot be distributed as a package), I have to replicate my helper function and ensure a copy of it is distributed with any tests, or I have to write that small utility as a package, ensure it gets globally installed across my user base, and then maintain it going forward.
When you manage a lot of Python code internal to a company, often one dysfunctional code distribution mode that occurs is that "installing" some Python code becomes equivalent to checking out the new version from version control.
Since the code is often extremely localized and specific to a small set of tasks for a small subset of a larger team, maintaining the overhead for sharing code via packaging (even if it is a better idea in general) will simply never happen.
As a result, I feel the use case I describe above is extremely common for real-world Python, and it would be nice if the import tooling supported this kind of sys.path modification directly, with sensible default choices (like adding the parent directory) made very easy.
That way you could rely on this at least being part of the standard library, and not needing to roll your own code and ensure it's either shipped with your tests or installed across your user base.

Proper way to import across Python package

Let's say I have a couple of Python packages.
/package_name
    __init__.py
    /dohickey
        __init__.py
        stuff.py
        other_stuff.py
        shiny_stuff.py
    /thingamabob
        __init__.py
        cog_master.py
        round_cogs.py
        teethless_cogs.py
    /utilities
        __init__.py
        important.py
        super_critical_top_secret_cog_blueprints.py
What's the best way to utilize the utilities package? Say shiny_stuff.py needs to import important.py; what's the best way to go about that?
Currently I'm thinking
from .utilities import important
But is that the best way? Would it make more sense to add utilities to the path and import it that way?
import os
import sys

# Climb two directory levels from this file to reach package_name, then add it.
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import utilities.super_critical_top_secret_cog_blueprints
That seems clunky to add to each of my files.
I think the safest way is always to use absolute imports, so in your case:
from package_name.utilities import important
This way you won't have to change your code if you decide to move your shiny_stuff.py to some other package (assuming that package_name will still be in your sys.path).
According to Nick Coghlan (who is a Python core developer):
"“Never add a package directory, or any directory inside a package, directly to the Python path.” (Under the heading "The double import trap")
Adding the package directory to the path gives two separate ways for the module to be referred to. The link above is an excellent blog post about the Python import system. Adding it to the path directly means you can potentially have two copies of a single module, which you don't want. Your relative import from .utilities import important is fine, and an absolute import import package_name.utilities.important is also fine.
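To make the trap concrete, a hedged sketch (the path is illustrative, and it assumes the project root itself is already importable):

# With the package directory itself on sys.path, the same file can be imported
# under two different module names, producing two independent module objects.
import sys

sys.path.append("/path/to/package_name")        # the package dir itself: don't do this

import utilities.important as a                 # loaded as "utilities.important"
import package_name.utilities.important as b    # loaded as "package_name.utilities.important"

print(a is b)   # False: two copies of the same module, each with its own state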
A "best" out-of-context choice probably doesn't exist, but you can have some criteria choosing which is better for your use cases, and for such a judgment one should know are the different possible approaches and their characteristics. Probably the best source of information is the PEP 328 itself, which contains some rationale about declaring distinct possibilities for that.
A common approach is to use the "absolute import", in your case it would be something like:
from package_name.utilities import important
This way, you can also run this file as a script. It is somewhat independent from other modules and packages, tied mainly to its location. If you have a package structure and need to move one single module to another location, its own absolute imports can stay unchanged, but every module that uses it has to change. Of course you can also import the __init__.py files, as:
from package_name import utilities
And these imports have the same characteristics. Be careful: utilities.important tries to find a name important within __init__.py, not the important.py module itself, so having an import important (or from . import important) in __init__.py helps avoid mistakes due to the distinction between file structure and namespace structure.
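A minimal sketch of such an __init__.py for the utilities subpackage:

# package_name/utilities/__init__.py
# Importing the submodule here makes utilities.important resolve to the module
# defined in important.py rather than raising an AttributeError.
from . import important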
Another way to do that is the relative approach, by using:
from ..utilities import important
The first dot (from .stuff import ___ or from . import ___) means "a module in this [sub]package", or the __init__.py itself when there's only the dot. From the second dot onwards we are talking about parent packages. Generally, starting with dots in any import isn't allowed in a script/executable, but you can read about explicit relative imports (PEP 366) if you care about scripts with relative imports.
A justification for relative import can be found on the PEP 328 itself:
With the shift to absolute imports, the question arose whether relative imports should be allowed at all. Several use cases were presented, the most important of which is being able to rearrange the structure of large packages without having to edit sub-packages. In addition, a module inside a package can't easily import itself without relative imports.
In either case, the modules are tied to the subpackages in the sense that package_name is imported first no matter which module the user tried to import first, unless you use sys.path to search for subpackages as packages (i.e., put the package root itself inside sys.path)... but that sounds weird; why would one do that?
The __init__.py can auto-import module names; for that, one should care about its namespace contents. For example, say important.py has an object called top_secret, which is a dictionary. To find it from anywhere you would need:
from package_name.utilities.important import top_secret
Perhaps you want to be less specific:
from package_name.utilities import top_secret
That would be done with an __init__.py with the following line inside it:
from .important import top_secret
That's perhaps mixing relative and absolute imports, but for an __init__.py you probably know that the subpackage makes sense as a subpackage, i.e., as an abstraction by itself. If it's just a bunch of files located in the same place that need explicit module names, the __init__.py would probably be empty (or almost empty). But to avoid explicit module names for the user, the same idea can be applied in the root __init__.py, with
from .utilities import top_secret
Completely indirect, but the namespace gets flat this way while the files stay nested for some internal organization. For example, the wx package (wxPython) does that: everything can be found with from wx import ___ directly.
You can also use some metaprogramming to find the contents if you want to follow this approach, for example using __all__ to detect all the names a module has, or looking at the file locations to know which modules/subpackages are available to import. However, some simpler code completion utilities might get lost when doing that.
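A hedged sketch of that idea for a root __init__.py, assuming each listed subpackage defines __all__ (the subpackage names are illustrative):

# package_name/__init__.py -- pull every public name from selected subpackages
# up to the top level, relying on each one's __all__.
import importlib

_subpackages = ["utilities", "dohickey", "thingamabob"]

for _name in _subpackages:
    _module = importlib.import_module("." + _name, __name__)
    for _attr in getattr(_module, "__all__", []):
        globals()[_attr] = getattr(_module, _attr)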
For some contexts you might have other kinds of constraints. For example, macropy does some "magic" with imports and doesn't work on the file you call as a script, so you'll need at least 2 modules just to use that package.
Anyhow, you should always ask whether nesting into subpackages is really needed for your code or API organization. PEP 20 tells us that "Flat is better than nested", which isn't a law but a point of view suggesting you should keep a flat package structure unless nesting is needed for some reason. Likewise, you don't need a module for each class, or anything like that.
Use absolute imports in case you need to move the module to a different location.

Python: Calling a function from a file that has current file imported

I tried searching around but didn't seem to find an answer to my problem, so I'm sorry if I missed something and it actually has been answered before.
So basically I have main.py and another file called check.py (both in same directory)
In my main.py I have:
from check import checkfunction
I have a small function inside main.py that I MUST call inside check.py, but I can't seem to get this import working on my check.py:
from main import mainfunction
How can I get the mainfunction to work inside check.py?
Thanks!
You've got a design with a circular dependency, which is usually a bad thing, as your two Python modules are tightly coupled.
Consider refactoring your code. But if you must stick with your design please see the following SO question for more info on how circular imports work in Python and the various gotchas to look out for.
Several options:
Move the common function to a module imported by both other modules.
Merge both modules into one.
Pass the function from main to the code that needs to call it (see the sketch after this answer).
Monkey patch the function into the check module after importing it.
Refactor the whole thing so that you don't have circular dependencies.
If you actually explained why you have this design, someone could possibly propose a better way.
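A minimal sketch of that third option, with hypothetical bodies for the functions (the names mirror the question):

# check.py -- receives the callable instead of importing it from main
def checkfunction(mainfunction):
    # ... do the checking, calling back into main where needed
    return mainfunction()

# main.py
from check import checkfunction

def mainfunction():
    return 42   # placeholder behaviour

if __name__ == "__main__":
    print(checkfunction(mainfunction))

This removes the from main import mainfunction line from check.py entirely, so the cycle never forms.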

Populating Factory using Metaclasses in Python

Obviously, registering classes in Python is a major use-case for metaclasses. In this case, I've got a serialization module that currently uses dynamic imports to create classes and I'd prefer to replace that with a factory pattern.
So basically, it does this:
data = #(Generic class based on serial data)
moduleName = data.getModule()
className = data.getClass()
aModule = __import__(moduleName)
aClass = getattr(aModule, className)
But I want it to do this:
data = #(Generic class based on serial data)
classKey = data.getFactoryKey()
aClass = factory.getClass(classKey)
However, there's a hitch: If I make the factory rely on metaclasses, the Factory only learns about the existence of classes after their modules are imported (e.g., they're registered at module import time). So to populate the factory, I'd have to either:
manually import all related modules (which would really defeat the purpose of having metaclasses automatically register things...) or
automatically import everything in the whole project (which strikes me as incredibly clunky and ham-fisted).
Out of these options, just registering the classes directly into a factory seems like the best option. Has anyone found a better solution that I'm just not seeing? One option might be to automatically generate the imports required in the factory module by traversing the project files, but unless you do that with a commit-hook, you run the risk of your factory getting out of date.
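For concreteness, a hedged sketch of what metaclass-based registration typically looks like (all names here are illustrative, not taken from the question); note that nothing registers until the module defining each class is imported, which is exactly the problem described above:

# factory.py -- registry populated as a side effect of class creation
_registry = {}

class FactoryMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        key = namespace.get("factory_key")
        if key is not None:
            _registry[key] = cls

def getClass(classKey):
    return _registry[classKey]

# elsewhere, e.g. widgets.py (Python 2 would set a __metaclass__ attribute instead)
class Widget(metaclass=FactoryMeta):
    factory_key = "widget"

With this, factory.getClass("widget") returns Widget, but only once the module defining Widget has actually been imported.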
Update:
I have posted a self-answer, to close this off. If anyone knows a good way to traverse all Python modules across nested subpackages in a way that will never hit a cycle, I will gladly accept that answer rather than this one. The main problem I see happening is:
\A.py (import Sub.S2)
\Sub\S1.py (import A)
\Sub\S2.py
\Sub\S3.py (import Sub.S2)
When you try to import S3, it first needs to import Main (otherwise it won't know what a Sub is). At that point, it tries to import A. While there, the __init__.py is called and tries to register A. At this point, A tries to import S1. Since the __init__.py in Sub is hit, it tries to import S1, S2, and S3. However, S1 wants to import A (which does not yet exist, as it is in the process of being imported), so that import fails. You can switch how the traversal occurs (i.e., depth first rather than breadth first), but you hit the same issues.

Any insight on a good traversal approach for this would be very helpful. A two-stage approach can probably solve it (i.e., traverse to get all module references, then import as a flat batch). However, I am not quite sure of the best way to handle the final stage (i.e., to know when you are done traversing and then import everything). My big restriction is that I do not want to have a super-package to deal with (i.e., an extra directory containing Sub and A). If I had that, it could kick off traversal, but everything would need to import relative to that for no good reason (i.e., all imports longer by an extra directory). Thus far, adding a special function call to sitecustomize.py seems like my only option (I set the root directory for the package development in that file anyway).
The solution I found was to do all imports on the package based off of a particular base directory, and to have special __init__.py functions for all of the packages that might contain modules with classes I'd want registered. So basically, if you import any module, it first has to import the base directory and then proceeds to walk every package (i.e., folder) with a similar __init__.py file.
The downside of this approach is that the same modules are sometimes imported multiple times, which is annoying if anyone leaves code with side effects in a module import. However, that's bad either way. Unfortunately, some major packages (cough, cough: Flask) have serious complaints with IDLE if you do this (IDLE just restarts, rather than doing anything). The other downside is that because modules import each other, sometimes it attempts to import a module that it is already in the process of importing (an easily caught error, but one I'm still trying to stamp out). It's not ideal, but it does get the job done. Additional details on the more specific issue are attached, and if anyone can offer a better answer, I will gladly accept it.
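For reference, a hedged sketch of the walk-and-import step using only the standard library (the function name is illustrative); it still triggers whatever import-time side effects the modules have, which is the downside noted above:

# Import every module under a base package so that metaclass registration runs.
import importlib
import pkgutil

def import_all(package_name):
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    for _finder, name, _is_pkg in pkgutil.walk_packages(package.__path__, prefix):
        importlib.import_module(name)

Calling import_all once, after the base package is importable, populates the registry in one pass.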
