Handling complicated directory structure with python imports - python

I've worked on several medium-sized python applications to date, and every time it seems like I cobble together a terrible system of imports from tangential Stack Overflow answers and half-understood blog posts. It's ugly and hard to maintain and ultimately very unsatisfying. With this question I attempt to put all that behind me.
Say I have a python application split into the following files:
app.py
constants.py
ui/window.py
web/connection.py
With the following include requirements:
app.py needs to include window.py and connection.py
window.py needs to include constants.py and connection.py
connection.py needs to include constants.py
app.py is the starting point for the application, but window.py and connection.py are also invokable from the command line to test basic functionality (ideally from within their respective folders).
What combination of __init__.py files, carefully crafted import statements and wacky python path magic will allow me to achieve this structure?
Thanks very much,
--Dan

It really helps if, instead of thinking in terms of "file structure" first and then trying to figure out the packages, you design things in terms of packages, and then lay out your file structure to implement those packages.
But if you want to know how to hack up what you already have: If you put this at the top level (that is, in one of the paths on your sys.path), and create files named ui/__init__.py and web/__init__.py, then:
app.py can be run as a script.
app.py can be run with -m app.
app.py can be imported with import app.
window.py cannot be run directly.
window.py can be run with -m ui.window.
window.py can be imported with import ui.window.
connection.py cannot be run directly.
connection.py can be run with -m web.connection.
connection.py can be imported with import web.connection.
No wacky path magic is needed; you just need the top level (with app.py, constants.py, ui, and web) to be on your sys.path—which it automatically is when you run with that directory as your working directory, or install everything directly into site-packages, or install it as an egg, etc.
That's as close as you're going to get to what you want. You don't ever want to run code with a package directory as your current working directory or otherwise on sys.path, so don't even try. If you think you need that, what you probably want is to separate the runnable code out into a script that you can put at the top level, or somewhere entirely separate. (For example, look at pip or ipython, which install scripts somewhere on your system $PATH that do nothing but import some module and run a function.)
The only other thing you might want to consider is putting all of this into a package, say, myapp. You do that by adding a top-level __init__.py, and then running from the parent directory, and adding myapp. to the start of all your import and -m statements. That means you can no longer run app.py as a script either, so again you will need to split the script code out into a separate file from the module that does all the work.
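The behaviour listed in this answer (for the layout before it is wrapped in a myapp package) can be checked with a small experiment. The sketch below recreates the files in a temporary directory and runs them in several ways. The file contents, GREETING, connect() and show() are invented for illustration; only the file names and the import relationships come from the question.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents; only the names and import
# relationships are taken from the question.
FILES = {
    "constants.py": "GREETING = 'hello'\n",
    "web/__init__.py": "",
    "web/connection.py": (
        "import constants\n"
        "def connect():\n"
        "    return constants.GREETING + ' from connection'\n"
        "if __name__ == '__main__':\n"
        "    print(connect())\n"
    ),
    "ui/__init__.py": "",
    "ui/window.py": (
        "import constants\n"
        "from web import connection\n"
        "def show():\n"
        "    return 'window: ' + connection.connect()\n"
        "if __name__ == '__main__':\n"
        "    print(show())\n"
    ),
    "app.py": (
        "from ui import window\n"
        "from web import connection\n"
        "if __name__ == '__main__':\n"
        "    print(window.show())\n"
    ),
}


def build_tree(root: Path) -> None:
    """Write the example files below root."""
    for relative_name, source in FILES.items():
        target = root / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


def run(root: Path, *args: str) -> subprocess.CompletedProcess:
    """Start a fresh interpreter with root as the working directory."""
    return subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )


def demo() -> dict:
    """Run the same code as a script and via -m, and collect the results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_tree(root)
        return {
            "script app.py": run(root, "app.py"),
            "-m ui.window": run(root, "-m", "ui.window"),
            "-m web.connection": run(root, "-m", "web.connection"),
            "script ui/window.py": run(root, "ui/window.py"),
        }
```

Calling demo() and looking at each result's returncode and stderr should show the first three invocations succeeding and the last one failing on import constants, because only ui/ (not the top level) is on sys.path in that case.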

You can use that structure with just a small modification: add empty __init__.py files to the ui/ and web/ directories. Then, where you would have done import window, do either import ui.window, or from ui import window. Similarly, change import connection to import web.connection or from web import connection.
Rationale: Python doesn't work so much with directories as it does with packages, which are directories with an __init__.py in them. By changing ui and web to be packages, you don't have to do any particular Python path magic to work with them, and you get the benefit of adding some structure to your modules and imports. That will become particularly important if you start having modules with the same name in different directories (e.g. a util.py in both the ui and web directories; not necessarily the cleanest design but you get the idea).
If you invoke window.py or connection.py directly to test them, you need to add the top-level directory to your PYTHONPATH for things to still work – but there is a subtle additional wrinkle. When you run this from the top-level directory:
PYTHONPATH=$PWD python web/connection.py
you now have both the top-level directory on your module path AND the web/ directory. This can cause certain relative imports to do unexpected things.
Another way is to use Python's -m option from the top-level directory:
python -m web.connection
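To see the difference between the two invocations, one can print sys.path from inside the module in each case. The following is a minimal sketch under assumptions: probe.py is a hypothetical stand-in for the real module, and the layout is rebuilt in a temporary directory.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical probe module: prints its resolved sys.path as JSON.
PROBE = (
    "import json, os, sys\n"
    "print(json.dumps([os.path.realpath(p) for p in sys.path]))\n"
)


def probe_paths(root: Path, args, pythonpath=None):
    """Run the interpreter in root and return the child's sys.path."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    done = subprocess.run(
        [sys.executable, *args], cwd=root, env=env,
        capture_output=True, text=True, check=True,
    )
    return json.loads(done.stdout)


def demo():
    """Compare 'PYTHONPATH=$PWD python web/probe.py' with 'python -m web.probe'."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(os.path.realpath(tmp))
        web = root / "web"
        web.mkdir()
        (web / "__init__.py").write_text("")
        (web / "probe.py").write_text(PROBE)
        as_script = probe_paths(root, ["web/probe.py"], pythonpath=str(root))
        as_module = probe_paths(root, ["-m", "web.probe"])
        return {
            "script: top level on path": str(root) in as_script,
            "script: web/ on path": str(web) in as_script,
            "module: top level on path": str(root) in as_module,
            "module: web/ on path": str(web) in as_module,
        }
```

With the script invocation both the top level and web/ should show up on the path; with -m only the top level should, which is why -m avoids the surprise described above.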
I know many folks like to nail their tests right into the modules like this, but I should also note that there are other ways to structure your tests, particularly with an updated unittest library and tools like nosetests, that will make it a little bit easier to run your tests as your project gets larger. See the skeleton here for a reasonable example:
http://learnpythonthehardway.org/book/ex46.html

Related

Is there a way to import python package correctly from external folder?

I have a python package folder, I am using regularly. As part of my work, I have multiple versions of this package folder.
The folder look something like this,
Main folder
    scripts.ipynb
    other stuff
    package folder
        __init__.py
        all submodules folders
The __init__.py file consists of 'import package_name.submodule_name' lines. I have many main folders, like this one, for all the different versions. If I run the scripts.ipynb notebook, then it works correctly with the wanted package version.
But if I run the notebook outside of the main folder, it seems I cannot import the package correctly. I want to be able to run a single scripts notebook in which I can toggle and switch between the different main folders/packages.
Giving a unique name to every package is very tedious work, because the name of the package is repeated numerous times across the files. I tried to append the path to sys.path, or to set PYTHONPATH to point at the correct folder, but it doesn't help.
In the case of appending the path to sys.path, the import routine goes to the correct __init__.py file, but inside the file it fails to import the submodules. When setting PYTHONPATH to the folder, no error is given, but the functionality is incorrect (error at first call).

Again, relative import in Python + issue with sudo

I have read plenty of documentation (including this and linked references) on this, but it is simply too difficult for my simple mind: I cannot understand the logic of Python imports and I usually waste plenty of time on random attempts until I reach a working permutation of settings and commands. Maybe this is due to the fact that I usually use PyCharm, where everything magically works. Now I am using Visual Studio Code on a remote machine and I need to ask here, since I have wasted double the time I usually spend on this without reaching a permutation that works.
Using python 3 on linux (remote machine). The python interpreter is configured with a virtual environment and it does not correspond to the system level one.
I have this project. Its folder structure is mirrored in the Linux filesystem, i.e., prj, src, common, etc. are all folders.
prj
|- src
| some py files
| |- common/
| - common1.py
| - common2.py
| |- pipelines/
| - main_pipeline1.py (<- file prefixed with main_ have a __main__ entry point)
| - main_pipeline2.py
| | - other py module
| | - other py module, ... and others - some of these modules use common
|- data/ ...
|- doc/ ...
In pipeline1.py, I have: import common.common1. I corrected this
In what follows $[folder] corresponds to the bash prompt, so $ stands for normal user and folder is the current folder.
When I run pipeline1.py as normal user (on the remote machine), first I get an error:
$[prj/src] python pipeline/pipeline1.py
ModuleNotFoundError: No module named 'common'
In order to have it working I need to add the current folder to PYTHONPATH (that is empty). So
$[prj/src] PYTHONPATH=.
$[prj/src] python pipeline/pipeline1.py
works.
However, the previous script writes in a disk that requires root access, so the previous command needs to be run with sudo. I cannot find a way to run it using sudo:
I tried (after reading, among others, this):
$[prj/src] sudo python pipeline/pipeline1.py
$[prj/src] sudo /path/to/env/bin/python pipeline/pipeline1.py
$[prj/src] sudo -E /path/to/env/bin/python pipeline/pipeline1.py
they all fail, all but the first because python cannot find the module common. Even if I asked to keep the environment with -E (so PYTHONPATH should be kept) the import fails. All the other imports from the virtual environment (that occur before the import common) do not fail.
In the future I need to give the code to a sys admin that might possibly not have any specific knowledge of python: I cannot ask him to set PYTHONPATH check this, check that.
In this case, how should I organize my code to have the import common (or any other module I write) succeed? Do I really need to add PYTHONPATH=. every time?
Is there any kind soul willing to help me? Beer after the pandemic is over.
I made a correction:
import common.common1.py --> common.common1
I'm assuming Linux and also that the Python software to be distributed has a setup.py.
Short answer: no, you don't have to modify the PYTHONPATH or sys.path
Create a virtual env (say /opt/myprog) as usual.
Activate it and install your package (say mypkg) and all its dependencies.
Put all executable scripts into the bin subdirectory of the virtual env and make sure they start with the #!/opt/myprog/bin/python3 shebang line. With a correct setup.py this will happen automatically during installation, see scripts. The scripts will be able to import the installed package normally (import mypkg) or its parts (from mypkg import ...).
Finally symlink the scripts to a directory in users' PATH e.g. to /usr/local/bin. This must be done manually and only once unless you add or rename a script.
Projects installed this way can be upgraded normally (with pip inside an activated environment) and the scripts can be invoked normally from the command line.
Based on the structure and your comments, my guess is that you are not trying
to pip-install this project as a proper Python package. Rather, it is just a
directory with some scripts and modules you want to use. If so, you have at
least a couple of options.
First, don't muck around with PYTHONPATH or modifying sys.path. That is
almost always a worse approach.
The basic rule for Python importing: the root directory for the purpose of
finding packages is the directory of the script/file used to invoke Python.
(Ignoring built-ins and packages that have been formally installed, of course.)
Maybe the easiest solution is to move common under pipelines (and
optionally make it a
package by creating __init__.py inside of it). If you follow the logic
of the basic rule, you'll understand why this works (and it won't
be affected by full-path vs relative-path issues when invoking
python).
src/
    pipelines/
        common/
            __init__.py # Optional for Python 3.3+
            common1.py
            common2.py
        main_pipeline1.py
        main_pipeline2.py
Another approach is to create a simple runner script at the top level. The runner imports the pipelines, selects
the right one (based on a command-line argument or some other configuration),
and executes its top-level code (e.g., its main()). If the pipelines
are not well organized for that type of importing and execution, this
approach is quite a bit harder.
src/
    common/
        __init__.py # Optional
        common1.py
        common2.py
    pipelines/
        __init__.py # Optional
        main_pipeline1.py
        main_pipeline2.py
    runner.py
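A runner along these lines might look like the sketch below. It is only an illustration under assumptions: run_pipeline and parse_args are hypothetical names, and each pipeline module is assumed to expose a main() function, which the question does not confirm.

```python
# runner.py: assumed to live in src/, next to common/ and pipelines/
import argparse
import importlib


def run_pipeline(name: str, package: str = "pipelines"):
    """Import <package>.<name> and call its main() (assumed to exist)."""
    module = importlib.import_module(f"{package}.{name}")
    return module.main()


def parse_args(argv=None):
    """Read the pipeline name from the command line."""
    parser = argparse.ArgumentParser(description="Run one pipeline by name.")
    parser.add_argument("pipeline", help="module name, e.g. main_pipeline1")
    return parser.parse_args(argv)


# In the real runner.py, finish with:
# if __name__ == "__main__":
#     run_pipeline(parse_args().pipeline)
```

Because runner.py sits in src/, running python runner.py main_pipeline1 (by relative or absolute path) puts src/ at the start of sys.path, so import common.common1 inside the pipelines resolves without touching PYTHONPATH.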
Separate issue: import modules, not files.
import common.common1.py # No
import common.common1 # Yes
Add this to the start of pipeline1.py:
import sys
import os
sys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), "..")))
import common.common1

Python script works in PyCharm but not in terminal

I'm currently trying to import one of my modules from a different folder.
Like this...
from Assets.resources.libs.pout import Printer, ForeColor, BackColor
This import method works completely fine in PyCharm; however, when I try to launch the file in cmd or IDLE, I get this error.
ModuleNotFoundError: No module named 'Assets'
This is my file structure from main.py to pout.py:
- Assets
- main.py
- resources
- libs
- pout.py
Any clue about how I could fix this?
Any help is appreciated !
Edit: The original answer was based on the assumption that the script you're running is within the folder structure given, which a re-read tells me may not be true. The general solution is to do
sys.path.append('path_to_Assets')
but read below for more detail.
Original answer
The paths that modules can be loaded from will differ between the two methods of running the script.
If you add
import sys
print(sys.path)
to the top of your script before the imports you should be able to see the difference.
When you run it yourself the first entry will be the location of the script itself, followed by various system/environment paths. When you run it in PyCharm you will see the same first entry, followed by an entry for the top level of the project. This is how it finds the modules when run from PyCharm. This behaviour is controlled by the "Add content roots to PYTHONPATH" option in the run configuration.
Adding this path programmatically in a way that will work in all situations isn't trivial because, unlike PyCharm, your script doesn't have a concept of where the top level should be. BUT if you know you'll be running the script from the top level, i.e. your working directory will be the folder containing Assets and you're running something like python Assets/main.py then you can do
import os
sys.path.append(os.path.abspath('.'))
and that will add the correct folder to the path.
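If the script's position relative to the top level is fixed, an alternative that does not depend on the working directory is to derive the folder from the script's own location. A minimal sketch, where add_project_root and levels_up are hypothetical names introduced here:

```python
import sys
from pathlib import Path


def add_project_root(script_file: str, levels_up: int = 1) -> str:
    """Put an ancestor directory of script_file at the front of sys.path.

    levels_up=0 is the directory containing the script,
    levels_up=1 is that directory's parent, and so on.
    """
    root = str(Path(script_file).resolve().parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

For a script at Assets/main.py, calling add_project_root(__file__) before the project imports would add the folder containing Assets, regardless of where the script is launched from.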
Appending sys path didn't work for me on windows, hence here is the solution that worked for me:
Add an empty __init__.py file to each directory
i.e. in Assets, resources, libs.
Then try importing with only the base package names.
Worked for me!

ModuleNotFoundError when running script from Terminal

I have the following folder structure:
app
    __init__.py
    utils
        __init__.py
        transform.py
    products
        __init__.py
        fish.py
In fish.py I'm importing transform as following: import utils.transform.
When I'm running fish.py from Pycharm, it works perfectly fine. However when I am running fish.py from the Terminal, I am getting error ModuleNotFoundError: No module named 'utils'.
Command I use in the terminal, from the app folder: python products/fish.py.
I've already looked into the solutions suggested here: Importing files from different folder, adding a path to the application folder into the sys.path helps. However I am wondering if there is any other way of making it work without adding two lines of code into the fish.py. It's because I have many scripts in the /products directory, and do not want to add 2 lines of code into each of them.
I looked into some open source projects, and I saw many examples of importing modules from a parallel folder without adding anything into sys.path, e.g. here:
https://github.com/jakubroztocil/httpie/blob/master/httpie/plugins/builtin.py#L5
How to make it work for my project in the same way?
You probably want to run python -m products.fish. The difference between that and python products/fish.py is that the former is roughly equivalent to doing import products.fish in the shell (but with __name__ set to __main__), while the latter does not have awareness of its place in a package hierarchy.
This expands on @Mad Physicist's answer.
First, assuming app is itself a package (since you added __init__.py to it) and utils and products are its subpackages, you should change the import to import app.utils.transform, and run Python from the root directory (the parent of app). The rest of this answer assumes you've done this. (If you didn't intend to make app the root package, tell me in a comment.)
The problem is that you're running app.products.fish as if it were a script, i.e. by giving the full path of the file to the python command:
python app/products/fish.py
This makes Python think this fish.py file is a standalone script that isn't part of any package. As defined in the docs (see here, under <script>), this means that Python will search for modules in the same directory as the script, i.e. app/products/:
If the script name refers directly to a Python file, the directory
containing that file is added to the start of sys.path, and the file
is executed as the __main__ module.
But of course, the app folder is not in app/products/, so it will throw an error if you try to import app or any subpackage (e.g. app.utils).
The correct way to start a script that is part of a package is to use the -m (module) switch (reference), which takes a module path as an argument and executes that module as a script (but keeping the current working directory as a module search path):
If this option is given, [...] the current directory
will be added to the start of sys.path.
So you should use the following to start your program:
python -m app.products.fish
Now when app.products.fish tries to import the app.utils.transform module, it will search for app in your current working directory (which contains the app/... tree) and succeed.
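The two ways of starting fish.py can be compared with a small experiment. This is only a sketch: the body given to fish.py here is hypothetical, and the tree is rebuilt in a temporary directory so nothing in a real project is touched.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical body for fish.py: report how it was started, then try the import.
FISH = (
    "print(__name__, __package__)\n"
    "try:\n"
    "    import app.utils.transform\n"
    "    print('import ok')\n"
    "except ModuleNotFoundError as exc:\n"
    "    print('import failed:', exc)\n"
)


def build_app(root: Path) -> None:
    """Create app/, app/utils/ and app/products/ as packages below root."""
    for package in ("app", "app/utils", "app/products"):
        (root / package).mkdir()
        (root / package / "__init__.py").write_text("")
    (root / "app/utils/transform.py").write_text("")
    (root / "app/products/fish.py").write_text(FISH)


def run_fish(*args: str) -> list:
    """Run the interpreter from the parent of app/ and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_app(root)
        done = subprocess.run(
            [sys.executable, *args], cwd=root, capture_output=True, text=True
        )
        return done.stdout.splitlines()
```

run_fish("-m", "app.products.fish") should report __package__ as app.products and a successful import, while run_fish("app/products/fish.py") should report __package__ as None and a failed import of app.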
As a personal recommendation: don't put runnable scripts inside packages. Use packages only to store all the logic and functionality (functions, classes, constants, etc.) and write a separate script to run your application as you wish, putting it outside the package. This will save you from these kinds of problems (including the double import trap), and also has the advantage that you can write several run configurations for the same package by just making a separate startup script for each.

python doctest from a separate file + location

I have a directory structure like this:
|-root
|-app
program.py
tests.txt
|-tests
runTests.py
My tests.txt file contains all the doctests on my program.py code. It calls
from program import *
and then it makes all the doctest calls.
My runTests.py file has this code:
import doctest
doctest.testfile("app/tests.txt")
In the command line I then call:
python runTests.py
and it does indeed find the tests.txt file and reads it successfully but it does not find the module "program" which I am trying to import. What am I doing wrong? How can I have them in separate directories and still be able to run the tests?
Thanks
There are two ways to do this:
You can add the app folder to sys.path in runTests.py: import sys; sys.path.append('app')
The same can be achieved with the environment variable PYTHONPATH.
You can turn the folder into a package by adding an __init__.py file and using from app.program import * in tests.txt.
Reminder: The main premise of doctest is that the method/function documentation (what you get when you look at the __doc__ property) explains what it does and gives examples (the tests).
Doctest then finds such code and executes it, making sure that the examples in the documentation are actually working.
If you move the tests out, then you're taking away a major source of information from the user of your code. I guess it might make sense if you have extensive documentation outside of the source code or if you have many additional tests (you want to give users a few examples, not all 500 unit tests which drive code coverage to 100%).
That said, to fix the issue, you need to make the import work. The folder app doesn't magically appear in the search path for modules. You have to tell Python that this is in fact a place where it should look.
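One way to do that without depending on the working directory is sketched below. It makes assumptions: run_doctests is a hypothetical helper, and root is the folder that contains app/ with program.py and tests.txt inside it.

```python
import doctest
import sys
from pathlib import Path


def run_doctests(root: Path):
    """Run root/app/tests.txt with root/app temporarily importable."""
    app_dir = root / "app"
    sys.path.insert(0, str(app_dir))
    try:
        # module_relative=False: treat the name as an ordinary file path
        return doctest.testfile(str(app_dir / "tests.txt"), module_relative=False)
    finally:
        sys.path.remove(str(app_dir))


# In runTests.py, assuming it lives in root/tests:
#     run_doctests(Path(__file__).resolve().parents[1])
```

doctest.testfile returns a (failed, attempted) result, so the caller can check results.failed to decide on an exit status.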
