How to structure a Python package made of multiple sub-projects?

What is the correct way to structure a Python package for multiple functionalities?
I have a case where my project can be broken into 3 completely separate, logical chunks that are entirely decoupled. As such, I have broken the sub-projects into respective folders in my Python package project. This has led to a file structure like this:
foobartoo
|_setup.py
|_foobartoo
  |_foo
  | |_foo.py
  |
  |_bar
  | |_bar.py
  |
  |_too
    |_too.py
This method keeps things together but nicely separated, but after installing I have noticed a definite problem. The only way to access one of the files is to do
from foobartoo.foo.foo import <method/class>
or
import foobartoo.foo.foo as <something>
<something>.<method/class>
This seems extremely impractical.
I can see two alternatives:
1. Scrapping the folder system and having foo.py, bar.py and too.py in the same directory under foobartoo. This seems bad because it will be impossible to know which file/code belongs to which of the projects.
2. Breaking the single package into multiple packages. This seems OK, but it would be better if the 3 things stayed together, as they are intended to be used together.
I have looked at numpy and its source code, and somehow they seem to have a lot of their functionality behind many folders, yet you don't need to reference any of those folders when importing. This seems ideal: just being able to do something like
import foobartoo
foobartoo.<classname/methodname>()

You can add additional paths in your 'main' script so that Python will automatically search those directories for imports, like:
import sys

example_sub_dir = "path/to/sub_dir"  # placeholder: the folder containing sub_script.py
sys.path.append(example_sub_dir)
import sub_script
sys.path.remove(example_sub_dir)
Note that you can add many directories to sys.path this way, but be careful: the more entries you add, the longer import lookups will take.
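For what it's worth, the flat numpy-style access described at the end of the question can also be had without touching sys.path: give each sub-folder an __init__.py so it becomes a package, and re-export the public names from the top-level foobartoo/__init__.py. A minimal sketch (the imported names are placeholders for whatever foo.py, bar.py and too.py actually define):

# foobartoo/__init__.py
# Re-export the public API of the sub-packages so that users can write
#   import foobartoo
#   foobartoo.some_function()
# instead of foobartoo.foo.foo.some_function().
from .foo.foo import some_function      # placeholder name
from .bar.bar import some_other_thing   # placeholder name
from .too.too import yet_another_thing  # placeholder name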

Testing with configuration files

My Google-fu has failed me by giving me results I don't understand, so I'm asking here.
I'm working on a Python project and I currently have a configuration file (also a .py file) that holds various Python objects to load when everything starts. I'm trying to get some practice unit testing with pytest and I don't know exactly how to go about this issue. My guess is that I will probably make a dedicated testing config file that doesn't change, but I have no clue how to tell my code when to use the actual config file and when to use the testing config file. My current setup is just using import config and setting values from it. I would greatly appreciate some help here!
An approach I've used to solve this problem is to manipulate the path when running tests so that an additional package (/tests/resources) can shadow the normal resources package (/main/resources), which I set up to contain assets such as configuration files and make them available for loading at runtime.
Note: This structure is inspired by a pattern that I've brought from Java/Maven, so I won't claim it's Pythonic, and I don't like the path manipulation where the tests are concerned (I know a lot of others won't, either, so beware!). In fact, I found this question while looking for a better way to do this.
To accomplish this you first have to set up a folder to serve as your 'resources' package (See the docs). Once that's done, you create a similar version for your tests, in the tests folder. The directory structure will look something like this:
project
|-main
| |-resources
| | |-__init__.py
| | \-application.yml
| \-[...other source modules/packages...]
|-tests
| |-resources
| | |-__init__.py
| | \-application.yml
| \-[...other modules/packages with tests...]
\-run_tests.py
Note that main and tests are NOT packages. This is what contradicts a lot of conventional guidance on Python project structure. Instead, they're both added to the path, with tests being inserted into the path in front of main, by the run_tests.py script using code like this (some bits borrowed from this post):
import os.path
import sys
import pytest
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(ROOT_DIR, 'main'))
sys.path.insert(0, os.path.join(ROOT_DIR, 'tests'))
# Add any required args for testing.
args = []
pytest.main(args)
I'll typically have additional options I want to feed pytest, and that's what args is for, but it's not relevant to this answer, so I left it empty here.
I know this is a late answer, but I hope this helps someone. Also, I know it's a controversial approach, but it's the most satisfying one I've found so far. That may be because I'm experienced in Java as well as Python. Others may prefer the approach of trying to import tests.resources and falling back to main.resources if that fails.
I hope anyone with other approaches to this will share in comments, or post additional answers.
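A minimal sketch of that fallback idea (it assumes both main and tests are importable, e.g. set up as packages or already on sys.path, and the module name is hypothetical):

# config_loading.py (hypothetical module name)
# Prefer the test resources when they are importable; fall back to the real ones.
try:
    import tests.resources as resources
except ImportError:
    import main.resources as resources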
Figured out something based on this.
I ended up creating a testing config file in my folder named "tests" and having this code at the top of every file that used it:
import sys

if "pytest" in sys.modules:
    import tests.testing_config as config
else:
    import config
I don't think it's quite optimal, since I feel something like this should live in the testing code, and it also makes the new config file a dependency that must never change lest you break everything, but I guess it works for now if, like me, you don't really know how these testing libraries work under the hood.
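If you would rather keep the switch in the test code, one possible sketch (not the only way) is to register the testing config under the name config in a conftest.py, assuming tests is importable as a package like in the snippet above:

# tests/conftest.py
# pytest imports conftest.py before collecting the test modules, so aliasing
# the testing config here means any later `import config` in application
# code picks up the test version instead of the real one.
import sys

import tests.testing_config as testing_config

sys.modules["config"] = testing_config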

Structuring ocean modelling code in python

I am starting to use Python for numerical simulation, and in particular I am starting from this project to build mine, which will be more complicated than this one since I will have to try a lot of different methods and configurations. I work full time on Fortran90 and Matlab codes, and those are my two "mother tongue" languages. In those two languages one is free to structure the code as one wants, and I am trying to mimic this feature because in my field (computational oceanography) things get rather complicated easily. See as an example the code I work with daily, NEMO (here the main page, here the source code). The source code (of NEMO) is conveniently divided into folders, each of which contains modules and methods for a specific task (e.g. the domain discretisation routines are in the folder DOM, the vertical physics in the folder ZDF, the lateral physics in LDF, and so on), because the processes (physical or purely mathematical) involved are completely different.
What I am trying to build is this
/shallow_water_model
|- create_conf.py   (creates a new subdirectory in /cfgs with a given name, like "caspian_sea" or "mediterranean_sea", and copies the content of the folder /src inside this new subdirectory to create a new configuration)
|- /cfgs
|  |- /caspian_sea        (example configuration)
|  |- /mediterranean_sea  (example configuration)
|- /src
   |- swm_main.py    (initializes a dictionary and calls the functions)
   |- swm_param.py   (fills the dictionary)
   |- /domain
   |  |- swm_grid.py  (creates a numerical grid)
   |- /dynamics
   |  |- swm_adv.py   (creates the advection matrix)
   |  |- swm_dif.py   (creates the diffusion matrix)
   |- /solver
   |  |- swm_rk4.py   (time stepping with Runge-Kutta4)
   |  |- swm_pc.py    (time stepping with predictor-corrector)
   |- /IO
      |- swm_input.py  (handles netCDF input)
      |- sim_output.py (handles netCDF output)
The script create_conf.py contains the following structure. It is supposed to take a string input from the terminal, create a folder with that name and copy all the files and subdirectories of the /src folder inside it, so one can put there all the input files of the configuration and eventually modify the source code to create an ad-hoc source code for that configuration. This duplication of the source code is common in the ocean modelling community because two different configurations (like the Mediterranean Sea and the Caspian Sea) may differ not only in the input files (topography, coastlines, etc.) but also in the modelling itself, meaning that the modifications you need to make to the source code for each configuration might be substantial. (Most ocean models allow you to put your own modified source files in specific folders and are instructed to overwrite the corresponding files at compilation. My code is going to be simple enough to just duplicate the source code.)
import os, sys
import shutil

def create_conf(conf_name="new_config"):
    cfg_dir = os.getcwd() + "/cfgs/"
    # Check if configuration exists
    try:
        os.makedirs(cfg_dir + conf_name)
        print("Configuration " + conf_name + " correctly created")
    except FileExistsError:
        # directory already exists
        # Handles overwriting, duplicates or stop
        # make a copy of "/src" into the new folder
        return

# This is supposed to be used directly from the terminal
if __name__ == '__main__':
    filename = sys.argv[1]
    create_conf(filename)
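The copy step that is only sketched in the comments above could be done with shutil.copytree; for example, a sketch along these lines (the helper name and target layout are assumptions, and dirs_exist_ok requires Python 3.8+):

import os
import shutil

def copy_src_into(conf_name, root=None):
    """Copy the contents of /src into cfgs/<conf_name> (illustrative helper)."""
    root = root or os.getcwd()
    src_dir = os.path.join(root, "src")
    conf_dir = os.path.join(root, "cfgs", conf_name)
    # The target directory was just created by os.makedirs in create_conf,
    # so dirs_exist_ok=True lets copytree write into it (Python 3.8+).
    shutil.copytree(src_dir, conf_dir, dirs_exist_ok=True)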
The script swm_main.py can be thought of as a list of calls to the necessary routines, depending on the kind of process you want to take into account, just like
import numpy as np
from DOM.swm_domain import set_grid
from swm_param import set_param, set_timestep, set_viscosity
# initialize dictionary (i.e. structure) containing all the parameters of the run
global param
param = dict()
# define the parameters (i.e. call swm_param.py)
set_param(param)
# Create the grid
set_grid(param)
The two routines called just take a particular field of param and assign it a value, like
import numpy as np
import os

def set_param(param):
    param['nx'] = 32  # number of grid points in x-direction
    param['ny'] = 32  # number of grid points in y-direction
    return param
Now, the main topic of discussion is how to achieve this kind of structure in Python. I almost always find source codes that are either monolithic (all routines in the same file) or a sequence of files in the same folder. I want to have some better organisation, but the solution I found while browsing fills every subfolder in /src with a __pycache__ folder, and I need to put an __init__.py file in each folder. I don't know why, but these two things make me think there is something sloppy in this approach. Moreover, I need to import modules (like numpy) in every file, and I was wondering whether this was efficient or not.
What do you think would be the best way to achieve this structure while keeping it as simple as possible?
Thanks for your help
As I understand it, the actual question here is:
the solution I found browsing fills every subfolder in /src with a folder __pycache__ and I need to put a __init__.py file in each folder... this makes me think there is something sloppy in this approach.
There is nothing sloppy or unpythonic about making your code into packages. In order to be able to import from .py files in a directory, one of two conditions has to be satisfied:
the directory must be in your sys.path, or
the directory must be a package, and that package must be a sub-directory of some directory in your sys.path (or a sub-directory of a package which is a sub-directory of some directory in your sys.path)
The first solution is generally hacky in code, although often appropriate in tests, and involves modifying sys.path to add every dir you want. It is hacky because the whole point of putting your code inside a package is that the package structure encodes some natural division in the source: e.g. a package modeller is conceptually distinct from a package quickgui, and each could be used independently of the other in different programs.
The easiest[1] way to make a directory into a package is to place an __init__.py in it. The file should contain anything which belongs conceptually at the package level, i.e. not in modules. It may be appropriate to leave it empty, but it's often a good idea to import the public functions/classes/vars from your modules, so you can do from mypkg import thing rather than from mypkg.module import thing.
Packages should be conceptually complete, which normally means you should be able (in theory) to use them from multiple places. Sometimes you don't want a separate package: you just want a naming convention, like gui_tools.py, gui_constants.py, model_tools.py, model_constants.py, etc.
The __pycache__ folder is simply Python caching the bytecode to make future imports faster: you can move it or prevent it, but the easiest thing is to add *__pycache__* to your .gitignore and forget about it.
Lastly, since you come from very different languages:
lots of python code written by scientists (rather than programmers) is quite unpythonic IMHO. Billion-line-long single Python files are not good style[2]. Python prefers readability, always: call things derived_model, not dm1. If you do that you may well find you don't need as many dirs as you thought.
importing the same module in every file is a trivial cost: Python imports only once; every subsequent import is just another name bound to the entry in sys.modules (see the short sketch after this list). Always import explicitly.
in general stop worrying about performance in python. Write your code as clearly as possible, then profile it if you need to, and find what is slow. Python is so high level that micro-optimisations learned in compiled languages will probably backfire.
lastly, and this is mostly personal, don't give folders/modules names in CAPITALS. FORTRAN might encourage that, and it was written on machines which often didn't have case sensitivity for filenames, but we no longer have those constraints. In python we reserve capitals for constants, so I find it plain weird when I have to modify or execute something in capitals. Likewise 'DOM' made me think of the document object model which is probably not what you mean here.
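A tiny illustration of the point about repeated imports (timings are indicative only; numpy is just used here as an example of a non-trivial import):

import sys
import time

t0 = time.perf_counter()
import numpy            # first import: the module is actually loaded
t1 = time.perf_counter()
import numpy            # repeat import: just a lookup in sys.modules
t2 = time.perf_counter()

print("numpy" in sys.modules)            # True: the module object is cached
print(f"first import:  {t1 - t0:.4f}s")
print(f"second import: {t2 - t1:.6f}s")  # orders of magnitude smaller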
References
[1] Python does have implicit namespace packages but you are still better off with explicit packages to signal your intention to make a package (and to avoid various importing problems).
[2] See pep8 for some more conventions on how you structure things. I would also recommend looking at some decent general-purpose libraries to see how they do things: they tend to be written by mainstream programmers who focus on writing clean, maintainable code, rather than by scientists who focus on solving highly specific (and frequently very complicated) problems.

relative import from utility script in subdirectory

From reading over other answers, it seems that my layout may be "un-Pythonic," although I'm really not quite sure. If so, that would be helpful to know, along with a suggestion for a better layout.
Here is my script layout:
/
    __init__.py
    main_prog.py
    utilities.py
    /support_scripts
        support_utility1.py
        support_utility2.py
        ...
The support utilities contain functionality that is related to main_prog.py but is best placed in its own scripts. Since there are many of them, I have moved them into their own directory. But they use some of the same functionality from utilities.py.
When I try to import using from .. import utilities I get the error message "ValueError: attempted relative import beyond top-level package"
Now my first question would simply be: Is it considered bad to try to place additional scripts in subdirectories like this? Knowing a general principle like that would go a long way to solving my problems. And of course if you have any specific suggestions that will help too.

Import from parent directory for a test sub-directory without using packaging, Python 2.7

TL;DR
For a fixed and unchangeable non-package directory structure like this:
some_dir/
    mod.py
    test/
        test_mod.py
        example_data.txt
what is a non-package way to enable test_mod.py to import from mod.py?
I am restricted to using Python 2.7 in this case.
I want to write a few tests for the functions in mod.py. I want to create a new directory test that sits alongside mod.py and inside it there is one test file test/test_mod.py which should import the functions from mod.py and test them.
Because of the well-known limitations of relative imports, which rely on package-based naming, you can't do this in the straightforward way. Yet all advice on the topic suggests building the script as a package and then using relative imports, which is impossible for my use case.
In my case, it is not allowable for mod.py to be built as a package and I cannot require users of mod.py to install it. They are instead free to merely check the file out from version control and just begin using it however they wish, and I am not able to change that circumstance.
Given this, what is a way to provide a simple, straightforward test directory?
Note: not just a test file that sits alongside mod.py, but an actual test directory since there will be other assets like test data that come with it, and the sub-directory organization is critical.
I apologize if this is a duplicate, but out of the dozen or so permutations of this question I've seen in my research before posting, I haven't seen a single one that addresses how to do this. They all say to use packaging, which is not a permissible option for my case.
Based on @mgilson's comment, I added a file import_helper.py to the test directory.
some_dir/
    mod.py
    test/
        test_mod.py
        import_helper.py
        example_data.txt
Here is the content of import_helper.py:
import sys as _sys
import os.path as _ospath
import inspect as _inspect
from contextlib import contextmanager as _contextmanager

@_contextmanager
def enable_parent_import():
    path_appended = False
    try:
        current_file = _inspect.getfile(_inspect.currentframe())
        current_directory = _ospath.dirname(_ospath.abspath(current_file))
        parent_directory = _ospath.dirname(current_directory)
        _sys.path.insert(0, parent_directory)
        path_appended = True
        yield
    finally:
        if path_appended:
            _sys.path.pop(0)
and then in the import section of test_mod.py, prior to an attempt to import mod.py, I have added:
import unittest

from import_helper import enable_parent_import

with enable_parent_import():
    from mod import some_mod_function_to_test
It is unfortunate to need to manually mangle PYTHONPATH, but writing it as a context manager helps a little, and restores sys.path back to its original state prior to the parent directory modification.
In order for this solution to scale across multiple instances of this problem (say tomorrow I am asked to write a widget.py module for some unrelated tasks and it also cannot be distributed as a package), I have to replicate my helper function and ensure a copy of it is distributed with any tests, or I have to write that small utility as a package, ensure it gets globally installed across my user base, and then maintain it going forward.
When you manage a lot of Python code internal to a company, often one dysfunctional code distribution mode that occurs is that "installing" some Python code becomes equivalent to checking out the new version from version control.
Since the code is often extremely localized and specific to a small set of tasks for a small subset of a larger team, maintaining the overhead for sharing code via packaging (even if it is a better idea in general) will simply never happen.
As a result, I feel the use case I describe above is extremely common in real-world Python, and it would be nice if some import tools added this functionality for modifying PYTHONPATH, with some sensible default choices (like adding the parent directory) made very easy.
That way you could rely on this at least being part of the standard library, and not needing to roll your own code and ensure it's either shipped with your tests or installed across your user base.

Python preprocessing imports

I am managing a quite large Python code base (>2000 lines) that I nevertheless want to be available as a single runnable Python script. So I am searching for a method or a tool to merge a development folder, made of different Python files, into a single running script.
The thing/method I am searching for should take code split into different files, maybe with a starting __init__.py file that contains the imports, and merge it into a single, big script.
Much like a preprocessor. Ideally it would be a near-native approach, and it would be even better if I could still run the code from the dev folder.
I have already checked out pypp and pypreprocessor, but they don't seem to address this.
Something like a strange use of __import__(), or maybe a bunch of from foo import * statements replaced by the preprocessor with the imported code? Obviously I only want to merge my own directory and not common libraries.
Update
What I want is exactly this: maintaining the code as a package, and then being able to "compile" it into a single script that is easy to copy-paste, distribute and reuse.
It sounds like you're asking how to merge your codebase into a single 2000-plus-line source file; are you really, really sure you want to do this? It will make your code harder to maintain. Python files correspond to modules, so unless your main script does from modname import * for all its parts, you'll lose the module structure by converting it into one file.
What I would recommend is leaving the source structured as they are, and solving the problem of how to distribute the program:
You could use PyInstaller, py2exe or something similar to generate a single executable that doesn't even need a python installation. (If you can count on python being present, see @Sebastian's comment below.)
If you want to distribute your code base for use by other python programs, you should definitely start by structuring it as a package, so it can be loaded with a single import.
To distribute a lot of python source files easily, you can package everything into a zip archive or an "egg" (which is actually a zip archive with special housekeeping info). Python can import modules directly from a zip or egg archive.
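For example, a directory of modules zipped into mylib.zip (a placeholder name) can be imported by putting the archive itself on sys.path:

import sys

# mylib.zip is assumed to contain mymodule.py (or a package) at its top level
sys.path.insert(0, "mylib.zip")

import mymodule          # zipimport loads this straight from the archive
mymodule.do_something()  # placeholder call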
waffles seems to do exactly what you're after, although I've not tried it
You could probably do this manually, something like:
# file1.py
from .file2 import func1, func2

def something():
    func1() + func2()


# file2.py
def func1(): pass
def func2(): pass


# __init__.py
from .file1 import something

if __name__ == "__main__":
    something()
Then you can concatenate all the files together, removing any line starting with from ., and.. it might work.
That said, an executable egg or regular PyPI distribution would be much simpler and more reliable!
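For completeness, a rough sketch of the concatenation idea above (the file names and their order are assumptions, and it only works for code that relies on nothing more than simple from . imports):

# build_single_script.py: naive "preprocessor" that glues a small package
# into one runnable file by concatenating its modules in dependency order
# and dropping the intra-package imports that no longer make sense.
files = ["file2.py", "file1.py", "__init__.py"]  # dependency order matters

with open("combined.py", "w") as out:
    for name in files:
        out.write(f"\n# ---- {name} ----\n")
        with open(name) as src:
            for line in src:
                if line.lstrip().startswith("from ."):
                    continue  # everything lives in one namespace now
                out.write(line)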
