Can a Python module be multiple files? - python

For years, I've known that the very definition of a Python module is as a separate file. In fact, even the official documentation states that "a module is a file containing Python definitions and statements". Yet, this online tutorial from people who seem pretty knowledgeable states that "a module usually corresponds to a single file". Where does the "usually" come from? Can a Python module consist of multiple files?

Not really.
Don't read too much into the phrasing of one short throwaway sentence, in a much larger blog post that concerns packaging and packages, both of which are by nature multi-file.
Imports do not make modules multifile
By the logic that modules are multi-file because of imports, almost any Python module would be multi-file. The only exception would be imports from the module's own subtree, which makes no discernible difference to code using the module. That notion of subtree imports, by the way, is relevant... to Python packages.
__module__, the attribute found on classes and functions, also maps to a single file, as determined by the import path.
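This is easy to check directly; a quick sketch, using json only as a convenient stdlib example:

```python
# __module__ records the module in which a function or class was
# defined, i.e. a single import path (normally a single file).
import json

def local_func():
    pass

class LocalClass:
    pass

# Things defined in the current file report the current module's name
# ("__main__" when the file is run as a script).
print(local_func.__module__)
print(LocalClass.__module__)

# Imported objects report the module they were defined in, even when
# re-exported elsewhere: JSONDecoder lives in json/decoder.py.
print(json.JSONDecoder.__module__)  # json.decoder
```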
The usefulness of expanding the definition of modules that way seems… limited, and risks confusion. Let imports be imports and modules be modules (i.e. files).
But that's like, my personal opinion.
Let's go all language lawyer on it
And refer to the Python tutorial. I figure they will be talking about modules at some point and will be much more careful in their wording than a blog post that is primarily concerned with another subject.
6. Modules
To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module's name (as a string) is available as the value of the global variable __name__.
p.s. OK, what about calling it a file, instead of a module, then?
That supposes that you store Python code in a file system. But you could have an exotic environment that stores it in a database instead (or embeds it in a larger C/Rust executable?). So "module" seems better understood as a "contiguous chunk of Python code". Usually that's a file, but having a separate term allows for flexibility, without changing any of the core concepts.
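That flexibility is visible in the standard library: types.ModuleType lets you build a module from a plain string of code that never touched the file system. A minimal sketch (the module name dbmod is made up):

```python
# Build a module object from a source string, as an "exotic
# environment" (say, one storing code in a database) might do.
import types

source = """
GREETING = "hello"

def greet(name):
    return GREETING + ", " + name
"""

mod = types.ModuleType("dbmod", "a module whose source never touched a file")
exec(source, mod.__dict__)  # run the code with the module dict as globals

print(mod.__name__)         # dbmod
print(mod.greet("world"))   # hello, world
```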

Yup, a Python module can involve more than one file. Basically what you would do is create a file for the main code of the module you are writing, and in that main file import some other tools you can use.
For example, you can have the file my_splitter_module.py, in which you have... say a function that takes a list of integers and splits it in half, creating two lists. Now say you want to multiply all the numbers in the first half with each other ([1, 2, 3] -> 1 * 2 * 3), but sum the ones in the other half ([1, 2, 3] -> 1 + 2 + 3). Now say you don't want to make the code messy, so you decide to write another two functions: one that takes a list and multiplies its items, and another that sums them.
Of course, you could put the two functions in the same my_splitter_module.py file, but in other situations, when you have big files with big classes etc., you might prefer to make files like multiply_list.py and sum_list.py, and then import them into my_splitter_module.py.
In the end, you would import my_splitter_module into your main.py file, and in doing so you would also be importing the multiply_list.py and sum_list.py files.
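A sketch of the layout described above, collapsed into one runnable snippet; in a real project each commented section would live in the file it names:

```python
# --- multiply_list.py ---
from functools import reduce

def multiply_items(nums):
    """Multiply all items together: [1, 2, 3] -> 6."""
    return reduce(lambda a, b: a * b, nums, 1)

# --- sum_list.py ---
def sum_items(nums):
    """Add all items together: [1, 2, 3] -> 6."""
    return sum(nums)

# --- my_splitter_module.py ---
# (in separate files this would start with imports such as
#  from multiply_list import multiply_items)
def split_and_combine(nums):
    """Split a list in half: multiply the first half, sum the second."""
    mid = len(nums) // 2
    return multiply_items(nums[:mid]), sum_items(nums[mid:])

# --- main.py ---
print(split_and_combine([1, 2, 3, 4, 5, 6]))  # (6, 15)
```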

Yes, sure.
If you create a folder named mylib in a directory on sys.path or in the same directory as your script, you can use import mylib.
Make sure to put an __init__.py in the folder and, in it, import everything you need from the other files, because import mylib only exposes the variables, functions, etc. that the __init__.py defines or imports.
For example:
project -+- lib -+- __init__.py
         |      +- bar.py
         |      +- bar2.py
         |
         +- foo.py
__init__.py :
from .bar import test, random
from .bar2 import sample
foo.py :
import lib
print(lib.test)
lib.sample()
Hope it helps.

Related

Structuring ocean modelling code in python

I am starting to use Python for numerical simulation, and in particular I am starting from this project to build mine, which will be more complicated than this one, since I will have to try a lot of different methods and configurations. I work full time on Fortran90 and Matlab codes, and those two languages are my "mother tongue". In those two languages one is free to structure the code however one wants, and I am trying to mimic this feature because in my field (computational oceanography) things get complicated easily. See as an example the code I work with daily, NEMO (here the main page, here the source code). The source code (of NEMO) is conveniently divided into folders, each of which contains modules and methods for a specific task (e.g. the domain discretisation routines are in folder DOM, the vertical physics is in the folder ZDF, the lateral physics in LDF and so on), because the processes involved (physical or purely mathematical) are completely different.
What I am trying to build is this
/shallow_water_model
    create_conf.py (creates a new subdirectory in /cfgs with a given name, like "caspian_sea" or "mediterranean_sea", and copies the content of the folder /src into this new subdirectory to create a new configuration)
    /cfgs
        /caspian_sea (example configuration)
        /mediterranean_sea (example configuration)
    /src
        swm_main.py (initializes a dictionary and calls the functions)
        swm_param.py (fills the dictionary)
        /domain
            swm_grid.py (creates a numerical grid)
        /dynamics
            swm_adv.py (creates the advection matrix)
            swm_dif.py (creates the diffusion matrix)
        /solver
            swm_rk4.py (time stepping with Runge-Kutta 4)
            swm_pc.py (time stepping with predictor-corrector)
        /IO
            swm_input.py (handles netCDF input)
            sim_output.py (handles netCDF output)
The script create_conf.py has the following structure; it is supposed to take a string input from the terminal, create a folder with that name, and copy all the files and subdirectories of the /src folder inside, so one can put all the input files of a configuration there and eventually modify the source code to create an ad-hoc source for that configuration. This duplication of the source code is common in the ocean modelling community, because two different configurations (like the Mediterranean Sea and the Caspian Sea) may differ not only in the input files (like topography, coastlines, etc.) but also in the modelling itself, meaning that the modifications you need to make to the source code for each configuration might be substantial. (Most ocean models allow you to put your own modified source files in specific folders and are instructed to overwrite the specific files at compilation. My code is going to be simple enough to just duplicate the source code.)
import os, sys
import shutil

def create_conf(conf_name="new_config"):
    cfg_dir = os.getcwd() + "/cfgs/"
    # Check if configuration exists
    try:
        os.makedirs(cfg_dir + conf_name)
        print("Configuration " + conf_name + " correctly created")
    except FileExistsError:
        # directory already exists
        # Handles overwriting, duplicates or stop
        return
    # make a copy of "/src" into the new folder

# This is supposed to be used directly from the terminal
if __name__ == '__main__':
    filename = sys.argv[1]
    create_conf(filename)
The script swm_main.py can be thought of as a list of calls to the necessary routines, depending on the kind of process you want to take into account, like:
import numpy as np
from DOM.swm_domain import set_grid
from swm_param import set_param, set_timestep, set_viscosity
# initialize dictionary (i.e. structure) containing all the parameters of the run
global param
param = dict()
# define the parameters (i.e. call swm_param.py)
set_param(param)
# Create the grid
set_grid(param)
The two routines called just take a particular field of param and assign it a value, like
import numpy as np
import os

def set_param(param):
    param['nx'] = 32 # number of grid points in x-direction
    param['ny'] = 32 # number of grid points in y-direction
    return param
Now, the main topic of discussion is how to achieve this kind of structure in Python. I almost always find source code that is either monolithic (all routines in the same file) or a sequence of files in the same folder. I want better organisation, but the solution I found browsing fills every subfolder in /src with a __pycache__ folder, and I need to put an __init__.py file in each folder. I don't know why, but these two things make me think there is something sloppy in this approach. Moreover, I need to import modules (like numpy) in every file, and I was wondering whether this is efficient or not.
What do you think would be better to keep this structuring and keep it as simple as possible?
Thanks for your help
As I understand the actual question here is:
the solution I found browsing fills every subfolder in /src with a folder __pycache__ and I need to put a __init__.py file in each folder... this makes me think there is something sloppy in this approach.
There is nothing sloppy or unpythonic about making your code into packages. In order to be able to import from .py files in a directory, one of two conditions has to be satisfied:
the directory must be in your sys.path, or
the directory must be a package, and that package must be a sub-directory of some directory in your sys.path (or a sub-directory of a package which is a sub-directory of some directory in your sys.path)
The first solution involves modifying sys.path to add every dir you want; it is generally hacky in code, although often appropriate in tests. It is hacky because the whole point of putting your code inside a package is that the package structure encodes some natural division in the source: e.g. a package modeller is conceptually distinct from a package quickgui, and each could be used independently of the other in different programs.
The easiest[1] way to make a directory into a package is to place an __init__.py in it. The file should contain anything which belongs conceptually at the package level, i.e. not in modules. It may be appropriate to leave it empty, but it's often a good idea to import the public functions/classes/vars from your modules, so you can do from mypkg import thing rather than from mypkg.module import thing.
Packages should be conceptually complete, which normally means you should be able (in theory) to use them from multiple places. Sometimes you don't want a separate package: you just want a naming convention, like gui_tools.py, gui_constants.py, model_tools.py, model_constants.py, etc.
The __pycache__ folder is simply Python caching the bytecode to make future imports faster: you can move it or prevent it, but just add *__pycache__* to your .gitignore and forget about it.
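A self-contained sketch of that re-export pattern; the package is built in a temporary directory here purely so the example runs anywhere, and the names mypkg, module and thing are invented:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: mypkg/module.py defines thing(),
# and mypkg/__init__.py re-exports it at package level.
root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
pkg.mkdir()
(pkg / "module.py").write_text("def thing():\n    return 42\n")
(pkg / "__init__.py").write_text("from .module import thing\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Thanks to the __init__.py, users need not know which module
# thing() actually lives in.
from mypkg import thing
print(thing())  # 42
```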
Lastly, since you come from very different languages:
lots of python code written by scientists (rather than programmers) is quite unpythonic IMHO. Billion-line single Python files are not good style[2]. Python prefers readability, always: call things derived_model, not dm1. If you do that you may well find you don't need as many dirs as you thought.
importing the same module in every file is a trivial cost: Python imports once; every other import just binds another name to the already-loaded module in sys.modules. Always import explicitly.
in general stop worrying about performance in python. Write your code as clearly as possible, then profile it if you need to, and find what is slow. Python is so high level that micro-optimisations learned in compiled languages will probably backfire.
lastly, and this is mostly personal, don't give folders/modules names in CAPITALS. FORTRAN might encourage that, and it was written on machines which often didn't have case sensitivity for filenames, but we no longer have those constraints. In python we reserve capitals for constants, so I find it plain weird when I have to modify or execute something in capitals. Likewise 'DOM' made me think of the document object model which is probably not what you mean here.
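The point above about import cost can be verified directly: the module's code runs once, and every later import is just a lookup in sys.modules:

```python
import sys
import math          # runs math's init code (or finds it cached)
import math as m2    # "importing again": no code re-runs

# Both names are bound to the one cached module object.
assert m2 is math
assert sys.modules["math"] is math
print("one module object, many names")
```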
References
[1] Python does have implicit namespace packages but you are still better off with explicit packages to signal your intention to make a package (and to avoid various importing problems).
[2] See pep8 for some more conventions on how you structure things. I would also recommend looking at some decent general-purpose libraries to see how they do things: they tend to be written by mainstream programmers who focus on writing clean, maintainable code, rather than by scientists who focus on solving highly specific (and frequently very complicated) problems.

Python - Importing function from external file and global modules

In order to simplify my code, I have put various functions into external files, which I load via:
from (external_file) import (function_name)
...which works fine.
My question though has to do with other modules, such as cv2 or numpy - do I need those listed in my external file (as well as my main file) or is there a way to just list them in my main file?
Each file you put Python code in is its own module. Each module has its own namespace. If some of your code (in any module) uses some library code, it will need some way to access the library from the namespace it is defined in.
Usually this means you need to import the library in each module it's being used from. Don't worry about duplication: modules are cached when they are first loaded, so additional imports from other modules will quickly find the existing module and just add a reference to it in their own namespaces.
Note that it's generally not a good idea to split up your code too much. There's certainly no need for every function or every class to have its own file. Instead, use modules to group related things together. If you have a couple of functions that interoperate a lot, put them in the same module.

Proper way to import across Python package

Let's say I have a couple of Python packages.
/package_name
    __init__.py
    /dohickey
        __init__.py
        stuff.py
        other_stuff.py
        shiny_stuff.py
    /thingamabob
        __init__.py
        cog_master.py
        round_cogs.py
        teethless_cogs.py
    /utilities
        __init__.py
        important.py
        super_critical_top_secret_cog_blueprints.py
What's the best way to utilize the utilities package? Say shiny_stuff.py needs to import important.py, what's the best way to go about that?
Currently I'm thinking
from .utilities import important
But is that the best way? Would it make more sense to add utilities to the path and import it that way?
import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import utilities.super_critical_top_secret_cog_blueprints
That seems clunky to add to each of my files.
I think the safest way is always to use absolute imports, so in your case:
from package_name.utilities import important
This way you won't have to change your code if you decide to move your shiny_stuff.py in some other package (assuming that package_name will still be in your sys.path).
According to Nick Coghlan (who is a Python core developer):
"Never add a package directory, or any directory inside a package, directly to the Python path." (under the heading "The double import trap")
Adding the package directory to the path gives two separate ways for the module to be referred to. The link above is an excellent blog post about the Python import system. Adding it to the path directly means you can potentially have two copies of a single module, which you don't want. Your relative import from .utilities import important is fine, and an absolute import import package_name.utilities.important is also fine.
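The trap is easy to demonstrate; here a throwaway package is built in a temporary directory, and its directory is (wrongly) put on sys.path alongside its parent:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "mod.py").write_text("STATE = []\n")

sys.path.insert(0, str(root))  # correct: the package's parent
sys.path.insert(0, str(pkg))   # the trap: the package dir itself
importlib.invalidate_caches()

import mod                       # resolved via the package dir
from pkg import mod as pkg_mod   # resolved via the parent dir

# Same file on disk, but two independent module objects:
print(mod is pkg_mod)                                    # False
print("mod" in sys.modules, "pkg.mod" in sys.modules)    # True True
```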
A "best" out-of-context choice probably doesn't exist, but you can have some criteria for choosing which is better for your use cases, and for such a judgment one should know the different possible approaches and their characteristics. Probably the best source of information is PEP 328 itself, which contains some rationale for the distinct possibilities.
A common approach is to use the "absolute import", in your case it would be something like:
from package_name.utilities import important
This way, you can run this file as a script. It is somewhat independent from other modules and packages, fixed mainly by its location. If you have a package structure and need to move one single module to another location, an absolute path helps this single file stay unchanged, but all the files that use this module would have to change. Of course you can also import from the __init__.py files, as:
from package_name import utilities
And these imports have the same characteristics. Be careful: utilities.important tries to find a name important within __init__.py, not the file important.py, so having a from . import important in __init__.py helps avoid mistakes arising from the distinction between file structure and namespace structure.
Another way to do that is the relative approach, by using:
from ..utilities import important
The first dot (from .stuff import ___ or from . import ___) says "the module in this [sub]package", or __init__.py when there's only the dot. From the second dot we are talking about parent directories. Generally, starting with dots in any import isn't allowed in a script/executable, but you can read about explicit relative imports (PEP 366) if you care about scripts with relative imports.
A justification for relative import can be found on the PEP 328 itself:
With the shift to absolute imports, the question arose whether relative imports should be allowed at all. Several use cases were presented, the most important of which is being able to rearrange the structure of large packages without having to edit sub-packages. In addition, a module inside a package can't easily import itself without relative imports.
In either case, the modules are tied to the subpackages in the sense that package_name is imported first no matter what the user tried to import first, unless you use sys.path to search for subpackages as packages (i.e., put the package root inside sys.path)... but that sounds weird; why would one do that?
The __init__.py can auto-import module names, for that one should care about its namespace contents. For example, say important.py has an object called top_secret, which is a dictionary. To find it from anywhere you would need
from package_name.utilities.important import top_secret
Perhaps you want to be less specific:
from package_name.utilities import top_secret
That would be done with an __init__.py with the following line inside it:
from .important import top_secret
That's perhaps mixing the relative and absolute import styles, but for an __init__.py you probably know that the subpackage makes sense as a subpackage, i.e., as an abstraction by itself. If it's just a bunch of files located in the same place with the need for an explicit module name, the __init__.py would probably be empty (or almost empty). But to spare the user explicit module names, the same idea can be applied in the root __init__.py, with
from .utilities import top_secret
Completely indirect, but the namespace becomes flat this way while the files stay nested for some internal organization. For example, the wx package (wxPython) does that: everything can be imported with from wx import ___ directly.
You can also use some metaprogramming for finding the contents if you want to follow this approach, for example, using __all__ to detect all names a module have, or looking for the file location to know which modules/subpackages are available there to import. However, some simpler code completion utilities might get lost when doing that.
For some contexts you might have other kind of constraints. For example, macropy makes some "magic" with imports and doesn't work on the file you call as a script, so you'll need at least 2 modules just to use this package.
Anyhow, you should always ask whether nesting into subpackages is really needed for your code or API organization. PEP 20 tells us that "Flat is better than nested", which isn't a law but a point of view suggesting you should keep a flat package structure unless nesting is needed for some reason. Likewise, you don't need a module for each class, nor anything like that.
Use absolute imports in case the module needs to move to a different location.

Constants in a python package

I'm writing a python package and am wondering where the best place is to put constants?
I know you can create a file called 'constants.py' in the package and then call them with module.constants.const, but shouldn't there be a way to associate the constant with the whole module? e.g. you can call numpy.pi, how would I do something like that?
Also, where in the module is the best place to put paths to directories outside of the module where I want to read/write files?
Put them where you feel they can most easily be maintained. Usually that means in the module to which the constants logically belong.
You can always import the constants into the __init__.py file of your package to make it easier for someone to find them. If you did decide on a constants module, I'd add an __all__ sequence to state what values are public, then in the __init__.py file do:
from .constants import *
to make the same names available at the package level.
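A self-contained sketch of that setup; the package measures and its contents are invented, and it is written to a temporary directory only so the example runs anywhere:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "measures"
pkg.mkdir()
(pkg / "constants.py").write_text(
    "__all__ = ['PI', 'E']\n"
    "PI = 3.14159\n"
    "E = 2.71828\n"
    "SCRATCH = 'not public: excluded from __all__'\n"
)
# Re-export the public constants at package level.
(pkg / "__init__.py").write_text("from .constants import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import measures
print(measures.PI)                    # 3.14159, numpy.pi-style access
print(hasattr(measures, "SCRATCH"))   # False: filtered out by __all__
```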

Python preprocessing imports

I am managing a quite large Python code base (>2000 lines) that I nevertheless want to be available as a single runnable Python script. So I am searching for a method or a tool to merge a development folder, made of different Python files, into a single running script.
The thing/method I am searching for should take code split into different files, maybe with a starting __init__.py file that contains the imports, and merge it into a single, big script.
Much like a preprocessor. A near-native way would be best, and better still if I can still run from the dev folder.
I have already checked out pypp and pypreprocessor but they don't seem to address this.
Something like a strange use of __import__() or maybe a bunch of from foo import * replaced by the preprocessor with the code? Obviously I only want to merge my directory and not common libraries.
Update
What I want is exactly this: maintaining the code as a package, and then being able to "compile" it into a single script, easy to copy-paste, distribute and reuse.
It sounds like you're asking how to merge your codebase into a single 2000-plus-line source file. Are you really, really sure you want to do this? It will make your code harder to maintain. Python files correspond to modules, so unless your main script does from modname import * for all its parts, you'll lose the module structure by converting it into one file.
What I would recommend is leaving the source structured as they are, and solving the problem of how to distribute the program:
You could use PyInstaller, py2exe or something similar to generate a single executable that doesn't even need a python installation. (If you can count on python being present, see #Sebastian's comment below.)
If you want to distribute your code base for use by other python programs, you should definitely start by structuring it as a package, so it can be loaded with a single import.
To distribute a lot of python source files easily, you can package everything into a zip archive or an "egg" (which is actually a zip archive with special housekeeping info). Python can import modules directly from a zip or egg archive.
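Zip importing can be sketched in a few lines; the archive and module names here are made up:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Pack a module into a zip archive...
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_mod.py", "def hello():\n    return 'from the zip'\n")

# ...put the archive itself on sys.path, and import as usual.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

import zipped_mod
print(zipped_mod.hello())  # from the zip
```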
waffles seems to do exactly what you're after, although I've not tried it
You could probably do this manually, something like:
# file1.py
from .file2 import func1, func2

def something():
    func1() + func2()

# file2.py
def func1(): pass
def func2(): pass

# __init__.py
from .file1 import something

if __name__ == "__main__":
    something()
Then you can concatenate all the files together, removing any line starting with from ., and.. it might work.
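A minimal sketch of that manual approach, with the caveats it deserves: files must be supplied in dependency order, and name clashes are not detected. The helper merge_sources and the sample sources are invented for illustration:

```python
def merge_sources(sources):
    """Concatenate (filename, code) pairs, dropping internal
    `from .` imports, which are redundant once everything is one file."""
    out = []
    for name, code in sources:
        out.append("# ---- " + name + " ----")
        for line in code.splitlines():
            if line.lstrip().startswith("from ."):
                continue
            out.append(line)
    return "\n".join(out) + "\n"

merged = merge_sources([
    ("file2.py", "def func1(): return 1\ndef func2(): return 2\n"),
    ("file1.py", "from .file2 import func1, func2\n"
                 "def something():\n    return func1() + func2()\n"),
])

# The merged blob runs as a single "script":
namespace = {}
exec(merged, namespace)
print(namespace["something"]())  # 3
```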
That said, an executable egg or regular PyPI distribution would be much simpler and more reliable!
