The scheme is the following: there is a package called foo (an API under heavy development, in its first alpha phase) whose rst files are auto-generated with sphinx-apidoc.
To get better documentation for foo, those files are edited after generation: in, say, foo.bar.rst, some paragraphs are added to the content generated by sphinx-apidoc.
How can I avoid losing all that information when sphinx-apidoc is run again? Of course I still want potential changes in the API to be reflected, with the manually added information preserved.
sphinx-apidoc only needs to be re-run when the module structure of your project changes. If adding, removing, and renaming modules is an uncommon occurrence for you, it may be easiest to just place the rst files under version control and update them by hand. Adding or removing a module only requires changing a few lines of rst, so after the initial run you don't even need to use sphinx-apidoc again.
I am starting to use Python for numerical simulation, and in particular I am starting from this project to build mine, which will be more complicated since I will have to try a lot of different methods and configurations. I work full time on Fortran90 and Matlab codes, and those two languages are my "mother tongues". In both of them one is free to structure the code however one wants, and I am trying to mimic this because in my field (computational oceanography) things get rather complicated easily. See as an example the code I work with daily, NEMO (here the main page, here the source code). The source code of NEMO is conveniently divided into folders, each of which contains the modules and methods for a specific task (e.g. the domain discretisation routines are in the folder DOM, the vertical physics in ZDF, the lateral physics in LDF, and so on), because the processes involved (physical or purely mathematical) are completely different.
What I am trying to build is this:

/shallow_water_model
    create_conf.py              (creates a new subdirectory in /cfgs with a given name, like "caspian_sea" or "mediterranean_sea", and copies the content of /src into it to create a new configuration)
    /cfgs
        /caspian_sea            (example configuration)
        /mediterranean_sea      (example configuration)
    /src
        swm_main.py             (initialises a dictionary and calls the functions)
        swm_param.py            (fills the dictionary)
        /domain
            swm_grid.py         (creates a numerical grid)
        /dynamics
            swm_adv.py          (creates the advection matrix)
            swm_dif.py          (creates the diffusion matrix)
        /solver
            swm_rk4.py          (time stepping with Runge-Kutta 4)
            swm_pc.py           (time stepping with predictor-corrector)
        /IO
            swm_input.py        (handles netCDF input)
            sim_output.py       (handles netCDF output)
The script create_conf.py has the following structure; it is supposed to take a string input from the terminal, create a folder with that name, and copy all the files and subdirectories of the /src folder into it, so one can put all the input files of that configuration there and eventually modify the source code to create an ad-hoc version for the configuration. This duplication of the source code is common in the ocean modelling community, because two different configurations (like the Mediterranean Sea and the Caspian Sea) may differ not only in the input files (topography, coastlines, etc.) but also in the modelling itself, meaning that the modifications you need to make to the source code for each configuration might be substantial. (Most ocean models allow you to put your own modified source files in specific folders and are instructed to overwrite the corresponding files at compilation. My code is going to be simple enough to just duplicate the source code.)
import os, sys
import shutil

def create_conf(conf_name="new_config"):
    cfg_dir = os.getcwd() + "/cfgs/"
    # Check if configuration exists
    try:
        os.makedirs(cfg_dir + conf_name)
        print("Configuration " + conf_name + " correctly created")
    except FileExistsError:
        # directory already exists:
        # handle overwriting, duplicates, or stop
        pass
    # make a copy of "/src" into the new folder
    return

# This is supposed to be used directly from the terminal
if __name__ == '__main__':
    filename = sys.argv[1]
    create_conf(filename)
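For the copy step that is still only a comment above, a minimal sketch could use shutil.copytree; the paths below simply follow the layout described earlier, and error handling for an already-populated configuration is left out:

import os
import shutil

def copy_src(conf_name):
    # copy the contents of /src into the freshly created configuration folder
    src_dir = os.path.join(os.getcwd(), "src")
    dst_dir = os.path.join(os.getcwd(), "cfgs", conf_name)
    # dirs_exist_ok=True because the folder was just created by os.makedirs
    # (requires Python 3.8+; on older versions copy into a subfolder instead)
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)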
The script swm_main.py can be thought of as a list of calls to the necessary routines, depending on the kind of processes you want to take into account, like this:
import numpy as np
from DOM.swm_domain import set_grid
from swm_param import set_param, set_timestep, set_viscosity
# initialize dictionary (i.e. structure) containing all the parameters of the run
global param
param = dict()
# define the parameters (i.e. call swm_param.py)
set_param(param)
# Create the grid
set_grid(param)
The two routines called just take a particular field of param and assign it a value, like
import numpy as np
import os

def set_param(param):
    param['nx'] = 32    # number of grid points in x-direction
    param['ny'] = 32    # number of grid points in y-direction
    return param
Now, the main topic of discussion is how to achieve this kind of structure in Python. I almost always find source code that is either monolithic (all routines in the same file) or a sequence of files in the same folder. I want better organisation, but the solution I found by browsing fills every subfolder of /src with a __pycache__ folder, and I need to put an __init__.py file in each folder. I don't know why, but these two things make me think there is something sloppy about this approach. Moreover, I need to import modules (like numpy) in every file, and I was wondering whether that is efficient or not.
What do you think is the best way to achieve this structure while keeping it as simple as possible?
Thanks for your help
As I understand it, the actual question here is:
the solution I found browsing fills every subfolder in /src with a folder __pycache__ and I need to put a __init__.py file in each folder... this makes me think there is something sloppy in this approach.
There is nothing sloppy or unpythonic about making your code into packages. In order to be able to import from .py files in a directory, one of two conditions has to be satisfied:
the directory must be in your sys.path, or
the directory must be a package, and that package must be a sub-directory of some directory in your sys.path (or a sub-directory of a package which is a sub-directory of some directory in your sys.path)
The first solution involves modifying sys.path to add every directory you want; it is generally hacky in code, although often appropriate in tests. It is hacky because the whole point of putting your code inside a package is that the package structure encodes some natural division in the source: e.g. a package modeller is conceptually distinct from a package quickgui, and each could be used independently of the other in different programs.
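To make the contrast concrete, a minimal sketch of that sys.path approach (the paths are made up and only illustrative) would be:

# hacky sys.path approach -- hypothetical paths, usually better kept to test code
import os
import sys

# make ./src/domain importable without turning it into a package
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "src", "domain"))

import swm_grid   # found via the entry just added to sys.path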
The easiest[1] way to make a directory into a package is to place an __init__.py in it. The file should contain anything which belongs conceptually at the package level, i.e. not in modules. It may be appropriate to leave it empty, but it's often a good idea to import the public functions/classes/vars from your modules, so you can do from mypkg import thing rather than from mypkg.module import thing.

Packages should be conceptually complete, which normally means you should be able (in theory) to use them from multiple places. Sometimes you don't want a separate package: you just want a naming convention, like gui_tools.py, gui_constants.py, model_tools.py, model_constants.py, etc.

The __pycache__ folder is simply Python caching the bytecode to make future imports faster. You can move it or prevent it, but the simplest thing is to add __pycache__/ to your .gitignore and forget about it.
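As a small sketch (mypkg and module.py are made-up names), an __init__.py that re-exports the public names so the shorter import works could look like:

# mypkg/__init__.py -- hypothetical package, illustrating package-level re-exports
from mypkg.module import thing, OtherThing   # enables: from mypkg import thing

__all__ = ["thing", "OtherThing"]            # what `from mypkg import *` exposes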
Lastly, since you come from very different languages:
lots of python code written by scientists (rather than programmers) is quite unpythonic IMHO. A billion-line-long single Python file is not good style[2]. Python prefers readability, always: call things derived_model, not dm1. If you do that you may well find you don't need as many dirs as you thought.
importing the same module in every file is a trivial cost: Python imports it only once, and every subsequent import just binds another name to the module object already stored in sys.modules. Always import explicitly (a tiny demonstration follows after this list).
in general, stop worrying about performance in Python. Write your code as clearly as possible, then profile it if you need to, and find what is actually slow. Python is so high level that micro-optimisations learned in compiled languages will probably backfire.
lastly, and this is mostly personal, don't give folders/modules names in CAPITALS. FORTRAN might encourage that, and it grew up on machines which often didn't have case-sensitive filenames, but we no longer have those constraints. In Python we reserve capitals for constants, so I find it plain weird when I have to modify or execute something in capitals. Likewise 'DOM' made me think of the document object model, which is probably not what you mean here.
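Here is the promised demonstration of the import point above (plain numpy plus the standard library, nothing specific to your project):

# repeated imports are cheap: the module is loaded once and cached in sys.modules
import sys
import numpy as np

print("numpy" in sys.modules)   # True -- loaded exactly once

import numpy as np2             # no reload, just another name for the same object
print(np is np2)                # True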
References
[1] Python does have implicit namespace packages but you are still better off with explicit packages to signal your intention to make a package (and to avoid various importing problems).
[2] See pep8 for some more conventions on how you structure things. I would also recommend looking at some decent general-purpose libraries to see how they do things: they tend to be written by mainstream programmers who focus on writing clean, maintainable code, rather than by scientists who focus on solving highly specific (and frequently very complicated) problems.
I'm trying to add pre- and post-process actions when building a project with SCons.
The SConstruct and SConscript files are at the top of the project.
Pre-process actions:
Generating code (by calling different tools):
-> without knowing in advance exactly which files will be generated by this pre-process (an additional pre-process step that determines which files were generated could be added in order to feed them to SCons)
-> running external scripts (Python, Perl), executed before compilation
Post-process actions:
-> running external tools and external scripts that should be executed after linking
What I have tried so far:
For the pre-process:
Using os.system from Python to run a command (works fine, but I'm looking for an "SCons solution").
Using the AddPreAction(target, action) function from SCons. Unfortunately this action is executed after the project is compiled, as the SCons user manual states: "The specified pre_action would be executed before scons calls the link command that actually generates
the executable program binary foo, not before compiling the foo.c file into an object file."
For the post-process:
Using AddPostAction(target, action), which fortunately works fine.
I'm looking for solutions that will make SCons aware of these pre- and post-processes.
My question is the following:
What is the best approach, for the requirements stated above, using SCons? Is there a way to execute pre-process actions before compilation using SCons built-in functions?
You don't give very much detail about what you've tried to get your pre-processing part working. In general, you should try to create real Builders for the code-generation part... this will make the detection and handling of dependencies easier for SCons (and for you as the user ;) ). You may want to check out our Wiki at https://bitbucket.org/scons/scons/wiki/ToolsForFools , where we explain in great detail how to write new Builders.
If you need to run additional scripts on every build, you should be able to trigger these fine with os.system() or an appropriate subprocess call, for example right at the start of your top-level SConstruct. But what I get from your latest edit (and I'll refer mainly to the first of the questions you asked) is that you're trying to model some sort of "staged" build process. You think you need a "preprocess" stage where you can hook in and create all the additional headers and sources you might need by calling your scripts. My guess is that you're trying to rewrite something like an original make/autotools setup and would like to reuse parts wherever possible, which isn't a bad idea of course. But SCons isn't stage-driven, it's dependency-driven... so your current approach is a bad fit and might lead to problems sooner or later.
The best thing you can do is to forget Pre- and PostActions and get your dependencies straight. In addition to writing your own Builder(s) to replace your scripts, you'd have to implement a proper Emitter for each of these Builders. The Emitter (check the Tools guide mentioned above) has to parse the input file that goes into the script and return the list of filenames that will be generated when the script actually runs. That way SCons knows a priori which files will be generated once the build script is run, and can already use these names for resolving dependencies (even if the actual files don't exist yet).
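To make that more concrete, here is a rough sketch of such a Builder-plus-Emitter pair in an SConstruct. The generator script gen_code.py and its .spec input format are invented for the example; only the SCons plumbing is the point:

# SConstruct (sketch)
def codegen_emitter(target, source, env):
    # Parse the .spec file and report every file the generator will create,
    # so SCons can resolve dependencies before those files exist.
    generated = []
    for line in open(str(source[0])):
        name = line.strip()
        if name:
            generated.extend([name + '.c', name + '.h'])
    return generated, source

codegen = Builder(action='python gen_code.py $SOURCE', emitter=codegen_emitter)
env = Environment(BUILDERS={'CodeGen': codegen})

generated = env.CodeGen('module.spec')   # target nodes come from the emitter
c_files = [f for f in generated if str(f).endswith('.c')]
env.Program('foo', ['main.c'] + c_files)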
For the post-processing part: this is usually handled by using the standard Python atexit handler. See e.g. How do I run some code after every build in scons? for an example.
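A minimal sketch of that atexit pattern, placed at the end of the top-level SConstruct (the post-link command itself is just a placeholder):

# end of SConstruct (sketch)
import atexit
from SCons.Script import GetBuildFailures

def post_build():
    if GetBuildFailures():
        print("build failed, skipping post-process step")
    else:
        print("build OK, running post-process step...")  # call your tool/script here

atexit.register(post_build)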
I want to use the django-achievements (link) module in my app, but it lacks some fields in its model. For example, I want to add a CharField with the path to the picture of the badge/achievement. I will also need to modify the module's engine.py file for that.
What is the right way to do that? Should I download the module into my main app's folder and modify the original files, or can I somehow redefine some methods/classes of the original models.py and engine.py locally without modifying the original files?
I'd say fork it and make your own modifications directly to the source. If it's an improvement you can create a Pull Request and contribute your code to the actual repository (not required, you can always just keep it for your own use).
A "settings file" would be a file where things like "background color", "speed of execution", "number of x's" are defined. Currently, I implemented it as a single setting.py file, which I import in the beginning. Someone told me I should make it a settings.ini file instead, but I don't see why! Care to clarify, what is the optimal option?
There is no optimal solution; it is a matter of preference.*
Normally, settings do not need to be expressed in a Turing-complete language: they're often just a bunch of flags and options, sometimes strings and numbers, etc. An argument for having a settings.py file (though very unorthodox) would be if the end-user was expected to write code to generate very esoteric configurations (e.g. maps for a game). This would then be fairly similar to shell script .bashrc-style files.
But again, in 99.9% of programs, the settings are often just a bunch of flags and options, sometimes strings and numbers, etc. It's fine to store them as JSON or XML. It also makes it easy to perform reflection on your settings: for example, automatically listing them in a tree manner, or automatically creating a GUI out of the descriptions.
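As a tiny sketch of that declarative style (the file name and keys are made up), loading settings stored as JSON with the standard library:

# settings.json might contain: {"background_color": "white", "speed": 2.5, "num_x": 10}
import json

with open("settings.json") as f:
    settings = json.load(f)          # plain data: nothing gets executed

print(settings["background_color"])  # settings are just values you look up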
(Also it may be a (unlikely?) security issue if you allow people to inject code by modifying the settings file.)
*edit: no pun intended...
There are a few reasons why separating config files from the main codebase is a good idea. Of course it depends on your use case, and you should evaluate against it.
Configuration can be managed by end users who do not understand programming languages. It makes more sense to factor out the configuration into a simple ini file that uses plain key-value pairs for the config parameters.
Configuration varies with the installation environment. Your code runs in multiple environments that all use different configurations. It is much easier to maintain such cases with separate config files and the same source code installed in each environment.
Package managers know the difference between a config file and a source file, and are smart enough not to overwrite a changed config on a version upgrade, so you do not have to worry about resetting config parameters after upgrading the package. For example, you ship your product with a default config file, the user fine-tunes a few parameters, and then you ship a new version of the package; the user should not see their config reset by the upgrade.
One problem with a settings file that is a Python module is that it can contain code which will be executed when you import it. This may allow malicious code to be inserted into your program.
For Python, use stock libraries:
YAML-style configuration files:
http://www.yaml.org/start.html
http://pypi.python.org/pypi/PyYAML/
(used e.g. by Google App Engine)
INI: http://docs.python.org/library/configparser.html
Don't use XML for hand-edited config files.
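As a quick sketch of the INI route with the standard library configparser (Python 3 spelling; the file name and keys are made up):

# settings.ini might contain:
#   [display]
#   background_color = white
#   speed = 2.5
import configparser

config = configparser.ConfigParser()
config.read("settings.ini")

bg = config["display"]["background_color"]    # values come back as strings
speed = config["display"].getfloat("speed")   # explicit conversion helpers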
In short, can I use many .mo files for the same language at the same time in Python?
In my Python application I need to use gettext for I18N. This app uses a kind of plug-in system, meaning you can download a plug-in, put it in the appropriate directory, and it runs like any other Python package. The main application stores the .mo file it uses in, let's say, ./locale/en/LC_MESSAGES/main.mo, and plug-in nr. 1 has its own .mo file, plugin1.mo, in that same directory.
I would use this to load the main.mo I18N messages:
gettext.install('main', './locale', unicode=False)
How can I install the others too, so that all the plug-ins are translated the way they should be?
The solutions I thought of:
Should I call gettext.install() in each package's namespace? But this would override the _() defined previously and mess up future translations of the main application.
Is there a way to combine two .mo files into one (when a new plug-in is installed, for example)?
At runtime, can I combine them into one GNUTranslations object? Or override the default _() that is added to the global namespace? If so, how would I go about that option?
Instead of _('Hello World'), I would use _('plugin1', 'Hello World in plug-in 1')
Note: The application is not supposed to be aware of all the plug-ins to be installed, so it cannot already have all the messages translated in its main.mo file.
gettext.install() installs one unalterable, app-global _ into the builtins (module __builtin__ in Python 2, builtins in Python 3). That way there is no flexibility.
Note: Python's name resolution order is: locals > module globals > builtins.
So in any case gettext.translation() (the class-based API), or even gettext.GNUTranslations() (e.g. for custom .mo path schemes), would be used explicitly to have multiple translations separately or mixed at the same time.
Some options:
Via t = gettext.translation(...); _ = t.ugettext you can simply put a separate translation as _ into each module-global namespace - probably in a more automated way in the real world. The main translation could perhaps still go into the builtins too (via main_t.install()). See the sketch after this list.
When a mix of all/many translations is ok or is what you want, you can chain several translations globally via t.install(); t.add_fallback(t_plugin1); t.add_fallback(t_plugin2); ... - and otherwise preserve an app-global approach.
gettext keywords other than _ can be used - and can be fed to xgettext via the -k other_keyword option. But I'd dislike lengthy, module-unique names.
( Yet personally I prefer the keyword I generally over _, and I also enable an operator scheme like I % "some text" instead of _("some text") or I("some text"). Via I = t; t.__call__ = t.__mod__ = t.ugettext effectively; plus a small patch of pygettext.py. This pattern is more pleasant for typing, looks more readable and Pythonic to me, and avoids the crucial/ugly name collision of _ in Python with the interactive-last-result-anaphor (see sys.displayhook) when using gettext'ed modules on interactive Python prompts. _ is also "reserved" as my preferred (local) placeholder for unused values in expressions like _, _, x, y, z, _ = some_tuple.
After all gettext is a rather simple module & mechanism, and things are easily customizable in Python.)
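A rough sketch of the first two options above (the domain names and the ./locale path follow the question; under Python 3 use t.gettext, since ugettext is gone):

import gettext

# one translation object per domain / .mo file
main_t = gettext.translation('main', './locale', fallback=True)
plugin1_t = gettext.translation('plugin1', './locale', fallback=True)

# Option 1: keep them separate -- each plugin module binds its own _
_ = plugin1_t.gettext        # inside plugin1's modules
_main = main_t.gettext       # inside the main application

# Option 2: one app-global _ with the plug-in catalogs chained as fallbacks
main_t.add_fallback(plugin1_t)
main_t.install()             # _ now searches main.mo first, then plugin1.mo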
You should use a different domain for each plugin. The domain can be the package name to prevent conflicts.
I do not understand why you would need to translate something outside the plugin using the plugin's domain, but if you really need to, then you should disambiguate the domain each time.
Each plugin can provide its own "underscore", readily bound to the plugin's domain:
from my.plugin import MessageFactory as _my_plugin
Please note that underscore is only a convention so that the extraction tools can find i18n-enabled messages in the program. Plugins' messages should be marked with underscore in their respective packages (you do put them into separate packages, right?). In all other places you are free to call these factories by some other name and leave underscore for the main program's translation domain.
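A hedged sketch of what such a plugin-provided factory might look like (the package name my.plugin and its locale layout are made up):

# my/plugin/__init__.py -- hypothetical plugin package providing its own
# translation function, bound to the plugin's own domain and locale dir
import os
import gettext

_LOCALE_DIR = os.path.join(os.path.dirname(__file__), 'locale')

MessageFactory = gettext.translation(
    'my.plugin',       # domain == package name, to avoid conflicts
    _LOCALE_DIR,
    fallback=True,     # fall back to the untranslated string if no .mo is found
).gettext

# inside the plugin's own modules:  _ = MessageFactory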
I am less sure about .mo-files, but you can surely compile all your .po files into one .mo file. However, if plugins are written by independent uncooperative authors, there could be msgid conflicts.
UPDATE:
If the plugins live in the same package as the main app, then there is no point in using individual translation domains for them (this is not your case). If the plugins are in separate packages, then extraction should be done independently for those packages. In both cases you have no problem with the variable _. If for some reason the main app wants a plugin's translations in its own code, use some other name than _, as in the answer above. Of course, extraction tools will not identify anything but underscore.
In other words, plugins should take care of their own translations. The main app could use the plugin-specific translation function as part of the plug-in API. Extraction, or manual addition of strings to the po/mo files, is also not the main app's concern: it's up to the plugin author to provide translations.