How to document config files? - python

Are there any best-practices for config-file documentation, especially for python?
Particularly in scientific computing, it is common to use a config file as the input to control a batch processing job (such as a simulation), and expect the user to customise a substantial portion of the config for their scenario. (The config also likely selects among different processing modules, each possessing different suites of config fields.) Thus, the user ought to know: what each setting means or effects; which settings are unused (in which circumstances); what are the default values (and the permissible values or ranges); etc.
I've found incomplete config file docs to be common. The fundamental problem seems to be that if the docs are maintained separately from the code, they grow out of sync. (This seems less of a problem with API docs due to standard practices involving colocated docstrings and autogeneration from function signatures/argspec.) For example if the standard python configparser is used once to parse the config file, then the code for accessing individual attributes (and implicitly determining the config schema) may still be spread out across the entire code base (and perhaps only available at runtime rather than when building docs).
Further thoughts:
Is it bad practice to replace a config file (yaml or similar) with a user-customised python script (so as to only need API docs)?
Distribution of a well commented example config file (that is also used in automatic tests): how to maintain if different scenarios duplicate large sections but need some completely different fields?
Can a single schema be maintained, both for use in code (to help parse, validate, and set defaults) and to generate docs somehow?
Is there a human readable/writeable way of (des)serialising the state of some (sub)class instance that represents a new batch process (so that config is covered by existing docs)?

Personally, I like to use the argparse module for configuration, and read the default value for each setting from an environment variable. That centralizes the settings and documentation in one place, and allows the user to either tweak settings on the command line or set and forget them in environment variables. Be careful about putting passwords on the command line, though, because other users can probably see your command line arguments in the process list.
Here's an example that uses argparse and environment variables:
def parse_args(argv=None):
parser = ArgumentParser(description='Watch the raw data folder for new runs.',
formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument(
'--kive_server',
default=os.environ.get('MICALL_KIVE_SERVER', 'http://localhost:8000'),
help='server to send runs to')
parser.add_argument(
'--kive_user',
default=os.environ.get('MICALL_KIVE_USER', 'kive'),
help='user name for Kive server')
parser.add_argument(
'--kive_password',
default=SUPPRESS,
help='password for Kive server (default not shown)')
args = parser.parse_args(argv)
if not hasattr(args, 'kive_password'):
args.kive_password = os.environ.get('MICALL_KIVE_PASSWORD', 'kive')
return args
Setting those environment variables can be a bit confusing, particularly for system services. If you're using systemd, look at the service unit, and be careful to use EnvironmentFile instead of Environment for any secrets. Environment values can be viewed by any user with systemctl show.
I usually make the default values useful for a developer running on their workstation, so they can start development without changing any configuration.
Another option is to put the configuration settings in a settings.py file, and just be careful not to commit that file to source control. I have often committed a settings_template.py file that users can copy.
If your settings are so complicated/flexible that environment variables or a settings file get messy, then I would convert the project to a library with an API. Instead of settings, users then write a script that calls your API. You don't have to go through the effort of hosting your library on PyPI, either. pip can install from a GitHub repository, for example.

Related

Would allowing user input to python's __import__ be a security risk?

Say I create a simple web server using Flask, and allowing people to query certain things that I modulated in different python files, using the __import__ statement, would doing this with user supplied information be considered a security risk?
Example:
from flask import Flask
app = Flask(__name__)
#app.route("/<author>/<book>/<chapter>")
def index(author, book, chapter):
return getattr(__import__(author), book)(chapter)
# OR
return getattr(__import__("books." + author), book)(chapter)
I've seen a case like this recently when reviewing code, however it didn't feel right to me.
It is entirely insecure, and your system is wide open to attack. Your first return line doesn't limit what kind of names can be imported, which means the user can execute any arbitrary callable in any importable Python module.
That includes:
/pickle/loads/<url-encoded pickle data>
A pickle is a stack language that lets you execute arbitrary Python code, and the attacker can take full control of your server.
Even a prefixed __import__ would be insecure if an attacker can also place a file on your file system in the PYTHONPATH; all they need is a books directory earlier in the path. They can then use this route to have the file executed in your Flask process, again letting them take full control.
I would not use __import__ at all here. Just import those modules at the start and use a dictionary mapping author to the already imported module. You can use __import__ still to discover those modules on start-up, but you now remove the option to load arbitrary code from the filesystem.
Allowing untrusted data to direct calling arbitrary objects in modules should also be avoided (including getattr()). Again, an attacker that has limited access to the system could exploit this path to widen the crack considerably. Always limit the input to a whitelist of possible options (like the modules you loaded at the start, and per module, what objects can actually be called within).
More than being a security risk, it is a bad idea e.g. I could easily crash your web app by visiting the url:
/sys/exit/anything
translating to:
...
getattr(__import__('sys'), 'exit')('anything')
Don't give the possibility to import/execute just about anything to your users. Restrict the possibilities by using say a dictionary of permissible imports, as #MartijnPieters as clearly pointed out.

Config file with a .py file

I have been told that doing this would be a not-very-good practice (it is present in the main answer of Python pattern for sharing configuration throughout application though):
configfile.py
SOUNDENABLED = 1
FILEPATH = 'D:\\TEMP\\hello.txt'
main.py
import configfile
if configfile.SOUNDENABLED == 1:
....
f = open(configfile.FILEPATH, 'a')
...
This is confirmed by the fact that many people use INI files for local configuration with ConfigParser module or iniparse or other similar modules.
Question: Why would using an INI file for local configuration + an INI parser Python module be better than just importing a configfile.py file containing the right config values as constants?
The only concern here is that a .py can have arbitrary Python code, so it has a potential to break your program in arbitrary ways.
If you can trust your users to use it responsibly, there's nothing wrong with this setup. If fact, at one of my previous occupations, we were doing just that, without any problems that I'm aware of. Just on the contrary: it allowed users to eliminate duplication by autogenerating repetitive parts and importing other config files.
Another concern is if you have many files, the configuration ones are better be separated from regular code ones, so users know which files they are supposed to be able to edit (the above link addresses this, too).
Importing a module executes any code that it contains. Nothing restricts your configfile.py to containing only definitions. Down the line, this is a recipe for security concerns and obscure errors. Also, you are bound to Python's module search path for finding the configuration file. What if you want to place the configuration file somewhere else, or if there is a name conflict?
This is a perfectly acceptable practice. Some examples of well-known projects using this method are Django and gunicorn.
It could be better for some reasons
The only extension that config file could have is py.
You cannot distribute your program with configs in separate directory unless you put an __init__.py into this directory
Nasty users of your program can put any python script in config and do bad things.
For example, the YouCompleteMe autocompletion engine stores config in python module, .ycm_extra_conf.py. By default, each time config is imported, it asks you, whether you sure that the file is safe to be executed.
How would you change configuration without restarting your app?
Generally, allowing execution of code that came from somewhere outside is a vulnerability, that could lead to very serious consequences.
However, if you don't care about these, for example, you are developing web application that executes only on your server, this is an acceptable practice to put configuration into python module. Django does so.

Is there a way to share common configuration using Paste Deploy?

This is a pretty simple idea conceptually. In terms of specifics, I'm working with a pretty generic Kotti installation, where I'm customizing some pages / templates.
Much of my configuration is shared between a production and development server. Currently, these settings are included in two separate ini files. It would be nice to DRY this configuration, with common settings in one place.
I'm quite open to this happening in python or an an ini file / section (or perhaps a third, yet-to-be-considered place). I think it's equivalent to use a [DEFAULT] section, or pass dictionaries to loadapp via global_conf, but that seems to be processed in a squirrelly way. For example, Kotti thinks I've properly set sqlalchemy.url, but sqlalchemy iteself fails on url = opts.pop('url'). Moreover, since Kotti defines some default settings, Paste doesn't end up searching for them in the global_conf (e.g., kotti_configurators).
I don't like the idea of passing in a large dict for %(var)s substitution, as it requires effectively renaming all of the shared variables.
In my initial experiments with Paste Deploy, it demands a "main" section in each ini file. So, I don't think I can just use a use = config:shared.ini line. But that's conceptually close to what I'd like to do.
Is there a way to explicitly (and properly) include settings from DEFAULT or global_conf? Or perhaps do this programmatically with python on the results of loadapp?
For example, Kotti thinks I've properly set sqlalchemy.url, but sqlalchemy iteself fails on url = opts.pop('url').
If you think something is odd and you're asking on SO it'd be wise to show a stacktrace and an example of what you tried to do.
Kotti gets its settings the same as any pyramid app. Your entry point (def main(global_conf, **settings) usually) is passed the global_conf and settings dictionary. You're then responsible for fixing that up and shuffling it off into the configurator.
For various reasons PasteDeploy keeps the global_conf separate from the settings, thus you're responsible for merging them if you wish.
[DEFAULT]
foo = bar
[app:main]
use = egg:myapp#main
baz = xyz
def main(global_conf, **app_settings):
settings = global_conf
settings.update(app_settings)
config = Configurator(settings=settings)
config.include('kotti')
return config.make_wsgi_app()

How to implement a settings file in a Python program

A "settings file" would be a file where things like "background color", "speed of execution", "number of x's" are defined. Currently, I implemented it as a single setting.py file, which I import in the beginning. Someone told me I should make it a settings.ini file instead, but I don't see why! Care to clarify, what is the optimal option?
There is no optimal solution; it is a matter of preference.*
Normally, settings do not need to be expressed in a Turing-complete language: they're often just a bunch of flags and options, sometimes strings and numbers, etc. An argument for having a settings.py file (though very unorthodox) would be if the end-user was expected to write code to generate very esoteric configurations (e.g. maps for a game). This would then be fairly similar to shell script .bashrc-style files.
But again, in 99.9% of programs, the settings are often just a bunch of flags and options, sometimes strings and numbers, etc. It's fine to store them as JSON or XML. It also makes it easy to perform reflection on your settings: for example, automatically listing them in a tree manner, or automatically creating a GUI out of the descriptions.
(Also it may be a (unlikely?) security issue if you allow people to inject code by modifying the settings file.)
&ast;edit: no pun intended...
There are a few reasons why separating out config files from main codebase is a good idea. Of course it depends on your use case and you should evaluate against your usecase.
Configuration can be managed by end user, who do not understand programming languages. It makes more sense to factor out configuration and use a simple ini file which uses simple key-value pairs for config parameters.
Configuration varies based on the installation environment. Your code runs on multiple environment and they all use different configuration. It is very easy to maintain such cases by having separate config files and same source code installed on those environments.
There are package managers that knows what is a config file and what is a source file. They are intelligent to not override any changed config on version upgrade etc. So you do not have to worry about resetting config parameters after version upgrade of package. For example you ship your product with a default config file. User fine tuned few parameters. You shipped another version of the package. User should not expect a config reset after version upgrade.
One problem with having a settings file being a Python module is that it can contain code that will be executed when you import it. This may allow malicious code to be inserted into your program.
For Python use stock libraries:
YAML style configuration files:
http://www.yaml.org/start.html
http://pypi.python.org/pypi/PyYAML/
(Used e.g. Google App Engine)
INI: http://docs.python.org/library/configparser.html
Don't use XML for hand-edited config files.

python single configuration file

I am developing a project that requires a single configuration file whose data is used by multiple modules.
My question is: what is the common approach to that? should i read the configuration file from each
of my modules (files) or is there any other way to do it?
I was thinking to have a module named config.py that reads the configuration files and whenever I need a config I do import config and then do something like config.data['teamsdir'] get the 'teamsdir' property (for example).
response: opted for the conf.py approach then since it it is modular, flexible and simple
I can just put the configuration data directly in the file, latter if i want to read from a json file a xml file or multiple sources i just change the conf.py and make sure the data is accessed the same way.
accepted answer: chose "Alex Martelli" response because it was the most complete. voted up other answers because they where good and useful too.
I like the approach of a single config.py module whose body (when first imported) parses one or more configuration-data files and sets its own "global variables" appropriately -- though I'd favor config.teamdata over the round-about config.data['teamdata'] approach.
This assumes configuration settings are read-only once loaded (except maybe in unit-testing scenarios, where the test-code will be doing its own artificial setting of config variables to properly exercise the code-under-test) -- it basically exploits the nature of a module as the simplest Pythonic form of "singleton" (when you don't need subclassing or other features supported only by classes and not by modules, of course).
"One or more" configuration files (e.g. first one somewhere in /etc for general default settings, then one under /usr/local for site-specific overrides thereof, then again possibly one in the user's home directory for user specific settings) is a common and useful pattern.
The approach you describe is ok. If you want to add support for user config files, you can use execfile(os.path.expanduser("~/.yourprogram/config.py")).
One nice approach is to parse the config file(s) into a Python object when the application starts and pass this object around to all classes and modules requiring access to the configuration.
This may save a lot of time parsing the config.
If you want to share your config across different machines, you could perhaps put it on a web server and do import like this:
import urllib2
confstr = urllib2.urlopen("http://yourhost/config.py").read()
exec(confstr)
And if you want to share it across different languages, perhaps you can use JSON to encode and parse the configuration:
import urllib2
import simplejson
confstr = urllib2.urlopen("http://yourhost/config.py").read()
config = simplejson.loads(confstr)

Categories