Config File Preprocessor for Python

Config File Preprocessor for Python - python

I would like to virtualize configuration so my classes don't know anything about how setting arrive. Normally I pass classes a dictionary with everything needed to get up and running. This approach has been helpful in production and testing as the test cases aren't tied to specific data files and the code is unaware of the environment it is running in.
The problem arises when the configuration data gets more extensive; for instance the files grow large or information needs to be passed to subcomponents that also have complex configurations. In these cases a single configuration file no longer makes sense.
It would be nice to add directives to the config files that tell a preprocessor where to pull additional setting from. For instance the following json:
{
"a" : "plain old json string",
"b" : ${LoadFromJsonFile(<file path>)},
"c" : ${LoadCsvAsListOfDict(<file path>)}
{
Is there a library for this or is this something I should write on my own or is this idea problematic?
btw - I looked at the related stackexchange questions and didn't see anything similar and/or specific to python. There was one suggest to use XML with Ant but I would prefer to stick with a python environment.

Related

How to document config files?

Are there any best-practices for config-file documentation, especially for python?
Particularly in scientific computing, it is common to use a config file as the input to control a batch processing job (such as a simulation), and expect the user to customise a substantial portion of the config for their scenario. (The config also likely selects among different processing modules, each possessing different suites of config fields.) Thus, the user ought to know: what each setting means or effects; which settings are unused (in which circumstances); what are the default values (and the permissible values or ranges); etc.
I've found incomplete config file docs to be common. The fundamental problem seems to be that if the docs are maintained separately from the code, they grow out of sync. (This seems less of a problem with API docs due to standard practices involving colocated docstrings and autogeneration from function signatures/argspec.) For example if the standard python configparser is used once to parse the config file, then the code for accessing individual attributes (and implicitly determining the config schema) may still be spread out across the entire code base (and perhaps only available at runtime rather than when building docs).
Further thoughts:
Is it bad practice to replace a config file (yaml or similar) with a user-customised python script (so as to only need API docs)?
Distribution of a well commented example config file (that is also used in automatic tests): how to maintain if different scenarios duplicate large sections but need some completely different fields?
Can a single schema be maintained, both for use in code (to help parse, validate, and set defaults) and to generate docs somehow?
Is there a human readable/writeable way of (des)serialising the state of some (sub)class instance that represents a new batch process (so that config is covered by existing docs)?

Personally, I like to use the argparse module for configuration, and read the default value for each setting from an environment variable. That centralizes the settings and documentation in one place, and allows the user to either tweak settings on the command line or set and forget them in environment variables. Be careful about putting passwords on the command line, though, because other users can probably see your command line arguments in the process list.
Here's an example that uses argparse and environment variables:
def parse_args(argv=None):
parser = ArgumentParser(description='Watch the raw data folder for new runs.',
formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument(
'--kive_server',
default=os.environ.get('MICALL_KIVE_SERVER', 'http://localhost:8000'),
help='server to send runs to')
parser.add_argument(
'--kive_user',
default=os.environ.get('MICALL_KIVE_USER', 'kive'),
help='user name for Kive server')
parser.add_argument(
'--kive_password',
default=SUPPRESS,
help='password for Kive server (default not shown)')
args = parser.parse_args(argv)
if not hasattr(args, 'kive_password'):
args.kive_password = os.environ.get('MICALL_KIVE_PASSWORD', 'kive')
return args
Setting those environment variables can be a bit confusing, particularly for system services. If you're using systemd, look at the service unit, and be careful to use EnvironmentFile instead of Environment for any secrets. Environment values can be viewed by any user with systemctl show.
I usually make the default values useful for a developer running on their workstation, so they can start development without changing any configuration.
Another option is to put the configuration settings in a settings.py file, and just be careful not to commit that file to source control. I have often committed a settings_template.py file that users can copy.
If your settings are so complicated/flexible that environment variables or a settings file get messy, then I would convert the project to a library with an API. Instead of settings, users then write a script that calls your API. You don't have to go through the effort of hosting your library on PyPI, either. pip can install from a GitHub repository, for example.

Config file with a .py file

I have been told that doing this would be a not-very-good practice (it is present in the main answer of Python pattern for sharing configuration throughout application though):
configfile.py
SOUNDENABLED = 1
FILEPATH = 'D:\\TEMP\\hello.txt'
main.py
import configfile
if configfile.SOUNDENABLED == 1:
....
f = open(configfile.FILEPATH, 'a')
...
This is confirmed by the fact that many people use INI files for local configuration with ConfigParser module or iniparse or other similar modules.
Question: Why would using an INI file for local configuration + an INI parser Python module be better than just importing a configfile.py file containing the right config values as constants?

The only concern here is that a .py can have arbitrary Python code, so it has a potential to break your program in arbitrary ways.
If you can trust your users to use it responsibly, there's nothing wrong with this setup. If fact, at one of my previous occupations, we were doing just that, without any problems that I'm aware of. Just on the contrary: it allowed users to eliminate duplication by autogenerating repetitive parts and importing other config files.
Another concern is if you have many files, the configuration ones are better be separated from regular code ones, so users know which files they are supposed to be able to edit (the above link addresses this, too).

Importing a module executes any code that it contains. Nothing restricts your configfile.py to containing only definitions. Down the line, this is a recipe for security concerns and obscure errors. Also, you are bound to Python's module search path for finding the configuration file. What if you want to place the configuration file somewhere else, or if there is a name conflict?

This is a perfectly acceptable practice. Some examples of well-known projects using this method are Django and gunicorn.

It could be better for some reasons
The only extension that config file could have is py.
You cannot distribute your program with configs in separate directory unless you put an __init__.py into this directory
Nasty users of your program can put any python script in config and do bad things.
For example, the YouCompleteMe autocompletion engine stores config in python module, .ycm_extra_conf.py. By default, each time config is imported, it asks you, whether you sure that the file is safe to be executed.
How would you change configuration without restarting your app?
Generally, allowing execution of code that came from somewhere outside is a vulnerability, that could lead to very serious consequences.
However, if you don't care about these, for example, you are developing web application that executes only on your server, this is an acceptable practice to put configuration into python module. Django does so.

How to implement a settings file in a Python program

A "settings file" would be a file where things like "background color", "speed of execution", "number of x's" are defined. Currently, I implemented it as a single setting.py file, which I import in the beginning. Someone told me I should make it a settings.ini file instead, but I don't see why! Care to clarify, what is the optimal option?

There is no optimal solution; it is a matter of preference.*
Normally, settings do not need to be expressed in a Turing-complete language: they're often just a bunch of flags and options, sometimes strings and numbers, etc. An argument for having a settings.py file (though very unorthodox) would be if the end-user was expected to write code to generate very esoteric configurations (e.g. maps for a game). This would then be fairly similar to shell script .bashrc-style files.
But again, in 99.9% of programs, the settings are often just a bunch of flags and options, sometimes strings and numbers, etc. It's fine to store them as JSON or XML. It also makes it easy to perform reflection on your settings: for example, automatically listing them in a tree manner, or automatically creating a GUI out of the descriptions.
(Also it may be a (unlikely?) security issue if you allow people to inject code by modifying the settings file.)
&ast;edit: no pun intended...

There are a few reasons why separating out config files from main codebase is a good idea. Of course it depends on your use case and you should evaluate against your usecase.
Configuration can be managed by end user, who do not understand programming languages. It makes more sense to factor out configuration and use a simple ini file which uses simple key-value pairs for config parameters.
Configuration varies based on the installation environment. Your code runs on multiple environment and they all use different configuration. It is very easy to maintain such cases by having separate config files and same source code installed on those environments.
There are package managers that knows what is a config file and what is a source file. They are intelligent to not override any changed config on version upgrade etc. So you do not have to worry about resetting config parameters after version upgrade of package. For example you ship your product with a default config file. User fine tuned few parameters. You shipped another version of the package. User should not expect a config reset after version upgrade.

One problem with having a settings file being a Python module is that it can contain code that will be executed when you import it. This may allow malicious code to be inserted into your program.

For Python use stock libraries:
YAML style configuration files:
http://www.yaml.org/start.html
http://pypi.python.org/pypi/PyYAML/
(Used e.g. Google App Engine)
INI: http://docs.python.org/library/configparser.html
Don't use XML for hand-edited config files.

python single configuration file

I am developing a project that requires a single configuration file whose data is used by multiple modules.
My question is: what is the common approach to that? should i read the configuration file from each
of my modules (files) or is there any other way to do it?
I was thinking to have a module named config.py that reads the configuration files and whenever I need a config I do import config and then do something like config.data['teamsdir'] get the 'teamsdir' property (for example).
response: opted for the conf.py approach then since it it is modular, flexible and simple
I can just put the configuration data directly in the file, latter if i want to read from a json file a xml file or multiple sources i just change the conf.py and make sure the data is accessed the same way.
accepted answer: chose "Alex Martelli" response because it was the most complete. voted up other answers because they where good and useful too.

I like the approach of a single config.py module whose body (when first imported) parses one or more configuration-data files and sets its own "global variables" appropriately -- though I'd favor config.teamdata over the round-about config.data['teamdata'] approach.
This assumes configuration settings are read-only once loaded (except maybe in unit-testing scenarios, where the test-code will be doing its own artificial setting of config variables to properly exercise the code-under-test) -- it basically exploits the nature of a module as the simplest Pythonic form of "singleton" (when you don't need subclassing or other features supported only by classes and not by modules, of course).
"One or more" configuration files (e.g. first one somewhere in /etc for general default settings, then one under /usr/local for site-specific overrides thereof, then again possibly one in the user's home directory for user specific settings) is a common and useful pattern.

The approach you describe is ok. If you want to add support for user config files, you can use execfile(os.path.expanduser("~/.yourprogram/config.py")).

One nice approach is to parse the config file(s) into a Python object when the application starts and pass this object around to all classes and modules requiring access to the configuration.
This may save a lot of time parsing the config.

If you want to share your config across different machines, you could perhaps put it on a web server and do import like this:
import urllib2
confstr = urllib2.urlopen("http://yourhost/config.py").read()
exec(confstr)
And if you want to share it across different languages, perhaps you can use JSON to encode and parse the configuration:
import urllib2
import simplejson
confstr = urllib2.urlopen("http://yourhost/config.py").read()
config = simplejson.loads(confstr)

Is it ever polite to put code in a python configuration file?

One of my favorite features about python is that you can write configuration files in python that are very simple to read and understand. If you put a few boundaries on yourself, you can be pretty confident that non-pythonistas will know exactly what you mean and will be perfectly capable of reconfiguring your program.
My question is, what exactly are those boundaries? My own personal heuristic was
Avoid flow control. No functions, loops, or conditionals. Those wouldn't be in a text config file and people aren't expecting to have understand them. In general, it probably shouldn't matter the order in which your statements execute.
Stick to literal assignments. Methods and functions called on objects are harder to think through. Anything implicit is going to be a mess. If there's something complicated that has to happen with your parameters, change how they're interpreted.
Language keywords and error handling are right out.
I guess I ask this because I came across a situation with my Django config file where it seems to be useful to break these rules. I happen to like it, but I feel a little guilty. Basically, my project is deployed through svn checkouts to a couple different servers that won't all be configured the same (some will share a database, some won't, for example). So, I throw a hook at the end:
try:
from settings_overrides import *
LOCALIZED = True
except ImportError:
LOCALIZED = False
where settings_overrides is on the python path but outside the working copy. What do you think, either about this example, or about python config boundaries in general?

There is a Django wiki page, which addresses exactly the thing you're asking.
http://code.djangoproject.com/wiki/SplitSettings
Do not reinvent the wheel. Use configparser and INI files. Python files are to easy to break by someone, who doesn't know Python.

Your heuristics are good. Rules are made so that boundaries are set and only broken when it's obviously a vastly better solution than the alternate.
Still, I can't help but wonder that the site checking code should be in the parser, and an additional configuration item added that selects which option should be taken.
I don't think that in this case the alternative is so bad that breaking the rules makes sense...
-Adam

I think it's a pain vs pleasure argument.
It's not wrong to put code in a Python config file because it's all valid Python, but it does mean you could confuse a user who comes in to reconfigure an app. If you're that worried about it, rope it off with comments explaining roughly what it does and that the user shouldn't edit it, rather edit the settings_overrides.py file.
As for your example, that's nigh on essential for developers to test then deploy their apps. Definitely more pleasure than pain. But you should really do this instead:
LOCALIZED = False
try:
from settings_overrides import *
except ImportError:
pass
And in your settings_overrides.py file:
LOCALIZED = True
... If nothing but to make it clear what that file does.. What you're doing there splits overrides into two places.

As a general practice, see the other answers on the page; it all depends. Specifically for Django, however, I see nothing fundamentally wrong with writing code in the settings.py file... after all, the settings file IS code :-)
The Django docs on settings themselves say:
A settings file is just a Python module with module-level variables.
And give the example:
assign settings dynamically using normal Python syntax. For example:
MY_SETTING = [str(i) for i in range(30)]

Settings as code is also a security risk. You import your "config", but in reality you are executing whatever code is in that file. Put config in files that you parse first and you can reject nonsensical or malicious values, even if it is more work for you. I blogged about this in December 2008.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Config File Preprocessor for Python - python

Related

How to document config files?

Config file with a .py file

How to implement a settings file in a Python program

python single configuration file

Is it ever polite to put code in a python configuration file?

Categories

Resources