Configuration file with list of key-value pairs in python - python

I have a python script that analyzes a set of error messages and checks whether each message matches a certain pattern (regular expression) in order to group these messages. For example, "file x does not exist" and "file y does not exist" would both match "file .* does not exist" and be counted as two occurrences of the "file not found" category.
As the number of patterns and categories is growing, I'd like to put these "regular expression/display string" pairs in a configuration file, basically a dictionary serialization of some sort.
I would like this file to be editable by hand, so I'm discarding any form of binary serialization, and also I'd rather not resort to xml serialization to avoid problems with characters to escape (& <> and so on...).
Do you have any idea of what could be a good way of accomplishing this?
Update: thanks to Daren Thomas and Federico Ramponi, but I cannot have an external python file with possibly arbitrary code.

I sometimes just write a python module (i.e. file) called config.py or something with following contents:
config = {
'name': 'hello',
'see?': 'world'
}
this can then be 'read' like so:
from config import config
config['name']
config['see?']
easy.

You have two decent options:
Python standard config file format, using ConfigParser
YAML, using a library like PyYAML
The standard Python configuration files look like INI files with [sections] and key : value or key = value pairs. The advantages to this format are:
No third-party libraries necessary
Simple, familiar file format.
YAML is different in that it is designed to be a human friendly data serialization format rather than specifically designed for configuration. It is very readable and gives you a couple different ways to represent the same data. For your problem, you could create a YAML file that looks like this:
file .* does not exist : file not found
user .* not found : authorization error
Or like this:
{ file .* does not exist: file not found,
user .* not found: authorization error }
Using PyYAML couldn't be simpler:
import yaml
with open('my.yaml') as f:
    errors = yaml.safe_load(f)  # safe_load only builds plain data types
At this point errors is a Python dictionary with the expected format. YAML is capable of representing more than dictionaries: if you prefer a list of pairs, use this format:
-
- file .* does not exist
- file not found
-
- user .* not found
- authorization error
Or
[ [file .* does not exist, file not found],
[user .* not found, authorization error]]
Which will produce a list of lists when yaml.load is called.
One advantage of YAML is that you could use it to export your existing, hard-coded data out to a file to create the initial version, rather than cut/paste plus a bunch of find/replace to get the data into the right format.
The YAML format will take a little more time to get familiar with, but using PyYAML is even simpler than using ConfigParser, with the advantage that YAML gives you more options for how your data is represented.
Either one sounds like it will fit your current needs. ConfigParser will be easier to start with, while YAML gives you more flexibility in the future if your needs expand.
Best of luck!

I've heard that ConfigObj is easier to work with than ConfigParser. It is used by a lot of big projects, IPython, Trac, Turbogears, etc...
From their introduction:
ConfigObj is a simple but powerful config file reader and writer: an ini file round tripper. Its main feature is that it is very easy to use, with a straightforward programmer's interface and a simple syntax for config files. It has lots of other features though :
Nested sections (subsections), to any level
List values
Multiple line values
String interpolation (substitution)
Integrated with a powerful validation system
including automatic type checking/conversion
repeated sections
and allowing default values
When writing out config files, ConfigObj preserves all comments and the order of members and sections
Many useful methods and options for working with configuration files (like the 'reload' method)
Full Unicode support

I think you want the ConfigParser module in the standard library. It reads and writes INI style files. The examples and documentation in the standard documentation I've linked to are very comprehensive.
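As a rough sketch of how the INI approach could fit the question, assuming Python 3 (where the module is named configparser): the section name "patterns", the sample text and the categorize helper below are all made up for illustration. The display string is used as the key and the regular expression as the value, because a regular expression containing = or : would be awkward as an INI key.

```python
import configparser
import re

# Hypothetical file contents; in practice this text would live in a hand-edited .ini file.
INI_TEXT = """\
[patterns]
file not found = file .* does not exist
authorization error = user .* not found
"""

# interpolation=None keeps a '%' inside a regex from being treated as a placeholder.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names as written instead of lowercasing them
parser.read_string(INI_TEXT)

# Each entry reads "display string = regular expression"; compile the patterns once.
rules = [(re.compile(rx), label) for label, rx in parser["patterns"].items()]


def categorize(message):
    """Return the display string of the first matching pattern, or None."""
    for rx, label in rules:
        if rx.search(message):
            return label
    return None


print(categorize("file x does not exist"))
```

One limitation of this layout is that keys within a section must be unique, so two patterns that map to the same category would need distinct keys.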

If you are the only one that has access to the configuration file, you can use a simple, low-level solution. Keep the "dictionary" in a text file as a list of tuples (regexp, message) exactly as if it was a python expression:
[
("file .* does not exist", "file not found"),
("user .* not authorized", "authorization error")
]
In your code, load it, then eval it, and compile the regexps in the result:
import re
with open("messages.py") as f:
    messages = eval(f.read()) # caution: you must be sure of what's in that file
messages = [(re.compile(r), m) for (r, m) in messages]
and you end up with a list of tuples (compiled_regexp, message).
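If the file only ever contains literals, as in the example above, ast.literal_eval from the standard library may be a safer drop-in for eval, since it refuses anything that is not a plain literal. A minimal sketch, with the file contents inlined as a string (TEXT and is_literal are names invented for this example):

```python
import ast
import re

# Stand-in for the contents of messages.py from the answer above.
TEXT = '''[
("file .* does not exist", "file not found"),
("user .* not authorized", "authorization error")
]'''

# literal_eval accepts only literals (strings, numbers, tuples, lists, dicts, ...).
pairs = ast.literal_eval(TEXT)
messages = [(re.compile(r), m) for (r, m) in pairs]


def is_literal(text):
    """Return True if text is a plain Python literal that literal_eval accepts."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False
```

With this, a file containing a function call such as __import__('os').system(...) is rejected instead of executed.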

I typically do as Daren suggested, just make your config file a Python script:
patterns = {
'file .* does not exist': 'file not found',
'user .* not found': 'authorization error',
}
Then you can use it as:
import re
import config
for pattern in config.patterns:
    if re.search(pattern, log_message):
        print(config.patterns[pattern])
This is what Django does with their settings file, by the way.

Related

How to custom indent JSON?

I want to be able to indent a JSON file so that the first key is on the same line as the opening bracket. By default, json.dump() puts it on a new line.
For example, if the original file was:
[
{
"statusName": "CO_FILTER",
"statusBar": {}
I want it to start like this:
[
{ "statusName": "CO_FILTER",
"statusBar":{}
I believe that if you want custom formatting like this, you will most likely have to write your own formatter or extend the JSON library, as the Google JSON formatter and many others do. It is not a setting you can change on demand in the default json library, so it would have to come from another JSON library. Maybe search for other Python JSON libraries that dump JSON the way you like.
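One way to "add onto" the standard library without replacing it might be to post-process the output of json.dumps. The sketch below is only an illustration (dumps_compact_open is a made-up helper name): it pulls the first key of every object up onto the line of its opening brace. It relies on the fact that json.dumps escapes newlines inside strings, so a brace followed by a real newline can only be structural.

```python
import json
import re

data = [{"statusName": "CO_FILTER", "statusBar": {}}]


def dumps_compact_open(obj, indent=4):
    """Dump JSON with the first key on the same line as each opening brace."""
    text = json.dumps(obj, indent=indent)
    # Replace "{" + newline + indentation with "{ " so the first key follows the brace.
    return re.sub(r"\{\n\s*", "{ ", text)


print(dumps_compact_open(data))
```

The result is still valid JSON, so json.loads reads it back unchanged; only the whitespace differs from the default layout.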

Write .pypirc file programmatically

Is there some library available to write a ~/.pypirc file programmatically?
Also, what is the formal spec of its format? All I've found is this section of the docs:
https://docs.python.org/3.3/distutils/packageindex.html#pypirc
That leaves out details like what kinds of whitespace are allowable, whether = and : are equivalent or not, and so on.
The file is read by distutils/config.py using RawConfigParser.
So if you want to write it, use the same RawConfigParser. The docs you pointed to are the only docs. The rest can be deduced from the code.
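A small sketch of writing such a file with RawConfigParser, assuming Python 3 (module configparser). The section and option names follow the documented .pypirc layout, the username is a placeholder, and the text is written to an in-memory buffer here rather than to ~/.pypirc:

```python
import configparser
import io

cfg = configparser.RawConfigParser()
cfg.add_section("distutils")
cfg.set("distutils", "index-servers", "pypi")
cfg.add_section("pypi")
cfg.set("pypi", "username", "example-user")  # placeholder value

# Write to a buffer; a real script would open the target file for writing instead.
buffer = io.StringIO()
cfg.write(buffer)
text = buffer.getvalue()
print(text)

# Read the text back with the same parser class to confirm it round-trips.
check = configparser.RawConfigParser()
check.read_string(text)
```

Since the same parser class reads the file, anything it writes should be something it accepts, which sidesteps the question of which whitespace and delimiters are allowed.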

Manipulating a file with nested tags and key-value pairs

I have a need to work with configuration files that use nested HTML-style tags of key-value pairs using equals signs.
I'd like a Python approach that would allow me to read such files, add, delete or modify sections, and write the updated file.
The files look like:
<tag1>
key1=value1
key2=value2
<tag2>
key3=value3
</tag2>
<tag2>
key3=value four
</tag2>
</tag1>
So it's not quite an HTML or XML file, and not a Windows INI file either. There are no spaces surrounding the equals signs, there are a few random blank lines in the files that seem to be ignored, and values in the key-value pairs don't use quote marks and may have embedded spaces.
I could not find a definition or name for this exact file structure but I found it hard to focus the search so I may have missed something obvious.
Is this a recognized standard file structure? If so what is it called?
I'd appreciate any pointers on what libraries can be coerced into working with this structure and maybe some examples if they are not readily available in the docco.
Thanks.
For keeping configuration files, you can use the configparser module in Python. It makes it very easy to read config information in your app.
For a config file like this:
[installation]
library=%(prefix)s/lib
include=%(prefix)s/include
bin=%(prefix)s/bin
prefix=/usr/local
[debug]
log_errors=true
show_warnings=False
[server]
port: 8080
nworkers: 32
pid-file=/tmp/spam.pid
root=/www/root
You can read this configuration file as shown below:
>>> from configparser import ConfigParser
>>> cfg = ConfigParser()
>>> cfg.read('config.ini')
['config.ini']
>>> cfg.sections()
['installation', 'debug', 'server']
>>> cfg.get('installation','library')
'/usr/local/lib'
>>> cfg.getboolean('debug','log_errors')
True
>>> cfg.getint('server','port')
8080
>>> cfg.getint('server','nworkers')
32
>>> print(cfg.get('server','root'))
/www/root
If you want to stay with the HTML-like configuration, check out the xml.etree module. It offers a wide range of functions, though the file would have to be well-formed XML for it to parse.
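Since the format in the question is neither INI nor well-formed XML, a small hand-written parser may be the most direct route. The following is a sketch under assumptions taken from the sample file only: one tag or one key=value pair per line, blank lines ignored, and keys unique within a section. The parse and dump names and the dictionary layout are invented for this example.

```python
import re

TAG_OPEN = re.compile(r"<([^/<>]+)>$")
TAG_CLOSE = re.compile(r"</([^<>]+)>$")

SAMPLE = """\
<tag1>
key1=value1
key2=value2

<tag2>
key3=value3
</tag2>
<tag2>
key3=value four
</tag2>
</tag1>
"""


def parse(text):
    """Parse the nested tag format into a tree of dicts (tag, pairs, children)."""
    root = {"tag": None, "pairs": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        closing = TAG_CLOSE.match(line)
        if closing:
            if len(stack) == 1 or stack[-1]["tag"] != closing.group(1):
                raise ValueError("unexpected closing tag: %s" % line)
            stack.pop()
            continue
        opening = TAG_OPEN.match(line)
        if opening:
            node = {"tag": opening.group(1), "pairs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a tag or key=value line: %s" % line)
        stack[-1]["pairs"][key] = value  # the value may contain spaces
    if len(stack) != 1:
        raise ValueError("unclosed tag: %s" % stack[-1]["tag"])
    return root


def dump(node):
    """Serialize a tree produced by parse() back to text (without indentation)."""
    lines = []
    if node["tag"] is not None:
        lines.append("<%s>" % node["tag"])
    lines.extend("%s=%s" % item for item in node["pairs"].items())
    for child in node["children"]:
        lines.extend(dump(child).splitlines())
    if node["tag"] is not None:
        lines.append("</%s>" % node["tag"])
    return "\n".join(lines)


tree = parse(SAMPLE)
tree["children"][0]["pairs"]["key2"] = "changed"  # modify a value in place
print(dump(tree))
```

Adding or deleting a section then amounts to appending to or removing from a node's children list before calling dump. Pairs are written before nested sections, so a file that interleaves them differently would be reordered.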

how to "source" file into python script

I have a text file /etc/default/foo which contains one line:
FOO="/path/to/foo"
In my python script, I need to reference the variable FOO.
What is the simplest way to "source" the file /etc/default/foo into my python script, same as I would do in bash?
. /etc/default/foo
Same answer as @jil's; however, that answer is specific to an older version of Python (2.x).
In modern Python (3.x):
exec(open('filename').read())
replaces execfile('filename') from 2.x
You could use execfile:
execfile("/etc/default/foo")
But please be aware that this will evaluate the contents of the file as-is, as part of your program source. It is a potential security hazard unless you can fully trust the source.
It also means that the file needs to be valid python syntax (your given example file is).
Keep in mind that if you have a "text" file with this content that has a .py as the file extension, you can always do:
import mytextfile
print(mytextfile.FOO)
Of course, this assumes that the text file is syntactically correct as far as Python is concerned. On a project I worked on we did something similar to this. Turned some text files into Python files. Wacky but maybe worth consideration.
Just to give a different approach, note that if your original file is setup as
export FOO=/path/to/foo
You can do source /etc/default/foo; python myprogram.py (or . /etc/default/foo; python myprogram.py) and within myprogram.py all the values that were exported in the sourced file are visible in os.environ, e.g.
import os
os.environ["FOO"]
If you know for certain that it only contains VAR="QUOTED STRING" style variables, like this:
FOO="some value"
Then you can just do this:
>>> with open('foo.sysconfig') as fd:
...     exec(fd.read())
Which gets you:
>>> FOO
'some value'
(This is effectively the same thing as the execfile() solution
suggested in the other answer.)
This method has substantial security implications; if instead of FOO="some value" your file contained:
os.system("rm -rf /")
Then you would be In Trouble.
Alternatively, you can do this:
>>> import shlex
>>> with open('foo.sysconfig') as fd:
...     settings = {var: shlex.split(value) for var, value in [line.split('=', 1) for line in fd]}
Which gets you a dictionary settings that has:
>>> settings
{'FOO': ['some value']}
That settings = {...} line is using a dictionary comprehension. You could accomplish the same thing in a few more lines with a for loop and so forth.
And of course if the file contains shell-style variable expansion like ${somevar:-value_if_not_set} then this isn't going to work (unless you write your very own shell style variable parser).
There are a couple ways to do this sort of thing.
You can indeed import the file as a module, as long as the data it contains is valid Python syntax. But either the file in question must be a .py file in the same directory as your script, or you have to use imp (or importlib, depending on your version).
Another solution (the one I prefer) is to use a data format that a standard Python library can parse (JSON comes to mind as an example).
/etc/default/foo :
{"FOO":"path/to/foo"}
And in your python code :
import json
with open('/etc/default/foo') as file:
    data = json.load(file)
FOO = data["FOO"]
## ...
This way, you don't risk executing untrusted code.
You have the choice, depending on what you prefer. If your data file is auto-generated by some script, it might be easier to keep a simple syntax like FOO="path/to/foo" and use imp.
Hope that it helps !
The Solution
Here is my approach: parse the bash file myself and process only variable assignment lines such as:
FOO="/path/to/foo"
Here is the code:
import shlex

def parse_shell_var(line):
    """
    Parse such lines as:
        FOO="My variable foo"
    :return: a tuple of var name and var value, such as
        ('FOO', 'My variable foo')
    """
    return shlex.split(line, posix=True)[0].split('=', 1)

if __name__ == '__main__':
    with open('shell_vars.sh') as f:
        shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
    print(shell_vars)
How It Works
Take a look at this snippet:
shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
This line iterates through the lines of the shell script and processes only those that contain an equals sign (not a foolproof way to detect variable assignment, but the simplest). Next, it passes those lines to the function parse_shell_var, which uses shlex.split to correctly handle the quotes (or the lack thereof). Finally, the pieces are assembled into a dictionary. The output of this script is:
{'MOO': '/dont/have/a/cow', 'FOO': 'my variable foo', 'BAR': 'My variable bar'}
Here is the contents of shell_vars.sh:
FOO='my variable foo'
BAR="My variable bar"
MOO=/dont/have/a/cow
echo $FOO
Discussion
This approach has a couple of advantages:
It does not execute the shell (either in bash or in Python), which avoids any side-effect
Consequently, it is safe to use, even if the origin of the shell script is unknown
It correctly handles values with or without quotes
This approach is not perfect, it has a few limitations:
The method of detecting variable assignment (by looking for the presence of the equal sign) is primitive and not accurate. There are ways to better detect these lines but that is the topic for another day
It does not correctly parse values which are built upon other variables or commands. That means, it will fail for lines such as:
FOO=$BAR
FOO=$(pwd)
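One possible way to tighten the detection of assignment lines mentioned in the limitations above is to match them with a regular expression instead of looking for any equals sign. This is a sketch, not a full shell parser: the ASSIGNMENT pattern and the parse_assignments name are invented here, and the function deliberately skips any value containing $ or a backtick rather than trying to expand it (which also skips a literal $ inside single quotes).

```python
import re
import shlex

# NAME=value at the start of a line, optionally preceded by "export".
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_assignments(lines):
    """Collect simple NAME=value assignments from shell-style lines."""
    result = {}
    for line in lines:
        match = ASSIGNMENT.match(line)
        if not match:
            continue  # not an assignment, e.g. a command such as: echo "a=b"
        name, raw_value = match.groups()
        if "$" in raw_value or "`" in raw_value:
            continue  # would need variable or command expansion; skip it
        # shlex handles quoting; comments=True drops a trailing "# ..." comment.
        words = shlex.split(raw_value, comments=True)
        result[name] = words[0] if words else ""
    return result


sample = [
    "FOO='my variable foo'\n",
    'BAR="My variable bar"\n',
    "MOO=/dont/have/a/cow\n",
    "echo $FOO\n",
    "BAZ=$BAR\n",
]
print(parse_assignments(sample))
```

Like the original, this never executes the file, so it stays safe for input of unknown origin.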
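Building on the answer that uses exec(open('filename').read()): value = eval(open('filename').read()) evaluates the file contents as a single expression and returns only its value. E.g. (file contents: resulting value)
1 + 1: 2
"Hello World": "Hello World"
float(2) + 1: 3.0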

Is there a version of ConfigParser that deals with files with no section headers?

I have a config file which is mainly used in shell scripts, and therefore has the following format:
# Database parameters (MySQL only for now)
DBHOST=localhost
DATABASE=stuff
DBUSER=mypkguser
DBPASS=zbxhsxhg
# Storage locations
STUFFDIR=/var/mypkg/stuff
GIZMODIR=/var/mypkg/gizmo
Now I need to read its values from a Python (2.6) script. I would like not to reinvent the wheel and parse it with descriptor.readlines() and looking for equal signs and skipping lines beginning with '#' and dealing with quoted values and blah blah blah boring. I tried using ConfigParser but it doesn't like files that don't have section headers. Do I have any options or will I have to do the boring thing?
Oh, by the way, wrapping a shell script around the Python script is not an option. It has to run within Apache.
I'm not aware of such a module, but as a quick and dirty hack - just add the [section] before the file-content and you can use ConfigParser as intended!
from io import StringIO
filename = 'ham.egg'
vfile = StringIO(u'[Pseudo-Sectio]\n%s' % open(filename).read())
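To show where the hack above could lead, here is a sketch of the full round trip, assuming Python 3's configparser (the question targets Python 2.6, where the module is called ConfigParser and lacks read_string). The file contents are inlined as a string and the section name "pseudo-section" is arbitrary:

```python
import configparser

# Stand-in for the shell-style file from the question (shortened).
SHELL_STYLE = """\
# Database parameters (MySQL only for now)
DBHOST=localhost
DATABASE=stuff
# Storage locations
STUFFDIR=/var/mypkg/stuff
"""

# interpolation=None leaves any '%' in a value untouched.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep the upper-case variable names
# Prepend a dummy section header so the parser accepts the text.
parser.read_string("[pseudo-section]\n" + SHELL_STYLE)

settings = dict(parser["pseudo-section"])
print(settings)
```

Lines starting with # are treated as comments by default. Quoted values keep their quote characters, so those would still need stripping afterwards.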
