In Python, how can you load YAML mappings as OrderedDicts?

I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.
What's the best way to do that?

Python >= 3.6
In Python 3.6+, it seems that dict loading order is preserved by default without special dictionary types. The default Dumper, on the other hand, sorts dictionaries by key. Starting with PyYAML 5.1, you can turn this off by passing sort_keys=False:
import yaml

a = dict(zip("unsorted", "unsorted"))
s = yaml.safe_dump(a, sort_keys=False)
b = yaml.safe_load(s)
assert list(a.keys()) == list(b.keys())  # True
This works thanks to the new dict implementation that has been in use in PyPy for some time. While still considered an implementation detail in CPython 3.6, "the insertion-order preserving nature of dicts has been declared an official part of the Python language spec" as of 3.7+; see What's New In Python 3.7.
Note that this is still undocumented on PyYAML's side, so you shouldn't rely on it for safety-critical applications.
Original answer (compatible with all known versions)
I like James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects, especially when writing library code. It also doesn't directly work with yaml.safe_load().
Fortunately, the solution can be improved without much effort:
import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

# usage example:
ordered_load(stream, yaml.SafeLoader)
For serialization, you could use the following function:
def ordered_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
    class OrderedDumper(Dumper):
        pass
    def _dict_representer(dumper, data):
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            data.items())
    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)

# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)
In each case, you could also make the custom subclasses global, so that they don't have to be recreated on each call.
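For example, here is a minimal sketch of that module-level variant; the class names are illustrative, but it follows the same pattern as ordered_load/ordered_dump above:
import yaml
from collections import OrderedDict

# Define the subclasses once at module level instead of inside each call.
class OrderedLoader(yaml.SafeLoader):
    pass

class OrderedDumper(yaml.SafeDumper):
    pass

def _construct_mapping(loader, node):
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))

OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _construct_mapping)
OrderedDumper.add_representer(
    OrderedDict,
    lambda dumper, data: dumper.represent_mapping(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items()))

# usage:
# data = yaml.load(stream, Loader=OrderedLoader)
# text = yaml.dump(data, Dumper=OrderedDumper)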

2018 option:
oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:
import oyaml as yaml
You'll no longer be annoyed by screwed-up mappings when dumping/loading.
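A quick round-trip sketch of what that enables, assuming oyaml keeps PyYAML's dump/load signatures (which is the point of a drop-in replacement); the sample data is just illustrative:
import oyaml as yaml
from collections import OrderedDict

data = OrderedDict([('z', 1), ('a', 2)])
text = yaml.dump(data)                      # keys stay in insertion order: z before a
assert list(yaml.safe_load(text)) == ['z', 'a']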
Note: I'm the author of oyaml.

The yaml module allows you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.
import yaml
import collections

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

def dict_representer(dumper, data):
    return dumper.represent_dict(data.iteritems())   # Python 2; use data.items() on Python 3

def dict_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))

yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
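Once registered, the module-level helpers pick these up automatically. A small sketch of the effect (Python 2 style, to match the iteritems() call above; the data values are illustrative):
d = collections.OrderedDict([('z', 1), ('a', 2)])
s = yaml.dump(d)       # dumped as a plain mapping, order kept
d2 = yaml.load(s)      # loaded back as an OrderedDict
assert list(d2) == ['z', 'a']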

2015 (and later) option:
ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it also preserves comments, anchor names and tags, and it supports the YAML 1.2 specification (released 2009).
The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper¹:
import sys
from ruamel.yaml import YAML

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)
will give you:
3: abc
conf:
    10: klm
    3: jig     # h is missing
more:
- what
- else
data is of type CommentedMap, which functions like a dict but has extra information that is kept around until it is dumped (including the preserved comment!).

Note: there is a library, based on the following answer, which also implements the CLoader and CDumper variants: Phynix/yamlloader
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)
        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError, exc:
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping

Update: the library has been deprecated in favor of yamlloader (which is based on yamlordereddictloader).
I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:
import yaml
import yamlordereddictloader
datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)
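Since yamlordereddictloader is deprecated, the equivalent call with its yamlloader successor would presumably look like this (the module layout is assumed from the yamlloader project, so double-check against its docs):
import yaml
import yamlloader

with open('myfile.yml') as f:
    data = yaml.load(f, Loader=yamlloader.ordereddict.CLoader)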

For my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. It now supports an object_pairs_hook option for the load commands. The diff of the changes I made is below.
__init__.py
$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
< loader = Loader(stream, **kwds)
---
> loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
< loader = Loader(stream, **kwds)
---
> loader = Loader(stream)
constructor.py
$ diff constructor.py Original
20,21c20
< def __init__(self, object_pairs_hook=dict):
< self.object_pairs_hook = object_pairs_hook
---
> def __init__(self):
27,29d25
< def create_object_hook(self):
< return self.object_pairs_hook()
<
54,55c50,51
< self.constructed_objects = self.create_object_hook()
< self.recursive_objects = self.create_object_hook()
---
> self.constructed_objects = {}
> self.recursive_objects = {}
129c125
< mapping = self.create_object_hook()
---
> mapping = {}
400c396
< data = self.create_object_hook()
---
> data = {}
595c591
< dictitems = self.create_object_hook()
---
> dictitems = {}
602c598
< dictitems = value.get('dictitems', self.create_object_hook())
---
> dictitems = value.get('dictitems', {})
loader.py
$ diff loader.py Original
13c13
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
18c18
< BaseConstructor.__init__(self, **constructKwds)
---
> BaseConstructor.__init__(self)
23c23
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
28c28
< SafeConstructor.__init__(self, **constructKwds)
---
> SafeConstructor.__init__(self)
33c33
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
38c38
< Constructor.__init__(self, **constructKwds)
---
> Constructor.__init__(self)
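With those patches applied, a load call would presumably look like this (a sketch only; it works against the modified local copy of PyYAML, not a stock install):
import yaml
from collections import OrderedDict

with open('myfile.yml') as f:
    data = yaml.load(f, object_pairs_hook=OrderedDict)   # mappings come back ordered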

Here's a simple solution that also checks for duplicated top-level keys in your map.
import yaml
import re
from collections import OrderedDict

def yaml_load_od(fname):
    "load a yaml file as an OrderedDict"
    # detects any duped keys (fail on this) and preserves order of top level keys
    with open(fname, 'r') as f:
        lines = f.read().splitlines()
    top_keys = []
    duped_keys = []
    for line in lines:
        m = re.search(r'^([A-Za-z0-9_]+) *:', line)
        if m:
            if m.group(1) in top_keys:
                duped_keys.append(m.group(1))
            else:
                top_keys.append(m.group(1))
    if duped_keys:
        raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass to set up the OrderedDict
    with open(fname, 'r') as f:
        d_tmp = yaml.load(f)
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])
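Usage is then a single call; 'config.yml' below is just an illustrative file name:
ordered_config = yaml_load_od('config.yml')   # raises if a top-level key is duplicated
print(list(ordered_config.keys()))            # top-level keys in file order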

Related

Python unittest: mock open specific paths (don't mock others)

The use-case is that I want to mock the opening of two files ~/.myconf and ./.myconf but not the other ones.
I'm testing the setup of a complex object which reads multiple files in its __init__ and so I'd like to mock some data for some of them, not mock at all for some others.
As an example here is how I mock the conditional opening of those two files, but it feels complex and I find it odd that there's no easy way already built-in that I'm missing.
import builtins
import configparser
import unittest
from textwrap import dedent
from pathlib import Path
from unittest.mock import mock_open, patch

OPEN = builtins.open

def get_hierarchical_config():
    cwd = Path.cwd()
    global_config = configparser.ConfigParser()
    local_config = configparser.ConfigParser()
    full_config = configparser.ConfigParser()
    global_config.read(Path("~/.myconf").expanduser().resolve())
    local_config.read((cwd / ".myconf").expanduser().resolve())
    full_config.read_dict(global_config)
    full_config.read_dict(local_config)
    return full_config["mysection"]

def get_custom_mock_open(global_conf_str, local_conf_str) -> callable:
    def mocked_open():
        def conditional_open_func(path, *args, **kwargs):
            p = Path(path).expanduser().resolve()
            if p.name == ".myconf":
                if p.parent == Path.home():
                    return mock_open(read_data=global_conf_str)()
                return mock_open(read_data=local_conf_str)()
            return OPEN(path, *args, **kwargs)
        return conditional_open_func
    return mocked_open
[...]
class TestConfig(unittest.TestCase):
    def test_read_confs(self):
        global_conf = dedent(
            """\
            [mysection]
            no_overwrite=path/to/somewhere
            local_overwrite=ERROR:not overwritten
            syntax_test_key= no/space= problem2
            """
        )
        local_conf = dedent(
            """\
            [mysection]
            local_overwrite=SUCCESS:overwritten
            local_new_key=cool value
            """
        )
        with patch(
            "builtins.open",
            new_callable=get_custom_mock_open(global_conf, local_conf),
        ):
            conf = dict(get_hierarchical_config())  # reads the config files
        target = {
            "no_overwrite": "path/to/somewhere",
            "local_overwrite": "SUCCESS:overwritten",
            "syntax_test_key": "no/space= problem2",
            "local_new_key": "cool value",
        }
        self.assertDictEqual(conf, target)

memory overflow when using numpy load in a loop

Looping over npz file loads causes a memory overflow (depending on the length of the file list).
None of the following seems to help:
- Deleting the variable which stores the data in the file.
- Using mmap.
- Calling gc.collect() (garbage collection).
The following code should reproduce the phenomenon:
import numpy as np

# generate a file for the demo
X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

# here come the overflow:
for i in xrange(1000000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error
In my real application the loop is over a list of files and the overflow exceeds 24GB of RAM!
Please note that this was tried on Ubuntu 11.10, with both numpy v1.5.1 and 1.6.0.
I have filed a report in numpy ticket 2048, but this may be of wider interest and so I am posting it here as well (moreover, I am not sure that this is a bug; it may be the result of my bad programming).
SOLUTION (by HYRY):
the command del data.f should precede the command data.close().
For more information and a method to find the solution, please read HYRY's kind answer below.
I think this is a bug, and maybe I found the solution: call "del data.f".
for i in xrange(10000000):
    data = np.load('tmp.npz')
    del data.f
    data.close()  # avoid the "too many files are open" error
To find this kind of memory leak, you can use the following code:
import numpy as np
import gc

# here come the overflow:
for i in xrange(10000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error

d = dict()
for o in gc.get_objects():
    name = type(o).__name__
    if name not in d:
        d[name] = 1
    else:
        d[name] += 1

items = d.items()
items.sort(key=lambda x: x[1])

for key, value in items:
    print key, value
After the test program, I created a dict and counted objects in gc.get_objects(). Here is the output:
...
wrapper_descriptor 1382
function 2330
tuple 9117
BagObj 10000
NpzFile 10000
list 20288
dict 21001
From the result we know that something is wrong with BagObj and NpzFile. Find the code:
class NpzFile(object):
    def __init__(self, fid, own_fid=False):
        ...
        self.zip = _zip
        self.f = BagObj(self)
        if own_fid:
            self.fid = fid
        else:
            self.fid = None

    def close(self):
        """
        Close the file.
        """
        if self.zip is not None:
            self.zip.close()
            self.zip = None
        if self.fid is not None:
            self.fid.close()
            self.fid = None

    def __del__(self):
        self.close()

class BagObj(object):
    def __init__(self, obj):
        self._obj = obj
    def __getattribute__(self, key):
        try:
            return object.__getattribute__(self, '_obj')[key]
        except KeyError:
            raise AttributeError, key
NpzFile has __del__(), NpzFile.f is a BagObj, and BagObj._obj is the NpzFile: this is a reference cycle, and it will make both NpzFile and BagObj uncollectable. Here is some explanation in the Python documentation: http://docs.python.org/library/gc.html#gc.garbage
So, to break the reference cycle, you need to call "del data.f".
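A minimal sketch of the same effect, unrelated to numpy, showing why a cycle through an object with __del__ was uncollectable on the Python 2 (pre-3.4) garbage collector; the class name is illustrative:
import gc

class Leaky(object):
    def __del__(self):          # a finalizer prevents cycle collection before Python 3.4
        pass

obj = Leaky()
obj.self_ref = obj              # reference cycle: obj -> obj
del obj

gc.collect()
print(gc.garbage)               # on Python 2 the cycle ends up here, uncollected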
What I found as the solution: (python==3.8 and numpy==1.18.5)
import numpy as np
import gc  # import garbage collector interface

for i in range(1000):
    data = np.load('tmp.npy')
    # process data
    del data
    gc.collect()

Python ConfigParser interpolation from foreign section

With Python ConfigParser, is it possible to use interpolation across foreign sections? My mind seems to tell me I've seen that it's possible somewhere, but I can't find it when searching.
This example doesn't work, but it's to give an idea of what I'm trying to do.
[section1]
root = /usr
[section2]
root = /usr/local
[section3]
dir1 = $(section1:root)/bin
dir2 = $(section2:root)/bin
Note that I'm using Python 2.4.
In python 3.2 and up this is perfectly valid:
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
Python: 3.2
path: ${Common:system_dir}/Library/Frameworks/
[Arthur]
nickname: Two Sheds
last_name: Jackson
my_dir: ${Common:home_dir}/twosheds
my_pictures: ${my_dir}/Pictures
python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
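To actually get that behaviour you construct the parser with the ExtendedInterpolation handler; a minimal sketch (the file name is illustrative, the config content is the one shown above):
from configparser import ConfigParser, ExtendedInterpolation

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read('example.ini')             # the config shown above
print(parser['Arthur']['python_dir'])  # ${Frameworks:path}/... fully expanded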
Edit:
I just saw that you are using Python 2.4, so no, section interpolation cannot be done in Python 2.4. It was introduced in Python 3.2 - see section 13.2.5 - ConfigParser Interpolation of values.
class configparser.ExtendedInterpolation
An alternative handler for interpolation which implements a more advanced syntax, used for instance in zc.buildout. Extended interpolation is using ${section:option} to denote a value from a foreign section. Interpolation can span multiple levels. For convenience, if the section: part is omitted, interpolation defaults to the current section (and possibly the default values from the special section).
For example, the configuration specified above with basic interpolation, would look like this with extended interpolation:
[Paths]
home_dir: /Users
my_dir: ${home_dir}/lumberjack
my_pictures: ${my_dir}/Pictures
Values from other sections can be fetched as well:
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
Python: 3.2
path: ${Common:system_dir}/Library/Frameworks/
[Arthur]
nickname: Two Sheds
last_name: Jackson
my_dir: ${Common:home_dir}/twosheds
my_pictures: ${my_dir}/Pictures
python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
You do have access to the special-case [DEFAULT] section. Values defined here can be accessed via interpolation from other sections even for older versions of Python.
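For example, with plain %(...)s interpolation (available in old and new ConfigParser alike), values defined in [DEFAULT] are visible to every section; a minimal sketch using the Python 3 module name (Python 2's ConfigParser with readfp works the same way):
import configparser

cfg = """
[DEFAULT]
root = /usr

[section3]
dir1 = %(root)s/bin
"""

parser = configparser.ConfigParser()   # basic interpolation is the default
parser.read_string(cfg)
print(parser.get('section3', 'dir1'))  # -> /usr/bin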
If you're stuck with Python 2.7 and you need to do cross-section interpolation, it is easy enough to do this by hand using regexps.
Here is the code:
import re

INTERPOLATION_RE = re.compile(r"\$\{(?:(?P<section>[^:]+):)?(?P<key>[^}]+)\}")

def load_something_from_cp(cp, section="section"):
    result = []

    def interpolate_func(match):
        d = match.groupdict()
        # fall back to the current section when the "section:" part is omitted
        target_section = d.get('section') or section
        key = d.get('key')
        return cp.get(target_section, key)

    for k, v in cp.items(section):
        v = re.sub(INTERPOLATION_RE, interpolate_func, v)
        result.append(
            (v, k)
        )
    return result
Caveats:
There is no recursion in interpolation.
When parsing many sections, you'll need to somehow guess the current section.
I have run into this in the project I'm working on right now, and I implemented a quick extension of the ConfigParser.SafeConfigParser class in which I have overridden the get() function. I thought some may find it useful.
import re
import ConfigParser

class ExtParser(ConfigParser.SafeConfigParser):
    # implementing extended interpolation
    def __init__(self, *args, **kwargs):
        self.cur_depth = 0
        ConfigParser.SafeConfigParser.__init__(self, *args, **kwargs)

    def get(self, section, option, raw=False, vars=None):
        r_opt = ConfigParser.SafeConfigParser.get(self, section, option, raw=True, vars=vars)
        if raw:
            return r_opt

        ret = r_opt
        re_oldintp = r'%\((\w*)\)s'
        re_newintp = r'\$\{(\w*):(\w*)\}'

        m_new = re.findall(re_newintp, r_opt)
        if m_new:
            for f_section, f_option in m_new:
                self.cur_depth = self.cur_depth + 1
                if self.cur_depth < ConfigParser.MAX_INTERPOLATION_DEPTH:
                    sub = self.get(f_section, f_option, vars=vars)
                    ret = ret.replace('${{{0}:{1}}}'.format(f_section, f_option), sub)
                else:
                    raise ConfigParser.InterpolationDepthError, (option, section, r_opt)

        m_old = re.findall(re_oldintp, r_opt)
        if m_old:
            for l_option in m_old:
                self.cur_depth = self.cur_depth + 1
                if self.cur_depth < ConfigParser.MAX_INTERPOLATION_DEPTH:
                    sub = self.get(section, l_option, vars=vars)
                    ret = ret.replace('%({0})s'.format(l_option), sub)
                else:
                    raise ConfigParser.InterpolationDepthError, (option, section, r_opt)

        self.cur_depth = self.cur_depth - 1
        return ret
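Usage would presumably mirror the stock parser; the file and section names below are only illustrative:
parser = ExtParser()
parser.read('config.ini')
print parser.get('section3', 'dir1')   # ${section1:root}/bin expands via the overridden get()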

Handling Python program arguments in a json file

I am a Python re-newbie. I would like advice on handling program parameters which are in a file in json format. Currently, I am doing something like what is shown below, however, it seems too wordy, and the idea of typing the same literal string multiple times (sometimes with dashes and sometimes with underscores) seems juvenile - error prone - stinky... :-) (I do have many more parameters!)
#!/usr/bin/env python
import sys
import os
import json  ## for control file parsing

# control parameters
mpi_nodes = 1
cluster_size = None
initial_cutoff = None
# ...

# process the arguments
if len(sys.argv) != 2:
    raise Exception(
        """Usage:
    run_foo <controls.json>
Where:
    <control.json> is a dictionary of run parameters
"""
    )

# We expect a .json file with our parameters
controlsFileName = sys.argv[1]
err = ""
err += ""  # validateFileArgument(controlsFileName, exists=True)

# read in the control parameters from the .json file
try:
    controls = json.load(open(controlsFileName, "r"))
except:
    err += "Could not process the file '" + controlsFileName + "'!\n"

# check each control parameter. The first one is optional
if "mpi-nodes" in controls:
    mpi_nodes = controls["mpi-nodes"]
else:
    mpi_nodes = controls["mpi-nodes"] = 1

if "cluster-size" in controls:
    cluster_size = controls["cluster-size"]
else:
    err += "Missing control definition for \"cluster-size\".\n"

if "initial-cutoff" in controls:
    initial_cutoff = controls["initial-cutoff"]
else:
    err += "Missing control definition for \"initial-cutoff\".\n"

# ...

# Quit if any of these things were not true
if len(err) > 0:
    print err
    exit()

# ...
This works, but it seems like there must be a better way. I am stuck with the requirements to use a json file and to use the hyphenated parameter names. Any ideas?
I was looking for something with more static binding. Perhaps this is as good as it gets.
Usually, we do things like this.
import sys
import json

def get_parameters( some_file_name ):
    source = json.load( open( some_file_name ) )
    return dict(
        mpi_nodes= source.get('mpi-nodes', 1),
        cluster_size= source['cluster-size'],
        initial_cutoff= source['initial-cutoff'],
    )

controlsFileName = sys.argv[1]
try:
    params = get_parameters( controlsFileName )
except IOError:
    print "Could not process the file '{0}'!".format( controlsFileName )
    sys.exit( 1 )
except KeyError, e:
    print "Missing control definition for '{0}'.".format( e.message )
    sys.exit( 2 )
At the end, params['mpi_nodes'] has the value of mpi_nodes.
If you want a simple variable, you do this: mpi_nodes = params['mpi_nodes']
If you want a namedtuple, change get_parameters like this:
from collections import namedtuple

def get_parameters( some_file_name ):
    source = json.load( open( some_file_name ) )
    Parameters = namedtuple( 'Parameters', 'mpi_nodes, cluster_size, initial_cutoff' )
    return Parameters( source.get('mpi-nodes', 1),
        source['cluster-size'],
        source['initial-cutoff'],
    )
I don't know if you'd find that better or not.
The argparse library is nice; it can handle most of the argument parsing and validation for you, as well as printing pretty help screens.
[1] http://docs.python.org/dev/library/argparse.html
I will knock up a quick demo showing how you'd want to use it this arvo.
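In the meantime, here is a rough sketch of how argparse might be combined with the JSON controls file from the question (the argument and variable names are only illustrative):
import argparse
import json

parser = argparse.ArgumentParser(description="run_foo")
parser.add_argument("controls", help="path to a JSON dictionary of run parameters")
args = parser.parse_args()

with open(args.controls) as f:
    controls = json.load(f)

mpi_nodes = controls.get("mpi-nodes", 1)   # optional, with a default
cluster_size = controls["cluster-size"]    # mandatory; KeyError if missing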
Assuming you have many more parameters to process, something like this could work:
def underscore(s):
    return s.replace('-', '_')

# parameters with default values
for name, default in (("mpi-nodes", 1),):
    globals()[underscore(name)] = controls.get(name, default)

# mandatory parameters
for name in ("cluster-size", "initial-cutoff"):
    try:
        globals()[underscore(name)] = controls[name]
    except KeyError:
        err += "Missing control definition for %r" % name
Instead of manipulating globals, you can also make this more explicit:
def underscore(s):
    return s.replace('-', '_')

settings = {}

# parameters with default values
for name, default in (("mpi-nodes", 1),):
    settings[underscore(name)] = controls.get(name, default)

# mandatory parameters
for name in ("cluster-size", "initial-cutoff"):
    try:
        settings[underscore(name)] = controls[name]
    except KeyError:
        err += "Missing control definition for %r" % name

# print out err if necessary
mpi_nodes = settings['mpi_nodes']
cluster_size = settings['cluster_size']
initial_cutoff = settings['initial_cutoff']
I learned something from all of these responses - thanks! I would like to get feedback on my approach, which incorporates something from each suggestion. In addition to the conditions imposed by the client, I want something:
1) that is fairly obvious to use and to debug
2) that is easy to maintain and modify
I decided to incorporate str.replace, namedtuple, and globals(), creating a ControlParameters namedtuple in the globals() namespace.
#!/usr/bin/env python
import sys
import os
import collections
import json

def get_parameters(parameters_file_name):
    """
    Access all of the control parameters from the json filename given. A
    variable of type namedtuple named "ControlParameters" is injected
    into the global namespace. Parameter validation is not performed. Both
    the names and the defaults, if any, are defined herein. Parameters not
    found in the json file will get values of None.

    Parameter usage example: ControlParameters.cluster_size
    """
    parameterValues = json.load(open(parameters_file_name, "r"))
    Parameters = collections.namedtuple( 'Parameters',
        """
        mpi_nodes
        cluster_size
        initial_cutoff
        truncation_length
        """
    )
    parameters = Parameters(
        parameterValues.get(Parameters._fields[0].replace('_', '-'), 1),
        parameterValues.get(Parameters._fields[1].replace('_', '-')),
        parameterValues.get(Parameters._fields[2].replace('_', '-')),
        parameterValues.get(Parameters._fields[3].replace('_', '-'))
    )
    globals()["ControlParameters"] = parameters

# process the program argument(s)
err = ""
if len(sys.argv) != 2:
    raise Exception(
        """Usage:
    foo <control.json>
Where:
    <control.json> is a dictionary of run parameters
"""
    )

# We expect a .json file with our parameters
parameters_file_name = sys.argv[1]
err += ""  # validateFileArgument(parameters_file_name, exists=True)

if err == "":
    get_parameters(parameters_file_name)
    cp_dict = ControlParameters._asdict()
    for name in ControlParameters._fields:
        if cp_dict[name] == None:
            err += "Missing control parameter '%s'\r\n" % name

print err
print "Done"

Can Python be made to generate tracing similar to bash's set -x?

Is there a similar mechanism in Python, to the effect set -x has on bash?
Here's some example output from bash in this mode:
+ for src in cpfs.c log.c popcnt.c ssse3_popcount.c blkcache.c context.c types.c device.c
++ my_mktemp blkcache.c.o
+++ mktemp -t blkcache.c.o.2160.XXX
++ p=/tmp/blkcache.c.o.2160.IKA
++ test 0 -eq 0
++ echo /tmp/blkcache.c.o.2160.IKA
+ obj=/tmp/blkcache.c.o.2160.IKA
I'm aware of the Python trace module, however its output seems to be extremely verbose, and not high level like that of bash.
Perhaps use sys.settrace:
Use traceit() to turn on tracing, use traceit(False) to turn off tracing.
import sys
import linecache

def _traceit(frame, event, arg):
    '''
    http://www.dalkescientific.com/writings/diary/archive/2005/04/20/tracing_python_code.html
    '''
    if event == "line":
        lineno = frame.f_lineno
        filename = frame.f_globals["__file__"]
        if (filename.endswith(".pyc") or
            filename.endswith(".pyo")):
            filename = filename[:-1]
        name = frame.f_globals["__name__"]
        line = linecache.getline(filename, lineno)
        print "%s  # %s:%s" % (line.rstrip(), name, lineno,)
    return _traceit

def _passit(frame, event, arg):
    return _passit

def traceit(on=True):
    if on: sys.settrace(_traceit)
    else: sys.settrace(_passit)

def mktemp(src):
    pass

def my_mktemp(src):
    mktemp(src)
    p = src

traceit()
for src in ('cpfs.c', 'log.c',):
    my_mktemp(src)
traceit(False)
yields
mktemp(src) # __main__:33
pass # __main__:30
p=src # __main__:34
mktemp(src) # __main__:33
pass # __main__:30
p=src # __main__:34
if on: sys.settrace(_traceit) # __main__:26
else: sys.settrace(_passit) # __main__:27
To trace specific calls, you can wrap each interesting function with your own logger. This does lead to arguments expanded to their values rather than just argument names in the output.
Functions have to be passed in as strings to prevent issues where modules redirect to other modules, like os.path / posixpath. I don't think you can extract the right module name to patch from just the function object.
Wrapping code:
import importlib

def wrapper(ffull, f):
    def logger(*args, **kwargs):
        print "TRACE: %s (%s, %s)" % (ffull, args, kwargs)
        return f(*args, **kwargs)
    return logger

def log_execution(ffull):
    parts = ffull.split('.')
    mname = '.'.join(parts[:-1])
    fname = parts[-1]
    m = importlib.import_module(mname)
    f = getattr(m, fname)
    setattr(m, fname, wrapper(ffull, f))
Usage:
for f in ['os.path.join', 'os.listdir', 'sys.exit']:
    log_execution(f)

p = os.path.join('/usr', 'bin')
os.listdir(p)
sys.exit(0)
....
% ./a.py
TRACE: os.path.join (('/usr', 'bin'), {})
TRACE: os.listdir (('/usr/bin',), {})
TRACE: sys.exit ((0,), {})
You should try to instrument the trace module to get a higher level of detail.
What do you need exactly?
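For instance, the standard-library trace module can be told to skip the interpreter's own code, which cuts the noise down considerably; a minimal sketch (my_mktemp refers to the function from the sys.settrace example above):
import sys
import trace

# trace only line events, skip everything installed under the stdlib prefixes
tracer = trace.Trace(count=False, trace=True,
                     ignoredirs=[sys.prefix, sys.exec_prefix])
tracer.runfunc(my_mktemp, 'cpfs.c')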
