Python: ambiguities when (un-)pickling a dict-based tree

Python: ambiguities when (un-)pickling a dict-based tree - python

I have a simple tree structure where each node can have several children which are accessed through keys (+ some payload stored in value):
class NodeDict(dict):
def __init__(self, parent):
self.parent = parent
self.value = None
def AddChild(self, label):
self[label] = NodeDict(self)
return self[label]
class TreeDict:
def __init__(self):
self.root = NodeDict(None)
def ToString(self, level, node):
res = ":" + str(node.value) + "\n"
for k, v in node.items():
res += " "*level + str(k) + self.ToString(level + 1, v)
return res
def __str__(self):
return self.ToString(1, self.root)
When I try to pickle such a tree I have the problem that it is not correctly unpickled as the following example shows:
class Obj:
def __init__(self, v):
self.v = v
def __str__(self):
return str(self.v)
t = TreeDict()
a = t.root.AddChild(Obj("A"))
b = a.AddChild(Obj("B"))
c = b.AddChild(Obj("C"))
d = b.AddChild(Obj("D"))
e = a.AddChild(Obj("E"))
print t
import cPickle
cPickle.dump(t, open("test.dat", "w"))
t = cPickle.load(open("test.dat", "r"))
print t
The tree looks like this before
:None
A:None
E:None
B:None
C:None
D:None
and like this after pickling and unpickling:
:None
A:None
B:None
B:None
D:None
C:None
E:None
E:None
The problem arises from the fact that I am using Obj as labels in the tree (i.e. as keys in the dict). (It also fails when using e.g. strings instead of Obj if they are not all mutually different.)
How could this be changed to work as intended?

When subclassing builtin types like dict or using new style classes you should use at least the pickle protocol 2. The default for python2 is still 0, which can have problems with these.
When using dump(t, open("test.dat", "wb"), protocol=2) your code works with cPickle and with pickle alike.
And remember the remark about opening the files in binary mode.

If this is still not working (see Aya's comment), is using another tree structure an option for you? If so, here is a script to pickle DOM tree structure.

Related

pickle, dill and cloudpickle returning field as empty dict on custom class after process termination

I have an object of a custom class that I am trying to serialize and permanently store.
When I serialize it, store it, load it and use it in the same run, it works fine. It only messes up when I've ended the process and then try to load it again from the pickle file. This is the code that works fine:
first_model = NgramModel(3, name="debug")
for paragraph in text:
first_model.train(paragraph_to_sentences(text))
# paragraph to sentences just uses regex to do the equivalent of splitting by punctuation
print(first_model.context_options)
# context_options is a dict (counter)
first_model = NgramModel.load_existing_model("debug")
#load_existing_model loads the pickle file. Look in the class code
print(first_model.context_options)
However, when I run this alone, it prints an empty counter:
first_model = NgramModel.load_existing_model("debug")
print(first_model.context_options)
This is a shortened version of the class file (the only two methods that touch the pickle/dill are update_pickle_state and load_existing_model):
import os
import dill
from itertools import count
from collections import Counter
from os import path
class NgramModel:
context_options: dict[tuple, set[str]] = {}
ngram_count: Counter[tuple] = Counter()
n = 0
pickle_path: str = None
num_paragraphs = 0
num_sentences = 0
def __init__(self, n: int, **kwargs):
self.n = n
self.pickle_path = NgramModel.pathify(kwargs.get('name', NgramModel.gen_pickle_name())) #use name if exists else generate random name
def train(self, paragraph_as_list: list[str]):
'''really the central method that coordinates everything else. Takes a list of sentences, generates data(n-grams) from each, updates the fields, and saves the instance (self) to a pickle file'''
self.num_paragraphs += 1
for sentence in paragraph_as_list:
self.num_sentences += 1
generated = self.generate_Ngrams(sentence)
self.ngram_count.update(generated)
for ngram in generated:
self.add_to_set(ngram)
self.update_pickle_state()
def update_pickle_state(self):
'''saves instance to pickle file'''
file = open(self.pickle_path, "wb")
dill.dump(self, file)
file.close()
#staticmethod
def load_existing_model(name: str):
'''returns object from pickle file'''
path = NgramModel.pathify(name)
file = open(path, "rb")
obj: NgramModel = dill.load(file)
return obj
def generate_Ngrams(self, string: str):
'''ref: https://www.analyticsvidhya.com/blog/2021/09/what-are-n-grams-and-how-to-implement-them-in-python/'''
words = string.split(" ")
words = ["<start>"] * (self.n - 1) + words + ["<end>"] * (self.n - 1)
list_of_tup = []
for i in range(len(words) + 1 - self.n):
list_of_tup.append((tuple(words[i + j] for j in range(self.n - 1)), words[i + self.n - 1]))
return list_of_tup
def add_to_set(self, ngram: tuple[tuple[str, ...], str]):
if ngram[0] not in self.context_options:
self.context_options[ngram[0]] = set()
self.context_options[ngram[0]].add(ngram[1])
#staticmethod
def pathify(name):
'''converts name to path'''
return f"models/{name}.pickle"
#staticmethod
def gen_pickle_name():
for i in count():
new_name = f"unnamed-pickle-{i}"
if not path.exists(NgramModel.pathify(new_name)):
return new_name
All the other fields print properly and are complete and correct except the two dicts

The problem is that is that context_options is a mutable class-member, not an instance member. If I had to guess, dill is only pickling instance members, since the class definition holds class members. That would account for why you see a "filled-out" context_options when you're working in the same shell but not when you load fresh — you're using the dirtied class member in the former case.
It's for stuff like this that you generally don't want to use mutable class members (or similarly, mutable default values in function signatures). More typical is to use something like context_options: dict[tuple, set[str]] = None and then check if it's None in the __init__ to set it to a default value, e.g., an empty dict. Alternatively, you could use a #dataclass and provide a field initializer, i.e.
#dataclasses.dataclass
class NgramModel:
context_options: dict[tuple, set[str]] = dataclasses.field(default_factory=dict)
...
You can observe what I mean about it being a mutable class member with, for instance...
if __name__ == '__main__':
ng = NgramModel(3, name="debug")
print(ng.context_options) # {}
ng.context_options[("foo", "bar")] = {"baz", "qux"}
print(ng.context_options) # {('foo', 'bar'): {'baz', 'qux'}}
ng2 = NgramModel(3, name="debug")
print(ng2.context_options) # {('foo', 'bar'): {'baz', 'qux'}}
I would expect a brand new ng2 to have the same context that the brand new ng had - empty (or whatever an appropriate default is).

How to convert an object back into the code used to create it?

For example if I have a custom Python object like this;
#!/usr/bin/env python3
import os
base_dir = os.path.abspath(".")
class MyFile(dict):
def __init__(self, name, size = None, dir = base_dir):
self.name = name
self.path = os.path.join(dir, name)
self.bytes = size
and somewhere in my program, I initialize my object class;
a = MyFile(name = "foo", size = 10)
I want to be able to return the code used to create the object in the first place. For example;
print(a)
# <__main__.MyFile object at 0x102b84470>
# should instead print:
# MyFile(name = "foo", size = 10)
But since my object has some default attribute values, I only want those to show up in the output if they were explicitly included when the object was initialized;
b = MyFile(name = "bar", dir = "/home")
print(b)
# <__main__.MyFile object at 0x102b845c0>
# should instead print:
# MyFile(name = "bar", dir = "/home")
And to be clear, I am not trying to pull this from the source code, because a lot of my objects will be created dynamically, and I want to be able to return the same thing for them as well;
l = [ ("baz", 4), ("buzz", 12) ]
f = [ MyFile(name = n, size = s) for n, s in l ]
print(f)
# [<__main__.MyFile object at 0x1023844a8>, <__main__.MyFile object at 0x102384828>]
# should instead print:
# [ MyFile(name = "baz", size = 4), MyFile(name = "buzz", size = 12) ]
I saw the inspect library (https://docs.python.org/3/library/inspect.html) but it does not seem to have anything that does this. What am I missing? This functionality would be pretty analogous to R's dput function.

At a very basic level you can do this:
class MyClass:
def __init__(self, a, b):
self.a = a
self.b = b
def __repr__(self):
return f'{self.__class__.__name__}({self.a}, {self.b})'
class MyOtherClass(MyClass):
def method(self):
pass
c = MyClass(1, 2)
oc = MyOtherClass(3, 4)
print(c, oc)
Result:
MyClass(1, 2) MyOtherClass(3, 4)
This does what you ask, as well as taking subclassing into account to provide the correct class name. But of course things can get complicated for several reasons:
class MyClass:
def __init__(self, a, b):
self.a = a + 1
self.b = b if b < 10 else a
self.c = 0
def inc_c(self):
self.c += 1
def __repr__(self):
return f'{self.__class__.__name__}({self.a - 1}, {self.b})'
The value of c isn't covered by the constructor, so the proposed call would set it to 0. And Although you could compensate for the + 1 for a, the value of b will be more complicated - even more so if you realise someone could have changed the value later.
And then you need to consider that subclasses can override behaviour, etc. So, doing something like this only makes sense in very limited use cases.

As simple as replacing your code snippet with the following:
import os
base_dir = os.path.abspath(".")
class MyFile(object):
def __init__(self, name, size = None, dir = base_dir):
self.name = name
self.path = os.path.join(dir, name)
self.bytes = size
self.remember(name,size, dir)
def remember(self, name,size, dir):
self.s= '{}(name = \'{}\'{}{})'.format(self.__class__.__name__,name, ", size="+str(size) if size!=None else "", ', dir="'+dir+'"' if dir!=base_dir else "")
def __repr__(self):
return self.s
a) for a it returns:
MyFile(name = 'foo', size=10)
b) for b it returns:
MyFile(name = 'bar', dir="/home")
c) for f it returns:
[MyFile(name = 'baz', size=4), MyFile(name = 'buzz', size=12)]

Thanks to everyone who commented and answered. Ultimately, I incorporated their ideas and feedback into the following method, which allowed me to preserve the object's native __repr__ while still getting the behaviors I wanted.
#!/usr/bin/env python3
import os
base_dir = os.path.abspath(".")
class MyFile(dict):
"""
A custom dict class that auto-populates some keys based on simple input args
compatible with unittest.TestCase.assertDictEqual
"""
def __init__(self, name, size = None, dir = base_dir):
"""
standard init methods
"""
self.name = name
self.path = os.path.join(dir, name)
self.bytes = size
# auto-populate this key
self['somekey'] = self.path + ' ' + str(self.bytes)
# more logic for more complex keys goes here...
# use these later with `init` and `repr`
self.args = None
self.kwargs = None
#classmethod
def init(cls, *args, **kwargs):
"""
alternative method to initialize the object while retaining the args passed
"""
obj = cls(*args, **kwargs)
obj.args = args
obj.kwargs = kwargs
return(obj)
def repr(self):
"""
returns a text representation of the object that can be used to
create a new copy of an identical object, displaying only the
args that were originally used to create the current object instance
(do not show args that were not passed e.g. default value args)
"""
n = 'MyFile('
if self.args:
for i, arg in enumerate(self.args):
n += arg.__repr__()
if i < len(self.args) - 1 or self.kwargs:
n += ', '
if self.kwargs:
for i, (k, v) in enumerate(self.kwargs.items()):
n += str(k) + '=' + v.__repr__()
if i < len(self.kwargs.items()) - 1:
n += ', '
n += ')'
return(n)
Usage:
# normal object initialization
obj1 = MyFile('foo', size=10)
print(obj1) # {'somekey': '/Users/me/test/foo 10'}
# initialize with classmethod instead to preserve args
obj2 = MyFile.init("foo", size = 10)
print(obj2) # {'somekey': '/Users/me/test/foo 10'}
# view the text representation
repr = obj2.repr()
print(repr) # MyFile('foo', size=10)
# re-load a copy of the object from the text representation
obj3 = eval(repr)
print(obj3) # {'somekey': '/Users/me/test/foo 10'}
The use case for this being where I need to represent large simple data structures (dicts) in my Python code (integration tests), where the data values are dynamically generated from a smaller set of variables. But when I have many hundreds of such data structures that I need to include in the test case, it becomes infeasible to write the code for e.g. MyFile(...) out hundreds of times. This method allows me to use a script to ingest the data, and then print out compact Python code needed to recreate the data using my custom object class. Which I can then just copy/paste into my test cases.

How to get a list of all non imported names in a Python module?

Given a module containing :
import stuff
from foo import Foo
from bar import *
CST = True
def func(): pass
How can I define a function get_defined_objects so that I can do:
print(get_defined_objects('path.to.module'))
{'CST': True, 'func', <function path.to.module.func>}
Right now the only solution I can imagine is to read the original module file, extract defined names with re.search(r'^(?:def|class )?(\w+)(?:\s*=)?' then import the module, and find the intersection with __dict__.
Is there something cleaner ?

Here is something for you to start with using ast. Note that this code does not cover all possible cases, although it should handle e.g. multiple assignment properly. Consider investigating ast's data structures and API more closely if you would like to get access to compiled code, for example.
import ast
with open('module.py') as f:
data = f.read()
tree = ast.parse(data)
elements = [el for el in tree.body if type(el) in (ast.Assign, ast.FunctionDef, ast.ClassDef)]
result = {}
for el in elements:
if type(el) == ast.Assign:
for t in el.targets:
if type(el.value) == ast.Call:
result[t.id] = el.value.func.id + '()'
else:
for attr in ['id', 'i', 's']:
try:
result[t.id] = getattr(el.value, attr)
break
except Exception as e:
pass
elif type(el) == ast.FunctionDef:
result[el.name] = '<function %s>' % el.name
else:
result[el.name] = '<class %s>' % el.name
print result
#

mod = "foo"
import ast, inspect
import importlib
mod = importlib.import_module(mod)
p = ast.parse(inspect.getsource(mod))
from collections import defaultdict
data = defaultdict(defaultdict)
for node in p.body:
if isinstance(node, (ast.ImportFrom, ast.Import)):
continue
if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
data["classes"][node.name] = mod.__dict__[node.name]
elif isinstance(node, ast.Assign):
for trg in node.targets:
if isinstance(node.value, ast.Num):
data["assignments"][trg.id] = node.value.n
elif isinstance(node.value, ast.Str):
data["assignments"][trg.id] = node.value.s
else:
data["assignments"][trg.id] = mod.__dict__[trg.id]
Output:
There is a nice explanation here that lists what the different types do and their attributes which this is based on:
class Nodes(ast.NodeVisitor):
def __init__(self):
self.data = defaultdict()
super(Nodes, self).__init__()
def visit_FunctionDef(self, node):
self.data[node.name] = mod.__dict__[node.name]
print("In FunctionDef with funcion {}".format(node.name))
def visit_ClassDef(self, node):
self.data[node.name] = mod.__dict__[node.name]
def visit_Assign(self, node):
for trg in node.targets:
if isinstance(node.value, (ast.Str, ast.Num, ast.Dict, ast.List, ast.ListComp, ast.NameConstant)):
self.data[trg.id] = mod.__dict__[trg.id]
self.generic_visit(node)
def visit_Name(self, node):
"""
class Name(idctx)
A variable name. id holds the name as a string
and ctx is either class Load class Store class Del.
"""
print("In Name with {}\n".format(node.id))
#
def visit_Dict(self, node):
"""
class Dict(keys, values)
A dictionary. keys and values
hold lists of nodes with matching order
"""
print("In Dict keys = {}, values = {}\n".format(node.keys,node.values))
def visit_Set(self,node):
"""
class Set(elts)
A set. elts holds a list of
nodes representing the elements.
"""
print("In Set elts = {}\n".format(node.elts))
def visit_List(self, node):
"""
class List(eltsctx)
lts holds a list of nodes representing the elements.
ctx is Store if the container
is an assignment target
(i.e. (x,y)=pt), and Load otherwise.
"""
print("In List elts = {}\nctx = {}\n".format(node.elts,node.ctx))
def visit_Tuple(self, node):
"""
class Tuple(eltsctx)
lts holds a list of nodes representing the elements.
ctx is Store if the container
is an assignment target
(i.e. (x,y)=pt), and Load otherwise.
"""
print("In Tuple elts = {}\nctx = {}\n".format(node.elts,node.ctx))
def visit_NameConstant(self, node):
"""
class NameConstant(value)
True, False or None. "value" holds one of those constants.
"""
print("In NameConstant getting value {}\n".format(node.value))
def visit_Load(self, node):
print("In Load with node {}\n".format(node.func))
def visit_Call(self, node):
"""
class Call(func, args, keywords, starargs, kwargs)
A function call. func is the function,
which will often be a Name or Attribute object. Of the arguments:
args holds a list of the arguments passed by position.
keywords holds a list of keyword objects representing arguments
passed by keyword.starargs and kwargs each hold a single node,
for arguments passed as *args and **kwargs.
"""
print("In Call with node {}\n".format(node.func))
def visit_Num(self, node):
print("In Num getting value {}\n".format(node.n))
def visit_Str(self, node):
print("In Str getting value {}\n".format(node.s))
f = Nodes()
f.visit(p)
print(f.data)

A bytecode hack for Python 3.4+. Possible due to dis.get_instructions.
import dis
import importlib
from itertools import islice
import marshal
import os
def consume_iterator(it, n=1):
next(islice(it, n, n), None)
def get_defined_names(module_path):
path, module_name = os.path.split(module_path)
module_name = module_name[:-3]
module_object = importlib.import_module(module_name)
pyc_name = '{}.cpython-34.pyc'.format(module_name)
pyc_path = os.path.join(path, '__pycache__/', pyc_name)
with open(pyc_path, 'rb') as f:
f.read(12) # drop the first 12 bytes
code = marshal.load(f)
# dis.disassemble(code) # see the byte code
instructions = dis.get_instructions(code)
objects = {}
for instruction in instructions:
if instruction.opname == 'STORE_NAME':
objects[instruction.argval] = getattr(module_object,
instruction.argval)
elif instruction.opname == 'IMPORT_NAME':
consume_iterator(instructions, 2)
elif instruction.opname == 'IMPORT_FROM':
consume_iterator(instructions, 1)
return objects
print(get_defined_names('/Users/ashwini/py/so.py'))
For a file like:
#/Users/ashwini/py/so.py
import os
from sys import argv, modules
from math import *
from itertools import product
CST = True
from itertools import permutations, combinations
from itertools import chain
E = 100
from itertools import starmap
def func(): pass
for x in range(10):
pass
class C:
a = 100
d = 1
The output will be:
{'d': 1, 'E': 100, 'CST': True, 'x': 9, 'func': <function func at 0x10efd0510>, 'C': <class 'so.C'>}
A much more better way as someone already mentioned in comments will be to parse the source code using ast module and find out the variable names from there.

While I accepted an answer, it can't hurt to post the solution I ended up using. It's a mix between the other proposals :
import ast
import inspect
import importlib
from types import ModuleType
def extract_definitions(module):
""" Returns the name and value of objects defined at the top level of the given module.
:param module: A module object or the name of the module to import.
:return: A dict {'classes': {}, 'functions': {}, 'assignments': {}} containing defined objects in the module.
"""
if not isinstance(module, ModuleType):
module = importlib.import_module(module)
tree = ast.parse(inspect.getsource(module))
definitions = {'classes': {}, 'functions': {}, 'assignments': {}}
for node in tree.body:
if isinstance(node, ast.ClassDef):
definitions["classes"][node.name] = getattr(module, node.name)
elif isinstance(node, ast.FunctionDef):
definitions["functions"][node.name] = getattr(module, node.name)
elif isinstance(node, ast.Assign):
# for unpacking, you need to loop on all names
for target in node.targets:
definitions["assignments"][target.id] = getattr(module, target.id)
return definitions
I added the ability to import from a string or a module object, then removed the parsing of values and replaced it by a simple getattr from the original module.

Untested
def unexported_names (module):
try:
return [name for name in module.__dict__ if name not in module.__all__]
except AttributeError:
return [name for name in module.__dict__ if name.startswith('_')]

How to extract functions used in a python code file?

I would like to create a list of all the functions used in a code file. For example if we have following code in a file named 'add_random.py'
`
import numpy as np
from numpy import linalg
def foo():
print np.random.rand(4) + np.random.randn(4)
print linalg.norm(np.random.rand(4))
`
I would like to extract the following list:
[numpy.random.rand, np.random.randn, np.linalg.norm, np.random.rand]
The list contains the functions used in the code with their actual name in the form of 'module.submodule.function'. Is there something built in python language that can help me do this?

You can extract all call expressions with:
import ast
class CallCollector(ast.NodeVisitor):
def __init__(self):
self.calls = []
self.current = None
def visit_Call(self, node):
# new call, trace the function expression
self.current = ''
self.visit(node.func)
self.calls.append(self.current)
self.current = None
def generic_visit(self, node):
if self.current is not None:
print "warning: {} node in function expression not supported".format(
node.__class__.__name__)
super(CallCollector, self).generic_visit(node)
# record the func expression
def visit_Name(self, node):
if self.current is None:
return
self.current += node.id
def visit_Attribute(self, node):
if self.current is None:
self.generic_visit(node)
self.visit(node.value)
self.current += '.' + node.attr
Use this with a ast parse tree:
tree = ast.parse(yoursource)
cc = CallCollector()
cc.visit(tree)
print cc.calls
Demo:
>>> tree = ast.parse('''\
... def foo():
... print np.random.rand(4) + np.random.randn(4)
... print linalg.norm(np.random.rand(4))
... ''')
>>> cc = CallCollector()
>>> cc.visit(tree)
>>> cc.calls
['np.random.rand', 'np.random.randn', 'linalg.norm']
The above walker only handles names and attributes; if you need more complex expression support, you'll have to extend this.
Note that collecting names like this is not a trivial task. Any indirection would not be handled. You could build a dictionary in your code of functions to call and dynamically swap out function objects, and static analysis like the above won't be able to track it.

In general, this problem is undecidable, consider for example getattribute(random, "random")().
If you want static analysis, the best there is now is jedi
If you accept dynamic solutions, then cover coverage is your best friend. It will show all used functions, rather than only directly referenced though.
Finally you can always roll your own dynamic instrumentation along the lines of:
import random
import logging
class Proxy(object):
def __getattr__(self, name):
logging.debug("tried to use random.%s", name)
return getattribute(_random, name)
_random = random
random = Proxy()

Lazy data-flow (spreadsheet like) properties with dependencies in Python

My problem is the following: I have some python classes that have properties that are derived from other properties; and those should be cached once they are calculated, and the cached results should be invalidated each time the base properties are changed.
I could do it manually, but it seems quite difficult to maintain if the number of properties grows. So I would like to have something like Makefile rules inside my objects to automatically keep track of what needs to be recalculated.
The desired syntax and behaviour should be something like that:
# this does dirty magic, like generating the reverse dependency graph,
# and preparing the setters that invalidate the cached values
#dataflow_class
class Test(object):
def calc_a(self):
return self.b + self.c
def calc_c(self):
return self.d * 2
a = managed_property(calculate=calc_a, depends_on=('b', 'c'))
b = managed_property(default=0)
c = managed_property(calculate=calc_c, depends_on=('d',))
d = managed_property(default=0)
t = Test()
print t.a
# a has not been initialized, so it calls calc_a
# gets b value
# c has not been initialized, so it calls calc_c
# c value is calculated and stored in t.__c
# a value is calculated and stored in t.__a
t.b = 1
# invalidates the calculated value stored in self.__a
print t.a
# a has been invalidated, so it calls calc_a
# gets b value
# gets c value, from t.__c
# a value is calculated and stored in t.__a
print t.a
# gets value from t.__a
t.d = 2
# invalidates the calculated values stored in t.__a and t.__c
So, is there something like this already available or should I start implementing my own? In the second case, suggestions are welcome :-)

Here, this should do the trick.
The descriptor mechanism (through which the language implements "property") is
more than enough for what you want.
If the code bellow does not work in some corner cases, just write me.
class DependentProperty(object):
def __init__(self, calculate=None, default=None, depends_on=()):
# "name" and "dependence_tree" properties are attributes
# set up by the metaclass of the owner class
if calculate:
self.calculate = calculate
else:
self.default = default
self.depends_on = set(depends_on)
def __get__(self, instance, owner):
if hasattr(self, "default"):
return self.default
if not hasattr(instance, "_" + self.name):
setattr(instance, "_" + self.name,
self.calculate(instance, getattr(instance, "_" + self.name + "_last_value")))
return getattr(instance, "_" + self.name)
def __set__(self, instance, value):
setattr(instance, "_" + self.name + "_last_value", value)
setattr(instance, "_" + self.name, self.calculate(instance, value))
for attr in self.dependence_tree[self.name]:
delattr(instance, attr)
def __delete__(self, instance):
try:
delattr(instance, "_" + self.name)
except AttributeError:
pass
def assemble_tree(name, dict_, all_deps = None):
if all_deps is None:
all_deps = set()
for dependance in dict_[name].depends_on:
all_deps.add(dependance)
assemble_tree(dependance, dict_, all_deps)
return all_deps
def invert_tree(tree):
new_tree = {}
for key, val in tree.items():
for dependence in val:
if dependence not in new_tree:
new_tree[dependence] = set()
new_tree[dependence].add(key)
return new_tree
class DependenceMeta(type):
def __new__(cls, name, bases, dict_):
dependence_tree = {}
properties = []
for key, val in dict_.items():
if not isinstance(val, DependentProperty):
continue
val.name = key
val.dependence_tree = dependence_tree
dependence_tree[key] = set()
properties.append(val)
inverted_tree = {}
for property in properties:
inverted_tree[property.name] = assemble_tree(property.name, dict_)
dependence_tree.update(invert_tree(inverted_tree))
return type.__new__(cls, name, bases, dict_)
if __name__ == "__main__":
# Example and visual test:
class Bla:
__metaclass__ = DependenceMeta
def calc_b(self, x):
print "Calculating b"
return x + self.a
def calc_c(self, x):
print "Calculating c"
return x + self.b
a = DependentProperty(default=10)
b = DependentProperty(depends_on=("a",), calculate=calc_b)
c = DependentProperty(depends_on=("b",), calculate=calc_c)
bla = Bla()
bla.b = 5
bla.c = 10
print bla.a, bla.b, bla.c
bla.b = 10
print bla.b
print bla.c

I would like to have something like Makefile rules
then use one! You may consider this model:
one rule = one python file
one result = one *.data file
the pipe is implemented as a makefile or with another dependency analysis tool (cmake, scons)
The hardware test team in our company use such a framework for intensive exploratory tests:
you can integrate other languages and tools easily
you get a stable and proven solution
computations may be distributed one multiple cpu/computers
you track dependencies on values and rules
debug of intermediate values is easy
the (big) downside to this method is that you have to give up python import keyword because it creates an implicit (and untracked) dependency (there are workarounds for this).

import collections
sentinel=object()
class ManagedProperty(object):
'''
If deptree = {'a':set('b','c')}, then ManagedProperties `b` and
`c` will be reset whenever `a` is modified.
'''
def __init__(self,property_name,calculate=None,depends_on=tuple(),
default=sentinel):
self.property_name=property_name
self.private_name='_'+property_name
self.calculate=calculate
self.depends_on=depends_on
self.default=default
def __get__(self,obj,objtype):
if obj is None:
# Allows getattr(cls,mprop) to return the ManagedProperty instance
return self
try:
return getattr(obj,self.private_name)
except AttributeError:
result=(getattr(obj,self.calculate)()
if self.default is sentinel else self.default)
setattr(obj,self.private_name,result)
return result
def __set__(self,obj,value):
# obj._dependencies is defined by #register
map(obj.__delattr__,getattr(obj,'_dependencies').get(self.property_name,tuple()))
setattr(obj,self.private_name,value)
def __delete__(self,obj):
if hasattr(obj,self.private_name):
delattr(obj,self.private_name)
def register(*mproperties):
def flatten_dependencies(name, deptree, all_deps=None):
'''
A deptree such as {'c': set(['a']), 'd': set(['c'])} means
'a' depends on 'c' and 'c' depends on 'd'.
Given such a deptree, flatten_dependencies('d', deptree) returns the set
of all property_names that depend on 'd' (i.e. set(['a','c']) in the
above case).
'''
if all_deps is None:
all_deps = set()
for dep in deptree.get(name,tuple()):
all_deps.add(dep)
flatten_dependencies(dep, deptree, all_deps)
return all_deps
def classdecorator(cls):
deptree=collections.defaultdict(set)
for mprop in mproperties:
setattr(cls,mprop.property_name,mprop)
# Find all ManagedProperties in dir(cls). Note that some of these may be
# inherited from bases of cls; they may not be listed in mproperties.
# Doing it this way allows ManagedProperties to be overridden by subclasses.
for propname in dir(cls):
mprop=getattr(cls,propname)
if not isinstance(mprop,ManagedProperty):
continue
for underlying_prop in mprop.depends_on:
deptree[underlying_prop].add(mprop.property_name)
# Flatten the dependency tree so no recursion is necessary. If one were
# to use recursion instead, then a naive algorithm would make duplicate
# calls to __delete__. By flattening the tree, there are no duplicate
# calls to __delete__.
dependencies={key:flatten_dependencies(key,deptree)
for key in deptree.keys()}
setattr(cls,'_dependencies',dependencies)
return cls
return classdecorator
These are the unit tests I used to verify its behavior.
if __name__ == "__main__":
import unittest
import sys
def count(meth):
def wrapper(self,*args):
countname=meth.func_name+'_count'
setattr(self,countname,getattr(self,countname,0)+1)
return meth(self,*args)
return wrapper
class Test(unittest.TestCase):
def setUp(self):
#register(
ManagedProperty('d',default=0),
ManagedProperty('b',default=0),
ManagedProperty('c',calculate='calc_c',depends_on=('d',)),
ManagedProperty('a',calculate='calc_a',depends_on=('b','c')))
class Foo(object):
#count
def calc_a(self):
return self.b + self.c
#count
def calc_c(self):
return self.d * 2
#register(ManagedProperty('c',calculate='calc_c',depends_on=('b',)),
ManagedProperty('a',calculate='calc_a',depends_on=('b','c')))
class Bar(Foo):
#count
def calc_c(self):
return self.b * 3
self.Foo=Foo
self.Bar=Bar
self.foo=Foo()
self.foo2=Foo()
self.bar=Bar()
def test_two_instances(self):
self.foo.b = 1
self.assertEqual(self.foo.a,1)
self.assertEqual(self.foo.b,1)
self.assertEqual(self.foo.c,0)
self.assertEqual(self.foo.d,0)
self.assertEqual(self.foo2.a,0)
self.assertEqual(self.foo2.b,0)
self.assertEqual(self.foo2.c,0)
self.assertEqual(self.foo2.d,0)
def test_initialization(self):
self.assertEqual(self.foo.a,0)
self.assertEqual(self.foo.calc_a_count,1)
self.assertEqual(self.foo.a,0)
self.assertEqual(self.foo.calc_a_count,1)
self.assertEqual(self.foo.b,0)
self.assertEqual(self.foo.c,0)
self.assertEqual(self.foo.d,0)
self.assertEqual(self.bar.a,0)
self.assertEqual(self.bar.b,0)
self.assertEqual(self.bar.c,0)
self.assertEqual(self.bar.d,0)
def test_dependence(self):
self.assertEqual(self.Foo._dependencies,
{'c': set(['a']), 'b': set(['a']), 'd': set(['a', 'c'])})
self.assertEqual(self.Bar._dependencies,
{'c': set(['a']), 'b': set(['a', 'c'])})
def test_setting_property_updates_dependent(self):
self.assertEqual(self.foo.a,0)
self.assertEqual(self.foo.calc_a_count,1)
self.foo.b = 1
# invalidates the calculated value stored in foo.a
self.assertEqual(self.foo.a,1)
self.assertEqual(self.foo.calc_a_count,2)
self.assertEqual(self.foo.b,1)
self.assertEqual(self.foo.c,0)
self.assertEqual(self.foo.d,0)
self.foo.d = 2
# invalidates the calculated values stored in foo.a and foo.c
self.assertEqual(self.foo.a,5)
self.assertEqual(self.foo.calc_a_count,3)
self.assertEqual(self.foo.b,1)
self.assertEqual(self.foo.c,4)
self.assertEqual(self.foo.d,2)
self.assertEqual(self.bar.a,0)
self.assertEqual(self.bar.calc_a_count,1)
self.assertEqual(self.bar.b,0)
self.assertEqual(self.bar.c,0)
self.assertEqual(self.bar.calc_c_count,1)
self.assertEqual(self.bar.d,0)
self.bar.b = 2
self.assertEqual(self.bar.a,8)
self.assertEqual(self.bar.calc_a_count,2)
self.assertEqual(self.bar.b,2)
self.assertEqual(self.bar.c,6)
self.assertEqual(self.bar.calc_c_count,2)
self.assertEqual(self.bar.d,0)
self.bar.d = 2
self.assertEqual(self.bar.a,8)
self.assertEqual(self.bar.calc_a_count,2)
self.assertEqual(self.bar.b,2)
self.assertEqual(self.bar.c,6)
self.assertEqual(self.bar.calc_c_count,2)
self.assertEqual(self.bar.d,2)
sys.argv.insert(1,'--verbose')
unittest.main(argv=sys.argv)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: ambiguities when (un-)pickling a dict-based tree - python

If this is still not working (see Aya's comment), is using another tree structure an option for you? If so, here is a script to pickle DOM tree structure.

Related

pickle, dill and cloudpickle returning field as empty dict on custom class after process termination

How to convert an object back into the code used to create it?

How to get a list of all non imported names in a Python module?

How to extract functions used in a python code file?

Lazy data-flow (spreadsheet like) properties with dependencies in Python

Categories

Resources