Deferring a computation in custom object until data available

I have a custom class like the below. The idea, as the naming suggests, is that I want to evaluate a token stream in a parser-type tool. Once a bunch of constructs have been parsed out and put into data structures, certain sequences of tokens will evaluate to an int, but when the data structures aren't available yet, a function just returns None instead. This single complex data structure 'constructs' gets passed around pretty much everywhere in the program.
class Toks(list):
    def __init__(self, seq, constructs=Constructs()):
        list.__init__(self, seq)
        self.constructs = constructs

    @property
    def as_value(self):
        val = tokens_as_value(self, self.constructs)
        return val if val is not None else self
At points in the code, I want to assign this maybe-computable value to a name, e.g.:
mything.val = Toks(tokens[start:end], constructs).as_value
Well, this gives mything.val either an actual int value or a funny thing that allows us to compute a value later. But this requires a later pass to actually perform the computation, similar to:
if not isinstance(mything.val, int):
    mything.val = mything.val.as_value
As it happens, I can do this in my program. However, what I'd really like is to avoid the second pass altogether, and have access to the property perform the computation and give the computed value if it's computable at that point (and perhaps evaluate to some sentinel if it isn't).
Any ideas?
To clarify: Depending on the case I get "value" differently; actual code is more like:
if tok.type == 'NUMBER':
    mything.val = tok.value  # A simple integer value
else:
    mything.val = Toks(tokens[start:end], constructs).as_value
There are additional cases: sometimes I know the actual value early, and sometimes I'm not sure whether I'll only know it later.
I realize I can defer calling (a bit more compactly than @dana suggests) with:
return val if val is not None else lambda: self.as_value
However, that makes later access inconsistent between mything.val and mything.val(), so I'd still have to guard it with an if to see which style to use. It's the same inconvenience whether I need to fall back to mything.val.as_value or to mything.val() after the type check.

You could easily do something like:
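import time

class NaiveLazy(object):
    def __init__(self, ctor):
        self.ctor = ctor
        self._value = None

    @property
    def value(self):
        if self._value is None:
            self._value = self.ctor()  # call the stored constructor
        return self._value

# time.sleep() returns None, so `or` is what makes the lambda yield 10
mything = NaiveLazy(lambda: time.sleep(5) or 10)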
And then always use mything.value (example to demonstrate evaluation):
print mything.value # will wait 5 seconds and print 10
print mything.value # will print 10
I've seen some utility libraries create a special object for undefined in case ctor returns None. If you eventually want to extend your code beyond ints, you should think about that:
class Undefined(object):
    pass

UNDEFINED = Undefined()
# ...
        self._value = UNDEFINED
# ...
        if self._value is UNDEFINED:
            self._value = self.ctor()
For your example specifically:
def toks_ctor(seq, constructs=Constructs()):
    return lambda l=list(seq): tokens_as_value(l, constructs) or UNDEFINED

mything = NaiveLazy(toks_ctor(tokens[start:end], constructs))
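To tie the two ideas above together, here is a minimal sketch of a wrapper that keeps returning a sentinel until the computation succeeds and caches the first real result. The names LazyValue, UNSET and the dict standing in for constructs are made up for illustration; they are not part of the original code.

```python
class _Unset(object):
    """Marker type: distinguishes 'not computed yet' from a real result."""
    def __repr__(self):
        return "UNSET"

UNSET = _Unset()

class LazyValue(object):
    """Hypothetical wrapper: retries `compute` on each access until it succeeds."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument callable; returns None if not ready
        self._value = UNSET

    @property
    def value(self):
        if self._value is UNSET:
            result = self._compute()
            if result is None:    # data not available yet: report the sentinel
                return UNSET
            self._value = result  # cache the first real result
        return self._value

# Stand-in for the asker's `constructs`; filled in later by the parser.
constructs = {}
lazy = LazyValue(lambda: constructs.get("width"))

first = lazy.value        # UNSET: nothing parsed yet
constructs["width"] = 8
second = lazy.value       # 8: computed on this access and cached
constructs["width"] = 99
third = lazy.value        # still 8: the cached value is reused
```

Every access goes through .value, so there is no second pass and no type check at the call site; the caller only has to decide what to do with UNSET.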

If you're using Python 3.2+, consider a Future object from the concurrent.futures module. This tool lets you run any number of calculations in the background. You can wait for a single future to be completed and use its value, or you can "stream" the results one at a time as they're completed.
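A small sketch of both usages with the standard concurrent.futures module follows. The square function is a made-up stand-in for an expensive calculation, and note that a Future only covers work that can already be started; it does not by itself wait for missing input data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    # Hypothetical stand-in for an expensive calculation.
    return n * n

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future immediately; the work runs in a worker thread.
    single = pool.submit(square, 7)
    single_result = single.result()   # blocks until this one future is done

    # as_completed() yields futures in the order they finish, so the
    # results are sorted here only to make the outcome deterministic.
    futures = [pool.submit(square, n) for n in (1, 2, 3)]
    streamed = sorted(f.result() for f in as_completed(futures))
```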

You could return a callable object from as_value, which would let you check for the real return value automatically. The one drawback is that you'd need to use mything.val() instead of mything.val:
def tokens_as_value(toks, constructs):
    if constructs.constructed:
        return "some value"
    else:
        return None

class Constructs(object):
    def __init__(self):
        self.constructed = False

class Toks(list):
    def __init__(self, seq, constructs=Constructs()):
        list.__init__(self, seq)
        self.constructs = constructs

    @property
    def as_value(self):
        return FutureVal(tokens_as_value, self, self.constructs)

class FutureVal(object):
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self._val = None
        self.args = args
        self.kwargs = kwargs

    def __call__(self):
        if self._val is None:
            self._val = self.func(*self.args, **self.kwargs)
        return self._val
Just for the purposes of the example, Constructs just contains a boolean that indicates whether or not a real value should be returned from tokens_as_value.
Usage:
>>> t = test.Toks([])
>>> z = t.as_value
>>> z
<test.FutureVal object at 0x7f7292c96150>
>>> print(z())
None
>>> t.constructs.constructed = True
>>> print(z())
some value

Related

How can I return self and another variable in a python class method while method chaining?

I understand what I am asking here is probably not the best code design, but the reason for me asking is strictly academic. I am trying to understand how to make this concept work.
Typically, I will return self from a class method so that the following methods can be chained together. My understanding is by returning self, I am simply returning an instance of the class, for the following methods to work on.
But in this case, I am trying to figure out how to return both self and another value from the method. The idea is if I do not want to chain, or I do not call any class attributes, I want to retrieve the data from the method being called.
Consider this example:
class Test(object):
    def __init__(self):
        self.hold = None

    def methoda(self):
        self.hold = 'lol'
        return self, 'lol'

    def newmethod(self):
        self.hold = self.hold * 2
        return self, 2

t = Test()
t.methoda().newmethod()
print(t.hold)
In this case, I will get an AttributeError: 'tuple' object has no attribute 'newmethod' which is to be expected because the methoda method is returning a tuple which does not have any methods or attributes called newmethod.
My question is not about unpacking multiple returns, but more about how can I continue to chain methods when the preceding methods are returning multiple values. I also understand that I can control the methods return with an argument to it, but that is not what I am trying to do.
As mentioned previously, I do realize this is probably a bad question, and I am happy to delete the post if the question doesn't make any sense.

Following the suggestion by @JohnColeman, you can return a special tuple with attribute lookup delegated to your object if it is not a normal tuple attribute. That way it acts like a normal tuple except when you are chaining methods.
You can implement this as follows:
class ChainResult(tuple):
    def __new__(cls, *args):
        return super(ChainResult, cls).__new__(cls, args)

    def __getattribute__(self, name):
        try:
            return getattr(super(), name)
        except AttributeError:
            return getattr(super().__getitem__(0), name)

class Test(object):
    def __init__(self):
        self.hold = None

    def methoda(self):
        self.hold = 'lol'
        return ChainResult(self, 'lol')

    def newmethod(self):
        self.hold = self.hold * 2
        return ChainResult(self, 2)
Testing:
>>> t = Test()
>>> t.methoda().newmethod()
>>> print(t.hold)
lollol
The returned result does indeed act as a tuple:
>>> t, res = t.methoda().newmethod()
>>> print(res)
2
>>> print(isinstance(t.methoda().newmethod(), tuple))
True
You could imagine all sorts of semantics with this, such as forwarding the returned values to the next method in the chain using closure:
class ChainResult(tuple):
    def __new__(cls, *args):
        return super(ChainResult, cls).__new__(cls, args)

    def __getattribute__(self, name):
        try:
            return getattr(super(), name)
        except AttributeError:
            attr = getattr(super().__getitem__(0), name)
            if callable(attr):
                chain_results = super().__getitem__(slice(1, None))
                return lambda *args, **kw: attr(*(chain_results+args), **kw)
            else:
                return attr
For example,
class Test:
    ...
    def methodb(self, *args):
        print(*args)
would produce
>>> t = Test()
>>> t.methoda().methodb('catz')
lol catz
It would be nice if you could make ChainResult invisible. You can almost do it by initializing the tuple base class with the normal results and saving your object in a separate attribute used only for chaining. Then use a class decorator that wraps every method with ChainResult(self, self.method(*args, **kw)). This works for methods that return a tuple, but a single return value will act like a length-1 tuple, so you will need something like obj.method()[0] or result, = obj.method() to work with it. I played a bit with delegating to tuple for a multiple return or to the value itself for a single return; maybe it could be made to work, but it introduces so many ambiguities that I doubt it could work well.

How is lazy evaluation implemented (in ORMs for example)

I'm curious to know how lazy evaluation is implemented at higher levels, i.e. in libraries, etc. For example, how does the Django ORM or ActiveRecord defer evaluation of a query until it is actually used?

Let's have a look at some methods for django's django.db.models.query.QuerySet class:
class QuerySet(object):
    """
    Represents a lazy database lookup for a set of objects.
    """
    def __init__(self, model=None, query=None, using=None):
        ...
        self._result_cache = None
        ...

    def __len__(self):
        if self._result_cache is None:
            ...
        elif self._iter:
            ...
        return len(self._result_cache)

    def __iter__(self):
        if self._result_cache is None:
            ...
        if self._iter:
            ...
        return iter(self._result_cache)

    def __nonzero__(self):
        if self._result_cache is not None:
            ...

    def __contains__(self, val):
        if self._result_cache is not None:
            ...
        else:
            ...
        ...

    def __getitem__(self, k):
        ...
        if self._result_cache is not None:
            ...
        ...
The pattern that these methods follow is that no queries are executed until some method that really needs to return some result is called. At that point, the result is stored in self._result_cache and any subsequent call to the same method returns the cached value.
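The pattern can be shown in a few lines. This is only a sketch of the idea, not Django's actual code: LazyQuery and fake_fetch are made-up names, and the "database" is a function that records how often it was called.

```python
class LazyQuery(object):
    """Hypothetical stand-in for a lazy query; not Django's implementation."""

    def __init__(self, fetch):
        self._fetch = fetch          # zero-argument callable that "hits the database"
        self._result_cache = None    # None means "not evaluated yet"

    def _fill_cache(self):
        if self._result_cache is None:
            self._result_cache = list(self._fetch())

    def __iter__(self):
        self._fill_cache()
        return iter(self._result_cache)

    def __len__(self):
        self._fill_cache()
        return len(self._result_cache)

calls = []

def fake_fetch():
    calls.append("query")            # record each simulated database hit
    return ["alice", "bob"]

q = LazyQuery(fake_fetch)
calls_before = len(calls)            # 0: building the object runs nothing
names = list(q)                      # first use triggers the fetch
count = len(q)                       # served from the cache
```

After both uses, calls contains a single entry, which is the whole point: construction is free and evaluation happens at most once.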

In Python, an object may "exist" while its intrinsic value becomes known to the outside world only at the moment it is used with one of the operators. Operators are defined in the class by the magic double-underscore method names, so a class can simply run its deferred code when one of those methods is called.
That means that if the object's value is to be used like a string, for example, any part of the program that uses the object will at some point call its "__str__" coercion method.
For example, let's create an object that behaves like a string, but tells the current time. Strings can be concatenated to other strings (__add__), can have their length requested (__len__), and so on. If we want it to fit perfectly in the place of a string, we'd have to override all of those methods. The idea is to retrieve the actual value only when one of the operators is called; until then, the object can freely be assigned to variables and passed around. It will only be evaluated when its value is needed.
Then, one can have some code like this:
import time

class timestr(object):
    def __init__(self):
        self.value = None

    def __str__(self):
        self._getvalue()
        return self.value

    def __len__(self):
        self._getvalue()
        return len(self.value)

    def __add__(self, other):
        self._getvalue()
        return self.value + other

    def _getvalue(self):
        timet = time.localtime()
        self.value = " %s:%s:%s " % (timet.tm_hour, timet.tm_min, timet.tm_sec)
And using it on the console, you may have:
>>> a = timestr()
>>> b = timestr()
>>> print b
17:16:22
>>> print a
17:16:25
If the value you want evaluated lazily is an attribute of your object (like Person.name) rather than what the object itself behaves like, it is even easier. Python allows any class attribute to be of a special kind, called a descriptor, which has a method that is called each time the attribute is accessed. So one just has to create a class with a suitable method named __get__ to fetch the actual value. This method is called only when the attribute is needed.
Python even has a utility for easy descriptor creation, the built-in "property", which makes this even easier: you pass it the method that generates the attribute value as its first parameter.
So, having an Event class with a lazily (and live) evaluated time is just a matter of writing:
import time

class Event(object):
    @property
    def time(self):
        timet = time.localtime()
        return " %s:%s:%s " % (timet.tm_hour, timet.tm_min, timet.tm_sec)
And use it as in:
>>> e= Event()
>>> e.time
' 17:25:8 '
>>> e.time
' 17:25:10 '
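The property above recomputes on every access. For a value that should be computed once, a hand-written descriptor with __get__ can cache the result on the instance. This is a sketch under my own naming (LazyAttribute, Person and the computed counter are illustrative only):

```python
class LazyAttribute(object):
    """Hypothetical non-data descriptor: computes once, then caches on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:               # accessed on the class itself
            return self
        value = self.func(obj)
        # Store the result in the instance dict under the same name. Since this
        # descriptor defines no __set__, the instance entry shadows it afterwards.
        obj.__dict__[self.name] = value
        return value

class Person(object):
    computed = 0                      # counts how often `name` is computed

    @LazyAttribute
    def name(self):
        Person.computed += 1
        return "Ada"

p = Person()
first = p.name     # runs the function
second = p.name    # read straight from p.__dict__
```

The trick relies on lookup order: an instance attribute takes precedence over a descriptor that only defines __get__, so __get__ is not called again after the first access.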

The mechanism is quite simple:
class Lazy:
    def __init__(self, evaluate):
        self.evaluate = evaluate
        self.computed = False

    def getresult(self):
        if not self.computed:
            self.result = self.evaluate()
            self.computed = True
        return self.result
Then, this utility can be used as:
def some_computation(a, b):
    return ...

# bind the computation to its operands, but don't evaluate it yet.
lazy = Lazy(lambda: some_computation(1, 2))
# "some_computation()" is evaluated now.
print lazy.getresult()
# use the cached result again without re-computing.
print lazy.getresult()
This implementation uses callables to represent the computation, but there are many variations on this theme (e.g. a base class that requires you to implement an evaluate() method, etc.).
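The base-class variation mentioned above could look roughly like this. LazyBase and LazySum are invented names for the sketch; the evaluations counter exists only to show that evaluate() runs once.

```python
class LazyBase(object):
    """Hypothetical base class: subclasses supply evaluate()."""

    _computed = False

    def evaluate(self):
        raise NotImplementedError("subclasses must implement evaluate()")

    def getresult(self):
        if not self._computed:
            self._result = self.evaluate()
            self._computed = True
        return self._result

class LazySum(LazyBase):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.evaluations = 0

    def evaluate(self):
        self.evaluations += 1
        return self.a + self.b

lazy_sum = LazySum(1, 2)
r1 = lazy_sum.getresult()   # evaluates now
r2 = lazy_sum.getresult()   # cached
```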

I'm not sure which library you're talking about, but from an algorithmic standpoint I've always used/understood it as follows (pseudocode from a Python novice):
class Object(object):
    # ... Other stuff ...
    _actual_property = None

    def interface(self):
        if self._actual_property is None:
            # Execute query and load up _actual_property
            pass
        return self._actual_property
Essentially because the interface and implementation are separated, you can define behaviors to execute upon request.

String construction using OOP and Proxy pattern

I find the way SQLAlchemy constructs query strings very interesting, e.g.:
(Session.query(model.User)
    .filter(model.User.age > 18)
    .order_by(model.User.age)
    .all())
As far as I can see, some kind of Proxy pattern is applied there. In my small project I need to do similar string construction using an OOP approach, so I tried to reproduce this behavior.
First, some kind of object, one of many similar objects:
class SomeObject(object):
    items = None

    def __init__(self):
        self.items = []

    def __call__(self):
        return ' '.join(self.items) if self.items is not None else ''

    def a(self):
        self.items.append('a')
        return self

    def b(self):
        self.items.append('b')
        return self
All methods of this object return self, so I can call them in any order and an unlimited number of times.
Second, a proxy object that calls the subject's methods unless the method is perform, which calls the object to get the resulting string.
import operator

class Proxy(object):
    def __init__(self, some_object):
        self.some_object = some_object

    def __getattr__(self, name):
        self.method = operator.methodcaller(name)
        return self

    def __call__(self, *args, **kw):
        self.some_object = self.method(self.some_object, *args, **kw)
        return self

    def perform(self):
        return self.some_object()
And finally:
>>> obj = SomeObject()
>>> p = Proxy(obj)
>>> print p.a().a().b().perform()
a a b
What do you think of this implementation? Is there a better way to design the classes so that strings can be constructed with the same syntax?
PS: Sorry for my english, it's not my primary language.

Actually, what you are looking at is not the proxy pattern but the builder pattern, and yes, your implementation is IMHO the classic one (using the fluent interface pattern).

I don't know what SQLAlchemy does, but I would implement the interface by having the Session.query() method return a Query object with methods like filter(), order_by(), all() etc. Each of these methods simply returns a new Query object taking into account the applied changes. This allows for method chaining as in your first example.
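Your own code example has numerous problems. One example: the proxy remembers only the most recently looked-up method name, so earlier lookups are overwritten: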
obj = SomeObject()
p = Proxy(obj)
a = p.a
b = p.b
print a().perform() # prints b
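The approach described at the start of this answer, where each method returns a new object, avoids that shared-state problem. The sketch below uses a made-up Query class that builds a plain SQL-like string; it does not reproduce what SQLAlchemy really does.

```python
class Query(object):
    """Hypothetical builder; it does not reproduce SQLAlchemy's behavior."""

    def __init__(self, table, filters=(), order=None):
        self.table = table
        self.filters = tuple(filters)
        self.order = order

    def filter(self, condition):
        # Return a new Query; the receiver is left untouched.
        return Query(self.table, self.filters + (condition,), self.order)

    def order_by(self, column):
        return Query(self.table, self.filters, column)

    def build(self):
        sql = "SELECT * FROM " + self.table
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.order:
            sql += " ORDER BY " + self.order
        return sql

base = Query("users")
adults = base.filter("age > 18")
sorted_adults = adults.order_by("age")
```

Because no call mutates its receiver, base, adults and sorted_adults can all be kept and reused independently.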

python lazy variables? or, delayed expensive computation

I have a set of arrays that are very large and expensive to compute, and not all will necessarily be needed by my code on any given run. I would like to make their declaration optional, but ideally without having to rewrite my whole code.
Example of how it is now:
x = function_that_generates_huge_array_slowly(0)
y = function_that_generates_huge_array_slowly(1)
Example of what I'd like to do:
x = lambda: function_that_generates_huge_array_slowly(0)
y = lambda: function_that_generates_huge_array_slowly(1)
z = x * 5 # this doesn't work because lambda is a function
# is there something that would make this line behave like
# z = x() * 5?
g = x * 6
While using lambda as above achieves one of the desired effects - computation of the array is delayed until it is needed - if you use the variable "x" more than once, it has to be computed each time. I'd like to compute it only once.
EDIT:
After some additional searching, it looks like it is possible to do what I want (approximately) with "lazy" attributes in a class (e.g. http://code.activestate.com/recipes/131495-lazy-attributes/). I don't suppose there's any way to do something similar without making a separate class?
EDIT2: I'm trying to implement some of the solutions, but I'm running into an issue because I don't understand the difference between:
class sample(object):
    def __init__(self):
        class one(object):
            def __get__(self, obj, type=None):
                print "computing ..."
                obj.one = 1
                return 1
        self.one = one()
and
class sample(object):
    class one(object):
        def __get__(self, obj, type=None):
            print "computing ... "
            obj.one = 1
            return 1
    one = one()
I think some variation on these is what I'm looking for, since the expensive variables are intended to be part of a class.

The first half of your problem (reusing the value) is easily solved:
class LazyWrapper(object):
    def __init__(self, func):
        self.func = func
        self.value = None

    def __call__(self):
        if self.value is None:
            self.value = self.func()
        return self.value

lazy_wrapper = LazyWrapper(lambda: function_that_generates_huge_array_slowly(0))
But you still have to use it as lazy_wrapper() not lazy_wrapper.
If you're going to be accessing some of the variables many times, it may be faster to use:
class LazyWrapper(object):
    def __init__(self, func):
        self.func = func

    def __call__(self):
        try:
            return self.value
        except AttributeError:
            self.value = self.func()
            return self.value
Which will make the first call slower and subsequent uses faster.
Edit: I see you found a similar solution that requires you to use attributes on a class. Either way requires you rewrite every lazy variable access, so just pick whichever you like.
Edit 2: You can also do:
class YourClass(object):
    def __init__(self, func):
        self.func = func

    @property
    def x(self):
        try:
            return self.value
        except AttributeError:
            self.value = self.func()
            return self.value
If you want to access x as an instance attribute. No additional class is needed. If you don't want to change the class signature (by making it require func), you can hard code the function call into the property.
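On Python 3.8 and later, the standard library's functools.cached_property gives the same compute-once behavior without the try/except. The sketch below assumes that Python version; the Arrays class, the calls counter and the small list standing in for the huge array are illustrative only.

```python
import functools

class Arrays(object):
    def __init__(self):
        self.calls = 0

    @functools.cached_property
    def x(self):
        self.calls += 1
        return [0] * 5                # stand-in for the huge array

arrays = Arrays()
z = len(arrays.x) * 5   # first access computes x
g = len(arrays.x) * 6   # second access reuses the stored value
```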

Writing a class is more robust, but optimizing for simplicity (which I think you are asking for), I came up with the following solution:
cache = {}

def expensive_calc(factor):
    print 'calculating...'
    return [1, 2, 3] * factor

def lookup(name):
    return (cache[name] if name in cache
            else cache.setdefault(name, expensive_calc(2)))

print 'run one'
print lookup('x') * 2
print 'run two'
print lookup('x') * 2

Python 3.2 and greater implement an LRU algorithm in the functools module to handle simple cases of caching/memoization:
import functools

@functools.lru_cache(maxsize=128)  # cache at most 128 items
def f(x):
    print("I'm being called with %r" % x)
    return x + 1

z = f(9) + f(9)**2

You can't make a simple name, like x, to really evaluate lazily. A name is just an entry in a hash table (e.g. in that which locals() or globals() return). Unless you patch access methods of these system tables, you cannot attach execution of your code to simple name resolution.
But you can wrap functions in caching wrappers in different ways.
This is an OO way:
class CachedSlowCalculation(object):
    cache = {}  # our results

    def __init__(self, func):
        self.func = func

    def __call__(self, param):
        # membership test, so falsy results (0, empty lists) are cached too
        if param in self.cache:
            return self.cache[param]
        value = self.func(param)
        self.cache[param] = value
        return value

calc = CachedSlowCalculation(function_that_generates_huge_array_slowly)
z = calc(1) + calc(1)**2  # only calculates things once
This is a classless way:
def cached(func):
    func.__cache = {}  # we can attach attrs to objects, functions are objects
    def wrapped(param):
        cache = func.__cache
        # membership test, so falsy results are served from the cache too
        if param in cache:
            return cache[param]
        value = func(param)
        cache[param] = value
        return value
    return wrapped

@cached
def f(x):
    print "I'm being called with %r" % x
    return x + 1

z = f(9) + f(9)**2  # see f called only once
In the real world you'll add some logic to keep the cache to a reasonable size, possibly using an LRU algorithm.
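A rough sketch of such a size-bounded cache, using collections.OrderedDict to track recency (Python 3.2+ for move_to_end). The cached_lru name and the exposed cache attribute are my own choices for illustration; in practice functools.lru_cache already covers this.

```python
import collections
import functools

def cached_lru(maxsize):
    """Hypothetical decorator factory: keep at most `maxsize` results."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def wrapped(param):
            if param in cache:
                cache.move_to_end(param)      # mark as most recently used
                return cache[param]
            value = func(param)
            cache[param] = value
            if len(cache) > maxsize:
                cache.popitem(last=False)     # evict the least recently used entry
            return value

        wrapped.cache = cache                 # exposed for inspection only
        return wrapped
    return decorator

calls = []

@cached_lru(maxsize=2)
def f(x):
    calls.append(x)
    return x + 1

f(1); f(2); f(1)    # 1 is now the most recently used key
f(3)                # cache is full: evicts 2
f(2)                # recomputed, evicts 1
```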

To me, it seems that the proper solution for your problem is subclassing a dict and using it.
class LazyDict(dict):
    def __init__(self, lazy_variables):
        self.lazy_vars = lazy_variables

    def __getitem__(self, key):
        if key not in self and key in self.lazy_vars:
            self[key] = self.lazy_vars[key]()
        return super().__getitem__(key)

def generate_a():
    print("generate var a lazily..")
    return "<a_large_array>"

# You can add as many variables as you want here
lazy_vars = {'a': generate_a}
lazy = LazyDict(lazy_vars)
# retrieve the variable you need from `lazy`
a = lazy['a']
print("Got a:", a)
And you can actually evaluate a variable lazily if you use exec to run your code. The solution is just using a custom globals.
your_code = "print('inside exec');print(a)"
exec(your_code, lazy)
If you did your_code = open(your_file).read(), you could actually run your code and achieve what you want. But I think the more practical approach would be the former one.

Mapping obj.method({argument:value}) to obj.argument(value)

I don't know if this will make sense, but...
I'm trying to dynamically assign methods to an object.
#translate this
object.key(value)
#into this
object.method({key:value})
To be more specific in my example, I have an object (which I didn't write), let's call it motor, which has some generic methods set, status and a few others. Some take a dictionary as an argument and some take a list. To change the motor's speed, and see the result, I use:
motor.set({'move_at':10})
print motor.status('velocity')
The motor object, then formats this request into a JSON-RPC string, and sends it to an IO daemon. The python motor object doesn't care what the arguments are, it just handles JSON formatting and sockets. The strings move_at and velocity are just two of what might be hundreds of valid arguments.
What I'd like to do is the following instead:
motor.move_at(10)
print motor.velocity()
I'd like to do it in a generic way since I have so many different arguments I can pass. What I don't want to do is this:
# create a new function for every possible argument
def move_at(self, x):
    return self.set({'move_at': x})
def velocity(self):
    return self.status('velocity')
# and a hundred more...
I did some searching on this which suggested the solution lies with lambdas and meta programming, two subjects I haven't been able to get my head around.
UPDATE:
Based on the code from user470379 I've come up with the following...
# This is what I have now....
class Motor(object):
    def set(self, a_dict):
        print "Setting a value", a_dict
    def status(self, a_list):
        print "requesting the status of", a_list
        return 10

# Now to extend it....
class MyMotor(Motor):
    def __getattr__(self, name):
        def special_fn(*value):
            # What we return depends on how many arguments there are.
            if len(value) == 0: return self.status((name))
            if len(value) == 1: return self.set({name: value[0]})
        return special_fn
    def __setattr__(self, attr, value):  # This is based on some other answers
        self.set({attr: value})

x = MyMotor()
x.move_at = 20  # Uses __setattr__
x.move_at(10)  # May remove this style from __getattr__ to simplify code.
print x.velocity()
output:
Setting a value {'move_at': 20}
Setting a value {'move_at': 10}
requesting the status of velocity
10
Thank you to everyone who helped!

What about creating your own __getattr__ for the class that returns a function created on the fly? IIRC, there's some tricky cases to watch out for between __getattr__ and __getattribute__ that I don't recall off the top of my head, I'm sure someone will post a comment to remind me:
def __getattr__(self, name):
    def set_fn(value):  # closes over self and name from __getattr__
        return self.set({name: value})
    return set_fn
Then what should happen is that calling an attribute that doesn't exist (ie: move_at) will call the __getattr__ function and create a new function that will be returned (set_fn above). The name variable of that function will be bound to the name parameter passed into __getattr__ ("move_at" in this case). Then that new function will be called with the arguments you passed (10 in this case).
Edit
A more concise version using lambdas (untested):
def __getattr__(self, name):
    return lambda value: self.set({name: value})
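Here is a self-contained sketch of the idea that can be run without the real motor. FakeMotor and DynamicMotor are made-up names; FakeMotor just records what would have been sent. It also illustrates one of the tricky cases alluded to above: __getattr__ is called only when normal lookup fails, whereas __getattribute__ would intercept every access.

```python
class FakeMotor(object):
    """Hypothetical stand-in for the motor object; it records the requests."""

    def __init__(self):
        self.sent = []

    def set(self, a_dict):
        self.sent.append(("set", a_dict))

    def status(self, key):
        self.sent.append(("status", key))
        return 10

class DynamicMotor(FakeMotor):
    def __getattr__(self, name):
        # Reached only when normal lookup fails, so set(), status() and
        # self.sent never pass through here.
        if name.startswith("_"):
            raise AttributeError(name)   # keep special/private lookups sane

        def call(*args):
            if args:
                return self.set({name: args[0]})
            return self.status(name)
        return call

motor = DynamicMotor()
motor.move_at(10)
velocity = motor.velocity()
```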

There are a lot of different potential answers to this, but many of them will probably involve subclassing the object and/or writing or overriding the __getattr__ function.
Essentially, the __getattr__ function is called whenever python can't find an attribute in the usual way.
Assuming you can subclass your object, here's a simple example of what you might do (it's a bit clumsy but it's a start):
class foo(object):
    def __init__(self):
        print "initting " + repr(self)
        self.a = 5

    def meth(self):
        print self.a

class newfoo(foo):
    def __init__(self):
        super(newfoo, self).__init__()
        def meth2():  # Or, use a lambda: ...
            print "meth2: " + str(self.a)  # but you don't have to
        self.methdict = {"meth2": meth2}

    def __getattr__(self, name):
        return self.methdict[name]

f = foo()
g = newfoo()
f.meth()
g.meth()
g.meth2()
Output:
initting <__main__.foo object at 0xb7701e4c>
initting <__main__.newfoo object at 0xb7701e8c>
5
5
meth2: 5

You seem to have certain "properties" of your object that can be set by
obj.set({"name": value})
and queried by
obj.status("name")
A common way to go in Python is to map this behaviour to what looks like simple attribute access. So we write
obj.name = value
to set the property, and we simply use
obj.name
to query it. This can easily be implemented using the __getattr__() and __setattr__() special methods:
class MyMotor(Motor):
    def __init__(self, *args, **kw):
        self._init_flag = True
        Motor.__init__(self, *args, **kw)
        self._init_flag = False

    def __getattr__(self, name):
        return self.status(name)

    def __setattr__(self, name, value):
        # Look in the instance dict directly: hasattr() would go through
        # __getattr__ and therefore report every name as present.
        if self.__dict__.get("_init_flag", True) or name in self.__dict__:
            return Motor.__setattr__(self, name, value)
        return self.set({name: value})
Note that this code disallows the dynamic creation of new "real" attributes of Motor instances after the initialisation. If this is needed, corresponding exceptions could be added to the __setattr__() implementation.
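One possible form of such an exception is to treat names with a leading underscore as ordinary Python attributes and forward everything else. The sketch is standalone: RecordingMotor is a made-up stand-in for Motor that logs what would be sent, and the underscore convention is my assumption, not part of the answer.

```python
class RecordingMotor(object):
    """Hypothetical stand-in for Motor: logs each set() request."""
    def set(self, a_dict):
        # Write through __dict__ so the log itself bypasses __setattr__.
        self.__dict__.setdefault("_log", []).append(a_dict)

class PrivateAwareMotor(RecordingMotor):
    def __setattr__(self, name, value):
        if name.startswith("_"):
            # Names with a leading underscore stay ordinary Python attributes.
            object.__setattr__(self, name, value)
        else:
            self.set({name: value})

m = PrivateAwareMotor()
m._note = "local only"   # stored on the instance
m.move_at = 10           # forwarded to set()
```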

Instead of setting with function-call syntax, consider using assignment (with =). Similarly, just use attribute syntax to get a value, instead of function-call syntax. Then you can use __getattr__ and __setattr__:
class OtherType(object):  # this is the one you didn't write
    # dummy implementations for the example:
    def set(self, D):
        print "setting", D
    def status(self, key):
        return "<value of %s>" % key

class Blah(object):
    def __init__(self, parent):
        object.__setattr__(self, "_parent", parent)
    def __getattr__(self, attr):
        return self._parent.status(attr)
    def __setattr__(self, attr, value):
        self._parent.set({attr: value})

obj = Blah(OtherType())
obj.velocity = 42  # prints setting {'velocity': 42}
print obj.velocity  # prints <value of velocity>
