What is lazy evaluation in Python?
One website said:
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python 2.x) is known as a lazy iterable. Instead of storing the entire list [0, 1, 2, ..., 9] in memory, it stores only the start, stop, and step, and computes each value when it is asked for (AKA lazy evaluation).
Generators work the same way: a generator lets you produce a list-like sequence of values, but here are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need, a generator can only be iterated over exactly once.
A list can get elements by index, a generator cannot -- it only generates values once, from start to end.
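A quick demonstration of those differences (not from the original answer; any Python 3 shell will do):
nums = [x * x for x in range(3)]   # a list: built up front, reusable, indexable
gen = (x * x for x in range(3))    # a generator: nothing computed yet
print(nums[1])     # 1  -- lists support indexing
print(list(gen))   # [0, 1, 4]
print(list(gen))   # []  -- the generator is exhausted after one pass
# gen[1] would raise TypeError: 'generator' object is not subscriptable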
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list comprehension: all 5000000 x/2 values are created immediately, note the []
lis = [x/2 for x in range(5000000)]
# this is a generator expression: each x/2 value is created only when it is needed, note the ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator: it runs until a yield occurs and hands back that value;
# on the next request it picks up where it left off and runs until the next yield...
def divby2(n):
    num = 0
    while num < n:
        yield num / 2
        num += 1

# divby2(5000000) is equivalent to (x/2 for x in range(5000000))
print(divby2(5000000))  # prints the generator object itself, not its values
Note: Even though range(5000000) is a lazy range object in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and generates x one at a time, but the entire list of x/2 values is computed when this list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range returns a list - this means that if you give it a large number, it will compute the entire list at the time of creation:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however, you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it will it actually be evaluated - in other words, it only produces the numbers in the range when you actually need them.
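You can see the difference by checking the size of the object and asking for elements (a small experiment; the exact byte count depends on the CPython build):
import sys

big = range(10**12)           # created instantly; no huge list is built
print(sys.getsizeof(big))     # a small constant size (48 bytes on a typical CPython 3.x)
print(big[10**11])            # 100000000000 -- computed on demand, in O(1)
print(10**11 in big)          # True -- membership is computed arithmetically, not by scanning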
A GitHub repo named python patterns and Wikipedia tell us what lazy evaluation is:
it delays the evaluation of an expression until its value is needed, and it avoids repeated evaluations.
By that definition, range in Python 3 is not complete lazy evaluation, because it doesn't avoid repeated evaluation.
A more classic example for lazy evaluation is cached_property:
import functools

class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __get__(self, obj, type_):
        if obj is None:
            return self
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator that converts a function into a lazily evaluated property. The first time the property is accessed, the function is called to compute the result; that cached value is then reused on every later access.
For example:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that reading it all takes about 2 seconds
            return f.read()

log_handler = LogHandler('./sys.log')
# only the first access takes ~2s
print(log_handler.load_log_file)
# the return value is cached on the log_handler object, so this one is instant
print(log_handler.load_log_file)
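For what it's worth, since Python 3.8 the standard library ships an equivalent decorator, functools.cached_property, so you don't have to write your own:
import functools

class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @functools.cached_property
    def log_file(self):
        with open(self.file_path) as f:
            return f.read()   # runs only on the first access; the result is cached on the instance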
To use the proper term, a Python object like range is designed more along the lines of the call-by-need pattern than full lazy evaluation.
Related
I intend to make a while loop inside a defined function. In addition, I want to return a value on every iteration. Yet it doesn't allow me to iterate over the loop.
Here is the plan:
def func(x):
    n = 3
    while n > 0:
        x = x + 1
        return x

print(func(6))
I know the reason for this issue: return exits the loop (and the function).
Yet I insist on using a defined function. Therefore, is there a way to somehow hand back a value on every iteration, given that the loop is inside a defined function?
When you want to return a value and continue the function in the next call at the point where you returned, use yield instead of return.
Technically this produces a so-called generator, which gives you the return values one by one. With next() you can iterate over the values. You can also convert it into a list or some other data structure.
Your original function would look like this:
def foo(n):
    for i in range(n):
        yield i
And to use it:
gen = foo(100)
print(next(gen))
or
gen = foo(100)
l = list(gen)
print(l)
Keep in mind that the generator calculates the results 'on demand', so it does not allocate memory for all of the results up front. When you convert it into a list, all results are calculated and stored in memory, which causes problems for large n.
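If you only need a prefix of the results, you can take it lazily, for example with itertools.islice (a small sketch using the foo above):
from itertools import islice

gen = foo(10**9)              # nothing is computed yet
print(list(islice(gen, 5)))   # [0, 1, 2, 3, 4] -- only five values were ever produced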
Depending on your use case, you may simply use print(x) inside the loop and then return the final value.
If you actually need to return intermediate values to a caller function, you can use yield.
You can create a generator for that, so you could yield values from your generator.
Example:
def func(x):
    n = 3
    while n > 0:
        x = x + 1
        yield x

func_call = func(6)      # create the generator
print(next(func_call))   # 7
print(next(func_call))   # 8
I am reading Hackers and Painters and am confused by a problem mentioned by the author to illustrate the power of different programming languages.
The problem is:
We want to write a function that generates accumulators—a function that takes a number n, and returns a function that takes another number i and returns n incremented by i. (That’s incremented by, not plus. An accumulator has to accumulate.)
The author mentions several solutions with different programming languages. For example, Common Lisp:
(defun foo (n)
  (lambda (i) (incf n i)))
and JavaScript:
function foo(n) { return function (i) { return n += i } }
However, when it comes to Python, the following code does not work:
def foo(n):
    s = n
    def bar(i):
        s += i
        return s
    return bar

f = foo(0)
f(1)  # UnboundLocalError: local variable 's' referenced before assignment
A simple modification will make it work:
def foo(n):
    s = [n]
    def bar(i):
        s[0] += i
        return s[0]
    return bar
I am new to Python. Why does the first solution not work while the second one does? The author mentions lexical variables, but I still don't get it.
s += i is just sugar for s = s + i.*
This means you assign a new value to the variable s (instead of mutating it in place). When you assign to a variable, Python assumes it is local to the function. However, before assigning it needs to evaluate s + i, but s is local and still unassigned -> Error.
In the second case s[0] += i you never assign to s directly, but only ever access an item from s. So Python can clearly see that it is not a local variable and goes looking for it in the outer scope.
Finally, a nicer alternative (in Python 3) is to explicitly tell it that s is not a local variable:
def foo(n):
    s = n
    def bar(i):
        nonlocal s
        s += i
        return s
    return bar
(There is actually no need for s - you could simply use n instead inside bar.)
*The situation is slightly more complex, but the important issue is that computation and assignment are performed in two separate steps.
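A quick check that the nonlocal version really accumulates (not part of the original answer):
f = foo(0)
print(f(1))    # 1
print(f(2))    # 3  -- state is kept between calls
g = foo(10)
print(g(5))    # 15 -- each call to foo creates an independent accumulator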
An infinite generator is one implementation. You can call next() on a generator instance to pull successive results out of it.
def incrementer(n, i):
    while True:
        n += i
        yield n

g = incrementer(2, 5)
print(next(g))   # 7
print(next(g))   # 12
print(next(g))   # 17
If you need a flexible incrementer, one possibility is an object-oriented approach:
class Inc(object):
    def __init__(self, n=0):
        self.n = n

    def incrementer(self, i):
        self.n += i
        return self.n

g = Inc(2)
g.incrementer(5)  # 7
g.incrementer(3)  # 10
g.incrementer(7)  # 17
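If you want the object to be callable exactly like the Lisp and JavaScript closures, you can define __call__ instead of a named method (a small sketch, not from the book):
class Acc(object):
    def __init__(self, n=0):
        self.n = n
    def __call__(self, i):
        self.n += i
        return self.n

f = Acc(0)
print(f(1))   # 1
print(f(2))   # 3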
Strictly speaking, Python is neither call by value nor call by reference in the classic sense: the function receives a reference to the same object, but assigning to a plain variable inside the function only rebinds a local name, so the change is never visible to the caller.
Mutating a mutable object such as a list, on the other hand, changes the object that the outer code also references, so the change is visible outside the function.
And this is the reason the second option works and the first one doesn't: s[0] += i mutates the shared list, while s += i tries to rebind s and trips over Python's local-variable rule.
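A small illustration of the difference (hypothetical names):
def rebind(x):
    x = x + 1        # rebinds the local name only; the caller's variable is untouched

def mutate(lst):
    lst[0] += 1      # mutates the object that the caller also references

n, items = 5, [5]
rebind(n)
mutate(items)
print(n, items)      # 5 [6]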
I have found some nice examples (here, here) of implementing SICP-like streams in Python. But I am still not sure how to handle an example like the integral found in SICP 3.5.3 "Streams as signals."
The Scheme code found there is
(define (integral integrand initial-value dt)
  (define int
    (cons-stream initial-value
                 (add-streams (scale-stream integrand dt)
                              int)))
  int)
What is tricky about this one is that the returned stream int is defined in terms of itself (i.e., the stream int is used in the definition of the stream int).
I believe Python could have something similarly expressive and succinct... but not sure how. So my question is, what is an analogous stream-y construct in Python? (What I mean by a stream is the subject of 3.5 in SICP, but briefly, a construct (like a Python generator) that returns successive elements of a sequence of indefinite length, and can be combined and processed with operations such as add-streams and scale-stream that respect streams' lazy character.)
There are two ways to read your question. The first is simply: How do you use Stream constructs, perhaps the ones from your second link, but with a recursive definition? That can be done, though it is a little clumsy in Python.
In Python you can represent looped data structures but not directly. You can't write:
l = [l]
but you can write:
l = [None]
l[0] = l
Similarly you can't write:
def integral(integrand, initial_value, dt):
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      int_rec))
    return int_rec
but you can write:
def integral(integrand, initial_value, dt):
    placeholder = Stream(initial_value, lambda: None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      placeholder))
    # fix up the recursion: placeholder stands in for int_rec, so its rest is int_rec's rest
    placeholder._compute_rest = lambda: int_rec.rest
    return int_rec
Note that we need to clumsily pre-compute the first element of placeholder and then only fix up the recursion for the rest of the stream. But this does all work (alongside appropriate definitions of all the rest of the code - I'll stick it all at the bottom of this answer).
However, the second part of your question seems to be asking how to do this naturally in Python. You ask for an "analogous stream-y construct in Python". Clearly the answer to that is exactly the generator. The generator naturally provides the lazy evaluation of the stream concept. It differs by not being naturally expressed recursively but then Python does not support that as well as Scheme, as we will see.
In other words, the strict stream concept can be expressed in Python (as in the link and above) but the idiomatic way to do it is to use generators.
It is more or less possible to replicate the Scheme example by a kind of direct mechanical transformation of stream to generator (but avoiding the built-in int):
def integral_rec(integrand, initial_value, dt):
    def int_rec():
        for x in cons_stream(initial_value,
                             add_streams(scale_stream(integrand, dt), int_rec())):
            yield x
    for x in int_rec():
        yield x

def cons_stream(a, b):
    yield a
    for x in b:
        yield x

def add_streams(a, b):
    # zip stops cleanly when either stream is exhausted (avoids StopIteration leaking out on Python 3.7+)
    for x, y in zip(a, b):
        yield x + y

def scale_stream(a, b):
    for x in a:
        yield x * b
The only tricky thing here is to realise that you need to eagerly call the recursive use of int_rec as an argument to add_streams. Calling it doesn't start it yielding values - it just creates the generator ready to yield them lazily when needed.
This works nicely for small integrands, though it's not very pythonic. The Scheme version works by optimising the tail recursion - the Python version will exceed the max stack depth if your integrand is too long. So this is not really appropriate in Python.
A direct and natural pythonic version would look something like this, I think:
def integral(integrand, initial_value, dt):
    value = initial_value
    yield value
    for x in integrand:
        value += dt * x
        yield value
This works efficiently and correctly treats the integrand lazily as a "stream". However, it uses iteration rather than recursion to unpack the integrand iterable, which is more the Python way.
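For example, with a short list as the integrand (a quick check, using the same initial value and dt as the example at the end of this answer):
print(list(integral([1, 2, 3, 4, 5], 8, 0.5)))
# [8, 8.5, 9.5, 11.0, 13.0, 15.5]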
In moving to natural Python I have also removed the stream combination functions - for example, replaced add_streams with +=. But we could still use them if we wanted a sort of halfway house version:
def accum(initial_value, a):
    value = initial_value
    yield value
    for x in a:
        value += x
        yield value

def integral_hybrid(integrand, initial_value, dt):
    for x in accum(initial_value, scale_stream(integrand, dt)):
        yield x
This hybrid version uses the stream combinations from the Scheme and avoids only the tail recursion. This is still pythonic and python includes various other nice ways to work with iterables in the itertools module. They all "respect streams' lazy character" as you ask.
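For instance, itertools.accumulate can play the role of accum directly (the initial keyword needs Python 3.8+; this is just a sketch of the idea, not part of the original answer):
from itertools import accumulate

def integral_accumulate(integrand, initial_value, dt):
    return accumulate((dt * x for x in integrand), initial=initial_value)

print(list(integral_accumulate([1, 2, 3, 4, 5], 8, 0.5)))
# [8, 8.5, 9.5, 11.0, 13.0, 15.5]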
Finally here is all the code for the first recursive stream example, much of it taken from the Berkeley reference:
class Stream(object):
    """A lazily computed recursive list."""
    def __init__(self, first, compute_rest, empty=False):
        self.first = first
        self._compute_rest = compute_rest
        self.empty = empty
        self._rest = None
        self._computed = False

    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        assert not self.empty, 'Empty streams have no rest.'
        if not self._computed:
            self._rest = self._compute_rest()
            self._computed = True
        return self._rest

    def __repr__(self):
        if self.empty:
            return '<empty stream>'
        return 'Stream({0}, <compute_rest>)'.format(repr(self.first))
Stream.empty = Stream(None, None, True)

def cons_stream(a, b):
    return Stream(a, lambda: b)

def add_streams(a, b):
    if a.empty or b.empty:
        return Stream.empty
    def compute_rest():
        return add_streams(a.rest, b.rest)
    return Stream(a.first + b.first, compute_rest)

def scale_stream(a, scale):
    if a.empty:
        return Stream.empty
    def compute_rest():
        return scale_stream(a.rest, scale)
    return Stream(a.first * scale, compute_rest)

def make_integer_stream(first=1):
    def compute_rest():
        return make_integer_stream(first + 1)
    return Stream(first, compute_rest)

def truncate_stream(s, k):
    if s.empty or k == 0:
        return Stream.empty
    def compute_rest():
        return truncate_stream(s.rest, k - 1)
    return Stream(s.first, compute_rest)

def stream_to_list(s):
    r = []
    while not s.empty:
        r.append(s.first)
        s = s.rest
    return r
def integral(integrand, initial_value, dt):
    placeholder = Stream(initial_value, lambda: None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      placeholder))
    # fix up the recursion: placeholder stands in for int_rec, so its rest is int_rec's rest
    placeholder._compute_rest = lambda: int_rec.rest
    return int_rec

a = truncate_stream(make_integer_stream(), 5)
print(stream_to_list(integral(a, 8, .5)))
I would like to perform a calculation in Python where the current value (i) of the equation is based on the previous value (i-1). That is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is plenty of information on finding the previous value from a list, but I don't have a list -- I need to create it! My equation is shown below.
h = (2*b) - h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I evaluate the equation I am calling a value I haven't created yet, and if I set h = 0 then I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h = []
    h = (2 * b) - h[i-1]
    x += 1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # as you add to h, you keep up with its tail
For a large b list (brr, one-letter identifier), you can avoid creating a large slice:
from itertools import islice  # for a big list this keeps the code less wasteful

for val in islice(b, 1, None):
    ....
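With a small made-up b this gives, for example:
b = [1, 2, 3, 4]
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])
print(h)   # [1, 3, 3, 5]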
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It looks more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator instead.
def gen():
    def eqn(b):
        eqn.h = 2 * b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use it thus:
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
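Such a generator might look like this (a sketch, not part of the original answer; the generator must be primed with next() before the first send):
def eqn_gen(h0=0):
    h = h0
    b = yield            # wait for the first value
    while True:
        h = 2 * b - h
        b = yield h

g = eqn_gen()
next(g)             # advance to the first yield
print(g.send(2))    # 4
print(g.send(3))    # 2
print(g.send(2))    # 2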
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i, or do you just want h[i]?
If you just need the i-th value you could us recursion:
def get_h(i):
    if i > 0:
        return (2 * b) - get_h(i - 1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth. (Thanks for pointing this out kdopen) In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i % 2 == 0:
        return h_0
    else:
        return (2 * b) - h_0
In both cases h_0 is the initial value that you start out with.
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2 * b - h[i-1])
    else:
        h.append(h_0)  # i == 0: seed the sequence with whatever initial value you start from
You are successively applying a function (the equation) to the result of a previous application of that function - the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """Yields f(x), f(f(x)), f(f(f(x))), ...
    for each f in functions.

    functions is an iterable of callables taking one argument.
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b=b):
    return 2 * b - x
f is applied b times.
functions = [f]*b
Usage
print(list(composition(functions, seed)))
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """Returns f(x), g(f(x)), h(g(f(x))), ...
    for each function in functions.

    functions is an iterable of callables taking one argument.
    """
    for f in functions:
        x = f(x)
    return x

print(composition(functions, seed))
Or more generally, with no limitations on call signature:
from functools import reduce  # reduce lives in functools on Python 3

def compose(funcs):
    '''Return a callable composed of successive application of functions.

    funcs is an iterable producing callables;
    for [f, g, h] it returns f(g(h(*args, **kwargs))).
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)

def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print(eq(15))
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&As.
A statistical accumulator allows one to perform incremental calculations. For instance, to compute the arithmetic mean of a stream of numbers given at arbitrary times, one could make an object that keeps track of the current number of items given, n, and their sum, sum. When one requests the mean, the object simply returns sum/n.
An accumulator like this allows you to compute incrementally in the sense that, when given a new number, you don't need to recompute the entire sum and count.
Similar accumulators can be written for other statistics (cf. boost library for a C++ implementation).
How would you implement accumulators in Python? The code I came up with is:
class Accumulator(object):
    """
    Used to accumulate the arithmetic mean of a stream of
    numbers. This implementation does not allow removing items
    already accumulated, but it could easily be modified to do
    so. Also, other statistics could be accumulated.
    """
    def __init__(self):
        # upon initialization, the number of items currently
        # accumulated (_n) and the total sum of the items accumulated
        # (_sum) are set to zero because nothing has been accumulated
        # yet
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        # 'add' is used to add an item to this accumulator
        try:
            # try to convert the item to a float. If successful,
            # add the float to the current sum and increase the
            # number of accumulated items
            self._sum += float(item)
            self._n += 1
        except ValueError:
            # if the item cannot be converted to a float, simply
            # ignore the exception (pass and do nothing)
            pass

    @property
    def mean(self):
        # the property 'mean' returns the current mean accumulated in
        # the object
        if self._n > 0:
            # if more than zero items have been accumulated, return
            # their arithmetic average
            return self._sum / self._n
        else:
            # if no items have been accumulated, return None (you could
            # also raise an exception)
            return None
# using the object:
# Create an instance of the object "Accumulator"
my_accumulator = Accumulator()
print my_accumulator.mean
# prints None because there are no items accumulated
# add one (a number)
my_accumulator.add(1)
print my_accumulator.mean
# prints 1.0
# add two (a string - it will be converted to a float)
my_accumulator.add('2')
print my_accumulator.mean
# prints 1.5
# add a 'NA' (will be ignored because it cannot be converted to float)
my_accumulator.add('NA')
print my_accumulator.mean
# prints 1.5 (notice that it ignored the 'NA')
Interesting design questions arise:
How to make the accumulator thread-safe?
How to safely remove items?
How to architect it in a way that allows other statistics to be plugged in easily (a factory for statistics)?
For a generalized, thread-safe higher-level function, you could use something like the following (Python 2; the Queue module is named queue in Python 3) in combination with the Queue.Queue class and some other bits:
from Queue import Empty

def Accumulator(f, q, storage):
    """Yields successive values of `f` over the accumulation of `q`.

    `f` should take a single iterable as its parameter.

    `q` is a Queue.Queue or derivative.

    `storage` is a persistent sequence that provides an `append` method.
    `collections.deque` may be particularly useful, but a `list` is
    quite acceptable.

    >>> from Queue import Queue
    >>> from collections import deque
    >>> from threading import Thread
    >>> def mean(it):
    ...     vals = tuple(it)
    ...     return sum(vals) / len(vals)
    >>> value_queue = Queue()
    >>> LastThreeAverage = Accumulator(mean, value_queue, deque((), 3))
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(LastThreeAverage)
    [0, 1, 2, 4, 6, 8]
    """
    try:
        while True:
            storage.append(q.get(timeout=0.1))
            q.task_done()
            yield f(storage)
    except Empty:
        pass
This generator function evades most of its purported responsibility by delegating it to other entities:
It relies on Queue.Queue to supply its source elements in a thread-safe manner
A collections.deque object can be passed in as the value of the storage parameter; this provides, among other things, a convenient way to only use the last n (in this case 3) values
The function itself (in this case mean) is passed as a parameter. This will result in less-than-optimally efficient code in some cases, but is readily applied to all sorts of situations.
Note that there is a possibility of the accumulator timing out if your producer thread takes longer than 0.1 seconds per value. This is easily remedied by passing a longer timeout or by removing the timeout parameter entirely. In the latter case the function will block indefinitely at the end of the queue; this usage makes more sense in a case where it's being used in a sub thread (usually a daemon thread). Of course you can also parametrize the arguments that are passed to q.get as a fourth argument to Accumulator.
If you want to communicate end of queue, i.e. that there are no more values to come, from the producer thread (here putting_thread), you can pass and check for a sentinel value or use some other method. There is more info in this thread; I opted to write a subclass of Queue.Queue called CloseableQueue that provides a close method.
There are various other ways you could customize the behaviour of such a function, for example by limiting the queue size; this is just an example of usage.
edit
As mentioned above, this loses some efficiency because of the necessity of recalculation and also, I think, doesn't really answer your question.
A generator function can also accept values through its send method. So you can write a mean generator function like
def meangen():
    """Yields the accumulated mean of sent values.

    >>> g = meangen()
    >>> g.send(None) # Initialize the generator
    >>> g.send(4)
    4.0
    >>> g.send(10)
    7.0
    >>> g.send(-2)
    4.0
    """
    sum = yield(None)
    count = 1
    while True:
        sum += yield(sum / float(count))
        count += 1
Here the yield expression brings values (the arguments to send) into the function and, at the same time, passes the calculated values out as the return value of send.
You can pass the generator returned by a call to that function to a more optimizable accumulator generator function like this one:
def EfficientAccumulator(g, q):
    """Similar to Accumulator but sends values to a generator `g`.

    >>> from Queue import Queue
    >>> from threading import Thread
    >>> value_queue = Queue()
    >>> g = meangen()
    >>> g.send(None)
    >>> mean_accumulator = EfficientAccumulator(g, value_queue)
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(mean_accumulator)
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    """
    try:
        while True:
            yield(g.send(q.get(timeout=0.1)))
            q.task_done()
    except Empty:
        pass
If I were doing this in Python, there are two things I would do differently:
Separate out the functionality of each accumulator.
Not use @property the way you did.
For the first one, I would likely want to come up with an API for performing an accumulation, perhaps something like:
def add(self, num) # add a number
def compute(self) # compute the value of the accumulator
Then I would create an AccumulatorRegistry that holds onto these accumulators and allows the user to call actions on, and add values to, all of them. The code might look like this:
class Accumulators(object):
    _accumulator_library = {}

    def __init__(self):
        self.accumulator_library = {}
        for key, value in Accumulators._accumulator_library.items():
            self.accumulator_library[key] = value()

    @staticmethod
    def register(name, accumulator):
        Accumulators._accumulator_library[name] = accumulator

    def add(self, num):
        for accumulator in self.accumulator_library.values():
            accumulator.add(num)

    def compute(self, name):
        return self.accumulator_library[name].compute()

    @staticmethod
    def register_decorator(name):
        def _inner(cls):
            Accumulators.register(name, cls)
            return cls
        return _inner


@Accumulators.register_decorator("Mean")
class Mean(object):
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, num):
        self.count += 1
        self.total += num

    def compute(self):
        return self.total / float(self.count)
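Usage might then look like this (assuming compute returns the value, as in the code above):
accs = Accumulators()
accs.add(2)
accs.add(4)
print(accs.compute("Mean"))   # 3.0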
I should probably speak to your thread-safety question. Python's GIL protects you from a lot of threading issues. There are a few things you may want to do to protect yourself though:
If these objects are localized to one thread, use threading.local
If not, you can wrap the operations in a lock, using the with context syntax to hold and release the lock for you, as sketched below.
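A minimal sketch of that second option, wrapping the state changes of the simple mean accumulator in a threading.Lock (names are illustrative):
import threading

class ThreadSafeMean(object):
    def __init__(self):
        self._lock = threading.Lock()
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        with self._lock:                 # the with block acquires and releases the lock
            self._sum += float(item)
            self._n += 1

    def mean(self):
        with self._lock:
            return self._sum / self._n if self._n else None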