Why won't recursive generator work? - python

I have a class where each instance is basically a bunch of nested lists, each
of which holds a number of integers or another list containing integers, or a
list of lists, etc., like so:
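class Foo(list):
    def __init__(self):
        # list(1) raises TypeError and extend() takes a single iterable,
        # so the nested lists are written out as literals
        self.extend(
            [[1], [2], [3], list(range(5)), [list(range(3)), list(range(2))]]
        )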
I want to define a method to walk the nested lists and give me
one integer at a time, not unlike os.walk. I tried this:
def _walk(self):
    def kids(node):
        for x in node:
            try:
                for y in kids(x):
                    yield y
            except TypeError:
                yield x
    return kids(x)
But it immediately raises a StopIteration error. If I add a print statement to print each "node" in the first for loop, the function appears to iterate over the whole container the way I want, but without yielding each node: it just prints them all the first time I call next on the generator.
I'm stumped. Please help!

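It works if you change return kids(x) to return kids(self). Inside _walk, x is only the loop variable of the nested kids function, not the container, so the walk has to start from self.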

Here's a function that is a simpler version of your _walk method that does what you want on an arbitrary iterable. The internal kids function is not required.
def walk(xs):
    for x in xs:
        try:
            for y in walk(x):
                yield y
        except TypeError:
            yield x
This could be trivially adapted to work as a method on your Foo object.

Related

returning value without breaking a loop

I want to write a while loop inside a function and return a value on every iteration, but the loop never gets past its first iteration.
Here is the plan:
def func(x):
    n=3
    while(n>0):
        x = x+1
        return x
print(func(6))
I know the reason for this: the return statement breaks out of the loop.
Still, I want to keep this inside a function. Is there a way to return a value on each iteration from within a function?
When you want to return a value and continue the function in the next call at the point where you returned, use yield instead of return.
Technically this produces a so-called generator, which gives you the return values one at a time. With next() you can step through the values. You can also convert it into a list or some other data structure.
A generator function looks like this:
def foo(n):
    for i in range(n):
        yield i
And to use it:
gen = foo(100)
print(next(gen))
or
gen = foo(100)
l = list(gen)
print(l)
Keep in mind that the generator calculates the results 'on demand', so it does not allocate memory to store all results up front. When you convert it into a list, all results are calculated and stored in memory, which can cause problems for large n.
Depending on your use case, you may simply use print(x) inside the loop and then return the final value.
If you actually need to return intermediate values to a caller function, you can use yield.
You can create a generator for that, so you could yield values from your generator.
Example:
def func(x):
    n=3
    while(n>0):
        x = x+1
        n = n-1  # without this decrement the loop would never end
        yield x
func_call = func(6)  # create generator
print(next(func_call))  # 7
print(next(func_call))  # 8
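In practice a generator is usually consumed with a for loop rather than repeated next() calls. A minimal sketch, assuming a hypothetical function increments (not from the question) that counts its loop variable down so that it stops after n values:

```python
def increments(x, n=3):
    """Yield x+1, x+2, ..., x+n, one value per loop iteration."""
    while n > 0:
        x += 1
        n -= 1
        yield x


collected = []
for value in increments(6):  # the for loop calls next() and stops cleanly
    collected.append(value)

print(collected)
```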

python - documentation on the behavior to expect from list(generator)

Please suggest which documentation explains the behavior of:
list(generator)
I expected it to raise StopIteration, but instead it looks like it returns an empty list. Which documentation should I look at to understand what behavior to expect?
Built-in Functions - list
class list([iterable])
Rather than being a function, list is actually a mutable sequence type, as documented in Lists and Sequence Types — list, tuple, range.
You could probably implement list yourself:
def list(xs):
    out = []
    for x in xs:
        out.append(x)
    return out
...where for handles the StopIteration. The loop is essentially equivalent to:
xs = iter(xs)
while True:
    try:
        x = next(xs)
    except StopIteration:
        break
    out.append(x)
The actual implementations of list and for are written in C, as part of CPython.
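The difference between list() and next() on an exhausted generator can be checked directly. A small sketch, using a made-up generator function numbers:

```python
def numbers():
    yield 1
    yield 2


gen = numbers()
first = list(gen)   # consumes the generator completely
second = list(gen)  # already exhausted: the internal loop ends at once

try:
    next(gen)       # a bare next() has nothing to catch the exception
    raised = False
except StopIteration:
    raised = True

print(first, second, raised)
```

list() swallows the StopIteration because that exception is the normal end-of-iteration signal, while a direct next() call lets it propagate to the caller.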

Lazy evaluation in Python

What is lazy evaluation in Python?
One website said :
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python 2.x) is known as a lazy iterable.
Instead of storing the entire range, [0,1,2,..,9], in memory, it stores only a definition, like for (i=0; i<10; i+=1), and computes the next value only when needed (AKA lazy evaluation).
A generator is another kind of lazy iterable. Essentially, a generator allows you to return a list-like structure, but there are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need, a generator can only be iterated over exactly once.
A list can get elements by index, a generator cannot -- it only generates values once, from start to end.
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list, create all 5000000 x/2 values immediately, uses []
lis = [x/2 for x in range(5000000)]
# this is a generator, creates each x/2 value only when it is needed, uses ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator, it will run until a yield occurs, and return that result.
# on the next call it picks up where it left off and continues until a yield occurs...
def divby2(n):
    num = 0
    while num < n:
        yield num/2
        num += 1
# same as (x/2 for x in range(5000000))
# note: this prints the generator object itself, not its values
print(divby2(5000000))
Note: Even though range(5000000) is lazy in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and produces x one at a time, but the entire list of x/2 values is computed when this list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range will return a list - this means that if you give it a large number, it will calculate the range and return at the time of creation:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however, you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
It is only evaluated when you consume it. In other words, it only produces the numbers in the range when you actually need them.
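The practical differences between a Python 3 range object and a generator can be seen side by side. A short sketch (the variable names are only for illustration):

```python
r = range(1000)
gen = (x / 2 for x in r)

# A range is lazy but still behaves like a sequence:
third = r[3]                                # indexing works
size = len(r)                               # length is known without iterating
range_twice = (list(r)[:3], list(r)[:3])    # it can be iterated repeatedly

# A generator is a one-shot stream:
first_pass = list(gen)
second_pass = list(gen)                     # exhausted after the first pass

try:
    gen[3]
    indexable = True
except TypeError:                           # generators do not support indexing
    indexable = False

print(third, size, len(first_pass), second_pass, indexable)
```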
A GitHub repo named python patterns and Wikipedia tell us what lazy evaluation is:
Delays the eval of an expr until its value is needed and avoids repeated evals.
range in Python 3 is not complete lazy evaluation in this sense, because it doesn't avoid repeated evaluation.
A more classic example of lazy evaluation is cached_property:
import functools
class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)
    def __get__(self, obj, type_):
        if obj is None:
            return self
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator that converts a function into a lazily evaluated property. The first time the property is accessed, the function is called to compute the result; that value is then stored and reused on every later access.
eg:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path
    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that reading all of it takes 2s
            return f.read()
log_handler = LogHandler('./sys.log')
# only the first access costs 2s.
print(log_handler.load_log_file)
# the return value is cached on the log_handler object.
print(log_handler.load_log_file)
To use a more precise term, a lazy Python object like range is designed around the call_by_need pattern rather than full lazy evaluation.
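Since Python 3.8 the standard library ships a comparable decorator, functools.cached_property, so the hand-written class is not needed on recent versions. A minimal sketch, where the class Report and its counter are invented purely to show that the function body runs once:

```python
import functools


class Report:
    def __init__(self):
        self.loads = 0  # counts how often the expensive body actually runs

    @functools.cached_property
    def contents(self):
        self.loads += 1
        return "expensive result"


report = Report()
first = report.contents   # runs the method and stores the value
second = report.contents  # served from the instance __dict__

print(first, report.loads)
```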

Python inspect iterable in __new__ method

I'm trying to write a python (2.7) matrix module. (I know about numpy, this is just for fun.)
My Code:
from numbers import Number
import itertools
test2DMat = [[1,2,3],[4,5,6],[7,8,9]]
test3DMat = [[[1,2,3],[4,5,6],[7,8,9]],[[2,3,4],[5,6,7],[8,9,0]],[[9,8,7],[6,5,4],[3,2,1]]]
class Dim(list):
    def __new__(cls,inDim):
        # If every item in inDim is a number create a Vec
        if all(isinstance(item,Number) for item in inDim):
            #return Vec(inDim)
            return Vec.__new__(cls,inDim)
        # Otherwise create a Dim
        return list.__new__(cls,inDim)
    def __init__(self,inDim):
        # Make sure every item in inDim is iterable
        try:
            for item in inDim: iter(item)
        except TypeError:
            raise TypeError('All items in a Dim must be iterable')
        # Make sure every item in inDim has the same length
        # or that there are zero items in the list
        if len(set(len(item) for item in inDim)) > 1:
            raise ValueError('All lists in a Dim must be the same length')
        inDim = map(Dim,inDim)
        list.__init__(self,inDim)
class Vec(Dim):
    def __new__(cls,inDim):
        if cls.__name__ not in [Vec.__name__,Dim.__name__]:
            newMat = list.__new__(Vec,inDim)
            newMat.__init__(inDim)
            return newMat
        return list.__new__(Vec,inDim)
    def __init__(self,inDim):
        list.__init__(self,inDim)
class Matrix(Dim):
    def __new__(cls,inMat):
        return Dim.__new__(cls,inMat)
    def __init__(self,inMat):
        super(Matrix,self).__init__(inMat)
Current Functionality:
So far I have written a few classes, Matrix, Dim, and Vec. Matrix and Vec are both subclasses of Dim. When creating a matrix, one would first start out with a list of lists and they would create a matrix like:
>>> startingList = [[1,2,3],[4,5,6],[7,8,9]]
>>> matrix.Matrix(startingList)
[[1,2,3],[4,5,6],[7,8,9]]
This should create a Matrix. The created Matrix should contain multiple Dims all of the same length. Each of these Dims should contain multiple Dims all of the same length, etc. The last Dim, the one that contains numbers, should contain only numbers and should be a Vec instead of a Dim.
The Problem:
All of this works for lists. If, however, I use an iterator object instead (such as that returned by iter()), it does not work the way I want it to.
For example:
>>> startingList = [[1,2,3],[4,5,6],[7,8,9]]
>>> matrix.Matrix(iter(startingList))
[]
My Thoughts:
I'm fairly certain this happens because Dim.__new__ iterates over the input iterable. When the same iterable is then passed to Matrix.__init__, it has already been consumed and therefore appears to be empty, resulting in the empty matrix that I get.
I have tried copying the iterator using itertools.tee(), but this also doesn't work: I don't call Matrix.__init__ myself, it gets called implicitly when Matrix.__new__ returns, so I cannot call it with different parameters than those originally passed to Matrix(). Everything I have thought of runs into this same problem.
Is there any way for me to preserve the existing functionality and also allow matrix.Matrix() to be called with an iterator object?
The key is that Vec.__init__ is getting called twice; once inside your __new__ method and once when you return it from the __new__ method. So if you mark it as already initialised and return early from Vec.__init__ if it is already initialised, then you can ignore the second call:
class A(object):
    def __new__(cls, param):
        return B.__new__(cls, param + 100)
class B(A):
    def __new__(cls, param):
        b = object.__new__(B)
        b.__init__(param)
        return b
    def __init__(self, param):
        if hasattr(self, 'param'):
            print "skipping __init__", self
            return
        self.param = param
print A(5).param
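The double call can be made visible with a much smaller class. The following Python 3 sketch (the class name Point and the calls list are invented for illustration) relies on the documented rule that Python calls __init__ automatically whenever __new__ returns an instance of the class being constructed:

```python
calls = []


class Point:
    def __new__(cls, value):
        obj = super().__new__(cls)
        obj.__init__(value)  # explicit call inside __new__
        return obj           # Python then calls __init__ again implicitly

    def __init__(self, value):
        calls.append(value)
        self.value = value


p = Point(1)
print(calls)
```

Both calls receive the original constructor argument, which is why a one-shot iterator consumed in __new__ arrives empty at the implicit __init__ call.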
What you would need to do is check if the variable that is passed in is a tuple or list. If it is then you can use it directly, otherwise you need to convert the iterator into a list/tuple.
import collections
if isinstance(inDim, collections.Sequence):
    pass
elif hasattr(inDim, '__iter__'):  # this is better than using iter()
    inDim = tuple(inDim)
else:
    # item is not iterable
    raise TypeError('inDim must be iterable')
There is also a better way of checking that the lengths of all the lists are the same:
if len(inDim) > 0:
    len_iter = (len(item) for item in inDim)
    first_len = next(len_iter)  # next() works in Python 2.6+ and 3
    for other_len in len_iter:
        if other_len != first_len:
            raise ValueError('All lists in a Dim must be the same length')
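The idea of materialising a one-shot iterator before inspecting it can be sketched in isolation. This is Python 3 (where the ABC lives in collections.abc), and the helper names as_sequence and same_length are hypothetical, not part of the question's module:

```python
from collections.abc import Sequence


def as_sequence(items):
    """Return items unchanged if it is already a sequence, else copy it into a tuple."""
    if isinstance(items, Sequence):
        return items
    return tuple(items)  # consumes a one-shot iterator exactly once


def same_length(rows):
    """True if every row has the same length (also true for zero rows)."""
    return len({len(row) for row in rows}) <= 1


rows = as_sequence(iter([[1, 2, 3], [4, 5, 6]]))
all_numbers = all(isinstance(x, int) for row in rows for x in row)  # first pass
ok = same_length(rows)                                              # second pass

print(rows, all_numbers, ok)
```

Because rows is now a tuple, it can be inspected as many times as needed, which is the property the iterator version lacks.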

Python: the mechanism behind list comprehension

When using list comprehension or the in keyword in a for loop context, i.e:
for o in X:
do_something_with(o)
or
l=[o for o in X]
How does the mechanism behind in work?
Which functions\methods within X does it call?
If X can comply with more than one method, what's the precedence?
How to write an efficient X, so that list comprehension will be quick?
The, afaik, complete and correct answer.
for, both in for loops and list comprehensions, calls iter() on X. iter() will return an iterator if X either has an __iter__ method or a __getitem__ method. If it implements both, __iter__ is used. If it has neither you get TypeError: 'Nothing' object is not iterable.
This implements a __getitem__:
class GetItem(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, x):
        return self.data[x]
Usage:
>>> data = range(10)
>>> print [x*x for x in GetItem(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
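The __getitem__ fallback does not need an underlying list. As a Python 3 sketch with a made-up class Squares: iter() calls __getitem__ with 0, 1, 2, ... and stops at the first IndexError.

```python
class Squares:
    """Iterable purely through __getitem__; it defines no __iter__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)  # signals the end of the iteration
        return index * index


squares = list(Squares(4))
has_iter = hasattr(Squares, "__iter__")

print(squares, has_iter)
```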
This is an example of implementing __iter__:
class TheIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = -1
    # Note: In Python 3 this is called __next__
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
    def __iter__(self):
        return self
class Iter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        # use the instance's own data, not a global named data
        return TheIterator(self.data)
Usage:
>>> data = range(10)
>>> print [x*x for x in Iter(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As you see, you need to implement both an iterator and an __iter__ that returns the iterator.
You can combine them:
class CombinedIter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        self.index = -1
        return self
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
Usage:
>>> well, you get it, it's all the same...
But then you can only have one iterator going at once.
OK, in this case you could just do this:
class CheatIter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return iter(self.data)
But that's cheating because you are just reusing the __iter__ method of list.
An easier way is to use yield, and make __iter__ into a generator:
class Generator(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        for x in self.data:
            yield x
This last is the way I would recommend. Easy and efficient.
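One concrete benefit of the generator-based __iter__ is that every call returns a fresh, independent iterator, so nested loops over the same object work. A small Python 3 sketch, with a hypothetical class Numbers mirroring the Generator class above:

```python
class Numbers:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Each call creates a new generator with its own position.
        for x in self.data:
            yield x


nums = Numbers([1, 2])
pairs = [(a, b) for a in nums for b in nums]  # two iterators active at once

print(pairs)
```

With a design that returns self from __iter__ and keeps the index on the object, the inner loop would reset the shared position and the nested iteration would misbehave.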
X must be iterable. It must implement __iter__() which returns an iterator object; the iterator object must implement next(), which returns next item every time it is called or raises a StopIteration if there's no next item.
Lists, tuples and generators are all iterable.
Note that the plain for operator uses the same mechanism.
Answering the question's comments, I can say that reading the source is not the best idea in this case. The code responsible for executing compiled code (ceval.c) is not very readable for a person who sees the Python sources for the first time. Here is the snippet that represents iteration in for loops:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    /*
      Here tp_iternext corresponds to next() in Python
    */
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(
                        PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
To find out what actually happens here you need to dive into a bunch of other files whose verbosity is not much better. Thus I think that in such cases documentation and sites like SO are the first place to go, while the source should be checked only for implementation details they do not cover.
X must be an iterable object, meaning it needs to have an __iter__() method.
So, to start a for..in loop, or a list comprehension, first X's __iter__() method is called to obtain an iterator object; then that object's next() method is called for each iteration until StopIteration is raised, at which point the iteration stops.
I'm not sure what your third question means, and how to provide a meaningful answer to your fourth question except that your iterator should not construct the entire list in memory at once.
Maybe this helps (tutorial http://docs.python.org/tutorial/classes.html Section 9.9):
Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate.
To answer your questions:
How does the mechanism behind in work?
It is the exact same mechanism as used for ordinary for loops, as others have already noted.
Which functions\methods within X does it call?
As noted in a comment below, it calls iter(X) to get an iterator. If X has a method function __iter__() defined, this will be called to return an iterator; otherwise, if X defines __getitem__(), this will be called repeatedly to iterate over X. See the Python documentation for iter() here: http://docs.python.org/library/functions.html#iter
If X can comply with more than one method, what's the precedence?
I'm not sure what your question is here, exactly, but Python has standard rules for how it resolves method names, and they are followed here. Here is a discussion of this:
Method Resolution Order (MRO) in new style Python classes
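If the question is about a class that defines both __iter__ and __getitem__, the precedence can be checked directly. A minimal Python 3 sketch, using an invented class Both whose two methods return distinguishable values:

```python
class Both:
    def __iter__(self):
        return iter(["from __iter__"])

    def __getitem__(self, index):
        if index > 0:
            raise IndexError(index)
        return "from __getitem__"


via_loop = [item for item in Both()]  # iteration goes through __iter__
via_index = Both()[0]                 # explicit indexing still uses __getitem__

print(via_loop, via_index)
```

The __getitem__ protocol is only used for iteration as a fallback when the class has no __iter__ at all.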
How to write an efficient X, so that list comprehension will be quick?
I suggest you read up more on iterators and generators in Python. One easy way to make any class support iteration is to write its __iter__() method as a generator function. Here is a discussion of generators:
http://linuxgazette.net/100/pramode.html
