I'm writing a generator function. I want to know if there's a better (read: more pythonic, ideally with a list comprehension) way to implement something like this:
generator = gen()
directories = []
for _ in xrange(x):
    foo = next(generator)
    directories.append(foo['name'])
    yield foo
The key here is that I don't want to capture the WHOLE yield- the dictionary returned by gen() is large, which is why I'm using a generator. I do need to capture all of the 'name's, though. I feel like there's a way to do this with a list comprehension, but I'm just not seeing it. Thoughts?
There is another / shorter way to do this, but I wouldn't call it more Pythonic:
generator = gen()
directories = []
generator_wrapper = (directories.append(foo['name']) or foo
                     for foo in generator)
This takes advantage of the fact that append, like most mutating methods in Python, returns None, so directories.append(...) or foo always evaluates to foo.
That way the whole dictionary is still the result of the generator expression, and you still get lazy evaluation, but the name still gets saved to the directories list.
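For example, a minimal sketch (gen() here is just a stand-in for the real generator of large dictionaries) showing that the names accumulate as the wrapper is consumed:

def gen():
    # stand-in: the real generator yields much larger dictionaries
    for n in ('a', 'b', 'c'):
        yield {'name': n, 'data': 'lots of data here'}

directories = []
generator_wrapper = (directories.append(foo['name']) or foo
                     for foo in gen())

for foo in generator_wrapper:
    pass  # each full dictionary passes through here, lazily

print(directories)  # ['a', 'b', 'c']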
You could also use this method in an explicit for loop:
for foo in generator:
    yield directories.append(foo['name']) or foo
or even just simplify your loop a bit:
for foo in generator:
    directories.append(foo['name'])
    yield foo
as there is no reason to use xrange just to iterate over the generator (unless you actually want to stop after some known number of steps).
You want the first x many elements of the generator? Use itertools.islice:
import itertools

directories = [item['name'] for item in itertools.islice(gen(), x)]
So I was writing a function where a lot of processing happens in the body of a loop, and occasionally it may be of interest to the caller to have the answer to some of the computations.
Normally I would just put the results in a list and return the list, but in this case the results are too large (a few hundred MB on each loop).
I wrote this without really thinking about it, expecting Python's dynamic typing to figure things out, but the following is always created as a generator.
def mixed(is_generator=False):
    for i in range(5):
        # process some stuff including writing to a file
        if is_generator:
            yield i
    return
From this I have two questions:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator?
2) Is there a sensible way to obtain the behaviour I intended?
2.1) If no, what is the reasoning behind it not being possible? (In terms of how functions and generators work in Python.)
Let's go step by step:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator? Yes. If a function body contains yield anywhere, calling that function always returns a generator object, even if the yield is never reached.
2) Is there a sensible way to obtain the behaviour I intended? Yes, see the example below.
The trick is to wrap the computation and either return the generator itself or a list built from its data:
def mixed(is_generator=False):
    # create a generator object
    gen = (compute_stuff(i) for i in range(5))
    # if we want just the generator
    if is_generator:
        return gen
    # if not, we consume it with a list and return that list
    return list(gen)
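Usage would then look like this (compute_stuff standing in for whatever per-item work the loop does):

lazy_version = mixed(is_generator=True)   # a generator; nothing computed yet
eager_version = mixed()                   # a list; everything computed now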
Anyway, I would say this is bad practice. Keep the concerns separated: just write the generator function, and put the eager-versus-lazy logic outside it:
def computation():
    for i in range(5):
        # process some stuff including writing to a file
        yield i

gen = computation()
if lazy:
    for data in gen:
        use_data(data)
else:
    data = list(gen)
    use_all_data(data)
Python 3 introduced generator-like objects that are returned by calls to range() and zip(). Each acts like a generator and can be iterated through once, but doesn't 'print' well, much like the value returned by enumerate().
I was perplexed to see, however, that they are distinct object types and do not belong to types.GeneratorType, or at least that is what the types module shows. A function that expects a generator would not detect them. What is their inheritance? Do they belong to some main "generator" structure, so that they could be identified along with other generators?
import types
a = [1,2,3]
b = [4,5,6]
# create some generator-type objects
obj_zip = zip(a,b)
obj_enu = enumerate(a)
obj_r = range(10)
print(type(obj_zip))
print(type(obj_enu))
print(type(obj_r))
# checking against types.GeneratorType returns False
print(isinstance(obj_zip,types.GeneratorType))
print(isinstance(obj_enu,types.GeneratorType))
print(isinstance(obj_r,types.GeneratorType))
# checking against their own distinct object types returns True
print(isinstance(obj_zip,zip))
Per the GeneratorType docs:
types.GeneratorType
The type of generator-iterator objects, created by generator functions.
Generator functions are a specific thing in the language: functions that use yield or yield from (generator expressions are just shorthand for inline generator functions). Generators are a subset of iterators (all things you can call next() on to get a new value), which are in turn a subset of iterables (all things you can call iter() on to get an iterator; iterators are themselves iterables, where iter(iterator) behaves as the identity function).
Basically, if you're testing for "can I loop over this?", test isinstance(obj, collections.abc.Iterable). If you're checking "is this an exhaustible iterator?" (that is, will I exhaust it by looping over it?), test either isinstance(obj, collections.abc.Iterator) or for the duck-typing based approach, test iter(obj) is obj (the invariants on iterators require that iter(iterator) yield the original iterator object unchanged).
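For example, a quick sketch of those tests against range and zip objects:

from collections.abc import Iterable, Iterator

r = range(3)
z = zip('ab', 'cd')

print(isinstance(r, Iterable), isinstance(r, Iterator))  # True False
print(isinstance(z, Iterable), isinstance(z, Iterator))  # True True

# duck-typing check: an iterator must be its own iterator
print(iter(z) is z)  # True: looping over z exhausts it
print(iter(r) is r)  # False: each iter(r) returns a fresh iterator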
Note that range is not a generator or iterator. Per the docs:
Rather than being a function, range is actually an immutable sequence type, as documented in Ranges and Sequence Types — list, tuple, range.
Being an immutable sequence type means it is an iterable, but that's it. The fact that it is usually used as if it were an iterator is irrelevant; if it were an iterator, the second loop here would never execute:
r = range(3)
for i in r:
    print("First", i)
for i in r:
    print("Second", i)
but it works just fine, because each (implicit) call to iter(r) returns a new iterator based on the same underlying iterable.
The documentation says that enumerate is functionally equivalent to a generator. Actually, it is implemented in C and returns an iterator, not a generator, as described in "Does enumerate() produce a generator object?".
Generators and iterators are almost the same. This is explained in detail in "Difference between Python's Generators and Iterators".
I'm assuming you're trying to solve a real problem, like finding out whether you can iterate over something. To solve that, you can test whether it is an instance of collections.abc.Iterable:
>>> import collections.abc
>>> a = enumerate([1, 2, 3])
>>> isinstance(a, collections.abc.Iterable)
True
Not meant to be a full answer (ShadowRanger's answer already explains everything), but just to note that types.GeneratorType is really a very limited type, as shown in the types.py source:
def _g():
    yield 1
GeneratorType = type(_g())
It only covers the type of generator-iterators created by generator functions. Other "generator-like" objects don't use yield, so they're not a match.
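A quick illustration of how narrow that type is:

import types

def gen_func():
    yield 1

gen_expr = (i * i for i in range(3))

print(isinstance(gen_func(), types.GeneratorType))      # True
print(isinstance(gen_expr, types.GeneratorType))        # True: genexps use yield under the hood
print(isinstance(zip([1], [2]), types.GeneratorType))   # False
print(isinstance(enumerate([1]), types.GeneratorType))  # False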
This is probably simple, but I cannot find it for some reason. For example:
def fnc(dt):
    print(dt)
    return

# [(fnc(y)) for y in range(5) for x in range(5)]
for x in range(0, 5):
    fnc(x)
    for y in range(0, 5):
        fnc(y)
I would like the commented-out line to behave like the doubly nested loop below it. Is this possible? I give up, I cannot find it! Thanks for any input.
You have to use nested list comprehensions to achieve the same result:
[(fnc(x),[fnc(y) for y in range(5)]) for x in range(5)]
I used a tuple (fnc(x), [...]) to output x before performing the list comprehension for y.
P.S.: Don't actually use this. Stick to your loops.
You don't need a list comprehension here. List comprehensions are for building lists, not for side effects like the ones in your for loop. Any solution that produces the same output using a list comprehension (like the one below) will build a useless list of Nones:
[fnc(y) for x in range(5) if fnc(x) or 1 for y in range(5)]
This code is unpythonic and unreadable; you should never use it. fnc(x) is always evaluated as part of the if clause, and the branch is always taken because the call is short-circuited with a truthy value via or 1, so the nested loop executes for every x.
The Pythonic way is to use a vanilla for like you've done.
What you ask for is probably technically possible; I'm thinking of a class with an overridden iterator that calls fnc() during iteration (a rough sketch follows below).
The implementation, however, would be an aberration.
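For what it's worth, here is that sketch. CallingRange is a made-up name, and this is exactly the sort of hack the rest of this answer argues against:

class CallingRange:
    # iterable that calls fnc(x) as a side effect before yielding each x
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for x in range(self.n):
            fnc(x)
            yield x

# behaves like the doubly nested loop from the question:
[fnc(y) for x in CallingRange(5) for y in range(5)]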
List comprehensions are intended as a fast way to filter, combine and/or process data in a list to generate another one. You should think of them as a way to quickly apply a function to all the data in the list, appending each time the function results to the output list.
This is why there's no syntax to, say, do assignments or external function calls in comprehensions. If you need to call a function in the inner loop before processing the data, you're better off with the nested loop approach (which is also much more readable than anything hacked into a comprehension).
I'm using a generator function, say:
def foo():
    i = 0
    while i < 10:
        i += 1
        yield i
Now, I would like the option to copy the generator after any number of iterations, so that the new copy will retain the internal state (will have the same 'i' in the example) but will now be independent from the original (i.e. iterating over the copy should not change the original).
I've tried using copy.deepcopy but I get the error:
"TypeError: object.__new__(generator) is not safe, use generator.__new__()"
Obviously, I could solve this using regular functions with counters for example.
But I'm really looking for a solution using generators.
There are three cases I can think of:
Generator has no side effects, and you just want to be able to walk back through results you've already captured. You could consider a cached generator instead of a true generator. You can share the cached generator around as well, and if any client walks to an item you haven't been to yet, it will advance. This is similar to itertools.tee(), but does the tee functionality in the generator/cache itself instead of requiring the client to do it (a sketch of such a cache follows this list).
Generator has side effects, but no history, and you want to be able to restart anywhere. Consider writing it as a coroutine, where you can pass in the value to start at any time.
Generator has side effects AND history, meaning that the state of the generator at G(x) depends on the results of G(x-1), and so you can't just pass x back into it to start anywhere. In this case, I think you'd need to be more specific about what you are trying to do, as the result depends not just on the generator, but on the state of other data. Probably, in this case, there is a better way to do it.
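For case 1, a minimal sketch of such a cached generator (the class and its method names are illustrative, not a standard API):

class CachedGenerator:
    # wraps an iterator and caches its results, so any client can
    # re-read earlier items; reading past the cache advances the iterator
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def __getitem__(self, i):
        while len(self._cache) <= i:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError(i)
        return self._cache[i]

def foo():
    i = 0
    while i < 10:
        i += 1
        yield i

g = CachedGenerator(foo())
print(g[2])  # 3: advances the generator and caches items 0-2
print(g[0])  # 1: served from the cache; the generator is untouched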
The comment suggesting itertools.tee was my first thought as well. Because of the warning that you shouldn't advance the original generator once you've used tee, I might write something like this to spin off a copy:
>>> from itertools import tee
>>>
>>> def foo():
...     i = 0
...     while i < 10:
...         i += 1
...         yield i
...
>>> it = foo()
>>> next(it)
1
>>> it, other = tee(it)
>>> next(it)
2
>>> next(other)
2
Suppose I have a for loop that iterates over a generator, and in this loop I'm building a list for later use:
q = []
for index, val in enumerate([1, 2, 3]):
    q.append(index + val)
I want to hold onto the generator without creating a function such as this:
def foo(gen):
    for index, val in gen:
        yield index + val
is it at all possible or is there some inherent problem I don't see here?
I suppose it should look something like this:
iter_q = something()
for index, val in enumerate([1, 2, 3]):
    iter_q.add_iteration(index + val)
OK, now that I've written this out, it does seem quite impossible (or useless): this for loop would have to run through the whole list before the "generator" was ready, which makes it just an iterator over a premade list, not a generator in any useful sense.
Posting anyway because I couldn't find such a question myself. (Plus, maybe someone still has something interesting to say about it.)
If I'm understanding your question correctly, I think you might be wanting a generator expression:
q = (index + val for index, val in enumerate(someOtherGenerator))
That statement creates a new generator q, which works just like the generator returned by your foo function. You don't need to assign it to a variable, either; you can create generator expressions just about anywhere, such as in a function call:
doSomethingWithAGenerator(i+v for i,v in enumerate(someOtherGenerator))
There are some kinds of generators that can't be made in generator expressions, but the most common kinds can be.
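For instance, anything that needs multiple yields per item, or statements between them, still requires a full generator function (pairs_flattened is a made-up name for illustration):

# fine as a generator expression:
q = (index + val for index, val in enumerate([1, 2, 3]))

# needs a real generator function: two yields per item
def pairs_flattened(gen):
    for index, val in gen:
        yield index
        yield val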
I look at it this way: itertools.cycle is probably the best-in-class example of a function that loops over an iterator while pretending not to exhaust it, and even it ultimately creates a complete copy of the entire thing. If itertools can't do any better than that, there probably is no other (general) solution.
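For reference, the rough pure-Python equivalent given in the itertools documentation makes that copy explicit:

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)   # every element is copied into 'saved'
    while saved:
        for element in saved:
            yield element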