Chaining generators considered harmful? - python

I claim: Chaining generators in Python is memory-inefficient and renders them unusable for certain types of applications. If possible, please prove me wrong.
First, a very simple and straight-forward example without generators:
import gc

def cocktail_objects():
    # find all Cocktail objects currently tracked by the garbage collector
    return filter(lambda obj: isinstance(obj, Cocktail), gc.get_objects())

class Cocktail(object):
    def __init__(self, ingredients):
        # ingredients represents our object data, imagine some heavy arrays
        self.ingredients = ingredients

    def __str__(self):
        return self.ingredients

    def __repr__(self):
        return 'Cocktail(' + str(self) + ')'

def create(first_ingredient):
    return Cocktail(first_ingredient)

def with_ingredient(cocktail, ingredient):
    # this could be some data transformation function
    return Cocktail(cocktail.ingredients + ' and ' + ingredient)

first_ingredients = ['rum', 'vodka']

print 'using iterative style:'
for ingredient in first_ingredients:
    cocktail = create(ingredient)
    cocktail = with_ingredient(cocktail, 'coke')
    cocktail = with_ingredient(cocktail, 'limes')
    print cocktail
    print cocktail_objects()
This prints as expected:
rum and coke and limes
[Cocktail(rum and coke and limes)]
vodka and coke and limes
[Cocktail(vodka and coke and limes)]
Now let's use iterator objects to make the cocktail transformation more easily composable:
class create_iter(object):
    def __init__(self, first_ingredients):
        self.first_ingredients = first_ingredients
        self.i = 0

    def __iter__(self):
        return self

    def next(self):
        try:
            ingredient = self.first_ingredients[self.i]
        except IndexError:
            raise StopIteration
        else:
            self.i += 1
            return create(ingredient)

class with_ingredient_iter(object):
    def __init__(self, cocktails_iter, ingredient):
        self.cocktails_iter = cocktails_iter
        self.ingredient = ingredient

    def __iter__(self):
        return self

    def next(self):
        cocktail = next(self.cocktails_iter)
        return with_ingredient(cocktail, self.ingredient)
print 'using iterators:'
base = create_iter(first_ingredients)
with_coke = with_ingredient_iter(base, 'coke')
with_coke_and_limes = with_ingredient_iter(with_coke, 'limes')
for cocktail in with_coke_and_limes:
    print cocktail
    print cocktail_objects()
The output is identical to before.
Finally, let's replace the iterators with generators to get rid of the boilerplate:
def create_gen(first_ingredients):
    for ingredient in first_ingredients:
        yield create(ingredient)

def with_ingredient_gen(cocktails_gen, ingredient):
    for cocktail in cocktails_gen:
        yield with_ingredient(cocktail, ingredient)

print 'using generators:'
base = create_gen(first_ingredients)
with_coke = with_ingredient_gen(base, 'coke')
with_coke_and_limes = with_ingredient_gen(with_coke, 'limes')
for cocktail in with_coke_and_limes:
    print cocktail
    print cocktail_objects()
This however prints:
rum and coke and limes
[Cocktail(rum), Cocktail(rum and coke), Cocktail(rum and coke and limes)]
vodka and coke and limes
[Cocktail(vodka), Cocktail(vodka and coke), Cocktail(vodka and coke and limes)]
This means that in a chain of generators, all currently yielded objects in that chain stay in memory and don't get released, even though the ones in earlier chain positions aren't needed anymore. Result: higher than necessary memory consumption.
Now, the question is: Why do the generators hold on to the objects that they are yielding until the next iteration starts? Obviously the objects aren't needed anymore in the generators and the references to them could be released.
I'm using generators in one of my projects to transform heavy data (numpy arrays of hundreds of megabytes) in a kind of pipeline. But as you can see this is very inefficient memory-wise. I am using Python 2.7. If this is a behaviour that got fixed in Python 3, please tell me. Otherwise, does this qualify for a bug report? And most importantly, are there any work-arounds except rewriting as shown?
Work-around 1:
print 'using imap:'
from itertools import imap
base = imap(lambda ingredient: create(ingredient), first_ingredients)
with_coke = imap(lambda cocktail: with_ingredient(cocktail, 'coke'), base)
with_coke_and_limes = imap(lambda cocktail: with_ingredient(cocktail, 'limes'), with_coke)
for cocktail in with_coke_and_limes:
    print cocktail
    print gc.collect()
    print cocktail_objects()
Obviously this would only be usable if no state needs to be kept between the "yields". In these examples that is the case.
Preliminary conclusion: If you use iterator classes, then you decide what state you want to keep. If you use generators, Python implicitly decides what state to keep. If you use itertools.imap you cannot keep any state.
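For completeness, here is a minimal sketch of the same stateless workaround in Python 3, where the built-in map is lazy and itertools.imap no longer exists (this reuses create and with_ingredient from the example above):

base = map(create, first_ingredients)
with_coke = map(lambda c: with_ingredient(c, 'coke'), base)
with_coke_and_limes = map(lambda c: with_ingredient(c, 'limes'), with_coke)
for cocktail in with_coke_and_limes:
    print(cocktail)  # print is a function in Python 3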

Your with_coke_and_limes yields at a certain point in its execution. At that point, the function has a local variable called cocktail (from its for loop) that refers to the "intermediate" cocktail from the next step up in the generator nesting (i.e., "rum and coke"). Just because the generator yields at that point does not mean it can throw away that object. The execution of with_ingredient_gen is suspended at that point, and at that point the local variable cocktail still exists. The function might need to refer to it later, after it resumes. There's nothing that says the yield has to be the last thing in your for loop, or that there has to be only one yield. You could have written with_ingredient_gen like this:
def with_ingredient_gen(cocktails_gen, ingredient):
    for cocktail in cocktails_gen:
        yield with_ingredient(cocktail, ingredient)
        yield with_ingredient(cocktail, "another ingredient")
If Python threw away cocktail after the first yield, what would it do when it resumed the generator on the next iteration and found it needed that cocktail object again for the second yield?
The same applies to the other generators in the chain. Once you advance with_coke_and_limes to create a cocktail, with_coke and base are also activated and then suspended, and they have local variables referring to their own intermediate cocktails. Just as described above, these functions cannot delete the objects they refer to, since they might need them after resuming.
The generator function has to have some sort of reference to an object in order to yield it. And it has to keep that reference after it yields it, because it is suspended immediately after it yields, but it can't know whether it will need the reference once it is resumed.
Note that the only reason you didn't see the intermediate objects in your first example is because you overwrote the same local variable with each successive cocktail, allowing the earlier cocktail objects to be released. If in your first code snippet you do this instead:
for ingredient in first_ingredients:
    cocktail = create(ingredient)
    cocktail2 = with_ingredient(cocktail, 'coke')
    cocktail3 = with_ingredient(cocktail2, 'limes')
    print cocktail3
    print cocktail_objects()
...then you will see all three intermediate cocktails printed in that case as well, because each now has a separate local variable referring to it. Your generator version splits each of these intermediate variables into separate functions, so you can't overwrite the "parent" cocktail with the "derived" cocktail.
You are right that this can cause a problem if you have a deeply nested sequence of generators, each one of which creates large objects in memory and stores them in local variables. However, that is not a common situation. In such a situation, you have a couple of options. One is to perform the operations in a "flat" iterative style as in your first example.
Another option is to write your intermediate generators so that they don't actually create the large objects, but only "stack" the information needed to do so. For instance, in your example, if you don't want the intermediate Cocktail objects, don't create them. Instead of having each generator create a Cocktail and then having the next generator extract the previous cocktail's ingredients, have the generators pass on just the ingredients and have one final generator that combines the stacked ingredients and creates just one cocktail at the end.
It's hard to say exactly how to do this for your real application, but it may be possible. For instance, if your generators working on numpy arrays are doing things like add this, subtract that, transpose, etc., you can pass on "deltas" that describe what to do without actually doing it. Instead of having an intermediate generator, say, multiply an array by 3 and yield the array, have it yield some kind of indicator like "*3" (or possibly even a function doing the multiplication). Then your last generator can iterate over these "instructions" and perform the operations all in one place.
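To make that concrete, here is a minimal sketch of the idea, assuming a simple numpy-style pipeline (the names base_gen, scale_gen, offset_gen and apply_gen are illustrative, not from the question): the intermediate generators only record instructions, and a single final stage applies them, so no large intermediate arrays are ever created or held alive.

import numpy as np

def base_gen(arrays):
    for a in arrays:
        yield a, []                                 # (data, pending instructions)

def scale_gen(stream, factor):
    for a, ops in stream:
        yield a, ops + [lambda x, f=factor: x * f]  # record the step, don't run it

def offset_gen(stream, delta):
    for a, ops in stream:
        yield a, ops + [lambda x, d=delta: x + d]

def apply_gen(stream):
    for a, ops in stream:
        for op in ops:                              # all the real work happens here
            a = op(a)
        yield a

pipeline = apply_gen(offset_gen(scale_gen(base_gen([np.ones(3)]), 3), 1))
for result in pipeline:
    print(result)                                   # [4. 4. 4.]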

Related

Is there a way to copy an arbitrary generator in Python?

Among the best-known features of functional programming are lazy evaluation and infinite lists. In Python, one generally implements these features with generators. But one of the precepts of functional programming is immutability, and generators are not immutable. Just the opposite. Every time one calls next() on a generator, it changes its internal state.
A possible work-around would be to copy a generator before calling next() on it. That works for some generators such as count(). (Perhaps count() is not a generator?)
from copy import copy
from itertools import count

count_gen = count()
count_gen_copy = copy(count_gen)
print(next(count_gen), next(count_gen), next(count_gen))                  # => 0 1 2
print(next(count_gen_copy), next(count_gen_copy), next(count_gen_copy))  # => 0 1 2
But if I define my own generator, e.g., my_count(), I can't copy it.
def my_count(n=0):
    while True:
        yield n
        n += 1

my_count_gen = my_count()
my_count_gen_copy = copy(my_count_gen)
print(next(my_count_gen), next(my_count_gen), next(my_count_gen))
print(next(my_count_gen_copy), next(my_count_gen_copy), next(my_count_gen_copy))
I get an error message when I attempt to execute copy(my_count_gen): TypeError: can't pickle generator objects.
Is there a way around this, or is there some other approach?
Perhaps another way to ask this is: what is copy() copying when it copies count_gen?
Thanks.
P.S. If I use __iter__() rather than copy(), the __iter__() version is the same iterator as the original; it continues where the original left off:
my_count_gen = my_count()
my_count_gen_i = my_count_gen.__iter__()
print(next(my_count_gen), next(my_count_gen), next(my_count_gen)) # => 0 1 2
print(next(my_count_gen_i), next(my_count_gen_i), next(my_count_gen_i)) # => 3 4 5
There's no way to copy arbitrary generators in Python. The operation just doesn't make sense. A generator could depend on all sorts of other uncopyable resources, like file handles, database connections, locks, worker processes, etc. If a generator is holding a lock and you copied it, what would happen to the lock? If a generator is in the middle of a database transaction and you copy it, what would happen to the transaction?
The things you thought were copyable generators aren't generators at all. They're instances of other iterator classes. If you want to write your own iterator class, you can:
class MyCount:
    def __init__(self, n=0):
        self._n = n

    def __iter__(self):
        return self

    def __next__(self):
        retval = self._n
        self._n += 1
        return retval
Some iterators you write that way might even be reasonably copyable. For others, copy.copy will do something completely unreasonable and useless.
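For example, a shallow copy of the MyCount class above happens to work, because its entire state is the single attribute _n (a quick sketch):

import copy

a = MyCount(5)
next(a)                  # returns 5, a's _n is now 6
b = copy.copy(a)         # b gets its own copy of _n (6)
print(next(a), next(a))  # 6 7
print(next(b), next(b))  # 6 7 -- independent of a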
While copy doesn't make sense on a generator, you can effectively "copy" an iterator so that you can iterate it many times. The easiest way is to use tee from the itertools module.
import itertools

def my_count(n=0):
    while True:
        yield n
        n += 1

a, b, c = itertools.tee(my_count(), 3)
# now use a, b, c ...
This uses memory to cache the iterator's results and pass them on.
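A quick check of the behavior, continuing from the snippet above (each tee'd iterator replays the shared cache independently):

print(next(a), next(a), next(a))  # => 0 1 2
print(next(b), next(b), next(b))  # => 0 1 2, replayed from tee's cache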

Shared python generator

I am trying to reproduce the reactive extensions "shared" observable concept with Python generators.
Say I have an API that gives me an infinite stream that I can use like this:
def my_generator():
    for elem in the_infinite_stream():
        yield elem
I could use this generator multiple times like so:
stream1 = my_generator()
stream2 = my_generator()
And the_infinite_stream() will be called twice (once for each generator).
Now say that the_infinite_stream() is an expensive operation. Is there a way to "share" the generator between multiple clients? It seems like tee would do that, but I have to know in advance how many independent generators I want.
The idea is that in other languages (Java, Swift) using the reactive extensions (RxJava, RxSwift) "shared" streams, I can conveniently duplicate the stream on the client side. I am wondering how to do that in Python.
Note: I am using asyncio
I took the tee implementation and modified it so that you can create any number of generators from infinite_stream:
import collections

def generators_factory(iterable):
    it = iter(iterable)
    deques = []
    already_gone = []

    def new_generator():
        new_deque = collections.deque()
        new_deque.extend(already_gone)
        deques.append(new_deque)

        def gen(mydeque):
            while True:
                if not mydeque:            # when the local deque is empty
                    newval = next(it)      # fetch a new value and
                    already_gone.append(newval)
                    for d in deques:       # load it to all the deques
                        d.append(newval)
                yield mydeque.popleft()

        return gen(new_deque)

    return new_generator

# test it:
infinite_stream = [1, 2, 3, 4, 5]
factory = generators_factory(infinite_stream)
gen1 = factory()
gen2 = factory()
print(next(gen1))  # 1
print(next(gen2))  # 1 even after it was produced by gen1
print(list(gen1))  # [2, 3, 4, 5], the rest after 1
To cache only a limited number of values, you can change already_gone = [] into already_gone = collections.deque(maxlen=size) and add a size=None parameter to generators_factory.
Consider simple class attributes.
Given
def infinite_stream():
    """Yield a number from a (semi-)infinite iterator."""
    # Alternatively, `yield from itertools.count()`
    yield from iter(range(100000000))

# Helper
def get_data(iterable):
    """Print the state of `data` per stream."""
    return ", ".join([f"{x.__name__}: {x.data}" for x in iterable])
Code
class SharedIterator:
    """Share the state of an iterator with subclasses."""
    _gen = infinite_stream()
    data = None

    @staticmethod
    def modify():
        """Advance the shared iterator + assign new data."""
        cls = SharedIterator
        cls.data = next(cls._gen)
Demo
Given a tuple of client streams (A, B and C),
# Streams
class A(SharedIterator): pass
class B(SharedIterator): pass
class C(SharedIterator): pass
streams = A, B, C
let us modify and print the state of one iterator shared between them:
# Observe changed state in subclasses
A.modify()
print("1st access:", get_data(streams))
B.modify()
print("2nd access:", get_data(streams))
C.modify()
print("3rd access:", get_data(streams))
Output
1st access: A: 0, B: 0, C: 0
2nd access: A: 1, B: 1, C: 1
3rd access: A: 2, B: 2, C: 2
Although any stream can modify the iterator, the class attribute is shared between sub-classes.
See Also
Docs on asyncio.Queue - an async alternative to shared container
Post on the Observer Pattern + asyncio
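For reference, here is a minimal sketch of that asyncio.Queue alternative (all names are illustrative; requires Python 3.8+): a single task drains the expensive stream once and fans each item out to one queue per subscriber.

import asyncio

async def the_infinite_stream():
    for i in range(5):              # stand-in for the real expensive stream
        yield i

async def broadcast(queues):
    async for item in the_infinite_stream():
        for q in queues:
            await q.put(item)       # every subscriber sees every item
    for q in queues:
        await q.put(None)           # sentinel: stream exhausted

async def subscriber(name, q):
    while (item := await q.get()) is not None:
        print(name, "got", item)

async def main():
    queues = [asyncio.Queue(), asyncio.Queue()]
    await asyncio.gather(broadcast(queues),
                         subscriber("a", queues[0]),
                         subscriber("b", queues[1]))

asyncio.run(main())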
You can call "tee" repeatedly to create multiple iterators as needed.
import itertools
import random

it = iter([random.random() for i in range(100)])
base, it_cp = itertools.tee(it)
_, it_cp2 = itertools.tee(base)
_, it_cp3 = itertools.tee(base)
Sample: http://tpcg.io/ZGc6l5.
You can use a single generator and "subscriber generators":
subscribed_generators = []

def my_generator():
    while True:
        elem = yield
        do_something(elem)  # or yield do_something(elem) depending on your actual use

def publishing_generator():
    for elem in the_infinite_stream():
        for generator in subscribed_generators:
            generator.send(elem)
        yield  # make this a generator so it can be iterated

subscribed_generators.extend([my_generator(), my_generator()])
for g in subscribed_generators:
    next(g)  # prime each coroutine-style generator before calling send()

# Next is just an example that forces iteration over `the_infinite_stream`
for elem in publishing_generator():
    pass
Instead of a generator function you may also create a class with the methods __next__, __iter__, send, and throw. That way you can modify the MyGenerator.__init__ method to automatically add new instances of it to subscribed_generators.
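A rough sketch of that class-based variant, reduced to the send part for brevity (MyGenerator and do_something are illustrative stand-ins for your real consumer logic):

subscribed_generators = []

def do_something(name, elem):
    print(name, "received", elem)

class MyGenerator:
    def __init__(self, name):
        self.name = name
        subscribed_generators.append(self)  # auto-subscribe on creation

    def send(self, elem):
        do_something(self.name, elem)

MyGenerator("a")
MyGenerator("b")
for elem in range(3):               # stand-in for the_infinite_stream()
    for g in subscribed_generators:
        g.send(elem)                # both subscribers see every element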
This is somewhat similar to an event-based approach with a "dumb" implementation:
- for elem in the_infinite_stream is similar to emitting an event,
- for generator ...: generator.send is similar to sending the event to each subscriber.
So one way to implement a "more complex but structured" solution would be to use an event-based approach, for example asyncio.Event or some third-party solution like aiopubsub. For any of those approaches you should emit an event for each element from the_infinite_stream, and your instances of my_generator should be subscribed to those events.
Other approaches can also be used, and the best choice depends on the details of your task and on how you are using the event loop in asyncio. For example:
- You can implement the_infinite_stream (or a wrapper for it) as a class with "cursors" (objects that track the current position in the stream for different subscribers); then each my_generator registers a new cursor and uses it to get the next item in the infinite stream. In this approach the event loop will not automatically revisit my_generator instances, which might be required if those instances "are not equal" (for example, if they need some "priority balancing").
- An intermediate generator calling all the instances of my_generator (as described earlier). In this approach each instance of my_generator is automatically revisited by the event loop. Most likely this approach is thread-safe.
- Event-based approaches:
  - using asyncio.Event. Similar to the use of an intermediate generator. Not thread-safe.
  - aiopubsub.
  - anything that uses the Observer pattern.
- Make the_infinite_generator (or a wrapper for it) a "singleton" that "caches" the latest event. Some approaches were described in other answers. Other "caching" solutions can be used:
  - Emit the same element once for each instance of the_infinite_generator (use a class with a custom __new__ method that tracks instances, or use the same instance of a class that has a method returning a "shifted" iterator over the_infinite_stream) until someone calls a special method on an instance of the_infinite_generator (or on the class): infinite_gen.next_cycle. In this case there should always be some "last finalizing generator/processor" that at the end of each event-loop cycle will do the_infinite_generator().next_cycle().
  - Similar to the previous, but the same event is allowed to fire multiple times in the same my_generator instance (so they should watch for this case). In this approach the_infinite_generator().next_cycle() can be called "periodically" with loop.call_later or loop.call_at. This approach might be needed if "subscribers" should be able to handle/analyze delays, rate limits, timeouts between events, etc.
Many other solutions are possible. It's hard to propose something specific without looking at your current implementation and without knowing the desired behavior of the generators that use the_infinite_stream.
If I understand your description of "shared" streams correctly, you really need "one" the_infinite_stream generator and a "handler" for it. An example that tries to do this:
class StreamHandler:
    def __init__(self):
        self.__real_stream = the_infinite_stream()
        self.__sub_streams = []

    def get_stream(self):
        sub_stream = []  # better some Queue/deque object; a list just shows the basic principle
        self.__sub_streams.append(sub_stream)
        while True:
            while sub_stream:
                yield sub_stream.pop(0)
            next(self)

    def __next__(self):
        next_item = next(self.__real_stream)
        for sub_stream in self.__sub_streams:
            sub_stream.append(next_item)

some_global_variable = StreamHandler()
# Or you can change StreamHandler.__new__ to make it a singleton, or create
# an instance at the point of creation of the event loop
def my_generator():
    for elem in some_global_variable.get_stream():
        yield elem
But if all your my_generator objects are initialized at the same point of the infinite stream, and are "equally" iterated inside the loop, then this approach will introduce "unnecessary" memory overhead for each sub_stream (used as a queue). Unnecessary because those queues will always hold the same items (though that can be optimized: if there is an existing "empty" sub_stream, it can be re-used for new sub_streams with some changes to the "pop logic"). Many other implementations and nuances could be discussed.
If you have a single generator, you can use one queue per "subscriber" and route events to each subscriber as the primary generator produces results.
This has the advantage of allowing the subscribers to move at their own pace, and it can be dropped into existing code with very few changes to the original source.
For example:
def my_gen():
    ...

m1 = Muxer(my_gen)
m2 = Muxer(my_gen)
consumer1(m1).start()
consumer2(m2).start()
As items are pulled from the primary generator they are inserted into queues for each listener. Listeners can subscribe any time by constructing a new Muxer():
import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    Entry = namedtuple('Entry', 'genref listeners lock')

    already = {}
    top_lock = Lock()

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()
        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]
        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners
        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e
Original source code, including test suite:
https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
The unit tests run many threads concurrently processing the same generated events in sequence. The code is order preserving, with a lock acquired during the single generator's access.
Caveats: the version here uses a singleton to gate access; otherwise it would be possible to accidentally evade its control over the contained generators. It also allows the contained generators to be "restartable", which was a useful feature for me at the time. There is no close() feature, simply because I didn't need it. This is an appropriate use case for __del__, however, since the last reference to a listener is the right time to clean up.

strange returning value in a python function

def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair

def car(f):
    def left(a, b):
        return a
    return f(left)

def cdr(f):
    def right(a, b):
        return b
    return f(right)
Found this Python code in a git repo.
Just want to know what f(a, b) in the cons definition is, and how it works.
(Not a function I guess)
cons is a function that takes two arguments and returns a function that takes another function, which will consume those two arguments.
For example, consider the following function:
def add(a, b):
    return a + b
This is just a function that adds the two inputs, so, for instance, add(2, 5) == 7
As this function takes two arguments, we can use cons to call this function:
func_caller = cons(2, 5) # cons receives two arguments and returns a function, which we call func_caller
result = func_caller(add) # func_caller receives a function, that will process these two arguments
print(result) # result is the actual result of doing add(2, 5), i.e. 7
This technique is useful for wrapping functions and executing stuff, before and after calling the appropriate functions.
For example, we can modify our cons function to actually print the values before and after calling add:
def add(a, b):
    print('Adding {} and {}'.format(a, b))
    return a + b

def cons(a, b):
    print('Received arguments {} and {}'.format(a, b))
    def pair(f):
        print('Calling {} with {} and {}'.format(f, a, b))
        result = f(a, b)
        print('Got {}'.format(result))
        return result
    return pair
With this update, we get the following outputs:
func_caller = cons(2, 5)
# prints "Received arguments 2 and 5" from inside cons
result = func_caller(add)
# prints "Calling add with 2 and 5" from inside pair
# prints "Adding 2 and 5" from inside add
# prints "Got 7" from inside pair
This isn't going to make any sense to you until you know what cons, car, and cdr mean.
In Lisp, lists are stored as a very simple form of linked list. A list is either nil (like None) for an empty list, or it's a pair of a value and another list. The cons function takes a value and a list and returns you another list just by making a pair:
def cons(head, rest):
    return (head, rest)
And the car and cdr functions (they stand for "Contents of Address|Data Register", because those are the assembly language instructions used to implement them on a particular 1950s computer, but that isn't very helpful) return the first or second value from a pair:
def car(lst):
    return lst[0]

def cdr(lst):
    return lst[1]
So, you can make a list:
lst = cons(1, cons(2, cons(3, None)))
… and you can get the second value from it:
print(car(cdr(lst)))
… and you can even write functions to get the nth value:
def nth(lst, n):
    if n == 0:
        return car(lst)
    return nth(cdr(lst), n-1)
… or print out the whole list:
def printlist(lst):
    if lst:
        print(car(lst), end=' ')
        printlist(cdr(lst))
If you understand how these work, the next step is to try them on those weird definitions you found.
They still do the same thing. So, the question is: How? And the bigger question is: What's the point?
Well, there's no practical point to using these weird functions; the real point is to show you that everything in computer science can be written with just functions, no built-in data structures like tuples (or even integers; that just takes a different trick).
The key is higher-order functions: functions that take functions as values and/or return other functions. You actually use these all the time: map, sort with a key, decorators, partial… they’re only confusing when they’re really simple:
def car(f):
    def left(a, b):
        return a
    return f(left)
This takes a function, and calls it on a function that returns the first of its two arguments.
And cdr is similar.
It's hard to see how you'd use either of these, until you see cons:
def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair
This takes two things and returns a function that takes another function and applies it to those two things.
So, what do we get from cons(3, None)? We get a function that takes a function, and applies it to the arguments 3 and None:
def pair3(f):
    return f(3, None)
And if we call cons(2, cons(3, None))?
def pair23(f):
    return f(2, pair3)
And what happens if you call car on that function? Trace through it:
def left(a, b):
    return a
return pair23(left)
That pair23(left) does this:
return left(2, pair3)
And left is dead simple:
return 2
So, we got the first element of (2, cons(3, None)).
What if you call cdr?
def right(a, b):
    return b
return pair23(right)
That pair23(right) does this:
return right(2, pair3)
… and right is dead simple, so it just returns pair3.
You can work out that if we call car(cdr(pair23)), we're going to get the 3 out of it.
And now you can write lst = cons(1, cons(2, cons(3, None))), write the recursive nth and printlist functions above, and trace through how they work on lst.
I mentioned above that you can even get rid of integers. How do you do that? Read about Church numerals. You define zero and successor functions. Then you can define one as successor(zero) and two as successor(one). You can even recursively define add so that add(x, zero) is x but add(x, successor(y)) is successor(add(x, y)), and go on to define mul, etc.
You also need a special function you can use as a value for nil.
Anyway, once you've done that, using all of the other definitions above, you can do lst = cons(zero, cons(one, cons(two, cons(three, nil)))), and nth(lst, two) will give you back one. (Of course writing printlist will be a bit trickier…)
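For the curious, here is a minimal sketch of Church numerals in Python along the lines just described (to_int is only a helper, added here for inspecting the results):

zero = lambda f: lambda x: x                         # apply f zero times
successor = lambda n: lambda f: lambda x: f(n(f)(x)) # apply f one more time
one = successor(zero)
two = successor(one)
add = lambda m, n: lambda f: lambda x: m(f)(n(f)(x)) # apply f n times, then m more

def to_int(n):  # decode a Church numeral into a Python int
    return n(lambda k: k + 1)(0)

print(to_int(two), to_int(add(two, two)))            # 2 4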
Obviously, this is all going to be a lot slower than just using tuples and integers and so on. But theoretically, it’s interesting.
Consider this: we could write a tiny dialect of Python that has only three kinds of statements (def, return, and expression statements) and only three kinds of expressions (literals, identifiers, and function calls), and it could do everything normal Python does. (In fact, you could get rid of statements altogether just by having a function-defining expression, which Python already has.) That tiny language would be a pain to use, but it would be a lot easier to write a program that reasons about programs in that tiny language. And we even know how to translate code using tuples, loops, etc. into code in this tiny subset language, which means we can write a program that reasons about real Python code.
In fact, with a couple more tricks (curried functions and/or static function types, and lazy evaluation), the compiler/interpreter could do that kind of reasoning on the fly and optimize our code for us. It's easy to tell programmatically that car(cdr(cons(2, cons(3, None)))) is going to return 3 without actually evaluating most of those function calls, so we can just skip evaluating them and substitute 3 for the whole expression.
Of course this breaks down if any function can have side effects. You obviously can’t just substitute None for print(3) and get the same results. So instead, you need some clever trick where IO is handled by some magic object that evaluates functions to figure out what it should read and write, and then the whole rest of the program, the part that users write, becomes pure and can be optimized however you want. With a couple more abstractions, we can even make IO something that doesn’t have to be magical to do that.
And then you can build a standard library that gives you back all those things we gave up, written in terms of defining and calling functions, so it’s actually usable—but under the covers it’s all just reducing pure function calls, which is simple enough for a computer to optimize. And then you’ve basically written Haskell.

Switch statements for Object composition in Python

I'm having problems designing a (Python) switch pattern that works well with object composition. More specifically, I want to create a function that gets an entity_id as an argument (plus other relevant arguments), then creates an object and matching components for it (possibly using the additional arguments). Here is a toy example:
class Entity:
    def __init__(self, name, animal=None, item=None):
        self.name = name
        self.animal = animal
        self.item = item

class Animal:  # Animal component
    def __init__(self, legs):
        self.legs = legs

class Item:  # Item component
    def __init__(self, quantity):
        self.quantity = quantity
I'd like to have something like:
def place_entity(entity_id, quantity=1):
    switch (entity_id):
        case 'emu':
            animal_component = Animal(2)
            ent = Entity('Emu', animal_component)
            break
        case 'apple':
            item_component = Item(quantity)
            ent = Entity('Apple(s)', item_component)
            break
    return(ent)
It would be easy to produce the above using a for loop and if statements, but is there a better way?
It should be easy to:
- add new types of entities (bananas, nails, sharks, etc.),
- add new components (Edible, for instance, which tells whether the entity in question is edible and how many calories it contains),
without having to change the code in too many places. Note that components sometimes require additional arguments (given in the input of the function).
I have seen switch statements replaced by dictionaries, but my implementation of that (below) turned out horrid. Adding another component requires adding code to every entity function!
Also, I don't know how to pass arguments to the components in an elegant way; additional arguments do not work in this implementation. That is, if I wanted to create an entity (a batch) of apples (let's say quantity=5), I would have to modify every type of entity function to accept a quantity argument (even if it doesn't use it), or modify the quantity after the entity is created (which is not smart, since if one uses if statements then one might as well use a for loop plus if statements).
def create_entity(entity_id, quantity=None):
    def emu():
        animal_component = Animal(2)
        entity_data = {'name': 'Emu', 'animal_component': animal_component,
                       'item_component': None}
        return(entity_data)

    def apple(quantity=1):
        item_component = Item(quantity)
        entity_data = {'name': 'Apple(s)', 'animal_component': None,
                       'item_component': item_component}
        return(entity_data)

    entity_dict = {'emu': emu, 'apple': apple}
    entity_data = entity_dict[entity_id]()
    ent = Entity(entity_data['name'], animal=entity_data['animal_component'],
                 item=entity_data['item_component'])
    return(ent)
You can simulate a switch statement using this function definition:
def switch(v): yield lambda *c: v in c
The generator yields a single test closure, so the for loop below runs its body exactly once; each case(...) call just checks whether entity_id is among the given candidates, and break mirrors real switch syntax. Usage would be very close to what you're looking for:
for case in switch(entity_id):
    if case('emu'):
        animal_component = Animal(2)
        ent = Entity('Emu', animal_component)
        break
    if case('apple'):
        item_component = Item(quantity)
        ent = Entity('Apple(s)', item_component)
        break
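As a further sketch of the dictionary-dispatch direction the question explores: a small registry decorator keeps each entity's construction in one place and forwards extra keyword arguments, so adding a new entity later touches exactly one function. ENTITY_REGISTRY, register, make_emu, make_apple and place_entity are illustrative names, reusing the question's Entity, Animal and Item classes:

ENTITY_REGISTRY = {}

def register(entity_id):
    def decorator(factory):
        ENTITY_REGISTRY[entity_id] = factory  # one registration per entity type
        return factory
    return decorator

@register('emu')
def make_emu(**kwargs):
    return Entity('Emu', animal=Animal(2))

@register('apple')
def make_apple(quantity=1, **kwargs):
    return Entity('Apple(s)', item=Item(quantity))

def place_entity(entity_id, **kwargs):
    # **kwargs lets each factory take only the arguments it cares about
    return ENTITY_REGISTRY[entity_id](**kwargs)

emu = place_entity('emu')
apples = place_entity('apple', quantity=5)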

dynamic class instantiation confusion in python

I asked a similar, yet lousy, question very late last night (Access to instance variable, but not instance method in Python) that caused a fair bit of confusion. I'd delete it if I could, but I can't.
I now can ask my question more clearly.
Background: I'm trying to build a blackjack game to learn Python syntax. Each hand is an instance of the Hand class, and I'm now at the point where I'm trying to allow for hands to be split. So, when it comes time for a hand to be split, I need to create two new hand instances. Given that further splits are possible and that I want to reuse the same methods for re-splitting hands, I (I think) need to dynamically instantiate the Hand class.
Following is a code snippet I'm using to block out the mechanics:
import os
os.system("clear")

class Hand():
    instances = []

    def __init__(self, hand_a, name):
        Hand.instances.append(self)
        self.name = name
        self.hand_a = hand_a

    def show_hand(self):
        ln = len(self.hand_a)
        for x in range(ln):
            print self.hand_a[x]

class Creation():
    def __init__(self):
        pass

    def create_name(self):
        hil = len(Hand.instances)
        new_name = 'hand_' + str(hil + 1)
        return(new_name)

    def new_instance(self):
        new_dict = {0: 'Ace of Clubs', 1: '10 of Diamonds'}
        new_hand_name = {}
        new_hand_name.setdefault(self.create_name(), None)
        print new_hand_name
        new_hand_name[0] = Hand(new_dict, self.create_name())
        print new_hand_name[0]

hand = Hand("blah", 'hand')
hand_z = Hand("blah_z", 'hand_z')
creation = Creation()
creation.new_instance()
Here is the output:
{'hand_3': None}
<__main__.Hand instance at 0x10e0f06c8>
With regard to the instance created by the following statement:
new_hand_name[0] = Hand(new_dict, self.create_name())
Is new_hand_name[0] new the variable that refers to the instance?
Or, is hand_3 the variable?
i.e. when calling an instance method, can I use hand_3.show_hand()?
First, to answer your questions: new_hand_name[0] is the variable that refers to the instance; more specifically, it is the value in the new_hand_name dictionary accessed by the key 0. The new_hand_name dictionary, if you printed it, would look like:
{'hand_3': None, 0: <__main__.Hand instance at 0x10e0f06c8>}
Adding the value of "hand_3" to the dictionary is unnecessary, but for that matter, so is the dictionary.
What you really want to do has nothing to do with dynamic instantiation of classes. The problem is that a Hand might represent a single list of cards, but might also represent a list of lists of cards, each of which has to be played separately. One great way to solve this is to make a separation between a player and a hand, and allow a player to have multiple hands. Imagine this design (I'm leaving out a lot of the blackjack functionality, but leaving a little in to give you an idea of how to work this in with the rest of the program).
def draw_random_card():
    """
    Whatever function returns a new card. Might be in a Deck object; depends
    on your design.
    """
    # some code here

class Player:
    def __init__(self):
        self.hands = []

    def deal(self):
        """add a random hand"""
        self.hands.append(Hand([draw_random_card(), draw_random_card()]))

    def split(self, hand):
        """split the given hand"""
        self.hands.remove(hand)
        self.hands += hand.split()

class Hand:
    def __init__(self, cards):
        self.cards = cards

    def hit(self):
        """add a random card"""
        self.cards.append(draw_random_card())

    def split(self):
        """split and return a pair of Hand objects"""
        return [Hand([self.cards[0], draw_random_card()]),
                Hand([self.cards[1], draw_random_card()])]
Isn't that simpler?
In response to your comment:
You can refer to any specific hand as self.hands[0] or self.hands[1] within the Player class.
If you want to keep track of a particular hand, you can just pass the hand itself around instead of passing a character string referring to that hand. Like this:
def process_hand(hand):
    """do something to a hand of cards"""
    hand.hit()
    print hand.cards
    hand.hit()

h = Hand(cards)
process_hand(h)
This is important: modifications you make to the hand inside the function work on the actual hand itself. Why add the extra step of passing a string that you then have to look up?
Also note that information specific to each hand, such as the bet, should probably be stored in the Hand class itself.
If you are sure you want to refer to each hand with a specific name (and again, it's not necessary in this case), you just use a dictionary with those names as keys:
self.hands = {}
self.hands["hand1"] = Hand([card1, card2])
self.hands["hand2"] = Hand([card1, card2])
print self.hands["hand1"]
But again, there is probably no good reason to do this. (And note that this is very different than instantiating a new variable "dynamically". It would be a good idea to look into how dictionaries work).
