heapq with custom compare predicate - python

I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.

According to the heapq documentation, the way to customize the heap order is to have each element on the heap be a tuple, with the first tuple element being one that accepts normal Python comparisons.
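For illustration, a minimal sketch of that tuple approach (the key here is just len(), standing in for whatever your predicate computes):

import heapq

# store (key, item) pairs so heapq compares the precomputed keys,
# never the items themselves
items = ["pear", "apple", "banana"]
heap = [(len(s), s) for s in items]
heapq.heapify(heap)
print(heapq.heappop(heap))  # (4, 'pear') - smallest key first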
The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.
The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:
# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1

    def pop(self):
        return heapq.heappop(self._data)[2]
(The extra self.index part avoids clashes when the evaluated key values tie and the stored values are not directly comparable; otherwise heapq could fail with a TypeError.)
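A quick usage sketch, assuming the class above (here the key extracts the first tuple element):

heap = MyHeap(initial=[(3, 'c'), (1, 'a')], key=lambda t: t[0])
heap.push((2, 'b'))
print(heap.pop())  # (1, 'a') - smallest key first
print(heap.pop())  # (2, 'b')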

Define a class and override its __lt__() method. See the example below (works in Python 3.7):
import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]
heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]

The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to have each element in the heap be a triple containing the priority, an entry count, and the element to be inserted. The entry count ensures that elements with the same priority are returned in the order they were added to the heap.
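A minimal sketch of that documented pattern, using itertools.count for the entry count:

import heapq
from itertools import count

counter = count()  # monotonically increasing tie-breaker
heap = []
# object() instances define no ordering, but they are never compared,
# because the unique counter always settles ties in the priority
heapq.heappush(heap, (1, next(counter), object()))
heapq.heappush(heap, (1, next(counter), object()))
priority, _, item = heapq.heappop(heap)  # the entry pushed first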

Use this to compare the values of objects in heapq (note that __lt__ should implement strict less-than, so use < rather than <=):
setattr(ListNode, "__lt__", lambda self, other: self.val < other.val)

The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items, in the second by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.:
If you follow the link in the first comment ("possible duplicate..."), there is another suggestion of defining __le__, which seems like a solution.
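For reference, the key-accepting functions mentioned above (nsmallest/nlargest, in the same module) look like this, as a small illustrative sketch:

import heapq

data = [{'prio': 3}, {'prio': 1}, {'prio': 2}]
# nsmallest/nlargest accept a key, unlike heapify/heappush
print(heapq.nsmallest(2, data, key=lambda d: d['prio']))
# [{'prio': 1}, {'prio': 2}]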

In Python 3, you can use cmp_to_key from the functools module (see the CPython source code).
Suppose you need a priority queue of triplets, with the last element used as the priority.
from heapq import *
from functools import cmp_to_key

def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0   # equal
    else:
        return 1

WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")
# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top, use the `obj` attribute
inner = pq[0].obj
Performance Test:
Environment
python 3.10.2
Code
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return kl > kr

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))

def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start + (start := time()))
Results
Using lists instead of tuples, per function:
WrapperCls1: 16.2ms
WrapperCls1 with __slots__: 9.8ms
WrapperCls2: 8.6ms
priority moved into the first tuple position (no custom predicate support): 6.0ms
Therefore, the cmp_to_key approach is slightly faster than using a custom class with an overridden __lt__() method and the __slots__ attribute.

Simple and recent
A simple solution is to store entries as a list of tuples, where the first element of each tuple defines the priority. If you need descending order for a numeric field, just store its negative.
See the official heapq Python documentation on this topic: Priority Queue Implementation Notes.
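For example, a minimal sketch negating a numeric priority so the largest pops first from the min-heap:

import heapq

# (negated priority, name): the highest priority comes out first
heap = [(-2, "write"), (-1, "ship")]
heapq.heapify(heap)
print(heapq.heappop(heap))  # (-2, 'write')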

Related

How to pass an object with attribute Value in python function

I was working on the sorting, but I'm not able to figure out how the function should be called.
Basically, what I want to do is to create a function that takes a list of object Node with attribute Value and returns a list with the items from the original list stored into sublists. Items of the same value should be in the same sublist and sorted in descending order.
To continue the code, I want to know what the parameter of this function should be.
def advanced_sort(<What will come here according to the call>):
Function call:
advanced_sort([Node(1), Node(2), Node(1),Node(2)])
Can anyone please help me out with the code? Thanks in advance.
advanced_sort takes a single argument: a list (or possibly an arbitrary iterable). As such, the signature only has one argument:
def advanced_sort(nodes):
Ignoring type hints, the signature does not and cannot reflect the internal structure of the single argument; it's just a name to refer to the passed value inside the body of the function.
Inside the body, you can write code that assumes that nodes is a list, and further that each element of the list is a Node instance, so that you can do things like assume each element has a Value attribute.
def advanced_sort(nodes):
    # If nodes is iterable, then x refers to a different
    # element of the iterable each time through the loop.
    for x in nodes:
        # If nodes is a list of Node instances, then
        # x is a Node instance, and thus you can access
        # its Value attribute in the normal fashion.
        print("Found value {}".format(x.Value))
Assuming a definition of Node like
class Node:
    def __init__(self, v):
        self.Value = v
the above definition of advanced_sort will produce the following output:
>>> advanced_sort([Node(3), Node(2), Node(1), Node(2)])
Found value 3
Found value 2
Found value 1
Found value 2
The argument is a single iterable object such as a list, a tuple, a set, ...
Then you iterate over the items as in chepner's response.
For example, you can use a dictionary to group the Nodes by value:
def advanced_sort(node_list):
    ret = dict()
    for node in node_list:
        if node.value not in ret:
            ret[node.value] = list()
        ret[node.value].append(node)
    return [ret[value] for value in sorted(ret.keys(), reverse=True)]  # descending order

advanced_sort([Node(3), Node(2), Node(1), Node(1)])
>>> [[Node(3)], [Node(2)], [Node(1), Node(1)]]
Are you able to make changes to the Node class? In that case, you could do something like this:
from functools import total_ordering

@total_ordering
class Node:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.value < other.value

    def __str__(self):
        return f"({self.value})"

def main():
    from itertools import groupby
    nodes = [Node(1), Node(2), Node(1), Node(2)]
    nodes_sorted = sorted(nodes, reverse=True)
    nodes_sublists = [list(group) for key, group in groupby(nodes_sorted)]
    for sublist in nodes_sublists:
        print(*map(str, sublist))
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
(2) (2)
(1) (1)

Using queue.PriorityQueue, not caring about comparisons

I'm trying to use queue.PriorityQueue in Python 3(.6).
I would like to store objects with a given priority. But if two objects have the same priority, I don't mind which one PriorityQueue.get returns. In other words, my objects can't be compared as integers; it wouldn't make sense to allow them to be. I just care about the priority.
In Python 3.7's documentation, there's a solution involving dataclasses. And I quote:
If the data elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)
Alas, I'm using Python 3.6. In the documentation of this version of Python, there's no comment on using PriorityQueue with just the priorities, ignoring the "object value", which wouldn't be meaningful in my case.
Is there a better way than to define __le__ and other comparison methods on my custom class? I find this solution particularly ugly and counter-intuitive, but that might be me.
dataclasses is just a convenience module to avoid having to create a lot of boilerplate code.
You don't actually have to create a class. A tuple with a unique counter value works too:
from itertools import count
unique = count()
q.put((priority, next(unique), item))
so that ties between equal priority are broken by the integer that follows; because it is always unique the item value is never consulted.
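Put together, a quick sketch of that (the two object() payloads are not comparable, yet both puts still work):

from itertools import count
from queue import PriorityQueue

unique = count()
q = PriorityQueue()
q.put((2, next(unique), object()))
q.put((2, next(unique), object()))  # same priority: the counter breaks the tie
priority, _, item = q.get()         # the entry added first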
You can also create a class using straight-up rich comparison methods, made simpler with @functools.total_ordering:
from functools import total_ordering

@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority == other.priority

    def __lt__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority < other.priority
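A quick usage sketch of that class with PriorityQueue:

from queue import PriorityQueue

q = PriorityQueue()
q.put(PrioritizedItem(2, "low"))
q.put(PrioritizedItem(1, "high"))
print(q.get().item)  # 'high' - smallest priority first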
See the priority queue implementation notes - just before the section you quoted (regarding using dataclasses), it tells you how to do it without them:
... is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks.
So simply add your items as 3rd element in a tuple (Prio, Count, YourElem) when adding to your queue.
Contrived example:
from queue import PriorityQueue

class CompareError(ValueError): pass

class O:
    def __init__(self, n):
        self.n = n
    def __lt__(self, other):
        raise CompareError
    def __repr__(self): return str(self)
    def __str__(self): return self.n

def add(prioqueue, prio, item):
    """Adds the 'item' with 'prio' to the 'prioqueue', adding a unique value that
    is stored as member of this method 'add.n', which is incremented on each usage."""
    prioqueue.put((prio, add.n, item))
    add.n += 1

# no len() on PriorityQueue - we ensure our unique integer via a method attribute;
# if you forget to declare this, you get an AttributeError
add.n = 0
h = PriorityQueue()
add(h, 7, O('release product'))
add(h, 1, O('write spec 3'))
add(h, 1, O('write spec 2'))
add(h, 1, O('write spec 1'))
add(h, 3, O('create tests'))
for _ in range(4):
    item = h.get()
    print(item)
Using h.put( (1, O('write spec 1')) ) leads to a comparison error: with the __lt__ above it raises CompareError; without any __lt__ defined you would get
TypeError: '<' not supported between instances of 'O' and 'int'
Using def add(prioqueue, prio, item): pushes triplets as items, which have guaranteed distinct 2nd values, so our O()-instances are never used as tie-breakers.
Output:
(1, 1, write spec 3)
(1, 2, write spec 2)
(1, 3, write spec 1)
(3, 4, create tests)
See Martijn Pieters' answer above for a nicer unique 2nd element.
Let's assume that we don't want to write a decorator with equivalent functionality to dataclass. The problem is that we don't want to have to define all of the comparison operators in order to make our custom class comparable based on priority. The @functools.total_ordering decorator can help. Excerpt:
Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:
The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.
Using the provided example:
from functools import total_ordering

@total_ordering
class PrioritizedItem:
    # ...
    def __eq__(self, other):
        return self.priority == other.priority
    def __lt__(self, other):
        return self.priority < other.priority
All you need is a wrapper class that implements __lt__ in order for PriorityQueue to work correctly. This is noted here:
The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method
It's as simple as something like this
class PriorityElem:
    def __init__(self, elem_to_wrap):
        self.wrapped_elem = elem_to_wrap

    def __lt__(self, other):
        return self.wrapped_elem.priority < other.wrapped_elem.priority
If your elements do not have priorities then it's as simple as:
class PriorityElem:
    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = priority

    def __lt__(self, other):
        return self.priority < other.priority
Now you can use PriorityQueue like so
queue = PriorityQueue()
queue.put(PriorityElem(my_custom_class1, 10))
queue.put(PriorityElem(my_custom_class2, 10))
queue.put(PriorityElem(my_custom_class3, 30))
first_returned_elem = queue.get()
# one of the two priority-10 elements (heap order is not stable, see the edit below)
second_returned_elem = queue.get()
# the other priority-10 element
third_returned_elem = queue.get()
# PriorityElem(my_custom_class3, 30)
Getting at your original elements in that case would be as simple as
elem = queue.get().wrapped_elem
Since you don't care about sort stability that's all you need.
Edit: As noted in the comments and confirmed here, heappush is not stable:
unlike sorted(), this implementation is not stable.

How do I reverse the order of PriorityQueue in python?

I have created a simple priority queue in python that orders items by their value:
import Queue
q = Queue.PriorityQueue()
for it in items:
    q.put((it.value, it))
but when I print the queue using:
while not q.empty():
    print q.get()
it will always print the lowest value first. Is there a way of getting the last item in a queue without changing the last two lines of the top bit of code to:
for it in items:
    q.put((-1*it.value, it))
because that seems a bit messy, and it creates problems if I want to use that information for something else (I would have to multiply it by -1 again).
You could just make your own class that inherits from PriorityQueue and does the messy -1 multiplication under the hood for you:
class ReversePriorityQueue(PriorityQueue):
    def put(self, tup):
        newtup = tup[0] * -1, tup[1]
        PriorityQueue.put(self, newtup)

    def get(self):
        tup = PriorityQueue.get(self)
        newtup = tup[0] * -1, tup[1]
        return newtup
This appears to work with tuples, at least:
Q = ReversePriorityQueue()
In [94]: Q.put((1,1))
In [95]: Q.get()
Out[95]: (1, 1)
In [96]: Q.put((1,1))
In [97]: Q.put((5,5))
In [98]: Q.put((9,9))
In [99]: Q.get()
Out[99]: (9, 9)
In [100]: Q.get()
Out[100]: (5, 5)
In [101]: Q.get()
Out[101]: (1, 1)
I'm sure you could generalize the code to work with more than just tuples from here.
You can make the transformation transparent.
What you want is a new q.get and a new q.put that transparently modify data going into and out of the queue to reverse the order:
# new reversed-priority put method
q.oldput = q.put
q.put = lambda p, i: q.oldput((p * -1, i))

# new reversed-priority get method
q.oldget = q.get

# create get closure for queue
def newget(q):
    def g():
        item = q.oldget()
        # reverse first element
        return item[0] * -1, item[1]
    # return get method for given queue
    return g

# bind to queue
q.get = newget(q)

# test
items = range(10)
for it in items:
    q.put(*(it, it))
while not q.empty():
    print q.get()
If this is to be made into more robust code, I strongly recommend using a class rather than just re-binding the methods.
You can use a custom class instead of a tuple.
Then you can do whatever you like in its __cmp__ method.
If you'd like to avoid multiplying by -1 when putting items into the priority queue, you can do something like this.
Make a wrapper class:
class Wrapper:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value[0] > other.value[0]
Notice that in the __lt__ method I've used > instead of < deliberately.
Now, to put an element into the priority queue, do:
from queue import PriorityQueue
pq = PriorityQueue()
pq.put(Wrapper((priority, 'extra data')))
Now, when getting an element, make sure to use the value attribute:
t = pq.get().value
By definition you can only retrieve items from the front of a queue. To reverse the order, you could push everything onto a stack, and then pop items off the stack. Since a stack both adds and retrieves items from its back, this has the same effect as reversing the queue.
In the case of Queue.PriorityQueue
The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]).
So you can try queue.LifoQueue, which retrieves items in LIFO order.
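A rough sketch of that stack-based reversal (Python 3 spelling, assuming the priority queue is fully filled before draining):

from queue import PriorityQueue, LifoQueue

pq = PriorityQueue()
for v in (3, 1, 2):
    pq.put(v)

# drain the min-first queue into a stack; popping the stack
# then yields the items largest-first
stack = LifoQueue()
while not pq.empty():
    stack.put(pq.get())
while not stack.empty():
    print(stack.get())  # 3, then 2, then 1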
According to the spec (assuming python 2.7):
The lowest valued entries are retrieved first
So that would mean, no, you can't.

Iterating through a dictionary of a class object without mixin - python

The main function of the class is a dictionary with words as keys and id numbers as values (note: the ids are not sequential because some of the entries have been removed):
x = {'foo':0, 'bar':1, 'king':3}
When I wrote the iterator function for a customdict class I created, it breaks when iterating through range(1 to infinity) because of a KeyError (some ids are missing).
class customdict():
    def __init__(self, dic):
        self.cdict = dic
        self.inverse = {}

    def keys(self):
        # this is necessary when I try to overload the UserDict.Mixin
        return self.cdict.values()

    def __getitem__(self, valueid):
        """ Iterator function of the inversed dictionary """
        if self.inverse == {}:
            self.inverse = {v: k for k, v in self.cdict.items()}
        return self.inverse[valueid]

x = {'foo':0, 'bar':1, 'king':3}
y = customdict(x)
for i in y:
    print i
Without try/except and without accessing len(x), how could I resolve the iteration of the dictionary within the customdict class? The reason is that x is very large, so len(x) will take too long for real-time use.
I've tried UserDict.DictMixin and suddenly it works. Why is that so?
import UserDict
class customdict(UserDict.DictMixin):
    ...
Is there a way to avoid the mixin? Somewhere between __future__ and Python 3, UserDict.DictMixin looks like it's deprecated.
Define the following method:
def __iter__(self):
    for k in self.keys():
        yield k
I've tried UserDict.DictMixin and suddenly it works, why is that so?:
Because DictMixin defines the above __iter__ method for you.
(UserDict.py source code.)
Just share another way:
class customdict(dict):
    def __init__(self, dic):
        dict.__init__(self, {v: k for k, v in dic.items()})

x = {'foo':0, 'bar':1, 'king':3}
y = customdict(x)
for i in y:
    print i, y[i]
result:
0 foo
1 bar
3 king
def __iter__(self):
    return iter(self.cdict.itervalues())
In Python3 you'd call values() instead.
You're correct that UserDict.DictMixin is out of date, but it's not the fact that it's a mixin that's the problem, it's the fact that collections.Mapping and collections.MutableMapping use a more sensible underlying interface. So if you want to update from UserDict.DictMixin, you should switch to collections.Mapping and implement __iter__() and __len__() instead of keys().
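A sketch of that switch, using the Python 3 location collections.abc (the same ABCs the old collections names pointed to):

from collections.abc import Mapping

class CustomDict(Mapping):
    # Mapping supplies keys(), items(), values(), __contains__, get(), ...
    # once __getitem__, __iter__ and __len__ are defined.
    def __init__(self, dic):
        self._data = dict(dic)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

y = CustomDict({'foo': 0, 'bar': 1, 'king': 3})
for k in y:
    print(k, y[k])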

Statistical accumulator in Python

A statistical accumulator allows one to perform incremental calculations. For instance, for computing the arithmetic mean of a stream of numbers given at arbitrary times, one could make an object which keeps track of the current number of items given, n, and their sum, sum. When one requests the mean, the object simply returns sum/n.
An accumulator like this allows you to compute incrementally in the sense that, when given a new number, you don't need to recompute the entire sum and count.
Similar accumulators can be written for other statistics (cf. boost library for a C++ implementation).
How would you implement accumulators in Python? The code I came up with is:
class Accumulator(object):
    """
    Used to accumulate the arithmetic mean of a stream of
    numbers. This implementation does not allow removal of items
    already accumulated, but it could easily be modified to do
    so. Also, other statistics could be accumulated.
    """
    def __init__(self):
        # upon initialization, the number of items currently
        # accumulated (_n) and the total sum of the items accumulated
        # (_sum) are set to zero because nothing has been accumulated
        # yet.
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        # 'add' is used to add an item to this accumulator
        try:
            # try to convert the item to a float. If you are
            # successful, add the float to the current sum and
            # increase the number of accumulated items
            self._sum += float(item)
            self._n += 1
        except ValueError:
            # if you fail to convert the item to a float, simply
            # ignore the exception (pass on it and do nothing)
            pass

    @property
    def mean(self):
        # the property 'mean' returns the current mean accumulated in
        # the object
        if self._n > 0:
            # if you have more than zero items accumulated, then return
            # their arithmetic average
            return self._sum / self._n
        else:
            # if you have no items accumulated, return None (you could
            # also raise an exception)
            return None

# using the object:
# Create an instance of the object "Accumulator"
my_accumulator = Accumulator()
print my_accumulator.mean
# prints None because there are no items accumulated
# add one (a number)
my_accumulator.add(1)
print my_accumulator.mean
# prints 1.0
# add two (a string - it will be converted to a float)
my_accumulator.add('2')
print my_accumulator.mean
# prints 1.5
# add a 'NA' (will be ignored because it cannot be converted to float)
my_accumulator.add('NA')
print my_accumulator.mean
# prints 1.5 (notice that it ignored the 'NA')
Interesting design questions arise:
How to make the accumulator thread-safe?
How to safely remove items?
How to architect it in a way that allows other statistics to be plugged in easily (a factory for statistics)?
For a generalized, threadsafe higher-level function, you could use something like the following in combination with the Queue.Queue class and some other bits:
from Queue import Empty

def Accumulator(f, q, storage):
    """Yields successive values of `f` over the accumulation of `q`.

    `f` should take a single iterable as its parameter.

    `q` is a Queue.Queue or derivative.

    `storage` is a persistent sequence that provides an `append` method.
    `collections.deque` may be particularly useful, but a `list` is quite acceptable.

    >>> from Queue import Queue
    >>> from collections import deque
    >>> from threading import Thread
    >>> def mean(it):
    ...     vals = tuple(it)
    ...     return sum(vals) / len(vals)
    >>> value_queue = Queue()
    >>> LastThreeAverage = Accumulator(mean, value_queue, deque((), 3))
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(LastThreeAverage)
    [0, 1, 2, 4, 6, 8]
    """
    try:
        while True:
            storage.append(q.get(timeout=0.1))
            q.task_done()
            yield f(storage)
    except Empty:
        pass
This generator function evades most of its purported responsibility by delegating it to other entities:
It relies on Queue.Queue to supply its source elements in a thread-safe manner
A collections.deque object can be passed in as the value of the storage parameter; this provides, among other things, a convenient way to only use the last n (in this case 3) values
The function itself (in this case mean) is passed as a parameter. This will result in less-than-optimally efficient code in some cases, but is readily applied to all sorts of situations.
Note that there is a possibility of the accumulator timing out if your producer thread takes longer than 0.1 seconds per value. This is easily remedied by passing a longer timeout or by removing the timeout parameter entirely. In the latter case the function will block indefinitely at the end of the queue; this usage makes more sense in a case where it's being used in a sub thread (usually a daemon thread). Of course you can also parametrize the arguments that are passed to q.get as a fourth argument to Accumulator.
If you want to communicate end of queue, i.e. that there are no more values to come, from the producer thread (here putting_thread), you can pass and check for a sentinel value or use some other method. There is more info in this thread; I opted to write a subclass of Queue.Queue called CloseableQueue that provides a close method.
There are various other ways you could customize the behaviour of such a function, for example by limiting the queue size; this is just an example of usage.
edit
As mentioned above, this loses some efficiency because of the necessity of recalculation and also, I think, doesn't really answer your question.
A generator function can also accept values through its send method. So you can write a mean generator function like
def meangen():
    """Yields the accumulated mean of sent values.

    >>> g = meangen()
    >>> g.send(None) # Initialize the generator
    >>> g.send(4)
    4.0
    >>> g.send(10)
    7.0
    >>> g.send(-2)
    4.0
    """
    sum = yield(None)
    count = 1
    while True:
        sum += yield(sum / float(count))
        count += 1
Here the yield expression is bringing values (the arguments to send) into the function, while simultaneously passing the calculated values out as the return value of send.
You can pass the generator returned by a call to that function to a more optimizable accumulator generator function like this one:
def EfficientAccumulator(g, q):
    """Similar to Accumulator but sends values to a generator `g`.

    >>> from Queue import Queue
    >>> from threading import Thread
    >>> value_queue = Queue()
    >>> g = meangen()
    >>> g.send(None)
    >>> mean_accumulator = EfficientAccumulator(g, value_queue)
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(mean_accumulator)
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    """
    try:
        while True:
            yield(g.send(q.get(timeout=0.1)))
            q.task_done()
    except Empty:
        pass
If I were doing this in Python, there are two things I would do differently:
Separate out the functionality of each accumulator.
Not use @property the way you did.
For the first one, I would likely want to come up with an API for performing an accumulation, perhaps something like:
def add(self, num) # add a number
def compute(self) # compute the value of the accumulator
Then I would create an AccumulatorRegistry that holds onto these accumulators and allows the user to call actions on and add to all of them. The code may look like:
class Accumulators(object):
    _accumulator_library = {}

    def __init__(self):
        self.accumulator_library = {}
        for key, value in Accumulators._accumulator_library.items():
            self.accumulator_library[key] = value()

    @staticmethod
    def register(name, accumulator):
        Accumulators._accumulator_library[name] = accumulator

    def add(self, num):
        for accumulator in self.accumulator_library.values():
            accumulator.add(num)

    def compute(self, name):
        return self.accumulator_library[name].compute()

    @staticmethod
    def register_decorator(name):
        def _inner(cls):
            Accumulators.register(name, cls)
            return cls
        return _inner

@Accumulators.register_decorator("Mean")
class Mean(object):
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, num):
        self.count += 1
        self.total += num

    def compute(self):
        return self.total / float(self.count)
I should probably speak to your thread-safety question. Python's GIL protects you from a lot of threading issues. There are a few things you may want to do to protect yourself, though:
If these objects are localized to one thread, use threading.local.
If not, you can wrap the operations in a lock, using the with context syntax to deal with holding the lock for you (a minimal sketch follows below).
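For the second option, a minimal sketch (the ThreadSafeMean name is just for illustration, not from the post above):

import threading

class ThreadSafeMean(object):
    def __init__(self):
        self._lock = threading.Lock()
        self._n = 0
        self._total = 0.0

    def add(self, num):
        # 'with' acquires the lock and guarantees it is released,
        # so concurrent add() calls cannot interleave the two updates
        with self._lock:
            self._total += num
            self._n += 1

    def compute(self):
        with self._lock:
            return self._total / self._n if self._n else None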
