How do I reverse the order of PriorityQueue in Python?

I have created a simple priority queue in Python that orders items by their value:
import Queue

q = Queue.PriorityQueue()
for it in items:
    q.put((it.value, it))
but when I print the queue using:
while not q.empty():
    print q.get()
it always prints the lowest value first. Is there a way of getting the last item in the queue without changing the last two lines in the top bit of code to:
for it in items:
    q.put((-1 * it.value, it))
because that seems a bit messy, and it creates problems if I want to use that information for something else (I would have to multiply it by -1 again)?

You could just make your own class that inherits from PriorityQueue and does the messy -1 multiplication under the hood for you:
from Queue import PriorityQueue

class ReversePriorityQueue(PriorityQueue):
    def put(self, tup):
        newtup = tup[0] * -1, tup[1]
        PriorityQueue.put(self, newtup)
    def get(self):
        tup = PriorityQueue.get(self)
        newtup = tup[0] * -1, tup[1]
        return newtup
This appears to work with tuples, at least:
Q = ReversePriorityQueue()
In [94]: Q.put((1,1))
In [95]: Q.get()
Out[95]: (1, 1)
In [96]: Q.put((1,1))
In [97]: Q.put((5,5))
In [98]: Q.put((9,9))
In [99]: Q.get()
Out[99]: (9, 9)
In [100]: Q.get()
Out[100]: (5, 5)
In [101]: Q.get()
Out[101]: (1, 1)
I'm sure you could generalize the code to work with more than just tuples from here.
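For instance, one possible generalization (a sketch along the same lines, not a definitive implementation) negates only the first element and passes any remaining payload through unchanged, so tuples of any length work:
class ReversePriorityQueue(PriorityQueue):
    def put(self, tup):
        # negate only the priority; keep the rest of the tuple unchanged
        PriorityQueue.put(self, (-tup[0],) + tuple(tup[1:]))
    def get(self):
        tup = PriorityQueue.get(self)
        return (-tup[0],) + tuple(tup[1:])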

You can make the transformation transparent.
What you want is a new q.get and a new q.put that transparently modify data going into and out of the queue, reversing the order:
# new reversed-priority put method
q.oldput = q.put
q.put = lambda p, i: q.oldput((p * -1, i))

# new reversed-priority get method
q.oldget = q.get

# create a get closure for the queue
def newget(q):
    def g():
        item = q.oldget()
        # reverse the first element
        return item[0] * -1, item[1]
    # return the get method for the given queue
    return g

# bind to the queue
q.get = newget(q)

# test
items = range(10)
for it in items:
    q.put(*(it, it))
while not q.empty():
    print q.get()
If this is to be made into more robust code, I strongly recommend using a class rather than just re-binding the methods.

You can use a custom class instead of a tuple.
Then you can do whatever you like in the comparison method (__cmp__ in Python 2, or the rich comparison methods such as __lt__ in Python 3).
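For instance, a minimal sketch assuming Python 2, where __cmp__ drives the ordering (the Wrapped name is hypothetical):
class Wrapped(object):
    def __init__(self, item):
        self.item = item
    def __cmp__(self, other):
        # compare the other way around, so the largest value is retrieved first
        return cmp(other.item.value, self.item.value)
Putting Wrapped(it) into the queue then makes q.get().item return the highest-valued item first.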

If you'd like to avoid multiplying by -1 when putting to the priority queue, you can do something like this.
Make a wrapper class:
class Wrapper:
    def __init__(self, value):
        self.value = value
    def __lt__(self, next):
        return self.value[0] > next.value[0]
Notice that I've modified the __lt__ function: I've used > instead of < deliberately.
Now, to put an element into the priority queue, do:
from queue import PriorityQueue

pq = PriorityQueue()
pq.put(Wrapper((priority, 'extra data')))
While getting the element, make sure to use the value attribute:
t = pq.get().value

By definition you can only retrieve items from the front of a queue. To reverse the order, you could push everything onto a stack, and then pop items off the stack. Since a stack both adds and retrieves items from its back, this has the same effect as reversing the queue.
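A minimal sketch of that idea, using a plain list as the stack (the helper name is illustrative):
def drain_reversed(q):
    # move everything from the queue onto a stack
    stack = []
    while not q.empty():
        stack.append(q.get())
    # popping the stack yields the queue's items in reverse order
    while stack:
        yield stack.pop()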

In the case of Queue.PriorityQueue, the lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]).
So you could try queue.LifoQueue instead; it uses a LIFO mechanism.
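A quick sketch of that suggestion; note that LifoQueue reverses insertion order, not priority order, so it is only a partial substitute:
from queue import LifoQueue

lifo = LifoQueue()
for x in (1, 5, 9):
    lifo.put(x)
while not lifo.empty():
    print(lifo.get())  # 9, 5, 1: last in, first out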

According to the spec (assuming Python 2.7):
The lowest valued entries are retrieved first
So that would mean, no, you can't.

Unsubscriptable result of generator function

I have a generator function and a function that works with the results of the first one. For example:
def gen():
    a = 2
    b = 3
    yield (a, b)

def func():
    c = gen()[0]
    d = gen()[1]
I get the error "'gen()' is unscriptable".
How can I fix it and work with the result of func?
You have two problems here.
First, generator objects are not sequences, they're iterators. And you can't index an iterator the way you can a sequence, by subscripting it like [1]. You can loop over them with a for statement or a comprehension, or manually call next on them until they're done, but you can't [1] them.
That's why you get an error message that says the generator object is not subscriptable.
Second, you didn't want to subscript the generator anyway. Your generator is an iterable of pairs. It happens to only yield once, but that's no different from a sequence with just one pair in it: it's still not the same thing as a pair.
Consider the nearest sequence equivalent:
def seq():
    a = 2
    b = 3
    return [(a, b)]
Obviously seq()[0] is going to be the tuple (2, 3), and seq()[1] is going to be an IndexError. So, even if you could subscript generators, your code wouldn't make sense.
What you actually want to do is either take the first pair, or loop over all the pairs (I'm not sure which). And then you can do [0] and [1] to the/each pair.
So, either this:
def func():
    for pair in gen():
        c = pair[0]
        d = pair[1]
… or this:
def func():
    pair = next(gen())
    c = pair[0]
    d = pair[1]
Or, if you really wanted to call it twice for some reason, this:
def func():
    for pair in gen():
        c = pair[0]
    for pair in gen():
        d = pair[1]
… or this:
def func():
    c = next(gen())[0]
    d = next(gen())[1]
What you are trying to do is get the first and second elements without iterating over the iterator. You need to iterate over it to get values from it, like this:
for i in gen():
    c, d = i  # you need this because the generator yields a tuple
You can go through this post to learn more about iterators and generators.

Set class with timed auto remove of elements

I'm writing software in Python and I need a class to store a set of elements (order is not relevant, elements are not repeated), like the Python class set, but I need the elements to be auto-removed after a period of time.
For that, I want to override the set.add method, adding an argument with a default value for this timeout.
My problem is with the best way to implement that: threads? gobject-like timeouts?
All suggestions are welcome!
Just an idea (of course, it isn't the best one): Use a dictionary to store the timestamp of each addition plus the concrete timeout for each item. Then, when you want to check if an item is in the set, you have to compare the current time with the value in the dictionary. This way you don't need to start a new thread to remove each item when the timeout is finished (just keep the key in the dictionary, and update it if the item is added again).
With this solution you have to implement __contains__ and __iter__ apart from add, to make sure 'a' in myset and iter(myset) return consistent results.
import time

class TimedSet(set):
    def __init__(self):
        self.__table = {}
    def add(self, item, timeout=1):
        self.__table[item] = time.time() + timeout
        set.add(self, item)
    def __contains__(self, item):
        return time.time() < self.__table.get(item)
    def __iter__(self):
        for item in set.__iter__(self):
            if time.time() < self.__table.get(item):
                yield item
And a possible example of usage:
t_set = TimedSet()
t_set.add('a')
time.sleep(0.6)
print 'a' in t_set
time.sleep(0.6)
print 'a' in t_set
t_set.add('x', 0.3)
t_set.add('y', 0.4)
t_set.add('z', 0.5)
time.sleep(0.35)
for item in t_set:
    print item
I don't see any reason not to go with multithreading, as it is really easy to implement with minimal code. Roughly something like this:
import threading
import time

def ttl_set_remove(my_set, item, ttl):
    time.sleep(ttl)
    my_set.remove(item)

class MySet(set):
    def add(self, item, ttl):
        set.add(self, item)
        t = threading.Thread(target=ttl_set_remove, args=(self, item, ttl))
        t.start()
test:
s = MySet()
s.add('a', 20)
s.add('b', 10)
s.add('c', 2)
print(s)
time.sleep(5)
print(s)
>>>
MySet({'c', 'b', 'a'})
MySet({'b', 'a'})

heapq with custom compare predicate

I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.
According to the heapq documentation, the way to customize the heap order is to have each element on the heap be a tuple, with the first tuple element being one that accepts normal Python comparisons.
The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.
The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:
# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []
    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1
    def pop(self):
        return heapq.heappop(self._data)[2]
(The extra self.index part is there to avoid clashes when the evaluated key value is a tie and the stored values are not directly comparable; otherwise heapq could fail with a TypeError.)
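To see why the index matters, consider pushing two entries whose keys tie and whose payloads define no ordering (a hypothetical Job class for illustration):
import heapq

class Job(object):
    pass  # no comparison methods defined

h = []
heapq.heappush(h, (1, Job()))
heapq.heappush(h, (1, Job()))  # TypeError in Python 3: the tied keys force Job() < Job()
With the (key, index, item) triple, the distinct index values decide ties before the items are ever compared.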
Define a class in which you override the __lt__() function. See the example below (works in Python 3.7):
import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val
    def __repr__(self):
        return f'Node value: {self.val}'
    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]
heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]
The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to have each element in the heapq be a triple with the priority, an entry count, and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added to the heapq.
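A minimal sketch of that recipe (assuming itertools.count for the entry counter; the push and pop helper names are illustrative):
import heapq
from itertools import count

heap = []
counter = count()

def push(item, priority):
    # (priority, entry_count, item): the count breaks ties in insertion order
    heapq.heappush(heap, (priority, next(counter), item))

def pop():
    priority, _, item = heapq.heappop(heap)
    return item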
To compare the values of objects in a heapq, you can attach a comparison method to the class:
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)
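For example, a minimal sketch assuming a simple ListNode class with a val attribute:
import heapq

class ListNode:
    def __init__(self, val):
        self.val = val

setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)

heap = [ListNode(3), ListNode(1), ListNode(2)]
heapq.heapify(heap)
print(heapq.heappop(heap).val)  # 1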
The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items, in the second by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.:
If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining __le__, which seems like a solution.
In Python 3, you can use cmp_to_key from the functools module (see the CPython source code).
Suppose you need a priority queue of triplets, where the priority is the last element of the triplet:
from heapq import *
from functools import cmp_to_key

def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0   # equal
    else:
        return 1

WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")
# to push an object myobj onto pq
heappush(pq, WrapperCls(myobj))
# to get the heap top, use the `obj` attribute
inner = pq[0].obj
Performance Test:
Environment: Python 3.10.2
Code:
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return kl > kr

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))

def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start + (start := time()))
Results (using lists rather than tuples, per function):
WrapperCls1: 16.2ms
WrapperCls1 with __slots__: 9.8ms
WrapperCls2: 8.6ms
priority moved into the first tuple position (no custom predicate supported): 6.0ms
So the cmp_to_key approach is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.
Simple and Recent
A simple solution is to store entries as a list of tuples, with the priority in each tuple arranged to give your desired order. If you need descending order for some element within the tuple, just negate it.
See the official heapq Python documentation on this topic: Priority Queue Implementation Notes.
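A minimal sketch of that approach, negating the priority so larger values pop first:
import heapq

pq = []
heapq.heappush(pq, (-5, 'high'))    # negate the priority for descending order
heapq.heappush(pq, (-1, 'low'))
heapq.heappush(pq, (-3, 'medium'))
while pq:
    priority, item = heapq.heappop(pq)
    print(-priority, item)  # 5 high, then 3 medium, then 1 low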

Get class instance where a certain property has a certain value

I'm sure there is a term for what I'm looking for, or if there's not, there is a very good reason what I'm trying to do is in fact silly.
But anyway. I'm wondering whether there is a (quasi) built-in way of finding a certain class instance that has a property set to a certain value.
An example:
class Klass(object):
    def __init__(self, value):
        self.value = value
    def square_value(self):
        return self.value * self.value
>>> a = Klass(1)
>>> b = Klass(2)
>>> instance = find_instance([a, b], value=1)
>>> instance.square_value()
1
>>> instance = find_instance([a, b], value=2)
>>> instance.square_value()
4
I know that I could write a function that loops through all Klass instances and returns the ones with the requested values. On the other hand, this functionality feels as if it should already exist within Python, and if it doesn't, there must be a very good reason why not. In other words, what I'm trying to do here can probably be done in a much better way.
(And of course, I'm not looking for a way to square a value. The above is just an example of the construct I'm trying to look for).
Use filter:
filter(lambda obj: obj.value == 1, [a, b])
Filter will return a list of objects which meet the requirement you specify. Docs: http://docs.python.org/library/functions.html#filter
Basically, filter(fn, list) iterates over list and applies fn to each item. It collects all of the items for which fn returns true, puts them into a list, and returns them.
NB: filter will always return a list, even if there is only one object which matches. So if you only wanted to return the first instance which matches, you'd have to do something like:
def find_instance(fn, objs):
    all_matches = filter(fn, objs)
    if len(all_matches) == 0:
        return False  # no matches
    else:
        return all_matches[0]
or, better yet,
def find_instance(fn, objs):
    all_matches = filter(fn, objs)
    # uses the fact that 'and' returns its second argument if its
    # first argument evaluates to True
    return len(all_matches) > 0 and all_matches[0]
Then, you would call this function like this:
instance = find_instance(lambda x: x.value == 1, [a, b])
and then instance would be a.
A more efficient version of Ord's answer, if you are looking for just one matching instance, would be:
def find_instance(fn, objs):
    all_matches = (o for o in objs if fn(o))
    return next(all_matches, None)

instance = find_instance(lambda x: x.value == 1, [a, b])
This will stop the search as soon as you find the first match (good if your test function is expensive or your list is large), or return None if there aren't any matches.
Note that the next function is new in Python 2.6; in an older version, I think you have to do
try:
    return all_matches.next()
except StopIteration:
    return None
Of course, if you're just doing this once, you could do it as a one-liner:
instance = next((o for o in [a, b] if o.value == 1), None)
The latter has the advantage of not doing a bunch of function calls and so might be slightly faster, though the difference will probably be trivial.

Statistical accumulator in Python

A statistical accumulator allows one to perform incremental calculations. For instance, to compute the arithmetic mean of a stream of numbers given at arbitrary times, one could make an object which keeps track of the current number of items given, n, and their sum, sum. When one requests the mean, the object simply returns sum/n.
An accumulator like this allows you to compute incrementally in the sense that, when given a new number, you don't need to recompute the entire sum and count.
Similar accumulators can be written for other statistics (cf. the Boost library for a C++ implementation).
How would you implement accumulators in Python? The code I came up with is:
class Accumulator(object):
    """
    Used to accumulate the arithmetic mean of a stream of
    numbers. This implementation does not allow removing items
    already accumulated, but it could easily be modified to do
    so. Also, other statistics could be accumulated.
    """
    def __init__(self):
        # upon initialization, the number of items currently
        # accumulated (_n) and the total sum of the items accumulated
        # (_sum) are set to zero because nothing has been accumulated
        # yet.
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        # 'add' is used to add an item to this accumulator
        try:
            # try to convert the item to a float. If you are
            # successful, add the float to the current sum and
            # increase the number of accumulated items
            self._sum += float(item)
            self._n += 1
        except ValueError:
            # if you fail to convert the item to a float, simply
            # ignore the exception (pass on it and do nothing)
            pass

    @property
    def mean(self):
        # the property 'mean' returns the current mean accumulated in
        # the object
        if self._n > 0:
            # if you have more than zero items accumulated, then return
            # their arithmetic average
            return self._sum / self._n
        else:
            # if you have no items accumulated, return None (you could
            # also raise an exception)
            return None
# using the object:
# Create an instance of the object "Accumulator"
my_accumulator = Accumulator()
print my_accumulator.mean
# prints None because there are no items accumulated
# add one (a number)
my_accumulator.add(1)
print my_accumulator.mean
# prints 1.0
# add two (a string - it will be converted to a float)
my_accumulator.add('2')
print my_accumulator.mean
# prints 1.5
# add a 'NA' (will be ignored because it cannot be converted to float)
my_accumulator.add('NA')
print my_accumulator.mean
# prints 1.5 (notice that it ignored the 'NA')
Interesting design questions arise:
How to make the accumulator thread-safe?
How to safely remove items?
How to architect it in a way that allows other statistics to be plugged in easily (a factory for statistics)?
For a generalized, thread-safe higher-level function, you could use something like the following in combination with the Queue.Queue class and some other bits:
from Queue import Empty

def Accumulator(f, q, storage):
    """Yields successive values of `f` over the accumulation of `q`.

    `f` should take a single iterable as its parameter.

    `q` is a Queue.Queue or derivative.

    `storage` is a persistent sequence that provides an `append` method.
    `collections.deque` may be particularly useful, but a `list` is
    quite acceptable.

    >>> from Queue import Queue
    >>> from collections import deque
    >>> from threading import Thread
    >>> def mean(it):
    ...     vals = tuple(it)
    ...     return sum(vals) / len(vals)
    >>> value_queue = Queue()
    >>> LastThreeAverage = Accumulator(mean, value_queue, deque((), 3))
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         value_queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(LastThreeAverage)
    [0, 1, 2, 4, 6, 8]
    """
    try:
        while True:
            storage.append(q.get(timeout=0.1))
            q.task_done()
            yield f(storage)
    except Empty:
        pass
This generator function evades most of its purported responsibility by delegating it to other entities:
It relies on Queue.Queue to supply its source elements in a thread-safe manner
A collections.deque object can be passed in as the value of the storage parameter; this provides, among other things, a convenient way to only use the last n (in this case 3) values
The function itself (in this case mean) is passed as a parameter. This will result in less-than-optimally efficient code in some cases, but is readily applied to all sorts of situations.
Note that there is a possibility of the accumulator timing out if your producer thread takes longer than 0.1 seconds per value. This is easily remedied by passing a longer timeout or by removing the timeout parameter entirely. In the latter case the function will block indefinitely at the end of the queue; this usage makes more sense in a case where it's being used in a sub thread (usually a daemon thread). Of course you can also parametrize the arguments that are passed to q.get as a fourth argument to Accumulator.
If you want to communicate end of queue, i.e. that there are no more values to come, from the producer thread (here putting_thread), you can pass and check for a sentinel value or use some other method. There is more info in this thread; I opted to write a subclass of Queue.Queue called CloseableQueue that provides a close method.
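A sketch of the sentinel idea (DONE is a hypothetical marker object; the CloseableQueue subclass mentioned above is not reproduced here):
DONE = object()  # unique sentinel marking end of input

def accumulate(f, q, storage):
    while True:
        value = q.get()
        q.task_done()
        if value is DONE:
            return
        storage.append(value)
        yield f(storage)
The producer thread would call q.put(DONE) after its last real value.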
There are various other ways you could customize the behaviour of such a function, for example by limiting the queue size; this is just an example of usage.
edit
As mentioned above, this loses some efficiency because of the necessity of recalculation and also, I think, doesn't really answer your question.
A generator function can also accept values through its send method. So you can write a mean generator function like
def meangen():
    """Yields the accumulated mean of sent values.

    >>> g = meangen()
    >>> g.send(None) # Initialize the generator
    >>> g.send(4)
    4.0
    >>> g.send(10)
    7.0
    >>> g.send(-2)
    4.0
    """
    sum = yield(None)
    count = 1
    while True:
        sum += yield(sum / float(count))
        count += 1
Here the yield expression both brings values (the arguments to send) into the function and passes the calculated values out as the return value of send.
You can pass the generator returned by a call to that function to a more optimizable accumulator generator function like this one:
def EfficientAccumulator(g, q):
    """Similar to Accumulator but sends values to a generator `g`.

    >>> from Queue import Queue
    >>> from threading import Thread
    >>> value_queue = Queue()
    >>> g = meangen()
    >>> g.send(None)
    >>> mean_accumulator = EfficientAccumulator(g, value_queue)
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         value_queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(mean_accumulator)
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    """
    try:
        while True:
            yield(g.send(q.get(timeout=0.1)))
            q.task_done()
    except Empty:
        pass
If I were doing this in Python, there are two things I would do differently:
Separate out the functionality of each accumulator.
Not use @property the way you did.
For the first one, I would likely want to come up with an API for performing an accumulation, perhaps something like:
def add(self, num) # add a number
def compute(self) # compute the value of the accumulator
Then I would create an AccumulatorRegistry that holds onto these accumulators and allows the user to call actions and add to all of them. The code may look like:
class Accumulators(object):
    _accumulator_library = {}

    def __init__(self):
        self.accumulator_library = {}
        for key, value in Accumulators._accumulator_library.items():
            self.accumulator_library[key] = value()

    @staticmethod
    def register(name, accumulator):
        Accumulators._accumulator_library[name] = accumulator

    def add(self, num):
        for accumulator in self.accumulator_library.values():
            accumulator.add(num)

    def compute(self, name):
        return self.accumulator_library[name].compute()

    @staticmethod
    def register_decorator(name):
        def _inner(cls):
            Accumulators.register(name, cls)
            return cls
        return _inner

@Accumulators.register_decorator("Mean")
class Mean(object):
    def __init__(self):
        self.total = 0
        self.count = 0
    def add(self, num):
        self.count += 1
        self.total += num
    def compute(self):
        return self.total / float(self.count)
I should probably speak to your thread-safety question. Python's GIL protects you from a lot of threading issues. There are a few things you may want to do to protect yourself, though:
If these objects are localized to one thread, use threading.local.
If not, you can wrap the operations in a lock, using the with context syntax to deal with holding the lock for you, as in the sketch below.
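For example, a sketch wrapping the Mean accumulator above with a lock (the _lock attribute is an addition, not part of the original code):
import threading

class ThreadSafeMean(object):
    def __init__(self):
        self.total = 0
        self.count = 0
        self._lock = threading.Lock()
    def add(self, num):
        with self._lock:  # the with block acquires and releases the lock
            self.count += 1
            self.total += num
    def compute(self):
        with self._lock:
            return self.total / float(self.count)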
