I'm trying to use queue.PriorityQueue in Python 3(.6).
I would like to store objects with a given priority. But if two objects have the same priority, I don't mind which one PriorityQueue.get returns. In other words, my objects can't be compared as integers, and it wouldn't make sense to allow them to be; I just care about the priority.
In Python 3.7's documentation, there's a solution involving dataclasses. And I quote:
If the data elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)
Alas, I'm using Python 3.6. In the documentation of this version of Python, there's no comment on using PriorityQueue with just the priorities, not bothering about the "object value", which it wouldn't be logical to compare in my case.
Is there a better way than to define __le__ and other comparison methods on my custom class? I find this solution particularly ugly and counter-intuitive, but that might be me.
dataclasses is just a convenience to avoid having to create a lot of boilerplate code.
You don't actually have to create a class. A tuple with a unique counter value works too:
from itertools import count
unique = count()
q.put((priority, next(unique), item))
so that ties between entries with equal priority are broken by the counter value that follows; because it is always unique, the item value is never consulted.
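For illustration, a runnable sketch of that idea (the Task class is just a made-up payload that supports no comparisons):

from itertools import count
from queue import PriorityQueue

class Task:
    def __init__(self, name):
        self.name = name

q = PriorityQueue()
unique = count()

q.put((1, next(unique), Task('write spec')))
q.put((1, next(unique), Task('write tests')))   # same priority, no TypeError

priority, _, task = q.get()
print(priority, task.name)  # 1 write spec  (first inserted among the ties)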
You can also create a class using straight-up rich comparison methods, made simpler with @functools.total_ordering:
from functools import total_ordering

@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority == other.priority

    def __lt__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority < other.priority
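Usage with queue.PriorityQueue then looks like this (the dict payloads are arbitrary; only .priority is ever compared, so equal priorities may come back in either order):

from queue import PriorityQueue

q = PriorityQueue()
q.put(PrioritizedItem(2, {'task': 'deploy'}))
q.put(PrioritizedItem(1, {'task': 'build'}))
q.put(PrioritizedItem(1, {'task': 'test'}))

print(q.get().item)  # one of the priority-1 payloads
print(q.get().item)  # the other priority-1 payload
print(q.get().item)  # {'task': 'deploy'}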
See the priority queue implementation notes - just before the section you quoted (regarding using dataclasses), it tells you how to do it without them:
... is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks.
So simply add your items as the 3rd element in a tuple (Prio, Count, YourElem) when adding them to your queue.
Contrived example:
from queue import PriorityQueue

class CompareError(ValueError): pass

class O:
    def __init__(self, n):
        self.n = n

    def __lq__(self):
        # note: this is not a real comparison dunder, so instances of O
        # still do not support '<' at all
        raise CompareError

    def __repr__(self): return str(self)
    def __str__(self): return self.n

def add(prioqueue, prio, item):
    """Adds the 'item' with 'prio' to the 'prioqueue', adding a unique value that
    is stored as a function attribute 'add.n', which is incremented on each use."""
    prioqueue.put((prio, add.n, item))
    add.n += 1

# no len() on PriorityQueue - we keep our unique integer as a function attribute
# if you forget to declare this, you get an AttributeError
add.n = 0

h = PriorityQueue()

add(h, 7, O('release product'))
add(h, 1, O('write spec 3'))
add(h, 1, O('write spec 2'))
add(h, 1, O('write spec 1'))
add(h, 3, O('create tests'))

for _ in range(4):
    item = h.get()
    print(item)
Using h.put( (1, O('write spec 1')) ) leads to
TypeError: '<' not supported between instances of 'O' and 'int'
Using def add(prioqueue, prio, item) instead pushes triplets which have guaranteed distinct 2nd values, so our O()-instances are never used as a tie-breaker.
Output:
(1, 1, write spec 3)
(1, 2, write spec 2)
(1, 3, write spec 1)
(3, 4, create tests)
see Martijn Pieters' answer for a nicer way to generate the unique 2nd element.
Let's assume that we don't want to write a decorator with equivalent functionality to dataclass. The problem is that we don't want to have to define all of the comparison operators in order to make our custom class comparable based on priority. The @functools.total_ordering decorator can help. Excerpt:
Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:
The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.
Using the provided example:
from functools import total_ordering

@total_ordering
class PrioritizedItem:
    # ...

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority
All you need is a wrapper class that implements __lt__ in order for PriorityQueue to work correctly. This is noted here:
The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method
It's as simple as something like this
class PriorityElem:
    def __init__(self, elem_to_wrap):
        self.wrapped_elem = elem_to_wrap

    def __lt__(self, other):
        return self.wrapped_elem.priority < other.wrapped_elem.priority
If your elements do not carry a priority attribute themselves, pass the priority in separately:
class PriorityElem:
    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = priority

    def __lt__(self, other):
        return self.priority < other.priority
Now you can use PriorityQueue like so
queue = PriorityQueue()
queue.put(PriorityElem(my_custom_class1, 10))
queue.put(PriorityElem(my_custom_class2, 10))
queue.put(PriorityElem(my_custom_class3, 30))
first_returned_elem = queue.get()
# first_returned_elem is PriorityElem(my_custom_class1, 10)
second_returned_elem = queue.get()
# second_returned_elem is PriorityElem(my_custom_class2, 10)
third_returned_elem = queue.get()
# third_returned_elem is PriorityElem(my_custom_class3, 30)
Getting at your original elements in that case would be as simple as
elem = queue.get().wrapped_elem
Since you don't care about sort stability that's all you need.
Edit: As noted in the comments and confirmed here, heappush is not stable:
unlike sorted(), this implementation is not stable.
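If you do end up wanting insertion order preserved among equal priorities, one common workaround (a sketch, not part of the answer above) is to keep a monotonically increasing counter in the wrapper and compare on (priority, counter):

import itertools

class StablePriorityElem:
    _counter = itertools.count()

    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = priority
        self.order = next(StablePriorityElem._counter)

    def __lt__(self, other):
        return (self.priority, self.order) < (other.priority, other.order)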
I am trying to sort a list of strings in a way that uses a special comparison. I am trying to use functools.total_ordering, but I'm not sure whether it's filling out the undefined comparisons correctly.
The two I define ( > and ==) work as expected, but < does not. In particular, I print all three and I get that a > b and a < b. How is this possible? I would think that total_ordering would simply define < as not > and not ==. The result of my < test is what you would get with regular str comparison, leading me to believe that total_ordering isn't doing anything.
Perhaps the problem is that I am inheriting str, which already has __lt__ implemented? If so, is there a fix to this issue?
from functools import total_ordering

@total_ordering
class SortableStr(str):

    def __gt__(self, other):
        return self+other > other+self

    # Is this necessary? Or will it default to the inherited class?
    def __eq__(self, other):
        return str(self) == str(other)

def main():
    a = SortableStr("99")
    b = SortableStr("994")
    print(a > b)
    print(a == b)
    print(a < b)

if __name__ == "__main__":
    main()
OUTPUT:
True
False
True
You're right that the built-in str comparison operators are interfering with your code. From the docs
Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest.
So it only supplies the ones not already defined. In your case, the fact that some of them are defined in a parent class is undetectable to total_ordering.
Now, we can dig deeper into the source code and find the exact check
roots = {op for op in _convert if getattr(cls, op, None) is not getattr(object, op, None)}
So it checks if the values are equal to the ones defined on the root class, object. We can make that happen:
@total_ordering
class SortableStr(str):

    __lt__ = object.__lt__
    __le__ = object.__le__
    __ge__ = object.__ge__

    def __gt__(self, other):
        return self+other > other+self

    # Is this necessary? Or will it default to the inherited class?
    def __eq__(self, other):
        return str(self) == str(other)
Now total_ordering will see that __lt__, __le__, and __ge__ are equal to the "original" object values and overwrite them, as desired.
This all being said, I would argue that this is a poor use of inheritance. You're violating Liskov substitution at the very least, in that mixed comparisons between str and SortableStr are going to, to put it lightly, produce counterintuitive results.
My more general recommendation is to favor composition over inheritance and, rather than defining a thing that "is a" specialized string, consider defining a type that "contains" a string and has specialized behavior.
@total_ordering
class SortableStr:

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        return self.value + other.value > other.value + self.value

    def __eq__(self, other):
        return self.value == other.value
There, no magic required. Now SortableStr("99") is a valid object that is not a string but exhibits the behavior you want.
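With that composition-based class, the original test case now behaves consistently, since total_ordering derives __lt__ from the defined __gt__ and __eq__:

a = SortableStr("99")
b = SortableStr("994")

print(a > b)   # True
print(a == b)  # False
print(a < b)   # False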
Not sure if this is correct, but glancing at the documentation of functools.total_ordering, this stands out to me:
Given a class defining one or more rich comparison ordering methods,
this class decorator supplies the rest.
Emphasis mine. Your class inherits __lt__ from str, so it does not get re-implemented by total_ordering since it isn't missing. That's my best guess.
I have a class MyClass, which contains two member variables foo and bar:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
I have two instances of this class, each of which has identical values for foo and bar:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False:
>>> x == y
False
How can I make python consider these two objects equal?
You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            # don't attempt to compare against unrelated types
            return NotImplemented
        return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__:
class MyClass:
    ...

    def __hash__(self):
        # necessary for instances to behave sanely in dicts and sets.
        return hash((self.foo, self.bar))
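With __eq__ and __hash__ both in place (using the complete class from above), equal instances collapse in sets and work interchangeably as dict keys:

x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')

print(x == y)            # True
print(len({x, y}))       # 1
print({x: 'value'}[y])   # 'value'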
A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
You override the rich comparison operators in your object.
class MyClass:
    def __lt__(self, other):
        ...  # return comparison
    def __le__(self, other):
        ...  # return comparison
    def __eq__(self, other):
        ...  # return comparison
    def __ne__(self, other):
        ...  # return comparison
    def __gt__(self, other):
        ...  # return comparison
    def __ge__(self, other):
        ...  # return comparison
Like this:
def __eq__(self, other):
    return self._id == other._id
If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
pickle.dumps(a) == pickle.dumps(b)
pickle is a very common serialization lib for Python objects, and will thus be able to serialize pretty much anything, really. In the above snippet, I'm comparing the serialized bytes from a with those from b. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable

BASE_TYPES = [str, int, float, bool, type(None)]

def base_typed(obj):
    """Recursive reflection method to convert any object property into a comparable form.
    """
    T = type(obj)
    from_numpy = T.__module__ == 'numpy'

    if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(obj, Iterable)):
        return obj

    if isinstance(obj, Iterable):
        base_items = [base_typed(item) for item in obj]
        return base_items if from_numpy else T(base_items)

    d = obj if T is dict else obj.__dict__

    return {k: base_typed(v) for k, v in d.items()}

def deep_equals(*args):
    return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now it doesn't matter what your objects are, deep equality is assured to work
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of comparables doesn't matter as well
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__ like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__ implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>. Treat these situations on a case-by-case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__ method). For that, the second-to-last line in base_typed became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj). Here, dict_from is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
    return not callable(getattr(obj, prop)) and not prop.startswith('_')

def dict_from(obj):
    """Converts dict-like objects into dicts
    """
    if isinstance(obj, dict):
        # Dict and subtypes are directly converted
        d = dict(obj)
    elif '__dict__' in dir(obj):
        # Use standard dict representation when available
        d = obj.__dict__
    elif 'sklearn.tree._tree.Tree' in str(type(obj)):
        # Replaces sklearn trees with their state metadata
        d = obj.__getstate__()
    else:
        # Extract non-callable, non-private attributes with reflection
        kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
        d = {k: v for k, v in kv}

    return {k: base_typed(v) for k, v in d.items()}
Do mind that none of the above methods yield True for objects with the same key-value pairs in a differing order, as in
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
But if you want that you could use Python's built-in sorted method beforehand anyway.
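For example, for flat dicts with sortable keys, sorting the items before pickling makes the comparison order-independent (nested dicts would need the same treatment recursively):

import pickle

a = {'foo': [], 'bar': {}}
b = {'bar': {}, 'foo': []}

# sort the (key, value) pairs so insertion order no longer matters
print(pickle.dumps(sorted(a.items())) == pickle.dumps(sorted(b.items())))  # True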
With Dataclasses in Python 3.7 (and above), a comparison of object instances for equality is an inbuilt feature.
A backport for Dataclasses is available for Python 3.6.
(Py37) nsc#nsc-vbox:~$ python
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass
... class MyClass():
... foo: str
... bar: str
...
>>> x = MyClass(foo="foo", bar="bar")
>>> y = MyClass(foo="foo", bar="bar")
>>> x == y
True
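As an aside, if you also need the instances to be hashable (for sets and dict keys), declaring the dataclass frozen makes Python generate __hash__ as well; a minimal sketch:

from dataclasses import dataclass

@dataclass(frozen=True)
class MyClass:
    foo: str
    bar: str

print(len({MyClass('foo', 'bar'), MyClass('foo', 'bar')}))  # 1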
Implement the __eq__ method in your class; something like this:
def __eq__(self, other):
    return self.path == other.path and self.title == other.title
Edit: if you want your objects to compare equal if and only if they have equal instance dictionaries:
def __eq__(self, other):
    return self.__dict__ == other.__dict__
As a summary:
It's advised to implement __eq__ rather than __cmp__, unless you run Python <= 2.0 (__eq__ was added in 2.1).
Don't forget to also implement __ne__ (it should be something like return not self.__eq__(other) or return not self == other, except in very special cases).
Don't forget that the operator must be implemented in each custom class you want to compare (see example below).
If you want to compare with an object that can be None, you must handle it. The interpreter cannot guess it... (see example below)
class B(object):
    def __init__(self):
        self.name = "toto"

    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name

class A(object):
    def __init__(self):
        self.toto = "titi"
        self.b_inst = B()

    def __eq__(self, other):
        if other is None:
            return False
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)
Depending on your specific case, you could do:
>>> vars(x) == vars(y)
True
See Python dictionary from an object's fields
You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar, name):
        self.foo = foo
        self.bar = bar
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        else:
            # lists of the instance attribute names of each of these objects
            prop_names1 = list(self.__dict__)
            prop_names2 = list(other.__dict__)
            n = len(prop_names1)  # number of properties
            for i in range(n):
                if getattr(self, prop_names1[i]) != getattr(other, prop_names2[i]):
                    return False
            return True
When comparing instances of objects, the __cmp__ function is called.
If the == operator is not working for you by default, you can always redefine the __cmp__ function for the object.
Edit:
As has been pointed out, the __cmp__ function has been removed as of Python 3.0.
Instead you should use the “rich comparison” methods.
I wrote this and placed it in a test/utils module in my project. For cases when it's not a class, just a plain ol' dict, this will traverse both objects and ensure:
every attribute is equal to its counterpart
No dangling attributes exist (attrs that only exist on one object)
It's big... it's not sexy... but oh boy does it work!
def assertObjectsEqual(obj_a, obj_b):
    def _assert(a, b):
        if a == b:
            return
        raise AssertionError(f'{a} !== {b} inside assertObjectsEqual')

    def _check(a, b):
        if a is None or b is None:
            _assert(a, b)
        for k, v in a.items():
            if isinstance(v, dict):
                assertObjectsEqual(v, b[k])
            else:
                _assert(v, b[k])

    # Asserting both directions is more work
    # but it ensures no dangling values on
    # either object
    _check(obj_a, obj_b)
    _check(obj_b, obj_a)
You can clean it up a little by removing the _assert and just using plain ol' assert but then the message you get when it fails is very unhelpful.
The code below works (in my limited testing) by doing a deep compare between two object hierarchies. It handles various cases, including the cases when the objects themselves or their attributes are dictionaries.
from typing import Any

def deep_comp(o1: Any, o2: Any) -> bool:
    # NOTE: dict don't have __dict__
    o1d = getattr(o1, '__dict__', None)
    o2d = getattr(o2, '__dict__', None)

    # if both are objects
    if o1d is not None and o2d is not None:
        # we will compare their dictionaries
        o1, o2 = o1.__dict__, o2.__dict__

    if o1 is not None and o2 is not None:
        # if both are dictionaries, we will compare each key
        if isinstance(o1, dict) and isinstance(o2, dict):
            for k in set().union(o1.keys(), o2.keys()):
                if k in o1 and k in o2:
                    if not deep_comp(o1[k], o2[k]):
                        return False
                else:
                    return False  # some key missing
            return True

    # mismatched object types or both are scalars, or one or both None
    return o1 == o2
This is a very tricky code so please add any cases that might not work for you in comments.
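A quick check with a small made-up class (assuming the deep_comp function above):

class Box:
    def __init__(self, payload):
        self.payload = payload

print(deep_comp(Box({'a': 1, 'b': Box(2)}), Box({'a': 1, 'b': Box(2)})))  # True
print(deep_comp(Box({'a': 1}), Box({'a': 2})))                            # False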
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

    def __repr__(self):
        return str(self.value)

    def __eq__(self, other):
        return self.value == other.value

node1 = Node(1)
node2 = Node(1)
print(f'node1 id:{id(node1)}')
print(f'node2 id:{id(node2)}')
print(node1 == node2)
node1 id:4396696848
node2 id:4396698000
True
Use the setattr function. You might want to use this when you can't add something inside the class itself, say, when you are importing the class.
setattr(MyClass, "__eq__", lambda x, y: x.foo == y.foo and x.bar == y.bar)
If you want to get an attribute-by-attribute comparison, and see if and where it fails, you can use the following list comprehension:
[i for i,j in
zip([getattr(obj_1, attr) for attr in dir(obj_1)],
[getattr(obj_2, attr) for attr in dir(obj_2)])
if not i==j]
The extra advantage here is that you can squeeze it one line and enter in the "Evaluate Expression" window when debugging in PyCharm.
I tried the initial example (see 7 above) and it did not work in ipython. Note that cmp(obj1,obj2) returns a "1" when implemented using two identical object instances. Oddly enough when I modify one of the attribute values and recompare, using cmp(obj1,obj2) the object continues to return a "1". (sigh...)
Ok, so what you need to do is iterate over both objects and compare each attribute using the == sign.
Instances of a class compared with == come out as not equal. The best way is to add the cmp function to your class, which will do the comparison for you.
If you want to do comparison by content you can simply use cmp(obj1, obj2).
In your case, cmp(doc1, doc2). It will return 0 if the contents are the same.
Let's suppose I have a program that creates some scheme with lines and points.
Each line is determined by two points. There are these classes:
class Coordinates(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates

class Line(object):
    def __init__(self, coordinates_1, coordinates_2):
        self.coordinates_1 = coordinates_1
        self.coordinates_2 = coordinates_2
A scheme takes a list of lines and creates a list of unique points.
class Circuit(object):
    def __init__(self, element_list):
        self.line_list = element_list
        self.point_collection = set()
        self.point_collection = self.generate_points()

    def generate_points(self):
        for line in self.line_list:
            coordinates_pair = [line.coordinates_1, line.coordinates_2]
            for coordinates in coordinates_pair:
                self.point_collection.add(Point(coordinates))
        return self.point_collection
What options are there to make a list or collection of unique objects? How can I do it without using sets and sorting, only with loops and conditions? And how can I do it more simply?
UPD: The code I attached doesn't work properly. I tried to add __hash__ and __eq__ methods to the Point class:
class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return True
Then I try to make a scheme with some lines:
element_list=[]
element_list.append(Line(Coordinates(0,0), Coordinates(10,0)))
element_list.append(Line(Coordinates(10,0), Coordinates(10,20)))
circuit = Circuit(element_list)
print(circuit.point_collection)
Two lines here yield four points, of which two have the same coordinates. Hence, the code should print three objects, but it prints only one:
{<__main__.Point object at 0x0083E050>}
Short answer:
You need to implement __hash__() and __eq__() methods in your Point class.
For an idea, see this answer showing a correct and good way to implement __hash__().
Long answer:
The documentation says that:
A set object is an unordered collection of distinct hashable objects. Common uses include (...) removing duplicates from a sequence (...)
And hashable means:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
Which explains why your code does not remove duplicate points.
Consider this implementation that makes all instances of Foo distinct and all instances of Bar equal:
class Foo:
    pass

class Bar:
    def __hash__(self):
        return 0

    def __eq__(self, other):
        return True
Now run:
>>> set([Foo(), Foo()])
{<__main__.Foo at 0x7fb140791da0>, <__main__.Foo at 0x7fb140791f60>}
>>> set([Bar(), Bar()])
{<__main__.Bar at 0x7fb1407c5780>}
In your case, __eq__ should return True when both coordinates are equal, while __hash__ should return a hash of the coordinate pair. See the answer mentioned earlier for a good way to do this.
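For example, a minimal sketch of such a Point, keeping the Coordinates wrapper from the question and comparing/hashing the coordinate pair:

class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.coordinates.x, self.coordinates.y) == (other.coordinates.x, other.coordinates.y)

    def __hash__(self):
        return hash((self.coordinates.x, self.coordinates.y))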
Some remarks:
Your Point class has currently no reason to exist from a design perspective, since it is just a wrapper around Coordinates and offers no additional functionality. You should just use either one of them, for example:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
And why not call coordinates_1 and coordinates_2 just a and b?
class Line(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
Also, your generate_points could be implemented in a more pythonic way:
def generate_points(self):
    return set(p for l in self.line_list for p in (l.a, l.b))
Finally, for easier debugging, your might consider implementing __repr__ and __str__ methods in your classes.
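For instance, with the x/y variant of Point suggested above, a __repr__ could look like this:

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point({self.x}, {self.y})'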
The main function of the class is a dictionary with words as keys and id numbers as values (note: the ids are not sequential because some of the entries have been removed):
x = {'foo':0, 'bar':1, 'king':3}
When I wrote the iterator function for a customdict class I created, it breaks when iterating through range(1, infinity) because of a KeyError.
class customdict():
    def __init__(self, dic):
        self.cdict = dic
        self.inverse = {}

    def keys(self):
        # this is necessary when i try to overload the UserDict.Mixin
        return self.cdict.values()

    def __getitem__(self, valueid):
        """ Iterator function of the inversed dictionary """
        if self.inverse == {}:
            self.inverse = {v: k for k, v in self.cdict.items()}
        return self.inverse[valueid]
x = {'foo':0, 'bar':1, 'king':3}
y = customdict(x)
for i in y:
print i
Without try and except and without accessing len(x), how could I resolve the iteration of the dictionary within the customdict class? Reason being x is very large; len(x) will take too long for realtime.
I've tried UserDict.DictMixin and suddenly it works. Why is that so?
import UserDict

class customdict(UserDict.DictMixin):
    ...
Is there a way to avoid the Mixin? Somehow in __future__ and Python 3, it looks like UserDict.DictMixin is deprecated.
Define the following method:
def __iter__(self):
    for k in self.keys():
        yield k
I've tried UserDict.DictMixin and suddenly it works, why is that so?:
Because DictMixin defines the above __iter__ method for you.
(UserDict.py source code.)
Just share another way:
class customdict(dict):
    def __init__(self, dic):
        dict.__init__(self, {v: k for k, v in dic.items()})
x = {'foo':0, 'bar':1, 'king':3}
y = customdict(x)
for i in y:
print i,y[i]
result:
0 foo
1 bar
3 king
def __iter__(self):
    return iter(self.cdict.itervalues())
In Python3 you'd call values() instead.
You're correct that UserDict.DictMixin is out of date, but it's not the fact that it's a mixin that's the problem, it's the fact that collections.Mapping and collections.MutableMapping use a more sensible underlying interface. So if you want to update from UserDict.DictMixin, you should switch to collections.Mapping and implement __iter__() and __len__() instead of keys().
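For reference, a minimal Python 3 sketch of that suggestion, assuming the same inverse-dictionary idea as the question (collections.abc.Mapping fills in the rest of the read-only dict interface once __getitem__, __iter__ and __len__ exist):

from collections.abc import Mapping

class CustomDict(Mapping):
    def __init__(self, dic):
        self.inverse = {v: k for k, v in dic.items()}

    def __getitem__(self, valueid):
        return self.inverse[valueid]

    def __iter__(self):
        return iter(self.inverse)

    def __len__(self):
        return len(self.inverse)

y = CustomDict({'foo': 0, 'bar': 1, 'king': 3})
for i in y:
    print(i, y[i])   # 0 foo / 1 bar / 3 king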
I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.
According to the heapq documentation, the way to customize the heap order is to have each element on the heap to be a tuple, with the first tuple element being one that accepts normal Python comparisons.
The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.
The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:
# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1

    def pop(self):
        return heapq.heappop(self._data)[2]
(The extra self.index part is to avoid clashes when the evaluated key value is a draw and the stored value is not directly comparable - otherwise heapq could fail with TypeError)
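A usage sketch (the task dictionaries and the 'cost' key are made up for illustration; any key function works):

tasks = [{'name': 'deploy', 'cost': 5}, {'name': 'test', 'cost': 2}, {'name': 'build', 'cost': 2}]

heap = MyHeap(tasks, key=lambda task: task['cost'])
heap.push({'name': 'plan', 'cost': 1})

print(heap.pop()['name'])  # 'plan'  (lowest cost)
print(heap.pop()['name'])  # 'test'  (cost 2, inserted before 'build')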
Define a class in which you override the __lt__() function. See the example below (works in Python 3.7):
import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]

heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]
The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to have each element in the heapq be a triple with the priority, an entry count and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added to the heapq.
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)
Use this for comparing values of objects in heapq
The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items, in the second by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.:
If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining __le__ which seems like a solution.
In Python 3, you can use cmp_to_key from the functools module. cpython source code.
Suppose you need a priority queue of triplets and want to specify the priority using the last element of each triplet.
from heapq import *
from functools import cmp_to_key

def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0   # equal
    else:
        return 1

WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")

# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))

# to get the heap top use the `obj` attribute
inner = pq[0].obj
Performance Test:
Environment
python 3.10.2
Code
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return True if kl > kr else False

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))

def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start + (start := time()))
Results
use list instead of tuple, per function:
WrapperCls1: 16.2ms
WrapperCls1 with __slots__: 9.8ms
WrapperCls2: 8.6ms
move the priority attribute into the first position ( don't support custom predicate ): 6.0ms.
Therefore, this method is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.
Simple and Recent
A simple solution is to store entries as a list of tuples, and for each tuple define the priority in your desired order; if you need a different (descending) order for some field within the tuple, just negate it.
See the official heapq Python documentation on this topic: Priority Queue Implementation Notes.
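A minimal sketch of that idea, using a made-up (priority, timestamp, task) layout where the timestamp is negated so that, among equal priorities, the newest entry comes out first:

import heapq

entries = [(1, -20220103, 'task C'), (1, -20220101, 'task A'), (0, -20220102, 'task B')]
heapq.heapify(entries)

while entries:
    priority, neg_ts, task = heapq.heappop(entries)
    print(priority, -neg_ts, task)
# 0 20220102 task B
# 1 20220103 task C
# 1 20220101 task A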