How can I hash an object with two symmetrically equivalent characteristics? - python

I have an object (Edge) which contains two other objects (points A and B) in 3D. Geometrically, an edge from A = (0, 0, 0) to B = (1, 0, 0) should be the same as an edge from A = (1, 0, 0) to B = (0, 0, 0), and it's easy to make an equality statement of two edges. However, I'm having some conceptual problems implementing a way to hash this object (in Python). For example, hash((A, B)) will return a different value from hash((B, A)).
I've seen answers about similar problems on this site, but they all involve making a comparison between the two elements. I don't really want to do this, because while I can think of a rigorous way to compare two points (compare x-coordinates first, then y-coordinates if x are equal, then z's if y's are equal), I don't know if I want to implement a comparison which seems meaningless mathematically and only useful for this single instance. The statement (1, 0, 0) > (0, 300, 10^10) might be correct with this method, but it isn't very meaningful.
class Edge(object):
    def __init__(self, pointA, pointB):
        self._A = pointA
        self._B = pointB
        ab = pointA + pointB
        self._midpoint = Vector(ab.x / 2, ab.y / 2, ab.z / 2)

    def get_A(self):
        return self._A

    def set_A(self, point):
        self._A = point

    def get_B(self):
        return self._B

    def set_B(self, point):
        self._B = point

    A = property(get_A, set_A)
    B = property(get_B, set_B)

    def __eq__(self, other):
        if isinstance(other, Edge):
            if (self.A == other.A) and (self.B == other.B):
                return True
            elif (self.B == other.A) and (self.A == other.B):
                return True
            else:
                return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((self.A, self.B))  # =/= hash((self.B, self.A))!

    def __str__(self):
        return "[{}, {}]".format(self.A, self.B)
In conclusion, I'm wondering if there's an implementation which will give two equivalent edges the same hash value without creating some arbitrary comparison function between points. (P.S. my "point" class is called "Vector")

Combine the hashes of A and B with XOR:
    def __hash__(self):
        return hash(self.A) ^ hash(self.B)
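XOR works here because it is commutative, so swapping the endpoints cannot change the result. The following is a minimal sketch of that idea, assuming the points are immutable and hashable; plain tuples stand in for the asker's Vector class, and the attribute names a and b are illustrative only. One thing to be aware of is that an edge whose two endpoints are equal always hashes to 0 with XOR. Hashing a frozenset of the endpoints is another order-independent option, shown at the end.

```python
class Edge:
    """Undirected edge sketch; `a` and `b` are assumed to be hashable points."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, Edge):
            return NotImplemented
        # Equal if the endpoints match in either order.
        return (self.a, self.b) in ((other.a, other.b), (other.b, other.a))

    def __hash__(self):
        # XOR is commutative, so endpoint order does not affect the hash.
        return hash(self.a) ^ hash(self.b)


e1 = Edge((0, 0, 0), (1, 0, 0))
e2 = Edge((1, 0, 0), (0, 0, 0))

same_hash = hash(e1) == hash(e2)
unique_edges = {e1, e2}

# Alternative: a frozenset ignores order as well.
fs_hash_1 = hash(frozenset(((0, 0, 0), (1, 0, 0))))
fs_hash_2 = hash(frozenset(((1, 0, 0), (0, 0, 0))))
```

Neither variant needs an ordering between points; both rely only on the points' own __hash__ and __eq__.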

Related

Python Set Intersection on Object Attributes

I have 2 sets, each containing objects of the following class:
class pointInfo:
    x = 0
    y = 0
    steps = 0
I want to find the intersection of the two sets, but only on the x and y values.
something like:
def findIntersections(pointInfoSet1, pointInfoSet2):
    return [pointInfo for pointInfo in pointInfoSet1 if (pointInfo.x, pointInfo.y) in pointInfoSet2]
I know that if you have 2 sets in Python you can just do set1.intersection(set2), but that won't work in this case because I just want to find where a certain subset of the object attributes are the same, not identical objects. Thanks in advance!
Here's a solution that makes Point objects that are Hashable on their x and y attributes:
class Point:
    def __init__(self, x=0, y=0, step=0):
        self.x = x
        self.y = y
        self.step = step

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

    def __repr__(self):
        return "Point(x={0.x}, y={0.y}, step={0.step})".format(self)
We can then put them in sets and get intersections naturally:
{Point(1, 1, 1), Point(2, 2)} & {Point(1, 1, 2)}
# {Point(x=1, y=1, step=2)}
Assuming no wire crosses itself, a better solution might be to iterate through both wires, maintaining a dict of points to steps, and outputting the sum of the steps at each intersection:
from itertools import chain

def stepsToIntersections(wire1, wire2):
    seen = {}
    for point in chain(wire1, wire2):
        if point in seen:
            yield point.step + seen[point]
        else:
            seen[point] = point.step

closest = min(stepsToIntersections(wire1, wire2))
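To see the generator in action, here is a small self-contained run. The two wires are made-up sample data (not from the question), each a list of Point objects with increasing step counts, and they meet only at (1, 2).

```python
from itertools import chain


class Point:
    """Point that hashes and compares on x and y only (step is ignored)."""

    def __init__(self, x=0, y=0, step=0):
        self.x = x
        self.y = y
        self.step = step

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


def stepsToIntersections(wire1, wire2):
    seen = {}
    for point in chain(wire1, wire2):
        if point in seen:
            # Same (x, y) seen before: combine both step counts.
            yield point.step + seen[point]
        else:
            seen[point] = point.step


# Hypothetical sample wires; they share only the position (1, 2).
wire1 = [Point(0, 1, 1), Point(0, 2, 2), Point(1, 2, 3)]
wire2 = [Point(1, 0, 1), Point(1, 1, 2), Point(1, 2, 3)]

closest = min(stepsToIntersections(wire1, wire2))
print(closest)  # 6
```

Note that the lookup `point in seen` only works because Point defines both __eq__ and __hash__ on (x, y).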

comparing two list with custom objects, not behaving as I expect (Lexicography comparison) - Python

I have two Python lists.
a = [A(1), B(1)]
b = [A(1), B(2)]
The check a < b does not get to call B's __lt__ operator. The conclusion is that a is not smaller than b.
I have verified that A's __lt__ is called (actually twice to see if the first element in a is smaller than the one in b, and then the other way around).
Thanks in advance,
Oren
When you compare two lists in Python, it compares them element-by-element and stops comparing them after finding two unequal elements. This doesn't mean that one element has to be greater or less than the other, just that they have to be unequal. Here's a naive example using something that I think is similar to what you have; consider a class A:
class A:
    def __init__(self, val):
        self.val = val

    def __lt__(self, obj):
        return self.val < obj.val
Now consider two objects a and b such that a = A(1) and b = A(1). a < b evaluates to False like we'd expect, but a == b also evaluates to False. This is because the object has no way to compare equality through an __eq__ method, and the objects are not the exact same instance. We can add one like so:
    def __eq__(self, obj):
        return self.val == obj.val
Now, a == b will evaluate to True and your original expression will work as expected.
Python will only compare the second items if the first items were found to be equal. If the second items are not being compared this implies the first items were not equal.
So the issue likely lies with the __lt__ implementation for A, if you want to post that code we might be able to help spot the issue.
The Python documentation clearly states that __lt__ gets called on the first pair of objects that are not equal. In your example you did not mention whether you implemented the __eq__ operator, so this is my reproduction:
class A:
    def __init__(self, val):
        self.val = val

    def __lt__(self, other):
        print('lt in A')
        return self.val < other.val

class B:
    def __init__(self, val):
        self.val = val

    def __lt__(self, other):
        print('lt in B')
        return self.val < other.val

a = [A(1), B(2)]
b = [A(1), B(1)]
print(a < b)
which outputs:
lt in A
False
because the first objects are different instances (although they hold the same val), so they do not compare equal and the list comparison takes the result of that __lt__ call.
When you implement the __eq__ method, the comparison continues to the second pair:
class A:
    def __init__(self, val):
        self.val = val

    def __lt__(self, other):
        print('lt in A')
        return self.val < other.val

    def __eq__(self, other):
        return self.val == other.val
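With the lists above, this will output
lt in B
False
because the first elements now compare equal through __eq__, so the comparison moves on to the second pair, where B(2) < B(1) is False. With the lists from the question (B(1) on the left, B(2) on the right) the result is True.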
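The behaviour described in these answers can be checked with a short script. This is only a sketch: instead of printing, each __lt__ records its name in a list called calls (a helper introduced here for illustration), and both classes define __eq__ so that equal leading elements are skipped.

```python
calls = []  # records which __lt__ methods actually run


class A:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, A) and self.val == other.val

    def __lt__(self, other):
        calls.append("A.__lt__")
        return self.val < other.val


class B:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, B) and self.val == other.val

    def __lt__(self, other):
        calls.append("B.__lt__")
        return self.val < other.val


a = [A(1), B(1)]
b = [A(1), B(2)]

# The A elements compare equal, so only the B elements are ordered with <.
result = a < b
print(result, calls)
```

The list comparison first looks for the earliest pair that is not equal using ==, and only then calls < on that one pair.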

Class Custom __eq__ as Comparison of Hashes

Consider a custom class:
class MyObject:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash((self.a, self.b))

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__hash__() == other.__hash__()
Is it a bad idea to make equality reliant upon the hash? This seems like a much more elegant and readable mechanism than checking each pair of attributes in a piecemeal fashion for larger numbers of attributes ala
self.a == other.a and self.b == other.b and ... self.n == other.n
or a more dynamic check using getattr and a list (is there a better way to compare large numbers of pairs of attributes?)
Is the size of the hash returned by the builtin hash function not large enough to be reliable in relatively large sets of data?
Yes, this is a bad idea. Hashes are not unique; objects with equal hashes are not guaranteed to be equal as well:
>>> (-1, 0) == (-2, 0)
False
>>> hash((-1, 0)) == hash((-2, 0))
True
Hashes are not meant to be unique; they are a means to pick a slot in a limited-size hash table quickly, to facilitate O(1) dictionary look-ups, and collisions are allowed and expected.
Yes, Python requires that equal objects should have equal hashes, but that doesn't mean the relationship can be reversed.
I just compare tuples:
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
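Put together, one way to keep __eq__ and __hash__ consistent is to derive both from the same tuple of attributes. The sketch below assumes that approach; the _key helper is a name made up for this example and not part of any API.

```python
class MyObject:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def _key(self):
        # Single place that lists the attributes which define identity.
        return (self.a, self.b)

    def __eq__(self, other):
        if not isinstance(other, MyObject):
            return NotImplemented
        # Compare the actual values, never the hashes.
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


x = MyObject(-1, 0)
y = MyObject(-2, 0)
print(x == y)                                       # False
print(MyObject(42, 12345) == MyObject(42, 12345))   # True
```

Equal objects always get equal hashes this way, while two objects whose hashes happen to collide are still told apart by the tuple comparison.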
If you are writing a lot of data classes (simple classes that all need equality testing, hashing, etc.), use the dataclasses module (Python 3.7 or up, or use a backport):
from dataclasses import dataclass

@dataclass(frozen=True)
class MyObject:
    a: int
    b: int
The above class now comes with a __hash__ and an __eq__ method:
>>> MyObject(-1, 0) == MyObject(-2, 0)
False
>>> hash(MyObject(-1, 0)) == hash(MyObject(-2, 0))
True
>>> MyObject(42, 12345) == MyObject(42, 12345)
True

merging tuples in heap module in python

I wanted to know about the merging behavior of heapq.merge().
How does heapq.merge() decide the order when merging lists of tuples?
I am given two lists each with a 3-tuple,
A = [(a, b, c)]
B = [(x, y, z)]
where the 3-tuples are of type (int, int, str). I wanted to combine the two lists. I am using heapq.merge() operation as it is efficient and optimized for large lists. A and B could contain millions of 3-tuples.
Is it guaranteed that heapq.merge() will output an order where, given two tuples,
a >= x and b >= y and c >= z?
Python sorts tuples in lexicographic order:
first the first two items are compared, and if they differ this
determines the outcome of the comparison; if they are equal, the next
two items are compared, and so on, until either sequence is exhausted.
Take for example,
In [33]: import heapq
In [34]: A = [(1,100,2)]
In [35]: B = [(2,0,0)]
In [40]: list(heapq.merge(A,B))
Out[40]: [(1, 100, 2), (2, 0, 0)]
In [41]: (1, 100, 2) < (2, 0, 0)
Out[41]: True
Thus, it is not necessarily true that
a >= x and b >= y and c >= z
It is possible to use heapq on any collection of orderable objects, including instances of a custom class. Using a custom class, you can arrange for any kind of ordering rule you like. For example,
class MyTuple(tuple):
    def __lt__(self, other):
        return all(a < b for a, b in zip(self, other))

    def __eq__(self, other):
        return (len(self) == len(other)
                and all(a == b for a, b in zip(self, other)))

    def __gt__(self, other):
        return not (self < other or self == other)

    def __le__(self, other):
        return self < other or self == other

    def __ge__(self, other):
        return not self < other

A = [MyTuple((1,100,2))]
B = [MyTuple((2,0,0))]
print(list(heapq.merge(A,B)))
# [(2, 0, 0), (1, 100, 2)]
Note, however, that although this changes our notion of < for MyTuple, the result returned by heapq.merge is not guaranteed to satisfy
a <= x and b <= y and c <= z
To do this, we'd have to first remove all items from A and B which are mutually unorderable.
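For plain tuples, the lexicographic behaviour can be confirmed with a small example. The data below is invented for illustration; each input list is already sorted, which heapq.merge requires.

```python
import heapq

# Hypothetical sample data; both lists are sorted in lexicographic order.
A = [(1, 100, "b"), (3, 0, "a")]
B = [(2, 0, "z"), (3, 0, "b")]

merged = list(heapq.merge(A, B))
print(merged)

# The merged result matches a full lexicographic sort of all items.
matches_sorted = merged == sorted(A + B)

# Component-wise ordering does not hold: the first tuple comes before
# the second even though its middle element is larger (100 > 0).
first, second = merged[0], merged[1]
middle_is_larger = first[1] > second[1]
```

Ties on the first element are broken by the second, then the third, which is why (3, 0, "a") comes before (3, 0, "b").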

Using Python tuples as vectors

I need to represent immutable vectors in Python ("vectors" as in linear algebra, not as in programming). The tuple seems like an obvious choice.
The trouble is when I need to implement things like addition and scalar multiplication. If a and b are vectors, and c is a number, the best I can think of is this:
tuple(map(lambda x,y: x + y, a, b)) # add vectors 'a' and 'b'
tuple(map(lambda x: x * c, a)) # multiply vector 'a' by scalar 'c'
which seems inelegant; there should be a clearer, simpler way to get this done -- not to mention avoiding the call to tuple, since map returns a list.
Is there a better option?
NumPy supports various algebraic operations with its arrays.
Immutable types are pretty rare in Python and third-party extensions thereof; the OP rightly claims "there are enough uses for linear algebra that it doesn't seem likely I have to roll my own" -- but all the existing types I know that do linear algebra are mutable! So, as the OP is adamant on immutability, there is nothing for it but the roll-your-own route.
Not that there's all that much rolling involved, e.g. if you specifically need 2-d vectors:
import math

class ImmutableVector(object):
    __slots__ = ('_d',)

    def __init__(self, x, y):
        # Bypass our own __setattr__, which forbids assignment.
        object.__setattr__(self, '_d', (x, y))

    def __setattr__(self, n, v):
        raise ValueError("Can't alter instance of %s" % type(self))

    @property
    def x(self):
        return self._d[0]

    @property
    def y(self):
        return self._d[1]

    def __eq__(self, other):
        return self._d == other._d

    def __ne__(self, other):
        return self._d != other._d

    def __hash__(self):
        return hash(self._d)

    def __add__(self, other):
        return type(self)(self.x+other.x, self.y+other.y)

    def __mul__(self, scalar):
        return type(self)(self.x*scalar, self.y*scalar)

    def __repr__(self):
        return '%s(%s, %s)' % (type(self).__name__, self.x, self.y)

    def __abs__(self):
        return math.hypot(self.x, self.y)
I "threw in for free" a few extras such as .x and .y R/O properties, nice string representation, usability in sets or as keys in dicts (why else would one want immutability?-), low memory footprint, abs(v) to give v's vector-length -- I'm sure you can think of other "wouldn't-it-be-cool-if" methods and operators, depending on your application field, and they'll be just as easy. If you need other dimensionalities it won't be much harder, though a tad less readable since the .x, .y notation doesn't apply any more;-) (but I'd use genexps, not map).
By inheriting from tuple, you can make a nice Vector class pretty easily. Here's enough code to provide addition of vectors, and multiplication of a vector by a scalar. It gives you arbitrary length vectors, and can work with complex numbers, ints, or floats.
class Vector(tuple):
    def __add__(self, a):
        # TODO: check lengths are compatible.
        return Vector(x + y for x, y in zip(self, a))

    def __mul__(self, c):
        return Vector(x * c for x in self)

    def __rmul__(self, c):
        return Vector(c * x for x in self)

a = Vector((1, 2, 3))
b = Vector((2, 3, 4))
print(a + b)
print(3 * a)
print(a * 3)
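As a rough Python 3 sketch of the same tuple-subclass idea, the version below fills in the length check from the TODO and shows that such a vector stays hashable, so it can be used as a dict key. Raising ValueError on a length mismatch and aliasing __rmul__ to __mul__ (which assumes scalar multiplication is commutative) are choices made for this example only.

```python
class Vector(tuple):
    """Immutable vector built on tuple (illustrative sketch)."""

    __slots__ = ()  # no per-instance __dict__, so no new attributes

    def __add__(self, other):
        if len(self) != len(other):
            raise ValueError("vectors must have the same length")
        return Vector(x + y for x, y in zip(self, other))

    def __mul__(self, c):
        return Vector(x * c for x in self)

    # Assumes c * x == x * c for the element type.
    __rmul__ = __mul__


a = Vector((1, 2, 3))
b = Vector((2, 3, 4))

total = a + b       # element-wise sum, not tuple concatenation
scaled = 3 * a      # scalar multiple, not tuple repetition
lookup = {a: "first"}  # hashing is inherited from tuple

print(total, scaled, lookup[Vector((1, 2, 3))])
```

Because __eq__ is not overridden, equality and hashing are both inherited from tuple and stay consistent with each other.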
Although using a library like NumPy seems to be the resolution for the OP, I think there is still some value in a simple solution that does not require additional libraries and lets you stay immutable, using iterables.
Using the itertools and operator modules:
imap(add, a, b) # returns iterable to sum of a and b vectors
This implementation is simple. It uses neither lambda nor any list-tuple conversion, as it is iterator based. Note that itertools.imap exists only in Python 2; in Python 3 the built-in map is already lazy.
from itertools import imap
from operator import add
vec1 = (1, 2, 3)
vec2 = (10, 20, 30)
result = imap(add, vec1, vec2)
print(tuple(result))
Yields:
(11, 22, 33)
Why not create your own class, making use of 2 Cartesian point member variables? (sorry if the syntax is a little off, my python is rusty)
class point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    # etc

    def add(self, p):
        return point(self.x + p.x, self.y + p.y)

class vector:
    def __init__(self, a, b):
        self.pointA = a
        self.pointB = b
    # etc

    def add(self, v):
        # point defines add(), not the + operator
        return vector(self.pointA.add(v.pointA), self.pointB.add(v.pointB))
For occasional use, a Python 3 solution without repeating lambdas is possible via using the standard operator package:
from operator import add, mul
a = (1, 2, 3)
b = (4, 5, 6)
print(tuple(map(add, a , b)))
print(tuple(map(mul, a , b)))
which prints:
(5, 7, 9)
(4, 10, 18)
For serious linear algebra computations using numpy vectors is the canonical solution:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a+b)
print(a*b)
which prints:
[5 7 9]
[ 4 10 18]
Since pretty much all of the sequence manipulation functions return lists, that's pretty much what you're going to have to do.
