merging tuples in heap module in python


I wanted to know about the merging behavior of heapq.merge().
How does heapq.merge() decide the order when merging lists of tuples?
I am given two lists each with a 3-tuple,
A = [(a, b, c)]
B = [(x, y, z)]
where the 3-tuples are of type (int, int, str). I wanted to combine the two lists. I am using heapq.merge() operation as it is efficient and optimized for large lists. A and B could contain millions of 3-tuples.
Is it guaranteed that heapq.merge() will output an order where, given two tuples,
a >= x and b >= y and c >= z?

Python sorts tuples in lexicographic order:
first the first two items are compared, and if they differ this
determines the outcome of the comparison; if they are equal, the next
two items are compared, and so on, until either sequence is exhausted.
Take for example,
In [33]: import heapq
In [34]: A = [(1,100,2)]
In [35]: B = [(2,0,0)]
In [40]: list(heapq.merge(A,B))
Out[40]: [(1, 100, 2), (2, 0, 0)]
In [41]: (1, 100, 2) < (2, 0, 0)
Out[41]: True
Thus, it is not necessarily true that
a >= x and b >= y and c >= z
It is possible to use heapq on any collection of orderable objects, including instances of a custom class. Using a custom class, you can arrange for any kind of ordering rule you like. For example,
class MyTuple(tuple):
    def __lt__(self, other):
        return all(a < b for a, b in zip(self, other))
    def __eq__(self, other):
        return (len(self) == len(other)
                and all(a == b for a, b in zip(self, other)))
    def __gt__(self, other):
        return not (self < other or self == other)
    def __le__(self, other):
        return self < other or self == other
    def __ge__(self, other):
        return not self < other
A = [MyTuple((1,100,2))]
B = [MyTuple((2,0,0))]
print(list(heapq.merge(A,B)))
# [(2, 0, 0), (1, 100, 2)]
Note, however, that although this changes our notion of < for MyTuple, the result returned by heapq.merge is not guaranteed to satisfy
a <= x and b <= y and c <= z
To do this, we'd have to first remove all items from A and B which are mutually unorderable.
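To see the lexicographic rule at work on longer inputs, here is a minimal sketch. The sample tuples are made up for illustration. Note that heapq.merge assumes each input is already sorted by the ordering it merges on, and that the key parameter used in the second call requires Python 3.5 or later.

```python
import heapq

# Each input must already be sorted by the ordering used for the merge.
A = [(1, 100, "b"), (3, 0, "a")]
B = [(2, 0, "z"), (3, 0, "b")]

# Default merge: tuples are compared lexicographically, so the first field
# decides and later fields only break ties.
merged = list(heapq.merge(A, B))
print(merged)

# Merge on the second field only; the inputs are re-sorted by that field first.
second = lambda t: t[1]
by_second = list(heapq.merge(sorted(A, key=second), sorted(B, key=second), key=second))
print(by_second)
```

In neither result are all three fields ordered component-wise; only the comparison that drives the merge is respected.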

Related

Comparing contents of lists ignoring order

assuming I have a class shown below:
class OBJ:
    def __init__(self, a):
        self.A = a
and I have 2 lists of these objects
# sorry this is a bad example, plz look at the bottom
a = [OBJ(1), OBJ(0), OBJ(20), OBJ(-1)]
b = [OBJ(20), OBJ(-1), OBJ(1), OBJ(0)]
how do I prove that these 2 lists' contents are the same?
I have tried to use the sorted() method but it doesn't seem to work because you cannot logically compare 2 objects. Does anyone have a quick and efficient way of solving this? Thank you!
edit:
Sorry, the 2 lists are a bad example. By "the same" I mean that both lists refer to the same objects. So:
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
x = [a,b,c]
y = [c,a,b]
how do i prove x and y are the same?

You need to implement the __eq__ and __lt__ methods to allow you to sort the objects and then compare them:
class OBJ:
    def __init__(self, a):
        self.A = a
    def __eq__(self, other):
        if not isinstance(other, OBJ):
            # don't attempt to compare against unrelated types
            return NotImplemented
        return self.A == other.A
    def __lt__(self, other):
        return self.A < other.A
a = [OBJ(1), OBJ(0), OBJ(20), OBJ(-1)]
b = [OBJ(20), OBJ(-1), OBJ(1), OBJ(0)]
test:
sorted(a) == sorted(b)
Output: True
Edit:
The edit to the question clarifies that you want to check that the lists hold exactly the same objects, not just objects built from the same inputs. To do this, just use id() to see whether they point to the exact same objects
example:
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
x = [a,b,c]
y = [c,a,b]
sorted([id(temp) for temp in x]) == sorted([id(temp) for temp in y])
Output: True
however...
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
d = OBJ(20) # Same input value as c, but a different object
x = [a,b,c]
y = [d,a,b]
sorted([id(temp) for temp in x]) == sorted([id(temp) for temp in y])
Output: False
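If sorting the ids feels roundabout, a multiset comparison does the same job. This is only a sketch: the helper name same_objects is made up here, and OBJ is repeated so the snippet runs on its own.

```python
from collections import Counter

class OBJ:
    def __init__(self, a):
        self.A = a

def same_objects(xs, ys):
    """Return True if both lists hold the same object instances, in any order.

    Counter keeps multiplicities, so an object that appears twice in one list
    must also appear twice in the other.
    """
    return Counter(map(id, xs)) == Counter(map(id, ys))

a, b, c = OBJ(1), OBJ(-1), OBJ(20)
d = OBJ(20)  # same input value as c, but a different object

print(same_objects([a, b, c], [c, a, b]))
print(same_objects([a, b, c], [d, a, b]))
```

The first call compares the same three instances in a different order; the second swaps c for the distinct object d.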

You could compare 2 stand-in lists that are sorted() based on your attribute A:
>>> print(sorted([o.A for o in a]) == sorted([o.A for o in b]))
True

How can I hash an object with two symmetrically equivalent characteristics?

I have an object (Edge) which contains two other objects (points A and B) in 3D. Geometrically, an edge from A = (0, 0, 0) to B = (1, 0, 0) should be the same as an edge from A = (1, 0, 0) to B = (0, 0, 0), and it's easy to make an equality statement of two edges. However, I'm having some conceptual problems implementing a way to hash this object (in Python). For example, hash((A, B)) will return a different value from hash((B, A)).
I've seen answers about similar problems on this site, but they all involve making a comparison between the two elements. I don't really want to do this, because while I can think of a rigorous way to compare two points (compare x-coordinates first, then y-coordinates if x are equal, then z's if y's are equal), I don't know if I want to implement a comparison which seems meaningless mathematically and only useful for this single instance. The statement (1, 0, 0) > (0, 300, 10^10) might be correct with this method, but it isn't very meaningful.
class Edge(object):
    def __init__(self, pointA, pointB):
        self._A = pointA
        self._B = pointB
        ab = pointA + pointB
        self._midpoint = Vector(ab.x / 2, ab.y / 2, ab.z / 2)
    def get_A(self):
        return self._A
    def set_A(self, point):
        self._A = point
    def get_B(self):
        return self._B
    def set_B(self, point):
        self._B = point
    A = property(get_A, set_A)
    B = property(get_B, set_B)
    def __eq__(self, other):
        if isinstance(other, Edge):
            if (self.A == other.A) and (self.B == other.B):
                return True
            elif (self.B == other.A) and (self.A == other.B):
                return True
            else:
                return False
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        return hash((self.A, self.B))  # =/= hash((self.B, self.A))!
    def __str__(self):
        return "[{}, {}]".format(self.A, self.B)
In conclusion, I'm wondering if there's an implementation which will give two equivalent edges the same hash value without creating some arbitrary comparison function between points. (P.S. my "point" class is called "Vector")

Combine the hashes of A and B with XOR:
def __hash__(self):
    return hash(self.A) ^ hash(self.B)
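A minimal sketch of how this fits together with a symmetric __eq__. It assumes the endpoints are hashable; plain tuples stand in for the asker's Vector class here, and the midpoint logic is left out.

```python
class Edge(object):
    def __init__(self, point_a, point_b):
        # Endpoints are assumed hashable; tuples stand in for Vector.
        self.A = point_a
        self.B = point_b

    def __eq__(self, other):
        if not isinstance(other, Edge):
            return NotImplemented
        return ((self.A == other.A and self.B == other.B) or
                (self.A == other.B and self.B == other.A))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # XOR is commutative, so swapping the endpoints gives the same hash.
        return hash(self.A) ^ hash(self.B)

p, q = (0, 0, 0), (1, 0, 0)
e1, e2 = Edge(p, q), Edge(q, p)
print(e1 == e2, hash(e1) == hash(e2), len({e1, e2}))
```

One trade-off of XOR: an edge whose two endpoints are equal always hashes to 0. hash(frozenset((self.A, self.B))) is another order-independent option. Either way, the endpoints should not be changed while the edge sits in a set or dict.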

How to use a custom function in max(x, key=custom_function) function?

I have a custom data type, say: mytime, which represent hours and minutes, such as 29:45, it is 29 hours and 45 minutes.
I want to use max built-in function to find the item in a list of lists, whose sum of its elements is the greatest, where all lists contain values of mytime type.
x = [[a, b], [c, d]]
a,b,c,d are of mytime type.
max(x, key=sum)
won't work here, because a,b,c,d, are not integers.
If I type a + b at python command line, I get the sum of these two time values, result is of mytime type, without any errors.
How do I use max function here?

Let's say your class looks like this:
class mytime(object):
    def __init__(self, h, m):
        self.h = h
        self.m = m
    def __add__(self, other):
        return mytime(self.h + other.h, self.m + other.m)
    def __repr__(self):
        return '%i:%i' % (self.h, self.m)
and you use it like this:
a = mytime(10, 10)
b = mytime(2, 22)
print(a + b)
and it will work as expected:
12:32
Problem:
What you want to do is:
l = [a, b]
print(sum(l))
but it will fail:
TypeError: unsupported operand type(s) for +: 'int' and 'mytime'
The problem is that the sum function will start with 0 and will add up all values of the list. It will try to evaluate
0 + mytime(10, 10)
which will fail.
Solution:
The solution to your problem is implementing the __radd__ method, which represents "reverse add" and is called when the arguments can't be resolved in the "forward" direction. For example, x + y is evaluated as x.__add__(y) if possible, but if that doesn't exist or returns NotImplemented, then Python tries y.__radd__(x).
So you can add the following method to your class:
    def __radd__(self, other):
        return mytime(self.h, self.m)
and the sum function will work for you (in this implementation ignoring the other value, which is probably fine in your case). Note that max(x, key=sum) then compares the resulting mytime sums, so mytime also needs comparison methods such as __gt__ or __lt__ for max to pick a meaningful maximum.
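Putting the pieces together, a sketch of a class that works with max(x, key=sum) could look like this. MyTime is a hypothetical stand-in for the asker's mytime type; the minute normalization and the use of functools.total_ordering are assumptions made for the example, not part of the original class.

```python
import functools

@functools.total_ordering
class MyTime(object):
    """Hypothetical stand-in for the asker's mytime type."""
    def __init__(self, h, m):
        # Normalize so that minutes stay below 60.
        self.h, self.m = divmod(h * 60 + m, 60)

    def __add__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return MyTime(self.h + other.h, self.m + other.m)

    def __radd__(self, other):
        # sum() starts from the int 0, so only 0 + MyTime is accepted.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented

    def __eq__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return (self.h, self.m) == (other.h, other.m)

    def __lt__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return (self.h, self.m) < (other.h, other.m)

    def __repr__(self):
        return '%d:%02d' % (self.h, self.m)

x = [[MyTime(10, 10), MyTime(2, 22)], [MyTime(29, 45), MyTime(0, 30)]]
best = max(x, key=sum)
print(best, sum(best))
```

Unlike the __radd__ above, this one refuses anything other than 0 on the left, so a mistake such as 5 + MyTime(1, 0) raises a TypeError instead of silently dropping the 5.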

You can write your own sum function:
def my_sum(item):
    return sum(60 * e[0] + e[1] for e in item)
x = [[(2,0), (3,0)], [(9, 0), (4, 0)]]
print(max(x, key=my_sum))
I have represented your mytime data structure as tuples (with hours and minutes) so you may need to adjust my_sum to your data structure. The only requirement is that the hours and minutes of a mytime can be filled in for e[0] and e[1] respectively.
The above code returns the greatest element (in this case [(9, 0), (4, 0)]).

Are you sure using a + b works? sum starts from 0 and then repeatedly applies + to the elements (it's the same as reduce(operator.add, sequence, 0), with a special case that rejects strings). So besides a + b, 0 + a has to work too, which means implementing __radd__ (or passing a start value to sum). Once that works, max(x, key=sum) should just work, as long as your class (mydate in the example below) supports comparison operators, e.g. __gt__, __eq__, __lt__
Example
You need to have __gt__ defined for max to work...
class mydate(object):
    def __init__(self, num):
        self.num = num
    def __add__(self, other):  # make sure + works between two mydate values
        return mydate(self.num + other.num)
    def __radd__(self, other):  # sum starts from 0, so handle 0 + mydate
        return self if other == 0 else NotImplemented
    def __gt__(self, other):  # make sure max can do > comparison
        return self.num > other.num
    def __repr__(self):
        return 'date: {}'.format(self.num)
x = mydate(3)
y = mydate(5)
z = mydate(2)
print(max([[x, y], [z]], key=sum))

Overloading + to support tuples

I'd like to be able to write something like this in python:
a = (1, 2)
b = (3, 4)
c = a + b # c would be (4, 6)
d = 3 * b # d would be (9, 12)
I realize that you can overload operators to work with custom classes, but is there a way to overload operators to work with pairs?
Of course, such solutions as
c = tuple([x+y for x, y in zip(a, b)])
do work, but, performance aside, they aren't quite as pretty as overloading the + operator.
One can of course define add and mul functions such as
def add((x1, y1), (x2, y2)):
    return (x1 + x2, y1 + y2)
def mul(a, (x, y)):
    return (a * x, a * y)
but still being able to write q * b + r instead of add(mul(q, b), r) would be nicer.
Ideas?
EDIT: On a side note, I realize that since + currently maps to tuple concatenation, it might be unwise to redefine it, even if it's possible. The question still holds for - for example =)

In contrast to Ruby, you can't change the behaviour of built-in types in Python. All you can do is create a new type derived from a built-in type. Literals will still create the built-in type, though.
Probably the best you can get is
class T(tuple):
    def __add__(self, other):
        return T(x + y for x, y in zip(self, other))
    def __rmul__(self, other):
        return T(other * x for x in self)
a = T((1, 2))
b = T((3, 4))
c = a + b # c would be (4, 6)
d = 3 * b # d would be (9, 12)
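Since the question's edit asks about - as well, the same pattern can be extended with __sub__. This is a sketch along the lines of the class above, not a complete vector type; the values of q, b and r are made up to match the q * b + r expression from the question.

```python
class T(tuple):
    def __add__(self, other):
        return T(x + y for x, y in zip(self, other))

    def __sub__(self, other):
        return T(x - y for x, y in zip(self, other))

    def __rmul__(self, scalar):
        # Called for scalar * T, because int does not know how to multiply by T.
        return T(scalar * x for x in self)

q = 2
b = T((3, 4))
r = T((1, 1))

result = q * b + r  # the expression the question wanted to write
diff = b - r
print(result, diff)
```

Only the left operand needs to be a T for + and -, so b - (1, 1) works as well, while (1, 1) - b raises a TypeError because plain tuples do not define subtraction.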

You can inherit a class from tuple and overload its __add__ method. Here's a very simplistic example:
class mytuple(tuple):
    def __add__(self, other):
        assert len(self) == len(other)
        return tuple([x + y for x, y in zip(self, other)])
mt = mytuple((5, 6))
print(mt + (2, 3))  # prints (7, 9)
I wouldn't recommend this approach though, because tuples weren't really designed for this purpose. If you want to perform numeric computations, just use numpy.

You cannot modify types defined in C, so you would need to create all new types for this. Or you could just use NumPy, which already has types that support this.

There is the famous infix operator hack that would allow you to do something like this:
x = Infix(lambda a,b:tuple([x+y for x, y in zip(a, b)]))
y = Infix(lambda a,b:tuple([a*y for y in b]))
c = a |x| b # c would be (4, 6)
d = 3 |y| b # d would be (9, 12)
That would hide the list comprehensions and be applicable to tuples of all lengths, at the expense of "weird" pseudo-operators |x| and |y|.
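The snippet above uses Infix without defining it. One common way to write such a helper is sketched below; this is an assumed minimal version rather than the exact recipe the answer had in mind, and the names vadd and vscale are made up for the example.

```python
class Infix(object):
    """Wrap a two-argument function so it can be written as  a |op| b."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Evaluated first for  left | op : remember the left operand.
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # Evaluated second for  (left | op) | right : apply the function.
        return self.func(right)

vadd = Infix(lambda a, b: tuple(x + y for x, y in zip(a, b)))
vscale = Infix(lambda k, b: tuple(k * y for y in b))

a = (1, 2)
b = (3, 4)
c = a |vadd| b
d = 3 |vscale| b
print(c, d)
```

This relies on the left operand not handling | itself. Tuples and ints do not know how to combine with an Infix instance, so Python falls back to Infix.__ror__.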

Using python complex numbers is definitely one way to do it, if not extremely pretty.
a = 1 + 2j
b = 3 + 4j
c = a + b # c would be 4 + 6j
d = 3 * b # d would be 9 + 12j
That saves the definition of an extra class.
Also, expanding on previous answers,
class T(tuple):
    # Python 2 only: tuple parameter unpacking was removed in Python 3
    def __add__((x, y), (x1, y1)):
        return T((x+x1, y+y1))
    def __rmul__((x, y), other):
        return T((other * x, other * y))
might improve performance, at the cost of restricting the implementation to pairs.

Write your own class and implement __mul__, __add__ etc.

You can use numpy.array to get all you need.

Using Python tuples as vectors

I need to represent immutable vectors in Python ("vectors" as in linear algebra, not as in programming). The tuple seems like an obvious choice.
The trouble is when I need to implement things like addition and scalar multiplication. If a and b are vectors, and c is a number, the best I can think of is this:
tuple(map(lambda x,y: x + y, a, b)) # add vectors 'a' and 'b'
tuple(map(lambda x: x * c, a)) # multiply vector 'a' by scalar 'c'
which seems inelegant; there should be a clearer, simpler way to get this done -- not to mention avoiding the call to tuple, since map returns a list.
Is there a better option?

NumPy supports various algebraic operations with its arrays.

Immutable types are pretty rare in Python and third-party extensions thereof; the OP rightly claims "there are enough uses for linear algebra that it doesn't seem likely I have to roll my own" -- but all the existing types I know that do linear algebra are mutable! So, as the OP is adamant on immutability, there is nothing for it but the roll-your-own route.
Not that there's all that much rolling involved, e.g. if you specifically need 2-d vectors:
import math
class ImmutableVector(object):
    __slots__ = ('_d',)
    def __init__(self, x, y):
        # bypass our own __setattr__, which always raises
        object.__setattr__(self, '_d', (x, y))
    def __setattr__(self, n, v):
        raise ValueError("Can't alter instance of %s" % type(self))
    @property
    def x(self):
        return self._d[0]
    @property
    def y(self):
        return self._d[1]
    def __eq__(self, other):
        return self._d == other._d
    def __ne__(self, other):
        return self._d != other._d
    def __hash__(self):
        return hash(self._d)
    def __add__(self, other):
        return type(self)(self.x+other.x, self.y+other.y)
    def __mul__(self, scalar):
        return type(self)(self.x*scalar, self.y*scalar)
    def __repr__(self):
        return '%s(%s, %s)' % (type(self).__name__, self.x, self.y)
    def __abs__(self):
        return math.hypot(self.x, self.y)
I "threw in for free" a few extras such as .x and .y R/O properties, nice string representation, usability in sets or as keys in dicts (why else would one want immutability?-), low memory footprint, abs(v) to give v's vector-length -- I'm sure you can think of other "wouldn't-it-be-cool-if" methods and operators, depending on your application field, and they'll be just as easy. If you need other dimensionalities it won't be much harder, though a tad less readable since the .x, .y notation doesn't apply any more;-) (but I'd use genexps, not map).
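As a rough illustration of the "other dimensionalities" remark, here is a sketch of the same idea for any number of components, using generator expressions. The class name ImmutableVectorN is made up, and raising AttributeError rather than ValueError on assignment is a choice made for this example.

```python
import math

class ImmutableVectorN(object):
    __slots__ = ('_d',)

    def __init__(self, *components):
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, '_d', tuple(components))

    def __setattr__(self, name, value):
        raise AttributeError("Can't alter instance of %s" % type(self).__name__)

    def __eq__(self, other):
        if not isinstance(other, ImmutableVectorN):
            return NotImplemented
        return self._d == other._d

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash(self._d)

    def __add__(self, other):
        return type(self)(*(a + b for a, b in zip(self._d, other._d)))

    def __mul__(self, scalar):
        return type(self)(*(a * scalar for a in self._d))

    def __abs__(self):
        return math.sqrt(sum(a * a for a in self._d))

    def __repr__(self):
        return '%s%r' % (type(self).__name__, self._d)

v = ImmutableVectorN(1, 2, 2)
w = v + v * 2
print(w, abs(v), len({v, ImmutableVectorN(1, 2, 2)}))
```

Because equal vectors hash equally, the set at the end holds a single element, which is the "usable in sets or as dict keys" property mentioned above.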

By inheriting from tuple, you can make a nice Vector class pretty easily. Here's enough code to provide addition of vectors, and multiplication of a vector by a scalar. It gives you arbitrary length vectors, and can work with complex numbers, ints, or floats.
class Vector(tuple):
    def __add__(self, a):
        # TODO: check lengths are compatible.
        return Vector(x + y for x, y in zip(self, a))
    def __mul__(self, c):
        return Vector(x * c for x in self)
    def __rmul__(self, c):
        return Vector(c * x for x in self)
a = Vector((1, 2, 3))
b = Vector((2, 3, 4))
print(a + b)
print(3 * a)
print(a * 3)

Although using a library like NumPy seems to be the resolution for the OP, I think there is still some value in a simple solution that requires no additional libraries and keeps the data immutable, using iterables.
Using the itertools and operator modules (Python 2; in Python 3 itertools.imap is gone and the built-in map is already lazy):
imap(add, a, b) # returns an iterator over the element-wise sum of vectors a and b
This implementation is simple. It uses neither lambda nor any intermediate list, as it is iterator based.
from itertools import imap
from operator import add
vec1 = (1, 2, 3)
vec2 = (10, 20, 30)
result = imap(add, vec1, vec2)
print(tuple(result))
Yields:
(11, 22, 33)

Why not create your own class, making use of 2 Cartesian point member variables? (sorry if the syntax is a little off, my python is rusty)
class point:
    def __init__(self,x,y):
        self.x=x
        self.y=y
    #etc
    def add(self,p):
        return point(self.x + p.x, self.y + p.y)
class vector:
    def __init__(self,a,b):
        self.pointA=a
        self.pointB=b
    #etc
    def add(self,v):
        # point defines add(), not the + operator, so call it explicitly
        return vector(self.pointA.add(v.pointA), self.pointB.add(v.pointB))

For occasional use, a Python 3 solution without repeating lambdas is possible via using the standard operator package:
from operator import add, mul
a = (1, 2, 3)
b = (4, 5, 6)
print(tuple(map(add, a , b)))
print(tuple(map(mul, a , b)))
which prints:
(5, 7, 9)
(4, 10, 18)
For serious linear algebra computations using numpy vectors is the canonical solution:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a+b)
print(a*b)
which prints:
[5 7 9]
[ 4 10 18]

Since pretty much all of the sequence manipulation functions return lists, that's pretty much what you're going to have to do.
