Creating array of unique objects in Python

Let's suppose I have a program that creates some scheme with lines and points.
Each line is determined by two points. There are these classes:
class Coordinates(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates
class Line(object):
    def __init__(self, coordinates_1, coordinates_2):
        self.coordinates_1 = coordinates_1
        self.coordinates_2 = coordinates_2
A scheme takes a list of lines and creates a list of unique points.
class Circuit(object):
    def __init__(self, element_list):
        self.line_list = element_list
        self.point_collection = set()
        self.point_collection = self.generate_points()
    def generate_points(self):
        for line in self.line_list:
            coordinates_pair = [line.coordinates_1, line.coordinates_2]
            for coordinates in coordinates_pair:
                self.point_collection.add(Point(coordinates))
        return self.point_collection
What are the options for building a list or collection of unique objects? How can it be done without sets or sorting, using only loops and conditions? And is there a simpler way?
UPD. The code I attached doesn't work properly. I tried adding __hash__ and __eq__ methods to the Point class:
class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True
Then I try to make a scheme with some lines:
element_list=[]
element_list.append(Line(Coordinates(0,0), Coordinates(10,0)))
element_list.append(Line(Coordinates(10,0), Coordinates(10,20)))
circuit = Circuit(element_list)
print(circuit.point_collection)
These two lines have four endpoints, two of which share the same coordinates. Hence the code should print three objects, but it prints only one:
{<__main__.Point object at 0x0083E050>}

Short answer:
You need to implement __hash__() and __eq__() methods in your Point class.
For an idea, see this answer showing a correct and good way to implement __hash__().
Long answer:
The documentation says that:
A set object is an unordered collection of distinct hashable objects. Common uses include (...) removing duplicates from a sequence (...)
And hashable means:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
Which explains why your code does not remove duplicate points.
Consider this implementation that makes all instances of Foo distinct and all instances of Bar equal:
class Foo:
    pass
class Bar:
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True
Now run:
>>> set([Foo(), Foo()])
{<__main__.Foo at 0x7fb140791da0>, <__main__.Foo at 0x7fb140791f60>}
>>> set([Bar(), Bar()])
{<__main__.Bar at 0x7fb1407c5780>}
In your case, __eq__ should return True when both coordinates are equal, while __hash__ should return a hash of the coordinate pair. See the answer mentioned earlier for a good way to do this.
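As a rough sketch of what that could look like for the classes in the question: the version below compares and hashes a Point by its coordinate pair. The _key helper and the __repr__ are illustrative additions, not part of the original code. The second half shows a loops-and-conditions variant that relies only on __eq__, for the case where sets are not wanted.

```python
class Coordinates:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point:
    def __init__(self, coordinates):
        self.coordinates = coordinates

    def _key(self):
        # Hypothetical helper: the values that define a point's identity.
        return (self.coordinates.x, self.coordinates.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal points produce equal keys, hence equal hashes.
        return hash(self._key())

    def __repr__(self):
        return "Point(%r, %r)" % self._key()


candidates = [
    Point(Coordinates(0, 0)), Point(Coordinates(10, 0)),
    Point(Coordinates(10, 0)), Point(Coordinates(10, 20)),
]

# Set-based deduplication (needs __hash__ and __eq__).
points = set(candidates)
print(len(points))  # 3

# Loop-only deduplication (needs only __eq__, but is O(n^2)).
unique_points = []
for p in candidates:
    if p not in unique_points:
        unique_points.append(p)
print(unique_points)  # [Point(0, 0), Point(10, 0), Point(10, 20)]
```

The loop variant keeps insertion order, which a set does not promise.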
Some remarks:
Your Point class currently has no reason to exist from a design perspective, since it is just a wrapper around Coordinates and offers no additional functionality. You should use just one of the two, for example:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
And why not call coordinates_1 and coordinates_2 just a and b?
class Line(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
Also, your generate_points could be implemented in a more pythonic way:
def generate_points(self):
    return set(p for l in self.line_list for p in (l.a, l.b))
Finally, for easier debugging, you might consider implementing __repr__ and __str__ methods in your classes.

Related

How to combine two objects of the class together as a dictionary [duplicate]

How do you go about overloading the addition, subtraction, and multiplication operator so we can add, subtract, and multiply two vectors of different or identical sizes? For example, if the vectors are different sizes we must be able to add, subtract, or multiply the two vectors according to the smallest vector size?
I've created a function that allows you to modify different vectors, but now I'm struggling to overload the operators and haven't a clue on where to begin. I will paste the code below. Any ideas?
def __add__(self, y):
    self.vector = []
    for j in range(len(self.vector)):
        self.vector.append(self.vector[j] + y.self.vector[j])
    return Vec[self.vector]
You define the __add__, __sub__, and __mul__ methods for the class, that's how. Each method takes two objects (the operands of +/-/*) as arguments and is expected to return the result of the computation.
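A minimal sketch of that idea, assuming a hypothetical Vec class that wraps a plain list in a vector attribute (the question does not show the real class). Each operator returns a new Vec, and zip() stops at the shorter operand, which gives the "smallest vector size" behaviour asked for.

```python
import operator


class Vec:
    """Hypothetical minimal vector wrapping a list of numbers."""

    def __init__(self, values):
        self.vector = list(values)

    def _combine(self, other, op):
        # Illustrative helper shared by all three operators.
        if not isinstance(other, Vec):
            return NotImplemented
        # zip() truncates to the shorter of the two vectors.
        return Vec(op(a, b) for a, b in zip(self.vector, other.vector))

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __sub__(self, other):
        return self._combine(other, operator.sub)

    def __mul__(self, other):
        # Element-wise product, not a dot product.
        return self._combine(other, operator.mul)

    def __repr__(self):
        return "Vec(%r)" % self.vector


a = Vec([1, 2, 3])
b = Vec([10, 20])
print(a + b)  # Vec([11, 22])
print(a - b)  # Vec([-9, -18])
print(a * b)  # Vec([10, 40])
```

Unlike the snippet in the question, the operands are left unmodified; the result is built in a fresh object.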
Nothing wrong with the accepted answer on this question, but I'm adding some quick snippets to illustrate how this can be used. (Note that you could also "overload" the method to handle multiple types.)
"""Return the difference of another Transaction object, or another
class object that also has the `val` property."""
class Transaction(object):
    def __init__(self, val):
        self.val = val
    def __sub__(self, other):
        return self.val - other.val
buy = Transaction(10.00)
sell = Transaction(7.00)
print(buy - sell)
# 3.0
"""Return a Transaction object with `val` as the difference of this
Transaction.val property and another object with a `val` property."""
class Transaction(object):
    def __init__(self, val):
        self.val = val
    def __sub__(self, other):
        return Transaction(self.val - other.val)
buy = Transaction(20.00)
sell = Transaction(5.00)
result = buy - sell
print(result.val)
# 15.0
"""Return difference of this Transaction.val property and a number."""
class Transaction(object):
    def __init__(self, val):
        self.val = val
    def __sub__(self, other):
        return self.val - other
buy = Transaction(8.00)
print(buy - 6.00)
# 2.0
The docs have the answer. Basically, there are special methods that get called on an object when you add, multiply, etc. For instance, __add__ is the method behind the + operator.

Quick way to remove duplicate objects in a List in Python

I have a list of MyClass objects which is made like so:
# The class is MyClass(string_a: str = None, string_b: str = None)
test_clist: List[MyClass] = []
test_clist.append(MyClass("hello", "world"))
test_clist.append(MyClass("hello", ""))
test_clist.append(MyClass("hello", "world"))
test_clist.append(MyClass(None, "world"))
I want the end result to only have the 3rd append removed:
# Remove test_clist.append(MyClass("hello", "world"))
This is just a sample; the list can be empty or hold n objects. Is there a quick way to remove the duplicates, or a better approach, such as quickly checking whether an object already exists before appending it?
If your objects are of primitive types, you can use set
list(set(test_clist))
If not, as in your case, you have two options:
1- Implement __hash__() & __eq__()
You have to implement __hash__() and __eq__() in your class in order to use set() to remove the duplicates.
See the example below:
class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f"MyClass({self.x} - {self.y})"
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        if self.__class__ != other.__class__:
            return NotImplemented
        return (
            self.x == other.x and
            self.y == other.y
        )
l = []
l.append(MyClass('hello', 'world'))
l.append(MyClass('hello', 'world'))
l.append(MyClass('', 'world'))
l.append(MyClass(None, 'world'))
print(list(set(l)))
Since more than one attribute takes part in the comparison, __hash__() hashes a tuple of them.
__repr__() is there only to represent the object as a string.
2- Use a third-party package
Check out a package called toolz,
then use its unique() function to remove the duplicates by passing a key. Note that unique() returns a lazy iterator, so wrap it in list() if you need a list:
toolz.unique(test_list, key=lambda x: x.your_attribute)
In your case there is more than one attribute, so you can combine them into one key, for example by creating a combined property and passing it to toolz.unique(), or by building a tuple on the fly as below. A tuple also works when an attribute is None, where string concatenation would raise a TypeError:
toolz.unique(test_list, key=lambda x: (x.first_attribute, x.second_attribute))
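If adding a dependency is not an option, the same key-based idea can be sketched with the standard library alone. The unique_by_key function and the MyClass attribute names string_a and string_b below are assumptions made for illustration; the loop keeps the first occurrence of each key and preserves order.

```python
class MyClass:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, string_a=None, string_b=None):
        self.string_a = string_a
        self.string_b = string_b


def unique_by_key(items, key):
    """Return items with duplicate keys dropped, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # O(1) membership test on average
            seen.add(k)
            result.append(item)
    return result


test_clist = [
    MyClass("hello", "world"),
    MyClass("hello", ""),
    MyClass("hello", "world"),
    MyClass(None, "world"),
]

deduped = unique_by_key(test_clist, key=lambda x: (x.string_a, x.string_b))
print(len(deduped))  # 3
```

Keeping the seen set alongside the list also answers the "check before appending" part of the question: test the key against seen and append only when it is new.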

Check if an object is in a set (Python)

Let's say I have the following Point class.
class POINT:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
Main function:
def main():
    mySet = set()
    a = POINT(1,2)
    mySet.add(a)
    b = POINT(1,2)
    print("B is in mySet= {}".format(b in mySet))
I would like to know an efficient way to check if an object(a point) is in a set.
I know two ways to accomplish it, but they are either not efficient or don't use a custom object:
Traverse through all the point objects in the set --> O(n)
Use set to represent points. i.e (1,2) in mySet --> not using a custom object
I believe the in keyword checks the id or hash values of objects. I wonder which keyword lets me check the values of objects in a set.
We could rephrase this question as "how do I use the in keyword with a custom object?"
We need to define __hash__ in the custom class. How do we do it?
We need to consider two main concerns:
Avoiding collisions
Efficiency
We could get collisions if we defined the hash as self.x + self.y, because Point(x, y) and Point(y, x) would give the same hash value even though their x's and y's are not the same.
One way to avoid this is to use the built-in hash() function. We can pack self.x and self.y into a tuple so that it can be passed to hash(). The efficiency of this depends on how Python implements hash().
class POINT:
    def __hash__(self):
        return hash((self.x, self.y))
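Putting the pieces together, a self-contained sketch of the class might look like this. The isinstance check in __eq__ is an addition that is not in the question; it only keeps comparisons with unrelated types from raising AttributeError.

```python
class POINT:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, POINT):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        # Hash the same values that __eq__ compares.
        return hash((self.x, self.y))


my_set = {POINT(1, 2)}
print(POINT(1, 2) in my_set)  # True
print(POINT(2, 1) in my_set)  # False
```

The in test first uses the hash to locate a bucket and then falls back on __eq__ to confirm the match, so the lookup stays O(1) on average.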

Using queue.PriorityQueue, not caring about comparisons

I'm trying to use queue.PriorityQueue in Python 3(.6).
I would like to store objects with a given priority. But if two objects have the same priority, I don't mind which one PriorityQueue.get returns. In other words, my objects can't be compared like integers, and it wouldn't make sense to allow them to be; I just care about the priority.
In Python 3.7's documentation, there's a solution involving dataclasses. And I quote:
If the data elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:
from dataclasses import dataclass, field
from typing import Any
@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)
Alas, I'm using Python 3.6. In the documentation of this version of Python, there's no comment on using PriorityQueue for the priorities, not bothering about the "object value" which wouldn't be logical in my case.
Is there a better way than to define __le__ and other comparison methods on my custom class? I find this solution particularly ugly and counter-intuitive, but that might be me.
dataclasses is just a convenience that saves you from writing a lot of boilerplate code.
You don't actually have to create a class. A tuple with a unique counter value works too:
from itertools import count
unique = count()
q.put((priority, next(unique), item))
so that ties between equal priority are broken by the integer that follows; because it is always unique the item value is never consulted.
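A runnable sketch of the counter approach is shown below. The Task class is a hypothetical payload that deliberately defines no ordering, to show that the queue never needs to compare it.

```python
from itertools import count
from queue import PriorityQueue


class Task:
    """Hypothetical payload with no comparison methods."""

    def __init__(self, name):
        self.name = name


q = PriorityQueue()
unique = count()  # yields 0, 1, 2, ... and never repeats

for priority, name in [(2, "b"), (1, "a1"), (1, "a2")]:
    # The counter sits between priority and payload, so tuple
    # comparison is always decided before reaching the Task.
    q.put((priority, next(unique), Task(name)))

order = []
while not q.empty():
    priority, _, task = q.get()
    order.append(task.name)

print(order)  # ['a1', 'a2', 'b']
```

As a side effect, entries with equal priority come out in insertion order, even though the question does not require that.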
You can also create a class using straight-up rich comparison methods, made simpler with @functools.total_ordering:
from functools import total_ordering
@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item
    def __eq__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority == other.priority
    def __lt__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority < other.priority
See the priority queue implementation notes: just before the section you quoted (regarding using dataclasses), they tell you how to do it without them:
... is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks.
So simply add your items as 3rd element in a tuple (Prio, Count, YourElem) when adding to your queue.
Contrived example:
from queue import PriorityQueue
class CompareError(ValueError): pass
class O:
    def __init__(self,n):
        self.n = n
    def __lq__(self):
        raise CompareError
    def __repr__(self): return str(self)
    def __str__(self): return self.n
def add(prioqueue,prio,item):
    """Adds the 'item' with 'prio' to the 'priorqueue' adding a unique value that
    is stored as member of this method 'add.n' which is incremented on each usage."""
    prioqueue.put( (prio, add.n, item))
    add.n += 1
# no len() on PrioQueue - we ensure our unique integer via method-param
# if you forget to declare this, you get an AttributeError
add.n = 0
h = PriorityQueue()
add(h, 7, O('release product'))
add(h, 1, O('write spec 3'))
add(h, 1, O('write spec 2'))
add(h, 1, O('write spec 1'))
add(h, 3, O('create tests'))
for _ in range(4):
    item = h.get()
    print(item)
Using h.put( (1, O('write spec 1')) ) leads to
TypeError: '<' not supported between instances of 'O' and 'int'
Using def add(prioqueue,prio,item): pushes triplets as items, which have guaranteed distinct 2nd values, so our O()-instances are never used as tie-breaker.
Output:
(1, 1, write spec 3)
(1, 2, write spec 2)
(1, 3, write spec 1)
(3, 4, create tests)
See Martijn Pieters' answer for a nicer way to generate the unique 2nd element.
Let's assume that we don't want to write a decorator with equivalent functionality to dataclass. The problem is that we don't want to have to define all of the comparison operators in order to make our custom class comparable based on priority. The @functools.total_ordering decorator can help. Excerpt:
Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:
The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.
Using the provided example:
from functools import total_ordering
@total_ordering
class PrioritizedItem:
    # ...
    def __eq__(self, other):
        return self.priority == other.priority
    def __lt__(self, other):
        return self.priority < other.priority
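To make the effect of the decorator concrete, here is a small self-contained sketch. The constructor and the isinstance guards are filled in as assumptions, since the excerpt above elides them. Only __eq__ and __lt__ are written by hand; the remaining operators come from total_ordering.

```python
from functools import total_ordering


@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item  # never compared

    def __eq__(self, other):
        if not isinstance(other, PrioritizedItem):
            return NotImplemented
        return self.priority == other.priority

    def __lt__(self, other):
        if not isinstance(other, PrioritizedItem):
            return NotImplemented
        return self.priority < other.priority


low = PrioritizedItem(1, object())
high = PrioritizedItem(5, object())

# <=, > and >= were not defined above; the decorator derives them.
print(low <= high, high > low, high >= low)  # True True True
print(sorted([high, low])[0] is low)         # True
```

Note that two items with the same priority compare equal here even when their payloads differ, which is the intended behaviour for a queue that only cares about priority.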
All you need is a wrapper class that implements __lt__ in order for PriorityQueue to work correctly. This is noted here:
The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method
It's as simple as something like this
class PriorityElem:
    def __init__(self, elem_to_wrap):
        self.wrapped_elem = elem_to_wrap
    def __lt__(self, other):
        return self.wrapped_elem.priority < other.wrapped_elem.priority
If your elements do not have priorities of their own, then it's as simple as:
class PriorityElem:
    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = priority
    def __lt__(self, other):
        return self.priority < other.priority
Now you can use PriorityQueue like so
queue = PriorityQueue()
queue.put(PriorityElem(my_custom_class1, 10))
queue.put(PriorityElem(my_custom_class2, 10))
queue.put(PriorityElem(my_custom_class3, 30))
first_returned_elem = queue.get()
# first_returned_elem is PriorityElem(my_custom_class1, 10)
second_returned_elem = queue.get()
# second_returned_elem is PriorityElem(my_custom_class2, 10)
third_returned_elem = queue.get()
# third_returned_elem is PriorityElem(my_custom_class3, 30)
Getting at your original elements in that case would be as simple as
elem = queue.get().wrapped_elem
Since you don't care about sort stability that's all you need.
Edit: As noted in the comments and confirmed here, heappush is not stable:
unlike sorted(), this implementation is not stable.

Keys are not unique for a python dictionary!

A stupid newbie question here
For a Python dictionary q, len(set(q.keys())) != len(q.keys()). Is that even possible?
This can happen if you violate a requirement of dict and change a key's hash after inserting it.
When an object is used in a dict, its hash value must not change, and its equality to other objects must not change. Other properties may change, as long as they don't affect how it appears to the dict.
(This does not mean that a hash value is never allowed to change. That's a common misconception. Hash values themselves may change. It's only dict which requires that key hashes be immutable, not __hash__ itself.)
The following code adds an object to a dict, then changes its hash out from under the dict. q[a] = 2 then adds a as a new key in the dict, even though it's already present; since the hash value changed, the dict doesn't find the old value. This reproduces the peculiarity you saw.
class Test(object):
    def __init__(self, h):
        self.h = h
    def __hash__(self):
        return self.h
a = Test(1)
q = {}
q[a] = 1
a.h = 2
q[a] = 2
print(q)
# True:
print(len(set(q.keys())) != len(q.keys()))
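One way to rule out the situation above is to make key objects immutable, so the hash cannot change while the object sits in a dict. The sketch below assumes Python 3.7+ and uses a frozen dataclass; the Key class is a hypothetical replacement for Test, not something from the original answer.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Key:
    # With frozen=True (and the default eq=True), dataclass generates
    # __eq__ and __hash__ from the fields and blocks attribute assignment.
    h: int


a = Key(1)
q = {a: 1}

try:
    a.h = 2  # the mutation that broke the dict in the example above
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

q[a] = 2  # updates the existing entry instead of adding a second one
print(mutated, len(q), len(set(q.keys())) == len(q.keys()))  # False 1 True
```

The same effect can be had by hand with read-only properties; the dataclass form is just less code.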
The underlying code for dictionaries and sets is substantially the same, so you can usually expect that len(set(d.keys())) == len(d.keys()) is an invariant.
That said, both sets and dicts depend on __eq__ and __hash__ to identify unique values and to organize them for efficient search. So, if those return inconsistent results (or violate the rule that "a==b implies hash(a)==hash(b)"), then there is no way to enforce the invariant:
>>> from random import randrange
>>> class A():
        def __init__(self, x):
            self.x = x
        def __eq__(self, other):
            return bool(randrange(2))
        def __hash__(self):
            return randrange(8)
        def __repr__(self):
            return '|%d|' % self.x
>>> s = [A(i) for i in range(100)]
>>> d = dict.fromkeys(s)
>>> len(d.keys())
29
>>> len(set(d.keys()))
12
