Check if an object is in a set (Python) - python

Let's say I have the following Point class:
class POINT:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
Main function:
def main():
    mySet = set()
    a = POINT(1,2)
    mySet.add(a)
    b = POINT(1,2)
    print("B is in mySet= {}".format(b in mySet))
I would like to know an efficient way to check whether an object (a point) is in a set.
I know two ways to accomplish it, but they are either inefficient or don't use a custom object:
Traverse all the point objects in the set --> O(n)
Use tuples to represent the points, i.e. (1,2) in mySet --> not using a custom object
I believe the in keyword checks the id or hash values of objects. I wonder what would let me check the values of objects in a set instead.

We could rephrase this question as "how do I use the in keyword with a custom object?"
We need to define __hash__ in the custom class. A set uses the hash to find candidate entries and then confirms a match with __eq__, so both methods are needed. How do we define it?
We need to consider two main concerns:
Avoiding collisions
Efficiency
We would get collisions if we defined the hash as self.x + self.y, because Point(x, y) and Point(y, x) would have the same hash value even though they are different points. Collisions do not produce wrong answers, since __eq__ still decides, but they slow lookups down.
One way to avoid them is to use the built-in hash() function. We can pack self.x and self.y into a tuple and pass that tuple to hash(). How efficient this is depends on how Python implements hash().
class POINT:
    # __init__ and __eq__ stay as in the question
    def __hash__(self):
        return hash((self.x, self.y))
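Putting the question's __eq__ together with the __hash__ above could look like the following minimal sketch. The class is spelled Point here and the isinstance check is an addition for illustration; neither comes from the original post.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Let Python try the other operand when it is not a Point.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal points
        # always produce equal hashes.
        return hash((self.x, self.y))


my_set = {Point(1, 2)}
found = Point(1, 2) in my_set      # equal value, different object
swapped = Point(2, 1) in my_set    # different value
print(found, swapped)
```

One caveat with this approach: the hash is computed from x and y, so a point should not be mutated while it is stored in a set or used as a dict key.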

Related

Quick way to remove duplicate objects in a List in Python

I have a list of MyClass objects which is made like so:
# The class is MyClass(string_a: str = None, string_b: str = None)
from typing import List
test_list: List[MyClass] = []
test_list.append(MyClass("hello", "world"))
test_list.append(MyClass("hello", ""))
test_list.append(MyClass("hello", "world"))
test_list.append(MyClass(None, "world"))
I want the end result to have only the 3rd append removed:
# Remove test_list.append(MyClass("hello", "world"))
This is just a sample; the list can be empty or hold n objects. Is there a quick way to remove the duplicates, or better, a quick way to tell whether an object already exists before appending it?
If your objects are of built-in hashable types (strings, numbers, tuples), you can use set. Note that a set does not preserve the original order:
list(set(test_list))
If not, as in your case, you have two options.
1- Implement __hash__() & __eq__()
You have to implement __hash__() and __eq__() in your class in order to use set() to remove the duplicates.
See the example below:
class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f"MyClass({self.x} - {self.y})"
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        if self.__class__ != other.__class__:
            return NotImplemented
        return (
            self.x == other.x and
            self.y == other.y
        )
l = []
l.append(MyClass('hello', 'world'))
l.append(MyClass('hello', 'world'))
l.append(MyClass('', 'world'))
l.append(MyClass(None, 'world'))
print(list(set(l)))
Since more than one attribute takes part in the comparison, __hash__() hashes a tuple of those attributes.
__repr__() is only there to give the object a readable string representation.
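As an alternative sketch, the standard-library dataclasses module can generate __eq__ and __hash__ for you when the class is declared frozen. The field names below follow the question's MyClass(string_a, string_b) signature; using dict.fromkeys to keep the first occurrence in order is an illustrative choice, not something from the original answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen + default eq=True also generates __hash__
class MyClass:
    string_a: Optional[str] = None
    string_b: Optional[str] = None


items = [
    MyClass("hello", "world"),
    MyClass("hello", ""),
    MyClass("hello", "world"),   # duplicate of the first item
    MyClass(None, "world"),
]

# Dict keys are unique and keep insertion order, so later duplicates drop out.
unique_items = list(dict.fromkeys(items))
print(unique_items)
```

Unlike list(set(...)), this keeps the items in their original order. The trade-off is that frozen instances cannot be modified after creation.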
2- Use a third-party package
Check out a package called toolz.
Then use its unique() function to remove the duplicates by passing a key. unique() returns a lazy iterator, so wrap the call in list() if you need a list:
toolz.unique(test_list, key=lambda x: x.your_attribute)
In your case there is more than one attribute, so combine them into a single key, for example with a property that returns them together, or by building a tuple on the fly as below. A tuple is safer than string concatenation, which fails when an attribute is None and can make different pairs collide ("ab" + "c" equals "a" + "bc").
toolz.unique(test_list, key=lambda x: (x.first_attribute, x.second_attribute))
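If you would rather not add a dependency, the same idea can be sketched in plain Python. The helper name unique_by and the Record class are hypothetical, introduced only for this illustration; the attribute names mirror the placeholders used above.

```python
def unique_by(items, key):
    """Yield each item whose key has not been seen yet, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


class Record:
    # Hypothetical stand-in for the question's class.
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute


records = [
    Record("hello", "world"),
    Record("hello", ""),
    Record("hello", "world"),   # duplicate of the first record
    Record(None, "world"),
]

# A tuple key works even when an attribute is None.
deduped = list(unique_by(records, key=lambda r: (r.first_attribute, r.second_attribute)))
print(len(deduped))
```

Because the keys live in a set, each membership check is O(1) on average, and the class itself needs no __hash__ or __eq__.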

Access class object by its parameters?

Is there any way to find an object saved in a list, knowing only its parameters and without traversing the said list?
For example, there's a class, objects of which have an (x;y) coordinate, and none of the objects share the same coordinate (all x/y pairs are distinct and do not repeat). These objects are all saved in a list:
class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
points = [Point(...), Point(...), Point(...), Point(...), ...]
Whenever I need the specific instance, is there any way to find it (here: its index in the list) by using just its coordinates without traversing the whole list like here:
def find_objects_index(x, y):
    for i in range(len(points)):
        if points[i].x == x and points[i].y == y:
            return i
EDIT: these Point()s are accessed for writing, not just reading, so object.x and object.y will change. You can't simply create a dictionary with (object.x, object.y) as keys: you'd need to add a new entry and delete the old one on every change.
You can use a list comprehension with a condition to get the item(s) you are looking for:
matching = [p for p in points if p.x == VALX and p.y == VALY]
However, in this case having a dictionary with (x, y) as key is most likely the correct (and well performing) way to go.
Is there any way to find an object saved in a list, knowing only its
parameters and without traversing the said list?
Short answer: No.
If you want or need to search such a collection of data points rapidly, perhaps you should consider using a type other than a list: a binary tree keyed on the x or y data, for example (or, if you need to track them separately, perhaps one tree for each).
Just put the Points in a dict():
class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
points_list = [Point(...), Point(...), Point(...), Point(...), ...]
points_dict = {(p.x,p.y):p for p in points_list}
def find_object(x, y):
    if (x,y) in points_dict:
        return points_dict[(x,y)]
def replace_object(x, y, new_point):
    points_dict.pop((x, y), None)
    points_dict[(new_point.x, new_point.y)] = new_point
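Since the points are going to be mutated, one possible refinement is to route every coordinate change through a small index object so the dictionary key never goes stale. This is only a sketch: the PointIndex class and its find and move methods are hypothetical names, not part of the answer above.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointIndex:
    """Maps (x, y) to the Point currently at that coordinate."""

    def __init__(self, points):
        self._by_coord = {(p.x, p.y): p for p in points}

    def find(self, x, y):
        # Average O(1) lookup; returns None when no point is there.
        return self._by_coord.get((x, y))

    def move(self, point, new_x, new_y):
        # Refuse to overwrite a different point at the target coordinate.
        occupant = self._by_coord.get((new_x, new_y))
        if occupant is not None and occupant is not point:
            raise ValueError("coordinate already occupied")
        # Re-key the entry so the index stays in sync with the point.
        del self._by_coord[(point.x, point.y)]
        point.x, point.y = new_x, new_y
        self._by_coord[(new_x, new_y)] = point


index = PointIndex([Point(0, 0), Point(1, 2)])
p = index.find(1, 2)
index.move(p, 5, 5)
print(index.find(1, 2), index.find(5, 5) is p)
```

The cost of each move is two dictionary operations, which is still constant time on average. The design only works if nothing assigns to point.x or point.y directly.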

Creating array of unique objects in Python

Let's suppose I have a program that builds a scheme out of lines and points.
Every line is determined by two points. These are the classes:
class Coordinates(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates
class Line(object):
    def __init__(self, coordinates_1, coordinates_2):
        self.coordinates_1 = coordinates_1
        self.coordinates_2 = coordinates_2
A scheme takes a list of lines and creates a collection of unique points.
class Circuit(object):
    def __init__(self, element_list):
        self.line_list = element_list
        self.point_collection = set()
        self.point_collection = self.generate_points()
    def generate_points(self):
        for line in self.line_list:
            coordinates_pair = [line.coordinates_1, line.coordinates_2]
            for coordinates in coordinates_pair:
                self.point_collection.add(Point(coordinates))
        return self.point_collection
What are the ways to build a list or collection of unique objects? How can it be done without sets or sorting, using only loops and conditions? And how can it be done more simply?
UPD. The code I attached doesn't work properly. I tried adding __hash__ and __eq__ methods to the Point class:
class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True
Then I try to make a scheme with some lines:
element_list=[]
element_list.append(Line(Coordinates(0,0), Coordinates(10,0)))
element_list.append(Line(Coordinates(10,0), Coordinates(10,20)))
circuit = Circuit(element_list)
print(circuit.point_collection)
The two lines here give four points, two of which have the same coordinates. Hence the code should print three objects, but it prints only one:
{<__main__.Point object at 0x0083E050>}
Short answer:
You need to implement __hash__() and __eq__() methods in your Point class.
For an idea, see this answer showing a correct and good way to implement __hash__().
Long answer:
The documentation says that:
A set object is an unordered collection of distinct hashable objects. Common uses include (...) removing duplicates from a sequence (...)
And hashable means:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
Which explains why your code does not remove duplicate points.
Consider this implementation that makes all instances of Foo distinct and all instances of Bar equal:
class Foo:
    pass
class Bar:
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True
Now run:
>>> set([Foo(), Foo()])
{<__main__.Foo at 0x7fb140791da0>, <__main__.Foo at 0x7fb140791f60>}
>>> set([Bar(), Bar()])
{<__main__.Bar at 0x7fb1407c5780>}
In your case, __eq__ should return True when both coordinates are equal, while __hash__ should return a hash of the coordinate pair. See the answer mentioned earlier for a good way to do this.
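For the classes in the question, that advice might be sketched as follows. The _key helper is a hypothetical name added for illustration, and the isinstance check is likewise an assumption about how you would want mixed comparisons to behave.

```python
class Coordinates:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point:
    def __init__(self, coordinates):
        self.coordinates = coordinates

    def _key(self):
        # Hypothetical helper: the pair that defines a point's identity.
        return (self.coordinates.x, self.coordinates.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal points share a key, so they share a hash.
        return hash(self._key())

    def __repr__(self):
        return "Point({}, {})".format(*self._key())


points = {
    Point(Coordinates(0, 0)),
    Point(Coordinates(10, 0)),
    Point(Coordinates(10, 0)),   # duplicate of the previous point
    Point(Coordinates(10, 20)),
}
print(len(points))
```

With value-based equality and hashing in place, the set keeps one Point per distinct coordinate pair, which is the behaviour the question expected.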
Some remarks:
Your Point class currently has no reason to exist from a design perspective, since it is just a wrapper around Coordinates and offers no additional functionality. You should use just one of them, for example:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
And why not call coordinates_1 and coordinates_2 just a and b?
class Line(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
Also, your generate_points could be implemented in a more pythonic way:
def generate_points(self):
    return set(p for l in self.line_list for p in (l.a, l.b))
Finally, for easier debugging, you might consider implementing __repr__ and __str__ methods in your classes.
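The question also asked how to do this with only loops and conditions, without sets or sorting. A minimal sketch of that variant, assuming the simplified Point(x, y) class and a hypothetical helper named unique_points:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Defining __eq__ alone makes the class unhashable, which is
        # fine here because no set or dict is involved.
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y


def unique_points(points):
    """Return the points with duplicates removed, keeping first occurrences."""
    unique = []
    for candidate in points:
        already_present = False
        for kept in unique:
            if kept == candidate:
                already_present = True
                break
        if not already_present:
            unique.append(candidate)
    return unique


result = unique_points([Point(0, 0), Point(10, 0), Point(10, 0), Point(10, 20)])
print([(p.x, p.y) for p in result])
```

This compares every candidate against every kept point, so it is O(n²) in the worst case. The set-based version is the better choice for anything but small inputs.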

What happens in this class?

I am trying to alter a program, but I first need to fully understand the code.
class Coordinate:
    def __init__(self,x,y):
        self.x = x
        self.y = y
    def equal_to(self,coordinate):
        return coordinate.x == self.x and coordinate.y == self.y
    def merge_together(self,coordinate_together):
        return Coordinate(self.x+coordinate_together.x,self.y+coordinate_together.y)
What is the functionality of this class?
I can't understand it, especially the return coordinate.x and coordinate.y parts.
::merge_together adds the coordinates component-wise (a translation), returning a new Coordinate instance.
::equal_to compares two Coordinate objects (but perhaps should be using the __eq__ idiom, along with related methods).
It returns True if coordinate.x == self.x AND coordinate.y == self.y,
i.e. when both objects describe the same position.
The method presumably expects an instance of Coordinate as its argument.
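The __eq__ idiom mentioned above could look like the sketch below, with __add__ standing in for merge_together so that == and + work directly. This is an illustrative rewrite under that assumption, not the original author's code.

```python
class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Replaces equal_to: enables `a == b`.
        if not isinstance(other, Coordinate):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        # Kept consistent with __eq__ so instances remain usable in sets.
        return hash((self.x, self.y))

    def __add__(self, other):
        # Replaces merge_together: enables `a + b`.
        if not isinstance(other, Coordinate):
            return NotImplemented
        return Coordinate(self.x + other.x, self.y + other.y)


a = Coordinate(1, 2)
b = Coordinate(3, 4)
total = a + b
print(total.x, total.y, a == Coordinate(1, 2))
```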

When to store things as part of an instance vs returning them?

I was just wondering when to store things as part of a class instance versus when to use a method to return things. For example, which of the following would be better:
class MClass():
    def __init__(self):
        self.x = self.get_x()
        self.get_y()
        self.z = None
        self.get_z()
    def get_x(self):
        return 2
    def get_y(self):
        self.y = 5 * self.x
    def get_z(self):
        return self.get_x() * self.x
What are the conventions regarding this sort of thing and when should I assign things to self and when should I return values? Is this essentially a public/private sort of distinction?
You shouldn't return anything from __init__.
Python is not Java. You don't need to include get for everything.
If x is always 2 and y is always 10 and z is always 12, that is a lot of code.
Making some assumptions, I would write that class:
class MClass(object):
    def __init__(self, x):
        self.x = x
    def y(self):
        return self.x * 5
    def z(self):
        return self.x + self.y()
>>> c = MClass(2)
>>> c.x
2
>>> c.y() # note parentheses
10
>>> c.z()
12
This allows x to change later (e.g. c.x = 4) and still give the correct values for y and z.
You can use the @property decorator:
class MClass():
    def __init__(self):
        self.x = 2
    @property
    def y(self):
        return 5 * self.x
    # optional: a setter that derives x from the assigned y
    @y.setter
    def y(self, value):
        self.x = value / 5
    @property
    def z(self):
        return self.x * self.x
It's a good way of organizing your accessors.
There are no fixed "conventions" regarding this, AFAIK, although there are common practices, which differ from one language to the next.
In Python, the general belief is that "everything is public", and there's no reason at all to have a getter method just to return the value of an instance variable. You may, however, need such a method if you need to perform operations on the instance when the variable is accessed.
Your get_y method, for example, only makes sense if you need to recalculate the expression (5 * self.x) every time you access the value. Otherwise, you should simply define the y variable in the instance in __init__ - it's faster (because you don't recalculate the value every time) and it makes your intentions clear (because anyone looking at your code will immediately know that the value does not change)
Finally, some people prefer using properties instead of writing bare get/set methods. There's more info in this question
I read your question as a general Object Oriented development question, rather than a python specific one. As such, the general rule of member data would be to save the data as a member of the class only if it's relevant as part of a particular instance.
As an example, suppose you have a Screen object with two dimensions, height and width. Those two should be stored as members. The area, on the other hand, can be a method that computes its value from that particular instance's height and width.
If there are certain things that seem like they should be calculated on the fly, but might be called over and over again, you can cache them as members as well, but that's really something you should do after you determine that it is a valid trade off (extra member in exchange for faster run time).
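The Screen example might be sketched like this. Exposing area as a read-only property is an assumption made for illustration; a plain method would serve equally well.

```python
class Screen:
    def __init__(self, width, height):
        # Width and height describe this particular instance, so they are stored.
        self.width = width
        self.height = height

    @property
    def area(self):
        # Derived from the stored members, so it is computed on access
        # and can never go out of date.
        return self.width * self.height


screen = Screen(1920, 1080)
initial_area = screen.area
screen.width = 1280
resized_area = screen.area
print(initial_area, resized_area)
```

If profiling later shows that recomputing the value is too slow, it can be cached as a member, at the price of having to refresh the cache whenever width or height changes.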
get should always do what it says. get_y() and get_z() don't do that.
Better do:
class MClass(object):
    def __init__(self):
        self.x = 2
    @property
    def y(self):
        return 5 * self.x
    @property
    def z(self):
        return self.x * self.x
This makes y and z always depend on the value of x.
You can do
c = MClass()
print(c.y, c.z)  # 10 4
c.x = 20
print(c.y, c.z)  # 100 400
