Comparing contents of lists ignoring order - python

Assuming I have the class shown below:
class OBJ:
    def __init__(self, a):
        self.A = a
and I have two lists of these objects:
# Sorry, this is a bad example; please see the edit at the bottom.
a = [OBJ(1), OBJ(0), OBJ(20), OBJ(-1)]
b = [OBJ(20), OBJ(-1), OBJ(1), OBJ(0)]
How do I prove that the contents of these two lists are the same?
I have tried using sorted(), but it doesn't work because the objects don't define an ordering, so they cannot be compared. Does anyone have a quick and efficient way of solving this? Thank you!
Edit:
Sorry, the two lists above are a bad example. By "the same" I mean that both lists refer to the same objects. So:
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
x = [a,b,c]
y = [c,a,b]
How do I prove that x and y are the same?

You need to implement the __eq__ and __lt__ methods to allow you to sort the objects and then compare them:
class OBJ:
    def __init__(self, a):
        self.A = a
    def __eq__(self, other):
        if not isinstance(other, OBJ):
            # don't attempt to compare against unrelated types
            return NotImplemented
        return self.A == other.A
    def __lt__(self, other):
        return self.A < other.A
a = [OBJ(1), OBJ(0), OBJ(20), OBJ(-1)]
b = [OBJ(20), OBJ(-1), OBJ(1), OBJ(0)]
test:
sorted(a) == sorted(b)
Output: True
Edit:
The edit to the question clarifies that you want to check whether the lists contain exactly the same objects, not just objects built from the same inputs. To do this, use id() to see whether the elements point to the same objects.
Example:
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
x = [a,b,c]
y = [c,a,b]
sorted([id(temp) for temp in x]) == sorted([id(temp) for temp in y])
Output: True
however...
a = OBJ(1)
b = OBJ(-1)
c = OBJ(20)
d = OBJ(20) # Same input value as c, but a different object
x = [a,b,c]
y = [d,a,b]
sorted([id(temp) for temp in x]) == sorted([id(temp) for temp in y])
Output: False
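The identity check above can be wrapped in a small helper. This is only a minimal sketch: the name same_objects is hypothetical, and it counts id() values with collections.Counter instead of sorting them, which is a swapped-in alternative to the sorted() comparison used in this answer.

```python
from collections import Counter


class OBJ:
    def __init__(self, a):
        self.A = a


def same_objects(x, y):
    """Return True if x and y hold exactly the same objects, ignoring order.

    Hypothetical helper: it counts id() values, so duplicates are respected.
    """
    return Counter(map(id, x)) == Counter(map(id, y))


a, b, c = OBJ(1), OBJ(-1), OBJ(20)
d = OBJ(20)  # same input value as c, but a different object

print(same_objects([a, b, c], [c, a, b]))  # True
print(same_objects([a, b, c], [d, a, b]))  # False
```

Counting avoids the sort, so the comparison is linear in the list length, and it needs neither __eq__ nor __lt__ on the class.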

You could compare two stand-in lists, each sorted() by your attribute A:
>>> print(sorted([o.A for o in a]) == sorted([o.A for o in b]))
True
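The same idea can be written as a reusable function. This is a sketch under assumptions: same_values is a hypothetical name, and operator.attrgetter is used in place of the list comprehensions above. It compares attribute values only, so the class needs no __eq__ or __lt__.

```python
from operator import attrgetter


class OBJ:
    def __init__(self, a):
        self.A = a


def same_values(x, y, attr="A"):
    """Compare two lists by one attribute's values, ignoring order (hypothetical helper)."""
    get = attrgetter(attr)
    return sorted(map(get, x)) == sorted(map(get, y))


a = [OBJ(1), OBJ(0), OBJ(20), OBJ(-1)]
b = [OBJ(20), OBJ(-1), OBJ(1), OBJ(0)]
c = [OBJ(20), OBJ(-1), OBJ(1), OBJ(1)]

print(same_values(a, b))  # True
print(same_values(a, c))  # False
```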

Related

How to define 'in' so that a DataFrame's index can be tested against a self-defined class?

I have already implemented containment for a self-defined class, like this:
class A:
    def __init__(self):
        self.l = [1,2,3]
    def __contain__(self, i:int):
        if i in self.l:
            return True
        return False
And it works fine with a single element:
if 1 in A:
    return True
But now I want to do something like:
df = pd.DataFrame(np.random.randn(10,10))
a = df[df.index in A]
to get the rows whose index is in A (that is, whose index is in [1,2,3]).
But it raises an error: TypeError: argument of type 'A' is not iterable
I know it could be done in the form
a = df[[id for id in df.index if id in A]]
but I would like to know whether there is a form like df[df.index in A], because it looks neat and efficient.
Whenever I tried to make __contains__ return an iterable, I just got a single bool:
import numpy as np
import pandas as pd
from typing import Iterable, Union
class A:
    def __init__(self):
        self.l = [1,2,3]
    def __contains__(self, i:Union[int,Iterable]):
        if isinstance(i, Iterable):
            return [j in self.l for j in i]
        elif i in self.l:
            return True
        return False
a = A()
df = pd.DataFrame(np.random.randn(10,10))
print(df.index in a)
Output:
True
It seems that Python implicitly applies bool() to whatever __contains__ returns, so the in operator always yields a single True or False.
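The coercion can be seen without pandas. The following is a minimal sketch; the class name Container and its items attribute are hypothetical stand-ins for the class A above.

```python
class Container:
    def __init__(self):
        self.items = [1, 2, 3]

    def __contains__(self, value):
        # Deliberately return a list; the `in` operator will still produce a bool.
        return [value in self.items]


c = Container()

direct = c.__contains__(5)  # the raw return value, a one-element list
result = 5 in c             # `in` converts that list to its truth value

print(direct)  # [False]
print(result)  # True, because a non-empty list is truthy
```

This is why an elementwise test has to live in an ordinary method rather than behind the in operator.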
Still, you can implement it with a Series-like interface
import numpy as np
import pandas as pd
from typing import Iterable, Union
class A:
    def __init__(self):
        self.l = [1,2,3]
    def isin(self,i:Iterable):
        return [j in self.l for j in i]
a = A()
df = pd.DataFrame(np.random.randn(10,10))
print(df[a.isin(df.index)])
Output:
0 1 2 3 4 5 6 \
1 -0.899868 0.830076 1.106072 -1.664480 1.291234 0.257702 -1.486293
2 1.060163 1.143478 0.861907 1.480999 -1.238395 -0.130496 -0.441712
3 1.176099 0.105020 0.502756 0.993179 1.561893 1.036998 0.551943
7 8 9
1 0.394313 0.434380 -1.554062
2 -2.538269 0.188291 -0.451774
3 -0.342378 -0.779410 -1.491517

Custom object does not properly work as dictionary key even after overwriting __hash__() and __eq__()

NOTE: I am aware of this exact same question here and here. However, I have tried the solutions proposed by the answers there and they do not work for me (see sample code below).
A B object holds a list of A objects. Each A consists of a tuple of exactly two integers plus an integer.
I am trying to use B objects as keys in a dictionary. However, even after implementing my own __eq__() and __hash__() methods, the length of my dictionary increases when I add an equal object to it.
See code below:
class A:
    def __init__(self, my_tuple, my_integer):
        self.my_tuple = my_tuple
        self.my_integer = my_integer
    def __eq__(self, other):
        return self.my_tuple == other.my_tuple and self.my_integer == other.my_integer
class B:
    def __init__(self):
        self.list_of_A = []
    def add(self, my_tuple, my_integer):
        new_A = A(my_tuple, my_integer)
        self.list_of_A.append(new_A)
    def __hash__(self):
        return hash(repr(self))
    def __eq__(self, other):
        for i in range(len(self.list_of_A)):
            if self.list_of_A[i] != other.list_of_A[i]:
                return False
        return True
b_1 = B()
b_1.add((1,2), 3)
b_2 = B()
b_2.add((1,2), 3)
my_dict = {}
my_dict[b_1] = 'value'
print(len(my_dict))
my_dict[b_2] = 'value_2'
print(len(my_dict))
The output I am getting is
1
2
And the expected output is
1
1
because I am adding the same object (i.e. one with the same property values).
The hashes aren't equal because the repr()s aren't equal. Consider the following example I just did on my python console using your code:
>>> x = B()
>>> y = B()
>>> repr(x)
'<__main__.B object at 0x7f7b3a20c358>'
>>> repr(y)
'<__main__.B object at 0x7f7b3aa197b8>'
Since the two repr() strings differ, x and y will have different hashes.
All you need to do, then, is override __repr__() so that it outputs a deterministic value based on the contents of the object, rather than its memory address, and you should be good to go. In your case, that may look something like this:
class A:
    ...
    def __repr__(self):
        return f"A(my_tuple:{self.my_tuple}, my_integer:{self.my_integer})"
class B:
    ...
    def __repr__(self):
        return f"B(list_of_A:{self.list_of_A})"
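An alternative to routing the hash through repr() is to hash the contents directly. The sketch below is an assumption-laden variant, not the approach from this answer: the _key helper is hypothetical, A gets its own __hash__, and B hashes a tuple of its A objects. B.__eq__ is also rewritten as a plain list comparison so that lists of different lengths never compare equal.

```python
class A:
    def __init__(self, my_tuple, my_integer):
        self.my_tuple = my_tuple
        self.my_integer = my_integer

    def _key(self):
        # Hypothetical helper: the values that define equality for A.
        return (self.my_tuple, self.my_integer)

    def __eq__(self, other):
        if not isinstance(other, A):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class B:
    def __init__(self):
        self.list_of_A = []

    def add(self, my_tuple, my_integer):
        self.list_of_A.append(A(my_tuple, my_integer))

    def __eq__(self, other):
        if not isinstance(other, B):
            return NotImplemented
        # List equality compares lengths as well as elements.
        return self.list_of_A == other.list_of_A

    def __hash__(self):
        # Lists are unhashable, so hash an immutable snapshot instead.
        return hash(tuple(self.list_of_A))


b_1 = B()
b_1.add((1, 2), 3)
b_2 = B()
b_2.add((1, 2), 3)

my_dict = {}
my_dict[b_1] = "value"
my_dict[b_2] = "value_2"
print(len(my_dict))  # 1
```

Either way, a B that is already used as a key should not be mutated afterwards, because its hash would change and the dictionary could no longer find it.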

Adding up instance variables of a class object

I want to sum up an instance variable of a class object in a list of that class object.
class A(object):
    def __init__(self):
        self.a = 20
B = []
for i in range(10):
    B.append(A())
# Can this be made more pythonic?
sum = 0
for i in B:
    sum += i.a
I was thinking along the lines of using a map function or something. I don't know if that's pushing it too much, though. Just curious.
This works:
class A(object):
    def __init__(self):
        self.a = 20
B = []
for i in range(10):
    B.append(A())
s = sum(i.a for i in B)
print(s)
You can use reduce (in Python 3, import it from functools first):
reduce(lambda acc, c: acc + c, [i.a for i in B])
or sum() with a list comprehension:
sum([i.a for i in B])
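For comparison, here are the variants mentioned in this thread side by side, as a Python 3 sketch. The variable names total_gen, total_map and total_reduce are hypothetical, and operator.attrgetter is an addition that the answers above do not use.

```python
from functools import reduce
from operator import attrgetter


class A:
    def __init__(self):
        self.a = 20


B = [A() for _ in range(10)]

# Generator expression: no intermediate list is built.
total_gen = sum(obj.a for obj in B)

# map() with attrgetter, along the lines the question suggested.
total_map = sum(map(attrgetter("a"), B))

# reduce() with an explicit initial value, so an empty list gives 0.
total_reduce = reduce(lambda acc, obj: acc + obj.a, B, 0)

print(total_gen, total_map, total_reduce)  # 200 200 200
```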

comparison function must return int, not long

class C:
    def __init__(self,n,x):
        self.n = n
        self.x = x
a = C('a',1)
b = C('b',2)
c = C('c',3)
classList = [b,a,c]
for q in classList: print q.n,
classList.sort(lambda a,b: long(a.x - b.x))
for q in classList: print q.n,
Running the code above would get the error TypeError: comparison function must return int, not long.
Is there another clean way to sort class objects by certain class variables?
Use the built-in cmp function: cmp(a.x, b.x)
By the way, you can also utilize the key parameter of sort:
classList.sort(key=lambda c: c.x)
which is faster.
According to wiki.python.org:
This technique is fast because the key function is called exactly once for each input record.
I don't think you need long:
class C:
    def __init__(self,n,x):
        self.n = n
        self.x = x
a = C('a',1)
b = C('b',2)
c = C('c',3)
classList = [b,a,c]
for q in classList: print q.n,
classList.sort(lambda a,b: a.x - b.x)
for q in classList: print q.n,
Output:
b a c a b c
Instead of using a cmp function, use a key function - it is more efficient, and doesn't have this kind of restriction on what types it can return:
classList.sort(key=lambda a: a.x)
This is also more future proof: cmp functions are no longer supported in Python 3, and continue to exist in Python 2 in order to support old code (from before key existed).
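In Python 3 the two styles look roughly like this. It is a sketch only: compare_x and the variable names are hypothetical, and functools.cmp_to_key is the standard-library adapter for reusing an old comparison function where a key is expected.

```python
from functools import cmp_to_key
from operator import attrgetter


class C:
    def __init__(self, n, x):
        self.n = n
        self.x = x


class_list = [C("b", 2), C("a", 1), C("c", 3)]

# Preferred: a key function, called once per element.
by_key = sorted(class_list, key=attrgetter("x"))


def compare_x(left, right):
    # Old-style comparison: negative, zero or positive.
    return (left.x > right.x) - (left.x < right.x)


# Legacy comparison functions can be adapted with cmp_to_key.
by_cmp = sorted(class_list, key=cmp_to_key(compare_x))

print([c.n for c in by_key])  # ['a', 'b', 'c']
print([c.n for c in by_cmp])  # ['a', 'b', 'c']
```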
You can just add the comparison you want to your class:
class C(object):
    def __init__(self,n,x):
        self.n = n
        self.x = x
    def __cmp__(self,other):
        return cmp(self.x,other.x)
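__cmp__ and cmp() exist only in Python 2. A rough Python 3 counterpart, sketched here as an assumption rather than as part of this answer, defines __eq__ and __lt__ and lets functools.total_ordering derive the remaining comparisons.

```python
from functools import total_ordering


@total_ordering
class C:
    def __init__(self, n, x):
        self.n = n
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __lt__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x < other.x


items = [C("b", 2), C("a", 1), C("c", 3)]
items.sort()  # uses __lt__, no key or cmp argument needed
names = [c.n for c in items]
print(names)  # ['a', 'b', 'c']
```

Note that defining __eq__ without __hash__ makes instances unhashable, which matters if they also need to go into sets or dictionary keys.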

Python, mutable object as default argument, is there any way to solve?

class A:
    def __init__(self, n=[0]):
        self.data = n
a = A()
print a.data[0] #print 0
a.data[0] +=1
b = A()
print b.data[0] #print 1, desired output is 0
In the case above, is there any way to provide a mutable default argument (such as a list or a class instance) in A's __init__() so that b is not affected by operations on a?
You could try this:
class A:
    def __init__(self, n=None):
        if n is None:
            n = [0]
        self.data = n
This avoids the biggest problem you're facing here, namely that the same list is shared by every object of type A.
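The sharing can be made visible with a short Python 3 sketch. The class names Shared and Fresh are hypothetical and exist only to put the two versions side by side.

```python
class Shared:
    def __init__(self, n=[0]):  # the default list is created once, when the def runs
        self.data = n


class Fresh:
    def __init__(self, n=None):
        if n is None:
            n = [0]  # a new list on every call
        self.data = n


a, b = Shared(), Shared()
a.data[0] += 1
print(b.data[0])         # 1, because both instances hold the same list
print(a.data is b.data)  # True

c, d = Fresh(), Fresh()
c.data[0] += 1
print(d.data[0])         # 0
print(c.data is d.data)  # False
```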
One possibility is:
class A:
    def __init__(self, n=None):
        if n is None:
            n = [0]
        self.data = n
Also:
class A:
    def __init__(self, n=[0]):
        print id(n)
        self.data = n[:]
        print id(self.data)
        del n
a = A()
print a.data[0] #prints 0
a.data[0] +=1
print a.data[0] #prints 1
print
b = A()
print b.data[0] #prints desired output 0
The principle is that it creates another list. If a long list is passed as an argument, there will be two long lists in memory, so the drawback is the extra copy. That's why I delete n, although del n only removes the local name; the original list stays alive as long as something else still references it.
I don't think it's better, but it may help you understand what happens.
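The two ideas in this thread, the None sentinel and the copy, can be combined. The following Python 3 sketch is one possible arrangement, not a definitive recipe; the variable source is hypothetical.

```python
class A:
    def __init__(self, n=None):
        # Build a fresh default, or copy the caller's list so nothing is shared.
        self.data = [0] if n is None else list(n)


source = [5, 6]
a = A(source)
a.data[0] += 1
print(a.data)  # [6, 6]
print(source)  # [5, 6], the caller's list is untouched

b = A()
c = A()
b.data[0] += 1
print(c.data)  # [0]
```

The copy is shallow, so nested mutable elements would still be shared; whether that matters depends on what the list holds.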
