I understand that some languages allow users to overload operators. I first encountered this in C++. But C++ also places some restrictions on operator overloading, and I think that's reasonable.
But when I came to Python's pandas library, I started to get confused.
Take a look at my code at nbviewer.jupyter.org
complaints['Complaint Type'] == "Noise - Street/Sidewalk"
doesn't return a True or False.
This is crazy to me. Can anyone help me understand this?
1. In Python, can we overload the == operator so that it doesn't return a boolean?
2. If the answer to question 1 is yes, how can I write some simple code to demo this?
Some relevant results copied from the link:
>>> complaints['Complaint Type'] == "Noise - Street/Sidewalk"
0 True
1 False
2 False
3 False
4 False
...
111063 False
111064 False
111065 False
111066 True
111067 False
111068 False
Name: Complaint Type, Length: 111069, dtype: bool
You can overload operators if you create your own classes and add a __eq__ method to them.
class MyClass(object):
    def __eq__(self, other):
        # compare self with other, return whatever you need
        pass  # placeholder so the method body is valid
This will be invoked whenever you compare your type with self == other. It is considered very normal to return a boolean from this function in python, so you might want to have a think about returning anything else if you want your code to make sense to other developers.
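As a minimal sketch for the second question, here is a hypothetical Column class whose __eq__ returns one boolean per element, roughly the way the pandas comparison in the question behaves. Column and the sample values are made up for illustration; this is not how pandas is actually implemented.

```python
class Column:
    """Hypothetical stand-in for a pandas Series; not the real implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Return one boolean per element instead of a single True/False.
        return [value == other for value in self.values]


# Made-up sample data, only for the demo.
complaints = Column(["Noise - Street/Sidewalk", "Illegal Parking", "Noise - Street/Sidewalk"])
mask = complaints == "Noise - Street/Sidewalk"
print(mask)  # [True, False, True]
```

So == is just a method call, and Python does not force the result to be a bool.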
See the docs for python 2 on this here
Related
Just to clarify, I can't imagine ever wanting to do this. But let's say I want to modify how == works. Not in the context of a custom class I'm making, but in all cases. Let's say, for instance, that I want to redefine == so it ignores its operands and returns True in all cases.
So,
>>> 1 == 5
True
>>> True == False
True
Can this be done? My curiosity won't let me rest until I know. Thanks!
I'm pretty certain you can not overload operators of built in types. But I suppose you could wrap it
class MyInt(int):
    def __eq__(self, other):
        return other == 5

if __name__ == '__main__':
    i = MyInt(2)
    print(i == 5)
    i += 5
    print(i == 5)
    print(i)
The MyInt class acts like a normal int, but its equality check is overridden and returns whatever you want it to. However, as soon as you use other operators, the result goes back to being a normal int. You would have to override all the other operators as well, making sure they always return a MyInt.
Output:
True
False
7
That's pretty much the only way I can think of achieving something close to what you asked for.
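A rough sketch of what overriding another operator could look like, assuming Python 3 and covering only + (every other operator would need the same treatment). The __hash__ line is an addition of mine: in Python 3, defining __eq__ otherwise makes the class unhashable.

```python
class MyInt(int):
    def __eq__(self, other):
        return other == 5

    # Python 3 sets __hash__ to None when __eq__ is defined; restore int's hash.
    __hash__ = int.__hash__

    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:
            # Let Python try the other operand (e.g. a float).
            return result
        # Wrap the plain int result so the custom == survives the addition.
        return MyInt(result)


i = MyInt(2)
i += 5                  # no __iadd__ defined, so this falls back to __add__
print(type(i).__name__) # MyInt
print(i == 5)           # True, the overridden __eq__ is still in effect
print(int(i))           # 7
```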
The most basic types like int or bool are written as C extension types for which the methods can't be changed:
E.g. trying to make integer + perform subtraction:
>>> int.__add__ = int.__sub__
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    int.__add__ = int.__sub__
TypeError: can't set attributes of built-in/extension type 'int'
Currently a set of functions returns success=True or False.
We've discovered this isn't good enough, though, since False can convey both "valid result" or "invalid result", and we want behavior to differ in each case.
So I think they should be changed to instead return one of {True, False, InvalidResult}, where bool(InvalidResult) is False for backward compatibility, but which can be tested for using if result is InvalidResult.
I'm not sure what the terminology is, but I'm imagining something like the built-in NotImplemented that's returned by comparison functions. This is called a "special value" in the docs and is of type NotImplementedType.
How do I create such an object, and what methods/attributes should it have? Should I create my own type like NotImplementedType, or is there an existing type that conveys this "flag" concept? It's a similar kind of object to True, False, None, NotImplemented, etc.
You could just use None or 0 as the InvalidResult value, e.g. in my_mod, define InvalidResult = None, then elsewhere you can test if result is my_mod.InvalidResult. See here for some more info on the "truthfulness" of None: False or None vs. None or False
Or you could define an object with suitable methods for Boolean conversion; hopefully others will chime in with those details.
Note that whichever way you go, you'll have to be careful if you have multipart Boolean expressions: InvalidResult and False will give InvalidResult, but False and InvalidResult will give False.
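A small sketch of that ordering effect, assuming Python 3 and a falsy sentinel object like the one discussed in this thread. The and operator returns the first falsy operand it meets, so the result depends on which side the sentinel is on.

```python
class InvalidResultType:
    def __repr__(self):
        return 'InvalidResult'

    def __bool__(self):
        # Falsy, so existing "if success:" checks keep working.
        return False


InvalidResult = InvalidResultType()

first = InvalidResult and False   # stops at InvalidResult, the first falsy operand
second = False and InvalidResult  # stops at False, the first falsy operand

print(first)   # InvalidResult
print(second)  # False
```

If the distinction matters, testing each result with is InvalidResult before combining them avoids losing it.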
Apparently this is called a "sentinel" and is as simple as this:
class InvalidResultType(object):
    """
    Indicates that minimization has failed and result is invalid (such as a
    boundary or constraint violation)
    """
    def __repr__(self):
        return 'InvalidResult'

    def __bool__(self):
        return False

    __nonzero__ = __bool__  # Python 2 looks up __nonzero__ instead of __bool__

    def __reduce__(self):
        # A string return value is treated as the name of a global variable
        return 'InvalidResult'

InvalidResult = InvalidResultType()

success = InvalidResult
assert success == InvalidResult
assert success is InvalidResult
assert not bool(InvalidResult)
assert InvalidResult != True
assert InvalidResult != False  # Not sure about this yet
assert InvalidResult != None
The __reduce__ might be overkill; I'm not sure if pickling or copying will ever matter.
Now of course I find the similar questions:
Defining my own None-like Python constant
How to create a second None in Python? Making a singleton object where the id is always the same
In a comment on this question, I saw a statement that recommended using
result is not None
vs
result != None
What is the difference? And why might one be recommended over the other?
== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)
is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No method calls are made, so objects can't influence the is operation.
You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
First, let me go over a few terms. If you just want your question answered, scroll down to "Answering your question".
Definitions
Object identity: When you create an object, you can assign it to a variable. You can then also assign it to another variable. And another.
>>> button = Button()
>>> cancel = button
>>> close = button
>>> dismiss = button
>>> print(cancel is close)
True
In this case, cancel, close, and dismiss all refer to the same object in memory. You only created one Button object, and all three variables refer to this one object. We say that cancel, close, and dismiss all refer to identical objects; that is, they refer to one single object.
Object equality: When you compare two objects, you usually don't care whether they refer to the exact same object in memory. With object equality, you can define your own rules for how two objects compare. When you write if a == b:, you are essentially saying if a.__eq__(b):. This lets you define an __eq__ method on a so that you can use your own comparison logic.
Rationale for equality comparisons
Rationale: Two objects have the exact same data, but are not identical. (They are not the same object in memory.)
Example: Strings
>>> greeting = "It's a beautiful day in the neighbourhood."
>>> a = unicode(greeting)
>>> b = unicode(greeting)
>>> a is b
False
>>> a == b
True
Note: I use unicode strings here because Python is smart enough to reuse regular strings without creating new ones in memory.
Here, I have two unicode strings, a and b. They have the exact same content, but they are not the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the unicode object has implemented the __eq__ method.
class unicode(object):
    # ...
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for i, j in zip(self, other):
            if i != j:
                return False
        return True
Note: __eq__ on unicode is definitely implemented more efficiently than this.
Rationale: Two objects have different data, but are considered the same object if some key data is the same.
Example: Most types of model data
>>> import datetime
>>> a = Monitor()
>>> a.make = "Dell"
>>> a.model = "E770s"
>>> a.owner = "Bob Jones"
>>> a.warranty_expiration = datetime.date(2030, 12, 31)
>>> b = Monitor()
>>> b.make = "Dell"
>>> b.model = "E770s"
>>> b.owner = "Sam Johnson"
>>> b.warranty_expiration = datetime.date(2005, 8, 22)
>>> a is b
False
>>> a == b
True
Here, I have two Dell monitors, a and b. They have the same make and model. However, they neither have the same data nor are the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the Monitor object implemented the __eq__ method.
class Monitor(object):
    # ...
    def __eq__(self, other):
        return self.make == other.make and self.model == other.model
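For completeness, a self-contained sketch of the Monitor example, assuming Python 3. The constructor, the isinstance guard, and the __hash__ method are my additions and not part of the snippet above; the guard returns NotImplemented so that comparing against unrelated types does not raise AttributeError.

```python
import datetime


class Monitor:
    def __init__(self, make, model, owner, warranty_expiration):
        self.make = make
        self.model = model
        self.owner = owner
        self.warranty_expiration = warranty_expiration

    def __eq__(self, other):
        if not isinstance(other, Monitor):
            # Let Python try the other operand, then fall back to identity.
            return NotImplemented
        # Only make and model take part in equality.
        return (self.make, self.model) == (other.make, other.model)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.make, self.model))


a = Monitor("Dell", "E770s", "Bob Jones", datetime.date(2030, 12, 31))
b = Monitor("Dell", "E770s", "Sam Johnson", datetime.date(2005, 8, 22))

print(a is b)             # False
print(a == b)             # True
print(a == "Dell E770s")  # False
```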
Answering your question
When comparing to None, always use is not. None is a singleton in Python - there is only ever one instance of it in memory.
By comparing identity, this can be performed very quickly. Python checks whether the object you're referring to has the same memory address as the global None object - a very, very fast comparison of two numbers.
By comparing equality, Python has to look up whether your object has an __eq__ method. If it does not, it examines each superclass looking for an __eq__ method. If it finds one, Python calls it. This is especially bad if the __eq__ method is slow and doesn't immediately return when it notices that the other object is None.
Did you not implement __eq__? Then Python will probably find the __eq__ method on object and use that instead - which just checks for object identity anyway.
When comparing most other things in Python, you will be using !=.
Consider the following:
class Bad(object):
    def __eq__(self, other):
        return True

c = Bad()
c is None  # False, equivalent to id(c) == id(None)
c == None  # True, equivalent to c.__eq__(None)
None is a singleton, and therefore identity comparison will always work, whereas an object can fake the equality comparison via .__eq__().
>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True
Some objects are singletons, and thus is with them is equivalent to ==. Most are not.
It is standard convention to use if foo is None rather than if foo == None to test if a value is specifically None.
If you want to determine whether a value is exactly True (not just a true-like value), is there any reason to use if foo == True rather than if foo is True? Does this vary between implementations such as CPython (2.x and 3.x), Jython, PyPy, etc.?
Example: say True is used as a singleton value that you want to differentiate from the value 'bar', or any other true-like value:
if foo is True:  # vs foo == True
    ...
elif foo == 'bar':
    ...
Is there a case where using if foo is True would yield different results from if foo == True?
NOTE: I am aware of Python booleans - if x:, vs if x == True, vs if x is True. However, it only addresses whether if foo, if foo == True, or if foo is True should generally be used to determine whether foo has a true-like value.
UPDATE: According to PEP 285 § Specification:
The values False and True will be singletons, like None.
If you want to determine whether a value is exactly True (not just a true-like value), is there any reason to use if foo == True rather than if foo is True?
If you want to make sure that foo really is a boolean and of value True, use the is operator.
Otherwise, if the type of foo implements its own __eq__() that returns a true-ish value when comparing to True, you might end up with an unexpected result.
As a rule of thumb, you should always use is with the built-in constants True, False and None.
Does this vary between implementations such as CPython (2.x and 3.x), Jython, PyPy, etc.?
In theory, is will be faster than == since the latter must honor types' custom __eq__ implementations, while is can directly compare object identities (e.g., memory addresses).
I don't know the source code of the various Python implementations by heart, but I assume that most of them can optimize that by using some internal flags for the existence of magic methods, so I suspect that you won't notice the speed difference in practice.
Never use is True in combination with numpy (and derivatives such as pandas):
In[1]: import numpy as np
In[2]: a = np.array([1, 2]).any()
In[4]: a is True
Out[4]: False
In[5]: a == True
Out[5]: True
This was unexpected to me as:
In[3]: a
Out[3]: True
I guess the explanation is given by:
In[6]: type(a)
Out[6]: numpy.bool_
is there any reason to use if foo == True rather than if foo is True?
>>> d = True
>>> d is True
True
>>> d = 1
>>> d is True
False
>>> d == True
True
>>> d = 2
>>> d == True
False
Note that bool is a subclass of int, and that True has the integer value 1. To answer your question, if you want to check that some variable "is exactly True", you have to use the identity operator is. But that's really not pythonic... May I ask what's your real use case - IOW : why do you want to make a difference between True, 1 or any 'truth' value ?
edit: regarding:
Is there a case where using if foo is True would yield different results from if foo == True?
there is a case, and it's this:
In [24]: 1 is True
Out[24]: False
In [25]: 1 == True
Out[25]: True
additionally, if you're looking to use a singleton as a sentinel value, you can just create an object:
sentinel_time = object()
def f(snth):
    if snth is sentinel_time:
        print('got em!')

f(sentinel_time)
you don't want to use if var == True:, you really want if var:.
imagine you have a list. you don't care if a list is "True" or not, you just want to know whether or not it's empty. so...
l = ['snth']
if l:
    print(l)
check out this post for what evaluates to False: Evaluation of boolean expressions in Python
Using foo is True instead of foo == True (or just foo) is, most of the time, not what you want.
I have seen foo is True used for checking that the parameter foo really was a boolean.
It contradicts Python's duck-typing philosophy: you should in general not check for types. A function acting differently with True than with other truthy values is counter-intuitive for a programmer who assumes duck-typing.
Even if you want to check for types, it is better to do it explicitly, like:
def myFunction(foo):
    if not isinstance(foo, bool):
        raise ValueError("foo should be a boolean")

>>> myFunction(1)
Exception: ValueError "foo should be a boolean"
For several reasons:
Bool is the only type where the is operator will be equivalent to isinstance(a, bool) and a. The reason for that is the fact that True and False are singletons. In other words, this works because of a poorly known feature of python (especially when some tutorials teach you that True and False are just aliases for 1 and 0).
If you use isinstance and the programmer was not aware that your function did not accept truthy-values, or if they are using numpy and forgot to cast their numpy-boolean to a python-boolean, they will know what they did wrong, and will be able to debug.
Compare with
def myFunction(foo):
    if foo is True:
        doSomething()
    else:
        doSomethingElse()
In this case, myFunction(1) not only does not raise an exception, but probably does the opposite of what it was expected to do. This makes for a hard to find bug in case someone was using a numpy boolean for example.
When should you use is True then ?
EDIT: this is bad practice. Starting from Python 3.8, the compiler emits a SyntaxWarning when you use is to compare with a literal such as a number or a string. See @JayDadhania's comment below. In conclusion, is should not be used to compare to literals, only to check object identity (the same memory address).
Just don't use it. If you need to check for type, use isinstance.
Old paragraph:
Basically, use it only as a shorthand for isinstance(foo, bool) and foo
The only case I see is when you explicitly want to check if a value is True, and you will also check if the value is another truthy value later on. Examples include:
if foo is True:
    doSomething()
elif foo is False:
    doSomethingElse()
elif foo is 1:  # EDIT: raises a warning, use == 1 instead
    doYetSomethingElse()
else:
    doSomethingElseEntirely()
Here's a test that allows you to see the difference between the 3 forms of testing for True:
for test in ([], [1], 0, 1, 2):
    print repr(test), 'T' if test else 'F', 'T' if test == True else 'F', 'T' if test is True else 'F'
[] F F F
[1] T F F
0 F F F
1 T T F
2 T F F
As you can see there are cases where all of them deliver different results.
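The loop above uses the Python 2 print statement. A Python 3 version of the same test might look like the sketch below; the rows list is only there so the results can be inspected afterwards.

```python
rows = []
for test in ([], [1], 0, 1, 2):
    row = (
        repr(test),
        'T' if test else 'F',          # truthiness
        'T' if test == True else 'F',  # equality with True
        'T' if test is True else 'F',  # identity with True
    )
    rows.append(row)
    print(*row)
```

Only the value 1 separates == True from is True here, because bool is a subclass of int and True compares equal to 1.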
Most of the time, you should not care about a detail like this. Either you already know that foo is a boolean (and you can thus use if foo), or you know that foo is something else (in which case there's no need to test). If you don't know the types of your variables, you may want to refactor your code.
But if you really need to be sure it is exactly True and nothing else, use is. Using == will give you 1 == True.
I have the following code:
a = str('5')
b = int(5)
a == b
# False
But if I make a subclass of int, and reimplement __cmp__:
class A(int):
    def __cmp__(self, other):
        return super(A, self).__cmp__(other)

a = str('5')
b = A(5)
a == b
# TypeError: A.__cmp__(x,y) requires y to be a 'A', not a 'str'
Why are these two different? Is the python runtime catching the TypeError thrown by int.__cmp__(), and interpreting that as a False value? Can someone point me to the bit in the 2.x cpython source that shows how this is working?
The documentation isn't completely explicit on this point, but see here:
If both are numbers, they are converted to a common type. Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily. You can control comparison behavior of objects of non-built-in types by defining a __cmp__ method or rich comparison methods like __gt__, described in section Special method names.
This (particularly the implicit contrast between "objects of different types" and "objects of non-built-in types") suggests that the normal process of actually calling comparison methods is skipped for built-in types: if you try to compare objects of two different (and non-numeric) built-in types, it just short-circuits to an automatic False.
A comparison decision tree for a == b looks something like:
python calls a.__cmp__(b)
    a checks that b is an appropriate type
        if b is an appropriate type, return -1, 0, or +1
        if b is not, return NotImplemented
if -1, 0, or +1 returned, python is done; otherwise
if NotImplemented returned, try
b.__cmp__(a)
    b checks that a is an appropriate type
        if a is an appropriate type, return -1, 0, or +1
        if a is not, return NotImplemented
if -1, 0, or +1 returned, python is done; otherwise
if NotImplemented returned again, the answer is False
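The tree above describes Python 2's __cmp__. Python 3 has no __cmp__, but == follows an analogous fallback with __eq__, which can be observed with a small sketch. Left, Right, and the calls list are hypothetical names used only to record the order of the calls.

```python
calls = []


class Left:
    def __eq__(self, other):
        calls.append('Left.__eq__')
        return NotImplemented  # "I don't know how to compare with this type"


class Right:
    def __eq__(self, other):
        calls.append('Right.__eq__')
        return NotImplemented


result = Left() == Right()

print(calls)   # ['Left.__eq__', 'Right.__eq__']
print(result)  # False
```

When both sides return NotImplemented, Python 3 falls back to an identity check, which is False for two distinct objects.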
Not an exact answer, but hopefully it helps.
If I understood your problem right, you need something like:
>>> class A(int):
...     def __cmp__(self, other):
...         return super(A, self).__cmp__(A(other))  # <--- A(other) instead of other
...
>>> a = str('5')
>>> b = A(5)
>>> a == b
True
Updated
Regarding the 2.x CPython source, you can find the reason for this result in typeobject.c, in the function wrap_cmpfunc, which checks two things: whether other's type uses the same compare function (func), and whether other's type is a subtype of self's type.
if (Py_TYPE(other)->tp_compare != func &&
    !PyType_IsSubtype(Py_TYPE(other), Py_TYPE(self))) {
    // ....
}