Here is my problem:
I use numpy any() function to check if my array is empty or not.
a = numpy.array([1., 2., 3.])
a.any()
#True
a = numpy.array([0., 0., 0.])
a.any()
#False
I would think that, given that 0. is a float, the any() function from numpy would return True. How can I make this happen?
What's the reason behind the fact that zeros are not considered as actual values by numpy?
I use python 2.6
What you are observing is actually expected: any() means "is there any element whose boolean value is true in this array?". Since the boolean value of 0. is false (non-zero numbers are true), it is normal that a.any() is false when the array only contains zeroes.
You can check the boolean value of any Python object with bool().
If you need to know if your array has any element, then you can test a.size (0 for no elements).
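The difference between the two checks can be seen in a short sketch. The array names are illustrative and do not come from the question.

```python
import numpy as np

zeros = np.array([0., 0., 0.])   # three elements, all of them falsy
empty = np.array([])             # no elements at all

# any() asks: "is at least one element truthy?"
zeros_has_truthy = bool(zeros.any())
empty_has_truthy = bool(empty.any())

# size asks: "how many elements are there?"
zeros_is_empty = zeros.size == 0
empty_is_empty = empty.size == 0

print(zeros_has_truthy, zeros_is_empty)
print(empty_has_truthy, empty_is_empty)
```

Only the size test tells the two arrays apart; any() is false for both.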
What's the reason behind the fact that zeros are not considered as actual values by numpy?
It's a general principle in Python that "falsy" means False, None, a numeric zero*, or an empty collection. See Truth Value Testing in the documentation for the exact rules.** Different languages have different rules for what counts as truthy vs. falsy***; these are Python's.
And NumPy follows that general principle. So, zeros are considered as actual values, but they're actual false values.
So, an array full of numeric zero values does not have any truthy members, therefore calling any on it will return False.
* Note that in some cases, a value that rounds to 0.0 isn't exactly zero, in which case it may be, confusingly, true. Just one more way floating point rounding errors suck… If you really need to check that the values are non-zero, check whether they're within some appropriate epsilon of zero, rather than checking exact values. NumPy has a number of useful helpers here.
** I left out the rule that custom types can decide which values are truthy or falsy by defining a __bool__ method, or various fallbacks which depend on your exact Python version. That's how things work under the hood. But for the designer of such a class, her class should try to follow the general principle; whatever it means for her values which are "zero" or "empty" or "false" or "nonexistent", that's the rule that her __bool__ method should apply.
*** In C-family languages, it's generally zeros and NULL pointers that are falsy. In Lisp-family languages, it's only the empty list or closely-related values. In Ruby and Swift, it's just false and nil. And so on. Any rule will be counter-intuitive in some cases; as long as the language and its ecosystem are consistent, that's as good as you can hope for. (If you have to use a language that isn't consistent, like PHP or JavaScript, you'll have to keep the docs handy…)
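As a rough illustration of the first footnote, here is one way to test "non-zero within a tolerance" using numpy.isclose. The tolerance of 1e-9 and the variable names are arbitrary assumptions for this sketch; pick a tolerance that suits your data.

```python
import numpy as np

# 0.1 + 0.2 - 0.3 leaves a tiny non-zero residue because of rounding.
values = np.array([0.1 + 0.2 - 0.3, 0.0])

# Exact test: the rounding residue counts as a truthy element.
exact_any = bool(values.any())

# Tolerance test: treat anything within atol of zero as zero.
near_zero = np.isclose(values, 0.0, atol=1e-9)
tolerant_any = bool((~near_zero).any())

print(exact_any, tolerant_any)
```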
Since True and False are instances of int, the following is valid in Python:
>>> l = [0, 1, 2]
>>> l[False]
0
>>> l[True]
1
I understand why this happens. However, I find this behaviour a bit unexpected, and it can lead to hard-to-debug bugs. It has certainly bitten me a couple of times.
Can anyone think of a legit use of indexing lists with True or False?
In the past, some people have used this behaviour to produce a poor-man's conditional expression:
['foo', 'bar'][eggs > 5] # produces 'bar' when eggs is 6 or higher, 'foo' otherwise
However, with a proper conditional expression having been added to the language in Python 2.5, this is very much frowned upon, for the reasons you state: relying on booleans being a subclass of integers is too 'magical' and unreadable for a maintainer.
So, unless you are code-golfing (deliberately producing very compact and obscure code), use
'bar' if eggs > 5 else 'foo'
instead, which has the added advantage that the two expressions this selects between are lazily evaluated; if eggs > 5 is false, the expression before the if is never executed.
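The difference in evaluation can be made visible with a small sketch. The helper expensive() and the calls list are hypothetical names introduced only to record which operands get evaluated.

```python
calls = []

def expensive(label):
    """Record that this operand was evaluated, then return it."""
    calls.append(label)
    return label

eggs = 6

# Indexing trick: the list is built first, so both operands are evaluated.
picked_by_index = [expensive('foo'), expensive('bar')][eggs > 5]
calls_by_index = list(calls)

del calls[:]  # reset the log

# Conditional expression: only the selected operand is evaluated.
picked_by_cond = expensive('bar') if eggs > 5 else expensive('foo')
calls_by_cond = list(calls)

print(picked_by_index, calls_by_index)
print(picked_by_cond, calls_by_cond)
```

Both forms select 'bar', but only the conditional expression skips the unused branch.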
If you are puzzled why bool is a valid index argument: this is simply for consistency with the fact that bool is a subclass of int and in Python it is a numerical type.
If you are asking why bool is a numerical type in the first place then you have to understand that bool wasn't present in old releases of Python and people used ints instead.
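A quick check of what "bool is a subclass of int" means in practice. The variable names below are illustrative only.

```python
# bool is a subclass of int, so True and False behave as 1 and 0.
bool_is_int_subclass = issubclass(bool, int)
true_is_int_instance = isinstance(True, int)

# Arithmetic works and yields plain ints.
total = True + 1

# A common consequence: summing booleans counts the true ones.
hits = sum([True, False, True])

# Indexing works for the same reason.
label = ['no', 'yes'][True]

print(bool_is_int_subclass, true_is_int_instance, total, hits, label)
```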
I will add some historical background. First of all, the addition of bool to Python is briefly described in a blog post by Guido van Rossum (aka the BDFL): The History of Python: The history of bool, True and False. The type was added via PEP 285.
The PEP contains the actual rationale for this decision. I'll quote some portions of the PEP below.
4) Should we strive to eliminate non-Boolean operations on bools
in the future, through suitable warnings, so that for example
True+1 would eventually (in Python 3000) be illegal?
=> No.
There's a small but vocal minority that would prefer to see
"textbook" bools that don't support arithmetic operations at
all, but most reviewers agree with me that bools should always
allow arithmetic operations.
6) Should bool inherit from int?
=> Yes.
In an ideal world, bool might be better implemented as a
separate integer type that knows how to perform mixed-mode
arithmetic. However, inheriting bool from int eases the
implementation enormously(in part since all C code that calls
PyInt_Check() will continue to work -- this returns true for
subclasses of int). Also, I believe this is right in terms of
substitutability: code that requires an int can be fed a bool
and it will behave the same as 0 or 1. Code that requires a
bool may not work when it is given an int; for example, 3 & 4
is 0, but both 3 and 4 are true when considered as truth
values.
Because bool inherits from int, True+1 is valid and equals 2, and
so on. This is important for backwards compatibility: because
comparisons and so on currently return integer values, there's no
way of telling what uses existing applications make of these
values.
Because of backwards compatibility, the bool type lacks many
properties that some would like to see. For example, arithmetic
operations with one or two bool arguments is allowed, treating
False as 0 and True as 1. Also, a bool may be used as a sequence
index.
I don't see this as a problem, and I don't want evolve the
language in this direction either. I don't believe that a
stricter interpretation of "Booleanness" makes the language any
clearer.
Summary:
Backwards compatibility: there was plenty of code that already used ints 0 and 1 to represent False and True and some of it used those values in numerical computations.
It wasn't seen as a big deal to have a "non-textbook" bool type
Plenty of people in the Python community wanted these features
BDFL said so.
There are often better ways, but Boolean indices do have their uses. I've used them when I want to convert a boolean result to something more human readable:
test_result = run_test()
log.info("The test %s." % ('Failed', 'Passed')[test_result])
In this answer https://stackoverflow.com/a/27680814/3456281, the following construct is presented
a=[1,2]
while True:
    if IndexError:
        print ("Stopped.")
        break
    print(a[2])
which actually prints "Stopped." and breaks (tested with Python 3.4.1).
Why?! Why is if IndexError even legal? Why does a[2] not raise an IndexError with no try ... except around?
All objects have a boolean value. If not otherwise defined, that boolean value is True.
So this code is simply the equivalent of doing if True; so execution reaches the break statement immediately and the print is never reached.
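To make this concrete: IndexError in the condition is just the class object, which is truthy, and nothing has been raised. The sketch below also shows what catching the exception would look like; the messages list is a hypothetical name used to collect output.

```python
# The name IndexError refers to a class object; like most objects it is truthy.
class_is_truthy = bool(IndexError)

a = [1, 2]
messages = []

# An exception is only involved when code actually raises it,
# and only try/except reacts to it.
try:
    messages.append(a[2])       # raises IndexError: index 2 is out of range
except IndexError:
    messages.append("Stopped.")

print(class_is_truthy, messages)
```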
See the Python documentation, section Truth Value Testing under Built-in Types in The Python Standard Library. The first sentence, and the first sentence after the last bullet point, answer your question.
Any object can be tested for truth value ...
and
All other values are considered true — so objects of many types are
always true.
Here's the full text of the documentation (content in square brackets, [], has been added for clarification):
5.1. Truth Value Testing
Any object can be tested for truth value, for use in an if or while
condition or as operand of the Boolean operations below. The following
values are considered false:
None
False
zero of any numeric type, for example, 0, 0L, 0.0, 0j.
any empty sequence, for example, '', (), [].
any empty mapping, for example, {}.
instances of user-defined classes, if the class defines a __bool__() [__nonzero__() in Python 2] or __len__() method, when that method returns the integer zero or bool value False. [See Data Model, Special Method Names, section Basic Customization, of The Python Language Reference]
All other values are considered true — so objects of many types are
always true.
Operations and built-in functions that have a Boolean result always
return 0 or False for false and 1 or True for true, unless otherwise
stated. (Important exception: the Boolean operations or and and always
return one of their operands.)
Conclusion
So, since Exception does not define __bool__, __nonzero__, or __len__ (and does not fall under any of the other conditions listed above), an Exception object will always test as True in a boolean context.
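The rules quoted above can be checked with a few toy classes. EmptyBox, Disabled, and Plain are hypothetical names made up for this sketch.

```python
class EmptyBox(object):
    """Falsy because __len__ returns zero."""
    def __len__(self):
        return 0

class Disabled(object):
    """Falsy because __bool__ returns False."""
    def __bool__(self):
        return False
    __nonzero__ = __bool__  # Python 2 spelling of the same hook

class Plain(object):
    """Defines neither hook, so instances are always truthy."""
    pass

results = {
    'EmptyBox': bool(EmptyBox()),
    'Disabled': bool(Disabled()),
    'Plain': bool(Plain()),
    'Exception instance': bool(Exception()),
    'Exception class': bool(Exception),
}
print(results)
```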
I know that Python guarantees that there is only one instance of NoneType, the None object, so that you can safely use is None to test if something equals None.
Is there an equivalent guarantee for bool True and False (i.e. that there is only one instance of each)?
If not, why not?
EDIT: In particular, I've noticed that (n+0) is (0+n) gives True for n in range(-5, 257) and False otherwise. In other words, zero, the first 256 positive and the first 5 negative integers seem to be pre-cached and are not instanced again. I am guessing that that's a choice of the interpreter (CPython, in my case) and not a specification of the language. And bool derives from int, so I still have to wonder about what expectations I can have with other interpreters.
EDIT: To clarify, since this seems to have generated a lot of confusion, my intention is not to test the boolean interpretation of a value. For that I would never use is True or is False. My intention is to be able to tell apart False from everything else, in a variable that can have values of several types including empty strings, zeros, and None, and similarly for True. I'm myself an experienced programmer, of the kind who cringes when I see "if booleanvar == True".
NOTE ON DUPLICATES: The questions this one was alleged to be a duplicate of (this and this) don't answer this question; they merely state that bool is a subclass of int that differs mainly in its repr, not whether True and False are guaranteed to be unique.
Also, note that it's not a question about what the names True and False are bound to, but about the instances of the class bool.
From the docs (https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy):
Booleans
These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects.
There are only two objects; any computation producing a boolean will produce one of those two existing objects:
>>> (1 == 1) is True
True
>>> (1 == 0) is False
True
The bool type has only two instances, True and False. Furthermore, it can't be subclassed, so there's no way to create a derived class that can have additional instances.
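Both points can be checked directly. The sketch below also shows the use case from the question, telling False apart from other falsy values by identity; the variable names are illustrative.

```python
# bool() always hands back one of the two existing objects.
same_true = bool(1) is True
same_false = bool([]) is False

# Attempting to subclass bool fails when the class statement runs.
try:
    class MyBool(bool):
        pass
    subclassable = True
except TypeError:
    subclassable = False

# Identity separates False from other falsy values; equality does not,
# because 0 and 0.0 compare equal to False.
values = [False, 0, 0.0, '', None, []]
is_false = [v is False for v in values]
equals_false = [v == False for v in values]

print(same_true, same_false, subclassable)
print(is_false)
print(equals_false)
```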
But even though it's guaranteed, there's seldom a good reason to rely upon it. You should usually use if x rather than if x is True, and avoid situations where you need to distinguish True from other truthy values or False from other falsy values.
I ran into unexpected results in a python if clause today:
import numpy
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5):
    print 'close enough' # works as expected (prints message)
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5) is True:
    print 'close enough' # does NOT work as expected (prints nothing)
After some poking around (i.e., this question, and in particular this answer), I understand the cause: the type returned by numpy.allclose() is numpy.bool_ rather than plain old bool, and apparently if foo = numpy.bool_(1), then if foo will evaluate to True while if foo is True will evaluate to False. This appears to be the work of the is operator.
My questions are: why does numpy have its own boolean type, and what is best practice in light of this situation? I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.
You're doing something which is considered an anti-pattern. Quoting PEP 8:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
The fact that numpy wasn't designed to facilitate your non-pythonic code isn't a bug in numpy. In fact, it's a perfect example of why your personal idiom is an anti-pattern.
As PEP 8 says, using is True is even worse than == True. Why? Because you're checking object identity: not only must the result be truthy in a boolean context (which is usually all you need), and equal to the boolean True value, it has to actually be the constant True. It's hard to imagine any situation in which this is what you want.
And you specifically don't want it here:
>>> np.True_ == True
True
>>> np.True_ is True
False
So, all you're doing is explicitly making your code incompatible with numpy, and various other C extension libraries (conceivably a pure-Python library could return a custom value that's equal to True, but I don't know of any that do so).
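A small sketch of the three kinds of test on a NumPy boolean scalar. The name flag is illustrative; if a real Python bool is needed at a boundary, converting with bool() is one option.

```python
import numpy as np

flag = np.bool_(1)  # a NumPy boolean scalar, not the Python constant True

truthy = bool(flag)                       # what `if flag:` tests
equal_to_true = bool(flag == True)        # equality with the constant
identical_to_true = flag is True          # identity with the constant
is_python_bool = isinstance(flag, bool)   # np.bool_ is not a bool subclass

# Converting explicitly yields the real Python constant.
converted_is_true = bool(flag) is True

print(truthy, equal_to_true, identical_to_true, is_python_bool, converted_is_true)
```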
In your particular case, there is no reason to exclude 2 and [2]. If you read the docs for numpy.allclose, it clearly isn't going to return them. But consider some other function, like many of those in the standard library that just say they evaluate to true or to false. That means they're explicitly allowed to return one of their truthy arguments, and often will do so. Why would you want to consider that false?
Finally, why would numpy, or any other C extension library, define such bool-compatible-but-not-bool types?
In general, it's because they're wrapping a C int or a C++ bool or some other such type. In numpy's case, it's wrapping a value that may be stored in a fastest-machine-word type or a single byte (maybe even a single bit in some cases) as appropriate for performance, and your code doesn't have to care which, because all representations look the same, including being truthy and equal to the True constant.
why does numpy have its own boolean type
Space and speed. Numpy stores things in compact arrays; if it can fit a boolean into a single byte it'll try. You can't easily do this with Python objects, as you have to store references which slows calculations down significantly.
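The storage claim is easy to inspect. The sketch below compares a boolean array with a Python list of the same length; exact list sizes vary by platform, so only the relative comparison is meaningful.

```python
import sys
import numpy as np

flags = np.zeros(1000, dtype=bool)

bytes_per_element = flags.itemsize   # one byte per boolean
array_payload = flags.nbytes         # 1000 elements * 1 byte

# A list stores one reference per element, which is wider than a byte.
list_size = sys.getsizeof([False] * 1000)

print(bytes_per_element, array_payload, list_size > array_payload)
```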
I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.
Well, don't do that.
Should be a simple question, but I'm unable to find an answer anywhere. The ~ operator in python is documented as a bitwise inversion operator. Fine. I have noticed seemingly schizophrenic behavior though, to wit:
~True -> -2
~1 -> -2
~False -> -1
~0 -> -1
~numpy.array([True,False],dtype=int) -> array([-2,-1])
~numpy.array([True,False],dtype=bool) -> array([False,True])
In the first 4 examples, I can see that python is implementing (as documented) ~x = -(x+1), with the input treated as an int even if it's boolean. Hence, for a scalar boolean, ~ is not treated as a logical negation. Note that the behavior is identical on a numpy array defined with boolean values but with an int type.
Why does ~ then work as a logical negation operator on a boolean array? (Also notice: ~numpy.isfinite(numpy.inf) -> True.)
It is extremely annoying that I must use not() on a scalar, but not() won't work to negate an array. Then for an array, I must use ~, but ~ won't work to negate a scalar...
not is implemented through the __nonzero__ special method (__bool__ in Python 3), which is required to return either True or False, so it can't give the required elementwise result. Instead the ~ operator is used, which is implemented through the __invert__ special method. For the same reason, & and | are used in place of and and or.
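To illustrate why an array-like type ends up overloading ~, & and | rather than not, and, or, here is a toy sketch. The Mask class is hypothetical and is not how NumPy is implemented; it only shows that __invert__, __and__ and __or__ may return any object, while the truth-value hook cannot return an elementwise result.

```python
class Mask(object):
    """Toy elementwise boolean container (illustration only)."""

    def __init__(self, flags):
        self.flags = [bool(f) for f in flags]

    def __invert__(self):                      # called for ~mask
        return Mask(not f for f in self.flags)

    def __and__(self, other):                  # called for mask & other
        return Mask(a and b for a, b in zip(self.flags, other.flags))

    def __or__(self, other):                   # called for mask | other
        return Mask(a or b for a, b in zip(self.flags, other.flags))

    def __bool__(self):                        # called for not / and / or / if
        raise ValueError("truth value of a multi-element Mask is ambiguous")
    __nonzero__ = __bool__                     # Python 2 spelling

m = Mask([True, False])
inverted = (~m).flags
both = (m & Mask([True, True])).flags
either = (m | Mask([False, True])).flags

try:
    not m
    not_worked = True
except ValueError:
    not_worked = False

print(inverted, both, either, not_worked)
```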
PEP 335 aimed to allow overloading of boolean operators but was rejected because of excessive overhead (it would e.g. complicate if statements). PEP 225 suggests a general syntax for "elementwise" operators, which would provide a more general solution, but has been deferred. It appears that the current situation, while awkward, is not painful enough to force change.
np.isfinite when called on a scalar returns a value of type np.bool_, not bool. np.bool_ is also the type you get when extracting a scalar value from an array of bool dtype. If you use np.True_ and np.False_ in place of True and False you will get consistent behaviour under ~.
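A short check of the behaviour described here. It also uses np.logical_not, a NumPy function that accepts both scalars and arrays, as one possible way to negate either kind of value with the same call; the variable names are illustrative.

```python
import numpy as np

# NumPy boolean scalars are negated logically by ~, like boolean arrays.
scalar_inverted = bool(~np.True_)
array_inverted = (~np.array([True, False])).tolist()

# np.isfinite on a scalar returns a NumPy boolean, so ~ negates it logically.
not_finite = bool(~np.isfinite(np.inf))

# Plain ints get bitwise inversion instead: ~x == -(x + 1).
int_inverted = ~1

# np.logical_not gives logical negation for Python scalars and arrays alike.
scalar_negated = bool(np.logical_not(True))
array_negated = np.logical_not(np.array([1, 0])).tolist()

print(scalar_inverted, array_inverted, not_finite, int_inverted)
print(scalar_negated, array_negated)
```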