python tilde unary operator as negation numpy bool array

python tilde unary operator as negation numpy bool array - python

Should be a simple question, but I'm unable to find an answer anywhere. The ~ operator in python is a documented as a bitwise inversion operator. Fine. I have noticed seemingly schizophrenic behavior though, to wit:
~True -> -2
~1 -> -2
~False -> -1
~0 -> -1
~numpy.array([True,False],dtype=int) -> array([-2,-1])
~numpy.array([True,False],dtype=bool) -> array([False,True])
In the first 4 examples, I can see that python is implementing (as documented) ~x = -(x+1), with the input treated as an int even if it's boolean. Hence, for a scalar boolean, ~ is not treated as a logical negation. Not that the behavior is identical on a numpy array defined with boolean values by with an int type.
Why does ~ then work as a logical negation operator on a boolean array (Also notice: ~numpy.isfinite(numpy.inf) -> True?)?
It is extremely annoying that I must use not() on a scalar, but not() won't work to negate an array. Then for an array, I must use ~, but ~ won't work to negate a scalar...

not is implemented through the __nonzero__ special method, which is required to return either True or False, so it can't give the required result. Instead the ~ operator is used, which is implemented through the __not__ special method. For the same reason, & and | are used in place of and and or.
PEP 335 aimed to allow overloading of boolean operators but was rejected because of excessive overhead (it would e.g. complicate if statements). PEP 225 suggests a general syntax for "elementwise" operators, which would provide a more general solution, but has been deferred. It appears that the current situation, while awkward, is not painful enough to force change.
np.isfinite when called on a scalar returns a value of type np.bool_, not bool. np.bool_ is also the type you get when extracting a scalar value from an array of bool dtype. If you use np.True_ and np.False_ in place of True and False you will get consistent behaviour under ~.

Related

Why is the last number returned when putting "and" in python? [duplicate]

First, the code:
>>> False or 'hello'
'hello'
This surprising behavior lets you check if x is not None and check the value of x in one line:
>>> x = 10 if randint(0,2) == 1 else None
>>> (x or 0) > 0
# depend on x value...
Explanation: or functions like this:
if x is false, then y, else x
No language that I know lets you do this. So, why does Python?

It sounds like you're combining two issues into one.
First, there's the issue of short-circuiting. Marcin's answer addresses this issue perfectly, so I won't try to do any better.
Second, there's or and and returning the last-evaluated value, rather than converting it to bool. There are arguments to be made both ways, and you can find many languages on either side of the divide.
Returning the last-evaluated value allows the functionCall(x) or defaultValue shortcut, avoids a possibly wasteful conversion (why convert an int 2 into a bool 1 if the only thing you're going to do with it is check whether it's non-zero?), and is generally easier to explain. So, for various combinations of these reasons, languages like C, Lisp, Javascript, Lua, Perl, Ruby, and VB all do things this way, and so does Python.
Always returning a boolean value from an operator helps to catch some errors (especially in languages where the logical operators and the bitwise operators are easy to confuse), and it allows you to design a language where boolean checks are strictly-typed checks for true instead of just checks for nonzero, it makes the type of the operator easier to write out, and it avoids having to deal with conversion for cases where the two operands are different types (see the ?: operator in C-family languages). So, for various combinations of these reasons, languages like C++, Fortran, Smalltalk, and Haskell all do things this way.
In your question (if I understand it correctly), you're using this feature to be able to write something like:
if (x or 0) < 1:
When x could easily be None. This particular use case isn't very useful, both because the more-explicit x if x else 0 (in Python 2.5 and later) is just as easy to write and probably easier to understand (at least Guido thinks so), but also because None < 1 is the same as 0 < 1 anyway (at least in Python 2.x, so you've always got at least one of the two options)… But there are similar examples where it is useful. Compare these two:
return launchMissiles() or -1
return launchMissiles() if launchMissiles() else -1
The second one will waste a lot of missiles blowing up your enemies in Antarctica twice instead of once.
If you're curious why Python does it this way:
Back in the 1.x days, there was no bool type. You've got falsy values like None, 0, [], (), "", etc., and everything else is true, so who needs explicit False and True? Returning 1 from or would have been silly, because 1 is no more true than [1, 2, 3] or "dsfsdf". By the time bool was added (gradually over two 2.x versions, IIRC), the current logic was already solidly embedded in the language, and changing would have broken a lot of code.
So, why didn't they change it in 3.0? Many Python users, including BDFL Guido, would suggest that you shouldn't use or in this case (at the very least because it's a violation of "TOOWTDI"); you should instead store the result of the expression in a variable, e.g.:
missiles = launchMissiles()
return missiles if missiles else -1
And in fact, Guido has stated that he'd like to ban launchMissiles() or -1, and that's part of the reason he eventually accepted the ternary if-else expression that he'd rejected many times before. But many others disagree, and Guido is a benevolent DFL. Also, making or work the way you'd expect everywhere else, while refusing to do what you want (but Guido doesn't want you to want) here, would actually be pretty complicated.
So, Python will probably always be on the same side as C, Perl, and Lisp here, instead of the same side as Java, Smalltalk, and Haskell.

No language that i know lets you do this. So, why Python do?
Then you don't know many languages. I can't think of one language that I do know that does not exhibit this "shortcircuiting" behaviour.
It does it because it is useful to say:
a = b or K
such that a either becomes b, if b is not None (or otherwise falsy), and if not it gets the default value K.

Actually a number of languages do. See Wikipedia about Short-Circuit Evaluation
For the reason why short-circuit evaluation exists, wikipedia writes:
If both expressions used as conditions are simple boolean variables,
it can be actually faster to evaluate both conditions used in boolean
operation at once, as it always requires a single calculation cycle,
as opposed to one or two cycles used in short-circuit evaluation
(depending on the value of the first).

This behavior is not surprising, and it's quite straightforward if you consider Python has the following features regarding or, and and not logical operators:
Short-circuit evaluation: it only evaluates operands up to where it needs to.
Non-coercing result: the result is one of the operands, not coerced to bool.
And, additionally:
The Truth Value of an object is False only for None, False, 0, "", [], {}. Everything else has a truth value of True (this is a simplification; the correct definition is in the official docs)
Combine those features, and it leads to:
or : if the first operand evaluates as True, short-circuit there and return it. Or return the 2nd operand.
and: if the first operand evaluates as False, short-circuit there and return it. Or return the 2nd operand.
It's easier to understand if you generalize to a chain of operations:
>>> a or b or c or d
>>> a and b and c and d
Here is the "rule of thumb" I've memorized to help me easily predict the result:
or : returns the first "truthy" operand it finds, or the last one.
and: returns the first "falsy" operand it finds, or the last one.
As for your question, on why python behaves like that, well... I think because it has some very neat uses, and it's quite intuitive to understand. A common use is a series of fallback choices, the first "found" (ie, non-falsy) is used. Think about this silly example:
drink = getColdBeer() or pickNiceWine() or random.anySoda or "meh, water :/"
Or this real-world scenario:
username = cmdlineargs.username or configFile['username'] or DEFAULT_USERNAME
Which is much more concise and elegant than the alternative.
As many other answers have pointed out, Python is not alone and many other languages have the same behavior, for both short-circuit (I believe most current languanges are) and non-coercion.

"No language that i know lets you do this. So, why Python do?" You seem to assume that all languages should be the same. Wouldn't you expect innovation in programming languages to produce unique features that people value?
You've just pointed out why it's useful, so why wouldn't Python do it? Perhaps you should ask why other languages don't.

You can take advantage of the special features of the Python or operator out of Boolean contexts. The rule of thumb is still that the result of your Boolean expressions is the first true operand or the last in the line.
Notice that the logical operators (or included) are evaluated before the assignment operator =, so you can assign the result of a Boolean expression to a variable in the same way you do with a common expression:
>>> a = 1
>>> b = 2
>>> var1 = a or b
>>> var1
1
>>> a = None
>>> b = 2
>>> var2 = a or b
>>> var2
2
>>> a = []
>>> b = {}
>>> var3 = a or b
>>> var3
{}
Here, the or operator works as expected, returning the first true operand or the last operand if both are evaluated to false.

Is there any legitimate use of list[True], list[False] in Python?

Since True and False are instances of int, the following is valid in Python:
>>> l = [0, 1, 2]
>>> l[False]
0
>>> l[True]
1
I understand why this happens. However, I find this behaviour a bit unexpected and can lead to hard-to-debug bugs. It has certainly bitten me a couple of times.
Can anyone think of a legit use of indexing lists with True or False?

In the past, some people have used this behaviour to produce a poor-man's conditional expression:
['foo', 'bar'][eggs > 5] # produces 'bar' when eggs is 6 or higher, 'foo' otherwise
However, with a proper conditional expression having been added to the language in Python 2.5, this is very much frowned upon, for the reasons you state: relying on booleans being a subclass of integers is too 'magical' and unreadable for a maintainer.
So, unless you are code-golfing (deliberately producing very compact and obscure code), use
'bar' if eggs > 5 else 'foo'
instead, which has the added advantage that the two expressions this selects between are lazily evaluated; if eggs > 5 is false, the expression before the if is never executed.

If you are puzzled why bool is a valid index argument: this is simply for consistency with the fact that bool is a subclass of int and in Python it is a numerical type.
If you are asking why bool is a numerical type in the first place then you have to understand that bool wasn't present in old releases of Python and people used ints instead.
I will add a bit of historic arguments. First of all the addition of bool in python is shortly described in Guido van Rossum (aka BDFL) blogpost: The History of Python: The history of bool, True and False. The type was added via PEP 285.
The PEP contains the actual rationales used for this decisions. I'll quote some of the portions of the PEP below.
4) Should we strive to eliminate non-Boolean operations on bools
in the future, through suitable warnings, so that for example
True+1 would eventually (in Python 3000) be illegal?
=> No.
There's a small but vocal minority that would prefer to see
"textbook" bools that don't support arithmetic operations at
all, but most reviewers agree with me that bools should always
allow arithmetic operations.
6) Should bool inherit from int?
=> Yes.
In an ideal world, bool might be better implemented as a
separate integer type that knows how to perform mixed-mode
arithmetic. However, inheriting bool from int eases the
implementation enormously(in part since all C code that calls
PyInt_Check() will continue to work -- this returns true for
subclasses of int). Also, I believe this is right in terms of
substitutability: code that requires an int can be fed a bool
and it will behave the same as 0 or 1. Code that requires a
bool may not work when it is given an int; for example, 3 & 4
is 0, but both 3 and 4 are true when considered as truth
values.
Because bool inherits from int, True+1 is valid and equals 2, and
so on. This is important for backwards compatibility: because
comparisons and so on currently return integer values, there's no
way of telling what uses existing applications make of these
values.
Because of backwards compatibility, the bool type lacks many
properties that some would like to see. For example, arithmetic
operations with one or two bool arguments is allowed, treating
False as 0 and True as 1. Also, a bool may be used as a sequence
index.
I don't see this as a problem, and I don't want evolve the
language in this direction either. I don't believe that a
stricter interpretation of "Booleanness" makes the language any
clearer.
Summary:
Backwards compatibility: there was plenty of code that already used ints 0 and 1 to represent False and True and some of it used those values in numerical computations.
It wasn't seen as a big deal to have a "non-textbook" bool type
Plenty of people in the Python community wanted these features
BDFL said so.

There are often better ways, but Boolean indices do have their uses. I've used them when I want to convert a boolean result to something more human readable:
test_result = run_test()
log.info("The test %s." % ('Failed', 'Passed')[test_result])

How to see that a numpy array of zeros is not empty?

Here is my problem:
I use numpy any() function to check if my array is empty or not.
a = numpy.array([1., 2., 3.])
a.any()
#True
a = numpy.array([0., 0., 0.])
a.any()
#False
I would think that, given that 0. is a float, the any() function from numpy would return True. How can I make this happen?
What's the reason behind the fact that zeros are not considered as actual values by numpy?
I use python 2.6

What you are observing is actually expected: any() means "is there any element whose boolean value is true in this array?". Since the boolean value of 0. is false (non-zero numbers are true), it is normal that a.any() is false when the array only contains zeroes.
You can check the boolean value of any Python object with bool().
If you need to know if your array has any element, then you can test a.size (0 for no elements).

What's the reason behind the fact that zeros are not considered as actual values by numpy?
It's a general principle in Python that "falsey" means False, None, a numeric zero*, or an empty collection. See Truth Value Testing in the documentation for the exact rules.** Different languages have different rules for what counts as truthy vs. falsy***; these are Python's.
And NumPy follows that general principle. So, zeros are considered as actual values, but they're actual false values.
So, an array full of numeric zero values does not have any truthy members, therefore calling any on it will return False.
* Note that in some cases, a value that rounds to 0.0 isn't exactly zero, in which case it may be, confusingly, true. Just one more way floating point rounding errors suck… If you really need to check that the values are non-zero, check whether they're within some appropriate epsilon of zero, rather than checking exact values. NumPy has a number of useful helpers here.
** I left out the rule that custom types can decide which values are truthy or falsy by defining a __bool__ method, or various fallbacks which depend on your exact Python version. That's how things work under the hood. But for the designer of such a class, her class should try to follow the general principle; whatever it means for her values which are "zero" or "empty" or "false" or "nonexistent", that's the rule that her __bool__ method should apply.
*** In C-family languages, it's generally zeros and NULL pointers that are falsy. In Lisp-family languages, it's only the empty list or closely-related values. In Ruby and Swift, it's just false and nil. And so on. Any rule will be counter-intuitive in some cases; as long as the language and its ecosystem are consistent, that's as good as you can hope for. (If you have to use a language that isn't consistent, like PHP or JavaScript, you'll have to keep the docs handy…)

boolean and type checking in python vs numpy

I ran into unexpected results in a python if clause today:
import numpy
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5):
print 'close enough' # works as expected (prints message)
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5) is True:
print 'close enough' # does NOT work as expected (prints nothing)
After some poking around (i.e., this question, and in particular this answer), I understand the cause: the type returned by numpy.allclose() is numpy.bool_ rather than plain old bool, and apparently if foo = numpy.bool_(1), then if foo will evaluate to True while if foo is True will evaluate to False. This appears to be the work of the is operator.
My questions are: why does numpy have its own boolean type, and what is best practice in light of this situation? I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.

You're doing something which is considered an anti-pattern. Quoting PEP 8:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
The fact that numpy wasn't designed to facilitate your non-pythonic code isn't a bug in numpy. In fact, it's a perfect example of why your personal idiom is an anti-pattern.
As PEP 8 says, using is True is even worse than == True. Why? Because you're checking object identity: not only must the result be truthy in a boolean context (which is usually all you need), and equal to the boolean True value, it has to actually be the constant True. It's hard to imagine any situation in which this is what you want.
And you specifically don't want it here:
>>> np.True_ == True
True
>>> np.True_ is True
False
So, all you're doing is explicitly making your code incompatible with numpy, and various other C extension libraries (conceivably a pure-Python library could return a custom value that's equal to True, but I don't know of any that do so).
In your particular case, there is no reason to exclude 2 and [2]. If you read the docs for numpy.allclose, it clearly isn't going to return them. But consider some other function, like many of those in the standard library that just say they evaluate to true or to false. That means they're explicitly allowed to return one of their truthy arguments, and often will do so. Why would you want to consider that false?
Finally, why would numpy, or any other C extension library, define such bool-compatible-but-not-bool types?
In general, it's because they're wrapping a C int or a C++ bool or some other such type. In numpy's case, it's wrapping a value that may be stored in a fastest-machine-word type or a single byte (maybe even a single bit in some cases) as appropriate for performance, and your code doesn't have to care which, because all representations look the same, including being truthy and equal to the True constant.

why does numpy have its own boolean type
Space and speed. Numpy stores things in compact arrays; if it can fit a boolean into a single byte it'll try. You can't easily do this with Python objects, as you have to store references which slows calculations down significantly.
I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.
Well, don't do that.

What is the motivation for the "or" operator to not return a bool?

First, the code:
>>> False or 'hello'
'hello'
This surprising behavior lets you check if x is not None and check the value of x in one line:
>>> x = 10 if randint(0,2) == 1 else None
>>> (x or 0) > 0
# depend on x value...
Explanation: or functions like this:
if x is false, then y, else x
No language that I know lets you do this. So, why does Python?

No language that i know lets you do this. So, why Python do?
Then you don't know many languages. I can't think of one language that I do know that does not exhibit this "shortcircuiting" behaviour.
It does it because it is useful to say:
a = b or K
such that a either becomes b, if b is not None (or otherwise falsy), and if not it gets the default value K.

Actually a number of languages do. See Wikipedia about Short-Circuit Evaluation
For the reason why short-circuit evaluation exists, wikipedia writes:
If both expressions used as conditions are simple boolean variables,
it can be actually faster to evaluate both conditions used in boolean
operation at once, as it always requires a single calculation cycle,
as opposed to one or two cycles used in short-circuit evaluation
(depending on the value of the first).

This behavior is not surprising, and it's quite straightforward if you consider Python has the following features regarding or, and and not logical operators:
Short-circuit evaluation: it only evaluates operands up to where it needs to.
Non-coercing result: the result is one of the operands, not coerced to bool.
And, additionally:
The Truth Value of an object is False only for None, False, 0, "", [], {}. Everything else has a truth value of True (this is a simplification; the correct definition is in the official docs)
Combine those features, and it leads to:
or : if the first operand evaluates as True, short-circuit there and return it. Or return the 2nd operand.
and: if the first operand evaluates as False, short-circuit there and return it. Or return the 2nd operand.
It's easier to understand if you generalize to a chain of operations:
>>> a or b or c or d
>>> a and b and c and d
Here is the "rule of thumb" I've memorized to help me easily predict the result:
or : returns the first "truthy" operand it finds, or the last one.
and: returns the first "falsy" operand it finds, or the last one.
As for your question, on why python behaves like that, well... I think because it has some very neat uses, and it's quite intuitive to understand. A common use is a series of fallback choices, the first "found" (ie, non-falsy) is used. Think about this silly example:
drink = getColdBeer() or pickNiceWine() or random.anySoda or "meh, water :/"
Or this real-world scenario:
username = cmdlineargs.username or configFile['username'] or DEFAULT_USERNAME
Which is much more concise and elegant than the alternative.
As many other answers have pointed out, Python is not alone and many other languages have the same behavior, for both short-circuit (I believe most current languanges are) and non-coercion.

"No language that i know lets you do this. So, why Python do?" You seem to assume that all languages should be the same. Wouldn't you expect innovation in programming languages to produce unique features that people value?
You've just pointed out why it's useful, so why wouldn't Python do it? Perhaps you should ask why other languages don't.

You can take advantage of the special features of the Python or operator out of Boolean contexts. The rule of thumb is still that the result of your Boolean expressions is the first true operand or the last in the line.
Notice that the logical operators (or included) are evaluated before the assignment operator =, so you can assign the result of a Boolean expression to a variable in the same way you do with a common expression:
>>> a = 1
>>> b = 2
>>> var1 = a or b
>>> var1
1
>>> a = None
>>> b = 2
>>> var2 = a or b
>>> var2
2
>>> a = []
>>> b = {}
>>> var3 = a or b
>>> var3
{}
Here, the or operator works as expected, returning the first true operand or the last operand if both are evaluated to false.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.