What's the purpose of the + (pos) unary operator in Python?

Generally speaking, what should the unary + do in Python?
I'm asking because, so far, I have never seen a situation like this:
+obj != obj
Where obj is a generic object implementing __pos__().
So I'm wondering: why do + and __pos__() exist? Can you provide a real-world example where the expression above evaluates to True?

Here's a "real-world" example from the decimal package:
>>> from decimal import Decimal
>>> obj = Decimal('3.1415926535897932384626433832795028841971')
>>> +obj != obj # The __pos__ function rounds back to normal precision
True
>>> obj
Decimal('3.1415926535897932384626433832795028841971')
>>> +obj
Decimal('3.141592653589793238462643383')
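The rounding follows the current decimal context, so the effect shows up at any precision; for instance (prec=5 here is an arbitrary choice):
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 5
>>> +Decimal('3.1415926535897932384626433832795028841971')
Decimal('3.1416')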

In Python 3.3 and above, collections.Counter supports the unary + operator, which removes non-positive counts.
>>> from collections import Counter
>>> fruits = Counter({'apples': 0, 'pears': 4, 'oranges': -89})
>>> fruits
Counter({'pears': 4, 'apples': 0, 'oranges': -89})
>>> +fruits
Counter({'pears': 4})
So if you have negative or zero counts in a Counter, you have a situation where +obj != obj.
>>> obj = Counter({'a': 0})
>>> +obj != obj
True
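In CPython, unary + on a Counter is effectively the same as adding an empty Counter, which keeps only positive counts:
>>> fruits + Counter()
Counter({'pears': 4})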

I believe that Python operators were inspired by C, where the + operator was introduced for symmetry (and also some useful hacks, see comments).
In weakly typed languages such as PHP or JavaScript, + tells the runtime to coerce the value of the variable into a number. For example, in JavaScript:
+"2" + 1
=> 3
"2" + 1
=> '21'
Python is strongly typed, so strings don't work as numbers and, as such, don't implement a unary plus operator.
It is certainly possible to implement an object for which +obj != obj:
>>> class Foo(object):
...     def __pos__(self):
...         return "bar"
...
>>> +Foo()
'bar'
>>> obj = Foo()
>>> +obj != obj
True
>>> +"a"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary +: 'str'
As for an example for which it actually makes sense, check out the
surreal numbers. They are a superset of the reals which includes
infinitesimal values (+ epsilon, - epsilon), where epsilon is
a positive value which is smaller than any other positive number, but
greater than 0; and infinite ones (+ infinity, - infinity).
You could define epsilon = +0, and -epsilon = -0.
While 1/0 is still undefined, 1/epsilon = 1/+0 is +infinity, and 1/-epsilon = -infinity. It is nothing more than taking the limits of 1/x as x approaches 0 from the right (+) or from the left (-).
As 0 and +0 behave differently, it makes sense that 0 != +0.

A lot of examples here look more like bugs. This one is actually a feature, though:
The + operator implies a copy.
This is extremely useful when writing generic code for scalars and arrays.
For example:
def f(x, y):
    z = +x
    z += y
    return z
This function works on both scalars and NumPy arrays, without making extra copies, without changing the type of the object, and without requiring any external dependencies!
If you used numpy.positive or something like that, you would introduce a NumPy dependency, and you would coerce plain numbers to NumPy types, which the caller may not want.
If you did z = x + y, your result would no longer necessarily be the same type as x. In many cases that's fine, but when it's not, it's not an option.
If you did z = --x, you would create an unnecessary copy, which is slow.
If you did z = 1 * x, you'd perform an unnecessary multiplication, which is also slow.
If you did copy.copy... I guess that'd work, but it's pretty cumbersome.
Unary + is a really great option for this.
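For instance, a quick check (the array half assumes NumPy is installed; f itself needs no dependencies):
>>> f(2, 3)
5
>>> import numpy as np
>>> x = np.array([1, 2])
>>> f(x, np.array([10, 20]))
array([11, 22])
>>> x  # the input array was not mutated
array([1, 2])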

For symmetry, because unary minus is an operator, unary plus must be too. In most arithmetic situations, it doesn't do anything, but keep in mind that users can define arbitrary classes and use these operators for anything they want, even if it isn't strictly algebraic.
I know it's an old thread, but I wanted to extend the existing answers to provide a broader set of examples:
+ could assert positivity and throw an exception if the value isn't positive -- very useful to detect corner cases (see the sketch below this list).
The object may be multivalued (think of ±sqrt(z) as a single object -- for solving quadratic equations, for multi-branched analytical functions, anything where you can "collapse" a two-valued function into one branch with a sign). This includes the ±0 case mentioned by vlopez.
If you do lazy evaluation, + may create a function object that adds something to whatever it is later applied to. For instance, if you are parsing arithmetic incrementally.
As an identity function to pass as an argument to some higher-order function.
For algebraic structures where sign accumulates -- ladder operators and such. Sure, it could be done with other functions, but one could conceivably see something like y=+++---+++x. Even more, they don't have to commute. This constructs a free group of pluses and minuses, which could be useful, even in formal grammar implementations.
Wild usage: it could "mark" a step in the calculation as "active" in some sense, as in a reap/sow system -- every plus remembers the value, and at the end you can gather the collected intermediates... because why not?
That, plus all the typecasting reasons mentioned by others.
And after all... it's nice to have one more operator in case you need it.
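As a toy sketch of the positivity-check idea from the list above (the Checked class is an illustrative assumption, not from any library):
class Checked(float):
    def __pos__(self):
        # unary + doubles as a cheap runtime assertion
        if self <= 0:
            raise ValueError(f"expected a positive value, got {float(self)!r}")
        return self

speed = +Checked(3.5)   # passes the check, evaluates to 3.5
speed = +Checked(-1.0)  # raises ValueError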

__pos__() exists in Python to give programmers possibilities similar to those in the C++ language -- overloading operators, in this case the unary operator +.
(Overloading operators means giving them a different meaning for different objects, e.g. binary + behaves differently for numbers and for strings -- numbers are added while strings are concatenated.)
Objects may implement (beside others) these emulating numeric types functions (methods):
__pos__(self) # called for unary +
__neg__(self) # called for unary -
__invert__(self) # called for unary ~
So +object means the same as object.__pos__() — they are interchangeable.
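A quick check in the REPL:
>>> x = 5
>>> +x
5
>>> x.__pos__()
5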
However, +object is easier on the eye.
The creator of a particular object is free to implement these functions however they want -- as other people showed in their real-world examples.
And my contribution — as a joke: ++i != +i in C/C++.

Unary + is actually the fastest way to check that a value is numeric (and raise an exception if it isn't)! It compiles to a single bytecode instruction, whereas something like isinstance(i, int) has to look up and call the isinstance function!
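You can see this with the dis module (output from CPython 3.8; newer versions compile unary + to different opcodes):
>>> import dis
>>> dis.dis(compile('+i', '<demo>', 'eval'))
  1           0 LOAD_NAME                0 (i)
              2 UNARY_POSITIVE
              4 RETURN_VALUE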

Related

Can you use in-place operators with the Python 3.8 "Walrus" operator?

In Python 3.8, the Walrus operator was introduced, allowing assignment as an expression.
This means we can replace these 2 statements
a = 2
print(a)
with
print(a := 2)
However, Python also has "in-place" assignment with operators where, for example, a = a * 3 is equivalent to a *= 3
Is there any way to use "in-place" operator assignment, in combination with Walrus assignment?
For the following code
a *= 3
print(a)
To re-create this with Walrus assignment, it seems you must do
print(a := a * 3)
Both of these attempts raise a SyntaxError
print(a :*= 3)
print(a *:= 3)
No, augmented assignment (what you call "in-place" assignment -- actual in-place mutation doesn't always occur) is not supported with Python's Walrus operator, as stated in the specification, PEP 572.
The "equivalent" (quotes with emphasis) of
y ?= x
where ? is a supported symbol (e.g. +, -, *, >>, <<,...) is
(y := y ? x)
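For example:
>>> y = 10
>>> x = 3
>>> (y := y * x)
30
>>> y
30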
This is unfortunate from the perspective of programmers coming from C/C++, who have learned to appreciate assignment expressions such as
(y ?= x)
which in C/C++ are valid not only as stand-alone statements (as in Python) but also as loop conditions or just about anywhere a value can be substituted, assuming appropriate types. These can make code more concise, and perhaps more efficient by telling the compiler to do in-place assignment when it can (in C/C++, p++ and ++p -- analogous to p += 1 -- which both increment p in place, are likewise idiomatic in loop conditions, stand-alone statements, or just about anywhere the before-increment or incremented value of p is needed).
Still, the Walrus operator at least helps make code more concise and readable in some situations.
def logb2(n: int):
    "return the (bit) position of the most significant bit of n"
    result = 0
    while (n := n >> 1):
        result += 1
    return result
which I prefer over something like
def logb2(n: int):
    result = 0
    while True:
        n >>= 1
        if n:
            result += 1
        else:
            return result
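Either version gives, for example:
>>> logb2(1)
0
>>> logb2(17)
4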
On my hardware and software, timeit.timeit shows that they take about the same amount of time, with the latter performing slightly (though negligibly) better, despite the former requiring about 3 fewer bytecode instructions (fewer bytecode instructions does not necessarily imply more efficient, of course).
There are many good use cases of the Walrus operator though in my view it is most beneficial in situations where it improves readability.
Support for what you (we) are after CAN be implemented in Python, but I don't see this happening anytime soon, if ever (which, well, is not such a bad thing).
You can't use in-place operators with the walrus operator.
Explanation:
The walrus operator assigns a value to a variable and returns that value.
In-place operators are shortcuts that apply an operator to an existing variable and update that variable.
In-place operators would have no meaning for a new variable; they raise an error if applied to a variable that hasn't been assigned yet.
Therefore combining in-place and walrus has no use case.
Edit: A more practical explanation:
What would be the meaning of a :+= 1?
If a doesn't exist, what value would 1 be added to?
What would be the final value of a?
For reference:
(a := 8)  # means: assign 8 to a and "return" a
a += 1    # means: a = a + 1; raises an exception if a doesn't exist yet
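For example, in a fresh interpreter:
>>> a += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined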
You need to define a first:
print((a := 2)*3)
6

Matching multiple floats in an IF statement [duplicate]

It's well known that comparing floats for equality is a little fiddly due to rounding and precision issues.
For example: Comparing Floating Point Numbers, 2012 Edition
What is the recommended way to deal with this in Python?
Is a standard library function for this somewhere?
Python 3.5 adds the math.isclose and cmath.isclose functions as described in PEP 485.
If you're using an earlier version of Python, the equivalent function is given in the documentation.
def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
rel_tol is a relative tolerance: it is multiplied by the greater of the magnitudes of the two arguments; as the values get larger, so does the allowed difference between them while still considering them equal.
abs_tol is an absolute tolerance that is applied as-is in all cases. If the difference is less than either of those tolerances, the values are considered equal.
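For example:
>>> import math
>>> math.isclose(1.0, 1.0 + 1e-10)
True
>>> math.isclose(100.0, 100.1)
False
>>> math.isclose(100.0, 100.1, rel_tol=1e-3)
True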
Something as simple as the following may be good enough:
return abs(f1 - f2) <= allowed_error
I would agree that Gareth's answer is probably most appropriate as a lightweight function/solution.
But I thought it would be helpful to note that if you are using NumPy or are considering it, there is a packaged function for this.
numpy.isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
A little disclaimer though: installing NumPy can be a non-trivial experience depending on your platform.
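Example usage (note that the default atol can make values near zero compare "close" even when they differ by orders of magnitude):
>>> import numpy as np
>>> np.isclose(1e10, 1.00001e10)
True
>>> np.isclose(1e-8, 1e-9)   # atol=1e-08 dominates near zero
True
>>> np.isclose(1e-8, 1e-9, atol=0.0)
False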
Use Python's decimal module, which provides the Decimal class.
From the comments:
It is worth noting that if you're doing math-heavy work and you don't absolutely need the precision from decimal, this can really bog things down. Floats are way, way faster to deal with, but imprecise. Decimals are extremely precise but slow.
The common wisdom that floating-point numbers cannot be compared for equality is inaccurate. Floating-point numbers are no different from integers: If you evaluate "a == b", you will get true if they are identical numbers and false otherwise (with the understanding that two NaNs are of course not identical numbers).
The actual problem is this: If I have done some calculations and am not sure the two numbers I have to compare are exactly correct, then what? This problem is the same for floating-point as it is for integers. If you evaluate the integer (floor-division) expression 7//3*3, it will not compare equal to 7*3//3.
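With Python's floor division:
>>> 7 // 3 * 3
6
>>> 7 * 3 // 3
7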
So suppose we asked "How do I compare integers for equality?" in such a situation. There is no single answer; what you should do depends on the specific situation, notably what sort of errors you have and what you want to achieve.
Here are some possible choices.
If you want to get a "true" result if the mathematically exact numbers would be equal, then you might try to use the properties of the calculations you perform to prove that you get the same errors in the two numbers. If that is feasible, and you compare two numbers that result from expressions that would give equal numbers if computed exactly, then you will get "true" from the comparison.
Another approach is that you might analyze the properties of the calculations and prove that the error never exceeds a certain amount, perhaps an absolute amount or an amount relative to one of the inputs or one of the outputs. In that case, you can ask whether the two calculated numbers differ by at most that amount, and return "true" if they are within the interval.
If you cannot prove an error bound, you might guess and hope for the best. One way of guessing is to evaluate many random samples and see what sort of distribution you get in the results.
Of course, since we only set the requirement that you get "true" if the mathematically exact results are equal, we left open the possibility that you get "true" even if they are unequal. (In fact, we can satisfy the requirement by always returning "true". This makes the calculation simple but is generally undesirable, so I will discuss improving the situation below.)
If you want to get a "false" result if the mathematically exact numbers would be unequal, you need to prove that your evaluation of the numbers yields different numbers if the mathematically exact numbers would be unequal. This may be impossible for practical purposes in many common situations. So let us consider an alternative.
A useful requirement might be that we get a "false" result if the mathematically exact numbers differ by more than a certain amount. For example, perhaps we are going to calculate where a ball thrown in a computer game traveled, and we want to know whether it struck a bat. In this case, we certainly want to get "true" if the ball strikes the bat, and we want to get "false" if the ball is far from the bat, and we can accept an incorrect "true" answer if the ball in a mathematically exact simulation missed the bat but is within a millimeter of hitting the bat. In that case, we need to prove (or guess/estimate) that our calculation of the ball's position and the bat's position have a combined error of at most one millimeter (for all positions of interest). This would allow us to always return "false" if the ball and bat are more than a millimeter apart, to return "true" if they touch, and to return "true" if they are close enough to be acceptable.
So, how you decide what to return when comparing floating-point numbers depends very much on your specific situation.
As to how you go about proving error bounds for calculations, that can be a complicated subject. Any floating-point implementation using the IEEE 754 standard in round-to-nearest mode returns the floating-point number nearest to the exact result for any basic operation (notably multiplication, division, addition, subtraction, square root). (In case of tie, round so the low bit is even.) (Be particularly careful about square root and division; your language implementation might use methods that do not conform to IEEE 754 for those.) Because of this requirement, we know the error in a single result is at most 1/2 of the value of the least significant bit. (If it were more, the rounding would have gone to a different number that is within 1/2 the value.)
Going on from there gets substantially more complicated; the next step is performing an operation where one of the inputs already has some error. For simple expressions, these errors can be followed through the calculations to reach a bound on the final error. In practice, this is only done in a few situations, such as working on a high-quality mathematics library. And, of course, you need precise control over exactly which operations are performed. High-level languages often give the compiler a lot of slack, so you might not know in which order operations are performed.
There is much more that could be (and is) written about this topic, but I have to stop there. In summary, the answer is: There is no library routine for this comparison because there is no single solution that fits most needs that is worth putting into a library routine. (If comparing with a relative or absolute error interval suffices for you, you can do it simply without a library routine.)
math.isclose() has been added in Python 3.5 for that (source code). Here is a port of it to Python 2. Its difference from Mark Ransom's one-liner is that it can handle inf and -inf properly.
import math

def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    '''
    Python 2 implementation of Python 3.5 math.isclose()
    https://github.com/python/cpython/blob/v3.5.10/Modules/mathmodule.c#L1993
    '''
    # sanity check on the inputs
    if rel_tol < 0 or abs_tol < 0:
        raise ValueError("tolerances must be non-negative")

    # short circuit exact equality -- needed to catch two infinities of
    # the same sign. And perhaps speeds things up a bit sometimes.
    if a == b:
        return True

    # This catches the case of two infinities of opposite sign, or
    # one infinity and one finite number. Two infinities of opposite
    # sign would otherwise have an infinite relative tolerance.
    # Two infinities of the same sign are caught by the equality check
    # above.
    if math.isinf(a) or math.isinf(b):
        return False

    # now do the regular computation
    # this is essentially the "weak" test from the Boost library
    diff = math.fabs(b - a)
    result = (((diff <= math.fabs(rel_tol * b)) or
               (diff <= math.fabs(rel_tol * a))) or
              (diff <= abs_tol))
    return result
I'm not aware of anything in the Python standard library (or elsewhere) that implements Dawson's AlmostEqual2sComplement function. If that's the sort of behaviour you want, you'll have to implement it yourself. (In which case, rather than using Dawson's clever bitwise hacks, you'd probably do better to use more conventional tests of the form abs(a-b) <= eps1*(abs(a)+abs(b)) + eps2 or similar. To get Dawson-like behaviour you might say something like abs(a-b) <= eps*max(EPS, abs(a), abs(b)) for some small fixed EPS; this isn't exactly the same as Dawson, but it's similar in spirit.)
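A direct transcription of those two suggestions (the eps values are illustrative assumptions, to be tuned for your data):
def nearly_equal(a, b, eps1=1e-9, eps2=1e-12):
    # conventional mixed relative/absolute test
    return abs(a - b) <= eps1 * (abs(a) + abs(b)) + eps2

def dawson_like(a, b, eps=1e-9, EPS=1e-12):
    # relative to the larger magnitude, with a floor so values
    # near zero can still compare equal
    return abs(a - b) <= eps * max(EPS, abs(a), abs(b))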
If you want to use it in testing/TDD context, I'd say this is a standard way:
from nose.tools import assert_almost_equals
assert_almost_equals(x, y, places=7) # The default is 7
In terms of absolute error, you can just check
if abs(a - b) <= error:
    print("Almost equal")
Some information on why floats act weird in Python:
Python 3 Tutorial 03 - if-else, logical operators and top beginner mistakes
You can also use math.isclose for relative errors.
This is useful for the case where you want to make sure two numbers are the same 'up to precision', and there isn't any need to specify the tolerance:
Find minimum precision of the two numbers
Round both of them to minimum precision and compare
def isclose(a, b):
    astr = str(a)
    aprec = len(astr.split('.')[1]) if '.' in astr else 0
    bstr = str(b)
    bprec = len(bstr.split('.')[1]) if '.' in bstr else 0
    prec = min(aprec, bprec)
    return round(a, prec) == round(b, prec)
As written, it only works for numbers without the 'e' in their string representation (meaning 0.9999999999995e-4 < number <= 0.9999999999995e11)
Example:
>>> isclose(10.0, 10.049)
True
>>> isclose(10.0, 10.05)
False
For some of the cases where you can affect the source number representation, you can represent them as fractions instead of floats, using integer numerator and denominator. That way you can have exact comparisons.
See Fraction from fractions module for details.
I liked Sesquipedal's suggestion, but with a modification (the special case where both values are 0 would otherwise return False). In my case, I was on Python 2.7 and just used a simple function:
def isclose(f1, f2, tol):
    if f1 == 0 and f2 == 0:
        return True
    else:
        return abs(f1 - f2) < tol * max(abs(f1), abs(f2))
If you want to do it in a testing or TDD context using the pytest package, here's how:
import pytest

PRECISION = 1e-3

def test_almost_equal():  # pytest only collects functions named test_*
    obtained_value = 99.99
    expected_value = 100.00
    assert obtained_value == pytest.approx(expected_value, PRECISION)
I found the following comparison helpful:
str(f1) == str(f2)
To compare up to a given decimal without atol/rtol:
def almost_equal(a, b, decimal=6):
    return '{0:.{1}f}'.format(a, decimal) == '{0:.{1}f}'.format(b, decimal)

print(almost_equal(0.0, 0.0001, decimal=4))  # False
print(almost_equal(0.0, 0.0001, decimal=3))  # True
This may be a bit of an ugly hack, but it works pretty well when you don't need more than the default float precision (about 11 decimals).
The round_to function uses the format method of the built-in str class to round the float to a string with the number of decimals needed, using the eval built-in to build and run the formatting expression.
The is_close function then just compares the two rounded strings.
def round_to(float_num, prec):
    return eval("'{:." + str(int(prec)) + "f}'.format(" + str(float_num) + ")")

def is_close(float_a, float_b, prec):
    if round_to(float_a, prec) == round_to(float_b, prec):
        return True
    return False
>>> a = 10.0
>>> b = 10.0001
>>> print(is_close(a, b, prec=3))
True
>>> print(is_close(a, b, prec=4))
False
Update:
As suggested by @stepehjfox, a cleaner way to build a round_to function avoiding eval is to use nested formatting:
def round_to(float_num, prec):
    return '{:.{precision}f}'.format(float_num, precision=prec)
Following the same idea, the code can be even simpler using the great new f-strings (Python 3.6+):
def round_to(float_num, prec):
    return f'{float_num:.{prec}f}'
So we could even wrap it all up in one simple and clean is_close function:
def is_close(a, b, prec):
    return f'{a:.{prec}f}' == f'{b:.{prec}f}'
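For example:
>>> is_close(10.0, 10.0001, prec=3)
True
>>> is_close(10.0, 10.0001, prec=4)
False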
If you want to compare floats, the options above are great, but in my case I ended up using Enum, since my use case only accepted a few valid floats.
from enum import Enum

class HolidayMultipliers(Enum):
    EMPLOYED_LESS_THAN_YEAR = 2.0
    EMPLOYED_MORE_THAN_YEAR = 2.5
Then running:
testable_value = 2.0
HolidayMultipliers(testable_value)
If the float is valid, it's fine, but otherwise it will just throw a ValueError.
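For example (traceback abbreviated):
>>> HolidayMultipliers(2.5)
<HolidayMultipliers.EMPLOYED_MORE_THAN_YEAR: 2.5>
>>> HolidayMultipliers(3.0)
Traceback (most recent call last):
  ...
ValueError: 3.0 is not a valid HolidayMultipliers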
Using == is a simple, good way if you don't care about the tolerance precisely.
# Python 3.8.5
>>> 1.0000000000001 == 1
False
>>> 1.0000000000000001 == 1
True
But watch out for 0:
>>> 0 == 0.00000000000000000000000000000000000000000001
False
Zero is exact: no nonzero value, however tiny, compares equal to 0.
Use math.isclose if you want to control the tolerance.
Plain a == b behaves roughly like math.isclose(a, b, rel_tol=1e-16, abs_tol=0), since 1e-16 is about the relative spacing of floats.
If you still want to use == with a self-defined tolerance:
>>> import math
>>> class MyFloat(float):
...     def __eq__(self, another):
...         return math.isclose(self, another, rel_tol=0, abs_tol=0.001)
...
>>> a = MyFloat(0)
>>> a
0.0
>>> a == 0.001
True
So far, I haven't found a way to configure this globally for float. Besides, mock doesn't work for float.__eq__ either.

Tuples and Ternary and positional parameters

Given:
>>> a,b=2,3
>>> c,d=3,2
>>> def f(x,y): print(x,y)
I have an existing (as in: cannot be changed) 2-positional-parameter function where I want the positional parameters to always be in ascending order; i.e., f(2,3) no matter which two arguments I use (f(a,b) is the same as f(c,d) in the example).
I know that I could do:
>>> f(*sorted([c,d]))
2 3
Or I could do:
>>> f(*((a,b) if a<b else (b,a)))
2 3
(Note the need for tuple parentheses in this form, because , has lower precedence than the ternary...)
Or,
def my_f(a, b):
    return f(a, b) if a < b else f(b, a)
All these seem kinda kludgy. Is there another syntax that I am missing?
Edit
I missed an 'old school' Python two-member tuple method: index a two-member tuple using the fact that True == 1 and False == 0:
>>> f(*((a,b),(b,a))[a>b])
2 3
Also:
>>> f(*{True:(a,b), False:(b,a)}[a<b])
2 3
Edit 2
The reason for this silly exercise: numpy.isclose has the following usage note:
For finite values, isclose uses the following equation to test whether
two floating point values are equivalent.
absolute(a - b) <= (atol + rtol * absolute(b))
The above equation is not symmetric in a and b, so that isclose(a, b)
might be different from isclose(b, a) in some rare cases.
I would prefer that not happen.
I am looking for the fastest way to make sure that arguments to numpy.isclose are in a consistent order. That is why I am shying away from f(*sorted([c,d]))
Implemented my solution in case anyone else is looking.
def sort(f):
    def wrapper(*args):
        return f(*sorted(args))
    return wrapper

@sort
def f(x, y):
    print(x, y)

f(3, 2)
# prints: 2 3
Also, since @Tadhg McDonald-Jensen mentioned that you may not be able to change the function yourself, you could wrap the function as such:
my_func = sort(f)
You mention that your use-case is np.isclose. However, your approach isn't a good way to solve the real issue. But it's understandable given the poor argument naming of that function - it sort of implies that both arguments are interchangeable. If it were numpy.isclose(measured, expected, ...) (or something like it), it would be much clearer.
For example if you expect the value 10 and measure 10.51 and you allow for 5% deviation, then in order to get a useful result you must use np.isclose(10.51, 10, ...), otherwise you would get wrong results:
>>> import numpy as np
>>> measured = 10.51
>>> expected = 10
>>> err_rel = 0.05
>>> err_abs = 0.0
>>> np.isclose(measured, expected, err_rel, err_abs)
False
>>> np.isclose(expected, measured, err_rel, err_abs)
True
It's clear to see that the first one gives the correct result because the actually measured value is not within the tolerance of the expected value. That's because the relative uncertainty is an "attribute" of the expected value, not of the value you compare it with!
So solving this issue by "sorting" the parameters is just wrong. That's a bit like swapping the numerator and denominator of a division because the denominator contains zeros and dividing by zero could give NaN, Inf, a Warning or an Exception... it definitely avoids the problem, but only by giving an incorrect result (the comparison isn't perfect: with division it will almost always give a wrong result; with isclose it's rare).
This was a somewhat artificial example designed to trigger that behaviour and most of the time it's not important if you use measured, expected or expected, measured but in the few cases where it does matter you can't solve it by swapping the arguments (except when you have no "expected" result, but that rarely happens - at least it shouldn't).
There was some discussion about this topic when math.isclose was added to the python library:
Symmetry (PEP 485)
[...]
Which approach is most appropriate depends on what question is being asked. If the question is: "are these two numbers close to each other?", there is no obvious ordering, and a symmetric test is most appropriate.
However, if the question is: "Is the computed value within x% of this known value?", then it is appropriate to scale the tolerance to the known value, and an asymmetric test is most appropriate.
[...]
This proposal [for math.isclose] uses a symmetric test.
So if your test falls into the first category and you like a symmetric test - then math.isclose could be a viable alternative (at least if you're dealing with scalars):
math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)
[...]
rel_tol is the relative tolerance – it is the maximum allowed difference between a and b, relative to the larger absolute value of a or b. For example, to set a tolerance of 5%, pass rel_tol=0.05. The default tolerance is 1e-09, which assures that the two values are the same within about 9 decimal digits. rel_tol must be greater than zero.
[...]
Just in case this answer couldn't convince you and you still want to use a sorted approach - then you should order by the absolute value of your arguments (i.e. *sorted([a, b], key=abs)). Otherwise you might get surprising results when comparing negative numbers:
>>> np.isclose(-10.51, -10, err_rel, err_abs) # -10.51 is smaller than -10!
False
>>> np.isclose(-10, -10.51, err_rel, err_abs)
True
For only two elements in the tuple, the second one is the preferred idiom -- in my experience. It's fast, readable, etc.
No, there isn't really another syntax. There's also
(min(a,b), max(a,b))
... but this isn't particularly superior to the other methods; merely another way of expressing it.
Note after comment by dawg:
A class with custom comparison operators could return the same object for both min and max.
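A hedged sketch of that corner case (the class is an illustrative assumption): if every ordering comparison answers False, min and max both return their first argument, so f would receive the same object twice.
class Incomparable:
    """Every ordering comparison answers False."""
    def __lt__(self, other):
        return False
    def __gt__(self, other):
        return False

a, b = Incomparable(), Incomparable()
print(min(a, b) is max(a, b))  # True: both return 'a'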


Faster method of evaluating a boolean expression as a string in Python

I have been working on this project for a couple of months now. The ultimate goal of this project is to evaluate an entire digital logic circuit, similar to functional testing; that's just to give the scope of the problem. The topic I created here deals with the performance issue I'm having when analyzing a boolean expression. Any gate inside a digital circuit has an output expression in terms of the global inputs, e.g. ((A&B)|(C&D)^E). What I want to do with this expression is calculate all possible outcomes and determine how much influence each input has on the outcome.
The fastest way that I have found was building a truth table as a matrix and looking at certain rows (won't go into specifics of that algorithm, as it's off-topic); the problem with that is that once the number of unique inputs goes above 26-27 or so, the memory usage is well beyond 16 GB (the max my computer has). You might say "buy more RAM", but every increase of one input doubles the memory usage. Some of the expressions I analyze have well over 200 unique inputs...
The method I use right now uses the compile method to take the expression as a string. Then I create an array with all of the inputs found by the compile method. Then I generate a list, row by row, of "True" and "False" randomly chosen from a sample of possible values (that way it is equivalent to rows in a truth table if the sample size is the same as the range, and it lets me limit the sample size when things get too long to calculate). These values are then zipped with the input names and used to evaluate the expression. This gives the initial result; after that I go column by column in the random boolean list, flip the boolean, zip it with the inputs again, and evaluate again to determine whether the result changed.
So my question is this: Is there a faster way? I have included the code that performs the work. I have tried regular expressions to find and replace, but it is always slower (from what I've seen). Take into account that the inner for loop will run N times, where N is the number of unique inputs. The outer for loop I limit to run 2^15 times if N > 15. So this turns into eval being executed min(2^N, 2^15) * (1 + N) times...
As an update, to clarify what I am asking exactly (sorry for any confusion): the algorithm/logic for calculating what I need is not the issue. I am asking for an alternative to the Python built-in eval that will do the same thing faster (take a string in the format of a boolean expression, replace the variables in the string with the values in a dictionary, and then evaluate the string).
# value is the expression as a string
comp = compile(value.strip(), '-', 'eval')
inputs = comp.co_names
control = [0] * len(inputs)

# Sequences of random boolean values to be used
random_list = gen_rand_bits(len(inputs))

for row in random_list:
    valuedict = dict(zip(inputs, row))
    answer = eval(comp, valuedict)
    for column in range(len(row)):
        row[column] = ~row[column]
        newvaluedict = dict(zip(inputs, row))
        newanswer = eval(comp, newvaluedict)
        row[column] = ~row[column]
        if answer != newanswer:
            control[column] = control[column] + 1
My question:
Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?
OP answered:
That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.
So, this seems to be a classic XY problem. You have an actual problem, which is to determine the relative influence of each variable within a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually "feel" the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.
In any way, let's first look at how you are trying to solve this. I'm not exactly sure what gen_rand_bits is supposed to do, so I can't really take that into account. But still, you are essentially trying out every possible combination of variable assignments and seeing whether flipping the value of a single variable changes the formula's result. "Luckily", these are just boolean variables, so you are "only" looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very, very bad, while in practice it's often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and can't shortcut. Plus, the compilation and value evaluation using Python's eval is apparently not fast enough to make the inefficient algorithm acceptable.
So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.
You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isn’t really that hard.
Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.
Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:
import ast

class ExpressionEvaluator:
    def __init__(self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run(self):
        return self.evaluate(self.ast.body[0])

    def evaluate(self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName(self, expr):
        return {expr.id: 1}

    def evaluateInvert(self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join(self, a, ratioA, b, ratioB):
        d = {k: v * ratioA for k, v in a.items()}
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}
This implementation will essentially generate a plain AST for the given expression and then recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type-specific methods below; it's similar to what ast.NodeVisitor does, except that we return the analysis results from each node here. One could augment the nodes instead of returning results, though.
In this case, the evaluation is just based on occurrence in the expression. I don't explicitly check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something I'll leave to you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can of course be adjusted to introduce some semantic rules.
In any way, it will not be necessary to actually test variable assignments.
Instead of reinventing the wheel and taking on risks like performance and security (which you already face), it is better to look for industry-ready, well-accepted libraries.
The Logic module of sympy does exactly what you want to achieve without resorting to evil... ohh, I meant eval. More importantly, since the boolean expression is no longer a string, you don't have to care about parsing the expression, which generally turns out to be the bottleneck.
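For instance, a minimal sketch (assumes sympy is installed; the symbol names and the expression shape just mirror the OP's example):
from sympy import symbols
from sympy.logic.inference import satisfiable

A, B, C, D, E = symbols('A B C D E')
expr = (A & B) | ((C & D) ^ E)

# evaluate at one concrete assignment
print(expr.subs({A: True, B: False, C: True, D: True, E: False}))

# find one satisfying assignment (SAT)
print(satisfiable(expr))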
You don't have to prepare a static table for computing this. Python is a dynamic language, so it's able to interpret and run code by itself at runtime.
In your case, I would suggest a solution like this:
import random, re, time

# Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"

# Step 2: Retrieve all the variable names.
#   You can design a rule for naming, and use a regex to retrieve them.
#   Here, for example, I consider every single capital letter a variable.
name_regex = re.compile(r"[A-Z]")

# Step 3: Replace each variable with its value.
#   You could get the value by reading files or keyboard input.
#   Here, for example, I just use a random 0 or 1.
for name in name_regex.findall(logic_exp):
    logic_exp = logic_exp.replace(name, str(random.randrange(2)))

# Step 4: Replace the operators. Python uses 'and', 'or' instead of '&', '|'.
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or ")

# Step 5: Interpret the expression with eval() and output its value.
print("expression =", logic_exp)
print("expression output =", eval(logic_exp))
This would be very fast and take very little memory. For a test, I ran the example above with 25 input variables:
expression = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output = 1
computing time: 0.000158071517944 seconds
According to your comment, I see that you are computing all the possible combinations instead of the output at given input values. If so, it becomes a typical NP-complete Boolean satisfiability (SAT) problem. I don't think there's any algorithm that can do it with complexity lower than O(2^N). I suggest you search with the keywords "fast algorithm to solve SAT problem"; you'll find a lot of interesting things.
