python decimal comparison
>>> from decimal import Decimal
>>> Decimal('1.0') > 2.0
True
I was expecting it to convert 2.0 correctly, but after reading through PEP 327 I understand there were reasons for not implicitly converting float to Decimal. In that case, though, shouldn't it raise a TypeError, as it does here?
>>> Decimal('1.0') + 2.0
Traceback (most recent call last):
File "<string>", line 1, in <string>
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
So do all the other operators (/, -, %, //, etc.).
So my questions are:
1. Is this the right behavior (not raising an exception on comparison)?
2. What if I derive my own class and write a float converter, basically Decimal(repr(float_value)); are there any caveats? My use case involves only comparison of prices.
System details: Python 2.5.2 on Ubuntu 8.04.1
Re 1, it's indeed the behavior we designed -- right or wrong as it may be (sorry if that trips your use case up, but we were trying to be general!).
Specifically, it's long been the case that every Python object could be subject to inequality comparison with every other -- objects of types that aren't really comparable get arbitrarily compared (consistently in a given run, not necessarily across runs); main use case was sorting a heterogeneous list to group elements in it by type.
An exception was introduced for complex numbers only, making them non-comparable to anything -- but that was still many years ago, when we were occasionally cavalier about breaking perfectly good user code. Nowadays we're much stricter about backwards compatibility within a major release (e.g. along the 2.* line, and separately along the 3.* one, though incompatibilities are allowed between 2 and 3 -- indeed that's the whole point of having a 3.* series, letting us fix past design decisions even in incompatible ways).
The arbitrary comparisons turned out to be more trouble than they're worth, causing user confusion; and the grouping by type can now be obtained easily, e.g. with a key=lambda x: str(type(x)) argument to sort; so in Python 3, comparisons between objects of different types, unless the objects themselves specifically allow it in their comparison methods, do raise an exception:
>>> decimal.Decimal('2.0') > 1.2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: Decimal() > float()
In other words, in Python 3 this behaves exactly as you think it should; but in Python 2 it doesn't (and never will in any Python 2.*).
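The type-grouping sort mentioned above can be sketched as follows. The sample list `mixed` is made up for illustration; the point is only that the key function turns every value into a string, so values of different types are never compared with each other directly.

```python
# Hypothetical mixed-type data; in Python 3 a plain sorted(mixed) raises TypeError.
mixed = [3, 'b', 1.5, 'a', 1]

# Sort on the string form of each value's type, which groups elements by type.
# The sort is stable, so values keep their original relative order within a group.
grouped = sorted(mixed, key=lambda x: str(type(x)))
print(grouped)  # [1.5, 3, 1, 'b', 'a']
```

Note that this only groups by type; it does not order the values inside each group.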
Re 2, you'll be fine -- though, look to gmpy for what I hope is an interesting way to convert doubles to infinite-precision fractions through Farey trees. If the prices you're dealing with are precise to no more than cents, use '%.2f' % x rather than repr(x)!-)
Rather than a subclass of Decimal, I'd use a factory function such as
import decimal

def to_decimal(float_price):
    return decimal.Decimal('%.2f' % float_price)
since, once produced, the resulting Decimal is a perfectly ordinary one.
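A short usage sketch of such a factory, assuming prices are only meaningful to the cent. The sample prices are invented for the example; the comparisons are between two Decimal values, so no float-to-Decimal comparison is involved.

```python
import decimal

def to_decimal(float_price):
    # Round to cents while converting, as suggested above.
    return decimal.Decimal('%.2f' % float_price)

# Hypothetical prices, for illustration only.
a = to_decimal(0.1 + 0.2)   # the float sum is 0.30000000000000004
b = to_decimal(0.3)

print(a, b)      # 0.30 0.30
print(a == b)    # True, although (0.1 + 0.2 == 0.3) is False for floats
print(to_decimal(19.99) < to_decimal(20.0))  # True
```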
The greater-than comparison works because, by default, it works for all objects.
>>> 'abc' > 123
True
Decimal is right merely because it correctly follows the spec. Whether the spec was the correct approach is a separate question. :)
Only the normal caveats when dealing with floats, which briefly summarized are: beware of edge cases such as negative zero, +/-infinity, and NaN, don't test for equality (related to the next point), and count on math being slightly inaccurate.
>>> print (1.1 + 2.2 == 3.3)
False
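Two common ways around the equality caveat, sketched under the assumption that either a tolerance or exact decimal arithmetic is acceptable for the application: `math.isclose` for floats, or doing the arithmetic on Decimal values built from strings.

```python
import math
from decimal import Decimal

total = 1.1 + 2.2
print(total == 3.3)              # False
print(math.isclose(total, 3.3))  # True (default relative tolerance of 1e-09)

# Decimal values built from strings add exactly.
exact = Decimal('1.1') + Decimal('2.2')
print(exact == Decimal('3.3'))   # True
```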
Whether it's "right" is a matter of opinion, but the rationale for why there is no automatic conversion is given in the PEP, and that was the decision taken. The caveat basically is that you can't always convert exactly between float and Decimal. Therefore the conversion should not be implicit. If you know that in your application you never have enough significant digits for this to affect you, making classes that allow this implicit behaviour shouldn't be a problem.
Also, one main argument is that real-world use cases don't exist. It's likely to be simpler if you just use Decimal everywhere.
I often see error messages that look like any of:
TypeError: '<' not supported between instances of 'str' and 'int'
The message can vary quite a bit, and I guess that it has many causes; so rather than ask again every time for every little situation, I want to know: what approaches or techniques can I use to find the problem, when I see this error message? (I have already read I'm getting a TypeError. How do I fix it?, but I'm looking for advice specific to the individual pattern of error messages I have identified.)
So far, I have figured out that:
the error will show some kind of operator (most commonly <; sometimes >, <=, >= or +) is "not supported between instances of", and then two type names (could be any types, but usually they are not the same).
The highlighted code will almost always have that operator in it somewhere, but the version with < can also show up if I am trying to sort something. (Why?)
Overview
As with any other TypeError, the main steps of the debugging task are:
Figure out what operation is raising the exception, what the inputs are, and what their types are
Understand why these types and operation cause a problem together, and determine which is wrong
If the input is wrong, work backwards to figure out where it comes from
The "working backwards" part is the same for all exceptions, but here are some specific hints for the first two steps.
Identifying the operation and inputs
This error occurs with the relational operators (or comparisons) <, >, <=, >=. It won't happen with == or != (unless someone specifically defines those operators for a user-defined class such that they do), because there is a fallback comparison based on object identity.
Bitwise, arithmetic and shifting operators give different error messages. (The boolean logical operators and and or do not normally cause a problem because their logic is supported by every type by default, just like with == and !=. As for xor, that doesn't exist.)
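A minimal sketch of that difference, using a made-up class `Plain` that defines no comparison methods: equality falls back to object identity, while an ordering comparison has no fallback and raises the error discussed here.

```python
class Plain:
    """A hypothetical class that defines no comparison methods at all."""

a, b = Plain(), Plain()

print(a == a)   # True: same object
print(a == b)   # False: different objects
print(a != b)   # True

try:
    a < b
except TypeError as exc:
    message = str(exc)
    print(message)  # '<' not supported between instances of 'Plain' and 'Plain'
```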
As usual, start by looking at the last line of code mentioned in the error message. Go to the corresponding file and examine that line of code. (If the code is line-wrapped, it might not all be shown in the error message.)
Try to find an operator that matches the one in the error message, and double-check what the operands will be, i.e. the things on the left-hand and right-hand side of the operator. Double-check operator precedence to make sure of what expression will feed into the left-hand and right-hand sides of the operator. If the line is complex, try rewriting it to do the work in multiple steps. (If this accidentally fixes the problem, consider not trying to put it back!)
Sometimes the problem will be obvious at this point (for example, maybe the wrong variable was used due to a typo). Otherwise, use a debugger (ideally) or print traces to verify these values, and their types, at the time that the error occurs. The same line of code could run successfully many other times before the error occurs, so figuring out the problem with print can be difficult. Consider using temporary exception handling, along with breaking up the expression:
# result = complex_expression_a() < complex_expression_b()
try:
    lhs, rhs = complex_expression_a(), complex_expression_b()
    result = lhs < rhs
except TypeError:
    print(f'comparison failed between `{lhs}` of type `{type(lhs)}` and `{rhs}` of type `{type(rhs)}`')
    raise # so the program still stops and shows the error
Special case: sorting
As noted in the question, trying to sort a list using its .sort method, or to sort a sequence of values using the built-in sorted function (this is basically equivalent to creating a new list from the values, .sorting it and returning it), can cause TypeError: '<' not supported between instances of... - naming the types of two of the values that are in the input. This happens because general-purpose sorting involves comparing the values being sorted, and the built-in sort does this using <. (In Python 2.x, it was possible to specify a custom comparison function, but now custom sort orders are done using a "key" function that transforms the values into something that sorts in the desired way.)
Therefore, if the line of code contains one of these calls, the natural explanation is that the values being sorted are of incompatible types (typically, mixed types). Rather than looking for left- and right-hand side of an expression, we look at a single sequence of inputs. One useful technique here is to use set to find out all the types of these values (looking at individual values will probably not be as insightful):
try:
    my_data.sort()
except TypeError:
    print(f'sorting failed. Found these types: {set(type(d) for d in my_data)}')
    raise
See also LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str' for a Pandas-specific variant of this problem.
If all the input values are the same type, it could still be that the type does not support comparison (for example, a list of all None cannot be sorted, despite that it's obvious that the result should just be the same list). A special note here: if the input was created using a list comprehension, then the values will normally be of the same type, but that type could be invalid. Carefully check the logic for the comprehension. If it results in a function, or in None, see the corresponding sections below.
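As a small illustration of the all-None case, and of one possible way to sort data that is comparable apart from some missing values. The `readings` list and the choice to place None last are assumptions made for this example, not the only reasonable policy.

```python
only_nones = [None, None]
try:
    only_nones.sort()
except TypeError as exc:
    none_message = str(exc)
    print(none_message)  # '<' not supported between instances of 'NoneType' and 'NoneType'

# Hypothetical data with one missing value.
readings = [3, None, 1]

# The key maps each value to a tuple: real values sort first, None sorts last.
# The placeholder 0 is never compared against a real value, because the
# first tuple element already differs.
ordered = sorted(readings, key=lambda v: (v is None, 0 if v is None else v))
print(ordered)  # [1, 3, None]
```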
Historical note
This kind of error is specific to Python 3. In 2.x, objects could be compared regardless of mismatched types, following rather complex rules; and certain things of the same type (such as dicts) could be compared that are no longer considered comparable in 3.x.
This meant that data could always be sorted without causing a cryptic error; but the resulting order could be hard to understand, and this permissive behaviour often caused many more problems than it solved.
Understanding the incompatibility
For comparisons, it's very likely that the problem is with either or both of the inputs, rather than the operator; but double-check the intended logic anyway.
For simple cases of sorting an input sequence, similarly, the problem is almost certainly with the input values. However, when sorting using a key function (e.g. mylist.sort(key=lambda x: ...)), that function could also cause the problem. Double-check the logic: given the expected type for the input values, what type of thing will be returned? Does it make sense to compare two things of that type? If an existing function is used, test the function with some sample values. If a lambda is used, convert it to a function first and test that.
If the list is supposed to contain instances of a user-defined class, make sure that the class instances are created properly. Consider for example:
class Example:
    def __init__(self):
        self.attribute = None

mylist = [Example(), Example()]
mylist.sort(key=lambda e: e.attribute)
The key function was supposed to make it possible to sort the instances according to their attribute value, but those values were set wrongly to None - thus we still get an error, because the Nones returned from the key function are not comparable.
Comparing NoneType
NoneType is the type of the special None value, so this means that either of the operands (or one or more of the elements of the input) is None.
Check:
If the value is supposed to be provided by a user-defined function, make sure that the value is returned rather than being displayed using print and that the return value is used properly. Make sure that the function explicitly returns a non-None value without reaching the end, in every case. If the function uses recursion, make sure that it doesn't improperly ignore a value returned from the recursive call (i.e., unless there is a good reason).
If the value is supposed to come from a built-in method or a library function, make sure that it actually returns the value, rather than modifying the input as a side effect. This commonly happens for example with many list methods, random.shuffle, and print (especially a print call left over from a previous debugging attempt). Many other things can return None in some circumstances rather than reporting an error. When in doubt, read the documentation.
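A brief sketch of the second check, using a made-up list `cards`: the in-place operations return None, so assigning their result and comparing it later produces the NoneType error, whereas `sorted` returns a new list.

```python
import random

cards = [3, 1, 2]

shuffled = random.shuffle(cards)   # shuffles in place, returns None
result = cards.sort()              # sorts in place, returns None
print(shuffled, result)            # None None

ordered = sorted(cards)            # sorted() returns a new list instead
print(ordered)                     # [1, 2, 3]
```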
Comparing functions (or methods)
This almost always means that the function was not called when it should have been. Keep in mind that the parentheses are necessary for a call even if there are no arguments.
For example, if we have
import random
if random.random < 0.5:
    print('heads')
else:
    print('tails')
This will fail because the random function was not called - the code should say if random.random() < 0.5: instead.
Comparing strings and numbers
If one side of the comparison is a str and the other side is int or float, this typically suggests that the str should have been converted earlier on, as in this example. This especially happens when the string comes from user input.
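A minimal sketch of this situation. The variable `raw` stands in for a value returned by `input()`, which is always a str; the threshold of 50 is arbitrary.

```python
raw = "42"   # stands in for a value read with input()

try:
    raw < 50
except TypeError as exc:
    str_message = str(exc)
    print(str_message)  # '<' not supported between instances of 'str' and 'int'

age = int(raw)    # convert once, as early as possible
print(age < 50)   # True
```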
Comparing user-defined types
By default, only == and != comparisons are possible with user-defined types. The others need to be implemented, using the special methods __lt__ (<), __le__ (<=), __gt__ (>) and/or __ge__ (>=). Python 3.x can make some inferences here automatically, but not many:
>>> class Example:
...     def __init__(self, value):
...         self._value = value
...     def __gt__(self, other):
...         if isinstance(other, Example):
...             return self._value > other._value
...         return self._value > other # for non-Examples
...
>>> Example(1) > Example(2) # our Example class supports `>` comparison with other Examples
False
>>> Example(1) > 2 # as well as non-Examples.
False
>>> Example(1) < Example(2) # `<` is inferred by swapping the arguments, for two Examples...
True
>>> Example(1) < 2 # but not for other types
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'Example' and 'int'
>>> Example(1) >= Example(2) # and `>=` does not work, even though `>` and `==` do
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>=' not supported between instances of 'Example' and 'Example'
In 3.2 and up, this can be worked around using the total_ordering decorator from the standard library functools module:
>>> from functools import total_ordering
>>> @total_ordering
... class Example:
...     # the rest of the class as before
>>> # Now all the examples work and do the right thing.
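For reference, a complete, self-contained variant of the decorated class. This sketch differs from the class above in one respect, which is an assumption of the example: it defines __eq__ and __lt__ and returns NotImplemented for non-Example operands, rather than defining __gt__ and comparing against arbitrary values.

```python
from functools import total_ordering

@total_ordering
class Example:
    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        if isinstance(other, Example):
            return self._value == other._value
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, Example):
            return self._value < other._value
        return NotImplemented

# total_ordering derives <=, > and >= from __lt__ and __eq__.
print(Example(1) < Example(2))    # True
print(Example(1) >= Example(2))   # False
print(Example(2) >= Example(2))   # True
print(Example(3) > Example(2))    # True
```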
For some reason, Cython is returning 0 on a math expression that should evaluate to 0.5:
print(2 ** (-1)) # prints 0
Oddly enough, mix variables in and it'll work as expected:
i = 1
print(2 ** (-i)) # prints 0.5
Vanilla CPython returns 0.5 for both cases. I'm compiling for 37m-x86_64-linux-gnu, and language_level is set to 3.
What is this witchcraft?
It's because it's using C ints rather than Python integers so it matches C behaviour rather than Python behaviour. I'm relatively sure this used to be documented as a limitation somewhere but I can't find it now. If you want to report it as a bug then go to https://github.com/cython/cython/issues, but I suspect this is a deliberate trade-off of speed for compatibility.
The code gets translated to
__Pyx_pow_long(2, -1L)
where __Pyx_pow_long is a function of type static CYTHON_INLINE long __Pyx_pow_long(long b, long e).
The easiest way to fix it is to change one/both of the numbers to be a floating point number
print(2. ** (-1))
As a general comment on the design choice: people from the C world generally expect int operator int to return an int, and this option will be fastest. Python had tried to do this in the past with the Python 2 division behaviour (but inconsistently - power always returned a floating point number).
Cython generally tries to follow Python behaviour. However, a lot of people are using it for speed so they also try to fall back to quick, C-like operations especially when people specify types (since those people want speed). I think what's happened here is that it's been able to infer the types automatically, and so defaulted to C behaviour. I suspect ideally it should distinguish between specified types and types that it's inferred. However, it's also probably too late to start changing that.
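For comparison, this is the plain CPython behaviour that the compiled code diverges from. The sketch is ordinary Python, not Cython, so it only shows the reference semantics and the float-literal workaround; it says nothing about what a given Cython version generates.

```python
# Plain CPython: an int raised to a negative int power gives a float.
negative_power = 2 ** (-1)
print(negative_power)          # 0.5

# With a non-negative exponent the result stays an int.
positive_power = 2 ** 1
print(type(positive_power))    # <class 'int'>

# Making one operand a float forces floating-point arithmetic.
float_power = 2.0 ** (-1)
print(float_power)             # 0.5
```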
It looks like Cython is incorrectly inferring the final data type as int rather than float when only numbers are involved
The following code works as expected:
print(2.0 ** (-1))
See this link for a related discussion: https://groups.google.com/forum/#!topic/cython-users/goVpote2ScY
It's clearly stated in the docs that int(number) is a flooring type conversion:
int(1.23)
1
and int(string) returns an int if and only if the string is an integer literal.
int('1.23')
ValueError
int('1')
1
Is there any special reason for that? I find it counterintuitive that the function floors in one case, but not the other.
There is no special reason. Python is simply applying its general principle of not performing implicit conversions, which are well-known causes of problems, particularly for newcomers, in languages such as Perl and Javascript.
int(some_string) is an explicit request to convert a string to integer format; the rules for this conversion specify that the string must contain a valid integer literal representation. int(float) is an explicit request to convert a float to an integer; the rules for this conversion specify that the float's fractional portion will be truncated.
In order for int("3.1459") to return 3 the interpreter would have to implicitly convert the string to a float. Since Python doesn't support implicit conversions, it chooses to raise an exception instead.
This is almost certainly a case of applying three of the principles from the Zen of Python:
Explicit is better than implicit.
[...] practicality beats purity
Errors should never pass silently
Some percentage of the time, someone doing int('1.23') is calling the wrong conversion for their use case, and wants something like float or decimal.Decimal instead. In these cases, it's clearly better for them to get an immediate error that they can fix, rather than silently giving the wrong value.
In the case that you do want to truncate that to an int, it is trivial to explicitly do so by passing it through float first, and then calling one of int, round, trunc, floor or ceil as appropriate. This also makes your code more self-documenting, guarding against a later modification "correcting" a hypothetical silently-truncating int call to float by making it clear that the rounded value is what you want.
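A short sketch of that explicit route, with a made-up input string. A negative value is used on purpose, since that is where the candidate functions disagree.

```python
import math

text = '-1.7'          # hypothetical input string
value = float(text)    # explicit first step: string to float

print(int(value))         # -1 (truncates toward zero)
print(math.trunc(value))  # -1
print(math.floor(value))  # -2
print(math.ceil(value))   # -1
print(round(value))       # -2
```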
Sometimes a thought experiment can be useful.
Behavior A: int('1.23') fails with an error. This is the existing behavior.
Behavior B: int('1.23') produces 1 without error. This is what you're proposing.
With behavior A, it's straightforward and trivial to get the effect of behavior B: use int(float('1.23')) instead.
On the other hand, with behavior B, getting the effect of behavior A is significantly more complicated:
def parse_pure_int(s):
    if "." in s:
        raise ValueError("invalid literal for integer with base 10: " + s)
    return int(s)
(and even with the code above, I don't have complete confidence that there isn't some corner case that it mishandles.)
Behavior A therefore is more expressive than behavior B.
Another thing to consider: '1.23' is a string representation of a floating-point value. Converting '1.23' to an integer conceptually involves two conversions (string to float to integer), but int(1.23) and int('1') each involve only one conversion.
Edit:
And indeed, there are corner cases that the above code would not handle: 1e-2 and 1E-2 are both floating point values too.
In simple words - they're not the same function.
int( decimal ) behaves as 'floor i.e. knock off the decimal portion and return as int'
int( string ) behaves as 'this text describes an integer, convert it and return as int'.
They are 2 different functions with the same name that return an integer but they are different functions.
'int' is short and easy to remember and its meaning applied to each type is intuitive to most programmers which is why they chose it.
There's no implication they are providing the same or combined functionality, they simply have the same name and return the same type. They could as easily be called 'floorDecimalAsInt' and 'convertStringToInt', but they went for 'int' because it's easy to remember, (99%) intuitive and confusion would rarely occur.
Parsing text as an integer when the text includes a decimal point, such as "4.5", would throw an error in the majority of computer languages, and the majority of programmers would expect it to, since the text value does not represent an integer and implies that erroneous data is being provided.
I will have this random number generated e.g 12.75 or 1.999999999 or 2.65
I want to always round this number down to the nearest integer whole number so 2.65 would be rounded to 2.
Sorry for asking but I couldn't find the answer after numerous searches, thanks :)
You can use either int(), math.trunc(), or math.floor(). They all will do what you want for positive numbers:
>>> import math
>>> math.floor(12.6) # returns 12.0 in Python 2
12
>>> int(12.6)
12
>>> math.trunc(12.6)
12
However, note that they behave differently with negative numbers: int and math.trunc will go to 0, whereas math.floor always floors downwards:
>>> import math
>>> math.floor(-12.6) # returns -13.0 in Python 2
-13
>>> int(-12.6)
-12
>>> math.trunc(-12.6)
-12
Note that math.floor and math.ceil used to return floats in Python 2.
Also note that int and math.trunc will both (at first glance) appear to do the same thing, though their exact semantics differ. In short: int is for general/type conversion and math.trunc is specifically for numeric types (and will help make your intent more clear).
Use int if you don't really care about the difference, if you want to convert strings, or if you don't want to import a library. Use trunc if you want to be absolutely unambiguous about what you mean or if you want to ensure your code works correctly for non-builtin types.
More info below:
Math.floor() in Python 2 vs Python 3
Note that math.floor (and math.ceil) were changed slightly from Python 2 to Python 3 -- in Python 2, both functions will return a float instead of an int. This was changed in Python 3 so that both functions return an int (more specifically, they call the __floor__ or __ceil__ method on whatever object they were given). So then, if you're using Python 2, or would like your code to maintain compatibility between the two versions, it would generally be safe to do int(math.floor(...)).
For more information about why this change was made + about the potential pitfalls of doing int(math.floor(...)) in Python 2, see
Why do Python's math.ceil() and math.floor() operations return floats instead of integers?
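The compatibility idiom above, wrapped in a small helper for illustration. The name `floor_to_int` is made up for this sketch.

```python
import math

def floor_to_int(x):
    # math.floor returns a float on Python 2 and an int on Python 3;
    # wrapping it in int() gives an int on both.
    return int(math.floor(x))

print(floor_to_int(12.6))    # 12
print(floor_to_int(-12.6))   # -13
```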
int vs math.trunc()
At first glance, the int() and math.trunc() methods will appear to be identical. The primary differences are:
int(...)
The int function will accept floats, strings, and ints.
Running int(param) will call the param.__int__() method in order to perform the conversion (and then will try calling __trunc__ if __int__ is undefined)
The __int__ magic method was not always unambiguously defined -- for some period of time, it turned out that the exact semantics and rules of how __int__ should work were largely left up to the implementing class.
The int function is meant to be used when you want to convert a general object into an int. It's a type conversion method. For example, you can convert strings to ints by doing int("42") (or do things like change of base: int("AF", 16) -> 175).
math.trunc(...)
math.trunc will only accept numeric types (ints, floats, etc.)
Running math.trunc(param) will call the param.__trunc__() method in order to perform the conversion
The exact behavior and semantics of the __trunc__ magic method was precisely defined in PEP 3141 (and more specifically in the Changes to operations and __magic__ methods section).
The math.trunc function is meant to be used when you want to take an existing real number and specifically truncate and remove its decimals to produce an integral type. This means that unlike int, math.trunc is a purely numeric operation.
All that said, it turns out all of Python's built-in types will behave exactly the same whether you use int or trunc. This means that if all you're doing is using regular ints, floats, fractions, and decimals, you're free to use either int or trunc.
However, if you want to be very precise about what exactly your intent is (ie if you want to make it absolutely clear whether you're flooring or truncating), or if you're working with custom numeric types that have different implementations for __int__ and __trunc__, then it would probably be best to use math.trunc.
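To make the last point concrete, here is a deliberately contrived class whose two hooks disagree. The class `Odd` and its return values are invented purely to show which method each function calls.

```python
import math

class Odd:
    """Hypothetical numeric-like type whose two hooks deliberately disagree."""

    def __int__(self):
        return 1

    def __trunc__(self):
        return 2

print(int(Odd()))         # 1, because int() uses __int__ when it is defined
print(math.trunc(Odd()))  # 2, because math.trunc() uses __trunc__
```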
You can also find more information and debate about this topic on Python's developer mailing list.
You can do this easily with a built-in Python operator: just use two forward slashes (floor division) and divide by 1.
>>> print(12.75 // 1)
12.0
>>> print(1.999999999 // 1)
1.0
>>> print(2.65 // 1)
2.0
No need to import any module like math etc....
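Two details of this approach are worth seeing side by side: the result is still a float, and floor division rounds toward negative infinity, so negative inputs behave like math.floor rather than int. A minimal sketch:

```python
positive = 2.65 // 1
negative = -2.65 // 1

print(positive)        # 2.0 (still a float)
print(negative)        # -3.0 (floors, does not truncate)
print(int(positive))   # 2
print(int(-2.65))      # -2 (int truncates toward zero)
```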
By default, Python truncates the value if you simply cast it to an integer:
>>> x = 2.65
>>> int(x)
2
I'm not sure whether you want math.floor, math.trunc, or int, but... it's almost certainly one of those functions, and you can probably read the docs and decide more easily than you can explain enough for us to decide for you.
Obviously, Michael0x2a's answer is what you should do. But, you can always get a bit creative.
int(str(12.75).split('.')[0])
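A sketch of where the string trick holds and where it breaks. The helper name `int_part_via_str` is made up; the failure case relies on the fact that large floats are printed in exponent notation.

```python
def int_part_via_str(x):
    # Hypothetical wrapper around the one-liner above.
    return int(str(x).split('.')[0])

print(int_part_via_str(12.75))   # 12
print(int_part_via_str(-2.65))   # -2

try:
    int_part_via_str(1e16)       # str(1e16) is '1e+16', which contains no '.'
    failed = False
except ValueError:
    failed = True
print(failed)                    # True
```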
If you are only looking for the integer part, I think the best option would be to use the math.trunc() function.
import math
math.trunc(123.456)
You can also use int()
int(123.456)
The difference between these two functions is that int() function also deals with string numeric conversion, where trunc() only deals with numeric values.
int('123')
# 123
whereas the trunc() function will throw an exception:
math.trunc('123')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-f9aa08f6d314> in <module>()
----> 1 math.trunc('123')
TypeError: type str doesn't define __trunc__ method
If you know that you are only dealing with numeric data, you should consider using the trunc() function, since it's faster than int():
import timeit
timeit.timeit("math.trunc(123.456)", setup="import math", number=10_000)
# 0.0011689490056596696
timeit.timeit("int(123.456)", number=10_000)
# 0.0014109049952821806
I used python to write an assignment last week, here is a code snippet
def departTime():
    '''
    Calculate the time to depart a packet.
    '''
    if(random.random < 0.8):
        t = random.expovariate(1.0 / 2.5)
    else:
        t = random.expovariate(1.0 / 10.5)
    return t
Can you see the problem? I compare random.random with 0.8, which should be random.random().
Of course this is due to my carelessness, but I don't get it. In my opinion, this kind of comparison should trigger at least a warning in any programming language.
So why does Python just ignore it and return False?
This isn't always a mistake
Firstly, just to make things clear, this isn't always a mistake.
In this particular case, it's pretty clear the comparison is an error.
However, because of the dynamic nature of Python, consider the following (perfectly valid, if terrible) code:
import random
random.random = 9 # Very weird but legal assignment.
random.random < 10 # True
random.random > 10 # False
What actually happens when comparing objects?
As for your actual case, comparing a function object to a number, have a look at Python documentation: Python Documentation: Expressions. Check out section 5.9, titled "Comparisons", which states:
The operators <, >, ==, >=, <=, and != compare the values of two objects. The objects need not have the same type. If both are numbers, they are converted to a common type. Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily. You can control comparison behavior of objects of non-built-in types by defining a __cmp__ method or rich comparison methods like __gt__, described in section Special method names.
(This unusual definition of comparison was used to simplify the definition of operations like sorting and the in and not in operators. In the future, the comparison rules for objects of different types are likely to change.)
That should explain both what happens and the reasoning for it.
BTW, I'm not sure what happens in newer versions of Python.
Edit: If you're wondering, Debilski's answer gives info about Python 3.
This is ‘fixed’ in Python 3 http://docs.python.org/3.1/whatsnew/3.0.html#ordering-comparisons.
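A minimal sketch of what the same mistake looks like under Python 3, where the comparison is rejected instead of silently returning False:

```python
import random

try:
    random.random < 0.8        # the call parentheses are missing
except TypeError as exc:
    message = str(exc)
    print(message)
    # '<' not supported between instances of 'builtin_function_or_method' and 'float'

# With the call in place, the comparison is between two floats.
value = random.random()
print(0.0 <= value < 1.0)      # True
```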
Because in Python that is a perfectly valid comparison. Python can't know if you really want to make that comparison or if you've just made a mistake. It's your job to supply Python with the right objects to compare.
Because of the dynamic nature of Python you can compare and sort almost everything with almost everything (this is a feature). You've compared a function to a float in this case.
An example:
list = ["b","a",0,1, random.random, random.random()]
print sorted(list)
This will give the following output:
[0, 0.89329568818188976, 1, <built-in method random of Random object at 0x8c6d66c>, 'a', 'b']
I think python allows this because the random.random object could be overriding the > operator by including a __gt__ method in the object which might be accepting or even expecting a number. So, python thinks you know what you are doing... and does not report it.
If you check for it, you can see that __gt__ exists for random.random...
>>> random.random.__gt__
<method-wrapper '__gt__' of builtin_function_or_method object at 0xb765c06c>
But, that might not be something you want to do.