In coding up a simple Fibonacci script, I found some 'odd' behaviour in how Python treats numpy.int32 vs how it treats regular int numbers.
Can anyone help me understand what causes this behaviour?
Using the Fibonacci code as follows, leveraging caching to significantly speed things up:
from functools import lru_cache
import numpy as np
@lru_cache(maxsize=None)
def fibo(n):
    if n <= 1:
        return n
    else:
        return fibo(n-1) + fibo(n-2)
If I define a NumPy array of numbers to calculate over (with np.arange), it all works well until n = 47, then things start going haywire. If, on the other hand, I use a regular Python list, the values are all correctly calculated.
You should be able to see the difference with the following:
fibo(np.int32(47)), fibo(47)
Which should return (at least it does for me):
(-1323752223, 2971215073)
Obviously, something very wrong has occurred with the calculations against the numpy.int32 input. Now, I can get around the issue by simply inserting an 'n = int(n)' line in the fibo function before anything else is evaluated, but I don't understand why this is necessary.
I've also tried np.int(47) instead of np.int32(47), and found that the former works just fine. However, using np.arange to create the array seems to default to np.int32 data type.
I've tried removing the caching (I wouldn't recommend you try; it takes around 2 hours to calculate to n = 47) and I get the same behaviour, so that is not the cause.
Can anyone shed some insight into this for me?
Thanks
Python's "integers have unlimited precision". This was built into the language so that new users have "one less thing to learn".
Though maybe not in your case, or for anyone using NumPy. That library is designed to make computations as fast as possible. It therefore uses data types that are well supported by the CPU architecture, such as 32-bit and 64-bit integers that neatly fit into a CPU register and have an invariable memory footprint.
But then we're back to dealing with overflow problems like in any other programming language. NumPy does warn about that though:
>>> print(fibo(np.int32(47)))
fib.py:9: RuntimeWarning: overflow encountered in long_scalars
return fibo(n-1)+fibo(n-2)
-1323752223
Here we are using a signed 32-bit integer. The largest positive number it can hold is 2**31 - 1 = 2147483647. But the 47th Fibonacci number is even larger than that: it's 2971215073, as you calculated. So the 32-bit integer overflows, and we end up with -1323752223, which is the value wrapped around in two's complement (2971215073 - 2**32):
>>> 2971215073 + 1323752223 == 2**32
True
It worked with np.int because that's just an alias of the built-in int, so it returns a Python integer:
>>> np.int is int
True
For more on this, see: What is the difference between native int type and the numpy.int types?
Also note that np.arange for integer arguments returns an integer array of type np.int_ (with a trailing underscore, unlike np.int). That data type is platform-dependent and maps to 32-bit integers on Windows, but 64-bit on Linux.
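To tie this back to the question: the n = int(n) workaround works because every recursive step then runs on Python's arbitrary-precision integers, regardless of what the caller passes in. A minimal sketch (with the cache restored, since the naive recursion is otherwise impractically slow):
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def fibo(n):
    n = int(n)  # coerce np.int32 (or any integer-like) to an unbounded Python int
    if n <= 1:
        return n
    return fibo(n - 1) + fibo(n - 2)

print(fibo(np.int32(47)))  # 2971215073, no overflow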
Related
For some reason, Cython is returning 0 on a math expression that should evaluate to 0.5:
print(2 ** (-1)) # prints 0
Oddly enough, mix variables in and it'll work as expected:
i = 1
print(2 ** (-i)) # prints 0.5
Vanilla CPython returns 0.5 for both cases. I'm compiling for 37m-x86_64-linux-gnu, and language_level is set to 3.
What is this witchcraft?
It's because it's using C ints rather than Python integers, so it matches C behaviour rather than Python behaviour. I'm relatively sure this used to be documented as a limitation somewhere, but I can't find it now. If you want to report it as a bug, go to https://github.com/cython/cython/issues, but I suspect this is a deliberate trade-off of speed for compatibility.
The code gets translated to
__Pyx_pow_long(2, -1L)
where __Pyx_pow_long is a function of type static CYTHON_INLINE long __Pyx_pow_long(long b, long e).
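To see why that returns 0, here is a rough Python emulation of C-style integer exponentiation (illustrative only, not Cython's actual helper): with a negative exponent the mathematical result is a fraction below 1, and integer arithmetic truncates it to 0.
def c_style_int_pow(b, e):
    # everything stays an integer, as it would with C longs
    if e < 0:
        return 1 // (b ** -e) if b != 0 else 0  # 1 // 2 == 0 for b=2, e=-1
    return b ** e

print(c_style_int_pow(2, -1))  # 0, matching the Cython output
print(2 ** -1)                 # 0.5 in plain Python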
The easiest way to fix it is to change one or both of the numbers to be a floating point number:
print(2. ** (-1))
As a general comment on the design choice: people from the C world generally expect int operator int to return an int, and this option will be fastest. Python had tried to do this in the past with the Python 2 division behaviour (but inconsistently - power always returned a floating point number).
Cython generally tries to follow Python behaviour. However, a lot of people are using it for speed so they also try to fall back to quick, C-like operations especially when people specify types (since those people want speed). I think what's happened here is that it's been able to infer the types automatically, and so defaulted to C behaviour. I suspect ideally it should distinguish between specified types and types that it's inferred. However, it's also probably too late to start changing that.
It looks like Cython is incorrectly inferring the final data type as int rather than float when only literal numbers are involved.
The following code works as expected:
print(2.0 ** (-1))
See this link for a related discussion: https://groups.google.com/forum/#!topic/cython-users/goVpote2ScY
from math import log, sqrt
import sys
n = 760 ** 890
print(log(n))
I get a valid result.
Now replace log with sqrt and you get (as expected):
OverflowError: int too large to convert to float
So I suppose there's a trick for integer arguments in the log function, using integer logarithms, but I didn't find it in the documentation. There's just this:
math.log(x[, base])
With one argument, return the natural logarithm of x (to base e).
With two arguments, return the logarithm of x to the given base, calculated as log(x)/log(base).
Where is that documented?
I finally dug into python math lib source code and found this:
/* A decent logarithm is easy to compute even for huge ints, but libm can't
do that by itself -- loghelper can. func is log or log10, and name is
"log" or "log10". Note that overflow of the result isn't possible: an int
can contain no more than INT_MAX * SHIFT bits, so has value certainly less
than 2**(2**64 * 2**16) == 2**2**80, and log2 of that is 2**80, which is
small enough to fit in an IEEE single. log and log10 are even smaller.
However, intermediate overflow is possible for an int if the number of bits
in that int is larger than PY_SSIZE_T_MAX. */
static PyObject*
loghelper(PyObject* arg, double (*func)(double), const char *funcname)
{
/* If it is int, do it ourselves. */
if (PyLong_Check(arg)) {
double x, result;
Py_ssize_t e;
...
I'll spare you the rest of the source (check the link), but what I understand from it is that Python checks whether the passed argument is an integer, and if it is, it computes the log itself instead of handing the value to the math library, per the comment: If it is int, do it ourselves. Also: A decent logarithm is easy to compute even for huge ints, but libm can't do that by itself -- loghelper can.
If it's a double, then call native math library.
From the source comments, we see that Python tries the hardest to provide the result even in case of the overflow (Here the conversion to double overflowed, but it's possible to compute the log anyway. Clear the exception and continue)
So thanks to the python wrapping of the log function, Python is able to compute logarithm of huge integers (which is specific to some functions, since some others like sqrt cannot do it), and it's documented, but only in the source code, probably making it an implementation detail as Jon hinted.
I think this thread is useful since Python now uses long ints. The trick to avoid overflow is the use of the _PyLong_Frexp function (see here) and an alternative formula to compute the log function even after an OverflowError is raised when trying to convert a long int to a double; check loghelper in this module.
_PyLong_Frexp returns an approximation of the long int arg, given inside loghelper by a double x and an exponent e (arg ≈ x * 2**e), and the log is calculated as log(arg) = log(x * 2**e) = log(x) + e * log(2). I am missing the specifics of the approximation using x and e, but you can find them in the implementation of _PyLong_Frexp in the link provided.
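For illustration, here is a small Python sketch of the same idea (my own code, not CPython's): split the big integer into a part that fits a double and a power of two, then take the log of the small part and add e * log(2).
import math

def big_log(n):
    # n ≈ (n >> e) * 2**e, with the shifted-down part small enough for a double
    e = max(n.bit_length() - 53, 0)
    return math.log(n >> e) + e * math.log(2)

n = 760 ** 890
print(big_log(n))    # agrees with math.log(n) to within float precision
print(math.log(n))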
I am currently implementing this simple code trying to find the n-th element of the Fibonacci sequence using Python 2.7:
import numpy as np
def fib(n):
    F = np.empty(n+2)
    F[1] = 1
    F[0] = 0
    for i in range(2, n+1):
        F[i] = F[i-1] + F[i-2]
    return int(F[n])
This works fine for n < 79, but after that I get wrong numbers. For example, according to Wolfram Alpha, F79 should be equal to 14472334024676221, but fib(79) gives me 14472334024676220. I think this could be caused by the way Python deals with integers, but I have no idea what exactly the problem is. Any help is greatly appreciated!
the default data type of the array you create with np.empty is a 64-bit float, and a float64 can only represent integers exactly up to 2**53; F79 is larger than that, so the last digit gets rounded off.
pure python would let you have arbitrarily long integers; numpy does not.
so it's more the way numpy deals with numbers; pure python would do just fine.
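You can check both halves of this in the REPL:
>>> import numpy as np
>>> np.empty(5).dtype
dtype('float64')
>>> float(2**53) == float(2**53 + 1)  # 2**53 + 1 is not representable as a float
True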
Python will deal with integers perfectly fine here. Indeed, that is the beauty of python. numpy, on the other hand, introduces ugliness and just happens to be completely unnecessary, and will likely slow you down. Your implementation will also require much more space. Python allows you to write beautiful, readable code. Here is Raymond Hettinger's canonical implementation of iterative fibonacci in Python:
def fib(n):
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x
That is O(n) time and constant space. It is beautiful, readable, and succinct. It will also give you the correct integer as long as you have memory to store the number on your machine. Learn to use numpy when it is the appropriate tool, and as importantly, learn to not use it when it is inappropriate.
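A quick check (F100 is well beyond what a float64 can hold exactly):
>>> fib(100)
354224848179261915075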
Unless you want to generate a list of all the Fibonacci numbers up to Fn, there is no need to use a list, numpy, or anything else like that; a simple loop and two variables are enough, as you only really need to know the two previous values:
def fib(n):
    Fk, Fk1 = 0, 1
    for _ in range(n):
        Fk, Fk1 = Fk1, Fk + Fk1
    return Fk
Of course, there are better ways to do it using the mathematical properties of the Fibonacci numbers; with those we know that there is a matrix that gives us the right result:
import numpy
def fib_matrix(n):
    mat = numpy.matrix([[1, 1], [1, 0]], dtype=object) ** n
    return mat[0, 1]
to which I assume they have an optimized matrix exponentiation, making it more efficient than the previous method.
Using the properties of the underlying Lucas sequence, it is possible to do it without the matrix, just as efficiently as exponentiation by squaring and with the same number of variables as the first example, but it is a little harder to understand at first glance because it requires more mathematical background (a sketch follows).
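For reference, a minimal sketch of that fast-doubling approach (the identities are standard; the function names here are my own):
def fib_fast_doubling(n):
    # uses F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)**2 + F(k+1)**2
    def pair(k):  # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = pair(k >> 1)
        c = a * (2 * b - a)   # F(2m), where m = k >> 1
        d = a * a + b * b     # F(2m+1)
        return (d, c + d) if k & 1 else (c, d)
    return pair(n)[0]

print(fib_fast_doubling(79))  # 14472334024676221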
The closed form, the one with the golden ratio, will give you the result even faster, but it runs the risk of being inaccurate because of the use of floating-point arithmetic.
As an additional word to the previous answer by hiro protagonist, note that if using NumPy is a requirement, you can solve your issue very easily by replacing:
F = np.empty(n+2)
with
F = np.empty(n+2, dtype=object)
but it will not do anything more than transfer the computation back to pure Python.
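With that one change, the original function returns exact integers, since the array now holds Python ints:
>>> fib(79)
14472334024676221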
It's clearly stated in the docs that int(number) is a flooring type conversion:
>>> int(1.23)
1
and that int(string) returns an int if and only if the string is an integer literal:
>>> int('1.23')
ValueError: invalid literal for int() with base 10: '1.23'
>>> int('1')
1
Is there any special reason for that? I find it counterintuitive that the function floors in one case, but not the other.
There is no special reason. Python is simply applying its general principle of not performing implicit conversions, which are well-known causes of problems, particularly for newcomers, in languages such as Perl and Javascript.
int(some_string) is an explicit request to convert a string to integer format; the rules for this conversion specify that the string must contain a valid integer literal representation. int(float) is an explicit request to convert a float to an integer; the rules for this conversion specify that the float's fractional portion will be truncated.
In order for int("3.1459") to return 3 the interpreter would have to implicitly convert the string to a float. Since Python doesn't support implicit conversions, it chooses to raise an exception instead.
This is almost certainly a case of applying three of the principles from the Zen of Python:
Explicit is better than implicit.
[...] practicality beats purity
Errors should never pass silently
Some percentage of the time, someone doing int('1.23') is calling the wrong conversion for their use case, and wants something like float or decimal.Decimal instead. In these cases, it's clearly better for them to get an immediate error that they can fix, rather than silently giving the wrong value.
In the case that you do want to truncate that to an int, it is trivial to explicitly do so by passing it through float first, and then calling one of int, round, trunc, floor or ceil as appropriate. This also makes your code more self-documenting, guarding against a later modification "correcting" a hypothetical silently-truncating int call to float by making it clear that the rounded value is what you want.
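For example, a quick sketch of those options:
import math

s = '1.23'
x = float(s)           # the explicit string -> float conversion comes first
print(int(x))          # 1, truncates toward zero
print(round(x))        # 1, rounds to nearest
print(math.trunc(x))   # 1
print(math.floor(x))   # 1
print(math.ceil(x))    # 2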
Sometimes a thought experiment can be useful.
Behavior A: int('1.23') fails with an error. This is the existing behavior.
Behavior B: int('1.23') produces 1 without error. This is what you're proposing.
With behavior A, it's straightforward and trivial to get the effect of behavior B: use int(float('1.23')) instead.
On the other hand, with behavior B, getting the effect of behavior A is significantly more complicated:
def parse_pure_int(s):
    if "." in s:
        raise ValueError("invalid literal for integer with base 10: " + s)
    return int(s)
(and even with the code above, I don't have complete confidence that there isn't some corner case that it mishandles.)
Behavior A therefore is more expressive than behavior B.
Another thing to consider: '1.23' is a string representation of a floating-point value. Converting '1.23' to an integer conceptually involves two conversions (string to float to integer), but int(1.23) and int('1') each involve only one conversion.
Edit:
And indeed, there are corner cases that the above code would not handle: 1e-2 and 1E-2 are both floating point values too.
In simple words - they're not the same function.
int(decimal) behaves as 'floor, i.e. knock off the decimal portion and return as int'.
int(string) behaves as 'this text describes an integer, convert it and return as int'.
They are two different functions with the same name that both return an integer, but they are different functions nonetheless.
'int' is short and easy to remember and its meaning applied to each type is intuitive to most programmers which is why they chose it.
There's no implication they are providing the same or combined functionality, they simply have the same name and return the same type. They could as easily be called 'floorDecimalAsInt' and 'convertStringToInt', but they went for 'int' because it's easy to remember, (99%) intuitive and confusion would rarely occur.
Parsing text as an integer when the text includes a decimal point, such as "4.5", would throw an error in the majority of computer languages, and the majority of programmers would expect it to, since the text value does not represent an integer and implies the caller is providing erroneous data.
I will have this random number generated e.g 12.75 or 1.999999999 or 2.65
I want to always round this number down to the nearest integer whole number so 2.65 would be rounded to 2.
Sorry for asking but I couldn't find the answer after numerous searches, thanks :)
You can use either int(), math.trunc(), or math.floor(). They will all do what you want for positive numbers:
>>> import math
>>> math.floor(12.6) # returns 12.0 in Python 2
12
>>> int(12.6)
12
>>> math.trunc(12.6)
12
However, note that they behave differently with negative numbers: int and math.trunc truncate toward 0, whereas math.floor always rounds down toward negative infinity:
>>> import math
>>> math.floor(-12.6) # returns -13.0 in Python 2
-13
>>> int(-12.6)
-12
>>> math.trunc(-12.6)
-12
Note that math.floor and math.ceil used to return floats in Python 2.
Also note that int and math.trunc will both (at first glance) appear to do the same thing, though their exact semantics differ. In short: int is for general/type conversion and math.trunc is specifically for numeric types (and will help make your intent more clear).
Use int if you don't really care about the difference, if you want to convert strings, or if you don't want to import a library. Use trunc if you want to be absolutely unambiguous about what you mean or if you want to ensure your code works correctly for non-builtin types.
More info below:
Math.floor() in Python 2 vs Python 3
Note that math.floor (and math.ceil) were changed slightly from Python 2 to Python 3 -- in Python 2, both functions will return a float instead of an int. This was changed in Python 3 so that both methods return an int (more specifically, they delegate to the __floor__ or __ceil__ method of whatever object they were given). So then, if you're using Python 2, or would like your code to maintain compatibility between the two versions, it is generally safe to do int(math.floor(...)).
For more information about why this change was made + about the potential pitfalls of doing int(math.floor(...)) in Python 2, see
Why do Python's math.ceil() and math.floor() operations return floats instead of integers?
int vs math.trunc()
At first glance, the int() and math.trunc() methods will appear to be identical. The primary differences are:
int(...)
The int function will accept floats, strings, and ints.
Running int(param) will call the param.__int__() method in order to perform the conversion (and then will try calling __trunc__ if __int__ is undefined)
The __int__ magic method was not always unambiguously defined -- for some period of time, it turned out that the exact semantics and rules of how __int__ should work were largely left up to the implementing class.
The int function is meant to be used when you want to convert a general object into an int. It's a type conversion method. For example, you can convert strings to ints by doing int("42") (or do things like change of base: int("AF", 16) -> 175).
math.trunc(...)
The trunc function will only accept numeric types (ints, floats, etc.)
Running math.trunc(param) will call the param.__trunc__() method in order to perform the conversion
The exact behavior and semantics of the __trunc__ magic method was precisely defined in PEP 3141 (and more specifically in the Changes to operations and __magic__ methods section).
The math.trunc function is meant to be used when you want to take an existing real number and specifically truncate and remove its decimals to produce an integral type. This means that unlike int, math.trunc is a purely numeric operation.
All that said, it turns out all of Python's built-in types will behave exactly the same whether you use int or trunc. This means that if all you're doing is using regular ints, floats, fractions, and decimals, you're free to use either int or trunc.
However, if you want to be very precise about what exactly your intent is (ie if you want to make it absolutely clear whether you're flooring or truncating), or if you're working with custom numeric types that have different implementations for __int__ and __trunc__, then it would probably be best to use math.trunc.
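A contrived sketch of that last case (the class and its behaviour are hypothetical, purely to show the two hooks diverging):
import math

class Measurement:
    def __init__(self, value):
        self.value = value
    def __int__(self):       # used by int(): here, round to nearest
        return round(self.value)
    def __trunc__(self):     # used by math.trunc(): drop the decimals
        return math.trunc(self.value)

m = Measurement(2.65)
print(int(m))         # 3
print(math.trunc(m))  # 2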
You can also find more information and debate about this topic on Python's developer mailing list.
You can do this easily with built-in Python floor division: just use two forward slashes and divide by 1.
>>> print(12.75 // 1)
12.0
>>> print(1.999999999 // 1)
1.0
>>> print(2.65 // 1)
2.0
No need to import any module like math.
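One caveat, echoing the earlier answer: floor division floors rather than truncating toward zero, so negative inputs behave like math.floor, not like int():
>>> print(-12.75 // 1)
-13.0
>>> print(int(-12.75))
-12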
Python truncates by default if you simply cast a float to int:
>>> x = 2.65
>>> int(x)
2
I'm not sure whether you want math.floor, math.trunc, or int, but... it's almost certainly one of those functions, and you can probably read the docs and decide more easily than you can explain enough for us to decide for you.
Obviously, Michael0x2a's answer is what you should do. But, you can always get a bit creative.
int(str(12.75).split('.')[0])
If you're only looking for the nearest integer part, I think the best option would be to use the math.trunc() function.
import math
math.trunc(123.456)
You can also use int()
int(123.456)
The difference between these two functions is that int() also handles conversion of numeric strings, whereas trunc() only deals with numeric values:
int('123')
# 123
The trunc() function, on the other hand, will throw an exception:
math.trunc('123')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-f9aa08f6d314> in <module>()
----> 1 math.trunc('123')
TypeError: type str doesn't define __trunc__ method
If you know that you're only dealing with numeric data, you should consider using the trunc() function, since it's faster than int():
timeit.timeit("math.trunc(123.456)", setup="import math", number=10_000)
# 0.0011689490056596696
timeit.timeit("int(123.456)", number=10_000)
# 0.0014109049952821806