why does math.log accept big integer values? - python

from math import log, sqrt
n = 760 ** 890
print(log(n))
I get a valid result.
Now replace log with sqrt and you get (as expected):
OverflowError: int too large to convert to float
So I suppose there's a trick for integer arguments in the log function, using integer logarithms, but I didn't find it in the documentation. There's just this:
math.log(x[, base])
With one argument, return the natural logarithm of x (to base e).
With two arguments, return the logarithm of x to the given base, calculated as log(x)/log(base).
Where is that documented?

I finally dug into the CPython math module source code and found this:
/* A decent logarithm is easy to compute even for huge ints, but libm can't
do that by itself -- loghelper can. func is log or log10, and name is
"log" or "log10". Note that overflow of the result isn't possible: an int
can contain no more than INT_MAX * SHIFT bits, so has value certainly less
than 2**(2**64 * 2**16) == 2**2**80, and log2 of that is 2**80, which is
small enough to fit in an IEEE single. log and log10 are even smaller.
However, intermediate overflow is possible for an int if the number of bits
in that int is larger than PY_SSIZE_T_MAX. */
static PyObject*
loghelper(PyObject* arg, double (*func)(double), const char *funcname)
{
    /* If it is int, do it ourselves. */
    if (PyLong_Check(arg)) {
        double x, result;
        Py_ssize_t e;
        ...
I'll spare you the rest of the source (check the link), but what I understand from it is that Python checks whether the passed argument is an integer, and if it is, it doesn't use the math lib at all (hence the If it is int, do it ourselves. comment). Also: A decent logarithm is easy to compute even for huge ints, but libm can't do that by itself -- loghelper can.
If it's a double, the native math library is called instead.
From the source comments, we see that Python tries its hardest to provide a result even in case of overflow (Here the conversion to double overflowed, but it's possible to compute the log anyway. Clear the exception and continue).
So thanks to Python's wrapping of the log function, it is able to compute logarithms of huge integers (which is specific to certain functions, since others like sqrt cannot do it), and it is documented, but only in the source code, probably making it an implementation detail as Jon hinted.

I think this thread is useful. Since Python now uses long ints, the trick to avoid overflow is the _PyLong_Frexp function (see here) and an alternative formula that computes the log even after an OverflowError is raised when trying to convert a long int to a double; check loghelper in this module.
_PyLong_Frexp returns an approximation of the long int arg as a double x and an exponent e (arg ≈ x * 2**e), and the log is then computed as log(arg) ≈ log(x * 2**e) = log(x) + e * log(2). I am missing the specifics of the approximation using x and e, but you can find them in the implementation of _PyLong_Frexp in the link provided.
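Here is a minimal pure-Python sketch of that decomposition (the function name log_big_int and the 53-bit split are illustrative assumptions, not CPython's exact algorithm):

from math import log

def log_big_int(n, base=None):
    # Keep ~53 mantissa bits so the shifted value fits in a float,
    # then apply log(n) = log(x) + e*log(2).
    e = max(n.bit_length() - 53, 0)
    x = float(n >> e)
    result = log(x) + e * log(2)
    if base is not None:
        result /= log(base)
    return result

n = 760 ** 890
print(log_big_int(n))  # closely agrees with math.log(n)
print(log(n))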

Related

Question on Python treatment of numpy.int32 vs int

In coding up a simple Fibonacci script, I found some 'odd' behaviour in how Python treats numpy.int32 vs how it treats regular int numbers.
Can anyone help me understand what causes this behaviour?
Using the Fibonacci code as follows, leveraging caching to significantly speed things up:
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def fibo(n):
    if n <= 1:
        return n
    else:
        return fibo(n-1) + fibo(n-2)
If I define a NumPy array of numbers to calculate over (with np.arange), it all works well until n = 47, then things start going haywire. If, on the other hand, I use a regular Python list, the values are all calculated correctly.
You should be able to see the difference with the following:
fibo(np.int32(47)), fibo(47)
Which should return (at least it does for me):
(-1323752223, 2971215073)
Obviously, something very wrong has occurred with the calculations on the numpy.int32 input. Now, I can get around the issue by simply inserting an n = int(n) line in the fibo function before anything else is evaluated, but I don't understand why this is necessary.
I've also tried np.int(47) instead of np.int32(47), and found that the former works just fine. However, using np.arange to create the array seems to default to np.int32 data type.
I've tried removing the caching (I wouldn't recommend you try - it takes around 2 hours to calculate to n = 47) - and I get the same behaviour, so that is not the cause.
Can anyone shed some insight into this for me?
Thanks
Python's "integers have unlimited precision". This was built into the language so that new users have "one less thing to learn".
Though maybe not in your case, or for anyone using NumPy. That library is designed to make computations as fast as possible. It therefore uses data types that are well supported by the CPU architecture, such as 32-bit and 64-bit integers that neatly fit into a CPU register and have an invariable memory footprint.
But then we're back to dealing with overflow problems like in any other programming language. NumPy does warn about that though:
>>> print(fibo(np.int32(47)))
fib.py:9: RuntimeWarning: overflow encountered in long_scalars
return fibo(n-1)+fibo(n-2)
-1323752223
Here we are using a signed 32-bit integer. The largest positive number it can hold is 2**31 - 1 = 2147483647. But the 47th Fibonacci number is even larger than that, it's 2971215073 as you calculated. In that case, the 32-bit integer overflows and we end up with -1323752223, which is its two's complement:
>>> 2971215073 + 1323752223 == 2**32
True
It worked with np.int because that's just an alias of the built-in int, so it returns a Python integer:
>>> np.int is int
True
For more on this, see: What is the difference between native int type and the numpy.int types?
Also note that np.arange for integer arguments returns an integer array of type np.int_ (with a trailing underscore, unlike np.int). That data type is platform-dependent and maps to 32-bit integers on Windows, but 64-bit on Linux.
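For completeness, here is the asker's own n = int(n) workaround in place; a minimal sketch showing that normalizing to Python's arbitrary-precision int avoids the wraparound:

from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def fibo(n):
    n = int(n)  # normalize NumPy fixed-width scalars to a Python int
    if n <= 1:
        return n
    return fibo(n - 1) + fibo(n - 2)

print(fibo(np.int32(47)))  # 2971215073, no overflow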

Scipy.optimize: parsing error occurs when maxiter is a large integer

I am implementing a shooting-method type problem and I used scipy.optimize.bisect from the scipy module. To achieve higher precision I wanted to go to large iteration numbers, but frequently got the error "unable to parse arguments".
It appears that the scipy function is unable to parse 2147483648 = 2**31.
This has to be due to the fact that such large integers need 64 bits instead of 32, but there must be a way to circumvent this, right? Is there anything I can do to have scipy accept large integers?
It seems unlikely that scipy would just straight up break down at those iteration numbers.
Help is appreciated!
code example:
import scipy.optimize

# maxN = int(2**31)    # fails: doesn't fit in a 32-bit C int
maxN = int(2**31 - 1)  # works
a = 0
b = 1
scipy.optimize.bisect(lambda x: x**2, a, b, maxiter=maxN)
If I set maxN to a number smaller than 2**31, everything works; that value or anything bigger leads to the error described above.
Under the hood, scipy.optimize.bisect calls this C function with signature:
double
bisect(callback_type f, double xa, double xb, double xtol, double rtol,
       int iter, void *func_data, scipy_zeros_info *solver_stats)
and typedefs
typedef struct {
    int funcalls;
    int iterations;
    int error_num;
} scipy_zeros_info;

typedef double (*callback_type)(double, void*);
typedef double (*solver_type)(callback_type, double, double, double, double,
                              int, void *, scipy_zeros_info*);
where iter is the maximal number of allowed iterations. Since an int is typically 32 bits (4 bytes) in size on most platforms, you can't pass a value larger than 2**31 - 1, as you already observed.
However, you can easily write your own bisect function with Cython. You only need to change the function signatures, i.e. the type of iter, iterations and the loop variable i to long long.
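If Cython isn't an option, a plain-Python bisection sidesteps the C int limit entirely, since maxiter is then an ordinary Python integer. A minimal sketch (not scipy's implementation; the tolerance defaults only approximate scipy's documented ones):

def bisect_big(f, a, b, xtol=2e-12, rtol=8.9e-16, maxiter=2**31):
    # Pure-Python bisection; maxiter is a Python int, so no 2**31 - 1 cap.
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0.0:
        raise ValueError("f(a) and f(b) must have different signs")
    for _ in range(maxiter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0 or 0.5 * (b - a) <= xtol + rtol * abs(m):
            return m
        if fa * fm < 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

It will be far slower per iteration than the C version, but bisection reaches double precision in well under a hundred iterations anyway, so a huge maxiter is only ever a safety cap.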

Cython returns 0 for an expression that should evaluate to 0.5?

For some reason, Cython is returning 0 on a math expression that should evaluate to 0.5:
print(2 ** (-1)) # prints 0
Oddly enough, mix variables in and it'll work as expected:
i = 1
print(2 ** (-i)) # prints 0.5
Vanilla CPython returns 0.5 for both cases. I'm compiling for 37m-x86_64-linux-gnu, and language_level is set to 3.
What is this witchcraft?
It's because it's using C ints rather than Python integers so it matches C behaviour rather than Python behaviour. I'm relatively sure this used to be documented as a limitation somewhere but I can't find it now. If you want to report it as a bug then go to https://github.com/cython/cython/issues, but I suspect this is a deliberate trade-off of speed for compatibility.
The code gets translated to
__Pyx_pow_long(2, -1L)
where __Pyx_pow_long is a function of type static CYTHON_INLINE long __Pyx_pow_long(long b, long e).
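To see why that yields 0, here is a rough Python model of C-style integer exponentiation (an illustrative simplification, not Cython's actual helper source):

def pow_long_model(b, e):
    # Model of pow in C integer semantics: a negative exponent would need a
    # fraction, which integer math truncates to 0 (except for bases 1 and -1).
    if e < 0:
        if b == 1:
            return 1
        if b == -1:
            return 1 if e % 2 == 0 else -1
        return 0
    r = 1
    while e:  # square-and-multiply for non-negative exponents
        if e & 1:
            r *= b
        b *= b
        e >>= 1
    return r

print(pow_long_model(2, -1))  # 0, matching the Cython output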
The easiest way to fix it is to change one or both of the numbers to a floating-point number:
print(2. ** (-1))
As a general comment on the design choice: people from the C world generally expect int operator int to return an int, and this option is the fastest. Python tried this in the past with the Python 2 division behaviour (but inconsistently - power always returned a floating-point number).
Cython generally tries to follow Python behaviour. However, a lot of people are using it for speed so they also try to fall back to quick, C-like operations especially when people specify types (since those people want speed). I think what's happened here is that it's been able to infer the types automatically, and so defaulted to C behaviour. I suspect ideally it should distinguish between specified types and types that it's inferred. However, it's also probably too late to start changing that.
It looks like Cython is incorrectly inferring the final data type as int rather than float when only literal numbers are involved.
The following code works as expected:
print(2.0 ** (-1))
See this link for a related discussion: https://groups.google.com/forum/#!topic/cython-users/goVpote2ScY

What's the rationale behind 2.5 // 2.0 returning a float rather than an int in Python 3.x?

If it's an integral value, why not put it in an object of type int?
[edit]
I am looking for a justification of the fact that this is so. What were the arguments in making it this way. Haven't been able to find them yet.
[edit2]
The relation with floor is more problematic than the term "floor division" suggests!
floor(3.5 / 5.5) == 0 (int)
whereas
3.5 // 5.5 == 0.0 (float)
Can not yet discern any logic here :(
[edit3]
From PEP238:
In a unified model, the integer 1 should be indistinguishable from the floating point number 1.0 (except for its inexactness), and both should behave the same in all numeric contexts.
All very nice, but a not-unimportant library like NumPy complains when offered floats as indices, even if they're integral. So 'indistinguishable' is not reality yet. I spent some time hunting a bug in connection with this, and I was very surprised to learn about the true nature of //. And it wasn't that obvious from the docs (for me).
Since I've quite some trust in the design of Python 3.x, I thought I must have missed a very obvious reason to define // in this way. But now I wonder...
The // operator is covered in PEP 238. First of all, note that it's not "integer division" but "floor division", i.e. it is never claimed that the result would be an integer.
From the section on Semantics of Floor Division:
Floor division will be implemented in all the Python numeric types, and will have the semantics of
a // b == floor(a/b)
except that the result type will be the common type into which a and b are coerced before the operation.
And later:
For floating point inputs, the result is a float. For example:
3.5//2.0 == 1.0
The rationale behind this decision is not explicitly stated (or I could not find it). However, the way it is implemented is consistent with the other mathematical operations (emphasis mine):
Specifically, if a and b are of the same type, a//b will be of that type too. If the inputs are of different types, they are first coerced to a common type using the same rules used for all other arithmetic operators.
Also, if the result were automatically converted to int, that could yield weird and surprising results for very large floating point numbers that are beyond integer precision:
>>> 1e30 // 2.
5e+29
>>> int(1e30 // 2.)
500000000000000009942312419328
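A quick interactive check of the "common type" rule quoted above:

>>> 3.5 // 2.0   # both floats -> float
1.0
>>> 7 // 2       # both ints -> int
3
>>> 3.5 // 2     # mixed -> coerced to float
1.0
>>> import math
>>> math.floor(3.5 / 2.0)   # math.floor returns an int in Python 3
1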

Counterintuitive behaviour of int() in python

It's clearly stated in the docs that int(number) is a flooring type conversion:
>>> int(1.23)
1
and int(string) returns an int if and only if the string is an integer literal.
>>> int('1.23')
ValueError
>>> int('1')
1
Is there any special reason for that? I find it counterintuitive that the function floors in one case, but not the other.
There is no special reason. Python is simply applying its general principle of not performing implicit conversions, which are well-known causes of problems, particularly for newcomers, in languages such as Perl and JavaScript.
int(some_string) is an explicit request to convert a string to integer format; the rules for this conversion specify that the string must contain a valid integer literal representation. int(float) is an explicit request to convert a float to an integer; the rules for this conversion specify that the float's fractional portion will be truncated.
In order for int("3.1459") to return 3 the interpreter would have to implicitly convert the string to a float. Since Python doesn't support implicit conversions, it chooses to raise an exception instead.
This is almost certainly a case of applying three of the principles from the Zen of Python:
Explicit is better than implicit.
[...] practicality beats purity
Errors should never pass silently
Some percentage of the time, someone doing int('1.23') is calling the wrong conversion for their use case, and wants something like float or decimal.Decimal instead. In these cases, it's clearly better for them to get an immediate error that they can fix, rather than silently giving the wrong value.
In the case that you do want to truncate that to an int, it is trivial to explicitly do so by passing it through float first, and then calling one of int, round, trunc, floor or ceil as appropriate. This also makes your code more self-documenting, guarding against a later modification "correcting" a hypothetical silently-truncating int call to float by making it clear that the rounded value is what you want.
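For instance, that explicit two-step conversion might look like this (a minimal sketch):

>>> from math import trunc, floor, ceil
>>> x = float('1.23')   # explicit string -> float conversion first
>>> int(x), trunc(x), floor(x), ceil(x), round(x)
(1, 1, 1, 2, 1)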
Sometimes a thought experiment can be useful.
Behavior A: int('1.23') fails with an error. This is the existing behavior.
Behavior B: int('1.23') produces 1 without error. This is what you're proposing.
With behavior A, it's straightforward and trivial to get the effect of behavior B: use int(float('1.23')) instead.
On the other hand, with behavior B, getting the effect of behavior A is significantly more complicated:
def parse_pure_int(s):
    if "." in s:
        raise ValueError("invalid literal for integer with base 10: " + s)
    return int(s)
(and even with the code above, I don't have complete confidence that there isn't some corner case that it mishandles.)
Behavior A therefore is more expressive than behavior B.
Another thing to consider: '1.23' is a string representation of a floating-point value. Converting '1.23' to an integer conceptually involves two conversions (string to float to integer), but int(1.23) and int('1') each involve only one conversion.
Edit:
And indeed, there are corner cases that the above code would not handle: 1e-2 and 1E-2 are both floating point values too.
In simple words - they're not the same function.
int(number) behaves as 'truncate, i.e. knock off the decimal portion and return as int'.
int(string) behaves as 'this text describes an integer, convert it and return as int'.
They are two different functions with the same name that return an integer, but they are different functions.
'int' is short and easy to remember, and its meaning applied to each type is intuitive to most programmers, which is why they chose it.
There's no implication that they provide the same or combined functionality; they simply have the same name and return the same type. They could as easily be called 'truncateDecimalAsInt' and 'convertStringToInt', but they went for 'int' because it's easy to remember, (99%) intuitive, and confusion would rarely occur.
Parsing text that includes a decimal point, such as "4.5", as an integer throws an error in the majority of computer languages, and the majority of programmers expect it to, since the text does not represent an integer and implies the caller is providing erroneous data.
