Integer overflow in Python3

I'm new to Python. I was reading this page, where I saw a weird statement:
if n+1 == n:  # catch a value like 1e300
    raise OverflowError("n too large")
n equals a number greater than itself?! I sense a disturbance in the Force.
I know that in Python 3, integers don't have a fixed byte length, so there's no integer overflow of the kind C's int has. But of course memory can't store infinite data.
I think that's why the result of n+1 can be the same as n: Python can't allocate more memory to perform the addition, so it is skipped, and n == n is true. Is that correct?
If so, this could lead to incorrect results. Why doesn't Python raise an error when an operation is not possible, like C++'s std::bad_alloc?
Even if n is not too large and the check evaluates to false, result (because of the multiplication) would need many more bytes. Could result *= factor fail for the same reason?
I found it in the official Python documentation. Is it really the correct way to check big integers for possible integer "overflow"?

Python3
Only floats have a hard limit in Python. Integers are implemented as “long” integer objects of arbitrary size in Python 3 and do not normally overflow.
You can test that behavior with the following code:
import sys
i = sys.maxsize
print(i)
# 9223372036854775807
print(i == i + 1)
# False
i += 1
print(i)
# 9223372036854775808
f = sys.float_info.max
print(f)
# 1.7976931348623157e+308
print(f == f + 1)
# True
f += 1
print(f)
# 1.7976931348623157e+308
You may also want to take a look at sys.float_info and sys.maxsize.
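Why does f == f + 1 hold for floats? Near a large value, adjacent floats are spaced far more than 1 apart, so adding 1 gets rounded away. The sketch below uses math.ulp, which returns the gap between a float and the next representable one and assumes Python 3.9 or newer. It prints that gap at a few magnitudes.

```python
import math
import sys

# math.ulp(x) is the gap between x and the next representable float (Python 3.9+).
for value in (1.0, 2.0**52, 2.0**53, 1e300, sys.float_info.max):
    gap = math.ulp(value)
    # While the gap is at most 1, value + 1 is a different float. At exactly 2**53 the
    # gap is 2 and the tie rounds back to value; once the gap is 4 or more, adding 1
    # moves less than half a gap and is always rounded back to value.
    print(f"{value!r}: gap to next float = {gap!r}, value + 1 == value -> {value + 1 == value}")
```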
Python2
In Python 2, integers are automatically converted to long integers when they grow too large, as described in the documentation for numeric types:
import sys
i = sys.maxsize
print type(i)
# <type 'int'>
i += 1
print type(i)
# <type 'long'>
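In Python 3 the int/long split is gone. A single int type simply grows as the value grows. The sketch below shows the type staying int past sys.maxsize and the object taking more memory as the value gets bigger. The sys.getsizeof byte counts are CPython-specific and vary by version and platform. If such an allocation ever fails, CPython raises MemoryError; it does not silently skip the operation.

```python
import sys

i = sys.maxsize
print(type(i), type(i + 1))  # <class 'int'> <class 'int'>: no separate long type

# The object grows with the magnitude of the value (sizes are CPython-specific).
for exponent in (1, 20, 100, 1000):
    value = 10 ** exponent
    print(exponent, value.bit_length(), sys.getsizeof(value))
```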
Could result *= factor fail for the same reason?
Why not try it?
import sys
i = 2
i *= sys.float_info.max
print i
# inf
Python has a special float value for infinity (and negative infinity too), as described in the docs for float.
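Floats and ints handle overflow differently, which also answers the result *= factor question for the integer case. In the sketch below, float multiplication overflows to inf, some float operations raise OverflowError instead, and integer multiplication just keeps growing. The specific values are arbitrary.

```python
import math
import sys

big = sys.float_info.max

# Float multiplication silently overflows to infinity (IEEE 754 behaviour).
print(big * 2)              # inf
print(math.isinf(big * 2))  # True

# Some float operations raise instead of returning inf.
try:
    2.0 ** 10000
except OverflowError as exc:
    print("float ** raised:", exc)

# Converting an int that is too large for a float raises as well.
try:
    float(10 ** 400)
except OverflowError as exc:
    print("float() raised:", exc)

# Integer multiplication never wraps or saturates; the result just gets bigger.
result = 1
for factor in range(2, 201):
    result *= factor
print(result == math.factorial(200), len(str(result)))
```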

Integers don't work that way in Python, but floats do. That is also why the comment says 1e300, which is a float literal in scientific notation.
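To see when the guard from the question actually fires, here is a small sketch. check_not_too_large is a hypothetical wrapper around that one line, not a function from the documentation. For floats the check first fires at 2.0**53, the smallest positive float for which x + 1 == x. For ints it never fires.

```python
def check_not_too_large(n):
    """Hypothetical helper wrapping the guard quoted in the question."""
    if n + 1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    return n


for value in (2.0**53 - 1, 2.0**53, 1e300, float("inf"), 2**53, 10**300):
    try:
        check_not_too_large(value)
        print(f"{value!r}: passes")
    except OverflowError:
        print(f"{value!r}: rejected")
```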

I had a problem with integer overflows in Python 3, but when I inspected the types, I understood the reason:
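import numpy as np

a = np.array([3095693933], dtype=np.int64)  # explicit 64-bit; dtype=int is platform-dependent
s = np.sum(a)
print(s)
# 3095693933
print(s * s)  # wraps around (newer NumPy versions may also emit an overflow RuntimeWarning)
# -8863423146896543127
print(type(s))
# <class 'numpy.int64'>
py_s = int(s)
print(py_s * py_s)
# 9583320926813008489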
Some pandas and NumPy functions, such as sum on arrays or Series, return an np.int64, so this might be the reason you are seeing int overflows in Python 3.
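NumPy's fixed-width integers wrap around modulo 2**64 instead of growing. The sketch below emulates that two's-complement wraparound with plain Python ints and reproduces the numbers above. It also adds a checked alternative that raises instead of wrapping. wrap_int64 and checked_mul_int64 are hypothetical helper names, not NumPy functions.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(x: int) -> int:
    """Reduce an arbitrary Python int to the signed 64-bit range, as int64 arithmetic does."""
    return (x + 2**63) % 2**64 - 2**63


def checked_mul_int64(a: int, b: int) -> int:
    """Multiply two values, raising instead of silently wrapping like int64."""
    product = a * b  # exact, because Python ints are arbitrary precision
    if not INT64_MIN <= product <= INT64_MAX:
        raise OverflowError(f"{a} * {b} does not fit in int64")
    return product


s = 3095693933
print(s * s)              # 9583320926813008489 (exact Python int)
print(wrap_int64(s * s))  # -8863423146896543127 (what np.int64 produced above)
```

In NumPy itself, converting the scalar with int(), as the answer does, is the simplest way to get back to arbitrary-precision arithmetic.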

Related

How are integers truncated by Python's hash() function? [duplicate]

I've been playing with Python's hash function. For small integers, it appears hash(n) == n always. However this does not extend to large numbers:
>>> hash(2**100) == 2**100
False
I'm not surprised, I understand hash takes a finite range of values. What is that range?
I tried using binary search to find the smallest number for which hash(n) != n:
>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:
binary_search(f, t)
Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.
>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0
What's special about 2305843009213693951? I note it's less than sys.maxsize == 9223372036854775807.
Edit: I'm using Python 3. I ran the same binary search on Python 2 and got a different result, 2147483648, which I note is sys.maxint+1.
I also played with [hash(random.random()) for i in range(10**6)] to estimate the range of the hash function. The max is consistently below the n above. Comparing the min, it seems Python 3's hash is always positive, whereas Python 2's hash can take negative values.
2305843009213693951 is 2^61 - 1. It's the largest Mersenne prime that fits into 64 bits.
If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice -- it's easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way)
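Taking the value mod a Mersenne prime is what CPython 3 does for ints. Below is a pure-Python sketch of that rule, checked against the built-in hash. int_hash is a hypothetical name. The modulus is read from sys.hash_info, so it is 2**61 - 1 on 64-bit builds and 2**31 - 1 on 32-bit ones.

```python
import sys


def int_hash(n: int) -> int:
    """Mirror CPython's hash() for ints: reduce |n| mod P, restore the sign, avoid -1."""
    p = sys.hash_info.modulus  # 2**61 - 1 on 64-bit CPython
    h = abs(n) % p
    if n < 0:
        h = -h
    return -2 if h == -1 else h  # -1 is reserved as an error code in the C API


for n in (0, 1, -1, 2**61 - 2, 2**61 - 1, 2**61, 2**100, -(2**100)):
    assert int_hash(n) == hash(n), n
print("int_hash matches hash() for all samples")
```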
It's especially convenient to compute the modulus for floating-point numbers. They have an exponent component that multiplies the significand by 2^x. Since 2^61 ≡ 1 (mod 2^61 - 1), you only need to consider the exponent mod 61.
See: https://en.wikipedia.org/wiki/Mersenne_prime
Based on the documentation in CPython's pyhash.c file:
For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types.
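The "numerically equal, same hash" guarantee quoted above is what lets 1.5, Fraction(3, 2) and Decimal("1.5") share a hash. The Python documentation's section on hashing numeric types states the rule for a rational m/n: m times the modular inverse of n, reduced mod P. The sketch below condenses that rule; hash_fraction is a name chosen for illustration. The test also checks the exponent property from the previous answer: scaling a float by 2 to the number of bits in P leaves its hash unchanged.

```python
import sys
from decimal import Decimal
from fractions import Fraction


def hash_fraction(m: int, n: int) -> int:
    """Hash of the rational m / n (n > 0) following the numeric-hash rule."""
    p = sys.hash_info.modulus
    # Strip common factors of P (not needed if m and n are already coprime).
    while m % p == 0 and n % p == 0:
        m, n = m // p, n // p
    if n % p == 0:
        h = sys.hash_info.inf  # n has no inverse mod P
    else:
        # Fermat's little theorem: pow(n, p - 2, p) is the inverse of n mod P.
        h = (abs(m) % p) * pow(n, p - 2, p) % p
    if m < 0:
        h = -h
    return -2 if h == -1 else h


print(hash(1.5) == hash(Fraction(3, 2)) == hash(Decimal("1.5")) == hash_fraction(3, 2))  # True
```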
So on a 64-bit or 32-bit machine the modulus is 2**_PyHASH_BITS - 1, but what is _PyHASH_BITS?
It is defined in the pyhash.h header file: 61 on 64-bit machines and 31 otherwise (pyconfig.h has more explanation).
#if SIZEOF_VOID_P >= 8
# define _PyHASH_BITS 61
#else
# define _PyHASH_BITS 31
#endif
So, first of all, it depends on your platform. For example, on my 64-bit Linux platform the modulus is 2**61 - 1, which is 2305843009213693951:
>>> 2**61 - 1
2305843009213693951
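In Python 2, you can also use math.frexp to get the mantissa and exponent of sys.maxint. On a 64-bit machine it shows that the maximum int is 2**63 - 1 (converting it to a float rounds it up to 0.5 * 2**64 = 2**63):
>>> import sys
>>> import math
>>> math.frexp(sys.maxint)
(0.5, 64)
And you can see the difference with a simple test. This is again Python 2, where 2**62 is still a plain int but 2**63 is a long. On 64-bit Python 3 the first comparison is False too, because hash(2**62) is 2**62 mod (2**61 - 1), which is 2: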
>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False
You can read the complete documentation of Python's hashing algorithm in pyhash.c: https://github.com/python/cpython/blob/master/Python/pyhash.c#L34
As mentioned in a comment, in Python 3 you can use sys.hash_info, which gives you a struct sequence of the parameters used for computing hashes.
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
Alongside the modulus described above, you can also get the inf value as follows:
>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159
The hash function returns a plain int, so the returned value lies between -sys.maxint - 1 and sys.maxint. This means that if you pass sys.maxint + x to it, the result is -sys.maxint + (x - 2).
hash(sys.maxint + 1) == sys.maxint + 1 # False
hash(sys.maxint + 1) == - sys.maxint -1 # True
hash(sys.maxint + sys.maxint) == -sys.maxint + sys.maxint - 2 # True
Meanwhile, 2**200 is n times greater than sys.maxint. My guess is that hash wraps around the range -sys.maxint..+sys.maxint n times until it lands on a plain integer in that range, as in the snippets above.
So generally, for any n <= sys.maxint:
hash(sys.maxint*n) == -sys.maxint*(n%2) + 2*(n%2)*sys.maxint - n/2 - (n + 1)%2 ## True
Note: this holds for Python 2.
The implementation for the int type in CPython can be found here.
It just returns the value, except for -1, for which it returns -2:
static long
int_hash(PyIntObject *v)
{
    /* XXX If this is changed, you also need to change the way
       Python's long, float and complex types are hashed. */
    long x = v->ob_ival;
    if (x == -1)
        x = -2;
    return x;
}

division of very large integers in Python using SageMath

I'm implementing Wiener's Exponent Attack using Python and SageMath.
My code is as follows:
from sage.all import *

# constants
b = some_very_large_number
n = some_very_large_number

b_over_n = continued_fraction(b/n)
i = 0  # loop counter (its initialisation was not shown in the original code)
while True:
    t_over_a = b_over_n.convergent(i+1)
    t = t_over_a.numerator()
    a = t_over_a.denominator()
    # check if t divides ab-1
    if (t != 0) and (gcd(a*b-1, t) == t):
        print("Found i: ", i)
        break
    i += 1
I found that the loop never ended, so I added this line of code before the while loop:
print(b_over_n.convergent(5))
And I found that b_over_n always returned 0, no matter what.
I also printed out type(b_over_n) and confirmed it was of type 'long'.
I have checked the SageMath manuals but haven't found anything useful yet.
Is there something I'm doing wrong here?
It turns out I was using Python 2, where int/int performs integer (floor) division and returns an int.
Since b was smaller than n in my case, b/n always evaluated to 0.
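In Python 3, b/n would give a float, which is no better for this attack. A float keeps only about 16 significant digits, so only the first few partial quotients of a huge b/n would be correct. Inside Sage, writing Integer(b)/Integer(n) should give an exact rational; outside Sage, Python's fractions.Fraction does the same job. The sketch below is pure Python, not Sage's API. The helper names and the small example values are hypothetical.

```python
from fractions import Fraction
from typing import Iterator, List


def continued_fraction_terms(x: Fraction) -> List[int]:
    """Partial quotients [a0; a1, a2, ...] of a non-negative rational x (Euclid's algorithm)."""
    terms = []
    num, den = x.numerator, x.denominator
    while den:
        q, r = divmod(num, den)
        terms.append(q)
        num, den = den, r
    return terms


def convergents(terms: List[int]) -> Iterator[Fraction]:
    """Yield convergents h_k / k_k via h_k = a_k*h_{k-1} + h_{k-2} (same recurrence for k_k)."""
    h_prev, h = 0, 1  # h_{-2}, h_{-1}
    k_prev, k = 1, 0  # k_{-2}, k_{-1}
    for a in terms:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)


b, n = 7, 11                # hypothetical small stand-ins for the real b and n
print(b // n)               # 0: what Python 2's b/n computed for ints
b_over_n = Fraction(b, n)   # exact rational, analogous to Integer(b)/Integer(n) in Sage
for t_over_a in convergents(continued_fraction_terms(b_over_n)):
    print(t_over_a.numerator, t_over_a.denominator)
```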

Should I use float literals to represent integer numbers as floats in Python?

Consider a function that returns half of a float argument. Which of these two versions is better?
def half_a(x: float) -> float:
    return x / 2

def half_b(x: float) -> float:
    return x / 2.0
Is there any performance difference or is there a style convention that would say one of these is better than the other?
Clearly, half_a looks better, and a more complex piece of code may be more readable written this way, but in some other languages it is either necessary or preferable to use the half_b version to avoid run-time type conversion.
It's hard to know if there's a performance difference (and if there is, it's definitely negligible). Regarding style, there is no common convention either. However, I would choose the first one if you are on Python 3+, because Python 3 has a separate operator for integer division. See below:
x = 2
print(type(x)) # int
print(type(x / 2)) # float
print(type(x // 2)) # int
On the other hand, if you are on Python 2, you should probably choose the second one, because if your argument happens to be an int, / performs integer division:
print(2/5) # 0
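In Python 3 both versions return the same float for ordinary inputs, since / always performs true division. There is one corner case, in line with the overflow theme of this page. int / int is computed exactly before rounding, whereas x / 2.0 first converts x to a float, which fails for ints beyond the float range. The quick check below shows both behaviours.

```python
def half_a(x: float) -> float:
    return x / 2


def half_b(x: float) -> float:
    return x / 2.0


for value in (5, 5.0, -7, 0.1, 10**20):
    assert half_a(value) == half_b(value)
    assert isinstance(half_a(value), float)

print(-7 // 2, -7 / 2)  # -4 -3.5: // floors, / is true division

try:
    half_b(2**1024)  # the int must be converted to float first, which overflows
except OverflowError as exc:
    print("half_b:", exc)
print("half_a:", half_a(2**1024))  # int / int is exact and the result fits in a float
```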
Dividing a float by a float is slightly faster than dividing a float by an int, at least in these timings:
>>> import timeit
>>> timeit.timeit('n/2', 'n=123456.789')
0.04134701284306175
>>> timeit.timeit('n/2.0', 'n=123456.789')
0.03455621766488548
>>> timeit.timeit('[n/2 for n in r]', 'r = [n*5/1.1 for n in range(1, 10001)]', number=10000)
5.177127423787169
>>> timeit.timeit('[n/2.0 for n in r]', 'r = [n*5/1.1 for n in range(1, 10001)]', number=10000)
4.067747102254316

Python memory error for integer and math

When I run the code below, I get a MemoryError:
import math
X = 600851475143
halfX = math.trunc(int(X / 2))
countFactors = 0
for i in range(halfX):
    if i >0 and X % i:
        countFactors += 1
print countFactors
I understand it's because of the math calculations here, but I don't know how to fix it.
I'm going to guess you're using Python 2.7 (or 2.x, at any rate).
If that's the case, you should use xrange instead of range.
In Python 3.x, range returns a lazy range object that uses only a few bytes of memory regardless of how large the range is. In Python 2.x, range always creates a list containing every number in the specified range, so calling range(some_large_number) can make you run out of memory in 2.x.
That is why Python 2.x has xrange, which creates a lazy object that behaves like range in 3.x.
Also, you can simplify your math somewhat. For example:
x = 600851475143
half_x = x // 2
count_factors = 0
for i in xrange(half_x):
    if i > 0 and x % i == 0:
        count_factors += 1
print count_factors
However, there are much more efficient ways to do this.
As a simple example, if the number is not divisible by two, no even number can divide it, so you only need to test odd candidates, cutting the number of tests in half. The same applies to 3, 5, and so on.
I'll leave it to you to figure out the generalization. It's a fun problem :)
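The sketch below combines the answer's odd-only idea with a second shortcut that is standard but not from the answer. Divisors come in pairs (i, x // i), so only candidates up to the integer square root need testing. count_divisors is a hypothetical name. It counts all divisors, including 1 and x itself, and uses math.isqrt (Python 3.8+).

```python
import math


def count_divisors(x: int) -> int:
    """Count all positive divisors of x (including 1 and x) by trial division up to isqrt(x)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    step = 2 if x % 2 else 1  # odd x: even candidates can never divide it
    count = 0
    for i in range(1, math.isqrt(x) + 1, step):
        if x % i == 0:
            # i and x // i form a divisor pair; count once if they coincide (perfect square)
            count += 1 if i * i == x else 2
    return count


# 600851475143 = 71 * 839 * 1471 * 6857, four distinct primes, so 2**4 = 16 divisors.
print(count_divisors(600851475143))  # 16
```

Subtracting 1 from the result excludes x itself. The loop runs about isqrt(x) / 2 times here instead of x / 2, so it finishes quickly even for this input.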
