possible Python integer overflow error - python

Below is my code for a maximum pairwise product problem, i.e. return the maximum product of two numbers in a list. My issue is that the computed product is incorrect when the numbers in the input list are large. I was under the impression that there would be no integer overflow error and that the resulting product would automatically become a long, but evidently that is not happening, as my multiplication is returning some kind of garbage.
#"python2"
# -*- coding: utf-8 -*-
"""
Created on Tue Jun 28 17:12:38 2016
#author: pvatsa
"""
import numpy as np
n = int(raw_input())
alist = np.random.randint(100000, size = n)
print alist
assert(len(alist) == n)
alist = sorted(alist)
print alist
max_no = max(alist)
second_largest_no = alist[n-2]
print max_no*second_largest_no
#print long(product)
#print type(product)

Using np.random.randint will create an array of 32 bits integers:
>>> alist = np.random.randint(100000, size = n)
>>> alist.dtype
dtype('int32')
Sorting it will preserve that type, creating a list of numpy.int32 objects instead of converting them back to Python (overflow-safe) integers:
>>> foo = sorted(alist)
>>> type(foo[-1])
<type 'numpy.int32'>
As such, the multiplication can overflow: you can solve it by either:
Casting your numbers back to Python numbers
Having an array of longs in the first place
The first case is simply a matter of converting values of interest:
>>> foo[-1] * foo[-2]
1386578402
>>> int(foo[-1]) * int(foo[-2])
9976512994L
The second can be done by calling randint with dtype=np.int64 (for numpy >= 1.11) or converting the array afterwards:
>>> llist = np.array(alist, dtype=np.int64)
>>> llist.sort()
>>> np.prod(llist[-2:])
9987503750
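For completeness, here is a rough sketch of the dtype route, assuming numpy >= 1.11 so that randint accepts a dtype argument (the values are random, so no product output is shown):
>>> alist = np.random.randint(100000, size=n, dtype=np.int64)
>>> alist.dtype
dtype('int64')
>>> alist.sort()
>>> int(alist[-1]) * int(alist[-2])  # plain Python ints can never overflow
Casting the operands back to int is redundant here (the product of two values below 100000 fits easily in an int64), but it makes the expression safe regardless of the array's dtype.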

Related

How to split bytes into bits [duplicate]

I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b:
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This converts the hexadecimal string you have to an integer, and that integer to a string in which each character is '0' or '1' depending on the corresponding bit value of the integer.
As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removeprefix('0b')
'11111111'
Note: using lstrip("0b") here would turn the integer 0 into an empty string, which is almost always not what you want.
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bits 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(that is: shift the byte 6 bits to the right, dropping those bits, then keep only the last two; 3 is the number with the lowest two bits set...)
These can easily be translated into simple CPU operations that are super fast.
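As a small sketch of the same idea applied to the original question (hex input, everything kept at the integer level; the variable names are just illustrative):
data = bytes.fromhex('ff0f')
bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
# bits == [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]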
Using Python format string syntax:
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
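For example (a quick illustration, not from the original answer), the same format spec scales to wider inputs just by changing the width:
>>> payload = bytes.fromhex("0fa1")
>>> "{:016b}".format(int(payload.hex(), 16))
'0000111110100001'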
I think the simplest would be to use numpy here. For example, you can read a file as bytes and then easily expand it to bits like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
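If the input is a hex string rather than a file, the same unpackbits approach still applies; a small sketch:
import numpy
Bytes = numpy.frombuffer(bytes.fromhex("ff0f"), dtype="uint8")
Bits = numpy.unpackbits(Bytes)
# array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1], dtype=uint8)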
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here's how to do it using format():
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
The 08b is important: it pads each converted byte with leading zeros so that it is always 8 bits wide. If you don't specify it, the format will just produce a variable bit length for each converted byte.
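A quick illustration of the difference (assuming Python 3, where iterating over bytes yields integers):
>>> ''.join(format(x, '08b') for x in b'\x01\x0f')
'0000000100001111'
>>> ''.join(format(x, 'b') for x in b'\x01\x0f')  # without the width, byte boundaries are lost
'11111'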
To binary:
bin(byte)[2:].zfill(8)
Use ord when reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format():
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc -
here's a snippet for that:
def bits_little_endian_from_bytes(s):
    return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
    return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
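These snippets assume a Python 2 str; a rough Python 3 adaptation (where iterating over bytes already yields integers, so ord and chr drop out) might look like this:
def bits_little_endian_from_bytes(b):
    return ''.join(bin(x)[2:].rjust(8, '0')[::-1] for x in b)

def bytes_from_bits_little_endian(s):
    return bytes(int(s[i:i + 8][::-1], 2) for i in range(0, len(s), 8))

# bits_little_endian_from_bytes(b'\x01') == '10000000'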
A one-line function to convert bytes (not a string) to a list of bits. There is no endianness issue when the source and target are both byte readers/writers; it only matters when the source and target are bit readers and bit writers.
def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:]) for X in b])]
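For example (a small usage sketch):
>>> byte2bin(b'\x0f')
[0, 0, 0, 0, 1, 1, 1, 1]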
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).
Now, given an integer (a representation already well encoded in the hardware), I was very surprised to find out that the string variants of the above solutions, using things like bin, turn out to be faster than numpy-based solutions for a single number, so I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions
Then using string logic:
def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get two's complement repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions
And finally, I added a third method where I precomputed a lookup table of the positions for a single byte and expanded it for larger item sizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)

def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:
    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]
    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)
    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)
    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)
    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)
    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs

Numpy square return wrong values for array

I have a data set where pairs of numbers are represented by 32 bits; each number is stored as two 8-bit unsigned integers.
I'm trying to get the real part into one array and the imaginary part into a second array.
I'm then trying to square each part, add them up and take the square root of the sum (i.e. take the magnitude).
When I try squaring the elements of each array using numpy.square, I get not only negative but also inaccurate values.
Any idea what's going on/what's wrong?
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
data = np.fromfile(r'C:\Users\Miaou\Desktop\RAW_DATA_000066_000008.bin', dtype="int16")
print(data)
Is = data[0::2]
Qs = data[1::2]
Is_square = np.square(Is, dtype='int16')
Qs_square = np.square(Qs, dtype='int16')
print('Is',Is)
print('Qs',Qs)
print('Is square',Is_square)
print('Qs square',Qs_square)
Output: Is [ 335 -720 8294 ... -3377 3878 6759]
Qs [-2735 4047 1274 ... -279 1319 4918]
Is square [-18847 -5888 -22364 ... 865 31140 5489]
Qs square [ 9121 -5791 -15324 ... 12305 -29711 3940]
You're experiencing integer overflow. The minimum value of the int16 (signed) type is -32768 and the maximum value is 32767. The reason is that you only have 16 bits (that's what int16 means). Note that 2^16 = 65536, but since the type is signed (negatives are allowed), you don't get the values 0 through 65535; instead you can imagine the range shifted down so that it is centered on 0 (i.e. -32768 to 32767).
Let's take your first element of the Is as an example:
>>> 335**2
112225
Note that 112225 > 32767. This means you'll get overflow. It just keeps wrapping around until it lands in the valid range:
>>> x = 112225
>>> x = x - 2**16
>>> x
46689 # Still not in the valid range. Repeat.
>>> x = x - 2**16
>>> x
-18847 # Yep, now we are between -32768 and 32767
The other answer here is not quite right as leaving off the dtype does not suffice:
>>> Is = np.array([335, -720, 8294, -3377, 3878, 6759]).astype('int16')
>>> Is
array([ 335, -720, 8294, -3377, 3878, 6759], dtype=int16)
>>> Is_sq_nodtype = np.square(Is)
>>> Is_sq_nodtype
array([-18847, -5888, -22364, 865, 31140, 5489], dtype=int16)
numpy ops keep the same dtype. You actually need to "up" the dtype to have more bits. int32 should probably do the trick for your values (you can also do int64 or float depending on how big your values are, here is a list of dtypes: https://docs.scipy.org/doc/numpy-1.10.0/user/basics.types.html)
Working example:
>>> Is = np.array([335, -720, 8294, -3377, 3878, 6759]).astype('int16')
>>> Is
array([ 335, -720, 8294, -3377, 3878, 6759], dtype=int16)
>>> Is_sq = np.square(Is, dtype='int32')
>>> Is_sq
array([ 112225, 518400, 68790436, 11404129, 15038884, 45684081], dtype=int32)
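And, as a sketch of the question's actual goal (the magnitude), square with a wider dtype, sum, then take the square root; np.sqrt promotes the result to float64, so nothing overflows along the way:
>>> Qs = np.array([-2735, 4047, 1274, -279, 1319, 4918]).astype('int16')
>>> magnitude = np.sqrt(np.square(Is, dtype='int64') + np.square(Qs, dtype='int64'))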
HTH.
Remove dtype='int16' from your np.square() calls.

Integer overflow in Python3

I'm new to Python, I was reading this page where I saw a weird statement:
if n+1 == n:  # catch a value like 1e300
    raise OverflowError("n too large")
n equals a number greater than itself?! I sense a disturbance in the Force.
I know that in Python 3, integers don't have a fixed byte length. Thus, there's no integer overflow the way C's int overflows. But of course memory can't store infinite data.
I think that's why the result of n+1 can be the same as n: Python can't allocate more memory to perform the addition, so it is skipped, and n == n is true. Is that correct?
If so, this could lead to incorrect program results. Why doesn't Python raise an error when an operation is not possible, just like C++'s std::bad_alloc?
Even if n is not too large and the check evaluates to false, the result (due to the multiplication) would need many more bytes. Could result *= factor fail for the same reason?
I found this in the official Python documentation. Is it really the correct way to check for big integers / a possible integer "overflow"?
Python3
Only floats have a hard limit in Python. Integers are implemented as "long" integer objects of arbitrary size in Python 3 and do not normally overflow.
You can test that behavior with the following code
import sys
i = sys.maxsize
print(i)
# 9223372036854775807
print(i == i + 1)
# False
i += 1
print(i)
# 9223372036854775808
f = sys.float_info.max
print(f)
# 1.7976931348623157e+308
print(f == f + 1)
# True
f += 1
print(f)
# 1.7976931348623157e+308
You may also want to take a look at sys.float_info and sys.maxsize
Python2
In Python 2, integers are automatically cast to long integers when they become too large, as described in the documentation for numeric types:
import sys
i = sys.maxsize
print type(i)
# <type 'int'>
i += 1
print type(i)
# <type 'long'>
Could result *= factor fail for the same reason?
Why not try it?
import sys
i = 2
i *= sys.float_info.max
print i
# inf
Python has a special float value for infinity (and negative infinity too) as described in the docs for float
Integers don't work that way in Python, but floats do.
That is also why the comment says 1e300, which is a float in scientific notation.
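A minimal demonstration of that difference (just illustrating the point):
>>> n = 1e300
>>> n + 1 == n  # the 1 is lost to float rounding at this magnitude
True
>>> m = 10 ** 300  # an integer of comparable size
>>> m + 1 == m  # Python ints have arbitrary precision
False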
I had a problem with integer overflows in Python 3, but when I inspected the types, I understood the reason:
import numpy as np
a = np.array([3095693933], dtype=int)
s = np.sum(a)
print(s)
# 3095693933
print(s * s)
# -8863423146896543127
print(type(s))
# numpy.int64
py_s = int(s)
print(py_s * py_s)
# 9583320926813008489
Some pandas and numpy functions, such as sum on arrays or Series, return an np.int64, so this might be the reason you are seeing int overflows in Python 3.
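If you need the result to stay overflow-safe, one option (a sketch of the idea, not the only way) is to convert back to a Python int before multiplying, or to ask numpy to accumulate Python objects:
s = int(np.sum(a))  # plain Python int, arbitrary precision
print(s * s)
# 9583320926813008489
s_obj = np.sum(a, dtype=object)  # keeps Python ints throughout (slower)
print(s_obj * s_obj)
# 9583320926813008489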

How do I check if int in Python?

In the question we're asked to remove all even numbers from an array, hence I tried to create a function:
import numpy as np
A = np.array([2,3,4,5])
def remove_even(A):
    if ((A[0])/2) != int:  # check if the first value is an integer when divided by 2
        A = A[0:len(A)+1:2]
        return A
    else:
        A = A[1:len(A)+1:2]
However, regardless of whether my array starts with an even number (e.g. 2) or an odd number (e.g. 1), execution only ever reaches the if branch and never the else.
What am I missing? I would appreciate any feedback!
In numpy you can just use a boolean mask:
A[(A % 2).astype(bool)]
returns
array([3, 5])
You can do the below if you need a solution without numpy.
l = [1,2,3,4,5,6,7,8]
a = [i for i in l if i % 2]
##print(a) output
##[1, 3, 5, 7]
Your code never reaches the else clause because the if test is always true.
No number is equal to int, because int is a class; != is an inequality test, not a type check.
In any case, 4/2 is not of type int because the / operator gives a float result: the answer is 2.0. That means type(A[0]/2) will always be float irrespective of the value of A[0], so testing the result of the division against int, even if done correctly, won't do what you want.
Do this instead:
if not (A[0] % 2):
This will be true if A[0] is an even number, whether it is an integer or a float.
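Putting it together, a possible corrected version of the question's function, reusing the boolean-mask idea from the first answer (a sketch, not the only way):
import numpy as np

def remove_even(A):
    # keep only the entries with a non-zero remainder, i.e. the odd ones
    return A[(A % 2).astype(bool)]

print(remove_even(np.array([2, 3, 4, 5])))
# [3 5]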

how to divide integer and take some part

I have a variable holding a number of x digits, and at run time I do not know x. I just want to divide this value into two parts. For example:
the variable holds a = 01029108219821082904444333322221111
I just want to take the last 16 digits as a new number, like
b = 0 # initialization
b = doSomeOp (a)
b = 4444333322221111 # new value of b
How can I divide the integer?
>>> a = 1029108219821082904444333322221111
>>> a % 10**16
4444333322221111
or, using string manipulation:
>>> int(str(a)[-16:])
4444333322221111
If you don't know the "length" of the number in advance, you can calculate it:
>>> import math
>>> a % 10 ** int(math.log10(a)/2)
4444333322221111
>>> int(str(a)[-int(math.log10(a)/2):])
4444333322221111
And, of course, for the "other half" of the number, it's
>>> a // 10 ** int(math.log10(a)/2) # Use a single / with Python 2
102910821982108290
EDIT:
If your actual question is "How can I divide a string in half", then it's
>>> a = "\x00*\x10\x01\x00\x13\xa2\x00#J\xfd\x15\xff\xfe\x00\x000013A200402D5DF9"
>>> half = len(a)//2
>>> front, back = a[:half], a[half:]
>>> front
'\x00*\x10\x01\x00\x13¢\x00#Jý\x15ÿþ\x00\x00'
>>> back
'0013A200402D5DF9'
One way to do this is (using // so the index stays an integer on Python 3 as well):
b = int(str(a)[len(str(a))//2:])
I would just exploit slicing here: cast the number to a string, take a slice, and convert it back to a number.
b = int(str(a)[-16:])
