int.to_bytes length calculation - python

Probably a stupid question, but I stumbled across the int.to_bytes() function and I can't explain why I have to add "+7" when calculating the byte length, and there is no hint in the documentation. I am sure I'm missing some bit/byte calculation magic, but I need a little help from the community here.
Example of what confused me: x.to_bytes((x.bit_length() + 7) // 8, byteorder='little') from https://docs.python.org/3/library/stdtypes.html#int.to_bytes
Thanks in advance

bit_length() returns the number of bits necessary to represent an integer in binary, excluding the sign and leading zeros. Eight bits make a byte, but the integer division // 8 on its own would round down; adding 7 first turns it into a ceiling division. So
(x.bit_length() + 7) // 8
gives you the number of whole bytes necessary to represent the integer x, rounding any partial byte up. You could also write something like
from math import ceil
ceil(x.bit_length() / 8)
to get the same number.
The method to_bytes() requires this byte length as its first argument. Note that x == 0 has a bit_length() of 0, so to account for that case you probably want to include it too:
x.to_bytes(length=(max(x.bit_length(), 1) + 7) // 8, byteorder='little')
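To see the rounding in action, here is a tiny sketch (the sample values are arbitrary; x == 0 is forced to one byte so it yields b'\x00' instead of b''):
for x in (0, 1, 255, 256, 65535, 65536):
    nbytes = (max(x.bit_length(), 1) + 7) // 8
    print(x, nbytes, x.to_bytes(nbytes, byteorder='little'))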

Related

Reverse decode function of c-struct in Python

I am using this function, found on GitHub, to read some data from a HID stream (h.read(64)) in Python:
import struct

def decode_bytes(byte_1, byte_2, byte_3, byte_4):
    bytes_reversed_and_concatenated = byte_4 * (16 ** 6) + byte_3 * (16 ** 4) + byte_2 * (16 ** 2) + byte_1
    bytes_hex = hex(bytes_reversed_and_concatenated)[2:]
    bytes_decimal = str(round(struct.unpack('!f', bytes.fromhex(bytes_hex))[0], 1))
    return bytes_decimal
The function converts four bytes (hex values passed as integers) from the stream to a Python float, which is returned as a string. I've read that a C-struct float representation takes up four bytes, so I guess that explains why the function takes four bytes as input. But apart from that, I'm pretty blank as to how and why the function works.
I have two questions:
First, I would very much like to get a better understanding of how the function works. Why does it reverse the byte order, and what is up with the 16 ** 6, 16 ** 4 and so on? I am having a hard time figuring out what that does in Python.
Second, I would like to reverse the function, meaning I would like to be able to supply a float as input and get out a list of four integer hex values which I can write back via the HID interface. But I have no idea where to start.
I was hoping to get some pointers in the right direction. Any help is much appreciated.
So the comment from @user2357112 helped me figure everything out. The working and much simpler function now looks like this:
def decode_bytes(byte_1, byte_2, byte_3, byte_4):
    return_value = struct.unpack('<f', bytes([byte_1, byte_2, byte_3, byte_4]))
    return str(round(return_value[0], 1))
And if I want to wrap a float back up as a bytes array I do this:
struct.pack('<f', float(float_value))
Also, I learned a bit about endianness along the way. Thanks.
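For anyone else puzzled by the original weights: multiplying by 16 ** 2 (= 256) is the same as shifting left by one byte, so the old function was reassembling the little-endian bytes by hand before re-reading them as big-endian ('!f'). A minimal round-trip sketch (the value 1.5 is just an illustration; it happens to be exactly representable as a C float):
import struct

value = 1.5
packed = struct.pack('<f', value)    # 4 bytes, little-endian C float
byte_list = list(packed)             # [0, 0, 192, 63] -- ready to write to the device
restored = struct.unpack('<f', bytes(byte_list))[0]
assert restored == value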

Python: Solution for high int-precision needed (generate primes)

At the moment I am trying to implement a generate_random_prime() function following the algorithm shown in FIPS 186-4 from NIST (Appendix B.3.2.1), see here.
But there seems to be a big problem with step 4.4 (if p < sqrt(2)*(2**((nlen/2)-1))) because of the precision in Python.
To show the relevant part and the problem of my code, see this example:
import os
from decimal import Decimal
import math

for i in range(100):
    nlen = 2048  # my key size should be 2048 bits
    p = int.from_bytes(os.urandom(int(2048/2/8)), byteorder="little")  # see Ann1 and Ann2
    print(p < Decimal(math.sqrt(2)) * (Decimal(2**(int(2048/2))) - 1))
Ann1: 2048/2/8 because the length is given in bytes
Ann2: I know that os.urandom is not the best generator; I will later use an approved one. For the testing phase it should be acceptable, I think.
The result is always "True", so the algorithm will never leave step 4.4.
I think the problem is Decimal(math.sqrt(2))*(Decimal(2**(int(2048/2))) - 1), because the result of this is Decimal('2.542322012307292741109308792E+308'). Converted to int via int(Decimal(math.sqrt(2))*(Decimal(2**(int(2048/2))) - 1)), the result will be
254232201230729274110930879200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
It is rounded up! Is this the reason for the always-True result? I think in this case it will never be possible to get a p less than Decimal(math.sqrt(2))*(Decimal(2**(int(2048/2))) - 1).
How can I solve this problem?
edit: found a mistake:
Decimal(math.sqrt(2))*(Decimal(2**(int(2048/2))) - 1) should be Decimal(math.sqrt(2))*(Decimal(2**(int(2048/2-1)))), so the result of this should be Decimal('1.271161006153646370554654396E+308') instead of Decimal('2.542322012307292741109308791E+308')
You are constantly converting between floats, integers and Decimal. Drop all use of float; this includes not using functions that produce float values, such as math.sqrt().
Stick to Decimal objects instead, and only convert the final value to an integer:
int(Decimal(2).sqrt() * 2 ** ((nlen // 2) - 1))
Note the use of //, to use integer division, not true division (producing floats again).
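One caveat: Decimal arithmetic obeys the context precision, which defaults to 28 significant digits, while this bound has roughly 308 digits for nlen == 2048, so raise the precision before taking the square root. A minimal sketch:
from decimal import Decimal, getcontext

nlen = 2048
getcontext().prec = nlen  # generous; the bound itself has ~308 digits
bound = int(Decimal(2).sqrt() * 2 ** ((nlen // 2) - 1))
On Python 3.8+ you can avoid Decimal entirely: math.isqrt(2 ** (nlen - 1)) is exactly floor(sqrt(2) * 2 ** (nlen // 2 - 1)) in pure integer arithmetic.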

Python unpack binary data, numeric of length 12

I have a file with big-endian binary data. There are two numeric fields: the first has length 8 and the second length 12. How can I unpack the two numbers?
I am using the Python module struct (https://docs.python.org/2/library/struct.html) and it works for the first field
num1 = struct.unpack('>Q',payload[0:8])
but I don't know how I can unpack the second number. If I treat it as char(12), then I get something like '\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'.
Thanks.
I think you should create a new byte string of length 16 for the second number: fill the last 12 bytes with the byte string that holds your number, and the first 4 with zeros.
Then decode the byte string with unpack using the format >QQ, say into numHI and numLO variables. You then get the final number with: number = numHI * 2**64 + numLO. AFAIR the integers in Python can be (almost) as large as you wish, so you will have no problems with overflows. That's only a rough idea; please comment if you have problems writing the actual Python code, and I'll edit my answer to provide more help.
Note that the power must be computed in integer arithmetic: use Python's ** operator, not math.pow, which returns a float and loses precision at this magnitude. Alternatively, you can use a byte shift: number = (numHI << 64) + numLO (the parentheses matter, because + binds more tightly than <<).
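A minimal sketch of the padding approach (the 20-byte payload here is made up for illustration, with the 12-byte field assumed to start right after the first 8 bytes):
import struct

payload = bytes(range(20))  # hypothetical record: 8-byte field, then 12-byte field

num1 = struct.unpack('>Q', payload[0:8])[0]

# Pad the 12-byte field to 16 bytes so it splits into two unsigned 64-bit halves.
num_hi, num_lo = struct.unpack('>QQ', b'\x00' * 4 + payload[8:20])
num2 = (num_hi << 64) + num_lo

# On Python 3, int.from_bytes does the same in one step.
assert num2 == int.from_bytes(payload[8:20], byteorder='big')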

How to fix an int overflow?

I have Python code that uses the number 2637268776 (bigger than sys.maxint on 32-bit systems), so it is stored as a long.
I'm using a C++ framework's bindings in my code, so there is a case where the value gets converted into an int32, resulting in an int32 overflow:
2637268776 --> -1657698520
In my case this can happen only once, so it's safe to assume that if the integer is negative, we had a single int overflow. How can I mathematically reverse the numbers?
In short, you can't. There are many long integers that would map to the same negative number. In your example, these are 2637268776L, 6932236072L, 11227203368L, 15522170664L, 19817137960L etc.
Also, it is possible to get a positive number as a result of such an overflow. For example, 4294967297L would map to 1.
You could add 2 * (sys.maxint + 1) to it:
>>> -1657698520 + (2 * (sys.maxint + 1))
2637268776L
but that only works for original values < 2 * (sys.maxint + 1), as beyond that the overflow will run into positive numbers, or worse, overflow again.
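A minimal sketch of that recovery, valid only under the question's single-overflow assumption:
def undo_single_overflow(n):
    # One 32-bit wraparound subtracted 2**32; add it back for negative results.
    if n < 0:
        n += 1 << 32  # same as 2 * (sys.maxint + 1) on a 32-bit build
    return n

print(undo_single_overflow(-1657698520))  # prints 2637268776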

Python, len, and size of ints

So, CPython (2.4) has some interesting behaviour when the length of something gets near 1<<32 (the limit of a 32-bit int).
r = xrange(1<<30)
assert len(r) == 1<<30
is fine, but:
r = xrange(1<<32)
assert len(r) == 1<<32
ValueError: xrange object size cannot be reported (__len__() should return 0 <= outcome)
Alex's wowrange has this behaviour as well: wowrange(1<<32).l is fine, but len(wowrange(1<<32)) is bad. I'm guessing there is some floating-point behaviour (being read as negative) going on here.
What exactly is happening here? (this is pretty well-solved below!)
How can I get around it? Longs?
(My specific application is random.sample(xrange(1<<32),ABUNCH)) if people want to tackle that question directly!)
CPython assumes that lists fit in memory. This extends to objects that behave like lists, such as xrange. Essentially, the len function expects the __len__ method to return something that is convertible to size_t, which won't happen if the number of logical elements is too large, even if those elements don't actually exist in memory.
You'll find that
xrange((1 << 31) - 1)
is the last one that behaves as you want. This is because the maximum signed (32-bit) integer is 2^31 - 1. (Note the parentheses: without them, 1 << 31 - 1 parses as 1 << 30, since - binds more tightly than <<.)
1 << 32 is not a positive signed 32-bit integer (Python's int datatype), so that's why you're getting that error.
In Python 2.6, I can't even do xrange(1 << 32) or xrange(1 << 31) without getting an error, much less len on the result.
Edit: If you want a little more detail...
1 << 31 represents the number 0x80000000 which in 2's complement representation is the lowest representable negative number (-1 * 2^31) for a 32-bit int. So yes, due to the bit-wise representation of the numbers you're working with, it's actually becoming negative.
For a 32-bit 2's complement number, 0x7FFFFFFF is the highest representable integer (2^31 - 1) before you "overflow" into negative numbers.
Further reading, if you're interested.
Note that when you see something like 2147483648L in the prompt, the "L" at the end signifies that it's now being represented as a "long integer" (Python's arbitrary-precision integer type, which is no longer tied to 32 or 64 bits).
1<<31, when treated as a signed 32-bit integer, is negative; 1<<32 doesn't fit in 32 bits at all (its low 32 bits are zero).
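A small sketch of that two's-complement truncation (emulating what happens when a large Python integer is squeezed into a signed 32-bit slot):
def as_signed32(n):
    # Keep the low 32 bits, then reinterpret the sign bit.
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & (1 << 31) else n

print(as_signed32(1 << 30))  # 1073741824 -- still fits
print(as_signed32(1 << 31))  # -2147483648 -- sign bit set, so negative
print(as_signed32(1 << 32))  # 0 -- the bit is shifted out entirely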
