Python representation of negative integers

>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100
>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & mask))
4294967292 11111111111111111111111111111100
>>>
>>> x = 0b11111111111111111111111111111100
>>> print("{} {:b}".format(x, x))
4294967292 11111111111111111111111111111100
>>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
-4 -100
I am having trouble figuring out how Python represents negative integers, and therefore how bit operations work. It is my understanding that Python attempts to emulate two's complement, but with any number of bits. Therefore, it is common to use 32-bit masks to force Python to set a standard size on integers before bit operations.
As you can see in my example, -4 & 0xFFFFFFFF yields a large positive number. Why does Python seem to read this as an unsigned integer instead of a two's complement negative number? Later, the operation ~(x ^ mask), which should yield the exact same two's complement bit pattern as the large positive number, instead gives -4. What causes the conversion to a signed int?
Thanks!

TL;DR: CPython's integer type stores the sign in a dedicated field of a structure. When performing a bitwise operation, CPython replaces negative numbers by their two's complement, and sometimes (!) performs the reverse operation (i.e. replaces the two's complement by a negative number).
Bitwise operations
The internal representation of an integer is a PyLongObject struct, which contains a PyVarObject struct. (When CPython creates a new PyLong object, it allocates the memory for the structure plus a trailing space for the digits.) What matters here is that a PyLong is sized: the ob_size field of the embedded PyVarObject struct holds the size of the integer, in digits (a digit is either 15 or 30 bits wide).
If the integer is negative, ob_size is the negation of the number of digits.
(References: https://github.com/python/cpython/blob/master/Include/object.h and https://github.com/python/cpython/blob/master/Include/longobject.h)
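As a quick check from Python itself, sys.int_info reports the digit width a given CPython build uses (a minimal illustration; the exact fields and values depend on your build):
import sys
print(sys.int_info)           # e.g. bits_per_digit=30, sizeof_digit=4
print((2**100).bit_length())  # 101 bits -> 4 digits on a 30-bit-digit build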
As you can see, CPython's internal representation of an integer is really far from the usual binary representation. Yet CPython has to provide bitwise operations for various purposes. Let's take a look at the comments in the code:
static PyObject *
long_bitwise(PyLongObject *a,
             char op,  /* '&', '|', '^' */
             PyLongObject *b)
{
    /* Bitwise operations for negative numbers operate as though
       on a two's complement representation.  So convert arguments
       from sign-magnitude to two's complement, and convert the
       result back to sign-magnitude at the end. */
    /* If a is negative, replace it by its two's complement. */
    /* Same for b. */
    /* Complement result if negative. */
}
To handle negative integers in bitwise operations, CPython uses the two's complement (actually a two's complement digit by digit, but I won't go into the details). But note the "Sign Rule" (the name is mine): the sign of the result is the bitwise operator applied to the signs of the operands. More precisely, the result is negative if nega <op> negb == 1 (where negx is 1 if x is negative, 0 otherwise). Simplified code:
switch (op) {
    case '^': negz = nega ^ negb; break;
    case '&': negz = nega & negb; break;
    case '|': negz = nega | negb; break;
    default: ...
}
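You can watch the "Sign Rule" at work from pure Python (my own check, not CPython code): a result of &, | or ^ is negative exactly when the same operator applied to the operand signs yields 1:
for a, b in ((-4, 15), (-4, -15), (4, 15)):
    for op, f in (('&', lambda x, y: x & y),
                  ('|', lambda x, y: x | y),
                  ('^', lambda x, y: x ^ y)):
        result = f(a, b)
        predicted = f(a < 0, b < 0)  # booleans behave as 0/1 here
        assert (result < 0) == bool(predicted)
        print(a, op, b, '=', result)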
Binary formatting
On the other hand, the formatter does not apply the two's complement, even for the binary representation: format_long_internal calls long_format_binary and removes the two leading characters, but keeps the sign. See the code:
/* Is a sign character present in the output? If so, remember it
   and skip it */
if (PyUnicode_READ_CHAR(tmp, inumeric_chars) == '-') {
    sign_char = '-';
    ++prefix;
    ++leading_chars_to_skip;
}
The long_format_binary function does not perform any two's complement either: it just outputs the number in base 2, preceded by the sign:
if (negative) \
    *--p = '-'; \
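You can verify this from Python (my example): formatting a negative integer is sign-magnitude in every base, never two's complement:
for spec in ('b', 'o', 'd', 'x'):
    print(format(-4, spec))  # -100, -4, -4, -4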
Your question
I will follow your REPL sequence:
>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100
Nothing surprising here, given that formatting applies no two's complement, only a sign followed by the magnitude in binary.
>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & mask))
4294967292 11111111111111111111111111111100
The number -4 is negative, hence it is replaced by its two's complement (digit by digit) before the bitwise AND. You expected the result to be turned back into a negative number, but remember the "Sign Rule":
>>> nega=1; negb=0
>>> nega & negb
0
Hence: 1. the result does not get the negative sign; 2. the result is not converted back from two's complement. Your result complies with the "Sign Rule", even if that rule doesn't seem very intuitive.
Now, the last part:
>>> x = 0b11111111111111111111111111111100
>>> print("{} {:b}".format(x, x))
4294967292 11111111111111111111111111111100
>>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
-4 -100
This time x is the positive number 4294967292, whose bit pattern 0b11111111111111111111111111111100 happens to be the 32-bit two's complement of -4. Both x and mask are positive, so the XOR is computed directly and the "Sign Rule" gives a positive result:
>>> nega=0; negb=0
>>> nega ^ negb
0
Hence x ^ mask is 0b11 (3). The unary complement ~ then flips the sign by definition (~n is computed as -(n+1)), so ~3 gives -4: the result is converted back to sign-magnitude and gets the negative sign, as you expected.
Conclusion: I guess there was no perfect solution for providing bitwise operations on arbitrarily long signed numbers, but the documentation is not really verbose about the choices that were made.

Related

Understanding binary addition in python using bit manipulation

I am looking at solutions for this question:
Given two integers a and b, return the sum of the two integers without using the operators + and -. (Input Limits: -1000 <= a, b <= 1000)
In all these solutions, I am struggling to understand why the solutions do ~(a ^ mask) when a exceeds the 32-bit signed maximum 0x7FFFFFFF while evaluating a + b [see code below].
def getSum(self, a: int, b: int) -> int:
    # 32-bit mask
    mask = 0xFFFFFFFF  # 8 Fs = all 1s for 32 bits
    while True:
        # Handle addition and carry
        a, b = (a ^ b) & mask, ((a & b) << 1) & mask
        if b == 0:
            break
    max_int = 0x7FFFFFFF
    print("A:", a)
    print("Bin A:", bin(a))
    print("Bin M:", bin(mask))
    print(" A^M:", bin(a ^ mask))
    print("~ A^M:", bin(~(a ^ mask)))
    print(" ~ A:", bin(~a))
    return a if a < max_int else ~(a ^ mask)
I don't get why we need to mask a again when returning the answer. When exiting the loop it was already masked: a = (a ^ b) & mask. So why can't we just do ~a if the 32nd bit of a is set to 1?
I looked at The meaning of Bit-wise NOT in Python to understand the ~ operation, but did not get it.
Output for a = -12, b = -8. Correctly returns -20:
A: 4294967276
Bin A: 0b11111111111111111111111111101100
Bin M: 0b11111111111111111111111111111111
A^M: 0b10011
~ A^M: -0b10100
~ A: -0b11111111111111111111111111101101
Edit:
Here's a helpful link - Differentiating a 2's complement negative number and the corresponding positive number.
Because of the bounds, we will never have a positive-number overflow (the inputs are only in [-1000, 1000]), so we know that when the 32nd bit is set, it is because the value is negative.
So we do ~(a ^ mask) to keep the value of a in the first 32 bits and prepend infinitely many leading 1s, so that Python treats it as a negative number.
If you look at Bin A, it is already the 32-bit two's complement representation of -20, but right now Python treats it as a positive number. So we need to tell Python to treat it as a negative number, using ~(a ^ mask).
You forgot to specify the constraints of the original problem: a and b are in [-1000, 1000]. That code is a port of a C or Java implementation, which assumes a and b are 4-byte signed integers. I'm also not sure you are reading the (a ^ b) and (a & b) << 1 parts correctly: the former adds the i-th bits of a and b while ignoring the carry at every position; the latter collects all of those ignored carry bits.
To the main point: the last operation, ~(a ^ mask), deals with a negative result. a ^ mask inverts the lower 32 bits of a and preserves the upper bits, so its bitwise NOT, ~(a ^ mask), preserves the lower 32 bits of a and inverts the upper bits. Because the bits of a above the lower 32 are all zero, ~(a ^ mask) simply sets all the upper bits to one. It is equivalent to a | ~mask, which is much more readable.
See the following example.
print(~(0xffffffff ^ 0xffffffff)) # This will output -1.
print(~(0xfffffffe ^ 0xffffffff)) # This will output -2.
print(~0xffffffff) # This will output -4294967296(-0x100000000).
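To convince yourself of that equivalence, here is a quick check (my sketch, reusing Bin A from the question as the first value):
mask = 0xFFFFFFFF
for a in (0xFFFFFFEC, 0xFFFFFFFF, 0x00000003):
    assert ~(a ^ mask) == a | ~mask
    print(hex(a), '->', ~(a ^ mask))  # -20, -1, -4294967293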

Simulating a C cast in Python [duplicate]

Let's say I have this number i = -6884376.
How do I refer to it as to an unsigned variable?
Something like (unsigned long)i in C.
Assuming:
You have 2's-complement representations in mind; and,
By (unsigned long) you mean unsigned 32-bit integer,
then you just need to add 2**32 (or 1 << 32) to the negative value.
For example, apply this to -1:
>>> -1
-1
>>> _ + 2**32
4294967295L
>>> bin(_)
'0b11111111111111111111111111111111'
Assumption #1 means you want -1 to be viewed as a solid string of 1 bits, and assumption #2 means you want 32 of them.
Nobody but you can say what your hidden assumptions are, though. If, for example, you have 1's-complement representations in mind, then you need to apply the ~ prefix operator instead. Python integers work hard to give the illusion of using an infinitely wide 2's complement representation (like regular 2's complement, but with an infinite number of "sign bits").
And to duplicate what the platform C compiler does, you can use the ctypes module:
>>> import ctypes
>>> ctypes.c_ulong(-1) # stuff Python's -1 into a C unsigned long
c_ulong(4294967295L)
>>> _.value
4294967295L
C's unsigned long happens to be 4 bytes on the box that ran this sample.
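If you want a fixed width regardless of the platform's unsigned long size, ctypes also offers explicitly sized types (my addition; output shown for Python 3):
>>> import ctypes
>>> ctypes.c_uint32(-1).value
4294967295
>>> ctypes.c_uint64(-1).value
18446744073709551615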
To get the value equivalent to your C cast, just bitwise-AND with the appropriate mask, e.g. if unsigned long is 32-bit:
>>> i = -6884376
>>> i & 0xffffffff
4288082920
or if it is 64 bit:
>>> i & 0xffffffffffffffff
18446744073702667240
Do be aware though that although that gives you the value you would have in C, it is still a signed value, so any subsequent calculations may give a negative result and you'll have to continue to apply the mask to simulate a 32 or 64 bit calculation.
This works because although Python looks like it stores all numbers as sign and magnitude, the bitwise operations are defined as working on two's complement values. C stores integers in twos complement but with a fixed number of bits. Python bitwise operators act on twos complement values but as though they had an infinite number of bits: for positive numbers they extend leftwards to infinity with zeros, but negative numbers extend left with ones. The & operator will change that leftward string of ones into zeros and leave you with just the bits that would have fit into the C value.
Displaying the values in hex may make this clearer (and I rewrote the string of f's as an expression to show that we are interested in either 32 or 64 bits):
>>> hex(i)
'-0x690c18'
>>> hex(i & ((1 << 32) - 1))
'0xff96f3e8'
>>> hex(i & ((1 << 64) - 1))
'0xffffffffff96f3e8L'
For a 32 bit value in C, positive numbers go up to 2147483647 (0x7fffffff), and negative numbers have the top bit set going from -1 (0xffffffff) down to -2147483648 (0x80000000). For values that fit entirely in the mask, we can reverse the process in Python by using a smaller mask to remove the sign bit and then subtracting the sign bit:
>>> u = i & ((1 << 32) - 1)
>>> (u & ((1 << 31) - 1)) - (u & (1 << 31))
-6884376
Or for the 64 bit version:
>>> u = 18446744073702667240
>>> (u & ((1 << 63) - 1)) - (u & (1 << 63))
-6884376
This inverse process will leave the value unchanged if the sign bit is 0, but obviously it isn't a true inverse because if you started with a value that wouldn't fit within the mask size then those bits are gone.
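The masking and sign-bit arithmetic above generalizes to any width. A small sketch (my own helper names, not part of the original answer):
def to_unsigned(value, bits=32):
    # keep only the low `bits` bits, like a C cast to an unsigned type
    return value & ((1 << bits) - 1)

def to_signed(u, bits=32):
    # reverse to_unsigned for values that fit in `bits` bits
    sign = 1 << (bits - 1)
    return (u & (sign - 1)) - (u & sign)

assert to_unsigned(-6884376) == 4288082920
assert to_signed(to_unsigned(-6884376)) == -6884376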
Python doesn't have builtin unsigned types. You can use mathematical operations to compute a new int representing the value you would get in C, but there is no "unsigned value" of a Python int. The Python int is an abstraction of an integer value, not a direct access to a fixed-byte-size integer.
Since Python 3.2:
def unsignedToSigned(n, byte_count):
    return int.from_bytes(n.to_bytes(byte_count, 'little', signed=False), 'little', signed=True)

def signedToUnsigned(n, byte_count):
    return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
Output:
In [3]: unsignedToSigned(5, 1)
Out[3]: 5
In [4]: signedToUnsigned(5, 1)
Out[4]: 5
In [5]: unsignedToSigned(0xFF, 1)
Out[5]: -1
In [6]: signedToUnsigned(0xFF, 1)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 signedToUnsigned(0xFF, 1)
Input In [1], in signedToUnsigned(n, byte_count)
4 def signedToUnsigned(n, byte_count):
----> 5 return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
OverflowError: int too big to convert
In [7]: signedToUnsigned(-1, 1)
Out[7]: 255
Explanation: to_bytes/from_bytes convert to/from bytes in two's complement, treating the number as one of byte_count * 8 bits. Coming from C/C++, you would typically pass 4 or 8 as byte_count, for a 32-bit or 64-bit number respectively.
I first pack the input number in the format it is supposed to come from (using the signed argument to control signed/unsigned), then unpack it in the format we would like it to have. And you get the result.
Note the exception when trying to use fewer bytes than required to represent the number (In [6]): 0xFF is 255, which cannot be represented by C's signed char type (-128 ≤ n ≤ 127). This is preferable to any other behavior.
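For the common C widths, a usage sketch (byte_count=4 matches the 32-bit mask examples earlier in this thread):
print(signedToUnsigned(-6884376, 4))    # 4288082920
print(unsignedToSigned(4288082920, 4))  # -6884376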
You could use Python's built-in struct library:
Encode:
import struct
i = -6884376
print('{0:b}'.format(i))
packed = struct.pack('>l', i) # Packing a long number.
unpacked = struct.unpack('>L', packed)[0] # Unpacking a packed long number to unsigned long
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
-11010010000110000011000
4288082920
11111111100101101111001111101000
Decode:
dec_pack = struct.pack('>L', unpacked) # Packing an unsigned long number.
dec_unpack = struct.unpack('>l', dec_pack)[0] # Unpacking a packed unsigned long number to long (revert action).
print(dec_unpack)
Out:
-6884376
[NOTE]:
> means big-endian byte order.
l is a signed long.
L is an unsigned long.
On the amd64 architecture, int and long are 32-bit, so you could use i and I instead of l and L respectively.
[UPDATE]
According to hl037_'s comment, this approach works for int32, not int64 or int128, since I used the long format character in struct.pack(). Nevertheless, for int64 the code changes simply by using the long long format character (q) in struct, as follows:
Encode:
i = 9223372036854775807 # the largest int64 number
packed = struct.pack('>q', i) # Packing an int64 number
unpacked = struct.unpack('>Q', packed)[0] # Unpacking signed to unsigned
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
9223372036854775807
111111111111111111111111111111111111111111111111111111111111111
Next, follow the same way for the decoding stage. Also keep in mind that q is a signed long long (8 bytes) and Q is an unsigned long long.
But in the case of int128, the situation is slightly different: there is no 16-byte format character for struct.pack(), so you should split your number into two int64 halves. Here's how:
i = 10000000000000000000000000000000000000 # an int128 number
print(len('{0:b}'.format(i)))
max_int64 = 0xFFFFFFFFFFFFFFFF
packed = struct.pack('>qq', (i >> 64) & max_int64, i & max_int64)
a, b = struct.unpack('>QQ', packed)
unpacked = (a << 64) | b
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
123
10000000000000000000000000000000000000
111100001011110111000010000110101011101101001000110110110010000000011110100001101101010000000000000000000000000000000000000
Just use abs for converting unsigned to signed in Python:
a = -12
b = abs(a)
print(b)
Output:
12

Python binary value of integer of certain byte size

I know Python probably isn't the best tool for this, but let's say I have a value in the signed char range, -128 to 127, that I would like to display as its 8-bit two's complement pattern. For example:
# ok for positive number
>>> f'0b{1:>08b}'
'0b00000001'
# how to do it for negative number?
>>> f'0b{-1:>08b}' # should be 0b11111111
'0b000000-1'
# how to do it for 2's complement?
>>> f'0b{~1:>08b}' # should be 0b11111110
'0b00000-10'
How could I do this display in python?
Use modulo 256.
# positive number is the same
>>> f'0b{1 % 0x100:>08b}'
'0b00000001'
# correct bit pattern, were you to notate -1 as a signed int8
# same as notating 255 in unsigned int8, which is what -1 % 0x100 is
>>> f'0b{-1 % 0x100:>08b}'
'0b11111111'
# flipped bits from 1, truncated to only the least significant 8 digits
>>> f'0b{~1 % 0x100:>08b}'
'0b11111110'
Essentially this is just 'convert your signed char into an unsigned char, and print the bit pattern'. The benefit of using the modulo operator is that you always get a positive number, and when the modulus is a power of two, every bit below that power is left exactly the same.
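Wrapped up as a tiny helper (my sketch; the name bit_pattern is mine), the same modulo idea formats any integer as an n-bit pattern:
def bit_pattern(value, bits=8):
    # reduce modulo 2**bits, then zero-pad to `bits` binary digits
    return f'0b{value % (1 << bits):>0{bits}b}'

print(bit_pattern(-1))      # 0b11111111
print(bit_pattern(~1))      # 0b11111110
print(bit_pattern(-1, 16))  # 0b1111111111111111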
You could try manually setting the bits like this, knowing that 2^n - 1 sets the first n bits:
>>> negative = lambda value, bits: bin(2**bits - 1 - value + 1)
>>> complement = lambda value, bits: bin(2**bits - 1 - value)
>>> # to verify (note: Python doesn't 'know' it's 1 byte, so the sum would
>>> # equal 256 unless we apply the 1-byte mask with & 0xFF)
>>> (1 + int(complement(1, 8), 2)) & 0xFF
0

16 bit hex into 14 bit signed int python?

I get a 16 bit Hex number (so 4 digits) from a sensor and want to convert it into a signed integer so I can actually use it.
There are plenty of code snippets on the internet that get the job done, but with this sensor it is a bit more awkward.
In fact, the number has only 14 bits; the first two (from the left) are irrelevant.
I tried to do it (in Python 3) but failed pretty hard.
Any suggestions on how to "cut" the first two digits of the number and then make the rest a signed integer?
The datasheet says that E002 should be -8190 and 1FFE should be +8190.
Thanks a lot!
Let's define a conversion function:
>>> def f(x):
... r = int(x, 16)
... return r if r < 2**15 else r - 2**16
...
Now, let's test the function against the values that the datasheet provided:
>>> f('1FFE')
8190
>>> f('E002')
-8190
The usual convention for signed numbers is that a number is negative if the high bit is set and positive if it isn't. Following this convention, '0000' is zero and 'FFFF' is -1. The issue is that int assumes a number is positive, and we have to correct for that:
For any number equal to or less than 0x7FFF, the high bit is unset and the number is positive. Thus we return r = int(x, 16) directly when r < 2**15.
For any number r = int(x, 16) that is equal to or greater than 0x8000, we return r - 2**16.
While your sensor may only produce 14-bit data, the manufacturer is following the standard convention for 16-bit integers.
Alternative
Instead of converting x to r and testing the value of r, we can directly test whether the high bit in x is set:
>>> def g(x):
... return int(x, 16) if x[0] in '01234567' else int(x, 16) - 2**16
...
>>> g('1FFE')
8190
>>> g('E002')
-8190
Ignoring the upper bits
Let's suppose that the manufacturer is not following standard conventions and that the upper 2 bits are unreliable. In this case, we can use modulo, %, to remove them and, after adjusting the other constants as appropriate for 14-bit integers, we have:
>>> def h(x):
... r = int(x, 16) % 2**14
... return r if r < 2**13 else r - 2**14
...
>>> h('1FFE')
8190
>>> h('E002')
-8190
There is a general algorithm for sign-extending a two's-complement integer value val whose number of bits is nbits (so that the top-most of those bits is the sign bit).
That algorithm is:
treat the value as a non-negative number, and if needed, mask off additional bits
invert the sign bit, still treating the result as a non-negative number
subtract the numeric value of the sign bit, producing a signed number as the result.
Expressing this algorithm in Python produces:
from __future__ import print_function

def sext(val, nbits):
    assert nbits > 0
    signbit = 1 << (nbits - 1)
    mask = (1 << nbits) - 1
    return ((val & mask) ^ signbit) - signbit

if __name__ == '__main__':
    print('sext(0xe002, 14) =', sext(0xe002, 14))
    print('sext(0x1ffe, 14) =', sext(0x1ffe, 14))
which when run shows the desired results:
sext(0xe002, 14) = -8190
sext(0x1ffe, 14) = 8190

Extendible hashing - most significant bits

I want to write extendible hashing. On the wiki I found a good implementation in Python, but that code uses the least significant bits: with hash 1101, for d = 1 the value is 1 and for d = 2 the value is 01. I would like to use the most significant bits instead. For example: with hash 1101, d = 1 gives value 1, and d = 2 gives value 11. Is there any simple way to do that? I tried, but I couldn't make it work.
Do you understand why it uses the least significant bits?
More or less. It makes things efficient when we are using arrays. OK, so for the hash function I would like to use the four least significant bits of a 4-byte integer, but read from left to right:
h = hash(k)
h = h & 0xf  # use a mask to get the four least significant bits
p = self.pp[h >> (4 - GD)]
And it doesn't work, and I don't know why.
Computing a hash from the least significant bits is the fastest way, because it only requires a bitwise AND operation. This makes it very popular.
Here is an implementation (in C) of a hash using the most significant bits. Since there is no direct way to find the most significant bit, it repeatedly shifts until the remaining value has only the specified number of bits.
int significantHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    while (value > mask) {
        value >>= 1;
    }
    return value;
}
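For completeness, a direct Python transcription of the C function above (my translation; it reproduces the questioner's desired values):
def significant_hash(value, bits):
    mask = (1 << bits) - 1
    while value > mask:
        value >>= 1
    return value

print(bin(significant_hash(0b1101, 1)))  # 0b1
print(bin(significant_hash(0b1101, 2)))  # 0b11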
I recommend the overlapping hash, which makes use of all the bits of the number. Essentially, it cuts the number into parts with an equal number of bits and XORs them together. It runs slower than the least-significant-bits hash, but faster than the most-significant-bits hash. Above all, it offers better dispersion than the other two methods, making it a better candidate when the numbers to be hashed follow some bit-related pattern.
int overlappingHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    int answer = 0;
    do {
        answer ^= (value & mask);
        value >>= bits;
    } while (value > 0);
    return answer;
}
