Python - weird properties for bit operations [duplicate] - python

I was trying to understand bitwise NOT in python.
I tried following:
print('{:b}'.format(~ 0b0101))
print(~ 0b0101)
The output is
-110
-6
I tried to understand the output as follows:
Bitwise negating 0101 gives 1010. With a 1 in the most significant bit, Python interprets it as a negative number in 2's complement form, and to get back the corresponding decimal value it takes the 2's complement of 1010 as follows:
1010
0101 (negating)
0110 (adding 1 to get final value)
So it prints it as -110 which is equivalent to -6.
Am I right with this interpretation?

You're half right.
The value is indeed given by ~x == -(x+1) (add one and negate), but the explanation of why is a little misleading.
Two's complement numbers require setting the MSB of the integer, which is a little difficult if the number can be an arbitrary number of bits long (as is the case with Python). Internally, Python keeps a separate number (with optimizations for small values) that tracks how many digits the integer has. When you print a negative int using the binary format, f'{-6:b}', it just puts a negative sign in front of the binary representation of the positive value (sign and magnitude, not two's complement). Otherwise, how would Python determine how many leading ones there should be? Should positive values always have leading zeros to indicate they're positive? The bitwise operators still give two's complement results, though.
If we consider signed 8-bit numbers (and display all the digits) in 2's complement, your example becomes:
~ 0000 0101: 5
= 1111 1010: -6
So in short, python is performing correct bitwise negation, however the display of negative binary formatted numbers is misleading.
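As a minimal sketch of the points above, the identity and the 8-bit view can be checked directly. The helper name as_bits is made up for illustration and is not part of Python; it simply masks to a chosen width before formatting.

```python
def as_bits(value, width=8):
    """Return the low `width` bits of `value` as a zero-padded binary string.

    Hypothetical helper for illustration only.
    """
    mask = (1 << width) - 1
    return format(value & mask, "0{}b".format(width))

x = 0b0101
print(~x == -(x + 1))    # True
print(format(~x, "b"))   # -110  (minus sign plus magnitude)
print(as_bits(x))        # 00000101
print(as_bits(~x))       # 11111010
```

The masked output is the same bit pattern as the 8-bit example above; only the default formatting hides it.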

Python integers are arbitrarily long, so if you invert 0b0101, the result would be 1111...11111010. How many ones do you write? Well, a 4-bit two's complement -6 is 1010, and a 32-bit two's complement -6 is 11111111111111111111111111111010. So an arbitrarily long -6 is best written simply as -6.
Check what happens when ~5 is masked to look at the bits it represents:
>>> ~5
-6
>>> format(~5 & 0xF,'b')
'1010'
>>> format(~5 & 0xFFFF,'b')
'1111111111111010'
>>> format(~5 & 0xFFFFFFFF,'b')
'11111111111111111111111111111010'
>>> format(~5 & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,'b')
'11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010'
So a negative decimal representation makes sense, and you must mask to limit the representation to a specific number of bits.

Related

What is the exact definition of bitwise not in Python, given arbitrary length integers?

Bitwise not (~) is well-defined in languages that define a specific bit length and format for ints. Since in Python 3, ints can be any length, they by definition have variable number of bits. Internally, I believe Python uses at least 28 bytes to store an int, but of course these aren't what bitwise not is defined on.
How does Python define bitwise not:
Is the bit length a function of the int size, the native platform, or something else?
Does Python sign extend, zero extend, or do something else?
Python integers emulate 2's-complement representation with "an infinite" number of sign bits (all 0 bits for >= 0, all 1 bits for < 0).
All bitwise integer operations maintain this useful illusion.
For example, 0 is an infinite number of 0 bits, and ~0 is an infinite number of 1 bits - which is -1 in 2's-complement notation.
>>> ~0
-1
It's generally true that ~i = -i-1, which, in English, can be read as "the 1's-complement of i is the 2's-complement of i minus 1".
For right shifts of integers, Python fills with more copies of the sign bit.
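A short sketch of these rules, using only built-in operators, might look like the following. The range checked and the sample values are arbitrary choices for illustration.

```python
# The identity ~i == -i - 1 holds for every value in this (arbitrary) range.
identity_holds = all(~i == -i - 1 for i in range(-1000, 1001))
print(identity_holds)     # True

# Right shifts fill with copies of the sign bit.
print(6 >> 1)             # 3
print(-6 >> 1)            # -3
print(-1 >> 10)           # -1  (all 1 bits stay all 1 bits)

# Masking exposes a finite window onto the "infinite" sign bits.
print(bin(~0 & 0xFF))     # 0b11111111
```

So the answer to "sign extend or zero extend" is effectively sign extension, with no fixed bit length involved.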

Why the binary representation is different from python compiler than what we know on paper?

Bitwise NOT is the one's complement, for example:
x = 1 (binary: 0001)
~x = -2 (binary: 1110)
Hence my question: why does Python show -2 in binary as -0b10?
We know that 1110 represents (14) for unsigned integer and (-2) for signed integer.
Two's complement inherently depends on the size of a number. For example, -2 on signed 4-bit is 1110 but on signed 8-bit is 1111 1110.
Python's integer type is arbitrary precision. That means there is no well-defined leading bit to indicate negative sign or well-defined length of the two's complement. A two's complement would be 1... 1110, where ... is an infinite repetition of 1.
As such, Python's integers are displayed as a separate sign (nothing or -) and the absolute number. Thus, -2 becomes - and 0b10 – i.e. - 2. Similarly, -5 becomes - and 0b101 – i.e. - 5.
Note that this representation is merely the standard representation to be human-readable. It is not necessarily the internal representation, which is implementation defined.
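To see both views side by side, here is a small sketch. It relies on int.to_bytes and int.from_bytes with signed=True, which encode and decode two's complement at whatever width is requested; the variable names are only for illustration.

```python
# Default display: a sign followed by the absolute value.
print(bin(-2))            # -0b10
print(format(-5, "b"))    # -101

# Two's complement only appears once a width is chosen.
one_byte = (-2).to_bytes(1, "big", signed=True)
two_bytes = (-2).to_bytes(2, "big", signed=True)
print(one_byte.hex())     # fe
print(two_bytes.hex())    # fffe

# The same byte read back as signed and as unsigned.
print(int.from_bytes(b"\xfe", "big", signed=True))    # -2
print(int.from_bytes(b"\xfe", "big", signed=False))   # 254
```

The same bit pattern is -2 or 254 depending on whether it is read as signed, which mirrors the 1110 example (-2 versus 14) at 4 bits.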

Python Integer Bit masking while keeping sign

I am trying to replicate/validate bitwise arithmetic logic in Python.
I have cases where bits from the absolute value are truncated (no matter if they are 0 or 1) while the sign is preserved. This happens in various bit length representations.
How to implement truncating bits from the absolute value also for negative integers elegantly in Python ?
For positive integers I can easily apply a bit mask:
n=7; nMod=n & 0b11; print(nMod) #truncate MSB
#expected and actual: 3
For negative integers it does not work, probably due to the internal 2's complement and variable number of bits representation:
n=-7; nMod=n & 0b11; print(nMod)
#expected:-3; actual: 1
One could certainly analyze the absolute value, determine which bits are actually Ones and remove them by shifting left and right but my wish would be a simple one-liner like for the positive numbers.
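One possible approach, offered only as a sketch: apply the mask to the absolute value and then restore the sign. The function name mask_magnitude is hypothetical, and this assumes the goal is sign-and-magnitude truncation as described in the question.

```python
def mask_magnitude(n, mask):
    """Apply `mask` to abs(n), then restore the original sign.

    Hypothetical helper; assumes sign-and-magnitude truncation is wanted.
    """
    magnitude = abs(n) & mask
    return -magnitude if n < 0 else magnitude

print(mask_magnitude(7, 0b11))     # 3
print(mask_magnitude(-7, 0b11))    # -3
print(-7 & 0b11)                   # 1  (plain & works on the two's complement view)
```

The plain & result differs because -7 behaves as ...11111001, whose low two bits are 01.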

How to create Python fixed length bits?

I wish to do bitwise negation in Python.
My expectation:
negate(0001) => 1110
But Python's ~0b0001 returns -0b10. It seems Python truncates 1110 into -0b10.
How to keep the leading bits?
Moreover, why
bin(~0b1) yields -0b10?
How many bits are reserved for that datatype?
Python uses arbitrary precision arithmetic, so you don't have to worry about the number of bits used. It returns -0b10 for bin(~0b1) because the result is -2, and it displays the magnitude 10 with the sign in front (a sign is shown only for negative numbers).
But we can represent the number as we like using format function, like this
def negate(number, bits=32):
    return format(~number & (2 ** bits - 1), "0{}b".format(bits))
print(negate(1))
# 11111111111111111111111111111110
print(negate(1, bits = 4))
# 1110
Or, as suggested by eryksun,
def negate(number, bits=32):
    return "{:0{}b}".format(~number & (2 ** bits - 1), bits)
Python acts as if its integers have infinitely many bits. As such, if you use ~ on one, the string representation can't start with an infinite number of 1s, or generating the string would never terminate. Instead, Python chooses to represent it as the negative number it would be in two's complement. If you want to restrict the integer to a number of bits, & it against an appropriate mask:
>>> bin((~1) & 0b1111)
'0b1110'

Python invert operator - visualizing the negative number

In simple words, the definition given for ~n is -n - 1.
For example ~1
1 = 0001
~1 = 1110 (which is -2)
For even numbers
say ~2
2 = 0010
~2 = 1101 (which is the representation of -3 in twos complement)
But the question is
1110 = -2 can be easily visualized as -2 (the right two bits are 10 and the rest all 1)
1101 = -3 can't be visualized like this (going by the above logic it should be -5)
So I am wondering: is there a simple way to look at a two's complement binary pattern and tell which negative number it represents without doing much calculation?
~n = -n - 1 is equivalent to -n = ~n + 1. That means that to figure out what the negative of a number is, you can invert it (in your head) and add one. Pretend zeros are ones and vice versa, then add one.
Example: Pretend this
1101
is this
0010
then add 1
0011
Thus, 1101 represents -3.
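The same invert-and-add-one procedure can be sketched in code for a fixed-width bit string. The function name negative_value and the choice to pass the pattern as a string are assumptions made for illustration.

```python
def negative_value(pattern, width):
    """Read a `width`-bit string whose leading bit is 1 as a negative number.

    Hypothetical helper: invert the bits, add one, attach the minus sign.
    """
    mask = (1 << width) - 1
    inverted = ~int(pattern, 2) & mask   # pretend zeros are ones and vice versa
    return -(inverted + 1)               # then add one and negate

print(negative_value("1101", 4))   # -3
print(negative_value("1110", 4))   # -2
```

The mask is needed because ~ on a Python int flips an unbounded number of bits; masking keeps only the width being considered.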
The definition comes from the 2's complement representation of integers. So "really" ~n is n with all the bits flipped. But the result of flipping "all" the bits depends, in a sense, on how many bits n has in the first place. CPython builds its integers out of fixed-width pieces internally, but the language doesn't present them to the programmer, so the only definition that makes sense in general is the arithmetic one, ~n = -n - 1. The motivation for that definition is still flipping the bits of a fixed-width 2's complement integer.
1110 = ~2 can be easily visualized as -2
...1110 is not ~2, it is -2. ~2 is ...1101 because 2 is ...010.
1101 = ~3 can't be visualized like this
...1101 is not ~3, it is -3. ~3 is ...1100 because 3 is ...011.
The way I visualize this (when I visualize it at all -- as a mathematician by training I prefer not to consider specific numbers), is to know that in 2's complement, ...10... is always a negated power of two. So ...10 is -2, ...100 is -4, etc.
Then in order to know what for example ...110110 is, it's ...110000 + 110, that is to say -16 + 6, which is -10.
Of course ...110110 is also (by bit flipping) ~1001, that is to say ~9, which by the formula is -9-1, which is also -10. So the system works ;-)
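As a rough sketch of that decomposition, a negative integer can be split into a negated power of two plus its low bits. The name split_negative is made up here, and picking the width from (~n).bit_length() is one choice among several that would work.

```python
def split_negative(n):
    """Split a negative int into (negated power of two, low bits).

    Hypothetical helper: the two parts always add up to n.
    """
    if n >= 0:
        raise ValueError("expected a negative integer")
    k = (~n).bit_length()            # width of the part below the run of 1s
    high = -(1 << k)                 # the ...1110...0 part
    low = n & ((1 << k) - 1)         # the remaining low bits
    return high, low

print(split_negative(-10))   # (-16, 6)
print(sum(split_negative(-10)))   # -10
print(~0b1001)               # -10
```

For -10 this reproduces the -16 + 6 reading above, and ~0b1001 confirms the bit-flipping route to the same value.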
