I wish to do bitwise negation in Python.
My expectation:
negate(0001) => 1110
But Python's ~0b0001 returns -0b10. It seems Python truncate 1110 into -0b10.
How to keep the leading bits?
Moreover, why
bin(~0b1) yields -0b10?
How many bits are reserved for that datatype?
Python uses arbitrary precision arithmetic, so you don't have to worry about the number of bits used. Also it returns -0b10 for bin(~0b1), because it understands that the result is -2 and represents the number as it is 10 and keeps the sign in the front (only for the negative numbers).
But we can represent the number as we like using format function, like this
def negate(number, bits = 32):
return format(~number & 2 ** bits - 1, "0{}b".format(bits))
print(negate(1))
# 11111111111111111111111111111110
print(negate(1, bits = 4))
# 1110
Or, as suggested by eryksun,
def negate(number, bits = 32):
return "{:0{}b}".format(~number & 2 ** bits - 1, bits)
Python acts as if its integers have infinitely many bits. As such, if you use ~ on one, the string representation can't start with an infinite number of 1s, or generating the string would never terminate. Instead, Python chooses to represent it as a negative number as it would be using two's compliment. If you want to restrict the integer to a number of bits, & it against an appropriate mask:
>>> bin((~1) & 0b1111)
'0b1110'
Related
I was trying to understand bitwise NOT in python.
I tried following:
print('{:b}'.format(~ 0b0101))
print(~ 0b0101)
The output is
-110
-6
I tried to understand the output as follows:
Bitwise negating 0101 gives 1010. With 1 in most significant bit, python interprets it as a negative number in 2's complement form and to get back corresponding decimal it further takes 2's complement of 1010 as follows:
1010
0101 (negating)
0110 (adding 1 to get final value)
So it prints it as -110 which is equivalent to -6.
Am I right with this interpretation?
You're half right..
The value is indeed represented by ~x == -(x+1) (add one and invert), but the explanation of why is a little misleading.
Two's compliment numbers require setting the MSB of the integer, which is a little difficult if the number can be an arbitrary number of bits long (as is the case with python). Internally python keeps a separate number (there are optimizations for short numbers however) that tracks how long the digit is. When you print a negative int using the binary format: f'{-6:b}, it just slaps a negative sign in front of the binary representation of the positive value (one's compliment). Otherwise, how would python determine how many leading one's there should be? Should positive values always have leading zeros to indicate they're positive? Internally it does indeed use two's compliment for the math though.
If we consider signed 8 bit numbers (and display all the digits) in 2's compliment your example becomes:
~ 0000 0101: 5
= 1111 1010: -6
So in short, python is performing correct bitwise negation, however the display of negative binary formatted numbers is misleading.
Python integers are arbitrarily long, so if you invert 0b0101, it would be 1111...11111010. How many ones do you write? Well, a 4-bit twos complement -6 is 1010, and a 32-bit twos complement -6 is 11111111111111111111111111111010. So an arbitrarily long -6 could ideally just be written as -6.
Check what happens when ~5 is masked to look at the bits it represents:
>>> ~5
-6
>>> format(~5 & 0xF,'b')
'1010'
>>> format(~5 & 0xFFFF,'b')
'1111111111111010'
>>> format(~5 & 0xFFFFFFFF,'b')
'11111111111111111111111111111010'
>>> format(~5 & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,'b')
'11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010'
A negative decimal representation makes sense and you must mask to limit a representation to a specific number of bits.
I'm translating this from C to Python:
int_fast16_t x;
int_fast16_t y = compute_value();
x = ~y;
y = x+1;
I don't think
y = compute_value()
y = (~y) + 1
will work: how would it know on how many bits should the binary NOT be done? (8? 16? 32?).
In the case of C, it is explicit with the type int_fast16_t, but with Python we don't know a number of bits in advance.
How do you do this in Python?
I have read How do I do a bitwise Not operation in Python? but here it's more specific: how does Python infer the number of bits to use in a binary NOT?
Example:
How does Python know if the binary NOT of 3 (0b11) should be done on 4 bits: 0b00 or on 8 bits: 0b11111100 or on 16 bits: 0b1111111111111100?
Python int has an infinite number of bits. The only way to invert them all is to negate the sign of the number. So for example ~1 is -2. To get a specific number of bits in the result, you have to mask off those bits. (~1)&0xffff is 0xfffe which is one bit off, I assume due to two's complement.
how does Python infer the number of bits to use in a binary NOT?
It uses all bits in the binary representation it has for the operand. Python integers consist of one or more longs (see Python integer objects).
How does Python know if the binary NOT of 3 (0b11) should be done on 4 bits: 0b00 or on 8 bits: 0b11111100 or on 16 bits: 0b1111111111111100?
There is no case where it would be just 4 or 8 bits. It is done on all bits in the 64 bits it has for the integer (or a multiple of that, if the original value needed more long words).
Bitwise not (~) is well-defined in languages that define a specific bit length and format for ints. Since in Python 3, ints can be any length, they by definition have variable number of bits. Internally, I believe Python uses at least 28 bytes to store an int, but of course these aren't what bitwise not is defined on.
How does Python define bitwise not:
Is the bit length a function of the int size, the native platform, or something else?
Does Python sign extend, zero extend, or do something else?
Python integers emulate 2's-complement represenation with "an infinite" number of sign bits (all 0 bits for >= 0, all 1 bits for < 0).
All bitwise integer operations maintain this useful illusion.
For example, 0 is an infinite number of 0 bits, and ~0 is an infinite number of 1 bits - which is -1 in 2's-complement notation.
>>> ~0
-1
It's generally true that ~i = -i-1, which, in English, can be read as "the 1's-complement of i is the 2's-complement of i minus 1".
For right shifts of integers, Python fills with more copies of the sign bit.
Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7 or not (which it is).
I do not want to use indexing. I want to in some way and with a flag mask, such as this:
00070000
I would normally use np.logical_and() or something like that, but that will consider any positive value to be True. How can I and while considering the value of a digit? For example, preforming the operation with
flags = 18573628
and
mask = 00070000
would yield 00010000
though trying a different mask, such as
mask = 00040000
would yield 00000000
What you can do is
if (x // 10**n % 10) == y:
...
to check if the n-th digit of x (counting from right) is equal to y
You have to use divide and modulo for a decimal mask:
flags = 18573628
mask = 10000
if (flags / mask) % 10 == 7:
do_something
You can convert the input number into an array of digit numbers and then simply indexing into that array with that specific index or indices would give us those digit(s). For doing that conversion, we can use np.fromstring, like so -
In [87]: nums = np.fromstring(str(18573628),dtype=np.uint8)-48
In [88]: nums
Out[88]: array([1, 8, 5, 7, 3, 6, 2, 8], dtype=uint8)
In [89]: nums[3] == 7
Out[89]: True
Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7
Firstly, bitwise operations like & are bit-wise, which is to say they operate on base-2 digits. They don't operate naturally on digits of any other base, although bases which are themselves powers of 2 work out ok.
To stick with bit-wise operations
You need to know how many values each flag can take, to figure out how many bits each flag needs to encode.
If you want to allow each flag the values zero to nine, you need four bits. However, in this scheme, your number won't behave like a normal integer (storing a base-10 digit in each 4-bit group is called Binary Coded Decimal).
The reason it won't behave like a normal integer is that flag values 1,2,3 will be stored as 1 * 16**2 + 2*16 + 3 instead of the 1 * 10**2 + 2*10 + 3 you'd normally expect. So you'd need to write some code to support this use. However, extracting flag n (counting from zero at the right) just becomes
def bcdFlagValue(bcd, flagnum):
if flagnum == 0:
return bcd & 0x0F;
return 0x0F & (bcd >> ((flagnum-1) * 4))
If you actually need a different range of values for each flag, you need to choose the correct number of bits, and adjust the shift and mask values appropriately.
In either case, you'll need a helper function if you want to print your flags as the base-10 number you showed.
To use normal base 10 numbers
You need to use division and modulo (as 6502 showed), because base-10 numbers don't fit evenly into base-2 bits, so simple bit operations don't work
Note
The BCD approach saves space at the cost of complexity, effort and some speed - from subsequent comments, it's probably simpler to just use the string of digit characters directly unless you really need to save 4 bits per digit.
if flags and mask are hexadecimal values, you can do:
flags = int("18573628", 16)
mask = int("00070000", 16)
result = flags & mask
print(hex(result))
=> '0x70000'
Without dealing with the particulars of your case (the SDSS data, which should be documented in the product specification), let's look at some options.
First, you need to to know if it is to be read in big-endian or little-endian order (is the first bit to the right or to the left). Then you need to know the size of each flag. For a series of yes-no parameters, it could simply be 1 bit (0 or 1). For up to four options, it could be two bits (00, 01, 10, 11), etc. It is also possible that some combinations are reserved for future expansion, don't currently have meaning, and should not be expected to occur in the data. I've also seen instances where the flag size varies, so first n bits mean refer to parameter x, next n bits refer to parameter y, etc.
There is a good explanation of the concept as part of Landsat-8 satellite imagery:
http://landsat.usgs.gov/qualityband.php
To read the values, you convert the base 10 integer to binary, and traverse it in the specified chunks, converting back to int to obtain the parameter values according to your product specification.
Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.
Use int(your non integer number) will nail it.
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"
You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.
This is where 253 comes from.
More detail: We need to consider how IEEE-754 floating point works.
1 bit 11 / 8 52 / 23 # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
-1sign × 1.mantissa ×2exponent - bias
where bias = 2exponent - 1 - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2X simply shifts all bits X places to the left, it's easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be a MSB that is one—which is why it's not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent - bias.
That's saying the same as shifting the bits over the decimal point until there's only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any numbers after the decimal.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and converted to the next highest INTEGER, "newvel".
import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
for row in cursor:
curvel = float(row[1])
newvel = int(math.ceil(curvel))
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.