How to 'and' data without ignoring digits?


Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7 or not (which it is).
I do not want to use indexing. I want to somehow 'and' it with a flag mask, such as this:
00070000
I would normally use np.logical_and() or something like that, but that will consider any positive value to be True. How can I 'and' while considering the value of a digit? For example, performing the operation with
flags = 18573628
and
mask = 00070000
would yield 00010000
though trying a different mask, such as
mask = 00040000
would yield 00000000

What you can do is
if (x // 10**n % 10) == y:
...
to check if the n-th digit of x (counting from right) is equal to y
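A minimal sketch of that check in plain Python, assuming non-negative integers. digit_at wraps the expression above; decimal_match is a hypothetical helper that produces the digit-wise result the question asks for (a 1 wherever a non-zero mask digit matches the flag digit, 0 elsewhere). Note that 00070000 is not a valid integer literal in Python 3 (leading zeros are rejected), so the mask is written as 70000.

```python
def digit_at(x: int, n: int) -> int:
    """Return the n-th decimal digit of x, counting from 0 at the right."""
    return x // 10**n % 10


def decimal_match(flags: int, mask: int) -> int:
    """Hypothetical digit-wise 'and': for every non-zero digit of mask,
    put a 1 in that position if flags has the same digit there."""
    result = 0
    n = 0
    while mask:
        wanted = mask % 10
        if wanted and digit_at(flags, n) == wanted:
            result += 10**n
        mask //= 10  # move on to the next mask digit
        n += 1
    return result


print(digit_at(18573628, 4))                  # 7
print(decimal_match(18573628, 70000))         # 10000
print(decimal_match(18573628, 40000))         # 0
print(f"{decimal_match(18573628, 70000):08d}")  # 00010000
```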

You have to use integer division (//, since / returns a float in Python 3) and modulo for a decimal mask:
flags = 18573628
mask = 10000
if (flags // mask) % 10 == 7:
    do_something()

You can convert the input number into an array of digits; then simply indexing into that array with a specific index or indices gives you those digit(s). For the conversion, we can use np.frombuffer on the encoded string (the binary mode of np.fromstring is deprecated for this), like so:
In [87]: nums = np.frombuffer(str(18573628).encode(), dtype=np.uint8) - 48
In [88]: nums
Out[88]: array([1, 8, 5, 7, 3, 6, 2, 8], dtype=uint8)
In [89]: nums[3] == 7
Out[89]: True
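If NumPy isn't otherwise needed, a plain list of digits does the same job. A sketch, assuming a non-negative integer (a minus sign would not convert to a digit); the name digits is hypothetical:

```python
def digits(number: int) -> list:
    """Split a non-negative integer into its decimal digits, most significant first."""
    return [int(ch) for ch in str(number)]


nums = digits(18573628)
print(nums)          # [1, 8, 5, 7, 3, 6, 2, 8]
print(nums[3] == 7)  # True
```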

Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7
Firstly, bitwise operations like & are bit-wise, which is to say they operate on base-2 digits. They don't operate naturally on digits of any other base, although bases which are themselves powers of 2 work out ok.
To stick with bit-wise operations
You need to know how many values each flag can take, to figure out how many bits each flag needs to encode.
If you want to allow each flag the values zero to nine, you need four bits. However, in this scheme, your number won't behave like a normal integer (storing a base-10 digit in each 4-bit group is called Binary Coded Decimal).
The reason it won't behave like a normal integer is that flag values 1,2,3 will be stored as 1 * 16**2 + 2*16 + 3 instead of the 1 * 10**2 + 2*10 + 3 you'd normally expect. So you'd need to write some code to support this use. However, extracting flag n (counting from zero at the right) just becomes
def bcdFlagValue(bcd, flagnum):
    # Each flag occupies one 4-bit nibble; flag 0 is the lowest nibble.
    return (bcd >> (flagnum * 4)) & 0x0F
If you actually need a different range of values for each flag, you need to choose the correct number of bits, and adjust the shift and mask values appropriately.
In either case, you'll need a helper function if you want to print your flags as the base-10 number you showed.
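Such helpers could look roughly like this. This is a sketch that assumes non-negative values and one base-10 digit per 4-bit nibble; to_bcd and from_bcd are hypothetical names:

```python
def to_bcd(value: int) -> int:
    """Pack a non-negative integer into BCD: one 4-bit nibble per decimal digit."""
    result, shift = 0, 0
    while value:
        result |= (value % 10) << shift  # lowest decimal digit into the next nibble
        value //= 10
        shift += 4
    return result


def from_bcd(bcd: int) -> int:
    """Unpack a BCD value into an ordinary integer (assumes every nibble is 0-9)."""
    result, scale = 0, 1
    while bcd:
        result += (bcd & 0x0F) * scale
        bcd >>= 4
        scale *= 10
    return result


packed = to_bcd(18573628)
print(hex(packed))                 # 0x18573628
print((packed >> (4 * 4)) & 0x0F)  # 7, i.e. flag 4 counting from 0 at the right
print(from_bcd(packed))            # 18573628
```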
To use normal base 10 numbers
You need to use division and modulo (as 6502 showed), because base-10 numbers don't fit evenly into base-2 bits, so simple bit operations don't work.
Note
The BCD approach saves space at the cost of complexity, effort and some speed - from subsequent comments, it's probably simpler to just use the string of digit characters directly unless you really need to save 4 bits per digit.

if flags and mask are hexadecimal values, you can do:
flags = int("18573628", 16)
mask = int("00070000", 16)
result = flags & mask
print(hex(result))
# prints 0x70000
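One caveat: flags & mask being non-zero does not by itself mean the digit is 7, because & also keeps partial bit overlaps (hex digit 3 & 7 is 3). A sketch that isolates the whole hex digit with 0xF and compares it; hex_digit_equals is a hypothetical name:

```python
def hex_digit_equals(flags: int, position: int, expected: int) -> bool:
    """True if the hex digit at `position` (0 = rightmost) equals `expected`."""
    return ((flags >> (4 * position)) & 0xF) == expected


flags = int("18573628", 16)
other = int("18533628", 16)          # same number, but hex digit 4 is 3

print(hex(other & 0x70000))          # 0x30000, non-zero even though the digit is not 7
print(hex_digit_equals(flags, 4, 7))  # True
print(hex_digit_equals(other, 4, 7))  # False
```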

Without dealing with the particulars of your case (the SDSS data, which should be documented in the product specification), let's look at some options.
First, you need to know if it is to be read in big-endian or little-endian order (is the first bit on the right or on the left?). Then you need to know the size of each flag. For a series of yes/no parameters, it could simply be 1 bit (0 or 1). For up to four options, it could be two bits (00, 01, 10, 11), etc. It is also possible that some combinations are reserved for future expansion, don't currently have meaning, and should not be expected to occur in the data. I've also seen instances where the flag size varies, so the first n bits refer to parameter x, the next n bits refer to parameter y, etc.
There is a good explanation of the concept as part of Landsat-8 satellite imagery:
http://landsat.usgs.gov/qualityband.php
To read the values, you convert the base 10 integer to binary, and traverse it in the specified chunks, converting back to int to obtain the parameter values according to your product specification.
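Instead of going through a binary string, the same chunking can be done with a shift and a mask. A sketch with a made-up layout (it is NOT the SDSS or Landsat specification; the field names and positions are hypothetical):

```python
def get_field(value: int, offset: int, width: int) -> int:
    """Return `width` bits of `value` starting at bit `offset` (bit 0 = least significant)."""
    return (value >> offset) & ((1 << width) - 1)


# Hypothetical layout, not taken from any real product specification:
#   bit 0      -> "fill" flag (1 bit)
#   bits 1..2  -> "cloud confidence" (2 bits, values 0-3)
#   bits 3..5  -> "category" (3 bits, values 0-7)
qa = 0b101_10_1  # category=5, cloud=2, fill=1

print(get_field(qa, 0, 1))  # 1
print(get_field(qa, 1, 2))  # 2
print(get_field(qa, 3, 3))  # 5
```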

Related

Is there a built-in python function to count bit flip in a binary string?

The question I am trying to solve is: given a binary string of arbitrary length, how can I return the number of bit flips in the string? For example, the bit flip number of '01101' is 3, and of '1110011' is 2.
The way I can come up with to solve this problem is to use a for loop and a counter. However, that seems too lengthy. Is there a way I can do that faster? Or is there a built-in function in python that allows me to do that directly? Thanks for the help!

There is a very fast way to do this without any explicit loops, using only Python built-ins: convert the string to an integer, detect all the bit flips with an XOR-based integer trick, then convert the integer back to a string and count the set bits. Here is the code:
# Convert the binary string `s` to an integer: "01101" -> 0b01101
n = int(s, 2)
# Build a binary mask to skip the most significant bit of n: 0b01101 -> 0b01111
mask = (1 << (len(s)-1)) - 1
# Check if the ith bit of n is different from the (i+1)th bit of n using a bit-wise XOR:
# 0b01101 & 0b01111 -> 0b1101 (discard the first bit)
# 0b01101 >> 1 -> 0b0110
# 0b1101 ^ 0b0110 -> 0b1011
bitFlips = (n & mask) ^ (n >> 1)
# Convert the integer back to a string and count the bit flips: 0b1011 -> "0b1011" -> 3
flipCount = bin(bitFlips).count('1')
This trick is much faster than other methods since integer operations are highly optimized compared to loop-based interpreted code or code working on iterables. Here are performance results for a string of size 1000 on my machine:
ljdyer's solution: 96 µs x1.0
Karl's solution: 39 µs x2.5
This solution: 4 µs x24.0
If you are working with short bounded strings, then there are even faster ways to count the number of bits set in an integer.
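A small wrapper around the same XOR trick, as a sketch: it returns 0 for the empty string (where int('', 2) would raise ValueError) instead of crashing. On Python 3.10+, int.bit_count() can replace bin().count('1'). The function name is hypothetical.

```python
def count_bit_flips(s: str) -> int:
    """Count positions where neighbouring characters of a binary string differ."""
    if not s:
        return 0  # int("", 2) would raise ValueError
    n = int(s, 2)
    mask = (1 << (len(s) - 1)) - 1  # every bit except the most significant one
    flips = (n & mask) ^ (n >> 1)   # 1 wherever bit i differs from bit i + 1
    return bin(flips).count("1")    # on Python 3.10+, flips.bit_count() also works


print(count_bit_flips("01101"))    # 3
print(count_bit_flips("1110011"))  # 2
```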

I don't know of a built-in function, but here's a one-liner:
bit_flip_count = sum(1 for i in range(1, len(x0)) if x0[i] != x0[i-1])

Given a sequence of values, you can find the number of times that the value changes by grouping contiguous values and then counting the groups. There will be one more group than the number of changes (since the elements before the first change are also in a group). (Of course, for an empty sequence, this gives you a result of -1; you may want to handle this case separately.)
Grouping in Python is built-in, via the standard library itertools.groupby. This tool only considers contiguous groups, which is often a drawback (if you want to make a histogram, for example, you have to sort the data first) but in our case is exactly what we want. The overall interface of this tool is a bit complex, but in our case we can use it simply:
from itertools import groupby
def changes_in(sequence):
    return len(list(groupby(sequence))) - 1
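A quick check of the groupby approach, next to a swapped-in alternative that compares each element with its successor via zip (this second function is not from the answer above; it assumes a sliceable sequence and returns 0 rather than -1 for an empty input):

```python
from itertools import groupby


def changes_in(sequence):
    # One group per run of equal values; changes = groups - 1.
    return len(list(groupby(sequence))) - 1


def changes_by_pairs(sequence):
    # Compare neighbours pairwise; each True counts as 1 in sum().
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


print(changes_in("01101"), changes_by_pairs("01101"))      # 3 3
print(changes_in("1110011"), changes_by_pairs("1110011"))  # 2 2
print(changes_in(""), changes_by_pairs(""))                # -1 0
```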

What is the exact definition of bitwise not in Python, given arbitrary length integers?

Bitwise not (~) is well-defined in languages that define a specific bit length and format for ints. Since ints in Python 3 can be any length, they by definition have a variable number of bits. Internally, I believe Python uses at least 28 bytes to store an int, but of course these aren't what bitwise not is defined on.
How does Python define bitwise not:
Is the bit length a function of the int size, the native platform, or something else?
Does Python sign extend, zero extend, or do something else?

Python integers emulate 2's-complement representation with "an infinite" number of sign bits (all 0 bits for >= 0, all 1 bits for < 0).
All bitwise integer operations maintain this useful illusion.
For example, 0 is an infinite number of 0 bits, and ~0 is an infinite number of 1 bits - which is -1 in 2's-complement notation.
>>> ~0
-1
It's generally true that ~i = -i-1, which, in English, can be read as "the 1's-complement of i is the 2's-complement of i minus 1".
For right shifts of integers, Python fills with more copies of the sign bit.
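A quick check of these claims (illustrative only; the values are arbitrary):

```python
# ~i == -i - 1 holds for negative and non-negative integers alike.
for i in range(-5, 6):
    assert ~i == -i - 1

print(bin(~5))    # -0b110: Python prints the sign and magnitude, not the infinite 1 bits
print(-1 >> 10)   # -1: right shifts copy the sign bit, so all-ones stays all-ones
print(-8 >> 1)    # -4
print(~5 & 0xFF)  # 250: masking keeps only the low 8 bits of the complement
```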

Calculating 16-bit integer value from two 8-bit integers?

To illustrate what I mean: In a hex editor I have 8C 01 which is 396 little-endian. The data I'm working with is a tuple with two separate 8-bit integers i = (140, 1).
To calculate the 16-bit value my first approach was to simply multiply the 2nd integer by 255 then add the first. However this method is simply wrong as it does not give the correct value (due to my lack of knowledge). Can anyone provide a better (possibly Pythonic) approach?

You need to multiply it by 256 (2**8). So the function would be something like:
def pack(tup):
    return 256*tup[1] + tup[0]
or perform a bitwise shift, which makes more sense when working with bits:
def pack(tup):
    return (tup[1] << 8) | tup[0]
Here << means you place the value of tup[1] eight positions to the left. The pipe (|) means you perform an OR operation. This is reasonable if you ensure that the values in the tuple are less than 256, and it can, at least theoretically, result in some speedup.
More generic
In case your tuple has an arbitrary length (for instance three, four, or more elements), you can define a more generic function:
def pack(tup):
    total = 0
    for i in range(len(tup)):
        total |= tup[i] << (i << 3)
    return total
Here <<3 is used as a shortcut for multiplying by 8, so an equivalent function would be:
def pack(tup):
    total = 0
    for i in range(len(tup)):
        total |= tup[i] << (8*i)
    return total
Or written out, it is something like:
tup[0]|(tup[1]<<8)|(tup[2]<<16)|(...)
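For the generic case, the built-in int.from_bytes does the same little-endian combination, and bytes() rejects any value outside 0..255. A sketch; pack_bytes is a hypothetical name:

```python
def pack_bytes(tup):
    """Combine byte values (least significant first) into one integer."""
    # bytes() raises ValueError for any value outside 0..255.
    return int.from_bytes(bytes(tup), byteorder="little")


print(pack_bytes((140, 1)))   # 396
print(pack_bytes((1, 2, 3)))  # 197121 == 1 | 2 << 8 | 3 << 16
```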

You should multiply by 256...
>>> i[1]*256 + i[0]
396
There is a Python way using struct module, but not necessary in such a simple case.
>>> from struct import pack, unpack
>>> unpack('<H', pack('BB', *i))[0]
396
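The struct format also decides how the two bytes are interpreted: 'H' is unsigned 16-bit and 'h' is signed 16-bit, and bytes(i) can replace the pack('BB', ...) step. A short sketch:

```python
import struct

i = (140, 1)
# '<' = little-endian, 'H' = unsigned 16-bit; bytes(i) builds the two raw bytes directly.
print(struct.unpack('<H', bytes(i))[0])             # 396
# The same bytes read as unsigned ('H') and signed ('h') 16-bit values.
print(struct.unpack('<H', bytes((0xFF, 0xFF)))[0])  # 65535
print(struct.unpack('<h', bytes((0xFF, 0xFF)))[0])  # -1
```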

How to create Python fixed length bits?

I wish to do bitwise negation in Python.
My expectation:
negate(0001) => 1110
But Python's ~0b0001 returns -0b10. It seems Python truncates 1110 into -0b10.
How to keep the leading bits?
Moreover, why
bin(~0b1) yields -0b10?
How many bits are reserved for that datatype?

Python uses arbitrary-precision arithmetic, so you don't have to worry about the number of bits used. bin(~0b1) returns -0b10 because the result is -2: bin() writes the magnitude (10 in binary) and puts the sign in front (this only happens for negative numbers).
But we can represent the number as we like using format function, like this
def negate(number, bits=32):
    return format(~number & 2 ** bits - 1, "0{}b".format(bits))
print(negate(1))
# 11111111111111111111111111111110
print(negate(1, bits=4))
# 1110
Or, as suggested by eryksun,
def negate(number, bits=32):
    return "{:0{}b}".format(~number & 2 ** bits - 1, bits)

Python acts as if its integers have infinitely many bits. As such, if you use ~ on one, the string representation can't start with an infinite number of 1s, or generating the string would never terminate. Instead, Python represents the result as the negative number that bit pattern denotes in two's complement. If you want to restrict the integer to a number of bits, & it against an appropriate mask:
>>> bin((~1) & 0b1111)
'0b1110'

The tilde operator in Python

What's the usage of the tilde operator in Python?
One thing I can think about is do something in both sides of a string or list, such as check if a string is palindromic or not:
def is_palindromic(s):
    return all(s[i] == s[~i] for i in range(len(s) // 2))
Any other good usage?

It is a unary operator (taking a single argument) that is borrowed from C, where all data types are just different ways of interpreting bytes. It is the "invert" or "complement" operation, in which all the bits of the input data are reversed.
In Python, for integers, the bits of the twos-complement representation of the integer are reversed (as in b <- b XOR 1 for each individual bit), and the result interpreted again as a twos-complement integer. So for integers, ~x is equivalent to (-x) - 1.
The reified form of the ~ operator is provided as operator.invert. To support this operator in your own class, give it an __invert__(self) method.
>>> import operator
>>> class Foo:
...     def __invert__(self):
...         print('invert')
...
>>> x = Foo()
>>> operator.invert(x)
invert
>>> ~x
invert
Any class in which it is meaningful to have a "complement" or "inverse" of an instance that is also an instance of the same class is a possible candidate for the invert operator. However, operator overloading can lead to confusion if misused, so be sure that it really makes sense to do so before supplying an __invert__ method to your class. (Note that byte-strings [ex: '\xff'] do not support this operator, even though it is meaningful to invert all the bits of a byte-string.)
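As an illustration of a class where a complement is meaningful, here is a sketch of a hypothetical fixed-width bit pattern (Bits is an invented name, not a standard library type) whose __invert__ flips only its own bits, so ~ never produces a negative value:

```python
import operator


class Bits:
    """Hypothetical fixed-width bit pattern whose complement stays within `width` bits."""

    def __init__(self, value: int, width: int):
        self.width = width
        self.value = value & ((1 << width) - 1)  # keep only the low `width` bits

    def __invert__(self):
        # Python's ~ flips infinitely many bits; __init__ masks the result back down.
        return Bits(~self.value, self.width)

    def __repr__(self):
        return f"Bits(0b{self.value:0{self.width}b}, width={self.width})"


print(~Bits(0b0001, 4))                # Bits(0b1110, width=4)
print(operator.invert(Bits(0b000, 3)))  # Bits(0b111, width=3)
```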

~ is the bitwise complement operator in python which essentially calculates -x - 1
So a table would look like
i ~i
-----
0 -1
1 -2
2 -3
3 -4
4 -5
5 -6
So for i = 0 it would compare s[0] with s[len(s) - 1], for i = 1, s[1] with s[len(s) - 2].
As for your other question, this can be useful for a range of bitwise hacks.

One should note that in the case of array indexing, array[~i] amounts to reversed_array[i]. It can be seen as indexing starting from the end of the array:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
^ ^
i ~i
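A short check of that equivalence (illustrative values):

```python
data = [0, 1, 2, 3, 4, 5, 6, 7, 8]
for i in range(len(data)):
    # ~i == -i - 1, so data[~i] counts from the end, starting at the last element.
    assert data[~i] == data[len(data) - 1 - i] == data[::-1][i]

print(data[~0], data[~1])  # 8 7
```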

Besides being a bitwise complement operator, ~ can also invert a boolean value, but not the built-in bool type (~True is -2); it works on NumPy booleans (numpy.bool_).
For example:
import numpy as np
assert ~np.True_ == np.False_
Reversing logical value can be useful sometimes, e.g., below ~ operator is used to cleanse your dataset and return you a column without NaN.
import numpy as np
import pandas as pd
matrix = pd.DataFrame([1, 2, 3, 4, np.nan], columns=['Number'], dtype='float64')
# Remove NaN in column 'Number'
matrix['Number'][~matrix['Number'].isnull()]

The only time I've ever used this in practice is with numpy/pandas. For example, with the .isin() dataframe method.
In the docs they show this basic example
>>> df.isin([0, 2])
num_legs num_wings
falcon True True
dog False True
But what if instead you wanted all the rows not in [0, 2]?
>>> ~df.isin([0, 2])
num_legs num_wings
falcon False False
dog True False

I was solving this leetcode problem and I came across this beautiful solution by a user named Zitao Wang.
The problem goes like this: for each element in the given array, find the product of all the remaining numbers without using division, in O(n) time.
The standard solution is:
Pass 1: For all elements compute product of all the elements to the left of it
Pass 2: For all elements compute product of all the elements to the right of it
and then multiplying them for the final answer
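The two-pass standard solution described above can be sketched as follows (a hypothetical standalone function, not Zitao Wang's code):

```python
def product_except_self_two_pass(nums):
    n = len(nums)
    res = [1] * n
    # Pass 1: res[i] holds the product of everything left of i.
    left = 1
    for i in range(n):
        res[i] = left
        left *= nums[i]
    # Pass 2: multiply in the product of everything right of i.
    right = 1
    for i in range(n - 1, -1, -1):
        res[i] *= right
        right *= nums[i]
    return res


print(product_except_self_two_pass([1, 2, 3, 4]))  # [24, 12, 8, 6]
```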
His solution uses only one for loop by making use of ~: he computes the left product and the right product on the fly.
def productExceptSelf(self, nums):
    res = [1]*len(nums)
    lprod = 1
    rprod = 1
    for i in range(len(nums)):
        res[i] *= lprod
        lprod *= nums[i]
        res[~i] *= rprod
        rprod *= nums[~i]
    return res

Explaining why -x -1 is correct in general (for integers)
Sometimes (example), people are surprised by the mathematical behaviour of the ~ operator. They might reason, for example, that rather than evaluating to -19, the result of ~18 should be 13 (since bin(18) gives '0b10010', inverting the bits would give '0b01101' which represents 13 - right?). Or perhaps they might expect 237 (treating the input as an unsigned 8-bit quantity), or some other positive value corresponding to larger integer sizes (such as the machine word size).
Note, here, that the signed interpretation of the bits 11101101 (which, treated as unsigned, give 237) is... -19. The same will happen for larger numbers of bits. In fact, as long as we use at least 6 bits, and treating the result as signed, we get the same answer: -19.
The mathematical rule - negate, and then subtract one - holds for all inputs, as long as we use enough bits, and treat the result as signed.
And, this being Python, conceptually numbers use an arbitrary number of bits. The implementation will allocate more space automatically, according to what is necessary to represent the number. (For example, if the value would "fit" in one machine word, then only one is used; the data type abstracts the process of sign-extending the number out to infinity.) It also does not have any separate unsigned-integer type; integers simply are signed in Python. (After all, since we aren't in control of the amount of memory used anyway, what's the point in denying access to negative values?)
This breaks intuition for a lot of people coming from a C environment, in which it's arguably best practice to use only unsigned types for bit manipulation and then apply 2s-complement interpretation later (and only if appropriate; if a value is being treated as a group of "flags", then a signed interpretation is unlikely to make sense). Python's implementation of ~, however, is consistent with its other design choices.
How to force unsigned behaviour
If we wanted to get 13, 237 or anything else like that from inverting the bits of 18, we would need some external mechanism to specify how many bits to invert. (Again, 18 conceptually has arbitrarily many leading 0s in its binary representation in an arbitrary number of bits; inverting them would result in something with leading 1s; and interpreting that in 2s complement would give a negative result.)
The simplest approach is to simply mask off those arbitrarily-many bits. To get 13 from inverting 18, we want 5 bits, so we mask with 0b11111, i.e., 31. More generally (and giving the same interface for the original behaviour):
def invert(value, bits=None):
    result = ~value
    return result if bits is None else (result & ((1 << bits) - 1))
Another way, per Andrew Jenkins' answer at the linked example question, is to XOR directly with the mask. Interestingly enough, we can use XOR to handle the default, arbitrary-precision case. We simply use an arbitrary-sized mask, i.e. an integer that conceptually has an arbitrary number of 1 bits in its binary representation - i.e., -1. Thus:
def invert(value, bits=None):
    return value ^ (-1 if bits is None else ((1 << bits) - 1))
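A side-by-side check of the two versions, renamed here to invert_mask and invert_xor so both can coexist (the names are only for this comparison):

```python
def invert_mask(value, bits=None):
    result = ~value
    return result if bits is None else result & ((1 << bits) - 1)


def invert_xor(value, bits=None):
    return value ^ (-1 if bits is None else (1 << bits) - 1)


for bits in (None, 5, 8):
    print(bits, invert_mask(18, bits), invert_xor(18, bits))
# None -19 -19
# 5 13 13
# 8 237 237

# The two differ for negative inputs: XOR leaves the infinite leading 1s in place.
print(invert_mask(-19, 5), invert_xor(-19, 5))  # 18 -14
```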
However, using XOR like this will give strange results for a negative value - because all those arbitrarily-many set bits "before" (in more-significant positions) the XOR mask weren't cleared:
>>> invert(-19, 5) # notice the result is equal to 18 - 32
-14

It's called the binary one's complement operator (~).
It returns the one's complement of a number's binary representation: it flips the bits. Binary for 2 is 00000010; its one's complement is 11111101.
This is binary for -3. So, this results in -3. Similarly, ~1 results in -2.
~-3
Output : 2
Again, one’s complement of -3 is 2.

This is a minor usage of tilde:
def split_train_test_by_id(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio))
    return data.loc[~in_test_set], data.loc[in_test_set]
The code above is from "Hands-On Machine Learning".
Here ~ negates the boolean mask in_test_set. With plain integer indexes, ~ can also be used as an alternative to a negative index, since ~i == -i - 1.
For example:
array = [1,2,3,4,5,6]
print(array[-1])
is the same thing as
print(array[~0])
