trouble with grabbing tag bits - python

I'm implementing a direct-mapped cache in Python. Each line in the cache contains 4 bytes. For some reason I'm having trouble pulling out the first (in this case) 27 bits, and also the last 5 bits, using bit shifting.
I'm not sure what exactly I'm doing wrong with the bit shifting, but nothing I've tried gives me the bits I want. For now I'm using a sort of "hard-coded" solution: converting the integer stored in the cache to a bit string and using Python's string indexing to get the first 27 bits. I do want to know how to do it via bit shifting, though.
def getTag(d_bytes):
    b = bin(d_bytes)
    b = b[2:]
    return (b[0:27])
Is the hard-coded solution I'm referring to.
If the value stored in cache is
0b11010101010101010000100010001
I would like to have a tag of:
110101010101010100001000 (the first 27 bits, as tag = line size - index - offset)
An index of:
100 - next 3 bits following tag
and an offset of:
01 - the last two bits

You can extract the bits by masking and shifting.
To get the lowest n bits, the mask to use is 000011..(n times)..11. This mask can be generated with (1<<n)-1, i.e. the number 2^n-1, whose binary representation is exactly the mask we want.
Now if you want to extract a bitfield at any position in your word, you first have to shift it right to the proper position, then apply the mask.
So for your problem, you can use
# extract n bits of x starting at position m (bit 0 is the least significant bit)
def getfield(x, n, m):
    r = x >> m                 # shift right so the lsb of the bitfield is at position 0
    return r & ((1 << n) - 1)  # then mask to extract n bits

# x is the 32-bit value to split
lsb27 = getfield(x, 27, 0)  # get bits x[26:0]
msb5 = getfield(x, 5, 27)   # get bits x[31:27]
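Applied to the example value from the question, the same shift-and-mask helper could split the word into tag, index and offset. This is only a sketch: the field widths (24, 3 and 2 bits) are read off the example bit strings rather than stated by the asker, and the names x, tag, index and offset are illustrative.

```python
def getfield(x, n, m):
    """Return n bits of x starting at bit position m (bit 0 = least significant)."""
    return (x >> m) & ((1 << n) - 1)


# Example value from the question (29 significant bits).
x = 0b11010101010101010000100010001

# Assumed layout, read off the example: | tag (24) | index (3) | offset (2) |
offset = getfield(x, 2, 0)   # lowest two bits
index = getfield(x, 3, 2)    # next three bits
tag = getfield(x, 24, 5)     # everything above index and offset

print(format(tag, "024b"), format(index, "03b"), format(offset, "02b"))
```

For a different cache geometry only the widths and start positions change; the helper stays the same.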

Related

How can I read, manipulate, and write a range of (less than 8) bits of a byte array in Python?

I'm looking for a way to read, manipulate, and write specific ranges of (less than 8) bits in a byte array.
I want something like, let's say I want to work with the 6-8th bit of a byte array:
R0 = bytearray(8)
bits = readBits(R0, 6, 2) # bytearray, position, range
print(bits) # [0, 0] or maybe it's easier to work with a byte
# with the irrelevant bits zeroed, you tell me
# do stuff ... XOR, AND, ETC
writeBits(R0, bits, 6) # target, source, position
I'm writing an x86-64 emulator in Python and some instructions in the General Purpose Registers can reference 2 bit segments of the 64 bit register, which I've been representing as bytearray(8). The smallest unit of a byte array seems to be a byte (go figure). Is there a way to select bits? Let's say I want to read, manipulate, and update the 6th-8th bits of an 8 byte object? What does that look like?
Maybe there's a more granular data structure I should be using rather than a byte array?
One answer to a similar question working with a hex string suggests:
i = int("140900793d002327", 16)
# getting bit at position 28 (counting from 0 from right)
i >> 28 & 1
# getting bits at position 24-26
bin(i >> 24 & 0b111)
But this code isn't explained very well. For example, what does 0b111 do here, and how do I use this approach dynamically to isolate any desired range of bits, as with my imaginary readBits and writeBits functions, rather than the hard-coded functionality here?
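One way to make the shift-and-mask idea dynamic is to treat the whole byte array as a single integer. The sketch below uses hypothetical helpers read_bits and write_bits (not a library API). It assumes bit 0 is the least significant bit of the first byte (little-endian order) and returns the bits as a plain int rather than a list. In the quoted snippet, 0b111 plays the same role as ((1 << count) - 1) here: it is a mask that keeps the lowest three bits after the shift.

```python
def read_bits(buf, pos, count):
    """Return `count` bits of `buf` starting at bit `pos`, as an int."""
    value = int.from_bytes(buf, "little")
    return (value >> pos) & ((1 << count) - 1)


def write_bits(buf, bits, pos, count):
    """Overwrite `count` bits of `buf` starting at bit `pos` with `bits`, in place."""
    mask = ((1 << count) - 1) << pos
    value = int.from_bytes(buf, "little")
    value = (value & ~mask) | ((bits << pos) & mask)   # clear the field, then set it
    buf[:] = value.to_bytes(len(buf), "little")


R0 = bytearray(8)
write_bits(R0, 0b11, 6, 2)       # set bits 6 and 7
bits = read_bits(R0, 6, 2)
write_bits(R0, bits ^ 0b01, 6, 2)  # manipulate, then write back
print(R0.hex())
```

Converting the whole buffer on every call keeps the code short; for a register that is read and written constantly, storing it as an int in the first place may be simpler.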

Creating a unique short ID from string value

I have some data that has unique IDs stored as a string in the form:
ddd.dddddaddd.dddddz
Where d is some digit and a/z is some alphabet character. The digits may be 0-9 and the characters are either E or W for the a and N or S for the z.
I'd like to turn this into a unique integer and what I've tried using the hashlib module returns:
>>> int(hashlib.sha256(str.encode(s)).hexdigest(), 16)
Output: a very long integer (on another system cannot copy it)
Is there a way to generate a unique integer ID from a string so that it does not exceed 12 digits? I know that I will never need a unique integer ID beyond 12 digits.
Just something simple:
>>> s = '123.45678W123.45678S'
>>> int(s.translate(str.maketrans('EWNS', '1234', '.')))
123456782123456784
Not the impossible 12 digits you're still asking for in the question, but under the 20 digits you allowed in the comments.
As you are dealing with coordinates, I would try my best to keep the information in the final 12-digit ID.
If your points are global, it might be necessary to keep the degrees but they may be widespread, so you can sacrifice some information when it comes to precision.
If your points are local (all within a range of less than 10 degrees) you might skip the first two digits of the degrees and focus on the decimals.
As it may be possible that two points are close to each other, it may be prudent to reserve one digit as a serial number.
Proposal for widespread points:
s = "123.45678E123.45678N"
ident = "".join([s[0:6],s[10:16]]).replace(".","")
q = 0
if s[9]=="E":
    q+=1
if s[-1]=="N":
    q+=2
ident+=str(q)+'0'
The example would translate to 123451234530.
After computing the initial ident numbers for each ID, you should loop through them and increment the last digit if an ident is already taken.
This way you could easily reconstruct the location from the ID by just separating the first 10 digits to two degrees of the format ddd.dd and use the [-2] digit as an indicator of the quadrant (0:SW, 1:SE, 2:NW, 3:NE).
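The proposal above, including the serial-number step, might be sketched as follows. It assumes the format from the question (E/W at index 9, N/S as the last character) and the quadrant coding 0:SW, 1:SE, 2:NW, 3:NE. The names make_idents and decode_ident are hypothetical, and at most ten points may share the same truncated coordinates.

```python
def make_idents(strings):
    """Map each coordinate string to an integer ID of at most 12 digits."""
    taken = set()
    idents = {}
    for s in strings:
        digits = (s[0:6] + s[10:16]).replace(".", "")   # ddd.dd + ddd.dd -> 10 digits
        q = (1 if s[9] == "E" else 0) + (2 if s[-1] == "N" else 0)
        base = int(digits + str(q) + "0")
        for serial in range(10):            # the last digit is the serial number
            if base + serial not in taken:
                taken.add(base + serial)
                idents[s] = base + serial
                break
        else:
            raise ValueError("more than 10 points share the ID prefix of " + s)
    return idents


def decode_ident(ident):
    """Recover (first value, second value, quadrant) from an ID."""
    text = str(ident).zfill(12)             # restore leading zeros lost by int()
    first = float(text[0:3] + "." + text[3:5])
    second = float(text[5:8] + "." + text[8:10])
    quadrant = ["SW", "SE", "NW", "NE"][int(text[10])]
    return first, second, quadrant
```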

Python: mask/remove the least significant 2 bits of every 16-bit integer

I want to remove the least significant 2 bits of every 16-bit integer from a bitarray. They're stored like this:
010101**00**10010101101100**00**10101010.....
(The zeroes between the asterisks will be removed. There are two of them every 16 bits (ignoring the very first)).
I can simply eliminate them with a regular for loop checking indexes (the 7th and 8th after every 16 bits).
But... is there another more pythonic way to do this? I'm thinking about some slice notation or maybe comprehension lists. Perhaps I could divide every number by 4 and encode every one with 14 bits (if there's a way to do that).
You can clear bits quite easily with masking. If you want to clear bits 8 and 7 you can do it like this:
a = int('10010101101100',2)
mask = ~((1 << 7) | (1 << 8))
bin(a&mask)
more information about masking from here!
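Masking only clears the bits; each value stays 16 bits wide. If the goal is to actually drop the two low bits and re-encode each value in 14 bits, as the question suggests with "divide every number by 4", a right shift per value would do it. A minimal sketch, assuming the data is already available as a list of 16-bit integers and that a string of '0'/'1' characters is an acceptable output; drop_low_bits and pack_bits are hypothetical helper names.

```python
def drop_low_bits(values, drop=2, width=16):
    """Shift each `width`-bit value right by `drop` bits (same as dividing by 4 for drop=2)."""
    limit = (1 << width) - 1
    return [(v & limit) >> drop for v in values]


def pack_bits(values, width):
    """Concatenate the values as fixed-width binary strings."""
    return "".join(format(v, "0{}b".format(width)) for v in values)


words = [0b0101010010010101, 0b1111111111111111]
shortened = drop_low_bits(words)
print(pack_bits(shortened, 14))
```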

How to 'and' data without ignoring digits?

Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7 or not (which it is).
I do not want to use indexing. I want to somehow AND the number with a flag mask, such as this:
00070000
I would normally use np.logical_and() or something like that, but that will consider any positive value to be True. How can I AND while taking the value of a digit into account? For example, performing the operation with
flags = 18573628
and
mask = 00070000
would yield 00010000
though trying a different mask, such as
mask = 00040000
would yield 00000000
What you can do is
if (x // 10**n % 10) == y:
    ...
to check whether the n-th digit of x (counting from zero at the right) is equal to y.
You have to use division and modulo for a decimal mask:
flags = 18573628
mask = 10000
if (flags // mask) % 10 == 7:  # integer division; / would give a float in Python 3
    do_something
You can convert the input number into an array of its digits; indexing into that array then gives the digit(s) you want. For the conversion we can use np.frombuffer (np.fromstring, used in older code for this, is deprecated for binary data), like so -
In [87]: nums = np.frombuffer(str(18573628).encode(), dtype=np.uint8) - 48
In [88]: nums
Out[88]: array([1, 8, 5, 7, 3, 6, 2, 8], dtype=uint8)
In [89]: nums[3] == 7
Out[89]: True
Say I have a number, 18573628, where each digit represents some kind of flag, and I want to check if the value of the fourth flag is set to 7
Firstly, bitwise operations like & are bit-wise, which is to say they operate on base-2 digits. They don't operate naturally on digits of any other base, although bases which are themselves powers of 2 work out ok.
To stick with bit-wise operations
You need to know how many values each flag can take, to figure out how many bits each flag needs to encode.
If you want to allow each flag the values zero to nine, you need four bits. However, in this scheme, your number won't behave like a normal integer (storing a base-10 digit in each 4-bit group is called Binary Coded Decimal).
The reason it won't behave like a normal integer is that flag values 1,2,3 will be stored as 1 * 16**2 + 2*16 + 3 instead of the 1 * 10**2 + 2*10 + 3 you'd normally expect. So you'd need to write some code to support this use. However, extracting flag n (counting from zero at the right) just becomes
def bcdFlagValue(bcd, flagnum):
    # flag 0 is the rightmost 4-bit group, so shift by 4 bits per flag
    return 0x0F & (bcd >> (flagnum * 4))
If you actually need a different range of values for each flag, you need to choose the correct number of bits, and adjust the shift and mask values appropriately.
In either case, you'll need a helper function if you want to print your flags as the base-10 number you showed.
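A minimal sketch of such helpers, assuming every flag is a decimal digit stored in its own 4-bit group; to_bcd, bcd_flag and bcd_to_str are hypothetical names, not taken from any library.

```python
def to_bcd(number):
    """Pack the decimal digits of a non-negative int into 4-bit groups."""
    bcd = 0
    for position, digit in enumerate(reversed(str(number))):
        bcd |= int(digit) << (4 * position)
    return bcd


def bcd_flag(bcd, flagnum):
    """Return flag `flagnum`, counting from zero at the right."""
    return (bcd >> (4 * flagnum)) & 0x0F


def bcd_to_str(bcd):
    """Show the flags as the base-10 digit string they encode."""
    return format(bcd, "x")   # works because every 4-bit group holds 0-9


flags = to_bcd(18573628)
print(bcd_to_str(flags))
print(bcd_flag(flags, 4) == 7)
print(hex(flags & 0x000F0000))   # a real bitwise mask now isolates one flag
```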
To use normal base 10 numbers
You need to use division and modulo (as 6502 showed), because base-10 digits don't fit evenly into base-2 bits, so simple bit operations don't work.
Note
The BCD approach saves space at the cost of complexity, effort and some speed - from subsequent comments, it's probably simpler to just use the string of digit characters directly unless you really need to save 4 bits per digit.
If flags and mask are hexadecimal values, you can do:
flags = int("18573628", 16)
mask = int("00070000", 16)
result = flags & mask
print(hex(result))
=> 0x70000
Without dealing with the particulars of your case (the SDSS data, which should be documented in the product specification), let's look at some options.
First, you need to know whether it is to be read in big-endian or little-endian order (is the first bit to the right or to the left). Then you need to know the size of each flag. For a series of yes-no parameters, it could simply be 1 bit (0 or 1). For up to four options, it could be two bits (00, 01, 10, 11), etc. It is also possible that some combinations are reserved for future expansion, don't currently have meaning, and should not be expected to occur in the data. I've also seen instances where the flag size varies, so the first n bits refer to parameter x, the next m bits refer to parameter y, etc.
There is a good explanation of the concept as part of Landsat-8 satellite imagery:
http://landsat.usgs.gov/qualityband.php
To read the values, you convert the base 10 integer to binary, and traverse it in the specified chunks, converting back to int to obtain the parameter values according to your product specification.
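As a rough illustration of traversing an integer in chunks, the sketch below walks a field layout from the least significant bit upward. The layout and its field names are made up for the example; the real widths and bit order have to come from the product specification. unpack_fields is a hypothetical helper name.

```python
def unpack_fields(value, layout):
    """Split `value` into named fields.

    `layout` is a list of (name, width_in_bits) pairs, starting at the
    least significant bit.
    """
    fields = {}
    shift = 0
    for name, width in layout:
        fields[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return fields


# Hypothetical layout: a 1-bit flag, then a 2-bit field, then a 4-bit field.
layout = [("fill", 1), ("cloud", 2), ("snow", 4)]
print(unpack_fields(0b1101101, layout))
```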

PNG Chunk type-code Bit #5

I'm trying to write my own little PNG reader in Python. There is something in the documentation I don't quite understand. In chapter 3.3 (where chunks are handled) it says:
Four bits of the type code, namely bit 5 (value 32) of each byte, are used to convey chunk properties. This
choice means that a human can read off the assigned properties according to whether each letter of the type
code is uppercase (bit 5 is 0) or lowercase (bit 5 is 1). However, decoders should test the properties of an unknown
chunk by numerically testing the specified bits; testing whether a character is uppercase or lowercase
is inefficient, and even incorrect if a locale-specific case definition is used.
Ok, so it explicitly denotes one should not test whether a byte is uppercase or lowercase. Then, how do I check that bit 5?
Furthermore, the documentation states
Ancillary bit: bit 5 of first byte
0 (uppercase) = critical, 1 (lowercase) = ancillary.
I have the following function to convert an integer to a bit-stream:
def bits(x, n):
    """ Convert an integer value *x* to a sequence of *n* bits as a string. """
    return ''.join(str([0, 1][x >> i & 1]) for i in range(n - 1, -1, -1))
Just for example, take the sRGB chunk. The lowercase s denotes the chunk is ancillary. But comparing the bit-streams of an uppercase S and lowercase s
01110011
01010011
we can see that bit #5 is zero in both cases.
I think I have a wrong understanding of how the bits are counted. As the only bit that changes is the third one from the left (i.e. index 2), I assume this is the bit I'm looking for? It is also the 6th bit from the right, with index 5 (counting from the right, of course). Is this what I'm looking for?
Python does have bitwise manipulation. You are doing it the hard way, when they already gave you the bitmask (32 or 0x20).
is_critical = (type_code & 0x20) == 0
or, equivalently:
is_critical = (type_code & (0x1 << 5)) == 0
(with extra parentheses for clarity)
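Extending this to all four property bits of a chunk type could look like the sketch below. It assumes Python 3, where iterating over a bytes object yields integers, and takes the meaning of the four bytes (ancillary, private, reserved, safe-to-copy) from the PNG specification; chunk_properties is a hypothetical helper name.

```python
def chunk_properties(chunk_type):
    """Return the four property bits of a 4-byte PNG chunk type such as b"sRGB"."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    names = ("ancillary", "private", "reserved", "safe_to_copy")
    # Bit 5 (value 0x20) of each byte carries one property.
    return {name: bool(byte & 0x20) for name, byte in zip(names, chunk_type)}


props = chunk_properties(b"sRGB")
print(props["ancillary"])        # the lowercase first letter means ancillary
is_critical = not props["ancillary"]
```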
