Most Significant Byte Calculation - python

I am trying to implement a larger cipher problem, and I am running into an issue I don't quite understand when taking the Most Significant Byte (not bit).
To turn an int into a byte I am using:
def binary(i):
    if i == 0:
        return "0"
    s = ''
    while i:
        if i & 1 == 1:
            s = "1" + s
        else:
            s = "0" + s
        i >>= 1
    return s
I am pretty sure the above is correct; it works with my test numbers. To then extract the Most Significant Byte I am using:
def msb(i):
    a = binary(i)
    b = a[0:7]
    c = int(b, 2)
    return c
However, this seems to return a number half what I would expect. Am I wrong in thinking you can get the most significant byte by just taking the first 8 bits, or am I missing something else silly?

Your example code only gets the seven leading bits, not 8:
def msb(i):
    a = binary(i)
    b = a[0:7]  # gets first SEVEN characters of string a
    c = int(b, 2)
    return c
Change it to a[0:8] to extract 8 leading characters/bits rather than 7.
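With that one-character change, and reusing the binary() helper from the question, it returns the expected byte (a quick check, assuming the input has at least 8 significant bits):

```python
def binary(i):
    # same helper as in the question
    if i == 0:
        return "0"
    s = ''
    while i:
        s = ("1" if i & 1 else "0") + s
        i >>= 1
    return s

def msb(i):
    a = binary(i)
    b = a[0:8]  # first EIGHT characters/bits
    return int(b, 2)

print(msb(0xABCD))  # 171, i.e. 0xAB
```

Note this is only byte-aligned when the bit length happens to be a multiple of 8; otherwise it gives the top eight bits.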

There are much easier ways to do this. For example, if you want the top eight bits (ignoring byte alignment), you can do:
def msb(val):
    return val >> (val.bit_length() - 8)
For the most significant aligned byte, in Python 3 you can do:
def msb(val):
    return val.to_bytes((val.bit_length() + 7) // 8, 'big')[0]
In Py2, you'd have to convert to a hex string and back to match the to_bytes approach.
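For example, a version of that hex-string round trip might look like this (a sketch that happens to run on both Python 2 and 3; the padding handles values with an odd number of hex digits):

```python
def msb(val):
    h = '%x' % val
    if len(h) % 2:          # pad to a whole number of bytes
        h = '0' + h
    return int(h[:2], 16)   # the two leading hex digits are the most significant byte

print(msb(0x1ABC))  # 26, i.e. 0x1A
```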

A byte is 0xFF. You can get the most significant (leftmost) byte by shifting it down and masking:
(i >> (8 * (n_bytes(i) - 1))) & 0xFF  # n_bytes() would have to return the byte length of i
I always get most significant and least significant confused. If you want the rightmost byte, it's even easier:
i & 0xFF
I think that's right, at least ... I'm not sure there's a built-in guaranteed to return the number of bytes ...
Based on your example, I think the second code is what you want. You could also do something like:
import struct
s = struct.pack(">i", i)  # ">" forces big-endian, so s[0] really is the leftmost byte
ord(s[0])   # leftmost (Python 2; in Python 3, s[0] is already an int)
ord(s[-1])  # rightmost

If you want aligned bytes, this should work from Python 2.7 onwards (when int.bit_length() was added):
def msb(val):
    return 0 if val == 0 else val >> (((val.bit_length() - 1) >> 3) << 3)
Or, if you prefer it more readable:
def msb(val):
    if val == 0:
        return 0
    else:
        return val >> (((val.bit_length() - 1) // 8) * 8)
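To make the difference between the two notions concrete, here is a quick check with an arbitrary 13-bit value:

```python
val = 0x1ABC  # 13 significant bits: 1101010111100

# top eight bits, ignoring byte alignment
print(val >> (val.bit_length() - 8))               # 213, i.e. 0b11010101

# most significant aligned byte
print(val >> (((val.bit_length() - 1) // 8) * 8))  # 26, i.e. 0x1A
```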

Related

Why is using integers much slower than using strings in this problem dealing with binary strings?

I solved the following LeetCode problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ using strings (class Solution) and integers (class Solution2). I thought the integer solution would be a lot faster and use much less memory, but it actually takes a lot longer: with n=18 and k=200 it took 10.1 seconds, compared to 0.13 seconds for the string solution.
import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k - 1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i + 1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted += "1" if c == "0" else "0"
        newSn = Sn + "1" + inverted[::-1]
        return newSn

class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n + 1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i + 1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i + 1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn

sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")

start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")
Why is the solution using integers so slow? Are the implementations of big ints faster in other languages compared to strings?
The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.
For example, look at your reverseBits implementation:
def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn
With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.
Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:
def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted += "1" if c == "0" else "0"
    newSn = Sn + "1" + inverted[::-1]
    return newSn
Instead of building a new string and copying all the data for the +=, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.
The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.
The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.
Low Level, High Level
You can get a factor-of-8 speedup just by replacing reverseBits() with:
def reverseBits(self, newSn, inverted, lenSn):
    inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
    return (newSn << lenSn) | inverted
That's linear time. It converts inverted to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed up other overly slow bigint algorithms.
But, as the other answer (which you should accept!) said, it's missing the forest for the trees: the whole approach is off the best track. It's possible, and even with simpler code, to write a function such that all these run blazing fast (O(n) worst case per call), and require only trivial working memory:
>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'
There's not a computer in the world with enough memory to build a string with 2**500-1 characters (or a bigint with that many bits).
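For what it's worth, one shape such a function might take (a sketch, not necessarily the author's version): walk down from Sn toward S1, mirroring k out of the reversed-and-inverted tail and counting how many inversions apply to the final bit.

```python
def findKthBit(n: int, k: int) -> str:
    # Sn = S(n-1) + "1" + reverse(invert(S(n-1))), so len(Sn) == 2**n - 1
    flips = 0                      # number of inversions applied to the bit
    while k > 1:
        length = (1 << n) - 1
        mid = (length + 1) // 2    # position of the middle '1'
        if k == mid:
            return str(1 ^ (flips & 1))
        if k > mid:
            k = length + 1 - k     # mirror into the first half...
            flips += 1             # ...which appears inverted in the tail
        n -= 1
    return str(flips & 1)          # S1 == "0", flipped `flips` times

print(findKthBit(18, 200))
```

Each loop iteration only decrements n, so a call is O(n) with O(1) extra memory, which matches the claims above.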

Extended fields when creating a Scapy Layer (chained bytes)

I am trying to create a Layer in Scapy for a protocol which has a custom field type called "Extended Field".
The principle is quite simple but I struggle with implementation.
Principle is:
First byte of the field is at a known position followed by a variable number of bytes.
The number of bytes is not defined anywhere in the frame.
If the LSB of a byte is "1", then the following byte is part of the field.
If the LSB of a byte is "0", then it is the end of the field.
The result is a bit field concatenating the 7 MSB bits of each byte.
I made a picture to make it simpler:
Extended Field Description
I have read a lot of stuff about variable length fields in Scapy but as far as I understand this does not cover this case.
Do you think it can be implemented in a Scapy Layer? Any help would be appreciated.
OK, I'm answering myself after a bit of digging into Scapy.
Here is a solution that works (maybe not optimal, but good enough for me):
from scapy.all import *

class LSBExtendedField(Field):
    """
    LSB Extended Field
    ------------------
    This type of field has a variable number of bytes. Each byte is defined as follows:
    - The 7 MSB bits are data
    - The LSB is an extension bit:
      * 0 means it is the last byte of the field ("stopping bit")
      * 1 means there is another byte after this one ("forwarding bit")
    To get the actual data, walk the binary data byte by byte, checking the LSB,
    until it is 0.
    """

    def str2extended(self, l=""):
        """Converts bytes to field"""
        s = []
        end = b""  # remainder after the field (in case the input has no stopping bit)
        # First bit is the stopping bit at zero
        bits = 0b0
        # Then we retrieve 7 bits. If the "forwarding bit" is 1, we continue on another byte
        i = 0
        for c in l:
            s.append(hex(c & 0xfe))
            bits = bits << 7 | (int(c) >> 1)
            if not int(c) & 0b1:
                end = l[i + 1:]
                break
            i = i + 1
        return end, bits

    def extended2str(self, l):
        """Converts field to bytes"""
        l = int(l)
        s = []
        # First bit is the stopping bit at zero
        bits = 0b0
        # Then we group bits 7 by 7, adding the "forwarding bit" if necessary
        i = 1
        while l > 0:
            if i % 8 == 0:
                s.append(bits)
                bits = 0b1
                i = 0
            else:
                bits = bits | (l & 0b1) << i
                l = l >> 1
            i = i + 1
        s.append(bits)
        s.reverse()
        result = "".encode()
        for x in s:
            result = result + struct.pack(">B", x)
        return result

    def i2m(self, pkt, x):
        return self.extended2str(x)

    def m2i(self, pkt, x):
        return self.str2extended(x)[1]

    def addfield(self, pkt, s, val):
        return s + self.i2m(pkt, val)

    def getfield(self, pkt, s):
        return self.str2extended(s)
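Outside of Scapy, the same wire format (7 data bits per byte, MSB-first, continuation flag in the LSB) can be sketched as a standalone pair of helpers; the names here are made up for illustration:

```python
def encode_lsb_extended(value: int) -> bytes:
    """Pack an int: 7 data bits per byte, LSB=1 means another byte follows."""
    groups = []
    while True:
        groups.append(value & 0x7F)   # take 7 bits at a time, low to high
        value >>= 7
        if not value:
            break
    groups.reverse()                  # most significant group goes on the wire first
    out = bytearray()
    for idx, g in enumerate(groups):
        cont = 1 if idx < len(groups) - 1 else 0
        out.append((g << 1) | cont)   # data in the 7 MSBs, flag in the LSB
    return bytes(out)

def decode_lsb_extended(data: bytes):
    """Return (value, remaining bytes after the field)."""
    value = 0
    for i, b in enumerate(data):
        value = (value << 7) | (b >> 1)
        if not b & 1:                 # stopping bit found
            return value, data[i + 1:]
    raise ValueError("no stopping byte found")

print(decode_lsb_extended(encode_lsb_extended(300) + b"tail"))  # (300, b'tail')
```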

Checking specific bits of a bitmask

I am working with bitmasks in Python. As far as I know, these are arrays of integers which, when unpacked into binary format, tell you which of the 32 bits are set (= 1) for a given element of the array.
I would like to know the fastest way to check whether 4 specific bits are set or not for any element of such an array. I do not care about the rest. I have tried the following solution, but it is not fast enough for my purpose:
import numpy as np  # assumed by the code below

def detect(bitmask, check=(18, 22, 23, 24), bits=32):
    boolmask = np.zeros(len(bitmask), dtype=bool)
    for i, val in enumerate(bitmask):
        bithost = np.zeros(bits, dtype='i1')
        masklist = list(bin(val)[2:])
        bithost[:len(masklist)] = np.flip(masklist, axis=0)
        if len(np.intersect1d(np.nonzero(bithost)[0], check)) != 0:
            boolmask[i] = True
        else:
            boolmask[i] = False
    if any(boolmask):
        print("There are some problems")
    else:
        print("It is clean")
For example, if a given bitmask contains the integer 24453656 (1011101010010001000011000 in binary), the output of function detect would be "There are some problems" since bit 22 is set:
bit: ... 20, 21, 22, 23, 24,...
mask: ... 0, 0, 1, 0, 0,...
Any ideas on how to improve the performance?
Integers are nothing but sequences of bits in the computer.
So, if you get an integer, let's say 333, it is the sequence of bits 101001101 to the computer. It doesn't need any unpacking into bits. It is bits.
Therefore, if the mask is also an integer, you don't need any unpacking; just apply bitwise operations to it. Check Wikipedia for details of how these work.
In order to check if ANY of the bits in xyz are set in an integer abc, you do (abc & xyz) > 0. If you absolutely need the check mask to be a tuple of bit places, you do some packing, like this:
def detect(bitmask, check=(18, 22, 23, 24)):
    checkmask = sum(2**x for x in check)
    if (bitmask & checkmask) > 0:
        print("There are some problems")
    else:
        print("Everything OK")
Note that bitmasks start with 0 based bit indices. First bit is bit 0.
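Since the question's bitmask is an array, the same single-& idea vectorizes directly with NumPy, with no Python-level loop (a sketch with made-up sample values):

```python
import numpy as np

bitmask = np.array([24453656, 0, 7])
check = (18, 22, 23, 24)
checkmask = sum(1 << b for b in check)  # 0-based bit positions, as above
boolmask = (bitmask & checkmask) != 0   # one bitwise AND over the whole array
print(boolmask)                         # [ True False False]
```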
I am not sure what's in your bitmask argument. Regardless, you should probably use bitwise operators.
Make a bit mask like this:
def turn_bits_on(bits):
    n = 0
    for k in bits:
        n = (n | (1 << (k - 1))) if k > 0 else n
    return n

bits_to_check = turn_bits_on([18, 22, 23, 24])
Then, for a single number, you can detect with:
def is_ok(value, mask):
    return not (value & mask)

print(is_ok(24453656, bits_to_check))
Finally, depending on what your bitmask value is (a list, a DataFrame, etc), apply the is_ok() function to each value.
Hope this helps!

read single bit operation python 2.6

I am trying to read a single bit in a binary string but can't seem to get it to work properly. I read in a value, then convert it to a 32-bit string. From there I need to read a specific bit in the string, but it's not always the same one. The getBin function returns a 32-bit string with leading 0's. The code I have always returns a 1, even if the bit is a 0. Code example:
slot=195035377
getBin = lambda x, n: x >= 0 and str(bin(x))[2:].zfill(n) or "-" + str(bin(x))[3:].zfill(n)
bits = getBin(slot,32)
bit = (bits and (1 * (2 ** y)) != 0)
print("bit: %i\n"%(bit))
in this example bits = 00001011101000000000000011110001
and if I am looking for bit 3, which is a 0, bit will be equal to 1. Any ideas?
To test for specific bits in an integer value, use the & bitwise operator; no need to convert to a binary string.
if slot & (1 << 3):
    print 'bit 3 is set'
else:
    print 'bit 3 is not set'
The above code shifts a test bit three positions to the left. Alternatively, shift slot to the right 3 times:
if (slot >> 3) & 1:
To make this generic for any bit position, subtract 1:
if slot & (1 << (bitpos - 1)):
    print 'bit {} is set'.format(bitpos)
or
if (slot >> (bitpos - 1)) & 1:
Your binary formatting code is overly verbose. Just use the format() function to create a binary string representation:
format(slot, '032b')
formats your binary value to a 0-padded 32-character binary string.
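Putting both pieces together for the question's value (the printed string below is what format() actually produces for 195035377):

```python
slot = 195035377
print(format(slot, '032b'))  # 00001011101000000000000011110001

for bitpos in (1, 2, 3, 4):  # 1-based positions, counted from the LSB
    print(bitpos, bool(slot & (1 << (bitpos - 1))))
```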
n = 223
bitpos = 3
bit3 = (n >> (bitpos - 1)) & 1
is how you should be doing it ... don't use strings!
You can just use indexing to get the correct digit.
bits = getBin(slot, 32)
bit = int(bits[bit_location - 1])  # 1-based index, counted from the left (MSB side)
print("bit: %i\n" % (bit))

How to get the signed integer value of a long in python?

If lv stores a long value, and the machine is 32 bits, the following code:
iv = int(lv & 0xffffffff)
results in an iv of type long, instead of the machine's int.
How can I get the (signed) int value in this case?
import ctypes

number = lv & 0xFFFFFFFF
signed_number = ctypes.c_int32(number).value  # c_int32 is 32 bits everywhere; c_long may be 64
You're working in a high-level scripting language; by nature, the native data types of the system you're running on aren't visible. You can't cast to a native signed int with code like this.
If you know that you want the value converted to a 32-bit signed integer--regardless of the platform--you can just do the conversion with the simple math:
iv = 0xDEADBEEF
if iv & 0x80000000:
    iv = -0x100000000 + iv
Essentially, the problem is to sign extend from 32 bits to... an infinite number of bits, because Python has arbitrarily large integers. Normally, sign extension is done automatically by CPU instructions when casting, so it's interesting that this is harder in Python than it would be in, say, C.
By playing around, I found something similar to BreizhGatch's function, but that doesn't require a conditional statement. n & 0x80000000 extracts the 32-bit sign bit; then, the - keeps the same 32-bit representation but sign-extends it; finally, the extended sign bits are set on n.
def toSigned32(n):
    n = n & 0xffffffff
    return n | (-(n & 0x80000000))
Bit Twiddling Hacks suggests another solution that perhaps works more generally. n ^ 0x80000000 flips the 32-bit sign bit; then - 0x80000000 will sign-extend the opposite bit. Another way to think about it is that initially, negative numbers are above positive numbers (separated by 0x80000000); the ^ swaps their positions; then the - shifts negative numbers to below 0.
def toSigned32(n):
    n = n & 0xffffffff
    return (n ^ 0x80000000) - 0x80000000
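A quick check that the two variants agree (repeated here under distinct names so both can run side by side):

```python
def toSigned32_or(n):
    n = n & 0xffffffff
    return n | (-(n & 0x80000000))

def toSigned32_xor(n):
    n = n & 0xffffffff
    return (n ^ 0x80000000) - 0x80000000

for n in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF):
    assert toSigned32_or(n) == toSigned32_xor(n)

print(toSigned32_xor(0xDEADBEEF))  # -559038737
```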
Can I suggest this:
def getSignedNumber(number, bitLength):
    mask = (2 ** bitLength) - 1
    if number & (1 << (bitLength - 1)):
        return number | ~mask
    else:
        return number & mask

print iv, '->', getSignedNumber(iv, 32)
You may use struct library to convert values like that. It's ugly, but works:
from struct import pack, unpack
signed = unpack('=l', pack('=L', lv & 0xffffffff))[0]  # '=' forces the standard 4-byte sizes
A quick and dirty solution (x is never greater than 32-bit in my case).
if x > 0x7fffffff:
    x = x - 4294967296
If you know how many bits are in the original value, e.g. byte or multibyte values from an I2C sensor, then you can do the standard Two's Complement conversion:
def TwosComp8(n):
    return n - 0x100 if n & 0x80 else n

def TwosComp16(n):
    return n - 0x10000 if n & 0x8000 else n

def TwosComp32(n):
    return n - 0x100000000 if n & 0x80000000 else n
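For example, with a 16-bit reading from a hypothetical sensor (the helper is repeated so the snippet runs on its own):

```python
def TwosComp16(n):
    return n - 0x10000 if n & 0x8000 else n

print(TwosComp16(0xFF38))  # -200 (sign bit set)
print(TwosComp16(0x0200))  # 512 (positive values pass through)
```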
In case the hexadecimal representation of the number is 4 bytes, this solves the problem:
def B2T_32(x):
    num = int(x, 16)
    if num & 0x80000000:  # if it has the negative sign bit (MSB = 1)
        num -= 0x80000000 * 2
    return num

print(B2T_32(input("enter an input as a hex value\n")))
The simplest solution, for any bit length of number.
Why is the representation of a signed integer so difficult for the human mind to understand? Because this is the idea of machines. :-)
Let's explain.
Suppose we have a bidirectional 7-bit counter with the initial state
000 0000
and we get a pulse on the count-down input. Then the next number counted will be
111 1111
And the people said:
Hey counter, we need to know that this is a negative reload. You should add a sign letting us know about this.
And the counter added:
1111 1111
And the people asked:
How are we going to calculate that this is -1?
The counter replied: Find a number one greater than the reading, subtract it, and you get the result.
   1111 1111
- 10000 0000
____________
   (dec)  -1
def sigIntFromHex(a):  # e.g. a = 0xffe1
    if a & (1 << (a.bit_length() - 1)):   # check if the highest bit is 1 (& with 0x8000 here)
        return a - (1 << a.bit_length())  # 0xffe1 - 0x10000
    else:
        return a

# or, more elegantly:
def sigIntFromHex(a):
    return a - (1 << a.bit_length()) if a & (1 << (a.bit_length() - 1)) else a

b = 0xFFE1
print(sigIntFromHex(b))
I hope I helped
