Extended fields when creating a Scapy Layer (chained bytes) - python

I am trying to create a Layer in Scapy for a protocol which has a custom field type called "Extended Field".
The principle is quite simple, but I am struggling with the implementation.
The principle is:
The first byte of the field is at a known position and is followed by a variable number of bytes.
The number of bytes is not defined anywhere in the frame.
If the LSB of a byte is "1", then the following byte is part of the field.
If the LSB of a byte is "0", then it is the end of the field.
The result is a bit field made by concatenating the 7 MSBs of each byte.
I made a picture to make it simpler:
[Image: Extended Field Description]
I have read a lot about variable-length fields in Scapy, but as far as I understand, none of it covers this case.
Do you think it can be implemented in a Scapy Layer? Any help would be appreciated.
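The decoding and encoding rules above can be sketched in plain Python, independent of Scapy. This is a minimal sketch: `decode_extended` and `encode_extended` are hypothetical helper names, and it assumes that the first byte on the wire carries the most significant 7 data bits (the same order the answer's code uses) and that a missing stopping bit is an error.

```python
# Hypothetical helpers illustrating the LSB-extended encoding; not part of Scapy.

def decode_extended(data):
    """Decode one LSB-extended field from the start of `data`.

    Returns (value, remaining_bytes). Raises ValueError if no byte
    with a 0 LSB ("stopping bit") is found.
    """
    value = 0
    for i, byte in enumerate(data):
        # The 7 MSBs are data: append them to the accumulated value
        value = (value << 7) | (byte >> 1)
        # LSB == 0 marks the last byte of the field
        if byte & 1 == 0:
            return value, data[i + 1:]
    raise ValueError("truncated extended field: no stopping bit")


def encode_extended(value):
    """Encode a non-negative int as an LSB-extended field."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Last byte: 7 data bits followed by the stopping bit (0)
    out = [(value & 0x7F) << 1]
    value >>= 7
    while value:
        # Earlier bytes: 7 data bits followed by the forwarding bit (1)
        out.append(((value & 0x7F) << 1) | 1)
        value >>= 7
    out.reverse()
    return bytes(out)


# Round trip, with a trailing byte that belongs to the next field
raw = encode_extended(300) + b"\xff"
print(raw.hex())             # 0558ff
print(decode_extended(raw))  # (300, b'\xff')
```

Keeping the decoder's "remaining bytes" return value mirrors how a Scapy field's getfield hands the rest of the buffer to the next field.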

OK, I'm answering my own question after a bit of digging into Scapy.
Here is a solution that works (maybe not optimal, but enough for me):
from scapy.all import *

class LSBExtendedField(Field):
    """
    LSB Extended Field
    ------------------
    This type of field has a variable number of bytes. Each byte is defined as follows:
    - The 7 MSB bits are data
    - The LSB is an extension bit
      * 0 means it is the last byte of the field ("stopping bit")
      * 1 means there is another byte after this one ("forwarding bit")
    To get the actual data, it is necessary to walk the binary data byte by byte, checking the LSB until it is 0.
    """

    def str2extended(self, l=b""):
        """Converts bytes to field: returns (remaining bytes, value)."""
        bits = 0
        # Retrieve 7 bits per byte. If the "forwarding bit" is 1, continue with the next byte
        for i, c in enumerate(l):
            bits = bits << 7 | (c >> 1)
            if not c & 0b1:
                # "Stopping bit": the field ends with this byte
                return l[i + 1:], bits
        # No stopping bit found: the field consumed the whole buffer
        return b"", bits

    def extended2str(self, l):
        """Converts field to bytes."""
        l = int(l)
        s = []
        # The last byte has the stopping bit (0) as its LSB
        bits = 0b0
        # Then group the bits 7 by 7, adding the "forwarding bit" to each earlier byte
        i = 1
        while l > 0:
            if i % 8 == 0:
                s.append(bits)
                bits = 0b1
                i = 0
            else:
                bits = bits | (l & 0b1) << i
                l = l >> 1
            i = i + 1
        s.append(bits)
        s.reverse()
        return bytes(s)

    def i2m(self, pkt, x):
        return self.extended2str(x)

    def m2i(self, pkt, x):
        return self.str2extended(x)[1]

    def addfield(self, pkt, s, val):
        return s + self.i2m(pkt, val)

    def getfield(self, pkt, s):
        return self.str2extended(s)


Counting the number of leading zero bits in a SHA-256 hash

I'm having trouble counting the number of leading zero bits in the output of a sha256 hash function, as I don't have much experience with 'low-level' stuff in Python:
import hashlib
hex = hashlib.sha256((some_data_from_file).encode('ascii')).hexdigest()
# hex = 0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f
# this has 20 leading zero bits
hex_digits = int.from_bytes(bytes(hex.encode('ascii')), 'big') # convert str to int
# count num of leading zeroes
def countZeros(x):
    total_bits = 256
    res = 0
    while ((x & (1 << (total_bits - 1))) == 0):
        x = (x << 1)
        res += 1
    return res
print(countZeros(hex_digits)) # returns 2
I've also tried converting it with bin(), but that didn't give me any leading zeros.
Instead of getting the hex digest and analyzing that hex string, you could just get the digest, interpret it as an int, ask for its bit-length, and subtract that from 256:
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Demo:
import hashlib
some_data_from_file = '665782'
# Show hex digest for clarity
hex = hashlib.sha256(some_data_from_file.encode('ascii')).hexdigest()
print(hex)
# Show number of leading zero bits
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Output:
0000000399c6aea5ad0c709a9bc331a3ed6494702bd1d129d8c817a0257a1462
30
Benchmark, including Pranav's solution (I'm not sure how to handle mtraceur's), starting from sha256 objects (i.e., before calling hexdigest() or digest()). Each row shows the three fastest of ten rounds, in ns per hash:
462 ns 464 ns 471 ns just_digest
510 ns 518 ns 519 ns just_hexdigest
566 ns 568 ns 574 ns Kelly3
608 ns 608 ns 611 ns Kelly2
688 ns 688 ns 692 ns Kelly
1139 ns 1139 ns 1140 ns Pranav
Benchmark code:
def Kelly(sha256):
    return 256 - int.from_bytes(sha256.digest(), 'big').bit_length()

def Kelly2(sha256):
    zeros = 0
    for byte in sha256.digest():
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Kelly3(sha256):
    digest = sha256.digest()
    if byte := digest[0]:
        return 8 - byte.bit_length()
    zeros = 0
    for byte in digest:
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Pranav(sha256):
    nzeros = 0
    for c in sha256.hexdigest():
        if c == "0": nzeros += 4
        else:
            digit = int(c, base=16)
            nzeros += 4 - (math.floor(math.log2(digit)) + 1)
            break
    return nzeros

def just_digest(sha256):
    return sha256.digest()

def just_hexdigest(sha256):
    return sha256.hexdigest()

funcs = just_digest, just_hexdigest, Kelly3, Kelly2, Kelly, Pranav

from timeit import repeat
import hashlib, math
from collections import deque

sha256s = [hashlib.sha256(str(i).encode('ascii'))
           for i in range(10_000)]
expect = list(map(Kelly, sha256s))
for func in funcs:
    result = list(map(func, sha256s))
    print(result == expect, func.__name__)

tss = [[] for _ in funcs]
for _ in range(10):
    print()
    for func, ts in zip(funcs, tss):
        t = min(repeat(lambda: deque(map(func, sha256s), 0), number=1))
        ts.append(t)
    for func, ts in zip(funcs, tss):
        print(*('%4d ns ' % (t / len(sha256s) * 1e9) for t in sorted(ts)[:3]), func.__name__)
.hexdigest() returns a string, so your hex variable is a string.
I'm going to call it h instead, because hex is a built-in Python function.
So you have:
h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
Now this is a hexadecimal string. Each digit in a hexadecimal number gives you four bits in binary. Since this has five leading zero digits, you already have 5 * 4 = 20 leading zero bits.
nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else: break
Then, you need to count the leading zeros in the binary representation of the first nonzero hexadecimal digit. This is easy to get: a positive number has math.floor(math.log2(number)) + 1 binary digits, so a hexadecimal digit, which has at most 4 bits, has 4 - (math.floor(math.log2(number)) + 1) leading zeros. In this case, the digit is a 9 (1001 in binary), so there are no additional leading zeros.
So, modify the previous loop:
import math

nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else:
        digit = int(c, base=16)
        nzeros += 4 - (math.floor(math.log2(digit)) + 1)
        break
print(nzeros) # 20
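As a variation on the loop above (not part of the original answer), int.bit_length() gives the number of significant bits of a nonzero hex digit directly, so 4 - digit.bit_length() replaces the math.log2 expression and no math import is needed. A minimal sketch; the name leading_zero_bits_hex is hypothetical:

```python
def leading_zero_bits_hex(h):
    """Count leading zero bits in a hex string such as a hexdigest()."""
    nzeros = 0
    for c in h:
        digit = int(c, 16)
        if digit == 0:
            nzeros += 4  # a "0" hex digit is four zero bits
        else:
            # a nonzero digit has digit.bit_length() significant bits out of 4
            nzeros += 4 - digit.bit_length()
            break
    return nzeros


h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
print(leading_zero_bits_hex(h))  # 20
```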
Danger!!!
Is this security-sensitive code? Can this hash ever be the result of hashing secret/private data?
If so, then you should probably implement something in C or similar, while taking care to protect against leaking information about the hash through side-channels.
Otherwise, I suggest picking the version (from any of these answers) that you and the people working on your code find the most intuitive, clear, and so on, unless performance matters more than readability, in which case pick the fastest one.
If your hashes are never of security-sensitive inputs, then:
If you just want a good balance of simplicity and low effort:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1  # truncate; treat negatives as 2's complement
    if value == 0:
        return max_bits
    significant_bits = len(bin(value)) - 2  # has "0b" prefix
    return max_bits - significant_bits
If you want to really embrace the bit twiddling you were trying in your question:
def count_leading_zeroes(value, max_bits=256):
    top_bit = 1 << (max_bits - 1)
    count = 0
    value &= (1 << max_bits) - 1
    # stop after max_bits steps so that value == 0 cannot loop forever
    while count < max_bits and not value & top_bit:
        count += 1
        value <<= 1
    return count
If you're doing manual bit twiddling, I think in this case a loop that counts from the top is the most justified option, because hashes are fairly evenly randomly distributed, so each bit has about an even chance of being nonzero.
So you have a good chance of exiting the loop early, and thus executing more efficiently, if you start from the top (if you start from the bottom, you have to check every bit).
You could alternatively do some bit twiddling inspired by binary search. That way, instead of O(n) steps, you do O(log(n)) steps. However, this arguably isn't an optimization worth doing in CPython, and for a JIT implementation like PyPy, this manual optimization can actually make it harder for the automatic optimizer to realize that it can just use a raw "count leading zeroes" CPU instruction. If I ever get the time, I'll edit this answer later with an example of that.
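A minimal sketch of the binary-search idea, written for this edit rather than taken from the answer; count_leading_zeroes_bsearch is a hypothetical name. It halves a window over the bit positions, so it needs about log2(max_bits) steps, and like the versions above it is not constant-time:

```python
def count_leading_zeroes_bsearch(value, max_bits=256):
    """Leading zero bits of `value` in a max_bits-wide field (max_bits >= 1),
    found by binary search over the bit position: O(log n) steps.

    NOT constant-time: which branches run depends on the data.
    """
    mask = (1 << max_bits) - 1
    value &= mask  # truncate; treat negatives as 2's complement
    if value == 0:
        return max_bits
    count = 0
    # Start with the largest power of two <= max_bits, then halve.
    shift = 1 << (max_bits.bit_length() - 1)
    while shift:
        # If the top `shift` bits of the window are all zero, count them
        # and move the remaining bits up to the top of the window.
        if value >> (max_bits - shift) == 0:
            count += shift
            value = (value << shift) & mask
        shift >>= 1
    return count


print(count_leading_zeroes_bsearch(2**128))  # 127
```

Starting from the largest power of two not above max_bits means the step sizes always add up to at least max_bits - 1, so the search also works for widths that are not powers of two.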
Now about those side-channel attacks: basically, any time you have code that works on secret data (or on any results of secret data that you can't prove, as a cryptographer would, have irretrievably lost all information about the secret data), you should make sure your code does exactly the same number of operations and takes the same branches regardless of the data!
Explaining why you should do this is outside the scope of this answer, but if you don't, your code could be harming users by leaking their secret information in ways that hackers could access.
So!
You might be tempted to modify the simple version that uses bin, but bin is inherently data-dependent: it produces a string whose length depends on the number of leading zeroes, and as far as I know it doesn't (and logically can't! at least not in the general case) guarantee that it does so in constant time without data-dependent branches. So we should assume that merely running bin on an integer leaks information about the integer through side channels like runtime, branch predictor state, amount of memory allocated, and so on.
For illustrative purposes, if we did have a side-channel-safe bin, which I'll call "bin_", we could do:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1  # truncate; treat negatives as 2's complement
    value <<= 1  # securely compensate for 0
    significant_bits = len(bin_(value)) - 3  # has "0b" prefix and "0" suffix
    return max_bits - significant_bits
In principle, a bit-twiddling loop could do leading zero bit count in constant-time and free of input-dependent branches.
The challenge is writing this neatly in Python, because Python is so abstracted from the underlying machine.
The core problem is this: at the CPU level, it's really easy to take a 1 or 0 bit and turn it, branchlessly, into something more useful (like a mask with all bits set to 1 or all bits set to 0, which then lets you conditionally but branchlessly select one of two numbers or clear a number, which you can then use to implement something like "if the lowest bit is set, reset the counter to zero"). At the Python level, implementing stuff like this is a struggle through a fog of uncertainty about how the Python runtime is implemented: there are many places where it might be reasonable to have data-dependent branches under the covers. So really we want to reduce the number of Python steps and conversions between the digest that hashlib gives us and our leading zeroes answer.
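To make the mask idea concrete, here is a sketch of the branchless select and "reset the counter" tricks on Python ints (the helper names are hypothetical). Python's -1 behaves like an all-ones integer under &, so -bit turns a 0/1 bit into an all-zeros/all-ones mask. This only illustrates the arithmetic; as the answer explains, nothing guarantees that CPython executes it in constant time:

```python
def select(bit, a, b):
    """Return a if bit == 1 else b, using a mask instead of an if.

    Assumes bit is exactly 0 or 1.
    """
    mask = -bit  # 1 -> -1 (all ones in two's complement), 0 -> 0 (all zeros)
    return (a & mask) | (b & ~mask)


def reset_if_lowest_bit_set(counter, x):
    """Return 0 if the lowest bit of x is set, else counter, without an if."""
    return counter & ~(-(x & 1))


print(select(1, 42, 7))                    # 42
print(select(0, 42, 7))                    # 7
print(reset_if_lowest_bit_set(5, 0b1011))  # 0
print(reset_if_lowest_bit_set(5, 0b1010))  # 5
```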
So the best option is actually to never even reach for human-readable stuff like hex or integer forms of the digest at all! Just stick to the raw digest. Something like this, conceptually:
def count_leading_zeroes_in_bytes(data):
    count = 0
    # branchless "latch" mask to stop counting:
    still_in_leading_zeroes = 1
    for byte in data:
        for index in reversed(range(8)):
            bit = (byte >> index) & 1
            # branchless "conditional" if bit is zero:
            is_zero = bit ^ 1
            # branchlessly increment count if needed:
            count += is_zero & still_in_leading_zeroes
            # branchlessly latch count on first 1 bit:
            still_in_leading_zeroes &= is_zero
    return count
This is the best I was able to think of in pure Python. And it still failed.
But some quick testing by both @KellyBundy and me (see comments and Kelly's answer for some examples) shows this version is both extremely slow and does not actually achieve input-independent execution times (because there's yet another relevant data-dependent optimization inside Python, and possibly for other reasons we're missing).
So if you're going to try to implement anything in Python, test it thoroughly before relying on it to actually be secure, or just take the general gist and implement a C or assembly version. Something like this:
/* _clz.c */
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */

int count_leading_zeroes_bytes(const char * bytes, size_t length)
{
    int still_in_leading_zeroes = 1;
    int count = 0;
    while(length--)
    {
        /* unsigned, so that the shift never depends on sign extension */
        unsigned char byte = (unsigned char)*bytes++;
        int bits = CHAR_BIT;
        while(bits--)
        {
            int bit = (byte >> bits) & 1;
            int is_zero = bit ^ 1;
            count += is_zero & still_in_leading_zeroes;
            still_in_leading_zeroes &= is_zero;
        }
    }
    return count;
}
# clz.py
import ctypes

# This is just a quick example. A mature version
# would load the library as appropriate for each
# platform.
_library = ctypes.CDLL('./_clz.so')
_count_leading_zeroes_bytes = _library.count_leading_zeroes_bytes

def count_leading_zeroes_bytes(data):
    return _count_leading_zeroes_bytes(
        ctypes.c_char_p(data),
        ctypes.c_size_t(len(data)),
    )

custom crc32 calculation in python without libs

I have been looking for simple Python code that can generate a CRC-32 checksum. It is for an STM32, and I can't find a good example that is adjustable.
To get the right settings for my calculation, I used the following site:
http://www.sunshine2k.de/coding/javascript/crc/crc_js.html
The settings are the following:
Polynomial: 0x4C11DB7,
Initial Value: 0xFFFFFFFF
and no XOR value (i.e. 0x00); also, the input and result are not reflected.
Does anyone know where I could get a simple adjustable algorithm, or where I can learn how to write one?
Edit:
I use this function to create the table
def create_table():
    a = []
    for i in range(256):
        k = i
        for j in range(8):
            if k & 1:
                k ^= 0x4C11DB7
            k >>= 1
        a.append(k)
    return a
and the following to generate the CRC:
def crc32(bytestream):
    crc_table = create_table()
    crc32 = 0xffffffff
    for byte in range( int(len(bytestream)) ):
        lookup_index = (crc32 ^ byte) & 0xff
        crc32 = (crc32 >> 8) ^ crc_table[lookup_index]
    return crc32
and I call the function like this:
print(hex(crc32(b"1205")))
The result is: 0x9f8e7b8c
but the website gives me: 0xA7D10A0A
Can someone help me?
First off, what you have is for a reflected CRC, not a non-reflected CRC. That said, there is also an error in your table construction. This:
if k & 1:
    k ^= 0x4C11DB7
k >>= 1
is wrong. The exclusive-or must be done after the shift. So it would need to be (for the reflected case):
k = (k >> 1) ^ 0xedb88320 if k & 1 else k >> 1
Note that the polynomial also needs to be reflected in this case.
Another error in your code is using range to make the integers 0, 1, ..., and using those instead of the actual data bytes to compute the CRC on! What you want for your for loop is simply:
for byte in bytestream:
The whole point of using a table is to make the CRC calculation faster. You don't want to regenerate the table every time you do a CRC. You want to generate the table once when your program starts, and then use it multiple times. Or you can generate the table separately from your program, and then put the table itself in your program. That's what's usually done.
Anyway, to do the non-reflected case, you need to flip things around. So to make the table:
def create_table():
    a = []
    for i in range(256):
        k = i << 24
        for _ in range(8):
            k = (k << 1) ^ 0x4c11db7 if k & 0x80000000 else k << 1
        a.append(k & 0xffffffff)
    return a
To use the table:
def crc32(bytestream):
    crc_table = create_table()
    crc = 0xffffffff
    for byte in bytestream:
        lookup_index = ((crc >> 24) ^ byte) & 0xff
        crc = ((crc & 0xffffff) << 8) ^ crc_table[lookup_index]
    return crc
Now it correctly implements your specification, which happens to be the MPEG-2 32-bit CRC specification (from Greg Cook's CRC catalogue):
width=32 poly=0x04c11db7 init=0xffffffff refin=false refout=false xorout=0x00000000 check=0x0376e6e7 residue=0x00000000 name="CRC-32/MPEG-2"
For the code above, if I do:
print(hex(crc32(b'123456789')))
I get 0x376e6e7, which matches the check value in the catalog.
Again, you need to take the create_table() out of the crc32() routine and do it somewhere else, once.
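Following that advice, here is a sketch with the table built once at module level, plus a table-free, bit-at-a-time version for cross-checking. The names crc32_mpeg2, crc32_mpeg2_bitwise and CRC32_MPEG2_TABLE are hypothetical. Because this CRC has no reflection and no final XOR, the optional crc argument also lets you feed the data in chunks:

```python
def _make_table(poly=0x04C11DB7):
    table = []
    for i in range(256):
        k = i << 24
        for _ in range(8):
            k = (k << 1) ^ poly if k & 0x80000000 else k << 1
        table.append(k & 0xFFFFFFFF)
    return table


CRC32_MPEG2_TABLE = _make_table()  # computed once, when the module loads


def crc32_mpeg2(data, crc=0xFFFFFFFF):
    """Table-driven CRC-32/MPEG-2 (not reflected, init 0xFFFFFFFF, xorout 0)."""
    for byte in data:
        crc = ((crc & 0xFFFFFF) << 8) ^ CRC32_MPEG2_TABLE[((crc >> 24) ^ byte) & 0xFF]
    return crc


def crc32_mpeg2_bitwise(data, crc=0xFFFFFFFF):
    """Same CRC, one bit at a time (slower, no table), for cross-checking."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7) if crc & 0x80000000 else crc << 1
            crc &= 0xFFFFFFFF
    return crc


print(hex(crc32_mpeg2(b"123456789")))          # 0x376e6e7
print(hex(crc32_mpeg2_bitwise(b"123456789")))  # 0x376e6e7
```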

Most Significant Byte Calculation

I am trying to implement a larger cipher problem, and I am running into an issue I don't quite understand when taking the Most Significant Byte (not bit).
To turn an int into a binary string, I am using:
def binary(i):
    if i == 0:
        return "0"
    s = ''
    while i:
        if i & 1 == 1:
            s = "1" + s
        else:
            s = "0" + s
        i >>= 1
    return s
I am pretty sure the above is correct; it works with my test numbers. To then extract the Most Significant Byte, I am using:
def msb(i):
    a = binary(i)
    b = a[0:7]
    c = int(b,2)
    return c
However, this seems to return a number half of what I would expect. Am I wrong in thinking you can get the most significant byte by just taking the first 8 bits, or am I missing something else silly?
Your example code only gets the seven leading bits, not 8:
def msb(i):
    a = binary(i)
    b = a[0:7]  # gets first SEVEN characters of string a
    c = int(b,2)
    return c
Change it to a[0:8] to extract 8 leading characters/bits rather than 7.
There are much easier ways to do this. For example, if you want the top eight bits (ignoring byte alignment), you can do:
def msb(val):
    # max(..., 0) avoids a negative shift count for values below 128
    return val >> max(val.bit_length() - 8, 0)
For the most significant aligned byte, in Python 3 you can do:
def msb(val):
    # "or 1" keeps one byte for val == 0, which would otherwise give empty bytes
    return val.to_bytes((val.bit_length() + 7) // 8 or 1, 'big')[0]
In Py2, you'd have to convert to a hex string and back to match the to_bytes approach.
A byte is 0xFF, so you can get the most significant (leftmost) byte by doing
(i >> (8 * (n_bytes(i) - 1))) & 0xFF  # n_bytes(i) is not a built-in: the number of bytes in i, e.g. (i.bit_length() + 7) // 8
I always get most significant and least significant confused; if you want the rightmost byte, it's even easier:
i & 0xFF
I think that's right at least... I'm not sure whether n_bytes will be guaranteed to return the number of bytes or not...
Based on your example, I think the second code is what you want.
You could also do something like
s = struct.pack(">I", i)  # big-endian unsigned 32-bit, so s[0] is the most significant byte
s[0]   # leftmost (on Python 2, use ord(s[0]))
s[-1]  # rightmost
If you want aligned bytes, this should work at least from Python 2.7 onwards (int.bit_length() was added in 2.7):
def msb(val):
return 0 if val == 0 else val >> (((val.bit_length() - 1) >> 3) << 3)
Or, if you prefer it more readable:
def msb(val):
    if val == 0:
        return 0
    else:
        return val >> (((val.bit_length() - 1) // 8) * 8)
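The answers above compute two different things: the most significant byte aligned on an 8-bit boundary, or the top eight significant bits regardless of alignment. A small sketch (the names msb_aligned and msb_top8 are hypothetical) shows the difference on 0x1234:

```python
def msb_aligned(val):
    """Most significant byte aligned to an 8-bit boundary (0 for 0)."""
    if val == 0:
        return 0
    return val >> (((val.bit_length() - 1) // 8) * 8)


def msb_top8(val):
    """Top eight significant bits, ignoring byte alignment."""
    return val >> max(val.bit_length() - 8, 0)


v = 0x1234  # 0001 0010 0011 0100
print(hex(msb_aligned(v)))  # 0x12, the leftmost whole byte
print(hex(msb_top8(v)))     # 0x91, bits 1 0010 001 starting at the highest set bit
```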

read single bit operation python 2.6

I am trying to read a single bit in a binary string but can't seem to get it to work properly. I read in a value, then convert it to a 32-bit string. From there I need to read a specific bit in the string, but it's not always the same one. The getBin function returns a 32-bit string with leading 0s. The code I have always returns 1, even if the bit is 0. Code example:
slot=195035377
getBin = lambda x, n: x >= 0 and str(bin(x))[2:].zfill(n) or "-" + str(bin(x))[3:].zfill(n)
bits = getBin(slot,32)
bit = (bits and (1 * (2 ** y)) != 0)
print("bit: %i\n"%(bit))
In this example, bits = 00001011101000000000000011110011,
and if I am looking for bit 3, which is a 0, bit will be equal to 1. Any ideas?
To test for specific bits in an integer value, use the & bitwise operator; there is no need to convert it to a binary string.
if slot & (1 << 2):  # bit 3, counting from 1 at the least significant bit
    print('bit 3 is set')
else:
    print('bit 3 is not set')
The above code shifts a test bit to the left twice. Alternatively, shift slot to the right twice:
if (slot >> 2) & 1:
To make this generic for any bit position (counting from 1), subtract 1:
if slot & (1 << (bitpos - 1)):
    print('bit {0} is set'.format(bitpos))
or
if (slot >> (bitpos - 1)) & 1:
Your binary formatting code is overly verbose. Just use the format() function to create a binary string representation:
format(slot, '032b')
formats your binary value to a 0-padded 32-character binary string.
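The string and integer approaches number bits from opposite ends: format(slot, '032b') puts the most significant bit at index 0. A minimal sketch relating the two, using zero-based bit numbers (unlike the 1-based bitpos above); get_bit is a hypothetical helper:

```python
slot = 195035377
bits = format(slot, '032b')  # bits[0] is the most significant bit


def get_bit(value, n):
    """Return bit n of value, counting from 0 at the least significant bit."""
    return (value >> n) & 1


# The string is written most significant bit first, so bit n of the
# integer lives at string index 31 - n (equivalently, bits[-1 - n]).
for n in range(4):
    print(n, get_bit(slot, n), bits[31 - n])
```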
n = 223
bitpos = 3
bit3 = (n >> (bitpos-1))&1
is how you should be doing it ... don't use strings!
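You can also just index into the string to get the correct digit. Index 0 is the most significant bit, so count from the end:
bits = getBin(slot, 32)
bit = int(bits[-bit_location]) # Assumes bit_location counts from 1 at the least significant bit
print("bit: %i\n"%(bit))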

How to get the signed integer value of a long in python?

If lv stores a long value, and the machine is 32 bits, the following code:
iv = int(lv & 0xffffffff)
results in an iv of type long, instead of the machine's int.
How can I get the (signed) int value in this case?
import ctypes
number = lv & 0xFFFFFFFF
signed_number = ctypes.c_int32(number).value  # c_int32 is 32 bits everywhere; c_long is 64 bits on many platforms
You're working in a high-level scripting language; by nature, the native data types of the system you're running on aren't visible. You can't cast to a native signed int with code like this.
If you know that you want the value converted to a 32-bit signed integer--regardless of the platform--you can just do the conversion with the simple math:
iv = 0xDEADBEEF
if(iv & 0x80000000):
    iv = -0x100000000 + iv
Essentially, the problem is to sign extend from 32 bits to... an infinite number of bits, because Python has arbitrarily large integers. Normally, sign extension is done automatically by CPU instructions when casting, so it's interesting that this is harder in Python than it would be in, say, C.
By playing around, I found something similar to BreizhGatch's function, but that doesn't require a conditional statement. n & 0x80000000 extracts the 32-bit sign bit; then, the - keeps the same 32-bit representation but sign-extends it; finally, the extended sign bits are set on n.
def toSigned32(n):
    n = n & 0xffffffff
    return n | (-(n & 0x80000000))
Bit Twiddling Hacks suggests another solution that perhaps works more generally. n ^ 0x80000000 flips the 32-bit sign bit; then - 0x80000000 will sign-extend the opposite bit. Another way to think about it is that initially, negative numbers are above positive numbers (separated by 0x80000000); the ^ swaps their positions; then the - shifts negative numbers to below 0.
def toSigned32(n):
    n = n & 0xffffffff
    return (n ^ 0x80000000) - 0x80000000
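The XOR trick generalizes to any width by replacing 0x80000000 with 1 << (bits - 1). A minimal sketch, where to_signed is a hypothetical name:

```python
def to_signed(n, bits):
    """Interpret the low `bits` bits of n as a two's complement integer."""
    sign = 1 << (bits - 1)
    n &= (1 << bits) - 1      # keep only the low `bits` bits
    return (n ^ sign) - sign  # flip the sign bit, then shift negatives below 0


print(to_signed(0xDEADBEEF, 32))  # -559038737
print(to_signed(0xFFE1, 16))      # -31
print(to_signed(0x7FE1, 16))      # 32737
```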
Can I suggest this:
def getSignedNumber(number, bitLength):
    mask = (2 ** bitLength) - 1
    if number & (1 << (bitLength - 1)):
        return number | ~mask
    else:
        return number & mask
print iv, '->', getSignedNumber(iv, 32)
You may use the struct library to convert values like that. It's ugly, but it works:
from struct import pack, unpack
signed = unpack('<i', pack('<I', lv & 0xffffffff))[0]  # '<' forces standard 4-byte sizes; native 'l'/'L' are 8 bytes on many 64-bit platforms
A quick and dirty solution (x is never greater than 32-bit in my case).
if x > 0x7fffffff:
    x = x - 4294967296
If you know how many bits are in the original value, e.g. byte or multibyte values from an I2C sensor, then you can do the standard Two's Complement conversion:
def TwosComp8(n):
    return n - 0x100 if n & 0x80 else n

def TwosComp16(n):
    return n - 0x10000 if n & 0x8000 else n

def TwosComp32(n):
    return n - 0x100000000 if n & 0x80000000 else n
If the hexadecimal representation of the number is 4 bytes long, this would solve the problem:
def B2T_32(x):
    num = int(x, 16)
    if(num & 0x80000000): # If it has the negative sign bit. (MSB=1)
        num -= 0x80000000*2
    return num
print(B2T_32(input("enter a input as a hex value\n")))
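On Python 3, int.from_bytes can do the two's complement conversion itself via signed=True. A sketch, assuming the input is an even-length hex string without a 0x prefix (bytes.fromhex raises ValueError for an odd number of digits or a 0x prefix) and that the value's width is the string's byte length; hex_to_signed is a hypothetical name:

```python
def hex_to_signed(hex_string):
    """Parse a big-endian hex string as a signed two's complement integer
    whose width is the string's byte length."""
    return int.from_bytes(bytes.fromhex(hex_string), 'big', signed=True)


print(hex_to_signed("FFFFFFFF"))  # -1
print(hex_to_signed("80000000"))  # -2147483648
print(hex_to_signed("FFE1"))      # -31
print(hex_to_signed("7FFFFFFF"))  # 2147483647
```

Unlike B2T_32, the width here follows the input length, so "FFE1" is read as a 16-bit value rather than a 32-bit one.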
Simplest solution for a number of any bit length
Why is the representation of a signed integer so hard for the human mind to understand? Because it is an idea of machines. :-)
Let's explain.
If we have a bi-directional 7-bit counter with the initial state
000 0000
and it receives a pulse on its count-down input, then the next value it shows will be
111 1111
And the people said:
Hey, counter, we need to know that this is a negative reload. You
should add a sign bit to tell us about it.
And the counter added:
1111 1111
And people asked:
How are we going to work out that this is -1?
The counter replied: Take the number one greater than the largest value I can show, subtract it from the reading, and you get the result.
   1111 1111
- 1 0000 0000
_____________
(dec)      -1
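def sigIntFromHex(a): # e.g. a = 0xffe1
    # Assumes the sign bit is the highest set bit, i.e. the width is a.bit_length();
    # so every a > 0 comes out negative, and a == 0 raises ValueError
    if a & (1 << (a.bit_length()-1)): # check if highest bit is 1 through & with 0x8000
        return a - (1 << (a.bit_length())) # 0xffe1 - 0x10000
    else:
        return a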
And more elegant:
def sigIntFromHex(a):
    return a - (1 << (a.bit_length())) if a & (1 << (a.bit_length()-1)) else a
b = 0xFFE1
print(sigIntFromHex(b))
I hope I helped
