I am doing an assignment where I have to implement the Diffie-Hellman key exchange. To speed things up I am using bitwise operators in Python, and everything is working fine programming-wise, but I have to perform a parity checksum, and I don't think I properly understand what this is or how it works.
Basically I need to take a key of variable length (up to 2048 bits), break it into 64-bit words, and perform a checksum. I am unsure what this means exactly. How would one break a key into 64-bit words in Python? I think once I do that I should be able to just XOR the words together to get a 64-bit output. At the moment, though, I am stuck on how to break a key into 64-bit chunks appropriately.
A parity checksum is just the xor of all the bits in the word. The most efficient way to do this will have log(nbits) operations, because you can halve the number of bits you are dealing with on each iteration. For example:
def parity(word, nbits):
    if nbits & (nbits - 1):
        raise ValueError("nbits must be power of two")
    while nbits > 1:
        nbits >>= 1
        word ^= (word >> nbits)
    return word & 1
A longitudinal parity check is a bit different, because you stop when you get to a given word-size, at which point, your parity should be all zeros or all ones, rather than a single one or zero. I don't know whether you want odd or even parity, so this is a bit more general:
def longitudinal_parity(data, total_bits, word_bits, expected_parity=1):
    """
    Performs longitudinal parity check
    """
    for nbits in (total_bits, word_bits):
        if nbits & (nbits - 1):
            raise ValueError("bit size must be power of two")
    mask = (1 << total_bits) - 1
    while total_bits > word_bits:
        total_bits >>= 1
        data ^= (data >> total_bits)
        mask >>= total_bits
        data &= mask
    return data == (mask if expected_parity else 0)
So for your example, the first parameter would be a 2048 bit integer, the total_bits would be 2048, the word_bits would be 64, and the desired parity would be 0 or 1.
I don't know anything about Diffie-Hellman's parity check, but if your parity is provided separately (seems likely), then you are comparing against a separate parity word rather than all ones or all zeroes. This is a minor tweak:
def longitudinal_parity(data, total_bits, word_bits, expected_parity):
    """
    Performs longitudinal parity check
    """
    for nbits in (total_bits, word_bits):
        if nbits & (nbits - 1):
            raise ValueError("bit size must be power of two")
    mask = (1 << total_bits) - 1
    while total_bits > word_bits:
        total_bits >>= 1
        data ^= (data >> total_bits)
        mask >>= total_bits
        data &= mask
    return data == expected_parity
There are plenty of possible optimizations here, such as precalculating masks, starting the mask off at a smaller number, etc. Hopefully the code is readable.
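For the question as asked, splitting the key into 64-bit words and XOR-ing them can also be written directly with a shift and a mask. The following is a minimal sketch, not a reference implementation: the names split_words and xor_checksum are made up for illustration, and it assumes the key is held as a non-negative Python int.

```python
from functools import reduce
from operator import xor

def split_words(key, word_bits=64):
    """Yield the word_bits-wide chunks of a non-negative int, lowest word first."""
    mask = (1 << word_bits) - 1
    while True:
        yield key & mask          # low word_bits bits
        key >>= word_bits         # drop them and move to the next word
        if key == 0:
            break

def xor_checksum(key, word_bits=64):
    """XOR all word_bits-wide chunks of key into a single word."""
    return reduce(xor, split_words(key, word_bits), 0)

# Hypothetical 128-bit key made of two 64-bit words
key = (0x0123456789ABCDEF << 64) | 0x0F0F0F0F0F0F0F0F
print([hex(w) for w in split_words(key)])
print(hex(xor_checksum(key)))
```

For a power-of-two total size, this should give the same 64-bit word that the folding loop ends up with, since each fold XORs the upper half of the remaining bits onto the lower half.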
I'm having trouble trying to count the number of leading zero bits after an sha256 hash function as I don't have a lot of experience on 'low level' stuff in python
hex = hashlib.sha256((some_data_from_file).encode('ascii')).hexdigest()
# hex = 0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f
# this has 20 leading zero bits
hex_digits = int.from_bytes(bytes(hex.encode('ascii')), 'big') #convert str to int
#count num of leading zeroes
def countZeros(x):
    total_bits = 256
    res = 0
    while ((x & (1 << (total_bits - 1))) == 0):
        x = (x << 1)
        res += 1
    return res
print(countZeros(hex_digits)) # returns 2
I've also tried converting it using bin() however that didn't provide me with any leading zeros.
Instead of getting the hex digest and analyzing that hex string, you could just get the digest, interpret it as an int, ask for its bit-length, and subtract that from 256:
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Demo:
import hashlib
some_data_from_file = '665782'
# Show hex digest for clarity
hex = hashlib.sha256(some_data_from_file.encode('ascii')).hexdigest()
print(hex)
# Show number of leading zero bits
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Output:
0000000399c6aea5ad0c709a9bc331a3ed6494702bd1d129d8c817a0257a1462
30
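On the bin() attempt from the question: bin() never pads its output, but format() with a zero-padded width does, so the leading zeros can also be counted on the string. A small sketch, assuming the hex digest quoted in the question as input:

```python
h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
n = int(h, 16)                      # parse the hex text as a number
bits = format(n, "0256b")           # binary string, zero-padded to 256 digits
leading = len(bits) - len(bits.lstrip("0"))
print(leading)                      # 20
```

This is mainly useful for inspecting the bits by eye; the bit_length() approach avoids building a 256-character string.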
Benchmark along with Pranav's (not sure how to handle mtraceur's) starting with sha256-values (i.e., before calling hexdigest() or digest()):
462 ns 464 ns 471 ns just_digest
510 ns 518 ns 519 ns just_hexdigest
566 ns 568 ns 574 ns Kelly3
608 ns 608 ns 611 ns Kelly2
688 ns 688 ns 692 ns Kelly
1139 ns 1139 ns 1140 ns Pranav
Benchmark code:
def Kelly(sha256):
    return 256 - int.from_bytes(sha256.digest(), 'big').bit_length()

def Kelly2(sha256):
    zeros = 0
    for byte in sha256.digest():
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Kelly3(sha256):
    digest = sha256.digest()
    if byte := digest[0]:
        return 8 - byte.bit_length()
    zeros = 0
    for byte in digest:
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Pranav(sha256):
    nzeros = 0
    for c in sha256.hexdigest():
        if c == "0": nzeros += 4
        else:
            digit = int(c, base=16)
            nzeros += 4 - (math.floor(math.log2(digit)) + 1)
            break
    return nzeros

def just_digest(sha256):
    return sha256.digest()

def just_hexdigest(sha256):
    return sha256.hexdigest()

funcs = just_digest, just_hexdigest, Kelly3, Kelly2, Kelly, Pranav

from timeit import repeat
import hashlib, math
from collections import deque

sha256s = [hashlib.sha256(str(i).encode('ascii'))
           for i in range(10_000)]

expect = list(map(Kelly, sha256s))
for func in funcs:
    result = list(map(func, sha256s))
    print(result == expect, func.__name__)

tss = [[] for _ in funcs]
for _ in range(10):
    print()
    for func, ts in zip(funcs, tss):
        t = min(repeat(lambda: deque(map(func, sha256s), 0), number=1))
        ts.append(t)
    for func, ts in zip(funcs, tss):
        print(*('%4d ns ' % (t / len(sha256s) * 1e9) for t in sorted(ts)[:3]), func.__name__)
.hexdigest() returns a string, so your hex variable is a string.
I'm going to call it h instead, because hex is a builtin python function.
So you have:
h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
Now this is a hexadecimal string. Each digit in a hexadecimal number gives you four bits in binary. Since this has five leading zeros, you already have 5 * 4 = 20 leading zeros.
nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else: break
Then, you need to count the leading zeros in the binary representation of the first non-zero hexadecimal digit. This is easy to get: A number has math.floor(math.log2(number)) + 1 binary digits, i.e. 4 - (math.floor(math.log2(number)) + 1) leading zeros if it's a hexadecimal digit, since they can only have a max of 4 bits. In this case, the digit is a 9 (1001 in binary), so there are zero additional leading zeros.
So, modify the previous loop:
import math

nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else:
        digit = int(c, base=16)
        nzeros += 4 - (math.floor(math.log2(digit)) + 1)
        break
print(nzeros) # 20
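math.log2 works on floats; for a single hex digit that is harmless, but int.bit_length() gives the number of binary digits directly. A sketch of the same loop under that substitution (the function name leading_zero_bits is made up here):

```python
def leading_zero_bits(hex_string):
    """Count leading zero bits of a hex string, 4 bits per hex digit."""
    nzeros = 0
    for c in hex_string:
        digit = int(c, 16)
        if digit == 0:
            nzeros += 4                       # a "0" digit is four zero bits
        else:
            nzeros += 4 - digit.bit_length()  # zeros inside the first non-zero digit
            break
    return nzeros

h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
print(leading_zero_bits(h))  # 20
```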
Danger!!!
Is this security-sensitive code? Can this hash ever be the result of hashing secret/private data?
If so, then you should probably implement something in C or similar, while taking care to protect against leaking information about the hash through side-channels.
Otherwise, I suggest picking the version (from any of these answers) that you and the people working on your code find the most intuitive, clear, and so on, unless performance matters more than readability, in which case pick the fastest one.
If your hashes are never of security-sensitive inputs, then:
If you just want a good balance of simplicity and low-effort:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1 # truncate; treat negatives as 2's complement
    if value == 0:
        return max_bits
    significant_bits = len(bin(value)) - 2 # has "0b" prefix
    return max_bits - significant_bits
If you want to really embrace the bit twiddling you were trying in your question:
def count_leading_zeroes(value, max_bits=256):
    top_bit = 1 << (max_bits - 1)
    count = 0
    value &= (1 << max_bits) - 1
    # stop after max_bits steps so that value == 0 cannot loop forever
    while count < max_bits and not value & top_bit:
        count += 1
        value <<= 1
    return count
If you're doing manual bit twiddling, I think in this case a loop which counts from the top is the most justified option, because hashes are rather evenly randomly distributed and so each bit has about even chance of not being zero.
So you have a good chance of exiting the loop early, and thus executing more efficiently, if you start from the top (if you start from the bottom you have to check every bit).
You could alternatively do a bit twiddling thing that's inspired by binary search. That way instead of O(n) steps you do O(log(n)) steps. However, this arguably isn't an optimization worth doing in CPython, and for a JIT implementation like PyPy this manual optimization can actually make it harder for automatic optimization to realize that you can just use a raw "count leading zeroes" CPU instruction. If I ever get the time I'll edit this answer with an example of that later.
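As a rough idea of what such a binary-search-style count could look like, here is a minimal sketch. It is not a tuned implementation: the name count_leading_zeroes_bisect is made up, and it assumes max_bits is a power of two so the width can be halved cleanly.

```python
def count_leading_zeroes_bisect(value, max_bits=256):
    """Count leading zero bits by repeatedly halving the search window."""
    if max_bits < 1 or max_bits & (max_bits - 1):
        raise ValueError("max_bits must be a power of two")
    value &= (1 << max_bits) - 1
    if value == 0:
        return max_bits
    count = 0
    width = max_bits
    while width > 1:
        half = width >> 1
        upper = value >> half
        if upper:
            value = upper      # first 1 bit is in the upper half; look there
        else:
            count += half      # upper half is all zeros; lower half remains
        width = half
    return count

h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
print(count_leading_zeroes_bisect(int(h, 16)))  # 20
```

For 256 bits this takes 8 iterations instead of up to 256. Note that it branches on the data, so it is no better than the other variants with respect to side channels.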
Now about those side-channel attacks: basically any time you have code that works on secret data (or any results of secret data which you can't prove (like a cryptographer would) have fully irretrievably lost all information about the secret data), you should make sure your code performs exactly the same number of operations and takes the same branches regardless of the data!
Explaining why you should do this is outside the scope of this answer, but if you don't, your code could be harming users by leaking their secret information in ways that hackers could access.
So!
You might be tempted to modify the simple version that uses bin, but bin is inherently hash-dependent: it produces a string whose length is conditional on the leading zeroes, and as far as I know it doesn't (and logically can't! at least not in the general case) guarantee that it does so in constant-time without data-dependent branches. So we should assume merely running bin on an integer leaks information about the integer through side-channels like runtime and branch predictor state and amount of memory allocated and so on.
For illustrative purposes, if we did have a side-channel-safe bin, which I'll call "bin_", we could do:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1 # truncate; treat negatives as 2's complement
    value <<= 1 # securely compensate for 0
    significant_bits = len(bin_(value)) - 3 # has "0b" prefix and "0" suffix
    return max_bits - significant_bits
In principle, a bit-twiddling loop could do leading zero bit count in constant-time and free of input-dependent branches.
The challenge is writing this neatly in Python, because Python is so abstracted from the underlying machine.
The core problem is this: at the CPU level, it's really easy to take a 1 or 0 bit and turn it, branchlessly, into something more useful (like a mask with all bits 1s or all bits 0s, which then lets you conditionally but branchlessly select one of two numbers or clear a number, which you can then use to implement something like "if the lowest bit is set, reset the counter to zero"). At the Python level, implementing stuff like this is a struggle through the fog of a lot of uncertainty about how the Python runtime is implemented - there are many places where it might be reasonable to have data-dependent branches under the covers. So really we want to reduce the number of Python steps and conversions between the digest that hashlib gives us and our leading zeroes answer.
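To make the mask trick concrete, here is the arithmetic written out in Python. This only illustrates the idea; as discussed, it says nothing about what the interpreter does underneath, so it should not be read as constant-time code. The name select is made up for this sketch.

```python
def select(bit, if_one, if_zero):
    """Return if_one when bit == 1, else if_zero, without an if statement."""
    mask = -bit    # 1 -> -1 (all bits set), 0 -> 0 (no bits set)
    return (if_one & mask) | (if_zero & ~mask)

print(select(1, 5, 9))  # 5
print(select(0, 5, 9))  # 9

# "if the lowest bit is set, reset the counter to zero"
counter, value = 7, 0b1011
counter = select(value & 1, 0, counter)
print(counter)          # 0
```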
So the best option is actually to never even reach for human-readable stuff like hex or integer forms of the digest at all! Just stick to the raw digest. Something like this, conceptually:
def count_leading_zeroes_in_bytes(data):
    count = 0
    # branchless "latch" mask to stop counting:
    still_in_leading_zeroes = 1
    for byte in data:
        for index in reversed(range(8)):
            bit = (byte >> index) & 1
            # branchless "conditional" if bit is zero:
            is_zero = bit ^ 1
            # branchlessly increment count if needed:
            count += is_zero & still_in_leading_zeroes
            # branchlessly latch count on first 1 bit:
            still_in_leading_zeroes &= is_zero
    return count
This is the best I was able to think of in pure Python. And it still failed.
Some quick testing by both @KellyBundy and me (see comments and Kelly's answer for some examples) shows this version is both extremely slow and does not actually achieve input-independent execution times (because there's yet another relevant data-dependent optimization inside Python, and possibly for other reasons we're missing).
So if you're going to try to implement anything in Python, test it thoroughly before relying on it to actually be secure, or just take the general gist and implement a C or assembly version. Something like this:
/* _clz.c */
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */
int count_leading_zeroes_bytes(char * bytes, size_t length)
{
    int still_in_leading_zeroes = 1;
    int count = 0;
    while(length--)
    {
        /* unsigned: avoids an implementation-defined right shift of a negative value */
        unsigned char byte = (unsigned char)*bytes++;
        int bits = CHAR_BIT;
        while(bits--)
        {
            int bit = (byte >> bits) & 1;
            int is_zero = bit ^ 1;
            count += is_zero & still_in_leading_zeroes;
            still_in_leading_zeroes &= is_zero;
        }
    }
    return count;
}
# clz.py
import ctypes
# This is just a quick example. A mature version
# would load the library as appropriate for each
# platform.
_library = ctypes.CDLL('./_clz.so')
_count_leading_zeroes_bytes = _library.count_leading_zeroes_bytes
def count_leading_zeroes_bytes(data):
    return _count_leading_zeroes_bytes(
        ctypes.c_char_p(data),
        ctypes.c_size_t(len(data)),
    )
I understand what each of the individual operators does by itself, but I don't know how they interact in order to get the correct results.
def kill(n, k):
    #Takes int n and replaces the bit k from right with 0. Returns the new number
    return n & ~(1<<k-1)
I tested the program with the n as 37 and k as 3.
def b(n,s=""):
    print (str(format(n, 'b')) +" "+ s)
def kill(n, k):
    b(n, "n ")
    b(1<<k-1, "1<<k-1")
    b(~(1<<k-1), "~(1<<k-1) ")
    b( n & ~(1<<k-1)," n & ~(1<<k-1) ")
    return n & ~(1<<k-1)
#TESTS
kill(37, 3)
I decided to run through it step by step.
I printed both the binary representations of both n and ~(1<<k-1) but after that I was lost. ~(1<<k-1) gave me -101 and I'm not sure how to visualize that in binary. Can someone go through it step by step with visualizations for the binary?
All numbers below are printed in binary representation.
Say n has m digits in binary representation. Observe that n & 11...1 (m ones) returns n. Indeed, working bitwise, if x is a single bit (0 or 1), then x & 1 = x.
Moreover, observe that x & 0 = 0. Therefore, to set the kth digit of n to 0, we need to AND (&) it with 11111..1011..1, where the 0 is exactly at the kth position from the right.
Now we need to generate 11111..1011..1. It has all ones except one digit. If we negate it, we get 00000..0100..0, which is what 1 << k-1 produces.
All in all: 1 << k-1 gives us 00000..0100..0. Its negation provides 11111..1011..1. Finally, we do & with the input.
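The -101 from the question appears because Python integers have no fixed width, so ~4 is printed as "minus 101" (that is, -5) rather than as a row of ones. Masking to a fixed width before printing shows the pattern described above. A small sketch for n = 37, k = 3; the 8-bit display width is an arbitrary choice for this illustration:

```python
def kill(n, k):
    """Clear bit k of n, counting from 1 at the right."""
    return n & ~(1 << (k - 1))

WIDTH = 8                   # display width, chosen only for this illustration
view = (1 << WIDTH) - 1     # 0b11111111, keeps the low 8 bits for printing

n, k = 37, 3
bit = 1 << (k - 1)
print(format(n, "08b"), "n")                                  # 00100101
print(format(bit, "08b"), "1 << k-1")                         # 00000100
print(format(~bit & view, "08b"), "~(1 << k-1), low 8 bits")  # 11111011
print(format(kill(n, k), "08b"), "n & ~(1 << k-1)")           # 00100001
```

Bit 3 (value 4) is the only position where the mask holds a 0, so 37 becomes 33 and every other bit is left alone.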
I am working with Bitmasks in python. As far as I know, these are arrays of integers that when they are unpacked into binary format they tell you which of the 32 bits are set (=1) for a given element in the array.
I would like to know the fastest way to check whether 4 specific bits are set or not for any element of an array. I do not care about the rest. I have tried the following solution but it is not fast enough for my purpose:
def detect(bitmask, check=(18,22,23,24), bits=32):
    boolmask = np.zeros(len(bitmask), dtype=bool)
    for i, val in enumerate(bitmask):
        bithost = np.zeros(bits, dtype='i1')
        masklist = list(bin(val)[2:])
        bithost[:len(masklist)] = np.flip(masklist,axis=0)
        if len(np.intersect1d(np.nonzero(bithost)[0] ,check)) != 0:
            boolmask[i] = True
        else:
            boolmask[i] = False
    if any(boolmask):
        print("There are some problems")
    else:
        print("It is clean")
For example, if a given bitmask contains the integer 24453656 (1011101010010001000011000 in binary), the output of function detect would be "There are some problems" since bit 22 is set:
bit: ... 20, 21, 22, 23, 24,...
mask: ... 0, 0, 1, 0, 0,...
Any ideas on how to improve the performance?
Integers are nothing but sequences of bits in the computer.
So, if you get an integer, let's say 333, it is the sequence of bits 101001101 to the computer. It doesn't need any unpacking into bits. It is bits.
Therefore, if the mask is also an integer, you don't need any unpacking; just apply bitwise operations to it. Check Wikipedia for details of how these work.
In order to check if ANY of the bits xyz are set in an integer abc, you do: (abc & xyz) > 0.
If you absolutely need the checking mask to be a tuple of bit places, you do some packing, like this:
def detect(bitmask,check=(18,22,23,24)):
    checkmask=sum(2**x for x in check)
    if (bitmask & checkmask) > 0:
        print("There are some problems")
    else:
        print("Everything OK")
Note that bitmasks start with 0 based bit indices. First bit is bit 0.
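Since the question is about a whole array of values, the same test can be applied to every element with one precomputed mask. A minimal pure-Python sketch, using 0-based bit positions as in this answer; the names make_mask and any_flagged are made up for illustration:

```python
def make_mask(bit_positions):
    """Combine 0-based bit positions into one integer mask."""
    mask = 0
    for position in bit_positions:
        mask |= 1 << position
    return mask

def any_flagged(values, check=(18, 22, 23, 24)):
    """True if any value has at least one of the checked bits set."""
    mask = make_mask(check)
    return any(value & mask for value in values)

print(any_flagged([24453656]))      # True
print(any_flagged([0, 1, 2**17]))   # False
```

If the values live in a NumPy integer array arr, the expression (arr & mask).any() should perform the same check in one vectorized pass, which is likely where most of the speed-up over the original loop would come from.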
I am not sure what's in your bitmask argument. Regardless, you should probably use bitwise operators.
Make a bit mask like this:
def turn_bits_on(bits):
    n = 0
    for k in bits:
        n = (n | (1 << (k - 1))) if k > 0 else n
    return n
bits_to_check = turn_bits_on([18, 22, 23, 24])
Then, for a single number, you can detect with:
def is_ok(value, mask):
    return not (value & mask)
print(is_ok(24453656, bits_to_check))
Finally, depending on what your bitmask value is (a list, a DataFrame, etc), apply the is_ok() function to each value.
Hope this helps!
I want to implement extendible hashing. On the wiki I found a good implementation in Python, but that code uses the least significant bits, so for hash 1101 with d = 1 the value is 1 and with d = 2 the value is 01. I would like to use the most significant bits instead. For example: for hash 1101, d = 1 gives 1 and d = 2 gives 11. Is there any simple way to do that? I tried, but I can't.
Do you understand why it uses the least significant bits?
More or less. It is efficient when we are using arrays. OK, so for the hash function I would like to use the four lowest bits of a 4-byte integer, but read from left to right.
h = hash(k)
h = h & 0xf #use mask to get four least bits
p = self.pp[ h >> ( 4 - GD)]
And it doesn't work, and I don't know why.
Computing a hash using the least significant bits is the fastest way to compute a hash, because it only requires an AND bitwise operation. This makes it very popular.
Here is an implementation (in C) of a hash using the most significant bits. Since there is no direct way to know the most significant bit, it repeatedly shifts until the remaining value fits in the specified number of bits.
int significantHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    while (value > mask) {
        value >>= 1;
    }
    return value;
}
I recommend the overlapping hash, that makes use of all the bits of the number. Essentially, it cuts the number in parts of equal number of bits and XORs them. It runs slower than the least significant hash, but faster than the significant hash. Above all else, it offers a better dispersion than the other two methods, making it a better candidate when the numbers that must be hashed have a certain bit-related-pattern.
int overlappingHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    int answer = 0;
    do {
        answer ^= (value & mask);
        value >>= bits;
    } while (value > 0);
    return answer;
}
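If the hash has a known fixed width, the top d bits can also be taken with a single shift, without a loop. A Python sketch under that assumption; HASH_BITS and top_bits are made-up names, and the 4-bit width is chosen only to match the 1101 example from the question:

```python
HASH_BITS = 4   # assumed fixed hash width, matching the question's 4-bit example

def top_bits(h, depth, hash_bits=HASH_BITS):
    """Return the `depth` most significant bits of a hash_bits-wide hash."""
    if not 0 <= depth <= hash_bits:
        raise ValueError("depth must be between 0 and hash_bits")
    h &= (1 << hash_bits) - 1          # keep only the fixed-width hash
    return h >> (hash_bits - depth)    # drop the low bits, keep the top ones

print(format(top_bits(0b1101, 1), "b"))  # 1
print(format(top_bits(0b1101, 2), "b"))  # 11
```

Unlike the loop above, this treats leading zeros of the fixed-width hash as real bits, so 0011 with depth 2 gives 00 rather than 11.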
If lv stores a long value, and the machine is 32 bits, the following code:
iv = int(lv & 0xffffffff)
results in an iv of type long, instead of the machine's int.
How can I get the (signed) int value in this case?
import ctypes
number = lv & 0xFFFFFFFF
# c_int32 is always 32 bits wide; c_long is 64 bits on many platforms
signed_number = ctypes.c_int32(number).value
You're working in a high-level scripting language; by nature, the native data types of the system you're running on aren't visible. You can't cast to a native signed int with code like this.
If you know that you want the value converted to a 32-bit signed integer--regardless of the platform--you can just do the conversion with the simple math:
iv = 0xDEADBEEF
if(iv & 0x80000000):
    iv = -0x100000000 + iv
Essentially, the problem is to sign extend from 32 bits to... an infinite number of bits, because Python has arbitrarily large integers. Normally, sign extension is done automatically by CPU instructions when casting, so it's interesting that this is harder in Python than it would be in, say, C.
By playing around, I found something similar to BreizhGatch's function, but that doesn't require a conditional statement. n & 0x80000000 extracts the 32-bit sign bit; then, the - keeps the same 32-bit representation but sign-extends it; finally, the extended sign bits are set on n.
def toSigned32(n):
    n = n & 0xffffffff
    return n | (-(n & 0x80000000))
Bit Twiddling Hacks suggests another solution that perhaps works more generally. n ^ 0x80000000 flips the 32-bit sign bit; then - 0x80000000 will sign-extend the opposite bit. Another way to think about it is that initially, negative numbers are above positive numbers (separated by 0x80000000); the ^ swaps their positions; then the - shifts negative numbers to below 0.
def toSigned32(n):
    n = n & 0xffffffff
    return (n ^ 0x80000000) - 0x80000000
Can I suggest this:
def getSignedNumber(number, bitLength):
    mask = (2 ** bitLength) - 1
    if number & (1 << (bitLength - 1)):
        return number | ~mask
    else:
        return number & mask
print(iv, '->', getSignedNumber(iv, 32))
You may use the struct library to convert values like that. It's ugly, but works:
from struct import pack, unpack
# '<I' and '<i' are always 4 bytes; native 'L' and 'l' can be 8 bytes on 64-bit platforms
signed = unpack('<i', pack('<I', lv & 0xffffffff))[0]
A quick and dirty solution (x is never greater than 32-bit in my case).
if x > 0x7fffffff:
    x = x - 4294967296
If you know how many bits are in the original value, e.g. byte or multibyte values from an I2C sensor, then you can do the standard Two's Complement conversion:
def TwosComp8(n):
    return n - 0x100 if n & 0x80 else n
def TwosComp16(n):
    return n - 0x10000 if n & 0x8000 else n
def TwosComp32(n):
    return n - 0x100000000 if n & 0x80000000 else n
In case the hexadecimal representation of the number is of 4 bytes, this would solve the problem.
def B2T_32(x):
    num=int(x,16)
    if(num & 0x80000000): # If it has the negative sign bit. (MSB=1)
        num -= 0x80000000*2
    return num
print(B2T_32(input("enter a input as a hex value\n")))
Simplest solution with any bit-length of number
Why is the representation of a signed integer so difficult for the human mind to understand? Because it is an idea made for machines. :-)
Let's explain.
If we have a bi-directional 7-bit counter with the initial state
000 0000
and we get a pulse for the back count input. Then the next number to count will be
111 1111
And the people said:
Hey, counter, we need to know that this is a negative reload. You should add a sign bit to let us know about this.
And the counter added:
1111 1111
And people asked:
How are we going to calculate that this is -1?
The counter replied: Find the number one greater than the maximum reading, subtract it, and you get the result.
   1111 1111
-1 0000 0000
____________
(dec)     -1
def sigIntFromHex(a): # a = 0xffe1
    if a & (1 << (a.bit_length()-1)): # check if highest bit is 1 thru & with 0x8000
        return a - (1 << (a.bit_length())) # 0xffe1 - 0x10000
    else:
        return a
# and more elegant:
def sigIntFromHex(a):
    return a - (1 << (a.bit_length())) if a & (1 << (a.bit_length()-1)) else a
b = 0xFFE1
print(sigIntFromHex(b))
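One caveat worth spelling out: sigIntFromHex infers the width from a.bit_length(), so the highest set bit is always 1 and every positive input comes out negative (0x7FE1, for example, also gives -31). When the intended width is known, passing it explicitly avoids this. A sketch of that variant, using the XOR-and-subtract trick; to_signed is a made-up name:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1          # keep only the requested width
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

print(to_signed(0xFFE1, 16))      # -31
print(to_signed(0x7FE1, 16))      # 32737
print(to_signed(0xDEADBEEF, 32))  # -559038737
```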
I hope I helped