I'm having trouble trying to count the number of leading zero bits after an sha256 hash function as I don't have a lot of experience on 'low level' stuff in python
hex = hashlib.sha256((some_data_from_file).encode('ascii')).hexdigest()
# hex = 0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f
# this has 20 leading zero bits
hex_digits = int.from_bytes(bytes(hex.encode('ascii')), 'big') #convert str to int
#count num of leading zeroes
def countZeros(x):
total_bits = 256
res = 0
while ((x & (1 << (total_bits - 1))) == 0):
x = (x << 1)
res += 1
return res
print(countZeroes(hex_digits)) # returns 2
I've also tried converting it using bin() however that didn't provide me with any leading zeros.
Instead of getting the hex digest and analyzing that hex string, you could just get the digest, interpret it as an int, ask for its bit-length, and subtract that from 256:
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Demo (Try it online!):
import hashlib
some_data_from_file = '665782'
# Show hex digest for clarity
hex = hashlib.sha256(some_data_from_file.encode('ascii')).hexdigest()
print(hex)
# Show number of leading zero bits
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Output:
0000000399c6aea5ad0c709a9bc331a3ed6494702bd1d129d8c817a0257a1462
30
Benchmark along with Pranav's (not sure how to handle mtraceur's) starting with sha256-values (i.e., before calling hexdigest() or digest()):
462 ns 464 ns 471 ns just_digest
510 ns 518 ns 519 ns just_hexdigest
566 ns 568 ns 574 ns Kelly3
608 ns 608 ns 611 ns Kelly2
688 ns 688 ns 692 ns Kelly
1139 ns 1139 ns 1140 ns Pranav
Benchmark code (Try it online!):
def Kelly(sha256):
return 256 - int.from_bytes(sha256.digest(), 'big').bit_length()
def Kelly2(sha256):
zeros = 0
for byte in sha256.digest():
if byte:
return zeros + 8 - byte.bit_length()
zeros += 8
return zeros
def Kelly3(sha256):
digest = sha256.digest()
if byte := digest[0]:
return 8 - byte.bit_length()
zeros = 0
for byte in digest:
if byte:
return zeros + 8 - byte.bit_length()
zeros += 8
return zeros
def Pranav(sha256):
nzeros = 0
for c in sha256.hexdigest():
if c == "0": nzeros += 4
else:
digit = int(c, base=16)
nzeros += 4 - (math.floor(math.log2(digit)) + 1)
break
return nzeros
def just_digest(sha256):
return sha256.digest()
def just_hexdigest(sha256):
return sha256.hexdigest()
funcs = just_digest, just_hexdigest, Kelly3, Kelly2, Kelly, Pranav
from timeit import repeat
import hashlib, math
from collections import deque
sha256s = [hashlib.sha256(str(i).encode('ascii'))
for i in range(10_000)]
expect = list(map(Kelly, sha256s))
for func in funcs:
result = list(map(func, sha256s))
print(result == expect, func.__name__)
tss = [[] for _ in funcs]
for _ in range(10):
print()
for func, ts in zip(funcs, tss):
t = min(repeat(lambda: deque(map(func, sha256s), 0), number=1))
ts.append(t)
for func, ts in zip(funcs, tss):
print(*('%4d ns ' % (t / len(sha256s) * 1e9) for t in sorted(ts)[:3]), func.__name__)
.hexdigest() returns a string, so your hex variable is a string.
I'm going to call it h instead, because hex is a builtin python function.
So you have:
h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
Now this is a hexadecimal string. Each digit in a hexadecimal number gives you four bits in binary. Since this has five leading zeros, you already have 5 * 4 = 20 leading zeros.
nzeros = 0
for c in h:
if c == "0": nzeros += 4
else: break
Then, you need to count the leading zeros in the binary representation of the first non-zero hexadecimal digit. This is easy to get: A number has math.floor(math.log2(number)) + 1 binary digits, i.e. 4 - (math.floor(math.log2(number)) + 1) leading zeros if it's a hexadecimal digit, since they can only have a max of 4 bits. In this case, the digit is a 9 (1001 in binary), so there are zero additional leading zeros.
So, modify the previous loop:
nzeros = 0
for c in h:
if c == "0": nzeros += 4
else:
digit = int(c, base=16)
nzeros += 4 - (math.floor(math.log2(digit)) + 1)
break
print(nzeros) # 20
Danger!!!
Is this security-sensitive code? Can this hash ever be the result of hashing secret/private data?
If so, then you should probably implement something in C or similar, while taking care to protect against leaking information about the hash through side-channels.
Otherwise, I suggest picking the version (from any of these answers) that you and the people working on your code find the most intuitive, clear, and so on, unless performance matters more than readability, in which case pick the fastest one.
If your hashes are never of security-sensitive inputs, then:
If you just want a good balance of simplicity and low-effort:
def count_leading_zeroes(value, max_bits=256):
value &= (1 << max_bits) - 1 # truncate; treat negatives as 2's compliment
if value == 0:
return max_bits
significant_bits = len(bin(value)) - 2 # has "0b" prefix
return max_bits - significant_bits
If you want to really embrace the bit twiddling you were trying in your question:
def count_leading_zeroes(value, max_bits=256):
top_bit = 1 << (max_bits - 1)
count = 0
value &= (1 << max_bits) - 1
while not value & top_bit:
count += 1
value <<= 1
return count
If you're doing manual bit twiddling, I think in this case a loop which counts from the top is the most justified option, because hashes are rather evenly randomly distributed and so each bit has about even chance of not being zero.
So you have a good chance of exiting the loop early and thus executing more efficiently if you start for from the top (if you start from the bottom you have to check every bit).
You could alternatively do a bit twiddling thing that's inspired by binary search. That way instead of O(n) steps you do O(log(n)) steps. However, this arguably isn't an optimization worth doing in CPython, and for a JIT implementation like PyPy this manual optimization can actually make it harder for automatic optimization to realize that you can just use a raw "count leading zeroes" CPU instruction. If I ever get the time I'll edit this answer with an example of that later.
Now about those side-channel attacks: basically any time you have code that works on secret data (or any results of secret data which you can't prove (like a cryptographer would) have fully irretrievably lost all information about the secret data) , you should make sure your code takes does exactly the same amount of operations and takes the same branches regardless of the data!
Explaining why you should do this is outside the scope of this answer, but if you don't, your code could be harming users by leaking their secret information in ways that hackers could access.
So!
You might be tempted to modify the simple version that uses bin, but bin is inherently hash-dependent: it produces a string whose length is conditional on the leading zeroes, and as far as I know it doesn't (and logically can't! at least not in the general case) guarantee that it does so in constant-time without data-dependent branches. So we should assume merely running bin on an integer leaks information about the integer through side-channels like runtime and branch predictor state and amount of memory allocated and so on.
For illustrative purposes, if we did have a side-channel-safe bin, which I'll call "bin_", we could do:
def count_leading_zeroes(value, max_bits=256):
value &= (1 << max_bits) - 1 # truncate; treat negatives as 2's compliment
value <<= 1 # securely compensate for 0
significant_bits = len(bin_(value)) - 3 # has "0b" prefix and "0" suffix
return max_bits - significant_bits
In principle, a bit-twiddling loop could do leading zero bit count in constant-time and free of input-dependent branches.
The challenge is writing this neatly in Python, because Python is so abstracted from the underlying machine.
The core problem is this: at the CPU level, it's really easy to take a 1 or 0 bit and turn it, branchlessly, into something more useful (like a mask with all bits 1s or all bits 2s, which then lets you conditionally but branchlessly select one of two numbers or clear a number, which you can then use to implement something like "if the lowest bit is set, reset the counter to zero"). At the Python level, implementing stuff like this is a struggle through the fog of a lot of uncertainty of how the Python runtime is implemented - there are many places where it might be reasonable to have data-dependent branches under the covers. So really we want to reduce the amount of Python steps and conversions between the digest that hashlib gives us and our leading zeroes answer.
So the best option is actually to never even reach for human-readable stuff like hex or integer forms of the digest at all! Just stick to the raw digest. Something like this, conceptually:
def count_leading_zeroes_in_bytes(data):
count = 0
# branchless "latch" mask to stop counting:
still_in_leading_zeroes = 1
for byte in data:
for index in reversed(range(8)):
bit = (byte >> index) & 1
# branchless "conditional" if bit is zero:
is_zero = bit ^ 1
# branchlessly increment count if needed:
count += is_zero & still_in_leading_zeroes
# branchlessly latch count on first 1 bit:
still_in_leading_zeroes &= is_zero
return count
This is the best I was able to think of in pure Python. And it still failed.
But some quick testing by both #KellyBundy and me (see comments and Kelly's answer for some examples) shows this version is both extremely slow, and does not actually achieve input-independent execution times (because there's yet another relevant data-dependent optimization inside Python, and possibly for other reasons we're missing).
So if you're going to try to implement anything in Python, test it thoroughly before relying on it to be actually be secure, or just taking the general gist and implementing a C or assembly version. Something like this:
/* _clz.c */
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */
int count_leading_zeroes_bytes(char * bytes, size_t length)
{
int still_in_leading_zeroes = 1;
int count = 0;
while(length--)
{
char byte = *bytes++;
int bits = CHAR_BIT;
while(bits--)
{
int bit = (byte >> bits) & 1;
int is_zero = bit ^ 1;
count += is_zero & still_in_leading_zeroes;
still_in_leading_zeroes &= is_zero;
}
}
return count;
}
# clz.py
import ctypes
# This is just a quick example. A mature version
# would load the library as appropriate for each
# platform.
_library = ctypes.CDLL('./_clz.so')
_count_leading_zeroes_bytes = _library.count_leading_zeroes_bytes
def count_leading_zeroes_bytes(data):
return _count_leading_zeroes_bytes(
ctypes.c_char_p(data),
ctypes.c_size_t(len(data)),
)
Related
I solved the following leet code problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ by using strings (class Solution) and integers (class Solution2). I thought the integer solution should be a lot faster and using much less memory. But actually it takes a lot longer. With n=18 and k=200 it took 10.1 seconds compared to 0.13 seconds of the string solution.
import time
class Solution:
def findKthBit(self, n: int, k: int) -> str:
i = 0
Sn = "0"
Sn = self.findR(Sn, i, n)
return Sn[k-1]
def findR(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSn(Sn, i)
return self.findR(newSn, i+1, n)
def calcNewSn(self, Sn, i):
inverted = ""
for c in Sn:
inverted += "1" if c == "0" else "0"
newSn = Sn + "1" + (inverted)[::-1]
return newSn
class Solution2:
def findKthBitBin(self, n: int, k: int) -> str:
i = 0
# MSB (S1) has to be 1 for bit operations but is actually always 0
Sn = 1
Sn = self.findRBin(Sn, i, n)
lenSn = (2**(n+1)) - 1
return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)
def findRBin(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSnBin(Sn, i)
return self.findRBin(newSn, i+1, n)
def calcNewSnBin(self, Sn, i):
lenSn = 2**(i+1) - 1
newSn = (Sn << 1) | 1
inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
newSn = self.reverseBits(newSn, inverted, lenSn)
return newSn
def getMask(self, lenSn):
mask = 0
for i in range(lenSn):
mask |= (1 << i)
return mask
def reverseBits(self, newSn, inverted, lenSn):
for i in range(lenSn):
newSn <<= 1
newSn |= inverted & 1
inverted >>= 1
return newSn
sol = Solution()
sol2 = Solution2()
start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")
Why is the solution using integers so slow? Are the implemetations of big ints faster in other languages compared to strings?
The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.
For example, look at your reverseBits implementation:
def reverseBits(self, newSn, inverted, lenSn):
for i in range(lenSn):
newSn <<= 1
newSn |= inverted & 1
inverted >>= 1
return newSn
With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.
Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:
def calcNewSn(self, Sn, i):
inverted = ""
for c in Sn:
inverted += "1" if c == "0" else "0"
newSn = Sn + "1" + (inverted)[::-1]
return newSn
Instead of building a new string and copying all the data for the +=, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.
The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.
The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.
Low Level, High Level
You can get a factor of 8 speedup just by replacing reverseBits()
def reverseBits(self, newSn, inverted, lenSn):
inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
return (newSn << lenSn) | inverted
That's linear time. It converts inverted to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed other overly-slow bigint algorithms.
But, as the other answer (which you should accept!) said, it's missing the forest for the trees: the whole approach is off the best track. It's possible, and even with simpler code, to write a function such that all these run blazing fast (O(n) worst case per call), and require only trivial working memory:
>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'
There's not a computer in the world with enough memory to build a string with 2**500-1 characters (or a bigint with that many bits).
I could do this in brute force, but I was hoping there was clever coding, or perhaps an existing function, or something I am not realising...
So some examples of numbers I want:
00000000001111110000
11111100000000000000
01010101010100000000
10101010101000000000
00100100100100100100
The full permutation. Except with results that have ONLY six 1's. Not more. Not less. 64 or 32 bits would be ideal. 16 bits if that provides an answer.
I think what you need here is using the itertools module.
BAD SOLUTION
But you need to be careful, for instance, using something like permutations would just work for very small inputs. ie:
Something like the below would give you a binary representation:
>>> ["".join(v) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
['11000', '01001', '00101', '00011', '10010', '01100', '01010', '10001', '00110', '10100']
then just getting decimal representation of those number:
>>> [int("".join(v), 16) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
[69632, 4097, 257, 17, 65552, 4352, 4112, 65537, 272, 65792]
if you wanted 32bits with 6 ones and 26 zeroes, you'd use:
>>> [int("".join(v), 16) for v in set(itertools.permutations(["1"]*6+["0"]*26))]
but this computation would take a supercomputer to deal with (32! = 263130836933693530167218012160000000 )
DECENT SOLUTION
So a more clever way to do it is using combinations, maybe something like this:
import itertools
num_bits = 32
num_ones = 6
lst = [
f"{sum([2**vv for vv in v]):b}".zfill(num_bits)
for v in list(itertools.combinations(range(num_bits), num_ones))
]
print(len(lst))
this would tell us there is 906192 numbers with 6 ones in the whole spectrum of 32bits numbers.
CREDITS:
Credits for this answer go to #Mark Dickinson who pointed out using permutations was unfeasible and suggested the usage of combinations
Well I am not a Python coder so I can not post a valid code for you. Instead I can do a C++ one...
If you look at your problem you set 6 bits and many zeros ... so I would approach this by 6 nested for loops computing all the possible 1s position and set the bits...
Something like:
for (i0= 0;i0<32-5;i0++)
for (i1=i0+1;i1<32-4;i1++)
for (i2=i1+1;i2<32-3;i2++)
for (i3=i2+1;i3<32-2;i3++)
for (i4=i3+1;i4<32-1;i4++)
for (i5=i4+1;i5<32-0;i5++)
// here i0,...,i5 marks the set bits positions
So the O(2^32) become to less than `~O(26.25.24.23.22.21/16) and you can not go faster than that as that would mean you miss valid solutions...
I assume you want to print the number so for speed up you can compute the number as a binary number string from the start to avoid slow conversion between string and number...
The nested for loops can be encoded as increment operation of an array (similar to bignum arithmetics)
When I put all together I got this C++ code:
int generate()
{
const int n1=6; // number of set bits
const int n=32; // number of bits
char x[n+2]; // output number string
int i[n1],j,cnt; // nested for loops iterator variables and found solutions count
for (j=0;j<n;j++) x[j]='0'; x[j]='b'; j++; x[j]=0; // x = 0
for (j=0;j<n1;j++){ i[j]=j; x[i[j]]='1'; } // first solution
for (cnt=0;;)
{
// Form1->mm_log->Lines->Add(x); // here x is the valid answer to print
cnt++;
for (j=n1-1;j>=0;j--) // this emulates n1 nested for loops
{
x[i[j]]='0'; i[j]++;
if (i[j]<n-n1+j+1){ x[i[j]]='1'; break; }
}
if (j<0) break;
for (j++;j<n1;j++){ i[j]=i[j-1]+1; x[i[j]]='1'; }
}
return cnt; // found valid answers
};
When I use this with n1=6,n=32 I got this output (without printing the numbers):
cnt = 906192
and it was finished in 4.246 ms on AMD A8-5500 3.2GHz (win7 x64 32bit app no threads) which is fast enough for me...
Beware once you start outputing the numbers somewhere the speed will drop drastically. Especially if you output to console or what ever ... it might be better to buffer the output somehow like outputting 1024 string numbers at once etc... But as I mentioned before I am no Python coder so it might be already handled by the environment...
On top of all this once you will play with variable n1,n you can do the same for zeros instead of ones and use faster approach (if there is less zeros then ones use nested for loops to mark zeros instead of ones)
If the wanted solution numbers are wanted as a number (not a string) then its possible to rewrite this so the i[] or i0,..i5 holds the bitmask instead of bit positions ... instead of inc/dec you just shift left/right ... and no need for x array anymore as the number would be x = i0|...|i5 ...
You could create a counter array for positions of 1s in the number and assemble it by shifting the bits in their respective positions. I created an example below. It runs pretty fast (less than a second for 32 bits on my laptop):
bitCount = 32
oneCount = 6
maxBit = 1<<(bitCount-1)
ones = [1<<b for b in reversed(range(oneCount)) ] # start with bits on low end
ones[0] >>= 1 # shift back 1st one because it will be incremented at start of loop
index = 0
result = []
while index < len(ones):
ones[index] <<= 1 # shift one at current position
if index == 0:
number = sum(ones) # build output number
result.append(number)
if ones[index] == maxBit:
index += 1 # go to next position when bit reaches max
elif index > 0:
index -= 1 # return to previous position
ones[index] = ones[index+1] # and prepare it to move up (relative to next)
64 bits takes about a minute, roughly proportional to the number of values that are output. O(n)
The same approach can be expressed more concisely in a recursive generator function which will allow more efficient use of the bit patterns:
def genOneBits(bitcount=32,onecount=6):
for bitPos in range(onecount-1,bitcount):
value = 1<<bitPos
if onecount == 1: yield value; continue
for otherBits in genOneBits(bitPos,onecount-1):
yield value + otherBits
result = [ n for n in genOneBits(32,6) ]
This is not faster when you get all the numbers but it allows partial access to the list without going through all values.
If you need direct access to the Nth bit pattern (e.g. to get a random one-bits pattern), you can use the following function. It works like indexing a list but without having to generate the list of patterns.
def numOneBits(bitcount=32,onecount=6):
def factorial(X): return 1 if X < 2 else X * factorial(X-1)
return factorial(bitcount)//factorial(onecount)//factorial(bitcount-onecount)
def nthOneBits(N,bitcount=32,onecount=6):
if onecount == 1: return 1<<N
bitPos = 0
while bitPos<=bitcount-onecount:
group = numOneBits(bitcount-bitPos-1,onecount-1)
if N < group: break
N -= group
bitPos += 1
if bitPos>bitcount-onecount: return None
result = 1<<bitPos
result |= nthOneBits(N,bitcount-bitPos-1,onecount-1)<<(bitPos+1)
return result
# bit pattern at position 1000:
nthOneBit(1000) # --> 10485799 (00000000101000000000000000100111)
This allows you to get the bit patterns on very large integers that would be impossible to generate completely:
nthOneBits(10000, bitcount=256, onecount=9)
# 77371252457588066994880639
# 100000000000000000000000000000000001000000000000000000000000000000000000000000001111111
It is worth noting that the pattern order does not follow the numerical order of the corresponding numbers
Although nthOneBits() can produce any pattern instantly, it is much slower than the other functions when mass producing patterns. If you need to manipulate them sequentially, you should go for the generator function instead of looping on nthOneBits().
Also, it should be fairly easy to tweak the generator to have it start at a specific pattern so you could get the best of both approaches.
Finally, it may be useful to obtain then next bit pattern given a known pattern. This is what the following function does:
def nextOneBits(N=0,bitcount=32,onecount=6):
if N == 0: return (1<<onecount)-1
bitPositions = []
for pos in range(bitcount):
bit = N%2
N //= 2
if bit==1: bitPositions.insert(0,pos)
index = 0
result = None
while index < onecount:
bitPositions[index] += 1
if bitPositions[index] == bitcount:
index += 1
continue
if index == 0:
result = sum( 1<<bp for bp in bitPositions )
break
if index > 0:
index -= 1
bitPositions[index] = bitPositions[index+1]
return result
nthOneBits(12) #--> 131103 00000000000000100000000000011111
nextOneBits(131103) #--> 262175 00000000000001000000000000011111 5.7ns
nthOneBits(13) #--> 262175 00000000000001000000000000011111 49.2ns
Like nthOneBits(), this one does not need any setup time. It could be used in combination with nthOneBits() to get subsequent patterns after getting an initial one at a given position. nextOneBits() is much faster than nthOneBits(i+1) but is still slower than the generator function.
For very large integers, using nthOneBits() and nextOneBits() may be the only practical options.
You are dealing with permutations of multisets. There are many ways to achieve this and as #BPL points out, doing this efficiently is non-trivial. There are many great methods mentioned here: permutations with unique values. The cleanest (not sure if it's the most efficient), is to use the multiset_permutations from the sympy module.
import time
from sympy.utilities.iterables import multiset_permutations
t = time.process_time()
## Credit to #BPL for the general setup
multiPerms = ["".join(v) for v in multiset_permutations(["1"]*6+["0"]*26)]
elapsed_time = time.process_time() - t
print(elapsed_time)
On my machine, the above computes in just over 8 seconds. It generates just under a million results as well:
len(multiPerms)
906192
When comparing floats to integers, some pairs of values take much longer to be evaluated than other values of a similar magnitude.
For example:
>>> import timeit
>>> timeit.timeit("562949953420000.7 < 562949953421000") # run 1 million times
0.5387085462592742
But if the float or integer is made smaller or larger by a certain amount, the comparison runs much more quickly:
>>> timeit.timeit("562949953420000.7 < 562949953422000") # integer increased by 1000
0.1481498428446173
>>> timeit.timeit("562949953423001.8 < 562949953421000") # float increased by 3001.1
0.1459577925548956
Changing the comparison operator (e.g. using == or > instead) does not affect the times in any noticeable way.
This is not solely related to magnitude because picking larger or smaller values can result in faster comparisons, so I suspect it is down to some unfortunate way the bits line up.
Clearly, comparing these values is more than fast enough for most use cases. I am simply curious as to why Python seems to struggle more with some pairs of values than with others.
A comment in the Python source code for float objects acknowledges that:
Comparison is pretty much a nightmare
This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.
To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is "too big" to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.
When comparing a float v to an integer/long w, the worst case is that:
v and w have the same sign (both positive or both negative),
the integer w has few enough bits that it can be held in the size_t type (typically 32 or 64 bits),
the integer w has at least 49 bits,
the exponent of the float v is the same as the number of bits in w.
And this is exactly what we have for the values in the question:
>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49
We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.
Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.
This is specific to the CPython implementation of the language.
The comparison in more detail
The float_richcompare function handles the comparison between two values v and w.
Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I've left them in where relevant. I've also summarised these checks in a list at the foot of the answer.
The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).
The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan:
static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
double i, j;
int r = 0;
assert(PyFloat_Check(v));
i = PyFloat_AS_DOUBLE(v);
if (PyFloat_Check(w))
j = PyFloat_AS_DOUBLE(w);
else if (!Py_IS_FINITE(i)) {
if (PyLong_Check(w))
j = 0.0;
else
goto Unimplemented;
}
Now we know that if w failed these checks, it is not a Python float. Now the function checks if it's a Python integer. If this is the case, the easiest test is to extract the sign of v and the sign of w (return 0 if zero, -1 if negative, 1 if positive). If the signs are different, this is all the information needed to return the result of the comparison:
else if (PyLong_Check(w)) {
int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
int wsign = _PyLong_Sign(w);
size_t nbits;
int exponent;
if (vsign != wsign) {
/* Magnitudes are irrelevant -- the signs alone
* determine the outcome.
*/
i = (double)vsign;
j = (double)wsign;
goto Compare;
}
}
If this check failed, then v and w have the same sign.
The next check counts the number of bits in the integer w. If it has too many bits then it can't possibly be held as a float and so must be larger in magnitude than the float v:
nbits = _PyLong_NumBits(w);
if (nbits == (size_t)-1 && PyErr_Occurred()) {
/* This long is so large that size_t isn't big enough
* to hold the # of bits. Replace with little doubles
* that give the same outcome -- w is so large that
* its magnitude must exceed the magnitude of any
* finite float.
*/
PyErr_Clear();
i = (double)vsign;
assert(wsign != 0);
j = wsign * 2.0;
goto Compare;
}
On the other hand, if the integer w has 48 or fewer bits, it can safely turned in a C double j and compared:
if (nbits <= 48) {
j = PyLong_AsDouble(w);
/* It's impossible that <= 48 bits overflowed. */
assert(j != -1.0 || ! PyErr_Occurred());
goto Compare;
}
From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:
if (nbits <= 48) {
/* "Multiply both sides" by -1; this also swaps the
* comparator.
*/
i = -i;
op = _Py_SwappedOp[op];
}
Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand * 2exponent and that the significand represents a number between 0.5 and 1:
(void) frexp(i, &exponent);
if (exponent < 0 || (size_t)exponent < nbits) {
i = 1.0;
j = 2.0;
goto Compare;
}
This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand * 2exponent is less than 2nbits.
Failing these two checks, the function looks to see whether the exponent is greater than the number of bit in w. This shows that significand * 2exponent is greater than 2nbits and so v > |w|:
if ((size_t)exponent > nbits) {
i = 2.0;
j = 1.0;
goto Compare;
}
If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.
The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).
{
double fracpart;
double intpart;
PyObject *result = NULL;
PyObject *one = NULL;
PyObject *vv = NULL;
PyObject *ww = w;
// snip
fracpart = modf(i, &intpart); // split i (the double that v mapped to)
vv = PyLong_FromDouble(intpart);
// snip
if (fracpart != 0.0) {
/* Shift left, and or a 1 bit into vv
* to represent the lost fraction.
*/
PyObject *temp;
one = PyLong_FromLong(1);
temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
ww = temp;
temp = PyNumber_Lshift(vv, one);
vv = temp;
temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
vv = temp;
}
// snip
}
}
For brevity I've left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.
Here is a summary of the checks that are performed by the comparison function.
Let v be a float and cast it as a C double. Now, if w is also a float:
Check whether w is nan or inf. If so, handle this special case separately depending on the type of w.
If not, compare v and w directly by their representations as C doubles.
If w is an integer:
Extract the signs of v and w. If they are different then we know v and w are different and which is the greater value.
(The signs are the same.) Check whether w has too many bits to be a float (more than size_t). If so, w has greater magnitude than v.
Check if w has 48 or fewer bits. If so, it can be safely cast to a C double without losing its precision and compared with v.
(w has more than 48 bits. We will now treat w as a positive integer having changed the compare op as appropriate.)
Consider the exponent of the float v. If the exponent is negative, then v is less than 1 and therefore less than any positive integer. Else, if the exponent is less than the number of bits in w then it must be less than w.
If the exponent of v is greater than the number of bits in w then v is greater than w.
(The exponent is the same as the number of bits in w.)
The final check. Split v into its integer and fractional parts. Double the integer part and add 1 to compensate for the fractional part. Now double the integer w. Compare these two new integers instead to get the result.
Using gmpy2 with arbitrary precision floats and integers it is possible to get more uniform comparison performance:
~ $ ptipython
Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec 7 2015, 11:16:01)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import gmpy2
In [2]: from gmpy2 import mpfr
In [3]: from gmpy2 import mpz
In [4]: gmpy2.get_context().precision=200
In [5]: i1=562949953421000
In [6]: i2=562949953422000
In [7]: f=562949953420000.7
In [8]: i11=mpz('562949953421000')
In [9]: i12=mpz('562949953422000')
In [10]: f1=mpfr('562949953420000.7')
In [11]: f<i1
Out[11]: True
In [12]: f<i2
Out[12]: True
In [13]: f1<i11
Out[13]: True
In [14]: f1<i12
Out[14]: True
In [15]: %timeit f<i1
The slowest run took 10.15 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 441 ns per loop
In [16]: %timeit f<i2
The slowest run took 12.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 152 ns per loop
In [17]: %timeit f1<i11
The slowest run took 32.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 269 ns per loop
In [18]: %timeit f1<i12
The slowest run took 36.81 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 231 ns per loop
In [19]: %timeit f<i11
The slowest run took 78.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 156 ns per loop
In [20]: %timeit f<i12
The slowest run took 21.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 194 ns per loop
In [21]: %timeit f1<i1
The slowest run took 37.61 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 275 ns per loop
In [22]: %timeit f1<i2
The slowest run took 39.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 259 ns per loop
If lv stores a long value, and the machine is 32 bits, the following code:
iv = int(lv & 0xffffffff)
results an iv of type long, instead of the machine's int.
How can I get the (signed) int value in this case?
import ctypes
number = lv & 0xFFFFFFFF
signed_number = ctypes.c_long(number).value
You're working in a high-level scripting language; by nature, the native data types of the system you're running on aren't visible. You can't cast to a native signed int with code like this.
If you know that you want the value converted to a 32-bit signed integer--regardless of the platform--you can just do the conversion with the simple math:
iv = 0xDEADBEEF
if(iv & 0x80000000):
iv = -0x100000000 + iv
Essentially, the problem is to sign extend from 32 bits to... an infinite number of bits, because Python has arbitrarily large integers. Normally, sign extension is done automatically by CPU instructions when casting, so it's interesting that this is harder in Python than it would be in, say, C.
By playing around, I found something similar to BreizhGatch's function, but that doesn't require a conditional statement. n & 0x80000000 extracts the 32-bit sign bit; then, the - keeps the same 32-bit representation but sign-extends it; finally, the extended sign bits are set on n.
def toSigned32(n):
n = n & 0xffffffff
return n | (-(n & 0x80000000))
Bit Twiddling Hacks suggests another solution that perhaps works more generally. n ^ 0x80000000 flips the 32-bit sign bit; then - 0x80000000 will sign-extend the opposite bit. Another way to think about it is that initially, negative numbers are above positive numbers (separated by 0x80000000); the ^ swaps their positions; then the - shifts negative numbers to below 0.
def toSigned32(n):
n = n & 0xffffffff
return (n ^ 0x80000000) - 0x80000000
Can I suggest this:
def getSignedNumber(number, bitLength):
mask = (2 ** bitLength) - 1
if number & (1 << (bitLength - 1)):
return number | ~mask
else:
return number & mask
print iv, '->', getSignedNumber(iv, 32)
You may use struct library to convert values like that. It's ugly, but works:
from struct import pack, unpack
signed = unpack('l', pack('L', lv & 0xffffffff))[0]
A quick and dirty solution (x is never greater than 32-bit in my case).
if x > 0x7fffffff:
x = x - 4294967296
If you know how many bits are in the original value, e.g. byte or multibyte values from an I2C sensor, then you can do the standard Two's Complement conversion:
def TwosComp8(n):
return n - 0x100 if n & 0x80 else n
def TwosComp16(n):
return n - 0x10000 if n & 0x8000 else n
def TwosComp32(n):
return n - 0x100000000 if n & 0x80000000 else n
In case the hexadecimal representation of the number is of 4 bytes, this would solve the problem.
def B2T_32(x):
num=int(x,16)
if(num & 0x80000000): # If it has the negative sign bit. (MSB=1)
num -= 0x80000000*2
return num
print(B2T_32(input("enter a input as a hex value\n")))
Simplest solution with any bit-length of number
Why is the syntax of a signed integer so difficult for the human mind to understand. Because this is the idea of machines. :-)
Let's explain.
If we have a bi-directional 7-bit counter with the initial state
000 0000
and we get a pulse for the back count input. Then the next number to count will be
111 1111
And the people said:
Hey, the counter we need to know that this is a negative reload. You
should add a sign letting you know about this.
And the counter added:
1111 1111
And people asked,
How are we going to calculate that this is -1.
The counter replied: Find a number one greater than the reading and subtract it and you get the result.
1111 1111
-10000 0000
____________
(dec) -1
def sigIntFromHex(a): # a = 0x0xffe1
if a & (1 << (a.bit_length()-1)): # check if highest bit is 1 thru & with 0x1000
return a - (1 << (a.bit_length())) # 0xffe1 - 0x10000
else:
return a
###and more elegant:###
def sigIntFromHex(a):
return a - (1 << (a.bit_length())) if a & (1 << (a.bit_length()-1)) else a
b = 0xFFE1
print(sigIntFromHex(b))
I hope I helped
Building on How Do You Express Binary Literals in Python, I was thinking about sensible, intuitive ways to do that Programming 101 chestnut of displaying integers in base-2 form. This is the best I came up with, but I'd like to replace it with a better algorithm, or at least one that should have screaming-fast performance.
def num_bin(N, places=8):
def bit_at_p(N, p):
''' find the bit at place p for number n '''
two_p = 1 << p # 2 ^ p, using bitshift, will have exactly one
# bit set, at place p
x = N & two_p # binary composition, will be one where *both* numbers
# have a 1 at that bit. this can only happen
# at position p. will yield two_p if N has a 1 at
# bit p
return int(x > 0)
bits = ( bit_at_p(N,x) for x in xrange(places))
return "".join( (str(x) for x in bits) )
# or, more consisely
# return "".join([str(int((N & 1 << x)>0)) for x in xrange(places)])
For best efficiency, you generally want to process more than a single bit at a time.
You can use a simple method to get a fixed width binary representation. eg.
def _bin(x, width):
return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))
_bin(x, 8) will now give a zero padded representation of x's lower 8 bits. This can be used to build a lookup table, allowing your converter to process 8 bits at a time (or more if you want to devote the memory to it).
_conv_table = [_bin(x,8) for x in range(256)]
Then you can use this in your real function, stripping off leading zeroes when returning it. I've also added handling for signed numbers, as without it you will get an infinite loop (Negative integers conceptually have an infinite number of set sign bits.)
def bin(x):
if x == 0:
return '0' #Special case: Don't strip leading zero if no other digits
elif x < 0:
sign='-'
x*=-1
else:
sign = ''
l=[]
while x:
l.append(_conv_table[x & 0xff])
x >>= 8
return sign + ''.join(reversed(l)).lstrip("0")
[Edit] Changed code to handle signed integers.
[Edit2] Here are some timing figures of the various solutions. bin is the function above, constantin_bin is from Constantin's answer and num_bin is the original version. Out of curiosity, I also tried a 16 bit lookup table variant of the above (bin16 below), and tried out Python3's builtin bin() function. All timings were for 100000 runs using an 01010101 bit pattern.
Num Bits: 8 16 32 64 128 256
---------------------------------------------------------------------
bin 0.544 0.586 0.744 1.942 1.854 3.357
bin16 0.542 0.494 0.592 0.773 1.150 1.886
constantin_bin 2.238 3.803 7.794 17.869 34.636 94.799
num_bin 3.712 5.693 12.086 32.566 67.523 128.565
Python3's bin 0.079 0.045 0.062 0.069 0.212 0.201
As you can see, when processing long values using large chunks really pays off, but nothing beats the low-level C code of python3's builtin (which bizarrely seems consistently faster at 256 bits than 128!). Using a 16 bit lookup table improves things, but probably isn't worth it unless you really need it, as it uses up a large chunk of memory, and can introduce a small but noticalbe startup delay to precompute the table.
Not screaming-fast, but straightforward:
>>> def bin(x):
... sign = '-' if x < 0 else ''
... x = abs(x)
... bits = []
... while x:
... x, rmost = divmod(x, 2)
... bits.append(rmost)
... return sign + ''.join(str(b) for b in reversed(bits or [0]))
It is also faster than num_bin:
>>> import timeit
>>> t_bin = timeit.Timer('bin(0xf0)', 'from __main__ import bin')
>>> print t_bin.timeit(number=100000)
4.19453350997
>>> t_num_bin = timeit.Timer('num_bin(0xf0)', 'from __main__ import num_bin')
>>> print t_num_bin.timeit(number=100000)
4.70694716882
Even more, it actually works correctly (for my definition of "correctness" :)):
>>> bin(1)
'1'
>>> num_bin(1)
'10000000'