Why are some float < integer comparisons four times slower than others?

When comparing floats to integers, some pairs of values take much longer to be evaluated than other values of a similar magnitude.
For example:
>>> import timeit
>>> timeit.timeit("562949953420000.7 < 562949953421000") # run 1 million times
0.5387085462592742
But if the float or integer is made smaller or larger by a certain amount, the comparison runs much more quickly:
>>> timeit.timeit("562949953420000.7 < 562949953422000") # integer increased by 1000
0.1481498428446173
>>> timeit.timeit("562949953423001.8 < 562949953421000") # float increased by 3001.1
0.1459577925548956
Changing the comparison operator (e.g. using == or > instead) does not affect the times in any noticeable way.
This is not solely related to magnitude because picking larger or smaller values can result in faster comparisons, so I suspect it is down to some unfortunate way the bits line up.
Clearly, comparing these values is more than fast enough for most use cases. I am simply curious as to why Python seems to struggle more with some pairs of values than with others.

A comment in the Python source code for float objects acknowledges that:
Comparison is pretty much a nightmare
This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.
To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is "too big" to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.
When comparing a float v to an integer/long w, the worst case is that:
v and w have the same sign (both positive or both negative),
the number of bits in the integer w is small enough to be counted in a size_t (typically 64 bits, so w is not astronomically large),
the integer w has at least 49 bits,
the exponent of the float v is the same as the number of bits in w.
And this is exactly what we have for the values in the question:
>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49
We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.
Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.
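As a rough illustration, the conditions above can be checked from Python itself. This is only a sketch based on the checks described in this answer (hits_slow_path is my own name, not anything from CPython):

import math

def hits_slow_path(f, n):
    """Rough predicate for the expensive final check: same sign,
    at least 49 bits in the integer, and the float's exponent equal
    to the integer's bit length."""
    if (f < 0) != (n < 0):
        return False              # different signs: decided cheaply
    nbits = n.bit_length()        # ignores the sign, like _PyLong_NumBits
    if nbits <= 48:
        return False              # safe to cast the int to a double
    _, exponent = math.frexp(f)
    return exponent == nbits      # equal: the costly object comparison runs

print(hits_slow_path(562949953420000.7, 562949953421000))  # True (slow pair)
print(hits_slow_path(562949953420000.7, 562949953422000))  # False (fast pair)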
This is specific to the CPython implementation of the language.
The comparison in more detail
The float_richcompare function handles the comparison between two values v and w.
Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I've left them in where relevant. I've also summarised these checks in a list at the foot of the answer.
The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).
The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan:
static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;

    assert(PyFloat_Check(v));
    i = PyFloat_AS_DOUBLE(v);

    if (PyFloat_Check(w))
        j = PyFloat_AS_DOUBLE(w);

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }
We now know that if w failed those checks, it is not a Python float, so the function next checks whether it is a Python integer. If so, the easiest test is to extract the sign of v and the sign of w (0 if zero, -1 if negative, 1 if positive). If the signs differ, this is all the information needed to return the result of the comparison:
else if (PyLong_Check(w)) {
    int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
    int wsign = _PyLong_Sign(w);
    size_t nbits;
    int exponent;

    if (vsign != wsign) {
        /* Magnitudes are irrelevant -- the signs alone
         * determine the outcome.
         */
        i = (double)vsign;
        j = (double)wsign;
        goto Compare;
    }
}
If this check failed, then v and w have the same sign.
The next check counts the number of bits in the integer w. If it has too many bits then it can't possibly be held as a float and so must be larger in magnitude than the float v:
nbits = _PyLong_NumBits(w);
if (nbits == (size_t)-1 && PyErr_Occurred()) {
    /* This long is so large that size_t isn't big enough
     * to hold the # of bits. Replace with little doubles
     * that give the same outcome -- w is so large that
     * its magnitude must exceed the magnitude of any
     * finite float.
     */
    PyErr_Clear();
    i = (double)vsign;
    assert(wsign != 0);
    j = wsign * 2.0;
    goto Compare;
}
On the other hand, if the integer w has 48 or fewer bits, it can safely be turned into a C double j and compared:
if (nbits <= 48) {
    j = PyLong_AsDouble(w);
    /* It's impossible that <= 48 bits overflowed. */
    assert(j != -1.0 || ! PyErr_Occurred());
    goto Compare;
}
From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:
if (vsign < 0) {
    /* "Multiply both sides" by -1; this also swaps the
     * comparator.
     */
    i = -i;
    op = _Py_SwappedOp[op];
}
Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand × 2^exponent and that the significand represents a number between 0.5 and 1:
(void) frexp(i, &exponent);
if (exponent < 0 || (size_t)exponent < nbits) {
    i = 1.0;
    j = 2.0;
    goto Compare;
}
This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand × 2^exponent is less than 2^nbits.
Failing these two checks, the function looks to see whether the exponent is greater than the number of bits in w. This shows that significand × 2^exponent is greater than 2^nbits and so v > |w|:
if ((size_t)exponent > nbits) {
    i = 2.0;
    j = 1.0;
    goto Compare;
}
If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.
The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).
{
    double fracpart;
    double intpart;
    PyObject *result = NULL;
    PyObject *one = NULL;
    PyObject *vv = NULL;
    PyObject *ww = w;

    // snip

    fracpart = modf(i, &intpart); // split i (the double that v mapped to)
    vv = PyLong_FromDouble(intpart);

    // snip

    if (fracpart != 0.0) {
        /* Shift left, and or a 1 bit into vv
         * to represent the lost fraction.
         */
        PyObject *temp;

        one = PyLong_FromLong(1);

        temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
        ww = temp;

        temp = PyNumber_Lshift(vv, one);
        vv = temp;

        temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
        vv = temp;
    }
    // snip
}
}
For brevity I've left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.
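In pure Python, the idea behind that final check can be modelled like this (a sketch of the technique only, not a translation of the actual C code):

import math

def final_check_lt(v, w):
    """Model of the last-resort test "float v < int w" (both positive, as
    the C code has arranged by this point): compare 2*int(v)+1 against 2*w,
    so the discarded fraction is still accounted for exactly."""
    fracpart, intpart = math.modf(v)
    vv, ww = int(intpart), w
    if fracpart != 0.0:
        vv = (vv << 1) | 1   # double vv and set the low bit for the fraction
        ww <<= 1             # double w to keep the comparison consistent
    return vv < ww

print(final_check_lt(4.65, 4))                             # False, as in the text
print(final_check_lt(562949953420000.7, 562949953421000))  # True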
Here is a summary of the checks that are performed by the comparison function.
Let v be a float and cast it as a C double. Now, if w is also a float:
Check whether w is nan or inf. If so, handle this special case separately depending on the type of w.
If not, compare v and w directly by their representations as C doubles.
If w is an integer:
Extract the signs of v and w. If they are different then we know v and w are different and which is the greater value.
(The signs are the same.) Check whether w has so many bits that the bit count doesn't even fit in a size_t. If so, w must have greater magnitude than any finite float, and in particular than v.
Check if w has 48 or fewer bits. If so, it can be safely cast to a C double without losing its precision and compared with v.
(w has more than 48 bits. We will now treat w as a positive integer having changed the compare op as appropriate.)
Consider the exponent of the float v. If the exponent is negative, then v is less than 1 and therefore less than any positive integer. Else, if the exponent is less than the number of bits in w then it must be less than w.
If the exponent of v is greater than the number of bits in w then v is greater than w.
(The exponent is the same as the number of bits in w.)
The final check. Split v into its integer and fractional parts. Double the integer part and add 1 to compensate for the fractional part. Now double the integer w. Compare these two new integers instead to get the result.

Using gmpy2 with arbitrary-precision floats and integers, it is possible to get more uniform comparison performance:
~ $ ptipython
Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec 7 2015, 11:16:01)
In [1]: import gmpy2
In [2]: from gmpy2 import mpfr
In [3]: from gmpy2 import mpz
In [4]: gmpy2.get_context().precision=200
In [5]: i1=562949953421000
In [6]: i2=562949953422000
In [7]: f=562949953420000.7
In [8]: i11=mpz('562949953421000')
In [9]: i12=mpz('562949953422000')
In [10]: f1=mpfr('562949953420000.7')
In [11]: f<i1
Out[11]: True
In [12]: f<i2
Out[12]: True
In [13]: f1<i11
Out[13]: True
In [14]: f1<i12
Out[14]: True
In [15]: %timeit f<i1
The slowest run took 10.15 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 441 ns per loop
In [16]: %timeit f<i2
The slowest run took 12.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 152 ns per loop
In [17]: %timeit f1<i11
The slowest run took 32.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 269 ns per loop
In [18]: %timeit f1<i12
The slowest run took 36.81 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 231 ns per loop
In [19]: %timeit f<i11
The slowest run took 78.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 156 ns per loop
In [20]: %timeit f<i12
The slowest run took 21.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 194 ns per loop
In [21]: %timeit f1<i1
The slowest run took 37.61 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 275 ns per loop
In [22]: %timeit f1<i2
The slowest run took 39.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 259 ns per loop

Related

Counting the number of leading zero bits in a sha256 encryption

I'm having trouble trying to count the number of leading zero bits after a sha256 hash, as I don't have a lot of experience with 'low level' stuff in Python.
hex = hashlib.sha256((some_data_from_file).encode('ascii')).hexdigest()
# hex = 0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f
# this has 20 leading zero bits

hex_digits = int.from_bytes(bytes(hex.encode('ascii')), 'big') # convert str to int

# count num of leading zeroes
def countZeros(x):
    total_bits = 256
    res = 0
    while ((x & (1 << (total_bits - 1))) == 0):
        x = (x << 1)
        res += 1
    return res

print(countZeros(hex_digits)) # returns 2
I've also tried converting it using bin(); however, that didn't provide me with any leading zeros.
Instead of getting the hex digest and analyzing that hex string, you could just get the digest, interpret it as an int, ask for its bit-length, and subtract that from 256:
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Demo (Try it online!):
import hashlib
some_data_from_file = '665782'
# Show hex digest for clarity
hex = hashlib.sha256(some_data_from_file.encode('ascii')).hexdigest()
print(hex)
# Show number of leading zero bits
digest = hashlib.sha256(some_data_from_file.encode('ascii')).digest()
print(256 - int.from_bytes(digest, 'big').bit_length())
Output:
0000000399c6aea5ad0c709a9bc331a3ed6494702bd1d129d8c817a0257a1462
30
Benchmark of my solutions along with Pranav's (I'm not sure how to handle mtraceur's), starting from sha256 objects (i.e., before calling hexdigest() or digest()):
462 ns 464 ns 471 ns just_digest
510 ns 518 ns 519 ns just_hexdigest
566 ns 568 ns 574 ns Kelly3
608 ns 608 ns 611 ns Kelly2
688 ns 688 ns 692 ns Kelly
1139 ns 1139 ns 1140 ns Pranav
Benchmark code (Try it online!):
def Kelly(sha256):
    return 256 - int.from_bytes(sha256.digest(), 'big').bit_length()

def Kelly2(sha256):
    zeros = 0
    for byte in sha256.digest():
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Kelly3(sha256):
    digest = sha256.digest()
    if byte := digest[0]:
        return 8 - byte.bit_length()
    zeros = 0
    for byte in digest:
        if byte:
            return zeros + 8 - byte.bit_length()
        zeros += 8
    return zeros

def Pranav(sha256):
    nzeros = 0
    for c in sha256.hexdigest():
        if c == "0": nzeros += 4
        else:
            digit = int(c, base=16)
            nzeros += 4 - (math.floor(math.log2(digit)) + 1)
            break
    return nzeros

def just_digest(sha256):
    return sha256.digest()

def just_hexdigest(sha256):
    return sha256.hexdigest()

funcs = just_digest, just_hexdigest, Kelly3, Kelly2, Kelly, Pranav

from timeit import repeat
import hashlib, math
from collections import deque

sha256s = [hashlib.sha256(str(i).encode('ascii'))
           for i in range(10_000)]
expect = list(map(Kelly, sha256s))
for func in funcs:
    result = list(map(func, sha256s))
    print(result == expect, func.__name__)

tss = [[] for _ in funcs]
for _ in range(10):
    print()
    for func, ts in zip(funcs, tss):
        t = min(repeat(lambda: deque(map(func, sha256s), 0), number=1))
        ts.append(t)
    for func, ts in zip(funcs, tss):
        print(*('%4d ns ' % (t / len(sha256s) * 1e9) for t in sorted(ts)[:3]),
              func.__name__)
.hexdigest() returns a string, so your hex variable is a string.
I'm going to call it h instead, because hex is a built-in Python function.
So you have:
h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"
Now this is a hexadecimal string. Each digit in a hexadecimal number gives you four bits in binary. Since this has five leading zeros, you already have 5 * 4 = 20 leading zeros.
nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else: break
Then, you need to count the leading zeros in the binary representation of the first non-zero hexadecimal digit. This is easy to get: A number has math.floor(math.log2(number)) + 1 binary digits, i.e. 4 - (math.floor(math.log2(number)) + 1) leading zeros if it's a hexadecimal digit, since they can only have a max of 4 bits. In this case, the digit is a 9 (1001 in binary), so there are zero additional leading zeros.
So, modify the previous loop:
nzeros = 0
for c in h:
    if c == "0": nzeros += 4
    else:
        digit = int(c, base=16)
        nzeros += 4 - (math.floor(math.log2(digit)) + 1)
        break

print(nzeros) # 20
Danger!!!
Is this security-sensitive code? Can this hash ever be the result of hashing secret/private data?
If so, then you should probably implement something in C or similar, while taking care to protect against leaking information about the hash through side-channels.
Otherwise, I suggest picking the version (from any of these answers) that you and the people working on your code find the most intuitive, clear, and so on, unless performance matters more than readability, in which case pick the fastest one.
If your hashes are never of security-sensitive inputs, then:
If you just want a good balance of simplicity and low-effort:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1  # truncate; treat negatives as two's complement
    if value == 0:
        return max_bits
    significant_bits = len(bin(value)) - 2  # has "0b" prefix
    return max_bits - significant_bits
If you want to really embrace the bit twiddling you were trying in your question:
def count_leading_zeroes(value, max_bits=256):
    top_bit = 1 << (max_bits - 1)
    count = 0
    value &= (1 << max_bits) - 1
    while not value & top_bit:
        count += 1
        value <<= 1
    return count
If you're doing manual bit twiddling, I think in this case a loop which counts from the top is the most justified option, because hashes are fairly evenly distributed, so each bit has about an even chance of not being zero.
That means you have a good chance of exiting the loop early, and thus executing more efficiently, if you start from the top (whereas if you start from the bottom you have to check every bit).
You could alternatively do a bit twiddling thing that's inspired by binary search. That way instead of O(n) steps you do O(log(n)) steps. However, this arguably isn't an optimization worth doing in CPython, and for a JIT implementation like PyPy this manual optimization can actually make it harder for automatic optimization to realize that you can just use a raw "count leading zeroes" CPU instruction. If I ever get the time I'll edit this answer with an example of that later.
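For the curious, here is what such a halving ("binary search") approach could look like in Python. This is my own sketch, and, as noted above and in the warnings below, it should not be assumed to be constant-time (or even a net win) in CPython:

def count_leading_zeroes_bisect(value, max_bits=256):
    """Count leading zeros in O(log(max_bits)) halving steps instead of
    testing every bit: if the top half of the current window is zero, all
    of those bits are leading zeros; otherwise the first 1 bit is in it."""
    value &= (1 << max_bits) - 1
    if value == 0:
        return max_bits
    count = 0
    span = max_bits
    while span > 1:
        half = span >> 1
        top = value >> (span - half)    # the top `half` bits of the window
        if top:
            value, span = top, half     # first 1 bit is in the top half
        else:
            count += half               # top half is all zeros: count them
            span -= half
            value &= (1 << span) - 1    # continue in the bottom part
    return count

assert count_leading_zeroes_bisect(1 << 235) == 20   # 256 - 236 leading zeros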
Now about those side-channel attacks: basically, any time you have code that works on secret data (or on any results of secret data which you can't prove, like a cryptographer would, have fully and irretrievably lost all information about the secret data), you should make sure your code performs exactly the same operations and takes the same branches regardless of the data!
Explaining why you should do this is outside the scope of this answer, but if you don't, your code could be harming users by leaking their secret information in ways that hackers could access.
So!
You might be tempted to modify the simple version that uses bin, but bin is inherently hash-dependent: it produces a string whose length depends on the number of leading zeroes, and as far as I know it doesn't (and logically can't, at least not in the general case) guarantee that it does so in constant time without data-dependent branches. So we should assume that merely running bin on an integer leaks information about the integer through side channels like runtime, branch predictor state, amount of memory allocated, and so on.
For illustrative purposes, if we did have a side-channel-safe bin, which I'll call "bin_", we could do:
def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1  # truncate; treat negatives as two's complement
    value <<= 1  # securely compensate for 0
    significant_bits = len(bin_(value)) - 3  # has "0b" prefix and "0" suffix
    return max_bits - significant_bits
In principle, a bit-twiddling loop could do leading zero bit count in constant-time and free of input-dependent branches.
The challenge is writing this neatly in Python, because Python is so abstracted from the underlying machine.
The core problem is this: at the CPU level, it's really easy to take a 1 or 0 bit and branchlessly turn it into something more useful (like a mask with all bits 1s or all bits 0s, which then lets you conditionally but branchlessly select one of two numbers or clear a number, and with that implement something like "if the lowest bit is set, reset the counter to zero"). At the Python level, implementing stuff like this is a struggle through the fog of uncertainty about how the Python runtime is implemented: there are many places where it might be reasonable to have data-dependent branches under the covers. So really we want to reduce the number of Python steps and conversions between the digest that hashlib gives us and our leading zeroes answer.
So the best option is actually to never even reach for human-readable stuff like hex or integer forms of the digest at all! Just stick to the raw digest. Something like this, conceptually:
def count_leading_zeroes_in_bytes(data):
    count = 0
    # branchless "latch" mask to stop counting:
    still_in_leading_zeroes = 1
    for byte in data:
        for index in reversed(range(8)):
            bit = (byte >> index) & 1
            # branchless "conditional" if bit is zero:
            is_zero = bit ^ 1
            # branchlessly increment count if needed:
            count += is_zero & still_in_leading_zeroes
            # branchlessly latch count on first 1 bit:
            still_in_leading_zeroes &= is_zero
    return count
This is the best I was able to think of in pure Python. And it still failed.
But some quick testing by both @KellyBundy and me (see the comments and Kelly's answer for some examples) shows this version is both extremely slow and does not actually achieve input-independent execution times (because there's yet another relevant data-dependent optimization inside Python, and possibly for other reasons we're missing).
So if you're going to try to implement anything like this in Python, test it thoroughly before relying on it to actually be secure, or just take the general gist and implement a C or assembly version. Something like this:
/* _clz.c */
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */

int count_leading_zeroes_bytes(char * bytes, size_t length)
{
    int still_in_leading_zeroes = 1;
    int count = 0;
    while(length--)
    {
        char byte = *bytes++;
        int bits = CHAR_BIT;
        while(bits--)
        {
            int bit = (byte >> bits) & 1;
            int is_zero = bit ^ 1;
            count += is_zero & still_in_leading_zeroes;
            still_in_leading_zeroes &= is_zero;
        }
    }
    return count;
}
# clz.py
import ctypes

# This is just a quick example. A mature version
# would load the library as appropriate for each
# platform.
_library = ctypes.CDLL('./_clz.so')

_count_leading_zeroes_bytes = _library.count_leading_zeroes_bytes

def count_leading_zeroes_bytes(data):
    return _count_leading_zeroes_bytes(
        ctypes.c_char_p(data),
        ctypes.c_size_t(len(data)),
    )
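To try that sketch, the C file first needs to be compiled into a shared library, e.g. with something along the lines of cc -shared -fPIC -o _clz.so _clz.c on Linux (an illustrative command; the exact invocation depends on your compiler and platform).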

Times-two faster than bit-shift, for Python 3.x integers?

I was looking at the source of sorted_containers and was surprised to see this line:
self._load, self._twice, self._half = load, load * 2, load >> 1
Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the following cases:
(times, divide)
(shift, shift)
(times, shift)
(shift, divide)
and found that #3 is consistently faster than other alternatives:
# self._load, self._twice, self._half = load, load * 2, load >> 1

import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1

def test_mixed():
    a, b, c = x, x * 2, x >> 1

def test_mixed_swapped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swapped),
    }

def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])
The question:
Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?
I run Python 3.5 on Ubuntu 14.04.
Edit
Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.
For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.
This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.
Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per digit. So in the general case, operations on integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on a 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on whether x was less than 32,768 or not.)
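You can check the digit width your build uses via sys.int_info (the output below is what a typical 64-bit CPython reports; newer versions add further fields):

import sys
print(sys.int_info)  # e.g. sys.int_info(bits_per_digit=30, sizeof_digit=4)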
When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long. In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }
So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.
In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)
static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/
    wordshift = shiftby / PyLong_SHIFT;               /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/
    oldsize = Py_ABS(Py_SIZE(a));                     /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;    /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);
    /*** ... ***/
}
Integer division
You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.
Or at least, that was the case when this question was originally asked. In CPython 3.6, a fast path for small int // was added, so // now beats >> for small ints too.
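To see how this plays out on your own build, a benchmark in the spirit of the question can compare all four operations for a single-digit and a multi-digit x (a sketch; the relative ordering depends on your CPython version, as noted above):

import timeit

# Small x fits in one 30-bit digit, so the fast paths discussed above apply;
# large x has many digits, so every operation takes the general path.
for setup in ("x = 54321", "x = 10 ** 30"):
    for expr in ("x * 2", "x << 1", "x // 2", "x >> 1"):
        t = timeit.timeit(expr, setup=setup, number=10_000_000)
        print(f"{setup:12}  {expr:8} {t:.3f}s")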

Runtime of python's if substring in string

What is the big O of the following if statement?
if "pl" in "apple":
    ...
What is the overall big O of how Python determines whether the string "pl" is found in the string "apple" (or, more generally, of any substring-in-string search)?
Is this the most efficient way to test if a substring is in a string? Does it use the same algorithm as .find()?
The time complexity is O(N) on average, O(NM) worst case (N being the length of the longer string, M, the shorter string you search for). As of Python 3.10, heuristics are used to lower the worst-case scenario to O(N + M) by switching algorithms.
The same algorithm is used for str.index(), str.find(), str.__contains__() (the in operator) and str.replace(); it is a simplification of the Boyer-Moore with ideas taken from the Boyer–Moore–Horspool and Sunday algorithms.
See the original stringlib discussion post, as well as the fastsearch.h source code; until Python 3.10, the base algorithm had not changed since its introduction in Python 2.5 (apart from some low-level optimisations and corner-case fixes).
The post includes a Python-code outline of the algorithm:
def find(s, p):
    # find first occurrence of p in s
    n = len(s)
    m = len(p)
    skip = delta1(p)[p[m-1]]
    i = 0
    while i <= n-m:
        if s[i+m-1] == p[m-1]: # (boyer-moore)
            # potential match
            if s[i:i+m-1] == p[:m-1]:
                return i
            if s[i+m] not in p:
                i = i + m + 1 # (sunday)
            else:
                i = i + skip # (horspool)
        else:
            # skip
            if s[i+m] not in p:
                i = i + m + 1 # (sunday)
            else:
                i = i + 1
    return -1 # not found
as well as speed comparisons.
In Python 3.10, the algorithm was updated to use an enhanced version of Crochemore and Perrin's Two-Way string searching algorithm for larger problems (with p and s longer than 100 and 2100 characters, respectively, and with s at least 6 times as long as p), in response to a pathological edge case someone reported. The commit adding this change included a write-up on how the algorithm works.
The Two-Way algorithm has a worst-case time complexity of O(N + M), where O(M) is a cost paid up-front to build a shift table from the search needle p. Once you have that table, the algorithm has a best-case performance of O(N/M).
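The kind of pathological input that motivated the change is easy to construct with highly repetitive strings. A sketch (my own construction; timings vary wildly by version, with pre-3.10 interpreters doing on the order of needle-length times haystack-length character comparisons, while 3.10+ stays linear):

import time

needle = "a" * 1_000                   # repetitive needle
haystack = ("a" * 999 + "b") * 2_000   # ~2 MB; never contains the needle

start = time.perf_counter()
print(needle in haystack)              # False
print(f"{time.perf_counter() - start:.3f}s")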
In Python 3.4.2, it looks like they resort to the same function, but there may be a difference in timing nevertheless. For example, s.find is first required to look up the find method of the string, and such.
The algorithm used is a mix between Boyer-Moore and Horspool.
You can use timeit and test it yourself:
maroun@DQHCPY1:~$ python -m timeit 's = "apple";s.find("pl")'
10000000 loops, best of 3: 0.125 usec per loop
maroun@DQHCPY1:~$ python -m timeit 's = "apple";"pl" in s'
10000000 loops, best of 3: 0.0371 usec per loop
Using in is indeed faster (0.0371 usec compared to 0.125 usec).
For actual implementation, you can look at the code itself.
I think the best way to find out is to look at the source. This looks like it would implement __contains__:
static int
bytes_contains(PyObject *self, PyObject *arg)
{
    Py_ssize_t ival = PyNumber_AsSsize_t(arg, PyExc_ValueError);
    if (ival == -1 && PyErr_Occurred()) {
        Py_buffer varg;
        Py_ssize_t pos;
        PyErr_Clear();
        if (PyObject_GetBuffer(arg, &varg, PyBUF_SIMPLE) != 0)
            return -1;
        pos = stringlib_find(PyBytes_AS_STRING(self), Py_SIZE(self),
                             varg.buf, varg.len, 0);
        PyBuffer_Release(&varg);
        return pos >= 0;
    }
    if (ival < 0 || ival >= 256) {
        PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
        return -1;
    }
    return memchr(PyBytes_AS_STRING(self), (int) ival, Py_SIZE(self)) != NULL;
}
in terms of stringlib_find(), which uses fastsearch().

Long numbers hashing complexity in Python

How does Python hash long numbers? I guess it takes O(1) time for 32-bit ints, but the way long integers work in Python makes me think the complexity is not O(1) for them. I've looked for answers in relevant questions, but have found none straightforward enough to make me confident. Thank you in advance.
The long_hash() function indeed loops and depends on the size of the integer, yes:
/* The following loop produces a C unsigned long x such that x is
   congruent to the absolute value of v modulo ULONG_MAX. The
   resulting x is nonzero if and only if v is. */
while (--i >= 0) {
    /* Force a native long #-bits (32 or 64) circular shift */
    x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT);
    x += v->ob_digit[i];
    /* If the addition above overflowed we compensate by
       incrementing. This preserves the value modulo
       ULONG_MAX. */
    if (x < v->ob_digit[i])
        x++;
}
where i is the 'object size', i.e. the number of digits used to represent the number, and the size of a digit depends on your platform.
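A quick way to observe that linear behaviour from Python (a sketch; absolute numbers depend on your machine and CPython version):

import timeit

# hash() walks the internal digit array once, so the time grows roughly
# linearly with the size of the integer.
for bits in (10_000, 100_000, 1_000_000):
    n = 1 << bits
    t = timeit.timeit(lambda: hash(n), number=1_000)
    print(f"{bits:>9}-bit int: {t:.4f}s for 1,000 hashes")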

Extendible hashing - most significant bits

I want to write extendible hashing. On the wiki I found a good implementation in Python, but this code uses the least significant bits: when I have hash 1101, for d = 1 the value is 1 and for d = 2 the value is 01. I would like to use the most significant bits instead. For example: hash 1101, d = 1 gives value 1, d = 2 gives value 11. Is there any simple way to do that? I tried, but I can't get it to work.
Do you understand why it uses the least significant bits?
More or less. It makes things efficient when we are using arrays. OK, so for the hash function I would like to use four bits of a 4-byte integer, but taken from the left (most significant) end.
h = hash(k)
h = h & 0xf #use mask to get four least bits
p = self.pp[ h >> ( 4 - GD)]
And it doesn't work, and I don't know why.
Computing a hash using the least significant bits is the fastest way to compute a hash, because it only requires an AND bitwise operation. This makes it very popular.
Here is an implementation (in C) of a hash using the most significant bits. Since there is no direct way to find the most significant bit, it repeatedly shifts the value right until only the specified number of bits remains.
int significantHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    while (value > mask) {
        value >>= 1;
    }
    return value;
}
I recommend the overlapping hash, which makes use of all the bits of the number. Essentially, it cuts the number into parts with an equal number of bits and XORs them together. It runs slower than the least-significant hash, but faster than the most-significant hash. Above all, it offers better dispersion than the other two methods, making it a better candidate when the numbers to be hashed have a certain bit-related pattern.
int overlappingHash(int value, int bits) {
    int mask = (1 << bits) - 1;
    int answer = 0;
    do {
        answer ^= (value & mask);
        value >>= bits;
    } while (value > 0);
    return answer;
}
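Since the question is about a Python implementation, here is a direct Python port of the overlapping hash above (my own translation; it assumes non-negative values):

def overlapping_hash(value, bits):
    """XOR together consecutive `bits`-wide slices of `value`
    (Python port of the C overlappingHash above; value >= 0)."""
    mask = (1 << bits) - 1
    answer = 0
    while True:
        answer ^= value & mask
        value >>= bits
        if value == 0:
            break
    return answer

print(bin(overlapping_hash(0b11010110, 4)))  # 0b1101 ^ 0b0110 -> 0b1011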
