Numpy - How to remove trailing N*8 zeros - python

I have a 1d array and I need to remove all trailing blocks of 8 zeros.
[0,1,1,0,1,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0]
->
[0,1,1,0,1,0,0,0]
a.shape[0] % 8 == 0 always holds, so no worries about that.
My current approach is below. Is there a better way to do it?
import numpy as np

P = 8
arr1 = np.random.randint(2, size=np.random.randint(5, 10) * P)
arr2 = np.random.randint(1, size=np.random.randint(5, 10) * P)
arr = np.concatenate((arr1, arr2))

indexes = []
arr = np.flip(arr).reshape(arr.shape[0] // P, P)
for i, f in enumerate(arr):
    if (f == 0).all():
        indexes.append(i)
    else:
        break
arr = np.delete(arr, indexes, axis=0)
arr = np.flip(arr.reshape(arr.shape[0] * P))

You can do it without allocating more space by using views and np.argmax to get the last nonzero element:
index = arr.size - np.argmax(arr[::-1])
Rounding up to the nearest multiple of eight is easy:
index = int(np.ceil(index / 8)) * 8
Now chop off the rest:
arr = arr[:index]
Or as a one-liner:
arr = arr[:int(np.ceil((arr.size - np.argmax(arr[::-1])) / 8)) * 8]
This version is O(n) in time and O(1) in space because it reuses the same buffers for everything (including the output).
This has the additional advantage that it will work correctly even if there are no trailing zeros. Using argmax this way does rely on the array being binary (all nonzero elements equal), though. If that is not the case, you will need to compute a boolean mask first, e.g. with arr.astype(bool).
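For a non-binary array, a minimal sketch of that mask-first variant (same rounding as above) could look like this:
nz = arr.astype(bool)                    # True wherever arr is nonzero
index = arr.size - np.argmax(nz[::-1])   # one past the last nonzero element
index = int(np.ceil(index / 8)) * 8      # round up to a multiple of 8
arr = arr[:index]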
If you want to use your original approach, you could vectorize that too, although there will be a bit more overhead:
view = arr.reshape(-1, 8)
mask = view.any(axis = 1)
index = view.shape[0] - np.argmax(mask[::-1])
arr = arr[:index * 8]

There is a numpy function that does almost what you want, np.trim_zeros. We can use that:
import numpy as np

def trim_mod(a, m=8):
    t = np.trim_zeros(a, 'b')
    return a[:len(a) - (len(a) - len(t)) // m * m]

def test(a, t, m=8):
    assert (len(a) - len(t)) % m == 0
    assert len(t) < m or np.any(t[-m:])
    assert not np.any(a[len(t):])

for _ in range(1000):
    a = (np.random.random(np.random.randint(10, 100000)) < 0.002).astype(int)
    m = np.random.randint(4, 20)
    t = trim_mod(a, m)
    test(a, t, m)

print("Looks correct")
Prints:
Looks correct
It seems to scale linearly in the number of trailing zeros (see the plot produced by the code below), but it feels rather slow in absolute terms (units are ms per trial), so np.trim_zeros is probably just a Python loop.
Code for the picture:
from timeit import timeit

A = (np.random.random(1000000) < 0.02).astype(int)
m = 8
T = []
for last in range(1, 1000, 9):
    A[-last:] = 0
    A[-last] = 1
    T.append(timeit(lambda: trim_mod(A, m), number=100) * 10)

import pylab
pylab.plot(range(1, 1000, 9), T)
pylab.show()

A low-level approach:
import numba

@numba.njit
def trim8(a):
    n = a.size - 1
    while n >= 0 and a[n] == 0:
        n -= 1
    c = (n // 8 + 1) * 8
    return a[:c]
Some tests:
In [194]: A[-1]=1 # best case
In [196]: %timeit trim_mod(A,8)
5.7 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [197]: %timeit trim8(A)
714 ns ± 33.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [198]: %timeit A[:(A.size - np.argmax(A[::-1]) // 8) * 8]
4.83 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [202]: A[:]=0 #worst case
In [203]: %timeit trim_mod(A,8)
2.5 s ± 49.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [204]: %timeit trim8(A)
1.14 ms ± 71.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [205]: %timeit A[:(A.size - np.argmax(A[::-1]) // 8) * 8]
5.5 ms ± 950 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It has a short circuit mechanism like trim_zeros, but is much faster.
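For illustration, a hedged usage sketch on the array from the question (assumes numba is installed and trim8 is defined as above):
import numpy as np

a = np.array([0, 1, 1, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0])
print(trim8(a))  # expected: [0 1 1 0 1 0 0 0]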

Related

Create all permutations of size N of the digits 0-9 - as optimized as possible using numpy\scipy

I need to create an array of all the permutations of the digits 0-9 of size N (input, 1 <= N <= 10).
I've tried this:
np.array(list(itertools.permutations(range(10), n)))
for n=6:
timeit np.array(list(itertools.permutations(range(10), 6)))
on my machine gives:
68.5 ms ± 881 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
But it is simply not fast enough.
I need it to be below 40ms.
Note:
I cannot change the machine or the numpy version (1.22.3).
Refer to the link provided by @KellyBundy to get a fast method:
import math
import numpy as np

def permutations_(n, k):
    a = np.zeros((math.perm(n, k), k), np.uint8)
    f = 1
    for m in range(n - k + 1, n + 1):
        b = a[:f, n - m + 1:]
        for i in range(1, m):
            a[i * f:(i + 1) * f, n - m] = i
            a[i * f:(i + 1) * f, n - m + 1:] = b + (b >= i)
        b += 1
        f *= m
    return a
Simple test:
In [125]: %timeit permutations_(10, 6)
3.96 ms ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [128]: np.array_equal(permutations_(10, 6), np.array(list(permutations(range(10), 6))))
Out[128]: True
Old answer
Using itertools.chain.from_iterable to flatten the tuples and construct the array lazily gives a small improvement:
In [94]: from itertools import chain, permutations
In [95]: %timeit np.array(list(permutations(range(10), 6)), np.int8)
63.2 ms ± 500 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [96]: %timeit np.fromiter(chain.from_iterable(permutations(range(10), 6)), np.int8).reshape(-1, 6)
28.4 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
@KellyBundy proposed a faster solution in the comments, using the fast iteration of the bytes constructor and the buffer protocol. It seems that numpy.fromiter wastes a lot of time in iteration:
In [98]: %timeit np.frombuffer(bytes(chain.from_iterable(permutations(range(10), 6))), np.int8).reshape(-1, 6)
11.3 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
However, it should be noted that the above result is read-only (thanks to @MichaelSzczesny for the reminder):
In [109]: ar = np.frombuffer(bytes(chain.from_iterable(permutations(range(10), 6))), np.int8).reshape(-1, 6)
In [110]: ar[0, 0] = 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [110], line 1
----> 1 ar[0, 0] = 1
ValueError: assignment destination is read-only
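If a writable array is needed, one option (a sketch, not part of the original answer) is to pay for one extra copy:
import numpy as np
from itertools import chain, permutations

ar = np.frombuffer(bytes(chain.from_iterable(permutations(range(10), 6))),
                   np.int8).reshape(-1, 6).copy()  # .copy() detaches from the read-only buffer
ar[0, 0] = 1  # works now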

Perform sum over different slice of each row for 2D array

I have a 2D array of numbers and would like to average over different indices in each row. Say I have
import numpy as np
data = np.arange(16).reshape(4, 4)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
I have two lists, specifying the first (inclusive) and last (exclusive) index for each row:
start = [1, 0, 1, 2]
end = [2, 1, 3, 4]
Then I would like to achieve this:
result = []
for i in range(4):
    result.append(np.sum(data[i, start[i]:end[i]]))
which gives
[1, 4, 19, 29]
However, the arrays I use are a lot larger than in this example, so this method is too slow for me. Is there some smart way to avoid this loop?
My first idea was to flatten the array. Then, I guess, one would need to somehow make a list of slices and apply it in parallel on the array, which I don't know how to do.
Otherwise, I was thinking of using np.apply_along_axis but I think this only works for functions?
Let's run with your raveling idea. You can convert the indices of your array into raveled indices like this:
ind = np.stack((start, end), axis=0)
ind += np.arange(data.shape[0]) * data.shape[1]
ind = ind.ravel(order='F')
if ind[-1] == data.size:
    ind = ind[:-1]
Now you can ravel the original array and apply np.add.reduceat on the segments thus defined:
np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
TL;DR
def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
Timings
Using the exact same arrays shown in @Divakar's answer, we get the following results (specific to my machine of course):
%timeit einsum_mean(data, start, end)
261 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit broadcasting_mean(data, start, end)
405 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit ragged_mean(data, start, end)
520 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit row_mean(data, start, end)
45.6 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Somewhat surprisingly, this method runs 5-10x faster than all the others, despite doing lots of extra work by adding up all the numbers between the regions of interest. The reason is probably that it has extremely low overhead: the arrays of indices are small, and it only makes a single pass over a 1D array.
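As a quick sanity check, a hedged usage sketch of row_mean on the question's sample data:
import numpy as np

data = np.arange(16).reshape(4, 4)
start = np.array([1, 0, 1, 2])
end = np.array([2, 1, 3, 4])
print(row_mean(data, start, end))  # expected means: 1.0, 4.0, 9.5, 14.5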
Let's try to create a mask with broadcasting:
start = np.array([1,0,1,2])
end = np.array([2,1,3,4])
idx = np.arange(data.shape[1])
mask = (start[:,None] <= idx) & (idx <end[:,None])
# this is for sum
(data * mask).sum(1)
Output:
array([ 1, 4, 19, 29])
And if you need average:
(data * mask).sum(1) / mask.sum(1)
which gives:
array([ 1. , 4. , 9.5, 14.5])
Here's one way with broadcasting to create the masking array and then using np.einsum to sum those per row using that mask -
start = np.asarray(start)
end = np.asarray(end)
r = np.arange(data.shape[1])
m = (r>=start[:,None]) & (r<end[:,None])
out = np.einsum('ij,ij->i',data,m)
To get the averages, divide by the mask summations -
avg_out = np.einsum('ij,ij->i',data,m)/np.count_nonzero(m,axis=1)
Or for the last step, use np.matmul/@:
out = (data[:,None] @ m[:,:,None]).ravel()
Timings
# @Quang Hoang's soln with sum method
def broadcast_sum(data, start, end):
    idx = np.arange(data.shape[1])
    mask = (start[:,None] <= idx) & (idx < end[:,None])
    return (data * mask).sum(1) / mask.sum(1)

# From earlier in this post
def broadcast_einsum(data, start, end):
    r = np.arange(data.shape[1])
    m = (r >= start[:,None]) & (r < end[:,None])
    return np.einsum('ij,ij->i', data, m) / np.count_nonzero(m, axis=1)

# @Paul Panzer's soln
def ragged_mean(data, left, right):
    n, m = data.shape
    ps = np.zeros((n, m + 1), data.dtype)
    left, right = map(np.asarray, (left, right))
    rng = np.arange(len(data))
    np.cumsum(data, axis=1, out=ps[:, 1:])
    return (ps[rng, right] - ps[rng, left]) / (right - left)

# @Mad Physicist's soln
def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
1. Tall array
Using the given sample and tiling it along rows:
In [74]: data = np.arange(16).reshape(4,4)
...: start = [1,0,1,2]
...: end = [2,1,3,4]
...:
...: N = 100000
...: data = np.repeat(data,N,axis=0)
...: start = np.tile(start,N)
...: end = np.tile(end,N)
In [75]: %timeit broadcast_sum(data, start, end)
...: %timeit broadcast_einsum(data, start, end)
...: %timeit ragged_mean(data, start, end)
...: %timeit row_mean(data, start, end)
41.4 ms ± 3.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
38.8 ms ± 996 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24 ms ± 525 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
22.5 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2. Square array
Using a large square array (same as the given sample shape) -
In [76]: np.random.seed(0)
...: data = np.random.rand(10000,10000)
...: start = np.random.randint(0,5000,10000)
...: end = start + np.random.randint(1,5000,10000)
In [77]: %timeit broadcast_sum(data, start, end)
...: %timeit broadcast_einsum(data, start, end)
...: %timeit ragged_mean(data, start, end)
...: %timeit row_mean(data, start, end)
759 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
514 ms ± 5.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
932 ms ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
72.5 ms ± 587 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
We can avoid creating masks at the cost of doing a few more sums:
import numpy as np

def ragged_mean(data, left, right):
    n, m = data.shape
    # ps holds exclusive prefix sums per row: ps[i, j] == data[i, :j].sum()
    ps = np.zeros((n, m + 1), data.dtype)
    left, right = map(np.asarray, (left, right))
    rng = np.arange(len(data))
    np.cumsum(data, axis=1, out=ps[:, 1:])
    # the sum over [left, right) is the difference of two prefix sums
    return (ps[rng, right] - ps[rng, left]) / (right - left)

data = np.arange(16).reshape(4, 4)
start = [1, 0, 1, 2]
end = [2, 1, 3, 4]
print(ragged_mean(data, start, end))
Sample run:
[ 1. 4. 9.5 14.5]
When the slices (or rather the discarded areas) are large, a hybrid method (numpy core in one Python loop) is even faster than @MadPhysicist's. This method is, however, quite slow on @Divakar's tall test case.
import operator as op

def hybrid(data, start, end):
    off = np.arange(0, data.size, data.shape[1])
    start = start + off
    end = end + off
    sliced = op.itemgetter(*map(slice, start, end))(data.ravel())
    return np.fromiter(map(op.methodcaller("sum"), sliced), float, len(data)) / (end - start)
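A hedged usage sketch on the question's sample data (same expected means as the other answers):
import numpy as np

data = np.arange(16).reshape(4, 4)
start = np.array([1, 0, 1, 2])
end = np.array([2, 1, 3, 4])
print(hybrid(data, start, end))  # expected means: 1.0, 4.0, 9.5, 14.5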

Numpy grouping by range of difference between elements

I have an array of angles that I want to group into arrays with a max difference of 2 deg between them.
eg: input:
angles = np.array([[1],[2],[3],[4],[4],[5],[10]])
output
('group', 1)
[[1]
[2]
[3]]
('group', 2)
[[4]
[4]
[5]]
('group', 3)
[[10]]
numpy.diff gets the difference of the next element from the current one; I need the difference of the next elements from the first element of the group.
itertools.groupby groups the elements, but not within a definable range.
numpy.digitize groups the elements by a predefined range, not by a range determined by the elements of the array.
(Maybe I can use this by getting the unique values of the angles, grouping them by their difference and using that as the predefined range?)
My approach, which works but seems extremely inefficient and non-pythonic:
(I am using expand_dims and vstack because I'm working with 1d arrays (not just angles), but I've reduced them to simplify it for this question.)
angles = np.array([[1],[2],[3],[4],[4],[5],[10]])

groupedangles = []
idx1 = 0
diffAngleMax = 2

while idx1 < len(angles):
    angleA = angles[idx1]
    group = np.expand_dims(angleA, axis=0)
    for idx2 in xrange(idx1 + 1, len(angles)):
        angleB = angles[idx2]
        diffAngle = angleB - angleA
        if abs(diffAngle) <= diffAngleMax:
            group = np.vstack((group, angleB))
        else:
            idx1 = idx2
            groupedangles.append(group)
            break
    if idx2 == len(angles) - 1:
        if idx1 == idx2:
            angleA = angles[idx1]
            group = np.expand_dims(angleA, axis=0)
        groupedangles.append(group)
        break

for idx, x in enumerate(groupedangles):
    print('group', idx + 1)
    print(x)
What is a better and faster way to do this?
Update: Here is some Cython treatment.
In [1]: import cython
In [2]: %load_ext Cython
In [3]: %%cython
...: import numpy as np
...: cimport numpy as np
...: def cluster(np.ndarray array, np.float64_t maxdiff):
...:     cdef np.ndarray[np.float64_t, ndim=1] flat = np.sort(array.flatten())
...:     cdef list breakpoints = []
...:     cdef np.float64_t seed = flat[0]
...:     cdef np.int64_t i = 0
...:     for i in range(0, len(flat)):
...:         if (flat[i] - seed) > maxdiff:
...:             breakpoints.append(i)
...:             seed = flat[i]
...:     return np.split(array, breakpoints)
...:
Sparsity test
In [4]: angles = np.random.choice(np.arange(5000), 500).astype(np.float64)[:, None]
In [5]: %timeit cluster(angles, 2)
422 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Duplication test
In [6]: angles = np.random.choice(np.arange(500), 1500).astype(np.float64)[:, None]
In [7]: %timeit cluster(angles, 2)
263 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Both tests show a significant improvement. The algorithm now sorts the input and makes a single run over the sorted array, which makes it stable O(N*log(N)).
Pre-update
This is a variation on seed clustering. It requires no sorting.
def cluster(array, maxdiff):
    tmp = array.copy()
    groups = []
    while len(tmp):
        # select seed
        seed = tmp.min()
        mask = (tmp - seed) <= maxdiff
        groups.append(tmp[mask, None])
        tmp = tmp[~mask]
    return groups
Example:
In [27]: cluster(angles, 2)
Out[27]:
[array([[1],
[2],
[3]]), array([[4],
[4],
[5]]), array([[10]])]
A benchmark for 500, 1000 and 1500 angles:
In [4]: angles = np.random.choice(np.arange(500), 500)[:, None]
In [5]: %timeit cluster(angles, 2)
1.25 ms ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: angles = np.random.choice(np.arange(500), 1000)[:, None]
In [7]: %timeit cluster(angles, 2)
1.46 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: angles = np.random.choice(np.arange(500), 1500)[:, None]
In [9]: %timeit cluster(angles, 2)
1.99 ms ± 72.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
While the algorithm is O(N^2) in the worst case and O(N) in the best case, the benchmarks above clearly show near-linear time growth, because the actual runtime depends on the structure of your data: sparsity and the duplication rate. In most real cases you won't hit the worst case.
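For intuition, a hedged sketch of an input that hits the quadratic worst case of the pure-Python cluster above: every element is more than maxdiff away from the next one, so each pass of the while loop peels off a single element.
import numpy as np

N = 5000
angles = np.arange(0, 3 * N, 3).astype(float)[:, None]  # gaps of 3 > maxdiff = 2
groups = cluster(angles, 2)  # N single-element groups, N passes over a shrinking array
print(len(groups))           # 5000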
Some sparsity benchmarks
In [4]: angles = np.random.choice(np.arange(500), 500)[:, None]
In [5]: %timeit cluster(angles, 2)
1.06 ms ± 27.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: angles = np.random.choice(np.arange(1000), 500)[:, None]
In [7]: %timeit cluster(angles, 2)
1.79 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: angles = np.random.choice(np.arange(1500), 500)[:, None]
In [9]: %timeit cluster(angles, 2)
2.16 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: angles = np.random.choice(np.arange(5000), 500)[:, None]
In [11]: %timeit cluster(angles, 2)
3.21 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Here is a sorting-based solution. One could try to be a bit smarter and use bincount and argpartition to avoid the sorting, but at N <= 500 it's not worth the trouble.
import numpy as np

def flexibin(a):
    idx0 = np.argsort(a)
    as_ = a[idx0]
    A = np.r_[as_, as_ + 2]
    idx = np.argsort(A)
    uinv = np.flatnonzero(idx >= len(a))
    linv = np.empty_like(idx)
    linv[np.flatnonzero(idx < len(a))] = np.arange(len(a))
    bins = [0]
    curr = 0
    while True:
        for j in range(uinv[idx[curr]], len(idx)):
            if idx[j] < len(a) and A[idx[j]] > A[idx[curr]] + 2:
                bins.append(j)
                curr = j
                break
        else:
            return np.split(idx0, linv[bins[1:]])
a = 180 * np.random.random((500,))
bins = flexibin(a)
mn, mx = zip(*((np.min(a[b]), np.max(a[b])) for b in bins))
assert np.all(np.diff(mn) > 2)
assert np.all(np.subtract(mx, mn) <= 2)
print('all ok')

Apply a custom function to multidimensional numpy array, keeping the same shape

Basically I want to map over each value of a multidimensional numpy array. The output should have the same shape as the input.
This is the way I did it:
def f(x):
    return x*x
input = np.arange(3*4*5).reshape(3,4,5)
output = np.array(list(map(f, input)))
print(output)
It works, but it feels a bit too complicated (np.array, list, map). Is there a more elegant solution?
Just call your function on the array:
f(input)
Also, better not to use the name input for your variable as it is a builtin:
import numpy as np

def f(x):
    return x*x

arr = np.arange(3*4*5).reshape(3,4,5)
print(np.alltrue(f(arr) == np.array(list(map(f, arr)))))
Output:
True
If the function is more complex:
def f(x):
    return x+1 if x%2 else 2*x
use vectorize:
np.vectorize(f)(arr)
Better yet, always try to use vectorized NumPy functions such as np.where:
>>> np.alltrue(np.vectorize(f)(arr) == np.where(arr % 2, arr + 1, arr * 2))
True
The native NumPy version is considerably faster:
%%timeit
np.vectorize(f)(arr)
34 µs ± 996 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
np.where(arr % 2, arr + 1, arr * 2)
5.16 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is much more pronounced for larger arrays:
big_arr = np.arange(30 * 40 * 50).reshape(30, 40, 50)
%%timeit
np.vectorize(f)(big_arr)
15.5 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
np.where(big_arr % 2, big_arr + 1, big_arr * 2)
797 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Check if a numpy array is sorted

I have a numpy array and I would like to check if it is sorted.
>>> a = np.array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
np.all(a[:-1] <= a[1:])
Examples:
is_sorted = lambda a: np.all(a[:-1] <= a[1:])
>>> a = np.array([1,2,3,4,9])
>>> is_sorted(a)
True
>>> a = np.array([1,2,3,4,3])
>>> is_sorted(a)
False
With NumPy tools:
np.all(np.diff(a) >= 0)
but the numpy solutions are all O(n).
If you want fast code and a very quick verdict on unsorted arrays:
import numba

@numba.jit
def is_sorted(a):
    for i in range(a.size - 1):
        if a[i+1] < a[i]:
            return False
    return True
which is O(1) on average for random arrays, since a violation is typically found within the first few elements.
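A hedged usage sketch (assuming numba is installed), using the examples from the earlier answer:
import numpy as np

print(is_sorted(np.array([1, 2, 3, 4, 9])))  # True  (scans the whole array)
print(is_sorted(np.array([1, 2, 3, 4, 3])))  # False (stops at the first descent)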
The inefficient but easy-to-type solution:
(a == np.sort(a)).all()
For completeness, the O(log n) iterative solution is found below. The recursive version is slower and crashes with big vector sizes. However, it is still slower than the native numpy using np.all(a[:-1] <= a[1:]) most likely due to modern CPU optimizations. The only case where the O(log n) is faster is on the "average" random case or if it is "almost" sorted. If you suspect your array is already fully sorted then np.all will be faster.
def is_sorted(a):
    idx = [(0, a.size - 1)]
    while idx:
        i, j = idx.pop(0)  # Breadth-First will find almost-sorted in O(log N)
        if i >= j:
            continue
        elif a[i] > a[j]:
            return False
        elif i + 1 == j:
            continue
        else:
            mid = (i + j) >> 1  # Division by 2 with floor
            idx.append((i, mid))
            idx.append((mid, j))
    return True
is_sorted2 = lambda a: np.all(a[:-1] <= a[1:])
Here are the results:
# Already sorted array - np.all is super fast
sorted_array = np.sort(np.random.rand(1000000))
%timeit is_sorted(sorted_array)
659 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit is_sorted2(sorted_array)
431 µs ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Here I included the random in each command, so we need to subtract its timing
%timeit np.random.rand(1000000)
6.08 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit is_sorted(np.random.rand(1000000))
6.11 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.11 ms - 6.08 ms = 30µs per loop
%timeit is_sorted2(np.random.rand(1000000))
6.83 ms ± 75.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.83 ms - 6.08 ms = 750µs per loop
Net, an O(n) vectorized implementation is better than an O(log n) algorithm, unless you will run arrays of more than ~100 million elements.
