Number of times an array is present in another array in Python

How can I count the number of times an array is present in a larger array?
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])
The number of times b is present in a should be 3 in this example.
b can be any combination of 1s and 0s.
I'm working with huge arrays, so for loops are pretty slow.

If the subarray being searched for contains all 1s, you can count the number of times it appears in the larger array by convolving the two arrays with np.convolve and counting the entries in the result that equal the size of the subarray:
# 'valid' = convolve only over the complete overlap of the signals
>>> np.convolve(a, b, mode='valid')
array([1, 1, 2, 3, 2, 2, 2, 3, 3, 2, 1, 1])
#               ^           ^  ^   <= Matches
>>> win_size = min(a.size, b.size)
>>> np.count_nonzero(np.convolve(a, b, mode='valid') == win_size)
3
For subarrays that may contain 0s, you can start by using convolution to transform a into an array containing the binary numbers encoded by each window of size b.size. Then just compare each element of the transformed array with the binary number encoded by b and count the matches:
>>> b = np.array([0, 1, 1]) # encodes '3'
>>> weights = 2 ** np.arange(b.size) # == [1, 2, 4, 8, ..., 2**(b.size-1)]
>>> np.convolve(a, weights, mode='valid')
array([4, 1, 3, 7, 6, 5, 3, 7, 7, 6, 4, 1])
#            ^           ^   <= Matches
>>> target = (b * np.flip(weights)).sum() # target==3
>>> np.count_nonzero(np.convolve(a, weights, mode='valid') == target)
2
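Putting the second idea together, here is a minimal sketch of a reusable helper (count_pattern is a hypothetical name; it assumes a and b contain only 0s and 1s, so each window encodes a unique binary number):
import numpy as np

def count_pattern(a, b):
    # Encode every length-b window of a as a binary number via convolution,
    # then count the windows that encode the same number as b.
    weights = 2 ** np.arange(b.size)        # [1, 2, 4, ..., 2**(b.size-1)]
    target = (b * np.flip(weights)).sum()   # number encoded by b
    return np.count_nonzero(np.convolve(a, weights, mode='valid') == target)

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
print(count_pattern(a, np.array([1, 1, 1])))  # 3
print(count_pattern(a, np.array([0, 1, 1])))  # 2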

Not a super fast method, but you can view a as a windowed array using np.lib.stride_tricks.sliding_window_view:
window = np.lib.stride_tricks.sliding_window_view(a, b.shape)
You can now equate this to b directly and find where they match:
result = (window == b).all(-1).sum()
For older versions of numpy (pre-1.20.0), you can use np.lib.stride_tricks.as_strided to achieve a similar result:
window = np.lib.stride_tricks.as_strided(
    a, shape=(*(np.array(a.shape) - b.shape + 1), *b.shape),
    strides=a.strides + (a.strides[0],) * b.ndim)
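For reference, a self-contained sketch of the windowed-view approach on the question's data (requires numpy >= 1.20 for sliding_window_view):
import numpy as np

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])

window = np.lib.stride_tricks.sliding_window_view(a, b.shape)  # shape (12, 3)
count = (window == b).all(-1).sum()
print(count)  # 3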

Here is a plain-Python solution using a generator expression over list slices:
a = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
b = [1, 1, 1]
sum(a[i:i+len(b)] == b for i in range(len(a) - len(b) + 1))
output: 3
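Note that this relies on a and b being plain Python lists; if they are numpy arrays (as in the question), the slice comparison returns an element-wise boolean array rather than a single True/False. A small sketch that converts first:
import numpy as np

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])

a_list, b_list = a.tolist(), b.tolist()
count = sum(a_list[i:i + len(b_list)] == b_list
            for i in range(len(a_list) - len(b_list) + 1))
print(count)  # 3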

Here are a few improvements on @Brian's answer:
Use np.correlate, not np.convolve; they are nearly identical, but np.convolve flips the second argument before sliding it along the first, while np.correlate does not.
To deal with templates that contain zeros, convert the zeros to -1. For example:
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([0, 1, 1])
np.correlate(a, 2*b - 1)
# array([-1, 1, 2, 1, 0, 0, 2, 1, 1, 0, -1, 1])
The template fits where the correlation equals the number of ones in the template. The indices can be extracted like so:
(np.correlate(a, 2*b - 1) == np.count_nonzero(b)).nonzero()[0]
# array([2, 6])
If you only need the count, use np.count_nonzero:
np.count_nonzero(np.correlate(a, 2*b - 1) == np.count_nonzero(b))
# 2
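Note that the same one-liner should also handle the original all-ones template, since 2*b - 1 equals b in that case:
b = np.array([1, 1, 1])
np.count_nonzero(np.correlate(a, 2*b - 1) == np.count_nonzero(b))
# 3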

Related

Find first n non-zero values in a numpy 2D array

I would like to know the fastest way to extract the indices of the first n non-zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would get [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys: 2 values in each of the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 3
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i] * len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method that does not require sorting the array (only a linear scan is needed to find the non-zero values), although it chains quite a few functions:
n = 2
# Get the indices of non-zero values, column indices first
nnull = np.stack(np.where(arr.T != 0))
# Split the indices by column
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# Take (at most) n from each group and concatenate
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis=1)
outputs:
array([[0, 0, 1, 1, 2],
       [0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort; it gives a different order, though:
n = 2
m = arr != 0
# non-zero values first
idx = np.argsort(~m, axis=0)
# get the first n and ensure they are non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y, x = np.where(m2)
# slice
x, idx[y, x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
Compare the column-index array from the transposed nonzero against a copy of itself shifted by n (a "dislocation" comparison):
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
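For reference, a self-contained sketch of the shifted-comparison idea (first_n_nonzero is a hypothetical helper name; it assumes arr has at least n non-zero entries in total):
import numpy as np

def first_n_nonzero(arr, n):
    i, j = arr.T.nonzero()                                # column indices, then row indices
    mask = np.concatenate([[True] * n, i[n:] != i[:-n]])  # keep at most n entries per column
    return i[mask], j[mask]

arr = np.array([[4, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 4, 0, 0],
                [2, 0, 9, 0],
                [6, 0, 0, 0],
                [0, 7, 0, 0],
                [3, 0, 0, 0],
                [1, 2, 0, 0]])
print(first_n_nonzero(arr, 2))
# (array([0, 0, 1, 1, 2]), array([0, 3, 2, 5, 3]))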

Most computationally efficient way to classify the comparison of two numpy arrays of 1s and 0s (whether an index of both contains 1s, or 0s, etc.)

Say I have two numpy arrays
prediction = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1])
groundtrue = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1])
I would like to compare the two arrays, with a classification of the comparison at each index. So I want to classify whether both prediction and groundtrue have 1s, or prediction has 1 and groundtrue has 0, or vice versa, etc.
So an example desired result could look like this
comparison = np.array([1, 2, 3, 4, 1, 3, 4, 4, 1])
1 is used if both have 1s, 2 if both have 0s, 3 if prediction has 0 and groundtrue has 1, and so on.
The most straightforward way is to use a loop and compare each index directly, but it seems that numpy has operations that can do this far more efficiently and in far fewer lines of code.
np.where(prediction == groundtrue, 1, 0)
This gets me an array showing where the two values are equal or not, but I don't see how to get the different types of comparisons from it.
Since you have only 0s and 1s in prediction and groundtrue, there will be only 4 cases in total. Hence you can create a classification with:
prediction + 2 * groundtrue
# array([3, 0, 2, 1, 3, 2, 1, 1, 3])
It won't use the same labels as in the OP, but it is enough to distinguish the different cases:
0 if both prediction and groundtrue are 0;
1 if groundtrue is 0 and prediction is 1;
2 if groundtrue is 1 and prediction is 0;
3 if both are 1.
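If you do need the exact labels from the question, a small lookup table applied on top of this encoding works (this is essentially what the lookup-table answer further down does); a minimal sketch:
lut = np.array([2, 4, 3, 1])          # maps encodings 0, 1, 2, 3 to labels 2, 4, 3, 1
lut[prediction + 2 * groundtrue]
# array([1, 2, 3, 4, 1, 3, 4, 4, 1])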
One can do this with boolean masks.
import numpy as np
prediction = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1])
groundtrue = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1])
comparison = np.zeros_like(prediction)
comparison[(prediction==1) & (groundtrue==1)] = 1
comparison[(prediction==0) & (groundtrue==0)] = 2
comparison[(prediction==0) & (groundtrue==1)] = 3
comparison[(prediction==1) & (groundtrue==0)] = 4
comparison # array([1, 2, 3, 4, 1, 3, 4, 4, 1])
The following gives the same output as the above, but you only need to compute two masks. This assumes that prediction and ground truth can only have values of 0 or 1.
import numpy as np
prediction = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1])
groundtrue = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1])
pred0 = prediction == 0
ground0 = groundtrue == 0
comparison = np.zeros_like(prediction)
comparison[~pred0 & ~ground0] = 1
comparison[pred0 & ground0] = 2
comparison[pred0 & ~ground0] = 3
comparison[~pred0 & ground0] = 4
comparison # array([1, 2, 3, 4, 1, 3, 4, 4, 1])
The masks will have boolean values, where each value indicates whether the condition is true in the original array.
x = np.array([0, 1])
x == 0 # array([ True, False])
The & operator will compute the logical AND of both boolean arrays. See numpy.logical_and for more information.
Then, one can use the boolean mask to subset certain values. We can assign that subset of indices a new value.
x = np.array([0, 1])
x[x==0] = 2
x # array([2, 1])
You can try using code like this:
import numpy as np
prediction = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1])
groundtrue = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1])
lut = np.array([2, 4, 3, 1])  # lookup table
pow_two = 2 ** np.arange(2)
comparison = lut[np.dot(pow_two, np.vstack((prediction, groundtrue)))]

How to find a distance between elements in numpy array?

For example, I have the following array z:
array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
How can I find the distances between successive 1s in this array, measured in the number of 0s between them?
For example, in the z array, such distances are:
[1, 3, 2]
I have this code for it:
distances = []
prev_idx = 0
for idx, element in enumerate(z):
    if element == 1:
        distances.append(idx - prev_idx)
        prev_idx = idx
distances = np.array(distances[1:]) - 1
Can this operation be done without a for loop, and maybe in a more efficient way?
UPD
The solution in @warped's answer works fine in the 1-D case.
But what if z is a 2-D array like np.array([z, z])?
You can use np.where to find the ones, and then np.diff to get the distances:
q = np.where(z == 1)
np.diff(q[0]) - 1
out:
array([1, 3, 2], dtype=int64)
edit:
For 2-D arrays:
You can use the minimum of the Manhattan distances (decremented by 1) between the positions that contain ones to get the number of zeros in between:
def manhattan_distance(a, b):
    return np.abs(np.array(a) - np.array(b)).sum()

zeros_between = []
r, c = np.where(z == 1)
coords = list(zip(r, c))
for i, c in enumerate(coords[:-1]):
    zeros_between.append(
        np.min([manhattan_distance(c, coords[j]) - 1 for j in range(i + 1, len(coords))]))
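If "2-D" simply means applying the 1-D rule to each row independently (e.g. z2 = np.array([z, z])), one straightforward sketch, albeit with a Python-level comprehension, is:
z2 = np.array([z, z])
per_row = [np.diff(np.where(row == 1)[0]) - 1 for row in z2]
# [array([1, 3, 2]), array([1, 3, 2])]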
If you don't want to use the for loop, you can use np.where and np.roll:
import numpy as np
x = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
pos = np.where(x == 1)[0]  # pos = array([0, 2, 6, 9])
shift = np.roll(pos, -1)   # shift = array([2, 6, 9, 0])
result = ((shift - pos) - 1)[:-1]
# shift-pos            = array([ 2,  4,  3, -9])
# (shift-pos)-1        = array([ 1,  3,  2, -10])
# ((shift-pos)-1)[:-1] = array([ 1,  3,  2])
print(result)

Comparing 2 numpy arrays

I have 2 numpy arrays, and I want that whenever an element of B is 1, the corresponding element of A is set to 0. Both arrays always have the same shape:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
I tried to do it with numpy indexing but I still can't get it to work:
B[A==1] = 0
How can I achieve this in numpy without the conventional loop?
First, you need them to be numpy arrays, not lists. Second, you have simply swapped A and B.
import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])
A[B==1]=0 ## array([1, 2, 3, 0, 5])
If you use lists instead, here is what you get:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
A[B==1]=0 ## [0, 2, 3, 4, 5]
That's because, for lists, B == 1 evaluates to False, i.e. 0 (instead of a boolean array), so you essentially write A[0] = 0.
Isn't this what you want to do?
A[B==1] = 0
A
array([1, 2, 3, 0, 5])
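If you would rather not modify A in place, a small non-destructive sketch uses np.where:
A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])
C = np.where(B == 1, 0, A)  # array([1, 2, 3, 0, 5]); A is left unchanged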

Array of ranges with numpy

I have the following array:
>>> x = numpy.array([2,4,2,3,1])
>>> x
array([2, 4, 2, 3, 1])
I would like an array of ranges of these values. I can create it like this:
>>> numpy.hstack([numpy.arange(v) for v in x])
array([0, 1, 0, 1, 2, 3, 0, 1, 0, 1, 2, 0])
Given x, is there a faster way to generate this with numpy without having to use a for loop?
I figured it out:
>>> x
array([2, 4, 2, 3, 1])
>>> ends = numpy.cumsum(x)
>>> ranges = numpy.arange(ends[-1])
>>> ranges = ranges - numpy.repeat(ends-x, x)
>>> ranges
array([0, 1, 0, 1, 2, 3, 0, 1, 0, 1, 2, 0])
Is this actually faster?
I have a similar need, and
concatenate([range(l, r) for l, r in array((left, right)).T])
is twice as fast as
range(end[-1]) + repeat(left + end, right-left)
(where end = cumsum(right - left) just like yours).
(In my admittedly short experience, repeat is very slow, at least in Python 3.6.)
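For what it's worth, here is the cumsum/repeat idea from the answer above wrapped in a small reusable helper (multi_arange is a hypothetical name):
import numpy as np

def multi_arange(x):
    # Concatenation of arange(v) for each v in x, without a Python loop.
    ends = np.cumsum(x)
    return np.arange(ends[-1]) - np.repeat(ends - x, x)

print(multi_arange(np.array([2, 4, 2, 3, 1])))
# [0 1 0 1 2 3 0 1 0 1 2 0]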
