How can I count the number of times an array is present in a larger array?
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])
The count for the number of times b is present in a should be 3
b can be any combination of 1s and 0s
I'm working with huge arrays, so for loops are pretty slow
If the subarray being searched for contains all 1s, you can count the number of times it appears in the larger array by convolving the two arrays with np.convolve and counting the entries in the result that equal the subarray's size:
# 'valid' = convolve only over the complete overlap of the signals
>>> np.convolve(a, b, mode='valid')
array([1, 1, 2, 3, 2, 2, 2, 3, 3, 2, 1, 1])
# ^ ^ ^ <= Matches
>>> win_size = min(a.size, b.size)
>>> np.count_nonzero(np.convolve(a, b, mode='valid') == win_size)
3
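Putting the pieces together, a minimal self-contained version of this approach (using the arrays from the question) might look like:

```python
import numpy as np

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])

# each 'valid' output is the dot product of one window with b;
# it can only equal b.size where the window is all ones
count = np.count_nonzero(np.convolve(a, b, mode='valid') == b.size)
print(count)  # 3
```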
For subarrays that may contain 0s, you can start by using convolution to transform a into an array containing the binary numbers encoded by each window of size b.size. Then just compare each element of the transformed array with the binary number encoded by b and count the matches:
>>> b = np.array([0, 1, 1]) # encodes '3'
>>> weights = 2 ** np.arange(b.size) # == [1, 2, 4, 8, ..., 2**(b.size-1)]
>>> np.convolve(a, weights, mode='valid')
array([4, 1, 3, 7, 6, 5, 3, 7, 7, 6, 4, 1])
# ^ ^ Matches
>>> target = (b * np.flip(weights)).sum() # target==3
>>> np.count_nonzero(np.convolve(a, weights, mode='valid') == target)
2
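The two steps above can be wrapped into one small helper (count_matches is a name chosen here for illustration, not from the original answer); it assumes a and b are strictly 0/1 arrays:

```python
import numpy as np

def count_matches(a, b):
    # interpret each window of a as a binary number via powers-of-two weights;
    # convolution reverses the kernel, so compare against b weighted in reverse
    weights = 2 ** np.arange(b.size)
    target = (b * weights[::-1]).sum()
    return np.count_nonzero(np.convolve(a, weights, mode='valid') == target)

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
print(count_matches(a, np.array([0, 1, 1])))  # 2
print(count_matches(a, np.array([1, 1, 1])))  # 3
```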
Not a super fast method, but you can view a as a windowed array using np.lib.stride_tricks.sliding_window_view:
window = np.lib.stride_tricks.sliding_window_view(a, b.shape)
You can now equate this to b directly and find where they match:
result = (window == b).all(-1).sum()
For older versions of numpy (pre-1.20.0), you can use np.lib.stride_tricks.as_strided to achieve a similar result:
window = np.lib.stride_tricks.as_strided(
    a,
    shape=(*(np.array(a.shape) - b.shape + 1), *b.shape),
    strides=a.strides + (a.strides[0],) * b.ndim)
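As a quick sanity check, here is a self-contained sketch of the as_strided variant for the 1-D arrays from the question (the shape/strides arithmetic here assumes 1-D input):

```python
import numpy as np

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])

# one row per window; each row starts one element after the previous
window = np.lib.stride_tricks.as_strided(
    a,
    shape=(a.size - b.size + 1, b.size),
    strides=(a.strides[0], a.strides[0]))

result = (window == b).all(-1).sum()
print(result)  # 3
```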
Here is a solution using a list comprehension:
a = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
b = [1, 1, 1]
sum(a[i:i+len(b)] == b for i in range(len(a) - len(b) + 1))
output: 3
Here are a few improvements on @Brian's answer:
Use np.correlate, not np.convolve; they are nearly identical, except that convolve reverses the second argument before sliding it over the first.
To deal with templates that contain zeros, convert the zeros to -1. For example:
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([0,1,1])
np.correlate(a,2*b-1)
# array([-1, 1, 2, 1, 0, 0, 2, 1, 1, 0, -1, 1])
The template fits where the correlation equals the number of ones in the template. The indices can be extracted like so:
(np.correlate(a,2*b-1)==np.count_nonzero(b)).nonzero()[0]
# array([2, 6])
If you only need the count, use np.count_nonzero:
np.count_nonzero((np.correlate(a,2*b-1)==np.count_nonzero(b)))
# 2
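These improvements can be bundled into one small function (match_positions is an illustrative name, not from the answer); it assumes a and b contain only 0s and 1s:

```python
import numpy as np

def match_positions(a, b):
    # map template zeros to -1 so any mismatch lowers the correlation;
    # an exact match scores exactly the number of ones in b
    corr = np.correlate(a, 2 * b - 1)
    return np.flatnonzero(corr == np.count_nonzero(b))

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
print(match_positions(a, np.array([0, 1, 1])))       # [2 6]
print(len(match_positions(a, np.array([1, 1, 1]))))  # 3
```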
For example, I have the following array z:
array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
How can I find the distances between successive 1s in this array, measured in the number of 0s between them?
For example, in the z array, the distances are:
[1, 3, 2]
Currently I use this code:
distances = []
prev_idx = 0
for idx, element in enumerate(z):
    if element == 1:
        distances.append(idx - prev_idx)
        prev_idx = idx
distances = np.array(distances[1:]) - 1
Can this operation be done without a for loop, and perhaps more efficiently?
UPD
The solution in @warped's answer works fine in the 1-D case.
But what if z is a 2-D array, like np.array([z, z])?
You can use np.where to find the ones, and then np.diff to get the distances:
q=np.where(z==1)
np.diff(q[0])-1
out:
array([1, 3, 2], dtype=int64)
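For reference, a self-contained version of this using the z from the question (np.flatnonzero(z == 1) is equivalent to np.where(z == 1)[0]):

```python
import numpy as np

z = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
ones = np.flatnonzero(z == 1)   # positions of the 1s: [0, 2, 6, 9]
distances = np.diff(ones) - 1   # zeros between successive 1s
print(distances)  # [1 3 2]
```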
edit:
for 2d arrays:
You can take the minimum of the Manhattan distances (decremented by 1) between the positions that contain ones to get the number of zeros in between:
def manhattan_distance(a, b):
    return np.abs(np.array(a) - np.array(b)).sum()

zeros_between = []
r, c = np.where(z == 1)
coords = list(zip(r, c))
for i, coord in enumerate(coords[:-1]):
    zeros_between.append(
        np.min([manhattan_distance(coord, coords[j]) - 1 for j in range(i + 1, len(coords))]))
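A runnable sketch of this approach on the 2-D example np.array([z, z]) from the question (note that with two identical stacked rows, vertically adjacent ones are one Manhattan step apart, so several of the distances collapse to 0):

```python
import numpy as np

def manhattan_distance(p, q):
    return np.abs(np.array(p) - np.array(q)).sum()

z = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
zz = np.array([z, z])

r, c = np.where(zz == 1)
coords = list(zip(r, c))

# for each one (except the last), the nearest later one minus 1
zeros_between = [
    min(manhattan_distance(coords[i], coords[j]) - 1
        for j in range(i + 1, len(coords)))
    for i in range(len(coords) - 1)
]
print(zeros_between)  # [0, 0, 0, 0, 1, 3, 2]
```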
If you don't want to use the for loop, you can use np.where and np.roll:
import numpy as np
x = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 1])
pos = np.where(x==1)[0] #pos = array([0, 2, 6, 9])
shift = np.roll(pos,-1) # shift = array([2, 6, 9, 0])
result = ((shift-pos)-1)[:-1]
#shift-pos = array([ 2, 4, 3, -9])
#(shift-pos)-1 = array([ 1, 3, 2, -10])
#((shift-pos)-1)[:-1] = array([ 1, 3, 2])
print(result)
I have a huge training dataset with 4 classes. These classes are labeled non-consecutively. To be able to apply a sequential neural network, the classes have to be relabeled so that their unique values are consecutive. In addition, at the end of the script I have to map them back to their old values.
I know how to relabel them with loops:
def relabel(old_classes, new_classes):
    indexes = [np.where(old_classes == np.unique(old_classes)[i]) for i in range(len(new_classes))]
    for i in range(len(new_classes)):
        old_classes[indexes[i]] = new_classes[i]
    return old_classes
>>> old_classes = np.array([0,1,2,6,6,2,6,1,1,0])
>>> new_classes = np.arange(len(np.unique(old_classes)))
>>> relabel(old_classes,new_classes)
array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
But this isn't nice coding and it takes quite a lot of time.
Any idea how to vectorize this relabeling?
To be clear, I also want to be able to relabel them back to their old values:
>>> relabeled_classes=np.array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
>>> old_classes = np.array([0,1,2,6])
>>> relabel(relabeled_classes,old_classes )
array([0,1,2,6,6,2,6,1,1,0])
We can use the optional argument return_inverse with np.unique to get those unique sequential IDs/tags, like so -
unq_arr, unq_tags = np.unique(old_classes,return_inverse=1)
Index into unq_arr with unq_tags to retrieve back -
old_classes_retrieved = unq_arr[unq_tags]
Sample run -
In [69]: old_classes = np.array([0,1,2,6,6,2,6,1,1,0])
In [70]: unq_arr, unq_tags = np.unique(old_classes,return_inverse=1)
In [71]: unq_arr
Out[71]: array([0, 1, 2, 6])
In [72]: unq_tags
Out[72]: array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
In [73]: old_classes_retrieved = unq_arr[unq_tags]
In [74]: old_classes_retrieved
Out[74]: array([0, 1, 2, 6, 6, 2, 6, 1, 1, 0])
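Based on this, the whole relabel-and-restore round trip from the question can be sketched as follows (to_consecutive is an illustrative name, not part of NumPy):

```python
import numpy as np

def to_consecutive(classes):
    # unq_arr maps new labels back to old ones; unq_tags are the new labels
    unq_arr, unq_tags = np.unique(classes, return_inverse=True)
    return unq_tags, unq_arr

old_classes = np.array([0, 1, 2, 6, 6, 2, 6, 1, 1, 0])
new_classes, mapping = to_consecutive(old_classes)
print(new_classes)   # [0 1 2 3 3 2 3 1 1 0]
restored = mapping[new_classes]
print(restored)      # [0 1 2 6 6 2 6 1 1 0]
```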
I have 2 numpy arrays, and whenever an element of B is 1, I want the corresponding element of A to be set to 0. Both arrays always have the same shape:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
I tried to do numpy slicing but I still can't get it to work.
B[A==1]=0
How can I achieve this in numpy without the conventional loop?
First, you need them to be numpy arrays, not lists. Second, you have B and A swapped.
import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])
A[B==1]=0 ## array([1, 2, 3, 0, 5])
If you use lists instead, here is what you get
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
A[B==1]=0 ## [0, 2, 3, 4, 5]
That's because with plain lists, B == 1 evaluates to False (i.e. 0) rather than a boolean array, so you essentially write A[0] = 0.
Isn't this what you want to do?
A[B==1] = 0
A
array([1, 2, 3, 0, 5])
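If you'd rather not modify A in place, the same masking can be done with np.where (an alternative not mentioned in the answers above):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])

# pick 0 where B is 1, otherwise keep the value from A
result = np.where(B == 1, 0, A)
print(result)  # [1 2 3 0 5]
```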
I'll start with a statement of the problem, then demonstrate a brief sequence of code that progressively builds toward the solution until the problem is reached. Obviously, the goal here is to compute b. I am asking how to do this most efficiently, ideally as an elementwise numpy vector expression, with no iteration or loops at all:
b = sum(v)-a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py", line 1251, in sum
return _wrapit(a, 'sum', axis, dtype, out)
File "/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit
result = getattr(asarray(obj),method)(*args, **kwds)
File "/usr/lib64/python2.6/site-packages/numpy/core/numeric.py", line 230, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Here is the best version of the code I tried, which led up to that error. I added some print statements for clarity:
a = array([0,1,0,1,1])
b = +a
print b
array([0, 1, 0, 1, 1])
b = array([sum(a[0:2]), sum(a[0:3]), sum(a[1:4]), sum(a[2:5]), sum(a[3:5])])
print b
array([1, 1, 2, 2, 2])
b = array([sum(a[0:2])-a[0], sum(a[0:3])-a[1], sum(a[1:4])-a[2], sum(a[2:5])-a[3], sum(a[3:5])-a[4]])
print b
array([1, 0, 2, 1, 1])
v = [a[0:2], a[0:3], a[1:4], a[2:5], a[3:5]]
print v
[array([0, 1]), array([0, 1, 0]), array([1, 0, 1]), array([0, 1, 1]), array([1, 1])]
Notice that v is a list of views, each referring to the backing array a.
print a
array([0, 1, 0, 1, 1])
a[0]=9
print v
[array([9, 1]), array([9, 1, 0]), array([1, 0, 1]), array([0, 1, 1]), array([1, 1])]
a[0]=0
print v
[array([0, 1]), array([0, 1, 0]), array([1, 0, 1]), array([0, 1, 1]), array([1, 1])]
So far so good: Variable v is a true view, meaning v is updated when a is updated.
b = array([sum(v[0])-a[0], sum(v[1])-a[1], sum(v[2])-a[2], sum(v[3])-a[3], sum(v[4])-a[4]])
print b
array([1, 0, 2, 1, 1])
Excellent, so far so good. Now let's simplify that line of code a little further.... Please notice that variables b, v, and a all have the same number of elements.
b = sum(v)-a
Traceback (most recent call last)...(error messages)...
Oh-oh, bad code! I also tried other ways to express b, but they raised similar errors, and there is no need to show more bad code here. The question is how to express the assignment correctly, yet most efficiently. In this particular application it would be especially helpful to avoid looping expressions and list comprehensions entirely once the views have been set up.
It's OK in this application to set up the views using slow loops. The views won't be changing very often. The backing array a will change often, and will be quite large.
Thank you for reading and any of your best proposals!
For the particular view v you posted, the computation can be expressed as a convolution with the kernel [1, 1, 1]:
In [78]: import numpy as np
In [80]: a = np.array([0,1,0,1,1])
In [81]: b = np.convolve(a, [1,1,1], 'same') - a
In [82]: b
Out[82]: array([1, 0, 2, 1, 1])
You didn't say how your views change with time, but if they stay similar, you can keep expressing the computation as a convolution, changing only the kernel.
How about:
numpy.vectorize(sum)(v) - a
For example:
>>> import numpy
>>> a = numpy.array([0,1,0,1,1])
>>> v = [a[0:2], a[0:3], a[1:4], a[2:5], a[3:5]]
>>> numpy.vectorize(sum)(v) - a
array([1, 0, 2, 1, 1])
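Note that in recent NumPy versions, building an array from a ragged list like v may raise an error inside np.vectorize (ragged inputs can no longer be implicitly converted to an array). A hedged alternative that avoids stacking the ragged list is np.fromiter over the per-view sums:

```python
import numpy as np

a = np.array([0, 1, 0, 1, 1])
v = [a[0:2], a[0:3], a[1:4], a[2:5], a[3:5]]

# sum each view individually, without ever stacking the ragged list
b = np.fromiter((x.sum() for x in v), dtype=a.dtype, count=len(v)) - a
print(b)  # [1 0 2 1 1]
```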
I think unutbu's answer is the right one. But just for the sake of diversity, here's an approach that uses a rolling window over a base array, of which a is also a view. This assumes we know the length of a in advance.
First we create an overallocated array:
>>> datalen = 5
>>> base = numpy.zeros(datalen + 2, dtype='i8')
Then we define a as a truncated view of that array and initialize it:
>>> a = base[1:-1]
>>> a[:] = [0, 1, 0, 1, 1]
Now we use stride_tricks. The normal strides for an array of shape (5, 3) and dtype='i8' would be (24, 8); by reducing 24 to 8, we ensure that the starting point of each row moves forward by one item instead of 3.
>>> window = numpy.lib.stride_tricks.as_strided(base, shape=(5, 3),
...                                             strides=(8, 8))
>>> window
array([[0, 0, 1],
[0, 1, 0],
[1, 0, 1],
[0, 1, 1],
[1, 1, 0]])
Now we can call sum(axis=1):
>>> window.sum(axis=1) - a
array([1, 0, 2, 1, 1])
a and window point to the same memory, so updates work correctly:
>>> a[0] = 9
>>> window
array([[0, 9, 1],
[9, 1, 0],
[1, 0, 1],
[0, 1, 1],
[1, 1, 0]])
>>> window.sum(axis=1) - a
array([1, 9, 2, 1, 1])
I'll also point out that for the particular example you offer here, something as simple as this works:
>>> base[:5] + base[-5:]
array([1, 9, 2, 1, 1])
>>> a[0] = 0
>>> base[:5] + base[-5:]
array([1, 0, 2, 1, 1])
But I guess your actual needs are more complex.
What about:
b = [sum(i) for i in v] - a
(which works because v has the same number of items as a, and both are 1-D)?