numpy argmin vectorization - python

I'm trying to iterate over the rows of a numpy array and, for each position, store the index of the lowest value within a window of 3 elements (left, middle, right) into another array. At the left and right edges the window only covers two values ('left and middle' or 'middle and right'), but everywhere in between it should cover all 3.
For loops do this trivially, but it's very slow. Some kind of numpy vectorization would probably speed this up.
For example:
[1 18 3 6 2]
# should give the indices...
[0 0 2 4 4] # matching values 1 1 3 2 2
A slow for-loop implementation:
for y in range(height):
    for x in range(width):
        i = 0 if x == 0 else x - 1
        other_array[y, x] = np.argmin(array[y, i:x+2]) + i

NOTE: See update below for a solution with no for loops.
This works for an array of any number of dimensions:
def window_argmin(arr):
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    slices = np.concatenate(
        [
            padded[..., np.newaxis, i:i+3]
            for i in range(arr.shape[-1])
        ],
        axis=-2,
    )
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )
The code uses np.pad to pad the last dimension of the array with one extra element on the left and one on the right, so we can always use windows of 3 elements for the argmin. The padding value is max+1, so it will never be picked by argmin.
Then it uses np.concatenate on a list of slices to add a new dimension holding each 3-element window. This is the only place we use a for loop, and we only loop once over the last dimension to create the separate windows. (See the update below for a solution that removes this loop.)
Finally, we call np.argmin on each of the windows.
The resulting indices are relative to each window, so we adjust them by adding the offset of the window's first element (which is -1 for the first window, since it starts at a padded element). The adjustment is a simple addition of an arange array, which broadcasts over the leading dimensions.
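To make the offset arithmetic concrete, here is what the intermediate values look like for the sample input (my illustration, not part of the original answer):
>>> x = np.array([1, 18, 3, 6, 2])
>>> np.pad(x, 1, 'constant', constant_values=np.max(x) + 1)
array([19,  1, 18,  3,  6,  2, 19])
>>> # windows:           [19,1,18] [1,18,3] [18,3,6] [3,6,2] [6,2,19]
>>> # argmin per window:     1         0        1       2        1
>>> # + offsets -1..3:       0         0        2       4        4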
Here's a test with your sample array:
>>> x = np.array([1, 18, 3, 6, 2])
>>> window_argmin(x)
array([0, 0, 2, 4, 4])
And a 3d example:
>>> z
array([[[ 1, 18,  3,  6,  2],
        [ 1,  2,  3,  4,  5],
        [ 3,  6, 19, 19,  7]],

       [[ 1, 18,  3,  6,  2],
        [99,  4,  4, 67,  2],
        [ 9,  8,  7,  6,  3]]])
>>> window_argmin(z)
array([[[0, 0, 2, 4, 4],
        [0, 0, 1, 2, 3],
        [0, 0, 1, 4, 4]],

       [[0, 0, 2, 4, 4],
        [1, 1, 1, 4, 4],
        [1, 2, 3, 4, 4]]])
UPDATE: Here's a version using stride_tricks that doesn't use any for loops:
def window_argmin(arr):
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    slices = np.lib.stride_tricks.as_strided(
        padded,
        shape=arr.shape + (3,),
        strides=padded.strides + (padded.strides[-1],),
    )
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )
What helped me come up with the stride-tricks solution was a numpy issue asking for a sliding window function to be added, which linked to an example implementation, so I adapted it for this specific case. It's still pretty much magic to me, but it works. 😁
Tested and works as expected for arrays of different numbers of dimensions.
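A side note, and my addition rather than part of the original answer: numpy 1.20+ ships np.lib.stride_tricks.sliding_window_view, which wraps the same trick behind a safer interface. Assuming that version is available, the windowing step reduces to:
def window_argmin(arr):
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    # builds the same (..., n, 3) window array without manual stride math
    slices = np.lib.stride_tricks.sliding_window_view(padded, 3, axis=-1)
    return np.argmin(slices, axis=-1) + np.arange(-1, arr.shape[-1]-1)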

import numpy as np

array = [1, 18, 3, 6, 2]
array.insert(0, np.max(array) + 1)  # left-pad with max+1 so it's never the argmin
# [19, 1, 18, 3, 6, 2]
# slicing past the end clamps the rightmost window to 2 elements;
# argmin is relative to the slice start (i-1), minus 1 more for the pad
other_array = [np.argmin(array[i-1:i+2]) + i - 2 for i in range(1, len(array))]
# [0, 0, 2, 4, 4]
array.remove(np.max(array))  # drop the pad element to restore the original array
# [1, 18, 3, 6, 2]

Related

Numpy (python) - create a matrix with rows having subsequent values multiplied by the row's number

I want to create an n × n matrix with rows having subsequent values multiplied by the row's number. For example for n = 4:
[[0, 1, 2, 3], [0, 2, 4, 6], [0, 3, 6, 9], [0, 4, 8, 12]]
For creating such a matrix, I know the following code can be used:
n, n = 3, 3
K = np.empty(shape=(n, n), dtype=int)
i,j = np.ogrid[:n, :n]
L = i+j
print(L)
I don't know how I can make rows having subsequent values multiplied by the row's number.
You can use the outer product of two vectors to create an array like that. Use np.outer(). For example, for n = 4:
import numpy as np
n = 4
row = np.arange(n)
np.outer(row + 1, row)
This produces:
array([[ 0,  1,  2,  3],
       [ 0,  2,  4,  6],
       [ 0,  3,  6,  9],
       [ 0,  4,  8, 12]])
Take a look at row and try different orders of multiplication, etc., to see what's going on here. As others pointed out in the comments, you should also review your code: you're assigning n twice and never using K (and in general I'd avoid np.empty() as a beginner, because it can lead to unexpected behaviour).
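For what it's worth (my addition, not part of the original answer), np.outer here is equivalent to explicit broadcasting, which makes the row/column roles visible:
import numpy as np

n = 4
row = np.arange(n)
# an (n, 1) column of row numbers times an (n,) row of values broadcasts to (n, n)
K = (row + 1)[:, np.newaxis] * row
print(K)
# [[ 0  1  2  3]
#  [ 0  2  4  6]
#  [ 0  3  6  9]
#  [ 0  4  8 12]]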

Replace consecutive duplicates in 2D numpy array

I have a two dimensional numpy array x:
import numpy as np
x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (lets take -1), but by leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
    consec_dup = np.zeros(x.shape, dtype=bool)
    consec_dup[:, 1:] = np.diff(x, axis=1) == 0
    x[consec_dup] = -1
    return x

# current output
replace_consecutive_duplicates(x)
# array([[ 1,  2,  8,  4,  5, -1, -1,  3],
#        [ 0,  2, -1, -1, -1,  1, -1,  4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1,  2,  8,  4, -1,  5, -1,  3],
#        [ 0, -1,  2, -1, -1,  1, -1,  4]])
Note that for consecutive duplicate sequences with an even number of occurrences, the left of the two middle values should stay unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1].
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!
The main problem is that you need the length of each run of consecutive values. That is not easy to get with numpy alone, but using itertools.groupby we can solve it with the following code.
import itertools

import numpy as np

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])

def replace_row(arr: np.ndarray, new_val=-1):
    results = []
    for val, count in itertools.groupby(arr):
        # length of this run of equal values
        k = len(list(count))
        # keep the middle value: (k-1)//2 fillers before it, k//2 after it
        results.extend([new_val] * ((k - 1) // 2))
        results.append(val)
        results.extend([new_val] * (k // 2))
    return np.fromiter(results, arr.dtype)

if __name__ == '__main__':
    for idx, row in enumerate(x):
        x[idx, :] = replace_row(row)
    print(x)
Output:
[[ 1  2  8  4 -1  5 -1  3]
 [ 0 -1  2 -1 -1  1 -1  4]]
This isn't vectorized, but it can be combined with multithreading, since every row is handled independently.
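Since the question explicitly asks for a vectorized 2D solution, here is a run-length-based sketch (my addition, only lightly tested): the loop over rows remains, but all the per-row work is vectorized numpy.
import numpy as np

def replace_consecutive_duplicates_rle(x, new_val=-1):
    out = np.full_like(x, new_val)
    for r in range(x.shape[0]):
        row = x[r]
        # positions where a new run of equal values starts
        starts = np.flatnonzero(np.concatenate(([True], row[1:] != row[:-1])))
        # length of each run
        lengths = np.diff(np.concatenate((starts, [row.size])))
        # keep only the (left-)middle element of each run
        mids = starts + (lengths - 1) // 2
        out[r, mids] = row[mids]
    return out

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
print(replace_consecutive_duplicates_rle(x))
# [[ 1  2  8  4 -1  5 -1  3]
#  [ 0 -1  2 -1 -1  1 -1  4]]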

Numpy concatenate lists where first column is in range n

I am trying to select all rows in a numpy matrix named matrix with shape (25323, 9) where the value of the first column falls between start and end for each tuple in the list range_tuple. Ultimately, I want to create a new numpy matrix final with shape (n, 9) from the result.
The following code returns this error: TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1,9)) and using np.concatenate, but got similar results. I do get a result when I use final.append(result) instead of np.concatenate, but then the shape of the matrix gets lost. I know there is a proper solution to this problem; any help would be appreciated.
final = []
for i in range_tuples:
    copy = np.copy(matrix)
    start = i[0]
    end = i[1]
    result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist = []
    ...: for start, stop in tups:
    ...:     res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
    ...:     alist.append(res)
    ...:
Check the list; note that the elements differ in shape; some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
        [3, 4, 5]]),
 array([[ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]]),
 array([[ 9, 10, 11]]),
 array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
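A quick check of that claim (my addition):
In [39]: np.array_equal(np.concatenate(alist), np.vstack(alist))
Out[39]: True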
Try the following:
final = np.empty((0, 9))
for start, stop in range_tuples:
    result = matrix[(matrix[:,0] < stop) & (matrix[:,0] > start)]
    final = np.concatenate((final, result))
The first change is to initialize final as a numpy array. Also, the first argument to concatenate has to be a sequence (tuple or list) of arrays; see the docs. In your original code, result was interpreted as the value of the axis parameter.
Notes:
- I used tuple unpacking to make the loop clearer.
- The copy is not needed.
- Appending to a plain list can be faster; the final array can then be obtained by reshaping, if result is always of the same length.
I would simply create a boolean mask to select rows that satisfy the required conditions.
EDIT: I missed that you are working with matrix (as opposed to ndarray). The answer was edited for matrix.
Assume the following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # or matrix[mask].copy() if you intend to modify matrix
To check:
print(final)
[[1 2 3]
 [2 1 7]
 [8 9 0]]
If the length of range_tuple can differ from the number of rows in the matrix, then do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
    np.ravel(
        (matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
    ),
    (0, matrix.shape[0] - n)
)
final = matrix[mask]

Incrementing elements of an array that are indexed by another array

I'm implementing a Circle Hough Transform, so I have a 3D Numpy array C of counters representing possible Xcenter, Ycenter, Radius combinations. I want to increment the counters that are indexed by another 2D Numpy array I. So, for example, if I is
[[xc0, yc0, r0],
 ...,
 [xcN, ycN, rN]]
then I want to say something like:
C[I] = C[I] + 1
and I want the effect to be:
C[xc0, yc0, r0] = C[xc0, yc0, r0] + 1
...
C[xcN, ycN, rN] = C[xcN, ycN, rN] + 1
However the indexing that's performed seems to be mixed up, referring to the wrong entries in C. Further, I would really prefer to say something like:
C[I] += 1
since this would appear to reduce the amount of index calculation.
So, two questions:
How can I get the effect of "array indexed by array"?
Can I get away with using the increment operator, and does it actually save any time?
The technique you are seeking is generally called advanced or fancy indexing. The premise to fancy indexing is that you need indices of broadcastable size in each dimension. The corresponding elements at each location in the index arrays select a single element from the array being indexed. In your case, all that means is that you need to split I across the different dimensions. Since I is currently N x 3, you can do
C[tuple(I.T)] += 1
If you could pre-transpose I somehow, you could do
C[*I] += 1
(note that star-unpacking inside a subscript requires Python 3.11 or later).
Using the in-place increment is by far your best bet here. If you do
C[tuple(I.T)] = C[tuple(I.T)] + 1
a copy of the N indexed elements will be made. The copy will then be incremented, and reassigned correctly to the source array. You can imagine how this would be much more expensive than just incrementing values in place.
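One caveat worth adding (my note, not part of the original answer): with fancy indexing, C[tuple(I.T)] += 1 increments each bin only once, even if the same index triple appears several times in I. In a Hough accumulator many votes can land in the same bin, so the unbuffered np.add.at is the safe way to accumulate:
import numpy as np

C = np.zeros((4, 4, 4), dtype=int)
I = np.array([[1, 2, 3],
              [1, 2, 3],   # duplicate vote for the same bin
              [0, 0, 0]])

idx = tuple(I.T)
np.add.at(C, idx, 1)   # accumulates duplicate indices correctly
print(C[1, 2, 3])      # 2; plain C[idx] += 1 would give 1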
I support what @MadPhysicist suggested. The following elaborates on his suggestion and validates that you get a consistent result.
Possible Methods
# method-1
C[I[:,0], I[:,1], I[:,2]] += 1
# method-2
C[tuple(I.T)] += 1
Solution in Detail
Make Dummy Data
I = np.vstack([
    np.random.randint(6, size=10),
    np.random.randint(5, size=10),
    np.random.randint(3, size=10),
]).T
C = np.arange(90).reshape((6, 5, 3))
I
I
Output
array([[2, 3, 2],
       [1, 3, 2],
       [2, 0, 0],
       [0, 3, 0],
       [2, 0, 2],
       [2, 3, 2],
       [4, 0, 2],
       [2, 1, 2],
       [4, 1, 1],
       [1, 1, 1]])
First we use a list comprehension
Here we extract values from C, treating the rows of I as a subset of its indices. That way we know what to expect if we follow what @MadPhysicist suggested.
I2 = [tuple(x) for x in tuple(I)]
[C[x] for x in I2]
Output
[41, 26, 30, 9, 32, 41, 62, 35, 64, 19]
Crosscheck
Let's see what is inside I2.
[(2, 3, 2),
 (1, 3, 2),
 (2, 0, 0),
 (0, 3, 0),
 (2, 0, 2),
 (2, 3, 2),
 (4, 0, 2),
 (2, 1, 2),
 (4, 1, 1),
 (1, 1, 1)]
So, this shows us that we have something tangible.
Test Other Methods
Method-1
C[I[:,0], I[:,1], I[:,2]]
Method-2
C[tuple(I.T)]
Output
Both the methods 1 and 2 produce the same as before.
array([41, 26, 30, 9, 32, 41, 62, 35, 64, 19])
OK, so the indexing problem is solved.
Now we address the problem posed in the question. Use either method-1 or method-2 below. Method-2 is more concise (as suggested by @MadPhysicist).
# method-1
C[I[:,0], I[:,1], I[:,2]] += 1
# method-2
C[tuple(I.T)] += 1
Quick Test
Here is a quick test (on a copy, so C is left unchanged) as a safety precaution.
B = C.copy()
B[tuple(I.T)] += 1
B[tuple(I.T)]
Output
array([42, 27, 31, 10, 33, 42, 63, 36, 65, 20])
So, it works!
You could do something like this:
In [3]: c = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
In [4]: c
Out[4]:
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])
In [5]: i = [0, 2]
In [6]: c[i]
Out[6]:
array([[1, 2, 3],
       [3, 4, 5]])
In [7]: c[i] + 1
Out[7]:
array([[2, 3, 4],
       [4, 5, 6]])
You can simply index as c[i], where i holds the indices (In [5] above).
You can let numpy handle the rest: incrementing by 1, or adding any scalar to an array, is handled by broadcasting the scalar. As for whether it is faster, I don't know. You can read more about it here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
Hope that helps.

Choose indices in numpy arrays on particular dimensions [duplicate]

This question already has answers here:
Index n dimensional array with (n-1) d array
(3 answers)
Closed 4 years ago.
It is hard to come up with a clear title, but an example will show it clearly.
For example, my inputs are:
c = np.full((4, 3, 2), 5)
c[:,:,1] *= 2
ix = np.random.randint(0, 2, (4, 3))
if ix is:
array([[1, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       [1, 1, 0]])
if want as a result:
array([[10,  5, 10],
       [ 5,  5, 10],
       [ 5,  5, 10],
       [10, 10,  5]])
My c array can have an arbitrary number of dimensions, as can the dimension I want to sample along.
It sounds like interpolation, but I'm reluctant to construct a big array of indices each time I want to apply this. Is there a way of doing this using some kind of indexing on numpy arrays? Or do I have to use some interpolation method...
Speed and memory are a concern here, because I have to do this many times and the arrays can be really large.
Thanks for any insight!
Create the x, y indices with numpy.ogrid, and then use advanced indexing:
idx, idy = np.ogrid[:c.shape[0], :c.shape[1]]
c[idx, idy, ix]
#array([[10,  5, 10],
#       [ 5,  5, 10],
#       [ 5,  5, 10],
#       [10, 10,  5]])
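Another option worth knowing (my addition, not part of the original answer): np.take_along_axis (numpy 1.15+) handles this without building index grids by hand, and it generalizes to arbitrary dimensions, which the question asks about:
import numpy as np

c = np.full((4, 3, 2), 5)
c[:, :, 1] *= 2
ix = np.array([[1, 0, 1],
               [0, 0, 1],
               [0, 0, 1],
               [1, 1, 0]])

# indices must have the same ndim as c, so add a trailing axis, then drop it
result = np.take_along_axis(c, ix[..., np.newaxis], axis=-1)[..., 0]
print(result)
# [[10  5 10]
#  [ 5  5 10]
#  [ 5  5 10]
#  [10 10  5]]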
