I'm implementing a Circle Hough Transform, so I have a 3D Numpy array C of counters representing possible Xcenter, Ycenter, Radius combinations. I want to increment the counters that are indexed by another 2D Numpy array I. So, for example, if I is
[[xc0, yc0, r0],
...,
[xcN, ycN, rN]]
then I want to say something like:
C[I] = C[I] + 1
and I want the effect to be:
C[xc0, yc0, r0] = C[xc0, yc0, r0] + 1
...
C[xcN, ycN, rN] = C[xcN, ycN, rN] + 1
However the indexing that's performed seems to be mixed up, referring to the wrong entries in C. Further, I would really prefer to say something like:
C[I] += 1
since this would appear to reduce the amount of index calculation.
So, two questions:
How can I get the effect of "array indexed by array"?
Can I get away with using the increment operator, and does it actually save any time?
The technique you are seeking is generally called advanced or fancy indexing. The premise of fancy indexing is that you need indices of broadcastable shape in each dimension. The corresponding elements at each location in the index arrays select a single element from the array being indexed. In your case, all that means is that you need to split I across the different dimensions. Since I is currently N x 3, you can do
C[tuple(I.T)] += 1
If you could pre-transpose I somehow (so it is 3 x N), you could unpack it directly:
C[*I] += 1
Note that the starred subscript requires Python 3.11+; on older versions, C[tuple(I)] += 1 is equivalent.
Using the in-place increment is by far your best bet here. If you do
C[tuple(I.T)] = C[tuple(I.T)] + 1
a copy of the N indexed elements will be made. The copy will then be incremented, and reassigned correctly to the source array. You can imagine how this would be much more expensive than just incrementing values in place.
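One caveat worth adding (not from the original answer, but standard NumPy behavior): a fancy-indexed += is buffered, so if I contains duplicate rows, each repeated index is incremented only once per statement. For a Hough accumulator, where many votes can land in the same cell, np.add.at performs the unbuffered equivalent. A minimal sketch:
import numpy as np

C = np.zeros((4, 4, 4), dtype=int)
I = np.array([[1, 2, 3],
              [1, 2, 3]])  # two votes for the same cell

C[tuple(I.T)] += 1
print(C[1, 2, 3])   # 1 -- the duplicate vote is lost

C2 = np.zeros((4, 4, 4), dtype=int)
np.add.at(C2, tuple(I.T), 1)  # unbuffered add: duplicates accumulate
print(C2[1, 2, 3])  # 2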
I support what @MadPhysicist suggested. The following elaborates on his suggestions and validates that you get consistent results.
Possible Methods
# method-1
C[I[:,0], I[:,1], I[:,2]] += 1
# method-2
C[tuple(I.T)] += 1
Solution in Detail
Make Dummy Data
I = np.vstack([
np.random.randint(6, size=10),
np.random.randint(5, size=10),
np.random.randint(3, size=10),
]).T
C = np.arange(90).reshape((6,5,3))
I
Output
array([[2, 3, 2],
[1, 3, 2],
[2, 0, 0],
[0, 3, 0],
[2, 0, 2],
[2, 3, 2],
[4, 0, 2],
[2, 1, 2],
[4, 1, 1],
[1, 1, 1]])
First we use a list comprehension
Here we extract values from C, treating each row of I as an index tuple. This gives a known-correct baseline, so we know what to expect when we follow what @MadPhysicist suggested.
I2 = [tuple(x) for x in tuple(I)]
[C[x] for x in I2]
Output
[41, 26, 30, 9, 32, 41, 62, 35, 64, 19]
Crosscheck
Let's see what is inside I2.
[(2, 3, 2),
(1, 3, 2),
(2, 0, 0),
(0, 3, 0),
(2, 0, 2),
(2, 3, 2),
(4, 0, 2),
(2, 1, 2),
(4, 1, 1),
(1, 1, 1)]
So the tuples line up with the rows of I, and the extracted values give us a baseline to check against.
Test Other Methods
Method-1
C[I[:,0], I[:,1], I[:,2]]
Method-2
C[tuple(I.T)]
Output
Both methods 1 and 2 produce the same values as the list-comprehension baseline.
array([41, 26, 30, 9, 32, 41, 62, 35, 64, 19])
OK, So the Indexing Problem Is Solved
Now we address the incrementing asked about in this question. Use either method-1 or method-2 below; method-2 is more concise (as suggested by @MadPhysicist).
# method-1
C[I[:,0], I[:,1], I[:,2]] += 1
# method-2
C[tuple(I.T)] += 1
Quick Test
As a safety precaution, here is a quick test on a copy, so that C itself is left unchanged.
B = C.copy()
B[tuple(I.T)] += 1
B[tuple(I.T)]
Output
array([42, 27, 31, 10, 33, 42, 63, 36, 65, 20])
So, it works!
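As for the second half of the question (does += actually save time?), here is a minimal timing sketch; the sizes are made up and the exact numbers will vary by machine:
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
C = np.zeros((100, 100, 50), dtype=np.int64)
I = np.column_stack([
    rng.integers(0, 100, 100_000),
    rng.integers(0, 100, 100_000),
    rng.integers(0, 50, 100_000),
])
idx = tuple(I.T)

def copy_and_assign():
    C[idx] = C[idx] + 1  # gathers a copy, adds, scatters back

def in_place():
    C[idx] += 1  # single gather/add/scatter, no explicit reassignment

print(timeit(copy_and_assign, number=100))
print(timeit(in_place, number=100))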
You could do something like this:
In [3]: c = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
In [4]: c
Out[4]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
In [5]: i = [0, 2]
In [6]: c[i]
Out[6]:
array([[1, 2, 3],
[3, 4, 5]])
In [7]: c[i] + 1
Out[7]:
array([[2, 3, 4],
[4, 5, 6]])
So you can simply index with c[i], where i is a list of indices (In [5] above).
You can let numpy handle the rest: incrementing by 1, or adding any scalar to an array, is handled by broadcasting the scalar. As for whether it is faster, I don't know. You can read more about broadcasting here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
Hope that helps.
Related
I want to know where array a is equal to any of the values in array b.
For example,
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
# like this, but for any size b:
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
The reason is so I can change the values in a from (any of b) to another value:
a[locations] = 99
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time. Is there an array solution?
[edit]
There are now 2 good answers to this question, one using broadcasting with extra dimensions, and another using np.in1d. Both work for the specific case in this question. I ended up using np.isin instead, since it seems like it is more agnostic to the shapes of both a and b.
I accepted the answer that taught me about in1d since that led me to my preferred solution.
You can use np.in1d then reshape back to a's shape so you can set the values in a to your special flag.
import numpy as np
np.random.seed(410012)
a = np.random.randint(0, 16, size=(3, 4))
#array([[ 8, 5, 5, 15],
# [ 3, 13, 8, 10],
# [ 3, 11, 0, 10]])
b = np.array([[2,3,9], [4,5,6]])
a[np.in1d(a, b).reshape(a.shape)] = 999
#array([[ 8, 999, 999, 15],
# [999, 13, 8, 10],
# [999, 11, 0, 10]])
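A side note not in the original answer: newer NumPy releases steer users from np.in1d toward np.isin, which accepts a multi-dimensional a directly, so the same assignment can be written as:
a[np.isin(a, b)] = 999  # np.isin returns a mask with a's shape, no reshape needed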
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time.
EDIT:
Vectorized equivalent to the code you have written above -
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
locations2 = np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
np.allclose(locations, locations2)
True
This shows that your output is exactly the same as this output, without the need of explicitly mentioning b[0], b[1]... or using a for loop.
Explanation -
Broadcasting the operation can help in this case. What you are trying to do is compare each element of the (3,4) matrix to each value in b, which has shape (3,). This means the boolean result you want is three (3,4) matrices stacked together, i.e. shape (3,3,4).
Once you have done that, you want to take an ANY/OR reduction between the three (3,4) matrices element-wise. That reduces the (3,3,4) to a (3,4).
Finally, you use np.nonzero to identify the locations where the values are True.
The above 3 steps can be done as follows -
Broadcasting comparison operation:
a[None,:,:]==b[:,None,None] #(1,3,4) == (3,1,1) -> (3,3,4)
Reduction using OR logic:
(a[None,:,:]==b[:,None,None]).any(0) #(3,3,4) -> (3,4)
Get non-zero locations:
np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
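Putting the three steps together as one runnable sketch (reusing the a and b from above):
import numpy as np

a = np.random.randint(0, 16, size=(3, 4))
b = np.array([2, 3, 9])

mask = (a[None, :, :] == b[:, None, None]).any(0)  # (3,3,4) -> (3,4)
locations = np.nonzero(mask)
a[locations] = 99  # replace every matched value in one shot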
numpy.isin works on multi-dimensional a and b.
In [1]: import numpy as np
In [2]: a = np.random.randint(0, 16, size=(3, 4)); a
Out[2]:
array([[12, 2, 15, 11],
[12, 15, 5, 10],
[ 4, 2, 14, 7]])
In [3]: b = [2, 4, 5, 12]
In [4]: c = [[2, 4], [5, 12]]
In [5]: np.isin(a, b).astype(int)
Out[5]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [6]: np.isin(a, c).astype(int)
Out[6]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [7]: a[np.isin(a, b)] = 99; a
Out[7]:
array([[99, 99, 15, 11],
[99, 15, 99, 10],
[99, 99, 14, 7]])
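Note (a gloss, not part of the original answer): np.isin flattens its second argument before testing membership, which is why the flat list b and the nested list c produce identical masks here.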
I'm trying to iterate over numpy rows and, for each position, put the index of the minimum value within its cluster of 3 elements into another array. This should work in terms of left, middle, right: the left and right edges only look at two values ('left and middle' or 'middle and right'), but everything in between looks at all 3.
For loops do this trivially, but it's very slow. Some kind of numpy vectorization would probably speed this up.
For example:
[1 18 3 6 2]
# should give the indices...
[0 0 2 4 4] # matching values 1 1 3 2 2
Slow for loop of an implementation:
for y in range(height):
for x in range(width):
i = 0 if x == 0 else x - 1
other_array[y,x] = np.argmin(array[y,i:x+2]) + i
NOTE: See update below for a solution with no for loops.
This works for an array of any number of dimensions:
def window_argmin(arr):
padded = np.pad(
arr,
[(0,)] * (arr.ndim-1) + [(1,)],
'constant',
constant_values=np.max(arr)+1,
)
slices = np.concatenate(
[
padded[..., np.newaxis, i:i+3]
for i in range(arr.shape[-1])
],
axis=-2,
)
return (
np.argmin(slices, axis=-1) +
np.arange(-1, arr.shape[-1]-1)
)
The code uses np.pad to pad the last dimension of the array with an extra number to the left and one to the right, so we can always use windows of 3 elements for the argmin. It sets the extra elements as max+1 so they'll never be picked by argmin.
Then it uses np.concatenate on a list of slices to add a new dimension holding the 3-element windows. This is the only place we use a for loop, and it only loops once over the last dimension to create the separate windows. (See the update below for a solution that removes this for loop.)
Finally, we call np.argmin on each of the windows.
The argmin results are relative to each window, so we need to adjust them by adding the offset of the first element of each window (which is -1 for the first window, since that element is padding). We can do the adjustment by simply adding an arange array, which works through broadcasting.
Here's a test with your sample array:
>>> x = np.array([1, 18, 3, 6, 2])
>>> window_argmin(x)
array([0, 0, 2, 4, 4])
And a 3d example:
>>> z
array([[[ 1, 18, 3, 6, 2],
[ 1, 2, 3, 4, 5],
[ 3, 6, 19, 19, 7]],
[[ 1, 18, 3, 6, 2],
[99, 4, 4, 67, 2],
[ 9, 8, 7, 6, 3]]])
>>> window_argmin(z)
array([[[0, 0, 2, 4, 4],
[0, 0, 1, 2, 3],
[0, 0, 1, 4, 4]],
[[0, 0, 2, 4, 4],
[1, 1, 1, 4, 4],
[1, 2, 3, 4, 4]]])
UPDATE: Here's a version using stride_tricks that doesn't use any for loops:
def window_argmin(arr):
padded = np.pad(
arr,
[(0,)] * (arr.ndim-1) + [(1,)],
'constant',
constant_values=np.max(arr)+1,
)
slices = np.lib.stride_tricks.as_strided(
padded,
shape=arr.shape + (3,),
strides=padded.strides + (padded.strides[-1],),
)
return (
np.argmin(slices, axis=-1) +
np.arange(-1, arr.shape[-1]-1)
)
What helped me come up with the stride tricks solution was this numpy issue asking to add a sliding window function, linking to an example implementation of it, so I just adapted it for this specific case. It's still pretty much magic to me, but it works. 😁
Tested and works as expected for arrays of different numbers of dimensions.
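Since NumPy 1.20 there is also np.lib.stride_tricks.sliding_window_view, which builds the same windows without hand-written strides. A sketch of the same function using it (my adaptation, not part of the original answer):
import numpy as np

def window_argmin_swv(arr):
    # same sentinel padding as above
    padded = np.pad(
        arr,
        [(0, 0)] * (arr.ndim - 1) + [(1, 1)],
        constant_values=np.max(arr) + 1,
    )
    # windows has shape arr.shape + (3,): one 3-element window per position
    windows = np.lib.stride_tricks.sliding_window_view(padded, 3, axis=-1)
    return np.argmin(windows, axis=-1) + np.arange(-1, arr.shape[-1] - 1)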
import numpy as np

array = [1, 18, 3, 6, 2]
array.insert(0, np.max(array) + 1)  # prepend a sentinel larger than any value
# [19, 1, 18, 3, 6, 2]
other_array = [np.argmin(array[i-1:i+2]) + i - 2 for i in range(1, len(array))]
# [0, 0, 2, 4, 4] -- at the right edge the slice is simply shorter
array.remove(np.max(array))  # drop the sentinel to restore the original array
# [1, 18, 3, 6, 2]
I am trying to select all rows in a numpy matrix named matrix with shape (25323, 9), where the values of the first column are inside the range of start and end for each tuple in the list range_tuples. Ultimately, I want to create a new numpy matrix final with the result, where final has a shape of (n, 9).
The following code returns this error: TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1,9)) and using np.concatenate, but I get similar results. I do get a result when I use final.append(result) instead of np.concatenate, but then the shape of the matrix gets lost. I know there is a proper solution to this problem; any help would be appreciated.
final = []
for i in range_tuples:
copy = np.copy(matrix)
start = i[0]
end = i[1]
result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist=[]
...: for start, stop in tups:
...: res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
...: alist.append(res)
...:
Check the list; note that the elements differ in shape, and some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
[3, 4, 5]]), array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]), array([[ 9, 10, 11]]), array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
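For completeness, the same join spelled with concatenate:
np.concatenate(alist, axis=0)  # identical result to np.vstack(alist)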
Try the following
final = np.empty((0,9))
for start, stop in range_tuples:
result = matrix[(matrix[:,0] < stop) & (matrix[:,0] > start)]
final = np.concatenate((final, result))
There are two fixes. The first is to initialize final as an empty numpy array with 9 columns. The second is that the first argument to concatenate has to be a sequence (a tuple or list) of the arrays, see the docs. In your code it interprets the result variable as the value for the axis parameter, which is what raises the TypeError.
Notes
I used tuple deconstruction to make the loop clearer
the copy is not needed
appending to a plain Python list inside the loop can be faster; the final array can afterwards be built in one step, e.g. by stacking (a minimal sketch follows these notes)
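A minimal sketch of that append-then-stack pattern, assuming the matrix and range_tuples from the question:
chunks = []
for start, stop in range_tuples:
    chunks.append(matrix[(matrix[:, 0] < stop) & (matrix[:, 0] > start)])
final = np.vstack(chunks)  # one stack at the end instead of a concatenate per iteration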
I would simply create a boolean mask to select rows that satisfy required conditions.
EDIT: I missed that you are working with matrix (as opposed to ndarray). The answer has been edited for matrix.
Assume following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # or matrix[mask].copy() if you intend to modify matrix
To check:
print(final)
[[1 2 3]
[2 1 7]
[8 9 0]]
If the length of range_tuple can differ from the number of rows in the matrix, then do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
np.ravel(
(matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
),
(0, matrix.shape[0] - n)
)
final = matrix[mask]
TL;DR:
I am looking for a way to get a non-trivial, and in particular non-contiguous, view of a numpy ndarray.
E.g., given a 1D ndarray, x = np.array([1, 2, 3, 4]), is there a way to get a non trivial view of it, e.g. np.array([2, 4, 3, 1])?
Longer Version
The context of the question is the following: I have a 4D ndarray of shape (U, V, S, T) which I would like to reshape to a 2D ndarray of shape (U*S, V*T) in a non-trivial way, i.e. a simple np.reshape() does not do the trick, as I have a more complex indexing scheme in mind in which the reshaped array will not be contiguous in memory. The arrays in my case are rather large, and I would like to get a view, not a copy, of the array.
Example
Given an array x(u, v, s, t) of shape (2, 2, 2, 2):
x = np.array([[[[1, 1], [1, 1]],[[2, 2], [2, 2]]],
[[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])
I would like to get the view z(a, b) of the array:
np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
This corresponds to an indexing scheme of a = u * S + s and b = v * T + t, where in this case S = 2 = T.
What I have tried
Various approaches using np.reshape or even as_strided. Standard reshaping will not change the order of elements as they appear in memory. I tried playing around with order='F' and transposing, but nothing gave me the correct result.
Since I know the indexing scheme, I tried to operate on the flattened view of the array using np.ravel(). My idea was to create an array of indices following the desired indexing scheme and apply it to the flattened array view, but unfortunately, fancy/advanced indexing gives a copy of the array, not a view.
Question
Is there any way to achieve the indexing view that I'm looking for?
In principle, I think this should be possible, as for example ndarray.sort() performs an in place non-trivial indexing of the array. On the other hand, this is probably implemented in C/C++, so it might even not be possible in pure Python?
Let's review the basics of an array: it has a flat data buffer, a shape, strides, and a dtype. The last three attributes are used to view the elements of the data buffer in a particular way, whether as a simple 1d sequence, 2d, or higher dimensions.
A true view then uses the same data buffer, but applies a different shape, strides, or dtype to it.
To get [2, 4, 3, 1] from [1, 2, 3, 4] requires picking indices 1, 3, 2, 0: forward 2, back 1, back 2. That's not a regular pattern that can be represented by strides.
arr[1::2] gives the [2,4], and arr[0::2] gives the [1,3].
(U, V, S, T) to (U*S, V*T) requires a transpose to (U, S, V, T), followed by a reshape
arr.transpose(0,2,1,3).reshape(U*S, V*T)
That will require a copy, no way around that.
In [227]: arr = np.arange(2*3*4*5).reshape(2,3,4,5)
In [230]: arr1 = arr.transpose(0,2,1,3).reshape(2*4, 3*5)
In [231]: arr1.shape
Out[231]: (8, 15)
In [232]: arr1
Out[232]:
array([[ 0, 1, 2, 3, 4, 20, 21, 22, 23, 24, 40, 41, 42,
43, 44],
[ 5, 6, 7, 8, 9, 25, 26, 27, 28, 29, 45, 46, 47,
48, 49],
....)
Or with your x
In [234]: x1 = x.transpose(0,2,1,3).reshape(4,4)
In [235]: x1
Out[235]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
Notice that the elements are in a different order:
In [254]: x.ravel()
Out[254]: array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [255]: x1.ravel()
Out[255]: array([1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4])
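A quick check (my addition, continuing the session above) confirming that x1 is a copy rather than a view:
In [256]: np.shares_memory(x, x1)
Out[256]: False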
ndarray.sort is in-place and changes the order of bytes in the data buffer. It is operating at a low level that we don't have access to. It isn't a view of the original array.
I have a 2d and a 1d array. I am looking to find the two rows that contain, at least once, the values from the 1d array, as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of array B, without caring which particular element of B it is, the following can be used:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
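Equivalently (a small variation, not in the original answer), np.any expresses the same reduction directly:
np.isin(A, B).any(axis=1)
# array([ True, False, True, True, True, True])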
Since you're dealing with a 2D array* you can use broadcasting to compare B with a raveled version of A. This gives you the matching indices in the flattened shape. Then you can map the result back to coordinates in the original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
# the first array above matches the expected row indices from the question
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
if np.all(np.isin(B, np.hstack(i))):
print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
note: this solution does NOT require the rows be consecutive. Please let me know if that is required.