Numpy where for 2 dimensional array - python

I have a 2D NumPy array. I need to keep all the rows whose value in a specific column is greater than a certain number. Right now, I have:
f_left = np.where(f_sorted[:,attribute] >= split_point)
And it is failing with: "IndexError: too many indices for array"
How should I do this? I can't figure it out from the NumPy documentation.

You don't actually need np.where; indexing with a boolean mask is enough.
import numpy as np
yy = np.arange(12).reshape((4, 3))
yy[yy[:, 1] > 2]
Output:
array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])

x = np.array([[2,3,4],[5,6,7],[1,2,3],[8,9,10]])
array([[ 2, 3, 4],
[ 5, 6, 7],
[ 1, 2, 3],
[ 8, 9, 10]])
Find the rows where the second element is >= 4:
x[np.where(x[:,1] >= 4)]
array([[ 5, 6, 7],
[ 8, 9, 10]])
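To connect the two answers above: np.where called with a single argument returns a tuple of index arrays, and indexing with that tuple selects the same rows as indexing with the boolean mask directly. A minimal sketch reusing x from the answer above; the names mask, idx, rows_from_mask and rows_from_where are only illustrative:

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7], [1, 2, 3], [8, 9, 10]])

# Boolean mask: one True/False per row, based on the second column.
mask = x[:, 1] >= 4

# Indexing with the mask keeps the rows where it is True.
rows_from_mask = x[mask]

# np.where(mask) returns a 1-tuple holding the indices of the True entries.
idx = np.where(mask)
rows_from_where = x[idx]

print(idx[0])          # [1 3]
print(rows_from_mask)
```

Note that x[:, 1] itself needs x to have two dimensions. One possible cause of the error in the question (an assumption, since f_sorted is not shown) is that the array being indexed is actually 1-D.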

Related

Slice numpy array with start position and offset back in the array

I would like to slice a numpy array with a constant offset back in the array, i.e. start at the k'th position and go back n elements. I want to move the slice one index ahead at every iteration.
E.g. I have the following array
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
and let's say k is 5 and n is 3. That would give me the following (with ordering preserved)
x_sliced = [4, 5, 6]
In the next iteration k += 1 and n is still 3. That gives me the following array
x_sliced = [5, 6, 7]
I can sort of get the result but I'll have to flip the array to get back to the original order. Isn't there a clever way that just uses a position and an offset back in the array?
If I understand correctly, this could help:
from skimage.util.shape import view_as_windows
k = 5
n = 3
view_as_windows(x[k-n+1:], n)
output:
array([[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10]])
Then you can loop over the output rows and process each window. One thing to note is that the overlapping windows share the same memory. If you want changes to one window not to affect the windows that overlap it, copy it first (.copy()).
You can use np.lib.stride_tricks.as_strided for that which requires no other dependencies than NumPy itself!
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.lib.stride_tricks.as_strided(x, (len(x)-3+1, 3), 2 * x.strides)
b
array([[ 1, 2, 3],
[ 2, 3, 4],
[ 3, 4, 5],
[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10]])
Moreover, with list(b) you can turn the 2D array into a list of 1D arrays.
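If only one window is needed per iteration, plain slicing from k-n+1 to k+1 already preserves the order without any flipping. For all windows at once, NumPy 1.20 and later also provide np.lib.stride_tricks.sliding_window_view, which builds the same kind of view as the as_strided call above but with shape checking. A short sketch, assuming k is a 0-based index as in the question's example:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
k, n = 5, 3

# Single window ending at index k (inclusive), original order preserved.
x_sliced = x[k - n + 1 : k + 1]
print(x_sliced)  # [4 5 6]

# All length-n windows as a read-only view; row i is x[i:i+n].
windows = sliding_window_view(x, n)

# The window ending at index k is row k - n + 1.
print(windows[k - n + 1])        # [4 5 6]
print(windows[(k + 1) - n + 1])  # [5 6 7]
```

The view is read-only by default, so it avoids the accidental-overwrite risk that comes with as_strided.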

How to merge multiple Numpy array into single array

I want to merge multiple 2D NumPy arrays of shapes, say, (r, a), (r, b), (r, c), ..., (r, z) into a single 2D array of shape (r, a+b+c+...+z).
I tried np.hstack, but it seems to need arrays of the same shape, and np.concat only operates on a tuple as the 2nd array.
You can use np.concatenate or np.hstack. Here is an example:
>>> a = np.arange(15).reshape(5,3)
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> b = np.arange(10).reshape(5,2)
>>> b
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> np.concatenate((a,b), axis =1)
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
>>> np.hstack((a,b))
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
Hope it helps
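The same calls accept any number of arrays in one sequence, as long as every array has the same number of rows. A small sketch with three made-up arrays of shapes (4, 1), (4, 2) and (4, 3); the names r, parts and merged are illustrative:

```python
import numpy as np

r = 4
parts = [
    np.zeros((r, 1)),
    np.ones((r, 2)),
    np.full((r, 3), 2.0),
]

# Equivalent to np.concatenate(parts, axis=1) for 2-D inputs.
merged = np.hstack(parts)
print(merged.shape)  # (4, 6)
```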
I am new to numpy, but I think it's not possible. The precondition is
"The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length."
Actually, one of my functions was returning a scipy.sparse.csr.csr_matrix, and I was converting it into an np.array, along with lists returned by another function, so that I could merge them all. But the sparse matrix was converted into
array(<73194x17 sparse matrix of type '' with 203371 stored elements in Compressed Sparse Row format>, dtype=object)
which was not compatible with np.hstack.
Sorry for the inconvenience.
I found my solution: instead of numpy.hstack, I used the scipy hstack function.
Thank you, everyone, for responding.
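For readers hitting the same problem, here is a minimal sketch of that fix, assuming the SciPy function meant is scipy.sparse.hstack (the post does not name it exactly). The matrices below are small made-up stand-ins for the poster's data:

```python
import numpy as np
from scipy import sparse

# A small sparse matrix standing in for the large one in the post.
s = sparse.csr_matrix(np.array([[1, 0, 0], [0, 0, 2], [0, 3, 0]]))

# Dense features, e.g. built from plain Python lists.
d = np.array([[10, 11], [12, 13], [14, 15]])

# Stack column-wise while staying sparse; the dense block is
# converted to a sparse matrix first so all blocks have the same kind.
merged = sparse.hstack([s, sparse.csr_matrix(d)], format="csr")
print(merged.shape)  # (3, 5)

# Alternative: densify the sparse part, if the data fits in memory.
merged_dense = np.hstack([s.toarray(), d])
```

Keeping the result sparse is usually the safer choice for a matrix as large as the one in the post, since toarray() allocates every element.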

numpy - select multiple elements from each row of an array

I need to select multiple different values from each row of a 2D array.
A = np.array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12]])
A[something]
>>> np.array([[ 1, 2],
[ 6, 7],
[11,12]])
I know I can create a boolean array the same shape as A and set each element in a for loop, but I'm hoping to come up with a better solution.
You can try the following:
import numpy as np
A = np.array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12]])
i = [[0],[1],[2]]
j = [[0,1], [1,2],[2,3]]
B = A[i,j]
print(B)
#Prints
[[ 1 2]
[ 6 7]
[11 12]]
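This works through broadcasting: i has shape (3, 1) and j has shape (3, 2), so each row index is paired with both column indices in the matching row of j. A sketch of the same selection with the row indices generated by np.arange, plus np.take_along_axis as an alternative; rows, cols, B and B2 are illustrative names:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# Column indices to pick from each row, shape (3, 2).
cols = np.array([[0, 1], [1, 2], [2, 3]])

# Row indices as a column vector, shape (3, 1), broadcast against cols.
rows = np.arange(A.shape[0])[:, None]
B = A[rows, cols]

# take_along_axis does the same row-wise lookup without explicit row indices.
B2 = np.take_along_axis(A, cols, axis=1)

print(B)
```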

NumPy/PyTorch extract subsets of images

In NumPy, given a stack of large images A of size (N, hl, wl), and coordinates x of size (N) and y of size (N), I want to get smaller images of size (N, 16, 16).
In a for loop it would look like this:
B = numpy.zeros((N, 16, 16))
for i in range(0, N):
    B[i, :, :] = A[i, y[i]:y[i]+16, x[i]:x[i]+16]
But can I do this just with indexing?
Bonus question: Will this indexing also work in pytorch? If not how can I implement this there?
In NumPy, slicing is simple, and the same logic works in PyTorch. For example:
import numpy as np
import torch
imgs = np.random.normal(size=(16, 24, 24))
imgs[:, 0:12, 0:12].shape           # (16, 12, 12)
imgs_tensor = torch.from_numpy(imgs)
imgs_tensor[:, 0:12, 0:12].size()   # torch.Size([16, 12, 12])
Here the first : selects all the images in the batch, and the two 0:12 slices select the height and width ranges. Note that this takes the same window from every image rather than a different offset per image.
This is fairly simple with view_as_windows from scikit-image, which gives the sliding windowed views as a 6D array with the fourth axis being a singleton. Then use advanced indexing to select the windows we want, using the y and x indices to index into the second and third axes of the windowed array, to get our B.
Hence, the implementation would be -
from skimage.util.shape import view_as_windows
BSZ = 16, 16 # Blocksize
A6D = view_as_windows(A,(1,BSZ[0],BSZ[1]))
B_out = A6D[np.arange(N),y,x,0]
Explanation
To show other readers what's really going on with the problem, here's a sample run on a smaller dataset with a block size of (2,2) -
1) Input array (3D) :
In [78]: A
Out[78]:
array([[[ 5, 5, 3, 5, 3, 8],
[ 5, *2, 6, 2, 2, 4],
[ 4, 3, 4, 9, 3, 8],
[ 6, 3, 3, 10, 4, 5],
[10, 2, 5, 7, 6, 7],
[ 5, 4, 2, 5, 2, 10]],
[[ 4, 9, 8, 4, 9, 8],
[ 7, 10, 8, 2, 10, 9],
[10, *9, 3, 2, 4, 7],
[ 5, 10, 8, 3, 5, 4],
[ 6, 8, 2, 4, 10, 4],
[ 2, 8, 6, 2, 7, 5]],
[[ *4, 8, 7, 2, 9, 9],
[ 2, 10, 2, 3, 8, 8],
[10, 7, 5, 8, 2, 10],
[ 7, 4, 10, 9, 6, 9],
[ 3, 4, 9, 9, 10, 3],
[ 6, 4, 10, 2, 6, 3]]])
2) y and x indices to index into the second and third axes :
In [79]: y
Out[79]: array([1, 2, 0])
In [80]: x
Out[80]: array([1, 1, 0])
3) Finally, the desired output: one block from each 2D slice along the first axis, whose starting point (top-left corner) is (y, x) on that 2D slice. Refer to the asterisks in A for those -
In [81]: B
Out[81]:
array([[[ 2, 6],
[ 3, 4]],
[[ 9, 3],
[10, 8]],
[[ 4, 8],
[ 2, 10]]])
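For comparison, the same extraction can be sketched with NumPy alone by building broadcastable index arrays, which is one way to express the loop from the question "just with indexing". This is an alternative to the view_as_windows approach above, not part of that answer, and extract_blocks is a hypothetical helper name. The same index arithmetic should carry over to PyTorch tensors using torch.arange, though that is not tested here.

```python
import numpy as np

def extract_blocks(A, y, x, h, w):
    """Return B with B[i] = A[i, y[i]:y[i]+h, x[i]:x[i]+w]."""
    n = A.shape[0]
    img = np.arange(n)[:, None, None]                       # shape (N, 1, 1)
    rows = y[:, None, None] + np.arange(h)[None, :, None]   # shape (N, h, 1)
    cols = x[:, None, None] + np.arange(w)[None, None, :]   # shape (N, 1, w)
    # The three index arrays broadcast to (N, h, w).
    return A[img, rows, cols]

# Made-up sample data with the same shapes as the explanation above.
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(3, 6, 6))
y = np.array([1, 2, 0])
x = np.array([1, 1, 0])

B = extract_blocks(A, y, x, 2, 2)

# Check against the explicit loop from the question.
for i in range(len(A)):
    assert np.array_equal(B[i], A[i, y[i]:y[i] + 2, x[i]:x[i] + 2])
```

Unlike view_as_windows, this returns a copy rather than a view, and it does not check that the blocks stay inside the image bounds.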
This is an implementation of extract_glimpse in PyTorch, similar to tf.image.extract_glimpse. It should satisfy your need:
https://github.com/jimmysue/xvision/blob/main/xvision/ops/extract_glimpse.py#L14

Difference between A[1:3][0:2] and A[1:3,0:2]

I can't figure out the difference between these two kinds of indexing. It seems like they should produce the same results but they do not. Any explanation?
A[1:3, 0:2] takes rows 1 and 2 (the end index 3 is excluded) and columns 0 and 1, thus returning a 2x2 array.
A[1:3][0:2] first takes rows 1 and 2, and from this subarray takes rows 0 and 1, resulting in a 2xn array where n is the original number of columns.
In [1]: import numpy as np
In [2]: a = np.arange(16).reshape(4,4)
In [3]: a
Out[3]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [4]: a[1:3,0:2]
Out[4]:
array([[4, 5],
[8, 9]])
In [5]: a[1:3]
Out[5]:
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [6]: a[1:3][0:2]
Out[6]:
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
The equivalent of A[1:3,0:2] using two [] is: A[1:3][:,0:2]:
In [7]: a[1:3][:,0:2]
Out[7]:
array([[4, 5],
[8, 9]])
Here : means "all the rows". So you first select the rows via [1:3] and then, from all of those rows, select columns 0 and 1.
A[1:3][0:2] means first apply [1:3] to A, and then apply [0:2] to the array returned from the first step, so both slices are applied only to the rows. On the other hand, A[1:3, 0:2] means apply 1:3 to the rows and 0:2 to the columns, i.e. get the second and third rows only, and only the first two columns of those rows.
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> a[1:3][0:2]
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> a[1:3] #Get 2nd and 3rd row.
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> _[0:2] #Get the first two rows of the last array.
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> a[1:3, 0:2]
array([[4, 5],
[8, 9]])
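The relationships described in these answers can be checked directly. A short sketch using the 3x4 array from the last example; chained, combined and two_step are illustrative names:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

chained = a[1:3][0:2]      # both slices act on the rows
combined = a[1:3, 0:2]     # rows 1 and 2, columns 0 and 1
two_step = a[1:3][:, 0:2]  # second [] explicitly targets the columns

print(chained.shape)   # (2, 4)
print(combined.shape)  # (2, 2)
print(np.array_equal(combined, two_step))  # True
```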
