Select elements of numpy array via mask and preserving original dimensions - python

Hello, I have the following data:
import numpy as np
ids = np.concatenate([1.0 * np.ones(shape=(4, 9)),
                      2.0 * np.ones(shape=(4, 3))], axis=1)
logits = np.random.normal(size=(4, 9 + 3, 256))
Now I want to select only the entries of logits where ids is 1.0, and I want the result to have shape (4, 9, 256).
I tried logits[ids == 1.0, :], but I get shape (36, 256).
How can I do this slicing without merging the first two dimensions?
The dimensions here are only an example; I am looking for a generic solution.

Your question appears to assume that the mask selects the same number of entries in each row; in that case you can solve your problem generally like this:
mask = (ids == 1)
num_per_row = mask.sum(1)
# same number of entries per row is required
assert np.all(num_per_row == num_per_row[0])
result = logits[mask].reshape(logits.shape[0], num_per_row[0], logits.shape[2])
print(result.shape)
# (4, 9, 256)
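The approach above can be wrapped in a small helper. This is only a sketch: the function name select_rows_by_mask is hypothetical, and it assumes a 2D boolean mask that selects the same number of entries in every row.

```python
import numpy as np

def select_rows_by_mask(values, mask):
    """Select values[i, j] where mask[i, j] is True, keeping axis 0 intact."""
    counts = mask.sum(axis=1)
    if not np.all(counts == counts[0]):
        raise ValueError("every row of mask must select the same number of entries")
    # Boolean indexing flattens the first two axes in row-major order,
    # so a reshape restores the per-row grouping.
    return values[mask].reshape(values.shape[0], int(counts[0]), *values.shape[2:])

ids = np.concatenate([np.ones((4, 9)), 2.0 * np.ones((4, 3))], axis=1)
logits = np.random.normal(size=(4, 12, 256))
selected = select_rows_by_mask(logits, ids == 1.0)
print(selected.shape)
```

Because the trailing dimensions are taken from values.shape[2:], the same helper should also work for inputs with more than three dimensions.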

Related

Masking an array

I am trying to mask an array (called dataset) in python:
The array has shape (5032, 48, 48); basically, these are 5032 images of 48x48 pixels. Some of the images may not contain any data, so they might contain only 0's. These are the ones I want to mask.
I tried the following: (dataset[:] == 0).all(axis=0).
When I print the shape of the result I get (5032, 48), which is not what I want. I expected (5032,).
I am not sure what I am doing wrong.
I want to create a mask of shape (5032,) that is True if at least one value in the 48x48 image is nonzero and False if the image contains only zeros.
Thanks for your help.
Kind of a hacky way, but just sum across the last two axes and check whether the sum is nonzero (this assumes the values are non-negative, so that a sum of zero means an all-zero image).
nonzero_images = images[np.sum(images, axis=(1, 2)) != 0]
You can try something like
# sample data - 3 nonzeros and 2 zeros
dataset = np.concatenate([np.ones((3, 48, 48)), np.zeros((2, 48, 48))])
# keep the images that have at least one nonzero value
new = dataset[dataset.any(axis=(1, 2))]
print(f'Dataset Shape: {dataset.shape}\nNew Shape: {new.shape}')
# Dataset Shape: (5, 48, 48)
# New Shape: (3, 48, 48)
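To get the one-value-per-image mask the question asks for, the reduction has to run over both image axes at once. A minimal sketch, using a hypothetical small array in place of the (5032, 48, 48) dataset:

```python
import numpy as np

# Hypothetical small stand-in for the real dataset: 5 images of 4x4 pixels.
dataset = np.zeros((5, 4, 4))
dataset[0, 1, 2] = 3.0   # image 0 has a single nonzero pixel
dataset[3] = 1.0         # image 3 is entirely nonzero

# Reduce over axes 1 and 2 together, leaving one boolean per image.
has_data = (dataset != 0).any(axis=(1, 2))
non_empty = dataset[has_data]

print(has_data.shape)
print(non_empty.shape)
```

Reducing over a single axis, as in the original attempt, only collapses that one axis, which is why the result there still has more than one dimension.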

How to remove matrices in a numpy array of matrices?

I have a numpy array arr_seg_labs which has the following shape: (1735, 128, 128).
It contains pixel masks between 1 and 10 and also contains zeros and 255 (background).
I want to remove the (128, 128) matrices that do not contain the given category identifier (9) and keep those that contain at least one 9.
I made a mask (horse_mask) for this, but I don't know how to continue from here to filter the array:
CAT_IDX_HORSE = 9
horse_mask = arr_seg_labs == CAT_IDX_HORSE
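IIUC, you can use a mask and boolean indexing (here a is your arr_seg_labs array):
CAT_IDX_HORSE = 9
# number of pixels equal to the category in each (128, 128) matrix
mask = (a == CAT_IDX_HORSE).sum((1, 2))
result = a[mask != 0]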
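The same filter can be written with any instead of a count, continuing from the asker's horse_mask. A short sketch, with randomly generated labels as hypothetical stand-in data:

```python
import numpy as np

CAT_IDX_HORSE = 9
rng = np.random.default_rng(0)
# Hypothetical stand-in for arr_seg_labs: 6 label maps with values 0..8 only.
arr_seg_labs = rng.integers(0, 9, size=(6, 8, 8))
arr_seg_labs[1, 2, 3] = CAT_IDX_HORSE
arr_seg_labs[4, 0, 0] = CAT_IDX_HORSE

horse_mask = arr_seg_labs == CAT_IDX_HORSE
# One boolean per matrix: True if it contains at least one horse pixel.
contains_horse = horse_mask.any(axis=(1, 2))
horses_only = arr_seg_labs[contains_horse]

print(np.flatnonzero(contains_horse))
```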

Numpy dot product along specific axes

I have a 512x512 image array and I want to perform operations on 8x8 blocks. At the moment I have something like this:
output = np.zeros((512, 512))
for i in range(0, 512, 8):
    for j in range(0, 512, 8):
        a = input[i:i+8, j:j+8]
        b = some_other_array[i:i+8, j:j+8]
        output[i:i+8, j:j+8] = np.dot(a, b)
where a & b are 8x8 blocks derived from the original array. I would like to speed up this code by using vectorised operations. I have reshaped my inputs like this:
input = input.reshape(64, 8, 64, 8)
some_other_array = some_other_array.reshape(64, 8, 64, 8)
How could I perform a dot product on only axes 1 & 3 to output an array of shape (64, 8, 64, 8)?
I have tried np.tensordot(input, some_other_array, axes=([0, 1], [2, 3])) which gives the correct output shape, but the values do not match the output from the loop above. I've also looked at np.einsum but I haven't come across a simple example with what I'm trying to achieve.
As you suspected, np.einsum can take care of this. If input and some_other_array have shapes (64, 8, 64, 8), then if you write
output = np.einsum('ijkl,ilkm->ijkm', input, some_other_array)
then output will also have shape (64, 8, 64, 8), where matrix multiplication (i.e. np.dot) has been done only on axes 1 and 3.
The string argument to np.einsum looks complicated, but really it's a combination of two things. First, matrix multiplication is given by jl,lm->jm (see e.g. this answer on einsum). Second, we don't want to do anything to axis 0 and 2, so for them I just write ik,ik->ik. Combining the two gives ijkl,ilkm->ijkm.
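One way to convince yourself that the subscripts are right is to compare against the explicit loop on a smaller array. This is a sketch only; the 32x32 size and the names img and other are placeholders for the real 512x512 inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 32, 8                       # hypothetical smaller image, same 8x8 blocks
img = rng.normal(size=(n, n))
other = rng.normal(size=(n, n))

# Reference result: block-by-block matrix products in a plain loop.
expected = np.zeros((n, n))
for i in range(0, n, b):
    for j in range(0, n, b):
        expected[i:i+b, j:j+b] = img[i:i+b, j:j+b] @ other[i:i+b, j:j+b]

# Vectorised version: axes 0 and 2 index the block, axes 1 and 3 index within it.
img4 = img.reshape(n // b, b, n // b, b)
other4 = other.reshape(n // b, b, n // b, b)
blocks = np.einsum('ijkl,ilkm->ijkm', img4, other4)
result = blocks.reshape(n, n)

print(np.allclose(result, expected))
```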
This also works with plain matrix multiplication if you reorder the axes a bit. If input and some_other_array are both shaped (64,8,64,8), then:
input = input.transpose(0,2,1,3)
some_other_array = some_other_array.transpose(0,2,1,3)
This will reorder them to (64,64,8,8). At this point you can compute a matrix multiplication. Do note that you need matmul (the @ operator) to compute the block products, and not dot, which will try to multiply the entire matrices.
output = input @ some_other_array
output = output.transpose(0,2,1,3)
output = output.reshape(512,512)
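The transpose-and-matmul steps can be collected into one function. The name blockwise_matmul and its block parameter are hypothetical, and the sketch assumes both dimensions are exact multiples of the block size.

```python
import numpy as np

def blockwise_matmul(a, b, block=8):
    """Multiply corresponding block x block tiles of two equally shaped 2D arrays."""
    rows, cols = a.shape
    shape4 = (rows // block, block, cols // block, block)
    # Move the two within-block axes to the end so matmul treats them as matrices.
    a_blocks = a.reshape(shape4).transpose(0, 2, 1, 3)
    b_blocks = b.reshape(shape4).transpose(0, 2, 1, 3)
    out = a_blocks @ b_blocks
    # Undo the reordering and flatten back to the original 2D layout.
    return out.transpose(0, 2, 1, 3).reshape(rows, cols)

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 24))
y = rng.normal(size=(16, 24))
z = blockwise_matmul(x, y)

# Spot-check one tile against a direct product.
print(np.allclose(z[8:16, 16:24], x[8:16, 16:24] @ y[8:16, 16:24]))
```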

Selecting which dimension to index in a numpy array

I am writing a program that is supposed to be able to import numpy arrays of some higher dimension, e.g. something like an array a:
a = numpy.zeros([3,5,7,2])
Further, each dimension will correspond to some physical dimension, e.g. frequency, distance, ... and I will also import arrays with information about these dimensions, e.g. for a above:
freq = [1,2,3]
time = [0,1,2,3,4,5,6]
distance = [0,0,0,4,1]
angle = [0,180]
Clearly, from this example and the array shape, it can be figured out that freq belongs to dimension 0, time to dimension 2, and so on. But since this is not known in advance, I cannot simply take a frequency slice like
a_f1 = a[1,:,:,:]
since I do not know which dimension the frequency is indexed on.
So what I would like is some way to choose which dimension to index; in Python-like pseudocode, something like
a_f1 = a.get_slice([0,], [[1],])
This is supposed to return the slice with index 1 from dimension 0 and the other dimensions in full.
Doing
a_p = a[0, 1:, ::2, :-1]
would then correspond to something like
a_p = a.get_slice([0, 1, 2, 3], [[0,], [1,2,3,4], [0,2,4,6], [0,]])
You can fairly easily construct a tuple of indices, using slice objects where needed, and then use this to index into your array. The basic recipe is this:
indices = {
    0: 1,            # whatever you want to get on dimension 0 (1 is just an example)
    1: slice(2, 4),  # whatever you want to get on dimension 1 (2:4 is just an example)
    # leave out whatever dimensions you want to get all of
}
# index with a tuple; a plain list would be treated as an index array
ix = tuple(indices.get(dim, slice(None)) for dim in range(arr.ndim))
arr[ix]
Here I have done it with a dictionary since I think that makes it easier to see which dimension goes with which indexer.
So with your example data:
x = np.zeros([3,5,7,2])
We do this:
indices = {0: 1}
ix = tuple(indices.get(dim, slice(None)) for dim in range(x.ndim))
>>> x[ix].shape
(5, 7, 2)
Because your array is all zeros, I'm just showing the shape of the result to indicate that it is what we want. (Even if it weren't all zeros, it's hard to read a 3D array in text form.)
For your second example:
indices = {
    0: 0,
    1: slice(1, None),
    2: slice(None, None, 2),
    3: slice(None, -1)
}
ix = tuple(indices.get(dim, slice(None)) for dim in range(x.ndim))
>>> x[ix].shape
(4, 4, 1)
You can see that the shape corresponds to the number of values in your a_p example. One thing to note is that the first dimension is gone, since you only specified one value for that index. The last dimension still exists, but with a length of one, because you specified a slice that happens to just get one element. (This is the same reason that some_list[0] gives you a single value, but some_list[:1] gives you a one-element list.)
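The recipe can be packaged as a function close to the get_slice call sketched in the question. This is an illustrative sketch: get_slice is a hypothetical free function taking a dict rather than two lists, not an existing NumPy method.

```python
import numpy as np

def get_slice(arr, indices):
    """Index arr with indices[dim] on each listed dimension and ':' elsewhere."""
    ix = tuple(indices.get(dim, slice(None)) for dim in range(arr.ndim))
    return arr[ix]

# Non-zero data, so the results can be compared with ordinary indexing.
a = np.arange(3 * 5 * 7 * 2).reshape(3, 5, 7, 2)

a_f1 = get_slice(a, {0: 1})
a_p = get_slice(a, {0: 0,
                    1: slice(1, None),
                    2: slice(None, None, 2),
                    3: slice(None, -1)})

# For a single integer index on one axis, np.take is an alternative.
a_f1_take = np.take(a, 1, axis=0)

print(a_f1.shape, a_p.shape)
```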
You can use advanced indexing to achieve this.
The index for each dimension needs to be shaped appropriately so that the indices will broadcast correctly across the array. For example, the index for the first dimension of a 3-d array needs to be shaped (x, 1, 1) so that it will broadcast across the first dimension. The index for the second dimension of a 3-d array needs to be shaped (1, y, 1) so that it will broadcast across the second dimension.
import numpy as np
a = np.zeros([3,5,7,2])
b = a[0, 1:, ::2, :-1]
indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]
def get_aslice(a, indices):
    n_dim_ = len(indices)
    index_array = [np.array(thing) for thing in indices]
    idx = []
    # reshape the arrays by adding single-dimensional entries
    # based on the position in the index array
    for d, thing in enumerate(index_array):
        shape = [1] * n_dim_
        shape[d] = thing.shape[0]
        #print(d, shape)
        idx.append(thing.reshape(shape))
    # index with a tuple so that each array applies to its own dimension
    c = a[tuple(idx)]
    # to remove leading single-dimensional entries from the shape
    #while c.shape[0] == 1:
    #    c = np.squeeze(c, 0)
    # To remove all single-dimensional entries from the shape
    #c = np.squeeze(c)
    return c
For a as an input, it returns an array with shape (1,4,4,1), whereas your a_p example has shape (4,4,1). If the extra dimensions need to be removed, uncomment the np.squeeze lines in the function.
Now I feel silly. While reading the docs more slowly, I noticed numpy has an indexing routine that does what you want: numpy.ix_
>>> import numpy as np
>>> a = np.zeros([3,5,7,2])
>>> indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]
>>> index_arrays = np.ix_(*indices)
>>> a_p = a[index_arrays]
>>> a_p.shape
(1, 4, 4, 1)
>>> a_p = np.squeeze(a_p)
>>> a_p.shape
(4, 4)
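Since an all-zero array cannot show whether the right elements were picked, here is a small check with distinct values. It is a sketch comparing np.ix_ with the equivalent slices, and it also illustrates one difference between the two: index arrays produce a copy, while slices produce a view.

```python
import numpy as np

a = np.arange(3 * 5 * 7 * 2).reshape(3, 5, 7, 2)
indices = [[0], [1, 2, 3, 4], [0, 2, 4, 6], [0]]

via_ix = a[np.ix_(*indices)]
# 0:1 instead of 0 keeps the first axis, matching the shape np.ix_ gives.
via_slices = a[0:1, 1:, ::2, :-1]

print(via_ix.shape, via_slices.shape)
print(np.array_equal(via_ix, via_slices))
print(np.shares_memory(via_slices, a), np.shares_memory(via_ix, a))
```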

numpy insert 2D array into 4D structure

I have a 4D array: array = np.random.rand(3432,1,30,512)
I also have 5 sets of 2D arrays with shape (30,512)
I want to insert these into the 4D structure along axis 1 so that my final shape is (3432,6,30,512) (5 new arrays + the original 1). I need to insert this set for each of the 3432 elements.
What's the most effective way to do this?
I've tried reshaping the 2D arrays to 4D and then inserting along axis 1. I'm expecting axis 1 to never exceed a size of 6, but the 2D arrays just keep getting added, rather than one set for each of the 3432 elements. I think my problem lies in not fully understanding the obj param of the insert method:
all_data = np.reshape(all_data, (-1, 1, 30, 512))
for i in range(all_data.shape[0]):
    num_band = 1
    for band in range(5):
        temp_trial = np.zeros((30, 512)) # Just an example; the values aren't actually 0
        temp_trial = np.reshape(temp_trial, (1,1,30,512))
        all_data = np.insert(all_data, num_band, temp_trial, 1)
        num_band += 1
Create an array with the final shape first and insert the elements later:
final = np.zeros((3432,6,30,512))
for i in range(3432): # note, this will take a while
    for j in range(6):
        final[i, j, :, :] = np.ones((30, 512))  # insert your array here
or, if you actually want to broadcast this over the zeroth axis, assuming each "band" should be the same for all 3432 elements:
for i in range(6):
    final[:, i, :, :] = np.ones((30, 512))  # insert your array here
As long as you don't do many loops, there is no need to vectorize it.
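If the same 5 arrays really are shared by every element, the loops can be avoided altogether. A sketch under that assumption, with hypothetical small sizes standing in for 3432, 30 and 512, and with the 5 arrays stacked into one array called bands:

```python
import numpy as np

n, h, w = 4, 3, 5                   # hypothetical stand-ins for 3432, 30, 512
array = np.random.rand(n, 1, h, w)
bands = np.random.rand(5, h, w)     # the 5 extra 2D arrays, stacked on a new axis

# Repeat the same 5 bands for every element along axis 0 (a read-only view, no copy).
tiled = np.broadcast_to(bands, (n, 5, h, w))
# Join the original band and the 5 new ones along axis 1.
final = np.concatenate([array, tiled], axis=1)

print(final.shape)
```

If each of the 3432 elements has its own set of 5 arrays, the same concatenate call should work as long as those arrays are first collected into a single array of shape (3432, 5, 30, 512).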
