Masking an array - python

I am trying to mask an array (called dataset) in Python:
The array has shape (5032, 48, 48); basically these are 5032 48x48 images. Some of the images may not contain any data, so they hold only zeros. These are the ones I want to mask.
I tried the following: (dataset[:] == 0).all(axis=0).
When I print the shape of the result I get (5032, 48), which is not what I want. I expected (5032,).
I am not sure what I am doing wrong.
I want to create a mask of shape (5032,) that is True if at least one value in a 48x48 image is nonzero, and False if the image contains only zeros.
Thanks for your help

A kind of hacky way, but you can just sum across the last two axes and check whether the sum is nonzero:
nonzero_images = images[np.where(np.sum(images, axis=(1, 2)) != 0)]
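If what you actually need is the Boolean mask of shape (5032,) rather than the filtered array, here is a minimal sketch using .any over the image axes (the sample dataset below is just a stand-in; .any also avoids the corner case where positive and negative values sum to zero):
import numpy as np
# hypothetical stand-in: 5032 images of 48x48, only the first one has data
dataset = np.zeros((5032, 48, 48))
dataset[0, 10, 10] = 1.0
mask = dataset.any(axis=(1, 2))   # True where the image has at least one nonzero pixel
print(mask.shape)                 # (5032,)
nonzero_images = dataset[mask]
print(nonzero_images.shape)       # (1, 48, 48)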

You can try something like
# sample data - 3 nonzero images and 2 all-zero images
dataset = np.concatenate([np.ones((3, 48, 48)), np.zeros((2, 48, 48))])
new = dataset[np.unique(np.where(dataset.any(axis=1))[0])]
print(f'Dataset Shape: {dataset.shape}\nNew Shape: {new.shape}')
# Dataset Shape: (5, 48, 48)
# New Shape: (3, 48, 48)

Related

Split matrices into smaller blocks and do something based on their values

I have a numpy array ys_big_seg which has the shape (146, 128, 128). It contains pixel masks whose values can be 0 or 1: 1 if the pixel is in a given category, otherwise 0. I have to scale it down to a coarser binary mask. So I want to iterate over the (128, 128) matrices, split each one into (8, 8) blocks, and then, based on the values in each block (if every element is 0 then 0, if every element is 1 then 1, if the values are mixed then randomly 0 or 1), replace each block with that single value, reducing the (128, 128) matrices to (16, 16).
How can I solve this problem?
I hope it makes sense, sorry for my English.
I think this does what you're looking for:
>>> x.shape
(146, 128, 128)
>>> mask = x.reshape(-1, 16, 8, 16, 8).sum(axis=(2, 4)) >= 32
>>> mask.shape
(146, 16, 16)
Any 8x8 block with a mixture of 0s and 1s will result in a 1 if the total sum is >= 32 (i.e., half or more of the values are 1), so it's not quite randomly chosen.
Obviously, a sum of 0 (all elements in an 8x8 block are 0) will "fail" that criterion and become 0, and a sum of 64 (all elements in an 8x8 block are 1) will "pass" and end up as a 1. If your matrices are much sparser, you could lower the threshold from 32.
Since you're using this array as a mask, you can leave the 1s and 0s as their boolean counterparts. But if you plan to use the mask as a binary array, then you can easily add .astype(int).
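If you do want the random tie-break described in the question (mixed blocks become 0 or 1 at random) rather than the >= 32 threshold, here is a minimal sketch along the same lines; the random generator and sample data are just stand-ins:
import numpy as np
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(146, 128, 128))  # hypothetical 0/1 pixel masks
# sum each 8x8 block: shape (146, 16, 16), values between 0 and 64
block_sums = x.reshape(-1, 16, 8, 16, 8).sum(axis=(2, 4))
# all-ones block -> 1, all-zeros block -> 0, mixed block -> random 0 or 1
random_bits = rng.integers(0, 2, size=block_sums.shape)
mask = np.where(block_sums == 64, 1, np.where(block_sums == 0, 0, random_bits))
print(mask.shape)  # (146, 16, 16)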

Select elements of numpy array via mask while preserving original dimensions

Hello, I have the following data:
ids = np.concatenate([1.0 * np.ones(shape=(4, 9,)),
                      2.0 * np.ones(shape=(4, 3,))], axis=1)
logits = np.random.normal(size=(4, 9 + 3, 256))
Now I want to select only the entries of logits whose ids are 1.0, i.e. get an array of shape (4, 9, 256).
I tried logits[ids == 1.0, :] but I get shape (36, 256).
How can I do this slicing without collapsing the first two dimensions?
The dimensions here are only an example; I am looking for a generic solution.
Your question assumes that each row has the same number of matching entries; in that case you can solve your problem generally like this:
mask = (ids == 1)
num_per_row = mask.sum(1)
# same number of entries per row is required
assert np.all(num_per_row == num_per_row[0])
result = logits[mask].reshape(logits.shape[0], num_per_row[0], logits.shape[2])
print(result.shape)
# (4, 9, 256)
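If the number of matching entries can differ per row, the result is ragged and cannot be a single array; a minimal sketch of a per-row fallback, reusing ids and logits from the question:
ragged = [logits[i][ids[i] == 1.0] for i in range(ids.shape[0])]
print([r.shape for r in ragged])
# [(9, 256), (9, 256), (9, 256), (9, 256)]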

Numpy dot product along specific axes

I have a 512x512 image array and I want to perform operations on 8x8 blocks. At the moment I have something like this:
output = np.zeros((512, 512))
for i in range(0, 512, 8):
    for j in range(0, 512, 8):
        a = input[i:i+8, j:j+8]
        b = some_other_array[i:i+8, j:j+8]
        output[i:i+8, j:j+8] = np.dot(a, b)
where a & b are 8x8 blocks derived from the original array. I would like to speed up this code by using vectorised operations. I have reshaped my inputs like this:
input = input.reshape(64, 8, 64, 8)
some_other_array = some_other_array.reshape(64, 8, 64, 8)
How could I perform a dot product on only axes 1 & 3 to output an array of shape (64, 8, 64, 8)?
I have tried np.tensordot(input, some_other_array, axes=([0, 1], [2, 3])), which gives the correct output shape, but the values do not match the output from the loop above. I've also looked at np.einsum, but I haven't come across a simple example of what I'm trying to achieve.
As you suspected, np.einsum can take care of this. If input and some_other_array have shapes (64, 8, 64, 8), then if you write
output = np.einsum('ijkl,ilkm->ijkm', input, some_other_array)
then output will also have shape (64, 8, 64, 8), where matrix multiplication (i.e. np.dot) has been done only on axes 1 and 3.
The string argument to np.einsum looks complicated, but really it's a combination of two things. First, matrix multiplication is given by jl,lm->jm (see e.g. this answer on einsum). Second, we don't want to do anything to axes 0 and 2, so for them I just write ik,ik->ik. Combining the two gives ijkl,ilkm->ijkm.
Your reshaped arrays will work if you reorder them a bit. If input and some_other_array are both shaped (64, 8, 64, 8), then:
input = input.transpose(0,2,1,3)
some_other_array = some_other_array.transpose(0,2,1,3)
This reorders them to (64, 64, 8, 8). At this point you can compute the block products with matrix multiplication. Do note that you need matmul (the @ operator), not dot, which would try to multiply the entire arrays.
output = input @ some_other_array
output = output.transpose(0,2,1,3)
output = output.reshape(512,512)
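For reference, a quick sanity check of both vectorised versions against the original loop (a sketch with random test data standing in for the real image):
import numpy as np
rng = np.random.default_rng(0)
a = rng.normal(size=(512, 512))
b = rng.normal(size=(512, 512))
# reference: explicit loop over 8x8 blocks
expected = np.zeros((512, 512))
for i in range(0, 512, 8):
    for j in range(0, 512, 8):
        expected[i:i+8, j:j+8] = np.dot(a[i:i+8, j:j+8], b[i:i+8, j:j+8])
a4 = a.reshape(64, 8, 64, 8)
b4 = b.reshape(64, 8, 64, 8)
# einsum version
out_einsum = np.einsum('ijkl,ilkm->ijkm', a4, b4).reshape(512, 512)
# transpose + matmul version
out_matmul = (a4.transpose(0, 2, 1, 3) @ b4.transpose(0, 2, 1, 3)).transpose(0, 2, 1, 3).reshape(512, 512)
print(np.allclose(expected, out_einsum), np.allclose(expected, out_matmul))
# True True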

Understanding numpy shape

I'm a newbie with Python and also with NumPy.
I have this code:
one_array.shape
When I run it, I get this output:
(20, 48, 240, 240)
one_array is a Numpy Array that has 20 images.
What do the other three numbers (48, 240, 240) in the shape output mean?
Your array consists of 20 images, each of size 48x240x240. That is a bit odd; I would expect something like 240x240x3, but for some reason you have far more channels than RGB would give you.
So the shape attribute returns the size of the array along each axis (the current shape of the entire array), which in your case is (20, 48, 240, 240).
Edit:
As the asker clarified, each image consists of 48 single-channel NIfTI slices, which explains the output of shape.
Imagine your NumPy array as a generalisation of a vector: a vector has one dimension, but in your case the array has four dimensions.
(20, 48, 240, 240) means a big array composed of 20 x 48 x 240 x 240 elements.
one_array.shape == (20, 48, 240, 240) means that one_array is a 4-dimensional array with 20*48*240*240 or 55296000 elements.
You are right that you can think of one_array as an array with 20 elements, in which each element is another array of shape (48, 240, 240). However, it is usually better to think of one_array as a 4-dimensional array that has a total of 20 x 48 x 240 x 240 = 55296000 elements.
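To make this concrete, a tiny sketch with a placeholder array of the same shape:
import numpy as np
one_array = np.zeros((20, 48, 240, 240))  # stand-in for your data
print(one_array.shape)        # (20, 48, 240, 240)
print(one_array[0].shape)     # (48, 240, 240) -> one image: 48 slices of 240x240
print(one_array[0, 0].shape)  # (240, 240)     -> a single 2-D slice
print(one_array.size)         # 55296000 total elements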

Appending along 4th dimension of numpy array

I have a 4-D set of data structured as (time, level, lat, lon) that I am trying to interpolate. In order to do so easily, I need to add an extra longitude value onto the end of the data with the same values as the first longitude. This will allow the interpolation method I am using to interpolate correctly at higher longitude values (e.g. 359).
Currently the data has shape (64, 70, 64, 128); I need to make it (64, 70, 64, 129), where the values at the last longitude are the same as the ones at the first longitude.
Here is what I have tried so far:
data = np.concatenate((data, data[:,:,:,0]), axis = 3)
and
data = np.append( data, data[:,:,:,0],axis = 3)
however I get
ValueError: all the input arrays must have same number of dimensions
for both. I tried adding an extra dimension to the slice being appended with data[:,:,:,0][...,np.newaxis], however that did not help.
At this point I am not sure how to go about doing this, other than looping through each time, level and lat and appending a single value; however, I need to perform this operation on hundreds of sets of data, so this would get very slow.
Any ideas?
The issue, as the error message says, is that the input arrays need to have the same number of dimensions: data is 4-D, while data[:,:,:,0] is only 3-D. The quick answer is to use
np.append(data, data[:,:,:,0,np.newaxis], axis=3)
# or alternatively in shorthand:
np.append(data, data[...,0,None], axis=-1)
Adding either None or np.newaxis at the end of your slice adds an extra dimension to the array:
>>> data.shape
(64, 70, 64, 128)
>>> data[...,0].shape
(64, 70, 64)
>>> data[...,0,None].shape
(64, 70, 64, 1)
This allows the arrays to share the same number of dimensions and the same shape in all dimensions but the one you're appending over.
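For what it's worth, the np.concatenate attempt from the question works too once the slice keeps its fourth axis, e.g. by slicing with :1 instead of 0 (a sketch with placeholder data):
import numpy as np
data = np.zeros((64, 70, 64, 128))  # stand-in for the real data
wrapped = np.concatenate([data, data[:, :, :, :1]], axis=3)
print(wrapped.shape)
# (64, 70, 64, 129)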
