Masking arrays using numpy - python

I have an array and I want to mask it while keeping its shape as it is, i.e. without deleting the masked elements.
For example, in this code:
input = torch.randn(2, 5)
mask = input > 0
input = input[mask]
input = input *1000000000000
print(input)
Printing input shows the result of the multiplication applied to the selected elements only, returned as a 1D array; the masked-out elements are gone.

You're overwriting your original array when you do input = input[mask]. If you omit that step, you can modify the masked values in place and keep the non-masked values as they are:
i = np.random.randn(2, 5)
print(i)
[[ 0.48857855 0.97799014 2.29587523 -2.37257331 1.28193921]
[ 0.62932172 1.37433223 -1.2427145 0.31424802 1.34534568]]
mask = i > 0
i[mask] *= 1000000000000
print(i)
[[ 4.88578545e+11 9.77990142e+11 2.29587523e+12 -2.37257331e+00 1.28193921e+12]
[ 6.29321720e+11 1.37433223e+12 -1.24271450e+00 3.14248021e+11 1.34534568e+12]]
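
If the original array has to stay untouched, np.where gives an out-of-place variant of the same idea. This is only a minimal sketch: the values are made up, and the names a and scaled are chosen purely for illustration.

```python
import numpy as np

# Small fixed array so the result is easy to check by hand (values are made up).
a = np.array([[0.5, -2.0, 1.5],
              [-1.0, 3.0, -0.25]])

mask = a > 0

# np.where takes a * 1000 where the mask is True and the original value elsewhere.
# The result has the same shape as `a`, and `a` itself is not modified.
scaled = np.where(mask, a * 1000, a)

print(scaled.shape)  # (2, 3)
print(scaled)
```

The in-place form i[mask] *= ... shown above avoids the extra allocation; np.where is the option when both the original and the scaled array are needed.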

Related

How to remove matrices in a numpy array of matrices?

I have a numpy array arr_seg_labs which has the following shape: (1735, 128, 128).
It contains pixel masks between 1 and 10 and also contains zeros and 255 (background).
I want to remove those (128, 128) matrices which do not contain the given category identifier (9) and keep those which contain at least one 9.
I made a mask (horse_mask) for this, but I don't know how to go on from here to filter the numpy array:
CAT_IDX_HORSE = 9
horse_mask = arr_seg_labs == CAT_IDX_HORSE
IIUC, you can count the matching pixels in each matrix and then index with the resulting mask:
CAT_IDX_HORSE = 9
mask = (arr_seg_labs == CAT_IDX_HORSE).sum(axis=(1, 2))
result = arr_seg_labs[mask != 0]
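
An equivalent way to express "contains at least one 9" is to reduce with any instead of counting with sum. The sketch below assumes a toy stand-in for arr_seg_labs with a much smaller shape; the names keep and filtered are illustrative only.

```python
import numpy as np

CAT_IDX_HORSE = 9

# Toy stand-in for arr_seg_labs: 4 label maps of shape (3, 3) instead of (1735, 128, 128).
arr_seg_labs = np.zeros((4, 3, 3), dtype=np.uint8)
arr_seg_labs[1, 0, 2] = CAT_IDX_HORSE   # map 1 contains a 9
arr_seg_labs[3, 1, 1] = CAT_IDX_HORSE   # map 3 contains a 9
arr_seg_labs[2, 2, 2] = 255             # map 2 only has background

# True for every map that has at least one pixel equal to 9.
keep = (arr_seg_labs == CAT_IDX_HORSE).any(axis=(1, 2))

filtered = arr_seg_labs[keep]
print(keep.tolist())   # [False, True, False, True]
print(filtered.shape)  # (2, 3, 3)
```

Indexing with the 1D boolean array keep selects whole (3, 3) matrices along the first axis, so the remaining axes keep their shape.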

Randomly select rows from numpy array based on a condition

Let's say I have 2 arrays of arrays: labels holds 1D arrays and data holds 5D arrays. Note that both have the same first dimension.
To simplify things, let's say labels contains only 3 arrays:
labels=np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)  # dtype=object is required for ragged input in recent NumPy
And let's say I have a datalist of data arrays (length=3), where each array is 5D and its first dimension matches the length of the corresponding array in labels.
In this example, datalist has 3 arrays of shapes (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. Here, the first dimension of each of these arrays is the same as the length of the corresponding array in labels.
Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros in each array. The lengths of the arrays in labels, as well as the first dimensions of the arrays in data, will then be 6, 4 and 8.
To reduce the number of zeros in each array of labels, I want to randomly select 3 of them and keep only those. The same randomly selected indexes will then be used to select the corresponding rows from data.
For this example, the new_labels array will be something like this:
new_labels=np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)
Here's what I have tried so far:
all_ind = []          # indexes where value == 0, for all arrays
indexes_to_keep = []  # the randomly selected indexes
new_labels = []       # the final results
for i in range(len(labels)):
    ind = []  # indexes where value == 0, for one array
    for j in range(len(labels[i])):
        if labels[i][j] == 0:
            ind.append(j)
    all_ind.append(ind)
for k in range(len(labels)):
    # index with k here (not the stale i from the first loop); replace=False keeps the 3 picks distinct
    indexes_to_keep.append(np.random.choice(all_ind[k], 3, replace=False))
    aux = np.zeros(len(labels[k]) - len(all_ind[k]) + 3)
    ....
    ....
    # Here, how can I fill **aux** with the values?
    ....
    ....
    new_labels.append(aux)
Any suggestions?
Working with numpy arrays of different lengths is not a good idea, so you have to iterate over the items and process each one separately. Assuming you only want to optimize that per-item step, masking works pretty well here:
def specific_choice(x, n):
    '''Keep all non-zero values of x plus n randomly chosen zeros.'''
    x = np.array(x)
    mask = x != 0
    idx = np.flatnonzero(~mask)
    np.random.shuffle(idx)  # shuffles idx in place, quite fast
    idx = idx[:n]
    mask[idx] = True
    return x[mask]  # or return mask if you need it
Iterating over a list is faster than iterating over an array, so an efficient way to use it would be:
labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]
Output (one possible result, since the selection is random):
[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
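
The question also asks for the same selection to be applied to the rows of data. One way to sketch that is to return the mask itself, as the comment in specific_choice hints, and use it on both lists. Everything below is an assumption made for illustration: the helper name zero_keep_mask, the fixed seed, and the shrunken trailing dimensions of the stand-in data.

```python
import numpy as np

def zero_keep_mask(x, n, rng):
    """Boolean mask keeping all non-zeros of x plus n randomly chosen zeros."""
    x = np.asarray(x)
    mask = x != 0
    zero_idx = np.flatnonzero(~mask)
    # Shuffled copy of the zero positions; the slice keeps at most n of them.
    chosen = rng.permutation(zero_idx)[:n]
    mask[chosen] = True
    return mask

rng = np.random.default_rng(0)  # seed is arbitrary, only for repeatability

labels = [[0, 0, 0, 1, 1, 2, 0, 0], [0, 4, 0, 0, 0], [0, 3, 0, 2, 1, 0, 0, 1, 7, 0]]
# Stand-in data with smaller trailing dimensions than the question's (3, 100, 10, 1).
datalist = [np.zeros((len(l), 3, 4, 2, 1)) for l in labels]

masks = [zero_keep_mask(l, 3, rng) for l in labels]
new_labels = [np.asarray(l)[m] for l, m in zip(labels, masks)]
new_data = [d[m] for d, m in zip(datalist, masks)]  # mask selects along axis 0

print([len(l) for l in new_labels])    # [6, 4, 8]
print([d.shape[0] for d in new_data])  # [6, 4, 8]
```

Because each mask is computed once and reused, labels and data stay aligned row by row.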

How can I ensure a numpy array to be either a 2D row- or column vector?

Is there a numpy function that ensures a 1D or 2D array is either a column or a row vector?
For example, I have any one of the following vectors/lists. What is the easiest way to convert any of these inputs into a column vector?
x1 = np.array(range(5))
x2 = x1[np.newaxis, :]
x3 = x1[:, np.newaxis]
def ensureCol1D(x):
    # The input is either 1D, or 2D with one dimension of size 1.
    assert(len(x.shape)==1 or (len(x.shape)==2 and 1 in x.shape))
    x = np.atleast_2d(x)
    n = x.size
    print(x.shape, n)
    return x if x.shape[0] == n else x.T
assert(ensureCol1D(x1).shape == (x1.size, 1))
assert(ensureCol1D(x2).shape == (x2.size, 1))
assert(ensureCol1D(x3).shape == (x3.size, 1))
Instead of writing my own function ensureCol1D, is there something similar already available in numpy that ensures a vector is a column?
Your question is essentially how to convert an array into a "column", a column being a 2D array with a row length of 1. This can be done with ndarray.reshape(-1, 1).
This reshapes your array to have a row length of one and lets numpy infer the number of rows (the column length).
x1 = np.array(range(5))
print(x1.reshape(-1, 1))
Output:
[[0]
 [1]
 [2]
 [3]
 [4]]
You get the same output when reshaping x2 and x3. This also works for n-dimensional arrays, which are flattened into a single column:
x = np.random.rand(1, 2, 3)
print(x.reshape(-1, 1).shape)
Output:
(6, 1)
Finally, the only thing missing is an assertion so that arrays that cannot be converted are not silently converted incorrectly. The main check you're making is that at most one entry of the shape differs from 1. This can be done with:
assert sum(i != 1 for i in x1.shape) <= 1
This check, along with .reshape, lets you apply your logic to all numpy arrays.
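
Put together, the shape check and the reshape could be wrapped in a small helper. This is a sketch only: the name to_column is hypothetical, and raising ValueError instead of using assert is a design choice made here, not something from the answer.

```python
import numpy as np

def to_column(x):
    """Return x as an (n, 1) column; reject arrays that are not vector-like."""
    x = np.asarray(x)
    # At most one axis may have a length other than 1.
    if sum(dim != 1 for dim in x.shape) > 1:
        raise ValueError(f"expected a vector-like array, got shape {x.shape}")
    return x.reshape(-1, 1)

x1 = np.arange(5)
x2 = x1[np.newaxis, :]
x3 = x1[:, np.newaxis]

for x in (x1, x2, x3):
    print(to_column(x).shape)  # (5, 1) each time
```

A genuinely 2D input such as an array of shape (2, 3) is rejected rather than flattened into a (6, 1) column.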

numpy insert 2D array into 4D structure

I have a 4D array: array = np.random.rand(3432,1,30,512)
I also have 5 sets of 2D arrays with shape (30,512)
I want to insert these into the 4D structure along axis 1 so that my final shape is (3432,6,30,512) (5 new arrays + the original 1). I need to iteratively insert this set for each of the 3432 elements.
What's the most effective way to do this?
I've tried reshaping the 2D to 4D and then inserting along axis 1. I'm expecting axis 1 to never exceed a size of 6, but the 2D arrays just keep getting added, rather than a set for each of the 3432 elements. I think my problem lies in not fully understanding the obj param for the insert method:
all_data = np.reshape(all_data, (-1, 1, 30, 512))
for i in range(all_data.shape[0]):
    num_band = 1
    for band in range(5):
        temp_trial = np.zeros((30, 512))  # Just an example. Values aren't actually 0
        temp_trial = np.reshape(temp_trial, (1,1,30,512))
        all_data = np.insert(all_data, num_band, temp_trial, 1)
        num_band += 1
Create an array with the final shape first and fill in the elements afterwards:
final = np.zeros((3432, 6, 30, 512))
for i in range(3432):  # note, this will take a while
    for j in range(6):
        final[i, j, :, :] = np.ones((30, 512))  # insert your array here
Or, if you actually want to broadcast over the zeroth axis, assuming each "band" is the same for all 3432 elements:
for i in range(6):
    final[:, i, :, :] = np.ones((30, 512))  # insert your array here
As long as you don't loop many times, there is no need to vectorize it.
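
If the extra bands are already available as arrays, np.concatenate along axis 1 is a loop-free alternative to preallocating and filling. The sketch below uses scaled-down shapes, and the variable names (original, extra_per_trial, shared_bands) are assumptions for illustration rather than names from the question.

```python
import numpy as np

# Scaled-down stand-ins for the shapes in the question:
# n_trials plays the role of 3432, and (h, w) the role of (30, 512).
n_trials, h, w = 4, 3, 5
original = np.random.rand(n_trials, 1, h, w)

# Case 1: a different set of 5 bands per trial, already stacked as (n_trials, 5, h, w).
extra_per_trial = np.random.rand(n_trials, 5, h, w)
final = np.concatenate([original, extra_per_trial], axis=1)
print(final.shape)  # (4, 6, 3, 5)

# Case 2: the same 5 bands for every trial, stored once as (5, h, w).
shared_bands = np.random.rand(5, h, w)
tiled = np.broadcast_to(shared_bands, (n_trials, 5, h, w))  # read-only view, no copy yet
final_shared = np.concatenate([original, tiled], axis=1)
print(final_shared.shape)  # (4, 6, 3, 5)
```

Unlike repeated np.insert calls, which return a new, larger array each time, a single concatenate builds the result in one step.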

Crop part of np.array

I have a numpy array A like
A.shape
(512,270,1,20)
I don't want to use all 20 layers in dimension 4. The new array should be like
Anew.shape
(512,270,1,2)
So I want to crop out 2 "slices" of the array A.
From the Python documentation, the answer is:
start = 4 # Index where you want to start.
Anew = A[:,:,:,start:start+2]
You can use a list or array of indices rather than slice notation in order to select an arbitrary sequence of indices in the final dimension:
x = np.zeros((512, 270, 1, 20))
y = x[..., [4, 10]] # the 5th and 11th indices in the final dimension
print(y.shape)
# (512, 270, 1, 2)
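
One practical difference between the two approaches: basic slicing returns a view onto the original data, while indexing with a list returns a copy. A small sketch with reduced leading dimensions (the names sliced and picked are illustrative):

```python
import numpy as np

x = np.zeros((4, 3, 1, 20))  # smaller leading dimensions than (512, 270, 1, 20)

sliced = x[..., 4:6]      # basic slicing: a view onto x
picked = x[..., [4, 10]]  # index list (advanced indexing): a copy

print(sliced.shape, picked.shape)   # (4, 3, 1, 2) (4, 3, 1, 2)
print(np.shares_memory(x, sliced))  # True
print(np.shares_memory(x, picked))  # False

# Writing through the view changes x; writing to the copy does not.
sliced[...] = 1.0
picked[...] = 2.0
print(x[0, 0, 0, 4], x[0, 0, 0, 10])  # 1.0 0.0
```

If the cropped array must be independent of A, calling .copy() on the slice avoids accidental writes to the original.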
