I have a 2D MaskedArray X and I want to randomly select 30 non-masked elements from it and return their indices idx.
The goal is that I could use the indices to read / set values efficiently later in my code:
selected = X[idx]
X[idx] = a # some arrays with the same length
What is the most efficient way of generating idx?
Ok I have figured out a way... if anyone has a better approach please let me know.
pos = np.random.choice(X.count(), size=30)
idx = tuple(np.take((~X.mask).nonzero(), pos, axis=1))
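Note that np.random.choice samples with replacement by default, so the approach above can return the same position more than once. A minimal sketch of a variant that draws 30 distinct positions, assuming NumPy's Generator API is available; the example array and masking condition are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: a 10 x 10 masked array with values below 20 masked out.
X = np.ma.masked_less(np.arange(100).reshape(10, 10), 20)

# getmaskarray always returns a full boolean array, even when nothing is masked.
valid_rows, valid_cols = np.nonzero(~np.ma.getmaskarray(X))

# Pick 30 distinct positions among the non-masked elements.
pos = rng.choice(valid_rows.size, size=30, replace=False)
idx = (valid_rows[pos], valid_cols[pos])

selected = X[idx]                      # read the sampled values
X[idx] = np.zeros(30, dtype=X.dtype)   # write 30 new values back
```

The tuple idx can be reused for both reading and writing, as in the question.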
I solved a similar task, by passing true/false of the array mask as weights to np.random.choice:
import numpy.ma as ma
import numpy as np
data = np.array([[0,0,0,1],[0,1,3,2],[2,0,0,3],[0,3,4,1]])
numSample=2
masked = ma.masked_where(data<3, data)
weights=~masked.mask + 0 #Assign False = 0, True = 1
normalized = weights.ravel()/float(weights.sum())
index=np.random.choice(
masked.size,
size=numSample,
replace=False,
p=normalized
)
idx, idy = np.unravel_index(index, data.shape)
I have a 2d numpy array size 100 x 100.
I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.
However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
row_idx =np.arange(min_select,max_select)
col_idx = np.arange(min_select,max_select)
indices_random = ????? Somehow randomly sample from new_abundances only within the rows and columns of row_idx and col_idx set.
What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.
Desired output:
A 1D list of indices from a flattened new_abundances array.
Would something like this solve your problem?
import numpy as np
np.random.seed(0)
mat = np.random.random(size=(100,100))
x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)
coordinates = list(zip(x_indices,y_indices))
flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices
Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element: mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.
Please also note that this samples with replacement, so the same element of the original data can be selected more than once.
Using your notation:
import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))
flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices
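If the 250 indices must be distinct, one option is to number the interior cells, sample those numbers without replacement, and map them back to the full grid. This is only a sketch under that assumption; the names margin and inner are introduced here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

gridsize = 100
margin = int(np.around(gridsize * 0.10))   # 10 cells trimmed on each side
inner = gridsize - 2 * margin              # interior is 80 x 80

# Draw 250 distinct interior cells, numbered 0 .. inner*inner - 1.
inner_flat = rng.choice(inner * inner, size=250, replace=False)

# Convert to interior (row, col), then shift back to full-grid coordinates.
rows, cols = np.unravel_index(inner_flat, (inner, inner))
rows = rows + margin
cols = cols + margin

# Flat indices into the flattened 100 x 100 array.
flat_index = np.ravel_multi_index((rows, cols), (gridsize, gridsize))
```

Here np.ravel_multi_index does the same arithmetic as rows * gridsize + cols.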
I want to find a faster way to get, for each row of an array of shape (n,l), the index of the nearest row in another array of shape (k,l). I found 2 solutions, but I think the performance can be improved.
Here is an example:
import numpy as np
import numpy.matlib as ml
frames = np.random.random([1000,2])
codeBook = np.random.random([8,2])
I
dist = np.zeros([frames.shape[0], codeBook.shape[0]])
for i in range(codeBook.shape[0]):
    # distance from every frame to code word i
    difference = frames - ml.repmat(codeBook[i,:], frames.shape[0], 1)
    dist[:,i] = np.sqrt(np.sum(difference**2, 1))
idx = np.argmin(dist,axis=1)
II
diffToCB = frames - np.rot90(ml.repmat(codeBook,frames.shape[0],1).reshape(-1,8,2),axes=(1,0))
idx = np.argmin(np.sqrt(np.einsum('ijk,ijk->ij', diffToCB, diffToCB)) , axis=0)
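A third variant worth timing against the two above relies on plain broadcasting instead of repmat and rot90. This is a sketch, not a benchmarked result; whether it is actually faster depends on the array sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((1000, 2))
codeBook = rng.random((8, 2))

# (1000, 1, 2) - (1, 8, 2) broadcasts to (1000, 8, 2):
# every frame minus every code word.
diff = frames[:, None, :] - codeBook[None, :, :]

# Squared Euclidean distances, shape (1000, 8). The square root is skipped
# because it does not change which entry is smallest.
sq_dist = np.einsum('ijk,ijk->ij', diff, diff)

idx = sq_dist.argmin(axis=1)  # nearest code word for each frame
```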
I have a 2D array/matrix like this. How would I randomly pick a value from it, for example getting a value like [-62, 29.23]? I looked at numpy.random.choice, but it is built for 1D arrays.
The following is my example with 4 rows and 8 columns
Space_Position=[
[[-62,29.23],[-49.73,29.23],[-31.82,29.23],[-14.2,29.23],[3.51,29.23],[21.21,29.23],[39.04,29.23],[57.1,29.23]],
[[-62,11.28],[-49.73,11.28],[-31.82,11.28],[-14.2,11.28],[3.51,11.28],[21.21,11.28] ,[39.04,11.28],[57.1,11.8]],
[[-62,-5.54],[-49.73,-5.54],[-31.82,-5.54] ,[-14.2,-5.54],[3.51,-5.54],[21.21,-5.54],[39.04,-5.54],[57.1,-5.54]],
[[-62,-23.1],[-49.73,-23.1],[-31.82,-23.1],[-14.2,-23.1],[3.51,-23.1],[21.21,-23.1],[39.04,-23.1] ,[57.1,-23.1]]
]
In the answers the following solution was given:
random_index1 = np.random.randint(0, Space_Position.shape[0])
random_index2 = np.random.randint(0, Space_Position.shape[1])
Space_Position[random_index1][random_index2]
This indeed works to give me one sample, but how do I get more than one sample, like np.random.choice() does?
Another way I am thinking of is to transform the matrix into a flat array of pairs, like
Space_Position=[
[-62,29.23],[-49.73,29.23],[-31.82,29.23],[-14.2,29.23],[3.51,29.23],[21.21,29.23],[39.04,29.23],[57.1,29.23], ..... ]
and then use np.random.choice(). However, I could not find a way to do the transformation; flatten() makes the array like
Space_Position=[-62,29.23,-49.73,29.2, ....]
Just use random indices (two in your case, because the array has three dimensions):
import numpy as np
Space_Position = np.array(Space_Position)
random_index1 = np.random.randint(0, Space_Position.shape[0])
random_index2 = np.random.randint(0, Space_Position.shape[1])
Space_Position[random_index1, random_index2] # get the random element.
The alternative is to actually make it 2D:
Space_Position = np.array(Space_Position).reshape(-1, 2)
and then use one random index:
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_index = np.random.randint(0, Space_Position.shape[0]) # generate a random index
Space_Position[random_index] # get the random element.
If you want N samples with replacement:
N = 5
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_indices = np.random.randint(0, Space_Position.shape[0], size=N) # generate N random indices
Space_Position[random_indices] # get N samples with replacement
or without replacement:
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_indices = np.arange(0, Space_Position.shape[0]) # array of all indices
np.random.shuffle(random_indices) # shuffle the array
Space_Position[random_indices[:N]] # get N samples without replacement
Referring to the documentation of numpy.random.choice:
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
The generator documentation is at numpy.random.Generator.choice.
Using this knowledge, you can create a generator and then "choice" from your array:
rng = np.random.default_rng() #creates the generator ==> Generator(PCG64) at 0x2AA703BCE50
N = 3 #Number of Choices
a = np.array(Space_Position) #makes sure, a is an ndarray and numpy-supported
s = a.shape #(4,8,2)
a = a.reshape((s[0] * s[1], s[2])) #makes your array 2 dimensional, keeping the last dimension separate
a.shape #(32, 2)
b = rng.choice(a, N, axis=0, replace=False) #returns N choices of a in array b, e.g. array([[ 57.1 , 11.8 ], [ 21.21, -5.54], [ 39.04, 11.28]])
#Note: replace=False prevents having the same entry several times in the result
Space_Position[np.random.randint(0, len(Space_Position))][np.random.randint(0, len(Space_Position[0]))]
gives you what you want: the first index picks a random row, the second a random column within that row.
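To draw several samples this way without reshaping, the same idea can be vectorized by generating one row index and one column index per sample. A small sketch, assuming the data is a NumPy array of shape (4, 8, 2); the grid built here is a stand-in for Space_Position, not the exact values from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for Space_Position: a 4 x 8 grid of [x, y] pairs.
xs = [-62, -49.73, -31.82, -14.2, 3.51, 21.21, 39.04, 57.1]
ys = [29.23, 11.28, -5.54, -23.1]
Space_Position = np.array([[[x, y] for x in xs] for y in ys])

N = 5
rows = rng.integers(0, Space_Position.shape[0], size=N)  # row index per sample
cols = rng.integers(0, Space_Position.shape[1], size=N)  # column index per sample

samples = Space_Position[rows, cols]  # shape (N, 2), sampled with replacement
```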
Having an array and a mask for this array, using fancy indexing, it is easy to select only the data of the array corresponding to the mask.
import numpy as np
a = np.arange(20).reshape(4, 5)
mask = [0, 2]
data = a[:, mask]
But is there a rapid way to select all the data of the array that does not belong to the mask (i.e. the mask is the data we want to reject)?
I tried to find a general solution going through an intermediate boolean array, but I'm sure there is something really easier.
mask2 = np.ones(a.shape, dtype=bool)
mask2[:, mask] = False
data = a[mask2].reshape(a.shape[0], a.shape[1] - len(mask))
Thank you
Have a look at numpy.invert, numpy.bitwise_not, numpy.logical_not, or more concisely ~mask. (They all do the same thing, in this case.)
As a quick example:
import numpy as np
x = np.arange(10)
mask = x > 5
print(x[mask])
print(x[~mask])
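The ~ operator only applies to boolean masks, whereas the mask in the question is a list of column indices. A minimal sketch of two ways to reject columns by index, reusing the question's a and mask:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
mask = [0, 2]  # column indices to reject

# Option 1: build a boolean column selector and switch off the rejected columns.
keep = np.ones(a.shape[1], dtype=bool)
keep[mask] = False
data = a[:, keep]

# Option 2: np.delete returns a copy without the listed columns.
data_alt = np.delete(a, mask, axis=1)
```

Both give a (4, 3) array holding columns 1, 3 and 4.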
I have a 2 dimensional NumPy array. I know how to get the maximum values over axes:
>>> a = array([[1,2,3],[4,3,1]])
>>> amax(a,axis=0)
array([4, 3, 3])
How can I get the indices of the maximum elements? I would like as output array([1,1,0]) instead.
>>> a.argmax(axis=0)
array([1, 1, 0])
>>> import numpy as np
>>> a = np.array([[1,2,3],[4,3,1]])
>>> i,j = np.unravel_index(a.argmax(), a.shape)
>>> a[i,j]
4
argmax() will only return the first occurrence for each row.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
If you ever need to do this for a shaped array, this works better than unravel:
import numpy as np
a = np.array([[1,2,3], [4,3,1]]) # Can be of any shape
indices = np.where(a == a.max())
You can also change your conditions:
indices = np.where(a >= 1.5)
The above gives you results in the form that you asked for. Alternatively, you can convert to a list of x,y coordinates by:
x_y_coords = list(zip(indices[0], indices[1]))
NumPy provides argmin() and argmax(), which return the index of the minimum and maximum of an array respectively.
For a 1-D array, for example, you would do something like this:
import numpy as np
a = np.array([50,1,0,2])
print(a.argmax()) # returns 0
print(a.argmin()) # returns 2
And similarly for a multi-dimensional array, where the returned index refers to the flattened array:
import numpy as np
a = np.array([[0,2,3],[4,30,1]])
print(a.argmax()) # returns 4
print(a.argmin()) # returns 0
Note that these will only return the index of the first occurrence.
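If every occurrence of the maximum is needed rather than just the first, comparing against the maximum works. A short sketch with a made-up array that contains ties:

```python
import numpy as np

a = np.array([[1, 7, 3], [7, 3, 7]])

first = a.argmax()                        # flat index of the first maximum
all_flat = np.flatnonzero(a == a.max())   # flat indices of every maximum
all_coords = np.argwhere(a == a.max())    # one (row, col) pair per maximum
```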
v = alli.max()
index = alli.argmax()
x, y = index // 8, index % 8  # integer division; assumes the array has 8 columns