how to randomly sample in 2D matrix in numpy - python

I have a 2D array/matrix like the one below. How would I randomly pick a value from it, for example getting a value like [-62, 29.23]? I looked at numpy.random.choice, but it is built for 1D arrays.
The following is my example with 4 rows and 8 columns:
Space_Position=[
[[-62,29.23],[-49.73,29.23],[-31.82,29.23],[-14.2,29.23],[3.51,29.23],[21.21,29.23],[39.04,29.23],[57.1,29.23]],
[[-62,11.28],[-49.73,11.28],[-31.82,11.28],[-14.2,11.28],[3.51,11.28],[21.21,11.28] ,[39.04,11.28],[57.1,11.8]],
[[-62,-5.54],[-49.73,-5.54],[-31.82,-5.54] ,[-14.2,-5.54],[3.51,-5.54],[21.21,-5.54],[39.04,-5.54],[57.1,-5.54]],
[[-62,-23.1],[-49.73,-23.1],[-31.82,-23.1],[-14.2,-23.1],[3.51,-23.1],[21.21,-23.1],[39.04,-23.1] ,[57.1,-23.1]]
]
In the answers the following solution was given:
random_index1 = np.random.randint(0, Space_Position.shape[0])
random_index2 = np.random.randint(0, Space_Position.shape[1])
Space_Position[random_index1][random_index2]
This indeed works to give me one sample, but how about more than one sample, like what np.random.choice() does?
Another way I am thinking of is to transform the matrix into a flat array of pairs instead, like
Space_Position=[
[-62,29.23],[-49.73,29.23],[-31.82,29.23],[-14.2,29.23],[3.51,29.23],[21.21,29.23],[39.04,29.23],[57.1,29.23], ..... ]
and then use np.random.choice(). However, I could not find a way to do the transformation; ndarray.flatten() makes the array look like
Space_Position=[-62,29.23,-49.73,29.2, ....]

Just use random indices (in your case two, because you have 3 dimensions):
import numpy as np
Space_Position = np.array(Space_Position)
random_index1 = np.random.randint(0, Space_Position.shape[0])
random_index2 = np.random.randint(0, Space_Position.shape[1])
Space_Position[random_index1, random_index2] # get the random element.
The alternative is to actually make it 2D:
Space_Position = np.array(Space_Position).reshape(-1, 2)
and then use one random index:
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_index = np.random.randint(0, Space_Position.shape[0]) # generate a random index
Space_Position[random_index] # get the random element.
If you want N samples with replacement:
N = 5
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_indices = np.random.randint(0, Space_Position.shape[0], size=N) # generate N random indices
Space_Position[random_indices] # get N samples with replacement
or without replacement:
Space_Position = np.array(Space_Position).reshape(-1, 2) # make it 2D
random_indices = np.arange(0, Space_Position.shape[0]) # array of all indices
np.random.shuffle(random_indices) # shuffle the array
Space_Position[random_indices[:N]] # get N samples without replacement

Referring to the documentation of numpy.random.choice:
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
The Generator documentation is at numpy.random.Generator.choice.
Using this knowledge, you can create a generator and then "choice" from your array:
rng = np.random.default_rng() #creates the generator ==> Generator(PCG64) at 0x2AA703BCE50
N = 3 #Number of Choices
a = np.array(Space_Position) #makes sure, a is an ndarray and numpy-supported
s = a.shape #(4,8,2)
a = a.reshape((s[0] * s[1], s[2])) #makes your array 2 dimensional, keeping the last dimension separate
a.shape #(32, 2)
b = rng.choice(a, N, axis=0, replace=False) #returns N choices of a in array b, e.g. array([[ 57.1 , 11.8 ], [ 21.21, -5.54], [ 39.04, 11.28]])
#Note: replace=False prevents having the same entry several times in the result

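Space_Position[np.random.randint(0, len(Space_Position))][np.random.randint(0, len(Space_Position[0]))]
gives you what you want. Note that the second index has to be drawn from the number of columns, len(Space_Position[0]), not the number of rows.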
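If the grid position of each sampled point is also needed, one possible approach is to sample flat cell numbers and convert them back to (row, column) pairs. The following is only a sketch: `grid` is a hypothetical stand-in built from regular coordinates (so it does not reproduce every value of the original Space_Position), and the names `flat_idx`, `rows`, `cols` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for Space_Position with shape (rows, cols, 2).
xs = np.array([-62, -49.73, -31.82, -14.2, 3.51, 21.21, 39.04, 57.1])
ys = np.array([29.23, 11.28, -5.54, -23.1])
grid = np.stack(np.meshgrid(xs, ys), axis=-1)  # shape (4, 8, 2)

rng = np.random.default_rng(0)
n_rows, n_cols = grid.shape[:2]
N = 5

# Draw N distinct cell numbers, then map them back to 2D grid indices.
flat_idx = rng.choice(n_rows * n_cols, size=N, replace=False)
rows, cols = np.unravel_index(flat_idx, (n_rows, n_cols))

samples = grid[rows, cols]  # shape (N, 2): one [x, y] pair per sampled cell
print(samples)
```

Passing replace=True instead would allow the same cell to be drawn more than once.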

Related

Random sample from specific rows and columns of a 2d numpy array (essentially sampling by ignoring edge effects)

I have a 2d numpy array size 100 x 100.
I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.
However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
row_idx = np.arange(min_select,max_select)
col_idx = np.arange(min_select,max_select)
# indices_random = ???  (somehow randomly sample from new_abundances only within the rows and columns given by row_idx and col_idx)
What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.
Desired output:
A 1D list of indices from a flattened new_abundances array.
Would something like this solve your problem?
import numpy as np
np.random.seed(0)
mat = np.random.random(size=(100,100))
x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)
coordinates = list(zip(x_indices,y_indices))
flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices
Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element. In this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.
Please also note that the same cell can be sampled more than once this way, because the row and column indices are drawn with replacement.
Using your notation:
import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))
flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices
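If each cell should be picked at most once, a possible variation is to sample without replacement inside the inner window and then convert to flat indices of the full grid. This is a sketch under the assumption that the window is rows and columns 10 to 89 of a 100 x 100 grid; `side`, `inner`, `rows` and `cols` are names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gridsize = 100
min_select, max_select = 10, 90
n_samples = 250

# Number the cells of the inner window 0 .. side*side - 1 and draw distinct ones.
side = max_select - min_select
inner = rng.choice(side * side, size=n_samples, replace=False)

# Convert window cell numbers to (row, col) in the window, then shift to full-grid coordinates.
rows, cols = np.unravel_index(inner, (side, side))
rows = rows + min_select
cols = cols + min_select

# Flat indices into the flattened (C-ordered) 100 x 100 array.
flat_index = np.ravel_multi_index((rows, cols), (gridsize, gridsize))
```

Here np.ravel_multi_index computes the same thing as rows * gridsize + cols for a C-ordered array.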

creating a multi-dimensional numpy array where each dimension of the array is of length L

I'm looking to create a numpy array of d dimensions, where each dimension is of length n.
For example:
np.zeros((5,2)), will give me a 5-row, 2-column array of zeros. What I'm looking for is a 5x5 array. Now I know I can simply do np.zeros((5,5)), but my goal is to generate the array dynamically:
dims = 4
elem_length = 10
#generate the array
#results in a 10x10x10x10 numpy array
Another option is to create single-dimensional tuples and join them all:
shp = ()
for i in range(dims):
    shp = shp + (elem_length,)
new_arr = np.zeros(shp)
But that's not python-y at all. Is there a better way?
I'm not sure whether there is a NumPy way to generate an n x n x n x ... array directly, but if you want to generate the shape tuple you could try a list comprehension.
Example:
shape_list = [elem_length for _ in range(dims)]
shape_tuple = tuple(shape_list)
print(shape_tuple)
>> (elem_length, elem_length, ..., elem_length)  # dims entries, e.g. (10, 10, 10, 10) for dims = 4, elem_length = 10
dn_arr = np.zeros(shape_tuple)
print(dn_arr.shape)
>> (elem_length, elem_length, ..., elem_length)  # same tuple as above
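A shorter way to build the same shape is tuple repetition, which is plain Python rather than anything NumPy-specific. A minimal sketch using the question's variable names:

```python
import numpy as np

dims = 4
elem_length = 10

# Repeating a one-element tuple dims times gives (10, 10, 10, 10).
shape = (elem_length,) * dims
new_arr = np.zeros(shape)
print(new_arr.shape)  # (10, 10, 10, 10)
```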

Mapped averaging from 2D to higher dimensional numpy arrays

I have some 2D numpy data that I want to translate to a higher dimensional array using a mapped average (or some other statistic).
The source data is 2D with an MxN shape, and I want to map this onto a 4D array (AxBxCxD shape). The indices mapping from the source data to each of the four dimensions are created from either 2D (MxN shaped) variables or tiled 1D (Mx1 shaped) variables.
Below is a working example of what I am trying to do. Although this seems to work, I would like to know if there is a function that would allow me to:
1) do-away with the for-loops and
2) allow for a variable number of dimensions (3D, 4D, 5D, etc) for the destination array(s).
import numpy as np
# create data I want to conditionally average (MxN array)
zz = np.random.rand(100,10)
# create variables used to define binning for conditional averaging
# each variable defines one dimension of the final 4 dimensional array
aa = np.random.rand(100,10)*10
bb = np.random.rand(100,1) + 5
cc = np.random.rand(100,1) * 25
dd = np.random.rand(100,1)* 50 + 100
# define binning boundaries
binsaa = np.array([2, 4, 6, 8])
binsbb = np.array([5.1, 5.5, 5.7])
binscc = np.array([12])
binsdd = np.array([110, 133])
# create bin indices
idaa = np.digitize(aa,binsaa,right=True)
idbb = np.digitize(bb,binsbb,right=True)
idcc = np.digitize(cc,binscc,right=True)
iddd = np.digitize(dd,binsdd,right=True)
# tile some of the indices so they match the shape of the data to be averaged
idbbt = np.tile(idbb,[1,10])
idcct = np.tile(idcc,[1,10])
idddt = np.tile(iddd,[1,10])
# make empty destination 4 dimensional arrays
avgxx = np.zeros([5,4,2,3])
cntxx = np.zeros([5,4,2,3])
# use for loops to average original data and place in 4-dim array
for ixa in range(5):
    for ixb in range(4):
        for ixc in range(2):
            for ixd in range(3):
                idz = (idaa == ixa) & (idbbt == ixb) & (idcct == ixc) & (idddt == ixd)
                avgxx[ixa,ixb,ixc,ixd] = np.average(zz[idz])
                cntxx[ixa,ixb,ixc,ixd] = np.sum(idz)
print(avgxx[:,:,:,:])
print(cntxx[:,:,:,:])
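One possible loop-free approach, offered as a sketch rather than a definitive answer, is to combine the per-dimension bin indices into a single flat bin number with np.ravel_multi_index and accumulate counts and sums with np.bincount. It works for any number of binning variables; the lists `variables` and `bins` and the seeded generator are assumptions made for this illustration. Empty bins are set to NaN here.

```python
import numpy as np

rng = np.random.default_rng(0)
zz = rng.random((100, 10))  # data to average (M x N)

# Binning variables (M x N or M x 1) and their bin edges, one pair per output dimension.
variables = [rng.random((100, 10)) * 10,
             rng.random((100, 1)) + 5,
             rng.random((100, 1)) * 25,
             rng.random((100, 1)) * 50 + 100]
bins = [np.array([2, 4, 6, 8]),
        np.array([5.1, 5.5, 5.7]),
        np.array([12]),
        np.array([110, 133])]

# k bin edges produce k + 1 bins, so the output shape here is (5, 4, 2, 3).
shape = tuple(len(b) + 1 for b in bins)

# Bin index of every data point along each output dimension, broadcast to zz's shape.
ids = [np.broadcast_to(np.digitize(v, b, right=True), zz.shape)
       for v, b in zip(variables, bins)]

# Collapse the per-dimension indices into one flat bin number per data point.
flat = np.ravel_multi_index(tuple(i.ravel() for i in ids), shape)

size = int(np.prod(shape))
cnt = np.bincount(flat, minlength=size).reshape(shape)
tot = np.bincount(flat, weights=zz.ravel(), minlength=size).reshape(shape)

# Mean per bin; bins with no data stay NaN.
avg = np.full(shape, np.nan)
np.divide(tot, cnt, out=avg, where=cnt > 0)
```

Other statistics would need a different accumulation step; this sketch only covers counts and means.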

Numpy, how to reshape a vector to multi column array

I am wondering how to use np.reshape to reshape a long vector into n columns array without giving the row numbers.
Normally I can find out the row number by len(a)//n:
a = np.arange(0, 10)
n = 2
b = a.reshape(len(a)//n,n)
Is there a more direct way without using len(a)//n?
You can use -1 on one dimension, numpy will figure out what this number should be:
a = np.arange(0, 10)
n = 2
b = a.reshape(-1, n)
The doc is pretty clear about this feature: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and remaining dimensions.
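One caveat worth illustrating: the inferred dimension only works when the array length is divisible by the other dimensions. A small sketch (the `failed` flag is just for demonstration):

```python
import numpy as np

a = np.arange(10)

b = a.reshape(-1, 2)  # 10 elements / 2 columns -> 5 rows
print(b.shape)        # (5, 2)

# 10 is not divisible by 3, so NumPy cannot infer the row count.
try:
    a.reshape(-1, 3)
    failed = False
except ValueError:
    failed = True
print(failed)         # True
```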

Create random submatrix of large matrix in python

I have the following code to create a random subset (of size examples) of a large set:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    return tmp
The problem is that my input is a large matrix, so input_set.shape = (n,m). However, sampling(input_set) is a list, while I want it to be a submatrix of size = (examples, m), not a list of length examples of vectors of size m.
I modified my code to do this:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    sample = input_set[0:examples]
    for i in range(examples):
        sample[i] = tmp[i]
    return sample
This works, but is there a more elegant/better way to accomplish what I am trying to do?
Use numpy as follows to create an n x m matrix (assuming input_set is a list):
import numpy as np
input_matrix = np.array(input_set).reshape(n,m)
OK, if I understand the question correctly, you just want to drop the last (n - k) rows, so:
sample = input_matrix[:k - n]
should do the job for you (for k < n this keeps the first k rows).
I don't know if you are still interested, but maybe you could do something like this:
# create a random 6x6 matrix with integer entries from -10 to 9
import numpy as np
mat = np.random.randint(-10,10,(6,6))
print(mat)
# pick a random start index between 0 and 4
startIdx = np.random.randint(0,5)
print(startIdx)
# extract the submatrix (it will be smaller than 3x3 if startIdx+3 runs past the edge, i.e. for startIdx = 4)
print(mat[startIdx:startIdx+3,startIdx:startIdx+3])
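For the original goal of a random (examples, m) submatrix of rows, one option is to sample row indices and use them to index the array directly. This is a sketch that assumes input_set is already a 2D NumPy array; the example data and the extra function parameters are hypothetical.

```python
import numpy as np

def sampling(input_set, examples, rng):
    """Return `examples` distinct, randomly chosen rows of a 2D array."""
    row_idx = rng.choice(input_set.shape[0], size=examples, replace=False)
    return input_set[row_idx]  # shape (examples, m)

rng = np.random.default_rng(0)
input_set = np.arange(40).reshape(10, 4)  # hypothetical (n, m) = (10, 4) matrix
sample = sampling(input_set, 3, rng)
print(sample.shape)  # (3, 4)
```

Indexing with an integer array returns a copy, so modifying sample does not change input_set.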
