Suppose I have a 3D numpy array. How can I compute the mean of the N maximum elements along a given axis? So basically something like:
a = np.random.randint(10, size=(100,100,100)) #axes x,y,z
result = np.mean(a, axis = 2)
However, I want to restrict the mean to the N maximum values along axis z. To illustrate the issue, here is a solution using loops:
a = np.random.randint(10, size=(100,100,100)) #axes x,y,z
N = 5
maxima = np.zeros((100,100,N)) #container for the N max values along axis z
for x in range(100): #loop through x axis
    for y in range(100): #loop through y axis
        max_idx = a[x, y, :].argsort()[-N:] #indices of N max values along z axis
        maxima[x, y, :] = a[x, y, max_idx] #extract values
result = np.mean(maxima, axis = 2) #take the mean
I would like to achieve the same result with multidimensional indexing.
Here's one approach using np.argpartition to get the max N indices and then advanced-indexing to extract the values and compute the desired average -
# Get max N indices along the last axis
maxN_indx = np.argpartition(a, -N, axis=-1)[..., -N:]
# Get a list of indices for use in advanced-indexing into the input array,
# along with the max N indices along the last axis
all_idx = np.ogrid[tuple(map(slice, a.shape))]
all_idx[-1] = maxN_indx
# Index and get the mean along the last axis
out = a[tuple(all_idx)].mean(-1)
The last step could also be expressed explicitly with advanced-indexing, like so -
m,n = a.shape[:2]
out = a[np.arange(m)[:,None,None], np.arange(n)[:,None], maxN_indx].mean(-1)
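As a quick sanity check, the vectorized output should match the looping version on the same input, e.g. (using the same a and N, with result from the loop solution in the question):

np.allclose(out, result)
# True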
Related
I have a 2D numpy array of size 100 x 100.
I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.
However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
row_idx = np.arange(min_select,max_select)
col_idx = np.arange(min_select,max_select)
indices_random = ????? # somehow randomly sample from new_abundances only within the rows and columns of row_idx and col_idx
What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.
Desired output:
A 1D list of indices into the flattened new_abundances array.
Would something like this solve your problem?
import numpy as np
np.random.seed(0)
mat = np.random.random(size=(100,100))
x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)
coordinates = list(zip(x_indices,y_indices))
flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices
Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element. In this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.
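A quick way to verify that relationship with the arrays defined above:

assert mat[coordinates[0]] == flat_mat[flat_index[0]]  # e.g. mat[38, 45] == flat_mat[3845]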
Please also note that this samples with replacement, so the same cell of the original data can be drawn more than once.
Using your notation:
import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))
flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices
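From there, the sampled values can be read straight out of the flattened array; a minimal illustrative line using the names above:

sampled_values = flat_new_abundances[flat_index]  # shape (250,)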
I am trying to calculate the average of a 3D array between two indices on the 1st axis. The start and end indices vary from cell to cell and are represented by two separate 2D arrays that are the same shape as a slice of the 3D array.
I have managed to implement a piece of code that loops through the pixels of my 3D array, but this method is painfully slow in the case of my array with a shape of (70, 550, 350). Is there a way to vectorise the operation using numpy or xarray (the arrays are stored in an xarray dataset)?
Here is a snippet of what I would like to optimise:
# My 3D raster containing values; shape = (time, x, y)
values = np.random.rand(10, 55, 60)
# A 2D raster containing start indices for the averaging
start_index = np.random.randint(0, 4, size=(values.shape[1], values.shape[2]))
# A 2D raster containing end indices for the averaging
end_index = np.random.randint(5, 9, size=(values.shape[1], values.shape[2]))
# Initialise an array that will contain results
mean_array = np.zeros_like(values[0, :, :])
# Loop over 3D raster to calculate the average between indices on axis 0
for i in range(0, values.shape[1]):
    for j in range(0, values.shape[2]):
        mean_array[i, j] = np.mean(values[start_index[i, j]: end_index[i, j], i, j], axis=0)
One way to do this without loops is to zero out the entries you don't want to use, compute the sum of the remaining items, then divide by the number of selected entries. For example:
i = np.arange(values.shape[0])[:, None, None]
mean_array_2 = np.where((i >= start_index) & (i < end_index), values, 0).sum(0) / (end_index - start_index)
np.allclose(mean_array, mean_array_2)
# True
Note that this assumes that the indices are in the range 0 <= i < values.shape[0]; if this is not the case you can use np.clip or other means to standardize the indices before computation.
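For example, a minimal sketch of that clipping step, reusing i, start_index, and end_index from above (note that a cell whose clipped range becomes empty would divide by zero):

start_clipped = np.clip(start_index, 0, values.shape[0])
end_clipped = np.clip(end_index, 0, values.shape[0])
counts = end_clipped - start_clipped  # zero where the clipped range is empty
mean_array_3 = np.where((i >= start_clipped) & (i < end_clipped), values, 0).sum(0) / counts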
So I have a large list of points.
I have split those points up into the x coordinates and the y coordinates and then further split them into groups of 1000.
x = [points_Cartesian[x: x + 1000, 0] for x in range(0, len(points_Cartesian), 1000)]
(The y coordinates looks the same but with y instead of x.)
I am trying to turn the Cartesian points into polar coordinates, and to do so I must square every item in x and every item in y.
xSqua = []
for sublist1 in x:
    temp1 = []
    for inte1 in sublist1:
        temp1.append(inte1**2)
    xSqua.append(temp1)
After that I add both of the squared values together and take the square root to get rad.
rad = np.sqrt(xSqua + ySqua)
The problem is, I started with 10,000 points and somewhere in this code it gets trimmed down to 1,000.
Does anyone know what the error is and how I fix it?
You're already using numpy. You can reshape matrices using numpy.reshape() and square the entire array elementwise using the ** operator on the entire array and your code will be much faster than iterating.
For example, let's say we have a 10000x2 points_Cartesian array:
points_Cartesian = np.random.random((10000,2))
# reshape to 1000 columns, as many rows as required
xpts = points_Cartesian[:, 0].reshape((-1, 1000))
ypts = points_Cartesian[:, 1].reshape((-1, 1000))
# elementwise square using **
rad = np.sqrt(xpts**2 + ypts**2)
ang = np.arctan2(ypts, xpts)
Now rad and ang are 10x1000 arrays.
I have an image of about 8000x9000 pixels as a numpy array. I also have a list of indices in a 2xN numpy array. These indices are fractional and may also fall outside the image bounds. I need to interpolate the image and find the values at the given indices. If an index falls outside the image, I need to return numpy.nan for it. Currently I'm doing this in a for loop as below:
def interpolate_image(image: numpy.ndarray, indices: numpy.ndarray) -> numpy.ndarray:
    """
    :param image:
    :param indices: 2xN matrix. 1st row is dim1 (rows) indices, 2nd row is dim2 (cols) indices
    :return:
    """
    # Todo: Vectorize this
    M, N = image.shape
    num_indices = indices.shape[1]
    interpolated_image = numpy.zeros((1, num_indices))
    for i in range(num_indices):
        x, y = indices[:, i]
        if (x < 0 or x > M - 1) or (y < 0 or y > N - 1):
            interpolated_image[0, i] = numpy.nan
        else:
            # Todo: Do Bilinear Interpolation. For now nearest neighbor is implemented
            interpolated_image[0, i] = image[int(round(x)), int(round(y))]
    return interpolated_image
But the for loop is taking a huge amount of time (as expected). How can I vectorize this? I found scipy.interpolate.interp2d, but I'm not able to use it. Can someone explain how to use it, or suggest another method? I also found this, but again it doesn't match my requirements: given x and y indices, it generates interpolated matrices. I don't want that. For the given indices, I just want the interpolated values, i.e. I need a vector output, not a matrix.
I tried it like this, but as said above, it gives a matrix output:
f = interpolate.interp2d(numpy.arange(image.shape[0]), numpy.arange(image.shape[1]), image, kind='linear')
interp_image_vect = f(indices[:,0], indices[:,1])
RuntimeError: Cannot produce output of size 73156608x73156608 (size too large)
For now, I've implemented nearest-neighbor interpolation. scipy's interp2d doesn't offer nearest neighbor. It would be good if the library function supported nearest neighbor (so I can compare); if not, that's also fine.
It looks like scipy.interpolate.RectBivariateSpline will do the trick:
from scipy.interpolate import RectBivariateSpline
image = # as given
indices = # as given
M, N = image.shape
spline = RectBivariateSpline(numpy.arange(M), numpy.arange(N), image)
interpolated = spline(indices[0], indices[1], grid=False)
This gets you the interpolated values, but it doesn't give you nan where you need it. You can get that with where:
nans = numpy.zeros(interpolated.shape) + numpy.nan
x_in_bounds = (0 <= indices[0]) & (indices[0] < M)
y_in_bounds = (0 <= indices[1]) & (indices[1] < N)
bounded = numpy.where(x_in_bounds & y_in_bounds, interpolated, nans)
I tested this with a 2624x2624 image and 100,000 points in indices and all told it took under a second.
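For the nearest-neighbor comparison mentioned in the question, a minimal vectorized sketch, assuming the same image, indices, M, and N as above (the result is float so that nan is representable):

# Round coordinates to the nearest pixel, clamped into the valid range
xi = numpy.clip(numpy.round(indices[0]), 0, M - 1).astype(int)
yi = numpy.clip(numpy.round(indices[1]), 0, N - 1).astype(int)
in_bounds = (0 <= indices[0]) & (indices[0] <= M - 1) & (0 <= indices[1]) & (indices[1] <= N - 1)
nearest = numpy.where(in_bounds, image[xi, yi], numpy.nan)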
I have the following 2-D Numpy arrays:
X # X.shape = (11688, 144)
Y # Y.shape = (2912, 1000)
The first array is populated with atmospheric data, and the second array is populated with random index values from 0 to X.shape[0]-1. I want to index the rows of X with each column of Y to yield a 3-D array result, where result.shape = (2912, 1000, 144), and I want to do this without looping.
My current approach is:
result = X[Y,:]
but this one line of code can take more than 10 seconds to execute depending on the shape of the 0th axis of Y.
Is there a more optimal way to perform this type of indexing in order to speed up its execution?
EDIT: Here's a more complete example of what I'm trying to accomplish.
X = np.random.rand(11688, 144) # Time-by-longitude array of atmospheric data
t = np.arange(X.shape[0]) # Time vector
# Populate array of randomly drawn time steps
Y = np.zeros((2912, 1000), dtype='i')
for i in range(1000):
    Y[:, i] = np.random.choice(t, 2912)
# Index X with each column of Y
result = X[Y, :]