I have a 2D numpy array of 2D points:
np.random.seed(0)
a = np.random.rand(3, 4, 2) # each value is a 2D point
I would like to sort each row by the norm of every point
norms = np.linalg.norm(a, axis=2) # shape(3, 4)
indices = np.argsort(norms, axis=0) # indices of each sorted row
Now I would like to create an array with the same shape and values as a, but with each row of 2D points sorted by their norm.
How can I achieve that?
I tried variations of np.take & np.take_along_axis but with no success.
for example:
np.take(a, indices, axis=1) # shape (3,3,4,2)
This samples a three times, once for each row in indices. I would like to sample a just once: each row in indices holds the column indices that should be taken from the corresponding row of a.
If I understand you correctly, you want this (note that argsort must run along axis=1 to sort within each row; your axis=0 sorts down the columns instead):
norms = np.linalg.norm(a, axis=2)  # shape (3, 4)
indices = np.argsort(norms, axis=1)
np.take_along_axis(a, indices[:, :, None], axis=1)
The trailing axis added by indices[:, :, None] lets the (3, 4) index array broadcast against the last (point) axis of a.
output for your example:
[[[0.4236548 0.64589411]
[0.60276338 0.54488318]
[0.5488135 0.71518937]
[0.43758721 0.891773 ]]
[[0.07103606 0.0871293 ]
[0.79172504 0.52889492]
[0.96366276 0.38344152]
[0.56804456 0.92559664]]
[[0.0202184 0.83261985]
[0.46147936 0.78052918]
[0.77815675 0.87001215]
[0.97861834 0.79915856]]]
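As a quick sanity check, the norms of the sorted result should be non-decreasing along each row:
import numpy as np

np.random.seed(0)
a = np.random.rand(3, 4, 2)
norms = np.linalg.norm(a, axis=2)
indices = np.argsort(norms, axis=1)
sorted_a = np.take_along_axis(a, indices[:, :, None], axis=1)

# each row's norms should now be in ascending order
sorted_norms = np.linalg.norm(sorted_a, axis=2)
assert np.all(np.diff(sorted_norms, axis=1) >= 0)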
I'm looking for a pythonic way of selecting all rows from a 2D dataset of size (nrows, ncols) such that I keep only those rows for which all values fall between the 5th and 95th percentile values. Using np.percentile(dataset, 5, axis=0), we obtain an array of ncols values.
In the case of 1D arrays, writing something like X[X>0] is trivial. What is the approach when you want to generalize to 2D or higher dimensions?
X[X>np.percentile(dataset, 5, axis=0)]
If I understand correctly, in your 2D example you can use np.all() to find the rows where the criterion is satisfied, and then use X[mask]-style boolean indexing just like X[X>0] (see below for an example).
I am not sure how to generalize to higher dimensions, but maybe np.take (https://numpy.org/doc/stable/reference/generated/numpy.take.html) is what you are looking for? A possible N-D generalization is also sketched after the example.
2D example:
# Setup
import numpy as np
np.random.seed(100)
dataset = np.random.normal(size=(10,2))
display(dataset)
array([[-1.74976547, 0.3426804 ],
[ 1.1530358 , -0.25243604],
[ 0.98132079, 0.51421884],
[ 0.22117967, -1.07004333],
[-0.18949583, 0.25500144],
[-0.45802699, 0.43516349],
[-0.58359505, 0.81684707],
[ 0.67272081, -0.10441114],
[-0.53128038, 1.02973269],
[-0.43813562, -1.11831825]])
# Indexing
lo = np.percentile(dataset, 5, axis=0)
hi = np.percentile(dataset, 95, axis=0)
idx = (lo < dataset) & (hi > dataset)  # 2D boolean mask, one flag per element
dataset[np.all(idx, axis=1)]
array([[-1.74976547, 0.3426804 ],
[ 1.1530358 , -0.25243604],
[-0.45802699, 0.43516349],
[ 0.67272081, -0.10441114],
[-0.53128038, 1.02973269]])
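For higher dimensions, here is a minimal sketch, assuming the goal is to keep the slices along axis 0 whose values all lie inside the per-position percentile bounds (filter_rows is a hypothetical helper name, not a numpy function):
import numpy as np

def filter_rows(X, lo_pct=5, hi_pct=95):
    lo = np.percentile(X, lo_pct, axis=0)
    hi = np.percentile(X, hi_pct, axis=0)
    inside = (X > lo) & (X < hi)
    # reduce over every axis except the first; a slice is kept only
    # if all of its entries are within bounds
    return X[inside.all(axis=tuple(range(1, X.ndim)))]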
Say I have one 2d numpy array X with shape (3,3) and one numpy array Y with shape (3,) where
X = np.array([[0,1,2],
              [3,4,5],
              [1,9,2]])
Y = np.array([1,0,1])
How can I create a numpy array, Z for example, by multiplying X and Y element-wise and then summing row-wise?
multiplying element-wise would yield: [[0,0,2], [3,0,5], [1,0,2]]
then, adding each row would yield:
Z = np.array([2,8,3])
I have tried variations of
Z = np.sum(X * Y)  # adds all elements of the entire array, not row-wise
I know I can use a forloop but the dataset is very large and so I am trying to find a more efficient numpy-specific way to perform the operation. Is this possible?
You can do the following:
sum_row = np.sum(X*Y, axis=1) # axis=0 for columnwise
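Since summing X * Y over each row is just a matrix-vector product, the same result can also be computed with the @ operator or np.einsum, which may be faster on large arrays:
Z = X @ Y                        # Z[i] = sum_j X[i, j] * Y[j]  ->  [2, 8, 3]
Z = np.einsum('ij,j->i', X, Y)   # same computation, spelled out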
I want to summarize a 3d array dat using indices contained in a 2d array idx.
Consider the example below. For each margin along dat[:, :, i], I want to compute the median according to some index idx. The desired output (out) is a 2d array, whose rows record the index and columns record the margin. The following code works but is not very efficient. Any suggestions?
import numpy as np
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
out = np.empty((3, 3))
for i in np.unique(idx):
    out[i] = np.median(dat[idx == i], axis=0)
print(out)
Output:
[[ 1.5 2.5 3.5]
[ 6. 7. 8. ]
[ 9. 10. 11. ]]
To visualize the problem better, I will refer to the 2x2 dimensions of the array as the rows and columns, and the length-3 dimension as depth. I will refer to vectors along the third dimension as "pixels" (pixels have length 3), and planes along the first two dimensions as "channels".
Your loop is accumulating a set of pixels selected by the mask idx == i, and taking the median of each channel within that set. The result is an N×3 array, where N is the number of distinct indices that you have.
One day, generalized ufuncs will be ubiquitous in numpy, and np.median will be such a function. On that day, you will be able to use reduceat magic¹ to do something like
unq, ind = np.unique(idx, return_inverse=True)
np.median.reduceat(dat.reshape(-1, dat.shape[-1]), np.r_[0, np.where(np.diff(unq[ind]))[0]+1])
¹ See Applying operation to unevenly split portions of numpy array for more info on the specific type of magic.
Since this is not currently possible, you can use scipy.ndimage.median instead. This version allows you to compute medians over a set of labeled areas in an array, which is exactly what you have with idx. This method assumes that your index array contains N densely packed values, all of which are in range(N). Otherwise the reshaping operations will not work properly.
If that is not the case, start by transforming idx:
_, ind = np.unique(idx, return_inverse=True)
idx = ind.reshape(idx.shape)
OR
idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
Since you are actually computing a separate median for each region and channel, you will need to have a set of labels for each channel. Flesh out idx to have a distinct set of indices for each channel:
chan = dat.shape[-1]
offset = idx.max() + 1
index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
Now index has an identical set of regions defined in each channel, which you can use in scipy.ndimage.median:
out = scipy.ndimage.median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
The input labels must be densely packed from zero to offset * chan for index=range(offset * chan) to work properly, and the reshape operation to have the right number of elements. The final transpose is just an artifact of how the labels are arranged.
Here is the complete product:
import numpy as np
from scipy.ndimage import median
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
def summarize(dat, idx):
    idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
    chan = dat.shape[-1]
    offset = idx.max() + 1
    index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
    return median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
print(summarize(dat, idx))
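For the example dat and idx above, this should print the same result as the loop:
[[ 1.5  2.5  3.5]
 [ 6.   7.   8. ]
 [ 9.  10.  11. ]]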
I'd like to take a 3x1 column of 'row' values, a = ([1],[3],[5]), and a 1x3 row of column values, b = ([1,4,7]), and create a 3x3 matrix of (row, column) values so that the final matrix would look like
([(1,1) (1,4) (1,7)],
[(3,1) (3,4) (3,7)],
[(5,1) (5,4) (5,7)])
Is there a way to do this without using a for loop?
import numpy as np
a = np.array([1, 3, 5]).reshape((3, 1))  # row values
b = np.array([1, 4, 7]).reshape((1, 3))  # column values
# broadcast both to shape (3, 3) and pair them along a new last axis
pairs = np.dstack(np.broadcast_arrays(a, b))  # shape (3, 3, 2)
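An equivalent sketch uses np.meshgrid with indexing='ij' so the row values vary down the first axis:
rows, cols = np.meshgrid([1, 3, 5], [1, 4, 7], indexing='ij')
pairs = np.stack([rows, cols], axis=-1)  # pairs[i, j] == (row_i, col_j)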
I have a dataset array A. A is n×2. It can be plotted on the x and y axis.
A[:,1] gets me all of the y values and A[:,0] gets me all the x values.
Now, I have a few other dataset arrays that are similar to A. X values are the same for these similar arrays. How do I calculate the standard deviation of the datasets? There should be a std value for each X. In the end my result std should have a length of n.
I can do this the manual way with loops but I'm not sure how to do this using NumPy in a pythonic and simple manner.
here are some sample data:
A=[[0,2.54],[1,254.5],[2,-43]]
B=[[0,3.34],[1,154.5],[2,-93]]
std_Array = [std(2.54, 3.34), std(254.5, 154.5), std(-43, -93)]
Suppose your arrays are all the same shape and they are in a list. Then to get the standard deviation of the first column of each you can do
arrays = [np.random.rand(10, 2) for _ in range(8)]
np.dstack(arrays).std(axis=0)[0]
This stacks the 2-D arrays into a 3-D array and then takes the std along the first axis, giving a 2 × 8 result (8 being the number of arrays). The first row of the result is the std. devs. of the 8 sets of x-values.
If you post some sample data perhaps we could help more.
Is this pythonic enough?
std_Array = numpy.std((A,B), axis = 0)[:,1]
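For the sample data above, this gives the expected per-x standard deviations of the y values:
import numpy as np
A = np.array([[0, 2.54], [1, 254.5], [2, -43]])
B = np.array([[0, 3.34], [1, 154.5], [2, -93]])
print(np.std((A, B), axis=0)[:, 1])  # [ 0.4 50.  25. ]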
li_arr = [np.array(x)[:, 1] for x in [A, B]]
This will produce numpy arrays holding the specific column you want; the result will be
[array([ 2.54, 254.5 , -43. ]), array([ 3.34, 154.5 , -93. ])]
then you stack the values using np.column_stack:
arr = np.column_stack(li_arr)
this will be the result of the stacking:
array([[ 2.54, 3.34],
[ 254.5 , 154.5 ],
[ -43. , -93. ]])
and then finally
np.std(arr, axis=1)
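which yields the expected values:
array([ 0.4, 50. , 25. ])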