numpy get std between datasets - python

I have a dataset array A. A is n×2; it can be plotted on the x and y axes.
A[:,1] gets me all of the y values and A[:,0] gets me all of the x values.
Now, I have a few other dataset arrays that are similar to A. X values are the same for these similar arrays. How do I calculate the standard deviation of the datasets? There should be a std value for each X. In the end my result std should have a length of n.
I can do this the manual way with loops but I'm not sure how to do this using NumPy in a pythonic and simple manner.
Here are some sample data:
A = [[0, 2.54], [1, 254.5], [2, -43]]
B = [[0, 3.34], [1, 154.5], [2, -93]]
std_Array = [std(2.54, 3.34), std(254.5, 154.5), std(-43, -93)]

Suppose your arrays are all the same shape and they are in a list. Then to get the standard deviation of the first column of each you can do
import numpy as np

arrays = [np.random.rand(10, 2) for _ in range(8)]
np.dstack(arrays).std(axis=0)[0]
This stacks the 2-D arrays into a 3-D array of shape (10, 2, 8) and then takes the std along the first axis, giving a 2 × 8 result (8 being the number of arrays). The first row of the result holds the std. devs. of the 8 sets of x-values.
If you post some sample data perhaps we could help more.
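For the per-row statistic the question asks about (one std for each x, computed across the datasets), a small variant of the same idea works: stack along a new last axis and reduce over that axis instead. A minimal sketch, assuming all the arrays share the same x column:
import numpy as np

arrays = [np.random.rand(10, 2) for _ in range(8)]
stacked = np.dstack(arrays)    # shape (10, 2, 8)
row_std = stacked.std(axis=2)  # std across the 8 arrays, shape (10, 2)
y_std = row_std[:, 1]          # one std per x value, length n == 10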

Is this pythonic enough?
std_Array = numpy.std((A, B), axis=0)[:, 1]
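With the sample A and B above, this evaluates to the expected per-x standard deviations:
import numpy

A = [[0, 2.54], [1, 254.5], [2, -43]]
B = [[0, 3.34], [1, 154.5], [2, -93]]
numpy.std((A, B), axis=0)[:, 1]
# array([ 0.4, 50. , 25. ])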

li_arr = [np.array(x)[:, 1] for x in [A, B]]
This produces NumPy arrays containing just the column you want; the result will be
[array([  2.54, 254.5 , -43.  ]), array([  3.34, 154.5 , -93.  ])]
Then you stack the values using column_stack:
arr = np.column_stack(li_arr)
This is the result of the stacking:
array([[   2.54,    3.34],
       [ 254.5 ,  154.5 ],
       [ -43.  ,  -93.  ]])
and then finally:
np.std(arr, axis=1)
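For the sample data this evaluates to array([ 0.4, 50. , 25. ]), matching the one-liner in the previous answer.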

Related

Numpy drop all rows for which the values are between 5% and 95% percentile

I'm looking for a pythonic way of selecting all rows from a 2D dataset of size (nrows, ncols) such that I keep only those rows for which all the values fall between the 5th and 95th percentile values of their respective columns. If we use np.percentile(dataset, 5, axis=0), we obtain an array of values with the size ncols.
In the case of 1D arrays, writing something like X[X>0] is trivial. What is the approach when you want to generalize to 2D or higher dimensions?
X[X>np.percentile(dataset, 5, axis=0)]
If I understand correctly, in your 2D example one can use np.all() to find the rows where the criterion is satisfied. Then you can use syntax like X[X>0] (see below for an example).
I am not sure how to generalize to higher dimensions, but maybe np.take (https://numpy.org/doc/stable/reference/generated/numpy.take.html) is what you are looking for?
2D example:
# Setup
import numpy as np
np.random.seed(100)
dataset = np.random.normal(size=(10,2))
display(dataset)  # display() is available in Jupyter/IPython; use print(dataset) elsewhere
array([[-1.74976547,  0.3426804 ],
       [ 1.1530358 , -0.25243604],
       [ 0.98132079,  0.51421884],
       [ 0.22117967, -1.07004333],
       [-0.18949583,  0.25500144],
       [-0.45802699,  0.43516349],
       [-0.58359505,  0.81684707],
       [ 0.67272081, -0.10441114],
       [-0.53128038,  1.02973269],
       [-0.43813562, -1.11831825]])
# Indexing
lo = np.percentile(dataset, 5, axis=0)
hi = np.percentile(dataset, 95, axis=0)
idx = (lo < dataset) & (hi > dataset)  # boolean mask of shape (nrows, ncols)
dataset[np.all(idx, axis=1)]
array([[ 0.98132079,  0.51421884],
       [ 0.22117967, -1.07004333],
       [-0.18949583,  0.25500144],
       [-0.45802699,  0.43516349],
       [-0.58359505,  0.81684707],
       [ 0.67272081, -0.10441114]])
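The same recipe can be wrapped in a small helper; this is a hedged sketch (the function name and defaults are my own), using np.all over every axis except the first so it also works on higher-dimensional inputs:
import numpy as np

def filter_rows_by_percentile(X, lo_pct=5, hi_pct=95):
    # hypothetical helper: keep only the first-axis entries whose values
    # all lie strictly between the per-column percentile bounds
    lo = np.percentile(X, lo_pct, axis=0)
    hi = np.percentile(X, hi_pct, axis=0)
    keep = np.all((X > lo) & (X < hi), axis=tuple(range(1, X.ndim)))
    return X[keep]

dataset = np.random.normal(size=(10, 2))
filtered = filter_rows_by_percentile(dataset)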

What is the best way of rearranging subarrays in numpy? [duplicate]

I have a 2D numpy array of 2D points:
np.random.seed(0)
a = np.random.rand(3, 4, 2) # each value is a 2D point
I would like to sort each row by the norm of every point
norms = np.linalg.norm(a, axis=2) # shape(3, 4)
indices = np.argsort(norms, axis=0) # indices of each sorted row
Now I would like to create an array with the same shape and values as a, where each row of 2D points is sorted by its norm.
How can I achieve that?
I tried variations of np.take & np.take_along_axis but with no success.
for example:
np.take(a, indices, axis=1) # shape (3,3,4,2)
This samples a 3 times, once for each row in indices. I would like to sample a just once: each row in indices holds the columns that should be sampled from the corresponding row.
If I understand you correctly, you want this:
norms = np.linalg.norm(a,axis=2) # shape(3,4)
indices = np.argsort(norms , axis=1)
np.take_along_axis(a, indices[:,:,None], axis=1)
output for your example:
[[[0.4236548  0.64589411]
  [0.60276338 0.54488318]
  [0.5488135  0.71518937]
  [0.43758721 0.891773  ]]

 [[0.07103606 0.0871293 ]
  [0.79172504 0.52889492]
  [0.96366276 0.38344152]
  [0.56804456 0.92559664]]

 [[0.0202184  0.83261985]
  [0.46147936 0.78052918]
  [0.77815675 0.87001215]
  [0.97861834 0.79915856]]]
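A quick sanity check (assuming a and the axis=1 indices from the answer above): the per-row norms of the result should be non-decreasing.
sorted_a = np.take_along_axis(a, indices[:, :, None], axis=1)
sorted_norms = np.linalg.norm(sorted_a, axis=2)
assert np.all(np.diff(sorted_norms, axis=1) >= 0)
The indices[:, :, None] adds a trailing axis so the (3, 4) index array broadcasts against a's last dimension of size 2.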

Avoid using for loop. Python 3

I have an array of shape (3,2):
import numpy as np
arr = np.array([[0.,0.],[0.25,-0.125],[0.5,-0.125]])
I was trying to build a matrix (matrix) of dimensions (6,2) containing, stacked vertically, the outer product of each row of arr with itself. At the moment I am using a for loop such as:
size = np.shape(arr)
matrix = np.zeros((size[0]*size[1],size[1]))
for i in range(np.shape(arr)[0]):
    prod = np.outer(arr[i], arr[i].T)
    matrix[size[1]*i:size[1]+size[1]*i, :] = prod
Resulting:
matrix = array([[ 0.      ,  0.      ],
                [ 0.      ,  0.      ],
                [ 0.0625  , -0.03125 ],
                [-0.03125 ,  0.015625],
                [ 0.25    , -0.0625  ],
                [-0.0625  ,  0.015625]])
Is there any way to build this matrix without using a for loop (e.g. broadcasting)?
Extend the array to 3D with None/np.newaxis, keeping the first axis aligned while letting the second axis be pair-wise multiplied, perform the multiplication leveraging broadcasting, and reshape back to 2D -
matrix = (arr[:,None,:]*arr[:,:,None]).reshape(-1,arr.shape[1])
We can also use np.einsum -
matrix = np.einsum('ij,ik->ijk',arr,arr).reshape(-1,arr.shape[1])
The einsum string representation might be more intuitive, as it lets us visualize three things:
Axes that are aligned (axis=0 here).
Axes that are getting summed up (none here).
Axes that are kept i.e. element-wise multiplied (axis=1 here).
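Both vectorized versions reproduce the loop result; here is a quick equivalence check on the sample arr:
import numpy as np

arr = np.array([[0., 0.], [0.25, -0.125], [0.5, -0.125]])
m1 = (arr[:, None, :] * arr[:, :, None]).reshape(-1, arr.shape[1])
m2 = np.einsum('ij,ik->ijk', arr, arr).reshape(-1, arr.shape[1])
assert np.allclose(m1, m2)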

Python replacing max values in array

I have a 3D numpy array A of shape 10 x 5 x 3. I also have a vector B of length 3 (length of last axis of A). I want to compare each A[:,:,i] against B[i] where i = 0:2 and replace all values A[:,:,i] > B[i] with B[i].
Is there a way to achieve this without a for loop?
Edit: I tried the argmax across i = 0:2 using a for loop
You can use numpy.minimum to accomplish this. It returns the element-wise minimum between two arrays. If the arrays are different sizes (such as in your case), then the arrays are automatically broadcast to the correct size prior to comparison.
import numpy

A = numpy.random.rand(1, 2, 3)
# array([[[ 0.79188   ,  0.32707664,  0.18386629],
#         [ 0.4139146 ,  0.07259663,  0.47604274]]])
B = numpy.array([0.1, 0.2, 0.3])
C = numpy.minimum(A, B)
# array([[[ 0.1       ,  0.2       ,  0.18386629],
#         [ 0.1       ,  0.07259663,  0.3       ]]])
Or, as suggested by @Divakar, if you want to do in-place replacement:
numpy.minimum(A, B, out=A)
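A hedged sketch with the shapes from the question (the threshold values in B are made up for illustration):
import numpy

A = numpy.random.rand(10, 5, 3)   # shape from the question
B = numpy.array([0.3, 0.6, 0.9])  # illustrative thresholds, length 3
numpy.minimum(A, B, out=A)        # caps each A[:, :, i] at B[i] in place
assert A[..., 0].max() <= 0.3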

Quickest way to calculate the average growth rate across columns of a numpy array

Given an array such as:
import numpy as np
a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
What's the quickest way to calculate the growth rates of each row so that my results would be 0.52083333333333326, and 0.13640873015873009 respectively.
I tried using:
>>> np.nanmean(np.rate(1,0,-a[:-1],a[1:]), axis=0)
array([ 5. , 2.5 , 1.66666667, 1.25 , 1. ])
but of course it doesn't yield the right result and I don't know how to get the axis right for the numpy.rate function.
In [262]: a = np.array([[1,2,3,4,5],[6,7,8,9,10]]).astype(float)
In [263]: np.nanmean((a[:, 1:]/a[:, :-1]), axis=1) - 1
Out[263]: array([ 0.52083333, 0.13640873])
To take your approach using numpy.rate, you need to index into your a array properly (consider all rows separately) and use axis=1:
In [6]: np.nanmean(np.rate(1,0,-a[:,:-1],a[:,1:]), axis=1)
Out[6]: array([ 0.52083333, 0.13640873])
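One caveat worth knowing: the financial functions, including np.rate, were removed from NumPy in version 1.20 and now live in the separate numpy-financial package. A sketch of the same computation on a current NumPy, assuming numpy-financial is installed:
import numpy as np
import numpy_financial as npf  # pip install numpy-financial

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=float)
np.nanmean(npf.rate(1, 0, -a[:, :-1], a[:, 1:]), axis=1)
# array([0.52083333, 0.13640873])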
