Say I have a (3,3,3) array like this:
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
How do I get the 9 values corresponding to the Euclidean distance between each vector of 3 values and the matching vector in the zeroth 3x3 block?
Such as doing a numpy.linalg.norm([1,1,1] - [1,1,1]) 2 times, and then doing norm([0,0,0] - [0,0,0]), and then norm([2,2,2] - [1,1,1]) 2 times, norm([2,2,2] - [0,0,0]), then norm([3,3,3] - [1,1,1]) 2 times, and finally norm([1,1,1] - [0,0,0]).
Any good ways to vectorize this? I want to store the distances in a (3,3,1) matrix.
The result would be:
array([[[0. ],
[0. ],
[0. ]],
[[1.73],
[1.73],
[3.46]],
[[3.46],
[3.46],
[1.73]]])
The keepdims argument was added in NumPy 1.7; you can use it to keep the summed axis:
np.sum((x - [1, 1, 1])**2, axis=-1, keepdims=True)**0.5
the result is:
[[[ 0. ]
[ 0. ]
[ 0. ]]
[[ 1.73205081]
[ 1.73205081]
[ 1.73205081]]
[[ 3.46410162]
[ 3.46410162]
[ 0. ]]]
Edit: to measure against the zeroth block x[0] instead of a fixed [1, 1, 1]:
np.sum((x - x[0])**2, axis=-1, keepdims=True)**0.5
the result is:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
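The same distances can also be obtained from np.linalg.norm, which accepts axis and keepdims (keepdims was added to norm in NumPy 1.10). A minimal sketch, assuming x is the (3,3,3) array from the question:

```python
import numpy as np

x = np.array([[[1, 1, 1], [1, 1, 1], [0, 0, 0]],
              [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
              [[3, 3, 3], [3, 3, 3], [1, 1, 1]]])

# x[0] has shape (3, 3) and broadcasts against every block x[k]
distances = np.linalg.norm(x - x[0], axis=-1, keepdims=True)
print(distances.shape)  # (3, 3, 1)
```

Because the norm is taken over the last axis only, each of the 9 vectors is compared with the vector at the same row position in x[0].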
You might want to consider scipy.spatial.distance.cdist(), which efficiently computes distances between pairs of points in two collections of inputs (with a standard euclidean metric, among others). Here's example code:
import numpy as np
import scipy.spatial.distance as dist
i = np.array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
n,m,o = i.shape
# compute euclidean distances of each vector to the vectors in i[0]
# reshape input array to 2-D, as required by cdist
# only keep diagonal, as cdist computes all pairwise distances
# reshape result, adapting it to input array and required output
d = dist.cdist(i.reshape(n*m,o),i[0]).reshape(n,m,m).diagonal(axis1=1,axis2=2).reshape(n,m,1)
d holds:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
The big caveat of this approach is that it calculates n*m*m distances (n*m*o here, since m == o) when only n*m are needed, and that it involves a lot of reshaping.
I'm doing something similar: computing the sum of squared distances (SSD) for each pair of frames in a video volume. I think it could be helpful for you.
video_volume is a single 4d numpy array. This array should have dimensions
(time, rows, cols, 3) and dtype np.uint8.
Output is a square 2d numpy array of dtype float. output[i,j] should contain
the SSD between frames i and j.
video_volume = video_volume.astype(float)
size_t = video_volume.shape[0]
# np.float was removed in NumPy 1.24; the builtin float is equivalent
output = np.zeros((size_t, size_t), dtype=float)
for i in range(size_t):
    for j in range(size_t):
        # SSD between frame i and frame j
        output[i, j] = np.square(video_volume[i,:,:,:] - video_volume[j,:,:,:]).sum()
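The double loop can probably be replaced by a single matrix product, using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b on flattened frames. The sketch below is only an illustration: ssd_matrix is a hypothetical helper name, and the small random volume is made-up test data.

```python
import numpy as np

def ssd_matrix(video_volume):
    """Pairwise SSD between frames (hypothetical helper, not from the original post)."""
    t = video_volume.shape[0]
    frames = video_volume.reshape(t, -1).astype(float)  # one row per frame
    sq_norms = np.einsum('ij,ij->i', frames, frames)    # squared norm of each frame
    gram = frames @ frames.T                            # dot product of every frame pair
    ssd = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return np.maximum(ssd, 0.0)                         # clip tiny negatives from rounding

# Made-up example data: 4 frames of 5x6 RGB pixels
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(4, 5, 6, 3), dtype=np.uint8)
out = ssd_matrix(video)
print(out.shape)  # (4, 4)
```

This trades the Python-level loop for one (time, time) matrix product, at the cost of holding a flattened float copy of the whole volume in memory.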
Given a matrix of n samples with m features, I am calculating the covariance matrix of the features by hand, so it should be m by m. NumPy can do this directly; for a small example,
arr_x = np.array([[ 1, 4],
[ 4, 10],
[ 3, 6],
[ 5, 11],
[ 2, 4]])
print(np.cov(arr_x, rowvar=False))
>>> [[ 2.5 5. ]
[ 5. 11. ]]
I can get the same covariance matrix by doing
mean_x = np.mean(arr_x, 0)
np.dot(arr_x.T-mean_x[:, None],(arr_x.T-mean_x[:, None]).T)/(arr_x.shape[0] - 1)
>>> [[ 2.5, 5. ]
[ 5. , 11. ]]
Yet the hand-derived method does not give the same result as NumPy for a somewhat larger matrix:
a = np.random.randint(0, 255, (5000, 200))
mean_a = np.mean(a, 0)
by_hand = np.dot(a.T-mean_a[:, None],(a.T-mean_a[:, None]).T)/(a.shape[0] - 1)
numpy_cov = np.cov(a, rowvar=False)
print((by_hand == numpy_cov).all())
What am I doing wrong or is there better way to get a covariance matrix by hand?
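One likely explanation is floating-point rounding: the two computations perform their operations in a different order, so the results can differ in the last few bits and an exact == comparison fails even though the values agree for practical purposes. A sketch of a tolerance-based check with np.allclose, using a seeded generator as a stand-in for the random data above:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 255, size=(5000, 200))

centered = a - a.mean(axis=0)                      # subtract each feature's mean
by_hand = centered.T @ centered / (a.shape[0] - 1)
numpy_cov = np.cov(a, rowvar=False)

# Compare within a tolerance instead of requiring bit-identical floats
print(np.allclose(by_hand, numpy_cov))
```

If a stricter or looser comparison is needed, np.allclose takes rtol and atol arguments.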
I have an nx2 array of points represented as a ndarray. I want to index some of the elements (indices are given in a ndarray as well) of one of the two column vectors such that the output is a column vector. If however the index array contains only one index, a (1,)-shaped array should be returned.
I already tried the following things without success:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0, 1, 2])
points[index, [0]] -> array([0. , 1. , 2.5]) -> shape (3,)
points[[index], 0] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[[index], [0]] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[index, 0, np.newaxis] -> array([[0. ], [1. ], [2.5]]) -> shape(3, 1) # desired
np.newaxis works for this scenario; however, if the index array contains only one value, it does not deliver the right shape:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0])
points[index, 0, np.newaxis] -> array([[0.]]) -> shape (1, 1)
points[index, [0]] -> array([0.]) -> shape (1,) # desired
Is there a way to index the ndarray such that the output has shape (3,1) for the first example and (1,) for the second, without branching on the size of the index array?
Thanks in advance for your help!
In [329]: points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
...: index = np.array([0, 1, 2])
We can select 3 rows with:
In [330]: points[index,:]
Out[330]:
array([[0. , 1. ],
[1. , 1.5],
[2.5, 0.5]])
However if we select a column as well, the result is 1d, even if we use [0]. That's because the (3,) row index is broadcast against the (1,) column index, resulting in a (3,) result:
In [331]: points[index,0]
Out[331]: array([0. , 1. , 2.5])
In [332]: points[index,[0]]
Out[332]: array([0. , 1. , 2.5])
If we give the row index a (3,1) shape, the result is also (3,1):
In [333]: points[index[:,None],[0]]
Out[333]:
array([[0. ],
[1. ],
[2.5]])
In [334]: points[index[:,None],0]
Out[334]:
array([[0. ],
[1. ],
[2.5]])
We get the same thing if we use a row slice:
In [335]: points[0:3,[0]]
Out[335]:
array([[0. ],
[1. ],
[2.5]])
Using [index] doesn't help because it makes the row index (1,3) shape, resulting in a (1,3) result. Of course you could transpose it to get (3,1).
With a 1 element index:
In [336]: index1 = np.array([0])
In [337]: points[index1[:,None],0]
Out[337]: array([[0.]])
In [338]: _.shape
Out[338]: (1, 1)
In [339]: points[index1,0]
Out[339]: array([0.])
In [340]: _.shape
Out[340]: (1,)
If the row index was a scalar, as opposed to 1d:
In [341]: index1 = np.array(0)
In [342]: points[index1[:,None],0]
...
IndexError: too many indices for array
In [343]: points[index1[...,None],0] # use ... instead
Out[343]: array([0.])
In [344]: points[index1, 0] # scalar result
Out[344]: 0.0
I think handling the np.array([0]) case separately requires an if test. At least I can't think of a builtin numpy way of burying it.
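For completeness, that if test can be hidden in a small wrapper. This is only a sketch; select_column is a hypothetical name, not a NumPy function:

```python
import numpy as np

def select_column(points, index, col=0):
    """Hypothetical helper: (n, 1) for several indices, (1,) for a single one."""
    index = np.atleast_1d(index)
    selected = points[index, col]       # always 1-D, shape (n,)
    if index.size == 1:
        return selected                 # shape (1,)
    return selected[:, np.newaxis]      # shape (n, 1)

points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
print(select_column(points, np.array([0, 1, 2])).shape)  # (3, 1)
print(select_column(points, np.array([0])).shape)        # (1,)
```

The special case still exists; it just lives in one place instead of at every call site.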
I'm not certain I understand the wording in your question, but it seems as though you may be after the ndarray.swapaxes method (see https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.swapaxes.html#numpy.ndarray.swapaxes)
for your snippet:
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
swapped = points.swapaxes(0,1)
print(swapped)
gives
[[0. 1. 2.5 4. 5. ]
[1. 1.5 0.5 1. 2. ]]
I have this numpy array
matrix = np.array([[ 0.8, 0.2, 0.1],
[ 1. , 0. , 0. ],
[ 0. , 0. , 1. ]])
and I would like to get, for each row of matrix, the column indices ordered by decreasing value.
For example, this would be
np.array([[0, 1, 2], [0, 1, 2], [2, 0, 1]])
I know I could use np.argsort, but this doesn't seem to be returning the right output. I tried changing the axis to different values, but that doesn't help either.
Probably the easiest way to get your desired output would be:
(-matrix).argsort(axis=1)
# array([[0, 1, 2],
# [0, 1, 2],
# [2, 0, 1]])
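Rows 2 and 3 contain tied values, and the default sort algorithm does not promise any particular order for ties. If the leftmost tied index should reliably come first, passing kind='stable' is one option. A small sketch using the matrix from the question:

```python
import numpy as np

matrix = np.array([[0.8, 0.2, 0.1],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

# kind='stable' keeps tied values in their original left-to-right order
order = np.argsort(-matrix, axis=1, kind='stable')
print(order)
```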
I think np.argsort does seem to do the trick, you just need to make sure to flip the matrix horizontally to make it decreasing order:
>>> matrix = np.array(
...     [[ 0.8, 0.2, 0.1],
...      [ 1. , 0. , 0. ],
...      [ 0. , 0. , 1. ]])
>>> np.fliplr(np.argsort(matrix))
array([[0, 1, 2],
[0, 2, 1],
[2, 1, 0]])
This should be the right output unless you have any requirements for sorting ties. Right now the flipping would make the rightmost tie the first index. If you wanted to match your exact output, where the leftmost index is first you could do a bit of juggling:
# Flip the array first and get the indices
>>> flipped = np.argsort(np.fliplr(matrix))
# Subtract (width - 1) to map the flipped column indices back to the original ones
# Flip the array to be in descending order
>>> np.fliplr(abs(flipped - (flipped.shape[1] - 1)))
array([[0, 1, 2],
[0, 1, 2],
[2, 0, 1]])
I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]])
I'd like to have a copy of this matrix in which each negative number is replaced by its reciprocal:
>>> result
array([[-0.25, -0.5, 0],
[-0.2, 0, 4],
[ 0, 6, 5]])
First, since the desired array will contain floats, create the array with a float dtype. Otherwise, assigning the float results of the inversion back into an integer array would silently cast them to integers, truncating values such as -0.25 to 0. Second, build a boolean mask of the negative entries, use it to index them, and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
...: [-5, 0, 4],
...: [ 0, 6, 5]], dtype=float)
...:
...:
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
The where= parameter takes a boolean array that broadcasts against the inputs. Where it is True, the division is performed and written to out=. Where it is False, the value already in the out= array (here the original input) is left unchanged.
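Since the question asks for a copy rather than an in-place change, the same where=/out= idea can be applied to a float copy of the integer input. A minimal sketch, assuming source_matrix is the integer array from the question:

```python
import numpy as np

source_matrix = np.array([[-4, -2, 0],
                          [-5,  0, 4],
                          [ 0,  6, 5]])

# astype(float) returns a new float array, so source_matrix stays untouched
result = source_matrix.astype(float)
np.divide(1.0, result, out=result, where=result < 0)
print(result)
```

Only the negative entries are divided, so the zeros never reach the division and no divide-by-zero warning is raised.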
I have a 2D NumPy array containing label-value pairs. I have combined several of these matrices, and I'd like to round the labels to 4 decimal places and sum the values of rows that share a label, such that:
[[70.00103, 1],
[70.02474, 1],
[70.02474, 1],
[70.024751, 1],
[71.009100, 1],
[79.0152, 1],
[79.0152633, 1],
[79.0152634, 1]]
becomes
[[70.001, 1],
[70.0247, 2],
[70.0248, 1],
[71.0091, 1],
[79.0152, 1],
[79.0153, 2]]
Any thoughts on how one might accomplish this in a speedy manner, using either numpy or pandas? Thanks!
In [10]:
import numpy as np
x=np.array([[70.00103, 1],[70.02474, 1],[70.02474, 1],[70.024751, 1],[71.009100, 1],[79.0152, 1],[79.0152633, 1],[79.0152634,1]])
x[:,0]=x[:,0].round(4)
x
Out[10]:
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])
In [14]:
import pandas as pd
pd.DataFrame(x).groupby(0).sum()
Out[14]:
70.0010 1
70.0247 2
70.0248 1
71.0091 1
79.0152 1
79.0153 2
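If pandas is not available, the grouping step can be sketched in NumPy alone with np.unique and np.bincount. Turning the rounded labels into integer keys (label times 10**4) is a choice made here to avoid comparing floats for equality; it is not part of the answer above.

```python
import numpy as np

x = np.array([[70.00103, 1], [70.02474, 1], [70.02474, 1], [70.024751, 1],
              [71.009100, 1], [79.0152, 1], [79.0152633, 1], [79.0152634, 1]])

# Integer keys: label * 10**4, rounded to the nearest integer
keys = np.rint(x[:, 0] * 1e4).astype(np.int64)
unique_keys, inverse = np.unique(keys, return_inverse=True)

# Sum the values of all rows that map to the same key
sums = np.bincount(inverse.ravel(), weights=x[:, 1])

result = np.column_stack((unique_keys / 1e4, sums))
print(result)
```

np.unique returns the keys sorted, so the rows of result come out ordered by label.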
That's what np.around is for:
>>> A=np.array([[70.00103, 1],
... [70.02474, 1],
... [70.02474, 1],
... [70.024751, 1],
... [71.009100, 1],
... [79.0152, 1],
... [79.0152633, 1],
... [79.0152634, 1]])
>>>
>>> np.around(A, decimals=4)
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])