Losing decimal when doing array operation in Python

I am writing a function that divides each column by its column sum, and this is what I came up with:
A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
print(A)
A = A.T
Asum = A.sum(axis=1)
print(Asum)
for i in range(len(Asum)):
A[:,i] = A[:,i]/Asum[i]
I was expecting a matrix of decimals, but the values are automatically converted to integers and I get a zero matrix. Where did I go wrong?

Replace:
Asum = A.sum(axis=1)
with:
Asum = A.sum(axis=0)
to get the column-by-column sum.
You can then do the division directly with numpy.divide:
np.divide(A, Asum)
#array([[0.1, 0.1, 0.1],
# [0.2, 0.2, 0.2],
# [0.3, 0.3, 0.3],
# [0.4, 0.4, 0.4]])
Or simply with:
A/Asum

Your A is integer dtype; assigned floats get truncated. If A started as a float array your iteration would work. But you don't need to iterate to perform this calculation:
In [108]: A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]]).T
In [109]: A
Out[109]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
In [110]: Asum = A.sum(axis=1)
In [111]: Asum
Out[111]: array([ 3, 6, 9, 12])
A is (4,3), Asum is (4,). If we make it (4,1):
In [114]: Asum[:,None]
Out[114]:
array([[ 3],
[ 6],
[ 9],
[12]])
we can perform the divide without iteration (review broadcasting if necessary):
In [115]: A/Asum[:,None]
Out[115]:
array([[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333]])
sum has a keepdims parameter that makes this kind of calculation easier:
In [117]: Asum = A.sum(axis=1, keepdims=True)
In [118]: Asum
Out[118]:
array([[ 3],
[ 6],
[ 9],
[12]])
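To make the truncation concrete, here is a minimal sketch (the names truncated, A_float and col_normalized are only for illustration). It shows that assigning float results into an integer array silently drops the fractional part, and that converting to float first, together with keepdims, gives the column-normalized matrix without a loop:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]).T  # shape (4, 3), integer dtype

# Assigning float results into an integer array truncates them.
truncated = A.copy()
truncated[:, 0] = truncated[:, 0] / truncated[:, 0].sum()
print(truncated[:, 0])  # the fractions 0.1 .. 0.4 are all stored as 0

# Converting to float first keeps the fractions; keepdims keeps the sum broadcastable.
A_float = A.astype(float)
col_normalized = A_float / A_float.sum(axis=0, keepdims=True)
print(col_normalized)
```

Each column of col_normalized should sum to 1.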

Related

How to index elements from a column of a ndarray such that the output is a column vector?

I have an nx2 array of points represented as an ndarray. I want to index some of the elements (the indices are given in an ndarray as well) of one of the two column vectors such that the output is a column vector. If, however, the index array contains only one index, a (1,)-shaped array should be returned.
I already tried the following things without success:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0, 1, 2])
points[index, [0]] -> array([0. , 1. , 2.5]) -> shape (3,)
points[[index], 0] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[[index], [0]] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[index, 0, np.newaxis] -> array([[0. ], [1. ], [2.5]]) -> shape(3, 1) # desired
np.newaxis works for this scenario. However, if the index array contains only one value, it does not deliver the right shape:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0])
points[index, 0, np.newaxis] -> array([[0.]]) -> shape (1, 1)
points[index, [0]] -> array([0.]) -> shape (1,) # desired
Is there a way to index the ndarray such that the output has shape (3,1) for the first example and (1,) for the second, without distinguishing cases based on the size of the index array?
Thanks in advance for your help!
In [329]: points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
...: index = np.array([0, 1, 2])
We can select 3 rows with:
In [330]: points[index,:]
Out[330]:
array([[0. , 1. ],
[1. , 1.5],
[2.5, 0.5]])
However if we select a column as well, the result is 1d, even if we use [0]. That's because the (3,) row index is broadcast against the (1,) column index, resulting in a (3,) result:
In [331]: points[index,0]
Out[331]: array([0. , 1. , 2.5])
In [332]: points[index,[0]]
Out[332]: array([0. , 1. , 2.5])
If we give the row index a (3,1) shape, the result is also (3,1):
In [333]: points[index[:,None],[0]]
Out[333]:
array([[0. ],
[1. ],
[2.5]])
In [334]: points[index[:,None],0]
Out[334]:
array([[0. ],
[1. ],
[2.5]])
We get the same thing if we use a row slice:
In [335]: points[0:3,[0]]
Out[335]:
array([[0. ],
[1. ],
[2.5]])
Using [index] doesn't help because it makes the row index (1,3) shape, resulting in a (1,3) result. Of course you could transpose it to get (3,1).
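As a rough sketch of the broadcasting rule described here: np.ix_ does not appear in the session above, but it is one standard NumPy way to build index arrays of shape (3,1) and (1,1), so the broadcast result comes out as (3,1).

```python
import numpy as np

points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0, 1, 2])

# np.ix_ reshapes the row index to (3, 1) and the column index to (1, 1),
# so indexing with them broadcasts to a (3, 1) result.
rows, cols = np.ix_(index, [0])
print(rows.shape, cols.shape)
column = points[rows, cols]
print(column.shape)
```

Like index[:,None], this still gives a (1,1) result for a one-element index.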
With a 1 element index:
In [336]: index1 = np.array([0])
In [337]: points[index1[:,None],0]
Out[337]: array([[0.]])
In [338]: _.shape
Out[338]: (1, 1)
In [339]: points[index1,0]
Out[339]: array([0.])
In [340]: _.shape
Out[340]: (1,)
If the row index was a scalar, as opposed to 1d:
In [341]: index1 = np.array(0)
In [342]: points[index1[:,None],0]
...
IndexError: too many indices for array
In [343]: points[index1[...,None],0] # use ... instead
Out[343]: array([0.])
In [344]: points[index1, 0] # scalar result
Out[344]: 0.0
I think handling the np.array([0]) case separately requires an if test. At least I can't think of a builtin numpy way of burying it.
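If an explicit test is acceptable, it can at least be hidden in a small helper. This is only a sketch; select_column is a made-up name, not a NumPy function, and it assumes the index is an integer scalar or 1d array:

```python
import numpy as np

def select_column(points, index, col=0):
    """Hypothetical helper: (n, 1) result for several indices, (1,) for a single one."""
    index = np.atleast_1d(index)       # also accepts a scalar index
    selected = points[index, col]      # always 1d, shape (len(index),)
    if selected.size == 1:
        return selected                # shape (1,)
    return selected[:, np.newaxis]     # shape (n, 1)

points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
print(select_column(points, np.array([0, 1, 2])).shape)
print(select_column(points, np.array([0])).shape)
```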
I'm not certain I understand the wording in your question, but it seems as though you may be after the ndarray.swapaxes method (see https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.swapaxes.html#numpy.ndarray.swapaxes)
For your snippet:
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
swapped = points.swapaxes(0,1)
print(swapped)
gives
[[0. 1. 2.5 4. 5. ]
[1. 1.5 0.5 1. 2. ]]

Appending 2x2 covariance matrices in numpy

I have a numpy array such as:
gmm.sigma =
[[[ 4.64 -1.93]
[-1.93 6.5 ]]
[[ 3.65 2.89]
[ 2.89 -1.26]]]
and I want to add another 2x2 matrix such as:
gauss.sigma=
[[ -1.24 2.34]
[ 2.34 4.76]]
to get:
gmm.sigma =
[[[ 4.64 -1.93]
[-1.93 6.5 ]]
[[ 3.65 2.89]
[ 2.89 -1.26]]
[[-1.24 2.34]
[ 2.34 4.76]]]
I have tried: gmm.sigma = np.append(gmm.sigma, gauss.sigma, axis = 0),
but get this error:
Traceback (most recent call last):
File "test1.py", line 40, in <module>
gmm.sigma = np.append(gmm.sigma, gauss.sigma, axis = 0)
File "/home/rowan/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4528, in append
return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions
Any help is appreciated
Looks like you want to join the 2 arrays on the first axis - except that the second is only 2d. It needs an added dimension:
In [233]: arr = np.arange(8).reshape(2,2,2)
In [234]: arr1 = np.arange(10,14).reshape(2,2)
In [235]: np.concatenate((arr, arr1[None,:,:]), axis=0)
Out[235]:
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[10, 11],
[12, 13]]])
dstack is a variation on concatenate that expands everything to 3d, and joins on the last axis. To use it we have to transpose everything:
In [236]: np.dstack((arr.T,arr1.T)).T
Out[236]:
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[10, 11],
[12, 13]]])
index_tricks adds some classes that play similar tricks with dimensions:
In [241]: np.r_['0,3', arr, arr1]
Out[241]:
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[10, 11],
[12, 13]]])
The docs of np.r_ require some reading if you want to get the most from it, but it might be worth using if you have to adjust the dimensions of several arrays, e.g. np.r_['0,3', arr1, arr, arr1]
You can use dstack, which stacks arrays in sequence depth-wise (along the third axis), followed by a transpose. To get the desired output, you have to stack gmm.T and gauss (strictly gauss.T, but gauss is symmetric here, so the transpose makes no difference):
gmm = np.array([[[4.64, -1.93],
[-1.93, 6.5 ]],
[[3.65, 2.89],
[2.89, -1.26]]])
gauss = np.array([[ -1.24, 2.34],
[2.34, 4.76]])
result = np.dstack((gmm.T, gauss)).T
print (result)
print (result.shape)
# (3, 2, 2)
Output
array([[[ 4.64, -1.93],
[-1.93, 6.5 ]],
[[ 3.65, 2.89],
[ 2.89, -1.26]],
[[-1.24, 2.34],
[ 2.34, 4.76]]])
Alternatively you can also use concatenate by properly reshaping your second array as
gmm = np.array([[[4.64, -1.93],
[-1.93, 6.5 ]],
[[3.65, 2.89],
[2.89, -1.26]]])
gauss = np.array([[ -1.24, 2.34],
[2.34, 4.76]]).reshape(1,2,2)
result = np.concatenate((gmm, gauss), axis=0)
As the error message states, gmm_sigma and gauss_sigma do not have the same number of dimensions, so you should reshape gauss_sigma before appending.
gmm_sigma = np.array([[[4.64, -1.93], [-1.93, 6.5]], [[3.65, 2.89], [ 2.89, -1.26]]])
gauss_sigma = np.array([[-1.24, 2.34], [2.34, 4.76]])
print(np.append(gmm_sigma, gauss_sigma.reshape(1, 2, 2), axis=0))
# array([[[ 4.64, -1.93],
# [-1.93, 6.5 ]],
#
# [[ 3.65, 2.89],
# [ 2.89, -1.26]],
#
# [[-1.24, 2.34],
# [ 2.34, 4.76]]])
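The answers above all boil down to giving the 2x2 matrix a leading axis of length 1. A minimal sketch of the same idea with np.newaxis, plus a variant that collects the matrices in a list and stacks once (np.stack is my suggestion here, not something the answers use):

```python
import numpy as np

gmm_sigma = np.array([[[4.64, -1.93], [-1.93, 6.5]],
                      [[3.65, 2.89], [2.89, -1.26]]])
gauss_sigma = np.array([[-1.24, 2.34], [2.34, 4.76]])

# np.newaxis adds the missing leading dimension: (2, 2) -> (1, 2, 2)
appended = np.concatenate((gmm_sigma, gauss_sigma[np.newaxis]), axis=0)
print(appended.shape)

# If matrices are added one at a time, keeping them in a list and
# stacking once avoids copying the whole array on every append.
matrices = list(gmm_sigma) + [gauss_sigma]
stacked = np.stack(matrices, axis=0)
print(stacked.shape)
```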

How to invert only negative elements in numpy matrix?

I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]])
I'd like to have a copy of this matrix with inverted negatives:
>>> result
array([[-0.25, -0.5, 0],
[-0.2, 0, 4],
[ 0, 6, 5]])
First, since the desired array will contain floats, you need to set the array's dtype to float at creation time. Otherwise the float results of the inverted sub-array are automatically cast to integers (truncated) when you assign them into the integer array. Second, you need to find the negative numbers in your array, use boolean indexing to select them, and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
...: [-5, 0, 4],
...: [ 0, 6, 5]], dtype=float)
...:
...:
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
The where= parameter takes a boolean array that broadcasts against the inputs. Where it evaluates to True, the divide is performed. Where it evaluates to False, the value already present in the array passed via out= is left unchanged.
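Since the question asks for a copy, here is a small sketch that leaves source_matrix untouched. It assumes astype(float) is an acceptable way to make the float copy; the where/out usage is the same as in the answer above:

```python
import numpy as np

source_matrix = np.array([[-4, -2, 0],
                          [-5, 0, 4],
                          [0, 6, 5]])

# astype(float) returns a new float array, so the integer source is not modified.
result = source_matrix.astype(float)

# Invert only the negative entries, writing into the copy in place.
np.divide(1.0, result, out=result, where=result < 0)
print(result)
```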

Scipy: Calculation of standardized euclidean via cdist

The formula is available in the docs and pointed to in this answer. However when I'm trying to apply it I'm not getting a matching answer. I'm sure there's some silly mistake I'm making somewhere so thanks for bearing with me:
Setup
Say I have 2 matrices:
X: array([[0, 1, 0],
[1, 1, 1]])
X2: array([[1, 1, 0],
[1, 1, 1],
[1, 2, 0]])
Now applying Xans = scipy.spatial.distance.cdist(X, X2, 'seuclidean') gives:
Xans: array([[2.23606798, 2.88675135, 3.16227766],
[1.82574186, 0. , 2.88675135]])
Let's just focus on Xans[0][0] = 2.23606798, which should have been obtained by applying seuclidean(X[0], X2[0]).
Method 1: Using pdist
I tried doing this via pdist but get a NaN:
In [104]: scipy.spatial.distance.pdist([X[0], X2[0]], metric='seuclidean')
Out[104]: array([nan])
Why is this happening?
Method 2: Direct Formula Application
I tried manually using the formula linked in the answer above as follows:
In [107]: (((X[0] - X2[0])**2).sum()/(np.var([X[0], X2[0]])))**0.5
Out[107]: 2.0
As can be seen this is giving 2.0?
I'm clearly doing something very wrong - What is it?
The standardized Euclidean distance weights each variable with a separate variance. If you don't provide the variances with the V argument, it computes them from the input array. This is mentioned in the pdist docstring in the "Parameters" section under **kwargs, where it shows:
V : ndarray
The variance vector for standardized Euclidean.
Default: var(X, axis=0, ddof=1)
For example:
In [39]: A
Out[39]:
array([[3, 0, 2],
[2, 1, 2],
[0, 0, 1],
[3, 1, 2],
[1, 0, 0]])
In [40]: from scipy.spatial.distance import pdist
In [41]: pdist(A, metric='seuclidean')
Out[41]:
array([ 1.98029509, 2.55814731, 1.82574186, 2.71163072, 2.63368079,
0.76696499, 2.9868995 , 3.14284123, 1.35581536, 3.26898677])
We get the same result if we provide the variances computed as explained in the docstring:
In [42]: pdist(A, metric='seuclidean', V=np.var(A, axis=0, ddof=1))
Out[42]:
array([ 1.98029509, 2.55814731, 1.82574186, 2.71163072, 2.63368079,
0.76696499, 2.9868995 , 3.14284123, 1.35581536, 3.26898677])
Of course, if you provide variances that are all 1, you get the regular Euclidean distance:
In [43]: pdist(A, metric='seuclidean', V=np.ones(A.shape[1]))
Out[43]:
array([ 1.41421356, 3.16227766, 1. , 2.82842712, 2.44948974,
1. , 2.44948974, 3.31662479, 1.41421356, 3. ])
In [44]: pdist(A, metric='euclidean')
Out[44]:
array([ 1.41421356, 3.16227766, 1. , 2.82842712, 2.44948974,
1. , 2.44948974, 3.31662479, 1.41421356, 3. ])
The problem with your "Method 1" is that in your input array of just two points (i.e. [X[0], X2[0]]), the second and third components of the points don't change, so the variance associated with those components is 0:
In [45]: p = np.array([X[0], X2[0]])
In [46]: p
Out[46]:
array([[0, 1, 0],
[1, 1, 0]])
In [47]: np.var(p, axis=0, ddof=1)
Out[47]: array([ 0.5, 0. , 0. ])
When the code for the seuclidean divides by these variances, the result is either infinity or NaN--the latter if the numerator is also 0, which is the case in the third component of the input [X[0], X2[0]].
To work around this, you have to decide how you want to handle the case where the variance of a component is 0, and handle it explicitly. For example, if you want it to act like that variance is 1 in that case (just to avoid dividing by 0) you could do something like the following.
Suppose B is our array of points. The third column of B is all 1s.
In [63]: B
Out[63]:
array([[3, 0, 1],
[2, 1, 1],
[0, 0, 1],
[3, 1, 1],
[1, 0, 1]])
Compute the variances of the columns:
In [64]: V = np.var(B, axis=0, ddof=1)
In [65]: V
Out[65]: array([ 1.7, 0.3, 0. ])
Replace the variances that are 0 with 1:
In [66]: V[V == 0] = 1
In [67]: V
Out[67]: array([ 1.7, 0.3, 1. ])
Use V to compute the standardized Euclidean distances:
In [68]: pdist(B, metric='seuclidean', V=V)
Out[68]:
array([ 1.98029509, 2.30089497, 1.82574186, 1.53392998, 2.38459106,
0.76696499, 1.98029509, 2.93725228, 0.76696499, 2.38459106])
This has the same effect as simply removing the constant column:
In [69]: pdist(B[:, :2], metric='seuclidean')
Out[69]:
array([ 1.98029509, 2.30089497, 1.82574186, 1.53392998, 2.38459106,
0.76696499, 1.98029509, 2.93725228, 0.76696499, 2.38459106])
Your "Method 2" is wrong because your formula is wrong. You have to keep the variances for each component. np.var([X[0], X2[0]]) computes the (single) variance of all the values in the input. Instead, you need to use the axis and ddof arguments shown above.
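As a sketch of the per-component formula, the cdist numbers from the question can be reproduced with plain NumPy. This assumes cdist computes its default variances from both inputs stacked together, i.e. var(vstack([XA, XB]), axis=0, ddof=1); that assumption is consistent with the Xans values quoted in the question.

```python
import numpy as np

X = np.array([[0, 1, 0],
              [1, 1, 1]])
X2 = np.array([[1, 1, 0],
               [1, 1, 1],
               [1, 2, 0]])

# One variance per component (column), computed over all points from both inputs.
V = np.var(np.vstack([X, X2]), axis=0, ddof=1)

# Pairwise differences via broadcasting: shape (2, 3, 3).
diff = X[:, None, :] - X2[None, :, :]

# Standardized Euclidean distance: sqrt(sum((u_i - v_i)**2 / V_i)).
manual = np.sqrt((diff**2 / V).sum(axis=-1))
print(manual)
```

With these inputs, manual[0, 0] is sqrt(1 / 0.2) = sqrt(5), which matches Xans[0][0] = 2.23606798.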

How to get euclidean distance on a 3x3x3 array in numpy

say I have a (3,3,3) array like this.
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
How do I get the 9 values corresponding to euclidean distance between each vector of 3 values and the zeroth values?
Such as doing a numpy.linalg.norm([1,1,1] - [1,1,1]) 2 times, and then doing norm([0,0,0] - [0,0,0]), and then norm([2,2,2] - [1,1,1]) 2 times, norm([2,2,2] - [0,0,0]), then norm([3,3,3] - [1,1,1]) 2 times, and finally norm([1,1,1] - [0,0,0]).
Any good ways to vectorize this? I want to store the distances in a (3,3,1) matrix.
The result would be:
array([[[0. ],
[0. ],
[0. ]],
[[1.73],
[1.73],
[3.46]],
[[3.46],
[3.46],
[1.73]]])
The keepdims argument was added in NumPy 1.7; you can use it to keep the summed axis:
np.sum((x - [1, 1, 1])**2, axis=-1, keepdims=True)**0.5
the result is:
[[[ 0. ]
[ 0. ]
[ 0. ]]
[[ 1.73205081]
[ 1.73205081]
[ 1.73205081]]
[[ 3.46410162]
[ 3.46410162]
[ 0. ]]]
Edit
np.sum((x - x[0])**2, axis=-1, keepdims=True)**0.5
the result is:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
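The same computation can also be written with np.linalg.norm, which accepts axis and keepdims as well. A minimal sketch, assuming x is the (3,3,3) array from the question:

```python
import numpy as np

x = np.array([[[1, 1, 1], [1, 1, 1], [0, 0, 0]],
              [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
              [[3, 3, 3], [3, 3, 3], [1, 1, 1]]])

# x[0] has shape (3, 3) and broadcasts against every block of x;
# the norm is taken over the last axis and that axis is kept with length 1.
distances = np.linalg.norm(x - x[0], axis=-1, keepdims=True)
print(distances.shape)
print(distances)
```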
You might want to consider scipy.spatial.distance.cdist(), which efficiently computes distances between pairs of points in two collections of inputs (with a standard euclidean metric, among others). Here's example code:
import numpy as np
import scipy.spatial.distance as dist
i = np.array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
n,m,o = i.shape
# compute euclidean distances of each vector to the vectors in i[0]
# reshape input array to 2-D, as required by cdist
# only keep diagonal, as cdist computes all pairwise distances
# reshape result, adapting it to input array and required output
d = dist.cdist(i.reshape(n*m,o),i[0]).reshape(n,m,o).diagonal(axis1=2).reshape(n,m,1)
d holds:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
The big caveat of this approach is that we're calculating n*m*m distances when we only need n*m, and that it involves a lot of reshaping. The reshape to (n,m,o) also relies on m == o, as is the case in this example.
I'm doing something similar: computing the sum of squared distances (SSD) for each pair of frames in a video volume. I think it could be helpful for you.
video_volume is a single 4d numpy array. This array should have dimensions
(time, rows, cols, 3) and dtype np.uint8.
Output is a square 2d numpy array of dtype float. output[i,j] should contain
the SSD between frames i and j.
video_volume = video_volume.astype(float)
size_t = video_volume.shape[0]
output = np.zeros((size_t, size_t), dtype=float)
for i in range(size_t):
for j in range(size_t):
output[i, j] = np.square(video_volume[i,:,:,:] - video_volume[j,:,:,:]).sum()
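The double loop can also be vectorized. The sketch below is not part of the answer above: it uses the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b on flattened frames, and the tiny random video_volume is made up purely so the snippet runs on its own.

```python
import numpy as np

# Made-up tiny clip: 4 frames of 5x6 RGB pixels, dtype uint8.
rng = np.random.default_rng(0)
video_volume = rng.integers(0, 256, size=(4, 5, 6, 3), dtype=np.uint8)

# Flatten each frame to one row and convert to float to avoid uint8 overflow.
frames = video_volume.reshape(video_volume.shape[0], -1).astype(float)

# |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, evaluated for all frame pairs at once.
sq_norms = (frames**2).sum(axis=1)
ssd = sq_norms[:, None] + sq_norms[None, :] - 2.0 * frames @ frames.T
print(ssd.shape)
```

For large float inputs this form can lose a little precision compared with subtracting first, so it is worth checking against the loop on a small sample.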
