Weird behavior when squaring elements in numpy array - python

I have two numpy arrays of shape (1, 250000):
a = [[ 0 254 1 ..., 255 0 1]]
b = [[ 1 0 252 ..., 0 255 255]]
I want to create a new numpy array whose elements are the square roots of the sums of the squares of the corresponding elements of a and b, but I am not getting the correct result:
>>> c = np.sqrt(np.square(a)+np.square(b))
>>> print c
[[ 1. 2. 4.12310553 ..., 1. 1. 1.41421354]]
Am I missing something simple here?

Presumably your arrays a and b are arrays of unsigned 8-bit integers; you can check by inspecting the attribute a.dtype. When you square them, the data type is preserved, and the 8-bit values overflow, which means the values "wrap around" (i.e. the squared values are reduced modulo 256):
In [7]: a = np.array([[0, 254, 1, 255, 0, 1]], dtype=np.uint8)
In [8]: np.square(a)
Out[8]: array([[0, 4, 1, 1, 0, 1]], dtype=uint8)
In [9]: b = np.array([[1, 0, 252, 0, 255, 255]], dtype=np.uint8)
In [10]: np.square(a) + np.square(b)
Out[10]: array([[ 1, 4, 17, 1, 1, 2]], dtype=uint8)
In [11]: np.sqrt(np.square(a) + np.square(b))
Out[11]:
array([[ 1. , 2. , 4.12310553, 1. , 1. ,
1.41421354]], dtype=float32)
To avoid the problem, you can tell np.square to use a floating point data type:
In [15]: np.sqrt(np.square(a, dtype=np.float64) + np.square(b, dtype=np.float64))
Out[15]:
array([[ 1. , 254. , 252.00198412, 255. ,
255. , 255.00196078]])
You could also use the function numpy.hypot, but you might still want to pass the dtype argument; otherwise, for uint8 inputs, the default result type is np.float16:
In [16]: np.hypot(a, b)
Out[16]: array([[ 1., 254., 252., 255., 255., 255.]], dtype=float16)
In [17]: np.hypot(a, b, dtype=np.float64)
Out[17]:
array([[ 1. , 254. , 252.00198412, 255. ,
255. , 255.00196078]])
You might wonder why the dtype argument that I used in numpy.square and numpy.hypot is not shown in the functions' docstrings. Both of these functions are numpy "ufuncs", and the authors of numpy decided that it was better to show only the main arguments in the docstring. The optional arguments are documented in the reference manual.
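Another option, a minimal sketch using only standard NumPy casting, is to convert the arrays to floating point up front with astype:
import numpy as np

a = np.array([[0, 254, 1, 255, 0, 1]], dtype=np.uint8)
b = np.array([[1, 0, 252, 0, 255, 255]], dtype=np.uint8)

# Cast to float64 first so the squares cannot wrap around modulo 256.
c = np.sqrt(a.astype(np.float64)**2 + b.astype(np.float64)**2)
# c -> [[  1.  254.  252.00198412  255.  255.  255.00196078]]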

For this simple case it works fine, because np.array gives these small literals NumPy's default integer dtype, which has plenty of room and does not overflow:
In [1]: a = np.array([[ 0, 2, 4, 6, 8]])
In [2]: b = np.array([[ 1, 3, 5, 7, 9]])
In [3]: c = np.sqrt(np.square(a) + np.square(b))
In [4]: print(c)
[[ 1. 3.60555128 6.40312424 9.21954446 12.04159458]]
If you are not getting the correct result, the first thing to check is your arrays' dtype, as sketched below.
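A quick diagnostic (note that the default integer type is platform-dependent, e.g. int64 on most 64-bit Linux builds):
>>> a.dtype      # uint8 here would explain the wrap-around described above
dtype('int64')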

Related

Numpy - Declare a specific nx1 array

I'm using numpy in Python in order to create an nx1 matrix. I want the 1st element of the matrix to be 3, the 2nd to be -1, then the (n-1)-th element to be -1 again, and the n-th element to be 3. All the in-between elements, i.e. from element 3 to element n-2, should be 0. In other words, the column vector should look like [3, -1, 0, ..., 0, -1, 3].
I'm fairly new to Python and numpy, but it seems like a great tool for managing matrices. What I've tried so far is creating the nx1 array (giving n some value) and initializing it to 0.
import numpy as np
n = 100
I = np.arange(n)
matrix = np.row_stack(0*I)
print("\Matrix is \n",matrix)
Any clues to how i proceed? Or what routine to use ?
Probably the simplest way is to just do the following:
import numpy as np
n = 10
a = np.zeros(n)
a[0] = 3
a[1] = -1
a[-1] = 3    # negative indices count from the end, same as a[len(a)-1]
a[-2] = -1
print(a)
# output: [ 3. -1.  0.  0.  0.  0.  0.  0. -1.  3.]
Hope this helps ;)
In [97]: n=10
In [98]: arr = np.zeros(n,int)
In [99]: arr[[0,-1]]=3; arr[[1,-2]]=-1
In [100]: arr
Out[100]: array([ 3, -1, 0, 0, 0, 0, 0, 0, -1, 3])
Easily changed to (n,1):
In [101]: arr[:,None]
Out[101]:
array([[ 3],
[-1],
[ 0],
[ 0],
[ 0],
[ 0],
[ 0],
[ 0],
[-1],
[ 3]])
I guess something that works is:
import numpy as np
n = 100
I = np.arange(n)
matrix = np.row_stack(0*I)
matrix[0]=3
matrix[1]=-1
matrix[n-2]=-1
matrix[n-1]=3
print("\Matrix is \n",matrix)

How to invert only negative elements in numpy matrix?

I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]])
I'd like to have a copy of this matrix with the negatives inverted:
>>> result
array([[-0.25, -0.5, 0],
[-0.2, 0, 4],
[ 0, 6, 5]])
Firstly, since your desired array is going to contain floats, you need to make the array's dtype float at creation time. The reason is that if you assign the float results of the inversion into an integer array, they'll automatically be cast back to integers (truncated). Secondly, you need to find the negative numbers in your array, grab them with simple boolean indexing, and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
...: [-5, 0, 4],
...: [ 0, 6, 5]], dtype=float)
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
The where= parameter is passed an array of the same shape as your inputs. Where it evaluates to True, the division is performed. Where it evaluates to False, the corresponding element of the out= array is left unchanged, so the original values pass through into the result. (This is also why out= matters here: without it, the untouched elements would be uninitialized memory.)
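If you'd rather not modify the array in place, a small sketch of a non-mutating variant: copy first, then apply the same where/out pattern (source_matrix is just the example array from the question).
import numpy as np

source_matrix = np.array([[-4, -2, 0],
                          [-5, 0, 4],
                          [ 0, 6, 5]], dtype=float)

result = source_matrix.copy()   # leave the original untouched
np.true_divide(1, result, out=result, where=result < 0)
# result now holds the inverted negatives; source_matrix is unchanged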

Scipy: Calculation of standardized euclidean via cdist

The formula is available in the docs and pointed to in this answer. However, when I try to apply it, I don't get a matching result. I'm sure there's some silly mistake I'm making somewhere, so thanks for bearing with me:
Setup
Say I have 2 matrices:
X: array([[0, 1, 0],
[1, 1, 1]])
X2: array([[1, 1, 0],
[1, 1, 1],
[1, 2, 0]])
Now applying Xans = scipy.spatial.distance.cdist(X, X2, 'seuclidean') gives:
Xans: array([[2.23606798, 2.88675135, 3.16227766],
[1.82574186, 0. , 2.88675135]])
Let's just focus on Xans[0][0] = 2.23606798, which should have been obtained by applying seuclidean(X[0], X2[0]).
Method 1: Using pdist
I tried doing this via pdist but get a NaN:
In [104]: scipy.spatial.distance.pdist([X[0], X2[0]], metric='seuclidean')
Out[104]: array([nan])
Why is this happening?
Method 2: Direct Formula Application
I tried manually using the formula linked in the answer above as follows:
In [107]: (((X[0] - X2[0])**2).sum()/(np.var([X[0], X2[0]])))**0.5
Out[107]: 2.0
As can be seen, this gives 2.0 instead of 2.23606798.
I'm clearly doing something very wrong - What is it?
The standardized Euclidean distance weights each variable with a separate variance. If you don't provide the variances with the V argument, it computes them from the input array. This is mentioned in the pdist docstring in the "Parameters" section under **kwargs, where it shows:
V : ndarray
The variance vector for standardized Euclidean.
Default: var(X, axis=0, ddof=1)
For example:
In [39]: A
Out[39]:
array([[3, 0, 2],
[2, 1, 2],
[0, 0, 1],
[3, 1, 2],
[1, 0, 0]])
In [40]: from scipy.spatial.distance import pdist
In [41]: pdist(A, metric='seuclidean')
Out[41]:
array([ 1.98029509, 2.55814731, 1.82574186, 2.71163072, 2.63368079,
0.76696499, 2.9868995 , 3.14284123, 1.35581536, 3.26898677])
We get the same result if we provide the variances computed as explained in the docstring:
In [42]: pdist(A, metric='seuclidean', V=np.var(A, axis=0, ddof=1))
Out[42]:
array([ 1.98029509, 2.55814731, 1.82574186, 2.71163072, 2.63368079,
0.76696499, 2.9868995 , 3.14284123, 1.35581536, 3.26898677])
Of course, if you provide variances that are all 1, you get the regular Euclidean distance:
In [43]: pdist(A, metric='seuclidean', V=np.ones(A.shape[1]))
Out[43]:
array([ 1.41421356, 3.16227766, 1. , 2.82842712, 2.44948974,
1. , 2.44948974, 3.31662479, 1.41421356, 3. ])
In [44]: pdist(A, metric='euclidean')
Out[44]:
array([ 1.41421356, 3.16227766, 1. , 2.82842712, 2.44948974,
1. , 2.44948974, 3.31662479, 1.41421356, 3. ])
The problem with your "Method 1" is that in your input array of just two points (i.e. [X[0], X2[0]]), the second and third components of the points don't change, so the variance associated with those components is 0:
In [45]: p = np.array([X[0], X2[0]])
In [46]: p
Out[46]:
array([[0, 1, 0],
[1, 1, 0]])
In [47]: np.var(p, axis=0, ddof=1)
Out[47]: array([ 0.5, 0. , 0. ])
When the code for the seuclidean metric divides by these variances, the result is either infinity or NaN; it is NaN when the numerator is also 0, which is the case for the second and third components of [X[0], X2[0]].
To work around this, you have to decide how you want to handle the case where the variance of a component is 0, and handle it explicitly. For example, if you want it to act like that variance is 1 in that case (just to avoid dividing by 0) you could do something like the following.
Suppose B is our array of points. The third column of B is all 1s.
In [63]: B
Out[63]:
array([[3, 0, 1],
[2, 1, 1],
[0, 0, 1],
[3, 1, 1],
[1, 0, 1]])
Compute the variances of the columns:
In [64]: V = np.var(B, axis=0, ddof=1)
In [65]: V
Out[65]: array([ 1.7, 0.3, 0. ])
Replace the variances that are 0 with 1:
In [66]: V[V == 0] = 1
In [67]: V
Out[67]: array([ 1.7, 0.3, 1. ])
Use V to compute the standardized Euclidean distances:
In [68]: pdist(B, metric='seuclidean', V=V)
Out[68]:
array([ 1.98029509, 2.30089497, 1.82574186, 1.53392998, 2.38459106,
0.76696499, 1.98029509, 2.93725228, 0.76696499, 2.38459106])
This has the same effect as simply removing the constant column:
In [69]: pdist(B[:, :2], metric='seuclidean')
Out[69]:
array([ 1.98029509, 2.30089497, 1.82574186, 1.53392998, 2.38459106,
0.76696499, 1.98029509, 2.93725228, 0.76696499, 2.38459106])
Your "Method 2" is wrong because your formula is wrong. You have to keep the variances for each component. np.var([X[0], X2[0]]) computes the (single) variance of all the values in the input. Instead, you need to use the axis and ddof arguments shown above.

Numpy calculation of eigenvectors is incorrect

I ran the following in Python and expected the columns of E[1] to be the eigenvectors of A, but they are not. Only Sympy.Matrix.eigenvects() seems to do it right. Why this error?
A
Out[194]:
matrix([[-3, 3, 2],
[ 1, -1, -2],
[-1, -3, 0]])
E = np.linalg.eig(A)
E
Out[196]:
(array([ 2., -4., -2.]),
matrix([[ -2.01889132e-16, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, -3.71551690e-16],
[ -8.32050294e-01, 2.73252305e-17, 4.47213595e-01]]))
A*E[1] / E[1]
Out[205]:
matrix([[ 6.59900617, -4. , -2. ],
[ 2. , -4. , -3.88449298],
[ 2. , 8.125992 , -2. ]])
The eigenvectors are correct, within an expected margin of error.
What you discovered is that testing eigenvectors with element-wise division is a bad idea.
A better way is to compute the norm of the difference between matrix*vector and eigenvalue*vector, as sketched below.
NumPy performs its computations in double-precision floating-point arithmetic, which carries about 16 significant decimal digits (machine epsilon is 2**(-52), about 2.2e-16). This means any of its answers may contain relative errors at least of that size. So, when you see a number like 2e-16 coming from a calculation with numbers of size 1-3, the conclusion is: "that number should probably be zero, and the value we have for it is likely just noise". And if you divide by that number, noise is all you get.
SymPy, on the other hand, performs symbolic manipulations, so its answer (when it can get one) is exactly what the theory predicts.
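As a concrete version of that check, a minimal sketch that measures the residual norm instead of dividing element-wise:
import numpy as np

A = np.array([[-3, 3, 2],
              [ 1, -1, -2],
              [-1, -3, 0]])

w, v = np.linalg.eig(A)
residual = np.linalg.norm(A @ v - v * w)   # ~1e-15, i.e. zero up to floating-point noise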
From its docs:
The number w is an eigenvalue of a if there exists a vector v such that dot(a,v) = w * v. Thus, the arrays a, w, and v satisfy the equations dot(a[:,:], v[:,i]) = w[i] * v[:,i] for i in {0, ..., M-1}.
With your matrix:
In [1]: A = np.array([[-3, 3, 2],
...: [ 1, -1, -2],
...: [-1, -3, 0]])
...:
In [2]: w,v=np.linalg.eig(A)
In [3]: w
Out[3]: array([ 2., -4., -2.])
In [4]: v
Out[4]:
array([[ -9.39932874e-17, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, 1.93473310e-16],
[ -8.32050294e-01, -4.08811066e-17, 4.47213595e-01]])
In [5]: np.dot(A,v)
Out[5]:
array([[ -2.22044605e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -7.77156117e-16],
[ -1.66410059e+00, 4.44089210e-16, -8.94427191e-01]])
In [6]: w*v
Out[6]:
array([[ -1.87986575e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -3.86946619e-16],
[ -1.66410059e+00, 1.63524427e-16, -8.94427191e-01]])
In [7]: np.dot(A,v)-w*v
Out[7]:
array([[ -3.40580301e-17, 8.88178420e-16, 2.22044605e-16],
[ 8.88178420e-16, -6.66133815e-16, -3.90209498e-16],
[ -2.22044605e-16, 2.80564783e-16, -3.33066907e-16]])
In [8]: np.allclose(np.dot(A,v), w*v)
Out[8]: True
So, yes, the documented test is satisfied, within floating point limits.
einsum can be used to highlight the i axis in the dot calculation.
In [10]: np.einsum('...k,ki->...i',A,v)
Out[10]:
array([[ -2.22044605e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -7.77156117e-16],
[ -1.66410059e+00, 3.88578059e-16, -8.94427191e-01]])
When I divide by v (element-wise), the result matches the eigenvalues, 2, -4, -2, except where v and the dot product are virtually 0 (1e-16 or smaller).
In [11]: np.einsum('...k,ki->...i',A,v)/v
Out[11]:
array([[ 2.36234534, -4. , -2. ],
[ 2. , -4. , -4.01686475],
[ 2. , -9.50507681, -2. ]])

Replace all elements of a matrix by their inverses

I've got a simple problem and I can't figure out how to solve it.
Here is a matrix: A = np.array([[1,0,3],[0,7,9],[0,0,8]]).
I want to find a quick way to replace all elements of this matrix by their inverses, excluding of course the zero elements.
I know, thanks to searching Stack Overflow, how to replace elements with a given value based on a condition. However, I can't figure out how to replace elements with new values that depend on the old ones (e.g. squared elements, inverses, etc.).
Use 1. / A (notice the dot for Python 2):
>>> A
array([[1, 0, 3],
[0, 7, 9],
[0, 0, 8]])
>>> 1./A
array([[ 1. , inf, 0.33333333],
[ inf, 0.14285714, 0.11111111],
[ inf, inf, 0.125 ]])
Or if your array has dtype float, you can do it in-place without warnings:
>>> A = np.array([[1,0,3], [0,7,9], [0,0,8]], dtype=np.float64)
>>> A[A != 0] = 1. / A[A != 0]
>>> A
array([[ 1. , 0. , 0.33333333],
[ 0. , 0.14285714, 0.11111111],
[ 0. , 0. , 0.125 ]])
Here we use A != 0 to select only those elements that are non-zero.
However, if you try this on your original integer array, you'd see
array([[1, 0, 0],
[0, 0, 0],
[0, 0, 0]])
because your array can only hold integers, so the inverses of all elements other than 1 are truncated to 0.
In general, NumPy operations on arrays are element-wise and vectorized, so that, for example, to square the elements:
>>> A = np.array([[1,0,3],[0,7,9],[0,0,8]])
>>> A * A
array([[ 1, 0, 9],
[ 0, 49, 81],
[ 0, 0, 64]])
And just a note on Antti Haapala's answer (sorry, I can't comment yet): if you wanted to keep the 0's, you could use
B = 1. / A          # the 1. makes sure the division is done in floats
B[B == np.inf] = 0
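A sketch of one more variant that avoids creating the infinities (and the divide-by-zero warning) in the first place, reusing the where/out pattern from the previous question:
import numpy as np

A = np.array([[1, 0, 3],
              [0, 7, 9],
              [0, 0, 8]], dtype=float)

# Divide only where A is non-zero; other entries keep the 0.0 supplied by out=.
B = np.divide(1.0, A, out=np.zeros_like(A), where=A != 0)
# B -> [[ 1.          0.          0.33333333]
#       [ 0.          0.14285714  0.11111111]
#       [ 0.          0.          0.125     ]]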
