I want to create an orthogonal matrix in Python to visualize how a signal declines with the distance from the source and the angle to the source.
For simplicity, the decline can be described as:
NewValue = cos(angle)*(StartingValue - a*(distance))
I found that scipy.stats has ortho_group, which can be used to create random orthogonal matrices:
import numpy as np
from scipy.stats import ortho_group
x = ortho_group.rvs(3)
np.dot(x, x.T)
# returns:
array([[ 1.00000000e+00, 1.13231364e-17, -2.86852790e-16],
[ 1.13231364e-17, 1.00000000e+00, -1.46845020e-16],
[ -2.86852790e-16, -1.46845020e-16, 1.00000000e+00]])
import scipy.linalg
np.fabs(scipy.linalg.det(x))
# returns:
1.0
Since a random matrix isn't really useful, I keep wondering how I can create an orthogonal matrix whose values follow my function.
A second challenge I'm encountering is how to limit the matrix to angles in the range 0-45°.
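One possible reading of the question is that an orthogonal matrix is not strictly needed: the decline function can be evaluated directly on a grid of distances and angles. Below is a minimal sketch of that idea. The values of starting_value and a, and the grid sizes, are hypothetical, since the question does not give concrete numbers.

```python
import numpy as np

# Hypothetical parameters; the question does not give concrete values.
starting_value = 100.0
a = 2.0  # decline per unit of distance

distances = np.linspace(0.0, 10.0, 11)    # 0, 1, ..., 10
angles_deg = np.linspace(0.0, 45.0, 10)   # limited to 0-45 degrees

# Rows index distance, columns index angle.
dist_grid, angle_grid = np.meshgrid(
    distances, np.radians(angles_deg), indexing="ij"
)
values = np.cos(angle_grid) * (starting_value - a * dist_grid)

print(values.shape)  # (11, 10)
```

The resulting 2D array could then be handed to a plotting routine such as Matplotlib's pcolormesh, if a heat-map style visualization is what is wanted.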
I am trying to reproduce the following example, where the decomposition is T x Lambda x T^{-1} (not named in the image, but visible). Lambda is a diagonal matrix of the exponentials of the eigenvalues, and I think T is the matrix of eigenvectors. I cannot work out how to calculate the matrix T in Python.
My attempt is shown below. The eigenvectors vecs should correspond to T. What am I doing wrong? And is there a more appropriate function than eigs?
import numpy as np
from scipy.sparse.linalg import eigs
from scipy.linalg import expm,inv
A = np.array([[5,1],[-2,2]])
eA = expm(A)
vals, vecs = eigs(A, k=2)
print(vals) #eigenvalues match Lambda in example
print(vecs) #eigenvectors don't match T in example
print(inv(vecs))
Returns:
[4.+0.j 3.+0.j]
[[ 0.70710678 -0.4472136 ]
[-0.70710678 0.89442719]]
[[2.82842712 1.41421356]
[2.23606798 2.23606798]]
when they should be matrices of positive/negative 1's and 2's as shown in the image.
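Eigenvectors are only determined up to a nonzero scale factor, so the unit-length columns returned by a solver can differ from the integer columns of a textbook example while still being valid. The sketch below uses numpy.linalg.eig, on the assumption that a dense solver is acceptable for this small matrix (eigs is aimed at large sparse problems). The rescaling step, which divides each column by its smallest absolute entry, is a hypothetical convenience for readability and not a standard normalization.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[5.0, 1.0], [-2.0, 2.0]])

vals, vecs = np.linalg.eig(A)

# Rescale each eigenvector (column) so its smallest absolute entry is 1.
# This works here because no entry is zero; it is only for readability.
T = vecs / np.abs(vecs).min(axis=0)

# Any nonzero column scaling still diagonalizes A.
A_rebuilt = T @ np.diag(vals) @ np.linalg.inv(T)
eA_rebuilt = T @ np.diag(np.exp(vals)) @ np.linalg.inv(T)

print(T)
print(np.allclose(A_rebuilt, A))
print(np.allclose(eA_rebuilt, expm(A)))
```

The signs and the order of the columns of T may still differ from the image, since those are also not fixed by the eigenvalue problem.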
from numpy.linalg import inv, qr
X = np.random.randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))
The dot product of a matrix and its inverse should be the identity matrix.
But here the output is:
array([[ 1.00000000e+00, 6.70961522e-16, 3.98202719e-16,
-2.04084178e-15, 3.07963387e-16],
[-6.46120445e-15, 1.00000000e+00, 4.44698794e-16,
1.40254635e-15, 2.71601492e-16],
[ 3.00736839e-15, -5.65091222e-16, 1.00000000e+00,
1.63129995e-16, -6.43576692e-17],
[ 1.01120865e-14, -1.23622826e-15, -6.99882344e-16,
1.00000000e+00, -1.13627444e-16],
[-6.31447442e-15, 2.46897480e-15, 9.95010178e-16,
-2.81959392e-15, 1.00000000e+00]])
Please explain.
That must be due to floating-point rounding in the algorithm, but I've found that if you diagonalize the matrix and calculate the dot product with its inverse, you do end up with the identity matrix. This might be because a different algorithm is used to calculate the inverse of a diagonal matrix.
import numpy as np
m = np.random.randn(5,5)
print(np.linalg.det(m))
e = np.linalg.eig(m)[0]
mdiag = np.eye(5)*e
print(mdiag.dot(np.linalg.inv(mdiag)))
This method always seems to work for 3x3 matrices but sometimes fails for bigger matrices, since an imaginary part on the order of 1e-17 is left over.
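The off-diagonal entries in the output from the question are on the order of 1e-15, which is the size one would expect from floating-point round-off. A common way to handle this is to compare against the identity within a tolerance, for example with numpy.allclose. A minimal sketch, using an arbitrary seed so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
X = rng.standard_normal((5, 5))
mat = X.T @ X

product = mat @ np.linalg.inv(mat)

# Exact equality usually fails because of round-off...
print(np.array_equal(product, np.eye(5)))
# ...but the product matches the identity within a small tolerance.
print(np.allclose(product, np.eye(5)))
```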
What is the function of the numpy.linalg.norm method?
In this K-means clustering sample, the numpy.linalg.norm function is used to get the distance between the new centroids and the old centroids in the centroid-movement step, but I cannot understand what it means by itself.
Could somebody give me a few ideas about it in this K-means clustering context?
What is the norm of a vector?
numpy.linalg.norm is used to calculate the norm of a vector or a matrix.
This is the signature taken from the numpy.linalg.norm documentation:
numpy.linalg.norm(x, ord=None, axis=None, keepdims=False)
This is the code snippet taken from K-Means Clustering in Python:
# Euclidean distance calculator
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)
It takes ord=None by default, so with axis=1 it calculates the 2-norm (Euclidean length) of each row of (a - b). This is the distance between a and b (using the formula above).
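To illustrate how this fits the centroid-movement step, here is a small sketch with made-up centroids. The values are hypothetical and are not taken from the linked sample.

```python
import numpy as np

def dist(a, b, ax=1):
    # Euclidean distance along the given axis
    return np.linalg.norm(a - b, axis=ax)

# Hypothetical old and new centroids, one centroid per row.
old_centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
new_centroids = np.array([[3.0, 4.0], [1.0, 1.0]])

# One distance per centroid: how far each one moved in this step.
movement = dist(new_centroids, old_centroids)
print(movement)  # the first centroid moved 5 units, the second did not move
```

A K-means loop could use such a movement array as its stopping criterion, ending once every entry is close to zero.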
I am not a mathematician but here is my layman's explanation of “norm”:
A vector describes the location of a point in space relative to the origin. Here’s an example in 2D space for the point [3 2]:
The norm is the distance from the point to the origin. In the 2D case it's easy to visualize the point as the far corner of a right triangle and see that the norm is the length of the hypotenuse.
However, in higher dimensions it's no longer a shape we describe in average-person language, but the distance from the origin to the point is still called the norm. Here's an example in 3D space:
I don't know why the norm is used in K-means clustering. You stated that it was part of determining the distance between the old and new centroid in each step. I'm not sure why one would use the norm for this, since you can get the distance between two points in any dimensionality* using an extension of the formula used in 2D algebra:
You just add a term for each additional dimension; for example, here is a 3D version:
*where the dimensions are positive integers
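The two views agree: the distance between two points is the norm of their difference. The sketch below checks this for the point [3, 2] from the explanation and for two hypothetical 3D points chosen only for illustration.

```python
import numpy as np

p = np.array([3.0, 2.0])
# Norm of [3, 2]: the hypotenuse of a right triangle with legs 3 and 2.
print(np.linalg.norm(p), np.sqrt(3.0**2 + 2.0**2))

# Two hypothetical points in 3D.
q1 = np.array([1.0, 2.0, 3.0])
q2 = np.array([4.0, 6.0, 3.0])

by_formula = np.sqrt(np.sum((q2 - q1) ** 2))  # one squared term per dimension
by_norm = np.linalg.norm(q2 - q1)
print(by_formula, by_norm)
```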
The numpy.linalg.norm function can also be used to get the norm of each row or each column of a matrix. Suppose:
>>> import numpy as np
>>> from numpy import linalg as LA
>>> c = np.array([[ 1, 2, 3],
...               [-1, 1, 4]])
>>> LA.norm(c, axis=0)
array([ 1.41421356, 2.23606798, 5. ])
>>> LA.norm(c, axis=1)
array([ 3.74165739, 4.24264069])
>>> LA.norm(c, ord=1, axis=1)
array([6., 6.])
I have a SAS script that uses the "proc corr" procedure with weighting to create a weighted correlation matrix. I am now trying to reproduce this in Python, but I haven't found a good way of including the weighting in the output matrix.
While looking for a solution, I've found a few scripts and functions that calculate weighted correlation coefficients for two columns/variables (examples here) using a weights array, but I am trying to create a weighted correlation matrix with many more variables. I've tried using these functions by looping through variable combinations, but it runs orders of magnitude slower than the SAS procedure.
I was wondering whether there is an efficient way to create a weighted correlation matrix in Python that works similarly to the SAS code, or at least returns equivalent results without looping through all variable combinations.
NumPy's covariance function, numpy.cov, takes two different kinds of weight parameters (fweights and aweights). I don't have SAS to check against, but it likely uses a similar approach.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html#numpy.cov
Once you have a covariance matrix, it can be converted to a correlation matrix using a formula like this
https://en.wikipedia.org/wiki/Covariance_matrix#Correlation_matrix
Complete example
import numpy as np
x = np.array([1., 1.1, 1.2, 0.9])
y = np.array([2., 2.05, 2.02, 2.8])
np.cov(x, y)
Out[49]:
array([[ 0.01666667, -0.03816667],
[-0.03816667, 0.151225 ]])
cov = np.cov(x, y, fweights=[10, 1, 1, 1])
cov
Out[51]:
array([[ 0.00474359, -0.00703205],
[-0.00703205, 0.04872308]])
def cov_to_corr(cov):
    """ based on https://en.wikipedia.org/wiki/Covariance_matrix#Correlation_matrix """
    D = np.sqrt(np.diag(np.diag(cov)))
    Dinv = np.linalg.inv(D)
    return Dinv @ cov @ Dinv  # requires python3.5, use np.dot otherwise
cov_to_corr(cov)
Out[53]:
array([[ 1. , -0.46255259],
[-0.46255259, 1. ]])
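The same idea extends to many variables at once without looping over pairs. The sketch below is one way to do it, under these assumptions: the data is laid out with one observation per row, the weights are passed through the aweights parameter of numpy.cov, and the data itself is randomly generated purely for illustration. The helper name weighted_corr is hypothetical, and the result has not been checked against SAS.

```python
import numpy as np

def weighted_corr(data, weights):
    """Weighted correlation matrix; data has shape (n_obs, n_vars)."""
    cov = np.cov(data, rowvar=False, aweights=weights)
    std = np.sqrt(np.diag(cov))
    # Divide entry (i, j) by std[i] * std[j].
    return cov / np.outer(std, std)

rng = np.random.default_rng(0)          # made-up data for illustration
data = rng.standard_normal((200, 6))    # 200 observations, 6 variables
weights = rng.uniform(0.5, 2.0, size=200)

corr = weighted_corr(data, weights)
print(corr.shape)  # (6, 6)
```

Dividing by the outer product of the standard deviations gives the same result as multiplying by the inverse diagonal matrix on both sides, but avoids the explicit matrix inverse.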
import numpy as np
np.random.random((5,5))
array([[ 0.26045197, 0.66184973, 0.79957904, 0.82613958, 0.39644677],
[ 0.09284838, 0.59098542, 0.13045167, 0.06170584, 0.01265676],
[ 0.16456109, 0.87820099, 0.79891448, 0.02966868, 0.27810629],
[ 0.03037986, 0.31481138, 0.06477025, 0.37205248, 0.59648463],
[ 0.08084797, 0.10305354, 0.72488268, 0.30258304, 0.230913 ]])
I would like to create a 2D density estimate from this 2D array such that similar values imply higher density. Is there a way to do this in numpy?
I agree, it is indeed not entirely clear what you mean.
The numpy.histogram function provides you with the density for an array.
import numpy as np
array = np.random.random((5,5))
print(array)
density = np.histogram(array, density=True)
print(density)
You can then plot the density, for example with Matplotlib.
There is a great discussion on this here: How does numpy.histogram() work?
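To make the return value more concrete: numpy.histogram returns a tuple of the bin values and the bin edges, and with density=True the values are normalized so that the area under the histogram is 1. A minimal sketch, where the seed and the number of bins are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
array = rng.random((5, 5))

# The 2D array is flattened before binning.
density, bin_edges = np.histogram(array, bins=5, density=True)

# With density=True the area under the histogram is 1.
area = np.sum(density * np.diff(bin_edges))
print(density)
print(area)
```

Note that this is a density over the values in the array, not over positions in the 2D grid, which may or may not be what the question is after.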