Using Python & NumPy, I would like to:

1. Consider each row of an (n columns x m rows) matrix as a vector
2. Weight each row (scalar multiplication on each component of the vector)
3. Add the rows together to produce a final vector (vector addition).
The weights are given in a regular numpy array with one entry per row (m x 1), so that row i of the matrix is multiplied by weight i.
Here's what I've got (with test data; the actual matrix is huge), which is perhaps very un-Numpy and un-Pythonic. Can anyone do better? Thanks!
import numpy
# test data
mvec1 = numpy.array([1,2,3])
mvec2 = numpy.array([4,5,6])
start_matrix = numpy.matrix([mvec1,mvec2])
weights = numpy.array([0.5,-1])
#computation
wmatrix = [ weights[n]*start_matrix[n] for n in range(len(weights)) ]
vector_answer = numpy.zeros((1, 3))  # start from a zero row vector so += does vector addition
for x in wmatrix: vector_answer+=x
Even though a 'technically' correct answer has already been given, I'll give my straightforward answer:
from numpy import array, dot
dot(array([0.5, -1]), array([[1, 2, 3], [4, 5, 6]]))
# array([-3.5, -4. , -4.5])
This one is much more in the spirit of linear algebra (and it also satisfies the three requirements listed at the top of the question).
Update:
And this solution is really fast: not marginally, but easily some 10-15x faster than the one already proposed!
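A minimal timeit sketch to check the speed claim (the test size is hypothetical and the numbers will vary with matrix shape; the comparison is against the transpose-multiply-sum approach from the other answer):

import timeit
import numpy as np

start_matrix = np.random.rand(1000, 3)  # hypothetical larger test matrix
weights = np.random.rand(1000)

# time the dot-product version against the transpose-multiply-sum version
t_dot = timeit.timeit(lambda: np.dot(weights, start_matrix), number=10000)
t_sum = timeit.timeit(lambda: (start_matrix.T * weights).sum(axis=1), number=10000)
print(t_dot, t_sum)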
It will be more convenient to use a two-dimensional numpy.array than a numpy.matrix in this case.
start_matrix = numpy.array([[1,2,3],[4,5,6]])
weights = numpy.array([0.5,-1])
final_vector = (start_matrix.T * weights).sum(axis=1)
# array([-3.5, -4. , -4.5])
The multiplication operator * does the right thing here due to NumPy's broadcasting rules.
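To make the broadcasting explicit, here is the same computation with the intermediate shapes spelled out (a small illustrative sketch):

import numpy as np

start_matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
weights = np.array([0.5, -1])                    # shape (2,)

scaled = start_matrix.T * weights  # (3, 2) * (2,): weights broadcast across the rows
final_vector = scaled.sum(axis=1)  # sum the weighted rows -> shape (3,)
print(final_vector)                # [-3.5 -4.  -4.5]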
I have to solve an XOR operation on very high-dimensional (~30,000) vectors to compute the Hamming distance. For example, I need to XOR one vector, full of False with 16 sparsely located True entries, with each row of a 50,000x30,000 matrix.
As of now, the quickest way I've found is not to use scipy.sparse but the simple ^ operation on each row.
This:
l1distances=(self.hashes[index,:]^self.hashes[all_points,:]).sum(axis=1)
Happens to be ten times faster than this:
sparse_hashes = scipy.sparse.csr_matrix(self.hashes).astype('bool')
for i in range(all_points.shape[0]):
    l1distances[0, i] = (sparse_hashes[index] - sparse_hashes[all_points[i]]).sum()
But ten times faster is still quite slow, since in theory a sparse vector with only 16 active entries should make the computation as cheap as working with a 16-dimensional one.
Is there any solution? I'm really struggling here, thanks for the help!
If your vector is highly sparse (like 16/30000), I'd probably just skip fiddling with sparse XOR entirely.
from scipy import sparse
import numpy as np
import numpy.testing as npt
matrix_1 = sparse.random(10000, 100, density=0.1, format='csc')
matrix_1.data = np.ones(matrix_1.data.shape, dtype=bool)
matrix_2 = sparse.random(1, 100, density=0.1, format='csc', dtype=bool)
vec = matrix_2.A.flatten()
# Pull out the part of the sparse matrix that matches the vector and sum it after xor
matrix_xor = (matrix_1[:, vec].A ^ np.ones(vec.sum(), dtype=bool)[np.newaxis, :]).sum(axis=1)
# Sum the part that doesn't match the vector and add it
l1distances = matrix_1[:, ~vec].sum(axis=1).A.flatten() + matrix_xor
# Double check that I can do basic math
npt.assert_array_equal(l1distances, (matrix_1.A ^ vec[np.newaxis, :]).sum(axis=1))
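Another option, when the query really has only ~16 True entries, is to avoid XOR altogether via the identity |a ^ b| = |a| + |b| - 2|a & b|. A minimal dense sketch; hashes and vec are hypothetical stand-ins for the question's arrays:

import numpy as np

def hamming_to_sparse_query(hashes, vec):
    # Hamming distance from each row of a dense boolean matrix to a sparse boolean query
    true_idx = np.flatnonzero(vec)             # the ~16 active positions
    row_popcounts = hashes.sum(axis=1)         # |a| for every row; can be precomputed once
    overlap = hashes[:, true_idx].sum(axis=1)  # |a & b| touches only the active columns
    return row_popcounts + true_idx.size - 2 * overlap

Once the per-row popcounts are cached, each query only reads the handful of columns where vec is True.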
Assume I have a set of vectors $ a_1, ..., a_d $ that are orthonormal to each other. Now, I want to find another vector $ a_{d+1} $ that is orthogonal to all the other vectors.
Is there an efficient algorithm to achieve this? I can only think of appending a random vector and then applying Gram-Schmidt.
Is there a python library which already achieves this?
Can't speak to optimality, but here is a working solution. The good thing is that numpy.linalg does all of the heavy lifting, so this may be speedier and more robust than doing Gram-Schmidt by hand. Besides, it suggests that the complexity is no worse than that of Gram-Schmidt.
The idea:

1. Treat your input orthogonal vectors as the columns of a matrix O.
2. Append another random column to O, giving A = [O | r]. Generically, A will be a full-rank matrix.
3. Choose b = [0, 0, ..., 0, 1] with len(b) = d + 1.
4. Solve the least-squares problem A^T x = b. Then x is guaranteed to be non-zero and orthogonal to all the original columns of O.
import numpy as np
from numpy.linalg import lstsq
from scipy.linalg import orth
# random matrix
M = np.random.rand(10, 5)
# get 5 orthogonal vectors in 10 dimensions in a matrix form
O = orth(M)
def find_orth(O):
    rand_vec = np.random.rand(O.shape[0], 1)
    A = np.hstack((O, rand_vec))
    b = np.zeros(O.shape[1] + 1)
    b[-1] = 1
    return lstsq(A.T, b, rcond=None)[0]  # rcond=None silences the FutureWarning on recent numpy

res = find_orth(O)

if all(np.abs(np.dot(res, col)) < 10e-9 for col in O.T):
    print("Success")
else:
    print("Failure")
Good afternoon all, a relatively simple question here from a mechanical standpoint.
I'm currently performing PCA and have successfully written a code that computes the covariance matrix and correlation matrix, and the associated eigenspectrum.
Now, I have created an array that represents the eigenvectors row-wise, and I would like to compute the transformation C*v^T, where C is the observation matrix and v^T is the transposed eigenvector matrix.
Now, since some of these matrices are pretty big, I'd like to be able to tell Python which row of the eigenvector matrix to multiply C by. So far I have tried some of the numpy functions, but to no avail.
(For those of you wondering: I don't want to compute the matrix product of all the eigenvectors; I only need to multiply by a small subset of them, the ones associated with the largest eigenvalues.)
Thanks!
To slice row n out of a 2-dimensional array A, you use the syntax A[n]. If it's columns you want to slice instead, the syntax is A[:, n].
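A quick illustration with a small example array:

>>> import numpy as np
>>> A = np.arange(6).reshape(2, 3)  # array([[0, 1, 2], [3, 4, 5]])
>>> A[1]       # second row
array([3, 4, 5])
>>> A[:, 1]    # second column
array([1, 4])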
For transformations of numpy arrays and vectors, use the matrix multiplication operator @:
>>> A = np.array([[0, -1], [1, 0]])
>>> vs = np.array([[1, 2], [3, 4]])
>>> A @ vs[0]  # this is a rotation of the first row of vs by A
array([-2,  1])
>>> A @ vs[1]  # this is a rotation of the second row of vs by A
array([-4,  3])
Note: if you're on an older Python version (< 3.5), you might not have @ available yet. Then you'll have to use the function np.dot(array, vector) instead of the operator.
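Tying this back to the PCA question, a minimal sketch of projecting onto a chosen subset of eigenvector rows (the shapes and the names C, V, top are hypothetical):

import numpy as np

C = np.random.rand(100, 5)  # hypothetical observation matrix
V = np.random.rand(5, 5)    # hypothetical eigenvector matrix, one eigenvector per row
top = [0, 2]                # indices of the eigenvectors with the largest eigenvalues

projected = C @ V[top].T    # C * v^T using only the selected rows
print(projected.shape)      # (100, 2)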
I'd like to use numpy to calculate the inverse of a matrix. But I'm getting an error:
'numpy.ndarray' object has no attribute 'I'
To calculate the inverse of a matrix in numpy, say matrix M, it should simply be:
print M.I
Here's the code:
from itertools import combinations_with_replacement
import numpy

x = numpy.empty((3,3), dtype=int)
for comb in combinations_with_replacement(range(10), 9):
    x.flat[:] = comb
    print x.I
I'm presuming this error occurs because x is now flat, and thus the 'I' command is not compatible. Is there a workaround for this?
My goal is to print the INVERSE MATRIX of every possible numerical matrix combination.
The I attribute only exists on matrix objects, not ndarrays. You can use numpy.linalg.inv to invert arrays:
inverse = numpy.linalg.inv(x)
Note that the way you're generating matrices, not all of them will be invertible. You will either need to change the way you're generating matrices, or skip the ones that aren't invertible.
try:
    inverse = numpy.linalg.inv(x)
except numpy.linalg.LinAlgError:
    # Not invertible. Skip this one.
    pass
else:
    # continue with what you were doing
    pass
Also, if you want to go through all 3x3 matrices with elements drawn from [0, 10), you want the following:
for comb in itertools.product(range(10), repeat=9):
rather than combinations_with_replacement, or you'll skip matrices like
numpy.array([[0, 1, 0],
             [0, 0, 0],
             [0, 0, 0]])
Another way to do this is to use the numpy matrix class (rather than a numpy array) and the I attribute. For example:
>>> m = np.matrix([[2,3],[4,5]])
>>> m.I
matrix([[-2.5, 1.5],
[ 2. , -1. ]])
Inverse of a matrix using python and numpy:
>>> import numpy as np
>>> b = np.array([[2,3],[4,5]])
>>> np.linalg.inv(b)
array([[-2.5, 1.5],
[ 2. , -1. ]])
Not all matrices can be inverted; for example, singular matrices are not invertible:
>>> import numpy as np
>>> b = np.array([[2,3],[4,6]])
>>> np.linalg.inv(b)
LinAlgError: Singular matrix
Solution to the singular matrix problem:
try/except the singular matrix error (numpy.linalg.LinAlgError) and keep going until you find a transform that meets your prior criteria AND is also invertible.
What about inv?
e.g.:
from numpy.linalg import inv

my_inverse_array = inv(my_array)
IDK if anyone has already mentioned this, but I want to point out that matrix_object.I and np.linalg.inv(matrix_object) don't give a true inverse. This has given me a lot of grief. It's true that for a matrix object m, np.dot(m, m.I) = an identity matrix, but np.dot(m.I, m) != I. The same goes for np.linalg.inv(m).
Be careful with that.
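A quick way to check this numerically for a given matrix is np.allclose, which compares within floating-point tolerance (for a well-conditioned matrix both products come out close to the identity; real discrepancies show up for nearly singular matrices):

import numpy as np

m = np.matrix([[2., 3.], [4., 5.]])
print(np.allclose(m * m.I, np.eye(2)))  # m times its inverse
print(np.allclose(m.I * m, np.eye(2)))  # inverse times m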
Suppose I have two vectors of length 25, and I want to compute their covariance matrix. I try doing this with numpy.cov, but always end up with a 2x2 matrix.
>>> import numpy as np
>>> x=np.random.normal(size=25)
>>> y=np.random.normal(size=25)
>>> np.cov(x,y)
array([[ 0.77568388, 0.15568432],
[ 0.15568432, 0.73839014]])
Using the rowvar flag doesn't help either - I get exactly the same result.
>>> np.cov(x,y,rowvar=0)
array([[ 0.77568388, 0.15568432],
[ 0.15568432, 0.73839014]])
How can I get the 25x25 covariance matrix?
You have two vectors, not 25. The computer I'm on doesn't have Python, so I can't test this, but try:
z = list(zip(x, y))  # list() needed on Python 3, where zip returns an iterator
np.cov(z)
Of course.... really what you want is probably more like:
n = 100        # number of points in each vector
num_vects = 25
vals = []
for _ in range(num_vects):
    vals.append(np.random.normal(size=n))
np.cov(vals)
This takes the covariance (I think/hope) of num_vects 1xn vectors
Try this:
import numpy as np
x=np.random.normal(size=25)
y=np.random.normal(size=25)
z = np.vstack((x, y))  # shape (2, 25)
c = np.cov(z.T)        # the 25 rows of z.T are treated as variables, giving a 25x25 matrix
Covariance matrix from sample vectors
To clarify the small confusion regarding what a covariance matrix defined using two N-dimensional vectors is, there are two possibilities.
The question you have to ask yourself is whether you consider:
- each vector as N realizations/samples of one single variable (for example, two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations for the variables X and Y respectively)
- each vector as 1 realization for N variables (for example, two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where you have 1 realization for the variables X, Y and Z per vector)
Since a covariance matrix is intuitively defined as a variance based on two different variables:
- in the first case, you have 2 variables and N sample values for each, so you end up with a 2x2 matrix where the covariances are computed from N samples per variable
- in the second case, you have N variables and 2 samples for each, so you end up with an NxN matrix
About the actual question, using numpy:
if you consider that you have 25 variables per vector (I took 3 instead of 25 to simplify the example code), i.e. one realization of several variables in one vector, use rowvar=0:
# [X1,Y1,Z1]
X_realization1 = [1,2,3]
# [X2,Y2,Z2]
X_realization2 = [2,1,8]
numpy.cov([X_realization1, X_realization2], rowvar=0)  # rowvar false, each column is a variable
Code returns, considering 3 variables:
array([[ 0.5, -0.5, 2.5],
[-0.5, 0.5, -2.5],
[ 2.5, -2.5, 12.5]])
otherwise, if you consider that one vector is 25 samples of one variable, use rowvar=1 (numpy's default):
# [X1,X2,X3]
X = [1,2,3]
# [Y1,Y2,Y3]
Y = [2,1,8]
numpy.cov([X, Y], rowvar=1)  # rowvar true (default), each row is a variable
Code returns, considering 2 variables:
array([[ 1. , 3. ],
[ 3. , 14.33333333]])
Reading the documentation, as in
>>> np.cov.__doc__
or looking at the Numpy covariance docs: Numpy treats each row of the array as a separate variable, so you have two variables and hence you get a 2 x 2 covariance matrix.
I think the previous post gives the right solution; here is my explanation :-)
I suppose what you're looking for is actually a covariance function, which is a time-lag function. I'm computing autocovariance like this:
def autocovariance(Xi, N, k):
    Xs = np.average(Xi)
    aCov = 0.0
    for i in np.arange(0, N-k):
        aCov = (Xi[i+k] - Xs) * (Xi[i] - Xs) + aCov
    return (1./N) * aCov
autocov[h] = autocovariance(My_vector, N, h)
You should change
np.cov(x,y, rowvar=0)
to
np.cov((x,y), rowvar=0)
What you got (2 by 2) is more useful than 25x25. The covariance of X and Y is an off-diagonal entry in the symmetric covariance matrix.
If you insist on (25 by 25), which I think is useless, then why don't you write out the definition?
x = np.random.normal(size=25).reshape(25, 1)  # reshape to a 2d column array
y = np.random.normal(size=25).reshape(25, 1)
cov = np.matmul(x - np.mean(x), (y - np.mean(y)).T) / len(x)  # outer product -> 25x25
As pointed out above, you only have two vectors so you'll only get a 2x2 cov matrix.
IIRC the 2 main-diagonal terms will be sum((x - mean(x))**2) / (n-1), and similarly for y.
The 2 off-diagonal terms will be sum((x - mean(x)) * (y - mean(y))) / (n-1). n = 25 in this case.
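A small sketch checking those formulas against np.cov (nothing here beyond standard numpy):

import numpy as np

x = np.random.normal(size=25)
y = np.random.normal(size=25)
n = x.size

var_x = ((x - x.mean())**2).sum() / (n - 1)                 # main-diagonal term
var_y = ((y - y.mean())**2).sum() / (n - 1)
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)  # off-diagonal term

print(np.allclose(np.cov(x, y), [[var_x, cov_xy], [cov_xy, var_y]]))  # True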
According to the documentation, you should expect the variable vectors in columns:
If we examine N-dimensional samples, X = [x1, x2, ..., xn]^T
though later it says each row is a variable
Each row of m represents a variable.
so you need to input your matrix transposed:
x=np.random.normal(size=25)
y=np.random.normal(size=25)
X = np.array([x,y])
np.cov(X.T)
and according to Wikipedia (https://en.wikipedia.org/wiki/Covariance_matrix), X is a column vector of variables:
X = [X1, X2, ..., Xn]^T
COV = E[X * X^T] - μx * μx^T, where μx = E[X]
you can implement it yourself:
# each column of X is a variable (pass the matrix transposed, as above)
X = X - X.mean(axis=0)
h, w = X.shape
COV = X.T @ X / (h - 1)
I don't think you understand the definition of a covariance matrix. If you need a 25 x 25 covariance matrix, you need 25 vectors, each with n data points.