Subtract Vector from Every Column of a Matrix - python

Suppose I have an n x 1 column vector v, and an n x m matrix M. I'm looking for a method to subtract v from every column of M without a loop in Numpy. How can I do this?
I've searched the web and I can't find a method to do this.

Besides searching the web, most of the time it is useful to just play around with the arrays and see what works. In your case it is really straightforward:
import numpy as np
n, m = 13, 17
v = np.random.random((n, 1))
M = np.random.random((n, m))
res = M - v
This works through broadcasting: the (n, 1) vector is stretched across all m columns. The NumPy broadcasting documentation is also a good resource for getting familiar with the basic concepts.
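If you want to convince yourself that the broadcast really does subtract v from every column, here is a minimal sketch comparing it against an explicit per-column loop (the loop exists purely for verification):
import numpy as np

n, m = 13, 17
v = np.random.random((n, 1))
M = np.random.random((n, m))

res = M - v  # broadcasting: v is stretched across all m columns

# verify against an explicit per-column loop
res_loop = np.empty_like(M)
for j in range(m):
    res_loop[:, j] = M[:, j] - v[:, 0]
assert np.allclose(res, res_loop)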

Related

Solving A*x=x with numpy

I am new to numpy and I want to solve the equation A * x = x, where A is a matrix and x is a vector.
Searching for the vector x, if it exists.
I found the np.linalg.solve() function, but couldn't get it to work as intended.
The issue here is not so much a problem with numpy as with your understanding of the linear algebra involved. The question you are asking can be rephrased as:
A @ x = x
A @ x = I @ x
(A - I) @ x = 0
This is a specific formulation of the general eigenvector problem, with the stipulation that the eigenvalue is 1.
Numpy solves this problem with the function np.linalg.eig:
w, v = np.linalg.eig(A)
You need to check if any of the values are 1 (there could be more than one):
mask = np.isclose(w, 1)
if mask.any():
    for vec in v[:, mask].T:
        print(vec)
else:
    print('Nope!')
The columns of v are unit eigenvectors. Keep in mind that any scalar multiple of such a vector is also a solution.
If you run into numerical issues with near-singular matrices, you can instead compute the null space of (A - I) with scipy.linalg.svd:
from scipy.linalg import svd
u, s, vh = svd(A - np.eye(A.shape[0]))
The rows of vh whose corresponding singular values in s are close to zero are the solutions.
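Putting the eig-based approach together, here is a minimal runnable sketch (the 2x2 matrix is just an illustrative example with a known eigenvalue of 1):
import numpy as np

# this matrix maps (1, 1) to itself, so an eigenvalue of 1 exists
A = np.array([[0.5, 0.5],
              [0.5, 0.5]])

w, v = np.linalg.eig(A)
mask = np.isclose(w, 1)
if mask.any():
    for vec in v[:, mask].T:
        print(vec)                        # unit eigenvector for eigenvalue 1
        print(np.allclose(A @ vec, vec))  # True: A @ x = x
else:
    print('Nope!')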

Vectorise Python code

I have coded a kriging algorithm but I find it quite slow. In particular, do you have an idea how I could vectorise the piece of code in the cons function below?
import time
import numpy as np

B = np.zeros((200, 6))
P = np.zeros((len(B), len(B)))

def cons():
    time1 = time.time()
    for i in range(len(B)):
        for j in range(len(B)):
            P[i, j] = corr(B[i], B[j])
    time2 = time.time()
    return time2 - time1

def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))))

time_av = 0.
for i in range(30):
    time_av += cons()
print("Average =", time_av / 30.)
Edit: Bonus questions
1. What happens to the broadcasting solution if I want corr(B[i], C[j]), with C of the same dimensions as B?
2. What happens to the scipy solution if my p-norm orders are an array:
p = np.array([1., 2., 1., 2., 1., 2.])
def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))**p))
For 2., I tried P = np.exp(-cdist(B, C, 'minkowski', p)) but scipy is expecting a scalar.
Your problem seems very simple to vectorize. For each pair of rows of B you want to compute
P[i,j] = np.exp(-np.sum(np.abs(B[i,:] - B[j,:])))
You can make use of array broadcasting and introduce a third dimension, summing along the last one:
P2 = np.exp(-np.sum(np.abs(B[:,None,:] - B),axis=-1))
The idea is to reshape the first occurrence of B to shape (N,1,M) while the second B is left with shape (N,M). With array broadcasting, the latter is equivalent to (1,N,M), so
B[:,None,:] - B
is of shape (N,N,M). Summing along the last index will then result in the (N,N)-shape correlation array you're looking for.
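The same broadcasting pattern answers the first bonus question: with a second array C of the same shape as B, just substitute it for the unreshaped operand (a sketch; the random C here is only for illustration):
import numpy as np

B = np.random.random((200, 6))
C = np.random.random((200, 6))

# B[:, None, :] has shape (N, 1, M); C broadcasts as (1, N, M)
P_bc = np.exp(-np.sum(np.abs(B[:, None, :] - C), axis=-1))  # P_bc[i, j] = corr(B[i], C[j])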
Note that if you were using scipy, you would be able to do this using scipy.spatial.distance.cdist (or, equivalently, a combination of scipy.spatial.distance.pdist and scipy.spatial.distance.squareform), without unnecessarily computing the lower triangular half of this symmetric matrix. Using @Divakar's suggestion in comments for the simplest solution this way:
from scipy.spatial.distance import cdist
P3 = 1/np.exp(cdist(B, B, 'minkowski', p=1))
cdist will compute the Minkowski distance in 1-norm, which is exactly the sum of the absolute values of coordinate differences.
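As for the second bonus question: cdist's Minkowski metric only accepts a scalar p, but the broadcasting form extends naturally, since an array of per-coordinate exponents broadcasts against the last axis (a sketch, assuming p has length M):
import numpy as np

B = np.random.random((200, 6))
p = np.array([1., 2., 1., 2., 1., 2.])

# the (N, N, M) difference array is raised elementwise to the per-coordinate powers
P_p = np.exp(-np.sum(np.abs(B[:, None, :] - B) ** p, axis=-1))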

Dot product between 2D and 3D arrays

Assume that I have two arrays V and Q, where V is (i, j, j) and Q is (j, j). I now wish to compute the dot product of Q with each "row" of V and save the result as an (i, j, j) sized matrix. This is easily done using for-loops by simply iterating over i like
import numpy as np
v = np.random.normal(size=(100, 5, 5))
q = np.random.normal(size=(5, 5))
output = np.zeros_like(v)
for i in range(v.shape[0]):
    output[i] = q.dot(v[i])
However, this is way too slow for my needs, and I'm guessing there is a way to vectorize this operation using either einsum or tensordot, but I haven't managed to figure it out. Could someone please point me in the right direction? Thanks
You can certainly use np.tensordot, but need to swap axes afterwards, like so -
out = np.tensordot(v,q,axes=(1,1)).swapaxes(1,2)
With np.einsum, it's a bit more straight-forward, like so -
out = np.einsum('ijk,lj->ilk',v,q)
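As a side note, np.matmul (the @ operator, available since NumPy 1.10) broadcasts a 2D matrix over the leading stack axis of a 3D array, so an equivalent one-liner sketch:
import numpy as np

v = np.random.normal(size=(100, 5, 5))
q = np.random.normal(size=(5, 5))

out = q @ v  # q is broadcast across the first axis of v
assert np.allclose(out, np.einsum('ijk,lj->ilk', v, q))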

How can I compute the null space/kernel (x: M·x = 0) of a sparse matrix in Python?

I found some examples online showing how to find the null space of a regular matrix in Python, but I couldn't find any examples for a sparse matrix (scipy.sparse.csr_matrix).
By null space I mean x such that M·x = 0, where '·' is matrix multiplication. Does anybody know how to do this?
Furthermore, in my case I know that the null space will consist of a single vector. Can this information be used to improve the efficiency of the method?
This isn't a complete answer yet, but hopefully it will be a starting point towards one. You should be able to compute the null space using a variant on the SVD-based approach shown for dense matrices in this question:
import numpy as np
from scipy import sparse
import scipy.sparse.linalg
def rand_rank_k(n, k, **kwargs):
    "generate a random (n, n) sparse matrix of rank <= k"
    a = sparse.rand(n, k, **kwargs)
    b = sparse.rand(k, n, **kwargs)
    return a.dot(b)
# I couldn't think of a simple way to generate a random sparse matrix with known
# rank, so I'm currently using a dense matrix for proof of concept
n = 100
M = rand_rank_k(n, n - 1, density=1)
# # this seems like it ought to work, but it doesn't
# u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
# this works OK, but obviously converting your matrix to dense and computing all
# of the singular values/vectors is probably not feasible for large sparse matrices
u, s, vh = np.linalg.svd(M.todense(), full_matrices=False)
tol = np.finfo(M.dtype).eps * M.nnz
null_space = vh.compress(s <= tol, axis=0).conj().T
print(null_space.shape)
# (100, 1)
print(np.allclose(M.dot(null_space), 0))
# True
If you know that the null space is spanned by a single vector x then in principle you would only need to compute the smallest singular value/vector of M. It ought to be possible to do this using scipy.sparse.linalg.svds, i.e.:
u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
null_space = vh.conj().ravel()
Unfortunately, scipy's svds seems to be badly behaved when finding small singular values of singular or near-singular matrices and usually either returns NaNs or throws an ArpackNoConvergence error.
I'm not currently aware of an alternative implementation of truncated SVD with Python bindings that will work on sparse matrices and can selectively find the smallest singular values - perhaps someone else knows of one?
Edit
As a side note, the second approach seems to work reasonably well using MATLAB or Octave's svds function:
>> M = rand(100, 99) * rand(99, 100);
% svds converges much more reliably if you set sigma to something small but nonzero
>> [U, S, V] = svds(M, 1, 1E-9);
>> max(abs(M * V))
ans = 1.5293e-10
I have been trying to find a solution to the same problem. Using Scipy's svds function provides unreliable results for small singular values, so I have been using QR decomposition instead. The sparseqr package (https://github.com/yig/PySPQR) provides a Python wrapper for SuiteSparseQR (the sparse QR solver also used by MATLAB), and works reasonably well. Using it, the null space can be calculated as:
from sparseqr import qr

# QR-decompose the transpose; r is the numerical rank of M
Q, _, _, r = qr(M.transpose())
# the trailing columns of Q are orthogonal to the row space of M,
# so they span the null space of M
N = Q.tocsr()[:, r:]
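A quick sanity check, assuming the M and N from the snippets above (every column of N should be annihilated by M):
# the residual should be on the order of machine epsilon
print(abs(M.dot(N)).max())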

Single operation to take the matrix product along only the last two dimensions

This is probably obvious on reflection, but it's not clear to me right now.
For a pair of numpy arrays of shapes (K, N, M) and (K, M, N) denoted by a and b respectively, is there a way to compute the following as a single vectorized operation:
import numpy as np
K = 5
N = 2
M = 3
a = np.random.randn(K, N, M)
b = np.random.randn(K, M, N)
output = np.empty((K, N, N))
for each_a, each_b, each_out in zip(a, b, output):
    each_out[:] = each_a.dot(each_b)
A simple a.dot(b) returns the dot product for every pair of the first axis (so it returns an array of shape (K, N, K, N)).
edit: fleshed out the code a bit for those that couldn't understand the question.
I answered a similar question a while back: Element-wise matrix multiplication in NumPy.
I think what you're looking for is:
output = np.einsum('ijk,ikl->ijl', a, b)
Good luck!
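For what it's worth, this batched product is also exactly what np.matmul (the @ operator, NumPy 1.10+) does with stacks of matrices, so a minimal equivalent sketch:
import numpy as np

K, N, M = 5, 2, 3
a = np.random.randn(K, N, M)
b = np.random.randn(K, M, N)

# @ multiplies the last two axes and iterates over the leading K axis
out = a @ b
assert np.allclose(out, np.einsum('ijk,ikl->ijl', a, b))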
