Summing vector pairs efficiently in pytorch - python

I'm trying to calculate the sum of each pair of rows in a matrix. Suppose I have an m x n matrix, such as
[[1,2,3],
[4,5,6],
[7,8,9]]
and I want to create a matrix of the sums of all pairs of rows. So, for the above matrix, we would want
[[5,7,9],
[8,10,12],
[11,13,15]]
In general, I think the new matrix will be (m choose 2) x n. For the above example in pytorch, I ran
import torch
x = torch.tensor([[1,2,3], [4,5,6], [7,8,9]])
y = x[None] + x[:, None]
torch.cat((y[0, 1:3, :], y[1, 2:3, :]))
which manually creates the matrix I am looking for. However, I am struggling to think of a way to create the output without manually specifying indices and without using a for-loop. Is there even a way to create such a matrix for an arbitrary matrix without the use of a for-loop?

You can try using this function:
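import torch

def sum_rows(x):
    # y[i, j] = x[i] + x[j]; shape (m, m, n)
    y = x[None] + x[:, None]
    # Indices strictly below the diagonal, i.e. the pairs with i > j
    ind = torch.tril_indices(x.shape[0], x.shape[0], offset=-1)
    return y[ind[0], ind[1]]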
Because you want each pair only once, i.e. the entries sum_matrix[i, j] with i < j (i > j works just as well), you can select the lower (or upper) triangle of the first two dimensions of the 3D tensor y. There is no Python-level for-loop; any looping torch.tril_indices needs happens inside PyTorch, and the function works for variable-sized inputs.
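If memory matters, a variant worth trying is to skip the (m, m, n) tensor y altogether and gather the two rows of each pair directly. This is a minimal sketch, assuming torch.triu_indices (the upper-triangle counterpart of torch.tril_indices); the name sum_row_pairs is hypothetical. With offset=1 the pairs come out as (0, 1), (0, 2), (1, 2), ..., which matches the order of the expected output in the question.

```python
import torch

def sum_row_pairs(x: torch.Tensor) -> torch.Tensor:
    """Return x[i] + x[j] for every pair i < j, in lexicographic (i, j) order."""
    m = x.shape[0]
    # Indices strictly above the diagonal: all pairs (i, j) with i < j.
    i, j = torch.triu_indices(m, m, offset=1, device=x.device)
    # Gather only the rows that are needed; no (m, m, n) intermediate is built.
    return x[i] + x[j]

x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sum_row_pairs(x))  # rows: [5, 7, 9], [8, 10, 12], [11, 13, 15]
```

Peak memory is then roughly the size of the (m choose 2) x n result rather than m * m * n.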

Related

Efficient way to fill NumPy array for independent entries?

I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i, j] = f(x[i], x[j])
While this works, the resulting matrix is 10,000 x 10,000 and takes a very long time to calculate. Is there a more efficient way to do this built into NumPy?
EDIT: The function in question here is a Gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec) / (2 * sigma**2))
where I set sigma in advance before calculating the matrix.
The array x is an array of shape (10000, 8). So the scalar product in the gaussian is between two vectors of dimension 8.
You can use a single for-loop together with broadcasting. This requires changing the gaussian function so that it accepts 2D inputs and sums over the last axis:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1) / (2 * sigma**2))

K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    # x[i:i+1] has shape (1, 8) and broadcasts against x (10000, 8), giving row i of K
    K[i] = gaussian(x[i:i+1], x, sigma)
Theoretically you could do this without any for-loop, again by using broadcasting, but this creates an intermediate array with len(x)**2 * x.shape[1] elements (8 x 10^8 float64 values, about 6.4 GB, for x of shape (10000, 8)), which might run out of memory for your array sizes:
K = gaussian(x[None, :, :], x[:, None, :], sigma)
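Another loop-free option, not from the answer above, is to expand the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the whole kernel matrix comes from one matrix product and the largest temporaries are N x N instead of N x N x 8. This is a minimal sketch assuming the same Gaussian kernel; gaussian_kernel_matrix is a hypothetical name. If SciPy is available, scipy.spatial.distance.cdist(x, x, 'sqeuclidean') computes the same squared distances.

```python
import numpy as np

def gaussian_kernel_matrix(x, sigma):
    """Return K with K[i, j] = exp(-||x[i] - x[j]||^2 / (2 * sigma**2))."""
    sq_norms = np.sum(x**2, axis=1)  # ||x[i]||^2, shape (N,)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b (N x N temporaries only).
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    # Floating-point cancellation can leave tiny negative values; clip them to zero.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
x = rng.random((1000, 8))
K = gaussian_kernel_matrix(x, sigma=0.5)
print(K.shape)  # (1000, 1000)
```

For N = 10000 each N x N float64 temporary is about 800 MB, so a few of them still need several GB; casting the result to float32 halves the storage of K. The expansion loses some precision for nearly identical points, which is why the clip is there.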

How to efficiently calculate pairwise intersection of nonzero indices in a scipy.csr sparse matrix?

I have a scipy.sparse.csr_matrix X of shape n x p. For each pair of rows in X, I would like to compute the intersection of their nonzero element indices and store the results in a new tensor, or maybe even a dictionary. For example, X is:
X = [
[0., 1.5, 4.7],
[4., 0., 0.],
[0., 0., 2.6]
]
I would like the output to be
intersect =
[
[[1,2], [], [2]],
[[], [0], []],
[[2], [], [2]]
]
intersect[i,j] is an ndarray holding the intersection of the indices of the nonzero elements of the ith and jth rows of X, i.e. X[i] and X[j].
Currently I do this by looping, and I would like to vectorize it, since that should be much faster and would let the computations run in parallel.
# current code
n = X.shape[0]
intersection_dict = {}
for i in range(n):
    for j in range(n):
        indices = np.intersect1d(X[i].indices, X[j].indices)
        intersection_dict[(i, j)] = indices
My n is pretty large, so looping over all n^2 pairs is very slow. I am just having trouble figuring out a way to vectorize this operation. Does anybody have any ideas on how to tackle this?
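One part of this does vectorize without any pairwise loop: multiplying the 0/1 sparsity pattern of X by its transpose gives, in a single sparse product, the size of every pairwise intersection. It does not return the index lists themselves, but overlap[i, j] > 0 exactly when intersect[i, j] is non-empty, so any remaining loop can skip the empty pairs. A minimal sketch using the example X from the question; the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.array([[0., 1.5, 4.7],
                                [4., 0., 0.],
                                [0., 0., 2.6]]))

# 0/1 pattern of X: same sparsity structure, every stored value set to 1.
pattern = X.copy()
pattern.eliminate_zeros()              # drop explicitly stored zeros, if any
pattern.data = np.ones_like(pattern.data)

# overlap[i, j] = number of column indices where both X[i] and X[j] are nonzero.
overlap = (pattern @ pattern.T).toarray()
print(overlap)  # values: [[2, 0, 1], [0, 1, 0], [1, 0, 1]]

# Pairs (i, j) whose intersection is non-empty.
i_idx, j_idx = np.nonzero(overlap)
```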
EDIT:
It was made apparent that I should explain the problem I am trying to solve, so here it is.
I am solving an optimization problem with the equation
W = X diag(theta) X'. I want to compute W quickly each time I update the entries of theta, until convergence. Furthermore, I am updating the parameters with PyTorch, where sparse operations are not as extensive as in SciPy.
where:
X : n x p sparse data matrix (n documents, p features)
theta : p x 1 parameter vector I want to learn and will be updating
X' : p x n transpose of sparse data matrix
note p >> n
I had in mind two methods of solving this quickly:
1. Cache the sparse outer products (see More efficient matrix multiplication with diagonal matrix).
2. Compute W_ij = sum(X_i * theta * X_j), i.e. sum the element-wise product of row i of X, theta, and row j of X. Since X_i and X_j are sparse, I was thinking that if I take the intersection of their nonzero indices, I can do a simple dense element-wise product (sparse element-wise products are not supported in PyTorch) and sum X_i[intersection indices] * theta[intersection indices] * X_j[intersection indices].
I want to vectorize as much of this computation as possible rather than loop as my n is typically in the thousands and p is 11 million.
I am attempting method 2 over method 1 due to the lack of sparse support in PyTorch. In particular, when updating the entries of theta I would like to avoid sparse-dense and sparse-sparse operations and only do dense-dense operations.
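Since n is in the thousands, W itself is a dense n x n matrix that fits comfortably in memory, so another option worth benchmarking is to evaluate W = X diag(theta) X' directly with SciPy's sparse products after each theta update. That needs neither the pairwise intersections nor p cached n x n matrices. A minimal sketch, assuming X is a SciPy CSR matrix and theta a dense 1D NumPy array; compute_W is a hypothetical name, and its speed at p = 11 million would need to be measured.

```python
import numpy as np
from scipy import sparse

def compute_W(X, theta):
    """Return the dense n x n matrix W = X diag(theta) X^T for sparse X of shape (n, p)."""
    # sparse.diags(theta) is a p x p diagonal matrix; X @ D scales column k of X by theta[k].
    D = sparse.diags(theta)
    return (X @ D @ X.T).toarray()

rng = np.random.default_rng(0)
dense = rng.random((5, 50))
dense[dense < 0.9] = 0.0           # keep roughly 10% of the entries
X = sparse.csr_matrix(dense)
theta = rng.random(50)
W = compute_W(X, theta)
```

The dense result can be handed to PyTorch with torch.from_numpy(W) if the rest of the optimization lives there.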
The optimization you're looking at requires storing p different n x n matrices. If you do want to try it, I'd probably use all the functionality built into sparse matrices in scipy's C extensions.
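import numpy as np
from scipy import sparse

arr = sparse.random(100, 10000, format="csr", density=0.01)
xxt = arr @ arr.T
# One n x n outer product per column: p_comps[i] = arr[:, i] @ arr[:, i].T
p_comps = [arr[:, i] @ arr.T[i, :] for i in range(arr.shape[1])]

def calc_weights(xxt, thetas, p_comps):
    # Start from xxt's sparsity structure with all stored values zeroed
    xxt = xxt.copy()
    xxt.data = np.zeros(xxt.data.shape, dtype=xxt.dtype)
    # W = sum_i theta_i * (outer product of column i with itself)
    for i, t in enumerate(thetas):
        xxt += (p_comps[i] * t)
    return xxt

W = calc_weights(xxt, np.ones(10000), p_comps)
print(np.allclose(xxt.toarray(), W.toarray()))  # True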
It's really unlikely that this is going to perform well when implemented in Python. You may have better luck doing this in C, or writing something with nested loops that operates on individual elements and is amenable to JIT compilation with numba.
One first easy solution is to notice that the output matrix is symmetrical:
n = X.shape[0]
intersection_dict = {}
for i in range(n):
    for j in range(i, n):  # note the edit here
        indices = np.intersect1d(X[i].indices, X[j].indices)
        intersection_dict[(i, j)] = indices
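This evaluates n(n+1)/2 pairs instead of n^2, reducing your computation by a factor of a bit less than 2. To retrieve a pair with i > j, look it up as (j, i).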

Python: vectorization over numpy array

X and Y are both 3d arrays with dimensions (a,b,c). My goal is to compute, for every pair of indices (i, j), the dot product of X[i,:,j] and Y[i,:,j].
Consider the case where the indices i and j are scalars: (X[i,:,j].T).dot(Y[i,:,j]) is simple and returns a scalar.
However, if I try to vectorize this, i and j become 1d arrays, and (X[i,:,j].T).dot(Y[i,:,j]) returns a matrix, but I am expecting a 1d array as the result. How do I get around this problem?
Naive implementation using list comprehension:
a,b,c = X.shape
r1 = [(X[i,:,j].T).dot(Y[i,:,j]) for i in range(a) for j in range(c)]
Implementation using np.einsum:
r2 = np.einsum('ijk,ijk->ik', X,Y).flatten()
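A quick check on small random arrays that the loop and the einsum version agree, together with an equivalent broadcasting form (not from the original answer): the signature 'ijk,ijk->ik' multiplies element-wise and sums over the middle axis, which is exactly what (X * Y).sum(axis=1) does. Both keep the (a, c) layout, and flattening in row-major order reproduces the loop's order of i outer, j inner.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 2, 4, 3
X = rng.random((a, b, c))
Y = rng.random((a, b, c))

# Loop version: one dot product per (i, j), with i as the outer loop.
r1 = np.array([X[i, :, j].dot(Y[i, :, j]) for i in range(a) for j in range(c)])
# einsum: the loop's j is einsum's k; the summed middle axis is einsum's j.
r2 = np.einsum('ijk,ijk->ik', X, Y).flatten()
# Same reduction written with broadcasting and sum.
r3 = (X * Y).sum(axis=1).flatten()
```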

Scipy: Sparse indicator matrix from array(s)

What is the most efficient way to compute a sparse boolean matrix I from one or two arrays a,b, with I[i,j]==True where a[i]==b[j]? The following is fast but memory-inefficient:
I = a[:,None]==b
The following is slow and still memory-inefficient during creation:
I = csr((a[:,None]==b),shape=(len(a),len(b)))
The following gives at least the rows,cols for better csr_matrix initialization, but it still creates the full dense matrix and is equally slow:
z = np.argwhere((a[:,None]==b))
Any ideas?
One way to do it would be to first identify all different elements that a and b have in common using sets. This should work well if there are not very many different possibilities for the values in a and b. One then would only have to loop over the different values (below in variable values) and use np.argwhere to identify the indices in a and b where these values occur. The 2D indices of the sparse matrix can then be constructed using np.repeat and np.tile:
import numpy as np
from scipy import sparse
a = np.random.randint(0, 10, size=(400,))
b = np.random.randint(0, 10, size=(300,))
## matrix generation after OP
I1 = sparse.csr_matrix((a[:,None]==b),shape=(len(a),len(b)))
##identifying all values that occur both in a and b:
values = set(np.unique(a)) & set(np.unique(b))
##here we collect the indices in a and b where the respective values are the same:
rows, cols = [], []
##looping over the common values, finding their indices in a and b, and
##generating the 2D indices of the sparse matrix with np.repeat and np.tile
for value in values:
    x = np.argwhere(a == value).ravel()
    y = np.argwhere(b == value).ravel()
    # every index in x is paired with every index in y
    rows.append(np.repeat(x, len(y)))
    cols.append(np.tile(y, len(x)))
##concatenating the indices for different values and generating a 1D vector
##of True values for final matrix generation
rows = np.hstack(rows)
cols = np.hstack(cols)
data = np.ones(len(rows),dtype=bool)
##generating sparse matrix
I3 = sparse.csr_matrix( (data,(rows,cols)), shape=(len(a),len(b)) )
##checking that the matrix was generated correctly:
print((I1 != I3).nnz==0)
The syntax for generating the csr matrix is taken from the documentation. The test for sparse matrix equality is taken from this post.
Old Answer:
I don't know about performance, but at least you can avoid constructing the full dense matrix by using a simple generator expression. Here is some code that uses two 1d arrays of random integers to first generate the sparse matrix the way the OP posted and then uses a generator expression to test all elements for equality:
import numpy as np
from scipy import sparse
a = np.random.randint(0, 10, size=(400,))
b = np.random.randint(0, 10, size=(300,))
## matrix generation after OP
I1 = sparse.csr_matrix((a[:,None]==b),shape=(len(a),len(b)))
## matrix generation using generator
data, rows, cols = zip(
*((True, i, j) for i,A in enumerate(a) for j,B in enumerate(b) if A==B)
)
I2 = sparse.csr_matrix((data, (rows, cols)), shape=(len(a), len(b)))
##testing that matrices are equal
## from https://stackoverflow.com/a/30685839/2454357
print((I1 != I2).nnz==0) ## --> True
I think there is no way around the double loop and ideally this would be pushed into numpy, but at least with the generator the loops are somewhat optimised ...
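A fully vectorized alternative, not taken from the answers above, is to one-hot encode both arrays as sparse matrices and multiply them: entry (i, j) of the product is 1 exactly when a[i] and b[j] hold the same value, and no dense len(a) x len(b) array is ever built. This is a minimal sketch; equality_indicator is a hypothetical name. Memory is proportional to the number of matching pairs, so with few distinct values (as in the 0 to 9 example) the result is itself fairly dense.

```python
import numpy as np
from scipy import sparse

def equality_indicator(a, b):
    """Sparse boolean matrix I with I[i, j] == True exactly where a[i] == b[j]."""
    values = np.union1d(a, b)            # sorted distinct values occurring in a or b
    ia = np.searchsorted(values, a)      # position of each a[i] in `values`
    ib = np.searchsorted(values, b)      # position of each b[j] in `values`
    k = len(values)
    # One-hot encodings: A[i, ia[i]] = 1 and B[j, ib[j]] = 1, one stored entry per row.
    A = sparse.csr_matrix((np.ones(len(a)), (np.arange(len(a)), ia)), shape=(len(a), k))
    B = sparse.csr_matrix((np.ones(len(b)), (np.arange(len(b)), ib)), shape=(len(b), k))
    # (A @ B.T)[i, j] is 1 when a[i] and b[j] map to the same value, not stored otherwise.
    return (A @ B.T).astype(bool).tocsr()

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=400)
b = rng.integers(0, 10, size=300)
I = equality_indicator(a, b)
```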
You could use numpy.isclose with small tolerance:
np.isclose(a,b)
Or, if a and b are pandas objects, pandas.DataFrame.eq:
a.eq(b)
Note that this returns an element-wise array of True/False values.

Creating a sparse matrix from lists of sub matrices (Python)

This is my first SO question ever. Let me know if I could have asked it better :)
I am trying to find a way to splice together lists of sparse matrices into a larger block matrix.
I have python code that generates lists of square sparse matrices, matrix by matrix. In pseudocode:
Lx = [Lx1, Lx2, ... Lxn]
Ly = [Ly1, Ly2, ... Lyn]
Lz = [Lz1, Lz2, ... Lzn]
Since each individual Lx1, Lx2 etc. matrix is computed sequentially, they are appended to a list--I could not find a way to populate an array-like object "on the fly".
I am optimizing for speed, and the bottleneck features a computation of Cartesian products item-by-item, similar to the pseudocode:
M += J[i,j] * [ Lxi*Lxj + Lyi*Lyj + Lzi*Lzj ]
for all combinations of 1 <= i, j <= n (J is an n x n matrix of numbers).
It seems that vectorizing this by computing all the Cartesian products in one step via (pseudocode):
L = [ [Lx1, Lx2, ...Lxn],
[Ly1, Ly2, ...Lyn],
[Lz1, Lz2, ...Lzn] ]
product = L.T * L
would be faster. However, options such as np.bmat, np.vstack, np.hstack seem to require arrays as inputs, and I have lists instead.
Is there a way to efficiently splice the three lists of matrices together into a block? Or, is there a way to generate an array of sparse matrices one element at a time and then np.vstack them together?
Reference: Similar MATLAB code, used to compute the Hamiltonian matrix for n-spin NMR simulation, can be found here:
http://spindynamics.org/Spin-Dynamics---Part-II---Lecture-06.php
This is scipy.sparse.bmat:
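import scipy.sparse

L = scipy.sparse.bmat([Lx, Ly, Lz], format='csc')
# In Python 3 zip returns an iterator, so materialize it as a list of block rows
LT = scipy.sparse.bmat(list(zip(Lx, Ly, Lz)), format='csr')  # Not equivalent to L.T
product = LT @ L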
I have a "vectorized" solution, but it's almost twice as slow as the original code. Both the bottleneck shown above, and the final dot product shown in the last line below, take about 95% of the calculation time according to kernprof tests.
from scipy.sparse import bmat

# Create the matrix of column vectors from these lists
L_column = bmat([Lx, Ly, Lz], format='csc')
# Create the matrix of row vectors (via a transpose of matrix with
# transposed blocks)
Lx_trans = [x.T for x in Lx]
Ly_trans = [y.T for y in Ly]
Lz_trans = [z.T for z in Lz]
L_row = bmat([Lx_trans, Ly_trans, Lz_trans], format='csr').T
product = L_row @ L_column
I was able to get a tenfold speed increase by not using sparse matrices and using an array of arrays.
Lx = np.empty((1, nspins), dtype='object')
Ly = np.empty((1, nspins), dtype='object')
Lz = np.empty((1, nspins), dtype='object')
These are populated with the individual Lx arrays (formerly sparse matrices) as they are generated. Using the array structure allows the transpose and Cartesian product to perform as desired:
Lcol = np.vstack((Lx, Ly, Lz)).real
Lrow = Lcol.T # As opposed to sparse version of code, this works!
Lproduct = np.dot(Lrow, Lcol)
The individual Lx[n] matrices are still "bundled", so Lproduct is an n x n (object) array. This means element-wise multiplication of the n x n J array with Lproduct works:
scalars = np.multiply(J, Lproduct)
Each matrix element is then added to the final Hamiltonian matrix:
for n in range(nspins):
    for m in range(nspins):
        M += scalars[n, m].real
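For comparison, the double sum M += J[i,j] * (Lxi Lxj + Lyi Lyj + Lzi Lzj) can also be written as block products with no Python loop over (i, j). If R = [L1, ..., Ln] (side by side), C = [L1; ...; Ln] (stacked), and kron(J, I) is the block matrix whose (i, j) block is J[i, j] times the identity, then R kron(J, I) C equals the sum over i, j of J[i, j] Li Lj. This is a minimal sketch in scipy.sparse, assuming every Li is a d x d sparse matrix and that the products are ordinary matrix products; hamiltonian_sum is a hypothetical name, and its speed relative to the approaches above would need to be measured.

```python
import numpy as np
from scipy import sparse

def hamiltonian_sum(Lx, Ly, Lz, J):
    """M = sum_{i,j} J[i,j] (Lx[i] Lx[j] + Ly[i] Ly[j] + Lz[i] Lz[j]) via block products."""
    d = Lx[0].shape[0]
    # kron(J, I_d) is the block matrix whose (i, j) block is J[i, j] * I_d.
    JI = sparse.kron(sparse.csr_matrix(J), sparse.identity(d), format='csr')
    M = sparse.csr_matrix((d, d))
    for Ls in (Lx, Ly, Lz):                      # three iterations, one per axis
        row = sparse.hstack(Ls, format='csr')    # [L1, ..., Ln], shape (d, n*d)
        col = sparse.vstack(Ls, format='csr')    # [L1; ...; Ln], shape (n*d, d)
        M = M + row @ JI @ col
    return M

rng = np.random.default_rng(0)
n, d = 3, 4
Lx = [sparse.csr_matrix(rng.random((d, d)) * (rng.random((d, d)) > 0.5)) for _ in range(n)]
Ly = [sparse.csr_matrix(rng.random((d, d)) * (rng.random((d, d)) > 0.5)) for _ in range(n)]
Lz = [sparse.csr_matrix(rng.random((d, d)) * (rng.random((d, d)) > 0.5)) for _ in range(n)]
J = rng.random((n, n))
M = hamiltonian_sum(Lx, Ly, Lz, J)
```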
