Efficient way to fill NumPy array for independent entries? - python

I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i, j] = f(x[i], x[j])
While this works, the resulting matrix is 10,000 by 10,000 and takes a very long time to calculate. Is there a more efficient way to do this built into NumPy?
EDIT: The function in question here is a Gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec) / (2 * sigma**2))
where I set sigma in advance before calculating the matrix.
The array x has shape (10000, 8), so the dot product in the Gaussian is between two vectors of dimension 8.

You can use a single for loop together with broadcasting. This requires changing the gaussian function so that it accepts 2D inputs (note that sigma must also be passed in):
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1) / (2 * sigma**2))
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    K[i] = gaussian(x[i:i+1], x, sigma)
Theoretically you could do this without any for loop, again by using broadcasting, but this creates an intermediate array of size len(x)**2 * x.shape[1], which might run out of memory for your array sizes (for float64 data, 10,000 × 10,000 × 8 values take about 6.4 GB):
K = gaussian(x[None, :, :], x[:, None, :], sigma)
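A further option, not taken from the answer above, avoids both the Python loop and the (N, N, d) intermediate by expanding the squared distance as ||a - b||² = ||a||² + ||b||² - 2 a·b, so the heavy lifting becomes a single (N, d) × (d, N) matrix product. This is a minimal sketch, assuming x is a numeric array of shape (N, d) and sigma is a positive scalar; the name gaussian_kernel_matrix is hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(x, sigma):
    """Hypothetical helper: K[i, j] = exp(-||x[i] - x[j]||**2 / (2 * sigma**2))."""
    x = np.asarray(x)
    sq_norms = np.einsum("ij,ij->i", x, x)  # ||x[i]||**2 for every row
    # ||a - b||**2 = ||a||**2 + ||b||**2 - 2 a.b, evaluated for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * (x @ x.T)
    np.maximum(sq_dists, 0, out=sq_dists)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2 * sigma**2))

# Small check against the double loop from the question
rng = np.random.default_rng(0)
x_small = rng.normal(size=(50, 8))
sigma = 1.5
K_loop = np.array([[np.exp(-np.dot(a - b, a - b) / (2 * sigma**2)) for b in x_small]
                   for a in x_small])
K_fast = gaussian_kernel_matrix(x_small, sigma)
print(np.allclose(K_loop, K_fast))
```

scipy.spatial.distance.cdist(x, x, 'sqeuclidean') computes the same squared distances directly. Either way, each 10,000 × 10,000 float64 temporary takes 800 MB, so passing float32 data or processing blocks of rows may still be necessary. The expansion can lose a little precision for nearly identical points, which is why negative values are clipped.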

Related

Summing vector pairs efficiently in pytorch

I'm trying to calculate the summation of each pair of rows in a matrix. Suppose I have an m x n matrix, say one like
[[1,2,3],
[4,5,6],
[7,8,9]]
and I want to create a matrix of the summations of all pairs of rows. So, for the above matrix, we would want
[[5,7,9],
[8,10,12],
[11,13,15]]
In general, I think the new matrix will be (m choose 2) x n. For the above example in pytorch, I ran
import torch
x = torch.tensor([[1,2,3], [4,5,6], [7,8,9]])
y = x[None] + x[:, None]
torch.cat((y[0, 1:3, :], y[1, 2:3, :]))
which manually creates the matrix I am looking for. However, I am struggling to think of a way to create the output without manually specifying indices and without using a for-loop. Is there even a way to create such a matrix for an arbitrary matrix without the use of a for-loop?
You can try using this function:
def sum_rows(x):
    y = x[None] + x[:, None]
    ind = torch.tril_indices(x.shape[0], x.shape[0], offset=-1)
    return y[ind[0], ind[1]]
Because you want the pairs y[i, j] with i > j (i < j would work just as well), you can select the lower (or upper) triangle indices over the first two dimensions of the 3D tensor y. There is no explicit Python loop, and the function works for inputs of any size.
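If the full (m, m, n) tensor y is too large, a variant (not from the answer above) indexes the rows directly, so only the (m choose 2, n) result is ever materialized. The sketch is written in NumPy so it runs without PyTorch; torch.tril_indices returns the same kind of (2, k) index pair, and torch tensors accept the same integer-array indexing. The name sum_row_pairs is hypothetical.

```python
import numpy as np

def sum_row_pairs(x):
    """Hypothetical helper: sums of every pair of distinct rows, shape (m*(m-1)//2, n)."""
    m = x.shape[0]
    i, j = np.tril_indices(m, k=-1)  # all (i, j) with i > j, in row-major order
    return x[i] + x[j]               # gather the two rows of each pair and add them

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sum_row_pairs(x))
# [[ 5  7  9]
#  [ 8 10 12]
#  [11 13 15]]
```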

Vectorize sampling from a multidimensional array [duplicate]

This question already has an answer here:
How to draw a sample from a categorical distribution
(1 answer)
Closed 1 year ago.
I have a numpy array of shape D x N x K.
I need a resulting D x N array of random elements out of K classes, where for each index [d, n] there is a different probability vector for the classes, indicated by the third axis.
The NumPy documentation for np.random.choice only allows a 1D array of probabilities.
Can I vectorize this type of sampling, or do I have to use a for loop as follows:
# input_array of shape (D, N, K)
# output_array of shape (D, N)
D, N, K = input_array.shape
output_array = np.empty((D, N), dtype=int)
for d in range(D):
    for n in range(N):
        probabilities = input_array[d, n]
        element = np.random.choice(K, p=probabilities)
        output_array[d, n] = element
I would love a function such as
output_array = np.random.choice(input_array, K, probability_axis=-1)
Edit: Managed to find a "hand engineered" solution here.
Neither np.random.choice nor np.random.default_rng().choice support broadcasting of probabilities in the way that you intend. However, you can cobble together something that works similarly using np.cumsum:
sprob = input_array.cumsum(axis=-1, dtype=float)
sprob /= sprob[:, :, -1:]
output_array = (np.random.rand(D, N, 1) > sprob).argmin(-1)
Unfortunately, np.searchsorted does not support multi-dimensional lookup either (probably for related reasons).
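The cumulative-sum trick can be packaged as a reusable function. This is a minimal sketch, assuming input_array holds non-negative weights with a positive sum along the last axis (they need not sum to 1); it uses NumPy's Generator API, and the name sample_last_axis is hypothetical. Counting how many cumulative probabilities lie strictly below the uniform draw gives the same class index as the argmin expression above; drawing u from (0, 1] instead of [0, 1) also guarantees that zero-probability classes are never picked.

```python
import numpy as np

def sample_last_axis(weights, rng=None):
    """Hypothetical helper: draw one class index per probability vector along the last axis."""
    rng = np.random.default_rng() if rng is None else rng
    cdf = np.cumsum(weights, axis=-1, dtype=float)
    cdf /= cdf[..., -1:]                         # normalize each row so its last entry is 1.0
    u = 1.0 - rng.random(cdf.shape[:-1] + (1,))  # uniform in (0, 1], one draw per row
    # The sampled class is the number of CDF entries strictly below u
    return (cdf < u).sum(axis=-1)

rng = np.random.default_rng(42)
D, N, K = 4, 5, 3
input_array = rng.random((D, N, K))
output_array = sample_last_axis(input_array, rng)
print(output_array.shape)  # (4, 5)
```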

Numpy vectorized implementation of term-by-term division in a matrix

I have one matrix and one vector, of dimensions (N, d) and (N,) respectively. For each row, I want to divide each element by the corresponding value in the vector. I was wondering if there was a vectorized implementation (to save computation time). (I'm trying to create points on the surface of a d-dimensional sphere.) Right now I'm doing this:
x = np.random.randn(N, d)
norm = np.linalg.norm(x, axis=1)
for i in range(N):
    for j in range(d):
        x[i][j] = x[i][j] / norm[i]
np.linalg.norm has a keepdims argument just for this:
x /= np.linalg.norm(x, axis=1, keepdims=True)
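For the stated goal (points on the unit sphere in d dimensions), the whole computation fits in a few lines. keepdims=True keeps the norms as an (N, 1) column that broadcasts across each row; dividing by norm[:, None] would do the same. Normalizing isotropic Gaussian samples gives points uniformly distributed on the sphere. This sketch uses NumPy's Generator API, and the name random_unit_vectors is hypothetical.

```python
import numpy as np

def random_unit_vectors(N, d, rng=None):
    """Hypothetical helper: N points drawn uniformly on the unit sphere in R^d."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((N, d))                # isotropic Gaussian samples
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # (N, 1) norms broadcast over columns
    return x

pts = random_unit_vectors(1000, 5, np.random.default_rng(0))
print(pts.shape)  # (1000, 5)
```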

How to efficiently calculate pairwise intersection of nonzero indices in a scipy.csr sparse matrix?

I have a scipy.sparse.csr matrix X which is n x p. For each row in X, I would like to compute the intersection of its nonzero element indices with those of every row in X, and store them in a new tensor or maybe even a dictionary. For example, X is:
X = [
[0., 1.5, 4.7],
[4., 0., 0.],
[0., 0., 2.6]
]
I would like the output to be
intersect =
[
[[1,2], [], [2]],
[[], [0], []],
[[2], [], [2]]
]
intersect[i,j] is an ndarray holding the intersection of the indices of the nonzero elements of the ith and jth rows of X, i.e. X[i] and X[j].
Currently I do this with a loop. I would like to vectorize it, since that should be much faster and the pairs are independent, so they could be computed in parallel.
# current code
n = X.shape[0]
intersection_dict = {}
for i in range(n):
    for j in range(n):
        indices = np.intersect1d(X[i].indices, X[j].indices)
        intersection_dict[(i, j)] = indices
My n is quite large, so looping over all n^2 pairs is very slow. I am having trouble figuring out a way to vectorize this operation. Does anybody have any ideas on how to tackle this?
EDIT:
It was made apparent that I should explain the problem I am trying to solve, so here it is.
I am solving an optimization problem with the equation
W = X diag(theta) X'
where:
X : n x p sparse data matrix (n documents, p features)
theta : p x 1 parameter vector I want to learn and will be updating
X' : p x n transpose of sparse data matrix
note p >> n
I want to compute W quickly each time I update the entries of theta, until convergence. I am also updating the parameters with PyTorch, whose sparse operations are not as extensive as SciPy's.
I had two methods in mind for solving this quickly:
1. Cache the sparse outer products of the columns of X (see More efficient matrix multiplication with diagonal matrix), so that W becomes a theta-weighted sum of cached matrices.
2. Compute W_ij = sum(X_i * theta * X_j), the sum of the elementwise product of row i of X, theta, and row j of X. Since X_i and X_j are sparse, I was thinking that if I take the intersection of their nonzero indices, I can compute a simple dense elementwise product (sparse elementwise products are not supported in PyTorch), X_i[intersection indices] * theta[intersection indices] * X_j[intersection indices], and sum it.
I want to vectorize as much of this computation as possible rather than loop, since n is typically in the thousands and p is 11 million.
I am attempting method 2 over method 1 due to the lack of sparse support in PyTorch. In particular, when updating the entries of theta I would rather not do sparse-dense or sparse-sparse operations; I want to do dense-dense operations.
The optimization you're looking at requires storing p different n x n matrices. If you do want to try it, I'd probably use all the functionality built into sparse matrices in scipy's C extensions.
import numpy as np
from scipy import sparse
arr = sparse.random(100, 10000, format="csr", density=0.01)
xxt = arr @ arr.T
# One n x n outer product per column of arr
p_comps = [arr[:, i] @ arr.T[i, :] for i in range(arr.shape[1])]
def calc_weights(xxt, thetas, p_comps):
    xxt = xxt.copy()
    xxt.data = np.zeros(xxt.data.shape, dtype=xxt.dtype)
    for i, t in enumerate(thetas):
        xxt += (p_comps[i] * t)
    return xxt
W = calc_weights(xxt, np.ones(10000), p_comps)
>>> np.allclose(xxt.toarray(), W.toarray())
True
It's really unlikely that this is going to perform well implemented in Python. You may have better luck doing this in C, or writing something with nested loops that operates on individual elements and is amenable to JIT compilation with Numba.
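Since only theta changes between iterations, another option (not from the answers here) is to recompute W = X diag(theta) X' directly with sparse products on each update: scale the columns of X by theta, then multiply by X'. This is a minimal SciPy sketch; compute_W is a hypothetical name, the sizes are toy values rather than the p of 11 million from the question, and whether it is fast enough depends on how many nonzeros X X' has.

```python
import numpy as np
from scipy import sparse

def compute_W(X, theta):
    """Hypothetical helper: W = X @ diag(theta) @ X.T for a sparse X."""
    # Right-multiplying by a diagonal matrix scales column k of X by theta[k]
    X_scaled = X @ sparse.diags(theta)
    return (X_scaled @ X.T).tocsr()

rng = np.random.default_rng(0)
X = sparse.random(50, 2000, density=0.02, format="csr", random_state=0)
theta = rng.random(2000)
W = compute_W(X, theta)
# Dense reference: broadcasting theta over the columns is the same as X @ diag(theta)
dense_W = (X.toarray() * theta) @ X.toarray().T
print(np.allclose(W.toarray(), dense_W))
```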
A first, easy improvement is to notice that the output is symmetric, so only the pairs with j >= i need to be computed:
n = X.shape[0]
intersection_dict = {}
for i in range(n):
    for j in range(i, n):  # note the edit here
        indices = np.intersect1d(X[i].indices, X[j].indices)
        intersection_dict[(i, j)] = indices
This cuts the computation by a bit less than a factor of 2 (n(n+1)/2 pairs instead of n^2); the result for (j, i) is the same as for (i, j).
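Building on the symmetry trick: if most pairs of rows share no nonzero column at all, those pairs can be skipped without ever calling np.intersect1d. With B equal to X with its stored values replaced by ones, the sparsity pattern of B @ B.T contains exactly the pairs whose intersection is non-empty (explicitly stored zeros in X count as nonzero, consistent with X[i].indices). This is a sketch under those assumptions; pairwise_intersections is a hypothetical name, and any pair missing from the returned dict has an empty intersection. It still loops in Python over the non-empty pairs, so it helps most when X is very sparse.

```python
import numpy as np
from scipy import sparse

def pairwise_intersections(X):
    """Hypothetical helper: {(i, j): shared column indices} for i <= j, non-empty pairs only."""
    X = sparse.csr_matrix(X)
    B = X.copy()
    B.data = np.ones_like(B.data)                 # 1 wherever X stores an entry
    overlap = sparse.triu(B @ B.T, format="coo")  # (i, j) stored only if rows i and j overlap
    result = {}
    for i, j in zip(overlap.row, overlap.col):
        cols_i = X.indices[X.indptr[i]:X.indptr[i + 1]]
        cols_j = X.indices[X.indptr[j]:X.indptr[j + 1]]
        result[(int(i), int(j))] = np.intersect1d(cols_i, cols_j)
    return result

X = np.array([[0., 1.5, 4.7],
              [4., 0., 0.],
              [0., 0., 2.6]])
for key, cols in sorted(pairwise_intersections(X).items()):
    print(key, cols)
```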

Creating a sparse matrix from lists of sub matrices (Python)

This is my first SO question ever. Let me know if I could have asked it better :)
I am trying to find a way to splice together lists of sparse matrices into a larger block matrix.
I have python code that generates lists of square sparse matrices, matrix by matrix. In pseudocode:
Lx = [Lx1, Lx2, ... Lxn]
Ly = [Ly1, Ly2, ... Lyn]
Lz = [Lz1, Lz2, ... Lzn]
Since each individual Lx1, Lx2, etc. matrix is computed sequentially, they are appended to a list; I could not find a way to populate an array-like object "on the fly".
I am optimizing for speed, and the bottleneck features a computation of Cartesian products item-by-item, similar to the pseudocode:
M += J[i,j] * [ Lxi*Lxj + Lyi*Lyj + Lzi*Lzj ]
for all combinations of 0 <= i, j <= n (J is an n x n matrix of numbers).
It seems that vectorizing this by computing all the Cartesian products in one step via (pseudocode):
L = [ [Lx1, Lx2, ...Lxn],
[Ly1, Ly2, ...Lyn],
[Lz1, Lz2, ...Lzn] ]
product = L.T * L
would be faster. However, options such as np.bmat, np.vstack, np.hstack seem to require arrays as inputs, and I have lists instead.
Is there a way to efficiently splice the three lists of matrices together into a block? Or, is there a way to generate an array of sparse matrices one element at a time and then np.vstack them together?
Reference: Similar MATLAB code, used to compute the Hamiltonian matrix for n-spin NMR simulation, can be found here:
http://spindynamics.org/Spin-Dynamics---Part-II---Lecture-06.php
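This is what scipy.sparse.bmat is for:
import scipy.sparse
L = scipy.sparse.bmat([Lx, Ly, Lz], format='csc')
# In Python 3, zip returns an iterator, so build an explicit list of block rows
LT = scipy.sparse.bmat([list(row) for row in zip(Lx, Ly, Lz)], format='csr')  # Not equivalent to L.T
product = LT @ L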
I have a "vectorized" solution, but it's almost twice as slow as the original code. According to kernprof, the bottleneck shown above (in the original code) and the final dot product in the last line below (in the vectorized code) each take about 95% of the calculation time.
# Create the matrix of column vectors from these lists
L_column = bmat([Lx, Ly, Lz], format='csc')
# Create the matrix of row vectors (via a transpose of matrix with
# transposed blocks)
Lx_trans = [x.T for x in Lx]
Ly_trans = [y.T for y in Ly]
Lz_trans = [z.T for z in Lz]
L_row = bmat([Lx_trans, Ly_trans, Lz_trans], format='csr').T
product = L_row * L_column
I was able to get a tenfold speed increase by not using sparse matrices and using an array of arrays.
Lx = np.empty((1, nspins), dtype='object')
Ly = np.empty((1, nspins), dtype='object')
Lz = np.empty((1, nspins), dtype='object')
These are populated with the individual Lx arrays (formerly sparse matrices) as they are generated. Using the array structure allows the transpose and Cartesian product to perform as desired:
Lcol = np.vstack((Lx, Ly, Lz)).real
Lrow = Lcol.T # As opposed to sparse version of code, this works!
Lproduct = np.dot(Lrow, Lcol)
The individual Lx[n] matrices are still "bundled", so Lproduct is an n x n array (of matrices). This means element-wise multiplication of the n x n J array with Lproduct works:
scalars = np.multiply(J, Lproduct)
Each matrix element is then added to the final Hamiltonian matrix:
for n in range(nspins):
    for m in range(nspins):
        M += scalars[n, m].real
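If the individual spin operators are kept as dense arrays stacked along a leading axis, the whole double sum M = sum over i, j of J[i,j] (Lxi Lxj + Lyi Lyj + Lzi Lzj) can be written as one np.einsum call per component. This is a sketch under that assumption: Lx, Ly and Lz are hypothetical arrays of shape (n, s, s) (n operators of size s × s) and J has shape (n, n). It is an alternative to the object-array approach above, not a reproduction of it.

```python
import numpy as np

def coupling_sum(J, Lx, Ly, Lz):
    """Hypothetical helper: sum_ij J[i, j] * (Lx[i] @ Lx[j] + Ly[i] @ Ly[j] + Lz[i] @ Lz[j])."""
    M = np.zeros(Lx.shape[1:], dtype=np.result_type(J, Lx, Ly, Lz))
    for L in (Lx, Ly, Lz):
        # 'ij,iab,jbc->ac': weight each product L[i] @ L[j] by J[i, j], then sum over i and j
        M += np.einsum("ij,iab,jbc->ac", J, L, L, optimize=True)
    return M

rng = np.random.default_rng(1)
n, s = 3, 4
J = rng.random((n, n))
Lx, Ly, Lz = (rng.random((n, s, s)) for _ in range(3))
M = coupling_sum(J, Lx, Ly, Lz)
```

Passing optimize=True lets einsum choose a contraction order instead of materializing every (i, j) product separately.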
