Sparse matrix multiplication when results' sparsity is known (in python|scipy|cython)

Suppose we want to compute C=A*B for given sparse matrices A,B but are interested in a very small subset of entries of C, represented by a list of index pairs:
rows=[i1, i2, i3 ... ]
cols=[j1, j2, j3 ... ]
Both A and B are quite large (say 50Kx50K), but very sparse (<1% of entries are non-zero).
How can we compute this subset of the multiplication?
Here's a naive implementation that works really slow:
def naive(A, B, rows, cols):
    N = len(rows)
    vals = []
    for n in range(N):
        v = A.getrow(rows[n]) * B.getcol(cols[n])
        vals.append(v[0, 0])
    R = sps.coo_matrix((np.array(vals), (np.array(rows), np.array(cols))),
                       shape=(A.shape[0], B.shape[1]), dtype=np.float64)
    return R
Even for small matrices this is quite bad:
import scipy.sparse as sps
import numpy as np
D = 1000
A = np.random.randn(D, D)
A[np.abs(A) > 0.1] = 0
A = sps.csr_matrix(A)
B = np.random.randn(D, D)
B[np.abs(B) > 0.1] = 0
B = sps.csr_matrix(B)
X = np.random.randn(D, D)
X[np.abs(X) > 0.1] = 0
X[X != 0] = 1
X = sps.csr_matrix(X)
rows, cols = X.nonzero()
naive(A, B, rows, cols)
On my machine, naive() finishes after 1 minute, and most of the effort is spent on structuring the rows/cols (in getrow(), getcol()).
Of course, if we convert this (very small) example to dense matrices, the computation takes only about 100 ms:
A0 = np.array(A.todense())
B0 = np.array(B.todense())
X0 = np.array(X.todense())
A0.dot(B0) * X0
Any thoughts on how to efficiently compute such matrix multiplication?
Note: This question is almost identical to the following question:
Subset of a matrix multiplication, fast, and sparse
However, there, A and B are full (dense) matrices, and one of the dimensions is very small (say, 10); the proposed solutions seem to benefit from both of these properties.

The format of your sparse matrices is important here. You always need a row-oriented form of A and a column-oriented form of B, so store A as CSR and B as CSC to get rid of the getrow/getcol overhead. Unfortunately, this is only a small part of the story.
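As a minimal sketch of the matched-formats idea (the function name subset_product is mine; this is still the naive per-element loop, just without format-conversion overhead inside it):
import numpy as np
import scipy.sparse as sps

def subset_product(A, B, rows, cols):
    A = sps.csr_matrix(A)  # cheap row slicing
    B = sps.csc_matrix(B)  # cheap column slicing
    vals = [(A.getrow(r) * B.getcol(c))[0, 0] for r, c in zip(rows, cols)]
    return sps.coo_matrix((vals, (rows, cols)), shape=(A.shape[0], B.shape[1]))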
The best solution depends a lot on the structure of your sparse matrices (many sparse columns/rows, etc.), but you might try one based on dictionaries and sets. For matrix A, the following are kept for each row:
a set with all non-zero column indices on that row
a dictionary with the non-zero indices as keys and the corresponding non-zero values as values
For matrix B similar dicts and sets are kept for each column.
To calculate element (M, N) in the multiplication result, row M of A is multiplied with column N of B. The multiplication:
find the set intersection of the non-zero sets
calculate the sum of products of the corresponding non-zero elements (i.e. over the intersection above)
In most cases this should be very fast, as in a sparse matrix the set intersection is usually very small.
Some code:
class rowarray:
    def __init__(self, arr):
        self.rows = []
        for row in arr:
            nonzeros = np.nonzero(row)[0]
            nzvalues = {i: row[i] for i in nonzeros}
            self.rows.append((set(nonzeros), nzvalues))

    def __getitem__(self, key):
        return self.rows[key]

    def __len__(self):
        return len(self.rows)

class colarray(rowarray):
    def __init__(self, arr):
        rowarray.__init__(self, arr.T)
def maybe_less_naive(A, B, rows, cols):
    N = len(rows)
    vals = []
    for n in range(N):
        nz1, v1 = A[rows[n]]
        nz2, v2 = B[cols[n]]
        # set of common non-zero indices
        nz = nz1.intersection(nz2)
        # dot product over the common non-zeros only
        vals.append(sum(v1[i] * v2[i] for i in nz))
    R = sps.coo_matrix((np.array(vals), (np.array(rows), np.array(cols))),
                       shape=(len(A), len(B)), dtype=np.float64)
    return R
D = 1000
Ap = np.random.randn(D, D)
Ap[np.abs(Ap) > 0.1] = 0
A = rowarray(Ap)
Bp = np.random.randn(D, D)
Bp[np.abs(Bp) > 0.1] = 0
B = colarray(Bp)
X = np.random.randn(D, D)
X[np.abs(X) > 0.1] = 0
X[X != 0] = 1
X = sps.csr_matrix(X)
rows, cols = X.nonzero()
maybe_less_naive(A, B, rows, cols)
This is a bit more efficient: the multiplication takes approximately 2 seconds for the test (80,000 elements). The results appear to be essentially the same.
A few comments on the performance.
There are two operations performed for each output element:
set intersection
multiplication
The complexity of set intersection should be O(min(m,n)), where m and n are the numbers of non-zeros in each operand. This is independent of the size of the matrix; only the average number of non-zeros per row/column matters.
The number of multiplications (and dict lookups) depends on the number of non-zeros found in the intersection above.
If both matrices have randomly distributed non-zeros with probability (density) p, and the row/column length is n, then:
set intersection: O(np)
dictionary lookup, multiplication: O(np^2)
This shows that with really sparse matrices finding the intersections is the critical point. This can also be verified by profiling; most of the time is spent calculating the intersections.
When this is reflected to the real world, we seem to spend around 20 us per row/column of 80 non-zeros. This is not blindingly fast, and the code can certainly be made faster. Cython may be one solution, but this may be one of the problems where Python is not the best possible tool. A simple linear matching (merge-sort-style algorithm) over sorted integers should be at least an order of magnitude faster when written in C.
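For reference, here is that linear matching sketched in pure Python (a C or Cython version would have the same shape, only faster); it assumes both index arrays are sorted, which scipy guarantees after calling csr_matrix.sort_indices():
def merge_dot(idx1, val1, idx2, val2):
    # merge-sort-style walk over two sorted index arrays, multiplying on matches
    i = j = 0
    s = 0.0
    while i < len(idx1) and j < len(idx2):
        if idx1[i] < idx2[j]:
            i += 1
        elif idx1[i] > idx2[j]:
            j += 1
        else:
            s += val1[i] * val2[j]
            i += 1
            j += 1
    return s
With A in CSR form, row r is A.indices[A.indptr[r]:A.indptr[r+1]] with values A.data[A.indptr[r]:A.indptr[r+1]], and similarly for a column of a CSC matrix.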
One important thing to note is that the algorithm can be run in parallel for several elements at a time. There is no need to settle for a single thread, as the calculations are independent as long as each worker handles its own output points.
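A minimal multiprocessing sketch of that idea; the worker below reuses the simple getrow/getcol product just to show the structure (note that each task pickles A and B whole, so in practice you would share them or ship precomputed row/column data instead):
import numpy as np
import scipy.sparse as sps
from multiprocessing import Pool

def _chunk(args):
    # worker: compute the values for one chunk of (row, col) pairs
    A, B, rows, cols = args
    return [(A.getrow(r) * B.getcol(c))[0, 0] for r, c in zip(rows, cols)]

def parallel_subset(A, B, rows, cols, nproc=4):
    parts = np.array_split(np.arange(len(rows)), nproc)
    tasks = [(A, B, rows[p], cols[p]) for p in parts]
    with Pool(nproc) as pool:
        chunks = pool.map(_chunk, tasks)
    vals = [v for chunk in chunks for v in chunk]
    return sps.coo_matrix((vals, (rows, cols)),
                          shape=(A.shape[0], B.shape[1]))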

Related

How to efficiently find separately for each element N maximum values among multiple matrices?

I am looping through a large number of H x W matrices. I cannot store them all in memory. From them I need to produce N result matrices. For example, the element at position (i, j) of the 1st of the N matrices will be the largest among all elements at position (i, j) across all processed matrices; the 2nd matrix takes the second-largest elements at each position, and so on.
Example: let N = 2. (The original post illustrated the resulting 1st and 2nd matrices with images.)
How to do such an operation inside a loop so as not to store all matrices in memory?
The comments suggested using the np.partition function. I replaced numpy with cupy, which runs on the GPU, and also added a buffer so that partitioning happens less frequently.
import cupy as np  # drop-in numpy replacement that runs on the GPU

buf = ...  # as much as fits into the GPU
largests = np.zeros((buf + N, h, w))
for i in range(num):
    val = ...  # the next streamed H x W matrix
    largests[i % buf] = val
    if i % buf == buf - 1:
        # move the N largest values (per position) into the tail slots
        largests.partition(range(buf, buf + N), axis=0)
largests.partition(range(buf, buf + N), axis=0)  # let's not forget the tail
res = largests[:-(N + 1):-1]
The solution does not work very quickly, but I have come to terms with this speed.
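For anyone who wants to try the buffered-partition idea without a GPU, here is a runnable NumPy-only sketch; the toy sizes and the random matrices are my stand-ins for the real stream:
import numpy as np

N, h, w, num = 2, 3, 4, 20  # toy sizes
buf = 8                     # tune to available memory
rng = np.random.default_rng(0)
largests = np.full((buf + N, h, w), -np.inf)  # -inf padding keeps a partial buffer harmless
for i in range(num):
    val = rng.random((h, w))  # stand-in for the i-th streamed matrix
    largests[i % buf] = val
    if i % buf == buf - 1:
        largests.partition(range(buf, buf + N), axis=0)
largests.partition(range(buf, buf + N), axis=0)  # flush the remainder
res = largests[:-(N + 1):-1]  # res[0] is the elementwise max, res[1] the 2nd largest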

Generate an Asymmetric NxM matrix whose Rows and Columns Independently Sum to 1

Given a target matrix size of N rows and M columns, is it possible to pick values such that all rows and columns sum to 1, on the condition that the matrix is not symmetric across the diagonal? Here's a target matrix I was able to generate when N==M (The problems arise when N!=M - see below):
[[0.08345877 0.12844672 0.90911941 0.41964704 0.57709569]
[0.53949086 0.07965491 0.62582134 0.48922244 0.38357809]
[0.80619328 0.27581426 0.31312973 0.26855717 0.4540732 ]
[0.11803505 0.88201276 0.1990759 0.2818701 0.63677383]
[0.57058968 0.75183898 0.07062126 0.6584709 0.06624682]]
I'm writing this in numpy. Currently I've written the following (brute force) code, which I know works when n == m. However, if n != m, the rowwise and columnwise sums do not both converge to 1, and the ratio of rowwise sums to columnwise sums converges to (n/m):
import numpy as np

n, m = (5, 4)
mat = np.random.random((n, m))
for i in range(100):
    s0 = mat.sum(0)
    s1 = mat.sum(1)[:, np.newaxis]
    mat = mat / s0
    mat = mat / s1
    if i % 10 == 0:
        print(s0[0] / s1[0, 0])
The final output in this case is 1.25 (i.e. n/m, or 5/4). I'm beginning to think this might not be mathematically possible. Can someone prove me wrong?
I suspect you are correct, the problem cannot be solved if N != M.
Take a 2x3 matrix as an example:
[[a b c]
[d e f]]
Assume that all rows and all columns sum to 1 and show a contradiction. The rows sum to 1 so:
a+b+c = 1
d+e+f = 1
This gives:
(a+b+c)+(d+e+f) = 1 + 1 = 2
Now look at the columns. Each column also sums to 1 so we have:
a+d = 1
b+e = 1
c+f = 1
Combining the three column equations gives:
(a+d)+(b+e)+(c+f) = 1 + 1 + 1 = 3
Since the sum of all six matrix elements cannot be 2 and 3 at the same time, the initial assumption leads to a contradiction and so is disproved. More generally, the problem cannot be solved for N != M with N rows and M columns.
The contradiction disappears when N = M for a square matrix.
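The same counting argument works for any shape; in LaTeX, summing all entries first by rows and then by columns gives
\sum_{i=1}^{N}\sum_{j=1}^{M} a_{ij} = \sum_{i=1}^{N} 1 = N
\qquad\text{and}\qquad
\sum_{j=1}^{M}\sum_{i=1}^{N} a_{ij} = \sum_{j=1}^{M} 1 = M,
and since both expressions are the total sum of the same matrix, the constraints force N = M.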

Updating python numpy array columns multiple times

Suppose I have a 2x3 matrix A:
1 2 3
4 5 6
and a vector y of length 4:
0 1 2 1
as well as another 4x2 matrix B:
0 0
1 1
2 2
3 3
I want to update the columns of A multiple times by adding from rows of B.
And the index of columns of A to be updated is given by y.
Using for loops, this can be done as:
for i in np.arange(4):
    A[:, y[i]] += B[i, :]
I had implemented this using ufunc.at as:
np.add.at(A.T,y,B)
However, the performance of ufunc.at is almost as bad as using for loops.
How can I get a different vectorized implementation?
Updating with A[:,y] += B.T does not work either: because of NumPy's buffering of fancy-indexed assignments, each column is updated only once even when y contains duplicate indices.
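To see that buffering effect concretely, here is a small check using the arrays from the question; the fancy-indexed update loses one of the two additions to column 1, while np.add.at keeps both:
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([0, 1, 2, 1])
B = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])

C = A.copy()
C[:, y] += B.T          # duplicate index 1 is written only once
np.add.at(A.T, y, B)    # accumulates every occurrence
print(C[:, 1])          # [5. 8.]  -> one update to column 1 was lost
print(A[:, 1])          # [6. 9.]  -> both updates applied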
Approach #1
Here's one approach using summations at intervals with np.add.reduceat -
def reduceat_app(A, y, B):
    idx = y.argsort()
    y0 = y[idx]
    # start of each run of equal column indices in the sorted order;
    # assumes every column index 0..n-1 occurs at least once in y
    sep_idx = np.concatenate(([0], np.flatnonzero(y0[1:] != y0[:-1]) + 1))
    A += np.add.reduceat(B[idx], sep_idx).T
Approach #2
With a relatively small number of rows in A, we can also use np.bincount to perform those bin-based summations iteratively for each row, like so -
def bincount_loopy_app(A, y, B):
    n = A.shape[1]
    for i, a in enumerate(A):
        # bin the i-th column of B by target column index y
        a += np.bincount(y, B[:, i], minlength=n).astype(A.dtype)
Approach #3
We can vectorize the previous approach by creating a 2D grid of y indices/bins for all elements, such that each row gets its own offset range of bins. With that offsetting, bincount can perform the bin-based summations across all rows in one vectorized call.
Thus, the implementation would be -
def bincount_vectorized_app(A, y, B):
    m, n = A.shape
    # offset the bins so that each of the m rows gets its own id range
    IDs = y[:, None] + n * np.arange(B.shape[1])
    vals = np.bincount(IDs.ravel(), B.ravel(), minlength=m * n)
    A += vals.astype(A.dtype).reshape((m, n))
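As a quick sanity check with the question's data (each function mutates A in place, so every call gets a fresh copy), all three approaches should reproduce the for-loop result [[1, 6, 5], [4, 9, 8]]:
import numpy as np

y = np.array([0, 1, 2, 1])
B = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
for app in (reduceat_app, bincount_loopy_app, bincount_vectorized_app):
    A = np.array([[1., 2., 3.], [4., 5., 6.]])
    app(A, y, B)
    print(app.__name__, A.tolist())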

Nested for Loop optimization in python

I want to optimize these two for loops into a single loop; is there any way, since the length of the array is very large?
A = [1, 4, 2, 6, 9, 10, 80]  # length of list is very large
B = []
for x in A:
    for y in A:
        if x != y:
            B.append(abs(x - y))
print(B)
Not any better, but more pythonic:
B = [abs(x-y) for x in A for y in A if x!=y]
Unless you absolutely need the duplicates (abs(a-b) == abs(b-a)), you can halve your list (and thus the computation):
B = [abs(A[i]-A[j]) for i in range(len(A)) for j in range(i+1, len(A))]
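If the explicit index arithmetic bothers you, itertools.combinations yields the same unique pairs, so this is equivalent to the halved version above:
from itertools import combinations
B = [abs(x - y) for x, y in combinations(A, 2)]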
Finally, you can use the power of numpy to get a C-level speedup:
import numpy as np
A = np.array(A)
A.shape = -1,1 # make it a column vector
diff = np.abs(A - A.T) # diff is the matrix of abs differences
# grab upper triangle of order 1 (i.e. less the diagonal)
B = diff[np.triu_indices(len(A), k=1)]
But this will always be O(n^2) no matter what...

Fast algorithm to compute Adamic-Adar

I'm working on graph analysis. I want to compute an N by N similarity matrix that contains the Adamic-Adar similarity between every two vertices. To give an overview of Adamic-Adar let me start with this introduction:
Given the adjacency matrix A of an undirected graph G, CN is the set of all common neighbors of two vertices x, y. A common neighbor of two vertices is a node to which both vertices have an edge/link, i.e. both vertices will have a 1 in the corresponding position of their rows of A. k_n is the degree of node n.
Adamic-Adar is defined as the following sum over the common neighbors:
AA(x, y) = sum over n in CN(x, y) of 1 / log(k_n)
My attempt to compute it is to fetch the rows of both x and y from A and sum them, then look for the elements that have 2 as their value, get their degrees, and apply the equation. However, computing that takes a really long time. I tried with a graph that contains 1032 vertices; after 7 minutes I cancelled the computation. So my question: is there a better algorithm to compute it?
Here's my code in python:
def aa(graph):
    """
    Calculates the Adamic-Adar index.
    """
    N = graph.num_vertices()
    A = gts.adjacency(graph)
    S = np.zeros((N, N))
    degrees = get_degrees_dic(graph)
    for i in range(N):
        A_i = A[i]
        for j in range(N):
            if j != i:
                A_j = A[j]
                intersection = A_i + A_j  # entries equal to 2 are common neighbors
                common_ns_degs = list()
                for index in range(N):
                    if intersection[index] == 2:
                        cn_deg = degrees[index]
                        common_ns_degs.append(1.0 / np.log10(cn_deg))
                S[i, j] = np.sum(common_ns_degs)
    return S
Since you're using numpy, you can really cut down on the need to iterate for every operation in the algorithm. My numpy and vectorization fu aren't the greatest, but the code below runs in around 2.5 s on a graph with ~13,000 nodes:
def adar_adamic(adj_mat):
    """Computes the Adamic-Adar similarity matrix for an adjacency matrix"""
    Adar_Adamic = np.zeros(adj_mat.shape)
    for row in adj_mat:
        AdjList = row.nonzero()[0]  # column indices with nonzero values
        k_deg = len(AdjList)
        d = 1.0 / np.log(k_deg)  # this node's AA weight (1/log of its degree)
        # add the weight to the entry of every pair of its neighbors
        for i in range(len(AdjList)):
            for j in range(len(AdjList)):
                if AdjList[i] != AdjList[j]:
                    cell = (AdjList[i], AdjList[j])
                    Adar_Adamic[cell] = Adar_Adamic[cell] + d
    return Adar_Adamic
Unlike MBo's answer, this does build the full, symmetric matrix, but the inefficiency (for me) was tolerable, given the execution time.
I believe you are using a rather slow approach. It would be better to invert it:
- initialize the AA (Adamic-Adar) matrix with zeros
- for every node k get its degree k_deg
- calc d = 1/log(k_deg) (the question's code uses log10 - is the base important or not?)
- add d to all AA[i,j], where i, j are all pairs of 1s in the kth row of the adjacency matrix
Edit:
- for sparse graphs it is useful to extract the positions of all 1s in the kth row into a list, reaching O(V*(V+E)) complexity instead of O(V^3)
AA = np.zeros((N, N))
for k in range(N):
    # adjacency list of node k: all columns with a 1 in row k
    AdjList = [j for j in range(N) if A[k, j] == 1]
    k_deg = len(AdjList)
    d = 1.0 / np.log(k_deg)  # AA weight of node k
    for j in range(len(AdjList) - 1):
        for i in range(j + 1, len(AdjList)):
            AA[AdjList[i], AdjList[j]] += d
# half of the matrix is filled; it is symmetric for an undirected graph
I don't see a way of reducing the time complexity, but it can be vectorized:
degrees = A.sum(axis=0)
weights = 1.0/np.log10(degrees)  # AA weight per node, per the formula in the question
adamic_adar = (A*weights).dot(A.T)
With A a regular Numpy array. It seems you're using graph_tool.spectral.adjacency and thus A would be a sparse matrix. In that case the code would be:
from scipy.sparse import csr_matrix
degrees = A.sum(axis=0)
weights = csr_matrix(1.0/np.log10(degrees))  # AA weight per node
adamic_adar = A.multiply(weights) * A.T
This is much faster than using Python loops. A small warning though: with this approach you really need to make sure that the values on the main diagonal (of A and adamic_adar) are what you expect them to be. Also, A must not contain weights, but only zeros and ones.
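As a small sanity check, here is a toy graph where every node has degree >= 2 (a degree-1 node would divide by log10(1) = 0, which ties into the caveat above):
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
degrees = A.sum(axis=0)
weights = 1.0 / np.log10(degrees)
adamic_adar = (A * weights).dot(A.T)
# nodes 0 and 2 share neighbors 1 and 3, both of degree 2:
print(adamic_adar[0, 2])  # 2 / log10(2), roughly 6.64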
I believe there must be a function for node similarity (Adamic-Adar included) in python-igraph, just like the one defined in R's igraph; Graph.similarity_inverse_log_weighted computes exactly this inverse log-weighted similarity.
