Given two sparse SciPy matrices A and B, I want to compute the row-wise outer product.
With NumPy I can do this in a number of ways, the easiest perhaps being
np.einsum('ij,ik->ijk', A, B).reshape(n, -1)
or
(A[:, :, np.newaxis] * B[:, np.newaxis, :]).reshape(n, -1)
where n is the number of rows in A and B.
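To make the target layout concrete, here is a small dense sketch with arbitrary sizes and random data (assumptions for illustration only). Row r of the result is the flattened outer product of A[r] and B[r], so column l*K + k holds A[r, l] * B[r, k], and the two formulations above agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, K = 4, 3, 2          # hypothetical sizes: n rows, A has L columns, B has K columns
A = rng.random((n, L))
B = rng.random((n, K))

# Both dense formulations from the question
C1 = np.einsum('ij,ik->ijk', A, B).reshape(n, -1)
C2 = (A[:, :, np.newaxis] * B[:, np.newaxis, :]).reshape(n, -1)

# Column l*K + k of row r holds A[r, l] * B[r, k]
print(C1.shape)  # (4, 6)
```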
In my case, however, going through dense matrices eats up way too much RAM.
The only option I have found is thus to use a Python loop:
sp.sparse.vstack([(ra.T @ rb).reshape(1, -1) for ra, rb in zip(A, B)]).tocsr()
While using less RAM, this is very slow.
My question is thus: is there a sparse (RAM-efficient) way to take the row-wise outer product of two matrices that keeps things vectorized?
(A similar question is "numpy elementwise outer product with sparse matrices", but all answers there go through dense matrices.)
We can compute the CSR representation of the result directly. It's not super fast (~3 seconds for 100,000 × 768 inputs), but it may be OK, depending on your use case:
import numpy as np
import itertools
from scipy import sparse

def spouter(A, B):
    N, L = A.shape
    N, K = B.shape
    # split the data of A and B into per-row chunks and take their outer products
    drows = zip(*(np.split(x.data, x.indptr[1:-1]) for x in (A, B)))
    data = [np.outer(a, b).ravel() for a, b in drows]
    # the pair of columns (l, k) lands at column l*K + k of the flattened row
    irows = zip(*(np.split(x.indices, x.indptr[1:-1]) for x in (A, B)))
    indices = [np.ravel_multi_index(np.ix_(a, b), (L, K)).ravel() for a, b in irows]
    # row r holds nnz(A[r]) * nnz(B[r]) entries
    indptr = np.fromiter(itertools.chain((0,), map(len, indices)), int).cumsum()
    return sparse.csr_matrix((np.concatenate(data), np.concatenate(indices), indptr), (N, L*K))
A = sparse.random(100,768,0.03).tocsr()
B = sparse.random(100,768,0.03).tocsr()
print(np.allclose(np.einsum('ij,ik->ijk', A.toarray(), B.toarray()).reshape(100, -1), spouter(A, B).toarray()))
A = sparse.random(100000,768,0.03).tocsr()
B = sparse.random(100000,768,0.03).tocsr()
from time import time
T = time()
C = spouter(A,B)
print(time()-T)
Sample run:
True
3.1073222160339355
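The per-row np.split/np.outer calls in spouter are still a Python-level loop. As a sketch (my own variant, not part of the answer above, and not benchmarked), the same CSR arrays can be built with index arithmetic only, using np.repeat and np.cumsum. It assumes both inputs can be converted to CSR and have the same number of rows; the name spouter_vectorized is hypothetical. Peak memory is roughly proportional to the number of stored entries in the result.

```python
import numpy as np
from scipy import sparse

def spouter_vectorized(A, B):
    """Row-wise outer product of two sparse matrices without a per-row loop.

    Row r of the result holds A[r, l] * B[r, k] at column l*K + k.
    """
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    N, L = A.shape
    _, K = B.shape
    nnzA = np.diff(A.indptr).astype(np.int64)   # stored entries per row of A
    nnzB = np.diff(B.indptr).astype(np.int64)   # stored entries per row of B
    rowA = np.repeat(np.arange(N), nnzA)        # row of every stored A entry
    reps = nnzB[rowA]                           # each A entry pairs with all B entries of its row
    total = reps.sum()
    a_idx = np.repeat(np.arange(A.nnz), reps)   # A entry used by each output element
    starts = np.cumsum(reps) - reps             # first output position of each A entry's block
    within = np.arange(total) - np.repeat(starts, reps)  # offset inside the matching B row
    b_idx = B.indptr[rowA[a_idx]] + within      # B entry used by each output element
    data = A.data[a_idx] * B.data[b_idx]
    indices = A.indices[a_idx].astype(np.int64) * K + B.indices[b_idx]
    indptr = np.concatenate(([0], np.cumsum(nnzA * nnzB)))
    return sparse.csr_matrix((data, indices, indptr), shape=(N, L * K))

# Small self-check against the dense einsum formulation
rng = np.random.default_rng(0)
DA = rng.random((6, 5)); DA[DA < 0.6] = 0
DB = rng.random((6, 4)); DB[DB < 0.5] = 0
DA[2] = 0                                       # include an empty row
C = spouter_vectorized(sparse.csr_matrix(DA), sparse.csr_matrix(DB))
expected = np.einsum('ij,ik->ijk', DA, DB).reshape(6, -1)
print(np.allclose(C.toarray(), expected))  # True
```

Because the output is written block by block (A entry by A entry, then B entries of the same row), each row's column indices come out in increasing order whenever A and B have sorted indices.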
Given data with shape = (t,m,n), I need to find a vector variable of shape (n,) that minimizes a convex function of the data and vector. I've used cvxopt (and cvxpy) to perform convex optimizations using 2D input, but it seems like they don't support 3D arrays. Is there a way to implement this convex optimization using these or other similar packages?
Given data with shape (t,m,n) and (t,m) and var with shape (n,), here's a simplification of the type of function I need to minimize:
import numpy as np

def obj_func(var, data1, data2):
    # data1.shape = (t, m, n)
    # data2.shape = (t, m)
    # var.shape = (n,)
    score = np.sum(data1*var, axis=2)  # dot product along axis 2
    time_series = np.sum(score*data2, axis=1)  # weighted sum along axis 1
    return np.sum(time_series) - np.sum(time_series**2)  # some function
This seems like it should be a simple convex optimization, but unfortunately these functions aren't supported on N-dimensional arrays in cvxopt/cvxpy. Is there a way to implement this?
I think you'll be fine if you simply reshape data1 to 2D temporarily, e.g.:
import numpy as np
import cvxpy as cp
t, m, n = 10, 8, 6
data1 = np.ones((t, m, n))
data2 = np.ones((t, m))
x = cp.Variable(n)
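# (t*m, n) @ (n,) -> (t*m,); reshape back in row-major (C) order to match NumPy's reshape
score = cp.reshape(data1.reshape(-1, n) @ x, (t, m), order='C')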
time_series = cp.sum(cp.multiply(score, data2), axis=1)
expr = cp.sum(time_series) - cp.sum(time_series ** 2)
print(repr(expr))
Outputs:
Expression(CONCAVE, UNKNOWN, ())
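The reshape trick rests on an identity that can be checked in NumPy alone (random data and sizes below are assumptions for illustration). The flattened (t*m, n) @ (n,) product reproduces the 3D score only when it is reshaped back in row-major (C) order; cvxpy's reshape has historically defaulted to column-major (order='F'), which is why the order is worth stating explicitly:

```python
import numpy as np

t, m, n = 4, 3, 5
rng = np.random.default_rng(1)
data1 = rng.random((t, m, n))
var = rng.random(n)

score_3d = np.sum(data1 * var, axis=2)       # original 3D formulation
flat = data1.reshape(-1, n) @ var            # 2D matrix-vector product, shape (t*m,)
score_c = flat.reshape(t, m)                 # row-major (C) order
score_f = flat.reshape(t, m, order='F')      # column-major (F) order

print(np.allclose(score_3d, score_c))  # True
print(np.allclose(score_3d, score_f))  # generally False
```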
I am trying to find the eigenvalues and eigenvectors of a complex matrix with scipy.sparse.linalg.eigsh using its shift-invert mode. With only real numbers in the matrix I get the same result as with the scipy.linalg.eigh solver, but when I add imaginary parts the eigenvalues differ. A tiny example:
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh
n = 10
X = np.random.random((n, n)) - 0.5 + (np.random.random((n, n)) - 0.5) * 1j
X = np.dot(X, X.T) # create a symmetric matrix
evals_all, evecs_all = eigh(X)
evals_small, evecs_small = eigsh(X, 3, sigma=0, which='LM')
print(sorted(evals_all, key=abs))
print(sorted(evals_small, key=abs))
In this case the output is, for example:
[0.041577858515751132, -0.084104744918533481, -0.58668240775486691, 0.63845672501004724, -1.2311727737115068, 1.5193345703630159, -1.8652302423152105, 1.9970059660853923, -2.6414593461321654, 2.8624290667460293]
[-0.017278543470343462, -0.32684893256215408, 0.34551438015659475]
whereas in the real case, the first three eigenvalues are identical.
I am aware that I'm passing a dense matrix to the sparse solver, but this is just intended as an example.
I am probably missing something obvious, but I'd be grateful for hints on where to look. Thank you!
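SciPy does not check whether your input is Hermitian. np.dot(X, X.T) gives a complex symmetric matrix, not a Hermitian one (that would be np.dot(X, X.conj().T)), so eigh and eigsh both receive input that violates their assumption, and their results cannot be trusted.
Checking it as proposed in the link: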
if not np.allclose(X, X.conj().T):
    raise ValueError('expected symmetric or Hermitian matrix')
outputs:
ValueError: expected symmetric or Hermitian matrix
This is also hinted at by the negative eigenvalues you see: a matrix of the form X·Xᴴ is positive semidefinite, so all of its eigenvalues would be non-negative.
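Assuming what was actually wanted is a Hermitian matrix, building it with the conjugate transpose makes the dense and shift-invert results agree. A minimal sketch (seed and sizes are arbitrary); if you really need a complex symmetric, non-Hermitian matrix, eigh/eigsh do not apply and the general eig/eigs solvers are the alternative:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
n = 10
X = rng.random((n, n)) - 0.5 + (rng.random((n, n)) - 0.5) * 1j
H = X @ X.conj().T                       # Hermitian and positive semidefinite

evals_all = eigh(H, eigvals_only=True)   # all eigenvalues, real, ascending
evals_small, evecs_small = eigsh(H, 3, sigma=0, which='LM')  # 3 closest to 0

print(np.sort(evals_all)[:3])
print(np.sort(evals_small))
```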
In SciPy we can construct a sparse matrix with scipy.sparse.lil_matrix() etc., but such matrices are 2D.
I am wondering whether there is an existing data structure for a sparse 3D matrix/array (tensor) in Python.
P.S. I have lots of sparse 3D data and need a tensor to store it and perform multiplications. If no such data structure exists, any suggestions on how to implement one?
Happy to suggest a (possibly obvious) implementation of this, which could be made in pure Python or C/Cython if you've got time and space for new dependencies, and need it to be faster.
A sparse matrix in N dimensions can assume most elements are empty, so we use a dictionary keyed on tuples:
class NDSparseMatrix:
    def __init__(self):
        self.elements = {}

    def addValue(self, index, value):
        # index is a tuple of integers, e.g. (i, j, k)
        self.elements[index] = value

    def readValue(self, index):
        try:
            value = self.elements[index]
        except KeyError:
            # could also be 0.0 if using floats...
            value = 0
        return value
and you would use it like so:
sparse = NDSparseMatrix()
sparse.addValue((1,2,3), 15.7)
should_be_zero = sparse.readValue((1,5,13))
You could make this implementation more robust by verifying that the input is in fact a tuple, and that it contains only integers, but that will just slow things down so I wouldn't worry unless you're releasing your code to the world later.
EDIT - a Cython implementation of the multiplication problem, assuming the other tensor is an N-dimensional NumPy array (numpy.ndarray), might look like this:
#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np

def sparse_mult(object sparse, np.ndarray[double, ndim=3] u):
    cdef Py_ssize_t i, j, k
    # np.zeros so the boundary cells the loops skip are 0, not uninitialized
    out = np.zeros((u.shape[0], u.shape[1], u.shape[2]), dtype=np.float64)
    for i in range(1, u.shape[0]-1):
        for j in range(1, u.shape[1]-1):
            for k in range(1, u.shape[2]-1):
                # note, here you must define your own rank-3 multiplication rule, which
                # is, in general, nontrivial, especially if LxMxN tensor...
                # loop over a dummy variable (or two) and perform some summation:
                out[i, j, k] = u[i, j, k] * sparse.readValue((i, j, k))
    return out
You will always need to hand-roll this for the problem at hand, because (as mentioned in the code comment) you need to define which indices you're summing over, and be careful about the array lengths or things won't work!
EDIT 2 - if the other matrix is also sparse, you don't need the three-way loop:
def sparse_mult(sparse, other_sparse):
    out = NDSparseMatrix()
    for key, value in sparse.elements.items():
        i, j, k = key
        # note, here you must define your own rank-3 multiplication rule, which
        # is, in general, nontrivial, especially if LxMxN tensor...
        # loop over a dummy variable (or two) and perform some summation
        # (example indices shown):
        out.addValue(key, out.readValue(key) +
                     other_sparse.readValue((i, j, k+1)) * sparse.readValue((i-3, j, k)))
    return out
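As one concrete instance of such a multiplication rule (my own example, not from the answer), here is a contraction of a dict-of-tuples tensor with a vector over its last index, out[i, j] = sum_k T[i, j, k] * v[k]. Only stored entries are visited; the helper name contract_last_axis is hypothetical, and a plain dict stands in for NDSparseMatrix.elements:

```python
from collections import defaultdict

def contract_last_axis(elements, v):
    """Return out[(i, j)] = sum_k T[i, j, k] * v[k] for a dict {(i, j, k): value}."""
    out = defaultdict(float)
    for (i, j, k), value in elements.items():
        if v[k] != 0:                 # skip products that are known to be zero
            out[(i, j)] += value * v[k]
    return dict(out)

T = {(0, 1, 2): 2.0, (0, 1, 0): 1.0, (3, 0, 2): -1.0}
v = [10.0, 0.0, 0.5]
print(contract_last_axis(T, v))  # {(0, 1): 11.0, (3, 0): -0.5}
```

The cost scales with the number of stored entries rather than with L*M*N, which is the main point of the dictionary representation.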
My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:
typedef struct {
int index[3];
float value;
} entry_t;
You'll then need some functions to allocate and maintain a dynamic array of such structs and to search them as fast as you need; but you should test the performance of the Python implementation before worrying about that.
As of 2017, an alternative answer is the sparse package. According to its documentation, it implements sparse multidimensional arrays on top of NumPy and scipy.sparse by generalizing the scipy.sparse.coo_matrix layout.
Here's an example taken from the docs:
import numpy as np
n = 1000
ndims = 4
nnz = 1000000
coords = np.random.randint(0, n - 1, size=(ndims, nnz))
data = np.random.random(nnz)
import sparse
x = sparse.COO(coords, data, shape=((n,) * ndims))
x
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>
x.nbytes
# 16000000
y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))
y
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
Have a look at sparray - sparse n-dimensional arrays in Python (by Jan Erik Solem). It is also available on GitHub.
Rather than writing everything from scratch, it may be nicer to use SciPy's sparse module as far as possible, which may lead to (much) better performance. I had a somewhat similar problem, but I only had to access the data efficiently, not perform any operations on it. Furthermore, my data were only sparse in two out of three dimensions.
I have written a class that solves my problem and could (I think) easily be extended to satisfy the OP's needs. It may still hold some potential for improvement, though.
import scipy.sparse as sp
import numpy as np

class Sparse3D():
    """
    Class to store and access 3 dimensional sparse matrices efficiently
    """
    def __init__(self, *sparseMatrices):
        """
        Constructor
        Takes a stack of sparse 2D matrices with the same dimensions
        """
        self.data = sp.vstack(sparseMatrices, "dok")
        self.shape = (len(sparseMatrices), *sparseMatrices[0].shape)
        self._dim1_jump = np.arange(0, self.shape[1]*self.shape[0], self.shape[1])
        self._dim1 = np.arange(self.shape[0])
        self._dim2 = np.arange(self.shape[1])

    def __getitem__(self, pos):
        if not type(pos) == tuple:
            if not hasattr(pos, "__iter__") and not type(pos) == slice:
                return self.data[self._dim1_jump[pos] + self._dim2]
            else:
                return Sparse3D(*(self[self._dim1[i]] for i in self._dim1[pos]))
        elif len(pos) > 3:
            raise IndexError("too many indices for array")
        else:
            if (not hasattr(pos[0], "__iter__") and not type(pos[0]) == slice or
                    not hasattr(pos[1], "__iter__") and not type(pos[1]) == slice):
                if len(pos) == 2:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]]]
                else:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]], pos[2]].T
                    if hasattr(pos[2], "__iter__") or type(pos[2]) == slice:
                        result = result.T
                return result
            else:
                if len(pos) == 2:
                    return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]]))
                else:
                    if not hasattr(pos[2], "__iter__") and not type(pos[2]) == slice:
                        return sp.vstack([self[self._dim1[pos[0]], i, pos[2]]
                                          for i in self._dim2[pos[1]]]).T
                    else:
                        return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]]
                                          for i in self._dim1[pos[0]]))

    def toarray(self):
        return np.array([self[i].toarray() for i in range(self.shape[0])])
I also needed a 3D sparse matrix, for solving the 2D heat equation (the two spatial dimensions are dense, but the time dimension is the diagonal plus one off-diagonal on each side). I found this link to guide me. The trick is to create an array Number that maps the 2D grid of nodes to a 1D linear vector, then build the 2D sparse matrix from lists of data and indices. Afterwards, the Number array is used to arrange the answer back into a 2D array.
[edit] It occurred to me after my initial post that this could be handled better with the .reshape(-1) method. After some research: reshape is better than flatten because it returns a view into the original array when possible, whereas flatten always copies the array. The code below still uses the original Number array; I will try to update it later.[end edit]
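For the loop in the code below, Number[j, i] = i*nY + j, so the same mapping can be obtained with a reshape and a transpose (both return views), and a 1D solution vector can be mapped back onto the grid the same way. A minimal sketch; the solution vector x here is hypothetical:

```python
import numpy as np

nX, nY = 4, 3

# Loop version, as in the answer: Number[j, i] = i*nY + j
Number_loop = np.zeros((nY, nX), dtype=int)
iM = 0
for i in range(nX):
    for j in range(nY):
        Number_loop[j, i] = iM
        iM += 1

# Same mapping without loops
Number = np.arange(nX * nY).reshape(nX, nY).T
print(np.array_equal(Number, Number_loop))  # True

# Map a 1D solution vector back onto the (nY, nX) grid: grid[j, i] == x[Number[j, i]]
x = np.arange(nX * nY) * 10.0               # hypothetical solution vector
grid = x.reshape(nX, nY).T
print(np.array_equal(grid, x[Number]))      # True
```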
I test it by creating a random 1D vector b and solving M x = b for x. Multiplying x by the sparse 2D matrix then gives back b.
Note: I repeat this many times in a loop with exactly the same matrix M, so you might think it would be more efficient to compute inverse(M). But the inverse of M is not sparse, so I think (but have not tested) that using spsolve is the better solution. "Best" probably depends on how large your matrix is.
#!/usr/bin/env python3
# testSparse.py
# profhuster
import numpy as np
import scipy.sparse as sM
import scipy.sparse.linalg as spLA
from array import array
from numpy.random import rand, seed

seed(101520)
nX = 4
nY = 3
r = 0.1

def loadSpNodes(nX, nY, r):
    # Matrix to map 2D array of nodes to 1D array
    Number = np.zeros((nY, nX), dtype=int)
    # Map each element of the 2D array to a 1D array
    iM = 0
    for i in range(nX):
        for j in range(nY):
            Number[j, i] = iM
            iM += 1
    print(f"Number = \n{Number}")
    # Now create a sparse matrix of the "stencil"
    diagVal = 1 + 4 * r
    offVal = -r
    d_list = array('f')
    i_list = array('i')
    j_list = array('i')
    # Loop over the 2D nodes matrix
    for i in range(nX):
        for j in range(nY):
            # Recall the 1D number
            iSparse = Number[j, i]
            # populate the diagonal
            d_list.append(diagVal)
            i_list.append(iSparse)
            j_list.append(iSparse)
            # Now, for each rectangular neighbor, add the
            # off-diagonal entries
            # Use a try-except, so boundary nodes work
            # (negative indices wrap around instead of raising, hence the >= 0 check)
            for (jj, ii) in ((j+1, i), (j-1, i), (j, i+1), (j, i-1)):
                try:
                    iNeigh = Number[jj, ii]
                    if jj >= 0 and ii >= 0:
                        d_list.append(offVal)
                        i_list.append(iSparse)
                        j_list.append(iNeigh)
                except IndexError:
                    pass
    spNodes = sM.coo_matrix((d_list, (i_list, j_list)), shape=(nX*nY, nX*nY))
    return spNodes

MySpNodes = loadSpNodes(nX, nY, r)
print(f"Sparse Nodes = \n{MySpNodes.toarray()}")
b = rand(nX*nY)
print(f"b=\n{b}")
x = spLA.spsolve(MySpNodes.tocsr(), b)
print(f"x=\n{x}")
# matrix-vector product; should reproduce b
print(f"Multiply back together=\n{MySpNodes @ x}")
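Regarding the note about solving repeatedly with the same M: instead of forming inverse(M), one option (not tested by the original author) is scipy.sparse.linalg.factorized, which computes a sparse LU factorization once and returns a solve function. The sketch below also builds the same five-point stencil without Python loops, using Kronecker products; the helper name stencil_matrix is hypothetical, and the node numbering i*nY + j is assumed to match Number[j, i] in the code above.

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import factorized

def stencil_matrix(nX, nY, r):
    """(1 + 4r) on the diagonal, -r for each grid neighbor; node index i*nY + j."""
    def path_adjacency(n):
        # ones on the first sub- and super-diagonal: neighbors along one axis
        off = np.ones(n - 1)
        return sparse.diags([off, off], [-1, 1])
    Ix, Iy = sparse.identity(nX), sparse.identity(nY)
    # kron(Ax, Iy) links i +/- 1 at equal j; kron(Ix, Ay) links j +/- 1 at equal i
    neighbors = sparse.kron(path_adjacency(nX), Iy) + sparse.kron(Ix, path_adjacency(nY))
    M = (1 + 4 * r) * sparse.identity(nX * nY) - r * neighbors
    return sparse.csc_matrix(M)                # factorized works best with CSC

M = stencil_matrix(4, 3, 0.1)
solve = factorized(M)                          # LU factorization computed once
rng = np.random.default_rng(101520)
residuals = []
for step in range(5):                          # reuse it for many right-hand sides
    b = rng.random(M.shape[0])
    x = solve(b)
    residuals.append(np.abs(M @ x - b).max())
print(max(residuals))
```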
I needed a 3D lookup table for x, y, z and came up with this solution.
Why not combine two of the dimensions into one, i.e. use x and 'yz' as the matrix dimensions?
E.g. if x has 80 possible values, y has 100 and z has 20,
you make the sparse matrix 80 by 2000 (i.e. yz = 100 × 20):
x dimension: as usual
yz dimension: the first 100 columns represent z=0, y=0 to 99;
the second 100 represent z=1, y=0 to 99, etc.
So the element located at (x, y, z) is stored in the sparse matrix at (x, z*100 + y).
If you need negative indices, design an arbitrary offset into your index translation. The solution can be extended to n dimensions if necessary.
from scipy import sparse

m = sparse.lil_matrix((100, 2000), dtype=float)

def add_element(xyz, element):
    x, y, z = xyz
    m[x, y + z*100] = float(element)

def get_element(x, y, z):
    return m[x, y + z*100]

add_element([3, 2, 4], 2.2)
add_element([20, 15, 7], 1.2)
print(get_element(0, 0, 0))
print(get_element(3, 2, 4))
print(get_element(20, 15, 7))
print(" This is m sparse:")
print(m)
====================
OUTPUT:
0.0
2.2
1.2
This is m sparse:
(3, 402L) 2.2
(20, 715L) 1.2
====================
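The same folding can be written with np.ravel_multi_index, which computes z*ny + y for whole arrays of coordinates at once, and np.unravel_index, which goes back. A minimal sketch, assuming the 80 × 100 × 20 sizes from the text; the coordinates and values are made up:

```python
import numpy as np
from scipy import sparse

nx, ny, nz = 80, 100, 20
# Hypothetical 3D coordinates and values of the non-zero entries
xs = np.array([3, 20, 79])
ys = np.array([2, 15, 99])
zs = np.array([4, 7, 19])
vals = np.array([2.2, 1.2, 5.0])

# Fold (y, z) into one column index: col = z*ny + y
cols = np.ravel_multi_index((zs, ys), (nz, ny))
m = sparse.csr_matrix((vals, (xs, cols)), shape=(nx, ny * nz))

def get_element(x, y, z):
    return m[x, np.ravel_multi_index((z, y), (nz, ny))]

# Going back from a column index to (z, y)
z_back, y_back = np.unravel_index(cols, (nz, ny))
print(cols)  # [ 402  715 1999]
```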
I have the following code in Python using NumPy:
p = np.diag(1.0 / np.array(x))
How can I transform it to get the sparse matrix p2 with the same values as p without creating p first?
Use scipy.sparse.spdiags (which does a lot, and so may be confusing at first), scipy.sparse.dia_matrix and/or scipy.sparse.diags, depending on the format you want the sparse matrix in. (Older SciPy versions also offered scipy.sparse.lil_diags.)
E.g. using spdiags:
import numpy as np
import scipy as sp
import scipy.sparse
x = np.arange(10)
# "0" here indicates the main diagonal...
# "y" will be a dia_matrix type of sparse array, by default
y = sp.sparse.spdiags(x, 0, x.size, x.size)
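Using the scipy.sparse module directly (dia_matrix expects the diagonal values and their offsets together as a tuple):
import numpy as np
from scipy import sparse
p = sparse.dia_matrix((1.0 / np.array(x), [0]), shape=(len(x), len(x)))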
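A compact alternative is scipy.sparse.diags, which takes the diagonal values directly and lets you choose the output format. A minimal check against the dense np.diag version, assuming an input x without zeros (1.0 / 0 would give inf):

```python
import numpy as np
from scipy import sparse

x = [2.0, 4.0, 5.0, 8.0]                 # hypothetical input without zeros
p = np.diag(1.0 / np.array(x))           # dense version from the question, for comparison only

# Sparse version built straight from the diagonal values
p2 = sparse.diags(1.0 / np.array(x), format="csr")
print(np.allclose(p2.toarray(), p))      # True
```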