Allocate memory in Python for large scipy.sparse matrix operations

Is there a way I can allocate memory for scipy sparse matrix functions to process large datasets?
Specifically, I'm attempting to use Asymmetric Least Squares Smoothing (translated into python here and the original here) to perform a baseline correction on a large mass spec dataset (length of ~60,000).
The function (see below) uses the scipy.sparse matrix operations.
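import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
def baseline_als(y, lam, p, niter):
    L = len(y)
    # Second-order difference matrix, built from a dense L x L identity
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        # Asymmetric weights: p above the fit, 1-p below it
        w = p * (y > z) + (1-p) * (y < z)
    return z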
I have no problem when I pass data sets that are 10,000 or less in length:
baseline_als(np.ones(10000),100,0.1,10)
But when passing larger data sets, e.g.
baseline_als(np.ones(50000), 100, 0.1, 10)
I get a MemoryError, for the line
D = sparse.csc_matrix(np.diff(np.eye(L), 2))

Try changing
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
to
diag = np.ones(L - 2)
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
D will be a sparse matrix in DIAgonal format. If it turns out that being in CSC format is important, convert it using the tocsc() method:
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2).tocsc()
The following example shows that the old and new versions generate the same matrix:
In [67]: from scipy import sparse
In [68]: L = 8
Original:
In [69]: D = sparse.csc_matrix(np.diff(np.eye(L), 2))
In [70]: D.toarray()
Out[70]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
New version:
In [71]: diag = np.ones(L - 2)
In [72]: D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
In [73]: D.toarray()
Out[73]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
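For context on why the original line fails: np.eye(50000) is a dense float64 array of 50000 * 50000 * 8 bytes, about 20 GB, before it is ever converted to a sparse matrix. Below is a minimal sketch of the question's function with the sparse construction substituted in. The helper name second_difference, computing the penalty term once outside the loop, and the synthetic test signal are my own assumptions, not part of the original code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def second_difference(L):
    """Sparse L x (L-2) second-difference matrix, built without a dense identity."""
    diag = np.ones(L - 2)
    return sparse.spdiags([diag, -2 * diag, diag], [0, -1, -2], L, L - 2).tocsc()


def baseline_als(y, lam, p, niter=10):
    """Asymmetric least squares baseline, same steps as the question's function."""
    L = len(y)
    D = second_difference(L)
    # The smoothness penalty does not change between iterations (assumed refactor).
    penalty = lam * D.dot(D.transpose())
    w = np.ones(L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = (W + penalty).tocsc()
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z


if __name__ == "__main__":
    # Hypothetical signal: a sloping baseline with one narrow peak.
    x = np.linspace(0.0, 1.0, 60000)
    y = 0.5 * x + np.exp(-((x - 0.5) ** 2) / 1e-4)
    print(baseline_als(y, 100, 0.1, 10).shape)
```

With this version the largest objects are banded sparse matrices with a few nonzeros per row, so a length of ~60,000 should fit comfortably in memory.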

Related

Compute indexed tensor multiplication with sympy

I would like to compute the following with sympy:
Where I is a 3x3 identity matrix. The end use is to use this with symbolic matrices.
I have the following:
import sympy as sp
I = sp.eye(3)
Missing operations with sympy
With numpy I can just use the einsum function and have:
import numpy as np
I = np.eye(3)
Res = (np.einsum("ij,kl->ijkl", I, I)
+ np.einsum("ik,jl->ijkl", I, I)
+ np.einsum("il,jk->ijkl", I, I))
However, einsum will not accept sympy's objects for this operation.
How can I compute this with sympy?
While the notation makes the einsum expression convenient, it isn't a matrix-product. It's more like an extended outer product. There's no sum-of-products:
In [22]: I = np.eye(3)
...: Res = (
...: np.einsum("ij,kl->ijkl", I, I)
...: + np.einsum("ik,jl->ijkl", I, I)
...: + np.einsum("il,jk->ijkl", I, I)
...: )
In [23]: Res
Out[23]:
array([[[[3., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]],
[[0., 1., 0.],
[1., 0., 0.],
[0., 0., 0.]],
[[0., 0., 1.],
[0., 0., 0.],
[1., 0., 0.]]],
[[[0., 1., 0.],
[1., 0., 0.],
[0., 0., 0.]],
[[1., 0., 0.],
[0., 3., 0.],
[0., 0., 1.]],
[[0., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]]],
[[[0., 0., 1.],
[0., 0., 0.],
[1., 0., 0.]],
[[0., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]],
[[1., 0., 0.],
[0., 1., 0.],
[0., 0., 3.]]]])
In [24]: Res.shape
Out[24]: (3, 3, 3, 3)
In numpy we can use broadcasting to do the same thing:
In [25]: res1 = I[:,:,None,None]*I + I[:,None,:,None]*I[None,:,None,:] + I[:,None,None,:]*I[None,:,:,None]
In [26]: res1.shape
Out[26]: (3, 3, 3, 3)
In [27]: np.allclose(Res, res1)
Out[27]: True
Your sp.eye produces a MutableDenseMatrix
https://docs.sympy.org/latest/modules/matrices/dense.html#sympy.matrices.dense.MutableDenseMatrix
Feel free to study its docs. My impression is that sympy matrices don't implement multidimensional arrays with anything like the power of numpy.
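That said, SymPy does ship an N-dimensional sympy.Array type, so one possible workaround is to build the rank-4 result entry by entry. This is only a sketch: the function name outer_sum_ijkl is made up for illustration, and it simply spells out the three einsum terms from the question as explicit index products.

```python
import sympy as sp


def outer_sum_ijkl(A, B):
    """Rank-4 array with entries A[i,j]*B[k,l] + A[i,k]*B[j,l] + A[i,l]*B[j,k].

    A and B are assumed to be square SymPy matrices of the same size.
    """
    n = A.shape[0]
    return sp.Array([
        [[[A[i, j] * B[k, l] + A[i, k] * B[j, l] + A[i, l] * B[j, k]
           for l in range(n)]
          for k in range(n)]
         for j in range(n)]
        for i in range(n)
    ])


I = sp.eye(3)
Res = outer_sum_ijkl(I, I)
print(Res.shape)
```

Because the entries are ordinary SymPy expressions, the same function should also accept matrices with symbolic entries.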

How to change offset of matrix python numpy

I would like to get a matrix in shape of 100x100 like this:
[-2,1,0,0]
[1,-2,1,0]
[0,1,-2,1]
[0,0,1,-2]
I started with creating the diagonal:
import numpy as np
diagonal= (100)
diagonal= np.full(diagonal, -2)
A100 = (100,100)
A100 = np.zeros(A100)
np.fill_diagonal(A100, diagonal)
Now for changing the offset I tried:
off1=(99)
off1=np.ones(off1)
off1=np.diagonal(A100, offset=1)
But this doesn't work.
Thanks for your help!
Construct the matrix from three identity matrices:
np.eye(100, k=1) + np.eye(100, k=-1) - 2 * np.eye(100)
P.S. This solution is 7x faster than the scipy.sparse solution.
You can use scipy.sparse.diags
from scipy.sparse import diags
A100 = diags([-2, 1, 1], [0, -1, 1], shape = (100, 100))
A100.toarray()
Out[]:
array([[-2., 1., 0., ..., 0., 0., 0.],
[ 1., -2., 1., ..., 0., 0., 0.],
[ 0., 1., -2., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., -2., 1., 0.],
[ 0., 0., 0., ..., 1., -2., 1.],
[ 0., 0., 0., ..., 0., 1., -2.]])
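A quick way to convince yourself that the two answers build the same matrix is to compare them directly. This is a small sketch; the function names tridiag_dense and tridiag_sparse are just labels I chose for the two approaches.

```python
import numpy as np
from scipy.sparse import diags


def tridiag_dense(n):
    """Sum of three shifted identity matrices, as in the np.eye answer."""
    return np.eye(n, k=1) + np.eye(n, k=-1) - 2 * np.eye(n)


def tridiag_sparse(n):
    """Same matrix in sparse DIA format, as in the scipy.sparse.diags answer."""
    return diags([-2, 1, 1], [0, -1, 1], shape=(n, n))


print(np.array_equal(tridiag_dense(100), tridiag_sparse(100).toarray()))
```

The dense version is convenient for a 100x100 matrix; the sparse one is the safer choice once n gets large enough that an n x n dense array no longer fits in memory.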

How to distribute a Numpy array along the diagonal of an array of higher dimension?

I have three two-dimensional Numpy arrays x, w, d and want to create a fourth one called a. w and d only define the shape of a, which is d.shape + w.shape. I want the entries of x placed in a, with zeros elsewhere.
Specifically, I want a loop-free version of this code:
a = np.zeros(d.shape + w.shape)
for j in range(d.shape[1]):
    a[:,j,:,j] = x
For example, given:
x = np.array([
[2, 3],
[1, 1],
[8,10],
[0, 1]
])
w = np.array([
[ 0, 1, 1],
[-1,-2, 1]
])
d = np.matmul(x,w)
I want a to be
array([[[[ 2., 0., 0.],
[ 3., 0., 0.]],
[[ 0., 2., 0.],
[ 0., 3., 0.]],
[[ 0., 0., 2.],
[ 0., 0., 3.]]],
[[[ 1., 0., 0.],
[ 1., 0., 0.]],
[[ 0., 1., 0.],
[ 0., 1., 0.]],
[[ 0., 0., 1.],
[ 0., 0., 1.]]],
[[[ 8., 0., 0.],
[10., 0., 0.]],
[[ 0., 8., 0.],
[ 0., 10., 0.]],
[[ 0., 0., 8.],
[ 0., 0., 10.]]],
[[[ 0., 0., 0.],
[ 1., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 1., 0.]],
[[ 0., 0., 0.],
[ 0., 0., 1.]]]])
This answer inspired the following solution:
# shape a: (4, 3, 2, 3)
# shape x: (4, 2)
a = np.zeros(d.shape + w.shape)
a[:, np.arange(a.shape[1]), :, np.arange(a.shape[3])] = x
It uses Numpy's broadcasting (see here or here) in combination with advanced indexing to enlarge x to fit the slicing.
I happen to have an even simpler solution: a = np.tensordot(x, np.identity(3), axes = 0).swapaxes(1,2)
The size of the identity matrix will be decided by the number of times you wish to repeat the elements of x.
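To check that the loop, the advanced-indexing version, and the tensordot one-liner agree, something like the following sketch can be used. The function names are mine, and the einsum variant is an extra alternative I added for comparison, not one of the original answers.

```python
import numpy as np

x = np.array([[2, 3], [1, 1], [8, 10], [0, 1]])
w = np.array([[0, 1, 1], [-1, -2, 1]])
d = np.matmul(x, w)  # shape (4, 3)


def fill_loop(x, d, w):
    a = np.zeros(d.shape + w.shape)
    for j in range(d.shape[1]):
        a[:, j, :, j] = x
    return a


def fill_indexing(x, d, w):
    a = np.zeros(d.shape + w.shape)
    idx = np.arange(a.shape[1])
    a[:, idx, :, idx] = x  # x broadcasts across the selected diagonal slices
    return a


def fill_tensordot(x, n):
    # Outer product with an n x n identity, then reorder axes to (i, j, k, l).
    return np.tensordot(x, np.identity(n), axes=0).swapaxes(1, 2)


def fill_einsum(x, n):
    # a[i, j, k, l] = x[i, k] * eye[j, l]
    return np.einsum("ik,jl->ijkl", x, np.identity(n))


reference = fill_loop(x, d, w)
print(reference.shape)
```

Here n is w.shape[1], which equals d.shape[1] because d = x @ w.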

Numpy triu generates nan when called on matrices with infinite values

Just found some unexpected behaviour in Numpy 1.8.1 in the triu function.
import numpy as np
a = np.zeros((4, 4))
a[1:, 0] = np.inf
a
>>>array([[ 0., 0., 0., 0.],
[ inf, 0., 0., 0.],
[ inf, 0., 0., 0.],
[ inf, 0., 0., 0.]])
np.triu(a)
>>>array([[ 0., 0., 0., 0.],
[ nan, 0., 0., 0.],
[ nan, 0., 0., 0.],
[ nan, 0., 0., 0.]])
Would this behaviour ever be desirable? Or shall I file a bug report?
Edit
I raised an issue on the Numpy github page
1. Explanation
Looks like you ignored the RuntimeWarning:
>>> np.triu(a)
twodim_base.py:450: RuntimeWarning: invalid value encountered in multiply
out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
The source code for numpy.triu is as follows:
def triu(m, k=0):
    m = asanyarray(m)
    out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
    return out
This uses numpy.tri to get an array with ones strictly below the diagonal and zeros elsewhere, and subtracts this from 1 to get an array with zeros below the diagonal and ones on and above it:
>>> 1 - np.tri(4, 4, -1)
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
Then it multiplies this element-wise with the original array. So where the original array has inf, the result has inf * 0 which is NaN.
2. Workaround
Use numpy.tril_indices to generate the indices of the lower triangle, and set all those entries to zero:
>>> a = np.ones((4, 4))
>>> a[1:, 0] = np.inf
>>> a
array([[ 1., 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.]])
>>> a[np.tril_indices(4, -1)] = 0
>>> a
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
(Depending on what you are going to do with a, you might want to take a copy before zeroing these entries.)
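If you would rather not modify a in place, another option is to select values with a boolean mask instead of multiplying by 0/1, so inf * 0 never gets evaluated. This is a sketch of that idea, not NumPy's own code; the name triu_safe is hypothetical.

```python
import numpy as np


def triu_safe(m, k=0):
    """Upper triangle of m, zeroing entries by selection rather than multiplication."""
    m = np.asanyarray(m)
    # True strictly below the k-th diagonal, False on and above it.
    below = np.tri(m.shape[0], m.shape[1], k - 1, dtype=bool)
    return np.where(below, 0, m)


a = np.zeros((4, 4))
a[1:, 0] = np.inf
print(np.isnan(triu_safe(a)).any())
```

np.where returns a new array, so the original a keeps its inf entries.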

Matrix with given numbers in random places in python/numpy

I have an NxN matrix filled with zeros. Now I want to add to the matrix, say, n ones and m twos to random places. I.e. I want to create a matrix where there is some fixed amount of a given number at random places and possibly a fixed amount of some other given number in random places. How do I do this?
In Matlab I would do this by making a random permutation of the matrix indices with randperm() and then filling the n first indices given by randperm of the matrix with ones and m next with twos.
You can use numpy.random.shuffle to randomly permute an array in-place.
>>> import numpy as np
>>> X = np.zeros(N * N)
>>> X[:n] = 1
>>> X[n:n+m] = 2
>>> np.random.shuffle(X)
>>> X = X.reshape((N, N))
Would numpy.random.permutation be what you are looking for?
You can do something like this:
In [9]: a=numpy.zeros(100)
In [10]: p=numpy.random.permutation(100)
In [11]: a[p[:10]]=1
In [12]: a[p[10:20]]=2
In [13]: a.reshape(10,10)
Out[13]:
array([[ 0., 1., 0., 0., 0., 2., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0., 2., 0.],
[ 0., 2., 0., 0., 0., 0., 2., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 2., 0., 2., 1., 1., 0.],
[ 0., 0., 0., 0., 1., 0., 2., 0., 0., 0.],
[ 0., 2., 0., 2., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 2., 0., 0., 0., 1., 0.]])
Here we create a random permutation, then set the first 10 indices taken from the permutation in a to 1, then the next 10 indices to 2.
To generate the indices of the elements for where to add ones and twos, what about this?
# assuming N, n and m exist.
In [1]: import random
In [3]: indices = [(i, j) for i in range(N) for j in range(N)]
In [4]: random_indices = random.sample(indices, n + m)
In [5]: ones = random_indices[:n]
In [6]: twos = random_indices[n:]
Corrected as commented by Petr Viktorin in order not to have overlapping indexes in ones and twos.
An alternate way to generate the indices:
In [7]: import itertools
In [8]: indices = list(itertools.product(range(N), range(N)))
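The index-based answers above can also be written with NumPy's Generator API by drawing n + m distinct flat positions in one call. A minimal sketch follows; the function name random_fill and the seed parameter are my own additions.

```python
import numpy as np


def random_fill(N, n, m, seed=None):
    """N x N zero matrix with n ones and m twos at distinct random positions."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(N * N, dtype=int)
    # Sampling without replacement guarantees ones and twos never overlap.
    positions = rng.choice(N * N, size=n + m, replace=False)
    flat[positions[:n]] = 1
    flat[positions[n:]] = 2
    return flat.reshape(N, N)


X = random_fill(10, 10, 10, seed=0)
print(X.shape)
```

This mirrors the Matlab randperm approach from the question without materialising a full permutation of all N * N indices.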
