How to get non-zero values from sparse SciPy matrix? - python

How can I get the values of a sparse matrix? For example:
import scipy as sp
import scipy.sparse  # makes sp.sparse available on older SciPy versions
x = sp.sparse.csr_matrix([[0,0,-1,1,0],[0,0,0,0,-1]])
print(x)
(0, 2) -1
(0, 3) 1
(1, 4) -1
I am just looking for the values of the data, i.e., [-1, 1, -1].

This can be accessed through the data property:
x = sp.sparse.csr_matrix([[0,0,-1,1,0],[0,0,0,0,-1]])
print(x.data)
[-1 1 -1]
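Two details worth knowing, shown as a small sketch: data lists the *stored* values, which can include explicitly stored zeros (for example after assigning 0 to an entry that was already stored), and eliminate_zeros() removes those in place. If you also need to know where each value sits, scipy.sparse.find returns the row indices, column indices and values of the nonzero entries together.

```python
from scipy import sparse

x = sparse.csr_matrix([[0, 0, -1, 1, 0], [0, 0, 0, 0, -1]])
print(x.data)  # [-1  1 -1]

# Assigning 0 to an already-stored entry keeps it as an explicit zero in x.data.
x[0, 2] = 0
print(x.data)  # [ 0  1 -1]

x.eliminate_zeros()  # drop explicitly stored zeros in place
print(x.data)  # [ 1 -1]

# Row indices, column indices and values of the nonzero entries.
rows, cols, vals = sparse.find(x)
print(rows, cols, vals)
```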

Related

Tying all values along the diagonal of a matrix in PyTorch

The model I am implementing has a set of parameters a1, ..., aH that correspond to weighting previous outputs. It's realized through multiplying a matrix that looks like this:
a1 0 0 0 0 ...
a2 a1 0 0 0 ...
a3 a2 a1 0 0 ...
: : : : :
In the current implementation, the a's are saved in a one-dimensional nn.parameter.Parameter with H entries, from which the matrix is constructed during each forward pass. The gradient of the matrix automatically propagates to the parameters via autograd.
However, this requires constructing the matrix anew every forward pass. Is there a way to have the matrix itself be the parameter but tie the weights along the main diagonal and lower subdiagonals so that it is equivalent to constructing it from the parameter vector?
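You can use dia_matrix from scipy, which stores a matrix by its diagonals: each row of the data array holds one diagonal, and the matching entry of offsets says which one (0 is the main diagonal, -1 the first subdiagonal).
import numpy as np
from scipy.sparse import dia_matrix
a = [1, 2, 3]  # main diagonal (offset 0)
b = [4, 5, 0]  # first subdiagonal (offset -1); its last entry falls outside a 3x3 matrix and is ignored
m = dia_matrix((np.array([a, b]), [0, -1]), shape=(3, 3))
print(m.toarray())
# [[1 0 0]
#  [4 2 0]
#  [0 5 3]]
The lists are copied into a NumPy array when m is built, so a later a[0] = 10 does not change m. To update a diagonal in place, edit m.data instead:
m.data[0, 0] = 10
print(m.toarray())
# [[10  0  0]
#  [ 4  2  0]
#  [ 0  5  3]]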
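For the PyTorch question itself, one common pattern (not taken from the answer above) is to precompute, once, an integer index matrix that says which a_k belongs in each cell, so that each forward pass is a single gather from the parameter vector. The sketch below is in NumPy, assumes a square n x n matrix with n >= H, and uses the hypothetical helper names lag_index and build_matrix. The same indexing should carry over to a torch tensor (roughly torch.cat([a, a.new_zeros(1)])[torch.as_tensor(idx)]), and PyTorch propagates gradients through integer-array indexing, so the parameter can stay a 1-D nn.Parameter.

```python
import numpy as np


def lag_index(n, H):
    """Hypothetical helper: for each cell of an n x n matrix, the index into a padded parameter vector."""
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]  # row minus column
    in_band = (lag >= 0) & (lag < H)  # main diagonal plus the first H - 1 subdiagonals
    return np.where(in_band, lag, H)  # cells outside the band point at an extra zero slot


def build_matrix(a, idx):
    """Hypothetical helper: gather the banded lower-triangular matrix from a = (a1, ..., aH)."""
    a_padded = np.concatenate([a, np.zeros(1, dtype=a.dtype)])  # slot H holds the zero
    return a_padded[idx]


idx = lag_index(4, 3)  # computed once, outside the forward pass
a = np.array([1.0, 2.0, 3.0])  # stands in for a1, a2, a3
print(build_matrix(a, idx))
# [[1. 0. 0. 0.]
#  [2. 1. 0. 0.]
#  [3. 2. 1. 0.]
#  [0. 3. 2. 1.]]
```

The matrix is still rebuilt on every forward pass, but only as one cheap gather; the index bookkeeping is done once.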

Subtracting one dimensional array (list of scalars) from 3 dimensional arrays using broadcasting

I have a one-dimensional array of scalar values
Y = np.array([1, 2])
I also have a 3-dimensional array:
X = np.random.randint(0, 255, size=(2, 2, 3))
I am attempting to subtract each value of Y from X, so I should get back a Z of shape (2, 2, 2, 3).
I can't seem to figure out how to do this via broadcasting.
I tried changing the shape of Y:
Y = np.array([[[1, 2]]])
but not sure what the correct shape should be.
Broadcasting lines up dimensions on the right. So you're looking to operate on a (2, 1, 1, 1) array and a (2, 2, 3) array.
The simplest way I can think of is using reshape:
Y = Y.reshape(-1, 1, 1, 1)
More generally:
Y = Y.reshape(-1, *([1] * X.ndim))
At most one of the arguments to reshape can be -1, indicating all the remaining size not accounted for by other dimensions.
To get Z of shape (2, 2, 2, 3):
Z = X - Y.reshape(-1, *([1] * X.ndim))
If you were OK with having Z of shape (2, 2, 3, 2), the operation would be much simpler:
Z = X[..., None] - Y
None or np.newaxis will insert a unit axis into the end of X's shape, making it broadcast properly with the 1D Y.
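A quick way to convince yourself that both versions do what is wanted is to check the shapes and compare each slice against a plain scalar subtraction. This sketch uses a seeded random generator (an assumption, only so the check is reproducible).

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible
X = rng.integers(0, 255, size=(2, 2, 3))
Y = np.array([1, 2])

# New leading axis: Y is viewed as shape (2, 1, 1, 1), so Z[i] == X - Y[i]
Z = X - Y.reshape(-1, *([1] * X.ndim))
print(Z.shape)  # (2, 2, 2, 3)

# New trailing axis: X is viewed as shape (2, 2, 3, 1), so Z_last[..., i] == X - Y[i]
Z_last = X[..., None] - Y
print(Z_last.shape)  # (2, 2, 3, 2)

for i, y in enumerate(Y):
    assert np.array_equal(Z[i], X - y)
    assert np.array_equal(Z_last[..., i], X - y)
```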
I am not entirely sure along which dimension you want your subtraction to take place, but X - Y will not raise an error if you define Y as Y = numpy.array([1,2]).reshape(2, 1, 1) or Y = numpy.array([1,2]).reshape(1, 2, 1).
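For comparison, these two reshapes give a (2, 2, 3) result that pairs each value of Y with one slice of X, rather than producing every combination of Y values and X elements. A small sketch using a deterministic X:

```python
import numpy as np

X = np.arange(12).reshape(2, 2, 3)
Y = np.array([1, 2])

# (2, 1, 1): Y[0] is subtracted from X[0] and Y[1] from X[1]
Z_first = X - Y.reshape(2, 1, 1)
# (1, 2, 1): Y[0] is subtracted from X[:, 0] and Y[1] from X[:, 1]
Z_second = X - Y.reshape(1, 2, 1)
print(Z_first.shape, Z_second.shape)  # (2, 2, 3) (2, 2, 3)
```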

Find all points within distance 1 of specific point in 2D numpy matrix

I want to find a list of points that are within range 1 (or exactly diagonal) of a point in my numpy matrix:
For example say my matrix m is:
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 1 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
I would like to obtain a list of tuples or something representing all the coordinates of the 9 points with X's below:
[[0 0 0 0 0]
[0 X X X 0]
[0 X X X 0]
[0 X X X 0]
[0 0 0 0 0]]
Here is another example with the target point on the edge:
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 1]
[0 0 0 0 0]
[0 0 0 0 0]]
In this case there would only be 6 points within distance 1 of the target point:
[[0 0 0 0 0]
[0 0 0 X X]
[0 0 0 X X]
[0 0 0 X X]
[0 0 0 0 0]]
EDIT:
Using David Herring's answer/comment about Chebyshev distance, here is my attempt to solve example 2 above, assuming I know the coordinates of the target point:
from scipy.spatial import distance
point = [2, 4]
valid_points = []
for x in range(5):
    for y in range(5):
        if distance.chebyshev(point, [x, y]) <= 1:
            valid_points.append([x, y])
print(valid_points)  # [[1, 3], [1, 4], [2, 3], [2, 4], [3, 3], [3, 4]]
This seems inefficient for a bigger array, since I really only need to check a small set of cells, not the whole matrix.
I think you're making it a little too complicated - no need to rely on complicated functions
import numpy as np
# set up matrix
x = np.zeros((5, 5))
# add a single point
x[2, -1] = 1
# get coordinates of point as arrays
r, c = np.where(x)
# convert to Python scalars
r = int(r[0])
c = int(c[0])
# get boundaries of array
m, n = x.shape
coords = []
# loop over possible locations
for i in [-1, 0, 1]:
    for j in [-1, 0, 1]:
        # check if location is within boundary
        if 0 <= r + i < m and 0 <= c + j < n:
            coords.append((r + i, c + j))
print(coords)
>>> [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]
There is no algorithm of interest here. If you don’t already know where the 1 is, first you have to find it, and you can’t do better than searching through every element. (You can get a constant-factor speedup by having numpy do this at C speed with argmax; use divmod to separate the flattened index into row and column.) Thereafter, all you do is add ±1 (or 0) to the coordinates unless it would take you outside the array bounds. You don’t ever construct coordinates only to discard them later.
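The argmax/divmod approach described above could look roughly like this. It is a sketch that assumes the array contains a single maximum (the target point); neighbours_of_max is a hypothetical name. Clamping the row and column ranges to the array bounds means no out-of-range coordinate is ever generated.

```python
import numpy as np


def neighbours_of_max(x):
    """Hypothetical helper: coordinates within Chebyshev distance 1 of the (first) maximum of x."""
    m, n = x.shape
    # argmax scans the flattened array at C speed; divmod splits the flat index into row, column
    r, c = divmod(int(np.argmax(x)), n)
    return [(i, j)
            for i in range(max(r - 1, 0), min(r + 2, m))
            for j in range(max(c - 1, 0), min(c + 2, n))]


x = np.zeros((5, 5), dtype=int)
x[2, 4] = 1
print(neighbours_of_max(x))  # [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]
```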
A simple way would be to get all possible coordinates with a cartesian product
Set up the data:
import numpy as np
x = np.array([[0,0,0], [0,1,0], [0,0,0]])
x
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 0]])
You know that the coordinates will be +/- 1 of your location:
import itertools
loc = np.argwhere(x == 1)[0].tolist()  # unless already known or pre-specified; tolist() gives plain ints
v = [loc[0], loc[0] - 1, loc[0] + 1]
h = [loc[1], loc[1] - 1, loc[1] + 1]
output = []
for i in itertools.product(v, h):
    # keep only coordinates that fall inside the array on both axes
    if 0 <= i[0] < x.shape[0] and 0 <= i[1] < x.shape[1]:
        output.append(i)
print(output)
[(1, 1), (1, 0), (1, 2), (0, 1), (0, 0), (0, 2), (2, 1), (2, 0), (2, 2)]

Scipy: sparse matrix conditional removal of columns

I have a large (79 000 x 480 000) sparse CSR matrix. I am trying to remove all columns (within a certain range) in which every value is < k.
In regular NumPy arrays this is simply done with a mask:
import numpy as np
m = np.array([[0,2,1,1],
              [0,4,2,0],
              [0,3,4,0]])
mask = (m < 2)
idx = mask.all(axis=0)
result = m[:, ~idx]
print(result)
>>> [[2 1]
[4 2]
[3 4]]
The unary bitwise negation operator ~ and boolean mask functionality are not available for sparse matrices however. What is the best method to:
Obtain the indices of columns where all values fulfill condition e < k.
Remove these columns based on the list of indices.
Some things to note:
The columns represent ngram text features: there are no columns in the matrix for which each element is zero.
Is using the csr matrix format even a plausible choice for this?
Do I transpose and make use of .nonzero()? I have a fair amount of working memory (192GB) so time efficiency is preferable to memory efficiency.
If I do
M = sparse.csr_matrix(m)
M < 2
I get an efficiency warning, and all the 0 values of M satisfy the condition:
In [1754]: print(M)
(0, 1) 2
(0, 2) 1
(0, 3) 1
(1, 1) 4
(1, 2) 2
(2, 1) 3
(2, 2) 4
In [1755]: print(M<2)
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:275: SparseEfficiencyWarning: Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
warn(bad_scalar_msg, SparseEfficiencyWarning)
(0, 0) True # not in M
(0, 2) True
(0, 3) True
(1, 0) True # not in M
(1, 3) True
(2, 0) True # not in M
(2, 3) True
In [1756]: print(M>=2) # all a subset of M
(0, 1) True
(1, 1) True
(1, 2) True
(2, 1) True
(2, 2) True
If I = M >= 2, there isn't an all method, but there is a sum.
In [1760]: I.sum(axis=0)
Out[1760]: matrix([[0, 3, 2, 0]], dtype=int32)
sum is actually performed using a matrix multiplication
In [1769]: np.ones((1,3),int)*I
Out[1769]: array([[0, 3, 2, 0]], dtype=int32)
Using nonzero to find the nonzero columns:
In [1778]: np.nonzero(I.sum(axis=0))
Out[1778]: (array([0, 0], dtype=int32), array([1, 2], dtype=int32))
In [1779]: M[:,np.nonzero(I.sum(axis=0))[1]]
Out[1779]:
<3x2 sparse matrix of type '<class 'numpy.int32'>'
with 6 stored elements in Compressed Sparse Row format>
In [1780]: M[:,np.nonzero(I.sum(axis=0))[1]].A
Out[1780]:
array([[2, 1],
[4, 2],
[3, 4]], dtype=int32)
General points:
watch out for those 0 values when doing comparisons
watch out for False values when doing logic on sparse matrices
sparse matrices are optimized for math, especially matrix multiplication
sparse indexing isn't quite as powerful as array indexing; and not as fast either.
note when operations produce a dense matrix
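Building on the point about the 0 values, one way to avoid comparing the whole matrix is to read the stored values directly. In CSR format, M.indices holds the column index of every stored value, so the columns to keep are the ones that contain at least one stored value >= k. This is a sketch that assumes k > 0 (so every implicit zero is < k); drop_columns_below is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def drop_columns_below(M, k):
    """Hypothetical helper: remove columns of M in which every value is < k (assumes k > 0)."""
    M = sparse.csr_matrix(M, copy=True)
    M.sum_duplicates()  # merge duplicate (row, col) entries so each stored value is a real entry
    # Implicit zeros are < k when k > 0, so only stored values can reach k.
    keep = np.unique(M.indices[M.data >= k])
    return M[:, keep], keep


m = np.array([[0, 2, 1, 1],
              [0, 4, 2, 0],
              [0, 3, 4, 0]])
result, kept = drop_columns_below(m, 2)
print(kept)  # [1 2]
print(result.toarray())
```

For column selection on a very large matrix, converting to CSC first (M.tocsc()) may make the final slicing step cheaper.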

numpy: ndenumerate for masked arrays?

Is there a way to enumerate over the non-masked locations of a masked numpy ndarray (e.g. in the way that ndenumerate does it for regular ndarrays, but omitting all the masked entries)?
EDIT: to be more precise: the enumeration should not only skip over the masked entries, but also show the indices of the non-masked ones in the original array. E.g. if the first five elements of a 1-d array are masked, and the next one has an unmasked value of 3, then the enumeration should start with something like ((5,), 3), ....
Thanks!
PS: note that, although it is possible to apply ndenumerate to a masked ndarray, the resulting enumeration does not discriminate between its masked and normal entries. In fact, ndenumerate not only does not filter out the masked entries from the enumeration, but it doesn't even replace the enumerated values with the masked constant. Therefore, one can't adapt ndenumerate for this task by just wrapping ndenumerate with a suitable filter.
You can access only the valid entries by using the inverse of the mask as an index:
>>> import numpy as np
>>> import numpy.ma as ma
>>> x = np.array([11, 22, -1, 44])
>>> m_arr = ma.masked_array(x, mask=[0, 0, 1, 0])
>>> for index, i in np.ndenumerate(m_arr[~m_arr.mask]):
...     print(index, i)
...
(0,) 11
(1,) 22
(2,) 44
See this for details.
The enumeration over only valid entries with indices from the original array:
>>> for (index, val), m in zip(np.ndenumerate(m_arr), m_arr.mask.ravel()):
...     if not m:
...         print(index, val)
...
(0,) 11
(1,) 22
(3,) 44
How about:
import numpy as np
def maenumerate(marr):
    # getmaskarray returns a full boolean mask even when nothing is masked (mask is nomask)
    mask = ~np.ma.getmaskarray(marr).ravel()
    for i, m in zip(np.ndenumerate(marr), mask):
        if m:
            yield i
N = 12
a = np.arange(N).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))
for i, val in maenumerate(b):
    print(i, val)
which yields
(0, 0, 1) 11
(0, 0, 2) 12
(0, 1, 0) 13
(0, 1, 1) 14
(1, 0, 0) 16
(1, 0, 1) 17
(1, 0, 2) 18
(1, 1, 0) 19
(1, 1, 2) 21
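On NumPy 1.23 or later, numpy.ma.ndenumerate should do this directly: it skips masked entries and reports indices from the original array. For older versions, the sketch below also shows a hypothetical helper, enumerate_unmasked, that visits only the unmasked positions via np.argwhere and copes with arrays whose mask is the scalar nomask.

```python
import numpy as np

a = np.arange(12).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))

# NumPy >= 1.23: skips masked entries, indices refer to the original array
built_in = list(np.ma.ndenumerate(b))


def enumerate_unmasked(marr):
    """Hypothetical helper for older NumPy versions."""
    # getmaskarray gives a full boolean mask even when marr.mask is the scalar nomask
    unmasked = ~np.ma.getmaskarray(marr)
    # argwhere lists unmasked positions in C order, the same order ndenumerate uses
    for idx in map(tuple, np.argwhere(unmasked).tolist()):
        yield idx, marr.data[idx]


for idx, val in enumerate_unmasked(b):
    print(idx, val)  # first line printed: (0, 0, 1) 11
```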
