Scipy sparse matrix from a list of lists of integers - python

How to make a scipy sparse matrix from a list of lists with integers (or strings)?
[[1, 2, 3],
 [1],
 [1, 4, 5]]
Should become:
[[1, 1, 1, 0, 0],
 [1, 0, 0, 0, 0],
 [1, 0, 0, 1, 1]]
But then in scipy's compressed sparse format?

I assume you want a 5-by-5 matrix at the end. Also, indices start from 0, so I have shifted your lists down by one.
In [18]: import scipy.sparse as sp
In [20]: a = [[0, 1, 2], [0], [0, 3, 4]]
In [31]: m = sp.lil_matrix((5, 5), dtype=int)
In [32]: for row_index, col_indices in enumerate(a):
    ...:     m[row_index, col_indices] = 1
    ...:
In [33]: m.toarray()
Out[33]:
array([[1, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
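Since the question asks for compressed sparse format specifically, the LIL matrix above can simply be converted at the end with m.tocsr(). Alternatively, a CSR matrix can be built directly from (data, (row, col)) triplets; a minimal sketch, assuming the same 0-based lists and a 5x5 shape:

import numpy as np
import scipy.sparse as sp

a = [[0, 1, 2], [0], [0, 3, 4]]
# one row index per stored value, repeated by the length of each sub-list
rows = np.repeat(np.arange(len(a)), [len(r) for r in a])
cols = np.concatenate(a)                  # flattened column indices
data = np.ones(len(cols), dtype=int)
csr = sp.csr_matrix((data, (rows, cols)), shape=(5, 5))
print(csr.toarray())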

Related

How can I assign to a numpy array slice chosen with specific rows and columns, in place?

As in the title, if I have a matrix a
a = np.diag(np.arange(5))
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 3, 0],
       [0, 0, 0, 0, 4]])
How can I assign a new 4x4 matrix, or even a 3x4 matrix, to a without the i-th row and i-th column? Let's say
b = np.array([[1, 1, 1, 1],
              [1, 1, 1, 1],
              [1, 1, 1, 1]])
I want to assign b to the slice of a that excludes the first and second rows and the second column, which in R would be something like
a[c(-1,-2), -2] = b
a =
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1]])
But in Python, I tried something like
a[[2,3,4],:][:,[0,1,3,4]]
output:
array([[0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
This operation won't allow me to assign a new matrix to slices of a, because the chained fancy indexing returns a copy rather than a view.
How can I do that? I really appreciate any help you can provide.
p.s.
I found that in this special case I can assign values by blocks. But what I actually want to ask is: when we slice like a[2:5, [0,2,3,4]], we get a 3x4 matrix and can assign a new matrix to that position. What I want is to slice like a[[0,2,3,4], [0,2,3,4]] to get a 4x4 matrix (or other shapes; the row and column indices may even be random) and assign a new matrix to that position, but numpy gives me a 1d array instead.
newmatrix = a[[0, 1, 3, 4], :][:, [0, 1, 3, 4]]
Regarding setting the values of a sub-matrix of a larger matrix, I think there is no direct option. But you can recreate the original matrix around the one to be inserted:
before = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 2, 0, 0],
                   [0, 0, 0, 3, 0],
                   [0, 0, 0, 0, 4]])
insert_array = np.array([[1, 1, 1, 1],
                         [1, 1, 1, 1],
                         [1, 1, 1, 1]])
# first two rows without the second column
first_step = np.delete(before[:2, :], 1, 1)
# or
first_step = before[:2, [0, 2, 3, 4]]
# prepended to the insert matrix
second_step = np.insert(insert_array, 0, first_step, axis=0)
# second column appended
third_step = np.insert(second_step, 1, before[:, 1], axis=1)
# final matrix
third_step
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1]])
I can't find a one-step solution to do that, but we can assign the matrix by blocks:
a[2:5, 0] = 1
a[2:5, 2:5] = 1
Then I get what I want.
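For completeness, numpy does ship a helper for exactly this kind of cross-product assignment: np.ix_ builds an open mesh from the row and column index lists, so the R-style assignment becomes a single step. A minimal sketch:

import numpy as np

a = np.diag(np.arange(5))
b = np.ones((3, 4), dtype=int)
# np.ix_ turns the two index lists into an open mesh, selecting the
# cross product of rows [2, 3, 4] and columns [0, 2, 3, 4]
a[np.ix_([2, 3, 4], [0, 2, 3, 4])] = b
print(a)

This also works with arbitrary (even unsorted) row and column index lists.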

How to set values in a 2d numpy array given 1D indices for each row?

In numpy you can set the values of a 1d array at given indices:
import numpy as np
b = np.array([0, 0, 0, 0, 0])
indices = [1, 3]
b[indices] = 1
b
array([0, 1, 0, 1, 0])
I'm trying to do this with multiple rows and a set of indices for each row, in the most programmatically elegant and computationally efficient way possible. For example:
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
indices = [[1, 3], [0, 1], [0, 3]]
The desired result is
array([[0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 1, 0]])
I tried b[indices] and b[:, indices], but they resulted in an error or an undesired result.
From searching, there are a few workarounds, but each tends to need at least one loop in Python.
Solution 1: Run a loop through each row of the 2d array. The drawback is that the loop runs in Python, so this part won't take advantage of numpy's C processing.
Solution 2: Use numpy put. The drawback is that put works on a flattened version of the input array, so the indices need to be flattened too and offset by the row size and row number, which would use a double for loop in Python.
Solution 3: put_along_axis seems to only be able to set one value per row, so I would need to repeat this function for the number of values per row.
What would be the most computationally efficient and programmatically elegant solution? Anything where numpy would handle all the operations?
In [330]: b = np.zeros((3,5),int)
To set the (3,2) array of column indices, the row indices need to have (3,1) shape, so the two match by broadcasting:
In [331]: indices = np.array([[1, 3], [0, 1], [0, 3]])
In [332]: b[np.arange(3)[:, None], indices] = 1
In [333]: b
Out[333]:
array([[0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 1, 0]])
put_along_axis does the same thing:
In [335]: b = np.zeros((3, 5), int)
In [337]: np.put_along_axis(b, indices, 1, axis=1)
In [338]: b
Out[338]:
array([[0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 1, 0]])
One solution is to build the indices along each dimension and then use integer-array indexing:
from itertools import chain
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
# Find the indices along the axis 0
y = np.arange(len(indices)).repeat(np.fromiter(map(len, indices), dtype=np.int_))
# Flatten the list and convert it to an array
x = np.fromiter(chain.from_iterable(indices), dtype=np.int_)
# Finally, set the items
b[y, x] = 1
It works even for index lists with variable-sized sub-lists, like indices = [[1, 3], [0, 1], [0, 2, 3]]. If your indices list always contains the same number of items in each sub-list, then you can use the following (more efficient) code:
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
indices = np.array(indices)
n, m = indices.shape
y = np.arange(n).repeat(m)
x = indices.ravel()
b[y, x] = 1
Simple one-liner based on Jérôme's answer (requires all items of indices to be equal-length):
>>> b[np.arange(np.size(indices)) // len(indices[0]), np.ravel(indices)] = 1
>>> b
array([[0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 1, 0]])

Cluster non-zero values in a 2D NumPy array

I want to cluster the non-zero locations in a 2D NumPy array for MSER detection. I then want to find the number of points in each cluster and remove those clusters whose point count does not lie between some x and y (say, 10 and 300).
I have tried clustering them by searching over neighbouring points, but that method fails for concave-shaped non-zero clusters.
[[0, 1, 0, 0, 1],
 [0, 1, 1, 1, 1],
 [0, 0, 0, 0, 0],
 [1, 1, 0, 1, 1],
 [1, 0, 0, 1, 1]]
should output, for x=4 and y=5 (both included)
[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 1, 1],
 [0, 0, 0, 1, 1]]
I'm not sure I have understood your question correctly, but I think scikit-image's label and regionprops could get the job done.
In [6]: import numpy as np
In [7]: from skimage import measure
In [8]: img = np.array([[0, 7, 0, 0, 7],
   ...:                 [0, 9, 1, 1, 4],
   ...:                 [0, 0, 0, 0, 0],
   ...:                 [2, 1, 0, 2, 1],
   ...:                 [1, 0, 0, 6, 4]])
   ...:
In [9]: arr = measure.label(img > 0)
In [10]: arr
Out[10]:
array([[0, 1, 0, 0, 1],
       [0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [2, 2, 0, 3, 3],
       [2, 0, 0, 3, 3]])
In [11]: print('Label\t# pixels')
    ...: for region in measure.regionprops(arr):
    ...:     print(f"{region['label']}\t{region['area']}")
    ...:
Label # pixels
1 6
2 3
3 4
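To finish the task from the question, the labels can then be filtered by size and turned back into a binary mask. A minimal sketch, assuming x=4 and y=5 as in the example above:

# keep only the labels whose pixel count lies within [4, 5]
keep = [r.label for r in measure.regionprops(arr) if 4 <= r.area <= 5]
result = np.isin(arr, keep).astype(int)
print(result)

For the sample input this keeps only label 3 (area 4), reproducing the expected output.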

Count indices to array to produce heatmap

I'd like to accumulate indices pointing into an m-by-n array into another array of that same shape, to produce a heatmap. For example, these indices:
[[0, 1, 2, 0, 1, 2],
 [0, 1, 0, 0, 0, 2]]
would produce the following array:
[[2, 0, 0],
 [1, 1, 0],
 [1, 0, 1]]
I've managed to successfully implement an algorithm, but I started wondering whether there is already a built-in NumPy solution for this kind of problem.
Here's my code:
import numpy as np

a = np.array([[0, 1, 2, 0, 1, 2], [0, 1, 0, 0, 0, 2]])

def _gather_indices(indices: np.ndarray, shape: tuple):
    heat = np.zeros(shape)
    for i in range(indices.shape[-1]):
        heat[tuple(indices[:, i])] += 1
    return heat
Two methods could be suggested.
With np.add.at -
heat = np.zeros(shape,dtype=int)
np.add.at(heat,(a[0],a[1]),1)
Or with a tuple() based version, for a more aesthetic one -
np.add.at(heat,tuple(a),1)
With bincount -
idx = np.ravel_multi_index(a,shape)
np.bincount(idx,minlength=np.prod(shape)).reshape(shape)
Additionally, we could compute shape using the max-limits of the indices in a -
shape = a.max(axis=1)+1
Sample run -
In [147]: a
Out[147]:
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 0, 0, 0, 2]])
In [148]: shape = (3,3)
In [149]: heat = np.zeros(shape,dtype=int)
     ...: np.add.at(heat,(a[0],a[1]),1)
In [151]: heat
Out[151]:
array([[2, 0, 0],
       [1, 1, 0],
       [1, 0, 1]])
In [173]: idx = np.ravel_multi_index(a,shape)
In [174]: np.bincount(idx,minlength=np.prod(shape)).reshape(shape)
Out[174]:
array([[2, 0, 0],
       [1, 1, 0],
       [1, 0, 1]])
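If floating-point counts are acceptable, np.histogram2d can also produce the same heatmap directly from the two index rows; a minimal sketch, assuming the indices lie in [0, 3) as in the sample run above:

heat2d, _, _ = np.histogram2d(a[0], a[1], bins=shape, range=[[0, 3], [0, 3]])
print(heat2d.astype(int))    # same 3x3 counts as above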

Generate samples from a random matrix

Assume we have a random matrix A of size n*m. Each element A_ij is the success probability of a Bernoulli distribution.
I want to draw a sample z from A with the following rule:
z_ij ~ Bernoulli(A_ij)
Is there any numpy function that supports this?
EDIT: operations such as
arr = numpy.random.random([10, 5])
f = lambda x: numpy.random.binomial(1, x)
sp = map(f, arr)
are inefficient. Is there any faster method?
You can directly give an array as one of the arguments of your binomial distribution, for example:
import numpy as np
arr = np.random.random([10, 5])
sp = np.random.binomial(1, arr)
sp
gives
array([[0, 0, 0, 0, 0],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1],
       [0, 1, 0, 1, 0],
       [0, 1, 1, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 1, 0]])
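An equivalent vectorized trick is to compare uniform draws against the probability matrix, which also works with the newer Generator API; a minimal sketch:

import numpy as np

rng = np.random.default_rng()
arr = rng.random((10, 5))                    # success probabilities
# a uniform draw falls below p with probability p, i.e. Bernoulli(p)
z = (rng.random(arr.shape) < arr).astype(int)

rng.binomial(1, arr) works just as well; the comparison form simply makes the Bernoulli sampling explicit.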
