I have the following adjacency matrix:
array([[0, 1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0]])
This can be drawn as two separate components: a triangle A-B-C and a 4-cycle D-E-F-G.
My goal is to identify the connected components ABC and DEFG. It seems that the depth-first search algorithm is what I need and that SciPy implements it. So here is my code:
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import depth_first_order
import numpy as np
test = np.asarray([
[0, 1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0]
])
graph = csr_matrix(test)
result = depth_first_order(graph, 0)
But I don't understand the result:
>>> result
(array([0, 1, 2]), array([-9999, 0, 1, -9999, -9999, -9999, -9999]))
What is that array([-9999, 0, 1, -9999, -9999, -9999, -9999])? Also, the documentation talks about a sparse matrix, not about an adjacency matrix. But an adjacency matrix seems to be a sparse matrix by definition, so this is not clear to me.
While you could indeed use DFS to find the connected components, SciPy makes it even easier with scipy.sparse.csgraph.connected_components. With your example:
In [1]: from scipy.sparse.csgraph import connected_components
In [2]: connected_components(test)
Out[2]: (2, array([0, 0, 0, 1, 1, 1, 1], dtype=int32))
The first element is the number of connected components; the second array assigns a component label to each node, so nodes 0-2 (ABC) form one component and nodes 3-6 (DEFG) the other.
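Since the graph is undirected, you can also say so explicitly; a minimal sketch reusing the test array from the question:
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# directed=False treats the matrix as an undirected graph
n_components, labels = connected_components(csr_matrix(test), directed=False)
print(n_components)  # 2
print(labels)        # [0 0 0 1 1 1 1]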
Well, to start, you have an undirected graph. Look at the documentation again and set the directed parameter to False, since the default is True.
The first array you get lists the nodes reachable from your starting node (node 0 = node a), including the starting node itself. The second array is the predecessor array: entry i is the node from which node i was reached during the traversal, and -9999 marks nodes with no predecessor, either because they are the starting node or because they were never reached.
So you start at node a and you can reach b and c. You can't reach the rest of the graph since the graph is disconnected. DFS is doing exactly what it is supposed to do. You will need to run DFS again from node d to get the second component.
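For example, a minimal sketch that runs the same traversal from node 3 (node d), reusing the graph from the question:
from scipy.sparse.csgraph import depth_first_order

# directed=False treats the symmetric adjacency matrix as an undirected graph
nodes, predecessors = depth_first_order(graph, 3, directed=False)
print(nodes)  # [3 4 5 6] -> the DEFG component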
Related
I'm trying to find the matrix exponential of a sparse matrix:
import numpy as np
b = np.array([[1, 0, 1, 0, 1, 0, 1, 1, 1, 0],
[1, 0, 0, 0, 1, 1, 0, 1, 1, 0],
[0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
[1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 1, 1],
[0, 0, 1, 0, 1, 0, 1, 1, 0, 0],
[1, 0, 0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 0, 1, 1, 1, 0],
[0, 0, 0, 1, 0, 1, 1, 0, 0, 1]])
I can calculate this using scipy.linalg.expm, but it is slow for larger matrices.
from scipy.linalg import expm
S1 = expm(b)
Since this is a sparse matrix, I tried converting b to a scipy.sparse matrix and calling that function on the converted sparse matrix:
import scipy.sparse as sp
import numpy as np
sp_b = sp.csr_matrix(b)
S1 = expm(sp_b)
But I get the following error:
loop of ufunc does not support argument 0 of type csr_matrix which has no callable exp method
How can I calculate the matrix exponential of a sparse matrix?
You need to use scipy.sparse.linalg.expm for your sparse matrix instead of scipy.linalg.expm.
import scipy.sparse as sp
from scipy.sparse.linalg import expm
import numpy as np
b = np.array([[1, 0, 1, 0, 1, 0, 1, 1, 1, 0],
[1, 0, 0, 0, 1, 1, 0, 1, 1, 0],
[0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
[1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 1, 1],
[0, 0, 1, 0, 1, 0, 1, 1, 0, 0],
[1, 0, 0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 0, 1, 1, 1, 0],
[0, 0, 0, 1, 0, 1, 1, 0, 0, 1]])
sp_b = sp.csr_matrix(b)
S1 = expm(sp_b)
Note: As you found, defining your matrix as a CSR matrix gives the warning "SparseEfficiencyWarning: spsolve is more efficient when sparse b is in the CSC matrix format". To get rid of this, you can do as the warning suggests, and define a CSC matrix if that makes sense for your application:
sp_b = sp.csc_matrix(b)
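As a quick sanity check (a sketch assuming b, sp_b, and S1 from the snippets above), the sparse result should match the dense one up to floating-point tolerance:
import numpy as np
from scipy.linalg import expm as dense_expm

# scipy.sparse.linalg.expm returns a sparse matrix, so convert before comparing
print(np.allclose(dense_expm(b), S1.toarray()))  # True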
I have a tensor with three dimensions and three classes (0: background, 1: first class, 2: second class). I would like to find connected clusters and reassign the outliers' labels by performing a majority vote. A 2D example:
import numpy as np
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 1, 2],
[1, 2, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
should be changed to
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 2, 2],
[1, 1, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
It is enough to treat connected regions as one cluster and count the occurrences of the labels. I am not looking for a machine learning method.
You can use scipy.ndimage.measurements.label to find the connected components and then use np.bincount for the counting:
import numpy as np
from scipy.ndimage import measurements

lbl, ncl = measurements.label(data)  # label each connected nonzero region
# data + 2*lbl encodes (cluster, value) pairs as unique integer keys; bincount
# then counts, per cluster, how many pixels carry label 1 vs label 2, and
# argmax(1) + 1 picks the majority label for each cluster.
lut = np.bincount((data + 2*lbl).ravel(), None, 2*ncl + 3)[1:].reshape(-1, 2).argmax(1) + 1
lut[0] = 0  # background stays background
lut[lbl]
# array([[0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0],
# [0, 1, 0, 0, 0, 0, 0],
# [1, 1, 1, 0, 0, 2, 2],
# [1, 1, 0, 0, 2, 2, 2],
# [0, 1, 0, 0, 0, 2, 0],
# [0, 0, 0, 0, 0, 0, 0]])
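For readability, here is a slower but equivalent loop-based sketch (assuming data, lbl, and ncl from above):
out = data.copy()
for c in range(1, ncl + 1):
    mask = lbl == c
    out[mask] = np.bincount(data[mask]).argmax()  # majority label within cluster c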
In what I am working on, I have two numpy matrices of the same size, filled with 0s and 1s for simplicity (but let's say they could be filled with any numbers). What I would like to know is a way to extract, from these two matrices, the positions of the 1s that exist at the same position in both matrices.
For example, if I have the following two matrices and value
a = np.array([[0, 0, 0, 1, 0, 1],
[1, 1, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 1],
[1, 0 ,1, 1, 1, 0],
[0, 0, 1, 0, 0, 0]])
b = np.array([[0, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 0]])
value = 1
then I would like a way to get all the locations where the value 1 exists in both matrices, i.e.:
result = [(0,5),(1,1),(2,3),(4,2)]
I guess the result could be thought of as an intersection, but in my case the position is important, which is why I don't think np.intersect1d() would be much help. The actual matrices I am working with are on the order of 10,000 by 10,000, so this list would probably be a lot longer.
Thanks in advance for any help!
You could use numpy.argwhere:
import numpy as np
a = np.array([[0, 0, 0, 1, 0, 1],
[1, 1, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 1],
[1, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 0, 0]])
b = np.array([[0, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 0]])
result = np.argwhere(a & b)
print(result)
Output
[[0 5]
[1 1]
[2 3]
[2 5]
[4 2]]
Note that this also includes [2 5]: a[2, 5] and b[2, 5] are both 1, so that position is missing from your expected result.
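If the matrices can hold arbitrary values rather than just 0s and 1s, a sketch that compares against the value variable from the question instead of relying on bitwise AND:
value = 1
result = np.argwhere((a == value) & (b == value))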
Given a matrix containing a polygon mask (here a small and simplistic case):
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
The outline is extracted with skimage.segmentation.find_boundaries(), giving:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
The outline's [row, column] (i.e. [y, x]) coordinates are then extracted, giving:
outline = array([[2,2],[1,2],[1,3],[2,4],[3,5],[4,5],[5,4],[5,3],[5,2],[4,1],[3,1]])
These coordinates are then pruned to a minimal set that define the polygon (i.e. the vertices), giving:
vertices = array([[2,2],[1,2],[1,3],[3,5],[4,5],[5,4],[5,2],[4,1],[3,1]])
(Which corresponds to:)
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Is there a fast way using numpy/scipy/skimage/etc to get the outline coordinates (the array outline above) given the vertex coordinates (the array vertices above)?
Further, after getting back the outline coordinates, is there a good numpy/scipy/skimage way to get back the coordinates of all points in the original polygon mask?
Given two vertices in a polygon, v1 and v2, we can get all the points p on the line from v1 to v2 using a line rasterization algorithm; a very fast one is Bresenham's line drawing algorithm. You can then apply this algorithm to each pair of adjacent vertices in the polygon. However, I cannot guarantee that the outline will be exactly the one in the original polygon, since the rasterization algorithm gives the best set of points for the given line, not necessarily the ones in the original mask (think of them as compression errors).
For the filling step, the relevant techniques are called polygon rasterization algorithms, but I can't help you there since I don't know which are best/fastest.
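As a concrete sketch of both steps (assuming the vertices array from the question and a 7x7 grid): skimage.draw.polygon_perimeter rasterizes the outline through the vertices, and skimage.draw.polygon returns the coordinates of every point inside the polygon. Per the caveat above, the rasterized outline may differ slightly from the original find_boundaries() output.
import numpy as np
from skimage.draw import polygon, polygon_perimeter

vertices = np.array([[2, 2], [1, 2], [1, 3], [3, 5], [4, 5],
                     [5, 4], [5, 2], [4, 1], [3, 1]])

# outline pixels traced through the vertices (line rasterization)
rr, cc = polygon_perimeter(vertices[:, 0], vertices[:, 1], shape=(7, 7))

# all pixels inside the polygon, i.e. the filled mask
fr, fc = polygon(vertices[:, 0], vertices[:, 1], shape=(7, 7))
mask = np.zeros((7, 7), dtype=int)
mask[fr, fc] = 1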
I've got a 219-by-219 np.array with mostly 0s and about 2% nonzeros, and I now want to create new arrays where each of the nonzero values has a 90% chance of becoming a zero.
I know how to change the n-th nonzero value to 0, but how do I work with probabilities?
Probably this can be modified:
index = 0
for x in range(0, 219):
    for y in range(0, 219):
        if (index + 1) % 10 == 0:
            B[x][y] = 0
        index += 1
print(B)
You could use np.random.random to create an array of random numbers to compare with 0.9, and then use np.where to select either the original value or 0. Since each draw is independent, it doesn't matter if we replace a 0 with a 0, so we don't need to treat zero and nonzero values differently. For example:
In [184]: A = np.random.randint(0, 2, (8,8))
In [185]: A
Out[185]:
array([[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 1],
[0, 1, 0, 1, 1, 1, 1, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 1, 0, 1]])
In [186]: np.where(np.random.random(A.shape) < 0.9, 0, A)
Out[186]:
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0]])
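Applied to the 219x219 array from the question (assuming it is named B, as in your snippet):
B = np.where(np.random.random(B.shape) < 0.9, 0, B)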
Two more options: sample each element independently with np.random.choice, or fix the exact counts of the two values and shuffle:
# first method
prob = 0.3
print(np.random.choice([2, 5], (5,), p=[prob, 1 - prob]))
# second method (which I prefer)
import random
import numpy as np

def randomZerosOnes(a, b, N, prob):
    # round the less likely count down so that n0 + n1 always equals N
    if prob > 1 - prob:
        n1 = int((1 - prob) * N)
        n0 = N - n1
    else:
        n0 = int(prob * N)
        n1 = N - n0
    zo = np.concatenate(([a for _ in range(n0)], [b for _ in range(n1)]), axis=0)
    random.shuffle(zo)
    return zo

zo = randomZerosOnes(2, 5, N=5, prob=0.3)
print(zo)