I have created an adjacency matrix using networkx as below:
from networkx.algorithms.bipartite.matrix import biadjacency_matrix as adj
user_node_list = data['user_id'].unique()
item_node_list = data['item_id'].unique()
adj_matrix = adj(B, user_node_list, column_order=item_node_list, dtype=None, weight='rating', format='csr')
I want to visualize this adj_matrix. How can I do this?
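You can use pandas to view adj_matrix as a labeled table. Convert the sparse matrix to a dense array first:
import pandas as pd
A = pd.DataFrame(adj_matrix.toarray(), index=user_node_list, columns=item_node_list)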
Much of the time we work with graphs whose adjacency matrices are sparse, so networkx returns a SciPy Compressed Sparse Row (CSR) matrix rather than a numpy.ndarray or numpy.matrix. The CSR representation uses more efficient data structures and algorithms for storing and processing sparse matrices. In particular, the __repr__ output of a CSR matrix differs from that of a vanilla (dense) NumPy matrix. It will look something like
<11x11 sparse matrix of type '<class 'numpy.int64'>'
with 28 stored elements in Compressed Sparse Row format>
This makes sense because if the representation of a CSR matrix were the same as what we see with a dense matrix, a simple print statement or logging message could have serious performance impacts if the matrix were very large.
Compare the above output with the __repr__ output of a vanilla (dense) NumPy matrix:
matrix([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]])
which allows us to inspect the matrix elements visually (I am guessing that this is what was meant with "visualize the adj_matrix").
To convert a sparse CSR matrix to a dense NumPy matrix, call sparse_matrix.todense() (or sparse_matrix.toarray() for a plain numpy.ndarray). Note that the dense representation requires substantially more memory, so be mindful of that when working with larger graphs.
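As a minimal sketch of the difference between the two representations, the snippet below builds a small CSR matrix directly with SciPy and prints both forms. The 3x4 rating values are made up for illustration and are not taken from the question's data.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x4 user-by-item rating matrix (illustrative values only).
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 3, 0, 2])
ratings = np.array([5, 3, 4, 1])
sparse_matrix = csr_matrix((ratings, (rows, cols)), shape=(3, 4))

# The repr is a compact summary (shape, dtype, number of stored elements).
print(repr(sparse_matrix))

# toarray() materializes every entry, zeros included, as a numpy.ndarray.
dense = sparse_matrix.toarray()
print(dense)
```

For a large graph, printing only a slice such as sparse_matrix[:10, :10].toarray() keeps the memory cost small.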
I have the following adjacency matrix:
array([[0, 1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0]])
Which can be drawn like that:
My goal is to identify the connected components ABC and DEFG. It seems that the depth-first search algorithm is what I need and that SciPy implements it. So here is my code:
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import depth_first_order
import numpy as np
test = np.asarray([
[0, 1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0]
])
graph = csr_matrix(test)
result = depth_first_order(graph, 0)
But I don't understand the result:
>>> result
(array([0, 1, 2]), array([-9999, 0, 1, -9999, -9999, -9999, -9999]))
What is that array([-9999, 0, 1, -9999, -9999, -9999, -9999])? Also, the documentation talks about a sparse matrix, not an adjacency matrix. But an adjacency matrix seems to be a sparse matrix by definition, so this is not clear to me.
While you could indeed use DFS to find the connected components, SciPy makes it even easier with scipy.sparse.csgraph.connected_components. With your example:
In [3]: connected_components(test)
Out[3]: (2, array([0, 0, 0, 1, 1, 1, 1], dtype=int32))
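A self-contained version of the call above might look like the following sketch. The letter names A to G follow the question's labeling, and directed=False is passed because the graph is undirected.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

test = np.asarray([
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
])
names = "ABCDEFG"  # node i is named names[i], as in the question

# labels[i] is the component index assigned to node i.
n_components, labels = connected_components(csr_matrix(test), directed=False)

# Group node names by component label.
components = {}
for name, label in zip(names, labels):
    components.setdefault(int(label), []).append(name)

print(n_components)
print(sorted(components.values()))
```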
To start, you have an undirected graph. Look at the documentation again and set the directed parameter to False, since the default is True.
The first array you get lists the nodes reachable from where you start (node 0 = node A), including the starting node. The second array holds each node's predecessor in the DFS tree; -9999 marks nodes that have no predecessor, which means the start node itself and every node that was never reached.
So you start at node A and you can reach B and C. You can't reach the rest of the graph because the graph is disconnected. DFS is doing what it is supposed to do. You will need to run DFS from node D to get the second component.
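The following sketch runs depth_first_order once from node 0 (A) and once from node 3 (D) on the question's matrix, to show how the two returned arrays relate to the two components. The dictionary names reachable and preds are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import depth_first_order

test = np.asarray([
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
])
graph = csr_matrix(test)

reachable = {}
preds = {}
for start in (0, 3):
    # nodes: visit order; predecessors[i]: parent of i in the DFS tree,
    # or -9999 if i is the start node or was never reached.
    nodes, predecessors = depth_first_order(graph, start, directed=False)
    reachable[start] = sorted(int(n) for n in nodes)
    preds[start] = predecessors
    print(start, reachable[start], predecessors)
```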
How can I efficiently insert sub-matrices at specific positions into my sparse matrix? Also, which scipy sparse matrix class is recommended for such an incremental construction?
More specifically, how can I fill the matrix M in the code below?
def rrd(mesh, rel_rotations, neighbors, R_0):
    M = scipy.sparse.lil_matrix((N_FACES*9*3,N_FACES*9))
    for i in range(0,N_FACES*27,27):
        for j in range(3):
            for k in range(0,N_FACES*9,9):
                M[i+j*9:i+(j+1)*9,k:k+9] = -np.eye(9)
    for i in range(len(rel_rotations)):
        diagonals = [
            rel_rotations[i][0][2],
            np.append(rel_rotations[i][0][1].repeat(3), rel_rotations[i][1][2].repeat(3)),
            np.append(rel_rotations[i][0][0].repeat(3), np.append(rel_rotations[i][1][1].repeat(3),
                                                                  rel_rotations[i][2][2].repeat(3))),
            np.append(rel_rotations[i][1][0].repeat(3), rel_rotations[i][2][1].repeat(3)),
            rel_rotations[i][2][0].repeat(3)
        ]
        diag_rel_rotations = scipy.sparse.diags(diagonals, [-6,-3,0,3,6], shape=(9,9)).todense()
        mod = i % 3
        div = int((i-mod)/3)
        n_idx = neighbors[div][mod]
        M[i+mod*9:i+(mod+1)*9][n_idx*9:(n_idx+1)*9] = diag_rel_rotations
Slicing doesn't work here and I looked through some different types of sparse matrices but couldn't figure out which is the appropriate one for this problem.
lil_matrix is the right format for this kind of incremental assignment. Index it with a single two-dimensional slice, M[r0:r1, c0:c1]:
In [553]: M = sparse.lil_matrix((9,9), dtype=int)
In [554]: M
Out[554]:
<9x9 sparse matrix of type '<class 'numpy.int64'>'
with 0 stored elements in LInked List format>
In [555]: M[2:5, 3:6] = np.eye(3)
In [556]: M
Out[556]:
<9x9 sparse matrix of type '<class 'numpy.int64'>'
with 3 stored elements in LInked List format>
In [557]: M.A
Out[557]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [558]: d = sparse.diags([[1,2],[1,2,3],[2,3]], [-1,0,1])
In [562]: M[0:3, 6:9] = d
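Putting this together, a minimal sketch of block-wise construction could look like the code below. The block values and positions are invented for illustration. The key point is the single M[r0:r1, c0:c1] index. A chained index such as M[r0:r1][c0:c1] first slices the rows into a new matrix and then slices the rows of that result again, so it does not address a row and column block of M.

```python
import numpy as np
from scipy import sparse

# Build incrementally in LIL format, which supports cheap slice assignment.
M = sparse.lil_matrix((9, 9))

# Hypothetical 3x3 blocks, placed with one 2-D slice each.
block = np.arange(1, 10).reshape(3, 3)
M[0:3, 6:9] = block        # rows 0-2, columns 6-8
M[3:6, 0:3] = -np.eye(3)   # rows 3-5, columns 0-2

# Convert once at the end; CSR is better suited to arithmetic and solvers.
M_csr = M.tocsr()
print(M_csr.toarray())
```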
I am trying to make a special diagonal matrix that looks like this:
[[1,1,0,0,0,0],
[0,0,1,1,0,0],
[0,0,0,0,1,1]]
It is slightly different from the question here: Make special diagonal matrix in Numpy
I tried tweaking the solution but couldn't quite get it.
Appreciate any advice on how to achieve this efficiently.
Not as elegant as the suggestions in the comments, but:
a=4 # number of rows
b=a*2 #number of columns
np.array((([1]*2+[0]*b)*a)[:-b]).reshape(a,b)
array([[1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1]])
This works for any a.
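One alternative, offered here as a sketch rather than as what the comments actually proposed, is a Kronecker product of an identity matrix with a 1x2 row of ones. Each 1 on the identity's diagonal is expanded into the pair [1, 1].

```python
import numpy as np

a = 3  # number of rows; the result has 2*a columns

# np.kron replaces every entry e of the identity with e * [[1, 1]].
block_diag = np.kron(np.eye(a, dtype=int), np.ones((1, 2), dtype=int))
print(block_diag)
```

np.repeat(np.eye(a, dtype=int), 2, axis=1) builds the same matrix by duplicating each column of the identity.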
I have a sparse matrix stored on disk in coordinate (triplet) format.
I would like to read chunks of the matrix into memory using scipy.sparse. However, when I do this, scipy always assumes the matrix is indexed from (0, 0), regardless of the chunk.
This means, for example, that scipy interprets the last chunk as a huge matrix that only has values in the bottom right corner.
How can I correctly handle the chunks so that when doing toarray to create a dense matrix it only creates the subset corresponding to that chunk?
The reason for doing this is that, even in sparse form, the matrix is too large for memory (approx. 600 million 32-bit floating point values). To display it on screen (the matrix represents a geospatial raster), I need to convert it to a dense matrix and store it in a geospatial format (e.g. GeoTIFF).
You should be able to tweak the row and col values when building the subset. For example:
In [84]: row=np.arange(10)
In [85]: col=np.random.randint(0,6,row.shape)
In [86]: data=np.ones(row.shape,dtype=int)*2
In [87]: M=sparse.coo_matrix((data,(row,col)),shape=(10,6))
In [88]: M.A
Out[88]:
array([[0, 0, 2, 0, 0, 0],
[0, 0, 0, 0, 0, 2],
[0, 0, 0, 2, 0, 0],
[0, 0, 2, 0, 0, 0],
[0, 0, 2, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 2, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 2]])
To build a matrix with a subset of the rows use:
In [89]: M1=sparse.coo_matrix((data[5:],(row[5:]-5,col[5:])),shape=(5,6))
In [90]: M1.A
Out[90]:
array([[0, 2, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 2, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 2]])
You'll have to decide whether you want to specify the shape for M1, or let it deduce it from the range of row and col.
If these coordinates are not sorted, or you also want to take a subrange of col, things could get more complicated. But I think this captures the basic idea.
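For the unsorted case with a column subrange, one possible approach is to select the triplets with a boolean mask and shift both index arrays by the chunk's origin. The helper name coo_chunk and the sample triplets below are hypothetical, and the sketch assumes the triplets for the chunk have already been read into NumPy arrays.

```python
import numpy as np
from scipy import sparse

def coo_chunk(row, col, data, row_range, col_range):
    """Build the sub-matrix covering [r0, r1) x [c0, c1) from COO triplets."""
    r0, r1 = row_range
    c0, c1 = col_range
    # The mask does not require the triplets to be sorted.
    keep = (row >= r0) & (row < r1) & (col >= c0) & (col < c1)
    # Shift indices so the chunk starts at (0, 0).
    return sparse.coo_matrix(
        (data[keep], (row[keep] - r0, col[keep] - c0)),
        shape=(r1 - r0, c1 - c0),
    )

# Illustrative, unsorted triplets for a 10x6 matrix.
row = np.array([9, 0, 7, 5, 2])
col = np.array([5, 2, 4, 1, 3])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float32)

chunk = coo_chunk(row, col, data, (5, 10), (0, 6))
dense_chunk = chunk.toarray()  # only 5x6 values are materialized
print(dense_chunk)
```

Passing the shape explicitly keeps every chunk the same size even when its last rows or columns happen to be empty.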
How can I construct a sparse matrix from diagonal vectors like this?
Let's say my matrix is square with dimension N=6 and I have the following vector
vec = np.array([[1], [1,2]], dtype=object)  # ragged input needs dtype=object on recent NumPy
and I want to put those parts on diagonals
offset = np.array([2,3])
but vec[0] should start at Mat[0,2] and vec[1] should start at Mat[1,4]
I know about scipy.sparse.diags() but I don't think there is a way to specify just part of a diagonal where non-zero elements are present.
This is just an example to illustrate the problem. In reality I deal with very big arrays and I don't want to waste memory on useless zeros.
Is this the matrix that you want?
In [200]: sparse.dia_matrix(([[0,0,1,0,0,0],[0,0,0,0,1,2]],[2,3]),(6,6)).A
Out[200]:
array([[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
Yes, the specification includes zeros, which could be annoying in larger cases.
spdiags just wraps dia_matrix, with the option of converting the result to another format. In your example, that conversion reduces the 7 stored elements of the dia_matrix to 3.
sparse.diags accepts a ragged list of values, but they still need to match the diagonals in length. And internally it converts them to the rectangular array that dia_matrix takes.
S3=sparse.diags([[1,0,0,0],[0,1,2]],[2,3],(6,6))
So if you really need to be parsimonious about the zeros you need to go the coo route.
For example:
In [363]: starts = [[0,2],[1,4]]
In [364]: data = np.concatenate(vec)
In [365]: rows=np.concatenate([range(s[0],s[0]+len(v)) for s,v in zip(starts, vec)])
In [366]: cols=np.concatenate([range(s[1],s[1]+len(v)) for s,v in zip(starts, vec)])
In [367]: sparse.coo_matrix((data,(rows,cols)),(6,6)).A
Out[367]:
array([[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
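The coo steps above can be wrapped in a small helper, sketched below. The function name partial_diagonals and the (row, col) tuple form of starts are assumptions made for this example. The result is checked against the padded sparse.diags version from earlier in the answer.

```python
import numpy as np
from scipy import sparse

def partial_diagonals(segments, starts, shape):
    """Place each segment along a diagonal, beginning at its (row, col) start."""
    data = np.concatenate([np.asarray(seg) for seg in segments])
    rows = np.concatenate([r + np.arange(len(seg))
                           for seg, (r, c) in zip(segments, starts)])
    cols = np.concatenate([c + np.arange(len(seg))
                           for seg, (r, c) in zip(segments, starts)])
    # Only the listed values are stored; no padding zeros are needed.
    return sparse.coo_matrix((data, (rows, cols)), shape=shape)

vec = [[1], [1, 2]]          # a plain list of segments avoids ragged arrays
starts = [(0, 2), (1, 4)]    # vec[0] starts at Mat[0,2], vec[1] at Mat[1,4]
mat = partial_diagonals(vec, starts, (6, 6))

# Same matrix built with explicit zero padding, for comparison.
reference = sparse.diags([[1, 0, 0, 0], [0, 1, 2]], [2, 3], (6, 6))
print(mat.toarray())
```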