Changing graph dataset matrices from sparse format to dense

I am trying to use the CoRA dataset to train a graph neural network in TensorFlow for the first time. The feature and adjacency matrices provided by the dataset come in a sparse representation, but I don't need that here. I therefore want to use numpy's todense(), but it turns out it doesn't exist. For reference, here is the relevant code:
import tensorflow as tf
import numpy as np
from spektral.datasets import citation
cora_dataset = citation.Citation(name='cora')
test_mask = cora_dataset.mask_te
train_mask = cora_dataset.mask_tr
val_mask = cora_dataset.mask_va
graph = cora_dataset.graphs[0]
features = graph.x
adj = graph.a
labels = graph.y
features = features.todense()
and the error is: "AttributeError: 'numpy.ndarray' object has no attribute 'todense'"
I would like to know if there has been a replacement for todense() or any other ways to convert sparse representations to dense.

You can use tf.sparse.to_dense to convert a sparse tensor to a dense tensor.
Here is the example:
import tensorflow as tf

# (row, column) coordinates of the non-zero entries of an 8x8 adjacency matrix
indices = [
    [0, 1], [0, 2], [0, 4],
    [1, 0], [1, 2], [1, 3], [1, 5],
    [2, 0], [2, 1], [2, 3], [2, 4],
    [3, 1], [3, 2], [3, 7],
    [4, 0], [4, 2], [4, 5], [4, 6],
    [5, 1], [5, 4], [5, 6],
    [6, 4], [6, 5], [6, 7],
    [7, 3], [7, 6],
]
# one value of 1.0 per stored coordinate
values = [1.0] * len(indices)
dense_shape = [8, 8]
adjacency_matrix = tf.sparse.SparseTensor(indices, values, dense_shape)
dense_matrix = tf.sparse.to_dense(adjacency_matrix)
I hope that helps.
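
A note on the error in the question: the message says that features is already a numpy.ndarray, so it is dense to begin with and needs no conversion. The adjacency matrix is a different matter and may well be a SciPy sparse matrix. A minimal sketch of a conversion helper that handles both cases follows; the name to_dense and the small stand-in matrices are made up for illustration and are not part of Spektral.

```python
import numpy as np
from scipy import sparse


def to_dense(matrix):
    """Return a dense ndarray whether `matrix` is SciPy-sparse or already dense."""
    if sparse.issparse(matrix):
        # toarray() returns a plain ndarray (todense() would return np.matrix)
        return matrix.toarray()
    return np.asarray(matrix)


# Hypothetical stand-ins for graph.x (dense) and graph.a (sparse)
features = np.eye(3, dtype=np.float32)
adj = sparse.csr_matrix(
    np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32)
)

dense_features = to_dense(features)
dense_adj = to_dense(adj)
print(type(dense_features).__name__, type(dense_adj).__name__)
```

With the real dataset you would call to_dense(graph.x) and to_dense(graph.a) instead of using the stand-ins.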

Related

What does x[:,[0,1,2,2]] (a kind of splicing) mean in numpy arrays in python?

I was executing the following in Anaconda. Using numpy.ones, I got this:
import numpy as np
x=np.ones((3,3))
print(x)
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
x[:,[1,1,1,1]]
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])

x[:, [0,1,2,2]] means you are taking (all the rows of) columns 0,1,2 and 2 and combining them.
Since you have all ones in your data, it is hard to visualize but the following example will help:
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
x
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
x[:, [0]]
array([[1],
[4],
[7]])
x[:, [1]]
array([[2],
[5],
[8]])
x[:, [2]]
array([[3],
[6],
[9]])
x[:, [0, 2, 1, 1]]
out: array([[1, 3, 2, 2],
[4, 6, 5, 5],
[7, 9, 8, 8]])
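
To make the "combining columns" description concrete, here is a small sketch that builds the same result by stacking the selected columns one by one, and also shows that the result is a new array rather than a view. The variable names are only illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cols = [0, 1, 2, 2]

# Indexing with a list of column numbers (all rows, listed columns)
picked = x[:, cols]

# The same thing spelled out: take each listed column and stack them side by side
stacked = np.stack([x[:, c] for c in cols], axis=1)
same = np.array_equal(picked, stacked)

# Indexing with a list returns a copy, so changing it leaves x untouched
picked[0, 0] = 100
print(picked.shape, same, x[0, 0])
```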

Create a new sparse matrix from the operations of different rows of a given large sparse matrix in python

Let's say S is a large SciPy CSR (sparse) matrix and D is a dictionary whose keys are the index (position) of a row vector A in S and whose values are lists of the indices (positions) of other row vectors l in S. For each row vector in l, you subtract it from A; the result is a new row vector that becomes a row of the new sparse matrix.
dictionary of form -> { 1 : [4 , 5 ,... ,63] }
then have to create a new sparse matrix with....
new_row_vector_1 -> S_vec1 - S_vec4
new_row_vector_2 -> S_vec1 - S_vec5
.
new_row_vector_n -> S_vec1 - S_vec63
where S_vecX is the Xth row vector of matrix S
Numpy Example:
>>> import numpy as np
>>> s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
>>> s
array([[ 1, 5, 3, 4],
[ 3, 0, 12, 7],
[ 5, 6, 2, 4],
[ 4, 6, 6, 4],
[ 7, 12, 5, 67]])
>>> index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
>>> n = np.zeros((10,4)) #sum of all lengths of values in index_dictionary would be the number of rows for the new array(n) and columns remain the same as s.
>>> n
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> idx = 0
>>> for index in index_dictionary:
...     for k in index_dictionary[index]:
...         n[idx] = s[index]-s[k]
...         idx += 1
...
>>> n
array([[ -4., -1., 1., 0.],
[ -6., -7., -2., -63.],
[ -1., -6., 6., 3.],
[ -4., -12., 7., -60.],
[ 2., 6., -10., -3.],
[ 1., 6., -6., -3.],
[ -1., 0., 4., 0.],
[ 4., 12., -7., 60.],
[ 3., 6., -1., 63.],
[ 2., 6., 3., 63.]])
n is what I want.

Here's a simple demonstration of what I think you are trying to do:
First the numpy array version:
In [619]: arr=np.arange(12).reshape(4,3)
In [620]: arr[[1,0,2,3]]-arr[0]
Out[620]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]])
Now the sparse equivalent:
In [622]: M=sparse.csr_matrix(arr)
csr implements row indexing:
In [624]: M[[1,0,2,3]]
Out[624]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
In [625]: M[[1,0,2,3]].A
Out[625]:
array([[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[ 9, 10, 11]], dtype=int32)
But not broadcasting:
In [626]: M[[1,0,2,3]]-M[0]
....
ValueError: inconsistent shapes
So we can use an explicit form of broadcasting
In [627]: M[[1,0,2,3]]-M[[0,0,0,0]] # or M[[0]*4]
Out[627]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 9 stored elements in Compressed Sparse Row format>
In [628]: _.A
Out[628]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]], dtype=int32)
This may not be the fastest or most efficient, but it's a start.
I found in a previous SO question ("Sparse matrix slicing using list of int") that M[[1,0,2,3]] indexing is performed with a matrix multiplication, in this case the equivalent of:
idxM = sparse.csr_matrix(([1,1,1,1],([0,1,2,3],[1,0,2,3])),(4,4))
M1 = idxM * M
So my difference expression requires 2 such multiplications along with the subtraction.
We could try a row by row iteration, and building a new matrix from the result, but there's no guarantee that it will be faster. Depending on the arrays, converting to dense and back might even be faster.
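
As a rough check of the selection-matrix idea, the sketch below builds the selection matrices explicitly and compares the products with ordinary row indexing. Subtracting the two selection matrices first, so that only one product is needed, is my own variation and not something taken from the linked question; whether it is faster will depend on the data.

```python
import numpy as np
from scipy import sparse

arr = np.arange(12).reshape(4, 3)
M = sparse.csr_matrix(arr)
sel = [1, 0, 2, 3]   # rows to pick
n = len(sel)
ones = np.ones(n, dtype=arr.dtype)

# Selection matrix: row i has a single 1 in column sel[i]
P_sel = sparse.csr_matrix((ones, (np.arange(n), sel)), shape=(n, M.shape[0]))
# Selection matrix that picks row 0 every time (the "explicit broadcast")
P_zero = sparse.csr_matrix((ones, (np.arange(n), [0] * n)), shape=(n, M.shape[0]))

by_product = (P_sel @ M).toarray()
by_index = M[sel].toarray()

# Difference of rows with a single product: (P_sel - P_zero) @ M
diff = ((P_sel - P_zero) @ M).toarray()
print(diff)
```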
=================
I can imagine 2 ways of applying this to the dictionary.
One is to iterate through the dictionary (what order?), perform this difference for each key, collect the results in a list (list of sparse matrices), and use sparse.bmat to join them into one matrix.
Another is to collect two lists of indexes, and apply the above indexed difference just once.
In [8]: index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
In [10]: alist=[]
    ...: for index in index_dictionary:
    ...:     for k in index_dictionary[index]:
    ...:         alist.append((index, k))
In [11]: idx = np.array(alist)
In [12]: idx
Out[12]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
applied to your dense s:
In [15]: s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
In [16]: s[idx[:,0]]-s[idx[:,1]]
Out[16]:
array([[ -4, -1, 1, 0],
[ -6, -7, -2, -63],
[ -1, -6, 6, 3],
[ -4, -12, 7, -60],
[ 2, 6, -10, -3],
[ 1, 6, -6, -3],
[ -1, 0, 4, 0],
[ 4, 12, -7, 60],
[ 3, 6, -1, 63],
[ 2, 6, 3, 63]])
and to the sparse equivalent
In [19]: arr= sparse.csr_matrix(s)
In [20]: arr
Out[20]:
<5x4 sparse matrix of type '<class 'numpy.int32'>'
with 19 stored elements in Compressed Sparse Row format>
In [21]: res=arr[idx[:,0]]-arr[idx[:,1]]
In [22]: res
Out[22]:
<10x4 sparse matrix of type '<class 'numpy.int32'>'
with 37 stored elements in Compressed Sparse Row format>
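
Putting the pieces together, here is a self-contained sketch of the second approach (collect the index pairs once, then do a single indexed difference), checked against the dense computation. The helper name pairs is mine; the data is the example from the question.

```python
import numpy as np
from scipy import sparse

s = np.array([[1, 5, 3, 4],
              [3, 0, 12, 7],
              [5, 6, 2, 4],
              [4, 6, 6, 4],
              [7, 12, 5, 67]])
index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}

# One (key, other) pair per output row, in dictionary iteration order
pairs = np.array([(key, other)
                  for key, others in index_dictionary.items()
                  for other in others])

S = sparse.csr_matrix(s)
# Row-indexed difference; the result stays sparse (CSR)
res = S[pairs[:, 0]] - S[pairs[:, 1]]

# Dense reference for comparison
expected = s[pairs[:, 0]] - s[pairs[:, 1]]
print(res.shape, np.array_equal(res.toarray(), expected))
```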

Scipy sparse matrix assignment using only stored elements

I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):
globalGrid[xLocalGrid:xLocalGrid + localGrid.shape[0], yLocalGrid:yLocalGrid + localGrid.shape[1]] = localGrid
where xLocalGrid and yLocalGrid are the offsets of the localGrid origin with respect to the globalGrid.
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
I have read about masked arrays in numpy, however that does not seem to apply to sparse scipy matrices.
edit: In response to the comments below, here is an example to illustrate what I mean:
First setup the matrices:
M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]])
m.todense()
matrix([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
Then assign:
M[1:4, 1:4] = m
Now the result is:
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 0., 0., 2.],
[ 2., 0., 1., 0., 2.],
[ 2., 0., 0., 1., 2.],
[ 2., 2., 2., 2., 2.]])
Whereas I need the result to be:
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 2., 2., 2.],
[ 2., 2., 1., 2., 2.],
[ 2., 2., 2., 1., 2.],
[ 2., 2., 2., 2., 2.]])

Should this line
The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
be changed to?
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, that you are worried that 0's are being copied.
Let's try this with real matrices.
In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)
M does not have any unnecessary 0's.
If there are unnecessary 0's in a sparse matrix, a round trip to csr format should take care of them
M.tocsr().tolil()
csr format also has an inplace .eliminate_zeros() method.
So your concern is with overwriting the nonzeros of the target array.
With dense arrays, the use of nonzero (or where) takes care of this:
In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
Trying the sparse equivalent:
In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable
I could have used the sparse matrix's own nonzero method.
In [106]: I,J=m.nonzero()
For coo format, this is the same as
In [109]: I,J=m.row, m.col
In which case I can also use the data attribute:
In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
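
For reference, the same idea applied to the 5x5 example from the question looks roughly like this. It is a sketch under the assumption that the local matrix can be converted to coo format; the offset names row_offset and col_offset are illustrative.

```python
import numpy as np
from scipy import sparse

M = sparse.lil_matrix(2 * np.ones((5, 5)))   # global grid, all 2s
m = sparse.eye(3).tocoo()                    # local grid in coo format
row_offset, col_offset = 1, 1

# Keep only entries whose stored value is non-zero (mirrors what nonzero() does)
mask = m.data != 0
rows = m.row[mask] + row_offset
cols = m.col[mask] + col_offset

# Assign just those entries; everything else in M keeps its old value
M[rows, cols] = m.data[mask]

result = M.toarray()
print(result)
```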
The code for m.nonzero may be instructive
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
So you need to be careful to make sure that index and data attributes of the sparse matrix match.
And also pay attention as to which sparse formats allow indexing. lil is good for changing values. csr allows element by element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or v.v.). coo has this nice pairing of indices and data, but doesn't allow indexing.
Another subtle point: in constructing a coo you may repeat coordinates. When converted to csr format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.
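
One possible way to guard against repeated coordinates is to merge them before assigning. The sketch below uses coo_matrix.sum_duplicates(), which works in place; the tiny matrix is invented purely to show the effect.

```python
import numpy as np
from scipy import sparse

# A coo matrix in which coordinate (0, 0) is stored twice
row = np.array([0, 0, 1])
col = np.array([0, 0, 1])
data = np.array([1.0, 2.0, 5.0])
m = sparse.coo_matrix((data, (row, col)), shape=(2, 2))
stored_before = m.nnz

# Converting to csr sums the repeated entries: (0, 0) becomes 3.0
summed = m.tocsr().toarray()

# Merge the duplicates in place so row/col/data describe each cell once
m.sum_duplicates()
stored_after = m.nnz

# Now an index-based assignment agrees with the csr conversion
M = sparse.lil_matrix((2, 2))
M[m.row, m.col] = m.data
print(stored_before, stored_after)
print(M.toarray())
```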

I have found a working solution. Instead of using an assignment I loop over the data in the sparse matrix (using M.data and M.rows) and replace the elements one by one.
for idx, row in enumerate(m.rows):
    for idy, col in enumerate(row):
        M[yOffset+col, xOffset+idx] = m.data[idx][idy]
I am still curious though if there is no simpler/faster method to achieve this result.
My implementation of the above answer:
I,J = m.nonzero()
M[I+yOffset,J+xOffset] = m[I,J]
return M
This was however marginally slower.

Specifying dtype=object for numpy.gradient

Is there a way to specify the dtype for numpy.gradient?
I'm using an array of subarrays and it's throwing the following error:
ValueError: setting an array element with a sequence.
Here is an example:
import numpy as np
a = np.empty([3, 3], dtype=object)
it = np.nditer(a, flags=['multi_index', 'refs_ok'])
while not it.finished:
    i = it.multi_index[0]
    j = it.multi_index[1]
    a[it.multi_index] = np.array([i, j])
    it.iternext()
print(a)
which outputs
[[array([0, 0]) array([0, 1]) array([0, 2])]
[array([1, 0]) array([1, 1]) array([1, 2])]
[array([2, 0]) array([2, 1]) array([2, 2])]]
I would like print(np.gradient(a)) to return
array(
[[array([[1, 0],[0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])]],
dtype=object)
Notice that, in this case, the gradient of the vector field is an identity tensor field.

Why are you working with an array of dtype object? That's more work than using a 2d array.
e.g.
In [53]: a1=np.array([[1,2],[3,4],[5,6]])
In [54]: a1
Out[54]:
array([[1, 2],
[3, 4],
[5, 6]])
In [55]: np.gradient(a1)
Out[55]:
[array([[ 2., 2.],
[ 2., 2.],
[ 2., 2.]]),
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])]
or working column by column, or row by row
In [61]: [np.gradient(i) for i in a1.T]
Out[61]: [array([ 2., 2., 2.]), array([ 2., 2., 2.])]
In [62]: [np.gradient(i) for i in a1]
Out[62]: [array([ 1., 1.]), array([ 1., 1.]), array([ 1., 1.])]
dtype=object only makes sense if the subarrays/lists differ in type and/or shape. And even then it doesn't add much to a regular Python list.
==============================
I can take your 2d a, and make a 3d array with:
In [126]: a1=np.zeros((3,3,2),int)
In [127]: a1.flat[:]=[i for i in a.flatten()]
In [128]: a1
Out[128]:
array([[[0, 0],
[0, 1],
[0, 2]],
[[1, 0],
[1, 1],
[1, 2]],
[[2, 0],
[2, 1],
[2, 2]]])
Or I could produce the same thing with meshgrid:
In [129]: X,Y=np.meshgrid(np.arange(3),np.arange(3),indexing='ij')
In [130]: a2=np.array([Y,X]).T
When I apply np.gradient to that I get 3 arrays, each (3,3,2) in shape.
In [136]: ga1=np.gradient(a1)
In [137]: len(ga1)
Out[137]: 3
In [138]: ga1[0].shape
Out[138]: (3, 3, 2)
It looks like the 1st 2 arrays have the values you want, so it's just a matter of rearranging them.
In [141]: np.array(ga1[:2]).shape
Out[141]: (2, 3, 3, 2)
In [143]: gga1=np.array(ga1[:2]).transpose([1,2,0,3])
In [144]: gga1.shape
Out[144]: (3, 3, 2, 2)
In [145]: gga1[0,0]
Out[145]:
array([[ 1., -0.],
[-0., 1.]])
If they must go back into a (3,3) object array, I could do:
In [146]: goa1=np.empty([3,3],dtype=object)
In [147]: for i in range(3):
   .....:     for j in range(3):
   .....:         goa1[i,j]=gga1[i,j]
   .....:
In [148]: goa1
Out[148]:
array([[array([[ 1., -0.],
[-0., 1.]]),
array([[ 1., -0.],
[ 0., 1.]]),
array([[ 1., -0.],
...
[ 0., 1.]]),
array([[ 1., 0.],
[ 0., 1.]])]], dtype=object)
I still wonder what's the point of working with an object array.
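
The steps above can be condensed into a short script. This is a sketch that skips the object array entirely and keeps the vector field as a regular (3, 3, 2) float array; the names field and jac are mine.

```python
import numpy as np

# field[i, j] = [i, j], stored as a regular (3, 3, 2) array
ii, jj = np.meshgrid(np.arange(3), np.arange(3), indexing='ij')
field = np.stack([ii, jj], axis=-1).astype(float)

# Differentiate along the two grid axes only, not along the component axis
d_di, d_dj = np.gradient(field, axis=(0, 1))

# jac[i, j] is the 2x2 matrix of derivatives at grid point (i, j)
jac = np.stack([d_di, d_dj], axis=2)
print(jac.shape)
print(jac[0, 0])
```

Passing axis=(0, 1) avoids computing the third gradient (along the component axis) that then has to be discarded.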

Adding n columns to a numpy array [duplicate]

I'm making a program where I need to make a matrix looking like this:
A = np.array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
So I started thinking about this np.arange(1,4)
But, how to append n columns of np.arange(1,4) to A?

As mentioned in docs you can use concatenate
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
[3, 4, 6]])

Here's another way, using broadcasting:
In [69]: np.arange(1,4)*np.ones((4,1))
Out[69]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])

You can get something like what you typed in your question with:
N = 3
A = np.tile(np.arange(1, N+1), (N, 1))
I'm assuming you want a square array?

>>> np.repeat([np.arange(1, 4)], 4, 0)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
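
For comparison, here is a small sketch that runs the approaches from these answers side by side and checks that they agree. np.vstack and np.broadcast_to are additions of mine, not taken from the answers above; note that broadcast_to returns a read-only view rather than a new array.

```python
import numpy as np

row = np.arange(1, 4)   # [1, 2, 3]
n = 4                   # number of repeated rows wanted

tiled = np.tile(row, (n, 1))
repeated = np.repeat(row[np.newaxis, :], n, axis=0)
stacked = np.vstack([row] * n)
by_broadcast = row * np.ones((n, 1))          # float result
view = np.broadcast_to(row, (n, row.size))    # read-only view, no copy

print(tiled)
print(tiled.shape, view.flags.writeable)
```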
