Efficient way of taking Logarithm function in a sparse matrix - python

I have a big sparse matrix and I want to take log base 4 of every element in it.
I tried numpy.log(), but it doesn't work directly on sparse matrices.
I can also take the logarithm row by row, overwriting each old row with the new one:
# Assume A is a sparse matrix (linked list format, LIL) with float values as data
# This handles only one row
import numpy as np
c = np.log(A.getrow(0)) / np.log(4)   # change of base: log4(x) = ln(x) / ln(4)
A[0, :] = c
This was not as quick as I'd expected. Is there a faster way to do this?

You can modify the data attribute directly:
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> a = np.array([[5, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 2, 0, 0]])
>>> coo = coo_matrix(a)
>>> coo.data
array([5, 2])
>>> coo.data = np.log(coo.data)
>>> coo.data
array([ 1.60943791,  0.69314718])
>>> coo.todense()
matrix([[ 1.60943791,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.69314718,  0.        ,  0.        ]])
Note that this doesn't work properly if the sparse format has repeated elements (which is valid in the COO format); it'll take the logs individually, and log(a) + log(b) != log(a + b). You probably want to convert to CSR or CSC first (which is fast) to avoid this problem.
You'll also have to add checks if the sparse matrix is in a different format, of course. And if you don't want to modify the matrix in-place, just construct a new sparse matrix as you did in your answer, but without adding 3 because that's completely unnecessary here.
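For the log-base-4 case from the question, a minimal sketch of this in-place approach (the example matrix A is an assumption; any scipy.sparse matrix with positive stored values would do):
import numpy as np
from scipy.sparse import random as sparse_random

A = sparse_random(1000, 1000, density=0.01, format='coo')  # example matrix with uniform random values
A = A.tocsr()                          # converting to CSR sums duplicate COO entries, avoiding the issue above
A.data = np.log(A.data) / np.log(4)    # log base 4 of every stored (non-zero) element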

I think I solved it in a very easy way. It is strange that no one could answer immediately.
# Let A be a COO matrix
import numpy as np
from scipy.sparse import coo_matrix

new_data = np.log(A.data + 3) / np.log(4)  # the 3 is not so important; it could be 1 too
A = coo_matrix((new_data, (A.row, A.col)), shape=A.shape)

Related

Numpy appending two-dimensional arrays together

I am trying to create a function which exponentiates a 2-D matrix and keeps the results in a 3-D array, where the first dimension indexes the exponent. This is important because the rows of the matrix I am exponentiating represent information about different vertices on a graph. So, for example, if we have A, A^2, A^3, each of shape (50,50), I want an array D of shape (3,50,50) so that I can use D[:,1,:] to retrieve all the information about node 1 and do matrix multiplication with that. My code is currently:
def expo(times, A, n):
    temp = A
    result = csr_matrix.toarray(temp)
    for i in range(0, times):
        temp = np.dot(temp, A)
        if i == 0:
            result = np.array([result, csr_matrix.toarray(temp)])  # this creates a (2,50,50) array
        if i > 0:
            result = np.append(result, csr_matrix.toarray(temp), axis=0)  # this does not work
    return result
However, this is not working because in the "i>0" case the temp array has shape (50,50) and cannot be appended. I am not sure how to make this work, and I am rather confused by dimensionality in NumPy, e.g. why things are (50,1) sometimes and just (50,) other times. Would anyone be able to help me make this code work and explain generally how these things should be done in NumPy?
Documentation reference
If you want to stack matrices in numpy, you can use the stack function.
If you also want the index to correspond to the exponent, you might want to add a unity matrix to the beginning of your output:
MWE
import numpy as np

def expo(A, n):
    result = [np.eye(len(A)), A]
    for _ in range(n - 1):
        result.append(result[-1].dot(A))
    return np.stack(result, axis=0)
    # If you do not really need the 3D array,
    # you could also just return the list

result = expo(np.array([[1, -2], [-2, 1]]), 3)
print(result)
# [[[  1.   0.]
#   [  0.   1.]]
#
#  [[  1.  -2.]
#   [ -2.   1.]]
#
#  [[  5.  -4.]
#   [ -4.   5.]]
#
#  [[ 13. -14.]
#   [-14.  13.]]]
print(result[1])
# [[ 1. -2.]
#  [-2.  1.]]
Comments
As you can see, we first simply build the list of matrices and then convert it to an array at the end. I am not sure whether you really need the 3D array, though, since you could also just index the list directly; whether that is convenient depends on your use case.
I guess the axis keyword argument of many NumPy functions can be confusing at first, but the documentation usually has good examples that, combined with some trial and error, should get you pretty far. For numpy.stack, the very first example is indeed exactly what you want to do.
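If only the matrix powers are needed, another option (a sketch, not part of the original answer) is numpy.linalg.matrix_power, which computes each power directly and gives you the identity for exponent 0:
import numpy as np

def expo_powers(A, n):
    # Stack A^0 .. A^n along a new leading axis, so result[k] is A to the k-th power.
    return np.stack([np.linalg.matrix_power(A, k) for k in range(n + 1)], axis=0)

D = expo_powers(np.array([[1, -2], [-2, 1]]), 3)
print(D[2])   # A squared: [[ 5 -4] [-4  5]]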

Avoid using for loop. Python 3

I have an array of shape (3,2):
import numpy as np
arr = np.array([[0.,0.],[0.25,-0.125],[0.5,-0.125]])
I was trying to build a matrix (matrix) of dimensions (6,2), with the results of the outer product of the elements i,i of arr and arr.T. At the moment I am using a for loop such as:
size = np.shape(arr)
matrix = np.zeros((size[0]*size[1],size[1]))
for i in range(np.shape(arr)[0]):
    prod = np.outer(arr[i], arr[i].T)
    matrix[size[1]*i:size[1]+size[1]*i, :] = prod
Resulting:
matrix = array([[ 0.      ,  0.      ],
                [ 0.      ,  0.      ],
                [ 0.0625  , -0.03125 ],
                [-0.03125 ,  0.015625],
                [ 0.25    , -0.0625  ],
                [-0.0625  ,  0.015625]])
Is there any way to build this matrix without using a for loop (e.g. broadcasting)?
Extend the arrays to 3D with None/np.newaxis, keeping the first axis aligned while letting the second axis get pair-wise multiplied, perform the multiplication leveraging broadcasting, and reshape back to 2D -
matrix = (arr[:,None,:]*arr[:,:,None]).reshape(-1,arr.shape[1])
We can also use np.einsum -
matrix = np.einsum('ij,ik->ijk',arr,arr).reshape(-1,arr.shape[1])
The einsum string representation might be more intuitive, as it lets us visualize three things:
Axes that are aligned (axis=0 here).
Axes that are getting summed up (none here).
Axes that are kept i.e. element-wise multiplied (axis=1 here).
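A quick sketch (not part of the original answer) verifying that both vectorized versions reproduce the loop's result for the arr from the question:
import numpy as np

arr = np.array([[0., 0.], [0.25, -0.125], [0.5, -0.125]])

loop = np.vstack([np.outer(row, row) for row in arr])              # the original for-loop, row by row
bcast = (arr[:, None, :] * arr[:, :, None]).reshape(-1, arr.shape[1])
ein = np.einsum('ij,ik->ijk', arr, arr).reshape(-1, arr.shape[1])

print(np.allclose(loop, bcast) and np.allclose(loop, ein))  # True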

Reshape numpy (n,) vector to (n,1) vector

It is easier for me to think about vectors as column vectors when I need to do some linear algebra, so I prefer shapes like (n,1).
Is there a significant memory usage difference between shapes (n,) and (n,1)?
What is the preferred way?
And how do I reshape an (n,) vector into an (n,1) vector? Somehow b.reshape((n,1)) doesn't do the trick.
a = np.random.random((10,1))
b = np.ones((10,))
b.reshape((10,1))
print(a)
print(b)
[[ 0.76336295]
[ 0.71643237]
[ 0.37312894]
[ 0.33668241]
[ 0.55551975]
[ 0.20055153]
[ 0.01636735]
[ 0.5724694 ]
[ 0.96887004]
[ 0.58609882]]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
A simpler way, using Python's syntactic sugar, is
b.reshape(-1,1)
where NumPy automatically computes the correct dimension in place of the -1.
ndarray.reshape() returns a new view (or a copy, when a view is not possible); it does not modify the array in place.
b.reshape((10, 1))
on its own is therefore effectively a no-op, since the resulting view/copy is not assigned to anything. The "fix" is simple:
b_new = b.reshape((10, 1))
The amount of memory used should not differ at all between the two shapes. NumPy arrays use the concept of strides, so the shapes (10,) and (10, 1) can share the same buffer; only the strides (the jumps needed to reach the next row or column) change.
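A minimal sketch (not from the original answers) tying these points together, showing that the reshape result has to be captured and that it is a view of the same buffer:
import numpy as np

b = np.ones(10)                 # shape (10,)
col = b.reshape(-1, 1)          # shape (10, 1); the result must be assigned to a name
also_col = b[:, np.newaxis]     # same effect via indexing with np.newaxis/None

print(b.shape, col.shape, also_col.shape)  # (10,) (10, 1) (10, 1)
print(col.base is b)                       # True: a view sharing b's buffer, not a copy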

Saving an array inside a column of a matrix in numpy shape error

Let's say I do some calculation and get a 3-by-3 matrix each time through a loop. Assume that each time, I want to save that matrix in a column of a bigger matrix whose number of rows equals 9 (the total number of elements in the smaller matrix). First I reshape the smaller matrix and then try to save it into one column of the big matrix. A simple code for only one column looks something like this:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= np.reshape(Small,(9,1))
print(Big)
But Python throws the following error:
Big[:,0]= np.reshape(Small,(9,1))
ValueError: could not broadcast input array from shape (9,1) into shape (9)
I also tried to use flatten, but that didn't work either. Is there any way to create a shape(9) array from the small matrix or any other way to handle this error?
Your help is greatly appreciated!
try:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= np.reshape(Small,(9,))
print(Big)
or:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= Small.reshape((9,1))
print(Big)
or:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,[0]]= np.reshape(Small,(9,1))
print(Big)
Any of these gives me:
[[ 0.81527817 0. 0. ]
[ 0.4018887 0. 0. ]
[ 0.55423212 0. 0. ]
[ 0.18543227 0. 0. ]
[ 0.3069444 0. 0. ]
[ 0.72315677 0. 0. ]
[ 0.81592963 0. 0. ]
[ 0.63026719 0. 0. ]
[ 0.22529578 0. 0. ]]
Explanation
The shape of the slice of Big you are assigning to is (9,), which is one-dimensional. The shape you are trying to assign from is (9, 1), which is two-dimensional. You need to reconcile this either by making the two-dimensional array one-dimensional (np.reshape(Small, (9,1)) becomes np.reshape(Small, (9,))), or by making the target two-dimensional (Big[:, 0] becomes Big[:, [0]]). The exception is the case Big[:, 0] = Small.reshape((9,1)); there, NumPy evidently reconciles the shapes itself.
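For completeness, a sketch (assuming the 3x3 matrices are just random placeholders for the per-iteration result) of filling every column of Big inside the loop using ravel, which returns the flat (9,) view directly:
import numpy as np

Big = np.zeros((9, 3))
for j in range(Big.shape[1]):
    Small = np.random.rand(3, 3)   # stand-in for the matrix computed each iteration
    Big[:, j] = Small.ravel()      # ravel gives the (9,) shape the column slice expects
print(Big)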

Euclidean Distances between points

I have an array of points in numpy:
points = rand(dim, n_points)
And I want to:
Calculate all the L2 norms (Euclidean distances) between a certain point and all other points
Calculate all pairwise distances.
and preferably all in NumPy with no for loops. How can one do it?
If you're willing to use SciPy, the scipy.spatial.distance module (the functions cdist and/or pdist) do exactly what you want, with all the looping done in C. You can do it with broadcasting too but there's some extra memory overhead.
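A minimal sketch of that SciPy route (the points array here is an assumed example; note that cdist/pdist expect one point per row, i.e. shape (n_points, dim), which is transposed relative to the question's layout):
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.random.rand(5, 3)                 # 5 points in 3 dimensions, one point per row

d_from_first = cdist(points[0:1], points)[0]  # distances from point 0 to all points, shape (5,)
d_pairwise = squareform(pdist(points))        # full symmetric pairwise distance matrix, shape (5, 5)
print(d_from_first.shape, d_pairwise.shape)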
This might help with the second part:
import numpy as np

p = np.random.rand(3, 4)  # this is column-wise, so each vector has length 3
np.sqrt(np.sum((p[:, np.newaxis, :] - p[:, :, np.newaxis])**2, axis=0))
which gives
array([[ 0. , 0.37355868, 0.64896708, 1.14974483],
[ 0.37355868, 0. , 0.6277216 , 1.19625254],
[ 0.64896708, 0.6277216 , 0. , 0.77465192],
[ 1.14974483, 1.19625254, 0.77465192, 0. ]])
if p was
array([[ 0.46193242, 0.11934744, 0.3836483 , 0.84897951],
[ 0.19102709, 0.33050367, 0.36382587, 0.96880535],
[ 0.84963349, 0.79740414, 0.22901247, 0.09652746]])
and you can check one of the entries via
np.sqrt(np.sum((p[:, 0] - p[:, 2])**2))
0.64896708223796884
The trick is to put newaxis and then do broadcasting.
Good luck!
