Row by column dot product in numpy array - python

I have two numpy arrays (A, B) of equal dimensions, let's say 3×3 each. I want an output vector of shape (3,) that holds the dot product of the first row of A with the first row of B (that is, the first column of B.T), the second row of A with the second row of B, and so on.
A = np.array([[ 5, 1 ,3], [ 1, 1 ,1], [ 1, 2 ,1]])
B = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
What I want as the result is [16, 6, 8], which is equivalent to
np.diagonal(A.dot(B.T))
but I don't want this solution, because the matrices are very large and it computes the full n×n product only to keep its diagonal.

Just do an element-wise multiplication and then sum along each row:
(A * B).sum(axis=1)
# array([16, 6, 8])
Or use np.einsum:
np.einsum('ij,ij->i', A, B)
# array([16, 6, 8])
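As a quick sanity check, here is a sketch using the question's A and B that compares both forms against the full-product diagonal. If you really do want row i of A against column i of B, only the einsum subscripts change, to 'ij,ji->i':

```python
import numpy as np

A = np.array([[5, 1, 3], [1, 1, 1], [1, 2, 1]])
B = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# Row i of A with row i of B: the diagonal of A.dot(B.T)
row_row = (A * B).sum(axis=1)
row_row_einsum = np.einsum('ij,ij->i', A, B)

# Row i of A with column i of B: the diagonal of A.dot(B)
row_col = np.einsum('ij,ji->i', A, B)

# Neither form builds the full n x n product; the diagonals are only used to check
assert np.array_equal(row_row, np.diagonal(A.dot(B.T)))
assert np.array_equal(row_row_einsum, row_row)
assert np.array_equal(row_col, np.diagonal(A.dot(B)))
print(row_row, row_col)  # [16  6  8] [ 9  6 12]
```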

Related

How to sum specific row values together in Sparse COO matrix to reshape matrix

I have a sparse COO matrix built in Python with the SciPy library. An example data set looks something like this:
>>> v.toarray()
array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
I would like to add column 0 to column 2, and column 1 to column 3, so the shape changes from (3, 4) to (3, 2).
However, looking at the docs, the sum method doesn't seem to support this kind of slicing. So the only approach I have thought of is to loop over the matrix as a dense array and use numpy to get the summed values, like so:
a_col = []
b_col = []
for x in range(len(v.toarray())):
    a_col.append(np.sum(v.toarray()[x, [0, 2]], axis=0))
    b_col.append(np.sum(v.toarray()[x, [1, 3]], axis=0))
Then use those values for a_col and b_col to create the matrix again.
But surely there should be a way to handle it with the sum method?
On a dense array, you can add the values with a simple loop over 2D slices and then take the columns you want:
import numpy as np
v = np.array([[1, 0, 2, 4],
              [0, 0, 3, 1],
              [4, 5, 6, 9]])
# Add column i+2 into column i (in place), for i = 0, 1
for i in range(2):
    v[:, i] = v[:, i] + v[:, i+2]
print(v[:, :2])
Output
[[ 3 4]
[ 3 1]
[10 14]]
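The loop can also be replaced by a single slice addition. This is a minimal sketch that assumes the same dense v as above. Unlike the loop, it leaves v unmodified:

```python
import numpy as np

v = np.array([[1, 0, 2, 4],
              [0, 0, 3, 1],
              [4, 5, 6, 9]])

# Columns [0, 1] plus columns [2, 3], pairwise: column 0 + column 2, column 1 + column 3
summed = v[:, :2] + v[:, 2:]
print(summed)
```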
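With a sparse matrix, you can use csr_matrix.dot with a 0/1 selection matrix to achieve the same:
from scipy.sparse import csr_matrix
csr = csr_matrix(v)  # v is the (3, 4) COO matrix from the question
# Row j of the 2x4 array marks which input columns are summed into output column j
csr = csr_matrix(csr.dot(np.array([[1,0,1,0],[0,1,0,1]]).T))
#csr.data
#[ 3, 4, 3, 1, 10, 14]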
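If the result should stay sparse throughout, the selection matrix can itself be sparse. This is a minimal sketch that assumes SciPy is installed. The name S is illustrative, not from the answer:

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# The example matrix from the question, stored as COO
v = coo_matrix(np.array([[1, 0, 2, 4],
                         [0, 0, 3, 1],
                         [4, 5, 6, 9]]))

# Selector of shape (4, 2): S[k, j] = 1 when input column k is summed into output column j
S = csr_matrix(np.array([[1, 0],
                         [0, 1],
                         [1, 0],
                         [0, 1]]))

# Sparse times sparse stays sparse; the shape goes from (3, 4) to (3, 2)
result = v.tocsr().dot(S)
print(result.toarray())
```

The same selector idea generalizes to any grouping of columns: put a 1 in row k, column j for every input column k that belongs to output column j.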

Efficiently evaluate a function of an array's values _and_ indices

For a machine learning project I am working on, I need to transform a 2D array of floats into another array of the same shape, where the elements to the right of and below each element are at least as large as that element.
For example,
In [135]: import numpy as np
...: A = np.array([[1, 2, 1, 1],
...: [1, 1, 6, 5],
...: [3, 2, 4, 2]])
...: print(A)
[[1 2 1 1]
[1 1 6 5]
[3 2 4 2]]
Because A[0,1] = 2, I want every element below and to the right of it to be >= 2; here that changes A[0,2], A[0,3] and A[1,1].
Likewise, because A[1,2] = 6, I want every element below and to the right of it to be >= 6; here that changes A[1,3], A[2,2] and A[2,3].
I need to do this for every element in the array. The end result is:
[[1 2 2 2]
[1 2 6 6]
[3 3 6 6]]
Here's code that works, but I'd rather use fewer loops. I'd like to use vector operations, or apply the function set_val to all elements of the array A. I looked into meshgrid and vectorize, but didn't see how to pass the index of each element (i.e. row, col) to the function.
def set_val(A, cur_row, cur_col, min_val):
    # Raise every element below and to the right of (cur_row, cur_col) to at least min_val
    for row_new in range(cur_row, A.shape[0]):
        for col_new in range(cur_col, A.shape[1]):
            if A[row_new, col_new] < min_val:
                A[row_new, col_new] = min_val
A_new = A.copy()
# Iterate over every element of A
for row, row_data in enumerate(A):
    for col, val in enumerate(row_data):
        # Set values to the right and below to be no smaller than the given value
        set_val(A_new, row, col, val)
print(A_new)
My question: Is there a more efficient (or at least more Pythonic) approach?
You can make use of two "cumulative maximum" calls:
from numpy import maximum as mx
mx.accumulate(mx.accumulate(A), axis=1)
mx.accumulate calculates the cumulative maximum. For axis=0 (the default), B = mx.accumulate(A) satisfies $b_{ij} = \max_{k \le i} a_{kj}$. For axis=1 the same happens along each row, so $b_{ij} = \max_{l \le j} a_{il}$.
By doing this twice, we know that each element of the result R is $r_{ij} = \max_{k \le i,\, l \le j} a_{kl}$, the maximum over the upper-left subrectangle ending at (i, j).
Indeed, if the largest element of this subrectangle is at (k, l), the first mx.accumulate(..) (along axis=0) copies that value downward, into row i of column l. The next mx.accumulate(.., axis=1) then copies it to the right along row i, and thus passes that value to the target cell (i, j).
For the given sample input, we thus obtain:
>>> A
array([[1, 2, 1, 1],
[1, 1, 6, 5],
[3, 2, 4, 2]])
>>> mx.accumulate(mx.accumulate(A), axis=1)
array([[1, 2, 2, 2],
[1, 2, 6, 6],
[3, 3, 6, 6]])
Benchmarks: if we run the above algorithm for a random 1000×1000 matrix, and we repeat the experiment 100 times, we get the following benchmark:
>>> timeit(lambda: mx.accumulate(mx.accumulate(A), axis=1), number=100)
1.5123104000231251
This means it computes one such matrix in approximately 15 milliseconds.
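To check the formula $r_{ij} = \max_{k \le i,\, l \le j} a_{kl}$ on more than the single sample, the two-call version can be compared against a direct brute-force evaluation. This is a sketch. The helper names lower_right_max and brute_force are hypothetical:

```python
import numpy as np

def lower_right_max(A):
    # Cumulative max down each column (axis=0), then along each row (axis=1)
    return np.maximum.accumulate(np.maximum.accumulate(A, axis=0), axis=1)

def brute_force(A):
    # r[i, j] = max of the upper-left block A[:i+1, :j+1], computed directly
    R = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            R[i, j] = A[:i + 1, :j + 1].max()
    return R

A = np.array([[1, 2, 1, 1],
              [1, 1, 6, 5],
              [3, 2, 4, 2]])
print(lower_right_max(A))

# Random cross-check against the brute-force definition
rng = np.random.default_rng(0)
M = rng.integers(0, 100, size=(50, 40))
assert np.array_equal(lower_right_max(M), brute_force(M))
```

The brute-force version is O(n^2 m^2), so it is only useful for testing. The accumulate version does two linear passes over the array.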

What's the difference between these two numpy array shapes?

In [136]: s = np.array([[1,0,1],[0,1,1],[0,0,1],[1,1,1]])
In [137]: s
Out[137]:
array([[1, 0, 1],
[0, 1, 1],
[0, 0, 1],
[1, 1, 1]])
In [138]: x = s[0:1]
In [139]: x.shape
Out[139]: (1, 3)
In [140]: y = s[0]
In [141]: y.shape
Out[141]: (3,)
In [142]: x
Out[142]: array([[1, 0, 1]])
In [143]: y
Out[143]: array([1, 0, 1])
In the above code, x's shape is (1,3) and y's shape is (3,).
(1,3): 1 row and 3 columns
(3,): How many rows and columns in this case?
Does (3,) represent 1-dimension array?
In practice, if I want to iterate through the matrix row by row, which way should I go?
for i in range(len(s)):
    row = s[i]
    # OR
    row = s[i:i+1]
First, you can get the number of dimensions of a numpy array named array through len(array.shape), or equivalently array.ndim.
An array with some dimensions of length 1 is not equal to an array with those dimensions removed, for example:
>>> a = np.array([[1], [2], [3]])
>>> b = np.array([1, 2, 3])
>>> a
array([[1],
[2],
[3]])
>>> b
array([1, 2, 3])
>>> a.shape
(3, 1)
>>> b.shape
(3,)
>>> a + a
array([[2],
[4],
[6]])
>>> a + b
array([[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Conceptually, the difference between an array of shape (3, 1) and one of shape (3,) is like the difference between the length of [100] and 100.
[100] is a list that happens to have one element. It could have more, but right now it has the minimum possible number of elements.
On the other hand, it doesn't even make sense to talk about the length of 100, because it doesn't have one.
Similarly, the array of shape (3, 1) has 3 rows and 1 column, while the array of shape (3,) has no columns at all. It doesn't even have rows, in a sense; it is a row, just like 100 has no elements, because it is an element.
For more information on how differently shaped arrays behave when interacting with other arrays, you can see the broadcasting rules.
Lastly, for completeness, to iterate through the rows of a numpy array, you can just do for row in array. If you want to iterate along a later axis (for example, over the columns), you can use np.moveaxis to bring that axis to the front, for example:
>>> array = np.array([[1, 2], [3, 4], [5, 6]])
>>> for row in array:
... print(row)
...
[1 2]
[3 4]
[5 6]
>>> for col in np.moveaxis(array, [0, 1], [1, 0]):
... print(col)
...
[1 3 5]
[2 4 6]
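Applied to the question's s, the following sketch shows the shapes involved and the two usual ways to move between them (np.newaxis and reshape). The variable names follow the question:

```python
import numpy as np

s = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])

x = s[0:1]  # slicing keeps the row axis: 2-D, shape (1, 3)
y = s[0]    # integer indexing drops it: 1-D, shape (3,)
print(x.ndim, x.shape)  # 2 (1, 3)
print(y.ndim, y.shape)  # 1 (3,)

# Converting between the two forms
print(y[np.newaxis, :].shape)  # (1, 3)
print(x.reshape(-1).shape)     # (3,)

# Plain iteration yields 1-D rows, the same as s[i]
for row in s:
    assert row.shape == (3,)

# For a 2-D array, iterating over s.T (same as np.moveaxis(s, 0, 1)) yields the columns
cols = [c.tolist() for c in s.T]
print(cols)  # [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
```

So for row-by-row iteration, s[i] (or simply for row in s) is usually what you want. Use s[i:i+1] only when a later operation needs each row to stay 2-D.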

Vectorize numpy indexing and apply a function to build a matrix

I have a matrix X of size (d,N). In other words, there are N vectors with d dimensions each. For example,
X = [[1,2,3,4],[5,6,7,8]]
there are N=4 vectors of d=2 dimensions.
Also, I have a ragged array (a list of lists) whose entries are indices of columns in the X matrix. For example,
I = [ [0,1], [1,2,3] ]
The I[0]=[0,1] indexes columns 0 and 1 in matrix X. Similarly the element I[1] indexes columns 1,2 and 3. Notice that elements of I are lists that are not of the same length!
What I would like to do is to index the columns in the matrix X using each element in I, sum those vectors and get a vector. Repeat this for each element of I and thus build a new matrix Y. The matrix Y should have as many d-dimensional vectors as there are elements in the I array. In my example, the Y matrix will have 2 vectors of 2 dimensions.
In my example, the element I[0] says to take columns 0 and 1 of matrix X, sum these two 2-dimensional vectors, and put the result in Y (column 0). Then element I[1] says to sum columns 1, 2 and 3 of matrix X and put this new vector in Y (column 1).
I can do this easily using a loop, but I would like to vectorize this operation if possible. My matrix X has hundreds of thousands of columns and the I indexing array has tens of thousands of elements (each element is a short list of indices).
My loopy code:
Y = np.zeros((d, len(I)))
for i, idx in enumerate(I):
    # Sum the columns of X selected by idx into column i of Y
    Y[:, i] = np.sum(X[:, idx], axis=1)
Here's an approach -
# Get a flattened version of indices
idx0 = np.concatenate(I)
# Get indices at which we need to do "intervaled-summation" along axis=1
cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()
# Finally index into cols of array with flattened indices & perform summation
out = np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Step-by-step run -
In [67]: X
Out[67]:
array([[ 1, 2, 3, 4],
[15, 6, 17, 8]])
In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)
In [69]: idx0 = np.concatenate(I)
In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])
In [71]: cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()
In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])
In [74]: X[:,idx0] # Select all of the indexed columns
Out[74]:
array([[ 1, 3, 4, 2, 3, 4, 2, 3, 4],
[15, 17, 8, 6, 17, 8, 6, 17, 8]])
In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]:
array([[10, 9, 7],
[46, 31, 25]])
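The same approach can be wrapped up and cross-checked against the loop from the question, using the question's original X and I. This is a sketch. The helper name sum_column_groups is hypothetical, and it assumes every group in I is non-empty. For an empty group, np.add.reduceat returns the column at that cut index rather than zeros:

```python
import numpy as np

def sum_column_groups(X, I):
    # Flatten all index groups, then sum each contiguous run with reduceat
    idx0 = np.concatenate(I)
    lengths = np.array([len(g) for g in I])
    cut_idx = np.concatenate(([0], lengths[:-1].cumsum()))  # start of each group
    return np.add.reduceat(X[:, idx0], cut_idx, axis=1)

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]
Y = sum_column_groups(X, I)

# Cross-check against the loop from the question
Y_loop = np.zeros((X.shape[0], len(I)), dtype=X.dtype)
for i, idx in enumerate(I):
    Y_loop[:, i] = X[:, idx].sum(axis=1)
assert np.array_equal(Y, Y_loop)
print(Y)
```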

how to replace an array containing indices with values from another array?

I have an array a containing indices (the positions of the two smallest values in each row of a1). I want to replace each of those indices with the value at that index in another array, b.
import numpy as np
a1=np.array([[1, 3, 5, 2, 3],[7, 6, 5, 2, 4],[2, 0, 5, 6, 4]])
a=a1.argsort()[:,:2]
## this creates an array with the indices of the 2 smallest values in each row of a1
a
[[0 3]
[3 4]
[1 0]]
b=np.array([[1],[2],[3],[4],[5],[6]])
Now I want to replace each value in a with the element of b at that index: 0 with 1 (b[0]), 3 with 4 (b[3]), and so on.
I tried using:
[a[index]]=b[index]
but it's obviously not the right way, since a holds these indices as values.
If I understood you correctly, you can just use the flattened version of a to index into b:
result = b.ravel()[a.ravel()]
[1, 4, 4, 5, 2, 1]
If you need it in the same dimensions as a you can reshape it:
result = result.reshape(a.shape)
[[1, 4]
[4, 5]
[2, 1]]
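The ravel/reshape round trip on a is optional. In NumPy, indexing a 1-D array with an integer array returns a result shaped like the index array. The following is a sketch using the question's data. np.take, without an axis argument, also indexes into the flattened b:

```python
import numpy as np

a1 = np.array([[1, 3, 5, 2, 3], [7, 6, 5, 2, 4], [2, 0, 5, 6, 4]])
a = a1.argsort()[:, :2]  # indices of the 2 smallest values in each row
b = np.array([[1], [2], [3], [4], [5], [6]])

# Fancy indexing: the output takes the shape of the index array a, here (3, 2)
result = b.ravel()[a]

# Equivalent: np.take treats b as flattened when no axis is given
result_take = np.take(b, a)

print(result)
```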
