Vectorize numpy indexing and apply a function to build a matrix

I have a matrix X of size (d,N). In other words, there are N vectors with d dimensions each. For example,
X = [[1,2,3,4],[5,6,7,8]]
there are N=4 vectors of d=2 dimensions.
I also have a ragged array (a list of lists) whose entries index columns of the matrix X. For example,
I = [ [0,1], [1,2,3] ]
The element I[0]=[0,1] indexes columns 0 and 1 of matrix X. Similarly, the element I[1] indexes columns 1, 2 and 3. Notice that the elements of I are lists that are not all of the same length!
What I would like to do is index the columns of X using each element of I, sum the selected vectors, and get a single vector. Repeating this for each element of I builds a new matrix Y. Y should have as many d-dimensional vectors as there are elements in I. In my example, Y will have 2 vectors of 2 dimensions.
In my example, the element I[0] says to take columns 0 and 1 of X, sum these two 2-dimensional vectors, and put the result in column 0 of Y. Then I[1] says to sum columns 1, 2 and 3 of X and put the result in column 1 of Y.
I can do this easily using a loop, but I would like to vectorize this operation if possible. My matrix X has hundreds of thousands of columns, and the indexing array I has tens of thousands of elements (each element is a short list of indices).
My loopy code :
Y = np.zeros( (d,len(I)) )
for i,idx in enumerate(I):
    Y[:,i] = np.sum( X[:,idx], axis=1 )

Here's an approach -
# Get a flattened version of indices
idx0 = np.concatenate(I)
# Get indices at which we need to do "intervaled-summation" along axis=1
cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()  # list() is needed on Python 3, where map is lazy
# Finally index into cols of array with flattened indices & perform summation
out = np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Step-by-step run -
In [67]: X
Out[67]:
array([[ 1, 2, 3, 4],
[15, 6, 17, 8]])
In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)
In [69]: idx0 = np.concatenate(I)
In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])
In [71]: cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()
In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])
In [74]: X[:,idx0] # Select all of the indexed columns
Out[74]:
array([[ 1, 3, 4, 2, 3, 4, 2, 3, 4],
[15, 17, 8, 6, 17, 8, 6, 17, 8]])
In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]:
array([[10, 9, 7],
[46, 31, 25]])
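For reference, the approach above can be wrapped in a small function and checked against the original loop. This is only a sketch: the name sum_column_groups is made up for illustration, and it assumes every index list in I is non-empty, since np.add.reduceat does not return zero for an empty interval.

```python
import numpy as np

def sum_column_groups(X, groups):
    """Sum the columns of X selected by each index list in `groups`.

    Hypothetical helper; assumes every list in `groups` is non-empty.
    """
    lengths = np.array([len(g) for g in groups])
    flat_idx = np.concatenate(groups)                        # all indices, back to back
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # where each group begins
    return np.add.reduceat(X[:, flat_idx], starts, axis=1)

X = np.array([[1, 2, 3, 4],
              [15, 6, 17, 8]])
I = [[0, 2, 3, 1], [2, 3, 1], [2, 3]]

Y = sum_column_groups(X, I)

# Reference result from the loop in the question
Y_loop = np.stack([X[:, g].sum(axis=1) for g in I], axis=1)
```

Here Y and Y_loop should be equal. Note that X[:, flat_idx] materializes one column per index, so memory use grows with the total number of indices in I.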

Related

How do I access the values of a numpy array with an array of indices?

For example,
k = np.array([[[1,2,3,4],[1,2,3,4]]])
index = np.array([[0,0], [0,1]])
I want to be able to get the values from k corresponding to [0,0] and [0,1].
How could I do this?
If I use a for loop through the array it works.
for y in range(1):
    for x in range(1):
        k[index[y,x]]
However, I would like to do this without using for loops.

In [50]: k = np.array([[[1,2,3,4],[1,2,3,4]]])
...: index = np.array([[0,0], [0,1]])
In [51]: k
Out[51]:
array([[[1, 2, 3, 4],
[1, 2, 3, 4]]])
In [52]: k.shape
Out[52]: (1, 2, 4)
Note the shape - 3d, due to the 3 levels of []
In [53]: index
Out[53]:
array([[0, 0],
[0, 1]])
Because this array is symmetric, it doesn't matter whether we use the rows or the columns. For a more general case you'll need to be clearer.
In any case, we index each dimension of k with an array
Using columns of index, and working with the first 2 dimensions:
In [54]: k[index[:,0],index[:,1]]
Out[54]:
array([[1, 2, 3, 4],
[1, 2, 3, 4]])
Looks much like k except it is 2d.
Or applying a 0 to the first size 1 dimension:
In [55]: k[0,index[:,0],index[:,1]]
Out[55]: array([1, 2])
Read more at https://numpy.org/doc/stable/user/basics.indexing.html
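As a small self-contained sketch of the two indexing forms above (the variable names are only illustrative), each row of index can also be treated as one coordinate pair by unpacking its columns into a tuple:

```python
import numpy as np

k = np.array([[[1, 2, 3, 4], [1, 2, 3, 4]]])   # shape (1, 2, 4)
index = np.array([[0, 0], [0, 1]])

# One index array per dimension: equivalent to k[index[:, 0], index[:, 1]]
first_two_dims = k[tuple(index.T)]

# Fix the leading size-1 dimension to 0, then index the remaining two
all_three_dims = k[0, index[:, 0], index[:, 1]]
```

The first result has shape (2, 4) because the last dimension is left unindexed; the second picks single elements and has shape (2,).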

How to sum specific row values together in Sparse COO matrix to reshape matrix

I have a sparse coo matrix built in python using the scipy library. An example data set looks something like this:
>>> v.toarray()
array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
I would like to add columns 0 and 2 together, and columns 1 and 3 together, so that the shape changes from (3, 4) to (3, 2).
However, according to the docs, the sum method does not support selecting which columns to add. The only approach I have come up with is to convert the matrix to an array, loop over its rows, and use numpy to get the summed values, like so:
a_col = []
b_col = []
for x in range(len(v.toarray())):
    a_col.append(np.sum(v.toarray()[x, [0, 2]], axis=0))
    b_col.append(np.sum(v.toarray()[x, [1, 3]], axis=0))
Then use those values for a_col and b_col to create the matrix again.
But surely there should be a way to handle it with the sum method?

You can add the values with a simple loop and 2D slicing, and then take the columns you want (note that this overwrites the first two columns of v in place):
v = np.array([[1, 0, 2, 4],
              [0, 0, 3, 1],
              [4, 5, 6, 9]])
for i in range(2):
    v[:, i] = v[:, i] + v[:, i+2]
print(v[:, :2])
Output
[[ 3 4]
[ 3 1]
[10 14]]

You can use csr_matrix.dot with a special matrix to achieve the same result:
csr = csr_matrix(csr.dot(np.array([[1,0,1,0],[0,1,0,1]]).T))
#csr.data
#[ 3, 4, 3, 1, 10, 14]
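A fuller sketch of the same idea that stays sparse throughout is shown below. The groups list and the selector name are assumptions made for illustration: selector has one row per input column and one column per output column, with a 1 wherever an input column contributes to an output column.

```python
import numpy as np
from scipy.sparse import coo_matrix

v = coo_matrix(np.array([[1, 0, 2, 4],
                         [0, 0, 3, 1],
                         [4, 5, 6, 9]]))

# Hypothetical grouping: output column 0 = cols 0 + 2, output column 1 = cols 1 + 3
groups = [[0, 2], [1, 3]]

rows = np.concatenate(groups)                       # input column of each nonzero
cols = np.repeat(np.arange(len(groups)),            # output column of each nonzero
                 [len(g) for g in groups])
selector = coo_matrix((np.ones(len(rows), dtype=v.dtype), (rows, cols)),
                      shape=(v.shape[1], len(groups)))

# Sparse-by-sparse product; the result is also sparse
summed = v.tocsr() @ selector.tocsr()
```

Calling summed.toarray() should reproduce the 3 x 2 result shown earlier, without ever converting v to a dense array.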

Row by column dot product in numpy array

I have two numpy arrays (A, B) of equal dimensions, let's say 3*3 each. I want an output vector of shape (3,) that holds the dot product of the first row of A and the first column of B.T (that is, the first row of B), the second row of A and the second column of B.T, and so on.
A = np.array([[ 5, 1 ,3], [ 1, 1 ,1], [ 1, 2 ,1]])
B = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
What I want as the result is [16,6,8], which would be equivalent to
np.diagonal(A.dot(B.T))
but of course I don't want this solution, because the matrices are very large and it computes the full matrix product just to keep its diagonal.

Just do an element wise multiplication and then sum the rows:
(A * B).sum(axis=1)
# array([16, 6, 8])
Or use np.einsum:
np.einsum('ij,ij->i', A, B)
# array([16, 6, 8])
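As a quick check, the three expressions can be compared on the arrays from the question. This is only a sanity-check sketch; the variable names are illustrative.

```python
import numpy as np

A = np.array([[5, 1, 3], [1, 1, 1], [1, 2, 1]])
B = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

via_diagonal = np.diagonal(A.dot(B.T))      # builds the full 3 x 3 product first
via_product = (A * B).sum(axis=1)           # elementwise product, then row sums
via_einsum = np.einsum('ij,ij->i', A, B)    # same contraction, no n x n intermediate
```

All three give the same vector. The last two avoid the n x n matrix product, and the einsum form also avoids allocating the elementwise product A * B.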

Randomly shuffle items in each row of numpy array

I have a numpy array like the following:
Xtrain = np.array([[1, 2, 3],
[4, 5, 6],
[1, 7, 3]])
I want to shuffle the items of each row separately, but do not want the shuffle to be the same for each row (as in several examples just shuffle column order).
For example, I want an output like the following:
output = np.array([[3, 2, 1],
[4, 6, 5],
[7, 3, 1]])
How can I shuffle each of the rows independently in an efficient way? My actual np array has over 100000 rows and 1000 columns.

If you only want to shuffle the columns, you can just perform the shuffling on the transpose of your matrix:
In [86]: np.random.shuffle(Xtrain.T)
In [87]: Xtrain
Out[87]:
array([[2, 3, 1],
[5, 6, 4],
[7, 3, 1]])
Note that np.random.shuffle() on a 2D array shuffles the rows, not the items within each row; that is, it changes the position of the rows. Therefore, if you shuffle the rows of the transposed matrix, you are actually shuffling the columns of your original array, and every row receives the same column order.
If you want a completely independent shuffle for each row, you can create random indices for each row and then build the final array with simple indexing:
In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...:
Demo:
In [173]: crazyshuffle(Xtrain)
Out[173]:
array([[1, 3, 2],
[6, 5, 4],
[7, 3, 1]])
In [174]: crazyshuffle(Xtrain)
Out[174]:
array([[2, 3, 1],
[4, 6, 5],
[1, 3, 7]])

From: https://github.com/numpy/numpy/issues/5173
def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.
    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis. `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array, and index the original and row label arrays with it. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà, the data is shuffled independently by rows:
import numpy as np
r, c = 3, 4 # x.shape
x = np.arange(12) + 1 # Already raveled
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()
np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]
inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

We can create a random 2-dimensional matrix, sort it by each row, and then use the index matrix given by argsort to reorder the target matrix.
target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]
shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]
target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
# [7, 4, 6, 8, 5],
# [4, 7, 9, 5, 6],
# [6, 6, 8, 2, 8],
# [1, 6, 7, 8, 3]])
Explanation
We use np.random.rand and argsort to mimic the effect from shuffling.
random.rand gives randomness.
Then, we use argsort with axis=1 to help rank each row. This creates the index that can be used for reordering.
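The argsort idea can be packaged as a small function. This is a sketch under a few assumptions: the function name is made up, the input is 2D, and it uses the newer np.random.default_rng generator and np.take_along_axis rather than the exact calls shown above.

```python
import numpy as np

def shuffle_rows_independently(a, rng):
    """Return a copy of 2D array `a` with each row shuffled on its own.

    Hypothetical helper: one random key per element, argsort per row.
    """
    keys = rng.random(a.shape)              # independent random keys
    order = np.argsort(keys, axis=1)        # a random permutation for each row
    return np.take_along_axis(a, order, axis=1)

rng = np.random.default_rng(0)
Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
shuffled = shuffle_rows_independently(Xtrain, rng)
```

Each row of shuffled contains the same items as the corresponding row of Xtrain, in a row-specific order. On NumPy 1.20 or later, rng.permuted(Xtrain, axis=1) should give the same kind of result without the sort.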

Let's say you have an array a with shape 100000 x 1000.
b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]
I don't know whether this is faster than a loop, because it requires sorting, but it may be a starting point for something better, for example using np.argpartition instead of np.argsort.

You may use Pandas:
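df = pd.DataFrame(Xtrain)
# apply returns a new DataFrame; df itself is left unchanged
shuffled = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
shuffled.values
Change the keyword to axis=0 if you want to shuffle the items within each column instead.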

Selecting matrix elements from a column matrix

I want to select an element from each row of a matrix according to a column matrix.
The column matrix thus contains the indexes to pick.
(Pdb) num_samples
15000
(Pdb) probs.shape
(15000, 26)
(Pdb) y.shape
(15000, 1)
(Pdb) (probs[np.arange(num_samples),y]).shape
(15000, 15000)
(Pdb) # this should be (15000,)

Integer array indexing may be helpful.
Suppose you have this numpy array:
myArray = numpy.array([[2, 3, 4],
[6, 7, 8],
[9, 1, 5]])
If your array of indices to select is
indices = numpy.array([2, 0, 1])
Then
rowSelector = numpy.arange(myArray.shape[0])
myArray[rowSelector, indices]
returns the values of the selected elements of the array:
array([4, 6, 1])

It may be that your y variable is a numpy matrix. Your code works fine when y is a list:
>>> num_samples = 3
>>> data = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> y = [0, 1, 2]
>>> print((data[np.arange(num_samples), y]).shape)
(1, 3)
and the result can be easily rearranged to be a column matrix.
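The (15000, 15000) shape in the question is consistent with broadcasting: a row index of shape (N,) combined with a column index of shape (N, 1) broadcasts to (N, N). The sketch below reproduces this on a smaller, made-up probs array and shows two possible ways around it, assuming y is a plain ndarray of shape (N, 1).

```python
import numpy as np

num_samples = 5
probs = np.arange(num_samples * 26).reshape(num_samples, 26)   # stand-in data
y = np.array([[3], [0], [25], [7], [1]])                        # shape (5, 1)

# (5,) row indices broadcast against (5, 1) column indices -> (5, 5)
broadcast = probs[np.arange(num_samples), y]

# Flattening y gives one column index per row -> shape (5,)
picked = probs[np.arange(num_samples), y.ravel()]

# Alternative that keeps the column shape -> shape (5, 1)
picked_col = np.take_along_axis(probs, y, axis=1)
```

With the question's sizes, the flattened form would give the desired (15000,) result.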
