Indexing a matrix by a column vector - python

I have a matrix M of size m x n, and a column vector of size m x 1.
For each of the m rows, I need to pick the element whose column index is the corresponding value in the column vector minus 1, giving me an m x 1 result. How can I do this?
zb=a1.a3[np.arange(a1.z3.shape[0]),a1.train_labels-1]
zb.shape
Out[72]: (4000, 4000)
a1.z3.shape
Out[73]: (4000, 26)
a1.train_labels.shape
Out[74]: (4000, 1)
a1.train_labels.head()
Out[75]:
22
1618 25
2330 1
1651 17
133 17
2360 5
#my column vector a1.train_labels is shuffled. I don't want to unshuffle it.

If your 2d array is M, and indices are a 1d array v, then you can use
M[np.arange(len(v)), v - 1]
For example:
In [14]: M = np.array([[1, 2], [3, 4]])
In [15]: v = np.array([2, 1])
In [16]: M[np.arange(len(v)), v - 1]
Out[16]: array([2, 3])
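The (4000, 4000) result in the question is what you would expect when the index array has shape (m, 1) instead of (m,): the row index of shape (m,) broadcasts against the column index of shape (m, 1). A minimal sketch, assuming the labels are available as a NumPy array (the names `labels`, `picked`, and `column` are made up for illustration; a pandas object would need converting first, for example with `.to_numpy()`):

```python
import numpy as np

M = np.arange(12).reshape(4, 3)            # 4 x 3 matrix
labels = np.array([[3], [1], [2], [1]])    # 1-based labels, shape (4, 1)

rows = np.arange(M.shape[0])

# Indexing with the (4, 1) array broadcasts to a (4, 4) result.
wrong = M[rows, labels - 1]
print(wrong.shape)                         # (4, 4)

# Flattening the labels first picks exactly one element per row.
picked = M[rows, labels.ravel() - 1]
print(picked)                              # [2 3 7 9]

# Restore the m x 1 shape if a column vector is wanted.
column = picked[:, None]
print(column.shape)                        # (4, 1)
```

The order of the labels does not matter here, so a shuffled label vector works as long as row i of the labels belongs to row i of M.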

Related

How can I manipulate a numpy array without nested loops?

If I have a MxN numpy array denoted arr, I wish to index over all elements and adjust the values like so
for m in range(arr.shape[0]):
    for n in range(arr.shape[1]):
        arr[m, n] += x**2 * np.cos(m) * np.sin(n)
where x is a random float.
Is there a way to broadcast this over the entire array without needing to loop, thus speeding up the run time?
You are just adding zeros, because sin(2*pi*k) = 0 for integer k.
However, if you want to vectorize this, the function np.meshgrid could help you.
Check the following example, where I removed the 2 pi in the trigonometric functions so that something nonzero is added.
x = 2
arr = np.arange(12, dtype=float).reshape(4, 3)
n, m = np.meshgrid(np.arange(arr.shape[1]), np.arange(arr.shape[0]), sparse=True)
arr += x**2 * np.cos(m) * np.sin(n)
arr
Edit: use the sparse argument to reduce memory consumption.
You can use nested list comprehensions to build the two-dimensional array:
import numpy as np
from random import random
x = random()
n, m = 10,20
arr = [[x**2 * np.cos(2*np.pi*j) * np.sin(2*np.pi*i) for j in range(m)] for i in range(n)]
In [156]: arr = np.ones((2, 3))
Replace the range with arange:
In [157]: m, n = np.arange(arr.shape[0]), np.arange(arr.shape[1])
And change the first array to (2,1) shape. A (2,1) array broadcasts with a (3,) to produce a (2,3) result.
In [158]: A = 0.23**2 * np.cos(m[:, None]) * np.sin(n)
In [159]: A
Out[159]:
array([[0. , 0.04451382, 0.04810183],
[0. , 0.02405092, 0.02598953]])
In [160]: arr + A
Out[160]:
array([[1. , 1.04451382, 1.04810183],
[1. , 1.02405092, 1.02598953]])
The meshgrid suggested in the accepted answer does the same thing:
In [161]: np.meshgrid(m, n, sparse=True, indexing="ij")
Out[161]:
[array([[0],
[1]]),
array([[0, 1, 2]])]
This broadcasting may be clearer with:
In [162]: m, n
Out[162]: (array([0, 1]), array([0, 1, 2]))
In [163]: m[:, None] * 10 + n
Out[163]:
array([[ 0, 1, 2],
[10, 11, 12]])
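To check that the broadcast expression really matches the original double loop, the two can be compared directly. This is a small sketch; the value of `x` and the array size are arbitrary choices for the illustration:

```python
import numpy as np

x = 0.23                      # any float works here
arr = np.ones((2, 3))

# Reference result: the explicit double loop from the question.
looped = arr.copy()
for m in range(looped.shape[0]):
    for n in range(looped.shape[1]):
        looped[m, n] += x**2 * np.cos(m) * np.sin(n)

# Broadcast version: (2, 1) row indices against (3,) column indices.
rows = np.arange(arr.shape[0])[:, None]
cols = np.arange(arr.shape[1])
vectorized = arr + x**2 * np.cos(rows) * np.sin(cols)

print(np.allclose(looped, vectorized))   # True
```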

adding dimensions to existing np arrays

I'm trying to make a clean connection between the dimensions in a numpy array and the dimensions of a matrix via classical linear algebra. Suppose the following:
In [1]: import numpy as np
In [2]: rand = np.random.RandomState(42)
In [3]: a = rand.rand(3,2)
In [4] a
Out[4]:
array([[0.61185289, 0.13949386],
[0.29214465, 0.36636184],
[0.45606998, 0.78517596]])
In [5]: a[np.newaxis,:,:]
Out[5]:
array([[[0.61185289, 0.13949386],
[0.29214465, 0.36636184],
[0.45606998, 0.78517596]]])
In [6]: a[:,np.newaxis,:]
Out[6]:
array([[[0.61185289, 0.13949386]],
[[0.29214465, 0.36636184]],
[[0.45606998, 0.78517596]]])
In [7]: a[:,:,np.newaxis]
Out[7]:
array([[[0.61185289],
[0.13949386]],
[[0.29214465],
[0.36636184]],
[[0.45606998],
[0.78517596]]])
My questions are as follows:
1. Is it correct to say that the dimensions of a are 3 X 2? In other words, a 3 X 2 matrix?
2. Is it correct to say that the dimensions of a[np.newaxis,:,:] are 1 X 3 X 2? In other words, a matrix containing a 3 X 2 matrix?
3. Is it correct to say that the dimensions of a[:,np.newaxis,:] are 3 X 1 X 2? In other words, a matrix containing three 1 X 2 matrices?
4. Is it correct to say that the dimensions of a[:,:,np.newaxis] are 3 X 2 X 1? In other words, a matrix containing three matrices, each of which contains two 1 X 1 matrices?
1. yes
2. yes
3. yes
4. three 2 x 1 matrices, each of which contains two vectors of size 1
Just find out using .shape:
import numpy as np
rand = np.random.RandomState(42)
# 1.
a = rand.rand(3, 2)
print(a.shape, a, sep='\n', end='\n\n')
# 2.
b = a[np.newaxis, :, :]
print(b.shape, b, sep='\n', end='\n\n')
# 3.
c = a[:, np.newaxis, :]
print(c.shape, c, sep='\n', end='\n\n')
# 4.a
d = a[:, :, np.newaxis]
print(d.shape, d, sep='\n', end='\n\n')
# 4.b
print(d[0].shape, d[0], sep='\n', end='\n\n')
print(d[0, 0].shape, d[0, 0])
output:
(3, 2)
[[0.37454012 0.95071431]
[0.73199394 0.59865848]
[0.15601864 0.15599452]]
(1, 3, 2)
[[[0.37454012 0.95071431]
[0.73199394 0.59865848]
[0.15601864 0.15599452]]]
(3, 1, 2)
[[[0.37454012 0.95071431]]
[[0.73199394 0.59865848]]
[[0.15601864 0.15599452]]]
(3, 2, 1)
[[[0.37454012]
[0.95071431]]
[[0.73199394]
[0.59865848]]
[[0.15601864]
[0.15599452]]]
(2, 1)
[[0.37454012]
[0.95071431]]
(1,) [0.37454012]
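The same shapes can be produced with `np.expand_dims` or `reshape`, which may make the link between the index expression and the resulting shape easier to see. A short sketch using a small integer array in place of the random one:

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

front = a[np.newaxis, :, :]    # shape (1, 3, 2)
middle = a[:, np.newaxis, :]   # shape (3, 1, 2)
back = a[:, :, np.newaxis]     # shape (3, 2, 1)

# Equivalent spellings of the same three operations.
print(np.array_equal(front, np.expand_dims(a, axis=0)))   # True
print(np.array_equal(middle, a.reshape(3, 1, 2)))         # True
print(np.array_equal(back, a[..., None]))                 # True

# Indexing strips leading axes one at a time.
print(back[0].shape, back[0, 0].shape)                    # (2, 1) (1,)
```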

How to group values in matrix with items of unequal length

Let's say I have a simple array:
a = np.arange(3)
And an array of indices with the same length:
I = np.array([0, 0, 1])
I now want to group the values based on the indices.
How would I group the elements of the first array to produce the result below?
np.array([[0, 1], [2]], dtype=object)
Here is what I tried:
a = np.arange(3)
I = np.array([0, 0, 1])
out = np.empty(2, dtype=object)
out.fill([])
aslists = np.vectorize(lambda x: [x], otypes=['object'])
out[I] += aslists(a)
However, this approach does not concatenate the lists, but only maintains the last value for each index:
array([[1], [2]], dtype=object)
Or, for a 2-dimensional case:
a = np.random.rand(100)
I = (np.random.random(100) * 5 //1).astype(int)
J = (np.random.random(100) * 5 //1).astype(int)
out = np.empty((5, 5), dtype=object)
out.fill([])
How can I append the items from a to out based on the two index arrays?
1D Case
Assuming I is sorted, to get a list of arrays as output -
idx = np.unique(I, return_index=True)[1]
out = np.split(a,idx)[1:]
Another approach, using slicing to find the indices at which to split a -
out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
To get an array of lists as output -
np.array([i.tolist() for i in out], dtype=object) # dtype=object is required for ragged lists in recent NumPy
Sample run -
In [84]: a = np.arange(3)
In [85]: I = np.array([0, 0, 1])
In [86]: out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
In [87]: out
Out[87]: [array([0, 1]), array([2])]
In [88]: np.array([i.tolist() for i in out], dtype=object)
Out[88]: array([[0, 1], [2]], dtype=object)
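If I is not already sorted, the same splitting idea still works after a stable sort. The following is a sketch under that assumption, with made-up sample values:

```python
import numpy as np

a = np.array([10, 11, 12, 13])
I = np.array([1, 0, 1, 0])                 # group labels, not sorted

# A stable sort keeps the original order of a within each group.
order = np.argsort(I, kind="stable")
I_sorted = I[order]

# Split wherever the sorted label changes.
boundaries = np.flatnonzero(I_sorted[1:] != I_sorted[:-1]) + 1
groups = [g.tolist() for g in np.split(a[order], boundaries)]

print(groups)                              # [[11, 13], [10, 12]]
```

Note that a label with no members simply does not appear in `groups`, so the position in the list matches the label only when every label from 0 upward occurs at least once.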
2D Case
For 2D case of filling into a 2D array with groupings made from indices in two arrays I and J that represent the rows and columns where the groups are to be assigned, we could do something like this -
ncols = 5
lidx = I*ncols+J
sidx = lidx.argsort() # Use kind='mergesort' to keep order
lidx_sorted = lidx[sidx]
unq_idx, split_idx = np.unique(lidx_sorted, return_index=True)
out.flat[unq_idx] = np.split(a[sidx], split_idx)[1:]
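A self-contained variant of the 2D case is sketched below. It fills the cells one at a time rather than assigning a list of arrays through `out.flat`, and it gives every cell its own empty list first (`out.fill([])` would put the same list object in every cell). The seed, the 5 x 5 grid, and the names `nrows`, `groups`, and `cell` are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100)
I = rng.integers(0, 5, size=100)           # row index of each value
J = rng.integers(0, 5, size=100)           # column index of each value

nrows, ncols = 5, 5
out = np.empty((nrows, ncols), dtype=object)
for idx in np.ndindex(out.shape):
    out[idx] = []                          # a separate list per cell

# Linearize (row, col) pairs and sort so equal cells are adjacent.
lidx = I * ncols + J
sidx = lidx.argsort(kind="stable")         # stable keeps input order per cell
unq_idx, split_idx = np.unique(lidx[sidx], return_index=True)

# split_idx[0] is always 0, so split at the remaining boundaries.
groups = np.split(a[sidx], split_idx[1:])
for cell, group in zip(unq_idx, groups):
    r, c = divmod(int(cell), ncols)
    out[r, c] = group.tolist()

print(sum(len(out[idx]) for idx in np.ndindex(out.shape)))   # 100
```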

Numpy normalize multi dim (>=3) array

I have a 5 dim array (comes from binning operations) and would like to have it normed (sum == 1 for the last dimension).
I thought I found the answer here but it says:
ValueError: Found array with dim 5. the normalize function expected <= 2.
I achieve the result with 5 nested loops, like:
for en in range(en_bin.nb):
    for zd in range(zd_bin.nb):
        for az in range(az_bin.nb):
            for oa in range(oa_bin.nb):
                # reduce fifth dimension (en reco) for normalization
                b = np.sum(a[en][zd][az][oa])
                for er in range(er_bin.nb):
                    a[en][zd][az][oa][er] /= b
but I want to vectorise operations.
For example:
In [18]: a.shape
Out[18]: (3, 1, 1, 2, 4)
In [20]: b.shape
Out[20]: (3, 1, 1, 2)
In [22]: a
Out[22]:
array([[[[[ 0.90290316, 0.00953237, 0.57925688, 0.65402645],
[ 0.68826638, 0.04982717, 0.30458093, 0.0025204 ]]]],
[[[[ 0.7973917 , 0.93050739, 0.79963614, 0.75142376],
[ 0.50401287, 0.81916812, 0.23491561, 0.77206141]]]],
[[[[ 0.44507296, 0.06625994, 0.6196917 , 0.6808444 ],
[ 0.8199077 , 0.02179789, 0.24627425, 0.43382448]]]]])
In [23]: b
Out[23]:
array([[[[ 2.14571886, 1.04519487]]],
[[[ 3.27895899, 2.33015801]]],
[[[ 1.81186899, 1.52180432]]]])
Sum along the last axis by passing axis=-1 to numpy.sum with keepdims=True, then simply divide the array by that sum, which brings in NumPy broadcasting -
a/a.sum(axis=-1,keepdims=True)
This should be applicable for ndarrays of generic number of dimensions.
Alternatively, we could sum with axis-reduction and then add a new axis with None/np.newaxis to match up with the input array shape and then divide -
a/(a.sum(axis=-1)[...,None])
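A quick check that both forms give the same result and that the last axis sums to 1 is sketched here. The random input and the `np.divide` guard for all-zero bins are additions for illustration, not part of the original answer:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 1, 1, 2, 4))

totals = a.sum(axis=-1, keepdims=True)     # shape (3, 1, 1, 2, 1)
normed = a / totals                        # broadcasts over the last axis
alt = a / a.sum(axis=-1)[..., None]        # same thing with an explicit new axis

print(normed.shape)                                   # (3, 1, 1, 2, 4)
print(np.allclose(normed, alt))                       # True
print(np.allclose(normed.sum(axis=-1), 1.0))          # True

# Optional guard: leave bins whose total is zero at 0 instead of producing nan.
safe = np.divide(a, totals, out=np.zeros_like(a), where=totals != 0)
```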

Euclidean distances between several images and one base image

I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?
All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
NumPy will automatically expand Y's shape so that it matches that of X. This is called broadcasting: shapes are compared axis by axis, and any axis of length 1 is stretched to match the other array. In ambiguous cases, you can insert a length-1 axis yourself (for example with None/np.newaxis or reshape) to control which direction is expanded. Here, since Y has a dimension of length 1, that is the axis that is expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a Euclidean distance, which your title suggests you want. To get that:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
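The same distances can also be obtained with `np.linalg.norm` along axis 1. A small sketch, with a 3 x 5 array standing in for the 30 x 8100 images:

```python
import numpy as np

X = np.arange(15, dtype=float).reshape(3, 5)   # stand-in for the 30 images
Y = np.arange(5, dtype=float)[None, :]         # base image, shape (1, 5)

diff = X - Y                                   # broadcasts to shape (3, 5)

# Row-wise Euclidean distance, written out and via np.linalg.norm.
dist = np.sqrt((diff**2).sum(axis=1))
dist_norm = np.linalg.norm(diff, axis=1)

print(dist.shape)                              # (3,)
print(np.allclose(dist, dist_norm))            # True
```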
Use arrays and rely on NumPy broadcasting to subtract Y from them.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])
Just iterate over the rows of your NumPy array and subtract Y from each one; NumPy will make a new array with the differences.
import numpy as np
final_array = []
# X is a numpy array that is 30x8100 and Y is a numpy array that is 1x8100
for row in X:
    output = row - Y
    final_array.append(output)
final_array will be a list of 30 arrays holding X[0] - Y, X[1] - Y, etc. (output itself only holds the last difference). Just make sure you convert your matrices to NumPy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!
import numpy
a1 = numpy.array(X) # make sure you have a 2D numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y).ravel() # make sure you have a 1D numpy array like [1,2,3,...]
a2 = numpy.tile(a2, (len(a1), 1)) # repeat a2 once per row of a1 (a2 is now the same shape as a1)
print(a1 - a2) # elementwise difference between a1 and a2 (or X and Y)
