advanced indexing using numpy


I'm trying to use advanced indexing, but I cannot get it to work with this simple array:
arr = np.array([[[ 1, 10, 100,1000],[ 2, 20, 200,2000]],[[ 3, 30, 300,3000],[ 4,40,400,4000]],[[5, 50, 500,5000],[6, 60,600,6000]]])
d1=np.array([0])
d2=np.array([0,1])
d3=np.array([0,1,2])
arr[d1,d2,d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,) (2,) (3,)
and
arr[d1[:,np.newaxis],d2[np.newaxis,:],d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,1) (1,2) (3,)
Expected output:
array([[[ 1, 10, 100],
[ 2, 20, 200]]])

You can use np.ix_ to combine several one-dimensional index arrays of different lengths to index a multidimensional array. For example:
arr[np.ix_(d1,d2,d3)]
To add more context, np.ix_ returns a tuple of N-dimensional arrays, one per input, each reshaped so that they broadcast against each other. The same can be achieved "by hand" by adding np.newaxis in the appropriate dimensions:
xs, ys, zs = np.ix_(d1,d2,d3)
# xs.shape == (1, 1, 1) == (len(d1), 1, 1 )
# ys.shape == (1, 2, 1) == (1, len(d2), 1 )
# zs.shape == (1, 1, 3) == (1, 1, len(d3))
result_ix = arr[xs, ys, zs]
# using newaxis:
result_newaxis = arr[
    d1[:, np.newaxis, np.newaxis],
    d2[np.newaxis, :, np.newaxis],
    d3[np.newaxis, np.newaxis, :],
]
assert (result_ix == result_newaxis).all()
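As a quick check, the np.ix_ approach can be run against the array and the expected output from the question. This is only a minimal sketch; the name expected is introduced here for the comparison and is not part of the original post.

```python
import numpy as np

arr = np.array([[[1, 10, 100, 1000], [2, 20, 200, 2000]],
                [[3, 30, 300, 3000], [4, 40, 400, 4000]],
                [[5, 50, 500, 5000], [6, 60, 600, 6000]]])
d1 = np.array([0])
d2 = np.array([0, 1])
d3 = np.array([0, 1, 2])

# np.ix_ reshapes the indices to (1,1,1), (1,2,1) and (1,1,3),
# which broadcast together to a (1, 2, 3) selection.
result = arr[np.ix_(d1, d2, d3)]

# "expected" is the output the question asked for.
expected = np.array([[[1, 10, 100],
                      [2, 20, 200]]])

print(result.shape)                      # (1, 2, 3)
print(np.array_equal(result, expected))  # True
```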

Indexing with d1 alone selects the first block, but it keeps all four columns. To match the expected output, also slice the last axis:
>>> arr[d1, :, :3]
array([[[ 1, 10, 100],
[ 2, 20, 200]]])

Related

Masking out some rows of numpy array and recover back

I have a mask mask_re of shape (8781288, 1) containing ones and zeros, a label array y_lbl of shape (8781288, 1), and a feature array feat_re of shape (8781288, 64). I need to take only those rows of the feature and label arrays where the mask is 1. How can I do this, and how can I apply the opposite operation: putting the prediction values (ypred) back into a full-size label array at the positions where the mask is 1?
For example, in Matlab this can be done easily with X=feat_re(mask_re==1), and the values can be put back with new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in Python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED), following what @hpaulj suggested:
The problem was the shape of my mask. Once I changed it to 1-D with mask_new=mask_re.reshape((8781288)), the issue was solved:
X=feat_re[mask_new==1,:]
(437561, 64)

In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
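The question also asks how to put predictions back into a full-size label array. A minimal sketch of the whole round trip follows, using small stand-in sizes instead of 8781288 and 64. The mask values and the ypred values are placeholders made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_feat = 10, 4                       # small stand-ins for 8781288 and 64
feat_re = rng.random((n_rows, n_feat))
mask_re = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1]).reshape(-1, 1)  # shape (10, 1)

# Flatten the (N, 1) mask to a 1-D boolean mask before indexing rows.
mask = mask_re.ravel() == 1                  # shape (10,)
X = feat_re[mask]                            # selected rows, shape (5, 4)

# Placeholder predictions, one per selected row (not a real model output).
ypred = np.arange(1, X.shape[0] + 1)

# Scatter the predictions back; unselected rows stay 0.
new_lbl = np.zeros(n_rows, dtype=ypred.dtype)
new_lbl[mask] = ypred

print(X.shape)    # (5, 4)
print(new_lbl)    # [1 0 2 3 0 0 4 0 0 5]
```

The same 1-D mask is used for both the selection and the assignment, which is what keeps the two steps consistent with each other.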

You can use boolean indexing for masking, as below. The mask has to be 1-D to select rows, so flatten the (N, 1) mask first:
X = feat_re[mask_re.ravel()==1, :]
This selects the rows of feat_re where mask_re==1 is True. If you need a different layout afterwards, you can call reshape on X, and call it again to get back to the original shape. A -1 in reshape tells numpy to compute that dimension's size from the others.

Subtracting one dimensional array (list of scalars) from 3 dimensional arrays using broadcasting

I have a one-dimensional array of scalar values:
Y = np.array([1, 2])
I also have a 3-dimensional array:
X = np.random.randint(0, 255, size=(2, 2, 3))
I am attempting to subtract each value of Y from X, so I should get back Z, which should be of shape (2, 2, 2, 3) or maybe (2, 2, 3, 2).
I can't seem to figure out how to do this via broadcasting.
I tried changing the shape of Y:
Y = np.array([[[1, 2]]])
but not sure what the correct shape should be.

Broadcasting lines up dimensions on the right. So you're looking to operate on a (2, 1, 1, 1) array and a (2, 2, 3) array.
The simplest way I can think of is using reshape:
Y = Y.reshape(-1, 1, 1, 1)
More generally:
Y = Y.reshape(-1, *([1] * X.ndim))
At most one of the arguments to reshape can be -1, indicating all the remaining size not accounted for by other dimensions.
To get Z of shape (2, 2, 2, 3):
Z = X - Y.reshape(-1, *([1] * X.ndim))
If you were OK with having Z of shape (2, 2, 3, 2), the operation would be much simpler:
Z = X[..., None] - Y
None or np.newaxis will insert a unit axis into the end of X's shape, making it broadcast properly with the 1D Y.
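The two layouts can be compared directly. The sketch below uses a fixed X built with np.arange rather than random values, so that the results are reproducible; the names Z_front and Z_back are made up for this example.

```python
import numpy as np

Y = np.array([1, 2])
X = np.arange(12).reshape(2, 2, 3)

# Y on a new leading axis: (2,1,1,1) against (2,2,3) gives (2,2,2,3).
Z_front = X - Y.reshape(-1, *([1] * X.ndim))

# Y on a new trailing axis: (2,2,3,1) against (2,) gives (2,2,3,2).
Z_back = X[..., None] - Y

print(Z_front.shape)  # (2, 2, 2, 3)
print(Z_back.shape)   # (2, 2, 3, 2)

# Both hold the same numbers; only the position of the Y axis differs.
print(np.array_equal(Z_front, np.moveaxis(Z_back, -1, 0)))  # True
```

Z_front[m] is X - Y[m], while Z_back[..., m] is the same slice stored along the last axis.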

I am not entirely sure along which dimension you want the subtraction to take place, but X - Y will not raise an error if you define Y as Y = numpy.array([1,2]).reshape(2, 1, 1) or Y = numpy.array([1,2]).reshape(1, 2, 1).

Modify multiple columns in an array numpy

I have a numpy array (an nxn matrix), and I would like to modify only the columns whose sum is 0, assigning the same values to all of these columns.
To do that, I have first taken the index of the columns that sum to 0:
sum_lines = np.sum(mat_trans, axis = 0)
indices = np.where(sum_lines == 0)[0]
then I did a loop on those indices:
for i in indices:
    mat_trans[:, i] = rank_vect
so that each of these columns now has the value of the rank_vect column vector.
I was wondering if there was a way to do this without loop, something that would look like:
mat_trans[:, np.where(sum_lines == 0)[0]] = rank_vect
Thanks!

In [114]: arr = np.array([[0,1,2,3],[1,0,2,-3],[-1,2,0,0]])
In [115]: sumlines = np.sum(arr, axis=0)
In [116]: sumlines
Out[116]: array([0, 3, 4, 0])
In [117]: idx = np.where(sumlines==0)[0]
In [118]: idx
Out[118]: array([0, 3])
So the columns that we want to modify are:
In [119]: arr[:,idx]
Out[119]:
array([[ 0, 3],
[ 1, -3],
[-1, 0]])
In [120]: rv = np.array([10,11,12])
If rv is 1d, we get a shape error:
In [121]: arr[:,idx] = rv
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (2,3)
But if it is a column vector (shape (3,1)) it can be broadcast to the (3,2) target:
In [122]: arr[:,idx] = rv[:,None]
In [123]: arr
Out[123]:
array([[10, 1, 2, 10],
[11, 0, 2, 11],
[12, 2, 0, 12]])
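Applied to the names from the question, the same idea might look like the sketch below. It assumes rank_vect holds one value per row; reshape(-1, 1) is used so that it works whether rank_vect is 1-D or already a column vector. The name zero_cols is introduced here, and a boolean column mask is used in place of np.where.

```python
import numpy as np

mat_trans = np.array([[ 0, 1, 2,  3],
                      [ 1, 0, 2, -3],
                      [-1, 2, 0,  0]])
rank_vect = np.array([10, 11, 12])

# Boolean mask of the columns whose sum is 0.
zero_cols = mat_trans.sum(axis=0) == 0

# A (3, 1) column broadcasts across every selected column, so no loop is needed.
mat_trans[:, zero_cols] = rank_vect.reshape(-1, 1)

print(mat_trans)
```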

This should do the trick, assuming rank_vect is 1-D:
mat_trans[:,indices] = np.stack((rank_vect,)*indices.size,-1)
Please test it and let me know whether it does what you want. It stacks rank_vect repeatedly so that the right-hand side matches the shape of the left-hand side.
I believe this is equivalent to
for i in indices:
    mat_trans[:, i] = rank_vect
I'd be interested to know the speed difference.

Column normalization behaves differently in higher dimensions

I am trying to subtract the minimum value of an ndarray for an arbitrary dimension. It seems to work with 3 dimensions, but not 4
3 Dimensional Case:
x1 = np.arange(27.0).reshape((3, 3, 3))
# x1 is (3,3,3)
x2 = x1.min(axis=(1,2))
# x2 is (3,)
(x1 - x2).shape
#Output: (3, 3, 3)
(x1 - x2).shape == x1.shape
#As expected: True
4 Dimensional Case:
mat1 = np.random.rand(10,5,2,1)
# mat1 is (10,5,2,1)
mat2 = mat1.min(axis = (1,2,3))
# mat2 is (10,)
(mat1 - mat2).shape == mat1.shape
# Should be True, but
#Output: False

Your first example is misleading because all dimensions are the same size. That hides the kind of error that you see in the 2nd. Examples with different size dimensions are better at catching errors:
In [530]: x1 = np.arange(2*3*4).reshape(2,3,4)
In [531]: x2 = x1.min(axis=(1,2))
In [532]: x2.shape
Out[532]: (2,)
In [533]: x1-x2
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,)
Compare that with a case where I tell it to keep dimensions:
In [534]: x2 = x1.min(axis=(1,2),keepdims=True)
In [535]: x2.shape
Out[535]: (2, 1, 1)
In [536]: x1-x2
Out[536]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
The basic rule of broadcasting: a (2,) array can expand to (1,1,2) if needed, but not to (2,1,1).
But why doesn't the 2nd case produce an error?
In [539]: mat1.shape
Out[539]: (10, 5, 2, 1)
In [540]: mat2.shape
Out[540]: (10,)
In [541]: (mat1-mat2).shape
Out[541]: (10, 5, 2, 10)
It's that trailing size 1, which can broadcast with the (10,):
(10,5,2,1) (10,) => (10,5,2,1)(1,1,1,10) => (10,5,2,10)
It's as though you'd added a newaxis to a 3d array:
mat1 = np.random.rand(10,5,2)
mat1[...,None] - mat2
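For an arbitrary number of dimensions, the keepdims approach can be wrapped in a small helper. This is a sketch only: subtract_min is a hypothetical name, and it assumes the minimum should be taken over every axis except the first and that the input has at least two dimensions.

```python
import numpy as np

def subtract_min(a):
    """Subtract from each a[i] the minimum taken over all its other axes."""
    axes = tuple(range(1, a.ndim))            # every axis except the first
    return a - a.min(axis=axes, keepdims=True)

# Deliberately different sizes per axis, so shape mistakes cannot hide.
mat1 = np.random.rand(10, 5, 2, 1)
out = subtract_min(mat1)

print(out.shape == mat1.shape)                       # True
print(np.allclose(out.min(axis=(1, 2, 3)), 0.0))     # True
```

Because keepdims=True leaves the reduced axes in place with size 1, the result of min always lines up with the first axis, whatever the trailing sizes are.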

advanced 3d indexing in theano

My question is very similar to
Indexing tensor with index matrix in theano?
except that I have 3 dimensions. At first I want to get it working in numpy. With 2 dimensions there is no problem:
>>> idx = np.random.randint(3, size=(4, 2, 3))
>>> d = np.random.rand(4*2*3).reshape((4, 2, 3))
>>> d[1]
array([[ 0.37057415, 0.73066383, 0.76399376],
[ 0.12155831, 0.12552545, 0.87648523]])
>>> idx[1]
array([[2, 0, 1],
[2, 2, 2]])
>>> d[1][np.arange(d.shape[1])[:, np.newaxis], idx[1]]
array([[ 0.76399376, 0.37057415, 0.73066383],
[ 0.87648523, 0.87648523, 0.87648523]]) #All correct
But I have no idea how to make it work with all 3 dimensions. Example of a failed attempt:
>>> d[np.arange(d.shape[0])[:, np.newaxis], np.arange(d.shape[1]), idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,1) (2,) (4,2,3)

Does this work?
d[
    np.arange(d.shape[0])[:, np.newaxis, np.newaxis],
    np.arange(d.shape[1])[:, np.newaxis],
    idx
]
You need the index arrays to have shapes that broadcast together: here (4, 1, 1), (2, 1) and (4, 2, 3) broadcast to (4, 2, 3).
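On the numpy side, the result can be cross-checked against np.take_along_axis (available in NumPy 1.15 and later), which picks values along one axis using an index array of matching shape. This is an alternative named here for comparison, not something from the original answer, and it says nothing about the theano side. The names picked and same are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random((4, 2, 3))
idx = rng.integers(3, size=(4, 2, 3))

# Broadcast index arrays: (4,1,1), (2,1) and (4,2,3) combine to (4,2,3).
i = np.arange(d.shape[0])[:, np.newaxis, np.newaxis]
j = np.arange(d.shape[1])[:, np.newaxis]
picked = d[i, j, idx]

# Same selection along the last axis: same[a, b, c] == d[a, b, idx[a, b, c]].
same = np.take_along_axis(d, idx, axis=2)

print(picked.shape)                  # (4, 2, 3)
print(np.array_equal(picked, same))  # True
```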
