I am new to Python and confused by this code:
X = np.array([2,3,4,4])
print(np.dot(X,X))
This works
Y = np.array([[100],
[200],
[300],
[400]])
print(np.dot(Y,Y))
This doesn't. I understand it has to do with the array dimensions, but I can't see how. Please explain.
X is a 1d array (row vector is not the right descriptor):
In [382]: X = np.array([2,3,4,4])
In [383]: X.shape
Out[383]: (4,)
In [384]: np.dot(X,X) # docs for 1d arrays apply
Out[384]: 45
Y is 2d array.
In [385]: Y = X[:,None]
In [386]: Y
Out[386]:
array([[2],
[3],
[4],
[4]])
In [387]: Y.shape
Out[387]: (4, 1)
In [388]: np.dot(Y,Y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-388-3a0bc5156893> in <module>()
----> 1 np.dot(Y,Y)
ValueError: shapes (4,1) and (4,1) not aligned: 1 (dim 1) != 4 (dim 0)
For 2d arrays, the last dimension of the first argument pairs with the second-to-last dimension of the second argument.
In [389]: np.dot(Y,Y.T) # (4,1) pair with (1,4) to produce (4,4)
Out[389]:
array([[ 4, 6, 8, 8],
[ 6, 9, 12, 12],
[ 8, 12, 16, 16],
[ 8, 12, 16, 16]])
In [390]: np.dot(Y.T,Y) # (1,4) pair with (4,1) to produce (1,1)
Out[390]: array([[45]])
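As a compact summary of the cases above, here is a minimal sketch that reuses X and Y from this answer. The variable names inner_1d, inner_2d, outer_2d and inner_flat are only illustrative.

```python
import numpy as np

X = np.array([2, 3, 4, 4])   # shape (4,)
Y = X[:, None]               # shape (4, 1), a column

inner_1d = np.dot(X, X)                     # 1d . 1d gives a scalar
inner_2d = np.dot(Y.T, Y)                   # (1,4) . (4,1) gives (1,1)
outer_2d = np.dot(Y, Y.T)                   # (4,1) . (1,4) gives (4,4)
inner_flat = np.dot(Y.ravel(), Y.ravel())   # flatten the column back to 1d first

print(inner_1d, inner_2d.shape, outer_2d.shape, inner_flat)
```

If a plain number is what you want from a column like Y, flattening it with ravel() before calling np.dot is probably the simplest route.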
I have a mask mask_re of shape (8781288, 1) containing ones and zeros, a label array y_lbl of shape (8781288, 1), and a feature array feat_re of shape (8781288, 64). I need to take only those rows of the feature and label arrays where the mask is 1. How can I do this, and how can I do the reverse: put the prediction values (ypred) back into a masked label array at the positions where the mask is 1?
For example, in Matlab this can be done easily with X=feat_re(mask_re==1), and recovered with new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do something similar in Python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED) According to what #hpaulj suggested
The problem was with the shape of my mask. Once I changed it to mask_new=mask_re.reshape((8781288)), the issue was solved:
X=feat_re[mask_new==1,:]
(437561, 64)
In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
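For the second half of the question (putting predictions back), a minimal sketch along the same lines might look like this. The small arrays stand in for the real ones, and the ypred values are made up for illustration; only the names feat_re, mask_re, ypred and new_lbl come from the question.

```python
import numpy as np

# Small stand-ins for the arrays in the question (sizes are assumptions).
feat_re = np.arange(15).reshape(5, 3)           # (5, 3) features
mask_re = np.array([[1], [0], [1], [1], [0]])   # (5, 1) mask of ones and zeros

keep = mask_re.ravel() == 1    # 1d boolean mask, shape (5,)
X = feat_re[keep]              # selected rows, shape (3, 3)

# Hypothetical predictions, one per selected row.
ypred = np.array([7, 8, 9])

new_lbl = np.zeros(mask_re.shape, dtype=ypred.dtype)   # (5, 1) of zeros
new_lbl[keep, 0] = ypred       # write predictions back at the masked rows

print(X.shape, new_lbl.ravel())
```

The same 1d boolean mask is used for both selecting and writing back, which is what keeps the two operations aligned.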
You can use boolean indexing for masking, as below. The mask has to be 1d to select rows, so flatten the (N, 1) mask first:
X = feat_re[mask_re.ravel() == 1, :]
This selects the rows of feat_re where (mask_re==1) is True, giving shape (437561, 64). If you need a different layout you can then reshape X with the reshape function, and use reshape again to get back to the same array shape. A "-1" in reshape means that numpy should calculate the size of that dimension.
Suppose we have an array
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
Now I have below
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print(row_r1.shape)
print(row_r2.shape)
I don't understand why row_r1.shape is (4,) and row_r2.shape is (1,4).
Shouldn't both shapes be (4,)?
I like to think of it this way. The first form, a[1, :], says: go get me all the values in row 1.
Returning:
array([5, 6, 7, 8])
shape
(4,) Four values in a 1d numpy array.
Whereas the second form, a[1:2, :], says: go get me a slice of the rows from index 1 up to (but not including) index 2.
Returning:
array([[5, 6, 7, 8]]) Note: the double brackets
shape
(1,4) Four values in one row of a 2d np.array.
Their shapes are different because they aren't the same thing. You can verify by printing them:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
Yields:
[5 6 7 8] is shape (4,)
[[5 6 7 8]] is shape (1, 4)
This is because indexing with an integer removes that axis, whereas slicing keeps it (here with length 1). You can, however, give them the same shape using the .resize() method available on numpy arrays.
The code:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
# Now resize row_r1 to be the same shape
row_r1.resize((1, 4))
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
Yields
[5 6 7 8] is shape (4,)
[[5 6 7 8]] is shape (1, 4)
[[5 6 7 8]] is shape (1, 4)
[[5 6 7 8]] is shape (1, 4)
Showing that you are in fact now dealing with the same shaped object. Hope this helps clear it up!
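As an alternative to resizing in place, here is a small sketch that converts between the two shapes using np.newaxis and reshape. The names row_1d, row_2d, as_2d and as_1d are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_1d = a[1, :]     # integer index drops the row axis: shape (4,)
row_2d = a[1:2, :]   # slice keeps the row axis: shape (1, 4)

as_2d = row_1d[np.newaxis, :]   # add the axis back: shape (1, 4)
as_1d = row_2d.reshape(-1)      # flatten: shape (4,)

print(as_2d.shape, as_1d.shape)
```

Unlike .resize(), these leave row_1d and row_2d themselves unchanged, which may be easier to reason about.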
Let's say I have an array with
x.shape = (10,1024)
when I try to print x[0].shape
x[0].shape
it prints (1024,)
and when I print x.shape[0]
x.shape[0]
it prints 10
I know it's a silly question, and maybe there is another question like this, but can someone explain it to me?
x is a 2D array, which can also be looked upon as an array of 1D arrays, having 10 rows and 1024 columns. x[0] is the first 1D sub-array which has 1024 elements (there are 10 such 1D sub-arrays in x), and x[0].shape gives the shape of that sub-array, which happens to be a 1-tuple - (1024, ).
On the other hand, x.shape is a 2-tuple which represents the shape of x, which in this case is (10, 1024). x.shape[0] gives the first element in that tuple, which is 10.
Here's a demo with some smaller numbers, which should hopefully be easier to understand.
x = np.arange(36).reshape(-1, 9)
x
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35]])
x[0]
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
x[0].shape
(9,)
x.shape
(4, 9)
x.shape[0]
4
x[0].shape gives the shape of the first row of the array, i.e. its length as a 1-tuple. x.shape[0] gives the number of rows; in your case that is 10. x.shape[1] gives the number of columns, i.e. 1024. x.shape[2] raises an error, because the array is 2-d and the shape tuple has only two entries. Here are all these uses of 'shape' on a simple example, a 2-d array of zeros of dimension 3x4.
import numpy as np
#This will create a 2-d array of zeroes of dimensions 3x4
x = np.zeros((3,4))
print(x)
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
#This will print the First Row of the 2-d array
x[0]
array([ 0., 0., 0., 0.])
#This will Give the Length of 1st row
x[0].shape
(4,)
#This will Give the Length of 2nd row, verified that length of row is showing same
x[1].shape
(4,)
#This will give the dimension of 2-d Array
x.shape
(3, 4)
# This will give the number of rows in the 2-d array
x.shape[0]
3
# This will give the number of columns in the 2-d array
x.shape[1]
4
# This will give an error, as we have a 2-d array and we are asking for an index out of range
x.shape[2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-20-4b202d084bc7> in <module>()
----> 1 x.shape[2]
IndexError: tuple index out of range
x[0].shape gives you the length of the first row. x.shape[0] gives you the first component of the dimensions of 'x', which is 10 rows by 1024 columns.
x.shape[0] will give the number of rows in an array.
x[0] is 1st row of x so x[0].shape will provide the length of 1st row.
I am trying to subtract the minimum value of an ndarray for an arbitrary dimension. It seems to work with 3 dimensions, but not 4
3 Dimensional Case:
x1 = np.arange(27.0).reshape((3, 3, 3))
# x1 is (3,3,3)
x2 = x1.min(axis=(1,2))
# x2 is (3,)
(x1 - x2).shape
#Output: (3, 3, 3)
(x1 - x2).shape == x1.shape
#As expected: True
4 Dimensional Case:
mat1 = np.random.rand(10,5,2,1)
# mat1 is (10,5,2,1)
mat2 = mat1.min(axis = (1,2,3))
# mat2 is (10,)
(mat1 - mat2).shape == mat1.shape
# Should be True, but
#Output: False
Your first example is misleading because all dimensions are the same size. That hides the kind of error that you see in the 2nd. Examples with different size dimensions are better at catching errors:
In [530]: x1 = np.arange(2*3*4).reshape(2,3,4)
In [531]: x2 = x1.min(axis=(1,2))
In [532]: x2.shape
Out[532]: (2,)
In [533]: x1-x2
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,)
Compare that with a case where I tell it to keep dimensions:
In [534]: x2 = x1.min(axis=(1,2),keepdims=True)
In [535]: x2.shape
Out[535]: (2, 1, 1)
In [536]: x1-x2
Out[536]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
The basic rule of broadcasting: a (2,) array can expand to (1,1,2) if needed, but not to (2,1,1).
But why doesn't the 2nd case produce an error?
In [539]: mat1.shape
Out[539]: (10, 5, 2, 1)
In [540]: mat2.shape
Out[540]: (10,)
In [541]: (mat1-mat2).shape
Out[541]: (10, 5, 2, 10)
It's that trailing size 1, which can broadcast with the (10,):
(10,5,2,1) (10,) => (10,5,2,1)(1,1,1,10) => (10,5,2,10)
It's as though you'd added a newaxis to a 3d array:
mat1 = np.random.rand(10,5,2)
mat1[...,None] - mat2
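For the "arbitrary dimension" part of the question, a minimal sketch built on keepdims could look like the following. The helper name subtract_min_per_item is hypothetical, and it assumes the minimum is taken over every axis except the first, as in both examples from the question.

```python
import numpy as np

def subtract_min_per_item(x):
    """Subtract from each x[i] its own minimum (hypothetical helper, x.ndim >= 2)."""
    axes = tuple(range(1, x.ndim))               # every axis except the first
    return x - x.min(axis=axes, keepdims=True)   # min has shape (n, 1, ..., 1)

rng = np.random.default_rng(0)
mat1 = rng.random((10, 5, 2, 1))
out = subtract_min_per_item(mat1)

print(out.shape == mat1.shape)
```

Because keepdims=True leaves the reduced axes in place with size 1, the result broadcasts against x the same way for 3, 4, or more dimensions.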
My question is very similar to
Indexing tensor with index matrix in theano?
except that I have 3 dimensions. First I want to get it working in numpy. With 2 dimensions there is no problem:
>>> idx = np.random.randint(3, size=(4, 2, 3))
>>> d = np.random.rand(4*2*3).reshape((4, 2, 3))
>>> d[1]
array([[ 0.37057415, 0.73066383, 0.76399376],
[ 0.12155831, 0.12552545, 0.87648523]])
>>> idx[1]
array([[2, 0, 1],
[2, 2, 2]])
>>> d[1][np.arange(d.shape[1])[:, np.newaxis], idx[1]]
array([[ 0.76399376, 0.37057415, 0.73066383],
[ 0.87648523, 0.87648523, 0.87648523]]) #All correct
But I have no idea how to make it work with all 3 dimensions. Example of a failed try:
>>> d[np.arange(d.shape[0])[:, np.newaxis], np.arange(d.shape[1]), idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,1) (2,) (4,2,3)
Does this work?
d[
np.arange(d.shape[0])[:, np.newaxis, np.newaxis],
np.arange(d.shape[1])[:, np.newaxis],
idx
]
You need the index arrays to collectively have broadcastable dimensions: here (4,1,1), (2,1) and (4,2,3) broadcast together to (4,2,3).
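One way to check the indexing above is to compare it with np.take_along_axis (available in NumPy 1.15 and later), which should pick the same elements along the last axis. This is only a sketch; the random data and the names i, j, picked and same are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random((4, 2, 3))
idx = rng.integers(3, size=(4, 2, 3))   # indices into the last axis of d

i = np.arange(d.shape[0])[:, np.newaxis, np.newaxis]   # shape (4, 1, 1)
j = np.arange(d.shape[1])[:, np.newaxis]               # shape (2, 1)
picked = d[i, j, idx]                                  # broadcasts to (4, 2, 3)

same = np.take_along_axis(d, idx, axis=2)

print(picked.shape, np.array_equal(picked, same))
```

If your NumPy version has take_along_axis, it may be the more readable option, since it avoids building the arange index arrays by hand.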