I'm trying to use advanced indexing but I cannot get it to work with this simple array
arr = np.array([[[ 1, 10, 100,1000],[ 2, 20, 200,2000]],[[ 3, 30, 300,3000],[ 4,40,400,4000]],[[5, 50, 500,5000],[6, 60,600,6000]]])
d1=np.array([0])
d2=np.array([0,1])
d3=np.array([0,1,2])
arr[d1,d2,d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,) (2,) (3,)
and
arr[d1[:,np.newaxis],d2[np.newaxis,:],d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,1) (1,2) (3,)
Expected output:
array([[[ 1, 10, 100],
[ 2, 20, 200]]])
You can use np.ix_ to combine several one-dimensional index arrays of different lengths to index a multidimensional array. For example:
arr[np.ix_(d1,d2,d3)]
To add more context, np.ix_ returns a tuple of N-dimensional arrays, one per input, each shaped so that they broadcast against one another. The same can be achieved "by hand" by adding np.newaxis in the appropriate dimensions:
xs, ys, zs = np.ix_(d1,d2,d3)
# xs.shape == (1, 1, 1) == (len(d1), 1, 1 )
# ys.shape == (1, 2, 1) == (1, len(d2), 1 )
# zs.shape == (1, 1, 3) == (1, 1, len(d3))
result_ix = arr[xs, ys, zs]
# using newaxis:
result_newaxis = arr[
d1[:, np.newaxis, np.newaxis],
d2[np.newaxis, :, np.newaxis],
d3[np.newaxis, np.newaxis, :],
]
assert (result_ix == result_newaxis).all()
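As a quick check of the approach above, here is a minimal sketch that runs np.ix_ on the question's array and compares the result with the expected output. The comparison against plain slicing is my own addition; it only applies because the requested indices happen to be contiguous ranges.

```python
import numpy as np

arr = np.array([[[1, 10, 100, 1000], [2, 20, 200, 2000]],
                [[3, 30, 300, 3000], [4, 40, 400, 4000]],
                [[5, 50, 500, 5000], [6, 60, 600, 6000]]])
d1 = np.array([0])
d2 = np.array([0, 1])
d3 = np.array([0, 1, 2])

# np.ix_ reshapes each 1-D index array so that the three of them
# broadcast together to the shape (len(d1), len(d2), len(d3)).
open_mesh = np.ix_(d1, d2, d3)
index_shapes = [ix.shape for ix in open_mesh]
print(index_shapes)   # [(1, 1, 1), (1, 2, 1), (1, 1, 3)]

result = arr[open_mesh]
print(result.shape)   # (1, 2, 3)
print(result)

# Assumption: the indices are contiguous, so basic slicing selects the same block.
sliced = arr[0:1, 0:2, 0:3]
print(np.array_equal(result, sliced))   # True
```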
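Indexing with d1 alone selects the first block, but it keeps all four columns:
>>> arr[d1]
array([[[   1,   10,  100, 1000],
[   2,   20,  200, 2000]]])
To match the expected output, the last axis still has to be restricted, for example with arr[d1, :, :3].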
Suppose we have an array
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
Now I have below
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print(row_r1.shape)
print(row_r2.shape)
I don't understand why row_r1.shape is (4,) and row_r2.shape is (1,4).
Shouldn't both shapes be (4,)?
I like to think of it this way. The first form, a[1, :], says "get me all the values in row 1":
Returning:
array([5, 6, 7, 8])
shape
(4,) Four values in a one-dimensional NumPy array.
Whereas the second form, a[1:2, :], says "get me the slice of rows from index 1 up to (but not including) index 2":
Returning:
array([[5, 6, 7, 8]]) Note the double brackets.
shape
(1,4) Four values in one row of a two-dimensional NumPy array.
Their shapes are different because they aren't the same thing. You can verify by printing them:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
Yields:
[5 6 7 8] is shape (4,)
[[5 6 7 8]] is shape (1, 4)
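This is because indexing an axis with an integer removes that axis from the result, whereas slicing keeps it (here with length 1). You can, however, give them the same shape using the .resize() method of NumPy arrays (.reshape() is the more common choice, since it returns a new array object instead of changing row_r1 in place).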
The code:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
row_r1 = a[1, :]
row_r2 = a[1:2, :]
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
# Now resize row_r1 to be the same shape
row_r1.resize((1, 4))
print("{} is shape {}".format(row_r1, row_r1.shape))
print("{} is shape {}".format(row_r2, row_r2.shape))
Yields
[5 6 7 8] is shape (4,)
[[5 6 7 8]] is shape (1, 4)
[[5 6 7 8]] is shape (1, 4)
[[5 6 7 8]] is shape (1, 4)
Showing that you are in fact now dealing with the same shaped object. Hope this helps clear it up!
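For comparison, here is a small sketch of a few other ways to end up with a (1, 4) row without resizing in place. The variable names are only illustrative; the array is the one from the question.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

by_int = a[1, :]                    # integer index: the row axis is dropped
by_slice = a[1:2, :]                # slice: the row axis is kept with length 1
by_list = a[[1], :]                 # one-element list: the axis is kept too (this makes a copy)
restored = by_int[np.newaxis, :]    # put the dropped axis back explicitly
reshaped = by_int.reshape(1, -1)    # same idea, letting NumPy infer the length

for name, value in [("a[1, :]", by_int), ("a[1:2, :]", by_slice),
                    ("a[[1], :]", by_list), ("newaxis", restored),
                    ("reshape", reshaped)]:
    print(name, value.shape)
```

Only the plain integer index gives (4,); every other form reports (1, 4).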
My question is about Python array shape.
What is the difference between array size (2, ) and (2, 1)?
I tried to add those two arrays together. However, I got an error as follows:
Non-broadcastable output operand with shape (2, ) doesn't match the broadcast shape (2, 2)
There is no difference in the raw memory. But logically, one is a one-dimensional array of two values, the other is a 2D array (where one of the dimensions just happens to be size 1).
The logical distinction is important to numpy; when you try to add them, it wants to make a new 2x2 array where the top row is the sum of the (2, 1) array's top "row" with each value in the (2,) array. If you use += to do that though, you're indicating that you expect to be able to modify the (2,) array in place, which is not possible without resizing (which numpy won't do). If you change your code from:
arr1 += arr2
to:
arr1 = arr1 + arr2
it will happily create a new (2, 2) array. Or if the goal was that the 2x1 array should act like a flat 1D array, you can flatten it:
alreadyflatarray += twodarray.flatten()
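A minimal sketch of the three cases described above. The values in arr1 and arr2 are made up for illustration, since the question does not show its data.

```python
import numpy as np

arr1 = np.array([1, 2])          # shape (2,)
arr2 = np.array([[10], [20]])    # shape (2, 1)

# Out-of-place addition: NumPy is free to allocate a new (2, 2) result.
total = arr1 + arr2
print(total.shape)               # (2, 2)

# In-place addition: the result would have to fit into arr1's (2,) buffer.
in_place_error = None
try:
    arr1 += arr2
except ValueError as exc:
    in_place_error = str(exc)
    print(in_place_error)

# If the (2, 1) array is really meant as flat data, flatten it first.
flat_total = arr1.copy()
flat_total += arr2.flatten()
print(flat_total)                # [11 22]
```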
(2,) is a one-dimensional array; (2,1) is a matrix with only one column.
You can easily see the difference by creating arrays full of zeros using np.zeros, passing the desired shape:
>>> np.zeros((2,))
array([0., 0.])
>>> np.zeros((2,1))
array([[0.],
[0.]])
@yx131, have a look at the code below for a clear picture of tuples and their use in defining the shape of NumPy arrays.
Note: Do not skip the code below, as it explains the problems related to broadcasting in NumPy.
Also check numpy's broadcasting rule at
https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html.
There's a difference between (2) and (2,). The first is just the literal value 2, whereas the second is a tuple.
(2,) is a 1-item tuple and (2, 2) is a 2-item tuple. The code example makes this clear.
Note: For NumPy arrays, (2,) denotes the shape of a 1-dimensional array of 2 items, and (2, 2) denotes the shape of a 2-dimensional array (matrix) with 2 rows and 2 columns. If you want to add 2 arrays, their shapes must be the same or compatible under the broadcasting rules.
v = (2) # Assignment of value 2
t = (2,) # Comma is necessary at the end of item to define 1 item tuple, it is not required in case of list
t2 = (2, 1) # 2 item tuple
t3 = (3, 4) # 2 item tuple
print(v, type(v))
print(t, type(t))
print(t2, type(t2))
print(t3, type(t3))
print(t + t2)
print(t2 + t3)
"""
2 <class 'int'>
(2,) <class 'tuple'>
(2, 1) <class 'tuple'>
(3, 4) <class 'tuple'>
(2, 2, 1)
(2, 1, 3, 4)
"""
Now, let's have a look at the below code to figure out the error related to broadcasting. It's all related to dimensions.
# Python 3.5.2
import numpy as np
arr1 = np.array([1, 4]);
arr2 = np.array([7, 6, 3, 8]);
arr3 = np.array([3, 6, 2, 1]);
print(arr1, ':', arr1.shape)
print(arr2, ":", arr2.shape)
print(arr3, ":", arr3.shape)
print ("\n")
"""
[1 4] : (2,)
[7 6 3 8] : (4,)
[3 6 2 1] : (4,)
"""
# Changing shapes (dimensions)
arr1.shape = (2, 1)
arr2.shape = (2, 2)
arr3.shape = (2, 2)
print(arr1, ':', arr1.shape)
print(arr2, ":", arr2.shape)
print(arr3, ":", arr3.shape)
print("\n")
print(arr1 + arr2)
"""
[[1]
[4]] : (2, 1)
[[7 6]
[3 8]] : (2, 2)
[[3 6]
[2 1]] : (2, 2)
[[ 8 7]
[ 7 12]]
"""
arr1.shape = (2, )
print(arr1, arr1.shape)
print(arr1 + arr2)
"""
[1 4] (2,)
[[ 8 10]
[ 4 12]]
"""
# Code with error(Broadcasting related)
arr2.shape = (4,)
print(arr1+arr2)
"""
Traceback (most recent call last):
File "source_file.py", line 53, in <module>
print(arr1+arr2)
ValueError: operands could not be broadcast together
with shapes (2,) (4,)
"""
So in your case, the problem is that the shapes of the arrays being added are not compatible under NumPy's broadcasting rules.
Make an array that has shape (2,)
In [164]: a = np.array([3,6])
In [165]: a
Out[165]: array([3, 6])
In [166]: a.shape
Out[166]: (2,)
In [167]: a.reshape(2,1)
Out[167]:
array([[3],
[6]])
In [168]: a.reshape(1,2)
Out[168]: array([[3, 6]])
The first displays like a simple list [3,6]. The second as a list with 2 nested lists. The third as a list with one nested list of 2 items. So there is a consistent relation between shape and list nesting.
In [169]: a + a
Out[169]: array([ 6, 12]) # shape (2,)
In [170]: a + a.reshape(1,2)
Out[170]: array([[ 6, 12]]) # shape (1,2)
In [171]: a + a.reshape(2,1)
Out[171]:
array([[ 6, 9], # shape (2,2)
[ 9, 12]])
Dimensions behave as:
(2,) + (2,) => (2,)
(2,) + (1,2) => (1,2) + (1,2) => (1,2)
(2,) + (2,1) => (1,2) + (2,1) => (2,2) + (2,2) => (2,2)
That is, a lower-dimensional array can be expanded to the matching number of dimensions by adding leading size 1 dimensions.
And size 1 dimensions can be changed to match the corresponding dimension.
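The two rules can be written out as a short function. This is only a sketch: broadcast_shape is a hypothetical helper of mine, not a NumPy function, and it is checked here against np.broadcast, which reports the shape NumPy itself arrives at.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two rules above to a pair of shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with leading size 1 dimensions.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for x, y in zip(a, b):
        # Rule 2: a size 1 dimension stretches to match the other one.
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError("shapes {} and {} cannot be broadcast".format(shape_a, shape_b))
    return tuple(result)

for sa, sb in [((2,), (2,)), ((2,), (1, 2)), ((2,), (2, 1))]:
    numpy_shape = np.broadcast(np.empty(sa), np.empty(sb)).shape
    print(sa, sb, "=>", broadcast_shape(sa, sb), numpy_shape)
```

Shapes such as (2,) and (4,) fall through to the ValueError branch, which mirrors the "operands could not be broadcast together" error.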
I suspect you got the error when doing a += ... (If so, you should have stated that clearly.)
In [172]: a += a
In [173]: a += a.reshape(1,2)
....
ValueError: non-broadcastable output operand with shape (2,)
doesn't match the broadcast shape (1,2)
In [175]: a += a.reshape(2,1)
...
ValueError: non-broadcastable output operand with shape (2,)
doesn't match the broadcast shape (2,2)
With the a+=... addition, the result shape is fixed at (2,), the shape of a. But as noted above the two additions generate (1,2) and (2,2) results, which aren't compatible with (2,).
The same reasoning can explain these additions and errors:
In [176]: a1 = a.reshape(1,2)
In [177]: a1 += a
In [178]: a1
Out[178]: array([[12, 24]])
In [179]: a2 = a.reshape(2,1)
In [180]: a2 += a
...
ValueError: non-broadcastable output operand with shape (2,1)
doesn't match the broadcast shape (2,2)
In [182]: a1 += a2
...
ValueError: non-broadcastable output operand with shape (1,2)
doesn't match the broadcast shape (2,2)
I want to concatenate two csr_matrix, each with shape=(1,N).
I know I should use scipy.sparse.vstack:
from scipy.sparse import csr_matrix,vstack
c1 = csr_matrix([[1, 2]])
c2 = csr_matrix([[3, 4]])
print(c1.shape, c2.shape)
print(vstack([c1, c2], format='csr'))
#prints:
(1, 2) (1, 2)
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
However, my code fails:
from scipy.sparse import csr_matrix,vstack
import numpy as np
y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
print(c0.shape, c1.shape) #prints (1, 2) (1, 2)
print(c0, c1) #prints [[-1. 0.]] [[ 1. 0.]]
print(vstack([c0, c1], format='csr'))
The last line raises exception -
File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 484, in vstack
return bmat([[b] for b in blocks], format=format, dtype=dtype)
File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 533, in bmat
raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
I guess using mean has something to do with it.
Any ideas?
Taking the mean of a sparse matrix returns a NumPy matrix (which is not sparse).
So c0 and c1 are matrices:
In [76]: type(c0)
Out[76]: numpy.matrixlib.defmatrix.matrix
In [89]: sparse.issparse(c0)
Out[94]: False
vstack expects its first argument to be a sequence of sparse matrices.
So make (at least) the first matrix a sparse matrix:
In [31]: vstack([coo_matrix(c0), c1])
Out[31]:
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
In [32]: vstack([coo_matrix(c0), c1]).todense()
Out[32]:
matrix([[-1., 0.],
[ 1., 0.]])
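Putting the pieces together, here is a sketch of the question's code with each mean wrapped in a sparse matrix before stacking. Converting both blocks with csr_matrix, rather than only the first with coo_matrix, is my own choice so that the example does not depend on how a particular SciPy version treats dense blocks.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse, vstack

y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

# .mean() on a sparse matrix gives back a dense result, not a sparse one.
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
print(issparse(c0), issparse(c1))

# Wrap each dense row in a sparse matrix before handing it to vstack.
stacked = vstack([csr_matrix(c0), csr_matrix(c1)], format='csr')
print(issparse(stacked), stacked.shape)
print(stacked.toarray())
```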
I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?
All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
Numpy will automatically expand Y's shape so that it matches that of X. This is called broadcasting, and it usually does a very good job of guessing what should be done. In ambiguous cases, you can insert a length-1 axis yourself (with None or np.newaxis) to tell it which direction to expand. Here, since Y has a dimension of length 1, that is the axis that is expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a Euclidean distance, as your title seems to suggest you want. For that:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
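Here is the same computation at the sizes given in the question, as a rough sketch. X and Y are random stand-ins, since the real data is not shown, and the cross-check against np.linalg.norm is an addition of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 8100))   # stand-in for the 30 x 8100 matrix
Y = rng.random((1, 8100))    # stand-in for the 1 x 8100 matrix

diff = X - Y                                  # Y is broadcast across all 30 rows
dist = np.sqrt(np.sum(diff**2, axis=1))       # one Euclidean distance per row
dist_norm = np.linalg.norm(diff, axis=1)      # the same thing via the built-in norm

print(diff.shape)    # (30, 8100)
print(dist.shape)    # (30,)
print(np.allclose(dist, dist_norm))   # True
```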
Use an array and rely on NumPy broadcasting to subtract Y from it.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])
Just iterate rows from your numpy array and you can actually just subtract them and numpy will make a new array with the differences!
import numpy as np
final_array = []
#X is a numpy array that is 30X8100 and Y is a numpy array that is 1X8100
for row in X:
    output = row - Y
    final_array.append(output)
Each output is one row of differences (X[0] - Y, X[1] - Y, etc.), so final_array ends up as a list of 30 arrays, each holding the X - Y values you need. Simple as that. Just make sure you convert your matrices to NumPy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!
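To see that the loop and the one-liner agree, here is a small sketch on a 3 x 4 stand-in for X and a 1 x 4 stand-in for Y. The sizes and values are made up to keep the output readable.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)    # stand-in for the 30 x 8100 array
Y = np.array([[1, 1, 2, 2]])       # stand-in for the 1 x 8100 array

# Explicit loop: each row of X has shape (4,), so row - Y has shape (1, 4).
rows = [row - Y for row in X]
looped = np.vstack(rows)           # stack the 1 x 4 pieces into a 3 x 4 array

# Broadcasting does the same row-by-row subtraction in one expression.
broadcast = X - Y

print(looped.shape, broadcast.shape)       # (3, 4) (3, 4)
print(np.array_equal(looped, broadcast))   # True
```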
import numpy
a1 = numpy.array(X) #make sure you have a 2d numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y).ravel() #make sure you have a 1d numpy array like [1,2,3,...]
a2 = numpy.tile(a2, (len(a1), 1)) #repeat a2 once per row of a1 (a2 is now same shape as a1)
print(a1 - a2) #element-wise difference between a1 and a2 (or X and Y)