Numpy indexing multidimensional arrays with array and slice - python

My question is about this example in the numpy docs.
y = np.arange(35).reshape(5,7)
This is the operation that I am trying to clarify:
y[np.array([0,2,4]),1:3]
According to the docs:
"In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2)."
The following does not work, so I assume it is not equivalent:
y[np.array([0,2,4]), np.array([1,2])]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-140-f4cd35e70141> in <module>()
----> 1 y[np.array([0,2,4]), np.array([1,2])]
ValueError: shape mismatch: objects cannot be broadcast to a single shape
What does this broadcast array of shape (3,2) look like?

The broadcasting is more like:
In [280]: y[np.array([0,2,4])[...,None], np.array([1,2])]
Out[280]:
array([[ 1, 2],
[15, 16],
[29, 30]])
I added a dimension to [0,2,4] making it 2d. broadcast_arrays can be used to see what the broadcasted arrays look like:
In [281]: np.broadcast_arrays(np.array([0,2,4])[...,None], np.array([1,2]))
Out[281]:
[array([[0, 0],
[2, 2],
[4, 4]]),
array([[1, 2],
[1, 2],
[1, 2]])]
np.broadcast_arrays([[0],[2],[4]], [1,2]) is the same thing without the array wrappers. np.meshgrid([0,2,4], [1,2], indexing='ij') is another way of producing these indexing arrays.
(The lists produced by meshgrid or broadcast_arrays could be used as the index for y[...]; in recent NumPy versions, convert the list to a tuple first.)
So it's right to say [1,2] is broadcast with the index array, but it omits the bit about adjusting dimensions.
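As a rough check of the point above, the sketch below compares the mixed index with an explicit pair of broadcastable index arrays. The names rows, cols, mixed, explicit and via_ix are made up for illustration; np.ix_ is a NumPy helper that builds the (3,1) and (1,2) index arrays, and it is not mentioned in the docs passage being discussed.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])   # illustrative names, not from the docs
cols = np.array([1, 2])

mixed = y[rows, 1:3]                # advanced index combined with a slice
explicit = y[rows[:, None], cols]   # (3,1) index broadcast against (2,)
via_ix = y[np.ix_(rows, cols)]      # np.ix_ builds the (3,1) and (1,2) arrays

print(mixed)
print(np.array_equal(mixed, explicit))  # same values either way
print(np.array_equal(mixed, via_ix))
```

The extra axis on rows is the "adjusting dimensions" step that the docs sentence leaves out.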
A little earlier they have this example:
y[np.array([0,2,4])]
which is equivalent to y[np.array([0,2,4]), :]. It picks 3 rows, and all items from them. The 1:3 case can be thought of as an extension of this, picking 3 rows, and then 2 columns.
y[[0,2,4],:][:,1:3]
This might be a better way of thinking about the indexing if broadcasting is too confusing.
There's another docs page that might handle this better
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
In these docs, basic indexing involves slices and integers
y[:,1:3], y[1,:], y[1, 1:3]
Advanced indexing involves an array (or list)
y[[0,2,4],:]
This produces the same result as y[::2,:], except the list case produces a copy, the slice (basic) a view.
y[[0,2,4], [1,2,3]] is a case of pure advanced index array indexing; the result is 3 items, the ones at (0,1), (2,2), and (4,3).
y[[0,2,4], 1:3] is a case that the docs call "Combining advanced and basic indexing": 'advanced' from '[0,2,4]', basic from '1:3'.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
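A small sketch of the copy/view distinction and of pure advanced indexing described above. The variable names are illustrative only, and np.shares_memory is used here as one way to tell a view from a copy.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

view = y[::2, :]         # basic indexing (slice)
copy = y[[0, 2, 4], :]   # advanced indexing (list)

print(np.array_equal(view, copy))   # same values
print(np.shares_memory(y, view))    # the slice shares y's buffer
print(np.shares_memory(y, copy))    # the list index made new data

# Pure advanced indexing pairs the indices up: (0,1), (2,2), (4,3)
pairs = y[[0, 2, 4], [1, 2, 3]]
print(pairs)
```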
Looking at a more complex index array might add some insight.
In [222]: i=[[0,2],[1,4]]
Used with another list, it is 'pure' advanced, and the result is broadcasted:
In [224]: y[i, [1,2]]
Out[224]:
array([[ 1, 16],
[ 8, 30]])
The index arrays are:
In [234]: np.broadcast_arrays(i, [1,2])
Out[234]:
[array([[0, 2],
[1, 4]]),
array([[1, 2],
[1, 2]])]
The [1,2] list is just expanded to a (2,2) array.
Using it with a slice is an example of this mixed advanced/basic, and the result is 3d (2,2,2).
In [223]: y[i, 1:3]
Out[223]:
array([[[ 1, 2],
[15, 16]],
[[ 8, 9],
[29, 30]]])
The equivalent with broadcasting is
y[np.array(i)[...,None], [1,2]]
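To double-check that equivalence, here is a minimal sketch. It wraps i in np.array up front, and the names pure, mixed and explicit are just labels for this example.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
i = np.array([[0, 2], [1, 4]])

pure = y[i, [1, 2]]                  # (2,2) index broadcast with (2,) -> (2,2)
mixed = y[i, 1:3]                    # advanced + slice -> (2,2,2)
explicit = y[i[..., None], [1, 2]]   # (2,2,1) broadcast with (2,) -> (2,2,2)

print(pure.shape, mixed.shape, explicit.shape)
print(np.array_equal(mixed, explicit))
```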

y[data,beginIndex:endIndex]
import numpy as np
y = np.arange(35).reshape(5,7)
print(y)
[[ 0 1 2 3 4 5 6]
[ 7 8 9 10 11 12 13]
[14 15 16 17 18 19 20]
[21 22 23 24 25 26 27]
[28 29 30 31 32 33 34]]
print(y[np.array([0,2,4]),1:3])
[[ 1 2]
[15 16]
[29 30]]

You're right that the documentation may be incorrect here, or at least that something is missing. I'd file an issue asking for a clarification in the documentation.
In fact, this part of the documentation shows almost the same example, raising the exception you got:
>>> y[np.array([0,2,4]), np.array([0,1])]
<type 'exceptions.ValueError'>: shape mismatch: objects cannot be
broadcast to a single shape


Concatenate fails in simple example

I am trying the simple examples from this page.
It says:
arr=np.array([4,7,12])
arr1=np.array([5,9,15])
np.concatenate((arr,arr1))
# Must give array([ 4, 7, 12, 5, 9, 15])
np.concatenate((arr,arr1),axis=1)
#Must give
#[[4,5],[7,9],[12,15]]
# but it gives *** numpy.AxisError: axis 1 is out of bounds for array of dimension 1
Why is this example not working?
Both arrays are 1-D, so they have no axis 1 to concatenate along, hence the AxisError. np.vstack is what you're looking for. Note the transpose at the end: it converts vstack's 2x3 result to a 3x2 array.
import numpy as np
arr = np.array([4,7,12])
arr1 = np.array([5,9,15])
a = np.vstack((arr,arr1)).T
print(a)
Output:
[[ 4 5]
[ 7 9]
[12 15]]
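For comparison, a few other ways to get the same 3x2 result without the transpose. This is only a sketch of alternatives, not what the page in the question had in mind; np.column_stack and np.stack are standard NumPy functions, and the names a, b, c are placeholders.

```python
import numpy as np

arr = np.array([4, 7, 12])
arr1 = np.array([5, 9, 15])

a = np.column_stack((arr, arr1))    # treats each 1-D array as a column
b = np.stack((arr, arr1), axis=1)   # joins along a new axis 1
# concatenate works on axis 1 once each array has been made (3,1)
c = np.concatenate((arr[:, None], arr1[:, None]), axis=1)

print(a)
```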

Difference between just reshaping and reshaping and getting transpose?

I'm currently studying the CS231 assignments and ran into something confusing. When calculating gradients, if I first reshape x and then take the transpose, I get the correct result.
x_r=x.reshape(x.shape[0],-1)
dw= x_r.T.dot(dout)
However, when I reshape directly to the shape of X.T, I don't get the correct result.
dw = x.reshape(-1,x.shape[0]).dot(dout)
Can someone explain the following questions?
How does np.reshape() order the elements it reads?
How does reshaping an (N,d1,d2..dn) shaped array into an (N,D) array and transposing it differ from reshaping it directly into a (D,N) array?
While both your approaches result in arrays of the same shape, there will be a difference in the order of elements due to the way numpy reads / writes elements. By default, reshape uses a C-like index order, which means the elements are read / written with the last axis index changing fastest, back to the first axis index changing slowest (taken from the documentation).
Here is an example of what that means in practice. Let's assume the following array x:
x = np.asarray([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
print(x.shape) # (2, 3, 2)
print(x)
# output
[[[ 1 2]
[ 3 4]
[ 5 6]]
[[ 7 8]
[ 9 10]
[11 12]]]
Now let's reshape this array the following two ways:
opt1 = x.reshape(x.shape[0], -1)
opt2 = x.reshape(-1, x.shape[0])
print(opt1.shape) # output: (2, 6)
print(opt2.shape) # output: (6, 2)
print(opt1)
# output:
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]]
print(opt2)
# output:
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]
[11 12]]
reshape first inferred the shape of the new arrays and then returned a view where it read the elements in C-like index order.
To exemplify this on opt1: since the original array x has 12 elements, it inferred that the new array opt1 must have a shape of (2, 6) (because 2*6=12). Now, reshape returns a view where:
opt1[0][0] == x[0][0][0]
opt1[0][1] == x[0][0][1]
opt1[0][2] == x[0][1][0]
opt1[0][3] == x[0][1][1]
opt1[0][4] == x[0][2][0]
opt1[0][5] == x[0][2][1]
opt1[1][0] == x[1][0][0]
...
opt1[1][5] == x[1][2][1]
So as described above, the last axis index changes fastest and the first axis index slowest. In the same way, the output for opt2 will be computed.
You can now verify that transposing the first option will result in the same shape but a different order of elements:
opt1 = opt1.T
print(opt1.shape) # output: (6, 2)
print(opt1)
# output:
[[ 1 7]
[ 2 8]
[ 3 9]
[ 4 10]
[ 5 11]
[ 6 12]]
Obviously, the two approaches do not result in the same array due to element ordering, even though they will have the same shape.
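Tying this back to the gradient question, the sketch below checks which of the two (D, N) arrays keeps each sample together. The names N, reshaped_then_T and reshaped_direct are invented for this example; x is the same 2x3x2 array as above, built with arange.

```python
import numpy as np

x = np.arange(1, 13).reshape(2, 3, 2)   # same values as the x above
N = x.shape[0]

reshaped_then_T = x.reshape(N, -1).T    # (6, 2): column n is sample n, flattened
reshaped_direct = x.reshape(-1, N)      # (6, 2): same shape, different layout

print(reshaped_then_T.shape == reshaped_direct.shape)
print(np.array_equal(reshaped_then_T, reshaped_direct))

# Column 0 should hold only values from x[0] for the gradient to be right
print(reshaped_then_T[:, 0])   # all of x[0]
print(reshaped_direct[:, 0])   # a mix of x[0] and x[1]
```

Only the reshape-then-transpose version keeps each sample's features in one column, which is why x_r.T.dot(dout) gives the expected result.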

Understanding the slicing of NumPy array

I haven't understood the output of the following program:
import numpy as np
myList = [[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]]
myNumpyArray = np.array(myList)
print(myNumpyArray[0:3, 1:3])
Output
[[ 2 3]
[ 6 7]
[10 11]]
My understanding was that the result would be the intersection of all rows and the 2nd to 4th columns. By that logic, the output should be:
2 3 4
6 7 8
10 11 12
14 15 16
What am I missing here?
The ending indices (the 3's in 0:3 and 1:3) are exclusive, not inclusive, while the starting indices (0 and 1) are in fact inclusive. If the ending indices were inclusive, then the output would be as you expect. But because they're exclusive, you're actually only grabbing rows 0, 1, and 2, and columns 1 and 2. The output is the intersection of those, which is equivalent to the output you're seeing.
If you are trying to get the data you expect, you can do myNumpyArray[:, 1:]. The : simply grabs all the elements of the array (in your case, in the first dimension of the array), and the 1: grabs all the content of the array starting at index 1, ignoring the data in the 0th place.
This is a classic case of just needing to understand slice notation.
Inside the brackets, you have the slice for each dimension:
arr[dim1_start:dim1_end, dim2_start:dim2_end]
For the above notation, the slice will include the elements starting at dimX_start, up to, and not including, dimX_end.
So, for what you wrote, myNumpyArray[0:3, 1:3],
you selected rows 0, 1, and 2 (not including 3) and columns 1 and 2 (not including 3).
I hope that helps explain your results.
For the result you were expecting, you would need something more like:
print(myNumpyArray[0:4, 1:4])
For more info on slicing, you might go to the numpy docs or look at a similar question posted a while back.
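A quick sketch of the half-open rule, using the same 4x4 values as in the question. The names m, got, wanted and same are shorthand for this example.

```python
import numpy as np

m = np.arange(1, 17).reshape(4, 4)   # same values as myNumpyArray

got = m[0:3, 1:3]      # rows 0..2, columns 1..2 (stop index excluded)
wanted = m[0:4, 1:4]   # rows 0..3, columns 1..3
same = m[:, 1:]        # omitted bounds default to the start / end of the axis

print(got.shape, wanted.shape)
print(np.array_equal(wanted, same))
```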

Python: Access saved points from 2d array in 3d numpy array

I got a 2d numpy array (shape(y,x)=601,1200) and a 3d numpy array (shape(z,y,x)=137,601,1200).
The 2d array stores, for each (y, x) point, the z index that I now want to use to pick a value from the 3d array and save into a new 2d array.
I tried something like this without success.
levels = array2d.reshape(-1)
y = np.arange(601)
x = np.arange(1200)
newArray2d=oldArray3d[levels,y,x]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (721200,) (601,) (1200,)
I don't want to try something with loops, so is there any faster method?
This is the data you have:
x_len = 12 # In your case, 1200
y_len = 6 # In your case, 601
z_len = 3 # In your case, 137
import numpy as np
my2d = np.random.randint(0,z_len,(y_len,x_len))
my3d = np.random.randint(0,5,(z_len,y_len,x_len))
This is one way to build your new 2d array:
yindices,xindices = np.indices(my2d.shape)
new2d = my3d[my2d[:], yindices, xindices]
Notes:
We're using Integer Advanced Indexing.
This means we index the 3d array my3d with 3 integer index arrays.
For more explanation on how integer array indexing works, please refer to my answer on this other question
In your attempt, there was no need to reshape your 2d with reshape(-1), since the shape of the integer index array that we pass, will (after any broadcasting) become the shape of the resulting 2d array.
Also, in your attempt, your second and third index arrays need to have opposite orientations. That is, they must be of shape (y_len,1) and (1, x_len). Notice the different positions of the 1. This ensures that these two index arrays will be broadcast against each other.
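The last note can be sketched as follows, on small made-up sizes. The names levels, cube, y_idx and x_idx are placeholders, and np.take_along_axis is shown only as a possible alternative that the answer itself does not use. The result is checked against a plain loop.

```python
import numpy as np

rng = np.random.default_rng(0)
z_len, y_len, x_len = 3, 6, 12   # small stand-ins for 137, 601, 1200
levels = rng.integers(0, z_len, size=(y_len, x_len))   # the 2d array of z indices
cube = rng.integers(0, 5, size=(z_len, y_len, x_len))  # the 3d array

y_idx = np.arange(y_len)[:, None]   # shape (y_len, 1)
x_idx = np.arange(x_len)[None, :]   # shape (1, x_len)
picked = cube[levels, y_idx, x_idx]  # all three broadcast to (y_len, x_len)

# Possible alternative: take_along_axis needs indices with the same ndim as cube
alt = np.take_along_axis(cube, levels[None, :, :], axis=0)[0]

# Loop version, only to confirm what the indexing selects
reference = np.array([[cube[levels[j, k], j, k] for k in range(x_len)]
                      for j in range(y_len)])
print(np.array_equal(picked, reference), np.array_equal(alt, reference))
```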
There's some vagueness in your question, but I think you want advanced indexing like this:
In [2]: arr = np.arange(24).reshape(4,3,2)
In [3]: levels = np.random.randint(0,4,(3,2))
In [4]: levels
Out[4]:
array([[1, 2],
[3, 1],
[0, 2]])
In [5]: arr
Out[5]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
In [6]: arr[levels, np.arange(3)[:,None], np.arange(2)]
Out[6]:
array([[ 6, 13],
[20, 9],
[ 4, 17]])
levels is (3,2). I created the other 2 indexing arrays so that they broadcast with it: (3,1) and (2,). The result is a (3,2) array of values from arr, selected by their combined indices.
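Spelled out as a loop, the indexing above amounts to the following. This is only an illustration of what is selected, with levels fixed to the values shown in Out[4] rather than drawn at random; looped is a name made up for the example.

```python
import numpy as np

arr = np.arange(24).reshape(4, 3, 2)
levels = np.array([[1, 2], [3, 1], [0, 2]])   # the values shown in Out[4]

result = arr[levels, np.arange(3)[:, None], np.arange(2)]

# Equivalent loop: result[r, c] is arr[levels[r, c], r, c]
looped = np.empty_like(levels)
for r in range(3):
    for c in range(2):
        looped[r, c] = arr[levels[r, c], r, c]

print(result)
print(np.array_equal(result, looped))
```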

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way? The solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index]=a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist, dtype=object) # array of objects; recent NumPy versions require dtype=object for irregular input
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
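A minimal sketch of what that arithmetic does: the operation is applied to each stored array in turn and the result is again an object array. The names shifted and doubled are illustrative, and b is rebuilt here with the ndindex loop from above.

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)
b = np.empty((2, 3), dtype=object)
for index in np.ndindex(b.shape):
    b[index] = a[index]   # each cell holds a length-2 array

shifted = b - 10   # applied cell by cell; the result is still dtype=object
doubled = b * 2

print(shifted.dtype, shifted.shape)
print(shifted[0, 0], doubled[1, 2])
```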
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Building on hpaulj's answer, here is a slightly more generic solution. a is an array of dimension N which shall be converted to an array b of dimension N1 with dtype object holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
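The list() approach could be wrapped in a small helper along these lines. The function name to_object_array and its n_outer parameter are hypothetical, chosen for this sketch; the round trip through np.stack is only there to check that no values were reordered.

```python
import numpy as np

def to_object_array(a, n_outer):
    """Hypothetical helper: keep the first n_outer axes of `a` as the shape of
    an object array whose cells hold the remaining axes as sub-arrays."""
    outer_shape = a.shape[:n_outer]
    inner_shape = a.shape[n_outer:]
    b = np.empty(int(np.prod(outer_shape)), dtype=object)
    b[:] = list(a.reshape((-1,) + inner_shape))   # one sub-array per cell
    return b.reshape(outer_shape)

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
b = to_object_array(a, 2)
print(b.shape, b[1, 2].shape)

# Round trip: stacking the cells again should reproduce a
restored = np.stack(list(b.ravel())).reshape(a.shape)
print(np.array_equal(restored, a))
```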
