I got a basic error with strange output that I do not understand verywell:
step to reproduce
arr1 = np.zeros([6,10,50])
arr2 = np.zeros([6,10])
arr1[:, :, range(25,26,1)] = [arr2]
That generate this error:
ValueError: shape mismatch: value array of shape (1,6,10) could not be broadcast to indexing result of shape (1,6,10)
Could anyone explain what I'm doing wrong ?
Add an extra dimension to arr2:
arr1[:, :, range(25,26,1)] = arr2.reshape(arr2.shape + (1,))
Easier notation for range as used here:
arr1[:, :, 25:26)] = arr2.reshape(arr2.shape + (1,))
(and slice(25,26,1), or slice(25,26), could also work; just to add to the options and possible confusion.)
Or insert an extra axis at the end of arr2:
arr1[..., 25:26] = arr2[..., np.newaxis]
(where ... means "as many dimensions as possible"). You can also use None instead of np.newaxis; the latter is probably more explicit, but anyone knowing NumPy will recognise None as inserting an extra dimension (axis).
Of course, you could also set arr2 to be 3-dimensional from the start:
arr2 = np.zeros([6,10,1])
Note that broadcasting does work when used from the left:
>>> arr1 = np.zeros([50,6,10]) # Swapped ("rolled") dimensions
>>> arr2 = np.zeros([6,10])
>>> arr1[25:26, :, :] = arr2 # No need to add an extra axis
It's just that it doesn't work when used from the right, as in your code.
Since range(25, 26, 1) is actually a single number, you could use either:
arr1[:, :, 25:26] = arr2[..., None]
or:
arr1[:, :, 25] = arr2
in place of arr1[:, :, range(25,26,1)] = [arr2].
Note that for ranges/slices that do not reduce to a single number the first line would use broadcasting.
The reason why your original code does not work is that you are mixing NumPy arrays and Python lists in a non-compatible way because NumPy interprets [arr2] as having shape (1, 6, 10) while the result expects a shape (6, 10, 1) (the error you are getting is substantially about that.)
The above solution targets at making sure that arr2 is in a compatible shape.
Another possibility would have been to change the shape of the recipient, which would allow you to assign [arr2], e.g.:
arr1 = np.zeros([50,6,10])
arr2 = np.zeros([6,10])
arr1[25:26, :, :] = [arr2]
This method may be less efficient though, since arr2[..., None] is just a memory view of the same data in arr2, while [arr2] is creating (read: allocating new memory for) a new list object, which would require some casting (happening under the hood) to be assigned to a NumPy array.
Related
I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy. I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handle broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros(100,10)
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too, no None juggling and figuring out which axis to use which index. Also no need to do extra bookkeeping regarding array size, just i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape . # >> (2, 1, 4)
To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]
This is especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
I've been using np.reshape to achieve the same as shown below
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)
I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy. I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handle broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros(100,10)
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too, no None juggling and figuring out which axis to use which index. Also no need to do extra bookkeeping regarding array size, just i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape . # >> (2, 1, 4)
To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]
This is especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
I've been using np.reshape to achieve the same as shown below
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)
I am creating a numpy array using the following:
X = np.linspace(-5, 5, num=500)
This generates points evenly sampled 500 points between -5 and 5. The shape of the resulting array is: (500,). Now, I need to pass it to a function that expects a 2-D array. So, I can reshape it as:
X = X.reshape((500, 1))
However, I noticed that X = X[:, None] has the same effect. For the life of me though, I cannot understand what this syntax is doing. Hoping someone can shed some light on this.
The syntax X[: ,None] is actually the same as:
X[:, np.newaxis]
which is adding a new dimension to your original array.
The Python interpreter translates
x[:,None]
to
x.__getitem__((slice(None,None,None), None))
and the ndarray implementation of __getitem__ acts in much the same way as x.reshape(500,1). Implementation details will differ, but the effect is the same. `
So at a syntax level, it's just normal Python. But the numpy semantics give it a distinctive meaning.
x[:, np.newaxis]
may be clearer, but np.newaxis is just an alias for None:
In [48]: np.newaxis is None
Out[48]: True
I generally use MATLAB and Octave, and i recently switching to python numpy.
In numpy when I define an array like this
>>> a = np.array([[2,3],[4,5]])
it works great and size of the array is
>>> a.shape
(2, 2)
which is also same as MATLAB
But when i extract the first entire column and see the size
>>> b = a[:,0]
>>> b.shape
(2,)
I get size (2,), what is this? I expect the size to be (2,1). Perhaps i misunderstood the basic concept. Can anyone make me clear about this??
A 1D numpy array* is literally 1D - it has no size in any second dimension, whereas in MATLAB, a '1D' array is actually 2D, with a size of 1 in its second dimension.
If you want your array to have size 1 in its second dimension you can use its .reshape() method:
a = np.zeros(5,)
print(a.shape)
# (5,)
# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)
# or use -1 in the first dimension, so that its size in that dimension is
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
Edit
As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:
a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)
np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB:
a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
For more info on the differences between arrays and matrices, see here.
Typing help(np.shape) gives some insight in to what is going on here. For starters, you can get the output you expect by typing:
b = np.array([a[:,0]])
Basically numpy defines things a little differently than MATLAB. In the numpy environment, a vector only has one dimension, and an array is a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:
a = np.array([[vec1], [vec2]])
So a has two dimensions, and in your example the number of elements in both dimensions is the same, 2. Your array is therefore 2 by 2. When you take a slice out of this, you are reducing the number of dimensions that you have by one. In other words, you are taking a vector out of your array, and that vector only has one dimension, which also has 2 elements, but that's it. Your vector is now 2 by _. There is nothing in the second spot because the vector is not defined there.
You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.
That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1 dimensional) in the same way as it does arrays (2 or more dimensions).
I have a multidimensional numpy array, and I need to iterate across a given dimension. Problem is, I won't know which dimension until runtime. In other words, given an array m, I could want
m[:,:,:,i] for i in xrange(n)
or I could want
m[:,:,i,:] for i in xrange(n)
etc.
I imagine that there must be a straightforward feature in numpy to write this, but I can't figure out what it is/what it might be called. Any thoughts?
There are many ways to do this. You could build the right index with a list of slices, or perhaps alter m's strides. However, the simplest way may be to use np.swapaxes:
import numpy as np
m=np.arange(24).reshape(2,3,4)
print(m.shape)
# (2, 3, 4)
Let axis be the axis you wish to loop over. m_swapped is the same as m except the axis=1 axis is swapped with the last (axis=-1) axis.
axis=1
m_swapped=m.swapaxes(axis,-1)
print(m_swapped.shape)
# (2, 4, 3)
Now you can just loop over the last axis:
for i in xrange(m_swapped.shape[-1]):
assert np.all(m[:,i,:] == m_swapped[...,i])
Note that m_swapped is a view, not a copy, of m. Altering m_swapped will alter m.
m_swapped[1,2,0]=100
print(m)
assert(m[1,0,2]==100)
You can use slice(None) in place of the :. For example,
from numpy import *
d = 2 # the dimension to iterate
x = arange(5*5*5).reshape((5,5,5))
s = slice(None) # :
for i in range(5):
slicer = [s]*3 # [:, :, :]
slicer[d] = i # [:, :, i]
print x[slicer] # x[:, :, i]