I have a 3-dimensional numpy array of shape 333*333*52.
I have a 333-element list of indices ranging from 0 to 332, e.g. [4 12 332 0 ...], that I wish to use to rearrange the first two dimensions of the 3D array.
In matlab I would do:
rearranged_array = original_array(new_order, new_order, :)
But this approach does not work with numpy:
rearranged_array = original_array[new_order, new_order, :]
Produces a 333*52 array
While:
rearranged_array = original_array[new_order][new_order, :]
Does not get things in the right order
Edit:
This seems to work:
rearranged_array = original_array[new_order, :][:, new_order]
This seems a lot less intuitive to me than the matlab method - are there any better ways?
Your second attempt
rearranged_array = original_array[new_order][new_order, :]
is just doing the same operation twice.
You want
rearranged_array = original_array[new_order][:, new_order]
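The reason your first solution doesn't work is that when you pass several index lists or arrays at once, numpy broadcasts them against each other and pairs them up element by element. So original_array[new_order, new_order] picks only the elements at (new_order[k], new_order[k], :) for each k, not every row/column combination, which is why you get a 333*52 array.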
Another solution is to build an open mesh from the two index lists with np.ix_:
rearranged_array = original_array[np.ix_(new_order, new_order)]
N.B. You keep writing things like a[x, y, :] and a[x, :]. The trailing : is superfluous: a[x, y] and a[x] respectively do exactly the same thing.
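To make the difference between these indexing forms concrete, here is a minimal sketch on a small stand-in array (6*6*3 instead of 333*333*52; the random data and the permutation are assumptions for illustration only). It compares paired indexing, chained indexing, and open-mesh indexing with np.ix_.

```python
import numpy as np

rng = np.random.default_rng(0)
original_array = rng.random((6, 6, 3))   # small stand-in for the 333*333*52 array
new_order = rng.permutation(6)           # stand-in for the list of indices

# Paired indexing: picks only the elements (new_order[k], new_order[k], :).
paired = original_array[new_order, new_order]

# Chained indexing: reorder axis 0 first, then axis 1.
chained = original_array[new_order][:, new_order]

# Open-mesh indexing: every (row, column) combination in one step.
meshed = original_array[np.ix_(new_order, new_order)]

print(paired.shape)
print(chained.shape)
print(np.array_equal(chained, meshed))
```

The chained and open-mesh results should agree; the chained form builds an intermediate copy after the first reordering, which may matter for large arrays.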
Say I have one 2d numpy array X with shape (3,3) and one numpy array Y with shape (3,) where
X = np.array([[0,1,2],
              [3,4,5],
              [1,9,2]])
Y = np.array([1,0,1])
How can I create a numpy array Z by multiplying X and Y element-wise and then summing row-wise?
Multiplying element-wise would yield: [0,0,2], [3,0,5], [1,0,2]
Then summing each row would yield:
Z = np.array([2,8,3])
I have tried variations of
Z = np.sum(X * Y) --> adds all elements of entire array, not row-wise.
I know I can use a for loop, but the dataset is very large, so I am trying to find a more efficient numpy-specific way to perform the operation. Is this possible?
You can do the following:
sum_row = np.sum(X*Y, axis=1) # axis=0 for columnwise
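As a quick check of the line above, here is a small sketch using the X from the question and assuming Y is the 1-D array of shape (3,). Multiplying element-wise and summing along axis 1 is the same computation as a matrix-vector product, so X @ Y and np.einsum are shown as equivalent spellings (they are alternatives, not something the answer itself proposed).

```python
import numpy as np

X = np.array([[0, 1, 2],
              [3, 4, 5],
              [1, 9, 2]])
Y = np.array([1, 0, 1])   # assumed 1-D, shape (3,)

Z_sum = np.sum(X * Y, axis=1)          # multiply element-wise, then sum each row
Z_matmul = X @ Y                       # matrix-vector product
Z_einsum = np.einsum("ij,j->i", X, Y)  # same contraction written out explicitly

print(Z_sum)  # [2 8 3]
```

The matrix-product forms avoid materialising the intermediate X * Y array, which may help on a very large dataset.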
This is an example of my error. Say I created a numpy array
X = np.zeros((1000, 50))
where 1000 is the number of features (rows) and 50 is the number of examples (columns).
Since I am adding examples one by one, I have to replace the columns of the array one at a time to get the final feature array. I tried this:
X[:,i] = example
where example has shape (1000, 1), and i is iterated over every example. This does not work because X[:,i] has shape (1000,), a rank 1 array. How do I code it so that each example replaces a column of the X array without throwing the broadcast error? Thank you.
Reshape your vector before assigning it.
X[:,i] = example.reshape(-1,)
This will drop the second dimension and turn example into shape (1000,).
Or, to avoid assigning one by one in the loop, you can put all of your arrays in a list, call np.array on the list, and transpose the result to have them as columns. This will probably work better if you can construct your list of arrays in a list comprehension.
Example:
arrs = [np.random.randint(10, size=5) for _ in range(5)]
X = np.array(arrs).T
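Applied to the shapes in the question, the two approaches might look like the sketch below. The examples list is a hypothetical stand-in for however the (1000, 1) examples are actually produced, and np.hstack is used here as one way to join column vectors without transposing.

```python
import numpy as np

n_features, n_examples = 1000, 50

# Hypothetical stand-in data: each example has shape (1000, 1).
examples = [np.full((n_features, 1), float(k)) for k in range(n_examples)]

# Option 1: preallocate and fill column by column, flattening each example first.
X_loop = np.zeros((n_features, n_examples))
for i, example in enumerate(examples):
    X_loop[:, i] = example.ravel()      # (1000, 1) -> (1000,)

# Option 2: join the (1000, 1) columns side by side in a single call.
X_stacked = np.hstack(examples)

print(X_stacked.shape)  # (1000, 50)
```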
I have a 3d Numpy array and would like to take the mean over one axis considering certain elements from the other two dimensions.
This is an example code depicting my problem:
import numpy as np
myarray = np.random.random((5,10,30))
yy = [1,2,3,4]
xx = [20,21,22,23,24,25,26,27,28,29]
mymean = [ np.mean(myarray[t,yy,xx]) for t in np.arange(5) ]
However, this results in:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Why does indexing such as myarray[:,[1,2,3,4],[1,2,3,4]] work, but not my code above?
This is how you fancy-index over more than one dimension:
>>> np.mean(myarray[np.arange(5)[:, None, None], np.array(yy)[:, None], xx],
...         axis=(-1, -2))
array([ 0.49482768, 0.53013301, 0.4485054 , 0.49516017, 0.47034123])
When you use fancy indexing, i.e. a list or array as an index, over more than one dimension, numpy broadcasts those index arrays to a common shape and uses the result to index the array. For the broadcast to produce every combination, you need to add trailing dimensions of length 1 to the leading index arrays, so that here they have shapes (5, 1, 1), (4, 1) and (10,). Here are the rules of the game.
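The broadcasting of the index arrays can be checked with a short sketch. It uses the shapes from the question with seeded random data (an assumption, so results are reproducible), and also shows np.ix_, which builds the same broadcastable index arrays without writing the None axes by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
myarray = rng.random((5, 10, 30))
yy = [1, 2, 3, 4]
xx = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

# Index arrays of shapes (5, 1, 1), (4, 1) and (10,) broadcast to (5, 4, 10).
t_idx = np.arange(5)[:, None, None]
y_idx = np.array(yy)[:, None]
block_manual = myarray[t_idx, y_idx, xx]

# np.ix_ constructs equivalent open-mesh index arrays.
block_ix = myarray[np.ix_(np.arange(5), yy, xx)]

mymean = block_ix.mean(axis=(1, 2))   # one mean per value of t

print(block_manual.shape)  # (5, 4, 10)
print(mymean.shape)        # (5,)
```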
Since you use consecutive elements you can use a slice:
import numpy as np
myarray = np.random.random((5,10,30))
yy = slice(1,5)
xx = slice(20, 30)
mymean = [np.mean(myarray[t, yy, xx]) for t in np.arange(5)]
To answer your question about why it doesn't work: when you use lists/arrays as indices, Numpy uses a different set of indexing semantics than it does if you use slices. You can see the full story in the documentation and, as that page says, it "can be somewhat mind-boggling".
If you want to do it for nonconsecutive elements, you must grok that complex indexing mechanism.
Consider the following simple example:
import numpy
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X[:,0] = x # WORKS
X[:,0:1] = x # returns ERROR:
# ValueError: could not broadcast input array from shape (10) into shape (10,1)
X[:,0:1] = (x.reshape(-1, 1)) # WORKS
Can someone explain why numpy has vectors of shape (N,) rather than (N,1) ?
What is the best way to do the casting from 1D array into 2D array?
Why do I need this?
Because I have code that inserts a result x into a 2D array X, and the size of x changes from time to time. So I have X[:, idx1:idx2] = x, which works if x is 2D but not if x is 1D.
Do you really need to be able to handle both 1D and 2D inputs with the same function? If you know the input is going to be 1D, use
X[:, i] = x
If you know the input is going to be 2D, use
X[:, start:end] = x
If you don't know the input dimensions, I recommend switching between one line or the other with an if, though there might be some indexing trick I'm not aware of that would handle both identically.
Your x has shape (N,) rather than shape (N, 1) (or (1, N)) because numpy isn't built for just matrix math. ndarrays are n-dimensional; they support efficient, consistent vectorized operations for any non-negative number of dimensions (including 0). While this may occasionally make matrix operations a bit less concise (especially in the case of dot for matrix multiplication), it produces more generally applicable code for when your data is naturally 1-dimensional or 3-, 4-, or n-dimensional.
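If a single code path for both cases is wanted, one possible approach (a sketch, not necessarily the best way) is to reshape the input to (N, -1) before assigning: a 1D input of shape (N,) becomes (N, 1), while a 2D input of shape (N, k) is left unchanged. The helper name insert_columns is hypothetical.

```python
import numpy as np

def insert_columns(X, start, x):
    """Write x into X[:, start:start + k], where k is the number of columns in x.

    Hypothetical helper for illustration; x may have shape (N,), (N, 1) or (N, k).
    Returns the index of the next free column.
    """
    x2d = np.asarray(x).reshape(X.shape[0], -1)   # (N,) -> (N, 1); (N, k) unchanged
    stop = start + x2d.shape[1]
    X[:, start:stop] = x2d
    return stop

X = np.zeros((10, 4))
nxt = insert_columns(X, 0, np.arange(10))        # 1D input fills one column
nxt = insert_columns(X, nxt, np.ones((10, 2)))   # 2D input fills two columns
print(nxt)  # 3
```

The reshape raises a ValueError if the input length is not a multiple of the number of rows, which catches a mismatched input early.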
I think you have the answer already included in your question. Numpy allows arrays to be of any dimensionality (while, as far as I know, Matlab prefers two dimensions where possible), so you need to be precise about this and always distinguish between (n,) and (n,1). By giving a single number as one of the indices (like 0 in the 3rd line), you reduce the dimensionality by one. By giving a range as one of the indices (like 0:1 in the 4th line), you don't reduce the dimensionality.
Line 3 makes perfect sense to me, and I would assign to the 2D array this way.
Here are two tricks that make the code a little shorter.
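import numpy
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X.T[:1, :] = x # target is a (1, 10) view of column 0; x broadcasts to it
X[:, 2:3] = x[:, None] # add a trailing axis: (10,) -> (10, 1)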
I have a multidimensional numpy array, and I need to iterate across a given dimension. Problem is, I won't know which dimension until runtime. In other words, given an array m, I could want
m[:,:,:,i] for i in range(n)
or I could want
m[:,:,i,:] for i in range(n)
etc.
I imagine that there must be a straightforward feature in numpy to write this, but I can't figure out what it is/what it might be called. Any thoughts?
There are many ways to do this. You could build the right index with a list of slices, or perhaps alter m's strides. However, the simplest way may be to use np.swapaxes:
import numpy as np
m=np.arange(24).reshape(2,3,4)
print(m.shape)
# (2, 3, 4)
Let axis be the axis you wish to loop over. m_swapped is the same as m except the axis=1 axis is swapped with the last (axis=-1) axis.
axis=1
m_swapped=m.swapaxes(axis,-1)
print(m_swapped.shape)
# (2, 4, 3)
Now you can just loop over the last axis:
for i in range(m_swapped.shape[-1]):
    assert np.all(m[:,i,:] == m_swapped[...,i])
Note that m_swapped is a view, not a copy, of m. Altering m_swapped will alter m.
m_swapped[1,2,0]=100
print(m)
assert(m[1,0,2]==100)
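A closely related option, shown here as a sketch rather than as part of the answer above, is np.moveaxis, which moves the chosen axis to the front and leaves the order of the remaining axes unchanged. Iterating over an array walks its first axis, so the loop then needs no explicit indexing.

```python
import numpy as np

m = np.arange(24).reshape(2, 3, 4)
axis = 1   # known only at runtime in the question; fixed here for illustration

# Move the chosen axis to the front; iterating an array walks its first axis.
slices = list(np.moveaxis(m, axis, 0))

print(len(slices))      # 3
print(slices[0].shape)  # (2, 4)
```

Like swapaxes, moveaxis returns a view, so writing into the slices modifies m.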
You can use slice(None) in place of the :. For example,
import numpy as np
d = 2 # the dimension to iterate over
x = np.arange(5*5*5).reshape((5,5,5))
s = slice(None) # :
for i in range(5):
    slicer = [s]*3 # [:, :, :]
    slicer[d] = i # [:, :, i]
    print(x[tuple(slicer)]) # x[:, :, i]; index with a tuple, not a list
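For comparison, np.take accepts the axis as an argument, which avoids building the index by hand. The sketch below assumes the same 5*5*5 array and d = 2 as above; note that np.take returns a copy, while indexing with a tuple of slices returns a view.

```python
import numpy as np

x = np.arange(5 * 5 * 5).reshape((5, 5, 5))
d = 2   # the dimension to iterate over

for i in range(x.shape[d]):
    taken = np.take(x, i, axis=d)             # copy of x[:, :, i] when d == 2
    view = x[(slice(None),) * d + (i,)]       # the same data as a view
    assert np.array_equal(taken, view)

print(np.take(x, 0, axis=d).shape)  # (5, 5)
```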