numpy array not broadcastable - python

This is an example of my error. Say i created a numpy array
X = np.zeros((1000, 50))
Where 1000 is the features (rows) and 50 is the examples (columns)
Since i am adding examples one by one i will have to replace columns in the array 1 by 1 to get the final feature array. I tried this:
X[:,i] = example
where example is of size (1000, 1), and i is iterated for every example. This does not work because X[:,i] is of shape (1000,), a rank 1 array. How do i code it so that each example replaces a row of the X array without throwing the broadcast error. Thank you.

Reshape your vector before assigning it.
X[:,i] = example.reshape(-1,)
This will suppress the second dimension and turn example into shape (1000,)
Or, avoiding assigning one by one in the loop you can put all of your arrays in a list and then call np.array on your list and transpose it to have them as columns. This will probably work better if you can construct your list of arrays in a list comprehension.
Example:
arrs = [np.random.randint(10, size=5) for _ in range(5)]
X = np.array(arrs).T

Related

Extract 2d ndarray from arbitrarily dimensional ndarray using index arrays

I want to extract parts of an numpy ndarray based on arrays of index positions for some of the dimensions. Let me show this on an example
Example data
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary, it could as well be 5x2x4x100.
X is a matrix of index values, here X[:,0] are the indices of the first dimension of dummy, X[:,1] those of the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
I want to extract an ndarray of the following form for this example
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could just be done by dummy[X[:,0],X[:,1],:] . Sadly the dimensionality can be different, e.g. dummy could be a 5x2x4x6x100 ndarray and X correspondingly would then be 3x4 . My attempts at dealing with it have not yielded the desired result.
dummy[X,:] yields a 3x2x2x100 ndarray for this example same as dummy[X]
Iteratively reducing dummy by doing something like dummy = dummy[X[:,i],:] with i an iterator over the number of columns of X also does not reduce the ndarray in the example past 3x2x100
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
I will try to provide some explainability to #Michael Szczesny answer.
First, notice that if you have an np.array with dimension n and pass m indexes where m<n, then it will be the same as using : in the dimensions >=m. In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
It would be the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
You are here getting each dimension as an array, and then you will get:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: Instead of using (*X.T,), you can use tuple(X.T), which for me, makes more sense
as Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] are the indices of the first dimension of dummy and X[:,1] are the indices of the second dimension of dummy, if you transpose X (X.T) you'll have the the indices of the first dimension of dummy as X.T[0] and the indices of the second dimension of dummy as X.T[1].
Now to slice dummy as you want, you can specify the indices of the first and of the second dimension in this way:
dummy[(first_dim_indices, second_dim_indices)] = dummy[(X.T[0], X.T[1])]
In order to simplify the code (and since you doesn't want to transpose the X matrix twice) you can unpack X.T in a tuple as (*X.T,) and so write X[(*X.T,)] is the same thing to write dummy[(X.T[0], X.T[1])].
This writing is also useful if you have an unfixed number of dimensions to slice trough because you will unpack from X.T as many lines as there are dimensions to slice in dummy. For example suppose you want to retrieve an 1D-array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]) and dim[(*X.T,)] is still valid.

Numpy shape function for single dimension array

I've started to learn NumPy, when I create an array and then invoke the .shape function, I understand how it works for most cases. However, the result does not make sense to me for a single-dimensional array. Can someone please explain the outcome?
array = np.array([4,5,6])
print(array.shape)
The outcome is (3,)
Output tulpe of ints in the "np.shape" function gives the lengths of the corresponding array dimension, and this tuple will be (n,m), in which n and m indicate row and columns, respectively. For a single dimension array, this Tuple will be just (n,), in which n indicates the number of array elements.

Appending matricies into a single matrix with numpy

I have a function in Python that returns a numpy.mat of shape (100, 1). I am calling this function 4 times in a loop and would like to take the resulting 4 matricies and create a matrix of shape (100, 4). I have looked for sometime at numpy.append, numpy.concatenate, and numpy.insert but have not been able to get this working.
Here is a short SSCCE of my issue
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
np.append(zeros, np.empty(shape=(100, 1)))
print(zeros)
Where zeros should results in a matrix of shape (100, 4) with "junk" values from each of the calls to numpy.empty and not all 0..
Do something along these lines -
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
data = np.random.rand(100,1) # func that returns (100,1) shaped array
zeros[:,i-1] = data.ravel()
In place of ravel(), we could also use : data[:,0] or np.squeeze(data), basic idea is to feed a 1D array there, because the LHS zeros[:,i-1] expects a 1D array there.
As an alternative, inside the loop, we could also do -
zeros[:,[i-1]] = data
Thus, with that list of column index [i-1] instead of i-1, we are keeping the dimensions into which data is to be assigned (keeps as 2D) and that allows us to feed in data, which is also 2D without any change.

Multiple Element Indexing in multi-dimensional array

I have a 3d Numpy array and would like to take the mean over one axis considering certain elements from the other two dimensions.
This is an example code depicting my problem:
import numpy as np
myarray = np.random.random((5,10,30))
yy = [1,2,3,4]
xx = [20,21,22,23,24,25,26,27,28,29]
mymean = [ np.mean(myarray[t,yy,xx]) for t in np.arange(5) ]
However, this results in:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Why does an indexing like e.g. myarray[:,[1,2,3,4],[1,2,3,4]] work, but not my code above?
This is how you fancy-index over more than one dimension:
>>> np.mean(myarray[np.arange(5)[:, None, None], np.array(yy)[:, None], xx],
axis=(-1, -2))
array([ 0.49482768, 0.53013301, 0.4485054 , 0.49516017, 0.47034123])
When you use fancy indexing, i.e. a list or array as an index, over more than one dimension, numpy broadcasts those arrays to a common shape, and uses them to index the array. You need to add those extra dimensions of length 1 at the end of the first indexing arrays, for the broadcast to work properly. Here are the rules of the game.
Since you use consecutive elements you can use a slice:
import numpy as np
myarray = np.random.random((5,10,30))
yy = slice(1,5)
xx = slice(20, 30)
mymean = [np.mean(myarray[t, yy, xx]) for t in np.arange(5)]
To answer your question about why it doesn't work: when you use lists/arrays as indices, Numpy uses a different set of indexing semantics than it does if you use slices. You can see the full story in the documentation and, as that page says, it "can be somewhat mind-boggling".
If you want to do it for nonconsecutive elements, you must grok that complex indexing mechanism.

How to assign a 1D numpy array to 2D numpy array?

Consider the following simple example:
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X[:,0] = x # WORKS
X[:,0:1] = x # returns ERROR:
# ValueError: could not broadcast input array from shape (10) into shape (10,1)
X[:,0:1] = (x.reshape(-1, 1)) # WORKS
Can someone explain why numpy has vectors of shape (N,) rather than (N,1) ?
What is the best way to do the casting from 1D array into 2D array?
Why do I need this?
Because I have a code which inserts result x into a 2D array X and the size of x changes from time to time so I have X[:, idx1:idx2] = x which works if x is 2D too but not if x is 1D.
Do you really need to be able to handle both 1D and 2D inputs with the same function? If you know the input is going to be 1D, use
X[:, i] = x
If you know the input is going to be 2D, use
X[:, start:end] = x
If you don't know the input dimensions, I recommend switching between one line or the other with an if, though there might be some indexing trick I'm not aware of that would handle both identically.
Your x has shape (N,) rather than shape (N, 1) (or (1, N)) because numpy isn't built for just matrix math. ndarrays are n-dimensional; they support efficient, consistent vectorized operations for any non-negative number of dimensions (including 0). While this may occasionally make matrix operations a bit less concise (especially in the case of dot for matrix multiplication), it produces more generally applicable code for when your data is naturally 1-dimensional or 3-, 4-, or n-dimensional.
I think you have the answer already included in your question. Numpy allows the arrays be of any dimensionality (while afaik Matlab prefers two dimensions where possible), so you need to be correct with this (and always distinguish between (n,) and (n,1)). By giving one number as one of the indices (like 0 in 3rd row), you reduce the dimensionality by one. By giving a range as one of the indices (like 0:1 in 4th row), you don't reduce the dimensionality.
Line 3 makes perfect sense for me and I would assign to the 2-D array this way.
Here are two tricks that make the code a little shorter.
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X.T[:1, :] = x
X[:, 2:3] = x[:, None]

Categories