I'm trying to load MNIST dataset into arrays.
When I use
(X_train, y_train), (X_test, y_test)= mnist.load_data()
I get an array y_test(10000,) but I want it to be in the shape of (10000,1).
What is the difference between array(10000,1) and array(10000,)?
How can I convert the first array to the second array?
Your first array, with shape (10000,), is a 1-dimensional np.ndarray.
The shape attribute of a NumPy array is a tuple, and a tuple of length 1 needs a trailing comma, so the shape is written (10000,) and not (10000) (which would just be an int). So currently your data looks like this:
import numpy as np
a = np.arange(5) # >>> array([0, 1, 2, 3, 4])
print(a.shape) # >>> (5,)
What you want is a 2-dimensional array with shape (10000, 1).
Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array you can use either np.expand_dims() or np.reshape().
Using np.expand_dims:
import numpy as np
b = np.arange(5) # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1) # >>> array([[0],[1],[2],[3],[4]])
The function was specifically made for the purpose of adding empty dimensions to arrays. The axis keyword specifies which position the newly added dimension will occupy.
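To make the effect of the axis keyword concrete, here is a minimal sketch (the variable names v, col and row are only illustrative) that adds the new dimension at two different positions:

```python
import numpy as np

v = np.arange(5)                   # 1-D array, shape (5,)

col = np.expand_dims(v, axis=1)    # new axis at position 1 -> shape (5, 1)
row = np.expand_dims(v, axis=0)    # new axis at position 0 -> shape (1, 5)

print(v.shape, col.shape, row.shape)   # (5,) (5, 1) (1, 5)
```

With axis=1 you get a column, which is the shape asked for in the question; axis=0 gives a row instead.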
Using np.reshape:
import numpy as np
a = np.arange(5)
X_test_reshaped = np.reshape(a, (-1, 1)) # >>> array([[0],[1],[2],[3],[4]])
The second argument, (-1, 1), specifies what the new shape should look like after the reshape operation. NumPy replaces the -1 internally with whatever length 'fits the data'. (The new shape is passed positionally here because the name of that keyword argument differs between NumPy versions.)
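As a small illustration of how the -1 placeholder is resolved, the sketch below reshapes a 6-element array in a few ways (the array and names are made up for the example):

```python
import numpy as np

a = np.arange(6)               # 6 elements, shape (6,)

col = a.reshape(-1, 1)         # -1 is inferred as 6 -> shape (6, 1)
two_rows = a.reshape(2, -1)    # -1 is inferred as 3 -> shape (2, 3)
flat = col.reshape(-1)         # back to 1-D        -> shape (6,)

print(col.shape, two_rows.shape, flat.shape)   # (6, 1) (2, 3) (6,)
```

Only one entry of the shape may be -1, since NumPy needs the remaining entries to work out the missing length.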
Reshape is a more powerful function than expand_dims and can be used in many different ways. You can read more about its other uses in the NumPy docs for numpy.reshape().
An array with a shape of (10,1) is a 2D array with 10 rows and a single column.
An array with a shape of (10,) is a 1D array.
To convert (10,1) to (10,), you can simply select the single column. For example, take an array x with x.shape = (10,1). Indexing with x[:,0] drops the column axis, so x[:,0].shape = (10,). (Note that x[:,] on its own leaves the shape unchanged.)
To convert (10,) to (10,1), you can add a dimension by using np.newaxis (after import numpy as np, assuming we are using NumPy arrays here). Take an array y, for example, with y.shape = (10,). Using y[:, np.newaxis], you get a new array with the shape (10,1).
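Both conversions can be checked quickly. This is a minimal sketch using zero-filled placeholder arrays named x and y as in the text:

```python
import numpy as np

x = np.zeros((10, 1))       # 2-D: 10 rows, 1 column
y = np.zeros(10)            # 1-D: 10 elements

x_col0 = x[:, 0]            # pick the single column -> shape (10,)
x_flat = x.ravel()          # flatten                -> shape (10,)
y_col = y[:, np.newaxis]    # add a trailing axis    -> shape (10, 1)
y_col2 = y[:, None]         # np.newaxis is an alias for None

print(x[:, ].shape)         # (10, 1): a bare slice does not drop the axis
print(x_col0.shape, x_flat.shape, y_col.shape, y_col2.shape)
```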
Assume the following code:
import numpy as np
x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])
z = [x, y]
z = np.array(z, dtype=object)
This gives a ValueError: could not broadcast input array from shape (2,4,50) into shape (2,4)
I can understand why this error would occur since the trailing (last) dimension of both arrays is different and a numpy array cannot store arrays with varying dimensions.
However, I happen to have a MAT-file which when loaded in Python through the io.loadmat() function in scipy, contains a np.ndarray with the following properties:
from scipy import io
mat = io.loadmat(file_name='gt.mat')
print(mat.shape)
> (1, 250)
print(mat[0].shape, mat[0].dtype)
> (250,) dtype('O')
print(mat[0][0].shape, mat[0][0].dtype)
> (2, 4, 54), dtype('<f8')
print(mat[0][1].shape, mat[0][1].dtype)
> (2, 4, 60), dtype('<f8')
This is pretty confusing for me. How can the array mat[0] in this file hold numpy arrays with different trailing dimensions as objects, while being a np.ndarray itself, when I am not able to build such an array myself?
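When you call np.array on a nested sequence of arrays, it tries to stack them along their matching leading dimensions anyway, even with dtype=object, and that fails once the trailing dimensions differ. Note that you are dealing with object arrays in both cases, so what you want is still possible. One way is to first create an empty array of objects and then fill in the values.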
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y
Like in this answer.
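For completeness, here is a self-contained version of that approach, reusing the x and y from the question, that also prints what the resulting container looks like:

```python
import numpy as np

x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])

z = np.empty(2, dtype=object)   # 1-D container with 2 object slots
z[0] = x                        # each slot holds a whole array
z[1] = y

print(z.shape, z.dtype)         # (2,) object
print(z[0].shape, z[1].shape)   # (2, 4, 50) (2, 4, 60)
```

The outer array is 1-D with dtype object, just like mat[0] in the question; the differing shapes live only in the elements it refers to.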
I am trying to convert a 2D array with one column into a 1D vector using np.newaxis. The result I have got so far is a 3D array instead of a 1D vector or 1D array.
The 2D array y1 is:
y1.shape
(506, 1)
y1
array([[0.42 ],
[0.36666667],
[0.66 ],
[0.63333333],
[0.69333333],
... ])
Now I'd like to convert it into a 1D array:
import numpy as np
y2=y1[np.newaxis,:]
y2.shape
(1, 506, 1)
You can see that after using np.newaxis, y2 becomes a 3D array, whereas I am expecting a 1D array of shape (506,).
What is the problem with my code above? Thanks
np.newaxis expands the dimensions, so 2D -> 3D. If you want to reduce your dimensions, 2D -> 1D, use squeeze:
>>> a
array([[0.42 ],
[0.36666667],
[0.66 ],
[0.63333333],
[0.69333333]])
>>> a.shape
(5, 1)
>>> a.squeeze()
array([0.42 , 0.36666667, 0.66 , 0.63333333, 0.69333333])
>>> a.squeeze().shape
(5,)
From the documentation:
Each newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension. The added dimension is the position of the newaxis object in the selection tuple.
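A short sketch of what that passage means in practice: the position of np.newaxis in the index decides where the new length-1 axis lands. The (3, 4) array a is a made-up example; y1 mirrors the shape from the question.

```python
import numpy as np

a = np.zeros((3, 4))
print(a[np.newaxis, :, :].shape)   # (1, 3, 4): new axis in front
print(a[:, np.newaxis, :].shape)   # (3, 1, 4): new axis in the middle
print(a[:, :, np.newaxis].shape)   # (3, 4, 1): new axis at the end

y1 = np.zeros((506, 1))            # same shape as in the question
y2 = y1[np.newaxis, :]
print(y2.shape)                    # (1, 506, 1): a dimension was added, not removed
```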
np.newaxis is used to increase the dimension of the array (note that it is a constant, not a function, so it is not called). It will not decrease the dimension. In order to decrease the dimension, you can use reshape():
y1 = np.array(y1).reshape(-1,)
print(y1.shape)
>>> (506,)
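There are a few equivalent ways to get from (506, 1) to (506,). The sketch below uses stand-in data in place of the real y1 and compares three of them:

```python
import numpy as np

# Stand-in data with the same shape as y1 in the question
y1 = np.linspace(0.0, 1.0, 506).reshape(506, 1)

a = y1.squeeze(axis=1)   # drop axis 1; raises ValueError if its length is not 1
b = y1.reshape(-1)       # flatten, whatever the original shape is
c = y1[:, 0]             # select the single column

print(a.shape, b.shape, c.shape)   # (506,) (506,) (506,)
```

Passing axis=1 to squeeze is a defensive choice: it fails loudly if the array unexpectedly has more than one column, whereas reshape(-1) would silently flatten it.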
Suppose I create an np.zeros array called 'sparsity' of shape (3,1,3,16) and then create another numpy array of shape (3,16) called 'per_channel_sparsity'.
Is this the correct way to "replace" each of the 3 (3,16) matrices in 'sparsity' with the per_channel_sparsity matrix?
import numpy as np
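sparsity = np.zeros((3,1,3,16)).astype(np.uint8)
# np.random.rand takes the dimensions as separate ints, not as a tuple
# (note: values in [0, 1) all truncate to 0 when cast to uint8)
per_channel_sparsity = np.random.rand(3,16).astype(np.uint8)
for i in range(3):
    sparsity[i, 0, :, :] = per_channel_sparsity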
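One way to check the loop is to compare it with a single broadcast assignment. This is only a sketch: it keeps the arrays as floats (an assumption, since casting values in [0, 1) to uint8 would turn them all into 0) and uses a seeded generator so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
sparsity = np.zeros((3, 1, 3, 16))            # float dtype kept on purpose
per_channel_sparsity = rng.random((3, 16))

# Version 1: explicit loop over the first axis, as in the question
looped = sparsity.copy()
for i in range(3):
    looped[i, 0, :, :] = per_channel_sparsity

# Version 2: looped[:, 0] has shape (3, 3, 16); the (3, 16) matrix
# is broadcast across the first axis in one assignment
broadcast = sparsity.copy()
broadcast[:, 0] = per_channel_sparsity

print(np.array_equal(looped, broadcast))      # True
```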
I have one array with dimension (1538,4) called X_scaled and another array with dimensions (1538,1) called Y_mlp. I want to add Y_mlp to X_scaled such that Y_mlp becomes the fifth column in X_scaled. How can I do this?
You're looking for np.hstack.
numpy.hstack(tup)
Take a sequence of arrays and stack them horizontally to make a single
array.
import numpy as np
X_scaled, Y_mlp = ..., ...
Y_mlp = Y_mlp.reshape(-1, 1)
out = np.hstack((X_scaled, Y_mlp))
print(out.shape)
Output (for the shapes given in the question):
(1538, 5)
Concatenation occurs along the second dimension.
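A runnable version with placeholder data of the shapes from the question might look like this. The zero and one arrays are stand-ins for the real X_scaled and Y_mlp, and the two alternatives are shown only for comparison:

```python
import numpy as np

X_scaled = np.zeros((1538, 4))   # placeholder for the real features
Y_mlp = np.ones((1538, 1))       # placeholder for the real column

out = np.hstack((X_scaled, Y_mlp))                     # stack horizontally
out_cat = np.concatenate((X_scaled, Y_mlp), axis=1)    # same thing, axis spelled out
out_col = np.column_stack((X_scaled, Y_mlp.ravel()))   # also accepts a 1-D column

print(out.shape)           # (1538, 5)
print(out[:, 4].sum())     # 1538.0: the fifth column is Y_mlp
```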
I'm trying to input vectors into a numpy matrix by doing:
eigvec[:,i] = null
However I keep getting the error:
ValueError: could not broadcast input array from shape (20,1) into shape (20)
I've tried using flatten and reshape, but nothing seems to work
The shapes in the error message are a good clue.
In [161]: x = np.zeros((10,10))
In [162]: x[:,1] = np.ones((1,10)) # or x[:,1] = np.ones(10)
In [163]: x[:,1] = np.ones((10,1))
...
ValueError: could not broadcast input array from shape (10,1) into shape (10)
In [166]: x[:,1].shape
Out[166]: (10,)
In [167]: x[:,[1]].shape
Out[167]: (10, 1)
In [168]: x[:,[1]] = np.ones((10,1))
When the shape of the destination matches the shape of the new value, the copy works. It also works in some cases where the new value can be 'broadcast' to fit. But it does not try more general reshaping. Also note that indexing with a scalar reduces the dimension.
I can guess that
eigvec[:,i] = null.flat
would work (however, null.flatten() should work too). In fact, it looks like NumPy complains because you are assigning a pseudo-1D array (shape (20, 1)) to a 1D array which is considered to be oriented differently (shape (1, 20), if you wish).
Another solution would be:
eigvec[:,i] = null.T
where you properly transpose the "vector" null.
The fundamental point here is that NumPy has "broadcasting" rules for converting between arrays with different numbers of dimensions. In the case of conversions between 2D and 1D, a 1D array of size n is broadcast into a 2D array of shape (1, n) (and not (n, 1)). More generally, missing dimensions are added to the left of the original dimensions.
The observed error message basically said that shapes (20,) and (20, 1) are not compatible: this is because (20,) becomes (1, 20) (and not (20, 1)). In fact, one is a column matrix, while the other is a row matrix.
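The rule can be seen in a short sketch. The arrays below are placeholders with the shapes from the question (eigvec is assumed to be 20 by 20, and i is an arbitrary column index):

```python
import numpy as np

eigvec = np.zeros((20, 20))   # assumed shape of the destination matrix
null = np.ones((20, 1))       # column "vector" as in the error message
i = 3                         # arbitrary column index

# Missing dimensions are added on the left: (20,) behaves like (1, 20),
# so combining it with (20, 1) gives a full (20, 20) result.
print((np.zeros(20) + np.zeros((20, 1))).shape)   # (20, 20)

raised = False
try:
    eigvec[:, i] = null       # (20, 1) cannot be broadcast into (20,)
except ValueError as err:
    raised = True
    print("ValueError:", err)

eigvec[:, i] = null[:, 0]     # make the value 1-D ...
eigvec[:, i] = null.ravel()   # ... or flatten it ...
eigvec[:, [i]] = null         # ... or keep the destination 2-D
```

The last three assignments are alternatives; any one of them is enough.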