Numpy array with elements of different last axis dimensions - python

Assume the following code:
import numpy as np
x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])
z = [x, y]
z = np.array(z, dtype=object)
This gives a ValueError: could not broadcast input array from shape (2,4,50) into shape (2,4)
I can understand why this error would occur since the trailing (last) dimension of both arrays is different and a numpy array cannot store arrays with varying dimensions.
However, I happen to have a MAT-file which when loaded in Python through the io.loadmat() function in scipy, contains a np.ndarray with the following properties:
from scipy import io
mat = io.loadmat(file_name='gt.mat')
print(mat.shape)
> (1, 250)
print(mat[0].shape, mat[0].dtype)
> (250,) dtype('O')
print(mat[0][0].shape, mat[0][0].dtype)
> (2, 4, 54), dtype('<f8')
print(mat[0][1].shape, mat[0][1].dtype)
> (2, 4, 60), dtype('<f8')
This is pretty confusing to me. How can the array mat[0] in this file hold numpy arrays with different trailing dimensions as objects, while being an np.ndarray itself, when I am not able to do so myself?

When np.array is called on a list of arrays, it tries to stack them into one multidimensional array, even with dtype=object, and that fails when the shapes do not match. Note that you are dealing with object arrays in both cases, so what you want is still possible. One way is to first create an empty array of objects and then fill in the values.
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y
Like in this answer.
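A minimal sketch of the approach above, using random arrays with the shapes from the question (the variable names are only illustrative). It shows that the container is a 1-D object array whose elements keep their own shapes:

```python
import numpy as np

x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])

# Allocate a 1-D container of object references, then fill it element by element.
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y

print(z.shape, z.dtype)        # (2,) object
print(z[0].shape, z[1].shape)  # (2, 4, 50) (2, 4, 60)
```

Because each slot only stores a reference to an ordinary array, the trailing dimensions are free to differ.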

Related

Numpy: How to multiply (N,N) and (N,N,M,M) numpy arrays?

I want to multiply two numpy arrays. One numpy array is a matrix of shape (10, 10) and the other is a matrix of matrices, i.e. of shape (10, 10, 256, 256).
I now simply want to multiply each matrix in the second matrix of matrices with the corresponding component in the first matrix. For instance, the matrix at position (0, 0) in the second matrix shall be multiplied by the value at position (0, 0) in the first matrix.
Intuitively, this is not really complicated, but numpy does not seem to support that. Or at least I am not smart enough to make it work. The ValueError that is thrown says:
ValueError: operands could not be broadcast together with shapes (10,10) (10,10,256,256)
Can anybody help me, please? How can I achieve what I want in a numpy way?
You can use the NumPy einsum function, e.g., (using zeros arrays as dummies in this example):
import numpy as np
x = np.zeros((10, 10))
y = np.zeros((10, 10, 256, 256))
z = np.einsum("ij,ijkm->km", x, y)
print(z.shape)
(256, 256)
See here for a nice description of einsum's usage.
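Note that the subscripts "ij,ijkm->km" also sum over i and j, which is why the result has shape (256, 256). If the goal is only to scale each inner matrix by its corresponding scalar and keep all four axes, plain broadcasting may be enough. A small sketch under that assumption, with smaller dummy shapes to keep it cheap:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 3))
y = rng.random((3, 3, 5, 5))

# Add two trailing length-1 axes so x broadcasts against y:
# (3, 3, 1, 1) * (3, 3, 5, 5) -> (3, 3, 5, 5)
scaled = x[:, :, None, None] * y
print(scaled.shape)  # (3, 3, 5, 5)

# The same elementwise product written with einsum (no axis is summed away).
scaled_einsum = np.einsum("ij,ijkm->ijkm", x, y)
print(np.allclose(scaled, scaled_einsum))  # True

# Summing the scaled blocks over i and j reproduces the "->km" result.
summed = np.einsum("ij,ijkm->km", x, y)
print(np.allclose(scaled.sum(axis=(0, 1)), summed))  # True
```

The original ValueError arises because broadcasting aligns shapes from the right, so (10, 10) is compared against the trailing (256, 256) axes unless the extra axes are added explicitly.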

Combining list of 2D numpy arrays

How do I combine N, 2D numpy arrays (of dimension R x C) to create a 3D numpy array of shape (N, R, C)? Right now, the N-2D numpy arrays are contained inside a list, and I want that to become a 3D numpy array. Let's say X is my list of 2D numpy arrays, if I just do np.array(X), I get something of shape (N,). If I do np.vstack(X), I get something of shape (N x R, C). How do I solve this problem?
You can use np.stack:
test = np.stack([np.ones([2, 3]) for _ in range(4)])
print(test.shape) # (4, 2, 3)
You could also just use:
np.array([np.array(x) for x in ArrayList])
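To compare the options mentioned here, the following sketch builds a hypothetical list X of N = 4 arrays of shape (2, 3) and prints the resulting shapes. When every element has the same shape, np.array(X) already gives (N, R, C); a result of shape (N,) usually suggests that the shapes differ, in which case np.stack raises an error instead of silently producing an object array.

```python
import numpy as np

# Hypothetical input: N = 4 arrays, each of shape (R, C) = (2, 3)
X = [np.full((2, 3), i) for i in range(4)]

stacked = np.stack(X)               # new leading axis
print(stacked.shape)                # (4, 2, 3)
print(np.array(X).shape)            # (4, 2, 3)
print(np.vstack(X).shape)           # (8, 3)
print(np.stack(X, axis=-1).shape)   # (2, 3, 4)

# Mismatched shapes cannot be stacked.
try:
    np.stack([np.ones((2, 3)), np.ones((2, 4))])
    stack_failed = False
except ValueError as exc:
    stack_failed = True
    print("stack failed:", exc)
```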

How to fuse two axes of an n-dimensional array in Python

Instead of an n-dimensional array, let's take a 3D array to illustrate my question:
>>> import numpy as np
>>> arr = np.ones(24).reshape(2, 3, 4)
So I have an array of shape (2, 3, 4). I would like to concatenate/fuse the 2nd and 3rd axes together to get an array of shape (2, 12).
I wrongly thought I could do it easily with np.concatenate:
>>> np.concatenate(arr, axis=1).shape
(3, 8)
I found a way to do it by a combination of np.rollaxis and np.concatenate but it is increasingly ugly as the array goes up in dimension:
>>> np.rollaxis(np.concatenate(np.rollaxis(arr, 0, 3), axis=0), 0, 2).shape
(2, 12)
Is there any simple way to accomplish this? It seems very trivial, so there must exist some function, but I cannot seem to find it.
EDIT: Indeed, I could use np.reshape, which means computing the dimensions of the axes first. Is it possible without accessing/computing the shape beforehand?
On recent Python versions you can do the following, where k is the first of the two adjacent axes to fuse:
anew = a.reshape(*a.shape[:k], -1, *a.shape[k+2:])
I recommend against directly assigning to .shape since it doesn't work on sufficiently noncontiguous arrays.
Let's say that you have n dimensions in your array and that you want to fuse adjacent axes i and i+1:
shape = a.shape
new_shape = list(shape[:i]) + [-1] + list(shape[i+2:])
a.shape = new_shape
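The two answers can be combined into a small helper that builds the new shape with a -1 placeholder and calls reshape rather than assigning to .shape. This is only a sketch; fuse_axes is a hypothetical name, not a NumPy function:

```python
import numpy as np

def fuse_axes(a, i):
    """Return a reshaped array with axes i and i+1 merged (hypothetical helper)."""
    if not 0 <= i < a.ndim - 1:
        raise ValueError("i must index an axis that has a successor")
    # -1 lets NumPy infer the length of the fused axis.
    new_shape = a.shape[:i] + (-1,) + a.shape[i + 2:]
    return a.reshape(new_shape)

arr = np.arange(24).reshape(2, 3, 4)
fused = fuse_axes(arr, 1)
print(fused.shape)               # (2, 12)
print(fuse_axes(arr, 0).shape)   # (6, 4)

# Row 0 of the fused array is the flattened arr[0].
print(np.array_equal(fused[0], arr[0].ravel()))  # True
```

Using reshape means NumPy can fall back to a copy when the array is not contiguous, whereas assigning to .shape fails in that case.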

python difference between array(10,1) array(10,)

I'm trying to load MNIST dataset into arrays.
When I use
(X_train, y_train), (X_test, y_test)= mnist.load_data()
I get an array y_test(10000,) but I want it to be in the shape of (10000,1).
What is the difference between array(10000,1) and array(10000,)?
How can I convert the first array to the second array?
Your first array, with shape (10000,), is a 1-dimensional np.ndarray.
Since the shape attribute of numpy arrays is a tuple, and a tuple of length 1 needs a trailing comma, the shape is (10000,) and not (10000) (which would be an int). So currently your data looks like this:
import numpy as np
a = np.arange(5) # >>> array([0, 1, 2, 3, 4])
print(a.shape) # >>> (5,)
What you want is a 2-dimensional array with a shape of (10000, 1).
Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array you can use either np.expand_dims() or np.reshape().
Using np.expand_dims:
import numpy as np
b = np.array(np.arange(5)) # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1) # >>> array([[0],[1],[2],[3],[4]])
The function was specifically made for the purpose of adding empty dimensions to arrays. The axis keyword specifies which position the newly added dimension will occupy.
Using np.reshape:
import numpy as np
a = np.arange(5)
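X_test_reshaped = np.reshape(a, (-1, 1)) # >>> array([[0],[1],[2],[3],[4]])
The (-1, 1) argument specifies what the new shape should look like after the reshape operation. The -1 itself will be replaced internally by numpy with whatever length fits the data. The shape is passed positionally here because the keyword name differs between NumPy versions (newshape in older releases, shape in newer ones).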
Reshape is a more powerful function than expand_dims and can be used in many different ways. You can read more on other uses of it in the numpy docs for numpy.reshape().
An array with a shape of (10,1) is a 2D array with 10 rows and a single column.
An array with a shape of (10,) is a 1D array.
To convert (10,1) to (10,), you can simply index the single column. For example, take an array x with x.shape = (10,1). Using x[:, 0] you select that column, and x[:, 0].shape = (10,).
To convert (10,) to (10,1), you can add a dimension by using np.newaxis (after import numpy as np). Take an array y, for example, with y.shape = (10,). Using y[:, np.newaxis], you get a new array with the shape (10,1).
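Both directions of the conversion can be checked with a short sketch; the array names here are illustrative only:

```python
import numpy as np

y = np.arange(10)                      # shape (10,)

# (10,) -> (10, 1)
col = y[:, np.newaxis]
print(col.shape)                       # (10, 1)
print(y.reshape(-1, 1).shape)          # (10, 1)
print(np.expand_dims(y, axis=1).shape) # (10, 1)

# (10, 1) -> (10,)
print(col[:, 0].shape)                 # (10,)
print(col.ravel().shape)               # (10,)
print(np.squeeze(col, axis=1).shape)   # (10,)
```

All of these keep the same ten values; only the number of axes changes.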

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now let's say I want to extend each array with [4,40], to generate the following array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError : could not broadcast input array from shape (4) into shape (3)
You can't do this. Numpy arrays allocate contiguous blocks of memory, if at all possible. Any change to the array size will force an inefficient copy of the whole array. You should use Python lists to grow your structure if possible, then convert the end result back to an array.
However, if you know the final size of the resulting array, you could instantiate it with something like np.empty() and then assign values by index, rather than appending. This does not change the size of the array itself, only reassigns values, so should not require copying.
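A minimal sketch of the preallocation idea, assuming the final shape (2, 4) is known up front. The names extra and out are hypothetical. Note that the existing values are still copied once into the new buffer; what is avoided is repeated reallocation on every append.

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])
extra = np.array([4, 40])  # one new value per row

# Allocate the final (2, 4) array once, then assign by index.
out = np.empty((x.shape[0], x.shape[1] + 1), dtype=x.dtype)
out[:, :-1] = x      # copy the existing columns
out[:, -1] = extra   # fill the new last column

print(out)
# [[ 1  2  3  4]
#  [10 20 30 40]]
```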
While @roganjosh is right that you cannot modify the numpy arrays without making a copy (in the underlying process), there is a simpler way of appending each value of an ndarray to the end of each row of a 2D ndarray: use numpy.column_stack.
>>> x = np.array([[1,2,3],[10,20,30]])
>>> x
array([[ 1,  2,  3],
       [10, 20, 30]])
>>> stack_y = np.array([4,40])
>>> stack_y
array([ 4, 40])
>>> np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
Create a new matrix
Insert the values of your old matrix
Then, insert your new values in the last positions
x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4))
new_X[:2,:3] = x
new_X[0][-1] = 4
new_X[1][-1] = 40
x=new_X
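Or use np.resize() instead. Note that np.reshape() cannot change the number of elements, and np.resize() fills a larger array by repeating the flattened data, so the row layout is not preserved and the values still have to be reassigned.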
