How to stack multiple 2D numpy arrays in a 3D numpy array - python

I am extracting features from audio clips. For one clip, I get a matrix of dimension 20x2. I have around 1000 such clips, and I want to store all the data in one NumPy array of dimension 20x2x1000. Please suggest a method for doing this.

The function you're looking for is np.stack. It's used to stack multiple NumPy arrays along a new axis.
import numpy as np
# Generate 1000 features
original_features = [np.random.rand(20, 2) for i in range(1000)]
# Stack them into one array
stacked_features = np.stack(original_features, axis=2)
assert stacked_features.shape == (20, 2, 1000)
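You may process the clips one at a time instead of having all 1000 matrices in a list up front. In that case, one option is to preallocate the 20x2x1000 array and write each clip's matrix into its own slice. This is a minimal sketch: `extract_features` is a hypothetical stand-in for the real feature extractor, and the clips are placeholders.

```python
import numpy as np

N_CLIPS = 1000

def extract_features(clip):
    # Hypothetical stand-in for the real audio feature extractor:
    # it just returns a 20x2 matrix of random values.
    return np.random.rand(20, 2)

clips = range(N_CLIPS)  # placeholder for the real collection of audio clips

# Allocate the full (20, 2, 1000) array once ...
features = np.empty((20, 2, N_CLIPS))
# ... then write each clip's 20x2 matrix into slice i along axis 2.
for i, clip in enumerate(clips):
    features[:, :, i] = extract_features(clip)

print(features.shape)  # (20, 2, 1000)
```

This approach needs the number of clips in advance. In exchange, it never keeps a separate list of small arrays alongside the final result.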

A convenient function for this is numpy.dstack. Below is a snippet of code for depth-stacking arrays:
import numpy as np
# works for any number of arrays; 10 are used here
tuple_of_arrs = tuple(np.random.randn(20, 2) for _ in range(10))
# stack each of the arrays along the third axis
depth_stacked = np.dstack(tuple_of_arrs)
print(depth_stacked.shape)  # (20, 2, 10)
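For 2-D inputs such as these 20x2 matrices, np.dstack and np.stack(..., axis=2) give the same result. They differ for 1-D inputs, because dstack first reshapes each (N,) array to (1, N, 1). A small sketch comparing the two:

```python
import numpy as np

arrs = [np.random.randn(20, 2) for _ in range(10)]

# 2-D inputs: both produce the same (20, 2, 10) array.
a = np.dstack(arrs)
b = np.stack(arrs, axis=2)
print(a.shape, np.array_equal(a, b))  # (20, 2, 10) True

# 1-D inputs: dstack treats each (3,) array as (1, 3, 1) before joining.
v = [np.arange(3), np.arange(3)]
print(np.dstack(v).shape)          # (1, 3, 2)
print(np.stack(v, axis=-1).shape)  # (3, 2)
```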

Related

Numpy array with elements of different last axis dimensions

Assume the following code:
import numpy as np
x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])
z = [x, y]
z = np.array(z, dtype=object)
This gives a ValueError: could not broadcast input array from shape (2,4,50) into shape (2,4)
I can understand why this error would occur since the trailing (last) dimension of both arrays is different and a numpy array cannot store arrays with varying dimensions.
However, I have a MAT-file that, when loaded in Python with scipy's io.loadmat() function, contains an np.ndarray with the following properties:
from scipy import io
mat = io.loadmat(file_name='gt.mat')
print(mat.shape)
> (1, 250)
print(mat[0].shape, mat[0].dtype)
> (250,) dtype('O')
print(mat[0][0].shape, mat[0][0].dtype)
> (2, 4, 54) dtype('<f8')
print(mat[0][1].shape, mat[0][1].dtype)
> (2, 4, 60), dtype('<f8')
This is pretty confusing to me. How can the array mat[0] in this file hold NumPy arrays with different trailing dimensions as objects while being an np.ndarray itself, when I am not able to do so myself?
When you call np.array on a nested list, it still tries to stack the arrays, even with dtype=object. Note that you are dealing with object arrays in both cases, so what you want is still possible. One way is to first create an empty array of objects and then fill in the values:
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y
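Here is a self-contained version of the empty-then-fill approach, using the x and y from the question. The result is a 1-D object array whose elements are ndarrays of different trailing lengths. This is the same kind of structure as mat[0] from the MAT-file.

```python
import numpy as np

x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])

# Create the object array first, so np.array never tries to stack x and y ...
z = np.empty(2, dtype=object)
# ... then assign each slot; each element simply holds a reference to an ndarray.
z[0] = x
z[1] = y

print(z.shape, z.dtype)        # (2,) object
print(z[0].shape, z[1].shape)  # (2, 4, 50) (2, 4, 60)
```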

I have a list of 2 dimensional arrays. How to convert the list into a 3 dimensional numpy array?

L is a list of 1000 arrays of size (300,300).
I need to convert L into a 3D NumPy array of size (300, 300, 1000).
Use np.stack. It takes a sequence of arrays (your list works) and stacks them along a new axis. Note that axis=-1 gives you the layout you asked for.
import numpy as np
# list of arrays made of random noise
x_list = [np.random.normal(0.0, 1.0, size=(300, 300)) for _ in range(1000)]
# array made of arrays in list
x_array = np.stack(x_list, axis=-1)
print("Shape of array: ", x_array.shape)
This gives
Shape of array:  (300, 300, 1000)
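To see what the axis argument changes, here is a sketch with 5 arrays instead of 1000, purely to keep it small. It compares axis=-1 with the default axis=0. In both layouts, the k-th input is recovered by indexing along the new axis.

```python
import numpy as np

x_list = [np.random.normal(0.0, 1.0, size=(300, 300)) for _ in range(5)]

last = np.stack(x_list, axis=-1)  # new axis at the end -> (300, 300, 5)
first = np.stack(x_list)          # default axis=0      -> (5, 300, 300)

k = 3
print(np.array_equal(last[:, :, k], x_list[k]))  # True
print(np.array_equal(first[k], x_list[k]))       # True
print(last.shape, first.shape)                   # (300, 300, 5) (5, 300, 300)
```

np.stack always copies into a new array. With the full 1000 arrays of 300x300 float64, the result alone takes about 720 MB, in addition to the list.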

Subtract Mean from Multidimensional Numpy-Array

I'm currently learning about broadcasting in NumPy. In the book I'm reading (Python for Data Analysis by Wes McKinney), the author gives the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
This effectively makes every column of demeaned have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
This of course fails. Under the broadcasting rules, trailing dimensions must either match or be 1, and that is not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now I have mostly gotten it to work by subtracting the mean from each index in the third dimension of the array separately:
means = arr.mean(2)  # assumed definition: 'means' is not defined in the original snippet
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[...,i] = arr[...,i] - means
print(demeaned.mean(0))
At this point the returned values are very close to zero, and I think the remaining deviation is a floating-point precision error. Am I right about this, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most idiomatic NumPy way to achieve what I want. Is there a function or a principle I can use to improve the code?
As of NumPy 1.7.0, np.mean and several other functions accept a tuple for their axis parameter. This means you can compute the mean of every plane of the image in one call:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
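The two broadcasting approaches above can be combined into one runnable sketch. Here `keepdims=True`, a standard parameter of `np.mean`, stands in for the `[..., None]` reshape. The sketch also speaks to the precision question: the leftover means are floating-point rounding, so comparing with `np.allclose` rather than `== 0` is the usual check.

```python
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))

# Per-plane (per-channel) demeaning: the mean has shape (3,), which
# broadcasts directly against the trailing axis of arr.
per_channel = arr - arr.mean(axis=(0, 1))

# Per-pixel demeaning: keepdims=True keeps the reduced axis with length 1,
# giving shape (400, 400, 1), same as arr.mean(axis=2)[..., None].
per_pixel = arr - arr.mean(axis=2, keepdims=True)

# Residual means are tiny rounding errors rather than exact zeros.
print(np.allclose(per_channel.mean(axis=(0, 1)), 0))  # True
print(np.allclose(per_pixel.mean(axis=2), 0))         # True
```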
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
This returns an array of the same shape in which every 1-D slice along the chosen axis (the second argument, here 2) has been demeaned.
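np.apply_along_axis calls the Python function once per 1-D slice. For a 400x400 image that is 160,000 calls, so it is usually much slower than broadcasting. The sketch below uses a smaller 40x40 image to stay quick, and checks that the two approaches agree.

```python
import numpy as np

arr = np.random.randint(0, 256, (40, 40, 3))  # smaller than 400x400 to keep it quick

# One Python-level lambda call per slice along axis 2 (1600 calls here) ...
via_apply = np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
# ... versus a single vectorised subtraction with broadcasting.
via_broadcast = arr - arr.mean(axis=2, keepdims=True)

print(np.allclose(via_apply, via_broadcast))  # True
```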

Advanced Indexing in 3 Dimensional Numpy ndarray In Python

I have an ndarray of shape (68, 64, 64) called 'prediction'. These dimensions correspond to image_number, height, width. For each image, I have a tuple of length two containing the coordinates of a particular location in that 64x64 image, for example (12, 45). I can stack these coordinates into another NumPy ndarray of shape (68, 2) called 'locations'.
How can I construct a slice object, or the necessary advanced-indexing indices, to access these locations without using a loop? I am looking for help with the syntax. The goal is to use pure NumPy arrays without loops.
Working loop structure
import numpy as np
# example code with just ones... The real arrays have 'real' data.
prediction = np.ones((68,64,64), dtype='float32')
locations = np.ones((68,2), dtype='uint32')
selected_location_values = np.empty(prediction.shape[0], dtype='float32')
for index, (image, coordinates) in enumerate(zip(prediction, locations)):
    # tuple() is needed: indexing with a length-2 array would select two rows instead
    selected_location_values[index] = image[tuple(coordinates)]
Desired approach
selected_location_values = np.empty(prediction.shape[0], dtype='float32')
correct_indexing = some_function_here(locations)  # ?????
selected_location_values = prediction[correct_indexing]
Straightforward fancy indexing should work:
img = np.arange(locations.shape[0])
r = locations[:, 0]
c = locations[:, 1]
selected_location_values = prediction[img, r, c]
Fancy indexing selects the elements of the indexed array at the positions given by the index arrays. The result has the broadcast shape of those indices, here (68,). In this case the indices are simple: you just need a range to tell NumPy which image each location belongs to.
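Here is a runnable check with random data in place of the real predictions. It compares the loop, written with `tuple(coords)` so each lookup returns a single value, against the fancy-indexing version. The random values and the seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
prediction = rng.random((68, 64, 64), dtype=np.float32)
locations = rng.integers(0, 64, size=(68, 2))  # one (row, col) pair per image

# Loop version, for reference.
looped = np.array([image[tuple(coords)] for image, coords in zip(prediction, locations)])

# Fancy indexing: img, r and c all have shape (68,), so element i of the
# result is prediction[i, r[i], c[i]].
img = np.arange(locations.shape[0])
r = locations[:, 0]
c = locations[:, 1]
selected = prediction[img, r, c]

print(selected.shape, np.array_equal(selected, looped))  # (68,) True
```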

Iteratively appending ndarray arrays using numpy in Python

I am trying to figure out how to iteratively append 2D arrays to build a single larger array. On each iteration a 16x200 ndarray is generated, as seen below.
Each iteration generates a new 16x200 array, and I would like to 'append' it to the previously generated array, for a total of N iterations. For example, with two iterations the first array would be 16x200, and the second 16x200 array would be appended to it to create a 16x400 array.
import cv2
import numpy as np
train = np.array([])
for i in [1, 2, 1, 2]:
    spike_count = [0, 0, 0, 0]
    img = cv2.imread("images/" + str(i) + ".png", 0)  # Read the associated image to be classified
    k = np.array(temporallyEncode(img, 200, 4))  # temporallyEncode is the asker's own function (not shown)
    # Somehow append k to train on each iteration
In the code above, the loop iterates 4 times, so the final train array is expected to be 16x800. Any help would be greatly appreciated; I have drawn a blank on how to accomplish this. The code below is a general case:
import numpy as np
totalArray = np.array([])
for i in range(1,3):
    arrayToAppend = np.zeros((4, 200))
    # Append arrayToAppend to totalArray somehow
While it is possible to perform a concatenate (or one of the 'stack' variants) at each iteration, it is generally faster to accumulate the arrays in a list, and perform the concatenate once. List append is simpler and faster.
alist = []
for i in range(0,3):
    arrayToAppend = np.zeros((4, 200))
    alist.append(arrayToAppend)
arr = np.concatenate(alist, axis=1) # to get (4,600)
# hstack does the same thing
# vstack is the same, but with axis=0 # (12,200)
# stack creates new dimension, # (3,4,200), (4,3,200) etc
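Applied to the 16x200 case from the question, the list-then-concatenate pattern gives the expected 16x800 result. The asker's temporallyEncode is not shown, so `fake_temporally_encode` below is a hypothetical stand-in. It only produces a 16x200 array filled with the image id, which makes the column blocks easy to check.

```python
import numpy as np

def fake_temporally_encode(image_id):
    # Hypothetical stand-in for temporallyEncode(img, 200, 4): returns a 16x200 array.
    return np.full((16, 200), image_id, dtype=float)

chunks = []
for i in [1, 2, 1, 2]:
    chunks.append(fake_temporally_encode(i))  # cheap list append per iteration

train = np.concatenate(chunks, axis=1)  # a single copy at the end
print(train.shape)  # (16, 800)
```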
Try using numpy hstack. According to the documentation, hstack takes a sequence of arrays and stacks them horizontally to make a single array.
For example:
import numpy as np
x = np.zeros((16, 200))
y = x.copy()
for i in range(5):
    y = np.hstack([y, x])
    print(y.shape)
Gives:
(16, 400)
(16, 600)
(16, 800)
(16, 1000)
(16, 1200)
