merging multiple numpy arrays - python

i have 3 numpy arrays which store image data of shape (4,100,100).
arr1= np.load(r'C:\Users\x\Desktop\py\output\a1.npy')
arr2= np.load(r'C:\Users\x\Desktop\py\output\a2.npy')
arr3= np.load(r'C:\Users\x\Desktop\py\output\a3.npy')
I want to merge all 3 arrays into 1 array.
I have tried in this way:
merg_arr = np.zeros((len(arr1)+len(arr2)+len(arr3), 4,100,100), dtype=input_img.dtype)
now this make an array of the required length but I don't know how to copy all the data in this array. may be using a loop?

This will do the trick:
merge_arr = np.concatenate([arr1, arr2, arr3], axis=0)
np.stack arranges arrays along a new dimension. Their dimensions (except for the first) need to match.
Demo:
arr1 = np.empty((60, 4, 10, 10))
arr2 = np.empty((14, 4, 10, 10))
arr3 = np.empty((6, 4, 10, 10))
merge_arr = np.concatenate([arr1, arr2, arr3], axis=0)
print(merge_arr.shape) # (80, 4, 10, 10)

Related

How to stack numpy array along an axis

I have two numpy arrays, one with shape let's say (10, 5, 200), and another one with the shape (1, 200), how can I stack them so I get as a result an array of dimensions (10, 6, 200)? Basically by stacking it to each 2-d array iterating along the first dimension
a = np.random.random((10, 5, 200))
b = np.zeros((1, 200))
I'v tried with hstack and vstack but I get an error in incorrect number of axis
Let's say:
a = np.random.random((10, 5, 200))
b = np.zeros((1, 200))
Let's look at the volume (number of elements) of each array:
The volume of a is 10*5*200 = 10000.
The volume of an array with (10,6,200) is 10*5*200=1200.
That is you want to create an array that has 2000 more elements.
However, the volume of b is 1*200 = 200.
This means a and b can't be stacked.
As hpaulj mentioned in the comments, one way is to define an numpy array and then fill it:
result = np.empty((a.shape[0], a.shape[1] + b.shape[0], a.shape[2]))
result[:, :a.shape[1], :] = a
result[:, a.shape[1]:, :] = b

selecting random elements from each column of numpy array

I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
def sample_array_cols(MyMatrix, nelements):
vmat = []
TempMat = MyMatrix.T
for v in TempMat:
v = np.ndarray.tolist(v)
subv = random.sample(v, nelements)
vmat = vmat + [subv]
return(np.array(vmat).T)
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array:
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
in [215]: new.shape
out[215]: (10, 500) # (k x m)
To sample each column without replacement just like your original solution
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9, 1, 2],
[ 3, 4, 11]])
I would
Pre-allocate the result array, and fill in columns, and
Use numpy index based indexing
def sample_array_cols(matrix, n_result):
(n,m) = matrix.shape
vmat = numpy.array([n_result, m], dtype= matrix.dtype)
for c in range(m):
random_indices = numpy.random.randint(0, n, n_result)
vmat[:,c] = matrix[random_indices, c]
return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.

How to join two 3D numpy arrays so that np.arr(1,m,n) + np.arr(1,m,n) = np.arr(2,m,n)

I have several 3-dimensional numpy arrays that I want to join together to feed them as a training set for my LSTM neural network. They are mostly of shape (1,m,n)
I want to join them so that, for e.g. np.arr(1,50,20) + np.arr(1,50,20) = np.arr(2,50,20) and np.arr(1,50,20) + np.arr(3,50,20) = np.arr(4,50,20)
Which of the stack functions of numpy would suit my problem? Or is there another way to solve it more efficiently?
Use numpy concatenate with the first axis.
import numpy as np
rng = np.random.default_rng()
a = rng.integers(0, 10, (1, 3, 20))
b = rng.integers(-10, -1, (2, 3, 20))
c = np.concatenate((a, b), axis=0)
print(c.shape)
(3, 3, 20)
Use np.vstack
x = np.array([[[2,3,5],[4,5,1]]])
y = np.array([[[1,5,8],[8,0,9]]])
x.shape
(1,2,3)
np.vstack((x,y)).shape
(2,2,3)

Python/Numpy broadcast join between two arrays

The question is on how to join two arrays in this case more efficiently- There's a numpy array one of shape (N, M, 1) and array two of shape (M,F). It's required to join the second array with the first, to create an array of the shape (N, M, F+1). The elements of the second array will be broadcast along N.
One solution is copying array 2 to have size of the first (along all dims but one) and then concatenate. But this if the copying can be done as a broadcast during the join/concat it would use much lesser memory.
Any suggestions on how to make this more efficient?
The setup:
import numpy as np
arr1 = np.random.randint(0,10,(5,10))
arr1 = np.expand_dims(arr1, axis=-1) #(5,10, 1)
arr2 = np.random.randint(0,4,(10,15))
arr2 = np.expand_dims(arr2, axis=0) #(1, 10, 15)
arr2_2 = arr2
for i in range(len(arr1)-1):
arr2_2 = np.concatenate([arr2_2, arr2],axis=0)
arr2_2.shape #(5, 10, 15)
np.concatenate([arr1, arr2_2],axis=-1) # (5, 10, 16) -> correct end result
Joining arr1 and arr2 to get
try this
>>> a = np.random.randint(0, 10, (5, 10))
>>> b = np.random.randint(0, 4, (10, 15))
>>> c = np.dstack((a[:, :, np.newaxis], np.broadcast_to(b, (a.shape[0], *b.shape))))
>>> a.shape, b.shape, c.shape
((5, 10), (10, 15), (5, 10, 16)))

Dynamically indexing/choosing the dimension of numpy array

Just working on a CNN and am stuck on a tensor algorithm.
I want to be able to iterate through a list, or tuple, of dimensions and choose a range of elements of X (a multi dimensional array) from that dimension, while leaving the other dimensions alone.
x = np.random.random((10,3,32,32)) #some multi dimensional array
dims = [2,3] #aka the 32s
#for a dimension in dims
#I want the array of numbers from i:i+window in that dimension
#something like
arr1 = x.index(i:i+3,axis = dim[0])
#returns shape 10,3,3,32
arr2 = arr1.index(i:i+3,axis = dim[1])
#returns shape 10,3,3,3
np.take should work for you (read its docs)
In [237]: x=np.ones((10,3,32,32),int)
In [238]: dims=[2,3]
In [239]: arr1=x.take(range(1,1+3), axis=dims[0])
In [240]: arr1.shape
Out[240]: (10, 3, 3, 32)
In [241]: arr2=x.take(range(1,1+3), axis=dims[1])
In [242]: arr2.shape
Out[242]: (10, 3, 32, 3)
You can try slicing with
arr1 = x[:,:,i:i+3,:]
and
arr2 = arr1[:,:,:,i:i+3]
Shape is then
>>> x[:,:,i:i+3,:].shape
(10, 3, 3, 32)

Categories