I have a numpy array of shape 28 x 1875. Each element is a 3-element list (only floats). I need to split each of these elements into individual ones, to obtain an array of shape 28 x 5625 (1875*3). I've tried np.split; however, it only separates each element, but not each sub-element. Is there a fast way to do this?
Making a 2d array of lists:
In [522]: arr = np.empty(6,object)
In [523]: arr[:] = [list(range(i,i+3)) for i in range(6)]
In [524]: arr = arr.reshape(2,3)
In [525]: arr
Out[525]:
array([[list([0, 1, 2]), list([1, 2, 3]), list([2, 3, 4])],
[list([3, 4, 5]), list([4, 5, 6]), list([5, 6, 7])]], dtype=object)
It's easier to fill such an array if it is 1d, which is why I start with (6,) and reshape after.
Paul Panzer's suggestion:
In [526]: np.array(arr.tolist())
Out[526]:
array([[[0, 1, 2],
[1, 2, 3],
[2, 3, 4]],
[[3, 4, 5],
[4, 5, 6],
[5, 6, 7]]])
In [527]: _.reshape(2,-1)
Out[527]:
array([[0, 1, 2, 1, 2, 3, 2, 3, 4],
[3, 4, 5, 4, 5, 6, 5, 6, 7]])
You can also use np.stack (a version of np.concatenate) to create an nd array. It does, though, require a 1d object array, hence the ravel:
In [536]: np.stack(arr.ravel())
Out[536]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
That can be reshaped as needed:
In [537]: np.stack(arr.ravel()).reshape(2,-1)
Out[537]:
array([[0, 1, 2, 1, 2, 3, 2, 3, 4],
[3, 4, 5, 4, 5, 6, 5, 6, 7]])
In some cases we need to transpose axes to get the desired order.
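For the shape in the question, the same idea might look like the sketch below. It assumes every cell holds a list of exactly 3 floats; the names `cells`, `dense`, `flat` and `grouped` are illustrative and not from the original posts. The last step shows the transpose case mentioned above.

```python
import numpy as np

rows, cols = 28, 1875

# Build an object array whose cells are 3-element lists, as in the question.
cells = np.empty(rows * cols, dtype=object)
for i in range(cells.size):
    cells[i] = [float(i), i + 0.5, i + 0.25]
cells = cells.reshape(rows, cols)

# tolist() yields nested lists, which np.array turns into a (28, 1875, 3) float array.
dense = np.array(cells.tolist())

# Keep each cell's values adjacent: x0, y0, z0, x1, y1, z1, ...
flat = dense.reshape(rows, -1)

# Transpose first to group by component instead: x0, x1, ..., y0, y1, ...
grouped = dense.transpose(0, 2, 1).reshape(rows, -1)

print(flat.shape, grouped.shape)
```

Which of the two layouts is wanted depends on how the flattened columns are consumed afterwards.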
import numpy as np
a = np.array([[5, 1, 8, 1, 6, 1, 3, 2],[2, 3, 4, 1, 6, 1, 4, 2]])
n = 2
[(a[0:2, i:i+n]).sum(axis=1) for i in range(0,a.shape[1],n)]
The output is:
[array([6, 5]), array([9, 5]), array([7, 7]), array([5, 6])]
How can I get a 2D array instead of the 4 arrays in the output above? Is there a better, more elegant way of doing this using reshape?
You can use sliding_window_view:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.array([[5, 1, 8, 1, 6, 1, 3, 2], [2, 3, 4, 1, 6, 1, 4, 2]])
n = 2
out = sliding_window_view(a, (a.shape[0], n)).sum(axis=-1)[:, ::n].squeeze()
Output:
array([[6, 5],
[9, 5],
[7, 7],
[5, 6]])
You can always stack the results to get a single array:
import numpy as np
a = np.array([[5, 1, 8, 1, 6, 1, 3, 2],[2, 3, 4, 1, 6, 1, 4, 2]])
n = 2
np.stack([(a[0:2, i:i+n]).sum(axis=1) for i in range(0,a.shape[1],n)])
Giving you:
array([[6, 5],
[9, 5],
[7, 7],
[5, 6]])
Alternatively, you can reshape the array and sum:
a.T.reshape(-1, n, 2).sum(axis=1)
Giving the same result. Note, however, that the number of columns must be divisible by n for the reshape to work: n=4 is fine, but n=3 raises an error.
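When n does not divide the number of columns, one option (not taken from the answers here) is `np.add.reduceat`, which sums between given start indices and lets the last group be shorter. A minimal sketch under that assumption; `block_sums` is a hypothetical helper name:

```python
import numpy as np

def block_sums(a, n):
    """Sum consecutive groups of n columns; the last group may be shorter."""
    starts = np.arange(0, a.shape[1], n)          # first column of each group
    return np.add.reduceat(a, starts, axis=1).T   # one output row per group

a = np.array([[5, 1, 8, 1, 6, 1, 3, 2], [2, 3, 4, 1, 6, 1, 4, 2]])
print(block_sums(a, 2))  # same grouping as the reshape approach
print(block_sums(a, 3))  # last group only covers the remaining 2 columns
```

Whether a shorter trailing group is acceptable depends on the use case; padding or truncating the array before reshaping are alternatives.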
Another solution:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.array([[5, 1, 8, 1, 6, 1, 3, 2],[2, 3, 4, 1, 6, 1, 4, 2]])
n = 2
print(sliding_window_view(a, (a.shape[0], n))[:, ::n].sum(axis=-1))
Prints:
[[[6 5]
[9 5]
[7 7]
[5 6]]]
Let's say I have data structured in a 2D array like this:
[[1, 3, 4, 6],
[1, 4, 8, 2],
[1, 3, 2, 9],
[2, 2, 4, 8],
[2, 4, 9, 1],
[2, 2, 9, 3]]
The first column denotes a third dimension, so I want to convert this to the following 3D array:
[[[3, 4, 6],
[4, 8, 2],
[3, 2, 9]],
[[2, 4, 8],
[4, 9, 1],
[2, 9, 3]]]
Is there a built-in numpy function to do this?
You can try code below:
import numpy as np
array = np.array([[1, 3, 4, 6],
[1, 4, 8, 2],
[1, 3, 2, 9],
[2, 2, 4, 8],
[2, 4, 9, 1],
[2, 2, 9, 3]])
array = np.delete(array, 0, 1)
array.reshape(2,3,-1)
Output
array([[[3, 4, 6],
[4, 8, 2],
[3, 2, 9]],
[[2, 4, 8],
[4, 9, 1],
[2, 9, 3]]])
However, this code only works when you know the array's shape in advance. If you are sure that the number of rows in the array is a multiple of 3, you can instead use the code below to get the desired format.
array.reshape(array.shape[0]//3, 3, -1)
Use numpy array slicing with reshape function.
import numpy as np
arr = [[1, 3, 4, 6],
[1, 4, 8, 2],
[1, 3, 2, 9],
[2, 2, 4, 8],
[2, 4, 9, 1],
[2, 2, 9, 3]]
# convert the list to numpy array
arr = np.array(arr)
# remove first column from numpy array
arr = arr[:,1:]
# reshape the remaining array to desired shape
arr = arr.reshape(len(arr)//3,3,-1)
print(arr)
Output:
[[[3 4 6]
[4 8 2]
[3 2 9]]
[[2 4 8]
[4 9 1]
[2 9 3]]]
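If the group size should not be hard-coded, it could be derived from the first column. The sketch below assumes every label occurs the same number of times; the names `labels`, `counts`, `order` and `result` are illustrative. A stable sort is used so that rows need not already be grouped by label.

```python
import numpy as np

arr = np.array([[1, 3, 4, 6],
                [1, 4, 8, 2],
                [1, 3, 2, 9],
                [2, 2, 4, 8],
                [2, 4, 9, 1],
                [2, 2, 9, 3]])

# Distinct labels in the first column and how often each occurs.
labels, counts = np.unique(arr[:, 0], return_counts=True)

# A regular 3D array is only possible if all groups have the same size.
if not np.all(counts == counts[0]):
    raise ValueError("groups have different sizes; cannot form a regular 3D array")

# Stable sort keeps the original row order within each label.
order = np.argsort(arr[:, 0], kind="stable")

# Drop the label column, then split into (groups, rows per group, values).
result = arr[order, 1:].reshape(len(labels), counts[0], -1)
print(result)
```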
The data you show is a plain list, not a numpy array. I am unsure whether you are suggesting numpy only as a means to get a non-numpy result, or whether you actually want a numpy array as the result. If you don't actually need numpy, you could do something like this:
arr = [[1, 3, 4, 6],
[1, 4, 8, 2],
[1, 3, 2, 9],
[2, 2, 4, 8],
[2, 4, 9, 1],
[2, 2, 9, 3]]
# Length of the 3rd and 2nd dimension.
nz = arr[-1][0] + (arr[0][0]==0)
ny = int(len(arr)/nz)
res = [[arr[ny*z_idx+y_idx][1:] for y_idx in range(ny)] for z_idx in range(nz)]
OUTPUT:
[[[3, 4, 6], [4, 8, 2], [3, 2, 9]], [[2, 4, 8], [4, 9, 1], [2, 9, 3]]]
Note that the calculation of nz takes into account that the 3rd dimension index in your array is either 0-based (as python is per default) or 1-based (as you show in your example).
Let's say I have this numpy matrix:
>>> mat = np.matrix([[3,4,5,2,1], [1,2,7,6,5], [8,9,4,5,2]])
>>> mat
matrix([[3, 4, 5, 2, 1],
[1, 2, 7, 6, 5],
[8, 9, 4, 5, 2]])
Now let's say I have some indexes in this form:
>>> ind = np.matrix([[0,2,3], [0,4,2], [3,1,2]])
>>> ind
matrix([[0, 2, 3],
[0, 4, 2],
[3, 1, 2]])
What I would like to do is to get three values from each row of the matrix, specifically values at columns 0, 2, and 3 for the first row, values at columns 0, 4 and 2 for the second row, etc. This is the expected output:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
I've tried using np.take but it doesn't seem to work. Any suggestion?
This is take_along_axis.
>>> np.take_along_axis(mat, ind, axis=1)
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
This will do it: mat[np.arange(3).reshape(-1, 1), ind]
In [245]: mat[np.arange(3).reshape(-1, 1), ind]
Out[245]:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
(but take_along_axis in @user3483203's answer is simpler).
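The same two approaches also work on plain ndarrays, which NumPy's documentation recommends over np.matrix. A short sketch comparing them; the names `picked`, `rows` and `same` are illustrative:

```python
import numpy as np

mat = np.array([[3, 4, 5, 2, 1], [1, 2, 7, 6, 5], [8, 9, 4, 5, 2]])
ind = np.array([[0, 2, 3], [0, 4, 2], [3, 1, 2]])

# For each row i, pick the columns listed in ind[i].
picked = np.take_along_axis(mat, ind, axis=1)

# Equivalent fancy indexing: a column vector of row numbers broadcasts against ind.
rows = np.arange(mat.shape[0])[:, None]
same = mat[rows, ind]

print(picked)
print(np.array_equal(picked, same))
```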
I have a numpy array of shape (1429,1) where each row itself is a numpy array of shape (3,100) where l may vary from row to row.
How can I reshape this array by flattening each row such that the resulting numpy array will have the shape (1429, 300)?
I guess your initial array's shape is (1429, 3, 100). If that's true, you can change its shape as below:
import numpy as np
a = a.reshape(1429, 300)  # a is the initial numpy array
The dtype of your enclosing structure is probably object. It's just a collection of references to 1429 numpy.ndarrays.
As an example:
a = np.empty((1429, 1), object)
for x in a:
    x[0] = np.random.rand(3, 100)
In [19]: a.shape,a.dtype
Out[19]: ((1429, 1), dtype('O'))
In [20]: a[0,0].shape
Out[20]: (3, 100)
The structure is probably not contiguous. To obtain a single block containing all your data, you must rebuild it with the right layout:
b=np.array([x.ravel() for x in a.ravel()])
In [21]: b.shape
Out[21]: (1429, 300)
ravel discards the unwanted dimensions.
Assuming it is an object dtype array with shape (1429,1), and all elements are 2d of shape (3,100), a good way to 'flatten' is to use concatenate or stack.
np.stack(arr.ravel()).reshape(-1,300)
I use arr.ravel() so the array looks like a (1429) element list to stack. stack then concatenates the elements, creating a (1429, 3, 100) array. The reshape then converts that to (1429, 300).
In [939]: arr = np.empty((5,1),object)
In [940]: arr[:,0] = [np.arange(6).reshape(2,3) for _ in range(5)]
In [941]: arr
Out[941]:
array([[array([[0, 1, 2],
[3, 4, 5]])],
[array([[0, 1, 2],
[3, 4, 5]])],
[array([[0, 1, 2],
[3, 4, 5]])],
[array([[0, 1, 2],
[3, 4, 5]])],
[array([[0, 1, 2],
[3, 4, 5]])]], dtype=object)
In [942]: np.stack(arr.ravel())
Out[942]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
In [943]: np.stack(arr.ravel()).reshape(-1,6)
Out[943]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
np.stack with the default axis=0 is the same as np.array(...).
Or with concatenate
In [950]: np.concatenate(arr.ravel(),axis=0)
Out[950]:
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5]])
In [951]: np.concatenate(arr.ravel(),axis=0).reshape(5,6)
Out[951]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
I have an array a of shape (N,k) and another array b of shape (N,). I want to check if the ith value in b is contained in a[i]. If it is not present, I want to replace the last element of that row, a[i,k-1], with b[i]. An example:
a = np.array([[1, 2, 2, 3, 4, 5],
[1, 2, 3, 3, 4, 5],
[1, 2, 3, 4, 4, 5],
[1, 2, 3, 4, 5, 5],
[1, 2, 3, 4, 5, 6]])
b = np.array([1,7,3,8,9])
The output array should look like this:
np.array([[1, 2, 2, 3, 4, 5],
[1, 2, 3, 3, 4, 7],
[1, 2, 3, 4, 4, 5],
[1, 2, 3, 4, 5, 8],
[1, 2, 3, 4, 5, 9]])
Writing loops over N seems to be very inefficient. In my dataset typically N is of the order of 10 million while k is about 50 to 100. Is there an efficient way to vectorize this using numpy functions?
The indices where to replace can be found doing:
s = a - b[:, None]
TOL = 1.e-6
ind = np.where(~(np.abs(s) <= TOL).any(axis=1))[0]
and thanks to NumPy's fancy indexing you can update your array in place without for loops, overwriting only the last element of each selected row:
a[ind, -1] = b[ind]
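For integer data such as the example, exact equality can stand in for the tolerance test. The following is a sketch under that assumption, changing only the last column so that it matches the expected output in the question; the name `present` is illustrative:

```python
import numpy as np

a = np.array([[1, 2, 2, 3, 4, 5],
              [1, 2, 3, 3, 4, 5],
              [1, 2, 3, 4, 4, 5],
              [1, 2, 3, 4, 5, 5],
              [1, 2, 3, 4, 5, 6]])
b = np.array([1, 7, 3, 8, 9])

# True for rows where b[i] appears somewhere in a[i].
present = (a == b[:, None]).any(axis=1)

# Overwrite only the last column of rows where b[i] is missing.
a[~present, -1] = b[~present]
print(a)
```

The comparison creates a temporary boolean array with N*k elements, so for N around 10 million it may be worth processing the rows in chunks.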