There are a bunch of questions regarding reshaping matrices with NumPy here on Stack Overflow. I have found one that is closely related to what I am trying to achieve; however, its answer is not general enough for my application. So here we are.
I have a matrix with millions of rows (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape m/2 x 2n, as shown below. For that, one takes k consecutive rows every k rows (in this example the block size is k = 2, which is not the same as the column count n). The blocks of rows taken this way are then horizontally stacked onto the untouched rows. In this example that would mean:
The first two rows stay where they are.
Take rows two and three and horizontally concatenate them to rows zero and one.
Take rows six and seven and horizontally concatenate them to rows four and five. This concatenated block then becomes rows two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I do that most efficiently (in terms of the least computation time possible) using NumPy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?
Assuming your array's length is divisible by 4, here is one way to do it using numpy.hstack, after creating the correct indices for selecting the rows of the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
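The list comprehensions that build left_idx and right_idx run a Python loop over N//4 groups, which adds up for millions of rows. The same indices can be built with broadcasting over np.arange. This is a minimal sketch, assuming the row count is a multiple of 2*k, where k is the block size (2 in the question); the function name pair_row_blocks is hypothetical:

```python
import numpy as np

def pair_row_blocks(a, k=2):
    """Place each block of k rows next to the block of k rows that follows it."""
    m = a.shape[0]
    if m % (2 * k):
        raise ValueError("number of rows must be a multiple of 2*k")
    # First row of every group of 2*k rows: 0, 2k, 4k, ...
    starts = np.arange(0, m, 2 * k)
    # Rows that stay on the left: the first k rows of each group
    left_idx = (starts[:, None] + np.arange(k)).ravel()
    # Rows moved to the right: the next k rows of each group
    right_idx = left_idx + k
    return np.hstack([a[left_idx], a[right_idx]])

a = np.repeat(np.arange(8)[:, None], 4, axis=1)  # rows 0..7, 4 columns each
r = pair_row_blocks(a)
print(r[:, [0, 4]])  # first column of the left and right halves
```

Both fancy indexing and np.hstack allocate new arrays, so this still copies the data twice; the reshape/transpose approach avoids building index arrays altogether.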
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged:
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
Swap the two middle dimensions, regrouping the rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then reshape back to the target shape. This final step creates a copy of the original (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
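The three steps above can be wrapped in a function for an arbitrary block size k (2 in the question), assuming the number of rows is a multiple of 2*k. This is a sketch under that assumption; the name pair_row_blocks_reshape is hypothetical:

```python
import numpy as np

def pair_row_blocks_reshape(x, k=2):
    """Place each block of k rows next to the following block of k rows."""
    m, n = x.shape
    if m % (2 * k):
        raise ValueError("number of rows must be a multiple of 2*k")
    # Axes: (group of 2*k rows, which block of the pair, row within block, column)
    x4 = x.reshape(m // (2 * k), 2, k, n)
    # Swap the two middle axes (a view), then merge; the final reshape copies
    return x4.transpose(0, 2, 1, 3).reshape(m // 2, 2 * n)

x = np.repeat(np.arange(8)[:, None], 4, axis=1)
out = pair_row_blocks_reshape(x)           # same result as Out[19]

x12 = np.repeat(np.arange(12)[:, None], 4, axis=1)
out3 = pair_row_blocks_reshape(x12, k=3)   # blocks of three rows
```

With k=2 the intermediate shape is (m//4, 2, 2, n), which is exactly the x.reshape(2,2,2,4) used above for m=8.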
It's hard to generalize this, since there are different ways of rearranging blocks. For example, my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
Related
Assume I have the following array:
a = array([[[4, 8, 7, 3, 1, 2],
[3, 1, 8, 7, 1, 9],
[0, 0, 3, 0, 7, 6],
[1, 1, 5, 0, 5, 1],
[1, 6, 7, 0, 6, 2]],
[[8, 1, 1, 0, 0, 0],
[2, 8, 1, 6, 4, 9],
[1, 8, 7, 2, 2, 2],
[6, 6, 2, 6, 0, 5],
[3, 2, 2, 0, 6, 8]],
[[4, 6, 3, 2, 1, 4],
[0, 4, 3, 5, 9, 4],
[1, 4, 6, 7, 2, 4],
[6, 3, 5, 7, 7, 8],
[1, 0, 3, 9, 2, 5]],
[[7, 7, 3, 9, 7, 0],
[8, 5, 1, 4, 3, 9],
[9, 7, 9, 5, 4, 9],
[2, 0, 6, 0, 8, 5],
[4, 4, 4, 7, 5, 2]],
[[4, 0, 8, 2, 1, 0],
[2, 4, 0, 7, 3, 7],
[4, 6, 8, 7, 9, 6],
[3, 2, 7, 5, 2, 3],
[7, 6, 3, 0, 1, 5]]])
Is there an easy way to reduce the columns by summing adjacent pairs, to get the following array:
b = array([[[12, 10, 3],
[4, 15, 10],
[0, 3, 13],
[2, 5, 6],
[7, 7, 8]],
...])
The first row is obtained by [4+8, 7+3, 1+2]. I know we can use np.sum to merge columns, but I am lost on how to select the right columns to add together. Help is greatly appreciated!
You can simply do this:
a.reshape((5,5,3,2)).sum(axis=-1)
reshape((5,5,3,2)) 'splits' the last dimension into groups of 2, and sum(axis=-1) then sums over that newly created last dimension.
In the general case (if the dimensions of a change), you can also use
a.reshape(a.shape[:-1]+(-1,2)).sum(axis=-1)
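The same reshape works for groups of any size g, not just pairs, as long as the last dimension is divisible by g. A minimal sketch; the name sum_column_groups and the sample block (the first two rows of a) are only for illustration:

```python
import numpy as np

def sum_column_groups(a, g=2):
    """Sum each run of g adjacent entries along the last axis."""
    if a.shape[-1] % g:
        raise ValueError("last axis length must be divisible by g")
    # Split the last axis into (-1, g), then sum over the new trailing axis
    return a.reshape(a.shape[:-1] + (-1, g)).sum(axis=-1)

block = np.array([[4, 8, 7, 3, 1, 2],
                  [3, 1, 8, 7, 1, 9]])
pairs = sum_column_groups(block)          # groups of 2 columns
triples = sum_column_groups(block, g=3)   # groups of 3 columns
print(pairs)
print(triples)
```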
We can do this with simple numpy slicing.
b = a[:,:,::2] + a[:,:,1::2]
This selects the whole array and, along the last dimension, takes the even-indexed columns and the odd-indexed columns, then adds them together elementwise.
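The slicing idea extends to groups of g columns by adding g strided slices: slice i picks columns i, i+g, i+2g, and so on. A sketch, assuming the last dimension is divisible by g; the function name and the random test array are hypothetical:

```python
import numpy as np

def sum_column_groups_sliced(a, g=2):
    """Sum each run of g adjacent columns by adding g strided slices."""
    if a.shape[-1] % g:
        raise ValueError("last axis length must be divisible by g")
    # Slice i holds the i-th column of every group; adding them sums each group
    return sum(a[..., i::g] for i in range(g))

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(5, 5, 6))
pairs = sum_column_groups_sliced(a)         # same as a[:, :, ::2] + a[:, :, 1::2]
triples = sum_column_groups_sliced(a, g=3)
```

Unlike the reshape version, this works on arrays whose last axis is not contiguous, but it makes g-1 temporary additions instead of one reduction.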
Let's say I have the following array
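import numpy as np
# The rows have different structures, so this is a ragged array; recent NumPy
# versions raise a ValueError for ragged input unless dtype=object is given
matrix = np.array([
    [[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
    [[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
    [[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
    [[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
    [[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]]], dtype=object)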
As you can see, it's a ragged (jagged) array. I want to sum the elements so that the output is:
[11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
I want to sum the elements in the "columns" of the matrix, but I don't know how to do it.
As mentioned by juanpa.arrivillaga in the comments, you don't have a multi-dimensional array; you have a 1-D array of lists of lists. You need to flatten the inner lists first:
>>> np.array([[z for y in x for z in y] for x in matrix])
array([[1, 2, 3, 4, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
[2, 2, 3, 4, 0, 1, 1, 0, 0, 0],
[6, 3, 3, 4, 0, 4, 2, 3, 4, 5],
[1, 2, 3, 2, 0, 1, 2, 3, 4, 5]])
It should be much easier to solve your problem now. This matrix has a shape of (5,10) and supports .T for transposition and np.sum() for summing rows or columns.
You didn't write any code, so I won't solve the problem completely, but with this matrix, you're one step away from:
array([11, 11, 15, 18, 0, 8, 9, 9, 12, 15])
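For completeness, the remaining step is a column sum over the flattened matrix. A sketch, assuming every row flattens to the same length (10 here); a plain nested list is used for the input, and the same comprehension works on the dtype=object array:

```python
import numpy as np

matrix = [
    [[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
    [[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
    [[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
    [[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
    [[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]],
]

# Flatten each row's inner lists into one row of a regular 2-D array
flat = np.array([[z for y in row for z in y] for row in matrix])  # shape (5, 10)
# Summing down the rows gives one total per "column"
column_sums = flat.sum(axis=0)
print(column_sums)
```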
I have a static shape-(l,l) array C. I want to extract portions of it into some other array K, which has shape (m,m,n,n). The starting index of what I want to extract from C is given in array i0, which has shape (m,m).
Each block of K is given by K[i,j,:,:] = C[i0[i,j]:i0[i,j]+n, i0[i,j]:i0[i,j]+n]. Going off some other similar questions, it seemed like this might do the job:
C[i0[None, None, ...] + np.arange(n)[..., None, None],
i0[None, None, ...] + np.arange(n)[..., None, None], I, J]
which raises an IndexError. I guess this is because C is only 2D, and its dimensions can't be increased. That could easily be fixed by tiling C, but since C is large, rebuilding it m*m times would be rather expensive.
So my question is how to extract different (2D) portions of a 2D array into corresponding portions of a 4D array.
One way is to use np.meshgrid to create 2D index grids covering an (n,n) window, and add them to i0 extended with two new trailing axes, along which broadcasting takes place. Indexing into C with the resulting index arrays gives the desired 4D output. One implementation looks like this:
N = np.arange(n)
X,Y = np.meshgrid(N,N)
out = C[i0[...,None,None] + Y,i0[...,None,None] + X]
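To check the broadcasting version against the loop definition from the question, both can be computed side by side. This is a sketch with illustrative sizes and random data; i0 is drawn so that every window fits inside C:

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n = 10, 4, 3
C = rng.integers(0, 10, size=(l, l))
i0 = rng.integers(0, l - n + 1, size=(m, m))  # keeps i0 + n <= l

N = np.arange(n)
X, Y = np.meshgrid(N, N)  # X varies along columns, Y along rows
out = C[i0[..., None, None] + Y, i0[..., None, None] + X]  # shape (m, m, n, n)

# Reference: fill K one window at a time, exactly as the question defines it
K = np.empty((m, m, n, n), dtype=C.dtype)
for i in range(m):
    for j in range(m):
        s = i0[i, j]
        K[i, j] = C[s:s + n, s:s + n]
print(np.array_equal(out, K))
```

Y and X are just N[:, None] and N[None, :] broadcast against each other, so C[i0[..., None, None] + N[:, None], i0[..., None, None] + N] gives the same result without materializing the meshgrid.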
Sample run -
In [153]: C
Out[153]:
array([[3, 5, 1, 6, 3, 5, 8, 7, 0, 2],
[8, 4, 6, 8, 7, 2, 6, 2, 5, 0],
[3, 7, 7, 7, 3, 4, 4, 6, 7, 6],
[7, 0, 8, 2, 1, 1, 0, 4, 4, 6],
[2, 4, 6, 0, 0, 5, 6, 8, 0, 0],
[4, 6, 1, 0, 5, 6, 2, 1, 7, 4],
[0, 5, 5, 3, 7, 5, 7, 1, 4, 0],
[6, 4, 4, 7, 2, 4, 6, 6, 6, 5],
[5, 2, 3, 2, 2, 5, 4, 5, 2, 5],
[3, 7, 1, 0, 4, 4, 6, 6, 2, 2]])
In [154]: i0
Out[154]:
array([[1, 0, 4, 4],
[0, 4, 4, 0],
[2, 3, 1, 3],
[2, 2, 0, 4]])
In [155]: n = 3
In [157]: out[0,0,:,:]
Out[157]:
array([[4, 6, 8],
[7, 7, 7],
[0, 8, 2]])
In [158]: C[i0[0,0]:i0[0,0]+n,i0[0,0]:i0[0,0]+n]
Out[158]:
array([[4, 6, 8],
[7, 7, 7],
[0, 8, 2]])
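An alternative, assuming NumPy 1.20 or later, is numpy.lib.stride_tricks.sliding_window_view: it exposes every (n,n) window of C as a strided view without copying, and indexing that view with i0 on both window axes picks out the requested blocks. A sketch with illustrative random data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
l, m, n = 10, 4, 3
C = rng.integers(0, 10, size=(l, l))
i0 = rng.integers(0, l - n + 1, size=(m, m))  # every window must fit inside C

# windows[r, c] is C[r:r+n, c:c+n]; shape (l-n+1, l-n+1, n, n), no data copied
windows = sliding_window_view(C, (n, n))
# The question's windows start at (i0, i0), so use i0 for both leading axes;
# fancy indexing copies only the selected blocks into an (m, m, n, n) array
K = windows[i0, i0]
```

The design advantage is that no index grids are built; the cost is that the index arrays must stay within the first l-n+1 positions, otherwise indexing raises an IndexError.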
For example, x = np.random.randint(low=0, high=10, size=(6,6)) gives me a 6x6 NumPy array:
array([[3, 1, 0, 1, 5, 4],
[2, 9, 9, 4, 8, 8],
[2, 3, 4, 3, 2, 9],
[5, 8, 4, 5, 7, 6],
[3, 0, 8, 1, 8, 0],
[6, 7, 1, 9, 0, 5]])
How can I get a list of, say, all 2x3 submatrices? What about non-overlapping ones?
I could code this myself, but I'm sure this is a common enough operation that it already exists in NumPy; I just can't find it.
Listed in this post is a generic approach to get a list of submatrices with a given shape. Depending on whether the submatrices should be ordered row-major (C-style) or column-major (Fortran-style), there are two choices. Here's the implementation with np.reshape, np.transpose and np.array_split:
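import numpy as np

def split_submatrix(x, submat_shape, order='C'):
    p, q = submat_shape  # Store submatrix shape
    m, n = x.shape
    if np.any(np.mod(x.shape, np.array(submat_shape)) != 0):
        raise ValueError('Input array shape is not divisible by submatrix shape!')
    num_blocks = x.size // (p * q)  # integer division keeps the shapes ints
    if order == 'C':
        # (row blocks, p, col blocks, q) -> (row blocks, col blocks, p, q)
        x4D = x.reshape(-1, p, n // q, q).transpose(0, 2, 1, 3).reshape(-1, p, q)
        return np.array_split(x4D, num_blocks, axis=0)
    elif order == 'F':
        # Stack the q-wide column strips on top of each other, then cut every p rows
        x2D = x.reshape(-1, n // q, q).transpose(1, 0, 2).reshape(-1, q)
        return np.array_split(x2D, num_blocks, axis=0)
    else:
        raise ValueError("Invalid output order.")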
Sample run with a modified sample input -
In [201]: x
Out[201]:
array([[5, 2, 5, 6, 5, 6, 1, 5],
[1, 1, 8, 4, 4, 5, 2, 5],
[4, 1, 6, 5, 6, 4, 6, 1],
[5, 3, 7, 0, 5, 8, 6, 5],
[7, 7, 0, 6, 5, 2, 5, 4],
[3, 4, 2, 5, 0, 7, 5, 0]])
In [202]: split_submatrix(x,(3,4))
Out[202]:
[array([[[5, 2, 5, 6],
[1, 1, 8, 4],
[4, 1, 6, 5]]]), array([[[5, 6, 1, 5],
[4, 5, 2, 5],
[6, 4, 6, 1]]]), array([[[5, 3, 7, 0],
[7, 7, 0, 6],
[3, 4, 2, 5]]]), array([[[5, 8, 6, 5],
[5, 2, 5, 4],
[0, 7, 5, 0]]])]
In [203]: split_submatrix(x,(3,4),order='F')
Out[203]:
[array([[5, 2, 5, 6],
[1, 1, 8, 4],
[4, 1, 6, 5]]), array([[5, 3, 7, 0],
[7, 7, 0, 6],
[3, 4, 2, 5]]), array([[5, 6, 1, 5],
[4, 5, 2, 5],
[6, 4, 6, 1]]), array([[5, 8, 6, 5],
[5, 2, 5, 4],
[0, 7, 5, 0]])]
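The question also asks for all (overlapping) 2x3 submatrices. Assuming NumPy 1.20 or later, sliding_window_view gives every window as a view, and stepping over it by the tile shape yields the non-overlapping ones. A sketch using np.arange instead of random data so the result is easy to check:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(36).reshape(6, 6)

# Every 2x3 submatrix: subs[i, j] is x[i:i+2, j:j+3]; a read-only view
subs = sliding_window_view(x, (2, 3))      # shape (5, 4, 2, 3)
all_blocks = subs.reshape(-1, 2, 3)        # 20 submatrices, row-major order (copies)

# Non-overlapping 2x3 tiles: step the window start by the tile shape
tiles = subs[::2, ::3].reshape(-1, 2, 3)   # 6 tiles, row-major order
```

The view itself costs no memory; only the reshape into a flat list of blocks copies the data.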
I'm a little new to Python. I have a function named featureExtraction which returns a 1-D array for an image. I need to stack all such 1-D arrays row-wise to form a 2-D array. I have the following equivalent code in MATLAB:
I1=imresize(I,[256 256]);
Features(k,:) = featureextraction(I1);
featureextraction returns a 1-D row vector, which is stacked row-wise to form a 2-D array. What is the equivalent code snippet in Python?
Thank you in advance.
Not sure what you're looking for, but maybe vstack or column_stack?
>>> np.vstack((a,a,a))
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> np.column_stack((a,a,a))
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])
Or even just np.array:
>>> np.array([a,a,a])
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
You can use numpy.vstack():
a = np.array([1,2,3])
np.vstack((a,a,a))
#array([[1, 2, 3],
# [1, 2, 3],
# [1, 2, 3]])
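Applied to the MATLAB loop from the question, a common pattern is to collect the rows in a list and call np.vstack once at the end, since calling np.vstack inside the loop copies the whole array on every iteration. Preallocating and filling row k mirrors Features(k,:) = ... more literally. A sketch in which feature_extraction and the images are hypothetical stand-ins and the imresize step is left out:

```python
import numpy as np

def feature_extraction(image):
    # Hypothetical stand-in for the asker's featureExtraction:
    # returns one 1-D feature vector per image (here, the column means)
    return image.mean(axis=0)

# Hypothetical 256x256 images; resizing is not part of this sketch
images = [np.full((256, 256), k, dtype=float) for k in range(3)]

# Option 1: build a list of 1-D rows and stack them once
features = np.vstack([feature_extraction(img) for img in images])

# Option 2: preallocate and fill row k, like Features(k,:) = ... in MATLAB
features2 = np.empty((len(images), 256))
for k, img in enumerate(images):
    features2[k, :] = feature_extraction(img)
```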