efficiently reshaping 3D numpy array - python

Assuming I have a 2D numpy array and want to reshape it with strides into 3D, what would be the best way to do that?
A little example:
import numpy as np
def find_ngrams(input_list, n):
    return np.array(list(zip(*[input_list[i:] for i in range(n)])))
x = np.array(range(15))
x = x.reshape((5,3))
print(x)
print(x.shape)
res = find_ngrams(x, 3)
print(res.shape)
print(res)
This returns the expected result correctly:
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
(5, 3)
(3, 3, 3)
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[ 6 7 8]
[ 9 10 11]
[12 13 14]]]
However, how can I do this more efficiently, preferably using stride_tricks?

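Here's how I would do it with as_strided:
from numpy.lib.stride_tricks import as_strided
window_length = 3
strides = x.strides
new_len = x.shape[0] - window_length + 1  # number of windows
# shape is (windows, rows per window, columns): stepping to the next window
# and to the next row within a window both advance one row of x
out = as_strided(x, shape=(new_len, window_length, x.shape[1]),
                 strides=(strides[0], strides[0], strides[1]))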
Output:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]]])
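On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of strided view without computing shapes and strides by hand, and it cannot read past the end of the buffer. A minimal sketch, assuming that NumPy version: the window axis is appended last, so it is swapped back into position 1 to match the layout above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(15).reshape(5, 3)
window_length = 3

# Windows over axis 0; result has shape (n_windows, n_cols, window_length)
windows = sliding_window_view(x, window_length, axis=0)

# Move the window axis next to the window index: (n_windows, window_length, n_cols)
res = windows.swapaxes(1, 2)
print(res.shape)  # (3, 3, 3)

# res is a read-only view into x; copy it before writing to it
res_copy = res.copy()
```

Because it is a view, no data is duplicated until `.copy()` is called.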

Related

Matlab reshape equivalent in Python

I'm currently porting a MATLAB library over to Python. As of right now, I am trying to keep the code as one-to-one as possible. I'm noticing some differences between reshape in MATLAB and Python that are causing some issues.
I've heard people talk about the difference between 'C' and 'Fortran' order: NumPy defaults to 'C' order, while MATLAB uses 'Fortran' order. Below are two Python examples using both orders.
>>> a = np.arange(12).reshape((2,3,2))
>>> a
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
>>> b = np.arange(12).reshape((2,3,2), order='F')
>>> b
array([[[ 0, 6],
[ 2, 8],
[ 4, 10]],
[[ 1, 7],
[ 3, 9],
[ 5, 11]]])
Below is the matlab/octave equivalent to the above python code.
octave:12> a = reshape((0:11), [3,2,2])
a =
ans(:,:,1) =
0 3
1 4
2 5
ans(:,:,2) =
6 9
7 10
8 11
Notice that each example yields a different result.
These examples are meant to illustrate the discrepancy that I'm referring to. The datasets that I'm working on in my project are significantly larger. I need to be able to reshape arrays in Python and be confident that it is performing the same reshape operations as it would in Matlab. Any help would be appreciated.
Why are you using a (2,3,2) shape in one, and (3,2,2) in the other?
In [82]: arr = np.arange(12).reshape((3,2,2), order='F')
In [83]: arr
Out[83]:
array([[[ 0, 6],
[ 3, 9]],
[[ 1, 7],
[ 4, 10]],
[[ 2, 8],
[ 5, 11]]])
In [84]: arr[:,:,0]
Out[84]:
array([[0, 3],
[1, 4],
[2, 5]])
In [85]: arr[:,:,1]
Out[85]:
array([[ 6, 9],
[ 7, 10],
[ 8, 11]])
===
Looking at strides may help identify the differences between C and F orders:
In [86]: arr.shape
Out[86]: (3, 2, 2)
In [87]: arr.strides
Out[87]: (8, 24, 48)
Notice how the smallest step, 1 element (8 bytes), is taken in the first dimension.
Contrast that with a C order:
In [89]: np.arange(12).reshape(2,2,3)
Out[89]:
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
In [90]: np.arange(12).reshape(2,2,3).strides
Out[90]: (48, 24, 8)
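The stride pattern follows directly from the order: in 'F' order each axis steps over the item size times the product of all earlier axis lengths, in 'C' order over the product of all later ones. A small sketch of that rule; the helper name `contiguous_strides` is hypothetical (not a NumPy function), and the arrays use an explicit 8-byte dtype so the numbers match the session above on any platform.

```python
import numpy as np

def contiguous_strides(shape, itemsize, order="C"):
    """Byte strides of a contiguous array with the given shape and order."""
    strides = [0] * len(shape)
    step = itemsize
    # C order: last axis varies fastest; F order: first axis varies fastest
    axes = reversed(range(len(shape))) if order == "C" else range(len(shape))
    for ax in axes:
        strides[ax] = step
        step *= shape[ax]
    return tuple(strides)

f_arr = np.zeros((3, 2, 2), dtype=np.int64, order="F")
c_arr = np.zeros((2, 2, 3), dtype=np.int64, order="C")
print(contiguous_strides((3, 2, 2), 8, "F"), f_arr.strides)  # (8, 24, 48) (8, 24, 48)
print(contiguous_strides((2, 2, 3), 8, "C"), c_arr.strides)  # (48, 24, 8) (48, 24, 8)
```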
===
OK, let's try the (2,3,2) shape:
>> a = reshape((0:11),[2,3,2])
a =
ans(:,:,1) =
0 2 4
1 3 5
ans(:,:,2) =
6 8 10
7 9 11
Same thing with order 'F':
In [94]: arr = np.arange(12).reshape((2,3,2), order='F')
In [95]: arr
Out[95]:
array([[[ 0, 6],
[ 2, 8],
[ 4, 10]],
[[ 1, 7],
[ 3, 9],
[ 5, 11]]])
In [96]: arr[:,:,0]
Out[96]:
array([[0, 2, 4],
[1, 3, 5]])
>> squeeze(a(1,:,:))
ans =
0 6
2 8
4 10
In [98]: arr[0,:,:]
Out[98]:
array([[ 0, 6],
[ 2, 8],
[ 4, 10]])
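Putting this together: with the *same* shape and order='F', NumPy's reshape fills elements in the same order as MATLAB's. A minimal sketch of drop-in helpers, `matlab_reshape` and `matlab_colon` (hypothetical names, not part of NumPy), checked against the Octave outputs above. Keep in mind that MATLAB indexes from 1, so MATLAB's a(:,:,1) is a[:, :, 0] here.

```python
import numpy as np

def matlab_reshape(a, shape):
    """Reshape like MATLAB's reshape(a, shape): column-major (Fortran) order."""
    # MATLAB reads and writes elements column-major, so use order='F' for both
    return np.reshape(a, shape, order="F")

def matlab_colon(a):
    """Equivalent of MATLAB's a(:): flatten in column-major order."""
    return np.ravel(a, order="F")

a = matlab_reshape(np.arange(12), (3, 2, 2))
print(a[:, :, 0])  # MATLAB's a(:,:,1)
print(a[:, :, 1])  # MATLAB's a(:,:,2)
```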

Create meshgrids from 2D slices off 3D array

I have a 3D numpy array.
import numpy as np
X = np.arange(12).reshape(2, 2, 3)
print(X)
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
I would like to vectorize the following for every 2D array in the 3D array. For example, for the first 2D array:
ss = np.array(np.meshgrid(*X[0]), dtype=object).T.reshape(-1,2)
print(ss)
[[0 3]
[0 4]
[0 5]
[1 3]
[1 4]
[1 5]
[2 3]
[2 4]
[2 5]]
I tried the following:
def f(x):
    return np.array(np.meshgrid(*x), dtype=object).T.reshape(-1,2)
ff = np.apply_along_axis(f, 0, X)
print(ff)
Here's a generic solution that uses one loop and scales to generic shapes. It assigns into an initialized array and relies on broadcasting to replicate values, which keeps it memory-efficient. It works for any length along the second axis of X. The implementation would be:
def meshgrid_2D_blocks(X):
    m,n,r = X.shape
    out_shp = [m]+[r]*n+[n]
    out = np.empty(out_shp,dtype=X.dtype)
    # Assign each block iteratively
    shp = [-1]+[1]*n
    for i in range(n):
        shp[i+1] = r
        out[...,i] = X[:,i].reshape(shp)
        shp[i+1] = 1
    return out.reshape(m,-1,n)
Sample runs
Case #1 : Second axis of length=2
In [167]: X = np.arange(12).reshape(2, 2, 3)
In [168]: X
Out[168]:
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
In [169]: meshgrid_2D_blocks(X)
Out[169]:
array([[[ 0, 3],
[ 0, 4],
[ 0, 5],
[ 1, 3],
[ 1, 4],
[ 1, 5],
[ 2, 3],
[ 2, 4],
[ 2, 5]],
[[ 6, 9],
[ 6, 10],
[ 6, 11],
[ 7, 9],
[ 7, 10],
[ 7, 11],
[ 8, 9],
[ 8, 10],
[ 8, 11]]])
Case #2 : Second axis of length=3
In [170]: X = np.arange(12).reshape(2, 3, 2)
In [171]: X
Out[171]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [172]: meshgrid_2D_blocks(X)
Out[172]:
array([[[ 0, 2, 4],
[ 0, 2, 5],
[ 0, 3, 4],
[ 0, 3, 5],
[ 1, 2, 4],
[ 1, 2, 5],
[ 1, 3, 4],
[ 1, 3, 5]],
[[ 6, 8, 10],
[ 6, 8, 11],
[ 6, 9, 10],
[ 6, 9, 11],
[ 7, 8, 10],
[ 7, 8, 11],
[ 7, 9, 10],
[ 7, 9, 11]]])
This is one way to achieve that:
import numpy as np
X = np.arange(12).reshape(2, 2, 3)
out = np.stack(np.broadcast_arrays(X[:, 0, :, None], X[:, 1, None, :]), -1).reshape(len(X), -1, 2)
print(out)
# [[[ 0 3]
# [ 0 4]
# [ 0 5]
# [ 1 3]
# [ 1 4]
# [ 1 5]
# [ 2 3]
# [ 2 4]
# [ 2 5]]
#
# [[ 6 9]
# [ 6 10]
# [ 6 11]
# [ 7 9]
# [ 7 10]
# [ 7 11]
# [ 8 9]
# [ 8 10]
# [ 8 11]]]
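The broadcasting answer above is specific to a second axis of length 2. For any length n, a different technique from both answers is advanced indexing with a precomputed index grid: enumerate all combinations of column indices with `itertools.product` and gather them in one step. A sketch; `meshgrid_blocks_indexed` is a hypothetical name, and the index grid has r**n rows, so it grows quickly with n.

```python
import itertools
import numpy as np

def meshgrid_blocks_indexed(X):
    m, n, r = X.shape
    # All n-tuples of column indices, last position varying fastest: shape (r**n, n)
    idx = np.array(list(itertools.product(range(r), repeat=n)))
    # out[k, p, i] = X[k, i, idx[p, i]]; np.arange(n) broadcasts against idx
    return X[:, np.arange(n), idx]

X = np.arange(12).reshape(2, 2, 3)
print(meshgrid_blocks_indexed(X).shape)  # (2, 9, 2)
```

The row order matches the outputs shown above because `itertools.product` varies its last position fastest.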

Reshaping Sequence Numpy Pandas python

I have tried this:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_csv("test.csv")
>>> df
input1 input2 input3 input4 input5 input6 input7 input8 output
0 1 2 3 4 5 6 7 8 1
1 2 3 4 5 6 7 8 9 0
2 3 4 5 6 7 8 9 10 -1
3 4 5 6 7 8 9 10 11 -1
4 5 6 7 8 9 10 11 12 1
5 6 7 8 9 10 11 12 13 0
6 7 8 9 10 11 12 13 14 1
>>> seq_len=3
>>> data = []
>>> data_raw = df.values
>>> for index in range(len(data_raw) - seq_len + 1):
... data.append(data_raw[index: index + seq_len])
...
>>> data
[array([[ 1, 2, 3, 4, 5, 6, 7, 8, 1],
[ 2, 3, 4, 5, 6, 7, 8, 9, 0],
[ 3, 4, 5, 6, 7, 8, 9, 10, -1]], dtype=int64), array([[ 2, 3, 4, 5, 6, 7, 8, 9, 0],
[ 3, 4, 5, 6, 7, 8, 9, 10, -1],
[ 4, 5, 6, 7, 8, 9, 10, 11, -1]], dtype=int64), array([[ 3, 4, 5, 6, 7, 8, 9, 10, -1],
[ 4, 5, 6, 7, 8, 9, 10, 11, -1],
[ 5, 6, 7, 8, 9, 10, 11, 12, 1]], dtype=int64), array([[ 4, 5, 6, 7, 8, 9, 10, 11, -1],
[ 5, 6, 7, 8, 9, 10, 11, 12, 1],
[ 6, 7, 8, 9, 10, 11, 12, 13, 0]], dtype=int64), array([[ 5, 6, 7, 8, 9, 10, 11, 12, 1],
[ 6, 7, 8, 9, 10, 11, 12, 13, 0],
[ 7, 8, 9, 10, 11, 12, 13, 14, 1]], dtype=int64)]
>>> data = np.asarray(data)
>>> data.shape
(5, 3, 9)
>>> data_reshape = data.reshape(5,9,3)
>>> data_reshape
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 1],
[ 2, 3, 4],
[ 5, 6, 7],
[ 8, 9, 0],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, -1]],
[[ 2, 3, 4],
[ 5, 6, 7],
[ 8, 9, 0],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, -1],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, -1]],
[[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, -1],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, -1],
[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 1]],
[[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, -1],
[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 1],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 0]],
[[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 1],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 0],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 1]]], dtype=int64)
I would like to have the series as:
array([[[1,2,3],
[2,3,4],
[3,4,5],
[4,5,6],
[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10],
[1,0,-1]],
[[2,3,4],
[3,4,5],
[4,5,6],
[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10],
[9,10,11],
[0,-1,-1]],
[[3,4,5],
[4,5,6],
[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10],
[9,10,11],
[10,11,12],
[-1,-1,1]],
[[4,5,6],
[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10],
[9,10,11],
[10,11,12],
[11,12,13],
[-1,1,0]],
[[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10],
[9,10,11],
[10,11,12],
[11,12,13],
[12,13,14],
[1,0,1]]], dtype=int64)
Kindly help me achieve this.
I tried the data you supplied in the question, and I understood what you wanted. The key change is transposing each window (.T) before appending it:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_csv("test.csv")
>>> df
input1 input2 input3 input4 input5 input6 input7 input8 output
0 1 2 3 4 5 6 7 8 1
1 2 3 4 5 6 7 8 9 0
2 3 4 5 6 7 8 9 10 -1
3 4 5 6 7 8 9 10 11 -1
4 5 6 7 8 9 10 11 12 1
5 6 7 8 9 10 11 12 13 0
6 7 8 9 10 11 12 13 14 1
>>> seq_len=3
>>> data = []
>>> data_raw = df.values
>>> for index in range(len(data_raw) - seq_len + 1):
... data.append(data_raw[index: index + seq_len].T)
...
>>> data
[array([[ 1, 2, 3],
[ 2, 3, 4],
[ 3, 4, 5],
[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10],
[ 1, 0, -1]], dtype=int64), array([[ 2, 3, 4],
[ 3, 4, 5],
[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10],
[ 9, 10, 11],
[ 0, -1, -1]], dtype=int64), array([[ 3, 4, 5],
[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10],
[ 9, 10, 11],
[10, 11, 12],
[-1, -1, 1]], dtype=int64), array([[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10],
[ 9, 10, 11],
[10, 11, 12],
[11, 12, 13],
[-1, 1, 0]], dtype=int64), array([[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10],
[ 9, 10, 11],
[10, 11, 12],
[11, 12, 13],
[12, 13, 14],
[ 1, 0, 1]], dtype=int64)]
>>> data = np.asarray(data)
>>> data.shape
(5, 9, 3)
Hope this is what you wanted to achieve. :)
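The transposed windows can also be produced without a Python loop. A sketch assuming NumPy 1.20+ for `sliding_window_view`; since test.csv isn't available here, `data_raw` is rebuilt by hand from the DataFrame printed above. Sliding along axis 0 appends the window axis last, which is exactly the (window, feature, time step) layout wanted, so no transpose is needed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Rebuild df.values from the printed DataFrame: input1..input8, then output
inputs = np.arange(1, 9) + np.arange(7)[:, None]   # rows 1..8, 2..9, ..., 7..14
output = np.array([1, 0, -1, -1, 1, 0, 1])
data_raw = np.column_stack([inputs, output])       # shape (7, 9)

seq_len = 3
# windows[i, k, j] == data_raw[i + j, k]  ->  shape (5, 9, 3)
windows = sliding_window_view(data_raw, seq_len, axis=0)
print(windows.shape)  # (5, 9, 3)

# Read-only view; copy if you need a writable, independent array
data = windows.copy()
```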

How to insert (change values) matrix in matrix? Python (numpy)

For example, I have a matrix:
[ [1 2 3 4 5],
[6 7 8 9 10],
[11 12 13 14 15],
[16 17 18 19 20],
[21 22 23 24 25] ]
I want to insert [ [-1 -1 -1], [0 5 0] ] in some position, like:
[ [1 2 3 4 5],
[6 7 8 9 10],
[11 -1 -1 -1 15],
[16 0 5 0 20],
[21 22 23 24 25] ]
Use numpy insert!
Here is an example from the numpy reference at scipy:
>>> a = np.array([[1, 1], [2, 2], [3, 3]])
>>> a
array([[1, 1],
[2, 2],
[3, 3]])
>>> np.insert(a, 1, 5)
array([1, 5, 1, 2, 2, 3, 3])
>>> np.insert(a, 1, 5, axis=1)
array([[1, 5, 1],
[2, 5, 2],
[3, 5, 3]])
Read more here: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.insert.html
Based on the example, I would say you are trying to replace or modify part of the existing array rather than insert an array (np.insert makes the array larger). You could use basic slicing to get a view of the part of the array you want to overwrite, and assign a new matrix of the same size as the slice to it.
For example:
>>> x=np.matrix([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> x
matrix([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
>>> x[1:3,1:4]=np.matrix([[-1,-2,-3],[-4,-5,-6]])
>>> x
matrix([[ 1, 2, 3, 4],
[ 5, -1, -2, -3],
[ 9, -4, -5, -6],
[13, 14, 15, 16]])
In general, to describe a submatrix of m rows and n columns with its upper left corner at row r and column c of the original matrix, index the slice as x[r:r+m,c:c+n].
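Applied to the 5x5 example from the question, that rule becomes a small helper. `place_block` is a hypothetical name; this sketch works on a copy of a plain ndarray (rather than np.matrix) and raises if the block would not fit.

```python
import numpy as np

def place_block(a, block, r, c):
    """Return a copy of a with block written at upper-left corner (r, c)."""
    block = np.asarray(block)
    m, n = block.shape
    if r < 0 or c < 0 or r + m > a.shape[0] or c + n > a.shape[1]:
        raise ValueError("block does not fit at the given position")
    out = a.copy()
    out[r:r + m, c:c + n] = block  # same rule: x[r:r+m, c:c+n]
    return out

a = np.arange(1, 26).reshape(5, 5)
result = place_block(a, [[-1, -1, -1], [0, 5, 0]], 2, 1)
print(result)
```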
This function takes the matrix m and overwrites the elements of row r with the 1D array n, starting at column c:
def replace(m, n, r, c):
    # Leave m unchanged if n would run past the end of row r
    if len(n) + c > len(m[r]):
        return
    for i, value in enumerate(n):
        m[r][c + i] = value
You have to check the index boundaries for the matrix; for a 2D block, call it once per row of the block.

Reshape an array in NumPy

Consider an array of the following form (just an example):
[[ 0 1]
[ 2 3]
[ 4 5]
[ 6 7]
[ 8 9]
[10 11]
[12 13]
[14 15]
[16 17]]
Its shape is [9,2]. Now I want to transform the array so that each column becomes a [3,3] array, like this:
[[ 0 6 12]
[ 2 8 14]
[ 4 10 16]]
[[ 1 7 13]
[ 3 9 15]
[ 5 11 17]]
The most obvious (and surely "non-pythonic") solution is to initialise an array of zeroes with the proper dimensions and fill it with data in two for-loops. I'm interested in a more idiomatic solution.
import numpy as np
a = np.arange(18).reshape(9,2)
b = a.reshape(3,3,2).swapaxes(0,2)
# a:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17]])
# b:
array([[[ 0, 6, 12],
[ 2, 8, 14],
[ 4, 10, 16]],
[[ 1, 7, 13],
[ 3, 9, 15],
[ 5, 11, 17]]])
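To see that the one-liner does the same job as the two nested loops the question describes, here is a sketch comparing them. The loop version (`loop_version`, a hypothetical name) is written from the question's description; `swapaxes` returns a view, so the one-liner also avoids copying.

```python
import numpy as np

a = np.arange(18).reshape(9, 2)

def loop_version(a, size=3):
    # Fill each column of a into a (size, size) block
    out = np.zeros((a.shape[1], size, size), dtype=a.dtype)
    for col in range(a.shape[1]):
        for k in range(a.shape[0]):
            # consecutive elements go down a column of the block
            out[col, k % size, k // size] = a[k, col]
    return out

b = a.reshape(3, 3, 2).swapaxes(0, 2)
print(np.array_equal(b, loop_version(a)))  # True
print(np.shares_memory(a, b))              # True: b is a view of a
```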
NumPy has a great tool for this task: numpy.reshape (see the reshape documentation).
a = np.arange(18).reshape(9, 2)  # the (9, 2) array from the question
`np.reshape(a, (3, 3, 2))`  # (3, 3) alone would fail: a has 18 elements
You can also use the "-1" trick:
`a.reshape(-1, 3)`
The -1 is a wildcard: NumPy infers that dimension from the array's total size and the other dimensions, so with a second dimension of 3 the first becomes 6.
So yes, this would also work (giving shape (3, 6)):
a.reshape(3, -1)
and this:
a.reshape(-1, 2)
would do nothing (the shape stays (9, 2)),
and this:
a.reshape(-1, 9)
would change the shape to (2,9)
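A quick sketch checking those shapes, starting each time from the original (9, 2) array. Only one dimension may be -1, and the known dimensions must divide the total size (18 here), otherwise reshape raises a ValueError.

```python
import numpy as np

a = np.arange(18).reshape(9, 2)

for shape in [(-1, 3), (3, -1), (-1, 2), (-1, 9)]:
    print(shape, "->", a.reshape(shape).shape)
# (-1, 3) -> (6, 3)
# (3, -1) -> (3, 6)
# (-1, 2) -> (9, 2)
# (-1, 9) -> (2, 9)

try:
    a.reshape(-1, 4)  # 18 is not divisible by 4
except ValueError as err:
    print("ValueError:", err)
```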
There are two possible rearrangements of the result (following the example by eumiro). The einops package provides a powerful notation to describe such operations unambiguously:
>> import numpy as np
>> import einops
>> a = np.arange(18).reshape(9,2)
# this version corresponds to eumiro's answer
>> einops.rearrange(a, '(x y) z -> z y x', x=3)
array([[[ 0, 6, 12],
[ 2, 8, 14],
[ 4, 10, 16]],
[[ 1, 7, 13],
[ 3, 9, 15],
[ 5, 11, 17]]])
# this has the same shape, but the order of elements is different (note that each 3x3 block is transposed)
>> einops.rearrange(a, '(x y) z -> z x y', x=3)
array([[[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]],
[[ 1, 3, 5],
[ 7, 9, 11],
[13, 15, 17]]])
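If einops isn't available, both rearrangements can be written with plain NumPy. A sketch, derived from the patterns rather than from einops itself: '(x y) z -> z y x' is a reshape followed by reversing the axes, and '(x y) z -> z x y' is a transpose followed by a reshape.

```python
import numpy as np

a = np.arange(18).reshape(9, 2)

# '(x y) z -> z y x' with x=3: split rows into (x, y), then reverse the axis order
zyx = a.reshape(3, 3, 2).transpose(2, 1, 0)

# '(x y) z -> z x y' with x=3: move z first, then split the rows into (x, y)
zxy = a.T.reshape(2, 3, 3)

print(zyx)
print(zxy)
```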
