How to get many rolling window slices in numpy? - python

I have the following numpy array:
[[[1], [2], [3], [1], [2], [3]],
[[4], [5], [6], [4], [5], [6]],
[[7], [8], [9], [7], [8], [9]]]
And I want each of the elements in the last dimension, [1], [2], [3] etc., to be concatenated with the following n arrays in the second dimension. In case of overflow, elements can be filled with 0. For example, for n = 2:
[[[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 0]],
[[4, 5, 6], [5, 6, 4], [6, 4, 5], [4, 5, 6], [5, 6, 0], [6, 0, 0]],
[[7, 8, 9], [8, 9, 7], [9, 7, 8], [7, 8, 9], [8, 9, 0], [9, 0, 0]]]
I want to do this with the built in numpy functions for good performance and also want to do it in reverse i.e., a shift of n = -2 is fair game. How to do this?
For n = -2:
[[[0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 2, 3]],
[[0, 0, 4], [0, 4, 5], [4, 5, 6], [5, 6, 4], [6, 4, 5], [4, 5, 6]],
[[0, 0, 7], [0, 7, 8], [7, 8, 9], [8, 9, 7], [9, 7, 8], [7, 8, 9]]]
For n = 3:
[[[1, 2, 3, 1], [2, 3, 1, 2], [3, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 0], [3, 0, 0, 0]],
[[4, 5, 6, 4], [5, 6, 4, 5], [6, 4, 5, 6], [4, 5, 6, 0], [5, 6, 0, 0], [6, 0, 0, 0]],
[[7, 8, 9, 7], [8, 9, 7, 8], [9, 7, 8, 9], [7, 8, 9, 0], [8, 9, 0, 0], [9, 0, 0, 0]]]
If the current shape of the array is (height, width, 1), after the operation, the shape will be (height, width, abs(n) + 1).
How to generalize this so that the numbers 1, 2, 3 etc. can themselves be numpy arrays?

This sounds like a textbook application for the monster that is as_strided. One of the nice things about it is that it does not require any additional imports. The general idea is this:
You have an array with shape (3, 6, 1) and strides (6, 1, 1) * element_size.
x = ...
n = ... # Must not be zero, but you can special-case it to return the original array
You want to transform this into an array that has shape (3, 6, |n| + 1) and therefore strides (6 * (|n| + 1), |n| + 1, 1) * element_size.
To do this, you first pad the left or the right with |n| zeros:
pad = np.zeros((x.shape[0], np.abs(n), x.shape[2]), dtype=x.dtype)  # match dtype so integers are not upcast to float
x_pad = np.concatenate([x, pad][::np.sign(n)], axis=1)
Now you can index directly into the buffer with a custom shape and strides to get the result you want. Instead of using the proper strides (6 * (|n| + 1), |n| + 1, 1) * element_size, we will index each repeated element directly into the same buffer of the original array, meaning that the strides will be adjusted. The middle dimension will move by one element, rather than the proper |n| + 1. That way, the columns can start exactly where you want them to:
new_shape = (x.shape[0], x.shape[1], x.shape[2] + np.abs(n))
new_strides = (x_pad.strides[0], x_pad.strides[2], x_pad.strides[2])
result = np.lib.stride_tricks.as_strided(x_pad, shape=new_shape, strides=new_strides)
There are many caveats here. The biggest thing to be aware of is that multiple array elements access the same memory. My advice is to make a proper fleshed-out copy if you plan to do anything besides just reading the data:
result = result.copy()
This will give you a buffer of the correct size rather than a crazy view into the original data with padding.
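Putting the steps above together, a minimal sketch might look like the following. The function name rolling_concat is made up for illustration, and the code assumes the last dimension has length 1, as in the question. It returns a copy, so no two output elements share memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_concat(x, n):
    """x has shape (height, width, 1); result has shape (height, width, abs(n) + 1)."""
    if n == 0:
        return x.copy()  # special case: nothing to concatenate
    k = abs(n)
    pad = np.zeros((x.shape[0], k, x.shape[2]), dtype=x.dtype)
    # Pad on the right for positive n, on the left for negative n.
    parts = [x, pad] if n > 0 else [pad, x]
    x_pad = np.concatenate(parts, axis=1)  # fresh C-contiguous buffer
    shape = (x.shape[0], x.shape[1], k + 1)
    # Both the middle and the last axis step by one element of the padded buffer.
    strides = (x_pad.strides[0], x_pad.strides[1], x_pad.strides[1])
    return as_strided(x_pad, shape=shape, strides=strides).copy()

x = np.array([[[1], [2], [3], [1], [2], [3]],
              [[4], [5], [6], [4], [5], [6]],
              [[7], [8], [9], [7], [8], [9]]])
print(rolling_concat(x, 2)[0])
print(rolling_concat(x, -2)[0])
```

The largest offset read in any row is (width - 1) + |n|, which is the last element of the padded row, so the view never reaches outside the buffer.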

Here is a way to do it:
import numpy as np
from skimage.util import view_as_windows
if n>=0:
    a = np.pad(a.reshape(*a.shape[:-1]),((0,0),(0,n)))
else:
    n *= -1
    a = np.pad(a.reshape(*a.shape[:-1]),((0,0),(n,0)))
b = view_as_windows(a,(1,n+1))
b = b.reshape(*b.shape[:-2]+(n+1,))
a is your input array and b is your output:
n=2:
[[[1 2 3]
[2 3 1]
[3 1 2]
[1 2 3]
[2 3 0]
[3 0 0]]
[[4 5 6]
[5 6 4]
[6 4 5]
[4 5 6]
[5 6 0]
[6 0 0]]
[[7 8 9]
[8 9 7]
[9 7 8]
[7 8 9]
[8 9 0]
[9 0 0]]]
n=-2:
[[[0 0 1]
[0 1 2]
[1 2 3]
[2 3 1]
[3 1 2]
[1 2 3]]
[[0 0 4]
[0 4 5]
[4 5 6]
[5 6 4]
[6 4 5]
[4 5 6]]
[[0 0 7]
[0 7 8]
[7 8 9]
[8 9 7]
[9 7 8]
[7 8 9]]]
Explanation:
np.pad(a.reshape(*a.shape[:-1]),((0,0),(0,n))) pads enough zeros to the right side of array for overflow of windows (similarly padding left side for negative n)
view_as_windows(a,(1,n+1)) creates windows of shape (1,n+1) from the array as desired by the question.
b.reshape(*b.shape[:-2]+(n+1,)) gets rid of the extra dimension of length 1 created by the (1,n+1)-shaped windows and reshapes b to the desired shape. Note that the argument *b.shape[:-2]+(n+1,) is simply the concatenation of two tuples, unpacked as the new shape.

You can also do this (requires numpy version 1.20.x+):
import numpy as np
arr = np.array([[[1], [2], [3], [1], [2], [3]],
[[4], [5], [6], [4], [5], [6]],
[[7], [8], [9], [7], [8], [9]]])
n = 2 # your n value
nslices = n+1
# reshape into 2d array and add n zeros on the right as padding (assumes n >= 0)
arr = np.pad(arr.reshape((3, -1)), ((0, 0), (0, n)))
# perform the slicing
slices = np.lib.stride_tricks.sliding_window_view(arr, (1, nslices))
slices = slices[:, :, 0]
print(slices)
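For completeness, here is a sketch that also covers negative n by choosing which side to pad. The helper name rolling_windows is hypothetical, and the code again assumes an input of shape (height, width, 1). Note that sliding_window_view returns a read-only view, so copy the result if you need to modify it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_windows(arr, n):
    """arr has shape (height, width, 1); result has shape (height, width, abs(n) + 1)."""
    k = abs(n)
    flat = arr[..., 0]  # drop the trailing length-1 axis -> (height, width)
    # Zeros go on the right for n >= 0 and on the left for n < 0.
    pad_width = ((0, 0), (0, k)) if n >= 0 else ((0, 0), (k, 0))
    padded = np.pad(flat, pad_width)  # default mode pads with zeros
    # One window of length k + 1 per original column, taken along axis 1.
    return sliding_window_view(padded, k + 1, axis=1)

arr = np.array([[[1], [2], [3], [1], [2], [3]],
                [[4], [5], [6], [4], [5], [6]],
                [[7], [8], [9], [7], [8], [9]]])
print(rolling_windows(arr, 3)[0])
print(rolling_windows(arr, -2)[0])
```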

Related

Changing shape of a numpy array in a way that keeps the indices/positions of elements the same

Suppose I have the following numpy array
[[[1 2 3]
[4 5 6]
[7 8 9]]
[[1 2 3]
[4 5 6]
[7 8 9]]
[[1 2 3]
[4 5 6]
[7 8 9]]]
I want to be able to resize this array (making it smaller or larger along an axis) but have existing elements have the same indices as they did before the resize. So, if I decreased the size of axis 2 by one element, it would look like this:
[[[1 2]
[4 5]
[7 8]]
[[1 2]
[4 5]
[7 8]]
[[1 2]
[4 5]
[7 8]]]
And if I increased the size of axis 1, it would look like this:
[[[1 2 3]
[4 5 6]
[7 8 9]
[0 0 0]]
[[1 2 3]
[4 5 6]
[7 8 9]
[0 0 0]]
[[1 2 3]
[4 5 6]
[7 8 9]
[0 0 0]]]
How would I do this, short of implementing all the loops and everything myself?
For reference, if I use the Numpy resize() function, and do np.resize(my_array, (3, 3, 2)) to decrease the size of axis 2 from 3 to 2, Numpy simply changes the sizes of the dimensions and doesn't reorganize the array data itself, meaning that indices of elements are not preserved:
[[[1 2]
[3 4]
[5 6]]
[[7 8]
[9 1]
[2 3]]
[[4 5]
[6 7]
[8 9]]]
You can try defining a class that inherits from the np.ndarray class, defining an extra bound function that does the "resizing":
import numpy as np
class Array(np.ndarray):
    def __new__(cls, a, dtype=None, order=None):
        obj = np.asarray(a, dtype, order).view(cls)
        return obj
    def __array_wrap__(self, out_arr, context=None):
        return np.ndarray.__array_wrap__(self, out_arr, context)
    def resizing(self, cols):
        return self[..., :cols]
arr = np.tile(np.arange(1, 10), 3).reshape(3, 3, 3)
arr = Array(arr)
Now we can do:
print(arr.resizing(1))
Output:
[[[1]
[4]
[7]]
[[1]
[4]
[7]]
[[1]
[4]
[7]]]
print(arr.resizing(2))
Output:
[[[1 2]
[4 5]
[7 8]]
[[1 2]
[4 5]
[7 8]]
[[1 2]
[4 5]
[7 8]]]
You don't want resize or reshape
Make a sample array:
In [29]: arr = np.arange(1,10).reshape(1,3,3)
In [30]: arr
Out[30]:
array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]])
To get your (3,3,3) array use repeat. To keep the answer short, I'll stick with the (1,3,3):
In [31]: arr.repeat(3,0)
Out[31]:
array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]])
Slicing can "cut off" a column
In [33]: arr[:,:,:2]
Out[33]:
array([[[1, 2],
[4, 5],
[7, 8]]])
Adding on a row, use concatenate:
In [36]: np.concatenate((arr, np.zeros((1,1,3),int)), axis=1)
Out[36]:
array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[0, 0, 0]]])
Keep in mind that numpy stores an array as shape and a 1d data-buffer.
In [40]: arr.ravel()
Out[40]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr is a reshape from arange. Reshape preserves that 1d source, so it isn't suitable for the kinds of changes you want. resize works on that 1d source as well, though it can add to it or cut it short. Think in terms of making a new array with selected values from the original, rather than changing the array.
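To make the difference concrete, here is a small sketch (variable names are illustrative only) that compares np.resize, which refills the new shape from the flat buffer, with slicing and padding, which keep every surviving element at its original index:

```python
import numpy as np

# Three stacked copies of the 3x3 block from the question -> shape (3, 3, 3).
arr = np.tile(np.arange(1, 10).reshape(3, 3), (3, 1, 1))

# np.resize walks the flattened data in order, so positions shift.
refilled = np.resize(arr, (3, 3, 2))

# Slicing drops the last column but leaves the other indices untouched.
sliced = arr[:, :, :2]

# Padding appends one row of zeros along axis 1 without moving anything.
grown = np.pad(arr, ((0, 0), (0, 1), (0, 0)))

print(refilled[0])  # rows come from consecutive buffer values
print(sliced[0])    # rows keep their original first two columns
print(grown.shape)
```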
I would use either np.pad or np.zeros.
using np.pad:
Build example array:
>>> import numpy as np
>>>
>>> A = np.resize(np.r_[1:10],(3,3,3))
>>> A
array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]])
Define the function
>>> def recut_pad(A,shp):
...     return np.pad(A[tuple(map(slice,shp))],[(0,max(0,sn-so)) for sn,so in zip(shp,A.shape)])
Try it out:
>>> recut_pad(A,(2,4,5))
array([[[1, 2, 3, 0, 0],
[4, 5, 6, 0, 0],
[7, 8, 9, 0, 0],
[0, 0, 0, 0, 0]],
[[1, 2, 3, 0, 0],
[4, 5, 6, 0, 0],
[7, 8, 9, 0, 0],
[0, 0, 0, 0, 0]]])
>>> recut_pad(A,(4,2,2))
array([[[1, 2],
[4, 5]],
[[1, 2],
[4, 5]],
[[1, 2],
[4, 5]],
[[0, 0],
[0, 0]]])
Using np.zeros and slicing:
Define the function:
>>> def recut_zeros(A,shp):
...     out = np.zeros(shp,A.dtype)
...     out[tuple(map(slice,A.shape))] = A[tuple(map(slice,shp))]
...     return out
Verify:
>>> np.all(recut_pad(A,(1,5,4))==recut_zeros(A,(1,5,4)))
True
>>> np.all(recut_pad(A,(7,2,3))==recut_zeros(A,(7,2,3)))
True

Split up numpy array

I've been looking all over but I'm not really sure how to even describe what it is I want. Essentially I need to turn
np.array(
[[0,0, 1,1, 2,2],
[0,0, 1,1, 2,2],
[3,3, 4,4, 5,5],
[3,3, 4,4, 5,5]]
)
into
np.array(
[[[0,1,2,3,4,5], [0,1,2,3,4,5]],
[[0,1,2,3,4,5], [0,1,2,3,4,5]]]
)
I think I can accomplish that using np.reshape and maybe some other stuff but if I try reshape with arguments (2,2,6) I get back
[[[0 0 1 1 2 2]
[0 0 1 1 2 2]]
[[3 3 4 4 5 5]
[3 3 4 4 5 5]]]
which is not quite what I want.
Make your array with a couple of repeats:
In [208]: arr = np.arange(0,6).reshape(2,3)
In [209]: arr
Out[209]:
array([[0, 1, 2],
[3, 4, 5]])
In [210]: arr = arr.repeat(2,0).repeat(2,1)
In [211]: arr
Out[211]:
array([[0, 0, 1, 1, 2, 2],
[0, 0, 1, 1, 2, 2],
[3, 3, 4, 4, 5, 5],
[3, 3, 4, 4, 5, 5]])
Now break it into blocks which we can transpose:
In [215]: arr1 = arr.reshape(2,2,3,2)
In [216]: arr1
Out[216]:
array([[[[0, 0],
[1, 1],
[2, 2]],
[[0, 0],
[1, 1],
[2, 2]]],
[[[3, 3],
[4, 4],
[5, 5]],
[[3, 3],
[4, 4],
[5, 5]]]])
In [217]: arr1.shape
Out[217]: (2, 2, 3, 2)
In [218]: arr1.transpose(1,0,2,3)
Out[218]:
array([[[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]],
[[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]]])
Let's consolidate the middle 2 axes:
In [220]: arr1.transpose(1,0,2,3).reshape(2,6,2)
Out[220]:
array([[[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]],
[[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]]])
Almost there; just need another transpose:
In [221]: arr1.transpose(1,0,2,3).reshape(2,6,2).transpose(0,2,1)
Out[221]:
array([[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
The basic idea is to reshape the array into blocks, do a transpose, and reshape again. Here I needed another transpose, but if I choose the right one to start with I might not have needed that.
I don't know of a systematic way of doing this; there may be one, but so far I've just used a bit of trial and error when answering this kind of question. Everyone wants a different final arrangement.
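One way to think about it that sometimes avoids the trial and error: after reshaping into blocks, the four axes are (block row, row inside block, block column, column inside block). Moving the two "inside block" axes to the front and merging the two "block" axes gives the target in a single transpose. The sketch below is my own illustration of that idea; the helper name blocks_to_last_axis is hypothetical, and it assumes the array divides evenly into blocks.

```python
import numpy as np

def blocks_to_last_axis(arr, bh, bw):
    """Split a 2d array into (bh, bw) blocks and return shape (bh, bw, n_blocks)."""
    h, w = arr.shape
    # Axes after reshape: (block row, row in block, block col, col in block).
    blocks = arr.reshape(h // bh, bh, w // bw, bw)
    # Bring the in-block axes to the front, then flatten the block grid.
    return blocks.transpose(1, 3, 0, 2).reshape(bh, bw, -1)

arr = np.arange(6).reshape(2, 3).repeat(2, 0).repeat(2, 1)
out = blocks_to_last_axis(arr, 2, 2)
print(out)
```

Entry out[r, c] collects element (r, c) of every block, which for the array in the question is [0, 1, 2, 3, 4, 5] at each position.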
This should work:
>>> import numpy as np
>>> A = np.array(
... [[0,0, 1,1, 2,2],
... [0,0, 1,1, 2,2],
... [3,3, 4,4, 5,5],
... [3,3, 4,4, 5,5]]
... )
>>> B = A[::2,::2].flatten()
>>> B
array([0, 1, 2, 3, 4, 5])
>>> C = np.tile(B, (2,2,1))
>>> C
array([[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
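We can generalize this to a given n * m matrix that contains blocks of identical values, each of size n/y * m/x (so there are y rows and x columns of blocks):
def transform(A, y, x):
    dy = A.shape[0] // y  # block height; integer division so it can be used as a slice step
    dx = A.shape[1] // x  # block width
    B = A[::dy, ::dx].flatten()
    # one copy of B per position inside a block, matching the (2,2,1) tiling above
    return np.tile(B, (dy, dx, 1))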
I think this is what you are looking for:
import numpy
b=numpy.array([[0,0,1,1,2,2],[0,0,1,1,2,2],[3,3,4,4,5,5],[3,3,4,4,5,5]])
c1=(b[::2,::2].flatten(),b[::2,1::2].flatten())
c2=(b[1::2,::2].flatten(),b[1::2,1::2].flatten())
c=numpy.vstack((c1,c2)).reshape((2,2,6))
print(c)
which outputs:
[[[0 1 2 3 4 5]
[0 1 2 3 4 5]]
[[0 1 2 3 4 5]
[0 1 2 3 4 5]]]
For a target array and an input array of general size, the algorithm looks like this, shown here with a 6*6 input array:
import numpy
b=numpy.array([[0,0,1,1,2,2],[0,0,1,1,2,2],[0,0,1,1,2,2],[3,3,4,4,5,5],[3,3,4,4,5,5],[3,3,4,4,5,5]])
(m,n)=b.shape
C=b[::int(m/2),::2].flatten(),b[::int(m/2),1::2].flatten()
for i in range(1,int(m/2)):
    C=numpy.vstack((C,(b[i::int(m/2),::2].flatten(),b[i::int(m/2),1::2].flatten())))
print(C)
which outputs:
[[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]]

Dataframe column of arrays to numpy array

I have a dataframe df with a single column that contains arrays of length 3. Now, I want to transform this column to a numpy array of the correct shape. However, applying np.reshape does not work. How can I do this?
Here is a brief example:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['col'])
for i in range(10):
df.loc[i,'col'] = np.zeros(3)
arr = np.array(df['col'])
np.reshape(arr, (10,3)) # This does not work
Here are two approaches using np.vstack and np.concatenate -
np.vstack(df.col)
np.concatenate(df.col).reshape(df.shape[0],-1) # for performance
For best performance, we could use the underlying data with df.col.values instead.
Sample run -
In [116]: df
Out[116]:
col
0 [7, 5, 2]
1 [1, 1, 3]
2 [6, 1, 4]
3 [7, 0, 0]
4 [8, 8, 0]
5 [7, 8, 0]
6 [0, 5, 8]
7 [8, 3, 1]
8 [6, 6, 8]
9 [8, 2, 3]
In [117]: np.vstack(df.col)
Out[117]:
array([[7, 5, 2],
[1, 1, 3],
[6, 1, 4],
[7, 0, 0],
[8, 8, 0],
[7, 8, 0],
[0, 5, 8],
[8, 3, 1],
[6, 6, 8],
[8, 2, 3]])
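As a sketch of why the reshape in the question fails: np.array(df['col']) yields a one-dimensional object array whose 10 elements are themselves arrays, so there are only 10 items to reshape, not 30. The example below reproduces that situation with plain NumPy (no pandas needed; the name col is just a stand-in for the column's values) and shows that stacking the elements produces the 2d array.

```python
import numpy as np

# Stand-in for np.array(df['col']): 10 objects, each a length-3 array.
col = np.empty(10, dtype=object)
for i in range(10):
    col[i] = np.full(3, float(i))

try:
    np.reshape(col, (10, 3))
    reshape_failed = False
except ValueError:
    # Only 10 elements exist at the outer level, so (10, 3) is impossible.
    reshape_failed = True

# Stacking the inner arrays builds a real 2d numeric array.
stacked = np.stack(list(col))
print(col.shape, stacked.shape, stacked.dtype)
```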

Sorting all rows in numpy matrix by target-column

This seems to have been asked many times, but the answers I found do not work for me now. To keep it simple, here I have a numpy matrix:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
I want to sort the rows by the second column, as below:
[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]
I tried a lot of examples, like "Python Matrix sorting via one column", but none of them worked.
I wonder whether that is because the answers were posted years ago and no longer work with the newest Python? My Python is 3.5.1.
Example of my failed trial:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
temp = data.view(np.ndarray)
np.lexsort((temp[:, 1], ))
print(temp)
print(data)
You are a moving target.
Sort each column independently:
In [151]: np.sort(data,axis=0)
Out[151]:
matrix([[1, 0],
[3, 2],
[5, 6],
[7, 7],
[9, 8]])
Sort on the values of the second column
In [160]: ind=np.argsort(data[:,1],axis=0)
In [161]: ind
Out[161]:
matrix([[4],
[3],
[1],
[2],
[0]], dtype=int32)
In [162]: data[ind.ravel(),:] # ravel needed because of matrix
Out[162]:
matrix([[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]])
Another way to get a valid ind array:
In [163]: ind=np.argsort(data.A[:,1],axis=0)
In [164]: ind
Out[164]: array([4, 3, 1, 2, 0], dtype=int32)
In [165]: data[ind,:]
To use lexsort you need something like
In [175]: np.lexsort([data.A[:,0],data.A[:,1]])
Out[175]: array([4, 3, 1, 2, 0], dtype=int32)
or your 'failed' case - which isn't a fail
In [178]: np.lexsort((data.A[:,1],))
Out[178]: array([4, 3, 1, 2, 0], dtype=int32)
here data[:,1] is the primary key. data[:,0] is the tie breaker (not applicable in your example). I'm just working from the docs.
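A short sketch of the same idea on a plain ndarray rather than np.matrix (my own example, not taken from the question) shows the key order: np.lexsort treats the last key in the sequence as the primary one, and the index array it returns has to be applied to the data to get the sorted rows.

```python
import numpy as np

data = np.array([[9, 8],
                 [7, 6],
                 [5, 7],
                 [3, 2],
                 [1, 0]])

# Last key (column 1) is primary; column 0 only breaks ties.
order = np.lexsort((data[:, 0], data[:, 1]))
sorted_rows = data[order]
print(order)
print(sorted_rows)

# A case with a tie in column 1: rows 0 and 1 are ordered by column 0.
ties = np.array([[2, 1],
                 [1, 1],
                 [3, 0]])
tie_order = np.lexsort((ties[:, 0], ties[:, 1]))
print(tie_order)
```

This is also why the attempt in the question appeared to do nothing: lexsort returns indices and does not reorder its input.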
The approach in your link is working:
import numpy as np
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
print(data[np.argsort(data.A[:, 1])])
[[1 0]
[3 2]
[7 6]
[5 7]
[9 8]]
And now an example where the effect is easier to see (printed the same way):
data = np.matrix([[1, 9],
[2, 8],
[3, 7],
[4, 6],
[0, 5]])
[[0 5]
[4 6]
[3 7]
[2 8]
[1 9]]

subtraction operation on multidimensional arrays

I have a list.
l = [[1, 2, 8], [8, 2, 7], [7, 2, 5]]
I want the first element of each row to be zero, and the remaining elements to be the differences between adjacent columns.
explanation :
1 2 8
8 2 7
7 2 5
subtraction as,
0 1 6
0 -6 5
0 -5 3
I want output as :
l = [[0, 1, 6], [0, -6, 5], [0, -5, 3]]
What is the faster way to perform this operation if I have a large list?
I am using numpy, but I changed it to a list here so that it is easy to understand.
My numpy array object is
l = [[1 2 8] [8 2 7] [7 2 5]]
>>> l = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])
>>> l[:, 1:] -= l[:, :-1]
>>> l[:, 0] = 0
>>> l
array([[ 0, 1, 6],
[ 0, -6, 5],
[ 0, -5, 3]])
Using numpy.insert and numpy.diff:
>>> import numpy as np
>>> a = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])
>>> np.insert(np.diff(a), 0, 0, axis=1)
array([[ 0, 1, 6],
[ 0, -6, 5],
[ 0, -5, 3]])
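A possible variant, assuming a NumPy version recent enough for np.diff to accept the prepend argument (1.16 or later): prepending each row's own first value makes the first difference zero, so no separate insert step is needed.

```python
import numpy as np

a = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])

# Prepending column 0 to itself makes the first difference a[:, 0] - a[:, 0] = 0.
result = np.diff(a, axis=1, prepend=a[:, :1])
print(result)
```

Unlike the in-place version, this leaves the original array unchanged.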
Without numpy, you can get away with this
from functools import reduce  # reduce is not a builtin in Python 3
l = [[1, 2, 8], [8, 2, 7], [7, 2, 5]]
def minus(rest, val):
    rest[-1] -= val
    rest.append(val)
    return rest
def myReduce(l):
    l2 = reduce(minus, l[-2::-1], [l[-1]])
    l2.reverse()
    l2[0] = 0
    return l2
l2 = list(map(myReduce, l))
print(l2)
I guess it's quite straightforward and easy to understand.
