Is there a way to efficiently implement a rolling window for 1D arrays in Numpy?
For example, I have this pure Python code snippet to calculate the rolling standard deviations for a 1D list, where observations is the 1D list of values, and n is the window length for the standard deviation:
from math import sqrt

stdev = []
for i, data in enumerate(observations[n-1:]):
    strip = observations[i:i+n]
    mean = sum(strip) / n
    stdev.append(sqrt(250*sum([(s-mean)**2 for s in strip])/(n-1)))
Is there a way to do this completely within Numpy, i.e., without any Python loops? The standard deviation is trivial with numpy.std, but the rolling window part completely stumps me.
I found this blog post regarding a rolling window in Numpy, but it doesn't seem to be for 1D arrays.
Just use the blog code, but apply your function to the result.
i.e.
numpy.std(rolling_window(observations, n), 1)
where you have (from the blog):
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
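One caveat when swapping this in for the loop from the question: numpy.std defaults to the population formula (ddof=0), while the loop divides by n-1 and multiplies by 250 under the square root. Here is a minimal sketch that reproduces the loop's numbers, assuming a small made-up observations array (the values and n = 3 are only for illustration):

```python
import numpy as np
from math import sqrt

def rolling_window(a, window):
    # Same helper as in the answer above (from the blog post).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Illustrative data, not from the question.
observations = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
n = 3

# ddof=1 gives the n-1 denominator; sqrt(250) is the scaling used in the loop.
vectorized = np.sqrt(250) * np.std(rolling_window(observations, n), axis=1, ddof=1)

# Reference: the pure Python loop from the question.
looped = []
for i in range(len(observations) - n + 1):
    strip = observations[i:i + n]
    mean = sum(strip) / n
    looped.append(sqrt(250 * sum((s - mean) ** 2 for s in strip) / (n - 1)))
```

Note that rolling_window needs a NumPy array, so a plain list would have to go through np.asarray first.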
Starting in Numpy 1.20, you can directly get a rolling window with sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view
sliding_window_view(np.array([1, 2, 3, 4, 5, 6]), window_shape = 3)
# array([[1, 2, 3],
# [2, 3, 4],
# [3, 4, 5],
# [4, 5, 6]])
I tried using so12311's answer listed above on a 2D array with shape [samples, features] in order to get an output array with shape [samples, timesteps, features] for use with a convolution or lstm neural network, but it wasn't working quite right. After digging into how the strides were working, I realized that it was moving the window along the last axis, so I made some adjustments so that the window is moved along the first axis instead:
def rolling_window(a, window_size):
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
NOTE: there is no difference in the output if you are only using a 1D input array. In my search this was the first result to get close to what I wanted to do, so I am adding this to help any others searching for a similar answer.
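To make the [samples, features] to [windows, timesteps, features] mapping concrete, here is a small check. The sizes (6 samples, 2 features, 3 timesteps) are made up for illustration:

```python
import numpy as np

def rolling_window(a, window_size):
    # Window moves along the first axis; trailing axes are kept as they are.
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

samples, features, timesteps = 6, 2, 3  # illustrative sizes
a = np.arange(samples * features).reshape(samples, features)

windows = rolling_window(a, timesteps)
# windows.shape is (samples - timesteps + 1, timesteps, features)
# and windows[i] holds the same rows as a[i:i + timesteps].
```

The result is a view onto the memory of a, so it is safer to take a copy before writing into it.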
With only one line of code...
import pandas as pd
pd.Series(observations).rolling(n).std()
Building on the previous answers, here is code for a rolling window over a 1-D NumPy array with a configurable window size and step between windows (freq).
a = np.arange(50)
def rolling_window(array, window_size, freq):
    shape = (array.shape[0] - window_size + 1, window_size)
    strides = (array.strides[0],) + array.strides
    rolled = np.lib.stride_tricks.as_strided(array, shape=shape, strides=strides)
    return rolled[np.arange(0, shape[0], freq)]
rolling_window(a,10,5)
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[25, 26, 27, 28, 29, 30, 31, 32, 33, 34],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
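On NumPy 1.20 or later, the same stepped windows can probably be obtained without hand-written strides by slicing the result of sliding_window_view. This is a sketch of that alternative, not the code used in the answer above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(50)
window_size, freq = 10, 5

# All windows of length 10, then keep every 5th one.
stepped = sliding_window_view(a, window_size)[::freq]
```

Here the first row starts at 0 and the last row starts at 40, matching the output shown above.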
def moving_avg(x, n):
    mv = np.convolve(x, np.ones(n)/n, mode='valid')
    # Pad the first n-1 positions with NaN so the output has the same length as x.
    return np.concatenate((np.full(n-1, np.nan), mv))
I needed a rolling window that could be applied along any intermediate axis of an n-dimensional array, so I extended the code from the accepted answer and from @Miguel Gonzalez. The corresponding code to apply a rolling window to an n-d array along any axis:
def rolling_window_ndimensional(array, window_size, freq, axis=0):
    shape = array.shape[:axis] + (array.shape[axis] - window_size + 1, window_size) + array.shape[axis+1:]
    strides = array.strides[:axis] + (array.strides[axis],) + array.strides[axis:]
    rolled = np.lib.stride_tricks.as_strided(array, shape=shape, strides=strides)
    return np.take(rolled, np.arange(0, shape[axis], freq), axis=axis)
A quick test to check that the function is valid:
arr = np.random.randint(1, 1000, size=(2,108,21,5))
arr_windowed = rolling_window_ndimensional(arr, 12, 12, axis=1)
print(arr.shape)
print(arr_windowed.shape)
np.allclose(arr, arr_windowed.reshape(2,-1, 21,5))
Related
I am trying to collapse a FITS data cube with Python. I know that specialized packages can do this, but it is for teaching purposes. I first extract a subcube in Z:
hdu.data = hdu.data[3365:3405, :, :]
subcube = hdu.data
The subcube has dimensions Z=40, Y=50 and X=26. I want to collapse the cube the old-fashioned way, with a double loop over X and Y, to get a simple 2D image.
for i in range(1, xdim):
    for j in range(1, ydim):
        Sum[j,i] = subcube[:,j,i].sum()
I get an error message: IndexError: index 26 is out of bounds for axis 1 with size 26.
I know that Python orders the cube dimensions as Z, Y, X rather than X, Y, Z as IDL does, for example, but I cannot figure out why I get this error.
Python indices start at 0. You need to do range(xdim) and range(ydim) in your for loops.
Python ranges start at 0, so the valid X indices are 0-25; the same applies to Y and Z.
Maybe a simple nested list comprehension over subcube can help you?
z_flatten = [[sum(subcube[:, j, i]) for i in range(xdim)] for j in range(ydim)]
The existing answers pointing out that Python is 0-indexed are correct, but no one pointed out yet that you don't even need to create an empty array with np.zeros or to use any for loops to do this.
Numpy already allows you to apply most operations along a specific axis of your array, as opposed to looping over the dimensions of your sub-cube and summing just one pixel at a time.
For example let's make a 3x4x4 data cube:
>>> cube = np.arange(3 * 4 * 4).reshape((3, 4, 4))
>>> cube
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]])
Say you want to sum all layers of a 3x3 slice of this cube:
>>> cube[:, :3, :3].sum(axis=0)
array([[48, 51, 54],
[60, 63, 66],
[72, 75, 78]])
In your case, the equivalent would be
subcube[:, :ydim, :xdim].sum(axis=0)
This is equivalent to what you're trying to do, but much more efficient.
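As a sanity check, the corrected double loop and the axis-wise sum can be compared on a random stand-in cube. The array below is made up with the shape from the question (Z=40, Y=50, X=26); it is not the FITS data:

```python
import numpy as np

rng = np.random.default_rng(0)
subcube = rng.random((40, 50, 26))  # stand-in for hdu.data[3365:3405, :, :]
zdim, ydim, xdim = subcube.shape

# Corrected loop: ranges start at 0 and stop at xdim - 1 / ydim - 1.
looped = np.zeros((ydim, xdim))
for i in range(xdim):
    for j in range(ydim):
        looped[j, i] = subcube[:, j, i].sum()

# Vectorized version: sum along the Z axis in a single call.
collapsed = subcube.sum(axis=0)
```

Both results have shape (ydim, xdim) and agree to floating-point precision.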
As a general note, although you read your data cube out of a FITS file, since astropy.io.fits returns a Numpy array, any documentation or questions you can find about Numpy arrays apply--it generally isn't important at that point that it came from a FITS file. I point this out, just because it might help you in the future if you're struggling to perform operations on Numpy arrays.
I have a set of data like this:
numpy.array([[3, 7],[5, 8],[6, 19],[8, 59],[10, 42],[12, 54], [13, 32], [14, 19], [99, 19]])
which I want to split into a number of chunks with a percentage of overlap, for each column separately. For example, for column 1, splitting into 3 chunks with 50% overlap results in a 2-D array:
[[3, 5, 6, 8,],
[6, 8, 10, 12,],
[10, 12, 13, 14,]]
(ignoring the last chunk, [13, 14, 99], which is not the same size as the rest).
I'm trying to write a function that takes the array, the number of chunks and the overlap percentage, and returns the result.
That's a window function, so use skimage.util.view_as_windows:
from skimage.util import view_as_windows
out = view_as_windows(in_arr[:, 0], window_shape = 4, step = 2)
If you need numpy only, you can use this recipe
For a NumPy-only solution, a fairly fast approach is:
def rolling(a, window, step):
    shape = ((a.size - window)//step + 1, window)
    strides = (step*a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
And you can call it like so:
rolling(arr[:,0].copy(), 4, 2)
Remark: I've got unexpected outputs for rolling(arr[:,0], 4, 2) so just took a copy instead.
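A likely explanation for the remark above: arr[:, 0] is a non-contiguous view, so consecutive elements are a.strides[0] bytes apart rather than a.itemsize. Below is a sketch of a variant (the name rolling_strided is hypothetical) that reads the stride from the array and therefore should not need the copy:

```python
import numpy as np

def rolling_strided(a, window, step):
    # Use the array's own byte stride so non-contiguous 1-D views also work.
    stride = a.strides[0]
    shape = ((a.size - window) // step + 1, window)
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=(step * stride, stride))

arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42],
                [12, 54], [13, 32], [14, 19], [99, 19]])

chunks = rolling_strided(arr[:, 0], 4, 2)  # no .copy() needed here
```

For the first column this yields the three chunks from the question: [3, 5, 6, 8], [6, 8, 10, 12] and [10, 12, 13, 14].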
I'm struggling to deal with a scipy.stats.binned_statistic_dd() result. I have an array of positions and another array of ids that I'm binning in 3 directions. I'm providing a list of the bin edges as input rather than a number of bins in each direction coupled with a range option. I have 3 bins in x, 2 in y, and 3 in z, or 18 bins.
However, when I check the binnumbers listed, they are all in a range greater than 20. How do I get the bin numbers to reflect the number of bins provided and get rid of all the extra bins?
I've tried to follow what was suggested in this post (Output in scipy.stats.binned_statistic_dd()) which deals with something similar, but I can't understand how to apply this to my case. As usual, the documentation is as cryptic as ever.
Any help on getting my binnumbers between 1-18 in this example would be greatly appreciated!
pos = np.array([[-0.02042167, -0.0223282 , 0.00123734],
[-0.0420364 , 0.01196078, 0.00694259],
[-0.09625651, -0.00311446, 0.06125461],
[-0.07693234, -0.02749618, 0.03617278],
[-0.07578646, 0.01199925, 0.02991888],
[-0.03258293, -0.00371765, 0.04245596],
[-0.06765955, 0.02798434, 0.07075846],
[-0.02431445, 0.02774102, 0.06719837],
[ 0.02798265, -0.01096739, -0.01658691],
[-0.00584252, 0.02043389, -0.00827088],
[ 0.00623063, -0.02642285, 0.03232817],
[ 0.00884222, 0.01498996, 0.02912483],
[ 0.07189474, -0.01541584, 0.01916607],
[ 0.07239394, 0.0059483 , 0.0740187 ],
[-0.08519159, -0.02894125, 0.10923724],
[-0.10803509, 0.01365444, 0.09555333],
[-0.0442866 , -0.00845725, 0.10361843],
[-0.04246779, 0.00396127, 0.1418258 ],
[-0.08975861, 0.02999023, 0.12713186],
[ 0.01772454, -0.0020405 , 0.08824418]])
ids = np.array([16, 9, 6, 19, 1, 4, 10, 5, 18, 11, 2, 12, 13, 8, 3, 17, 14,
15, 20, 7])
xbinEdges = np.array([-0.15298488, -0.05108961, 0.05080566, 0.15270093])
ybinEdges = np.array([-0.051, 0. , 0.051])
zbinEdges = np.array([-0.053, 0.049, 0.151, 0.253])
ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
ranges = [[-0.15298488071, 0.15270092971],
[-0.051000000000000004, 0.051000000000000004],
[-0.0530000000000001, 0.25300000000000006]]
ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
Ok, after several days of background thinking and a quick scour through the binned_statistic_dd() source code I think I've come to the correct answer and it's pretty simple.
It seems binned_statistic_dd() adds an extra set of outlier bins during the binning phase and removes them when returning the histogram results, but leaves the bin numbers untouched (I think this is in case you want to reuse the result for further stats outputs).
So if you export the expanded binnumbers (expand_binnumbers=True) and subtract 1 from each to re-adjust the bin indices, you can calculate the "correct" bin ids.
ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber
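indxnum2 = binnumber2-1
# Number of bins per axis is one less than the number of edges: 3, 2, 3 here.
numX, numY, numZ = len(xbinEdges)-1, len(ybinEdges)-1, len(zbinEdges)-1
corrected_bin_ids = np.ravel_multi_index((indxnum2),(numX, numY, numZ))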
Quick and simple in the end!
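The same correction can be sketched with NumPy alone, starting from the flat bin numbers already printed in the question (expand_binnumbers=False). This assumes, in line with the description above, that the flat numbers index a grid with one extra outlier bin on each side of every axis, i.e. (3+2, 2+2, 3+2), in row-major order:

```python
import numpy as np

# Flat bin numbers reported in the question.
binnumber = np.array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72,
                      27, 32, 47, 52, 32, 47])

nbins = (3, 2, 3)                     # bins in x, y, z
padded = tuple(n + 2 for n in nbins)  # assumption: two outlier bins per axis

# Per-axis indices in the padded grid (1-based inside the edges).
per_axis = np.unravel_index(binnumber, padded)

# Shift to 0-based indices and flatten onto the 3 x 2 x 3 grid.
corrected = np.ravel_multi_index(tuple(ix - 1 for ix in per_axis), nbins)
corrected_1_based = corrected + 1     # ids in the 1-18 range asked for
```

If any point fell outside the supplied edges, the shifted index would be out of range and ravel_multi_index would raise, so such points would need to be filtered out first.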
Given an input tensor of shape (C, B, H) torch.Size([2, 5, 32]) of some neural net layers, where
channels = 2
batch_size = 5
hidden_size = 32
The goal is to flatten the channels and manipulate the input tensor to the shape (B, C*H) torch.Size([5, 2 * 32]), where:
batch_size = 5
hidden_size = 32 * 2
I've tried to do the following:
import torch
t = torch.rand([2, 5, 32])
# Changed from (channels, batch_size, hidden_size)
# -> (batch_size, channels, hidden_size)
t = t.permute(1, 0, 2)
# Reshape using view(), where batch_size is t.size(0)
# and -1 is to flatten the left over values to the other dimension.
z = t.contiguous().view(t.size(0), -1)
print(z.shape)
print(z)
[out]:
torch.Size([5, 64])
tensor([[0.3911, 0.9586, 0.2104, 0.3937, 0.9976, 0.3378, 0.0630, 0.6676, 0.0806,
0.9311, 0.5219, 0.1697, 0.7442, 0.5162, 0.2555, 0.0826, 0.5502, 0.9700,
0.3375, 0.5012, 0.9025, 0.8176, 0.1465, 0.1848, 0.3460, 0.9999, 0.7892,
0.7577, 0.6615, 0.2620, 0.6868, 0.2003, 0.4840, 0.8354, 0.9253, 0.3172,
0.9516, 0.8962, 0.1272, 0.2268, 0.6510, 0.5166, 0.6772, 0.9616, 0.9826,
0.5254, 0.9191, 0.4378, 0.7048, 0.8808, 0.0299, 0.1102, 0.9710, 0.8714,
0.7256, 0.9684, 0.6117, 0.1957, 0.8663, 0.4742, 0.2843, 0.6548, 0.9592,
0.1559],
[0.2333, 0.0858, 0.5284, 0.2965, 0.3863, 0.3370, 0.6940, 0.3387, 0.3513,
0.1022, 0.3731, 0.3575, 0.7095, 0.0053, 0.7024, 0.4091, 0.3289, 0.5808,
0.5640, 0.8847, 0.7584, 0.8878, 0.9873, 0.0525, 0.7731, 0.2501, 0.9926,
0.5226, 0.0925, 0.0300, 0.4176, 0.0456, 0.4643, 0.4497, 0.5920, 0.9519,
0.6647, 0.2379, 0.4927, 0.9666, 0.1675, 0.9887, 0.7741, 0.5668, 0.7376,
0.4452, 0.7449, 0.1298, 0.9065, 0.3561, 0.5813, 0.1439, 0.2115, 0.5874,
0.2038, 0.1066, 0.3843, 0.6179, 0.8321, 0.9428, 0.1067, 0.5045, 0.9324,
0.3326],
[0.6556, 0.1479, 0.9288, 0.9238, 0.1324, 0.0718, 0.6620, 0.2659, 0.7162,
0.7559, 0.7564, 0.2120, 0.3943, 0.9497, 0.7520, 0.8455, 0.4444, 0.4708,
0.8371, 0.6365, 0.3616, 0.0326, 0.1581, 0.4973, 0.6701, 0.9245, 0.8274,
0.3464, 0.7044, 0.5376, 0.0441, 0.5210, 0.8603, 0.7396, 0.2544, 0.3514,
0.5686, 0.3283, 0.7248, 0.4303, 0.9531, 0.5587, 0.8703, 0.1585, 0.9161,
0.9043, 0.9778, 0.4489, 0.9463, 0.8655, 0.5576, 0.1135, 0.1268, 0.3424,
0.1504, 0.2265, 0.1734, 0.1872, 0.3995, 0.1191, 0.0532, 0.6109, 0.1662,
0.6937],
[0.6342, 0.1922, 0.1758, 0.4625, 0.7654, 0.6509, 0.2908, 0.1546, 0.4768,
0.3779, 0.2490, 0.0086, 0.6170, 0.5425, 0.6953, 0.4730, 0.5834, 0.8326,
0.0165, 0.8236, 0.0023, 0.7479, 0.5621, 0.9894, 0.5957, 0.0857, 0.6087,
0.5667, 0.5478, 0.8197, 0.9228, 0.7329, 0.4434, 0.5894, 0.9860, 0.6133,
0.2395, 0.4718, 0.8830, 0.6361, 0.6104, 0.6630, 0.5084, 0.7604, 0.7591,
0.3601, 0.6888, 0.6767, 0.9178, 0.5291, 0.0591, 0.4320, 0.7875, 0.5038,
0.4419, 0.0319, 0.3719, 0.5843, 0.0334, 0.3525, 0.0023, 0.1205, 0.4040,
0.7908],
[0.0989, 0.8436, 0.0425, 0.6247, 0.6091, 0.4778, 0.2692, 0.4785, 0.9217,
0.9604, 0.6355, 0.4686, 0.9414, 0.7722, 0.8013, 0.1660, 0.6578, 0.6414,
0.6814, 0.6212, 0.4124, 0.7102, 0.7416, 0.7404, 0.9842, 0.6542, 0.0106,
0.3826, 0.5529, 0.8079, 0.9855, 0.3012, 0.2341, 0.9353, 0.6597, 0.7177,
0.8214, 0.1438, 0.4729, 0.6747, 0.9310, 0.4167, 0.3689, 0.8464, 0.9395,
0.9407, 0.8419, 0.5486, 0.1786, 0.1423, 0.9900, 0.9365, 0.3996, 0.1862,
0.6232, 0.7547, 0.7779, 0.4767, 0.6218, 0.9079, 0.6153, 0.1488, 0.5960,
0.4015]])
Although permute() + view() achieves the desired output, are there other ways to perform the same operation? Is there a better way to reshape directly, without first permuting the dimensions?
Let's look "behind the curtain" and see why one must have both permute/transpose and view in order to go from a C-B-H to B-C*H:
Elements of tensors are stored as a long contiguous vector in memory. For instance, a 2-3-4 tensor has 24 elements stored at 24 consecutive places in memory. The tensor also has a "header" that tells pytorch to treat these 24 values as a 2-by-3-by-4 tensor. This is done by storing not only the size of the tensor, but also its "strides": how far one needs to jump to get to the next element along each dimension. In our example, size=(2,3,4) and strides=(12, 4, 1) (you can check this out yourself, and you can see more about it here).
Now, if you only want to change the size to 2-(3*4) you do not need to move any item of the tensor in memory, only to update the "header" of the tensor. By setting size=(2, 12) and strides=(12, 1) you are done!
Alternatively, if you want to "transpose" the tensor to 3-2-4 that's a bit more tricky, but you can still do that by manipulating the strides. Setting size=(3, 2, 4) and strides=(4, 12, 1) gives you exactly what you want without moving any of the real tensor elements in memory.
However, once you manipulated the strides, you cannot trivially change the size of the tensor - because now you will need to have two different "stride" values for one (or more) dimensions. This is why you must call contiguous() at this point.
Summary
If you want to move from shape (C, B, H) to (B, C*H) you must have permute, contiguous and view operations, otherwise you just scramble the entries of your tensor.
A small example with 2-3-4 tensor:
a =
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
If you just change the view of the tensor you get
a.view(3,8)
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23]])
Which is not what you want!
You need to have
a.permute(1,0,2).contiguous().view(3, 8)
array([[ 0, 1, 2, 3, 12, 13, 14, 15],
[ 4, 5, 6, 7, 16, 17, 18, 19],
[ 8, 9, 10, 11, 20, 21, 22, 23]])
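The size and stride bookkeeping described above can be checked with NumPy, which uses the same layout idea. This is only a NumPy analogue of the PyTorch calls: transpose stands in for permute, reshape stands in for contiguous().view(), and NumPy reports strides in bytes, so they are divided by the item size here:

```python
import numpy as np

def elem_strides(x):
    # NumPy strides are in bytes; convert to element counts.
    return tuple(s // x.itemsize for s in x.strides)

a = np.arange(24).reshape(2, 3, 4)
strides_a = elem_strides(a)          # (12, 4, 1)

# Transposing only rewrites the header: same memory, swapped strides.
p = a.transpose(1, 0, 2)
strides_p = elem_strides(p)          # (4, 12, 1)
shares = np.shares_memory(a, p)

# Reshaping without the transpose scrambles the intended grouping.
wrong = a.reshape(3, 8)

# After the transpose no single stride can describe the merged axis,
# so NumPy has to copy the data when reshaping.
right = p.reshape(3, 8)
copied = not np.shares_memory(a, right)
```

The rows of right match the permute/contiguous/view output shown above, while wrong matches the plain view(3, 8) output.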
Einops allows doing such element rearrangements in one (readable) line
from einops import rearrange
import torch
t = torch.rand([2, 5, 32])
y = rearrange(t, 'c b h -> b (c h)')
y.shape # prints torch.Size([5, 64])
edit: it's an image so the suggested (How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function) isn't really working for me
I have the following matlab code
fun = @(block_struct) ...
std2(block_struct.data) * ones(size(block_struct.data));
B=blockproc(im2double(Icorrected), [4 4], fun);
I want to remake my code, but this time in Python. I have installed Scikit and i'm trying to work around it like this
b = np.std(a, axis = 2)
The problem of course it's that i'm not applying the std for a number of blocks, just like above.
How can i do something like this? Start a loop and try to call the function for each X*X blocks? Then i wouldn't keep the size the it was.
Is there another more efficient way?
If there is no overlap in the windows you can reshape the data to suit your needs:
Find the mean of 3x3 windows of a 9x9 array.
import numpy as np
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80]])
Find the new shape
>>> window_size = (3,3)
>>> tuple(s // w for s, w in zip(a.shape, window_size)) + window_size
(3, 3, 3, 3)
>>> b = a.reshape(3,3,3,3)
Find the mean along axes 1 and 3 (zero-based).
>>> b.mean(axis = (1,3))
array([[ 10., 13., 16.],
[ 37., 40., 43.],
[ 64., 67., 70.]])
>>>
2x2 windows of a 4x4 array:
>>> a = np.arange(16).reshape((4,4))
>>> window_size = (2,2)
>>> tuple(s // w for s, w in zip(a.shape, window_size)) + window_size
(2, 2, 2, 2)
>>> b = a.reshape(2,2,2,2)
>>> b.mean(axis = (1,3))
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
>>>
This won't work if the window size doesn't divide the array size evenly. In that case, or if you simply want overlapping windows, numpy.lib.stride_tricks.as_strided is the way to go; a generic N-D function can be found at Efficient Overlapping Windows with Numpy.
Another option for 2d arrays is sklearn.feature_extraction.image.extract_patches_2d, and for ndarrays, sklearn.feature_extraction.image.extract_patches. Both manipulate the array's strides to produce the patches/windows.
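For non-square blocks the axis order in the reshape matters: the shape has to be (block rows, rows per block, block columns, columns per block). Here is a sketch of a per-block standard deviation built on that reshape (block_std is a hypothetical helper name), including one way to expand the result back to the input size, similar in spirit to the MATLAB snippet in the question:

```python
import numpy as np

def block_std(a, block):
    # Standard deviation of each non-overlapping m x n block of a 2-D array.
    m, n = block
    h, w = a.shape
    if h % m or w % n:
        raise ValueError("block size must divide the array shape evenly")
    blocks = a.reshape(h // m, m, w // n, n)
    return blocks.std(axis=(1, 3))

a = np.arange(24, dtype=float).reshape(4, 6)
small = block_std(a, (2, 3))            # one value per block, shape (2, 2)

# Repeat each block value over its block to get back the input shape.
full = np.kron(small, np.ones((2, 3)))  # shape (4, 6)
```

Each entry of small equals the std of the corresponding slice, for example a[:2, :3].std() for the top-left block.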
I did the following:
import numpy as np
from skimage import io

io.use_plugin('pil', 'imread')
a = io.imread(r'C:\Users\Dimitrios\Desktop\polimesa\arizona.jpg')
B = np.zeros((len(a)//2 + 1, len(a[0])//2 + 1))
x = []  # values of the current 2x2 block
for i in range(0, len(a), 2):
    for j in range(0, len(a[0]), 2):
        x.append(a[i][j])
        if i+1 < len(a):
            x.append(a[i+1][j])
        if j+1 < len(a[0]):
            x.append(a[i][j+1])
        if i+1 < len(a) and j+1 < len(a[0]):
            x.append(a[i+1][j+1])
        B[i//2][j//2] = np.std(x)
        x[:] = []
and I think it's correct. It iterates over the image in steps of 2, collects each pixel's neighbours in a list, and calculates their std.
Edit: later adapted for 4x4 blocks.
We can implement blockproc() in python the following way:
def blockproc(im, block_sz, func):
    h, w = im.shape
    m, n = block_sz
    for x in range(0, h, m):
        for y in range(0, w, n):
            block = im[x:x+m, y:y+n]
            block[:,:] = func(block)
    return im
Now let's use it to implement contrast enhancement via local histogram equalization, taking the low-contrast moon image (of size 512x512) as input and using 64x64 blocks:
from skimage import data, exposure
img = data.moon()
img = img / img.max()
m, n = 64, 64
img_eq = blockproc(img.copy(), (m, n), exposure.equalize_hist)
Display the input and output images:
Note that the function does in-place modification to the image, hence a copy of the input image is passed instead.
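The same helper can be pointed at the original MATLAB use case, where every pixel is replaced by the standard deviation of its block. This is a minimal sketch on a made-up 4x4 array with 2x2 blocks; np.std returns a scalar, which the slice assignment broadcasts over the block:

```python
import numpy as np

def blockproc(im, block_sz, func):
    # Apply func to each block and write the result back in place.
    h, w = im.shape
    m, n = block_sz
    for x in range(0, h, m):
        for y in range(0, w, n):
            block = im[x:x+m, y:y+n]
            block[:, :] = func(block)
    return im

# Illustrative input; a float dtype keeps the std values from being truncated.
im = np.arange(16, dtype=float).reshape(4, 4)
local_std = blockproc(im.copy(), (2, 2), np.std)
```

With an integer image the in-place assignment would truncate the std values, so converting to float first (as the moon example does by dividing by img.max()) seems advisable.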