averaging matrix efficiently

in Python, given an n x p matrix, e.g. 4 x 4, how can I return a matrix that's 4 x 2 that simply averages the first two columns and the last two columns for all 4 rows of the matrix?
e.g. given:
a = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
return a matrix that has the average of a[:, 0] and a[:, 1] and the average of a[:, 2] and a[:, 3].
I want this to work for an arbitrary n x p matrix, assuming that p is evenly divisible by the number of columns I am averaging.
let me clarify: for each row, I want to take the average of the first two columns, then the average of the last two columns. So it would be:
(1 + 2) / 2, (3 + 4) / 2 <- row 1 of new matrix
(5 + 6) / 2, (7 + 8) / 2 <- row 2 of new matrix, etc.
which should yield a 4 by 2 matrix rather than 4 x 4.
thanks.

How about using some math? You can define a matrix M = [[0.5,0],[0.5,0],[0,0.5],[0,0.5]] so that A*M is what you want.
from numpy import array, matrix
A = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
M = matrix([[0.5,0],
[0.5,0],
[0,0.5],
[0,0.5]])
print(A*M)  # a matrix product here, because M is a numpy matrix
Generating M is pretty simple too, entries are 1/n or zero.
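A minimal sketch of how M could be generated for any column count and group size. The helper name averaging_matrix and its parameters are illustrative assumptions, not part of the answer above; it builds the block pattern with np.kron.

```python
import numpy as np

def averaging_matrix(n_cols, group_size):
    """Build an (n_cols, n_cols // group_size) matrix that averages adjacent column groups."""
    if n_cols % group_size != 0:
        raise ValueError("n_cols must be divisible by group_size")
    n_groups = n_cols // group_size
    # Each 1 on the identity diagonal expands into a column block filled with 1/group_size.
    block = np.full((group_size, 1), 1.0 / group_size)
    return np.kron(np.eye(n_groups), block)

A = np.arange(1, 17).reshape(4, 4)
M = averaging_matrix(A.shape[1], 2)
averaged = A @ M  # shape (4, 2)
```

With plain arrays the product is written A @ M (or A.dot(M)) rather than A*M, which would be element-wise.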

reshape - get mean - reshape
>>> a.reshape(-1, a.shape[1]//2).mean(1).reshape(a.shape[0],-1)
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
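This averages the first half and the second half of the columns in each row, so it should work for any array with an even number of columns. For a contiguous array, reshape returns a view rather than a copy.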
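The same reshape idea can be sketched for an arbitrary group size. The function name average_column_groups and its group_size parameter are assumptions made for illustration; the column count is assumed to be divisible by group_size.

```python
import numpy as np

def average_column_groups(a, group_size):
    """Average every run of group_size adjacent columns, row by row."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    if n_cols % group_size != 0:
        raise ValueError("number of columns must be divisible by group_size")
    # View each row as (groups, group_size) and average over the last axis.
    return a.reshape(n_rows, n_cols // group_size, group_size).mean(axis=2)

a = np.arange(1, 17).reshape(4, 4)
pairs = average_column_groups(a, 2)                              # shape (4, 2)
triples = average_column_groups(np.arange(12).reshape(2, 6), 3)  # shape (2, 2)
```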

It's a bit unclear what should happen for matrices with n > 4, but this code will do what you want:
import numpy as N
a = N.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=float)
avg = N.vstack((N.average(a[:,0:2], axis=1), N.average(a[:,2:4], axis=1))).T
This yields avg =
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])

Here's a way to do it. You only need to change groupsize to make it work with other sizes like you said, though I'm not fully sure what you want.
import numpy as np
groupsize = 2  # passed to np.hsplit, so this is the number of equal column sections
out = np.hstack([np.mean(x, axis=1, keepdims=True) for x in np.hsplit(a, groupsize)])
yields
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
for out. Hopefully it gives you some ideas on how to do exactly what it is that you want to do. You can make groupsize dependent on the dimensions of a for instance.

Related

Index values to a vector with numpy in python

I have created a vector of zeros called Qc_vector (18 rows x 1 column).
I have created another vector called s_vector (6 rows x 1 column) that is generated each time by a for loop within the range ingreso_datos, that is, for this example it is generated 5 times.
I have also created a list called indices that is generated for each iteration of the loop, these indices tell me the row number to which I should index the values from s_vector to Qc_vector
PROBLEM
When trying to do this I get the following error: ValueError: shape mismatch: value array of shape (6,) could not be broadcast to indexing result of shape (6,1)
For element 6 of the matrix ingreso_datos, the indices are: [1,2,3,4,5,6]
At the end of the loop, that is, for element number 5, s_vector looks like this:
[image: s_vector for element 5]
[image: Qc_vector indexed, how it should look]
import numpy as np
# Element 1(i) 2(i) 3(i) 1(j) 2(j) 3(j) x(i) y(i) x(j) y(j) | W(kg/m) Axis(kg/m)
# [Col0] [Col1] [Col2] [Col3] [Col4] [Col5] [Col6] [Col7] [Col8] [Col9] [Col10] | [Col11] [Col12]
ingreso_datos = [[ 1, 13, 14, 15, 7, 8, 9, 0, 0, 0, 2.5, 0, 0],
[ 2, 16, 17, 18, 10, 11, 12, 4.5, 0, 4.5, 2.5, 0, 0],
[ 3, 7, 8, 9, 1, 2, 3, 4.5, 0, 4.5, 2.5, 0, 0],
[ 4, 10, 11, 12, 4, 5, 6, 4.5, 0, 4.5, 2.5, 0, 0],
[ 5, 7, 8, 9, 10, 11, 12, 4.5, 0, 4.5, 2.5, -2200, 0]]
Qc_vector = np.zeros((18, 1)) # Vector of zeros
for i in range(len(ingreso_datos)):
    indices = []
    indices.append([ingreso_datos[i][0], ingreso_datos[i][1], ingreso_datos[i][2], ingreso_datos[i][3],
                    ingreso_datos[i][4], ingreso_datos[i][5], ingreso_datos[i][6]])
    for row in indices:
        indices = np.array(row[1:])
    L = np.sqrt((ingreso_datos[i][9]-ingreso_datos[i][7])**2+(ingreso_datos[i][10]-ingreso_datos[i][8])**2)
    lx = (ingreso_datos[i][9]-ingreso_datos[i][7])/L
    ly = (ingreso_datos[i][10]-ingreso_datos[i][8])/L
    w = ingreso_datos[i][11]
    ad = ingreso_datos[i][12]
    s_vector = np.array([ad*L/2, w*L/2, (w*L**2)/12, ad*L/2, w*L/2, (-w*L**2)/12]) # s_vector
    Qc_vector[np.ix_(indices)] = s_vector # Indexing

Qc_vector is (18,1).
indices = [[ingreso_datos[i][0], ingreso_datos[i][1], ingreso_datos[i][2], ingreso_datos[i][3], ingreso_datos[i][4], ingreso_datos[i][5], ingreso_datos[i][6]]]
or simply (if ingreso_datos is a NumPy array rather than a list of lists):
indices = [ingreso_datos[i,[0,1,2,3,4,5,6]]]
followed by:
for row in indices:
    indices = np.array(row[1:])
which is just
ingreso_datos[i,[1,2,3,4,5,6]]
s_vector is a 6 element array, shape (6,)
In:
Qc_vector[np.ix_(indices)] = s_vector
you don't need ix_. In my previous answer I suggested:
master_matrix[np.ix_(indices,indices)] ==little_matrix
as a way of doing the indexing for all rows, not just one at a time.
I think your assignment can be simplified to
Qc_vector[indices, 0] = s_vector
That way there's a shape (6,) array on both sides.
Alternatively, define Qc_vector with shape (18,) rather than (18,1).
I have a feeling you are still trying to write this code by copying other people's code, without understanding what is happening, or why they suggest things.
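The shape mismatch and the suggested fix can be reproduced in a small sketch. The s_vector values here are made up, and shifting the 1-based numbers from ingreso_datos to 0-based positions is an assumption about what the indices mean.

```python
import numpy as np

Qc_vector = np.zeros((18, 1))
# Assumption: the 1-based row numbers are shifted to 0-based positions.
indices = np.array([13, 14, 15, 7, 8, 9]) - 1
s_vector = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up values for illustration

try:
    # The indexed target has shape (6, 1) but the value has shape (6,).
    Qc_vector[np.ix_(indices)] = s_vector
    raised = False
except ValueError:
    raised = True

# Selecting column 0 explicitly makes both sides shape (6,).
Qc_vector[indices, 0] = s_vector
```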

A quick fix if you don't want to bother too much would be to use numpy.reshape().
This way you can manage the shape mismatch.
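As a rough sketch of that reshape suggestion, with hypothetical 0-based indices and made-up values, the right-hand side can be turned into a column so it matches the (6, 1) target:

```python
import numpy as np

Qc_vector = np.zeros((18, 1))
indices = np.array([0, 1, 2, 3, 4, 5])               # hypothetical 0-based rows
s_vector = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up values

# Qc_vector[indices] has shape (6, 1), so give s_vector a matching column shape.
Qc_vector[indices] = s_vector.reshape(-1, 1)
```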

numpy apply_along_axis computation on multidimensional data

I am translating some J language code into Python, but the way Python's apply function works is a little unclear to me...
I currently have a (3, 3, 2) matrix A, and a (3, 3) matrix B.
I want to divide each matrix in A by rows in B:
A = np.arange(1,19).reshape(3,3,2)
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
B = np.arange(1,10).reshape(3,3)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
That is the result would be like
1 2
1.5 2
1.66667 2
1.75 2
1.8 2
1.83333 2
1.85714 2
1.875 2
1.88889 2
for the first matrix of the result, the way I want to compute is the following:
1/1 2/1
3/2 4/2
5/3 6/3
I have tried
np.apply_along_axis(np.divide,1,A,B)
but it says
operands could not be broadcast together with shapes (10,) (10,10,2)
Any advice?
Thank you in advance = ]
ps. the J code is
A %"2 1 B
This means "divide each matrix("2) from A by each row ("1) from B"
or just simply
A % B

Broadcasting works if the trailing dimensions match or are one! So we can basically add a dummy dimension!
import numpy as np
A = np.arange(1,19).reshape(3,3,2)
B = np.arange(1,10).reshape(3,3)
B = B[...,np.newaxis] # This adds new dummy dimension in the end, B's new shape is (3,3,1)
A/B
array([[[1. , 2. ],
[1.5 , 2. ],
[1.66666667, 2. ]],
[[1.75 , 2. ],
[1.8 , 2. ],
[1.83333333, 2. ]],
[[1.85714286, 2. ],
[1.875 , 2. ],
[1.88888889, 2. ]]])
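To see why the extra axis is needed, this sketch first tries the division with the original shapes and then with the added axis. The variable names B3 and failed_without_newaxis are illustrative only.

```python
import numpy as np

A = np.arange(1, 19).reshape(3, 3, 2)
B = np.arange(1, 10).reshape(3, 3)

try:
    A / B  # shapes (3, 3, 2) and (3, 3): trailing dimensions 2 and 3 do not match
    failed_without_newaxis = False
except ValueError:
    failed_without_newaxis = True

B3 = B[:, :, np.newaxis]  # shape (3, 3, 1), broadcasts against (3, 3, 2)
result = A / B3
```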

In python how can I find all the values in an array that are in between two specific values

Hi, I would like to go over certain values of an array, let's say
spikeTimes = [1,2,3,4,5,6,7,8]
stimTimes =[0.5, 4.5, 7.5]
WIN_SIZE=2
For every element of stimTimes, I want to get the values of spikeTimes that lie between stimTimes[indx] and stimTimes[indx]+WIN_SIZE.
For the first value of stimTimes (0.5) I should get 1, 2 (between 0.5 and 2.5),
for the second value (4.5) - 5, 6 (values that are between 4.5 and 4.5+WIN_SIZE=6.5),
and for the 3rd value (7.5) - 8.

This can be done with nested list comprehensions:
spikeTimes = [1, 2, 3, 4, 5, 6, 7, 8]
stimTimes = [0.5, 4.5, 7.5]
WIN_SIZE = 2
partitions = [[spike for spike in spikeTimes if stim <= spike <= stim + WIN_SIZE] for stim in stimTimes]
print(partitions)
Produces:
[[1, 2], [5, 6], [8]]
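If spikeTimes is sorted, a NumPy variant can avoid scanning the whole list for every stimulus. This is only a sketch of an alternative technique (np.searchsorted), not the method from the answer above, and the snake_case names are my own.

```python
import numpy as np

spike_times = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # assumed to be sorted
stim_times = np.array([0.5, 4.5, 7.5])
WIN_SIZE = 2

# First position with spike >= stim, and first position with spike > stim + WIN_SIZE.
starts = np.searchsorted(spike_times, stim_times, side="left")
stops = np.searchsorted(spike_times, stim_times + WIN_SIZE, side="right")

partitions = [spike_times[lo:hi].tolist() for lo, hi in zip(starts, stops)]
```

Both window ends are inclusive here, matching the stim <= spike <= stim + WIN_SIZE test in the list comprehension.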

Vector operations with numpy

I have three numpy arrays:
X: a 3073 x 49000 matrix
W: a 10 x 3073 matrix
y: a 49000 x 1 vector
y contains values between 0 and 9, each value represents a row in W.
I would like to add the first column of X to the row in W given by the first element in y. I.e. if the first element in y is 3, add the first column of X to the fourth row of W. And then add the second column of X to the row in W given by the second element in y and so on, until all columns of X have been added to the row in W specified by y, which means a total of 49000 added rows.
W[y] += X.T does not work for me, because this will not add more than one vector to a row in W.
Please note: I'm only looking for vectorized solutions. I.e. no for-loops.
EDIT: To clarify I'll add an example with small matrix sizes adapted from Salvador Dali's example below.
In [1]: import numpy as np
In [2]: a, b, c = 3, 4, 5
In [3]: np.random.seed(0)
In [4]: X = np.random.randint(10, size=(b,c))
In [5]: W = np.random.randint(10, size=(a,b))
In [6]: y = np.random.randint(a, size=(c,1))
In [7]: X
Out[7]:
array([[5, 0, 3, 3, 7],
[9, 3, 5, 2, 4],
[7, 6, 8, 8, 1],
[6, 7, 7, 8, 1]])
In [8]: W
Out[8]:
array([[5, 9, 8, 9],
[4, 3, 0, 3],
[5, 0, 2, 3]])
In [9]: y
Out[9]:
array([[0],
[1],
[1],
[2],
[0]])
In [10]: W[y.ravel()] + X.T
Out[10]:
array([[10, 18, 15, 15],
[ 4, 6, 6, 10],
[ 7, 8, 8, 10],
[ 8, 2, 10, 11],
[12, 13, 9, 10]])
In [11]: W[y.ravel()] = W[y.ravel()] + X.T
In [12]: W
Out[12]:
array([[12, 13, 9, 10],
[ 7, 8, 8, 10],
[ 8, 2, 10, 11]])
The problem is to get BOTH column 0 and column 4 in X added to row 0 in W, as well as both column 1 and 2 in X added to row 1 in W.
The desired outcome is thus:
W = [[17, 22, 16, 16],
[ 7, 11, 14, 17],
[ 8, 2, 10, 11]]

First the straight forward loop solution as reference:
In [65]: for i,j in enumerate(y):
   ....:     W[j]+=X[:,i]
   ....:
In [66]: W
Out[66]:
array([[17, 22, 16, 16],
[ 7, 11, 14, 17],
[ 8, 2, 10, 11]])
An add.at solution:
In [67]: W=W1.copy()  # W1 is presumably a copy of the original W, saved before the loop
In [68]: np.add.at(W,(y.ravel()),X.T)
In [69]: W
Out[69]:
array([[17, 22, 16, 16],
[ 7, 11, 14, 17],
[ 8, 2, 10, 11]])
add.at does an unbuffered calculation, getting around the buffering that prevents W[y.ravel()] += X.T from working. It is still iterative, but the loop has been moved to compiled code. It isn't true vectorization because the order of application matters. The addition for one row of X.T depends on the results from the previous rows.
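The difference between the buffered += and np.add.at can be checked on the small example from the question. This is a sketch; the names W0, W_buffered and W_at are illustrative.

```python
import numpy as np

X = np.array([[5, 0, 3, 3, 7],
              [9, 3, 5, 2, 4],
              [7, 6, 8, 8, 1],
              [6, 7, 7, 8, 1]])
W0 = np.array([[5, 9, 8, 9],
               [4, 3, 0, 3],
               [5, 0, 2, 3]])
y = np.array([[0], [1], [1], [2], [0]])

# Buffered: each repeated row index keeps only one of its contributions.
W_buffered = W0.copy()
W_buffered[y.ravel()] += X.T

# Unbuffered: every column of X is accumulated into its target row.
W_at = W0.copy()
np.add.at(W_at, y.ravel(), X.T)
```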
https://stackoverflow.com/a/20811014/901925 is the answer I gave a couple of years ago to a similar question (for 1d arrays).
But when dealing with your large arrays:
X: a 3073 x 49000 matrix
W: a 10 x 3073 matrix
y: a 49000 x 1 vector
this can run into speed issues. Note that W[y.ravel()] is the same size as X.T (why did you pick these sizes that require transpose?). And it's a copy, not a view. So there's already a time penalty.
bincount has been suggested in previous questions, and I think it is faster. See "Making for loop with index arrays faster" (both bincount and add.at solutions).
Iterating over the small 3073 dimension could also have speed advantages. Or better yet on the size 10 dimension as Divakar demonstrates.
For the small test case, a,b,c=3,4,5, the add.at solution is fastest, with Divakar's bincount and einsum next. For a larger a,b,c=10,1000,20000, add.at gets very slow, with bincount being the fastest.
Related SO answers
https://stackoverflow.com/a/28205888/901925 (notes that bincount requires complete coverage for y).
https://stackoverflow.com/a/30041823/901925 (where Divakar again shows that bincount rules!)

Vectorized approaches
Approach #1
Based on this answer, here's a vectorized solution using np.bincount -
N = y.max()+1
ids = y.ravel() + np.arange(X.shape[0])[:,None]*N
# bincount returns floats, so cast back to W's dtype before the in-place add
W[:N] += np.bincount(ids.ravel(), weights=X.ravel()).reshape(-1,N).T.astype(W.dtype)
Approach #2
You can make good usage of boolean indexing and np.einsum to get the job done in a concise vectorized manner -
N = y.max()+1
W[:N] += np.einsum('ijk,lk->il',(np.arange(N)[:,None,None] == y.ravel()),X)
Loopy approaches
Approach #3
Since you are selecting and adding up a huge number of columns from X per unique y, it might be better in terms of performance to run a loop with complexity equal to the number of such unique y's, which seems to be at max equal to the number of rows in W and that in your case is just 10. Thus, the loop has just 10 iterations, not bad! Here's the implementation to fulfill those aspirations -
for k in range(W.shape[0]):
    W[k] += X[:,(y==k).ravel()].sum(1)
Approach #4
You can bring in np.einsum to do the columnwise summations and have the final output like so -
for k in range(W.shape[0]):
    W[k] += np.einsum('ij->i',X[:,(y==k).ravel()])
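As a sanity check, the bincount idea and the short loop can be compared on the small example from the question. In this sketch the minlength argument and the cast back to the integer dtype are my additions (assumptions beyond the approaches above), so that labels missing from y and integer W are handled.

```python
import numpy as np

X = np.array([[5, 0, 3, 3, 7],
              [9, 3, 5, 2, 4],
              [7, 6, 8, 8, 1],
              [6, 7, 7, 8, 1]])
W0 = np.array([[5, 9, 8, 9],
               [4, 3, 0, 3],
               [5, 0, 2, 3]])
y = np.array([[0], [1], [1], [2], [0]])

n_labels = W0.shape[0]
n_features = X.shape[0]

# One bin per (feature row, label) pair: bin = feature * n_labels + label.
ids = y.ravel() + np.arange(n_features)[:, None] * n_labels
sums = np.bincount(ids.ravel(), weights=X.ravel(),
                   minlength=n_features * n_labels)
sums = sums.reshape(n_features, n_labels).T  # shape (n_labels, n_features)
W_bincount = W0 + sums.astype(W0.dtype)

# Loop over the few distinct labels, summing the matching columns of X.
W_loop = W0.copy()
for k in range(n_labels):
    W_loop[k] += X[:, (y == k).ravel()].sum(axis=1)
```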

This will achieve what you want: X + W[y.ravel()].T
To see that this really does the work, here is a reproducible example:
import numpy as np
np.random.seed(0)
a, b, c = 3, 5, 4 # you can use your 3073, 49000, 10 later
X = np.random.rand(a, b)
W = np.random.rand(c, a)
y = np.random.randint(c, size=(b, 1))
Now your matrices are (W, y and X, in that order):
[[ 0.0871293 0.0202184 0.83261985]
[ 0.77815675 0.87001215 0.97861834]
[ 0.79915856 0.46147936 0.78052918]
[ 0.11827443 0.63992102 0.14335329]]
[[3]
[0]
[3]
[2]
[0]]
[[ 0.5488135 0.71518937 0.60276338 0.54488318 0.4236548 ]
[ 0.64589411 0.43758721 0.891773 0.96366276 0.38344152]
[ 0.79172504 0.52889492 0.56804456 0.92559664 0.07103606]]
And W[y.ravel()] gives you the "row in W given by" each element in y. By transposing it, you will get a matrix ready to be added to X:
[[ 0.11827443 0.0871293 0.11827443 0.79915856 0.0871293 ]
[ 0.63992102 0.0202184 0.63992102 0.46147936 0.0202184 ]
[ 0.14335329 0.83261985 0.14335329 0.78052918 0.83261985]]

While I can't say that this is very pythonic, it is a solution (I think):
for column in range(x.shape[1]):
    w[y[column]] += x[:,column]  # += so that repeated rows accumulate

Interpolation of values when zooming down

I have a 2D array that I would like to down sample to compare it to another.
Let's say my array x is 512x512. I'd like an array y of 128x128 where the elements of y are built using an interpolation of the values over 4x4 blocks of x (this interpolation could just be taking the average, but other methods, like the geometric average, could be interesting).
So far I looked at scipy.ndimage.interpolation.zoom but I don't get the results I want
>> x = np.arange(16).reshape(4,4)
>> print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
>> y = scipy.ndimage.interpolation.zoom(x, 0.5)
>> print(y)
[[ 0 3]
[12 15]]
I expected y to be
[[ 2.5 4.5]
[10.5 12.5]]
Note that simply setting dtype=np.float32 doesn't solve that ...

sklearn.feature_extraction.image.extract_patches cleverly uses np.lib.stride_tricks.as_strided to produce a windowed array that can be operated on.
The sliding_window function, found in "Efficient Overlapping Windows with Numpy", also produces a windowed array, with or without overlap, and lets you get a glimpse of what is happening under the hood.
>>> a = np.arange(16).reshape(4,4)
step_height,step_width determines the overlap for the windows - in your case the steps are the same as the window size, no overlap.
>>> window_height, window_width, step_height, step_width = 2, 2, 2, 2
>>> y = sliding_window(a, (window_height, window_width), (step_height,step_width))
>>> y
array([[[ 0, 1],
[ 4, 5]],
[[ 2, 3],
[ 6, 7]],
[[ 8, 9],
[12, 13]],
[[10, 11],
[14, 15]]])
Operate on the windows:
>>> y = y.mean(axis = (1,2))
>>> y
array([ 2.5, 4.5, 10.5, 12.5])
You need to determine the final shape depending on the number of windows.
>>> final_shape = (2,2)
>>> y = y.reshape(final_shape)
>>> y
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
Searching SO for numpy, window, array should produce numerous other answers and possible solutions.
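For the non-overlapping case, a windowed view can be sketched directly with as_strided. The helper non_overlapping_windows below is a hypothetical stand-in written for illustration, not the sliding_window function referenced above, and it returns the windows on a 2-D grid rather than as a flat list.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def non_overlapping_windows(a, window_height, window_width):
    """Return a read-only view of shape (n_rows, n_cols, window_height, window_width)."""
    a = np.ascontiguousarray(a)
    n_rows = a.shape[0] // window_height
    n_cols = a.shape[1] // window_width
    row_stride, col_stride = a.strides
    return as_strided(
        a,
        shape=(n_rows, n_cols, window_height, window_width),
        # Step a whole window at a time on the outer axes, one element on the inner axes.
        strides=(row_stride * window_height, col_stride * window_width,
                 row_stride, col_stride),
        writeable=False,
    )

a = np.arange(16).reshape(4, 4)
windows = non_overlapping_windows(a, 2, 2)
block_means = windows.mean(axis=(2, 3))
```

Because the windows already sit on a grid, no final reshape is needed after taking the mean.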

What you seem to be looking for is the mean over blocks of 4, which is not obtainable with zoom, since zoom uses interpolation (see its docstring)
To obtain what you show, try the following
import numpy as np
x = np.arange(16).reshape(4, 4)
xx = x.reshape(len(x) // 2, 2, x.shape[1] // 2, 2).transpose(0, 2, 1, 3).reshape(len(x) // 2, x.shape[1] // 2, -1).mean(-1)
print(xx)
This yields
[[ 2.5 4.5]
[ 10.5 12.5]]
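The reshape trick can be wrapped in a small helper for arbitrary block sizes. This is a sketch under the assumption that both dimensions are divisible by the block size; the name block_average and its geometric option are illustrative, and the geometric mean additionally assumes strictly positive values.

```python
import numpy as np

def block_average(x, block_height, block_width, geometric=False):
    """Average x over non-overlapping blocks of shape (block_height, block_width)."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    if h % block_height or w % block_width:
        raise ValueError("array dimensions must be divisible by the block size")
    blocks = x.reshape(h // block_height, block_height, w // block_width, block_width)
    if geometric:
        # Geometric mean via the mean of logarithms (values must be > 0).
        return np.exp(np.log(blocks).mean(axis=(1, 3)))
    return blocks.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
arithmetic = block_average(x, 2, 2)
geometric = block_average(np.array([[1, 2], [4, 8]]), 2, 2, geometric=True)
```

For the 512x512 array in the question, the call would be block_average(x, 4, 4), giving a 128x128 result.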
Alternatively, this can be done using sklearn.feature_extraction.image.extract_patches (available in older scikit-learn releases)
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(x, patch_shape=(2, 2), extraction_step=(2, 2))
xx = patches.mean(-1).mean(-1)
print(xx)
However, if your goal is to subsample an image in a graceful way, then taking the mean over blocks of the image is not the right way to do it: It is likely to cause aliasing effects. What you should do in this case is smooth the image ever so slightly using scipy.ndimage.gaussian_filter (e.g. sigma=0.35 * subsample_factor) and then subsample simply by indexing [::2, ::2]
