numpy apply_along_axis computation on multidimensional data - python

I am translating some J code into Python, but how NumPy's apply function works is a little unclear to me...
I currently have a (3, 3, 2) matrix A, and a (3, 3) matrix B.
I want to divide each matrix in A by rows in B:
A = np.arange(1,19).reshape(3,3,2)
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
B = np.arange(1,10).reshape(3,3)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
That is the result would be like
1 2
1.5 2
1.66667 2
1.75 2
1.8 2
1.83333 2
1.85714 2
1.875 2
1.88889 2
for the first matrix of the result, the way I want to compute is the following:
1/1 2/1
3/2 4/2
5/3 6/3
I have tried
np.apply_along_axis(np.divide,1,A,B)
but it says
operands could not be broadcast together with shapes (10,) (10,10,2)
Any advice?
Thank you in advance = ]
ps. the J code is
A %"2 1 B
This means "divide each matrix("2) from A by each row ("1) from B"
or just simply
A % B

Broadcasting works when, comparing shapes from the trailing dimension backwards, each pair of dimensions either matches or one of them is 1. So we can simply add a dummy trailing dimension to B!
import numpy as np
A = np.arange(1,19).reshape(3,3,2)
B = np.arange(1,10).reshape(3,3)
B = B[...,np.newaxis] # This adds new dummy dimension in the end, B's new shape is (3,3,1)
A/B
array([[[1. , 2. ],
[1.5 , 2. ],
[1.66666667, 2. ]],
[[1.75 , 2. ],
[1.8 , 2. ],
[1.83333333, 2. ]],
[[1.85714286, 2. ],
[1.875 , 2. ],
[1.88888889, 2. ]]])
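As a sanity check, the broadcast division can be compared against an explicit loop that pairs each row A[i, j, :] with the scalar B[i, j]. This is only a sketch; the names broadcast_result and loop_result are made up for illustration.

```python
import numpy as np

A = np.arange(1, 19).reshape(3, 3, 2)
B = np.arange(1, 10).reshape(3, 3)

# B[..., np.newaxis] has shape (3, 3, 1); the trailing 1 is stretched
# over A's last axis of length 2.
broadcast_result = A / B[..., np.newaxis]

# Explicit loop: divide the pair A[i, j, :] by the scalar B[i, j].
loop_result = np.empty(A.shape, dtype=float)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        loop_result[i, j, :] = A[i, j, :] / B[i, j]

print(np.allclose(broadcast_result, loop_result))
```

If the two agree, the broadcast version is doing the same per-row division as the J expression describes, without apply_along_axis.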


Why numpy.fft.rfft/irfft transforms do not bring the input back?

I have a list, when I transform it by np.fft.rfft and bring it back by np.fft.irfft it does not work for ex(2) but work with ex(1). What should I do to make it work with ex(2)?
ex(1):
import numpy as np
z=[[1,2,34,45],[1,2,5,6],[7,8,9,10]]
x1=np.fft.rfft(z)
x2=np.fft.irfft(x1)
print(x2)
print(z)
out:
[[ 1. 2. 34. 45.]
[ 1. 2. 5. 6.]
[ 7. 8. 9. 10.]]
[[1, 2, 34, 45], [1, 2, 5, 6], [7, 8, 9, 10]]
ex(2):
import numpy as np
z1=[[5,8,6],[45,6,3],[847,5847,6]]
x3=np.fft.rfft(z1)
x4=np.fft.irfft(x3)
print(x4)
print(z1)
out:
[[ 8.5 10.5 ]
[ 47.25 6.75]
[2310.25 4389.75]]
[[5, 8, 6], [45, 6, 3], [847, 5847, 6]]
Please help.
This isn't an error but the intended behaviour of np.fft.rfft for input of odd length:
The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified. If n is
even, the length of the transformed axis is (n/2)+1. If n is odd, the
length is (n+1)/2.
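The output length follows from the Hermitian symmetry of the FFT of real input: only the non-negative frequency terms are stored. By default, np.fft.irfft assumes an even output length of 2*(m-1), where m is the length of the transformed axis, so an odd-length input does not round-trip.
One way to solve this is zero-padding: add a zero at the end of every row of z1 if there is an odd number of columns, by specifying an appropriate n kwarg in the np.fft.rfft call: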
import numpy as np
z1 = np.array([[5,8,6],[45,6,3],[847,5847,6]])
n = z1.shape[1]
if n%2:
    # zero padding if n odd
    n += 1
x3 = np.fft.rfft(z1,n,axis=-1)
x4 = np.fft.irfft(x3)
which recovers the initial input, padded with a column of zeros:
print(x4)
>>>[[5.000e+00 8.000e+00 6.000e+00 0.000e+00]
[4.500e+01 6.000e+00 3.000e+00 0.000e+00]
[8.470e+02 5.847e+03 6.000e+00 0.000e+00]]
print(z1)
>>>[[ 5 8 6]
[ 45 6 3]
[ 847 5847 6]]
Feel free to discard the last column of zeros of x4 after going back from the frequency domain.
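An alternative that avoids padding altogether is to tell np.fft.irfft the original length through its n argument. A minimal sketch, reusing z1 from the question:

```python
import numpy as np

z1 = np.array([[5, 8, 6], [45, 6, 3], [847, 5847, 6]])

x3 = np.fft.rfft(z1)                    # last axis has length 3 // 2 + 1 = 2
x4 = np.fft.irfft(x3, n=z1.shape[-1])   # ask for the original odd length back

print(x4.shape)
print(np.allclose(x4, z1))
```

Without n, irfft has no way to know whether the original length was 2 or 3, which is why it falls back to the even length.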

tensorflow: how to interleave columns of two tensors (e.g. using tf.scatter_nd)?

I've read the tf.scatter_nd documentation and run the example code for 1D and 3D tensors... and now I'm trying to do it for a 2D tensor. I want to 'interleave' the columns of two tensors. For 1D tensors, one can do this via
'''
We want to interleave elements of 1D tensors arr1 and arr2, where
arr1 = [10, 11, 12]
arr2 = [1, 2, 3, 4, 5, 6]
such that
desired result = [1, 2, 10, 3, 4, 11, 5, 6, 12]
'''
import tensorflow as tf
with tf.Session() as sess:
    updates1 = tf.constant([1,2,3,4,5,6])
    indices1 = tf.constant([[0], [1], [3], [4], [6], [7]])
    shape = tf.constant([9])
    scatter1 = tf.scatter_nd(indices1, updates1, shape)
    updates2 = tf.constant([10,11,12])
    indices2 = tf.constant([[2], [5], [8]])
    scatter2 = tf.scatter_nd(indices2, updates2, shape)
    result = scatter1 + scatter2
    print(sess.run(result))
(aside: is there a better way to do this? I'm all ears.)
This gives the output
[ 1 2 10 3 4 11 5 6 12]
Yay! that worked!
Now let's try to extend this to 2D.
'''
We want to interleave the *columns* (not rows; rows would be easy!) of
arr1 = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
arr2 = [[10, 11, 12], [10, 11, 12], [10, 11, 12]]
such that
desired result = [[1,2,10,3,4,11,5,6,12],[1,2,10,3,4,11,5,6,12],[1,2,10,3,4,11,5,6,12]]
'''
updates1 = tf.constant([[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]])
indices1 = tf.constant([[0], [1], [3], [4], [6], [7]])
shape = tf.constant([3, 9])
scatter1 = tf.scatter_nd(indices1, updates1, shape)
This gives the error
ValueError: The outer 1 dimensions of indices.shape=[6,1] must match the outer 1
dimensions of updates.shape=[3,6]: Dimension 0 in both shapes must be equal, but
are 6 and 3. Shapes are [6] and [3]. for 'ScatterNd_2' (op: 'ScatterNd') with
input shapes: [6,1], [3,6], [2].
Seems like my indices is specifying row indices instead of column indices, and given the way that arrays are "connected" in numpy and tensorflow (i.e. row-major order), does that mean
I need to explicitly specify every single pair of indices for every element in updates1?
Or is there some kind of 'wildcard' specification I can use for the rows? (Note indices1 = tf.constant([[:,0], [:,1], [:,3], [:,4], [:,6], [:,7]]) gives syntax errors, as it probably should.)
Would it be easier to just do a transpose, interleave the rows, then transpose back?
Because I tried that...
scatter1 = tf.scatter_nd(indices1, tf.transpose(updates1), tf.transpose(shape))
print(sess.run(tf.transpose(scatter1)))
...and got a much longer error message, that I don't feel like posting unless someone requests it.
PS- I searched to make sure this isn't a duplicate -- I find it hard to imagine that someone else hasn't asked this before -- but turned up nothing.
This is pure slicing but I didn't know that syntax like arr1[0:,:][:,:2] actually works. It seems it does but not sure if it is better.
This may be the wildcard slicing mechanism you are looking for.
arr1 = tf.constant([[1,2,3,4,5,6],[1,2,3,4,5,7],[1,2,3,4,5,8]])
arr2 = tf.constant([[10, 11, 12], [10, 11, 12], [10, 11, 12]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.concat([arr1[0:,:][:,:2], arr2[0:,:][:,:1],
                              arr1[0:,:][:,2:4], arr2[0:,:][:,1:2],
                              arr1[0:,:][:,4:6], arr2[0:,:][:,2:3]], axis=1)))
Output is
[[ 1 2 10 3 4 11 5 6 12]
[ 1 2 10 3 4 11 5 7 12]
[ 1 2 10 3 4 11 5 8 12]]
So, for example,
arr1[0:,:] returns
[[1 2 3 4 5 6]
[1 2 3 4 5 7]
[1 2 3 4 5 8]]
and arr1[0:,:][:,:2] returns the first two columns
[[1 2]
[1 2]
[1 2]]
axis is 1.
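The slice-and-concat idea can also be written without listing every slice by hand, by grouping columns with a reshape. The sketch below uses NumPy rather than TensorFlow so it can be run on its own; tf.reshape and tf.concat should follow the same shape logic, but that is an assumption not tested here. The names per_group, g1 and g2 are made up for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7], [1, 2, 3, 4, 5, 8]])
arr2 = np.array([[10, 11, 12], [10, 11, 12], [10, 11, 12]])

n_rows, n_groups = arr2.shape
per_group = arr1.shape[1] // n_groups   # columns of arr1 per group (2 here)

# View each row as n_groups chunks, then glue the chunks side by side.
g1 = arr1.reshape(n_rows, n_groups, per_group)   # shape (3, 3, 2)
g2 = arr2.reshape(n_rows, n_groups, 1)           # shape (3, 3, 1)
interleaved = np.concatenate([g1, g2], axis=2).reshape(n_rows, -1)

print(interleaved)
```

Each group ends up as two columns of arr1 followed by one column of arr2, which is the same layout as the output shown above.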
Some moderators might have regarded my question as a duplicate of this one, not because the questions are the same, but only because the answers contain parts one can use to answer this question -- i.e. specifying every index combination by hand.
A totally different method would be to multiply by a permutation matrix as shown in the last answer to this question. Since my original question was about scatter_nd, I'm going to post this solution but wait to see what other answers come in... (Alternatively, I or someone could edit the question to make it about reordering columns, not specific to scatter_nd --EDIT: I have just edited the question title to reflect this).
Here, we concatenate the two different arrays/tensors...
import numpy as np
import tensorflow as tf
sess = tf.Session()
# the ultimate application is for merging variables which should be in groups,
# e.g. in this example, [1,2,10] is a group of 3, and there are 3 groups of 3
n_groups = 3
vars_per_group = 3 # once the single value from arr2 (below) is included
arr1 = 10+tf.range(n_groups, dtype=float)
arr1 = tf.stack((arr1,arr1,arr1),0)
arr2 = 1+tf.range(n_groups * (vars_per_group-1), dtype=float)
arr2 = tf.stack((arr2,arr2,arr2),0)
catted = tf.concat((arr1,arr2),1) # concatenate the two arrays together
print("arr1 = \n",sess.run(arr1))
print("arr2 = \n",sess.run(arr2))
print("catted = \n",sess.run(catted))
Which gives output
arr1 =
[[10. 11. 12.]
[10. 11. 12.]
[10. 11. 12.]]
arr2 =
[[1. 2. 3. 4. 5. 6.]
[1. 2. 3. 4. 5. 6.]
[1. 2. 3. 4. 5. 6.]]
catted =
[[10. 11. 12. 1. 2. 3. 4. 5. 6.]
[10. 11. 12. 1. 2. 3. 4. 5. 6.]
[10. 11. 12. 1. 2. 3. 4. 5. 6.]]
Now we build the permutation matrix and multiply...
start_index = 2 # location of where the interleaving begins
# cml = "column map list" is the list of where each column will get mapped to
cml = [start_index + x*(vars_per_group) for x in range(n_groups)] # first array
for i in range(n_groups): # second array
    cml += [x + i*(vars_per_group) for x in range(start_index)] # vars before start_index
    cml += [1 + x + i*(vars_per_group) + start_index \
            for x in range(vars_per_group-start_index-1)] # vars after start_index
print("\n cml = ",cml,"\n")
# Create a permutation matrix using cml
np_perm_mat = np.zeros((len(cml), len(cml)))
for idx, i in enumerate(cml):
    np_perm_mat[idx, i] = 1
perm_mat = tf.constant(np_perm_mat,dtype=float)
result = tf.matmul(catted, perm_mat)
print("result = \n",sess.run(result))
Which gives output
cml = [2, 5, 8, 0, 1, 3, 4, 6, 7]
result =
[[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]
[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]
[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]]
Even though this doesn't use scatter_nd as the original question asked, one thing I like about this is, you can allocate the perm_mat once in some __init__() method, and hang on to it, and after that initial overhead it's just matrix-matrix multiplication by a sparse, constant matrix, which should be pretty fast. (?)
Still happy to wait and see what other answers might come in.
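For comparison, the same column map can be applied by index assignment instead of a matrix product. This is a NumPy sketch of the idea, not the TensorFlow code above; catted and cml are rebuilt here with the values printed earlier.

```python
import numpy as np

# Same values as the printed `catted` and `cml` above.
catted = np.tile(np.array([10., 11., 12., 1., 2., 3., 4., 5., 6.]), (3, 1))
cml = [2, 5, 8, 0, 1, 3, 4, 6, 7]

# Column idx of catted is sent to column cml[idx] of the result.
result = np.empty_like(catted)
result[:, cml] = catted

# Permutation-matrix version, for comparison.
perm_mat = np.zeros((len(cml), len(cml)))
perm_mat[np.arange(len(cml)), cml] = 1
result_matmul = catted @ perm_mat

print(result[0])
print(np.array_equal(result, result_matmul))
```

Indexing touches each element once, whereas the dense matrix product does work proportional to the square of the number of columns, so the indexed form may be worth timing for wide inputs.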

Access all elements at given x, y position in 3-dimensional numpy array

mat_a = np.random.random((5, 5))
mat_b = np.random.random((5, 5))
mat_c = np.random.random((5, 5))
bigmat = np.stack((mat_a, mat_b, mat_c)) # this is a 3, 5, 5 array
for (x, y, z), value in np.ndenumerate(bigmat):
    print (x, y, z)
In the example above, how can I loop so that I iterate only across the 5 x 5 array and at each position I get 3 values i.e. loop should run 25 times and each time, I get an array with 3 values (one from each of mat_a, mat_b and mat_c)
EDIT: Please note that I need to be able to access elements by position later i.e. if bigmat is reshaped, there should be a way to access element based on specific y, z
There is a function that generates all indices for a given shape, ndindex.
for y,z in np.ndindex(bigmat.shape[1:]):
    print(y,z,bigmat[:,y,z])
0 0 [ 0 25 50]
0 1 [ 1 26 51]
0 2 [ 2 27 52]
0 3 [ 3 28 53]
0 4 [ 4 29 54]
1 0 [ 5 30 55]
1 1 [ 6 31 56]
...
For a simple case like this it isn't much easier than the double for range loop. Nor will it be faster; but you asked for an iteration.
Another iterator is itertools.product(range(5),range(5))
Timewise, product is pretty good:
In [181]: timeit [bigmat[:,y,z] for y,z in itertools.product(range(5),range(5))]
10000 loops, best of 3: 26.5 µs per loop
In [191]: timeit [bigmat[:,y,z] for (y,z),v in np.ndenumerate(bigmat[0,...])]
10000 loops, best of 3: 61.9 µs per loop
transposing and reshaping is the fastest way to get a list (or array) of the triplets - but it does not give the indices as well:
In [198]: timeit list(bigmat.transpose(1,2,0).reshape(-1,3))
100000 loops, best of 3: 15.1 µs per loop
But the same operation gets the indices from np.mgrid (or np.meshgrid):
np.mgrid[0:5,0:5].transpose(1,2,0).reshape(-1,2)
(though this is surprisingly slow)
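If the goal is to look up the three values at a given (y, z) later, one option is to move the stacking axis to the end so that plain indexing returns the triplet. A small sketch, assuming bigmat is filled with np.arange as in the printed output above; the name by_position is made up.

```python
import numpy as np

bigmat = np.arange(75).reshape(3, 5, 5)

# Move the "which matrix" axis to the end: shape becomes (5, 5, 3).
by_position = np.moveaxis(bigmat, 0, -1)

print(by_position.shape)
print(by_position[1, 0])   # the three values at y=1, z=0
```

np.moveaxis returns a view, so no data is copied and by_position[y, z] stays in sync with bigmat[:, y, z].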
Simon's answer is fine. If you reshape things properly you can get them all in a nice array without any looping.
In [33]: bigmat
Out[33]:
array([[[ 0.51701737, 0.90723012, 0.42534365, 0.3087416 , 0.44315561],
[ 0.3902181 , 0.59261932, 0.21231607, 0.61440961, 0.24910501],
[ 0.63911556, 0.16333704, 0.62123781, 0.6298554 , 0.29012245],
[ 0.95260313, 0.86813746, 0.26722519, 0.14738102, 0.60523372],
[ 0.33189713, 0.6494197 , 0.30269686, 0.47312059, 0.84690451]],
[[ 0.95974972, 0.09659425, 0.06765838, 0.36025411, 0.91492751],
[ 0.92421874, 0.31670119, 0.99623178, 0.30394588, 0.30970197],
[ 0.53590091, 0.04273708, 0.97876218, 0.09686119, 0.78394054],
[ 0.5463358 , 0.29239676, 0.6284822 , 0.96649507, 0.05261606],
[ 0.91733464, 0.77312656, 0.45962704, 0.06446105, 0.58643379]],
[[ 0.75161903, 0.43286354, 0.09633492, 0.52275049, 0.40827006],
[ 0.51816158, 0.05330978, 0.49134325, 0.73652136, 0.14437844],
[ 0.83833791, 0.2072704 , 0.18345275, 0.57282927, 0.7218022 ],
[ 0.56180415, 0.85591746, 0.35482315, 0.94562085, 0.92706479],
[ 0.2994697 , 0.99724253, 0.66386017, 0.0121033 , 0.43448805]]])
Reshaping things...
new_bigmat = bigmat.T.reshape([25,3])
In [36]: new_bigmat
Out[36]:
array([[ 0.51701737, 0.95974972, 0.75161903],
[ 0.3902181 , 0.92421874, 0.51816158],
[ 0.63911556, 0.53590091, 0.83833791],
[ 0.95260313, 0.5463358 , 0.56180415],
[ 0.33189713, 0.91733464, 0.2994697 ],
[ 0.90723012, 0.09659425, 0.43286354],
[ 0.59261932, 0.31670119, 0.05330978],
[ 0.16333704, 0.04273708, 0.2072704 ],
[ 0.86813746, 0.29239676, 0.85591746],
[ 0.6494197 , 0.77312656, 0.99724253],
[ 0.42534365, 0.06765838, 0.09633492],
[ 0.21231607, 0.99623178, 0.49134325],
[ 0.62123781, 0.97876218, 0.18345275],
[ 0.26722519, 0.6284822 , 0.35482315],
[ 0.30269686, 0.45962704, 0.66386017],
[ 0.3087416 , 0.36025411, 0.52275049],
[ 0.61440961, 0.30394588, 0.73652136],
[ 0.6298554 , 0.09686119, 0.57282927],
[ 0.14738102, 0.96649507, 0.94562085],
[ 0.47312059, 0.06446105, 0.0121033 ],
[ 0.44315561, 0.91492751, 0.40827006],
[ 0.24910501, 0.30970197, 0.14437844],
[ 0.29012245, 0.78394054, 0.7218022 ],
[ 0.60523372, 0.05261606, 0.92706479],
[ 0.84690451, 0.58643379, 0.43448805]])
Edit: To keep track of indices, you might try the following (open to other ideas here). Each row in xy_index gives your x,y values respectively for the corresponding row in the new_bigmat array. This answer doesn't require any loops. If looping is acceptable you can borrow Simon's suggestion in the comments or np.ndindex as suggested in hpaulj's answer.
row_index, col_index = np.meshgrid(range(5),range(5))
xy_index = np.array([row_index.flatten(), col_index.flatten()]).T
In [48]: xy_index
Out[48]:
array([[0, 0],
[1, 0],
[2, 0],
[3, 0],
[4, 0],
[0, 1],
[1, 1],
[2, 1],
[3, 1],
[4, 1],
[0, 2],
[1, 2],
[2, 2],
[3, 2],
[4, 2],
[0, 3],
[1, 3],
[2, 3],
[3, 3],
[4, 3],
[0, 4],
[1, 4],
[2, 4],
[3, 4],
[4, 4]])
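The index table does not have to be stored: because bigmat.T.reshape(25, 3) runs through the positions in column-major order, np.unravel_index with order="F" recovers the position from a row number. A sketch, using an arange-filled bigmat as a stand-in for the random one:

```python
import numpy as np

bigmat = np.arange(75).reshape(3, 5, 5)
new_bigmat = bigmat.T.reshape(25, 3)

k = 7                                             # some row of new_bigmat
y, z = np.unravel_index(k, (5, 5), order="F")     # position in the 5 x 5 grid
print(y, z)
print(np.array_equal(new_bigmat[k], bigmat[:, y, z]))

# Going the other way: from a position back to the row number.
k_back = np.ravel_multi_index((y, z), (5, 5), order="F")
print(k_back)
```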
The required result can be obtained by slicing, e.g.:
for x in range(5):
    for y in range(5):
        print (bigmat[:,x,y])
If you don't actually need to stack the arrays, and only want to iterate over all three arrays element-wise at once, numpy.nditer works. I'm still fuzzy on all its parameters and I don't know if it is any faster, so test it on a subset.
a1 = np.arange(9).reshape(3,3) + 10
a2 = np.arange(9).reshape(3,3) + 20
a3 = np.arange(9).reshape(3,3) + 30
c = np.nditer((a1, a2, a3))
for thing in c:
    print(np.array(thing))
>>>
[10 20 30]
[11 21 31]
[12 22 32]
[13 23 33]
[14 24 34]
[15 25 35]
[16 26 36]
[17 27 37]
[18 28 38]
>>>

Combine or join numpy arrays

How can I join two numpy ndarrays to accomplish the following in a fast way, using optimized numpy, without any looping?
>>> a = np.random.rand(2,2)
>>> a
array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
>>> b = np.random.rand(2,2)
>>> b
array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ]])
>>> c = ???
>>> c
array([[ 0.09028802, 0.2274419, 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ]])
Not the prettiest, but you could combine hstack, repeat, and tile:
>>> a = np.arange(4).reshape(2,2)
>>> b = a+10
>>> a
array([[0, 1],
[2, 3]])
>>> b
array([[10, 11],
[12, 13]])
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0, 1, 10, 11],
[ 0, 1, 12, 13],
[ 2, 3, 10, 11],
[ 2, 3, 12, 13]])
Or for a 3x3 case:
>>> a = np.arange(9).reshape(3,3)
>>> b = a+10
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0, 1, 2, 10, 11, 12],
[ 0, 1, 2, 13, 14, 15],
[ 0, 1, 2, 16, 17, 18],
[ 3, 4, 5, 10, 11, 12],
[ 3, 4, 5, 13, 14, 15],
[ 3, 4, 5, 16, 17, 18],
[ 6, 7, 8, 10, 11, 12],
[ 6, 7, 8, 13, 14, 15],
[ 6, 7, 8, 16, 17, 18]])
What you want is, apparently, the cartesian product of a and b, stacked horizontally. You can use the itertools module to generate the indices for the numpy arrays, then numpy.hstack to stack them:
import numpy as np
from itertools import product
a = np.array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
b = np.array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ],
[ 0.79941110, 0.52273 ]])
a_inds, b_inds = map(list, zip(*product(range(len(a)), range(len(b)))))
c = np.hstack((a[a_inds], b[b_inds]))
This results in a c of:
array([[ 0.09028802, 0.2274419 , 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419 , 0.69181444, 0.672248 ],
[ 0.09028802, 0.2274419 , 0.7994111 , 0.52273 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.7994111 , 0.52273 ]])
Breaking down the indices thing:
product(range(len(a)), range(len(b))) will generate something that looks like this if you convert it to a list (shown here for two 2-row arrays):
[(0, 0), (0, 1), (1, 0), (1, 1)]
You want something like this: [0, 0, 1, 1], [0, 1, 0, 1], so you need to transpose the generator. The idiomatic way to do this is with zip(*zipped_thing). However, if you just directly assign these, you'll get tuples, like this:
[(0, 0, 1, 1), (0, 1, 0, 1)]
But numpy arrays interpret tuples as multi-dimensional indexes, so you want to turn them to lists, which is why I mapped the list constructor onto the result of the product function.
Let's walk through a prospective solution to handle generic cases involving different shaped arrays with some inlined comments to explain the method involved.
(1) First off, we store shapes of input arrays.
ma,na = a.shape
mb,nb = b.shape
(2) Next up, initialize a 3D array with number of columns being the sum of number of columns in input arrays a and b. Use np.empty for this task.
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
(3) Then fill the first "na" columns of the 3D array with the rows of a, using a[:,None,:]. The None inserts a singleton second axis, so assigning to out[:,:,:na] makes NumPy broadcast each row of a across the second axis of out. In effect, this is the same as tiling/repeating, but likely more efficient.
out[:,:,:na] = a[:,None,:]
(4) Repeat for setting elements from b into output array. This time we would broadcast along the first axis of out with out[:,:,na:], with that first colon helping us do that broadcasting.
out[:,:,na:] = b
(5) Final step is to reshape the output to a 2D shape. This could be done with simply changing the shape with the required 2D shape tuple. Reshaping just changes view and is effectively zero cost.
out.shape = (ma*mb,na+nb)
Condensing everything, the full implementation would look like this -
ma,na = a.shape
mb,nb = b.shape
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
out[:,:,:na] = a[:,None,:]
out[:,:,na:] = b
out.shape = (ma*mb,na+nb)
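To check the broadcasting version, it can be wrapped in a function and compared with the hstack/repeat/tile approach on inputs with different row counts. This is a sketch: the function name cartesian_hstack is made up, and np.result_type is used for the output dtype instead of a.dtype so that mixed int/float inputs are not truncated.

```python
import numpy as np

def cartesian_hstack(a, b):
    """Pair every row of a with every row of b, side by side."""
    ma, na = a.shape
    mb, nb = b.shape
    out = np.empty((ma, mb, na + nb), dtype=np.result_type(a, b))
    out[:, :, :na] = a[:, None, :]   # rows of a broadcast along axis 1
    out[:, :, na:] = b               # rows of b broadcast along axis 0
    return out.reshape(ma * mb, na + nb)

a = np.arange(4).reshape(2, 2)
b = np.arange(6).reshape(3, 2) + 10

c = cartesian_hstack(a, b)
# For unequal row counts, a is repeated len(b) times and b tiled len(a) times.
reference = np.hstack([np.repeat(a, len(b), axis=0), np.tile(b, (len(a), 1))])

print(c)
print(np.array_equal(c, reference))
```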
You can use dstack() and broadcast_arrays():
import numpy as np
a = np.random.randint(0, 10, (3, 2))
b = np.random.randint(10, 20, (4, 2))
np.dstack(np.broadcast_arrays(a[:, None], b)).reshape(-1, a.shape[-1] + b.shape[-1])
Try either np.hstack or np.vstack. This would work even for arrays that are not the same length. All you would need to do is this:
np.hstack(appendedarray[:]) or np.vstack(appendedarray[:])
All arrays are indexable, so you can merge them by just calling:
a[:2],b[:2]
or you can use the core numpy stacking functions, which should look something like this (note that the arrays are passed as a single tuple):
c = np.vstack((a,b))

averaging matrix efficiently

in Python, given an n x p matrix, e.g. 4 x 4, how can I return a matrix that's 4 x 2 that simply averages the first two columns and the last two columns for all 4 rows of the matrix?
e.g. given:
a = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
return a matrix that has the average of a[:, 0] and a[:, 1] and the average of a[:, 2] and a[:, 3].
I want this to work for an arbitrary n x p matrix, assuming that the number of columns p is evenly divisible by the number of columns I am averaging over.
let me clarify: for each row, I want to take the average of the first two columns, then the average of the last two columns. So it would be:
(1 + 2) / 2, (3 + 4) / 2 <- row 1 of new matrix
(5 + 6) / 2, (7 + 8) / 2 <- row 2 of new matrix, etc.
which should yield a 4 by 2 matrix rather than 4 x 4.
thanks.
How about using some math? You can define a matrix M = [[0.5,0],[0.5,0],[0,0.5],[0,0.5]] so that A*M is what you want.
from numpy import array, matrix
A = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
M = matrix([[0.5,0],
[0.5,0],
[0,0.5],
[0,0.5]])
print(A*M)
Generating M is pretty simple too, entries are 1/n or zero.
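One way to generate M for any group size is a Kronecker product of an identity matrix with a column of 1/group entries. The sketch below uses plain arrays and the @ operator rather than numpy.matrix; the function name averaging_matrix and its parameters are made up for illustration.

```python
import numpy as np

def averaging_matrix(p, group):
    """p x (p // group) matrix that averages consecutive blocks of `group` columns."""
    if p % group:
        raise ValueError("p must be divisible by group")
    column = np.full((group, 1), 1.0 / group)
    return np.kron(np.eye(p // group), column)

A = np.arange(1, 17).reshape(4, 4)
M = averaging_matrix(A.shape[1], 2)

print(M)
print(A @ M)
```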
reshape - get mean - reshape
>>> a.reshape(-1, a.shape[1]//2).mean(1).reshape(a.shape[0],-1)
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
is supposed to work for any array size, and reshape doesn't make a copy.
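The same reshape idea can be written with an explicit group size by reshaping to three dimensions and averaging over the last one. A sketch, with the helper name group_column_means chosen here for illustration:

```python
import numpy as np

def group_column_means(a, group):
    """Average each run of `group` consecutive columns, row by row."""
    n, p = a.shape
    if p % group:
        raise ValueError("number of columns must be divisible by group")
    # (n, p) -> (n, p // group, group); the mean collapses the last axis.
    return a.reshape(n, p // group, group).mean(axis=2)

a = np.arange(1, 17).reshape(4, 4)
print(group_column_means(a, 2))

wide = np.arange(12).reshape(2, 6)
print(group_column_means(wide, 3))
```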
It's a bit unclear what should happen for matrices with n > 4, but this code will do what you want:
import numpy as N
a = N.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=float)
avg = N.vstack((N.average(a[:,0:2], axis=1), N.average(a[:,2:4], axis=1))).T
This yields avg =
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
Here's a way to do it. You only need to change groupsize to make it work with other sizes like you said, though I'm not fully sure what you want.
groupsize = 2
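# np.hsplit(a, groupsize) splits a into groupsize equal blocks of columns;
# keepdims=True keeps each block's mean as an (n, 1) column so hstack can join them
out = np.hstack([np.mean(x, axis=1, keepdims=True) for x in np.hsplit(a, groupsize)])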
yields
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
for out. Hopefully it gives you some ideas on how to do exactly what it is that you want to do. You can make groupsize dependent on the dimensions of a for instance.
