I am a little confused by numpy's broadcasting rules. Suppose you want to perform an axis-wise scalar product on a higher-dimensional array to reduce its dimension by one (basically a weighted summation along one axis):
from numpy import *
A = ones((3,3,2))
v = array([1,2])
B = zeros((3,3))
# V01: this works
B[0,0] = v.dot(A[0,0])
# V02: this works
B[:,:] = v[0]*A[:,:,0] + v[1]*A[:,:,1]
# V03: this doesn't
B[:,:] = v.dot(A[:,:])
Why does V03 not work?
Cheers
np.dot(a, b) operates over the last axis of a and the second-to-last axis of b. So for the particular case in your question, you could always go with:
>>> a.dot(v)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
If you want to keep the v.dot(a) order, you need to get the axis into position, which can easily be achieved with np.rollaxis:
>>> v.dot(np.rollaxis(a, 2, 1))
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
I don't like np.dot too much, unless it is for the obvious matrix or vector multiplication, because it is very strict about the output dtype when using the optional out parameter. Joe Kington has mentioned it already, but if you are going to be doing this type of thing, get used to np.einsum: once you get the hang of the syntax, it will cut the amount of time you spend worrying about reshaping things to a minimum:
>>> a = np.ones((3, 3, 2))
>>> np.einsum('i, jki', v, a)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
Not that it is too relevant in this case, but it is also ridiculously fast:
In [4]: %timeit a.dot(v)
100000 loops, best of 3: 2.43 us per loop
In [5]: %timeit v.dot(np.rollaxis(a, 2, 1))
100000 loops, best of 3: 4.49 us per loop
In [7]: %timeit np.tensordot(v, a, axes=(0, 2))
100000 loops, best of 3: 14.9 us per loop
In [8]: %timeit np.einsum('i, jki', v, a)
100000 loops, best of 3: 2.91 us per loop
You can also use tensordot, in this particular case.
import numpy as np
A = np.ones((3,3,2))
v = np.array([1,2])
print(np.tensordot(v, A, axes=(0, 2)))
This yields:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The axes=(0,2) indicates that tensordot should sum over the first axis in v and the third axis in A. (Also have a look at einsum, which is more flexible, but harder to understand if you're not used to the notation.)
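For reference, the einsum equivalent of that tensordot call would be along these lines (a sketch; the repeated subscript contracts v's single axis with A's last axis):

import numpy as np

A = np.ones((3, 3, 2))
v = np.array([1, 2])

# repeated index i is summed; the remaining axes j, k form the (3, 3) output
print(np.einsum('i,jki->jk', v, A))
# [[3. 3. 3.]
#  [3. 3. 3.]
#  [3. 3. 3.]]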
If speed is a consideration, tensordot is considerably faster than using apply_along_axis for small arrays.
In [14]: A = np.ones((3,3,2))
In [15]: v = np.array([1,2])
In [16]: %timeit np.tensordot(v, A, axes=(0, 2))
10000 loops, best of 3: 21.6 us per loop
In [17]: %timeit np.apply_along_axis(v.dot, 2, A)
1000 loops, best of 3: 258 us per loop
(The difference is less apparent for large arrays due to a constant overhead, though tensordot is consistently faster.)
You could use numpy.apply_along_axis() for this:
In [35]: np.apply_along_axis(v.dot, 2, A)
Out[35]:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The reason V03 doesn't work is that it's no different from:
B[:,:] = v.dot(A)
i.e. np.dot tries to contract v with the second-to-last axis of A (length 3), which doesn't match v's length of 2.
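A quick sketch that makes the mismatch visible (the exact wording of the error may vary between numpy versions):

import numpy as np

A = np.ones((3, 3, 2))
v = np.array([1, 2])

# dot contracts v (length 2) with A's second-to-last axis (length 3)
print(v.shape[-1], A.shape[-2])   # 2 3 -> not aligned
try:
    v.dot(A)
except ValueError as e:
    print(e)   # shapes (2,) and (3,3,2) not aligned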
I have two numpy arrays NS, EW to sum up. Each of them has missing values at different positions, like
NS = array([[ 1., 2., nan],
[ 4., 5., nan],
[ 6., nan, nan]])
EW = array([[ 1., 2., nan],
[ 4., nan, nan],
[ 6., nan, 9.]])
How can I perform a summation operation in the numpy way, which will treat nan as zero if only one array has nan at a location, and keep nan if both arrays have nan at the same location?
The result I expect to see is
SUM = array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
When I try
SUM=np.add(NS,EW)
it gives me
SUM=array([[ 2., 4., nan],
[ 8., nan, nan],
[ 12., nan, nan]])
When I try
SUM = np.nansum(np.dstack((NS,EW)),2)
it gives me
SUM=array([[ 2., 4., 0.],
[ 8., 5., 0.],
[ 12., 0., 9.]])
Of course, I can realize my goal by looping over the elements one by one,
for i in range(np.size(NS,0)):
for j in range(np.size(NS,1)):
if np.isnan(NS[i,j]) and np.isnan(EW[i,j]):
SUM[i,j] = np.nan
elif np.isnan(NS[i,j]):
SUM[i,j] = EW[i,j]
elif np.isnan(EW[i,j]):
SUM[i,j] = NS[i,j]
else:
SUM[i,j] = NS[i,j]+EW[i,j]
but it is very slow. So I'm looking for a more numpy-like (vectorized) solution to this problem.
Thanks for help in advance!
Approach #1 : One approach with np.where -
def sum_nan_arrays(a,b):
ma = np.isnan(a)
mb = np.isnan(b)
return np.where(ma&mb, np.nan, np.where(ma,0,a) + np.where(mb,0,b))
Sample run -
In [43]: NS
Out[43]:
array([[ 1., 2., nan],
[ 4., 5., nan],
[ 6., nan, nan]])
In [44]: EW
Out[44]:
array([[ 1., 2., nan],
[ 4., nan, nan],
[ 6., nan, 9.]])
In [45]: sum_nan_arrays(NS, EW)
Out[45]:
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
Approach #2 : Probably a faster one with a mix of boolean-indexing -
def sum_nan_arrays_v2(a,b):
ma = np.isnan(a)
mb = np.isnan(b)
m_keep_a = ~ma & mb
m_keep_b = ma & ~mb
out = a + b
out[m_keep_a] = a[m_keep_a]
out[m_keep_b] = b[m_keep_b]
return out
Runtime test -
In [140]: # Setup input arrays with 4/9 ratio of NaNs (same as in the question)
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000000, replace=0)] = np.nan
...:
In [141]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[141]: 0.0
In [142]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 141 ms per loop
In [143]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 177 ms per loop
In [144]: # Setup input arrays with lesser NaNs
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000, replace=0)] = np.nan
...:
In [145]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[145]: 0.0
In [146]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 69.6 ms per loop
In [147]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 38 ms per loop
Actually, your nansum approach almost worked; you just need to put the nans back in:
def add_ignore_nans(a, b):
stacked = np.array([a, b])
res = np.nansum(stacked, axis=0)
res[np.all(np.isnan(stacked), axis=0)] = np.nan
return res
>>> add_ignore_nans(a, b)
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
This will be slower than @Divakar's answer, but I wanted to mention that you were pretty close already! :-)
I think we can get a bit more concise, in the same vein as Divakar's second approach. With a = NS and b = EW:
na = numpy.isnan(a)
nb = numpy.isnan(b)
a[na] = 0
b[nb] = 0
a += b
na &= nb
a[na] = numpy.nan
The operations are done in-place where possible to save memory, assuming this is feasible in your scenario. The final result is in a.
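If you'd rather not modify NS and EW themselves, a minimal sketch that wraps the same steps while working on copies (the function name is just illustrative):

import numpy as np

def nan_aware_sum(a, b):
    a = a.copy()             # leave the inputs untouched
    b = b.copy()
    na = np.isnan(a)
    nb = np.isnan(b)
    a[na] = 0
    b[nb] = 0
    a += b
    a[na & nb] = np.nan      # nan only where both inputs were nan
    return a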
I'm looking for the fastest way to calculate the squared difference between two vectors ((x1-x2)**2), but pairwise (all combinations or only the upper triangle).
x1 = [1,3,5,6,8]
x2 = [3,6,7,9,12]
Expected output:
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 4., 1., 4., 16., 49.],
[ 9., 0., 1., 9., 36.],
[ 25., 4., 1., 1., 16.]])
or
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 0., 0., 4., 16., 49.],
[ 0., 0., 0., 9., 36.],
[ 0., 0., 0., 0., 16.]])
or even (if faster):
array([ 4., 25., 36., 64., 121., 9., 16., 36., 81.,
4., 1., 4., 16., 49., 9., 1., 9., 36.,
25., 4., 1., 1., 16.])
Here's one approach with broadcasting and masking that picks out the upper-triangular elements and squares only those, for better performance -
def pairwise_squared_diff(x1, x2):
x1 = np.asarray(x1)
x2 = np.asarray(x2)
diffs = x1[:,None] - x2
mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
return (diffs[mask])**2
Sample run -
In [85]: x1
Out[85]: array([1, 3, 5, 6, 8])
In [86]: x2
Out[86]: array([ 3, 6, 7, 9, 12])
In [87]: pairwise_squared_diff(x1, x2)
Out[87]:
array([ 4, 25, 36, 64, 121, 9, 16, 36, 81, 4, 16, 49, 9,
36, 16])
Possible improvements
Improvement #1 :
We could also use np.tri to generate mask -
mask = ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
Improvement #2 :
If we are okay with a 2D output where the lower-triangular elements are set to 0, then a simple elementwise multiplication with the mask gets the final output too -
(diffs*mask)**2
This would work well with the numexpr module for large data, gaining memory efficiency and hence performance.
Improvement #3 :
We could also compute the differences with numexpr and hence compute the masked output with the same evaluate method, giving us a new solution altogether -
def pairwise_squared_diff_numexpr(x1, x2):
x1 = np.asarray(x1)
x2 = np.asarray(x2)
mask = ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
return ne.evaluate('mask*((x1D-x2)**2)',{'x1D':x1[:,None]})
Timings with improvements
Let's study these suggestions on performance for large arrays -
Setup :
In [136]: x1 = np.random.randint(0,9,(1000))
In [137]: x2 = np.random.randint(0,9,(1000))
With Improvement #1 :
In [138]: %timeit np.arange(len(x1))[:,None] <= np.arange(len(x2))
1000 loops, best of 3: 772 µs per loop
In [139]: %timeit ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
1000 loops, best of 3: 243 µs per loop
With Improvement #2 :
In [140]: import numexpr as ne
In [141]: diffs = x1[:,None] - x2
...: mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
In [142]: %timeit (diffs[mask])**2
1000 loops, best of 3: 1.46 ms per loop
In [143]: %timeit ne.evaluate('(diffs*mask)**2')
1000 loops, best of 3: 1.05 ms per loop
With Improvement #3 on complete solutions :
In [170]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.66 ms per loop
In [171]: %timeit pairwise_squared_diff_numexpr(x1, x2)
1000 loops, best of 3: 1.54 ms per loop
Loopy one
For completeness, here's a loopy one that leverages slicing to perform better than the pure broadcasting one, owing to its memory efficiency -
def pairwise_squared_diff_loopy(x1,x2):
n = len(x2)
idx = np.concatenate(( [0], np.arange(n,0,-1).cumsum() ))
start, stop = idx[:-1], idx[1:]
L = n*(n+1)//2
out = np.empty(L,dtype=np.result_type(x1,x2))
for i,(s0,s1) in enumerate(zip(start,stop)):
out[s0:s1] = x1[i] - x2[i:]
return out**2
Timings -
In [300]: x1 = np.random.randint(0,9,(1000))
...: x2 = np.random.randint(0,9,(1000))
In [301]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.44 ms per loop
In [302]: %timeit pairwise_squared_diff_loopy(x1, x2)
100 loops, best of 3: 2.73 ms per loop
You can use broadcasting:
x1 = np.asarray([1,3,5,6,8]).reshape(-1, 1)
x2 = np.asarray([3,6,7,9,12]).reshape(1, -1)
(x1 - x2)**2
Output:
array([[ 4, 25, 36, 64, 121],
[ 0, 9, 16, 36, 81],
[ 4, 1, 4, 16, 49],
[ 9, 0, 1, 9, 36],
[ 25, 4, 1, 1, 16]])
which is simple to code, but computes all values, so it may be optimized to compute only the upper triangle.
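If you then only want the upper triangle, one way to follow up (a sketch using np.triu / np.triu_indices on the broadcast result):

import numpy as np

x1 = np.asarray([1, 3, 5, 6, 8]).reshape(-1, 1)
x2 = np.asarray([3, 6, 7, 9, 12]).reshape(1, -1)
sq = (x1 - x2)**2

upper_2d = np.triu(sq)                 # 2D output, lower triangle zeroed
iu = np.triu_indices(sq.shape[0])      # indices of the upper triangle (incl. diagonal)
upper_flat = sq[iu]                    # flattened upper triangle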
Since collections.Counter is so slow, I am pursuing a faster method of summing mapped values in Python 2.7. It seems like a simple concept and I'm kind of disappointed in the built-in Counter method.
Basically, I need to be able to take arrays like this:
array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.]])
And then "add" them so they look like this:
array([[ 0., 5.],
[ 1., 1.],
[ 2., 7.],
[ 3., 1.]])
If there isn't a good way to do this quickly and efficiently, I'm open to any other ideas that will allow me to do something similar to this, and I'm open to modules other than Numpy.
Thanks!
Edit: Ready for some speedtests?
Intel win 64bit machine. All of the following values are in seconds; 20000 loops.
collections.Counter results:
2.131000, 2.125000, 2.125000
Divakar's union1d + masking results:
1.641000, 1.633000, 1.625000
Divakar's union1d + indexing results:
0.625000, 0.625000, 0.641000
Histogram results:
1.844000, 1.938000, 1.858000
Pandas results:
16.659000, 16.686000, 16.885000
Conclusions: union1d + indexing wins, the array size is too small for Pandas to be effective, and the histogram approach blew my mind with its simplicity but I'm guessing it takes too much overhead to create. All of the responses I received were very good, though. This is what I used to get the numbers. Thanks again!
Edit: And it should be mentioned that using Counter1.update(Counter2.elements()) is terrible despite doing the same exact thing (65.671000 sec).
Later Edit: I've been thinking about this a lot, and I've come to realize that, with Numpy, it might be more effective to fill each array with zeros so that the first column isn't even needed, since we can just use the index; that would also make it much easier to add multiple arrays together and perform other operations. Additionally, Pandas makes more sense than Numpy since there would be no need to 0-fill, and it would definitely be more effective with large data sets (however, Numpy has the advantage of being compatible on more platforms, like GAE, if that matters at all). Lastly, the answer I checked was definitely the best answer for the exact question I asked--adding the two arrays in the way I showed--but I think what I needed was a change in perspective.
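For what it's worth, the 0-filled layout described in that last edit could look roughly like this (a sketch, assuming the keys are small non-negative integers so they can double as positions):

import numpy as np

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])

n = int(max(a[:, 0].max(), b[:, 0].max())) + 1

# dense representation: the position in the vector is the key
dense_a = np.zeros(n)
dense_a[a[:, 0].astype(int)] = a[:, 1]
dense_b = np.zeros(n)
dense_b[b[:, 0].astype(int)] = b[:, 1]

total = dense_a + dense_b   # array([5., 1., 7., 1.])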
Here's one approach with np.union1d and masking -
def app1(a,b):
c0 = np.union1d(a[:,0],b[:,0])
out = np.zeros((len(c0),2))
out[:,0] = c0
mask1 = np.in1d(c0,a[:,0])
out[mask1,1] = a[:,1]
mask2 = np.in1d(c0,b[:,0])
out[mask2,1] += b[:,1]
return out
Sample run -
In [174]: a
Out[174]:
array([[ 0., 2.],
[ 12., 2.],
[ 23., 1.]])
In [175]: b
Out[175]:
array([[ 0., 3.],
[ 1., 1.],
[ 12., 5.]])
In [176]: app1(a,b)
Out[176]:
array([[ 0., 5.],
[ 1., 1.],
[ 12., 7.],
[ 23., 1.]])
Here's another with np.union1d and indexing -
def app2(a,b):
n = np.maximum(a[:,0].max(), b[:,0].max())+1
c0 = np.union1d(a[:,0],b[:,0])
out0 = np.zeros((int(n), 2))
out0[a[:,0].astype(int),1] = a[:,1]
out0[b[:,0].astype(int),1] += b[:,1]
out = out0[c0.astype(int)]
out[:,0] = c0
return out
For the case where all indices are covered by the first column values in a and b -
def app2_specific(a,b):
c0 = np.union1d(a[:,0],b[:,0])
n = c0[-1]+1
out0 = np.zeros((int(n), 2))
out0[a[:,0].astype(int),1] = a[:,1]
out0[b[:,0].astype(int),1] += b[:,1]
out0[:,0] = c0
return out0
Sample run -
In [234]: a
Out[234]:
array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
In [235]: b
Out[235]:
array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.]])
In [236]: app2_specific(a,b)
Out[236]:
array([[ 0., 5.],
[ 1., 1.],
[ 2., 7.],
[ 3., 1.]])
If you know the number of fields, use np.bincount.
c = np.vstack([a, b])
counts = np.bincount(c[:, 0].astype(int), weights=c[:, 1], minlength=numFields)
out = np.vstack([np.arange(numFields), counts]).T
This works if you're getting all your data at once. Make a list of your arrays and vstack them. If you're getting data chunks sequentially, you can use np.add.at to do the same thing.
out = np.zeros((numFields, 2))
out[:, 0] = np.arange(numFields)
np.add.at(out[:, 1], a[:, 0].astype(int), a[:, 1])
np.add.at(out[:, 1], b[:, 0].astype(int), b[:, 1])
You can use a basic histogram, this will deal with gaps, too. You can filter out zero-count entries if need be.
import numpy as np
x = np.array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
y = np.array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.],
[ 5., 3.]])
c, w = np.vstack((x,y)).T
h, b = np.histogram(c, weights=w,
bins=np.arange(c.min(),c.max()+2))
r = np.vstack((b[:-1], h)).T
print(r)
# [[ 0. 5.]
# [ 1. 1.]
# [ 2. 7.]
# [ 3. 1.]
# [ 4. 0.]
# [ 5. 3.]]
r_nonzero = r[r[:,1]!=0]
Pandas has functions that do exactly what you intend:
import pandas as pd
pda = pd.DataFrame(a).set_index(0)
pdb = pd.DataFrame(b).set_index(0)
result = pd.concat([pda, pdb], axis=1).fillna(0).sum(axis=1)
Edit: If you actually need the data back in numpy format, just do
array_res = result.reset_index(name=1).values
This is a quintessential grouping problem, which numpy_indexed (disclaimer: I am its author) was created to solve elegantly and efficiently:
import numpy_indexed as npi
C = np.concatenate([A, B], axis=0)
labels, sums = npi.group_by(C[:, 0]).sum(C[:, 1])
Note: it's cleaner to maintain your labels as a separate int array; floats are finicky when it comes to labeling things, with positive and negative zeros, and printed values not conveying their full binary state. Better to use ints for that.
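A possible sketch of that layout with numpy_indexed, keeping an int key array alongside a float value array (names and expected values are illustrative):

import numpy as np
import numpy_indexed as npi

A = np.array([[0., 2.], [2., 2.], [3., 1.]])
B = np.array([[0., 3.], [1., 1.], [2., 5.]])

keys = np.concatenate([A[:, 0], B[:, 0]]).astype(int)   # integer labels
vals = np.concatenate([A[:, 1], B[:, 1]])
labels, sums = npi.group_by(keys).sum(vals)
# expected: labels ~ [0, 1, 2, 3], sums ~ [5., 1., 7., 1.]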
Given an ndarray x and a one-dimensional array containing the lengths of contiguous slices along a dimension of x, I want to compute a new array containing the sum of each slice. For example, in two dimensions, summing over dimension one:
>>> lens = np.array([1, 3, 2])
array([1, 3, 2])
>>> x = np.arange(4 * lens.sum()).reshape((4, lens.sum())).astype(float)
array([[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.],
[ 18., 19., 20., 21., 22., 23.]])
# I want to compute:
>>> result
array([[ 0., 6., 9.],
[ 6., 24., 21.],
[ 12., 42., 33.],
[ 18., 60., 45.]])
# 0 = 0
# 6 = 1 + 2 + 3
# ...
# 45 = 22 + 23
The two ways that come to mind are:
a) Use cumsum and fancy indexing:
def cumsum_method(x, lens):
xc = x.cumsum(1)
lc = lens.cumsum() - 1
res = xc[:, lc]
res[:, 1:] -= xc[:, lc[:-1]]
return res
b) Use bincount and intelligently generate the appropriate bins:
def bincount_method(x, lens):
bins = np.arange(lens.size).repeat(lens) + \
np.arange(x.shape[0])[:, None] * lens.size
return np.bincount(bins.flat, weights=x.flat).reshape((-1, lens.size))
Timing these two on large input had the cumsum method performing slightly better:
>>> lens = np.random.randint(1, 100, 100)
>>> x = np.random.random((100000, lens.sum()))
>>> %timeit cumsum_method(x, lens)
1 loops, best of 3: 3 s per loop
>>> %timeit bincount_method(x, lens)
1 loops, best of 3: 3.9 s per loop
Is there an obviously more efficient way that I'm missing? It seems like a native c call would be faster because it wouldn't require allocating the cumsum or the bins array. A numpy builtin function that does something close to this could likely be better than (a) or (b). I couldn't find anything through searching and looking through the documentation.
Note, this is similar to this question, but the summation intervals aren't regular.
You can use np.add.reduceat:
>>> np.add.reduceat(x, [0, 1, 4], axis=1)
array([[ 0., 6., 9.],
[ 6., 24., 21.],
[ 12., 42., 33.],
[ 18., 60., 45.]])
The list of indices [0, 1, 4] means: "sum the slices 0:1, 1:4 and 4:". You could generate these values from lens using np.hstack(([0], lens[:-1])).cumsum().
Even factoring in the calculation of the indices from lens, a reduceat method is likely to be significantly faster than alternative methods:
def reduceat_method(x, lens):
i = np.hstack(([0], lens[:-1])).cumsum()
return np.add.reduceat(x, i, axis=1)
lens = np.random.randint(1, 100, 100)
x = np.random.random((1000, lens.sum()))
%timeit reduceat_method(x, lens)
# 100 loops, best of 3: 4.89 ms per loop
%timeit cumsum_method(x, lens)
# 10 loops, best of 3: 35.8 ms per loop
%timeit bincount_method(x, lens)
# 10 loops, best of 3: 43.6 ms per loop
I want to center multi-dimensional data in an n x m matrix (<class 'numpy.matrixlib.defmatrix.matrix'>), let's say X. I defined a new array ones(645), let's say centVector, to hold the mean of every row in matrix X. Now I want to iterate over every row in X, compute its mean, and assign that value to the corresponding index in centVector. Isn't this possible in a single line in scipy/numpy? I am not used to this language and am thinking of something like:
centVector = ones(645)
for key, val in X:
centVector[key] = centVector[key] * (val.sum/val.size)
Afterwards I just need to subtract the mean from every row:
X = X - centVector
How can I simplify this?
EDIT: Besides, the above code does not actually work - for a key-value loop I would need something like enumerate(X). And I am not sure whether X - centVector returns the proper result.
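For reference, the loop from the question would need roughly this shape just to run (val.sum has to be called as a method and enumerate supplies the row index); the broadcasting concern is addressed in the answer below:

import numpy as np

X = np.matrix(np.arange(25).reshape((5, 5)))   # small stand-in for the 645-row matrix
centVector = np.ones(X.shape[0])
for key, val in enumerate(X):
    centVector[key] = val.sum() / val.size     # mean of row `key`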
First, some example data:
>>> import numpy as np
>>> X = np.matrix(np.arange(25).reshape((5,5)))
>>> print X
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
numpy conveniently has a mean function. By default however, it'll give you the mean over all the values in the array. Since you want the mean of each row, you need to specify the axis of the operation:
>>> np.mean(X, axis=1)
matrix([[ 2.],
[ 7.],
[ 12.],
[ 17.],
[ 22.]])
Note that axis=1 says: find the mean along the columns (for each row), where 0 = rows and 1 = columns (and so on). Now, you can subtract this mean from your X, as you did originally.
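Since np.mean on a matrix keeps the column shape here, subtracting it broadcasts across the rows; for the example above that gives:

>>> X - np.mean(X, axis=1)
matrix([[-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.]])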
Unsolicited advice
Usually, it's best to avoid the matrix class (see docs). If you remove the np.matrix call from the example data, then you get a normal numpy array.
Unfortunately, in this particular case, using an array slightly complicates things because np.mean will return a 1D array:
>>> X = np.arange(25).reshape((5,5))
>>> r_means = np.mean(X, axis=1)
>>> print r_means
[ 2. 7. 12. 17. 22.]
If you try to subtract this from X, r_means gets broadcast to a row vector, instead of a column vector:
>>> X - r_means
array([[ -2., -6., -10., -14., -18.],
[ 3., -1., -5., -9., -13.],
[ 8., 4., 0., -4., -8.],
[ 13., 9., 5., 1., -3.],
[ 18., 14., 10., 6., 2.]])
So, you'll have to reshape the 1D array into an N x 1 column vector:
>>> X - r_means.reshape((-1, 1))
array([[-2., -1., 0., 1., 2.],
[-2., -1., 0., 1., 2.],
[-2., -1., 0., 1., 2.],
[-2., -1., 0., 1., 2.],
[-2., -1., 0., 1., 2.]])
The -1 passed to reshape tells numpy to figure out this dimension based on the original array shape and the rest of the dimensions of the new array. Alternatively, you could have reshaped the array using r_means[:, np.newaxis].
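For example, with the newaxis form:

>>> X - r_means[:, np.newaxis]
array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])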