Broadcasting using numpy's sum function - python

I was reading about broadcasting and was trying to understand it using numpy's sum function.
I created two matrices :
m1 = np.array([[1,2,3],[4,5,6]]) # shape (2, 3): 2 rows, 3 columns
m2 = np.array([[1],[2]]) # 2X1
When I add the above two as :
m1 + m2
broadcasting takes place: the column vector [[1],[2]] is replicated across the columns of m1. Is it also possible to see broadcasting using np.sum(m1,m2)? I assumed there was no difference between m1 + m2 and np.sum(m1,m2), but np.sum(m1,m2) raises TypeError: only integer scalar arrays can be converted to a scalar index.
Can't I have numpy perform broadcasting when I use its sum function?

numpy.sum does not add two arrays; it sums the entries of a single array over one axis, several axes, or (by default) all of them. The second argument is the axis (or axes) to sum over, so a multi-dimensional array is not valid there.
These are examples of how numpy.sum works:
m1 = np.arange(12).reshape((3,4))
# sum all entries
np.sum(m1) # 66
# sum along the first axis, getting a result for each column
np.sum(m1, 0) # array([12, 15, 18, 21])
m2 = np.arange(12).reshape((2,3,2))
# sum along two of the three axes
m2.sum((1,2)) # array([15, 51])
What you might be looking for is numpy.add. It adds two arrays element-wise (just like +) and accepts extra options: given an out array and a where mask, it writes results only where the mask is True and leaves the other entries of out untouched. Otherwise it behaves as you would expect if you know the numpy broadcasting rules:
m1 = np.array([[1,2,3],[4,5,6]]) # shape (2, 3): 2 rows, 3 columns
m2 = np.array([[1],[2]]) # 2X1
m1 + m2
# array([[2, 3, 4],
# [6, 7, 8]])
np.add(m1, m2)
# array([[2, 3, 4],
# [6, 7, 8]])
And here is an example of the more advanced usage:
m1 = m1.astype(float)
m1[1, 1] = np.inf
m1
# array([[ 1., 2., 3.],
# [ 4., inf, 6.]])
out = np.zeros_like(m1)
where = np.ones_like(m1, dtype=bool)
where[1, 1] = False # don't want that infinity in the sum
np.add(m1, m2, out, where=where)
# array([[ 2., 3., 4.],
# [ 6., 0., 8.]])
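To make the replication described in the question visible, here is a minimal sketch that stretches m2 explicitly with np.broadcast_to before adding. The variable names m2_stretched and same are only illustrative; NumPy does not need this step, it performs the same stretching implicitly.

```python
import numpy as np

m1 = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
m2 = np.array([[1], [2]])              # shape (2, 1)

# Read-only view of m2 stretched to m1's shape; no data is copied.
m2_stretched = np.broadcast_to(m2, m1.shape)
print(m2_stretched)

# Adding the stretched view matches what m1 + m2 does implicitly.
same = np.array_equal(m1 + m2, m1 + m2_stretched)
print(same)
```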

You actually kind of can make sum broadcast:
>>> import numpy as np
>>>
>>> a, b, c = np.ogrid[:2, :3, :4]
>>> d = b*c
>>> list(map(np.shape, (a, b, c, d)))
[(2, 1, 1), (1, 3, 1), (1, 1, 4), (1, 3, 4)]
>>>
>>> a+b+c+d
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
>>> np.sum([a, b, c, d])
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
I suspect this creates a 4-element array of dtype object and then delegates the actual summing to the element arrays.
Unfortunately, np.array can at times be capricious with this kind of array of arrays. Indeed, an example known to defeat np.array also trips up np.sum, even though the actual error doesn't appear to be raised in np.array itself:
>>> np.sum([np.arange(3), 1]) # fine
array([1, 2, 3])
>>> np.sum([1, np.arange(3)]) # ouch!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/paul/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1882, in sum
out=out, **kwargs)
File "/home/paul/lib/python3.6/site-packages/numpy/core/_methods.py", line 32, in _sum
return umr_sum(a, axis, dtype, out, keepdims)
ValueError: setting an array element with a sequence.
So, on balance, it is probably better to go with the builtin Python sum:
>>> sum([a, b, c, d])
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
>>> sum([1, np.arange(3)])
array([1, 2, 3])
>>> sum([np.arange(3), 1])
array([1, 2, 3])
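Another option, sketched here under the assumption that the goal is simply to add several broadcastable arrays, is to fold np.add over the list with functools.reduce. This avoids building an array of arrays altogether, which may matter because newer NumPy releases can refuse to create such ragged arrays. The names total and mixed are only illustrative.

```python
from functools import reduce

import numpy as np

a, b, c = np.ogrid[:2, :3, :4]
d = b * c

# Fold np.add over the list pairwise; each step broadcasts just like `+`.
total = reduce(np.add, [a, b, c, d])
print(total.shape)

# A scalar in first position is not a problem here.
mixed = reduce(np.add, [1, np.arange(3)])
print(mixed)
```

Like the builtin sum, this never passes the whole list to the array constructor, so the order of the operands does not matter.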

Related

What is the difference between (13027,) and (13027,1) in numpy expand_dims()

These are two outputs in a chunk of code after I call .shape on a variable b before and after applying np.expand_dims(b, axis=1).
I see that the _dims part may seem like a dead giveaway, but the outputs don't seem to be different, except for, perhaps, turning a row vector into a column vector (?):
b is [208. 193. 208. ... 46. 93. 200.], a row vector, but np.expand_dims(b, axis=1) gives:
[[208.]
[193.]
[208.]
...
[ 46.]
[ 93.]
[200.]]
Which could be interpreted as a column vector (?), as opposed to any increased number of dimensions.
What is the difference between (13027,) and (13027,1)
They are arrays of different dimensions and some operations apply to them differently. For example
>>> a = np.arange(5)
>>> b = np.arange(5, 10)
>>> a + b
array([ 5, 7, 9, 11, 13])
>>> np.expand_dims(a, axis=1) + b
array([[ 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10],
[ 7, 8, 9, 10, 11],
[ 8, 9, 10, 11, 12],
[ 9, 10, 11, 12, 13]])
The last result is an example of what we call broadcasting, which you can read about in the numpy docs, or even in this SO question.
Basically, np.expand_dims adds new axes at the specified positions, and all of the following achieve the same result:
>>> a.shape
(5,)
>>> np.expand_dims(a, axis=(0, 2)).shape
(1, 5, 1)
>>> a[None,:,None].shape
(1, 5, 1)
>>> a[np.newaxis,:,np.newaxis].shape
(1, 5, 1)
Note that in numpy the transpose of a 1D array is still a 1D array. It isn't like in MATLAB where a row vector turns to a column vector.
>>> a
array([0, 1, 2, 3, 4])
>>> a.T
array([0, 1, 2, 3, 4])
>>> a.T.shape
(5,)
So in order to turn it into a "column vector" you have to change the array from shape (N,) to (N, 1) by adding a new axis (or by reshaping). But you're better off treating it as a 2D array of N rows with 1 element per row.
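As a small sketch of that conversion, the following shows three equivalent ways to go from shape (N,) to (N, 1). The variable names are only illustrative.

```python
import numpy as np

a = np.arange(5)                        # shape (5,)

col_index = a[:, np.newaxis]            # index with a new trailing axis
col_reshape = a.reshape(-1, 1)          # -1 lets NumPy infer the row count
col_expand = np.expand_dims(a, axis=1)  # same result via expand_dims

print(col_index.shape, col_reshape.shape, col_expand.shape)

# Unlike the 1D case, transposing the 2D column does change the shape.
print(col_index.T.shape)
```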
(13027,) is a 1D array whose only axis (axis 0) has length 13027, while (13027,1) is a 2D array with an extra axis (axis 1) of length 1.
https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html
It's like "i" where i = 0 by default so if you don't explicitly define it, it will start at 0.

Numpy concatenate lists where first column is in range n

I am trying to select all rows of a numpy matrix named matrix, with shape (25323, 9), whose first-column value lies between start and end for each tuple in the list range_tuples. Ultimately, I want a new numpy matrix final of shape (n, 9) holding the result. The following code raises TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1,9)) before calling np.concatenate, with similar results. The code does run when I use final.append(result) instead of np.concatenate, but then the shape of the matrix is lost. I know there is a proper solution to this problem; any help would be appreciated.
final = []
for i in range_tuples:
    copy = np.copy(matrix)
    start = i[0]
    end = i[1]
    result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist=[]
    ...: for start, stop in tups:
    ...:     res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
    ...:     alist.append(res)
    ...:
Check the list; note that the elements differ in shape: some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
[3, 4, 5]]), array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]), array([[ 9, 10, 11]]), array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
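If duplicates are not wanted, a different technique is to build one boolean mask for all ranges at once with broadcasting. This is only a sketch and it is not equivalent to the loop above: each row is selected at most once, even if it falls in several ranges. The names bounds, first_col, in_range and selected are illustrative.

```python
import numpy as np

arr = np.arange(24).reshape(8, 3)
tups = [(0, 6), (3, 12), (9, 10), (15, 14)]

bounds = np.array(tups)                 # shape (n_ranges, 2)
starts, stops = bounds[:, 0], bounds[:, 1]

first_col = arr[:, 0][:, np.newaxis]    # shape (n_rows, 1)
# (n_rows, 1) compared with (n_ranges,) broadcasts to (n_rows, n_ranges).
in_range = (first_col >= starts) & (first_col < stops)

# Keep every row that falls inside at least one range.
selected = arr[in_range.any(axis=1)]
print(selected)
```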
Try the following
final = np.empty((0,9))
for start, stop in range_tuples:
    result = matrix[(matrix[:,0] < stop) & (matrix[:,0] > start)]
    final = np.concatenate((final, result))
There are two changes. The first is to initialize final as a numpy array with zero rows. The second is to pass the arrays to concatenate together as a single sequence (a tuple or list), see docs. In your code, concatenate interprets result as the value of the axis parameter, which causes the TypeError.
Notes
I used tuple unpacking to make the loop clearer.
The copy is not needed.
Appending to a Python list can be faster than concatenating on every iteration. The final array can be built afterwards in one step (by reshaping, if result always has the same length).
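The difference between the two call forms can be sketched as follows, using small stand-in arrays with 3 columns instead of the 9 in the question. The raised flag is only there to show that the second call fails with the TypeError reported in the question.

```python
import numpy as np

final = np.empty((0, 3))   # zero rows, so the first concatenate is a no-op
result = np.ones((2, 3))

# Correct: both arrays are passed together as one sequence.
joined = np.concatenate((final, result), axis=0)
print(joined.shape)

# Incorrect: the second array lands in the `axis` parameter.
try:
    np.concatenate(final, result)
    raised = False
except TypeError:
    raised = True
print(raised)
```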
I would simply create a boolean mask to select rows that satisfy required conditions.
EDIT: I missed that you are working with matrix (as opposed to ndarray). The answer was edited for matrix.
Assume following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy.matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # boolean-mask indexing already returns a copy, not a view
To check:
print(final)
[[1 2 3]
[2 1 7]
[8 9 0]]
If the length of range_tuple can differ from the number of rows in the matrix, then do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
    np.ravel(
        (matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
    ),
    (0, matrix.shape[0] - n)
)
final = matrix[mask]

Numpy concatenate for arrays with same rows but different columns

I have a list of arrays that contain the same rows, but different columns.
I printed out the shape of the array and checked that they have same rows.
print ("Type x_test : actual",type(x_dump),x_dump.shape, type(actual), actual.shape, pred.shape)
cmp = np.concatenate([x_test,actual,pred],axis = 1)
('Type x_test : actual', <type 'numpy.ndarray'>, (2420L, 4719L), <type 'numpy.ndarray'>, (2420L,), (2420L,))
This gives me an error:
ValueError: all the input arrays must have same number of dimensions
I tried to replicate this error using the below commands:
x.shape,x1.shape,x2.shape
Out[772]: ((3L, 1L), (3L, 4L), (3L, 1L))
np.concatenate([x,x1,x2],axis=1)
Out[764]:
array([[ 0, 0, 1, 2, 3, 0],
[ 1, 4, 5, 6, 7, 1],
[ 2, 8, 9, 10, 11, 2]])
I don't get any error here. Is anyone facing a similar issue?
EDIT 1: Right after writing this question, I figured out that the dimensions are different.
@Gareth Rees has explained beautifully the difference between numpy arrays of shape (R,1) and (R,) here.
Fixed using:
# Reshape and concatenate
actual = actual.reshape(len(actual),1)
pred = pred.reshape(len(pred),1)
EDIT 2: Marking this question to be closed as a duplicate of Difference between numpy.array shape (R, 1) and (R,).
EDIT
After posting this, the OP figured out the error. This can be ignored unless one needs to see the construct with shaped (R,1) versus (R,). In any event, it will give the down voters practice space.
ORIGINAL
Given your shapes, the answer is correct.
a = np.arange(3).reshape(3,1)
b = np.arange(12).reshape(3,4)
c = np.arange(3).reshape(3,1)
np.concatenate([a, b, c], axis=1)
Out[4]:
array([[ 0, 0, 1, 2, 3, 0],
[ 1, 4, 5, 6, 7, 1],
[ 2, 8, 9, 10, 11, 2]])
a
Out[5]:
array([[0],
[1],
[2]])
b
Out[6]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
c
Out[7]:
array([[0],
[1],
[2]])
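For the shapes in the question, where actual and pred are 1D, the following sketch shows two ways to get the concatenation to work. The small arrays are stand-ins for the question's data; cmp_a and cmp_b are illustrative names.

```python
import numpy as np

x_test = np.arange(12).reshape(3, 4)   # shape (3, 4)
actual = np.array([0, 1, 0])           # shape (3,)
pred = np.array([1, 1, 0])             # shape (3,)

# Option 1: add a trailing axis so that every input is 2D.
cmp_a = np.concatenate(
    [x_test, actual[:, np.newaxis], pred[:, np.newaxis]], axis=1
)

# Option 2: np.column_stack turns 1D inputs into columns by itself.
cmp_b = np.column_stack([x_test, actual, pred])

print(cmp_a.shape, np.array_equal(cmp_a, cmp_b))
```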

Numpy assignment like 'numpy.take'

Is it possible to assign to a numpy array along the lines of how the take functionality works?
E.g. if I have an array a, a list of indices inds, and a desired axis, I can use take as follows:
import numpy as np
a = np.arange(12).reshape((3, -1))
inds = np.array([1, 2])
print(np.take(a, inds, axis=1))
[[ 1 2]
[ 5 6]
[ 9 10]]
This is extremely useful when the indices / axis needed may change at runtime.
However, numpy does not let you do this:
np.take(a, inds, axis=1) = 0
print(a)
It looks like there is some limited (1-D) support for this via numpy.put, but I was wondering if there was a cleaner way to do this?
In [222]: a = np.arange(12).reshape((3, -1))
...: inds = np.array([1, 2])
...:
In [223]: np.take(a, inds, axis=1)
Out[223]:
array([[ 1, 2],
[ 5, 6],
[ 9, 10]])
In [225]: a[:,inds]
Out[225]:
array([[ 1, 2],
[ 5, 6],
[ 9, 10]])
Construct an indexing tuple:
In [226]: idx=[slice(None)]*a.ndim
In [227]: axis=1
In [228]: idx[axis]=inds
In [229]: a[tuple(idx)]
Out[229]:
array([[ 1, 2],
[ 5, 6],
[ 9, 10]])
In [230]: a[tuple(idx)] = 0
In [231]: a
Out[231]:
array([[ 0, 0, 0, 3],
[ 4, 0, 0, 7],
[ 8, 0, 0, 11]])
Or for a[inds,:]:
In [232]: idx=[slice(None)]*a.ndim
In [233]: idx[0]=inds
In [234]: a[tuple(idx)]
Out[234]:
array([[ 4, 0, 0, 7],
[ 8, 0, 0, 11]])
In [235]: a[tuple(idx)]=1
In [236]: a
Out[236]:
array([[0, 0, 0, 3],
[1, 1, 1, 1],
[1, 1, 1, 1]])
PP's suggestion:
def put_at(inds, axis=-1, slc=(slice(None),)):
    return (axis<0)*(Ellipsis,) + axis*slc + (inds,) + (-1-axis)*slc
To be used as in a[put_at(ind_list,axis=axis)]
I've seen both styles in numpy functions. This looks like the one used for expand_dims; mine was used in apply_along_axis and apply_over_axes.
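The indexing-tuple construction can be wrapped in a small helper so that the axis is chosen at runtime. This is only a sketch; assign_along_axis is a hypothetical name, not a NumPy function.

```python
import numpy as np

def assign_along_axis(arr, inds, value, axis):
    """Set the entries selected by `inds` along `axis` to `value`, in place."""
    idx = [slice(None)] * arr.ndim   # start with a full slice on every axis
    idx[axis] = inds                 # replace the slice on the chosen axis
    arr[tuple(idx)] = value

a = np.arange(12).reshape((3, -1))
assign_along_axis(a, np.array([1, 2]), 0, axis=1)
print(a)
```

Because idx is a plain list, a negative axis such as -1 selects the last axis without extra handling.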
earlier thoughts
In a recent take question I/we figured out that it was equivalent to arr.flat[ind] for some raveled index. I'll have to look that up.
There is an np.put that is equivalent to assignment to the flat:
Signature: np.put(a, ind, v, mode='raise')
Docstring:
Replaces specified elements of an array with given values.
The indexing works on the flattened target array. `put` is roughly
equivalent to:
a.flat[ind] = v
Its docs also mention place and putmask (and copyto).
numpy multidimensional indexing and the function 'take'
I commented take (without axis) is equivalent to:
lut.flat[np.ravel_multi_index(arr.T, lut.shape)].T
with ravel:
In [257]: a = np.arange(12).reshape((3, -1))
In [258]: IJ=np.ix_(np.arange(a.shape[0]), inds)
In [259]: np.ravel_multi_index(IJ, a.shape)
Out[259]:
array([[ 1, 2],
[ 5, 6],
[ 9, 10]], dtype=int32)
In [260]: np.take(a,np.ravel_multi_index(IJ, a.shape))
Out[260]:
array([[ 1, 2],
[ 5, 6],
[ 9, 10]])
In [261]: a.flat[np.ravel_multi_index(IJ, a.shape)] = 100
In [262]: a
Out[262]:
array([[ 0, 100, 100, 3],
[ 4, 100, 100, 7],
[ 8, 100, 100, 11]])
and to use put:
In [264]: np.put(a, np.ravel_multi_index(IJ, a.shape), np.arange(1,7))
In [265]: a
Out[265]:
array([[ 0, 1, 2, 3],
[ 4, 3, 4, 7],
[ 8, 5, 6, 11]])
Using ravel is unnecessary in this case but might be useful in others.
I have given an example of using numpy.take in 2 dimensions. Perhaps you can adapt that to your problem.
You can just use indexing in this way:
a[:,[1,2]]=0

How to add a dimension to a numpy array in Python

I have an array of size (214, 144) and need it to be (214,144,1). Is there an easy way to do this in Python? The dimensions are supposed to be (Days, Times, Stations); since I only have 1 station's data, that dimension would be 1. It would be great if the code were also flexible enough to work for, say, 2 stations (e.g. changing the dimension size from (428,288) to (214,144,2)).
You could use reshape:
>>> a = numpy.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
>>> a.shape
(2, 6)
>>> a.reshape((2, 6, 1))
array([[[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6]],
[[ 7],
[ 8],
[ 9],
[10],
[11],
[12]]])
>>> _.shape
(2, 6, 1)
Besides changing the shape from (x, y) to (x, y, 1), you could use (x, y/n, n) as well, but you may want to specify the column order depending on the input:
>>> a.reshape((2, 3, 2))
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]]])
>>> a.reshape((2, 3, 2), order='F')
array([[[ 1, 4],
[ 2, 5],
[ 3, 6]],
[[ 7, 10],
[ 8, 11],
[ 9, 12]]])
1) To add a dimension to an array a of arbitrary dimensionality:
b = numpy.reshape(a, list(numpy.shape(a)) + [1])
Explanation:
You get the shape of a, turn it into a list, append 1 to that list, and use the result as the new shape in a reshape operation.
2) To specify subdivisions of the dimensions, and have the size of the last dimension calculated automatically, use -1 for the size of the last dimension, e.g.:
b = numpy.reshape(a, [numpy.size(a,0)//2, numpy.size(a,1)//2, -1])
(// is integer division; in Python 3, / would produce floats, which reshape rejects.)
For an a of shape (428, 288), the shape of b will be (214, 144, 4).
(Obviously you could combine the two approaches if necessary):
b = numpy.reshape(a, numpy.append(numpy.array(numpy.shape(a))//2, -1))
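If each station's data arrives as its own (Days, Times) array, a sketch of an alternative to reshaping is to add the trailing axis by indexing and to combine stations with np.stack. The arrays station_a and station_b are made-up placeholders for real data.

```python
import numpy as np

station_a = np.zeros((214, 144))   # placeholder data for one station
station_b = np.ones((214, 144))    # placeholder data for a second station

# One station: add a trailing axis of length 1.
one_station = station_a[..., np.newaxis]
same_thing = np.expand_dims(station_a, axis=-1)

# Two stations: stack along a new last axis.
two_stations = np.stack([station_a, station_b], axis=-1)

print(one_station.shape, same_thing.shape, two_stations.shape)
```

Stacking keeps each station's values together along the last axis, which a plain reshape of a (428, 288) array does not guarantee.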
