sum in numpy arrays with different dimensions - python

I have a 3D numpy array A with shape (10227,127,340) and a 1D array B with shape (10227,), both of float64. I just want to add B to A along the first axis, i.e. at each of the 127x340 grid points.
The output array should be C with shape (10227,127,340), with the values changed by the sum along the first axis, of course.

This may be one way of achieving it:
C = A + np.repeat(B, A.shape[1] * A.shape[2]).reshape(A.shape)
The following code also works but is much slower:
C = A.copy()
for i in range(A.shape[0]):
    C[i] += B[i]

A more compact way of doing this uses numpy's fancy dimensional indexing tools:
C = A + B[:,None,None]
This automatically expands B to the dimensions of A along the selected axes (indicated by None or np.newaxis).
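As a quick sanity check (a minimal sketch with smaller, made-up shapes rather than the original data), the broadcasting version agrees with the repeat-based one:
import numpy as np

# small stand-in shapes instead of (10227, 127, 340)
A = np.random.rand(6, 4, 5)
B = np.random.rand(6)

C_repeat = A + np.repeat(B, A.shape[1] * A.shape[2]).reshape(A.shape)
C_broadcast = A + B[:, None, None]   # B[:, None, None] has shape (6, 1, 1)
print(np.allclose(C_repeat, C_broadcast))   # True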

Related

Python: vectorization over numpy array

X and Y are both 3d arrays with dimensions (a,b,c). My goal is to do a dot product.
Consider the case where indices i and j are scalars; then (X[i,:,j].T).dot(Y[i,:,j]) is simple and returns a scalar.
However, if I try to do vectorization, i and j become 1D arrays, and (X[i,:,j].T).dot(Y[i,:,j]) returns a matrix, but I am expecting a 1D array as the result. How do I get around this problem?
Naive implementation using list comprehension:
a,b,c = X.shape
r1 = [(X[i,:,j].T).dot(Y[i,:,j]) for i in range(a) for j in range(c)]
Implementation using np.einsum:
r2 = np.einsum('ijk,ijk->ik', X,Y).flatten()
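A small sanity check (a sketch with random inputs; it relies on the i-then-j ordering of the list comprehension lining up with the flattened 'ik' output, which it does):
import numpy as np

X = np.random.rand(4, 5, 6)
Y = np.random.rand(4, 5, 6)
a, b, c = X.shape

# naive list comprehension: one scalar dot product per (i, j) pair
r1 = [(X[i, :, j].T).dot(Y[i, :, j]) for i in range(a) for j in range(c)]
# einsum: sum over the middle axis, then flatten the (a, c) result
r2 = np.einsum('ijk,ijk->ik', X, Y).flatten()
print(np.allclose(r1, r2))   # True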

How to efficiently operate on sub-arrays, like calculating determinants and inverses

I have to do multiple operations on sub-arrays, like matrix inversions or building determinants. Since for-loops are not very fast in Python, I wonder what the best way to do this is.
import numpy as np
n = 8
a = np.random.rand(3,3,n)
b = np.empty(n)
c = np.zeros_like(a)
for i in range(n):
    b[i] = np.linalg.det(a[:,:,i])
    c[:,:,i] = np.linalg.inv(a[:,:,i])
Those numpy.linalg functions accept n-dimensional arrays as long as the last two axes are the ones that form the 2D slices the functions are meant to operate on. Hence, to solve our case, permute the axes to bring the iteration axis to the front, perform the required operation, and, if needed, push that axis back to its original place.
We could thus get those outputs like so -
b = np.linalg.det(np.moveaxis(a,2,0))
c = np.moveaxis(np.linalg.inv(np.moveaxis(a,2,0)),0,2)
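As a quick check against the loop version (random data, small n; just a sketch to confirm the two approaches agree):
import numpy as np

n = 8
a = np.random.rand(3, 3, n)

# loop version
b_loop = np.empty(n)
c_loop = np.zeros_like(a)
for i in range(n):
    b_loop[i] = np.linalg.det(a[:, :, i])
    c_loop[:, :, i] = np.linalg.inv(a[:, :, i])

# vectorized version: move the iteration axis to the front
b_vec = np.linalg.det(np.moveaxis(a, 2, 0))
c_vec = np.moveaxis(np.linalg.inv(np.moveaxis(a, 2, 0)), 0, 2)
print(np.allclose(b_loop, b_vec), np.allclose(c_loop, c_vec))   # True True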

What is the difference between an array with shape (N,1) and one with shape (N)? And how to convert between the two?

Python newbie here coming from a MATLAB background.
I have a 1 column array and I want to move that column into the first column of a 3 column array. With a MATLAB background this is what I would do:
import numpy as np
A = np.zeros([150,3]) #three column array
B = np.ones([150,1]) #one column array which needs to replace the first column of A
#MATLAB-style solution:
A[:,0] = B
However this does not work because the "shape" of A is (150,3) and the "shape" of B is (150,1). And apparently the command A[:,0] results in a "shape" of (150,).
Now, what is the difference between (150,1) and (150,)? Aren't they the same thing: a column vector? And why isn't Python "smart enough" to figure out that I want to put the column vector, B, into the first column of A?
Is there an easy way to convert a column vector with shape (N,1) to a 1D array with shape (N,)?
I am new to Python and this seems like a really silly thing that MATLAB does much better...
Several things are different. In numpy, arrays may be 0d, 1d, or higher. In MATLAB, 2d is the smallest (and at one time the only) dimensionality. MATLAB readily expands dimensions at the end because it is Fortran ordered; numpy is by default C ordered, and most readily expands dimensions at the front.
In [1]: A = np.zeros([5,3])
In [2]: A[:,0].shape
Out[2]: (5,)
Simple indexing reduces a dimension, regardless of whether it's A[0,:] or A[:,0]. Contrast that with what happens to a 3d MATLAB matrix, A(1,:,:) vs A(:,:,1).
numpy does broadcasting, adjusting dimensions during operations like sum and assignment. One basic rule is that dimensions may be automatically expanded toward the start if needed:
In [3]: A[:,0] = np.ones(5)
In [4]: A[:,0] = np.ones([1,5])
In [5]: A[:,0] = np.ones([5,1])
...
ValueError: could not broadcast input array from shape (5,1) into shape (5)
It can change (5,) LHS to (1,5), but can't change it to (5,1).
Another broadcasting example, +:
In [6]: A[:,0] + np.ones(5);
In [7]: A[:,0] + np.ones([1,5]);
In [8]: A[:,0] + np.ones([5,1]);
Now the (5,) works with (5,1), but that's because it becomes (1,5), which together with (5,1) produces (5,5) - an outer-product style of broadcasting:
In [9]: (A[:,0] + np.ones([5,1])).shape
Out[9]: (5, 5)
In Octave
>> x = ones(2,3,4);
>> size(x(1,:,:))
ans =
1 3 4
>> size(x(:,:,1))
ans =
2 3
>> size(x(:,1,1) )
ans =
2 1
>> size(x(1,1,:) )
ans =
1 1 4
To do the assignment that you want, adjust either side.
Index in a way that preserves the number of dimensions:
In [11]: A[:,[0]].shape
Out[11]: (5, 1)
In [12]: A[:,[0]] = np.ones([5,1])
transpose the (5,1) to (1,5):
In [13]: A[:,0] = np.ones([5,1]).T
flatten/ravel the (5,1) to (5,):
In [14]: A[:,0] = np.ones([5,1]).flat
In [15]: A[:,0] = np.ones([5,1])[:,0]
squeeze and ravel also work.
Some quick tests in Octave indicate that it is more forgiving when it comes to dimension mismatches, but numpy prioritizes consistency. Once the broadcasting rules are understood, the behavior makes sense.
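One convenience worth mentioning, which goes beyond the original answer: np.broadcast_shapes (NumPy 1.20+) reports what a broadcast would produce without building any arrays.
import numpy as np

# the assignment cases above, expressed as broadcast shapes
print(np.broadcast_shapes((5,), (1, 5)))   # (1, 5)
print(np.broadcast_shapes((5,), (5, 1)))   # (5, 5) -- the outer-product case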
Use the squeeze method to eliminate dimensions of size 1.
A[:,0] = B.squeeze()
Or just create B one-dimensional to begin with:
B = np.ones([150])
The fact that NumPy maintains a distinction between a 1D array and a 2D array with one of its dimensions being 1 is reasonable, especially when one begins working with n-dimensional arrays.
To answer the question in the title: there is an evident structural difference between an array of shape (3,) such as
[1, 2, 3]
and an array of shape (3, 1) such as
[[1], [2], [3]]
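To convert between the two in practice, here is a small sketch using only the standard calls already mentioned (ravel, squeeze, reshape, None indexing):
import numpy as np

b = np.ones([150, 1])          # shape (150, 1)
print(b.ravel().shape)         # (150,)  -- b.squeeze(), b.reshape(-1) and b[:, 0] do the same
v = np.ones(150)               # shape (150,)
print(v[:, None].shape)        # (150, 1) -- v.reshape(-1, 1) does the same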

Scale rows of 3D-tensor

I have an n-by-3-by-3 numpy array A and an n-by-3 numpy array B. I'd now like to multiply every row of every one of the n 3-by-3 matrices with the corresponding scalar in B, i.e.,
import numpy as np
A = np.random.rand(10, 3, 3)
B = np.random.rand(10, 3)
for a, b in zip(A, B):
    a = (a.T * b).T
    print(a)
Can this be done without the loop as well?
You can use NumPy broadcasting to let the elementwise multiplication happen in a vectorized manner, by extending B to 3D with a singleton dimension added at the end via np.newaxis or its alias/shorthand None. Thus, the implementation would be A*B[:,:,None] or simply A*B[...,None].
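A minimal check that the broadcasted product matches the loop from the question (small random inputs; just a sketch):
import numpy as np

A = np.random.rand(10, 3, 3)
B = np.random.rand(10, 3)

# loop from the question, collected into an array
expected = np.array([(a.T * b).T for a, b in zip(A, B)])
# broadcasting: B[:, :, None] has shape (10, 3, 1)
result = A * B[:, :, None]
print(np.allclose(expected, result))   # True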

Divide an array of arrays by an array of scalars

If I have an array, A, with shape (n, m, o) and an array, B, with shape (n, m), is there a way to divide each array at A[n, m] by the scalar at B[n, m] without a list comprehension?
>>> A.shape
(4,173,1469)
>>> B.shape
(4,173)
>>> # Better way to do:
>>> np.array([[A[i, j] / B[i, j] for j in range(len(B[i]))] for i in range(len(B))])
The problem with a list comprehension is that it is slow, it doesn't return an array (so you have to np.array(_) it, which makes it even slower), it is hard to read, and the whole point of numpy is to move loops from Python into C or Fortran.
If A were of shape (n,) and B were a scalar (of shape ()), then this would be trivial: A / B. But this property does not scale with dimensions:
>>> A / B
ValueError: operands could not be broadcast together with shapes (4,173,1469) (4,173)
I am looking for a fast way to do this (preferably not by tiling B to an array of shape (n, m, o), and preferably using native numpy tools).
You are absolutely right, there is a better way; I think you are getting the spirit of numpy.
The solution in your case is to add a new dimension to B, with a single entry along that dimension:
so if your A is of shape (n,m,o), your B has to be of shape (n,m,1), and then you can use native broadcasting to get your operation A/B done.
You can add that dimension to B by indexing it with np.newaxis:
import numpy as np
A = np.ones((10, 5, 3))
B = np.ones((10, 5))
Result = A/B[:,:,np.newaxis]
B[:,:,np.newaxis] --> this turns B into an array of shape (10,5,1)
From here, the rules of broadcasting are:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1.
Your shapes are (n,m,o) and (n,m); comparing the trailing dimensions, o and m are neither equal nor 1, so they are not compatible.
The / division operator will work using broadcasting if you use:
o,n,m divided by n,m
n,m,o divided by n,m,1
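A quick check of the second form against the list comprehension from the question (small made-up shapes; a sketch only):
import numpy as np

A = np.random.rand(4, 7, 9)
B = np.random.rand(4, 7)

loop = np.array([[A[i, j] / B[i, j] for j in range(B.shape[1])] for i in range(B.shape[0])])
vectorized = A / B[:, :, np.newaxis]   # B[:, :, np.newaxis] has shape (4, 7, 1)
print(np.allclose(loop, vectorized))   # True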
