How to vectorize performing pairwise sums given two numpy arrays?

I have two numpy arrays which look like this:
x = [v1, v2, v3, ..., vm]
y = [w1, w2, w3, ..., wn]
where vi, wj are numpy arrays of length 3.
I want to perform a pairwise summation of v's and w's and get a final array
z = [v1+w1, v1+w2,...,v1+wn,v2+w1, ..., vi+wj, ..., vm+wn]
A simple way of obtaining z is as follows:
z = np.zeros((m*n, 3))
for i in range(m):
    for j in range(n):
        z[n*i+j] = x[i] + y[j]
This computation is not feasible if m, n are very large.
I know scipy.spatial has methods to enumerate pairwise distances using distance_matrix in a vectorized fashion.
I want to ask if there is a vectorized version of performing such pairwise additions for numpy arrays?

You can take advantage of broadcasting to create a 2D array in which z[i,j] = x[i] + y[j]. For 1D arrays x of length m and y of length n:
x = np.reshape(x, (-1, 1)) # shape (m, 1)
y = np.reshape(y, (-1, 1)) # shape (n, 1)
z = x + y.T # shape (m, n)
If you want z as a 1D array, use z.reshape(-1).
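For 1D inputs of scalars, the same table can also be built with np.add.outer. A small sketch with made-up values (the arrays below are illustrative, not from the question):

```python
import numpy as np

x = np.array([1, 2, 3])   # m = 3 scalars (illustrative values)
y = np.array([10, 20])    # n = 2 scalars (illustrative values)

# Column vector plus row vector broadcasts to an (m, n) table.
z_broadcast = x.reshape(-1, 1) + y.reshape(-1, 1).T
# ufunc.outer builds the same table of all pairwise sums.
z_outer = np.add.outer(x, y)

# Row-major flattening orders the pairs as (x0,y0), (x0,y1), (x1,y0), ...
flat = z_broadcast.reshape(-1)
```

Note that this covers scalar elements only; for rows of length 3 an extra axis is needed so that the last dimension is preserved.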

If x is an (m, 3) array and y is an (n, 3) array:
x.shape # (m,3)
y.shape # (n,3)
x1 = x.reshape(m,1,3)
y1 = y.reshape(1,n,3)
z = x1 + y1 # shape (m,n,3)
z1 = z.reshape(-1,3) # (m*n, 3)
equivalently
z = x[:,None]+y
test:
In [263]: x=np.arange(12).reshape(4,3); y=np.arange(6).reshape(2,3)
In [264]: z = x[:,None]+y
In [265]: z.shape
Out[265]: (4, 2, 3)
In [266]: z
Out[266]:
array([[[ 0,  2,  4],
        [ 3,  5,  7]],

       [[ 3,  5,  7],
        [ 6,  8, 10]],

       [[ 6,  8, 10],
        [ 9, 11, 13]],

       [[ 9, 11, 13],
        [12, 14, 16]]])
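As a sanity check, the broadcast result can be compared against the double loop from the question. This is a minimal sketch on small random inputs; the sizes and the random data are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5                # small illustrative sizes
x = rng.random((m, 3))
y = rng.random((n, 3))

# Reference: the double loop from the question.
z_loop = np.zeros((m * n, 3))
for i in range(m):
    for j in range(n):
        z_loop[n * i + j] = x[i] + y[j]

# Broadcasting: (m, 1, 3) + (n, 3) -> (m, n, 3), then merge the two pair axes.
z_vec = (x[:, None] + y).reshape(-1, 3)

same = np.allclose(z_loop, z_vec)
```

The result still holds m*n*3 values, so for very large m and n memory rather than the loop becomes the limit.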

Related

Create numpy matrix from output of function on vector vs vector

I have two vectors / one-dimensional numpy arrays and a function I want to apply:
arr1 = np.arange(1, 5)
arr2 = np.arange(2, 6)
func = lambda x, y: x * y
I now want to construct a n * m matrix (with n, m being the lengths of arr1, and arr2 respectively) containing the values of the function outputs. The naive approach using for loops would look like this:
np.array([[func(x, y) for x in arr1] for y in arr2])
I was wondering if there is a smarter vectorized approach using the arr1[:, None] syntax to apply my function - please note my actual function is significantly more complicated and can't be broken down to simple numpy operations (arr1[:, None] * arr2[None, :] won't work).

With NumPy arrays, one approach is numpy.einsum, since you want to compute arr1[i] * arr2[j] and store it in result[j, i].
>>> np.einsum('i, j -> ji', arr1, arr2)
array([[ 2, 4, 6, 8],
[ 3, 6, 9, 12],
[ 4, 8, 12, 16],
[ 5, 10, 15, 20]])
Or you can use numpy.matmul or the @ operator.
>>> np.matmul(arr2[:,None], arr1[None,:])
# OR
>>> arr2[:,None] @ arr1[None,:]
# Or, thanks to @hpaulj, by elementwise multiplication with broadcasting
>>> arr2[:,None] * arr1[None,:]
array([[ 2, 4, 6, 8],
[ 3, 6, 9, 12],
[ 4, 8, 12, 16],
[ 5, 10, 15, 20]])
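If the function really cannot be expressed with array operations, np.vectorize combined with broadcasting at least removes the nested comprehension. A hedged sketch follows; the func below is a hypothetical stand-in for a scalar-only function, and np.vectorize is documented as a convenience rather than a performance tool, so it should not be expected to be much faster than the loop:

```python
import numpy as np

arr1 = np.arange(1, 5)
arr2 = np.arange(2, 6)

def func(x, y):
    # Hypothetical stand-in for a function that only accepts scalars.
    return x * y if x < y else x + y

# np.vectorize calls func once per broadcast element pair.
vfunc = np.vectorize(func)
result = vfunc(arr1[None, :], arr2[:, None])   # shape (len(arr2), len(arr1))

# Reference: the nested comprehension from the question.
expected = np.array([[func(x, y) for x in arr1] for y in arr2])
```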

Here is a comparison between your loop approach and @I'mahdi's approaches:
import time
import numpy as np
func = lambda x, y: x * y
arr1 = np.arange(1, 10000)
arr2 = np.arange(2, 10001)
start = time.time()
np.array([[func(x, y) for x in arr1] for y in arr2])
print('loop: __time__', time.time()-start)
start = time.time()
(arr1[:, None]*arr2[None, :]).T
print('* __time__', time.time()-start)
start = time.time()
np.einsum('i, j -> ji', arr1, arr2)
print('einsum __time__', time.time()-start)
start = time.time()
np.matmul(arr2[:,None], arr1[None,:])
print('matmul __time__', time.time()-start)
Output:
loop: __time__ 70.3061535358429
* __time__ 0.43536829948425293
einsum __time__ 0.508014440536499
matmul __time__ 0.7149899005889893

Create matrix with numpy meshgrid

I want to create a matrix with M rows and N columns. The increment along the columns is always 1, whilst the increment along the rows is a constant value, c. For example, to create this matrix:
The number of rows is 4, the number of columns is 2, and the shift between rows is c = 8. One way to do this could be:
# Indices of columns
coord_x = np.arange(0, 2)
# Indices of rows
coord_y = np.arange(1, 37, 9)
# Creates 2 matrices with the coordinates
x, y = np.meshgrid(coord_x, coord_y)
# To perform the shift between columns
idx_left = x + y
And the output is:
print(idx_left)
[[ 1 2]
[10 11]
[19 20]
[28 29]]
Can I do this without the addition idx_left = x + y? I've looked at other functions but haven't found one that handles a shift along both the rows and the columns.

Stride tricks
You can use np.lib.stride_tricks for this purpose.
arr = np.arange(1,100)
shape = (4,2)
strides = (arr.strides[0]*9,arr.strides[0]*1) #8 bytes with 9 steps on axis=0, 8bytes with 1 step on axis=1
np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
array([[ 1, 2],
[10, 11],
[19, 20],
[28, 29]])
Another example with 2 shift on axis=0 and 3 shift on axis=1.
arr = np.arange(1,100)
shape = (4,2)
strides = (arr.strides[0]*2,arr.strides[0]*3) #8bytes * 2shift on axis=0, 8bytes*3shift on axis=1
np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
array([[ 1, 4],
[ 3, 6],
[ 5, 8],
[ 7, 10]])
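as_strided does not check that the requested shape and strides stay inside the underlying buffer, so it is worth sizing the base array deliberately. The sketch below assumes the 4x2 layout with a row step of 9 from the first example; it builds the smallest base array that covers the view and marks the view read-only:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

n_rows, n_cols, row_step = 4, 2, 9

# Smallest base array that covers every element the view will touch.
needed = (n_rows - 1) * row_step + (n_cols - 1) + 1
arr = np.arange(1, 1 + needed)

item = arr.strides[0]   # bytes per element
view = as_strided(
    arr,
    shape=(n_rows, n_cols),
    strides=(item * row_step, item),
    writeable=False,    # read-only view guards against accidental writes
)
```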
Broadcasting
You can do the same thing without meshgrid, using only broadcasting:
#Your original example
coord_x = np.arange(0, 2, 1) #start, stop, step
coord_y = np.arange(1, 37, 9)
coord_x[None,:] + coord_y[:,None]
array([[ 1, 2],
[10, 11],
[19, 20],
[28, 29]])
Linspace
You could use linspace if you know the first and last rows. So, in your case, you can create a 4-row array from (1,2) to (28,29).
np.linspace((1,2),(28,29),4)
array([[ 1., 2.],
[10., 11.],
[19., 20.],
[28., 29.]])
Mgrid
np.mgrid is more convenient than meshgrid for your purpose. You can do:
np.mgrid[0:2:1, 1:37:9].sum(0).T
array([[ 1, 2],
[10, 11],
[19, 20],
[28, 29]])
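The broadcasting variant generalizes to any number of rows, columns, and row step. A minimal sketch follows; shifted_grid and its parameter names are hypothetical, introduced only for this illustration:

```python
import numpy as np

def shifted_grid(n_rows, n_cols, start=1, row_step=9):
    # Hypothetical helper: row i starts at start + i * row_step,
    # and values increase by 1 along each row.
    row_starts = start + row_step * np.arange(n_rows)
    return row_starts[:, None] + np.arange(n_cols)

grid = shifted_grid(4, 2)
```

With the defaults this reproduces the 4x2 matrix from the question, and the same call pattern works for other sizes, for example shifted_grid(3, 5, start=0, row_step=100).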

How to calculate x*x.T in python

I want to calculate the following:
but I have no idea how to do this in python, I do not want to implement this manually but use a predefined function for this, something from numpy for example.
But numpy seems to ignore that x.T should be transposed.
Code:
import numpy as np
x = np.array([1, 5])
print(np.dot(x, x.T)) # = 26, This is not the matrix it should be!

While your vectors are defined as 1-d arrays, you can use np.outer:
np.outer(x, x.T)
> array([[ 1, 5],
> [ 5, 25]])
Alternatively, you could also define your vectors as matrices and use normal matrix multiplication:
x = np.array([[1], [5]])
x @ x.T
> array([[ 1, 5],
> [ 5, 25]])

You can do:
x = np.array([[1], [5]])
print(np.dot(x, x.T))
Your original x is of shape (2,), while you need a shape of (2,1). Another way is reshaping your x:
x = np.array([1, 5]).reshape(-1,1)
print(np.dot(x, x.T))
.reshape(-1,1) reshapes your array to have 1 column and implicitly infers the number of rows.
output:
[[ 1 5]
[ 5 25]]

np.matmul(x[:, np.newaxis], [x])
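The reason np.dot(x, x.T) returns a scalar is that transposing a 1-D array is a no-op, so both arguments have shape (2,). The sketch below puts several of the approaches from these answers side by side so they can be checked against each other:

```python
import numpy as np

x = np.array([1, 5])

# Transposing a 1-D array changes nothing.
t_shape = x.T.shape                 # (2,)

a = np.outer(x, x)
b = x[:, None] * x[None, :]         # broadcasting: (2, 1) * (1, 2)
col = x.reshape(-1, 1)
c = col @ col.T                     # matrix product of (2, 1) and (1, 2)
d = np.einsum('i,j->ij', x, x)
```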

"Multiply" 1d numpy array with a smaller one and sum the result

I want to "multiply" (for lack of better description) a numpy array X of size M with a smaller numpy array Y of size N, for every N elements in X. Then, I want to sum the resulting array (almost like a dotproduct).
I hope the example makes it more clear:
Example
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1,2,3]
Z = mymul(X, Y)
= [0*1, 1*2, 2*3, 3*1, 4*2, 5*3, 6*1, 7*2, 8*3, 9*1]
= [ 0, 2, 6, 3, 8, 15, 6, 14, 24, 9]
result = sum(Z) = 87
X and Y can be of varying lengths and Y is always smaller than X, but not necessarily divisible (e.g. M % N != 0)
I have some solutions but they are quite slow. I'm hoping there is a faster way to do this.
import numpy as np
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int)
Y = np.array([1,2,3], dtype=int)
# these work but are slow for large X, Y
# simple for-loop
t = 0
for i in range(len(X)):
t += X[i] * Y[i % len(Y)]
print(t) #87
# extend Y M/N times so np.dot can be applied
Ytiled = np.tile(Y, int(np.ceil(len(X) / len(Y))))[:len(X)]
t = np.dot(X, Ytiled)
print(t) #87

Resize Y to same length as X and then use matrix-multiplication -
In [52]: np.dot(X, np.resize(Y,len(X)))
Out[52]: 87
Alternative to using np.resize would be with tiling. Hence, np.tile(Y,(m+n-1)//n)[:m] for m,n = len(X), len(Y), could replace np.resize(Y,len(X)) for a faster one.
Another approach that avoids resizing Y, for memory efficiency:
In [79]: m,n = len(X), len(Y)
In [80]: s = n*(m//n)
In [81]: X2D = X[:s].reshape(-1,n)
In [82]: X2D.dot(Y).sum() + np.dot(X[s:],Y[:m-s])
Out[82]: 87
Alternatively, we can use np.einsum('ij,j->',X2D,Y) to replace X2D.dot(Y).sum().
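The block-reshape idea can be wrapped in a small function. A sketch under the assumption that X and Y are 1-D NumPy arrays; the name cyclic_dot is hypothetical:

```python
import numpy as np

def cyclic_dot(X, Y):
    # Hypothetical helper: sums X[i] * Y[i % len(Y)] without tiling Y.
    m, n = len(X), len(Y)
    s = n * (m // n)                       # length of the part made of full blocks
    full_blocks = X[:s].reshape(-1, n).dot(Y).sum()
    remainder = np.dot(X[s:], Y[:m - s])   # leftover tail, empty when m % n == 0
    return full_blocks + remainder

X = np.arange(10)
Y = np.array([1, 2, 3])
total = cyclic_dot(X, Y)   # 87 for the question's example
```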

You can use convolve (documentation):
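np.convolve(X, Y[::-1])[len(Y)-1::len(Y)].sum()
Remember to reverse the second array. In the default 'full' mode, the entry at index len(Y)-1 is the first complete overlap of Y with X, so taking every len(Y)-th entry from there picks one value per block of X.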
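A sketch of the convolution idea as a checked helper, using the default 'full' mode and sampling from index len(Y) - 1 so that each sampled entry lines up with one block of X. The function name is hypothetical, and the inputs are assumed to be 1-D arrays:

```python
import numpy as np

def cyclic_dot_convolve(X, Y):
    # Hypothetical helper. Entry k of the full convolution with reversed Y is
    # sum_j X[k - (n - 1) + j] * Y[j], so indices n-1, 2n-1, ... line up with
    # consecutive blocks of X (implicitly zero-padded past the end).
    n = len(Y)
    return np.convolve(X, Y[::-1])[n - 1::n].sum()

Y = np.array([1, 2, 3])
question_total = cyclic_dot_convolve(np.arange(10), Y)        # 87
# An input where a one-element misalignment would change the answer.
single_total = cyclic_dot_convolve(np.array([1, 0, 0]), Y)    # 1
```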

Euclidean distances between several images and one base image

I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?

All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
NumPy will automatically expand Y's shape so that it matches that of X. This is called broadcasting: shapes are compared from the trailing dimension backwards, and any dimension of length 1 is stretched to match the other array. When the shapes are ambiguous, you can insert a length-1 axis explicitly with None (np.newaxis) to control which dimension is expanded. Here, Y's first dimension has length 1, so that is the axis that is expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a euclidean distance, as your title seems to suggest you want:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
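With the shapes from the question, the same computation can also be written with np.linalg.norm. This is a sketch using random stand-in data, since the actual images are not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 8100))   # 30 flattened images (random stand-ins)
Y = rng.random((1, 8100))    # one base image (random stand-in)

diff = X - Y                           # broadcasts to shape (30, 8100)
dist = np.linalg.norm(diff, axis=1)    # one Euclidean distance per image
```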

Use NumPy arrays and let broadcasting subtract Y from every row.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])

Just iterate rows from your numpy array and you can actually just subtract them and numpy will make a new array with the differences!
import numpy as np
final_array = []
#X is a numpy array that is 30X8100 and Y is a numpy array that is 1X8100
for row in X:
    output = row - Y
    final_array.append(output)
Each output is one difference (X[0] - Y, X[1] - Y, etc.), so final_array ends up as a list of 30 arrays holding the X - Y values you need. Simple as that. Just make sure you convert your matrices to NumPy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!

import numpy
a1 = numpy.array(X) # make sure you have a 2D numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y) # make sure you have a 1D numpy array like [1,2,3,...]
a2 = [a2] * len(a1[0]) # repeat a2 once per column of a1
a2 = numpy.array(list(zip(*a2))) # transpose it (same shape as a1 when len(a2) == len(a1))
print(a1 - a2) # elementwise difference between a1 and a2 (or X and Y)
