Is there a numerically stable way to compute the softmax function below?
I am getting values that become NaNs in my neural network code.
np.exp(x)/np.sum(np.exp(x))
The softmax exp(x)/sum(exp(x)) is actually numerically well-behaved. It has only positive terms, so we needn't worry about loss of significance, and the denominator is at least as large as the numerator, so the result is guaranteed to fall between 0 and 1.
The only accident that might happen is over- or underflow in the exponentials. Overflow of a single element or underflow of all elements of x will render the output more or less useless.
But it is easy to guard against that by using the identity softmax(x) = softmax(x + c), which holds for any scalar c: subtracting max(x) from x leaves a vector that has only non-positive entries, ruling out overflow, and at least one entry that is zero, ruling out a vanishing denominator (underflow in some, but not all, entries is harmless).
Footnote: theoretically, catastrophic accidents in the sum are possible, but you'd need a ridiculous number of terms. For example, even using 16 bit floats, which can only resolve 3 decimals (compared to 15 decimals of a "normal" 64 bit float), we'd need between 2^1431 (~6 x 10^430) and 2^1432 terms to get a sum that is off by a factor of two.
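A quick numerical check of that shift identity (a minimal sketch using small inputs, so even the naive formula is safe):

import numpy as np

def naive_softmax(v):
    return np.exp(v) / np.sum(np.exp(v))

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(naive_softmax(x), naive_softmax(x - np.max(x))))  # True: softmax(x) == softmax(x + c)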
The softmax function is prone to two issues: overflow and underflow.
Overflow: occurs when very large numbers are approximated as infinity.
Underflow: occurs when very small numbers (near zero on the number line) are rounded to zero.
To combat these issues when doing softmax computation, a common trick is to shift the input vector by subtracting the maximum element in it from all elements. For the input vector x, define z such that:
z = x-max(x)
And then take the softmax of the new (stable) vector z
Example:
import numpy as np

def stable_softmax(x):
    z = x - np.max(x)
    numerator = np.exp(z)
    denominator = np.sum(numerator)
    softmax = numerator / denominator
    return softmax
# input vector
In [267]: vec = np.array([1, 2, 3, 4, 5])
In [268]: stable_softmax(vec)
Out[268]: array([ 0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865])
# input vector with really large number, prone to overflow issue
In [269]: vec = np.array([12345, 67890, 99999999])
In [270]: stable_softmax(vec)
Out[270]: array([ 0., 0., 1.])
In the above case, we safely avoided the overflow problem by using stable_softmax().
For more details, see the Numerical Computation chapter of the Deep Learning book.
Extending @kmario23's answer to support 1- or 2-dimensional NumPy arrays or lists. 2D tensors (assuming the first dimension is the batch dimension) are common if you're passing a batch of results through softmax:
import numpy as np

def stable_softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax
test1 = np.array([12345, 67890, 99999999]) # 1D numpy
test2 = np.array([[12345, 67890, 99999999], # 2D numpy
[123, 678, 88888888]]) #
test3 = [12345, 67890, 999999999] # 1D list
test4 = [[12345, 67890, 999999999]] # 2D list
print(stable_softmax(test1))
print(stable_softmax(test2))
print(stable_softmax(test3))
print(stable_softmax(test4))
[0. 0. 1.]
[[0. 0. 1.]
[0. 0. 1.]]
[0. 0. 1.]
[[0. 0. 1.]]
There is nothing wrong with calculating the softmax function the way you do it in your case. The problem seems to come from an exploding gradient or that sort of issue with your training method. Focus on those matters with either "clipping values" or "choosing the right initial distribution of weights".
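If you do go the value-clipping route with plain NumPy, a minimal sketch could look like this (the array name and the ±1.0 threshold are placeholders, not taken from the question):

import numpy as np

grads = np.random.randn(10)            # stand-in for gradients computed elsewhere
clipped = np.clip(grads, -1.0, 1.0)    # clamp every component to [-1, 1]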
I recently encountered a problem with the accuracy of the matrix product/multiplication in NumPy. See my example below, also here: https://trinket.io/python3/6a4c22e450
import numpy as np
para = np.array([[ 3.28522453e+08, -1.36339334e+08, 1.36339334e+08],
[-1.36339334e+08, 5.65818682e+07, -5.65818682e+07],
[ 1.36339334e+08, -5.65818672e+07, 5.65818682e+07]])
in1 = np.array([[ 285.91695469],
[ 262.3 ],
[-426.64380594]])
in2 = np.array([[ 285.91695537],
[ 262.3 ],
[-426.64380443]])
(in1 - in2)/in1
>>> array([[-2.37831286e-09],
[ 0.00000000e+00],
[ 3.53925214e-09]])
The relative difference between in1 and in2 is very small, about 10^-9.
res1 = para @ in1
>>> array([[-356.2361908 ],
[ 443.16068268],
[-180.86068344]])
res2 = para @ in2
>>> array([[ 73.03147125],
[265.01131439],
[ -2.71131516]])
But after the matrix multiplication, why does the difference between the outputs res1 and res2 change so much?
(res1 - res2)/res1
>>> array([[1.20500857],
[0.40199723],
[0.98500882]])
This is not a bug; it is to be expected with a matrix such as yours.
Your matrix (which is symmetric) has one large and two small eigenvalues:
In [34]: evals, evecs = np.linalg.eigh(para)
In [35]: evals
Out[35]: array([-1.06130078e-01, 1.00000000e+00, 4.41686189e+08])
Because the matrix is symmetric, it can be diagonalized with an orthonormal basis. That just means that we can define a new coordinate system in which the matrix is diagonal, and the diagonal values are those eigenvalues. The effect of multiplying the matrix by a vector in these coordinates is to simply multiply each coordinate by the corresponding eigenvalue, i.e. the first coordinate is multiplied by -0.106, the second coordinate doesn't change, and the third coordinate is multiplied by the large factor 4.4e8.
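As a quick numerical check of this (a small sketch reusing para, evals and evecs from the snippet above):

print(np.allclose(evecs @ np.diag(evals) @ evecs.T, para))   # True: para = V diag(evals) V^T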
The reason you get such a drastic change when multiplying the original matrix para by in1 and in2 is that, in the new coordinates, the third component of the transformed in1 is positive, and the third component of the transformed in2 is negative. (That is, the points are on opposite sides of the 2-d eigenspace associated with the two smaller eigenvalues.) There are several ways to find these transformed coordinates; one is to compute inv(V) @ x, where V is the matrix of eigenvectors:
In [36]: np.linalg.solve(evecs, in1)
Out[36]:
array([[ 5.64863071e+02],
[-1.16208620e+02],
[ 8.55527517e-07]])
In [37]: np.linalg.solve(evecs, in2)
Out[37]:
array([[ 5.64863070e+02],
[-1.16208619e+02],
[-2.71381169e-07]])
Note the different signs of the third components. The values are small, but when you multiply by the diagonal matrix, they are multiplied by 4.4e8, giving 377.87 and -119.86, respectively. That large change shows up as the results that you observed in the original coordinates.
For a rougher calculation: note that the elements of para are ~10^8, so multiplications of that order of magnitude occur when you compute para @ x. It is not surprising, then, given that the relative differences between in1 and in2 are ~10^-9, that the relative differences of res1 and res2 will be ~10^-9 * ~10^8, or ~0.1. (Your calculated relative errors were [1.2, 0.4, 0.99], so the rough estimate is in the right ballpark.)
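If you want to see those magnitudes directly, a short sketch (assuming para, in1, in2, res1 and res2 as defined in the question) is:

print(np.max(np.abs(para)))                  # ~3.3e8
print(np.max(np.abs((in1 - in2) / in1)))     # ~3.5e-9
print(np.max(np.abs((res1 - res2) / res1)))  # ~1.2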
This looks like a bug ... NumPy is written in C, so this could be an issue of numbers being cast to a smaller float type, which would cause a large floating-point error in this case.
I have a set of points in 2-dimensional space and need to calculate the distance from each point to each other point.
I have a relatively small number of points, maybe at most 100. But since I need to do it often and rapidly in order to determine the relationships between these moving points, and since I'm aware that iterating through the points could be as bad as O(n^2) complexity, I'm looking for ways to take advantage of numpy's matrix magic (or scipy).
As it stands in my code, the coordinates of each object are stored in its class. However, I could also update them in a numpy array when I update the class coordinate.
class Cell(object):
    """Represents one object in the field."""
    def __init__(self, id, x=0, y=0):
        self.m_id = id
        self.m_x = x
        self.m_y = y
It occurs to me to create a Euclidean distance matrix to prevent duplication, but perhaps you have a cleverer data structure.
I'm open to pointers to nifty algorithms as well.
Also, I note that there are similar questions dealing with Euclidean distance and numpy but didn't find any that directly address this question of efficiently populating a full distance matrix.
You can take advantage of the complex type:
# build a complex array of your cells
z = np.array([complex(c.m_x, c.m_y) for c in cells])
First solution
# mesh this array so that you will have all combinations
m, n = np.meshgrid(z, z)
# get the distance via the norm
out = abs(m-n)
Second solution
Meshing is the main idea. But numpy is clever, so you don't have to generate m & n. Just compute the difference using a transposed version of z. The mesh is done automatically:
out = abs(z[..., np.newaxis] - z)
Third solution
And if z is directly set as a 2-dimensional array, you can use z.T instead of the weird z[..., np.newaxis]. So finally, your code will look like this:
z = np.array([[complex(c.m_x, c.m_y) for c in cells]]) # notice the [[ ... ]]
out = abs(z.T-z)
Example
>>> z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])
>>> abs(z.T-z)
array([[ 0. , 2.23606798, 4.12310563],
[ 2.23606798, 0. , 4.24264069],
[ 4.12310563, 4.24264069, 0. ]])
As a complement, you may want to remove duplicates afterwards, taking the upper triangle:
>>> np.triu(out)
array([[ 0. , 2.23606798, 4.12310563],
[ 0. , 0. , 4.24264069],
[ 0. , 0. , 0. ]])
Some benchmarks
>>> timeit.timeit('abs(z.T-z)', setup='import numpy as np;z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])')
4.645645342274779
>>> timeit.timeit('abs(z[..., np.newaxis] - z)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
5.049334864854522
>>> timeit.timeit('m, n = np.meshgrid(z, z); abs(m-n)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
22.489568296184686
If you don't need the full distance matrix, you will be better off using a k-d tree. Consider scipy.spatial.cKDTree or sklearn.neighbors.KDTree. This is because a k-d tree can find the k nearest neighbours in O(n log n) time, and therefore you avoid the O(n**2) complexity of computing all n by n distances.
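For example, a minimal k-d tree sketch (the random points and k=5 here are placeholders, not taken from the question):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(100, 2)         # 100 points in 2-D
tree = cKDTree(points)
dists, idx = tree.query(points, k=5)    # 5 nearest neighbours of each point (itself included, at distance 0)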
Jake VanderPlas gives this example using broadcasting in the Python Data Science Handbook, which is very similar to what @shx2 proposed.
import numpy as np
rand = np.random.RandomState(42)
X = rand.rand(3, 2)
dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)
dist_sq
array([[0. , 0.18543317, 0.81602495],
[0.18543317, 0. , 0.22819282],
[0.81602495, 0.22819282, 0. ]])
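Note that dist_sq, as computed above, contains squared distances (the diagonal is zero); take an element-wise square root if you need the Euclidean distances themselves:

dist = np.sqrt(dist_sq)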
Here is how you can do it using numpy:
import numpy as np
x = np.array([0,1,2])
y = np.array([2,4,6])
# take advantage of broadcasting, to make a 2dim array of diffs
dx = x[..., np.newaxis] - x[np.newaxis, ...]
dy = y[..., np.newaxis] - y[np.newaxis, ...]
dx
=> array([[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0]])
# stack in one array, to speed up calculations
d = np.array([dx,dy])
d.shape
=> (2, 3, 3)
Now all that is left is computing the L2 norm along the 0-axis (as discussed here):
(d**2).sum(axis=0)**0.5
=> array([[ 0. , 2.23606798, 4.47213595],
[ 2.23606798, 0. , 2.23606798],
[ 4.47213595, 2.23606798, 0. ]])
If you are looking for the most efficient way of computation, use SciPy's cdist() (or pdist() if you need just the vector of pairwise distances instead of the full distance matrix), as suggested in Tweakimp's comment. As he said, it's a lot faster than the methods based on vectorization and broadcasting proposed by RichPauloo and shx2, because under the hood cdist() and pdist() use C implementations of the metric computations, which are even faster than vectorization.
By the way, if you can use SciPy and still prefer the method using broadcasting, you don't have to implement it yourself, as the distance_matrix() function is a pure Python implementation that leverages broadcasting and vectorization (source code, docs).
It's worth mentioning that cdist()/pdist() is also more memory-efficient than broadcasting, as it computes the distances one by one and avoids creating arrays of n*n*d elements, where n is the number of points and d is the points' dimensionality.
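For instance, a small self-contained sketch of the pdist()/squareform() combination (the random point array is just a placeholder):

import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.random.rand(100, 2)      # 100 points in 2-D
condensed = pdist(pts)            # vector of the n*(n-1)/2 pairwise distances
full = squareform(condensed)      # full symmetric n x n distance matrix, if you do need it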
Experiments
I've conducted some simple experiments to compare the performance of SciPy's cdist(), distance_matrix(), and a broadcasting implementation in NumPy. I used perf_counter_ns() from Python's time module to measure time, and all results are averaged over 10 runs on 10000 points in 2D space using the np.float64 datatype (tested on Python 3.8.10, Windows 10, with a Ryzen 2700 and 16 GB RAM):
cdist() - 0.6724s
distance_matrix() - 3.0128s
my NumPy implementation - 3.6931s
Code, if someone wants to reproduce the experiments:
from scipy.spatial import *
import numpy as np
from time import perf_counter_ns

def dist_mat_custom(a, b):
    return np.sqrt(np.sum(np.square(a[:, np.newaxis, :] - b[np.newaxis, :, :]), axis=-1))

results = []
size = 10000
it_num = 10
for i in range(it_num):
    a = np.random.normal(size=(size, 2))
    b = np.random.normal(size=(size, 2))
    start = perf_counter_ns()
    c = distance_matrix(a, b)
    #c = dist_mat_custom(a, b)
    #c = distance.cdist(a, b)
    results.append(perf_counter_ns() - start)
print(np.mean(results) / 1e9)
I expected the poly() and roots() functions to be each other's inverse. However, this isn't quite true:
import numpy as np

# Poly coeffs
pol_c = np.poly([-1, 1, 1, 10]) # get polynomial coeffs for the eqn with the stated roots
# Roots from the poly equation
root_val = np.roots(pol_c)
# Roots from the poly equation, manually entered as integers
roots_v2 = np.roots([1,-11,9,11,-10])
print(pol_c)
print(root_val)
print(roots_v2)
Gives
[1. -11. 9. 11. -10.]
[10.+0.0000000e+00j -1.+0.0000000e+00j 1.+9.6357437e-09j
1.-9.6357437e-09j]
[10.+0.0000000e+00j -1.+0.0000000e+00j 1.+9.6357437e-09j
1.-9.6357437e-09j]
i.e. the 3rd and 4th roots are (slightly) imaginary instead of real.
My first thought was floating-point error, but given that roots() outputs the same answer for float and int inputs, that seems not to be the case. Plus, I would expect poly() to give non-integer answers if floating-point accuracy were limiting the solves.
The functions are inverses of each other, within some computational error (which may be complex), and up to a reordering of the roots.
pol_c = np.poly([-1, 1, 1, 10])
root_val = np.roots(pol_c)
print(np.real_if_close(np.around(root_val, 6)))
prints [10. -1. 1. 1.], which is what we started with, in another order.
Of course, the order need not be the same: the original order of roots is lost when pol_c was formed, and there is no canonical order for the roots of polynomials (which are generally complex) anyway.
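If you want to check the round trip programmatically, here is a self-contained sketch of that comparison (sorting both sides, since only the multiset of roots is preserved):

import numpy as np

recovered = np.real_if_close(np.around(np.roots(np.poly([-1, 1, 1, 10])), 6))
print(np.allclose(np.sort(recovered), np.sort([-1, 1, 1, 10])))  # True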
Any skew-symmetric matrix (A^T = -A) can be turned into a Hermitian matrix (iA) and diagonalised with complex numbers. But it is also possible to bring it into block-diagonal form with a special orthogonal transformation and find its eigenvalues using only real arithmetic. Is this implemented anywhere in numpy?
Let's take a look at the dgeev() function of the LAPACK library. This routine computes the eigenvalues of any real double-precision square matrix. Moreover, it is right behind the python function numpy.linalg.eigvals() of the numpy library.
The method used by dgeev() is described in the documentation of LAPACK. It requires the reduction of the matrix A to its real Schur form S.
Any real square matrix A can be expressed as:
A=QSQ^t
where:
Q is a real orthogonal matrix: QQ^t=I
S is a real block upper triangular matrix. The blocks on the diagonal of S are of size 1×1 or 2×2.
Indeed, if A is skew-symmetric, this decomposition seems really close to a block-diagonal form obtained by a special orthogonal transformation of A. Moreover, it is easy to see that the Schur form S of the skew-symmetric matrix A is ... skew-symmetric!
Indeed, let's compute the transpose of S:
S^t=(Q^tAQ)^t
S^t=Q^t(Q^tA)^t
S^t=Q^tA^tQ
S^t=Q^t(-A)Q
S^t=-Q^tAQ
S^t=-S
Hence, if Q is special orthogonal (det(Q)=1), S is a block-diagonal form obtained by a special orthogonal transformation. Otherwise, a special orthogonal matrix P can be computed by permuting the first two columns of Q, and another Schur form Sd of the matrix A is obtained by changing the sign of S_{12} and S_{21}. Indeed, A=PSdP^t, so Sd is a block-diagonal form of A obtained by a special orthogonal transformation (a small sketch of this fix-up follows the code below).
In the end, even if numpy.linalg.eigvals() applied to a real matrix returns complex numbers, there is little complex computation involved in the process !
If you just want to compute the real Schur form, use the function scipy.linalg.schur() with argument output='real'.
Just a piece of code to check that:
import numpy as np
import scipy.linalg as la

a = np.random.rand(4, 4)
a = a - np.transpose(a)
print("a =")
print(a)

# eigenvalues
w, v = np.linalg.eig(a)
print("eigenvalues")
print(w)
print("eigenvectors")
print(v)

# Schur decomposition
t, z = la.schur(a, output='real', lwork=None, overwrite_a=True, sort=None, check_finite=True)
print("schur form")
print(t)
print("orthogonal matrix")
print(z)
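As a complement, here is a self-contained sketch of the det(Q) fix-up described above (an even-sized matrix is used so the Schur blocks are all 2x2; the variable names are mine, not from the answer):

import numpy as np
import scipy.linalg as la

a = np.random.rand(4, 4)
a = a - a.T                             # random skew-symmetric matrix
t, z = la.schur(a, output='real')       # a = z @ t @ z.T with z orthogonal
if np.linalg.det(z) < 0:                # z is orthogonal but not special orthogonal
    E = np.eye(a.shape[0])
    E[:, [0, 1]] = E[:, [1, 0]]         # permutation swapping the first two coordinates
    z = z @ E                           # det(z) is now +1
    t = E @ t @ E                       # flips the signs of t[0, 1] and t[1, 0]
print(np.allclose(z @ t @ z.T, a), np.isclose(np.linalg.det(z), 1.0))  # True True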
Yes, you can do it by sticking a unitary transformation in the middle of the product; hence we get
A = V * U * V^{-1} = V * T' * T * U * T' * T * V^{-1}.
Once you get the idea, you can optimize the code by tiling things, but let's do it the naive way by forming T explicitly.
If the matrix is even-sized, then all blocks correspond to complex-conjugate eigenvalue pairs. Otherwise we also get a zero eigenvalue. The eigenvalues are guaranteed to have zero real parts, so the first thing is to clean up the numerical noise and then order the eigenvalues such that the zeros are in the upper left corner (an arbitrary choice).
n = 5
a = np.random.rand(n,n)
a=a-np.transpose(a)
[u,v] = np.linalg.eig(a)
perm = np.argsort(np.abs(np.imag(u)))
unew = 1j*np.imag(u[perm])
Obviously, we need to reorder the eigenvector matrix too to keep things equivalent.
vnew = v[:,perm]
So far we have done nothing other than reorder the middle eigenvalue matrix in the eigenvalue decomposition. Now we switch from the complex form to the real block-diagonal form.
First we have to know how many zero eigenvalues there are
numblocks = np.flatnonzero(unew).size // 2
num_zeros = n - (2 * numblocks)
Then we basically form another unitary transformation (complex this time) and stick it in the same way:
T = sp.linalg.block_diag(*[1.]*num_zeros,np.kron(1/np.sqrt(2)*np.eye(numblocks),np.array([[1.,1j],[1,-1j]])))
Eigs = np.real(T.conj().T.dot(np.diag(unew).dot(T)))
Evecs = np.real(vnew.dot(T))
This gives you the new real-valued decomposition. So, the code all in one place:
import numpy as np
import scipy as sp
import scipy.linalg  # makes sp.linalg available

n = 5
a = np.random.rand(n, n)
a = a - np.transpose(a)
u, v = np.linalg.eig(a)
perm = np.argsort(np.abs(np.imag(u)))
unew = 1j * np.imag(u[perm])
vnew = v[:, perm]
numblocks = np.flatnonzero(unew).size // 2
num_zeros = n - (2 * numblocks)
T = sp.linalg.block_diag(*[1.]*num_zeros, np.kron(1/np.sqrt(2)*np.eye(numblocks), np.array([[1., 1j], [1, -1j]])))
Eigs = np.real(T.conj().T.dot(np.diag(unew).dot(T)))
Evecs = np.real(vnew.dot(T))
print(np.allclose(Evecs.dot(Eigs.dot(np.linalg.inv(Evecs))) - a, np.zeros((n, n))))
gives True. Note that this is the naive way of obtaining the real spectral decomposition. There are lots of places where you need to keep track of numerical error accumulation.
Example output
Eigs
Out[379]:
array([[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , -0.61882847, 0. , 0. ],
[ 0. , 0.61882847, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , -1.05097581],
[ 0. , 0. , 0. , 1.05097581, 0. ]])
Evecs
Out[380]:
array([[-0.15419078, -0.27710323, -0.39594838, 0.05427001, -0.51566173],
[-0.22985364, 0.0834649 , 0.23147553, -0.085043 , -0.74279915],
[ 0.63465436, 0.49265672, 0. , 0.20226271, -0.38686576],
[-0.02610706, 0.60684296, -0.17832525, 0.23822511, 0.18076858],
[-0.14115513, -0.23511356, 0.08856671, 0.94454277, 0. ]])