I am trying to understand the implementation that is used in
scipy.stats.wasserstein_distance
for p=1 and no weights, with u_values, v_values the two 1-D distributions, the code comes down to
u_sorter = np.argsort(u_values)                                            # (1)
v_sorter = np.argsort(v_values)
all_values = np.concatenate((u_values, v_values))                          # (2)
all_values.sort(kind='mergesort')
deltas = np.diff(all_values)                                               # (3)
u_cdf_indices = u_values[u_sorter].searchsorted(all_values[:-1], 'right')  # (4)
v_cdf_indices = v_values[v_sorter].searchsorted(all_values[:-1], 'right')
v_cdf = v_cdf_indices / v_values.size                                      # (5)
u_cdf = u_cdf_indices / u_values.size
return np.sum(np.multiply(np.abs(u_cdf - v_cdf), deltas))                  # (6)
What is the reasoning behind this implementation, is there some literature?
I did look at the paper cited, which I believe explains why computing the Wasserstein distance in its general definition reduces, in 1D, to evaluating the integral
\int_{-\infty}^{+\infty} |U - V|,
with U and V the cumulative distribution functions of u_values and v_values,
but I don't understand how this integral is evaluated in the scipy implementation.
In particular,
a) why do they multiply by the deltas in (6) to evaluate the integral?
b) how are v_cdf and u_cdf in (5) the cumulative distribution functions U and V?
Also, with this implementation the element order of u_values and v_values is not preserved. Shouldn't order be preserved under the general Wasserstein distance definition?
Thank you for your help!
The order of the PDF, histogram or KDE is preserved and is important in the Wasserstein distance. If you only pass u_values and v_values, then the function has to construct something like a PDF, KDE or histogram itself. Normally you would provide the PDFs and the ranges of U and V as the four arguments to wasserstein_distance, so when only samples are provided you are not passing real ordered datapoints, simply a collection of repeated "experiments". Steps (1) and (4) in your list of code blocks essentially bin your data by its distinct values. A CDF value is the number of discrete values up to that point, i.e. P(X <= x); the CDF is the cumulative sum of a PDF, histogram or KDE. Step (5) normalizes the CDF to the range 0.0 to 1.0 by dividing each cumulative count by the sample size.
So the order of the discrete values is preserved, not the original order in the datapoint.
B) It may make more sense if you plot the CDFs of a datapoint, such as an image file, using the code above.
The transportation problem, however, may not need a PDF but rather a datapoint of ordered features, or some way to measure distance between features, in which case you would calculate it differently.
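To see concretely how the integral is evaluated, here is a minimal sketch (not the scipy source itself, just the same idea spelled out): the empirical CDFs are step functions that are constant between consecutive sorted sample values, so the integral of |U - V| is an exact finite sum of rectangle areas |U(x_i) - V(x_i)| * (x_{i+1} - x_i), which is what the multiplication by deltas in (6) computes.

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
u_values = rng.normal(0.0, 1.0, size=50)
v_values = rng.normal(0.5, 1.2, size=70)

all_values = np.sort(np.concatenate((u_values, v_values)))
deltas = np.diff(all_values)  # widths of the rectangles between breakpoints

# Empirical CDF of a sample at x = fraction of sample points <= x
u_cdf = np.searchsorted(np.sort(u_values), all_values[:-1], 'right') / u_values.size
v_cdf = np.searchsorted(np.sort(v_values), all_values[:-1], 'right') / v_values.size

manual = np.sum(np.abs(u_cdf - v_cdf) * deltas)
print(manual, wasserstein_distance(u_values, v_values))  # the two values agree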
(Working in 2d for simplicity) I know that the force exerted on two spherical bodies by each other due to gravity is
G(m1*m2/r**2)
However, for a non-spherical object, I cannot find an algorithm or formula that is able to calculate the same force. My initial thought was to pack circles into the object so that the force of gravity would be equal to the sum of the forces from each of the circles, e.g. (pseudocode):
def gravity(pos1, shape):
    # packCircles, distanceTo and newtonForce are hypothetical helpers
    circles = packCircles(shape.points)
    force = 0
    for circle in circles:
        distance = distanceTo(pos1, circle.pos)
        force += newtonForce(distance, shape.mass, 1)  # 1 = mass of observer
    return force
Would this be a viable solution? If so, how would I pack circles efficiently and quickly? If not, is there a better solution?
Edit: Notice how I want to find the force from the object at a specific point, so angles between each circle and the observer will need to be calculated (and the vectors summed). It is different from finding the total force exerted.
Background
Some of this explanation will be somewhat off-topic but I think it is necessary to help clarify some of the things brought up in the comments and because much of this is somewhat counterintuitive.
This explanation of gravitational interactions depends on the concept of point masses. Suppose you have two point masses in an isolated system, separated from each other by some distance r_{1,2}, with masses m_1 and m_2 respectively.
The gravitational field created by m_1 is given by

\vec{g}_1 = -\frac{G m_1}{r^2}\,\hat{r},

where G is the universal gravitational constant, r is the distance from m_1 and \hat{r} is the unit direction along the line between m_1 and m_2.
The gravitational force exerted on m_2 by this field is given by

\vec{F}_2 = m_2\,\vec{g}_1 = -\frac{G m_1 m_2}{r_{1,2}^2}\,\hat{r}_{1,2}.

Note - Importantly, this is true for any two point masses at any distance.[1]
The field nature of gravitational interactions allows us to employ superposition in calculating the net gravitational force due to multiple interactions. Consider if we add another mass, m_3, to the previous scenario. Then the gravitational force on mass m_2 is simply a sum of the gravitational force from the fields created by each other mass,

\vec{F}_2 = -G m_2 \left( \frac{m_1}{r_{1,2}^2}\,\hat{r}_{1,2} + \frac{m_3}{r_{3,2}^2}\,\hat{r}_{3,2} \right),

with r_{i,j} = r_{j,i}. This holds for any number of masses at any separations. It also implies that the field created by a collection of masses can be aggregated by a vector sum, if you prefer that formalism.
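As a minimal numeric sketch of this superposition (the positions and masses below are made up for illustration):

import numpy as np

G = 6.67408e-11  # m^3 / (kg s^2)

def pairwise_force(m_src, r_src, m_tgt, r_tgt):
    # Newtonian force on the target mass, pointing from target toward source
    d = r_src - r_tgt
    r = np.linalg.norm(d)
    return G * m_src * m_tgt / r**2 * (d / r)

r1, r2, r3 = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([4.0, 3.0])
m1, m2, m3 = 5e10, 1.0, 2e10

# Net force on m2 is the vector sum of the two pairwise forces
F2 = pairwise_force(m1, r1, m2, r2) + pairwise_force(m3, r3, m2, r2)
print(F2)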
Now consider if we had a very large number of point masses, M, aggregated together in a continuous, rigid body of uniform density. Then we wanted to calculate the gravitational force on a single spatially distinct point mass, m, due to the aggregate mass, M:
Then instead of considering point masses we can consider areas (or volumes) of mass of differential size and either integrate or sum the effect of these areas (or volumes) on the point mass. In the two dimensional case, the gravitational force is then

\vec{F} = G m \sigma \iint \frac{\hat{r}}{r^2}\,dx\,dy,

where \sigma is the density of the aggregate mass[2] and r is the distance from the point mass to the differential mass \sigma\,dx\,dy. This is just the sum of the gravitational vector field due to each differential mass. Such equivalence is critically important because it implies that for any point mass far enough outside of a mass distribution, the gravitational force due to the mass distribution is almost exactly the same as it would be for a point mass of mass M located at the center of mass of the mass distribution.[3][4]
This means that, to very good approximation, when it comes to calculating the gravitational field due to any mass distribution, the mass distribution can be replaced with an equivalent-mass point mass at the center of mass of the distribution. This holds for any number of spatially distinct mass distributions, whether those distributions constitute a rigid body or not. Furthermore, it means that you can even aggregate groups of distributions into a single point mass at the center of mass of the system,[5] as long as the reference point is far enough away.
However, in order to find the gravitational force on a point mass due to a mass distribution at any point, for any mass distribution, in a shape- and separation-agnostic manner, we have to calculate the gravitational field at that point by summing the contributions from each portion of the mass distribution.[6]
Back to the question
Of course for an arbitrary polygon or polyhedron the analytical solution can be prohibitively difficult, so it is much simpler to use a summation, and algorithmic approaches will similarly use a summation.
Algorithmically speaking, the simplest approach here is not actually geometric packing (with either circles/spheres or squares/cubes). It's not impossible to use packing, but mathematically there are significant challenges to that approach; it is better to employ a method which relies on simpler math. One such approach is to define a grid encompassing the spatial extent of the mass distribution, and then create simple (square/cubic or rectangular/cuboidal) polygons or polyhedra with the grid points as vertices. This creates three kinds of polygons or polyhedra:
Those which do not encompass any of the mass distribution
Those which are completely filled by the mass distribution
Those which are partially filled by the mass distribution
Center of Mass - Approach 1
This will work well when the distance from the reference point to the mass distribution is large relative to the angular extent of the distribution, and when there is no geometric enclosure of the reference by the mass distribution (or by any several distributions).
You can then find the center of mass, \vec{R}, of the distribution by summing the contributions from each polygon,

\vec{R} = \frac{1}{M} \sum_i m_i\,\vec{r}_i,

where M is the total mass of the distribution, \vec{r}_i is the spatial vector to the geometric center of the i-th polygon, and m_i is the density times the filled fraction of the i-th polygon (i.e. 1.00 for completely filled polygons and 0.00 for completely empty polygons). As you increase the sampling size (the number of grid points) the approximation for the center of mass will approach the analytical solution. Once you have the center of mass it is trivial to calculate the gravitational field created: you simply place a point mass of mass M at the point \vec{R} and use the equation from above.
For demonstration, here is an implementation of the described approach in two dimensions in Python using the shapely library for the polygon operations:
import numpy as np
import matplotlib.pyplot as plt
import shapely.geometry as geom
def centerOfMass(r, density = 1.0, n = 100):
    theta = np.linspace(0, np.pi*2, len(r))
    xy = np.stack([np.cos(theta)*r, np.sin(theta)*r], 1)
    mass_dist = geom.Polygon(xy)
    x, y = mass_dist.exterior.xy
    # Create the grid and populate with polygons
    gx, gy = np.meshgrid(np.linspace(min(x), max(x), n),
                         np.linspace(min(y), max(y), n))
    polygons = [geom.Polygon([[gx[i,j],     gy[i,j]],
                              [gx[i,j+1],   gy[i,j+1]],
                              [gx[i+1,j+1], gy[i+1,j+1]],
                              [gx[i+1,j],   gy[i+1,j]],
                              [gx[i,j],     gy[i,j]]])
                for i in range(gx.shape[0]-1) for j in range(gx.shape[1]-1)]
    # Calculate center of mass
    R = np.zeros(2)
    M = 0
    for p in polygons:
        m = (p.intersection(mass_dist).area / p.area) * density
        M += m
        R += m * np.array([p.centroid.x, p.centroid.y])
    return geom.Point(R / M), M
density = 1.0 # kg/m^2
G = 6.67408e-11 # m^3/kgs^2
theta = np.linspace(0, np.pi*2, 100)
r = np.cos(theta*2+np.pi)+5+np.sin(theta)+np.cos(theta*3+np.pi/6)
R, M = centerOfMass(r, density)
m = geom.Point(20, 0)
r_1 = m.distance(R)
m_1 = 5.0 # kg
F = G * (m_1 * M) / r_1**2
rhat = np.array([R.x - m.x, R.y - m.y])
rhat /= (rhat[0]**2 + rhat[1]**2)**0.5
# Draw the mass distribution and force vector, etc
plt.figure(figsize=(12, 6))
plt.axis('off')
plt.plot(np.cos(theta)*r, np.sin(theta)*r, color='k', lw=0.5, linestyle='-')
plt.scatter(m.x, m.y, s=20, color='k')
plt.text(m.x, m.y-1, r'$m$', ha='center')
plt.text(1, -1, r'$M$', ha='center')
plt.quiver([m.x], [m.y], [rhat[0]], [rhat[1]], width=0.004,
scale=0.25, scale_units='xy')
plt.text(m.x - 5, m.y + 1, r'$F = {:.5e}$'.format(F))
plt.scatter(R.x, R.y, color='k')
plt.text(R.x, R.y+0.5, 'Center of Mass', va='bottom', ha='center')
plt.gca().set_aspect('equal')
plt.show()
This approach is a bit overkill: in most cases it would suffice to take the polygon's centroid as the center of mass and its area multiplied by the density as the total mass. However, it also works for non-uniform mass distributions, which is why I have used it for demonstration.
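For a uniform density, that shortcut is a couple of lines with shapely (a minimal sketch reusing the test shape and the density variable from the listing above):

theta = np.linspace(0, np.pi*2, 100)
r = np.cos(theta*2+np.pi)+5+np.sin(theta)+np.cos(theta*3+np.pi/6)
xy = np.stack([np.cos(theta)*r, np.sin(theta)*r], 1)
mass_dist = geom.Polygon(xy)

R_simple = mass_dist.centroid        # exact polygon centroid from shapely
M_simple = mass_dist.area * density  # area times the uniform density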
Field Summation - Approach 2
In many cases this approach is also overkill, especially in comparison to the first approach, but it will provide the best approximation under any distributions (within the classical regime).
The idea here is to sum the effect of each chunk of the mass distribution on a point mass to determine the net gravitational force, based on the premise that the gravitational fields can be independently added:
class pointMass:
    def __init__(self, mass, x, y):
        self.mass = mass
        self.x = x
        self.y = y

density = 1.0 # kg/m^2
G = 6.67408e-11 # m^3/kgs^2

def netForce(r, m1, density = 1.0, n = 100):
    theta = np.linspace(0, np.pi*2, len(r))
    xy = np.stack([np.cos(theta)*r, np.sin(theta)*r], 1)
    # Create a shapely polygon for the mass distribution
    mass_dist = geom.Polygon(xy)
    x, y = mass_dist.exterior.xy
    # Create the grid and populate with polygons
    gx, gy = np.meshgrid(np.linspace(min(x), max(x), n),
                         np.linspace(min(y), max(y), n))
    polygons = [geom.Polygon([[gx[i,j],     gy[i,j]],
                              [gx[i,j+1],   gy[i,j+1]],
                              [gx[i+1,j+1], gy[i+1,j+1]],
                              [gx[i+1,j],   gy[i+1,j]],
                              [gx[i,j],     gy[i,j]]])
                for i in range(gx.shape[0]-1) for j in range(gx.shape[1]-1)]
    g = np.zeros(2)
    for p in polygons:
        m2 = (p.intersection(mass_dist).area / p.area) * density
        rhat = np.array([p.centroid.x - m1.x, p.centroid.y - m1.y])
        rhat /= (rhat[0]**2 + rhat[1]**2)**0.5
        g += m1.mass * m2 / p.centroid.distance(geom.Point(m1.x, m1.y))**2 * rhat
    g *= G
    return g
theta = np.linspace(0, np.pi*2, 100)
r = np.cos(theta*2+np.pi)+5+np.sin(theta)+np.cos(theta*3+np.pi/6)
m = pointMass(5.0, 20.0, 0.0)
g = netForce(r, m)
plt.figure(figsize=(12, 6))
plt.axis('off')
plt.plot(np.cos(theta)*r, np.sin(theta)*r, color='k', lw=0.5, linestyle='-')
plt.scatter(m.x, m.y, s=20, color='k')
plt.text(m.x, m.y-1, r'$m$', ha='center')
plt.text(1, -1, r'$M$', ha='center')
ghat = g / (g[0]**2 + g[1]**2)**0.5
plt.quiver([m.x], [m.y], [ghat[0]], [ghat[1]], width=0.004,
scale=0.25, scale_units='xy')
plt.text(m.x - 5, m.y + 1, r'$F = ({:0.3e}, {:0.3e})$'.format(g[0], g[1]))
plt.gca().set_aspect('equal')
plt.show()
Which, for the relatively simple test case being used, gives a result which is very close to the first approach:
But while there are cases where the first approach will not work correctly, there are no cases (in the classical regime) where the second approach fails, so it is advisable to favor the second approach.
[1] This does break down under extremes, e.g. past the event horizon of black holes, or when r approaches the Planck length, but those cases are not the subject of this question.
[2] This becomes significantly more complex in cases where the density is non-uniform, and there is no trivial analytical solution in cases where the mass distribution cannot be described symbolically.
[3] It should probably be noted that this is effectively what the integral is doing: finding the center of mass.
[4] For a point mass within a mass distribution, Newton's shell theorem or a field summation must be used.
[5] In astronomy this is called a barycenter, and bodies always orbit the barycenter of the system, not the center of mass of any given body.
[6] In some cases it is sufficient to use Newton's shell theorem; however, those cases are not distribution-geometry agnostic.
I'm trying to understand the Spherical harmonics expansion in order to solve a more complex problem but the result I'm expecting from a very simple calculation is not correct. I have no clue why this is happening.
A bit of theory: it is well known that a function f(\theta, \phi) on the surface of a sphere can be defined as an infinite sum of constant coefficients f_l^m and the spherical harmonics Y_l^m (here following scipy's convention, with \theta \in [0, 2\pi] the azimuthal angle and \phi \in [0, \pi] the polar angle):

f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_l^m\, Y_l^m(\theta, \phi)

The spherical harmonics are defined as

Y_l^m(\theta, \phi) = \sqrt{\frac{2l+1}{4\pi}\frac{(l-m)!}{(l+m)!}}\, P_l^m(\cos\phi)\, e^{i m \theta},

where P_l^m are the associated Legendre polynomials.
And finally, the constant coefficients can be calculated (similarly to the Fourier transform) as follows:

f_l^m = \int_0^{\pi} \int_0^{2\pi} f(\theta, \phi)\, \overline{Y_l^m(\theta, \phi)}\, \sin(\phi)\, d\theta\, d\phi
The problem: let's assume we have a sphere centered at the origin where the function on the surface is equal to 1 for all points (\theta, \phi). We want to calculate the constant coefficients and then calculate back the surface function by approximation. Since f \equiv 1, the calculation of the constant coefficients reduces to:

f_l^m = \int_0^{\pi} \int_0^{2\pi} \overline{Y_l^m(\theta, \phi)}\, \sin(\phi)\, d\theta\, d\phi
which numerically (in Python) can be approximated using:
import numpy as np
import scipy.special

def Ylm(l, m, theta, phi):
    # scipy's signature is sph_harm(m, l, azimuthal, polar)
    return scipy.special.sph_harm(m, l, theta, phi)

def flm(l, m):
    phi, theta = np.mgrid[0:np.pi:101j, 0:2*np.pi:101j]
    return Ylm(l, m, theta, phi).sum()
Then, by computing a band-limited sum over l up to L, I'm expecting to see f \approx 1 at any given point (\theta_0, \phi_0):
L = 20
f = 0
theta0, phi0 = 0.0, 0.0
for l in range(0, L+1):
    for m in range(-l, l+1):
        f += flm(l, m) * Ylm(l, m, theta0, phi0)
print(f)
but it gives me a value other than the expected 1, and other choices of L do not give 1 either.
I know it seems more like a mathematics problem, but the formulas should be correct; the problem seems to be in my computation. It could be a really stupid mistake but I cannot spot it. Any suggestions?
Thanks
The spherical harmonics are orthonormal with respect to the inner product

<f|g> = Integral( conj(f(theta,phi)) * g(theta,phi) * sin(theta) * dphi * dtheta )

So you should calculate the coefficients by

clm = Integral( conj(Ylm(theta,phi)) * f(theta,phi) * sin(theta) * dphi * dtheta )

Here theta denotes the polar angle (phi in your grid). Your flm sums the raw Ylm values and leaves out both the sin weight and the quadrature cell size dphi*dtheta.
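Here is a sketch of the corrected computation (the grid resolution and variable names are my own choices, not from the original post); with the sin weight and cell areas included, the reconstructed value comes out close to 1:

import numpy as np
from scipy.special import sph_harm

n = 200
theta = np.linspace(0, 2*np.pi, n, endpoint=False)  # azimuthal angle
phi = np.linspace(0, np.pi, n)                      # polar angle
tt, pp = np.meshgrid(theta, phi)
dtheta = 2*np.pi / n
dphi = np.pi / (n - 1)

def flm(l, m):
    # f_lm = Integral( conj(Ylm) * f * sin(polar) ), with f == 1 here
    Y = sph_harm(m, l, tt, pp)  # scipy: sph_harm(m, l, azimuthal, polar)
    return (np.conj(Y) * np.sin(pp)).sum() * dtheta * dphi

L = 20
f = 0.0
theta0, phi0 = 0.0, 0.0
for l in range(L + 1):
    for m in range(-l, l + 1):
        f += flm(l, m) * sph_harm(m, l, theta0, phi0)
print(f.real)  # close to 1.0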
Model I-V.
Method:
Perform an integral over E which outputs a current for each voltage value used. This is repeated for an array of v_values. From the code below, the equation is

I(V) = \frac{1}{eR} \int_{-\infty}^{+\infty} \frac{|E|}{\sqrt{E^2-\Delta^2}}\, \frac{|E+eV|}{\sqrt{(E+eV)^2-\Delta^2}}\, \left[ f(E+eV) - f(E) \right] dE, \qquad f(E) = \frac{1}{1+e^{E/kT}}.

Although the limits in this equation range from -\infty to +\infty, they must be restricted so that (E+eV)^2-\Delta^2>0 and E^2-\Delta^2>0, to avoid the poles (\Delta_1 = \Delta_2 = \Delta, called gap in the code). Therefore there are currently two integrals, with limits from -\infty to -\Delta-eV and from \Delta to +\infty.
However, I keep getting a math range error although I believe I have excluded the troublesome E values by using the limits stated above. Pastie of errors: http://pastie.org/private/o3ugxtxai8zbktyxtxuvg
Apologies for the vagueness of this question. But, can anybody see obvious mistakes or code misuse?
My attempt:
from scipy import integrate
from numpy import *
import scipy as sp
import pylab as pl
import numpy as np
import math
e = 1.60217646*10**(-19)
r = 3000
gap = 400*10**(-6)*e
g = (gap)**2
t = 0.02
k = 1.3806503*10**(-23)
kt = k*t
v_values = np.arange(0,0.001,0.0001)
I = []
for v in v_values:
    val, err = integrate.quad(lambda E:(1/(e*r))*(abs(E)/np.sqrt(abs(E**2-g)))*(abs(E+e*v)/(np.sqrt(abs((E+e*v)**2-g))))*((1/(1+math.exp((E+e*v)/kt)))-(1/(1+math.exp(E/k*t)))),-inf,(-gap-e*v)*0.9)
    I.append(val)
I = array(I)

I2 = []
for v in v_values:
    val2, err = integrate.quad(lambda E:(1/(e*r))*(abs(E)/np.sqrt(abs(E**2-g)))*(abs(E+e*v)/(np.sqrt(abs((E+e*v)**2-g))))*((1/(1+math.exp((E+e*v)/kt)))-(1/(1+math.exp(E/k*t)))),gap*0.9,inf)
    I2.append(val2)
I2 = array(I2)
I[np.isnan(I)] = 0
I[np.isnan(I2)] = 0
pl.plot(v_values,I,'-b',v_values,I2,'-b')
pl.show()
This question is better suited for the Computational Science site. Still, here are some points for you to think about.
First, the range of integration is the intersection of (-oo, -eV-gap) U (-eV+gap, +oo) and (-oo, -gap) U (gap, +oo). There are two possible cases (see the sketch after this list):
if eV < 2*gap then the allowed energy values are in (-oo, -eV-gap) U (gap, +oo);
if eV > 2*gap then the allowed energy values are in (-oo, -eV-gap) U (-eV+gap, -gap) U (gap, +oo).
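A minimal sketch of that case split in code (the helper name is mine, not from the original post):

import numpy as np

def energy_ranges(eV, gap):
    # Intervals where both square-root arguments are positive
    if eV < 2 * gap:
        return [(-np.inf, -eV - gap), (gap, np.inf)]
    else:
        return [(-np.inf, -eV - gap), (-eV + gap, -gap), (gap, np.inf)]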
Second, you are working in a very low temperature region. With t equal to 0.02 K, the kT in the exponent is 1.7 µeV, while the energy gap is 400 µeV. At E = gap you would have exp(E/kT) of approximately 6.24E+100, and since the gap is the minimum possible positive energy, things would not get any better at higher energies; the exponential quickly exceeds the limits of the double precision floating point numbers used by Python. You run out of range when E/kT > 709.78, i.e. E > 3.06*gap. With negative energies the value is always very close to zero. Note that at this temperature the Fermi-Dirac distribution has a very sharp edge and resembles a reflected theta function.
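One way to keep the Fermi factors themselves from overflowing (this does not fix the physics issues discussed here, only the math range error) is to evaluate them with scipy's expit, which saturates to 0 or 1 instead of overflowing:

from scipy.special import expit

def fermi(E, kt):
    # 1/(1 + exp(E/kt)) rewritten as expit(-E/kt), stable for large |E|/kt
    return expit(-E / kt)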
Yet it makes no sense to go to such energies, since at that temperature the difference between the two Fermi functions very quickly becomes zero outside the [-eV, 0] interval, which falls entirely inside the gap when V < 2*gap/e (0.8 mV). That's why one would expect the current to be very close to zero when the bias voltage is less than 0.8 mV. When it is more than 0.8 mV, the main value of the integral comes from the integrand in (-eV+gap, -gap), although some non-zero contribution comes from the region near the singularity at E = gap and some from the region near the singularity at E = -eV-gap. You should not avoid the singularities in the DoS, otherwise you will not get the expected discontinuities (vertical lines) in the I(V) curve of such a junction (there is a plot of this on Wikipedia).
Rather, you have to derive equivalent approximate expressions in the vicinity of each singularity and integrate them instead.
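For example, near the edge of the allowed region at E = \Delta, one standard substitution is E = \Delta\cosh u: then \sqrt{E^2-\Delta^2} = \Delta\sinh u and dE = \Delta\sinh u\,du, so

\frac{dE}{\sqrt{E^2-\Delta^2}} = du,

and the transformed integrand stays finite at the edge of the gap.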
As you can see, there are many special cases for the value of the integrand and you have to take them all into account when computing numerically. If you don't want to do that, you should probably turn to some other mathematical package like Maple or Mathematica. These have much more sophisticated numerical integration routines and might be able to directly handle your formula.
Note that this is not an attempt to answer your question but rather a very long comment that would not fit in any comment field.
The reason for the math range error is that your exponential goes to infinity. Taking v = 0.0009 and E = 5.18e-23, the expression exp((E + e*v) / kt) (I corrected the typo pointed out by Hristo Liev in your Python expression) is exp(709.984..) which is beyond the range you can represent with double precision numbers (up to ca. 1E308).
Two additional notes:
As noted by others, you should probably rescale your equation by using a unit system which delivers numbers in a smaller range. Atomic units are a possible choice, as they would set e = 1, but I did not try to convert your equation into them. (Your timestep would then probably become quite large, as the atomic unit of time is about 1/40 fs.)
Usually, one uses exponential notation for floating point numbers: e = 1.60217E-19 instead of e = 1.60217*10**(-19).
The best way to approach this problem in the end was to use a Heaviside function to keep the E variable from crossing into the region where E^2 < \Delta^2.
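A minimal sketch of that idea (the function name is illustrative, not from the original code): use np.heaviside to zero the density-of-states factor inside the gap, so the square root never sees a non-positive argument:

import numpy as np

def dos(E, g):
    # g = gap**2; step is 0 inside the gap and 1 outside
    x = E**2 - g
    step = np.heaviside(x, 0.0)
    return step * np.abs(E) / np.sqrt(np.where(x > 0, x, 1.0))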