Let's look at m points in n-d space- (A solution for 4 points in 3-d space is here: minimize distance from sets of points)
a= (x1, y1, z1, ..)
b= (x2, y2 ,z2, ..)
c= (x3, y3, z3, ..)
p= (x , y , z, ..)
Find point q = c1* a + c2* b + c3* c + ..
where c1 + c2 + c3 + .. = 1
and c1, c2, c3, .. >= 0
such that the Euclidean distance |pq| is minimized.
What algorithms can be used ? Idea or pseudocode is enough.
(Optimizing performance is a big issue here. Monte Carlo method with all vertices and changing coefficients would also give a solution.)
We can assume p = 0 by subtracting p from all the other points. Then the question is one of minimizing the norm over a convex hull of a finite set of points, i.e., a polytope.
There are a few papers on this problem. It looks like "A recursive algorithm for finding the minimum norm point in a polytope and a pair of closest points in two polytopes" by Kazuyuki Sekitani and Yoshitsugu Yamamoto is a good one, with a short survey of prior solutions to the problem. It is behind a paywall but if you have access to a university library you may be able to download a copy.
The algorithm they give is fairly simple, once you get past the notation. P is the finite set of points. C(P) is its convex hull. Nr(C(P)) is the unique point of minimum norm, which is what you want to find.
Step 0: Choose a point x_0 from the convex hull C(P) of your finite set of points P. They recommend choosing x_0 to be the point in P with minimum norm. Let k=1.
Now loop:
Step 1: Let a_k = min {x^t_{k-1} p | p is in P}. Here x^t_{k-1} is the transpose of x_{k-1} (so the function being minimized is just a dot product as p ranges over your finite set P). If |x_{k-1}|^2 <= a_k, then the answer is x_{k-1}, stop.
Step 2: P_k = {p | p in P and x^t_{k-1} p = a_k}. P_k is the subset of P that minimizes the expression in Step 1. Call the algorithm recursively on this set P_k, and let the result be y_k = Nr(C(P_k)).
Step 3: b_k = min{y^t_k p | p in P\P_k}, the minimum of the dot product of y_k with points in the complement set P\P_k. If |y_k|^2 <= b_k then y_k is the answer, stop.
Step 4: s_k = max{s| [(1-s)x_{k-1} + sy_k]^t y_k <= [(1-s)x_{k-1} + sy_k]^t p for every p in P\P_k}. Let x_k = (1-s_k) x_{k-1} + s_k y_k, let k=k+1, and go back to Step 1.
There is an explicit formula for s_k in Step 4:
s_k = min{ [x^t_{k-1} (p-y_k)]/[(y_k-x_{k-1})^t (y_k-p)] | p in P\P_k and (y_k - x_{k-1})^t (y_k-p) > 0 }
There is a proof in the paper that s_k has the necessary properties, that the algorithm terminates after a finite number of operations, and that the result is indeed optimal.
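The explicit formula can be checked by rearranging the condition in Step 4. The following is only a sketch of that rearrangement, not the paper's proof; it writes x for x_{k-1} and y for y_k.

```latex
\begin{align*}
[(1-s)x + s y]^t y \le [(1-s)x + s y]^t p
  &\iff (1-s)\,x^t (p - y) + s\,y^t (p - y) \ge 0 \\
  &\iff s\,(y - x)^t (y - p) \le x^t (p - y).
\end{align*}
```

Because y lies in C(P_k), x^t y = a_k, so x^t (p - y) = x^t p - a_k >= 0 for every p in P. A point p with (y - x)^t (y - p) <= 0 therefore never restricts s >= 0, while each p with (y - x)^t (y - p) > 0 gives the upper bound s <= x^t (p - y) / [(y - x)^t (y - p)]. The smallest of these bounds is s_k.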
Note that you should add some tolerance into your comparisons, otherwise rounding errors may cause the algorithm to fail. There is a lot of discussion about numerical stability, see the paper for details.
They do not give a complete analysis of the computational complexity of the algorithm, but they do prove it is at most O(m^2) in the two-dimensional case (m is the number of points in P), and they have done numerical experiments which give the impression that it is sublinear in time as a function of m, with dimension fixed. I'm skeptical of that claim. In the absence of a detailed analysis, I suggest you try some experiments with typical data to see how well the algorithm performs for you.
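For experimenting before implementing the recursive algorithm, a much simpler baseline may be useful. The sketch below is not the Sekitani-Yamamoto algorithm; it is a plain Frank-Wolfe (Gilbert-style) iteration that reuses the same stopping test as Step 1. The function name closest_point_in_hull and its parameters are assumptions made for illustration, and convergence can be slow when the answer lies on a face of the hull.

```python
import numpy as np

def closest_point_in_hull(points, p, tol=1e-10, max_iter=10000):
    """Approximate the point of conv(points) closest to p (Frank-Wolfe sketch)."""
    p = np.asarray(p, dtype=float)
    P = np.asarray(points, dtype=float) - p        # shift so the query point is 0
    x = P[np.argmin(np.einsum("ij,ij->i", P, P))]  # start at the min-norm vertex
    for _ in range(max_iter):
        dots = P @ x
        j = np.argmin(dots)
        # Same optimality test as Step 1: |x|^2 <= min_p x.p (up to a tolerance)
        if x @ x - dots[j] <= tol:
            break
        d = P[j] - x
        # Exact line search for the minimum norm on the segment from x to P[j]
        t = min(1.0, max(0.0, -(x @ d) / (d @ d)))
        x = x + t * d
    return x + p                                   # undo the shift

q = closest_point_in_hull([[0, 0], [2, 0], [0, 2]], [2, 2])
print(q)
```

The tolerance plays the role of the comparison slack mentioned above; max_iter is only a safety net for slowly converging inputs.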
Stated a simpler way, you have a set of points {a}i, and you are considering all points which are some weighted average thereof. This set of points is exactly the convex hull of those points; it's a polytope (polygon, polyhedron, etc.) that just happens to be convex, where the corners are a subset of the {a}i points.
You are just asking which point on a polytope(~hedron) is closest to a point. (your query point p)
If p lies outside the polytope, the closest point must be on the boundary of the polytope (if p lies inside, the closest point is p itself). One algorithm would be to brute-force search all (N-1)-dimensional surfaces. Do this in the usual way you would find the closest point on a line or surface or N-dimensional surface to a query point.
(If the points are not all affinely independent, you will have multiple ways (multiple weight vectors) which can give you the same weighted-average point q. You can worry about reconstructing the answer q from the basis vectors after you find it geometrically.)
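The one-dimensional building block of such a brute-force search is the closest point on a segment between two of the points. A minimal sketch, with the function name closest_point_on_segment chosen here for illustration:

```python
import numpy as np

def closest_point_on_segment(a, b, p):
    """Return (q, t) with q = (1 - t) * a + t * b the point of segment ab nearest to p."""
    a, b, p = (np.asarray(v, dtype=float) for v in (a, b, p))
    ab = b - a
    denom = ab @ ab
    if denom == 0.0:            # a and b coincide, the segment is a single point
        return a, 0.0
    # Project p onto the infinite line, then clamp the parameter to the segment
    t = float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return a + t * ab, t

q, t = closest_point_on_segment([0, 0], [2, 0], [1, 3])
print(q, t)
```

The returned t gives the weights directly: c1 = 1 - t for a and c2 = t for b, which satisfy the constraints in the question.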
My goal is to obtain the representations of all faces (in the form of A[x,y,z]'>b) of a polyhedron that is the result of the convex difference between two convex polyhedra. Meaning, finding the intersection of all planes that are the result of the Minkowski difference of P1 - P2 = { x - y | x \in P1, y \in P2 }.
I'm looking for either an established library (Python?) or an idea on how to do this efficiently. I thought about doing something similar to the GJK algorithm, but I need all of the faces, and not just a quick test of whether the origin is inside. Moreover, it seems inefficient to use this support function in a methodical way in 3D, or higher dimensions. Also, let's say I got the vertices, do I now need to form the plane equation from two vectors on it with the cross product, for every face, or is there a way to obtain it from the Minkowski sum itself? (keeping in mind the need for higher dimensions).
Ok, it seems I was finally able to solve it, and I'm posting in case it would interest anyone in the future:
First, I pip installed the pypoman library.
With it, we are able to move easily between vertices and faces with compute_polytope_halfspaces (aka, the H-representation of a polytope). So I get the representation P_i: H_i x < h_i for i=1,2 from the vertices (or skip it if it's already in the correct format).
Now if we set P_sum = {[x1;x2] \in R^2n | [H_1 0; 0 H_2] [x1;x2]' < [h_1,h_2]'}, notice that the Minkowski sum is equivalent to P1+P2 = [I,I] P_sum (idea from this paper IV.B). So I can use pypoman's project_polytope function to get the Minkowski sum with H_sum x < h_sum in the original dimensions.
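When both polytopes are available as small vertex lists, a vertex-based alternative may be easier to try first. This is a different technique from the projection approach above: it forms all pairwise differences and lets Qhull (through scipy.spatial.ConvexHull) return the facets. The function name minkowski_difference_halfspaces is an assumption for illustration, and the pair count grows as |V1|*|V2|, so this is only a sketch for modest inputs.

```python
import numpy as np
from scipy.spatial import ConvexHull

def minkowski_difference_halfspaces(V1, V2):
    """Return (A, b) with A @ x <= b describing conv(V1) - conv(V2)."""
    V1 = np.asarray(V1, dtype=float)
    V2 = np.asarray(V2, dtype=float)
    # Every vertex of P1 - P2 is a difference v1 - v2 of two vertices
    diffs = (V1[:, None, :] - V2[None, :, :]).reshape(-1, V1.shape[1])
    diffs = np.unique(diffs, axis=0)          # drop repeated points
    hull = ConvexHull(diffs)
    # Each row of hull.equations is [normal, offset] with normal.x + offset <= 0 inside
    A = hull.equations[:, :-1]
    b = -hull.equations[:, -1]
    return A, b

square = [[0, 0], [1, 0], [1, 1], [0, 1]]
A, b = minkowski_difference_halfspaces(square, square)
print(np.all(A @ np.array([0.5, -0.5]) <= b + 1e-9))
```

In 3D and higher, Qhull may report several triangular facets lying in the same plane, so rows of (A, b) can repeat; they can be merged afterwards if a minimal description is needed.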
First of all, I know that these threads exist! So bear with me, my question is not fully answered by them.
As an example assume we are in a 4-dimensional vector space, i.e R^4. We are looking at the two linear equations:
3*x1 - 2* x2 + 7*x3 - 2*x4 = 6
1*x1 + 3* x2 - 2*x3 + 5*x4 = -2
The actual question is: Is there a way to generate a number N of points that solve both of these equations, making use of the linear solvers from NumPy etc.?
The main problem with all python libraries I have tried so far is: they need n equations for a n-dimensional space
Solving the problem is very easy for one equation, since you can simply use n-1 randomly generated values and adapt the last one such that the vector solves the equation.
My expected result would be a list of N "randomly" generated points that solve k linear equations in an n-dimensional space, where k<n.
A system of linear equations with more variables than equations is known as an underdetermined system.
An underdetermined linear system has either no solution or infinitely many solutions.
There are algorithms to decide whether an underdetermined system has solutions, and if it has any, to express all solutions as linear functions of the free variables (n - k of them when the k equations are independent). The simplest one is Gaussian elimination.
As you say, many functions available in libraries (e.g. np.linalg.solve) require a square matrix (i.e. n equations for n unknowns). What you are looking for is an implementation of Gaussian elimination for non-square linear systems.
This isn't 'random', but np.linalg.lstsq (least squares) will solve non-square systems:
Return the least-squares solution to a linear matrix equation.
Solves the equation a x = b by computing a vector x that minimizes the Euclidean 2-norm || b - a x ||^2. The equation may be under-, well-, or over- determined (i.e., the number of linearly independent rows of a can be less than, equal to, or greater than its number of linearly independent columns). If a is square and of full rank, then x (but for round-off error) is the “exact” solution of the equation.
For more info, see:
solving Ax =b for a non-square matrix A using python
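Applied to the two equations from the question, a minimal sketch looks like this. The variable names are chosen here for illustration; for a consistent underdetermined system, lstsq returns the solution of smallest Euclidean norm.

```python
import numpy as np

# Coefficients and right-hand sides of the two equations from the question
A = np.array([[3.0, -2.0, 7.0, -2.0],
              [1.0, 3.0, -2.0, 5.0]])
b = np.array([6.0, -2.0])

# lstsq returns a tuple: (solution, residuals, rank, singular values)
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(x)
print(np.allclose(A @ x, b))   # check that x really solves both equations
```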
Since you have an underdetermined system of equations (too few constraints for your solutions, or fewer equations than variables) you can just pick some arbitrary values for x3 and x4 and solve the system in x1, x2 (this has 2 variables/2 equations).
You will just need to check that the resulting system is consistent (i.e. that it admits a solution) and that there are no duplicate solutions.
You could for instance fix x3=0 and choosing random values of x4 generate solutions for your equations in x1, x2
Here's an example generating 10 "random" solutions
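import numpy as np

# Coefficients of x1 and x2 in the two equations
a = np.array([[3, -2], [1, 3]])

n = 10
x3 = 0
X = []
for x4 in np.random.choice(1000, n):
    # Move the x3 and x4 terms to the right-hand side
    b = np.array([[6-7*x3+2*x4],[-2+2*x3-5*x4]])
    x = np.linalg.solve(a, b)
    # Store the full solution vector (x1, x2, x3, x4)
    X.append(np.append(x, [x3, x4]))

# check solution nr. 3
[x1, x2, x3, x4] = X[3]
3*x1 - 2* x2 + 7*x3 - 2*x4
# output: 6.0 (up to rounding)
1*x1 + 3* x2 - 2*x3 + 5*x4
# output: -2.0 (up to rounding)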
Thanks for the answers, which both helped me and pointed me in the right direction.
I now have an easy step-by-step solution to my problem for arbitrary k<n.
1. Find one solution to all equations given. This can be done by using
solution_vec = numpy.linalg.lstsq(A, b, rcond=None)[0]
this gives a solution as seen in ukemi's answer (lstsq returns a tuple whose first element is the solution vector). In my example above, the matrix A holds the coefficients on the left-hand side of the equations, and b is the vector of right-hand sides.
2. Determine the null space of your matrix A.
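These are all vectors v such that the scalar product v*A_i = 0 for every(!) row A_i of A. The following function, adapted from one found in this thread, can be used to get representatives of the null space of A:
import numpy as np
import scipy.linalg
def nullSpaceOfMatrix(A, eps=1e-15):
    u, s, vh = scipy.linalg.svd(A)
    # With fewer rows than columns, s is shorter than vh; the rows of vh
    # beyond len(s) always belong to the null space
    padding = max(0, vh.shape[0] - s.shape[0])
    null_mask = np.concatenate((s <= eps, np.ones(padding, dtype=bool)))
    null_space = np.compress(null_mask, vh, axis=0)
    return np.transpose(null_space)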
3. Generate as many (N) "random" linear combinations (meaning with random coefficients) of solution_vec and resulting vectors of the nullspace of the matrix as you want! This works because the scalar product is additive and nullspace vectors have a scalar product of 0 to the vectors of the equations. Those linear combinations always must contain solution_vec, as in:
linear_combination = solution_vec + a*nullspacevec_1 + b*nullspacevec_2...
where a and b can be randomly chosen.
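The three steps can be put together in a few lines. The following is a sketch using only NumPy; the function name random_solutions, the scale parameter, and the use of normally distributed coefficients are assumptions made for illustration. It takes the null space from the rows of vh beyond the numerical rank of A.

```python
import numpy as np

def random_solutions(A, b, N, scale=1.0, rng=None, tol=1e-12):
    """Return N points x (one per row) that satisfy A @ x = b."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Step 1: one particular solution
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]
    # Step 2: null space = rows of vh beyond the numerical rank of A
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s.max()))
    null_basis = vh[rank:]
    # Step 3: particular solution plus random null-space combinations
    coeffs = rng.normal(scale=scale, size=(N, null_basis.shape[0]))
    return x0 + coeffs @ null_basis

A = np.array([[3.0, -2.0, 7.0, -2.0], [1.0, 3.0, -2.0, 5.0]])
b = np.array([6.0, -2.0])
pts = random_solutions(A, b, 5)
print(np.allclose(pts @ A.T, b))
```

If the system is inconsistent, lstsq still returns a vector, so checking A @ x0 against b first is a sensible guard.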
Imagine you are given a set S of n points in 3 dimensions. The distance between any 2 points is the simple Euclidean distance. You want to choose a subset Q of k points from this set such that they are farthest from each other. In other words, there is no other subset Q' of k points whose minimum pairwise distance is greater than that of Q.
If n is approximately 16 million and k is about 300, how do we efficiently do this?
My guess is that this is NP-hard, so maybe we just want to focus on approximation. One idea I can think of is using multidimensional scaling to sort these points on a line and then using a version of binary search to get points that are furthest apart on this line.
This is known as the discrete p-dispersion (maxmin) problem.
The optimality bound is proved in White (1991) and Ravi et al. (1994) give a factor-2 approximation for the problem with the latter proving this heuristic is the best possible (unless P=NP).
Factor-2 Approximation
The factor-2 approximation is as follows:
Let V be the set of nodes/objects.
Let i and j be two nodes at maximum distance.
Let p be the number of objects to choose.
P = set([i,j])
while size(P)<p:
    Find a node v in V-P such that min_{v' in P} dist(v,v') is maximum.
    That is: find the node with the greatest minimum distance to the set P.
    P = P.union(v)
Output P
You could implement this in Python like so:
#!/usr/bin/env python3
import numpy as np

p = 50
N = 400

print("Building distance matrix...")
d = np.random.rand(N,N) #Random matrix
d = (d + d.T)/2 #Make the matrix symmetric

print("Finding initial edge...")
maxdist = 0
bestpair = ()
for i in range(N):
    for j in range(i+1,N):
        if d[i,j]>maxdist:
            maxdist = d[i,j]
            bestpair = (i,j)

# Start from the two nodes at maximum distance
P = set(bestpair)

print("Finding optimal set...")
while len(P)<p:
    print("P size = {0}".format(len(P)))
    maxdist = 0
    vbest = None
    for v in range(N):
        if v in P:
            continue
        # Distance from v to its nearest node already in P
        mindist = min(d[v,vprime] for vprime in P)
        if mindist>maxdist:
            maxdist = mindist
            vbest = v
    P.add(vbest)

print(P)
Exact Solution
You could also model this as an MIP. For p=50, n=400 after 6000s, the optimality gap was still 568%. The approximation algorithm took 0.47s to obtain an optimality gap of 100% (or less). A naive Gurobi Python representation might look like this:
#!/usr/bin/env python
import numpy as np
import gurobipy as grb
p = 50
N = 400
print("Building distance matrix...")
d = np.random.rand(N,N) #Random matrix
d = (d + d.T)/2 #Make the matrix symmetric
m = grb.Model(name="MIP Model")
used = [m.addVar(vtype=grb.GRB.BINARY) for i in range(N)]
objective = grb.quicksum( d[i,j]*used[i]*used[j] for i in range(0,N) for j in range(i+1,N) )
# choose exactly p points
m.addConstr(grb.quicksum(used) == p)
# for maximization
m.ModelSense = grb.GRB.MAXIMIZE
m.setObjective(objective)
# m.Params.TimeLimit = 3*60
# solving with Gurobi
ret = m.optimize()
Obviously, the O(N^2) scaling for the initial points is bad. We can find them more efficiently by recognizing that the pair must lie on the convex hull of the dataset. This gives us an O(N log N) way to find the pair. Once we've found it we proceed as before (using SciPy for acceleration).
The best way of scaling would be to use an R*-tree to efficiently find the minimum distance between a candidate point p and the set P. But this cannot be done efficiently in Python since a for loop is still involved.
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist
p = 300
N = 16000000
# Find a convex hull in O(N log N)
points = np.random.rand(N, 3) # N random points in 3-D
# Returned 420 points in testing
hull = ConvexHull(points)
# Extract the points forming the hull
hullpoints = points[hull.vertices,:]
# Naive way of finding the best pair in O(H^2) time if H is number of points on
# hull
hdist = cdist(hullpoints, hullpoints, metric='euclidean')
# Get the farthest apart points
bestpair = np.unravel_index(hdist.argmax(), hdist.shape)
P = np.array([hullpoints[bestpair[0]],hullpoints[bestpair[1]]])
# Now we have a problem
print("Finding optimal set...")
while len(P)<p:
    print("P size = {0}".format(len(P)))
    distance_to_P = cdist(points, P)
    minimum_to_each_of_P = np.min(distance_to_P, axis=1)
    best_new_point_idx = np.argmax(minimum_to_each_of_P)
    best_new_point = np.expand_dims(points[best_new_point_idx,:],0)
    P = np.append(P,best_new_point,axis=0)
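The loop above recomputes the distances from every point to all of P in each pass. A possible refinement, sketched here under the assumption that one extra array of length N fits in memory, is to keep a running nearest-selected distance per point and update it with only the newly added point. The function name greedy_dispersion and the single start index are illustrative choices; the answer above starts from the farthest pair instead.

```python
import numpy as np

def greedy_dispersion(points, k, start=0):
    """Greedy farthest-point selection; returns the indices of the chosen points."""
    points = np.asarray(points, dtype=float)
    chosen = [start]
    # mindist[i] = distance from point i to its nearest chosen point
    mindist = np.linalg.norm(points - points[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(mindist))
        chosen.append(nxt)
        # Only the newly added point can lower a point's nearest-chosen distance
        mindist = np.minimum(mindist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

pts = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
print(greedy_dispersion(pts, 3))
```

Each pass then costs one distance computation per point, so the whole selection needs about k*N distance evaluations.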
I am also pretty sure that the problem is NP-Hard, the most similar problem I found is the k-Center Problem. If runtime is more important than correctness a greedy algorithm is probably your best choice:
Q = {}
while |Q| < k
    Q += p from S where mindist(p, Q) is maximal
Side note: In similar problems, e.g. the maximum coverage problem, it can be shown that the solution from the greedy algorithm is at least 63% as good as the optimal solution.
In order to speed things up I see 3 possibilities:
1. Index your dataset in an R-Tree first, then perform a greedy search. Construction of the R-Tree is O(n log n), but though being developed for nearest neighbor search, it can also help you find the furthest point to a set of points in O(log n). This might be faster than the naive O(k*n) algorithm.
2. Sample a subset from your 16 million points and perform the greedy algorithm on the subset. You are approximating anyway, so you might be able to spare a little more accuracy. You can also combine this with the 1. algorithm.
3. Use an iterative approach and stop when you are out of time. The idea here is to randomly select k points from S (let's call this set Q'). Then in each step you switch the point p_ from Q' that has the minimum distance to another one in Q' with a random point from S. If the resulting set Q'' is better, proceed with Q'', otherwise repeat with Q'. In order not to get stuck you might want to choose another point from Q' than p_ if you could not find an adequate replacement for a couple of iterations.
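The iterative approach could be sketched as follows. The names min_pairwise and swap_search, the fixed iteration budget, and the seed parameter are assumptions for illustration; the escape rule for getting unstuck is left out to keep the sketch short.

```python
import numpy as np

def min_pairwise(points, idx):
    """Return (smallest pairwise distance within idx, position in idx of one endpoint)."""
    sub = points[idx]
    d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # ignore zero self-distances
    i, _ = np.unravel_index(np.argmin(d), d.shape)
    return d.min(), i

def swap_search(points, k, iters=1000, seed=0):
    """Random-swap local search that tries to raise the minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    idx = rng.choice(len(points), size=k, replace=False)
    best, worst = min_pairwise(points, idx)
    for _ in range(iters):
        cand = idx.copy()
        # Replace one endpoint of the closest pair with a random point from S
        cand[worst] = rng.integers(len(points))
        if len(set(cand.tolist())) < k:
            continue                          # replacement was already selected
        score, cand_worst = min_pairwise(points, cand)
        if score > best:                      # keep the swap only if it improves
            idx, best, worst = cand, score, cand_worst
    return idx, best

pts = np.random.default_rng(1).random((200, 3))
idx, best = swap_search(pts, 5, iters=300)
print(idx, best)
```

For k around 300 each evaluation builds a 300 by 300 distance matrix, which is cheap next to scanning all 16 million points.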
If you can afford to do ~ k*n distance calculations then you could
1. Find the center of the distribution of points.
2. Select the point furthest from the center (and remove it from the set of un-selected points).
3. Find the point furthest from all the currently selected points and select it.
4. Repeat 3. until you end with k points.
Find the maximum extent of all points. Split into 7x7x7 voxels. For all points in a voxel find the point closest to its centre. Return these 7x7x7 points. Some voxels may contain no points, hopefully not too many.
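A vectorized sketch of the voxel idea, assuming the points fit in memory as an (n, 3) array; the function name voxel_representatives and the cells parameter are illustrative:

```python
import numpy as np

def voxel_representatives(points, cells=7):
    """Return one point index per occupied voxel: the point nearest the voxel centre."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    size = (hi - lo) / cells
    size[size == 0] = 1.0                       # guard against a flat axis
    # Integer voxel coordinates; points on the upper boundary go in the last cell
    ijk = np.minimum((points - lo) // size, cells - 1).astype(int)
    centres = lo + (ijk + 0.5) * size
    dist = np.linalg.norm(points - centres, axis=1)
    key = np.ravel_multi_index(tuple(ijk.T), (cells,) * points.shape[1])
    order = np.lexsort((dist, key))             # sort by voxel, then by distance
    first = np.ones(len(order), dtype=bool)
    first[1:] = key[order][1:] != key[order][:-1]
    return order[first]

pts = np.random.default_rng(0).random((1000, 3))
print(len(voxel_representatives(pts)))
```

The result has at most 7*7*7 = 343 points, fewer if some voxels are empty.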
suppose I have the following Problem:
I have a complex function A(x) and a complex function B(y). I know these functions cross in the complex plane. I would like to find out the corresponding x and y of this intersection point, numerically ( and/or graphically). What is the most clever way of doing that?
This is my starting point:
import matplotlib.pyplot as plt
import numpy as np
from numpy import sqrt, pi
x = np.linspace(1, 10, 10000)
y = np.linspace(1, 60, 10000)
def A_(x):
    return -1/( 8/(pi*x)*sqrt(1-(1/x)**2) - 1j*(8/(pi*x**2)) )
A = np.vectorize(A_)
def B_(y):
    return 3/(1j*y*(1+1j*y))
B = np.vectorize(B_)
real_A = np.real(A(x))
imag_A = np.imag(A(x))
real_B = np.real(B(y))
imag_B = np.imag(B(y))
plt.plot(real_A, imag_A, color='blue')
plt.plot(real_B, imag_B, color='red')
I don't have to plot it necessarily. I just need x_intersection and y_intersection (with some error that depends on x and y).
Thanks a lot in advance!
I should have used different variable names. To clarify what i need:
x and y are numpy arrays and i need the index of the intersection point of each array plus the corresponding x and y value (which again is not the intersection point itself, but some value of the arrays x and y ).
Here I find the minimum of the distance between the two curves. Also, I cleaned up your code a bit (eg, vectorize wasn't doing anything useful).
import matplotlib.pyplot as plt
import numpy as np
from numpy import sqrt, pi
from scipy import optimize
def A(x):
    return -1/( 8/(pi*x)*sqrt(1-(1/x)**2) - 1j*(8/(pi*x**2)) )
def B(y):
    return 3/(1j*y*(1+1j*y))
# The next three lines find the intersection
def dist(x):
    return abs(A(x[0])-B(x[1]))
sln = optimize.minimize(dist, [1, 1])
# plotting everything....
a0, b0 = A(sln.x[0]), B(sln.x[1])
x = np.linspace(1, 10, 10000)
y = np.linspace(1, 60, 10000)
a, b = A(x), B(y)
plt.plot(a.real, a.imag, color='blue')
plt.plot(b.real, b.imag, color='red')
plt.plot(a0.real, a0.imag, "ob")
plt.plot(b0.real, b0.imag, "xr")
The specific x and y values at the intersection point are sln.x[0] and sln.x[1], since A(sln.x[0])=B(sln.x[1]). If you need the index, as you also mention in your edit, I'd use, for example, numpy.searchsorted(x, sln.x[0]), to find where the values from the fit would insert into your x and y arrays.
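Since searchsorted returns an insertion position rather than the nearest sample, a small helper can pick the closer neighbour and guard the array ends. The helper name nearest_index is an assumption for illustration, and the grid must be sorted in ascending order.

```python
import numpy as np

def nearest_index(grid, value):
    """Index of the entry of a sorted 1-D array that is closest to value."""
    # Clamp so that both grid[i - 1] and grid[i] exist
    i = np.clip(np.searchsorted(grid, value), 1, len(grid) - 1)
    return i - 1 if value - grid[i - 1] <= grid[i] - value else i

x = np.linspace(1, 10, 10000)
i = nearest_index(x, 2.05973231)   # value reported by the optimizer in this answer
print(i, x[i])
```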
I think what's a bit tricky with this problem is that the space for graphing where the intersection is (ie, the complex plane) does not show the input space, but one has to optimize over the input space. It's useful for visualizing the solution, then, to plot the distance between the curves over the input space. That can be done like this:
# x and y are the linspace arrays used for the plot above
X, Y = np.meshgrid(x, y)
data = dist((X, Y))
fig, ax = plt.subplots()
im = ax.imshow(data, cmap=plt.cm.afmhot, interpolation='none',
               extent=[min(x), max(x), min(y), max(y)], origin="lower")
cbar = fig.colorbar(im)
plt.plot(sln.x[0], sln.x[1], "xw")
From this it seems much more clear how optimize.minimize is working -- it just rolls down the slope to find the minimum distance, which is zero in this case. But still, there's no obvious single visualization that one can use to see the whole problem.
For other intersections one has to dig a bit more. That is, @emma asked about other roots in the comments, and there I mentioned that there's no generally reliable way to find all roots to arbitrary equations, but here's how I'd go about looking for other roots. Here I won't lay out the complete program, but just list the changes and plots as I go along.
First, it's obvious that for the domain shown in my first plot there's only one intersection, and that there are no intersections in the region to the left. The only place there could be another intersection is to the right, but for that I'll need to allow the sqrt in the def of A to get a negative argument without throwing an exception. An easy way to do this is to add 0j to the argument of the sqrt, like this, sqrt(1+0j-(1/x)**2). Then the plot with the intersection becomes
I plotted this over a broader range (x=np.linspace(-10, 10, 10000) and y=np.linspace(-400, 400, 10000)) and the above is the zoom of the only place where anything interesting is going on. This shows the intersection found above, plus the point where it looks like the two curves might touch (where the red curve, B, comes to a point nearly meeting the blue curve A going upward), so that's the new interesting thing, and the thing I'll look for.
A bit of playing around with limits, etc, show that B is coming to a point asymptotically, and the equation of B is obvious that it will go to 0 + 0j for large +/- y, so that's about all there is to say for B.
It's difficult to understand A from the above plot, so I'll look at the real and imaginary parts independently:
So it's not a crazy looking function, and the jumping between Re=const and Im=const is just the nature of sqrt(1-x**-2), which is purely imaginary for abs(x)<1 and purely real for abs(x)>1.
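The switch between the two regimes is easy to check numerically. In this sketch the helper name root_term is an assumption; adding 0j forces NumPy to use the complex square root instead of returning nan for negative arguments.

```python
import numpy as np

def root_term(x):
    # 0j promotes the argument to complex, so a negative value is allowed
    return np.sqrt(1 + 0j - (1 / x) ** 2)

inside = root_term(0.5)    # abs(x) < 1: the argument is -3
outside = root_term(2.0)   # abs(x) > 1: the argument is 0.75
print(inside, outside)
```

Here inside has zero real part and outside has zero imaginary part, which is the behaviour described above.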
It's pretty clear now that the other time the curves are equal is at y= +/-inf and x=0. And, quick look at the equations show that A(0)=0+0j and B(+/- inf)=0+0j, so this is another intersection point (though since it occurs at B(+/- inf), it's sort-of ambiguous on whether or not it would be called an intersection).
So that's about it. One other point to mention is that if these didn't have such an easy analytic solution, like it wasn't clear what B was at inf, etc, one could also graph/minimize, etc, by looking at B(1/y), and then go from there, using the same tools as above to deal with the infinity. So using:
def dist2(x):
    return abs(A(x[0])-B(1./x[1]))
Where the min on the right is the one initially found, and the zero, now at x=-0 and 1./y=0 is the other one (which, again, isn't interesting enough to apply an optimizer here, but it could be interesting in other equations).
Of course, it's also possible to estimate this by just finding the minimum of the data that goes into the above graph, like this:
X, Y = np.meshgrid(x, y)
data = dist((X, Y))
r = np.unravel_index(data.argmin(), data.shape)
print(x[r[1]], y[r[0]])
# 2.06306306306 1.8008008008 # min approach gave 2.05973231 1.80069353
But this is only approximate (to the resolution of data) and involved many more calculations (1M compared to a few hundred). I only post this because I think it might be what the OP originally had in mind.
Briefly, two analytic solutions are derived for the roots of the problem. The first solution removes the parametric representation of x and solves for the roots directly in the (u, v) plane, where for example A(x): u(x) + i v(x) gives v(u) = f(u). The second solution uses a polar representation, e.g. A(x) is given by r(x) exp(i theta(x)), and offers a better understanding of the behavior of the square root as x passes through unity towards zero. Possible solutions occurring at the singular points are explored. Finally, a bisection root finding algorithm is constructed as a Python iterator to invert certain solutions. Summarizing, the one real root can be found as a solution to either of the following equations:
and gives:
x0 = +2.059732
y0 = +1.800694
A(x0) = B(y0) = (-0.707131, -i 0.392670)
As in most problems there are a number of ways to proceed. One can use a "black box" and hopefully find the root they are looking for. Sometimes an answer is all that is desired, and with a little understanding of the functions this may be an adequate way forward. Unfortunately, it is often true that such an approach will provide less insight about the problem than others.
For example, algorithms find it difficult locating roots in the global space. Local roots may be found with other roots lying close by and yet undiscovered. Consequently, the question arises: "Are all the roots accounted for?" A more complete understanding of the functions, e.g. asymptotic behaviors, branch cuts, singular points, can provide the global perspective to better answer this, as well as other important questions.
So another possible solution would be building one's own "black box." A simple bisection routine might be a starting point. It is robust if the root lies in the initial interval, and fairly efficient. This encourages us to look at the global behavior of the functions. As the code is structured and debugged the various functions are explored, new insights are gained, and the algorithm becomes a tool towards a more complete solution to the problem. Perhaps, with some patience, a closed-form solution can be found. A Python iterator is constructed and listed below implementing a bisection root finding algorithm.
Begin by putting the functions A(x) and B(x) in a more standard form:
C(x) = u(x) + i v(x)
and here the complex number i is brought out of the denominator and into the numerator, casting the problem into the form of functions of a complex variable. The new representation simplifies the original functions considerably. The real and imaginary parts are now clearly separated. An interesting graph is to plot A(x) and B(x) in the 3-dimensional space (u, v, x) and then visualize the projection into the u-v plane.
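import numpy as np
import matplotlib.pyplot as plt
# Assumed here: f and g are A and B from the question, plotted on [a, b]
def f(x):
    return -1/(8/(np.pi*x)*np.sqrt(1+0j-(1/x)**2) - 1j*(8/(np.pi*x**2)))
def g(y):
    return 3/(1j*y*(1+1j*y))
a, b = 1, 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
s = np.linspace(a, b, 1000)
z = s  # the parameter itself is the third axis
ax.plot(f(s).real, f(s).imag, z, color='blue')
ax.plot(g(s).real, g(s).imag, z, color='red')
# projections into the u-v plane
ax.plot(f(s).real, f(s).imag, 0, color='black')
ax.plot(g(s).real, g(s).imag, 0, color='black')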
The question arises: "Can the parametric representation be replaced so that a relationship such as,
A(x): u(x) + i v(x) gives v(u) = f(u)
is obtained?" This will provide A(x) as a function v(u) = f(u) in the u-v plane. Then, if for
B(x): u(x) + i v(x) gives v(u) = g(u)
a similar relationship can be found, the solutions can be set equal to one another,
f(u) = g(u)
and the root(s) computed. In fact, it is convenient to look for a solution in the square of the above equation. The worst case is that an algorithm will have to be built to find the root, but at this point the behavior of our functions is better understood. For example, if f(u) and g(u) are polynomials of degree n then it is known that there are n roots. The best case is that a closed-form solution might be a reward for our determination.
Here is more detail to the solution. For A(x) the following is derived:
and v(u) = f(u) is just v(u) = constant. Similarly for B(x) a slightly more complex form is required:
Look at the function g(u) for B(x). It is imaginary if u > 0, but the root must be real since f(u) is real. This means that u must be less than 0, and there is both a positive and negative real branch to the square root. The sign of f(u) then allows one to pick the negative branch as the solution for the root. So the fact that the solution must be real is determined by the sign of u, and the fact that the real root is negative specifies what branch of the square root to choose.
In the following plot both the real (u < 0) and complex (u > 0) solutions are shown.
The camera looks toward the origin in the back corner, where the red and blue curves meet. The z-axis is the magnitude of f(u) and g(u). The x and y axes are the real/complex values of u respectively. The blue curves are the real solution with (3 - |u|). The red curves are the complex solution with (3 + |u|). The two sets meet at u = 0. The black curve is f(u) equal to (-pi/8).
There is a divergence in g(u) at |u| = 3 and this is associated with x = 0. It is far removed from the solution and will not be considered further.
To obtain the roots to f = g it is easier to square f(u) and equate the two functions. When the function g(u) is squared the branches of the square root are lost, much like squaring the solutions for x**2 = 4. In the end the appropriate root will be chosen by the sign of f(u) and so this is not an issue.
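The displayed equations for f(u) and g(u) did not survive in this copy of the post. As a sketch, the forms below are re-derived directly from the definitions of A and B in the question, assuming x > 1 and real y, so they are a reconstruction and not the author's own layout. The constant c denotes (pi/8)**2.

```latex
\begin{gather*}
A(x) = -\frac{\pi}{8}\left(\sqrt{x^{2}-1} + i\right)
  \quad\Rightarrow\quad u = -\frac{\pi}{8}\sqrt{x^{2}-1}, \qquad f(u) = -\frac{\pi}{8}, \\
B(y) = -\frac{3\,(y + i)}{y\,(y^{2}+1)}
  \quad\Rightarrow\quad u = -\frac{3}{y^{2}+1}, \qquad v = \frac{u}{y}, \qquad
  g(u)^{2} = -\frac{u^{3}}{3+u}, \\
f(u)^{2} = g(u)^{2}
  \quad\Longleftrightarrow\quad u^{3} + c\,u + 3c = 0, \qquad c = \left(\frac{\pi}{8}\right)^{2}.
\end{gather*}
```

For -3 < u < 0 the quantity -u**3/(3+u) is positive, which matches the statement that the real solution requires u < 0, and g(u) diverges at u = -3.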
So by looking at the dependence of A and B, with respect to the parametric variable x, a representation for these functions was obtained where v is a function of u and the roots found. A simpler representation can be obtained if the term involving c in the square root is ignored.
The answer gives all the roots to be found. A cubic equation has at most three roots and one is guaranteed to be real. The other two may be imaginary or real. In this case the real root has been found and the other two roots are complex. Interestingly, as c changes these two complex roots may move into the real plane.
In the above figure the x-axis is u and the y axis is the evaluated cubic equation with constant c. The blue curve has c as (pi/8) squared. The red curve uses a larger and negative value for c, and has been translated upwards for purposes of demonstration. For the blue curve there is an inflection point near (0, 0.5), while the red curve has a maximum at (-0.9, 2.5) and a minimum at (0.9, -0.3).
The intersection of the cubic with the black line represents the roots given by: u**3 + c u + 3c = 0. For the blue curve there is one intersection and one real root with two roots in the complex plane. For the red curve there are three intersections, and hence 3 roots. As we change the value of the constant c (blue to red) the one real root undergoes a "pitchfork" bifurcation, and the two roots in the complex plane move into the real space.
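The cubic can also be solved numerically as a cross-check. This sketch uses np.roots with c = (pi/8)**2; the two inversion formulas for x0 and y0 are assumptions re-derived from the definitions of A and B in the question (valid for x > 1 and y > 0), not formulas quoted from this answer. The final comparison of A(x0) and B(y0) verifies them independently.

```python
import numpy as np

c = (np.pi / 8) ** 2
roots = np.roots([1.0, 0.0, c, 3.0 * c])          # u**3 + c*u + 3*c = 0
real_roots = roots[np.abs(roots.imag) < 1e-12].real
u_t = real_roots[0]

# Assumed inversions: u = -(pi/8)*sqrt(x**2 - 1) and u = -3/(y**2 + 1)
x0 = np.sqrt(1 + (8 * u_t / np.pi) ** 2)
y0 = np.sqrt(-(3 + u_t) / u_t)

def A(x):
    return -1 / (8 / (np.pi * x) * np.sqrt(1 - (1 / x) ** 2) - 1j * (8 / (np.pi * x ** 2)))

def B(y):
    return 3 / (1j * y * (1 + 1j * y))

print(u_t, x0, y0)
print(abs(A(x0) - B(y0)))   # should be close to zero if the root is consistent
```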
Although the root (u_t, v_t) has been located, obtaining the value for x requires that (u, v) be inverted. In the present example this is a trivial matter, but if not, a bisection routine can be used to avoid the algebraic difficulties.
The parametric representation also leads to a solution for the real root, and it rounds out the analysis with an independent verification of the first result. Second, it answers the question about what happens at the singularity in the square root. Third, it gives a greater understanding of the multiplicity of roots.
The steps are: (1) convert A(x) and B(x) into polar form, (2) equate the modulus and argument giving two equations in two unknowns, (3) make a substitution for z = x**2. Converting A(x) to polar form:
Absolute value bars are not indicated, and it should be understood that the moduli r(x) and s(x) are positive definite as their names imply. For B(x):
The two equations in two unknowns:
Finally, the cubic solution is sketched out here where the substitution z = x**2 has been made:
The solution for z = x**2 gives x directly, which allows one to substitute into both A(x) and B(x). This is an exact solution if all terms are maintained in the cubic solution, and there is no error in x0, y0, A(x0), or B(x0). A simpler representation can be found by considering terms proportional to 1/d as small.
Before leaving the polar representation consider the two singular points where: (1) abs(x) = 1, and (2) x = 0. A complicating factor is that the arguments behave something like 1/x instead of x. It is worthwhile to look at a plot of the arctan(a) and then ask yourself how that changes if its argument is replaced by 1/a. The following graphs will then look less foreign.
Consider the polar representation of B(x). As x approaches 0 the modulus and argument tend toward infinity, i.e. the point is infinitely far from the origin and lies along the y-axis. Approaching 0 from the negative direction the point lies along the negative y-axis with varphi = (-pi/2), while approaching from the other direction the point lies along the positive y-axis with varphi = (+pi/2).
A somewhat more complicated behavior is exhibited by A(x). A(x) is even in x since the modulus is positive definite and the argument involves only x**2. There is a symmetry across the y-axis that allows us to only consider the x > 0 plane.
At x = 1 the modulus is just (pi/8), and as x continues to approach 0 so does r(x). The behavior of the argument is more complex. As x approaches unity from large positive values the argument is diverging towards +inf and so theta is approaching (+pi/2). As x passes through 1 the argument becomes complex. At x equals 0 the argument has reached its minimum value of -i. For complex arguments the arctan is given by:
The following are plots of the arguments for A(x) and B(x). The x-axis is the value of x, and the y-axis is the value of the angle in units of pi. In the first plot theta is shown in blue curves, and as x approaches 1 the angle approaches (+pi/2). Theta is real because abs(x) >= 1, and notice it is symmetric across the y-axis. The black curve is varphi and as x approaches 0 it approaches plus or minus (pi/2). Notice it is an odd function in x.
In the second plot A(x) is shown where abs(x) < 1 and the argument becomes complex. Near x = 1 theta is equal to (+pi/2), the blue curve, minus a small imaginary part, the red curve. As x approaches zero theta is equal to (+pi/2) minus a large imaginary part. At x = 0 the argument is equal to -i and theta = (+pi/2) minus an infinite imaginary part, i.e. ln(0) = -inf:
The values for x0 and y0 are determined by the set of equations that equate modulus and argument of A(x) and B(x), and there are no other roots. If x0 = 0 were a root, then it would fall out of these equations. The same holds for x0 = 1. In fact, if one uses approximations in the argument of A(x) about these points, and then substitutes into the equation for the modulus, the equality cannot be maintained there.
Here is another perspective: consider the set of equations where x is assumed large and call it x_inf. The equation for the argument then gives x_inf = y_inf, where 1 is neglected with respect to x_inf squared. Upon substitution into the second equation a cubic is obtained in x_inf. Will this give the correct answer? Yes, if x0 is actually large, and in this case you might get away with it since x0 is approximately 2. The difference between the sqrt(4) and the sqrt(5) is around 10%. But does this mean that x_inf = 100 is a solution? No it does not: x_inf is only a solution if it equals x0.
The initial reason for examining the problem in the first place was to find a context for building a root-finding bisection routine as a Python iterator. This can be used to find any of the roots discussed here, and looks something like this:
class Bisection:
    """Iterator yielding successive bisection midpoints for a root of func in [a, b]."""

    def __init__(self, a, b, func, max_iter):
        self.max_iter = max_iter
        self.count_iter = 0
        self.a = a
        self.b = b
        self.func = func
        # The bracket must straddle a sign change for bisection to work.
        if func(self.a) * func(self.b) >= 0.0:
            raise ValueError("func(a) and func(b) must have opposite signs")

    def __iter__(self):
        # Restart from the original bracket each time iteration begins.
        self.count_iter = 0
        self.x1 = self.a
        self.x2 = self.b
        self.xmid = self.x1 + ((self.x2 - self.x1) / 2.0)
        return self

    def __next__(self):
        if self.count_iter >= self.max_iter:
            raise StopIteration
        self.count_iter += 1
        f1 = self.func(self.x1)
        fmid = self.func(self.xmid)
        if fmid == 0.0:
            return self.xmid  # landed exactly on the root
        if f1 * fmid < 0:
            self.x2 = self.xmid  # sign change lies in the lower half
        else:
            self.x1 = self.xmid  # sign change lies in the upper half
        self.xmid = self.x1 + ((self.x2 - self.x1) / 2.0)
        return self.xmid
The routine does only a minimal amount of error checking and was used to find x for the given solution in the u-v plane. The arguments a and b give the lower and upper brackets for the root to be found. The argument func is the function whose root is sought; this might look like u0 - B(x).real. The argument max_iter tells the iterator to stop after a given number of bisections has been attempted.
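As a usage sketch, the same bracketing logic can be written as a generator and driven to convergence. The names `bisection_steps` and `cubic` are hypothetical, and the cubic is only a placeholder target; the real call would pass something like u0 - B(x).real, which is not reproduced here.

```python
def bisection_steps(func, a, b, max_iter):
    """Yield successive midpoints of a bracketing interval [a, b] (hypothetical helper)."""
    fa = func(a)
    if fa * func(b) >= 0.0:
        raise ValueError("func(a) and func(b) must have opposite signs")
    for _ in range(max_iter):
        mid = a + (b - a) / 2.0
        fmid = func(mid)
        yield mid
        if fmid == 0.0:
            return  # landed exactly on the root
        if fa * fmid < 0:
            b = mid  # sign change lies in the lower half
        else:
            a, fa = mid, fmid  # sign change lies in the upper half

# Placeholder function with a single root in [1, 2] at the cube root of 2.
def cubic(x):
    return x**3 - 2.0

estimates = list(bisection_steps(cubic, 1.0, 2.0, 40))
root = estimates[-1]
print(len(estimates), root)
```

Each step halves the bracket, so 40 steps on an interval of width 1 bring the midpoint to within about 1e-12 of the root.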
I have a set of 3 million vectors (300 dimensions each), and I'm looking for a new point in this 300-dimensional space that is approximately equally distant from all the other points (vectors).
What I could do is initialize a random vector v, and run an optimization over v with the objective of minimizing $\sum_{x,y} (d_{vx} - d_{vy})^2$ over all pairs of vectors x and y in the set,
where d_xy is the distance between vector x and vector y, but this would be very computationally expensive.
I'm looking for an approximate solution vector for this problem that can be found quickly over very large sets of vectors. (Or any libraries that will do something like this for me- any language)
I agree that in general this is a pretty tough optimization problem, especially at the scale you're describing. Each objective function evaluation requires O(nm + n^2) work for n points of dimension m -- O(nm) to compute distances from each point to the new point and O(n^2) to compute the objective given the distances. This is pretty scary when m=300 and n=3M. Thus even one function evaluation is probably intractable, not to mention solving the full optimization problem.
One approach that has been mentioned in the other answer is to take the centroid of the points, which can be computed efficiently -- O(nm). A downside of this approach is that it could do terribly at the proposed objective. For instance, consider a situation in 1-dimensional space with 3 million points with value 1 and 1 point with value 0. By inspection, the optimal solution is v=0.5 with objective value 0 (it's equidistant from every point), but the centroid will select v=1 (well, a tiny bit smaller than that) with objective value 3 million.
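A scaled-down version of this example can be checked by brute force. This is a sketch under one stated assumption: 300 points at 1 instead of 3 million, so that the pairwise loop stays cheap. The helper name `objective_1d` is made up for the illustration.

```python
def objective_1d(v, points):
    """Sum over all pairs of squared differences in distance to v (1-D, brute force)."""
    dists = [abs(v - p) for p in points]
    total = 0.0
    for i in range(len(dists) - 1):
        for j in range(i + 1, len(dists)):
            total += (dists[i] - dists[j]) ** 2
    return total

# Scaled-down assumption: 300 points at 1 and a single point at 0.
points = [1.0] * 300 + [0.0]
centroid = sum(points) / len(points)

print("v = 0.5      ->", objective_1d(0.5, points))
print("v = centroid ->", objective_1d(centroid, points))
```

The midpoint gives an objective of exactly 0, while the centroid gives a value close to the number of points at 1, mirroring the 3 million figure above.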
An approach that I think will do better than the centroid is to optimize each dimension separately (ignoring the existence of the other dimensions). While the objective function is still expensive to compute in this case, a bit of algebra shows that the derivative of the objective is quite easy to compute. It is the sum over all pairs (i, j) where i < v and j > v of the value 4*((v-i)+(v-j)). Remember we're optimizing a single dimension so the points i and j are 1-dimensional, as is v. For each dimension we therefore can sort the data (O(n lg n)) and then compute the derivative for a value v in O(n) time using a binary search and basic algebra. We can then use scipy.optimize.newton to find the zero of the derivative, which will be the optimal value for that dimension. Iterating over all dimensions, we'll have an approximate solution to our problem.
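The derivative formula can be sanity-checked against a central finite difference of the brute-force 1-D objective. The sample points and helper names below are made up for the check; this is a sketch, not part of the timing runs.

```python
def objective_1d(v, pts):
    """Brute-force 1-D objective: sum over pairs of (|v - i| - |v - j|)**2."""
    dists = [abs(v - p) for p in pts]
    return sum((dists[i] - dists[j]) ** 2
               for i in range(len(pts) - 1) for j in range(i + 1, len(pts)))

def derivative_1d(v, pts):
    """Sum of 4*((v - i) + (v - j)) over all pairs with i < v < j."""
    low = [p for p in pts if p < v]
    high = [p for p in pts if p > v]
    return sum(4.0 * ((v - i) + (v - j)) for i in low for j in high)

pts = [0.0, 1.0, 2.5, 4.0, 7.0]  # made-up sample data
v, h = 3.0, 1e-6
numeric = (objective_1d(v + h, pts) - objective_1d(v - h, pts)) / (2 * h)
print(derivative_1d(v, pts), numeric)
```

Pairs on the same side of v contribute nothing because their distance difference does not change with v, which is why only the straddling pairs appear in the sum.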
First consider the proposed approach versus the centroid method in a simple setting, with 1-dimensional data points {0, 3, 3}:
import bisect
import scipy.optimize

def fulldist(x, data):
    # Objective: sum over all pairs of squared differences in distance to x.
    dists = [sum([(x[i]-d[i])*(x[i]-d[i]) for i in range(len(x))])**0.5 for d in data]
    obj = 0.0
    for i in range(len(data)-1):
        for j in range(i+1, len(data)):
            obj += (dists[i]-dists[j]) * (dists[i]-dists[j])
    return obj

def f1p(x, d):
    # Derivative of the 1-D objective at x; d must be sorted for bisect.
    lownum = bisect.bisect_left(d, x)
    highnum = len(d) - lownum
    lowsum = highnum * (x*lownum - sum([d[i] for i in range(lownum)]))
    highsum = lownum * (x*highnum - sum([d[i] for i in range(lownum, len(d))]))
    return 4.0 * (lowsum + highsum)

data = [(0.0,), (3.0,), (3.0,)]
opt = []
centroid = []
for d in range(len(data[0])):
    thisdim = sorted(x[d] for x in data)
    meanval = sum(thisdim) / len(thisdim)
    centroid.append(meanval)
    # Start the root search for the derivative at the centroid coordinate.
    opt.append(scipy.optimize.newton(f1p, meanval, args=(thisdim,)))
print("Proposed", opt, "objective", fulldist(opt, data))
# Proposed [1.5] objective 0.0
print("Centroid", centroid, "objective", fulldist(centroid, data))
# Centroid [2.0] objective 2.0
The proposed approach finds the exact optimal solution, while the centroid method misses by a bit.
Consider a slightly larger example with 1000 points of dimension 300, with each point drawn from a gaussian mixture. Each point's value is normally distributed with mean 0 and variance 1 with probability 0.1 and normally distributed with mean 100 and variance 1 with probability 0.9:
import random

data = []
for n in range(1000):
    d = []
    for m in range(300):
        if random.random() <= 0.1:
            d.append(random.normalvariate(0.0, 1.0))
        else:
            d.append(random.normalvariate(100.0, 1.0))
    data.append(d)
The resulting objective values were 1.1e6 for the proposed approach and 1.6e9 for the centroid approach, meaning the proposed approach decreased the objective by more than 99.9%. Obviously the differences in the objective value are heavily affected by the distribution of the points.
Finally, to test the scaling (removing the final objective value calculations, since they're in general intractable), I get the following scaling with m=300: 0.9 seconds for 1,000 points, 7.1 seconds for 10,000 points, and 122.3 seconds for 100,000 points. Therefore I expect this should take about 1-2 hours for your full dataset with 3 million points.
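The extrapolation can be written out as a short calculation. This is a rough sketch that assumes the cost grows somewhere between linearly and like n log n (the per-dimension sort), starting from the 100,000-point timing; the growth model is an assumption, not something measured.

```python
import math

# Timings reported above for m = 300, in seconds.
timings = {1_000: 0.9, 10_000: 7.1, 100_000: 122.3}
n_full = 3_000_000
n_ref = 100_000
t_ref = timings[n_ref]

# Assumption: cost scales between linear and n*log(n) in the number of points.
linear = t_ref * (n_full / n_ref)
nlogn = linear * (math.log(n_full) / math.log(n_ref))

print(f"linear:  {linear / 3600:.2f} h")
print(f"n log n: {nlogn / 3600:.2f} h")
```

Both estimates fall between 1 and 2 hours. The measured jump from 10,000 to 100,000 points is steeper than either model, so the true figure could well be higher.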
From this question on the Math StackExchange:
There is no point that is equidistant from 4 or more points in general
position in the plane, or n+2 points in n dimensions.
Criteria for representing a collection of points by one point are
considered in statistics, machine learning, and computer science. The
centroid is the optimal choice in the least-squares sense, but there
are many other possibilities.
The centroid is the point C in the plane for which the sum of
squared distances $\sum |CP_i|^2$ is minimum. One could also optimize
a different measure of centrality, or insist that the representative
be one of the points (such as a graph-theoretic center of a weighted
spanning tree), or assign weights to the points in some fashion and
take the centroid of those.
Note, specifically, "the centroid is the optimal choice in the least-squares sense", so the optimal solution to your cost function (which is a least-squares cost) is simply to average all the coordinates of your points (which will give you the centroid).
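The quoted least-squares property is easy to check numerically. The sketch below uses made-up random data and verifies only that the centroid minimizes the sum of squared distances $\sum |CP_i|^2$; it says nothing about other measures of centrality. The helper name `sum_sq_dist` is hypothetical.

```python
import random

def sum_sq_dist(c, pts):
    """Sum of squared Euclidean distances from candidate point c to every point."""
    return sum(sum((ci - pi) ** 2 for ci, pi in zip(c, p)) for p in pts)

random.seed(0)
pts = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(50)]  # made-up data
centroid = [sum(col) / len(pts) for col in zip(*pts)]

best = sum_sq_dist(centroid, pts)
# Randomly perturbed candidates should never beat the centroid on this measure.
worse = [sum_sq_dist([c + random.gauss(0, 1) for c in centroid], pts)
         for _ in range(1000)]
print(best, min(worse))
```

Averaging each coordinate costs O(nm) for n points of dimension m, so this stays cheap even for very large sets.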