Eigen OLS vs python statsmodels.api.OLS - python

I need to calculate the slope and intercept of the regression line between 2 vectors of data. So I made a prototype in Python with the code below:
import statsmodels.api as sm

A = [1,2,5,7,14,17,19]
b = [2,14,6,7,13,27,29]
A = sm.add_constant(A)
results = sm.OLS(A, b).fit()
print("results: ", results.params)
output: [0.04841897 0.64278656]
Now I need to replicate this using the Eigen library in C++. As I understand it, I need to add a column of ones to the matrix A. If I do so, I get totally different results for the regression than if I use no second column or a column of zeros. C++ code below:
#include <Eigen/Dense>
#include <iostream>
using namespace std;

int main() {
    Eigen::VectorXd A(7);
    Eigen::VectorXd b(7);
    A << 1,2,5,7,14,17,19;
    b << 2,14,6,7,13,27,29;
    // append a column of ones for the intercept
    Eigen::MatrixXd new_A(A.rows(), 2);
    Eigen::VectorXd d = Eigen::VectorXd::Constant(A.rows(), 1);
    new_A << A, d;
    Eigen::MatrixXd res = new_A.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(b);
    cout << " slope: " << res.coeff(0, 0) << " intercept: " << res.coeff(1, 0) << endl;
    // double-check via the normal equations
    cout << "dbl check: " << (new_A.transpose() * new_A).ldlt().solve(new_A.transpose() * b) << endl;
}
output with '1' column added to new_A -> slope: 1.21644 intercept: 2.70444
output with '0' or no column added -> slope: 0.642787 intercept: 0
How do I get the same results in C++? Which one is the right one? I am inclined to trust the Python one, since I get the same value when I use a zero column.
thank you,
Merlin

It seems I had to swap new_A with b, and replace ComputeThin with ComputeFull so that it builds.
Eigen::MatrixXd res = b.bdcSvd(Eigen::ComputeFullU | Eigen::ComputeFullV).solve(new_A);
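For reference: statsmodels' OLS takes (endog, exog), i.e. OLS(y, X), so sm.OLS(A, b) regresses each column of the constant-augmented A on b; that is why swapping new_A and b in the Eigen call reproduces the Python numbers. If the intended model is b = slope*A + intercept, the first C++ result (slope 1.21644, intercept 2.70444) is the correct one, and the matching Python call would be sm.OLS(b, sm.add_constant(A)). A minimal sketch showing both solves side by side (variable names are mine):

#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::VectorXd A(7), b(7);
    A << 1, 2, 5, 7, 14, 17, 19;
    b << 2, 14, 6, 7, 13, 27, 29;
    Eigen::MatrixXd X(A.rows(), 2);
    X << A, Eigen::VectorXd::Ones(A.rows()); // design matrix [A, 1]
    // b regressed on [A, 1]: ordinary least squares with an intercept
    Eigen::VectorXd fit = X.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(b);
    std::cout << "slope: " << fit(0) << " intercept: " << fit(1) << std::endl;
    // prints slope: 1.21644 intercept: 2.70444
    // each column of [A, 1] regressed on b: what sm.OLS(A, b) actually computed
    Eigen::MatrixXd swapped = b.bdcSvd(Eigen::ComputeFullU | Eigen::ComputeFullV).solve(X);
    std::cout << "swapped: " << swapped << std::endl;
    // prints swapped: 0.642787 0.048419 (the Python params, in column order of X)
}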

Related

linalg.svd and JacobiSVD<MatrixXf>: the results are different

I'm translating Python code into a C++ version, but I found that the two functions (numpy's linalg.svd and Eigen's JacobiSVD) produce different results. What should I do?
import numpy as np
from numpy.linalg import svd

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
U, S, V = svd(A, 0)
print("U =\n", U)
print("S =\n", S)
print("V =\n", V)
#include <Eigen/Dense>
#include <iostream>
using namespace std;
using namespace Eigen;

int main() {
    MatrixXf m = MatrixXf::Zero(4,4);
    m << 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16;
    cout << "Here is the matrix m:" << endl << m << endl;
    JacobiSVD<MatrixXf> svd(m, ComputeFullU | ComputeFullV);
    cout << "Its singular values are:" << endl << svd.singularValues() << endl;
    cout << "Its left singular vectors are the columns of the thin U matrix:" << endl << endl << svd.matrixU() << endl;
    cout << "Its right singular vectors are the columns of the thin V matrix:" << endl << endl << svd.matrixV() << endl;
}
Forgive me for not being clear, but here are the Python and C++ results:
U =
[[-0.13472212 -0.82574206 0.54255324 0.07507318]
[-0.3407577 -0.4288172 -0.77936056 0.30429774]
[-0.54679327 -0.03189234 -0.06893859 -0.83381501]
[-0.75282884 0.36503251 0.30574592 0.45444409]]
S =
[3.86226568e+01 2.07132307e+00 1.57283823e-15 3.14535571e-16]
V =
[[-0.4284124 -0.47437252 -0.52033264 -0.56629275]
[ 0.71865348 0.27380781 -0.17103786 -0.61588352]
[-0.19891147 -0.11516042 0.82705525 -0.51298336]
[ 0.51032757 -0.82869661 0.12641052 0.19195853]]
C++
Here is the matrix m:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Its singular values are:
38.6227
2.07132
2.69062e-16
6.823e-17
Its left singular vectors are the columns of the thin U matrix:
0.134722 0.825742 0.0384608 0.546371
0.340758 0.428817 0.35596 -0.757161
0.546793 0.0318923 -0.827301 -0.12479
0.752829 -0.365033 0.432881 0.33558
Its right singular vectors are the columns of the thin V matrix:
0.428412 -0.718653 -0.124032 0.533494
0.474373 -0.273808 -0.232267 -0.803774
0.520333 0.171038 0.83663 0.00706489
0.566293 0.615884 -0.480331 0.263215
Although it turns out that there are some small deviations, will this affect my work?
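For what it's worth, the deviations are benign: singular vectors are only determined up to sign, and for this rank-2 matrix the columns of U and V belonging to the two numerically-zero singular values are an arbitrary orthonormal basis of the null space, so both libraries return valid decompositions. A quick sketch to check this, rebuilding m from Eigen's factors:

#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXf m(4, 4);
    m << 1,2,3,4, 5,6,7,8, 9,10,11,12, 13,14,15,16;
    Eigen::JacobiSVD<Eigen::MatrixXf> svd(m, Eigen::ComputeFullU | Eigen::ComputeFullV);
    // If U * S * V^T reproduces m, the decomposition is valid regardless of
    // sign flips or the basis chosen for the (numerical) null space.
    Eigen::MatrixXf rebuilt = svd.matrixU()
                            * svd.singularValues().asDiagonal()
                            * svd.matrixV().transpose();
    std::cout << "reconstruction error: " << (rebuilt - m).norm() << std::endl;
}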

Difference between Eigen SVD and np.linalg.svd

I'm currently trying to implement some prototype Python code in C++ and am running into an issue where I get different results for the SVD calculation between the two when using the exact same array.
Code:
python:
import numpy as np

# H is the 3x3 matrix shown in the printout below
(U, S, V) = np.linalg.svd(H)
print("h\n", H)
print("u\n", U)
print("v\n", V)
print("s\n", S)
rotation_matrix = np.dot(U, V)
prints
h
[[ 1.19586781e+00 -1.36504900e+00 3.04707238e+00]
[-3.24276981e-01 4.25640964e-01 -6.78455372e-02]
[ 4.58970250e-02 -7.33566042e-02 -2.96605698e-03]]
u
[[-0.99546325 -0.09501679 0.0049729 ]
[ 0.09441242 -0.97994807 0.17546529]
[-0.01179897 0.17513875 0.98447306]]
v
[[-0.34290622 0.39295764 -0.85322893]
[ 0.49311955 -0.6977843 -0.51954806]
[-0.79953014 -0.59890012 0.04549948]]
s
[3.5624894 0.43029207 0.00721429]
C++ code:
std::cout << "H\n" << HTest << std::endl;
Eigen::JacobiSVD<Eigen::MatrixXd> svd;
svd.compute(HTest, Eigen::ComputeThinV | Eigen::ComputeThinU);
std::cout << "h is" << std::endl << HTest << std::endl;
std::cout << "Its singular values are:" << std::endl << svd.singularValues() << std::endl;
std::cout << "Its left singular vectors are the columns of the thin U matrix:" << std::endl << -1*svd.matrixU() << std::endl;
std::cout << "Its right singular vectors are the columns of the thin V matrix:" << std::endl << -1*svd.matrixV() << std::endl;
prints:
h is
1.19587 -1.36505 3.04707
-0.324277 0.425641 -0.0678455
0.045897 -0.0733566 -0.00296606
Its singular values are:
3.56249
0.430292
0.00721429
Its left singular vectors are the columns of the thin U matrix:
-0.995463 -0.0950168 0.0049729
0.0944124 -0.979948 0.175465
-0.011799 0.175139 0.984473
Its right singular vectors are the columns of the thin V matrix:
-0.342906 0.49312 -0.79953
0.392958 -0.697784 -0.5989
-0.853229 -0.519548 0.0454995
So H, U, and S are equivalent between the two, but V is not. What could cause this?
I didn't notice that the V's were just transposes of each other. User chrslg has a good explanation for why this is, so I'll just copy it here:
"I'd say, "because" :-). I don't think there is a good reason. Just 2 implementations. In maths lessons, you've probably learned SVD decomposition with formula M=U.S.Vᵀ. So C++ library probably stick to this formula, and gives U, S, V such as M=U.S.Vᵀ. Where as linalg documentation says that it returns U,S,V such as M=(U*S)#V. So one call V what the other calls Vᵀ. Hard to say which one is right. As long as they do what their doc say they do"

Save data (matrix or ndarray) with Python, then load it in C++ (as an OpenCV Mat)

I've created some data in numpy that I would like to use in a separate C++ program, so I need to save the data using Python and later load it in C++. What is the best way of doing this?
My numpy ndarray is float32 and of shape [10000 x 18 x 5]. I can save it, for example, using
numpy.save(filename, data)
Is there an easy way to load such data in C++? Target structure could be an Eigen::Matrix for example.
After searching for hours I found my year-old example files.
Caveats:
solution only covers 2D matrices
not suited for 3-dimensional or generic ndarrays
Write the numpy array to an ascii file with a header specifying nrows and ncols:
def write_matrix2D_to_ascii(filename, matrix2D):
    nrows, ncols = matrix2D.shape
    with open(filename, "w") as file:
        # write header: nrows ncols
        file.write(f"{nrows} {ncols}")
        file.write("\n")
        # write values row by row
        for row in range(nrows):
            for col in range(ncols):
                value = matrix2D[row, col]
                file.write(str(value))
                file.write(" ")
            file.write("\n")
Example output data-file.txt looks like this (first row is header specifying nrows and ncols):
2 3
1.0 2.0 3.0
4.0 5.0 6.0
C++ function to read the matrix from an ascii file into an OpenCV matrix:
#include <iostream>
#include <fstream>
#include <iomanip>               // set precision of output string
#include <opencv2/core/core.hpp> // OpenCV matrices for storing data
using namespace std;
using namespace cv;

void readMatAsciiWithHeader(const string& filename, Mat& matData)
{
    cout << "Create matrix from file: " << filename << endl;
    ifstream inFileStream(filename.c_str());
    if (!inFileStream) {
        cout << "File cannot be found" << endl;
        exit(-1);
    }
    int rows, cols;
    inFileStream >> rows;
    inFileStream >> cols;
    matData.create(rows, cols, CV_32F);
    cout << "numRows: " << rows << "\t numCols: " << cols << endl;
    matData.setTo(0); // init all values to 0
    float* dptr;
    for (int ridx = 0; ridx < matData.rows; ++ridx) {
        dptr = matData.ptr<float>(ridx);
        for (int cidx = 0; cidx < matData.cols; ++cidx, ++dptr) {
            inFileStream >> *dptr;
        }
    }
    inFileStream.close();
}
Driver code to use the above function in a C++ program:
Mat myMatrix;
readMatAsciiWithHeader("path/to/data-file.txt", myMatrix);
For completeness, some code to save the data using C++:
int saveMatAsciiWithHeader(const string& filename, Mat& matData)
{
    if (matData.empty()) {
        cout << "File could not be saved. MatData is empty" << endl;
        return 0;
    }
    ofstream oStream(filename.c_str());
    // Create header
    oStream << matData.rows << " " << matData.cols << endl;
    // Write data
    for (int ridx = 0; ridx < matData.rows; ridx++)
    {
        for (int cidx = 0; cidx < matData.cols; cidx++)
        {
            oStream << setprecision(9) << matData.at<float>(ridx, cidx) << " ";
        }
        oStream << endl;
    }
    oStream.close();
    cout << "Saved " << filename.c_str() << endl;
    return 1;
}
Future work:
solution for 3D matrices
conversion to Eigen::Matrix
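On the Eigen::Matrix conversion, OpenCV already ships a converter, so a minimal sketch (assuming OpenCV was built with Eigen support so that the opencv2/core/eigen.hpp header is available; toEigen is just an illustrative helper name):

#include <Eigen/Dense>
#include <opencv2/core/core.hpp>
#include <opencv2/core/eigen.hpp> // cv::cv2eigen / cv::eigen2cv

// Convert a cv::Mat loaded with readMatAsciiWithHeader (CV_32F) to Eigen.
Eigen::MatrixXf toEigen(const cv::Mat& matData)
{
    Eigen::MatrixXf eigenMat;
    cv::cv2eigen(matData, eigenMat);
    return eigenMat;
}

cv::eigen2cv goes the other way if the data starts on the Eigen side.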

Getting wrong values when I stitch 2 shorts back into an unsigned long

I am doing BLE communications with an Arduino Board and an FPGA.
I have a requirement which restrains me from changing the packet structure (the packet structure is basically short data types). Thus, to send a timestamp (from millis()) over, I have to split an unsigned long into 2 shorts on the Arduino side and stitch it back up on the FPGA side (Python).
This is the implementation which I have:
// Arduino code in c++
unsigned long t = millis();
// bitmask to get bits 1-16
short LSB = (short) (t & 0x0000FFFF);
// bitshift to get bits 17-32
short MSB = (short) (t >> 16);
// I then send the packet with MSB and LSB values
# FPGA python code to stitch it back up (I receive the packet and extract the MSB and LSB)
MSB = data[3]
LSB = data[4]
data = MSB << 16 | LSB
Now the issue is that my output for data on the FPGA side is sometimes negative, which tells me that I must have missed something somewhere, as timestamps are not negative. Does anyone know why?
When I transfer other data in the packet (i.e. other short values and not the timestamp), I am able to receive them as expected, so the problem most probably lies in the conversion that I did and not the sending/receiving of data.
short defaults to signed, and in the case of a negative number >> will keep the sign by shifting in one-bits from the left. See e.g. Microsoft's documentation.
From my earlier comment:
In Python, avoid attempting that by yourself (by the way, short from a C perspective has no guaranteed size; you always have to look into the compiler manual or limits.h) and use the struct module instead.
You probably need/want to first convert the long to network byte order using htonl.
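On the byte-order point: Arduino has no <arpa/inet.h>, so the htonl-style swap would have to be hand-rolled. A minimal sketch (assuming a little-endian MCU; toNetworkOrder is an illustrative helper, not a standard API):

#include <stdint.h>

// Reorder a 32-bit value into big-endian (network) byte order on a
// little-endian MCU; equivalent to htonl where that function exists.
uint32_t toNetworkOrder(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

The Python side can then rebuild the value unambiguously, e.g. with struct.unpack('>I', payload).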
As guidot reminded us, "short" is signed, and as the data is transferred to Python the code has an issue:
For t=0x00018000 the most significant short is MSB = 1, the least significant short is LSB = -32768 (0x8000 in C++ and -0x8000 in Python), and the Python expression
time = MSB << 16 | LSB
returns time = -32768 (see the start of the Python code below).
So we have an incorrect sign, and we are losing MSB (any value, not only the 1 in our example).
MSB is lost because in the expression above LSB is sign-extended with 1-bits 16 places to the left; those 16 "1" bits then override, via the "|" operator, whatever MSB we have, so the expression effectively returns LSB.
A straightforward fix (1.1) is to change MSB and LSB to unsigned short. This could be enough, without any changes to the Python code.
To avoid the bit operations we could use a "union", as in fix 1.2.
Without access to the C++ code, we could instead fix it in Python by converting the signed LSB and MSB (fix 2.1) or by using a "Union" (similar to the C++ "union", fix 2.2).
C++
#include <iostream>
using namespace std;

int main() {
    unsigned long t = 0x00018000;
    short LSB = (short)(t & 0x0000FFFF);
    short MSB = (short)(t >> 16);
    cout << hex << "t = " << t << endl;
    cout << dec << "LSB = " << LSB << " MSB = " << MSB << endl;

    // 1.1 Fix: use unsigned short instead of short
    unsigned short fixedLSB = (unsigned short)(t & 0x0000FFFF);
    unsigned short fixedMSB = (unsigned short)(t >> 16);
    cout << "fixedLSB = " << fixedLSB << " fixedMSB = " << fixedMSB << endl;

    // 1.2 Fix: use a union
    union {
        unsigned long t2;
        unsigned short unsignedShortArray[2];
    };
    t2 = 0x00018000;
    fixedLSB = unsignedShortArray[0];
    fixedMSB = unsignedShortArray[1];
    cout << "fixedLSB = " << fixedLSB << " fixedMSB = " << fixedMSB << endl;
}
Output
t = 18000
LSB = -32768 MSB = 1
fixedLSB = 32768 fixedMSB = 1
fixedLSB = 32768 fixedMSB = 1
Python
DATA = [0, 0, 0, 1, -32768]
MSB = DATA[3]
LSB = DATA[4]
data = MSB << 16 | LSB
print(f"MSB = {MSB} ({hex(MSB)})")
print(f"LSB = {LSB} ({hex(LSB)})")
print(f"data = {data} ({hex(data)})")
time = MSB << 16 | LSB
print(f"time = {time} ({hex(time)})")

# 2.1 Fix
def twosComplement(short):
    if short >= 0:
        return short
    return 0x10000 + short

fixedTime = twosComplement(MSB) << 16 | twosComplement(LSB)

# 2.2 Fix
import ctypes

class UnsignedIntUnion(ctypes.Union):
    _fields_ = [('unsignedInt', ctypes.c_uint),
                ('ushortArray', ctypes.c_ushort * 2),
                ('shortArray', ctypes.c_short * 2)]

unsignedIntUnion = UnsignedIntUnion(shortArray=(LSB, MSB))
print("unsignedIntUnion")
print("unsignedInt = ", hex(unsignedIntUnion.unsignedInt))
print("ushortArray[1] = ", hex(unsignedIntUnion.ushortArray[1]))
print("ushortArray[0] = ", hex(unsignedIntUnion.ushortArray[0]))
print("shortArray[1] = ", hex(unsignedIntUnion.shortArray[1]))
print("shortArray[0] = ", hex(unsignedIntUnion.shortArray[0]))
unsignedIntUnion.unsignedInt = twosComplement(unsignedIntUnion.shortArray[1]) << 16 | twosComplement(unsignedIntUnion.shortArray[0])

def toUInt(msShort: int, lsShort: int):
    return UnsignedIntUnion(ushortArray=(lsShort, msShort)).unsignedInt

fixedTime = toUInt(MSB, LSB)
print("fixedTime = ", hex(fixedTime))
print()
Output
MSB = 1 (0x1)
LSB = -32768 (-0x8000)
data = -32768 (-0x8000)
time = -32768 (-0x8000)
unsignedIntUnion
unsignedInt = 0x18000
ushortArray[1] = 0x1
ushortArray[0] = 0x8000
shortArray[1] = 0x1
shortArray[0] = -0x8000
fixedTime = 0x18000

How to find the closest locations for a list of locations in a more efficient way?

Looking for help with an algorithm for a local machine or cluster (Python, R, JavaScript, any language).
I have a list of locations with coordinates.
# R script
n <- 10
set.seed(1)
index <- paste0("id_",c(1:n))
lat <- runif(n, 32.0, 41)
lon <- runif(n, 84, 112)*(-1)
values <- as.integer(runif(n, 50, 100))
df <- data.frame(index, lat, lon, values, stringsAsFactors = FALSE)
names(df) <- c('loc_id','lat','lon', 'value')
loc_id lat lon value
1 id_1 34.38958 -89.76729 96
2 id_2 35.34912 -88.94359 60
3 id_3 37.15568 -103.23664 82
4 id_4 40.17387 -94.75490 56
5 id_5 33.81514 -105.55556 63
6 id_6 40.08551 -97.93558 69
7 id_7 40.50208 -104.09332 50
8 id_8 37.94718 -111.77337 69
9 id_9 37.66203 -94.64099 93
10 id_10 32.55608 -105.76847 67
I need to find the 3 closest locations for each location in the table.
This is my code in R:
# R script
require(dplyr)
require(geosphere)

start.time <- Sys.time()
d1 <- df
sample <- 999999999999
distances <- list("init1" = sample, "init2" = sample, "init3" = sample)
d1$distances <- apply(d1, 1, function(x){distances})
n_rows = nrow(d1)
for (i in 1:(n_rows-1)) {
  # current location
  dot1 <- c(d1$lon[i], d1$lat[i])
  for (k in (i+1):n_rows) {
    # next location
    dot2 <- c(d1$lon[k], d1$lat[k])
    # distance between locations
    meters_between <- as.integer(distm(dot1, dot2, fun = distHaversine))
    # updating current location distances
    distances <- d1$distances[[i]]
    distances[d1$loc_id[k]] <- meters_between
    d1$distances[[i]] <- distances[order(unlist(distances), decreasing=FALSE)][1:3]
    # updating next location distances
    distances <- d1$distances[[k]]
    distances[d1$loc_id[i]] <- meters_between
    d1$distances[[k]] <- distances[order(unlist(distances), decreasing=FALSE)][1:3]
  }
}
But it takes too much time:
# [1] "For 10 rows and 45 iterations takes 0.124729156494141 sec. Average sec 0.00277175903320313 per row."
# [1] "For 100 rows and 4950 iterations takes 2.54944682121277 sec. Average sec 0.000515039761861165 per row."
# [1] "For 200 rows and 19900 iterations takes 10.1178169250488 sec. Average sec 0.000508433011308986 per row."
# [1] "For 500 rows and 124750 iterations takes 73.7151870727539 sec. Average sec 0.000590903303188408 per row."
I did the same in Python:
# Python script
import pandas as pd
import numpy as np

n = 10
np.random.seed(1)
data_m = np.random.uniform(0, 5, 5)
data = {'loc_id': range(1, n + 1),
        'lat': np.random.uniform(32, 41, n),
        'lon': np.random.uniform(84, 112, n) * (-1),
        'values': np.random.randint(50, 100, n)}
df = pd.DataFrame(data)[['loc_id', 'lat', 'lon', 'values']]
df['loc_id'] = df['loc_id'].apply(lambda x: 'id_{0}'.format(x))
df = df.reset_index().drop('index', axis=1).set_index('loc_id')

from geopy.distance import distance
from datetime import datetime
start_time = datetime.now()

sample = 999999999999
df['distances'] = np.nan
df['distances'] = df['distances'].apply(
    lambda x: [{'init1': sample}, {'init2': sample}, {'init3': sample}])
n_rows = len(df)
rows_done = 0
for i, row_i in df.head(n_rows - 1).iterrows():
    # current location
    dot1 = (row_i['lat'], row_i['lon'])
    rows_done = rows_done + 1
    for k, row_k in df.tail(n_rows - rows_done).iterrows():
        # next location
        dot2 = (row_k['lat'], row_k['lon'])
        meters_between = int(distance(dot1, dot2).meters)
        # updating current location distances
        distances = df.at[i, 'distances']
        distances.append({k: meters_between})
        distances_sorted = sorted(distances, key=lambda x: x[next(iter(x))])[:3]
        df.at[i, 'distances'] = distances_sorted
        # updating next location distances
        distances = df.at[k, 'distances']
        distances.append({i: meters_between})
        distances_sorted = sorted(distances, key=lambda x: x[next(iter(x))])[:3]
        df.at[k, 'distances'] = distances_sorted
print(df)
Almost the same performance.
Does anybody know a better approach? In my task it has to be done for 90,000 locations. I have even thought about Hadoop/MapReduce/Spark, but have no idea how to do it in distributed mode.
I am glad to hear any ideas or suggestions.
If Euclidean distance is OK, then nn2 uses k-d trees and C code, so it should be fast (k = 4 because each point's nearest neighbor is the point itself):
library(RANN)
nn2(df[2:3], k = 4)
This took a total of 0.06 to 0.11 seconds on my not particularly fast laptop to process n = 10,000 rows and a total of 1.00 to 1.25 seconds for 90,000 rows.
I can offer a Python solution with scipy (note: in newer versions of geopy, vincenty has been replaced by geodesic):

from scipy.spatial import distance
from geopy.distance import vincenty
import numpy as np

v = distance.cdist(df[['lat', 'lon']].values, df[['lat', 'lon']].values,
                   lambda u, v: vincenty(u, v).kilometers)
# column 0 is the zero distance of each point to itself, so keep columns 1-3
np.sort(v, axis=1)[:, 1:4]
Out[1033]:
array([[384.09948155, 468.15944729, 545.41393271],
[270.07677993, 397.21974571, 659.96238603],
[384.09948155, 397.21974571, 619.616239 ],
[203.07302273, 483.54687912, 741.21396029],
[203.07302273, 444.49156394, 659.96238603],
[437.31308598, 468.15944729, 494.91879983],
[494.91879983, 695.91437812, 697.27399161],
[270.07677993, 444.49156394, 483.54687912],
[530.54946479, 626.29467739, 695.91437812],
[437.31308598, 545.41393271, 697.27399161]])
Here's how to solve this problem with C++ and my library GeographicLib (version 1.47 or later). This uses true ellipsoidal geodesic distances and a vantage point tree to optimize the search for nearest neighbors.
#include <exception>
#include <fstream>
#include <iostream>
#include <limits>
#include <string>
#include <vector>
#include <GeographicLib/NearestNeighbor.hpp>
#include <GeographicLib/Geodesic.hpp>
using namespace std;
using namespace GeographicLib;

// A structure to hold a geographic coordinate.
struct pos {
    string id;
    double lat, lon;
    pos(const string& _id = "", double _lat = 0, double _lon = 0) :
        id(_id), lat(_lat), lon(_lon) {}
};

// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
    Geodesic _geod;
public:
    explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
    double operator() (const pos& a, const pos& b) const {
        double d;
        _geod.Inverse(a.lat, a.lon, b.lat, b.lon, d);
        if ( !(d >= 0) )
            // Catch illegal positions which result in d = NaN
            throw GeographicErr("distance doesn't satisfy d >= 0");
        return d;
    }
};

int main() {
    try {
        // Read in pts
        vector<pos> pts;
        string id;
        double lat, lon;
        {
            ifstream is("pts.txt"); // lines of "id lat lon"
            if (!is.good())
                throw GeographicErr("pts.txt not readable");
            while (is >> id >> lat >> lon)
                pts.push_back(pos(id, lat, lon));
            if (pts.size() == 0)
                throw GeographicErr("need at least one location");
        }
        // Define a distance function object
        DistanceCalculator distance(Geodesic::WGS84());
        // Create NearestNeighbor object
        NearestNeighbor<double, pos, DistanceCalculator>
            ptsset(pts, distance);
        vector<int> ind;
        int n = 3; // Find 3 nearest neighbors
        for (unsigned i = 0; i < pts.size(); ++i) {
            ptsset.Search(pts, distance, pts[i], ind,
                          n, numeric_limits<double>::max(),
                          // exclude the point itself
                          0.0);
            if (ind.size() != n)
                throw GeographicErr("unexpected number of results");
            cout << pts[i].id;
            for (unsigned j = 0; j < ind.size(); ++j)
                cout << " " << pts[ind[j]].id;
            cout << "\n";
        }
        int setupcost, numsearches, searchcost, mincost, maxcost;
        double mean, sd;
        ptsset.Statistics(setupcost, numsearches, searchcost,
                          mincost, maxcost, mean, sd);
        long long
            totcost = setupcost + searchcost,
            exhaustivecost = ((pts.size() - 1) * pts.size()) / 2;
        cerr
            << "Number of distance calculations = " << totcost << "\n"
            << "With an exhaustive search = " << exhaustivecost << "\n"
            << "Ratio = " << double(totcost) / exhaustivecost << "\n"
            << "Efficiency improvement = "
            << 100 * (1 - double(totcost) / exhaustivecost) << "%\n";
    }
    catch (const exception& e) {
        cerr << "Caught exception: " << e.what() << "\n";
        return 1;
    }
}
This reads in a set of points (in the form "id lat lon") from pts.txt and puts them in a VP tree. Then for each point it looks up the 3 nearest neighbors and prints the id and the ids of the neighbors (ranked by distance).
Compile this with, e.g.,
g++ -O3 -o nearest nearest.cpp -lGeographic
If pts.txt contains 90000 points, then the computation completes in about 6 secs (or 70 μs per point) on my home computer after doing about 3380000 distance calculations. This is about 1200 times more efficient than a brute-force calculation (doing all N(N − 1)/2 distance calculations).
You could speed this up (by a factor of a "few") by using a crude
approximation to the distance (e.g., spherical or euclidean); just
modify the DistanceCalculator class appropriately. For example, this
version of DistanceCalculator returns the spherical distance in
degrees:
// A class to compute the spherical distance between 2 positions.
class DistanceCalculator {
public:
    explicit DistanceCalculator(const Geodesic& /*geod*/) {}
    double operator() (const pos& a, const pos& b) const {
        double sphia, cphia, sphib, cphib, somgab, comgab;
        Math::sincosd(a.lat, sphia, cphia);
        Math::sincosd(b.lat, sphib, cphib);
        Math::sincosd(Math::AngDiff(a.lon, b.lon), somgab, comgab);
        return Math::atan2d(Math::hypot(cphia * sphib - sphia * cphib * comgab,
                                        cphib * somgab),
                            sphia * sphib + cphia * cphib * comgab);
    }
};
But now you have the added burden of ensuring that the approximation
is good enough. I recommend just using the correct geodesic distance
in the first place.
Details of the implementation of VP trees in GeographicLib are given
here.
