I have a list of n numbers. I need to calculate the BD distance from each member of the list to every other member, sum those distances for each number, and then select the number with the lowest sum of distances to all other members. I used two for loops for this, but it is time-consuming. Is there a way to calculate this distance without using for loops? As you know, the distance between i and j is the same, yet we calculate it twice in the loops. I have the distances of all points to each other in a big NumPy array; here we have some of the points of each cluster.
for itr2 in range(K):
    tmp_cl = clusters[itr2 + 1]
    if len(tmp_cl) > 1:
        BD_cent = np.zeros((len(tmp_cl), 1))
        for itr3 in range(len(tmp_cl)):
            sumv = 0
            for itr5 in range(len(tmp_cl)):
                BD_R = bd_rate(rate, tmp_cl[itr3, :], rate, tmp_cl[itr5, :])
                BD_R = (BD_R - min_BDR) / (max_BDR - min_BDR)
                BD_Q = bd_PSNR(rate, tmp_cl[itr3, :], rate, tmp_cl[itr5, :])
                BD_Q = (BD_Q - min_BDQ) / (max_BDQ - min_BDQ)
                value = wr * BD_R + wq * BD_Q
                if value != np.NINF:
                    sumv += value
                else:
                    sumv += 1000  # for curves which have no overlap with others
            BD_cent[itr3] = sumv / len(tmp_cl)
        new_centroid_index = np.argmin(BD_cent)
        centroid[itr2] = clusters[itr2 + 1][new_centroid_index]
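For reference, a minimal sketch of halving the work by exploiting the symmetry I mentioned: fill only the upper triangle of a pair matrix and mirror it (bd_rate, bd_PSNR, rate, the weights, and the normalisation constants are assumed from my code above, and the distance is assumed symmetric as stated):

import numpy as np

n = len(tmp_cl)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):  # each unordered pair is visited once
        bdr = (bd_rate(rate, tmp_cl[i, :], rate, tmp_cl[j, :]) - min_BDR) / (max_BDR - min_BDR)
        bdq = (bd_PSNR(rate, tmp_cl[i, :], rate, tmp_cl[j, :]) - min_BDQ) / (max_BDQ - min_BDQ)
        v = wr * bdr + wq * bdq
        if v == -np.inf:
            v = 1000  # curves with no overlap
        D[i, j] = D[j, i] = v  # mirror instead of recomputing
# medoid: the member with the lowest mean distance to the others
new_centroid_index = int(np.argmin(D.sum(axis=1) / n))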
This should be O(N log(N)) for the sort, and O(N) to find the minimum.
import numpy as np
import matplotlib.pyplot as plt
data = np.array([11, 32, 71, 167, 217, 308, 366, 411, 449])
x = np.sort(data)
position = np.arange(len(x))
# distances between neighboring points
xdiff = np.diff(x, prepend=0, append=0)
# From here, most calculations are unnecessary, because
# the minimum point depends on local conditions, and
# can be obtained from the change in position and xdiff
# delta distance = (distance between neighboring points)*position
xAccumLeftToRight = np.cumsum(xdiff[:-1]*position)
xAccumRightToLeft = np.cumsum(xdiff[::-1][:-1]*position)[::-1]
# Sum of distances to the left and to the right
sumDist = xAccumLeftToRight+xAccumRightToLeft
# Finding the minimum. Could be sped up with a log n (binary) search,
# because sumDist descends monotonically to the center
# (and then ascends monotonically)
indexMin = np.argmin(sumDist)
print(f"minimum distance at {x[indexMin]}")
print(xdiff)
print(xAccumLeftToRight)
print(xAccumRightToLeft)
print(sumDist)
plt.plot(x, sumDist)
plt.scatter(x[indexMin], sumDist[indexMin])
plt.show()
I have 200 data points; each point is a list of 3 numbers representing its position. I want to sample N = 100 points from this 3D space, with the constraint that the minimum distance between every two sampled points must be larger than 0.15. The script below is how I sample the points, but it just keeps running and never stops. Also, if I set N larger than some value, the code cannot find all N points, because I sample each point randomly and it gets to a state where no remaining point is far enough from the current selection; in reality, though, N can be much larger than this value if the point distribution is very "dense" (while still keeping the minimum distance above 0.15). Is there a more efficient way to do this?
import numpy as np
import random
import time

def get_random_points_not_too_close(points, npoints, min_distance):
    random.shuffle(points)
    final_points = [points[0]]
    while len(final_points) < npoints:
        for point in points:
            if point in final_points:
                continue
            elif min([np.linalg.norm(np.array(p) - np.array(point)) for p in final_points]) > min_distance:
                final_points.append(point)
    return final_points
data = [[random.random() for i in range(3)] for j in range(200)]
t1 = time.time()
sample_points = get_random_points_not_too_close(points=data, npoints=100, min_distance=0.15)
t2 = time.time()
print(t2-t1)
Your algorithm can work for small sets of points, but it does not run in a deterministic time.
I did the following to create a random forest (simulated trees): first generate a square grid of points whose spacing is 3x the minimal distance. Then take each point of the regular grid and translate it in a random direction, by a random distance of at most the minimal distance. The resulting points can never be closer than the minimal distance: two neighbours start 3 units apart (in minimal-distance units) and each moves toward the other by at most 1.
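A minimal sketch of this construction in 3D, assuming points in the unit cube and the question's minimum distance d = 0.15 (the achievable number of points depends on d; this grid yields 27):

import numpy as np

d = 0.15
# grid spacing of 3*d guarantees the minimum distance after jittering
axis = np.arange(0.0, 1.0, 3 * d)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

# translate each grid point in a random direction, by less than d
direction = np.random.normal(size=grid.shape)
direction /= np.linalg.norm(direction, axis=1, keepdims=True)
points = grid + direction * np.random.uniform(0.0, d, size=(len(grid), 1))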
I have game window of size 640 by 480 and it is populated by particles, but when a particle goes off to one side, it wraps around to the other (i.e. it is a toroid).
I want to calculate the distance between each particle, since this will be used to apply different forces to each particle.
At first I looped through each pair of particles, re-centered everything so that the first particle of the pair was at the center, and then calculated the distance to the second particle, but this was extremely slow to run.
Then I found some functions in scipy.spatial.distance that allow me to calculate the distances between all points very quickly, but the only problem is that they don't take into account the wrap-around.
Here is my current code
from scipy.spatial.distance import pdist, squareform
...
distance = squareform(pdist([(p.x, p.y) for p in particles]))
This works for particles near the center, but if one particle is at (1, 320) and the other particle is at (639, 320), then it calculates their distance as 638 instead of 2. It doesn't take into account the wrap.
Is there a different function I can use, or some transformation I can apply before/after to take into account the wrap?
You can compute the smaller of the x and y differences (the in-window difference versus the edge-crossing distance) like this:
game_width = 640
game_height = 480

def smaller_xy(point1, point2):
    xdiff = abs(point1.x - point2.x)
    if xdiff > (game_width / 2):
        xdiff = game_width - xdiff
    ydiff = abs(point1.y - point2.y)
    if ydiff > (game_height / 2):
        ydiff = game_height - ydiff
    return xdiff, ydiff
That is, if the in-window distance in the x or y direction is greater than half the window size in that direction, it is shorter to go off the edge; in that case the separation is the window size in that direction minus the original in-window distance.
Obviously, once you have the x and y separations you can compute the distance between the points as:
import math
small_x, small_y = smaller_xy(p1, p2)
least_distance = math.sqrt(small_x**2 + small_y**2)
However, depending on how your force calculation is defined, you might find that all you really need is the square of the distance (just (small_x**2 + small_y**2)) and therefore you can avoid the work of finding the sqrt.
To get this plumbed into scipy.pdist, note that pdist can be called with a function argument in addition to the points, as:
Y = pdist(X, func)
This is the last form of invocation shown in the description of pdist at https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist
You should be able to use that feature to make pdist build its distances-between-all-pairs-of-points matrix from distances calculated by a callback function that applies the smaller_xy computation, as sketched below.
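A sketch of that wiring, assuming the particles list from the question (note that a Python callback makes pdist call into Python once per pair, so it is slower than a built-in metric):

import math
import numpy as np
from scipy.spatial.distance import pdist, squareform

def toroidal_distance(u, v):
    # u and v are 1-D arrays of (x, y) coordinates
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    dx = min(dx, game_width - dx)    # go around the edge if shorter
    dy = min(dy, game_height - dy)
    return math.sqrt(dx**2 + dy**2)

coords = np.array([(p.x, p.y) for p in particles])
distance = squareform(pdist(coords, toroidal_distance))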
Let's imagine you replicate boards around the original board (above, below, left, right, and on the four diagonals) and copy the particles of the original board onto the replicas. The diagonal copies matter: without them, two particles that wrap in both x and y at once, such as ones near opposite corners, get the wrong distance. Let's also mark the particles.
Denote the N particles on the original board o(i), with i running from 1 to N, and denote the copies of particle j on the replicated boards a(j) (above), b(j) (below), l(j) (left), r(j) (right), and so on for the diagonals.
For all distinct i and j, you need the distances from o(i) to each of the nine copies of particle j. Once you have these distances, take the minimum for each pair; that is your distance for i and j.
There may be ways to prune the computation, but you would at least need to compute the distances from the particles to the borders and compare those with the distances on the original board, which is an overhead as well.
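A vectorized sketch of this replication idea (the board size is from the question; pts stands in for the particle coordinates):

import numpy as np
from scipy.spatial.distance import cdist

W, H = 640, 480
pts = np.random.rand(10, 2) * [W, H]   # hypothetical particle positions

# offsets of the original board and its eight neighbouring replicas
shifts = np.array([(dx, dy) for dx in (-W, 0, W) for dy in (-H, 0, H)])

# distance from every point to every replica, minimised over the nine copies
dist = np.min([cdist(pts, pts + s) for s in shifts], axis=0)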
I am trying to find a point (latitude/longitude) that minimizes the sum of Google Maps distances to all other N points.
I was able to extract the Google Maps distances between my latitude and longitude arrays, but I wasn't able to minimize my function.
Code
def minimize_g(input_g):
    gmaps1 = googlemaps.Client(key="xxx")

    def distance_f(x):
        dist = gmaps1.distance_matrix([x], np.array(input_g)[:, 1:3])
        sum_ = 0
        for obs in range(len(np.array(df[:3]))):
            sum_ += dist['rows'][0]['elements'][obs]['distance']['value']
        return sum_

    # initial guess: centroid
    centroid = input_g.mean(axis=0)
    optimization = minimize(distance_f, centroid, method='COBYLA')
    return optimization.x
Thanks!
If you are looking for any point on the map that results in the shortest total distance to all coordinates in your list, you can start by writing a function that calculates the distance from one coordinate to another. Once you have that function, it's a matter of calculating the total distance from a test point to all your points.
Then, over some artificially created coordinates, you would minimize the distances to all your points with something along the lines of
import numpy as np

lats = [12.3, 12.4, 12.5]
lons = [16.1, 15.1, 14.1]

def total_distance_to_lats_and_lons(lat, lon):
    # placeholder: straight-line distance in degrees; swap in a real
    # (e.g. haversine or road-network) distance here
    return sum(((lat - la)**2 + (lon - lo)**2) ** 0.5
               for la, lo in zip(lats, lons))

# create two lists with 0.01 degree precision as an artificial grid of possibilities
test_lats = np.arange(min(lats), max(lats), 0.01)
test_lons = np.arange(min(lons), max(lons), 0.01)

test_distances = []           # total distance for each combination of test_lat, test_lon
coordinate_combinations = []  # the corresponding coordinates
for test_lat in test_lats:
    for test_lon in test_lons:
        coordinate_combinations.append([test_lat, test_lon])  # add a coordinate pair
        test_distances.append(total_distance_to_lats_and_lons(test_lat, test_lon))  # add a distance

index_of_best_test_coordinate = np.argmin(test_distances)  # index of the minimum value
print('Best match is index {}'.format(index_of_best_test_coordinate))
print('Coordinates: {}'.format(coordinate_combinations[index_of_best_test_coordinate]))
print('Total distance: {}'.format(test_distances[index_of_best_test_coordinate]))
This brute-force approach has limited precision and becomes an expensive loop quite quickly, so you can also apply the method iteratively: after each round, re-center the grid on the minimum found so far, shrinking the search window and increasing the precision (a sketch of such a refinement loop follows). After a few iterations you should have a fairly precise estimate. On the other hand, such an iterative method may converge to one of multiple local minima, yielding only one of multiple solutions.
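A hypothetical refinement loop around the brute-force search above:

best_lat, best_lon = coordinate_combinations[index_of_best_test_coordinate]
step = 0.01
for _ in range(4):
    step /= 10  # 10x finer each round
    # re-scan a window of the previous resolution around the current best
    test_grid = [(la, lo)
                 for la in np.arange(best_lat - 10 * step, best_lat + 10 * step, step)
                 for lo in np.arange(best_lon - 10 * step, best_lon + 10 * step, step)]
    dists = [total_distance_to_lats_and_lons(la, lo) for la, lo in test_grid]
    best_lat, best_lon = test_grid[int(np.argmin(dists))]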
I am writing some code to calculate the real distance between one point and the rest of the points from the same array. The array holds the positions of particles in 3D space. There are N particles, so the array's shape is (N, 3). I choose one particle and calculate the distances between this particle and the rest of the particles, all within one array.
Would anyone here have any idea how to do this?
What I have so far:
import random
import numpy as np

xbox = 10
ybox = 10
zbox = 10
nparticles = 15

positions = np.empty([nparticles, 3])
for i in range(nparticles):
    xrandomlocation = random.uniform(0, xbox)
    yrandomlocation = random.uniform(0, ybox)
    zrandomlocation = random.uniform(0, zbox)
    positions[i, 0] = xrandomlocation
    positions[i, 1] = yrandomlocation
    positions[i, 2] = zrandomlocation
And that's pretty much all I have right now. I was thinking of using np.linalg.norm, however I am not sure at all how to work it into my code (or maybe use it in a loop)?
It sounds like you could use scipy.spatial.distance.cdist or scipy.spatial.distance.pdist for this. For example, to get the distances from point X to the points in coords:
>>> from scipy.spatial import distance
>>> X = [(35.0456, -85.2672)]
>>> coords = [(35.1174, -89.9711),
... (35.9728, -83.9422),
... (36.1667, -86.7833)]
>>> distance.cdist(X, coords, 'euclidean')
array([[ 4.70444794, 1.6171966 , 1.88558331]])
pdist is similar, but only takes one array, and you get the distances between all pairs.
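For instance, applied to the (N, 3) positions array from the question, this could look like:

import numpy as np
from scipy.spatial import distance

# distances from the first particle to every particle (its own 0 included)
d_to_rest = distance.cdist(positions[:1], positions)[0]

# or every pairwise distance at once, as a symmetric (N, N) matrix
pairwise = distance.squareform(distance.pdist(positions))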
I am using this function:
from scipy.spatial import distance

def closest_node(node, nodes):
    closest = distance.cdist([node], nodes)
    index = closest.argmin()
    euclidean = closest[0]
    return nodes[index], euclidean[index]
where node is the single point in the space you want to compare with an array of points called nodes. It returns the closest point and the Euclidean distance from your original node to it.
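A hypothetical usage with the (N, 3) positions array from the previous question, finding the nearest neighbour of the first particle among the rest:

nearest, dist_to_nearest = closest_node(positions[0], positions[1:])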
I want to generate random points on the surface of a cylinder such that the distance between any two points falls in the range of 230 to 250. I used the following code to generate random points on the surface of a cylinder:
import random, math

H = 300
R = 20
s = random.random()
# theta = random.random()*2*math.pi
for i in range(0, 300):
    theta = random.random() * 2 * math.pi
    z = random.random() * H
    r = math.sqrt(s) * R
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    print('C', x, y, z)
How can I generate random points such that they fall within that range (on the surface of the cylinder)?
This is not a complete solution, but an insight that should help. If you "unroll" the surface of the cylinder into a rectangle of width w = 2*pi*r and height h, the task of finding the distance between points is simplified. You have not explained how to measure "distance along the surface" between points on the top of the cylinder and points on the side; that is a slightly tricky bit of geometry.
As for computing the distance along the surface across the artificial "seam" we created: just use both |x1 - x2| and w - |x1 - x2|; whichever gives the shorter distance is the one you want.
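A small sketch of that unrolled distance, assuming each point is given as (theta, z) on a cylinder of radius r (a hypothetical representation):

import math

def surface_distance(p1, p2, r):
    w = 2 * math.pi * r
    x1, x2 = p1[0] * r, p2[0] * r   # arc length around the circumference
    dx = abs(x1 - x2)
    dx = min(dx, w - dx)            # cross the seam if that is shorter
    dz = abs(p1[1] - p2[1])
    return math.sqrt(dx**2 + dz**2)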
I do think that @VincentNivoliers' suggestion to use Poisson disk sampling is very good, but with the constraints of h = 300 and r = 20 you will get terrible results no matter what.
The basic way of creating a set of random points with constraints on the positions between them is to have a function that modulates the probability of a point being placed at a certain location. This function starts out constant, and whenever a point is placed, the forbidden areas surrounding it are set to zero. That is difficult to do with continuous variables, but reasonably easy if you discretize your problem.
The other thing to be careful about is the being-on-a-cylinder part. It may be easier to think of it as random points on a rectangular area that repeats periodically. This can be handled in two different ways:
The simplest is to take into consideration not only the rectangular tile where you are placing the points, but also its neighbouring ones. Whenever you place a point in your main tile, you also place one in the neighbouring tiles and compute their effect on the probability function inside your tile.
A more sophisticated approach models the probability function as the convolution of a kernel that encodes the forbidden areas with a sum of delta functions corresponding to the points already placed. If this is computed using FFTs, the periodicity is a natural by-product (a rough sketch of this appears at the end of this answer).
The first approach can be coded as follows:
import numpy as np

r, h = 20, 300
w = 2 * np.pi * r
int_w = int(np.rint(w))
mult = 10
# boolean "probability" map over the discretized, unrolled cylinder
pdf = np.ones((h * mult, int_w * mult), dtype=bool)
points = []
min_d, max_d = 230, 250
available_locs = pdf.sum()
while available_locs:
    # pick one of the still-allowed grid cells uniformly at random
    new_idx = np.random.randint(available_locs)
    new_idx = np.nonzero(pdf.ravel())[0][new_idx]
    new_point = np.array(np.unravel_index(new_idx, pdf.shape))
    points += [new_point]
    min_mask = np.ones_like(pdf)
    if max_d is not None:
        max_mask = np.zeros_like(pdf)
    else:
        max_mask = True
    # the point plus its two horizontal replicas handle the wrap-around
    for p in [new_point - [0, int_w * mult], new_point + [0, int_w * mult],
              new_point]:
        rows = ((np.arange(pdf.shape[0]) - p[0]) / mult)**2
        cols = ((np.arange(pdf.shape[1]) - p[1]) * 2 * np.pi * r / int_w / mult)**2
        dist2 = rows[:, None] + cols[None, :]
        min_mask &= dist2 > min_d * min_d
        if max_d is not None:
            max_mask |= dist2 < max_d * max_d
    pdf &= min_mask & max_mask
    available_locs = pdf.sum()
points = np.array(points) / [mult, mult * int_w / (2 * np.pi * r)]
If you run it with your values, the output is usually just one or two points, as the large minimum distance forbids all others. But if you run it with more reasonable values, e.g.
min_d, max_d = 50, 200
Here's how the probability function looks after placing each of the first 5 points.
Note that the points are returned as pairs of coordinates, the first being the height, the second the distance along the cylinder's circumference.
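For the second, FFT-based approach described above, a rough sketch (the grid sizes and placed points are hypothetical; the height axis is zero-padded so that only the circumference wraps):

import numpy as np

h_px, w_px = 300, 126   # unrolled grid: height x circumference
pad = 60                # >= the forbidden radius, blocks wrap in height
min_d_px = 50           # forbidden radius on this grid

# kernel: 1 inside the forbidden disk, stored with wrap-around offsets
dy = np.minimum(np.arange(h_px + pad), h_px + pad - np.arange(h_px + pad))[:, None]
dx = np.minimum(np.arange(w_px), w_px - np.arange(w_px))[None, :]
kernel = (dy**2 + dx**2) < min_d_px**2

# delta functions at the points already placed
deltas = np.zeros((h_px + pad, w_px))
deltas[10, 20] = deltas[200, 60] = 1

# circular convolution via FFT; crop the padding back off afterwards
conv = np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(deltas)).real
allowed = (conv < 0.5)[:h_px]   # True where a new point may still go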