How to compute the distance between 3d points in a fast way - python

I have a 3D point cloud (x, y, z) in a txt file. I want to calculate the 3D distance between each point and every other point in the point cloud, and save the number of points having a distance less than a threshold. I have done it in Python as shown below, but it takes too much time. I am looking for something faster than what I have.
from math import sqrt
import numpy as np

points_list = []
with open("D:/Point cloud data/projection_test_data3.txt") as chp:
    for line in chp:
        x, y, z = line.split()
        points_list.append((float(x), float(y), float(z)))

j = 0
Final_density = 0
while j < len(points_list) - 1:
    i = 0
    Density = 0
    while i < len(points_list) - 1:
        if sqrt((points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2) < 0.15:
            Density += 1
        i += 1
    Final_density = Density
    with open("D:/Point cloud data/my_density.txt", 'a') as g:
        g.write("{}\n".format(str(Final_density)))
    j += 1

One quick option that might speed this up is to move the file opening outside the loop, so that you're not opening and closing the output file on every iteration.
from math import sqrt
import numpy as np

points_list = []
with open("D:/Point cloud data/projection_test_data3.txt") as chp:
    for line in chp:
        x, y, z = line.split()
        points_list.append((float(x), float(y), float(z)))

j = 0
Final_density = 0
with open("D:/Point cloud data/my_density.txt", 'a') as g:
    while j < len(points_list) - 1:
        i = 0
        Density = 0
        while i < len(points_list) - 1:
            if sqrt((points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2) < 0.15:
                Density += 1
            i += 1
        Final_density = Density
        g.write("{}\n".format(str(Final_density)))
        j += 1
Since it looks like you can use numpy, why not use it? You'll have to make sure the arrays are numpy arrays, but that should be simple. Then change
if sqrt(...) < 0.15:
to
if np.linalg.norm(points_list[j] - points_list[i]) < 0.15:
This post (Finding 3d distances using an inbuilt function in python) has other ways to use a prebuilt function to get the 3d distance in python.
Edit thanks to @KellyBundy's comment:
You can also use np.linalg.norm(points_list - points_list[:, None], axis=-1) to generate a matrix of the distances between all points in the array. The diagonal will be 0 (the distance between each point and itself) and the matrix will be symmetric about the diagonal, so you only need the upper triangle to get the distance between any given pair of points. Again, you'll have to put all the points into a numpy array in the proper format, i.e. np.array([[point1x, point1y, point1z], [point2x, point2y, point2z], ...]). (https://stackoverflow.com/a/46700369/2391458)
The resulting matrix has the form:

[[0,      d(0,1), d(0,2), ...],
 [d(1,0), 0,      d(1,2), ...],
 ...]

where d(i,j) is the distance between points i and j, d(i,j) = d(j,i), and the diagonal is 0.
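Putting that together, a minimal sketch of the fully vectorized version (assuming the file parses cleanly with np.loadtxt; for very large N the N x N distance matrix may not fit in memory, in which case a spatial index such as scipy.spatial.cKDTree is the better tool):

import numpy as np

# Load the cloud as an (N, 3) float array.
points = np.loadtxt("D:/Point cloud data/projection_test_data3.txt")

# Pairwise difference vectors, shape (N, N, 3), then distances, shape (N, N).
dists = np.linalg.norm(points - points[:, None], axis=-1)

# Count neighbours strictly within the threshold for each point,
# subtracting 1 to exclude the point itself (the zero diagonal).
density = (dists < 0.15).sum(axis=1) - 1

np.savetxt("D:/Point cloud data/my_density.txt", density, fmt="%d")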

Quick 2x speed-up – replace i = 0 with i = j + 1. That way you check each pair only once, not twice.
A more fundamental change – you can sort the points by coordinate and use a sliding-window algorithm. The idea is that if the points are sorted by x coordinate, the j-th point has x = 1, and the i-th point has x = 1.01, then they might be near each other and you should check. But if the i-th point has x = 2, it cannot be near the j-th point, and since the points are sorted, all points after the i-th can be skipped (i.e. not checked against the j-th point).
If the points are sparse, this should speed the function up significantly, and the complexity becomes O(n*log(n)) because of the sorting.
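A minimal sketch of that sliding-window idea (assuming points_list holds (x, y, z) tuples and the question's 0.15 threshold; the counts here exclude the point itself):

THRESH = 0.15
pts = sorted(points_list)    # sorts by x, then y, then z
counts = [0] * len(pts)

lo = 0
for j, (xj, yj, zj) in enumerate(pts):
    # Slide the window start forward: a point whose x lags by more than
    # THRESH can never be within range of point j (or of any later point).
    while pts[lo][0] < xj - THRESH:
        lo += 1
    for i in range(lo, j):
        xi, yi, zi = pts[i]
        if (xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2 < THRESH**2:
            counts[i] += 1   # each pair is found once and credited to both
            counts[j] += 1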

In the if, instead of taking the sqrt and comparing it with 0.15, compare the squared distance with the square of 0.15, which is 0.0225, directly. The result will be the same. sqrt is an expensive operation, and not using it will save you time.
if (points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2 < 0.0225:

Related

Is there a faster way to perform this neighbour finding operation

I'm trying to calculate Moran's I in Python (this is the underlying equation). My inputs are an Nx3 array coords containing the coordinates of each point, and an array z which contains the values minus the overall mean. The operation requires each value of z to be multiplied with the z value of every point within a set distance (here set to 1.99). My problem is that in my case N ≈ 2 million, so the find_neighbours operation is very slow. Is there a way I could speed this up?
def find_neighbours(coords, idx, k):
    distances = np.sqrt(np.power(coords - coords[idx], 2).sum(axis=1))
    distances[idx] = np.inf
    return np.argwhere(distances <= k)

z = x - np.mean(x)
n = len(coords)
A = 0
B = np.sum([z[idx]**2 for idx, coord in enumerate(coords)])
S_0 = 0
for idx in range(len(coords)):
    neighbours = find_neighbours(coords, idx, 1.99)
    S_0 += len(neighbours)
    A += np.sum([(z[neighbour] * z[idx]) for neighbour in neighbours])
I = (n / S_0) * (A / B)
This is a classical problem with plenty of literature about it. It's called Radius Neighbor Search in Three-dimensional Point Clouds. You need to store your points in a better data structure to do the search faster. I would suggest an octree.
Check the Python code here and adapt it to your case.
For explanations, check this paper.
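If pulling in scipy is an option, its KD-tree gives you this kind of spatial index with very little code. A sketch (assuming coords is an (N, 3) array and z is a 1-D array of the centered values; query_pairs returns each unordered pair once, so the ordered-pair sums from the question's code are recovered by doubling):

import numpy as np
from scipy.spatial import cKDTree

tree = cKDTree(coords)
pairs = np.array(list(tree.query_pairs(r=1.99)))   # shape (n_pairs, 2)

S_0 = 2 * len(pairs)                               # ordered pairs
A = 2 * np.sum(z[pairs[:, 0]] * z[pairs[:, 1]])
B = np.sum(z**2)
n = len(coords)
I = (n / S_0) * (A / B)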

How to estimate the integral of an oscillating curve using the Monte Carlo method (in python)

I am trying to estimate the integral of x*cos(x) from 0 to 2*pi using the Monte Carlo method (in Python).
I am using 1000 random points to estimate the integral. Here's my code:
import random
import numpy as np

N = 1000  # total number of points to be generated

def f(x):
    return x * np.cos(x)

## Points between the x-axis and the curve will be stored in these empty lists.
red_points_x = []
red_points_y = []
blue_points_x = []
blue_points_y = []

## The loop checks if a point is between the x-axis and the curve or not.
i = 0
while i < N:
    x = random.uniform(0, 2*np.pi)
    y = random.uniform(3.426*np.cos(3.426), 2*np.pi*np.cos(2*np.pi))
    if (0 <= x <= np.pi and 0 <= y <= f(x)) or (np.pi/2 <= x <= 3*np.pi/2 and f(x) <= y <= 0) or (3*np.pi/2 <= x <= 2*np.pi and 0 <= y <= f(x)):
        red_points_x.append(x)
        red_points_y.append(y)
    else:
        blue_points_x.append(x)
        blue_points_y.append(y)
    i += 1

area_of_rectangle = (2*np.pi)*(2*np.pi*np.cos(2*np.pi))
area = area_of_rectangle*(len(red_points_x))/N
print(area)
Output:
7.658813015245341
But that's far from 0 (the analytic solution)
Here's a visual representation of the area I am trying to plot:
Am I doing something wrong or missing something in my code? Please help, your help will be much appreciated. Thank you so much in advance.
TLDR: I believe the way you calculate the approximation is slightly wrong.
Looking at the Wikipedia definition of Monte Carlo integration, the following definition is made:
https://en.wikipedia.org/wiki/Monte_Carlo_integration#Example
V corresponds to the volume of the region the samples are drawn from. Here only x is sampled uniformly, over x = [0, 2*pi], so V is simply the length of that interval.
So Q_N is V divided by N, times the sum of the function evaluated at the randomly generated points. Hence:
total = 0
i = 0
while i < N:
    x = random.uniform(0, 2 * np.pi)
    total += f(x)
    i += 1

V = 2 * np.pi  # length of the sampling interval [0, 2*pi]
area = (V * total) / N
This code yielded an average result of 0.0603 for 1000 runs with N=1000 (to remove the influence of randomly generated values). As you increase N the accuracy increases.
You are on the right track!
A couple of pointers to put you on course...
Make your bounding box bigger in the y dimension to alleviate some of the confusing math. Yes, it will converge faster if you get it to "just touch" the max and min, but don't shoot for that yet. Heck, just make it -5 < y < 10 and you will have a nice (larger) box that covers the area you want to integrate. So change your y generation to that, and also change the area-of-the-box calculation.
Don't change x, you have it right: 0 < x < 2*pi.
When you are comparing a point to see if it is "under the curve", you do NOT need to check the x value... right? Just check whether y is between f(x) and the axis; if so, it is "red".
Also on the point above, you will need another category for the points that are BELOW the x-axis, because you will want to reduce your total by that amount (see the sketch after these pointers). An alternate "trick" is to shift your whole function up by some constant such that the entire integral is positive, and then reduce your total by the size of that added rectangle (constant * width).
Also, as you work on this, plot your points with matplotlib; the way you have your points gathered, it should be very easy to overlay scatter plots and check visually that things look accurate!
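A minimal sketch of the signed hit-or-miss scheme those pointers describe (the -5 < y < 10 box comes from the pointers; everything else mirrors the question's setup):

import random
import numpy as np

N = 100000
X_LO, X_HI = 0.0, 2 * np.pi
Y_LO, Y_HI = -5.0, 10.0   # generous bounding box in y

def f(x):
    return x * np.cos(x)

above = below = 0
for _ in range(N):
    x = random.uniform(X_LO, X_HI)
    y = random.uniform(Y_LO, Y_HI)
    fx = f(x)
    if 0 <= y <= fx:        # between the axis and the curve, above the axis
        above += 1
    elif fx <= y <= 0:      # between the curve and the axis, below the axis
        below += 1

box_area = (X_HI - X_LO) * (Y_HI - Y_LO)
print(box_area * (above - below) / N)   # hovers around the true value, 0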
Comment me back w/ further q's... you got this!

Check whether coordinates are in a certain region on a coordinate system

I have a coordinate system with a certain amount of regions, similar to this one:
The difference in my case, however, is that all regions are uniquely numbered and of the same size, and there are 16 of them (so each quadrant has 4 slices of exactly the same size).
I also have a set of tuples (two dimensional coordinates), which are all between (-1,-1) and (1,1). I'd now like to check into which region (i.e. 1 to 16) they'd land if mapped onto the coordinate system.
As a total beginner, I have no idea on how to tackle this, but here is my approach so far:
Make all the dividing lines functions and check for each point whether it lies above or below them; ignore points on the decision boundary.
For example: Quadrant 1 has four regions. From the x-axis to the y-axis (counter-clockwise) let's call them a, b, c and d.
a would be the region between the x-axis and f1(x) = 0.3333x (red)
b between f1 and f2, f2(x) = x (yellow)
c between f2 and f3, f3(x) = 3x (blue)
d between f3 and the y-axis
As code:
def a(p):
    if y > 0 and y < 0.3333*x:
        return "a"
    else:
        return b(p)

def b(p):
    if y > 0.3333*x and y < x:
        return "b"
    else:
        return c(p)

def c(p):
    if y > x and y < 3*x:
        return "c"
    else:
        return d(p)

def d(p):
    if y > 3*x and x > 0:
        return "d"
Note: for readability's sake I just wrote "x" and "y" for the tuple's respective coordinates, instead of p[0] and p[1] every time. Also, as stated above, I'm assuming there are no items directly on the dividing lines, so those are ignored.
Now, that is a possible solution, but I feel like there's almost certainly a more efficient one.
Since you're working between (-1,-1) and (1,1) coordinates and dividing the Cartesian plane equally, it is natural to use trigonometric functions. Think of the unit circle, which spans 2*pi radians: you are dividing it into n equal parts (in this case n = 16), so each slice covers (2*pi)/16 = pi/8 radians. Now imagine an arbitrary point (x, y) connected to the origin (0, 0); it forms an angle with the x-axis. To find this angle you just need the arc-tangent of y/x. Then you just need to check which angle section it falls in.
To map the angle directly to a section number you can use the bisect module:
import bisect
from math import atan2, pi

def find_section(x, y):
    # create the section boundaries (upper edge of each of the 16 slices)
    sections = [2 * pi * i / 16 for i in range(1, 17)]
    # find the angle
    angle = atan2(y, x)
    # adjust the angle for the lower half circle
    if y < 0:
        angle += 2 * pi
    # map into sections
    return bisect.bisect_left(sections, angle)
Usage:
In [1]: find_section(0.4, 0.2)
Out[1]: 1
In [2]: find_section(0.8, 0.2)
Out[2]: 0
Shapely is a Python library that can help you with typical Cartesian geometry, but as far as I know it doesn't have an easy way of extending its Line objects indefinitely based on a function.
If you're OK with that, you can check whether any Point is in any Polygon using the Polygon.contains(Point) pattern, as shown here: https://shapely.readthedocs.io/en/stable/manual.html#object.contains
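For instance, a tiny sketch of that pattern (the triangle here is a hypothetical stand-in for region "a" of quadrant 1, clipped at x = 1; a real solution would build one polygon per region):

from shapely.geometry import Point, Polygon

# Region between the x-axis and y = 0.3333*x, clipped at x = 1.
region_a = Polygon([(0, 0), (1, 0), (1, 0.3333)])

print(region_a.contains(Point(0.5, 0.1)))   # True: below the dividing line
print(region_a.contains(Point(0.5, 0.4)))   # False: above it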

Python creating density map efficiently

I was hoping for a bit of help to make my code run faster.
Basically I have a square grid of lat,long points in a list insideoceanlist. Then there is a directory containing data files of lat,long coords which represent lightning strikes for a particular day. The idea is that, for each day, we want to know how many lightning strikes there were around each point on the square grid. At the moment it is just two for loops: for every point on the square grid, you check how far away every lightning strike for that day was, and if one was within 40km you add one to that point, building up a density map.
The starting grid has the overall shape of a rectangle, made up of squares with a width of 0.11 and a length of 0.11. The entire rectangle is about 50x30. Lastly I have a shapefile which outlines the 'forecast zones' in Australia, and if any point in the grid is outside this zone then we omit it. So all the leftover points (insideoceanlist) are the ones in Australia.
There are around 100000 points on the square grid and even for a slow day there are around 1000 lightning strikes, so it takes a long time to process. Is there a way to do this more efficiently? I really appreciate any advice.
By the way, I changed list2 into list3 because I heard that iterating over lists is faster than iterating over arrays in Python.
for i in range(len(list1)):  # list1 is a list of data files containing lat,long coords for lightning strikes for each day
    dict_density = {}
    for k in insideoceanlist:  # insideoceanlist is a grid of ~100000 lat,long points
        dict_density[k] = 0
    list2 = np.loadtxt(list1[i], delimiter=",")  # opens one of the files of lat,long coords and puts it into an array
    list3 = map(list, list2)  # converts the array into a list
    # the following part is what I wanted to improve
    for j in insideoceanlist:
        for l in list3:
            if great_circle(l, j).meters < 40000:  # great_circle measures the distance between two lat,long points
                dict_density[j] += 1
    #
    filename = 'example' + str(i) + '.txt'
    with open(filename, 'w') as f:
        for m in range(len(insideoceanlist)):
            f.write('%s\n' % (dict_density[insideoceanlist[m]]))  # writes each point in the same order as insideoceanlist
To elaborate a bit on @DanGetz's answer, here is some code that uses the strike data as the driver, rather than iterating the entire grid for each strike point. I'm assuming you're centered on Australia's median point, with 0.11 degree grid squares, even though the size of a degree varies by latitude!
Some back-of-the-envelope computation with a quick reference to Wikipedia tells me that your 40km distance is a ±4 grid-square range from north to south, and a ±5 grid-square range from east to west. (It drops to 4 squares in the lower latitudes, but... meh!)
The tricks here, as mentioned, are to convert from strike position (lat/lon) to grid square in a direct, formulaic manner. Figure out the position of one corner of the grid, subtract that position from the strike's, then divide by the size of a grid square - 0.11 degrees - and truncate: you have your row/col indexes. Now visit all the surrounding squares until the distance grows too great, which is at most 1 + (2 * 2 * 4 * 5) = 81 squares to check for distance. Increment the squares within range.
The result is that I'm doing at most 81 visits times 1000 strikes (or however many you have) as opposed to visiting 100,000 grid squares times 1000 strikes. This is a significant performance gain.
Note that you don't describe your incoming data format, so I just randomly generated numbers. You'll want to fix that. ;-)
#!python3
"""
Per Wikipedia (https://en.wikipedia.org/wiki/Centre_points_of_Australia)

Median point
============
The median point was calculated as the midpoint between the extremes of
latitude and longitude of the continent.

24 degrees 15 minutes south latitude, 133 degrees 25 minutes east
longitude (24°15′S 133°25′E); position on SG53-01 Henbury 1:250 000
and 5549 James 1:100 000 scale maps.
"""
from geopy.distance import great_circle  # assuming the question's great_circle helper comes from geopy

MEDIAN_LAT = -(24.00 + 15.00/60.00)
MEDIAN_LON = (133 + 25.00/60.00)

"""
From the OP:

The starting grid has the overall shape of a rectangle, made up of
squares with width of 0.11 and length 0.11. The entire rectangle is about
50x30. Lastly I have a shapefile which outlines the 'forecast zones' in
Australia, and if any point in the grid is outside this zone then we
omit it. So all the leftover points (insideoceanlist) are the ones in
Australia.
"""
DELTA_LAT = 0.11
DELTA_LON = 0.11

GRID_WIDTH = 50.0   # degrees
GRID_HEIGHT = 30.0  # degrees

GRID_ROWS = int(GRID_HEIGHT / DELTA_LAT) + 1
GRID_COLS = int(GRID_WIDTH / DELTA_LON) + 1

LAT_SIGN = 1.0 if MEDIAN_LAT >= 0 else -1.0
LON_SIGN = 1.0 if MEDIAN_LON >= 0 else -1.0

GRID_LOW_LAT = MEDIAN_LAT - (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_HIGH_LAT = MEDIAN_LAT + (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_MIN_LAT = min(GRID_LOW_LAT, GRID_HIGH_LAT)
GRID_MAX_LAT = max(GRID_LOW_LAT, GRID_HIGH_LAT)

GRID_LOW_LON = MEDIAN_LON - (LON_SIGN * GRID_WIDTH / 2.0)
GRID_HIGH_LON = MEDIAN_LON + (LON_SIGN * GRID_WIDTH / 2.0)
GRID_MIN_LON = min(GRID_LOW_LON, GRID_HIGH_LON)
GRID_MAX_LON = max(GRID_LOW_LON, GRID_HIGH_LON)

GRID_PROXIMITY_KM = 40.0

"""https://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude"""
_Degree_sizes_km = (
    # lat, 1 deg of lat, 1 deg of lon
    (0,  110.574, 111.320),
    (15, 110.649, 107.551),
    (30, 110.852, 96.486),
    (45, 111.132, 78.847),
    (60, 111.412, 55.800),
    (75, 111.618, 28.902),
    (90, 111.694, 0.000),
)

# For the Australia situation, +/- 15 degrees means that our worst
# case scenario is about 40 degrees south. At that point, a single
# degree of longitude is smallest, with a size about 80 km. That
# in turn means a 40 km distance window will span half a degree or so.
# Since grid squares are 0.11 degrees across, we have to check +/- 5
# cols.
GRID_SEARCH_COLS = 5

# Latitude degrees are nice and constant-like at about 110 km. That means
# a 0.11 degree grid square is 12 km or so, making our search range +/- 4
# rows.
GRID_SEARCH_ROWS = 4

def make_grid(rows, cols):
    return [[0 for col in range(cols)] for row in range(rows)]

Grid = make_grid(GRID_ROWS, GRID_COLS)

def _col_to_lon(col):
    return GRID_LOW_LON + (LON_SIGN * DELTA_LON * col)

Col_to_lon = [_col_to_lon(c) for c in range(GRID_COLS)]

def _row_to_lat(row):
    return GRID_LOW_LAT + (LAT_SIGN * DELTA_LAT * row)

Row_to_lat = [_row_to_lat(r) for r in range(GRID_ROWS)]

def pos_to_grid(pos):
    lat, lon = pos
    if lat < GRID_MIN_LAT or lat >= GRID_MAX_LAT:
        print("Lat limits:", GRID_MIN_LAT, GRID_MAX_LAT)
        print("Position {} is outside grid.".format(pos))
        return None
    if lon < GRID_MIN_LON or lon >= GRID_MAX_LON:
        print("Lon limits:", GRID_MIN_LON, GRID_MAX_LON)
        print("Position {} is outside grid.".format(pos))
        return None
    row = int((lat - GRID_LOW_LAT) / DELTA_LAT)
    col = int((lon - GRID_LOW_LON) / DELTA_LON)
    return (row, col)

def visit_nearby_grid_points(pos, dist_km):
    row, col = pos_to_grid(pos)
    # +0, +0 is not symmetric - don't increment twice
    Grid[row][col] += 1
    for dr in range(1, GRID_SEARCH_ROWS):
        for dc in range(1, GRID_SEARCH_COLS):
            misses = 0
            gridpos = Row_to_lat[row+dr], Col_to_lon[col+dc]
            if great_circle(pos, gridpos).km <= dist_km:
                Grid[row+dr][col+dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row+dr], Col_to_lon[col-dc]
            if great_circle(pos, gridpos).km <= dist_km:
                Grid[row+dr][col-dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row-dr], Col_to_lon[col+dc]
            if great_circle(pos, gridpos).km <= dist_km:
                Grid[row-dr][col+dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row-dr], Col_to_lon[col-dc]
            if great_circle(pos, gridpos).km <= dist_km:
                Grid[row-dr][col-dc] += 1
            else:
                misses += 1
            if misses == 4:
                break

def get_pos_from_line(line):
    """
    FIXME: Don't know the format of your data, just random numbers
    """
    import random
    return (random.uniform(GRID_LOW_LAT, GRID_HIGH_LAT),
            random.uniform(GRID_LOW_LON, GRID_HIGH_LON))

with open("strikes.data", "r") as strikes:
    for line in strikes:
        pos = get_pos_from_line(line)
        visit_nearby_grid_points(pos, GRID_PROXIMITY_KM)
If you know the formula that generates the points on your grid, you can probably find the closest grid point to a given point quickly by reversing that formula.
Below is a motivating example that isn't quite right for your purposes, because the Earth is a sphere, not flat or cylindrical. If you can't easily reverse the grid point formula to find the closest grid point, then maybe you can do the following:
create a second grid (let's call it G2) that is a simple formula like below, with big enough boxes such that you can be confident that the closest grid point to any point in one box will either be in the same box, or in one of the 8 neighboring boxes (a code sketch of this scheme follows the list)
create a dict which stores which original grid (G1) points are in which box of the G2 grid
take the point p you're trying to classify, and find the G2 box it would go into
compare p to all the G1 points in this G2 box, and all the immediate neighbors of that box
choose the G1 point of these that's closest to p
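A sketch of that bucketing scheme under stated assumptions (g1_points, the box size, and all other names here are illustrative, not from the question):

from collections import defaultdict

BOX = 0.5   # G2 box size; must comfortably exceed the G1 point spacing

def g2_key(lat, lon):
    return (int(lat // BOX), int(lon // BOX))

# Step 2: map each G1 grid point into its G2 box.
g2 = defaultdict(list)
for gp in g1_points:   # g1_points: iterable of (lat, lon) G1 grid points
    g2[g2_key(*gp)].append(gp)

# Steps 3-5: compare p against the G1 points in its own G2 box
# and the 8 neighboring boxes, and pick the closest.
def closest_g1_point(p):
    r, c = g2_key(*p)
    candidates = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            candidates.extend(g2.get((r + dr, c + dc), []))
    return min(candidates, key=lambda gp: (gp[0] - p[0])**2 + (gp[1] - p[1])**2)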
Motivating example with a perfect flat grid
If you had a perfect square grid on a flat surface, that isn't rotated, with sides of length d, then their points can be defined by a simple mathematical formula. Their latitude values will all be of the form
lat0 + d * i
for some integer value i, where lat0 is the lowest-numbered latitude, and their longitude values will be of the same form:
long0 + d * j
for some integer j. To find what the closest grid point is for a given (lat, long) pair, you can separately find its latitude and longitude. The closest latitude number on your grid will be where
i = round((lat - lat0) / d)
and likewise j = round((long - long0) / d) for the longitude.
So one way you can go forward is to plug that into the formulas above, and get

grid_point = (lat0 + d * round((lat - lat0) / d),
              long0 + d * round((long - long0) / d))

and just increment the count in your dict at that grid point. This should make your code much, much faster than before, because instead of checking thousands of grid points for distance, you find the closest grid point directly with a couple of calculations.
You can probably make this a little faster by using the i and j numbers as indexes into a multidimensional array, instead of using grid_point in a dict.
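To make those last two suggestions concrete, a minimal sketch that rounds straight to array indexes (the grid origin, spacing, and shape below are illustrative placeholders):

import numpy as np

lat0, lon0, d = -40.0, 110.0, 0.11         # hypothetical grid origin and spacing
counts = np.zeros((300, 500), dtype=int)   # rows x cols

def nearest_grid_index(lat, lon):
    # reverse the grid formula: nearest row/col by rounding
    return round((lat - lat0) / d), round((lon - lon0) / d)

# Increment the count at the grid point closest to one strike.
i, j = nearest_grid_index(-33.87, 151.21)
if 0 <= i < counts.shape[0] and 0 <= j < counts.shape[1]:
    counts[i, j] += 1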
Have you tried using Numpy for the indexing? You can use multi-dimensional arrays, and the indexing should be faster because Numpy arrays are essentially a Python wrapper around C arrays.
If you need further speed increases, take a look at Cython, a Python-to-optimized-C converter. It is especially good for multi-dimensional indexing, and should be able to speed up this type of code by about an order of magnitude. It'll add a single additional dependency to your code, but it's a quick install and not too difficult to integrate.
(Benchmarks), (Tutorial using Numpy with Cython)
Also, as a quick aside, use

for listI in list1:
    ...
    list2 = np.loadtxt(listI, delimiter=',')

or, if that doesn't work, at least use xrange() rather than range(). Essentially, you should only ever use range() when you explicitly need the list generated by the range() function. In your case it shouldn't change much, because it is the outer-most loop.

Efficient distance calculation between N points and a reference in numpy/scipy

I just started using scipy/numpy. I have a 100000x3 array where each row is a coordinate, and a 1x3 center point. I want to calculate the distance from each row in the array to the center and store them in another array. What is the most efficient way to do it?
I would take a look at scipy.spatial.distance.cdist:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
import numpy as np
import scipy.spatial

a = np.random.normal(size=(10, 3))
b = np.random.normal(size=(1, 3))
dist = scipy.spatial.distance.cdist(a, b)  # pick the appropriate distance metric
dist for the default distance metric is equivalent to:
np.sqrt(np.sum((a-b)**2,axis=1))
although cdist is much more efficient for large arrays (on my machine for your size problem, cdist is faster by a factor of ~35x).
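If you want to check the speed difference on your own machine, a quick sketch with timeit (array sizes mirror the question; the ~35x figure will vary with hardware and scipy version):

import timeit
import numpy as np
import scipy.spatial

a = np.random.normal(size=(100000, 3))
b = np.random.normal(size=(1, 3))

t_cdist = timeit.timeit(lambda: scipy.spatial.distance.cdist(a, b), number=10)
t_manual = timeit.timeit(lambda: np.sqrt(np.sum((a - b)**2, axis=1)), number=10)
print(t_cdist, t_manual)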
I would use the sklearn implementation of the Euclidean distance. The advantage is the use of the more efficient expression based on matrix multiplication:

dist(x, y) = sqrt(np.dot(x, x) - 2 * np.dot(x, y) + np.dot(y, y))

A simple script would look like this:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)
dist = euclidean_distances(x, y)  # (1000, 1000) matrix of pairwise distances
The advantage of this approach has been nicely described in the sklearn documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances
I am using this approach to crunch large data matrices (10000, 10000) with some minor modifications, like using the np.einsum function.
You can also use the expansion of the norm (a well-known identity). This is probably the most efficient way to compute the distances for an entire matrix of points.
Here is a code snippet that I originally used for a k-Nearest-Neighbors implementation, in Octave, but you can easily adapt it to numpy since it only uses matrix multiplications (the equivalent is numpy.dot()):
% Computing the euclidean distance between each known point (Xapp) and unknown points (Xtest)
% Note: we expand the norm using the well-known identity:
% ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2*<x1,x2>
[napp, d] = size(Xapp);
[ntest, d] = size(Xtest);
A = sum(Xapp.^2, 2);
A = repmat(A, 1, ntest);
B = sum(Xtest.^2, 2);
B = repmat(B', napp, 1);
C = Xapp*Xtest';
dist = A+B-2.*C;
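A direct numpy translation of that Octave snippet might look like this (a sketch; broadcasting replaces the repmat calls, and the result holds squared distances, so apply np.sqrt if you need the distances themselves):

import numpy as np

def pairwise_sq_dists(Xapp, Xtest):
    # ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2*<x1, x2>
    A = np.sum(Xapp**2, axis=1)[:, None]    # (napp, 1) column of squared norms
    B = np.sum(Xtest**2, axis=1)[None, :]   # (1, ntest) row of squared norms
    C = Xapp @ Xtest.T                      # (napp, ntest) inner products
    return A + B - 2 * C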
This might not answer your question directly, but if you are after the distances for all pairs of particles, I've found the following solution to be faster than the pdist function in some cases.
import numpy as np

L = 100    # simulation box dimension
N = 100    # number of particles
dim = 2    # dimensions

# Generate random positions of particles
r = (np.random.random(size=(N, dim)) - 0.5) * L

# uti is a tuple of two (1-D) numpy arrays
# containing the indices of the upper triangular matrix
uti = np.triu_indices(N, k=1)  # k=1 eliminates diagonal indices

# uti[0] is i, and uti[1] is j from the previous example
dr = r[uti[0]] - r[uti[1]]           # differences between particle positions
D = np.sqrt(np.sum(dr*dr, axis=1))   # distances; D has length N*(N-1)/2 = 4950
See my blog post for a more in-depth look at this.
You may need to specify the distance function you are interested in more precisely, but here is a very simple (and efficient) implementation of the squared Euclidean distance based on the inner product (which can obviously be generalized, in a straightforward manner, to other kinds of distance measures):
In []: P, c= randn(5, 3), randn(1, 3)
In []: dot(((P- c)** 2), ones(3))
Out[]: array([ 8.80512, 4.61693, 2.6002, 3.3293, 12.41800])
Where P are your points and c is the center.
# Is it true, to find the biggest distance between the points on a surface?
from math import sqrt

n = int(input("enter the range : "))
x = list(map(float, input("type x coordinates: ").split()))
y = list(map(float, input("type y coordinates: ").split()))
maxdis = 0
for i in range(n):
    for j in range(n):
        print(i, j, x[i], x[j], y[i], y[j])
        dist = sqrt((x[j] - x[i])**2 + (y[j] - y[i])**2)
        if maxdis < dist:
            maxdis = dist
print(" maximum distance is : {:5g}".format(maxdis))
