Reading in Z dimension in KD-Tree - python

I have been playing around for months with how best to write a program that analyzes multiple tables for similarities in geographical coordinates. I have tried everything from nested for-loops to, currently, a KD-Tree, which seems to be working great. However, I am not sure it is functioning properly when reading in my third dimension, which in this case is Z.
import numpy
from scipy import spatial
import math as ma
def d(a,b):
    d = ma.acos(ma.sin(ma.radians(a[1]))*ma.sin(ma.radians(b[1]))
                +ma.cos(ma.radians(a[1]))*ma.cos(ma.radians(b[1]))*(ma.cos(ma.radians((a[0]-b[0])))))
    return d
filename1 = "A"
pos1 = numpy.genfromtxt(filename1,
                        skip_header=1,
                        usecols=(1, 2))
z1 = numpy.genfromtxt(filename1,
                      skip_header=1,
                      usecols=(3))
filename2 = "B"
pos2 = numpy.genfromtxt(filename2,
                        #skip_header=1,
                        usecols=(0, 1))
z2 = numpy.genfromtxt(filename2,
                      #skip_header=1,
                      usecols=(2))
filename1 = "A"
data1 = numpy.genfromtxt(filename1,
                         skip_header=1)
                         #usecols=(0, 1))
filename2 = "B"
data2 = numpy.genfromtxt(filename2,
                         skip_header=1)
                         #usecols=(0, 1)
tree1 = spatial.KDTree(pos1)
match = tree1.query(pos2)
#print(match)
indices_pos1, indices_pos2 = [], []
for idx_pos1 in range(len(pos1)):
    # find indices in pos2 that match this position (idx_pos1)
    matching_indices_pos2 = numpy.where(match[1]==idx_pos1)[0]
    for idx_pos2 in matching_indices_pos2:
        # distance in sph coo
        distance = d(pos1[idx_pos1], pos2[idx_pos2])
        if distance < 0.01 and z1[idx_pos1]-z2[idx_pos2] > 0.001:
            print(pos1[idx_pos1], pos2[idx_pos2], z1[idx_pos1], z2[idx_pos2], distance)
As you can see, I am first calculating the (x,y) position as a single unit measured in spherical coordinates, and each element in file1 is compared to each element in file2. The problem lies somewhere in the Z dimension, but I can't seem to crack it. When the results are printed out, the Z coordinates are often nowhere near each other; it seems as if my program is entirely ignoring the second part of the and statement. Below are some of my results, which show that the z-values are in fact very far apart.
[ 358.98787832 -3.87297365] [ 358.98667162 -3.82408566] 0.694282 0.5310796 0.000853515096105
[ 358.98787832 -3.87297365] [ 359.00303872 -3.8962745 ] 0.694282 0.5132215 0.000484847441066
[ 358.98787832 -3.87297365] [ 358.99624509 -3.84617685] 0.694282 0.5128636 0.000489860962243
[ 359.0065807 -8.81507801] [ 358.99226267 -8.8451829 ] 0.6865379 0.6675241 0.000580562641945
[ 359.0292886 9.31398903] [ 358.99296163 9.28436493] 0.68445694 0.45485374 0.000811677349685
How the output is structured: [position1 (x,y)] [position2 (x,y)] [Z1] [Z2] distance
As you can see, especially in the last example, the Z-coordinates are separated by about 0.23, which is way over the 0.001 restriction I typed above.
Any insights you could share would be really wonderful!

As for your original problem, you have a simple sign and comparison problem. You test if z1-z2 > 0.001, but you probably wanted abs(z1-z2) < 0.001 (notice the < instead of the >).
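A minimal vectorized sketch of the corrected test, assuming pos1/pos2 are (N, 2) NumPy arrays of (lon, lat) in degrees and z1/z2 are 1-D arrays, as in the question. It keeps the question's nearest-neighbour query (on plain lon/lat) and its spherical-distance formula, and only changes the z condition to abs(...) < tolerance. The names angular_distance and matching_pairs are hypothetical.

```python
import numpy as np
from scipy import spatial


def angular_distance(a, b):
    """Great-circle angle in radians between (lon, lat) rows given in degrees."""
    lon1, lat1 = np.radians(a[:, 0]), np.radians(a[:, 1])
    lon2, lat2 = np.radians(b[:, 0]), np.radians(b[:, 1])
    cos_d = (np.sin(lat1) * np.sin(lat2)
             + np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2))
    # clip guards arccos against values slightly outside [-1, 1] from rounding
    return np.arccos(np.clip(cos_d, -1.0, 1.0))


def matching_pairs(pos1, z1, pos2, z2, max_dist=0.01, max_dz=0.001):
    """(i, j) pairs where pos1[i] is the nearest neighbour of pos2[j] and both
    the angular distance and the absolute z difference are small."""
    _, idx1 = spatial.cKDTree(pos1).query(pos2)  # nearest pos1 row for each pos2 row
    dist = angular_distance(pos1[idx1], pos2)
    keep = (dist < max_dist) & (np.abs(z1[idx1] - z2) < max_dz)  # abs(...) < tol
    j = np.nonzero(keep)[0]
    return list(zip(idx1[j].tolist(), j.tolist()))
```

Iterating over pos2 and looking up its nearest pos1 row visits the same pairs as the question's double loop, just without the numpy.where search for every pos1 index.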
You could also have the tree take the z coordinate into account, but then you need to give it the data as (x,y,z), not only (x,y).
If it doesn't know the z value, it cannot use it.
It should be possible (although the sklearn API might not allow this) to query the tree directly for a window, where you bound the coordinate range and the z range independently. Think of a box that has different extents in x, y, and z. But because z has a different value range, combining these different scales is difficult.
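One way to get such a box query out of SciPy, sketched under the assumption that a planar (lon, lat) box is acceptable (it ignores the wrap-around issue described next): stretch z by xy_tol / z_tol so that both tolerances become the same number, then call cKDTree.query_ball_point with p=np.inf (the max norm), whose "ball" is an axis-aligned cube. box_matches, xy_tol and z_tol are hypothetical names.

```python
import numpy as np
from scipy.spatial import cKDTree


def box_matches(pos1, z1, pos2, z2, xy_tol, z_tol):
    """(i, j) pairs with |dx|, |dy| <= xy_tol and |dz| <= z_tol."""
    scale = xy_tol / z_tol  # after scaling, a z gap of z_tol becomes xy_tol
    pts1 = np.column_stack((pos1, np.asarray(z1) * scale))
    pts2 = np.column_stack((pos2, np.asarray(z2) * scale))
    tree = cKDTree(pts1)
    # p=np.inf: Chebyshev distance, so the search region is a cube of half-width xy_tol
    hits = tree.query_ball_point(pts2, r=xy_tol, p=np.inf)
    return [(i, j) for j, found in enumerate(hits) for i in sorted(found)]
```

The rescaling is what makes the "different scales" problem go away: after it, a single radius describes a box that is xy_tol wide in lon/lat and z_tol wide in z.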
Beware that the k-d tree does not know about spherical coordinates. A point at +180 degrees and one at -180 degrees (or one at 0 and one at 360) are very far apart for the k-d tree, but very close in spherical distance. So it will miss some points!
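A common workaround, sketched here as an assumption rather than something the answer proposes: convert each (lon, lat) pair to a 3-D unit vector before building the tree. Points at 0 and 360 degrees then map to the same vector, and the Euclidean (chord) distance returned by the tree converts back to a great-circle angle via angle = 2 * arcsin(chord / 2). The function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def lonlat_to_xyz(lonlat_deg):
    """(N, 2) array of (lon, lat) in degrees -> (N, 3) unit vectors."""
    lon = np.radians(lonlat_deg[:, 0])
    lat = np.radians(lonlat_deg[:, 1])
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def nearest_on_sphere(pos1, pos2):
    """For each row of pos2: great-circle angle (radians) to, and index of,
    the nearest row of pos1. Longitude wrap-around is handled automatically."""
    tree = cKDTree(lonlat_to_xyz(pos1))
    chord, idx = tree.query(lonlat_to_xyz(pos2))
    angle = 2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))  # chord -> arc angle
    return angle, idx
```

For example, a point at longitude 0.1 is matched to one at 359.9 (0.2 degrees away) rather than to one at 180.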

Related

Binning velocity data then averaging it

So I am running a simulation where particles interact with each other and the walls. Here is the snippet that writes the particle data (timestep number, velocity-x, velocity-y, velocity-z, position-x, position-y, position-z) to a separate file for each particle, sampled every 1000 time steps over a large number of time steps. Right now I have 15 particles, but in the future there will be more.
N_max = sim.getNumTimeSteps()
particleData = [ [] for x in range(len(sim.getParticleList()))]
for n in range (N_max):
    sim.runTimeStep()
    if (n%1000==0):
        particles = sim.getParticleList()
        for i in range(len(sim.getParticleList())):
            x, y, z = particles[i].getVelocity()
            x2, y2, z2 = particles[i].getPosition()
            particleData[i].append( (n, x, y, z, x2, y2, z2) )
for i in range(len(sim.getParticleList())):
    with open("{0:d}.dat".format(i), "w") as f:
        for j in particleData[i]:
            f.write("%f : %f,%f,%f : %f,%f,%f \n" % (j[0], j[1], j[2], j[3], j[4], j[5], j[6]))
sim.exit()
In my simulation, the top wall is fixed and the bottom is sheared (moving). I am interested in dividing my simulation into strips based on y-position. So if it is 10 units in the y direction, I want to split it into 10 strips of 1-width. I am trying to collect the speeds of particles throughout these strips (to compare speeds depending on proximity to which wall), which I will then later average and graph with matplotlib.
I am very new to Python, so someone very good at it recommended I use binning, i.e. for each time step, after reading the particle position and velocity, I should check where that particle's y-position is. How do I bin like that, adding each particle to a list of particles for its bin? They also recommended storing the average information in another array. I've Googled plenty on binning, but I'm overwhelmed by all the things that NumPy and SciPy can do, so these complicated/advanced examples are lost on me. Is this the best way to go about it? Does this all make sense?!
This is as far as I've gotten with reading the particle's data...
for i in range(10):
    with open("{}.dat".format(i),'r') as csvfile:
        data = csv.reader(csvfile, delimiter=',')
        y2 = []
        for row in data:
            y2.append(float(row[5]))
Then I'm assuming the binning happens by checking whether y2 falls between certain values, something like if (n / 10) <= y2 <= ((n+1) / 10):?
Here is an example of the dat files:
0.0 : 0.999900,-0.999900,0.0 : -6.999000,-7.001000,0.0
1000.0 : -1.617575,-0.927360,0.0 : -6.032388,-9.007120,0.0
2000.0 : -1.019145,-0.939388,0.0 : -3.059924,-9.008897,0.0
3000.0 : 0.654350,-0.560711,0.0 : -4.575242,-9.242543,0.0
4000.0 : 0.592084,0.509928,0.0 : -3.952575,-9.275643,0.0
5000.0 : 2.288733,0.0,0.0 : -3.038456,-10.0,0.0
etc., until the end of the simulation at n=20000
Each file belongs to an individual particle, so it shows that particle's movement and speed across the timesteps.
I am simulating 15 particles so I have 15 files.
For the strips, I want all the particles that are in that strip at any time.
I will average those numbers later.
If the simulation's domain is 10x10, the particles are anywhere between y=0 and y=10.
Here is a solution that doesn't use Pandas, NumPy, or SciPy. If processing time becomes annoying at some point in the future, you could delve into those libraries, though there is a learning curve. Pandas in particular has other advantages (subsequent analysis might be easier with it), but you can probably do all of the analysis without it.
For the strips, I want all the particles that are in that strip at any time.
You will need to identify each data point after you have lumped them all together. For simplicity I've used a namedtuple to make an object of each data point.
import csv
from collections import namedtuple
Particle = namedtuple('Particle',('name','t','x','y','z','x2','y2','z2'))
Often, choosing the correct container for your data is important: you have to figure that out early, and it affects the mechanics of the processing later. Again, I've opted for simplicity with no thought of how it will be used later: a dictionary with a key/value pair for each strip. Converting the y position to an integer easily categorizes it. Note that int() truncates toward zero, so for the negative positions in the example data each key is the strip edge closer to zero (key -6 holds -7 < y2 <= -6), and a particle sitting exactly on the wall at y2 = -10.0 needs a -10 key.
# positions in example data are all negative; keys run from 0 down to -10
bins = {n: [] for n in range(0, -11, -1)}
Use the csv module to read all the files; make Particles; put them in bins.
for name in range(3):
    with open(f'{name}.dat') as f:
        reader = csv.reader(f,delimiter=':')
        # example row
        # 0.0 : 0.999900,-0.999900,0.0 : -6.999000,-7.001000,0.0
        for t,vel,pos in reader:
            t = float(t)
            x,y,z = map(float, vel.split(','))
            x2,y2,z2 = map(float, pos.split(','))
            p = Particle(name,t,x,y,z,x2,y2,z2)
            y = int(p.y2)
            #print(f'{y}:{p}')
            bins[y].append(p)
Partial bins made from some random data.
{-9: [Particle(name=1, t=1000.0, x=1.09185, y=2.13655, z=-1.96046, x2=-8.74504, y2=-9.89888, z2=-9.49985),...],
-8: [Particle(name=0, t=5000.0, x=1.2371, y=1.10508, z=-0.9939, x2=-9.47672, y2=-8.90004, z2=-8.06145),
Particle(name=2, t=7000.0, x=-0.82952, y=0.14332, z=-0.3446, x2=-2.76384, y2=-8.14855, z2=-7.2325)],
-7: [...,Particle(name=2, t=12000.0, x=1.06694, y=0.02654, z=-2.93894, x2=-8.62668, y2=-7.93497, z2=-6.18243)],
-6: [Particle(name=0, t=3000.0, x=0.01791, y=-2.67168, z=-1.39907, x2=-6.00256, y2=-6.64951, z2=-6.35569),...,
Particle(name=2, t=18000.0, x=2.41593, y=-2.27558, z=-1.1414, x2=-6.90592, y2=-6.42374, z2=-9.67672)],
-5: [...],
-4: [...],
...}
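The question also asks about averaging per strip. A minimal sketch, assuming the bins dictionary of Particle tuples built above and taking "speed" to mean the magnitude of the (x, y, z) velocity; mean_speed_per_bin is a hypothetical helper. If only the shear-direction velocity matters, average p.x instead.

```python
import math
from collections import namedtuple

Particle = namedtuple('Particle', ('name', 't', 'x', 'y', 'z', 'x2', 'y2', 'z2'))


def mean_speed_per_bin(bins):
    """Map each strip key to the mean |v| of the particles in it (None if empty)."""
    averages = {}
    for key, particles in bins.items():
        speeds = [math.sqrt(p.x**2 + p.y**2 + p.z**2) for p in particles]
        averages[key] = sum(speeds) / len(speeds) if speeds else None
    return averages
```

The resulting dictionary maps each strip to one number, which is easy to hand to matplotlib as two lists (sorted keys and their averages).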
Random data maker.
import numpy as np
def make_data(q=3):
    for n in range(q):
        data = np.random.random((21,6))
        np.add(data, [-.5,-.5,-.5,0,0,0], out=data)
        np.multiply(data,[6,6,6,-10,-10,-10],out=data)
        np.round(data, 5, out=data)
        t = np.linspace(0,20000,21)
        data = np.hstack((t[:,None],data))
        # write rows in the same "t : vx,vy,vz : x,y,z" layout the reader above expects
        with open(f'{n}.dat', 'w') as f:
            for row in data.tolist():
                f.write('{} : {},{},{} : {},{},{}\n'.format(*row))
If you want finer strips in the future, say hundredths of a unit, just multiply by that factor:
>>> factor = 100
>>> y2 = -1.20513
>>> int(y2*factor)
-120
>>> d = {n:[] for n in range(0,-10*factor,-1)}
>>> d[int(y2*factor)].append(str(y2))
>>> d[-120]
['-1.20513']
>>>
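If you would rather have every key identify the strip by its lower edge (for negative as well as positive positions), math.floor can replace int(), and a collections.defaultdict avoids pre-creating every key. A small sketch; strip_key and its width parameter are hypothetical names, and the key k stands for the strip [k*width, (k+1)*width).

```python
import math
from collections import defaultdict


def strip_key(y2, width=1.0):
    """Index k of the strip [k*width, (k+1)*width) that contains y2."""
    return math.floor(y2 / width)


bins = defaultdict(list)  # keys are created on first use
for y2 in (-6.5, -6.0, -0.3, 0.3, -10.0):
    bins[strip_key(y2)].append(y2)
```

With fractional widths, a value lying exactly on a strip edge can land in the neighbouring strip because of floating-point rounding (for example, 0.3 / 0.1 evaluates to slightly less than 3).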

Counterclockwise sorting of x, y data

I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that by using the polar coordinates: I calculate the polar angle of each point (x,y) then sort by the ascending angles, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately, I don't get the expected results in random_shape_formatted.dat! The points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected results:
Create a new file named filename_formatted.dat that contains the data sorted as in the image above (the first line contains the starting point, and the following lines contain the points in the counterclockwise direction shown by the blue arrows in the image).
EDIT 2: The xy data is added here instead of in a GitHub gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108
I find that an easy way to sort points with x,y-coordinates like that is to sort them by the angle between the horizontal and the line from the center of mass of the whole polygon to each point, which is called alpha in the example. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle, for instance with numpy.arccos. When y-y0 is larger than 0 you take the angle directly; otherwise you subtract it from 360° (2π). I have used numpy.where to calculate the angle and then numpy.argsort to produce a mask for indexing the initial x,y-values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from another point, you could add an offset angle; in your case that would be zero, though.
import numpy as np
def sort_xy(x, y):
    x0 = np.mean(x)
    y0 = np.mean(y)
    r = np.sqrt((x-x0)**2 + (y-y0)**2)
    angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
    mask = np.argsort(angles)
    x_sorted = x[mask]
    y_sorted = y[mask]
    return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (the points are obviously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:
If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
dist2 = lambda a,b: (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y)) # get the list of coordinate pairs
z.sort() # sort by x coordinate
cw = z[0:1] # first point in clockwise direction
ccw = z[1:2] # first point in counter clockwise direction
# reverse the above assignment depending on how first 2 points relate
if z[1][1] > z[0][1]:
    cw = z[1:2]
    ccw = z[0:1]
for p in z[2:]:
    # append to the list to which the next point is closest
    if dist2(cw[-1], p) < dist2(ccw[-1], p):
        cw.append(p)
    else:
        ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y-coordinate, for which an angle-look-around from some central point would fail, like here:
No assumption is made about the range of the X nor of the Y coordinate: like for instance, the curve does not necessarily have to cross the X axis (Y = 0) for this to work.
Counter-clock-wise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]
Not really a Python question, I think, but you could still try sorting by -sign(y) * x, doing something like:
def counter_clockwise_sort(points):
    return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
should work fine, assuming you read your points properly into a list of dicts of format {'x': 0.12312, 'y': 0.912}
EDIT: This will work as long as the curve crosses the X axis only twice, like in your example.
If:
the shape is arbitrarily complex and
the point spacing is ~random
then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver, in particular the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order for the points, I don't think it will be too hard to decide between a clockwise and a counterclockwise ordering.
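Deciding the orientation once an order is known can be done with the signed area from the shoelace formula: it is positive when the vertices run counterclockwise and negative when they run clockwise, assuming the ordered points form a simple (non-self-intersecting) polygon. A minimal sketch; signed_area and make_counterclockwise are hypothetical names.

```python
import numpy as np


def signed_area(x, y):
    """Shoelace formula; > 0 for counterclockwise vertex order, < 0 for clockwise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * (np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def make_counterclockwise(x, y):
    """Return the ordered points reversed if they currently run clockwise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if signed_area(x, y) < 0:
        return x[::-1], y[::-1]
    return x, y
```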
If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y), -(x-pivot_x)) (negating both components rotates every angle by 180°, which moves the start and end of the ordering to the right of the pivot).
We get the following, with a gradient-colored scatter to show that the order is correct (FYI, I removed the first line of the .dat file after downloading it):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
#oneliner for ordering points (transform, adjust for 0 to 2pi, argsort, index at points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the colormap 1 is white and 0 is black), they're numbered in the 0-1 range by order:
For points with comparable distances to their neighbouring points, we can use a KDTree to get the two closest points for each point, then draw lines connecting them to get a closed shape contour. Then we use OpenCV's findContours to get the contour traced, always in a counterclockwise manner. Since OpenCV works on images, we need to sample the data from the provided float format into a uint8 image. Given comparable distances between points, that should be pretty safe. OpenCV also traces even sharp corners in the curvature, so both smooth and non-smooth data work just fine. There's no pivot requirement either, so all kinds of shapes should be fine to work with.
Here's the implementation -
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage import binary_fill_holes
def counter_clockwise_order(a, DEBUG_PLOT=False):
    b = a-a.min(0)
    d = pdist(b).min()
    c = np.round(2*b/d).astype(int)
    img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
    d1,d2 = cKDTree(c).query(c,k=3)
    b = c[d2]
    p1,p2,p3 = b[:,0],b[:,1],b[:,2]
    for i in range(len(b)):
        # cv2.line expects the end points as tuples of plain Python ints
        cv2.line(img,tuple(map(int, p1[i])),tuple(map(int, p2[i])),255,1)
        cv2.line(img,tuple(map(int, p1[i])),tuple(map(int, p3[i])),255,1)
    img = (binary_fill_holes(img==255)*255).astype(np.uint8)
    # OpenCV 3.x returns (image, contours, hierarchy); 2.x and 4.x return (contours, hierarchy)
    contours = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[-2]
    cont = contours[0][:,0]
    f1,f2 = cKDTree(cont).query(c,k=1)
    ordered_points = a[f2.argsort()[::-1]]
    if DEBUG_PLOT==1:
        NPOINTS = len(ordered_points)
        for i in range(NPOINTS):
            plt.plot(ordered_points[i:i+2,0],ordered_points[i:i+2,1],alpha=float(i)/(NPOINTS-1),color='k')
        plt.show()
    return ordered_points
Sample run -
# Load data in a 2D array with 2 columns
a = np.loadtxt('random_shape.csv',delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
Output -

nearest intersection point to many lines in python

I need a good algorithm for calculating the point that is closest to a collection of lines in Python, preferably using least squares. I found this post on a Python implementation that doesn't work:
Finding the centre of multiple lines using least squares approach in Python
And I found this resource in Matlab that everyone seems to like... but I'm not sure how to convert it to python:
https://www.mathworks.com/matlabcentral/fileexchange/37192-intersection-point-of-lines-in-3d-space
I find it hard to believe that someone hasn't already done this... surely this is part of numpy or a standard package, right? I'm probably just not searching for the right terms - but I haven't been able to find it yet. I'd be fine with defining lines by two points each or by a point and a direction. Any help would be greatly appreciated!
Here's an example set of points that I'm working with:
initial XYZ points for the first set of lines
array([[-7.07107037, 7.07106748, 1. ],
[-7.34818339, 6.78264559, 1. ],
[-7.61352972, 6.48335745, 1. ],
[-7.8667115 , 6.17372055, 1. ],
[-8.1072994 , 5.85420065, 1. ]])
the angles that belong to the first set of lines
[-44.504854, -42.029223, -41.278573, -37.145774, -34.097022]
initial XYZ points for the second set of lines
array([[ 0., -20. , 1. ],
[ 7.99789129e-01, -19.9839984, 1. ],
[ 1.59830153e+00, -19.9360366, 1. ],
[ 2.39423914e+00, -19.8561769, 1. ],
[ 3.18637019e+00, -19.7445510, 1. ]])
the angles that belong to the second set of lines
[89.13244, 92.39087, 94.86425, 98.91849, 99.83488]
The solution should be the origin or very near it (the data is just a little noisy, which is why the lines don't perfectly intersect at a single point).
Here's a numpy solution using the method described in this link
import numpy as np
def intersect(P0,P1):
    """P0 and P1 are NxD arrays defining N lines.
    D is the dimension of the space. This function
    returns the least squares intersection of the N
    lines from the system given by eq. 13 in
    http://cal.cs.illinois.edu/~johannes/research/LS_line_intersect.pdf.
    """
    # generate all line direction vectors
    n = (P1-P0)/np.linalg.norm(P1-P0,axis=1)[:,np.newaxis] # normalized
    # generate the array of all projectors
    projs = np.eye(n.shape[1]) - n[:,:,np.newaxis]*n[:,np.newaxis] # I - n*n.T
    # see fig. 1
    # generate R matrix and q vector
    R = projs.sum(axis=0)
    q = (projs @ P0[:,:,np.newaxis]).sum(axis=0)
    # solve the least squares problem for the
    # intersection point p: Rp = q
    p = np.linalg.lstsq(R,q,rcond=None)[0]
    return p
Works
Edit: here is a generator for noisy test data
n = 6
P0 = np.stack([np.array([5,5])+3*np.random.random(size=2) for i in range(n)])
a = np.linspace(0,2*np.pi,n)+np.random.random(size=n)*np.pi/5.0
P1 = np.array([5+5*np.sin(a),5+5*np.cos(a)]).T
If this wikipedia equation carries any weight:
then you can use:
import numpy as np
def nearest_intersection(points, dirs):
    """
    :param points: (N, 3) array of points on the lines
    :param dirs: (N, 3) array of unit direction vectors
    :returns: (3,) array of intersection point
    """
    dirs_mat = dirs[:, :, np.newaxis] @ dirs[:, np.newaxis, :]  # (N, 3, 3) outer products v v^T
    points_mat = points[:, :, np.newaxis]
    I = np.eye(3)
    return np.linalg.lstsq(
        (I - dirs_mat).sum(axis=0),
        ((I - dirs_mat) @ points_mat).sum(axis=0),
        rcond=None
    )[0].ravel()
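For reference, this is the system the code above solves, written with a_j a point on line j and \hat{v}_j its unit direction; p is the least-squares point nearest to all N lines. The left-hand matrix is (I - dirs_mat).sum(axis=0) and the right-hand side is ((I - dirs_mat) @ points_mat).sum(axis=0):

```latex
\left(\sum_{j=1}^{N} \left(I - \hat{v}_j \hat{v}_j^{\mathsf T}\right)\right) p
  = \sum_{j=1}^{N} \left(I - \hat{v}_j \hat{v}_j^{\mathsf T}\right) a_j
```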
If you want help deriving / checking that equation from first principles, then math.stackexchange.com would be a better place to ask.
surely this is part of numpy
Note that numpy gives you enough tools to express this very concisely already
Here's the final code that I ended up using. Thanks to kevinkayaks and everyone else who responded! Your help is very much appreciated!!!
The first half of this function simply converts the two collections of points and angles to direction vectors. I believe the rest of it is basically the same as what Eric and Eugene proposed. I just happened to have success first with Kevin's and ran with it until it was an end-to-end solution for me.
import numpy as np
def LS_intersect(p0,a0,p1,a1):
    """
    :param p0 : Nx2 (x,y) position coordinates (extra columns such as z are ignored)
    :param p1 : Nx2 (x,y) position coordinates (extra columns such as z are ignored)
    :param a0 : angles in degrees for each point in p0
    :param a1 : angles in degrees for each point in p1
    :return: least squares intersection point of N lines from eq. 13 in
    http://cal.cs.illinois.edu/~johannes/research/LS_line_intersect.pdf
    """
    ang = np.concatenate( (a0,a1) ) # create list of angles
    # create direction vectors with magnitude = 1
    n = []
    for a in ang:
        n.append([np.cos(np.radians(a)), np.sin(np.radians(a))])
    pos = np.concatenate((p0[:,0:2],p1[:,0:2])) # create list of points
    n = np.array(n)
    # generate the array of all projectors
    nnT = np.array([np.outer(nn,nn) for nn in n ])
    ImnnT = np.eye(len(pos[0]))-nnT # orthocomplement projectors to n
    # now generate R matrix and q vector
    R = np.sum(ImnnT,axis=0)
    q = np.sum(np.array([np.dot(m,x) for m,x in zip(ImnnT,pos)]),axis=0)
    # and solve the least squares problem for the intersection point p
    return np.linalg.lstsq(R,q,rcond=None)[0]
#sample data
pa = np.array([[-7.07106638, 7.07106145, 1. ],
[-7.34817263, 6.78264524, 1. ],
[-7.61354115, 6.48336347, 1. ],
[-7.86671133, 6.17371816, 1. ],
[-8.10730426, 5.85419995, 1. ]])
paa = [-44.504854321138524, -42.02922380123842, -41.27857390748773, -37.145774853341386, -34.097022454778674]
pb = np.array([[-8.98220431e-07, -1.99999962e+01, 1.00000000e+00],
[ 7.99789129e-01, -1.99839984e+01, 1.00000000e+00],
[ 1.59830153e+00, -1.99360366e+01, 1.00000000e+00],
[ 2.39423914e+00, -1.98561769e+01, 1.00000000e+00],
[ 3.18637019e+00, -1.97445510e+01, 1.00000000e+00]])
pba = [88.71923357743934, 92.55801427272372, 95.3038321024299, 96.50212060095349, 100.24177145619092]
print("Should return (-0.03211692, 0.14173216)")
solution = LS_intersect(pa,paa,pb,pba)
print(solution)
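The loops in LS_intersect can also be written with broadcasting. A sketch of an equivalent computation, assuming (as LS_intersect does) that angles are in degrees measured from the +x axis and that only the first two columns of the point arrays are used; ls_intersect_vectorized is a hypothetical name.

```python
import numpy as np


def ls_intersect_vectorized(points, angles_deg):
    """Least-squares point closest to 2-D lines, each given by a point and an angle.

    points: (N, >=2) array, only x and y are used; angles_deg: (N,) angles in degrees.
    """
    pos = np.asarray(points, dtype=float)[:, :2]
    ang = np.radians(np.asarray(angles_deg, dtype=float))
    n = np.column_stack((np.cos(ang), np.sin(ang)))  # unit direction vectors
    projs = np.eye(2) - n[:, :, np.newaxis] * n[:, np.newaxis, :]  # I - n n^T per line
    R = projs.sum(axis=0)
    q = (projs @ pos[:, :, np.newaxis]).sum(axis=0)
    return np.linalg.lstsq(R, q, rcond=None)[0].ravel()
```

With the sample data above, ls_intersect_vectorized(np.concatenate((pa, pb)), np.concatenate((paa, pba))) should give the same point as LS_intersect(pa, paa, pb, pba).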

How to interpolate a line between two other lines in python

Note: I asked this question before, but it was closed as a duplicate. However, several others and I believe it was unduly closed; I explain why in an edit to my original post. So I would like to re-ask the question here.
Does anyone know of a Python library that can interpolate between two lines? For example, given the two solid lines below, I would like to produce the dashed line in the middle. In other words, I'd like to get the centreline. The input is just two numpy arrays of coordinates with sizes N x 2 and M x 2 respectively.
Furthermore, I'd like to know if someone has written a function for this in some optimized Python library, although optimization isn't strictly necessary.
Here is an example of two lines that I might have, you can assume they do not overlap with each other and an x/y can have multiple y/x coordinates.
array([[ 1233.87375018, 1230.07095987],
[ 1237.63559365, 1253.90749041],
[ 1240.87500801, 1264.43925132],
[ 1245.30875975, 1274.63795396],
[ 1256.1449357 , 1294.48254424],
[ 1264.33600095, 1304.47893299],
[ 1273.38192911, 1313.71468591],
[ 1283.12411536, 1322.35942538],
[ 1293.2559388 , 1330.55873344],
[ 1309.4817002 , 1342.53074698],
[ 1325.7074616 , 1354.50276051],
[ 1341.93322301, 1366.47477405],
[ 1358.15898441, 1378.44678759],
[ 1394.38474581, 1390.41880113]])
array([[ 1152.27115094, 1281.52899302],
[ 1155.53345506, 1295.30515742],
[ 1163.56506781, 1318.41642169],
[ 1168.03497425, 1330.03181319],
[ 1173.26135672, 1341.30559949],
[ 1184.07110925, 1356.54121651],
[ 1194.88086178, 1371.77683353],
[ 1202.58908737, 1381.41765447],
[ 1210.72465255, 1390.65097106],
[ 1227.81309742, 1403.2904646 ],
[ 1244.90154229, 1415.92995815],
[ 1261.98998716, 1428.56945169],
[ 1275.89219696, 1438.21626352],
[ 1289.79440676, 1447.86307535],
[ 1303.69661656, 1457.50988719],
[ 1323.80994319, 1470.41028655],
[ 1343.92326983, 1488.31068591],
[ 1354.31738934, 1499.33260989],
[ 1374.48879779, 1516.93734053],
[ 1394.66020624, 1534.54207116]])
Visualizing this we have:
So my attempt at this has been using the skeletonize function in the skimage.morphology library by first rasterizing the coordinates into a filled in polygon. However, I get branching at the ends like this:
First of all, pardon the overkill; I had fun with your question. If the description is too long, feel free to skip to the bottom, where I define a function that does everything I describe.
Your problem would be relatively straightforward if your arrays were the same length. In that case, all you would have to do is find the average between the corresponding x values in each array, and the corresponding y values in each array.
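For the equal-length case, that averaging is a one-liner in numpy. A minimal sketch, assuming both inputs are (N, 2) arrays whose rows already correspond point by point and run in the same direction (`midline_equal_length` is a hypothetical name):

```python
import numpy as np

def midline_equal_length(a1, a2):
    """Average two (N, 2) arrays of corresponding points, row by row."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    if a1.shape != a2.shape:
        raise ValueError("arrays must have the same shape")
    # Element-wise mean: x is averaged with x, y with y
    return (a1 + a2) / 2.0

# Two horizontal segments at y = 0 and y = 2; the midline lies at y = 1
mid = midline_equal_length([[0, 0], [1, 0]], [[0, 2], [1, 2]])
```

The whole difficulty in the question is that N and M differ, which is what the resampling below works around.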
So what we can do is create arrays of the same length that are reasonably good estimates of your original arrays. We can do this by fitting a polynomial to each of the arrays you have. As noted in the comments and other answers, the midline of your original arrays is not uniquely defined, so a good estimate should fulfill your needs.
Note: In all of these examples, I've gone ahead and named the two arrays that you posted a1 and a2.
Step one: Create new arrays that estimate your old lines
Looking at the data you posted:
These aren't particularly complicated functions; it looks like a 3rd-degree polynomial would fit them pretty well. We can create those fits using numpy:
import numpy as np
# Find the range of x values in a1
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
# Create an evenly spaced array that ranges from the minimum to the maximum
# I used 100 elements, but you can use more or fewer.
# This will be used as your new x coordinates
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
# Fit a 3rd degree polynomial to your data
a1_coefs = np.polyfit(a1[:,0],a1[:,1], 3)
# Get your new y coordinates from the coefficients of the above polynomial
new_a1_y = np.polyval(a1_coefs, new_a1_x)
# Repeat for array 2:
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], 3)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
The result:
That's not so bad! If you have more complicated functions, you'll have to fit a higher-degree polynomial or find some other adequate function to fit to your data.
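If you do move to higher degrees, note that with x values around 1200, raw powers of x grow very large and the least-squares fit can become poorly conditioned. NumPy's own np.polyfit documentation recommends the np.polynomial.Polynomial.fit class method for new code because it is numerically more stable: it maps the x range onto [-1, 1] before fitting. A sketch of the resampling step using it instead of polyfit / polyval (`resample_poly` is a hypothetical helper name):

```python
import numpy as np

def resample_poly(a, deg=3, n_points=100):
    """Fit y = p(x) to an (N, 2) array and resample it at n_points evenly spaced x values."""
    a = np.asarray(a, dtype=float)
    x, y = a[:, 0], a[:, 1]
    # Polynomial.fit rescales x to [-1, 1] internally (its domain/window mapping),
    # which keeps the least-squares problem better conditioned for large x
    p = np.polynomial.Polynomial.fit(x, y, deg)
    new_x = np.linspace(x.min(), x.max(), n_points)
    # Calling p applies the same mapping, so it is evaluated at the original x values
    return np.column_stack([new_x, p(new_x)])
```

For example, `new_a1 = resample_poly(a1, deg=5)` would give columns playing the role of new_a1_x and new_a1_y.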
Now you've got two sets of arrays of the same length (I chose a length of 100; you can use more or fewer points depending on how smooth you want your midpoint line to be). These sets represent the x and y coordinates of the estimates of your original arrays. In the example above, I named these new_a1_x, new_a1_y, new_a2_x and new_a2_y.
Step two: calculate the average between each x and each y in your new arrays
Then we want to find the average x and average y value for each pair of corresponding points in our two estimate arrays. Just use np.mean:
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
midx and midy now represent the midpoints between our two estimate arrays. Now just plot your original (not estimated) arrays alongside your midpoint array:
import matplotlib.pyplot as plt
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
And voilà:
This method still works with more complex, noisy data (but you have to fit the function thoughtfully):
As a function:
I've put the above code in a function, so you can use it easily. It returns an array of your estimated midpoints, in the format you had your original arrays in.
The arguments: a1 and a2 are your two input arrays, poly_deg is the degree of the polynomial you want to fit, n_points is the number of points you want in your midpoint array, and plot is a boolean indicating whether you want to plot the result.
import matplotlib.pyplot as plt
import numpy as np
def interpolate(a1, a2, poly_deg=3, n_points=100, plot=True):
    min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
    new_a1_x = np.linspace(min_a1_x, max_a1_x, n_points)
    a1_coefs = np.polyfit(a1[:,0],a1[:,1], poly_deg)
    new_a1_y = np.polyval(a1_coefs, new_a1_x)
    min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
    new_a2_x = np.linspace(min_a2_x, max_a2_x, n_points)
    a2_coefs = np.polyfit(a2[:,0],a2[:,1], poly_deg)
    new_a2_y = np.polyval(a2_coefs, new_a2_x)
    midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(n_points)]
    midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(n_points)]
    if plot:
        plt.plot(a1[:,0], a1[:,1],c='black')
        plt.plot(a2[:,0], a2[:,1],c='black')
        plt.plot(midx, midy, '--', c='black')
        plt.show()
    return np.array([[x, y] for x, y in zip(midx, midy)])
[EDIT]:
Thinking back on this question, I realized I had overlooked a simpler way to do this: "densifying" both arrays to the same number of points using np.interp. This follows the same basic idea as the line-fitting method above, but instead of approximating the lines with polyfit / polyval, it just densifies them:
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
new_a1_y = np.interp(new_a1_x, a1[:,0], a1[:,1])
new_a2_y = np.interp(new_a2_x, a2[:,0], a2[:,1])
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
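One caveat with both versions: they treat y as a function of x (np.interp also requires its x coordinates to be increasing), while the question says an x can have several y values. A possible workaround, which is a different pairing rule from the x-based one above, is to parameterize each line by cumulative arc length instead of x, resample both lines at the same number of evenly spaced length fractions, and average. This sketch assumes both lines run in the same direction and contain no consecutive duplicate points; `resample_by_arclength` and `centerline` are hypothetical names:

```python
import numpy as np

def resample_by_arclength(a, n_points=100):
    """Resample an (N, 2) polyline at n_points evenly spaced arc-length positions."""
    a = np.asarray(a, dtype=float)
    seg = np.linalg.norm(np.diff(a, axis=0), axis=1)  # length of each segment
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative length, increasing
    t = np.linspace(0.0, s[-1], n_points)             # evenly spaced target lengths
    # s increases even where x doubles back, so np.interp is valid on s
    return np.column_stack([np.interp(t, s, a[:, 0]), np.interp(t, s, a[:, 1])])

def centerline(a1, a2, n_points=100):
    """Average two polylines point by point after arc-length resampling."""
    return (resample_by_arclength(a1, n_points) + resample_by_arclength(a2, n_points)) / 2.0
```

With the arrays from the question this would be `mid = centerline(a1, a2)`. If the two lines run in opposite directions, reverse one of them first (e.g. `a2[::-1]`).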
The "line between two lines" is not well defined. You can obtain a decent yet simple solution by triangulating between the two curves (you can triangulate by progressing from vertex to vertex, choosing the diagonal that produces the less skewed triangle).
The interpolated curve then joins the midpoints of the triangle sides that connect the two curves.
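A minimal sketch of that idea, with one assumption that is mine rather than the answer's: "less skewed" is approximated by choosing the shorter of the two candidate connecting edges at each step. Both curves must run in the same direction, and `strip_centerline` is a hypothetical name:

```python
import numpy as np

def strip_centerline(a, b):
    """Greedy strip triangulation between polylines a (N, 2) and b (M, 2).

    Each step adds one triangle by advancing one vertex along either a or b.
    The new connecting edge ("rung") runs from a[i] to b[j]; the centerline
    joins the midpoints of all rungs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    i, j = 0, 0
    mids = [(a[0] + b[0]) / 2.0]  # midpoint of the first rung
    while i < len(a) - 1 or j < len(b) - 1:
        if i == len(a) - 1:
            j += 1  # a is exhausted: can only advance along b
        elif j == len(b) - 1:
            i += 1  # b is exhausted: can only advance along a
        else:
            # Candidate rungs: advance on a -> (a[i+1], b[j]); advance on b -> (a[i], b[j+1])
            if np.linalg.norm(a[i + 1] - b[j]) <= np.linalg.norm(a[i] - b[j + 1]):
                i += 1
            else:
                j += 1
        mids.append((a[i] + b[j]) / 2.0)
    return np.array(mids)  # N + M - 1 points
```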
I work with rivers, so this is a common problem. One of my solutions is exactly like the one you showed in your question, i.e. skeletonizing the blob. As you saw, the ends are where the problems occur, so what I've done, and it seems to work well, is simply to mirror the boundaries. For this approach to work, the blob must not intersect the corners of the image.
You can find my implementation in RivGraph; this particular algorithm is the function "mask_to_centerline" in rivers/river_utils.py.
Here's an example output showing how the ends of the centerline extend to the desired edge of the object:
sacuL's solution almost worked for me, but I needed to aggregate more than just two curves.
Here is my generalization of sacuL's solution:
import matplotlib.pyplot as plt
import numpy as np
def interp(*axis_list):
    min_max_xs = [(min(axis[:,0]), max(axis[:,0])) for axis in axis_list]
    new_axis_xs = [np.linspace(min_x, max_x, 100) for min_x, max_x in min_max_xs]
    new_axis_ys = [np.interp(new_x_axis, axis[:,0], axis[:,1]) for axis, new_x_axis in zip(axis_list, new_axis_xs)]
    midx = [np.mean([new_axis_xs[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    midy = [np.mean([new_axis_ys[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    for axis in axis_list:
        plt.plot(axis[:,0], axis[:,1],c='black')
    plt.plot(midx, midy, '--', c='black')
    plt.show()
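The function above only plots the result. If you want the averaged curve back as an array, a plot-free variant of the same np.interp idea could look like the following sketch (`mean_curve` and its `n_points` parameter are hypothetical names; as with np.interp in general, each curve's x values must be increasing):

```python
import numpy as np

def mean_curve(*curves, n_points=100):
    """Average any number of (N_k, 2) curves after resampling each to n_points."""
    resampled = []
    for c in curves:
        c = np.asarray(c, dtype=float)
        # Evenly spaced x over this curve's own x range
        new_x = np.linspace(c[:, 0].min(), c[:, 0].max(), n_points)
        # Linear interpolation of y; np.interp needs c[:, 0] to be increasing
        new_y = np.interp(new_x, c[:, 0], c[:, 1])
        resampled.append(np.column_stack([new_x, new_y]))
    # Stack to shape (n_curves, n_points, 2) and average over the curves
    return np.mean(np.stack(resampled), axis=0)
```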
If we now run an example:
a1 = np.array([[x, x**2+5*(x%4)] for x in range(10)])
a2 = np.array([[x-0.5, x**2+6*(x%3)] for x in range(10)])
a3 = np.array([[x+0.2, x**2+7*(x%2)] for x in range(10)])
interp(a1, a2, a3)
we get the plot:

Detect loops/intersections in matplotlib scatter plot

At some point in my work, I came up with this kind of scatter plot.
I would like my script to be able to detect that the curve "loops" and to give me the point (or an approximation thereof) where it does so; for instance, in this case it would be about [0.2, 0.1].
I tried to play around with some representative quantities of my points, such as the norm and/or the argument, as in the following piece of code.
import numpy as np
x,y = np.genfromtxt('points.dat',unpack=True)
norm = np.sqrt(x**2+y**2)
arg = np.arctan2(y,x)
left,right = np.meshgrid(norm,norm)
norm_diff = np.fabs(left - right)
mask = norm_diff == 0.
norm_diff_ma = np.ma.masked_array(norm_diff,mask)
left,right = np.meshgrid(arg,arg)
arg_diff = np.fabs(left - right)
mask = arg_diff == 0.
arg_diff_ma = np.ma.masked_array(arg_diff,mask)
list_of_indices = np.ma.where((norm_diff_ma<1.0e-04)*(arg_diff_ma<1.0e-04))
But it does not work as intended. This might be because the dataset contains too many points, so that the distance between two neighboring points along the curve is in any case of the same order of magnitude as the distance between the points in the "loop cluster"...
I was thinking about detecting clusters, or maybe even detecting lines in the scatter plot and then checking whether any two lines intersect, but I am afraid my skills in image processing only go so far.
Is there any algorithm or trick you can think of that would work here?
A representative data sample can be found here.
Edit 08/13/2015 16h18: after the short discussion with #DrBwts, I took a closer look at the data I obtained from a pyplot.contour() call, using the following routine to extract all the vertices:
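def contour_points(contour, steps=1):
    # contour.collections is the older Matplotlib ContourSet API (deprecated in Matplotlib 3.8)
    try:
        loc_arr = np.vstack([path.interpolated(steps).vertices for linecol in contour.collections for path in linecol.get_paths()])
    except ValueError:
        # np.vstack raises ValueError when there are no paths at all
        loc_arr = np.empty((0,2))
    return loc_arr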
y,x = contour_points(CS,steps=1).T
it turns out the (x, y) points are ordered, in the sense that a call to pyplot.plot() connects the dots correctly.
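Since the points are ordered, one possible approach (a swapped-in technique, not something from the question) is to treat them as a polyline and test every pair of non-adjacent segments for a crossing, using the standard parametric segment-intersection formula: solve p_i + t·r_i = p_j + u·r_j and accept the crossing when both t and u lie in [0, 1]. A brute-force sketch, O(n²) in the number of points, so it may be slow for very large datasets; `self_intersections` is a hypothetical name:

```python
import numpy as np

def self_intersections(xy):
    """Return the points where an ordered (N, 2) polyline crosses itself."""
    xy = np.asarray(xy, dtype=float)
    p = xy[:-1]           # segment start points
    r = xy[1:] - xy[:-1]  # segment direction vectors
    n = len(p)
    hits = []
    for i in range(n - 2):
        # Compare segment i with segments j >= i + 2 (skip the neighbour sharing a vertex)
        pj, rj = p[i + 2:], r[i + 2:]
        denom = r[i, 0] * rj[:, 1] - r[i, 1] * rj[:, 0]  # 2D cross product r_i x r_j
        qp = pj - p[i]
        with np.errstate(divide="ignore", invalid="ignore"):
            t = (qp[:, 0] * rj[:, 1] - qp[:, 1] * rj[:, 0]) / denom
            u = (qp[:, 0] * r[i, 1] - qp[:, 1] * r[i, 0]) / denom
        # Parallel or collinear segments (denom == 0) are ignored in this sketch
        ok = (denom != 0) & (t >= 0) & (t <= 1) & (u >= 0) & (u <= 1)
        for tk in t[ok]:
            hits.append(p[i] + tk * r[i])
    return np.array(hits).reshape(-1, 2)
```

With the x and y arrays from above, this would be called as `self_intersections(np.column_stack([x, y]))`; each returned row is an estimate of where the curve loops back over itself.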
