Create array mask from randomly ordered list

Create array mask from randomly ordered list - python

I have a dataset made of velocity data on an unstructured grid from a CFD simulation, in the structure:
data = [[x1, y1, u1, v1], ... , [xn, yn, un, vn]]
I need to have a regular grid inside the area covered by this data. However, I do not have information about the boundaries of the x, y domain other than the x, y values itself. The boundary is defined by a complex geometrical shape.
My solution would be to create a rectangular grid with numpy.mgrid and then construct an array mask to mask out areas with no data.
But I have no idea how to get a mask just from the randomly ordered coordinates. I tried using scipy's ConvexHull to find the boundaries but it is a concave problem. However, even if I had the boundary points, I am not sure how to create the mask from it, since the indices are not the same as in the regular grid.
How to determine this grid? Is there any other possibility? Maybe its useful to reorder the dataset?

I'm really unfamiliar with your use-case so I may be way off base here. It sounds like you're effectively looking for the min/max for each coordinate system to create a mask for? Conceptually (maybe not efficient for large datasets):
x_min = min([a[0] for a in data])
With that I think you'd be able to say something like "the domain of x is [x_min, x_max]"
Here's a full example that you can copy/past to see if it produces what you're looking for:
from random import randint
# Convenience
def r():
return randint(-100, 100)
# Generate 100 random coordinates
data = [[r(), r(), r(), r()] for _ in range(0, 100)]
x_min = min([a[0] for a in data])
x_max = max([a[0] for a in data])
y_min = min([a[1] for a in data])
y_max = max([a[1] for a in data])
u_min = min([a[2] for a in data])
u_max = max([a[2] for a in data])
v_min = min([a[3] for a in data])
v_max = max([a[3] for a in data])
print(f'X-Range: {x_min} to {x_max}')
print(f'Y-Range: {y_min} to {y_max}')
print(f'U-Range: {u_min} to {u_max}')
print(f'V-Range: {v_min} to {v_max}')
That produces this:
X-Range: -98 to 96
Y-Range: -100 to 96
U-Range: -95 to 100
V-Range: -100 to 100
While any single entry within data might be this:
print(data[randint(0, len(data)])
[70, -69, -59, -49]

Related

Counterclockwise sorting of x, y data

I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that by using the polar coordinates: I calculate the polar angle of each point (x,y) then sort by the ascending angles, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately I don't get the expected results in random_shape_formated.dat! The points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected resutls:
Create a new file named: filename_formatted.dat that contains the sorted data according to the image above (The first line contains the starting point, the next lines contain the points as shown by the blue arrows in counterclockwise direction in the image).
EDIT 2: The xy data added here instead of using github gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108

I find that an easy way to sort points with x,y-coordinates like that is to sort them dependent on the angle between the line from the points and the center of mass of the whole polygon and the horizontal line which is called alpha in the example. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle using numpy.arccos for instance. When y-y0 is larger than 0 you take the angle directly, otherwise you subtract the angle from 360° (2𝜋). I have used numpy.where for the calculation of the angle and then numpy.argsort to produce a mask for indexing the initial x,y-values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from any other point you could add an offset angle for that. In your case that would be zero though.
def sort_xy(x, y):
x0 = np.mean(x)
y0 = np.mean(y)
r = np.sqrt((x-x0)**2 + (y-y0)**2)
angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
mask = np.argsort(angles)
x_sorted = x[mask]
y_sorted = y[mask]
return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (points are obvisously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:

If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
dist2 = lambda a,b: (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y)) # get the list of coordinate pairs
z.sort() # sort by x coordinate
cw = z[0:1] # first point in clockwise direction
ccw = z[1:2] # first point in counter clockwise direction
# reverse the above assignment depending on how first 2 points relate
if z[1][1] > z[0][1]:
cw = z[1:2]
ccw = z[0:1]
for p in z[2:]:
# append to the list to which the next point is closest
if dist2(cw[-1], p) < dist2(ccw[-1], p):
cw.append(p)
else:
ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y-coordinate, for which an angle-look-around from some central point would fail, like here:
No assumption is made about the range of the X nor of the Y coordinate: like for instance, the curve does not necessarily have to cross the X axis (Y = 0) for this to work.

Counter-clock-wise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]

Not really a python question I think, but still I think you could try sorting by - sign(y) * x doing something like:
def counter_clockwise_sort(points):
return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
should work fine, assuming you read your points properly into a list of dicts of format {'x': 0.12312, 'y': 0.912}
EDIT: This will work as long as you cross the X axis only twice, like in your example.

If:
the shape is arbitrarily complex and
the point spacing is ~random
then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver. In particular, I used the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order to the points, I don't think it will be too hard to decide on a clockwise vs clockwise ordering.

If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y),-(x-center_x)) (we're flipping the x axis).
We get the following, with a gradient colored scatter to prove correctness (fyi I removed the first line of the dat file after downloading):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
#oneliner for ordering points (transform, adjust for 0 to 2pi, argsort, index at points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the colormap 1 is white and 0 is black), they're numbered in the 0-1 range by order:

For points with comparable distances between their neighbouring pts, we can use KDTree to get two closest pts for each pt. Then draw lines connecting those to give us a closed shape contour. Then, we will make use of OpenCV's findContours to get contour traced always in counter-clockwise manner. Now, since OpenCV works on images, we need to sample data from the provided float format to uint8 image format. Given, comparable distances between two pts, that should be pretty safe. Also, OpenCV handles it well to make sure it traces even sharp corners in curvatures, i.e. smooth or not-smooth data would work just fine. And, there's no pivot requirement, etc. As such all kinds of shapes would be good to work with.
Here'e the implementation -
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage.morphology import binary_fill_holes
def counter_clockwise_order(a, DEBUG_PLOT=False):
b = a-a.min(0)
d = pdist(b).min()
c = np.round(2*b/d).astype(int)
img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
d1,d2 = cKDTree(c).query(c,k=3)
b = c[d2]
p1,p2,p3 = b[:,0],b[:,1],b[:,2]
for i in range(len(b)):
cv2.line(img,tuple(p1[i]),tuple(p2[i]),255,1)
cv2.line(img,tuple(p1[i]),tuple(p3[i]),255,1)
img = (binary_fill_holes(img==255)*255).astype(np.uint8)
if int(cv2.__version__.split('.')[0])>=3:
_,contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
else:
contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cont = contours[0][:,0]
f1,f2 = cKDTree(cont).query(c,k=1)
ordered_points = a[f2.argsort()[::-1]]
if DEBUG_PLOT==1:
NPOINTS = len(ordered_points)
for i in range(NPOINTS):
plt.plot(ordered_points[i:i+2,0],ordered_points[i:i+2,1],alpha=float(i)/(NPOINTS-1),color='k')
plt.show()
return ordered_points
Sample run -
# Load data in a 2D array with 2 columns
a = np.loadtxt('random_shape.csv',delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
Output -

Incorrect results for simple 2D transformation

I'm attempting a 2D transformation using the nudged package.
The code is really simple:
import nudged
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
x, y = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom, ran = (x_d, y_d), (x_r, y_r)
# Obtain transformation dom --> ran
trans = nudged.estimate(dom, ran)
# Apply the transformation to the (x, y) points
x_t, y_t = trans.transform((x, y))
where (x_d, y_d) and (x_r, y_r) are the 1 to 1 correlated "domain" and "range" points, and (x, y) are all the points in the (x_d, y_d) (domain) system that I want to transform to the (x_r, y_r) (range) system.
This is the result I get:
where:
trans.get_matrix()
[[-0.0006459232439068067, -0.0007947429558548157, 6.534164085946009], [0.0007947429558548157, -0.0006459232439068067, 2.515279819707991], [0, 0, 1]]
trans.get_rotation()
2.2532603497070713
trans.get_scale()
0.0010241255796531702
trans.get_translation()
[6.534164085946009, 2.515279819707991]
This is the final transformed dom values with the original ran points overlayed:
This is clearly not right and I can't figure out what I'm doing wrong.

I was able to figure out your issue. It is simply that nudge has somewhat problematic notation, which is poorly documented.
The estimate function accepts a list of coordinate pairs. You effectively have to transpose dom and ran to get this to work. I suggest either switching to numpy arrays, or using list(map(list, zip(...))) to do the transpose.
The Transform.transfom method is extremely restrictive, and requires that the inner pairs be of type list. Not tuple, not any other sequence, but specifically list. Your attempt to call trans.transform((x, y)) only happened to work by pure luck. transform assessed that the first element is not a list, and attempted to transform (x, y) as a pair of integers. Luckily for you, numpy operators are vectorized, so you can process an entire array as a single unit.
Here is a working version of your code that generates the correct plots using mostly python:
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
uni = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom = list(map(list, zip(x_d, y_d)))
ran = list(map(list, zip(x_r, y_r)))
# Obtain transformation dom --> ran
trans = estimate(dom, ran)
# Apply the transformation to the (x, y) points
tra = trans.transform(uni)
fig, ax = plt.subplots(2, 2)
ax[0][0].scatter(x_d, y_d)
ax[0][0].set_title('dom')
ax[0][1].scatter(x_r, y_r)
ax[0][1].set_title('ran')
ax[1][0].scatter(*uni)
ax[1][1].scatter(*tra)
I left in your hack with uni, since I did not feel like converting the array of random values to a nested list. The resulting plot looks like this:
My overall recommendation is to submit a number of bug reports to the nudge library based on these findings.

How to interpolate a line between two other lines in python

Note: I asked this question before but it was closed as a duplicate, however, I, along with several others believe it was unduely closed, I explain why in an edit in my original post. So I would like to re-ask this question here again.
Does anyone know of a python library that can interpolate between two lines. For example, given the two solid lines below, I would like to produce the dashed line in the middle. In other words, I'd like to get the centreline. The input is a just two numpy arrays of coordinates with size N x 2 and M x 2 respectively.
Furthermore, I'd like to know if someone has written a function for this in some optimized python library. Although optimization isn't exactly a necessary.
Here is an example of two lines that I might have, you can assume they do not overlap with each other and an x/y can have multiple y/x coordinates.
array([[ 1233.87375018, 1230.07095987],
[ 1237.63559365, 1253.90749041],
[ 1240.87500801, 1264.43925132],
[ 1245.30875975, 1274.63795396],
[ 1256.1449357 , 1294.48254424],
[ 1264.33600095, 1304.47893299],
[ 1273.38192911, 1313.71468591],
[ 1283.12411536, 1322.35942538],
[ 1293.2559388 , 1330.55873344],
[ 1309.4817002 , 1342.53074698],
[ 1325.7074616 , 1354.50276051],
[ 1341.93322301, 1366.47477405],
[ 1358.15898441, 1378.44678759],
[ 1394.38474581, 1390.41880113]])
array([[ 1152.27115094, 1281.52899302],
[ 1155.53345506, 1295.30515742],
[ 1163.56506781, 1318.41642169],
[ 1168.03497425, 1330.03181319],
[ 1173.26135672, 1341.30559949],
[ 1184.07110925, 1356.54121651],
[ 1194.88086178, 1371.77683353],
[ 1202.58908737, 1381.41765447],
[ 1210.72465255, 1390.65097106],
[ 1227.81309742, 1403.2904646 ],
[ 1244.90154229, 1415.92995815],
[ 1261.98998716, 1428.56945169],
[ 1275.89219696, 1438.21626352],
[ 1289.79440676, 1447.86307535],
[ 1303.69661656, 1457.50988719],
[ 1323.80994319, 1470.41028655],
[ 1343.92326983, 1488.31068591],
[ 1354.31738934, 1499.33260989],
[ 1374.48879779, 1516.93734053],
[ 1394.66020624, 1534.54207116]])
Visualizing this we have:
So my attempt at this has been using the skeletonize function in the skimage.morphology library by first rasterizing the coordinates into a filled in polygon. However, I get branching at the ends like this:

First of all, pardon the overkill; I had fun with your question. If the description is too long, feel free to skip to the bottom, I defined a function that does everything I describe.
Your problem would be relatively straightforward if your arrays were the same length. In that case, all you would have to do is find the average between the corresponding x values in each array, and the corresponding y values in each array.
So what we can do is create arrays of the same length, that are more or less good estimates of your original arrays. We can do this by fitting a polynomial to the arrays you have. As noted in comments and other answers, the midline of your original arrays is not specifically defined, so a good estimate should fulfill your needs.
Note: In all of these examples, I've gone ahead and named the two arrays that you posted a1 and a2.
Step one: Create new arrays that estimate your old lines
Looking at the data you posted:
These aren't particularly complicated functions, it looks like a 3rd degree polynomial would fit them pretty well. We can create those using numpy:
import numpy as np
# Find the range of x values in a1
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
# Create an evenly spaced array that ranges from the minimum to the maximum
# I used 100 elements, but you can use more or fewer.
# This will be used as your new x coordinates
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
# Fit a 3rd degree polynomial to your data
a1_coefs = np.polyfit(a1[:,0],a1[:,1], 3)
# Get your new y coordinates from the coefficients of the above polynomial
new_a1_y = np.polyval(a1_coefs, new_a1_x)
# Repeat for array 2:
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], 3)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
The result:
That's not bad so bad! If you have more complicated functions, you'll have to fit a higher degree polynomial, or find some other adequate function to fit to your data.
Now, you've got two sets of arrays of the same length (I chose a length of 100, you can do more or less depending on how smooth you want your midpoint line to be). These sets represent the x and y coordinates of the estimates of your original arrays. In the example above, I named these new_a1_x, new_a1_y, new_a2_x and new_a2_y.
Step two: calculate the average between each x and each y in your new arrays
Then, we want to find the average x and average y value for each of our estimate arrays. Just use np.mean:
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
midx and midy now represent the midpoint between our 2 estimate arrays. Now, just plot your original (not estimate) arrays, alongside your midpoint array:
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
And voilà:
This method still works with more complex, noisy data (but you have to fit the function thoughtfully):
As a function:
I've put the above code in a function, so you can use it easily. It returns an array of your estimated midpoints, in the format you had your original arrays in.
The arguments: a1 and a2 are your 2 input arrays, poly_deg is the degree polynomial you want to fit, n_points is the number of points you want in your midpoint array, and plot is a boolean, whether you want to plot it or not.
import matplotlib.pyplot as plt
import numpy as np
def interpolate(a1, a2, poly_deg=3, n_points=100, plot=True):
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, n_points)
a1_coefs = np.polyfit(a1[:,0],a1[:,1], poly_deg)
new_a1_y = np.polyval(a1_coefs, new_a1_x)
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, n_points)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], poly_deg)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(n_points)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(n_points)]
if plot:
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
return np.array([[x, y] for x, y in zip(midx, midy)])
[EDIT]:
I was thinking back on this question, and I overlooked a simpler way to do this, by "densifying" both arrays to the same number of points using np.interp. This method follows the same basic idea as the line-fitting method above, but instead of approximating lines using polyfit / polyval, it just densifies:
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
new_a1_y = np.interp(new_a1_x, a1[:,0], a1[:,1])
new_a2_y = np.interp(new_a2_x, a2[:,0], a2[:,1])
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()

The "line between two lines" is not so well defined. You can obtain a decent though simple solution by triangulating between the two curves (you can triangulate by progressing from vertex to vertex, choosing the diagonals that produce the less skewed triangle).
Then the interpolated curve joins the middles of the sides.

I work with rivers, so this is a common problem. One of my solutions is exactly like the one you showed in your question--i.e. skeletonize the blob. You see that the boundaries have problems, so what I've done that seems to work well is to simply mirror the boundaries. For this approach to work, the blob must not intersect the corners of the image.
You can find my implementation in RivGraph; this particular algorithm is in rivers/river_utils.py called "mask_to_centerline".
Here's an example output showing how the ends of the centerline extend to the desired edge of the object:

sacuL's solution almost worked for me, but I needed to aggregate more than just two curves.
Here is my generalization for sacuL's solution:
def interp(*axis_list):
min_max_xs = [(min(axis[:,0]), max(axis[:,0])) for axis in axis_list]
new_axis_xs = [np.linspace(min_x, max_x, 100) for min_x, max_x in min_max_xs]
new_axis_ys = [np.interp(new_x_axis, axis[:,0], axis[:,1]) for axis, new_x_axis in zip(axis_list, new_axis_xs)]
midx = [np.mean([new_axis_xs[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
midy = [np.mean([new_axis_ys[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
for axis in axis_list:
plt.plot(axis[:,0], axis[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
If we now run an example:
a1 = np.array([[x, x**2+5*(x%4)] for x in range(10)])
a2 = np.array([[x-0.5, x**2+6*(x%3)] for x in range(10)])
a3 = np.array([[x+0.2, x**2+7*(x%2)] for x in range(10)])
interp(a1, a2, a3)
we get the plot:

How to unravel data interpolated with griddata

I have interpolate a function on a grid with scipy.interpolate.griddata like so
interpolated_quantity = scipy.interpolate.griddata(old_points, old_array, grid_x, grid_y, grid_z, method='nearest')
What I would like to do is to convert have a set of 4 1-D arrays: 3 with the position of each cell and one with the corresponding value of interpolated quantity in each cell.
So far I'm using a very slow and time consuming operation:
arrays={}
base_gridx = linspace(xmin,xmax,abs(ngridx)+1)
base_gridy = linspace(ymin,ymax,abs(ngridy)+1)
base_gridz = linspace(zmin,zmax,abs(ngridz)+1)
cx = (base_gridx[1:]+base_gridx[:-1])/2.
cy = (base_gridy[1:]+base_gridy[:-1])/2.
cz = (base_gridz[1:]+base_gridz[:-1])/2.
data_len = len(cx)*len(cy)*len(cz)
for ii in arange(0,len(cx)):
for jj in arange(0,len(cy)):
for kk in arange(0,len(cz)):
arrays["x"].append(cx[ii])
arrays["y"].append(cy[jj])
arrays["z"].append(cz[kk])
arrays["prop"].append(interpolated quantity[ii][jj][kk])
This works, but it just takes a huge amount of time. Do you think there might be a faster way to do this? Maybe using ravel?

It is as simple as you suggest. The four arrays are:
grid_x.ravel()
grid_y.ravel()
grid_z.ravel()
interpolated_quantity.ravel()

3g coverage map - visualise lat, long, ping data

Suppose I've been driving a set route with a 3g modem and GPS on my laptop, while my computer back at home records the ping delay. I've correlated ping with GPS lat/long, and now I'd like to visualise this data.
I've got about 80,000 points of data per day, and I'd like to display several month's worth. I'm especially interested in displaying areas where ping consistently times out (ie ping == 1000).
Scatter plot
My first attempt was with a scatter plot, with one point per data entry. I made the size of the point 5x larger if it was a timeout, so it was obvious where these areas were. I also dropped the alpha to 0.1, for a crude way to see overlaid points.
# Colour
c = pings
# Size
s = [2 if ping < 1000 else 10 for ping in pings]
# Scatter plot
plt.scatter(longs, lats, s=s, marker='o', c=c, cmap=cm.jet, edgecolors='none', alpha=0.1)
The obvious problem with this is that it displays one marker per data point, which is a very poor way to display large amounts of data. If I've drive past the same area twice, then the first pass data is just displayed on top of the second pass.
Interpolate over an even grid
I then had a try at using numpy and scipy to interpolate over an even grid.
# Convert python list to np arrays
x = np.array(longs, dtype=float)
y = np.array(lats, dtype=float)
z = np.array(pings, dtype=float)
# Make even grid (200 rows/cols)
xi = np.linspace(min(longs), max(longs), 200)
yi = np.linspace(min(lats), max(lats), 200)
# Interpolate data points to grid
zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='linear', fill_value=0)
# Plot contour map
plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
From this example
This looks interesting (lots of colours and shapes), but it extrapolates too far around areas I haven't explored. You can't see the routes I've travelled, just red/blue blotches.
If I've driven in a large curve, it'll interpolate for the area between (see below):
Interpolate over an uneven grid
I then had a try at using meshgrid (xi, yi = np.meshgrid(lats, longs)) instead of a fixed grid, but I'm told my array is too big.
Is there an easy way I can create a grid from my points?
My requirements:
Handle large data sets (80,000 x 60 = ~5m points)
Display duplicate data for each point either by averaging (I assume interpolation will do this), or by taking a minimum value for each point.
Don't extrapolate too far from data points
I'm happy with a scatter plot (top), but I need some way to average the data before I display it.
(Apologies for the dodgy mspaint drawings, I can't upload actual data)
Solution:
# Get sum
hsum, long_range, lat_range = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)), weights=pings)
# Get count
hcount, ignore1, ignore2 = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)))
# Get average
h = hsum/hcount
x, y = np.where(h)
average = h[x, y]
# Make scatter plot
scatterplot = ax.scatter(long_range[x], lat_range[y], s=3, c=average, linewidths=0, cmap="jet", vmin=0, vmax=1000)

To simplify your question, you have two set of points, one for ping<1000, one for ping>=1000.
Since the count of points is very large, you can't plot them directly by scatter(). I created some sample data by:
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
(longs, lats) is points for ping<1000, (bad_longs, bad_lats) is points for ping>1000
You can use numpy.histogram2d() to count the points:
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(400,400), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(400,400), range=ranges)
h and bad_h are the points count in every little squere area.
Then you can choose many methods to visualize it. For example, you can plot it by scatter():
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
Here is the full code:
import numpy as np
import pylab as pl
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(300,300), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(300,300), range=ranges)
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
The output figure is:

The GDAL libraries including the Python API and associated utilities, particularly gdal_grid should work for you. It includes a number of interpolation and averaging methods and options for generating gridded data from scattered points. You should be able to manipulate the grid cell size to get a pleasing resolution.
GDAL handles a number of data formats, but you should be able to pass your coordinates and ping values as CSV and get back a PNG or JPEG without much trouble.
Keep in mind lat/lon data is not a planar coordinate system. If you intend to incorporate you results with other map data you'll have to figure out what map projection, units, etc. to use.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Create array mask from randomly ordered list - python

Related

Counterclockwise sorting of x, y data

Incorrect results for simple 2D transformation

How to interpolate a line between two other lines in python

How to unravel data interpolated with griddata

3g coverage map - visualise lat, long, ping data

Categories

Resources