Speed up Python cKDTree

Speed up Python cKDTree - python

I currently have a function that I created that connects the blue dots with its (at maximum) 3 nearest neighbors within a pixel range of 55. The vertices_xy_list is an extremely large list or points (nested list) of about 5000-10000 pairs.
Example of vertices_xy_list:
[[3673.3333333333335, 2483.3333333333335],
[3718.6666666666665, 2489.0],
[3797.6666666666665, 2463.0],
[3750.3333333333335, 2456.6666666666665],...]
I currently have written this calculate_draw_vertice_lines() function that uses a CKDTree inside of a While loop to find all points within 55 pixels and then connect them each with a green line.
It can be seen that this would become exponentially slower as the list gets longer. Is there any method to speed up this function significantly? Such as vectorizing operations?
def calculate_draw_vertice_lines():
global vertices_xy_list
global cell_wall_lengths
global list_of_lines_references
index = 0
while True:
if (len(vertices_xy_list) == 1):
break
point_tree = spatial.cKDTree(vertices_xy_list)
index_of_closest_points = point_tree.query_ball_point(vertices_xy_list[index], 55)
index_of_closest_points.remove(index)
for stuff in index_of_closest_points:
list_of_lines_references.append(plt.plot([vertices_xy_list[index][0],vertices_xy_list[stuff][0]] , [vertices_xy_list[index][1],vertices_xy_list[stuff][1]], color = 'green'))
wall_length = math.sqrt( (vertices_xy_list[index][0] - vertices_xy_list[stuff][0])**2 + (vertices_xy_list[index][1] - vertices_xy_list[stuff][1])**2 )
cell_wall_lengths.append(wall_length)
del vertices_xy_list[index]
fig.canvas.draw()

If I understand the logic of selecting the green lines correctly, there is no need to create a KDTree at each iteration. For each pair (p1, p2) of blue points, the line should be drawn if and only if the following hold:
p1 is one of 3 closest neighbors of p2.
p2 is one of 3 closest neighbors of p1.
dist(p1, p2) < 55.
You can create the KDTree once and create a list of green lines efficiently. Here is part of the implementation that returns a list of pairs of indices for points between which the green lines need to be drawn. The runtime is about 0.5 seconds on my machine for 10,000 points.
import numpy as np
from scipy import spatial
data = np.random.randint(0, 1000, size=(10_000, 2))
def get_green_lines(data):
tree = spatial.cKDTree(data)
# each key in g points to indices of 3 nearest blue points
g = {i: set(tree.query(data[i,:], 4)[-1][1:]) for i in range(data.shape[0])}
green_lines = list()
for node, candidates in g.items():
for node2 in candidates:
if node2 < node:
# avoid double-counting
continue
if node in g[node2] and spatial.distance.euclidean(data[node,:], data[node2,:]) < 55:
green_lines.append((node, node2))
return green_lines
You can proceed to plot green lines as follows:
green_lines = get_green_lines(data)
fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1], s=1)
from matplotlib import collections as mc
lines = [[data[i], data[j]] for i, j in green_lines]
line_collection = mc.LineCollection(lines, color='green')
ax.add_collection(line_collection)
Example output:

Related

Several unintended lines when attempting to create voronoi diagram given scatter point locations

I'm trying to create a Voronoi diagram given a set of scatterplot points. However, several "extra unintended lines" appear to get calculated in the process. Some of these "extra" lines appear to be the infinite edges getting incorrectly calculated. But others are appearing randomly in the middle of the plot as well. How can I only create an extra edge when it's needed/required to connect a polygon to the edge of the plot (e.g. plot boundaries)?
My graph outer boundaries are:
boundaries = np.array([[0, -2], [0, 69], [105, 69], [105, -2], [0, -2]])
Here's the section dealing with the voronoi diagram creation:
def voronoi_polygons(voronoi, diameter):
centroid = voronoi.points.mean(axis=0)
ridge_direction = defaultdict(list)
for (p, q), rv in zip(voronoi.ridge_points, voronoi.ridge_vertices):
u, v = sorted(rv)
if u == -1:
t = voronoi.points[q] - voronoi.points[p] # tangent
n = np.array([-t[1], t[0]]) / np.linalg.norm(t) # normal
midpoint = voronoi.points[[p, q]].mean(axis=0)
direction = np.sign(np.dot(midpoint - centroid, n)) * n
ridge_direction[p, v].append(direction)
ridge_direction[q, v].append(direction)
for i, r in enumerate(voronoi.point_region):
region = voronoi.regions[r]
if -1 not in region:
# Finite region.
yield Polygon(voronoi.vertices[region])
continue
# Infinite region.
inf = region.index(-1) # Index of vertex at infinity.
j = region[(inf - 1) % len(region)] # Index of previous vertex.
k = region[(inf + 1) % len(region)] # Index of next vertex.
if j == k:
# Region has one Voronoi vertex with two ridges.
dir_j, dir_k = ridge_direction[i, j]
else:
# Region has two Voronoi vertices, each with one ridge.
dir_j, = ridge_direction[i, j]
dir_k, = ridge_direction[i, k]
# Length of ridges needed for the extra edge to lie at least
# 'diameter' away from all Voronoi vertices.
length = 2 * diameter / np.linalg.norm(dir_j + dir_k)
# Polygon consists of finite part plus an extra edge.
finite_part = voronoi.vertices[region[inf + 1:] + region[:inf]]
extra_edge = [voronoi.vertices[j] + dir_j * length,
voronoi.vertices[k] + dir_k * length]
combined_finite_edge = np.concatenate((finite_part, extra_edge))
poly = Polygon(combined_finite_edge)
yield poly
Here are the points being used:
['52.629' '24.28099822998047']
['68.425' '46.077999114990234']
['60.409' '36.7140007019043']
['72.442' '28.762001037597656']
['52.993' '43.51799964904785']
['59.924' '16.972000122070312']
['61.101' '55.74899959564209']
['68.9' '13.248001098632812']
['61.323' '29.0260009765625']
['45.283' '36.97500038146973']
['52.425' '19.132999420166016']
['37.739' '28.042999267578125']
['48.972' '2.3539962768554688']
['33.865' '30.240001678466797']
['52.34' '64.94799995422363']
['52.394' '45.391000747680664']
['52.458' '34.79800033569336']
['31.353' '43.14500045776367']
['38.194' '39.24399948120117']
['98.745' '32.15999984741211']
['6.197' '32.606998443603516']

Most likely this is due to the errors associated with floating point arithmetic while computing the voronoi traingulation from your data (esp. the second column).
Assuming that, there is no single solution for such kinds of problems. I urge you to go through this page* of the Qhull manual and try iterating through those parameters in qhull_options before generating the voronoi object that you are inputting in the function. An example would be qhull_options='Qbb Qc Qz QJ'.
Other than that I doubt there is anything that could be modified in the function to avoid such a problem.
*This will take some time though. Just be patient.

Figured out what was wrong: after each polygon I needed to add a null x and y value or else it would attempt to 'stitch' one polygon to another, drawing an additional unintended line in order to do so. So the data should really look more like this:
GameTime,Half,ObjectType,JerseyNumber,X,Y,PlayerIDEvent,PlayerIDTracking,MatchIDEvent,Position,teamId,i_order,v_vor_x,v_vor_y
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,0,22.79645297,6.20866756
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,1,17.63464264,3.41230187
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,2,20.27639318,34.29191902
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,3,32.15600546,36.60432421
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,4,38.34639812,33.62806739
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,5,22.79645297,6.20866756
0.0,1,1,22,None,None,578478,794888,2257663,3,35179.0,5,nan,nan
0.0,1,1,22,33.865,30.240001678466797,578478,794888,2257663,3,35179.0,,,
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,0,46.91696938,29.44801535
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,1,55.37574848,29.5855499
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,2,58.85876401,23.20381766
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,3,57.17455086,21.5228301
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,4,44.14237744,22.03925667
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,5,45.85962774,28.83613332
0.0,1,0,92,None,None,369351,561593,2257663,1,32446.0,5,nan,nan
0.0,1,0,92,52.629,24.28099822998047,369351,561593,2257663,1,32446.0,,,
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,0,65.56965667,33.4292025
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,1,57.23303682,32.43809027
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,2,55.65704152,38.97814049
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,3,60.75304149,44.53251169
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,4,65.14170295,40.77562188
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,5,65.56965667,33.4292025
0.0,1,0,27,None,None,704169,704169,2257663,2,32446.0,5,nan,nan

Finding the area of an overlap between curves (python)

Is it possible to calculate the area of the overlap of two curves?
I found two answers here but they are written in R which I am not familiar with. Or struggling to convert them to python.
Area between the two curves and Find area of overlap between two curves
For example, for a given dataset with defined x, y points. (x1,y1,x2,y2)
I am able to get the area of each curve using :
np.trapz
However, to get the overlap only is challenging and I haven't found a solution to show. Any guidance or maths formulas will be appreciated.

So this can be done using the shapely module within Python.
Firstly, Join the two curves together to create one self-intersecting polygon (shown in code below).
Then using the unary_union() function from shapely, you will:
Split the complex polygon into seperate simple polygons.
Find the area of each simple polygon.
Sum it to find the overall area of the two curves.
Full code shown below:
import numpy as np
from shapely.geometry import LineString
from shapely.ops import unary_union, polygonize
avg_coords = [(0.0, 0.0), (4.872117, 2.29658), (5.268545, 2.4639225), (5.664686, 2.6485724), (6.059776, 2.8966842), (6.695151, 3.0986626), (7.728006, 3.4045217), (8.522297, 3.652668), (9.157002, 3.895031), (10.191483, 4.1028132), (10.827622, 4.258638), (11.38593, 4.2933016), (11.86478, 4.3048816), (12.344586, 4.258769), (12.984073, 4.2126703), (13.942729, 4.1781383), (14.58212, 4.137809), (15.542498, 3.99943), (16.502588, 3.878359), (17.182951, 3.7745714), (18.262657, 3.6621647), (19.102558, 3.567045), (20.061789, 3.497897), (21.139917, 3.4806826), (22.097425, 3.5153809), (23.65388, 3.5414772), (24.851482, 3.541581), (26.04966, 3.507069), (27.72702, 3.463945), (28.925198, 3.429433), (29.883854, 3.3949006), (31.08246, 3.3344274), (31.92107, 3.317192), (33.716183, 3.3952322), (35.63192, 3.4213595), (37.427895, 3.4474766), (39.343628, 3.473604), (41.49874, 3.508406), (43.773468, 3.5518723), (46.287716, 3.595359), (49.28115, 3.6302335), (52.633293, 3.6997545), (54.30922, 3.7431688), (55.8651, 3.8038807), (58.738773, 3.8387446), (60.893887, 3.8735466), (63.647655, 3.9170544), (66.760704, 3.960593), (68.79663, 3.9607692), (70.23332, 3.986855), (72.867905, 3.995737), (75.38245, 4.0219164), (77.778656, 3.9615464), (79.337975, 3.8145657), (80.41826, 3.6675436), (80.899734, 3.5204697), (81.62059, 3.38207), (82.34045, 3.3042476), (83.30039, 3.1918304), (84.38039, 3.062116), (84.50359, 2.854434), (83.906364, 2.7591898), (83.669716, 2.586092), (83.43435, 2.3351095), (83.19727, 2.1879735), (82.84229, 1.9283267), (82.48516, 1.7984879), (81.65014, 1.5993768), (80.454544, 1.4781193), (79.13962, 1.3308897), (77.944595, 1.1750168), (76.39001, 1.0364205), (74.59633, 0.87184185), (71.60447, 0.741775), (70.04903, 0.6551017), (58.3, 0.0)]
model_coords = [(0.0, 0.0), (0.6699889, 0.18807), (1.339894, 0.37499), (2.009583, 0.55966), (2.67915, 0.74106), (3.348189, 0.91826), (4.016881, 1.0904), (4.685107, 1.2567), (5.359344, 1.418), (6.026172, 1.5706), (6.685472, 1.714), (7.350604, 1.8508), (8.021434, 1.9803), (8.684451, 2.0996), (9.346408, 2.2099), (10.0066, 2.311), (10.66665, 2.4028), (11.32436, 2.4853), (11.98068, 2.5585), (12.6356, 2.6225), (13.29005, 2.6775), (13.93507, 2.7232), (14.58554, 2.7609), (15.23346, 2.7903), (15.87982, 2.8116), (16.52556, 2.8254), (17.16867, 2.832), (17.80914, 2.8317), (18.44891, 2.825), (19.08598, 2.8124), (19.72132, 2.7944), (20.35491, 2.7713), (20.98673, 2.7438), (21.61675, 2.7121), (22.24398, 2.677), (22.86939, 2.6387), (23.49297, 2.5978), (24.1147, 2.5548), (24.73458, 2.51), (25.3526, 2.464), (25.96874, 2.4171), (26.58301, 2.3697), (27.1954, 2.3223), (27.80491, 2.2751), (28.41354, 2.2285), (29.02028, 2.1829), (29.62512, 2.1384), (30.22809, 2.0954), (30.82917, 2.0541), (31.42837, 2.0147), (32.02669, 1.9775), (32.62215, 1.9425), (33.21674, 1.9099), (33.80945, 1.8799), (34.40032, 1.8525), (34.98933, 1.8277), (35.5765, 1.8058), (36.16283, 1.7865), (36.74733, 1.7701), (37.33002, 1.7564), (37.91187, 1.7455), (38.49092, 1.7372), (39.06917, 1.7316), (39.64661, 1.7285), (40.22127, 1.7279), (40.79514, 1.7297), (41.36723, 1.7337), (41.93759, 1.7399), (42.50707, 1.748), (43.07386, 1.7581), (43.63995, 1.7699), (44.20512, 1.7832), (44.76772, 1.7981), (45.3295, 1.8143), (45.88948, 1.8318), (46.44767, 1.8504), (47.00525, 1.8703), (47.55994, 1.8911), (48.11392, 1.9129), (48.6661, 1.9356), (49.21658, 1.959), (49.76518, 1.9832), (50.31305, 2.0079), (50.85824, 2.033), (51.40252, 2.0586), (51.94501, 2.0845), (52.48579, 2.1107), (53.02467, 2.1369), (53.56185, 2.1632), (54.09715, 2.1895), (54.63171, 2.2156), (55.1634, 2.2416), (55.69329, 2.2674), (56.22236, 2.2928), (56.74855, 2.3179), (57.27392, 2.3426), (57.7964, 2.3668), (58.31709, 2.3905), (58.83687, 2.4136), (59.35905, 2.4365), (59.87414, 2.4585), (60.38831, 2.4798), (60.8996, 2.5006), (61.40888, 2.5207), (61.91636, 2.5401), (62.42194, 2.5589), (62.92551, 2.577), (63.42729, 2.5945), (63.92607, 2.6113), (64.42384, 2.6275), (64.91873, 2.643), (65.4127, 2.658), (65.90369, 2.6724), (66.39266, 2.6862), (66.87964, 2.6995), (67.36373, 2.7123), (67.84679, 2.7246), (68.32689, 2.7364), (68.80595, 2.7478), (69.28194, 2.7588), (69.756, 2.7695), (70.22709, 2.7798), (70.69707, 2.7898), (71.16405, 2.7995), (71.62902, 2.809), (72.0919, 2.8183), (72.55277, 2.8273), (73.01067, 2.8362), (73.46734, 2.845), (73.92112, 2.8536), (74.37269, 2.8622), (74.82127, 2.8706), (75.26884, 2.8791), (75.71322, 2.8875), (76.15559, 2.8958), (76.59488, 2.9042), (77.03304, 2.9126), (77.46812, 2.921), (77.90111, 2.9294), (78.33199, 2.9379), (78.75986, 2.9464), (79.18652, 2.955), (79.60912, 2.9637), (80.03049, 2.9724), (80.44985, 2.9811), (80.86613, 2.99), (81.2802, 2.9989), (81.69118, 3.0078), (82.10006, 3.0168), (82.50674, 3.0259), (82.91132, 3.035), (83.31379, 3.0441), (83.71307, 3.0533), (84.10925, 3.0625), (84.50421, 3.0717), (84.8961, 3.0809), (85.28577, 3.0901), (85.67334, 3.0993), (86.05771, 3.1085), (86.43989, 3.1176), (86.81896, 3.1267), (87.19585, 3.1358), (87.57063, 3.1448), (87.94319, 3.1537), (88.31257, 3.1626), (88.67973, 3.1713), (89.04372, 3.18), (89.40659, 3.1886), (89.7652, 3.197), (90.12457, 3.2053), (90.47256, 3.2135), (90.82946, 3.2216), (91.17545, 3.2295), (91.52045, 3.2373), (91.86441, 3.2449), (92.20641, 3.2524), (92.54739, 3.2597), (92.88728, 3.2669), (93.21538, 3.2739), (93.55325, 3.2807), (93.87924, 3.2874), (94.20424, 3.2939), (94.52822, 3.3002), (94.85012, 3.3064), (95.16219, 3.3123), (95.48208, 3.3182), (95.79107, 3.3238), (96.09807, 3.3293), (96.40505, 3.3346), (96.71003, 3.3397), (97.01401, 3.3447), (97.31592, 3.3496), (97.60799, 3.3542), (97.90789, 3.3587), (98.19686, 3.3631), (98.48386, 3.3673), (98.77085, 3.3714), (99.05574, 3.3753), (99.32983, 3.3791), (99.6127, 3.3828), (99.8837, 3.3863), (100.1538, 3.3897), (100.4326, 3.393), (100.6897, 3.3961), (100.9566, 3.3991), (101.2215, 3.402), (101.4756, 3.4048), (101.7375, 3.4075), (101.9885, 3.4101), (102.2385, 3.4126), (102.4875, 3.4149), (102.7354, 3.4172), (102.9714, 3.4194), (103.2163, 3.4214), (103.4493, 3.4234), (103.6823, 3.4253), (103.9133, 3.4271), (104.1433, 3.4288), (104.3712, 3.4304), (104.5882, 3.4319), (104.8141, 3.4333), (105.0291, 3.4346), (105.2421, 3.4358), (105.4541, 3.437), (105.6651, 3.438), (105.8751, 3.439), (106.083, 3.4399), (106.28, 3.4407), (106.4759, 3.4414), (106.6699, 3.442), (106.8629, 3.4425), (107.0549, 3.443), (107.2458, 3.4433), (107.4249, 3.4435), (107.6128, 3.4437), (107.7897, 3.4438), (107.9647, 3.4437), (108.1387, 3.4436), (108.3116, 3.4433), (108.4737, 3.443), (108.6436, 3.4426), (108.8027, 3.4421), (108.9706, 3.4414), (109.1265, 3.4407), (109.2814, 3.4399), (109.4255, 3.439), (109.5784, 3.4379), (109.7195, 3.4368), (109.8694, 3.4356), (110.0084, 3.4342), (110.1454, 3.4328), (110.2813, 3.4313), (110.4162, 3.4296), (110.5403, 3.4279), (110.6722, 3.426), (110.7932, 3.424), (110.9132, 3.422), (111.0322, 3.4198), (111.1492, 3.4175), (111.2651, 3.4151), (111.3701, 3.4127), (111.483, 3.4101), (111.585, 3.4074), (111.686, 3.4046), (111.786, 3.4017), (111.884, 3.3987), (111.9809, 3.3956), (112.0669, 3.3924), (112.1608, 3.3891), (112.2448, 3.3857), (112.3268, 3.3822), (112.4078, 3.3786), (112.4867, 3.3749), (112.5548, 3.3711), (112.6317, 3.3672), (112.6978, 3.3632), (112.7726, 3.3591), (112.8356, 3.3549), (112.8975, 3.3506), (112.9476, 3.3462), (113.0076, 3.3417), (113.0655, 3.3372), (113.1125, 3.3325), (113.1584, 3.3278), (113.2024, 3.3229), (113.2464, 3.318), (113.2884, 3.313), (113.3283, 3.3079), (113.3584, 3.3027), (113.3963, 3.2974), (113.4233, 3.292), (113.4492, 3.2865), (113.4742, 3.281), (113.4972, 3.2753), (113.5201, 3.2696), (113.5312, 3.2638), (113.5501, 3.2579), (113.5591, 3.2519), (113.5661, 3.2459), (113.5721, 3.2397), (113.577, 3.2335), (113.5809, 3.2272), (113.573, 3.2208), (113.5749, 3.2143), (113.5649, 3.2077), (113.5539, 3.2011), (113.5409, 3.1944), (113.5278, 3.1876), (113.5128, 3.1807), (113.4967, 3.1737), (113.4697, 3.1667), (113.4418, 3.1596), (113.4227, 3.1524), (113.3917, 3.145), (113.3597, 3.1375), (113.3266, 3.1298), (113.2827, 3.1218), (113.2475, 3.1136), (113.2016, 3.1051), (113.1635, 3.0964), (113.1155, 3.0873), (113.0655, 3.0779), (113.0144, 3.0683), (112.9525, 3.0583), (112.8994, 3.048), (112.8345, 3.0373), (112.7793, 3.0264), (112.7123, 3.0152), (112.6453, 3.0037), (112.5763, 2.9919), (112.5063, 2.9798), (112.4352, 2.9674), (112.3533, 2.9548), (112.2801, 2.9419), (112.1952, 2.9287), (112.1102, 2.9153), (112.034, 2.9017), (111.9361, 2.8879), (111.8481, 2.8739), (111.7581, 2.8597), (111.667, 2.8453), (111.5661, 2.8307), (111.473, 2.816), (111.3689, 2.801), (111.2639, 2.786), (111.1579, 2.7708), (111.0509, 2.7555), (110.9428, 2.74), (110.8239, 2.7245), (110.7138, 2.7088), (110.5928, 2.6931), (110.4709, 2.6772), (110.3578, 2.6613), (110.2338, 2.6453), (110.1087, 2.6292), (109.9826, 2.613), (109.8457, 2.5968), (109.7176, 2.5805), (109.5787, 2.5642), (109.4496, 2.5478), (109.3086, 2.5314), (109.1666, 2.5149), (109.0236, 2.4984), (108.8806, 2.4819), (108.7355, 2.4653), (108.5905, 2.4488), (108.4434, 2.4322), (108.2865, 2.4155), (108.1384, 2.3989), (107.9794, 2.3822), (107.8195, 2.3655), (107.6684, 2.3488), (107.5063, 2.3321), (107.3374, 2.3156), (107.1744, 2.2989), (107.0104, 2.2822), (106.8442, 2.2654), (106.6683, 2.2487), (106.5012, 2.232), (106.3242, 2.2152), (106.1452, 2.1985), (105.9662, 2.1818), (105.7862, 2.165), (105.6052, 2.1483), (105.4232, 2.1316), (105.2402, 2.1149), (105.0572, 2.0981), (104.8721, 2.0814), (104.6772, 2.0647), (104.492, 2.048), (104.295, 2.0313), (104.098, 2.0147), (103.9, 1.998), (103.701, 1.9813), (103.502, 1.9647), (103.301, 1.948), (103.1, 1.9314), (102.899, 1.9148), (102.6959, 1.8982), (102.483, 1.8816), (102.2789, 1.865), (102.0649, 1.8484), (101.8588, 1.8318), (101.6428, 1.8153), (101.4268, 1.7988), (101.2098, 1.7822), (100.9918, 1.7657), (100.7728, 1.7492), (100.5538, 1.7328), (100.3338, 1.7163), (100.1128, 1.6999), (99.89169, 1.6834), (99.65978, 1.667), (99.43769, 1.6506), (99.20477, 1.6343), (98.98066, 1.6179), (98.74665, 1.6016), (98.51164, 1.5852), (98.27574, 1.5689), (98.04964, 1.5527), (97.81264, 1.5364), (97.57562, 1.5202), (97.33752, 1.5039), (97.08962, 1.4877), (96.8506, 1.4716), (96.61061, 1.4554), (96.37051, 1.4393), (96.12058, 1.4232), (95.87949, 1.4071), (95.62759, 1.391), (95.38547, 1.375), (95.13258, 1.359), (94.88946, 1.343), (94.63548, 1.3271), (94.38145, 1.3111), (94.12645, 1.2952), (93.87144, 1.2793), (93.61545, 1.2635), (93.35946, 1.2477), (93.10343, 1.2319), (92.84642, 1.2161), (92.58843, 1.2004), (92.33042, 1.1846), (92.07232, 1.169), (91.8034, 1.1533), (91.54331, 1.1377), (91.2744, 1.1221), (91.0133, 1.1065), (90.7434, 1.091), (90.48229, 1.0755), (90.21139, 1.0601), (89.9493, 1.0446), (89.67728, 1.0292), (89.40428, 1.0139), (89.13137, 0.99855), (88.86826, 0.98325), (88.59427, 0.96799), (88.32026, 0.95277), (88.04527, 0.93758), (87.77126, 0.92242), (87.4972, 0.90731), (87.21732, 0.89222), (86.94719, 0.87718), (86.66711, 0.86217), (86.3773, 0.8472), (86.10719, 0.83227), (85.82721, 0.81738), (85.5472, 0.80252), (85.26721, 0.7877), (84.9872, 0.77292), (84.7071, 0.75819), (84.41721, 0.74349), (84.1371, 0.72883), (83.84721, 0.71421), (83.5671, 0.69963), (83.27721, 0.68509), (82.99711, 0.6706), (82.70711, 0.65615), (82.41721, 0.64173), (82.1371, 0.62736), (81.8471, 0.61304), (81.55722, 0.59875), (81.27709, 0.58451), (80.98712, 0.57031), (80.697, 0.55616), (80.39711, 0.54205), (80.10722, 0.52798), (79.8271, 0.51396), (79.53701, 0.49999), (79.23711, 0.48605), (78.9471, 0.47217), (78.65701, 0.45833), (78.3571, 0.44453), (78.06712, 0.43078), (77.77701, 0.41708), (77.4771, 0.40343), (77.18701, 0.38982), (76.8871, 0.37626), (76.59711, 0.36274), (76.30701, 0.34928), (76.0071, 0.33586), (75.7169, 0.32249), (75.4071, 0.30917), (75.11701, 0.29589), (74.8171, 0.28267), (74.52701, 0.26949), (74.22711, 0.25636), (73.937, 0.24329), (73.63691, 0.23026), (73.3271, 0.21728), (73.03699, 0.20436), (72.73712, 0.19148), (72.4469, 0.17865), (72.13712, 0.16588), (71.84701, 0.15315), (71.547, 0.14048), (71.24701, 0.12786), (70.947, 0.11528), (70.64701, 0.10277), (70.3471, 0.090298), (70.05691, 0.077883), (69.74712, 0.06552), (69.457, 0.05321), (69.1569, 0.040952), (68.84709, 0.028747), (68.557, 0.016595), (68.25701, 0.0)]
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in avg_coords:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in model_coords[::-1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in avg_coords[0:1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, to it "closes" the polygon
avg_poly = []
model_poly = []
for xyvalue in avg_coords:
avg_poly.append([xyvalue[0],xyvalue[1]])
for xyvalue in model_coords:
model_poly.append([xyvalue[0],xyvalue[1]])
line_non_simple = LineString(polygon_points)
mls = unary_union(line_non_simple)
Area_cal =[]
for polygon in polygonize(mls):
Area_cal.append(polygon.area)
print(polygon.area)# print area of each section
Area_poly = (np.asarray(Area_cal).sum())
print(Area_poly)#print combined area

If possible, represent your overlap regions as polygons. From there the polygon area is computable by a remarkably concise formula as explained on Paul Bourke's site.
Suppose (x[i], y[i]), i = 0, ..., N, are the polygon vertices, with (x[0], y[0]) = (x[N], y[N]) so that the polygon is closed, and consistently all in clockwise order or all in counter-clockwise order. Then the area is
area = |0.5 * sum_i (x[i] * y[i+1] - x[i+1] * y[i])|
where the sum goes over i = 0, ..., N-1. This is valid even for nonconvex polygons. This formula is essentially the same principle behind how a planimeter works to measure area of an arbitrary two-dimensional shape, a special case of Green's theorem.

If your functions are actually "function" meaning that no vertical lines intersect the functions more than once, then finding the overlaps is the matter of finding zeros.
import numpy as np
import matplotlib.pyplot as plt
dx = 0.01
x = np.arange(-2, 2, dx)
f1 = np.sin(4*x)
f2 = np.cos(4*x)
plt.plot(x, f1)
plt.plot(x, f2)
eps = 1e-1; # threshold of intersection points.
df = f1 - f2
idx_zeros = np.where(abs(df) <= eps)[0]
area = 0
for i in range(len(idx_zeros) - 1):
idx_left = idx_zeros[i]
idx_rite = idx_zeros[i+1]
area += abs(np.trapz(df[idx_left:idx_rite])) * dx
I have assumed areas to be considered positive.
The analytical value for the example I used is
sufficiently close to the computed value (area=2.819). Of course, you can improve this if your grids are finer, and threshold eps smaller.

Finding anomalous values from sinusoidal data

How can I find anomalous values from following data. I am simulating a sinusoidal pattern. While I can plot the data and spot any anomalies or noise in data, but how can I do it without plotting the data. I am looking for simple approaches other than Machine learning methods.
import random
import numpy as np
import matplotlib.pyplot as plt
N = 10 # Set signal sample length
t1 = -np.pi # Simulation begins at t1
t2 = np.pi; # Simulation ends at t2
in_array = np.linspace(t1, t2, N)
print("in_array : ", in_array)
out_array = np.sin(in_array)
plt.plot(in_array, out_array, color = 'red', marker = "o") ; plt.title("numpy.sin()")
Inject random noise
noise_input = random.uniform(-.5, .5); print("Noise : ",noise_input)
in_array[random.randint(0,len(in_array)-1)] = noise_input
print(in_array)
plt.plot(in_array, out_array, color = 'red', marker = "o") ; plt.title("numpy.sin()")
Data with noise

I've thought of the following approach to your problem, since you have only some values that are anomalous in the time vector, it means that the rest of the values have a regular progression, which means that if we gather all the data points in the vector under clusters and calculate the average step for the biggest cluster (which is essentially the pool of values that represent the real deal), then we can use that average to do a triad detection, in a given threshold, over the vector and detect which of the elements are anomalous.
For this we need two functions: calculate_average_step which will calculate that average for the biggest cluster of close values, and then we need detect_anomalous_values which will yield the indexes of the anomalous values in our vector, based on that average calculated earlier.
After we detected the anomalous values, we can go ahead and replace them with an estimated value, which we can determine from our average step value and by using the adjacent points in the vector.
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def calculate_average_step(array, threshold=5):
"""
Determine the average step by doing a weighted average based on clustering of averages.
array: our array
threshold: the +/- offset for grouping clusters. Aplicable on all elements in the array.
"""
# determine all the steps
steps = []
for i in range(0, len(array) - 1):
steps.append(abs(array[i] - array[i+1]))
# determine the steps clusters
clusters = []
skip_indexes = []
cluster_index = 0
for i in range(len(steps)):
if i in skip_indexes:
continue
# determine the cluster band (based on threshold)
cluster_lower = steps[i] - (steps[i]/100) * threshold
cluster_upper = steps[i] + (steps[i]/100) * threshold
# create the new cluster
clusters.append([])
clusters[cluster_index].append(steps[i])
# try to match elements from the rest of the array
for j in range(i + 1, len(steps)):
if not (cluster_lower <= steps[j] <= cluster_upper):
continue
clusters[cluster_index].append(steps[j])
skip_indexes.append(j)
cluster_index += 1 # increment the cluster id
clusters = sorted(clusters, key=lambda x: len(x), reverse=True)
biggest_cluster = clusters[0] if len(clusters) > 0 else None
if biggest_cluster is None:
return None
return sum(biggest_cluster) / len(biggest_cluster) # return our most common average
def detect_anomalous_values(array, regular_step, threshold=5):
"""
Will scan every triad (3 points) in the array to detect anomalies.
array: the array to iterate over.
regular_step: the step around which we form the upper/lower band for filtering
treshold: +/- variation between the steps of the first and median element and median and third element.
"""
assert(len(array) >= 3) # must have at least 3 elements
anomalous_indexes = []
step_lower = regular_step - (regular_step / 100) * threshold
step_upper = regular_step + (regular_step / 100) * threshold
# detection will be forward from i (hence 3 elements must be available for the d)
for i in range(0, len(array) - 2):
a = array[i]
b = array[i+1]
c = array[i+2]
first_step = abs(a-b)
second_step = abs(b-c)
first_belonging = step_lower <= first_step <= step_upper
second_belonging = step_lower <= second_step <= step_upper
# detect that both steps are alright
if first_belonging and second_belonging:
continue # all is good here, nothing to do
# detect if the first point in the triad is bad
if not first_belonging and second_belonging:
anomalous_indexes.append(i)
# detect the last point in the triad is bad
if first_belonging and not second_belonging:
anomalous_indexes.append(i+2)
# detect the mid point in triad is bad (or everything is bad)
if not first_belonging and not second_belonging:
anomalous_indexes.append(i+1)
# we won't add here the others because they will be detected by
# the rest of the triad scans
return sorted(set(anomalous_indexes)) # return unique indexes
if __name__ == "__main__":
N = 10 # Set signal sample length
t1 = -np.pi # Simulation begins at t1
t2 = np.pi; # Simulation ends at t2
in_array = np.linspace(t1, t2, N)
# add some noise
noise_input = random.uniform(-.5, .5);
in_array[random.randint(0, len(in_array)-1)] = noise_input
noisy_out_array = np.sin(in_array)
# display noisy sin
plt.figure()
plt.plot(in_array, noisy_out_array, color = 'red', marker = "o");
plt.title("noisy numpy.sin()")
# detect anomalous values
average_step = calculate_average_step(in_array)
anomalous_indexes = detect_anomalous_values(in_array, average_step)
# replace anomalous points with an estimated value based on our calculated average
for anomalous in anomalous_indexes:
# try forward extrapolation
try:
in_array[anomalous] = in_array[anomalous-1] + average_step
# else try backwward extrapolation
except IndexError:
in_array[anomalous] = in_array[anomalous+1] - average_step
# generate sine wave
out_array = np.sin(in_array)
plt.figure()
plt.plot(in_array, out_array, color = 'green', marker = "o");
plt.title("cleaned numpy.sin()")
plt.show()
Noisy sine:
Cleaned sine:

Your problem relies in the time vector (which is of 1 dimension). You will need to apply some sort of filter on that vector.
First thing that came to mind was medfilt (median filter) from scipy and it looks something like this:
from scipy.signal import medfilt
l1 = [0, 10, 20, 30, 2, 50, 70, 15, 90, 100]
l2 = medfilt(l1)
print(l2)
the output of this will be:
[ 0. 10. 20. 20. 30. 50. 50. 70. 90. 90.]
the problem with this filter though is that if we apply some noise values to the edges of the vector like [200, 0, 10, 20, 30, 2, 50, 70, 15, 90, 100, -50] then the output would be something like [ 0. 10. 10. 20. 20. 30. 50. 50. 70. 90. 90. 0.] and obviously this is not ok for the sine plot since it will produce the same artifacts for the sine values array.
A better approach to this problem is to treat the time vector as an y output and it's index values as the x input and do a linear regression on the "time linear function", not the quotes, it just means we're faking the 2 dimensional model by applying a fake X vector. The code implies the use of scipy's linregress (linear regression) function:
from scipy.stats import linregress
l1 = [5, 0, 10, 20, 30, -20, 50, 70, 15, 90, 100]
l1_x = range(0, len(l1))
slope, intercept, r_val, p_val, std_err = linregress(l1_x, l1)
l1 = intercept + slope * l1_x
print(l1)
whose output will be:
[-10.45454545 -1.63636364 7.18181818 16. 24.81818182
33.63636364 42.45454545 51.27272727 60.09090909 68.90909091
77.72727273]
Now let's apply this to your time vector.
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
N = 20
# N = 10 # Set signal sample length
t1 = -np.pi # Simulation begins at t1
t2 = np.pi; # Simulation ends at t2
in_array = np.linspace(t1, t2, N)
# add some noise
noise_input = random.uniform(-.5, .5);
in_array[random.randint(0, len(in_array)-1)] = noise_input
# apply filter on time array
in_array_x = range(0, len(in_array))
slope, intercept, r_val, p_val, std_err = linregress(in_array_x, in_array)
in_array = intercept + slope * in_array_x
# generate sine wave
out_array = np.sin(in_array)
print("OUT ARRAY")
print(out_array)
plt.plot(in_array, out_array, color = 'red', marker = "o") ; plt.title("numpy.sin()")
plt.show()
the output will be:
the resulting signal will be an approximation of the original, as it is with any form of extrapolation/interpolation/regression filtering.

Splitting self-intersecting polygon only returned one polygon in Shapely

I am using Python 3.5 64 bit in Windows 7 64 bit, shapely version 1.5.13.
I have the following code that returned me a self-intersecting polygon:
import numpy as np
from shapely.geometry import Polygon, MultiPolygon
import matplotlib.pyplot as plt
x = np.array([ 0.38517325, 0.40859912, 0.43296919, 0.4583215 , 0.4583215 ,
0.43296919, 0.40859912, 0.38517325, 0.36265506, 0.34100929])
y = np.array([ 62.5 , 56.17977528, 39.39698492, 0. ,
0. , 17.34605377, 39.13341671, 60.4180932 ,
76.02574417, 85.47008547])
polygon = Polygon(np.c_[x, y])
plt.plot(*polygon.exterior.xy)
This is correct. Then I tried to obtain the two individual polygons by using buffer(0):
split_polygon = polygon.buffer(0)
plt.plot(*polygon.exterior.xy)
print(type(split_polygon))
plt.fill(*split_polygon.exterior.xy)
Unfortunately, it only returned of the the two polygons:
Could anyone please help? Thanks!

The first step is to close the LineString to make a LinearRing, which is what Polygons are made of.
from shapely.geometry import LineString, MultiPolygon
from shapely.ops import polygonize, unary_union
# original data
ls = LineString(np.c_[x, y])
# closed, non-simple
lr = LineString(ls.coords[:] + ls.coords[0:1])
lr.is_simple # False
However, note that it is non-simple, since the lines cross to make a bow-tie. (The widely used buffer(0) trick usually does not work for fixing bow-ties in my experience). This is unsuitable for a LinearRing, so it needs further work. Make it simple and MultiLineString with unary_union:
mls = unary_union(lr)
mls.geom_type # MultiLineString'
Then use polygonize to find the Polygons from the linework:
for polygon in polygonize(mls):
print(polygon)
Or if you want one MultiPolygon geometry:
mp = MultiPolygon(list(polygonize(mls)))

I struggled with this for a while still in 2020, and finally just wrote a method that cleans up self intersections.
This requires Shapely v 1.2.1 explain_validity() method to work.
def clean_bowtie_geom(base_linearring):
base_polygon = Polygon(base_linearring)
invalidity = explain_validity(base_polygon)
invalid_regex = re.compile('^(Self-intersection)[[](.+)\s(.+)[]]$')
match = invalid_regex.match(invalidity)
if match:
groups = match.groups()
intersect_point = (float(groups[1]), float(groups[2]))
new_linring_coords1 = []
new_linring_coords2 = []
pop_new_linring = False
for i in range(0, len(base_linearring.coords)):
if i == len(base_linearring.coords) - 1:
end_point = base_linearring.coords[0]
else:
end_point = base_linearring.coords[i + 1]
start_point = base_linearring.coords[i]
if not pop_new_linring:
if is_point_on_line_and_between(start=start_point, end=end_point, pt=intersect_point):
new_linring_coords2.append(intersect_point)
new_linring_coords1.append(intersect_point)
pop_new_linring = True
else:
new_linring_coords1.append(start_point)
else:
new_linring_coords2.append(start_point)
if is_point_on_line_and_between(start=start_point, end=end_point, pt=intersect_point):
new_linring_coords2.append(intersect_point)
pop_new_linring = False
corrected_linear_ring1 = LinearRing(coordinates=new_linring_coords1)
corrected_linear_ring2 = LinearRing(coordinates=new_linring_coords2)
polygon1 = Polygon(corrected_linear_ring1)
polygon2 = Polygon(corrected_linear_ring2)
def is_point_on_line_and_between(start, end, pt, tol=0.0005):
"""
Checks to see if pt is directly in line and between start and end coords
:param start: list or tuple of x, y coordinates of start point of line
:param end: list or tuple of x, y coordinates of end point of line
:param pt: list or tuple of x, y coordinates of point to check if it is on the line
:param tol: Tolerance for checking if point on line
:return: True if on the line, False if not on the line
"""
v1 = (end[0] - start[0], end[1] - start[1])
v2 = (pt[0] - start[0], pt[1] - start[1])
cross = cross_product(v1, v2)
if cross <= tol:
# The point lays on the line, but need to check if in between
if ((start[0] <= pt[0] <= end[0]) or (start[0] >= pt[0] >= end[0])) and ((start[1] <= pt[1] <= end[1]) or (start[1] >= pt[1] >= end[1])):
return True
return False
This is not the cleanest code, but it gets the job done for me.
Input is a LinearRing with self intersecting geometry (is_simple=False) and output can be either 2 LinearRings, or Two Polygons, whichever you prefer (or have condition to pick one or the other, the world is your oyster, really).
EDIT
In Shapely 1.8.0, new function added.
shapely.validation.make_valid() will take a self intersecting Polygon and return a MultiPolygon with each polygon created by splitting at the self intersection point(s).

How to generate a fractal graph of a market in python

I wish to generate this in python:
http://classes.yale.edu/fractals/RandFrac/Market/TradingTime/Example1/Example1.html
but I'm incredibly stuck and new to this concept. Does anybody know of a library or gist for this?
Edit:
From what I can understand is that you need to split the fractal in 2 every time. So you have to calculate the y-axis point from the line between the two middle points. Then the two sections need to be formed according to the fractal?

Not 100% sure what you are asking, but as I understood from your comments, you want to generate a realistically looking stock market curve using the recursion described in the link.
As far as I understood the description in the linked page and some of the parent pages, it works like this:
You are given a start and an end point and a number of turning points in the form (t1, v1), (t2, v2), etc., for example start=(0,0), end=(1,1), turns = [(1/4, 1/2), (3/4, 1/4)], where ti and vi are fractions between 0 and 1.
You determine the actual turning points scaled to that interval between start and end and calculate the differences between those points, i.e. how far to go from pi to reach pi+1.
You shuffle those segments to introduce some randomness; when put together, they still cover exactly the same distance, i.e. they connect the original start and end point.
Repeat by recursively calling the function for the different segments between the new points.
Here's some Python code I just put together:
from __future__ import division
from random import shuffle
def make_graph(depth, graph, start, end, turns):
# add points to graph
graph.add(start)
graph.add(end)
if depth > 0:
# unpack input values
fromtime, fromvalue = start
totime, tovalue = end
# calcualte differences between points
diffs = []
last_time, last_val = fromtime, fromvalue
for t, v in turns:
new_time = fromtime + (totime - fromtime) * t
new_val = fromvalue + (tovalue - fromvalue) * v
diffs.append((new_time - last_time, new_val - last_val))
last_time, last_val = new_time, new_val
# add 'brownian motion' by reordering the segments
shuffle(diffs)
# calculate actual intermediate points and recurse
last = start
for segment in diffs:
p = last[0] + segment[0], last[1] + segment[1]
make_graph(depth - 1, graph, last, p, turns)
last = p
make_graph(depth - 1, graph, last, end, turns)
from matplotlib import pyplot
depth = 8
graph = set()
make_graph(depth, graph, (0, 0), (1, 1), [(1/9, 2/3), (5/9, 1/3)])
pyplot.plot(*zip(*sorted(graph)))
pyplot.show()
And here some example output:

I had a similar interest and developed a python3 library to do just what you want.
pip install fractalmarkets
See https://github.com/hyperstripe50/fractal-market-analysis/blob/master/README.md

Using #tobias_k solution and pandas, we can translate and scale the normalized fractal to a time-based one.
import arrow
import pandas as pd
import time
depth = 5
# the "geometry" of fractal
turns = [
(1 / 9, 0.60),
(5 / 9, 0.30),
(8 / 9, 0.70),
]
# select start / end time
t0 = arrow.now().floor("hours")
t1 = t0.shift(days=5)
start = (pd.to_datetime(t0._datetime), 1000)
end = (pd.to_datetime(t1._datetime), 2000)
# create a non-dimensionalized [0,0]x[1,1] Fractal
_start, _end = (0, 0), (1, 1)
graph = set()
make_graph(depth, graph, _start, _end, turns)
# just check graph length
assert len(graph) == (len(turns) + 1) ** depth + 1
# create a pandas dataframe from the normalized Fractal
df = pd.DataFrame(graph)
df.sort_values(0, inplace=True)
df.reset_index(drop=True, inplace=True)
# translate to real coordinates
X = pd.DataFrame(
data=[(start[0].timestamp(), start[1]), (end[0].timestamp(), end[1])]
).T
delta = X[1] - X[0]
Y = df.mul(delta) + X[0]
Y[0] = [*map(lambda x: pd.to_datetime(x, unit="s"), Y[0])]
# now resample and interpolate data according to *grid* size
grid ="min"
Z = Y.set_index(0)
A = Z.resample(grid).mean().interpolate()
# plot both graph to check errors
import matplotlib.pyplot as plt
ax = Z.plot()
A.plot(ax=ax)
plt.show()
showing both graphs:
and zooming to see interpolation and snap-to-grid differences:

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Speed up Python cKDTree - python

Related

Several unintended lines when attempting to create voronoi diagram given scatter point locations

Finding the area of an overlap between curves (python)

Finding anomalous values from sinusoidal data

Splitting self-intersecting polygon only returned one polygon in Shapely

How to generate a fractal graph of a market in python

Categories

Resources