Counterclockwise sorting of x, y data - python

I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that by using the polar coordinates: I calculate the polar angle of each point (x,y) then sort by the ascending angles, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately I don't get the expected results in random_shape_formated.dat! The points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected resutls:
Create a new file named: filename_formatted.dat that contains the sorted data according to the image above (The first line contains the starting point, the next lines contain the points as shown by the blue arrows in counterclockwise direction in the image).
EDIT 2: The xy data added here instead of using github gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108

I find that an easy way to sort points with x,y-coordinates like that is to sort them dependent on the angle between the line from the points and the center of mass of the whole polygon and the horizontal line which is called alpha in the example. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle using numpy.arccos for instance. When y-y0 is larger than 0 you take the angle directly, otherwise you subtract the angle from 360° (2𝜋). I have used numpy.where for the calculation of the angle and then numpy.argsort to produce a mask for indexing the initial x,y-values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from any other point you could add an offset angle for that. In your case that would be zero though.
def sort_xy(x, y):
x0 = np.mean(x)
y0 = np.mean(y)
r = np.sqrt((x-x0)**2 + (y-y0)**2)
angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
mask = np.argsort(angles)
x_sorted = x[mask]
y_sorted = y[mask]
return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (points are obvisously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:

If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
dist2 = lambda a,b: (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y)) # get the list of coordinate pairs
z.sort() # sort by x coordinate
cw = z[0:1] # first point in clockwise direction
ccw = z[1:2] # first point in counter clockwise direction
# reverse the above assignment depending on how first 2 points relate
if z[1][1] > z[0][1]:
cw = z[1:2]
ccw = z[0:1]
for p in z[2:]:
# append to the list to which the next point is closest
if dist2(cw[-1], p) < dist2(ccw[-1], p):
cw.append(p)
else:
ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y-coordinate, for which an angle-look-around from some central point would fail, like here:
No assumption is made about the range of the X nor of the Y coordinate: like for instance, the curve does not necessarily have to cross the X axis (Y = 0) for this to work.

Counter-clock-wise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]

Not really a python question I think, but still I think you could try sorting by - sign(y) * x doing something like:
def counter_clockwise_sort(points):
return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
should work fine, assuming you read your points properly into a list of dicts of format {'x': 0.12312, 'y': 0.912}
EDIT: This will work as long as you cross the X axis only twice, like in your example.

If:
the shape is arbitrarily complex and
the point spacing is ~random
then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver. In particular, I used the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order to the points, I don't think it will be too hard to decide on a clockwise vs clockwise ordering.

If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y),-(x-center_x)) (we're flipping the x axis).
We get the following, with a gradient colored scatter to prove correctness (fyi I removed the first line of the dat file after downloading):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
#oneliner for ordering points (transform, adjust for 0 to 2pi, argsort, index at points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the colormap 1 is white and 0 is black), they're numbered in the 0-1 range by order:

For points with comparable distances between their neighbouring pts, we can use KDTree to get two closest pts for each pt. Then draw lines connecting those to give us a closed shape contour. Then, we will make use of OpenCV's findContours to get contour traced always in counter-clockwise manner. Now, since OpenCV works on images, we need to sample data from the provided float format to uint8 image format. Given, comparable distances between two pts, that should be pretty safe. Also, OpenCV handles it well to make sure it traces even sharp corners in curvatures, i.e. smooth or not-smooth data would work just fine. And, there's no pivot requirement, etc. As such all kinds of shapes would be good to work with.
Here'e the implementation -
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage.morphology import binary_fill_holes
def counter_clockwise_order(a, DEBUG_PLOT=False):
b = a-a.min(0)
d = pdist(b).min()
c = np.round(2*b/d).astype(int)
img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
d1,d2 = cKDTree(c).query(c,k=3)
b = c[d2]
p1,p2,p3 = b[:,0],b[:,1],b[:,2]
for i in range(len(b)):
cv2.line(img,tuple(p1[i]),tuple(p2[i]),255,1)
cv2.line(img,tuple(p1[i]),tuple(p3[i]),255,1)
img = (binary_fill_holes(img==255)*255).astype(np.uint8)
if int(cv2.__version__.split('.')[0])>=3:
_,contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
else:
contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cont = contours[0][:,0]
f1,f2 = cKDTree(cont).query(c,k=1)
ordered_points = a[f2.argsort()[::-1]]
if DEBUG_PLOT==1:
NPOINTS = len(ordered_points)
for i in range(NPOINTS):
plt.plot(ordered_points[i:i+2,0],ordered_points[i:i+2,1],alpha=float(i)/(NPOINTS-1),color='k')
plt.show()
return ordered_points
Sample run -
# Load data in a 2D array with 2 columns
a = np.loadtxt('random_shape.csv',delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
Output -

Related

generate a occupancy map from x, y coordinates of an irregular shape, script dying from SIGKILL

I have a number of CSV files with x, y, and z coordinates. These coordinates are not long/lat, but rather a distance from an origin. So within the CSV, there is a 0,0 origin, and all other x, y locations are a distance from that origin point in meters.
The x, and y values will be both negative and positive float values. The largest file I have is ~1.4 million data points, the smallest is ~20k.
The files represent an irregular shaped map of sorts. The distance values will not produce a uniform shape such as a rectangle, circle, etc. I need to generate a bounding box that fits the most area within the values that are contained within the csv files.
logically, here are the steps I want to take.
Read the points from the file
Get the minimum and maximum x coordinates
get the minimum and maximum y coordinates.
Use min/max coordinates to get a bounding rectangle with (xmin,ymin), (xmax,ymin), (xmin,ymax) and (xmax,ymax) that will contain the entirety of the values of the CSV file.
Create a grid across that rectangle with a 1 m resolution. Set that grid as a boolean array for the occupancy.
Round the map coordinates to the nearest integer.
For every rounded map coordinate switch the occupancy to True.
Use a morphological filter to erode the edges of the occupancy map.
Now when a point is selected check the nearest integer value and whether it falls within the occupancy map.
I'm facing multiple issues, but thus far my biggest issue is memory resources. for some reason this script keeps dying with a SIGKILL, or at least I think that is what is occuring.
class GridBuilder:
"""_"""
def __init__(self, filename, search_radius) -> None:
"""..."""
self.filename = filename
self.search_radius = search_radius
self.load_map()
self.process_points()
def load_map(self):
"""..."""
data = np.loadtxt(self.filename, delimiter=",")
self.x_coord = data[:, 0]
self.y_coord = data[:, 1]
self.z_coord = data[:, 2]
def process_points(self):
"""..."""
min_x = math.floor(np.min(self.x_coord))
min_y = math.floor(np.min(self.y_coord))
max_x = math.floor(np.max(self.x_coord))
max_y = math.floor(np.max(self.y_coord))
int_x_coord = np.floor(self.x_coord).astype(np.int32)
int_y_coord = np.floor(self.y_coord).astype(np.int32)
x = np.arange(min_x, max_x, 1)
y = np.arange(min_y, max_y, 1)
xx, yy = np.meshgrid(x, y, copy=False)
if __name__ == "__main__":
"""..."""
MAP_FILE_DIR = r"/sample_data"
FILE = "testfile.csv"
fname = os.path.join(MAP_FILE_DIR, FILE)
builder = GridBuilder(fname, 500)
my plan was to take the grid with the coordinates and update each location with a dataclass.
#dataclass
class LocationData:
"""..."""
coord: list
occupied: bool
This identifies the grid location, and if its found within the CSV file map.
I understand this is going to be time consuming process, but I figured this would be my first attempt.
I know Stackoverflow generally dislikes attachements, but I figured it might be useful for a sample dataset of what I'm working with. So I've uploaded a file for sharing. test_file
UPDATE:
the original code utilized itertools to generate a grid for each location. I ended up switching away from itertools, and utilized numpy meshgrid() instead. This caused the same issue, but meshgrid() has a copy parameter that can be set to False to preserver memory resources. this fixed the memory issue.
As others mentioned, your process is probably being killed by OOM killer for using too much memory.
You don't really need to copy the whole file to memory (which is what np.loadtxt does), you also don't need to copy values (which is what data[:, 0] does).
You can read file line by line and calculate min/max values with something like this:
min_x = float("+inf")
max_x = float("-inf")
min_y = float("+inf")
max_y = float("-inf")
with open("/your/file/path", "r") as f:
for row in f:
cols = row.split(",")
x, y = cols[0], cols[1]
min_x = min(x, min_x)
max_x = max(x, max_x)
min_y = min(y, min_y)
max_y = max(y, max_y)
This uses minimal memory.
Ok, so with the aid of several other users I've come up with the solution to my immediate issue.
I solved the memory error by utilizing the numpy meshgrid function, and disabling the copy parameter.
xx, yy = np.meshgrid(x, y, copy=False)
MeshGrid() Documentation
However, the implementation was overkill, and I only needed two lists, an X and Y list that was composed of their respective min and max values spaced at intervals of 1. This allowed me to create the grid map I was looking for, with no memory issues.
def process_points(self):
"""..."""
int_x_coord = np.floor(self.x_coord).astype(np.int32)
int_y_coord = np.floor(self.y_coord).astype(np.int32)
min_x = np.min(int_x_coord)
min_y = np.min(int_y_coord)
max_x = np.max(int_x_coord)
max_y = np.max(int_y_coord)
x = np.arange(min_x, max_x + 1, 1)
y = np.arange(min_y, max_y + 1, 1)
grid = np.zeros(shape=(len(x), len(y)))
for x_coord, y_coord in zip(int_x_coord, int_y_coord):
x_pos = np.where(x == x_coord)[0][0]
y_pos = np.where(y == y_coord)[0][0]
grid[x_pos][y_pos] = 1

replace nested for loops combined with conditions to boost performance

In order to speed up my code I want to exchange my for loops by vectorization or other recommended tools. I found plenty of examples with replacing simple for loops but nothing for replacing nested for loops in combination with conditions, which I was able to comprehend / would have helped me...
With my code I want to check if points (X, Y coordinates) can be connected by lineaments (linear structures). I started pretty simple but over time the code outgrew itself and is now exhausting slow...
Here is an working example of the part taking the most time:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString, Point
from shapely.affinity import rotate
from math import sqrt
from tqdm import tqdm
import random as rng
# creating random array of points
xys = rng.sample(range(201 * 201), 100)
points = [list(divmod(xy, 201)) for xy in xys]
# plot points
plt.scatter(*zip(*points))
# calculate length for rotating lines -> diagonal of bounds so all points able to be reached
length = sqrt(2)*200
# calculate angles to rotate lines
angles = []
for a in range(0, 360, 1):
angle = np.deg2rad(a)
angles.append(angle)
# copy points array to helper array (points_list) so original array is not manipulated
points_list = points.copy()
# array to save final lines
lines = []
# iterate over every point in points array to search for connecting lines
for point in tqdm(points):
# delete point from helper array to speed up iteration -> so points do not get
# double, triple, ... checked
if len(points_list) > 0:
points_list.remove(point)
else:
break
# create line from original point to point at end of line (x+length) - this line
# gets rotated at calculated angles
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# iterate over angle Array to rotate line by each angle
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
lst = list(rot_line.coords)
# save starting point (a) and ending point(b) of rotated line for np.cross()
# (cross product to check if points on/near rotated line)
a = np.asarray(lst[0])
b = np.asarray(lst[1])
# counter to count number of points on/near line
count = 0
line_list = []
# iterate manipulated points_list array (only points left for which there has
# not been a line rotated yet)
for poi in points_list:
# check whether point (pio) is on/near rotated line by calculating cross
# product (np.corss())
p = np.asarray(poi)
cross = np.cross(p-a,b-a)
# check if poi is inside accepted deviation from cross product
if cross > -750 and cross < 750:
# check if more than 5 points (poi) are on/near the rotated line
if count < 5:
line_list.append(poi)
count += 1
# if 5 points are connected by the rotated line sort the coordinates
# of the points and check if the length of the line meets the criteria
else:
line_list = sorted(line_list , key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
break
# use shapeplys' MultiLineString to create lines from coordinates and plot them
# afterwards
multiLines = MultiLineString(lines)
fig, ax = plt.subplots()
ax.set_title("Lines")
for multiLine in MultiLineString(multiLines).geoms:
# print(multiLine)
plt.plot(*multiLine.xy)
As mentioned above it was thinking about using pandas or numpy vectorization and therefore build a pandas df for the points and lines (gdf) and one with the different angles (angles) to rotate the lines:
Name
Type
Size
Value
gdf
DataFrame
(122689, 6)
Column name: x, y, value, start, end, line
angles
DataFrame
(360, 1)
Column name: angle
But I ran out of ideas to replace this nested for loops with conditions with pandas vectorization. I found this article on medium and halfway through the article there are conditions for vectorization mentioned and I was wondering if my code maybe is not suitbale for vectorization because of dependencies within the loops...
If this is right, it does not necessarily needs to be vectoriation everything boosting the performance is welcome!
You can quite easily vectorize the most computationally intensive part: the innermost loop. The idea is to compute the points_list all at once. np.cross can be applied on each lines, np.where can be used to filter the result (and get the IDs).
Here is the (barely tested) modified main loop:
for point in tqdm(points):
if len(points_list) > 0:
points_list.remove(point)
else:
break
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# CHANGED PART
if len(points_list) == 0:
continue
p = np.asarray(points_list)
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
a, b = np.asarray(rot_line.coords)
cross = np.cross(p-a,b-a)
foundIds = np.where((cross > -750) & (cross < 750))[0]
if foundIds.size > 5:
# Similar to the initial part, not efficient, but rarely executed
line_list = p[foundIds][:5].tolist()
line_list = sorted(line_list, key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
This is about 15 times faster on my machine.
Most of the time is spent in the shapely module which is very inefficient (especially rotate and even np.asarray(rot_line.coords)). Indeed, each call to rotate takes about 50 microseconds which is simply insane: it should take no more than 50 nanoseconds, that is, 1000 time faster (actually, an optimized native code should be able to to that in less than 20 ns on my machine). If you want a faster code, then please consider not using this package (or improving its performance).

How can I cut a piece away from a plot and set the point I need to zero?

In my work I have the task to read in a CSV file and do calculations with it. The CSV file consists of 9 different columns and about 150 lines with different values acquired from sensors. First the horizontal acceleration was determined, from which the distance was derived by double integration. This represents the lower plot of the two plots in the picture. The upper plot represents the so-called force data. The orange graph shows the plot over the 9th column of the CSV file and the blue graph shows the plot over the 7th column of the CSV file.
As you can see I have drawn two vertical lines in the lower plot in the picture. These lines represent the x-value, which in the upper plot is the global minimum of the orange function and the intersection with the blue function. Now I want to do the following, but I need some help: While I want the intersection point between the first vertical line and the graph to be (0,0), i.e. the function has to be moved down. How do I achieve this? Furthermore, the piece of the function before this first intersection point (shown in purple) should be omitted, so that the function really only starts at this point. How can I do this?
In the following picture I try to demonstrate how I would like to do that:
If you need my code, here you can see it:
import numpy as np
import matplotlib.pyplot as plt
import math as m
import loaddataa as ld
import scipy.integrate as inte
from scipy.signal import find_peaks
import pandas as pd
import os
# Loading of the values
print(os.path.realpath(__file__))
a,b = os.path.split(os.path.realpath(__file__))
print(os.chdir(a))
print(os.chdir('..'))
print(os.chdir('..'))
path=os.getcwd()
path=path+"\\Data\\1 Fabienne\\Test1\\left foot\\50cm"
print(path)
dataListStride = ld.loadData(path)
indexStrideData = 0
strideData = dataListStride[indexStrideData]
#%%Calculation of the horizontal acceleration
def horizontal(yAngle, yAcceleration, xAcceleration):
a = ((m.cos(m.radians(yAngle)))*yAcceleration)-((m.sin(m.radians(yAngle)))*xAcceleration)
return a
resultsHorizontal = list()
for i in range (len(strideData)):
strideData_yAngle = strideData.to_numpy()[i, 2]
strideData_xAcceleration = strideData.to_numpy()[i, 4]
strideData_yAcceleration = strideData.to_numpy()[i, 5]
resultsHorizontal.append(horizontal(strideData_yAngle, strideData_yAcceleration, strideData_xAcceleration))
resultsHorizontal.insert(0, 0)
#plt.plot(x_values, resultsHorizontal)
#%%
#x-axis "convert" into time: 100 Hertz makes 0.01 seconds
scale_factor = 0.01
x_values = np.arange(len(resultsHorizontal)) * scale_factor
#Calculation of the global high and low points
heel_one=pd.Series(strideData.iloc[:,7])
plt.scatter(heel_one.idxmax()*scale_factor,heel_one.max(), color='red')
plt.scatter(heel_one.idxmin()*scale_factor,heel_one.min(), color='blue')
heel_two=pd.Series(strideData.iloc[:,9])
plt.scatter(heel_two.idxmax()*scale_factor,heel_two.max(), color='orange')
plt.scatter(heel_two.idxmin()*scale_factor,heel_two.min(), color='green')#!
#Plot of force data
plt.plot(x_values[:-1],strideData.iloc[:,7]) #force heel
plt.plot(x_values[:-1],strideData.iloc[:,9]) #force toe
# while - loop to calculate the point of intersection with the blue function
i = heel_one.idxmax()
while strideData.iloc[i,7] > strideData.iloc[i,9]:
i = i-1
# Length calculation between global minimum orange function and intersection with blue function
laenge=(i-heel_two.idxmin())*scale_factor
print(laenge)
#%% Integration of horizontal acceleration
velocity = inte.cumtrapz(resultsHorizontal,x_values)
plt.plot(x_values[:-1], velocity)
#%% Integration of the velocity
s = inte.cumtrapz(velocity, x_values[:-1])
plt.plot(x_values[:-2],s)
I hope it's clear what I want to do. Thanks for helping me!
I didn't dig all the way through your code, but the following tricks may be useful.
Say you have x and y values:
x = np.linspace(0,3,100)
y = x**2
Now, you only want the values corresponding to, say, .5 < x < 1.5. First, create a boolean mask for the arrays as follows:
mask = np.logical_and(.5 < x, x < 1.5)
(If this seems magical, then run x < 1.5 in your interpreter and observe the results).
Then use this mask to select your desired x and y values:
x_masked = x[mask]
y_masked = y[mask]
Then, you can translate all these values so that the first x,y pair is at the origin:
x_translated = x_masked - x_masked[0]
y_translated = y_masked - y_masked[0]
Is this the type of thing you were looking for?

Incorrect results for simple 2D transformation

I'm attempting a 2D transformation using the nudged package.
The code is really simple:
import nudged
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
x, y = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom, ran = (x_d, y_d), (x_r, y_r)
# Obtain transformation dom --> ran
trans = nudged.estimate(dom, ran)
# Apply the transformation to the (x, y) points
x_t, y_t = trans.transform((x, y))
where (x_d, y_d) and (x_r, y_r) are the 1 to 1 correlated "domain" and "range" points, and (x, y) are all the points in the (x_d, y_d) (domain) system that I want to transform to the (x_r, y_r) (range) system.
This is the result I get:
where:
trans.get_matrix()
[[-0.0006459232439068067, -0.0007947429558548157, 6.534164085946009], [0.0007947429558548157, -0.0006459232439068067, 2.515279819707991], [0, 0, 1]]
trans.get_rotation()
2.2532603497070713
trans.get_scale()
0.0010241255796531702
trans.get_translation()
[6.534164085946009, 2.515279819707991]
This is the final transformed dom values with the original ran points overlayed:
This is clearly not right and I can't figure out what I'm doing wrong.
I was able to figure out your issue. It is simply that nudge has somewhat problematic notation, which is poorly documented.
The estimate function accepts a list of coordinate pairs. You effectively have to transpose dom and ran to get this to work. I suggest either switching to numpy arrays, or using list(map(list, zip(...))) to do the transpose.
The Transform.transfom method is extremely restrictive, and requires that the inner pairs be of type list. Not tuple, not any other sequence, but specifically list. Your attempt to call trans.transform((x, y)) only happened to work by pure luck. transform assessed that the first element is not a list, and attempted to transform (x, y) as a pair of integers. Luckily for you, numpy operators are vectorized, so you can process an entire array as a single unit.
Here is a working version of your code that generates the correct plots using mostly python:
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
uni = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom = list(map(list, zip(x_d, y_d)))
ran = list(map(list, zip(x_r, y_r)))
# Obtain transformation dom --> ran
trans = estimate(dom, ran)
# Apply the transformation to the (x, y) points
tra = trans.transform(uni)
fig, ax = plt.subplots(2, 2)
ax[0][0].scatter(x_d, y_d)
ax[0][0].set_title('dom')
ax[0][1].scatter(x_r, y_r)
ax[0][1].set_title('ran')
ax[1][0].scatter(*uni)
ax[1][1].scatter(*tra)
I left in your hack with uni, since I did not feel like converting the array of random values to a nested list. The resulting plot looks like this:
My overall recommendation is to submit a number of bug reports to the nudge library based on these findings.

Applying circular filter to image in Python / Applying function to each element of numpy array

I have a Python code that works, but it's quite slow and I believe there has to be a way of doing this more efficiently.
The idea is to apply a filter to an image. The filter is an average of the points which fall within a specified radius. The input is a mx2 array representing x,y and mx1 array z, representing coordinates of m observation points.
The program that works is the following
import numpy as np
def haversine(point, xy_list):
earth_radius = 6378137.0
dlon = np.radians(xy_list[:,0]) - np.radians(point[0])
dlat = np.radians(xy_list[:,1]) - np.radians(point[1])
a = np.square(np.sin(dlat/2.0)) + np.cos(np.radians(point[0])) * np.cos(np.radians(xy_list[:,0])) * np.square(np.sin(dlon/2.0))
return 2 * earth_radius * np.arcsin(np.sqrt(a))
def circular_filter(xy, z, radius):
filtered = np.zeros(xy.shape[0])
for q in range(xy.shape[0]):
dist = haversine(xy[q,:],xy)
masked_z = np.ma.masked_where(dist>radius, z)
filtered[q] = masked_z.mean()
return filtered
x = np.random.uniform(low=-90, high=0, size=(1000,1)) # x represents longitude
y = np.random.uniform(low=0, high=90, size=(1000,1)) # y represents latitude
xy = np.hstack((x,y))
z = np.random.rand(1000,)
fitered_z = circular_filter(xy, z, radius=100.)
The problem is that I have 6 million points per data set, and the code is horribly slow. There must be a way to do this more efficiently. I thought of using scipy.spatial.distance.cdist() which is fast, but then I'd have to reproject the data to UTM, and I'd like to avoid reprojection. Any suggestions?
Thanks,
Reniel
After a lot of reading an searching I finally found the reason my code took forever to run. It's because I needed to understand and apply the concept of a filter kernel. Basically I realized there was a connection between my problem and this post:
Local Maxima with circular window
The downside: User needs to provide proper EPSG code, but I think I can find workarounds for this later.
The upside: It is very fast and efficient.
What worked for me was converting the lat long to UTM so that we can create a circular kernel and apply generic_filter from scipy.
import numpy as np
from pyproj import Proj, transform
from scipy.ndimage.filters import generic_filter
def circular_filter(tile, radius):
x, y = np.meshgrid(tile['lon'], tile['lat'])
x = x.reshape(x.size)
y = np.flipud(y.reshape(y.size))
z = tile['values'].reshape(tile['values'].size)
wgs84 = Proj(init='epsg:4326')
utm18N = Proj(init='epsg:26918')
x,y = transform(wgs84,utm18N,x,y)
dem_res = np.abs(x[1]-x[0]) # calculates the raster resolution, (original data is geoTiff read using gdal).
radius = int(np.ceil(radius/dem_res)) # user gives meters, but we figure out number of cells
print(radius)
kernel = np.zeros((2*radius+1, 2*radius+1))
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
mask = x**2 + y**2 <= radius**2
kernel[mask] = 1
print('Commence circular filter.'); start = time.time()
tile['values'] = generic_filter(tile['values'], np.mean, footprint=kernel)
print('Took {:.3f} seconds'.format(time.time()-start))
I also took a look at clustering techniques from here: http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
But I realized these clustering techniques serve a completely different purpose.

Categories