Vectorized creation of shapely Polygons from GeoPandas DataFrame

I have a GeoDataFrame with a point geometry.
From the point geometry, I want to define a square polygon geometry in a quite straightforward manner.
Given a point, the point should be the left bottom corner in a square with sides 250 units of length.
I.e., the bottom-left corner is the current point, the bottom-right corner is the current point + 250 on the x axis, and so on.
My naive way of doing this is the following:
Create the corners as new columns in the GeoDataFrame:
After that, I try to define a new column as:
gdf['POLY'] = shapely.Geometry([gdf['BOTTOM_LEFT'], gdf['BOTTOM_RIGHT'], gdf['TOP_LEFT'], gdf['TOP_RIGHT']])
But this returns the following error message:
AttributeError: 'list' object has no attribute '__array_interface__'

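Your implementation is close, but you can't call shapely.geometry.Polygon on whole columns of points. It builds one polygon at a time. So the trick is to use df.apply with axis=1, which calls Polygon on every row of the DataFrame. The corners must be listed in ring order (BOTTOM_LEFT, BOTTOM_RIGHT, TOP_RIGHT, TOP_LEFT). Otherwise the square becomes a self-intersecting bow tie:
import shapely.geometry
gdf['geometry'] = gdf.apply(
    lambda s: shapely.geometry.Polygon(
        [s['BOTTOM_LEFT'], s['BOTTOM_RIGHT'], s['TOP_RIGHT'], s['TOP_LEFT']]
    ),
    axis=1,
)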
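You could also build the square directly from your original point with shapely.affinity.translate (shapely Points have no translate method of their own):
import shapely.affinity
import shapely.geometry
gdf['geometry'] = gdf.apply(
    lambda s: shapely.geometry.Polygon(
        [
            s['POINT'],
            shapely.affinity.translate(s['POINT'], xoff=250),
            shapely.affinity.translate(s['POINT'], xoff=250, yoff=250),
            shapely.affinity.translate(s['POINT'], yoff=250),
        ]
    ),
    axis=1,
)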

Let's assume you have a GeoDataFrame with only a single point. It is called gdf and looks as follows:
X Y geometry
0 5 6 POINT (5.00000 6.00000)
You can access the x and y components of the point using the following lambda function:
# Access the x and y components of the point geometry
X = gdf.geometry.apply(lambda x: x.x)
Y = gdf.geometry.apply(lambda x: x.y)
Now you can create a square object using shapely.geometry.Polygon. You need to specify the four vertices of the square. You can do it using:
gdf_square = shapely.geometry.Polygon([[X[0], Y[0]],
                                       [X[0]+250, Y[0]],
                                       [X[0]+250, Y[0]+250],
                                       [X[0], Y[0]+250]])
This gives a single square Polygon object.
Note that if you have many points in the GeoDataFrame, you need to modify the last step so that it creates the square polygon for the point in each row, one by one.

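In my case it was more than 5 times faster to build the squares using a list comprehension than using GeoDataFrame.apply. Here xs and ys are the coordinates of the bottom-left corners (e.g. gdf.geometry.x and gdf.geometry.y) and d = 250 is the side length. zip pairs each x with its own y. The nested form `for x in xs for y in ys` would instead build a square for every x/y combination, i.e. a grid:
import geopandas as gpd
from shapely.geometry import Polygon
polys = [Polygon(((x, y), (x, y+d), (x+d, y+d), (x+d, y))) for x, y in zip(xs, ys)]
gdf = gpd.GeoDataFrame(geometry=polys)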
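If shapely 2.0 or newer is available, the squares can be built in a genuinely vectorized way. This is an assumption, because the answers above don't say which version they use. shapely.box takes arrays of xmin, ymin, xmax and ymax and returns an array of Polygons. Here is a minimal sketch with made-up corner coordinates:

```python
import numpy as np
import shapely  # assumes shapely >= 2.0, where box() accepts arrays

# Hypothetical bottom-left corners; with a GeoDataFrame these could be
# gdf.geometry.x.to_numpy() and gdf.geometry.y.to_numpy()
xs = np.array([5.0, 100.0, -20.0])
ys = np.array([6.0, 0.0, 40.0])
d = 250.0  # side length of each square

# box(xmin, ymin, xmax, ymax) broadcasts over the arrays and returns a
# NumPy object array with one Polygon per corner, without a Python loop
squares = shapely.box(xs, ys, xs + d, ys + d)
```

Assigning the result back could look like gdf['geometry'] = gpd.GeoSeries(squares, index=gdf.index, crs=gdf.crs). This again assumes gdf and gpd as in the question.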

Related

Python Shapely Point within a Polygon wrong data

I have a function that assigns an id if a point is within a polygon, but it classifies the same shapely point inconsistently. It runs over two DataFrames. The first is poly, which contains the polygons in shapely format (I looked at the polygons and they look correct). The second is df, which contains the start_point in shapely format. When I run the code I get inconsistent results. The dataset I am using is big, over 2 million rows. None of the misclassified points are on the boundary of a polygon.
import numpy as np
import pandas as pd
from shapely.geometry import Point
def inside_polygon(df, polygons):
    result = np.zeros((len(df), 2), dtype=object)
    for polygon in polygons[["fence_id", "polygon", "name"]].itertuples():
        inside = np.array([point.within(polygon.polygon) for point in df["start_point"]])
        result[inside, 0] = polygon.fence_id
        result[inside, 1] = polygon.name
    return pd.DataFrame(result, columns=["fence_id", "name"])
df.loc[:, 'start_point'] = df.apply(lambda row: Point(row['start_long'], row['start_lat']), axis=1)
df["fence_id"] = None
df["name"] = None
df.loc[:, ['fence_id', 'name']] = inside_polygon(df, poly)
Same point, different classification (the point is actually outside the polygon): https://i.stack.imgur.com/VdPzi.png
Can someone help?
I tried both within and contains and got the same results with both. Maybe the issue is in how I link the fence_id in the poly DataFrame to the df DataFrame that contains the points.
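One thing worth ruling out is index alignment. This is a guess, since the data isn't shown. inside_polygon returns a DataFrame with a fresh 0..n-1 index, and assigning a DataFrame through df.loc aligns rows by index label, not by position. If df's index is not exactly 0..n-1 (for example after filtering), points end up with another row's fence_id or with NaN. This small pandas-only sketch uses hypothetical data to show the effect:

```python
import pandas as pd

# Hypothetical df whose index is not 0..n-1 (e.g. after filtering rows)
df = pd.DataFrame({"start_long": [1.0, 2.0, 3.0]}, index=[0, 2, 5])
df["fence_id"] = None

# inside_polygon builds its result positionally, so it gets a RangeIndex 0..n-1
result = pd.DataFrame({"fence_id": ["A", "B", "C"]})

# Assigning a DataFrame through .loc aligns on index labels, not positions:
# label 2 receives result's row 2 ("C"), and label 5 has no match (NaN)
misaligned = df.copy()
misaligned.loc[:, ["fence_id"]] = result

# Assigning a plain array keeps the positional row order instead
aligned = df.copy()
aligned.loc[:, ["fence_id"]] = result.to_numpy()
```

If this is the cause, there are two ways to keep each label attached to its own point. You can build the result with pd.DataFrame(result, columns=["fence_id", "name"], index=df.index). Alternatively, you can assign inside_polygon(df, poly).to_numpy().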

shapely interpolate in three dimensions returns Point Z but invalid results

I am trying to use interpolate along a three dimensional line. However, any changes in the Z axis are not taken into account by .interpolate.
LineString([(0, 0, 0), (0, 0, 1), (0, 0, 2)]).interpolate(1, normalized=True).wkt
'POINT Z (0 0 0)'
vs
LineString([(0, 0, 0), (0, 1, 0), (0, 2, 0)]).interpolate(1, normalized=True).wkt
'POINT Z (0 2 0)'
I read the documentation and it is silent on 3D lines or the restriction is documented at a higher level than the interpolate documentation.
Is this a bug? I can't believe I'm the first person to try this.
Assuming that there is no direct way to accomplish this, any suggestions for doing my own interpolation?

That does indeed seem like a bug in shapely. I looked into the source code a little, and I'm willing to bet it's an upstream issue with PyGEOS.
Anyway, here's a small implementation I put together:
import numpy as np
import shapely.geometry
import geopandas as gpd  # Only necessary for the examples, not the actual function
def my_interpolate(input_line, input_dist, normalized=False):
    '''
    Function that interpolates the coordinates of a shapely LineString.
    Note: If you use this function on a MultiLineString geometry, it will
    "flatten" the geometry and consider all the points in it to be
    consecutively connected. For example, consider the following shape:
        MultiLineString(((0,0),(0,2)),((0,4),(0,6)))
    In this case, this function will assume that there is no gap between
    (0,2) and (0,4). Instead, the function will assume that these points
    are all connected. Explicitly, the MultiLineString above will be
    interpreted instead as the following shape:
        LineString((0,0),(0,2),(0,4),(0,6))
    Parameters
    ----------
    input_line : shapely.geometry.LineString or shapely.geometry.MultiLineString
        (Multi)LineString whose coordinates you want to interpolate
    input_dist : float
        Distance used to calculate the interpolation point
    normalized : boolean
        Flag that indicates whether the `input_dist` argument should be
        interpreted as an absolute distance or as a fraction of the
        geometry's total length.
        When this flag is set to False, the `input_dist` argument is assumed
        to be an actual absolute distance from the starting point of the
        geometry. When this flag is set to True, the `input_dist` argument
        is assumed to represent the relative distance with respect to the
        geometry's full length.
        The default is False.
    Returns
    -------
    shapely.geometry.Point
        The shapely geometry of the interpolated Point.
    '''
    # Making sure the input value is a LineString or MultiLineString
    # (geom_type works in shapely 1.8 and 2.0; the .type attribute is deprecated)
    geom_type = input_line.geom_type.lower()
    if geom_type not in ('linestring', 'multilinestring'):
        return None
    # Extracting the coordinates from the geometry
    if geom_type == 'multilinestring':
        # In case it's a MultiLineString, this step "flattens" the points
        coords = [item for this_geom in input_line.geoms
                  for item in this_geom.coords]
    else:
        coords = [tuple(coord) for coord in input_line.coords]
    # Transforming the list of coordinates into a numpy array for
    # ease of manipulation
    coords = np.array(coords)
    # Calculating the distances between consecutive points (3D if Z is present)
    dists = ((coords[:-1] - coords[1:])**2).sum(axis=1)**0.5
    # Calculating the cumulative distances
    dists_cum = np.append(0, dists.cumsum())
    # Finding the total distance
    dist_total = dists_cum[-1]
    # Finding appropriate use of the `input_dist` value
    if not normalized:
        input_dist_abs = input_dist
        input_dist_rel = input_dist / dist_total
    else:
        input_dist_abs = input_dist * dist_total
        input_dist_rel = input_dist
    # Taking care of some edge cases
    if ((input_dist_rel < 0) or
            (input_dist_rel > 1) or
            (input_dist_abs < 0) or
            (input_dist_abs > dist_total)):
        return None
    elif (input_dist_rel == 0) or (input_dist_abs == 0):
        return shapely.geometry.Point(coords[0])
    elif (input_dist_rel == 1) or (input_dist_abs == dist_total):
        return shapely.geometry.Point(coords[-1])
    # Finding which point is immediately before and after the input distance
    pt_before_idx = np.arange(dists_cum.shape[0])[(dists_cum <= input_dist_abs)].max()
    pt_after_idx = np.arange(dists_cum.shape[0])[(dists_cum >= input_dist_abs)].min()
    pt_before = coords[pt_before_idx]
    pt_after = coords[pt_after_idx]
    seg_full_dist = dists[pt_before_idx]
    dist_left = input_dist_abs - dists_cum[pt_before_idx]
    # Calculating the interpolated coordinates
    interpolated_coords = ((dist_left / seg_full_dist) * (pt_after - pt_before)) + pt_before
    # Creating a shapely geometry
    interpolated_point = shapely.geometry.Point(interpolated_coords)
    return interpolated_point
The function above can be used on Shapely (Multi)LineStrings. Here's an example of it being applied to a simple LineString.
input_line = shapely.geometry.LineString([(0, 0, 0),
(1, 2, 3),
(4, 5, 6)])
interpolated_point = my_interpolate(input_line, 2.5, normalized=False)
print(interpolated_point.wkt)
> POINT Z (0.6681531047810609 1.336306209562122 2.004459314343183)
And here's an example of using the apply method to perform the interpolation on a whole GeoDataFrame of LineStrings:
line_df = gpd.GeoDataFrame({'id':[1,
2,
3],
'geometry':[input_line,
input_line,
input_line],
'interpolate_dist':[0.5,
2.5,
6.5],
'interpolate_dist_normalized':[True,
False,
False]})
interpolated_points = line_df.apply(
lambda row: my_interpolate(input_line=row['geometry'],
input_dist=row['interpolate_dist'],
normalized=row['interpolate_dist_normalized']),
axis=1)
print(interpolated_points.apply(lambda point: point.wkt))
> 0 POINT Z (1.419876550265357 2.419876550265356 3...
> 1 POINT Z (0.6681531047810609 1.336306209562122 ...
> 2 POINT Z (2.592529850263281 3.592529850263281 4...
> dtype: object
Important notes
Corner cases and error handling
Please note that the function I developed doesn't do error handling very well. In many cases, it just silently returns a None object. Depending on your use case, you might want to adjust that behavior.
MultiLineStrings
The function above can be used on MultiLineStrings, but it makes some simplifications and assumptions. If you use this function on a MultiLineString geometry, it will "flatten" the geometry and consider all the points in it to be consecutively connected. For example, consider the following shape:
MultiLineString(((0,0),(0,2)),((0,4),(0,6)))
In this case, the function will assume that there is no gap between (0,2) and (0,4). Instead, the function will assume that these points are all connected. Explicitly, the MultiLineString above will be interpreted instead as the following shape:
LineString((0,0),(0,2),(0,4),(0,6))
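For a plain LineString, the same idea fits in a few lines of NumPy. The sketch computes the cumulative 3D length at each vertex and lets np.interp interpolate each coordinate against it. It is not a drop-in replacement. Unlike my_interpolate, it clamps out-of-range distances to the end points instead of returning None, and it does not handle MultiLineStrings.

```python
import numpy as np

def interpolate_3d(coords, distance, normalized=False):
    """Point at `distance` along a polyline, measuring length in full 3D (a sketch)."""
    coords = np.asarray(coords, dtype=float)
    # 3D length of each segment, then the cumulative length at every vertex
    seg_len = np.linalg.norm(np.diff(coords, axis=0), axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])
    if normalized:
        distance = distance * cum_len[-1]
    # Interpolate x, y and z independently against cumulative length;
    # np.interp returns the end values for distances outside [0, total]
    return np.array([np.interp(distance, cum_len, coords[:, k])
                     for k in range(coords.shape[1])])
```

If you need a geometry, wrap the result, e.g. shapely.geometry.Point(interpolate_3d(list(line.coords), 0.5, normalized=True)).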

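Someone asked me, "Can you interpolate along each axis instead of doing all three together?" I think the answer is yes, and here is the approach I used. Here order is the polynomial order passed to interpolate, and df needs a datetime-like index for resample: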
# Upsample to 1S intervals rather than our desired interval because resample throws
# out rows that do not fall on the desired interval, including the rows we want to keep.
int_df = df.resample('1S', origin='start').asfreq()
# For each axis, interpolate to fill in NAN values.
int_df['Latitude'] = int_df['Latitude'].interpolate(method='polynomial', order=order)
int_df['Longitude'] = int_df['Longitude'].interpolate(method='polynomial', order=order)
int_df['AGL'] = int_df['AGL'].interpolate(method='polynomial', order=order)
# Now downsample to our desired frequency
int_df = int_df.resample('5S', origin='start').asfreq()
I initially resampled at 5S intervals, but that caused any existing points that were not on the interval boundaries to be dropped in favor of new ones that were. For my use case, keeping the original points is important. If you only want regular intervals, you don't need to upsample and then downsample.
After that, just interpolate each of the three axes.
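Because df and order are not defined in the snippet, here is a self-contained sketch of the same pattern on a tiny made-up track. It upsamples, interpolates each axis, then downsamples. It assumes a DatetimeIndex. It uses method='time' (linear in elapsed time) rather than 'polynomial', so it runs without SciPy. The column names are taken from the answer.

```python
import pandas as pd

# Hypothetical track: three fixes at irregular times (resample needs a DatetimeIndex)
idx = pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 00:00:04", "2023-01-01 00:00:10"])
df = pd.DataFrame({"Latitude": [0.0, 4.0, 10.0],
                   "Longitude": [0.0, 8.0, 20.0],
                   "AGL": [100.0, 100.0, 160.0]}, index=idx)

# 1) Upsample to 1-second steps; original rows on the grid are kept, new rows are NaN
int_df = df.resample("1s", origin="start").asfreq()

# 2) Fill each axis independently, weighting by elapsed time
for col in ["Latitude", "Longitude", "AGL"]:
    int_df[col] = int_df[col].interpolate(method="time")

# 3) Downsample to the desired 5-second spacing
int_df = int_df.resample("5s", origin="start").asfreq()
```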

To answer the question of why the shapely manipulation functions are not operating on 3D / Z:
From the shapely docs (I am writing this while version 1.8.x is current):
A third z coordinate value may be used when constructing instances,
but has no effect on geometric analysis. All operations are performed
in the x-y plane.
I also need Z for my purposes, so I was searching for this information to see whether using geopandas (which uses shapely) was an option, rather than osgeo.ogr.
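The quoted behaviour is easy to see on the line from the question. Shapely reports zero length for a line that only moves along Z, while measuring the same vertices with NumPy gives the 3D length. A small sketch:

```python
import numpy as np
from shapely.geometry import LineString

# The vertical line from the question: it only changes in Z
line = LineString([(0, 0, 0), (0, 0, 1), (0, 0, 2)])

# shapely measures in the x-y plane, so this line has zero length
planar_length = line.length

# Measuring the same vertices in 3D with NumPy
coords = np.array(list(line.coords))
length_3d = np.linalg.norm(np.diff(coords, axis=0), axis=1).sum()
```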

geopandas point in polygon distance to nearest edge

So I have a geopandas dataframe of ~10,000 rows like this. Each point is within the polygon (I've made sure of it).
point name field_id geometry
POINT(-0.1618445 51.5103873) polygon1 1 POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, -0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))
I want to add a new column called distance_to_nearest_edge, which is the distance from the point to the nearest boundary of the polygon.
There is a shapely function that calculates what I want:
from shapely import wkt
poly = wkt.loads('POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, -0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))')
pt = wkt.loads('POINT(-0.1618445 51.5103873)')
dist = poly.boundary.distance(pt)
---
dist = 0.0010736436340879488
But I'm struggling to apply this to 10k rows.
I've tried creating a function, but I keep getting errors ("'Polygon' object has no attribute 'encode'", 'occurred at index 0')
E.g.:
def fxy(x, y):
    poly = wkt.loads(x)
    pt = wkt.loads(y)
    return poly.exterior.distance(pt)
Appreciate any help!

I think your data has missing values. You can try this:
df['distance'] = df.apply(
    lambda row: row['point'].distance(row['geometry'].boundary)
    if pd.notnull(row['point']) and pd.notnull(row['geometry']) else np.nan,
    axis=1,
)
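Separately, the 'encode' error in the question suggests that the point and geometry columns already hold shapely objects rather than WKT strings, so wkt.loads is not needed. The distance for all rows can also be computed without apply if shapely 2.0 or newer is installed (an assumption about the version). In that version, shapely.boundary and shapely.distance work element-wise on arrays. This sketch uses the WKT from the question as a one-row stand-in for the columns:

```python
import numpy as np
import shapely  # assumes shapely >= 2.0 (vectorized top-level functions)

# The geometries from the question, standing in for whole columns
poly_wkt = np.array(["POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, "
                     "-0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))"])
point_wkt = np.array(["POINT (-0.1618445 51.5103873)"])

polys = shapely.from_wkt(poly_wkt)    # array of Polygons
points = shapely.from_wkt(point_wkt)  # array of Points

# Element-wise distance from each point to the boundary of its own polygon
dist_to_edge = shapely.distance(points, shapely.boundary(polys))
```

For the real data, shapely.distance(df['point'].to_numpy(), shapely.boundary(df['geometry'].to_numpy())) should give the whole column at once. On a GeoDataFrame, GeoSeries.distance and GeoSeries.boundary do the same element-wise.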

How to pick the nearest marker from clicked point in vispy?

In the vispy library I have a list of points shown as markers. I want to change the color of (or just get the index of) the point nearest to the clicked point.
I can get the pixel position of the clicked point from event.pos, but I need its actual coordinates to compare it with the others. Alternatively, I could get the pixel positions of the other markers and compare them with the event location.
I have this code to get the index of the nearest point. It takes an array and a point (the clicked one) as input:
import math
def get_nearest_index(pos, p0):
    dist = math.inf
    ind = 0
    for i in range(len(pos)):
        d = (pos[i][0]-p0[0])**2 + (pos[i][1]-p0[1])**2
        if d < dist:
            ind = i
            dist = d
    return ind
But the problem is that I have to pass both of them in the same coordinate system.
Printing out event.pos returns pixels like [319 313], while the positions in my pos array are:
[[-0.23801816 0.55117583 -0.56644607]
[-0.91117247 -2.28957391 -1.3636486 ]
[-1.81229627 0.50565064 -0.06175591]
[-1.79744952 0.48388072 -0.00389405]
[ 0.33729051 -0.4087148 0.57522977]]
So I need to convert one of them into the other's coordinate system. A transformation like
tf = view.scene.transform
p0 = tf.map(pixel_pt)
print(str(pixel_pt) + "--->"+str(p0))
prints out [285 140 0 1]--->[ 4.44178173e+04 -1.60156369e+04 0.00000000e+00 1.00000000e+00] which is nowhere near the points.

When transforming the pixels to your local coordinates, you are using transform.map. According to the vispy tutorial, map goes in the forward direction, from the scene's coordinates to the canvas. For a pixel position you need the inverse mapping, imap.
You could try doing this:
tf = view.scene.transform
point = tf.imap(event.pos)
print(str(event.pos) + "--->"+str(point))
Likewise, if you need to transform coordinates for a specific set of markers, this is a better approach:
ct = markers.node_transform(markers.root_node)
point = ct.imap(event.pos)
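Once the click position has been mapped into the markers' coordinate system (for example with imap as above), the nearest-marker search from the question can be done without a Python loop. Below is a NumPy sketch. Like the original get_nearest_index, it compares only x and y. That is an assumption that ignores depth when the markers are viewed through a 3D camera.

```python
import numpy as np

def get_nearest_index(pos, p0):
    """Index of the marker in `pos` closest to `p0`, comparing x and y only."""
    pos = np.asarray(pos, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    # Squared x/y distance from every marker to the clicked point in one step;
    # extra components (e.g. z, w from imap's homogeneous output) are ignored
    d2 = ((pos[:, :2] - p0[:2]) ** 2).sum(axis=1)
    return int(np.argmin(d2))
```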

Counterclockwise sorting of x, y data

I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that using polar coordinates: I calculate the polar angle of each point (x,y) and then sort by ascending angle, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately, I don't get the expected results in random_shape_formatted.dat! The points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected results:
Create a new file named filename_formatted.dat that contains the data sorted according to the image above. The first line contains the starting point, and the next lines contain the points in counterclockwise direction, as shown by the blue arrows in the image.
EDIT 2: The xy data added here instead of using github gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108

I find that an easy way to sort points with x,y coordinates like these is to sort them by angle. The angle is measured between the horizontal line and the line from the center of mass of the whole polygon to each point. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle, using numpy.arccos for instance. When y-y0 is larger than 0 you take the angle directly; otherwise you subtract the angle from 360° (2𝜋). I have used numpy.where to calculate the angle and then numpy.argsort to produce a mask for indexing the initial x,y values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from a different point, you could add an offset angle for that. In your case that would be zero, though.
import numpy as np
def sort_xy(x, y):
    # Center of mass of all points
    x0 = np.mean(x)
    y0 = np.mean(y)
    r = np.sqrt((x-x0)**2 + (y-y0)**2)
    # Angle in [0, 2pi) measured from the horizontal through the center of mass
    angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
    mask = np.argsort(angles)
    x_sorted = x[mask]
    y_sorted = y[mask]
    return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (the points are obviously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:

If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
def dist2(a, b):
    # squared Euclidean distance between two (x, y) pairs
    return (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y))  # get the list of coordinate pairs
z.sort()  # sort by x coordinate
cw = z[0:1]  # first point in clockwise direction
ccw = z[1:2]  # first point in counterclockwise direction
# reverse the above assignment depending on how the first 2 points relate
if z[1][1] > z[0][1]:
    cw = z[1:2]
    ccw = z[0:1]
for p in z[2:]:
    # append to the list to which the next point is closest
    if dist2(cw[-1], p) < dist2(ccw[-1], p):
        cw.append(p)
    else:
        ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y coordinate, for which an angle-based look-around from some central point would fail.
No assumption is made about the range of the X or Y coordinates. For instance, the curve does not have to cross the X axis (Y = 0) for this to work.

Counter-clock-wise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]

Not really a Python question, I think, but you could try sorting by -sign(y) * x, doing something like:
def counter_clockwise_sort(points):
    return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
This should work fine, assuming you read your points into a list of dicts of the form {'x': 0.12312, 'y': 0.912}.
EDIT: This will work as long as you cross the X axis only twice, like in your example.

If:
the shape is arbitrarily complex and
the point spacing is ~random
then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver. In particular, I used the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order for the points, I don't think it will be too hard to choose between a clockwise and a counterclockwise ordering.
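The final step mentioned above is choosing between clockwise and counterclockwise once an order exists. A common check for this is the signed area from the shoelace formula. It is positive for a counterclockwise ring and negative for a clockwise one (with the y axis pointing up). Here is a minimal NumPy sketch, assuming x and y are already in traversal order:

```python
import numpy as np

def signed_area(x, y):
    """Shoelace formula: > 0 for counterclockwise order, < 0 for clockwise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.roll pairs each vertex with the next one, closing the ring
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def make_counterclockwise(x, y):
    """Reverse an already-ordered ring if it runs clockwise."""
    if signed_area(x, y) < 0:
        return x[::-1], y[::-1]
    return x, y
```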

If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y), -(x-pivot_x)). Negating both components rotates every angle by 180°, which moves the point where the angles wrap around.
We get the following result, with a gradient-colored scatter plot to show correctness (FYI, I removed the first line of the .dat file after downloading):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
# One-liner for ordering points (transform, shift by 2pi, argsort, index into points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the colormap 1 is white and 0 is black), they're numbered in the 0-1 range by order:

For points with comparable distances between neighbouring points, we can use KDTree to get the two closest points for each point. Drawing lines that connect those points gives a closed shape contour. Then we use OpenCV's findContours, which always traces the contour in a counter-clockwise manner. OpenCV works on images, so we need to sample the provided float data into a uint8 image. Given comparable distances between points, that should be pretty safe. OpenCV also traces sharp corners in the curve well, so smooth or non-smooth data works just fine. There's no pivot requirement either, so all kinds of shapes should work.
Here's the implementation:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage import binary_fill_holes  # scipy.ndimage.morphology is deprecated
def counter_clockwise_order(a, DEBUG_PLOT=False):
    # Shift to the origin and scale so the smallest pairwise distance is ~2 pixels
    b = a - a.min(0)
    d = pdist(b).min()
    c = np.round(2*b/d).astype(int)
    img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
    # Connect every point to its two nearest neighbours (k=3 includes the point itself)
    d1, d2 = cKDTree(c).query(c, k=3)
    b = c[d2]
    p1, p2, p3 = b[:, 0], b[:, 1], b[:, 2]
    for i in range(len(b)):
        cv2.line(img, tuple(map(int, p1[i])), tuple(map(int, p2[i])), 255, 1)
        cv2.line(img, tuple(map(int, p1[i])), tuple(map(int, p3[i])), 255, 1)
    img = (binary_fill_holes(img == 255)*255).astype(np.uint8)
    # OpenCV 3.x returns (image, contours, hierarchy); 2.x and 4.x return (contours, hierarchy)
    contours = cv2.findContours(img.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2]
    cont = contours[0][:, 0]
    # Order the original points by the index of their nearest contour pixel
    f1, f2 = cKDTree(cont).query(c, k=1)
    ordered_points = a[f2.argsort()[::-1]]
    if DEBUG_PLOT:
        NPOINTS = len(ordered_points)
        for i in range(NPOINTS):
            plt.plot(ordered_points[i:i+2, 0], ordered_points[i:i+2, 1], alpha=float(i)/(NPOINTS-1), color='k')
        plt.show()
    return ordered_points
Sample run:
# Load data into a 2D array with 2 columns
a = np.loadtxt('random_shape.csv', delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
