geopandas point in polygon distance to nearest edge - python

So I have a geopandas dataframe of ~10,000 rows like this. Each point is within the polygon (I've made sure of it).
point                         name      field_id  geometry
POINT(-0.1618445 51.5103873)  polygon1  1         POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, -0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))
I want to add a new column called distance_to_nearest_edge, holding the distance from the point to the nearest boundary of the polygon.
There is a shapely function that calculates what I want:
from shapely import wkt
poly = wkt.loads('POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, -0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))')
pt = wkt.loads('POINT(-0.1618445 51.5103873)')
dist = poly.boundary.distance(pt)
---
dist = 0.0010736436340879488
But I'm struggling to apply this to 10k rows.
I've tried creating a function, but I keep getting errors ("'Polygon' object has no attribute 'encode'", 'occurred at index 0')
Eg:
def fxy(x, y):
    poly = wkt.loads(x)
    pt = wkt.loads(y)
    return poly.exterior.distance(pt)
Appreciate any help!

I think your data has missing values.
You can try this (it assumes pandas is imported as pd and numpy as np):
df['distance'] = df.apply(lambda row: row['point'].distance(row['geometry'].boundary) if pd.notnull(row['point']) and pd.notnull(row['geometry']) else np.nan, axis=1)
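The "'Polygon' object has no attribute 'encode'" error in the question suggests that the columns already hold shapely geometries, in which case wkt.loads (which expects a string) is not needed. A minimal sketch under that assumption, with a one-row frame rebuilt from the sample in the question; the helper name edge_distance is made up for illustration:

```python
import numpy as np
import pandas as pd
from shapely import wkt
from shapely.geometry.base import BaseGeometry

# One-row frame rebuilt from the sample row in the question.
df = pd.DataFrame({
    "point": [wkt.loads("POINT(-0.1618445 51.5103873)")],
    "geometry": [wkt.loads(
        "POLYGON ((-0.1642799 51.5113756, -0.1639581 51.5089851, "
        "-0.1593661 51.5096729, -0.1606536 51.5115358, -0.1642799 51.5113756))"
    )],
})

def edge_distance(row):
    """Hypothetical helper: distance from the row's point to its polygon's boundary."""
    pt, poly = row["point"], row["geometry"]
    # Anything that is not a geometry (None, NaN, ...) counts as missing.
    if not isinstance(pt, BaseGeometry) or not isinstance(poly, BaseGeometry):
        return np.nan
    # The cells are already shapely objects, so no wkt.loads is needed here.
    return poly.boundary.distance(pt)

df["distance_to_nearest_edge"] = df.apply(edge_distance, axis=1)
```

The result is in the units of the coordinates, so for longitude/latitude data it is in degrees, as in the question's 0.00107... value.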

Related

Python Shapely Point within a Polygon wrong data

I have a function that assigns an id if a point is within a polygon, but it classifies the same shapely point incorrectly. It runs over two DataFrames: poly, which contains the polygons in shapely format (I checked them and they look correct), and df, which contains the start_point in shapely format. When I run the code I get inconsistent results. The dataset is big, over 2 million rows. None of the misclassified points are on the boundary of a polygon.
def inside_polygon(df, polygons):
    result = np.zeros((len(df), 2), dtype=object)
    for polygon in polygons[["fence_id","polygon","name"]].itertuples():
        inside = np.array([point.within(polygon.polygon) for point in df["start_point"]])
        result[inside, 0] = polygon.fence_id
        result[inside, 1] = polygon.name
    return pd.DataFrame(result, columns=["fence_id", "name"])
df.loc[:,'start_point'] = df.apply(lambda row: Point(row['start_long'], row['start_lat']), axis=1)
df["fence_id"] = None
df["name"] = None
df.loc[:, ['fence_id','name']] = inside_polygon(df, poly)
Same point, different classification (the point is actually outside the polygon): https://i.stack.imgur.com/VdPzi.png
Can someone help?
I tried both the "within" and "contains" methods, with the same results for both. Maybe the issue is in how I link the fence_id from the 'poly' DataFrame to the 'df' DataFrame that contains the points.

Vectorized creation of shapely Polygons from GeoPandas DataFrame

I have a GeoDataFrame with a point geometry.
From the point geometry, I want to define a square polygon geometry in a quite straightforward manner.
Given a point, the point should be the left bottom corner in a square with sides 250 units of length.
I.e, left bottom corner is the current point, right bottom corner is the current point + 250 on the x axis etc.
My naive way of doing this is the following:
Create the corners as new columns in the GeoDataFrame:
After that, I try to define a new column as:
gdf['POLY'] = shapely.Geometry([gdf['BOTTOM_LEFT'], gdf['BOTTOM_RIGHT'], gdf['TOP_LEFT'], gdf['TOP_RIGHT']])
But this returns the following error message:
AttributeError: 'list' object has no attribute '__array_interface__'
Your implementation is close, but you can't call shapely.geometry.Polygon with an array of points - it can only be done one at a time. So the trick is to use df.apply to call Polygon on every row of the DataFrame:
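gdf['geometry'] = gdf.apply(
    lambda s: shapely.geometry.Polygon(
        # walk the ring in order (BL -> BR -> TR -> TL) to avoid a self-intersecting "bow-tie"
        [s['BOTTOM_LEFT'], s['BOTTOM_RIGHT'], s['TOP_RIGHT'], s['TOP_LEFT']]
    ),
    axis=1,  # axis belongs to apply, not to Polygon
)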
You could do that with your original point using shapely.affinity.translate (a shapely Point has no translate method of its own):
from shapely.affinity import translate
gdf['geometry'] = gdf.apply(
    lambda s: shapely.geometry.Polygon(
        [
            s['POINT'],
            translate(s['POINT'], xoff=250),
            translate(s['POINT'], xoff=250, yoff=250),
            translate(s['POINT'], yoff=250),
        ]
    ),
    axis=1,
)
Let's assume you have a GeoDataFrame with only a single point. It is called gdf and looks as follows:
   X  Y  geometry
0  5  6  POINT (5.00000 6.00000)
You can access the x and y components of the point using the following lambda function:
#Access x and y components of point geometry
X = gdf.geometry.apply(lambda x: x.x)
Y = gdf.geometry.apply(lambda x: x.y)
Now you can create a square object using shapely.geometry.Polygon. You need to specify the four vertices of the square. You can do it using:
gdf_square = shapely.geometry.Polygon([[X[0], Y[0]],
                                       [X[0]+250, Y[0]],
                                       [X[0]+250, Y[0]+250],
                                       [X[0], Y[0]+250]])
This gives you a square polygon object.
Note that if the GeoDataFrame has many points, you need to adapt the last step so that it builds the square polygon for the point in each row, one by one.
In my case it was more than 5 times faster to build the polygons with a list comprehension than with GeoDataFrame.apply:
polys = [Polygon(((x, y), (x, y+d), (x+d, y+d), (x+d, y))) for x in xs for y in ys]
gdf = gpd.GeoDataFrame(geometry=polys)
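For the axis-aligned squares asked about here, a loop-free alternative is shapely.box, which accepts arrays in Shapely 2.x. This is a sketch assuming Shapely 2.0 or later; the corner coordinates are made up and would normally come from gdf.geometry.x and gdf.geometry.y:

```python
import geopandas as gpd
import numpy as np
import shapely

# Hypothetical lower-left corners of the squares.
xs = np.array([5.0, 100.0])
ys = np.array([6.0, 200.0])
side = 250

# One vectorized call builds every square: box(xmin, ymin, xmax, ymax).
squares = shapely.box(xs, ys, xs + side, ys + side)

gdf = gpd.GeoDataFrame({"X": xs, "Y": ys}, geometry=squares)
```

Each resulting polygon has the original point as its bottom-left corner and sides of 250 units, matching the layout described in the question.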

Apply a column containing polygon to Polygon() shapely

I have a dataframe with a column called NewPolygon :
NewPolygon
[(1.23,10),(4.4, 10)...]
[(16.0,10),(8.1, 10)...]
[(2.2,10),(0, 10)...]
My code :
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
polygon = pd.read_csv(file_path)
point = Point(10, 1.1)
polygon = Polygon() ####How to apply the value from column `NewPolygon` here iteratively
print(polygon.contains(point))
How can I apply all values from NewPolygon to Polygon() iteratively?
Polygon() takes a list of tuples as its value, such as Polygon([(1.23,10),(4.4, 10)...])
Simply use Series.transform:
df['NewPolygon'] = df['NewPolygon'].transform(Polygon)
To use methods on the polygon objects stored in the NewPolygon column, use:
df['NewPolygon'].apply(lambda p : p.contains(point))
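If NewPolygon comes straight from pd.read_csv, each cell is probably a string rather than a list of tuples (an assumption, since the question does not show the file). In that case the strings need parsing before Polygon can use them. A sketch with made-up CSV content and a made-up contains_point column:

```python
import ast
import io

import pandas as pd
from shapely.geometry import Point, Polygon

# Made-up CSV content standing in for the file in the question.
csv_text = (
    'NewPolygon\n'
    '"[(0, 0), (4, 0), (4, 4), (0, 4)]"\n'
    '"[(10, 10), (12, 10), (12, 12)]"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# Each cell is a string such as "[(0, 0), (4, 0), ...]": parse it safely
# into a list of tuples, then build the Polygon from that list.
df["NewPolygon"] = df["NewPolygon"].apply(lambda s: Polygon(ast.literal_eval(s)))

point = Point(1, 1)
df["contains_point"] = df["NewPolygon"].apply(lambda p: p.contains(point))
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text read from a file.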

How do I correctly reproject a geodataframe with multiple geometry columns?

In the geopandas documentation it says that
A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the set_geometry method.
I'm wondering how to use such a GeoDataFrame if the goal is to flexibly reproject the geometrical data in these various columns to one or more other coordinate reference systems. Here's what I tried.
First try
import geopandas as gpd
from shapely.geometry import Point
crs_lonlat = 'epsg:4326' #geometries entered in this crs (lon, lat in degrees)
crs_new = 'epsg:3395' #geometries needed in (among others) this crs
gdf = gpd.GeoDataFrame(crs=crs_lonlat)
gdf['geom1'] = [Point(9,53), Point(9,54)]
gdf['geom2'] = [Point(8,63), Point(8,64)]
#Working: setting geometry and reprojecting for first time.
gdf = gdf.set_geometry('geom1')
gdf = gdf.to_crs(crs_new) #geom1 is reprojected to crs_new, geom2 still in crs_lonlat
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8 63)
1 POINT (1001875.417 7135562.568) POINT (8 64)
gdf.crs
Out: 'epsg:3395'
So far, so good. Things go off the rails if I want to set geom2 as the geometry column, and reproject that one as well:
#Not working: setting geometry and reprojecting for second time.
gdf = gdf.set_geometry('geom2') #still in crs_lonlat...
gdf.crs #...but this still says crs_new...
Out: 'epsg:3395'
gdf = gdf.to_crs(crs_new) #...so this doesn't do anything! (geom2 unchanged)
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8.00000 63.00000)
1 POINT (1001875.417 7135562.568) POINT (8.00000 64.00000)
Ok, so, apparently, the .crs attribute of the gdf is not reset to its original value when changing the column that serves as the geometry - it seems, the crs is not stored for the individual columns. If that is the case, the only way I see to use reprojection with this dataframe, is to backtrack: start --> select column as geometry --> reproject gdf to crs_new --> use/visualize/... --> reproject gdf back to crs_lonlat --> goto start. This is not usable if I want to visualise both columns in one figure.
Second try
My second attempt was to store the crs with each column separately, by changing the corresponding lines in the script above to:
gdf = gpd.GeoDataFrame()
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat)
However, it soon became clear that, though initialised as a GeoSeries, these columns are normal pandas Series, and don't have a .crs attribute the same way GeoSeries do:
gdf['geom1'].crs
AttributeError: 'Series' object has no attribute 'crs'
s = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
s.crs
Out: 'epsg:4326'
Is there something I'm missing here?
Is the only solution to decide on the 'final' crs beforehand, and do all the reprojecting before adding the columns? Like so...
gdf = gpd.GeoDataFrame(crs=crs_new)
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat).to_crs(crs_new)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat).to_crs(crs_new)
#no more reprojecting done/necessary/possible! :/
...and then, when another crs is needed, rebuild the entire gdf from scratch? That can't be the way this was intended to be used.
Unfortunately, the desired behaviour is currently not possible. Due to limitations in the package, geopandas does not accommodate this use case at the moment, as can be seen in this issue in the GitHub repo.
My workaround is to not use a GeoDataFrame at all, but rather to combine a normal pandas DataFrame, for the non-shapely data, with several separate geopandas GeoSeries, for the shapely geometry data. Each GeoSeries has its own crs and can be correctly reprojected whenever necessary.
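A minimal sketch of that workaround, reusing the points and CRS codes from the question. The attrs frame and its label column are made up for illustration:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

crs_lonlat = "EPSG:4326"  # lon, lat in degrees
crs_new = "EPSG:3395"     # World Mercator

# Non-geometry attributes live in an ordinary DataFrame (hypothetical column).
attrs = pd.DataFrame({"label": ["a", "b"]})

# Each geometry column is its own GeoSeries carrying its own CRS,
# sharing the DataFrame's index so rows stay matched up.
geom1 = gpd.GeoSeries([Point(9, 53), Point(9, 54)], crs=crs_lonlat, index=attrs.index)
geom2 = gpd.GeoSeries([Point(8, 63), Point(8, 64)], crs=crs_lonlat, index=attrs.index)

# Reproject each series independently, whenever needed.
geom1_new = geom1.to_crs(crs_new)
geom2_new = geom2.to_crs(crs_new)
```

Because to_crs returns a new GeoSeries, the original lon/lat series stay available for any further reprojection.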

Shapely point geometry in geopandas df to lat/lon columns

I have a geopandas df with a column of shapely point objects. I want to extract the coordinate (lat/lon) from the shapely point objects to generate latitude and longitude columns. There must be an easy way to do this, but I cannot figure it out.
I know you can extract the individual coordinates like this:
lon = df.point_object[0].x
lat = df.point_object[0].y
And I could create a function that does this for the entire df, but I figured there was a more efficient/elegant way.
If you have the latest version of geopandas (0.3.0 as of writing), and if df is a GeoDataFrame, you can use the x and y attributes on the geometry column:
df['lon'] = df.point_object.x
df['lat'] = df.point_object.y
In general, if you have a column of shapely objects, you can also use apply to do what you can do on individual coordinates for the full column:
df['lon'] = df.point_object.apply(lambda p: p.x)
df['lat'] = df.point_object.apply(lambda p: p.y)
Without having to iterate over the Dataframe, you can do the following:
df['lon'] = df['geometry'].x
df['lat'] = df['geometry'].y
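When the points sit in an ordinary object column rather than the active geometry column, wrapping that column in a GeoSeries gives access to the same x and y accessors. A sketch with made-up points, assuming a plain pandas DataFrame with a point_object column as in the question:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Made-up data: a plain DataFrame holding shapely Points in an object column.
df = pd.DataFrame({"point_object": [Point(-0.16, 51.51), Point(9.0, 53.0)]})

# Wrapping the column in a GeoSeries keeps the index and exposes .x / .y.
points = gpd.GeoSeries(df["point_object"])
df["lon"] = points.x
df["lat"] = points.y
```

For lon/lat data, x is the longitude and y is the latitude.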
The following extracts the center point (latitude and longitude) from polygon and multi-polygon geometries:
import geopandas as gpd
df = gpd.read_file(path + 'df.geojson')
#Find the center point
df['Center_point'] = df['geometry'].centroid
#Extract lat and lon from the centerpoint
df["long"] = df.Center_point.map(lambda p: p.x)
df["lat"] = df.Center_point.map(lambda p: p.y)
