How to extract LineString data from GeoDataFrame and match with Polygon? - python

Dear StackOverflow community,
Main question:
I have a GeoDataFrame containing the StreetNetwork of NYC - Manhattan (obtained through the osmnx package), where I would like to extract the coordinates (lon/lat data) from all streets which are stored as LineStrings under geometry, like so:
field_1 0
access
bridge
geometry LINESTRING (-73.9975944 40.7140611, -73.997492...
highway residential
junction
key 0
lanes
length 11.237
maxspeed 25 mph
name Catherine Street
oneway True
osmid 5670536
ref
service
tunnel
u 1773060097
v 42437559
width
geometry LINESTRING (-73.9975944 40.7140611, -73.997492...
Name: 0, dtype: object
What I am trying to do is to extract the geometry information for each line-item:
df.iloc[x][3]
The issue is that the output format is as str:
[1]: LINESTRING (-73.9975944 40.7140611, -73.9974922 40.7139962)
[2]: print(type(...))
[2]: <class 'str'>
This makes it hard to automate and process the output data. Does anyone know how to extract it so that it is already in LineString (or any other useable list/array) format?
Further question:
My overall goal is to map this street information onto a shapefile of taxi zones (stored as polygons) to identify which streets are in which zone, and which lon/lat areas are covered by streets within one zone (polygon). Is there a straightforward way to do this with the shapely, geopandas or osmnx packages, i.e. something like "polygon.contains(Point)" but in the sense of "polygon.contains(LineString)"?
Thanks a lot for your support!

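To extract the coordinates of each geometry, apply a function to the geometry column. Series.apply works element-wise and takes no axis argument, and the geometries must be shapely objects rather than strings:
df.geometry.apply(lambda geom: list(geom.coords))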
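If the geometry column really holds strings, as the `<class 'str'>` output in the question suggests (this typically happens after a round trip through a CSV file), the text has to be parsed back into shapely objects first. The following is a minimal sketch, assuming the strings are valid WKT; the taxi-zone rectangle and the variable names are made up for illustration. It also touches on the further question: shapely's `contains`, `intersects` and `intersection` accept a LineString as well as a Point.

```python
from shapely import wkt
from shapely.geometry import Polygon

# WKT text as it appears in the question's output (one-row sample).
wkt_rows = ["LINESTRING (-73.9975944 40.7140611, -73.9974922 40.7139962)"]

# Parse each string into a shapely LineString.
lines = [wkt.loads(text) for text in wkt_rows]

# Each LineString exposes its vertices as (lon, lat) tuples.
coords = [list(line.coords) for line in lines]

# Hypothetical taxi zone: a small rectangle around the street above.
zone = Polygon([(-73.999, 40.713), (-73.996, 40.713),
                (-73.996, 40.715), (-73.999, 40.715)])

fully_inside = zone.contains(lines[0])     # True if the whole street is in the zone
touches_zone = zone.intersects(lines[0])   # True if at least part of it is
part_inside = zone.intersection(lines[0])  # the portion of the street inside the zone
```

For a whole column, `geopandas.GeoSeries.from_wkt(df["geometry"])` should do the same parsing in one call in recent GeoPandas versions, and a spatial join against the zone polygons avoids looping over streets by hand.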

Related

Geopandas buffer and intersect

I am using the GeoJSON file from the OpenData Vancouver website and I am trying to find the zoning classifications that fall within 5 km of a "Historical Area".
So, I am buffering all historical areas by 5 km (my data is projected), performing the intersect operation and using the intersect results as an index:
buffs = gdf_26910[gdf_26910['zoning_classification']=='Historical Area']['geometry'].buffer(5000)
gdf_26910[buffs.intersects(gdf_26910['geometry'])]
However, this is the output I am getting:
zoning_category zoning_classification zoning_district object_id geometry area centroid buffer5K
87 HA Historical Area HA-1A 78541 POLYGON ((492805.516 5458679.305, 492805.038 5... 3848.384041 POINT (492778.785 5458699.947) POLYGON ((497803.807 5458548.605, 497803.124 5...
111 HA Historical Area HA-3 78640 POLYGON ((491358.402 5458065.050, 491309.735 5... 66336.339719 POINT (491183.139 5458103.162) POLYGON ((492818.045 5453267.595, 492421.697 5...
180 HA Historical Area HA-1A 78836 POLYGON ((492925.194 5458575.204, 492929.600 5... 90566.768532 POINT (492753.969 5458456.804) POLYGON ((487583.872 5458086.263, 487564.746 5...
683 HA Historical Area HA-1 78779 POLYGON ((492925.194 5458575.204, 492802.702 5... 69052.427940 POINT (492606.372 5458621.753) POLYGON ((487874.100 5456398.633, 487789.801 5...
1208 HA Historical Area HA-2 78833 POLYGON ((492332.139 5458699.308, 492346.989 5... 179805.027166 POINT (492343.437 5458944.412) POLYGON ((489822.136 5454379.453, 489755.087 5...
Clearly, I am getting a match for the Historical Areas and not all the other geometries that intersect the buffers.
I have plotted the buffers and the output looks correct:
#Plot
base=gdf_26910.plot()
buffs.plot(ax=base, color='red', alpha=0.25)
I have also opened the data in QGIS and verified that there are 5 'Historical Areas' and they are all adjacent to 'Comprehensive Development'. So, the matching rows after the intersect operation should be "Comprehensive Development" at the least.
Where am I going wrong?
Two core points:
You need to work in meters for a 5 km buffer, hence I have used estimate_utm_crs() for the projection. I have also used cap_style and join_style for a more reflective buffered polygon.
I have used sjoin() instead of the mask approach in your code. This will effectively give duplicates, so de-dupe using pandas groupby().first().
UPDATE: changed to predicate="within" and used folium to visualise (this possibly helps you understand how the geometry is working).
import geopandas as gpd
import folium
gdf_26910 = gpd.read_file(
"https://opendata.vancouver.ca/explore/dataset/zoning-districts-and-labels/download/?format=geojson&timezone=Europe/London&lang=en"
)
buffs = gdf_26910.loc[gdf_26910["zoning_classification"] == "Historical Area"]
# the 5 km buffer needs a CRS in meters; convert back to the source CRS afterwards
buffs = (
    buffs.to_crs(buffs.estimate_utm_crs())
    .buffer(5000, cap_style=2, join_style=3)
    .to_crs(gdf_26910.crs)
)
# this warns so is clearly bad !
# gdf_26910[buffs.intersects(gdf_26910['geometry'])]
# some geometries intersect multiple historical areas, take first intersection from sjoin()
gdf_5km = (
gdf_26910.reset_index()
.sjoin(buffs.to_frame(), predicate="within")
.groupby("index")
.first()
.set_crs(gdf_26910.crs)
)
m = buffs.explore(name="buffer")
gdf_5km.explore("zoning_classification", m=m, name="within")
gdf_26910.explore("zoning_classification", m=m, name="all", legend=False)
folium.LayerControl().add_to(m)
m
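One way to see why the boolean mask in the question only matched the Historical Areas: when both sides of `intersects` are GeoSeries, GeoPandas compares them row by row, aligned on the index, so each buffer is only tested against the geometry it was built from. Below is a minimal sketch with made-up rectangles, assuming a projected CRS in meters, that instead tests every geometry against the union of all buffers. It is an alternative to the sjoin() approach above, not a replacement for it.

```python
import geopandas as gpd
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical zoning layer: three rectangles in a metric CRS (NAD83 / UTM 10N).
zones = gpd.GeoDataFrame(
    {"zoning_classification": ["Historical Area",
                               "Comprehensive Development",
                               "Industrial"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10), box(10000, 0, 10010, 10)],
    crs="EPSG:26910",
)

# Buffer only the historical areas by 5000 m and merge the buffers into one shape.
historic = zones[zones["zoning_classification"] == "Historical Area"]
buffer_union = unary_union(list(historic.geometry.buffer(5000)))

# Comparing a GeoSeries with a single geometry tests every row against it.
nearby = zones[zones.intersects(buffer_union)]
```

Here the rectangle 10 km away is dropped, while the adjacent one is kept along with the historical area itself.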

Which multipolygon does a point of longitude and latitude belong to in Python?

I have the longitude and latitude and the expected result is that whichever multipolygon the point is in, I get the name or ID of the multipolygon.
import geopandas as gpd
world = gpd.read_file('/Control_Areas.shp')
world.plot()
Output
0 MULTIPOLYGON (((-9837042.000 6137048.000, -983...
1 MULTIPOLYGON (((-11583146.000 5695095.000, -11...
2 MULTIPOLYGON (((-8542840.287 4154568.013, -854...
3 MULTIPOLYGON (((-10822667.912 2996855.452, -10...
4 MULTIPOLYGON (((-13050304.061 3865631.027, -13.
Previous attempts:
I have tried fiona, shapely and geopandas but have struggled to make progress. The closest I have gotten is the within and contains functions, but I have struggled to convert the multipolygons to polygons and then use within and contains to get the desired output.
The shapefile has been downloaded from here.
world.crs gives {'init': 'epsg:3857'} (Web Mercator projection) so you should first reproject your GeoDataFrame in the WGS84 projection if you want to keep the latitude-longitude coordinate system of your point.
world = world.to_crs("EPSG:4326")
Then you can use the intersects method of GeoPandas to find the indexes of the Polygons that contain your point.
For example for the city of New York:
from shapely.geometry import Point
NY_pnt = Point(-74.005941, 40.712784)  # shapely points are (x, y), i.e. (longitude, latitude)
world[["ID","NAME"]][world.intersects(NY_pnt)]
which results in:
ID NAME
20 13501 NEW YORK INDEPENDENT SYSTEM OPERATOR
You can check the result with the shapely within method:
NY_pnt.within(world["geometry"][20])
If you have multiple points, you can create a GeoDataFrame and use the sjoin method:
NY_pnt = Point(-74.005941, 40.712784)  # (longitude, latitude)
LA_pnt = Point(-118.243683, 34.052235)
points_df = gpd.GeoDataFrame({'geometry': [NY_pnt, LA_pnt]}, crs='EPSG:4326')
results = gpd.sjoin(points_df, world, predicate='within')  # the keyword was 'op' before GeoPandas 0.10
results[['ID', 'NAME']]
Output:
ID NAME
0 13501 NEW YORK INDEPENDENT SYSTEM OPERATOR
1 11208 LOS ANGELES DEPARTMENT OF WATER AND POWER
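A common stumbling block with point-in-polygon lookups is axis order: shapely builds points as (x, y), which for EPSG:4326 data in GeoPandas means (longitude, latitude). A minimal sketch of the effect, using a rough, made-up bounding box around New York City rather than the control-area shapefile:

```python
from shapely.geometry import Point, box

# Rough bounding box around New York City (illustrative values only),
# given as (min_lon, min_lat, max_lon, max_lat).
nyc_area = box(-74.3, 40.4, -73.6, 41.0)

lon, lat = -74.005941, 40.712784
right_order = Point(lon, lat)  # x = longitude, y = latitude
wrong_order = Point(lat, lon)  # swapped: lands far outside the box

inside_right = nyc_area.contains(right_order)
inside_wrong = nyc_area.contains(wrong_order)
```

With the swapped order the lookup silently returns no match instead of raising an error, so it is worth checking when a join comes back empty.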

geopandas sjoin returning empty rows

I have a table of polygons of all UK output areas structured as such:
newpoly
OBJECTID OA11CD LAD11CD Shape__Are Shape__Len TCITY15NM geometry
67519 67520 E00069658 E06000018 3.396296e+04 1006.464423 Nottingham POLYGON ((456069.067 340766.874, 456057.000 34...
67520 67521 E00069659 E06000018 1.014138e+05 1404.327776 Nottingham POLYGON ((456691.549 340778.104, 456557.864 34...
67521 67522 E00069660 E06000018 1.812783e+04 731.882609 Nottingham POLYGON ((456945.994 340821.233, 456969.220 34...
67522 67523 E00069661 E06000018 2.765546e+04 1112.317587 Nottingham POLYGON ((456527.178 340669.119, 456484.993 34...
67523 67524 E00069662 E06000018 3.647822e+04 964.989153 Nottingham POLYGON ((456301.845 340419.759, 456244.357 34...
and a table of points structured like:
restaurants
name latitude longitude geometry
0 Restaurant Sat Bains with rooms 52.925050 -1.167712 POINT (-1.16771 52.92505)
1 Revolution Hockley 52.954090 -1.144025 POINT (-1.14403 52.95409)
2 Revolution Cornerhouse 52.955517 -1.150088 POINT (-1.15009 52.95552)
but when i do:
spatial_join = gpd.sjoin(restaurants, newpoly, op = 'contains')
spatial_join
0 rows match.
the geometry column of the restaurants were made via:
restaurants = pd.read_csv('Restaurants_clean.csv')
restaurants = gpd.GeoDataFrame(
restaurants, geometry=gpd.points_from_xy(restaurants.longitude, restaurants.latitude))
I have tried different 'op' arguments but the same problem occurs. I am convinced that there must be a join because all UK output areas exist in the table.
Am i missing something?
You are using different projections. I am sure GeoPandas sjoin actually warns you about that. Create your point layer in the following way:
restaurants = pd.read_csv('Restaurants_clean.csv')
restaurants = gpd.GeoDataFrame(
restaurants,
geometry=gpd.points_from_xy(restaurants.longitude, restaurants.latitude),
crs=4326)
restaurants = restaurants.to_crs(newpoly.crs)
I first specify the CRS of the input (4326, the EPSG code of WGS84, i.e. lon/lat coordinates) and then re-project the data to the same CRS that newpoly has (I assume 27700).
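Once both layers share a CRS, the choice of predicate also matters: with the points on the left side of the join, they lie within the polygons, whereas 'contains' asks whether a point contains a polygon and so finds nothing. A minimal sketch, assuming GeoPandas 0.10 or later (where the keyword is `predicate`) and using a made-up rectangle in place of the real output-area geometry:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical output area: a large rectangle around Nottingham,
# in British National Grid meters (the geometry is invented).
newpoly = gpd.GeoDataFrame(
    {"OA11CD": ["E00069658"]},
    geometry=[box(440000, 320000, 470000, 360000)],
    crs="EPSG:27700",
)

# One restaurant from the question, built from lon/lat, tagged as WGS84,
# then re-projected to the CRS of the polygons.
restaurants = gpd.GeoDataFrame(
    {"name": ["Restaurant Sat Bains with rooms"]},
    geometry=[Point(-1.167712, 52.925050)],
    crs="EPSG:4326",
).to_crs(newpoly.crs)

# Points sit within polygons; a point cannot contain a polygon.
joined = gpd.sjoin(restaurants, newpoly, predicate="within")
```

Swapping the argument order to `gpd.sjoin(newpoly, restaurants, predicate="contains")` should find the same pairs, but keeps the polygon geometry instead of the point geometry.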

PANDAS-GEOPANDAS: Localization of points in a shapefile

Using pandas and geopandas, I would like to define a function to be applied to each row of a dataframe which operates as follows:
INPUT: column with coordinates
OUTPUT: zone in which the point falls.
I tried with this, but it takes very long.
def zone_assign(point, zones, codes):
    try:
        zone_label = zones[zones['geometry'].contains(point)][codes].values[0]
    except:
        zone_label = np.NaN
    return zone_label
where:
point is the cell of the row which contains geographical coordinates;
zones is the shapefile imported with geopandas;
codes is the column of the shapefile which contains label to be assigned to the point.
Part of this answer is taken from an earlier answer of mine, which needed within rather than contains.
Your situation looks like a typical case where spatial joins are useful. The idea of spatial joins is to merge data using geographic coordinates instead of using attributes.
Three possibilities in geopandas:
intersects
within
contains
It seems like you want contains, which is possible using the following syntax:
geopandas.sjoin(polygons, points, how="inner", predicate='contains')  # the keyword was 'op' before GeoPandas 0.10
Note: You need to have installed rtree to be able to perform such operations. If you need to install this dependency, use pip or conda to install it
Example
As an example, let's take a random sample of cities and plot countries associated. The two example datasets are
import geopandas
import matplotlib.pyplot as plt
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
cities = cities.sample(n=50, random_state=1)
world.head(2)
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
cities.head(3)
name geometry
196 Bogota POINT (-74.08529 4.59837)
95 Tbilisi POINT (44.78885 41.72696)
173 Seoul POINT (126.99779 37.56829)
world is a worldwide dataset and cities is a subset.
Both datasets need to be in the same projection system. If not, use .to_crs before merging.
data_merged = geopandas.sjoin(world, cities, how="inner", predicate='contains')  # the keyword was 'op' before GeoPandas 0.10
Finally, to see the result let's do a map
f, ax = plt.subplots(1, figsize=(20,10))
data_merged.plot(ax=ax)
world.plot(ax=ax, alpha=0.25, linewidth=0.1)
plt.show()
and the underlying dataset merges together the information we need
data_merged.head(2)
pop_est continent name_left iso_a3 gdp_md_est geometry index_right name_right
7 6909701 Oceania Papua New Guinea PNG 28020.0 MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ... 59 Port Moresby
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... 182 Buenos Aires
Here, I used the inner join method, but that is a parameter you can change if, for instance, you want to keep all points, including those not within a polygon.
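Applied to the original question, a single spatial join can stand in for the row-by-row zone_assign function. The sketch below uses made-up zones and points and assumes GeoPandas 0.10 or later; the column names `code` and `point_id` are hypothetical. It keeps every point with how="left", so points outside all zones end up with a missing label, much like the NaN fallback in the question.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical zones with a label column (the "codes" column in the question).
zones = gpd.GeoDataFrame(
    {"code": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Points to label; the last one falls outside every zone.
points = gpd.GeoDataFrame(
    {"point_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# One vectorised join replaces the per-row zones.contains(point) lookup.
# how="left" keeps unmatched points, with a missing value in the zone columns.
labelled = gpd.sjoin(points, zones, how="left", predicate="within")
zone_labels = labelled["code"].tolist()
```

The join uses a spatial index internally, which is usually where the speed-up over a Python-level loop comes from.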

Geofence calculating new longitude, latitude from distance and bearing (Python - Library)

I need to create a set of Lat/Long points to define a geofence zone. I have a central point (LatX, LongX) and I want to provide a distance (miles) and a bearing (degrees) and then get the new position (LatY, LongY).
I have some examples, but they do not take into account potential N/S and E/W. Is there a Python library that can provide these new positions based on a distance and angle?
Yes there is! Try:
https://geopy.readthedocs.io/en/stable/#module-geopy.distance
pip install geopy
Example code:
from geopy import distance, location
london = location.Point(51.5074, -0.1278)  # (latitude, longitude); London is west of Greenwich
paris = location.Point(48.8566, 2.3522)  # Paris is east of Greenwich
brighton = distance.geodesic().destination(
point=london,
bearing=180,
distance=distance.Distance(kilometers=76))
print("london-paris:", distance.distance(london, paris).kilometers)
print("brighton-paris:", distance.distance(brighton, paris).kilometers)
prints
london-paris: 343.9231200909896
brighton-paris: 282.31635799139883
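For intuition about what such a destination calculation involves, here is a minimal pure-Python sketch of the standard great-circle destination formula on a spherical Earth, extended to produce a ring of geofence points. This is a swapped-in approximation, not what geopy computes (its geodesic distance uses an ellipsoidal model), and the function names `destination` and `geofence` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation
KM_PER_MILE = 1.609344


def destination(lat, lon, bearing_deg, distance_miles):
    """Return (lat, lon) reached from a start point on a spherical Earth."""
    delta = distance_miles * KM_PER_MILE / EARTH_RADIUS_KM  # angular distance
    theta = math.radians(bearing_deg)
    phi1, lam1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Wrap the longitude into [-180, 180).
    return math.degrees(phi2), (math.degrees(lam2) + 540.0) % 360.0 - 180.0


def geofence(lat, lon, radius_miles, n_points=8):
    """Evenly spaced points on a circle around (lat, lon), clockwise from north."""
    return [destination(lat, lon, 360.0 * i / n_points, radius_miles)
            for i in range(n_points)]


# Eight fence points, 10 miles from central London.
fence = geofence(51.5074, -0.1278, radius_miles=10)
```

For distances of a few miles the spherical result should differ only slightly from geopy's ellipsoidal one; when accuracy matters, calling geopy's `destination` in the same loop over bearings is the safer choice.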
