Is there a way to merge two overlapping GeoJSON polygons in Python, returning a single merged GeoJSON object?
This is how I was able to do it using the packages/modules json, geojson, shapely, pyproj, and partial from functools:
import json
import geojson
from functools import partial
import pyproj
import shapely.geometry
import shapely.ops

# reading into two geojson objects, in a GCS (WGS84)
with open('file1.json') as geojson1:
    poly1_geojson = json.load(geojson1)
with open('file2.json') as geojson2:
    poly2_geojson = json.load(geojson2)

# pulling out the polygons - the files are feature collections containing a point,
# a polyline, and a polygon, so the polygon is the third feature
# (asShape() from the original answer was removed in Shapely 2.0; shape() replaces it)
poly1 = shapely.geometry.shape(poly1_geojson['features'][2]['geometry'])
poly2 = shapely.geometry.shape(poly2_geojson['features'][2]['geometry'])

# checking to make sure they registered as polygons
print(poly1.geom_type)
print(poly2.geom_type)

# merging the polygons - for my purposes they overlap, so merging produces
# a single polygon rather than a list of polygons
mergedPolygon = poly1.union(poly2)

# using the geojson module to convert the shapely geometry back into GeoJSON format
geojson_out = geojson.Feature(geometry=mergedPolygon, properties={})

# outputting the updated geojson file - for mapping/storage in its GCS format
with open('Merged_Polygon.json', 'w') as outfile:
    json.dump(geojson_out.geometry, outfile, indent=3)

# reprojecting the merged polygon to determine the correct area;
# it covers much of the US and is derived from USGS data, so using Albers Equal Area
# (note: Proj(init=...) and pyproj.transform are deprecated in pyproj 2+;
# pyproj.Transformer is the modern replacement)
project = partial(
    pyproj.transform,
    pyproj.Proj(init='epsg:4326'),
    pyproj.Proj(init='epsg:5070'))
mergedPolygon_proj = shapely.ops.transform(project, mergedPolygon)
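Once the merged polygon is in the equal-area projection, its area can be read straight off the shapely object; since EPSG:5070 uses meters, this gives square meters:
print(mergedPolygon_proj.area)  # square meters, as EPSG:5070 is a metric CRS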
This example from here seems to be a lot more concise:
from shapely.geometry import Polygon
from shapely.ops import cascaded_union
polygon1 = Polygon([(0, 0), (5, 3), (5, 0)])
polygon2 = Polygon([(0, 0), (3, 10), (3, 0)])
polygons = [polygon1, polygon2]
u = cascaded_union(polygons)
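Note that cascaded_union was deprecated in favor of unary_union in Shapely 1.8 and removed in Shapely 2.0, so on current versions the same merge reads:
from shapely.ops import unary_union
u = unary_union(polygons)
print(u.geom_type)  # "Polygon" - a single polygon, since the two triangles overlap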
The dissolve() function within GeoPandas makes this very simple. For example, I had a GeoDataFrame containing US counties (along with their polygons) and their corresponding Catholic diocese. I wanted to create polygons showing the outline of each diocese, built from its member counties. To do so, I used the following code:
diocese_boundaries = df_dioceses.dissolve(by = 'Diocese')
This returned a new GeoDataFrame containing one row per diocese, with a geometry column holding the merged outline of that diocese.
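A self-contained sketch of the same pattern, using the naturalearth_lowres sample data that ships with GeoPandas (dissolving countries into continents rather than counties into dioceses):
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# one row per continent; each geometry is the union of that continent's country polygons
continents = world[["continent", "geometry"]].dissolve(by="continent")
continents.plot()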
Related
I have a csv file in which the 2nd and 3rd rows have the lat and long values. The CSV file contains temperature data from 2011 to 2099 for India, and I want to filter the data to just the Satulaj basin using the shapefile of the Satulaj basin. How do I do this in Python?
import shapefile
from shapely.geometry import shape, Point
import pandas as pd

path = "D:\\THESIS\\Others\\DharamVeer_Sir\\1_Future_Climate Data\\"
df = pd.read_csv(path + "test3.csv")
path1 = "D:\\THESIS\\Others\\DharamVeer_Sir\\satulaj (1)\\Satluj\\"

# read your shapefile
r = shapefile.Reader(path1 + "satulaj.shp")

# get the shapes
shapes = r.shapes()

# build a shapely polygon from your shape
polygon = shape(shapes[0])

def check(lon, lat):
    # build a shapely point from your geopoint
    point = Point(lon, lat)
    # the contains function does exactly what you want
    return polygon.contains(point)

# the first two data rows hold each column's coordinates; they are passed as (lon, lat)
for i in range(len(df.axes[1])):
    lon = df.values[0][i]
    lat = df.values[1][i]
    print(check(lon, lat))
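If there are many columns, the per-point loop can also be replaced by a vectorized test with geopandas (a sketch, keeping the same assumption that row 0 holds longitudes and row 1 holds latitudes):
import geopandas as gpd
pts = gpd.GeoSeries(gpd.points_from_xy(df.values[0], df.values[1]))
print(pts.within(polygon))  # boolean Series, True where a point falls inside the basin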
I have a shapefile (hereafter called the source-file), which I need to clip by a multi-polygon shapefile so that I get a clipped shapefile for each polygon. I tried geopandas, and I am able to clip the source-file by selecting the polygons one at a time from the multi-polygon shapefile, but when I try to loop over the polygons to automate the clipping process I get the following error:
Error:
TypeError: 'mask' should be GeoDataFrame, GeoSeries or(Multi)Polygon, got <class 'tuple'>
Code:
import geopandas as gpd
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
for row in maskfile.iterrows():
    gpd.clip(sourcefile, row)
Two points
https://geopandas.org/en/stable/docs/reference/api/geopandas.clip.html mask can be a GeoDataFrame, hence no need for looping
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html iterrows() yields a tuple of the index value and the row as a Series. Hence your error: you are passing this tuple to clip()
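If you did want to loop, you would unpack the tuple and pass the row's geometry rather than the whole tuple, e.g. (a sketch):
for idx, row in maskfile.iterrows():
    clipped = gpd.clip(sourcefile, row.geometry)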
Have constructed an example. It is far simpler to clip using a GeoDataFrame as the mask.
import geopandas as gpd
import pandas as pd
# lets build a mask for use in clip, multipolygons and polygons
maskfile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
maskfile = maskfile.loc[
    maskfile["continent"].eq("Europe") & maskfile["name"].ne("Russia")
].pipe(
    lambda d: d.assign(gdp_grp=pd.cut(d["gdp_md_est"], bins=4, labels=list("abcd")))
).dissolve("gdp_grp").reset_index()
sourcefile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# now clip, no looping needed
gpd.clip(sourcefile, maskfile)
Finally, after 5 hours of research, I am now able to clip the shapefile by a multi-polygon shapefile and save the clipped polygons separately with their respective names. The following code may be dirty, but it works.
code:
import geopandas as gpd
import pandas as pd
import os

source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
outpath = ('/outpath')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
clipshape = maskfile.explode()
clipshape.set_index('CATCH_NAME', inplace=True)  # CATCH_NAME is the attribute column name

# clip by each polygon and save each result under its polygon's name
for index, row in clipshape['geometry'].items():  # .items() replaces the removed iteritems()
    clipped = gpd.clip(sourcefile, row)
    clipped.to_file(os.path.join(outpath, f'{index}.shp'))
I am trying to save a geodataframe with columns (geometry, area, centroid, and boundary) to a json file using df.to_file('result.geojson', driver="GeoJSON").
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there are only the geometry and area columns.
Also, when I try to save it as a CSV file and then read it back with geopandas, I am only able to convert geometry to the geometry datatype. However, centroid and boundary show as object datatype. How do I convert them to the geometry datatype?
as per comments, both geopandas and GeoJSON only support one geometry per feature
hence your data frame has polygons as its geometry, and the other columns are series of objects (shapely objects)
have simulated a data set. boundary, centroid and area are calculated columns, so in relational terms your data structure is not normalised
this can be saved as CSV; the shapely objects will be encoded as WKT
it can then simply be loaded, with a second step to decode the WKT back into shapely objects
have demonstrated this works by plotting the geometries loaded from CSV
import geopandas as gpd
import pandas as pd
import shapely.geometry
import shapely.wkt
from pathlib import Path

gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive the columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary)
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid)
gdf["area"] = gdf["geometry"].apply(lambda p: p.area)

# save as CSV; shapely WKT encoding will be implicitly used
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)

# load the encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns from strings back into shapely objects
for c in ["geometry", "boundary", "centroid"]:
    df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct the geodataframe
gdf = gpd.GeoDataFrame(df)

# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
I am trying to extract countries from NetCDF3 data using the PDSI monthly mean calibrated data from: https://psl.noaa.gov/data/gridded/data.pdsi.html. I am using the following code, which performs a spatial merge on the coordinates and identifies countries based on a shapefile of the world.
[image: PDSI data format]
import geopandas
import xarray as xr

# Import shapefile from geopandas
path_to_data = geopandas.datasets.get_path("naturalearth_lowres")
world_shp = geopandas.read_file(path_to_data)
world_shp.head()

# Import netCDF file
ncs = "pdsi.mon.mean.selfcalibrated.nc"

# Read in the netCDF as a pandas dataframe.
# Xarray provides a simple method of opening netCDF files and converting them to pandas dataframes
ds = xr.open_dataset(ncs)
pdsi = ds.to_dataframe()

# the index in the df is a pandas.MultiIndex. To reset it, use df.reset_index()
pdsi = pdsi.reset_index()

# quick check of the shapefile by plotting
world_shp.plot(figsize=(12, 8));

# use geopandas points_from_xy() to transform Longitude and Latitude into a list of
# shapely.Point objects and set it as the geometry while creating the GeoDataFrame
pdsi_gdf = geopandas.GeoDataFrame(pdsi, geometry=geopandas.points_from_xy(pdsi.lon, pdsi.lat))
print(pdsi_gdf.head())

# check CRS coordinates
world_shp.crs  # shapefile
pdsi_gdf.crs  # geodataframe netcdf

# set the coordinate systems equal to each other
# PointsGeodataframe.crs = PolygonsGeodataframe.crs
pdsi_gdf.crs = world_shp.crs

# check coordinates after setting them equal to each other
pdsi_gdf.crs  # geodataframe netcdf

# spatial join
join_inner_df = geopandas.sjoin(pdsi_gdf, world_shp, how="inner")
join_inner_df
The problem I am having is that the original data in the NetCDF format consists of spatial coverage/gridded data, where the values of the key variable (pdsi) represent the area within each shaded square (see image below). So far, only the coordinate points in the middle of the squares are being matched, and I would like each shaded square to match every country it overlaps. For example, if the area of a shaded square falls within the boundaries of both Germany and the Netherlands, then the key variable should be attributed to both countries. Any help on this issue would be greatly appreciated.
[image: NetCDF gridded data example]
have sourced the data you referenced to ensure this can be re-run on any machine
core solution: a square buffer around each point https://gis.stackexchange.com/questions/314949/creating-square-buffers-around-points-using-shapely
have analysed the data to ensure the value used for the buffer is appropriate; it is calculated from the data
# make sure that the data supports using a buffer...
assert (
    gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
    == gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer around each point be?
# half the grid spacing, so adjacent squares tile without overlapping
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)  # cap_style=3 gives square buffers
the remaining solution is now a spatial join
# the solution... spatial join buffered polygons to countries,
# comma separating the associated countries
gdf = gdf.join(
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .groupby("index_right")["name"]
    .agg(",".join)
)
have used plotly to visualise. From the image you can see that multiple countries have been associated with a bounding box.
complete code
import geopandas as gpd
import plotly.express as px
import requests
import urllib.parse
import shapely.geometry
import xarray as xr
from pathlib import Path

# download NetCDF data...
# fmt: off
url = "https://psl.noaa.gov/repository/entry/get/pdsi.mon.mean.selfcalibrated.nc?entryid=synth%3Ae570c8f9-ec09-4e89-93b4-babd5651e7a9%3AL2RhaV9wZHNpL3Bkc2kubW9uLm1lYW4uc2VsZmNhbGlicmF0ZWQubmM%3D"
f = Path.cwd().joinpath(Path(urllib.parse.urlparse(url).path).name)
# fmt: on
if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)

ds = xr.open_dataset(f)
pdsi = ds.to_dataframe()
pdsi = pdsi.reset_index().dropna()  # don't care about places in oceans...

# use a subset for testing... the last 5 times...
pdsim = pdsi.loc[pdsi["time"].isin(pdsi.groupby("time").size().index[-5:])]

# create geopandas dataframe
gdf = gpd.GeoDataFrame(
    pdsim, geometry=pdsim.loc[:, ["lon", "lat"]].apply(shapely.geometry.Point, axis=1)
)

# make sure that the data supports using a buffer...
assert (
    gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
    == gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point?? half the grid spacing
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)

# Import shapefile from geopandas
path_to_data = gpd.datasets.get_path("naturalearth_lowres")
world_shp = gpd.read_file(path_to_data)

# the solution... spatial join buffered polygons to countries,
# comma separating the associated countries
gdf = gdf.join(
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .groupby("index_right")["name"]
    .agg(",".join)
)
gdf["time_a"] = gdf["time"].dt.strftime("%Y-%b-%d")

# simplest way to test is to visualise...
px.choropleth_mapbox(
    gdf,
    geojson=gdf.geometry,
    locations=gdf.index,
    color="pdsi",
    hover_data=["name"],
    animation_frame="time_a",
    opacity=.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
I am hoping to create a region on a map and be able to automatically determine if points (coordinates) are inside that region. I'm using a geojson file of the entire US and coordinates for New York City for this example.
Geojson: https://github.com/johan/world.geo.json
I have read the shapely documentation and just can't figure out why my results are returning False. Any help would be much appreciated.
import json
from shapely.geometry import shape, GeometryCollection, Point

with open('USA.geo.json', 'r') as f:
    js = json.load(f)

point = Point(40.712776, -74.005974)

for feature in js['features']:
    polygon = shape(feature['geometry'])
    if polygon.contains(point):
        print('Found containing polygon:', feature)
I'm hoping to print the contained coordinates, but nothing is printed.
You need to swap the values of the Point() around:
point = Point(-74.005974, 40.712776)
The dataset you're using has the longitude first and the latitude second in their coordinates.
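The GeoJSON specification (RFC 7946) puts longitude first, so shapely points built to test against GeoJSON geometries should be constructed as Point(lon, lat). With the order corrected, the loop finds the containing feature (a sketch):
point = Point(-74.005974, 40.712776)  # (lon, lat) for New York City
for feature in js['features']:
    if shape(feature['geometry']).contains(point):
        print('Found containing polygon:', feature['properties'])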