I have a CSV file containing the coordinates of points (more than 100 rows). The CSV file has 2 columns: Latitude, Longitude.
These points are the top-left corners of some polygons (squares).
All of the polygons have the same size (for example 100x100 meters).
Latitude Longitude
56.37769816725615 -4.325049868061924
55.37769816725615 -3.325049868061924
51.749167440074324 -4.963575226888083
...
I can load the CSV into a dataframe, and I can make points (or 4 points per row) from the coordinates with GeoPandas.
But how can I make a Polygon for each row that connects the 4 points?
Thanks for your help.
import pandas as pd
import geopandas

df = pd.read_csv('ExportPolyID.csv', nrows=10)
gdf = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.long, df.lat))
gdf['point2'] = gdf.translate(2, 2)
gdf['point3'] = gdf.translate(3, 3)
gdf['point4'] = gdf.translate(4, 4)
# After this I have 4 points for each row, but I can't connect them to create Polygons
If you want to define the square in meters, make sure you are using a projected CRS (http://geopandas.org/projections.html#re-projecting).
Then you can use something like this (there might be more efficient ways, but this one is explicit):
import geopandas as gpd
from shapely.geometry import Polygon

lat = [0, 2, 4]
lon = [0, 2, 4]

gdf = gpd.GeoDataFrame()
gdf['lat'] = lat
gdf['lon'] = lon

dim = 1  # define the length of the side of the square

geoms = []
for index, row in gdf.iterrows():
    ln = row.lon
    lt = row.lat
    # corners listed from the top-left corner, going clockwise
    geom = Polygon([(ln, lt), (ln + dim, lt), (ln + dim, lt - dim), (ln, lt - dim)])
    geoms.append(geom)

gdf['geometry'] = geoms
This will generate square polygons of size dim x dim from the given coordinates, with the point defined by each coordinate pair being the top-left corner.
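For 100 m x 100 m squares, here is a minimal sketch of the same idea applied to the CSV from the question, reprojected to a metric CRS first. It assumes the columns are really named Latitude/Longitude as in the sample, and uses the British National Grid (EPSG:27700) purely as an example; pick a projected CRS appropriate for your data:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

df = pd.read_csv('ExportPolyID.csv')
gdf = gpd.GeoDataFrame(df,
                       geometry=gpd.points_from_xy(df.Longitude, df.Latitude),
                       crs='EPSG:4326')

# reproject to a metric CRS so that "100" means 100 meters (EPSG:27700 is an assumption)
gdf = gdf.to_crs(epsg=27700)

dim = 100  # side length in meters
gdf['geometry'] = [
    Polygon([(x, y), (x + dim, y), (x + dim, y - dim), (x, y - dim)])
    for x, y in zip(gdf.geometry.x, gdf.geometry.y)
]

# back to lat/lon if needed
gdf = gdf.to_crs(epsg=4326)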
I have a list of points (longitude and latitude), as well as their associated point geometries, in a GeoDataFrame. All of the points should be able to be subdivided into individual polygons, as the points are generally clustered in several areas. What I would like to do is have some sort of algorithm that loops over the points and checks the distance between the previous and current point. If the distance is sufficiently small, it would group those points together. This process would continue until the current point is too far away. It would make a polygon out of those close points, and then continue the process with the next group of points.
gdf
longitude latitude geometry
0 -76.575249 21.157229 POINT (-76.57525 21.15723)
1 -76.575035 21.157453 POINT (-76.57503 21.15745)
2 -76.575255 21.157678 POINT (-76.57526 21.15768)
3 -76.575470 21.157454 POINT (-76.57547 21.15745)
5 -112.973177 31.317333 POINT (-112.97318 31.31733)
... ... ... ...
2222 -113.492501 47.645914 POINT (-113.49250 47.64591)
2223 -113.492996 47.643609 POINT (-113.49300 47.64361)
2225 -113.492379 47.643557 POINT (-113.49238 47.64356)
2227 -113.487443 47.643142 POINT (-113.48744 47.64314)
2230 -105.022627 48.585669 POINT (-105.02263 48.58567)
So in the data above, the first 4 points would be grouped together and turned into a polygon. Then it would move on to the next group, and so forth. Each group of points is not evenly sized, i.e., the next group might be 7 pairs of points, and the following could be 3. Ideally, the final output would be another GeoDataFrame that is just a bunch of polygons.
You can try DBSCAN clustering, as it will automatically find the best number of clusters, and you can specify a maximum distance between points (ε).
Using your example, the algorithm identifies two clusters.
import pandas as pd
from sklearn.cluster import DBSCAN
df = pd.DataFrame(
[
[-76.575249, 21.157229, (-76., 21.15723)],
[-76.575035, 21.157453, (-76.57503, 21.15745)],
[-76.575255, 21.157678, (-76.57526, 21.15768)],
[-76.575470, 21.157454, (-76.57547, 21.15745)],
[-112.973177, 31.317333, (-112.97318, 31.31733)],
[-113.492501, 47.645914, (-113.49250, 47.64591)],
[-113.492996, 47.643609, (-113.49300, 47.64361)],
[-113.492379, 47.643557, (-113.49238, 47.64356)],
[-113.487443, 47.643142, (-113.48744, 47.64314)],
[-105.022627, 48.585669, (-105.02263, 48.58567)]
], columns=["longitude", "latitude", "geometry"])
clustering = DBSCAN(eps=0.3, min_samples=4).fit(df[['longitude','latitude']].values)
gdf = pd.concat([df, pd.Series(clustering.labels_, name='label')], axis=1)
print(gdf)
gdf.plot.scatter(x='longitude', y='latitude', c='label')
longitude latitude geometry label
0 -76.575249 21.157229 (-76.0, 21.15723) 0
1 -76.575035 21.157453 (-76.57503, 21.15745) 0
2 -76.575255 21.157678 (-76.57526, 21.15768) 0
3 -76.575470 21.157454 (-76.57547, 21.15745) 0
4 -112.973177 31.317333 (-112.97318, 31.31733) -1 # not in cluster
5 -113.492501 47.645914 (-113.4925, 47.64591) 1
6 -113.492996 47.643609 (-113.493, 47.64361) 1
7 -113.492379 47.643557 (-113.49238, 47.64356) 1
8 -113.487443 47.643142 (-113.48744, 47.64314) 1
9 -105.022627 48.585669 (-105.02263, 48.58567) -1 # not in cluster
If we add random data to your data set, run the clustering algorithm, and filter out those data points not in clusters, you get a clearer idea of how it's working.
import numpy as np
rng = np.random.default_rng(seed=42)
arr2 = pd.DataFrame(rng.random((3000, 2)) * 100, columns=['latitude', 'longitude'])
randdf = pd.concat([df[['latitude', 'longitude']], arr2]).reset_index()
clustering = DBSCAN(eps=1, min_samples=4).fit(randdf[['longitude','latitude']].values)
labels = pd.Series(clustering.labels_, name='label')
gdf = pd.concat([randdf[['latitude', 'longitude']], labels], axis=1)
subgdf = gdf[gdf['label']> -1]
subgdf.plot.scatter(x='longitude', y='latitude', c='label', colormap='viridis', figsize=(20,10))
print(gdf['label'].value_counts())
-1 2527
16 10
3 8
10 8
50 8
...
57 4
64 4
61 4
17 4
0 4
Name: label, Length: 99, dtype: int64
Getting the clustered points from this dataframe would be relatively simple. Something like this:
subgdf['point'] = subgdf.apply(lambda x: (x['latitude'], x['longitude']), axis=1)
subgdf.groupby(['label'])['point'].apply(list)
label
0 [(21.157229, -76.575249), (21.157453, -76.5750...
1 [(47.645914, -113.492501), (47.643609, -113.49...
2 [(46.67210037270342, 4.380376578722878), (46.5...
3 [(85.34030732681661, 23.393948586534073), (86....
4 [(81.40203846660347, 16.697291990770392), (82....
...
93 [(61.419880354359925, 23.25522624430636), (61....
94 [(50.893415175135424, 90.70863269095085), (52....
95 [(88.80586950148697, 81.17523712192651), (88.6...
96 [(34.23624333000541, 40.8156668231013), (35.86...
97 [(16.10456828199399, 67.41443008931344), (15.9...
Name: point, Length: 98, dtype: object
Although you'd probably need to do some kind of sorting to make sure you were connecting the closest points when drawing the polygons.
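One hedged sketch of that last step: take the convex hull of each cluster, which sidesteps the point-ordering problem entirely. The names subgdf and label come from the code above; whether the hull is an acceptable polygon for your use case is an assumption:
from shapely.geometry import MultiPoint
import geopandas as gpd

# convex hull of each cluster's points; clusters with fewer than 3 points
# come back as a Point or LineString rather than a Polygon
hulls = (subgdf.groupby('label')
               .apply(lambda g: MultiPoint(list(zip(g['longitude'], g['latitude']))).convex_hull))
poly_gdf = gpd.GeoDataFrame({'label': hulls.index}, geometry=list(hulls.values))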
Similar SO question
DBSCAN from sklearn
Haversine Formula in Python (Bearing and Distance between two GPS points)
https://gis.stackexchange.com/questions/121256/creating-a-circle-with-radius-in-metres
You may be able to use the haversine formula to group points within a distance. Create a circular polygon for each point with the function below, then filter the points that fall inside it out of the master list of points, and repeat until there are no more points.
# import modules
import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame, GeoSeries
from shapely import geometry
from shapely.geometry import Polygon, Point
from functools import partial
import pyproj
from shapely.ops import transform

# function to create a circular polygon of the given radius (in meters) around a point
# (uses the older pyproj.transform/partial style; newer pyproj versions prefer pyproj.Transformer)
def polycir(lat, lon, radius):
    local_azimuthal_projection = "+proj=aeqd +R=6371000 +units=m +lat_0={} +lon_0={}".format(lat, lon)
    wgs84_to_aeqd = partial(
        pyproj.transform,
        pyproj.Proj("+proj=longlat +datum=WGS84 +no_defs"),
        pyproj.Proj(local_azimuthal_projection),
    )
    aeqd_to_wgs84 = partial(
        pyproj.transform,
        pyproj.Proj(local_azimuthal_projection),
        pyproj.Proj("+proj=longlat +datum=WGS84 +no_defs"),
    )
    center = Point(float(lon), float(lat))
    point_transformed = transform(wgs84_to_aeqd, center)
    buffer = point_transformed.buffer(radius)
    # get the polygon back in lat/lon coordinates
    circle_poly = transform(aeqd_to_wgs84, buffer)
    return circle_poly
# Convert df to gdf
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))

# Create circle polygon column (replace the placeholder with your radius)
gdf['polycir'] = [polycir(x, y, <'Radius in Meters'>) for x, y in zip(gdf.latitude, gdf.longitude)]
gdf.set_geometry('polycir', inplace=True)

# You should be able to loop through the polygons and find the geometries that overlap, e.g.
# gdf_filtered = gdf[gdf.polycir.within(gdf.iloc[0,4])]
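A hedged sketch of the grouping loop described above, assuming polycir and the gdf built from latitude/longitude columns as in the previous snippet, and a hypothetical radius of 500 m:
# repeatedly take the first ungrouped point, collect all points inside its
# circle, and remove them from the pool (the 500 m radius is hypothetical)
remaining = gdf.copy()
groups = []
while len(remaining):
    seed = remaining.iloc[0]
    circle = polycir(seed.latitude, seed.longitude, 500)
    inside = remaining[remaining['geometry'].within(circle)]  # 'geometry' holds the points
    groups.append(inside)
    remaining = remaining.drop(inside.index)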
Looks like a job for k-means clustering.
You may need to be careful about how you define your distance (actual distance "through" the earth, or the shortest path around it?).
Turning each cluster into a polygon depends on what you want to do... just chain the points, or take their convex envelope...
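A minimal k-means sketch, assuming the gdf from the question with longitude/latitude columns and that you already know (or can guess) the number of clusters k; unlike DBSCAN, scikit-learn's KMeans does not pick k for you:
from sklearn.cluster import KMeans

k = 3  # assumed number of clusters
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
gdf['label'] = kmeans.fit_predict(gdf[['longitude', 'latitude']].values)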
I have a satellite image file, loaded into a dask array. I want to get the (nearest) pixel value at a latitude/longitude of interest.
The satellite image is in the GEOS projection. I have the longitude and latitude information as 2D numpy arrays.
Satellite Image file
I have loaded it into a dask data array
from satpy import Scene
import matplotlib.pyplot as plt
import os

cwd = os.getcwd()
fn = os.path.join(cwd, 'EUMETSAT_data/1Jan21/MSG1-SEVI-MSG15-0100-NA-20210101185741.815000000Z-20210101185757-1479430.nat')
files = [fn]
scn = Scene(filenames=files, reader='seviri_l1b_native')
scn.load(["VIS006"])
da = scn['VIS006']
This is what the dask array looks like:
I read lon lats from the area attribute with the help of satpy:
lon, lat = scn['VIS006'].attrs['area'].get_lonlats()
print(lon.shape)
print(lat.shape)
(1179, 808)
(1179, 808)
I get a 2D numpy array each for longitude and latitude; these are coordinates, but I cannot use them for slicing or selecting.
What is the best practice/method to get the nearest lat/lon pixel information?
How do I project the data onto lat/lon coordinates that I can then use for indexing to arrive at the pixel value?
In the end, I want to get the (nearest) pixel value at the lat/lon of interest.
Thanks in advance!!!
The AreaDefinition object you are using (.attrs['area']) has a few methods for getting different coordinate information.
area = scn['VIS006'].attrs['area']
col_idx, row_idx = area.get_xy_from_lonlat(lons, lats)
scn['VIS006'].values[row_idx, col_idx]
Note that row and column are flipped. The get_xy_from_lonlat method should work for arrays or scalars.
There are other methods for getting the X/Y coordinates of each pixel if that is what you're interested in.
You can find the location with the following:
import numpy as np

px, py = (23.0, 55.0)  # some location (lon, lat) to take values from
# distance matrix from point (px, py), with a cos(lat) correction on the longitude term
dist = np.sqrt(np.cos(lat * np.pi / 180.0) * (lon - px)**2 + (lat - py)**2)
# find the location where the distance is minimum
kkout = np.squeeze(np.where(np.abs(dist) == np.nanmin(dist)))
print(kkout)  # you will see the row and column where to take out the data
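Assuming kkout comes back as (row, column), a hedged usage to pull the nearest pixel value from the data loaded above:
row, col = kkout
value = scn['VIS006'].values[row, col]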
@serge ballesta - thanks for the direction.
Answering my own question.
Project the latitude and longitude (PlateCarree projection) onto the GEOS projection CRS to find x and y. Then use this x and y with xarray's nearest-neighbour selection to get the pixel value from the dask array.
import cartopy.crs as ccrs

data_crs = ccrs.Geostationary(central_longitude=41.5, satellite_height=35785831, false_easting=0, false_northing=0, globe=None, sweep_axis='y')

lon = 77.541677  # longitude of interest
lat = 8.079148   # latitude of interest

# transform the lon/lat point into the GEOS projection's x/y coordinates
x, y = data_crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())

# nearest-neighbour selection on the DataArray loaded earlier (da = scn['VIS006'])
dn = da.sel(x=x, y=y, method='nearest')
I have a pandas dataframe containing MULTIPOLYGON coordinates in (LON, LAT) format. I need to use these coordinates to add a polygon to an ipyleaflet map, but I need to change the order of the coordinates to (LAT, LON).
df['Footprint'][0]
'MULTIPOLYGON (((-3.870231 39.827106,-3.49322 41.329609,-6.624273 41.739006,-6.931492 40.237854,-3.870231 39.827106)))'
from ipyleaflet import Map, Polygon

# Here in locations, I have manually changed the order
polygon = Polygon(
    locations=[(39.827106, -3.870231), (41.329609, -3.49322), (41.739006, -6.624273), (40.237854, -6.931492), (39.827106, -3.870231)],
    color="green",
    fill_color="green"
)

m = Map(center=(39.5531, -3.6914), zoom=6)
m.add_layer(polygon)
m
Any idea on how to do the trick?
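Since no answer is included here, a hedged sketch of one way to do it: parse the WKT string with shapely and swap each (lon, lat) pair. It assumes the Footprint column always holds a single-ring MULTIPOLYGON like the one above:
from shapely import wkt
from ipyleaflet import Polygon

footprint = df['Footprint'][0]
multipoly = wkt.loads(footprint)

# exterior ring of the first polygon, swapped from (lon, lat) to (lat, lon)
ring = list(multipoly.geoms[0].exterior.coords)
locations = [(lat, lon) for lon, lat in ring]

polygon = Polygon(locations=locations, color="green", fill_color="green")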
I am trying to find the equivalent (if one exists) of an NCL function that returns the indices of two-dimensional latitude/longitude arrays closest to a user-specified latitude/longitude coordinate pair.
This is the link to the NCL function that I am hoping has a Python equivalent. I suspect at this point that there is not one, so any tips on how to get indices from lat/lon coordinates are appreciated:
https://www.ncl.ucar.edu/Document/Functions/Contributed/getind_latlon2d.shtml
Right now, my coordinate values are saved in an .nc file and are read by:
from netCDF4 import Dataset

coords = 'coords.nc'
fh = Dataset(coords, mode='r')
lons = fh.variables['g5_lon_1'][:,:]
lats = fh.variables['g5_lat_0'][:,:]
rot = fh.variables['g5_rot_2'][:,:]
fh.close()
I found that scipy's spatial.KDTree can perform a similar task. Here is my code for finding the model grid point that is closest to the observation location:
import numpy as np
from scipy import spatial
from netCDF4 import Dataset

# read in the one-dimensional lat/lon info from a dataset
fname = '0k_T_ann_clim.nc'
fid = Dataset(fname, 'r')
lat = fid.variables['lat'][:]
lon = fid.variables['lon'][:]

# make them a meshgrid for later use with KDTree
lon2d, lat2d = np.meshgrid(lon, lat)

# zip them together into (lon, lat) pairs
model_grid = list(zip(np.ravel(lon2d), np.ravel(lat2d)))

# target point location: 30.5N, 56.1E -> (lon, lat) to match the model_grid ordering
target_pts = [56.1, 30.5]

distance, index = spatial.KDTree(model_grid).query(target_pts)

# the nearest model location (as lon, lat)
model_loc_coord = [coord for i, coord in enumerate(model_grid) if i == index]
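Since the original question asks for the indices into the 2D arrays rather than the coordinates themselves, a hedged follow-up: the flat index returned by KDTree can be unravelled back to (row, column) with numpy:
# convert the flat index into (row, col) indices of the 2D lat/lon grids
row, col = np.unravel_index(index, lon2d.shape)
print(lat2d[row, col], lon2d[row, col])  # should match model_loc_coord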
I'm not sure how the lon/lat arrays are stored when read in Python, so to use the following solution you may need to convert them to numpy arrays. You can just put the abs(array - target).argmin() in a function.
import numpy as np
# make a dummy longitude array, 0.5 degree resolution.
lon=np.linspace(0.5,360,720)
# find index of nearest longitude to 25.4
ind=abs(lon-25.4).argmin()
# check it works! this gives 25.5
lon[ind]
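For the 2D lat/lon arrays in the question, a hedged extension of the same idea: combine the squared differences and unravel the flat argmin back to (row, column). The lats/lons names and the target point are taken as given; this ignores the cos(lat) scaling of longitude, which may matter near the poles:
import numpy as np

target_lat, target_lon = 30.5, 56.1  # hypothetical target point
flat_idx = ((lats - target_lat)**2 + (lons - target_lon)**2).argmin()
row, col = np.unravel_index(flat_idx, lats.shape)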
I have thousands of polygons stored in table format (given by their 4 corner coordinates), each representing a small region of the earth. In addition, each polygon has a data value.
The file looks, for example, like this:
lat1, lat2, lat3, lat4, lon1, lon2, lon3, lon4, data
57.27, 57.72, 57.68, 58.1, 151.58, 152.06, 150.27, 150.72, 13.45
56.96, 57.41, 57.36, 57.79, 151.24, 151.72, 149.95, 150.39, 56.24
57.33, 57.75, 57.69, 58.1, 150.06, 150.51, 148.82, 149.23, 24.52
56.65, 57.09, 57.05, 57.47, 150.91, 151.38, 149.63, 150.06, 38.24
57.01, 57.44, 57.38, 57.78, 149.74, 150.18, 148.5, 148.91, 84.25
...
Many of the polygons intersect or overlap. Now I would like to create an n*m matrix, ranging from -90° to 90° latitude and -180° to 180° longitude in steps of, for instance, 0.25°x0.25°, to store the (area-weighted) mean data value of all polygons that fall within each pixel.
So, one pixel in the regular mesh shall get the mean value of one or more polygons (or none if no polygon overlaps with the pixel). Each polygon should contribute to this mean value depending on its area fraction within this pixel.
Basically the regular mesh and the polygons look like this:
If you look at pixel 2, you see that two polygons are inside this pixel. Thus, I have to take the mean data value of both polygons considering their area fractions. The result should be then stored in the regular mesh pixel.
I looked around the web and have found no satisfactory approach for this so far. Since I use Python/Numpy for daily work, I would like to stick with it. Is this possible? The package shapely looks promising, but I don't know where to begin with it...
Porting everything to a PostGIS database is an awful amount of effort, and I guess there would be quite a few obstacles in my way.
There are plenty of ways to do it, but yes, Shapely can help. It appears that your polygons are quadrilateral, but the approach I'll sketch doesn't depend on that. You won't need anything other than box() and Polygon() from shapely.geometry.
For each pixel, find the polygons that approximately overlap with it by comparing the pixel's bounds to the minimum bounding box of each polygon.
from shapely.geometry import box, Polygon

for pixel in pixels:
    # say the pixel has llx, lly, urx, ury values
    pixel_shape = box(llx, lly, urx, ury)
    for polygon in approximately_overlapping:
        # say the polygon has a ``value`` and a 2-D array of coordinates
        # [[x0,y0],...] named ``xy``
        polygon_shape = Polygon(xy)
        pixel_value += polygon_shape.intersection(pixel_shape).area * value
If the pixel and polygon don't intersect, the area of their intersection will be 0 and the contribution of that polygon to that pixel vanishes.
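A hedged sketch of that "approximately overlapping" prefilter, assuming polygon_shapes is a list of shapely Polygons built from your table rows; it compares every polygon's bounding box to a pixel's bounds with plain numpy, so only plausible candidates go through the exact intersection test:
import numpy as np

# (minx, miny, maxx, maxy) of every polygon, computed once up front
bounds = np.array([shape.bounds for shape in polygon_shapes])

def candidate_indices(llx, lly, urx, ury):
    # indices of polygons whose bounding box overlaps the pixel's bounds
    keep = ((bounds[:, 0] <= urx) & (bounds[:, 2] >= llx) &
            (bounds[:, 1] <= ury) & (bounds[:, 3] >= lly))
    return np.nonzero(keep)[0]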
I added a couple of things to my initial question, but this is a working solution so far. Do you have any ideas on how to speed things up? It is still quite slow. As input I have over 100,000 polygons, and the meshgrid has 720*1440 grid cells. That is also why I changed the order (looping over polygons instead of grid cells), because there are a lot of grid cells with no intersecting polygons. Furthermore, when there is only one polygon that intersects with a grid cell, the grid cell receives the whole data value of the polygon.
In addition, since I have to store the area fraction and the data value for the "post-processing" part, I set the maximum possible number of intersections per grid cell to 10.
from shapely.geometry import box, Polygon
import h5py
import numpy as np

f = h5py.File('data.he5', 'r')
geo = f['geo'][:]          # 10 columns: 4x lat corners, lat center, 4x lon corners, lon center
product = f['product'][:]  # one data value per polygon
f.close()

# prepare the regular meshgrid
delta = 0.25
darea = delta**-2  # 1 / (pixel area in square degrees)
llx, lly = np.meshgrid(np.arange(-180, 180, delta), np.arange(-90, 90, delta))
urx, ury = np.meshgrid(np.arange(-179.75, 180.25, delta), np.arange(-89.75, 90.25, delta))
lly = np.flipud(lly)
ury = np.flipud(ury)
llx = llx.flatten()
lly = lly.flatten()
urx = urx.flatten()
ury = ury.flatten()

# initialize the data structures
data = np.zeros(len(llx), 'f2') + np.nan
counter = np.zeros(len(llx), 'f2')
fraction = np.zeros((len(llx), 10), 'f2')
value = np.zeros((len(llx), 10), 'f2')

# go through all polygons
for ii in np.arange(1000):  # use len(product) for the full data set
    percent = (float(ii) / float(len(product))) * 100
    print("Polygon: %i (%0.3f %%)" % (ii, percent))

    xy = [[geo[ii, 5], geo[ii, 0]], [geo[ii, 7], geo[ii, 2]], [geo[ii, 8], geo[ii, 3]], [geo[ii, 6], geo[ii, 1]]]
    polygon_shape = Polygon(xy)

    # only go through grid cells which might intersect with the polygon
    minx = np.min(geo[ii, 5:9])
    miny = np.min(geo[ii, :4])
    maxx = np.max(geo[ii, 5:9])
    maxy = np.max(geo[ii, :4])
    mask = np.argwhere((lly >= miny) & (lly <= maxy) & (llx >= minx) & (llx <= maxx))
    if mask.size:
        for mm in mask:
            cc = int(counter[mm])
            pixel_shape = box(llx[mm], lly[mm], urx[mm], ury[mm])
            fraction[mm, cc] = polygon_shape.intersection(pixel_shape).area * darea
            value[mm, cc] = product[ii]
            counter[mm] += 1
print("post-processing")
mask = np.argwhere(counter>0)
for mm in mask:
for cc in np.arange(counter[mm]):
maxfraction = np.sum(fraction[mm,:])
value[mm,cc] = (fraction[mm,cc]/maxfraction) * value[mm,cc]
data[mm] = np.mean(value[mm,:int(counter[mm])])
data = data.reshape( 720, 1440 )