Overlaying Shapefile datapoints on Density Map - python

I am new to shapefiles and mapping in python so I was hoping to get some help with overlaying data points from a shapefile on a density map.
To be honest, I am a beginner with mapping and reading in shapefiles so what I have so far not much.
I have started off using pyshp but if there are better packages out there to do this then I would love any feedback.
The following code is to create the base map of the LA area:
def get_base_map(rides_clean):
return folium.Map(locations=[rides_clean.start_lat.mean(),
rides_clean.start_lon.mean()],
zoom_start = 20, tiles = 'cartodbpositron')
The following code is to create the density/heat map:
from folium import plugins
stationArr = rides_clean[['start_lat', 'start_lon']][:40000].as_matrix()
get_base_map(rides_clean).add_child(plugins.HeatMap(stationArr,
radius=40, max_val=300))
The following code is the same heat map but with route lines added:
(draw_route_lines(get_base_map(rides_clean),
routedf_vol)).add_child(plugins.HeatMap(stationArr, radius=40,
max_val=300))
I want to see data points from the shapefile shown as markers on top of the density plot.

It is possible to do this with pyshp. I've only ever used Matplotlib to plot shapefile points on a map, but this method will create two arrays which will be the x and y coordinates of each point you'd like to plot. The first snippet is used if you have multiple shapes in your shapefile, while the second can be used if you only have one shape.
import shapefile
import numpy as np
sf = shapefile.Reader('/path/to/shapefile')
point_list = []
for shape in sf:
temp = shape.points()
point_list.append(temp)
point_list = np.array(point_list)
x = point_list[:,0]
y = point_list[:,1]
And for a shapefile with only a single shape:
import shapefile
import numpy as np
sf = shapefile.Reader('/path/to/shapefile')
point_list = np.array(sf.shape(0).points)
x = point_list[:,0]
y = point_list[:,1]
You can tell how many shapes are in your shapefile using sf.shapes() and it will print a list detailing all the different shapes. From your question it appeared you were wanting to plot it as points on the marker rather than lines, sorry if this is not the case.

Related

Find in what polygon is each point

I am new to Python, so I apologize for the rudimentary programming skills, I am aware I am using a bit too much "loop for" (coming from Matlab it is dragging me down).
I have millions of points (timestep, long, lat, pointID) and hundreds of irregular non-overlapping polygons (vertex_long,vertex_lat,polygonID).points and polygons format sample
I want to know what polygon contains each point.
I was able to do it this way:
from matplotlib import path
def inpolygon(lon_point, lat_point, lon_poly, lat_poly):
shape = lon_point.shape
lon_point = lon_point.reshape(-1)
lat_point = lat_point.reshape(-1)
lon_poly = lon_poly.values.reshape(-1)
lat_poly = lat_poly.values.reshape(-1)
points = [(lon_point[i], lat_point[i]) for i in range(lon_point.shape[0])]
polys = path.Path([(lon_poly[i], lat_poly[i]) for i in range(lon_poly.shape[0])])
return polys.contains_points(points).reshape(shape)
And then
import numpy as np
import pandas as pd
Areas_Lon = Areas.iloc[:,0]
Areas_Lat = Areas.iloc[:,1]
Areas_ID = Areas.iloc[:,2]
Unique_Areas = np.unique(Areas_ID)
Areas_true=np.zeros((Areas_ID.shape[0],Unique_Areas.shape[0]))
for i in range(Areas_ID.shape[0]):
for ii in range(Unique_Areas.shape[0]):
Areas_true[i,ii]=(Areas_ID[i]==Unique_Areas[ii])
Areas_Lon_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
Areas_Lat_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
for i in range(Unique_Areas.shape[0]):
Areas_Lon_Vertex[i]=(Areas_Lon[(Areas_true[:,i]==1)])
Areas_Lat_Vertex[i]=(Areas_Lat[(Areas_true[:,i]==1)])
import f_inpolygon as inpolygon
Areas_in=np.zeros((Unique_Areas.shape[0],Points.shape[0]))
for i in range (Unique_Areas.shape[0]):
for ii in range (PT.shape[0]):
Areas_in[i,ii]=(inpolygon.inpolygon(Points[ii,2], Points[ii,3], Areas_Lon_Vertex[i], Areas_Lat_Vertex[i]))
This way the final outcome Areas_in Areas_in format contains as many rows as polygons and as many columns as points, where every column is true=1 at the row where the point is relative to polygon index (1st given polygon ID --> 1st row, and so).
The code works but very slowly for what it is supossed to do. When locating points in a regular grid or within a point radius I have succesfully tried implement a KDtree, what increases dramatically the speed, but I can`t do the same or whatever faster to irregular non-overlapping polygons.
I have seen some related questions but rather than asking for what polygons a point is were about whether a point is inside a polygon or not.
Any idea please?
Have you tried Geopandas Spatial join?
install the Package using pip
pip install geopandas
or conda
conda install -c conda-forge geopandas
then you should able to read the data as GeoDataframe
import geopandas
df = geopandas.read_file("file_name1.csv") # you can read shp files too.
right_df = geopandas.read_file("file_name2.csv") # you can read shp files too.
# Convert into geometry column
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])] # Coordinate reference system : WGS84
crs = {'init': 'epsg:4326'}
# Creating a Geographic data frame
left_df = geopandas.GeoDataFrame(df, crs=crs, geometry=geometry)
Then you can apply the sjoin
jdf = geopandas.sjoin(left_df, right_df, how='inner', op='intersects', lsuffix='left', rsuffix='right')
the option in op are:
intersects
contains
within
All should do the same in your case when you joining two geometry columns of type Polygon and Point

Plotting with folium

The task is to make an adress popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have - points coordinates and there's no information about the district they belong to.
Question 1:
Do I have to manually calculate for each disctrict if the point belongs to it, or there is more effective way to do this?
Question 2:
If there is no way to do this easier, then, how can I get all the coordinates for each disctrict from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shape file with geopandas
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
np.random.normal(size=(100, 2)) *
np.array([[.25, .25]]) +
np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then the join using default value of 'inner'
gpd.sjoin(districts, my_gdf, op='contains')
Thanks to #BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
I download a shape-files for Moscow using this link click
From a list of tuples containing x and y (latitude and logitude) coordinates I create list of Points (docs)
Assuming that in the dataframe from the first link I have polygons I can write a simple loop for checking if the Point is inside this polygon. For details read this.

How to create a geographical heatmap passing custom radii

I want to create a visualization on a map using folium. In the map I want to observe how many items are related to a particular geographical point building a heatmap. Below is the code I'm using.
import pandas as pd
import folium
from folium import plugins
data = [[41.895278,12.482222,2873494.0,20.243001,20414,7.104243],
[41.883850,12.333330,3916.0,0.835251,4,1.021450],
[41.854241,12.567000,22263.0,1.132390,35,1.572115],
[41.902147,12.590388,19505.0,0.839181,37,1.896950],
[41.994240,12.48520,16239.0,1.383981,25,1.539504]]
df = pd.DataFrame(columns=['latitude','longitude','population','radius','count','normalized'],data=data)
middle_lat = df['latitude'].median()
middle_lon = df['longitude'].median()
m = folium.Map(location=[middle_lat, middle_lon],tiles = "Stamen Terrain",zoom_start=11)
# convert to (n, 2) nd-array format for heatmap
points = df[['latitude', 'longitude', 'normalized']].dropna().values
# plot heatmap
plugins.HeatMap(points, radius=15).add_to(m)
m.save(outfile='map.html')
Here the result
In this map, each point has the same radius. Insted, I want to create a heatmap in which the points radius is proportional with the one of the city it belongs to. I already tried to pass the radii in a list, but it is not working, as well as passing the values with a for loop.
Any idea?
You need to add one point after another. So you can specify the radius for each point. Like this:
import random
import numpy
pointArrays = numpy.split(points, len(points))
radii = [5, 10, 15, 20, 25]
for point, radius in zip(pointArrays, radii):
plugins.HeatMap(point, radius=radius).add_to(m)
m.save(outfile='map.html')
Here you can see, each point has a different size.

Extracting the value of 2-d gridded field by scatter points in Python

I have a 2-d gridded files which represents the land use catalogues for the place of interest.
I also have some lat/lon based point distributed in this area.
from netCDF4 import Dataset
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon,lat,lu)
plt.scatter(point_data.lon,cf_fire_data.lat, color ='r')
I want to extract the values of the 2-d gridded field which those points belong, but I found it is difficult to define a simple function to solve that.
Is there any efficient method to achieve it?
Any advices would be appreciated.
PS
I have uploaded my files here
1. nc_file
2. point_file
I can propose solution like this, where I just loop over the points and select the data based on the distance from the point.
#/usr/bin/env ipython
import numpy as np
from netCDF4 import Dataset
import matplotlib.pylab as plt
import pandas as pd
# --------------------------------------
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon,lat,lu)
#plt.scatter(point_data.lon,cf_fire_data.lat, color ='r')
# --------------------------------------------
# get data for points:
dataout=[];
lon_ratio=np.cos(np.mean(lat)*np.pi/180.0)
for ii in range(len(point)):
plon,plat = point.lon[ii],point.lat[ii]
distmat=np.sqrt(1./lon_ratio*(lon-plon)**2+(lat-plat)**2)
kk=np.where(distmat==np.min(distmat));
dataout.append([float(lon[kk]),float(lat[kk]),float(lu[kk])]);
# ---------------------------------------------

Mapping GPS coordinates around Chicago using Basemap

I'm using python's matplotlib and Basemap libraries.
I'm attempting to plot a list of GPS points around the city of Chicago for a project that I'm working on but it's not working. I've looked at all of the available examples, but despite copying and pasting them verbatim (and then changing the gps points) the map fails to render with the points plotted.
Here are some example points as they are stored in my code:
[(41.98302392, -87.71849159),
(41.77351707, -87.59144826),
(41.77508317, -87.58899995),
(41.77511247, -87.58646695),
(41.77514645, -87.58515301),
(41.77538531, -87.58611272),
(41.71339537, -87.56963306),
(41.81685612, -87.59757281),
(41.81697313, -87.59910809),
(41.81695808, -87.60049861),
(41.75894604, -87.55560586)]
and here's the code that I'm using to render the map (which doesn't work).
# -*- coding: utf-8 -*-
from pymongo import *
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from collections import Counter
import ast
def routes_map():
"""
doesn't work :(
# map of chicago
"""
all_locations = [] #<-- this is the example data above
x = []
y = []
for loc in all_locations: #creates two lists for the x and y (lat,lon) coordinates
x.append(float(loc[0]))
y.append(float(loc[1]))
# llcrnrlat,llcrnrlon,urcrnrlat,urcrnrlon
# are the lat/lon values of the lower left and upper right corners
# of the map.
# resolution = 'i' means use intermediate resolution coastlines.
# lon_0, lat_0 are the central longitude and latitude of the projection.
loc = [41.8709, -87.6331]
# setup Lambert Conformal basemap.
m = Basemap(llcrnrlon=-90.0378,llcrnrlat=40.6046,urcrnrlon=-85.4277,urcrnrlat=45.1394,
projection='merc',resolution='h')
# draw coastlines.
m.drawcoastlines()
m.drawstates()
# draw a boundary around the map, fill the background.
# this background will end up being the ocean color, since
# the continents will be drawn on top.
m.drawmapboundary(fill_color='white')
x1, y1 = m(x[:100],y[:100])
m.plot(x1,y1,marker="o",alpha=1.0)
plt.title("City of Chicago Bus Stops")
plt.show()
This is what I get from running this code:
Does anyone have any tips as to what I'm doing wrong?
You are accidentally inputting latitude values as x and longitude values as y. In the example data you give, the first column is latitude and the second column is longitude, not the other way around as your code seems to think.
So use x.append(float(loc[1])) and y.append(float(loc[0])) instead of what you have.

Categories