Plotting coordinates with Matplotlib is distorting the base-map - python

I am trying to show a spatial distribution of shops on a map using Geopandas and Matplotlib.
Problem:
When I plot the pins, the base map gets distorted. Here is a sample before plotting the pins and after.
Question:
What is the source of this distortion? How can I prevent it?
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
# Creating the simplified polygon
latitude = [60.41125, 59.99236, 59.99236]
longitude = [24.66917, 24.66917, 25.36972]
geometry = Polygon(zip(longitude, latitude))
polygon = gpd.GeoDataFrame(index=[0], crs = 'epsg:4326', geometry=[geometry])
# plotting the basemap
ax = polygon.plot(color="#3791CB")
# Dict of sample coordinates
coordinates = {"latitude": ["60.193141", "60.292777", "60.175053", "60.163187", "60.245272", "60.154392", "60.182906"],
"longitude": ["24.934214", "24.969730", "24.831068", "24.739044", "24.860983", "24.884773", "24.959175"]}
# Creating a dataframe from coordinates
df = pd.DataFrame(coordinates)
# Creating the GeoDataFrame
shops = gpd.GeoDataFrame(coordinates, geometry=gpd.points_from_xy(df.longitude, df.latitude))
# Plotting office coordinates
shops.plot(ax=ax, color="red", markersize = 20, zorder=2)
# adding grid
plt.grid(linestyle=":", color='grey')
plt.show()
Thank you!

Your map and pins have different reference systems.
When you create your first GeoDataFrame you specify its Coordinate Reference System (crs = 'epsg:4326'). When you create the GeoDataFrame for the shop coordinates you don't. This is where the distortion is coming from.
This should fix it:
shops = gpd.GeoDataFrame(
    coordinates,
    geometry = gpd.points_from_xy(
        df.longitude,
        df.latitude),
    crs = "EPSG:4326"
)
Cheers!
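For readers wondering why a missing CRS changes the shape of the map: as far as I can tell, GeoPandas' plot method picks the axes aspect ratio from the CRS. For a geographic CRS such as EPSG:4326 it stretches the y axis by 1 / cos(latitude) at the middle of the data's bounds, and for a GeoDataFrame without a CRS it falls back to an equal aspect. Plotting the CRS-less shops last would then reset the aspect that the base map had set. The sketch below reproduces that calculation for the polygon in the question; `geographic_aspect` is a hypothetical helper name, not a GeoPandas function.

```python
import math


def geographic_aspect(miny, maxy):
    """Aspect ratio for lon/lat data: 1 / cos(mid-latitude).

    Assumption: this mirrors what GeoPandas does for a geographic CRS;
    the helper itself is illustrative and not part of any library.
    """
    mid_latitude = (miny + maxy) / 2.0
    return 1.0 / math.cos(math.radians(mid_latitude))


# Latitude bounds of the simplified polygon from the question
aspect_with_crs = geographic_aspect(59.99236, 60.41125)
aspect_without_crs = 1.0  # "equal" aspect, used when no CRS is set

print(f"aspect with EPSG:4326: {aspect_with_crs:.2f}")
print(f"aspect without a CRS:  {aspect_without_crs:.2f}")
```

At this latitude the two values differ by roughly a factor of two, which is consistent with the visible squeeze. If you ever need to force one of them, calling `ax.set_aspect(...)` on the Matplotlib axes after the last plot call is the explicit route.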

Related

Why isn't my gridded data showing up on basemap?

I am trying to plot NASA GISS gridded temperature data but my maps keep showing up blank. Below is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import geopandas as gpd
import xarray as xr
ncin = xr.open_dataset('GriddedAir250.nc')
lons = ncin.variables['lon'][:]
lats = ncin.variables['lat'][:]
air = ncin.air
MeanTmax=air.mean(dim='time')
m=Basemap(projection='merc',
llcrnrlon= -123.416059,
llcrnrlat=18.954443,
urcrnrlon=-61.285950,
urcrnrlat= 47.536340,
resolution='i')
lon, lat = np.meshgrid(lons, lats)
xi, yi = m(lon, lat)
# Add Coastlines, States, and Country Boundaries
m.drawcoastlines()
m.drawstates()
m.drawcountries()
# Plot Data
cs = m.pcolor(xi,yi,np.squeeze(MeanTmax))
# Add Colorbar
cbar = m.colorbar(cs, location='bottom', pad="10%")
cbar.set_label('winter')
# Add Title
plt.title('DJF Maximum Temperature')
plt.show()
All I get is a blank map that looks like this. Why isn't the temperature data showing up?
The longitude grid in the source data is from 0 to 360 rather than -180 to 180. Because of this, it's likely that you've filtered out all of the data in your basemap projection command. I haven't tested because I don't have the deprecated basemap package.
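If the 0 to 360 convention is indeed the cause, one common remedy is to wrap the longitudes into the -180 to 180 range before building the mesh. Here is a minimal sketch, assuming plain Python floats; `wrap_longitude` is a hypothetical helper name and the sample values are made up. The same arithmetic also applies elementwise to a NumPy array or an xarray coordinate.

```python
def wrap_longitude(lon):
    """Map a longitude from the 0..360 convention to -180..180."""
    return ((lon + 180.0) % 360.0) - 180.0


# A few sample grid longitudes in the 0..360 convention (made-up values)
sample_lons = [0.0, 90.0, 236.5, 298.75, 359.5]
wrapped = [wrap_longitude(lon) for lon in sample_lons]

for before, after in zip(sample_lons, wrapped):
    print(f"{before:7.2f} -> {after:8.2f}")
```

After wrapping, the longitudes are no longer monotonically increasing, so the data usually has to be reordered along the longitude axis as well. With xarray, `sortby('lon')` is one way, assuming the coordinate is named `lon`.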

How to add an additional point location while plotting a geopandas dataframe using matplotlib

I am using the following code to plot an additional coordinate/point on top of a matplotlib plot of a geopandas dataframe. However, as the image indicates the point is not overlapping the choropleth - it should, since the latitude & longitude lies within the geographic location for which the data has been obtained from census. Please advise.
from cenpy import products
import pandas as pd
import matplotlib.pyplot as plt
import geopandas
%matplotlib inline
tustin = products.ACS(2018).from_county('Orange County, CA', level='tract',
variables=['B23025_005E', 'B23025_003E'])
tustin['pct_unemployed'] = tustin.B23025_005E / tustin.B23025_003E * 100
print(tustin.crs)
# additional data co-ordinate
lat = 3374569.5
lon = -11782886.0
df = pd.DataFrame(
{'Place': ['X'],
'Latitude': [lat],
'Longitude': [lon]})
gdf = geopandas.GeoDataFrame(
df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
# setting the same crs as the main geopandas dataframe
gdf.crs = {'init': 'epsg:3857'}
# plot the first dataframe 'tustin' as a choropleth
f, ax = plt.subplots(1,1,figsize=(20,20))
tustin.dropna(subset=['pct_unemployed'], axis=0).plot('pct_unemployed', ax=ax, cmap='plasma')
ax.set_facecolor('k')
gdf.plot(ax=ax, color='red')
#ax.plot(-13144890.450, 3992372.350, "ro")
plt.show()

Geopandas data not plotting correctly

I don't have very much experience with GeoPandas at all, so I am a little lost. I am trying to plot this data
I have followed many references on the GeoPandas website, read through blog posts, and this Stack Overflow post. All of them tell me to do the same thing, but it still does not seem to be working.
Ploting data in geopandas
When I try to plot this data, it comes out like this:
All I am trying to do is plot points from this csv file that has latitude and longitude data onto a map (eventually a map that I have loaded from an .shp file).
Anyways, here is the code I have written so far:
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import descartes
from shapely.geometry import Point, Polygon
#Load in the CSV Bike Station Location Data
df = pd.read_csv('HRSQ12020.csv')
#combine the latitude and longitude to make coordinates
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
df
#plot the geodf
df.plot(figsize=(20,10));
Any ideas what is wrong? I checked all 100 coordinates and they all seem to be fine. Any suggestions would be great! Thanks!
It's likely to be a problem with the projection system. It is good practice to define the crs immediately when creating a GeoPandas object. If you try,
df = gpd.GeoDataFrame(df, geometry='coordinates', crs = 4326)
maybe you will be able to see your points. I put "4326" because your x-y coordinates look like GPS coordinates, which follow the WGS84 standard (crs code: 4326). Change it to the relevant crs code if that is not the right one.
Those responses above are helpful. Setting the crs, as lingo suggested, also turned out to be a solution. I was getting an error, but it worked out when I ignored the error. Here is the code that ended up working.
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import descartes
from shapely.geometry import Point, Polygon
#Load in the CSV Bike Station Location Data
df = pd.read_csv('HRSQ12020.csv')
#combine the latitude and longitude to make coordinates
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df.head()
#fixing wrong negative value for Latitude
df.loc[df["Latitude"] == df["Latitude"].min()]
df.at[80, 'Latitude'] = 40.467715
#count the number of racks at each station
rackTot = 0
for index, row in df.iterrows():
rackTot += row['NumRacks']
crs = {'init' :'epsg:4326'}
geometry = [Point(xy) for xy in zip(df.Longitude, df.Latitude)]
geobikes = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
geobikes.head()
#plot the geodf
#not working for some reason, fix later
geobikes.plot()
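The wrong negative latitude fixed in the code above is the kind of value a quick range check can catch before any geometry is built. Here is a minimal sketch, assuming plain lists of floats; `find_out_of_range` is a hypothetical helper, and the sample values (including the negative one) are made up for illustration rather than taken from the CSV.

```python
def find_out_of_range(latitudes, longitudes,
                      lat_range=(-90.0, 90.0), lon_range=(-180.0, 180.0)):
    """Return the positions of coordinate pairs outside the expected ranges."""
    suspects = []
    for i, (lat, lon) in enumerate(zip(latitudes, longitudes)):
        lat_ok = lat_range[0] <= lat <= lat_range[1]
        lon_ok = lon_range[0] <= lon <= lon_range[1]
        if not (lat_ok and lon_ok):
            suspects.append(i)
    return suspects


# Illustrative values only: the third latitude has a stray minus sign
latitudes = [40.441326, 40.440877, -40.467715, 40.437200]
longitudes = [-80.004679, -80.003080, -80.001860, -80.000375]

# A window around the city is stricter than the global lat/lon limits
suspects = find_out_of_range(latitudes, longitudes,
                             lat_range=(40.0, 41.0), lon_range=(-81.0, -79.0))
print(suspects)
```

The default ranges only catch impossible coordinates. Passing a tighter window around the study area, as in the call above, also flags sign errors that are still valid positions on the globe.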
When I run your code with the first four rows of coords, I get what you'd expect. From the extent of your plot, it looks like you might have some negative latitude values. Can you do df['Latitude'].min() to check?
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon
df = pd.DataFrame({'Latitude' :[40.441326, 40.440877, 40.439030, 40.437200],
'Longitude' :[-80.004679, -80.003080, -80.001860, -80.000375]})
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
df
#plot the geodf
df.plot(figsize=(20,10));
You can also use plt.subplots() and then set xlim and ylim for your data.
df = pd.DataFrame({'Latitude' :[40.441326, 41.440877, 42.439030, 43.437200],
'Longitude' :[-78.004679, -79.003080, -80.001860, -81.000375]})
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
print(type(df))
#plot the geodf
fig, ax = plt.subplots(figsize=(14,6))
df.plot(ax = ax)
xlim = ([df.total_bounds[0] - 1, df.total_bounds[2] + 1])
ylim = ([df.total_bounds[1] - 1, df.total_bounds[3] + 1])
# you can also pass in the xlim or ylim vars defined above
ax.set_xlim([-82, -77])
ax.set_ylim([40, 42])
plt.show()

Generate grid of latitude-longitude coordinates that fall within polygon

I'm trying to plot data onto a map. I would like to generate data for specific points on the map (e.g. transit times to one or more prespecified location) for a specific city.
I found data for New York City here: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
It looks like they have a shapefile available. I'm wondering if there is a way to sample a latitude-longitude grid within the bounds of the shapefile for each borough (perhaps using Shapely package, etc).
Sorry if this is naive, I'm not very familiar with working with these files--I'm doing this as a fun project to learn about them
I figured out how to do this. Essentially, I just created a full grid of points and then removed those that did not fall within the shape files corresponding to the boroughs. Here is the code:
import geopandas
from geopandas import GeoDataFrame, GeoSeries
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import matplotlib.cm as cm
%matplotlib inline
import seaborn as sns
from shapely.geometry import Point, Polygon
import numpy as np
import googlemaps
from datetime import datetime
plt.rcParams["figure.figsize"] = [8,6]
# Get the shape-file for NYC
boros = GeoDataFrame.from_file('./Borough Boundaries/geo_export_b641af01-6163-4293-8b3b-e17ca659ed08.shp')
boros = boros.set_index('boro_code')
boros = boros.sort_index()
# Plot and color by borough
boros.plot(column = 'boro_name')
# Get rid of area that you aren't interested in (too far away)
plt.gca().set_xlim([-74.05, -73.85])
plt.gca().set_ylim([40.65, 40.9])
# make a grid of latitude-longitude values
xmin, xmax, ymin, ymax = -74.05, -73.85, 40.65, 40.9
xx, yy = np.meshgrid(np.linspace(xmin,xmax,100), np.linspace(ymin,ymax,100))
xc = xx.flatten()
yc = yy.flatten()
# Now convert these points to geo-data
pts = GeoSeries([Point(x, y) for x, y in zip(xc, yc)])
in_map = np.array([pts.within(geom) for geom in boros.geometry]).sum(axis=0)
pts = GeoSeries([val for pos,val in enumerate(pts) if in_map[pos]])
# Plot to make sure it makes sense:
pts.plot(markersize=1)
# Now get the lat-long coordinates as a list of "lat,lon" strings
coords = [f"{point.y},{point.x}" for point in pts]
which results in the following plots:
I also got a matrix of lat-long coordinates I used to make a transport-time map for every point in the city to Columbia Medical Campus. Here is that map:
and a zoomed-up version so you can see how the map is made up of the individual points:
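The grid-then-filter idea described above does not depend on any particular library. Here is a dependency-free sketch of it, with two stated assumptions: it uses a plain ray-casting test in place of Shapely's `within`, and the polygon is a made-up rectangle rather than a real borough boundary. `point_in_polygon` and `grid_points_within` are hypothetical helper names.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test for a simple polygon given as (x, y) vertices.

    Points exactly on the boundary may be classified either way.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_points_within(vertices, nx, ny):
    """Build an nx-by-ny grid over the bounding box and keep the inside points."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    kept = []
    for j in range(ny):
        y = ymin + (ymax - ymin) * j / (ny - 1)
        for i in range(nx):
            x = xmin + (xmax - xmin) * i / (nx - 1)
            if point_in_polygon(x, y, vertices):
                kept.append((y, x))  # stored as (lat, lon)
    return kept


# Made-up rectangle in (lon, lat) order, roughly in the NYC area
rectangle = [(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)]
grid = grid_points_within(rectangle, nx=5, ny=5)
print(len(grid), "grid points kept")
```

For real borough shapes with holes or multiple parts, the Shapely-based filter in the answer is the safer choice. This version only illustrates the logic.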

Make geopandas DataFrame plot on top a raster plot fit correctly

I am trying to plot the Pumps.shp data on top of the OSMap.tif file from this website on the same figure.
I tried using the rasterio.plot() and geopandas.plot() methods with Matplotlib's subplots.
The problem is that the plots don't match: the raster file gets plotted in the range (0, 1000) on both axes, while the shp gets plotted in its actual coordinate range (around 50000 on the x axis and around).
The crs are equal in both objects and the coordinates are in the same range. Why is this? What am I doing wrong?
Here is my code
import matplotlib.pyplot as plt
import rasterio as rast
import rasterio.plot as rsplot
import geopandas as gpd
src=rast.open("OSMap.tif")
data=gpd.read_file("Pumps.shp")
fig,ax=plt.subplots()
rsplot.show(src,ax=ax)
data.plot(ax=ax)
plt.show()
This is the result of calling src.bounds:
BoundingBox(left=528765.0, bottom=180466.0, right=529934.0, top=181519.0)
This is the result of data.bounds
(528765.0, 180466.0, 529934.0, 181519.0)
This is crs of both:
CRS({'lon_0': -2, 'y_0': -100000, 'k': 0.9996012717, 'lat_0': 49, 'proj': 'tmerc', 'wktext': True, 'datum': 'OSGB36', 'no_defs': True, 'x_0': 400000, 'units': 'm'})
I had the same problem with rasterio 0.36.0. I first tried to translate and scale the raster but then preferred to translate the shapefile.
My code looks like:
import geopandas as gpd
import matplotlib.pyplot as plt
import rasterio
import rasterio.plot  # needed for rasterio.plot.show below
image = rasterio.open('input.tif') # with tgw world file
shapefile = gpd.read_file('input.shp')
# coordinates and scaling factors
scale_x = image.transform[1]
scale_y = image.transform[5]
x0 = image.transform[0]
y0 = image.transform[3]
# translates back shapefile
shapefile.geometry = shapefile.translate(-x0, -y0)
shapefile.geometry = shapefile.scale(-1.0/scale_x, -1.0/scale_y, origin=(0, 0, 0))
# plots both elements
fig, ax = plt.subplots()
ax = rasterio.plot.show(image.read(), with_bounds=True, ax=ax)
shapefile.plot(ax=ax)
Use Matplotlib's imshow instead of rasterio's show. Pass the bounds of the raster as the "extent" parameter of imshow.
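A minimal sketch of that suggestion, under a few assumptions: the raster is north-up, its pixels are available as a 2-D array, and its bounds come in the (left, bottom, right, top) order that `src.bounds` prints in the question. `bounds_to_extent` and `show_raster` are hypothetical helper names. The one easy-to-miss detail is that imshow expects the extent as (left, right, bottom, top), so the bounds have to be reordered.

```python
def bounds_to_extent(left, bottom, right, top):
    """Reorder raster bounds into Matplotlib's (left, right, bottom, top)."""
    return (left, right, bottom, top)


def show_raster(pixels, bounds):
    """Draw a 2-D array in map coordinates and return the figure and axes."""
    # Imported here so the helper above has no dependencies
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # The default origin="upper" puts the first row at the top (north-up raster)
    ax.imshow(pixels, extent=bounds_to_extent(*bounds), cmap="gray")
    return fig, ax


# Bounds reported by src.bounds in the question: (left, bottom, right, top)
bounds = (528765.0, 180466.0, 529934.0, 181519.0)
extent = bounds_to_extent(*bounds)
print(extent)
```

With the objects from the question, this would be used roughly as `fig, ax = show_raster(src.read(1), src.bounds)` followed by `data.plot(ax=ax)`, so that both layers share the same map coordinates.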
