Geopandas data not plotting correctly - python

I don't have much experience with GeoPandas, so I am a little lost. I am trying to plot this data:
[Image: Jupyter notebook dataframe]
I have followed many references on the GeoPandas website, read through blog posts, and read this Stack Overflow post: "Ploting data in geopandas". All of them tell me to do the same thing, but it still does not seem to be working.
When I try to plot this data, it comes out like this:
[Image: plot output]
All I am trying to do is plot points from this csv file that has latitude and longitude data onto a map (eventually a map that I have loaded from an .shp file).
Anyways, here is the code I have written so far:
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import descartes
from shapely.geometry import Point, Polygon
#Load in the CSV Bike Station Location Data
df = pd.read_csv('HRSQ12020.csv')
#combine the latitude and longitude to make coordinates
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
df
#plot the geodf
df.plot(figsize=(20,10));
Any ideas what is wrong? I checked all 100 coordinates and they all seem to be fine. Any suggestions would be great! Thanks!

It's likely to be a problem with the projection system. It is good practice to define the CRS immediately when creating a GeoPandas object. If you try,
df = gpd.GeoDataFrame(df, geometry='coordinates', crs = 4326)
maybe you will be able to see your points. I put "4326" because your x-y coordinates look like GPS coordinates, which follow the WGS84 standard (EPSG code 4326). Change it to the relevant CRS code if that is not the right one.

The responses above are helpful. Setting the CRS, as lingo suggested, also turned out to be a solution. I was getting an error, but it worked once I ignored the error. Here is the code that ended up working.
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import descartes
from shapely.geometry import Point, Polygon
#Load in the CSV Bike Station Location Data
df = pd.read_csv('HRSQ12020.csv')
#combine the latitude and longitude to make coordinates
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df.head()
#fixing wrong negative value for Latitude
df.loc[df["Latitude"] == df["Latitude"].min()]
df.at[80, 'Latitude'] = 40.467715
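#count the number of racks at each station
rackTot = 0
for index, row in df.iterrows():
    rackTot += row['NumRacks']
#the {'init': ...} form is deprecated in recent pyproj and emits a warning; 'EPSG:4326' is the current spelling
crs = {'init' :'epsg:4326'}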
geometry = [Point(xy) for xy in zip(df.Longitude, df.Latitude)]
geobikes = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
geobikes.head()
#plot the geodf
#not working for some reason, fix later
geobikes.plot()

When I run your code with the first four rows of coords, I get what you'd expect. From the extent of your plot, it looks like you might have some negative latitude values. Can you do df['Latitude'].min() to check?
import csv
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon
df = pd.DataFrame({'Latitude' :[40.441326, 40.440877, 40.439030, 40.437200],
'Longitude' :[-80.004679, -80.003080, -80.001860, -80.000375]})
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
df
#plot the geodf
df.plot(figsize=(20,10));
You can also use plt.subplots() and then set xlim and ylim for your data.
df = pd.DataFrame({'Latitude' :[40.441326, 41.440877, 42.439030, 43.437200],
'Longitude' :[-78.004679, -79.003080, -80.001860, -81.000375]})
df['coordinates'] = df[['Longitude', 'Latitude']].values.tolist()
# Change the coordinates to a geoPoint
df['coordinates'] = df['coordinates'].apply(Point)
df
#convert df to a geodf
df = gpd.GeoDataFrame(df, geometry='coordinates')
print(type(df))
#plot the geodf
fig, ax = plt.subplots(figsize=(14,6))
df.plot(ax = ax)
xlim = ([df.total_bounds[0] - 1, df.total_bounds[2] + 1])
ylim = ([df.total_bounds[1] - 1, df.total_bounds[3] + 1])
# you can also pass in the xlim or ylim vars defined above
ax.set_xlim([-82, -77])
ax.set_ylim([40, 42])
plt.show()
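The latitude check suggested at the start of this answer can be generalised a little. Below is a minimal, dependency-free sketch that flags coordinates lying far from the median of their column. The function name `find_outliers`, the 5-degree threshold, and the negative sample value are assumptions for illustration, not part of the original data.

```python
from statistics import median

def find_outliers(values, max_dev=5.0):
    """Return the indices of values further than max_dev from the median."""
    mid = median(values)
    return [i for i, v in enumerate(values) if abs(v - mid) > max_dev]

# Four latitudes from the example above plus one hypothetical sign error.
latitudes = [40.441326, 40.440877, 40.439030, 40.437200, -40.467715]
bad_rows = find_outliers(latitudes)
print(bad_rows)
```

The same call on the longitude column would catch a stray positive longitude. With pandas, the returned indices could be used with `df.iloc` to inspect the suspicious rows.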

Related

Folium HeatMapWithTime html file generated is blank

I wrote self-contained code to create a HeatMapWithTime map, but the output shows up as a blank page. The code is run in Jupyter and the output is a 14KB file; I've tried opening it in Chrome, Safari, and Firefox, but it is still blank.
import folium
import pandas as pd
import numpy as np
from folium.plugins import HeatMapWithTime
# Generate dummy data
latitudes = np.random.uniform(low=45.523, high=45.524, size=50)
longitudes = np.random.uniform(low=-122.675, high=-122.676, size=50)
times = np.sort(np.random.uniform(low=1580000000, high=1600000000, size=50))
data = {'latitude': latitudes, 'longitude': longitudes, 'time': times}
# Create a pandas dataframe from the dummy data
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'], unit='s')
# Create a base map
map = folium.Map(location=[45.523, -122.675], zoom_start=13)
# Create a heat map with timeline
HeatMapWithTime(data=df[['latitude', 'longitude', 'time']].values.tolist(),
index=df['time'].dt.strftime("%Y-%m-%d %H:%M:%S"),
auto_play=True,
max_opacity=0.8).add_to(map)
# Save the map to an html file
map.save("heatmap_with_timeline.html")
Folium version: 0.14.0
Python version: 3.9.12
To begin with, the target data for the heatmap is time-series data. The sample data held raw timestamps, which were converted to date format. I also think the index for the heatmap's time animation needs to be a list. Finally, the sample data contains only one latitude/longitude point per time step. Since this folium heatmap is a density heatmap, each time step likely needs a group of several points. To build the heatmap data from your sample, I loop over the 50 time indexes and generate 50 latitude/longitude/heatmap-value triples for each one.
For data structures and examples, please refer to the following reference: HeatMapWithTime Plugin
import folium
import pandas as pd
import numpy as np
from folium.plugins import HeatMapWithTime
# Generate dummy data
times = np.sort(np.random.uniform(low=1580000000, high=1600000000, size=50))
data = []
for i in range(len(times)):
    latitudes = np.random.uniform(low=45.423, high=45.524, size=50)
    longitudes = np.random.uniform(low=-122.575, high=-122.676, size=50)
    value = np.random.uniform(0.0, 20.0, 50)
    row = []
    for lat, lon, v in zip(latitudes, longitudes, value):
        row.append([lat, lon, v])
    data.append(row)
index_time = pd.to_datetime(times, unit='s')
index_time = index_time.strftime("%Y-%m-%d %H:%M:%S").tolist()
# Create a base map
m = folium.Map(location=[45.523, -122.675], tiles="stamentoner", zoom_start=11)
# Create a heat map with timeline
hm = HeatMapWithTime(data,
index=index_time,
auto_play=True,
max_opacity=0.8)
hm.add_to(m)
# Save the map to an html file
#m.save("heatmap_with_timeline.html")
m
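The nested layout this answer builds (one list of [lat, lon, weight] rows per time step, plus an index of labels with the same length) can also be produced from flat (lat, lon, timestamp) records like those in the question. A minimal sketch, assuming daily buckets and a constant weight of 1.0; `frames_by_day` is a hypothetical helper, not part of folium, and the sample records are made up:

```python
from collections import defaultdict
from datetime import datetime, timezone

def frames_by_day(records, weight=1.0):
    """Group flat (lat, lon, unix_seconds) records into per-day frames.

    Returns (data, index): data[i] is a list of [lat, lon, weight] rows
    for the day named by index[i].
    """
    buckets = defaultdict(list)
    for lat, lon, ts in records:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        buckets[day].append([lat, lon, weight])
    index = sorted(buckets)  # ISO dates sort chronologically as strings
    data = [buckets[day] for day in index]
    return data, index

# Hypothetical sample records: two on one day, one on the next.
records = [
    (45.5231, -122.6751, 1580000000),
    (45.5235, -122.6755, 1580003600),
    (45.5238, -122.6759, 1580086400),
]
data, index = frames_by_day(records)
print(index)
print([len(frame) for frame in data])
```

The resulting `data` and `index` could then be passed as `HeatMapWithTime(data, index=index)`, in the same way as in the answer's code.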
This should help.
import pandas as pd
import folium
from folium.plugins import HeatMap
#for_map = pd.read_csv('campaign_contributions_for_map.tsv', sep='\t')
df = pd.read_csv('C:\\your_path\\business.csv')
df.head(3)
max_amount = float(df['stars'].max())
hmap = folium.Map(location=[42.5, -75.5], zoom_start=7, )
hm_wide = HeatMap( list(zip(df.latitude.values, df.longitude.values, df.stars.values)),
min_opacity=0.2,
max_val=max_amount,
radius=17, blur=15,
max_zoom=1,
)
hmap.add_child(hm_wide)
import pandas as pd
import gmplot
import matplotlib.pyplot as plt
import folium
from folium import plugins
import seaborn as sns
df = pd.read_csv('C:\\your_path\\lat_lon.csv')
m = folium.Map([40.7379601, -73.9666422], zoom_start=11)
m
X = df[['longitude', 'latitude', 'LOT']].copy()
# mark each station as a point
for index, row in X.iterrows():
    folium.CircleMarker([row['latitude'], row['longitude']],
                        radius=15,
                        popup=row['LOT'],
                        fill_color="#3db7e4", # divvy color
                        ).add_to(m)
# convert to (n, 2) nd-array format for heatmap
stationArr = df[['latitude', 'longitude']].to_numpy()
# plot heatmap
m.add_child(plugins.HeatMap(stationArr, radius=15))
m
https://github.com/ASH-WICUS/Notebooks/blob/master/Plotting%20Longitude%20and%20Latitude%20Coordinates%20using%20Folium%20CircleMarker.ipynb
https://github.com/ASH-WICUS/Notebooks/blob/master/Plotting%20Longitude%20and%20Latitude%20to%20Visualize%20Spatial%20Data%20for%20NYC%20Taxis.ipynb

How to turn individual points into a kernel density map?

I am trying to recreate in Python a map I made in Tableau. I'm pretty sure it's called a kernel density map (just "density" map in Tableau). Each point is just a single point and doesn't correspond with any kind of value.
I have plotted my points on a map, but am unable to figure out how to give them the contour appearance in the Tableau map. I see some examples that include np.meshgrid or matplotlib's contourf function, but I'm unable to apply it to my data because I don't have a Z coordinate (from what I can tell). Below is what I have currently:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import timedelta, date
import matplotlib
from pandas import Series, DataFrame
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.geometry import Point, mapping
# import data
df = pd.read_csv('./Data/complete.csv', on_bad_lines='skip') # csv had some bad data, had to skip
# drop unnecessary columns
df = df[['datetime', 'latitude', 'longitude']]
# convert 'datetime' to YYYY-MM-DD
df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True)
# clean the data
# remove values that contain '/' and 'q'
df = df.drop(df[df.latitude.str.contains(r'[/q]')].index)
# convert lat/long to float
df['latitude'] = df.latitude.astype('float')
df['longitude'] = df.longitude.astype('float')
# create GDF from lat/long
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude)).set_crs('EPSG:4326')
# import shapefile
us_map = gpd.read_file(r'./Data/USA_States_(Generalized)/USA_States_Generalized.shp')
#remove AK and HI
us_map = us_map[~us_map['STATE_NAME'].isin(['Alaska', 'Hawaii'])]
# plotting only 2012 sightings
sightings2012 = df[(df['datetime'] > '12/31/2011') & (df['datetime'] < '01/01/2013')]
# create GDF for 2012 sightings
gdf2012 = gpd.GeoDataFrame(sightings2012,
geometry=gpd.points_from_xy(sightings2012.longitude,
sightings2012.latitude)).set_crs('EPSG:4326')
# clip the sightings data
us_sightings_2012 = gpd.clip(gdf2012, us_map)
cmap = matplotlib.cm.get_cmap('plasma')
# plot
fig, ax = plt.subplots(1, 1)
us_map.plot(ax=ax)
us_sightings_2012.plot(ax=ax, cmap=cmap)
plt.show()
And here is my output:
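Regarding the missing Z coordinate mentioned above: one common approach is to compute it as a kernel density estimate of the points on a regular grid, which then gives `contourf` something to draw. A minimal sketch in plain Python, assuming a Gaussian kernel with a hand-picked bandwidth in degrees. `density_grid`, the bandwidth, and the sample points are illustrative assumptions, and a library routine such as `scipy.stats.gaussian_kde` would normally replace the hand-written loop:

```python
import math

def density_grid(points, xs, ys, bandwidth=1.0):
    """Unnormalised Gaussian kernel density of (x, y) points on a grid.

    Returns a list of rows, z[j][i] being the density at (xs[i], ys[j]).
    """
    z = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(total)
        z.append(row)
    return z

# Hypothetical (longitude, latitude) points: a cluster plus one stray point.
points = [(-100.0, 40.0), (-100.5, 40.2), (-99.8, 39.9), (-80.0, 30.0)]
xs = [-105.0 + i for i in range(31)]   # longitudes -105 .. -75
ys = [25.0 + j for j in range(21)]     # latitudes 25 .. 45
z = density_grid(points, xs, ys)
```

Here `z` has one row per latitude and one column per longitude, which is the layout `ax.contourf(xs, ys, z)` expects, so it could be drawn on the same axes as the state map.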

How to add an additional point location while plotting a geopandas dataframe using matplotlib

I am using the following code to plot an additional coordinate/point on top of a matplotlib plot of a geopandas dataframe. However, as the image indicates, the point does not overlap the choropleth. It should, since the latitude and longitude lie within the geographic area for which the data was obtained from the census. Please advise.
from cenpy import products
import pandas as pd
import matplotlib.pyplot as plt
import geopandas
%matplotlib inline
tustin = products.ACS(2018).from_county('Orange County, CA', level='tract',
variables=['B23025_005E', 'B23025_003E'])
tustin['pct_unemployed'] = tustin.B23025_005E / tustin.B23025_003E * 100
print(tustin.crs)
# additional data co-ordinate
lat = 3374569.5
lon = -11782886.0
df = pd.DataFrame(
{'Place': ['X'],
'Latitude': [lat],
'Longitude': [lon]})
gdf = geopandas.GeoDataFrame(
df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
# setting the same crs as the main geopandas dataframe
gdf.crs = {'init': 'epsg:3857'}
# plot the first dataframe 'tustin' as a choropleth
f, ax = plt.subplots(1,1,figsize=(20,20))
tustin.dropna(subset=['pct_unemployed'], axis=0).plot('pct_unemployed', ax=ax, cmap='plasma')
ax.set_facecolor('k')
gdf.plot(ax=ax, color='red')
#ax.plot(-13144890.450, 3992372.350, "ro")
plt.show()
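One thing worth checking here: EPSG:3857 coordinates are in metres, and the values above look like degrees multiplied by 100000 (33.74569, -117.82886) rather than projected metres. That reading is an assumption. Under it, a minimal sketch of the standard spherical Web Mercator forward formula shows where the point would land. `lonlat_to_web_mercator` is a hypothetical helper, and in practice `GeoDataFrame.to_crs` does the same job:

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 degrees to EPSG:3857 metres (spherical Mercator)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y

# Assumed intent of the question's point: 33.74569 N, 117.82886 W.
x, y = lonlat_to_web_mercator(-117.82886, 33.74569)
print(round(x), round(y))
```

The result is of the same magnitude as the commented-out test point in the question (-13144890.450, 3992372.350), which would be consistent with the choropleth being plotted in metres.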

geopandas find all possible adjacent/closest geometrical points from a given point

So, I have a dataframe like this,
import numpy as np
import pandas as pd
import descartes
from shapely.geometry import Point, Polygon
import geopandas as gpd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Address':['280 Broadway','1 Liberty Island','141 John Street'],
'Latitude':[ 40.71,40.69,40.71],
'Longitude':[-74.01,-74.05,-74.00]
})
%matplotlib inline
geometry = [Point(xy) for xy in zip( df["Longitude"],df["Latitude"])]
crs = {'init':'epsg:4326'}
df = gpd.GeoDataFrame(df,
crs=crs,
geometry=geometry)
df.head()
I converted the latitude and longitude to geometry points, and I am trying to find all of the closest points for each address using those points. For example, I want all of the points adjacent to 280 Broadway, i.e. those that lie next to each other within one block. There could be more than one such point if the adjacent points are contained in a polygon shape.
This was my approach but didn't really get what I wanted,
df.insert(4, 'nearest_geometry', None)
from shapely.geometry import Point, MultiPoint
from shapely.ops import nearest_points
for index, row in df.iterrows():
    point = row.geometry
    multipoint = df.drop(index, axis=0).geometry.unary_union
    queried_geom, nearest_geom = nearest_points(point, multipoint)
    df.loc[index, 'nearest_geometry'] = nearest_geom
Any help is appreciated, Thanks.
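For comparison, the same nearest-neighbour idea can be sketched without shapely by measuring great-circle distances directly. This assumes that "closest" means smallest haversine distance and that "adjacent" means within a chosen radius; the 1 km radius and the helper names are illustrative assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def neighbours(places, radius_km=1.0):
    """Map each address to (nearest other address, addresses within radius)."""
    result = {}
    for name, (lat, lon) in places.items():
        dists = sorted(
            (haversine_km(lat, lon, olat, olon), other)
            for other, (olat, olon) in places.items()
            if other != name
        )
        within = [other for d, other in dists if d <= radius_km]
        result[name] = (dists[0][1], within)
    return result

# Addresses and coordinates from the dataframe above.
places = {
    '280 Broadway': (40.71, -74.01),
    '1 Liberty Island': (40.69, -74.05),
    '141 John Street': (40.71, -74.00),
}
result = neighbours(places)
print(result)
```

This compares every pair of points, which is fine for a handful of addresses; a spatial index would be the usual choice for larger datasets.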

Matplotlib cannot plot points on basemap from CSV, but plots correctly from JSON

Problem
I'm trying to plot a set of points on a basemap. Below is my code. However, the points are not displayed where they are supposed to be on the map. I have added a Dropbox link to the CSV file I am using below.
Dropbox link to the csv file
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
%matplotlib inline
#read data from CSV
building = pd.read_csv('masteronlyfive.csv')
# convert coords to float type
building = building.astype({"lat": float, "long": float})
# convert to geodata series
building = geopandas.GeoDataFrame(towers, geometry=geopandas.points_from_xy(building.lat,building.long))
# set CRS
building.crs = {'init' :'epsg:4326'}
building.head()
# read basemap file and set CRS
world = geopandas.read_file("South_Africa_Polygon.shp")
world.crs = {'init' :'epsg:4326'}
# Plot basemap
ax = world.plot(color='white', edgecolor='black')
# plot points
building.plot(ax=ax, color='red')
plt.show()
What I have tried
I have taken the coordinates and re-coded them as a JSON-style array instead of CSV, so I'm reading the data from an array in the code rather than doing a CSV import, as shown below. That works completely fine, which is totally shocking to me.
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
%matplotlib inline
#reading from json array
df = pd.DataFrame(
{'Country': ['building', 'building', 'building', 'building', 'building'],
'Latitude': [-28.506806, -27.463611, -29.192053, -28.871950, -27.242444],
'Longitude': [28.613972, 28.040001, 26.235583,27.873739, 28.838861]})
#creating geopandas points from the coordinates
gdf = geopandas.GeoDataFrame(
df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
#reading the basemap file
world = geopandas.read_file("South_Africa_Polygon.shp")
# plotting the basemap
ax = world.plot(color='white', edgecolor='black')
# plotting the geodata points
gdf.plot(ax=ax, color='red')
plt.show()
What could I possibly be doing wrong, given that the exact same coordinates work fine from JSON but not from CSV?
When you create a geodataframe, change long/lat order, like below:
# convert to geodata series
building = geopandas.GeoDataFrame(
geometry=geopandas.points_from_xy(building.long, building.lat)
)
Your longitude is x, not y, and your latitude is y. Hence, when using the points_from_xy() function, longitude (which is x) comes first. This is a very common error, and you can spot it in the plot: the polygon boundaries are diagonally opposite your cluster of points, and that is most often caused by the order of lat/lon!
P.S. I am not sure why your original code references towers variable in that code snippet, so I removed it.
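A quick way to catch this swap before plotting is to count how many points fall inside the basemap's bounding box in each order. A minimal sketch, assuming a rough bounding box for South Africa (the numbers below are approximate and only for illustration; with geopandas, `world.total_bounds` would supply the real ones) and a hypothetical helper `count_inside`:

```python
def count_inside(points, bounds):
    """Count (x, y) points inside bounds = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    return sum(1 for x, y in points if minx <= x <= maxx and miny <= y <= maxy)

# Approximate lon/lat box around South Africa (illustrative assumption).
bounds = (16.0, -35.0, 33.0, -22.0)

# Latitude/longitude pairs from the question.
lats = [-28.506806, -27.463611, -29.192053, -28.871950, -27.242444]
lons = [28.613972, 28.040001, 26.235583, 27.873739, 28.838861]

as_lon_lat = count_inside(zip(lons, lats), bounds)  # x = longitude (correct)
as_lat_lon = count_inside(zip(lats, lons), bounds)  # x = latitude (swapped)
print(as_lon_lat, as_lat_lon)
```

If the swapped order places more points inside the box than the order actually used, the arguments to points_from_xy() are probably reversed.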
