I want to plot points using Longitude and Latitude with GeoPandas, but nothing gets plotted. How can I fix this?
It's never easy to answer a question when one has to use OCR to extract the data and code. Here's what I've managed to extract with OCR; there are some errors in the sample points.
This sample works, as can be seen from the output of plot().
What is very clear from your output is that the axes make no sense: 1e6 is far too large for geographic coordinates. Check your data: do you have longitudes/latitudes outside the valid WGS84 bounds (-180.0, -90.0, 180.0, 90.0)?
import pandas as pd
import geopandas as gpd

geodata = pd.DataFrame(
    [
        [-88.355555, 30.757778],
        [-120.849722, 46.041111],
        [-113.8875, 8.12],
        [-173.24, 38.54],
        [-85.663611, 46.154444],
        [-98.3555, -119.1342],  # e.g. this latitude is outside [-90, 90]
        [-9.5932, -11.2836],
        [-3.2948, 38.2224],
        [36.2327, 29.3626],
        [3.3483, 47.5047],
    ],
    columns=["Longitude", "Latitude"],
)

# build point geometries from the Longitude/Latitude columns
data_gdf = gpd.GeoDataFrame(
    geodata, geometry=gpd.points_from_xy(geodata["Longitude"], geodata["Latitude"])
)
data_gdf.plot()
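If you want the plot restricted to points that are actually valid in WGS84, a minimal sketch (assuming you simply drop the out-of-range rows rather than correct them) would be:

# keep only rows inside valid WGS84 bounds before plotting
valid = data_gdf[
    data_gdf["Longitude"].between(-180.0, 180.0)
    & data_gdf["Latitude"].between(-90.0, 90.0)
]
valid.plot()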
I'm trying to transform some points tabulated in a .csv into a netCDF file.
This is my .csv file: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
In my spreadsheet I have the unique location of each point (not on a regular grid over the whole area, but the points are spaced by 0.1 degree), with an SP value per year up to 100 years forward.
To work with this data, I needed something like other sources that provide netCDF data tabled as sp(time, lat, lon), so I can evaluate and visualize the values of this specific region by year (using Panoply or ncview, for example).
For that, I came up with this code:
import pandas as pd
import xarray as xr
import numpy as np

csv_file = 'example.csv'
df = pd.read_csv(csv_file)

# reshape from one column per year into long format: (lon, lat, time, sp)
df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time'] = pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')

# use a name other than `xr`, which would shadow the xarray import above
ds = df.to_xarray()
xc = ds.fillna(0)
xc.to_netcdf(csv_file + '.nc')
And I got a netCDF file like this: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
At first, my code seemed to work and created my netCDF file without problems. However, I noticed that in some places I am creating some "leakage" of points, i.e. interpolating the same values in some direction (north-south and west-east) when that shouldn't happen.
If you do a simple plot after converting to xarray, you can see there are 3 west segments and one south segment:
ds.sp[0].plot()
And this ends up being masked a bit when I fill the NaN with 0 and plot it again:
xc.sp[0].plot()
Checking the netCDF file using Panoply, I got something similar as well:
So I've started to check every step of my code to see if I missed something. My first guess was the melt part, but I'm not 100% sure, because if I plot df I can't see any leaking or extrapolation in the same region:
import seaborn
import contextily

joint_axes = seaborn.jointplot(
    # lon/lat live in the index after set_index, so reset it for plotting
    x="lon", y="lat", data=df.reset_index(), s=0.5
)
contextily.add_basemap(
    joint_axes.ax_joint,
    crs="EPSG:4326",
    source=contextily.providers.CartoDB.PositronNoLabels,
)
So, does anyone have any idea what's happening here?
EDIT:
Now, a solution that would help me at the moment would be to fill in the missing coordinates with a value equal to 0 within my domain area, using the minimum and maximum latitudes and longitudes.
My first (and unconventional) idea was to create a 0.1 x 0.1 grid with values equal to zero and feed this grid with my existing values.
However, the method using reindex would help me, and I would be able to do it in a few lines. My doubt is whether I should do this before or after the df.melt in my code.
I'm in this situation:
csv_file = '/Users/helioguerraneto/Desktop/example.csv'
df = pd.read_csv(csv_file)

# record the domain bounds before melting
lonmin, lonmax = df['lon'].min(), df['lon'].max()
latmin, latmax = df['lat'].min(), df['lat'].max()

df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time'] = pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')

ds = df.to_xarray()
# reindex onto a regular 0.1-degree grid, filling the gaps with 0
xc = ds.reindex(
    lat=np.arange(latmin, latmax, 0.1),
    lon=np.arange(lonmin, lonmax, 0.1),
    fill_value=0,
)
xc.to_netcdf(csv_file + '.nc')
Seems like reindex is the way, but I need to keep the original data. I was expecting some zeros, but not over the whole area:
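One possible reason, offered here as an assumption rather than a confirmed diagnosis: np.arange generates new floating-point values that will generally not compare exactly equal to the original lat/lon coordinates, so reindex treats every original coordinate as missing and fills the whole grid with 0. A minimal sketch of a workaround, assuming the coordinates are multiples of 0.1 up to float noise:

# round both the dataset coordinates and the target grid so they compare equal
ds = ds.assign_coords(lat=ds.lat.round(1), lon=ds.lon.round(1))
target_lat = np.arange(latmin, latmax + 0.05, 0.1).round(1)
target_lon = np.arange(lonmin, lonmax + 0.05, 0.1).round(1)
xc = ds.reindex(lat=target_lat, lon=target_lon, fill_value=0)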
EDIT2:
I think I found something that might help! My goal now could be the same as what's happening here: How to interpolate latitude/longitude and heading in Pandas
But instead of interpolating to the nearest, I could just match the exact coordinates. Maybe the real problem here is mixing the 100 grids at the end.
Any suggestions?
I am trying to plot the intersection between a buffer circle and the mesh blocks (or boundaries) within that circle of some radius (in this case, 80 km).
I got the intersection using sjoin() as follows:
intersection_MeshBlock = gpd.sjoin(buffer_df, rest_VIC, how='inner', predicate='intersects')
My buffer variable looks like this:
buffer_df
And the intersection looks like this:
intersection
The problem is I am not able to plot the intersection polygons.
Here is the plot I get after plotting it using folium's polygon plotting:
import folium

# m is assumed to be an existing folium.Map centred on the area of interest
for _, r in intersection_MeshBlock.iterrows():
    # Without simplifying the representation of each polygon,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.00001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['SA1_CODE21']).add_to(geo_j)
    geo_j.add_to(m)
m
Plot: [screenshot of the color-filled map]
What am I doing wrong?
EDIT:
I might have solved the issue partially. Now I am able to plot the polygons inside some buffer radius. This is how my plot looks:
If you see the image, you will realise that there are certain mesh blocks that cross the circular boundary region. How do I get rid of everything that is outside the circular region?
I have located some geometry for Melbourne to demonstrate.
Fundamentally, you want to use overlay(), not sjoin(): sjoin() only selects the rows whose geometry intersects the buffer but keeps their full geometry, while overlay() computes the actual geometric intersection, clipping the mesh blocks at the circle boundary.
Generation of the folium map is much simpler using the GeoPandas 0.10 capability explore().
import geopandas as gpd
import numpy as np
import shapely.geometry
import folium
rest_VIC = gpd.read_file(
"https://raw.githubusercontent.com/codeforgermany/click_that_hood/main/public/data/melbourne.geojson"
)
# select a point randomly from total bounds of geometry
buffer_df = gpd.GeoDataFrame(
geometry=[
shapely.geometry.Point(
np.random.uniform(*rest_VIC.total_bounds[[0, 2]], size=1)[0],
np.random.uniform(*rest_VIC.total_bounds[[1, 3]], size=1)[0],
)
],
crs=rest_VIC.crs,
)
buffer_df = gpd.GeoDataFrame(
geometry=buffer_df.to_crs(buffer_df.estimate_utm_crs())
.buffer(8 * 10**3)
.to_crs(buffer_df.crs)
)
# need overlay not sjoin
intersection_MeshBlock = gpd.overlay(buffer_df, rest_VIC, how="intersection")
m = rest_VIC.explore(name="base", style_kwds={"fill":False}, width=400, height=300)
m = buffer_df.explore(m=m, name="buffer", style_kwds={"fill":False})
m = intersection_MeshBlock.explore(m=m, name="intersection", style_kwds={"fillColor":"orange"})
folium.LayerControl().add_to(m)
m
I have a few hundred GeoPandas MultiLineStrings that trace along an object of interest (one line each week over a few years, tracing the Gulf Stream), and I want to use those lines to extract values from a few other xarray datasets, to get sea surface temperature, chlorophyll-a, and other variables along this path each week.
I'm unsure, though, how exactly to use these GeoPandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point, but that seems a bit cumbersome. Is there any straightforward way to do this operation?
Breaking the lines into points and then extracting the values at those points is quite straightforward, actually!
import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr
# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
data=np.random.rand(y.size, x.size),
coords={"y": y, "x": x},
dims=["y", "x"],
)
# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
geometry=[
sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
]
)
# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])
# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
y (point) float64 3.0 13.0
x (point) float64 3.0 13.0
* point (point) int64 0 1
The main thing is to decide on which points you'd like to sample, or how many points, etc.
Note that the geometry objects in the geodataframe also have an interpolate method, if you'd like to draw values at specific points along the trajectory:
https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate
In such a case, .apply can come in handy:
gdf.geometry.apply(lambda geom: geom.interpolate(3.0))
0 POINT (2.12132 2.12132)
1 POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
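For example, a minimal sketch that samples each line at a fixed number of evenly spaced points (reusing da and gdf from the example above; the 10 points per line is an arbitrary choice):

import numpy as np
import xarray as xr

n_points = 10  # arbitrary: how many samples to take along each line

xs, ys = [], []
for geom in gdf.geometry:
    for f in np.linspace(0.0, 1.0, n_points):
        pt = geom.interpolate(f, normalized=True)  # fraction along the line
        xs.append(pt.x)
        ys.append(pt.y)

x_indexer = xr.DataArray(xs, dims=["point"])
y_indexer = xr.DataArray(ys, dims=["point"])
sampled = da.sel(x=x_indexer, y=y_indexer, method="nearest")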
I have used regionmask and it is pretty fast and easy to use. The mask_geopandas method is what you need.
Since GeoPandas uses the same conventions as pandas, the best way is to unify the data types while you're working. You can convert a DataFrame to an xarray Dataset with:
xr.Dataset.from_dataframe(df)
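A minimal sketch of the regionmask route (assuming a GeoDataFrame gdf of polygons and an xarray Dataset ds with lon/lat coordinates; the variable names are illustrative):

import regionmask

# each grid cell gets the gdf row index of the polygon containing it,
# NaN outside all polygons
mask = regionmask.mask_geopandas(gdf, ds.lon, ds.lat)

# e.g. keep only the cells inside the first polygon
inside_first = ds.where(mask == 0)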
The task is to make an address popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have is point coordinates; there's no information about the district they belong to.
Question 1:
Do I have to manually calculate for each district whether a point belongs to it, or is there a more effective way to do this?
Question 2:
If there is no easier way, how can I get all the coordinates for each district from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shapefile with geopandas:
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
np.random.normal(size=(100, 2)) *
np.array([[.25, .25]]) +
np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then the join, using the default 'inner':
gpd.sjoin(districts, my_gdf, predicate='contains')
Thanks to @BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
I downloaded shapefiles for Moscow using this link.
From a list of tuples containing x and y (latitude and longitude) coordinates, I create a list of Points (docs).
Assuming that the dataframe from the first link gives me polygons, I can write a simple loop checking whether each Point is inside a polygon (see the sketch below). For details, read this.
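A minimal sketch of that loop, assuming districts is the GeoDataFrame read from the shapefile and points is the list of shapely Points (the column name is illustrative):

# count how many of the points fall inside each district polygon
counts = []
for polygon in districts.geometry:
    counts.append(sum(polygon.contains(pt) for pt in points))
districts['n_points'] = counts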
I have a pandas DataFrame with a few million rows, each with an X and Y attribute giving their location in kilometres according to the WGS 1984 World Mercator projection (created using ArcGIS).
What is the easiest way to project these points back to degrees, without leaving the Python/pandas environment?
There is already a Python module that can do this kind of transformation for you, called pyproj. I will agree it is actually not the simplest module to find via Google. Some examples of its use can be seen here.
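A minimal sketch with pyproj's Transformer API, assuming EPSG:3857 as the answer below does, and X/Y columns in kilometres as in the question (the factor of 1000 converts them to metres first):

import pandas as pd
from pyproj import Transformer

df = pd.DataFrame({"X": [50.0, 900.0], "Y": [20.0, 900.0]})  # kilometres

# EPSG:3857 (metres) -> EPSG:4326 (degrees); always_xy keeps (x, y) order
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
df["lon"], df["lat"] = transformer.transform(df["X"] * 1000, df["Y"] * 1000)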
Many years later, this is how I would do this. Keeping everything in GeoPandas to minimise the possibility of footguns.
Some imports:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
Create a dataframe (note the values must be in metres!)
df = pd.DataFrame({"X": [50e3, 900e3], "Y": [20e3, 900e3]})
Create geometries from the X/Y coordinates
df["geometry"] = df.apply(lambda row: Point(row.X, row.Y), axis=1)
Convert to a GeoDataFrame, setting the current CRS.
In this case EPSG:3857, the projection from the question.
gdf = gpd.GeoDataFrame(df, crs=3857)
Project it to the standard WGS84 CRS in degrees (EPSG:4326).
gdf = gdf.to_crs(4326)
And then (optionally), extract the X/Y coordinates in degrees back into standard columns:
gdf["X_deg"] = gdf.geometry.apply(lambda p: p.x)
gdf["Y_deg"] = gdf.geometry.apply(lambda p: p.y)