Extract values from xarray dataset using geopandas multilinestring - python

I have a few hundred geopandas multilinestrings that trace along an object of interest (one line each week over a few years tracing the Gulf Stream) and I want to use those lines to extract values from a few other xarray datasets to know sea surface temperature, chlorophyll-a, and other variables along this path each week.
I'm unsure though how exactly to use these geopandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point but that seems a bit cumbersome. Is there any straightforward way to do this operation?

Breaking the lines into points and then extracting the values at those points is actually quite straightforward!
import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr
# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)
# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
    geometry=[
        sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
        sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
    ]
)
# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])
# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
y (point) float64 3.0 13.0
x (point) float64 3.0 13.0
* point (point) int64 0 1
The main thing is to decide on which point you'd like to sample, or how many points, etc.
Note that the geometry objects in the geodataframe also have an interpolate method, if you'd like to draw values at specific points along the trajectory:
https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate
In such a case, .apply can come in handy:
gdf.geometry.apply(lambda geom: geom.interpolate(3.0))
0 POINT (2.12132 2.12132)
1 POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
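To sample several points along a line rather than a single one, the interpolate approach above can be combined with the vectorized indexing shown earlier. The following is only a minimal sketch: sample_along_line and its n_points parameter are hypothetical names introduced here, and it assumes a plain LineString whose coordinates are in the same units as the DataArray's x and y.

```python
import numpy as np
import shapely.geometry as sg
import xarray as xr

# Example DataArray on a regular grid (same layout as the example above).
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)


def sample_along_line(da, line, n_points=11):
    """Hypothetical helper: sample `da` at evenly spaced points along `line`."""
    # Distances from the start of the line to its end, in coordinate units.
    distances = np.linspace(0.0, line.length, n_points)
    points = [line.interpolate(d) for d in distances]
    # Indexers that share the "point" dimension trigger pointwise selection.
    x_indexer = xr.DataArray([p.x for p in points], dims=["point"])
    y_indexer = xr.DataArray([p.y for p in points], dims=["point"])
    sampled = da.sel(x=x_indexer, y=y_indexer, method="nearest")
    # Keep the along-line distance as a coordinate for later plotting.
    return sampled.assign_coords(distance=("point", distances))


line = sg.LineString([(0.0, 0.0), (5.0, 5.0)])
profile = sample_along_line(da, line, n_points=11)
```

For a GeoDataFrame, the same helper could be called once per geometry, for example in a loop over gdf.geometry, collecting one profile per line.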

I have used regionmask and it is pretty fast and easy to use. The mask_geopandas method is what you need.

Since GeoPandas uses the same conventions as Pandas, the best way is to unify the data type when you're working on it. You can do this in xarray with:
xr.Dataset.from_dataframe(df)
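As a rough illustration of that conversion, here is a small sketch with made-up values. The column names lat, lon and sst are assumptions, and a real GeoDataFrame would probably need its geometry column dropped or converted to plain columns first.

```python
import pandas as pd
import xarray as xr

# Made-up tabular data: one row per (lat, lon) pair.
df = pd.DataFrame(
    {
        "lat": [10.0, 10.0, 20.0, 20.0],
        "lon": [100.0, 110.0, 100.0, 110.0],
        "sst": [25.1, 25.4, 18.2, 18.9],
    }
).set_index(["lat", "lon"])

# The index levels become the dimensions of the Dataset.
ds = xr.Dataset.from_dataframe(df)
```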

Related

"leak" converting .csv to .nc using xarray in some points

I'm trying to transform some points tabulated in a .csv into a netcdf file.
This is my .csv file: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
In my spreadsheet, I have the unique location of each point (the points do not cover the whole area regularly, but they are spaced 0.1 degree apart) and an SP value per year for up to 100 years forward.
To work with this data, I needed something like other sources that use netcdf data structured as sp(time, lat, lon), so that I can evaluate and visualize the values of this specific region by year (using panoply or ncview, for example).
For that, I came up with this code:
import pandas as pd
import xarray as xr
import numpy as np
csv_file = 'example.csv'
df = pd.read_csv(csv_file)
df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time']= pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')
xr = df.to_xarray()
xc = xr.fillna(0)
xc.to_netcdf(csv_file + '.nc')
And I got a netcdf file like this: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
At first, my code seems to work and creates my netcdf file without problems; however, I noticed that in some places I am creating some "leakage" of points, or interpolating the same values in some direction (north-south and west-east), when that shouldn't happen.
If you do a simple plot before converting to xarray you can see there are 3 west segments and one south segment
xr.sp[0].plot()
And this ends up being masked a bit when I fill the NaN with 0 and plot it again:
xc.sp[0].plot()
Checking the netcdf file using panoply I got something similar as well:
So I've started to check every step of my code to see if I missed something. My first guess was the melt part, but I'm not 100% sure, because if I plot df I can't see any leaking or extrapolation in the same region:
import contextily
import seaborn
joint_axes = seaborn.jointplot(
    x="lon", y="lat", data=df, s=0.5
)
contextily.add_basemap(
    joint_axes.ax_joint,
    crs="EPSG:4326",
    source=contextily.providers.CartoDB.PositronNoLabels,
);
Does anyone have any idea what's happening here?
EDIT:
Now a solution that would help me at the moment would be to fill in the missing coordinates with a value equal to 0 within my domain area using the minimum and maximum latitudes and longitudes.
My first (and unconventional) idea was to create a 0.1 x 0.1 grid with values equal to zero and feed this grid with my existing values.
However, the method using reindex would help me and I would be able to execute it in a few lines. My doubt is whether I should do this before or after the df.melt in my code.
I'm in this situation:
csv_file = '/Users/helioguerraneto/Desktop/example.csv'
df = pd.read_csv(csv_file)
lonmin, lonmax = df['lon'].min(), df['lon'].max()
latmin, latmax = df['lat'].min(), df['lat'].max()
df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time']= pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')
xr = df.to_xarray()
xc = xr.reindex(lat=np.arange(latmin, latmax, 0.1), lon=np.arange(lonmin, lonmax, 0.1), fill_value=0)
xc.to_netcdf(csv_file + '.nc')
It seems like reindex is the way, but I need to keep the original data. I was expecting some zeros, but not over the whole area:
EDIT2:
I think I found something that might help! My goal now could be the same as what's happening here: How to interpolate latitude/longitude and heading in Pandas
But instead of interpolating to the nearest point, I could just match the exact coordinates. Maybe the real problem here is mixing 100 grids in the end.
Any suggestions?
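One possible explanation for the all-zero result of the reindex attempt, offered here as an assumption rather than a confirmed diagnosis, is floating-point mismatch: values generated by np.arange with a 0.1 step may not be bit-for-bit equal to the coordinates read from the .csv, so an exact reindex finds no matches. A minimal sketch on made-up data, using the method and tolerance arguments of reindex to match within a small tolerance:

```python
import numpy as np
import xarray as xr

# Made-up data: two latitudes with a gap at 10.2, two longitudes.
ds = xr.Dataset(
    {"sp": (("lat", "lon"), np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32"))},
    coords={"lat": np.array([10.1, 10.3]), "lon": np.array([20.1, 20.2])},
)

# np.arange excludes the stop value, so extend it by half a step
# to keep the last row and column of the grid.
step = 0.1
new_lat = np.arange(10.1, 10.3 + step / 2, step)
new_lon = np.arange(20.1, 20.2 + step / 2, step)

# Match existing coordinates within a small tolerance; grid cells with
# no original point nearby are filled with 0.
filled = ds.reindex(
    lat=new_lat, lon=new_lon, method="nearest", tolerance=0.01, fill_value=0
)
```

The tolerance of 0.01 is an arbitrary choice for this sketch; it only needs to be well below the grid spacing so that true gaps are not filled from neighbouring points.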

Geopandas plots no points

I want to plot points using Longitude and Latitude with Geopandas, but nothing gets plotted. How to fix this?
It's never easy to answer a question when one has to use OCR to extract the data and code. Here's what I've managed to extract with OCR; there are some errors in the sample points.
This sample works, as can be seen from the output of plot().
What is very clear from your output is that the axes make no sense: 1e6 is too big. Check your data: do you have longitudes / latitudes outside the valid WGS84 bounds (-180.0 -90.0 180.0 90.0)?
import pandas as pd
import geopandas as gpd
geodata = pd.DataFrame(
    [
        [-88.355555, 30.757778],
        [-120.849722, 46.041111],
        [-113.8875, 8.12],
        [-173.24, 38.54],
        [-85.663611, 46.154444],
        [-98.3555, -119.1342],
        [-9.5932, -11.2836],
        [-3.2948, 38.2224],
        [36.2327, 29.3626],
        [3.3483, 47.5047],
    ],
    columns=["Longitude", "Latitude"],
)
data_gdf = gpd.GeoDataFrame(
    geodata, geometry=gpd.points_from_xy(geodata["Longitude"], geodata["Latitude"])
)
data_gdf.plot()
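The bounds check suggested above can be done with plain pandas before building the GeoDataFrame. This is a small sketch on three of the sample rows; the names valid, invalid_rows and clean are chosen here for illustration.

```python
import pandas as pd

geodata = pd.DataFrame(
    [
        [-88.355555, 30.757778],
        [-98.3555, -119.1342],  # latitude outside [-90, 90]
        [36.2327, 29.3626],
    ],
    columns=["Longitude", "Latitude"],
)

# True where both coordinates fall inside the WGS84 bounds.
valid = geodata["Longitude"].between(-180.0, 180.0) & geodata["Latitude"].between(
    -90.0, 90.0
)

invalid_rows = geodata[~valid]  # rows to inspect or fix
clean = geodata[valid]          # rows that are safe to plot
```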

Regridding Python xarray coordinates

I have some dummy data at 0.2 and 1 degree resolution. I would like to subsample foo to the same scale as foo1.
Is there any easy way to average and regrid my lat and long coordinates somehow?
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
#Set at 0.2 degree grids ish
freq=20
lats=240
lons=1020
time=pd.date_range('2000-01',periods=freq,freq='Y')
data=np.random.rand(freq,lats,lons)
lat=np.linspace(-19.5,19.5,lats)
lon=np.linspace(120,290,lons)
foo = xr.DataArray(data, coords=[time, lat,lon], dims=['time', 'lat','lon'])
foo.sel(time='2005',method='nearest').plot()
plt.show()
#Set at 1 degree grids
freq1=20
lats1=40 #Factor of 6 difference
lons1=170
time1=pd.date_range('2000-01',periods=freq1,freq='Y')
data1=np.random.rand(freq1,lats1,lons1)
lat1=np.linspace(-19.5,19.5,lats1)
lon1=np.linspace(120,290,lons1)
foo1 = xr.DataArray(data1, coords=[time1, lat1,lon1], dims=['time', 'lat','lon'])
foo1.sel(time='2005',method='nearest').plot()
plt.show()
Xarray can linearly interpolate latitudes and longitudes as if they were cartesian coordinates (as in your example above), but that isn't the same as proper geographical regridding. For that, you probably want to check out xesmf.
I decided the easiest way would be to interp using the foo1 grid.
Thus:
foo2=foo.interp(lat=lat1).interp(lon=lon1)
foo2.sel(time='2005',method='nearest').plot()
This should produce a map subsampled onto the foo1 grid (note that interp interpolates between neighbouring points rather than averaging them).
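If block averaging is wanted rather than interpolation, one alternative is xarray's coarsen, since the fine grid is exactly 6 times denser in both directions. This is a sketch on a shortened version of the dummy data, not the method used in the answer above; note that the resulting coordinates are the block means, which will not exactly equal the linspace values of lat1 and lon1.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Shortened version of the fine-resolution dummy data (3 time steps).
time = pd.date_range("2000-01-01", periods=3, freq="D")
lat = np.linspace(-19.5, 19.5, 240)
lon = np.linspace(120, 290, 1020)
foo = xr.DataArray(
    np.random.rand(3, 240, 1020),
    coords=[time, lat, lon],
    dims=["time", "lat", "lon"],
)

# Average every 6 x 6 block of cells: 240 -> 40 latitudes, 1020 -> 170 longitudes.
foo_coarse = foo.coarsen(lat=6, lon=6).mean()
```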

How to create a variable from xarray dataset coordinates?

Based on a xarray dataset containing latitude and longitude coordinates and several variables I would like to create a new variable containing objects based on the latitude and longitude coordinates.
For example, from the following dataset:
<xarray.Dataset>
Dimensions: (time: 100, x: 1000, y: 840)
Coordinates:
* x (x) float64 2.452e+06 2.458e+06 2.462e+06 ... 7.442e+06 7.448e+06
* y (y) float64 1.352e+06 1.358e+06 1.362e+06 ... 5.542e+06 5.548e+06
* time (time) datetime64[ns] 2015-01-01 ... 2015-01-05T03:00:00
... I would like to simply create a point object for each grid cell based on the respective latitude and longitude coordinates.
Pseudocode:
ds['points'] = (('y', 'x'), point_creation_function(ds.y, ds.x))
(How) Can I apply a function that requires the coordinate values as inputs, such that the result can be directly added as a new variable?
A horrible implementation after an initialization of ds.points would be:
for x_value in ds.x:
    for y_value in ds.y:
        ds.points.loc[dict(x=x_value, y=y_value)] = (x_value, y_value)
I assume there is an elegant and computation-efficient solution available, but searching the documentation I did not understand how to use apply, reduce or other functions to achieve it.
If I understand your question correctly, I think this is the answer:
import numpy as np
import xarray as xr
# Create some example data
data = np.random.rand(10,5,6)
# Make the dataset.
ds = xr.Dataset({"my_var": (["time", "x", "y"], data)})
# Create a MultiIndex
ds = ds.stack(points=("x", "y"))
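Stacking gives a MultiIndex of (x, y) pairs. If 2D arrays holding the coordinate values of every grid cell are wanted instead, a minimal sketch using xr.broadcast could look like this. The variable names x_grid and y_grid and the example coordinates are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

# Example dataset with explicit x and y coordinates.
ds = xr.Dataset(
    {"my_var": (["time", "y", "x"], np.random.rand(3, 4, 5))},
    coords={"x": np.arange(5.0), "y": np.arange(4.0) * 10},
)

# Broadcast the 1D coordinates against each other to get (y, x) arrays.
yy, xx = xr.broadcast(ds["y"], ds["x"])

# Store them as new variables; each cell holds its own coordinate value.
ds["x_grid"] = xx
ds["y_grid"] = yy
```

A function of the coordinates can then be evaluated on ds["x_grid"] and ds["y_grid"] with ordinary array arithmetic instead of a nested loop.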

Python: Convert map in kilometres to degrees

I have a pandas Dataframe with a few million rows, each with an X and Y attribute with their location in kilometres according to the WGS 1984 World Mercator projection (created using ArcGIS).
What is the easiest way to project these points back to degrees, without leaving the Python/pandas environment?
There is already a Python module called pyproj that can do these kinds of transformations for you. I will agree it is actually not the simplest module to find via Google. Some examples of its use can be seen here
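A minimal pyproj sketch, under two assumptions that are not confirmed by the question: that the coordinates are in WGS 84 / World Mercator (EPSG:3395), and that they are stored in kilometres and so need converting to metres first. The sample values are made up.

```python
import numpy as np
from pyproj import Transformer

# Assumption: source CRS is EPSG:3395 (WGS 84 / World Mercator).
# always_xy=True keeps the (x, y) / (lon, lat) axis order.
transformer = Transformer.from_crs("EPSG:3395", "EPSG:4326", always_xy=True)

# Made-up coordinates in kilometres; for a DataFrame these could be
# df["X"].to_numpy() and df["Y"].to_numpy().
x_km = np.array([0.0, 10018.754171394622])
y_km = np.array([0.0, 0.0])

# The projection is defined in metres, so convert before transforming.
lon, lat = transformer.transform(x_km * 1000.0, y_km * 1000.0)
```

Passing whole arrays to transform avoids a Python-level loop, which matters for a few million rows.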
Many years later, this is how I would do this. Keeping everything in GeoPandas to minimise the possibility of footguns.
Some imports:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
Create a dataframe (note the values must be in metres!)
df = pd.DataFrame({"X": [50e3, 900e3], "Y": [20e3, 900e3]})
Create geometries from the X/Y coordinates
df["geometry"] = df.apply(lambda row: Point(row.X, row.Y), axis=1)
Convert to a GeoDataFrame, setting the current CRS.
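In this case EPSG:3857 (Web Mercator) is used. Note that ArcGIS's "WGS 1984 World Mercator" corresponds to EPSG:3395, so use whichever code matches your data.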
gdf = gpd.GeoDataFrame(df, crs=3857)
Project it to the standard WGS84 CRS in degrees (EPSG:4326).
gdf = gdf.to_crs(4326)
And then (optionally), extract the X/Y coordinates in degrees back into standard columns:
gdf["X_deg"] = gdf.geometry.apply(lambda p: p.x)
gdf["Y_deg"] = gdf.geometry.apply(lambda p: p.y)
