I have some dummy data at 0.2 and 1 degree resolution. I would like to subsample foo to the same scale as foo1.
Is there an easy way to average and regrid along my lat and lon coordinates?
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
# Set at roughly 0.2-degree grid
freq=20
lats=240
lons=1020
time=pd.date_range('2000-01',periods=freq,freq='Y')
data=np.random.rand(freq,lats,lons)
lat=np.linspace(-19.5,19.5,lats)
lon=np.linspace(120,290,lons)
foo = xr.DataArray(data, coords=[time, lat,lon], dims=['time', 'lat','lon'])
foo.sel(time='2005',method='nearest').plot()
plt.show()
# Set at 1-degree grid
freq1=20
lats1=40 #Factor of 6 difference
lons1=170
time1=pd.date_range('2000-01',periods=freq1,freq='Y')
data1=np.random.rand(freq1,lats1,lons1)
lat1=np.linspace(-19.5,19.5,lats1)
lon1=np.linspace(120,290,lons1)
foo1 = xr.DataArray(data1, coords=[time1, lat1,lon1], dims=['time', 'lat','lon'])
foo1.sel(time='2005',method='nearest').plot()
plt.show()
Xarray can linearly interpolate latitudes and longitudes as if they were cartesian coordinates (as in your example above), but that isn't the same as proper geographical regridding. For that, you probably want to check out xesmf.
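For reference, a minimal xesmf sketch (assuming xesmf is installed, and reusing foo, lat1 and lon1 from the question; note that the 'conservative' method, which truly area-averages, additionally needs cell-bound coordinates lat_b/lon_b):
import xarray as xr
import xesmf as xe

# Target grid built from the 1-degree coordinates
ds_out = xr.Dataset({"lat": (["lat"], lat1), "lon": (["lon"], lon1)})
regridder = xe.Regridder(foo.to_dataset(name="foo"), ds_out, "bilinear")
foo_1deg = regridder(foo)  # regrids along lat/lon, keeps the time dimension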
I decided the easiest way would be to interp onto the foo1 grid.
Thus:
foo2 = foo.interp(lat=lat1, lon=lon1)
foo2.sel(time='2005', method='nearest').plot()
This should produce a reasonable subsampled gridded map, with the caveat above that linear interpolation is not true area-averaged regridding.
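Alternatively, since the two grids differ by an exact factor of 6 in each direction, block-averaging with xarray's coarsen matches the "average" intent more directly (a sketch reusing foo from above):
# Average each 6x6 block of 0.2-degree cells down to one ~1-degree cell
foo_coarse = foo.coarsen(lat=6, lon=6, boundary="trim").mean()
foo_coarse.sel(time='2005', method='nearest').plot()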
I have a gridded temperature dataset and a list of weather stations across the country, with their latitudes and longitudes. I want to find the grid points that are nearest to the weather stations. My gridded data has projected coordinates x and y, of which latitude and longitude are functions.
I found that the simplest way of finding the nearest grid point is to first transform the weather stations' latitude and longitude (Lat, Lon) to x and y values and then find the nearest grid point. I did that for one station (lat= , lon= ) as follows:
import matplotlib.pyplot as plt
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import xarray as xr
import pandas as pd
#open gridded data
df=xr.open_dataset('/home/mmartin/LauNath/air.2m.2015.nc')
#open weather station data
CMStations=pd.read_csv('Slope95.csv')
# Example - your x and y coordinates are in a Lambert Conformal projection
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
# Transform the point - src_crs is always Plate Carree for lat/lon grid
x, y = data_crs.transform_point(-94.5786,39.0997, src_crs=ccrs.PlateCarree())
# Now you can select data
ks=df.sel(x=x, y=y, method='nearest')
How would I apply this to all of the weather stations' latitudes and longitudes (Lat, Lon)?
There is no need to use geopandas here... just use crs.transform_points() instead of crs.transform_point() and pass the coordinates as arrays!
import numpy as np
import cartopy.crs as ccrs
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
lon, lat = np.array([1,2,3]), np.array([1,2,3])
data_crs.transform_points(ccrs.PlateCarree(), lon, lat)
which will return an array of the projected coordinates:
array([[16972983.1673108 , 8528848.37931063, 0.],
       [16841398.80456616, 8697676.02704447, 0.],
       [16709244.32834945, 8862533.81411212, 0.]])
Also, if you really have a lot of points to transform (or maybe use some CRS not yet supported by cartopy), you might want to have a look at pyproj directly, since it provides a lot more functionality and also some tricks to speed up transformations. (It's used under the hood by cartopy as well, so you should already have it installed!)
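For example, a minimal pyproj sketch (the proj4 string below is my assumed translation of the cartopy parameters above, so double-check it against your grid's metadata):
import numpy as np
from pyproj import CRS, Transformer

# Assumed proj4 translation of the LambertConformal parameters above
lcc = CRS.from_proj4(
    "+proj=lcc +lon_0=-107.0 +lat_0=50.0 +lat_1=50 +lat_2=50.000001 "
    "+x_0=5632642.22547 +y_0=4612545.65137"
)
transformer = Transformer.from_crs("EPSG:4326", lcc, always_xy=True)
lon, lat = np.array([1, 2, 3]), np.array([1, 2, 3])
x, y = transformer.transform(lon, lat)  # vectorized over the whole arrays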
You can create a geopandas GeoDataFrame from x, y columns using geopandas.points_from_xy. I'll assume these points are WGS84/EPSG4326:
import geopandas as gpd
stations = gpd.GeoDataFrame(
    CMStations,
    geometry=gpd.points_from_xy(
        CMStations.Lon, CMStations.Lat, crs="epsg:4326"  # assume WGS84
    ),
)
Now, we can use geopandas.GeoDataFrame.to_crs to transform all the points at once:
stations_xy = stations.to_crs(data_crs)
Finally, we can use xarray's advanced (vectorized) indexing: passing the station x/y values as DataArrays whose only dimension is the CMStations index selects the nearest grid point for every station at once:
station_x = stations_xy.geometry.x.to_xarray()
station_y = stations_xy.geometry.y.to_xarray()
# use these to select from xarray Dataset ds
station_data = ds.sel(y=station_y, x=station_x, method="nearest")
If desired, you could first make a station ID column the index with CMStations.set_index("station_id"); the station ID then becomes the dataset dimension that replaces x and y.
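A sketch of that last step, assuming the hypothetical column name "station_id":
# "station_id" is a hypothetical column name; adjust to your CSV
stations_xy = stations.set_index("station_id").to_crs(data_crs)
station_data = ds.sel(
    y=stations_xy.geometry.y.to_xarray(),
    x=stations_xy.geometry.x.to_xarray(),
    method="nearest",
)  # the result now has a "station_id" dimension in place of x/y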
I have a few hundred geopandas MultiLineStrings that trace along an object of interest (one line each week over a few years, tracing the Gulf Stream), and I want to use those lines to extract values from a few other xarray datasets so I know the sea surface temperature, chlorophyll-a, and other variables along this path each week.
I'm unsure, though, how exactly to use these geopandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point, but that seems a bit cumbersome. Is there any straightforward way to do this operation?
Breaking the lines into points and then extracting the point values is quite straightforward, actually!
import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr
# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)
# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
    geometry=[
        sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
        sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
    ]
)
# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])
# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
    y        (point) float64 3.0 13.0
    x        (point) float64 3.0 13.0
  * point    (point) int64 0 1
The main thing is to decide which points you'd like to sample, how many points, etc.
Note that the geometry objects in the geodataframe also have an interpolate method, if you'd like to draw values at specific points along the trajectory:
https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate
In such a case, .apply can come in handy:
gdf.geometry.apply(lambda geom: geom.interpolate(3.0))
0 POINT (2.12132 2.12132)
1 POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
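For instance, here's a sketch that samples each line at a regular spacing along its length, reusing gdf and da from above (the spacing value is an assumption; pick whatever suits your data resolution):
spacing = 1.0  # assumed sampling interval, in coordinate units
points = gdf.geometry.apply(
    lambda geom: [geom.interpolate(d) for d in np.arange(0.0, geom.length, spacing)]
)
# Flatten the per-line point lists into one set of indexers
xs = xr.DataArray([p.x for pts in points for p in pts], dims=["point"])
ys = xr.DataArray([p.y for pts in points for p in pts], dims=["point"])
sampled = da.sel(x=xs, y=ys, method="nearest")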
I have used regionmask and it is pretty fast and easy to use. The mask_geopandas method is what you need.
Since GeoPandas uses the same conventions as pandas, the best way is to unify the data types while you're working. You can convert a DataFrame to xarray with:
xr.Dataset.from_dataframe(df)
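For example (a minimal sketch with made-up values; the index levels become dataset dimensions):
import pandas as pd
import xarray as xr

df = pd.DataFrame(
    {"lat": [-19.5, -19.5, -19.0], "lon": [120.0, 120.5, 120.0], "sst": [26.1, 26.4, 26.0]}
).set_index(["lat", "lon"])
ds = xr.Dataset.from_dataframe(df)  # "lat" and "lon" become dimensions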
I hope I explain this correctly.
I'm looking for a way to better visualise underwater noise.
I'm not after a solution (well, maybe I am) but more interested in what the perfect start would be, considering speed is of the essence (so pretty much your opinion of Q1 to Q3).
I'm trying to perform calculations and visualisations of a body of water.
For this I basically want to import the bathymetry (a CSV containing x, y, z) of a substantial area (let's say 50 km x 50 km).
Q1: should I use a pandas DataFrame or a NumPy array?
Q2: do you envision this as a mesh, where the column names are x, the row names are y, and the elevation (z) values are the fields?
As z can be positive or negative, the landmass starts where z > 0, which will always vary depending on the tide. I want to be able to increase or decrease the low and high tide on the fly (see the masking sketch after the code below).
The actual seafloor bottom is also important, depending on the surface, salinity, water temperature per metre, etc.
Q3: is this where I should go 3D (in a mesh)?
For now I was just focusing on importing the bathymetry and visualising the import graphically (and failing a bit).
So far my code looks like the below; sorry about the lack of comments.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import pandas as pd
import tkinter as tk
from tkinter import filedialog
import scipy
from scipy import interpolate
# min_? is minimum bound, max_? is maximum bound,
# dim_? is the granularity in that direction
root = tk.Tk()
root.withdraw()  # hide the empty root window behind the file dialog
filename = filedialog.askopenfilename()
df = pd.read_csv(filename, delimiter=',', names=["X", "Y", "Z"])
df = df.sort_values(by=["X"])  # sort_values returns a copy, so assign it back
mat = df.to_numpy()
min_x, max_x = df['X'].min(), df['X'].max()
min_y, max_y = df['Y'].min(), df['Y'].max()
min_z, max_z = df['Z'].min(), df['Z'].max()
dim_x = df['X'].count()
dim_y = df['Y'].count()
x = np.linspace(min_x, max_x, dim_x)
y = np.linspace(min_y, max_y, dim_y)
X, Y = np.meshgrid(x, y)
# Interpolate the scattered (x, y, z) points [mat] over the regular (x, y) grid [X, Y]
# Depending on your "error", you may be able to use other methods
Z = scipy.interpolate.griddata((mat[:, 0], mat[:, 1]), mat[:, 2], (X, Y), method='linear')
plt.pcolormesh(X, Y, Z)
plt.show()
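For the on-the-fly tide idea, one possible starting point is to mask the interpolated grid at a movable water level (a sketch reusing X, Y, Z from above; tide is an assumed water-level offset in the same units as z):
tide = 0.5  # assumed water-level offset; land is wherever Z > tide
water = np.ma.masked_where(Z > tide, Z)  # hide land cells from the plot
plt.pcolormesh(X, Y, water)
plt.colorbar(label="z (depth/elevation)")
plt.show()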
I have data that are multidimensional compositional data (all dimensions sum to 1 or 100). I have learned how to use three of the variables to create a 2d ternary plot.
I would like to add a fourth dimension such that my plot looks like this.
I am willing to use Python or R. I am using rpy2 to create the ternary plots in Python via R right now, but only because that's an easy solution. If the ternary data could be transformed into 3D coordinates, a simple wire plot could be used.
This post shows how 3D compositional data can be transformed into 2D data so that normal plotting methods can be used. One solution would be to do the same thing in 3D.
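For reference, the 3D version of that transformation is linear: with tetrahedron vertices $v_1,\dots,v_4 \in \mathbb{R}^3$ and a composition $(b_1,b_2,b_3,b_4)$ satisfying $\sum_i b_i = 1$, the plotted point is

$$p = \sum_{i=1}^{4} b_i\, v_i,$$

i.e. the barycentric-to-Cartesian map, which is exactly what the transformation matrix in the answer below implements.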
Here is some sample Data:
c1 c2 c3 c4
0 0.082337 0.097583 0.048608 0.771472
1 0.116490 0.065047 0.066202 0.752261
2 0.114884 0.135018 0.073870 0.676229
3 0.071027 0.097207 0.070959 0.760807
4 0.066284 0.079842 0.103915 0.749959
5 0.016074 0.074833 0.044532 0.864561
6 0.066277 0.077837 0.058364 0.797522
7 0.055549 0.057117 0.045633 0.841701
8 0.071129 0.077620 0.049066 0.802185
9 0.089790 0.086967 0.083101 0.740142
10 0.084430 0.094489 0.039989 0.781093
Well, I solved this myself using a Wikipedia article, an SO post, and some brute force. Sorry for the wall of code, but you have to draw all the plot outlines, labels, and so forth.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
from itertools import combinations
import pandas as pd
def plot_ax():  # plot the tetrahedral outline
    verts = [[0, 0, 0],
             [1, 0, 0],
             [0.5, np.sqrt(3) / 2, 0],
             [0.5, 0.28867513, 0.81649658]]
    lines = combinations(verts, 2)
    for x in lines:
        line = np.transpose(np.array(x))
        ax.plot3D(line[0], line[1], line[2], c='0')

def label_points():  # create labels for each vertex of the simplex
    a = np.array([1, 0, 0, 0])  # barycentric coordinates of vertex A (c1)
    b = np.array([0, 1, 0, 0])  # barycentric coordinates of vertex B (c2)
    c = np.array([0, 0, 1, 0])  # barycentric coordinates of vertex C (c3)
    d = np.array([0, 0, 0, 1])  # barycentric coordinates of vertex D (c4)
    labels = ['a', 'b', 'c', 'd']
    cartesian_points = get_cartesian_array_from_barycentric([a, b, c, d])
    for point, label in zip(cartesian_points, labels):
        if 'a' in label:
            ax.text(point[0], point[1] - 0.075, point[2], label, size=16)
        elif 'b' in label:
            ax.text(point[0] + 0.02, point[1] - 0.02, point[2], label, size=16)
        else:
            ax.text(point[0], point[1], point[2], label, size=16)

def get_cartesian_array_from_barycentric(b):  # transform from "barycentric" composition space to cartesian coordinates
    verts = [[0, 0, 0],
             [1, 0, 0],
             [0.5, np.sqrt(3) / 2, 0],
             [0.5, 0.28867513, 0.81649658]]
    # create the transformation array via https://en.wikipedia.org/wiki/Barycentric_coordinate_system
    t = np.transpose(np.array(verts))
    t_array = np.array([t.dot(x) for x in b])  # apply the transform to all points
    return t_array

def plot_3d_tern(df, c='1'):  # use "get_cartesian_array_from_barycentric" to plot the scatter points
    # args: df = dataframe to plot, c = scatter point colour
    bary_arr = df.values
    cartesian_points = get_cartesian_array_from_barycentric(bary_arr)
    ax.scatter(cartesian_points[:, 0], cartesian_points[:, 1], cartesian_points[:, 2], c=c)
#Create Dataset 1
np.random.seed(123)
c1=np.random.normal(8,2.5,20)
c2=np.random.normal(8,2.5,20)
c3=np.random.normal(8,2.5,20)
c4=[100-x for x in c1+c2+c3] # make sure the components sum to 100
# a DataFrame is unnecessary, but that is the format of my real data
df1=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df1=df1/100
#Create Dataset 2
np.random.seed(1234)
c1=np.random.normal(16,2.5,20)
c2=np.random.normal(16,2.5,20)
c3=np.random.normal(16,2.5,20)
c4=[100-x for x in c1+c2+c3]
df2=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df2=df2/100
#Create Dataset 3
np.random.seed(12345)
c1=np.random.normal(25,2.5,20)
c2=np.random.normal(25,2.5,20)
c3=np.random.normal(25,2.5,20)
c4=[100-x for x in c1+c2+c3]
df3=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df3=df3/100
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # create 3D axes (on recent matplotlib, Axes3D(fig) no longer adds itself to the figure)
plot_ax()               # draw the tetrahedral outline
label_points()          # label the vertices
plot_3d_tern(df1, 'b')  # plot df1
plot_3d_tern(df2, 'r')  # plot df2
plot_3d_tern(df3, 'g')  # plot df3
plt.show()
The accepted answer explains how to do this in Python, but the question also asked about R.
I've provided an answer in this thread on how to do this 'manually' in R.
Otherwise, you can use the klaR package directly for this:
df <- matrix(c(
0.082337, 0.097583, 0.048608, 0.771472,
0.116490, 0.065047, 0.066202, 0.752261,
0.114884, 0.135018, 0.073870, 0.676229,
0.071027, 0.097207, 0.070959, 0.760807,
0.066284, 0.079842, 0.103915, 0.749959,
0.016074, 0.074833, 0.044532, 0.864561,
0.066277, 0.077837, 0.058364, 0.797522,
0.055549, 0.057117, 0.045633, 0.841701,
0.071129, 0.077620, 0.049066, 0.802185,
0.089790, 0.086967, 0.083101, 0.740142,
0.084430, 0.094489, 0.039989, 0.781094
), byrow = TRUE, nrow = 11, ncol = 4)
# install.packages(c("klaR", "scatterplot3d"))
library(klaR)
#> Loading required package: MASS
quadplot(df)
Created on 2020-08-14 by the reprex package (v0.3.0)
I have a 2-D gridded file which represents the land-use catalogue for the place of interest.
I also have some lat/lon-based points distributed in this area.
import pandas as pd
import matplotlib.pyplot as plt
from netCDF4 import Dataset
## 2-d gridded file
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point file
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon, lat, lu)
plt.scatter(point.lon, point.lat, color='r')  # was point_data/cf_fire_data: undefined names
I want to extract the values of the 2-D gridded field to which those points belong, but I found it difficult to define a simple function for that.
Is there an efficient method to achieve this?
Any advice would be appreciated.
PS
I have uploaded my files here
1. nc_file
2. point_file
I can propose a solution like this, where I just loop over the points and select the data based on the distance from each point.
#!/usr/bin/env ipython
import numpy as np
from netCDF4 import Dataset
import matplotlib.pylab as plt
import pandas as pd
# --------------------------------------
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon,lat,lu)
#plt.scatter(point.lon, point.lat, color='r')
# --------------------------------------------
# get data for the points:
dataout = []
lon_ratio = np.cos(np.mean(lat)*np.pi/180.0)  # longitude degrees shrink by cos(latitude)
for ii in range(len(point)):
    plon, plat = point.lon[ii], point.lat[ii]
    # approximate local distance: scale longitude differences by cos(latitude)
    distmat = np.sqrt((lon_ratio*(lon - plon))**2 + (lat - plat)**2)
    kk = np.unravel_index(np.argmin(distmat), distmat.shape)
    dataout.append([float(lon[kk]), float(lat[kk]), float(lu[kk])])
# ---------------------------------------------
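If the loop gets slow for many points, a vectorized alternative is to build a tree over all grid cells once and query every point in one call (a sketch using scipy's cKDTree, reusing lon, lat, lu, point, and lon_ratio from above):
from scipy.spatial import cKDTree

# Flatten the 2-D grid; scale longitudes so distances are roughly isotropic
grid = np.column_stack([np.asarray(lon).ravel() * lon_ratio,
                        np.asarray(lat).ravel()])
tree = cKDTree(grid)
_, idx = tree.query(np.column_stack([point.lon * lon_ratio, point.lat]))
dataout = np.column_stack([np.asarray(lon).ravel()[idx],
                           np.asarray(lat).ravel()[idx],
                           np.asarray(lu).ravel()[idx]])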