I have a gridded temperature dataset and a list of weather stations across the country, with their latitudes and longitudes. I want to find the grid points nearest to the weather stations. My gridded data has projected coordinates x and y, of which latitude and longitude are functions.
I found that the simplest way of finding the nearest grid point is to first transform the latitude and longitude (Lat, Lon) of the weather stations to x and y values and then find the nearest grid point. I did that for one station (lat= , lon= ) by doing the following:
import matplotlib.pyplot as plt
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
from cartopy import config
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import xarray as xr
import pandas as pd
import netCDF4 as nc
#open gridded data
df=xr.open_dataset('/home/mmartin/LauNath/air.2m.2015.nc')
#open weather station data
CMStations=pd.read_csv('Slope95.csv')
# Example - your x and y coordinates are in a Lambert Conformal projection
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
# Transform the point - src_crs is always Plate Carree for lat/lon grid
x, y = data_crs.transform_point(-94.5786,39.0997, src_crs=ccrs.PlateCarree())
# Now you can select data
ks=df.sel(x=x, y=y, method='nearest')
How would I apply this to all of the weather stations latitudes and longitudes (Lat,Lon)?
There is no need to use geopandas here... just use crs.transform_points() instead of crs.transform_point() and pass the coordinates as arrays!
import numpy as np
import cartopy.crs as ccrs
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
lon, lat = np.array([1,2,3]), np.array([1,2,3])
data_crs.transform_points(ccrs.PlateCarree(), lon, lat)
which will return an array of the projected coordinates:
array([[16972983.1673108 , 8528848.37931063, 0. ],
[16841398.80456616, 8697676.02704447, 0. ],
[16709244.32834945, 8862533.81411212, 0. ]])
... also... if you really have a lot of points to transform (or need some CRS not yet supported by cartopy), you might want to have a look at pyproj directly, since it provides a lot more functionality and some tricks to speed up transformations. (It's used under the hood by cartopy as well, so you should already have it installed!)
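For example, a minimal pyproj sketch of the same transformation (the proj4 string mirrors the LambertConformal parameters used above; always_xy=True makes the argument order lon, lat):
import numpy as np
from pyproj import CRS, Transformer
# the same Lambert Conformal definition as the cartopy data_crs above
lcc = CRS.from_proj4(
    "+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50.000001 "
    "+x_0=5632642.22547 +y_0=4612545.65137"
)
transformer = Transformer.from_crs("EPSG:4326", lcc, always_xy=True)
lon, lat = np.array([1, 2, 3]), np.array([1, 2, 3])
x, y = transformer.transform(lon, lat)  # arrays of projected coordinates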
You can create a geopandas GeoDataFrame from x, y columns using geopandas.points_from_xy. I'll assume these points are WGS84/EPSG:4326:
import geopandas as gpd
stations = gpd.GeoDataFrame(
CMStations,
geometry=gpd.points_from_xy(
CMStations.Lon, CMStations.Lat, crs="epsg:4326" # assume WGS84
),
)
Now, we can use geopandas.GeoDataFrame.to_crs to transform all the points at once:
stations_xy = stations.to_crs(data_crs)
Finally, we can use xarray's advanced indexing, turning the projected x/y columns into DataArrays indexed by station, to reshape the x/y data to the shape of the CMStations index:
station_x = stations_xy.geometry.x.to_xarray()
station_y = stations_xy.geometry.y.to_xarray()
# use these to select from the xarray Dataset df opened above
station_data = df.sel(y=station_y, x=station_x, method="nearest")
If desired, you could set a station ID column to be the index first with CMStations.set_index("station_id") to get the station_id column as the dataset dimension which replaces x and y.
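If the stations file has such a column, a minimal sketch (the name station_id is hypothetical):
stations_xy = stations.set_index("station_id").to_crs(data_crs)  # "station_id" is a hypothetical column name
station_x = stations_xy.geometry.x.to_xarray()
station_y = stations_xy.geometry.y.to_xarray()
station_data = df.sel(y=station_y, x=station_x, method="nearest")  # dimension is now station_id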
I have a satellite image file loaded into a dask array, and I want to get the (nearest) pixel value at a latitude/longitude of interest.
The satellite image is in a GEOS projection, and I have the longitude and latitude information as 2D numpy arrays.
Satellite Image file
I have loaded it into a dask data array:
from satpy import Scene
import matplotlib.pyplot as plt
import os
cwd = os.getcwd()
fn = os.path.join(cwd, 'EUMETSAT_data/1Jan21/MSG1-SEVI-MSG15-0100-NA-20210101185741.815000000Z-20210101185757-1479430.nat')
files = [fn]
scn = Scene(filenames=files, reader='seviri_l1b_native')
scn.load(["VIS006"])
da = scn['VIS006']
I read lon lats from the area attribute with the help of satpy:
lon, lat = scn['VIS006'].attrs['area'].get_lonlats()
print(lon.shape)
print(lat.shape)
(1179, 808)
(1179, 808)
I get a 2D numpy array each for longitude and latitude; they are coordinates, but I cannot use them for slicing or selecting.
What is the best practice/method to get the nearest pixel information for a given lat/lon?
How do I project the data onto lat/lon coordinates that I can then use for indexing to arrive at the pixel value?
In the end, I want to get the (nearest) pixel value at a lat/lon of interest.
Thanks in advance!
The AreaDefinition object you are using (.attrs['area']) has a few methods for getting different coordinate information.
area = scn['VIS006'].attrs['area']
# lons/lats: scalars or arrays holding the lon/lat of the points of interest
col_idx, row_idx = area.get_xy_from_lonlat(lons, lats)
scn['VIS006'].values[row_idx, col_idx]
Note that row and column are flipped. The get_xy_from_lonlat method should work for arrays or scalars.
There are other methods for getting the X/Y coordinates of each pixel if that is what you're interested in.
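For instance, a sketch with arrays of points (the coordinates here are hypothetical):
import numpy as np
lons = np.array([41.0, 45.0])  # hypothetical points of interest
lats = np.array([8.0, 12.0])
col_idx, row_idx = area.get_xy_from_lonlat(lons, lats)
pixel_values = scn['VIS006'].values[row_idx, col_idx]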
You can find the location with the following:
import numpy as np
px, py = (23.0, 55.0)  # (lon, lat) of the location to take values from
# approximate distance matrix from the point (px, py), with the lon
# differences weighted by cos(lat)
dist = np.sqrt(np.cos(lat*np.pi/180.0)*(lon-px)**2 + (lat-py)**2)
# find the location where the distance is minimum
kkout = np.squeeze(np.where(dist == np.nanmin(dist)))
print(kkout)  # the row and column from which to take the data
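The resulting indices can then be used to pull the value out of the 2D field, e.g. (data here is a hypothetical name for that field):
pixel_value = data[kkout[0], kkout[1]]  # "data" is a hypothetical 2D field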
@Serge Ballesta - thanks for the direction.
Answering my own question.
Project the latitude and longitude (Plate Carree projection) onto the GEOS projection CRS to find x and y, then use this x and y with xarray's nearest sel method to get the pixel value from the dask array.
import cartopy.crs as ccrs
data_crs = ccrs.Geostationary(central_longitude=41.5, satellite_height=35785831, false_easting=0, false_northing=0, globe=None, sweep_axis='y')
lon = 77.541677  # longitude of interest
lat = 8.079148   # latitude of interest
# the point of interest is given in lon/lat (Plate Carree)
x, y = data_crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
dn = da.sel(x=x, y=y, method='nearest')
I have some dummy data at 0.2 and 1 degree resolution. I would like to subsample foo to the same scale as foo1.
Is there any easy way to average and regrid my lat and long coordinates somehow?
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
# Set up a roughly 0.2-degree grid
freq=20
lats=240
lons=1020
time=pd.date_range('2000-01',periods=freq,freq='Y')
data=np.random.rand(freq,lats,lons)
lat=np.linspace(-19.5,19.5,lats)
lon=np.linspace(120,290,lons)
foo = xr.DataArray(data, coords=[time, lat,lon], dims=['time', 'lat','lon'])
foo.sel(time='2005',method='nearest').plot()
plt.show()
# Set up a 1-degree grid
freq1=20
lats1=40 #Factor of 6 difference
lons1=170
time1=pd.date_range('2000-01',periods=freq1,freq='Y')
data1=np.random.rand(freq1,lats1,lons1)
lat1=np.linspace(-19.5,19.5,lats1)
lon1=np.linspace(120,290,lons1)
foo1 = xr.DataArray(data1, coords=[time1, lat1,lon1], dims=['time', 'lat','lon'])
foo1.sel(time='2005',method='nearest').plot()
plt.show()
Xarray can linearly interpolate latitudes and longitudes as if they were cartesian coordinates (as in your example above), but that isn't the same as proper geographical regridding. For that, you probably want to check out xesmf.
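For example, a minimal xESMF sketch, assuming xesmf is installed (it builds a bilinear regridder from foo's grid to foo1's grid):
import xesmf as xe
# wrap the DataArrays as Datasets so xESMF can read the lat/lon coords
regridder = xe.Regridder(foo.to_dataset(name="foo"), foo1.to_dataset(name="foo1"), "bilinear")
foo_regridded = regridder(foo)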
I decided the easiest way would be to interp using the foo1 grid.
Thus:
foo2 = foo.interp(lat=lat1, lon=lon1)
foo2.sel(time='2005',method='nearest').plot()
This should produce a reasonable subsampled gridded map (keeping in mind the caveat above that this is linear interpolation, not proper geographical regridding).
I am using a dataset from DWD (Deutscher Wetterdienst) and want to select data by latitude and longitude. The import works so far, so no problem there. Selecting data with sel works when I use x and y, but not with lat and long. I tried all the answers I could find, like:
ds.sel(latitude=50, longitude=14, method='nearest')
but I am getting the error
ValueError: dimensions or multi-index levels ['latitude', 'longitude'] do not exist
That's my code:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr
ds = xr.open_dataset(
'cosmo-d2_germany_rotated-lat-lon_single-level_2019061721_012_ASWDIFD_S.grib2',
engine='cfgrib',
backend_kwargs={'filter_by_keys': {'stepUnits': 1}}
)
print(ds)
Output:
<xarray.Dataset>
Dimensions: (x: 651, y: 716)
Coordinates:
time datetime64[ns] ...
step timedelta64[ns] ...
surface int32 ...
latitude (y, x) float64 ...
longitude (y, x) float64 ...
valid_time datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
ASWDIFD_S (y, x) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: edzw
GRIB_centreDescription: Offenbach
GRIB_subCentre: 255
Conventions: CF-1.7
institution: Offenbach
history: 2019-07-22T13:35:33 GRIB to CDM+CF via cfgrib-
In your file latitude and longitude are not dimensions but rather helper 2D variables containing coordinate data. In xarray parlance they are called non-dimension coordinates and you cannot slice on them. See also Working with Multidimensional Coordinates.
It would be better if you regrid the data to a regular grid inside Python so that you have latitudes and longitudes as 1D vectors; you would have to make a grid and then interpolate the data over that grid, as sketched below.
Also check https://www.ecmwf.int/sites/default/files/elibrary/2018/18727-cfgrib-easy-and-efficient-grib-file-access-xarray.pdf for the way to access grib files in xarray. If you don't want to use xarray for this purpose, pygrib is another option.
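A rough sketch of that regridding with scipy (the target grid size and the interpolation method are arbitrary choices):
import numpy as np
from scipy.interpolate import griddata
lat2d = ds.latitude.values
lon2d = ds.longitude.values
# target regular 1D grid spanning the data (resolution chosen arbitrarily)
new_lat = np.linspace(lat2d.min(), lat2d.max(), 716)
new_lon = np.linspace(lon2d.min(), lon2d.max(), 651)
grid_lon, grid_lat = np.meshgrid(new_lon, new_lat)
# interpolate the curvilinear field onto the regular grid
regridded = griddata(
    (lon2d.ravel(), lat2d.ravel()),
    ds.ASWDIFD_S.values.ravel(),
    (grid_lon, grid_lat),
    method="linear",
)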
I can't test the solution as I don't have the cfgrib engine installed, but could you try a
find_nearest(lonarray, lonvalue)
helper (note that numpy has no built-in find_nearest; the linked answer defines one) to find the lon and lat indexes near your point, as per this solution:
Find nearest value in numpy array
And then select the point using the index directly on the x,y coordinates?
http://xarray.pydata.org/en/stable/indexing.html
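Since latitude and longitude are 2D arrays here, a sketch of that idea could minimize the combined offset over both arrays and unravel the flat index:
import numpy as np
def nearest_yx(lat2d, lon2d, lat0, lon0):
    # index of the grid cell whose lat/lon is closest to (lat0, lon0)
    dist2 = (lat2d - lat0)**2 + (lon2d - lon0)**2
    return np.unravel_index(np.argmin(dist2), lat2d.shape)

iy, ix = nearest_yx(ds.latitude.values, ds.longitude.values, 50.0, 14.0)
value = ds.ASWDIFD_S.isel(y=iy, x=ix)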
I wrote a function for the files from the DWD:
import pygrib # https://jswhit.github.io/pygrib/docs/
import numpy as np
def get_grib_data_nearest_point(grib_file, inp_lat, inp_lon):
    """
    Gets the value corresponding to a latitude-longitude pair of coordinates
    in a grib file.
    :param grib_file: path to the grib file on disk
    :param inp_lat: latitude
    :param inp_lon: longitude
    :return: scalar
    """
    # open the grib file, get the coordinates and values
    grbs = pygrib.open(grib_file)
    grb = grbs[1]
    lats, lons = grb.latlons()
    values = grb.values
    lat_min, lat_max = min(grb.distinctLatitudes), max(grb.distinctLatitudes)
    lon_min, lon_max = min(grb.distinctLongitudes), max(grb.distinctLongitudes)
    grbs.close()
    # check if the user coords are valid
    if not (lat_min <= inp_lat <= lat_max): return np.nan
    if not (lon_min <= inp_lon <= lon_max): return np.nan
    # find the index of the row with the closest latitude (x), then, within
    # that row, the index of the column with the closest longitude (y)
    x = np.abs(lats[:, 0] - inp_lat).argmin()
    y = np.abs(lons[x, :] - inp_lon).argmin()
    # index the array to return the corresponding value
    return values[x][y]
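Usage would look something like this (the file path is a placeholder):
value = get_grib_data_nearest_point("some_dwd_file.grib2", 50.0, 14.0)  # placeholder path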
As noted above, you can regrid your data (probably given on a curvilinear grid, i.e. lat and lon as 2D arrays) to your desired resolution of 1D lat/lon arrays, after which you can use .sel directly on the lat/lon coords to slice the data.
Check out xESMF (https://xesmf.readthedocs.io/en/latest/notebooks/Curvilinear_grid.html): easy, fast interpolation and regridding of xarray fields, with good examples and documentation.
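A minimal sketch, assuming xesmf is installed and renaming the question's 2D coordinates to the lat/lon names xESMF expects (the target grid bounds and the 0.1-degree step are illustrative guesses):
import numpy as np
import xarray as xr
import xesmf as xe
# target regular 1D grid; bounds and resolution are illustrative
ds_out = xr.Dataset({
    "lat": (["lat"], np.arange(44.0, 58.0, 0.1)),
    "lon": (["lon"], np.arange(0.0, 20.0, 0.1)),
})
ds_in = ds.rename({"latitude": "lat", "longitude": "lon"})
regridder = xe.Regridder(ds_in, ds_out, "bilinear")
aswdifd_regridded = regridder(ds_in["ASWDIFD_S"])
aswdifd_regridded.sel(lat=50, lon=14, method="nearest")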
I'm trying to create a choropleth in Python 3 using shapely, fiona & bokeh for display.
I have a file with about 7000 lines that have the location of a town and a counter.
Example:
54.7604;9.55827;208
54.4004;9.95918;207
53.8434;9.95271;203
53.5979;10.0013;201
53.728;10.2526;197
53.646;10.0403;196
54.3977;10.1054;193
52.4385;9.39217;193
53.815;10.3476;192
...
I want to show these in a 12,5km grid, for which a shapefile is available on
https://opendata-esri-de.opendata.arcgis.com/datasets/3c1f46241cbb4b669e18b002e4893711_0
The code I have works, but it's very slow: a brute-force algorithm that checks each of the 7127 grid points against all of the 7000 points.
import pandas as pd
import fiona
from shapely.geometry import Polygon, Point, MultiPoint, MultiPolygon
from shapely.prepared import prep
sf = r'c:\Temp\geo_de\Hexagone_125_km\Hexagone_125_km.shp'
shp = fiona.open(sf)
district_xy = [ [ xy for xy in feat["geometry"]["coordinates"][0]] for feat in shp]
district_poly = [ Polygon(xy) for xy in district_xy] # coords to Polygon
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
map_points = [Point(x,y) for x,y in zip(df_p.lon, df_p.lat)] # Convert Points to Shapely Points
all_points = MultiPoint(map_points) # all points
def calc_points_per_poly(poly, points, values): # Returns total for poly
poly = prep(poly)
return sum([v for p, v in zip(points, values) if poly.contains(p)])
# this is the slow part
# for each shape this sums um the points
sum_hex = [calc_points_per_poly(x, all_points, df_p['count']) for x in district_poly]
Since this is extremely slow, I'm wondering if there is a faster way to get the sum_hex values, especially since the real-world list of points may be a lot larger, and a smaller grid with more shapes would deliver a better result.
I would recommend using geopandas and its built-in rtree spatial index. It lets you run the precise check only when there is a possibility that a point lies within a polygon.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point
sf = 'Hexagone_125_km.shp'
shp = gpd.read_file(sf)
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
gdf_p = gpd.GeoDataFrame(df_p, geometry=[Point(x,y) for x,y in zip(df_p.lon, df_p.lat)])
sum_hex = []
spatial_index = gdf_p.sindex
for index, row in shp.iterrows():
polygon = row.geometry
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
possible_matches = gdf_p.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.within(polygon)]
sum_hex.append(sum(precise_matches['count']))
shp['sum'] = sum_hex
This solution should be faster than yours. You can then plot your GeoDataFrame via Bokeh. If you want more details on spatial indexing, I recommend this article by Geoff Boeing: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
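Alternatively, newer geopandas can do the index-then-within test in one call with a spatial join (a sketch; predicate= requires geopandas >= 0.10, older versions use op= instead):
joined = gpd.sjoin(gdf_p, shp, how="inner", predicate="within")
# sum the counts per hexagon; hexagons with no points get 0
shp["sum"] = joined.groupby("index_right")["count"].sum().reindex(shp.index, fill_value=0)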
I have a 2D gridded file which represents the land-use catalogue for the place of interest.
I also have some lat/lon-based points distributed in this area.
import pandas as pd
import matplotlib.pyplot as plt
from netCDF4 import Dataset
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon, lat, lu)
plt.scatter(point.lon, point.lat, color='r')
I want to extract the values of the 2D gridded field at the cells those points fall in, but I found it difficult to define a simple function to do that.
Is there any efficient method to achieve it?
Any advice would be appreciated.
PS
I have uploaded my files here
1. nc_file
2. point_file
I can propose a solution like this, where I just loop over the points and select the data based on the distance from each point.
#!/usr/bin/env ipython
import numpy as np
from netCDF4 import Dataset
import matplotlib.pylab as plt
import pandas as pd
# --------------------------------------
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon,lat,lu)
#plt.scatter(point.lon, point.lat, color='r')
# --------------------------------------------
# get data for points:
dataout = []
lon_ratio = np.cos(np.mean(lat)*np.pi/180.0)  # scale for lon degrees at the mean latitude
for ii in range(len(point)):
    plon, plat = point.lon[ii], point.lat[ii]
    # approximate distance in degrees, with lon differences scaled by cos(lat)
    distmat = np.sqrt((lon_ratio*(lon-plon))**2 + (lat-plat)**2)
    kk = np.where(distmat == np.min(distmat))
    dataout.append([float(lon[kk]), float(lat[kk]), float(lu[kk])])
# ---------------------------------------------
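If there are many points, a KD-tree on the flattened grid avoids building a full distance matrix per point; a sketch with the same cos-latitude scaling:
from scipy.spatial import cKDTree
# build the tree once over all grid cells, with lon scaled by cos(mean lat)
tree = cKDTree(np.column_stack([lon_ratio * lon.ravel(), lat.ravel()]))
_, idx = tree.query(np.column_stack([lon_ratio * point.lon.values, point.lat.values]))
dataout = np.column_stack([lon.ravel()[idx], lat.ravel()[idx], lu.ravel()[idx]])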