Get nearest pixel value from satellite image using latitude longitude coordinates - python

I have a satellite image file, which I have loaded into a dask array. I want to get the (nearest) pixel value for a latitude/longitude of interest.
The satellite image is in the GEOS projection. I have the longitude and latitude information as 2D numpy arrays.
I have loaded it into a dask data array
from satpy import Scene
import matplotlib.pyplot as plt
import os
cwd = os.getcwd()
fn = os.path.join(cwd, 'EUMETSAT_data/1Jan21/MSG1-SEVI-MSG15-0100-NA-20210101185741.815000000Z-20210101185757-1479430.nat')
files = [fn]
scn = Scene(filenames=files, reader='seviri_l1b_native')
scn.load(["VIS006"])
da = scn['VIS006']
This is what the dask array looks like: [screenshot of the DataArray repr not included]
I read the lons/lats from the area attribute with the help of satpy:
lon, lat = scn['VIS006'].attrs['area'].get_lonlats()
print(lon.shape)
print(lat.shape)
(1179, 808)
(1179, 808)
This gives me a 2D numpy array each for longitude and latitude. They are coordinates, but I cannot use them for slicing or selecting.
What is the best practice/method to get the pixel information nearest to a given lat/lon?
How do I project the data onto lat/lon coordinates that I can then use for indexing to arrive at the pixel value?
In the end, I want to get the (nearest) pixel value for a lat/lon of interest.
Thanks in advance!

The AreaDefinition object you are using (.attrs['area']) has a few methods for getting different coordinate information.
area = scn['VIS006'].attrs['area']
# lons, lats: the longitude(s) and latitude(s) of interest (scalars or arrays)
col_idx, row_idx = area.get_xy_from_lonlat(lons, lats)
scn['VIS006'].values[row_idx, col_idx]
Note that row and column are flipped. The get_xy_from_lonlat method should work for arrays or scalars.
There are other methods for getting the X/Y coordinates of each pixel if that is what you're interested in.
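For a regular grid such as a GEOS area, going from projection x/y (metres) to pixel indices is an affine mapping based on the area extent and the grid shape. The sketch below shows that step with a hypothetical helper `proj_xy_to_rowcol` (not a pyresample function). It assumes the extent holds the outer pixel edges as (x_min, y_min, x_max, y_max) and that row 0 is the top (north) row. Check your area's orientation before relying on it.

```python
import numpy as np


def proj_xy_to_rowcol(x, y, area_extent, shape):
    """Map projection coordinates to (row, col) indices on a regular grid.

    Hypothetical helper, assuming area_extent = (x_min, y_min, x_max, y_max)
    gives the outer pixel edges, shape = (n_rows, n_cols), and row 0 is the
    top (largest y) row, as in a north-up image.
    """
    x_min, y_min, x_max, y_max = area_extent
    n_rows, n_cols = shape
    pixel_w = (x_max - x_min) / n_cols
    pixel_h = (y_max - y_min) / n_rows
    col = np.floor((np.asarray(x, dtype=float) - x_min) / pixel_w).astype(int)
    row = np.floor((y_max - np.asarray(y, dtype=float)) / pixel_h).astype(int)
    # Points outside the grid get -1 so they cannot silently index a wrong pixel
    outside = (col < 0) | (col >= n_cols) | (row < 0) | (row >= n_rows)
    col = np.where(outside, -1, col)
    row = np.where(outside, -1, row)
    return row, col
```

The returned indices are in (row, col) order, which is the order the data array is indexed in.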

You can find the location with the following:
import numpy as np
px, py = (23.0, 55.0)  # location (lon, lat) to take values out at
# distance matrix from point (px, py); longitude differences are scaled by cos(lat) before squaring
dist = np.sqrt((np.cos(lat*np.pi/180.0)*(lon-px))**2 + (lat-py)**2)
kkout = np.squeeze(np.where(dist == np.nanmin(dist)))  # find location where distance is minimum
print(kkout)  # you will see the row and column where to take out data
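Here is the same idea as a self-contained sketch. The hypothetical helper `nearest_index_2d` scales longitude differences by cos(latitude) before combining them with latitude differences, which is an equirectangular approximation. It skips NaN coordinates with `np.nanargmin` and returns exactly one (row, col) pair through `np.unravel_index`, even when several cells tie.

```python
import numpy as np


def nearest_index_2d(lon2d, lat2d, lon0, lat0):
    """Return (row, col) of the grid cell closest to (lon0, lat0).

    Equirectangular approximation: longitude differences are scaled by
    cos(latitude) before taking the Euclidean distance in degrees. Fine for
    nearby points; not meant for long distances or across +/-180 degrees.
    """
    lon2d = np.asarray(lon2d, dtype=float)
    lat2d = np.asarray(lat2d, dtype=float)
    dx = (lon2d - lon0) * np.cos(np.deg2rad(lat0))
    dy = lat2d - lat0
    dist = np.hypot(dx, dy)
    # nanargmin ignores NaN coordinates; convert the flat index to (row, col)
    return np.unravel_index(np.nanargmin(dist), dist.shape)
```

With the arrays from the question this would be used as `row, col = nearest_index_2d(lon, lat, 77.54, 8.08)` followed by `da[row, col].values`.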

#serge ballesta - thanks for the direction
Answering my own question.
Project the latitude and longitude (PlateCarree projection) onto the GEOS projection CRS to get x and y. Then use these x and y values with xarray's nearest-neighbour selection (sel with method='nearest') to get the pixel value from the dask array.
import cartopy.crs as ccrs
data_crs = ccrs.Geostationary(central_longitude=41.5, satellite_height=35785831, false_easting=0, false_northing=0, globe=None, sweep_axis='y')
lon = 77.541677  # longitude of interest
lat = 8.079148  # latitude of interest
# transform from the lon/lat (PlateCarree) system into the GEOS x/y system
x, y = data_crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
dn = da.sel(x=x, y=y, method='nearest')  # da is the VIS006 DataArray loaded above
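The per-dimension part of `sel(..., method='nearest')` can be mimicked with NumPy when x and y are 1D coordinate vectors. The sketch below uses hypothetical helpers `nearest_1d` and `select_nearest` and assumes `values` is indexed (y, x) like the SEVIRI DataArray. It ignores xarray's `tolerance` option, so a point outside the grid simply snaps to the nearest edge pixel.

```python
import numpy as np


def nearest_1d(coord, value):
    """Index of the element of a 1D coordinate vector closest to value.

    Works for ascending and descending vectors (satellite y coordinates
    often decrease from the top row to the bottom row).
    """
    coord = np.asarray(coord, dtype=float)
    return int(np.abs(coord - value).argmin())


def select_nearest(values, x_coord, y_coord, x, y):
    """Value at the grid point nearest to projected (x, y); values is (y, x)."""
    iy = nearest_1d(y_coord, y)
    ix = nearest_1d(x_coord, x)
    return values[iy, ix]
```

Because each dimension is searched independently, this only works when the grid is regular in the projection, which is why the point is transformed to GEOS x/y first.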

Related

How to apply a point transformation to many points?

I have a gridded temperature dataset and a list of weather stations across the country with their latitudes and longitudes. I want to find the grid points nearest to the weather stations. My gridded data has x, y coordinates, and latitude and longitude are functions of them.
I found that the simplest way of finding the nearest grid point is to first transform the latitude and longitude (Lat, Lon) of the weather stations to x and y values and then find the nearest grid point. I did that for one station (lat= , lon= ) by doing the following:
import matplotlib.pyplot as plt
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
from cartopy import config
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import xarray as xr
import pandas as pd
import netCDF4 as nc
#open gridded data
df=xr.open_dataset('/home/mmartin/LauNath/air.2m.2015.nc')
#open weather station data
CMStations=pd.read_csv('Slope95.csv')
# Example - your x and y coordinates are in a Lambert Conformal projection
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
# Transform the point - src_crs is always Plate Carree for lat/lon grid
x, y = data_crs.transform_point(-94.5786,39.0997, src_crs=ccrs.PlateCarree())
# Now you can select data
ks=df.sel(x=x, y=y, method='nearest')
How would I apply this to all of the weather stations latitudes and longitudes (Lat,Lon)?
There is no need to use geopandas here. Just use crs.transform_points() instead of crs.transform_point() and pass the coordinates as arrays:
import numpy as np
import cartopy.crs as ccrs
data_crs = ccrs.LambertConformal(central_longitude=-107.0,central_latitude=50.0,standard_parallels = (50, 50.000001),false_easting=5632642.22547,false_northing=4612545.65137)
lon, lat = np.array([1,2,3]), np.array([1,2,3])
data_crs.transform_points(ccrs.PlateCarree(), lon, lat)
which will return an array of the projected coordinates:
array([[16972983.1673108 , 8528848.37931063, 0. ],
[16841398.80456616, 8697676.02704447, 0. ],
[16709244.32834945, 8862533.81411212, 0. ]])
Also, if you really have a lot of points to transform (or need a CRS not yet supported by cartopy), you might want to look at pyproj directly. It provides a lot more functionality and some tricks to speed up transformations. Cartopy uses it under the hood, so you should already have it installed.
You can create a geopandas GeoDataFrame from x, y columns using geopandas.points_from_xy. I'll assume these points are WGS84 (EPSG:4326):
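After `transform_points`, the projected x and y of all stations are columns `[:, 0]` and `[:, 1]` of the result. These can be matched to grid indices in one vectorized step. The sketch below is NumPy-only and assumes the grid's 1D x (or y) coordinate is sorted ascending with at least two elements. `nearest_indices` is a hypothetical helper that uses binary search (`np.searchsorted`) instead of a full distance matrix.

```python
import numpy as np


def nearest_indices(coord, values):
    """Vectorized nearest-neighbour lookup on a sorted (ascending) 1D coordinate.

    Returns, for every entry of `values`, the index of the closest element
    of `coord`. Binary search keeps this cheap for many points.
    """
    coord = np.asarray(coord, dtype=float)
    values = np.asarray(values, dtype=float)
    # Index of the first element >= value, clipped so both neighbours exist
    right = np.clip(np.searchsorted(coord, values), 1, len(coord) - 1)
    left = right - 1
    # Pick whichever neighbour is closer (ties go to the left neighbour)
    choose_right = np.abs(coord[right] - values) < np.abs(values - coord[left])
    return np.where(choose_right, right, left)
```

For example, `ix = nearest_indices(df.x.values, pts[:, 0])`, where `pts` is the `transform_points` output. For a descending coordinate, search the reversed vector and map back with `len(coord) - 1 - idx`.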
import geopandas as gpd
stations = gpd.GeoDataFrame(
CMStations,
geometry=gpd.points_from_xy(
CMStations.Lon, CMStations.Lat, crs="epsg:4326" # assume WGS84
),
)
Now, we can use geopandas.GeoDataFrame.to_crs to transform all the points at once:
stations_xy = stations.to_crs(data_crs)
Finally, we can use xarray's advanced (vectorized) indexing. The projected x/y coordinates are converted to DataArrays indexed by the CMStations index, so .sel picks one grid point per station instead of an outer product of all x and y values:
station_x = stations_xy.geometry.x.to_xarray()
station_y = stations_xy.geometry.y.to_xarray()
# use these to select from the gridded xarray Dataset (df in the question)
station_data = df.sel(y=station_y, x=station_x, method="nearest")
If desired, you could first set a station ID column as the index with CMStations.set_index("station_id"). The station_id column then becomes the dataset dimension that replaces x and y.
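The key point of that last step is that index arrays sharing one dimension (the station index) select point by point rather than forming an outer product. NumPy makes the same distinction. The sketch below uses made-up indices. In xarray, plain lists or arrays passed to `sel` give the outer (orthogonal) behaviour, while DataArrays that share a dimension give the pointwise one.

```python
import numpy as np

# A small (y, x) grid standing in for one time step of the gridded data
grid = np.arange(20).reshape(4, 5)

# Made-up nearest-grid indices for three stations
station_iy = np.array([0, 2, 3])
station_ix = np.array([1, 1, 4])

# Pointwise (vectorized) indexing: one value per station, shape (3,)
pointwise = grid[station_iy, station_ix]

# Outer (orthogonal) indexing: every y index paired with every x index, shape (3, 3)
outer = grid[np.ix_(station_iy, station_ix)]
```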

Converting Sentinel 3 LST image with netcdf format to tiff with proper coordinates python

I have a Sentinel 3 image stored in a number of netcdf files. The variable is stored in the file "LST_in.nc" with dimensions rows and columns (of the image). The lat and lon are in another file, "geodetic_in.nc". I want to export the image with the lat and lon to tiff format.
My understanding is that the names of the dimensions and coordinates should be the same, but I failed to make this work.
Here are my attempts:
import rioxarray as rio
import xarray as xr
xds = xr.open_dataset('LST_in.nc')
coord =xr.open_dataset('geodetic_in.nc')
lat, lon = coord.latitude_in.data, coord.longitude_in.data
xds = xds.assign_coords({"lat":(["rows","columns"], lat), "lon":(["rows","columns"], lon)})
xds = xds.rename_dims({"rows": "lon", "columns": 'lat'})
Here I received this error
ValueError: Cannot rename rows to lon because lon already exists. Try using swap_dims instead.
Then I tried this
xds = xds.swap_dims({'rows' : 'lon', 'columns' : 'lat'})
but received another error
ValueError: replacement dimension 'lon' is not a 1D variable along the old dimension 'rows'
Also this one
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'lon', y_dim = 'lat', inplace = True)
Error: MissingSpatialDimensionError: x dimension (lon) not found. Data variable: LST
The only attempt that works, but with the wrong coordinates, is:
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'columns', y_dim = 'rows', inplace = True)
lst.rio.write_crs("epsg:4326", inplace = True)
lst.rio.to_raster("lst.tif")
I would appreciate your help. The image files are attached here:
https://wetransfer.com/downloads/e3711adf56f73cd07119b43d19f7360820220117154330/c46b21
The short answer is: you can't, at least not directly. Like netCDF and GRIB, GeoTIFF is a gridded data format, and the positions of your data points can't be described by a regular latitude/longitude grid.
I plotted a sample of your data points' latitudes and longitudes (plot not reproduced here).
As you can see, the data points do not lie on lines of constant latitude and longitude. They also do not follow any pattern that could be described by a projected grid, a rotated grid or a curvilinear grid.
If you want to make a gridded file with the LST values and latitude and longitude as coordinates, you will have to reproject (resample) your data onto a grid. You can use the rasterio.warp module; see its documentation for more information.
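To illustrate what resampling scattered points onto a grid means, here is a brute-force nearest-neighbour version in plain NumPy. It is a swapped-in technique, not rasterio.warp. The helper `grid_nearest` and its `max_dist` cut-off (in degrees) are hypothetical. The result is a 2D array on 1D lat/lon axes, which is the shape a GeoTIFF in EPSG:4326 expects.

```python
import numpy as np


def grid_nearest(lon_pts, lat_pts, vals, lon_grid, lat_grid, max_dist):
    """Resample scattered values onto a regular lon/lat grid (nearest neighbour).

    Each output cell takes the value of the closest input point (degrees,
    longitude scaled by cos(latitude)). Cells farther than max_dist from every
    point become NaN. Memory is O(n_cells * n_points): small inputs/tiles only.
    """
    lon_pts = np.ravel(np.asarray(lon_pts, dtype=float))
    lat_pts = np.ravel(np.asarray(lat_pts, dtype=float))
    vals = np.ravel(np.asarray(vals, dtype=float))
    glon, glat = np.meshgrid(lon_grid, lat_grid)  # each (n_lat, n_lon)
    cell_lon = glon.ravel()[:, None]              # (n_cells, 1)
    cell_lat = glat.ravel()[:, None]
    dx = (cell_lon - lon_pts[None, :]) * np.cos(np.deg2rad(cell_lat))
    dy = cell_lat - lat_pts[None, :]
    dist = np.hypot(dx, dy)                       # (n_cells, n_points)
    nearest = dist.argmin(axis=1)
    out = vals[nearest]
    # Blank out cells with no input point within max_dist (e.g. outside the swath)
    too_far = dist[np.arange(dist.shape[0]), nearest] > max_dist
    out[too_far] = np.nan
    return out.reshape(glon.shape)
```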

Select data by latitude and longitude

I am using a dataset from DWD (Deutscher Wetterdienst) and want to select data by latitude and longitude. Importing the data works, so there is no problem there. Selecting data with sel works when I use x and y,
but not with latitude and longitude. I tried all the answers I could find, such as:
ds.sel(latitude=50, longitude=14, method='nearest')
but I am getting the error
ValueError: dimensions or multi-index levels ['latitude', 'longitude'] do not exist
That's my code:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr
ds = xr.open_dataset(
'cosmo-d2_germany_rotated-lat-lon_single-level_2019061721_012_ASWDIFD_S.grib2',
engine='cfgrib',
backend_kwargs={'filter_by_keys': {'stepUnits': 1}}
)
print(ds)
Output:
<xarray.Dataset>
Dimensions: (x: 651, y: 716)
Coordinates:
time datetime64[ns] ...
step timedelta64[ns] ...
surface int32 ...
latitude (y, x) float64 ...
longitude (y, x) float64 ...
valid_time datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
ASWDIFD_S (y, x) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: edzw
GRIB_centreDescription: Offenbach
GRIB_subCentre: 255
Conventions: CF-1.7
institution: Offenbach
history: 2019-07-22T13:35:33 GRIB to CDM+CF via cfgrib-
In your file, latitude and longitude are not dimensions but 2D helper variables containing coordinate data. In xarray parlance they are called non-dimension coordinates, and you cannot use them with sel. See also "Working with Multidimensional Coordinates" in the xarray documentation.
It would be better to regrid the data to a regular grid inside Python so that you have latitudes and longitudes as 1D vectors. You would have to create a grid and then interpolate the data onto it.
Also check https://www.ecmwf.int/sites/default/files/elibrary/2018/18727-cfgrib-easy-and-efficient-grib-file-access-xarray.pdf for how to access GRIB files in xarray. If you don't want to use xarray for this purpose, pygrib is another option.
I can't test the solution as I don't have the cfgrib engine installed, but could you try using a find_nearest(lonarray, lonvalue) function
(not part of NumPy: it is defined in the answer linked below and is built around np.abs(array - value).argmin())
to find the lon and lat indexes near your point, as per this solution:
Find nearest value in numpy array
And then select the point using those indexes directly on the x, y dimensions (with isel)?
http://xarray.pydata.org/en/stable/indexing.html
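On a rotated grid like COSMO-D2, latitude and longitude are both 2D. Looking each one up separately in 1D can pick the wrong cell, so it is safer to search them jointly. The sketch below uses a hypothetical helper `nearest_yx` built on the haversine great-circle distance, which is a swapped-in technique and not something cfgrib provides. It returns indices you can pass to `ds.isel(y=..., x=...)`.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def nearest_yx(lat2d, lon2d, lat0, lon0):
    """Return (y, x, distance_km) of the grid point nearest to (lat0, lon0).

    lat2d/lon2d are 2D arrays in degrees, like the cfgrib 'latitude' and
    'longitude' non-dimension coordinates. Uses the haversine formula, so
    the wrap-around at +/-180 degrees is handled.
    """
    lat = np.deg2rad(np.asarray(lat2d, dtype=float))
    lon = np.deg2rad(np.asarray(lon2d, dtype=float))
    p0, l0 = np.deg2rad(lat0), np.deg2rad(lon0)
    a = (np.sin((lat - p0) / 2) ** 2
         + np.cos(p0) * np.cos(lat) * np.sin((lon - l0) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    iy, ix = np.unravel_index(np.argmin(dist_km), dist_km.shape)
    return int(iy), int(ix), float(dist_km[iy, ix])
```

With the dataset from the question this would be `iy, ix, d = nearest_yx(ds.latitude.values, ds.longitude.values, 50.0, 14.0)` followed by `ds.isel(y=iy, x=ix)`.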
I wrote a function for the files from the DWD:
import pygrib  # https://jswhit.github.io/pygrib/docs/
import numpy as np

def get_grib_data_nearest_point(grib_file, inp_lat, inp_lon):
    """
    Gets the value corresponding to a latitude-longitude pair of coordinates in
    a grib file.
    :param grib_file: path to the grib file on disk
    :param inp_lat: latitude
    :param inp_lon: longitude
    :return: scalar
    """
    # open the grib file, get the coordinates and values of the first message
    grbs = pygrib.open(grib_file)
    grb = grbs[1]
    lats, lons = grb.latlons()
    values = grb.values
    grbs.close()
    # check if the user coords are inside the grid's lat/lon range
    if not (lats.min() <= inp_lat <= lats.max()):
        return np.nan
    if not (lons.min() <= inp_lon <= lons.max()):
        return np.nan
    # find the row index (x) of the closest latitude along the first column
    # (assumes latitude changes mainly along rows; approximate on rotated grids)
    x = int(np.abs(lats[:, 0] - inp_lat).argmin())
    # find the column index (y) of the closest longitude along that row
    y = int(np.abs(lons[x, :] - inp_lon).argmin())
    # index the array to return the corresponding value
    return values[x][y]
As noted above, you can regrid your data (probably given on a curvilinear grid, i.e. with lat and lon as 2D arrays) onto a grid with 1D lat/lon coordinates at your desired resolution. After that, you can use .sel directly on the lat/lon coords to slice the data.
Check out xESMF (https://xesmf.readthedocs.io/en/latest/notebooks/Curvilinear_grid.html):
easy, fast interpolation and regridding of xarray fields, with good examples and documentation.

Extracting the value of 2-d gridded field by scatter points in Python

I have a 2-D gridded file that represents the land use categories for the place of interest.
I also have some lat/lon-based points distributed in this area.
import pandas as pd
import matplotlib.pyplot as plt
from netCDF4 import Dataset
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon, lat, lu)
plt.scatter(point.lon, point.lat, color='r')
I want to extract the values of the 2-D gridded field at the cells those points fall in, but I found it difficult to define a simple function to do that.
Is there an efficient method to achieve it?
Any advice would be appreciated.
PS
I have uploaded my files here
1. nc_file
2. point_file
I can propose a solution like this, where I just loop over the points and select the grid cell with the smallest distance to each point.
#!/usr/bin/env ipython
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import pandas as pd
# --------------------------------------
## 2-d gridded files
nc_file = "./geo_em.d02.nc"
geo = Dataset(nc_file, 'r')
lu = geo.variables["LU_INDEX"][0,:,:]
lat = geo.variables["XLAT_M"][0,:]
lon = geo.variables["XLONG_M"][0,:]
## point files
point = pd.read_csv("./point_data.csv")
plt.pcolormesh(lon, lat, lu)
#plt.scatter(point.lon, point.lat, color='r')
# --------------------------------------------
# get data for points:
dataout = []
# one degree of longitude spans cos(lat) times the distance of one degree of latitude
lon_ratio = np.cos(np.mean(lat)*np.pi/180.0)
for ii in range(len(point)):
    plon, plat = point.lon[ii], point.lat[ii]
    distmat = np.sqrt((lon_ratio*(lon-plon))**2 + (lat-plat)**2)
    # (row, col) of the closest grid cell
    jj, kk = np.unravel_index(np.argmin(distmat), distmat.shape)
    dataout.append([float(lon[jj, kk]), float(lat[jj, kk]), float(lu[jj, kk])])
# ---------------------------------------------
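The loop can also be replaced by NumPy broadcasting, which computes all point-to-cell distances at once. The sketch below uses a hypothetical helper `nearest_cells` with a cos(latitude)-scaled degree distance evaluated at each point's latitude. It is only suitable when n_points × n_cells fits in memory; for large inputs a KD-tree is the usual choice.

```python
import numpy as np


def nearest_cells(lon2d, lat2d, plon, plat):
    """Vectorized nearest-grid-cell search for many points at once.

    Returns (rows, cols) index arrays, one entry per point. Distances are in
    degrees with longitude scaled by cos(point latitude).
    Memory is O(n_points * n_cells).
    """
    plon = np.asarray(plon, dtype=float)[:, None]                # (n_points, 1)
    plat = np.asarray(plat, dtype=float)[:, None]
    lon_flat = np.asarray(lon2d, dtype=float).ravel()[None, :]  # (1, n_cells)
    lat_flat = np.asarray(lat2d, dtype=float).ravel()[None, :]
    dx = (lon_flat - plon) * np.cos(np.deg2rad(plat))
    dy = lat_flat - plat
    # Squared distance is enough to find the minimum
    flat_idx = np.argmin(dx ** 2 + dy ** 2, axis=1)
    return np.unravel_index(flat_idx, np.shape(lon2d))
```

With the question's variables this would be `rows, cols = nearest_cells(lon, lat, point.lon, point.lat)` and then `lu[rows, cols]`.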

indices of 2D lat lon data

I am trying to find the equivalent (if there exists one) of an NCL function that returns the indices of two-dimensional latitude/longitude arrays closest to a user-specified latitude/longitude coordinate pair.
This is the link to the NCL function that I am hoping has an equivalent in Python. I'm suspecting at this point that there is not, so any tips on how to get indices from lat/lon coordinates are appreciated.
https://www.ncl.ucar.edu/Document/Functions/Contributed/getind_latlon2d.shtml
Right now, my coordinate values are saved in a .nc file and are read with:
from netCDF4 import Dataset
coords='coords.nc'
fh = Dataset(coords, mode='r')
lons = fh.variables['g5_lon_1'][:,:]
lats = fh.variables['g5_lat_0'][:,:]
rot = fh.variables['g5_rot_2'][:,:]
fh.close()
I found that scipy's spatial.KDTree can perform a similar task. Here is my code for finding the model grid point closest to the observation location:
import numpy as np
from scipy import spatial
from netCDF4 import Dataset
# read in the one-dimensional lat lon info from a dataset
fname = '0k_T_ann_clim.nc'
fid = Dataset(fname, 'r')
lat = fid.variables['lat'][:]
lon = fid.variables['lon'][:]
# make them a meshgrid for later use with KDTree
lon2d, lat2d = np.meshgrid(lon, lat)
# zip them together as (lon, lat) pairs
model_grid = list( zip(np.ravel(lon2d), np.ravel(lat2d)) )
# target point location: 30.5N, 56.1E, given as (lon, lat) to match model_grid
target_pts = [56.1, 30.5]
distance, index = spatial.KDTree(model_grid).query(target_pts)
# the nearest model location (in lon and lat) and its (row, col) indices
model_loc_coord = model_grid[index]
row, col = np.unravel_index(index, lon2d.shape)
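Building the tree on raw (lon, lat) degrees has two drawbacks: it overweights longitude away from the equator, and it ignores the wrap at 0/360°. A common remedy is to convert the coordinates to 3D unit vectors first, because the straight-line (chord) distance between unit vectors increases monotonically with great-circle distance. The sketch below uses a hypothetical helper `lonlat_to_xyz` and a brute-force argmin so it runs with NumPy alone. The same `grid_xyz` array could be passed to `spatial.KDTree`, and `np.unravel_index` maps the flat result back to (row, col).

```python
import numpy as np


def lonlat_to_xyz(lon, lat):
    """Convert lon/lat in degrees to unit vectors on a sphere, shape (..., 3)."""
    lon_r = np.deg2rad(np.asarray(lon, dtype=float))
    lat_r = np.deg2rad(np.asarray(lat, dtype=float))
    return np.stack([np.cos(lat_r) * np.cos(lon_r),
                     np.cos(lat_r) * np.sin(lon_r),
                     np.sin(lat_r)], axis=-1)


# A 1D lat/lon model grid expanded to 2D (made-up 2.5 degree resolution)
lon = np.arange(0.0, 360.0, 2.5)
lat = np.arange(-90.0, 90.1, 2.5)
lon2d, lat2d = np.meshgrid(lon, lat)
grid_xyz = lonlat_to_xyz(lon2d, lat2d).reshape(-1, 3)  # (n_cells, 3), KDTree-ready

# Target point: 30.5N, 56.1E
target_xyz = lonlat_to_xyz(56.1, 30.5)
flat_index = np.argmin(np.sum((grid_xyz - target_xyz) ** 2, axis=1))
row, col = np.unravel_index(flat_index, lon2d.shape)
```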
I'm not sure how lon/lat arrays are stored when read in python, so to use the following solution you may need to convert lon/lat to numpy arrays. You can just put the abs(array-target).argmin() in a function.
import numpy as np
# make a dummy longitude array, 0.5 degree resolution.
lon=np.linspace(0.5,360,720)
# find index of nearest longitude to 25.4
ind=abs(lon-25.4).argmin()
# check it works! this gives 25.5
lon[ind]
