Convert Longitude, latitude to Pixel values using GDAL - python

I have a satellite GeoTIFF Image and a corresponding OSM file with only the highways. I want to convert the longitude latitude value in the OSM file to the pixels and want to highlight highway on the satellite image.
I have tried several methods that are explained on StackExchange. But I get the negative and same pixel value for every longitude and latitude values. Could somebody explain, what am I missing?
Here is the information of the image that I have gathered using OTB application.
Here is the code that i am using.
from osgeo import gdal, osr
import numpy as np
import xml.etree.ElementTree as xml
src_filename = 'image.tif'
dst_filename = 'foo.tiff'
def readLongLat(path):
lonlatList = []
latlongtuple = ()
root = xml.parse(path).getroot()
for i in root:
if i.tag == "node":
latlong = []
lat = float(i.attrib["lat"])
long = float(i.attrib["lon"])
latlong.append(lat)
latlong.append(long)
lonlatList.append(latlong)
return lonlatList
# Opens source dataset
src_ds = gdal.Open(src_filename)
format = "GTiff"
driver = gdal.GetDriverByName(format)
# Open destination dataset
dst_ds = driver.CreateCopy(dst_filename, src_ds, 0)
# Get raster projection
epsg = 4269 # http://spatialreference.org/ref/sr-org/lambert_conformal_conic_2sp/
srs = osr.SpatialReference()
srs.ImportFromEPSG(epsg)
# Make WGS84 lon lat coordinate system
world_sr = osr.SpatialReference()
world_sr.SetWellKnownGeogCS('WGS84')
transform = src_ds.GetGeoTransform()
gt = [transform[0],transform[1],0,transform[3],0,-transform[5]]
#Reading the osm file
lonlat = readLongLat("highways.osm")
# Transform lon lats into XY
coord_transform = osr.CoordinateTransformation(world_sr, srs)
newpoints = coord_transform.TransformPoints(lonlat) # list of XYZ tuples
# Make Inverse Geotransform (try:except due to gdal version differences)
try:
success, inverse_gt = gdal.InvGeoTransform(gt)
except:
inverse_gt = gdal.InvGeoTransform(gt)
# [Note 1] Set pixel values
marker_array_r = np.array([[255]], dtype=np.uint8)
marker_array_g = np.array([[0]], dtype=np.uint8)
marker_array_b = np.array([[0]], dtype=np.uint8)
for x,y,z in newpoints:
pix_x = int(inverse_gt[0] + inverse_gt[1] * x + inverse_gt[2] * y)
pix_y = int(inverse_gt[3] + inverse_gt[4] * x + inverse_gt[5] * y)
dst_ds.GetRasterBand(1).WriteArray(marker_array_r, pix_x, pix_y)
dst_ds.GetRasterBand(2).WriteArray(marker_array_g, pix_x, pix_y)
dst_ds.GetRasterBand(3).WriteArray(marker_array_b, pix_x, pix_y)
# Close files
dst_ds = None
src_ds = None

Something I have tried recently is using the xarray module. I think of xarray as a hybrid between pandas and numpy that allows you to store information as an array but access it using simply .sel requests. Docs here.
UPDATE: Seems as if rasterio and xarray are required to be installed for the below method to work. See link.
It is a much simpler way of translating a GeoTiff file to a user-friendly array. See my example below:
import xarray as xr
ds = xr.open_rasterio("/path/to/image.tif")
# Insert your lat/lon/band below to extract corresponding pixel value
ds.sel(band=2, lat=19.9, lon=39.5, method='nearest').values
>>> [10.3]
This does not answer your question directly, but may help you identify a different (and probably simpler) approach that I've recently switched to.
Note: obviously care needs to be taken to ensure that your lat/lon pairs are in the same coordinate system as the GeoTiff file, but I think you're handling that anyway.

I was able to do that using the library geoio.
import geoio
img = geoio.GeoImage(src_filename)
pix_x, pix_y = img.proj_to_raster(lon,lat)

Related

How to calculate the mean value of multiple large rasters in python/arcpy?

After having converted the original monthly MOD13C2 product (from year 2000-2020) to raster format, now I have to calculate the mean value of the 251 rasters.
First I have tried this tutorial which simply used the algebra function that behaved badly in cooperating NA values. So I tried the second one which converted each raster to array to skip NA pixel. I adapted it into my code:
import arcpy, sys, os, glob
from arcpy.sa import *
import numpy
arcpy.CheckOutExtension('Spatial')
# input path
inws = "G:/data0610/MODIS_VI/EVI/EVI_pro/"
# output path
outws = "G:/data0610/MODIS_VI/EVI/EVI_pro/"
rasters = glob.glob(os.path.join(inws, "*.tif"))
r = Raster(rasters[0])
array = arcpy.RasterToNumPyArray(r) # convert to numpy
rowNum, colNum = array.shape
sum = numpy.zeros(shape=array.shape) # save the accumulating value
count = numpy.zeros(shape=array.shape) # save the counting number
Average = numpy.zeros(shape=array.shape) # save the mean value
for ras in rasters:
rmm = Raster(ras)
array = arcpy.RasterToNumPyArray(rmm)
# one by one pixel
for i in range(0, rowNum):
for j in range(0, colNum):
if array[i][j] >= 0 : # verdict invalid value
sum[i][j] += array[i][j] # accumulate
count[i][j] += 1 # counter
continue
Average = sum / count # cal the mean value
# save the raster
lowerLeft = arcpy.Point(r.extent.XMin, r.extent.YMin)
cellWidth = r.meanCellWidth
cellHeight = r.meanCellHeight
nameT = "evi.tif"
outname = os.path.join(outws, nameT)
arcpy.env.overwriteOutput = True
#convert to WGS84
inf = "G:/data0610/MODIS_VI/sm_mean.tif"
arcpy.env.outputCoordinateSystem = Raster(inf) # convert the crs to wgs84
print("successfully converted the CRS!")
AvgRas = arcpy.NumPyArrayToRaster(Average, lowerLeft, cellWidth, cellHeight, r.noDataValue) # turn into raster
AvgRas.save(outname)
print("successfully output the evi_mean.tif!")
# resample
outname_res = outws + "evi_mean_res.tif"
# get the standard cellsize
cellsize025 = "{0} {1}".format(arcpy.Describe(inf).meanCellWidth, arcpy.Describe(inf).meanCellHeight)
arcpy.Resample_management(AvgRas, outname_res, cellsize025, "NEAREST")
print("successfully output the evi_mean.tif with the 0.25 degree resolution!")
Unfortunately, the arcpy (python 2.7, 32 bit) had the memory error because there were too many large rasters (I'm dealing with the global extent). I found the reason and solution from this question:
enter image description here
So I installed the 64-bit Background Geoprocessing of ArcGIS and run the above code again, then it came up with another problem:
enter image description here
It turned out the soultion might be useless, because ArcGIS is very sensitive to administrator license, you can't run the 64-bit python while the ArcGIS being 32-bit.
Now return to my initial question: How to calculate the mean value of multiple rasters in python/arcpy? Did I make things complicated? Is there a simpler way of generating the mean value raster?
It's really driving me crazy.

Extracting countries from NetCDF data using geopandas

I am trying to extract countries from NetCDF3 data using the pdsi monthly mean calibrate data from: https://psl.noaa.gov/data/gridded/data.pdsi.html. I am using the following code which performs a spatial merge of coordinates and identifies countries based on a shapefile of the world.
PDSI data format
# Import shapefile from geopandas
path_to_data = geopandas.datasets.get_path("naturalearth_lowres")
world_shp = geopandas.read_file(path_to_data)
world_shp.head()
# Import netCDF file
ncs = "pdsi.mon.mean.selfcalibrated.nc"
# Read in netCDF as a pandas dataframe
# Xarray provides a simple method of opening netCDF files, and converting them to pandas dataframes
ds = xr.open_dataset(ncs)
pdsi = ds.to_dataframe()
# the index in the df is a Pandas.MultiIndex. To reset it, use df.reset_index()
pdsi = pdsi.reset_index()
# quick check for shpfile plotting
world_shp.plot(figsize=(12, 8));
# use geopandas points_from_xy() to transform Longitude and Latitude into a list of shapely.Point objects and set it as a geometry while creating the GeoDataFrame
pdsi_gdf = geopandas.GeoDataFrame(pdsi, geometry=geopandas.points_from_xy(pdsi.lon, pdsi.lat))
print(pdsi_gdf.head())
# check CRS coordinates
world_shp.crs #shapefile
pdsi_gdf.crs #geodataframe netcdf
# set coordinates equal to each other
# PointsGeodataframe.crs = PolygonsGeodataframe.crs
pdsi_gdf.crs = world_shp.crs
# check coordinates after setting coordinates equal to each other
pdsi_gdf.crs #geodataframe netcdf
#spatial join
join_inner_df = geopandas.sjoin(pdsi_gdf, world_shp, how="inner")
join_inner_df
The problem I am having is that the original data in the NetCDF format consists of spatial coverage/gridded data where the values of the key variable (pdsi) represents the area within each shaded squares (see image below). So far, only the coordinate points in the middle of the squares are being matches, and I would like each shaded square to match to each country that it is inside. For example, if the area of the shaded squares are within the boundaries of Germany and Netherlands, then the key variable should be attributed to both countries. Any help on this issue would be greatly appreciated.
NetCDF gridded data example
have sourced data that you referenced to ensure this can be re-run on any machine
core solution, a square buffer around the point https://gis.stackexchange.com/questions/314949/creating-square-buffers-around-points-using-shapely
have analysed data to ensure value used for buffer is appropriate and calculated from data
# make sure that data supports using a buffer...
assert (
gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
== gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point??
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)
the remaining solution is now a spatial join
# the solution... spatial join buffered polygons to countries
# comma separate associated countries
gdf = gdf.join(
world_shp.sjoin(gdf.set_crs("EPSG:4326"))
.groupby("index_right")["name"]
.agg(",".join)
)
have used plotly to visualise. From image you can see that multiple countries have been associated with a bounding box.
complete code
import geopandas as gpd
import numpy as np
import plotly.express as px
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import geopandas as gpd
import shapely.geometry
import xarray as xr
# download NetCDF data...
# fmt: off
url = "https://psl.noaa.gov/repository/entry/get/pdsi.mon.mean.selfcalibrated.nc?entryid=synth%3Ae570c8f9-ec09-4e89-93b4-babd5651e7a9%3AL2RhaV9wZHNpL3Bkc2kubW9uLm1lYW4uc2VsZmNhbGlicmF0ZWQubmM%3D"
f = Path.cwd().joinpath(Path(urllib.parse.urlparse(url).path).name)
# fmt: on
if not f.exists():
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
ds = xr.open_dataset(f)
pdsi = ds.to_dataframe()
pdsi = pdsi.reset_index().dropna() # don't care about places in oceans...
# use subset for testing... last 5 times...
pdsim = pdsi.loc[pdsi["time"].isin(pdsi.groupby("time").size().index[-5:])]
# create geopandas dataframe
gdf = gpd.GeoDataFrame(
pdsim, geometry=pdsim.loc[:, ["lon", "lat"]].apply(shapely.geometry.Point, axis=1)
)
# make sure that data supports using a buffer...
assert (
gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
== gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point??
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)
# Import shapefile from geopandas
path_to_data = gpd.datasets.get_path("naturalearth_lowres")
world_shp = gpd.read_file(path_to_data)
# the solution... spatial join buffered polygons to countries
# comma separate associated countries
gdf = gdf.join(
world_shp.sjoin(gdf.set_crs("EPSG:4326"))
.groupby("index_right")["name"]
.agg(",".join)
)
gdf["time_a"] = gdf["time"].dt.strftime("%Y-%b-%d")
# simplest way to test is visualise...
px.choropleth_mapbox(
gdf,
geojson=gdf.geometry,
locations=gdf.index,
color="pdsi",
hover_data=["name"],
animation_frame="time_a",
opacity=.3
).update_layout(
mapbox={"style": "carto-positron", "zoom": 1},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)

Given a geotiff file, how does one find the single pixel closest to a given latitude/longitude?

I have a geotiff file that I'm opening with gdal in Python, and I need to find the single pixel closest to a specified latitude/longitude. I was previously working with an unrelated file type for similar data, so I'm completely new to both gdal and geotiff.
How does one do this? What I have so far is
import gdal
ds = gdal.Open('foo.tiff')
width = ds.RasterXSize
height = ds.RasterYSize
gt = ds.GetGeoTransform()
gp = ds.GetProjection()
data = np.array(ds.ReadAsArray())
print(gt)
print(gp)
which produces (for my files)
(-3272421.457337171, 2539.703, 0.0, 3790842.1060354356, 0.0, -2539.703)
and
PROJCS["unnamed",GEOGCS["Coordinate System imported from GRIB file",DATUM["unnamed",SPHEROID["Sphere",6371200,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",25],PARAMETER["central_meridian",265],PARAMETER["standard_parallel_1",25],PARAMETER["standard_parallel_2",25],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
Ideally, there'd be a single simple function call, and it would also return an indication whether the specified location falls outside the bounds of the raster.
My fallback is to obtain a grid from another source containing the latitudes and longitudes for each pixel and then do a brute force search for the desired location, but I'm hoping there's a more elegant way.
Note: I think what I'm trying to do is equivalent to the command line
gdallocationinfo -wgs84 foo.tif <longitude> <latitude>
which returns results like
Report:
Location: (1475P,1181L)
Band 1:
Value: 66
This suggests to me that the functionality is probably already in the gdal module, if I can just find the right method to call.
You basically need two steps:
Convert the lat/lon point to the raster-projection
Convert the mapx/mapy (in raster proj) to pixel coordinates
Given the code you already posted above, defining both projection systems can be done with:
from osgeo import gdal, osr
point_srs = osr.SpatialReference()
point_srs.ImportFromEPSG(4326) # hardcode for lon/lat
# GDAL>=3: make sure it's x/y
# see https://trac.osgeo.org/gdal/wiki/rfc73_proj6_wkt2_srsbarn
point_srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
file_srs = osr.SpatialReference()
file_srs.ImportFromWkt(gp)
Creating the coordinate transformation, and using it to convert the point from lon/lat to mapx/mapy coordinates (whatever projection it is) with:
ct = osr.CoordinateTransformation(point_srs, file_srs)
point_x = -114.06138 # lon
point_y = 51.03163 # lat
mapx, mapy, z = ct.TransformPoint(point_x, point_y)
To go from map coordinates to pixel coordinates, the geotransform needs to be inverted first. And can then be used to retrieve the pixel coordinates like:
gt_inv = gdal.InvGeoTransform(gt)
pixel_x, pixel_y = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
Rounding those pixel coordinates should allow you to use them for indexing the data array. You might need to clip them if the point you're querying is outside the raster.
# round to pixel
pixel_x = round(pixel_x)
pixel_y = round(pixel_y)
# clip to file extent
pixel_x = max(min(pixel_x, width-1), 0)
pixel_y = max(min(pixel_y, height-1), 0)
pixel_data = data[pixel_y, pixel_x]

Get nearest pixel value from satellite image using latitude longitude coordinates

I have a satellite image file. Loaded into dask array. I want to get pixel value (nearest) of a latitude, longitude of interest.
Satellite image is in GEOS projection. I have longitude and latitude information as 2D numpy arrays.
Satellite Image file
I have loaded it into a dask data array
from satpy import Scene
import matplotlib as plt
import os
cwd = os.getcwd()
fn = os.path.join(cwd, 'EUMETSAT_data/1Jan21/MSG1-SEVI-MSG15-0100-NA-20210101185741.815000000Z-20210101185757-1479430.nat')
files = [fn]
scn = Scene(filenames=files, reader='seviri_l1b_native')
scn.load(["VIS006"])
da = scn['VIS006']
This is what the dask array looks like:
I read lon lats from the area attribute with the help of satpy:
lon, lat = scn['VIS006'].attrs['area'].get_lonlats()
print(lon.shape)
print(lat.shape)
(1179, 808)
(1179, 808)
I get a 2d numpy array each, for longitude and latitude that are coordinates but I can not use them for slicing or selecting.
What is the best practice/method to get nearest lat long, pixel information?
How do I project the data onto lat long coordinates that I can then use for indexing to arrive at the pixel value.
At the end, I want to get pixel value (nearest) of lat long of interest.
Thanks in advance!!!
The AreaDefinition object you are using (.attrs['area']) has a few methods for getting different coordinate information.
area = scn['VIS006'].attrs['area']
col_idx, row_idx = area.get_xy_from_lonlat(lons, lats)
scn['VIS006'].values[row_idx, col_idx]
Note that row and column are flipped. The get_xy_from_lonlat method should work for arrays or scalars.
There are other methods for getting X/Y coordinates of each pixel if that is what you're interesting in.
You can find the location with following:
import numpy as np
px,py = (23.0,55.0) # some location to take out values:
dist = np.sqrt(np.cos(lat*np.pi/180.0)*(lon-px)**2+(lat-py)**2); # this is the distance matrix from point (px,py)
kkout = np.squeeze(np.where(np.abs(dist)==np.nanmin(dist))); # find location where distance is minimum
print(kkout) # you will see the row and column, where to take out data
#serge ballesta - thanks for the direction
Answering my own question.
Project the latitude and longitude (platecaree projection) onto the GEOS projection CRS. Find x and y. Use this x and y and nearest select method of xarray to get pixel value from dask array.
import cartopy.crs as ccrs
data_crs = ccrs.Geostationary(central_longitude=41.5, satellite_height=35785831, false_easting=0, false_northing=0, globe=None, sweep_axis='y')
lon = 77.541677 # longitude of interest
lat = 8.079148 # latitude of interst
# lon lat system in
x, y = data_crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
dn = ds.sel(x=x,y=y, method='nearest')

indices of 2D lat lon data

I am trying to find the equivalent (if there exists one) of an NCL function that returns the indices of two-dimensional latitude/longitude arrays closest to a user-specified latitude/longitude coordinate pair.
This is the link to the NCL function that I am hoping there is an equivalent to in python. I'm suspecting at this point that there is not, so any tips on how to get indices from lat/lon coordinates is appreciated
https://www.ncl.ucar.edu/Document/Functions/Contributed/getind_latlon2d.shtml
Right now , I have my coordinate values saved into an .nc file and are read by:
coords='coords.nc'
fh = Dataset(coords, mode='r')
lons = fh.variables['g5_lon_1'][:,:]
lats = fh.variables['g5_lat_0'][:,:]
rot = fh.variables['g5_rot_2'][:,:]
fh.close()
I found scipy spatial.KDTree can perform similar task. Here is my code of finding the model grid that is closest to the observation location
from scipy import spatial
from netCDF4 import Dataset
# read in the one dimensional lat lon info from a dataset
fname = '0k_T_ann_clim.nc'
fid = Dataset(fname, 'r')
lat = fid.variables['lat'][:]
lon = fid.variables['lon'][:]
# make them a meshgrid for later use KDTree
lon2d, lat2d = np.meshgrid(lon, lat)
# zip them together
model_grid = list( zip(np.ravel(lon2d), np.ravel(lat2d)) )
#target point location : 30.5N, 56.1E
target_pts = [30.5 56.1]
distance, index = spatial.KDTree(model_grid).query(target_pts)
# the nearest model location (in lat and lon)
model_loc_coord = [coord for i, coord in enumerate(model_grid) if i==index]
I'm not sure how lon/lat arrays are stored when read in python, so to use the following solution you may need to convert lon/lat to numpy arrays. You can just put the abs(array-target).argmin() in a function.
import numpy as np
# make a dummy longitude array, 0.5 degree resolution.
lon=np.linspace(0.5,360,720)
# find index of nearest longitude to 25.4
ind=abs(lon-25.4).argmin()
# check it works! this gives 25.5
lon[ind]

Categories