netCDF grid file: Extracting information from 1D array using 2D values


I am trying to work in Python 3 with topography/bathymetry-information (basically a grid containing x [longitude in decimal degrees], y [latitude in decimal degrees] and z [meter]).
The grid file has the extension .nc and is therefore a netCDF-file. Normally I would use it in mapping tools like Generic Mapping Tools and don't have to bother with how a netCDF file works, but I need to extract specific information in a Python script. Right now this is only limiting the dataset to certain longitude/latitude ranges.
However, I am a bit lost on how to get the z-information for specific x and y values. Here's what I know about the data so far:
import netCDF4
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Getting information about the file
#----------------------
print(fh.file_format)
NETCDF3_CLASSIC
print(fh)
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
title: GEBCO_08 Grid
source: 20100927
dimensions(sizes): side(2), xysize(933120000)
variables(dimensions): float64 x_range(side), float64 y_range(side), int16 z_range(side), float64 spacing(side), int32 dimension(side), int16 z(xysize)
groups:
print(fh.dimensions.keys())
odict_keys(['side', 'xysize'])
print(fh.dimensions['side'])
: name = 'side', size = 2
print(fh.dimensions['xysize'])
: name = 'xysize', size = 933120000
#----------------------
# Variables
#----------------------
print(fh.variables.keys()) # returns all available variable keys
odict_keys(['x_range', 'y_range', 'z_range', 'spacing', 'dimension', 'z'])
xrange = fh.variables['x_range'][:]
print(xrange)
[-180. 180.] # contains the values -180 to 180 for the longitude of the whole world
yrange = fh.variables['y_range'][:]
print(yrange)
[-90. 90.] # contains the values -90 to 90 for the latitude of the whole world
zrange = fh.variables['z_range'][:]
[-10977 8685] # contains the depths/topography range for the world
spacing = fh.variables['spacing'][:]
[ 0.00833333 0.00833333] # spacing in both x and y; the x and y range divided by the spacing gives the dimension
dimension = fh.variables['dimension'][:]
[43200 21600] # corresponding to the shape of z if it was the 2D array I would've hoped for (it's currently a 1D array of 933120000 values, which is 43200*21600)
z = fh.variables['z'][:] # currently a 1D array of the depth/topography/z information I want
fh.close()
Based on this information I still don't know how to access z for specific x/y (longitude/latitude) values. I think I basically need to convert the 1D array of z into a 2D array corresponding to longitude/latitude values. I just don't have a clue how to do that. I saw some posts where people tried to convert a 1D array into a 2D array, but I have no means of knowing in what corner of the world they start and how they progress.
I know there is a 3 year old similar post, however, I don't know how to find an analogue "index of the flattened array" for my problem - or how to exactly work with that. Can somebody help?

You need to first read in all three of z's dimensions (lat, lon, depth) and then extract values across each of those dimensions. Here are a few examples.
# Read in all 3 dimensions [lat x lon x depth]
z = fh.variables['z'][:,:,:]
# Topography at a single lat/lon/depth (1 value):
z_1 = z[5,5,5]
# Topography at all depths for a single lat/lon (1D array):
z_2 = z[5,5,:]
# Topography at all latitudes and longitudes for a single depth (2D array):
z_3 = z[:,:,5]
Note that the number you enter for lat/lon/depth is the index in that dimension, not an actual latitude, for instance. You'll need to determine the indices of the values you are looking for beforehand.

I just found the solution in this post. Sorry that I didn't see that before. Here's what my code looks like now. Thanks to Dave (he answered his own question in the post above). The only thing I had to work on was that the dimensions have to stay integers.
import netCDF4
import numpy as np
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Extract variables
#----------------------
xrange = fh.variables['x_range'][:]
yrange = fh.variables['y_range'][:]
spacing = fh.variables['spacing'][:] # needed below to compute nx and ny
zz = fh.variables['z'][:]
fh.close()
#----------------------
# Compute Lat/Lon
#----------------------
nx = int(round((xrange[-1]-xrange[0])/spacing[0])) # num pts in x-dir (must be an integer)
ny = int(round((yrange[-1]-yrange[0])/spacing[1])) # num pts in y-dir (must be an integer)
lon = np.linspace(xrange[0],xrange[-1],nx)
lat = np.linspace(yrange[0],yrange[-1],ny)
#----------------------
# Reshape the 1D to a 2D array
#----------------------
bathy = zz[:].reshape(ny, nx)
So, now when I look at the shape of both zz and bathy (following code), the former is a 1D array with a length of 933120000, the latter a 2D array with shape (21600, 43200), i.e. (ny, nx).
print(zz.shape)
print(bathy.shape)
The next step is to use indices to access the bathymetry/topography data correctly, just as N1B4 described in his post
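As a rough sketch of that index step: on a regular grid the row and column follow directly from the range and spacing values. The helper name `lonlat_to_index` and the `north_first` flag are made up for this illustration, and the row order is an assumption (1D grids of this kind are often stored north to south, but it is worth checking against a location whose elevation you know). A small synthetic array stands in for the real bathymetry.

```python
import numpy as np

def lonlat_to_index(lon, lat, x_range, y_range, spacing, north_first=True):
    """Return (row, col) of the grid cell containing lon/lat.

    Assumes a regular grid with cells `spacing` degrees wide. If
    north_first is True, row 0 is taken to be the northernmost row
    (an assumption that should be verified for the actual file).
    """
    nx = int(round((x_range[1] - x_range[0]) / spacing[0]))
    ny = int(round((y_range[1] - y_range[0]) / spacing[1]))
    col = int((lon - x_range[0]) / spacing[0])
    if north_first:
        row = int((y_range[1] - lat) / spacing[1])
    else:
        row = int((lat - y_range[0]) / spacing[1])
    # Clamp so that points on the outer edge still map to a valid cell
    return min(max(row, 0), ny - 1), min(max(col, 0), nx - 1)

# Small synthetic 1-degree grid standing in for the reshaped bathymetry
x_range, y_range, spacing = (-180.0, 180.0), (-90.0, 90.0), (1.0, 1.0)
bathy = np.arange(180 * 360).reshape(180, 360)

row, col = lonlat_to_index(10.5, 50.5, x_range, y_range, spacing)
depth = bathy[row, col]
```

Limiting the dataset to a longitude/latitude box then amounts to computing the indices of two opposite corners and slicing, e.g. `bathy[row0:row1, col0:col1]`.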

Related

Converting Sentinel 3 LST image with netcdf format to tiff with proper coordinates python

I have a Sentinel 3 image which is stored in a number of netcdf files. The variable is stored in the file "LST_in.nc" with dimensions = rows and columns of the image. The lat and long are in another file "geodetic_in.nc". I want to export the image with the lat and long to tiff format.
To my understanding, the dimension names and the coordinate names should be the same, but I failed to achieve this.
Here are my attempts:
import rioxarray as rio
import xarray as xr
xds = xr.open_dataset('LST_in.nc')
coord =xr.open_dataset('geodetic_in.nc')
lat, lon = coord.latitude_in.data, coord.longitude_in.data
xds = xds.assign_coords({"lat":(["rows","columns"], lat), "lon":(["rows","columns"], lon)})
xds = xds.rename_dims({"rows": "lon", "columns": 'lat'})
Here I received this error
ValueError: Cannot rename rows to lon because lon already exists. Try using swap_dims instead.
Then I tried this
xds = xds.swap_dims({'rows' : 'lon', 'columns' : 'lat'})
but received another error
ValueError: replacement dimension 'lon' is not a 1D variable along the old dimension 'rows'
Also this one
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'lon', y_dim = 'lat', inplace = True)
Error: MissingSpatialDimensionError: x dimension (lon) not found. Data variable: LST
The only one that works but with the wrong coordinates is
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'columns', y_dim = 'rows', inplace = True)
lst.rio.write_crs("epsg:4326", inplace = True)
lst.rio.to_raster("lst.tif")
I would appreciate your help. The image files are attached:
https://wetransfer.com/downloads/e3711adf56f73cd07119b43d19f7360820220117154330/c46b21

The short answer is: you can't, because both netCDF and GRIB are gridded data formats and the positions of your data points can't be described using a regular latitude/longitude grid.
I plotted a sample of your data points' latitude and longitude:
As you can see, the data points are not placed on lines of constant latitude and longitude, and they do not follow any pattern that could be described with a projected grid, rotated grid or curvilinear grid either.
If you want to make a gridded file with the LST values and latitude and longitude as coordinates, you will have to reproject your data. You can use the rasterio.warp module, see here for more information.
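To illustrate what such a resampling step does, here is a minimal sketch that uses `scipy.interpolate.griddata` instead of `rasterio.warp` (a swapped-in technique, chosen only because it runs without any input files). The arrays `lon2d`, `lat2d` and `lst` are synthetic stand-ins for `longitude_in`, `latitude_in` and `LST`, and the 0.02 degree target resolution is an arbitrary assumption.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-ins for the 2D swath arrays (rows x columns)
n_rows, n_cols = 40, 50
jj, ii = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
lon2d = 10.0 + 0.02 * jj + 0.005 * ii   # skewed, not aligned with meridians
lat2d = 50.0 - 0.02 * ii + 0.005 * jj   # skewed, not aligned with parallels
lst = 280.0 + lon2d + lat2d             # fake field, linear in lon/lat

# Regular target grid covering the swath footprint
lon_t = np.arange(lon2d.min(), lon2d.max(), 0.02)
lat_t = np.arange(lat2d.max(), lat2d.min(), -0.02)  # north to south
lon_g, lat_g = np.meshgrid(lon_t, lat_t)

# Interpolate the scattered swath samples onto the regular grid;
# cells outside the swath footprint come back as NaN
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
lst_grid = griddata(points, lst.ravel(), (lon_g, lat_g), method="linear")
```

After this step `lon_t` and `lat_t` are 1D vectors, so they can serve as regular x/y coordinates for the raster export. For full-size scenes this approach is likely to be slow, and a dedicated reprojection tool is probably the better choice.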

Re-distributing 2d data with max in middle

Hey all, I have a set of seemingly random 2D data that I want to reorder. This is meant for an image with specific values at each pixel, but the concept is the same.
I have a large 2D array that looks very random, say:
x = 100
y = 120
np.random.random((x,y))
and I want to re-distribute the 2D matrix so that the maximum value is in the center and the remaining values surround it in decreasing order, giving a sort of Gaussian fall-off from the center.
Small example:
output = [[0.0,0.5,1.0,1.0,1.0,0.5,0.0],
          [0.0,1.0,1.0,1.5,1.0,0.5,0.0],
          [0.5,1.0,1.5,2.0,1.5,1.0,0.5],
          [0.0,1.0,1.0,1.5,1.0,0.5,0.0],
          [0.0,0.5,1.0,1.0,1.0,0.5,0.0]]
I know it won't really be a Gaussian, I am just trying to give a visualization of what I would like. I was thinking of sorting the 2D array into a list from max to min and then using that to create a new 2D array, but I'm not sure how to distribute the values to fill the matrix the way I want.
Thank you very much!

If anyone looks at this in the future and needs help, here is some advice on how to do this effectively for a lot of data. The code is posted below.
import itertools
import numpy as np

def datasort(inputarray,spot_in_x,spot_in_y):
    #get the data read
    center_of_y = spot_in_y
    center_of_x = spot_in_x
    M = len(inputarray[0]) #number of columns
    N = len(inputarray) #number of rows
    l_list = list(itertools.chain(*inputarray)) #listed data
    l_sorted = sorted(l_list,reverse=True) #sorted listed data
    #Reorder
    to_reorder = list(np.arange(0,len(l_sorted),1)) #flat index of every cell
    x = np.linspace(-1,1,M)
    y = np.linspace(-1,1,N)
    centerx = int(M/2 - center_of_x)*0.01
    centery = int(N/2 - center_of_y)*0.01
    [X,Y] = np.meshgrid(x,y)
    R = np.sqrt((X+centerx)**2 + (Y+centery)**2) #distance of every cell from the (offset) center
    R_list = list(itertools.chain(*R))
    values = zip(R_list,to_reorder)
    sortedvalues = sorted(values) #cells ordered from nearest to farthest
    unzip = list(zip(*sortedvalues))
    unzip2 = unzip[1]
    l_reorder = zip(unzip2,l_sorted) #nearest cell gets the largest value
    l_reorder = sorted(l_reorder) #back to original cell order
    l_unzip = list(zip(*l_reorder))
    l_unzip2 = l_unzip[1]
    sorted_list = np.reshape(l_unzip2,(N,M))
    return(sorted_list)
This code basically takes your data and reorders it into a sorted list, then zips it together with a list based on a circular distribution. Using the zip and sort commands you can then create the distribution of data you wish to have based on your distribution function; in my case it's a circle that can be offset.
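The same idea can be sketched more compactly with `np.argsort`, which avoids the Python-level zip/sort round trips. This is an alternative formulation rather than the code above: the function name `redistribute` is made up here, and the distance is measured in pixel units from a chosen row/column instead of the normalized -1..1 coordinates used above.

```python
import numpy as np

def redistribute(data, center_row=None, center_col=None):
    """Rearrange values so the largest sit closest to the chosen center."""
    n_rows, n_cols = data.shape
    if center_row is None:
        center_row = n_rows // 2
    if center_col is None:
        center_col = n_cols // 2
    rr, cc = np.indices(data.shape)
    dist = np.hypot(rr - center_row, cc - center_col)
    # Flat indices of the cells, ordered from nearest to farthest
    order = np.argsort(dist, axis=None, kind="stable")
    out = np.empty(data.size, dtype=data.dtype)
    # Nearest cell receives the largest value, and so on outwards
    out[order] = np.sort(data, axis=None)[::-1]
    return out.reshape(data.shape)

rng = np.random.default_rng(0)
data = rng.random((100, 120))
result = redistribute(data)
```

Any other fall-off shape can be obtained by replacing `dist` with a different ranking function of the cell position.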

Comparing two lists of stellar x-y coordinates to find matching objects

I have two .txt files that contain the x and y pixel coordinates of thousands of stars in an image. These two different coordinate lists were the products of different data processing methods, which result in slightly different x and y values for the same object.
File 1 *the id_out is arbitrary
id_out x_out y_out m_out
0 803.6550 907.0910 -8.301
1 700.4570 246.7670 -8.333
2 802.2900 894.2130 -8.344
3 894.6710 780.0040 -8.387
File 2
xcen ycen mag merr
31.662 37.089 22.759 0.387
355.899 37.465 19.969 0.550
103.079 37.000 20.839 0.847
113.500 38.628 20.966 0.796
The objects listed in the .txt files are not organized in a way that allows me to identify the same object in both files. So, I thought for every object in file 1, which has fewer objects than file 2, I would impose a test to find the star match between file 1 and file 2. For every star in file 1, I want to find the star in file 2 that has the closest match to x-y coordinates using the distance formula:
distance= sqrt((x1-x2)^2 + (y1-y2)^2) within some tolerance of distance that I can change. Then print onto a master list the x1, y1, x2, y2, m_out, mag, and merr parameters in the file.
Here is the code I have so far, but I am not sure how to arrive at a working solution.
#!/usr/bin/python
import pandas
import numpy as np
xcen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=1)
ycen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=2)
mag1 = np.genfromtxt('file1.txt', dtype=float, usecols=3)
xcen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=0)
ycen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=1)
mag2 = np.genfromtxt('file2.txt', dtype=float, usecols=2)
merr2 = np.genfromtxt('file2.txt', dtype=float, usecols=3)
tolerance=10.0
i=0
file=open('results.txt', 'w+')
file.write("column names")
for i in len(xcen_1):
dist=np.sqrt((xcen_1[i]-xcen_2[*])^2+(ycen_1[i]-ycen_2[*]^2))
if dist < tolerance:
f.write(i, xcen_1, ycen_1, xcen_2, ycen_2, mag1, mag2, merr2)
else:
pass
i=i+1
file.close
The code doesn't work as I don't know how to implement that every star in file 2 must be run through the test, as indicated by the * index (which comes from idl, in which I am more versed). Is there a solution for this logic, as opposed to the thinking in this case:
To compare two independent image coordinate lists with same scale but coordinate-grid having some rotation and shift
Thanks in advance!

You can use pandas DataFrames. Here's how:
import pandas as pd
# files containing the x and y pixel coordinates and other information
df_file1=pd.read_csv('file1.txt',sep=r'\s+')
df_file2=pd.read_csv('file2.txt',sep=r'\s+')
join=[]
# compare every star in file 1 with every star in file 2
for i in range(len(df_file1)):
    for j in range(len(df_file2)):
        dis=((df_file1['x_out'][i]-df_file2['xcen'][j])**2+(df_file1['y_out'][i]-df_file2['ycen'][j])**2)**0.5
        # keep the pair if it lies within the tolerance (10 pixels here)
        if dis<10:
            join.append({'id_out': df_file1['id_out'][i], 'x_out': df_file1['x_out'][i], 'y_out':df_file1['y_out'][i],
                         'm_out':df_file1['m_out'][i],'xcen':df_file2['xcen'][j],'ycen':df_file2['ycen'][j],
                         'mag':df_file2['mag'][j],'merr':df_file2['merr'][j]})
df_join=pd.DataFrame(join)
df_join.to_csv('results.txt', sep='\t')
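With thousands of stars in each file, the double loop performs millions of distance computations in Python. A possible alternative is a k-d tree lookup with `scipy.spatial.cKDTree`, sketched below under some assumptions: the helper name `match_stars` is hypothetical, the coordinates in `xy2` are invented so that two matches exist, and only the single nearest neighbour within the tolerance is kept per star (the loop above keeps every pair within the tolerance).

```python
import numpy as np
from scipy.spatial import cKDTree

def match_stars(xy1, xy2, tolerance):
    """For each point in xy1, find the nearest point in xy2 within tolerance.

    Returns (idx1, idx2, dist) for the matched pairs only.
    """
    tree = cKDTree(xy2)
    dist, idx2 = tree.query(xy1, k=1, distance_upper_bound=tolerance)
    # Points without a neighbour inside the tolerance come back with dist = inf
    matched = np.isfinite(dist)
    return np.nonzero(matched)[0], idx2[matched], dist[matched]

# First two stars of file 1, plus invented file-2 positions for illustration
xy1 = np.array([[803.655, 907.091], [700.457, 246.767]])
xy2 = np.array([[31.662, 37.089], [701.0, 247.0], [803.0, 907.5]])

idx1, idx2, dist = match_stars(xy1, xy2, tolerance=10.0)
```

The index arrays can then be used to pull the matching rows out of the two DataFrames, for example with `df_file1.iloc[idx1]` and `df_file2.iloc[idx2]`, before writing the combined table.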

Select data by latitude and longitude

I am using a dataset from DWD (Deutscher Wetterdienst) and want to select data by latitude and longitude. The import works without problems, and selecting data with sel works when I use x and y,
but not with latitude and longitude. I tried all the answers I could find, like:
ds.sel(latitude=50, longitude=14, method='nearest')
but I am getting the error
ValueError: dimensions or multi-index levels ['latitude', 'longitude'] do not exist
That's my code:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr


ds = xr.open_dataset(
    'cosmo-d2_germany_rotated-lat-lon_single-level_2019061721_012_ASWDIFD_S.grib2',
    engine='cfgrib',
    backend_kwargs={'filter_by_keys': {'stepUnits': 1}}
)

print(ds)
Output:
<xarray.Dataset>
Dimensions:     (x: 651, y: 716)
Coordinates:
    time        datetime64[ns] ...
    step        timedelta64[ns] ...
    surface     int32 ...
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    ASWDIFD_S   (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             edzw
    GRIB_centreDescription:  Offenbach
    GRIB_subCentre:          255
    Conventions:             CF-1.7
    institution:             Offenbach
    history:                 2019-07-22T13:35:33 GRIB to CDM+CF via cfgrib-

In your file latitude and longitude are not dimensions but rather helper 2D variables containing coordinate data. In xarray parlance they are called non-dimension coordinates and you cannot slice on them. See also Working with Multidimensional Coordinates.

It would be better to regrid the data to a regular grid inside Python so that you have latitudes and longitudes as 1D vectors; you would have to make a grid and then interpolate the data onto that grid.
Also check https://www.ecmwf.int/sites/default/files/elibrary/2018/18727-cfgrib-easy-and-efficient-grib-file-access-xarray.pdf to see how to access GRIB files in xarray. If you don't want to use xarray for this purpose, pygrib is another option.

I can't test the solution as I don't have the cfgrib engine installed, but could you try to use something like
np.abs(lonarray - lonvalue).argmin()
to find the lon and lat indexes near your point? NumPy itself has no find_nearest function; the linked answer wraps this expression in a small helper of that name:
Find nearest value in numpy array
And then select the point using the index directly on the x,y coordinates?
http://xarray.pydata.org/en/stable/indexing.html

I wrote a function for the files from the DWD:
import pygrib # https://jswhit.github.io/pygrib/docs/
import numpy as np
def get_grib_data_nearest_point(grib_file, inp_lat, inp_lon):
    """
    Gets the value corresponding to a latitude-longitude pair of coordinates in
    a grib file.
    :param grib_file: path to the grib file on disk
    :param inp_lat: latitude
    :param inp_lon: longitude
    :return: scalar
    """
    # open the grib file, get the coordinates and values
    grbs = pygrib.open(grib_file)
    grb = grbs[1]
    lats, lons = grb.latlons()
    values = grb.values
    grbs.close()
    # check if user coords are valid
    if inp_lat > max(grb.distinctLatitudes): return np.nan
    if inp_lat < min(grb.distinctLatitudes): return np.nan
    if inp_lon > max(grb.distinctLongitudes): return np.nan
    if inp_lon < min(grb.distinctLongitudes): return np.nan
    # find index of the closest lat along the first column (x);
    # argmin avoids stopping one index past the minimum, as a break-on-increase loop would
    x = int(np.abs(lats[:, 0] - inp_lat).argmin())
    # find index of the closest lon along that row (y)
    y = int(np.abs(lons[x] - inp_lon).argmin())
    # index the array to return the corresponding value
    return values[x][y]

As noted above, you can re-grid your data (probably given on a curvilinear grid, i.e. with lat and lon as 2D arrays) to 1D lat/lon arrays at your desired resolution, after which you can use .sel directly on the lat/lon coords to slice the data.
Check out xESMF (https://xesmf.readthedocs.io/en/latest/notebooks/Curvilinear_grid.html).
It offers easy, fast interpolation and regridding of xarray fields, with good examples and documentation.

indices of 2D lat lon data

I am trying to find the equivalent (if there exists one) of an NCL function that returns the indices of two-dimensional latitude/longitude arrays closest to a user-specified latitude/longitude coordinate pair.
This is the link to the NCL function that I am hoping there is an equivalent to in python. I'm suspecting at this point that there is not, so any tips on how to get indices from lat/lon coordinates is appreciated
https://www.ncl.ucar.edu/Document/Functions/Contributed/getind_latlon2d.shtml
Right now, I have my coordinate values saved in a .nc file, which I read with:
from netCDF4 import Dataset
coords='coords.nc'
fh = Dataset(coords, mode='r')
lons = fh.variables['g5_lon_1'][:,:]
lats = fh.variables['g5_lat_0'][:,:]
rot = fh.variables['g5_rot_2'][:,:]
fh.close()

I found that scipy's spatial.KDTree can perform a similar task. Here is my code for finding the model grid point that is closest to the observation location:
import numpy as np
from scipy import spatial
from netCDF4 import Dataset
# read in the one dimensional lat lon info from a dataset
fname = '0k_T_ann_clim.nc'
fid = Dataset(fname, 'r')
lat = fid.variables['lat'][:]
lon = fid.variables['lon'][:]
# make them a meshgrid for later use KDTree
lon2d, lat2d = np.meshgrid(lon, lat)
# zip them together as (lon, lat) pairs
model_grid = list( zip(np.ravel(lon2d), np.ravel(lat2d)) )
# target point location: 30.5N, 56.1E, given as (lon, lat) to match model_grid
target_pts = [56.1, 30.5]
distance, index = spatial.KDTree(model_grid).query(target_pts)
# the nearest model location (lon, lat)
model_loc_coord = model_grid[index]

I'm not sure how lon/lat arrays are stored when read in python, so to use the following solution you may need to convert lon/lat to numpy arrays. You can just put the abs(array-target).argmin() in a function.
import numpy as np
# make a dummy longitude array, 0.5 degree resolution.
lon=np.linspace(0.5,360,720)
# find index of nearest longitude to 25.4
ind=abs(lon-25.4).argmin()
# check it works! this gives 25.5
lon[ind]
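For two-dimensional latitude/longitude arrays, the same argmin idea can be combined with `np.unravel_index` to get a row/column pair, roughly what `getind_latlon2d` returns. This is a minimal sketch, not a port of the NCL function: the name `nearest_index_2d` is made up, the distance is a cos(latitude)-scaled difference in degrees rather than a true great-circle distance, and longitude wrap-around at the dateline is ignored.

```python
import numpy as np

def nearest_index_2d(lat2d, lon2d, lat0, lon0):
    """Return (j, i) of the grid point closest to (lat0, lon0).

    Approximate: scales the longitude difference by cos(lat0) and
    compares squared differences in degrees.
    """
    dlat = lat2d - lat0
    dlon = (lon2d - lon0) * np.cos(np.deg2rad(lat0))
    flat = np.argmin(dlat**2 + dlon**2)
    return np.unravel_index(flat, lat2d.shape)

# Synthetic 2D coordinate arrays for illustration
lon2d, lat2d = np.meshgrid(np.arange(50.0, 60.0, 0.5),
                           np.arange(25.0, 35.0, 0.5))

j, i = nearest_index_2d(lat2d, lon2d, 30.5, 56.1)
```

The returned pair indexes any variable on the same grid, e.g. `rot[j, i]`, and with xarray it can be passed to `isel` on the two grid dimensions.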
