latitudes and longitudes are two dimensional arrays in netcdf file? - python

I'm trying to open, read, and plot cloud cover data from a netCDF file. The file opens and plots fine in the Panoply viewer, so the data looks to be OK. But I can't figure out for the life of me how to convert the latitudes and longitudes into a 1-D array each; they seem to be 2-D, which makes NaN sense to me...
Reading the file and variables works fine:
from netCDF4 import Dataset
import numpy as np

fh = Dataset("/home/ubuntu/HIMA8_CC/Himawari8_AHI_FLDK_2020171_0140_00_CLOUD_MASK_EN.nc", mode='r')
lon = fh.variables['Longitude'][:]
lat = fh.variables['Latitude'][:]
cloud_mask = fh.variables['CloudMask'][:]
However, the latitude and longitude variables are 2-D; I would have expected them to be 1-D:
print(lon.shape)
print(lat.shape)
print(np.mean(lon))
print(cloud_mask.shape)
print(np.mean(lon))
print(np.mean(cloud_mask))
which prints:
(5500, 5500)
(5500, 5500)
91.97970824333167
(5500, 5500)
91.97970824333167
1.8154066433116118
The mean values look as expected. Can anyone with netCDF experience shed some light on what I'm missing here?

You could try this.
import xarray as xr
fh = xr.open_dataset('/home/ubuntu/HIMA8_CC/Himawari8_AHI_FLDK_2020171_0140_00_CLOUD_MASK_EN.nc')
lon = fh['Longitude'].values.flatten()
lat = fh['Latitude'].values.flatten()
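If the grid is actually regular, the 2-D arrays are just a 1-D vector repeated along each axis, and you can check for that and collapse them. Below is a minimal sketch (not from the original answer), continuing from the fh dataset opened above and assuming numpy is available:
import numpy as np

lon2d = fh['Longitude'].values
lat2d = fh['Latitude'].values

# True only if every row of lon (and every column of lat) is identical,
# i.e. the coordinates describe a regular grid.
regular = (np.allclose(lon2d, lon2d[0, :], equal_nan=True) and
           np.allclose(lat2d, lat2d[:, [0]], equal_nan=True))

if regular:
    lon1d = lon2d[0, :]   # longitude varies along columns
    lat1d = lat2d[:, 0]   # latitude varies along rows
else:
    # Genuinely curvilinear coordinates (e.g. geostationary full-disk data):
    # keep the 2-D arrays and pass them directly to plotting functions
    # such as matplotlib's pcolormesh.
    pass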

Related

Converting Sentinel 3 LST image with netcdf format to tiff with proper coordinates python

I have a Sentinel 3 image which is stored in a number of netCDF files. The LST variable is stored in the file "LST_in.nc", with dimensions equal to the rows and columns of the image. The lat and lon are in another file, "geodetic_in.nc". I want to export the image with the lat and lon to TIFF format.
To my understanding, the names of the dimensions and coordinates should be the same, but I have failed to achieve this.
Here are my attempts:
import rioxarray as rio
import xarray as xr
xds = xr.open_dataset('LST_in.nc')
coord = xr.open_dataset('geodetic_in.nc')
lat, lon = coord.latitude_in.data, coord.longitude_in.data
xds = xds.assign_coords({"lat":(["rows","columns"], lat), "lon":(["rows","columns"], lon)})
xds = xds.rename_dims({"rows": "lon", "columns": 'lat'})
Here I received this error
ValueError: Cannot rename rows to lon because lon already exists. Try using swap_dims instead.
Then I tried this
xds = xds.swap_dims({'rows' : 'lon', 'columns' : 'lat'})
but received another error
ValueError: replacement dimension 'lon' is not a 1D variable along the old dimension 'rows'
Also this one
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'lon', y_dim = 'lat', inplace = True)
Error: MissingSpatialDimensionError: x dimension (lon) not found. Data variable: LST
The only one that works but with the wrong coordinates is
lst = xds.LST
lst.rio.set_spatial_dims(x_dim = 'columns', y_dim = 'rows', inplace = True)
lst.rio.write_crs("epsg:4326", inplace = True)
lst.rio.to_raster("lst.tif")
I would appreciate your help. Attached are the image files:
https://wetransfer.com/downloads/e3711adf56f73cd07119b43d19f7360820220117154330/c46b21
The short answer is: you can't, because both netCDF and GRIB are gridded data formats, and your data points' positions can't be described using a regular latitude/longitude grid.
I plotted a sample of your data points' latitudes and longitudes. As you can see, the data points are not placed on lines of constant latitude and longitude, and they do not follow any pattern that could be described with a projected grid, a rotated grid, or a curvilinear grid either.
If you want to make a gridded file with the LST values and latitude and longitude as coordinates, you will have to reproject your data. You can use the rasterio.warp module; see the rasterio documentation for more information.
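As a rough illustration of one way to do the regridding (a sketch, not the rasterio.warp route mentioned above): interpolate the scattered LST points onto a regular lat/lon grid with scipy and write the result with rioxarray. The variable names LST, latitude_in and longitude_in follow the question; the target resolution is chosen arbitrarily.
import numpy as np
import xarray as xr
import rioxarray  # registers the .rio accessor on xarray objects
from scipy.interpolate import griddata

lst = xr.open_dataset("LST_in.nc")["LST"].values
geo = xr.open_dataset("geodetic_in.nc")
lat, lon = geo["latitude_in"].values, geo["longitude_in"].values

# Regular target grid covering the swath (north-up, so latitude descends)
grid_lon = np.linspace(np.nanmin(lon), np.nanmax(lon), 1000)
grid_lat = np.linspace(np.nanmax(lat), np.nanmin(lat), 1000)
glon, glat = np.meshgrid(grid_lon, grid_lat)

# Nearest-neighbour interpolation of the scattered points onto the grid
gridded = griddata((lon.ravel(), lat.ravel()), lst.ravel(), (glon, glat), method="nearest")

out = xr.DataArray(gridded, coords={"y": grid_lat, "x": grid_lon}, dims=("y", "x"), name="LST")
out.rio.write_crs("EPSG:4326", inplace=True)
out.rio.to_raster("lst_regridded.tif")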

WCS.all_pix2world: Read in latitude and longitude of entire FITS field

I have a FITS file in pixel coordinates. I want to basically read in its latitude and longitude.
I know we can use something like the following:
from astropy.io import fits
from astropy.wcs import WCS

w = WCS(fits.getheader('sample.fits'))
lat, long = w.all_pix2world(90, 38, 1)
But instead of one RA (=90), Dec (=38) value from w.all_pix2world, I want to fetch the latitudes and longitudes (in the Galactic coordinate system) of the entire field of sample.fits.
The header contains:
NAXIS = 2
NAXIS1 = 150
NAXIS2 = 150
Please let me know if any other information is needed. Any help is appreciated.
Thanks.
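The thread does not include an answer, but one possible approach is to evaluate the WCS for every pixel and convert the result to Galactic coordinates with astropy. The following is a sketch under the assumption that the file's WCS is equatorial (RA/Dec); it is not from the original post.
import numpy as np
import astropy.units as u
from astropy.io import fits
from astropy.wcs import WCS
from astropy.coordinates import SkyCoord

header = fits.getheader('sample.fits')
w = WCS(header)

# Pixel index grids covering the full NAXIS1 x NAXIS2 field
ny, nx = header['NAXIS2'], header['NAXIS1']
xpix, ypix = np.meshgrid(np.arange(nx), np.arange(ny))

# World coordinates for every pixel (origin=0 for 0-based indices)
ra, dec = w.all_pix2world(xpix, ypix, 0)

# Convert to Galactic longitude and latitude
gal = SkyCoord(ra=ra * u.deg, dec=dec * u.deg, frame='icrs').galactic
glon, glat = gal.l.deg, gal.b.deg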

netCDF grid file: Extracting information from 1D array using 2D values

I am trying to work in Python 3 with topography/bathymetry-information (basically a grid containing x [longitude in decimal degrees], y [latitude in decimal degrees] and z [meter]).
The grid file has the extension .nc and is therefore a netCDF file. Normally I would use it in mapping tools like Generic Mapping Tools and wouldn't have to bother with how a netCDF file works, but I need to extract specific information in a Python script. Right now that just means limiting the dataset to certain longitude/latitude ranges.
However, at the moment I am a bit lost on how to get to the z-information for specific x and y values. Here's what I know about the data so far:
import netCDF4
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Getting information about the file
#----------------------
print(fh.file_format)
NETCDF3_CLASSIC
print(fh)
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
title: GEBCO_08 Grid
source: 20100927
dimensions(sizes): side(2), xysize(933120000)
variables(dimensions): float64 x_range(side), float64 y_range(side), int16 z_range(side), float64 spacing(side), int32 dimension(side), int16 z(xysize)
groups:
print(fh.dimensions.keys())
odict_keys(['side', 'xysize'])
print(fh.dimensions['side'])
: name = 'side', size = 2
print(fh.dimensions['xysize'])
: name = 'xysize', size = 933120000
#----------------------
# Variables
#----------------------
print(fh.variables.keys()) # returns all available variable keys
odict_keys(['x_range', 'y_range', 'z_range', 'spacing', 'dimension', 'z'])
xrange = fh.variables['x_range'][:]
print(xrange)
[-180. 180.] # contains the values -180 to 180 for the longitude of the whole world
yrange = fh.variables['y_range'][:]
print(yrange)
[-90. 90.] # contains the values -90 to 90 for the latitude of the whole world
zrange = fh.variables['z_range'][:]
[-10977 8685] # contains the depths/topography range for the world
spacing = fh.variables['spacing'][:]
[ 0.00833333 0.00833333] # spacing in both x and y. Equals the dimension, if multiplied with x and y range
dimension = fh.variables['dimension'][:]
[43200 21600] # corresponding to the shape of z if it were the 2D array I would have hoped for (it's currently a 1D array of length 933120000, which is 43200*21600)
z = fh.variables['z'][:] # currently a 1D array of the depth/topography/z information I want
fh.close()
Based on this information I still don't know how to access z for specific x/y (longitude/latitude) values. I think I basically need to convert the 1D array of z into a 2D array corresponding to the longitude/latitude values. I just don't have a clue how to do that. I saw some posts where people tried to convert a 1D into a 2D array, but I have no way of knowing in which corner of the world they start and how they progress.
I know there is a similar post from 3 years ago; however, I don't know how to find an analogous "index of the flattened array" for my problem, or how exactly to work with that. Can somebody help?
You need to first read in all three of z's dimensions (lat, lon, depth) and then extract values across each of those dimensions. Here are a few examples.
# Read in all 3 dimensions [lat x lon x depth]
z = fh.variables['z'][:,:,:]
# Topography at a single lat/lon/depth (1 value):
z_1 = z[5,5,5]
# Topography at all depths for a single lat/lon (1D array):
z_2 = z[5,5,:]
# Topography at all latitudes and longitudes for a single depth (2D array):
z_3 = z[:,:,5]
Note that the number you enter for lat/lon/depth is the index in that dimension, not an actual latitude, for instance. You'll need to determine the indices of the values you are looking for beforehand.
I just found the solution in this post. Sorry that I didn't see it before. Here's what my code looks like now. Thanks to Dave (he answered his own question in the post above). The only thing I had to work on was that the dimensions have to stay integers.
import netCDF4
import numpy as np
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Extract variables
#----------------------
xrange = fh.variables['x_range'][:]
yrange = fh.variables['y_range'][:]
spacing = fh.variables['spacing'][:]  # needed below to compute the grid dimensions
zz = fh.variables['z'][:]
fh.close()
#----------------------
# Compute Lat/Lon
#----------------------
nx = (xrange[-1]-xrange[0])/spacing[0] # num pts in x-dir
ny = (yrange[-1]-yrange[0])/spacing[1] # num pts in y-dir
nx = int(round(nx))
ny = int(round(ny))
lon = np.linspace(xrange[0],xrange[-1],nx)
lat = np.linspace(yrange[0],yrange[-1],ny)
#----------------------
# Reshape the 1D to an 2D array
#----------------------
bathy = zz[:].reshape(ny, nx)
So, now when I look at the shapes of both zz and bathy (following code), the former is a 1D array with a length of 933120000, and the latter a 2D array with shape (21600, 43200).
print(zz.shape)
print(bathy.shape)
The next step is to use indices to access the bathymetry/topography data correctly, just as N1B4 described in his post.
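For example, a nearest-index lookup against the lon/lat vectors built above might look like this (a sketch, not from the original post; the target coordinates are hypothetical, and note that the row order of bathy depends on whether the GEBCO z values start in the north or the south, so the array may need to be flipped):
# Hypothetical target coordinates, for illustration only
target_lon, target_lat = 5.0, 52.0

ix = int(np.abs(lon - target_lon).argmin())   # column index (longitude)
iy = int(np.abs(lat - target_lat).argmin())   # row index (latitude)
depth = bathy[iy, ix]
print(depth)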

indices of 2D lat lon data

I am trying to find the equivalent (if one exists) of an NCL function that returns the indices of two-dimensional latitude/longitude arrays closest to a user-specified latitude/longitude coordinate pair.
This is the link to the NCL function that I am hoping has an equivalent in Python. I suspect at this point that there is not one, so any tips on how to get indices from lat/lon coordinates are appreciated:
https://www.ncl.ucar.edu/Document/Functions/Contributed/getind_latlon2d.shtml
Right now, I have my coordinate values saved in a .nc file, which is read by:
from netCDF4 import Dataset

coords = 'coords.nc'
fh = Dataset(coords, mode='r')
lons = fh.variables['g5_lon_1'][:,:]
lats = fh.variables['g5_lat_0'][:,:]
rot = fh.variables['g5_rot_2'][:,:]
fh.close()
I found that scipy.spatial.KDTree can perform a similar task. Here is my code for finding the model grid point that is closest to the observation location:
import numpy as np
from scipy import spatial
from netCDF4 import Dataset
# read in the one dimensional lat lon info from a dataset
fname = '0k_T_ann_clim.nc'
fid = Dataset(fname, 'r')
lat = fid.variables['lat'][:]
lon = fid.variables['lon'][:]
# make them a meshgrid for later use KDTree
lon2d, lat2d = np.meshgrid(lon, lat)
# zip them together
model_grid = list( zip(np.ravel(lon2d), np.ravel(lat2d)) )
# target point location: 30.5N, 56.1E, given in (lon, lat) order to match model_grid
target_pts = [56.1, 30.5]
distance, index = spatial.KDTree(model_grid).query(target_pts)
# the nearest model location (in lat and lon)
model_loc_coord = [coord for i, coord in enumerate(model_grid) if i==index]
I'm not sure how the lon/lat arrays are stored when read in Python, so to use the following solution you may need to convert them to numpy arrays first. You can just put the abs(array - target).argmin() in a function.
import numpy as np
# make a dummy longitude array, 0.5 degree resolution.
lon=np.linspace(0.5,360,720)
# find index of nearest longitude to 25.4
ind=abs(lon-25.4).argmin()
# check it works! this gives 25.5
lon[ind]
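For the 2D lat/lon arrays in the question, the same idea extends to a small function that mimics NCL's getind_latlon2d by minimising a squared-difference distance over both arrays. This is a sketch, not from the original answers, and it works in plain degrees (no cos(lat) weighting or longitude wrap-around):
import numpy as np

def getind_latlon2d(lat2d, lon2d, target_lat, target_lon):
    # Index pair (j, i) of the grid point closest to the target point,
    # found by a simple squared-difference search over the 2D arrays.
    dist2 = (lat2d - target_lat)**2 + (lon2d - target_lon)**2
    return np.unravel_index(np.argmin(dist2), lat2d.shape)

# e.g. with the 2D lats/lons read from coords.nc in the question:
j, i = getind_latlon2d(lats, lons, 30.5, 56.1)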

Reading netCDF data

I am trying to read data from a .nc file, which has the following variables:
['latitude',
'longitude',
'latitude_bnds',
'longitude_bnds',
'time',
'minimum',
'maximum',
'average',
'stddev',
'AirTemperature']
What I am trying to achieve is to extract the AirTemperature data for any given (time, latitude and longitude):
And for that, I am doing something like this:
from netCDF4 import Dataset
import numpy as np

df = Dataset('data_file.nc', 'r')
lat = df.variables['latitude'][:]
lon = df.variables['longitude'][:]
temp = df.variables['AirTemperature'][:,:,:]
#(lat, lon) for Coffee, TN
test_lat = 35.45
test_lon = -86.05
#getting the indices for the (lat, lon) using numpy.where
lat_idx = np.where(lat==test_lat)[0][0]
lon_idx = np.where(lon==test_lon)[0][0]
#extracting data for all the times for given indices
tmp_crd = temp[:,lat_idx,lon_idx]
Up to this point, it all goes fine. However, when I print the data, I see identical values being printed, for any lat/lon that I have been testing:
print(tmp_crd.data)
>>> [-9999. -9999. -9999. ..., -9999. -9999. -9999.]
Which I don't understand: why is the air temperature always shown as -9999.0? I have tested a lot of other (lat, lon) points, and for every location the air temperature is -9999.0. How can I extract the real data from this file?
Please help :-(.
Thank You
Okay, I think I figured it out. Here is what was happening:
The .nc file I have stores latitude and longitude at a different (higher) precision, and I was apparently passing much more rounded (lat, lon) values. Once I figured out the right precision, it works fine for me. The -9999.0 value was simply the fill_value of numpy's masked array, indicating that when there is no record matching the given lat and lon, the masked fill value is returned.
Thanks, everyone.
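A more tolerant lookup (a sketch, not from the original answer) sidesteps the precision issue by taking the nearest grid point instead of requiring an exact match:
# Nearest grid point instead of an exact equality test, so small precision
# differences in the stored coordinates no longer matter.
lat_idx = int(np.abs(lat - test_lat).argmin())
lon_idx = int(np.abs(lon - test_lon).argmin())
tmp_crd = temp[:, lat_idx, lon_idx]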
