I'm trying to change the dimension of the values in a netCDF file.
First I read the netCDF file and interpolated the data:
import numpy as np
import netCDF4
from scipy.interpolate import interp1d

def interpolation(a, b, c):
    f = interp1d(a, b, kind='linear')
    return f(c)

file = 'directory/test.nc'
data = netCDF4.Dataset(file)
lon = data.variables['lon']       # size = 10
lat = data.variables['lat']       # size = 10
lev = data.variables['lev']       # size = 100
values = data.variables['values'] # size = (100,10,10)
new_lev = np.linspace(0, 1, 200)  # new vertical grid, size = 200
new_values = np.empty((len(new_lev), len(lat), len(lon))) # size = (200,10,10)
### interpolation ###
for loop_lat in range(len(lat)):
    for loop_lon in range(len(lon)):
        new_values[:, loop_lat, loop_lon] = interpolation(lev[:], values[:, loop_lat, loop_lon], new_lev)
## How can I save these new_lev and new_values in the netCDF file?
Using the interpolation, I converted the values from dimension A to dimension B. Let's say the original dimension A is 100 and the interpolated dimension B is 200. After changing the dimension, how can I save these values and the new dimension into the netCDF file? Could you please give me some advice?
You can open a NetCDF file for editing in place. See:
https://unidata.github.io/netcdf4-python/#creatingopeningclosing-a-netcdf-file
Rather than:
data = netCDF4.Dataset(file)
Try:
data = netCDF4.Dataset(file, 'r+')
(The clobber argument only matters when creating a file with mode 'w', so it is not needed here.)
I want to replace all the negative values of a raster (TIFF) file with no-data values (NaN), and save the result to a new file (also TIFF). I don't want to convert it into a numpy array first; I want to replace the pixels directly on the raster itself, using rasterio for instance.
I tried the following:
import rasterio

# Open the file with rasterio
raster_file = rasterio.open(r"path_to_file.tif")
# Read band 1 as a numpy array
raster = raster_file.read(1)
# Assign 999 to all negative values
raster[raster <= 0] = 999
# Create a boolean mask
mask_boolean = (raster != 999)
# Write the mask back to the dataset:
raster_file.write_mask(mask_boolean)
raster_file.close()
You could use xarray:
import xarray
# Open the file with xarray (the "rasterio" engine requires rioxarray to be installed)
raster_file = xarray.open_dataarray(path, engine="rasterio")
print(raster_file.min().data)
# > -1
# Assign NaN to all values that are not strictly positive
raster_file = raster_file.where(raster_file > 0)
print(raster_file.min().data)
# > 0
You will end up with an xarray representation of your raster with NaN wherever the values were negative. However, you cannot change pixel values in place; you will have to save the raster back to disk.
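The where step itself can be checked on a small in-memory array, no raster file needed; values failing the condition become NaN. (Writing a GeoTIFF back out would need something beyond plain xarray, e.g. the rioxarray extension's to_raster.)

```python
import numpy as np
import xarray as xr

# a tiny 2x2 stand-in for the raster band
da = xr.DataArray(np.array([[-1.0, 2.0], [3.0, -4.0]]))

masked = da.where(da > 0)  # non-positive values become NaN
print(masked.values)
```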
I have several netCDF files, each containing an array of shape (365, 585, 1386), and I'm trying to read in each array and stitch them together along axis=0, i.e. appending all of the days of the year (365). The other two dimensions are latitude and longitude, so ideally I end up with several years of data for each lat/lon point (each netCDF file is one calendar year of data).
import glob
from netCDF4 import Dataset
import numpy as np

files = sorted(glob.glob('erc*'))
for x, f in enumerate(files):
    nc = Dataset(f, mode='r')  # open the current file, not the same path every time
    print(f)
    if x == 0:
        a = nc.variables['energy_release_component-g'][:]
    else:
        b = nc.variables['energy_release_component-g'][:]
        a = np.concatenate((a, b), axis=0)  # hstack joins along axis 1, and its result must be assigned
    nc.close()
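The stitching logic can be sanity-checked with small synthetic arrays standing in for the yearly files (shapes shrunk from (365, 585, 1386)):

```python
import numpy as np

# two synthetic "years" of data: (days, lat, lon)
year_a = np.ones((365, 2, 3))
year_b = np.full((365, 2, 3), 2.0)

stacked = np.concatenate([year_a, year_b], axis=0)  # append along the day axis
print(stacked.shape)  # (730, 2, 3)
```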
I have a script where I create masks on a gridded array for certain distances from the coastline and save them to a netcdf file. I have a grid file with latitude, longitude and a land mask (netcdf files). My script (simplified here) does the following:
from netCDF4 import Dataset
import numpy as np

grd = Dataset('gridfile', 'r')
mask_rho = grd.variables['mask_rho'][:]
lat = grd.variables['lat'][:]
lon = grd.variables['lon'][:]

full_region = np.copy(mask_rho)  # copy land mask from grid file
full_region[lat < 5] = 0.        # mask latitudes below the region of interest
full_region[lat > 35] = 0.       # mask latitudes above it
mask_full = np.ma.masked_where(full_region == 0., full_region)  # numpy mask

filename = directory + 'mask.nc'
ncfile = Dataset(filename, 'w', format='NETCDF4')
ncfile.createDimension('lat', len(lat))
ncfile.createDimension('lon', len(lon))
Dims = ('lat', 'lon')
data = ncfile.createVariable('mask_full', 'f4', Dims)
data[:] = mask_full
ncfile.close()
However, when I open the netCDF file I created in another script, the data seems to come in three forms:
'masked data' (masked-out values are blank, values within the mask are 1),
'data' (values within the mask are 1, and masked-out values are some huge number, 996920996838.....),
and 'mask', which is a True/False boolean array.
Why is the 'data' part giving me such a huge number? And why are there three data types?
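For what it's worth, that huge number is netCDF's default fill value for floats (about 9.96921e+36), and the three forms are the three faces of a numpy MaskedArray; a small illustration with a made-up array:

```python
import numpy as np

FILL = 9.96921e+36  # netCDF's default float fill value
raw = np.array([FILL, 1.0, 1.0], dtype=np.float32)

m = np.ma.masked_where(raw > 1e36, raw)  # mask out the fill value

print(m)       # masked view: the fill value shows as --
print(m.data)  # raw data: the huge number is still there
print(m.mask)  # boolean mask: True where masked out
```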
I create a netcdf file with some data, and when I import the data in another script, it is masked :
>>> type(Data[:])
<class 'numpy.ma.core.MaskedArray'>
Here is how I create the data:
# Put the data in a grid
print('Putting the data in a grid...')
LatRange = range(int(min(Lat)), int(max(Lat)), 1)
LonRange = np.arange(int(min(Lon)), int(max(Lon)), 1)
dRange = list(range(0, 200, 10)) + list(range(200, 4000, 100))
dateRange = np.arange(float(min(Dates).year) + min(Dates).month/12.,
                      float(max(Dates).year) + max(Dates).month/12., 1./12.)

dataset = Dataset('gridded_data/DataAveraged.nc', 'w', format='NETCDF4_CLASSIC')
zD = dataset.createDimension('zD', len(dRange))
latD = dataset.createDimension('latD', len(LatRange))
lonD = dataset.createDimension('lonD', len(LonRange))
timeD = dataset.createDimension('timeD', len(dateRange))

tempAve = dataset.createVariable('tempAve', np.float32, ('zD', 'latD', 'lonD', 'timeD'), fill_value=-9999)
tempAve.units = 'psu'
tempAve[:] = Tgrid_ave
Where Tgrid_ave is a numpy array.
Then, I import the data this way in another script:
dataset = Dataset('gridded_data/DataAveraged.nc', 'r')
LatRange = dataset.variables['lat'][:]
LonRange = dataset.variables['lon'][:-1]
Tgrid_ave = dataset.variables['tempAve']
And my Lat and Lon data are not masked, but my Tgrid_ave data is.
How can I avoid this!?
The netCDF4 library used to return either a masked array or a regular numpy array, depending on whether the data you request from the variable (or a slice of it) contains fill values or not. This is unfortunate behavior, but it seems to be fixed in PR 787. So I think that, from version 1.4 onward, the default behavior is always to return a masked array if a fill value is defined (I haven't tested it).
In any case, you can ensure that you always get a regular numpy array by calling set_auto_mask(False) on the dataset or on an individual variable.
I am trying to work in Python 3 with topography/bathymetry-information (basically a grid containing x [longitude in decimal degrees], y [latitude in decimal degrees] and z [meter]).
The grid file has the extension .nc and is therefore a netCDF-file. Normally I would use it in mapping tools like Generic Mapping Tools and don't have to bother with how a netCDF file works, but I need to extract specific information in a Python script. Right now this is only limiting the dataset to certain longitude/latitude ranges.
However, right now I am a bit lost on how to get to the z-information for specific x and y values. Here's what I know about the data so far
import netCDF4
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Getting information about the file
#----------------------
print(fh.file_format)
NETCDF3_CLASSIC
print(fh)
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
title: GEBCO_08 Grid
source: 20100927
dimensions(sizes): side(2), xysize(933120000)
variables(dimensions): float64 x_range(side), float64 y_range(side), int16 z_range(side), float64 spacing(side), int32 dimension(side), int16 z(xysize)
groups:
print(fh.dimensions.keys())
odict_keys(['side', 'xysize'])
print(fh.dimensions['side'])
<class 'netCDF4._netCDF4.Dimension'>: name = 'side', size = 2
print(fh.dimensions['xysize'])
<class 'netCDF4._netCDF4.Dimension'>: name = 'xysize', size = 933120000
#----------------------
# Variables
#----------------------
print(fh.variables.keys()) # returns all available variable keys
odict_keys(['x_range', 'y_range', 'z_range', 'spacing', 'dimension', 'z'])
xrange = fh.variables['x_range'][:]
print(xrange)
[-180. 180.] # contains the values -180 to 180 for the longitude of the whole world
yrange = fh.variables['y_range'][:]
print(yrange)
[-90. 90.] # contains the values -90 to 90 for the latitude of the whole world
zrange = fh.variables['z_range'][:]
print(zrange)
[-10977   8685] # contains the depth/topography range for the world
spacing = fh.variables['spacing'][:]
print(spacing)
[ 0.00833333  0.00833333] # grid spacing in both x and y; multiplied by the number of grid points it reproduces the x and y ranges
dimension = fh.variables['dimension'][:]
print(dimension)
[43200 21600] # corresponding to the shape of z if it were the 2D array I'd hoped for (it's currently a 1D array of length 933120000, which is 43200*21600)
z = fh.variables['z'][:] # currently a 1D array of the depth/topography/z information I want
fh.close()
Based on this information I still don't know how to access z for specific x/y (longitude/latitude) values. I think I basically need to reshape the 1D array of z into a 2D array corresponding to longitude/latitude values; I just don't have a clue how to do that. I saw posts where people converted a 1D array into a 2D array, but I have no way of knowing in which corner of the world they start and how they progress.
I know there is a 3-year-old similar post; however, I don't know how to find an analogous "index of the flattened array" for my problem, or how exactly to work with that. Can somebody help?
You need to first read in all three of z's dimensions (lat, lon, depth) and then extract values across each of those dimensions. Here are a few examples.
# Read in all 3 dimensions [lat x lon x depth]
z = fh.variables['z'][:,:,:]
# Topography at a single lat/lon/depth (1 value):
z_1 = z[5,5,5]
# Topography at all depths for a single lat/lon (1D array):
z_2 = z[5,5,:]
# Topography at all latitudes and longitudes for a single depth (2D array):
z_3 = z[:,:,5]
Note that the number you enter for lat/lon/depth is the index in that dimension, not an actual latitude, for instance. You'll need to determine the indices of the values you are looking for beforehand.
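Determining those indices from actual coordinates can be done with a nearest-neighbour lookup; a sketch assuming 1D lat/lon coordinate arrays (the axes below are made up to match the GEBCO grid sizes):

```python
import numpy as np

# hypothetical coordinate axes matching the grid dimensions
lat = np.linspace(-90, 90, 21600)
lon = np.linspace(-180, 180, 43200)

target_lat, target_lon = 48.5, -4.25  # arbitrary example point

i = int(np.abs(lat - target_lat).argmin())  # index of the nearest latitude
j = int(np.abs(lon - target_lon).argmin())  # index of the nearest longitude
print(lat[i], lon[j])
```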
I just found the solution in this post. Sorry that I didn't see that before. Here's what my code looks like now. Thanks to Dave (he answered his own question in the post above). The only thing I had to work on was that the dimensions have to stay integers.
import netCDF4
import numpy as np
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Extract variables
#----------------------
xrange = fh.variables['x_range'][:]
yrange = fh.variables['y_range'][:]
spacing = fh.variables['spacing'][:]
zz = fh.variables['z'][:]
fh.close()
#----------------------
# Compute Lat/Lon
#----------------------
nx = (xrange[-1] - xrange[0]) / spacing[0] # num pts in x-dir
ny = (yrange[-1] - yrange[0]) / spacing[1] # num pts in y-dir
nx = int(nx) # reshape needs integer sizes
ny = int(ny)
lon = np.linspace(xrange[0], xrange[-1], nx)
lat = np.linspace(yrange[0], yrange[-1], ny)
#----------------------
# Reshape the 1D to an 2D array
#----------------------
bathy = zz[:].reshape(ny, nx)
So, now when I look at the shape of both zz and bathy (following code), the former is a 1D array of length 933120000, the latter a 2D array with dimensions 21600x43200 (ny x nx).
print(zz.shape)
print(bathy.shape)
The next step is to use indices to access the bathymetry/topography data correctly, just as N1B4 described in his post.