NetCDF: How can I script plotting at each time step? - python

I would like to create plot images from a NetCDF at each time step.
My NetCDF files look like this:
netcdf file:/C:/home/data/cmorph/test/reduced_cmorph_adjusted_spi_pearson_01.nc {
dimensions:
time = UNLIMITED; // (240 currently)
lat = 120;
lon = 360;
variables:
float spi_pearson_01(time=240, lat=120, lon=360);
:_FillValue = NaNf; // float
:valid_min = -3.09; // double
:valid_max = 3.09; // double
:long_name = "Standard Precipitation Index (Pearson Type III distribution), 1-month scale";
:_ChunkSizes = 1, 120, 360; // int
int time(time=240);
:units = "days since 1800-01-01 00:00:00";
:_ChunkSizes = 1024; // int
:_CoordinateAxisType = "Time";
float lat(lat=120);
:units = "degrees_north";
:_CoordinateAxisType = "Lat";
float lon(lon=360);
:units = "degrees_east";
:_CoordinateAxisType = "Lon";
// global attributes:
:title = "CMORPH Version 1.0BETA Version, daily precip from 00Z-24Z";
:history = "Wed Feb 28 07:30:01 2018: C:\\home\\miniconda\\Library\\bin\\ncks.exe --dmn lon,0,,4 --dmn lat,0,,4 CMORPH_V1.0_ADJ_0.25deg-DLY_00Z_1998_2017.nc cmorph_reduced_adjusted.nc";
:NCO = "4.7.1";
:_CoordSysBuilder = "ucar.nc2.dataset.conv.DefaultConvention";
}
I like the plots produced by Panoply but I haven't worked out how to script it (I don't want to go through the GUI for this since I'll have roughly 1500 plots to create). I'm not wedded to Panoply per se, so if someone has a better idea please advise. I could hammer this out in matplotlib but it'd take me quite a while and wouldn't look as good as the Panoply plots. I'm trying to avoid doing much if any of the plotting myself, but maybe there's something out there that provides easy plotting of NetCDFs which can be called from a script (I typically use Python and Bash).

Example using xarray:
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import xarray as xr
file_name = "reduced_cmorph_adjusted_spi_pearson_01.nc"
with xr.open_dataset(file_name) as ds:
    for t in range(ds.time.shape[0]):
        da = ds.spi_pearson_01.isel(time=t)
        plt.figure()
        da.plot()
        plt.savefig('frame{}.png'.format(t))
        plt.close()  # release the figure so hundreds of them do not pile up in memory
A non-scripting method, if you don't mind a few clicks in Panoply: create a lat/lon plot and then choose File->Export Animation. You can output the individual time steps as JPG or PNG.
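If the frames from the loop above are later stitched into an animation or browsed in a file manager, it may help to zero-pad the index and put the date in the file name. The sketch below uses only the standard library; `frame_name` is a hypothetical helper, and it assumes the time values are whole days counted from 1800-01-01, as the `units` attribute in the header above states. xarray normally decodes these values for you, so this only matters when working with the raw integers.

```python
from datetime import date, timedelta

# Reference date taken from the file's time units: "days since 1800-01-01 00:00:00"
EPOCH = date(1800, 1, 1)

def frame_name(index, days_since_epoch, prefix="frame"):
    """Hypothetical helper: build a sortable file name for one time step."""
    day = EPOCH + timedelta(days=int(days_since_epoch))
    # Zero-padding keeps frame_0010 after frame_0009 in a plain alphabetical sort
    return "{}_{:04d}_{}.png".format(prefix, index, day.isoformat())

print(frame_name(0, 0))    # frame_0000_1800-01-01.png
print(frame_name(1, 31))   # frame_0001_1800-02-01.png
```

The returned string could replace 'frame{}.png'.format(t) in the savefig call.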

I'm kind of assuming you don't want to insert 1500 figures in a report or talk and therefore the purpose of this is just to investigate the file slice by slice. If this is the case I would simply open the file using
ncview file.nc
This allows you to step through the slices, animate, pass the cursor over the slices to see the values and click on a point to see a timeseries. If you don't have it, you can install it easily with apt-get (ubuntu, mint etc) with
sudo apt-get install ncview

Related

Obtain mean value of specific area in netCDF

I am trying to plot a time series of the sea surface temperature (SST) for a specific region from a .nc file. The SST is a three-dimensional variable (lat,lon,time) that has mean daily values for a specific region from 1982 to 2016. I want my plot to reflect the seasonal SST variability over the entire period. I assume that what I need to do first is to obtain a mean SST value over my lat,lon region for each of the days, which I can work with later on. So far, I assume that I need to read the .nc file and the variables:
import netCDF4 as nc
f = nc.Dataset('cmems_SST_MED_SST_L4_REP_OBSERVATIONS_010_021_1639073212518.nc')
sst = f.variables['analysed_sst'][:]
lon = f.variables['longitude'][:]
lat = f.variables['latitude'][:]
Next, following the code suggested here, I tried to reshape and obtain the mean, but an error pops up:
global_average= np.nanmean(sst[:,:,:],axis=(1,2))
annual_temp = np.nanmean(np.reshape(global_average, (34,12)), axis = 1)
#34 years between 1982 and 2016, and 12 months per year.
ERROR cannot reshape array of size 14008 into shape (34,12)
From here I found different approaches, such as using cdo or nco (which didn't work due to installation problems), among others that were not suitable for my case. I used nanmean because I know that in MATLAB this is done with the nanmean function. I am quite new to this topic and would like to ask for some hints, such as where I should focus or which path is more suitable for this case. Thank you!!
Handling daily data in pure Python is awkward because you have to account for leap years, and subsetting a region requires tedious index striding.
As steTATO mentioned, since your data has daily temporal resolution, you need to consider the following.
The reshape to (34,12) fails because global_average holds one value per day, not one per month. Each year would need a row of 365 or 366 values, depending on whether it is a leap year (1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016). So your code above would have to look something like
annual_temp = np.nanmean(np.reshape(global_average, (34,365)), axis = 1)
But, as I said, because of the leap years the rows do not all have the same length, so you can't get what you want by simply reshaping global_average.
If I had no choice but to use python only, I'd do the following
import numpy as np
def days_in_year(in_year):
    # leap years within 1982-2016
    leap_years = [1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016]
    if in_year in leap_years:
        out_days = 366
    else:
        out_days = 365
    return out_days
# some of your code, importing netcdf data
years = np.arange(1982, 2017)
global_avg = np.nanmean(sst[:, :, :], axis=(1, 2))
annual_avgs = []
start = 0  # index of the first day of the current year
for yr in years:
    end = start + days_in_year(yr)  # one past the last day of the current year
    annual_avg = np.nanmean(global_avg[start:end])
    annual_avgs.append(annual_avg)
    start = end
The code above walks through global_avg one year at a time, using the length of each year (leap years included) to pick the slice, and stores each year's mean in annual_avgs.
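The same idea can be written without a hard-coded list of leap years. The sketch below is only an illustration on synthetic data: `annual_means` is a hypothetical helper, the series is assumed to start on 1 January of `first_year` with one value per day and no gaps, and `calendar.isleap` from the standard library supplies the year lengths. A trailing partial year is skipped.

```python
import calendar
import numpy as np

def annual_means(daily, first_year):
    """Hypothetical helper: average a gap-free daily series year by year."""
    means = {}
    start, year = 0, first_year
    while True:
        n_days = 366 if calendar.isleap(year) else 365
        end = start + n_days
        if end > len(daily):
            break  # incomplete final year: skip it
        means[year] = np.nanmean(daily[start:end])
        start, year = end, year + 1
    return means

# Synthetic stand-in for the daily regional mean: every day holds its own year
years = range(1982, 1986)
daily = np.concatenate(
    [np.full(366 if calendar.isleap(y) else 365, float(y)) for y in years]
)
result = annual_means(daily, 1982)
print(result)
```

If the file's time coordinate is decoded into real dates, grouping by calendar year is usually more robust than counting days, since it does not depend on the series starting on 1 January.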

Regrid Netcdf file in python

I'm trying to regrid a NetCDF file from 0.125 degrees to 0.083-degree spatial scale. The netcdf contains 224 latitudes and 464 longitudes and it has daily data for one year.
I tried xarray for it but it produces this memory error:
MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64
How can I regrid the file with python?
A Python option, using CDO as a backend, is my package nctoolkit: https://nctoolkit.readthedocs.io/en/latest/, installable via pip (https://pypi.org/project/nctoolkit/).
It has a built-in method called to_latlon, which regrids to a specified lat/lon grid.
In your case, you would need to do:
import nctoolkit as nc
data = nc.open_data(infile)
data.to_latlon(lon = [lon_min,lon_max],lat=[lat_min,lat_max], res =[0.083, 0.083])
Another option is to try cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine to do this, so linear, first- and second-order conservative, nearest-neighbour, etc. regridding methods are available.
Here is an example of the kind of regridding that you need:
import cf
import numpy
f = cf.example_field(2) # Use cf.read to read your own data
print('Source field:')
print(f)
# Define the output grid
lat = cf.DimensionCoordinate(
data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))
# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')
print('\nRegridded field:')
print(g)
which produces:
Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
: latitude(5) = [-75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
: latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
: longitude(4338) = [0.0, ..., 359.971] degreesE
: air_pressure(1) = [850.0] hPa
There are plenty of options to get the destination grid from other fields, as well as defining it explicitly. More details can be found in the documentation
cf-python will infer which axes are X and Y, etc from the CF metadata attached to the dataset, but if that is missing then there are always ways to manually set it or work around it.
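One detail in the output above is worth a quick check: with a step of 0.083 the latitude axis stops at about 89.944 instead of 90. The sketch below reproduces that with plain NumPy and compares it with a 1/12-degree grid. That 0.083 is shorthand for 1/12 of a degree is only an assumption here; if the target resolution really is 0.083, the first grid is the right one.

```python
import numpy as np

# Grid as defined in the example above: step of exactly 0.083 degrees
lat_083 = np.arange(-90, 90.01, 0.083)

# Assumed alternative: 1/12 degree, built with linspace so both poles are hit
n_intervals = 180 * 12
lat_12th = np.linspace(-90, 90, n_intervals + 1)

print(lat_083.size, lat_083[-1])    # number of points and last latitude
print(lat_12th.size, lat_12th[-1])
```

Building the axis with linspace and an explicit point count avoids the accumulated rounding that arange can show with a non-integer step.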
The easiest way to do this is to use command-line tools like cdo and nco.
For example:
cdo remapbil,target_grid infile.nc ofile.nc
The target_grid can be a descriptor file or you can use a NetCDF file with your desired grid resolution. Take note of other regridding methods that might suit your need. The example above is using bilinear interpolation.
Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.
An example of regridding might look something like this:
import numpy as np
import xarray as xr
da_input = xr.open_dataarray('input.nc')  # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.125)  # new coordinates
da_output = da_input.interp(lat=regrid_axis)  # specify calculation
da_output.to_netcdf('output.nc')  # save direct to file
Doing da_input.load(), or da_output.compute(), for example, would cause all the data to be loaded into memory - which you want to avoid.
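The figure in the error message can be checked with simple arithmetic, and the same arithmetic gives a rough idea of how large the regridded result should be. In the sketch below the element count comes from the traceback in the question; the 365 time steps and the 28 by 58 degree extent are assumptions derived from "224 latitudes and 464 longitudes" at 0.125 degrees and "daily data for one year".

```python
GIB = 2 ** 30
BYTES_PER_FLOAT64 = 8

# Element count reported in the MemoryError
requested = 13858233841 * BYTES_PER_FLOAT64 / GIB
print("requested: {:.1f} GiB".format(requested))

# Assumed extent of the source grid: 224 x 464 cells of 0.125 degrees
lat_span = 224 * 0.125   # 28 degrees
lon_span = 464 * 0.125   # 58 degrees

# Approximate number of target cells at 0.083 degrees, for 365 daily steps
n_lat = round(lat_span / 0.083)
n_lon = round(lon_span / 0.083)
expected = n_lat * n_lon * 365 * BYTES_PER_FLOAT64 / GIB
print("expected output: {:.2f} GiB ({} x {} cells)".format(expected, n_lat, n_lon))
```

If these assumptions hold, the regridded field is well under 1 GiB, which suggests that the 103 GiB request comes from an intermediate array rather than from the result itself.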
Another way to access the cdo functionality from within Python is to use the cdo package from PyPI:
pip install cdo
Then you can do
from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")
where target_grid is one of the usual options:
a nc file to use the grid from
a regular grid specifier e.g. r360x180
a txt file with a grid descriptor (see below)
There are several methods built in for the regridding:
remapbic : bicubic interpolation
remapbil : bilinear interpolation
remapnn : nearest neighbour interpolation
remapcon : first order conservative remapping
remapcon2 : 2nd order conservative remapping
You can use a grid descriptor file to define the area you need to interpolate to...
in the file grid.txt
gridtype=lonlat
xfirst=X (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY
For more details you can refer to my video guide on interpolation.
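The descriptor file shown above can also be generated from Python, which avoids counting grid points by hand. This is a minimal sketch: `grid_description` is a hypothetical helper, the bounds in the example call are made up, and only the keys already shown in grid.txt are written.

```python
import math

def grid_description(lon_first, lon_last, lat_first, lat_last, inc):
    """Hypothetical helper: build the text of a CDO lonlat grid description."""
    # Number of points that fit between the first and last coordinate;
    # the small epsilon guards against floating-point round-off.
    xsize = math.floor((lon_last - lon_first) / inc + 1e-9) + 1
    ysize = math.floor((lat_last - lat_first) / inc + 1e-9) + 1
    lines = [
        "gridtype = lonlat",
        "xfirst   = {}".format(lon_first),
        "xinc     = {}".format(inc),
        "xsize    = {}".format(xsize),
        "yfirst   = {}".format(lat_first),
        "yinc     = {}".format(inc),
        "ysize    = {}".format(ysize),
    ]
    return "\n".join(lines) + "\n"

# Made-up bounds, for illustration only
text = grid_description(-125.0, -67.0, 25.0, 53.0, 0.083)
print(text)
# To use it with cdo, write the text to a file, e.g.:
# with open("grid.txt", "w") as fh:
#     fh.write(text)
```

The resulting grid.txt would then be passed as the target_grid argument of remapbil or one of the other remapping operators.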

Projecting GOES-16 Geostationary data into Plate Carree Cartopy

I am trying desperately to project some geostationary data from GOES-16 netCDF file to a different projection. I can get the background map to re-project but can't seem to get the data to follow.
I'm not super versed in this yet, but here is what I have thus far:
Reading the data through NetCDF4:
from netCDF4 import Dataset
nc = Dataset('OR_ABI-L1b-RadF-M3C13_G16_s20182831030383_e20182831041161_c20182831041217.nc')
data = nc.variables['Rad'][:]
Here I'm trying to get the geostationary info:
sat_h = nc.variables['goes_imager_projection'].perspective_point_height
X = nc.variables['x'][:] * sat_h
Y = nc.variables['y'][:] * sat_h
# Satellite longitude
sat_lon = nc.variables['goes_imager_projection'].longitude_of_projection_origin
# Satellite sweep
sat_sweep = nc.variables['goes_imager_projection'].sweep_angle_axis
Here I'm taking projection data from the .nc file:
proj_var = nc.variables['goes_imager_projection']
sat_height = proj_var.perspective_point_height
central_lon = proj_var.longitude_of_projection_origin
semi_major = proj_var.semi_major_axis
semi_minor = proj_var.semi_minor_axis
print proj_var
<type 'netCDF4._netCDF4.Variable'>
int32 goes_imager_projection()
long_name: GOES-R ABI fixed grid projection
grid_mapping_name: geostationary
perspective_point_height: 35786023.0
semi_major_axis: 6378137.0
semi_minor_axis: 6356752.31414
inverse_flattening: 298.2572221
latitude_of_projection_origin: 0.0
longitude_of_projection_origin: -75.0
sweep_angle_axis: x
unlimited dimensions:
current shape = ()
filling on, default _FillValue of -2147483647 used
And here is a small snippet of my code that's relevant:
fig = plt.figure(figsize=(30,20))
globe = ccrs.Globe(semimajor_axis=semi_major, semiminor_axis=semi_minor)
proj = ccrs.Geostationary(central_longitude=central_lon,
satellite_height=sat_height, globe=globe)
ax = fig.add_subplot(1, 1, 1, projection=proj)
IR_img = ax.imshow(data[:,:],origin='upper',extent=(X.min(), X.max(), Y.min(), Y.max()),
cmap=IR_cmap,interpolation='nearest',vmin=162.,vmax=330.)
And an image of everyone playing nicely:
[Image: data and map working]
When I try and get say a Plate Carree projection I try:
proj = ccrs.PlateCarree(central_longitude=central_lon,globe=globe)
And an image of my failure:
[Image: data and map not working]
I've tried messing with the extent in the imshow method, I've tried adding a
transform=proj
in the imshow and no luck, it just gets hung up and I have to restart the kernel.
Clearly it is a lack of understanding on my part. If anyone can quickly and easily help/explain the way I want to change my projection from geostationary, I would greatly appreciate that.
I'm running archaic python2.
Thanks for looking.
EDIT: Problem seems to be resolved thanks to insight from DopplerShift and ajdawson, I guess I was maybe a little impatient/ignorant of how long a full disk transformation would take.
It looks like you need to specify the transform keyword to imshow. This keyword tells cartopy what coordinates your data are in, which in this case should be geostationary.
I don't have your dataset so I cannot test this, but the snippet below illustrates the concept. The projection and the transform are independent so you should define both. The value of the transform argument (crs in the example below) is fixed for the data set, but the projection can be anything you like (including the same as crs).
See this example of reprojecting a geostationary image: https://scitools.org.uk/cartopy/docs/v0.16/gallery/geostationary.html#sphx-glr-gallery-geostationary-py. Also see the guide to projection and transform arguments here: https://scitools.org.uk/cartopy/docs/v0.16/tutorials/understanding_transform.html.
globe = ccrs.Globe(semimajor_axis=semi_major, semiminor_axis=semi_minor)
crs = ccrs.Geostationary(central_longitude=central_lon,
satellite_height=sat_height, globe=globe)
proj = ccrs.PlateCarree(central_longitude=central_lon, globe=globe)
ax = fig.add_subplot(1, 1, 1, projection=proj)
IR_img = ax.imshow(data[:,:], origin='upper',
extent=(X.min(), X.max(), Y.min(), Y.max()),
transform=crs,
cmap=IR_cmap,
interpolation='nearest', vmin=162., vmax=330.)
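Reprojecting a full-disk image is slow mainly because of the number of pixels involved. For a quick look, one option is to thin the array before handing it to imshow. The sketch below shows only the thinning step with NumPy on a synthetic array; `thin_for_preview` is a hypothetical helper and the array shape is made up. Because the thinned array still spans essentially the same area, the same extent can be reused, at the cost of a small offset of up to `step - 1` pixels at the far edges.

```python
import numpy as np

def thin_for_preview(data, step):
    """Hypothetical helper: keep every `step`-th pixel along both image axes."""
    return data[::step, ::step]

# Synthetic stand-in for the 'Rad' array (shape chosen for illustration)
data = np.arange(1000 * 1200, dtype=np.float32).reshape(1000, 1200)

preview = thin_for_preview(data, 4)
print(data.shape, "->", preview.shape)
```

The thinned array would take the place of data[:,:] in the imshow call, with transform=crs left unchanged; once the preview looks right, the full-resolution array can be plotted.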

Python: using shiftgrid to change longitude

I am using Python 3.6 to map climate model data whose original longitude range is (0,360). I am using a basemap function called shiftgrid to shift all of the longitude values in my data set to (-180,180). However, I am still getting an empty map. Any suggestions would be helpful. Thanks!
Here is my code so far:
#Longitude values:
a=0
b=360
prcp = np.load('data.npy')
data=np.average(prcp,axis=0)
plt.figure()
# create Basemap
x1 = np.linspace(a,b, data.shape[1])
y1 = np.linspace(-90, 90, data.shape[0])
xx1, yy1 = np.meshgrid(x1, y1)
data, x1 = shiftgrid(180., data, x1,start= False)
I've run into this kind of problem myself. My solution was to transform the coordinates of the input file using cdo (there is a Python front end available for cdo). The shift can be done with:
cdo sellonlatbox,-180,180,-90,90 input.nc output.nc
or, if you have the python front end available, you can get the data directly as netCDF4.Dataset with
from cdo import Cdo
cdo = Cdo()
data = cdo.sellonlatbox(
    '-180,180,-90,90',
    input='input.nc',
    returnCdf=True,  # return an open netCDF4.Dataset instead of a temporary file name
)
Hope this helps.
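If neither basemap nor cdo is an option, the shift can also be done directly in NumPy. This is a minimal sketch on synthetic data, not a drop-in replacement for shiftgrid: it assumes a regular grid whose longitudes run from 0 up to (but not including) 360, with longitude as the last axis of the data; `to_minus180_180` is a hypothetical helper.

```python
import numpy as np

def to_minus180_180(lon, data):
    """Hypothetical helper: relabel 0..360 longitudes as -180..180 and reorder."""
    lon = np.asarray(lon, dtype=float)
    lon_shifted = ((lon + 180.0) % 360.0) - 180.0
    order = np.argsort(lon_shifted)          # west-to-east order
    return lon_shifted[order], data[..., order]

lon = np.arange(0.0, 360.0, 90.0)            # 0, 90, 180, 270
data = np.array([[10, 20, 30, 40],
                 [11, 21, 31, 41]])          # shape (lat, lon)

new_lon, new_data = to_minus180_180(lon, data)
print(new_lon)
print(new_data)
```

The data columns are reordered together with the longitudes, so each value stays attached to its original location.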

pystan .plot() plots trace summary twice

This is probably a stupid question, but whenever I use the .plot() function it plots the summary twice. Does anyone know why it does that and how I can stop it?
As you can see I'm using jupyter notebooks if that matters.
It happens with any stan model (and on two separate installations)
This code would produce the problem for me
import pystan
import numpy as np
model_string = """
data {
int<lower=0> N;
int y[N];
}
parameters {
real<lower=0, upper=1> theta;
}
model {
theta ~ beta(1,1);
y ~ bernoulli(theta);
}
"""
N = 50
z = 10
y = np.append(np.repeat(1, z), np.repeat(0, N - z))
dat = {'y':y,
'N':N}
fit = pystan.stan(model_code=model_string, data=dat, iter=1000, warmup=200, thin=1, chains = 3)
fit.plot()
This is caused by the %matplotlib inline statement drawing more than you want it to. The StanFit4Model.plot method calls matplotlib.pyplot.subplot, and that call itself will draw a plot when your notebook has %matplotlib inline. Then the method returns the Figure object. If you don't capture it, your notebook decides to show it to you as an image instead of printing the type, and you get the double plot.
You can output a single plot by capturing the output Figure. Change your code from
fit.plot()
to instead be
fig = fit.plot()
Putting a semicolon after the .plot() call also does the trick.
Learned it from https://github.com/stan-dev/pystan/issues/230
