Subtract, Average, Squeeze, then Subset Variable in NetCDF via Python

Subtract, Average, Squeeze, then Subset Variable in NetCDF via Python - python

I have a NetCDF that has multiple pieces pollution data across U.S. in a daily value format. I am hoping to subtract a background variable ('nosmoke background PM25') from another ('PM25') then average this result to create an annual value. From what I understand, by creating an annual value I change my array from three dimensions (time, lat, long) to two (lat, long) since there is now no daily values. I am then hoping to subset this variable to only include a region of interest. Below is my code. Any help is much appreciated.
import netCDF4 as nc
import numpy as np
fn = '24hr_SmokePM25_2018.nc'
ds = nc.Dataset(fn)
#All data defining total PM25
TOT_PM25 = ds.variables['PM25']
NS_BKGD_PM25 = ds.variables['nosmoke background PM25']
#create empty array that is same size as TOT_PM25 variable to fill
Smoke_PM25 = np.full_like(TOT_PM25,0)
#For loop in subtracting total PM25 from no smoke PM25
for i in range (1,365):
Smoke2_PM25[i,:,:]=TOT_PM25[i,:,:]-NS_BKGD_PM25[i,:,:]
#Averaging and squeezing daily value
np.mean(Smoke2_PM25[i,:,:])
np.squeeze(Smoke2_PM25, axis=0)
#boundary input for desired region to pull
latbounds = [ 24 , 39 ]
lonbounds = [ -96 , -77 ]
lons = ds.variables['lon'][:]
lats = ds.variables['lat'][:]
# latitude lower and upper index
latli = np.argmin( np.abs( lats - latbounds[0] ) )
latui = np.argmin( np.abs( lats - latbounds[1] ) )
# longitude lower and upper index
lonli = np.argmin( np.abs( lons - lonbounds[0] ) )
lonui = np.argmin( np.abs( lons - lonbounds[1] ) )
# Value subsets (SmokeP25, latitude, longitude)
Smoke_Region_PM25 = ds.variables['PM25'][ : , latli:latui , lonli:lonui ]
#Close the NetCDF
ds.close()

You can do this easily with v0.3.0 of my nctoolkit package (https://pypi.org/project/nctoolkit/).
First read the data in:
import nctoolkit as nc
fn = '24hr_SmokePM25_2018.nc'
data = nc.open_data(fn)
Then subtract one variable from the other to create your new variable:
data.assign(new = lambda x: x.PM25 - x.nosmoke_background_PM25)
Once that is done you can calculate an annual mean:
data.tmean("year")
Then visualize the results:
data.plot("new")

Related

Aggregating points using xarray

I have a set of netcdf datasets that basically look like a CSV file with columns for latitude, longitude, value. These are points along tracks that I want to aggregate to a regular grid of (say) 1 degree from -90 to 90 and -180 to 180 degrees, by for example calculating the mean and/or standard deviation of all points that fall within a given cell.
This is quite easily done with a loop
D = np.zeros((180, 360))
for ilat in np.arange(-90, 90, 1, dtype=np.int):
for ilon in np.arange(-180, 180, 1, dtype=np.int):
p1 = np.logical_and(ds.lat >= ilat,
ds.lat <= ilat + 1)
p2 = np.logical_and(ds.lon >=ilon,
ds.lon <= ilon+1)
if np.sum(p1*p2) == 0:
D[90 + ilat, 180 +ilon] = np.nan
else:
D[90 + ilat, 180 + ilon] = np.mean(ds.var.values[p1*p2])
# D[90 + ilat, 180 + ilon] = np.std(ds.var.values[p1*p2])
Other than using numba/cython to speed this up, I was wondering whether this is something you can directly do with xarray in a more efficient way?

You should be able to solve this using pandas and xarray.
You will first need to convert your data set to a pandas data frame.
Once this is done, df is the dataframe and assuming longitude and latitude are lon/lat, you will need to round the lon/lats to the nearest integer value, and then calculate the mean for each lon/lat. You will then need to set lon/lat to indices. Then you can use xarray's to_xarray to convert to an array:
import xarray as xr
import pandas as pd
import numpy as np
df = df.assign(lon = lambda x: np.round(x.lon))
df = df.assign(lat = lambda x: np.round(x.lat))
df = df.groupby(["lat", "lon"]).mean()
df = df.set_index(["lat", "lon"])
df.to_xarray()

I use #robert-wilson as a starting point, and to_xarray is indeed part of my solution. Other inspiration came from here. The approach that I used is shown below. It's probably slower than numba-ing my solution above, but much simpler.
import netCDF4
import numpy as np
import xarray as xr
import pandas as pd
fname = "super_funky_file.nc"
f = netCDF4.Dataset(fname)
lat = f.variables['lat'][:]
lon = f.variables['lon'][:]
vari = f.variables['super_duper_variable'][:]
df = pd.DataFrame({"lat":lat,
"lon":lon,
"vari":vari})
# Simple functions to calculate the grid location in rows/cols
# using lat/lon as inputs. Global 0.5 deg grid
# Remember to cast to integer
to_col = lambda x: np.floor(
(x+90)/0.5).astype(
np.int)
to_row = lambda x: np.floor(
(x+180.)/0.5).astype(
np.int)
# Map the latitudes to columns
# Map the longitudes to rows
df['col'] = df.lat.map(to_col)
df['row'] = df.lon.map(to_row)
# Aggregate by row and col
gg = df.groupby(['col', 'row'])
# Now, create an xarray dataset with
# the mean of vari per grid cell
ds = gg.mean().to_xarray()
dx = gg.std().to_xarray()
ds['stdi'] = dx['vari']
dx = gg.count().to_xarray()
ds['counti'] = dx['vari']```

How to interpolate numpy.polyval and numpy.polyfit python

I did a numpy.polyfit() for latitude, longitude, & altitude data for a satellite orbit and interpolated (50 points) with numpy.polyval().
Now, I want to just take a window (0-4.5 degrees longitude) and do a higher resolution interpolation (6,000 points). I think that I need to use the fit coefficients from the first low res fit in order to interpolate for my longitude window, and I am not quite sure how to do this.
Inputs:
lat = [27.755611104020687, 22.50661883405905, 17.083576087905502, 11.53891099628959, 5.916633366002468, 0.2555772624429494, -5.407902834141322, -11.037514984810027, -16.594621304857206, -22.03556688048686, -27.308475759820045, -32.34927891621322, -37.07690156937186, -41.38803163295967, -45.15306971601912, -48.21703193866987, -50.41165326774015, -51.58419672864487, -51.63883932997542, -50.57025116952513, -48.46557920053242, -45.47329014246061, -41.76143266388077, -37.48707787049647, -32.782653540783, -27.754184631685046, -22.48503337048438, -17.041097574740743, -11.475689837873944, -5.833592289780744, -0.1543286595142316, 5.525119007560692, 11.167878192881306, 16.73476477885508, 22.18160021405449, 27.455997555900108, 32.493386953033685, 37.21222272985329, 41.508824407948275, 45.25350232626601, 48.291788915858554, 50.45698534747271, 51.59925055739275, 51.62660832560593, 50.53733379179681, 48.420673231121725, 45.42531420150485, 41.71819693220144, 37.45473807165676, 32.76569228387106]
lon = [-109.73105744378498, -104.28690174554579, -99.2435132929552, -94.48533149079628, -89.91054414962821, -85.42671400689177, -80.94616150449806, -76.38135021210172, -71.6402674905218, -66.62178379632216, -61.21120467960157, -55.27684029674759, -48.66970878028004, -41.23083703244677, -32.813881865289346, -23.332386757370532, -12.832819226213942, -1.5659455609661785, 10.008077792630402, 21.33116444634303, 31.92601575632583, 41.51883213364072, 50.04498630545507, 57.58103957109249, 64.26993028992476, 70.2708323505337, 75.73441871754586, 80.7944079829813, 85.56734813043659, 90.1558676264546, 94.65309120129724, 99.14730128118617, 103.72658922048785, 108.48349841714494, 113.51966824008079, 118.95024882101737, 124.9072309203375, 131.5395221402974, 139.00523971191907, 147.44847902856114, 156.95146022590976, 167.46163867248032, 178.72228750873975, -169.72898181991064, -158.44642409799974, -147.8993300787564, -138.35373014113995, -129.86955508919888, -122.36868103811106, -115.70852432245486]
alt = [374065.49207488785, 372510.1635949105, 371072.75959230476, 369836.3092635453, 368866.7921820211, 368209.0950216997, 367884.3703536549, 367888.97894243425, 368195.08833668986, 368752.88080031495, 369494.21701128664, 370337.49662954226, 371193.3839051864, 371971.0136622536, 372584.272228585, 372957.752022573, 373032.0104747458, 372767.8112563471, 372149.0940816824, 371184.49208500446, 369907.2992362557, 368373.8795969478, 366660.5935723809, 364859.4071422184, 363072.42955020745, 361405.69765685993, 359962.58417682414, 358837.24421522504, 358108.5277743581, 357834.7679493668, 358049.8054538341, 358760.531463618, 359946.1257064284, 361559.04646970675, 363527.70518032915, 365760.6377191965, 368151.8843206526, 370587.2165838985, 372950.8014553002, 375131.8814988529, 377031.06540952163, 378565.8596562773, 379675.13241518533, 380322.2707576381, 380496.8682141012, 380214.86538256245, 379517.14674525027, 378466.68079100474, 377144.36811517406, 375643.83731560566]
myOrbitJ2000Time =[ 20027712., 20027713., 20027714., 20027715., 20027716.,
20027717., 20027718., 20027719., 20027720., 20027721.,
20027722., 20027723., 20027724., 20027725., 20027726.,
20027727., 20027728., 20027729., 20027730., 20027731.,
20027732., 20027733., 20027734., 20027735., 20027736.,
20027737., 20027738., 20027739., 20027740., 20027741.,
20027742., 20027743., 20027744., 20027745., 20027746.,
20027747., 20027748., 20027749., 20027750., 20027751.,
20027752., 20027753., 20027754., 20027755., 20027756.,
20027757., 20027758., 20027759., 20027760., 20027761.]
Code:
deg = 30 #polynomial degree for fit
fittime = myOrbitJ2000Time - myOrbitJ2000Time[0]
'Latitude Interpolation'
fitLat = np.polyfit(fittime, lat, deg)
polyval_lat = np.polyval(fitLat,fittime)
'Longitude Interpolation'
fitLon = np.polyfit(fittime, lon, deg)
polyval_lon = np.polyval(fitLon,fittime)
'Altitude Interpolation'
fitAlt = np.polyfit(fittime, alt, deg)
polyval_alt = np.polyval(fitAlt,fittime)
'Get Lat, Lon, & Alt values for a window of 0-4.5 deg Longitude'
lonwindow =[]
latwindow = []
altwindow = []
for i in range(len(polyval_lat)):
if 0 < polyval_lon[i] < 4.5: # get lon vals in window
lonwindow.append(polyval_lon[i]) #append lon vals
latwindow.append(polyval_lat[i]) #append corresponding lat vals
altwindow.append(polyval_alt[i]) #append corresponding alt vals
lonwindow = np.array(lonwindow)
Just to be clear -- The issue is I only have one point in the window range, I want to use the interpolation/equation/curve from the previous step. So then I can use that to interpolate again and generate 6,000 points in my window range.

Original answer posted here
First, generate the polynomial fit coefficients using the old time (x-axis) values, and interpolated longitude (y-axis) values.
import numpy as np
import matplotlib.pyplot as plt
poly_deg = 3 #degree of the polynomial fit
polynomial_fit_coeff = np.polyfit(original_times, interp_lon, poly_deg)
Next, use np.linspace() to generate arbitrary time values based on the number of desire points in the window.
start = 0
stop = 4
num_points = 6000
arbitrary_time = np.linspace(start, stop, num_points)
Finally, use the fit coefficients and the arbitrary time to get the actual interpolated longitude (y-axis) values and plot.
lon_intrp_2 = np.polyval(polynomial_fit_coeff, arbitrary_time)
plt.plot(arbitrary_time, lon_intrp_2, 'r') #interpolated window as a red curve
plt.plot(myOrbitJ2000Time, lon, '.') #original data plotted as points

How to do a second interpolation in python

I did my first interpolation with numpy.polyfit() and numpy.polyval() for 50 longitude values for a full satellite orbit.
Now, I just want to look at a window of 0-4.5 degrees longitude and do a second interpolation so that I have 6,000 points for longitude in the window.
I need to use the equation/curve from the first interpolation to create the second one because there is only one point in the window range. I'm not sure how to do the second interpolation.
Inputs:
lon = [-109.73105744378498, -104.28690174554579, -99.2435132929552, -94.48533149079628, -89.91054414962821, -85.42671400689177, -80.94616150449806, -76.38135021210172, -71.6402674905218, -66.62178379632216, -61.21120467960157, -55.27684029674759, -48.66970878028004, -41.23083703244677, -32.813881865289346, -23.332386757370532, -12.832819226213942, -1.5659455609661785, 10.008077792630402, 21.33116444634303, 31.92601575632583, 41.51883213364072, 50.04498630545507, 57.58103957109249, 64.26993028992476, 70.2708323505337, 75.73441871754586, 80.7944079829813, 85.56734813043659, 90.1558676264546, 94.65309120129724, 99.14730128118617, 103.72658922048785, 108.48349841714494, 113.51966824008079, 118.95024882101737, 124.9072309203375, 131.5395221402974, 139.00523971191907, 147.44847902856114, 156.95146022590976, 167.46163867248032, 178.72228750873975, -169.72898181991064, -158.44642409799974, -147.8993300787564, -138.35373014113995, -129.86955508919888, -122.36868103811106, -115.70852432245486]
myOrbitJ2000Time = [ 20027712., 20027713., 20027714., 20027715., 20027716.,
20027717., 20027718., 20027719., 20027720., 20027721.,
20027722., 20027723., 20027724., 20027725., 20027726.,
20027727., 20027728., 20027729., 20027730., 20027731.,
20027732., 20027733., 20027734., 20027735., 20027736.,
20027737., 20027738., 20027739., 20027740., 20027741.,
20027742., 20027743., 20027744., 20027745., 20027746.,
20027747., 20027748., 20027749., 20027750., 20027751.,
20027752., 20027753., 20027754., 20027755., 20027756.,
20027757., 20027758., 20027759., 20027760., 20027761.]
Code:
deg = 30 #polynomial degree for fit
fittime = myOrbitJ2000Time - myOrbitJ2000Time[0]
'Longitude Interpolation'
fitLon = np.polyfit(fittime, lon, deg) #gets fit coefficients
polyval_lon = np.polyval(fitLon,fittime) #interp.s to get actual values
'Get Longitude values for a window of 0-4.5 deg Longitude'
lonwindow =[]
for i in range(len(polyval_lon)):
if 0 < polyval_lon[i] < 4.5: # get lon vals in window
lonwindow.append(polyval_lon[i]) #append lon vals
lonwindow = np.array(lonwindow)

First, generate the polynomial fit coefficients using the old time (x-axis) values, and interpolated longitude (y-axis) values.
import numpy as np
import matplotlib.pyplot as plt
poly_deg = 3 #degree of the polynomial fit
polynomial_fit_coeff = np.polyfit(original_times, interp_lon, poly_deg)
Next, use np.linspace() to generate arbitrary time values based on the number of desire points in the window.
start = 0
stop = 4
num_points = 6000
arbitrary_time = np.linspace(start, stop, num_points)
Finally, use the fit coefficients and the arbitrary time to get the actual interpolated longitude (y-axis) values and plot.
lon_intrp_2 = np.polyval(polynomial_fit_coeff, arbitrary_time)
plt.plot(arbitrary_time, lon_intrp_2, 'r') #interpolated window as a red curve
plt.plot(myOrbitJ2000Time, lon, '.') #original data plotted as points

linear interpolation with grided data in python

I've a gridded weather data set which have a dimension 33 X 77 X 77. The first dimension is time and rest are Lat and Lon respectively. I need to interpolate (linear or nearest neighbour) the data to different points (lat&lon) for each time and write it into a csv file. I've used interp2d function from scipy and it is successful for one time step. As I've many locations I don't want to loop over time.
below shown is the piece of code that I wrote, Can any one suggest a better method to accomplish the task?
import sys ; import numpy as np ; import scipy as sp ; from scipy.interpolate import interp2d ;import datetime ; import time ; import pygrib as pg ;
grb_f=pg.open('20150331/gfs.20150331.grb2') lat=tmp[0].data(lat1=4,lat2=42,lon1=64,lon2=102)[1] ; lat=lat[:,0];
lon=tmp[0].data(lat1=4,lat2=42,lon1=64,lon2=102)[2] ; lon=lon[0,:] ;
temp=np.empty((0,lon.shape[0]))
for i in range(0,tmp.shape[0]):
dat=tmp[i].data(lat1=4,lat2=42,lon1=64,lon2=102)
temp=np.concatenate([temp,dat[0]-273.15],axis=0)
temp1=temp.reshape(tmp.shape[0],lat.shape[0],lon.shape[0])
x=77 ; y=28 #(many points)
f=interp2d(lon,lat, temp1[0,:,:],kind='linear',copy=False,bounds_error=True ) ; Z=f(x,y)
EDIT ::
Instead of making a 3D matrix, I appended the data in vertically and made data matrix of size 2541 X 77 and lat and lon of size 2541 X 1. the interp2d function gives Invalid length Error.
f=interp2d(lon,lat, temp1[0,:,:],kind='linear',copy=False,bounds_error=True )
"Invalid length for input z for non rectangular grid")
ValueError: Invalid length for input z for non rectangular grid
length of my x,y,z matrix are same (2541,2541,2541). Then why did it throw an Error?
Could any one explain ? Your help will be highly appreciated.

Processing of time series is very easy with RedBlackPy.
import datetime as dt
import redblackpy as rb
index = [dt.date(2018,1,1), dt.date(2018,1,3), dt.date(2018,1,5)]
lat = [10.0, 30.0, 50.0]
# create Series object
lat_series = rb.Series(index=index, values=lat, dtype='float32',
interpolate='linear')
# Now you can access at any key using linear interpolation
# Interpolation does not create new items in Series
# It uses neighbours to calculate value inplace when you call getitem
print(lat_series[dt.date(2018,1,2)]) #prints 20
So, if you want to just write interpolated values to csv file, you can iterate over list of needed keys and call getitem of Series object then put value to file:
# generator for dates range
def date_range(start, stop, step=dt.timedelta(1)):
it = start - step
while it < step:
it += step
yield it
#------------------------------------------------
# create list for keeping output strings
out_data = []
# create output file
out_file = open('data.csv', 'w')
# add head for output table
out_data.append('Time,Lat\n')
for date in date_range(dt.date(2018,1,1), dt.date(2018,1,5)):
out_data.append( '{:},{:}\n'.format(date, lat_series[date]) )
# write output Series
out_file.writelines(out_data)
out_file.close()
By the same way you can add to your processing Lon data.

If you want to create an "interpolator" object once, and use it to sequentially query just the specific points you need, you could take a loot at the scipy.interpolate.Rbf module:
"A class for radial basis function approximation/interpolation of n-dimensional scattered data."
Where n-dimensional would work for your data if you adjust ratio between temporal and spatial dimensions, and scattered meaning you can also use it for regular/uniform data.

If it's the same lat and lon for each time could you do it using slices and a manual interpolation. So if you want a 1D array of values at lat = 4.875, lon = 8.4 (obviously you would need to scale to match your actual spacing)
b = a[:,4:6, 8:10]
c = ((b[:,0,0] * 0.125 + b[:,0,1] * 0.875) * 0.6 + ((b[:,1,0] * 0.125 + b[:,1,1] * 0.875) * 0.4)
obviously you could do it all in one line but it would be even uglier
EDIT to allow variable lat and lon at each time period.
lat = np.linspace(55.0, 75.0, 33)
lon = np.linspace(1.0, 25.0, 33)
data = np.linspace(18.0, 25.0, 33 * 77 * 77).reshape(33, 77, 77)
# NB for simplicity I map 0-360 and 0-180 rather than -180+180
# also need to ensure values on grid lines or edges work ok
lat_frac = lat * 77.0 / 360.0
lat_fr = np.floor(lat_frac).astype(int)
lat_to = lat_fr + 1
lat_frac -= lat_fr
lon_frac = lon * 77.0 / 180.0
lon_fr = np.floor(lon_frac).astype(int)
lon_to = lon_fr + 1
lon_frac -= lon_fr
data_interp = ((data[:,lat_fr,lon_fr] * (1.0 - lat_frac) +
data[:,lat_fr,lon_to] * lat_frac) * (1.0 - lon_frac) +
(data[:,lat_to,lon_fr] * (1.0 - lat_frac) +
data[:,lat_to,lon_to] * lat_frac) * lon_frac)

Python 2D array -- How to plug in x and retrieve y value?

I have been looking for an answer since yesterday but no luck. So I have a 1D spectrum (.fits) file with flux value at each wavelength. I have converted them into a 2D array (x,y)=(wavelength, flux) and want to write a program which will return flux(y) at some assigned wavelengths(x). I have tried this:
#modules
import scipy
import numpy as np
import pyfits as pf
#Target Global Vaiables
hdulist_tg = pf.open('cutmask1-2.0001.fits')
hdr_tg = hdulist_tg[0].header
flux_tg = hdulist_tg[0].data
crval_tg = hdr_tg['CRVAL1'] #Starting wavelength
cdel_tg = hdr_tg['CDELT1'] #Wavelength axis width
wave_tg = crval_tg + np.arange(3183)*cdel_tg #Create an x-axis
wavelist = [6207,6315,6369,6438,6490,6565,6588]
wave_flux=[]
diff = 10
for wave in wave_tg:
for flux in flux_tg:
wave_flux.append((wave,flux))
for item in wave_flux:
wave = item[0]
flux = item[1]
#Where I got my actual wavelength that exists in wave_tg
diffmatch = np.abs(wave - wavelist[0])
if diffmatch < diff:
flux_wave = flux
diff = diffmatch
wavematch = wave
print wavelist[0],flux_wave,wavematch
but the program always return the same flux value even though the wavelength is different. Please help...

I would skip the creation of the two dimensional table altogether and just use interp:
fluxvalues = np.interp(wavelist, wave_tg, flux_tg)
For the file you posted, the code you posted doesn't work due to the hard-coded length of the wave_tg array. I would therefore recommend you rather use
wave_tg = crval_tg + np.arange(len(flux_tg))*cdel_tg
Also, for some reason it seems that the file you posted doesn't actually go up to the wavelengths you are looking up. You might need to check that you are calculating the corresponding wavelengths correctly or check that you are looking up the right wavelengths.

I've made some changes in your code:
using numpy ot create wave_flux as a ndarray using np.hstack(), np.repeat() and np.tile()
using fancy indexing to get the values matching your search
The resulting code is:
#modules
import scipy
import numpy as np
import pyfits as pf
#Target Global Vaiables
hdulist_tg = pf.open('cutmask1-2.0001.fits')
hdr_tg = hdulist_tg[0].header
flux_tg = hdulist_tg[0].data
crval_tg = hdr_tg['CRVAL1'] #Starting wavelength
cdel_tg = hdr_tg['CDELT1'] #Wavelength axis width
wave_tg = crval_tg + np.arange(3183)*cdel_tg #Create an x-axis
wavelist = [6207,6315,6369,6438,6490,6565,6588]
wave_flux = np.vstack(( np.repeat(wave_tg, len(flux_tg)),
np.tile(flux_tg, len(wave_tg)) )).transpose()
wave_ref = wavelist[0]
diff = 10
print wave_flux[ np.abs(wave_flux[:,0]-wave_ref) < diff ]
Which will return a sub-group of wave_flux with the wave values in column 0 and flux values in column 1:
[[ 6197.10300138 500.21020508]
[ 6197.10300138 523.24102783]
[ 6197.10300138 510.6390686 ]
...,
[ 6216.68436446 674.94732666]
[ 6216.68436446 684.74255371]
[ 6216.68436446 712.20098877]]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Subtract, Average, Squeeze, then Subset Variable in NetCDF via Python - python

Related

Aggregating points using xarray

How to interpolate numpy.polyval and numpy.polyfit python

How to do a second interpolation in python

linear interpolation with grided data in python

Python 2D array -- How to plug in x and retrieve y value?

Categories

Resources