I am currently trying to plot some data with cartopy, but I am having some issues. I have a dataset with shape (180, 180, 360): time, lat, and lon respectively.
I would like to get an annual mean of this data. I have been using the code
def global_mean_3D(var, weights):
    # make sure masking is correct, otherwise we get nans
    var = np.ma.masked_invalid(var)
    # resulting variable should have dimensions of depth and time (x)
    ave = np.zeros([var.shape[0], var.shape[1]])
    # loop over time
    for t in np.arange(var.shape[0]):
        # loop over each depth slice
        for d in np.arange(var.shape[1]):
            ave[t, d] = np.ma.average(var[t, d, :], weights=weights)
    return ave
which I then use to plot
ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines()
ax.contourf(x, y, ann_total_5tg)
But this code gives me a one-dimensional result over time, which I can't plot with cartopy using pcolormesh.
I am left with the error
TypeError: Input z must be a 2D array.
Would it be possible to get an annual mean while keeping the lat and lon dimensions of the dataset?
I suspect that you have to reshape your numpy array to use it with the contourf method. Using your variable name, it can be done like this:
ann_total_5tg = ann_total_5tg.reshape((180, 180))
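That said, if the goal is an annual mean that keeps the lat/lon grid, another option is to average over the time axis only, so the result is already 2D. A minimal sketch, assuming var is the (time, lat, lon) array and that x and y hold the lon/lat coordinates:

import numpy as np

# mask invalid values so they don't poison the mean
var = np.ma.masked_invalid(var)

# average over axis 0 (time) only; the result keeps shape (lat, lon) = (180, 360)
ann_mean = var.mean(axis=0)

ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines()
# tell cartopy the data coordinates are plain lat/lon
ax.pcolormesh(x, y, ann_mean, transform=ccrs.PlateCarree())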
I have 2 separate arrays with different sizes:
len(range_data) = 4320
len(az1) = 385
len(az2) = 347
data1.shape = (385,4320)
data2.shape = (347,4320)
I would like the dimensions of data2 to equal those of data1, such that data2.shape should be (385, 4320). I have tried scipy's interpolate, such as:
f = interpolate.interp2d(az1, range_data, data1, kind='cubic')
znew = f(az2, range_data)
print(znew.shape)
# (347, 4320)
znew.shape should be (385, 4320). Any ideas why this is happening and/or what might need to be done to fix it?
I don't think that interp2d actually generates more points for you; it defines an interpolation function over a grid. That means that what you've created is a way to interpolate points within the grid defined by your first set of data points. znew will return an interpolated grid with the same number of values as the x and y passed to it.
See the source code:

Returns
-------
z : 2-D array with shape (len(y), len(x))
    The interpolated values.
If you want to add extra data points, I would suggest deriving a regression function (or whatever ML technique you want, NNs if you're so inclined) on the second data set and using that function to produce the extra 38 data points you need.
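Alternatively, if resampling data2 onto the az1 grid is acceptable, a 1-D interpolation along the azimuth axis gets the shapes to match. A minimal sketch, assuming az2 is sorted and using extrapolation for any az1 values that fall outside its range:

from scipy import interpolate

# interpolate data2 along its azimuth axis (axis 0, length 347)
f = interpolate.interp1d(az2, data2, axis=0, kind='cubic',
                         fill_value='extrapolate')

# evaluate at the 385 azimuths of the first dataset
data2_resampled = f(az1)
print(data2_resampled.shape)  # (385, 4320)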
I have a function f(u,v,w) which I would like to interpolate using a scipy function (with linear interpolation). This is easy enough.
When I run the interpolation step, I simply do the following (interpolating over a u,v,w grid):
u = np.linspace(-1, 1, 100)
v = np.linspace(-2, 2, 50)
w = np.linspace(3, 8, 30)

values_grid = np.zeros((len(u), len(v), len(w)))
for i in range(len(u)):
    for j in range(len(v)):
        for k in range(len(w)):
            values_grid[i, j, k] = f(u[i], v[j], w[k])

from scipy.interpolate import RegularGridInterpolator
my_interpolating_function = RegularGridInterpolator(
    (u, v, w), values_grid, method='linear', bounds_error=False, fill_value=-999)
This is fine for many cases. However, when I want to evaluate this interpolation function, it seems I am required to use inputs with shape (number of input samples) x (dimension of samples). E.g.:
func_input = np.vstack([u_samps, v_samps, w_samps]).T  # e.g. shape is (500, 3)
output = my_interpolating_function(func_input)  # has output shape (500,)
This works fine. The issue is that I would like to evaluate this function over a grid where the samples have the following shape
shape(u_samps) = 500
shape(v_samps) = (100,100)
shape(w_samps) = (100,100)
Meaning I would like to evaluate
my_interpolating_function([u_samps, v_samps, w_samps])
and get out an array which has shape (500, 100, 100) (so the interpolation is evaluated for all 500 u_samps over the v_samps and w_samps grids). I can flatten the v_samps and w_samps arrays, but then I have to make several hundred copies of u_samps to get the inputs into the correct format. So is there any way to have an interpolation function that can take the inputs above (u_samps, v_samps, w_samps with the specified shapes) and get out an array with shape (500, 100, 100) efficiently?
Any help is greatly appreciated; I have been stuck on this problem and it's really holding up my progress! The end goal is to use this function in a statistical likelihood which needs to be sampled with MCMC, so speed is pretty important (and making hundreds of copies of massive arrays is very slow).
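One way around the copies: RegularGridInterpolator's __call__ accepts any array of points whose last axis has length ndim, so the three sample arrays can be broadcast together into a (500, 100, 100, 3) point array without repeating u_samps by hand. A minimal sketch, assuming the shapes from the question:

import numpy as np

# broadcast the coordinate arrays against each other:
# u varies along axis 0, v and w along axes 1 and 2
pts = np.stack(np.broadcast_arrays(u_samps[:, None, None],  # (500, 1, 1)
                                   v_samps[None, :, :],     # (1, 100, 100)
                                   w_samps[None, :, :]),    # (1, 100, 100)
               axis=-1)                                     # (500, 100, 100, 3)

output = my_interpolating_function(pts)  # shape (500, 100, 100)

This allocates one (500, 100, 100, 3) array rather than hundreds of copies of u_samps, and the evaluation itself stays vectorized.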
I have used all mpl_toolkits.basemap functions successfully on several global GCM netCDF datasets, until I met this grid, with longitudes starting at 0.9375 (instead of 0, as I have always seen) and ending at 359.062.
To prepare a plot, I first need to make the plot continuous with:
# input_var is a 2D numpy array
var_cyclicDUMMY, lons_cyclicDUMMY = addcyclic(input_var, lons)
I thus obtain a 2D array var_cyclicDUMMY with an extra column (one extra longitude), and a 1D array lons_cyclicDUMMY with one extra element at the end, i.e. one extra longitude, but at 0.9375 instead of the 360 that is needed.
Indeed, in the next step I shift the grid, so longitudes go from -180 to 180 instead of 0 to 360, with:

var_cyclic, lons_cyclic = shiftgrid(180., var_cyclicDUMMY,
                                    lons_cyclicDUMMY, start=False)
I get a ValueError: lon0 outside of range of lonsin.
Any suggestions how to get around this with basemap or another solution?
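One workaround, given the diagnosis above, is to repair the cyclic longitude by hand before calling shiftgrid: replace the wrapped 0.9375 that addcyclic appended with the same value plus 360, so that lon0=180 falls inside the range of lonsin. A minimal sketch, assuming the variable names from the question:

# addcyclic appended lons[0] (0.9375) instead of lons[0] + 360;
# fix it so the longitudes increase monotonically up to 360.9375
lons_cyclicDUMMY[-1] = lons_cyclicDUMMY[0] + 360.

# 180 now lies within the longitude range, so shiftgrid works
var_cyclic, lons_cyclic = shiftgrid(180., var_cyclicDUMMY,
                                    lons_cyclicDUMMY, start=False)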
I am new to this so I apologize if I am missing something. I am trying to get a probability range of a dataset with three dimensions (time, lat, lon). For 1 "cell" (single lat/lon combination), I have done the following:
import numpy as np
from scipy.stats import gaussian_kde

# create some data
mu, sigma = 0, 0.1
s = np.random.normal(mu, sigma, 900)

# get 90th - 100th percentiles
t_90x_ref = np.percentile(s, 90, interpolation="nearest")
t_100x_ref = np.percentile(s, 100, interpolation="nearest")

# apply gaussian_kde function
AbnomRef_pdf = gaussian_kde(s)

# get probability range
Prob_range_90_100_Ref = AbnomRef_pdf.integrate_box_1d(t_90x_ref, t_100x_ref) * 100
I would now like to repeat this exact process for each grid cell (lat/lon combination) along the time axis (with 900 timesteps, like above).
lat = np.linspace(-38.28, 34.76, 167)
lon = np.linspace(143.92, 207.72, 146)

# 3-dim data
Anomalies_ref = np.random.rand(900, 167, 146)

# get percentiles for 3-dim data
t_90x_ref = np.percentile(Anomalies_ref, 90, interpolation="nearest", axis=0)
Here is where I get stuck with the gaussian_kde function (neither a for-loop worked, nor was I able to flatten the gaussian_kde results). I have seen the case Using scipy.stats.gaussian_kde with 2 dimensional data but can't really apply it to my problem. Ultimately, my goal is to get a Prob_range_90_100_Ref with shape (167, 146).
Any help would be very much appreciated!
Thanks!
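Since gaussian_kde fits a separate 1-D density per grid cell, one straightforward option is a plain double loop over lat and lon, applying the single-cell recipe above to each 900-step time series. A minimal sketch, assuming the arrays from the question (it fits 167 x 146 KDEs, so it is not fast, but it stays simple):

nlat, nlon = Anomalies_ref.shape[1:]
Prob_range_90_100_Ref = np.empty((nlat, nlon))

for i in range(nlat):
    for j in range(nlon):
        s = Anomalies_ref[:, i, j]  # the 900 timesteps of one cell
        t90 = np.percentile(s, 90, interpolation="nearest")
        t100 = np.percentile(s, 100, interpolation="nearest")
        pdf = gaussian_kde(s)
        Prob_range_90_100_Ref[i, j] = pdf.integrate_box_1d(t90, t100) * 100

print(Prob_range_90_100_Ref.shape)  # (167, 146)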
I have two datasets of a specific region: the first is the rainfall and the second a vegetation measure (NPP) of that region. The first two dimensions (x, y) represent the geographical location, and the third dimension is the time (8 time steps). What I want to do is perform a linear regression for each location of the 8 rainfall values versus the 8 vegetation values. The result should be either several two-dimensional arrays in which, for each location, the p-value, the r², the slope, and ideally the residuals are calculated, or all values together in a 3D array.
import glob
import numpy as np
from osgeo import gdal
from scipy import stats

nppList = glob.glob(nppPath + "*.img")
rainList = glob.glob(rainPath + "*.img")

nppImg = [gdal.Open(i) for i in nppList]
rainImg = [gdal.Open(i) for i in rainList]

nppFiles = [i.ReadAsArray() for i in nppImg]
rainFiles = [i.ReadAsArray() for i in rainImg]

# stack the 8 time steps into (x, y, 8) arrays
# (assumed step; the original snippet uses nppStack/rainStack without defining them)
nppStack = np.dstack(nppFiles)
rainStack = np.dstack(rainFiles)

# get nodata
nppNodata = nppImg[1].GetRasterBand(1).GetNoDataValue()
rainNodata = rainImg[1].GetRasterBand(1).GetNoDataValue()

# convert to float and set no data
nppStack = nppStack.astype(float)
nppStack[nppStack == nppNodata] = np.nan
rainStack = rainStack.astype(float)
rainStack[rainStack == rainNodata] = np.nan

# instead of range(0, 8) there should be the rainfall variable, but on a pixel basis
def linReg(a):
    return stats.linregress(a, range(0, 8))

lm = np.apply_along_axis(linReg, axis=2, arr=nppStack)
I know the function numpy.apply_along_axis(), but there a function can be applied to only one array. I am searching for a possibility to apply a function to two arrays along an axis, preferably without looping through the arrays.
The source for scipy.stats.linregress indicates that arrays with more than two dimensions are not supported (and two-dimensional input only works when your x and y data happen to be in the same data structure).
Honestly, in your case I would use a Python loop -- it is unlikely that the slowest part of the code is looping over the data points; rather, the regression itself will be determining the speed.
In that case, you could flatten your positional axes, use a single loop, and then reshape the regression results back to 3D. Something like:
nx, ny = rainStack.shape[:2]
n = nx * ny

frain = rainStack.reshape((n, 8))
fnpp = nppStack.reshape((n, 8))

# linregress returns 5 values: slope, intercept, rvalue, pvalue, stderr
reg_results = np.empty((n, 5))
for i in range(n):
    reg_results[i] = stats.linregress(frain[i], fnpp[i])

reg_results = reg_results.reshape((nx, ny, 5))  # back to 3D
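From there, the per-location maps the question asks for are just slices of that (nx, ny, 5) array. A hypothetical extraction (note that linregress does not return residuals directly; r² is the squared rvalue):

slope_map = reg_results[..., 0]    # slope per pixel
r2_map = reg_results[..., 2] ** 2  # r² = rvalue squared
p_map = reg_results[..., 3]        # p-value per pixel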