scipy ND Interpolating over NaNs - python

I have been having trouble working out how to use the scipy.interpolate functions (either LinearNDInterpolator, griddata or, preferably, NearestNDInterpolator).
There are some tutorials online, but I am confused about what form my data needs to be in.
The online documentation for NearestNDInterpolator is not much help.
The function asks for:
x : (Npoints, Ndims) ndarray of floats
Data point coordinates.
y : (Npoints,) ndarray of float or complex
Data point values.
I have data in the form lat, long, data, time, held within an xarray dataset. There are some gaps in the data that I would like to fill in.
I don't understand how to tell the function my x points.
I have tried (lat,long) as a tuple and np.meshgrid(lat,long), but can't seem to get it going.
Any help on how I can pass my lat,long coordinates into the function? Bonus points for including the time coordinates as well, to make the estimates more robust through the third dimension.
Thanks!

i have tried (lat,long) as a tuple
If lat and long are 1D arrays or lists, try this:
points = np.array((lat, long)).T # make a 2D array of shape Npoints x 2
nd = NearestNDInterpolator(points, data)
Then you can compute interpolated values as nd(lat1, long1), etc.

SciPy provides multivariate interpolation methods both for unstructured data and for data points regularly placed on a grid. Unstructured data means the data can be provided as a list of points in no particular order. It seems that your data is structured: it is an array of size (480, 2040). However, NearestNDInterpolator works on unstructured data. The flatten method can be used to transform the array into a 1D list of values (of length 480*2040). The same has to be done for the coordinates: meshgrid is used to get the coordinates of every point of the grid, and flatten is again used to obtain a "list" of 2D coordinates (an array of shape 480*2040 x 2).
Here is an example that goes from structured data to unstructured data:
import numpy as np
lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon) # M is for Matrix
dataM = np.sin(latM)*np.cos(lonM) # example of data, Matrix form
from scipy.interpolate import NearestNDInterpolator
points = np.array((latM.flatten(), lonM.flatten())).T
print( points.shape )
# >>> (140, 2)
f_nearest = NearestNDInterpolator(points, dataM.flatten())
f_nearest(5, 5)
Working with NaNs should not be a big problem in this case: a NaN is just a missing point in the list. The only requirement is that the coordinates of the missing points are removed from the list too.
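As a minimal sketch of that last step, assuming the gappy field is a 2D NumPy array on a (lat, lon) grid: the interpolator is built from the valid points only and then evaluated at the gaps. The names field, valid and filled and the synthetic data are made up for illustration.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

# Synthetic gridded field with a few gaps (illustrative data only)
lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon, indexing="ij")  # both have shape (10, 14)
field = np.sin(latM) * np.cos(lonM)
field[3, 4] = np.nan
field[7, 1:3] = np.nan

# Keep only the points that actually have a value
valid = ~np.isnan(field)
points = np.column_stack((latM[valid], lonM[valid]))  # shape (n_valid, 2)
f_nearest = NearestNDInterpolator(points, field[valid])

# Evaluate the interpolator at the gaps only and write the result back
filled = field.copy()
filled[~valid] = f_nearest(latM[~valid], lonM[~valid])
```

The same pattern should extend to a time axis by adding a third coordinate column, although time and degrees are in different units, so the axes would probably need rescaling before a nearest-neighbour search is meaningful.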

Related

Scipy interpolation of non-uniform data

I have a set of data loaded in from a csv file, with 1D arrays representing the x,y,z coords of the data points, and another 1D array, T, representing the value of a field at the corresponding points. The points are not uniform in space.
I am struggling to interpolate T at a given point xi,yi,zi. scipy's interpn seems to want to accept T only as a 3D array, which doesn't make sense to me, as T is simply 1D data?
Any advice would be appreciated.
Edit:
Example:
import numpy as np
x = np.array([1.0,1.5,1.1,1.3,1.4])
y = np.array([1.1,1.3,1.2,1.4,1.45])
z = np.array([1.0,1.1,1.4,1.2,1.0])
T = np.array([5.0,5.1,5.4,4.6,4.9])
point = ([1.2,1.1,1.25])
from scipy.interpolate import interpn
out = interpn((x,y,z),T,point)
print(out)
Cheers
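interpn expects values on a regular grid, which is why it asks for a 3D array. For scattered points like these, one option is a scattered-data interpolator instead. A minimal sketch with NearestNDInterpolator, reusing the arrays from the question (this is one possible approach, not necessarily the best interpolation scheme for the data):

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

x = np.array([1.0, 1.5, 1.1, 1.3, 1.4])
y = np.array([1.1, 1.3, 1.2, 1.4, 1.45])
z = np.array([1.0, 1.1, 1.4, 1.2, 1.0])
T = np.array([5.0, 5.1, 5.4, 4.6, 4.9])

# Stack the coordinates into the (Npoints, Ndims) layout
points = np.column_stack((x, y, z))  # shape (5, 3)
f = NearestNDInterpolator(points, T)

# Query a single location; the result holds the value of the closest data point
out = f([1.2, 1.1, 1.25])
print(out)
```

LinearNDInterpolator takes the same arguments, but by default it returns NaN for query points outside the convex hull of the data.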

How to generate uniformly new 2-D data set in a given square in python

How can I generate a new, uniformly distributed 2-D data set (of dimension (10000, 2)) inside a given square in Python, if the coordinates of the four corners of the square are already known?
You can use numpy.random.uniform and specify the min and max values of each dimension.
import numpy as np
data = np.random.uniform((min_d0, min_d1), (max_d0, max_d1), (10000, 2))
You can check the documentation here: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.uniform.html
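A runnable version of the call above, assuming a square whose sides are parallel to the axes. The corner values are hypothetical and only serve to define min_d0, max_d0, min_d1 and max_d1.

```python
import numpy as np

# Hypothetical axis-aligned square with corners (1, 2), (4, 2), (4, 5), (1, 5)
min_d0, max_d0 = 1.0, 4.0
min_d1, max_d1 = 2.0, 5.0

# low and high broadcast against the requested shape, one bound per column
data = np.random.uniform((min_d0, min_d1), (max_d0, max_d1), (10000, 2))
print(data.shape)
```

A rotated square is not covered by this sketch; the samples would have to be drawn in an axis-aligned frame and then rotated.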

Given a 2D Numpy array representing a 2D distribution, how to sample data from this distribution with the aid of Numpy or Scipy functions?

Given a 2D numpy array dist with shape (200,200), where each entry of the array represents the joint probability of (x1, x2) for all x1 , x2 ∈ {0, 1, . . . , 199}. How do I sample bivariate data x = (x1, x2) from this probability distribution with the aid of Numpy or Scipy API?
This solution works with probability distributions of any number of dimensions, assuming they are a valid probability distribution (its contents must sum to 1, etc.). It flattens the distribution, samples from that, and adjusts the random index to match the original array shape.
import numpy as np
# Create a flat copy of the array
# (array is the 2D distribution, called dist in the question)
flat = array.flatten()
# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)
# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)
Also, to get multiple samples, add a size keyword argument to the np.random.choice call, and modify adjusted_index before printing it:
adjusted_index = np.array(zip(*adjusted_index))
This is necessary because np.unravel_index, given an array of sampled indices, returns one array of indices per coordinate dimension, so this zips them into a list of coordinate tuples. This is also much more efficient than simply repeating the first snippet.
Relevant documentation:
np.random.choice
np.unravel_index
Here's a way, but I'm sure there's a much more elegant solution using scipy.
numpy.random doesn't deal with 2d pmfs, so you have to do some reshaping gymnastics to go this way.
import numpy as np
# construct a toy joint pmf
dist = np.random.random(size=(200, 200))  # here's your joint pmf
dist /= dist.sum()  # it has to be normalized
# generate the set of all x,y pairs represented by the pmf
# (moveaxis keeps pairs[i, j] == (i, j); a plain .T would swap x and y)
pairs = np.moveaxis(np.indices(dimensions=(200, 200)), 0, -1)  # here are all of the x,y pairs
# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
n = 50
inds = np.random.choice(np.arange(200**2), p=dist.reshape(-1), size=n, replace=False)
# inds is the set of n randomly chosen indices into the flattened dist array...
# therefore the random x,y selections
# come from selecting the associated elements
# from the flattened pairs array
selections = pairs.reshape(-1, 2)[inds]
I can't comment either, but @applemonkey496's suggestion for getting multiple samples doesn't work as written. It's an excellent solution otherwise.
Instead of
adjusted_index = np.array(zip(*adjusted_index))
adjusted_index should be converted to a Python list before trying to put it into a numpy array (in Python 3, zip returns an iterator, which np.array does not unpack into rows), e.g.:
adjusted_index = np.array(list(zip(*adjusted_index)))
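Putting the pieces together, here is a minimal sketch of drawing several samples at once. It uses np.column_stack in place of list(zip(...)), and NumPy's Generator API (np.random.default_rng) rather than np.random.choice; both are substitutions made for this example, and the toy distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution (hypothetical data), normalised to sum to 1
dist = rng.random((200, 200))
dist /= dist.sum()

# Draw flat indices with probabilities given by the flattened distribution
flat_indices = rng.choice(dist.size, size=5, p=dist.ravel())

# Convert flat indices back to (x1, x2) pairs, one row per sample
samples = np.column_stack(np.unravel_index(flat_indices, dist.shape))
print(samples.shape)
```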
I can't comment, but to improve kevinkayaks's answer: the pairs array and the lookup
selections = pairs.reshape(-1,2)[inds]
are not needed. With m the number of columns of dist (200 here), they can be replaced by:
np.array([inds//m, inds%m]).T
which gives the (row, column) pair for each flat index. The matrix "pairs" is not needed anymore.

numpy 3D meshgrid only as a view

I have a cubic grid defined by the spacing xi,yi,zi:
xi,yi,zi = [linspace(ox,ox+s*d,s) for ox,s,d in zip(origin,size,delta)]
I also have a set of scalar values W on that grid, with W.shape == size. I'd like to use scipy's linear interpolation, which requires as input:
class scipy.interpolate.LinearNDInterpolator(points, values):
Parameters :
points : ndarray of floats, shape (npoints, ndims) Data point coordinates.
values : ndarray of float or complex, shape (npoints, ...) Data values.
How do I create a fake set of points (via magical broadcasting) from xi,yi,zi? Right now I'm creating an intermediate array to feed to the interpolation function - is there a better way?
Related Question: Numpy meshgrid in 3D. The answers in this post actually create the grid - I only want to simulate it as input to another function (pure numpy solution preferred).
>>> xi, yi, zi = [np.arange(3) for i in range(3)]
>>> xx, yy, zz = np.broadcast_arrays(xi,yi[:,np.newaxis],zi[:,np.newaxis,np.newaxis])
>>> xx.shape
(3, 3, 3)
>>> xx.strides
(0, 0, 8)
You can see it didn't create new copies since the strides are 0 in the first two dimensions.
I wrote a n dimensional version of this also:
def ndmesh(*args):
    args = map(np.asarray,args)
    return np.broadcast_arrays(*[x[(slice(None),)+(None,)*i] for i, x in enumerate(args)])
You can construct the necessary points array in a similar way as explained in the other answers:
xx, yy, zz = np.broadcast_arrays(xi[:,None,None], yi[None,:,None], zi[None,None,:])
points = (xx.ravel(), yy.ravel(), zz.ravel())
ip = LinearNDInterpolator(points, data.ravel())
However, if you have a regular grid, then using LinearNDInterpolator is most likely not the best choice, since it is designed for scattered data interpolation. It constructs a Delaunay triangulation of the data points, but in this case the original data has already a very regular structure that would be more efficient to make use of.
Since your grid is rectangular, you can build up the interpolation as a tensor product of three 1-D interpolations. Scipy doesn't have this built-in (so far), but it's fairly easy to do, see this thread: http://mail.scipy.org/pipermail/scipy-user/2012-June/032314.html
(use e.g. interp1d instead of pchip to get 1-D interpolation)
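Since that answer was written, SciPy has gained scipy.interpolate.RegularGridInterpolator, which takes the 1-D grid axes directly. A minimal sketch, with made-up axes and a made-up field W standing in for the real data:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid axes and values; W has shape (len(xi), len(yi), len(zi))
xi = np.linspace(0.0, 1.0, 4)
yi = np.linspace(0.0, 2.0, 5)
zi = np.linspace(0.0, 3.0, 6)
X, Y, Z = np.meshgrid(xi, yi, zi, indexing="ij", sparse=True)
W = X + 2 * Y + 3 * Z  # a linear field, so linear interpolation reproduces it

# Only the three 1-D axes are passed in; no (npoints, 3) array is built
ip = RegularGridInterpolator((xi, yi, zi), W, method="linear")
value = ip([[0.5, 1.0, 1.5]])
print(value)
```

This avoids both the Delaunay triangulation and the intermediate points array, at the cost of requiring the data to lie on a rectilinear grid.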
I do not believe there is any way to pass something to LinearNDInterpolator short of a full copy (and there are no functions for regular grids in three dimensions either). So the only place to avoid creating full arrays would be during the creation of this points array. I do not know how you do it right now, so maybe it is already efficient in this regard, but I guess it is likely not worth the trouble to avoid this.
Other than np.mgrid+reshape, something like this might be an option (not too hard to write for n dimensions either):
# Create broadcast versions of xi, yi and zi
# np.broadcast_arrays does not allocate the full arrays
xi, yi, zi = np.broadcast_arrays(xi[:,None,None], yi[None,:,None], zi[None,None,:])
# then you could use .flat to fill a point array:
points = np.empty((xi.size, 3), dtype=xi.dtype)
points[:,0] = xi.flat
points[:,1] = yi.flat
points[:,2] = zi.flat
As opposed to the .repeat function, the temporary arrays created here are no larger than the original xi, etc. arrays.

Numpy Index values for each element in 3D array

I have a 3D array created using the numpy mgrid command, so that each element has a certain value and the indexes retain the spatial information. For example, if one summed over the z-axis (3rd dimension), then the resultant 2D array could be used in matplotlib with the function imshow() to obtain an image with different binned pixel values.
My question is: How can I obtain the index values for each element in this grid (a,b,c)?
I need to use the index values to calculate the relative angle of each point to the origin of the grid (e.g. theta = sin^-1(sqrt(x^2+y^2)/sqrt(x^2+y^2+z^2))).
Maybe this can be translated to another 3D grid where each element is the array [a,b,c]?
I'm not exactly clear on your meaning, but if you are looking for 3d arrays that contain the indices x, y, and z, then the following may suit your needs; assume your data is held in a 3D array called "abc":
import numpy as nm
x,y,z = nm.mgrid[[slice(dm) for dm in abc.shape]]
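With the index arrays in hand, the angle from the question can be computed elementwise. A minimal sketch, assuming the origin is index (0, 0, 0) and using np.indices, which should give the same arrays as the mgrid call; the shape of abc is a placeholder.

```python
import numpy as np

abc = np.zeros((4, 5, 6))  # stand-in for the real data array (hypothetical shape)

# One index array per axis, each with the same shape as abc
x, y, z = np.indices(abc.shape)

r_xy = np.hypot(x, y)
r = np.sqrt(x**2 + y**2 + z**2)

# theta = arcsin(r_xy / r); the origin (r == 0) has no defined angle, so it stays NaN
theta = np.full(abc.shape, np.nan)
np.divide(r_xy, r, out=theta, where=r > 0)
theta = np.arcsin(np.clip(theta, 0.0, 1.0))
```

The clip guards against rounding pushing the ratio slightly above 1. If the grid origin is somewhere else, subtract its index from x, y and z first.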
