How to interpolate in n dimensions with different length axes - python

So I was looking at how to use scipy's interpn function, and the example they have in the documentation isn't quite doing what I need it to do.
My implementation is a bit different. I have a precomputed value array with shape [200,40,40,40] that I get from a different script.
So when I do something like:
t = np.linspace(0,1, 200)
x = np.linspace(0,1, 40)
y = np.linspace(0,1, 40)
z = np.linspace(0,1, 40)
points = (t,x,y,z)
interpn(points,values,point)
I get an error: "ValueError: There are 40 points and 200 values in dimension 0"
It seems as though the dimensions of my points tuple and value array are not lining up, but I thought that since my "t" axis is first in the tuple, it should match. Any advice?

So this works for me:
import numpy as np
from scipy.interpolate import interpn
def f(x,y,z,t):
    '''Simple 3D + time dimensional function.'''
    return (np.sin(x)+y+np.sqrt(z))*t
t = np.linspace(0,1,200)
x = np.linspace(0,1,40)
y = np.linspace(0,1,40)
z = np.linspace(0,1,40)
points = (x,y,z,t)
# indexing='ij' keeps the axes of values in the same order as points: shape (40, 40, 40, 200)
values = f(*np.meshgrid(*points, indexing='ij'))
# example point in domain
point = [0,0.5,0.75,1/3.]
print(interpn(points, values, point))
# approximately [0.45533]; the exact value f(0, 0.5, 0.75, 1/3) is about 0.45534
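You defined x, y, z as np.linspace(0,40,1). The third argument is the number of samples, so this returns a single point (0.0) rather than 40 points. The same goes for t. That's probably your error. Also make sure the axes in points are listed in the same order as the dimensions of values. Example taken from the official scipy documentation.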
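A minimal sketch of the layout from the question, assuming a stand-in formula for the precomputed (200, 40, 40, 40) array (the real values come from the asker's other script). Each entry of points must describe the axis of values with the same index, and each query row must list its coordinates in that same order. The stand-in array takes roughly 100 MB as float64.

```python
import numpy as np
from scipy.interpolate import interpn

# Grid axes listed in the same order as the axes of the value array: (t, x, y, z)
t = np.linspace(0, 1, 200)
x = np.linspace(0, 1, 40)
y = np.linspace(0, 1, 40)
z = np.linspace(0, 1, 40)

# Stand-in for the precomputed array of shape (200, 40, 40, 40).
# sparse=True avoids building four full-size coordinate arrays, and
# indexing="ij" keeps the output axes in the input order (t, x, y, z).
T, X, Y, Z = np.meshgrid(t, x, y, z, indexing="ij", sparse=True)
values = (np.sin(X) + Y + np.sqrt(Z)) * T
print(values.shape)  # (200, 40, 40, 40)

points = (t, x, y, z)
# One query point per row, coordinates in the same (t, x, y, z) order
xi = np.array([[1 / 3, 0.0, 0.5, 0.75],
               [0.5, 0.25, 0.25, 0.25]])
result = interpn(points, values, xi)
print(result.shape)  # (2,)

# Listing the 200-point axis anywhere but first no longer matches the array
error_message = None
try:
    interpn((x, y, z, t), values, xi)
except ValueError as err:
    error_message = str(err)
print(error_message)
```

With the mismatched order, SciPy reports a point/value count mismatch in dimension 0, which is the kind of error quoted in the question.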

Related

What is the best way/method to digitize the data of a 3D surface into a grid of pixels with smaller resolution in Python?

I want to digitize (= average out over cells) photon count data into pixels given by a grid that tells how they are aligned. The photon count data is stored in a 2D array. I want to split that data into cells, each of which would correspond to a pixel. The idea is basically the same as changing an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np
def digitize(function_data, grid_shape):
    """
    function_data = 2D array of function values of some 3D shape,
    e.g. exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l/grid_shape[0])
    pixel_len_y = int(l/grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]): #row-index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]): #column-index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(z_data[k][j:j*pixel_len_x])
            hd_pixel = np.ravel(hd_pixel) #turns 2D array into 1D to be able to compute average
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
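The block averaging described in the question can be sketched without explicit loops: reshape the array so that every coarse pixel gets its own pair of axes, then average over those axes. This assumes each dimension of the data is an exact multiple of the corresponding entry of grid_shape; the function keeps the name digitize from the question, and the sample z_data is an illustrative Gaussian.

```python
import numpy as np

def digitize(function_data, grid_shape):
    """Average a 2D array over equal-sized rectangular cells.

    Assumes each dimension of function_data is an exact multiple of the
    corresponding entry of grid_shape.
    """
    data = np.asarray(function_data, dtype=float)
    n_rows, n_cols = grid_shape
    cell_h = data.shape[0] // n_rows
    cell_w = data.shape[1] // n_cols
    if cell_h * n_rows != data.shape[0] or cell_w * n_cols != data.shape[1]:
        raise ValueError("data shape must be a multiple of grid_shape")
    # Axis 0 -> (pixel row, row inside pixel), axis 1 -> (pixel column, column inside pixel)
    blocks = data.reshape(n_rows, cell_h, n_cols, cell_w)
    # Average over the positions inside each pixel
    return blocks.mean(axis=(1, 3))

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
z_data = np.exp(-(x[:, None]**2 + y[None, :]**2))
low_res = digitize(z_data, (10, 20))
print(low_res.shape)  # (10, 20)
```

Rows and columns are handled separately, so non-square inputs and non-square target grids work as long as the sizes divide evenly.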
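You could also use an interpolation function, if you can use SciPy. Here we use one of the gridded-data interpolators, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites. Evaluating the same interpolator on a coarser grid downsamples instead, although that samples the surface at the new points rather than averaging over cells.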
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs
# Sampling coordinates
x = np.linspace(-2,2,20)
y = np.linspace(-2,2,30)
# Your function
f = np.exp(-(x[:,None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid = True)
# Some plots
fig, ax = plt.subplots(ncols = 2)
ax[0].imshow(f)
ax[1].imshow(f_hd)
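Since the question asks for averages over cells rather than a finer grid, one option (an illustration, not part of the answer above) is to integrate the fitted spline over each coarse pixel with RectBivariateSpline.integral and divide by the pixel area. The 4 x 6 target grid below is an arbitrary choice.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Same sampled function as above
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 30)
f = np.exp(-(x[:, None]**2 + y**2))
interp = RectBivariateSpline(x, y, f)

# Edges of a coarse 4 x 6 pixel grid covering the same domain
n_px, n_py = 4, 6
x_edges = np.linspace(x.min(), x.max(), n_px + 1)
y_edges = np.linspace(y.min(), y.max(), n_py + 1)

cell_avg = np.empty((n_px, n_py))
for i in range(n_px):
    for j in range(n_py):
        area = (x_edges[i + 1] - x_edges[i]) * (y_edges[j + 1] - y_edges[j])
        # integral() integrates the fitted spline over the rectangle
        total = interp.integral(x_edges[i], x_edges[i + 1], y_edges[j], y_edges[j + 1])
        cell_avg[i, j] = total / area
print(cell_avg.shape)  # (4, 6)
```

This averages the smooth spline surface, so it also works when the cell boundaries do not line up with the original sample points.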

Error when trying to interpolate using SmoothSphereBivariateSpline(): "ValueError: Error code returned by bispev: 10"

I want to interpolate data, which is randomly scattered on the surface of a sphere, onto a regular longitude/latitude grid. I tried to do this with SmoothSphereBivariateSpline() from the scipy.interpolate package (see the code below).
import numpy as np
from scipy.interpolate import SmoothSphereBivariateSpline
#Define the input data and the original sampling points
NSamp = 2000
Theta = np.random.uniform(0,np.pi,NSamp)
Phi = np.random.uniform(0,2*np.pi, NSamp)
Data = np.ones(NSamp)
Interpolator = SmoothSphereBivariateSpline(Theta, Phi, Data, s=3.5)
#Prepare the grid to which the input shall be interpolated
NLon = 64
NLat = 32
GridPosLons = np.arange(NLon)/NLon * 2 * np.pi
GridPosLats = np.arange(NLat)/NLat * np.pi
LatsGrid, LonsGrid = np.meshgrid(GridPosLats, GridPosLons)
Lats = LatsGrid.ravel()
Lons = LonsGrid.ravel()
#Interpolate
Interpolator(Lats, Lons)
However, when I execute this code it gives me the following error:
ValueError: Error code returned by bispev: 10
Does anyone know what the problem is and how to fix it? Is this a bug or am I doing something wrong?
In the documentation of __call__ method of SmoothSphereBivariateSpline, note the grid flag (some other interpolators have it too). If True, it's understood that you are entering one-dimensional arrays from which a grid is to be formed. This is the default value. But you already made a meshgrid from your one-dimensional arrays, so this default behavior doesn't work for your input.
Solution: use either
Interpolator(Lats, Lons, grid=False)
or, which is simpler and better:
Interpolator(GridPosLats, GridPosLons)
The latter will return the data in grid form (2D array), which makes more sense than the flattened data you would get with the first version.
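The difference between the two calls can be checked directly. The sketch below reuses the question's setup with a seeded NumPy generator (an addition, so the run is reproducible). The grid call returns a (NLat, NLon) array, and the grid=False call on the meshgrid coordinates returns the same values flattened in longitude-major order.

```python
import numpy as np
from scipy.interpolate import SmoothSphereBivariateSpline

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
NSamp = 2000
Theta = rng.uniform(0, np.pi, NSamp)
Phi = rng.uniform(0, 2 * np.pi, NSamp)
Data = np.ones(NSamp)
Interpolator = SmoothSphereBivariateSpline(Theta, Phi, Data, s=3.5)

NLon, NLat = 64, 32
GridPosLons = np.arange(NLon) / NLon * 2 * np.pi
GridPosLats = np.arange(NLat) / NLat * np.pi

# grid=True (the default): pass the sorted 1-D axes, get a (NLat, NLon) array
grid_vals = Interpolator(GridPosLats, GridPosLons)

# grid=False: pass matching flattened coordinate arrays, get one value per pair
LatsGrid, LonsGrid = np.meshgrid(GridPosLats, GridPosLons)
flat_vals = Interpolator(LatsGrid.ravel(), LonsGrid.ravel(), grid=False)

print(grid_vals.shape, flat_vals.shape)  # (32, 64) (2048,)
```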

scipy splrep() with weights not fitting the given curve

Using scipy's splrep I can easily fit a test sinewave:
import numpy as np
from scipy.interpolate import splrep, splev
import matplotlib.pyplot as plt
plt.style.use("ggplot")
# Generate test sinewave
x = np.arange(0, 20, .1)
y = np.sin(x)
# Interpolate
tck = splrep(x, y)
x_spl = x + 0.05  # Just to show it works
y_spl = splev(x_spl, tck)
plt.plot(x_spl, y_spl)
The splrep documentation states that the default value for the weight parameter is np.ones(len(x)). However, plotting this results in a totally different plot:
tck = splrep(x, y, w=np.ones(len(x_spl)))
y_spl = splev(x_spl, tck)
plt.plot(x_spl, y_spl)
The documentation also states that the smoothing condition s is different when a weight array is given - but even when setting s=len(x_spl) - np.sqrt(2*len(x_spl)) (the default value without a weight array) the result does not strictly correspond to the original curve as shown in the plot.
What do I need to change in the code listed above in order to make the interpolation with weight array (as listed above) output the same result as the interpolation without the weights?
I have tested this with scipy 0.17.0. Gist with a test IPython notebook
You only have to change one line of your code to get the identical output:
tck = splrep(x, y, w=np.ones(len(x_spl)))
should become
tck = splrep(x, y, w=np.ones(len(x_spl)), s=0)
So, the only difference is that you have to specify s instead of using the default one.
When you look at the source code of splrep you will see why that is necessary:
if w is None:
    w = ones(m, float)
    if s is None:
        s = 0.0
else:
    w = atleast_1d(w)
    if s is None:
        s = m - sqrt(2*m)
which means that, if neither weights nor s are provided, s is set to 0 and if you provide weights but no s then s = m - sqrt(2*m) where m = len(x).
So, in your example above you compare outputs with the same weights but with different s (which are 0 and m - sqrt(2*m), respectively).
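A quick way to confirm this, reusing the sine wave from the question: unit weights together with an explicit s=0 reproduce the default fit, while unit weights with the default s give a heavily smoothed curve.

```python
import numpy as np
from scipy.interpolate import splrep, splev

x = np.arange(0, 20, .1)
y = np.sin(x)
x_spl = x + 0.05

tck_default = splrep(x, y)                            # w=None, s=None -> s = 0
tck_weighted = splrep(x, y, w=np.ones(len(x)), s=0)   # unit weights, explicit s = 0
tck_smoothed = splrep(x, y, w=np.ones(len(x)))        # unit weights, s = m - sqrt(2*m)

y_default = splev(x_spl, tck_default)
y_weighted = splev(x_spl, tck_weighted)
y_smoothed = splev(x_spl, tck_smoothed)

print(np.allclose(y_default, y_weighted))  # True
print(np.max(np.abs(y_smoothed - y_default)))
```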

Return the value of a 2D PDF given x and y in Python?

I have some data whose PDF I plotted using matplotlib's hist2d function.
The hist2d function returns the arrays H, xedges and yedges (plus the image it draws), H being the 2D histogram values.
Now I'd like to turn this discrete H matrix into a function that returns the value of H for any given (x,y) input.
In other words I'd like to turn my 2D histogram into a 2D step function. Is there a specific function that would be computationally cheap that I could use on that purpose?
This looks like a pretty simple operation (usually done for image processing but with pixel indices instead of real numbers) but I'm unable to find anything about it, can you please help me?
You can construct an interpolator from the counts like this:
from numpy import random, histogram2d, diff
import matplotlib.pyplot as plt
from scipy.interpolate import interp2d
# Generate sample data
n = 10000
x = random.randn(n)
y = -x + random.randn(n)
# bin
nbins = 100
H, xedges, yedges = histogram2d(x, y, bins=nbins)
# Figure out centers of bins
def centers(edges):
    return edges[:-1] + diff(edges[:2])/2
xcenters = centers(xedges)
ycenters = centers(yedges)
# Construct interpolator; interp2d expects z[j, i] at (x[i], y[j]),
# while histogram2d returns H[x_bin, y_bin], hence the transpose
pdf = interp2d(xcenters, ycenters, H.T)
# test
plt.pcolor(xedges, yedges, pdf(xedges, yedges))
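Note that this will be linearly interpolated rather than step-wise. Also, interp2d is deprecated since SciPy 1.10 and was removed in SciPy 1.14, so on recent versions the snippet above needs a replacement such as RegularGridInterpolator. For a quicker, step-wise version which assumes a regular grid, this will also work: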
from numpy import meshgrid, vectorize
def position(edges, value):
    return int((value - edges[0])/(edges[1] - edges[0]))
@vectorize
def pdf2(x, y):
    # histogram2d returns H indexed as H[x_bin, y_bin]
    return H[position(xedges, x), position(yedges, y)]
# test - note we need the meshgrid here to get the right shapes
xx, yy = meshgrid(xcenters, ycenters)
plt.pcolor(xedges, yedges, pdf2(xx, yy))
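As a vectorized alternative to pdf2 that avoids np.vectorize, the bin lookup can be done with np.searchsorted on the edges. histogram_step_function is a hypothetical helper written for illustration; it returns 0 outside the histogram range and, like np.histogram2d, counts a value on the last edge in the last bin. Because it searches the edges instead of dividing by a bin width, it does not assume a regular grid.

```python
import numpy as np

def histogram_step_function(H, xedges, yedges):
    """Build f(x, y) returning H for the bin that contains (x, y).

    H is indexed as H[x_bin, y_bin], as returned by np.histogram2d.
    Points outside the histogram range give 0.
    """
    H = np.asarray(H)
    xedges = np.asarray(xedges, dtype=float)
    yedges = np.asarray(yedges, dtype=float)

    def bin_index(edges, values):
        # Index i such that edges[i] <= value < edges[i + 1]
        idx = np.searchsorted(edges, values, side="right") - 1
        # A value equal to the last edge belongs to the last bin
        return np.where(values == edges[-1], len(edges) - 2, idx)

    def f(x, y):
        x, y = np.broadcast_arrays(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
        ix = bin_index(xedges, x)
        iy = bin_index(yedges, y)
        inside = (ix >= 0) & (ix < H.shape[0]) & (iy >= 0) & (iy < H.shape[1])
        out = np.zeros(x.shape, dtype=H.dtype)
        out[inside] = H[ix[inside], iy[inside]]
        return out

    return f

rng = np.random.default_rng(0)
n = 10000
xs = rng.standard_normal(n)
ys = -xs + rng.standard_normal(n)
H, xedges, yedges = np.histogram2d(xs, ys, bins=100)
pdf = histogram_step_function(H, xedges, yedges)

# Evaluating at the bin centers gives back the histogram itself
xcenters = (xedges[:-1] + xedges[1:]) / 2
ycenters = (yedges[:-1] + yedges[1:]) / 2
xx, yy = np.meshgrid(xcenters, ycenters, indexing="ij")
print(np.array_equal(pdf(xx, yy), H))  # True
```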

How to pick points under the curve?

What I'm trying to do is make a Gaussian function graph, then pick random numbers anywhere in a space, say y=[0,1] (because it's normalized) and x=[0,200]. Then I want it to ignore all values above the curve and keep only the values underneath it.
import numpy
import random
import math
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
from math import sqrt
from numpy import zeros
from numpy import numarray
variance = input("Input variance of the star:")
mean = input("Input mean of the star:")
x=numpy.linspace(0,200,1000)
sigma = sqrt(variance)
z = max(mlab.normpdf(x,mean,sigma))
foo = (mlab.normpdf(x,mean,sigma))/z
plt.plot(x,foo)
zing = random.random()
random = random.uniform(0,200)
import random
def method2(size):
    ret = set()
    while len(ret) < size:
        ret.add((random.random(), random.uniform(0,200)))
    return ret
size = input("Input number of simulations:")
foos = set(foo)
xx = set(x)
method = method2(size)
def undercurve(xx,foos,method):
    Upper = numpy.where(foos<(method))
    Lower = numpy.where(foos[Upper]>(method[Upper]))
    return (xx[Upper])[Lower],(foos[Upper])[Lower]
When I try to print undercurve, I get an error:
TypeError: 'set' object has no attribute '__getitem__'
and I have no idea how to fix it.
As you can all see, I'm quite new at python and programming in general, but any help is appreciated and if there are any questions I'll do my best to answer them.
The immediate cause of the error you're seeing is presumably this line (which should be identified by the full traceback -- it's generally quite helpful to post that):
Lower = numpy.where(foos[Upper]>(method[Upper]))
because the confusingly-named variable method is actually a set, as returned by your function method2. Actually, on second thought, foos is also a set, so it's probably failing on that first. Sets don't support indexing with something like the_set[index]; that's what the complaint about __getitem__ means.
I'm not entirely sure what all the parts of your code are intended to do; variable names like "foos" don't really help with that. So here's how I might do what you're trying to do:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# generate sample points
num_pts = 500
sample_xs = np.random.uniform(0, 200, size=num_pts)
sample_ys = np.random.uniform(0, 1, size=num_pts)
# define distribution
mean = 50
sigma = 10
# figure out "normalized" pdf vals at sample points
# (norm.pdf stands in for mlab.normpdf, which current matplotlib no longer provides)
max_pdf = norm.pdf(mean, mean, sigma)
sample_pdf_vals = norm.pdf(sample_xs, mean, sigma) / max_pdf
# which ones are under the curve?
under_curve = sample_ys < sample_pdf_vals
# get pdf vals to plot
x = np.linspace(0, 200, 1000)
pdf_vals = norm.pdf(x, mean, sigma) / max_pdf
# plot the samples and the curve
colors = np.array(['cyan' if b else 'red' for b in under_curve])
plt.scatter(sample_xs, sample_ys, c=colors)
plt.plot(x, pdf_vals)
plt.show()
Of course, you should also realize that if you only want the points under the curve, this is equivalent to (but much less efficient than) just sampling from the normal distribution and then randomly selecting a y for each sample uniformly from 0 to the pdf value there:
sample_xs = np.random.normal(mean, sigma, size=num_pts)
max_pdf = norm.pdf(mean, mean, sigma)
sample_pdf_vals = norm.pdf(sample_xs, mean, sigma) / max_pdf
# one uniform draw per sample, each between 0 and its own pdf value
sample_ys = np.random.uniform(0, sample_pdf_vals)
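If the curve is only available as sampled arrays (x and foo in the question) rather than as a formula, the question's undercurve idea works once the data stay in NumPy arrays instead of sets. Because random x values almost never coincide with grid points, this sketch evaluates the curve at the sampled x values with np.interp (linear interpolation, not part of the original code); the Gaussian parameters and the seeded generator are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Curve sampled on a fixed grid, normalized to a peak of 1 as in the question
mean, sigma = 50.0, 10.0
x = np.linspace(0, 200, 1000)
foo = norm.pdf(x, mean, sigma) / norm.pdf(mean, mean, sigma)

def undercurve(x, foo, sample_xs, sample_ys):
    """Return the sample points lying below the sampled curve (x, foo)."""
    # Evaluate the curve at the random x positions by linear interpolation
    curve_at_samples = np.interp(sample_xs, x, foo)
    below = sample_ys < curve_at_samples  # boolean mask, usable as an index
    return sample_xs[below], sample_ys[below]

sample_xs = rng.uniform(0, 200, size=500)
sample_ys = rng.uniform(0, 1, size=500)
xs_under, ys_under = undercurve(x, foo, sample_xs, sample_ys)
print(len(xs_under))
```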
It's hard to read your code. Anyway, you can't access a set using [], so foos[Upper], method[Upper], etc. are all illegal. I don't see why you convert foo and x into sets. In addition, for a point produced by method2, say (x0, y0), it is very likely that x0 is not present in x.
I'm not familiar with numpy, but this is what I'd do for the purpose you specified:
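import random
from scipy.stats import norm
def undercurve(size, mean, sigma):
    result = []
    peak = norm.pdf(mean, mean, sigma)  # scale the curve so its maximum is 1
    for i in range(size):
        x = random.uniform(0, 200)
        y = random.random()
        if y < norm.pdf(x, mean, sigma) / peak: # here's the 'undercurve'
            result.append((x, y))
    return result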
