With scipy's griddata, used to perform interpolation (cubic and other methods), we have to pass as parameters both the data from which we interpolate and, at the same time, the new points at which we want to make a "prediction".
Is it possible to construct a "griddata object" that would have a method to predict new points without reconstructing a new interpolation spline each time?
(For example, as with a regression tree: we first construct the tree, then we apply the .predict(new_points) method.)
Here is an example:
import pandas as pd
import numpy as np
import sklearn
import scipy.interpolate as itp

n = 100
x1 = np.linspace(-2, 4, n)
X1 = []
X2 = []
for x in x1:
    X1.append([x for i in range(0, n)])
    X2.append(np.linspace(9, 15, n))
X1 = np.array(X1).flatten()
X2 = np.array(X2).flatten()
Y1 = np.exp(2 * X1)
Y2 = 3 * np.sqrt(X2)

# Data frames:
X = np.transpose([X1, X2])
X = pd.DataFrame(X, columns=["X1", "X2"])
Y = np.transpose([Y1, Y2])
Y = pd.DataFrame(Y, columns=["Y1", "Y2"])

X_new = np.transpose([[-2], [9]])
inter_cubic = itp.griddata(X, Y, X_new, method='cubic', fill_value=np.nan, rescale=False)
print(inter_cubic)
print(np.exp(2 * (-2)), 3 * np.sqrt(9))
Now inter_cubic is just a numpy array...
Is there a way of doing this, or can we use another "spline" constructor?
If you look at the source code for griddata (scroll down past the docstring to see the actual code), you'll see that it is a wrapper for several other interpolation functions, most of which work the way you want. In your case, with 2-d data and cubic interpolation, griddata does this:
ip = CloughTocher2DInterpolator(points, values, fill_value=fill_value,
                                rescale=rescale)
return ip(xi)
So instead of using griddata, you could use CloughTocher2DInterpolator. Specifically, using the names from your script, you would create the interpolator with
ip = itp.CloughTocher2DInterpolator(X, Y, fill_value=np.nan, rescale=False)
The object ip doesn't have a predict method; you just call it with the points at which you want to evaluate the interpolator. In your case, you would write
Y_new = ip(X_new)
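Putting the pieces together, here is a minimal self-contained sketch of the build-once, evaluate-many pattern (the sample data here is made up for illustration and is not the data from the question):

import numpy as np
import scipy.interpolate as itp

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(50, 2))    # 50 scattered 2-d sample points
vals = np.sin(pts[:, 0]) + pts[:, 1]**2  # scalar values at those points

# build the interpolant once (this is the expensive triangulation step)...
ip = itp.CloughTocher2DInterpolator(pts, vals, fill_value=np.nan, rescale=False)

# ...then evaluate it repeatedly on new query points, "predict"-style
print(ip(np.array([[0.5, 0.5]])))
print(ip(np.array([[0.2, 0.8], [0.9, 0.1]])))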
I have been able to interpolate values successfully from linear values of x to sine-like values of y.
However - I am struggling to interpolate the other way - from nonlinear values of y to linear values of x.
Below is a toy example:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

# create 100 x values
x = np.linspace(-np.pi, np.pi, 100)
# create 100 values of y where y = sin(x)
y = np.sin(x)
# learn the function mapping x to y
f = interpolate.interp1d(x, y)
With new values of linear x
xnew = np.array([-1,1])
I get correctly interpolated values of nonlinear y
ynew = f(xnew)
print(ynew)
array([-0.84114583, 0.84114583])
The problem comes when I try and interpolate values of x from y.
I create a new function, the reverse of f:
f2 = interpolate.interp1d(y,x,kind='cubic')
I put in values of y that I successfully interpolated before
ynew=np.array([-0.84114583, 0.84114583])
I am expecting to get the original values of x [-1, 1]
But I get:
array([-1.57328791, 1.57328791])
I have tried other values for the 'kind' parameter with no luck, and I am not sure whether I have the wrong approach here. Thanks for your help.
I guess the problem arises from the fact that x is not a function of y, since for a given y value there may be more than one x value.
Take a look at a truncated range of data.
When x ranges from 0 to np.pi/2, for every y value there is a unique x value.
In this case the snippet below works as expected.
>>> import numpy as np
>>> from scipy import interpolate
>>> x = np.linspace(0, np.pi / 2, 100)
>>> y = np.sin(x)
>>> f = interpolate.interp1d(x, y)
>>> f([0, 0.1, 0.3, 0.5])
array([0. , 0.09983071, 0.29551713, 0.47941047])
>>> f2 = interpolate.interp1d(y, x)
>>> f2([0, 0.09983071, 0.29551713, 0.47941047])
array([0. , 0.1 , 0.3 , 0.50000001])
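For the original full-period data, one fix (my sketch, not part of the original answer) is to restrict the data to a monotonic branch of the sine before inverting:

import numpy as np
from scipy import interpolate

# sin is monotonic on [-pi/2, pi/2], so x is a proper function of y here
x = np.linspace(-np.pi / 2, np.pi / 2, 100)
y = np.sin(x)
f2 = interpolate.interp1d(y, x, kind='cubic')
print(f2([-0.84114583, 0.84114583]))  # close to [-1, 1], as expected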
Maxim provided the reason for this behavior: this interpolation class is designed to work with functions, and in your case x = arcsin(y) is a function only on a limited interval. This leads to interesting phenomena in the interpolation routine, which interpolates between neighboring y-values; with the arcsin() branch structure, the nearest y-value is not necessarily the next point along the x-y curve but may lie several periods away. An illustration:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

xmin = -np.pi
xmax = np.pi
fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for i, fac in enumerate([2, 1, 0.5]):
    x = np.linspace(xmin * fac, xmax * fac, 100)
    y = np.sin(x)
    # x -> y
    f = interpolate.interp1d(x, y)
    x_fit = np.linspace(xmin * fac, xmax * fac, 1000)
    y_fit = f(x_fit)
    axes[i][0].plot(x_fit, y_fit)
    axes[i][0].set_ylabel(f"sin period {fac}")
    if not i:
        axes[i][0].set_title(label="interpolation x->y")
    # y -> x
    f2 = interpolate.interp1d(y, x)
    y2_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x2_fit = f2(y2_fit)
    axes[i][1].plot(x2_fit, y2_fit)
    if not i:
        axes[i][1].set_title(label="interpolation y->x")
    # y -> x with cubic interpolation
    f3 = interpolate.interp1d(y, x, kind="cubic")
    y3_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x3_fit = f3(y3_fit)
    axes[i][2].plot(x3_fit, y3_fit)
    if not i:
        axes[i][2].set_title(label="cubic interpolation y->x")
plt.show()
As you can see, the interpolation works along the ordered list of y-values (as you instructed it to), and this works particularly badly with cubic interpolation.
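A quick numerical check of this (my own sketch): with assume_sorted=False (the default), interp1d sorts the data by its first argument, so in the y->x case the points are ordered by y, and neighboring y-values can come from x-values on different branches of the sine:

import numpy as np

x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)
order = np.argsort(y)  # the ordering interp1d effectively uses for y -> x
# adjacent points in y-order can be far apart in x:
print(np.abs(np.diff(x[order])).max())  # jumps on the order of pi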
I have 2 numpy arrays with data, say x, y, and I apply plt.step() to get a continuous (step) curve from them.
I would like to be able to create this function myself, meaning I want a (zero-order hold) step approximation to the value of y for x values that do not actually exist in the original x array.
For example, at the following link I want to have the 'new' actual rectangular sine values, not only have them plotted:
https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html#sphx-glr-gallery-lines-bars-and-markers-step-demo-py
You can use scipy's interp1d to create a step function. By default the interpolation is 'linear', but you can change it to 'next', 'previous' or 'nearest' for a step function.
A standard step function is obtained with step_fun = interp1d(x, y, kind='previous') and is then called as step_fun(new_x).
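As a minimal sketch with made-up sample points:

import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])
step_fun = interp1d(x, y, kind='previous')  # zero-order hold
print(step_fun([0.5, 1.5, 2.9]))            # [0. 1. 4.]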
The following code compares different types of "interpolation":
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import interp1d

x = np.random.uniform(0.1, 0.7, 20).cumsum()
y = np.sin(x)
kinds = ['linear', 'previous', 'next', 'nearest', 'cubic']
for i, kind in enumerate(kinds):
    function_from_points = interp1d(x, y + i, kind=kind)
    x_detailed = np.linspace(x[0], x[-1], 1000)
    plt.plot(x_detailed, function_from_points(x_detailed), color='dodgerblue')
    plt.scatter(x, y + i, color='crimson')
plt.yticks(range(len(kinds)), kinds)
plt.show()
You can choose whichever tick values and corresponding function values you want. This is an example with unequally spaced arguments and their values:
x = np.arange(20) + np.random.random(20)/2
y = np.sin(x / 2)**2 + np.random.random(20)/5
Remark: these two arrays must have equal size. If you want your own custom function, you can use np.vectorize:
x = np.arange(20) + np.random.random(20)/2
func = np.vectorize(lambda x: np.sin(x) + np.random.random()/5)
y = func(x)
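For completeness, a zero-order hold can also be written with plain numpy, without scipy (a sketch of mine, not from the answers above):

import numpy as np

def zoh(x, y, x_new):
    """Return, for each query point, the y of the last sample at or
    before it (zero-order hold; queries assumed within [x[0], x[-1]])."""
    idx = np.searchsorted(x, x_new, side='right') - 1
    return y[np.clip(idx, 0, len(y) - 1)]

x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.array([1.0, -1.0, 0.5, 2.0])
print(zoh(x, y, np.array([0.3, 1.0, 3.9])))  # [ 1.  -1.   0.5]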
From https://stackoverflow.com/a/30460089/2202107, we can generate CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
N = 100
Z = np.random.normal(size = N)
# method 1
H, X1 = np.histogram(Z, bins=10, density=True)  # 'normed' was renamed to 'density' in modern numpy
dx = X1[1] - X1[0]
F1 = np.cumsum(H)*dx
#method 2
X2 = np.sort(Z)
F2 = np.array(range(N))/float(N)
# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only x (eg X2) and y (eg F2) coordinates?
My first thought was plt.plot(x, np.gradient(y)), but the gradient of y was all zero, since the data points are evenly spaced in y but not in x. This kind of data is often encountered in percentile calculations. The key is to get the data evenly spaced in x rather than in y, using interpolation:
x=X2
y=F2
num_points=10
xinterp = np.linspace(-2,2,num_points)
yinterp = np.interp(xinterp, x, y)
# for normalizing that sum of all bars equals to 1.0
tot_val=1.0
normalization_factor = tot_val/np.trapz(np.ones(len(xinterp)),yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
The output looks good to me.
I put my approach here for examination. Let me know if my logic is flawed.
One issue: when num_points is large, the plot looks bad, but that is a discretization issue, and I am not sure how to avoid it.
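One possible refinement (my suggestion, not part of the original post): since the PDF is the derivative of the CDF, np.gradient can be given the x-coordinates directly, which accounts for the spacing without a separate normalization step:

import numpy as np
import matplotlib.pyplot as plt

N = 100
Z = np.random.normal(size=N)
X2 = np.sort(Z)
F2 = np.arange(N) / float(N)

xinterp = np.linspace(-2, 2, 10)
yinterp = np.interp(xinterp, X2, F2)

# np.gradient(F, x) divides by the actual spacing of x
pdf = np.gradient(yinterp, xinterp)
plt.plot(xinterp, pdf)
plt.show()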
Related posts:
I failed to understand why the answer was so complicated in https://stats.stackexchange.com/a/6065/131632
I also didn't understand why my approach was different than Generate distribution given percentile ranks
Usually I use Scipy.optimize.curve_fit to fit custom functions to data.
Data in this case was always a 1 dimensional array.
Is there a similar function for a two-dimensional array?
So, for example, I have a 10x10 numpy array. Then I have a function that does some stuff and creates a 10x10 numpy array, and I want to fit the function so that the resulting 10x10 array fits the input array as closely as possible.
Maybe an example is better :)
data = pyfits.getdata('data.fits') #fits is an image format, this gives me a NxM numpy array
mod1 = pyfits.getdata('mod1.fits')
mod2 = pyfits.getdata('mod2.fits')
mod3 = pyfits.getdata('mod3.fits')
mod1_1D = numpy.ravel(mod1)
mod2_1D = numpy.ravel(mod2)
mod3_1D = numpy.ravel(mod3)
def dostuff(a, b):  # originally this is a function for 2D arrays
    newdata = (mod1_1D * 12) + (mod2_1D)**a - mod3_1D / b
    return newdata
Now a and b should be fitted, so that newdata is as close as possible to data.
What I got so far:
data1D = numpy.ravel(data)
data_X = numpy.arange(data1D.size)
fit = curve_fit(dostuff,data_X,data1D)
But print fit only gives me
(array([ 1.]), inf)
I do have some NaNs in the arrays; maybe that's a problem?
The goal is to express the 2D function as a 1D function: g(x, y, ...) --> f(xy, ...)
Converting the coordinate pair (x, y) into a single number xy may seem tricky at first, but it's actually quite simple: just enumerate all data points, and you have a single number that uniquely identifies each coordinate pair. The fitted function simply has to reconstruct the original coordinates, do its calculations, and return the result.
Example that fits a 2D linear gradient in a 20x10 image:
import scipy as sp
import scipy.optimize  # make sp.optimize available (submodules are not imported automatically)
import numpy as np
import matplotlib.pyplot as plt

n, m = 10, 20
# noisy example data
x = np.arange(m).reshape(1, m)
y = np.arange(n).reshape(n, 1)
z = x + y * 2 + np.random.randn(n, m) * 3

def f(xy, a, b):
    i = xy // m  # reconstruct y coordinates (row index)
    j = xy % m   # reconstruct x coordinates (column index)
    out = i * a + j * b
    return out

xy = np.arange(z.size)  # 0 is the top-left pixel and 199 is the bottom-right pixel
res = sp.optimize.curve_fit(f, xy, np.ravel(z))
z_est = f(xy, *res[0])
z_est2d = z_est.reshape(n, m)

plt.subplot(2, 1, 1)
plt.plot(np.ravel(z), label='original')
plt.plot(z_est, label='fitted')
plt.legend()
plt.subplot(2, 2, 3)
plt.imshow(z)
plt.xlabel('original')
plt.subplot(2, 2, 4)
plt.imshow(z_est2d)
plt.xlabel('fitted')
plt.show()
I would recommend using symfit for this; I wrote it to take care of all of the magic for you automatically.
In symfit you would just write the equation pretty much as you would on paper, and then you can run the fit.
I would do something like this:
from symfit import parameters, variables, Fit
# Assuming all this data is in the form of NxM arrays
data = pyfits.getdata('data.fits')
mod1 = pyfits.getdata('mod1.fits')
mod2 = pyfits.getdata('mod2.fits')
mod3 = pyfits.getdata('mod3.fits')
a, b = parameters('a, b')
x, y, z, u = variables('x, y, z, u')
model = {u: (x * 12) + y**a - z / b}
fit = Fit(model, x=mod1, y=mod2, z=mod3, u=data)
fit_result = fit.execute()
print(fit_result)
Unfortunately I have not yet included examples of the kind you need in the docs, but if you just look at the docs I think you can figure it out, in case this doesn't work out of the box.
I want to generate a Gaussian distribution in Python with the x and y dimensions denoting position and the z dimension denoting the magnitude of a certain quantity.
The distribution has a maximum value of 2e6 and a standard deviation sigma=0.025.
In MATLAB I can do this with:
x1 = linspace(-1,1,30);
x2 = linspace(-1,1,30);
mu = [0,0];
Sigma = [.025,.025];
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = 314159.153*reshape(F,length(x2),length(x1));
surf(x1,x2,F);
In Python, what I have so far is:
x = np.linspace(-1,1,30)
y = np.linspace(-1,1,30)
mu = (np.median(x),np.median(y))
sigma = (.025,.025)
There is a NumPy function, numpy.random.multivariate_normal, that can supposedly do the same as MATLAB's mvnpdf, but I am struggling to understand the documentation, especially how to obtain the covariance matrix that numpy.random.multivariate_normal needs.
As of scipy 0.14, you can use scipy.stats.multivariate_normal.pdf()
import numpy as np
from scipy.stats import multivariate_normal
x, y = np.mgrid[-1.0:1.0:30j, -1.0:1.0:30j]
# Need an (N, 2) array of (x, y) pairs.
xy = np.column_stack([x.flat, y.flat])
mu = np.array([0.0, 0.0])
sigma = np.array([.025, .025])
covariance = np.diag(sigma**2)
z = multivariate_normal.pdf(xy, mean=mu, cov=covariance)
# Reshape back to a (30, 30) grid.
z = z.reshape(x.shape)
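The question asks for a peak value of 2e6 rather than a unit-mass density; continuing from the snippet above, the result can simply be rescaled (my addition, mirroring the 314159.153 factor in the MATLAB code):

# continuing from the snippet above: rescale so the peak equals 2e6
z *= 2e6 / z.max()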
I am working on a scikit called scikit-guess that contains some fast estimation routines for non-linear fits. It has a function skg.ngauss.model (also accessible as skg.ngauss_fit.model or skg.ngauss.ngauss_fit.model) which does exactly what you want. The nice thing is that it's not a PDF, so you can set the amplitude out of the box:
import numpy as np
import skg.ngauss
a = 2e6
mu = 0, 0
sigma = 0.025, 0.025
x = y = np.linspace(-1, 1, 31)
cov = np.diag(sigma)**2
X = np.meshgrid(x, y)
data = skg.ngauss.model(X, a, mu, cov, axis=0)
You need to tell it axis=0 because it automatically stacks your arrays for you. To avoid passing in that argument, you could write
X = np.stack(np.meshgrid(x, y), axis=-1)
You can plot the result:
from matplotlib import pyplot as plt
plt.imshow(data)
plt.show()
This is not a very exciting distribution because the spread is so small that you end up with a value of ~2e-5 just one pixel away. You may want to up your sampling space to get any sort of meaningful resolution.
Note: At time of writing, the fitting function (ngauss_fit) is still buggy, but the model has been tested successfully, just not in the scikit.
Disclaimer: In case it wasn't obvious from the above, I am the author of scikit-guess.