I have been able to interpolate values successfully from linear values of x to sine-like values of y.
However, I am struggling to interpolate the other way: from nonlinear values of y to linear values of x.
Below is a toy example:
import numpy as np
from scipy import interpolate

# create 100 x values
x = np.linspace(-np.pi, np.pi, 100)
# create 100 values of y where y = sin(x)
y = np.sin(x)
# learn a function that maps x to y
f = interpolate.interp1d(x, y)
With new values of linear x
xnew = np.array([-1,1])
I get correctly interpolated values of nonlinear y
ynew = f(xnew)
print(ynew)
array([-0.84114583, 0.84114583])
The problem comes when I try to interpolate values of x from y.
I create a new function, the inverse of f:
f2 = interpolate.interp1d(y,x,kind='cubic')
I put in values of y that I successfully interpolated before
ynew=np.array([-0.84114583, 0.84114583])
I am expecting to get the original values of x [-1, 1]
But I get:
array([-1.57328791, 1.57328791])
I have tried other values for the 'kind' parameter with no luck, and I am not sure whether I have the wrong approach here. Thanks for your help!
I guess the problem arises from the fact that x is not a function of y: for a given y value there may be more than one corresponding x value.
Take a look at a truncated range of data.
When x ranges from 0 to np.pi/2, every y value corresponds to a unique x value.
In this case the snippet below works as expected.
>>> import numpy as np
>>> from scipy import interpolate
>>> x = np.linspace(0, np.pi / 2, 100)
>>> y = np.sin(x)
>>> f = interpolate.interp1d(x, y)
>>> f([0, 0.1, 0.3, 0.5])
array([0. , 0.09983071, 0.29551713, 0.47941047])
>>> f2 = interpolate.interp1d(y, x)
>>> f2([0, 0.09983071, 0.29551713, 0.47941047])
array([0. , 0.1 , 0.3 , 0.50000001])
Maxim provided the reason for this behavior. interp1d is a class designed to work with functions, and in your case the inverse x = arcsin(y) is a function only on a limited interval. This leads to interesting phenomena in the interpolation routine, which interpolates along the sorted y values; for the arcsin branch, the nearest y value is not necessarily the neighboring point on the x-y curve but may lie several periods away. An illustration:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

xmin = -np.pi
xmax = np.pi
fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for i, fac in enumerate([2, 1, 0.5]):
    x = np.linspace(xmin * fac, xmax * fac, 100)
    y = np.sin(x)
    # x -> y
    f = interpolate.interp1d(x, y)
    x_fit = np.linspace(xmin * fac, xmax * fac, 1000)
    y_fit = f(x_fit)
    axes[i][0].plot(x_fit, y_fit)
    axes[i][0].set_ylabel(f"sin period {fac}")
    if not i:
        axes[i][0].set_title(label="interpolation x->y")
    # y -> x
    f2 = interpolate.interp1d(y, x)
    y2_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x2_fit = f2(y2_fit)
    axes[i][1].plot(x2_fit, y2_fit)
    if not i:
        axes[i][1].set_title(label="interpolation y->x")
    # y -> x with cubic interpolation
    f3 = interpolate.interp1d(y, x, kind="cubic")
    y3_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x3_fit = f3(y3_fit)
    axes[i][2].plot(x3_fit, y3_fit)
    if not i:
        axes[i][2].set_title(label="cubic interpolation y->x")
plt.show()
As you can see, the interpolation works along the ordered list of y-values (as you instructed it to), and this works particularly badly with cubic interpolation.
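If the goal is to invert y = sin(x), one workaround (a minimal sketch, not part of the answers above) is to restrict the data to a monotonic branch, e.g. x in [-pi/2, pi/2], where the inverse x = arcsin(y) actually is a function:
import numpy as np
from scipy import interpolate

# On [-pi/2, pi/2], sin is strictly increasing, so interp1d(y, x) is well posed.
x = np.linspace(-np.pi / 2, np.pi / 2, 100)
y = np.sin(x)
f2 = interpolate.interp1d(y, x)
print(f2([-0.84114583, 0.84114583]))  # close to [-1, 1], as expected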
I am using interp1d from Scipy to interpolate a function with linear interpolation. Now I need to upgrade to Whittaker–Shannon interpolation. Is this already implemented somewhere? I am surprised it is not among the options of interp1d as it is a very common interpolation algorithm.
I am not familiar with sinc interpolation, but based on "What's wrong with this Whittaker-Shannon-Kotel’nikov interpolation implementation?" I roughly follow the same pattern.
The idea is to resample the original data at a lower frequency (controlled by freq_s_ratio), reconstruct the signal using sinc functions, and finally resample back to the original size.
This initially caused boundary artifacts, but padding the signal and truncating it afterwards seems to work. Here is my code:
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

def rough_sinc_interp(samples, freq_s_ratio=0.5):
    # pad with edge values to suppress boundary artifacts; stripped again on return
    offset_amount = int(len(samples) / 2)
    padded_samples = np.concatenate([offset_amount * [samples[0]], samples, offset_amount * [samples[-1]]])
    # resample at the lower rate
    f_s = int(freq_s_ratio * len(padded_samples))
    resamples = scipy.signal.resample(padded_samples, f_s)
    # Whittaker-Shannon reconstruction: a sum of sinc kernels centered on the samples
    T_s = 1 / f_s
    t = np.arange(0, 1, T_s)
    y = np.zeros(len(t))
    for k in range(1, len(resamples)):
        y = y + resamples[k] * np.sinc((t - k * T_s) / T_s)
    # resample back to the original size and strip the padding
    return scipy.signal.resample(y, len(padded_samples))[offset_amount:-offset_amount]
np.random.seed(1337)
signal_fn = lambda x: -1*(np.sin(x) + np.cos(x**2) + np.random.normal(scale=0.5, size=len(x)) + np.log(np.abs(x**2) + 0.1)) + 50
x = np.arange(0, 10, 0.05)
y = signal_fn(x)
plt.figure(figsize=(15, 7))
plt.plot(x, y, label="noisy")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.5), label="smooth - 50%")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.15), label="smooth - 15%")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.1), label="smooth - 10%")
plt.legend(loc="best")
plt.show()
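As a side note, scipy.signal.resample is itself FFT-based, which amounts to periodic sinc interpolation, so a rough low-pass smoother can be sketched without the explicit sinc sum. A minimal alternative sketch (fft_lowpass_smooth is a hypothetical name, not part of the original code):
import scipy.signal

def fft_lowpass_smooth(samples, freq_s_ratio=0.5):
    # Downsampling with scipy.signal.resample discards frequency content above
    # the new Nyquist rate; upsampling back then yields a smoothed signal.
    n_low = int(freq_s_ratio * len(samples))
    low = scipy.signal.resample(samples, n_low)
    return scipy.signal.resample(low, len(samples))

# e.g. plt.plot(x, fft_lowpass_smooth(y, freq_s_ratio=0.15))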
I have 2 numpy arrays with data, say x and y, and I apply plt.step() to get a continuous (step) curve from them.
I would like to be able to create this function myself: a (zero-order hold) step approximation to the value of y for values of x that do not actually exist in the original x array.
For example, at the following link I want to have the 'new' actual rectangular sine values, not only the plot:
https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html#sphx-glr-gallery-lines-bars-and-markers-step-demo-py
You can use scipy's interp1d to create a step function. By default the interpolation is 'linear', but you can change it to 'next', 'previous' or 'nearest' for a step function.
A standard step function is obtained from step_fun = interp1d(x, y, kind='previous') and then calling it as step_fun(new_x).
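For instance, a minimal usage sketch with made-up sample points:
import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0])
step_fun = interp1d(x, y, kind='previous')
print(step_fun([0.5, 1.5, 2.5]))  # [0. 1. 0.], each query holds the previous sample's value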
The following code compares different types of "interpolation":
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import interp1d

x = np.random.uniform(0.1, 0.7, 20).cumsum()
y = np.sin(x)
kinds = ['linear', 'previous', 'next', 'nearest', 'cubic']
for i, kind in enumerate(kinds):
    function_from_points = interp1d(x, y + i, kind=kind)
    x_detailed = np.linspace(x[0], x[-1], 1000)
    plt.plot(x_detailed, function_from_points(x_detailed), color='dodgerblue')
    plt.scatter(x, y + i, color='crimson')
plt.yticks(range(len(kinds)), kinds)
plt.show()
You can choose whichever tick values and corresponding function values you want. This is an example with unequally spaced arguments and their values:
x = np.arange(20) + np.random.random(20)/2
y = np.sin(x / 2)**2 + np.random.random(20)/5
Remark: these two arrays must have equal size. If you want your own custom function, you can use np.vectorize:
x = np.arange(20) + np.random.random(20)/2
func = np.vectorize(lambda x: np.sin(x) + np.random.random()/5)
y = func(x)
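If you want to avoid scipy entirely, a zero-order hold can also be sketched with plain numpy via np.searchsorted (my own sketch; zero_order_hold is a hypothetical helper, not part of the answer above):
import numpy as np

def zero_order_hold(x, y, x_new):
    # For each query point, find the index of the last sample at or before it
    # and hold that sample's y value (assumes x is sorted ascending).
    idx = np.searchsorted(x, x_new, side='right') - 1
    idx = np.clip(idx, 0, len(y) - 1)
    return y[idx]

x = np.arange(20) + np.random.random(20) / 2
y = np.sin(x / 2)**2
print(zero_order_hold(x, y, [2.5, 7.3, 15.0]))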
From https://stackoverflow.com/a/30460089/2202107, we can generate the CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt

N = 100
Z = np.random.normal(size=N)

# method 1: histogram (note: the old normed=True argument is now density=True)
H, X1 = np.histogram(Z, bins=10, density=True)
dx = X1[1] - X1[0]
F1 = np.cumsum(H) * dx

# method 2: sorted samples against empirical quantiles
X2 = np.sort(Z)
F2 = np.arange(N) / float(N)

# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only x (eg X2) and y (eg F2) coordinates?
My first thought was plt.plot(x, np.gradient(y)), but the gradient of y was essentially constant (the data points are evenly spaced in y, but not in x). This kind of data is often met in percentile calculations. The key is to get the data evenly spaced in x rather than in y, using interpolation:
x = X2
y = F2
num_points = 10
xinterp = np.linspace(-2, 2, num_points)
yinterp = np.interp(xinterp, x, y)

# normalize so that the bars integrate to 1.0
tot_val = 1.0
normalization_factor = tot_val / np.trapz(np.ones(len(xinterp)), yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
The output looks good to me.
I put my approach here for examination. Let me know if my logic is flawed.
One issue: when num_points is large, the plot looks bad. This is a discretization issue, and I am not sure how to avoid it.
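One variant worth examining (my own sketch, not from the linked posts): passing the x grid to np.gradient gives dF/dx directly, which is already a PDF estimate and removes the need for the separate normalization factor:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# same setup as above: empirical CDF of standard normal samples
N = 100
Z = np.random.normal(size=N)
x = np.sort(Z)
y = np.arange(N) / float(N)

num_points = 10
xinterp = np.linspace(-2, 2, num_points)
yinterp = np.interp(xinterp, x, y)

# dF/dx evaluated with the actual grid spacing is a direct PDF estimate
pdf_est = np.gradient(yinterp, xinterp)

plt.bar(xinterp, pdf_est, width=0.2, label="estimated PDF")
plt.plot(xinterp, norm.pdf(xinterp), color="crimson", label="true N(0,1) PDF")
plt.legend()
plt.show()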
Related posts:
I failed to understand why the answer was so complicated in https://stats.stackexchange.com/a/6065/131632
I also didn't understand why my approach differs from "Generate distribution given percentile ranks".
I have the following quadratic form f(x) = x^T A x - b^T x, and I've used numpy to define my matrices A and b:
A = np.array([[4,3], [3,7]])
b = np.array([3,-7])
So we're talking about 2 dimensions here, meaning that the contour plot will have the axes x1 and x2 and I want these to span from -4 to 4.
I've tried to experiment by doing
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
in order to create the two axes x1 and x2, but then I don't know how to define my function f(x); if I do plt.contour(x, y, f), it won't work because the function f(x) is defined with only x as an argument.
Any ideas would be greatly appreciated. Thanks!
EDIT: I managed to "solve" the problem by expanding the quadratic form by hand (e.g. x^T A x), ending up with a function of x1 and x2, the components of the vector x. After that I did
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
z = 1.5*(x**2) + 3*(y**2) - 2*x + 8*y + 2*x*y  # (that's the function I ended up with)
plt.contour(x, y, z)
If your matrix A and vector b look like
A = np.array([[4,3], [3,7]])
b = np.array([3,-7])
and your data look like
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
x.shape
then x and y will each have shape (100, 100).
You can define f(x) as
def f(x):
    return np.dot(np.dot(x.T, A), x) - np.dot(b, x)
and then feed anything with shape (2, N) into the function f. (Note that for a (2, N) batch of points this returns an (N, N) matrix; the pointwise values f(x_n) lie on its diagonal, which the next answer makes explicit.)
I am unfortunately not sure which values you want to feed into it, but one example would be the grid spanning [-4, 4] in both coordinates:
plt.contour(x, y, f(x[0:2,:]))
update
If the visualization of the contour plot does not fit your purpose, you can use other plots, e.g. 3D visualizations.
from mpl_toolkits.mplot3d import Axes3D # This import has side effects required for the kwarg projection='3d' in the call to fig.add_subplot
fig = plt.figure(figsize=(40,20))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x,y, f(x[0:2,:]))
plt.show()
If you expect other values in the z-dimension, the function f might be off.
For other 3d plots see: https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html
You could try something like this:
import numpy as np
import matplotlib.pyplot as plt
A = np.array([[4,3], [3,7]])
n_points = 100
u = np.linspace(-4, 4, n_points)
x, y = np.meshgrid(u, u)
X = np.vstack([x.flatten(), y.flatten()])
f_x = np.dot(np.dot(X.T, A), X)
f_x = np.diag(f_x).reshape(n_points, n_points)
plt.figure()
plt.contour(x, y, f_x)
Another alternative is to compute f_x as follows.
f_x = np.zeros((n_points, n_points))
for i in range(n_points):
    for j in range(n_points):
        in_v = np.array([[x[i][j]], [y[i][j]]])
        f_x[i][j] = np.dot(np.dot(in_v.T, A), in_v)
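As a side note (my own sketch, not part of either answer): np.einsum evaluates the quadratic form pointwise without building the large intermediate (N, N) matrix, and it is straightforward to include the -b^T x term from the question as well:
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[4, 3], [3, 7]])
b = np.array([3, -7])

n_points = 100
u = np.linspace(-4, 4, n_points)
x, y = np.meshgrid(u, u)
X = np.stack([x.ravel(), y.ravel()], axis=1)  # shape (N, 2), one point per row

# einsum computes x_n^T A x_n for every row n at once; then subtract b^T x_n
f_x = np.einsum('ni,ij,nj->n', X, A, X) - X @ b
plt.contour(x, y, f_x.reshape(n_points, n_points))
plt.show()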
I want to generate a Gaussian distribution in Python with the x and y dimensions denoting position and the z dimension denoting the magnitude of a certain quantity.
The distribution has a maximum value of 2e6 and a standard deviation sigma=0.025.
In MATLAB I can do this with:
x1 = linspace(-1,1,30);
x2 = linspace(-1,1,30);
mu = [0,0];
Sigma = [.025,.025];
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = 314159.153*reshape(F,length(x2),length(x1));
surf(x1,x2,F);
In Python, what I have so far is:
x = np.linspace(-1,1,30)
y = np.linspace(-1,1,30)
mu = (np.median(x),np.median(y))
sigma = (.025,.025)
There is a NumPy function, numpy.random.multivariate_normal, which can supposedly do the same as MATLAB's mvnpdf, but I am struggling to understand the documentation, especially how to obtain the covariance matrix that numpy.random.multivariate_normal needs.
As of scipy 0.14, you can use scipy.stats.multivariate_normal.pdf()
import numpy as np
from scipy.stats import multivariate_normal
x, y = np.mgrid[-1.0:1.0:30j, -1.0:1.0:30j]
# Need an (N, 2) array of (x, y) pairs.
xy = np.column_stack([x.flat, y.flat])
mu = np.array([0.0, 0.0])
sigma = np.array([.025, .025])
covariance = np.diag(sigma**2)
z = multivariate_normal.pdf(xy, mean=mu, cov=covariance)
# Reshape back to a (30, 30) grid.
z = z.reshape(x.shape)
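If you also want the stated maximum of 2e6 and something like MATLAB's surf view, one possible follow-up, continuing from the variables x, y, z above (my own sketch, not part of the original answer):
from mpl_toolkits.mplot3d import Axes3D  # registers projection='3d' on older matplotlib
import matplotlib.pyplot as plt

# rescale the PDF so its peak equals the desired maximum of 2e6
z_scaled = z * (2e6 / z.max())

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z_scaled)
plt.show()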
I am working on a scikit called scikit-guess that contains some fast estimation routines for non-linear fits. It has a function skg.ngauss.model (also accessible as skg.ngauss_fit.model or skg.ngauss.ngauss_fit.model) which does exactly what you want. The nice thing is that it's not a PDF, so you can set the amplitude out of the box:
import numpy as np
import skg.ngauss
a = 2e6
mu = 0, 0
sigma = 0.025, 0.025
x = y = np.linspace(-1, 1, 31)
cov = np.diag(sigma)**2
X = np.meshgrid(x, y)
data = skg.ngauss.model(X, a, mu, cov, axis=0)
You need to tell it axis=0 because it automatically stacks your arrays for you. To avoid passing in that argument, you could write
X = np.stack(np.meshgrid(x, y), axis=-1)
You can plot the result:
from matplotlib import pyplot as plt
plt.imshow(data)
plt.show()
This is not a very exciting distribution because the spread is so small that you end up with a value of ~2e-5 just one pixel away. You may want to up your sampling space to get any sort of meaningful resolution.
Note: At time of writing, the fitting function (ngauss_fit) is still buggy, but the model has been tested successfully, just not in the scikit.
Disclaimer: In case it wasn't obvious from the above, I am the author of scikit-guess.