I have 2 numpy arrays with data, say x and y, and I apply plt.step() and get a continuous (step) curve from them.
I would like to be able to create this function on my own, meaning I want a (zero-order hold) step approximation to the value of y for an x that does not actually exist in the original x array.
For example, in the following link I want the actual values of the 'new' rectangular sine, not only the plot:
https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html#sphx-glr-gallery-lines-bars-and-markers-step-demo-py
You can use scipy's interp1d to create a step function. By default the interpolation is 'linear', but you can change it to 'next', 'previous' or 'nearest' for a step function.
A standard step function is obtained from step_fun = interp1d(x, y, kind='previous') and then calling it as step_fun(new_x).
The following code compares different types of "interpolation":
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
x = np.random.uniform(0.1, 0.7, 20).cumsum()
y = np.sin(x)
kinds = ['linear', 'previous', 'next', 'nearest', 'cubic']
for i, kind in enumerate(kinds):
    function_from_points = interp1d(x, y + i, kind=kind)
    x_detailed = np.linspace(x[0], x[-1], 1000)
    plt.plot(x_detailed, function_from_points(x_detailed), color='dodgerblue')
    plt.scatter(x, y + i, color='crimson')
plt.yticks(range(len(kinds)), kinds)
plt.show()
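If SciPy is not an option, a zero-order hold can also be sketched with np.searchsorted alone. This is a minimal sketch, assuming x is sorted in increasing order; zero_order_hold is a hypothetical helper name (not a NumPy function), and unlike interp1d it clamps inputs below x[0] to y[0] instead of raising an error.

```python
import numpy as np

def zero_order_hold(x, y, x_new):
    """Return, for each x_new, the y of the last sample with x <= x_new.

    Hypothetical helper; assumes x is sorted in increasing order.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Index of the last sample point that is <= each query point.
    idx = np.searchsorted(x, x_new, side="right") - 1
    # Queries to the left of x[0] would give -1; clamp them to the first sample.
    idx = np.clip(idx, 0, len(x) - 1)
    return y[idx]

x = np.array([0.0, 1.0, 2.0])
y = np.array([10, 20, 30])
held = zero_order_hold(x, y, [0.0, 0.5, 1.0, 1.9, 2.0])
print(held)
```

For queries inside the range of x this should agree with interp1d(x, y, kind='previous').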
You can choose whichever tick values and corresponding function values you want. This is an example of unequally spaced arguments and their values:
x = np.arange(20) + np.random.random(20)/2
y = np.sin(x / 2)**2 + np.random.random(20)/5
Remark: these two arrays must have equal size. If you want your own custom function, you can use np.vectorize:
x = np.arange(20) + np.random.random(20)/2
func = np.vectorize(lambda x: np.sin(x) + np.random.random()/5)
y = func(x)
I have been able to interpolate values successfully from linear values of x to sine-like values of y.
However - I am struggling to interpolate the other way - from nonlinear values of y to linear values of x.
Below is a toy example:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate
#create 100 x values
x = np.linspace(-np.pi, np.pi, 100)
#create 100 values of y where y= sin(x)
y=np.sin(x)
#learn function to map y from x
f = interpolate.interp1d(x, y)
With new values of linear x
xnew = np.array([-1,1])
I get correctly interpolated values of nonlinear y
ynew = f(xnew)
print(ynew)
array([-0.84114583, 0.84114583])
The problem comes when I try and interpolate values of x from y.
I create a new function, the reverse of f:
f2 = interpolate.interp1d(y,x,kind='cubic')
I put in values of y that I successfully interpolated before
ynew=np.array([-0.84114583, 0.84114583])
I am expecting to get the original values of x [-1, 1]
But I get:
array([-1.57328791, 1.57328791])
I have tried putting in other values for the 'kind' parameter with no luck and am not sure if I have got the wrong approach here. Thanks for your help
I guess the problem arises from the fact that x is not a function of y, since for an arbitrary y value there may be more than one x value.
Take a look at a truncated range of data.
When x ranges from 0 to np.pi/2, then for every y value there is a unique x value.
In this case the snippet below works as expected.
>>> import numpy as np
>>> from scipy import interpolate
>>> x = np.linspace(0, np.pi / 2, 100)
>>> y = np.sin(x)
>>> f = interpolate.interp1d(x, y)
>>> f([0, 0.1, 0.3, 0.5])
array([0. , 0.09983071, 0.29551713, 0.47941047])
>>> f2 = interpolate.interp1d(y, x)
>>> f2([0, 0.09983071, 0.29551713, 0.47941047])
array([0. , 0.1 , 0.3 , 0.50000001])
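The same idea can be applied to the values from the question. The sketch below restricts x to [-pi/2, pi/2], where sin is strictly increasing, and checks monotonicity before building the inverse; the names f_inv and x_back are chosen here for illustration only.

```python
import numpy as np
from scipy import interpolate

# Restrict x to one monotonic branch of sin, so that x is a function of y.
x = np.linspace(-np.pi / 2, np.pi / 2, 100)
y = np.sin(x)

# The inverse interpolation is only meaningful if y is strictly monotonic.
assert np.all(np.diff(y) > 0)

f_inv = interpolate.interp1d(y, x)       # maps y back to x
x_back = f_inv(np.sin([-1.0, 1.0]))      # should be close to [-1, 1]
print(x_back)
```

On this branch the result is close to [-1, 1], up to the error of linear interpolation.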
Maxim provided the reason for this behavior. interp1d is a class designed to work for functions. In your case, the inverse relation x = arcsin(y) is a function only on a limited interval. This leads to interesting phenomena in the interpolation routine, which interpolates to the nearest y-value; for the inverse of sin(), that is not necessarily the next value along the x-y curve but may be several periods away. An illustration:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate
xmin=-np.pi
xmax=np.pi
fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for i, fac in enumerate([2, 1, 0.5]):
    x = np.linspace(xmin * fac, xmax*fac, 100)
    y=np.sin(x)
    #x->y
    f = interpolate.interp1d(x, y)
    x_fit = np.linspace(xmin*fac, xmax*fac, 1000)
    y_fit = f(x_fit)
    axes[i][0].plot(x_fit, y_fit)
    axes[i][0].set_ylabel(f"sin period {fac}")
    if not i:
        axes[i][0].set_title(label="interpolation x->y")
    #y->x
    f2 = interpolate.interp1d(y, x)
    y2_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x2_fit = f2(y2_fit)
    axes[i][1].plot(x2_fit, y2_fit)
    if not i:
        axes[i][1].set_title(label="interpolation y->x")
    #y->x with cubic interpolation
    f3 = interpolate.interp1d(y, x, kind="cubic")
    y3_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x3_fit = f3(y3_fit)
    axes[i][2].plot(x3_fit, y3_fit)
    if not i:
        axes[i][2].set_title(label="cubic interpolation y->x")
plt.show()
As you can see, the interpolation works along the ordered list of y-values (as you instructed it to), and this works particularly badly with cubic interpolation.
From https://stackoverflow.com/a/30460089/2202107, we can generate CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
N = 100
Z = np.random.normal(size = N)
# method 1
H,X1 = np.histogram( Z, bins = 10, density = True )  # 'normed' was removed from recent NumPy releases
dx = X1[1] - X1[0]
F1 = np.cumsum(H)*dx
#method 2
X2 = np.sort(Z)
F2 = np.array(range(N))/float(N)
# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only x (eg X2) and y (eg F2) coordinates?
My first thought was plt.plot(x, np.gradient(y)), but the gradient of y was constant, 1/N everywhere (the data points are evenly spaced in y, but not in x). This kind of data is often encountered in percentile calculations. The key is to get the data evenly spaced in x rather than in y, using interpolation:
x=X2
y=F2
num_points=10
xinterp = np.linspace(-2,2,num_points)
yinterp = np.interp(xinterp, x, y)
# normalize so that the sum of all bars equals 1.0
tot_val=1.0
normalization_factor = tot_val/np.trapz(np.ones(len(xinterp)),yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
output looks good to me:
I put my approach here for examination. Let me know if my logic is flawed.
One issue: when num_points is large, the plot looks bad. It is a discretization issue, and I am not sure how to avoid it.
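As a possible variation on the approach above, the grid spacing can be passed to np.gradient, so that the result is a derivative per unit x (a density) and needs no separate normalization factor. This is a sketch under my own assumptions: the sample size, the grid from -3 to 3 and the names xg, Fg and pdf are illustrative and not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=1000)

# Empirical CDF: sorted samples against evenly spaced probabilities.
x = np.sort(z)
F = np.arange(len(x)) / len(x)

# Resample the CDF on an evenly spaced x grid.
xg = np.linspace(-3, 3, 25)
dx = xg[1] - xg[0]
Fg = np.interp(xg, x, F)

# Derivative of the CDF with respect to x is the density estimate.
pdf = np.gradient(Fg, dx)

# Riemann-sum check: the density should integrate to roughly 1.
area = np.sum(pdf) * dx
print(area)
```

A coarser grid smooths the estimate and a finer grid makes it noisier, which matches the discretization trade-off described above.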
Related posts:
I failed to understand why the answer was so complicated in https://stats.stackexchange.com/a/6065/131632
I also didn't understand why my approach was different than Generate distribution given percentile ranks
I have the following quadratic form f(x) = x^T A x - b^T x and I've used numpy to define my matrices A, b:
A = np.array([[4,3], [3,7]])
b = np.array([3,-7])
So we're talking about 2 dimensions here, meaning that the contour plot will have the axes x1 and x2 and I want these to span from -4 to 4.
I've tried to experiment by doing
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
in order to create the two axes x1 and x2, but then I don't know how to define my function f(x), and if I do plt.contour(x,y,f) it won't work because the function f(x) is defined with only x as an argument.
Any ideas would be greatly appreciated. Thanks !
EDIT: I managed to "solve" the problem by expanding the quadratic form by hand, for example x^T A x, and ended up with a function of x1, x2, where these are the components of the vector x. After that I did
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
z = 4*(x**2) + 7*(y**2) - 3*x + 7*y + 6*x*y # expansion of x^T A x - b^T x for the A and b above
plt.contour(x, y, z)
If your transformation matrices A, b look like
A = np.array([[4,3], [3,7]])
b = np.array([3,-7])
and your data look like
u = np.linspace(-4,4,100)
x, y = np.meshgrid(u,u)
x.shape
x and y will have the shapes (100,100).
You can define f(x) as
def f(x):
return np.dot(np.dot(x.T,A),x) - np.dot(b,x)
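You can then pass anything with the shape (2, N) into the function f. Note that for such an input the result has shape (N, N), and only its diagonal holds the quadratic-form values at the N points.
Unfortunately, I am not sure which values you want to feed into it.
One example would be: [(-4:4), (-4:4)]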
plt.contour(x, y, f(x[0:2,:]))
update
If the visualization of the contour plot does not fit your purpose, you can use other plots, e.g. 3D visualizations.
from mpl_toolkits.mplot3d import Axes3D # This import has side effects required for the kwarg projection='3d' in the call to fig.add_subplot
fig = plt.figure(figsize=(40,20))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x,y, f(x[0:2,:]))
plt.show()
If you expect other values in the z-dimension, the projection f might be off.
For other 3d plots see: https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html
You could try something like this:
import numpy as np
import matplotlib.pyplot as plt
A = np.array([[4,3], [3,7]])
n_points = 100
u = np.linspace(-4, 4, n_points)
x, y = np.meshgrid(u, u)
X = np.vstack([x.flatten(), y.flatten()])
f_x = np.dot(np.dot(X.T, A), X)
f_x = np.diag(f_x).reshape(n_points, n_points)
plt.figure()
plt.contour(x, y, f_x)
Another alternative is to compute f_x as follows.
f_x = np.zeros((n_points, n_points))
for i in range(n_points):
    for j in range(n_points):
        in_v = np.array([[x[i][j]], [y[i][j]]])
        # the product is a 1x1 array; .item() extracts the scalar
        f_x[i][j] = np.dot(np.dot(in_v.T, A), in_v).item()
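Both variants above evaluate only x^T A x and leave out the -b^T x term from the question. A possible vectorized alternative, sketched here with np.einsum, avoids both the large intermediate matrix and the Python loops. The names x1, x2 and P are my own, and b is taken from the question.

```python
import numpy as np

A = np.array([[4, 3], [3, 7]])
b = np.array([3, -7])

n_points = 100
u = np.linspace(-4, 4, n_points)
x1, x2 = np.meshgrid(u, u)

# Stack the grid into an array of 2-vectors with shape (n_points, n_points, 2).
P = np.stack([x1, x2], axis=-1)

# Quadratic term p^T A p for every grid point, then subtract b^T p.
f_x = np.einsum("...i,ij,...j->...", P, A, P) - P @ b

# Cross-check against the quadratic form expanded by hand for this A and b.
expanded = 4 * x1**2 + 6 * x1 * x2 + 7 * x2**2 - 3 * x1 + 7 * x2
print(np.allclose(f_x, expanded))
```

The result has the same shape as the meshgrid arrays, so it can be passed directly to plt.contour(x1, x2, f_x).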
I have the following data:
x = [0, 2, 4, 8, 30]
y = [1.2e-3, 3.5e-4, 5.1e-5, 1.6e-5, 2e-7]
I'm trying to interpolate to get x from a given y value.
When plotted the data looks like:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
ax.semilogy(x, y, 'o-')
plt.show()
So say I'm interested in finding the value of x for a y value of 3.5e-5.
I can get the x value of a given y by:
z = np.linspace(0, 30, 10000)
logy = np.log10(y)
yy = np.power(10.0, np.interp(z, x, logy))
z[np.isclose(3.5e-5, yy, atol=1e-8)]
Out:
array([5.29852985])
But I have to adjust the atol if I change the value to get a single match and also have to create a load more data points to get the resolution.
Is there a simpler way to do this? Thanks.
Let's say you want to find the x_f corresponding to y_f. Assuming the entries in your original list y are in strictly decreasing order and the entries in x are increasing, you find the first entry in y that is less than or equal to your y_f. Say it is the one at index i; then the (x, y) tuples that define the relevant linear segment of your piecewise function are (x[i-1], y[i-1]) and (x[i], y[i]).
Using the formula for the line given two points on the line we can get x_f:
x_f = x[i-1] + (x[i]-x[i-1])/(y[i]-y[i-1])*(y_f-y[i-1])
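The search for the bracketing segment and the two-point formula can both be delegated to np.interp by swapping the roles of x and y. This is a sketch, assuming y is strictly decreasing as stated; np.interp needs increasing sample points, so both arrays are reversed. The log-space variant is my addition, not part of the answer above: it follows the straight segments of the semilogy plot and lands close to the 5.2985 found by the grid search in the question.

```python
import numpy as np

x = np.array([0, 2, 4, 8, 30], dtype=float)
y = np.array([1.2e-3, 3.5e-4, 5.1e-5, 1.6e-5, 2e-7])
y_f = 3.5e-5

# Linear in y: same result as the two-point line formula.
x_f_lin = np.interp(y_f, y[::-1], x[::-1])

# Linear in log10(y): matches the straight segments on a semilog plot.
x_f_log = np.interp(np.log10(y_f), np.log10(y)[::-1], x[::-1])

print(x_f_lin, x_f_log)
```

No tolerance tuning or dense helper grid is needed, because the inverse is evaluated directly at y_f.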
With the griddata in scipy used to perform interpolation (cubic splines and others), we have to put as parameters the data from which we interpolate, and at the same time, the new points on which we want to make a "prediction".
Is it possible to construct a "griddata object", that would have a method to predict a new point without reconstructing a new interpolation spline each time... ?
(for example, as with a regression tree, where we first construct the tree and then apply the .predict(new_points) method).
Here is an example :
import pandas as pd
import numpy as np
import sklearn
import scipy.interpolate as itp
n = 100
x1 = np.linspace(-2, 4, n)
X1 = []
X2 = []
for x in x1:
    X1.append( [x for i in range(0, n)] )
    X2.append( np.linspace(9, 15, n) )
X1 = np.array(X1).flatten()
X2 = np.array(X2).flatten()
Y1 = np.exp( 2*X1 )
Y2 = 3 * np.sqrt(X2)
#Data frames :
X = np.transpose( [X1, X2] )
X = pd.DataFrame(X, columns=["X1", "X2"])
Y = np.transpose( [Y1, Y2] )
Y = pd.DataFrame(Y, columns=["Y1", "Y2"])
X_new = np.transpose( [[-2], [9]] )
inter_cubic = itp.griddata(X, Y, X_new, method='cubic', fill_value=np.nan, rescale=False)
print(inter_cubic)
print(np.exp(2*(-2)), 3*np.sqrt(9))
Now inter_cubic is just a NumPy array.
Is there a way of performing it, or can we use another "spline" constructor?
If you look at the source code for griddata (scroll down past the docstring to see the actual code), you'll see that it is a wrapper for several other interpolation functions, most of which work the way you want. In your case, with 2-d data and cubic interpolation, griddata does this:
ip = CloughTocher2DInterpolator(points, values, fill_value=fill_value,
rescale=rescale)
return ip(xi)
So instead of using griddata, you could use CloughTocher2DInterpolator. Specifically, using the names from your script, you would create the interpolator with
ip = itp.CloughTocher2DInterpolator(X, Y, fill_value=np.nan, rescale=False)
The object ip doesn't have a predict method; you just call it with the points at which you want to evaluate the interpolator. In your case, you would write
Y_new = ip(X_new)
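A self-contained sketch of this build-once, evaluate-many pattern follows. It uses randomly scattered sample points rather than the grid from the question, and the names points, values, inside and outside are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

rng = np.random.default_rng(0)

# Scattered sample points in the same ranges as the question's data.
points = np.column_stack([rng.uniform(-2, 4, 500), rng.uniform(9, 15, 500)])
# Two output columns, interpolated together.
values = np.column_stack([np.exp(2 * points[:, 0]), 3 * np.sqrt(points[:, 1])])

# Build the interpolator once (this is the expensive step).
ip = CloughTocher2DInterpolator(points, values, fill_value=np.nan)

# Reuse it for as many query batches as needed.
inside = ip(np.array([[1.0, 12.0], [0.0, 10.0]]))
outside = ip(np.array([[10.0, 10.0]]))   # outside the convex hull -> fill_value

print(inside)
print(outside)
```

Each call returns one row per query point and one column per output variable; points outside the convex hull of the data come back as fill_value.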