I have been able to interpolate successfully from linear values of x to sine-like values of y.
However, I am struggling to interpolate the other way: from nonlinear values of y back to linear values of x.
Below is a toy example:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

# create 100 x values
x = np.linspace(-np.pi, np.pi, 100)
# create 100 values of y where y = sin(x)
y = np.sin(x)
# learn a function mapping x to y
f = interpolate.interp1d(x, y)
With new values of linear x
xnew = np.array([-1,1])
I get correctly interpolated values of nonlinear y
ynew = f(xnew)
print(ynew)
array([-0.84114583, 0.84114583])
The problem comes when I try to interpolate values of x from y.
I create a new function, the reverse of f:
f2 = interpolate.interp1d(y,x,kind='cubic')
I put in values of y that I successfully interpolated before
ynew=np.array([-0.84114583, 0.84114583])
I am expecting to get the original values of x [-1, 1]
But I get:
array([-1.57328791, 1.57328791])
I have tried other values for the 'kind' parameter with no luck, and I am not sure if I have the wrong approach here. Thanks for your help.
I guess the problem arises from the fact that x is not a function of y: for a given y value there may be more than one corresponding x value.
Take a look at a truncated range of data.
When x ranges from 0 to np.pi/2, then for every y value there is a unique x value.
In this case the snippet below works as expected.
>>> import numpy as np
>>> from scipy import interpolate
>>> x = np.linspace(0, np.pi / 2, 100)
>>> y = np.sin(x)
>>> f = interpolate.interp1d(x, y)
>>> f([0, 0.1, 0.3, 0.5])
array([0. , 0.09983071, 0.29551713, 0.47941047])
>>> f2 = interpolate.interp1d(y, x)
>>> f2([0, 0.09983071, 0.29551713, 0.47941047])
array([0. , 0.1 , 0.3 , 0.50000001])
Maxim provided the reason for this behavior: this interpolation class is designed to work on functions, and in your case the inverse x = arcsin(y) is a function only on a limited interval. The routine interpolates between the nearest y-values in the ordered data, which for the inverted sine are not necessarily neighbors on the x-y curve but may lie several periods apart. An illustration:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

xmin = -np.pi
xmax = np.pi

fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for i, fac in enumerate([2, 1, 0.5]):
    x = np.linspace(xmin * fac, xmax * fac, 100)
    y = np.sin(x)

    # x -> y
    f = interpolate.interp1d(x, y)
    x_fit = np.linspace(xmin * fac, xmax * fac, 1000)
    y_fit = f(x_fit)
    axes[i][0].plot(x_fit, y_fit)
    axes[i][0].set_ylabel(f"sin period {fac}")
    if not i:
        axes[i][0].set_title(label="interpolation x->y")

    # y -> x
    f2 = interpolate.interp1d(y, x)
    y2_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x2_fit = f2(y2_fit)
    axes[i][1].plot(x2_fit, y2_fit)
    if not i:
        axes[i][1].set_title(label="interpolation y->x")

    # y -> x with cubic interpolation
    f3 = interpolate.interp1d(y, x, kind="cubic")
    y3_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x3_fit = f3(y3_fit)
    axes[i][2].plot(x3_fit, y3_fit)
    if not i:
        axes[i][2].set_title(label="cubic interpolation y->x")
plt.show()
As you can see, the interpolation works along the ordered list of y-values (as you instructed it to), and this works particularly badly with cubic interpolation.
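One way around this, as a minimal sketch (the choice of branch is an assumption about which solution you want): restrict the data to a single monotonic branch of the sine, here -pi/2 to pi/2, before building the inverse interpolant:

import numpy as np
from scipy import interpolate

x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)

# keep only the branch where sin is monotonic, so x is a function of y there
mask = (x >= -np.pi / 2) & (x <= np.pi / 2)
f2 = interpolate.interp1d(y[mask], x[mask], kind='cubic')

print(f2([-0.84114583, 0.84114583]))  # close to [-1, 1]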
My task is to integrate f(x) = x² with Python, first analytically and then with trapezoidal integration.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10)
y = x**2
plt.plot(x, y)
plt.show()
Now I want to integrate this function to get the antiderivative F(x) = (1/3)x³. (The original post included pictures of the expected output.)
Could someone explain to me how to get the antiderivative F(x) of f(x) = x² with Python?
I want to do this with a normal (analytical) integration and with a trapezoidal integration, the latter from -10 to 10 with a step size of 0.01 (the width of the trapezoids). In the end I want to get F(x) = (1/3)x³ in both cases. How can I achieve this?
Thanks for helping me.
There are two key observations:
- the trapezoidal rule refers to numeric integration, whose output is not an integral function but a number (see the quick check below)
- integration is defined only up to an arbitrary constant, which is not included in your definition of F(x)
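For instance, a quick check that the trapezoidal rule yields a single float rather than a function:

from scipy.integrate import trapz

# trapezoidal estimate of the integral of x**2 over [0, 2]
print(trapz([0, 1, 4], [0, 1, 2]))  # 3.0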
With this in mind, you can use scipy.integrate.trapz() to define an integral function:
import numpy as np
from scipy.integrate import trapz
def numeric_integral(x, f, c=0):
    return np.array([trapz(f(x[:i]), x[:i]) for i in range(len(x))]) + c
or, more efficiently, using scipy.integrate.cumtrapz() (which does the computation from above):
import numpy as np
from scipy.integrate import cumtrapz
def numeric_integral(x, f, c=0):
    # initial=0 keeps the output the same length as x; c is the integration constant
    return cumtrapz(f(x), x, initial=0) + c
This plots as below:
import matplotlib.pyplot as plt
def func(x):
    return x ** 2
x = np.arange(-10, 10, 0.01)
y = func(x)
Y = numeric_integral(x, func)
plt.plot(x, y, label='f(x) = x²')
plt.plot(x, Y, label='F(x) = x³/3 + c')
plt.plot(x, x ** 3 / 3, label='F(x) = x³/3')
plt.legend()
which provides you with the desired result, except for the arbitrary constant, which you should specify yourself.
For good measure, while not relevant in this case, note that np.arange() does not provide stable results if used with a fractional step. Typically, one would use np.linspace() instead.
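For instance, a step of 0.01 over [-10, 10] corresponds to 2001 evenly spaced points:

import numpy as np

# same grid as np.arange(-10, 10.01, 0.01), but with exact endpoints
x = np.linspace(-10, 10, 2001)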
The cumtrapz function from scipy will provide an antiderivative using trapezoid integration:
import numpy as np
from scipy.integrate import cumtrapz

# x and y as defined in the question
yy = cumtrapz(y, x, initial=0)

# make yy == 0 around x == 0 (optional)
i_x0 = np.where(x >= 0)[0][0]
yy -= yy[i_x0]
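A self-contained version of the same idea, reusing the question's f(x) = x² (the grid is my choice):

import numpy as np
from scipy.integrate import cumtrapz

x = np.arange(-10, 10, 0.01)
y = x ** 2

yy = cumtrapz(y, x, initial=0)

# shift so the antiderivative is 0 at x == 0, matching F(x) = x³/3
i_x0 = np.where(x >= 0)[0][0]
yy -= yy[i_x0]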
Trapezoid integration
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10, 0.1)
f = x**2

F = [-333.35]
for i in range(1, len(x) - 1):
    F.append((f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2 + F[i - 1])
F = np.array(F)

fig, ax = plt.subplots()
ax.plot(x, f)
ax.plot(x[1:], F)
plt.show()
Here I have applied the theoretical formula (f[i] + f[i - 1])*(x[i] - x[i - 1])/2 + F[i - 1], and the integration is done in the block:
F = [-333.35]
for i in range(1, len(x) - 1):
    F.append((f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2 + F[i - 1])
F = np.array(F)
Note that, in order to plot x and F, they must have the same number of elements, so I ignore the first element of x and both end up with 199 elements. This is a consequence of the trapezoid method: if you integrate an array f of n elements, you obtain an array F of n - 1 elements. Moreover, I set the initial value of F to -333.35 at x = -10: this is the arbitrary constant from the integration process, and I chose that value so that the curve passes near the origin.
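As a quick sanity check (my addition, not part of the original answer): -333.35 is close to (-10)³/3 ≈ -333.33, the exact value of x³/3 at the left endpoint, and with that constant the numeric antiderivative tracks the analytic one closely:

import numpy as np

x = np.arange(-10, 10, 0.1)
f = x ** 2

# same trapezoid recursion, but starting from the exact constant (-10)**3 / 3
F = [(-10.0) ** 3 / 3]
for i in range(1, len(x) - 1):
    F.append((f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2 + F[i - 1])
F = np.array(F)

# compare with x³/3 on the matching grid points
print(np.max(np.abs(F - x[:len(F)] ** 3 / 3)))  # about 0.03 for this step size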
Analytical integration
import sympy as sy
import numpy as np
import matplotlib.pyplot as plt
x = sy.symbols('x')
f = x**2
F = sy.integrate(f, x)
xv = np.arange(-10, 10, 0.1)
fv = sy.lambdify(x, f)(xv)
Fv = sy.lambdify(x, F)(xv)
fig, ax = plt.subplots()
ax.plot(xv, fv)
ax.plot(xv, Fv)
plt.show()
Here I use symbolic math through the sympy module. The integration is done in the block:
F = sy.integrate(f, x)
Note that, in this case, Fv and xv already have the same number of elements. Moreover, the code is simpler.
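For example, the symbolic antiderivative can be evaluated exactly at any point:

import sympy as sy

x = sy.symbols('x')
F = sy.integrate(x ** 2, x)
print(F)              # x**3/3
print(F.subs(x, 10))  # 1000/3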
I am trying to create a 3D array on which to then perform volume rendering (in other software or volume rendering packages) of a strange attractor such as the Lorenz attractor. It is easy enough to plot the attractor from data points, assign a value for color, and visualize it in matplotlib, for example.
However, I would like a filled volume array. I have tried interpolation methods like griddata, but they don't give the desired result. What I am envisioning is something like the volume rendering shown on the Wikipedia page.
Here is what I have tried, but if you open the result in a simple viewer it doesn't look great. I am thinking of instead interpolating only between the points that make up the x, y, z array... I am a little lost after playing with this for several hours. What I think I need is to take the points and do some sort of interpolation or filling into an array (here called interp_im), which can then be viewed with volume rendering. Any help is greatly appreciated!
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # needed for 3D axes on older matplotlib
from scipy.integrate import odeint
from scipy.interpolate import griddata
from scipy.interpolate import LinearNDInterpolator
from skimage.external import tifffile
rho = 28.0
sigma = 10.0
beta = 8.0 / 3.0
def f(state, t):
    x, y, z = state  # unpack the state vector
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z  # derivatives
state0 = [1.0, 1.0, 1.0]
t = np.arange(0.0, 40.0, 0.01)
states = odeint(f, state0, t)
# shift x,y,z positions to int for regular image volume
x = states[:, 0]
y = states[:, 1]
z = states[:, 2]
x_min = x.min()
y_min = y.min()
z_min = z.min()
states_int = states + [abs(x_min),abs(y_min),abs(z_min)] + 1
states_int = states_int * 10
states_int = states_int.astype(int)
# values will be in order of tracing, for color
values = []
for i, j in enumerate(states_int):
    values.append(i * 10)
values = np.asarray(values)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works on recent matplotlib
sc = ax.scatter(states_int[:, 0], states_int[:, 1], states_int[:, 2],c=values)
plt.colorbar(sc)
plt.draw()
plt.show()
#print(x.shape, y.shape, z.shape, values.shape)
#Interpolate for volume rendering
x_ = np.linspace(0,999,500)
y_ = np.linspace(0,999,500)
z_ = np.linspace(0,999,500)
xx,yy,zz = np.meshgrid(x_,y_,z_, sparse = True)
interp_im = griddata(states_int, values, (xx,yy,zz), method='linear')
interp_im = interp_im.astype(np.uint16)
np.save('interp_im.npy', interp_im)
tifffile.imsave('LorenzAttractor.tif', interp_im)
Your data is in the volume; it is just pixelated. If you blur the volume, for example with a Gaussian, you get something much more usable. For example:
from scipy import ndimage
vol = np.zeros((512, 512, 512), dtype=states_int.dtype)
# add data to vol
vol[tuple(np.split(states_int, vol.ndim, axis=1))] = values[:, np.newaxis]
# apply gaussian filter, sigma=5 in this case
vol = ndimage.gaussian_filter(vol, 5)
I would then use something like napari to view the data in 3D:
import napari

with napari.gui_qt():
    napari.view_image(vol)
To make the volume smoother you may want to reduce your integration step size.
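For instance, a sketch with a ten-times smaller step (the factor is an arbitrary choice):

import numpy as np
from scipy.integrate import odeint

# Lorenz system as in the question
rho, sigma, beta = 28.0, 10.0, 8.0 / 3.0

def f(state, t):
    x, y, z = state
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z

state0 = [1.0, 1.0, 1.0]
t = np.arange(0.0, 40.0, 0.001)  # step 0.001 instead of 0.01
states = odeint(f, state0, t)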
Following the recommendations in this answer, I have used several combinations of values for beta0, and, as shown here, the values from polyfit.
This example is UPDATED in order to show the effect of the relative scales of the values of X versus Y (the X range is 0.1 to 100 times Y):
from random import random, seed
from scipy import odr
import numpy as np
from matplotlib import pyplot as plt

seed(1)
X = np.array([random() for i in range(1000)])
Y = np.array([i + random()**2 for i in range(1000)])

for num in range(1, 5):
    plt.subplot(2, 2, num)
    plt.title('X range is %.1f times Y' % (float(100 / max(X))))
    X *= 10
    z = np.polyfit(X, Y, 1)
    plt.plot(X, Y, 'k.', alpha=0.1)

    # Fit using odr
    def f(B, X):
        return B[0]*X + B[1]

    linear = odr.Model(f)
    mydata = odr.RealData(X, Y)
    myodr = odr.ODR(mydata, linear, beta0=z)
    myodr.set_job(fit_type=0)
    myoutput = myodr.run()
    a, b = myoutput.beta
    sa, sb = myoutput.sd_beta
    xp = np.linspace(plt.xlim()[0], plt.xlim()[1], 1000)
    yp = a*xp + b
    plt.plot(xp, yp, label='ODR')
    yp2 = z[0]*xp + z[1]
    plt.plot(xp, yp2, label='polyfit')
    plt.legend()
    plt.ylim(-1000, 2000)
plt.show()
It seems that no combination of beta0 helps... The only ways to get the polyfit and ODR fits to be similar are to swap X and Y, or, as shown here, to increase the range of values of X with respect to Y, which is still not really a solution :)
=== EDIT ===
I do not want ODR to be the same as polyfit. I am showing polyfit just to emphasize that the ODR fit is wrong and it is not a problem of the data.
=== SOLUTION ===
Thanks to @norok2's answer, here is the result when the Y range is 0.001 to 100000 times X:
from random import random, seed
from scipy import odr
import numpy as np
from matplotlib import pyplot as plt

seed(1)
X = np.array([random() / 1000 for i in range(1000)])
Y = np.array([i + random()**2 for i in range(1000)])

plt.figure(figsize=(12, 12))
for num in range(1, 10):
    plt.subplot(3, 3, num)
    plt.title('Y range is %.1f times X' % (float(100 / max(X))))
    X *= 10
    z = np.polyfit(X, Y, 1)
    plt.plot(X, Y, 'k.', alpha=0.1)

    # Fit using odr
    def f(B, X):
        return B[0]*X + B[1]

    linear = odr.Model(f)
    mydata = odr.RealData(X, Y,
                          sy=min(1/np.var(Y), 1/np.var(X)))  # here the trick!! :)
    myodr = odr.ODR(mydata, linear, beta0=z)
    myodr.set_job(fit_type=0)
    myoutput = myodr.run()
    a, b = myoutput.beta
    sa, sb = myoutput.sd_beta
    xp = np.linspace(plt.xlim()[0], plt.xlim()[1], 1000)
    yp = a*xp + b
    plt.plot(xp, yp, label='ODR')
    yp2 = z[0]*xp + z[1]
    plt.plot(xp, yp2, label='polyfit')
    plt.legend()
    plt.ylim(-1000, 2000)
plt.show()
The key difference between polyfit() and the Orthogonal Distance Regression (ODR) fit is that polyfit works under the assumption that the error on x is negligible. If this assumption is violated, like it is in your data, you cannot expect the two methods to produce similar results.
In particular, ODR() is very sensitive to the errors you specify.
If you do not specify any error/weighting, it will assign a value of 1 for both x and y, meaning that any scale difference between x and y will affect the results (the so-called numerical conditioning).
On the contrary, polyfit(), before computing the fit, applies some sort of pre-whitening to the data (see around line 577 of its source code) for better numerical conditioning.
Therefore, if you want ODR() to match polyfit(), you could simply fine-tune the error on Y to change your numerical conditioning. You can do that by changing:
mydata = odr.RealData(X, Y)
# equivalent to: odr.RealData(X, Y, sx=1, sy=1)
to:
mydata = odr.RealData(X, Y, sx=1, sy=1/np.var(Y))
I tested that this works for any numerical conditioning between 1e-10 and 1e10 of your Y (it is / 10. or 1e-1 in your example).
Note that this would only make sense for well-conditioned fits.
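An alternative that avoids hand-picking error values (my sketch, not part of the original answer): standardize both variables before the fit and map the coefficients back afterwards, so the conditioning no longer depends on the original scales:

import numpy as np
from scipy import odr

rng = np.random.default_rng(1)
X = rng.random(1000) / 1000
Y = np.arange(1000) + rng.random(1000) ** 2

# rescale both axes to unit standard deviation before fitting
x_sd, y_sd = X.std(), Y.std()
out = odr.ODR(odr.RealData(X / x_sd, Y / y_sd),
              odr.Model(lambda B, x: B[0] * x + B[1]),
              beta0=[1.0, 0.0]).run()

# map slope and intercept back to the original scales
a = out.beta[0] * y_sd / x_sd
b = out.beta[1] * y_sd
print(a, b)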
I cannot format source code in a comment, so I place it here. This code uses ODR to calculate fit statistics; note the line with the comment "parameter order for odr", where a wrapper function is used so that ODR can call my "actual" function.
from scipy.optimize import curve_fit
import numpy as np
import scipy.odr
import scipy.stats

x = np.array([5.357, 5.797, 5.936, 6.161, 6.697, 6.731, 6.775, 8.442, 9.861])
y = np.array([0.376, 0.874, 1.049, 1.327, 2.054, 2.077, 2.138, 4.744, 7.104])

def f(x, b0, b1):
    return b0 + (b1 * x)

def f_wrapper_for_odr(beta, x):  # parameter order for odr
    return f(x, *beta)

parameters, cov = curve_fit(f, x, y)

model = scipy.odr.odrpack.Model(f_wrapper_for_odr)
data = scipy.odr.odrpack.Data(x, y)
myodr = scipy.odr.odrpack.ODR(data, model, beta0=parameters, maxit=0)
myodr.set_job(fit_type=2)
parameterStatistics = myodr.run()

df_e = len(x) - len(parameters)  # degrees of freedom, error
cov_beta = parameterStatistics.cov_beta  # parameter covariance matrix from ODR
sd_beta = parameterStatistics.sd_beta * parameterStatistics.sd_beta  # squared standard errors

t_df = scipy.stats.t.ppf(0.975, df_e)
ci = []
for i in range(len(parameters)):
    ci.append([parameters[i] - t_df * parameterStatistics.sd_beta[i],
               parameters[i] + t_df * parameterStatistics.sd_beta[i]])

tstat_beta = parameters / parameterStatistics.sd_beta  # coefficient t-statistics
pstat_beta = (1.0 - scipy.stats.t.cdf(np.abs(tstat_beta), df_e)) * 2.0  # coefficient p-values

for i in range(len(parameters)):
    print('parameter:', parameters[i])
    print(' conf interval:', ci[i][0], ci[i][1])
    print(' tstat:', tstat_beta[i])
    print(' pstat:', pstat_beta[i])
    print()
OK, so I started with Python a few days ago. I mainly use it for data science because I am an undergraduate chemistry student. Now I have a small problem on my hands: I need to extrapolate a function. I know how to make simple diagrams and graphs, so please try to explain it as simply as possible. I start off with:
from matplotlib import pyplot as plt
from matplotlib import style
style.use('classic')
x = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
x2 = [0.707, 0.2, 0.057, 0.016, 0.00453]
y2 = [2.086, 7.525, 26.59375,87.03125, 375.9765625]
With these values I have to work out a way to extrapolate in order to get a y (or y2) value when x = 0. I know how to do this mathematically, but I would like to know if Python can do this and how I would execute it. Is there a simple way? Could you maybe give me an example with my given values?
Thank you
Taking a quick look at your data,
from matplotlib import pyplot as plt
from matplotlib import style
style.use('classic')
x1 = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y1 = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
plt.plot(x1, y1)
x2 = [0.707, 0.2, 0.057, 0.016, 0.00453]
y2 = [2.086, 7.525, 26.59375,87.03125, 375.9765625]
plt.plot(x2, y2)
This is definitely not linear. If you know what sort of function this follows, you may want to use scipy's curve fitting to get a best-fit function which you can then use.
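For instance, a minimal curve_fit sketch for the second data set; the power-law form a*x**b is just a guessed model, not something the data dictates:

import numpy as np
from scipy.optimize import curve_fit

x2 = [0.707, 0.2, 0.057, 0.016, 0.00453]
y2 = [2.086, 7.525, 26.59375, 87.03125, 375.9765625]

def power_law(x, a, b):
    return a * np.power(x, b)

# rough initial guess: a decaying power law
popt, pcov = curve_fit(power_law, x2, y2, p0=(1.0, -1.0))
print(popt)  # fitted a and b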
Edit:
If we convert the plots to log-log,
import numpy as np
plt.plot(np.log(x1), np.log(y1))
plt.plot(np.log(x2), np.log(y2))
they look pretty linear (if you squint a bit). Finding a best-fit line,
np.polyfit(np.log(x1), np.log(y1), 1)
# array([-0.05817402, 4.73809081])
np.polyfit(np.log(x2), np.log(y2), 1)
# array([-1.01664659, 0.36759068])
we can convert back to functions,
# f1:
# log(y) = -0.05817402 * log(x) + 4.73809081
# so
# y = (e ** 4.73809081) * x ** (-0.05817402)
def f1(x):
    return np.e ** 4.73809081 * x ** (-0.05817402)
xs = np.linspace(0.01, 0.8, 100)
plt.plot(x1, y1, xs, f1(xs))
# f2:
# log(y) = -1.01664659 * log(x) + 0.36759068
# so
# y = (e ** 0.36759068) * x ** (-1.01664659)
def f2(x):
    return np.e ** 0.36759068 * x ** (-1.01664659)
plt.plot(x2, y2, xs, f2(xs))
The second looks pretty darn good; the first still needs a bit of refinement (i.e. find a more representative function and curve-fit it). But you should have a pretty good picture of the process ;-)
Here's some example code that can hopefully help you get started on building a linear model for your purposes.
import numpy as np
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt
# sample data
x = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
# linear model
lm = LinearRegression()
lm.fit(np.array(x).reshape(-1, 1), y)
test_x = np.linspace(0.01, 0.7, 100)
test_y = lm.predict(test_x.reshape(-1, 1))
## try linear model with log(x)
lm2 = LinearRegression()
lm2.fit(np.log(np.array(x)).reshape(-1, 1), y)
test_y2 = lm2.predict(np.log(test_x).reshape(-1, 1))
# plot
plt.figure()
plt.plot(x, y, label='Given Data')
plt.plot(test_x, test_y, label='Linear Model')
plt.plot(test_x, test_y2, label='Log-Linear Model')
plt.legend()
This produces a plot of the given data with both models.
As @Hugh Bothwell showed, the values you gave do not have a linear relationship. However, taking the log of x seems to produce a better fit.
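Note that neither log-based model can be evaluated exactly at x = 0, since log(0) diverges; as a usage sketch, you can instead predict at a small positive x (0.001 here is an arbitrary choice):

import numpy as np
from sklearn.linear_model import LinearRegression

x = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]

lm2 = LinearRegression()
lm2.fit(np.log(np.array(x)).reshape(-1, 1), y)

# log(0) itself is undefined, so evaluate near zero instead
print(lm2.predict(np.log(np.array([[0.001]]))))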