How to extrapolate a function based on x,y values? - python

Ok, so I started with Python a few days ago. I mainly use it for data science, since I am an undergraduate chemistry student. Now I have a small problem on my hands: I have to extrapolate a function. I know how to make simple diagrams and graphs, so please try to explain it as simply as possible. I start off with:
from matplotlib import pyplot as plt
from matplotlib import style
style.use('classic')
x = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
x2 = [0.707, 0.2, 0.057, 0.016, 0.00453]
y2 = [2.086, 7.525, 26.59375,87.03125, 375.9765625]
With these values I have to work out a way to extrapolate in order to get a y (or y2) value when x = 0. I know how to do this mathematically, but I would like to know whether Python can do this and how to execute it. Is there a simple way? Could you maybe give me an example with my given values?
Thank you

Taking a quick look at your data,
from matplotlib import pyplot as plt
from matplotlib import style
style.use('classic')
x1 = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y1 = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
plt.plot(x1, y1)
x2 = [0.707, 0.2, 0.057, 0.016, 0.00453]
y2 = [2.086, 7.525, 26.59375,87.03125, 375.9765625]
plt.plot(x2, y2)
This is definitely not linear. If you know what sort of function this follows, you may want to use scipy's curve fitting to get a best-fit function which you can then use.
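For example, a minimal sketch of what that could look like (assuming, purely for illustration, a power-law-plus-offset model and made-up starting guesses):
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # hypothetical model: power law plus constant offset (an assumption, not derived from the data)
    return a * np.asarray(x, dtype=float) ** b + c

popt, pcov = curve_fit(model, x1, y1, p0=(100.0, -0.1, 0.0))  # p0 values are illustrative guesses
print(popt)                  # best-fit parameters a, b, c
print(model(0.001, *popt))   # evaluate the fitted function close to x = 0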
Edit:
If we convert the plots to log-log,
import numpy as np
plt.plot(np.log(x1), np.log(y1))
plt.plot(np.log(x2), np.log(y2))
they look pretty linear (if you squint a bit). Finding a best-fit line,
np.polyfit(np.log(x1), np.log(y1), 1)
# array([-0.05817402, 4.73809081])
np.polyfit(np.log(x2), np.log(y2), 1)
# array([-1.01664659, 0.36759068])
we can convert back to functions,
# f1:
# log(y) = -0.05817402 * log(x) + 4.73809081
# so
# y = (e ** 4.73809081) * x ** (-0.05817402)
def f1(x):
    return np.e ** 4.73809081 * x ** (-0.05817402)
xs = np.linspace(0.01, 0.8, 100)
plt.plot(x1, y1, xs, f1(xs))
# f2:
# log(y) = -1.01664659 * log(x) + 0.36759068
# so
# y = (e ** 0.36759068) * x ** (-1.01664659)
def f2(x):
    return np.e ** 0.36759068 * x ** (-1.01664659)
plt.plot(x2, y2, xs, f2(xs))
The second looks pretty darn good; the first still needs a bit of refinement (i.e. find a more representative function and curve-fit it). But you should have a pretty good picture of the process ;-)
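Coming back to the original question of the value at x = 0: note that a power law with a negative exponent diverges as x goes to 0, so these particular fits can only be evaluated close to zero, not at zero itself, e.g.
print(f1(0.001))   # changes slowly towards small x, since the exponent is only -0.058
print(f2(0.001))   # grows roughly like 1/x, since the exponent is close to -1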

Here's some example code that can hopefully help you get started on building a linear model for your purposes.
import numpy as np
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt
# sample data
x = [0.632455532, 0.178885438, 0.050596443, 0.014310835, 0.004047715]
y = [114.75, 127.5, 139.0625, 147.9492188, 153.8085938]
# linear model
lm = LinearRegression()
lm.fit(np.array(x).reshape(-1, 1), y)
test_x = np.linspace(0.01, 0.7, 100)
test_y = lm.predict(test_x.reshape(-1, 1))
## try linear model with log(x)
lm2 = LinearRegression()
lm2.fit(np.log(np.array(x)).reshape(-1, 1), y)
test_y2 = lm2.predict(np.log(test_x).reshape(-1, 1))
# plot
plt.figure()
plt.plot(x, y, label='Given Data')
plt.plot(test_x, test_y, label='Linear Model')
plt.plot(test_x, test_y2, label='Log-Linear Model')
plt.legend()
Which produces the following:
As @Hugh Bothwell showed, the values you gave do not have a linear relationship. However, taking the log of x seems to produce a better fit.
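Note that because the log-linear model takes log(x) as its input, it cannot be evaluated exactly at x = 0 (log(0) is undefined), but you can extrapolate to very small x, for example:
small_x = np.array([[0.001]])
print(lm2.predict(np.log(small_x)))   # extrapolated y at x = 0.001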

Related

Whittaker–Shannon (sinc) interpolation in Python

I am using interp1d from Scipy to interpolate a function with linear interpolation. Now I need to upgrade to Whittaker–Shannon interpolation. Is this already implemented somewhere? I am surprised it is not among the options of interp1d as it is a very common interpolation algorithm.
I am not familiar with sinc interpolation, but based on What's wrong with this Whittaker-Shannon-Kotel'nikov interpolation implementation?, I roughly follow the same pattern.
The idea is to resample the original data at a lower frequency than the original (controlled by freq_s_ratio), then reconstruct the signal using sinc kernels, and finally resample back to the original size.
This caused boundary artifacts, but padding and then truncating the signal seems to work. Here is my code:
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
def rough_sinc_interp(samples, freq_s_ratio = 0.5):
    # pad with edge values to reduce boundary artifacts
    offset_amount = int(len(samples)/2)
    padded_samples = np.concatenate([offset_amount*[samples[0]], samples, offset_amount*[samples[-1]]])
    # resample at a lower rate, then reconstruct with sinc kernels
    f_s = int(freq_s_ratio * len(padded_samples))
    resamples = scipy.signal.resample(padded_samples, f_s)
    T_s = 1/f_s
    t = np.arange(0, 1, T_s)
    y = np.zeros(len(t))
    for k in range(1, len(resamples)):
        y = y + resamples[k] * np.sinc((t - k*T_s)/T_s)
    # resample back to the padded length, then drop the padding
    return scipy.signal.resample(y, len(padded_samples))[offset_amount:-offset_amount]
np.random.seed(1337)
signal_fn = lambda x: -1*(np.sin(x) + np.cos(x**2) + np.random.normal(scale=0.5, size=len(x)) + np.log(np.abs(x**2) + 0.1)) + 50
x = np.arange(0, 10, 0.05)
y = signal_fn(x)
plt.figure(figsize=(15, 7))
plt.plot(x, y, label="noisy")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.5), label="smooth - 50%")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.15), label="smooth - 15%")
plt.plot(x, rough_sinc_interp(y, freq_s_ratio=0.1), label="smooth - 10%")
plt.legend(loc="best")
plt.show()
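For comparison, a bare-bones, textbook-style Whittaker-Shannon reconstruction (just the interpolation formula, with no resampling or padding) could look roughly like the sketch below; it interpolates exactly through the samples, so it does not smooth the noise the way the lower-rate resampling above does:
def sinc_interp(samples, t_samples, t_new):
    # Whittaker-Shannon formula: sum of sinc kernels centred on each sample
    T = t_samples[1] - t_samples[0]   # assumes uniformly spaced samples
    return np.array([np.sum(samples * np.sinc((ti - t_samples) / T)) for ti in t_new])

t_fine = np.linspace(x[0], x[-1], 1000)
plt.plot(x, y, '.', label="samples")
plt.plot(t_fine, sinc_interp(y, x, t_fine), label="sinc reconstruction")
plt.legend(loc="best")
plt.show()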

Is there a way to plot Nullclines of a nonlinear system of ODEs

So I am trying to plot the nullclines of a system of ODEs, but I can't seem to plot them in the correct way. I manage to plot them against time (t vs x and t vs y), but not as x vs y. I'm not really sure how to explain it, and I think it would be better to just show it. I am trying to replicate this. The equations and parameters are given, but this was done in a program called XPP (I'll post them at the bottom), and there are some parameters whose meaning I don't understand.
My entire code is:
import numpy as np
from scipy import integrate
import matplotlib.pyplot as plt
# define system in terms of a Numpy array
def Sys(X, t=0):
    # here X[0] = x and X[1] = y
    # protein concentration is represented by y, and mRNA concentration by x
    return np.array([ (k1*S*Kd**p)/(Kd**p + X[1]**p) - kdx*X[0], ksy*X[0] - (k2*ET*X[1])/(Km + X[1])])
#variables
k1=.1
S=1
Kd=1
kdx=.1
p=2
ksy=1
k2=1
ET=1
Km=1
# generate 100 linearly spaced time points
t = np.linspace(0, 50,100)
# initial values
Sys0 = np.array([1, 0])
#Solves the ODE
X, infodict = integrate.odeint(Sys, Sys0, t, full_output = 1, mxstep = 50000)
#assigns the solution columns to x and y
x,y = X.T
#plots the graph
fig = plt.figure(figsize=(15,5))
fig.subplots_adjust(wspace = 0.5, hspace = 0.3)
ax1 = fig.add_subplot(1,2,1)
ax1.plot(x, color="blue")
ax1.plot(y, color = 'red')
ax1.set_xlabel("Protein concentration")
ax1.set_ylabel("mRNA concentration")
ax1.set_title("Phase space")
ax1.grid()
The given equations and parameters are:
model for a simple negative feedback loop
protein (y) inhibits the synthesis of its mRNA (x)
dx/dt = k1*S*Kd^p/(Kd^p + y^p) - kdx*x
dy/dt = ksy*x - k2*ET*y/(Km + y)
p k1=0.1, S=1, Kd=1, kdx=0.1, p=2
p ksy=1, k2=1, ET=1, Km=1
# XP=y, YP=x, TOTAL=100, METH=stiff, XLO=0, XHI=4, YLO=0, YHI=1.05 (I don't exactly understand what is going on here)
Again, this uses a program called XPP or WINPP.
Any help with this would be appreciated; the original paper I am trying to replicate is: Design principles of biochemical oscillators by Bela Novak and John J. Tyson.
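For what it's worth, one rough way to draw nullclines in the (x, y) plane (only a sketch, reusing the parameters and the odeint solution from the code above, with grid ranges taken loosely from the XPP settings; not necessarily how the paper or XPP does it) is to evaluate each right-hand side on a grid and draw its zero contour:
# sketch: nullclines as zero contours of each right-hand side
xg, yg = np.meshgrid(np.linspace(0, 1.05, 300), np.linspace(0, 4, 300))
dxdt = (k1*S*Kd**p)/(Kd**p + yg**p) - kdx*xg
dydt = ksy*xg - (k2*ET*yg)/(Km + yg)
plt.figure()
plt.contour(xg, yg, dxdt, levels=[0], colors='blue')   # x-nullcline: dx/dt = 0
plt.contour(xg, yg, dydt, levels=[0], colors='red')    # y-nullcline: dy/dt = 0
plt.plot(x, y, 'k')                                    # trajectory from odeint
plt.xlabel('mRNA concentration (x)')
plt.ylabel('Protein concentration (y)')
plt.show()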

Python how to control curvature when joining two points

I have an original curve and I am developing a model curve to match it closely. Everything runs fine, but the two curves do not match. How can I control the curvature of my model curve? The code below is based on the answer here.
My code:
def curve_line(point1, point2):
    a = (point2[1] - point1[1])/(np.cosh(point2[0]) - np.cosh(point1[0]))
    b = point1[1] - a*np.sinh(point1[0])
    x = np.linspace(point1[0], point2[0], 100).tolist()
    y = (a*np.cosh(x) + b).tolist()
    return x, y
###### A sample of my code is given below
point1 = [10,100]
point2 = [20,50]
x,y = curve_line(point1, point2)
plt.plot(point1[0], point1[1], 'o')
plt.plot(point2[0], point2[1], 'o')
plt.plot(x,y) ## len(x)
My present output:
I tried the following function as well:
y = (50*np.exp(-x/10) +2.5)
The output is:
Instead of just guessing the right parameters of your model function, you can fit a model curve to your data using curve_fit.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([ 1.92, 14.35, 21.50, 25.27, 27.34, 30.32, 32.31, 34.09, 34.21])
y = np.array([8.30, 8.26, 8.13, 7.49, 6.66, 4.59, 2.66, 0.60, 0.06])
def fun(x, a, b, c):
    return a * np.cosh(b * x) + c
coef,_ = curve_fit(fun, x, y)
plt.plot(x, y, label='Original curve')
plt.plot(x, fun(x, *coef), label='Model: %5.3f cosh(%4.2f x) + %4.2f' % tuple(coef))
plt.legend()
plt.show()
If it is important that the start and end points are closely fitted, you can pass uncertainties to curve_fit, adjusting them to lower values towards the ends, e.g. by
s = np.ones(len(x))
s[1:-1] = s[1:-1] * 3
coef,_ = curve_fit(fun, x, y, sigma=s)
Your other approach, a * np.exp(b * x) + c, will also work and gives -0.006 exp(0.21 x) + 8.49.
In some cases you'll have to provide an educated guess for the initial values of the coefficients to curve_fit (it uses 1 as default).
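For instance (the starting values below are made up purely for illustration):
coef, _ = curve_fit(fun, x, y, p0=(-1, 0.1, 8))   # p0 = initial guesses for a, b, c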

SciPy Curve Fit Fails Power Law

So, I'm trying to fit a set of data with a power law of the following kind:
def f(x,N,a): # Power law fit
    if a > 0:
        return N*x**(-a)
    else:
        return 10.**300
par,cov = scipy.optimize.curve_fit(f,data,time,array([10**(-7),1.2]))
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've done manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative a, which leads to a fit with a slope of the wrong sign.
I can't figure out how to get a believable, let alone reliable, fit out of this routine, but I can't find any other good Python curve fitting routines. Do I need to write my own least-squares algorithm or is there something I'm doing wrong here?
UPDATE
In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see the documentation). Please find this solution below after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.
Original post
As suggested by @Warren Weckesser, you could use lmfit to get this task done, which allows you to assign bounds to your parameters and avoids this 'ugly' if-clause.
Since you do not provide any data, I created some which are shown here:
They follow the law f(x) = 10.5 * x ** (-0.08)
I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
lN: 2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
a: -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
So a is pretty close to the original value and np.exp(2.35450302) gives 10.53 which is also very close to the original value.
The plot then looks as follows; as you can see the fit describes the data very well:
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, Parameter, report_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()
# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()
def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data # that's what you want to minimize
# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100) # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001) # min, max define parameter bounds
# do fit, here with leastsq model
result = minimize(decay, params, args=(lx, ly))
# write error report
report_fit(params)
# plot data
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.values['lN']) * xnew ** (result.values['a']), 'r')
plt.show()
EDIT
Assuming that you have scipy 0.17 installed, you can also do the following using curve_fit. I show it for your original definition of the power law (red line in the plot below) as well as for the logarithmic data (black line in the plot below). The data is generated in the same way as above. The plot then looks as follows:
As you can see, the data is described very well. If you print popt and popt_log, you obtain array([ 10.47463426, 0.07914812]) and array([ 2.35158653, -0.08045776]), respectively (note: for the latter one you will have to take the exponential of the first argument - np.exp(popt_log[0]) = 10.502, which is close to the original data).
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a):
    return N * x ** (-a)
def f_log(x, lN, a):
    return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()
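If you also want rough uncertainty estimates for the fitted parameters, the diagonal of the returned covariance matrix contains their variances, so a quick sketch would be:
perr = np.sqrt(np.diag(pcov))            # one-sigma errors for N and a
perr_log = np.sqrt(np.diag(pcov_log))    # one-sigma errors for lN and a
print(popt, perr)
print(popt_log, perr_log)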

Parametric plot of solution of 2x2 diff. system in python, Mathematica

I've implemented a solution to the following system of equations
dy/dt = -t*y(t) - x(t)
dx/dt = 2*x(t) - y(t)^3
y(0) = x(0) = 1.
0 <= t <= 20
firstly in Mathematica and afterwards in Python.
My code in Mathematica:
s = NDSolve[
{x'[t] == -t*y[t] - x[t], y'[t] == 2 x[t] - y[t]^3, x[0] == y[0] == 1},
{x, y}, {t, 20}]
ParametricPlot[Evaluate[{x[t], y[t]} /. s], {t, 0, 20}]
From that I get the following plot: Plot1
Later on I coded the same in Python:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
g = lambda t: t
def f(z,t):
    xi = z[0]
    yi = z[1]
    gi = z[2]
    f1 = -gi*yi-xi
    f2 = 2*xi-yi**3
    return [f1,f2]
# Initial Conditions
x0 = 1.
y0 = 1.
g0 = g(0)
z0 = [x0,y0,g0]
t= np.linspace(0,20.,1000)
# Solve the ODEs
soln = odeint(f,z0,t)
x = soln[:,0]
y = soln[:,1]
plt.plot(x,y)
plt.show()
And this is the plot I get:
Plot2
If one plots the Mathematica solution again over a smaller range:
ParametricPlot[Evaluate[{x[t], y[t]} /. s], {t, 0, 6}]
they will get a similar result to the Python solution; only the axes will be placed differently.
Why is there such a big difference between the plots? What am I doing wrong?
I suspect that my Python implementation of the model is wrong, especially where f1 is calculated. Or maybe the plot() function isn't well suited to plotting parametric equations in this case.
Thanks.
ps: sorry for making your life hard by not slapping the images inside the text; I don't have enough reputation yet.
You're using t as the third element of your input vector, not as a separate parameter. The t in f(z,t) is never used; instead you use z[2], which does not follow the range of t you defined earlier (t = np.linspace(0, 20., 1000)). The lambda function for g doesn't help here: you only use it once to set the initial value g0, and never again.
Simplify your code, and remove that third parameter from your input vector (as well as the lambda function). For example:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def f(z,t):
    xi = z[0]
    yi = z[1]
    f1 = -t*yi-xi
    f2 = 2*xi-yi**3
    return [f1,f2]
# Initial Conditions
x0 = 1.
y0 = 1.
#t= np.linspace(0,20.,1000)
t = np.linspace(0, 10., 100)
# Solve the ODEs
soln = odeint(f,[x0,y0],t)
x = soln[:,0]
y = soln[:,1]
ax = plt.axes()
#plt.plot(x,y)
plt.plot(t,x)
# Put those axes at their 0 value position
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
#plt.axis([-0.085, 0.085, -0.05, 0.07])
plt.show()
I have commented out the actual plot you want and instead plot x versus t (what you have in the comments), since I feel that makes it easier to see that things are correct now. The figure I get:
