Running the following code,
import numpy as np
from scipy.optimize import curve_fit
x = np.array([50.849937, 53.849937, 56.849937, 59.849937, 62.849937, 65.849937, 68.849937, 71.849937, 74.849937, 77.849937, 80.849937, 83.849937, 86.849937, 89.849937, 92.849937])
y = np.array([410.67800, 402.63800, 402.63800, 386.55800, 330.27600, 217.71400, 72.98990, 16.70860, 8.66833, 40.82920, 241.83400, 386.55800, 394.59800, 394.59800, 402.63800])
def f(om, a, i, c):
    return a - i * np.exp(-c * (om - 74.0)**2)

par, cov = curve_fit(f, x, y)
stdev = np.sqrt(np.diag(cov))
produces a graph (not included here), with the following parameters and standard deviations:
par = [ 4.09652163e+02, 4.33961227e+02, 1.58719772e-02]
stdev = [ 1.46309578e+01, 2.44878171e+01, 2.40474753e-03]
However, when I try to fit this data to the following function, in which the center omo is also a free parameter:
def f(om, a, i, c, omo):
    return a - i * np.exp(-c * (om - omo)**2)
it doesn't work; it produces a standard deviation of
stdev = [inf, inf, inf, inf, inf]
Is there any way to fix this?
It looks like the fit isn't converging. Try supplying an initial guess through p0:
par, cov = curve_fit(f, x, y, p0=[1.,1.,1.,74.])
which results in:
par = [ 4.11892318e+02, 4.36953868e+02, 1.55741131e-02, 7.32560690e+01]
stdev = [ 1.17579445e+01, 1.94401006e+01, 1.86709423e-03, 2.62952690e-01]
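One plausible reason the default start fails (a sketch, not something confirmed in the original post): when p0 is omitted, curve_fit starts every parameter at 1. With c = 1 and omo = 1 the Gaussian term underflows to zero over the whole x range, so at that starting point the model does not respond to changes in i, c or omo. The x grid below is rebuilt from the values in the question.

```python
import numpy as np

# Same grid as in the question: 50.849937, 53.849937, ..., 92.849937
x = np.array([50.849937 + 3.0 * k for k in range(15)])

def f(om, a, i, c, omo):
    return a - i * np.exp(-c * (om - omo)**2)

# curve_fit's starting point when p0 is not given: all ones
default_start = np.ones(4)
gauss_default = np.exp(-default_start[2] * (x - default_start[3])**2)

# With the center placed near the dip, the Gaussian term is no longer zero
gauss_centered = np.exp(-1.0 * (x - 74.0)**2)

flat_at_default = bool(np.all(gauss_default == 0.0))
alive_when_centered = bool(gauss_centered.max() > 0.0)
print(flat_at_default, alive_when_centered)
```

This is why a guess of 74 for omo is enough to get the fit moving, even with crude guesses for the other parameters.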
You can calculate the initial guess from the data:
%matplotlib inline
import pylab as pl
import numpy as np
from scipy.optimize import curve_fit
x = np.array([50.849937, 53.849937, 56.849937, 59.849937, 62.849937, 65.849937, 68.849937, 71.849937, 74.849937, 77.849937, 80.849937, 83.849937, 86.849937, 89.849937, 92.849937])
y = np.array([410.67800, 402.63800, 402.63800, 386.55800, 330.27600, 217.71400, 72.98990, 16.70860, 8.66833, 40.82920, 241.83400, 386.55800, 394.59800, 394.59800, 402.63800])
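def f(om, a, i, c, omo):
    return a - i * np.exp(-c * (om - omo)**2)

# Initial guess: baseline, depth of the dip, width, position of the minimum.
# np.ptp(y) replaces y.ptp(), which was removed from ndarray in NumPy 2.0.
par, cov = curve_fit(f, x, y, p0=[y.max(), np.ptp(y), 1, x[np.argmin(y)]])
stdev = np.sqrt(np.diag(cov))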
pl.plot(x, y, "o")
x2 = np.linspace(x.min(), x.max(), 100)
pl.plot(x2, f(x2, *par))
I'm trying to fit an exponential curve on a histogram created from the variable y1_pt and then get the exponential's parameters. Problem is it gives me the following warnings:
OptimizeWarning: Covariance of the parameters could not be estimated
and pcov_exponential =
array([[inf, inf, inf],
[inf, inf, inf],
[inf, inf, inf]])
and the resulting exponential fit looks more or less random to me (plot not included).
Does anyone have a clue as to what's wrong?
import pandas as pd
import numpy
from pylab import *
import scipy.stats as ss
from scipy.optimize import curve_fit
df=pd.read_hdf('data.h5','dataset')
pty1 = df['y1_pt']  # the DataFrame loaded above is named df, not df1
bins1 = numpy.linspace(35, 1235, 100)
counts, bins = numpy.histogram(pty1, bins = bins1, range = [35, 1235], density = False)
binscenters = numpy.array([0.5 * (bins1[i] + bins1[i+1]) for i in range(len(bins1)-1)])
def exponential(x, a, k, b):
    return a*np.exp(-x*k) + b
popt_exponential, pcov_exponential = curve_fit(exponential, xdata=binscenters, ydata=counts)
print(popt_exponential)
xspace = numpy.linspace(0, 6, 100000)
plt.bar(binscenters, counts, color='navy', label=r'Histogram entries')
plt.plot(xspace, exponential(xspace, *popt_exponential), color='darkorange', linewidth=2.5, label=r'Fitted function')
plt.show()
I think you are missing a minus sign in the exponential formula, hence the overflow. It should be a * np.exp( - x * k) + b
See the example at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
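Independently of the sign, an infinite covariance often points to a poor starting point. The sketch below uses hypothetical noise-free data in place of the histogram (the real data.h5 is not available) and derives p0 from a straight-line fit to log(counts). The values 1000, 0.005 and 5 are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(x, a, k, b):
    return a * np.exp(-x * k) + b

# Hypothetical noise-free stand-in for the histogram (not the original data)
bins = np.linspace(35, 1235, 100)
centers = 0.5 * (bins[:-1] + bins[1:])
counts = exponential(centers, 1000.0, 0.005, 5.0)

# With the default start (a = k = b = 1) the term exp(-x * 1) is below 1e-17
# for every bin center, so the model barely depends on a or k there.
print(np.exp(-centers[0]))

# Data-driven start: a straight-line fit to log(counts) gives rough k and a.
mask = counts > 0  # empty bins cannot be log-transformed
slope, intercept = np.polyfit(centers[mask], np.log(counts[mask]), 1)
p0 = (np.exp(intercept), -slope, 0.0)

popt, pcov = curve_fit(exponential, centers, counts, p0=p0)
perr = np.sqrt(np.diag(pcov))
print(popt, perr)
```

With real, noisy counts the recovered parameters will of course be less exact, but the same way of building p0 should still apply.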
I am trying to implement an exponential regression function using numpy and sympy (sp stands for sympy). At first I tried to use np.exp in func_exp, but it raised an AttributeError, so I decided to use sympy instead. This is the code:
import numpy as np
from numpy.linalg import matrix_rank
import scipy
import scipy.integrate
import random
import matplotlib.pyplot as plt
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from sympy import integrate
import sympy as sp
x, y = sp.symbols('x, y')
sp.init_printing(use_unicode=True,use_latex='mathjax')
def exponential_regression (x_data, y_data):
    def func_exp(x, a, b):
        return a*sp.exp(b*x)
    popt, pcov = scipy.optimize.curve_fit(func_exp, x_data, y_data)
    a = popt[0] # component a, optimal parameter (popt).
    b = popt[1] # component b, optimal parameter (popt).
    plt.figure()
    puntos = plt.plot(x_data, y_data, 'x', color='xkcd:maroon')
    curva_regresion = plt.plot(x_data, func_exp(x_data, a, b), color='xkcd:teal')
    plt.show(puntos, curva_regresion)
    return func_exp(x, a, b)
I try to execute:
x_data = np.arange(0, 51) # Creates an array from 0 to 50.
y_data = np.array([0.001, 0.199, 0.394, 0.556, 0.797, 0.891, 1.171, 1.128, 1.437,
1.525, 1.720, 1.703, 1.895, 2.003, 2.108, 2.408, 2.424,2.537,
2.647, 2.740, 2.957, 2.58, 3.156, 3.051, 3.043, 3.353, 3.400,
3.606, 3.659, 3.671, 3.750, 3.827, 3.902, 3.976, 4.048, 4.018,
4.286, 4.353, 4.418, 4.382, 4.444, 4.485, 4.465, 4.600, 4.681,
4.737, 4.792, 4.845, 4.909, 4.919, 5.100])
exponential_regression(x_data, y_data)
And I get:
exponential_regression(x_data, y_data)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
Traceback (most recent call last):
File "<ipython-input-122-ee7c243ae4b0>", line 1, in <module>
exponential_regression(x_data, y_data)
File "/Volumes/TOSHIBA/spline.py", line 35, in exponential_regression
popt, pcov = scipy.optimize.curve_fit(func_exp, x_data, y_data)
File "/Applications/anaconda3/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 742, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/Applications/anaconda3/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 387, in leastsq
gtol, maxfev, epsfcn, factor, diag)
error: Result from function call is not a proper array of floats.
What is wrong? Thanks in advance!
Here is a minimal example for your fit function as close as possible to your code but removing all unnecessary elements. You can easily remove c to adhere to your requirements:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def func_exp(x, a, b, c):
    #c = 0
    return a * np.exp(b * x) + c

def exponential_regression (x_data, y_data):
    popt, pcov = curve_fit(func_exp, x_data, y_data, p0 = (-1, 0.01, 1))
    print(popt)
    puntos = plt.plot(x_data, y_data, 'x', color='xkcd:maroon', label = "data")
    curva_regresion = plt.plot(x_data, func_exp(x_data, *popt), color='xkcd:teal', label = "fit: {:.3f}, {:.3f}, {:.3f}".format(*popt))
    plt.legend()
    plt.show()
    return func_exp(x_data, *popt)

x_data = np.arange(0, 51)
y_data = np.array([0.001, 0.199, 0.394, 0.556, 0.797, 0.891, 1.171, 1.128, 1.437,
1.525, 1.720, 1.703, 1.895, 2.003, 2.108, 2.408, 2.424,2.537,
2.647, 2.740, 2.957, 2.58, 3.156, 3.051, 3.043, 3.353, 3.400,
3.606, 3.659, 3.671, 3.750, 3.827, 3.902, 3.976, 4.048, 4.018,
4.286, 4.353, 4.418, 4.382, 4.444, 4.485, 4.465, 4.600, 4.681,
4.737, 4.792, 4.845, 4.909, 4.919, 5.100])
exponential_regression(x_data, y_data)
Output with c = 0: (plot not included)
Output with c != 0: (plot not included)
Main changes explained:
Removed sympy - it has nothing to do with the fitting procedure.
The definition of the exponential fit function is placed outside exponential_regression, so it can be accessed from other parts of the script. It uses np.exp because you work with numpy arrays in scipy.
Added the parameter p0 which contains the initial guesses for the parameters. Fit functions are often sensitive to this initial guess because of local extrema.
Unpacked the parameters with *popt instead of writing a = popt[0], b = popt[1], etc., so the code keeps working when the number of parameters changes.
Removed unnecessary imports. Keep your namespace free from clutter.
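If you would rather keep the three-parameter function and still fit without the offset, one option is a thin wrapper that pins c to zero. This is only a sketch: func_exp_no_offset is a name introduced here, and the demo data are hypothetical noise-free values, not the data from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def func_exp(x, a, b, c):
    return a * np.exp(b * x) + c

def func_exp_no_offset(x, a, b):
    # Same model with the offset pinned to zero
    return func_exp(x, a, b, 0.0)

# Hypothetical noise-free data, used only to show the call pattern
x_demo = np.linspace(0.0, 5.0, 51)
y_demo = func_exp(x_demo, 2.0, 0.5, 0.0)

popt, pcov = curve_fit(func_exp_no_offset, x_demo, y_demo, p0=(1.0, 0.3))
print("a = {:.3f}, b = {:.3f}".format(*popt))
```

Because curve_fit infers the number of parameters from the function signature, popt has two entries here and *popt still unpacks correctly.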
There is an equation of exponential truncated power law in the article below:
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.
like this:
P(rg) = (rg + rg0)^(-beta) * exp(-rg / K)
It is an exponentially truncated power law with three parameters to be estimated: rg0, beta and K. We have collected the radius of gyration (rg) of several users and uploaded the data to GitHub: radius of gyrations.txt
The following code can be used to read the data and calculate P(rg):
import numpy as np
# read radius of gyration from file
rg = []
with open('/path-to-the-data/radius of gyrations.txt', 'r') as f:
    for line in f:
        rg.append(float(line.strip()))
# calculate P(rg): sort in descending order, then assign evenly spaced values in [0, 1]
rg = np.array(sorted(rg, reverse=True))
prg = np.arange(len(rg)) / float(len(rg) - 1)
Alternatively, here are the rg and prg data directly:
rg = np.array([ 20.7863444 , 9.40547933, 8.70934714, 8.62690145,
7.16978087, 7.02575052, 6.45280959, 6.44755478,
5.16630287, 5.16092884, 5.15618737, 5.05610068,
4.87023561, 4.66753197, 4.41807645, 4.2635671 ,
3.54454372, 2.7087178 , 2.39016885, 1.9483156 ,
1.78393238, 1.75432688, 1.12789787, 1.02098332,
0.92653501, 0.32586582, 0.1514813 , 0.09722761,
0. , 0. ])
prg = np.array([ 0. , 0.03448276, 0.06896552, 0.10344828, 0.13793103,
0.17241379, 0.20689655, 0.24137931, 0.27586207, 0.31034483,
0.34482759, 0.37931034, 0.4137931 , 0.44827586, 0.48275862,
0.51724138, 0.55172414, 0.5862069 , 0.62068966, 0.65517241,
0.68965517, 0.72413793, 0.75862069, 0.79310345, 0.82758621,
0.86206897, 0.89655172, 0.93103448, 0.96551724, 1. ])
I can plot the P(r_g) and r_g using the following python script:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(rg, prg, 'bs', alpha = 0.3)
# roughly estimated params:
# rg0=1.8, beta=0.15, K=5
plt.plot(rg, (rg+1.8)**-.15*np.exp(-rg/5))
plt.yscale('log')
plt.xscale('log')
plt.xlabel('$r_g$', fontsize = 20)
plt.ylabel('$P(r_g)$', fontsize = 20)
plt.show()
How can I use these rg data to estimate the three parameters above? I would like to solve this in Python.
Following @Michael's suggestion, we can solve the problem using scipy.optimize.curve_fit:
from scipy import optimize

def func(rg, rg0, beta, K):
    return (rg + rg0) ** (-beta) * np.exp(-rg / K)

popt, pcov = optimize.curve_fit(func, rg, prg, p0=[1.8, 0.15, 5])
print(popt)
print(pcov)
The results are given below:
[ 1.04303608e+03 3.02058550e-03 4.85784945e+00]
[[ 1.38243336e+18 -6.14278286e+11 -1.14784675e+11]
[ -6.14278286e+11 2.72951900e+05 5.10040746e+04]
[ -1.14784675e+11 5.10040746e+04 9.53072925e+03]]
Then we can inspect the results by plotting the fitted curve.
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(rg, prg, 'bs', alpha = 0.3)
plt.plot(rg, (rg+popt[0])**-(popt[1])*np.exp(-rg/popt[2]) )
plt.yscale('log')
plt.xscale('log')
plt.xlabel('$r_g$', fontsize = 20)
plt.ylabel('$P(r_g)$', fontsize = 20)
plt.show()
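Before trusting these numbers, it may be worth turning the covariance matrix into standard errors. The sketch below copies popt and pcov from the output printed above and takes the square root of the diagonal. Every standard error comes out larger than the estimate itself, which suggests (though it does not prove) that these 30 points do not constrain the three parameters well.

```python
import numpy as np

# Values copied from the output printed above
popt = np.array([1.04303608e+03, 3.02058550e-03, 4.85784945e+00])
pcov = np.array([
    [1.38243336e+18, -6.14278286e+11, -1.14784675e+11],
    [-6.14278286e+11, 2.72951900e+05, 5.10040746e+04],
    [-1.14784675e+11, 5.10040746e+04, 9.53072925e+03],
])

perr = np.sqrt(np.diag(pcov))   # one-standard-deviation errors
relative = perr / np.abs(popt)  # error relative to the estimate

for name, value, err in zip(("rg0", "beta", "K"), popt, perr):
    print("{}: {:.4g} +/- {:.4g}".format(name, value, err))
```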
I hope that my question can be answered without runnable code, as it's too complex to create a small but running version. The following code is part of my project:
x0 = [0.5, 0.5]
solution = optimize.root(solveMe, x0, args=(Param, Result, False), method='broyden1')
if solution.status != 1:
    Result.__dict__
    two.plot_some_function(solveMe, np.arange(0.1, 1.5, 0.1), np.arange(0.1, 1.5, 0.1), Param, Result, False)
    raise Exception('did not converge')
solveMe is a function that returns a vector of two residuals, F(x). Whenever root() does not converge, I create a grid between 0.1 and 1.5 for both variables and plot the output of F(x) for every x on that two-dimensional grid. I also check whether there are grid points at which both residuals are close to zero. The code follows:
# for debugging
def plot_some_function(func, x, y, *args):
    from matplotlib import cm
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
    X, Y = np.meshgrid(x, y)
    Z0 = np.empty(X.shape)
    Z1 = np.empty(X.shape)
    for idx in np.ndindex(X.shape):
        x, y = X[idx], Y[idx]
        Z0[idx], Z1[idx] = func([x, y], *args)
        # report grid points where both residuals are close to zero
        if abs(Z0[idx]) < 0.1 and abs(Z1[idx]) < 0.1:
            print(idx)
    for Z in [Z0, Z1]:
        fig, ax = plt.subplots()
        p = ax.pcolor(X, Y, Z, cmap=cm.RdBu, vmin=abs(Z).min(), vmax=abs(Z).max())
        cb = fig.colorbar(p, ax=ax)
    plt.show()
I ran my main code and got the exception: for one set of parameters the solution does not converge. The plots (not included here) show that the function is actually quite smooth.
The idx output was
(1, 0)
(2, 1)
which, because np.meshgrid arrays are indexed as [row (y), column (x)], corresponds to (x, y) = (0.1, 0.2) and (0.2, 0.3).
But why does root() not converge then? Its output follows:
status: 2
success: False
fun: array([ 0.01725503, 0.25234002])
x: array([ 0.36981866, 0.4440247 ])
message: 'The maximum number of iterations allowed has been reached.'
nit: 300
Refining the grid further gives a point with even smaller residuals:
solveMe([0.165, 0.258], Param, Result, False)
Out[6]: array([ 0.00012388, 0.00105457])
These residuals are much smaller than the ones at the point where the solver stopped.
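One thing that may be worth trying (a sketch under assumptions, since solveMe itself is not shown): restart the solver from a grid point that already has small residuals, and compare other methods that scipy.optimize.root supports, such as 'hybr' and 'lm'. The function toy_residuals below is a hypothetical smooth stand-in for solveMe, and its root is placed at (0.165, 0.258) purely for illustration.

```python
import numpy as np
from scipy import optimize

def toy_residuals(x, shift):
    # Hypothetical smooth stand-in for solveMe; its root is at `shift`
    u, v = x[0] - shift[0], x[1] - shift[1]
    return np.array([u + 0.1 * v**3, v + 0.1 * u**3])

shift = (0.165, 0.258)
starts = [np.array([0.5, 0.5]),   # the original starting point
          np.array([0.2, 0.3])]   # a grid point with small residuals

results = {}
for method in ("hybr", "lm"):
    for x0 in starts:
        sol = optimize.root(toy_residuals, x0, args=(shift,), method=method)
        results[(method, tuple(x0))] = sol
        print(method, x0, sol.success, sol.x)
```

Whether this helps with the real solveMe depends on the function; the point is only that the grid search already yields a much better x0 than [0.5, 0.5].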
I have two one-dimensional numpy arrays, x and y, and I would like to reproduce y with a known function in order to obtain "beta". Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y = np.array([ 0.04022493, 0.04287536, 0.03983657, 0.0393201 , 0.03810298,
0.0363814 , 0.0331144 , 0.03074823, 0.02795767, 0.02413816,
0.02180802, 0.01861309, 0.01632699, 0.01368056, 0.01124232,
0.01005323, 0.00867196, 0.00940864, 0.00961282, 0.00892419,
0.01048963, 0.01199101, 0.01533408, 0.01855704, 0.02163586,
0.02630014, 0.02971127, 0.03511223, 0.03941218, 0.04280329,
0.04689105, 0.04960554, 0.05232003, 0.05487037, 0.05843364,
0.05120701])
x = np.array([ 0., 0.08975979, 0.17951958, 0.26927937, 0.35903916,
0.44879895, 0.53855874, 0.62831853, 0.71807832, 0.80783811,
0.8975979 , 0.98735769, 1.07711748, 1.16687727, 1.25663706,
1.34639685, 1.43615664, 1.52591643, 1.61567622, 1.70543601,
1.7951958 , 1.88495559, 1.97471538, 2.06447517, 2.15423496,
2.24399475, 2.33375454, 2.42351433, 2.51327412, 2.60303391,
2.6927937 , 2.78255349, 2.87231328, 2.96207307, 3.05183286,
3.14159265])
def func(x, beta):
    return 1.0 / (4.0 * np.pi) * (1 + beta * (3.0 / 2 * np.cos(x)**2 - 1.0 / 2))

guesses = [20]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = func(x, *popt)
plt.figure(1)
plt.plot(x,y,'ro',x,y_fit,'k-')
plt.show()
The code runs, but the fit is completely off (plot not included). Any idea why?
It looks like the formula to use contains an additional parameter, p:
def func(x, beta, p):
    return p / (4.0 * np.pi) * (1 + beta * (3.0 / 2 * np.cos(x)**2 - 1.0 / 2))

guesses = [20, 5]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = func(x, *popt)  # evaluate on x so that the plot below lines up
plt.figure(2)
plt.plot(x, y, 'ro', x, y_fit, 'k-')
plt.show()
print(popt) # [ 1.23341604 0.27362069]
In the popt which one is beta and which one is p?
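curve_fit returns popt in the same order as the parameters of the model function after the independent variable, so with func(x, beta, p) the first entry is beta and the second is p. In the result above, 1.23341604 would therefore be beta and 0.27362069 would be p. A small sketch that labels the entries automatically follows; the demo data are hypothetical noise-free values, chosen only to show the ordering.

```python
import inspect
import numpy as np
from scipy.optimize import curve_fit

def func(x, beta, p):
    return p / (4.0 * np.pi) * (1 + beta * (3.0 / 2 * np.cos(x)**2 - 1.0 / 2))

# Hypothetical noise-free data, used only to show the ordering
x_demo = np.linspace(0.0, np.pi, 36)
y_demo = func(x_demo, 1.2, 0.3)

popt, pcov = curve_fit(func, x_demo, y_demo, p0=[1.0, 1.0])

# Parameter names after the independent variable, in signature order
names = list(inspect.signature(func).parameters)[1:]
fitted = dict(zip(names, popt))
print(fitted)
```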
This is perhaps not what you want but, if you are just trying to get a good fit to the data, you could use np.polyfit:
fit = np.polyfit(x,y,4)
fit_fn = np.poly1d(fit)
plt.scatter(x,y,label='data',color='r')
plt.plot(x,fit_fn(x),color='b',label='fit')
plt.legend(loc='upper left')
Note that fit gives the coefficient values of, in this case, a 4th order polynomial:
>>> fit
array([-0.00877534, 0.05561778, -0.09494909, 0.02634183, 0.03936857])
This is about as good as you can get with that equation (assuming you have the equation right, as @mdurant suggested). To improve the fit further, an additional intercept term is required:
def func(x, beta, icpt):
    return 1.0 / (4.0 * np.pi) * (1 + beta * (3.0 / 2 * np.cos(x)**2 - 1.0 / 2)) + icpt

guesses = [20, 0]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = func(x, *popt)
plt.figure(1)
plt.plot(x, y, 'ro', x, y_fit, 'k-')
print(popt) #[ 0.33748816 -0.05780343]