One of my Python scripts gives me results that depend on processing duration, which I display like this:
Now I would like to plot the curve of the function that best approximates the evolution of the results.
After some research, the best tool I found is curve_fit from scipy.optimize.
There is just one problem: curve_fit requires a function as its first parameter (if I have understood the documentation's example correctly), but the points on my graph are not the results of a function, so I don't know what to put there.
Can someone help me fix this problem, or propose another way to do it?
Thanks.
When you say "I would like to plot the curve of the function that best approximates the evolution of the results", you must have some sort of curve in mind that is the ideal form for the data. So, what is that function? In curve-fitting, that function is called "the model function" -- the function that models your data.
Think of it this way: you have 50 or so measurement points. You might believe that they are each perfectly accurate and free of error. But since you asked about curve-fitting, this is probably not the case. That is, you probably believe there is some noise or error in the data, and that the data can be represented by an idealized function with many fewer than 50 or so parameters (I'd guess 4 or so).
That idealized function that explains your model (and would allow predicting "optimum" values at "duration" points that you did not measure) is the "model function". If you have that, curve-fitting can help: you write that function (which probably depends on a few parameters) to model the data in Python, and find the best values for the parameters so that the model matches your data. If you don't have that, what do you mean by "curve-fitting"?
You could draw a spline through the data or otherwise smooth the data, but that gives you little predictive power beyond "interpolate/extrapolate the data without worrying about the effect of noise".
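For completeness, a minimal sketch of that spline option; durations and results are hypothetical placeholder arrays standing in for the measured points:
import numpy as np
from scipy.interpolate import UnivariateSpline

# durations must be sorted in increasing order for UnivariateSpline
spline = UnivariateSpline(durations, results, s=len(results))  # s controls smoothing
xs = np.linspace(durations.min(), durations.max(), 200)
ys = spline(xs)  # smoothed values -- useful for interpolation, not prediction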
It looks like an "exponential approach" type of curve like you get for charging a capacitor - see here.
So, I'd start with this formula:
y = a * (1 - n * np.exp(-b*x))
If I plot that with Matplotlib:
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt

# Make 100 samples along x-axis, from 0..10
x = np.linspace(0, 10, 100)

# Make an exponential approach type of curve
a = 17000
n = 1
b = 3
y = a * (1 - n * np.exp(-b*x))

# Plot it
plt.title(f'Plot for a={a}, n={n}, b={b}')
plt.plot(x, y)
plt.show()
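Once the shape looks right, you can let curve_fit find the best a, n, b for your measurements. A hedged sketch, where xdata and ydata are placeholder arrays standing in for your measured durations and results:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def model(x, a, n, b):
    return a * (1 - n * np.exp(-b*x))

# p0 is an initial guess so the optimizer starts near plausible values
popt, pcov = curve_fit(model, xdata, ydata, p0=[17000, 1, 3])
plt.plot(xdata, ydata, 'o', label='measured')
plt.plot(xdata, model(xdata, *popt), '-', label='fitted')
plt.legend()
plt.show()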
Scipy Stats comes with an ANOVA test, f_oneway(), that returns a p-value and an F-score. The p-value easily tells you whether your test passes or fails simply by comparing it to your alpha threshold, which can be chosen arbitrarily small to make the test stricter. If the p-value falls below your chosen alpha, good to go.
However, it seems like the F-value is rather meaningless unless you have a critical value to compare it to. Looking at Wikipedia, it seems like this critical value is computed based on alpha, degrees of freedom, etc. As a stats moron (but getting better!), I don't really want to try my hand at writing my own function, but I can't find one in the stats library. Am I missing something?
Reason for question: I want to make a bar graph of F-scores next to their critical values. p-values seem to be really small, so not so good for graphing.
Thanks!
You can use the ppf method of scipy.stats.f, which stands for percent point function (the inverse of the CDF).
Usage is:
from scipy.stats import f

alpha = 0.05
dof_num = 5    # numerator degrees of freedom (k - 1)
dof_den = 12   # denominator degrees of freedom (N - k)
f_critical = f.ppf(1 - alpha, dof_num, dof_den)
Note that ppf takes a lower-tail probability, and the ANOVA F-test is upper-tailed, so pass 1 - alpha (or 1 - alpha/2 if you want a two-tailed critical value). The F-score then exceeds f_critical exactly when the p-value falls below alpha.
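An end-to-end sketch with made-up sample data, showing that the F-score/critical-value comparison gives the same verdict as the p-value/alpha comparison:
from scipy.stats import f, f_oneway

a = [24., 26., 31., 28., 29.]
b = [21., 22., 25., 24., 23.]
c = [28., 30., 33., 35., 31.]

f_score, p_value = f_oneway(a, b, c)
dof_num = 3 - 1           # k - 1 groups
dof_den = 15 - 3          # N - k observations
f_critical = f.ppf(1 - 0.05, dof_num, dof_den)
print(f_score > f_critical)   # same verdict as p_value < 0.05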
I’m trying to solve a simple ODE to visualise the temporal response, which works well for constant input conditions using the new solve_ivp integration API in SciPy. For example:
from scipy.integrate import solve_ivp

def dN1_dt_simple(t, N1):
    return -100 * N1

sol = solve_ivp(fun=dN1_dt_simple, t_span=[0, 100e-3], y0=[N0])
However, I wonder: is it possible to plot the response to a time-varying input? For instance, rather than having y0 fixed at N0, can I find the response to a simple sinusoid?
Is there a compatible way to pass time-varying input conditions into the API?
The function you pass to solve_ivp has a t in its signature for exactly this reason. You can do with it whatever you like¹. For example, to get a smooth, one-time pulse, you can do:
from numpy import pi, cos
from scipy.integrate import solve_ivp

def fun(t, N1):
    forcing = 1 - cos(t) if 0 < t < 2*pi else 0  # one smooth pulse on [0, 2*pi]
    return -100*N1 + forcing

sol = solve_ivp(fun=fun, t_span=[0, 20], y0=[N0])
Note that y0 is not the input in your use of the term, but the initial condition. It is defined and makes sense for one point in time only – where you start your integration/simulation.
With ODEs, you typically model external inputs as forces or similar (affecting the time derivative of the system like in the above example) rather than direct changes to the state.
With this approach and in your context of an excitable system, N0 is already the outcome of some external input.
¹ As long as it is sufficiently smooth for the needs of the respective integrator, usually continuously differentiable (C¹). If you want to implement a step, better use a very sharp sigmoid instead.
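Applied to the sinusoid asked about in the question, a minimal sketch (the amplitude, frequency, and N0 here are assumed values):
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

N0 = 1.0  # assumed initial condition

def driven(t, N1):
    return -100 * N1 + 100 * np.sin(2 * np.pi * 10 * t)  # 10 Hz sinusoidal drive

# max_step keeps the solver from stepping over whole periods of the drive
sol = solve_ivp(driven, t_span=[0, 0.5], y0=[N0], max_step=1e-3)
plt.plot(sol.t, sol.y[0])
plt.show()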
I am trying to integrate the following equation over a surface:
intensity = (2*J1(z)/z)^2 with z = A*sqrt((x-mu1)^2 + (y-mu2)^2), where A(L) is a constant with respect to x and y, and J1 is the first-order Bessel function. To do so I use the dblquad function as below:
resultinf = dblquad(lambda r, phi: intensity(mu1, mu2, L, r, phi), 0, inf, lambda phi: 0, lambda phi: 2*pi)
The only important parameters here are r and phi in polar coordinates (the others depend on parameters that are unimportant here), with x = r*cos(phi) and y = r*sin(phi).
But when I try to integrate the function I get this message:
C:\pyzo2013b\Lib\pyzo-packages\scipy\integrate\quadpack.py:289:
UserWarning: The maximum number of subdivisions (50) has been
achieved. If increasing the limit yields no improvement it is
advised to analyze the integrand in order to determine the
difficulties. If the position of a local difficulty can be
determined (singularity, discontinuity) one will probably gain from
splitting up the interval and calling the integrator on the
subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg)
And then a completely inaccurate result, followed by:
C:\pyzo2013b\Lib\pyzo-packages\scipy\integrate\quadpack.py:289:
UserWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg)
I do understand the meaning of these messages, but I have two questions:
Is there any way for me to avoid this subdivision error other than just dividing my integration interval into smaller segments? (I'd like to check my other results by comparing them to the norm over an infinite domain, and I can't do that unless I can integrate over an infinite domain properly.) Maybe with a special-purpose integrator? But I don't know what those are or how to use them.
Why do I get a warning about a divergent integral or a singularity, given that 2*J1(z)/z converges to 1 as z goes to zero (just like a sinc)?
Does anybody have an answer?
Here are the complete relevant lines of code (all the other parameters are defined elsewhere):
def intensity(mu1, mu2, L, r, phi):  # intensity distribution for a diffracted beam
    x = r*cos(phi)
    y = r*sin(phi)
    X = x - mu1
    Y = y - mu2
    R = sqrt(X**2 + Y**2)
    scaled_R = R*Dt*pi/(lambd*L)
    return 4*special.jv(1, scaled_R)**2/scaled_R**2

resultinf = dblquad(lambda r, phi: intensity(mu1, mu2, L, r, phi), 0, inf, lambda phi: 0, lambda phi: 2*pi)
print(resultinf)
(I have modified it on the advice of gboffi for the sake of a better understanding of the function.)
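One possible source of both warnings, offered as a hedged guess rather than a confirmed answer from the thread: dblquad integrates func(y, x) with the first argument as the inner variable, so as written the lambda's first parameter r is the one swept over [0, 2*pi] while phi runs to infinity; and an area integral in polar coordinates needs the Jacobian factor r. A sketch of the call with both points applied:
from numpy import inf, pi
from scipy.integrate import dblquad

# dblquad integrates func(inner, outer): phi must come first here, and the
# polar area element r*dr*dphi contributes the extra factor of r
resultinf = dblquad(lambda phi, r: r * intensity(mu1, mu2, L, r, phi),
                    0, inf,                        # r: outer variable
                    lambda r: 0, lambda r: 2*pi)   # phi: inner bounds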
I'm using scipy's least-squares optimization to fit an exponentially-modified Gaussian distribution to a set of reaction time measurements. In general, it works well, but sometimes the optimization goes off the rails and chooses a crazy value for a parameter -- the resulting plot clearly doesn't fit the data very well. In general, it looks like the problems arise from floating-point precision errors -- we head off into 0 or inf or nan-land.
I'm thinking of doing two things:
Using the parameters to simultaneously fit a CDF and PDF to the data; I have formulas for both. (I'm using a kernel density estimate to approximate the PDF from the data.)
Somehow taking into account the distance from the initial parameter estimates (generated by the method-of-moments approach on the Wikipedia page). Those estimates are far from perfect, but are pretty good and seem to steer clear of "exploding floating point" problems.
Combining the PDF and CDF fits sounds pretty straightforward; the scales of the error will even be generally the same. But getting the initial parameter fits in there: I'm not quite sure if it's even a good idea -- but if it is:
What would I do about the difference in scale? Should I normalize the parameter "error" to a percent error?
Is there a reasonable way to decide on a relative weight between the data estimation error and parameter "error"?
Are these even the right questions to be asking? Are there generally-regarded "correct" answers, or is "try some stuff until you find something that seems to work" a good approach?
One example dataset
As requested, here's a dataset for which this process isn't working very well. I know there are only a few samples and that the data don't fit the distribution well; I'm still hoping against hope that I can get a "reasonable-looking" result from optimization.
array([ 450., 560., 692., 730., 758., 723., 486., 596., 716.,
695., 757., 522., 535., 419., 478., 666., 637., 569.,
859., 883., 551., 652., 378., 801., 718., 479., 544.])
MLE Update
I had a bunch of problems getting my MLE estimate to converge to a "reasonable" value, until I discovered this: If X contains at least one nan, np.sum(X) == nan when X is a numpy array but not when X is a pandas Series. So the sum of the log-likelihood was doing stupid things when the parameters started to go out of bounds.
Added a np.asarray() call and everything is great!
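A tiny demonstration of the behaviour described above:
import numpy as np
import pandas as pd

x = np.array([1.0, np.nan, 3.0])
print(np.sum(x))            # nan -- NaN propagates through a numpy sum
print(pd.Series(x).sum())   # 4.0 -- pandas skips NaN by default (skipna=True)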
This should have been a comment but I ran out of space.
I think a maximum likelihood fit is probably the most appropriate approach here. The ML method is already implemented for many distributions in scipy.stats. For example, you can find the MLE of the normal distribution by calling scipy.stats.norm.fit, and find the MLE of the exponential distribution in a similar way. Combining these two resulting MLE parameters should give you a pretty good starting parameter for the Ex-Gaussian ML fit. In fact, I would imagine most of your data is quite nicely normally distributed. If that is the case, the ML parameter estimates for the normal distribution alone should give you a pretty good starting parameter.
Since the Ex-Gaussian only has 3 parameters, I don't think an ML fit will be hard at all. If you could provide a dataset for which your current method doesn't work well, it would be easier to show a real example.
Alright, here you go:
>>> import scipy.special as sse
>>> import scipy.stats as sss
>>> import scipy.optimize as so
>>> from numpy import *
>>> def eg_pdf(p, x): #defines the PDF
m=p[0]
s=p[1]
l=p[2]
return 0.5*l*exp(0.5*l*(2*m+l*s*s-2*x))*sse.erfc((m+l*s*s-x)/(sqrt(2)*s))
>>> xo=array([ 450., 560., 692., 730., 758., 723., 486., 596., 716.,
695., 757., 522., 535., 419., 478., 666., 637., 569.,
859., 883., 551., 652., 378., 801., 718., 479., 544.])
>>> sss.norm.fit(xo) #get the starting parameter vector from the normal MLE
(624.22222222222217, 132.23977474531389)
>>> def llh(p, f, x): #defines the negative log-likelihood function
return -sum(log(f(p,x)))
>>> so.fmin(llh, array([624.22222222222217, 132.23977474531389, 1e-6]), (eg_pdf, xo)) #yeah, the data is not good
Warning: Maximum number of function evaluations has been exceeded.
array([ 6.14003407e+02, 1.31843250e+02, 9.79425845e-02])
>>> przt=so.fmin(llh, array([624.22222222222217, 132.23977474531389, 1e-6]), (eg_pdf, xo), maxfun=1000) #so, we increase the upper limit on function calls
Optimization terminated successfully.
Current function value: 170.195924
Iterations: 376
Function evaluations: 681
>>> llh(array([624.22222222222217, 132.23977474531389, 1e-6]), eg_pdf, xo)
400.02921290185645
>>> llh(przt, eg_pdf, xo) #quite an improvement over the initial guess
170.19592431051217
>>> przt
array([ 6.14007039e+02, 1.31844654e+02, 9.78934519e-02])
The optimizer used here (fmin, or the Nelder-Mead simplex algorithm) does not use any gradient information and usually works much more slowly than optimizers that do. It appears that the derivative of the negative log-likelihood function of the Exponential Gaussian can easily be written in closed form. If so, optimizers that utilize the gradient/derivative (such as fmin_bfgs) will be a better and more efficient choice.
The other thing to consider is parameter constraints. By definition, sigma and lambda have to be positive for the Exponential Gaussian. You can use a constrained optimizer (such as fmin_l_bfgs_b). Alternatively, you can optimize:
>>> def eg_pdf2(p, x): #defines the PDF
m=p[0]
s=exp(p[1])
l=exp(p[2])
return 0.5*l*exp(0.5*l*(2*m+l*s*s-2*x))*sse.erfc((m+l*s*s-x)/(sqrt(2)*s))
Due to the functional invariance property of the MLE, the MLE of this function should be the same as that of the original eg_pdf. There are other transformations you can use, besides exp(), to map (-inf, +inf) to (0, +inf).
And you can also consider http://en.wikipedia.org/wiki/Lagrange_multiplier.
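To make the gradient-based, log-parameterized route concrete, a minimal sketch (the starting values are the normal-MLE ones from the transcript above; minimize() falls back to a numerical gradient since none is supplied):
import numpy as np
import scipy.special as sse
import scipy.optimize as so

def eg_pdf2(p, x):
    m, s, l = p[0], np.exp(p[1]), np.exp(p[2])  # sigma, lambda on the log scale
    return 0.5*l*np.exp(0.5*l*(2*m + l*s*s - 2*x))*sse.erfc((m + l*s*s - x)/(np.sqrt(2)*s))

def nllh(p, x):
    return -np.sum(np.log(eg_pdf2(p, x)))

xo = np.array([450., 560., 692., 730., 758., 723., 486., 596., 716.,
               695., 757., 522., 535., 419., 478., 666., 637., 569.,
               859., 883., 551., 652., 378., 801., 718., 479., 544.])
p0 = np.array([624.222, np.log(132.240), np.log(1e-6)])
res = so.minimize(nllh, p0, args=(xo,), method='BFGS')
mu, sigma, lam = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])  # back-transform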
I am having trouble solving the optical Bloch equation, which is a first-order ODE system with complex values. I have found that scipy may solve such a system, but its webpage offers too little information and I can hardly understand it.
I have 8 coupled first-order ODEs, and I should generate a function like:
def derv(y):
    # compute the time derivative of the elements in y
    return answers as an array
and then do complex_ode(derv).
My questions are:
1. My y is not a list but a matrix; how can I give a correct output that fits into complex_ode()?
2. complex_ode() needs a Jacobian; I have no idea how to start constructing one, and what type should it be?
3. Where should I put the initial conditions, as in the normal ode, and the time linspace?
Here is the link to scipy's complex_ode documentation:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.complex_ode.html
Could anyone provide me with more information so that I can learn a bit more?
I think we can at least point you in the right direction. The optical Bloch equation is a problem which is well understood in the scientific community (although not by me :-)), so there are already solutions to this particular problem on the internet.
http://massey.dur.ac.uk/jdp/code.html
However, to address your needs: you spoke of using complex_ode, which I suppose is fine, but I think plain scipy.integrate.ode will work just fine as well, according to its documentation:
from scipy.integrate import ode

y0, t0 = [1.0j, 2.0], 0

def f(t, y, arg1):
    return [1j*arg1*y[0] + y[1], -arg1*y[1]**2]

def jac(t, y, arg1):
    return [[1j*arg1, 1], [0, -arg1*2*y[1]]]

r = ode(f, jac).set_integrator('zvode', method='bdf', with_jacobian=True)
r.set_initial_value(y0, t0).set_f_params(2.0).set_jac_params(2.0)
t1 = 10
dt = 1
while r.successful() and r.t < t1:
    r.integrate(r.t + dt)
    print(r.t, r.y)
You also have the added benefit of an older, more established, and better documented function. I am surprised you have 8 and not 9 coupled ODEs, but I'm sure you understand this better than I. Yes, you are correct: your function should be of the form ydot = f(t, y), which you call derv(), but you're going to need to make sure your function takes at least two parameters, like derv(t, y). If your y is a matrix, no problem! Just "reshape" it in the derv(t, y) function like so:
Y = numpy.reshape(y, (num_rows, num_cols))
As long as num_rows*num_cols = 8 (your number of ODEs), you should be fine. Then use the matrix in your computations. When you're all done, just be sure to return a vector and not a matrix, like:
out = numpy.reshape(Y, (8,))
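Putting the pieces together, a minimal sketch of that pattern (the 2x4 shape and the dynamics are placeholders; substitute your Bloch equations):
import numpy as np
from scipy.integrate import ode

num_rows, num_cols = 2, 4   # any shape with 8 elements works the same way

def derv(t, y):
    Y = np.reshape(y, (num_rows, num_cols))  # flat vector -> state matrix
    dYdt = 1j * Y                            # placeholder dynamics
    return np.reshape(dYdt, (8,))            # state matrix -> flat vector

Y0 = np.ones((num_rows, num_cols), dtype=complex)  # placeholder initial state
r = ode(derv).set_integrator('zvode', method='bdf')
r.set_initial_value(Y0.flatten(), 0.0)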
The Jacobian is not required, but it will likely allow the computation to proceed much more quickly. If you do not know how to compute it, you may want to consult Wikipedia or a calculus textbook. It's pretty simple, but can be time consuming.
As far as initial conditions go, you should probably already know what those should be, whether they're complex or real valued. As long as you select values that are within reason, it shouldn't matter much.