I have the following function to_minimize which should be equal to the log-likelihood of a dataset for a Weibull distribution, truncated from the left at d.
import numpy as np
from scipy.optimize import minimize
def to_minimize(args, data, d=1):
    theta, tau = args
    n = len(data)
    if tau <= 0 or theta <= 0:
        pass
    term1 = n * (np.log(tau) - tau * np.log(theta) - (-d / theta) ** tau)
    term2 = 0
    for x in data:
        term2 += (tau - 1) * np.log(x) + (-x / theta) ** tau
    return term1 + term2

data = np.random.rand(100)
weibull = minimize(lambda args: -to_minimize(args, data),
                   x0=np.array((1., 1.)), bounds=np.array([(1e-15, 10), (1e-15, 10)]))
As far as I can tell, the only thing that should cause an error of the form
RuntimeWarning: invalid value encountered in double_scalars
should be if tau or theta is 0. But the bounds on those parameters are specifically above 0, so why does my optimization routine crash?
After calling np.seterr(all='raise') and debugging some more, I noticed that I had an error in my calculations. The minus sign in the exponential term has to be applied after the power, i.e. -(x / theta) ** tau rather than (-x / theta) ** tau. Otherwise the code raises a negative number to a non-integer power, which is undefined for real floats and yields nan.
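A minimal sketch of the corrected function may make the fix concrete. The name weibull_trunc_loglik, the vectorised form and the sample data are illustrative assumptions, not the original poster's code. The sign is applied after the power, and the truncation term enters with a plus sign because the log of the survival function at d is subtracted:

```python
import numpy as np

def weibull_trunc_loglik(args, data, d=1.0):
    """Log-likelihood of a Weibull(theta, tau) sample left-truncated at d."""
    theta, tau = args
    data = np.asarray(data, dtype=float)
    n = data.size
    # -log S(d) = +(d / theta) ** tau, with the sign applied after the power
    term1 = n * (np.log(tau) - tau * np.log(theta) + (d / theta) ** tau)
    term2 = np.sum((tau - 1) * np.log(data) - (data / theta) ** tau)
    return term1 + term2

# The original failure mode: a negative base with a non-integer exponent.
with np.errstate(invalid="ignore"):
    bad = np.float64(-0.5) ** 1.5  # nan

# Hypothetical sample lying above the truncation point d = 1.
sample = np.random.default_rng(0).uniform(1.0, 3.0, 50)
value = weibull_trunc_loglik((1.3, 0.7), sample)
print(bad, value)
```

With theta = tau = 1 the model reduces to a shifted exponential, so a single observation x = 2 truncated at d = 1 should give a log-likelihood of -1, which is a quick sanity check on the signs.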
I am running scipy.optimize.minimize trying to maximize the likelihood for left-truncated data on a Gompertz distribution. Since the data is left-truncated at 1, I get this likelihood:
# for a single point x_i, the left-truncated log-likelihood is:
# ln(tau) + tau*(ln(theta) - ln(x_i)) - (theta / x_i) ** tau - ln(x_i) - ln(1 - exp(-(theta / d) ** tau))
def to_minimize(args, data, d=1):
    theta, tau = args
    if tau <= 0 or theta <= 0 or theta / d < 0 or np.exp(-(theta / d) ** tau) >= 1:
        print('ERROR')
    term1 = len(data) * (np.log(tau) + tau * np.log(theta) - np.log(1 - np.exp(-(theta / d) ** tau)))
    term2 = 0
    for x in data:
        term2 += (-(tau + 1) * np.log(x)) - (theta / x) ** tau
    return term1 + term2
This will fail in all instances where the if statement is true. In other words, tau and theta have to be strictly positive, and theta ** tau must be sufficiently far away from 0 so that np.exp(-theta ** tau) is "far enough away" from 1, since otherwise the logarithm will be undefined.
These are the constraints which I thus defined. I used the dict notation instead of a NonlinearConstraint object since it seems that this method accepts strict inequalities (np.exp(-x[0] ** x[1]) must be strictly less than 1). Maybe I have misunderstood the documentation on this.
def constraints(x):
    return [1 - np.exp(-(x[0]) ** x[1])]
To maximize the likelihood, I minimize the negative likelihood.
opt = minimize(lambda args: -to_minimize(args, data),
               x0=np.array((1, 1)),
               constraints={'type': 'ineq', 'fun': constraints},
               bounds=np.array([(1e-15, 10), (1e-15, 10)]))
As I take it, the two arguments should then never be chosen in a way such that my code fails. Yet, the algorithm tries to move theta very close to its lower bound and tau very close to its upper bound so that the logarithm becomes undefined.
What makes my code fail?
Both forms of constraints, i.e. NonlinearConstraint and dict constraints don't support strict inequalities. Typically, one therefore uses g(x) >= c + Ɛ to model the strict inequality g(x) > c, where Ɛ is a sufficiently small number.
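As a small illustration of the ε trick, here is a sketch on a toy problem. The objective, the margin EPS = 1e-6 and the starting point are assumptions chosen for the example. The strict constraint x > 0 is modelled as x - EPS >= 0:

```python
import numpy as np
from scipy.optimize import minimize

EPS = 1e-6  # assumed margin; choose it relative to the scale of the constraint

def objective(x):
    # Unconstrained minimum at x = -1, so the constraint is active at the solution.
    return (x[0] + 1.0) ** 2

# Strict inequality x > 0 written as the non-strict inequality x - EPS >= 0.
strict_con = {'type': 'ineq', 'fun': lambda x: x[0] - EPS}

res = minimize(objective, x0=[1.0], constraints=[strict_con], method='SLSQP')
print(res.x)
```

The solver should stop at roughly x = EPS instead of at 0, which keeps expressions such as log(x) defined at the reported solution.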
Note also that it is not guaranteed that each iterate lies inside the feasible region. Internally, most of the methods try to bring it back into the feasible region by simply clipping to the bounds. In cases where this doesn't work, you can try the keep_feasible option of NonlinearConstraint together with the trust-constr method:
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize
def con_fun(x):
    return 1 - np.exp(-(x[0]) ** x[1])
# 1.0e-8 <= con_fun <= np.inf
con = NonlinearConstraint(con_fun, 1.0e-8, np.inf, keep_feasible=True)
x0 = np.array((1., 1.))
bounds = np.array([(1e-5, 10), (1e-5, 10)])
opt = minimize(lambda args: -to_minimize(args, data),
               x0=x0, constraints=(con,),
               bounds=bounds, method="trust-constr")
A distribution is beta-binomial if p, the probability of success in a binomial distribution, itself follows a beta distribution with shape parameters α > 0 and β > 0. The shape parameters determine the distribution of the probability of success.
I want to find the values for α and β that best describe my data from the perspective of a beta-binomial distribution. My dataset players consists of the number of hits (H), the number of at-bats (AB) and the conversion rate (H / AB) for a large number of baseball players. I estimate the PDF with the help of JulienD's answer in Beta Binomial Function in Python:
from scipy.special import beta, comb  # comb was removed from scipy.misc
pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)
Next, I write a loglikelihood function that we will minimize.
def loglike_betabinom(params, *args):
    """
    Negative log likelihood function for betabinomial distribution
    :param params: list for parameters to be fitted.
    :param args: 2-element array containing the sample data.
    :return: negative log-likelihood to be minimized.
    """
    a, b = params[0], params[1]
    k = args[0]  # the conversion rate
    n = args[1]  # the number of at-bats (AB)
    pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)
    return -1 * np.log(pdf).sum()
Now, I want to write a function that minimizes loglike_betabinom
from scipy.optimize import minimize
init_params = [1, 10]
res = minimize(loglike_betabinom, x0=init_params,
               args=(players['H'] / players['AB'], players['AB']),
               bounds=bounds,
               method='L-BFGS-B',
               options={'disp': True, 'maxiter': 250})
print(res.x)
The result is [-6.04544138 2.03984464], which implies that α is negative, which is not possible. I based my script on the following R snippet, which gives [101.359, 287.318].
ll <- function(alpha, beta) {
  x <- career_filtered$H
  total <- career_filtered$AB
  -sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log = TRUE))
}
m <- mle(ll, start = list(alpha = 1, beta = 10),
         method = "L-BFGS-B", lower = c(0.0001, 0.1))
ab <- coef(m)
Can someone tell me what I am doing wrong? Help is much appreciated!!
One thing to pay attention to is that comb(n, k) in your log-likelihood might not be well-behaved numerically for the values of n and k in your dataset. You can verify this by applying comb to your data and see if infs appear.
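A quick way to see the problem is sketched below. The values n = 4000 and k = 2000 are hypothetical stand-ins for large at-bat and hit counts, not values from the dataset. The floating-point binomial coefficient overflows, while its logarithm computed from gammaln stays finite:

```python
import numpy as np
from scipy.special import comb, gammaln

n, k = 4000, 2000  # hypothetical counts, chosen only to trigger overflow

with np.errstate(all="ignore"):
    direct = comb(n, k)  # float result, too large for a double

# log C(n, k) via log-gamma functions avoids forming the huge number.
log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

print(direct, log_binom)
```

Taking np.log of the overflowed value would poison the whole sum, which is one plausible reason the optimizer wanders into invalid parameter values.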
One way to amend things could be to rewrite the negative log-likelihood as suggested in https://stackoverflow.com/a/32355701/4240413, i.e. as a function of logarithms of Gamma functions as in
from scipy.special import gammaln
import numpy as np
def loglike_betabinom(params, *args):
    a, b = params[0], params[1]
    k = args[0]  # the OVERALL conversions
    n = args[1]  # the number of at-bats (AB)
    logpdf = gammaln(n+1) + gammaln(k+a) + gammaln(n-k+b) + gammaln(a+b) - \
        (gammaln(k+1) + gammaln(n-k+1) + gammaln(a) + gammaln(b) + gammaln(n+a+b))
    return -np.sum(logpdf)
You can then minimize the negative log-likelihood with
from scipy.optimize import minimize
init_params = [1, 10]
# note that I am putting 'H' in the args
res = minimize(loglike_betabinom, x0=init_params,
               args=(players['H'], players['AB']),
               method='L-BFGS-B', options={'disp': True, 'maxiter': 250})
print(res)
and that should give reasonable results.
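The gammaln expression can be cross-checked against scipy.stats.betabinom, which is available in recent SciPy versions. The sketch below uses synthetic data (the generating parameters 20 and 60 and the sample size are assumptions, not the baseball dataset) and adds lower bounds in the spirit of the lower argument in the R snippet:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import betabinom

def neg_loglike_betabinom(params, k, n):
    """Negative beta-binomial log-likelihood written with log-gamma terms."""
    a, b = params
    logpmf = (gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
              + gammaln(k + a) + gammaln(n - k + b) - gammaln(n + a + b)
              + gammaln(a + b) - gammaln(a) - gammaln(b))
    return -np.sum(logpmf)

# Synthetic stand-in for (hits, at-bats); not the original players data.
rng = np.random.default_rng(0)
n_obs = rng.integers(100, 1000, size=300)
p_true = rng.beta(20.0, 60.0, size=300)
k_obs = rng.binomial(n_obs, p_true)

# The hand-written formula should agree with SciPy's own log-pmf.
manual = neg_loglike_betabinom([2.0, 5.0], k_obs, n_obs)
reference = -betabinom.logpmf(k_obs, n_obs, 2.0, 5.0).sum()

# Lower bounds keep both shape parameters strictly positive.
res = minimize(neg_loglike_betabinom, x0=[1.0, 10.0], args=(k_obs, n_obs),
               method='L-BFGS-B', bounds=[(1e-4, None), (1e-4, None)])
print(manual, reference, res.x)
```

Passing bounds is what prevents a negative α; without them L-BFGS-B is free to step into the region where the gamma terms are no longer meaningful.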
You could check How to properly fit a beta distribution in python? for inspiration if you want to rework your code further.
Consider a simple problem:
min log(x)
subject to x >= 1e-4
To solve the problem with scipy.optimize.minimize:
import numpy as np
from scipy.optimize import minimize
from math import log
def func(x):
    return log(x[0])

def func_deriv(x):
    return np.array([1 / x[0]])

cons = ({'type': 'ineq',
         'fun' : lambda x: x[0] - 1e-4,
         'jac' : lambda x: np.array([1])})
minimize(func, [1.0], jac=func_deriv, constraints=cons, method='SLSQP')
The script encounters ValueError because log(x) is evaluated with negative x. It seems that the function value is evaluated even if the constraint is not satisfied.
I understand that using bounds in minimize() could avoid the problem, but this is just a simplification of my original problem. In my original problem, the constraint x >= 1e-4 cannot be represented easily as bounds of x, but rather of the form g(x) >= C, so bounds wouldn't help.
If we only care about the function value with x > ε, it is possible to define a safe function extending the domain.
Take the log function as an example. It is possible to extend log below ε with a cubic function, chosen so that both the value and the first derivative match at the bridge point ε:
safe_log(x) = log(x) if x > ε else a * (x - b)**3
To calculate a and b, we have to satisfy:
log(ε) = a * (ε - b)**3
1 / ε = 3 * a * (ε - b)**2
Hence the safe_log function:
from math import log

eps = 1e-3

def safe_log(x):
    if x > eps:
        return log(x)
    # cubic extension a * (x - b)**3 matching value and slope of log at eps
    logeps = log(eps)
    a = 1 / (3 * eps * (3 * logeps * eps)**2)
    b = eps * (1 - 3 * logeps)
    return a * (x - b)**3
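The two matching conditions can be checked numerically. The sketch below repeats the same formulas for a and b and adds a derivative helper, safe_log_slope, which is an illustrative name not present in the answer:

```python
from math import log

EPS = 1e-3

def _cubic_coeffs(eps):
    # a and b solve log(eps) = a*(eps-b)**3 and 1/eps = 3*a*(eps-b)**2
    logeps = log(eps)
    a = 1 / (3 * eps * (3 * logeps * eps) ** 2)
    b = eps * (1 - 3 * logeps)
    return a, b

def safe_log(x, eps=EPS):
    if x > eps:
        return log(x)
    a, b = _cubic_coeffs(eps)
    return a * (x - b) ** 3

def safe_log_slope(x, eps=EPS):
    if x > eps:
        return 1 / x
    a, b = _cubic_coeffs(eps)
    return 3 * a * (x - b) ** 2

# At x = EPS the cubic branch is used; it should reproduce log and its slope.
value_gap = abs(safe_log(EPS) - log(EPS))
slope_gap = abs(safe_log_slope(EPS) - 1 / EPS)
below_zero = safe_log(-1.0)  # finite even where log is undefined
print(value_gap, slope_gap, below_zero)
```

Because the extension keeps decreasing below ε, a minimizer that strays outside the domain still sees a smooth function and a usable gradient instead of a ValueError.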
[Figure: plot of safe_log(x) around ε; image not included.]
What do I have to use to figure out the inverse probability density function for normal distribution? I'm using scipy to find out normal distribution probability density function:
from scipy.stats import norm
norm.pdf(1000, loc=1040, scale=210)
0.0018655737107410499
How can I figure out that 0.0018 probability corresponds to 1000 in the given normal distribution?
There can be no 1:1 mapping from probability density to quantile.
Because the logarithm of the normal PDF is quadratic in x, there can be either 2, 1 or zero quantiles that have a particular probability density.
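The three cases can be seen directly with the numbers from the question. This is a small sketch in which the variable names are illustrative:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1040, 210

# Two points placed symmetrically around the mean share one density value.
left = norm.pdf(1000, loc=mu, scale=sigma)
right = norm.pdf(1080, loc=mu, scale=sigma)

# The density at the mean is the maximum; it is attained exactly once,
# and any larger value is attained nowhere.
peak = norm.pdf(mu, loc=mu, scale=sigma)

print(left, right, peak)
```

So a density of about 0.00187 maps back to both 1000 and 1080, and the inverse is only unique at the peak.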
Update
It's actually not that hard to find the roots analytically. The PDF of a normal distribution is given by:
pd = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
With a bit of rearrangement we get:
(x - mu)**2 = -2 * sigma**2 * log( pd * sigma * sqrt(2 * pi))
If the discriminant on the RHS is < 0, there are no real roots. If it equals zero, there is a single root (where x = mu), and where it is > 0 there are two roots.
To put it all together into a function:
import numpy as np
def get_quantiles(pd, mu, sigma):
    discrim = -2 * sigma**2 * np.log(pd * sigma * np.sqrt(2 * np.pi))
    # no real roots
    if discrim < 0:
        return None
    # one root, where x == mu
    elif discrim == 0:
        return mu
    # two roots
    else:
        return mu - np.sqrt(discrim), mu + np.sqrt(discrim)
This gives the desired quantile(s), to within rounding error:
from scipy.stats import norm
pd = norm.pdf(1000, loc=1040, scale=210)
print(get_quantiles(pd, 1040, 210))
# (1000.0000000000001, 1079.9999999999998)
import scipy.stats as stats
import scipy.optimize as optimize
norm = stats.norm(loc=1040, scale=210)
y = norm.pdf(1000)
print(y)
# 0.00186557371074
print(optimize.fsolve(lambda x:norm.pdf(x)-y, norm.mean()-norm.std()))
# [ 1000.]
print(optimize.fsolve(lambda x:norm.pdf(x)-y, norm.mean()+norm.std()))
# [ 1080.]
There exist distributions whose density attains a given value an infinite number of times. (For example, the simple function with value 1 on an infinite sequence of intervals with lengths 1/2, 1/4, 1/8, etc. attains the value 1 an infinite number of times. And it is a valid density since 1/2 + 1/4 + 1/8 + ... = 1.)
So the use of fsolve above is not guaranteed to find all values of x where pdf(x) equals a certain value, but it may help you find some root.
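For a unimodal density such as the normal, a bracketing solver is one way to make the search more predictable. This is a sketch under the assumption that searching within 10 standard deviations on each side of the mode is enough. It uses scipy.optimize.brentq, which requires a sign change over each bracket:

```python
from scipy.optimize import brentq
from scipy.stats import norm

dist = norm(loc=1040, scale=210)
target = dist.pdf(1000)

def gap(x):
    # Zero exactly where the density equals the target value.
    return dist.pdf(x) - target

mode = dist.mean()     # for the normal distribution the mode equals the mean
far = 10 * dist.std()  # assumed search width on each side of the mode

# gap is positive at the mode and negative far out, so each side brackets a root.
left_root = brentq(gap, mode - far, mode)
right_root = brentq(gap, mode, mode + far)
print(left_root, right_root)
```

This only works because the density rises monotonically to the mode and falls afterwards; for multimodal densities each monotone piece would need its own bracket.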
I'm trying to fit some data from a simulation code I've been running in order to figure out a power law dependence. When I plot a linear fit, the data does not fit very well.
Here's the python script I'm using to fit the data:
#!/usr/bin/env python
from scipy import optimize
import numpy
xdata=[ 0.00010851, 0.00021701, 0.00043403, 0.00086806, 0.00173611, 0.00347222]
ydata=[ 29.56241016, 29.82245508, 25.33930469, 19.97075977, 12.61276074, 7.12695312]
fitfunc = lambda p, x: p[0] + p[1] * x ** (p[2])
errfunc = lambda p, x, y: (y - fitfunc(p, x))
out,success = optimize.leastsq(errfunc, [1,-1,-0.5],args=(xdata, ydata),maxfev=3000)
print("%g + %g*x^%g" % (out[0], out[1], out[2]))
the output I get is:
-71205.3 + 71174.5*x^-9.79038e-05
While on the plot the fit looks about as good as you'd expect from a leastsquares fit, the form of the output bothers me. I was hoping the constant would be close to where you'd expect the zero to be (around 30). And I was expecting to find a power dependence of a larger fraction than 10^-5.
I've tried rescaling my data and playing with the parameters to optimize.leastsq with no luck. Is what I'm trying to accomplish possible or does my data just not allow it? The calculation is expensive, so getting more data points is non-trivial.
Thanks!
It is much better to first take the logarithm of both x and y and then use leastsq to fit a straight line to the result, which will give you a much better fit. There is a great example in the scipy cookbook, which I've adapted below to fit your code.
The best-fit values obtained this way are: amplitude = 0.8955 and index = -0.40943265484
As we can see from the graph (and your data), if it's a power law fit we would not expect the amplitude value to be near 30. The power law equation is f(x) == Amp * x ** index, so with a negative index: f(1) == Amp and f(x) goes to infinity as x goes to 0.
from pylab import *
from scipy import optimize
xdata = array([0.00010851, 0.00021701, 0.00043403, 0.00086806, 0.00173611, 0.00347222])
ydata = array([29.56241016, 29.82245508, 25.33930469, 19.97075977, 12.61276074, 7.12695312])
logx = log10(xdata)
logy = log10(ydata)
# define our (line) fitting function
fitfunc = lambda p, x: p[0] + p[1] * x
errfunc = lambda p, x, y: (y - fitfunc(p, x))
pinit = [1.0, -1.0]
out = optimize.leastsq(errfunc, pinit,
args=(logx, logy), full_output=1)
pfinal = out[0]
covar = out[1]
index = pfinal[1]
amp = 10.0**pfinal[0]
print('amp:', amp, 'index', index)
powerlaw = lambda x, amp, index: amp * (x**index)
##########
# Plotting data
##########
clf()
subplot(2, 1, 1)
plot(xdata, powerlaw(xdata, amp, index)) # Fit
plot(xdata, ydata)#, yerr=yerr, fmt='k.') # Data
text(0.0020, 30, 'Ampli = %5.2f' % amp)
text(0.0020, 25, 'Index = %5.2f' % index)
xlabel('X')
ylabel('Y')
subplot(2, 1, 2)
loglog(xdata, powerlaw(xdata, amp, index))
plot(xdata, ydata)#, yerr=yerr, fmt='k.') # Data
xlabel('X (log scale)')
ylabel('Y (log scale)')
savefig('power_law_fit.png')
show()
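The same straight-line fit in log-log space can be reproduced without plotting. This is a compact sketch that swaps leastsq for np.polyfit, which solves the same linear least-squares problem:

```python
import numpy as np

xdata = np.array([0.00010851, 0.00021701, 0.00043403,
                  0.00086806, 0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469,
                  19.97075977, 12.61276074, 7.12695312])

# Fit log10(y) = intercept + index * log10(x); polyfit returns [slope, intercept].
index, intercept = np.polyfit(np.log10(xdata), np.log10(ydata), 1)
amp = 10.0 ** intercept

print('amp:', amp, 'index:', index)
```

The values should agree with the amplitude and index quoted above to within rounding.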
It helps to rescale xdata so the numbers are not all so small.
You could work in a new variable xprime = 1000*x.
Then fit xprime versus y.
Least squares will find parameters q fitting
y = q[0] + q[1] * (xprime ** q[2])
= q[0] + q[1] * ((1000*x) ** q[2])
So let
p[0] = q[0]
p[1] = q[1] * (1000**q[2])
p[2] = q[2]
Then y = p[0] + p[1] * (x ** p[2])
It also helps to change the initial guess to something closer to your desired result, such as
[max(ydata), -1, -0.5].
from scipy import optimize
import numpy as np
def fitfunc(p, x):
    return p[0] + p[1] * (x ** p[2])

def errfunc(p, x, y):
    return y - fitfunc(p, x)

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])
N = 5000
xprime = xdata * N
qout, success = optimize.leastsq(errfunc, [max(ydata), -1, -0.5],
                                 args=(xprime, ydata), maxfev=3000)
# copy, so that converting back to x does not modify qout in place
out = qout.copy()
out[0] = qout[0]
out[1] = qout[1] * (N**qout[2])
out[2] = qout[2]
print("%g + %g*x^%g" % (out[0], out[1], out[2]))
yields
40.1253 + -282.949*x^0.375555
The standard way to use linear least squares to obtain an exponential (or, as here, power-law) fit is to do what fraxel suggests in his/her answer: fit a straight line to log(y_i).
However, this method has known numerical disadvantages, particularly sensitivity (a small change in the data yields a large change in the estimate). The preferred alternative is to use a nonlinear least squares approach -- it is less sensitive. But if you are satisfied with the linear LS method for non-critical purposes, just use that.
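To compare the two approaches on the data from the question, here is a sketch that uses scipy.optimize.curve_fit for the nonlinear fit of a pure power law amp * x**index, seeded with the log-log estimate. The model without a constant offset and the helper names are assumptions made for the comparison:

```python
import numpy as np
from scipy.optimize import curve_fit

xdata = np.array([0.00010851, 0.00021701, 0.00043403,
                  0.00086806, 0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469,
                  19.97075977, 12.61276074, 7.12695312])

def powerlaw(x, amp, index):
    return amp * x ** index

def sse(params):
    # Sum of squared residuals in the original (not log) space.
    return float(np.sum((ydata - powerlaw(xdata, *params)) ** 2))

# Linear least squares on the log-transformed data gives the starting point.
slope, intercept = np.polyfit(np.log10(xdata), np.log10(ydata), 1)
p_log = (10.0 ** intercept, slope)

# Nonlinear least squares minimises the residuals of y directly.
p_nl, _ = curve_fit(powerlaw, xdata, ydata, p0=p_log)

print('log-log fit:', p_log, 'SSE:', sse(p_log))
print('nonlinear fit:', tuple(p_nl), 'SSE:', sse(p_nl))
```

The two fits weight the points differently: the log-log fit minimises relative errors, while the nonlinear fit minimises absolute errors in y, so the nonlinear residual sum of squares should not be larger than that of the log-log estimate it starts from.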