Consider this simple optimization code:
from symfit import *
x1 = 1
x2 = 4
p1, p2 = parameters('p1, p2')
model = p1*p2
constraints = [
    Eq(x1*p1 + (x2 - x1)*p2, 1),
    Ge(p1, p2),
    Ge(p2, 0)
]
fit = Fit(-model, constraints=constraints)
fit_result = fit.execute()
print(fit_result)
This gives me the following warning:
/home/user/.local/lib/python3.5/site-packages/symfit/core/fit.py:1783: RuntimeWarning: invalid value encountered in double_scalars
return 1 - SS_res/SS_tot
What is causing this?
This happens because symfit currently always tries to calculate the coefficient of determination (R^2) for the fit. In this case there is no data, so both SS_res and SS_tot are zero, which gives a 0/0 division, hence the warning and the nan that shows up in fit_result.
We are adding extra intelligence to FitResults so that R^2 won't be calculated at all in this situation, but in the meantime it is safe to ignore this warning.
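If the warning is noisy, one way to silence it (a small sketch of my own, not part of symfit's API) is to run the fit inside numpy's errstate context manager, which turns off the "invalid value" warning for that block:

import numpy as np

with np.errstate(invalid='ignore'):   # suppress numpy's "invalid value" RuntimeWarning
    fit_result = fit.execute()        # R^2 still comes out as nan, which is fine without data
print(fit_result)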
Hi, I am trying to do linear regression, and this is what happens when I try to run the code.
The code is:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('train.csv')
df = data.dropna()

X = np.array(df.x)
Y = np.array(df.y)

def compute_error(x, y, m, b):
    error = 0
    for i in range(len(x)):
        error += (y[i] - (m*x[i] + b))**2
    return error / float(len(x))

def step_graident_descent(x, y, m, b, alpha):
    N = float(len(x))
    b_graident = 0
    m_graident = 0
    for i in range(0, len(x)):
        x = X[i]
        y = Y[i]
        b_graident += (-2/N) * (y - (m*x + b))
        m_graident += (-2/N) * x*(y - (m*x + b))
    new_m = m - alpha*m_graident
    new_b = b - alpha*b_graident
    return new_m, new_b

def graident_decsent(x, y, m, b, num_itters, alpha):
    for i in range(num_itters):
        m, b = step_graident_descent(x, y, m, b, alpha)
    return m, b

def run():
    b = 0
    m = 0
    numberOfIttertions = 1000
    m, b = graident_decsent(X, Y, m, b, numberOfIttertions, 0.001)
    print(m, b)

if __name__ == '__main__':
    run()
and the error that I get is:
linearRegression.py:22: RuntimeWarning: overflow encountered in double_scalars
m_graident += (-2/N) * x*(y-(m*x+b))
linearRegression.py:21: RuntimeWarning: invalid value encountered in double_scalars
b_graident +=(-2/N) * (y-(m*x+b))
linearRegression.py:22: RuntimeWarning: invalid value encountered in double_scalars
m_graident += (-2/N) * x*(y-(m*x+b))
If anyone can help me I would be so grateful, since I have been stuck on this for about two months. Thank you.
Edit: tl;dr Solution
OK, so here is the minimal reproducible example that I was talking about. I replace your X, Y with the following:
n = 10**2
X = np.linspace(0,10**6,n)
Y = 1.5*X+0.2*10**6*np.random.normal(size=n)
If I then run
b=0
m=0
numberOfIttertions = 1000
m, b = graident_decsent(X, Y, m, b, numberOfIttertions, 0.001)
I get exactly the problem you describe. The only surprising thing is how easy the solution is: I just replace your alpha of 0.001 with 10**-14 and everything works fine.
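For reference, here is a compact, vectorized restatement of the same gradient descent (my own rewrite of the loop above), together with the synthetic data and the smaller learning rate:

import numpy as np

np.random.seed(0)
n = 10**2
X = np.linspace(0, 10**6, n)
Y = 1.5*X + 0.2*10**6*np.random.normal(size=n)

def step(m, b, alpha):
    # vectorized form of the per-sample loop in the question
    err = Y - (m*X + b)
    m_grad = (-2.0/n) * np.sum(X*err)
    b_grad = (-2.0/n) * np.sum(err)
    return m - alpha*m_grad, b - alpha*b_grad

m = b = 0.0
for _ in range(1000):
    m, b = step(m, b, alpha=1e-14)   # alpha=0.001 overflows here; 1e-14 stays stable
print(m, b)                          # the slope ends up close to the true 1.5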
Why and how to give a Minimal, Reproducible Example
Your example is not reproducible since we don't have train.csv. Generally, both for understanding the problem yourself and for getting concrete answers, it is very helpful to have a very small example that people can run and tinker with. E.g. maybe you can think of a much shorter input to your regression that also produces this error.
The first RuntimeWarning
But now to your question. Your first RuntimeWarning i.e.
linearRegression.py:22: RuntimeWarning: overflow encountered in double_scalars
m_graident += (-2/N) * x*(y-(m*x+b))
means that x, and hence m_graident, are of type numpy.double (i.e. numpy.float64). This datatype can store numbers in the range (-1.79769313486e+308, 1.79769313486e+308). Going outside that range is called an overflow. E.g.
np.double(1.79769313486e+308)
is still OK, but if you multiply it by, say, 1.1 you get exactly this runtime warning. Notice that this is 'just' a warning and the code keeps running, but since the result is too big to be represented it gives you inf instead of a number.
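A two-line demonstration of this (my own snippet; the exact warning wording varies between numpy versions):

import numpy as np

big = np.double(1.79769313486e+308)
print(big)        # still fine: just below the largest representable float64
print(big * 1.1)  # RuntimeWarning: overflow encountered ... and the result is inf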
The other RuntimeWarnings
Ok but what does
linearRegression.py:21: RuntimeWarning: invalid value encountered in double_scalars
b_graident +=(-2/N) * (y-(m*x+b))
mean?
It comes from calculating with the infinity that I just mentioned. Some calculations with infinity are valid.
np.inf-10**6 -> inf
np.inf+10**6 -> inf
np.inf/10**6 -> inf
np.inf*10**6 -> inf
np.inf*(-10**6) -> -inf
1/np.inf -> 0
np.inf *np.inf -> inf
but some are not, and give nan, i.e. "not a number":
np.inf/np.inf
np.inf-np.inf
These are called indeterminate forms in mathematics, since what comes out depends on how you arrived at the infinities. E.g.
(np.double(1e+309)+np.double(1e+309))-np.double(1e+309)
np.double(1e+309)-(np.double(1e+309)+np.double(1e+309))
are both inf-inf but you would expect different results.
Getting a nan is unfortunate, since calculations with nan always yield nan, and your gradients become useless once a nan gets added to them.
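A tiny demonstration of that propagation (my own snippet):

import numpy as np

g = np.float64(np.inf) - np.float64(np.inf)   # indeterminate form -> "invalid value" warning
print(g)                                      # nan
print(g*0.0 + 123.0)                          # still nan: anything touched by nan stays nan
print(np.isnan(g))                            # True; nan != nan, so use np.isnan to test for it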
Other resources
Another option is to use an existing implementation of linear regression, e.g. from scikit-learn. See:
scikit-learn linear regression reference
scikit-learn user guide on linear models
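As a minimal sketch (my addition), fitting the synthetic data from above with scikit-learn needs no learning rate at all:

import numpy as np
from sklearn.linear_model import LinearRegression

n = 10**2
X = np.linspace(0, 10**6, n)
Y = 1.5*X + 0.2*10**6*np.random.normal(size=n)

reg = LinearRegression().fit(X.reshape(-1, 1), Y)   # scikit-learn expects a 2-D feature array
print(reg.coef_[0], reg.intercept_)                 # slope comes out close to the true 1.5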
I am trying to minimize variance across a portfolio of 100 securities.
def portvol(w, x):
    return np.dot(w.T, np.dot(x, w))*252

covmat = annreturn.cov()
w0 = np.ones(len(covmat)) * (1 / len(covmat))  # equal weighting initially
bounds = ((0,1),) * len(covmat)
constraints = {'fun': lambda i: np.sum(i)-1.0, 'type': 'eq'}
optweights = minimize(portvol, w0, args = (covmat), method = 'SLSQP', bounds = bounds,
                      constraints = constraints)
annreturn.cov() is a 100x100 DataFrame. The output is the same 0.01 equal weightings I started with, plus this failure message:
message: 'Inequality constraints incompatible'
nfev: 102
nit: 1
njev: 1
status: 4
success: False
This is how I calculated annualized returns...
annreturn = data.pct_change() #again, assuming percentage change
annreturn = annreturn.iloc[1:]
annreturn = (annreturn+1)**252-1
If you don't notice anything off the bat, it's OK; it took me 2 days to realize I hadn't divided my pct_change() result by 100. Time well spent. I was getting covariance values on the order of 10^15 and up. Here is what the last line should have looked like; with it, the minimize call from the original question works fine.
annreturn = (annreturn/100+1)**252-1
Sorry if anyone took time on this without the above piece!
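For anyone who wants something runnable, here is a self-contained sketch (my addition, with synthetic return data standing in for the real DataFrame) of the same minimum-variance setup; the names portvol, covmat and w0 follow the question:

import numpy as np
import pandas as pd
from scipy.optimize import minimize

np.random.seed(0)
annreturn = pd.DataFrame(0.01*np.random.randn(500, 10))   # fake daily returns for 10 securities

def portvol(w, x):
    return np.dot(w.T, np.dot(x, w)) * 252

covmat = annreturn.cov()
w0 = np.ones(len(covmat)) / len(covmat)          # equal weighting initially
bounds = ((0, 1),) * len(covmat)
constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}

optweights = minimize(portvol, w0, args=(covmat,), method='SLSQP',
                      bounds=bounds, constraints=constraints)
print(optweights.success)        # True once the covariance entries are reasonably scaled
print(optweights.x.round(3))     # weights sum to 1 and respect the bounds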
I have a reasonably simple constrained optimization problem but get different answers depending on how I do it. Let's get the import and a pretty print function out of the way first:
import numpy as np
from scipy.optimize import minimize, LinearConstraint, NonlinearConstraint, SR1
def print_res( res, label ):
    print("\n\n ***** ", label, " ***** \n")
    print(res.message)
    print("obj func value at solution", obj_func(res.x))
    print("starting values: ", x0)
    print("ending values: ", res.x.astype(int) )
    print("% diff", (100.*(res.x-x0)/x0).astype(int) )
    print("target achieved?",target,res.x.sum())
The sample data is very simple:
n = 5
x0 = np.arange(1,6) * 10_000
target = x0.sum() + 5_000   # increase sum from 150,000 to 155,000
Here's the constrained optimization (including jacobians). In words, the objective function I want to minimize is just the sum of squared percentage changes from the initial values to final values. The linear equality constraint is simply requiring x.sum() to equal a constant.
def obj_func(x):
    return ( ( ( x - x0 ) / x0 ) ** 2 ).sum()

def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2

def constr_func(x):
    return x.sum() - target

def constr_jac(x):
    return np.ones(n)
And for comparison, I've re-factored as an unconstrained minimization by using the equality constraint to replace x[0] with a function of x[1:]. Note that the unconstrained function is passed x0[1:] whereas the constrained function is passed x0.
def unconstr_func(x):
    x_one = target - x.sum()
    first_term = ( ( x_one - x0[0] ) / x0[0] ) ** 2
    second_term = ( ( ( x - x0[1:] ) / x0[1:] ) ** 2 ).sum()
    return first_term + second_term
I then try to minimize in three ways:
Unconstrained with 'Nelder-Mead'
Constrained with 'trust-constr' (w/ & w/o jacobian)
Constrained with 'SLSQP' (w/ & w/o jacobian)
Code:
##### (1) unconstrained
res0 = minimize( unconstr_func, x0[1:], method='Nelder-Mead') # OK, but weird note
res0.x = np.hstack( [target - res0.x.sum(), res0.x] )
print_res( res0, 'unconstrained' )
##### (2a) constrained -- trust-constr w/ jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac )
resTCjac = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTCjac, 'trust-const w/ jacobian' )

##### (2b) constrained -- trust-constr w/o jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0. )
resTC = minimize( obj_func, x0, method='trust-constr',
                  jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTC, 'trust-const w/o jacobian' )

##### (3a) constrained -- SLSQP w/ jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func, 'jac' : constr_jac }
resSQjac = minimize( obj_func, x0, method='SLSQP',
                     jac = obj_jac, constraints = eq_cons )
print_res( resSQjac, 'SLSQP w/ jacobian' )

##### (3b) constrained -- SLSQP w/o jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func }
resSQ = minimize( obj_func, x0, method='SLSQP',
                  jac = obj_jac, constraints = eq_cons )
print_res( resSQ, 'SLSQP w/o jacobian' )
Here is some simplified output (and of course you can run the code to get the full output):
starting values: [10000 20000 30000 40000 50000]
***** (1) unconstrained *****
Optimization terminated successfully.
obj func value at solution 0.0045454545454545305
ending values: [10090 20363 30818 41454 52272]
***** (2a) trust-const w/ jacobian *****
The maximum number of function evaluations is exceeded.
obj func value at solution 0.014635854609684874
ending values: [10999 21000 31000 41000 51000]
***** (2b) trust-const w/o jacobian *****
`gtol` termination condition is satisfied.
obj func value at solution 0.0045454545462939935
ending values: [10090 20363 30818 41454 52272]
***** (3a) SLSQP w/ jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
***** (3b) SLSQP w/o jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
Notes:
(1) & (2b) are plausible solutions in that they achieve significantly lower objective function values and intuitively we'd expect the variables with larger starting values to move more (both absolutely and in percentage terms) than the smaller ones.
Adding the jacobian to 'trust-constr' causes it to get the wrong answer (or at least a worse answer) and also to exceed the maximum number of iterations. Maybe the jacobian is wrong, but the function is so simple that I'm pretty sure it's correct (?)
'SLSQP' doesn't seem to work w/ or w/o the jacobian supplied, but works very fast and claims to terminate successfully. This seems very worrisome in that getting the wrong answer and claiming to have terminated successfully is pretty much the worst possible outcome.
Initially I used very small starting values and targets (just 1/1,000 of what I have above) and in that case all 5 approaches above work fine and give the same answers. My sample data is still extremely small, and it seems kinda bizarre for it to handle 1,2,..,5 but not 1000,2000,..5000.
FWIW, note that the 3 incorrect results all hit the target by adding 1,000 to each initial value -- this satisfies the constraint but comes nowhere near minimizing the objective function (b/c variables with higher initial values should be increased more than lower ones to minimize the sum of squared percentage differences).
So my question is really just what is happening here and why do only (1) and (2b) seem to work?
More generally, I'd like to find a good python-based approach to this and similar optimization problems and will consider answers using other packages besides scipy although the best answer would ideally also address what is going on with scipy here (e.g. is this user error or a bug I should post to github?).
Here is how this problem could be solved using nlopt, a library for nonlinear optimization which I've been pretty impressed with.
First, the objective function and gradient are both defined using the same function:
import numpy as np
import nlopt

# x0, target and n are assumed to be defined as in the question
def obj_func(x, grad):
    if grad.size > 0:
        grad[:] = obj_jac(x)
    return ( ( x/x0 - 1 ) ** 2 ).sum()

def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2

def constr_func(x, grad):
    if grad.size > 0:
        grad[:] = constr_jac(x)
    return x.sum() - target

def constr_jac(x):
    return np.ones(n)
Then, to run the minimization using Nelder-Mead and SLSQP:
opt = nlopt.opt(nlopt.LN_NELDERMEAD, len(x0)-1)
opt.set_min_objective(lambda x, grad: unconstr_func(x))  # nlopt objectives take (x, grad); wrap the question's one-argument unconstr_func
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0[1:].copy())
xopt = np.hstack([target - xopt.sum(), xopt])
fval = opt.last_optimum_value()
print_res(xopt,fval,"Nelder-Mead");
opt = nlopt.opt(nlopt.LD_SLSQP,len(x0))
opt.set_min_objective(obj_func)
opt.add_equality_constraint(constr_func)
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0.copy())
fval = opt.last_optimum_value()
print_res(xopt,fval,"SLSQP w/ jacobian");
And here are the results:
***** Nelder-Mead *****
obj func value at solution 0.00454545454546
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
***** SLSQP w/ jacobian *****
obj func value at solution 0.00454545454545
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
When testing this out, I think I discovered what the issue with the original attempt was. If I set the absolute tolerance on the function value to 1e-8, which is what the scipy functions default to, I get:
***** Nelder-Mead *****
obj func value at solution 0.0045454580693
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30816 41454 52274]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
***** SLSQP w/ jacobian *****
obj func value at solution 0.0146361108503
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10999 21000 31000 41000 51000]
% diff [9 5 3 2 2]
target achieved? 155000.0 155000.0
which is exactly what you were seeing. So my guess is that during SLSQP the minimizer ends up somewhere in the objective landscape where the next step would improve the function value by less than 1e-8, so it stops there.
This is a partial answer to the question that I'm putting here to keep the question from getting even bigger, but I'd still love to see a more comprehensive and explanatory answer. These answers are based on comments from two others, but neither of them fully wrote out the code, and I thought it would make sense to make that explicit so here it is:
Fixing 2a (trust-constr with jacobian)
It seems that the key here with regard to the Jacobian and Hessian is to specify neither or both (but not the Jacobian only). @SubhaneilLahiri commented to this effect, and there was also a warning message to this effect that I initially failed to notice:
UserWarning: delta_grad == 0.0. Check if the approximated function is linear. If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations.
So I fixed it by defining the hessian function:
def constr_hess(x, v):
    return np.zeros([n, n])
and adding it to the constraint
nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac, constr_hess )
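For completeness, the minimize call itself stays exactly as in the question, just using this updated constraint object (sketch):

resTCjac = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTCjac, 'trust-constr w/ jacobian and hessian' )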
Fixing 3a & 3b (SLSQP)
This just seemed to be a matter of making the tolerance smaller, as suggested by @user545424. So I just added options={'ftol':1e-15} to the minimization:
resSQjac = minimize( obj_func, x0, method='SLSQP',
                     options={'ftol':1e-15},
                     jac = obj_jac, constraints = eq_cons )
I am trying to solve a really simple problem programmatically.
I have a single variable t that must satisfy 2 equations simultaneously as follows:
x_v*t = (x_1 - x_2)
y_v*t = (y_1 - y_2)
My first reaction is to just solve it by dividing the right-hand side by the coefficient on the left; however, that coefficient is not guaranteed to be nonzero.
Thus we can always use the RREF algorithm and represent the system as:
a | b
c | d
where a = x_v, b = (x_1 - x_2), c = y_v, d = (y_1 - y_2)
After finding the RREF we could have:
The 0 matrix (system is solvable)
First row has a leading one and the second row is all 0's (system is solvable)
Either row has a leading 0 and a nonzero trailing number (system is not solvable)
Although I could try to code the above myself, I wanted to use a library instead where I can just set up the system and ask an API whether a solution exists or not, so I used numpy.
Currently, however, I can't even set up a system where the coefficient (non-augmented) matrix is not square.
Is this achievable?
This is achievable. You can use the function fsolve of the scipy library. An example:
import numpy as np
import scipy.optimize as so

def f(t, x_v, x_1, x_2, y_v, y_1, y_2):
    return np.sum(np.abs([
        x_v*t - (x_1 - x_2),
        y_v*t - (y_1 - y_2),
    ]))
and then you would do
sol_object = so.fsolve(
    func = f,                   # the function that returns the (scalar) 0 you want
    x0 = 1,                     # the starting estimate
    args = (1, 2, 3, 1, 2, 3),  # other arguments of f, i.e. x_v, x_1, x_2, y_v, y_1, y_2
    full_output = True
)
sol = sol_object[0]
message = sol_object[-1]
print(sol)
print(message)
Output
[-1.]
The solution converged.
As mentioned in a comment by jdhesa, this could also have been done using linear-in-parameters solving methods. The approach I use above works a priori with any kind of transformation.
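As a sketch of that linear route (my addition, not part of the original answer): stack the coefficients into a 2x1 matrix and let np.linalg.lstsq handle the non-square system; checking the residual then tells you whether an exact solution exists.

import numpy as np

x_v, y_v = 1.0, 1.0
x_1, x_2, y_1, y_2 = 2.0, 3.0, 2.0, 3.0

A = np.array([[x_v], [y_v]])            # 2x1 coefficient matrix
b = np.array([x_1 - x_2, y_1 - y_2])    # right-hand side
t, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(t)                       # [-1.]
print(np.allclose(A @ t, b))   # True -> the system is consistent (an exact solution exists)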
An alternative is to just perform the division.
If both "sides" of an equation are zero, the result will be NaN (0/0).
If the coefficient (e.g. x_v) is zero but the rhs, i.e. (x_1 - x_2), is nonzero, the result will be inf.
# c1 is np.array([x_1, y_1, z_1, ...])
# c2 is np.array([x_2, y_2, z_2, ...])
# v  is np.array([x_v, y_v, z_v, ...]), the coefficient vector
c = c1 - c2

# Use this to suppress numpy warnings
with np.warnings.catch_warnings():
    np.warnings.filterwarnings('ignore', 'invalid value encountered in true_divide')
    np.warnings.filterwarnings('ignore', 'divide by zero encountered in true_divide')
    t = c / v

non_nan_t = t[~np.isnan(t)]

if np.isinf(t).any():
    print('unsolvable because rhs is nonzero but v is zero')
elif not np.allclose(non_nan_t, non_nan_t[0]):
    print('no solution because equations disagree')
else:
    print('solution:', non_nan_t[0])
I'm trying to solve a system of ordinary differential equations with Euler's method, but when I try to print the velocity I get
RuntimeWarning: overflow encountered in double_scalars
and instead of printing numbers I get nan (not a number). I think the problem might be in how I defined the acceleration, but I don't know for sure. I would really appreciate it if someone could help me.
from numpy import *
from math import pi, exp

d = 0.0005*10**-6
a = []
while d < 1.0*10**-6:
    d = d*2
    a.append(d)
D = array(a)

def a_particula(D, x, v, t):
    cc = ((1.00+((2.00*lam)/D))*(1.257+(0.400*exp((-1.10*D)/(2.00*lam)))))
    return (g-((densaire*g)/densparticula)-((mu*18.0*v)/(cc*densparticula*(D**2.00))))

def euler(acel, D, x, v, tv, n=15):
    nv, xv, vv = tv.size, zeros_like(tv), zeros_like(tv)
    xv[0], vv[0] = x, v
    for k in range(1, nv):
        t, Dt = tv[k-1], (tv[k]-tv[k-1])/float(n)
        for i in range(n):
            a = acel(D, x, v, t)
            t, v = t+Dt, v+a*Dt
            x = x+v*Dt
        xv[k], vv[k] = x, v
    return (xv, vv)

g = 9.80
densaire = 1.225
lam = 0.70*10**-6
densparticula = 1000.00
mu = 1.785*10**-5

tv = linspace(0, 5, 50)
x, v = 0, 0  # initial conditions

for j in range(len(D)):
    xx, vv = euler(a_particula, D[j], x, v, tv)
    print(D[j], xx, vv)
In the future it would be helpful if you included the full warning message in your question - it contains the line where the problem occurs:
tmp/untitled.py:15: RuntimeWarning: overflow encountered in double_scalars
return (g-((densaire*g)/densparticula)-((mu*18.0*v)/(cc*densparticula* (D**2.00))))
Overflow occurs when the magnitude of a variable exceeds the largest value that can be represented. In this case, double_scalars refers to a 64 bit float, which has a maximum value of:
print(np.finfo(float).max)
# 1.79769313486e+308
So there is a scalar value in the expression:
(g-((densaire*g)/densparticula)-((mu*18.0*v)/(cc*densparticula* (D**2.00))))
that is exceeding ~1.79e308. To find out which one, you can use np.errstate to raise a FloatingPointError when this occurs, then catch it and start the Python debugger:
...
with errstate(over='raise'):
    try:
        ret = (g-((densaire*g)/densparticula)-((mu*18.0*v)/(cc*densparticula*(D**2.00))))
    except FloatingPointError:
        import pdb
        pdb.set_trace()
return ret
...
From within the debugger you can then check the values of the various parts of this expression. The overflow seems to occur in:
(mu*18.0*v)/(cc*densparticula* (D**2.00))
The first time the warning occurs, (cc*densparticula*(D**2.00)) evaluates to 2.3210168586496022e-12, whereas (mu*18.0*v) evaluates to -9.9984582297025182e+299.
Basically you are dividing a very large number by a very small number, and the magnitude of the result is exceeding the maximum value that can be represented. This might be a problem with your math, or it might be that your inputs to the function are not reasonably scaled.
Your system reduces to
dv/dt = a = K - L*v
with K about 10 and L ranging, at first glance, from roughly 1e+5 to 1e+10. The actual coefficients used confirm that:
D=1.0000e-09 K= 9.787995 L=1.3843070e+08
D=3.2000e-08 K= 9.787995 L=4.2570244e+06
D=1.0240e-06 K= 9.787995 L=9.0146813e+04
The Euler step for the velocity is
v[j+1] = v[j] + (K - L*v[j])*dt = (1 - L*dt)*v[j] + K*dt
For anything resembling the intended friction effect, i.e., the velocity falling towards K/L, one needs abs(1-L*dt) < 1, and if possible 0 < 1-L*dt < 1, that is, dt < 1/L. Here that means dt < 1e-10.
To be able to use larger time steps you need to use methods for stiff differential equations, which means implicit methods. The most simple ones are the implicit Euler method, the midpoint method and the trapezoidal method.
Because of the linearity, the midpoint and trapezoidal method amount to the same formula
v[j+1] = v[j] + dt * ( K - L*(v[j]+v[j+1])/2 )
or
v[j+1] = ( (1-L*dt/2)*v[j] + K*dt ) / (1+L*dt/2)
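To make the difference concrete, here is a small sketch (my addition, with illustrative values of K and L in the range above) comparing the explicit Euler step with the implicit Euler and trapezoidal updates at a step size far larger than 1/L:

import numpy as np

K, L = 9.788, 9.0e4          # illustrative stiff coefficients (roughly the smallest L from the table)
dt = 0.1                     # deliberately much larger than 1/L ~ 1e-5

v_exp = np.float64(0.0)      # explicit Euler
v_imp = 0.0                  # implicit Euler
v_trap = 0.0                 # trapezoidal, formula from above

for _ in range(100):
    v_exp = (1 - L*dt)*v_exp + K*dt                       # |1 - L*dt| >> 1: blows up, overflow warning
    v_imp = (v_imp + K*dt) / (1 + L*dt)                   # settles at K/L almost immediately
    v_trap = ((1 - L*dt/2)*v_trap + K*dt) / (1 + L*dt/2)  # stays bounded at the scale of K/L

print("explicit Euler:", v_exp)      # inf or -inf
print("implicit Euler:", v_imp)      # ~= K/L
print("trapezoidal:   ", v_trap)     # bounded, oscillating around K/L at this extreme step size
print("K/L =          ", K/L)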
Of course, the most simple method is to just exactly integrate the ODE
(-L*v')/(K-L*v)=-L => K-L*v(t)=C*exp(-L*t), C=K-L*v(0)
v(t)=K/L + exp(-L*t)*(v(0)-K/L)
which integrates to
x(t)=x(0)+K/L*t+(1-exp(-L*t))/L*(v(0)-K/L).
Alternatively it is possible that you made an error in transcribing the physical laws into formulas so that the magnitudes of the constants are all wrong.