How to use Lambda functions in SciPy Orthogonal Distance Regression (ODR)? - python

I am trying to fit different curves to the same data using SciPy's ODR. To make this easier, I have defined each candidate function as a lambda function, which then gets passed to ODR through a function of my own, odr_fit():
if trend_name == 'linear':
    eps = odr_fit(f=lambda x, p: p[0]*x+p[1], xdata=x, ydata=y)
elif trend_name == 'quadratic':
    eps = odr_fit(f=lambda x, p: p[0]*x**2+p[1]*x+p[2], xdata=x, ydata=y)
# and so on....
The odr_fit() function is defined as follows:
import numpy as np
from scipy.odr import Model, RealData, ODR

def odr_fit(f, xdata, ydata):
    """
    Function to calculate orthogonal residuals given data and a function to fit
    :param f: function in the format f(x, p), where p is a list of parameters
    :param xdata: Pandas Series with independent variable
    :param ydata: Pandas Series of same length with dependent variable
    :return: eps: estimated orthogonal errors, same length as xdata and ydata
    """
    f = np.vectorize(f, excluded=['p'])
    model = Model(f)
    data = RealData(xdata, ydata)
    odr = ODR(data, model, beta0=[10]*3)  # beta0 is an initial estimate. Should be the same length as p ideally as well
    odr_result = odr.run()
    return odr_result.eps
My problem: somewhere in this code, p (the parameters, the second argument of my lambda function) ends up being a scalar, while I want it to be treated as a list or tuple:
IndexError: invalid index to scalar variable.
ODR also expects a list/tuple (see the scipy.odr documentation). Of course, I could just define each function separately, but this would be cumbersome and harder to extend. Is there a way of making ODR work with lambda functions? Perhaps something that can force p to be a vector of a certain length?
Hope someone knows a solution to this!
Thanks,
Alex
EDIT: Here is the error in full:
ERROR - <class 'IndexError'>: invalid index to scalar variable.
<traceback object at 0x000001F4669271C8>
Traceback (most recent call last):
File "C:\Users\...\features.py", line 296, in add_features
data = _add_trend_features(data, col_trends={'x':'TemperatureOutsideAvgNight', 'y':'TemperatureInsideAvgNight'})
File "C:\Users\...\trend_features.py", line 109, in _add_trend_features
return_orth_dist=True)
File "C:\Users\...\trend_features.py", line 88, in _add_trend_features
eps = odr_fit(f=lambda x, p: p[0]*x+p[1], xdata=x, ydata=y)
File "C:\Users\...\Documents\phil_anomaly_detection\phil_anomaly_detection\trend_features.py", line 23, in odr_fit
odr = ODR(data, model, beta0=[10]*3)
File "C:\Users\...\miniconda3\lib\site-packages\scipy\odr\odrpack.py", line 770, in __init__
self._check()
File "C:\Users\...\miniconda3\lib\site-packages\scipy\odr\odrpack.py", line 831, in _check
res = self.model.fcn(*arglist)
File "C:\Users\...\miniconda3\lib\site-packages\numpy\lib\function_base.py", line 2091, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\Users\...\miniconda3\lib\site-packages\numpy\lib\function_base.py", line 2161, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "C:\Users\...\miniconda3\lib\site-packages\numpy\lib\function_base.py", line 2121, in _get_ufunc_and_otypes
outputs = func(*inputs)
File "C:\Users\...\miniconda3\lib\site-packages\numpy\lib\function_base.py", line 2086, in func
return self.pyfunc(*the_args, **kwargs)
File "C:\Users\...\trend_features.py", line 88, in <lambda>
eps = odr_fit(f=lambda x, p: p[0]*x+p[1], xdata=x, ydata=y)
IndexError: invalid index to scalar variable.
Process finished with exit code 1
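A minimal sketch of one way around this, assuming the documented scipy.odr convention that the model function is called as fcn(beta, x), with the parameters first. That is the opposite order to the lambdas above. np.vectorize is also dropped: it passes both arguments element by element, and excluded=['p'] only applies when p is passed as a keyword argument, so p arrives as a scalar. The n_params argument and the trends dict are hypothetical additions so that beta0 has exactly one entry per parameter (beta0=[10]*3 gives the linear model one parameter too many).

```python
import numpy as np
from scipy.odr import Model, RealData, ODR


def odr_fit(f, xdata, ydata, n_params):
    """Fit f(x, p) by orthogonal distance regression and return eps.

    f must work on whole arrays, e.g. lambda x, p: p[0]*x + p[1].
    n_params (hypothetical extra argument) is the number of entries in p.
    """
    # scipy.odr calls the model as fcn(beta, x): swap the order to match f(x, p).
    model = Model(lambda beta, x: f(x, beta))
    data = RealData(np.asarray(xdata, dtype=float), np.asarray(ydata, dtype=float))
    # One starting value per parameter; 1.0 is an arbitrary initial guess.
    odr = ODR(data, model, beta0=np.ones(n_params))
    return odr.run().eps


# Hypothetical registry: each trend maps to (model function, number of parameters).
trends = {
    'linear': (lambda x, p: p[0]*x + p[1], 2),
    'quadratic': (lambda x, p: p[0]*x**2 + p[1]*x + p[2], 3),
}

x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0
f, n = trends['linear']
eps = odr_fit(f, x, y, n)
print(eps.shape)
```

Keeping the parameter count next to each lambda means adding a new trend is still a one-line change.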

Related

Error in Fitting a curve using curve_fit in python

I'm trying to fit the following function to some data using SciPy's curve_fit function:
def sinugauss(x, A, B, C):
    exponente = A*(np.sin(x-B))**2
    return np.array([C/(np.exp(exponente))])
I have a data set of 33 points but I keep getting this error:
Traceback (most recent call last):
File "D:Es_periodico_o_no.py", line 35, in <module>
res, cov = curve_fit(sinugauss,datos['x'],datos['y'])
File "D:\lib\site-packages\scipy\optimize\minpack.py", line 789, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "D:\lib\site-packages\scipy\optimize\minpack.py", line 414, in leastsq
raise TypeError(f"Improper input: func input vector length N={n} must"\
TypeError: Improper input: func input vector length N=3 must not exceed func output vector length M=1
This is the full code:
def sinugauss(x, Ventas, Inicio, Desv):
    exponente = Desv*(np.sin(x-Inicio))**2
    return np.array([Ventas/(np.exp(exponente))])

for index, row in real_df.iterrows():
    datos_y = np.array([row]).transpose()
    datos_x = np.array([range(len(datos_y))]).transpose()
    datos = pd.DataFrame(np.column_stack([datos_x,datos_y]),columns=['x','y'])
    res, cov = curve_fit(sinugauss,datos['x'],datos['y'])
    print(res)
    print(cov)
The error is raised on the very first iteration. All rows have 33 non-NaN points, although some of them may be zeros.
Thank you
In the function sinugauss, change the return statement to:
return C/np.exp(exponente)
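When you write np.array([C/(np.exp(exponente))]), you are wrapping the expression C/np.exp(exponente), which is an array with shape (33,) for your 33 points, in a 2-d array with shape (1, 33). That is not the shape curve_fit expects: it needs a 1-d array with one value per data point. Because the first dimension of the wrapped array is 1, the error reports an output length of M=1.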
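A small sketch of the shape difference and of the corrected model fitted with curve_fit. The data are synthetic (generated from the model itself with assumed parameters A=1, B=0.5, C=2), and p0 is an assumed starting guess close to those values, since a periodic model like this one can have several local minima.

```python
import numpy as np
from scipy.optimize import curve_fit


def sinugauss_wrapped(x, A, B, C):
    # Original version: the extra brackets add a leading axis of length 1.
    exponente = A * np.sin(x - B) ** 2
    return np.array([C / np.exp(exponente)])


def sinugauss(x, A, B, C):
    # Fixed version: one model value per x value.
    exponente = A * np.sin(x - B) ** 2
    return C / np.exp(exponente)


x = np.arange(33, dtype=float)  # 33 points, as in the question
wrapped_shape = sinugauss_wrapped(x, 1.0, 0.5, 2.0).shape  # (1, 33)
fixed_shape = sinugauss(x, 1.0, 0.5, 2.0).shape            # (33,)

# Synthetic, noise-free data (assumed parameters, for illustration only).
y = sinugauss(x, 1.0, 0.5, 2.0)
popt, pcov = curve_fit(sinugauss, x, y, p0=[0.9, 0.45, 1.9])
print(wrapped_shape, fixed_shape, popt)
```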

scipy.optimize.shgo ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I am trying to fit a function y(x,T,p) to get the coefficients a,b,c,d,e,f. The data for y, x, T and p are known. I want to use a global optimizer to find a good starting point, and shgo seems to be the only one that accepts constraints.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import shgo
# test data
x = np.array([0.1,0.2,0.3,1])
T = np.array([300,300,300,300])
p = np.array([67.2,67.2,67.2,67.2])
y = np.array([30,50,55,67.2])
# function
def func(pars,x,T,p):
    a,b,c,d,e,f = pars
    return x*p+x*(1-x)*(a+b*T+c*T**2+d*x+e*x*T+f*x*T**2)*p
# residual
def resid(pars):
    return ((func(pars,x,T,p) - y) ** 2).sum()
# constraint: derivative is positive in every data point
def der(pars):
    a,b,c,d,e,f = pars
    return -p*((3*f*T**2+3*e*T+3*d)*x**2+((2*c-2*f)*T**2+(2*b-2*e)*T-2*d+2*a)*x-c*T**2-b*T-a-1)
con1 = ({'type':'ineq', 'fun':der})
# minimizer shgo
bounds = [(-1,1),(-1,1),(-1,1),(-1,1),(-1,1),(-1,1)]
res = shgo(resid, bounds, constraints=con1)
print("a = %f, b = %f, c = %f, d = %f, e = %f, f = %f" % tuple(res.x))
# plotting
x0 = np.linspace(0, 1, 100)
fig, ax = plt.subplots()
fig.dpi = 80
ax.plot(x,y,'ro',label='data')
for i,txt in enumerate(T):
    ax.annotate(txt,(x[i],y[i]))
ax.plot(x0, func(res.x, x0, 300,67.2), '-', label='fit1')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
With this I am getting ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I do not know what this error means, and other threads about the same error do not really help me understand it. When I use a local minimizer (scipy.optimize.minimize with the method cobyla), the error does not appear.
Can someone help me to understand my problem or even help to fix it?
Thanks
EDIT:
Traceback (most recent call last):
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo_lib\triangulation.py", line 759, in __getitem__
return self.cache[x]
KeyError: (0, 0, 0, 0, 0, 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/.../test.py", line 70, in <module>
res = shgo(resid, bounds, constraints=con1)
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo.py", line 423, in shgo
shc.construct_complex()
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo.py", line 726, in construct_complex
self.iterate()
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo.py", line 869, in iterate
self.iterate_complex()
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo.py", line 890, in iterate_hypercube
self.g_args)
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo_lib\triangulation.py", line 121, in __init__
self.n_cube(dim, symmetry=symmetry)
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo_lib\triangulation.py", line 172, in n_cube
self.C0.add_vertex(self.V[origintuple])
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo_lib\triangulation.py", line 767, in __getitem__
index=self.index)
File "C:\Users\...\Python\Python36\site-packages\scipy\optimize\_shgo_lib\triangulation.py", line 681, in __init__
if g(self.x_a, *args) < 0.0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The problem is, that der returns an array instead of a scalar value. Changing
con1 = ({'type':'ineq', 'fun':der})
to
con_list = [{'type':'ineq', 'fun': lambda x, i_out=i_out: der(x)[i_out]} for i_out in range(T.shape[0])]
and passing constraints=con_list to shgo removes the error. This turns each output of der into its own inequality constraint. The default argument i_out=i_out binds the current index to each lambda; without it, every lambda would look up i_out when called and use its last value.
Also, since your constraints are all written so that der(x) >= 0, you can simply keep the vector-valued definition of your constraint and take the minimum of its outputs, i.e. use the scalar constraint x -> min(der(x)).
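A small sketch that checks both constraint forms on the question's test data without running shgo itself. pars is an arbitrary test point chosen for illustration; either con_list or con_min could then be passed as constraints= to shgo. It also shows the late-binding pitfall that the default argument avoids.

```python
import numpy as np

# Test data from the question.
x = np.array([0.1, 0.2, 0.3, 1])
T = np.array([300, 300, 300, 300])
p = np.array([67.2, 67.2, 67.2, 67.2])


def der(pars):
    a, b, c, d, e, f = pars
    return -p*((3*f*T**2+3*e*T+3*d)*x**2+((2*c-2*f)*T**2+(2*b-2*e)*T-2*d+2*a)*x-c*T**2-b*T-a-1)


# Pitfall: each lambda looks up i_out when called, so all of them use the last index.
late_bound = [lambda v: der(v)[i_out] for i_out in range(T.shape[0])]

# Binding i_out as a default argument freezes the index for each lambda.
con_list = [{'type': 'ineq', 'fun': lambda v, i_out=i_out: der(v)[i_out]}
            for i_out in range(T.shape[0])]

# Alternative: one scalar constraint that requires every entry of der to be >= 0.
con_min = {'type': 'ineq', 'fun': lambda v: np.min(der(v))}

pars = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6])  # arbitrary test point
print(der(pars))
print([g(pars) for g in late_bound])       # the last entry of der(pars), repeated
print([c['fun'](pars) for c in con_list])  # each entry of der(pars)
print(con_min['fun'](pars))                # a single scalar
```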

Passing extra arguments to a custom scoring function in sklearn pipeline

I need to perform univariate feature selection in sklearn with a custom score, so I am using GenericUnivariateSelect. However, according to the documentation, the available modes are:
modes for selectors : {‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}
In my case, I needed to select features where the score was above a certain value, so I have implemented:
from sklearn.feature_selection.univariate_selection import _clean_nans
from sklearn.feature_selection.univariate_selection import f_classif
import numpy as np
import pandas as pd
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.metrics import make_scorer
from sklearn.feature_selection.univariate_selection import _BaseFilter
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted

class SelectMinScore(_BaseFilter):
    # Sklearn documentation: modes for selectors : {‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}
    # custom selector:
    # select features whose score is above minScore.
    def __init__(self, score_func=f_classif, minScore=0.7):
        super(SelectMinScore, self).__init__(score_func)
        self.minScore = minScore
        self.score_func=score_func
    [...]
    def _get_support_mask(self):
        check_is_fitted(self, 'scores_')
        if self.minScore == 'all':
            return np.ones(self.scores_.shape, dtype=bool)
        else:
            scores = _clean_nans(self.scores_)
            mask = np.zeros(scores.shape, dtype=bool)
            # Custom part
            # only score above the min
            mask=scores>self.minScore
            if not np.any(mask):
                mask[np.argmax(scores)]=True
            return mask
However, I also need a custom score function that must receive an extra argument (XX), as in the code that follows. Unfortunately, I could not solve this using make_scorer.
def Custom_Score(X,Y,XX):
    return 1
class myclass():
    def mymethod(self,_XX):
        custom_filter=GenericUnivariateSelect(Custom_Score(XX=_XX),mode='MinScore',param=0.7)
        custom_filter._selection_modes.update({'MinScore': SelectMinScore})
        MyProcessingPipeline=Pipeline(steps=[('filter_step', custom_filter)])
        # finally
        X=pd.DataFrame(data=np.random.rand(500,3))
        y=pd.DataFrame(data=np.random.rand(500,1))
        MyProcessingPipeline.fit(X,y)
        MyProcessingPipeline.transform(X,y)
_XX=np.random.rand(500,1)
C=myclass()
C.mymethod(_XX)
This raises the following error:
Traceback (most recent call last):
File "<ipython-input-37-f493745d7e1b>", line 1, in <module>
runfile('C:/Users/_____/Desktop/pd-sk-integration.py', wdir='C:/Users/_____/Desktop')
File "C:\Users\______\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\______\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/______/Desktop/pd-sk-integration.py", line 65, in <module>
C.mymethod()
File "C:/Users/______/Desktop/pd-sk-integration.py", line 55, in mymethod
custom_filter=GenericUnivariateSelect(Custom_Score(XX=_XX),mode='MinScore',param=0.7)
TypeError: Custom_Score() takes exactly 3 arguments (1 given)
EDIT:
I have tried a fix by adding an extra kwarg (XX) to the fit() method of my SelectMinScore class and passing it as a fit parameter, as suggested by @TomDLT:
custom_filter = SelectMinScore(minScore=0.7)
pipe = Pipeline(steps=[('filter_step', custom_filter)])
pipe.fit(X,y, filter_step__XX=XX)
However, if I do this, I get:
line 291, in set_params
(key, self.__class__.__name__))
ValueError: Invalid parameter XX for estimator SelectMinScore. Check the list of available parameters with `estimator.get_params().keys()`.
As you can see in the code, the score function is not called with extra arguments, so there is currently no easy way in scikit-learn to pass your sample properties XX.
For your problem, a slightly hackish way could be to change the function fit in SelectMinScore, adding an additional parameter XX:
def fit(self, X, y, XX):
    """..."""
    X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
    if not callable(self.score_func):
        raise TypeError("The score function should be a callable, %s (%s) "
                        "was passed."
                        % (self.score_func, type(self.score_func)))
    self._check_params(X, y)
    score_func_ret = self.score_func(X, y, XX)
    if isinstance(score_func_ret, (list, tuple)):
        self.scores_, self.pvalues_ = score_func_ret
        self.pvalues_ = np.asarray(self.pvalues_)
    else:
        self.scores_ = score_func_ret
        self.pvalues_ = None
    self.scores_ = np.asarray(self.scores_)
    return self
Then you could call the pipeline with extra fit params:
custom_filter = SelectMinScore(minScore=0.7)
pipe = Pipeline(steps=[('filter_step', custom_filter)])
pipe.fit(X,y, filter_step__XX=XX)
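A self-contained sketch of the same idea (an extra keyword argument on fit, routed through the pipeline as filter_step__XX), written against the public BaseEstimator and TransformerMixin classes instead of the private _BaseFilter. SelectMinScoreXX and custom_score are hypothetical names, and the correlation-based score is only a stand-in for the real custom score.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


def custom_score(X, y, XX):
    """Hypothetical score: |correlation| between each column of X and XX."""
    XX = np.ravel(XX)
    return np.array([abs(np.corrcoef(X[:, j], XX)[0, 1]) for j in range(X.shape[1])])


class SelectMinScoreXX(BaseEstimator, TransformerMixin):
    """Keep the columns whose score exceeds min_score (at least one column)."""

    def __init__(self, score_func=custom_score, min_score=0.7):
        self.score_func = score_func
        self.min_score = min_score

    def fit(self, X, y=None, XX=None):
        X = np.asarray(X)
        # The extra fit parameter XX is forwarded to the score function.
        self.scores_ = np.asarray(self.score_func(X, y, XX))
        mask = self.scores_ > self.min_score
        if not mask.any():
            mask[np.argmax(self.scores_)] = True
        self.mask_ = mask
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.mask_]


rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = rng.random(500)
XX = X[:, 1] + 0.01 * rng.random(500)  # correlated with column 1 only

pipe = Pipeline(steps=[('filter_step', SelectMinScoreXX(min_score=0.7))])
pipe.fit(X, y, filter_step__XX=XX)  # "step__param" routes XX to that step's fit()
Xt = pipe.transform(X)
print(pipe.named_steps['filter_step'].mask_, Xt.shape)
```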

Passing jacobian to scipy.optimize.fsolve, when optimising a univariate function

import math
from scipy.optimize import fsolve

def sigma(s, Bpu):
    return s - math.sin(s) - math.pi * Bpu

def jac_sigma(s):
    return 1 - math.cos(s)

if __name__ == '__main__':
    Bpu = 0.5
    sig_r = fsolve(sigma, x0=[math.pi], args=(Bpu), fprime=jac_sigma)
Running the above script throws the following error,
Traceback (most recent call last):
File "C:\Users\RP12808\Desktop\_test_fsolve.py", line 12, in <module>
sig_r = fsolve(sigma, x0=[math.pi], args=(Bpu), fprime=jac_sigma)
File "C:\Users\RP12808\AppData\Local\Programs\Python\Python36\lib\site-packages\scipy\optimize\minpack.py", line 146, in fsolve
res = _root_hybr(func, x0, args, jac=fprime, **options)
File "C:\Users\RP12808\AppData\Local\Programs\Python\Python36\lib\site-packages\scipy\optimize\minpack.py", line 226, in _root_hybr
_check_func('fsolve', 'fprime', Dfun, x0, args, n, (n, n))
File "C:\Users\RP12808\AppData\Local\Programs\Python\Python36\lib\site-packages\scipy\optimize\minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
TypeError: jac_sigma() takes 1 positional argument but 2 were given
I am unsure how to pass the Jacobian to fsolve. How do I solve this?
Thanks in advance, RP
The function that computes the Jacobian matrix must take the same arguments as the function to be solved, and it must return an array:
import numpy as np

def jac_sigma(s, Bpu):
    return np.array([1 - math.cos(s)])
In general, the Jacobian matrix is a two-dimensional array, but
when the variable is a scalar (as it is here) and the Jacobian "matrix" is 1x1, the code accepts a one- or two-dimensional value. (It might be nice if it also accepted a scalar in this case, but it doesn't.)
Actually, it is sufficient that the return value be "array-like"; e.g. a list is also acceptable:
def jac_sigma(s, Bpu):
    return [1 - math.cos(s)]
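A runnable version of the corrected script, with two small changes that are assumptions rather than part of the answer: np.sin/np.cos replace math.sin/math.cos, because fsolve passes s as a length-1 array and recent NumPy versions warn when such an array is converted to a Python float; and the Jacobian is returned as an explicit 1x1 array. args is also written as a real one-element tuple, (Bpu,).

```python
import numpy as np
from scipy.optimize import fsolve


def sigma(s, Bpu):
    return s - np.sin(s) - np.pi * Bpu


def jac_sigma(s, Bpu):
    # Same arguments as sigma; one equation in one unknown gives a 1x1 Jacobian.
    return np.array([[1.0 - np.cos(s[0])]])


Bpu = 0.5
sig_r = fsolve(sigma, x0=[np.pi], args=(Bpu,), fprime=jac_sigma)
print(sig_r, sigma(sig_r, Bpu))
```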

Scipy Minimize uses a NoneType

I'm trying to code a multiple linear regression. Here's the line of code where my program raises an error:
least = optimize.minimize(residsq(xmat, ylist, coeff), coeff, constraints = ({'type': 'eq', 'fun': sum(resid(xmat, ylist, coeff))}), method = 'BFGS') # Choose the coefficients that minimize the sum of the residuals squared subject to keeping the sum of the residuals equal to 0.
xmat is a list of vectors: [[3,5,2],[3,1,6],[7,2,3], [9,-2,0]]. ylist is a list of the same length as xmat: [5,2,7,7]. coeff is the coefficient list, initially [mean(ylist), 0, 0, 0] ([constant, b_0, b_1, b_2]). resid returns the list of residuals for each point, and residsq is the L2 norm of the residuals (the square root of the sum of squares).
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
import linregtest
File "C:\Python33\lib\site-packages\linregtest.py", line 4, in <module>
out = linreg.multilinreg(xmat, ylist, True)
File "C:\Python33\lib\site-packages\linreg.py", line 120, in multilinreg
least = optimize.minimize(residsq(xmat, ylist, coeff), coeff, constraints = ({'type': 'eq', 'fun': sum(resid(xmat, ylist, coeff))}), method = 'BFGS') # Choose the coefficients that minimize the sum of the residuals squared subject to keeping the sum of the residuals equal to 0.
File "C:\Python33\lib\site-packages\scipy\optimize\_minimize.py", line 302, in minimize
RuntimeWarning)
File "C:\Python33\lib\idlelib\PyShell.py", line 60, in idle_showwarning
file.write(warnings.formatwarning(message, category, filename,
AttributeError: 'NoneType' object has no attribute 'write'
Where does file come from, and how do I suppress this error?
EDIT: Solve one problem, find another. Maybe you can help me determine where SciPy is calling a float?
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
import linregtest
File "C:\Python33\lib\site-packages\linregtest.py", line 4, in <module>
out = linreg.multilinreg(xmat, ylist, True)
File "C:\Python33\lib\site-packages\linreg.py", line 123, in multilinreg
least = optimize.minimize(residsq(xmat, ylist, coeff), coeff, constraints = ({'type': 'eq', 'fun': sumresid(xmat, ylist, coeff)}), method = 'SLSQP') # Choose the coefficients that minimize the sum of the residuals squared subject to keeping the sum of the residuals equal to 0.
File "C:\Python33\lib\site-packages\scipy\optimize\_minimize.py", line 364, in minimize
constraints, **options)
File "C:\Python33\lib\site-packages\scipy\optimize\slsqp.py", line 301, in _minimize_slsqp
meq = sum(map(len, [atleast_1d(c['fun'](x, *c['args'])) for c in cons['eq']]))
File "C:\Python33\lib\site-packages\scipy\optimize\slsqp.py", line 301, in <listcomp>
meq = sum(map(len, [atleast_1d(c['fun'](x, *c['args'])) for c in cons['eq']]))
TypeError: 'float' object is not callable
I just edited PyShell.py in my Python 3.2 IDLE installation (fixing lines 59 and 62):
def idle_showwarning(message, category, filename, lineno,
                     file=None, line=None):
    if file is None:
        file = sys.stderr  #warning_stream
    try:
        file.write(warnings.formatwarning(message, category, filename,
                                          lineno, line=line))
    except IOError:
        pass  # file is invalid; the warning is dropped
Use sys.stderr instead of the global warning_stream, which uses sys.__stderr__. In my case sys.__stderr__ is None; I don't know why a global is used.
The call to warnings.formatwarning had an extra, invalid file keyword argument.
Now, I get the warning printed, for example
>>> import numpy as np
>>> np.uint(1) - np.uint(2)
Warning (from warnings module):
File "C:\Programs\Python32\Lib\idlelib\idle.pyw", line 1
try:
RuntimeWarning: overflow encountered in ulong_scalars
>>> 4294967295
>>>
Edit:
Searching the Python bug tracker:
http://bugs.python.org/issue12438 (wrong file argument) has been fixed.
http://bugs.python.org/issue13582 (problems when sys.__stderr__ is None) is open.
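For the second error ('float' object is not callable): minimize expects the objective and each constraint's 'fun' to be callables that take the coefficient vector, whereas residsq(xmat, ylist, coeff) and sumresid(xmat, ylist, coeff) are evaluated once and hand over plain numbers. BFGS also does not handle constraints, which is presumably what the RuntimeWarning in the first traceback was about; SLSQP does. The sketch below assumes resid computes the residuals as the question describes (the helper definitions are reconstructions, not the asker's code). It minimizes the sum of squared residuals, which has the same minimizer as the L2 norm but stays smooth at zero. With these four points and four coefficients the fit happens to be exact.

```python
import numpy as np
from scipy import optimize

xmat = np.array([[3, 5, 2], [3, 1, 6], [7, 2, 3], [9, -2, 0]], dtype=float)
ylist = np.array([5, 2, 7, 7], dtype=float)


def resid(coeff):
    """Residuals for coeff = [constant, b_0, b_1, b_2] (reconstructed helper)."""
    return ylist - (coeff[0] + xmat @ coeff[1:])


def ssr(coeff):
    """Sum of squared residuals: same minimizer as the L2 norm, smooth at 0."""
    return np.sum(resid(coeff) ** 2)


coeff0 = np.array([ylist.mean(), 0.0, 0.0, 0.0])
least = optimize.minimize(
    ssr,                # pass the function itself, not its value ssr(coeff0)
    coeff0,
    method='SLSQP',     # BFGS ignores constraints; SLSQP supports them
    constraints=({'type': 'eq', 'fun': lambda c: np.sum(resid(c))},),
    options={'ftol': 1e-12},
)
print(least.x, least.fun)
```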
