Logistic regression using SciPy - python
I am trying to code up logistic regression in Python using the SciPy fmin_bfgs function, but am running into some issues. I wrote functions for the logistic (sigmoid) transformation function, and the cost function, and those work fine (I have used the optimized values of the parameter vector found via canned software to test the functions, and those match up). I am not that sure of my implementation of the gradient function, but it looks reasonable.
Here is the code:
# purpose: logistic regression
import numpy as np
import scipy.optimize
# prepare the data
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
vY = data[:, 0]
mX = data[:, 1:]
intercept = np.ones(mX.shape[0]).reshape(mX.shape[0], 1)
mX = np.concatenate((intercept, mX), axis = 1)
iK = mX.shape[1]
iN = mX.shape[0]
# logistic transformation
def logit(mX, vBeta):
    return((1/(1.0 + np.exp(-np.dot(mX, vBeta)))))
# test function call
vBeta0 = np.array([-.10296645, -.0332327, -.01209484, .44626211, .92554137, .53973828,
1.7993371, .7148045 ])
logit(mX, vBeta0)
# cost function
def logLikelihoodLogit(vBeta, mX, vY):
    return(-(np.sum(vY*np.log(logit(mX, vBeta)) + (1-vY)*(np.log(1-logit(mX, vBeta))))))
logLikelihoodLogit(vBeta0, mX, vY) # test function call
# gradient function
def likelihoodScore(vBeta, mX, vY):
    return(np.dot(mX.T,
                  ((np.dot(mX, vBeta) - vY)/
                   np.dot(mX, vBeta)).reshape(iN, 1)).reshape(iK, 1))
likelihoodScore(vBeta0, mX, vY).shape # test function call
# optimize the function (without gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogit,
x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
1.8, .71]),
args = (mX, vY), gtol = 1e-3)
# optimize the function (with gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogit,
x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
1.8, .71]), fprime = likelihoodScore,
args = (mX, vY), gtol = 1e-3)
The first optimization (without gradient) ends with a whole lot of stuff about division by zero.
The second optimization (with gradient) ends with a matrices not aligned error, which probably means I have got the way the gradient is to be returned wrong.
Any help with this is appreciated. If anyone wants to try this, the data is included below.
low,age,lwt,race,smoke,ptl,ht,ui
0,19,182,2,0,0,0,1
0,33,155,3,0,0,0,0
0,20,105,1,1,0,0,0
0,21,108,1,1,0,0,1
0,18,107,1,1,0,0,1
0,21,124,3,0,0,0,0
0,22,118,1,0,0,0,0
0,17,103,3,0,0,0,0
0,29,123,1,1,0,0,0
0,26,113,1,1,0,0,0
0,19,95,3,0,0,0,0
0,19,150,3,0,0,0,0
0,22,95,3,0,0,1,0
0,30,107,3,0,1,0,1
0,18,100,1,1,0,0,0
0,18,100,1,1,0,0,0
0,15,98,2,0,0,0,0
0,25,118,1,1,0,0,0
0,20,120,3,0,0,0,1
0,28,120,1,1,0,0,0
0,32,121,3,0,0,0,0
0,31,100,1,0,0,0,1
0,36,202,1,0,0,0,0
0,28,120,3,0,0,0,0
0,25,120,3,0,0,0,1
0,28,167,1,0,0,0,0
0,17,122,1,1,0,0,0
0,29,150,1,0,0,0,0
0,26,168,2,1,0,0,0
0,17,113,2,0,0,0,0
0,17,113,2,0,0,0,0
0,24,90,1,1,1,0,0
0,35,121,2,1,1,0,0
0,25,155,1,0,0,0,0
0,25,125,2,0,0,0,0
0,29,140,1,1,0,0,0
0,19,138,1,1,0,0,0
0,27,124,1,1,0,0,0
0,31,215,1,1,0,0,0
0,33,109,1,1,0,0,0
0,21,185,2,1,0,0,0
0,19,189,1,0,0,0,0
0,23,130,2,0,0,0,0
0,21,160,1,0,0,0,0
0,18,90,1,1,0,0,1
0,18,90,1,1,0,0,1
0,32,132,1,0,0,0,0
0,19,132,3,0,0,0,0
0,24,115,1,0,0,0,0
0,22,85,3,1,0,0,0
0,22,120,1,0,0,1,0
0,23,128,3,0,0,0,0
0,22,130,1,1,0,0,0
0,30,95,1,1,0,0,0
0,19,115,3,0,0,0,0
0,16,110,3,0,0,0,0
0,21,110,3,1,0,0,1
0,30,153,3,0,0,0,0
0,20,103,3,0,0,0,0
0,17,119,3,0,0,0,0
0,17,119,3,0,0,0,0
0,23,119,3,0,0,0,0
0,24,110,3,0,0,0,0
0,28,140,1,0,0,0,0
0,26,133,3,1,2,0,0
0,20,169,3,0,1,0,1
0,24,115,3,0,0,0,0
0,28,250,3,1,0,0,0
0,20,141,1,0,2,0,1
0,22,158,2,0,1,0,0
0,22,112,1,1,2,0,0
0,31,150,3,1,0,0,0
0,23,115,3,1,0,0,0
0,16,112,2,0,0,0,0
0,16,135,1,1,0,0,0
0,18,229,2,0,0,0,0
0,25,140,1,0,0,0,0
0,32,134,1,1,1,0,0
0,20,121,2,1,0,0,0
0,23,190,1,0,0,0,0
0,22,131,1,0,0,0,0
0,32,170,1,0,0,0,0
0,30,110,3,0,0,0,0
0,20,127,3,0,0,0,0
0,23,123,3,0,0,0,0
0,17,120,3,1,0,0,0
0,19,105,3,0,0,0,0
0,23,130,1,0,0,0,0
0,36,175,1,0,0,0,0
0,22,125,1,0,0,0,0
0,24,133,1,0,0,0,0
0,21,134,3,0,0,0,0
0,19,235,1,1,0,1,0
0,25,95,1,1,3,0,1
0,16,135,1,1,0,0,0
0,29,135,1,0,0,0,0
0,29,154,1,0,0,0,0
0,19,147,1,1,0,0,0
0,19,147,1,1,0,0,0
0,30,137,1,0,0,0,0
0,24,110,1,0,0,0,0
0,19,184,1,1,0,1,0
0,24,110,3,0,1,0,0
0,23,110,1,0,0,0,0
0,20,120,3,0,0,0,0
0,25,241,2,0,0,1,0
0,30,112,1,0,0,0,0
0,22,169,1,0,0,0,0
0,18,120,1,1,0,0,0
0,16,170,2,0,0,0,0
0,32,186,1,0,0,0,0
0,18,120,3,0,0,0,0
0,29,130,1,1,0,0,0
0,33,117,1,0,0,0,1
0,20,170,1,1,0,0,0
0,28,134,3,0,0,0,0
0,14,135,1,0,0,0,0
0,28,130,3,0,0,0,0
0,25,120,1,0,0,0,0
0,16,95,3,0,0,0,0
0,20,158,1,0,0,0,0
0,26,160,3,0,0,0,0
0,21,115,1,0,0,0,0
0,22,129,1,0,0,0,0
0,25,130,1,0,0,0,0
0,31,120,1,0,0,0,0
0,35,170,1,0,1,0,0
0,19,120,1,1,0,0,0
0,24,116,1,0,0,0,0
0,45,123,1,0,0,0,0
1,28,120,3,1,1,0,1
1,29,130,1,0,0,0,1
1,34,187,2,1,0,1,0
1,25,105,3,0,1,1,0
1,25,85,3,0,0,0,1
1,27,150,3,0,0,0,0
1,23,97,3,0,0,0,1
1,24,128,2,0,1,0,0
1,24,132,3,0,0,1,0
1,21,165,1,1,0,1,0
1,32,105,1,1,0,0,0
1,19,91,1,1,2,0,1
1,25,115,3,0,0,0,0
1,16,130,3,0,0,0,0
1,25,92,1,1,0,0,0
1,20,150,1,1,0,0,0
1,21,200,2,0,0,0,1
1,24,155,1,1,1,0,0
1,21,103,3,0,0,0,0
1,20,125,3,0,0,0,1
1,25,89,3,0,2,0,0
1,19,102,1,0,0,0,0
1,19,112,1,1,0,0,1
1,26,117,1,1,1,0,0
1,24,138,1,0,0,0,0
1,17,130,3,1,1,0,1
1,20,120,2,1,0,0,0
1,22,130,1,1,1,0,1
1,27,130,2,0,0,0,1
1,20,80,3,1,0,0,1
1,17,110,1,1,0,0,0
1,25,105,3,0,1,0,0
1,20,109,3,0,0,0,0
1,18,148,3,0,0,0,0
1,18,110,2,1,1,0,0
1,20,121,1,1,1,0,1
1,21,100,3,0,1,0,0
1,26,96,3,0,0,0,0
1,31,102,1,1,1,0,0
1,15,110,1,0,0,0,0
1,23,187,2,1,0,0,0
1,20,122,2,1,0,0,0
1,24,105,2,1,0,0,0
1,15,115,3,0,0,0,1
1,23,120,3,0,0,0,0
1,30,142,1,1,1,0,0
1,22,130,1,1,0,0,0
1,17,120,1,1,0,0,0
1,23,110,1,1,1,0,0
1,17,120,2,0,0,0,0
1,26,154,3,0,1,1,0
1,20,106,3,0,0,0,0
1,26,190,1,1,0,0,0
1,14,101,3,1,1,0,0
1,28,95,1,1,0,0,0
1,14,100,3,0,0,0,0
1,23,94,3,1,0,0,0
1,17,142,2,0,0,1,0
1,21,130,1,1,0,1,0
Your problem is that the function you are trying to minimise, logLikelihoodLogit, returns NaN for parameter values very close to your initial estimate. It also ends up taking logarithms of non-positive numbers and runs into other numerical problems. fmin_bfgs doesn't know about this, so it tries to evaluate the function at such values and runs into trouble.
I suggest using a bounded optimisation instead. You can use scipy's optimize.fmin_l_bfgs_b for this. It uses a similar algorithm to fmin_bfgs, but it supports bounds in the parameter space. You call it similarly, just add a bounds keyword. Here's a simple example of how you'd call fmin_l_bfgs_b:
from scipy.optimize import fmin_bfgs, fmin_l_bfgs_b
# list of bounds: each item is a tuple with the (lower, upper) bounds
bd = [(0, 1.), ...]
test = fmin_l_bfgs_b(logLikelihoodLogit, x0=x0, args=(mX, vY), bounds=bd,
approx_grad=True)
Here I'm using an approximate gradient (seemed to work fine with your data), but you can pass fprime as in your example (I don't have time to check its correctness). You'll know your parameter space better than me, just make sure to build the bounds array for all the meaningful values that your parameters can take.
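The NaN behaviour described in this answer can also be sidestepped without bounds by writing the log-likelihood in a numerically stable form. The sketch below is an assumption on my part, not something from the answer: it relies on np.logaddexp, and the names naive_nll, stable_nll and the toy data are hypothetical.

```python
import numpy as np

def naive_nll(beta, X, y):
    """Negative log-likelihood written the same way as logLikelihoodLogit."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def stable_nll(beta, X, y):
    """Same quantity, using log(1 + exp(z)) = logaddexp(0, z)."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

# Hypothetical toy data: an intercept plus one unscaled covariate.
X = np.array([[1.0, 100.0], [1.0, 150.0], [1.0, 200.0], [1.0, 250.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

small = np.array([0.1, -0.001])  # moderate linear predictor
large = np.array([0.0, 5.0])     # linear predictor in the hundreds

with np.errstate(all="ignore"):  # silence the warnings the naive form triggers
    print(naive_nll(small, X, y), stable_nll(small, X, y))  # the two agree
    print(naive_nll(large, X, y), stable_nll(large, X, y))  # non-finite vs finite
```

With unscaled covariates such as lwt, a line search can easily step to parameter values where the naive form saturates, so standardizing the columns of mX may help as well.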
Here is the answer I sent back to the SciPy list where this question was cross-posted. Thanks to @tiago for his answer. Basically, I reparametrized the likelihood function. Also, added a call to the check_grad function.
#=====================================================
# purpose: logistic regression
import numpy as np
import scipy as sp
import scipy.optimize
import matplotlib as mpl
import os
# prepare the data
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
vY = data[:, 0]
mX = data[:, 1:]
# mX = (mX - np.mean(mX))/np.std(mX) # standardize the data; if required
intercept = np.ones(mX.shape[0]).reshape(mX.shape[0], 1)
mX = np.concatenate((intercept, mX), axis = 1)
iK = mX.shape[1]
iN = mX.shape[0]
# logistic transformation
def logit(mX, vBeta):
    return((np.exp(np.dot(mX, vBeta))/(1.0 + np.exp(np.dot(mX, vBeta)))))
# test function call
vBeta0 = np.array([-.10296645, -.0332327, -.01209484, .44626211, .92554137, .53973828,
1.7993371, .7148045 ])
logit(mX, vBeta0)
# cost function
def logLikelihoodLogit(vBeta, mX, vY):
    return(-(np.sum(vY*np.log(logit(mX, vBeta)) + (1-vY)*(np.log(1-logit(mX, vBeta))))))
logLikelihoodLogit(vBeta0, mX, vY) # test function call
# different parametrization of the cost function
def logLikelihoodLogitVerbose(vBeta, mX, vY):
    return(-(np.sum(vY*(np.dot(mX, vBeta) - np.log((1.0 + np.exp(np.dot(mX, vBeta))))) +
                    (1-vY)*(-np.log((1.0 + np.exp(np.dot(mX, vBeta))))))))
logLikelihoodLogitVerbose(vBeta0, mX, vY) # test function call
# gradient function
def likelihoodScore(vBeta, mX, vY):
    return(np.dot(mX.T,
                  (logit(mX, vBeta) - vY)))
likelihoodScore(vBeta0, mX, vY).shape # test function call
sp.optimize.check_grad(logLikelihoodLogitVerbose, likelihoodScore,
vBeta0, mX, vY) # check that the analytical gradient is close to
# numerical gradient
# optimize the function (without gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogitVerbose,
x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
1.8, .71]),
args = (mX, vY), gtol = 1e-3)
# optimize the function (with gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogitVerbose,
x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
1.8, .71]), fprime = likelihoodScore,
args = (mX, vY), gtol = 1e-3)
#=====================================================
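One detail worth spelling out: the gradient passed as fprime should come back as a flat array with the same shape as x0, which is likely why dropping the reshape(iK, 1) from the original gradient makes the alignment error go away. Here is a minimal self-contained sketch on synthetic data; the data, the seed and the names nll and grad are made up for illustration and are not from the post.

```python
import numpy as np
from scipy.optimize import check_grad, fmin_bfgs
from scipy.special import expit  # logistic sigmoid

# Synthetic data (hypothetical): intercept plus two standard-normal covariates.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([-0.5, 1.0, 2.0])
y = (rng.random(n) < expit(X @ beta_true)).astype(float)

def nll(beta, X, y):
    """Negative log-likelihood of the logistic model."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

def grad(beta, X, y):
    """Gradient X.T @ (sigmoid(X beta) - y); shape (k,), same as beta."""
    return X.T @ (expit(X @ beta) - y)

beta0 = np.zeros(k)
err = check_grad(nll, grad, beta0, X, y)  # norm of (analytic - numerical) gradient
beta_hat = fmin_bfgs(nll, beta0, fprime=grad, args=(X, y), gtol=1e-6, disp=False)
print(err, beta_hat)
```

A small value from check_grad relative to the size of the gradient is a reasonable sign that the analytic expression is right before handing it to the optimizer.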
I was facing the same issues. When I experimented with the different algorithms available through scipy.optimize.minimize, I found that Newton Conjugate Gradient (method='TNC', SciPy's truncated Newton implementation) worked well for finding the optimal logistic regression parameters for my data set. It can be called like this:
Result = scipy.optimize.minimize(fun = logLikelihoodLogit,
                                 x0 = np.array([-.1, -.03, -.01, .44, .92, .53, 1.8, .71]),
                                 args = (mX, vY),
                                 method = 'TNC',
                                 jac = likelihoodScore)
optimLogit = Result.x
Related
How to tune the parameters of the following system of ODEs?
I have a system of ODEs which assume the form [equation not shown].
In essence, I have the solutions B_1(t) and B_2(t) for t=5 and I am interested in finding the unknown parameters rho_1 and rho_2. The approach I took entailed:
1) define the function corresponding to the system above;
2) integrate using solve_ivp and subtract the true values of B_1(t) and B_2(t) from the result;
3) finally use fsolve to find the values of rho_1 and rho_2 such that the difference between the true values of B_1(t) and B_2(t) and the ones obtained with the tuned rho_1 and rho_2 is a zero vector.
The code I have implemented for this purpose is the following:
t_eval = np.arange(0, 5)
def fun(t, s, rho_1, rho_2):
    return np.dot(np.array([0.775416, 0,0, 0.308968]).reshape(2,2), s) + np.array([rho_1, rho_2]).reshape(2,1)
def fun2(t, rho_1, rho_2):
    res = solve_ivp(fun, [0, 5], y0 = [0, 0], t_eval=t_eval, args = (rho_1, rho_2), vectorized = True)
    sol = res.y[:,4]-np.array([0.01306365, 0.00589119])
    return sol
root = fsolve(fun2, [0, 0])
However, I am not sure whether fsolve is inappropriate for this purpose or there is something wrong with my code, as I get the following error:
fun2() missing 2 required positional arguments: 'rho_1' and 'rho_2'
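The error message arises because fsolve calls its function with a single argument, the vector of unknowns, so the residual function has to unpack rho_1 and rho_2 from that vector rather than take t, rho_1 and rho_2 separately. The sketch below is one possible arrangement, not a verified fix of the whole problem: the name residual, the tolerances, and the choice to evaluate the solution only at t = 5 (np.arange(0, 5) stops at 4) are all assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

A = np.array([[0.775416, 0.0], [0.0, 0.308968]])
B_TARGET = np.array([0.01306365, 0.00589119])  # B_1(5), B_2(5) from the question

def fun(t, s, rho_1, rho_2):
    """Right-hand side of the linear system ds/dt = A s + rho."""
    return A @ s + np.array([rho_1, rho_2])

def residual(rho):
    """Difference between the integrated solution at t = 5 and the targets."""
    rho_1, rho_2 = rho  # fsolve passes one vector; unpack it here
    res = solve_ivp(fun, [0, 5], y0=[0.0, 0.0], t_eval=[5.0],
                    args=(rho_1, rho_2), rtol=1e-10, atol=1e-12)
    return res.y[:, -1] - B_TARGET

root = fsolve(residual, [0.0, 0.0])
print(root)
```

Because this particular system is linear with a diagonal matrix, the result can be cross-checked against the closed form rho_i = B_i(5) * a_i / (exp(5 * a_i) - 1), where a_i are the diagonal entries.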
Find the value of variables to maximize return of function in Python
I'd like to achieve a result similar to what the Solver function in Excel does. I've been reading about SciPy optimization and trying to build a function which outputs the quantity whose maximum I want to find. The equation is based on four different variables; see my code below:
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
    'Dividend2': [9390, 7448, 177],
    'Probability': [341, 376, 452],
    'EV': [0.53, 0.60, 0.55],
    'Dividend': [185, 55, 755],
    'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)
def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
        + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] = np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())
b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
So I'd like to find the maximum value of the column Return in df when changing the variables ev, bv, vc and dv. I'd like them to lie in the intervals ev: 0.2-4, bv: 300-600, vc: 0-1000 and dv: 0-1000. When running my code, it seems like the function stops at x0.
Solution
I will use the optuna library to give you a solution to the type of problem you are trying to solve. I have tried using scipy.optimize.minimize, and it appears that the loss landscape is probably quite flat in most places, so the tolerances force the minimizing algorithm (L-BFGS-B) to stop prematurely.
Optuna Docs: https://optuna.readthedocs.io/en/stable/index.html
With optuna, it is rather straightforward. Optuna only requires an objective function and a study. The study sends various trials to the objective function, which in turn evaluates the metric of your choice.
I have defined another metric function, myFunc2, mostly by removing the np.where calls, as you can do away with them (this reduces the number of steps) and make the function slightly faster.
# install optuna with pip
pip install -Uqq optuna
Although I looked into using a rather smooth loss landscape, sometimes it is necessary to visualize the landscape itself. The answer in section B elaborates on visualization. But what if you want to use a smoother metric function? Section D sheds some light on this.
Order of code-execution should be:
Sections: C >> B >> B.1 >> B.2 >> B.3 >> A.1 >> A.2 >> D
A. Building Intuition
If you create a hiplot (also known as a plot with parallel-coordinates) with all the possible parameter values as mentioned in the search_space for Section B.2, and plot the lowest 50 outputs of myFunc2, it would look like this: [figure not included]
Plotting all such points from the search_space would look like this: [figure not included]
A.1. Loss Landscape Views for Various Parameter-Pairs
These figures show that the loss landscape is mostly flat for any two of the four parameters (ev, bv, vc, dv). This could be a reason why only GridSampler (which brute-forces the searching process) does better than the other two samplers (TPESampler and RandomSampler). This could also be the reason why scipy.optimize.minimize(method="L-BFGS-B") fails right off the bat.
01. dv-vc
02. dv-bv
03. dv-ev
04. bv-ev
05. vc-ev
06. vc-bv
# Create contour plots for parameter-pairs
study_name = "GridSampler"
study = studies.get(study_name)
views = [("dv", "vc"), ("dv", "bv"), ("dv", "ev"),
         ("bv", "ev"), ("vc", "ev"), ("vc", "bv")]
for i, (x, y) in enumerate(views):
    print(f"Figure: {i}/{len(views)}")
    study_contour_plot(study=study, params=(x, y))
A.2. Parameter Importance
study_name = "GridSampler"
study = studies.get(study_name)
fig = optuna.visualization.plot_param_importances(study)
fig.update_layout(title=f'Hyperparameter Importances: {study.study_name}',
                  autosize=False,
                  width=800, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
B. Code
Section B.2 finds the lowest metric -88.333 for:
{'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
import warnings
from functools import partial
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union
import pandas as pd
import numpy as np
import optuna
from tqdm.notebook import tqdm
warnings.filterwarnings("ignore", category=optuna.exceptions.ExperimentalWarning)
optuna.logging.set_verbosity(optuna.logging.WARNING)
PARAM_NAMES: List[str] = ["ev", "bv", "vc", "dv",]
def myFunc2(params):
    """myFunc metric v2 with lesser steps."""
    global df  # define as a global variable
    (ev, bv, vc, dv) = params
    df['Number'] = (df['Dividend2'] <= vc) * 1 + (df['EV2'] <= dv) * 1
    df['Return'] = (
        (df['EV'] > ev)
        * (df['Probability'] < bv)
        * (df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())
# must come after myFunc2 is defined
DEFAULT_METRIC_FUNC: Callable = myFunc2
def make_param_grid(
        bounds: List[Tuple[float, float]],
        param_names: Optional[List[str]]=None,
        num_points: int=10,
        as_dict: bool=True,
    ) -> Union[pd.DataFrame, Dict[str, List[float]]]:
    """
    Create parameter search space.
    Example:
        grid = make_param_grid(bounds=b1, num_points=10, as_dict=True)
    """
    if param_names is None:
        param_names = PARAM_NAMES  # ["ev", "bv", "vc", "dv"]
    bounds = np.array(bounds)
    grid = np.linspace(start=bounds[:,0],
                       stop=bounds[:,1],
                       num=num_points,
                       endpoint=True,
                       axis=0)
    grid = pd.DataFrame(grid, columns=param_names)
    if as_dict:
        grid = grid.to_dict()
        for k,v in grid.items():
            grid.update({k: list(v.values())})
    return grid
def objective(trial,
              bounds: Optional[Iterable]=None,
              func: Optional[Callable]=None,
              param_names: Optional[List[str]]=None):
    """Objective function, necessary for optimizing with optuna."""
    if param_names is None:
        param_names = PARAM_NAMES
    if (bounds is None):
        bounds = ((-10, 10) for _ in param_names)
    if not isinstance(bounds, dict):
        bounds = dict((p, (min(b), max(b)))
                      for p, b in zip(param_names, bounds))
    if func is None:
        func = DEFAULT_METRIC_FUNC
    params = dict(
        (p, trial.suggest_float(p, bounds.get(p)[0], bounds.get(p)[1]))
        for p in param_names
    )
    # x = trial.suggest_float('x', -10, 10)
    return func((params[p] for p in param_names))
def optimize(objective: Callable,
             sampler: Optional[optuna.samplers.BaseSampler]=None,
             func: Optional[Callable]=None,
             n_trials: int=2,
             study_direction: str="minimize",
             study_name: Optional[str]=None,
             formatstr: str=".4f",
             verbose: bool=True):
    """Optimizing function using optuna: creates a study."""
    if func is None:
        func = DEFAULT_METRIC_FUNC
    study = optuna.create_study(
        direction=study_direction,
        sampler=sampler,
        study_name=study_name)
    study.optimize(
        objective,
        n_trials=n_trials,
        show_progress_bar=True,
        n_jobs=1,
    )
    if verbose:
        # evaluate the metric at the best parameters found by the study
        metric = func([study.best_params[p] for p in PARAM_NAMES])
        msg = format_result(study.best_params, metric,
                            header=study.study_name,
                            format=formatstr)
        print(msg)
    return study
def format_dict(d: Dict[str, float], format: str=".4f") -> Dict[str, float]:
    """
    Returns formatted output for a dictionary with
    string keys and float values.
    """
    return dict((k, float(f'{v:{format}}')) for k,v in d.items())
def format_result(d: Dict[str, float],
                  metric_value: float,
                  header: str='',
                  format: str=".4f"):
    """Returns formatted result."""
    msg = f"""Study Name: {header}\n{'='*30}

    ✅ study.best_params: \n\t{format_dict(d)}
    ✅ metric: {metric_value}
    """
    return msg
def study_contour_plot(study: optuna.Study,
                       params: Optional[List[str]]=None,
                       width: int=560,
                       height: int=500):
    """
    Create contour plots for a study, given a list or
    tuple of two parameter names.
    """
    if params is None:
        params = ["dv", "vc"]
    fig = optuna.visualization.plot_contour(study, params=params)
    fig.update_layout(
        title=f'Contour Plot: {study.study_name} ({params[0]}, {params[1]})',
        autosize=False,
        width=width,
        height=height,
        margin=dict(l=65, r=50, b=65, t=90))
    fig.show()
bounds = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
param_names = PARAM_NAMES  # ["ev", "bv", "vc", "dv",]
pobjective = partial(objective, bounds=bounds)
# Create an empty dict to contain
# various subsequent studies.
studies = dict()
Optuna comes with a few different types of Samplers. Samplers provide the strategy for how optuna samples points from the parameter space and evaluates the objective function.
https://optuna.readthedocs.io/en/stable/reference/samplers.html
B.1 Use TPESampler
from optuna.samplers import TPESampler
sampler = TPESampler(seed=42)
study_name = "TPESampler"
studies[study_name] = optimize(
    pobjective,
    sampler=sampler,
    n_trials=100,
    study_name=study_name,
)
# Study Name: TPESampler
# ==============================
#
# ✅ study.best_params:
#     {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
B.2. Use GridSampler
GridSampler requires a parameter search grid. Here we are using the following search_space.
from optuna.samplers import GridSampler
# create search-space
search_space = make_param_grid(bounds=bounds, num_points=10, as_dict=True)
sampler = GridSampler(search_space)
study_name = "GridSampler"
studies[study_name] = optimize(
    pobjective,
    sampler=sampler,
    n_trials=2000,
    study_name=study_name,
)
# Study Name: GridSampler
# ==============================
#
# ✅ study.best_params:
#     {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
# ✅ metric: -88.33333333333337
B.3. Use RandomSampler
from optuna.samplers import RandomSampler
sampler = RandomSampler(seed=42)
study_name = "RandomSampler"
studies[study_name] = optimize(
    pobjective,
    sampler=sampler,
    n_trials=300,
    study_name=study_name,
)
# Study Name: RandomSampler
# ==============================
#
# ✅ study.best_params:
#     {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
C. Dummy Data
For the sake of reproducibility, I am keeping a record of the dummy data used here.
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
    'Dividend2': [9390, 7448, 177],
    'Probability': [341, 376, 452],
    'EV': [0.53, 0.60, 0.55],
    'Dividend': [185, 55, 755],
    'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)
def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
        + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] = np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())
b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
C.1. An Observation
So, it seems at first glance that the code executed properly and did not throw any error. It says it had success in finding the minimized solution.
fun: -0.0
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
jac: array([0., 0., 3., 3.])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' # 💡
nfev: 35
nit: 2
status: 0
success: True
x: array([2.e-01, 6.e+02, 0.e+00, 0.e+00]) # 🔥
A close observation reveals that the solution (see 🔥) is no different from the starting point [0.2, 600, 1000, 1000]. So it seems like nothing really happened and the algorithm just finished prematurely?!
Now look at the message above (see 💡). If you run a Google search on this, you could find something like this:
Summary
b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
If the loss landscape does not have a smoothly changing topography, the gradient descent algorithms will soon find that from one iteration to the next there isn't much change happening, and hence will terminate further seeking. Also, if the loss landscape is rather flat, it could meet a similar fate and terminate early.
Reference: scipy-optimize-minimize does not perform the optimization - CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
D. Making the Loss Landscape Smoother
A binary evaluation of value = 1 if x>5 else 0 is essentially a step function that assigns 1 to all values of x greater than 5 and 0 otherwise. But this introduces a kink, a discontinuity in smoothness, and this could potentially cause problems in traversing the loss landscape.
What if we use a sigmoid function to introduce some smoothness?
# Define sigmoid function
def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))
For the above example, we could modify it as follows: value = sigmoid(x - 5).
You can additionally introduce another factor (gamma: γ), as in value = sigmoid((x - 5) / γ), and try to optimize it to make the landscape smoother. Thus, by controlling the gamma factor, you could make the function smoother and change how quickly it changes around x = 5.
The above figure is created with the following code-snippet.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg' # 'svg', 'retina'
plt.style.use('seaborn-white')
def make_figure(figtitle: str="Sigmoid Function"):
    """Make the demo figure for using sigmoid."""
    x = np.arange(-20, 20.01, 0.01)
    y1 = sigmoid(x)
    y2 = sigmoid(x - 5)
    y3 = sigmoid((x - 5)/3)
    y4 = sigmoid((x - 5)/0.3)
    fig, ax = plt.subplots(figsize=(10,5))
    plt.sca(ax)
    # raw strings keep the backslash in \sigma intact
    plt.plot(x, y1, ls="-", label=r"$\sigma(x)$")
    plt.plot(x, y2, ls="--", label=r"$\sigma(x - 5)$")
    plt.plot(x, y3, ls="-.", label=r"$\sigma((x - 5) / 3)$")
    plt.plot(x, y4, ls=":", label=r"$\sigma((x - 5) / 0.3)$")
    plt.axvline(x=0, ls="-", lw=1.3, color="cyan", alpha=0.9)
    plt.axvline(x=5, ls="-", lw=1.3, color="magenta", alpha=0.9)
    plt.legend()
    plt.title(figtitle)
    plt.show()
make_figure()
D.1. Example of Metric Smoothing
The following is an example of how you could apply function smoothing.
from functools import partial
def sig(x, gamma: float=1.):
    return sigmoid(x/gamma)
def myFunc3(params, gamma: float=0.5):
    """myFunc metric v3 with smoother metric."""
    (ev, bv, vc, dv) = params
    _sig = partial(sig, gamma=gamma)
    df['Number'] = _sig(x = -(df['Dividend2'] - vc)) * 1 \
        + _sig(x = -(df['EV2'] - dv)) * 1
    df['Return'] = (
        _sig(x = df['EV'] - ev)
        * _sig(x = -(df['Probability'] - bv))
        * _sig(x = df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())
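To see how much a grid search of this kind depends on its resolution, the metric can also be evaluated by brute force with plain NumPy. This is a sketch under assumptions: metric re-implements the arithmetic of myFunc2 without pandas, and the 10-point grid mirrors the search space of Section B.2; neither is part of the answer's own code.

```python
import itertools
import numpy as np

# Dummy data from Section C, as plain arrays.
dividend2 = np.array([9390, 7448, 177])
probability = np.array([341, 376, 452])
ev_col = np.array([0.53, 0.60, 0.55])
dividend = np.array([185, 55, 755])
ev2 = np.array([123, 139, 544])

def metric(params):
    """Same arithmetic as myFunc2, without the DataFrame."""
    ev, bv, vc, dv = params
    number = (dividend2 <= vc).astype(int) + (ev2 <= dv).astype(int)
    ret = (ev_col > ev) * (probability < bv) * (number * dividend - (vc + dv))
    return -ret.sum()

bounds = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
axes = [np.linspace(lo, hi, 10) for lo, hi in bounds]

# Exhaustive search over the 10 x 10 x 10 x 10 grid.
best = min(itertools.product(*axes), key=metric)
print(best, metric(best))

# An off-grid point sitting exactly on the Dividend2 threshold of the third row.
print(metric((0.2, 500.0, 177.0, 0.0)))
```

On this data the coarse grid lands at vc = 222.2, while the off-grid point at vc = 177 gives a lower metric, so a finer grid, or one that includes the data thresholds themselves, may be worth trying.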
As already mentioned in my comment, the crucial problem is that np.where() is neither differentiable nor continuous. Consequently, your objective function violates the mathematical assumptions of most of the (derivative-based) algorithms under the hood of scipy.optimize.minimize.
So, basically, you've got three options:
1. Use a derivative-free algorithm and hope for the best.
2. Replace np.where() with a smooth approximation such that your objective is continuously differentiable.
3. Reformulate your problem as a MIP.
Since @CypherX's answer pursues approach 1, I'd like to focus on 2. Here, the main idea is to approximate the np.where function. One possible approximation is
def smooth_if_then(x):
    eps = 1e-12
    return 0.5 + x/(2*np.sqrt(eps + x*x))
which is continuous and differentiable. Then, given a np.ndarray arr and a scalar value x, the expression np.where(arr <= x, 1, 0) can be approximated by smooth_if_then(x - arr).
Hence, the objective function becomes:
div = df['Dividend'].values
div2 = df['Dividend2'].values
ev2 = df['EV2'].values
ev = df['EV'].values
prob = df['Probability'].values
def objective(x, *params):
    ev, bv, vc, dv = x
    div_vals, div2_vals, ev2_vals, ev_vals, prob_vals = params
    number = smooth_if_then(vc - div2_vals) + smooth_if_then(dv - ev2_vals)
    part1 = smooth_if_then(bv - prob_vals) * (number * div_vals - (vc + dv))
    part2 = smooth_if_then(-1*(ev - ev_vals)) * part1
    return -1 * part2.sum()
and using the trust-constr algorithm (which is the most robust one inside scipy.optimize.minimize), yields:
from scipy.optimize import minimize
res = minimize(lambda x: objective(x, div, div2, ev2, ev, prob), x0=start, bounds=b1, method="trust-constr")
barrier_parameter: 1.0240000000000006e-08
barrier_tolerance: 1.0240000000000006e-08
cg_niter: 5
cg_stop_cond: 0
constr: [array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])]
constr_nfev: [0]
constr_nhev: [0]
constr_njev: [0]
constr_penalty: 1.0
constr_violation: 0.0
execution_time: 0.2951819896697998
fun: 1.3046631387761482e-08
grad: array([0.00000000e+00, 0.00000000e+00, 8.92175218e-12, 8.92175218e-12])
jac: [<4x4 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>]
lagrangian_grad: array([-3.60651033e-09, 4.89643010e-09, 2.21847918e-09, 2.21847918e-09])
message: '`gtol` termination condition is satisfied.'
method: 'tr_interior_point'
nfev: 20
nhev: 0
nit: 14
niter: 14
njev: 4
optimality: 4.896430096425101e-09
status: 1
success: True
tr_radius: 478515625.0
v: [array([-3.60651033e-09, 4.89643010e-09, 2.20955743e-09, 2.20955743e-09])]
x: array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])
Last but not least: using smooth approximations is a common way to achieve differentiability. However, it's worth mentioning that these approximations are not convex. In practice, this means that your optimization problem is not convex and thus you have no guarantee that a stationary point (local minimizer) that has been found is a global optimum. To this end, one needs either to use a global optimization algorithm or to formulate the problem as a MIP. The latter is the recommended approach, from both a mathematical and a practical point of view.
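As a quick check of the approximation discussed in this answer, smooth_if_then(x - arr) can be compared directly against np.where(arr <= x, 1, 0). The numbers below are made up for illustration. The two agree away from the threshold and differ where arr equals x exactly, where the smooth version returns 0.5.

```python
import numpy as np

def smooth_if_then(x):
    """Smooth stand-in for the step function: ~0 for x < 0, ~1 for x > 0."""
    eps = 1e-12
    return 0.5 + x / (2 * np.sqrt(eps + x * x))

arr = np.array([100.0, 177.0, 250.0])  # hypothetical column values
x = 177.0                              # hypothetical threshold

hard = np.where(arr <= x, 1, 0)   # exact step function
soft = smooth_if_then(x - arr)    # smooth approximation
print(hard)
print(soft)
```

This matters for data like the question's, where the interesting parameter values sit exactly on such thresholds.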
Are optimisation builtin functions of Matlab better than Python?
Hi all. I encountered a case where the minimisation results from Matlab are much closer to the mathematical solution (i.e., the one obtained by solving the equations by hand) than the results obtained from Python's SciPy minimize function. I'm not sure what I'm doing wrong or how to improve the results in Python. Any suggestion would be of great help.
The aim of this problem is to find the time period of a set of nonlinear differential equations without evolving them in time. As a test case I took the problem from "This place". The nonlinear differential equations look like this: [equation not shown]
Here I'm implementing a pseudospectral method for periodic systems. The implementation is similar to what is described here; the only change is that I'm taking uniform points, and the "D" matrix is formed using the pseudospectral method.
import numpy as np
from numpy import linalg as LA
import ast
from ast import literal_eval as make_tuple
import scipy
from scipy.optimize import minimize
from scipy.linalg import toeplitz
import matplotlib.pyplot as plt
%matplotlib tk
# This "Dmatrix" is used to get derivative.
def Dmatrix(N):
    h = 2.0*np.pi/N
    col = np.zeros(N)
    col[1:] = 0.5*(-1.0)**np.arange(1,N)/np.sin(np.arange(1,N)*h/2.0)
    row = np.zeros(N)
    row[0] = col[0]
    row[1:] = col[N-1:0:-1]
    D = toeplitz(col,row)
    return D
# Actual differential equations.
def dxD(x,y,t):
    u=(1-(x**2)/4 - (y**2))
    dx=-4*y+x*u
    dy=x+y*u
    return np.array([dx,dy])
# Implementing Pseudo spectral method
def dxFdxD(initial_guess,final_time):
    N=len(initial_guess)//2
    x_guess=initial_guess[:N]
    xl=np.array(x_guess[:])
    y_guess=initial_guess[N:2*N]
    yl=np.array(y_guess[:])
    tf=final_time
    tl=np.arange(1,N+1)*tf/N
    D=Dmatrix(N)
    XYTzipped=zip(xl,yl,tl)
    dX_D=np.array([dxD(xs,ys,ts) for xs,ys,ts in XYTzipped ])
    xlyl=np.array([xl,yl]).transpose()
    dX_F=(np.array(D@xlyl))*(2*np.pi/tf)
    err=np.array(dX_D - dX_F).flatten()
    normError= LA.norm(err, 2)
    return normError
# Initial guess points
N=201
final_time=1.052*np.pi
tf=final_time
tgrid=np.arange(1,N+1)*tf/N
xguess=np.cos(tgrid)*2.0
yguess=-np.cos(tgrid)*0.5
tfl=np.pi*0.85
tfu=1.5*np.pi
tfbounds=(tfl,tfu)
xstates= np.array([xguess,yguess]).flatten()
xstatesParameter=np.array([xstates,final_time], dtype=object)
xins=np.hstack(xstatesParameter).tolist()
# Objective function for optimising
def obj(x):
    N=(len(x)-1)//2
    tf=x[-1]
    xylist=x[:2*N]
    return dxFdxD(xylist,tf)
# Optimization using method='trust-constr'
l1=[tfbounds]
str1=str([bounds123 for bounds123 in l1])
str2=str1.replace("[", "").replace("]", "")
bounds1=make_tuple("("+ "(-5,5),(-5,5),"*N + str2+ ")")
bnds=bounds1
# constraint
def xyradius(x):
    nps=(len(x)-1)//2
    xs=x[:nps]
    ys=x[nps:2*nps]
    xsysZip=zip(xs,ys)
    truelist=[bool((xi**2)+(yi**2)>0.25) for xi,yi in xsysZip]
    result=int(all(truelist))
    return result
xyradiusConstraintType={'type':'ineq','fun':xyradius}
cons=[xyradiusConstraintType]
# Minimising "obj"
sol=minimize(obj, xins, method='trust-constr', bounds=bnds, tol=1e-10)
# Results
x_y_tf=sol.x
x_F=x_y_tf[:N]
y_F=x_y_tf[N:2*N]
tf_system=x_y_tf[-1]
print("time period tf=",tf_system,end="\n \n")
tgrid=np.arange(1,N+1)*tf/N
# Plots
fig = plt.figure(1)
ax = fig.add_subplot(111)
#specify label for the corresponding curve
# ax.set_xticks(tgrid, minor=False)
ax.set_xticks(tgrid, minor=True)
ax.xaxis.grid(True, which='major')
ax.xaxis.grid(True, which='minor')
ax.set_title('Collocation points')
plt.plot(tgrid,x_F,label='x result')
plt.plot(tgrid,y_F,label='y result')
ax.set_title('Optimized result x,y')
plt.legend()
plt.show()
# Parametric plot
ax = plt.figure(4).add_subplot()
ax.plot(x_F,y_F,label='State Space')
ax.legend()
plt.show()
Optimizing (minimizing) using method='SLSQP'
# Scipy for minimization using method='SLSQP'
l1=[tfbounds]
str1=str([bounds123 for bounds123 in l1])
str2=str1.replace("[", "").replace("]", "")
bounds1=make_tuple("("+ "(-5,5),(-5,5),"*N + str2+ ")")
bnds=bounds1
def xyradius(x):
    nps=(len(x)-1)//2
    xs=x[:nps]
    ys=x[nps:2*nps]
    xsysZip=zip(xs,ys)
    truelist=[bool((xi**2)+(yi**2)>0.25) for xi,yi in xsysZip]
    result=int(all(truelist))
    return result
xyradiusConstraintType={'type':'ineq','fun':xyradius}
cons=[xyradiusConstraintType]
sol=minimize(obj, xins, method='SLSQP', bounds=bnds, constraints=cons, tol=1e-10)
When I implemented the same work in Matlab, I got "pi= 3.14" as the solution (the time period of the system), whereas in Python I'm getting "4.70" as the time period. Any suggestions are greatly appreciated. Thank you
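Before comparing optimizers, it can help to confirm that the differentiation matrix itself behaves as expected on a known periodic function. The sketch below reuses Dmatrix from the question and applies it to sin on the same kind of grid; N = 21 and the test function are assumptions chosen for illustration. As far as I can tell, the cosecant form used in Dmatrix is the one for odd N, which matches N = 201 in the question.

```python
import numpy as np
from scipy.linalg import toeplitz

def Dmatrix(N):
    """Fourier differentiation matrix on N equispaced points (N odd here)."""
    h = 2.0 * np.pi / N
    col = np.zeros(N)
    col[1:] = 0.5 * (-1.0) ** np.arange(1, N) / np.sin(np.arange(1, N) * h / 2.0)
    row = np.zeros(N)
    row[0] = col[0]
    row[1:] = col[N - 1:0:-1]
    return toeplitz(col, row)

N = 21
theta = np.arange(1, N + 1) * 2.0 * np.pi / N  # same grid layout as tl, with tf = 2*pi

# d/dtheta sin(theta) = cos(theta); the spectral derivative should match closely.
err = np.max(np.abs(Dmatrix(N) @ np.sin(theta) - np.cos(theta)))
print(err)
```

If this error is tiny, the derivative operator is probably not the source of the discrepancy, and attention can shift to the objective, the 0/1-valued constraint function and the solver settings.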
GPflow 2 custom kernel construction: fine upon construction, but kernel of size None in optimization
I'm creating some GPflow models in which I need the observations before and after a threshold x0 to be independent a priori. I could achieve this with just GP models, or with a ChangePoints kernel with infinite steepness, but neither solution works well with my future extensions in mind (MOGP in particular).
I figured I could easily construct what I want from scratch, so I made a new Combination kernel object, which uses the appropriate child kernel pre- or post-x0. This works as intended when I evaluate the kernel on a set of input points; the expected correlations between points before and after the threshold are zero, and the rest is determined by the child kernels:
import numpy as np
import gpflow
from gpflow.kernels import Matern32
import matplotlib.pyplot as plt
import tensorflow as tf
from gpflow.kernels import Combination

class IndependentKernel(Combination):
    def __init__(self, kernels, x0, forcing_variable=0, name=None):
        self.x0 = x0
        self.forcing_variable = forcing_variable
        super().__init__(kernels, name=name)

    def K(self, X, X2=None):
        # threshold X, X2 based on self.x0, and construct a joint tensor
        if X2 is None:
            X2 = X
        fv = self.forcing_variable
        mask = tf.dtypes.cast(X[:, fv] >= self.x0, tf.int32)
        X_partitioned = tf.dynamic_partition(X, mask, 2)
        X2_partitioned = tf.dynamic_partition(X2, mask, 2)
        K_pre = self.kernels[0].K(X_partitioned[0], X2_partitioned[0])
        K_post = self.kernels[1].K(X_partitioned[1], X2_partitioned[1])
        zero_block_1 = tf.zeros([K_pre.shape[0], K_post.shape[1]], tf.float64)
        zero_block_2 = tf.zeros([K_post.shape[0], K_pre.shape[1]], tf.float64)
        upper_row = tf.concat([K_pre, zero_block_1], axis=1)
        lower_row = tf.concat([zero_block_2, K_post], axis=1)
        return tf.concat([upper_row, lower_row], axis=0)
    #
    def K_diag(self, X):
        fv = self.forcing_variable
        mask = tf.dtypes.cast(X[:, fv] >= self.x0, tf.int32)
        X_partitioned = tf.dynamic_partition(X, mask, 2)
        return tf.concat([self.kernels[0].K_diag(X_partitioned[0]),
                          self.kernels[1].K_diag(X_partitioned[1])],
                         axis=1)
    #
#
def f(x):
    return np.sin(6*(x-0.7))

x0 = 0.3
n = 100
x = np.linspace(0, 1, n)
sigma = 0.5
y = np.random.normal(loc=f(x), scale=sigma)
fv = 0
X = x[:, None]
kernel = IndependentKernel([Matern32(), Matern32()], x0=x0, name='indep')
x_pred = np.linspace(0, 1, 100)
K = kernel.K(x_pred[:, None])  # <- kernel is evaluated correctly here

However, when I want to train a GPflow model with this kernel, I receive the error message TypeError: Expected int32, got None of type 'NoneType' instead. This appears to result from the sub-kernel matrices K_pre and K_post being of size (None, 1), instead of the expected squares (which they correctly are if I evaluate the kernel 'manually').
m = gpflow.models.GPR(data=(X, y[:, None]), kernel=kernel)
gpflow.optimizers.Scipy().minimize(m.training_loss, m.trainable_variables, options=dict(maxiter=10000), method="L-BFGS-B")  # <- K_pre & K_post are of size (None, 1) now?

What can I do to make the kernel properly trainable? I am using GPflow 2.1.3 and TensorFlow 2.4.1.
This is not a GPflow issue but a subtlety of TensorFlow's eager vs. graph mode. In eager mode (the default behaviour when you interact with tensors "manually", as when calling the kernel directly), K_pre.shape works just as expected. In graph mode (which is what you get when you wrap code in tf.function()), this does not always work (e.g. the shape might depend on tf.Variables with None shapes), and you have to use tf.shape(K_pre) instead to obtain the dynamic shape (which depends on the actual values inside the variables).
GPflow's Scipy class by default wraps the loss and gradient computation inside tf.function() to speed up optimization. If you explicitly turn this off by passing compile=False to the minimize() call, your code example runs fine. If you instead replace the .shape attributes with tf.shape() calls to fix it properly, it will likewise run fine.
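To make the difference between the static .shape attribute and tf.shape() concrete, here is a minimal sketch. The helper zero_block and its input_signature are invented for illustration and are not part of GPflow; the signature simply forces an unknown leading dimension, similar to what the kernel sees during compiled optimization.

```python
import tensorflow as tf

# Hypothetical helper (not GPflow API): builds a zero block whose size
# depends on the number of rows of two inputs.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, 1], dtype=tf.float64),
    tf.TensorSpec(shape=[None, 1], dtype=tf.float64),
])
def zero_block(A, B):
    # While tracing, A.shape[0] and B.shape[0] are None because the leading
    # dimension is unknown, so tf.zeros([A.shape[0], B.shape[0]]) would fail
    # in the same way as the kernel in the question.
    n_rows = tf.shape(A)[0]   # dynamic shape, resolved when the graph runs
    n_cols = tf.shape(B)[0]
    return tf.zeros([n_rows, n_cols], dtype=tf.float64)

A = tf.ones([3, 1], dtype=tf.float64)
B = tf.ones([5, 1], dtype=tf.float64)
block = zero_block(A, B)
print(block.shape)
```

In the kernel from the question, the same substitution would apply to the four .shape lookups used to build zero_block_1 and zero_block_2.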
Use Python lmfit with a variable number of parameters in function
I am trying to deconvolve complex gas chromatogram signals into individual gaussian signals. Here is an example, where the dotted line represents the signal I am trying to deconvolve.
I was able to write the code to do this using scipy.optimize.curve_fit; however, once applied to real data the results were unreliable. I believe being able to set bounds to my parameters will improve my results, so I am attempting to use lmfit, which allows this. I am having a problem getting lmfit to work with a variable number of parameters. The signals I am working with may have an arbitrary number of underlying gaussian components, so the number of parameters I need will vary. I found some hints here, but still can't figure it out...
Creating a python lmfit Model with arbitrary number of parameters
Here is the code I am currently working with. The code will run, but the parameter estimates do not change when the model is fit. Does anyone know how I can get my model to work?
import numpy as np
from collections import OrderedDict
from scipy.stats import norm
from lmfit import Parameters, Model

def add_peaks(x_range, *pars):
    y = np.zeros(len(x_range))
    for i in np.arange(0, len(pars), 3):
        curve = norm.pdf(x_range, pars[i], pars[i+1]) * pars[i+2]
        y = y + curve
    return(y)

# generate some fake data
x_range = np.linspace(0, 100, 1000)
peaks = [50., 40., 60.]
a = norm.pdf(x_range, peaks[0], 5) * 2
b = norm.pdf(x_range, peaks[1], 1) * 0.1
c = norm.pdf(x_range, peaks[2], 1) * 0.1
fake = a + b + c

param_dict = OrderedDict()
for i in range(0, len(peaks)):
    param_dict['pk' + str(i)] = peaks[i]
    param_dict['wid' + str(i)] = 1.
    param_dict['mult' + str(i)] = 1.

# In case, you'd like to see the plot of fake data
#y = add_peaks(x_range, *param_dict.values())
#plt.plot(x_range, y)
#plt.show()

# Initialize the model and fit
pmodel = Model(add_peaks)
params = pmodel.make_params()
for i in param_dict.keys():
    params.add(i, value=param_dict[i])

result = pmodel.fit(fake, params=params, x_range=x_range)
print(result.fit_report())
I think you would be better off using lmfit's ability to build composite models. That is, with a single peak defined with
from scipy.stats import norm
from lmfit import Model

def peak(x, amp, center, sigma):
    return amp * norm.pdf(x, center, sigma)

(see also lmfit.models.GaussianModel), you can build a model with many peaks:
npeaks = 3
model = Model(peak, prefix='p1_')
for i in range(1, npeaks):
    model = model + Model(peak, prefix='p%d_' % (i+1))
params = model.make_params()

Now model will be a sum of 3 Gaussian functions, and the params created for that model will have names like p1_amp, p1_center, p2_amp, ..., to which you can add sensible initial values and/or bounds and/or constraints.
Given your example data, you could pass in initial values to make_params like
params = model.make_params(p1_amp=2.0, p1_center=50., p1_sigma=2,
                           p2_amp=0.2, p2_center=40., p2_sigma=2,
                           p3_amp=0.2, p3_center=60., p3_sigma=2)
result = model.fit(fake, params, x=x_range)
I was able to find a solution here: https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes
Building on the code above, the following accomplishes what I was trying to do:
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel

gauss1 = GaussianModel(prefix='g1_')
gauss2 = GaussianModel(prefix='g2_')
gauss3 = GaussianModel(prefix='g3_')
gauss4 = GaussianModel(prefix='g4_')
gauss5 = GaussianModel(prefix='g5_')
gauss = [gauss1, gauss2, gauss3, gauss4, gauss5]
prefixes = ['g1_', 'g2_', 'g3_', 'g4_', 'g5_']

# use only as many components as there are peaks
mod = np.sum(gauss[0:len(peaks)])
pars = mod.make_params()
for i, prefix in zip(range(0, len(peaks)), prefixes[0:len(peaks)]):
    pars[prefix + 'center'].set(peaks[i])

init = mod.eval(pars, x=x_range)
out = mod.fit(fake, pars, x=x_range)
print(out.fit_report(min_correl=0.5))
out.plot_fit()
plt.show()