Least squares fit in python for 3d surface - python

I would like to fit my surface equation to some data. I already tried scipy.optimize.leastsq but as I cannot specify the bounds it gives me an unusable results. I also tried scipy.optimize.least_squares but it gives me an error:
ValueError: too many values to unpack
My equation is:
f(x,y,z)=(x-A+y-B)/2+sqrt(((x-A-y+B)/2)^2+C*z^2)
parameters A, B, C should be found so that the equation above would be as close as possible to zero when the following points are used for x,y,z:
[
[-0.071, -0.85, 0.401],
[-0.138, -1.111, 0.494],
[-0.317, -0.317, -0.317],
[-0.351, -2.048, 0.848]
]
The bounds would be A > 0, B > 0, C > 1
How I should obtain such a fit? What is the best tool in python to do that. I searched for examples on how to fit 3d surfaces but most of examples involving function fitting is about line or flat surface fits.

I've edited this answer to provide a more general example of how this problem can be solved with scipy's general optimize.minimize method as well as scipy's optimize.least_squares method.
First lets set up the problem:
import numpy as np
import scipy.optimize
# ===============================================
# SETUP: define common compoments of the problem
def our_function(coeff, data):
"""
The function we care to optimize.
Args:
coeff (np.ndarray): are the parameters that we care to optimize.
data (np.ndarray): the input data
"""
A, B, C = coeff
x, y, z = data.T
return (x - A + y - B) / 2 + np.sqrt(((x - A - y + B) / 2) ** 2 + C * z ** 2)
# Define some training data
data = np.array([
[-0.071, -0.85, 0.401],
[-0.138, -1.111, 0.494],
[-0.317, -0.317, -0.317],
[-0.351, -2.048, 0.848]
])
# Define training target
# This is what we want the target function to be equal to
target = 0
# Make an initial guess as to the parameters
# either a constant or random guess is typically fine
num_coeff = 3
coeff_0 = np.ones(num_coeff)
# coeff_0 = np.random.rand(num_coeff)
This isn't strictly least squares, but how about something like this?
This solution is like throwing a sledge hammer at the problem. There probably is a way to use least squares to get a solution more efficiently using an SVD solver, but if you're just looking for an answer scipy.optimize.minimize will find you one.
# ===============================================
# FORMULATION #1: a general minimization problem
# Here the bounds and error are all specified within the general objective function
def general_objective(coeff, data, target):
"""
General function that simply returns a value to be minimized.
The coeff will be modified to minimize whatever the output of this function
may be.
"""
# Constraints to keep coeff above 0
if np.any(coeff < 0):
# If any constraint is violated return infinity
return np.inf
# The function we care about
prediction = our_function(coeff, data)
# (optional) L2 regularization to keep coeff small
# (optional) reg_amount = 0.0
# (optional) reg = reg_amount * np.sqrt((coeff ** 2).sum())
losses = (prediction - target) ** 2
# (optional) losses += reg
# Return the average squared error
loss = losses.sum()
return loss
general_result = scipy.optimize.minimize(general_objective, coeff_0,
method='Nelder-Mead',
args=(data, target))
# Test what the squared error of the returned result is
coeff = general_result.x
general_output = our_function(coeff, data)
print('====================')
print('general_result =\n%s' % (general_result,))
print('---------------------')
print('general_output = %r' % (general_output,))
print('====================')
The output looks like this:
====================
general_result =
final_simplex: (array([[ 2.45700466e-01, 7.93719271e-09, 1.71257109e+00],
[ 2.45692680e-01, 3.31991619e-08, 1.71255150e+00],
[ 2.45726858e-01, 6.52636219e-08, 1.71263360e+00],
[ 2.45713989e-01, 8.06971686e-08, 1.71260234e+00]]), array([ 0.00012404, 0.00012404, 0.00012404, 0.00012404]))
fun: 0.00012404137498459109
message: 'Optimization terminated successfully.'
nfev: 431
nit: 240
status: 0
success: True
x: array([ 2.45700466e-01, 7.93719271e-09, 1.71257109e+00])
---------------------
general_output = array([ 0.00527974, -0.00561568, -0.00719941, 0.00357748])
====================
I found in the documentation that all you need to do to adapt this to actual least squares is to specify the function that computes the residuals.
# ===============================================
# FORMULATION #2: a special least squares problem
# Here all that is needeed is a function that computes the vector of residuals
# the optimization function takes care of the rest
def least_squares_residuals(coeff, data, target):
"""
Function that returns the vector of residuals between the predicted values
and the target value. Here we want each predicted value to be close to zero
"""
A, B, C = coeff
x, y, z = data.T
prediction = our_function(coeff, data)
vector_of_residuals = (prediction - target)
return vector_of_residuals
# Here the bounds are specified in the optimization call
bound_gt = np.full(shape=num_coeff, fill_value=0, dtype=np.float)
bound_lt = np.full(shape=num_coeff, fill_value=np.inf, dtype=np.float)
bounds = (bound_gt, bound_lt)
lst_sqrs_result = scipy.optimize.least_squares(least_squares_residuals, coeff_0,
args=(data, target), bounds=bounds)
# Test what the squared error of the returned result is
coeff = lst_sqrs_result.x
lst_sqrs_output = our_function(coeff, data)
print('====================')
print('lst_sqrs_result =\n%s' % (lst_sqrs_result,))
print('---------------------')
print('lst_sqrs_output = %r' % (lst_sqrs_output,))
print('====================')
The output here is:
====================
lst_sqrs_result =
active_mask: array([ 0, -1, 0])
cost: 6.197329866927735e-05
fun: array([ 0.00518416, -0.00564099, -0.00710112, 0.00385024])
grad: array([ -4.61826888e-09, 3.70771396e-03, 1.26659198e-09])
jac: array([[-0.72611025, -0.27388975, 0.13653112],
[-0.74479565, -0.25520435, 0.1644325 ],
[-0.35777232, -0.64222767, 0.11601263],
[-0.77338046, -0.22661953, 0.27104366]])
message: '`gtol` termination condition is satisfied.'
nfev: 13
njev: 13
optimality: 4.6182688779976278e-09
status: 1
success: True
x: array([ 2.46392438e-01, 5.39025298e-17, 1.71555150e+00])
---------------------
lst_sqrs_output = array([ 0.00518416, -0.00564099, -0.00710112, 0.00385024])
====================

Related

Python weighted quantile as R wtd.quantile()

I want to convert the R package Hmisc::wtd.quantile() into python.
Here is the example in R:
I took this as reference and it seems that the logics are different than R:
# First function
def weighted_quantile(values, quantiles, sample_weight = None,
values_sorted = False, old_style = False):
""" Very close to numpy.percentile, but supports weights.
NOTE: quantiles should be in [0, 1]!
:param values: numpy.array with data
:param quantiles: array-like with many quantiles needed
:param sample_weight: array-like of the same length as `array`
:return: numpy.array with computed quantiles.
"""
values = np.array(values)
quantiles = np.array(quantiles)
if sample_weight is None:
sample_weight = np.ones(len(values))
sample_weight = np.array(sample_weight)
assert np.all(quantiles >= 0) and np.all(quantiles <= 1), 'quantiles should be in [0, 1]'
if not values_sorted:
sorter = np.argsort(values)
values = values[sorter]
sample_weight = sample_weight[sorter]
# weighted_quantiles = np.cumsum(sample_weight)
# weighted_quantiles /= np.sum(sample_weight)
weighted_quantiles = np.cumsum(sample_weight)/np.sum(sample_weight)
return np.interp(quantiles, weighted_quantiles, values)
weighted_quantile(values = [0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319],
quantiles = np.arange(0, 1 + 1 / 5, 1 / 5),
sample_weight = [1,1,1,1,1])
>> array([0.2136325, 0.2136325, 0.4079128, 0.4890342, 0.5083345, 0.6197319])
# Second function
def weighted_percentile(data, weights, perc):
"""
perc : percentile in [0-1]!
"""
data = np.array(data)
weights = np.array(weights)
ix = np.argsort(data)
data = data[ix] # sort data
weights = weights[ix] # sort weights
cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights) # 'like' a CDF function
return np.interp(perc, cdf, data)
weighted_percentile([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319], [1,1,1,1,1], np.arange(0, 1 + 1 / 5, 1 / 5))
>> array([0.2136325 , 0.31077265, 0.4484735 , 0.49868435, 0.5640332 ,
0.6197319 ])
Both are different with R. Any idea?
I am Python-illiterate, but from what I see and after some quick checks I can tell you the following.
Here you use uniform (sampling) weights, so you could also directly use the quantile() function. Not surprisingly, it gives the same results as wtd.quantile() with uniform weights:
x <- c(0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319)
n <- length(x)
x <- sort(x)
quantile(x, probs = seq(0,1,0.2))
# 0% 20% 40% 60% 80% 100%
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319
The R quantile() function get the quantiles in a 'textbook' way, i.e. by determining the index i of the obs to use with i = q(n+1).
In your case:
seq(0,1,0.2)*(n+1)
# 0.0 1.2 2.4 3.6 4.8 6.0
Of course since you have 5 values/obs and you want quintiles, the indices are not integers. But you know for example that the first quintile (i = 1.2) lies between obs 1 and obs 2. More precisely, it is a linear combination of the two observations (the 'weights' are derived from the value of the index):
0.2*x[1] + 0.8*x[2]
# 0.3690567
You can do the same for the all the quintiles, on the basis of the indices:
q <-
c(min(x), ## 0: actually, the first obs
0.2*x[1] + 0.8*x[2], ## 1.2: quintile lies between obs 1 and 2
0.4*x[2] + 0.6*x[3], ## 2.4: quintile lies between obs 2 and 3
0.6*x[3] + 0.4*x[4], ## 3.6: quintile lies between obs 3 and 4
0.8*x[4] + 0.2*x[5], ## 4.8: quintile lies between obs 4 and 5
max(x) ## 6: actually, the last obs
)
q
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319
You can see that you get exactly the output of quantile() and wtd.quantile().
If instead of 0.2*x[1] + 0.8*x[2] we consider the following:
0.5*x[1] + 0.5*x[2]
# 0.3107726
We get the output of your second Python function. It appears that your second function considers uniform 'weights' (obviously I am not talking about the sampling weights here) when combining the two observations. The issue (at least for the second Python function) seems to come from this. I know these are just insights, but I hope they will help.
EDIT: note that the difference between the two is not necessary an 'issue' with the python code. There are different quantile estimators (and their weighted versions) and the python functions could simply rely on a different estimator than Hmisc::wtd.quantile(). I think that the latter uses the weighted version of the Harrell-Davis quantile estimator. If you really want to implement this one, you should check the source code of Hmisc::wtd.quantile() and try to 'directly' translate this into Python.

Find the value of variables to maximize return of function in Python

I'd want to achieve similar result as how the Solver-function in Excel is working. I've been reading of Scipy optimization and been trying to build a function which outputs what I would like to find the maximal value of. The equation is based on four different variables which, see my code below:
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
'Dividend2': [9390, 7448, 177],
'Probability': [341, 376, 452],
'EV': [0.53, 0.60, 0.55],
'Dividend': [185, 55, 755],
'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)
def myFunc(params):
"""myFunc metric."""
(ev, bv, vc, dv) = params
df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
+ np.where(df['EV2'] <= dv, 1, 0)
df['Return'] = np.where(
df['EV'] <= ev, 0, np.where(
df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
)
)
return -1 * (df['Return'].sum())
b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
So I'd like to find the maximum value of the column Return in df when changing the variables ev,bv,vc & dv. I'd like them to be between in the intervals of ev: 0.2-4, bv: 300-600, vc: 0-1000 & dv: 0-1000.
When running my code it seem like the function stops at x0.
Solution
I will use optuna library to give you a solution to the type of problem you are trying to solve. I have tried using scipy.optimize.minimize and it appears that the loss-landscape is probably quite flat in most places, and hence the tolerances enforce the minimizing algorithm (L-BFGS-B) to stop prematurely.
Optuna Docs: https://optuna.readthedocs.io/en/stable/index.html
With optuna, it rather straight forward. Optuna only requires an objective function and a study. The study send various trials to the objective function, which in turn, evaluates the metric of your choice.
I have defined another metric function myFunc2 by mostly removing the np.where calls, as you can do-away with them (reduces number of steps) and make the function slightly faster.
# install optuna with pip
pip install -Uqq optuna
Although I looked into using a rather smooth loss landscape, sometimes it is necessary to visualize the landscape itself. The answer in section B elaborates on visualization. But, what if you want to use a smoother metric function? Section D sheds some light on this.
Order of code-execution should be:
Sections: C >> B >> B.1 >> B.2 >> B.3 >> A.1 >> A.2 >> D
A. Building Intuition
If you create a hiplot (also known as a plot with parallel-coordinates) with all the possible parameter values as mentioned in the search_space for Section B.2, and plot the lowest 50 outputs of myFunc2, it would look like this:
Plotting all such points from the search_space would look like this:
A.1. Loss Landscape Views for Various Parameter-Pairs
These figures show that mostly the loss-landscape is flat for any two of the four parameters (ev, bv, vc, dv). This could be a reason why, only GridSampler (which brute-forces the searching process) does better, compared to the other two samplers (TPESampler and RandomSampler). Please click on any of the images below to view them enlarged. This could also be the reason why scipy.optimize.minimize(method="L-BFGS-B") fails right off the bat.
01. dv-vc
02. dv-bv
03. dv-ev
04. bv-ev
05. cv-ev
06. vc-bv
# Create contour plots for parameter-pairs
study_name = "GridSampler"
study = studies.get(study_name)
views = [("dv", "vc"), ("dv", "bv"), ("dv", "ev"),
("bv", "ev"), ("vc", "ev"), ("vc", "bv")]
for i, (x, y) in enumerate(views):
print(f"Figure: {i}/{len(views)}")
study_contour_plot(study=study, params=(x, y))
A.2. Parameter Importance
study_name = "GridSampler"
study = studies.get(study_name)
fig = optuna.visualization.plot_param_importances(study)
fig.update_layout(title=f'Hyperparameter Importances: {study.study_name}',
autosize=False,
width=800, height=500,
margin=dict(l=65, r=50, b=65, t=90))
fig.show()
B. Code
Section B.3. finds the lowest metric -88.333 for:
{'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
import warnings
from functools import partial
from typing import Iterable, Optional, Callable, List
import pandas as pd
import numpy as np
import optuna
from tqdm.notebook import tqdm
warnings.filterwarnings("ignore", category=optuna.exceptions.ExperimentalWarning)
optuna.logging.set_verbosity(optuna.logging.WARNING)
PARAM_NAMES: List[str] = ["ev", "bv", "vc", "dv",]
DEFAULT_METRIC_FUNC: Callable = myFunc2
def myFunc2(params):
"""myFunc metric v2 with lesser steps."""
global df # define as a global variable
(ev, bv, vc, dv) = params
df['Number'] = (df['Dividend2'] <= vc) * 1 + (df['EV2'] <= dv) * 1
df['Return'] = (
(df['EV'] > ev)
* (df['Probability'] < bv)
* (df['Number'] * df['Dividend'] - (vc + dv))
)
return -1 * (df['Return'].sum())
def make_param_grid(
bounds: List[Tuple[float, float]],
param_names: Optional[List[str]]=None,
num_points: int=10,
as_dict: bool=True,
) -> Union[pd.DataFrame, Dict[str, List[float]]]:
"""
Create parameter search space.
Example:
grid = make_param_grid(bounds=b1, num_points=10, as_dict=True)
"""
if param_names is None:
param_names = PARAM_NAMES # ["ev", "bv", "vc", "dv"]
bounds = np.array(bounds)
grid = np.linspace(start=bounds[:,0],
stop=bounds[:,1],
num=num_points,
endpoint=True,
axis=0)
grid = pd.DataFrame(grid, columns=param_names)
if as_dict:
grid = grid.to_dict()
for k,v in grid.items():
grid.update({k: list(v.values())})
return grid
def objective(trial,
bounds: Optional[Iterable]=None,
func: Optional[Callable]=None,
param_names: Optional[List[str]]=None):
"""Objective function, necessary for optimizing with optuna."""
if param_names is None:
param_names = PARAM_NAMES
if (bounds is None):
bounds = ((-10, 10) for _ in param_names)
if not isinstance(bounds, dict):
bounds = dict((p, (min(b), max(b)))
for p, b in zip(param_names, bounds))
if func is None:
func = DEFAULT_METRIC_FUNC
params = dict(
(p, trial.suggest_float(p, bounds.get(p)[0], bounds.get(p)[1]))
for p in param_names
)
# x = trial.suggest_float('x', -10, 10)
return func((params[p] for p in param_names))
def optimize(objective: Callable,
sampler: Optional[optuna.samplers.BaseSampler]=None,
func: Optional[Callable]=None,
n_trials: int=2,
study_direction: str="minimize",
study_name: Optional[str]=None,
formatstr: str=".4f",
verbose: bool=True):
"""Optimizing function using optuna: creates a study."""
if func is None:
func = DEFAULT_METRIC_FUNC
study = optuna.create_study(
direction=study_direction,
sampler=sampler,
study_name=study_name)
study.optimize(
objective,
n_trials=n_trials,
show_progress_bar=True,
n_jobs=1,
)
if verbose:
metric = eval_metric(study.best_params, func=myFunc2)
msg = format_result(study.best_params, metric,
header=study.study_name,
format=formatstr)
print(msg)
return study
def format_dict(d: Dict[str, float], format: str=".4f") -> Dict[str, float]:
"""
Returns formatted output for a dictionary with
string keys and float values.
"""
return dict((k, float(f'{v:{format}}')) for k,v in d.items())
def format_result(d: Dict[str, float],
metric_value: float,
header: str='',
format: str=".4f"):
"""Returns formatted result."""
msg = f"""Study Name: {header}\n{'='*30}
✅ study.best_params: \n\t{format_dict(d)}
✅ metric: {metric_value}
"""
return msg
def study_contour_plot(study: optuna.Study,
params: Optional[List[str]]=None,
width: int=560,
height: int=500):
"""
Create contour plots for a study, given a list or
tuple of two parameter names.
"""
if params is None:
params = ["dv", "vc"]
fig = optuna.visualization.plot_contour(study, params=params)
fig.update_layout(
title=f'Contour Plot: {study.study_name} ({params[0]}, {params[1]})',
autosize=False,
width=width,
height=height,
margin=dict(l=65, r=50, b=65, t=90))
fig.show()
bounds = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
param_names = PARAM_NAMES # ["ev", "bv", "vc", "dv",]
pobjective = partial(objective, bounds=bounds)
# Create an empty dict to contain
# various subsequent studies.
studies = dict()
Optuna comes with a few different types of Samplers. Samplers provide the strategy of how optuna is going to sample points from the parametr-space and evaluate the objective function.
https://optuna.readthedocs.io/en/stable/reference/samplers.html
B.1 Use TPESampler
from optuna.samplers import TPESampler
sampler = TPESampler(seed=42)
study_name = "TPESampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=100,
study_name=study_name,
)
# Study Name: TPESampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
B.2. Use GridSampler
GridSampler requires a parameter search grid. Here we are using the following search_space.
from optuna.samplers import GridSampler
# create search-space
search_space = make_param_grid(bounds=bounds, num_points=10, as_dict=True)
sampler = GridSampler(search_space)
study_name = "GridSampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=2000,
study_name=study_name,
)
# Study Name: GridSampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
# ✅ metric: -88.33333333333337
B.3. Use RandomSampler
from optuna.samplers import RandomSampler
sampler = RandomSampler(seed=42)
study_name = "RandomSampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=300,
study_name=study_name,
)
# Study Name: RandomSampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
C. Dummy Data
For the sake of reproducibility, I am keeping a record of the dummy data used here.
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
'Dividend2': [9390, 7448, 177],
'Probability': [341, 376, 452],
'EV': [0.53, 0.60, 0.55],
'Dividend': [185, 55, 755],
'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)
def myFunc(params):
"""myFunc metric."""
(ev, bv, vc, dv) = params
df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
+ np.where(df['EV2'] <= dv, 1, 0)
df['Return'] = np.where(
df['EV'] <= ev, 0, np.where(
df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
)
)
return -1 * (df['Return'].sum())
b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
C.1. An Observation
So, it seems at first glance that the code executed properly and did not throw any error. It says it had success in finding the minimized solution.
fun: -0.0
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
jac: array([0., 0., 3., 3.])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' # 💡
nfev: 35
nit: 2
status: 0
success: True
x: array([2.e-01, 6.e+02, 0.e+00, 0.e+00]) # 🔥
A close observation reveals that the solution (see 🔥) is no different from the starting point [0.2, 600, 1000, 1000]. So, seems like nothing really happened and the algorithm just finished prematurely?!!
Now look at the message above (see 💡). If we run a google search on this, you could find something like this:
Summary
b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
If the loss-landscape does not have a smoothely changing topography, the gradient descent algorithms will soon find that from one iteration to the next, there isn't much change happening and hence, will terminate further seeking. Also, if the loss-landscape is rather flat, this could see similar fate and get early-termination.
scipy-optimize-minimize does not perform the optimization - CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
D. Making the Loss Landscape Smoother
A binary evaluation of value = 1 if x>5 else 0 is essentially a step-function that assigns 1 for all values of x that are greater than 5 and 0 otherwise. But this introduces a kink - a discontinuity in smoothness and this could potentially introduce problems in traversing the loss-landscape.
What if we use a sigmoid function to introduce some smoothness?
# Define sigmoid function
def sigmoid(x):
"""Sigmoid function."""
return 1 / (1 + np.exp(-x))
For the above example, we could modify it as follows.
You can additionally introduce another factor (gamma: γ) as follows and try to optimize it to make the landscape smoother. Thus by controlling the gamma factor, you could make the function smoother and change how quickly it changes around x = 5
The above figure is created with the following code-snippet.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg' # 'svg', 'retina'
plt.style.use('seaborn-white')
def make_figure(figtitle: str="Sigmoid Function"):
"""Make the demo figure for using sigmoid."""
x = np.arange(-20, 20.01, 0.01)
y1 = sigmoid(x)
y2 = sigmoid(x - 5)
y3 = sigmoid((x - 5)/3)
y4 = sigmoid((x - 5)/0.3)
fig, ax = plt.subplots(figsize=(10,5))
plt.sca(ax)
plt.plot(x, y1, ls="-", label="$\sigma(x)$")
plt.plot(x, y2, ls="--", label="$\sigma(x - 5)$")
plt.plot(x, y3, ls="-.", label="$\sigma((x - 5) / 3)$")
plt.plot(x, y4, ls=":", label="$\sigma((x - 5) / 0.3)$")
plt.axvline(x=0, ls="-", lw=1.3, color="cyan", alpha=0.9)
plt.axvline(x=5, ls="-", lw=1.3, color="magenta", alpha=0.9)
plt.legend()
plt.title(figtitle)
plt.show()
make_figure()
D.1. Example of Metric Smoothing
The following is an example of how you could apply function smoothing.
from functools import partial
def sig(x, gamma: float=1.):
return sigmoid(x/gamma)
def myFunc3(params, gamma: float=0.5):
"""myFunc metric v3 with smoother metric."""
(ev, bv, vc, dv) = params
_sig = partial(sig, gamma=gamma)
df['Number'] = _sig(x = -(df['Dividend2'] - vc)) * 1 \
+ _sig(x = -(df['EV2'] - dv)) * 1
df['Return'] = (
_sig(x = df['EV'] - ev)
* _sig(x = -(df['Probability'] - bv))
* _sig(x = df['Number'] * df['Dividend'] - (vc + dv))
)
return -1 * (df['Return'].sum())
As already mentioned in my comment, the crucial problem is that np.where() is neither differentiable nor continuous. Consequently, your objective function violates the mathematical assumptions for most of the (derivate-based) algorithms under the hood of scipy.optimize.minimize.
So, basically, you've got three options:
Use a derivative-free algorithm and hope for the best.
Replace np.where() with a smooth approximation such that your objective is continuously differentiable.
Reformulate your problem as a MIP.
Since #CypherX answer pursues approach 1, I'd like to focus on 2. Here, the main idea is to approximate the np.where function. One possible approximation is
def smooth_if_then(x):
eps = 1e-12
return 0.5 + x/(2*np.sqrt(eps + x*x))
which is continuous and differentiable. Then, given a np.ndarray arr and a scalar value x, the expression np.where(arr <= x, 1, 0) is equivalent to smooth_if_then(x - arr).
Hence, the objective function becomes:
div = df['Dividend'].values
div2 = df['Dividend2'].values
ev2 = df['EV2'].values
ev = df['EV'].values
prob = df['Probability'].values
def objective(x, *params):
ev, bv, vc, dv = x
div_vals, div2_vals, ev2_vals, ev_vals, prob_vals = params
number = smooth_if_then(vc - div2_vals) + smooth_if_then(dv - ev2_vals)
part1 = smooth_if_then(bv - prob_vals) * (number * div_vals - (vc + dv))
part2 = smooth_if_then(-1*(ev - ev_vals)) * part1
return -1 * part2.sum()
and using the trust-constr algorithm (which is the most robust one inside scipy.optimize.minimize), yields:
res = minimize(lambda x: objective(x, div, div2, ev2, ev, prob), x0=start, bounds=b1, method="trust-constr")
barrier_parameter: 1.0240000000000006e-08
barrier_tolerance: 1.0240000000000006e-08
cg_niter: 5
cg_stop_cond: 0
constr: [array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])]
constr_nfev: [0]
constr_nhev: [0]
constr_njev: [0]
constr_penalty: 1.0
constr_violation: 0.0
execution_time: 0.2951819896697998
fun: 1.3046631387761482e-08
grad: array([0.00000000e+00, 0.00000000e+00, 8.92175218e-12, 8.92175218e-12])
jac: [<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>]
lagrangian_grad: array([-3.60651033e-09, 4.89643010e-09, 2.21847918e-09, 2.21847918e-09])
message: '`gtol` termination condition is satisfied.'
method: 'tr_interior_point'
nfev: 20
nhev: 0
nit: 14
niter: 14
njev: 4
optimality: 4.896430096425101e-09
status: 1
success: True
tr_radius: 478515625.0
v: [array([-3.60651033e-09, 4.89643010e-09, 2.20955743e-09, 2.20955743e-09])]
x: array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])
Last but not least: Using smooth approximations is a common way to achieve differentiability. However, it's worth mentioning that these approximations are not convex. In practice, this means that your optimization problem is not convex and thus, you have no guarantee that a found stationary point (local minimizer) is a global optimum. For this end, one either needs to use a global optimization algorithm or formulate the problem as a MIP. The latter is the recommended approach, both from a mathematical and a practice point of view.

Trying to solve this non linear optimization using GEKKO, getting this error

#Error: setting an array element with a sequence
I am trying to mninimize the downside risk.
I have a two dimensional array of returns shape(1000, 10), and the portfolio starts with $100. Compound that 10 times by each return in a row. Do that for all the rows. Compare that last cell's value for each row with mean of last column's values. Keep the value if it's less than mean or else zero. So we will have an array of (1000, 1). At the end I am finding the standard deviation of that.
Objective is to minimize the standard deviation.
Constraints: weights need to be less than 1
the expected return i.e. wt*ret should be equal to a value like 7%. I have to do that for couple of values like 7%, 8% or 10%.
wt = np.array([0.4, 0.3, 0.3])
cov = array([[0.00026566, 0.00016167, 0.00011949],
[0.00016167, 0.00065866, 0.00021662],
[0.00011949, 0.00021662, 0.00043748]])
ret =[.098, 0.0620,.0720]
iterations = 10000
return_sim = np.random.multivariate_normal(ret, cov, iterations)
def simulations(wt):
downside =[]
fund_ret =np.zeros((1000,10))
prt_ret = np.dot(return_sim , wt)
re_ret = np.array(prt_ret).reshape(1000, 10) #10 years
for m in range(len(re_ret)):
fund_ret[m][0] = 100 * (1 + re_ret[m][0]) #start with $100
for n in range(9):
fund_ret[m][n+1] = fund_ret[m][n]* (1 + re_ret[m][n+1])
mean = np.mean(fund_ret[:,-1]) #just need the last column and all rows
for i in range(1000):
downside.append(np.maximum((mean - fund_ret[i,-1]), 0))
return np.std(downside)
b = GEKKO()
w = b.Array(b.Var,3,value=0.33,lb=1e-5, ub=1)
b.Equation(b.sum(w)<=1)
b.Equation(np.dot(w,ret) == .07)
b.Minimize(simulations(w))
b.solve(disp=False)
#simulations(wt)
If you comment out the gekko section and call the simulation function at the bottom, it works fine
In this case, you would want to consider a different optimizer such as scipy.minimize.optimize. The function np.std() is not currently supported in Gekko. Gekko compiles the model into byte-code for automatic differentiation so you need to fit the problem into a form that is supported. Gekko's approach has several advantages, especially for large-scale or non-linear problems. For small problems with fewer than 100 variables and nearly linear constraints, an optimizer such as scipy.minimize.optimize is often a viable option. Here is your problem with a solution:
import numpy as np
from scipy.optimize import minimize
wt = np.array([0.4, 0.3, 0.3])
cov = np.array([[0.00026566, 0.00016167, 0.00011949],
[0.00016167, 0.00065866, 0.00021662],
[0.00011949, 0.00021662, 0.00043748]])
ret =[.098, 0.0620,.0720]
iterations = 10000
return_sim = np.random.multivariate_normal(ret, cov, iterations)
def simulations(wt):
downside =[]
fund_ret =np.zeros((1000,10))
prt_ret = np.dot(return_sim , wt)
re_ret = np.array(prt_ret).reshape(1000, 10) #10 years
for m in range(len(re_ret)):
fund_ret[m][0] = 100 * (1 + re_ret[m][0]) #start with $100
for n in range(9):
fund_ret[m][n+1] = fund_ret[m][n]* (1+re_ret[m][n+1])
#just need the last column and all rows
mean = np.mean(fund_ret[:,-1])
for i in range(1000):
downside.append(np.maximum((mean - fund_ret[i,-1]), 0))
return np.std(downside)
b = (1e-5,1); bnds=(b,b,b)
cons = ({'type': 'ineq', 'fun': lambda x: sum(x)-1},\
{'type': 'eq', 'fun': lambda x: np.dot(x,ret)-.07})
sol = minimize(simulations,wt,bounds=bnds,constraints=cons)
w = sol.x
print(w)
This produces the solution sol with optimal values w=sol.x:
fun: 6.139162309118155
jac: array([ 8.02691203, 10.04863131, 9.49171901])
message: 'Optimization terminated successfully.'
nfev: 33
nit: 6
njev: 6
status: 0
success: True
x: array([0.09741111, 0.45326888, 0.44932001])

Custom Theano Op to do numerical integration

I'm attempting to write a custom Theano Op which numerically integrates a function between two values. The Op is a custom likelihood for PyMC3 which involves the numerical evaluation of some integrals. I can't simply use the #as_op decorator as I need to use HMC to do the MCMC step. Any help would be much appreciated, as this question seems to have come up several times but has never been solved (e.g. https://stackoverflow.com/questions/36853015/using-theano-with-numerical-integration, Theano: implementing an integral function).
Clearly one solution would be to write a numerical integrator within Theano, but this seems like a waste of effort when very good integrators are already available, for example through scipy.integrate.
To keep this as a minimal example, let's just try and integrate a function between 0 and 1 inside an Op. The following integrates a Theano function outside of an Op, and produces correct results as far as my testing has gone.
import theano
import theano.tensor as tt
from scipy.integrate import quad
x = tt.dscalar('x')
y = x**4 # integrand
f = theano.function([x], y)
print f(0)
print f(1)
ans = integrate.quad(f, 0, 1)[0]
print ans
However, attempting to do integration within an Op appears much harder. My current best effort is:
import numpy as np
import theano
import theano.tensor as tt
from scipy import integrate
class IntOp(theano.Op):
__props__ = ()
def make_node(self, x):
x = tt.as_tensor_variable(x)
return theano.Apply(self, [x], [x.type()])
def perform(self, node, inputs, output_storage):
x = inputs[0]
z = output_storage[0]
f_to_int = theano.function([x], x)
z[0] = tt.as_tensor_variable(integrate.quad(f_to_int, 0, 1)[0])
def infer_shape(self, node, i0_shapes):
return i0_shapes
def grad(self, inputs, output_grads):
ans = integrate.quad(output_grads[0], 0, 1)[0]
return [ans]
intOp = IntOp()
x = tt.dmatrix('x')
y = intOp(x)
f = theano.function([x], y)
inp = np.asarray([[2, 4], [6, 8]], dtype=theano.config.floatX)
out = f(inp)
print inp
print out
Which gives the following error:
Traceback (most recent call last):
File "stackoverflow.py", line 35, in <module>
out = f(inp)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 912, in rval
r = p(n, [x[0] for x in i], o)
File "stackoverflow.py", line 17, in perform
f_to_int = theano.function([x], x)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 390, in pfunc
for p in params]
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 489, in _pfunc_param_to_in
raise TypeError('Unknown parameter type: %s' % type(param))
TypeError: Unknown parameter type: <type 'numpy.ndarray'>
Apply node that caused the error: IntOp(x)
Toposort index: 0
Inputs types: [TensorType(float64, matrix)]
Inputs shapes: [(2, 2)]
Inputs strides: [(16, 8)]
Inputs values: [array([[ 2., 4.],
[ 6., 8.]])]
Outputs clients: [['output']]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "stackoverflow.py", line 30, in <module>
y = intOp(x)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "stackoverflow.py", line 11, in make_node
return theano.Apply(self, [x], [x.type()])
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I'm surprised by this, especially the TypeError, as I thought I had converted the output_storage variable into a tensor but it appears to believe here that it is still an ndarray.
I found your question because I'm trying to build a random variable in PyMC3 that represents a general point process (Hawkes, Cox, Poisson, etc) and the likelihood function has an integral. I really want to be able to use Hamiltonian Monte Carlo or NUTS samplers, so I needed that integral with respect to time to be differentiable.
Starting off of your attempt, I made an integrateOut theano Op that seems to work correctly with the behavior I need. I've tested it out on a few different inputs (not on my stats model just yet, but it appears promising!). I'm a total theano n00b, so pardon any stupidity. I would greatly appreciate feedback if anyone has any. Not sure it's exactly what you're looking for, but here's my solution (example at the bottom and in the doc strings). *EDIT: simplified some remnants of screwing around with ways to do this.
import theano
import theano.tensor as T
from scipy.integrate import quad
class integrateOut(theano.Op):
"""
Integrate out a variable from an expression, computing
the definite integral w.r.t. the variable specified
!!! Only implemented in this for scalars !!!
Parameters
----------
f : scalar
input 'function' to integrate
t : scalar
the variable to integrate out
t0: float
lower integration limit
tf: float
upper integration limit
Returns
-------
scalar
a new scalar with the 't' integrated out
Notes
-----
usage of this looks like:
x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')
z = (x**2 + y**2)*t
# integrate z w.r.t. t as a function of (x,y)
intZ = integrateOut(z,t,0.0,5.0)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
"""
def __init__(self,f,t,t0,tf,*args,**kwargs):
super(integrateOut,self).__init__()
self.f = f
self.t = t
self.t0 = t0
self.tf = tf
def make_node(self,*inputs):
self.fvars=list(inputs)
# This will fail when taking the gradient... don't be concerned
try:
self.gradF = T.grad(self.f,self.fvars)
except:
self.gradF = None
return theano.Apply(self,self.fvars,[T.dscalar().type()])
def perform(self,node, inputs, output_storage):
# Everything else is an argument to the quad function
args = tuple(inputs)
# create a function to evaluate the integral
f = theano.function([self.t]+self.fvars,self.f)
# actually compute the integral
output_storage[0][0] = quad(f,self.t0,self.tf,args=args)[0]
def grad(self,inputs,grads):
return [integrateOut(g,self.t,self.t0,self.tf)(*inputs)*grads[0] \
for g in self.gradF]
x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')
z = (x**2+y**2)*t
intZ = integrateOut(z,t,0,1)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
print funcIntZ(2,2)
print funcGradIntZ(2,2)
SymPy is proving harder than anticipated, but in the meantime in case anyone's finding this useful, I'll also point out how to modify this Op to allow for changing the final timepoint without creating a new Op. This can be useful if you have a point process, or if you have uncertainty in your time measurements.
class integrateOut2(theano.Op):
def __init__(self, f, int_var, *args,**kwargs):
super(integrateOut2,self).__init__()
self.f = f
self.int_var = int_var
def make_node(self, *inputs):
tmax = inputs[0]
self.fvars=list(inputs[1:])
return theano.Apply(self, [tmax]+self.fvars, [T.dscalar().type()])
def perform(self, node, inputs, output_storage):
# Everything else is an argument to the quad function
tmax = inputs[0]
args = tuple(inputs[1:])
# create a function to evaluate the integral
f = theano.function([self.int_var]+self.fvars, self.f)
# actually compute the integral
output_storage[0][0] = quad(f, 0., tmax, args=args)[0]
def grad(self, inputs, grads):
tmax = inputs[0]
param_grads = T.grad(self.f, self.fvars)
## Recall fundamental theorem of calculus
## d/dt \int^{t}_{0}f(x)dx = f(t)
## So sub in t_max to the graph
FTC_grad = theano.clone(self.f, {self.int_var: tmax})
grad_list = [FTC_grad*grads[0]] + \
[integrateOut2(grad_fn, self.int_var)(*inputs)*grads[0] \
for grad_fn in param_grads]
return grad_list
I always use the following code where I generate B = 10000 samples of n = 30 observations from a normal distribution with µ = 1 and σ 2 = 2.25. For each sample, the parameters µ and σ are estimated and stored in a matrix. I hope this can help you.
loglik <- function(p,z){
sum(dnorm(z,mean=p[1],sd=p[2],log=TRUE))
}
set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
One can also obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the estimate. First calculate the MLE of λ Then, we estimate the bias and the standard error of λˆ by a nonparametric bootstrap.
B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
lambda.B[b] <- 1/mean(w.time[b.sample])
}
bias <- mean(lambda.B-m$estimate)
sd(lambda.B)
In the second part we calculate a 95% confidence interval for the mean time between failures.
n <- length(w.time)
m <- mean(w.time)
se <- sd(w.time)/sqrt(n)
interval.1 <- m + se * qnorm(c(0.025,0.975))
interval.1
But we can also use the the assumption that the data are from an exponential distribution. In that case we have varX¯ = 1/(nλ^2) = θ^{2}/n which can be estimated by X¯^{2}/n.
sd.m <- sqrt(m^2/n)
interval.2 <- m + sd.m * qnorm(c(0.025,0.975))
interval.2
We can also estimate the standard error of ˆθ by means of a boostrap procedure. We use the nonparametric bootstrap, that is, we sample from the original sample with replacement.
B <- 9999
m.star <- rep(NA,B)
for (b in 1:B){
m.star[b] <- mean(sample(w.time,replace=TRUE))
}
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3
An interval not based on the assumption of normality of ˆθ is obtained by the percentile method:
interval.4 <- quantile(m.star, probs=c(0.025,0.975))
interval.4

Use optimize.minimize from scipy with 2 variables and interpolated function

I didn't find a way to perform optimize.minimize from scipy with a multidimensional function. In nearly all examples an analytical function is optimized while my function is interpolated. The test data set looks like this:
x = np.array([2000,2500,3000,3500])
y = np.array([10,15,25,50])
z = np.array([10,12,17,19,13,13,16,20,17,60,25,25,8,35,15,20])
data = np.array([x,y,z])
While the function is like F(x,y) = z
What I want to know is what happens at f(2200,12) and what is the global maximum in the range of x (2000:3500) and y (10:50). The interpolation works fine. But finding the global maximum doesn't work so far.
The interpolation
self.F2 = interp2d(xx, -yy, z, kind, bounds_error=False)
yields
<scipy.interpolate.interpolate.interp2d object at 0x0000000002C3BBE0>
I tried to optimize via:
x0 = [(2000,3500),(10,50)]
res = scipy.optimize.minimize(self.F2, x0, method='Nelder-Mead')
An exception is thrown:
TypeError: __call__() missing 1 required positional argument: 'y'
I think that the optimizer can't handle the object from the interpolation. In the examples the people used lambda to get values from their function. What do I have to do in my case?
Best,
Alex
First, to find global maximum (instead of minimum) you need to interpolate your function with opposite sign:
F2 = interp2d(x, y, -z)
Second, the callable in minimize takes a tuple of arguments, and interp2d object needs input coordinates to be given as separate positional arguments. Therefore, we cannot use interp2d object in minimize directly; we need a wrapper that will unpack a tuple of arguments from minimize and feed it to interp2d:
f = lambda x: F2(*x)
And third, to use minimize you need to specify an initial guess for minimum (and bounds, in your case). Any reasonable point will do:
x0 = (2200, 12)
bounds = [(2000,3500),(10,50)]
print minimize(f, x0, method='SLSQP', bounds=bounds)
This yields:
status: 0
success: True
njev: 43
nfev: 243
fun: array([-59.99999488])
x: array([ 2500.00002708, 24.99999931])
message: 'Optimization terminated successfully.'
jac: array([ 0.07000017, 1. , 0. ])
nit: 43
One more possible solution (hope you get the idea):
One more function is created (f), and the minimized values are sent as arguments to this function.
from scipy.optimize import minimize
x = data.Height.values
y = data.Weight.values
def f(params):
w0, w1 = params
return mse(w0, w1, x, y)
optimum = minimize(f, (0,0), method = 'L-BFGS-B', bounds = ((-100, 100), (-5,5)) )
w0 = optimum.x[0]
w1 = optimum.x[1]
Also tried implementation with lambda function, but had no luck.

Categories