I have a function, for example f(x) = k * x^(1/a), but this could be something else as well, like a quadratic or logarithmic function. I am only interested in the domain [1, 50000]. The parameters of the function (a and k in this case) are known as well.
My goal is to fit a continuous piecewise function to this, which contains alternating segments of linear functions (i.e. sloped straight segments, each with an intercept of 0) and constants (i.e. horizontal segments joining the sloped segments together). The first and last segments are both sloped. The number of segments should be pre-selected, roughly between 9 and 29 (that is, 5-15 linear steps plus 4-14 constant plateaus).
Formally:

The input function: f(x) = k * x^(1/a)

The fitted piecewise function:

g(x) = b1*x   for x <= r1
       c1     for r1 < x <= r2
       b2*x   for r2 < x <= r3
       c2     for r3 < x <= r4
       ...
with the last (sloped) segment extending to the end of the domain.
I am looking for the optimal resulting parameters (c, r, b) (in terms of least squares) if the number of segments (n) is specified beforehand.
The resulting constants (c) and the breakpoints (r) should be whole natural numbers, and the slopes (b) should be values rounded to two decimal places.
I have tried to do the fitting numerically using the pwlf package with a segmented constant model, and then further processed the resulting constant model with some graphical intuition to "slice" the constant steps with the slopes. It works to some extent, but I am sure this is suboptimal from both a fitting and a computational-efficiency perspective. It takes multiple minutes to generate a fit with 8 slopes on the range 1-50000. I am sure there must be a better way to do this.
My idea is that, instead of using only numerical methods/ML, the fact that we have the algebraic form of the input function could be exploited in some way, at least by using algebraic transforms (integrals), to get to a simpler optimization problem.
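For example (just a sketch of what I mean, using SymPy and writing p for 1/a), the squared residual of a single constant plateau c against f(x) = k*x^(1/a) already has a closed-form antiderivative:

import sympy as sp

x, k, c, p = sp.symbols('x k c p', positive=True)
f = k * x**p  # p stands for 1/a
# closed-form antiderivative of the squared residual of a constant plateau c
print(sp.integrate((f - c)**2, x))

A similar closed form exists for the sloped pieces b*x, so the least-squares cost of a whole candidate segmentation could be evaluated exactly instead of point by point.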
import numpy as np
import matplotlib.pyplot as plt
import pwlf

# The input function
def input_func(x, k, a):
    return np.power(x, 1/a) * k

x = np.arange(1, 5e4)
y = input_func(x, 1.8, 1.3)

plt.plot(x, y);
def pw_fit(func, x_r, no_seg, *fparams):
    # working on the specified range
    x = np.arange(1, x_r)
    y_input = func(x, *fparams)

    my_pwlf = pwlf.PiecewiseLinFit(x, y_input, degree=0)
    res = my_pwlf.fit(no_seg)
    yHat = my_pwlf.predict(x)

    # Function values at the breakpoints
    y_isec = func(res, *fparams)
    # Slope values at the breakpoints
    slopes = np.round(y_isec / res, decimals=2)
    slopes = slopes[1:]
    # For the first slope value, I use the intersection of the first constant plateau and the input function
    first_isec = np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0]
    slopes = np.insert(slopes, 0, np.round(y_input[first_isec] / first_isec, decimals=2))
    plateaus = np.unique(np.round(yHat))

    # If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
    to_del = np.argwhere(np.diff(slopes) == 0).flatten()
    slopes = np.delete(slopes, to_del + 1)
    plateaus = np.delete(plateaus, to_del)

    breakpoints = [np.ceil(plateaus[0] / slopes[0])]
    for idx, j in enumerate(slopes[1:-1]):
        breakpoints.append(np.floor(plateaus[idx] / j))
        breakpoints.append(np.ceil(plateaus[idx + 1] / j))
    breakpoints.append(np.floor(plateaus[-1] / slopes[-1]))

    return slopes, plateaus, breakpoints

slo, plat, breaks = pw_fit(input_func, 50000, 8, 1.8, 1.3)
# The piecewise function itself
def pw_calc(x, slopes, plateaus, breaks):
    x = x.astype('float')
    cond_list = [x < breaks[0]]
    for idx, j in enumerate(breaks[:-1]):
        cond_list.append((j <= x) & (x < breaks[idx + 1]))
    cond_list.append(breaks[-1] <= x)

    func_list = [lambda x: x * slopes[0]]
    for idx, j in enumerate(slopes[1:]):
        func_list.append(plateaus[idx])
        func_list.append(lambda x, j=j: x * j)

    return np.piecewise(x, cond_list, func_list)

y_output = pw_calc(x, slo, plat, breaks)

plt.plot(x, y, y_output);
(Not important, but I think the fitted piecewise function is not continuous as written. The intervals should be x <= r1; r1 < x <= r2; ...)
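For reference, a minimal tweak of the conditions in pw_calc along those lines (closing each interval on its right breakpoint) would be:

cond_list = [x <= breaks[0]]
for idx, j in enumerate(breaks[:-1]):
    cond_list.append((j < x) & (x <= breaks[idx + 1]))
cond_list.append(breaks[-1] < x)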
As Anatolyg has pointed out, it looks to me that in the optimal solution (for the function posted at least, and probably for any function whose derivative is nonzero), the horizontal segments will collapse to a point or to the minimum segment length (in this case 1).
EDIT---------------------------------------------
The behavior above can only be valid if the slopes are allowed an intercept. If the intercepts are zero, as posted in the question, one consideration must be taken into account: is the initial parabolic function defined at zero or nearby? Imagine the function y = 0.001 * sqrt(x - 1000); then the segments defined as b*x will have slopes close to zero and will be so similar to the constant segments that the best fit will be just the single zero-intercept line that best fits the whole function.
Provided that the function is defined at or near zero, you can start by approximating the curve with linear segments alone (with intercepts):
1) Divide the function domain into N intervals (equal intervals, or intervals whose size is a function of the average curvature (or second derivative) of the function along the domain).
2) Do a linear fit/regression in each interval.
3) For each interval, if a point (or a bunch of points) at the extreme of the interval is fitted better by the line of the neighboring interval than by the line of its own interval, assign that point to the neighboring interval.
4) Repeat from 2) until no extreme points are moved.
The linear regressions can be optimized so that the covariance matrices are not recalculated from scratch on each iteration, but instead updated by adding the contributions of the moved points to the previous covariance matrices (see the sketch below).
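A rough sketch of this refit-and-reassign loop (illustrative only: it refits every interval with np.polyfit on each pass instead of updating the regressions incrementally, and it moves at most one point per boundary per pass):

import numpy as np

def iterative_segmented_fit(x, y, n_intervals, max_iter=100):
    # 1) start from equal-sized intervals (edges are indices into x)
    edges = np.linspace(0, len(x), n_intervals + 1).astype(int)
    coefs = []
    for _ in range(max_iter):
        # 2) linear fit (with intercept) in each interval
        coefs = [np.polyfit(x[lo:hi], y[lo:hi], 1)
                 for lo, hi in zip(edges[:-1], edges[1:])]
        moved = False
        # 3) hand boundary points to the neighbour whose line fits them better
        for i in range(1, n_intervals):
            j = edges[i]  # first point of the right interval
            if ((np.polyval(coefs[i - 1], x[j]) - y[j]) ** 2
                    < (np.polyval(coefs[i], x[j]) - y[j]) ** 2
                    and edges[i] + 2 < edges[i + 1]):
                edges[i] += 1  # the point joins the left interval
                moved = True
            k = edges[i] - 1  # last point of the left interval
            if ((np.polyval(coefs[i], x[k]) - y[k]) ** 2
                    < (np.polyval(coefs[i - 1], x[k]) - y[k]) ** 2
                    and edges[i] - 2 > edges[i - 1]):
                edges[i] -= 1  # the point joins the right interval
                moved = True
        # 4) repeat until no extreme points are moved
        if not moved:
            break
    return edges, coefs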
Then each linear segment (LS_i) is replaced by a combination of a small constant segment at the beginning (Cb_i), a linear segment without intercept (S_i), and another constant segment at the end (Ce_i). These segments are easy to calculate: S_i will contain the middle point of LS_i, and Cb_i and Ce_i will take, respectively, the begin and end values of segment LS_i. The interval of each segment then has to be calculated as an intersection between lines.
With this, the constant end segment will be collinear with the constant begin segment from the next interval so they will merge, resulting in a series of constant and linear segments interleaved.
But this would only be a floating-point starting solution. Next, you will have to apply all the roundings, which will mess up the segments quite a lot, as the constraints of integer intervals and zero-intercept linear segments can conflict strongly. In fact, b, c, r are not totally independent: if c_i and r_i+1 are known, then b_i+1 is already fixed (continuity forces b_i+1 = c_i / r_i+1).
If nothing is broken so far, the final task will be to minimize the error/cost function (I assume it will be the integral of the error between the parabolic function and the segments). My guess is that gradients here will be quite a pain: if you change, for example, one c_i, all the other b_j and c_j will have to adapt as well due to the integer-interval restriction. However, if you can generalize the derivatives between parameters (how much do I have to adapt b_i+1 if c_i changes by one unit), you can propagate the change of one parameter to all the other parameters and get a kind of gradient. Then, for each interval, you can estimate what the ideal parameter change would be and, averaging over all intervals, calculate the best gradient step. Let me illustrate this:
Assuming first that r parameters are fixed, if I change c1 by one unit, b2 changes by 0.1, c2 changes by -0.2 and b3 changes by 0.2. This would be the gradient.
Then I estimate, comparing with the parabolic curve, that c1 should increase 0.5 (to reduce the cost by 10 points), b2 should increase 0.2 (to reduce the cost by 5 points), c2 should increase 0.2 (to reduce the cost by 6 points) and b3 should increase 0.1 (to reduce the cost by 9 points).
Finally, the gradient step would be (0.5/1·10 + 0.2/0.1·5 + 0.2/(-0.2)·6 + 0.1/0.2·9) / (10 + 5 + 6 + 9) ≈ 0.45. Thus, c1 would increase by 0.45 units, b2 would increase by 0.45·0.1, and so on.
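Just to make that arithmetic concrete (a throwaway snippet with the illustrative numbers hard-coded):

import numpy as np

coupling = np.array([1.0, 0.1, -0.2, 0.2])  # change of c1, b2, c2, b3 per unit change of c1
desired = np.array([0.5, 0.2, 0.2, 0.1])    # individually ideal changes
weight = np.array([10.0, 5.0, 6.0, 9.0])    # cost reduction each change would bring
step = np.sum(desired / coupling * weight) / np.sum(weight)
print(step)  # ~0.45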
When you add the r parameters to the pot, since integer intervals do not have a proper derivative, the calculation is not straightforward. However, you can treat the r parameters as floating point, calculate and apply the gradient step, and then apply the roundings.
We can integrate the squared error function for linear and constant pieces and let SciPy optimize it. Python 3:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize

xl = 1
xh = 50000
a = 1.3
p = 1 / a
n = 8


def split_b_and_c(bc):
    return bc[::2], bc[1::2]


def solve_for_r(b, c):
    r = np.empty(2 * n)
    r[0] = xl
    r[1:-1:2] = c / b[:-1]
    r[2::2] = c / b[1:]
    r[-1] = xh
    return r


def linear_residual_integral(b, x):
    return (
        (x ** (2 * p + 1)) / (2 * p + 1)
        - 2 * b * x ** (p + 2) / (p + 2)
        + b ** 2 * x ** 3 / 3
    )


def constant_residual_integral(c, x):
    return x ** (2 * p + 1) / (2 * p + 1) - 2 * c * x ** (p + 1) / (p + 1) + c ** 2 * x


def squared_error(bc):
    b, c = split_b_and_c(bc)
    r = solve_for_r(b, c)
    linear = np.sum(
        linear_residual_integral(b, r[1::2]) - linear_residual_integral(b, r[::2])
    )
    constant = np.sum(
        constant_residual_integral(c, r[2::2])
        - constant_residual_integral(c, r[1:-1:2])
    )
    return linear + constant


def evaluate(x, b, c, r):
    i = 0
    while x > r[i + 1]:
        i += 1
    return b[i // 2] * x if i % 2 == 0 else c[i // 2]


def main():
    bc0 = (xl + (xh - xl) * np.arange(1, 4 * n - 2, 2) / (4 * n - 2)) ** (
        p - 1 + np.arange(2 * n - 1) % 2
    )
    bc = scipy.optimize.minimize(
        squared_error, bc0, bounds=[(1e-06, None) for i in range(2 * n - 1)]
    ).x
    b, c = split_b_and_c(bc)
    r = solve_for_r(b, c)
    X = np.linspace(xl, xh, 1000)
    Y = [evaluate(x, b, c, r) for x in X]
    plt.plot(X, X ** p)
    plt.plot(X, Y)
    plt.show()


if __name__ == "__main__":
    main()
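As a sanity check (not part of the code above), the two closed forms are indeed antiderivatives of the pointwise squared residuals (x**p - b*x)**2 and (x**p - c)**2; SymPy confirms it:

import sympy as sp

x, b, c, p = sp.symbols('x b c p', positive=True)
lin = x**(2*p + 1) / (2*p + 1) - 2*b*x**(p + 2) / (p + 2) + b**2 * x**3 / 3
con = x**(2*p + 1) / (2*p + 1) - 2*c*x**(p + 1) / (p + 1) + c**2 * x
print(sp.simplify(sp.diff(lin, x) - (x**p - b*x)**2))  # 0
print(sp.simplify(sp.diff(con, x) - (x**p - c)**2))    # 0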
I have tried to come up with a new solution myself, based on the idea from @Amo Robb, where I have partitioned the domain and curve-fitted a dual piece (a constant and a linear part joined via np.maximum) in each partition. I have used 1 / f(x)' as the function to designate the breakpoints, but I know this is arbitrary and does not provide a global optimum. Maybe there is some optimal function for these breakpoints. But this solution is OK for me, as it might be appropriate to have a better fit at the first segments, at the expense of the error in the later segments. (The task itself is actually a cost-based retail margin calculation {supply price -> added margin}, as the retail POS software can only work with such a piecewise margin function).
The answer from @David Eisenstat is the correct optimal solution if the parameters are allowed to be floats. Unfortunately the POS software cannot use floats. It is OK to round up the c-s and r-s afterwards, but the b-s should be rounded to two decimals, as those are entered as percents, and this constraint would ruin the optimal solution with long floats. I will try to further improve my solution with both Amo's and David's valuable input. Thank you for that!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# The input function f(x)
def input_func(x, k, a):
    return np.power(x, 1/a) * k

# 1 / f(x)'
def one_per_der(x, k, a):
    return a / (k * np.power(x, 1/a - 1))

# 1 / f(x)' inverted
def one_per_der_inv(x, k, a):
    return np.power(a / (x*k), a / (1-a))

def segment_fit(start, end, y, first_val):
    b, _ = curve_fit(lambda x, b: np.maximum(first_val, b*x), np.arange(start, end), y[start-1:end-1])
    b = float(np.round(b, decimals=2))
    bp = np.round(first_val / b)
    last_val = np.round(b * end)
    return b, bp, last_val

def pw_fit(end_range, no_seg, **fparams):
    y_bps = np.linspace(one_per_der(1, **fparams), one_per_der(end_range, **fparams), no_seg + 1)[1:]
    x_bps = np.round(one_per_der_inv(y_bps, **fparams))
    y = input_func(x, **fparams)

    slopes = [np.round(float(curve_fit(lambda x, b: x * b, np.arange(1, x_bps[0]), y[:int(x_bps[0]) - 1])[0]), decimals=2)]
    plats = [np.round(x_bps[0] * slopes[0])]
    bps = []
    for i, xbp in enumerate(x_bps[1:]):
        b, bp, last_val = segment_fit(int(x_bps[i] + 1), int(xbp), y, plats[i])
        slopes.append(b); bps.append(bp); plats.append(last_val)
    breaks = sorted(list(x_bps) + bps)[:-1]

    # If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
    to_del = np.argwhere(np.diff(slopes) == 0).flatten()
    breaks_to_del = np.concatenate((to_del * 2, to_del * 2 + 1))
    slopes = np.delete(slopes, to_del + 1)
    plats = np.delete(plats[:-1], to_del)
    breaks = np.delete(breaks, breaks_to_del)

    return slopes, plats, breaks

def pw_calc(x, slopes, plateaus, breaks):
    x = x.astype('float')
    cond_list = [x < breaks[0]]
    for idx, j in enumerate(breaks[:-1]):
        cond_list.append((j <= x) & (x < breaks[idx + 1]))
    cond_list.append(breaks[-1] <= x)

    func_list = [lambda x: x * slopes[0]]
    for idx, j in enumerate(slopes[1:]):
        func_list.append(plateaus[idx])
        func_list.append(lambda x, j=j: x * j)

    return np.piecewise(x, cond_list, func_list)

fparams = {'k': 1.8, 'a': 1.2}
end_range = 5e4
no_steps = 10

x = np.arange(1, end_range)
y = input_func(x, **fparams)
slopes, plats, breaks = pw_fit(end_range, no_steps, **fparams)
y_output = pw_calc(x, slopes, plats, breaks)

plt.plot(x, y_output, y);
I have a differential equation:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# function that returns dy/dt
def model(y, t):
    k = 0.3
    dydt = -k * y
    return dydt

# initial condition
y0 = 5

# time points
t = np.linspace(0, 10)
t1 = 2

# solve ODE
y = odeint(model, y0, t)
And I want to evaluate the solution of this differential equation on two different points. For example I want y(t=2) and y(t=3).
I can solve the problem in the following way:
Suppose that you need y(2). Then you define
t = np.linspace(0,2)
and just print
print(y[-1])
in order to get the value of y(2). However, I think this procedure is slow, since I need to do the same again in order to calculate y(3), and for every additional point I need to do the same again. Is there some faster way to do this?
isn't this just:
y = odeint(model, y0, [0, 2, 3])[1:]
i.e. the third parameter just specifies the values of t that you want back.
as an example of printing the results out, we'd just follow the above with:
print(f'y(2) = {y[0,0]}')
print(f'y(3) = {y[1,0]}')
which gives me:
y(2) = 2.7440582441900494
y(3) = 2.032848408317066
which seems to be the same as the analytical solution:
5 * np.exp(-0.3 * np.array([2,3]))
You can get exactly what you want if you use solve_ivp with the dense_output option:
from scipy.integrate import solve_ivp

# function that returns dy/dt
def model(t, y):
    k = 0.3
    dydt = -k * y
    return dydt

# initial condition
y0 = [5]

# solve ODE
res = solve_ivp(model, [0, 10], y0, dense_output=True)
y = lambda t: res.sol(t)[0]

for t in [2, 3, 3.4]:
    print(f'y({t}) = {y(t)}')
with the output
y(2) = 2.743316182689662
y(3) = 2.0315223673200338
y(3.4) = 1.802238620366918
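A variant, in case you only ever need a fixed set of points rather than a callable: pass them via t_eval instead of using dense output (a short sketch, reusing the model and y0 defined above):

res = solve_ivp(model, [0, 10], y0, t_eval=[2, 3, 3.4])
for ti, yi in zip(res.t, res.y[0]):
    print(f'y({ti}) = {yi}')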
Given this function:
def f(x):
    return (1 - x**2)**m * ((1 - x)/2)**n
where m and n are constants, let's say both 0.5 for the sake of an example.
I'm trying to use functions from scipy.optimize to solve for x given a value of y. I'm only interested in x values from -1 to 1. Plotting the function with
x = numpy.arange(0, 1, 0.1)
matplotlib.pyplot.plot(x, f(x))
shows that the function is a kind of distorted parabola covering the range of about 0 to 0.65. So let's try solving it for y = 0.3:
def f(x):
    return (1 - x**2)**m * ((1 - x)/2)**n - 0.3

print(scipy.optimize.newton_krylov(f, 0.5))
0.6718791645800665
This looks about right for one of the possible solutions. But there are two; the second should be around -0.9. Try as I might with the initial guess, I can't get it to find this second solution. The Newton-Krylov method gives no convergence at all for initial guesses below 0, and none of the solvers can find this second solution.
Am I missing something? What am I doing wrong?
The method converges at least for x=-0.9:
scipy.optimize.newton_krylov(f, -0.9)
#array(-0.9527983).
It diverges for x approximately in [-0.85...0.06].
This is because newton_krylov uses the Jacobian of the function. This makes it a gradient-descent-style method, and consequently your solutions always converge to a local minimum. Furthermore, because your function is parabolic, you have a very interesting option!
The first step is to find the maximum of f(x) and split your search domain in two. Next you can make an initial guess in each domain and solve with newton_krylov.
import numpy
from scipy.optimize import minimize, newton_krylov

m = n = 0.5  # as in the question

def f(x):
    # Here is our function
    return (1 - x**2)**m * ((1 - x)/2)**n

def minf(x):
    # Here is where we find an optima and split the domain
    return -f(x)

def fy(x):
    # This is where you want your y value target defined
    return abs(f(x) - .3)

if __name__ == "__main__":
    x = numpy.arange(-1., 1., 1e-3, dtype=float)
    # pyplot.plot(x, f(x))
    # pyplot.show()
    minx = minimize(minf, 0.0)['x']
    # Make an initial guess in each domain
    a1 = minx - 1.6 * minx
    a2 = minx + 1.6 * minx
    print(newton_krylov(fy, a1))
    print(newton_krylov(fy, a2))
The output then is:
[0.67187916]
[-0.95279992]
I am a new Python user, so bear with me if this question is obvious.
I am trying to find the value of lmbda that minimizes the following function, given a fixed vector Z and scalar sigma:
def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return -sigma**2 * np.sum(indicator) + np.sum(minimum)
When I pass in values of lmbda manually, I find that the function produces the correct value of sure_sft. However, when I try to use the following code to find the value of lmbda that minimizes sure_sft:
minimize_scalar(lambda lmbda: sure_sft(Z, lmbda, sigma))
it gives me an incorrect value for sure_sft (-8.6731 for lmbda = 0.4916). If I pass in 0.4916 manually to sure_sft, I obtain -7.99809 instead. What am I doing incorrectly? I would appreciate any advice!
EDIT: I've pasted my code below. The data is from: https://web.stanford.edu/~chadj/HallJones400.asc
import pandas as pd
import numpy as np
from scipy.optimize import minimize_scalar

# FUNCTIONS
# Calculate orthogonal projection of g onto f
def proj(f, g):
    return (np.dot(f, g) / np.dot(f, f)) * f

def gs(X):
    # Copy of X -- will be used to store orthogonalization
    F = np.copy(X)
    # Orthogonalize design matrix
    for i in range(1, X.shape[1]):             # Iterate over columns of X
        for j in range(i):                     # Iterate over columns less than current one
            F[:, i] -= proj(F[:, j], X[:, i])  # Subtract projection of x_i onto f_j for all j<i from F_i
    # Normalize each column to have unit length
    norm_F = ((F**2).mean(axis=0)) ** 0.5      # Row vector with square root of average of the squares of each column
    W = F / norm_F                             # Normalize
    return W

# SURE for soft-thresholding
def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return -sigma**2 * np.sum(indicator) + np.sum(minimum)

# Import data.
data_raw = pd.read_csv("hall_jones1999.csv")

# Drop missing observations.
data = data_raw.dropna(subset=['logYL', 'Latitude'])

Y = data['logYL']
Y = np.array(Y)
N = Y.size

# Create design matrix.
design = np.empty([data['Latitude'].size, 15])
design[:, 0] = 1
for j in range(1, 15):
    design[:, j] = data['Latitude']**j
K = design.shape[1]

# Use Gram-Schmidt on design matrix.
W = gs(design)
Z = np.dot(W.T, Y) / N

# MLE
mu_mle = np.dot(W, Z)

# Soft-thresholding
# Use MLE residuals to calculate sigma for SURE calculation
sigma = np.sqrt(np.sum((Y - mu_mle)**2) / (N - K))

# Write SURE as a function of lmbda
sure = lambda lmbda: sure_sft(Z, lmbda, sigma)

# Find SURE-minimizing lmbda
lmbda = minimize_scalar(sure).x
min_sure = minimize_scalar(sure).fun  # -8.673172212265738

# Compare to manually inputting minimized lambda into sure_sft
# I'm s
act_sure1 = sure_sft(Z, 0.49167598, sigma)    # -7.998060514873529
act_sure2 = sure_sft(Z, 0.491675989, sigma)   # -8.673172212306728
You're actually not doing anything wrong. I just tested out the code and confirmed that lmbda has a value of 0.4916759890416824 at the end of the script. You can confirm this for yourself by adding the following lines to the bottom of your script:
print(lmbda)
print(sure_sft(Z, lmbda, sigma))
when you run your script you should then see:
0.4916759890416824
-8.673158394698172
The only thing I can figure is that somehow the routine you were using to print out lmbda was set up to only print a fixed number of digits of floating point numbers, or somehow the printout was otherwise truncated.
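To make that concrete (a quick illustration using the Z and sigma from your script; the numbers are the ones from your own comments), the indicator term in sure_sft flips as soon as lmbda drops below one of the |z| values, which is exactly what happens when the printed value gets truncated:

lmbda_full = 0.4916759890416824
print(sure_sft(Z, lmbda_full, sigma))  # about -8.673
print(sure_sft(Z, 0.4916, sigma))      # about -7.998: one |z| now exceeds lmbda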
I am trying to solve this differential equation as part of my assignment. I am not able to understand how I can put the condition for u in the code. In the code shown below, I arbitrarily provided u = 5.
2 * dx(t)/dt = -x(t) + u(t)
5 * dy(t)/dt = -y(t) + x(t)
u = 2*S(t-5)
x(0) = 0
y(0) = 0
where S(t−5) is a step function that changes from zero to one at t=5. When it is multiplied by two, it changes from zero to two at that same time, t=5.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def model(x, t, u):
    dxdt = (-x + u)/2
    return dxdt

def model2(y, x, t):
    dydt = -(y + x)/5
    return dydt

x0 = 0
y0 = 0
u = 5
t = np.linspace(0, 40)
x = odeint(model, x0, t, args=(u,))
y = odeint(model2, y0, t, args=(u,))
plt.plot(t, x, 'r-')
plt.plot(t, y, 'b*')
plt.show()
I do not know the SciPy Library very well, but regarding the example in the documentation I would try something like this:
def model(x, t, K, PT):
    """
    The model consists of the state x in R^2, the time in R and the two
    parameters K and PT regarding the input u as a step function, where K
    is the height of the step and PT is the delay of the step.
    """
    x1, x2 = x                  # Split the state into two variables
    u = K if t >= PT else 0     # This is the system input
    # Here comes the differential equation in vectorized form
    dx = [(-x1 + u)/2,
          (-x2 + x1)/5]
    return dx

x0 = [0, 0]
K = 2
PT = 5
t = np.linspace(0, 40)

x = odeint(model, x0, t, args=(K, PT))

plt.plot(t, x[:, 0], 'r-')
plt.plot(t, x[:, 1], 'b*')
plt.show()
You have a couple of issues here, and the step function is only a small part of it. You can define a step function with a simple lambda and then simply capture it from the outer scope without even passing it to your function. Because sometimes that won't be the case, we'll be explicit and pass it.
Your next problem is the order of arguments in the function to integrate. As per the docs, the signature is (y, t, ...): first the state, then the time, then the other args arguments. So for the first part we get:
u = lambda t: 2 if t > 5 else 0

def model(x, t, u):
    dxdt = (-x + u(t))/2
    return dxdt

x0 = 0
y0 = 0
t = np.linspace(0, 40)
x = odeint(model, x0, t, args=(u,))
Moving to the next part, the trouble is that you can't feed x as an argument to the model for y, because it's a vector of values of x(t) at particular times, so y + x doesn't make sense inside the function as you wrote it. You can follow your intuition from math class if you pass an x function instead of the x values. Doing so requires that you interpolate the x values at the specific time values you are interested in (which SciPy can handle, no problem):
from scipy.interpolate import interp1d

# flatten because the shape is off; extrapolate because odeint will go out of bounds
xfunc = interp1d(t.flatten(), x.flatten(), fill_value="extrapolate")

def model2(y, t, x):
    dydt = -(y + x(t))/5
    return dydt

y = odeint(model2, y0, t, args=(xfunc,))
Then you get the expected curves for x(t) and y(t).
@Sven's answer is more idiomatic for vector programming with scipy/numpy. But I hope my answer provides a clearer path from what you already know to a working solution.