Trying to interpolate linearly in python

I have 3 arrays: a, b, c all with length 15.
a=[950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150]
b = [16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80]
c=[32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -36.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.20000000000002]
I am trying to find the value of a at the index where b = c.
The problem is that there is no place where b=c exactly so I need to linearly interpolate between values in the array to find the value of a where b=c. Does that make sense?
I was thinking about using scipy.interpolate to do the interpolation.
I am having a hard time wrapping my mind around how to solve this problem. Any ideas would be great!

Here's a simpler variation of a function from another answer of mine:
from __future__ import division

import numpy as np


def find_roots(t, y):
    """
    Given the input signal `y` with samples at times `t`,
    find the times where `y` is 0.

    `t` and `y` must be 1-D numpy arrays.

    Linear interpolation is used to estimate the time `t` between
    samples at which sign changes in `y` occur.
    """
    # Find where y crosses 0.
    transition_indices = np.where(np.sign(y[1:]) != np.sign(y[:-1]))[0]

    # Linearly interpolate the time values where the transitions occur.
    t0 = t[transition_indices]
    t1 = t[transition_indices + 1]
    y0 = y[transition_indices]
    y1 = y[transition_indices + 1]
    slope = (y1 - y0) / (t1 - t0)
    transition_times = t0 - y0/slope

    return transition_times
That function can be used with t = a and y = b - c. For example, here is your data, entered as numpy arrays:
In [354]: a = np.array([950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150])
In [355]: b = np.array([16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80])
In [356]: c = np.array([32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -36.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.20000000000002])
The place where "b = c" is the place where "b - c = 0", so we pass b - c for y:
In [357]: find_roots(a, b - c)
Out[357]: array([ 312.5])
So the linearly interpolated value of a is 312.5.
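As a quick sanity check (my addition, not part of the original answer), interpolating b and c at a = 312.5 should give the same value for both. np.interp needs increasing x coordinates and a is decreasing, so the arrays are reversed; this uses the numpy arrays a, b, c entered above:
# Sanity check: b and c, each linearly interpolated at a = 312.5, should agree.
b_at_root = np.interp(312.5, a[::-1], b[::-1])
c_at_root = np.interp(312.5, a[::-1], c[::-1])
print(b_at_root, c_at_root)   # both evaluate to -53.75, so the curves do meet there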
With the following matplotlib commands:
In [391]: plot(a, b, label="b")
Out[391]: [<matplotlib.lines.Line2D at 0x11eac8780>]
In [392]: plot(a, c, label="c")
Out[392]: [<matplotlib.lines.Line2D at 0x11f23aef0>]
In [393]: roots = find_roots(a, b - c)
In [394]: [axvline(root, color='k', alpha=0.2) for root in roots]
Out[394]: [<matplotlib.lines.Line2D at 0x11f258208>]
In [395]: grid()
In [396]: legend(loc="best")
Out[396]: <matplotlib.legend.Legend at 0x11f260ba8>
In [397]: xlabel("a")
Out[397]: <matplotlib.text.Text at 0x11e71c470>
I get the plot

This is not necessarily a solution to your problem, since your data does not appear to be linear, but it might give you some ideas. If you assume that your lines a, b, and c are linear, then the following idea works:
Perform a linear regression of lines a, b and c to get their respective slopes (m_a, m_b, m_c) and y-intercepts (b_a, b_b, b_c). Then solve the equation 'y_b = y_c' for x, and find y = m_a * x + b_a to get your result.
Since the linear regression approximately solves y = m * x + b, equation y_b = y_c can be solved by hand giving: x = (b_b-b_c) / (m_c-m_b).
Using python, you get:
>> from scipy import stats
>> m_a, b_a, r_a, p_a, err_a = stats.linregress(range(15), a)
>> m_b, b_b, r_b, p_b, err_b = stats.linregress(range(15), b)
>> m_c, b_c, r_c, p_c, err_c = stats.linregress(range(15), c)
>> x = (b_b-b_c) / (m_c-m_b)
>> m_a * x + b_a
379.55151515151516
Since your data is not linear, you probably need to go through your vectors one by one and search for overlapping y intervals. Then you can apply the above method, but using only the endpoints of the two overlapping intervals to construct the b and c inputs to the linear regression. In that case you should get an exact result, since least squares fits two points perfectly (there are also more efficient ways to do this, because the intersection of two straight lines can be solved exactly in this simple case).
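Here is a minimal sketch of that segment-by-segment idea (my own illustration; crossing_value is a hypothetical helper, not from the code above):
import numpy as np

def crossing_value(a, b, c):
    """Walk adjacent samples; where b - c changes sign, intersect the two local
    straight-line segments exactly and read off a at that crossing."""
    a, b, c = map(np.asarray, (a, b, c))
    d = b - c
    for i in range(len(d) - 1):
        if d[i] == 0:
            return a[i]
        if np.sign(d[i]) != np.sign(d[i + 1]):
            frac = d[i] / (d[i] - d[i + 1])        # fraction of the step where d hits zero
            return a[i] + frac * (a[i + 1] - a[i])
    return None   # no crossing found

# With the a, b, c arrays from the question this returns 312.5.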
Cheers.

Another simple solution using:
one linear-regressor for each vector (done with scikit-learn as scipy-docs were down for me; easy to switch to numpy/scipy-based linear-regression)
general-purpose minimization using scipy.optimize.minimize
Code
a=[950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150]
b = [16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80]
c=[32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -36.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.20000000000002]
from sklearn.linear_model import LinearRegression
from scipy.optimize import minimize
import numpy as np
reg_a = LinearRegression().fit(np.arange(len(a)).reshape(-1,1), a)
reg_b = LinearRegression().fit(np.arange(len(b)).reshape(-1,1), b)
reg_c = LinearRegression().fit(np.arange(len(c)).reshape(-1,1), c)
funA = lambda x: reg_a.predict(x.reshape(-1,1))
funB = lambda x: reg_b.predict(x.reshape(-1,1))
funC = lambda x: reg_c.predict(x.reshape(-1,1))
opt_crossing = lambda x: (funB(x) - funC(x))**2
x0 = 1
res = minimize(opt_crossing, x0, method='SLSQP', tol=1e-6)
print(res)
print('Solution: ', funA(res.x))
import matplotlib.pyplot as plt
x = np.linspace(0, 15, 100)
a_ = reg_a.predict(x.reshape(-1,1))
b_ = reg_b.predict(x.reshape(-1,1))
c_ = reg_c.predict(x.reshape(-1,1))
plt.plot(x, a_, color='blue')
plt.plot(x, b_, color='green')
plt.plot(x, c_, color='cyan')
plt.scatter(np.arange(15), a, color='blue')
plt.scatter(np.arange(15), b, color='green')
plt.scatter(np.arange(15), c, color='cyan')
plt.axvline(res.x, color='red', linestyle='solid')
plt.axhline(funA(res.x), color='red', linestyle='solid')
plt.show()
Output
fun: array([ 7.17320622e-15])
jac: array([ -3.99479864e-07, 0.00000000e+00])
message: 'Optimization terminated successfully.'
nfev: 8
nit: 2
njev: 2
status: 0
success: True
x: array([ 8.37754008])
Solution: [ 379.55151658]
Plot


numpy interpolation with period

Can someone explain to me the code in the documentation, specifically this:
Interpolation with periodic x-coordinates:
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
np.interp(x, xp, fp, period=360)
array([7.5 , 5.  , 8.75, 6.25, 3.  , 3.25, 3.5 , 3.75])
I did a trial like this
import matplotlib.pyplot as plt
import numpy as np
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
y=np.interp(x, xp, fp, period=360)
print(x)
print(y)
plt.grid()
plt.plot(xp, fp)
#plt.scatter(x,y,marker="o",color="green")
plt.plot(x,y,'o')
plt.show()
and it shows this plot.
How the orange points can be considered "interpolations" is beyond me. They are not even on the curve.
EDIT: Thanks to Warren Weckesser for the detailed explanation!
A plot to see it better
The numbers used in the example that demonstrates the use of period in the interp docstring can be a bit difficult to interpret in a plot. Here's what is happening...
The period is 360, and the given "known" points are
xp = [190, -190, 350, -350]
fp = [ 5, 10, 3, 4]
Note that the values in xp span an interval longer than 360. Let's consider the interval [0, 360) to be the fundamental domain of the interpolator. If we map the given points to the fundamental domain, they are:
xp1 = [190, 170, 350, 10]
fp1 = [ 5, 10, 3, 4]
Now for a periodic interpolator, we can imagine this data being extended periodically in the positive and negative directions, e.g.
xp_ext = [..., 190-360, 170-360, 350-360, 10-360, 190, 170, 350, 10, 190+360, 170+360, 350+360, 10+360, ...]
fp_ext = [..., 5, 10, 3, 4, 5, 10, 3, 4, 5, 10, 3, 4, ...]
It is this extended data that interp is interpolating.
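As a quick check of this picture (my own sketch, not part of the original answer), one can build the periodic extension by hand and compare it with the built-in period handling; both lines should print the same values as the docstring example:
import numpy as np

xp = np.array([190, -190, 350, -350])
fp = np.array([5, 10, 3, 4])
x = np.array([-180, -170, -185, 185, -10, -5, 0, 365])

# Map xp into the fundamental domain [0, 360) and sort.
xp1 = np.mod(xp, 360)
order = np.argsort(xp1)
xp1, fp1 = xp1[order], fp[order]

# Extend one period in each direction.
xp_ext = np.concatenate([xp1 - 360, xp1, xp1 + 360])
fp_ext = np.concatenate([fp1, fp1, fp1])

print(np.interp(np.mod(x, 360), xp_ext, fp_ext))   # manual periodic extension
print(np.interp(x, xp, fp, period=360))            # built-in period handling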
Here's a script that replaces the array x from the example with a dense set of points. With this dense set, the plot of y = np.interp(x, xp, fp, period=360) should make clearer what is going on:
import numpy as np
import matplotlib.pyplot as plt

xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
x = np.linspace(-360, 720, 1200)
y = np.interp(x, xp, fp, period=360)

plt.plot(x, y, '--')
plt.plot(xp, fp, 'ko')
plt.grid(True)
plt.show()
Each "corner" in the plot is at a point in the periodically extended version of (xp, fp).

Python fitting a curve spitting TypeError: only size-1 arrays can be converted to Python scalars

I am trying to fit a curve, this is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit
import math
vector = np.vectorize(np.int_)
x_data = np.array([-5.0, -4, -3, -2, -1, 0, 1, 2, 3, 4])
x1 = vector(x_data)
y_data = np.array([77, 81, 171, 303, 409, 302, 139, 115, 88, 89])
y1 = vector(y_data)
def model_f(x, a, b, c, d):
    return a/(math.sqrt(2*math.pi*d**2)) * math.exp( -(x-c)**2/(2*d**2) ) + b
popt, pcov = curve_fit(model_f, x1, y1, p0=[3,2,-16, 2])
This is the error I get:
TypeError: only size-1 arrays can be converted to Python scalars
From what I understand math.sqrt() and math.exp() are causing the problem. I thought that vectorizing the arrays would fix it. Am I missing something?
Don't call vectorize, and don't use the math module; use np. equivalents. Also your initial values were way off and produced a degenerate solution. Either don't provide initial values at all, or provide ones in the ballpark of what you know to be needed:
import numpy as np
from scipy.optimize import curve_fit
def model_f(x: np.ndarray, a: float, b: float, c: float, d: float) -> np.ndarray:
    return a/d/np.sqrt(2*np.pi) * np.exp(-((x-c)/d)**2 / 2) + b
x1 = np.arange(-5, 5)
y1 = np.array((77, 81, 171, 303, 409, 302, 139, 115, 88, 89))
popt, _ = curve_fit(model_f, x1, y1, p0=(1000, 100, -1, 1))
print('Parameters:', popt)
print('Ideal vs. fit y:')
print(np.stack((y1, model_f(x1, *popt))))
Parameters: [916.86287196 85.71611182 -1.03419295 1.13753421]
Ideal vs. fit y:
[[ 77. 81. 171. 303. 409.
302. 139. 115. 88. 89. ]
[ 86.45393326 96.46010314 157.95219577 309.95808531 407.12196914
298.41481145 150.70663751 94.88484707 86.3133437 85.73407366]]

Piecewise Fit not working - large dataset

I have been using a solution found in several places on stack overflow for fitting a piecewise function:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def piecewise_linear(x, x0, y0, k1, k2):
    return np.piecewise(x, [x < x0], [lambda x: k1*x + y0-k1*x0, lambda x: k2*x + y0-k2*x0])
p, e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(-5, 30, 100)
plt.plot(x, y, ".")
plt.plot(xd, piecewise_linear(xd, *p))
plt.show()
(for example, here: How to apply piecewise linear fit in Python?)
The first time I try it in the console I get an OptimizeWarning.
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
After that I just get a straight line for my fit. It seems as though there is clearly a bend in the data that the fit isn't following, although I cannot figure out why.
For the dataset I am using there are about 3200 points in each x and y, is this part of the problem?
Here are some fake data that kind of simulate mine (same problem occurs where fit is not piecewise):
x = np.append(np.random.uniform(low=10.0, high=40.2, size=(1500,)), np.random.uniform(low=-10.0, high=20.2, size=(1500,)))
y = np.append(np.random.uniform(low=-3000, high=0, size=(1500,)), np.random.uniform(low=-2000, high=1000, size=(1500,)))
Just to complete the question with the answer provided in the comment above:
The issue was not the large number of points, but the fact that I had such large values on my y axis. Since the default initial parameter values are all 1, my y values of around 1000 were too far away for the fit to converge. To fix that, an initial guess for the line fit was supplied via the p0 parameter. From the docs for scipy.optimize.curve_fit:
p0 : None, scalar, or N-length sequence, optional
Initial guess for the parameters. If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).
So my final code ended up looking like this:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15], dtype=float)
y = np.array([500, 700, 900, 1100, 1300, 1500, 2892, 4281, 5670, 7059, 8447, 9836, 11225, 12614, 14003])
def piecewise_linear(x, x0, y0, k1, k2):
    return np.piecewise(x, [x < x0], [lambda x: k1*x + y0-k1*x0, lambda x: k2*x + y0-k2*x0])
p, e = optimize.curve_fit(piecewise_linear, x, y, p0=(10, -2500, 0, -500))
xd = np.linspace(-5, 30, 100)
plt.plot(x, y, ".")
plt.plot(xd, piecewise_linear(xd, *p))
plt.show()
Just for fun (a very scattered case):
Since the original data was not available, the coordinates of the points were obtained from the figure published in Rachel W's question, by scanning the image and recording the blue pixels. There are some artefacts due to the straight line and the grid, which appear in white after scanning.
The result of a piecewise regression (two segments) is drawn in red on the above figure.
The equation of the fitted function is:
The regression method used is not iterative and doesn't require an initial guess. The code is very simple: see pp. 12-13 of this paper https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf

scipy.optimize.linprog - difficulty understanding the parameters

I want to minimize the following LPP:
c = 60x + 40y + 50z
subject to
20x + 10y + 10z >= 350,
10x + 10y + 20z >= 400,
x, y, z >= 0
My code snippet is the following (I'm using the scipy package for the first time):
from scipy.optimize import linprog
c = [60, 40, 50]
A = [[20,10], [10,10],[10,20]]
b = [350,400]
res = linprog(c, A, b)
print(res)
The output is shown in this screenshot of the output in PyCharm.
1. Can someone explain the parameters of the linprog function in detail, especially how the bounds will be calculated?
2. Have I written the parameters right?
I am new to LPP basics; I think I am misunderstanding the parameters.
linprog expects A to have one row per inequality constraint and one column per variable, not the other way around. Try this:
from scipy.optimize import linprog
c = [60, 40, 50]
A = [[20, 10, 10], [10, 10, 20]]
b = [350, 400]
res = linprog(c, A, b)
print(res)
Output:
fun: -0.0
message: 'Optimization terminated successfully.'
nit: 0
slack: array([ 350., 400.])
status: 0
success: True
x: array([ 0., 0., 0.])
The message is telling you that your A_ub matrix has incorrect dimension. It is currently a 3x2 matrix which cannot left-multiply your 3x1 optimization variable x. You need to write:
A = [[20,10, 10], [10,10,20]]
which is a 2x3 matrix and can left multiply x.
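One more point worth noting (my addition, not part of either answer above): linprog minimizes c @ x subject to A_ub @ x <= b_ub, while the constraints in the question are >= inequalities. A sketch of one way to encode them is to multiply both sides of each >= row by -1:
from scipy.optimize import linprog

c = [60, 40, 50]
# linprog handles "<=" constraints, so negate the ">=" rows:
# 20x + 10y + 10z >= 350  ->  -20x - 10y - 10z <= -350
A_ub = [[-20, -10, -10], [-10, -10, -20]]
b_ub = [-350, -400]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)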

How to add station weights to Ordinary Least Squares in Python?

I have precipitation data and DEM (elevation) values for 10 climate stations.
I did a linear regression as follows:
DEM = [200, 300, 400, 500, 600, 300, 200, 100, 50, 200]
Prep = [50, 95, 50, 59, 99, 50, 23, 10, 10, 60]
from scipy import stats

X = DEM   # independent variable
Y = Prep  # dependent variable
slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)
But now I want to add weight to those stations like:
Weight = [0.3, 0.1, 0.1, 0.1, 0.2, 0.05, 0.05, 0.05, 0.05, 0.05]
The diagram is like http://ppt.cc/XXrEv
I found Weighted Least Squares for this, but I want to know how and why it works, or whether my approach is wrong.
import numpy as np
import statsmodels.api as sm
Y = [1, 3, 4, 5, 2, 3, 4]
X = range(1, 8)
X = sm.add_constant(X)
wls_model = sm.WLS(Y, X, weights=range(1, 8))
results = wls_model.fit()
results.params
Answer:
import time

import numpy as np
import statsmodels.api as sm

start_time = time.time()

alist = [2, 4, 6]
DEM = [200, 300, 400, 500, 300, 600]
PRE = [20, 19, 18, 20, 21, 22, 30, 23]

A_DEM = []
A_PRE = []
W = []
for a in alist:
    A_DEM.append(DEM[a-1])
    A_PRE.append(PRE[a-1])
    W.append(1)

X = sm.add_constant(A_DEM)
Y = A_PRE
wls_model = sm.WLS(Y, X, weights=W).fit()
print wls_model.params[0]   # intercept
print wls_model.params[1]   # slope
print wls_model.rsquared    # r-squared
print wls_model.summary()
I also found that WLS normalizes the weights automatically, so you can pass your weights directly.
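For reference (my own sketch, not part of the original answer): WLS minimizes the weighted sum of squared residuals, sum_i w_i * (y_i - x_i^T beta)^2, so scaling all weights by a constant does not change the fitted coefficients. Applied directly to the DEM/Prep/Weight arrays from the question:
import numpy as np
import statsmodels.api as sm

DEM = [200, 300, 400, 500, 600, 300, 200, 100, 50, 200]
Prep = [50, 95, 50, 59, 99, 50, 23, 10, 10, 60]
Weight = [0.3, 0.1, 0.1, 0.1, 0.2, 0.05, 0.05, 0.05, 0.05, 0.05]

X = sm.add_constant(np.array(DEM))           # column of ones + DEM
wls = sm.WLS(Prep, X, weights=Weight).fit()
print(wls.params)      # [intercept, slope]
print(wls.rsquared)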
