How to weight station to Order Least Squares in python?

How to weight station to Order Least Squares in python? - python

I have 10 climate stations data about precipitation and it's DEM.
I had done a linear regression follow:
DEM = [200, 300, 400, 500, 600, 300, 200, 100, 50, 200]
Prep = [50, 95, 50, 59, 99, 50, 23, 10, 10, 60]
X = DEM #independent variable
Y = Prep #dependent variable
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
But now I want to add weight to those stations like:
Weight = [0.3, 0.1, 0.1, 0.1, 0.2, 0.05, 0.05, 0.05, 0.05, 0.05]
The diagram is like http://ppt.cc/XXrEv
I found Weighted Least Squares to do it, but I want to know how and why it work or if it is wrong.
import numpy as np
import statsmodels.api as sm
Y = [1, 3, 4, 5, 2, 3, 4]
X = range(1, 8)
X = sm.add_constant(X)
wls_model = sm.WLS(Y, X, weights=range(1, 8))
results = wls_model.fit()
results.params

Answer:
import numpy as np
import statsmodels.api as sm
start_time = time.time()
alist=[2,4,6]
DEM=[200,300,400,500,300,600]
PRE=[20,19,18,20,21,22,30,23]
A_DEM=[]
A_PRE=[]
W=[]
for a in alist:
A_DEM.append(DEM[a-1])
A_PRE.append(PRE[a-1])
W.append(1)
X = sm.add_constant(A_DEM)
Y = A_PRE
wls_model = sm.WLS(Y,X, weights=W).fit()
print wls_model.params[0] # intercept
print wls_model.params[1] # slope
print wls_model.rsquared #rsquared
print wls_model.summary()
And I found the WLS will auto normalize.So you can add weight direct.

Related

Predicting y and x values using linear regressions

I am making a program to predict the x and y value using linear regression.
I can predict y from x. However, when trying to predict x given y i do not get the intended result. Output:
Given (x) predict (y):
x = 10
85.59308314937454
Given (y) predict (x):
y = 85
-45.75349521707133
code:
def place_y(x, slope, intercept):
return slope * x + intercept
def predict_value_x():
"""Using the line of regression a value can be predicted based on a given value.
i.e. Predict the speed of a car (y) given it is (x) years old"""
from scipy import stats
age_x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6] # population
speed_y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86] # population
slope, intercept, r, p, std_err = stats.linregress(age_x, speed_y) # get stats values
predict_value = int(input("Given (x) predict (y): \nx = ")) # age of car(x)
predicted = place_y(predict_value, slope, intercept) # the speed of car given x
print(predicted)
predict_value_x()
def predict_value_y():
"""Using the line of regression a value can be predicted based on a given value.
i.e. Predict the age of a car (x) given its speed (y)"""
from scipy import stats
age_x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6] # population
speed_y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86] # population
slope, intercept, r, p, std_err = stats.linregress(age_x, speed_y) # get stats values
predict_value = int(input("Given (y) predict (x): \ny = ")) # age of car(x)
predicted = place_y(predict_value, slope, intercept) # the speed of car given x
print(predicted)

y=ax+b -> x=(y-b)/a
The problem is that you try to solve by y twice.
You need an aditional function that solves by y:
def place_x(y, slope, intercept):
return (y - intercept)/slope
and replace placey in your predict_value_y function:
predicted = place_x(predict_value, slope, intercept)
the entire code could look like:
def place_y(x, slope, intercept):
return slope * x + intercept
def place_x(y, slope, intercept):
return (y - intercept)/slope
def predict_value_x():
"""Using the line of regression a value can be predicted based on a given value.
i.e. Predict the speed of a car (y) given it is (x) years old"""
from scipy import stats
age_x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6] # population
speed_y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86] # population
slope, intercept, r, p, std_err = stats.linregress(age_x, speed_y) # get stats values
predict_value = int(input("Given (x) predict (y): \nx = ")) # age of car(x)
predicted = place_y(predict_value, slope, intercept) # the speed of car given x
print(predicted)
predict_value_x()
def predict_value_y():
"""Using the line of regression a value can be predicted based on a given value.
i.e. Predict the age of a car (x) given its speed (y)"""
from scipy import stats
age_x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6] # population
speed_y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86] # population
slope, intercept, r, p, std_err = stats.linregress(age_x, speed_y) # get stats values
predict_value = int(input("Given (y) predict (x): \ny = ")) # age of car(x)
predicted = place_x(predict_value, slope, intercept) # the speed of car given x
print(predicted)
predict_value_y()

The issue is with the place_y function, which is intended to predict y based on x, but you are using it to predict x based on y. The current implementation calculates y = slope * x + intercept, which doesn't return the correct result when trying to predict x from y. To predict x from y, you need to solve the equation y = slope * x + intercept for x: x = (y - intercept) / slope. Update the predict_value_y function in the line you calculate predicted:
predicted = (predict_value - intercept) / slope

numpy interpolation with period

Can someone explain to me the code that is in the documentation specifically this:
Interpolation with periodic x-coordinates:
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
np.interp(x, xp, fp, period=360)
array([7.5 , 5. , 8.75, 6.25, 3. ,
3.25, 3.5 , 3.75])
I did a trial like this
import matplotlib.pyplot as plt
import numpy as np
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
y=np.interp(x, xp, fp, period=360)
print(x)
print(y)
plt.grid()
plt.plot(xp, fp)
#plt.scatter(x,y,marker="o",color="green")
plt.plot(x,y,'o')
plt.show()
and it shows like this
How the orange points can be considered "interpolations" is beyond me. They are not even in the curve
EDIT: Thanks to Warren Weckesser for the detailed explanation!
A plot to see it better

The numbers used in the example that demonstrates the use of period in the interp docstring can be a bit difficult to interpret in a plot. Here's what is happening...
The period is 360, and the given "known" points are
xp = [190, -190, 350, -350]
fp = [ 5, 10, 3, 4]
Note that the values in xp span an interval longer than 360. Let's consider the interval [0, 360) to be the fundamental domain of the interpolator. If we map the given points to the fundamental domain, they are:
xp1 = [190, 170, 350, 10]
fp1 = [ 5, 10, 3, 4]
Now for a periodic interpolator, we can imagine this data being extended periodically in the positive and negative directions, e.g.
xp_ext = [..., 190-360, 170-360, 350-360, 10-360, 190, 170, 350, 10, 190+360, 170+360, 350+360, 10+360, ...]
fp_ext = [..., 5, 10, 3, 4, 5, 10, 3, 4, 5, 10, 3, 4, ...]
It is this extended data that interp is interpolating.
Here's a script that replaces the array x from the example with a dense set of points. With this dense set, the plot of y = np.interp(x, xp, fp, period=360) should make clearer what is going on:
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
x = np.linspace(-360, 720, 1200)
y = np.interp(x, xp, fp, period=360)
plt.plot(x, y, '--')
plt.plot(xp, fp, 'ko')
plt.grid(True)
Each "corner" in the plot is at a point in the periodically extended version of (xp, fp).

How to save result obtained via various fuctions into csv or excel file

I was hyper-tuning the machine learning algorithms via for loop.
but I do not know how to keep it saving into csv or excel file as it is showing in terminal output...Help me..Here I am sharing the code for reference.
def random_search():
options = create_opts()
# kernel
kernel_opts = np.array(["rbf", "poly", "linear"]) # 3
# C
C_opts = np.array([0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]) # 15
# C_linear_opts = np.array([0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100]) # 10
# epsilon
epsilon_opts = np.array([.01, .02, .03, .04, .05, .06, .07, .08, .09, 0.1]) # 10
# gamma
gamma_opts = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) # 9
# n_iter = len(kernel_opts)*len(C_opts)*len(epsilon_opts)*len(gamma_opts)
for kernel in kernel_opts:
for C in C_opts:
for epsilon in epsilon_opts:
for gamma in gamma_opts:
run_svr(options.random_state, options.poly_degree, kernel=kernel, C=C, epsilon=epsilon, gamma=gamma)

As a sample, use this. It's a little messed up, but I hope it helps you.
import pandas as pd
import numpy as np
# Creating
csv_format = {'col_1': [], 'col_2': [], 'col_3': [], 'col_4': []}
_db = pd.DataFrame(csv_format)
_db = seeds_db.fillna(0) # with 0s rather than NaNs
_db.to_csv("file.csv", index=False)
# Updating
_db = pd.read_csv("file.csv", delimiter=',')
new_rec = np.array([['val_1', 'val_2', 'val_3', 'val_4']])
_db = _db.append(pd.DataFrame(new_rec, columns=['col_1', 'col_2', 'col_3', 'col_4']))
_db.to_csv("file.csv", index=False)

Piecewise regression python

I am trying to do a piecewise linear regression in Python and the data looks like this,
I need to fit 3 lines for each section. Any idea how? I am having the following code, but the result is shown below. Any help would be appreciated.
import numpy as np
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy import optimize
def piecewise(x,x0,x1,y0,y1,k0,k1,k2):
return np.piecewise(x , [x <= x0, np.logical_and(x0<x, x< x1),x>x1] , [lambda x:k0*x + y0, lambda x:k1*(x-x0)+y1+k0*x0 lambda x:k2*(x-x1) y0+y1+k0*x0+k1*(x1-x0)])
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y1 = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y1 = np.flip(y1,0)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y = np.flip(y,0)
perr_min = np.inf
p_best = None
for n in range(100):
k = np.random.rand(7)*20
p , e = optimize.curve_fit(piecewise, x1, y1,p0=k)
perr = np.sum(np.abs(y1-piecewise(x1, *p)))
if(perr < perr_min):
perr_min = perr
p_best = p
xd = np.linspace(0, 21, 100)
plt.figure()
plt.plot(x1, y1, "o")
y_out = piecewise(xd, *p_best)
plt.plot(xd, y_out)
plt.show()
data with fit
Thanks.

A very simple method (without iteration, without initial guess) can solve this problem.
The method of calculus comes from page 30 of this paper : https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf (copy below).
The next figure shows the result :
The equation of the fitted function is :
Or equivalently :
H is the Heaviside function.
In addition, the details of the numerical calculus are given below :

Trying to interpolate linearly in python

I have 3 arrays: a, b, c all with length 15.
a=[950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150]
b = [16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80]
c=[32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -36.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.20000000000002]
I am trying to find the value of a at the index where b=c. T
The problem is that there is no place where b=c exactly so I need to linearly interpolate between values in the array to find the value of a where b=c. Does that make sense?
I was thinking about using scipy.interpolate to do the interpolation.
I am having a hard time wrappying my mind around how to solve this problem. Any ideas on this would be great!

Here's simpler variation of a function from another answer of mine:
from __future__ import division
import numpy as np
def find_roots(t, y):
"""
Given the input signal `y` with samples at times `t`,
find the times where `y` is 0.
`t` and `y` must be 1-D numpy arrays.
Linear interpolation is used to estimate the time `t` between
samples at which sign changes in `y` occur.
"""
# Find where y crosses 0.
transition_indices = np.where(np.sign(y[1:]) != np.sign(y[:-1]))[0]
# Linearly interpolate the time values where the transition occurs.
t0 = t[transition_indices]
t1 = t[transition_indices + 1]
y0 = y[transition_indices]
y1 = y[transition_indices + 1]
slope = (y1 - y0) / (t1 - t0)
transition_times = t0 - y0/slope
return transition_times
That function can be used with t = a and y = b - c. For example, here is your data, entered as numpy arrays:
In [354]: a = np.array([950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150])
In [355]: b = np.array([16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80])
In [356]: c = np.array([32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -3
...: 6.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.2000000000
...: 0002])
The place where "b = c" is the place where "b - c = 0", so we pass b - c for y:
In [357]: find_roots(a, b - c)
Out[357]: array([ 312.5])
So the linearly interpolated value of a is 312.5.
With the following matplotlib commands:
In [391]: plot(a, b, label="b")
Out[391]: [<matplotlib.lines.Line2D at 0x11eac8780>]
In [392]: plot(a, c, label="c")
Out[392]: [<matplotlib.lines.Line2D at 0x11f23aef0>]
In [393]: roots = find_roots(a, b - c)
In [394]: [axvline(root, color='k', alpha=0.2) for root in roots]
Out[394]: [<matplotlib.lines.Line2D at 0x11f258208>]
In [395]: grid()
In [396]: legend(loc="best")
Out[396]: <matplotlib.legend.Legend at 0x11f260ba8>
In [397]: xlabel("a")
Out[397]: <matplotlib.text.Text at 0x11e71c470>
I get the plot

This is not necessarily a solution to your problem, since your data does not appear to be linear, but it might give you some ideas. If you assume that your lines a, b, and c are linear, then the following idea works:
Perform a linear regression of lines a, b and c to get their respective slopes (m_a, m_b, m_c) and y-intercepts (b_a, b_b, b_c). Then solve the equation 'y_b = y_c' for x, and find y = m_a * x + b_a to get your result.
Since the linear regression approximately solves y = m * x + b, equation y_b = y_c can be solved by hand giving: x = (b_b-b_c) / (m_c-m_b).
Using python, you get:
>> m_a, b_a, r_a, p_a, err_a = stats.linregress(range(15), a)
>> m_b, b_b, r_b, p_b, err_b = stats.linregress(range(15), b)
>> m_c, b_c, r_c, p_c, err_c = stats.linregress(range(15), c)
>> x = (b_b-b_c) / (m_c-m_b)
>> m_a * x + b_a
379.55151515151516
Since your data is not linear, you probably need to go through your vectors one by one and search for overlapping y intervals. Then you can apply the above method but using only the endpoints of your two intervals to construct your b and c inputs to the linear regression. In this case, you should get an exact result, since the least-squares method will interpolate perfectly with only two points (although there are more efficient ways to do this since the intersection can be solved exactly in this simple case where there are two straight lines).
Cheers.

Another simple solution using:
one linear-regressor for each vector (done with scikit-learn as scipy-docs were down for me; easy to switch to numpy/scipy-based linear-regression)
general-purpose minimization using scipy.optimize.minimize
Code
a=[950, 850, 750, 675, 600, 525, 460, 400, 350, 300, 250, 225, 200, 175, 150]
b = [16, 12, 9, -35, -40, -40, -40, -45, -50, -55, -60, -65, -70, -75, -80]
c=[32.0, 22.2, 12.399999999999999, 2.599999999999998, -7.200000000000003, -17.0, -26.800000000000004, -36.60000000000001, -46.400000000000006, -56.2, -66.0, -75.80000000000001, -85.60000000000001, -95.4, -105.20000000000002]
from sklearn.linear_model import LinearRegression
from scipy.optimize import minimize
import numpy as np
reg_a = LinearRegression().fit(np.arange(len(a)).reshape(-1,1), a)
reg_b = LinearRegression().fit(np.arange(len(b)).reshape(-1,1), b)
reg_c = LinearRegression().fit(np.arange(len(c)).reshape(-1,1), c)
funA = lambda x: reg_a.predict(x.reshape(-1,1))
funB = lambda x: reg_b.predict(x.reshape(-1,1))
funC = lambda x: reg_c.predict(x.reshape(-1,1))
opt_crossing = lambda x: (funB(x) - funC(x))**2
x0 = 1
res = minimize(opt_crossing, x0, method='SLSQP', tol=1e-6)
print(res)
print('Solution: ', funA(res.x))
import matplotlib.pyplot as plt
x = np.linspace(0, 15, 100)
a_ = reg_a.predict(x.reshape(-1,1))
b_ = reg_b.predict(x.reshape(-1,1))
c_ = reg_c.predict(x.reshape(-1,1))
plt.plot(x, a_, color='blue')
plt.plot(x, b_, color='green')
plt.plot(x, c_, color='cyan')
plt.scatter(np.arange(15), a, color='blue')
plt.scatter(np.arange(15), b, color='green')
plt.scatter(np.arange(15), c, color='cyan')
plt.axvline(res.x, color='red', linestyle='solid')
plt.axhline(funA(res.x), color='red', linestyle='solid')
plt.show()
Output
fun: array([ 7.17320622e-15])
jac: array([ -3.99479864e-07, 0.00000000e+00])
message: 'Optimization terminated successfully.'
nfev: 8
nit: 2
njev: 2
status: 0
success: True
x: array([ 8.37754008])
Solution: [ 379.55151658]
Plot

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to weight station to Order Least Squares in python? - python

Related

Predicting y and x values using linear regressions

numpy interpolation with period

How to save result obtained via various fuctions into csv or excel file

Piecewise regression python

Trying to interpolate linearly in python

Categories

Resources