Multi-variable gradient descent using NumPy - error in number of coefficients - Python

For the past few days, I have been trying to code this application of gradient descent for my final-year project in Mechanical Engineering. The HTML file with my results is attached here: https://drive.google.com/open?id=1tIGqZ2Lb0sN4GEpgYEZLFvtmhigXnot0. If you download it and look at the results, there are only 3 values in theta, whereas x has 3 independent variables, so there should be 4 values in theta.
The code is as follows; the result is theta = [-0.03312393 0.94409351 0.99853041].
import numpy as np
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv',
                 delimiter=",", header=None)
x = df.loc[:, 0:2]   # with header=None the column labels are integers
y = df[3]
print(x)

m, n = np.shape(x)
numIterations = 200
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
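For reference, theta gets exactly one entry per column of x, so three feature columns produce three coefficients; a fourth (intercept) value only appears if x includes a bias column of ones. A minimal sketch of that idea, with a hypothetical design matrix (the bias column is an assumption about the intended model, not part of the original code):

import numpy as np

# Hypothetical 2x3 design matrix standing in for the CSV data
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
# Prepend a column of ones so gradient descent can learn an intercept
x_with_bias = np.column_stack([np.ones(x.shape[0]), x])
m, n = np.shape(x_with_bias)
theta = np.ones(n)
print(theta.shape)  # (4,): intercept + 3 coefficients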

Related

Integrating and fitting coupled ODEs for SIR modelling

In this case, there are 3 ODEs that describe an SIR model. The issue is that I want to calculate which beta and gamma values best fit the data points from the x_axis and y_axis values. The method I'm currently using is to integrate the ODEs using odeint from the scipy library and then use the curve_fit method, also from the same library. In this case, how would you calculate the values of beta and gamma that fit the data points?
P.S. the current error is this: ValueError: operands could not be broadcast together with shapes (3,) (14,)
import numpy as np
import scipy.integrate as spi
from scipy.optimize import curve_fit

# initial values
S_I_R = (0.762/763, 1/763, 0)
x_axis = [m for m in range(1, 15)]
y_axis = [3, 8, 28, 75, 221, 291, 255, 235, 190, 125, 70, 28, 12, 5]

# ODEs that describe the system
def equation(SIR_Values, t, beta, gamma):
    Array = np.zeros((3))
    SIR = SIR_Values
    Array[0] = -beta * SIR[0] * SIR[1]
    Array[1] = beta * SIR[0] * SIR[1] - gamma * SIR[1]
    Array[2] = gamma * SIR[1]
    return Array

# Results = spi.odeint(equation, S_I_R, time)

# fitting the values
beta_values, gamma_values = curve_fit(equation, x_axis, y_axis)
A working version: fit a wrapper around odeint that returns only the infected fraction, so its output matches the shape of x_axis and y_axis.
import numpy as np
import scipy.integrate as spi
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Starting values
S0 = 762/763
I0 = 1/763
R0 = 0
x_axis = np.array([m for m in range(0, 15)])
y_axis = np.array([1, 3, 8, 28, 75, 221, 291, 255, 235, 190, 125, 70, 28, 12, 5])
y_axis = np.divide(y_axis, 763)

def sir_model(y, x, beta, gamma):
    S = -beta * y[0] * y[1]
    R = gamma * y[1]
    I = beta * y[0] * y[1] - gamma * y[1]
    return S, I, R

def fit_odeint(x, beta, gamma):
    return spi.odeint(sir_model, (S0, I0, R0), x, args=(beta, gamma))[:, 1]

popt, pcov = curve_fit(fit_odeint, x_axis, y_axis)
beta, gamma = popt
fitted = fit_odeint(x_axis, *popt)

plt.plot(x_axis, y_axis, 'o', label="infected per day")
plt.plot(x_axis, fitted, label="fitted graph")
plt.xlabel("Time (in days)")
plt.ylabel("Fraction of infected")
plt.title("Fitted beta and gamma values")
plt.legend()
plt.show()
As in this example from the scipy documentation, the fitted function must return an array with the same shape as x_axis and y_axis.
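To make the shape requirement concrete, here is a quick check (a sketch assuming the definitions above; beta = 2.0 and gamma = 0.5 are arbitrary illustrative values):

# Shape check: fit_odeint returns one infected value per time point
out = fit_odeint(x_axis, 2.0, 0.5)
print(out.shape, x_axis.shape)  # both (15,), so curve_fit can compare it to y_axis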

How to solve / fit a geometric Brownian motion process in Python?

For example, the code below simulates a geometric Brownian motion (GBM) process, which satisfies the following stochastic differential equation:
dS_t = mu S_t dt + sigma S_t dB_t
The code is a condensed version of the code in this Wikipedia article.
import numpy as np
np.random.seed(1)

def gbm(mu=1, sigma=0.6, x0=100, n=50, dt=0.1):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()

series = gbm()
How to fit the GBM process in Python? That is, how to estimate mu and sigma and solve the stochastic differential equation given the time series series?
Parameter estimation for SDEs is a research-level area, and thus rather non-trivial. Whole books exist on the topic; feel free to look into those for more details.
But here's a trivial approach for this case. Firstly, note that the log of GBM is an affinely transformed Wiener process (i.e. a linear Ito drift-diffusion process), so by Ito's lemma
d ln(S_t) = (mu - sigma^2 / 2) dt + sigma dB_t
Thus we can estimate the log-process parameters and translate them to fit the original process. Check out [1], [2], [3], [4], for example.
Here's a script that does this in two simple ways for the drift (just wanted to see the difference), and just one way for the diffusion (sorry). The drift of the log-process is estimated by (X_T - X_0) / T and via the incremental MLE (see code). The diffusion parameter is estimated (in a biased way) via its definition as the infinitesimal variance.
import numpy as np
np.random.seed(9713)

# Parameters
mu = 1.5
sigma = 0.9
x0 = 1.0
n = 1000
dt = 0.05

# Times
T = dt * n
ts = np.linspace(dt, T, n)

# Geometric Brownian motion generator
def gbm(mu, sigma, x0, n, dt):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()

# Estimate mu just from the series end-points
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def simple_estimate_mu(series):
    return (series[-1] - x0) / T

# Use all the increments combined (maximum likelihood estimator)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def incremental_estimate_mu(series):
    total = (1.0 / dt) * (ts**2).sum()
    return (1.0 / total) * (1.0 / dt) * (ts * series).sum()

# This just estimates sigma by its definition as the infinitesimal variance (simple Monte Carlo)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
# One can do better than this of course (MLE?)
def estimate_sigma(series):
    return np.sqrt((np.diff(series)**2).sum() / (n * dt))

# Estimator helper
all_estimates0 = lambda s: (simple_estimate_mu(s), incremental_estimate_mu(s), estimate_sigma(s))

# Since log-GBM is a linear Ito drift-diffusion process (scaled Wiener process with drift), we
# take the log of the realizations, compute mu and sigma, and then translate the mu and sigma
# to that of the GBM (instead of the log-GBM). (For sigma, nothing is required in this simple case.)
def gbm_drift(log_mu, log_sigma):
    return log_mu + 0.5 * log_sigma**2

# Translates all the estimates from the log-series
def all_estimates(es):
    lmu1, lmu2, sigma = all_estimates0(es)
    return gbm_drift(lmu1, sigma), gbm_drift(lmu2, sigma), sigma

print('Real Mu:', mu)
print('Real Sigma:', sigma)

### Using one series ###
series = gbm(mu, sigma, x0, n, dt)
log_series = np.log(series)
print('Using 1 series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % all_estimates(log_series))

### Using K series ###
K = 10000
s = [np.log(gbm(mu, sigma, x0, n, dt)) for i in range(K)]
e = np.array([all_estimates(si) for si in s])
avgs = np.mean(e, axis=0)
print('Using %d series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % (K, avgs[0], avgs[1], avgs[2]))
The output:
Real Mu: 1.5
Real Sigma: 0.9
Using 1 series: mu1 = 1.56, mu2 = 1.54, sigma = 0.96
Using 10000 series: mu1 = 1.51, mu2 = 1.53, sigma = 0.93

Gradient descent with NumPy in Python - discrepancy between Excel and calculated data

I am writing this algorithm for my final-year project. I used gradient descent to find the best-fit line, and I also solved it in Excel using multiple regression; the values are different.
The csv file is attached here: https://drive.google.com/file/d/1-UaU34w3c5-VunYrVz9fD7vRb0c-XDqk/view?usp=sharing. The first 3 columns are independent variables (x1, x2, x3) and the last is the dependent variable (y).
It's a separate question, but could you explain why the answer differs from the Excel values?
import numpy as np
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv',
                 delimiter=",", header=None)
df.columns = ['x0', 'Speed', 'Feed', 'DOC', 'Roughness']
print(df)

y = np.array(df['Roughness'])
x = np.array(df.drop(['Roughness'], axis=1))
print(x)
print(y)

m, n = np.shape(x)
print(m, n)
numIterations = 50000
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
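One common explanation for the discrepancy: with a learning rate this small (0.000001), gradient descent may stop well short of the optimum even after 50000 iterations, while Excel's multiple regression computes the exact least-squares solution. A minimal sketch of a sanity check (assuming x and y as built above):

# Closed-form least-squares solution, which is what Excel's regression computes
theta_exact, residuals, rank, sv = np.linalg.lstsq(x, y, rcond=None)
print(theta_exact)  # if gradient descent has converged, theta should be close to this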

Linear Regression with Gradient Descent in Python with numpy

I'm trying to implement in Python the first exercise of Andrew Ng's Coursera Machine Learning course. In the course the exercise uses Matlab/Octave, but I wanted to implement it in Python as well.
The problem is that the line that updates the theta values does not seem to be working right; it returns [[0.72088159] [0.72088159]], but it should return [[-3.630291] [1.166362]].
I'm using a learning rate of 0.01, and the gradient loop was set to 1500 iterations (the same values as in the original exercise in Octave).
And obviously, with these wrong values for theta, the predictions are not correct, as shown in the last chart.
In the rows in which I test the cost function with theta values defined as [0; 0] and [-1; 2], the results are correct (the same as in the Octave exercise), so the error can only be in the gradient function, but I do not know what went wrong.
I'd like someone to help me figure out what I'm doing wrong. I'm grateful in advance.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def load_data():
    X = np.genfromtxt('data.txt', usecols=(0), delimiter=',', dtype=None)
    y = np.genfromtxt('data.txt', usecols=(1), delimiter=',', dtype=None)
    X = X.reshape(1, X.shape[0])
    y = y.reshape(1, y.shape[0])
    ones = np.ones(X.shape)
    X = np.append(ones, X, axis=0)
    theta = np.zeros((2, 1))
    return (X, y, theta)

alpha = 0.01
iter_num = 1500
debug_at_loop = 10

def plot(x, y, y_hat=None):
    x = x.reshape(x.shape[0], 1)
    plt.xlabel('x')
    plt.ylabel('hΘ(x)')
    plt.ylim(ymax=25, ymin=-5)
    plt.xlim(xmax=25, xmin=5)
    plt.scatter(x, y)
    if type(y_hat) is np.ndarray:
        plt.plot(x, y_hat, '-')
    plt.show()

plot(X[1], y)

def hip(X, theta):
    return np.dot(theta.T, X)

def cost(X, y, theta):
    m = y.shape[1]
    return np.sum(np.square(hip(X, theta) - y)) / (2 * m)

print('With theta = [0 ; 0]')
print('Cost computed =', cost(X, y, np.array([0, 0])))
print()
print('With theta = [-1 ; 2]')
print('Cost computed =', cost(X, y, np.array([-1, 2])))

def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):
    J = []
    m = y.shape[1]
    for i in range(iter_num):
        theta -= ((alpha * 1) / m) * np.sum(np.dot(hip(X, theta) - y, X.T))
        if i % debug_cost_at_each == 0:
            J.append(round(cost(X, y, theta), 6))
    return J, theta

X, y, theta = load_data()
J, fit_theta = grad(X, y, alpha, theta)
print('Theta found by Gradient Descent:', fit_theta)

# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([[1], [3.5]]).T, fit_theta)
print('For population = 35,000, we predict a profit of \n', predict1 * 10000)
predict2 = np.dot(np.array([[1], [7]]).T, fit_theta)
print('For population = 70,000, we predict a profit of \n', predict2 * 10000)

pred_y = hip(X, fit_theta)
plot(X[1], y, pred_y.T)
The data I'm using is the following txt:
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
Well, I got it after losing several strands of hair (programming will still leave me bald).
It was on the gradient line, and the solution was this:
theta -= ((alpha * 1) / m) * np.dot(X, (hip(X, theta) - y).T)
I swapped the position of X and transposed the error vector.
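For anyone hitting the same wall: np.sum collapses the whole gradient into a single scalar, so every component of theta receives the same update, while np.dot keeps one partial derivative per parameter. A minimal sketch with dummy values (the arrays here are hypothetical, just to show the shapes):

import numpy as np

X = np.vstack([np.ones(5), np.arange(5.0)])  # shape (2, 5): bias row plus one feature row
err = np.arange(5.0).reshape(1, 5)           # stands in for hypothesis - y

print(np.sum(np.dot(err, X.T)))   # 40.0 -- one scalar applied to both thetas
print(np.dot(X, err.T).ravel())   # [10. 30.] -- a distinct gradient per theta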

Why is my gradient descent algorithm not working correctly?

I am trying to port the gradient descent algorithm for linear regression from Andrew Ng's Machine Learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave, it works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        prediction = X*theta;
        margin_error = prediction - y;
        gradient = 1/m * (alpha * (X' * margin_error));
        theta = theta - gradient;
        J_history(iter) = computeCost(X, y, theta);
    end
end
However, when I translate this to Python, for some reason it does not give me accurate results. The cost seems to be going up rather than down.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
    m = len(y)
    J_history = np.matrix(np.zeros((iters, 1)))
    for i in range(iters):
        prediction = x * theta.T
        margin_error = prediction - y
        gradient = 1/m * (alpha * (x.T * margin_error))
        theta = theta - gradient
        J_history[i] = computeCost(x, y, theta)
    return theta, J_history
The code runs without errors. For reference, this is theta:
theta = np.matrix(np.array([0,0]))
Alpha and iters are set to this:
alpha = 0.01
iters = 1000
When I run opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
    # Get length of data set
    m = len(y)
    # We take theta transpose because we are working with a numpy array, [0,0] for example
    prediction = x * theta.T
    J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
    return J
Look at this:
>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
[1, 1, 1]])
The matrices are broadcast together. This is because "np.matrix inherits from np.array. np.matrix overrides multiplication, but not addition and subtraction".
In your situation, theta (1x2) minus gradient (2x1) broadcasts to a 2x2 result. Try transposing the gradient before subtracting:
theta = theta - gradient.T
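A quick demonstration of the fix (a sketch with dummy values):

import numpy as np

theta = np.matrix([[0.0, 0.0]])       # shape (1, 2)
gradient = np.matrix([[0.1], [0.2]])  # shape (2, 1)

print((theta - gradient).shape)    # (2, 2): silent broadcasting corrupts theta
print((theta - gradient.T).shape)  # (1, 2): the intended element-wise update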
