Based on the Coursera Machine Learning course, I'm trying to implement the cost function for a neural network in Python. There is a question similar to this one -- with an accepted answer -- but the code in that answer is written in Octave. Not wanting to be lazy, I have tried to adapt the relevant concepts of the answer to my case, and as far as I can tell I'm implementing the function correctly. The cost I output differs from the expected cost, however, so I'm doing something wrong.
Here's a small reproducible example:
The following link leads to an .npz file which can be loaded (as below) to obtain the relevant data. Please rename the file "arrays.npz" if you use it.
http://www.filedropper.com/arrays_1
import numpy as np

if __name__ == "__main__":
    with np.load("arrays.npz") as data:
        thrLayer = data['thrLayer'] # The final layer post-activation; you
        # can derive this final layer, if verification is needed, using the weights below
        thetaO = data['thetaO'] # The weight array between layers 1 and 2
        thetaT = data['thetaT'] # The weight array between layers 2 and 3
        Ynew = data['Ynew'] # The output array with a 1 in position i and 0s elsewhere;
        # class i is the class that the data described by X[i,:] belongs to
        X = data['X'] # Raw data with 1s appended to the first column
        Y = data['Y'] # One-dimensional column vector; entry i contains the class of entry i

    m = len(thrLayer)
    k = thrLayer.shape[1]

    cost = 0
    for i in range(m):
        for j in range(k):
            cost += -Ynew[i,j]*np.log(thrLayer[i,j]) - (1 - Ynew[i,j])*np.log(1 - thrLayer[i,j])
    print(cost)  # unnormalized sum, before averaging
    cost /= m

    '''
    Regularized Cost Component
    '''
    regCost = 0
    for i in range(len(thetaO)):
        for j in range(1, len(thetaO[0])):
            regCost += thetaO[i,j]**2
    for i in range(len(thetaT)):
        for j in range(1, len(thetaT[0])):
            regCost += thetaT[i,j]**2
    lam = 1  # regularization parameter
    regCost *= lam/(2.*m)  # float literal, so Python 2 integer division cannot zero this out

    print(cost)
    print(regCost)
In actuality, cost should be 0.287629 and cost + regCost should be 0.383770.
This is the cost function posted in the question above, for reference:
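(The formula image is reconstructed here from the loops above; it is the standard regularized neural-network cost, where the regularization sum skips the bias column, i.e. the first column, of each weight matrix:)

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\,\log\big(h_\Theta(x^{(i)})_k\big) + \big(1-y_k^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})_k\big)\Big] + \frac{\lambda}{2m}\sum_{l}\sum_{i}\sum_{j}\big(\Theta_{j,i}^{(l)}\big)^2$$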
The problem is that you are using the wrong class labels. When computing the cost function, you need to use the ground truth, or the true class labels.
I'm not sure what your Ynew array was, but it wasn't the training outputs. So I changed your code to use Y for the class labels in place of Ynew, and got the correct cost.
import numpy as np

with np.load("arrays.npz") as data:
    thrLayer = data['thrLayer'] # The final layer post-activation; you
    # can derive this final layer, if verification is needed, using the weights below
    thetaO = data['thetaO'] # The weight array between layers 1 and 2
    thetaT = data['thetaT'] # The weight array between layers 2 and 3
    Ynew = data['Ynew'] # The output array with a 1 in position i and 0s elsewhere;
    # class i is the class that the data described by X[i,:] belongs to
    X = data['X'] # Raw data with 1s appended to the first column
    Y = data['Y'] # One-dimensional column vector; entry i contains the class of entry i

m = len(thrLayer)
k = thrLayer.shape[1]
cost = 0

# Build the one-hot ground-truth matrix from the true class labels Y
Y_arr = np.zeros(Ynew.shape)
for i in range(m):
    Y_arr[i, int(Y[i,0]) - 1] = 1

for i in range(m):
    for j in range(k):
        cost += -Y_arr[i,j]*np.log(thrLayer[i,j]) - (1 - Y_arr[i,j])*np.log(1 - thrLayer[i,j])
cost /= m

'''
Regularized Cost Component
'''
regCost = 0
for i in range(len(thetaO)):
    for j in range(1, len(thetaO[0])):
        regCost += thetaO[i,j]**2
for i in range(len(thetaT)):
    for j in range(1, len(thetaT[0])):
        regCost += thetaT[i,j]**2
lam = 1
regCost *= lam/(2.*m)

print(cost)
print(cost + regCost)
This outputs:
0.287629165161
0.383769859091
Edit: Fixed an integer-division error, regCost *= lam/(2*m), that was zeroing out regCost; the code above now uses 2.*m.
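As an aside, the double loops above can be replaced with vectorized NumPy operations. A minimal sketch, assuming the same arrays (thrLayer, Y, thetaO, thetaT) are already loaded:

import numpy as np

m, k = thrLayer.shape
lam = 1

# One-hot encode the true labels (classes in Y are 1-indexed)
Y_arr = np.zeros((m, k))
Y_arr[np.arange(m), Y[:, 0].astype(int) - 1] = 1

# Unregularized cross-entropy cost, averaged over the m examples
cost = -np.sum(Y_arr*np.log(thrLayer) + (1 - Y_arr)*np.log(1 - thrLayer)) / m

# Regularization term, skipping the bias column of each weight matrix
regCost = lam/(2.0*m) * (np.sum(thetaO[:, 1:]**2) + np.sum(thetaT[:, 1:]**2))

print(cost)
print(cost + regCost)

Since this computes the identical quantity, it should print the same 0.287629 and 0.383770 as the loop version.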
You might try this implementation:

import numpy as np
import scipy.io

mat = scipy.io.loadmat('ex4data1.mat')
X = mat['X']
y = mat['y']

theta = scipy.io.loadmat('ex4weights.mat')
theta1 = theta['Theta1']
theta2 = theta['Theta2']
theta = [theta1, theta2]

# One-hot encode the labels, one column per example
new = np.zeros((10, len(y)))
for i in range(len(y)):
    new[int(y[i]) - 1, i] = 1
y = new

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def reg_cost(theta, X, y, lambda1):
    # Forward propagation through each layer
    current = X
    for i in range(len(theta)):
        a = np.append(np.ones((len(current), 1)), current, axis=1)  # add bias column
        z = np.matmul(a, theta[i].T)
        z = sigmoid(z)
        current = z
    htheta = current
    # Cross-entropy cost
    ans = (np.sum(np.multiply(np.log(htheta), y.T)) +
           np.sum(np.multiply(np.log(1 - htheta), (1 - y).T)))
    ans = -ans/len(X)
    # Regularization, skipping the bias column of each weight matrix
    for i in range(len(theta)):
        new = theta[i][:, 1:]
        newsum = np.sum(np.multiply(new, new))
        ans += newsum*(lambda1)/(2*len(X))
    return ans

print(reg_cost(theta, X, y, 1))
It outputs
0.3837698590909236
Related
I have this least squares method, but I need it to handle 10-dimensional data. This is from a practice text I'm learning from. I came up with this method for a two-dimensional data set; now I need to make it work for a 10-dimensional one, but I'm totally stuck on it.
def least_squares(w):
    cost = 0
    for p in range(len(y)):
        # get pth input/output pair
        x_p = x[p]
        y_p = y[p]
        # form linear combination
        c_p = w[0] + w[1] * x_p
        # add least squares for this datapoint
        cost += (c_p - y_p) ** 2
    return cost
This is the result I should get after the edit
w = np.ones((11,1))
print (least_squares(w))
[ 7917.97952037]
I figured it out after a lot of tinkering.

# least squares cost function for linear regression
def least_squares(w):
    cost = 0
    for p in range(len(y)):
        # get pth input/output pair
        x_p = x[p]
        y_p = y[p]
        # form linear combination: bias plus a dot product over all 10 features
        c_p = w[0] + np.dot(x_p, w[1:])
        # add least squares for this datapoint
        cost += (c_p - y_p) ** 2
    return cost
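Since the data from the practice text isn't included here, a quick way to sanity-check the 10-dimensional version is with synthetic stand-in arrays (hypothetical data; x and y are the globals the function reads):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 10))  # 50 examples, 10 features
y = rng.standard_normal((50, 1))   # 50 targets

w = np.ones((11, 1))  # w[0] is the bias, w[1:] holds one weight per feature
print(least_squares(w))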
Problem Summary
I have been optimizing my function VectorizedVcdfe, and I am still trying to optimize it further. This function accounts for 99% of the runtime of another function, customFunc, which is used in a PyMC3 code block.
Please help me optimize VectorizedVcdfe.
Function to optimize
def VectorizedVcdfe(self, x, dataVector, recip_h_times_lambda_vector):
    n = len(dataVector)
    differenceVector = x - dataVector
    stackedDiffVecAndRecipVec = pymc3.math.stack(differenceVector,
                                                 recip_h_times_lambda_vector)
    erfcTerm = 1. - pymc3.math.erf(self.neg_sqrt1_2 *
                                   pymc3.math.prod(stackedDiffVecAndRecipVec, axis=0))
    # Calc F_Hat
    F_Hat = (1. / float(n)) * pymc3.math.sum(0.5 * erfcTerm)
    # Return F_Hat
    return F_Hat
Arguments/variables
x is a TensorVariable.
dataVector is a 1×n numpy matrix.
recip_h_times_lambda_vector is also a 1×n numpy matrix.
neg_sqrt1_2 is a scalar constant.
How customFunc is used
with pymc3.Model() as model:
    # Create likelihood
    like = pymc3.DensityDist('X', customFunc, shape=2)
    # Make samples
    step = pymc3.NUTS()
    trace = pymc3.sample(2000, tune=1000, init=None, step=step, cores=2)
EDIT:
To answer commenters: random values are OK for both dataVector and recip_h_times_lambda_vector for the purposes of this optimization. In reality, recip_h_times_lambda_vector depends on dataVector and a scalar parameter h.
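For concreteness, stand-in test inputs could be generated like this (a sketch only; the 1×n shapes follow the description above, and the value of h is an arbitrary placeholder):

import numpy as np

n = 10000                                                # typical n: 10,000 to 100,000
dataVector = np.random.randn(1, n)                       # random values are fine here
h = 0.1                                                  # hypothetical scalar parameter
recip_h_times_lambda_vector = np.random.rand(1, n) / h   # random values are fine here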
Some commenters were wondering about customFunc, so here it is...
import math
import pymc3

def customFunc(X):
    Y = []
    for j in range(2):
        x_j = X[j]
        F_x_j = fittedKdEstimator.VCDFE(x_j)
        y_j = myPPF(F_x_j)
        Y.append(y_j)
    logLikelihood = 0.
    recipSqrtTwoPi = 1. / math.sqrt(2. * math.pi)
    for j in range(2):
        y_j = Y[j]
        logLikelihood += pymc3.math.log(recipSqrtTwoPi * pymc3.math.exp(y_j * y_j / -2.))
    return pymc3.math.exp(logLikelihood)
The global variable fittedKdEstimator is an instance of the class that contains the functions VectorizedVcdfe and VCDFE.
Here is the Python code for VCDFE...
def VCDFE(self, x):
    if not self.beenFit:
        raise Exception("Must first fit to data")
    return self.VectorizedVcdfe(x, self.__dataVector,
                                self.__recip_h_times_lambda_vector)
On a separate note, the function myPPF is my implementation of the standard normal "percent-point function" (a.k.a. the quantile function). I have timed customFunc, and myPPF takes only a small fraction of the total time; the vast majority is consumed by VectorizedVcdfe.
Last but not least, a typical value for n may range from 10,000 to 100,000.
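One observation on the function itself: stacking the two vectors and then taking pymc3.math.prod(..., axis=0) computes just their elementwise product, so the same quantity can be written directly, without the intermediate stacked tensor. A sketch under that assumption (untimed, so treat it as a candidate to profile rather than a known speedup; assumes pymc3 is imported as above):

def VectorizedVcdfe(self, x, dataVector, recip_h_times_lambda_vector):
    n = len(dataVector)
    # elementwise product replaces stack(...) followed by prod(..., axis=0)
    scaledDiff = (x - dataVector) * recip_h_times_lambda_vector
    erfcTerm = 1. - pymc3.math.erf(self.neg_sqrt1_2 * scaledDiff)
    return (0.5 / float(n)) * pymc3.math.sum(erfcTerm)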
I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte, so I need help with the implementation. Here is my code (I'm sorry it's a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt

# definition of initial condition function
def f(x):
    return x**2

# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25

# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init

# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init

# Boundary Conditions
for m in xrange(0, M):
    t[m] = m * dt

# Initial Conditions
for j in xrange(0, N):
    x[j] = j * dx

# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # (N+1)x(M+1) array to store values of the solution

# finite difference scheme
for j in xrange(0, N-1):
    u[j][0] = x**2 #initial condition

for m in xrange(0, M):
    for j in xrange(1, N-1):
        if j == 1:
            u[j-1][m] = 0 # Boundary condition
        else:
            u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
                2 * u[j][m] + u[j-1][m] )
    else:
        if j == N-1:
            u[j+1][m] = 0 # Boundary Condition

print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is creating an array/matrix to store values of the solution. I wanted it to be an N×M matrix, but in my code I made it (N+1)×(M+1) because I kept getting an index-out-of-bounds error. Anyway, how can I create such a matrix with numpy.array so as not to needlessly take up memory with an (N+1)×(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]), where j is the jth spatial value and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to implement the initial condition u(x[j],t[0]) = x**2 for all values of j = 0,...,N-1. I also need to implement the boundary conditions u(x[0],t[m]) = 0 = u(x[N],t[m]) for all values of t = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under the two different for loops I used to calculate the values of x and t (in my code I still have comments placed where I tried this).
I think I am just not using the right notation, but I cannot find anywhere in the NumPy documentation how to index such an array so as to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is greatly appreciated. This is not homework, but rather an attempt to understand how to program FDM for the Heat Equation, because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: When I run my code I get two errors: the final "else" in the scheme raises "invalid syntax", and the initial-condition line u[j][0] = x**2 raises "setting an array element with a sequence." What do these mean?
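For reference, here is a minimal corrected sketch of the scheme described above (an illustration, not the original code): it keeps the same parameters, indexes the solution as u[j, m], and applies the initial and boundary conditions outside the time-stepping loop. For an explicit scheme like this, s = k*dt/dx**2 must satisfy s <= 0.5 for stability.

import numpy as np

L, T = 1.0, 10.0
N, M = 10, 100
s = 0.25                       # s = k*dt/dx**2, assumed <= 0.5 for stability

x = np.linspace(0, L, N + 1)   # N+1 spatial nodes, endpoints included
u = np.zeros((N + 1, M + 1))   # u[j, m] approximates u(x[j], t[m])

u[:, 0] = x**2                 # initial condition u(x, 0) = f(x) = x**2
u[0, :] = 0.0                  # boundary condition u(0, t) = 0
u[N, :] = 0.0                  # boundary condition u(L, t) = 0

# explicit finite-difference update over the interior nodes only
for m in range(M):
    for j in range(1, N):
        u[j, m+1] = u[j, m] + s * (u[j+1, m] - 2*u[j, m] + u[j-1, m])

The inner loop can also be vectorized with slicing: u[1:N, m+1] = u[1:N, m] + s*(u[2:, m] - 2*u[1:N, m] + u[:N-1, m]).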
Lately I have been fitting a Fourier series to a periodic signal in order to retrieve the amplitude and the phase of each component via least squares, so I modified the code from this file for it:
import math
import numpy as np

# period of the signal
per = 1.0
w = 2.0*np.pi/per

# number of fourier components
nf = 5

fp = open("file.cat", "r")

# m1 is the number of unknown coefficients
m1 = 2*nf + 1

# Create empty matrices
x = np.zeros((m1, m1))
y = np.zeros((m1, 1))
xi = [0.0]*m1

# Read (time, value) from each line of the file
for line in fp:
    t = float(line.split()[0])
    yi = float(line.split()[1])
    xi[0] = 1.0
    for k in range(1, nf+1):
        xi[2*k-1] = np.sin(k*w*t)
        xi[2*k] = np.cos(k*w*t)
    for j in range(m1):
        for k in range(m1):
            x[j,k] += xi[j]*xi[k]
        y[j] += yi*xi[j]
fp.close()

# Copy to big matrices
X = np.mat(x.copy())
Y = np.mat(y.copy())

# Invert X and multiply by Y to get coefficients
A = X.I*Y
A0 = A[0]

# Solution is A0 + Sum[ Amp*sin(k*w*t + phi) ]
print "a[0] = %f" % A[0]
for k in range(1, nf+1):
    amp = math.sqrt(A[2*k-1]**2 + A[2*k]**2)
    phs = math.atan2(A[2*k], A[2*k-1])
    print "amp[%d] = %f phi = %f" % (k, amp, phs)
but the plot of the resulting fit (not reproduced here) does not follow the signal at all, whereas it should trace the periodic data.
Can somebody tell me how to compute the phase and the amplitude in a simpler way? A pointer to a guide would also be welcome; I'd be very grateful.
Cheers!
PS: I will attach the FILE that I used, just because :)
EDITED
The error was with an index :(
First, I defined the vector with the values:
amp = np.array([np.sqrt((A[2*k-1])**2 + (A[2*k])**2) for k in range(1,nf+1)])
phs = np.array([math.atan2(A[2*k],A[2*k-1]) for k in range(1,nf+1)])
and then, to build the signal, I defined:
def term(t): return np.array([amp[k]*np.sin(k*w*t + phs[k]) for k in range(len(amp))])
Signal = np.array([A0+sum(term(phase[i])) for i in range(len(mag))])
but within np.sin(), k should be k+1, because the indexing starts at 0:
def term(t): return np.array([amp[k]*np.sin((k+1)*w*t + phs[k]) for k in range(len(amp))])
plt.plot(phase,Signal,'r-',lw=3)
and that is all.
Thanks Marco Tompitak for the help!!
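Putting the corrected pieces together, a consolidated sketch of the reconstruction (assuming A, A0, w and nf from the fitting code above, and that phase holds the sample times at which to evaluate the signal):

import math
import numpy as np
import matplotlib.pyplot as plt

amp = np.array([math.sqrt(float(A[2*k-1])**2 + float(A[2*k])**2) for k in range(1, nf+1)])
phs = np.array([math.atan2(float(A[2*k]), float(A[2*k-1])) for k in range(1, nf+1)])

# list entry k corresponds to Fourier component k+1, since indexing starts at 0
def term(t):
    return np.array([amp[k]*np.sin((k+1)*w*t + phs[k]) for k in range(len(amp))])

Signal = np.array([float(A0) + sum(term(phase[i])) for i in range(len(phase))])

plt.plot(phase, Signal, 'r-', lw=3)
plt.show()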
You're specifying the wrong period for the signal:
#period of the signal
per=0.178556
This gives you the resulting Fourier fit, indeed with a maximum period of ~0.17. The problem is that this number specifies the longest period that is present in your Fourier series; the function only has components with period 0.17 or shorter. Apparently you are expecting a fit with period ~1, so it can never approximate that properly. You should specify per=1.0. There's nothing wrong with the algorithm; a quick writeup of a similar algorithm in Mathematica gives the same output and plausible results.
Currently my convergence criterion for SGD checks whether the MSE error ratio is within a specific boundary:
def compute_mse(data, labels, weights):
    m = len(labels)
    hypothesis = np.dot(data, weights)
    sq_errors = (hypothesis - labels) ** 2
    mse = np.sum(sq_errors)/(2.0*m)
    return mse

cur_mse = 1.0
prev_mse = 100.0
m = len(labels)
while cur_mse/prev_mse < 0.99999:
    prev_mse = cur_mse
    for i in range(m):
        d = np.array(data[i])
        hypothesis = np.dot(d, weights)
        gradient = np.dot((labels[i] - hypothesis), d)/m
        weights = weights + (alpha * gradient)
    cur_mse = compute_mse(data, labels, weights)
    if cur_mse > prev_mse:
        return
The weights are updated with respect to a single data point in the training set.
With an alpha of 0.001, the model is supposed to converge within a few iterations, but I get no convergence. Is this convergence criterion too strict?
I'll try to answer the question. First, the pseudocode of stochastic gradient descent looks something like this:
input: f(x), alpha, initial x (guess or random)
output: min_x f(x)  # x that minimizes f(x)

while True:
    shuffle data  # good practice, not strictly needed
    for d in data:
        x -= alpha * grad(f(x))  # df/dx
    if <stopping criterion>:
        break
There can be other regularization terms added to the function you want to minimize, such as an l1 penalty, to avoid overfitting.
Going back to your problem, looking at your data and the definition of the gradient, it looks like you want to solve a simple linear system of equations of the form:
Ax = b
which yields the objective function:
f(x) = ||Ax - b||^2
Stochastic gradient descent uses one row of the data at a time:

f_i(x) = ||A_i x - b_i||^2

where || o || is the Euclidean norm and the subscript i selects a row.
Here, A is your data, x is your weights and b is your labels.
The gradient of the function is then computed as:

grad(f(x)) = 2 * A.T (Ax - b)

Or, in the case of stochastic gradient descent:

grad(f_i(x)) = 2 * A_i.T (A_i x - b_i)
where .T means transpose.
Putting everything back into your code... first I will set up some synthetic data:

import numpy as np
from random import shuffle

A = np.random.randn(100, 2)                       # 100x2 data
x = np.random.randn(2, 1)                         # 2x1 weights
b = np.random.randint(0, 2, 100).reshape(100, 1)  # 100x1 labels
b[b == 0] = -1                                    # labels in {-1, 1}
Then, define the parameters:
alpha = 0.001
cur_mse = 100.
prev_mse = np.inf
it = 0
max_iter = 100
m = A.shape[0]
idx = list(range(m))  # a list, so it can be shuffled in place
And loop!
while cur_mse/prev_mse < 0.99999 and it < max_iter:
    prev_mse = cur_mse
    shuffle(idx)
    for i in idx:
        d = A[i:i+1]
        y = b[i:i+1]
        h = np.dot(d, x)
        dx = 2 * np.dot(d.T, (h - y))
        x -= (alpha * dx)
    cur_mse = np.mean((A.dot(x) - b)**2)
    if cur_mse > prev_mse:
        raise Exception("Not converging")
    it += 1
This code is pretty much the same as yours, with a couple of additions:
Another stopping criterion based on the number of iterations (to avoid looping forever if the system doesn't converge, or converges too slowly).
A redefinition of the gradient dx (still similar to yours). Your sign is inverted, so your weight update is positive (+), while in my example it is negative (-); this makes sense, since you are going down the gradient.
Indexing of the data and labels. While data[i] gives an array of shape (2,) (for 100x2 data), the slice data[i:i+1] returns a view of the data without dropping a dimension (e.g. with shape (1, 2)), which allows you to perform the proper matrix multiplications.
You can add a 3rd stopping criterion based on acceptable mse error, i.e: if cur_mse < 1e-3: break.
This algorithm, with random data, converges in 20-40 iterations for me (depending on the generated random data).
So... assuming that this is the function you want to minimize, if this method doesn't work for you, it might mean that your system is underdetermined (you have fewer training examples than features, which means A is wider than it is tall).
Hope it helps!