T.hessian gives NotImplementedError() in theano

T.hessian gives NotImplementedError() in theano - python

How it can be that it works
g_W = T.grad(cost=cost, wrt=classifier.vparamW)
whereas this
H_W=T.hessian(cost=cost, wrt=classifier.vparamW)
gives NotImplementedError()
may it be that the problem in such cost function:
-T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
Here y is the vector of class labels from 0 to n-1 and
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

I am unable to reproduce this problem with the limited code that has been provided. However, here is a fully working demo of T.grad and T.hessian.
import numpy
import theano
import theano.tensor as T
x = T.matrix()
w_flat = theano.shared(numpy.random.randn(3, 2).astype(theano.config.floatX).flatten())
w = w_flat.reshape((3, 2))
cost = T.pow(theano.dot(x, w), 2).sum()
g_w = T.grad(cost=cost, wrt=[w])
h_w = T.hessian(cost=cost, wrt=[w_flat])
f = theano.function([x], outputs=g_w + h_w)
for output in f(numpy.random.randn(4, 3).astype(theano.config.floatX)):
print output.shape, '\n', output
Note that the wrt value for T.hessian needs to be a vector.

Related

Linear Regression with python - gradient descent error

I have been trying to implement my own Linear Regression from scratch using python but have been facing a issue during the last days.
This is the code I am using :
Import modules
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
Initialize parameters
def initialize_parameters(n):
w = np.zeros(n,)
b = 0.0
return w,b
Predictor/Hypothesis
def predictor(x, w, b):
return np.dot(x,w) + b
Cost function
def calculate_cost(X, y, theta, b):
m = len(y)
predictions = np.dot(X, theta)
error = predictions - y
cost = (1/2*m) * np.sum(np.power(error,2))
return cost
Gradient descent
def gradient_descent(X, W, b, y, learning_rate = 0.0001, epochs = 25):
m = len(y)
final_cost = 0
for _ in range(epochs):
predictions = predictor(X, W, b)
error = predictions - y
derivate = np.dot(error, X)
print(derivate)
W = W - (learning_rate/m) * derivate
b = b - (learning_rate/m) * error.sum()
Test run :
# Load dataset
boston = load_boston()
data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data['PRICE'] = boston.target
# Split dataset
X = data.drop(columns=['PRICE']).values
Y = data['PRICE'].values
w, b = initialize_parameters(X.shape[1])
gradient_descent(X, w, b, Y)
During the test run, I can see that the values for the derivate is growing insanely fast :
[1.41239553e+06 3.20162679e+06 3.84829686e+06 2.17737688e+04
1.81667467e+05 1.99565485e+06 2.27660208e+07 1.15045731e+06
3.50107975e+06 1.40396525e+08 5.96494458e+06 1.14447329e+08
4.25947931e+06]
[-4.33362969e+07 -9.66008831e+07 -1.16941872e+08 -6.62733008e+05
-5.50761913e+06 -6.04452389e+07 -6.90425672e+08 -3.46792848e+07
-1.06967561e+08 -4.26847914e+09 -1.80579130e+08 -3.45024565e+09
-1.29016170e+08]
...
[-2.01209195e+34 -4.47742185e+34 -5.42629282e+34 -3.07294644e+32
-2.55503032e+33 -2.80363423e+34 -3.20314565e+35 -1.60824109e+34
-4.96433806e+34 -1.98052568e+36 -8.37673498e+34 -1.60024763e+36
-5.98654489e+34]
[6.09700758e+35 1.35674093e+36 1.64426623e+36 9.31159124e+33
7.74221040e+34 8.49552585e+35 9.70611871e+36 4.87326542e+35
1.50428547e+36 6.00135600e+37 2.53830431e+36 4.84904376e+37
1.81403288e+36]
[-1.84750510e+37 -4.11117381e+37 -4.98242821e+37 -2.82158290e+35
-2.34603173e+36 -2.57430013e+37 -2.94113196e+38 -1.47668879e+37
-4.55826082e+37 -1.81852092e+39 -7.69152754e+37 -1.46934918e+39
-5.49685229e+37]
[5.59827926e+38 1.24576106e+39 1.50976712e+39 8.54991361e+36
7.10890636e+37 7.80060146e+38 8.91216919e+39 4.47463782e+38
1.38123662e+39 5.51045187e+40 2.33067389e+39 4.45239747e+40
1.66564705e+39]
[-1.69638128e+40 -3.77488445e+40 -4.57487122e+40 -2.59078061e+38
-2.15412899e+39 -2.36372529e+40 -2.70055070e+41 -1.35589732e+40
-4.18540025e+40 -1.66976797e+42 -7.06236930e+40 -1.34915808e+42
-5.04721600e+40]
And then, the gradient descent run stops before all interactions due to the high values.
At a certain point, the values form the derivate assume values as NaN.
As expected, when I try to predict a test case, I get 0.0 as output:
sample_house = [[2.29690000e-01, 0.00000000e+00, 1.05900000e+01, 0.00000000e+00, 4.89000000e-01,
6.32600000e+00, 5.25000000e+01, 4.35490000e+00, 4.00000000e+00, 2.77000000e+02,
1.86000000e+01, 3.94870000e+02, 1.09700000e+01]]
test_predict = predictor(sample_house, w, b)
test_predict
------------------------------------------------
out : array([0.])
Thanks!

Your cost function is wrong, it should be:
cost = 1/(2*m) * np.sum(np.power(error,2))
Also, try to initialize your weights as random values between 0 an 1 and scale your inputs to range 0-1.

I had the same issue which I resolved by normalizing the x values.

I think that you are making a mistake in the gradient descent algorithm. When updating the values for "W" vector it should be:
W = W - (learning_rate/m) * derivate.sum()

The learning rate is too large.
I try learning_rate = 0.000001, and it converges normally.

How to Use AutoGrad Packages?

I am trying to do a simple thing: use autograd to get gradients and do gradient descent:
import tangent
def model(x):
return a*x + b
def loss(x,y):
return (y-model(x))**2.0
After getting loss for an input-output pair, I want to get gradients wrt loss:
l = loss(1,2)
# grad_a = gradient of loss wrt a?
a = a - grad_a
b = b - grad_b
But the library tutorials don't show how to do obtain gradient with respect to a or b i.e. the parameters so, neither autograd nor tangent.

You can specify this with the second argument of the grad function:
def f(x,y):
return x*x + x*y
f_x = grad(f,0) # derivative with respect to first argument
f_y = grad(f,1) # derivative with respect to second argument
print("f(2,3) = ", f(2.0,3.0))
print("f_x(2,3) = ", f_x(2.0,3.0))
print("f_y(2,3) = ", f_y(2.0,3.0))
In your case, 'a' and 'b' should be an input to the loss function, which passes them to the model in order to calculate the derivatives.
There was a similar question i just answered:
Partial Derivative using Autograd

Here this may help:
import autograd.numpy as np
from autograd import grad
def tanh(x):
y=np.exp(-x)
return (1.0-y)/(1.0+y)
grad_tanh = grad(tanh)
print(grad_tanh(1.0))
e=0.00001
g=(tanh(1+e)-tanh(1))/e
print(g)
Output:
0.39322386648296376
0.39322295790622513
Here is what you may create:
import autograd.numpy as np
from autograd import grad # grad(f) returns f'
def f(x): # tanh
y = np.exp(-x)
return (1.0 - y) / ( 1.0 + y)
D_f = grad(f) # Obtain gradient function
D2_f = grad(D_f)# 2nd derivative
D3_f = grad(D2_f)# 3rd derivative
D4_f = grad(D3_f)# etc.
D5_f = grad(D4_f)
D6_f = grad(D5_f)
import matplotlib.pyplot as plt
plt.subplots(figsize = (9,6), dpi=153 )
x = np.linspace(-7, 7, 100)
plt.plot(x, list(map(f, x)),
x, list(map(D_f , x)),
x, list(map(D2_f , x)),
x, list(map(D3_f , x)),
x, list(map(D4_f , x)),
x, list(map(D5_f , x)),
x, list(map(D6_f , x)))
plt.show()
Output:

Python: How to prevent Scipy's optimize.minimize function from changing the shape of initial guess x0?

I am trying to implement the optimization algorithm from Scipy. It works fine when I implement it without inputting the Jacobian gradient function. I believe the issue that I am getting when I input the gradient is because the minimize function itself is changing the shape of the initial guess x0. You can see this from the output of the code below.
Input:
import numpy as np
from costFunction import *
import scipy.optimize as op
def sigmoid(z):
epsilon = np.finfo(z.dtype).eps
g = 1/(1+np.exp(-z))
g = np.clip(g,epsilon,1-epsilon)
return g
def costFunction(theta,X,y):
m = y.size
h = sigmoid(X#theta)
J = 1/(m)*(-y.T#np.log(h)-(1-y).T#np.log(1-h))
grad = 1/m*X.T#(h-y)
print ('Shape of theta is',np.shape(theta),'\n')
print ('Shape of gradient is',np.shape(grad),'\n')
return J, grad
X = np.array([[1, 3],[5,7]])
y = np.array([[1],[0]])
m,n = np.shape(X)
one_vec = np.ones((m,1))
X = np.hstack((one_vec,X))
initial_theta = np.zeros((n+1,1))
print ('Running costFunction before executing minimize function...\n')
cost, grad = costFunction(initial_theta,X,y) #To test the shape of gradient before calling minimize
print ('Executing minimize function...\n')
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
Output:
Running costFunction before executing minimize function...
Shape of theta is (3, 1)
Traceback (most recent call last):
Shape of gradient is (3, 1)
Executing minimize function...
Shape of theta is (3,)
File "C:/Users/#####/minimizeshapechange.py", line 34, in <module>
Shape of gradient is (3, 2)
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 453, in minimize
**options)
File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\tnc.py", line 409, in _minimize_tnc
xtol, pgtol, rescale, callback)
ValueError: tnc: invalid gradient vector from minimized function.
Process finished with exit code 1

I will not analyze your exact computations, but some remarks:
(1) Your gradient is broken!
scipy expects a partial derivative resulting in an array of shape equal to your x0.
your gradient is of shape (3,2), while (n+1, 1) is expected
compare with the example given in the tutorial which uses scipy.optimize.rosen_der (der = derivative)
(2) It seems your scipy-version is a bit older, because mine (0.19.0) is telling me:
ValueError: tnc: invalid gradient vector from minimized function.
Some supporting source-code from scipy:
if (PyArray_SIZE(arr_grad) != py_state->n)
{
PyErr_SetString(PyExc_ValueError,
"tnc: invalid gradient vector from minimized function.");
goto failure;
Remark: This code above was changed / touched / introduced 5 years ago. If you really don't get this error while using your code listed (with removal of the import of costFunction), it seems you are using scipy < v0.13.0b1, which i do no recommend! I assume you are using some deprecated windows-based inofficial distribution with outdated scipy. You should change that!

I had the same problem with Scipy trying to do the same thing as you. I don't understand exactly why this solves the problem but playing with array shapes until it worked gave me the following:
Gradient function defined as follows
def Gradient(theta,X,y):
#Initializing variables
m = len(y)
theta = theta[:,np.newaxis] #<---- THIS IS THE TRICK
grad = np.zeros(theta.shape)
#Vectorized computations
z = X # theta
h = sigmoid(z)
grad = (1/m)*(X.T # ( h - y));
return grad #< --- also works with grad.ravel()
Initial_theta initialized as
initial_theta = np.zeros((n+1))
initial_theta.shape
(3,)
i.e. a simple numpy array rather than a column vector.
Gradient function returns
Gradient(initial_theta,X,y).shape
(3,1) or (3,) depending on whether the function returns grad or grad.ravel
scipy.optimize called as
import scipy.optimize as opt
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
What does not work with Scipy
initial_theta of shape (3,1) using initial_theta = np.zeros((n+1))[:,np.newaxis] crashes the scipy.minimize function call.
ValueError: tnc: invalid gradient vector from minimized function.
If someone could clarify these points that would be great ! Thanks

your code of costFunctuion is wrong,maybe you should look that
def costFunction(theta,X,y):
h_theta = sigmoid(X#theta)
J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
return np.mean(J)

please copy and past in jpuiter in1 and so on in separte cell
In 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
filepath =('C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt')
data =pd.read_csv(filepath,sep=',',header=None)
#print(data)
X = data.values[:,:2] #(100,2)
y = data.values[:,2:3] #(100,1)
#print(np.shape(y))
#In 2
#%% ==================== Part 1: Plotting ====================
postive_value = data.loc[data[2] == 1]
#print(postive_value.values[:,2:3])
negative_value = data.loc[data[2] == 0]
#print(len(postive_value))
#print(len(negative_value))
ax1 = postive_value.plot(kind='scatter',x=0,y=1,s=50,color='b',marker="+",label="Admitted") # S is line width #https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax2 = negative_value.plot(kind='scatter',x=0,y=1,s=50,color='y',ax=ax1,label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))
# In 3
#============ Part 2: Compute Cost and Gradient ===========
[m,n] = np.shape(X) #(100,2)
print(m,n)
additional_coulmn = np.ones((m,1))
X = np.append(additional_coulmn,X,axis=1)
initial_theta = np.zeros((n+1), dtype=int)
print(initial_theta)
# In4
#Sigmoid and cost function
def sigmoid(z):
g = np.zeros(np.shape(z));
g = 1/(1+np.exp(-z));
return g
def costFunction(theta, X, y):
J = 0;
#print(theta)
receive_theta = np.array(theta)[np.newaxis] ##This command is used to create the 1D array
#print(receive_theta)
theta = np.transpose(receive_theta)
#print(np.shape(theta))
#grad = np.zeros(np.shape(theta))
z = np.dot(X,theta) # where z = theta*X
#print(z)
h = sigmoid(z) #formula h(x) = g(z) whether g = 1/1+e(-z) #(100,1)
#print(np.shape(h))
#J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m);
J = np.sum(np.dot((-y.T),np.log(h))-np.dot((1-y).T,np.log(1-h)))/m
#J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
#error = h-y
#print(np.shape(error))
#print(np.shape(X))
grad =np.dot(X.T,(h-y))/m
#print(grad)
return J,grad
#In5
[cost, grad] = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n',grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')
In6 # Compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis] #This command is used to create the 1D row array
#test_theta = np.transpose(test_theta_value) # Transpose
#test_theta = test_theta_value.transpose()
[cost, grad] = costFunction(test_theta, X, y)
print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n',grad);
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
#IN6
# ============= Part 3: Optimizing using range =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print ('Executing minimize function...\n')
# Working models
#result = opt.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model
#costFunction(initial_theta,X,y)
#model = opt.minimize(fun = costFunction, x0 = initial_theta, args = (X, y), method = 'TNC',jac = costFunction)
print('Thetas found by fmin_tnc function: ', result);
print('Cost at theta found : \n', cost);
print('Expected cost (approx): 0.203\n');
print('theta: \n',result[0]);
print('Expected theta (approx):\n');
print(' -25.161\n 0.206\n 0.201\n');
Result:
Executing minimize function...
Thetas found by fmin_tnc function: (array([-25.16131854, 0.20623159, 0.20147149]), 36, 0)
Cost at theta found :
0.218330193827
Expected cost (approx): 0.203
theta:
[-25.16131854 0.20623159 0.20147149]
Expected theta (approx):
-25.161
0.206
0.201

scipy’s fmin_tnc doesn’t work well with column or row vector. It expects the parameters to be in an array format.
Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)
opt.fmin_tnc(func = costFunction, x0 = theta.flatten(),fprime = gradient, args = (X, y.flatten()))

What worked for me is to reshape y as a vector (1-D) rather than a matrix (2-D array). I simply used the following code and then reran the SciPy's minimize function and it worked.
y = np.reshape(y,100) #e.g., if your y variable has 100 data points.

Little bit late but I also started anderw assignment to implement with Python and put a lot of effort to resolve the mentioned issue. Finally is works for me.
This blog help me but with one changes in fmin_tnc function calling, refer below :-
result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None, approx_grad=True, args=(X, y)) Got this info from here

Simple autoencoder in theano optimizing using Scipy Optimize minimum

Hi I am trying to build a simple autoencoder model in Theano but I also want to optimize using Scipy Optimize Minimize. I found a code here : http://dlacombejr.github.io/programming/2015/09/13/sparse-filtering-implemenation-in-theano.html which I modified a bit. The problem is since scipy minimum takes in an array, but I need to optimize more that one weight parameters. what is the solution to this problem if I want to keep using the same structure as the code in the link?
I found a possible way of getting around this problem, that is to pass in all the weights as a single array flattened and appended, and then assigning parts of the array to various variables, according to this https://github.com/Theano/Theano/issues/3131
import theano
from theano import tensor as t
import numpy as np
class SparseFilter(object):
def __init__(self, theta, dims, x):
# assign inputs to sparse filter
self.theta = theta
self.x = x
self.dims = dims
self.w = theano.shared(np.random.randn(self.dims[1], self.dims[0]), name='w')
self.w = t.reshape(self.theta[0:(self.dims[0]*self.dims[1])], self.dims)
self.b = theano.shared(np.random.randn(self.dims[1]), name='b')
self.b = t.reshape(self.theta[(self.dims[0]*self.dims[1]):self.theta.get_value().shape[0]], (self.dims[0],1))
# the feed-forward function is not fully written
def feed_forward(self):
f = t.dot(self.w, self.x.T) + self.b
return f
def get_cost_grads(self):
cost = t.sum(t.abs_(self.feed_forward()))
gradw = t.grad(cost=cost, wrt=self.w).flatten()
gradb = t.grad(cost=cost, wrt=self.b).flatten()
return cost, gradw, gradb
def training_functions(data, model, dims):
cost, grad = model.get_cost_grads()
fn = theano.function(inputs=[], outputs=[cost, grad],
givens={model.x: data}, allow_input_downcast=True)
def train_fn(theta_value):
# reshape the theta value for Theano and convert to float32
model.w = t.reshape(theta_value[0:(dims[0]*dims[1])], dims)
model.b = t.reshape(theta_value[(dims[0]*dims[1]):theta_value.shape[0]], (dims[0],1))
c, gw, gb = fn()
# convert values to float64 for SciPy
c = np.asarray(c, dtype=np.float64)
gw = np.asarray(gw, dtype=np.float64)
gb = np.asarray(gb, dtype=np.float64)
grad = np.append(gw, gb)
return c, grad
return train_fn
theta = theano.shared(np.random.randn((input_dim*hdim) + hdim), name='theta')
np.random.seed(0)
theta.set_value(np.random.randn((input_dim*hdim) + hdim).astype('float32'))
model = SparseFilter(theta, dims, x)
train_fn = training_functions(data, model, dims)
from scipy.optimize import minimize
weights = minimize(train_fn, model.theta.eval(),
method='L-BFGS-B', jac=True,
options={'maxiter': 100, 'disp': True})
There is no iteration and the oparation stops.
There is a message in the output
message: 'ABNORMAL_TERMINATION_IN_LNSRCH'
Any suggestion? Thank you.

Error in backpropagation python neural net

Darn thing just won't learn. Sometimes weights seem to become nan.
I haven't played with different numbers of hidden layers/inputs/outputs but the bug appears consistent across different sizes of hidden layer.
from __future__ import division
import numpy
import matplotlib
import random
class Net:
def __init__(self, *sizes):
sizes = list(sizes)
sizes[0] += 1
self.sizes = sizes
self.weights = [numpy.random.uniform(-1, 1, (sizes[i+1],sizes[i])) for i in range(len(sizes)-1)]
#staticmethod
def activate(x):
return 1/(1+numpy.exp(-x))
def y(self, x_):
x = numpy.concatenate(([1], numpy.atleast_1d(x_.copy())))
o = [x] #o[i] is the (activated) output of hidden layer i, "hidden layer 0" is inputs
for weight in self.weights[:-1]:
x = weight.dot(x)
x = Net.activate(x)
o.append(x)
o.append(self.weights[-1].dot(x))
return o
def __call__(self, x):
return self.y(x)[-1]
def delta(self, x, t):
o = self.y(x)
delta = [(o[-1]-t) * o[-1] * (1-o[-1])]
for i, weight in enumerate(reversed(self.weights)):
delta.append(weight.T.dot(delta[-1]) * o[-i-2] * (1-o[-i-2]))
delta.reverse()
return o, delta
def train(self, inputs, outputs, epochs=100, rate=.1):
for epoch in range(epochs):
pairs = zip(inputs, outputs)
random.shuffle(pairs)
for x, t in pairs: #shuffle? subset?
o, d = self.delta(x, t)
for layer in range(len(self.sizes)-1):
self.weights[layer] -= rate * numpy.outer(o[layer+1], d[layer])
n = Net(1, 4, 1)
x = numpy.linspace(0, 2*3.14, 10)
t = numpy.sin(x)
matplotlib.pyplot.plot(x, t, 'g')
matplotlib.pyplot.plot(x, map(n, x), 'r')
n.train(x, t)
print n.weights
matplotlib.pyplot.plot(x, map(n, x), 'b')
matplotlib.pyplot.show()

I haven't looked for a particular bug in your code, but can you please try the following things to narrow down your problem further? Otherwise it is very tedious to find the needle in the haystack.
1) Please try to use a real dataset to have an idea what to expect, e.g., MNIST, and/or standardize your data, because your weights may become NaN if they become too small.
2) Try different learning rates and plot the cost function vs. epochs to check if you are converging. It should look somewhat like this (note that I used minibatch learning and averaged the minibatch chunks for each epoch).
3) I see that you are using a sigmoid activation, your implementation is correct, but to make it numerically more stable, replace 1.0 / (1.0 + np.exp(-z)) by expit(z) from scipy.special (same function but more efficient).
4) Implement gradient checking. Here, you compare the analytical solution to a numerically approximated gradient
Or an even better approach that yields a more accurate approximation of the gradient is to compute the symmetric (or centered) difference quotient given by the two-point formula
PS: If you are interested and find it useful, I have a working vanilla NumPy neural net implemented here.

I fixed it! Thanks for all the suggestions. I worked out numeric partials and found that my o and deltas were correct, but I was multiplying the wrong ones. That's why I now take numpy.outer(d[layer+1], o[layer]) instead of numpy.outer(d[layer], o[layer+1]).
I was also skipping the update on one layer. That's why I changed for layer in range(self.hidden_layers) to for layer in range(self.hidden_layers+1).
I'll add that I caught a bug just before posting originally. My output layer delta was incorrect because my net (intentionally) doesn't activate the final outputs, but my delta was computed as though it did.
Debugged primarily with a one hidden layer, one hidden unit net, then moved to a 2 input, 3 hidden layers of 2 neurons each, 2 output model.
from __future__ import division
import numpy
import scipy
import scipy.special
import matplotlib
#from pylab import *
#numpy.random.seed(23)
def nmap(f, x):
return numpy.array(map(f, x))
class Net:
def __init__(self, *sizes):
self.hidden_layers = len(sizes)-2
self.weights = [numpy.random.uniform(-1, 1, (sizes[i+1],sizes[i])) for i in range(self.hidden_layers+1)]
#staticmethod
def activate(x):
return scipy.special.expit(x)
#return 1/(1+numpy.exp(-x))
#staticmethod
def activate_(x):
s = scipy.special.expit(x)
return s*(1-s)
def y(self, x):
o = [numpy.array(x)] #o[i] is the (activated) output of hidden layer i, "hidden layer 0" is inputs and not activated
for weight in self.weights[:-1]:
o.append(Net.activate(weight.dot(o[-1])))
o.append(self.weights[-1].dot(o[-1]))
# for weight in self.weights:
# o.append(Net.activate(weight.dot(o[-1])))
return o
def __call__(self, x):
return self.y(x)[-1]
def delta(self, x, t):
x = numpy.array(x)
t = numpy.array(t)
o = self.y(x)
#delta = [(o[-1]-t) * o[-1] * (1-o[-1])]
delta = [o[-1]-t]
for i, weight in enumerate(reversed(self.weights)):
delta.append(weight.T.dot(delta[-1]) * o[-i-2] * (1-o[-i-2]))
delta.reverse() #surely i need this
return o, delta
def train(self, inputs, outputs, epochs=1000, rate=.1):
errors = []
for epoch in range(epochs):
for x, t in zip(inputs, outputs): #shuffle? subset?
o, d = self.delta(x, t)
for layer in range(self.hidden_layers+1):
grad = numpy.outer(d[layer+1], o[layer])
self.weights[layer] -= rate * grad
return errors
def rmse(self, inputs, outputs):
return ((outputs - nmap(self, inputs))**2).sum()**.5/len(inputs)
n = Net(1, 8, 1)
X = numpy.linspace(0, 2*3.1415, 10)
T = numpy.sin(X)
Y = map(n, X)
Y = numpy.array([y[0,0] for y in Y])
matplotlib.pyplot.plot(X, T, 'g')
matplotlib.pyplot.plot(X, Y, 'r')
print 'output successful'
print n.rmse(X, T)
errors = n.train(X, T)
print 'tried to train successfully'
print n.rmse(X, T)
Y = map(n, X)
Y = numpy.array([y[0,0] for y in Y])
matplotlib.pyplot.plot(x, Y, 'b')
matplotlib.pyplot.show()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

T.hessian gives NotImplementedError() in theano - python

Related

Linear Regression with python - gradient descent error

How to Use AutoGrad Packages?

Python: How to prevent Scipy's optimize.minimize function from changing the shape of initial guess x0?

Simple autoencoder in theano optimizing using Scipy Optimize minimum

Error in backpropagation python neural net

Categories

Resources