To calculate the second derivative of a neural network output with respect to the input using a Keras model, I implemented the code from these two posts:
Keras: using the sum of the first and second derivatives of a model as final output
Second derivative in Keras
However, for both techniques, the second derivative is always equal to 0. Here is the problem reproduced on a simple regression of the quadratic function:
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.backend import set_session
from tensorflow.keras import datasets, layers, models
from tensorflow.keras import optimizers
import numpy as np
x = np.linspace(0, 1, 1000)
y = x**2
def model_regression():
    model = tf.keras.Sequential([
        layers.Dense(1024, input_dim=1, activation="relu"),
        layers.Dense(1, activation="linear")])
    model.compile(optimizer=optimizers.Adam(lr=0.001),
                  loss='mean_squared_error')
    return model
my_model = model_regression()
my_model.fit(x, y, epochs=10, batch_size=20)
y_pred = my_model.predict(x)
#### Technique 1
first_input = K.gradients(my_model.output, my_model.input)
second_input = K.gradients(first_input, my_model.input)
iterate_first = K.function([my_model.input], [first_input])
iterate_second = K.function([my_model.input], [second_input])
#### Technique 2
# def grad(y, x):
# print(y.shape)
# return layers.Lambda(lambda z: K.gradients(z[0], z[1]), output_shape=[1])([y, x])
# derivative1 = grad(my_model.output, my_model.input)
# derivative2 = grad(derivative1, my_model.input)
# iterate_first = K.function([my_model.input], [derivative1])
# iterate_second = K.function([my_model.input], [derivative2])
first_derivative, second_derivative = [], []
for i in range(x.shape[0]):
    first_derivative.append(iterate_first(np.array([[x[i]]]))[0][0][0])
    second_derivative.append(iterate_second(np.array([[x[i]]]))[0][0][0])
Here is the graphical result. As you can see, the second derivative is always equal to 0, while it should be approximately equal to 2.
How can I correctly compute the second derivative of my neural network in Keras?
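One likely explanation, offered as a sketch rather than a verified diagnosis of the model above: a network whose only nonlinearity is ReLU is piecewise linear in its input, so its second derivative is zero wherever it is defined. The toy network below uses hand-picked weights (an assumption for illustration, not the trained model) and plain finite differences to compare ReLU with tanh.

```python
import math

# Hypothetical one-hidden-layer network y(x) = sum_j W2[j] * act(W1[j] * x + B[j]).
# The weights are hand-picked for illustration; they are not from a trained model.
W1 = [1.0, 1.0, 1.0]
B = [-0.25, -0.5, -0.75]   # the ReLU kinks sit at x = 0.25, 0.5, 0.75
W2 = [1.0, 1.0, 1.0]


def relu(z):
    return max(0.0, z)


def network(x, act):
    return sum(w2 * act(w1 * x + b) for w1, b, w2 in zip(W1, B, W2))


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def tanh_second_derivative(x):
    """Analytic y''(x) for the tanh network, using tanh''(z) = -2 tanh(z) (1 - tanh(z)^2)."""
    total = 0.0
    for w1, b, w2 in zip(W1, B, W2):
        t = math.tanh(w1 * x + b)
        total += w2 * w1 * w1 * (-2.0 * t * (1.0 - t * t))
    return total


x0 = 0.4  # a point away from the ReLU kinks
relu_d2 = second_difference(lambda x: network(x, relu), x0)
tanh_d2 = second_difference(lambda x: network(x, math.tanh), x0)

print("ReLU network second derivative:", relu_d2)
print("tanh network second derivative:", tanh_d2, "analytic:", tanh_second_derivative(x0))
```

If this is indeed the cause, replacing the hidden "relu" activation with a smooth one such as tanh or softplus should give a non-zero second derivative, although the quality of the fit would need to be checked again.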
I am using the Sequential model from Keras with Dense layers. I wrote a function that recursively calculates predictions, but the predictions are way off. I am wondering which activation function is best for my data; currently I am using hard_sigmoid. The output values range from 5 to 25. The input data has the shape (6,1) and the output is a single value. When I plot the predictions, they never decrease. Thank you for the help!
# create and fit Multilayer Perceptron model
model = Sequential();
model.add(Dense(20, input_dim=look_back, activation='hard_sigmoid'))
model.add(Dense(16, activation='hard_sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=0)
#function to predict using predicted values
numOfPredictions = 96;
for i in range(numOfPredictions):
    temp = [[origAndPredictions[i,0],origAndPredictions[i,1],origAndPredictions[i,2],origAndPredictions[i,3],origAndPredictions[i,4],origAndPredictions[i,5]]]
    temp = numpy.array(temp)
    temp1 = model.predict(temp)
    predictions = numpy.append(predictions, temp1, axis=0)
    temp2 = []
    temp2 = [[origAndPredictions[i,1],origAndPredictions[i,2],origAndPredictions[i,3],origAndPredictions[i,4],origAndPredictions[i,5],predictions[i,0]]]
    temp2 = numpy.array(temp2)
    origAndPredictions = numpy.vstack((origAndPredictions, temp2))
update:
I used this code to implement the swish.
from keras.backend import sigmoid
def swish1(x, beta = 1):
    return (x * sigmoid(beta * x))
def swish2(x, beta = 1):
    return (x * sigmoid(beta * x))
from keras.utils.generic_utils import get_custom_objects
from keras.layers import Activation
get_custom_objects().update({'swish': Activation(swish)})
model.add(Activation(custom_activation,name = "swish1"))
update:
Using this code:
from keras.backend import sigmoid
from keras import backend as K
def swish1(x):
    return (K.sigmoid(x) * x)
def swish2(x):
    return (K.sigmoid(x) * x)
Thanks for all the help!!
Although there is no single best activation function, I find Swish works particularly well for time-series problems. AFAIK Keras doesn't provide Swish built in, but you can define it yourself:
from keras.utils.generic_utils import get_custom_objects
from keras import backend as K
from keras.layers import Activation
def custom_activation(x, beta = 1):
    return (K.sigmoid(beta * x) * x)
get_custom_objects().update({'custom_activation': Activation(custom_activation)})
Then use it in model:
model.add(Activation(custom_activation,name = "Swish"))
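To get a feel for what this activation computes, here is a framework-free sketch of the same formula, x * sigmoid(beta * x), in plain Python. The helper names sigmoid and swish are chosen for illustration and are not Keras functions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


# Swish is close to 0 for large negative inputs and close to x for large positive ones.
for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(value, swish(value))
```

Unlike hard_sigmoid, the output is not squashed into [0, 1], which may matter for targets in the 5 to 25 range.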
Your output data ranges from 5 to 25 and your output ReLU activation will give you values from 0 to inf. So what you can try is to "parameterize" your outputs or normalize your labels. This means using sigmoid as the output activation (outputs in (0,1)) and transforming your labels by subtracting 5 and dividing by 20, so they will be in (almost) the same interval as your outputs, [0,1]. Or you can use sigmoid and multiply your outputs by 20 and add 5 before calculating the loss.
Would be interesting to see the results.
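The label normalization suggested above can be sketched as follows. The function names are hypothetical, and the 5 to 25 range is taken from the question; in practice you would derive the bounds from the training data.

```python
Y_MIN, Y_MAX = 5.0, 25.0  # label range stated in the question


def scale_labels(values, lo=Y_MIN, hi=Y_MAX):
    """Map labels from [lo, hi] to [0, 1] so they match a sigmoid output."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale_predictions(values, lo=Y_MIN, hi=Y_MAX):
    """Map sigmoid outputs in [0, 1] back to the original label range."""
    return [v * (hi - lo) + lo for v in values]


labels = [5.0, 10.0, 17.5, 25.0]
scaled = scale_labels(labels)
restored = unscale_predictions(scaled)
print(scaled)
print(restored)
```

Train on the scaled labels, then apply unscale_predictions to the output of model.predict before feeding values back into the recursive forecast.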
I am building a simple neural network using Keras. It has activity regularization so that the output of the only hidden layer is forced to have small values. Here is the code:
import numpy as np
import math
import keras
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Activation
from keras import regularizers
from keras import backend as K
a=1
def my_regularizer(inputs):
    means=K.mean((inputs),axis=1)
    return a*K.sum(means)**2
x_train=np.random.uniform(low=-1,high=1,size=(200,2))
model=Sequential([
    Dense(20,input_shape=(2,),activity_regularizer=my_regularizer),
    Activation('tanh'),
    Dense(2,),
    Activation('linear')
])
model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(x_train,x_train,epochs=20,validation_split=0.1)
Questions:
1) Currently, parameter a is set at the beginning and it does not change. How can I change the code such that the parameter a is updated after each iteration such that
a_new=f(a_old,input)
where input is the values at the hidden layer and f(.) is an arbitrary function.
2) I want my activity regularizer to be applied after the first activation function tanh is applied. Have I written my code correctly? The term "activity_regularizer=my_regularizer" in
Dense(20,input_shape=(2,),activity_regularizer=my_regularizer)
makes me feel that the regularizer is being applied to values before the activation function tanh.
You can - but first, you need a valid Keras Regularizer object (your function won't work):
class MyActivityRegularizer(Regularizer):
    def __init__(self, a=1):
        self.a = K.variable(a, name='a')
    # gets called at each train iteration
    def __call__(self, x): # your custom function here
        means = K.mean(x, axis=1)
        return self.a * K.sum(means)**2
    def get_config(self): # required class method
        return {"a": float(K.get_value(self.a))}
Next, to work with .fit, you need a custom Keras Callback object (see alternative at bottom):
class ActivityRegularizerScheduler(Callback):
    """ 'on_batch_end' gets automatically called by .fit when finishing
    iterating over a batch. The model, and its attributes, are inherited by
    'Callback' (except at __init__) and can be accessed via, e.g., self.model """
    def __init__(self, model, update_fn):
        self.update_fn=update_fn
        self.activity_regularizers=_get_activity_regularizers(model)
    def on_batch_end(self, batch, logs=None):
        iteration = K.get_value(self.model.optimizer.iterations)
        new_activity_reg = self.update_fn(iteration)
        # 'activity_regularizer' references model layer's activity_regularizer (in this
        # case 'MyActivityRegularizer'), so its attributes ('a') can be set directly
        for activity_regularizer in self.activity_regularizers:
            K.set_value(activity_regularizer.a, new_activity_reg)
def _get_activity_regularizers(model):
    activity_regularizers = []
    for layer in model.layers:
        a_reg = getattr(layer,'activity_regularizer',None)
        if a_reg is not None:
            activity_regularizers.append(a_reg)
    return activity_regularizers
Lastly, you'll need to create your model within the Keras CustomObjectScope - see in full ex. below.
Example usage:
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import Regularizer
from keras.callbacks import Callback
from keras.utils import CustomObjectScope
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
def make_model(my_reg):
    return Sequential([
        Dense(20, activation='tanh', input_shape=(2,), activity_regularizer=my_reg),
        Dense(2, activation='linear'),
    ])
my_reg = MyActivityRegularizer(a=1)
with CustomObjectScope({'MyActivityRegularizer':my_reg}): # required for Keras to recognize
    model = make_model(my_reg)
opt = Adam(lr=1e-4)
model.compile(optimizer=opt, loss='mse')
x = np.random.randn(320,2) # dummy data
y = np.random.randn(320,2) # dummy labels
update_fn = lambda x: .5 + .4*np.cos(x) #x = number of train updates (optimizer.iterations)
activity_regularizer_scheduler = ActivityRegularizerScheduler(model, update_fn)
model.fit(x,y,batch_size=32,callbacks=[activity_regularizer_scheduler],
          epochs=4,verbose=1)
To TRACK your a and make sure it's changing, you can get its value at, e.g., each epoch end via:
for epoch in range(4):
    model.fit(x,y,batch_size=32,callbacks=[activity_regularizer_scheduler],epochs=1)
    print("Epoch {} activity_regularizer 'a': {}".format(epoch,
          K.get_value(_get_activity_regularizers(model)[0].a)))
# My output:
# Epoch 0 activity_regularizer 'a': 0.7190816402435303
# Epoch 1 activity_regularizer 'a': 0.4982417821884155
# Epoch 2 activity_regularizer 'a': 0.2838689386844635
# Epoch 3 activity_regularizer 'a': 0.8644570708274841
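Regarding (2), I'm afraid you're right: with a separate Activation layer, the regularizer sees the Dense layer's output before tanh is applied, so the 'tanh' outputs won't be used. You'll need to pass activation='tanh' to the Dense layer instead.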
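To see why the placement matters, here is a small framework-free sketch that evaluates the same penalty, a * (sum of per-sample means) ** 2, on values before and after tanh. The batch values are made up for illustration, and activity_penalty is a hypothetical helper, not a Keras function.

```python
import math


def activity_penalty(batch, a=1.0):
    """Mirror of the regularizer in the question: a * (sum of per-sample means) ** 2."""
    means = [sum(row) / len(row) for row in batch]
    return a * sum(means) ** 2


# Hypothetical Dense-layer outputs (before tanh) for a batch of two samples.
pre_activation = [[2.0, 4.0], [1.0, 3.0]]
post_activation = [[math.tanh(v) for v in row] for row in pre_activation]

print("penalty on Dense output (before tanh):", activity_penalty(pre_activation))
print("penalty on tanh output:", activity_penalty(post_activation))
```

The two penalties differ substantially because tanh bounds every activation to (-1, 1), so regularizing the wrong tensor changes what the network is pushed towards.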
Lastly, you can do it without a callback, via train_on_batch - but a drawback is, you'll need to feed data to the model yourself (and shuffle it, etc):
activity_regularizers = _get_activity_regularizers(model)
for iteration in range(100):
    x, y = get_data()
    model.train_on_batch(x,y)
    iteration = K.get_value(model.optimizer.iterations)
    for activity_regularizer in activity_regularizers:
        # set the regularizer's 'a' variable, not the regularizer object itself
        K.set_value(activity_regularizer.a, update_fn(iteration))
I have trained a binary classification model with CNN, and here is my code
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
border_mode='valid',
input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2)) # define a binary classification problem
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=1,
validation_data=(x_test, y_test))
Now I want to get the output of each layer, just as in TensorFlow. How can I do that?
You can easily get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]
print(layer_outs)
Note: To simulate Dropout use learning_phase as 1. in layer_outs otherwise use 0.
Edit: (based on comments)
K.function creates Theano/TensorFlow tensor functions, which are later used to get the output from the symbolic graph given the input.
K.learning_phase() is required as an input because many Keras layers, such as Dropout and BatchNormalization, depend on it to change behavior between training and test time.
So if you remove the dropout layer in your code you can simply use:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test]) for func in functors]
print(layer_outs)
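As a rough illustration of why the learning phase changes the layer outputs, here is a plain-Python sketch of inverted dropout. It is a simplified stand-in for what a Dropout layer does, not the Keras implementation, and the function name and arguments are hypothetical.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: active only when `training` is True."""
    if not training:
        # test mode (learning_phase = 0): values pass through unchanged
        return list(values)
    keep = 1.0 - rate
    # training mode (learning_phase = 1): drop each value with probability
    # `rate` and rescale the survivors by 1 / keep
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so repeated runs use the same mask
activations = [1.0, 2.0, 3.0, 4.0]

test_out = dropout(activations, rate=0.5, training=False, rng=rng)
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
print("learning_phase = 0:", test_out)
print("learning_phase = 1:", train_out)
```

With learning_phase set to 0 the activations are returned as they are, which is usually what you want when inspecting a trained model.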
Edit 2: More optimized
I just realized that the previous answer is not that optimized: for each function evaluation the data is transferred from CPU to GPU memory, and the tensor calculations for the lower layers are repeated over and over.
A much better way is to use a single function that returns the list of all outputs, instead of multiple functions:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # include here your original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
[model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Based on all the good answers of this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/keract
It handles almost all the edge cases.
Hope it helps!
Following looks very simple to me:
model.layers[idx].output
Above is a tensor object, so you can modify it using operations that can be applied to a tensor object.
For example, to get the shape model.layers[idx].output.get_shape()
idx is the index of the layer and you can find it from model.summary()
This answer is based on: https://stackoverflow.com/a/59557567/2585501
To print the output of a single layer:
from tensorflow.keras import backend as K
layerIndex = 1
func = K.function([model.get_layer(index=0).input], model.get_layer(index=layerIndex).output)
layerOutput = func([input_data]) # input_data is a numpy array
print(layerOutput)
To print output of every layer:
from tensorflow.keras import backend as K
for layerIndex, layer in enumerate(model.layers):
    func = K.function([model.get_layer(index=0).input], layer.output)
    layerOutput = func([input_data]) # input_data is a numpy array
    print(layerOutput)
I wrote this function for myself (in Jupyter) and it was inspired by indraforyou's answer. It will plot all the layer outputs automatically. Your images must have a (x, y, 1) shape where 1 stands for 1 channel. You just call plot_layer_outputs(...) to plot.
%matplotlib inline
import matplotlib.pyplot as plt
from keras import backend as K
def get_layer_outputs():
    test_image = YOUR IMAGE GOES HERE!!!
    outputs = [layer.output for layer in model.layers] # all layer outputs
    comp_graph = [K.function([model.input]+ [K.learning_phase()], [output]) for output in outputs] # evaluation functions
    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []
    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])
    return layer_outputs
def plot_layer_outputs(layer_number):
    layer_outputs = get_layer_outputs()
    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n = layer_outputs[layer_number].shape[2]
    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))
    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]
    for img in L:
        plt.figure()
        plt.imshow(img, interpolation='nearest')
From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py
import keras.backend as K
def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input
    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False
    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None] # all layer outputs
    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs] # evaluation functions
    if model_multi_inputs_cond:
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]
    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations
Previous solutions were not working for me. I handled this issue as shown below.
layer_outputs = []
for i in range(1, len(model.layers)):
    tmp_model = Model(model.layers[0].input, model.layers[i].output)
    tmp_output = tmp_model.predict(img)[0]
    layer_outputs.append(tmp_output)
Wanted to add this as a comment (but don't have high enough rep.) to @indraforyou's answer to correct for the issue mentioned in @mathtick's comment. To avoid the InvalidArgumentError: input_X:Y is both fed and fetched. exception, simply replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:].
Adapting indraforyou's minimal working example:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers][1:] # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
p.s. my attempts trying things such as outputs = [layer.output for layer in model.layers[1:]] did not work.
Assuming you have:
1- A pre-trained Keras model.
2- Input x as an image or a set of images. The resolution of the images should be compatible with the dimensions of the input layer, for example 80*80*3 for a 3-channel (RGB) image.
3- The name of the output layer whose activation you want, for example the "flatten_2" layer. This name should be included in the layer_names variable, which holds the names of the layers of the given model.
4- batch_size is an optional argument.
Then you can use the get_activations function to get the activation of the output layer for a given input x and pre-trained model:
import six
import numpy as np
import keras.backend as k
from numpy import float32
def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.
    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """
    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')
    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])
    # Apply preprocessing
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4
    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)
    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]
    return activations
In case you have one of the following cases:
error: InvalidArgumentError: input_X:Y is both fed and fetched
case of multiple inputs
You need to make the following changes:
filter out the input layers in the outputs variable
make a minor change to the functors loop
Minimum example:
from keras.engine.input_layer import InputLayer
inp = model.input
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]
Well, other answers are very complete, but there is a very basic way to "see", not to "get" the shapes.
Just do a model.summary(). It will print all layers and their output shapes. "None" values will indicate variable dimensions, and the first dimension will be the batch size.
Generally, output size can be calculated as
[(W−K+2P)/S]+1
where
W is the input volume - in your case you have not given us this
K is the Kernel size - in your case 2 == "filter"
P is the padding - in your case 2
S is the stride - in your case 3
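The formula can be checked quickly with a small helper. This is a sketch that reads the square brackets as floor division; the function name and the example values are chosen for illustration and are not taken from the question.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output length along one dimension: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


# Example values chosen for illustration:
print(conv_output_size(w=32, k=3, p=0, s=1))  # 30
print(conv_output_size(w=32, k=2, p=0, s=2))  # 16
print(conv_output_size(w=28, k=5, p=2, s=1))  # 28
```

Apply it once per spatial dimension; the number of output channels is set by the number of filters, not by this formula.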
Another, prettier formulation: $\left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1$
I am trying to build a deep network using Theano, but the accuracy is zero and I cannot figure out my mistake. I am trying to create a deep learning network with 3 hidden layers and one output layer. It is a classification task with 5 classes, so the output layer has 5 nodes.
Any suggestions?
#!/usr/bin/env python
from __future__ import print_function
import theano
import theano.tensor as T
import lasagne
import numpy as np
import sklearn.datasets
import os
import csv
import pandas as pd
# Lasagne is pre-release, so its interface is changing.
# Whenever there's a backwards-incompatible change, a warning is raised.
# Let's ignore these for the course of the tutorial
import warnings
warnings.filterwarnings('ignore', module='lasagne')
from lasagne.objectives import categorical_crossentropy, aggregate
#load the data and prepare it
df = pd.read_excel('risk_sample_data_9.20.16_anon.xls',skiprows=0)
rawdata = df.values
# remove empty rows (odd rows)
mask = np.ones(len(rawdata), dtype=bool)
mask[::2] = False
data = rawdata[mask]
idx = np.array([1,5,6,7])
m = np.zeros_like(data)
m[:,idx] = 1
X = np.ma.masked_array(data,m)
X = np.ma.filled(X, fill_value=0)
X = X.astype(theano.config.floatX)
y = data[:,7] # extract financial rating labels
# convert char labels into int, A=1, B=2, C=3, D=4, F=5
y[y == 'A'] = 1
y[y == 'B'] = 2
y[y == 'C'] = 3
y[y == 'D'] = 4
y[y == 'F'] = 5
y = pd.to_numeric(y)
y = y.astype('int32')
#y = y.astype(theano.config.floatX)
N_CLASSES = 5
# First, construct an input layer.
# The shape parameter defines the expected input shape,
# which is just the shape of our data matrix data.
l_in = lasagne.layers.InputLayer(shape=X.shape)
# We'll create a network with two dense layers:
# A tanh hidden layer and a softmax output layer.
l_hidden1 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_in,
# This defines the layer's output dimensionality
num_units=250,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden2 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden1,
# This defines the layer's output dimensionality
num_units=100,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden3 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden2,
# This defines the layer's output dimensionality
num_units=50,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden4 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden3,
# This defines the layer's output dimensionality
num_units=10,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.sigmoid)
# For our output layer, we'll use a dense layer with a softmax nonlinearity.
l_output = lasagne.layers.DenseLayer(
l_hidden4, num_units=N_CLASSES, nonlinearity=lasagne.nonlinearities.softmax)
net_output = lasagne.layers.get_output(l_output)
# As a loss function, we'll use Theano's categorical_crossentropy function.
# This allows for the network output to be class probabilities,
# but the target output to be class labels.
true_output = T.ivector('true_output')
# get_loss computes a Theano expression for the objective,
# given a target variable
# By default, it will use the network's InputLayer input_var,
# which is what we want.
#loss = objective.get_loss(target=true_output)
loss = lasagne.objectives.categorical_crossentropy(net_output, true_output)
loss = aggregate(loss, mode='mean')
# Retrieving all parameters of the network is done using get_all_params,
# which recursively collects the parameters of all layers
# connected to the provided layer.
all_params = lasagne.layers.get_all_params(l_output)
# Now, we'll generate updates using Lasagne's SGD function
updates = lasagne.updates.sgd(loss, all_params, learning_rate=1)
# Finally, we can compile Theano functions for training and
# computing the output.
# Note that because loss depends on the input variable of our input layer,
# we need to retrieve it and tell Theano to use it.
train = theano.function([l_in.input_var, true_output], loss, updates=updates)
get_output = theano.function([l_in.input_var], net_output)
def eq(x, y):
    if x==y:
        return 1
    return 0
print("Training ...")
# Train for 10 epochs
for n in xrange(10):
    train(X, y)
    y_predicted = np.argmax(get_output(X), axis=1)
    correct = reduce(lambda a, b: a+b, map(eq, y_predicted, y))
    print("Iteration {} correct prediction {}".format(n, correct))
# Compute the predicted label of the training data.
# The argmax converts the class probability output to class label
y_predicted = np.argmax(get_output(X), axis=1)
print(y_predicted)
The learning rate seems way too high. Try a lower learning rate first. It might be that your model diverges on the task. Hard to tell without being able to try it on your data.
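To illustrate how a learning rate of 1 can prevent convergence, here is a toy sketch of plain gradient descent on f(w) = w**2. It is not the network above, only a minimal example of the effect, and gradient_descent is a hypothetical helper.

```python
def gradient_descent(learning_rate, steps=10, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
        history.append(w)
    return history


# A small rate shrinks w towards 0, a rate of 1.0 flips the sign of w at every
# step without ever shrinking it, and a larger rate makes |w| grow.
for lr in (0.1, 1.0, 1.5):
    print(lr, gradient_descent(lr)[-1])
```

Real networks are not quadratic, so the exact threshold differs, but trying something like learning_rate=0.01 in lasagne.updates.sgd is a cheap first experiment.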