I am trying to build a deep network using theano. However the accuracy is zero. I can not figure out my mistake. I am trying to create a deep learning network with 3 hidden layers and one output. I am tyring to do a classification task and I have 5 classes. Therefore, the output layer have 5 nodes.
Any suggestion?
#!/usr/bin/env python
from __future__ import print_function
import theano
import theano.tensor as T
import lasagne
import numpy as np
import sklearn.datasets
import os
import csv
import pandas as pd
# Lasagne is pre-release, so it's interface is changing.
# Whenever there's a backwards-incompatible change, a warning is raised.
# Let's ignore these for the course of the tutorial
import warnings
warnings.filterwarnings('ignore', module='lasagne')
from lasagne.objectives import categorical_crossentropy, aggregate
#load the data and prepare it
df = pd.read_excel('risk_sample_data_9.20.16_anon.xls',skiprows=0)
rawdata = df.values
# remove empty rows (odd rows)
mask = np.ones(len(rawdata), dtype=bool)
mask[::2] = False
data = rawdata[mask]
idx = np.array([1,5,6,7])
m = np.zeros_like(data)
m[:,idx] = 1
X = np.ma.masked_array(data,m)
X = np.ma.filled(X, fill_value=0)
X = X.astype(theano.config.floatX)
y = data[:,7] # extract financial rating labels
# convert char lables into int , A=1 , B=2, C=3, D=4, F=5
y[y == 'A'] = 1
y[y == 'B'] = 2
y[y == 'C'] = 3
y[y == 'D'] = 4
y[y == 'F'] = 5
y = pd.to_numeric(y)
y = y.astype('int32')
#y = y.astype(theano.config.floatX)
N_CLASSES = 5
# First, construct an input layer.
# The shape parameter defines the expected input shape,
# which is just the shape of our data matrix data.
l_in = lasagne.layers.InputLayer(shape=X.shape)
# We'll create a network with two dense layers:
# A tanh hidden layer and a softmax output layer.
l_hidden1 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_in,
# This defines the layer's output dimensionality
num_units=250,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden2 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden1,
# This defines the layer's output dimensionality
num_units=100,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden3 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden2,
# This defines the layer's output dimensionality
num_units=50,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.rectify)
l_hidden4 = lasagne.layers.DenseLayer(
# The first argument is the input layer
l_hidden3,
# This defines the layer's output dimensionality
num_units=10,
# Various nonlinearities are available
nonlinearity=lasagne.nonlinearities.sigmoid)
# For our output layer, we'll use a dense layer with a softmax nonlinearity.
l_output = lasagne.layers.DenseLayer(
l_hidden4, num_units=N_CLASSES, nonlinearity=lasagne.nonlinearities.softmax)
net_output = lasagne.layers.get_output(l_output)
# As a loss function, we'll use Theano's categorical_crossentropy function.
# This allows for the network output to be class probabilities,
# but the target output to be class labels.
true_output = T.ivector('true_output')
# get_loss computes a Theano expression for the objective,
# given a target variable
# By default, it will use the network's InputLayer input_var,
# which is what we want.
#loss = objective.get_loss(target=true_output)
loss = lasagne.objectives.categorical_crossentropy(net_output, true_output)
loss = aggregate(loss, mode='mean')
# Retrieving all parameters of the network is done using get_all_params,
# which recursively collects the parameters of all layers
# connected to the provided layer.
all_params = lasagne.layers.get_all_params(l_output)
# Now, we'll generate updates using Lasagne's SGD function
updates = lasagne.updates.sgd(loss, all_params, learning_rate=1)
# Finally, we can compile Theano functions for training and
# computing the output.
# Note that because loss depends on the input variable of our input layer,
# we need to retrieve it and tell Theano to use it.
train = theano.function([l_in.input_var, true_output], loss, updates=updates)
get_output = theano.function([l_in.input_var], net_output)
def eq(x, y):
if x==y:
return 1
return 0
print("Training ...")
# Train for 100 epochs
for n in xrange(10):
train(X, y)
y_predicted = np.argmax(get_output(X), axis=1)
correct = reduce(lambda a, b: a+b, map(eq, y_predicted, y))
print("Iteration {} correct prediction {}".format(n, correct))
# Compute the predicted label of the training data.
# The argmax converts the class probability output to class label
y_predicted = np.argmax(get_output(X), axis=1)
print(y_predicted)
The learning rate seems way too high. Try a lower learning rate first. It might be that your model diverges on the task. Hard to tell without being able to try it on your data.
Related
I am attempting to reduce the dimensionality of a categorical feature by extracting an embedding layer from a neural net and using it as an input feature in a separate XGBoost model.
An embedding layer has the dimensions (nr. unique categories + 1, chosen output size). How can it be concatenated to the continuous variables in the original training data with the dimensions (nr. observations, nr. features)?
Below is a reproducible example of regression with a neural net, in which a categorical feature is encoded as a learned embedding layer. The example is closely adapted from:
http://machinelearningmechanic.com/keras/2018/03/09/keras-regression-with-categorical-variable-embeddings-md.html#Define-the-input-layers
At the end I have printed the embedding layer and its shape. How can this layer be merged with the continuous features in the original training data (X_train_continuous)? If the number of rows were equal to the number of categories and if we knew the order in which categories are represented in the embedding layer, the embedding array could perhaps be joined to the training observations on category, but instead the number of rows equals the number of categories + 1 (in the code: len(values) + 1).
# Imports and helper functions
import numpy as np
import pandas as pd
import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
from keras.layers import Input, Embedding, Dense
from keras.models import Model
from keras.callbacks import Callback
import matplotlib.pyplot as plt
# Bayesian Methods for Hackers style sheet
plt.style.use('bmh')
np.random.seed(1234567890)
class PeriodicLogger(Callback):
"""
A helper callback class that only prints the losses once in 'display' epochs
"""
def __init__(self, display=100):
self.display = display
def on_train_begin(self, logs={}):
self.epochs = 0
def on_epoch_end(self, batch, logs={}):
self.epochs += 1
if self.epochs % self.display == 0:
print("Epoch: %d - loss: %f - val_loss: %f" % (
self.epochs, logs['loss'], logs['val_loss']))
periodic_logger_250 = PeriodicLogger(250)
# Define the mapping and a function that computes the house price for each
# example
per_meter_mapping = {
'Mercaz': 500,
'Old North': 350,
'Florentine': 230
}
per_room_additional_price = {
'Mercaz': 15. * 10 ** 4,
'Old North': 8. * 10 ** 4,
'Florentine': 5. * 10 ** 4
}
def house_price_func(row):
"""
house_price_func is the function f(a,s,n).
:param row: dict (contains the keys: ['area', 'size', 'n_rooms'])
:return: float
"""
area, size, n_rooms = row['area'], row['size'], row['n_rooms']
return size * per_meter_mapping[area] + n_rooms * \
per_room_additional_price[area]
# Create toy data
AREAS = ['Mercaz', 'Old North', 'Florentine']
def create_samples(n_samples):
"""
Helper method that creates dataset DataFrames
Note that the np.random.choice call only determines the number of rooms and the size of the house
(the price, which we calculate later, is deterministic)
:param n_samples: int (number of samples for each area (suburb))
:return: pd.DataFrame
"""
samples = []
for n_rooms in np.random.choice(range(1, 6), n_samples):
samples += [(area, int(np.random.normal(25, 5)), n_rooms) for area in
AREAS]
return pd.DataFrame(samples, columns=['area', 'size', 'n_rooms'])
# Create the train and validation sets
train = create_samples(n_samples=1000)
val = create_samples(n_samples=100)
# Calculate the prices for each set
train['price'] = train.apply(house_price_func, axis=1)
val['price'] = val.apply(house_price_func, axis=1)
# Define the features and the y vectors
continuous_cols = ['size', 'n_rooms']
categorical_cols = ['area']
y_col = ['price']
X_train_continuous = train[continuous_cols]
X_train_categorical = train[categorical_cols]
y_train = train[y_col]
X_val_continuous = val[continuous_cols]
X_val_categorical = val[categorical_cols]
y_val = val[y_col]
# Normalization
# Normalizing both train and test sets to have 0 mean and std. of 1 using the
# train set mean and std.
# This will give each feature an equal initial importance and speed up the
# training time
train_mean = X_train_continuous.mean(axis=0)
train_std = X_train_continuous.std(axis=0)
X_train_continuous = X_train_continuous - train_mean
X_train_continuous /= train_std
X_val_continuous = X_val_continuous - train_mean
X_val_continuous /= train_std
# Build a model using a categorical variable
# First let's define a helper class for the categorical variable
class EmbeddingMapping():
"""
Helper class for handling categorical variables
An instance of this class should be defined for each categorical variable
we want to use.
"""
def __init__(self, series):
# get a list of unique values
values = series.unique().tolist()
# Set a dictionary mapping from values to integer value
# In our example this will be {'Mercaz': 1, 'Old North': 2,
# 'Florentine': 3}
self.embedding_dict = {value: int_value + 1 for int_value, value in
enumerate(values)}
# The num_values will be used as the input_dim when defining the
# embedding layer.
# It will also be returned for unseen values
self.num_values = len(values) + 1
def get_mapping(self, value):
# If the value was seen in the training set, return its integer mapping
if value in self.embedding_dict:
return self.embedding_dict[value]
# Else, return the same integer for unseen values
else:
return self.num_values
# Create an embedding column for the train/validation sets
area_mapping = EmbeddingMapping(X_train_categorical['area'])
X_train_categorical = \
X_train_categorical.assign(area_mapping=X_train_categorical['area']
.apply(area_mapping.get_mapping))
X_val_categorical = \
X_val_categorical.assign(area_mapping=X_val_categorical['area']
.apply(area_mapping.get_mapping))
# Define the input layers
# Define the embedding input
area_input = Input(shape=(1,), dtype='int32')
# Decide to what vector size we want to map our 'area' variable.
# I'll use 1 here because we only have three areas
embeddings_output = 2
# Let’s define the embedding layer and flatten it
area_embedings = Embedding(output_dim=embeddings_output,
input_dim=area_mapping.num_values,
input_length=1, name="embedding_layer")(area_input)
area_embedings = keras.layers.Reshape((embeddings_output,))(area_embedings)
# Define the continuous variables input (just like before)
continuous_input = Input(shape=(X_train_continuous.shape[1], ))
# Concatenate continuous and embeddings inputs
all_input = keras.layers.concatenate([continuous_input, area_embedings])
# To merge them together we will use Keras Functional API
# Will define a simple model with 2 hidden layers, with 25 neurons each.
# Define the model
units=25
dense1 = Dense(units=units, activation='relu')(all_input)
dense2 = Dense(units, activation='relu')(dense1)
predictions = Dense(1)(dense2)
# Note using the input object 'area_input' not 'area_embeddings'
model = Model(inputs=[continuous_input, area_input], outputs=predictions)
# Lets train the model
epochs = 100 # to train properly, use 10000
model.compile(loss='mse',
optimizer=keras.optimizers.Adam(lr=.8, beta_1=0.9,
beta_2=0.999, decay=1e-03,
amsgrad=True))
# Note continuous and categorical columns are inserted in the same order as
# defined in all_inputs
history = model.fit([X_train_continuous, X_train_categorical['area_mapping']],
y_train, epochs=epochs, batch_size=128, callbacks=[
periodic_logger_250], verbose=0,
validation_data=([X_val_continuous, X_val_categorical[
'area_mapping']], y_val))
# Observe the embedding layer
embeddings_output = model.get_layer('embedding_layer').get_weights()[0]
print(f'Embedding layer:\n{embeddings_output}')
print(f'Embedding layer shape: {embeddings_output.shape}')
First, this post has a terminology problem: an "embedding" is the representation of a particular input sample. It is the vector output by a layer. The "weights" are the matrices stored and trained inside the layer.
In Keras, the Model class is a subclass of Layer. You can use any Model as a Layer in a larger model.
You can create a Model with just the Embedding layer, then use it as a layer when building the rest of your model. After training, you can call .predict() on that "sub-model". Also, you can save that sub-model out to a json file and reload it later.
This is the standard technique for creating a model that emits internal embeddings.
To get the embedding layer outputs with shape (nr. samples, chosen output size):
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer("embedding_layer")
.output)
embedding_output = \
intermediate_layer_model.predict([X_train_continuous,
X_train_categorical['area_mapping']])
print(embedding_output.shape) # (3000, 1, 2)
intermediate_output = \
embedding_output.reshape(embedding_output.shape[0], -1)
print(intermediate_output.shape) # (3000, 2)
One thing you can do is to run your 'pretrained' model with each layer having a unique name and save it
Then, create your new model, with the same named layers you want to keep, and use
Model.load_weights(file_path, by_name=True)
This will let you keep all of the layers that you want and let you change everything afterwards
I have several DistributionLambda layers as the outputs of one model, and I would like to make a Concatenate-like operation into a new layer, in order to have only one output that is the mix of all the distributions, assuming they are independent. Then, I can apply a log-likelihood loss to the output of the model. Otherwise, I cannot apply the loss over a Concatenate layer, because it lost the log_prob method. I have been trying with the Blockwise distribution, but with no luck so far.
Here an example code:
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow_probability import distributions
from tensorflow_probability import layers as tfp_layers
def likelihood_loss(y_true, y_pred):
"""Adding negative log likelihood loss."""
return -y_pred.log_prob(y_true)
def distribution_fn(params):
"""Distribution function."""
return distributions.Normal(
params[:, 0], math.log(1.0 + math.exp(params[:, 1])))
output_steps = 3
...
lstm_layer = layers.LSTM(10, return_state=True)
last_layer, l_h, l_c = lstm_layer(last_layer)
lstm_state = [l_h, l_c]
dense_layer = layers.Dense(2)
last_layer = dense_layer(last_layer)
last_layer = tfp_layers.DistributionLambda(
make_distribution_fn=distribution_fn)(last_layer)
output_layers = [last_layer]
# Get output sequence, re-injecting the output of each step
for number in range(1, output_steps):
last_layer = layers.Reshape((1, 1))(last_layer)
last_layer, l_h, l_c = lstm_layer(last_layer, initial_state=lstm_states)
# Storing state for next time step
lstm_states = [l_h, l_c]
last_layer = tfp_layers.DistributionLambda(
make_distribution_fn=distribution_fn)(dense_layer(last_layer))
output_layers.append(last_layer)
# This does not work
# last_layer = distributions.Blockwise(output_layers)
# This works for the model but cannot compute loss
# last_layer = layers.Concatenate(axis=1)(output_layers)
the_model = models.Model(inputs=[input_layer], outputs=[last_layer])
the_model.compile(loss=likelihood_loss, optimizer=optimizers.Adam(lr=0.001))
The problem is your Input, not your output layer ;)
Input:0 is referenced in your error message.
Could you try to be more specific about your input?
I would like to check if noise is truly added and used during training of my neural network. I therefore build my NN with keras like this:
from keras.layers import Input
from keras.layers.noise import GaussianNoise
inp = Input(tensor=self.X_ph)
noised_x = GaussianNoise(stddev=self.x_noise_std)(inp)
x = Dense(15, activation='elu')(noised_x)
x = Dense(15, activation='elu')(x)
self.estimator = x
...
# kernel weights, as output by the neural network
self.logits = logits = Dense(n_locs * self.n_scales, activation='softplus')(self.estimator)
self.weights = tf.nn.softmax(logits)
# mixture distributions
self.cat = cat = Categorical(logits=logits)
self.components = components = [MultivariateNormalDiag(loc=loc, scale_diag=scale) for loc in locs_array for scale in scales_array]
self.mixtures = mixtures = Mixture(cat=cat, components=components, value=tf.zeros_like(self.y_ph))
Then I use edward to execute inference:
self.inference = ed.MAP(data={self.mixtures: self.y_ph})
self.inference.initialize(var_list=tf.trainable_variables(), n_iter=self.n_training_epochs)
tf.global_variables_initializer().run()
According to the documentation, the closest I get to this is through ed.MAP's run() and update() functions.
Preferably, I would do something like this:
noised_x = self.sess.run(self.X_ph, feed_dict={self.X_ph: X, self.y_ph: Y})
np.allclose(noised_x, X) --> False
How can I properly verify that noise is being used at train time and disabled at test time within ed.MAP?
Update 1
Apparently the way I use GaussianNoise doesn't seem to add noise to my input since the following unittest fails:
X, Y = self.get_samples(std=1.0)
model_no_noise = KernelMixtureNetwork(n_centers=5, x_noise_std=None, y_noise_std=None)
model_no_noise.fit(X,Y)
var_no_noise = model_no_noise.covariance(x_cond=np.array([[2]]))[0][0][0]
model_noise = KernelMixtureNetwork(n_centers=5, x_noise_std=20.0, y_noise_std=20.0)
model_noise.fit(X, Y)
var_noise = model_noise.covariance(x_cond=np.array([[2]]))[0][0][0]
self.assertGreaterEqual(var_noise - var_no_noise, 0.1)
I also made sure that during the inference.update(...) the assertion
assert tf.keras.backend.learning_phase() == 1
passes.
Where could have something gone wrong here?
I have recently started using TensorFlow (TF), and I have come across a problem that I need some help with. Basically, I've restored a pre-trained model, and I need to modify the weights and biases of one of its layers before I retest its accuracy. Now, my problem is the following:
how can I change the weights and biases using the assign method in TF? Is modifying the weights of a restored modeled even possible in TF?
Here is my code:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data # Imports the MINST dataset
# Data Set:
# ---------
mnist = input_data.read_data_sets("/home/frr/MNIST_data", one_hot=True)# An object where data is stored
ImVecDim = 784# The number of elements in a an image vector (flattening a 28x28 2D image)
NumOfClasses = 10
g = tf.get_default_graph()
with tf.Session() as sess:
LoadMod = tf.train.import_meta_graph('simple_mnist.ckpt.meta') # This object loads the model
LoadMod.restore(sess, tf.train.latest_checkpoint('./'))# Loading weights and biases and other stuff to the model
# ( Here I'd like to modify the weights and biases of layer 1, set them to one for example, before I go ahead and test the accuracy ) #
# Testing the acuracy of the model:
X = g.get_tensor_by_name('ImageIn:0')
Y = g.get_tensor_by_name('LabelIn:0')
KP = g.get_tensor_by_name('KeepProb:0')
Accuracy = g.get_tensor_by_name('NetAccuracy:0')
feed_dict = { X: mnist.test.images[:256], Y: mnist.test.labels[:256], KP: 1.0 }
print( 'Model Accuracy = ' )
print( sess.run( Accuracy, feed_dict ) )
In addition to an existing answer, tensor update can be performed via tf.assign function.
v1 = sess.graph.get_tensor_by_name('v1:0')
print(sess.run(v1)) # 1.0
sess.run(tf.assign(v1, v1 + 1))
print(sess.run(v1)) # 2.0
Thanks for everyone who responded. I'd just like to put the pieces together. This is the code the helped me accomplish what I want:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data # Imports the MINST dataset
# Data Set:
# ---------
mnist = input_data.read_data_sets("/home/frr/MNIST_data", one_hot=True)# An object where data is stored
ImVecDim = 784# The number of elements in a an image vector (flattening a 28x28 2D image)
NumOfClasses = 10
g = tf.get_default_graph()
with tf.Session() as sess:
LoadMod = tf.train.import_meta_graph('simple_mnist.ckpt.meta') # This object loads the model
LoadMod.restore(sess, tf.train.latest_checkpoint('./'))# Loading weights and biases and other stuff to the model
wc1 = g.get_tensor_by_name('wc1:0')
sess.run( tf.assign( wc1,tf.multiply(wc1,0) ) )# Setting the values of the variable 'wc1' in the model to zero.
# Testing the acuracy of the model:
X = g.get_tensor_by_name('ImageIn:0')
Y = g.get_tensor_by_name('LabelIn:0')
KP = g.get_tensor_by_name('KeepProb:0')
Accuracy = g.get_tensor_by_name('NetAccuracy:0')
feed_dict = { X: mnist.test.images[:256], Y: mnist.test.labels[:256], KP: 1.0 }
print( 'Model Accuracy = ' )
print( sess.run( Accuracy, feed_dict ) )
Yes it is possible. Your weights and biases are already loaded after you loaded the meta graph. You need to find their names (see the list_variables function) and then assign them to a Python variable.
For that, use tf.get_variable with the variable name. You might have to set reuse=True on your variable scope. See this answer for more detail on reusing variables.
Once you have them as a weights variable, you can call sess.run(weights.assign(...)).
An update to this for Tensorflow 2.4 using a different example than OP's.
# Step 0 - Init
model = # some tf.keras.Model
model_folder = # path to model files
ckpt_obj = tf.train.Checkpoint(model=model)
ckpt_obj.restore(save_path=tf.train.latest_checkpoint(str(model_folder))).expect_partial()
# Step 1 - Loop over all layers
for layer in model.layers:
# Step 2 - Loop over submodules of a layer
for submodule in layer.submodules:
# Step 3 - Find a particular type of submodule (alternative use submodule.name=='SomeName')
if type(submodule) == tfp.layers.Convolution3DFlipout: # kernel=N(loc,scale) --> N=Normal distro
# Step 4 - Extract numpy weights using .get_weights()
## Note: Different compared to submodule.weights which returns a tensor that shall also have a name e.g. wc1:0
weights = submodule.get_weights() # [scale, rho, bias] --> kernel=N(loc,scale=tfp.bijectors.Softplus(rho)) --> output=input*kernel + bias
# Step 5 - Set weights as a new numpy array of your choice
weights[1] = np.full(weights[1].shape, -np.inf)
# Step 6 - Update weights
submodule.set_weights(weights)
input = tf.random.normal((1,100,100,100,1)) # 3D input with batch=1, channels=1
_ = model(input)
I have trained a simple long short-term memory (lstm) model in lasagne following the recipie here:https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py
Here is the architecture:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
# We now build the LSTM layer which takes l_in as the input layer
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients.
l_forward_1 = lasagne.layers.LSTMLayer(
l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
l_forward_2 = lasagne.layers.LSTMLayer(
l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN)
# Since we are only interested in the final prediction, we isolate that quantity and feed it to the next layer.
# The output of the sliced layer will then be of size (batch_size, N_HIDDEN)
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
# The sliced output is then passed through the softmax nonlinearity to create probability distribution of the prediction
# The output of this stage is (batch_size, vocab_size)
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
# Theano tensor for the targets
target_values = T.ivector('target_output')
# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)
# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output,target_values).mean()
# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)
# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)
# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost, updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost, allow_input_downcast=True)
# In order to generate text from the network, we need the probability distribution of the next character given
# the state of the network and the input (a seed).
# In order to produce the probability distribution of the prediction, we compile a function called probs.
probs = theano.function([l_in.input_var],network_output,allow_input_downcast=True)
and the model is trained via:
for it in xrange(data_size * num_epochs / BATCH_SIZE):
try_it_out() # Generate text using the p^th character as the start.
avg_cost = 0;
for _ in range(PRINT_FREQ):
x,y = gen_data(p)
#print(p)
p += SEQ_LENGTH + BATCH_SIZE - 1
if(p+BATCH_SIZE+SEQ_LENGTH >= data_size):
print('Carriage Return')
p = 0;
avg_cost += train(x, y)
print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))
How can I save the model so I do not need to train it again? With scikit I generally just pickle the model object. However I am unclear on the analogous process with Theano / lasagne.
You can save the weights with numpy:
np.savez('model.npz', *lasagne.layers.get_all_param_values(network_output))
And load them again later on like this:
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network_output, param_values)
Source: https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py
As for the model definition itself: One option is certainly to keep the code and regenerate the network, before setting the pretrained weights.
You can save the model parameters and the model by Pickle
import cPickle as pickle
import os
#save the network and its parameters as a dictionary
netInfo = {'network': network, 'params': lasagne.layers.get_all_param_values(network)}
Net_FileName = 'LSTM.pkl'
# save the dictionary as a .pkl file
pickle.dump(netInfo, open(os.path.join(/path/to/a/folder/, Net_FileName), 'wb'),protocol=pickle.HIGHEST_PROTOCOL)
After saving your model, it can be retrieved by pickle.load:
net = pickle.load(open(os.path.join(/path/to/a/folder/,Net_FileName),'rb'))
all_params = net['params']
lasagne.layers.set_all_param_values(net['network'], all_params)
I've had success using dill in combination with the numpy.savez function:
import dill as pickle
...
np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
with open('model.dpkl','wb') as p_output:
pickle.dump(network, p_output)
To import the pickled model:
with open('model.dpkl', 'rb') as p_input:
network = pickle.load(p_input)
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network, param_values)