Let's suppose we have a neural net with three layers: Inputs > Hidden > Outputs, and consider that the weights between the Hidden and Outputs layers are W, b, where W is a matrix of shape (N, M). By default, all components of W and b are set as trainable in Keras. I know how to set the entire W or b as non-trainable, as in the link below:
How to set parameters in keras to be non-trainable?
What I want is to be able to set only a specific component of W (for example) to be non-trainable. For instance, if:
W = [[W11, W12],
     [W21, W22]]
which can be rewritten as:
W = [W1, W2] with W1 = [W11, W12] and W2 = [W21, W22],
and both W1 and W2 are of type tf.Variable,
how do I set, for instance, W1 as non-trainable?
I looked through some other topics, but none of them helped me get what I want. Some example links are below:
Link 1 : https://keras.io/guides/transfer_learning/
Link 2 : https://github.com/tensorflow/tensorflow/issues/47597
Can anyone help me solve this?
Thank you in advance.
The tensor W is stored as a single tf.Variable (not four variables W11, W12, W21, W22), and tf.Variable.trainable controls entire tensors, not sub-tensors. Worse yet, inside a Keras layer, all variables have the same trainable attribute, because they are controlled by the tf.keras.layers.Layer.trainable attribute.
To do what you want, you'd need two variables W1 and W2, each wrapped in a different layer instance. You'd apply each layer to the input, each producing half the answer, and then concatenate to get the complete answer.
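As a minimal sketch of that idea at the raw tf.Variable level (TF2 eager mode; the shapes here are made up for illustration): keep the frozen rows in their own non-trainable variable, concatenate for the forward pass, and pass only the trainable half to the optimizer.

import tensorflow as tf

x = tf.random.normal((8, 4))                                 # dummy batch
W1 = tf.Variable(tf.random.normal((2, 3)), trainable=True)   # rows to train
W2 = tf.Variable(tf.random.normal((2, 3)), trainable=False)  # rows to freeze
b = tf.Variable(tf.zeros(3))

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    W = tf.concat([W1, W2], axis=0)  # reassemble the full (4, 3) kernel
    y = tf.matmul(x, W) + b
    loss = tf.reduce_mean(tf.square(y))
# Only W1 and b get gradient updates; W2 never changes.
grads = tape.gradient(loss, [W1, b])
optimizer.apply_gradients(zip(grads, [W1, b]))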
You can create your own layers in Keras. This lets you customize the weights within your layers, e.g., whether they are trainable or not.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress TensorFlow messages
import tensorflow as tf
from keras.layers import *
from keras.models import *

# Your custom layer
class Linear(Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=False
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
In Linear, the weights w are trainable and the bias b is not. Here, I am creating a training loop on dummy data to visualize the weights updating.
batch_size = 10
input_shape = (batch_size, 5, 5)

## model
model = Sequential()
model.add(Input(shape=input_shape))
model.add(Linear(units=4, name='my_linear_layer'))
model.add(Dense(1))

## dummy dataset
x = tf.random.normal(input_shape)  # dummy input
y = tf.ones((batch_size, 1))       # dummy output

## loss function and optimizer
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)

### training loop
epochs = 3
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    tf.print(model.get_layer('my_linear_layer').get_weights())

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
This loop prints the following result:
Start of epoch 0
[array([[ 0.08920084, -0.04294993, 0.06111819, 0.08334437],
[-0.0369432 , -0.05014499, 0.0305218 , -0.07486793],
[-0.01227043, 0.09460627, -0.0560123 , 0.01324316],
[-0.00255878, 0.00214959, -0.02924518, 0.04721532],
[-0.05532415, -0.02014978, -0.06785563, -0.07330619]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
Start of epoch 1
[array([[ 0.08961578, -0.04327399, 0.06152926, 0.08325274],
[-0.03829437, -0.04908974, 0.02918325, -0.07456956],
[-0.01417133, 0.09609085, -0.05789544, 0.01366292],
[-0.00236284, 0.00199657, -0.02905108, 0.04717206],
[-0.05536905, -0.02011472, -0.06790011, -0.07329627]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
Start of epoch 2
[array([[ 0.09001605, -0.04358549, 0.06192534, 0.08316355],
[-0.03960795, -0.04806747, 0.02788337, -0.07427685],
[-0.01599812, 0.09751251, -0.05970317, 0.01406999],
[-0.00217021, 0.00184666, -0.02886046, 0.04712913],
[-0.05540781, -0.02008455, -0.06793848, -0.07328764]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
As you can see, while the weights w are updating, the bias b stays constant.
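As a quick sanity check, a sketch reusing the layer name from the example above: Keras exposes which variables the optimizer will and won't see.

layer = model.get_layer('my_linear_layer')
print([v.name for v in layer.trainable_weights])      # contains only the kernel w
print([v.name for v in layer.non_trainable_weights])  # contains the frozen bias b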
So I'm trying to solve a similar problem at the moment. What you would need to do is first use the functional API of Keras. Then put all the weights that you want to be trainable into one layer and all the weights you want to be non-trainable into another layer, and have the previous layer feed into both of these layers. Then you can use the Keras Concatenate layer to combine the two back together. So, say you had a hidden layer with 5 neurons, 3 of which you wanted to be trainable and 2 of which you wanted to be non-trainable:
X = Dense(5, activation='relu')(X)  # previous layer
Y = Dense(3, activation='relu', name='trainable_layer')(X)
non_trainable = Dense(2, activation='relu', name='non_trainable_layer')
non_trainable.trainable = False  # set trainable on the layer object, not on its output tensor
Z = non_trainable(X)
X = Concatenate()([Y, Z])
X = Dense(5, activation='relu')(X)  # layer after the layer with mixed trainable weights
I am having an issue with my code that I modified from https://keras.io/examples/generative/wgan_gp/ . Instead of the data being images, my data is a (1001, 2) array of sequential data: the first column is the time and the second the velocity measurements. I'm getting this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14704/3651127346.py in <module>
21 # Training the WGAN-GP model
22 tic = time.perf_counter()
---> 23 WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
24 toc = time.perf_counter()
25 time_elapsed(toc-tic)
~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
return step_function(self, iterator)
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 141, in train_step
gp = self.gradient_penalty(batch_size, x_real, x_fake)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 106, in gradient_penalty
alpha = tf.random.uniform(batch_size,1,1)
ValueError: Shape must be rank 1 but is rank 0 for '{{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0](strided_slice)' with input shapes: [].
And here is my code:
import time
from tqdm.notebook import tqdm
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import numpy as np
import matplotlib.pyplot as plt
def define_generator(latent_dim):
    # This function creates the generator model using the functional API.
    # Layers...
    # Input layer
    inputs = Input(shape=latent_dim, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(2, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the generator model
    model = Model(inputs=inputs, outputs=outputs, name='GENERATOR')
    return model
def generator_loss(fake_logits):
    # This function calculates and returns the WGAN-GP generator loss.
    # Expected value of critic output from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = -expectation_fake
    return loss
def define_critic():
    # This function creates the critic model using the functional API.
    # Layers...
    # Input layer
    inputs = Input(shape=2, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(1, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the critic model
    model = Model(inputs=inputs, outputs=outputs, name='CRITIC')
    return model
def critic_loss(real_logits, fake_logits):
    # This function calculates and returns the WGAN-GP critic loss.
    # Expected value of critic output from real images
    expectation_real = tf.reduce_mean(real_logits)
    # Expected value of critic output from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = expectation_fake - expectation_real
    return loss
class define_wgan(keras.Model):
    # This class creates the WGAN-GP object.
    # Attributes:
    #   critic = the critic model.
    #   generator = the generator model.
    #   latent_dim = defines generator input dimension.
    #   critic_steps = defines how many times the discriminator gets trained for each training cycle.
    #   gp_weight = the weight applied to the gradient penalty term.
    # Methods:
    #   compile() = defines the optimizer and loss function of both the critic and generator.
    #   gradient_penalty() = calculates and returns the gradient penalty term in the WGAN-GP loss function.
    #   train_step() = performs the WGAN-GP training by updating the critic and generator weights
    #     and returns the loss for both. Called by fit().
    def __init__(self, gen, critic, latent_dim, n_critic_train, gp_weight):
        super().__init__()
        self.critic = critic
        self.generator = gen
        self.latent_dim = latent_dim
        self.critic_steps = n_critic_train
        self.gp_weight = gp_weight

    def compile(self, generator_loss, critic_loss):
        super().compile()
        self.generator_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.critic_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.generator_loss_function = generator_loss
        self.critic_loss_function = critic_loss

    def gradient_penalty(self, batch_size, x_real, x_fake):
        # Random uniform samples of points between distributions.
        # "alpha" must be a tensor so that "x_interp" will also be a tensor.
        alpha = tf.random.uniform(batch_size, 1, 1)  # <- the line raising the ValueError
        # Data interpolated between real and fake distributions
        x_interp = alpha*x_real + (1-alpha)*x_fake
        # Calculating critic output gradient wrt interpolated data
        with tf.GradientTape() as gp_tape:
            gp_tape.watch(x_interp)
            critic_output = self.critic(x_interp, training=True)
        grad = gp_tape.gradient(critic_output, x_interp)[0]
        # Calculating norm of gradient
        grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad)))
        # Calculating gradient penalty
        gp = tf.reduce_mean((grad_norm - 1.0)**2)
        return gp
    def train_step(self, x_real):
        # Critic training
        # Getting batch size for creating latent vectors
        print(x_real)
        batch_size = tf.shape(x_real)[0]
        print(batch_size)
        # Critic training loop
        for i in range(self.critic_steps):
            # Generating latent vectors
            latent = tf.random.normal(shape=(batch_size, self.latent_dim))
            with tf.GradientTape() as tape:
                # Obtaining fake data from generator
                x_fake = self.generator(latent, training=True)
                # Critic output from fake data
                fake_logits = self.critic(x_fake, training=True)
                # Critic output from real data
                real_logits = self.critic(x_real, training=True)
                # Calculating critic loss
                c_loss = self.critic_loss_function(real_logits, fake_logits)
                # Calculating gradient penalty
                gp = self.gradient_penalty(batch_size, x_real, x_fake)
                # Adjusting critic loss with gradient penalty
                c_loss = c_loss + self.gp_weight*gp
            # Calculating gradient of critic loss wrt critic weights
            critic_grad = tape.gradient(c_loss, self.critic.trainable_variables)
            # Updating critic weights
            self.critic_optimizer.apply_gradients(zip(critic_grad, self.critic.trainable_variables))

        # Generator training
        # Generating latent vectors
        latent = tf.random.normal(shape=(batch_size, self.latent_dim))
        with tf.GradientTape() as tape:
            # Obtaining fake data from generator
            x_fake = self.generator(latent, training=True)
            # Critic output from fake data
            fake_logits = self.critic(x_fake, training=True)
            # Calculating generator loss
            g_loss = self.generator_loss_function(fake_logits)
        # Calculating gradient of generator loss wrt generator weights
        generator_grad = tape.gradient(g_loss, self.generator.trainable_variables)
        # Updating generator weights
        self.generator_optimizer.apply_gradients(zip(generator_grad, self.generator.trainable_variables))
        return g_loss, c_loss
class GAN_monitor(keras.callbacks.Callback):
    def __init__(self, n_samples, latent_dim):
        self.n_samples = n_samples
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        latent = tf.random.normal(shape=(self.n_samples, self.latent_dim))
        generated_data = self.model.generator(latent)
        plt.plot(generated_data)
        plt.savefig('Epoch _'+str(epoch)+'.png', dpi=300)
data = np.genfromtxt('Flight_1.dat', dtype='float', encoding=None, delimiter=',')[0:1001,0]
time_span = np.linspace(0,20,1001)
dataset = np.concatenate((time_span[:,np.newaxis], data[:,np.newaxis]), axis=1)
dataset.shape

# Training Parameters
latent_dim = 100
n_epochs = 10
n_critic_train = 5
gp_weight = 10
batch_Size = 100

# Instantiating the generator and discriminator models
gen = define_generator(latent_dim)
critic = define_critic()

# Instantiating the WGAN-GP object
WGAN = define_wgan(gen, critic, latent_dim, n_critic_train, gp_weight)

# Compiling the WGAN-GP model
WGAN.compile(generator_loss, critic_loss)

# Instantiating custom Keras callback
cbk = GAN_monitor(n_samples=1, latent_dim=latent_dim)

# Training the WGAN-GP model
tic = time.perf_counter()
WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
toc = time.perf_counter()
time_elapsed(toc-tic)
The issue is the shape I am providing to tf.random.uniform() for the assignment of alpha. I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example, so I don't know how to specify the shape for my example. Furthermore, I don't understand this line in the Keras example:
batch_size = tf.shape(real_images)[0]
In this example, 'real_images' is a (60000, 28, 28, 1) array, and it gets passed to the fit() method, which then passes it to the train_step() method. (It gets passed as "train_images", but they are the same variable.) If I add a line that prints out 'real_images' before this tf.shape() call, this is what it produces:
Tensor("IteratorGetNext:0", shape=(None, 28, 28, 1), dtype=float32)
Why is the 60000 now None? Then I added a line that prints out "batch_size" after the tf.shape() call, and this is what it produces:
Tensor("strided_slice:0", shape=(), dtype=int32)
I googled "tf strided_slice", but all I could find is the method tf.strided_slice(). So what exactly is the value of "batch_size" and why are the output of variables so ambiguous when they are tensors? In fact, I type:
tf.shape(train_images)[0]
in another cell of the Jupyter notebook, I get a completely different output:
<tf.Tensor: shape=(), dtype=int32, numpy=60000>
I really need to understand this Keras example in order to successfully implement this code for my data. Any help is appreciated.
BTW: I am using only one set of data for now, but once I get the GAN running, I will provide multiple of these (1001, 2) datasets. Also, if you want to test the code yourself, replacing the "dataset" variable with any (1001, 2) numpy array should suffice. Thank you.
'Why is the 60000 now None?': When defining TensorFlow models, the first dimension (batch_size) is None. Getting under the hood of what goes on with TensorFlow and how it uses graphs for computation can be quite complex. But for your understanding right now, all you need to know is that batch_size does not need to be specified when defining the model, hence None. This is essential, as it allows a model to be defined once but then trained with and applied to datasets of an arbitrary number of examples. For example, when training you may provide the model with a batch of 256 images at a time, but when using the trained model for inference, you might only want the input to be a single image. The actual value of the first dimension of the input size therefore only matters once computation begins.
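A quick way to see this, as a small sketch: the batch axis of a Keras input stays None until real data arrives.

import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1))
print(inputs.shape)  # (None, 28, 28, 1): the batch size is not fixed at model definition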
'I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example': The reason for this shape is that you want a different random value, alpha, for each image. You have batch_size number of images, hence batch_size in the first dimension, but alpha is just a single value per image in tensor format, so it only needs size 1 in all the other dimensions. The reason it has 4 dimensions overall is so that it can be used in calculations with your inputs, which are 4-D image tensors with a shape of something like (batch_size, img_h, img_w, 3) for color images with 3 RGB channels.
In terms of understanding your error, "Shape must be rank 1 but is rank 0", it is saying that the function you are using, tf.random.uniform, requires its shape argument to be a rank-1 tensor, i.e. something with one dimension, but it is being passed a rank-0 tensor, i.e. a scalar value. From your code, it looks like you are passing it the bare value of batch_size rather than a shape list. This might work instead:
alpha = tf.random.uniform([batch_size, 1, 1, 1])
The first parameter of this function is the shape, so it is important to have the [] there. Check the documentation on this function to make sure you're using it correctly: https://www.tensorflow.org/api_docs/python/tf/random/uniform.
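For what it's worth, since the data in this question is 2-D (batch, 2) rather than a 4-D image tensor, a rank-2 alpha should be enough to broadcast against x_real and x_fake; a minimal sketch under that assumption:

import tensorflow as tf

batch_size = 100
x_real = tf.random.normal((batch_size, 2))  # stand-ins for the real and fake batches
x_fake = tf.random.normal((batch_size, 2))

# One random scalar per example; shape [batch_size, 1] broadcasts over the 2 columns.
alpha = tf.random.uniform([batch_size, 1], 0.0, 1.0)
x_interp = alpha * x_real + (1 - alpha) * x_fake
print(x_interp.shape)  # (100, 2)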
I've created the following Keras custom model:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyModel, self).__init__()
        self.dense_layer = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.lambda_layer = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))

    def call(self, inputs):
        x = self.dense_layer(inputs)
        x = self.lambda_layer(x)
        return x

    # A convenient way to get a model summary
    # and plot in the subclassed API
    def build_graph(self, raw_shape):
        x = tf.keras.layers.Input(shape=(raw_shape))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))
The task is multi-class classification.
The model consists of a dense layer with softmax activation and a lambda layer as a post-processing unit that converts the dense output vector to a single value (the predicted class).
The train targets are a one-hot encoded matrix like so:
[
  [0, 0, 0, 0, 1],
  [0, 0, 1, 0, 0],
  [0, 0, 0, 1, 0],
  [0, 0, 0, 0, 1]
]
It would be nice if I could define a categorical_crossentropy loss over the dense layer and ignore the lambda layer while still maintaining the functionality and outputting a single value when I call model.predict(x).
Please note
My workspace environment doesn't allow me to use a custom training loop as suggested by @alonetogether's excellent answer.
You can try using a custom training loop, which is pretty straightforward IMO:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyModel, self).__init__()
        self.dense_layer = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.lambda_layer = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))

    def call(self, inputs):
        x = self.dense_layer(inputs)
        x = self.lambda_layer(x)
        return x

    # A convenient way to get a model summary
    # and plot in the subclassed API
    def build_graph(self, raw_shape):
        x = tf.keras.layers.Input(shape=(raw_shape))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

n_classes = 5
model = MyModel(n_classes)
labels = tf.keras.utils.to_categorical(tf.random.uniform((50, 1), maxval=5, dtype=tf.int32))
train_dataset = tf.data.Dataset.from_tensor_slices((tf.random.normal((50, 1)), labels)).batch(2)

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.CategoricalCrossentropy()
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # Compute the loss on the dense layer's softmax output,
            # skipping the lambda layer.
            logits = model.layers[0](x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
And prediction:
print(model.predict(tf.random.normal((1, 1))))
[3]
I think there was a predict_classes function (on Sequential models; it has been deprecated in recent TensorFlow versions) that would replace the need for that lambda layer. But if it doesn't work:
There doesn't seem to be a way to do that without using one of these hacks:
Two inputs (one is the ground truth values Y)
Two outputs
Two models
I'm quite convinced there is no other workaround for this.
So, I believe the "two models" version is the best for your case where you seem to "need" a model with single input, single output and fit.
Then I'd do this:
inputs = tf.keras.layers.Input(input_shape_without_batch_size)
loss_outputs = tf.keras.layers.Dense(num_classes,activation='softmax')(inputs)
final_outputs = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))(loss_outputs)
training_model = tf.keras.models.Model(inputs, loss_outputs)
final_model = tf.keras.models.Model(inputs, final_outputs)
training_model.compile(.....)
training_model.fit(....)
results = final_model.predict(...)
I have written the following multilayer perceptron model in TensorFlow, but it is not training. The accuracy stays around 9%, which is equivalent to random guessing, and the cross-entropy stays around 2.56 and does not vary much.
The architecture is as follows:
def create_model(fingerprint_input, model_settings, is_training):
    if is_training:
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')
    fingerprint_size = model_settings['fingerprint_size']
    label_count = model_settings['label_count']
    weights_1 = tf.Variable(tf.truncated_normal([fingerprint_size, 128], stddev=0.001))
    weights_2 = tf.Variable(tf.truncated_normal([128, 128], stddev=0.001))
    weights_3 = tf.Variable(tf.truncated_normal([128, 128], stddev=0.001))
    weights_out = tf.Variable(tf.truncated_normal([128, label_count], stddev=0.001))
    bias_1 = tf.Variable(tf.zeros([128]))
    bias_2 = tf.Variable(tf.zeros([128]))
    bias_3 = tf.Variable(tf.zeros([128]))
    bias_out = tf.Variable(tf.zeros([label_count]))
    layer_1 = tf.matmul(fingerprint_input, weights_1) + bias_1
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.matmul(layer_1, weights_2) + bias_2
    layer_2 = tf.nn.relu(layer_2)
    layer_3 = tf.matmul(layer_2, weights_3) + bias_3
    layer_3 = tf.nn.relu(layer_3)
    logits = tf.matmul(layer_3, weights_out) + bias_out
    if is_training:
        return logits, dropout_prob
    else:
        return logits
It takes the input size as fingerprint_size and the number of labels as label_count. It has three hidden layers with 128 neurons each. I'm following the TensorFlow example on a speech dataset, which provides the framework for everything else; according to the documentation, all I needed to do was plug in my own network architecture, and my method should have those arguments defined and return the logits.
When I trained another predefined architecture, with the same inputs and output, the neural network trains. But this one is not training. Here is one predefined architecture:
def create_single_fc_model(fingerprint_input, model_settings, is_training):
    if is_training:
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')
    fingerprint_size = model_settings['fingerprint_size']
    label_count = model_settings['label_count']
    weights = tf.Variable(
        tf.truncated_normal([fingerprint_size, label_count], stddev=0.001))
    bias = tf.Variable(tf.zeros([label_count]))
    logits = tf.matmul(fingerprint_input, weights) + bias
    if is_training:
        return logits, dropout_prob
    else:
        return logits
The learning rate is 0.001 for the first 15000 steps and 0.0001 for the last 3000 steps. These are the defaults. I also tried 0.01 and 0.001, but got the same result. I think the problem is somewhere in the above implementation.
Any idea?
Thank you in advance!
You have likely encountered a vanishing gradient problem: your variables were initialized with very small values (controlled by the stddev parameter). This worked with one layer, but with multiple layers it caused gradients to vanish during backpropagation.
Try increasing the standard deviation of the randomly initialized weight variables, e.g.
weights_n = tf.Variable(tf.truncated_normal([a, b], stddev=0.1))
and initialize biases with non-zero values like
bias_n = tf.Variable(tf.constant(0.1, shape=[b]))
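For instance, applied to the first layer in the question (a sketch; the same change applies to the other weight and bias variables):

weights_1 = tf.Variable(tf.truncated_normal([fingerprint_size, 128], stddev=0.1))
bias_1 = tf.Variable(tf.constant(0.1, shape=[128]))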
I have to build a binary classifier to predict whether the input video contains an action or not.
The input to the model will be of shape: [batch, frames, height, width, channel]
Here, batch is the number of videos, frames is the number of images in each video (fixed for every video), height is the number of rows in each image, width is the number of columns in each image, and channel is the RGB color channels.
I found in Andrej Karpathy's blog that a many-to-many Recurrent Neural Network is best for this application: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Thus, I need to implement this in TensorFlow:
I learned how to implement LSTM using this tutorial: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py#L52
But it implements a many-to-one LSTM, predicting output and reducing loss using only the last tensor: outputs[-1]
And I want to predict the output using many tensors (let's say 4) and reduce the loss using all of them.
Here's my implementation:
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

# Training Parameters
batch = 5  # number of examples
frames = time_step_size = 20
height = 60
width = 80
channel = 3
lstm_size = 240
num_classes = 2

# Creating random data
input_x = np.random.normal(size=[batch, frames, height, width, channel])
input_y = np.zeros((batch, num_classes))
B = np.ones(batch)
input_y[:, 1] = B

X = tf.placeholder("float", [None, frames, height, width, channel], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')

with tf.name_scope('Model'):
    XR = tf.reshape(X, [-1, height*width*channel])  # shape=(?, 14400)
    X_split3 = tf.split(XR, time_step_size, 0)  # 20 tensors of shape=(?, 14400)
    lstm = rnn.BasicLSTMCell(lstm_size, forget_bias=1.0, state_is_tuple=True)
    outputs, _states = rnn.static_rnn(lstm, X_split3, dtype=tf.float32)  # 20 tensors of shape=(?, 240)
    logits = tf.layers.dense(outputs[-1], num_classes, name='logits')  # shape=(?, 2)
    prediction = tf.nn.softmax(logits)

# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
    train_op = optimizer.minimize(loss_op)

# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    logits_output = sess.run(logits, feed_dict={X: input_x})
    print(logits_output.shape)  # shape=(5, 2)
    sess.run(train_op, feed_dict={X: input_x, Y: input_y})
    loss, acc = sess.run([loss_op, accuracy], feed_dict={X: input_x, Y: input_y})
    print("Loss: ", loss)      # loss: 1.46626135e-05
    print("Accuracy: ", acc)   # Accuracy: 1.0
Problems:
1. I need help implementing a many-to-many LSTM that predicts output after certain frames (let's say every 4th), but I am only using the last tensor, outputs[-1], to reduce the loss. There are 20 tensors, one for each frame or time step. If I transform every 5th tensor (outputs[4], outputs[9], outputs[14], outputs[-1]), I will get 4 logits. So how am I going to reduce the loss on all four of them?
2. One more problem: I have to implement a binary classifier, but I only have videos of the action I want to identify. So input_y is a one-hot representation of the labels in which the 1st column is always 0 and the 2nd column is always 1 (the action I have to identify), and I don't have any example video in which the 1st column's value is 1. Do you think this will work?
3. Why, in the above implementation, is the accuracy 1 after only one iteration?
Thanks
For 1., Dense applies to the last axis and accepts any number of leading batch/time dimensions, so you should be able to transform all of the steps into logits in one go, compute a loss for each step, and then aggregate, e.g. by taking the mean; see the sketch below.
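As a rough sketch of that, using the names from the question's code (TF1 API; 'step_logits' is a hypothetical layer name): stack the chosen step outputs, let the dense layer map the last axis, repeat the label per step, and take the mean loss.

# Logits at steps 4, 9, 14 and the last step; shape (4, batch, lstm_size) -> (4, batch, 2).
selected = tf.stack([outputs[4], outputs[9], outputs[14], outputs[-1]])
step_logits = tf.layers.dense(selected, num_classes, name='step_logits')

# Use the same label at every selected step, then average across steps and batch.
step_labels = tf.stack([Y] * 4)
step_losses = tf.nn.softmax_cross_entropy_with_logits(logits=step_logits, labels=step_labels)
loss_op = tf.reduce_mean(step_losses)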
For 2. and 3., it seems like you need to find some negative examples. There's a literature on "positive and unlabeled (PU)" learning and "one-class classification" which may help.
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

n_steps = 3           # number of time steps in RNN
n_inputs = 1          # number of inputs received by RNN cell at each time step
n_neurons = 10        # number of RNN cells in hidden layer
n_outputs = 3         # number of outputs given out by RNN
n_layers = 3          # number of layers in network
n_epochs = 1000       # number of epochs for RNN training
learning_rate = 0.01  # learning rate for training step

# training data for input sequence
X_train = np.array([[[1.],
                     [2.],
                     [3.]],
                    [[4.],
                     [5.],
                     [6.]],
                    [[7.],
                     [8.],
                     [9.]]])
# training data for output sequence
y_train = np.array([[4., 5., 6.],
                    [7., 8., 9.],
                    [10., 11., 12.]])

X_train = X_train.reshape((3, 3, 1))  # reshape X training data (reshape returns a new array)

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])  # placeholder for input sequence
y = tf.placeholder(tf.float32, [None, n_outputs])          # placeholder for output sequence

# create hidden layer of 10 basic RNN cells
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, reuse=True)
# create layer using basic RNN cell and input sequence X using the dynamic RNN method
# (get outputs and final state)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
logits = tf.layers.dense(states, n_outputs)  # logits (and final prediction) of RNN
prediction = logits                          # final prediction of RNN
rmse = tf.square(y - prediction)             # squared deviations of predictions from targets
loss = tf.reduce_mean(rmse)                  # mean of all squared deviations (cost function)
training_op = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)  # training step using Gradient Descent Optimizer

tf.global_variables_initializer().run()
for _ in range(n_epochs):
    sess.run(training_op, feed_dict={X: X_train, y: y_train})  # run training operation iteratively
In the above code, I am trying to predict the last 3 elements of a sequence given the first 3 elements, using a dynamic recurrent neural network with basic RNN cells. It has an input layer with 3 neurons, a hidden layer containing 10 recurrent neurons, and an output layer of 3 neurons. But it is giving an 'Invalid Argument Error', saying that I am feeding a tensor with a negative dimension of (-1, 3, 1) to a placeholder of shape (?, 3, 1). I have not been able to fix this error even after plenty of desperate googling. Can someone please help me fix it?
Thanks in advance for the help!