Tensorflow, I got a Shape mismatch in execution time - python

Good Afternoon Everyone,
I am currently having some trouble with tensorflow, since for some reason I get a Shape error after about 3 and a half hours running. The files are loaded using the tensorflow pipeline, and creating two reinitializable datasets for training and test. I know the data has the correct shape because I do a hardcoded reshape to the expected shape and I've never got an error there. The problem is, when running the network at some point there is a sample that do not have the correct amount of number in the flatten operation. And the program crashes, but there is no other explanation other than the number of elements in the tensor is not divisible by 10 (my batch size). Which honestly makes no sense to me since the data has gone exactly through the same pipeline as the other batches that run with no problem.
I can provide code if needed, but I think is more a failure to understand some concept from the framework.
Thanks in advance for all the help.
EDIT: Please, find the code here, a bit of nomemclature t corresponds to a layer that has time data (X), f corresponds to a layer that has frequency data (FREQ), q corresponds to a layer that contains cepstral data (QUEF) and tf corresponds to layers that contain 2-D data, spectrograms of X (SPECG), Y is the label. All data are tf.float32 except for the labels which are tf.int64
EDIT 2: The operation that gives problems is the flatten on qsubnet_out
EDIT 3: Probably the most important, it seems than some of the layers converge to NaNs
Training loop:
for i in range(FLAGS.max_steps):
start = time.time()
sess.run([train],feed_dict={handle:train_handle})
if i%10 == False:
summary_op,entropy,acc,expected,output = sess.run([merged,loss,accuracy,Y,tf.argmax(logit,1)],feed_dict={handle:train_handle})
summary_op,_,_ = sess.run([merged,loss,accuracy],feed_dict={handle:test_handle})
Training operations:
W = { 'tc1': [64,3], 'tc2':[128,3], 'tc3':[256,5], 'tc4': [128, 2],
'fc1': [64,3], 'fc2':[128,3], 'fc3':[256,5], 'fc4': [128, 2],
'qc1': [64,3], 'qc2':[128,3], 'qc3':[256,5], 'qc4': [128, 2],
'tfc1': [64,(3,3)], 'tfc2':[128,(3,3)], 'tfc3':[256,(5,5)], 'tfc4': [128, (2,2)],
'dense1': 1000, 'dense2': 100, 'dense3': 200,'dense4': 300, 'dense5': 200,
'out' : NUM_CLASSES
}
iter = tf.data.Iterator.from_string_handle(handle, train_dataset.output_types, train_dataset.output_shapes)
X,FREQ,QUEF,SPECG,Y = iter.get_next()
X.set_shape([FLAGS.batch_size,768,14])
FREQ.set_shape([FLAGS.batch_size,384,14])
QUEF.set_shape([FLAGS.batch_size,384,14])
SPECG.set_shape([FLAGS.batch_size,65,18,14])
logit = net.run(X,FREQ,QUEF,SPECG,W)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=logit))
And the the file net.py:
def run(X,FREQ,QUEF,SPECG,W):
time = tf.layers.batch_normalization(X,axis=-1,training=True,trainable=True)
freq = tf.layers.batch_normalization(FREQ,axis=-1,training=True,trainable=True)
quef = tf.layers.batch_normalization(QUEF,axis=-1,training=True,trainable=True)
time_freq = tf.layers.batch_normalization(SPECG,axis=-1,training=True,trainable=True)
regularizer = tf.contrib.layers.l2_regularizer(0.1);
#########################################################################################################
#### TIME SUBNET
with tf.device('/GPU:1'):
tc1 = tf.layers.conv1d(inputs=time,filters=W['tc1'][0],kernel_size=W['tc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc1')
trelu1 = tf.nn.relu(features=tc1,name='trelu1')
tpool1 = tf.layers.max_pooling1d(trelu1,pool_size=2,strides=1)
tc2 = tf.layers.conv1d(inputs=tpool1,filters=W['tc2'][0],kernel_size=W['tc2'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc2')
tc3 = tf.layers.conv1d(inputs=tc2,filters=W['tc3'][0],kernel_size=W['tc3'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc3')
trelu2 = tf.nn.relu(tc3,name='trelu2')
tpool2 = tf.layers.max_pooling1d(trelu2,pool_size=2,strides=1)
tc4 = tf.layers.conv1d(inputs=tpool2,filters=W['tc4'][0],kernel_size=W['tc4'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc4')
tsubnet_out = tf.nn.relu6(tc4,'trelu61')
#########################################################################################################
#### CEPSTRUM SUBNET (QUEFRENCIAL)
qc1 = tf.layers.conv1d(inputs=quef,filters=W['qc1'][0],kernel_size=W['qc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc1')
qrelu1 = tf.nn.relu(features=qc1,name='qrelu1')
qpool1 = tf.layers.max_pooling1d(qrelu1,pool_size=2,strides=1)
qc2 = tf.layers.conv1d(inputs=qpool1,filters=W['qc2'][0],kernel_size=W['qc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc2')
qc3 = tf.layers.conv1d(inputs=qc2,filters=W['qc3'][0],kernel_size=W['qc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc3')
qrelu2 = tf.nn.relu(qc3,name='qrelu2')
qpool2 = tf.layers.max_pooling1d(qrelu2,pool_size=2,strides=1)
qc4 = tf.layers.conv1d(inputs=qpool2,filters=W['qc4'][0],kernel_size=W['qc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc4')
qsubnet_out = tf.nn.relu6(qc4,'qrelu61')
#########################################################################################################
#FREQ SUBNET
with tf.device('/GPU:1'):
fc1 = tf.layers.conv1d(inputs=freq,filters=W['fc1'][0],kernel_size=W['fc1'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc1')
frelu1 = tf.nn.relu(features=fc1,name='trelu1')
fpool1 = tf.layers.max_pooling1d(frelu1,pool_size=2,strides=1)
fc2 = tf.layers.conv1d(inputs=fpool1,filters=W['fc2'][0],kernel_size=W['fc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc2')
fc3 = tf.layers.conv1d(inputs=fc2,filters=W['fc3'][0],kernel_size=W['fc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc3')
frelu2 = tf.nn.relu(fc3,name='frelu2')
fpool2 = tf.layers.max_pooling1d(frelu2,pool_size=2,strides=1)
fc4 = tf.layers.conv1d(inputs=fpool2,filters=W['fc4'][0],kernel_size=W['fc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc4')
fsubnet_out = tf.nn.relu6(fc4,'frelu61')
########################################################################################################
## TIME/FREQ SUBNET
with tf.device('/GPU:0'):
tfc1 = tf.layers.conv2d(inputs=time_freq,filters=W['tfc1'][0],kernel_size=W['tfc1'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc1')
tfrelu1 = tf.nn.relu(tfc1)
tfpool1 = tf.layers.max_pooling2d(tfrelu1,pool_size=[2, 2],strides=[1, 1])
tfc2 = tf.layers.conv2d(inputs=tfpool1,filters=W['tfc2'][0],kernel_size=W['tfc2'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc2')
tfc3 = tf.layers.conv2d(inputs=tfc2,filters=W['tfc3'][0],kernel_size=W['tfc3'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc3')
tfrelu2 = tf.nn.relu(tfc3)
tfpool2 = tf.layers.max_pooling2d(tfrelu2,pool_size=[2, 2], strides=[1, 1])
tfc4 = tf.layers.conv2d(inputs=tfpool2,filters=W['tfc4'][0],kernel_size=W['tfc4'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc4')
tfsubnet_out = tf.nn.relu6(tfc4,'tfrelu61')
########################################################################################################
##Flatten subnet outputs
tsubnet_out = tf.layers.flatten(tsubnet_out)
fsubnet_out = tf.layers.flatten(fsubnet_out)
tfsubnet_out = tf.layers.flatten(tfsubnet_out)
qsubnet_out = tf.layers.flatten(qsubnet_out)
#Final subnet computation
input_final = tf.concat((tsubnet_out,fsubnet_out,qsubnet_out,tfsubnet_out),1)
dense1 = tf.layers.dense(input_final,W['dense1'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense1')
dense2 = tf.layers.dense(dense1,W['dense2'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense2')
dense3 = tf.layers.dense(dense2,W['dense3'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense3')
dense4 = tf.layers.dense(dense3,W['dense4'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense4')
dense5 = tf.layers.dense(dense4,W['dense5'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense5')
out = tf.layers.dense(dense5,W['out'],tf.nn.relu, name='out')
return out

Finally after some days, I've been able to track the problem. Which was not related to the code, I submitted, in the end. But it was related to the creation of the Tensorflow Dataset. Since in the batchin, if the length of the Dataset was not divisible by the batch size. The flag drop_remainder to True.
I will not delete the question since I believe is a problem that more people may have in the future and the source is not easily identificable.

Related

Feature importance in neural networks with multiple differently shaped inputs in pytorch and captum (classification)

I have developed a model with three inputs types. Image, categorical data and numerical data. For Image data I've used ResNet50 for the other two I develop my own network.
class MulticlassClassification(nn.Module):
def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
super(MulticlassClassification, self).__init__()
# IMAGE: ResNet
self.cnn = models.resnet50(pretrained = True)
for param in self.cnn.parameters():
param.requires_grad = False
n_inputs = self.cnn.fc.in_features
self.cnn.fc = nn.Sequential(
nn.Linear(n_inputs, 250),
nn.ReLU(),
nn.Dropout(p),
nn.Linear(250, output_size),
nn.LogSoftmax(dim=1)
)
# TABULAR
self.all_embeddings = nn.ModuleList(
[nn.Embedding(categories, size) for categories, size in cat_size]
)
self.embedding_dropout = nn.Dropout(p)
self.batch_norm_num = nn.BatchNorm1d(num_col)
all_layers = []
num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
input_size = num_cat_col + num_col
for i in layers:
all_layers.append(nn.Linear(input_size, i))
all_layers.append(nn.ReLU(inplace=True))
all_layers.append(nn.BatchNorm1d(i))
all_layers.append(nn.Dropout(p))
input_size = i
all_layers.append(nn.Linear(layers[-1], output_size))
self.layers = nn.Sequential(*all_layers)
#combine
self.combine_fc = nn.Linear(output_size * 2, output_size)
def forward(self, image, x_categorical, x_numerical):
embeddings = []
for i, embedding in enumerate(self.all_embeddings):
print(x_categorical[:,i])
embeddings.append(embedding(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)
x_numerical = self.batch_norm_num(x_numerical)
x = torch.cat([x, x_numerical], 1)
x = self.layers(x)
# img
x2 = self.cnn(image)
# combine
x3 = torch.cat([x, x2], 1)
x3 = F.relu(self.combine_fc(x3))
return x
Now after successful training I would like to calculate integrated gradients by using the captum library.
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())
And here I got an error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I figured out that captum injects a wrongly shaped tensor into my x_categorical input (with the print in my forward method). It seems like captum only sees the first input tensor and uses it's shape for all other inputs. How can I change this behaviour?
I've found the similar issue here (https://github.com/pytorch/captum/issues/439). It was recommended to use Interpretable Embedding for categorical data. When I used it I got this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I would be very grateful for any tips and advises how to combine all three inputs and to solve my problem.

Get the CNN layer output size in def init PyTorch

When defining our model architecture in PyTorch, we need to specify the size of CNN output layer to feed into the nn.Linear layer. How can we find the size of this layer in the def __init__ function (not in def forward())
class model(nn.Module):
def __init__(self,word_count,img_channel,n_out):
super(multimodal,self).__init__()
# CNN image encoding hyperparameters
conv1_channel_out = 8
conv1_kernel = 5
pool1_size = 2
conv2_channel_out = 16
conv2_kernel = 16
pool2_size = 2
conv3_channel_out = 32
conv3_kernel = 4
dropout_rate = 0.1
cnn_fc_out = 512
comb_fc1_out = 512
comb_fc2_out = 128
# FNN text encoding hyperparameters
text_fc1_out = 4096
text_fc2_out = 512
# Text encoding
self.text_fc1 = nn.Linear(word_count, text_fc1_out)
self.text_fc2 = nn.Linear(text_fc1_out, text_fc2_out)
# Image encoding
self.conv1 = nn.Conv2d(img_channel, conv1_channel_out, conv1_kernel)
self.max_pool1 = nn.MaxPool2d(pool1_size)
self.conv2 = nn.Conv2d(conv1_channel_out, conv2_channel_out, conv2_kernel)
self.max_pool2 = nn.MaxPool2d(pool2_size)
self.conv3 = nn.Conv2d(conv2_channel_out, conv3_channel_out, conv3_kernel)
self.cnn_dropout = nn.Dropout(dropout_rate)
self.cnn_fc = nn.Linear(32*24*12, cnn_fc_out)
#Concat layer
concat_feat = cnn_fc_out + text_fc2_out
self.combined_fc1 = nn.Linear(concat_feat, comb_fc1_out)
self.combined_fc2 = nn.Linear(comb_fc1_out, comb_fc2_out)
self.output_fc = nn.Linear(comb_fc2_out, n_out)
def forward(self, text, img):
# Image Encoding
x = F.relu(self.conv1(img))
x = self.max_pool1(x)
x = F.relu(self.conv2(x))
x = self.max_pool2(x)
x = F.relu(self.conv3(x))
x = x.view(-1, 32*24*12)
x = self.cnn_dropout(x)
img = F.relu(self.cnn_fc(x))
# Text Encoding
text = F.relu(self.text_fc1(text))
text = F.relu(self.text_fc2(text))
# Concat the features
concat_inp = torch.cat((text, img), 1)
out = F.relu(self.combined_fc1(concat_inp))
out = F.relu(self.combined_fc2(out))
return torch.sigmoid(self.output_fc(out))
If you see above, I define the size of CNN output layer as 322412 manually
self.cnn_fc = nn.Linear(32*24*12, cnn_fc_out)
How can I avoid this? I know we might be able to call [model_name].[layer_name].in_features in def forward(), but not in def __init__()
I dont think there is a specific way to do that. You would have to run a sample (you can just use x = torch.rand((1, C, W, H)) for testing) and then in forward print out the shape of the conv layer right before your linear layer, then you memorize that number and hardcode it into init.
Or you could use formulas to calculate the shape of a conv layer based on the dimensions of the input, kernel-size, padding, etc. Here is a thread about those formulas.
There is no general way to do this, since the input and output sizes are not fixed in a CNN. What you can output is the number of channels, but the module will accept and transform any image height X width dimensions (so long as hey are sufficiently large to produce results large enough for the next layer after unpadded convolutions and pooling etc).
Hence you cannot include this in init (naive to input, instantiation of object), only in forward (calculated upon seeing input).

Create an LSTM layer with Attention in Keras for multi-label text classification neural network

Greetings dear members of the community. I am creating a neural network to predict a multi-label y. Specifically, the neural network takes 5 inputs (list of actors, plot summary, movie features, movie reviews, title) and tries to predict the sequence of movie genres. In the neural network I use Embeddings Layer and Global Max Pooling layers.
However, I recently discovered the Recurrent Layers with Attention, which are a very interesting topic these days in machine learning translation. So, I wondered if I could use one of those layers but only the Plot Summary input. Note that I don't do ml translation but rather text classification.
My neural network in its current state
def create_fit_keras_model(hparams,
version_data_control,
optimizer_name,
validation_method,
callbacks,
optimizer_version = None):
sentenceLength_actors = X_train_seq_actors.shape[1]
vocab_size_frequent_words_actors = len(actors_tokenizer.word_index)
sentenceLength_plot = X_train_seq_plot.shape[1]
vocab_size_frequent_words_plot = len(plot_tokenizer.word_index)
sentenceLength_features = X_train_seq_features.shape[1]
vocab_size_frequent_words_features = len(features_tokenizer.word_index)
sentenceLength_reviews = X_train_seq_reviews.shape[1]
vocab_size_frequent_words_reviews = len(reviews_tokenizer.word_index)
sentenceLength_title = X_train_seq_title.shape[1]
vocab_size_frequent_words_title = len(title_tokenizer.word_index)
model = keras.Sequential(name='{0}_{1}dim_{2}batchsize_{3}lr_{4}decaymultiplier_{5}'.format(sequential_model_name,
str(hparams[HP_EMBEDDING_DIM]),
str(hparams[HP_HIDDEN_UNITS]),
str(hparams[HP_LEARNING_RATE]),
str(hparams[HP_DECAY_STEPS_MULTIPLIER]),
version_data_control))
actors = keras.Input(shape=(sentenceLength_actors,), name='actors_input')
plot = keras.Input(shape=(sentenceLength_plot,), batch_size=hparams[HP_HIDDEN_UNITS], name='plot_input')
features = keras.Input(shape=(sentenceLength_features,), name='features_input')
reviews = keras.Input(shape=(sentenceLength_reviews,), name='reviews_input')
title = keras.Input(shape=(sentenceLength_title,), name='title_input')
emb1 = layers.Embedding(input_dim = vocab_size_frequent_words_actors + 2,
output_dim = 16, #hparams[HP_EMBEDDING_DIM], hyperparametered or fixed sized.
embeddings_initializer = 'uniform',
mask_zero = True,
input_length = sentenceLength_actors,
name="actors_embedding_layer")(actors)
# encoded_layer1 = layers.GlobalAveragePooling1D(name="globalaveragepooling_actors_layer")(emb1)
encoded_layer1 = layers.GlobalMaxPooling1D(name="globalmaxpooling_actors_layer")(emb1)
emb2 = layers.Embedding(input_dim = vocab_size_frequent_words_plot + 2,
output_dim = hparams[HP_EMBEDDING_DIM],
embeddings_initializer = 'uniform',
mask_zero = True,
input_length = sentenceLength_plot,
name="plot_embedding_layer")(plot)
# (Option 1)
# encoded_layer2 = layers.GlobalMaxPooling1D(name="globalmaxpooling_plot_summary_Layer")(emb2)
# (Option 2)
emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
avg_pool = layers.GlobalAveragePooling1D()(emb2)
max_pool = layers.GlobalMaxPooling1D()(emb2)
conc = layers.concatenate([avg_pool, max_pool])
# (Option 3)
# emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
# emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
# emb2 = AttentionWithContext()(emb2)
emb3 = layers.Embedding(input_dim = vocab_size_frequent_words_features + 2,
output_dim = hparams[HP_EMBEDDING_DIM],
embeddings_initializer = 'uniform',
mask_zero = True,
input_length = sentenceLength_features,
name="features_embedding_layer")(features)
# encoded_layer3 = layers.GlobalAveragePooling1D(name="globalaveragepooling_movie_features_layer")(emb3)
encoded_layer3 = layers.GlobalMaxPooling1D(name="globalmaxpooling_movie_features_layer")(emb3)
emb4 = layers.Embedding(input_dim = vocab_size_frequent_words_reviews + 2,
output_dim = hparams[HP_EMBEDDING_DIM],
embeddings_initializer = 'uniform',
mask_zero = True,
input_length = sentenceLength_reviews,
name="reviews_embedding_layer")(reviews)
# encoded_layer4 = layers.GlobalAveragePooling1D(name="globalaveragepooling_user_reviews_layer")(emb4)
encoded_layer4 = layers.GlobalMaxPooling1D(name="globalmaxpooling_user_reviews_layer")(emb4)
emb5 = layers.Embedding(input_dim = vocab_size_frequent_words_title + 2,
output_dim = hparams[HP_EMBEDDING_DIM],
embeddings_initializer = 'uniform',
mask_zero = True,
input_length = sentenceLength_title,
name="title_embedding_layer")(title)
# encoded_layer5 = layers.GlobalAveragePooling1D(name="globalaveragepooling_movie_title_layer")(emb5)
encoded_layer5 = layers.GlobalMaxPooling1D(name="globalmaxpooling_movie_title_layer")(emb5)
merged = layers.concatenate([encoded_layer1, conc, encoded_layer3, encoded_layer4, encoded_layer5], axis=-1) #(Option 2)
# merged = layers.concatenate([encoded_layer1, emb2, encoded_layer3, encoded_layer4, encoded_layer5], axis=-1) #(Option 3)
dense_layer_1 = layers.Dense(hparams[HP_HIDDEN_UNITS],
kernel_regularizer=regularizers.l2(neural_network_parameters['l2_regularization']),
activation=neural_network_parameters['dense_activation'],
name="1st_dense_hidden_layer_concatenated_inputs")(merged)
layers.Dropout(neural_network_parameters['dropout_rate'])(dense_layer_1)
output_layer = layers.Dense(neural_network_parameters['number_target_variables'],
activation=neural_network_parameters['output_activation'],
name='output_layer')(dense_layer_1)
model = keras.Model(inputs=[actors, plot, features, reviews, title], outputs=output_layer, name='{0}_{1}dim_{2}batchsize_{3}lr_{4}decaymultiplier_{5}'.format(sequential_model_name,
str(hparams[HP_EMBEDDING_DIM]),
str(hparams[HP_HIDDEN_UNITS]),
str(hparams[HP_LEARNING_RATE]),
str(hparams[HP_DECAY_STEPS_MULTIPLIER]),
version_data_control))
print(model.summary())
# pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.0,
# final_sparsity=0.4,
# begin_step=600,
# end_step=1000)
# model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule)
if optimizer_name=="adam" and optimizer_version is None:
optimizer = optimizer_adam_v2(hparams)
elif optimizer_name=="sgd" and optimizer_version is None:
optimizer = optimizer_sgd_v1(hparams, "no decay")
elif optimizer_name=="rmsprop" and optimizer_version is None:
optimizer = optimizer_rmsprop_v1(hparams)
print("here: {0}".format(optimizer.lr))
lr_metric = [get_lr_metric(optimizer)]
if type(get_lr_metric(optimizer)) in (float, int):
print("Learning Rate's type is Float or Integer")
model.compile(optimizer=optimizer,
loss=neural_network_parameters['model_loss'],
metrics=neural_network_parameters['model_metric'] + lr_metric, )
else:
print("Learning Rate's type is not Float or Integer, but rather {0}".format(type(lr_metric)))
model.compile(optimizer=optimizer,
loss=neural_network_parameters['model_loss'],
metrics=neural_network_parameters['model_metric'], ) #+ lr_metric
You will see in the above structure that I have 5 input layers, 5 Embedding layers, then I apply a Bidirectional layer on LSTM only in the Plot Summary input.
However, with the current bidirectional approach on Plot summary, I got the following error. My problem is how I can utilize the attention in text classification and not solve the error below. So, don't comment solution on this error.
My question is about suggesting ways on how to create a recurrent layer with attention for the plot summary (input 2). Also, do not hesitate to write in comments any article that might help me on achieving this in Keras.
I remain at your disposal if any additional information is required regarding the structure of the neural network.
If you find the above neural network complicated I can make a simple version of it. However, the above is my original neural network, so I want any proposals do be based on that nn.
EDIT: 14.12.2020
Find here the colab notebook with the code I want to execute. The code has included two answers, one proposed in the comments (from an already answered question, and the other written as an official answer to my question.
The first approach proposed by #MarcoCerliani works. Although, I would like also the second approach to work. The approach of #Allohvk (both approaches are implemented in the Runtime cell [21] of the attached colab). The latter does not work at the moment. The latest error I get is:
ValueError: Input 0 of layer globalmaxpooling_plot_summary_Layer is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 100]
I solved the latest error of my edit by removing the globalmaxpooling_plot_summary_Layer from my neural's network structure.
Let me summarize the intent. You want to add attention to your code. Yours is a sequence classification task and not a seq-seq translator. You dont really care much about the way it is done, so you are ok with not debugging the error above, but just need a working piece of code. Our main input here is the movie reviews consisting of 'n' words for which you want to add attention.
Assume you embed the reviews and pass it to an LSTM layer. Now you want to 'attend' to all the hidden states of the LSTM layer and then generate a classification (instead of just using the last hidden state of the encoder). So an attention layer needs to be inserted. A barebones implementation would look like this:
def __init__(self):
##Nothing special to be done here
super(peel_the_layer, self).__init__()
def build(self, input_shape):
##Define the shape of the weights and bias in this layer
##This is a 1 unit layer.
units=1
##last index of the input_shape is the number of dimensions of the prev
##RNN layer. last but 1 index is the num of timesteps
self.w=self.add_weight(name="att_weights", shape=(input_shape[-1], units), initializer="normal") #name property is useful for avoiding RuntimeError: Unable to create link.
self.b=self.add_weight(name="att_bias", shape=(input_shape[-2], units), initializer="zeros")
super(peel_the_layer,self).build(input_shape)
def call(self, x):
##x is the input tensor..each word that needs to be attended to
##Below is the main processing done during training
##K is the Keras Backend import
e = K.tanh(K.dot(x,self.w)+self.b)
a = K.softmax(e, axis=1)
output = x*a
##return the ouputs. 'a' is the set of attention weights
##the second variable is the 'attention adjusted o/p state' or context
return a, K.sum(output, axis=1)
Now call the above Attention layer after your LSTM and before your Dense output layer.
a, context = peel_the_layer()(lstm_out)
##context is the o/p which be the input to your classification layer
##a is the set of attention weights and you may want to route them to a display
You can build on top of this as you seem to want to use other features apart for the movie reviews to come up with the final sentiment. Attention largely applies to reviews..and benefits are to be seen if the sentences are very long.
For more specific details, please refer https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e

How to feed multiple images at once in VGG16-CNN?

I am struggling to implement a VGG-16 based feature extractor which accepts two inputs, i.e. the first input is the whole image, and the second input are the patch-wise images (N-local region sub-images). At first i am defining two models, global ones, which operates on the whole image, and local one, which operates on the local image regions. The idea is to add some kind of channel-wise pooling in the local model, where out of N local patches only one will be the resulting one, and after that the features from the global and the resulting local need to be concatenated.
Can you help me implement this kind of vgg-16 based feature extractor?
The idea behind this methodology is shown on the picture.
VGG-16 fusion scheme
The code goes like this:
def ChannelPool(x):
return K.max(x, axis=0, keepdims = True)
def ConcatLayer(x):
tensor_1 = x[0]
tensor2 = x[1]
return K.concatenate([tensor_1, tensor2], axis = 1)
N_patches = 9 # local image regions
input_shape_global = Input(shape=(224, 224, 3))
input_shape_local = Input(shape=(N_patches, 50, 50, 3)) # i struggle with this part
Global_Model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224,224,3), pooling='avg')
Local_Model = VGG16(include_top=False, weights='imagenet', input_tensor=input_shape_local[0], input_shape=(50,50,3), pooling='avg')
# Change layer names to avoid confusion
for layer_l in Global_Model.layers:
layer_l.name = layer_l.name + str("_1")
for layer_g in Local_Model.layers:
layer_g.name = layer_g.name + str("_2")
inp1 = Global_Model.input
out1 = Global_Model.output
inp2 = Local_Model.input
out2 = Local_Model.output
image_features = Global_Model(inp1)
patch_features = Local_Model(inp2)
patch_feature = Lambda(ChannelPool, name="Channel_Pool_Layer")(patch_features)
image_features = K.reshape(image_features, (1,512))
patch_feature = K.reshape(patch_feature, (1,512))
merged = Concatenate(axis = 1)([image_features, patch_feature])
merged = Dense(total_features ,activation='softmax', name='Fc1')(merged)
merged = Dense(total_features , activation='relu', name='Fc2')(merged)
final_model = Model(inputs = [inp1,inp2], outputs = merged)
Can you help me solve this kind of problem, for now the issue is this:
**node = layer._inbound_nodes[node_index]**
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
Thank you in advance.

Iterate over a tensor dimension in Tensorflow

I am trying to develop a seq2seq model from a low level perspective (creating by myself all the tensors needed). I am trying to feed the model with a sequence of vectors as a two-dimensional tensor, however, i can't iterate over one dimension of the tensor to extract vector by vector. Does anyone know what could I do to feed a batch of vectors and later get them one by one?
This is my code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
gru_layer1 = GRU(input_array, input_dim, hidden_dim) #This is a class created by myself
for i in range(input_array.shape[-1]):
word = input_array[:,i]
previous_state = gru_encoder.h_t
gru_layer1.forward_pass(previous_state,word)
And this is the error that I get
TypeError: Expected binary or unicode string, got <tf.Tensor 'input_7:0' shape=(10, ?) dtype=float64>
Tensorflow does deferred execution.
You usually can't know how big the vector will be (words in a sentance, audio samples, etc...). The common thing to do is to cap it at some reasonably large value and then pad the shorter sequences with an empty token.
Once you do this you can select the data for a time slice with the slice operator:
data = tf.placeholder(shape=(batch_size, max_size, numer_of_inputs))
....
for i in range(max_size):
time_data = data[:, i, :]
DoStuff(time_data)
Also lookup tf.transpose for swapping batch and time indices. It can help with performance in certain cases.
Alternatively consider something like tf.nn.static_rnn or tf.nn.dynamic_rnn to do the boilerplate stuff for you.
Finally I found an approach that solves my problem. It worked using tf.scan() instead of a loop, which doesn't require the input tensor to have a defined number in the second dimension. Consecuently you hace to prepare the input tensor previously to be parsed as you want throught tf.san(). In my case this is the code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
x_t = tf.transpose(input_array, [1, 0], name='x_t')
h_0 = tf.convert_to_tensor(h_0, dtype=tf.float64)
h_t_transposed = tf.scan(forward_pass, x_t, h_0, name='h_t_transposed')
h_t = tf.transpose(h_t_transposed, [1, 0], name='h_t')

Categories