How to feed multiple images at once in VGG16-CNN?

How to feed multiple images at once in VGG16-CNN? - python

I am struggling to implement a VGG-16 based feature extractor which accepts two inputs, i.e. the first input is the whole image, and the second input are the patch-wise images (N-local region sub-images). At first i am defining two models, global ones, which operates on the whole image, and local one, which operates on the local image regions. The idea is to add some kind of channel-wise pooling in the local model, where out of N local patches only one will be the resulting one, and after that the features from the global and the resulting local need to be concatenated.
Can you help me implement this kind of vgg-16 based feature extractor?
The idea behind this methodology is shown on the picture.
VGG-16 fusion scheme
The code goes like this:
def ChannelPool(x):
return K.max(x, axis=0, keepdims = True)
def ConcatLayer(x):
tensor_1 = x[0]
tensor2 = x[1]
return K.concatenate([tensor_1, tensor2], axis = 1)
N_patches = 9 # local image regions
input_shape_global = Input(shape=(224, 224, 3))
input_shape_local = Input(shape=(N_patches, 50, 50, 3)) # i struggle with this part
Global_Model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224,224,3), pooling='avg')
Local_Model = VGG16(include_top=False, weights='imagenet', input_tensor=input_shape_local[0], input_shape=(50,50,3), pooling='avg')
# Change layer names to avoid confusion
for layer_l in Global_Model.layers:
layer_l.name = layer_l.name + str("_1")
for layer_g in Local_Model.layers:
layer_g.name = layer_g.name + str("_2")
inp1 = Global_Model.input
out1 = Global_Model.output
inp2 = Local_Model.input
out2 = Local_Model.output
image_features = Global_Model(inp1)
patch_features = Local_Model(inp2)
patch_feature = Lambda(ChannelPool, name="Channel_Pool_Layer")(patch_features)
image_features = K.reshape(image_features, (1,512))
patch_feature = K.reshape(patch_feature, (1,512))
merged = Concatenate(axis = 1)([image_features, patch_feature])
merged = Dense(total_features ,activation='softmax', name='Fc1')(merged)
merged = Dense(total_features , activation='relu', name='Fc2')(merged)
final_model = Model(inputs = [inp1,inp2], outputs = merged)
Can you help me solve this kind of problem, for now the issue is this:
**node = layer._inbound_nodes[node_index]**
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
Thank you in advance.

Related

Feature importance in neural networks with multiple differently shaped inputs in pytorch and captum (classification)

I have developed a model with three inputs types. Image, categorical data and numerical data. For Image data I've used ResNet50 for the other two I develop my own network.
class MulticlassClassification(nn.Module):
def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
super(MulticlassClassification, self).__init__()
# IMAGE: ResNet
self.cnn = models.resnet50(pretrained = True)
for param in self.cnn.parameters():
param.requires_grad = False
n_inputs = self.cnn.fc.in_features
self.cnn.fc = nn.Sequential(
nn.Linear(n_inputs, 250),
nn.ReLU(),
nn.Dropout(p),
nn.Linear(250, output_size),
nn.LogSoftmax(dim=1)
)
# TABULAR
self.all_embeddings = nn.ModuleList(
[nn.Embedding(categories, size) for categories, size in cat_size]
)
self.embedding_dropout = nn.Dropout(p)
self.batch_norm_num = nn.BatchNorm1d(num_col)
all_layers = []
num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
input_size = num_cat_col + num_col
for i in layers:
all_layers.append(nn.Linear(input_size, i))
all_layers.append(nn.ReLU(inplace=True))
all_layers.append(nn.BatchNorm1d(i))
all_layers.append(nn.Dropout(p))
input_size = i
all_layers.append(nn.Linear(layers[-1], output_size))
self.layers = nn.Sequential(*all_layers)
#combine
self.combine_fc = nn.Linear(output_size * 2, output_size)
def forward(self, image, x_categorical, x_numerical):
embeddings = []
for i, embedding in enumerate(self.all_embeddings):
print(x_categorical[:,i])
embeddings.append(embedding(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)
x_numerical = self.batch_norm_num(x_numerical)
x = torch.cat([x, x_numerical], 1)
x = self.layers(x)
# img
x2 = self.cnn(image)
# combine
x3 = torch.cat([x, x2], 1)
x3 = F.relu(self.combine_fc(x3))
return x
Now after successful training I would like to calculate integrated gradients by using the captum library.
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())
And here I got an error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I figured out that captum injects a wrongly shaped tensor into my x_categorical input (with the print in my forward method). It seems like captum only sees the first input tensor and uses it's shape for all other inputs. How can I change this behaviour?
I've found the similar issue here (https://github.com/pytorch/captum/issues/439). It was recommended to use Interpretable Embedding for categorical data. When I used it I got this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I would be very grateful for any tips and advises how to combine all three inputs and to solve my problem.

Get the CNN layer output size in def init PyTorch

When defining our model architecture in PyTorch, we need to specify the size of CNN output layer to feed into the nn.Linear layer. How can we find the size of this layer in the def __init__ function (not in def forward())
class model(nn.Module):
def __init__(self,word_count,img_channel,n_out):
super(multimodal,self).__init__()
# CNN image encoding hyperparameters
conv1_channel_out = 8
conv1_kernel = 5
pool1_size = 2
conv2_channel_out = 16
conv2_kernel = 16
pool2_size = 2
conv3_channel_out = 32
conv3_kernel = 4
dropout_rate = 0.1
cnn_fc_out = 512
comb_fc1_out = 512
comb_fc2_out = 128
# FNN text encoding hyperparameters
text_fc1_out = 4096
text_fc2_out = 512
# Text encoding
self.text_fc1 = nn.Linear(word_count, text_fc1_out)
self.text_fc2 = nn.Linear(text_fc1_out, text_fc2_out)
# Image encoding
self.conv1 = nn.Conv2d(img_channel, conv1_channel_out, conv1_kernel)
self.max_pool1 = nn.MaxPool2d(pool1_size)
self.conv2 = nn.Conv2d(conv1_channel_out, conv2_channel_out, conv2_kernel)
self.max_pool2 = nn.MaxPool2d(pool2_size)
self.conv3 = nn.Conv2d(conv2_channel_out, conv3_channel_out, conv3_kernel)
self.cnn_dropout = nn.Dropout(dropout_rate)
self.cnn_fc = nn.Linear(32*24*12, cnn_fc_out)
#Concat layer
concat_feat = cnn_fc_out + text_fc2_out
self.combined_fc1 = nn.Linear(concat_feat, comb_fc1_out)
self.combined_fc2 = nn.Linear(comb_fc1_out, comb_fc2_out)
self.output_fc = nn.Linear(comb_fc2_out, n_out)
def forward(self, text, img):
# Image Encoding
x = F.relu(self.conv1(img))
x = self.max_pool1(x)
x = F.relu(self.conv2(x))
x = self.max_pool2(x)
x = F.relu(self.conv3(x))
x = x.view(-1, 32*24*12)
x = self.cnn_dropout(x)
img = F.relu(self.cnn_fc(x))
# Text Encoding
text = F.relu(self.text_fc1(text))
text = F.relu(self.text_fc2(text))
# Concat the features
concat_inp = torch.cat((text, img), 1)
out = F.relu(self.combined_fc1(concat_inp))
out = F.relu(self.combined_fc2(out))
return torch.sigmoid(self.output_fc(out))
If you see above, I define the size of CNN output layer as 322412 manually
self.cnn_fc = nn.Linear(32*24*12, cnn_fc_out)
How can I avoid this? I know we might be able to call [model_name].[layer_name].in_features in def forward(), but not in def __init__()

I dont think there is a specific way to do that. You would have to run a sample (you can just use x = torch.rand((1, C, W, H)) for testing) and then in forward print out the shape of the conv layer right before your linear layer, then you memorize that number and hardcode it into init.
Or you could use formulas to calculate the shape of a conv layer based on the dimensions of the input, kernel-size, padding, etc. Here is a thread about those formulas.

There is no general way to do this, since the input and output sizes are not fixed in a CNN. What you can output is the number of channels, but the module will accept and transform any image height X width dimensions (so long as hey are sufficiently large to produce results large enough for the next layer after unpadded convolutions and pooling etc).
Hence you cannot include this in init (naive to input, instantiation of object), only in forward (calculated upon seeing input).

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

I get an error using gradient visualization with transfer learning in TF 2.0. The gradient visualization works on a model that does not use transfer learning.
When I run my code I get the error:
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("block5_conv3/Identity:0", shape=(None, 14, 14, 512), dtype=float32)
When I run the code below it errors. I think there's an issue with the naming conventions or connecting inputs and outputs from the base model, vgg16, to the layers I'm adding. Really appreciate your help!
"""
Broken example when grad_model is created.
"""
!pip uninstall tensorflow
!pip install tensorflow==2.0.0
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
IMAGE_PATH = '/content/cat.3.jpg'
LAYER_NAME = 'block5_conv3'
model_layer = 'vgg16'
CAT_CLASS_INDEX = 281
imsize = (224,224,3)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
plt.figure()
plt.imshow(img)
img = tf.io.read_file(IMAGE_PATH)
img = tf.image.decode_jpeg(img)
img = tf.cast(img, dtype=tf.float32)
# img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img, (224,224))
img = tf.reshape(img, (1, 224,224,3))
input = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
input_shape=(imsize[0], imsize[1], imsize[2]))
# base_model.trainable = False
flat = layers.Flatten()
dropped = layers.Dropout(0.5)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
fc1 = layers.Dense(16, activation='relu', name='dense_1')
fc2 = layers.Dense(16, activation='relu', name='dense_2')
fc3 = layers.Dense(128, activation='relu', name='dense_3')
prediction = layers.Dense(2, activation='softmax', name='output')
for layr in base_model.layers:
if ('block5' in layr.name):
layr.trainable = True
else:
layr.trainable = False
x = base_model(input)
x = global_average_layer(x)
x = fc1(x)
x = fc2(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = input, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
This portion of the code is where the error lies. I'm not sure what is the correct way to label inputs and outputs.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input, model.get_layer(model_layer).input],
outputs=[model.get_layer(model_layer).get_layer(LAYER_NAME).output,
model.output])
print(model.get_layer(model_layer).get_layer(LAYER_NAME).output)
# Get the score for target class
# Get the score for target class
with tf.GradientTape() as tape:
conv_outputs, predictions = grad_model(img)
loss = predictions[:, 1]
The section below is for plotting a heatmap of gradcam.
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
# Apply guided backpropagation
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = gate_f * gate_r * grads
# Average gradients spatially
weights = tf.reduce_mean(guided_grads, axis=(0, 1))
# Build a ponderated map of filters according to gradients importance
cam = np.ones(output.shape[0:2], dtype=np.float32)
for index, w in enumerate(weights):
cam += w * output[:, :, index]
# Heatmap visualization
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
plt.figure()
plt.imshow(output_image)
plt.show()
I also asked this to the tensorflow team on github at https://github.com/tensorflow/tensorflow/issues/37680.

I figured it out. If you set up the model extending the vgg16 base model with your own layers, rather than inserting the base model into a new model like a layer, then it works.
First set up the model and be sure to declare the input_tensor.
inp = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_tensor=inp,
input_shape=(imsize[0], imsize[1], imsize[2]))
This way we don't have to include a line like x=base_model(inp) to show what input we want to put in. That's already included in tf.keras.applications.VGG16(...).
Instead of putting this vgg16 base model inside another model, it's easier to do gradcam by adding layers to the base model itself. I grab the output of the last layer of VGG16 (with the top removed), which is the pooling layer.
block5_pool = base_model.get_layer('block5_pool')
x = global_average_layer(block5_pool.output)
x = fc1(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = inp, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
Now, I grab the layer for visualization, LAYER_NAME='block5_conv3'.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input],
outputs=[model.output, model.get_layer(LAYER_NAME).output])
print(model.get_layer(LAYER_NAME).output)
# Get the score for target class
# Get the score for target class
with tf.GradientTape() as tape:
predictions, conv_outputs = grad_model(img)
loss = predictions[:, 1]
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]

We (I plus a number of team members developing a project) found a similar problem with a code implementing Grad-CAM that we found in a tutorial.
That code didn't work with a model consisting of the base model of VGG19 plus a few extra layers added on top of it. The problem was that the VGG19 base model was inserted as a "layer" inside our model, and apparently the GradCAM code didn't know how to deal with it - we were getting a "Graph disconnected..." error. Then after some debugging (carried out by another team member, not me) we managed to modify the original code to make it work for this kind of model that contains another model inside it. The idea is to add the inner model as an extra argument of the class GradCAM. Since this may be helpful to others I am including the modified code below (we also renamed the GradCAM class as My_GradCAM).
class My_GradCAM:
def __init__(self, model, classIdx, inner_model=None, layerName=None):
self.model = model
self.classIdx = classIdx
self.inner_model = inner_model
if self.inner_model == None:
self.inner_model = model
self.layerName = layerName
[...]
gradModel = tensorflow.keras.models.Model(inputs=[self.inner_model.inputs],
outputs=[self.inner_model.get_layer(self.layerName).output,
self.inner_model.output])
Then the class can be instantiated by adding the inner model as the extra argument, e.g.:
cam = My_GradCAM(model, None, inner_model=model.get_layer("vgg19"), layerName="block5_pool")
I hope this helps.
Edit: Credit to Mirtha Lucas for doing the debugging and finding the solution.

After a lot of struggle, I condense the way to draw the heat map when you are using transfer learning. Here is the keras official tutorial
The issue I encounter is that when I'm trying to draw the heat map
from my model, the densenet can be only seen as functional layer in my
model. So the make_gradcam_heatmap can not figure out the layer that
inside functional layer. As the 5th layer shows.
Therefore, to simulate the Keras official document, I need to only use the densenet as the model for visualization. Here is the step
Only Take out the model from your model
dense_model = dense_model.get_layer('densenet121')
Copy the weight from dense model to your new initiated model
inputs = tf.keras.Input(shape=(224, 224, 3))
model = model_builder(weights="imagenet", include_top=True, input_tensor=inputs)
for layer, dense_layer in zip(model.layers[1:], dense_model.layers[1:]):
layer.set_weights(dense_layer.get_weights())
relu = model.get_layer('relu')
x = tf.keras.layers.GlobalAveragePooling2D()(relu.output)
outputs = tf.keras.layers.Dense(5)(x)
model = tf.keras.models.Model(inputs = inputs, outputs = outputs)
Draw the heat map
preprocess_input = keras.applications.densenet.preprocess_input
img_array = preprocess_input(get_img_array(img_path, size=(224, 224)))
heatmap = make_gradcam_heatmap(img_array, model, 'bn')
plt.matshow(heatmap)
plt.show()
get_img_array, make_gradcam_heatmap and save_and_display_gradcam are kept in still. Follow the keras tutorial then you are good to go.

Solved: How to combine tf.gradients with tf.data.dataset and keras models

I'm trying to build a workflow that uses tf.data.dataset batches and an iterator. For performance reasons, I am really trying to avoid using the placeholder->feed_dict loop workflow.
The process I'm trying to implement involves grad-cam (which requires the gradient of the loss with respect to the final convolutional layer of a CNN) as an intermediate step, and ideally I'd like to be able to try it out on several Keras pre-trained models, including non-sequential ones like ResNet.
Most implementations of grad-cam that I've found rely on hand-crafting the CNN of interest in tensorflow. I found one implementation, https://github.com/jacobgil/keras-grad-cam, that is made for keras models, and following that example, I get
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
The problem is that gradient_function([batch]) returns a numpy array whose value is determined by the first batch, so that heatmap doesn't change with subsequent evaluations.
I've tried replacing K.function with a Model in various ways, but nothing seems to work. I usually end up either with an error suggesting that grads evaluates to None or that one model or another is expecting a feed_dict and not receiving one.
Is this code salvageable? Is there a better way to do this besides looping through the data several times (once to get all the grad-cams and then again once I have them) or using placeholders and feed_dicts?
Edit:
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
# other operations on heatmap and batch ...
# ...
output_function = K.function(model.input, [node1, ..., nodeN])
for batch in range(n_batches):
outputs1, ... , outputsN = output_function(batch)
Gives me the desired outputs for each batch.

Yes, K.function returns numpy arrays because it evaluates the symbolic computation in your graph. What I think you should do is to keep everything symbolic up to K.function, and after getting the gradients, perform all computations of the Grad-CAM weights and final saliency map using numpy.
Then you can iterate on your dataset, evaluate gradient_function on a new batch of data, and compute the saliency map.
If you want to keep everything symbolic, then you should not use K.function to produce the gradient function, but use the symbolic gradient (the output of K.gradient, without lambda) and convolutional feature maps (conv_output) and perform the saliency map computation on top of that, and then build a function (using K.function) that takes the model input, and outputs the saliency map.
Hope the explanation is enough.

Tensorflow, I got a Shape mismatch in execution time

Good Afternoon Everyone,
I am currently having some trouble with tensorflow, since for some reason I get a Shape error after about 3 and a half hours running. The files are loaded using the tensorflow pipeline, and creating two reinitializable datasets for training and test. I know the data has the correct shape because I do a hardcoded reshape to the expected shape and I've never got an error there. The problem is, when running the network at some point there is a sample that do not have the correct amount of number in the flatten operation. And the program crashes, but there is no other explanation other than the number of elements in the tensor is not divisible by 10 (my batch size). Which honestly makes no sense to me since the data has gone exactly through the same pipeline as the other batches that run with no problem.
I can provide code if needed, but I think is more a failure to understand some concept from the framework.
Thanks in advance for all the help.
EDIT: Please, find the code here, a bit of nomemclature t corresponds to a layer that has time data (X), f corresponds to a layer that has frequency data (FREQ), q corresponds to a layer that contains cepstral data (QUEF) and tf corresponds to layers that contain 2-D data, spectrograms of X (SPECG), Y is the label. All data are tf.float32 except for the labels which are tf.int64
EDIT 2: The operation that gives problems is the flatten on qsubnet_out
EDIT 3: Probably the most important, it seems than some of the layers converge to NaNs
Training loop:
for i in range(FLAGS.max_steps):
start = time.time()
sess.run([train],feed_dict={handle:train_handle})
if i%10 == False:
summary_op,entropy,acc,expected,output = sess.run([merged,loss,accuracy,Y,tf.argmax(logit,1)],feed_dict={handle:train_handle})
summary_op,_,_ = sess.run([merged,loss,accuracy],feed_dict={handle:test_handle})
Training operations:
W = { 'tc1': [64,3], 'tc2':[128,3], 'tc3':[256,5], 'tc4': [128, 2],
'fc1': [64,3], 'fc2':[128,3], 'fc3':[256,5], 'fc4': [128, 2],
'qc1': [64,3], 'qc2':[128,3], 'qc3':[256,5], 'qc4': [128, 2],
'tfc1': [64,(3,3)], 'tfc2':[128,(3,3)], 'tfc3':[256,(5,5)], 'tfc4': [128, (2,2)],
'dense1': 1000, 'dense2': 100, 'dense3': 200,'dense4': 300, 'dense5': 200,
'out' : NUM_CLASSES
}
iter = tf.data.Iterator.from_string_handle(handle, train_dataset.output_types, train_dataset.output_shapes)
X,FREQ,QUEF,SPECG,Y = iter.get_next()
X.set_shape([FLAGS.batch_size,768,14])
FREQ.set_shape([FLAGS.batch_size,384,14])
QUEF.set_shape([FLAGS.batch_size,384,14])
SPECG.set_shape([FLAGS.batch_size,65,18,14])
logit = net.run(X,FREQ,QUEF,SPECG,W)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=logit))
And the the file net.py:
def run(X,FREQ,QUEF,SPECG,W):
time = tf.layers.batch_normalization(X,axis=-1,training=True,trainable=True)
freq = tf.layers.batch_normalization(FREQ,axis=-1,training=True,trainable=True)
quef = tf.layers.batch_normalization(QUEF,axis=-1,training=True,trainable=True)
time_freq = tf.layers.batch_normalization(SPECG,axis=-1,training=True,trainable=True)
regularizer = tf.contrib.layers.l2_regularizer(0.1);
#########################################################################################################
#### TIME SUBNET
with tf.device('/GPU:1'):
tc1 = tf.layers.conv1d(inputs=time,filters=W['tc1'][0],kernel_size=W['tc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc1')
trelu1 = tf.nn.relu(features=tc1,name='trelu1')
tpool1 = tf.layers.max_pooling1d(trelu1,pool_size=2,strides=1)
tc2 = tf.layers.conv1d(inputs=tpool1,filters=W['tc2'][0],kernel_size=W['tc2'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc2')
tc3 = tf.layers.conv1d(inputs=tc2,filters=W['tc3'][0],kernel_size=W['tc3'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc3')
trelu2 = tf.nn.relu(tc3,name='trelu2')
tpool2 = tf.layers.max_pooling1d(trelu2,pool_size=2,strides=1)
tc4 = tf.layers.conv1d(inputs=tpool2,filters=W['tc4'][0],kernel_size=W['tc4'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc4')
tsubnet_out = tf.nn.relu6(tc4,'trelu61')
#########################################################################################################
#### CEPSTRUM SUBNET (QUEFRENCIAL)
qc1 = tf.layers.conv1d(inputs=quef,filters=W['qc1'][0],kernel_size=W['qc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc1')
qrelu1 = tf.nn.relu(features=qc1,name='qrelu1')
qpool1 = tf.layers.max_pooling1d(qrelu1,pool_size=2,strides=1)
qc2 = tf.layers.conv1d(inputs=qpool1,filters=W['qc2'][0],kernel_size=W['qc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc2')
qc3 = tf.layers.conv1d(inputs=qc2,filters=W['qc3'][0],kernel_size=W['qc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc3')
qrelu2 = tf.nn.relu(qc3,name='qrelu2')
qpool2 = tf.layers.max_pooling1d(qrelu2,pool_size=2,strides=1)
qc4 = tf.layers.conv1d(inputs=qpool2,filters=W['qc4'][0],kernel_size=W['qc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc4')
qsubnet_out = tf.nn.relu6(qc4,'qrelu61')
#########################################################################################################
#FREQ SUBNET
with tf.device('/GPU:1'):
fc1 = tf.layers.conv1d(inputs=freq,filters=W['fc1'][0],kernel_size=W['fc1'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc1')
frelu1 = tf.nn.relu(features=fc1,name='trelu1')
fpool1 = tf.layers.max_pooling1d(frelu1,pool_size=2,strides=1)
fc2 = tf.layers.conv1d(inputs=fpool1,filters=W['fc2'][0],kernel_size=W['fc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc2')
fc3 = tf.layers.conv1d(inputs=fc2,filters=W['fc3'][0],kernel_size=W['fc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc3')
frelu2 = tf.nn.relu(fc3,name='frelu2')
fpool2 = tf.layers.max_pooling1d(frelu2,pool_size=2,strides=1)
fc4 = tf.layers.conv1d(inputs=fpool2,filters=W['fc4'][0],kernel_size=W['fc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc4')
fsubnet_out = tf.nn.relu6(fc4,'frelu61')
########################################################################################################
## TIME/FREQ SUBNET
with tf.device('/GPU:0'):
tfc1 = tf.layers.conv2d(inputs=time_freq,filters=W['tfc1'][0],kernel_size=W['tfc1'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc1')
tfrelu1 = tf.nn.relu(tfc1)
tfpool1 = tf.layers.max_pooling2d(tfrelu1,pool_size=[2, 2],strides=[1, 1])
tfc2 = tf.layers.conv2d(inputs=tfpool1,filters=W['tfc2'][0],kernel_size=W['tfc2'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc2')
tfc3 = tf.layers.conv2d(inputs=tfc2,filters=W['tfc3'][0],kernel_size=W['tfc3'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc3')
tfrelu2 = tf.nn.relu(tfc3)
tfpool2 = tf.layers.max_pooling2d(tfrelu2,pool_size=[2, 2], strides=[1, 1])
tfc4 = tf.layers.conv2d(inputs=tfpool2,filters=W['tfc4'][0],kernel_size=W['tfc4'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc4')
tfsubnet_out = tf.nn.relu6(tfc4,'tfrelu61')
########################################################################################################
##Flatten subnet outputs
tsubnet_out = tf.layers.flatten(tsubnet_out)
fsubnet_out = tf.layers.flatten(fsubnet_out)
tfsubnet_out = tf.layers.flatten(tfsubnet_out)
qsubnet_out = tf.layers.flatten(qsubnet_out)
#Final subnet computation
input_final = tf.concat((tsubnet_out,fsubnet_out,qsubnet_out,tfsubnet_out),1)
dense1 = tf.layers.dense(input_final,W['dense1'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense1')
dense2 = tf.layers.dense(dense1,W['dense2'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense2')
dense3 = tf.layers.dense(dense2,W['dense3'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense3')
dense4 = tf.layers.dense(dense3,W['dense4'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense4')
dense5 = tf.layers.dense(dense4,W['dense5'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense5')
out = tf.layers.dense(dense5,W['out'],tf.nn.relu, name='out')
return out

Finally after some days, I've been able to track the problem. Which was not related to the code, I submitted, in the end. But it was related to the creation of the Tensorflow Dataset. Since in the batchin, if the length of the Dataset was not divisible by the batch size. The flag drop_remainder to True.
I will not delete the question since I believe is a problem that more people may have in the future and the source is not easily identificable.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to feed multiple images at once in VGG16-CNN? - python

Related

Feature importance in neural networks with multiple differently shaped inputs in pytorch and captum (classification)

Get the CNN layer output size in def init PyTorch

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

Solved: How to combine tf.gradients with tf.data.dataset and keras models

Tensorflow, I got a Shape mismatch in execution time

Categories

Resources