My neural network receives a (1000, 1000, 5) array (5 stacked raster images), which goes through convolutions in one branch, and a (12,) array (just 12 numbers), which goes through a couple of dense layers in a second branch.
The outputs are concatenated into a (31, 31, 65) tensor, which is then deconvolved into a final (1000, 1000) array.
My Issue:
I made my own simple loss function (mean error) because the output represents the temperature over an area.
My issue is that the loss drops significantly over 200 epochs (both loss and val_loss, from a small decimal down to about -3), while the accuracy hovers around 0.002 the entire time.
I have lowered the learning rate to as little as 1e-5. I have given more samples to the training set (there aren't many samples to begin with, unfortunately), and have both increased (for fear of overfitting) and decreased (for lack of data) the batch size. All the input data is normalized to [0, 1], which makes any loss beyond -1 unreasonable.
I am not sure whether I should use a different optimizer for this task, a different activation, or just remove a layer or two. But mostly I'd love to understand what is happening that makes the model so unreliable.
I really tried to refrain from posting the entire thing on here, but I am officially out of ideas.
MLP Branch
dim = 12
inputs = Input(shape = (dim, ))
x = inputs
x = Dense(dim * 4, activation = 'relu')(x)
x = Dense(dim * 16, activation = 'relu')(x)
x = Dense(961, activation = 'relu')(x) # 961 nodes
x = Reshape((31, 31, 1))(x) # (31, 31, 1) array
model1 = Model(inputs, x)
Convolutional Branch
inputShape = (1000, 1000, 5)
chanDim = -1
inputs = Input(shape = inputShape)
x = inputs
# layer 1: conv, f = 8, pool = 2
x = Conv2D(8, (3, 3), padding = 'same', activation = 'relu')(x)
x = BatchNormalization(axis = chanDim)(x)
x = MaxPooling2D(pool_size = (2, 2))(x)
# layer 2: conv, f = 16, pool = 2
x = Conv2D(16, (3, 3), padding = 'same', activation = 'relu')(x)
x = BatchNormalization(axis = chanDim)(x)
x = MaxPooling2D(pool_size = (2, 2))(x)
# layer 3: conv, f = 32, pool = 2
x = Conv2D(32, (3, 3), padding = 'same', activation = 'relu')(x)
x = BatchNormalization(axis = chanDim)(x)
x = MaxPooling2D(pool_size = (2, 2))(x)
# layer 4: conv, f = 64, pool = 4
x = Conv2D(64, (3, 3), padding = 'same', activation = 'relu')(x)
x = BatchNormalization(axis = chanDim)(x)
x = MaxPooling2D(pool_size = (4, 4))(x)
model2 = Model(inputs, x)
Deconvolution
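# note: spatial_2d_padding and squeeze below are assumed to be keras.backend functions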
combinedInput = Concatenate()([model1.output, model2.output])
x = combinedInput # (31, 31, 65)
x = Conv2DTranspose(43, (3, 3), strides = (4, 4), padding = 'same', activation = 'relu')(x) # (124, 124, 43)
x = Conv2DTranspose(22, (3, 3), strides = (2, 2), padding = 'same', activation = 'relu')(x) # (248, 248, 22)
x = Lambda(lambda y: spatial_2d_padding(y))(x) # (250, 250, 22)
x = Conv2DTranspose(10, (3, 3), strides = (2, 2), padding = 'same', activation = 'relu')(x) # (500, 500, 10)
x = Conv2DTranspose(1, (3, 3), strides = (2, 2), padding = 'same', activation = 'linear')(x) # (1000, 1000, 1)
x = Lambda(lambda y: squeeze(y, axis = 3))(x) # (1000, 1000)
Compiling
def custom_loss(y_actual, y_predicted):
    custom_loss_value = mean(y_actual - y_predicted)
    return custom_loss_value

model = Model(inputs = [model1.input, model2.input], outputs = x)
model.compile(loss = custom_loss, optimizer = Adam(lr = 0.000001), metrics = ['mae'])
# train with epochs = 200, batch_size = 12
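Worth noting about this loss: a signed mean error is unbounded below, so the optimizer can keep lowering it simply by over-predicting everywhere, which would explain a loss drifting to -3 even with targets in [0, 1]. For comparison, a bounded-below variant (a minimal sketch, assuming from keras import backend as K; not the original code):

from keras import backend as K

def signed_mean_error(y_actual, y_predicted):
    # Unbounded below: predicting higher everywhere always lowers this.
    return K.mean(y_actual - y_predicted)

def mean_absolute_error(y_actual, y_predicted):
    # Bounded below by zero: minimal only when predictions match the targets.
    return K.mean(K.abs(y_actual - y_predicted))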
The Issue
As I explained above, my loss never stabilizes and the accuracy hovers roughly around the same number over the epochs.
I'd love to know possible reasons and possible solutions.
Edits:
Since writing this question I have attempted:
Transferring layers from the convolutional branch to the deconvolution branch.
Adding BatchNormalization() after every Conv2DTranspose() layer.
Related
I'm having a problem implementing a super-resolution model
class SRNet(Model):
    def __init__(self, scale=4):
        super(SRNet, self).__init__()
        self.scale = scale
        self.conv1 = Sequential([
            layers.Conv2D(filters=64, kernel_size=3,
                          strides=(1, 1), padding="same", data_format="channels_first"),
            layers.ReLU(),
        ])
        self.residualBlocks = Sequential(
            [ResidualBlock() for _ in range(16)])
        self.convUp = Sequential([
            layers.Conv2DTranspose(filters=64, kernel_size=3,
                                   strides=(2, 2), padding="same", data_format="channels_first"),
            layers.ReLU(),
            layers.Conv2DTranspose(filters=64, kernel_size=3,
                                   strides=(2, 2), padding="same", data_format="channels_first"),
            layers.ReLU(),
        ])
        self.reluAfterPixelShuffle = layers.ReLU()
        self.convOut = layers.Conv2D(
            filters=3, kernel_size=3, strides=(1, 1), padding="same",
            data_format="channels_first", input_shape=(4, 1440, 2560))  # (kernel, kernel, channel, output)

    def call(self, lrCur_hrPrevTran):
        lrCur, hrPrevTran = lrCur_hrPrevTran
        x = tf.concat([lrCur, hrPrevTran], axis=1)
        x = self.conv1(x)
        x = self.residualBlocks(x)
        x = self.convUp(x)
        # pixel shuffle
        Subpixel_layer = Lambda(lambda x: tf.nn.depth_to_space(
            x, self.scale, data_format="NCHW"))
        x = Subpixel_layer(inputs=x)
        x = self.reluAfterPixelShuffle(x)
        x = self.convOut(x)
        return x
Error
/usr/src/app/generator.py:164 call *
x = self.convOut(x)
ValueError: Tensor's shape (3, 3, 64, 3) is not compatible with supplied shape (3, 3, 4, 3)
After reading the error, I understand that (3, 3, 4, 3) is (kernel size, kernel size, channels, outputs), meaning that only the channel count of the input is wrong,
so I printed out the shape of the input:
# after pixel shuffle before convOut
print(x.shape)
>>> (1, 4, 1440, 2560) (batch size, channel, height, width)
But the shape of x after the pixel shuffle (depth_to_space) is (1, 4, 1440, 2560); the channel value is 4, which is exactly what convOut needs.
The question is: why does the error say the input's channels changed from 4 to 64?
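For reference, depth_to_space with block size r maps (N, C·r², H, W) to (N, C, H·r, W·r), so the printed shape is consistent with scale = 4 (a quick check, assuming the pre-shuffle feature map is (1, 64, 360, 640); NCHW depth_to_space may require a GPU build):

import tensorflow as tf

x = tf.zeros([1, 64, 360, 640])                     # convUp output: (N, C*r*r, H, W)
y = tf.nn.depth_to_space(x, 4, data_format="NCHW")  # (1, 64 // 16, 360 * 4, 640 * 4)
print(y.shape)                                      # (1, 4, 1440, 2560)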
I have found a solution.
First of all, I'm using checkpoints to save the model weights during training.
While implementing and testing the model, I changed some of the layers, so the input size changed too, but the weights still remembered the input size from the previous checkpoint.
So I deleted the checkpoints folder, and then everything worked again.
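A sketch of how that situation arises, assuming a tf.train.Checkpoint workflow (the original checkpoint code isn't shown): restored weights keep the shapes they were saved with, so after changing layer shapes, stale checkpoints must be deleted or written to a fresh directory.

import tensorflow as tf

ckpt = tf.train.Checkpoint(model=model)
manager = tf.train.CheckpointManager(ckpt, directory="./checkpoints", max_to_keep=3)

# Restoring weights saved under the old architecture reintroduces the old
# kernel shapes, producing mismatches like (3, 3, 64, 3) vs (3, 3, 4, 3).
if manager.latest_checkpoint:
    ckpt.restore(manager.latest_checkpoint)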
I am trying to create a Conv-6 CNN for classification on the CIFAR-10 dataset. This CNN uses 3 blocks, where each block has 2 conv layers followed by a max-pooling layer. The result is flattened and passed through two dense layers before reaching the output layer.
The code I have is:
# input image dimensions
img_rows, img_cols = 32, 32
num_classes = 10  # CIFAR-10 has 10 classes (needed by to_categorical below)
# Load CIFAR-10 dataset-
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
if tf.keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 3, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 3, img_rows, img_cols)
    input_shape = (3, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 3)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 3)
    input_shape = (img_rows, img_cols, 3)
print("\n'input_shape' which will be used = {0}\n".format(input_shape))
# 'input_shape' which will be used = (32, 32, 3)
# Convert datasets to floating point types-
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize the training and testing datasets-
X_train /= 255.0
X_test /= 255.0
# Convert class vectors/target to binary class matrices or one-hot encoded values-
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
print("\nDimensions of training and testing sets are:")
print("X_train.shape = {0}, y_train.shape = {1}".format(X_train.shape, y_train.shape))
print("X_test.shape = {0}, y_test.shape = {1}".format(X_test.shape, y_test.shape))
# Dimensions of training and testing sets are:
# X_train.shape = (50000, 32, 32, 3), y_train.shape = (50000, 10)
# X_test.shape = (10000, 32, 32, 3), y_test.shape = (10000, 10)
class Conv6(Model):
    def __init__(self, **kwargs):
        super(Conv6, self).__init__(**kwargs)
        self.conv1 = Conv2D(
            filters = 64, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.conv2 = Conv2D(
            filters = 64, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.pool1 = MaxPooling2D(
            pool_size = (2, 2),
            strides = (2, 2)
        )
        self.conv3 = Conv2D(
            filters = 128, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.conv4 = Conv2D(
            filters = 128, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.pool2 = MaxPooling2D(
            pool_size = (2, 2),
            strides = (2, 2)
        )
        self.conv5 = Conv2D(
            filters = 256, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.conv6 = Conv2D(
            filters = 256, kernel_size = (3, 3),
            activation = 'relu', kernel_initializer = tf.initializers.GlorotNormal(),
            strides = (1, 1), padding = 'same'
        )
        self.flatten = Flatten()
        self.dense1 = Dense(
            units = 256, activation = 'relu',
            kernel_initializer = tf.initializers.GlorotNormal()
        )
        self.dense2 = Dense(
            units = 256, activation = 'relu',
            kernel_initializer = tf.initializers.GlorotNormal()
        )
        self.op = Dense(
            units = num_classes, activation = 'softmax'
        )

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.conv2(x)
        x = self.pool1(inputs)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.pool2(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.op(x)
# Initialize a Conv-6 CNN model-
model = Conv6()
# Compile defined model-
model.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    # optimizer='adam',
    optimizer=tf.keras.optimizers.Adam(lr = 0.0003),
    metrics=['accuracy']
)
# Define early stopping callback-
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor = 'val_loss', min_delta = 0.001,
    patience = 3)
# Train defined and compiled model-
history = model.fit(
    x = X_train, y = y_train,
    batch_size = batch_size, shuffle = True,
    epochs = num_epochs,
    callbacks = [early_stopping_callback],
    validation_data = (X_test, y_test)
)
On calling "model.fit()", it gives me the following warnings:
WARNING:tensorflow:Gradients do not exist for variables
['conv6/conv2d/kernel:0', 'conv6/conv2d/bias:0',
'conv6/conv2d_1/kernel:0', 'conv6/conv2d_1/bias:0'] when minimizing
the loss. WARNING:tensorflow:Gradients do not exist for variables
['conv6/conv2d/kernel:0', 'conv6/conv2d/bias:0',
'conv6/conv2d_1/kernel:0', 'conv6/conv2d_1/bias:0'] when minimizing
the loss.
In spite of the warnings, the defined CNN model reaches a validation accuracy of about 72% in 9 epochs.
Why am I getting these warnings?
Thanks!
In "call()" method, use the following line of code to remove the warnings:
x = self.pool1(inputs)
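For clarity, the corrected "call()" would read as follows (only the pooling line changes):

def call(self, inputs):
    x = self.conv1(inputs)
    x = self.conv2(x)
    x = self.pool1(x)   # was: self.pool1(inputs), which bypassed conv1 and conv2
    x = self.conv3(x)
    x = self.conv4(x)
    x = self.pool2(x)
    x = self.conv5(x)
    x = self.conv6(x)
    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dense2(x)
    return self.op(x)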
Working on a university exercise, I used the model subclassing API of TF 2.0. Here's my code (it's the AlexNet architecture, if you're wondering...):
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        # OPS
        self.relu = Activation('relu', name='ReLU')
        self.maxpool = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='valid', name='MaxPool')
        self.softmax = Activation('softmax', name='Softmax')
        # Conv layers
        self.conv1 = Conv2D(filters=96, input_shape=(224, 224, 3), kernel_size=(11, 11), strides=(4, 4),
                            padding='same', name='conv1')
        self.conv2a = Conv2D(filters=128, kernel_size=(5, 5), strides=(1, 1), padding='same', name='conv2a')
        self.conv2b = Conv2D(filters=128, kernel_size=(5, 5), strides=(1, 1), padding='same', name='conv2b')
        self.conv3 = Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='same', name='conv3')
        self.conv4a = Conv2D(filters=192, kernel_size=(3, 3), strides=(1, 1), padding='same', name='conv4a')
        self.conv4b = Conv2D(filters=192, kernel_size=(3, 3), strides=(1, 1), padding='same', name='conv4b')
        self.conv5a = Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), padding='same', name='conv5a')
        self.conv5b = Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), padding='same', name='conv5b')
        # Fully-connected layers
        self.flatten = Flatten()
        self.dense1 = Dense(4096, input_shape=(100,), name='FC_4096_1')
        self.dense2 = Dense(4096, name='FC_4096_2')
        self.dense3 = Dense(1000, name='FC_1000')

    # Network definition
    def call(self, x, **kwargs):
        x = self.conv1(x)
        x = self.relu(x)
        x = tf.nn.local_response_normalization(x, depth_radius=2, alpha=2e-05, beta=0.75, bias=1.0)
        x = self.maxpool(x)
        x = tf.concat((self.conv2a(x[:, :, :, :48]), self.conv2b(x[:, :, :, 48:])), 3)
        x = self.relu(x)
        x = tf.nn.local_response_normalization(x, depth_radius=2, alpha=2e-05, beta=0.75, bias=1.0)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.relu(x)
        x = tf.concat((self.conv4a(x[:, :, :, :192]), self.conv4b(x[:, :, :, 192:])), 3)
        x = self.relu(x)
        x = tf.concat((self.conv5a(x[:, :, :, :192]), self.conv5b(x[:, :, :, 192:])), 3)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.relu(x)
        x = self.dense2(x)
        x = self.relu(x)
        x = self.dense3(x)
        return self.softmax(x)
My goal is to access an arbitrary layer's output (in order to maximize a specific neuron's activation, if you have to know exactly :) ). The problem is that when I try to access any layer's output, I get an AttributeError. For example:
model = MyModel()
print(model.get_layer('conv1').output)
# => AttributeError: Layer conv1 has no inbound nodes.
I found some questions with this error here on SO, and all of them claim that I have to define the input shape in the first layer, but as you can see, that's already done (see the definition of self.conv1 in the __init__ function)!
I did find that if I define a keras.layers.Input object, I do manage to get the output of conv1, but trying to access deeper layers fails, for example:
model = MyModel()
I = tf.keras.Input(shape=(224, 224, 3))
model(I)
print(model.get_layer('conv1').output)
# prints Tensor("my_model/conv1/Identity:0", shape=(None, 56, 56, 96), dtype=float32)
print(model.get_layer('FC_1000').output)
# => AttributeError: Layer FC_1000 has no inbound nodes.
I googled every exception I got along the way but found no answer. How can I access any layer's input/output (or the input_shape/output_shape attributes, for that matter) in this case?
In a subclassed model there is no static graph of layers; it's just a piece of code (the model's call function). Layer connections are not defined while creating an instance of the Model class, so we first need to build the model by invoking its call method on a symbolic input.
Try this:
model = MyModel()
inputs = tf.keras.Input(shape=(224,224,3))
model.call(inputs)
# instead of model(I) in your code.
After doing this, the model graph is created.
for i in model.layers:
    print(i.output)
# output
# Tensor("ReLU_7/Relu:0", shape=(?, 56, 56, 96), dtype=float32)
# Tensor("MaxPool_3/MaxPool:0", shape=(?, 27, 27, 96), dtype=float32)
# Tensor("Softmax_1/Softmax:0", shape=(?, 1000), dtype=float32)
# ...
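With the graph built this way, one way to pursue the original goal (maximizing a specific neuron's activation) is to wrap the layer of interest in a functional sub-model and run gradient ascent on an input image. A sketch continuing from the snippet above, assuming TF 2.x eager execution (not part of the original answer):

# Functional sub-model ending at the layer whose neuron we want to maximize.
extractor = tf.keras.Model(inputs=inputs, outputs=model.get_layer('conv1').output)

img = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(extractor(img)[..., 0])  # activation of filter 0
    grads = tape.gradient(loss, img)
    img.assign_add(0.1 * tf.math.l2_normalize(grads))  # gradient-ascent step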
Is it possible to compute x * a in Keras when the input has size (None, None, 3)? For example, the input x of shape (batch, None, None, 1024) times a constant a of shape (batch, 1).
I use an input size of (64, 64, 3) in training, but the test data should use variable input sizes; the test images cannot be resized if the image processing is to stay fair.
I tried a Lambda function, (lambda x: x * a)(seq). The code itself raises no error, but when I start model.fit, I get this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7,4,4,1024] vs. [7,1].
conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool4)
conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
conv_c = Conv2D(num_classes, 1, activation='softmax')(conv5)
conv_c1 = GlobalAveragePooling2D(name="class_output")(conv_c)
conv_c1_1 = conv_c1[:, 0:1]
conv_c1_2 = conv_c1[:, 1:2]
conv_c1_3 = conv_c1[:, 2:3]
conv5_b = Lambda(lambda x: x * conv_c1_1)(conv5) #conv5:Tensor(shape=(?, 4, 4, 1024))
conv5_h = Lambda(lambda x: x * conv_c1_2)(conv5) #conv_c1_1: Tensor(shape=(?, 1))
conv5_r = Lambda(lambda x: x * conv_c1_3)(conv5)
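A likely fix (a sketch, assuming the intent is to scale each sample's feature map by its class score; Reshape comes from keras.layers like the other layers here): reshape the (batch, 1) scalars to (batch, 1, 1, 1) so the multiplication broadcasts against the (batch, 4, 4, 1024) feature map.

conv_c1_1r = Reshape((1, 1, 1))(conv_c1_1)                    # (batch, 1) -> (batch, 1, 1, 1)
conv5_b = Lambda(lambda t: t[0] * t[1])([conv5, conv_c1_1r])  # broadcasts to (batch, 4, 4, 1024)

Because broadcasting ignores the spatial dimensions, this also works with variable (None, None) test-image sizes.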
I have the following binary classification Keras model, which doesn't train well, but it does train:
def vgg_stack(self):
    def func(x):
        x = layers.Conv2D(64, (3, 3), activation='relu')(x)
        x = layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
        x = layers.Conv2D(128, (3, 3), activation='relu')(x)
        x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
        x = layers.Conv2D(128, (3, 3), activation='relu')(x)
        x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
        x = layers.Conv2D(64, (3, 3), activation='relu')(x)
        x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
        x = layers.Flatten()(x)
        x = layers.Dense(512, activation='relu')(x)
        x = layers.Dense(256, activation='relu')(x)
        x = layers.Dense(1, activation='sigmoid')(x)
        return x
    return func

def implement(self):
    self.inputs = layers.Input((self.input_width, self.input_height, self.input_depth))
    self.outputs = self.vgg_stack()(self.inputs)
    self.opt = optimizers.Adam(lr=self.learning_rate)
    self.model = models.Model(inputs=self.inputs, outputs=self.outputs)
    self.model.compile(loss='binary_crossentropy', optimizer=self.opt)

def fit_predict(self):
    ...
    self.model.fit(data_train, actuals_train, batch_size=self.batch_size, epochs=10, verbose=1,
                   validation_data=[data_validation, actuals_validation], callbacks=[self])
Its predictions look like the following:
[[ 0.58952832]
[ 0.89163774]
[ 0.99083483]
...,
[ 0.52727282]
[ 0.72056866]
[ 0.99504411]]
I.e., it produces something.
I tried to convert the model to pure TensorFlow and got:
def conv2drelu(self, x, filters, kernel_size, padding='VALID'):
    input_depth = int(x.get_shape()[-1])
    weights = tf.Variable(tf.truncated_normal([kernel_size[0], kernel_size[0], input_depth, filters],
                                              dtype=tf.float32, stddev=self.init_stddev))
    self.var_list.append(weights)
    biases = tf.Variable(tf.constant(0.0, shape=[filters], dtype=tf.float32))
    self.var_list.append(biases)
    y = tf.nn.conv2d(x, weights, [1, 1, 1, 1], padding=padding)
    y = tf.nn.bias_add(y, biases)
    y = tf.nn.relu(y)
    return y

def maxpooling(self, x, pool_size, strides, padding='VALID'):
    y = tf.nn.max_pool(x, ksize=[1, pool_size[0], pool_size[1], 1], strides=[1, strides[0], strides[1], 1],
                       padding=padding)
    return y

def flatten(self, x):
    shape = int(np.prod(x.get_shape()[1:]))
    y = tf.reshape(x, [-1, shape])
    return y

def dense(self, x, units, activation):
    shape = int(x.get_shape()[1])
    weights = tf.Variable(tf.truncated_normal([shape, units], dtype=tf.float32, stddev=self.init_stddev))
    self.var_list.append(weights)
    biases = tf.Variable(tf.constant(0.0, shape=[units], dtype=tf.float32))
    self.var_list.append(biases)
    y = tf.matmul(x, weights)
    y = tf.nn.bias_add(y, biases)
    if activation == 'relu':
        y = tf.nn.relu(y)
    elif activation == 'sigmoid':
        y = tf.nn.sigmoid(y)
    return y

def vgg_stack(self, x):
    x = self.conv2drelu(x, 64, (3, 3))
    x = self.maxpooling(x, (3, 3), strides=(2, 2))
    x = self.conv2drelu(x, 128, (3, 3))
    x = self.maxpooling(x, (2, 2), strides=(2, 2))
    x = self.conv2drelu(x, 128, (3, 3))
    x = self.maxpooling(x, (2, 2), strides=(2, 2))
    x = self.conv2drelu(x, 64, (3, 3))
    x = self.maxpooling(x, (2, 2), strides=(2, 2))
    x = self.flatten(x)
    x = self.dense(x, 512, activation='relu')
    x = self.dense(x, 256, activation='relu')
    x = self.dense(x, 1, activation='sigmoid')
    return x

def implement(self):
    self.var_list = []
    self.input_data = tf.placeholder(tf.float32, shape=(None, self.width, self.height, self.depth))
    self.prediction = self.vgg_stack(self.input_data)
    self.actual = tf.placeholder(tf.float32, shape=(None, 1))
    self.log_loss = tf.losses.log_loss(self.actual, self.prediction)
    opt = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
    # self.step = opt.minimize(self.mean_squared_error, var_list=self.var_list)
    self.step = opt.minimize(self.log_loss, var_list=self.var_list)
I.e., I tried to write functions equivalent to each Keras layer and then combine them into the same structure.
I used all the same numbers. Unfortunately, the network produces something degraded:
[[ 0.46732453]
[ 0.46732453]
[ 0.46732453]
...,
[ 0.46732453]
[ 0.46732453]
[ 0.46732453]]
I.e., the same value for all samples.
What can be the reason for this?
The conversion was correct. I wrote unit tests for the convolution layers from Keras and TensorFlow and found that they produce numerically identical results.
Additionally, I replaced the optimization goal from plain log-loss with sigmoid_cross_entropy_with_logits, but this didn't help on its own.
The problem was a too-small stddev for the initialization values.
I thought it was enough for it to be very small to break symmetry, and set it to 1e-8 or 1e-5, but this was wrong: such small values were nearly identical to zeros, and after several layers the network started producing identical results for all samples.
After I changed the stddev to 1e-1, the network started to perform as in Keras.
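For reference, Keras's default Glorot (Xavier) normal initialization derives the stddev from each layer's fan-in and fan-out, which for layer sizes like these lands around 1e-2 to 1e-1 rather than 1e-5 (a sketch, not from the original post):

import numpy as np

def glorot_normal_stddev(fan_in, fan_out):
    # Stddev used by Glorot/Xavier normal initialization.
    return np.sqrt(2.0 / (fan_in + fan_out))

# For the 512 -> 256 dense layer above:
print(glorot_normal_stddev(512, 256))  # ~0.051, orders of magnitude above 1e-5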