I have a convolutional neural network that does a better job than others on my dataset. The problem is with the ZeroPadding2D layer I need to insert to account for the down/up sampling: it creates artifacts (zero samples) in the output. How can I avoid ZeroPadding2D without changing the network structure (layers)? I need to keep the structure as it is (the same number of layers), but I may change the:
1- filter counts
2- kernel sizes
3- first dimension of my data (e.g. 96)
4- any other options
Below is my CNN:
input_img = Input(shape=(96, 44, 1), name='full')
x = GaussianNoise(.1)(input_img)
x = Conv2D(64, (5, 5), activation='relu', padding='same')(x)
x = AveragePooling2D((2, 2), padding='same')(x)
x = Dropout(0.1)(x)
x = Conv2D(128, (5, 5), activation='relu', padding='same')(x)
x = AveragePooling2D((2, 2), padding='same')(x)
x = Dropout(0.2)(x)
x = Conv2D(512, (5, 5), activation='relu', padding='same')(x)
encoded = AveragePooling2D((2, 2), padding='same')(x)
x = Dropout(0.2)(x)  # note: this output is never used; the decoder below starts from `encoded`
# at this point the representation is (12, 6, 512)
x = Conv2D(512, (5, 5), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Dropout(0.2)(x)
x = Conv2D(128, (5, 5), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Dropout(0.12)(x)
x = Conv2D(64, (3, 3), activation='relu')(x)  # default padding='valid' trims (48, 24) to (46, 22)
x = UpSampling2D((2, 2))(x)
x = Dropout(0.12)(x)
x = ZeroPadding2D(((4, 0), (0, 0)))(x)  # restores (92, 44) to (96, 44) by injecting zero rows
decoded = Conv2D(1, (5, 5), activation='tanh', padding='same', name='out')(x)
autoencoder = Model(input_img, decoded)
I suspect that if you replace your UpSampling2D + ZeroPadding2D section with Conv2DTranspose, that might solve your issue.
Have a look here:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose
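For illustration, here is a minimal sketch of the tail of the decoder rewritten this way. The input shape (48, 24, 128) matches the tensor after the second UpSampling2D in the question; the kernel size and the Cropping2D trim are assumptions, not a drop-in fix. Conv2DTranspose learns the upsampling, and cropping trims the overshoot instead of injecting zero rows:
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, Cropping2D
from tensorflow.keras.models import Model

tail_in = Input(shape=(48, 24, 128))  # shape after the second UpSampling2D above
x = Conv2DTranspose(64, (3, 3), strides=(2, 2), padding='same', activation='relu')(tail_in)  # -> (96, 48, 64)
x = Cropping2D(cropping=((0, 0), (2, 2)))(x)  # -> (96, 44, 64): trim width instead of zero-padding height
tail_out = Conv2D(1, (5, 5), activation='tanh', padding='same', name='out')(x)
Model(tail_in, tail_out).summary()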
I have an autoencoder with two outputs, and the first output should be used, after some changes, as an input to the next part of the autoencoder. I have put my code below, and I am a little confused about whether it is logically correct. In the following code there is one output in the decoder part, act11, which is the output of a sigmoid activation, and a second output, pred_w, in the w-extraction part. I feed bncv11 to the GaussianNoise layer instead of act11. I want to know whether this is correct: based on back-propagation rules and the structure of the network, is it possible to do this? Can I just use the output of the activation, act11 = Activation('sigmoid', name='imageprim')(bncv11), as an output of the model?
All of my questions are about this part of the code:
decoded = Conv2D(1, (5, 5), padding='same', name='decoder_output',dilation_rate=(2,2))(BNd)
bncv11=BatchNormalization()(decoded)
act11=Activation('sigmoid',name='imageprim')(bncv11)
decoded_noise = GaussianNoise(0.5)(bncv11)
#----------------------w extraction------------------------------------
convw1 = Conv2D(64, (3,3), activation='relu', padding='same', name='conl1w',dilation_rate=(2,2))(decoded_noise)
watermark_extraction=Model(inputs=[image,wtm],outputs=[act11,pred_w])
I want to know whether the above code is correct from a deep-learning standpoint: I use act11 only as the activation and as a model output, but feed bncv11 forward during learning. The full network is:
wtm=Input((28,28,1))
image = Input((28, 28, 1))
conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', name='convl1e',dilation_rate=(2,2))(image)
conv2 = Conv2D(64, (5, 5), activation='relu', padding='same', name='convl2e',dilation_rate=(2,2))(conv1)
conv3 = Conv2D(64, (5, 5), activation='relu', padding='same', name='convl3e',dilation_rate=(2,2))(conv2)
BN=BatchNormalization()(conv3)
encoded = Conv2D(1, (5, 5), activation='relu', padding='same',name='encoded_I',dilation_rate=(2,2))(BN)
add_const = Kr.layers.Lambda(lambda x: x[0] + x[1])  # assumes `import keras as Kr`
encoded_merged = add_const([encoded,wtm])
#-----------------------decoder------------------------------------------------
deconv1 = Conv2D(64, (5, 5), activation='relu', padding='same', name='convl1d',dilation_rate=(2,2))(encoded_merged)
deconv2 = Conv2D(64, (5, 5), activation='relu', padding='same', name='convl2d',dilation_rate=(2,2))(deconv1)
deconv3 = Conv2D(64, (5, 5), activation='relu',padding='same', name='convl3d',dilation_rate=(2,2))(deconv2)
deconv4 = Conv2D(64, (5, 5), activation='relu',padding='same', name='convl4d',dilation_rate=(2,2))(deconv3)
BNd=BatchNormalization()(deconv3)  # note: deconv4 above is defined but never used
decoded = Conv2D(1, (5, 5), padding='same', name='decoder_output',dilation_rate=(2,2))(BNd)
bncv11=BatchNormalization()(decoded)
act11=Activation('sigmoid',name='imageprim')(bncv11)
decoded_noise = GaussianNoise(0.5)(bncv11)
#----------------------w extraction------------------------------------
convw1 = Conv2D(64, (3,3), activation='relu', padding='same', name='conl1w',dilation_rate=(2,2))(decoded_noise)
convw2 = Conv2D(64, (3, 3), activation='relu', padding='same', name='convl2w',dilation_rate=(2,2))(convw1)
convw3 = Conv2D(64, (3, 3), activation='relu', padding='same', name='conl3w',dilation_rate=(2,2))(convw2)
convw4 = Conv2D(64, (3, 3), activation='relu', padding='same', name='conl4w',dilation_rate=(2,2))(convw3)
convw5 = Conv2D(64, (3, 3), activation='relu', padding='same', name='conl5w',dilation_rate=(2,2))(convw4)
convw6 = Conv2D(64, (3, 3), activation='relu', padding='same', name='conl6w',dilation_rate=(2,2))(convw5)
pred_w = Conv2D(1, (1, 1), activation='sigmoid', padding='same', name='reconstructed_W',dilation_rate=(2,2))(convw6)
watermark_extraction=Model(inputs=[image,wtm],outputs=[act11,pred_w])
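For reference, the structural pattern the question asks about, one pre-activation tensor (bncv11) feeding both an Activation used as a model output and a GaussianNoise branch, is expressible in the Keras functional API. A minimal toy sketch of just that fork (hypothetical layer sizes, not the network above); during training, gradients reach the shared tensor through both consumers:
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, GaussianNoise
from tensorflow.keras.models import Model

inp = Input((28, 28, 1))
pre = BatchNormalization()(Conv2D(1, (5, 5), padding='same')(inp))
act = Activation('sigmoid')(pre)                        # first model output
noisy = GaussianNoise(0.5)(pre)                         # second branch consumes the pre-activation tensor
pred = Conv2D(1, (1, 1), activation='sigmoid')(noisy)   # second model output
toy = Model(inp, [act, pred])                           # gradients flow into `pre` through both outputs
toy.summary()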
I am having trouble getting this model to compile.
I am trying to implement a VGG16, but I will be using a custom loss function. The target variable has shape (?, 14, 14, 9, 6). Binary cross-entropy is applied against Y_train[:,:,:,:,1], with Y_train[:,:,:,:,0] acting as a switch that turns the loss off for some entries, effectively restricting it to a mini-batch; the other channels will be used on a separate branch of the neural net. This is a binary classification problem on this branch, so I only want an output of shape (?, 14, 14, 9, 1).
I have listed my error below. Can you please explain firstly what is going wrong and secondly how to mitigate this issue?
Model code:
img_input = Input(shape = (224,224,3))
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
# # Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
# # Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
# # Block 5
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(x)
x_class = Conv2D(9, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
x_class = Reshape((14,14,9,1))(x_class)
model = Model(inputs=img_input, outputs=x_class)
model.compile(loss=rpn_loss_cls(), optimizer='adam')
Loss function code:
def rpn_loss_cls(lambda_rpn_class=1.0, epsilon=1e-4):
    def rpn_loss_cls_fixed_num(y_true, y_pred):
        return lambda_rpn_class * K.sum(
            y_true[:, :, :, :, 0]
            * K.binary_crossentropy(y_pred[:, :, :, :, :], y_true[:, :, :, :, 1])
        ) / K.sum(epsilon + y_true[:, :, :, :, 0])
    return rpn_loss_cls_fixed_num
Error:
ValueError: logits and labels must have the same shape ((?, ?, ?, ?) vs (?, 14, 14, 9, 1))
Note: I have read multiple questions on this site with the same error, but none of the solutions allowed my model to compile.
Potential solution:
I kept experimenting and found that by adding
y_true = K.expand_dims(y_true, axis=-1)
I was able to compile the model. I am still dubious that this will work correctly.
Keras builds the y_true tensor to match the model's output shape, which is why your loss function gets a shape-mismatch error. You therefore need to align the dimensions using expand_dims. This, however, needs to be done with your model architecture, data, and loss function in mind. The code below will compile.
def rpn_loss_cls(lambda_rpn_class=1.0, epsilon=1e-4):
    def rpn_loss_cls_fixed_num(y_true, y_pred):
        y_true = tf.keras.backend.expand_dims(y_true, -1)
        return lambda_rpn_class * K.sum(
            y_true[:, :, :, :, 0]
            * K.binary_crossentropy(y_pred[:, :, :, :, :], y_true[:, :, :, :, 1])
        ) / K.sum(epsilon + y_true[:, :, :, :, 0])
    return rpn_loss_cls_fixed_num
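As a quick sanity check of why the expand_dims fix aligns the shapes, here is a hypothetical toy snippet (not from the original post) assuming a target array shaped (batch, 14, 14, 9, 6) as described in the question:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

y_true = tf.constant(np.zeros((2, 14, 14, 9, 6), dtype='float32'))
y_pred = tf.constant(np.zeros((2, 14, 14, 9, 1), dtype='float32'))

y_true = K.expand_dims(y_true, -1)  # (2, 14, 14, 9, 6, 1)
print(y_true[:, :, :, :, 0].shape)  # (2, 14, 14, 9, 1): the mask, same rank as y_pred
print(y_true[:, :, :, :, 1].shape)  # (2, 14, 14, 9, 1): the cross-entropy target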
Trying to understand autoencoders, I am looking for an algorithm for an autoencoder and the principle behind how autoencoders work, including the mathematical formulas.
I am currently working on autoencoders and trying to extract the encoder output, the compressed data, and I am not sure I am getting the right result. Am I going about this the right way?
Code:
input_img = Input(shape=(64, 64, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
encoder_mode = Model(input_img, encoded)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
...
autoencoder.fit
...
...
encoded_imgs = autoencoder.predict(X_test)
plt.imshow(encoded_imgs[i])
Is this the encoded input? It should be the characteristic, compressed data, shouldn't it?
Autoencoders are used to encode the main features of the input data. You can think of it as a feature extractor. The result will be blurred because there is data loss when you encode. The principle is to represent the input with less data.
Your input data is 64x64x3 = 12288 values, and your encoding is 8x8x64 = 4096, which is one third of the input data.
Encoded size calculation:
input_img = Input(shape=(64, 64, 3)) # 64x64x3
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img) # 64x64x32
x = MaxPooling2D((2, 2), padding='same')(x) # 32x32x32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x) # 32x32x64
x = MaxPooling2D((2, 2), padding='same')(x) # 16x16x64
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x) # 16x16x64
encoded = MaxPooling2D((2, 2), padding='same')(x) # 8x8x64
So you are reconstructing the original image from 33% of its data. It is quite impressive and of course there will be a little blur.
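As a side note on the question's last snippet: autoencoder.predict returns the reconstruction, not the encoding. A minimal sketch, using the encoder_mode sub-model and variable names already defined in the question, to inspect the compressed representation:
encoded_imgs = encoder_mode.predict(X_test)  # shape (n, 8, 8, 64): the compressed representation
decoded_imgs = autoencoder.predict(X_test)   # shape (n, 64, 64, 3): the reconstructed images

# the encoding is a stack of 64 feature maps, not an RGB image, so view it channel by channel
plt.imshow(encoded_imgs[i][:, :, 0], cmap='gray')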
I have images with shape (3600, 3600, 3). I'd like to use an autoencoder on them. My code is:
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
input_img = Input(shape=(3600, 3600, 3))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
batch_size=2
datagen = ImageDataGenerator(rescale=1. / 255)
# dimensions of our images.
img_width, img_height = 3600, 3600
train_data_dir = 'train'
validation_data_dir = 'validation'
generator_train = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
)
generator_valid = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
autoencoder.fit_generator(generator=generator_train,
                          validation_data=generator_valid)
When I run the code I get this error message:
ValueError: Error when checking target: expected conv2d_21 to have 4 dimensions, but got array with shape (26, 1)
I know the problem is somewhere in the shape of the layers, but I couldn't find it. Can someone please help me and explain the solution?
There are the following issues in your code:
Pass class_mode='input' to the flow_from_directory method so that the input images are used as the labels as well (since you are building an autoencoder).
Pass padding='same' to the third Conv2D layer in the decoder:
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
Use three filters in the last layer, since your images are RGB:
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
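Putting the first fix into the question's generator calls, a minimal sketch (variable names and batch size taken from the question; everything else is standard Keras):
generator_train = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='input')  # yields (images, images) pairs, each image serving as its own label

generator_valid = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='input',
    shuffle=False)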