I've tried to input a specific scalar value(0 to 1; each image has) into a layer of latent variables.
How to insert the value in CNN based auto-encoder sequential model?
def encoder():
model = tf.keras.Sequential()
model.add(Conv2D(12, (2,2), activation='relu', padding='same', input_shape=(32, 32, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
model.add(Flatten())
model.add(Dense(64))
return model
# maybe some code here
def decoder():
model = tf.keras.Sequential()
model.add(Dense(16*16*12, input_shape=(65,)))
model.add(Reshape((16,16,12)))
model.add(Conv2D(32, (2, 2), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(1, (2, 2), padding='same'))
return model
The number of latent variables is 64 in encoder, therefore, a scalar variable should make 65 latent variables in decoder.
Concatenate layer could be applied?
You can use a Concatenate layer for that.
For example:
x1 = tf.keras.layers.Dense(8)(np.arange(10).reshape(5, 2))
x2 = tf.keras.layers.Dense(8)(np.arange(10, 20).reshape(5, 2))
concatted = tf.keras.layers.Concatenate()([x1, x2])
But you will have to choose the correct shape for that constant you are trying to add.
In your case, you could use:
...
outputs = tf.keras.layers.Concatenate()([model, your_constant])
model = tf.keras.Model(inputs=model.inputs, outputs=outputs)
return model
...
Does this solution work for you?
I am currently developing a precipitation cell displacement prediction model. I have taken as a model to implement a variational convolutional autoencoder (I attach the model code). In summary, the model receives a sequence of 5 images, and must predict the following 5. The architecture consists of five convolutive layers in the encoder and decoder (Conv Transpose), which were made to greatly reduce the image size and learn spatial details. Between the encoder and decoder carries ConvLSTM layers to learn the temporal sequences. I am working it in Python, with tensorflow and keras.
The data consists of "images" of the rain radar of 400x400 pixels, with the dark background and the rain cells in the center of the frame. The time between frame and frame is 5 minutes, radar configuration. After further processing, the training data is scaled between 0 and 1, and in numpy format (for working with matrices). My training data finally has the form [number of sequences, number of images per sequence, height, width, channel = 1].
Sequence of precipitation Images
The sequences are made up of: 5 inputs and 5 targets, of which there are 2111 radar image sequences (I know I don't have much data :( for training) and 80% have been taken for training and 20% for the validation.
To detail:
train_input = [1688, 5, 400, 400, 1]
train_target = [1688, 5, 400, 400, 1]
valid_input = [423, 5, 400, 400, 1]
valid_target = [423, 5, 400, 400, 1]
The problem is that I have trained my model, and I have obtained the value of accuracy very poor. around 8e-05. I've been training 400 epochs, and the value remains or surrounds the mentioned value. Also when I take a sequence of 5 images to predict the next 5, I get very bad results (not even a small formation of "spots" in the center, which represents the rain). I have already tried to reduce the number of layers in the encoder and decoder, in addition to the optimizer [adam, nadam, adadelta], I have also tried using the activation function [relu, elu]. I have not obtained any profitable results, in the prediction images and the accuracy value.
Loss and Accuracy during Training
I am a beginner in Deep Learning topics, I like it a lot, but I can't find a solution to this problem. I suspect that my model architecture is not right. In addition to that I should look for a better optimizer or activation function to improve the accuracy value and predicted images. As a last solution, perhaps cut the image of 400x400 pixels to a central area, where the precipitation is. Although I would lose training data.
I appreciate you can help me solve this problem, maybe giving me some ideas to organize my architecture model, or ideas to organize de train data.
Best regards.
# Encoder
seq = Sequential()
seq.add(Input(shape=(5, 400, 400,1)))
seq.add(Conv3D(filters=32, kernel_size=(11, 11, 5), strides=3,
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=32, kernel_size=(9, 9, 32), strides=2,
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=64, kernel_size=(7, 7, 32), strides=2,
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=64, kernel_size=(5, 5, 64), strides=2,
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=32, kernel_size=(3, 3, 64), strides=3,
padding='same', activation ='relu'))
seq.add(BatchNormalization())
# ConvLSTM Layers
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
input_shape=(None, 6, 6, 32),
padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=32, kernel_size=(3, 3, 3),
activation='relu',
padding='same', data_format='channels_last'))
# Decoder
seq.add(Conv3DTranspose(filters=32, kernel_size=(3, 3, 64), strides=(2,3,3),
input_shape=(1, 6, 6, 32),
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3DTranspose(filters=64, kernel_size=(5, 5, 64), strides=(3,2,2),
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3DTranspose(filters=64, kernel_size=(7, 7, 32), strides=(1,2,2),
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3DTranspose(filters=32, kernel_size=(9, 9, 32), strides=(1,2,2),
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Conv3DTranspose(filters=1, kernel_size=(11, 11, 5), strides=(1,3,3),
padding='same', activation ='relu'))
seq.add(BatchNormalization())
seq.add(Cropping3D(cropping = (0,16,16)))
seq.add(Cropping3D(cropping = ((0,-5),(0,0),(0,0))))
seq.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['accuracy'])
The metric you would like to use in case of a regression problem is mse(mean_squared_error) or mae (mean_absolute_error).
You may want to use mse in the beginning as it penalises greater errors more than the mae.
You just need to change a little bit the code where you compile your model.
seq.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['mse','mae'])
In this way you can monitor both mse and mae metric during the training.
I am new to Convolutional Neural Network. Instead of getting my data in image format i have been given flattened images matrix which is [10000x784].
Means 10000 images of size 28x28
Considering one image size is 28x28, how should i give the data matrix to my input for CNN?
My model is:
model = models.Sequential()
model.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
#model.add(layers.Flatten())
model.add(layers.Dense(2500, activation='relu'))
model.add(layers.Dense(2500, activation='relu'))
model.add(layers.Dense(1, activation='relu'))
model.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['mae','mse'])
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
#Fits model
history= model.fit(x_trained, y_train, epochs = 7000, validation_split = 0.2, shuffle= True, verbose = 1, callbacks=[callback])
I get error at model.fit.
P.S: I am doing regression and for every image i have one value as output
Begin with a Reshape layer:
model = models.Sequential()
model.add(layers.Reshape((28, 28, 1), input_shape=(784,)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# ...
I am currently trying to build a CRNN with Keras. When I try to reshape the input size, I had some trouble finding the correct dimension for my LSTM. After some debugging, I found a field in my model object called output_shape whose value was (3,1,244) and I tried to pass it as a 2D array with (3,224). Everything worked fine, but did I do correctly? What is the math behind this and what can I do next time to discover this size without debugging?
def CRNN(blockSize, blockCount, inputShape, trainGen, testGen, epochs):
model = Sequential()
# Conv Layer
channels = 32
for i in range(blockCount):
for j in range(blockSize):
if (i, j) == (0, 0):
conv = Conv2D(channels, kernel_size=(5, 5),
input_shape=inputShape, padding='same')
else:
conv = Conv2D(channels, kernel_size=(5, 5), padding='same')
model.add(conv)
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.15))
if j == blockSize - 2:
channels += 32
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
model.add(Dropout(0.15))
# Feature aggregation across time
model.add(Reshape((3, 224)))
# LSTM layer
model.add(Bidirectional(LSTM(200), merge_mode='ave'))
model.add(Dropout(0.5))
# Linear classifier
model.add(Dense(4, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(),
metrics=['accuracy']) # F1?
model.fit_generator(trainGen,
validation_data=testGen, steps_per_epoch = trainGen.x.size // 20,
validation_steps = testGen.x.size // 20,
epochs=epochs, verbose=1)
return model
# Function call
model = CRNN(4, 6, (140, 33, 1), trainGen, testGen, 1)
I have a binary classification problem. I want to detect raindrops on the image. I trained a simple model, but my prediction is not good. I want to have a prediction between 0 and 1.
For my first try, i used relu for all layers accept the final(I used softmax). As the optimizer, i used binary_crossentropy and i changed it into the categorical_crossentropy. Both of them didn't work.
opt = Adam(lr=LEARNING_RATE, decay=LEARNING_RATE / EPOCHS)
cnNetwork.compile(loss='categorical_crossentropy',
optimizer=optimizers.RMSprop(lr=lr),
metrics=['accuracy'])
inputShape = (height, width, depth)
# if we are using "channels first", update the input shape
if K.image_data_format() == "channels_first":
inputShape = (depth, height, width)
# First layer is a convolution with 20 functions and a kernel size of 5x5 (2 neighbor pixels on each side)
model.add(Conv2D(20, (5, 5), padding="same",
input_shape=inputShape))
# our activation function is ReLU (Rectifier Linear Units)
model.add(Activation("relu"))
# second layer is maxpooling 2x2 that reduces our image resolution by half
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Third Layer - Convolution, twice the size of the first convoltion
model.add(Conv2D(40, (5, 5), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Fifth Layer is Full connected flattened layer that makes our 3D images into 1D arrays
model.add(Flatten())
model.add(Dense(500))
model.add(Activation("relu"))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
I expect to get for ex .1 for the first class and .9 for the second. In the result, i get 1 , 1.3987518e-35. The main problem is that i always get 1 as a prediction.
You should be using binary_crossentropy and there is nothing wrong in the output you have got. The output 1 , 1.3987518e-35 means the probability of first class is almost 1 and probability of second class is very close to 0 (1e-35).