Using Keras VGG19 preprocess_input function during model training - python

I am using the VGG19 model pretrained on the ImageNet dataset, with the top layers (from the flatten layer to the final output layer) removed.
The problem I am solving takes multiple inputs drawn from the ModelNet dataset (a modified version, actually). So what I am doing is using VGG19 to extract features from these images, concatenating the outputs, and feeding the result into the rest of the network, i.e. the top of the model that I have built according to my needs and number of classes (20 in my case). This is the code for the model I am using (for reference, if needed):
import tensorflow as tf
from keras.applications.vgg19 import VGG19
from keras.layers import Input, Dense, Dropout, Flatten, concatenate
input_1 = Input(shape=(224, 224, 3), name='image1')
input_2 = Input(shape=(224, 224, 3), name='image2')
input_3 = Input(shape=(224, 224, 3), name='image3')
input_4 = Input(shape=(224, 224, 3), name='image4')
base_model = VGG19(weights='imagenet', input_shape=(224, 224, 3), include_top=False)
base_model.trainable = False
x1 = base_model(input_1, training=False)
x2 = base_model(input_2, training=False)
x3 = base_model(input_3, training=False)
x4 = base_model(input_4, training=False)
x = concatenate([x1, x2, x3, x4])
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(256, activation='relu')(x)
outputs = Dense(20, activation='softmax', name='class_out')(x)
model = tf.keras.models.Model([input_1, input_2, input_3, input_4], outputs)
print(model.summary())
Now, I want to know: is it necessary to apply keras.applications.vgg19.preprocess_input to my input images before training the model, or should I use this function only when predicting?
As for my inputs, they are already normalized values that I feed to the model during training via a custom data-loader function, which simply returns a generator yielding the 4 normalized input images and a class output, like this:
# The output from the data-loader function,
# where x_batch holds values normalized as x_batch[i, j] / 255
yield {'image1': x_batch[:, 0],
       'image2': x_batch[:, 1],
       'image3': x_batch[:, 2],
       'image4': x_batch[:, 3]}, {'class_out': y_batch}
However, whenever I use preprocess_input instead of plain normalization, the resulting image looks strange when I visualize it (image not shown here).
I do not understand this behaviour; any help would be appreciated.

VGG19 expects BGR input images, not RGB. Accordingly, keras.applications.vgg19.preprocess_input converts the input images from RGB to BGR and then zero-centers each color channel with respect to the ImageNet dataset, without scaling. See the Keras documentation for VGG19 and for the preprocessing function for more details.
So, if you are already converting your images from RGB to BGR (and zero-centering them) before passing them to the model, then you don't need to apply keras.applications.vgg19.preprocess_input to your input images.
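As for training vs. prediction: whatever preprocessing you train with, apply the same preprocessing at prediction time; the model should always see inputs prepared the same way. Below is a minimal sketch of using preprocess_input inside your data loader instead of the plain /255 normalization (assuming x_batch holds raw 0-255 RGB pixels with shape (batch, 4, 224, 224, 3)):
import numpy as np
from keras.applications.vgg19 import preprocess_input

def preprocess_views(x_batch, y_batch):
    # x_batch is assumed to hold raw 0-255 RGB pixels, shape (batch, 4, 224, 224, 3)
    x_batch = x_batch.astype('float32')
    views = {f'image{i + 1}': preprocess_input(x_batch[:, i]) for i in range(4)}
    return views, {'class_out': y_batch}
The "weird looking image" is expected: preprocess_input returns mean-centred BGR values (roughly in the range -124 to 152), so plotting them directly looks wrong even though they are exactly what the network was trained on. To visualize such an image, add the ImageNet channel means [103.939, 116.779, 123.68] (BGR order) back and flip the channels to RGB first.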

Related

Image sequence detection with Keras, Convolutional and Stateful Neural Network

I am trying to write a fairly complicated neural network (at least for me) in Keras that needs to combine a common CNN structure with an LSTM/GRU layer.
Basically, I have a dataset of climatological maps of the Mediterranean Sea; each map details the wind, pressure and other parameters. I am studying Medicanes (Mediterranean hurricanes), and my goal is to create a neural network that labels each map with zero if there is no trace of such a hurricane, or one if the map contains one.
In order to achieve that I need a network with two parts:
feature extractor (normal CNN).
temporal layer (LSTM/GRU).
The main reason for this is that each map is correlated with the previous one, because the formation and life cycle of a Medicane can take several days.
Important note: the dataset is too big to be loaded all at once, so I have to work one batch at a time.
I am working with Keras, and I found it pretty challenging to adapt its standard framework to my needs, so I have come up with a somewhat peculiar flow to feed my data into the network.
In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer using a more standard approach.
This is what I tried:
I am positively sure I have overcomplicated the task, but, as I said I am not very proficient with Keras and TensorFlow.
The main problem was that I could not find a way to import the data both in a batch (for RAM reasons) and in a sequence of 10-15 pictures (to be used as the time steps in the GRU layer).
I solved this problem by importing batches of 120 maps in order (no shuffle), creating a way to turn these batches into the sequences of images I needed, and then re-batching the sequences and feeding them to the model manually.
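(As a rough sketch of what this restructuring amounts to, the same regrouping could also be expressed with tf.data, assuming the train_ds built in the Data Import section below; my actual pipeline follows.)
import tensorflow as tf

# Rough sketch only: undo the 120-map batching, regroup into 15-step sequences,
# then batch 4 sequences together -> images of shape (4, 15, 600, 600, 3)
seq_ds = (train_ds
          .unbatch()
          .batch(15, drop_remainder=True)   # one 15-map sequence per element
          .batch(4, drop_remainder=True))   # 4 sequences per training batch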
Data Import
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.applications import DenseNet121

batch_size = 120
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "./Figures_1/Train",
    validation_split=None,
    subset=None,
    labels="inferred",
    label_mode="binary",
    color_mode="rgb",
    interpolation="bilinear",
    batch_size=batch_size,
    image_size=(600, 600),
    shuffle=False,
    seed=123
)
Get a sequence of Images
Here, I break down the 120 map batches into sequences of 60 observations, and I return each sequence one at a time.
sequence_lengh = 60

def sequence_x(train_dataset):
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))), dtype=object)
    for element in range(0, x_numpy.shape[0]):
        for i in range(0, x_numpy.shape[0], sequence_lengh):
            x_seq = x_numpy[element][i:i + sequence_lengh]
            yield x_seq

def sequence_y(train_dataset):
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))), dtype=object)
    for element in range(0, y_numpy.shape[0]):
        for i in range(0, y_numpy.shape[0], sequence_lengh):
            y_seq = y_numpy[element][i:i + sequence_lengh]
            yield y_seq
CNN Model
I build the CNN model based on a pre-trained DenseNet
from keras.layers import TimeDistributed, GRU
def build_convnet(shape=(600, 600, 3)):
    inputs = keras.Input(shape=shape)
    x = inputs
    # preprocessing
    x = keras.applications.densenet.preprocess_input(x)
    # conv base
    x = convBase(x)
    x = layers.Flatten()(x)
    # fine tuning
    x = keras.layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    x = keras.layers.GlobalMaxPool2D()
    return x
GRU Model
I build the time part of the network with a GRU layer
def action_model(shape=(15, 600, 600, 3), nbout=15):
    # create our convnet with a (600, 600, 3) input shape
    convnet = build_convnet(shape[1:])
    # then create our final model
    model = keras.Sequential()
    # add the convnet with a (15, 600, 600, 3) input shape
    model.add(TimeDistributed(convnet, input_shape=shape))
    # here, you can also use GRU or LSTM
    model.add(GRU(64))
    # and finally, we make a decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model
Transfer Learning
I retrain part of the convolutional base (the conv4 and conv5 blocks of the DenseNet):
convBase = DenseNet121(include_top=False, weights=None, input_shape=(600, 600, 3), pooling="avg")
for layer in convBase.layers:
    if 'conv5' in layer.name:
        layer.trainable = True
for layer in convBase.layers:
    if 'conv4' in layer.name:
        layer.trainable = True
Model Compile
Model compilation (image size = 600x600x3):
INSHAPE = (15, 600, 600, 3)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)
model.compile(
    optimizer,
    'categorical_crossentropy',
    metrics='accuracy'
)
Model Fit
Here I manually batch my data: I turn a (60, 600, 600, 3) array into a (4, 15, 600, 600, 3) array, i.e. 4 batches, each containing a 15-map-long sequence.
epochs = 10
for value in range(0, epochs):
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)
    for i in range(0, 278):
        x = next(train_x, "none")
        y = next(train_y, "none")
        if (x != "none" or y != "none"):
            if (np.any(x) and np.any(y)):
                x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
                y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
                y_stack = y_stack.reshape(4, 15)
                model.fit(x=x_stack, y=y_stack,
                          validation_data=None,
                          batch_size=None,
                          shuffle=False)
            else:
                continue
        else:
            continue
The idea is to get a model that, when presented with a sequence of images, labels each one of them with a 0 or a 1 depending on whether it contains a Medicane or not.
The model compiles without any errors, but the results it produces are poor (the result plot is not reproduced here).
What am I doing incorrectly? Is there a more effective way to write all of this?

How to generate a tensor of desired features, extracted from RGB images using pre-trained models?

I would like to generate a 1D vector of features extracted from RGB images (256 x 256 x 3) using pre-trained models. Starting from a tensor of shape (N_images, 256, 256, 3), I would like to obtain a tensor of shape (N_images, M_features), where M_features is a number of features chosen by the user.
I found a feasible solution in the Keras/TensorFlow documentation (see "Extract features with VGG16") and tried the following code (using ResNet50):
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
N_images= img_data.shape[0]
model = ResNet50(weights='imagenet', include_top=False, input_shape=(256,256,3))
model.summary()
img_data = preprocess_input(img_data)
res_feature = model.predict(img_data)
res_feature.shape
However, the shape of the feature set is (N_images, 8 ,8 ,2048). Therefore, I added a GlobalAveragePooling2D layer:
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras import Model, Input, regularizers
N_images= img_data.shape[0]
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(256,256,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
model = Model(inputs=base_model.input, outputs=x)
img_data = preprocess_input(img_data)
res_feature = model.predict(img_data)
res_feature.shape
In this case, the shape of the output tensor is (N_images, 2048), which could be fine, but I would like to choose a specific number of features.
Thanks in advance.
You probably want an autoencoder, which essentially performs dimensionality reduction into a lower-dimensional latent space; it's simpler than it seems (given a desired dimension M):
create the "dataset":
res_feature = model.predict(img_data)
res_feature = np.reshape(res_feature, (len(res_feature), -1))
create the autoencoder:
input = tf.keras.layers.Input(shape=res_feature.shape[1:])
encoder = tf.keras.layers.Dense(M, activation="selu")(input)
decoder = tf.keras.layers.Dense(res_feature.shape[1], activation="linear")(encoder)
model = tf.keras.models.Model(inputs=input, outputs=decoder)
compile and train it with your favourite optimizer and a suitable reconstruction loss (e.g. mean squared error).
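A minimal sketch of that last step, reusing the names above (the epoch count and batch size are placeholders):
import tensorflow as tf

# Train the autoencoder to reconstruct the flattened ResNet features,
# then keep only the encoder half to obtain the M-dimensional vectors.
model.compile(optimizer="adam", loss="mse")
model.fit(res_feature, res_feature, epochs=20, batch_size=32)

encoder_model = tf.keras.models.Model(inputs=input, outputs=encoder)
compressed_features = encoder_model.predict(res_feature)   # shape: (N_images, M)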

Predicting a single PNG image using a trained TensorFlow model

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
This is the code for the model, which I have trained on the MNIST dataset. What I want to do is pass a 28x28 PNG image to the predict() method, which is not working. The code for the prediction is:
img = imageio.imread('image_0.png')
prediction = model.predict(img, batch_size = 1)
which produces the error
ValueError: Error when checking input: expected flatten_input to have shape (28, 28) but got array with shape (28, 3)
I have been stuck on this problem for a few days, but I can't find the correct way to pass an image into the predict method. Any help?
The predict function makes predictions over a batch of images, so you should add a batch dimension (a first dimension) to img even when predicting a single example.
You need something like this:
import numpy as np

img = imageio.imread('image_0.png')
img = np.expand_dims(img, axis=0)
prediction = model.predict(img)
As @desertnaut says, it seems you are using an RGB image, so your first layer should use input_shape = (28, 28, 3); the img argument of predict should then have shape (1, 28, 28, 3).
In your case, the img argument of predict has shape (28, 28, 3), so predict took the first dimension as the number of images and could not match the other two dimensions to the input_shape of the first layer.
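Alternatively, if you want to keep the (28, 28) input shape, here is a minimal sketch of collapsing the RGB PNG to a single grayscale channel before predicting (assuming the model was trained on images scaled to [0, 1]):
import imageio
import numpy as np

img = imageio.imread('image_0.png')        # (28, 28, 3) RGB
img = img.mean(axis=-1)                    # collapse the colour channels -> (28, 28)
img = img.astype('float32') / 255.0        # assuming training images were scaled to [0, 1]
# MNIST digits are white-on-black; uncomment to invert a black-on-white PNG:
# img = 1.0 - img
img = np.expand_dims(img, axis=0)          # add batch dimension -> (1, 28, 28)
prediction = model.predict(img)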

Get decoder from trained autoencoder model in Keras

I am training a deep autoencoder to map human faces to a 128-dimensional latent space and then decode them back to their original 128x128x3 format.
I was hoping that after training the autoencoder I would somehow be able to 'slice off' its second half, i.e. the decoder network responsible for mapping the latent space (128,) back to the image space (128, 128, 3), using the functional Keras API and autoenc_model.get_layer().
Here are the relevant layers of my model:
INPUT_SHAPE=(128,128,3)
input_img = Input(shape=INPUT_SHAPE, name='enc_input')
#1
x = Conv2D(64, (3, 3), padding='same', activation='relu')(input_img)
x = BatchNormalization()(x)
# ... many Conv2D, BatchNormalization, MaxPooling layers ...
#Flatten
fc_input = Flatten(name='enc_output')(x)
y = Dropout(DROP_RATE)(fc_input)
y = Dense(128, activation='relu')(y)
y = Dropout(DROP_RATE)(y)
fc_output = Dense(128, activation='linear')(y)
#Reshape
decoder_input = Reshape((8, 8, 2), name='decoder_input')(fc_output)
#Decoder part
#UnPooling-1
z = UpSampling2D()(decoder_input)
# ... many Conv2D, BatchNormalization, UpSampling2D layers ...
#16
decoder_output = Conv2D(3, (3, 3), padding='same', activation='linear', name='decoder_output')(z)
autoenc_model = Model(input_img, decoder_output)
here is the notebook containing the entire model architecture.
To get the decoder network from the trained autoencoder, I have tried using:
dec_model = Model(inputs=autoenc_model.get_layer('decoder_input').input, outputs=autoenc_model.get_layer('decoder_output').output)
and
dec_model = Model(autoenc_model.get_layer('decoder_input'), autoenc_model.get_layer('decoder_output'))
neither of which seems to work.
I need to extract the decoder layers from the autoencoder because I want to train the entire autoencoder first, then use the encoder and the decoder independently.
I could not find a satisfactory answer anywhere else. The Keras blog article on building autoencoders only covers how to extract the decoder for a 2-layer autoencoder.
The decoder input/output shapes should be (128,) and (128, 128, 3), which are the input shape of the 'decoder_input' layer and the output shape of the 'decoder_output' layer, respectively.
A couple of changes are needed:
z = UpSampling2D()(decoder_input)
to
direct_input = Input(shape=(8,8,2), name='d_input')
#UnPooling-1
z = UpSampling2D()(direct_input)
and
autoenc_model = Model(input_img, decoder_output)
to
dec_model = Model(direct_input, decoder_output)
autoenc_model = Model(input_img, dec_model(decoder_input))
Now, you can train the autoencoder and predict using the decoder.
import numpy as np
autoenc_model.fit(np.ones((5,128,128,3)), np.ones((5,128,128,3)))
dec_model.predict(np.ones((1,8,8,2)))
You can also refer to this self-contained example:
https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py
My solution isn't very elegant, and there are probably better ones out there, but since no one has replied yet, I'll post it (I was actually hoping someone would, so I could improve my own implementation, as you'll see below).
So what I did was build a network that can take a secondary input directly into the latent space.
Unfortunately, both inputs are obligatory, so I end up with a network that requires dummy arrays full of zeros for the 'unwanted' input (you'll see in a second).
Using Keras functional API:
image_input = Input(shape=image_shape)
conv1 = Conv2D(...,activation='relu')(image_input)
...
dense_encoder = Dense(...)(<layer>)
z_input = Input(shape=n_latent)
decoder_entry = Dense(...,activation='relu')(Add()([dense_encoder,z_input]))
...
decoder_output = Conv2DTranspose(...)
model = Model(inputs=[image_input,z_input], outputs=decoder_output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
encoder = Model(inputs=image_input,outputs=dense_encoder)
decoder = Model(inputs=[z_input,image_input], outputs=decoder_output)
Note that you shouldn't compile the encoder and decoder.
(some code is either omitted or left with ... for you to fill in your specific needs).
Finally, to train you'll have to provide a dummy array of zeros for the latent input. So, to train the entire autoencoder (here images plays the role of X):
model.fit([images, np.zeros((len(images), n_latent))], images)
And then you can get the latent features using:
latent_features = encoder.predict(images)
Or use the decoder with latent input and dummy variables (note the order of inputs above):
decoder.predict([Z_inputs,np.zeros(shape=images.shape)])
Finally, another solution I haven't tried is to build two parallel models with the same architecture, one being the full autoencoder and the other only the decoder part, and then use:
decoder_layer.set_weights(model_layer.get_weights())
It should work, but I haven't confirmed it. It does have the disadvantage of having to copy the weights again every time you train the autoencoder model.
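A rough sketch of that weight-copying idea (decoder_only and decoder_start are hypothetical names here, and the decoder-only model's layers must line up, in order, with the decoder half of the trained model):
# decoder_start: index of the first decoder layer in the trained model (assumption)
decoder_layers = model.layers[decoder_start:]
# skip the decoder-only model's Input layer, then copy the weights layer by layer
for dec_layer, ae_layer in zip(decoder_only.layers[1:], decoder_layers):
    dec_layer.set_weights(ae_layer.get_weights())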
So to conclude, I am aware of the many problems here, but again, I only posted this because no one else had replied, and I was hoping it would still be of some use to you.
Please comment if something is not clear.
Another option is to define a function that uses get_layer and reconstructs the decoder part inside it. For example, consider a simple autoencoder with the architecture [n_inputs, 500, 100, 500, n_outputs], where you want to run some inputs through the second half (i.e. feed the 100-unit bottleneck through the 500-unit layer and then the output layer):
# Function to get outputs from a given set of bottleneck inputs
def bottleneck_to_outputs(bottleneck_inputs, autoencoder):
    # run bottleneck_inputs (e.g. 100 units) through the decoder layer (e.g. 500 units)
    x = autoencoder.get_layer('decoder')(bottleneck_inputs)
    # run x (e.g. 500 units) through the output layer (n units = n features)
    x = autoencoder.get_layer('output')(x)
    return x
For your example, this function should work (assuming you have given your layers the names referenced here).
def decoder_part(autoenc_model, image):
    # UnPooling-1
    z = autoenc_model.get_layer('upsampling1')(image)
    # 9
    z = autoenc_model.get_layer('conv2d1')(z)
    z = autoenc_model.get_layer('batchnorm1')(z)
    # 10
    z = autoenc_model.get_layer('conv2d2')(z)
    z = autoenc_model.get_layer('batchnorm2')(z)
    # UnPooling-2
    z = autoenc_model.get_layer('upsampling2')(z)
    # 11
    z = autoenc_model.get_layer('conv2d3')(z)
    z = autoenc_model.get_layer('batchnorm3')(z)
    # 12
    z = autoenc_model.get_layer('conv2d4')(z)
    z = autoenc_model.get_layer('batchnorm4')(z)
    # UnPooling-3
    z = autoenc_model.get_layer('upsampling3')(z)
    # 13
    z = autoenc_model.get_layer('conv2d5')(z)
    z = autoenc_model.get_layer('batchnorm5')(z)
    # 14
    z = autoenc_model.get_layer('conv2d6')(z)
    z = autoenc_model.get_layer('batchnorm6')(z)
    # UnPooling-4
    z = autoenc_model.get_layer('upsampling4')(z)
    # 15
    z = autoenc_model.get_layer('conv2d7')(z)
    z = autoenc_model.get_layer('batchnorm7')(z)
    # 16
    decoder_output = autoenc_model.get_layer('decoder_output')(z)
    return decoder_output
Given this function, it also makes sense to have a way to test whether it works correctly. To do this, define another model that takes you from the inputs to the bottleneck (latent space), such as:
bottleneck_layer = Model(inputs=input_img, outputs=decoder_input)
Then, as a test, run a vector of ones through the first part of the model and obtain the latent space:
import numpy as np
ones_image = np.ones((128,128,3))
bottleneck_ones = bottleneck_layer(ones_image.reshape(1,128,128,3))
And then run that latent space through the function defined above to create a variable which you will test against the output of full network:
decoded_test = decoder_part(autoenc_model, bottleneck_ones)
Now, run the ones_image through the whole network and verify that you get the same results:
model_test = autoenc_model.predict(ones_image.reshape(1,128,128,3))
tf.debugging.assert_equal(model_test, decoded_test, message='Tensors are not equivalent')
If the assert_equal line does not throw an error, your decoder is working correctly.

Keras Conv2D layer outputs array filled with NaN

I built a Keras model that takes an image as input, performs several convolutions and a pooling operation, and then applies a specialized convolution layer with pre-initialized weights. When run on an image, this model outputs an array of the correct shape, but with all the elements NaN.
The first part of the model is the first "block" of the pretrained VGG16 model for Keras. The specialized layer (keras.layers.Conv2D) takes as its weights a set of filters corresponding to certain features I want to extract from the image. It does not matter whether I flip the filters (to do cross-correlation) or change the image; the output is always NaN. Any ideas?
EDIT: here is the code. It takes a numpy image array as input.
def make_model(features, layer_name="block2_conv1"):
    vgg = VGG16(include_top=False)
    layer = vgg.get_layer(layer_name)
    x = layer.output

    num_chars, char_w, char_h, char_filters = features.shape
    filters = features.transpose((1, 2, 3, 0)).astype(int)
    filters = filters / np.sqrt(np.sum(np.square(filters), axis=(0, 1), keepdims=True))

    x = BatchNormalization()(x)
    specialized_layer = Conv2D(num_chars, (char_w, char_h))
    x = specialized_layer(x)
    biases = np.zeros((num_chars,))
    specialized_layer.set_weights([filters, biases])

    model = Model(inputs=vgg.input, outputs=x)
    return model
