convolutional neural network for object localization

convolutional neural network for object localization - python

I saw in deep learning course by Andrew Ng a way to localize single object on image : https://www.youtube.com/watch?v=GSwYGkTfOKk .
As I understand it, you can for example bound a point to specific part of the object, take coordinates: x, y as labels y and train CNN.
I wanted to train a CNN neural network to localize my eyes (not clasiffication). I took 200 photos of me: 60x60 pixels in gray scale. I labeld left and right eye, Each coordinate of labeled eye was normalized to 0-1. The y label is : [x of eye1, y of eye1, x of eye2, y of eye2]. I used SGD optimazer with mse loss and in the output layer sigmoid function.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(64, (3,3), input_shape= (60,60, 1)))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(tf.keras.layers.Conv2D(32, (3,3)))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(4, activation='sigmoid'))
sgd= tf.keras.optimizers.SGD(lr = 0.01)
model.compile(loss = 'mean_squared_error', optimizer=sgd, metrics=['accuracy'])
model.fit(x,y, batch_size=3, epochs=15, validation_split=0.2)
It didnt work for this task, so what is the way to solve this problem? I saw somewhere: apply CNN to image (I suppose without dense layers), then on flatten data from CNN use linear regression for each x/y coordinate (multivariable logistic regression). Is this a solution ? As I understand it, I would feed each image into Conv and MaxPool layers, then Flatten and then i feed the data to lin. regression and train it, but I have no idea how to do this in keras. I am new in this field, so any idea helps me.

First of all, a couple of observations with regard to your code.
Since the last layer contanins more than 2 neurons, the activation function that you have to use is softmax , not sigmoid (note that this is in the case of classification, not regression).
You should only use sigmoid when you are doing binary classification, but not when you have more than two classes (note that you can also use softmax for 2 classes, however it is not necessarily recommended from a small computational overhead viewpoint).
Your problem is both a regression and classification one!.
The first layer of your convolutional neural network contains 64 feature maps, with each size of the kernel 3x3. Although the way you feed the images to your neural network is correct, you only feed the grayscale image, not the x1,x2,y1,y2 coordinates.
For an ANN with regression, take a look at this tutorial: https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/.
Your intuition is correct; object detection neural networks replace fully connected layers with convolutional ones. Yann LeCun even states that fully connected layers should not be a part of CNNs.
Since you are new to this field, I would recommend to adopt the following pipeline.
1) Find an online github model written in your preferred deep learning library(Keras/PyTorch/TensorFlow etc).
2) Follow the instructions/tutorial in order to reproduce the results obtained by the github user.
3) By means of the latter you should also understand the code / get a good intuitive grasp.
4) Adapt the model to the problem that you need.
One place where you could start is here (this is object detection - detect multiple objects and also of different categories) : https://github.com/pierluigiferrari/ssd_keras.
If you have further questions, please write them down, I would be glad to be of assistance!

Related

how will I be able to choose the right parameters and the structure of a neural networks during the training phase?

I'm trying to train a neural network in a supervised learning which has as input x_train a list of 100 list each containing 2000 column ....... and a target output y_train which has a list of 100 list also but contains each 20 column.
This is what x_train and y_train look like:
here is the neural networks that I created :
dnnmodel = tf.keras.models.Sequential()
dnnmodel.add(tf.keras.layers.Dense(40, input_dim = len(id2word), activation='relu'))
dnnmodel.add(tf.keras.layers.Dense(20, activation='relu'))
dnnmodel. compile ( loss = tf.keras.losses.MeanSquaredLogarithmicError(), optimizer = 'adam' , metrics = [ 'accuracy' ])
during the training phase I cannot choose the right number of neurons, layers and the activation and loss functions, since the accurency and loss values are not at all reasonable. .... can someone help me please?
Here is the display after the execution:

There is no correct method or formula to decide the correct number of layers or neurons or any other functions you use in your model. It all comes down experimentation and what works best for your data and the problem that you are trying to solve.
Here are some tips:
sigmoid, tanh = These activations are generally not used in hidden layers as their computed slopes or gradient is very small. So the model can take a long to converge.
Relu, elu, leaky relu - These activations can be used used in hidden layers as they have steep slope compared to others so the training process is fast. Relu is commonly used.
Layers: The more layers you add the deeper you make your neural network. Deeper neural networks are able to learn complex features about your data but they are prone to overfitting. Also, Deep Neural Network suffers from problems like vanishing gradient or exploding gradients. Fewer layers mean fewer params to learn and prone to underfitting.
Loss Function - Loss function depends on the problem you are trying to solve.
For classification
If y_label is categorical go for categorical_cross_entropy
If y_label is discreet go for sparse_categorical_cross_entropy
For regression problems
Use Rmse or MSE
Coming to the training logs. Your model is training as you can see the loss at each epoch less than the previous one. You should train your model for more epochs in order to see improvements in your accuracy.

How to fo transfer learning of a resnet50 model with with own dataset?

I am trying to build a face verification system using keras and resnet50 model with vggface weights. The way i am trying to achieve this is by the following steps:
given two image i first find out the face using mtcnn as embeddings
then i calculate the cosine distance between two vector embeddings. the distance starts from 0 to 1..... (Here to be noted
that the lower the distance the same two faces is)
Using the pre-trained model of resnet50 i get fairly good result. But since the model was trained mostly on european data and i want face verification on indian sub-contient i cannot rely on that. I want to train them on my own dataset. I have 10000 classes with each class containing 2 image. With image augmentation i can create 10-15 image per class from those two image.
here is the sample code i am using for training
base_model = VGGFace(model='resnet50',include_top=False,input_shape=(224, 224, 3))
base_model.layers.pop()
base_model.summary()
for layer in base_model.layers:
layer.trainable = False
y=base_model.input
x=base_model.output
x=GlobalAveragePooling2D()(x)
x=Dense(1024,activation='relu')(x) #we add dense layers so that the model can learn more complex functions and classify for better results.
x=Dense(1024,activation='relu')(x) #dense layer 2
x=Dense(512,activation='relu')(x) #dense layer 3
preds=Dense(8322,activation='softmax')(x) #final layer with softmax activation
model=Model(inputs=base_model.input,outputs=preds)
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()
train_datagen=ImageDataGenerator(preprocessing_function=preprocess_input) #included in our dependencies
train_generator=train_datagen.flow_from_directory('/Users/imac/Desktop/Fayed/Facematching/testenv/facenet/Dataset/train', # this is where you specify the path to the main data folder
target_size=(224,224),
color_mode='rgb',
batch_size=32,
class_mode='categorical',
shuffle=True)
step_size_train=train_generator.n/train_generator.batch_size
model.fit_generator(generator=train_generator,
steps_per_epoch=step_size_train,
epochs=10)
model.save('directory')
As far as the code code is concern what i understand is that i disable the last layer then add 4 layer train them and store them in a diectory.
i then load the model using
model=load_model('directory of my saved model')
model.summary()
yhat = model.predict(samples)
i predict the embedding of two image and then calculate cosine distance. But the problem is that the prediction gets worsen with my trained model. For two image of same person the pre-trained model gives distance of 0.3 whereas my trained model show distance of 1.0. Although during training loss function is decreasing with each epoch and accuracy is increasing but that doesn't reflect on my prediction output. I want to increase the prediction result of pre-trained model.
How can i achieve that with my own data?
N.B: I am relatively new in machine learning and don't know a lot about model layers

What I would suggest is to go with triplet or siamese with these many number of classes. Use MTCNN to extract faces and then use facenet architecture to generate 512 dimensions embedding vectors, then visualize it using TSNE plot. Every face will be assigned a small embedding cluster. Go through this link for Keras to generate face embeddings: Link.
Then, try Triplets semi-hard and hard loss on your dataset to cluster them into 10000 classes. It might help. Go through this detailed blog on triplet loss: Triplets. Codes to go through some of the repositries: Code.

Most suitable Machine Learning algorithm for this problem?

I have a dataset and i want to decide on which ML algorithm to apply to my given problem.
Customers are to fill out an assessment questionnaire of 50 questions. Examples of the questions are, what is your job, previous job history, how much do you earn, have you been rejected for a loan etc, and the end goal is to decide whether they should be rejected or not.
I have circa 500 entries for my algorithm to learn from and have pre-processed my dataset and converted the inputs into a numpy array and wondering what would be the best algorithm to use? Should i use a classification algorithm or a neural network in tensorflow and if the latter, what would be the layers I should use?
Thanks

How about beginning with xgboost or random forest? - So plain "old" ML?
The advantage would be that you could visualize the decision tree of the model once trained.
If using a NN in tensorflow (or even easier: keras with tensorflow backend), you could go with a MLP (multi layer perceptron), since the questions answers have fixed position in the input. You don't need many layers.
Important is that you normalize your input data columnwise, so that the input numbers are not much bigger/smaller than +1/-1, respectively. Introductory books often miss this point, though important.
Since your target labeling is "accept" or "reject", binary classifier will do it (also in the machine learning approach). (You use 0 and 1 as labels).
For NN, you don't need for such kind of classification that many layers or neurons. Try the smallest network first. let's say 10 neurons in first layer, then 7 neurons in the next layer (probably even less) and then 1 output neuron for the binary decision.
With Keras this would be:
from keras.models import Sequential
from keras.layers import Dense
def create_mlp(n_input = 500): # number of columns of input data 500 here
model = Sequential()
model.add(Dense(10, input_dim=n_input, kernel_initializer='normal', activation='relu')) # init = kernel_initializer
model.add(Dense(7, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['acc'])
return model
model = create_mlp(500) # this will generate the correct NN compiled.
Your data frame (or Numpy input array must have as rows the samples,
the columns are for each answer for a question 1 column.
the answers you have to encode in a numeric form. Numbers should be small - the best between -1 and 1. NNs don't like big numbers. Thus column-wise normalization can help.)
That's it. I learned all this stuff last year. Good luck for learning. It will be tons of fun!

How can I download and skip VGG weights that have no counterpart with my CNN in Keras?

I would like to follow the Convolutional Neural Net (CNN) approach here. However, this code in github uses Pytorch, whereas I am using Keras.
I want to reproduce boxes 6,7 and 8 where pre-trained weights from VGG-16 on ImageNet is downloaded and is used to make the CNN converge faster.
In particular, there is a portion (box 8) where weights are downloaded and skipped from VGG-16 that have no counterpart in SegNet (the CNN model). In my work, I am using a CNN model called U-Net instead of Segnet. The U-Net Keras code that I am using can be found here.
I am new to Keras and would appreciate any insight in Keras code on how I can go about downloading and skipping the VGG weights that have no counterpart with my U-Netmodel.

The technique you are addressing is called "Transfer Learning" - when a pre-trained model on a different dataset is used as part of the model as a starting point for better convergence. The intuition behind it is simple: we assume that after training on such a large and rich dataset as ImageNet, the convolution kernels of the model will learn useful representations.
In your specific case, you want to stack VGG16 weights in the bottom and deconvolution blocks on the top. I will go step-by-step, as you pointed out that you are new to Keras. This answer is organized as a step-by-step tutorial and will provide small snippets for you to use in your own code.
Loading weights
In the PyTorch code you linked to above, the model was first defined, and only then the weights are copied. I found this approach abundant, as it contains lots of not necessary code. Here, we will load VGG16 first, and then stack the other layers on the top.
from keras import applications
from keras.layers import Input
# Loading without top layers, since you only need convolution. Note that by not
# specifying the shape of top layers, the input tensor shape is (None, None, 3),
# so you can use them for any size of images.
vgg_model = applications.VGG16(weights='imagenet', include_top=False)
# If you want to specify input tensor shape, e.g. 256x256 with 3 channels:
input_tensor = Input(shape=(256, 256, 3))
vgg_model = applications.VGG16(weights='imagenet',
include_top=False,
input_tensor=input_tensor)
# To see the models' architecture and layer names, run the following
vgg_model.summary()
Defining the U-Net computation graph with VGG16 on the bottom
As pointed in previous paragraph, you do not need to define a model and copy the weights over. Just stack other layers on top of vgg_model:
# Import the layers to be used in U-Net
from keras.layers import ...
# From the U-Net code you provided
def make_conv_block(nb_filters, input_tensor, block):
...
# Creating dictionary that maps layer names to the layers
layers = dict([(layer.name, layer) for layer in vgg_model.layers])
# Getting output tensor of the last VGG layer that we want to include.
# I don't know much about U-Net, but according to the code you provided,
# you don't need the last pooling layer, right?
vgg_top = layers['block5_conv3'].output
# Now getting bottom layers for multi-scale skip-layers
block1_conv2 = layers['block1_conv2'].output
block2_conv2 = layers['block2_conv2'].output
block3_conv3 = layers['block3_conv3'].output
block4_conv3 = layers['block4_conv3'].output
# Stacking the remaining layers of U-Net on top of it (modified from
# the U-Net code you provided)
up6 = Concatenate()([UpSampling2D(size=(2, 2))(vgg_top), block4_conv3])
conv6 = make_conv_block(256, up6, 6)
up7 = Concatenate()([UpSampling2D(size=(2, 2))(conv6), block3_conv3])
conv7 = make_conv_block(128, up7, 7)
up8 = Concatenate()([UpSampling2D(size=(2, 2))(conv7), block2_conv2])
conv8 = make_conv_block(64, up8, 8)
up9 = Concatenate()([UpSampling2D(size=(2, 2))(conv8), block1_conv2])
conv9 = make_conv_block(32, up9, 9)
conv10 = Conv2D(nb_labels, (1, 1), name='conv_10_1')(conv9)
x = Reshape((nb_rows * nb_cols, nb_labels))(conv10)
x = Activation('softmax')(x)
outputs = Reshape((nb_rows, nb_cols, nb_labels))(x)
I want to emphasize that what we've done in this paragraph is just defining the computation graph for U-Net. This code is written specifically for VGG16, but you can modify it for other architectures as you wish.
Creating a model
After the previous step, we've got a computational graph (I assume that you use Tensorflow backend for Keras. If you're using Theano, I recommend you to switch to Tensorflow since this framework has achieved a state of maturity now). Now, we need to do the following things:
Create a model on top of this computation graph
Freeze the bottom layers, since you don't want to wreck your pre-trained weights
# Creating new model. Please note that this is NOT a Sequential() model
# as in commonly found tutorials on the internet.
from keras.models import Model
custom_model = Model(inputs=vgg_model.input, outputs=outputs)
# Make sure that the pre-trained bottom layers are not trainable.
# Here, I freeze all the layers of VGG16 (layers 0-18, including the
# pooling ones.
for layer in custom_model.layers[:19]:
layer.trainable = False
# Do not forget to compile it before training
custom_model.compile(loss='your_loss',
optimizer='your_optimizer',
metrics=['your_metrics'])
"I got confused"
Assuming that you're new to Keras and to Deep Learning in general (as you admitted in your question), I recommend the following articles to read to further understand the process of Fine Tuning and Transfer Learning on Keras:
How CNNs see the world - a great short article that will give you intuitive understanding of the dirty magic behind Transfer Learning.
Building powerful image classification models using very little data - This one will give you more insight on how to adjust learning rates and "release" the frozen layers.
When you're learning a framework, documentation is your best friend. Fortunately, Keras has an incredible documentation.
Q&A
The deconvolution blocks we put on top of VGG are from the UNET achitecture (i.e. up6 to conv10)? Please confirm.
Yes, it's the same as here, just with different names of the skip-connection layers (e.g. block1_conv2 instead of conv1)
We leave out the conv layers (i.e., conv1 to conv5). Can you please share with me as to why this is so?
We don't leave or throw any layers from the VGG network. The VGG16 network architecture and the bottom architecture of U-Net (up to conv5) is very similar. In fact, they are made of 5 blocks of the following format:
+-----------------+-------------------+
| VGG conv blocks | U-Net conv blocks |
+-----------------+-------------------+
| blockX_conv1 | convN |
| ... | poolN |
| blockX_convN | |
| blockX_pool | |
+-----------------+-------------------+
Here is a better visualization. So, the only difference between VGG16 and bottom part of U-Net is that each block of VGG16 contains multiple convolution layers instead of one. That's why, the alternative of connecting conv3 to conv6 is connecting block3_conv3 to conv6. The U-Net architecture remains the same, just with more convolution layers on the bottom.
Is there anyway to incorporate the Max pooling in the conv layers (in your opinion what are we doing here by leaving them out, and would you say it is insignificant?)
We don't leave them out. The only pooling layer that I threw away is block5_pool (which is the last layer in bottom part of VGG16) - because in the original U-Net (refer to the code) it seems like the last convolution block in the bottom part is not followed by a pooling layer (we have conv5 but don't have pool5). I kept all the layers of VGG16.
We see Maxpooling being used on the convolution blocks. Would we also just simply drop these pooling layers (as we are doing here with Unet) if we wanted to combine Segnet with VGG?
As I explained in the question above, we are not dropping any pooling layers.
However, you would need to stack a different type of pooling layers instead of the simple MaxPooling2D that is used in the default VGG16, because SegNet preserves max-indexes. This can be achieved with tf.nn.max_pool_with_argmax and using the trick of replacing middle layers of Keras model (I won't cover the detailed information in this answer to keep it clean). The replacement is harmless and doesn't require re-training because pooling layers don't contain any trained weights.
The U-NET from here is different from what I am using, can you tell what is the impact of such a difference between the two?
It is a more shallow U-Net. The one in your original question has 5 convolution blocks on the bottom (conv1 - conv5), while the later only has 3. Choose how many blocks you need depending on the data (e.g. for simple data as cells you might want to use only 2-3 blocks, while gray matter or tissue segmentation might require 5 blocks for better quality. See this link to have an insight of what convolution kernels "see".
Also, what do you think about the VGGSegnet from here. Does it use the trick of the middle layers you mentioned in Q&A? And is it the equivalent of the Pytorch code I initially posted?
Interesting. It is an incorrect implementation, and is not equivalent to the Pytorch code you posted. I have opened an issue in that repository.
Final question....is it always a rule in Transfer Learning to put the pretrained model (i.e., the model w/ pretrained weights) at the bottom?
Generally it is. Think of the convolution kernels as "features": the first layer detects small edges, colors. The following layers combines those edges and colors into more complicated detections, like "yellow lines" or "blue circle". Then the upper convolution layers detects more abstract shapes as "eyes", "nose", etc. based on detections of lower layers. So replacing the bottom layers (while the upper layers depends on the bottom representation) is illogic.

A solution sketch for this would look like the following:
Initialize VGG-16 and load the ImageNet weights by using the appropriate weights='imagenet' flag:
vgg_16 = keras.applications.vgg16.VGG16(weights='imagenet')
Initialize your model:
model = Model() # or Sequential()
... # Define and compile your model
For each layer that you want to copy over:
i_vgg = ... # Index of the layer you want to copy
i_mod = ... # Index of the corresponding layer in your model
weights = vgg_16.layers[i_vgg].get_weights()
model.layers[i_mod].set_weights(weights)
If you don't want to spend time to find out the index of each layer, you can assign a name to the relevant layers using the name='some_name' parameter in the layer constructor and then access the weights as follows:
layer_dict = dict([(layer.name, layer) for layer in model.layers])
weights = layer_dict['some_name'].get_weights()
layer_dict['some_name'].set_weights(weights)
Sources:
Loading VGG
Getting weights from layer (FChollet is the creator of Keras)
Getting and setting weights
Cheers

Keras Dense Net Overfitting

I am attempting to use keras to build an activity classifier from accelerometer signals. However, I am experiencing extreme overfitting of the data even with the most simplistic of models.
The input data is of shape (10,3) and contains roughly .1 second of data from the accelerometer in 3 dimensions. The model is simply
model = Sequential()
model.add(Flatten(input_shape=(10,3)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
The model should output the label [1,0] for walking activities and [0,1] for non-walking activities. After training I get 99.8% accuracy (if only it was real...). When I attempt to predict on data that wasn't used for training, I get 50% accuracy, verifying that the net isn't really "learning" anything except to predict a single class value.
The data is being prepared from 100hz triaxial accelerometer signals. I am not preprocessing the data in any way except for windowing it into bins on length 10 that overlap with the previous bin by 50%. What measures can I take to make the network produce actual predictions? I have tried increasing the window size but the results remain the same. Any advice/general tips are greatly appreciated.
Ian

Try adding some hidden layers and dropout layers to your network. You could create a simple Multi Layer Perceptron (MLP) with a couple of extra lines in between your Flatten layer and Dense layer:
model.add(Dense(64, activation='relu', input_dim=30))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
Or check out this guide. which explains how to create a simple MLP.
Without any hidden layers your model will not actually be 'learning' from the input data, rather it will be mapping the input number of features to the output number of features.
The more layers you add, the more intermediate features and patterns it should extract from the input data which should lead to better model predictions for test data. There will be a lot of trial and error to design the best model as too many layers can result in over fitting.
You have not provided information about how you train the model so that may be the cause of the issue as well. You must ensure that the data is spit into training, testing and validation sets. Some possible split ratios to use for training, validation, test data are: 60%:20%:20%, or 70%:15%:15%. This is ultimately something that you must also decide.

The problem of overfitting was caused by the input data type. The values passed to the classifier should have been float values with 2 decimal places. Somewhere along the way, some of these values had been augmented and had significantly more than 2 decimal places. That is, the input should have looked like
[9.81, 10.22, 11.3]
but instead looked like
[9.81000000012, 10.220010431, 11.3000000101]
The classifier was making its prediction based on this feature, which is obviously not the desired behavior! Lessoned learned - make sure the data preparation is consistent! Thanks to #umutto for the suggestions of random forests, the simple structure was helpful for diagnosing purposes.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.