Keras Conv2D output dimension. Multiclass instance segmentation - python

I'm trying to create a Keras instance segmentation CNN model with a U-Net architecture.
The Keras CNN model I want to use/modify is: model.py
The CNN model should be able to detect 3 different object classes (main root, secondary root, stem).
I've converted the annotation polygons (3 classes) to bitmap masks with a modified version of the balloon.py file in the samples folder in: GitHub: Mask RCNN Instance segmentation package
My annotation->bitmap mask file: annotation.py
Visualization of bitmaps of the 3 classes:
I want the last Conv2D layer to output 3 feature maps (one per class), followed by a final softmax activation and a categorical_crossentropy loss. I don't know how to "compare" my annotations to the model's predictions. Do I have to use some kind of keras.losses.categorical_crossentropy(y_true, y_pred) function after the last Conv2D layer (conv9) in my model.py file?
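In Keras the loss is not a layer placed after conv9: it is passed to model.compile(loss='categorical_crossentropy'), and model.fit then compares the predictions with the masks supplied as targets. The NumPy sketch below only illustrates what that comparison computes per pixel. It assumes the masks are stored as integer class IDs of shape (H, W) and that the network ends in a 3-channel softmax; the helper names and the toy arrays are made up for illustration and do not come from model.py.

```python
import numpy as np

NUM_CLASSES = 3  # main root, secondary root, stem (from the question)

def one_hot_mask(class_ids, num_classes=NUM_CLASSES):
    """Turn an (H, W) integer mask into an (H, W, num_classes) one-hot mask."""
    return np.eye(num_classes, dtype=np.float32)[class_ids]

def softmax(logits):
    """Softmax over the last (channel) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def pixelwise_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over pixels of -sum_c y_true[c] * log(y_pred[c])."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Toy 2x2 "image": hypothetical class IDs and raw network outputs.
class_ids = np.array([[0, 1], [2, 1]])
logits = np.zeros((2, 2, NUM_CLASSES))

y_true = one_hot_mask(class_ids)   # shape (2, 2, 3), the target given to fit
y_pred = softmax(logits)           # uniform prediction: 1/3 per class
loss = pixelwise_categorical_crossentropy(y_true, y_pred)
print(y_true.shape, loss)
```

The point of the sketch is the target shape: with a softmax over 3 channels, the annotation masks need to be one-hot encoded to (H, W, 3) so that they line up with the prediction channel by channel.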

Related

TorchVision using pretrained weights for entire model vs backbone

TorchVision detection models have a weights and a weights_backbone parameter. Does using pretrained weights imply that the model uses pretrained weights_backbone under the hood? I am training a RetinaNet model and am unsure which of the two options I should use and what the differences are.
The difference is pretty simple: you can either choose to do transfer learning on the backbone only or on the whole network.
RetinaNet from Torchvision has a ResNet-50 backbone. You should be able to do either of:
retinanet_resnet50_fpn(weights=RetinaNet_ResNet50_FPN_Weights.COCO_V1)
retinanet_resnet50_fpn(weights_backbone=ResNet50_Weights.IMAGENET1K_V1)
As implied by their names, the two sets of weights are different. The former were trained on COCO (object detection) while the latter were trained on ImageNet (classification).
To answer your question: passing weights means that the whole network, including the backbone, is initialized from those pretrained weights. However, I don't think that it uses weights_backbone under the hood.

From RESNET50 image classifier to any object detector

Hello Stack Overflow!
I am looking to take a ResNet50 face classification model and turn it into an SSD, YOLO, or EfficientDet detector. Is this even possible? Basically, I want to use a trained model that detects a single class per image to detect more than one class in an image: partition the input image and detect the objects (faces) in it based on my ResNet50 classification model, passing that classification model to YOLO as a parameter.
Thanks in advance!

Which layer of a deep learning model (DenseNet-121) to use as output when using model as feature extractor

I'm having trouble deciding which layer of a DenseNet-121 (fine-tuned model) to use for feature extraction.
I have the following model (it is based on DenseNet-121, but I added a classification layer because I have trained it to classify images into 7 classes). These are the last layers of my model:
However, I'm having trouble figuring out which layer to use (BatchNormalization or the ReLU). I want to have a vector of length 4096. Is there a difference between the outputs of the two layers? Which one is recommended?
If you are doing classification, you want the dense_3 layer as the model output. The batch normalization layer and the ReLU layer each produce an output of shape (2, 2, 1024). The 4096 is the parameter count reported for the layer.
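The arithmetic behind those numbers can be checked directly. This sketch assumes the (2, 2, 1024) shape quoted in the answer and the four per-channel variables that Keras's BatchNormalization keeps (gamma, beta, moving mean, moving variance); the variable names are illustrative only.

```python
channels = 1024
output_shape = (2, 2, channels)  # shape quoted in the answer

# BatchNormalization keeps four values per channel.
bn_params = 4 * channels      # gamma, beta, moving_mean, moving_variance
bn_trainable = 2 * channels   # only gamma and beta are updated by training

# Flattening the feature map gives one long vector per image.
flat_length = output_shape[0] * output_shape[1] * output_shape[2]

# Global average pooling would instead give one value per channel.
pooled_length = channels

print(bn_params, bn_trainable, flat_length, pooled_length)
```

Under these assumptions, flattening either layer's output gives a vector of length 4096, while global average pooling gives 1024 values.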

Predictions from Savedmodel (Tensorflow 2)

I saved an image classifier that I trained on two classes and want to classify a new image with it. Once I have loaded my model, which tf function do I call to return the softmax prediction of the final layer for an input image?
Thank you
You should run model.predict(image_to_classify). If you just want the index of the predicted class and not the probabilities, run np.argmax(model.predict(image_to_classify)).
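A short sketch of the steps around that call, using a hard-coded array in place of the output of model.predict (the probabilities and the 224x224 input size are made up). It assumes the model returns one row of class probabilities per image, so a single image needs a leading batch dimension before it is passed in.

```python
import numpy as np

# Stand-in for a single preprocessed image of shape (height, width, channels).
image = np.zeros((224, 224, 3), dtype=np.float32)

# Keras models predict on batches, so add a leading batch axis: (1, 224, 224, 3).
batch = np.expand_dims(image, axis=0)

# Stand-in for model.predict(batch): one row of softmax probabilities per image.
probabilities = np.array([[0.2, 0.8]])

# Index of the most likely class for each image in the batch.
predicted_classes = np.argmax(probabilities, axis=-1)
print(batch.shape, predicted_classes)
```

Passing axis=-1 keeps one index per image, which matters as soon as more than one image is predicted at once.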

How to do transfer learning of a ResNet50 model with own dataset?

I am trying to build a face verification system using Keras and a ResNet50 model with VGGFace weights. I am trying to achieve this with the following steps:
given two images, I first detect the face in each using MTCNN and represent it as an embedding
then I calculate the cosine distance between the two embedding vectors. The distance ranges from 0 to 1 (note that the lower the distance, the more similar the two faces are)
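The distance step above can be sketched in NumPy as follows. The function name and the 4-dimensional vectors are hypothetical. Note that cosine distance, defined as 1 minus cosine similarity, can in general reach 2; it stays between 0 and 1 only when the similarity is non-negative.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings; real face embeddings are much longer.
same_direction = cosine_distance([1.0, 2.0, 0.0, 1.0], [2.0, 4.0, 0.0, 2.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction give a distance close to 0 regardless of their length, and orthogonal vectors give a distance of 1.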
Using the pre-trained ResNet50 model I get fairly good results. But since the model was trained mostly on European data and I want face verification for the Indian subcontinent, I cannot rely on it. I want to train it on my own dataset. I have 10000 classes, each containing 2 images. With image augmentation I can create 10-15 images per class from those two images.
Here is the sample code I am using for training:
base_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3))
base_model.layers.pop()
base_model.summary()
for layer in base_model.layers:
    layer.trainable = False
y = base_model.input
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)  # we add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(1024, activation='relu')(x)  # dense layer 2
x = Dense(512, activation='relu')(x)  # dense layer 3
preds = Dense(8322, activation='softmax')(x)  # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)  # included in our dependencies
train_generator = train_datagen.flow_from_directory(
    '/Users/imac/Desktop/Fayed/Facematching/testenv/facenet/Dataset/train',  # this is where you specify the path to the main data folder
    target_size=(224, 224),
    color_mode='rgb',
    batch_size=32,
    class_mode='categorical',
    shuffle=True)
step_size_train = train_generator.n / train_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=step_size_train,
                    epochs=10)
model.save('directory')
As far as the code is concerned, my understanding is that I disable the last layer, then add 4 layers, train them, and store the model in a directory.
I then load the model using
model=load_model('directory of my saved model')
model.summary()
yhat = model.predict(samples)
I predict the embeddings of the two images and then calculate the cosine distance. But the problem is that the predictions get worse with my trained model. For two images of the same person, the pre-trained model gives a distance of 0.3, whereas my trained model gives a distance of 1.0. Although the loss decreases with each epoch during training and the accuracy increases, that is not reflected in my prediction output. I want to improve on the prediction results of the pre-trained model.
How can I achieve that with my own data?
N.B.: I am relatively new to machine learning and don't know a lot about model layers.
With this many classes, I would suggest going with a triplet or Siamese network. Use MTCNN to extract the faces and then a FaceNet architecture to generate 512-dimensional embedding vectors, then visualize them with a t-SNE plot. Every face will be assigned a small embedding cluster. Go through this link for generating face embeddings in Keras: Link.
Then try triplet loss with semi-hard and hard triplets on your dataset to cluster the embeddings into 10000 classes. It might help. Go through this detailed blog on triplet loss: Triplets. Code from some repositories to go through: Code.
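For orientation, here is a minimal NumPy sketch of the triplet loss and of the usual semi-hard condition. It is not taken from the linked blog or repositories. It assumes squared Euclidean distances between embeddings, and the margin of 0.2, the function names, and the 2-dimensional vectors are example choices only.

```python
import numpy as np

def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((np.asarray(u) - np.asarray(v)) ** 2))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0): zero once the negative is far enough."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def is_semi_hard(anchor, positive, negative, margin=0.2):
    """Semi-hard: the negative is farther than the positive, but within the margin."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return d_pos < d_neg < d_pos + margin

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])            # same identity, close to the anchor
easy_negative = np.array([1.0, 0.0])       # other identity, already far away
semi_hard_negative = np.array([0.3, 0.0])  # other identity, inside the margin

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, semi_hard_negative))
print(is_semi_hard(anchor, positive, semi_hard_negative))
```

Easy negatives contribute zero loss and therefore no gradient, which is why triplet selection (semi-hard or hard) matters when each identity has only a few images.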
