I am working on a signature verification project using the ICDAR 2011 Signature Dataset. Currently, I pair the encoding of an original image with the encoding of a forgery to get a training sample (labelled 0). The encodings are obtained from a pre-trained VGG-16 convolutional neural network with the fully connected layers removed. I have then added my own fully connected classifier with the following architecture (a rough sketch follows the list):
Input size: 50177
1st hidden layer: 1000 units (activation: sigmoid, dropout: 0.5)
2nd hidden layer: 500 units (activation: sigmoid, dropout: 0.2)
Output layer: 1 unit (activation: sigmoid)
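For reference, a minimal sketch of roughly this setup in Keras; the 25088-dimensional flattened VGG-16 output and the concatenation of the two encodings are assumptions based on the description above (the 50177 input size presumably includes one extra feature).

import numpy as np
from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Frozen VGG-16 without the fully connected layers; each 224x224x3 image flattens to 25088 values.
encoder = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

def encode(images):
    # images: batch of preprocessed 224x224x3 signature images
    features = encoder.predict(images)
    return features.reshape(len(images), -1)  # shape (batch, 25088)

# A training sample is the concatenation of the two encodings (reference, questioned).
classifier = Sequential([
    Dense(1000, activation='sigmoid', input_dim=2 * 25088),
    Dropout(0.5),
    Dense(500, activation='sigmoid'),
    Dropout(0.2),
    Dense(1, activation='sigmoid'),
])
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# X_pairs = np.concatenate([encode(references), encode(questioned)], axis=1)
# classifier.fit(X_pairs, labels, validation_split=0.2, epochs=20, batch_size=32)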
The issue is that although the training set accuracy increases, the validation accuracy fluctuates randomly, and the model performs very badly on the test set.
I have tried different architectures but nothing seems to work.
So is there any other way to prepare the data, or should I keep trying different architectures?
I don't think that using a VGG16 model for feature extraction is the right way to go for your task. You are using a model that was trained on relatively complex RGB images and then trying to use it on a dataset that basically consists of grayscale images of edges (signatures). And you are using the last embedding layer, which contains the most complex and specialized representation of the ImageNet dataset (the original training dataset for the VGG model).
The features you get have no real meaning, and that is probably why the training accuracy and validation accuracy are not correlated at all when you try to fine-tune the model.
My suggestion is to either use an earlier layer of VGG16 for feature extraction (somewhere around layer 5-6), or better yet, use a simpler model that was trained on a more similar dataset, such as MNIST.
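To illustrate the first option, here is a hedged sketch of pulling features from an earlier VGG16 block in Keras; the choice of 'block2_pool' is just one way to land around layer 5-6 and is an assumption, not a prescription.

from keras.applications.vgg16 import VGG16
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# 'block2_pool' sits early in the network, where activations still behave like
# generic edge/stroke detectors rather than ImageNet-specific object parts.
early_extractor = Model(inputs=base.input, outputs=base.get_layer('block2_pool').output)

# features = early_extractor.predict(preprocessed_signature_batch)
# features.shape -> (batch, 56, 56, 128); flatten or pool before feeding your classifier.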
The MNIST dataset consists of handwritten digits so it is considerably more similar to your task and any model trained on it will act as a much better feature extractor for your task.
You can pick any model from the following list of benchmark results on MNIST and use it as a feature extractor:
MNIST Benchmark Results
I have trained a pretrained ResNet18 model with my custom dataset in PyTorch and wondered whether I could transfer my model file to train another one with a different architecture, e.g. ResNet50. I know I have to save my model accordingly (explained well in another post here), but this is a question I had never thought about before.
I was planning to use more advanced models like Vision Transformers (ViT), but I couldn't figure out whether I have to start with an already pretrained ViT or whether I can just take my previous model file and use it as the pretrained model to train a ViT.
Example Scenario: ResNet18 --> ResNet50 --> Inception v3 --> ViT
My best guess is that it's not possible due to the number of weights, neurons and layer structures, but I would love to hear if I'm missing a crucial point here. Thanks!
Between models that only differ in the number of layers (ResNet-18 and ResNet-50), it has been done to initialize some layers of the larger model from the weights of the smaller model's layers. Inversely, you can truncate a larger model by taking a subset of regularly spaced layers and initializing a smaller model from them. In both cases, you need to retrain everything at the end if you hope to achieve semi-decent performance.
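As a rough illustration in PyTorch (only a sketch: ResNet-18 and ResNet-50 use different block types, so in practice only the stem and a few other tensors actually match by name and shape):

import torch
from torchvision import models

small = models.resnet18(pretrained=True)
large = models.resnet50(pretrained=False)

small_sd = small.state_dict()
large_sd = large.state_dict()

# Copy only the parameters whose names and shapes line up; everything else keeps its fresh init.
transferred = {k: v for k, v in small_sd.items()
               if k in large_sd and v.shape == large_sd[k].shape}
large_sd.update(transferred)
large.load_state_dict(large_sd)

print('Transferred {} / {} tensors'.format(len(transferred), len(large_sd)))
# The large model still needs full retraining afterwards to reach decent accuracy.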
The whole point of using architectures that vastly differ (vision transformers vs CNNs) is to learn different features from the inputs and unlock new levels of semantic understanding. Recent models like BeiT also use new self-supervised training schemes that have nothing to do with the classic ImageNet pretraining. Using trained weights from another model would go against the point.
Having said that, if you want to use a ViT, why not start from the pretrained weights available on HuggingFace and fine-tune it on the data you used to train your ResNet50?
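A minimal sketch of that last option, assuming a recent version of the HuggingFace transformers library and a standard pretrained ViT checkpoint (the checkpoint name and num_labels are placeholders):

from transformers import ViTForImageClassification, ViTImageProcessor

model_name = 'google/vit-base-patch16-224-in21k'
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(model_name, num_labels=10)  # set num_labels to your class count

# inputs = processor(images=list_of_pil_images, return_tensors='pt')
# outputs = model(**inputs, labels=labels)   # outputs.loss drives the fine-tuning
# Then train with your usual optimizer (or the Trainer API) on the same data you used for the ResNet50.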
My graduation project is to use transfer learning on a CNN model that can diagnose Covid-19 from chest X-ray images. After spending days fine-tuning the hyperparameters, such as the number of fully connected layers, the number of nodes in the layers, the learning rate, and the dropout rate, using the Keras Tuner library with Bayesian optimization, I got some very good results: a test accuracy of 98% for multi-class classification and 99% for binary classification. However, I froze all the layers in the original base model and only fine-tuned the last fully connected layers after exhaustive hyperparameter optimization. Most articles and papers out there say that they fine-tune the fully connected layers as well as some of the convolutional layers. Am I doing something wrong? I am afraid that this is too good to be true.
My data set is not that big, only 7000 images taken from the Kaggle Covid-19 competition.
I used image enhancement techniques such as N-CLAHE on the images before the training and the classification which improved the accuracy significantly compared to not enhancing the images.
I did the same for multiple state-of-the-art models, such as VGG-16 and ResNet50, and they all gave me superb results.
If by "doing something wrong" you mean that you only fine-tuned the last fully connected layers, then no, you did not do anything wrong.
You can choose to fine-tune any layer you like, but the most important ones are the final layers of the model, which is what you did, so you're good to go.
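For clarity, freezing the convolutional base and training only a new fully connected head usually looks something like this sketch in Keras (the head sizes and class count here are placeholders, not the tuned values from the question):

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze every convolutional layer

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.3)(x)
out = Dense(3, activation='softmax')(x)  # e.g. 3 classes for the multi-class case

model = Model(base.input, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_generator, validation_data=val_generator, epochs=...)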
I'm starting out with GANs and I am training a DC-GAN on MNIST dataset. I want to evaluate my model using Frechet Inception Distance (FID).
Since the Inception network is not trained to classify MNIST digits, can I use any simple MNIST classifier, or are there conditions on what kind of classifier I need to use? Or should I use the Inception net only? I have a few other questions:
Does it make sense to compute FID for MNIST GAN?
How many images from the real dataset should be used while computing FID?
For a classifier I'm using, I'm getting FID in the order of 10^6. Is the value okay or is something horribly wrong?
If you can answer any of these questions, even partially, that would be of immense help to me. Thanks!
You can refer to this.
Use an autoencoder trained on MNIST and the bottleneck activations as the features, as explained here.
Models trained on MNIST don't do well on FID computation. As far as I can tell, the major reasons are that the data distribution is too narrow (GAN images are too far from the distribution the model was trained on) and that the model is not deep enough to learn a lot of feature variation.
Training a model with a few convolutional layers gives FID values on the order of 10^6. To test the above hypothesis, just adding L2 regularization dropped the FID values to around 3k (confirming that the data distribution is too narrow); however, the FID values don't improve as GAN training goes on. :(
Finally, directly computing FID values with the Inception network gives a nice plot as the images become better.
[Note: you need to rescale the MNIST images and convert them to RGB by repeating the single channel 3 times. Make sure the real and generated images have the same intensity scales.]
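A sketch of the FID computation itself, once you have feature vectors for real and generated images from whatever extractor you settle on (numpy/scipy only; the feature extraction and the 3-channel rescaling mentioned in the note are assumed to happen beforehand):

import numpy as np
from scipy import linalg

def frechet_distance(real_feats, fake_feats):
    # real_feats, fake_feats: (N, D) arrays of extractor activations
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    covmean, _ = linalg.sqrtm(cov_r.dot(cov_f), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can introduce tiny imaginary parts

    diff = mu_r - mu_f
    return diff.dot(diff) + np.trace(cov_r + cov_f - 2.0 * covmean)

# Typical usage compares a few thousand real images against a few thousand generated ones, e.g.
# fid = frechet_distance(inception_feats_real, inception_feats_fake)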
I am working on a classification task which uses byte sequences as samples. A byte sequence can be normalized as input to a neural network by applying x/255 to each byte x. In this way, I trained a simple MLP and the accuracy is about 80%. Then I trained an autoencoder using MSE loss on the whole dataset to see if it works well for the task. I froze the weights of the encoder's layers and added a softmax dense layer on top for classification. I retrained the new model (training only the last layer) and, to my surprise, the result was much worse than the MLP: merely 60% accuracy.
Can't the autoencoder learn good features from all the data? Why is the result so bad?
Possible actions to take:
Check the error of the autoencoder: can it really reconstruct its input? (See the sketch after this list.)
Visualize the autoencoder results (dimensionality reduction), is the variance explained with fewer dimensions?
Making the model more complex does not necessarily outperform simpler ones. Did you plot the validation MSE versus epoch? Is there a global minimum after a number of steps?
Do you have enough epochs?
How many units do you have in your autoencoder? It may be too few or too many, depending on the behavior of your data and its volume.
Did you make any comparison with other dimensionality reduction methods like PCA, NMF?
Last but not least, is an autoencoder really the best way to engineer your features for this task?
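For the first and last checks above, a quick hedged sketch (X is assumed to be an (N, D) array of normalized byte sequences and autoencoder the trained Keras model; both names are placeholders):

import numpy as np
from sklearn.decomposition import PCA

def reconstruction_mse(autoencoder, X):
    # How well does the autoencoder predict itself?
    X_hat = autoencoder.predict(X)
    return np.mean((X - X_hat) ** 2)

def pca_explained_variance(X, n_components):
    # Baseline: how much variance does a linear method capture with the same bottleneck size?
    pca = PCA(n_components=n_components).fit(X)
    return pca.explained_variance_ratio_.sum()

# If PCA with the bottleneck's dimensionality already explains most of the variance and the
# autoencoder's MSE is not much lower, the autoencoder is not adding much for this data.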
"Why the result is so bad?" This is not actually a surprise. You've trained one model to be good at compressing the information. The transformations it learns at each layer do not need to be good for any other type of task at all. In fact, it could be throwing away a lot of information that is perfectly helpful for whatever auxiliary classification task you have, but which is not needed for a task purely of compressing and reconstructing the sequence.
Instead of training a separate autoencoder, you might have better luck just adding sparsity penalty terms from the MLP layers to the loss function, or using some other type of regularization such as dropout. Finally, you could consider more advanced network architectures, like ResNet / ODE layers or Inception layers, modified for a 1D sequence.
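A hedged sketch of that first suggestion: keep the plain MLP but add dropout and an L1 activity (sparsity) penalty on the hidden layers. The layer sizes, input dimension and class count are placeholders.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

input_dim = 2048   # placeholder: length of one normalized byte sequence
num_classes = 10   # placeholder: number of target classes

model = Sequential([
    Dense(512, activation='relu', input_dim=input_dim,
          activity_regularizer=regularizers.l1(1e-5)),  # sparsity penalty on hidden activations
    Dropout(0.3),
    Dense(128, activation='relu',
          activity_regularizer=regularizers.l1(1e-5)),
    Dropout(0.3),
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])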
Currently I'm using VGG16 + Keras + Theano with the transfer learning methodology to recognize plant classes. It works just fine and gives me good accuracy. But the next problem I'm trying to solve is finding a way to identify whether an input image contains a plant at all. I don't want to have yet another classifier to do that, because it's not really efficient.
So I did some searching and found that we can get activations from the last model layer (before the activation layer) and analyze them.
from keras import backend as K

model = util.load_model()  # VGG16 model
model.load_weights(path_to_weights)

def get_activations(m, layer, X_batch):
    x = [m.layers[0].input, K.learning_phase()]
    y = [m.get_layer(layer).output]
    get_activations = K.function(x, y)
    activations = get_activations([X_batch, 0])
    # trying to get some features from activations
    # to understand how can we identify if an image is relevant
    for l in activations[0]:
        not_nulls = [x for x in l if x > 0]
        # shows percentage of activated neurons
        c1 = float(len(not_nulls)) / len(l)
        n_activated = len(not_nulls)
        print 'c1:{}, n_activated:{}'.format(c1, n_activated)
    return activations

get_activations(model, 'the_latest_layer_name', inputs)
From the above code I've noticed that when we have a very irrelevant image, the number of activated neurons is bigger than for images that contain plants:
For images that were used for model training, the number of activated neurons is 19%-23%
For images that contain unknown plant species, it is 20%-26%
For irrelevant images, it is 24%-28%
This is not really a good feature for deciding whether an image is relevant, as the percentage ranges overlap.
So, is there a good way to resolve this issue?
Thanks to Feras's idea in the comment above, and after some trials, I've come up with a solution that handles this problem with accuracy of up to 99.99%.
Steps are:
Train your model on a dataset;
Store activations (see the method above for how to get them) by running relevant and non-relevant images through the model trained in the previous step. You should take the activations from the penultimate layer: for VGG16 it's the last of the two Dense(4096) layers, for InceptionV3 an extra penultimate Dense(1024) layer, and for ResNet50 an extra penultimate Dense(2048) layer.
Solve a binary classification problem using the stored activations. I've tried a simple flat NN and Logistic Regression. Both were accurate (the flat NN a bit more so), but I've chosen Logistic Regression as it's simpler, faster and consumes less memory and CPU/GPU (a sketch follows at the end of this answer).
This process should be repeated each time your model is retrained, as the final CNN weights will be different each time, and what worked previously will not necessarily work next time.
So as a result, we have another small model for solving this problem.
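As an illustration, the binary step above could look like this with scikit-learn (the activation arrays and their 4096-dimensional width are placeholders for whatever was stored in step 2):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# relevant_acts / irrelevant_acts: (N, 4096) penultimate-layer activations collected earlier
X = np.concatenate([relevant_acts, irrelevant_acts], axis=0)
y = np.concatenate([np.ones(len(relevant_acts)), np.zeros(len(irrelevant_acts))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print('relevance accuracy:', clf.score(X_test, y_test))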