I'm starting out with GANs and am training a DCGAN on the MNIST dataset. I want to evaluate my model using the Fréchet Inception Distance (FID).
Since the Inception network is not trained to classify MNIST digits, can I use any simple MNIST classifier, or are there conditions on what kind of classifier I need to use? Or should I use the Inception network only? I have a few other questions:
Does it make sense to compute FID for an MNIST GAN?
How many images from the real dataset should be used when computing FID?
With the classifier I'm using, I'm getting an FID on the order of 10^6. Is that value okay, or is something horribly wrong?
If you can answer any of these questions, even partially, that would be of immense help to me. Thanks!
You can refer to this.
Use an autoencoder trained on MNIST and take the bottleneck activations as the features, as explained here.
Models trained on MNIST don't do well for FID computation. As far as I can tell, the main reasons are that the data distribution is too narrow (GAN images are too far from the distribution the model was trained on) and that the model is not deep enough to learn much feature variation.
A model with a few convolutional layers gives FID values on the order of 10^6. To test the above hypothesis, I just added L2 regularization, and the FID values dropped to around 3k (consistent with the data distribution being narrow). However, the FID values don't improve as GAN training goes on. :(
Finally, computing FID directly from the Inception network gives a nice plot as the images become better.
[Note: You need to rescale the MNIST images and convert them to RGB by repeating the single channel 3 times. Make sure the real and generated images have the same intensity scale.]
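To make the note above concrete, here is a minimal sketch of the two steps around the feature extractor: preparing grayscale images as RGB on a shared intensity scale, and computing the Fréchet distance from two sets of feature vectors. It assumes NumPy only; the names `to_inception_input` and `frechet_distance` are made up for illustration, the nearest-neighbour resize is just a stand-in for whatever resizing you prefer, and the exact input range your Inception implementation expects is not stated in this thread, so check it before use.

```python
import numpy as np


def to_inception_input(gray, value_range, size=299):
    """Map (N, H, W) grayscale images to (N, size, size, 3) floats in [0, 1].

    value_range is the (min, max) of the incoming pixels, e.g. (0, 255) for
    raw MNIST or (-1, 1) for a tanh generator, so that real and generated
    images end up on the same intensity scale.
    """
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = value_range
    gray = (gray - lo) / (hi - lo)
    _, h, w = gray.shape
    rows = np.arange(size) * h // size   # nearest-neighbour source rows
    cols = np.arange(size) * w // size   # nearest-neighbour source columns
    resized = gray[:, rows[:, None], cols[None, :]]
    # Repeat the single channel 3 times to get an RGB-shaped tensor.
    return np.repeat(resized[..., None], 3, axis=-1)


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of square roots of the
    # eigenvalues of the product; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

The features passed to `frechet_distance` would come from the network you choose (Inception pool features in the standard FID). Identical feature sets should give a value close to 0, which is a quick sanity check if you are seeing values like 10^6.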
Related
I was wondering if it is useful to further train a pre-trained ResNet (pre-trained on ImageNet) with images that are closer to my classification problem. I want to use 50,000 labeled images of trees from a paper to update the weights of the pre-trained ResNet. Then I would like to use these weights to re-train and evaluate the ResNet, hopefully better fitted this way, with my own set of images of trees.
I already used the pre-trained ResNet on my own images with moderate success. Due to the small dataset size (~5,000 images), I thought it might be smart to further train the pre-trained ResNet with more similar data.
Any suggestions or experiences you want to share?
I'm currently using a custom version of YOLO v2 from pjreddie.com written with TensorFlow and Keras. I successfully got the model to start and finish training over 100 epochs with 10,000 training images and 2,400 testing images, which I randomly generated along with the associated JSON files, all on some Titan X GPUs with CUDA. I only wish to detect two classes. However, after leaving the training running, the loss decreases but the test accuracy hovers below 3%. All the images appear to be getting converted to black and white. The model seems to perform reasonably on one of the classes when using the training data, so the model appears to be overfitted. What can I do to my code to make the model accurate?
Okay, so it turned out that YOLOv2 was performing very well on unseen data, except that the unseen images have to be the same size as the ones it was trained on. Don't feed YOLO 800x800 images if it has been trained on 400x400 and 300x400 images. Also, the Keras accuracy measure is meaningless for detection. It might say 2% accuracy while the model is actually detecting all objects. Passing unseen data of the same size solved the problem.
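Since plain classification accuracy says little about a detector, a more telling check is how well predicted boxes overlap the ground-truth boxes. Below is a minimal sketch of intersection over union (IoU), the overlap measure that detection metrics are commonly built on. The box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions for illustration, not something taken from the YOLO code in question.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A prediction is then typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice).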
I am working on a signature verification project. I have used the ICDAR 2011 Signature Dataset. Currently, I am pairing the encoding of an original image and a forgery to get a training sample (labelled 0). The encodings are obtained from a pre-trained VGG-16 convolutional neural network (with the fully connected layers removed). I have then replaced the fully connected part with the following architecture:
Input size: 50177
1st hidden layer: 1000 units (activation: "sigmoid", dropout: 0.5)
2nd hidden layer: 500 units (activation: "sigmoid", dropout: 0.2)
Output layer: 1 unit (activation: "sigmoid")
The issue is that although the training set accuracy increases, the validation accuracy fluctuates randomly. It performs very badly on the test set.
I have tried different architectures, but nothing seems to work.
So is there any other way to prepare the data, or should I continue trying different architectures?
I don't think that using a VGG16 model for feature extraction is the right way to go for your task. You are using a model that was trained on relatively complex RGB images and then trying to use it for a dataset that basically consists of grayscale images of edges (signatures). And you are using the last embedding layer, which contains the most complex and specialized representation of the ImageNet dataset (the original training dataset for the VGG model).
The features you get have no real meaning and that is probably why the training accuracy and validation accuracy are not correlated at all when you try to fine-tune the model.
My suggestion is to either use an earlier layer of VGG16 for feature extraction (I'm talking somewhere around layers 5-6) or, better yet, use a simpler model that was trained on a more similar dataset, like MNIST.
The MNIST dataset consists of handwritten digits, so it is considerably more similar to your task, and any model trained on it will act as a much better feature extractor for your task.
You can pick any model from the following list of benchmark results on MNIST and use it as a feature extractor:
MNIST Benchmark Results
I'm just curious why I have to scale the testing set using the testing set, and not using the training set, when I'm training a model such as a CNN.
Or am I wrong, and I still have to scale it using the training set?
Also, can I train a CNN on a dataset that contains both positive and negative values as the first input of the network?
Any answers with references would be really appreciated.
We usually have 3 types of datasets for getting a model trained:
Training Dataset
Validation Dataset
Test Dataset
Training Dataset
This should be an evenly distributed dataset that covers all varieties of data. If you train for too many epochs, the model will get used to the training dataset and will give proper predictions only on the training dataset; this is called overfitting. The only way to keep a check on overfitting is to have other datasets that the model has never been trained on.
Validation Dataset
This can be used to fine-tune the model's hyperparameters.
Test Dataset
This is the dataset that the model has not been trained on and that has never been part of deciding the hyperparameters; it shows how the model really performs.
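As a rough illustration of the three datasets described above, here is a minimal sketch of a shuffled three-way split. The function name `split_dataset`, the 10% default fractions, and the fixed seed are assumptions chosen for the example, not a recommendation from this answer.

```python
import random


def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle samples and split them into (train, validation, test) lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed keeps the split reproducible
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test
```

The model is fitted on `train`, hyperparameters are tuned against `val`, and `test` is touched only once at the end.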
If scaling and normalization are used, the testing set should use the same parameters that were used during training.
A good answer on that: https://datascience.stackexchange.com/questions/27615/should-we-apply-normalization-to-test-data-as-well
Also, some models tend to require normalization and others do not.
Neural network architectures are normally robust and might not need normalization.
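A minimal sketch of what "the same parameters" means in practice, assuming simple standardization of a single feature: the mean and standard deviation are computed from the training values only and then reused unchanged on the test values. The helper names `fit_standardizer` and `standardize` are made up for this example.

```python
import math


def fit_standardizer(train_values):
    """Compute mean and (population) standard deviation from training data only."""
    mean = sum(train_values) / len(train_values)
    var = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = math.sqrt(var) or 1.0          # avoid division by zero for constant data
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted parameters to any split (train, validation, test)."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0]
test = [4.0]
mean, std = fit_standardizer(train)          # fitted on the training set
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)   # reuses the training parameters
```

Standardized inputs contain both positive and negative values, which is normal for network inputs.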
Scaling data depends on the requirement as well as on the data you have. Test data gets scaled with test data only, because test data doesn't have the target variable (one less feature in the test data). If we scale our training data with new test data, our model will not be able to correlate with any target variable and will thus fail to learn. So the key difference is the existence of the target variable.
I trained models on my dataset for TensorFlow object detection using both SSD and Faster R-CNN. There were 220 training and 30 test images in my dataset.
I trained the model for 200k steps and got the loss under 1. But when I tested my trained model on a video, it was detecting and labelling almost everything in the video.
Can anyone tell me why that is happening?
Thank you
The number of classes you are using is just one, and you trained your model with images belonging to that one class and tested it on the same.
So the problem is that the model is skewed (it predicts the same for all images).
No matter what image you test it on, you will get the same output.
Solution:
Train your model with a nearly equal number of negative images.
Ex: 220 images containing the object to be identified (label them as 1) and roughly another 220 images not containing the object (label them as 0).
Use the F1 score to check your accuracy, because it will help you understand whether the dataset is skewed or not.
Check this to learn about different kinds of accuracy measures.
Take this course to learn more about CNNs.
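To illustrate the F1 suggestion above, here is a minimal sketch that computes precision, recall, and F1 for binary labels (1 = object present, 0 = absent). The function name `precision_recall_f1` and the toy labels are made up for the example.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# A model that says "object present" for every image:
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With a model that labels everything as positive, recall is perfect but precision drops, and the F1 score reflects that, which only shows up if the evaluation set actually contains negative images.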