I'm currently using a custom version of YOLO v2 from pjreddie.com, written with TensorFlow and Keras. I've successfully got the model to start and finish training for 100 epochs on 10,000 training images and 2,400 testing images (which I generated randomly, along with the associated JSON files), all on some Titan X GPUs with CUDA. I only wish to detect two classes. However, after leaving the training running, the loss decreases but the test accuracy hovers below 3%, and all the images appear to be getting converted to black and white. The model performs reasonably on one of the classes when using the training data, so it appears to be overfitted. What can I do to my code to make the model accurate?
Okay, so it turned out that YOLOv2 was performing very well on unseen data, except that the unseen data has to be the same size as the images it was trained on. Don't feed YOLO 800x800 images if it was trained on 400x400 and 300x400 images. Also, the Keras accuracy metric is meaningless for detection: it might report 2% accuracy while actually detecting all objects. Passing unseen data of the same size solved the problem.
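To illustrate the fix (not code from the original post): resize each unseen image to the resolution the network was trained on before running inference. A minimal sketch, assuming OpenCV and an already-loaded Keras `model`; the names and the 400x400 size are placeholders:

```python
import cv2
import numpy as np

TRAIN_W, TRAIN_H = 400, 400  # whatever resolution the training images had

def prepare_for_inference(image_path):
    img = cv2.imread(image_path)                      # BGR image of arbitrary size
    img = cv2.resize(img, (TRAIN_W, TRAIN_H))         # match the training resolution
    img = img[..., ::-1].astype(np.float32) / 255.0   # BGR -> RGB, scale to [0, 1]
    return np.expand_dims(img, axis=0)                # add batch dimension

# detections = model.predict(prepare_for_inference("unseen_800x800.jpg"))
```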
Related
I am working on an object detector with the TensorFlow Object Detection API. I downloaded a model from the model zoo repository, specifically ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8, to train on my custom dataset, which contains 2,500 images split into a 70:30 training/testing ratio. I edited the pipeline.config file accordingly, following a tutorial, adding the label_map paths and the other required fields. I trained this model for 50,000 steps and monitored the training process on TensorBoard, which showed good progress. The issue occurred when I went to evaluate the model. For some reason, the evaluation results are the following:
What could the issue be, given that the loss graphs and even the learning rate graphs look good? Thanks
I am currently training a CNN to classify ICs into the classes "scratch" and "no scratch" (binary classification). I am fairly new to deep learning. When I trained my CNN for a bit I got very good accuracies (good validation accuracy as well), but I quickly learned that my models were not as good as I thought, because when I used them on a test dataset they produced quite a lot of misclassifications (false positives and false negatives). In my opinion there are two problems:
There is too little training data (about 1000 images per class)
The ICs have markings (text) on them, which change with every batch, so my training data contains images of ICs with varying markings. And since some batches have more scratched ICs and others have fewer or none, the number of IC images with different markings is unbalanced.
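Not part of the original post, but one standard mitigation for this kind of imbalance is to weight the loss per class. A minimal sketch with Keras-style class weights; the toy label array and the commented `model.fit` call are placeholders:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 0 = "no scratch", 1 = "scratch" (replace with your real labels)
y_train = np.array([0, 0, 0, 0, 1])

weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))   # e.g. {0: 0.625, 1: 2.5}

# model.fit(x_train, y_train, epochs=20, class_weight=class_weight)
```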
Here are two example images of two ICs from the training set of the class "scratch":
As you can see, the text varies a lot. Every line has different characters, and the number of characters also varies.
I ask myself: how should the CNN be able to differentiate between a scratch and a character?
Nevertheless, I am trying to train my CNN, and this is, for example, one model I trained recently (the other models look quite similar):
There are some points during training where the validation accuracy goes up and then comes back down. What could that mean? My guess is that there is a feature in the validation set that is not covered in my training set. Could that be the cause?
As you can see, data augmentation is not an option (or so I think) because of the text. One thing that came to mind is to separate the marking and the IC (cut out the text region) in preprocessing (I don't know how to do that properly and quickly) and then classify them separately, but I don't know if that would be the right approach.
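If you do try the preprocessing route, a very rough sketch of the "cut out the text region" idea could look like this, assuming the marking sits at a roughly fixed position on aligned IC images; the ROI coordinates and file names are placeholders, not values from the post:

```python
import cv2

TEXT_ROI = (50, 30, 300, 120)   # x, y, width, height of the marking area (assumed)

def mask_marking(image_path):
    img = cv2.imread(image_path)
    x, y, w, h = TEXT_ROI
    img[y:y + h, x:x + w] = 0    # black out the text so the CNN cannot key on it
    return img

# cleaned = mask_marking("ic_with_scratch.png")
# cv2.imwrite("ic_with_scratch_masked.png", cleaned)
```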
I first used VGG16, ResNet and InceptionV3 (with transfer learning). Now I am trying to train my own custom CNN (inspired by VGG but with 10 layers, similar to this: https://journals.sagepub.com/doi/full/10.1177/1558925019897396).
Do you guys know how I should approach this problem or do you have any tips?
I'm starting out with GANs and I am training a DC-GAN on MNIST dataset. I want to evaluate my model using Frechet Inception Distance (FID).
Since the Inception network is not trained to classify MNIST digits, can I use any simple MNIST classifier, or are there conditions on what kind of classifier I need to use? Or should I use the Inception net only? I have a few other questions:
Does it make sense to compute FID for an MNIST GAN?
How many images from the real dataset should be used when computing FID?
For the classifier I'm using, I'm getting FID on the order of 10^6. Is that value okay, or is something horribly wrong?
If you can answer any of these questions, even partially, that would be of immense help to me. Thanks!
You can refer to this.
Use an autoencoder trained on MNIST and its bottleneck activations as the features, as explained here.
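For reference, the FID computation itself is the same regardless of the feature extractor (Inception, a simple MNIST classifier, or the autoencoder bottleneck above). A minimal sketch, assuming `real_feats` and `fake_feats` are (N, D) arrays of activations:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_feats, fake_feats):
    mu1, mu2 = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma1 = np.cov(real_feats, rowvar=False)
    sigma2 = np.cov(fake_feats, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):     # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```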
Models trained on MNIST don't do well for FID computation. As far as I can tell, the major reasons are that the data distribution is too narrow (the GAN images are too far from the distribution the model was trained on) and that the model is not deep enough to learn much feature variation.
Training a model with a few convolutional layers gives FID values on the order of 10^6. To test the above hypothesis, just adding L2 regularization dropped the FID values to around 3k (consistent with the data distribution being narrow); however, the FID values don't improve as GAN training goes on. :(
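"Adding L2 regularization" here would typically mean attaching kernel regularizers to the classifier's layers; the small architecture and the 1e-4 factor below are illustrative, not the exact model from this experiment:

```python
import tensorflow as tf

reg = tf.keras.regularizers.l2(1e-4)   # L2 penalty on the weights (factor is an assumption)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", kernel_regularizer=reg,
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu", kernel_regularizer=reg),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax", kernel_regularizer=reg),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```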
Finally, directly computing FID values with the Inception network gives a nice plot as the images get better.
[Note: you need to rescale the MNIST images and convert them to RGB by repeating the single channel 3 times. Make sure the real images and the generated images have the same intensity scales.]
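A minimal sketch of that preprocessing, assuming `images` is an (N, 28, 28, 1) float array in [0, 1] for both the real and the generated batches:

```python
import tensorflow as tf

def mnist_to_inception_input(images):
    x = tf.image.resize(images, (299, 299))   # upscale to InceptionV3's input resolution
    x = tf.repeat(x, repeats=3, axis=-1)      # repeat the single channel 3 times -> RGB
    # InceptionV3 preprocessing expects values in [0, 255] and maps them to [-1, 1]
    return tf.keras.applications.inception_v3.preprocess_input(x * 255.0)

# Pooled activations to feed into the FID computation
inception = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")
# feats = inception.predict(mnist_to_inception_input(images), batch_size=64)
```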
I trained a dataset for TensorFlow object detection using both an SSD and a Faster R-CNN model. There were 220 training and 30 test images in my dataset.
I trained the model for 200k steps and got the loss under 1, but when I tested the trained model on a video it was detecting and labelling almost everything in the video.
Can anyone tell me why is that happening?
Thank you
You are using only one class, and you trained your model on images belonging to that class and tested it on the same.
So the problem is that the model is skewed (it predicts the same thing for all images).
No matter what image you test it on, you will get the same output.
Solution:
Train your model with a nearly equal number of negative images.
Ex: 220 images containing the object to be identified (label them as 1) and roughly another 220 images not containing the object (label them as 0).
Use the F1 score to check your accuracy, because it will help you understand whether the dataset is skewed or not.
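For illustration, a small scikit-learn snippet with placeholder labels (1 = object present, 0 = negative image) shows how precision, recall and F1 expose a model that predicts the same class everywhere:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 0, 0, 1, 0]   # ground-truth labels (placeholder data)
y_pred = [1, 1, 1, 1, 1, 1]   # a skewed model that flags everything as the object

print("precision:", precision_score(y_true, y_pred))   # 0.5: half the detections are false positives
print("recall:   ", recall_score(y_true, y_pred))       # 1.0: it never misses, because it fires everywhere
print("F1:       ", f1_score(y_true, y_pred))           # ~0.67: the combined score reflects the skew
```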
Check this to learn about different kinds of accuracy measures.
Take this course to learn more about CNNs.
Training on large scale images:
I'm trying to train a vehicle detector on images with 4K resolution, with about 100 small vehicles per image (vehicle size about 100x100 pixels).
I'm currently using the full resolution, which costs a lot of memory. I'm training on 32 cores with 128 GB RAM. The current architecture is Faster R-CNN. I can train with a second-stage batch size of 12 and a first_stage_mini_batch_size of 50 (I scaled both down until the memory was sufficient).
I assume that I should increase the maximum number of RPN proposals. Which value would be appropriate?
Does this approach make sense?
Difficult, truncated, labels and poses:
I currently separated my dataset only into three classes (cars, trucks, vans).
I assume giving additional information like:
difficult (for mostly hidden vehicles), and
truncated (I currently did not select truncated objects, but I could)
would improve the training process.
Would truncated include overlapped vehicles?
Would additional information like views/poses and other labels also improve the training process, or would it make training harder?
Adding new data to the training set:
Is it possible to add new images and objects to the training and validation record files and automatically resume training from the latest checkpoint file in the training directory? Or is the "fine_tune_checkpoint" option with "from_detection_checkpoint" necessary?
Would it hurt if a random split of training and validation data picked different datasets than in the previous training?
For your problem, the out-of-the-box config files won't work so well due to the high resolutions of the images and the small cars. I recommend:
Training on crops --- cut your image into smaller crops, keeping the cars at roughly the same resolution as they are now.
Eval on crops --- at inference time, cut your image into a number of overlapping crops, and run inference on each of those crops. Usually people combine the detections across the multiple crops using non-max suppression. See slide 25 here for an illustration of this; a rough sketch of the approach follows this list.
I highly recommend training using a GPU or better yet, multiple GPUs.
Avoid tweaking the batch_size parameters to begin with --- they are set up to work quite well out of the box and changing them will often make it difficult to debug.
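A sketch of the eval-on-crops idea, assuming a hypothetical `detect_fn` that returns pixel-coordinate boxes and scores for a single crop; the crop size, stride and thresholds are placeholders:

```python
import numpy as np
import tensorflow as tf

CROP, STRIDE = 1024, 768   # overlapping 1024x1024 crops (assumed sizes)

def detect_on_crops(image, detect_fn, score_thresh=0.5, iou_thresh=0.5):
    h, w = image.shape[:2]
    boxes, scores = [], []
    for y in range(0, max(h - CROP, 0) + 1, STRIDE):
        for x in range(0, max(w - CROP, 0) + 1, STRIDE):
            crop = image[y:y + CROP, x:x + CROP]
            crop_boxes, crop_scores = detect_fn(crop)   # boxes as [y1, x1, y2, x2] in pixels
            for (y1, x1, y2, x2), s in zip(crop_boxes, crop_scores):
                if s >= score_thresh:
                    boxes.append([y1 + y, x1 + x, y2 + y, x2 + x])  # shift back to full-image coordinates
                    scores.append(s)
    if not boxes:
        return np.zeros((0, 4)), np.zeros((0,))
    boxes = np.array(boxes, dtype=np.float32)
    scores = np.array(scores, dtype=np.float32)
    # Merge overlapping detections from neighbouring crops
    keep = tf.image.non_max_suppression(boxes, scores, max_output_size=300,
                                        iou_threshold=iou_thresh).numpy()
    return boxes[keep], scores[keep]
```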
Currently the difficult/truncated/pose fields are not used during training, so including them won't make a difference.
I switched the evaluation and training data (in the config) and training continues as normal when started with exactly the same command.
there is a log line about restoring parameters from the last checkpoint
as soon as I switch the test/train data, mAP immediately shoots to the moon
the Images tab in TensorBoard gets updated
So it looks like changing the data works correctly. I'm not sure how it can affect the model; basically, it was pretrained without these examples and is now fine-tuned with them.
LOG:
INFO:tensorflow:Restoring parameters from /home/.../train_output/model.ckpt-3190
This results in train/test contamination, and the real model performance is likely lower than the one calculated on the contaminated validation dataset. You shouldn't worry about it too much unless you want to present well-defined results.
A real-life example from https://arxiv.org/abs/1311.2901:
The ImageNet and Caltech datasets have some images in common. When evaluating how well a model trained on ImageNet performs with Caltech as validation, you should remove the duplicates from ImageNet before training.