Training on large scale images:
I'm trying to train a vehicle detector on 4K-resolution images with about 100 small vehicles per image (vehicle size roughly 100x100 pixels).
I'm currently using the full resolution, which costs me a lot of memory. I'm training on 32 cores and 128 GB RAM. The current architecture is Faster R-CNN. I can train with a second_stage_batch_size of 12 and a first_stage_mini_batch_size of 50 (I scaled both down until my memory was sufficient).
I assume that I should increase the maximum number of RPN proposals. What value would be appropriate?
Does this approach make sense?
Difficulty, truncated, labels and poses:
I have currently separated my dataset into only three classes (cars, trucks, vans).
I assume that providing additional information such as:
difficult (for mostly hidden vehicles), and
truncated (I currently did not label truncated objects, but I could)
would improve the training process.
Would truncated include overlapped vehicles?
Would additional information like views/poses and other labels also improve the training process, or would it make training harder?
Adding new data to the training set:
Is it possible to add new images and objects into the training and validation record files and automatically resume the training using the latest checkpoint file from the training directory? Or is the option "fine_tune_checkpoint" with "from_detection_checkpoint" necessary?
Would it hurt if a random split of training and validation data produced different sets than in the previous training run?
For your problem, the out-of-the-box config files won't work so well due to the high resolution of the images and the small cars. I recommend:
Training on crops --- cut your images into smaller crops, keeping the cars at roughly the same resolution as they are now.
Eval on crops --- at inference time, cut your image into a bunch of overlapping crops and run inference on each of those crops. People usually combine the detections across the multiple crops using non-max suppression (a sketch of this is included after these recommendations). See slide 25 here for an illustration of this.
I highly recommend training using a GPU or better yet, multiple GPUs.
Avoid tweaking the batch_size parameters to begin with --- they are set up to work quite well out of the box and changing them will often make it difficult to debug.
Currently the difficult/truncated/pose fields are not used during training, so including them won't make a difference.
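A minimal sketch of the eval-on-crops idea, assuming a detect(crop) function (a placeholder for your trained detector) that returns boxes as [x1, y1, x2, y2] arrays in crop coordinates together with scores; the tile size and overlap are also assumptions to tune:

```python
import numpy as np

def tile_image(image, tile=1024, overlap=256):
    """Yield (x0, y0, crop) tiles covering the full image, with overlap so
    vehicles on tile borders are fully contained in at least one tile."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression over [x1, y1, x2, y2] boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter + 1e-9)
        order = order[1:][iou < iou_thresh]
    return keep

def detect_full_image(image, detect):
    """Run the detector on overlapping crops and merge the results with NMS."""
    all_boxes, all_scores = [], []
    for x0, y0, crop in tile_image(image):
        boxes, scores = detect(crop)                          # boxes in crop coordinates
        all_boxes.append(boxes + np.array([x0, y0, x0, y0]))  # shift to full-image coordinates
        all_scores.append(scores)
    boxes, scores = np.concatenate(all_boxes), np.concatenate(all_scores)
    keep = nms(boxes, scores)
    return boxes[keep], scores[keep]
```

Choose the overlap to be at least the size of the largest object (here about 100 px), so every vehicle falls fully inside at least one tile.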
I switched the evaluation and training data (in the config), and training continues as normal with exactly the same command that started it:
there is a log message about restoring parameters from the last checkpoint,
as I switch the test/train data, mAP immediately shoots to the moon,
the Images tab in TensorBoard gets updated.
So it looks like changing the data works correctly. I'm not sure how it can affect the model; basically, it is pretrained without these examples and fine-tuned with them.
LOG:
INFO:tensorflow:Restoring parameters from /home/.../train_output/model.ckpt-3190
This results in train/test contamination, and the real model performance is likely to be lower than the one calculated on the contaminated validation dataset. You shouldn't worry that much unless you want to present well-defined results.
A real-life example from https://arxiv.org/abs/1311.2901:
The ImageNet and Caltech datasets have some images in common. When evaluating how well a model trained on ImageNet performs with Caltech as the validation set, you should remove the duplicates from ImageNet before training.
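Purely as an illustration (not from the paper), a simple way to catch exact duplicates between a training folder and a validation folder is to hash the file contents; the folder names and the *.jpg pattern are placeholders, and near-duplicates would need perceptual hashing instead:

```python
import hashlib
from pathlib import Path

def file_hashes(folder, pattern="*.jpg"):
    """Map content hash -> file path for every matching image in the folder."""
    hashes = {}
    for path in Path(folder).rglob(pattern):
        hashes[hashlib.md5(path.read_bytes()).hexdigest()] = path
    return hashes

train_hashes = file_hashes("imagenet_train")   # placeholder paths
val_hashes = file_hashes("caltech_val")

# Any hash present in both sets is an exact duplicate; drop it from training.
for digest in set(train_hashes) & set(val_hashes):
    print("duplicate, remove from training:", train_hashes[digest])
```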
Related
I have a dataset of about 400k images (for training only; separate datasets for validation and testing).
If I train the model on 30k images (after grayscaling and cropping to 256x64 pixels),
it takes about 25 GB of RAM, sometimes spiking up to 30 GB while fitting the model.
If, let's say, 30k images consume about 25 GB of RAM on average (feasible for me), I would need roughly 333 GB of RAM, and maybe more while fitting the model, which is not feasible on a local run. I also don't want to rely on cloud computing (I think Google gives a 90-day $300 free trial, but I want a solution that works longer) because I'm broke.
So the question is: is there any way to train a model by slicing the training data and repeating a load-train-unload cycle (see the sketch below)?
I was thinking of using libraries like creme, through which data can be streamed, but I don't think it works for my project.
Also, if there is a solution, could you please share a link to an article or guide (if possible)?
A few lines of example code would be great too.
Thank you for your time.
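A minimal sketch of the load-train-unload idea, assuming a Keras-style model (since the question mentions fitting a model) and images stored as individual files on disk; the paths, labels, image size, and model are placeholders:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

class DiskBatchLoader(tf.keras.utils.Sequence):
    """Keeps only one batch in memory at a time instead of all 400k images."""

    def __init__(self, image_paths, labels, batch_size=64):
        self.image_paths = image_paths
        self.labels = np.asarray(labels)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        paths = self.image_paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_labels = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        images = []
        for p in paths:
            # Grayscale and resize, matching the 256x64 preprocessing described above.
            img = Image.open(p).convert("L").resize((256, 64))
            images.append(np.asarray(img, dtype=np.float32)[..., None] / 255.0)
        return np.stack(images), batch_labels

# Usage (train_paths, train_labels, and model are placeholders):
# model.fit(DiskBatchLoader(train_paths, train_labels), epochs=10)
```

If you are not using Keras, a PyTorch Dataset that opens the file inside __getitem__, wrapped in a DataLoader, achieves the same streaming behavior.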
I am currently training a CNN to classify ICs into the classes "scratch" and "no scratch" (binary classification). I am fairly new to deep learning, and when I trained my CNN for a bit I got very good accuracies (good validation accuracy as well). But I quickly learned that my models were not as good as I thought, because when I used them on a test dataset I got quite a lot of misclassifications (false positives and false negatives). In my opinion there are two problems:
There is too little training data (about 1000 images per class).
The ICs have markings (text) on them, which change with every production batch, so my training data has images of ICs with varying markings. And since some batches have more scratched ICs and others have fewer or none, the number of IC images with the different markings is unbalanced.
Here are two example images of ICs from the training set of the class "scratch":
As you can see, the text varies strongly: every line has different characters, and the number of characters also varies.
I ask myself how the CNN should be able to differentiate between a scratch and a character.
Nevertheless, I am trying to train my CNN, and this is, for example, one model I currently trained (the other models look quite similar):
There are some points during training where the validation accuracy goes up and then down again. What could that mean? I suspect there is a feature in the validation set that is not covered in my training set. Could this be the cause?
As you see, data augmentation is not an option (or so I think) because of the text. One thing that came to my mind is to separate the marking from the IC (cut out the text region) with preprocessing (I don't know how I could do that properly and fast) and then classify them separately (see the sketch below), but I don't know if this would be the right approach.
I first used VGG16, ResNet, and InceptionV3 (with transfer learning). Now I have tried to train my own custom CNN (inspired by VGG but with 10 layers, similar to this: https://journals.sagepub.com/doi/full/10.1177/1558925019897396).
Do you guys know how I should approach this problem or do you have any tips?
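If you do experiment with cutting out the text region, a very rough sketch of the idea (purely illustrative, and assuming the marking sits in a roughly fixed region of the package, which may not hold for your images) is to blank that region out before classification:

```python
import numpy as np

def mask_text_region(image, y0, y1, x0, x1):
    """Replace the marking area with the image's mean intensity so the
    classifier cannot key on the printed characters. The coordinates are
    placeholders; in practice they would come from your package layout
    or from a text detector."""
    masked = image.copy()
    masked[y0:y1, x0:x1] = image.mean()
    return masked
```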
I'm new to neural networks, and I am doing a project in which I have to define a NN and train it. I've defined a NN with 2 hidden layers of 17 units each; the network has 21 inputs and 3 outputs.
I have a dataset of 10 million labels and a dataset of another 10 million samples. My first issue is about the size of the validation set and the training set. I'm using PyTorch and batches, and from what I've read, the batches shouldn't be too large. But I don't know approximately how big the sets should be.
I've tried larger and smaller numbers, but I cannot find a correlation that tells me whether I'm right to choose a large or a small set for either of them (apart from the time it takes to process a very large set).
My second issue is about the training and validation loss, which, I've read, can tell me whether I'm overfitting or underfitting depending on which one is bigger. Ideally both should have the same value, and it also depends on the epochs. But I am not able to tune the network parameters like the batch size, the learning rate, or how much data I should use for training and validation. If I use 80% of the set (8 million), it takes hours to finish, and I'm afraid that if I choose a smaller dataset, it won't learn.
If anything is badly explained, please feel free to ask me for more information. As I said, the data is given, and I only have to define the network and train it with PyTorch.
Thanks!
For your first question about batch size, there is no fixed rule for what value it should have. You have to try and see which one works best. When your NN starts performing badly, don't go above or below that value for the batch size. There is no hard rule to follow here.
For your second question: first of all, having the same training and validation loss doesn't mean your NN is performing well; it is just an indication that its performance will be good enough on a test set if that is the case, and even that largely depends on many other things, like your train and test set distributions.
And with NNs you need to try as many things as you can: different parameter values, train and validation split sizes, etc. You cannot just assume that something won't work.
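A minimal PyTorch sketch of the split-and-batch setup being discussed; the 80/20 split and the batch size of 256 are placeholders to sweep as suggested above, and the random tensors stand in for the real 10-million-sample dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data standing in for the real samples (21 features) and labels (3 outputs).
samples = torch.randn(10_000, 21)
targets = torch.randn(10_000, 3)
dataset = TensorDataset(samples, targets)

# 80/20 train/validation split; the ratio itself is something to experiment with.
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])

# Batch size is a hyperparameter to sweep; larger batches run faster per epoch
# but may generalize differently.
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
val_loader = DataLoader(val_set, batch_size=256)

for x, y in train_loader:
    pass  # forward pass, loss, backward pass, and optimizer step go here
```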
I am trying to transfer-learn a mobilenet_v2_coco model on the publicly available GTSRB (German Traffic Signs) dataset.
I selected 3 classes to get a faster training time, and I've already trained for about 10,000 epochs. Usually I already get decent results at this point. But my SSD fails to find anything on a livestream video I access through a small Python program with my webcam. It even classifies almost the entire screen as one of the provided classes (the one that has the most training data) with >90% confidence.
My guess is that this is either because of the unbalanced dataset (class1 = 2000 images, class2 = 1000 images, class3 = 800) or because the images are almost completely filled with the object, without much background or noise. So basically the ROI is almost as big as the dataset image, but the detector is meant to predict on dash-cam-like videos, where the signs are usually very small.
Or do I just have to train harder and longer this time to get decent results?
The second part of my question is whether there is a rule of thumb for what the images in the dataset need to fulfil to produce good predictions.
https://github.com/wenxinxu/resnet-in-tensorflow#overall-structure
The link above is a ResNet model for CIFAR-10.
I am modifying the above code to do object detection using ResNet, with CIFAR-10 as the training/validation dataset (I know the dataset is meant for classification). I know that it sounds strange, but hear me out: I use CIFAR-10 for training and validation, then during testing I use a sliding-window approach and classify each window into one of the 10 classes + a "background" class (a rough sketch of this is included below).
For the background class, I used images from ImageNet. I searched ImageNet with the following keywords: construction, landscape, byway, mountain, sky, ocean, furniture, forest, room, store, carpet, and floor. Then I cleaned out bad images as much as I could, including images that contain CIFAR-10 classes; for example, I deleted a few "floor" images that had dogs in them.
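A rough sketch of the sliding-window classification described above, assuming a classify_patch function (a placeholder for the trained ResNet) that returns class probabilities for a 32x32 patch, with class index 10 as background; the window size, stride, and score threshold are assumptions:

```python
import numpy as np

def sliding_window_detect(image, classify_patch, window=32, stride=16,
                          bg_class=10, score_thresh=0.9):
    """Slide a 32x32 window over the image, classify each patch into the
    10 CIFAR-10 classes + background, and keep confident non-background hits."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            probs = classify_patch(image[y:y + window, x:x + window])
            cls = int(np.argmax(probs))
            if cls != bg_class and probs[cls] >= score_thresh:
                detections.append((x, y, x + window, y + window, cls, float(probs[cls])))
    return detections
```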
I am currently running this on FloydHub. The total number of steps I am running is 60,000, which is where the "training curve" section from the link above suggests the result starts to consolidate and does not converge further (I have run this code myself and can back up that claim).
My question is:
What is the cause of the sudden step down in the training and validation curves, which occurs at about the same step?
What if (or is it possible that) the training and validation curves don't converge in a step-like fashion at about the same step? What I mean is, for example, training steps down at around 40,000 while validation just converges smoothly, with no step-down?
The sudden step down is caused by the learning rate decay happening at 40k steps (you can find this parameter in hyper_parameters.py). The learning rate suddenly gets divided by 10, which allows you to tune the parameters more precisely, and in this case that improves your performance a lot. You still need the first part, with a fairly large learning rate, to get into a "good" region for your parameters; then the part with a 10x smaller learning rate refines them and finds a very good spot within that region (a small sketch of this schedule is included below).
This would be surprising, since there is a clear difference between before and after 40k steps that affects training and validation in the same way. You could still see different behaviors from that point on: for instance, you might start overfitting because of a too-small LR and see your training error drop while validation error goes up, because the refinements you are making are too specific to the training data.
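For reference, the step decay described above corresponds to a schedule like the following; the base learning rate and decay step shown here are illustrative, not copied from hyper_parameters.py:

```python
def learning_rate(step, base_lr=0.1, decay_step=40_000, decay_factor=0.1):
    """Piecewise-constant schedule: keep a large LR to reach a good region of
    parameter space, then divide it by 10 at the decay step to refine."""
    return base_lr if step < decay_step else base_lr * decay_factor

print(learning_rate(39_999))  # large LR before the decay step
print(learning_rate(40_000))  # 10x smaller afterwards, producing the sudden drop in error
```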