Object detection model's performance jumping up and down - python

I am training a model to detect buildings in satellite images of rural Africa. For labels, I use OpenStreetMap geometries. I use the TensorFlow Object Detection API with SSD Inception V2 as the model and the default config file. I trained separate models on two different datasets (in different geographical regions). In one area, the model behaves as I would expect:
When training the model on the other area, however, its performance jumps up and down:
Note that I use the exact same model, configuration, batch size, a training area of the same size, etc. In the second case the model's predictions change extremely rapidly and I cannot see why. For example, here is a comparison of the predictions the model makes at 107k and 108k global steps (i.e. I'd expect the predictions to be similar):
I am quite new to deep learning and cannot understand why this might happen. It might be something simple that I am overlooking. I've checked the labels and they are fine. I also thought that it could be a bad batch that pushes the training in the wrong direction every epoch, but this is not the case: the performance drops like that for up to several epochs.
I would be very grateful for any tips on where to look, etc. I am using TF 1.14.
Let me know if I should provide more information.

Related

Is the performance of a deep learning model affected if it has "seen" the same test images before?

I am working with the YOLOv3 model for an object detection task. I am using pre-trained weights that were generated for the COCO dataset, however, I have my own data for the problem I am working on. According to my knowledge, using those trained weights as a starting point for my own model should not have any effect on the performance of the model once it is trained on an entirely different dataset (right?).
My question is: will the model give "honest" results if I train it multiple times and test it on the same test set each time, or would it have better performance since it has already been exposed to those test images during an earlier experiment? I've heard people say things like "the model has already seen that data", does that apply in my case?
For hyper-parameter selection, evaluation of different models, or evaluating during training, you should always use a validation set.
You are not allowed to use the test set until the end!
The whole purpose of test data is to get an estimate of the performance after deployment. When you use it during training, to train or to evaluate your model, you expose that data. For example, based on the accuracy on the test set, you decide to increase the number of layers.
Now your performance on the test set will improve. However, it comes at a price!
Your estimate on the test set becomes biased, and you will no longer be able to use that estimate to talk about the data your model sees after deployment.
For example, say you want to train an object detector for self-driving cars and you exposed the test set during training. You then cannot use the accuracy on the test set to talk about the performance of the object detector when you put it in a car and sell it to a customer.
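To make this concrete, here is a minimal sketch (using scikit-learn's train_test_split; the array names X and y are placeholders) of holding out a test set that is touched only once at the very end, while a separate validation set is used for all model selection:

from sklearn.model_selection import train_test_split

# X, y: your full dataset (placeholder names)
# First split off a test set that is never used during development.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# Split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15, random_state=42)

# Tune hyper-parameters and compare models using X_val / y_val only.
# Evaluate on X_test / y_test exactly once, at the very end.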
There is an old saying related to this matter:
If you torture the data enough, it will confess.

In Leave One Out Cross Validation, How can I Use `shap.Explainer()` Function to Explain a Machine Learning Model?

Background of the Problem
I want to explain the outcome of machine learning (ML) models using SHapley Additive exPlanations (SHAP), which is implemented in the shap library of Python. As a parameter of the function shap.Explainer(), I need to pass an ML model (e.g. XGBRegressor()). However, in each iteration of Leave One Out Cross Validation (LOOCV), the ML model will be different, because in each iteration I am training on a different dataset (one participant's data is left out). The model will also be different because I am doing feature selection in each iteration.
Then, My Question
In LOOCV, how can I use the shap.Explainer() function of the shap library to explain a machine learning model? I have checked several tutorials (e.g. this one, this one) and also several questions (e.g. this one) on SO, but I could not find an answer to this problem.
Thanks for reading!
Update
I know that in LOOCV, the model fitted in each iteration can be explained by shap.Explainer(). However, as there are 250 participants' data, if I apply SHAP to each model there will be 250 outputs! Thus, I want to get a single output that summarizes the 250 models.
You seem to be training a model on 250 data points while doing LOOCV. LOOCV is about choosing a model, and its hyperparameters, that will ensure the best generalization ability.
Model explanation is different from training in that you don't sift through different sets of hyperparameters (note, 250-fold LOOCV is already overkill; would you do that with 250,000 rows?); rather, you are trying to understand which features influence the output, in what direction, and by how much.
Training has its own limitations (availability of data, whether new data resembles the data the model was trained on, whether the model is good enough to pick up the peculiarities of the data and generalize well, etc.), but don't overestimate the explanation exercise either. It's still just an attempt to understand how inputs influence outputs. You may be willing to average 250 different matrices of SHAP values, but do you expect the result to be much different from a single random train/test split?
Note as well:
However, in each iteration of the Leave One Out Cross Validation (LOOCV), the ML model will be different as in each iteration, I am training on a different dataset (1 participant’s data will be different).
In each iteration of LOOCV the model is still the same (same features; the hyperparameters may differ, depending on your definition of an iteration). It's still the same dataset (same features).
Also, the model will be different as I am doing feature selection in each iteration.
It doesn't matter. Feed the resulting model to the SHAP explainer and you'll get what you want.
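If you still want a single picture across all 250 folds, one common workaround is to explain each held-out participant with that fold's model and concatenate the resulting SHAP values before plotting. The sketch below assumes the same feature set in every fold (so it ignores the per-fold feature selection mentioned in the question) and an XGBRegressor on a pandas DataFrame X with one row per participant:

import numpy as np
import pandas as pd
import shap
from sklearn.model_selection import LeaveOneOut
from xgboost import XGBRegressor

def loocv_shap(X, y):
    # Fit one model per LOOCV fold and collect the SHAP values of each held-out row.
    shap_rows, held_out_rows = [], []
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = XGBRegressor().fit(X.iloc[train_idx], y.iloc[train_idx])
        explainer = shap.Explainer(model)
        sv = explainer(X.iloc[test_idx])          # SHAP values for the single held-out participant
        shap_rows.append(sv.values)
        held_out_rows.append(X.iloc[test_idx])
    return np.vstack(shap_rows), pd.concat(held_out_rows)

# shap_matrix, X_held_out = loocv_shap(X, y)      # X, y: the 250-participant data
# shap.summary_plot(shap_matrix, X_held_out)      # one summary plot across all folds

This gives one explained row per participant in a single summary plot, which is the kind of single output asked for in the update.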

Is it possible to train model from multiple datasets for each class?

I'm pretty new to object detection. I'm using the TensorFlow Object Detection API with model_main.py to train my model, and I'm now collecting datasets for my project.
I have found and converted two quite large annotated datasets, one of cars and one of traffic lights, and made two tfrecords from them.
Now I want to fine-tune a pretrained model, but I'm curious whether this will work. An image such as "001.jpg" will of course have some annotated bounding boxes for cars (it is from the car dataset), but if there is a traffic light in it as well, that won't be annotated. Will this lead to bad learning? (There can be many of these "problematic" images.) How should I handle this? Is there any workaround? (I really don't want to annotate the images again.)
If it's a stupid question I'm sorry, and thanks for any response. Some links about this problem would be the best!
Thanks!
The short answer is yes, it might be problematic, but with some effort you can make it possible.
If you have two urban datasets, and in one you only have annotations for traffic lights while in the second you only have annotations for cars, then each instance of a car in the first dataset will be learned as a negative example, and each instance of a traffic light in the second dataset will be learned as a negative example.
The two possible outcomes I can think of are:
The model will not converge, since it tries to learn opposite things.
The model will converge, but will be domain specific. This means that the model will only detect traffic lights on images from the domain of the first dataset, and cars on the second.
In fact I tried doing so myself in a different setup, and got this outcome.
In order to meet your objective of learning traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function which dataset each image comes from, and then only compute the loss on the corresponding classes (i.e. zero out the loss on the classes that do not correspond to it). So, returning to our example, you only compute the loss and backpropagate for traffic lights on the first dataset, and for cars on the second.
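A rough illustration of that masking idea (a generic TensorFlow sketch with made-up tensor names, not the actual loss code of the TF Object Detection API):

import tensorflow as tf

def masked_classification_loss(per_class_loss, dataset_id):
    # per_class_loss: [batch, 2] class-wise loss, class 0 = car, class 1 = traffic light
    # dataset_id:     [batch] int tensor, 0 = car dataset, 1 = traffic-light dataset
    car_mask = tf.constant([1.0, 0.0])     # car dataset: only the car class is annotated
    light_mask = tf.constant([0.0, 1.0])   # traffic-light dataset: only lights are annotated
    is_car_ds = tf.cast(tf.equal(dataset_id, 0), tf.float32)[:, tf.newaxis]
    mask = is_car_ds * car_mask + (1.0 - is_car_ds) * light_mask      # [batch, 2]
    # Zero out the loss on classes that were not annotated in the image's source dataset.
    return tf.reduce_sum(per_class_loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)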
For completeness I will add that, if resources are available, the better option is to annotate all the classes on all datasets to avoid the suggested modification, since by only backpropagating certain classes you do not benefit from actual negative examples for the other classes.

Using machine learning to detect images based on single learning image

I have a use case with about 300 images of 300 different items (one image per item). I need machine learning to detect an item about once a minute.
I've been using Keras with a Sequential model to classify images, but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for training.
So in short:
1) Can you do machine learning image detection with only one training image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts with a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate its class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick up on.
This entirely defeats the use case of deep learning and model training.
If you only train on that image once, the model probably won't be able to detect it yet. If you train on it more, it will probably overfit and only recognize that exact image. If that is what you are trying to do, then you should instead write an algorithm that searches the screen for that image (it will be more efficient).
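If the goal really is just to find a fixed image again, one way to implement that "search the screen for that image" idea is plain template matching. The sketch below uses OpenCV, which is my own choice rather than something stated in the answer, and the file names are placeholders:

import cv2

screen = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)   # the image to search in
template = cv2.imread("item.png", cv2.IMREAD_GRAYSCALE)       # the single reference image

result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:                        # arbitrary similarity threshold
    print("Found item at", max_loc, "with score", max_val)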
1) You'll probably have problems with the generalization of your model because of the lack of training data. In other words, your model will not "learn" about that class.
2) It's good to have a larger training set in order to create a better model.

validation and training don't converge at the same time, but validation still converges

https://github.com/wenxinxu/resnet-in-tensorflow#overall-structure
The link above is the Resnet model for cifar10.
I am modifying the above code to do object detection using ResNet, with CIFAR-10 as the training/validation dataset. (I know the dataset is meant for object classification; it sounds strange, but hear me out.) I use CIFAR-10 for training and validation; then, during testing, I use a sliding-window approach and classify each window into one of the 10 classes plus a "background" class.
For the background class, I use images from ImageNet. I search ImageNet with the following keywords: construction, landscape, byway, mountain, sky, ocean, furniture, forest, room, store, carpet, and floor. Then I clean out bad images as much as I can, including images that contain CIFAR-10 classes; for example, I delete a few "floor" images that have dogs in them.
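For reference, the sliding-window testing could look roughly like the sketch below (the window size, stride, and the classify_patch function are placeholders, not the actual code from the repository):

def sliding_window_predict(image, classify_patch, window=32, stride=16):
    # Slide a window over the image and classify each patch into one of the
    # 10 CIFAR-10 classes or the extra "background" class.
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            label, score = classify_patch(patch)   # e.g. the trained ResNet classifier
            if label != "background":
                detections.append((x, y, window, window, label, score))
    return detections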
I am currently running this on FloydHub. The total number of steps I am running is 60,000, which is where the "Training curve" section from the link above suggests that the result starts to consolidate and does not converge further (I have run this code myself and can back up that claim).
My question is:
What is the cause of the sudden step down in the training and validation curves, which occurs at about the same step?
What if (or is it possible that) the training and validation curves don't converge in a step-like fashion at about the same step? What I mean is, for example, that training steps down at around 40,000 while validation just converges smoothly, with no step down.
The sudden step down is caused by the learning rate decay happening at 40k steps (you can find this parameter in hyper_parameters.py). The learning rate suddenly gets divided by 10, which allows you to tune the parameters more precisely, and in this case improves your performance a lot. You still need the first part, with a pretty big learning rate, to get into a "good" area for your parameters; then the part with a 10x smaller learning rate will refine it and find a very good spot in that area.
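For reference, that kind of schedule can be written as a simple piecewise-constant function of the global step (a generic sketch, not the exact code from hyper_parameters.py):

def learning_rate(step, base_lr=0.1, boundaries=(40000,), decay_factor=0.1):
    # Divide the learning rate by 10 each time a decay boundary is passed.
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= decay_factor
    return lr

# learning_rate(39999) -> 0.1, learning_rate(40000) -> 0.01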
This would be surprising, since there is a clear difference between before and after 40k steps that affects training and validation in the same way. You could still see different behaviors from that point on: for instance, you might start overfitting because of a too-small learning rate and see your training error drop while the validation error goes up, because the refinements you're making are too specific to the training data.
