I have trained a TensorFlow object detection model (num_steps: 50000) using SSD (MobileNet-v1) on a custom dataset. I got mAP@.50IOU ~0.98 and loss ~1.17. The dataset consists of UNO playing card images (Skip, Reverse, and Draw Four). The model performs pretty well on all of these cards, as I trained it only on these 3 cards (around 278 images with 829 bounding boxes, 25% of the bounding boxes used for testing, i.e. validation), collected using a mobile phone.
However, I haven't trained the model on any other card, yet it still detects other cards (inference using a webcam).
How can I fix this? Should I also collect images of an Other class (anything other than the Skip, Reverse, and Draw Four cards) and ignore this class in operation, so that the model sees this class (label: Other) during training and doesn't put any label on such cards during inference?
I am not sure how to tell the TensorFlow Object Detection API that it should ignore images from the Other class.
Can anyone please provide a pointer?
Please share your views!
Yes, you need to have another class for the objects you don't want to detect.
If you don't have this Other class, which includes everything that is not to be detected, the model will compare unknown cards against the existing classes, which are almost identical to the cards of interest.
Some of the factors are:
Similarity of Shape
Similarity of Color
Similarity of Symbols
This is why, even though a card is not one of the cards of interest (Skip, Reverse, and Draw Four), it can still have high "belongingness" to these three classes.
Having another class to dump all of these into can significantly lessen that "belongingness" to the three classes of interest; as much as possible, provide a lot of data for it during training.
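For illustration, a minimal label_map.pbtxt along these lines might look as follows (ids and names are placeholders; the point is the extra catch-all entry):

```
item {
  id: 1
  name: 'skip'
}
item {
  id: 2
  name: 'reverse'
}
item {
  id: 3
  name: 'draw_four'
}
item {
  id: 4
  name: 'other'  # catch-all for everything you do not want reported
}
```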
If you don't want to add another class, you could instead overfit on the Skip, Reverse, and Draw Four cards (to close to 100% confidence), then raise your detection threshold (to 70-90%).
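Either way, a rough post-processing sketch of these two ideas (the class id and threshold are assumptions to adapt to your setup, not part of the TF Object Detection API):

```python
import numpy as np

OTHER_CLASS_ID = 4     # id of the catch-all "other" class in your label map
SCORE_THRESHOLD = 0.8  # raise this if false positives persist

def filter_detections(boxes, classes, scores):
    """Keep only confident detections that are not the 'other' class.

    boxes: (N, 4), classes: (N,), scores: (N,), as returned per image.
    """
    keep = (scores >= SCORE_THRESHOLD) & (classes != OTHER_CLASS_ID)
    return boxes[keep], classes[keep], scores[keep]

# Example with dummy arrays: only the first detection survives.
boxes = np.zeros((3, 4))
classes = np.array([1, 4, 2])
scores = np.array([0.95, 0.90, 0.40])
print(filter_detections(boxes, classes, scores))
```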
Hope this will help you.
I'm pretty new to object detection. I'm using the TensorFlow Object Detection API, and I'm now collecting datasets for my project and using model_main.py to train my model.
I have found and converted two quite large datasets, of cars and of traffic lights, with annotations, and made two tfrecords from them.
Now I want to train a pretrained model; however, I'm curious whether it will work. An image, say "001.jpg", will of course have some annotated bounding boxes for cars (it comes from the car dataset), but if there is a traffic light in it as well, that traffic light won't be annotated. Will this lead to bad learning? (There can be many of these "problematic" images.) How should I handle this? Is there a workaround? (I really don't want to annotate the images again.)
If it's a stupid question, I'm sorry. Thanks for any response; some links on this problem would be best!
Thanks!
The short answer is yes, it might be problematic, but with some effort you can make it work.
If you have two urban datasets, and in one you only have annotations for traffic lights while in the second you only have annotations for cars, then each instance of a car in the first dataset will be learned as a false (background) example, and each instance of a traffic light in the second dataset will be learned as a false example.
The two possible outcomes I can think of are:
The model will not converge, since it tries to learn opposite things.
The model will converge, but will be domain specific. This means that the model will only detect traffic lights on images from the domain of the first dataset, and cars only on images from the second.
In fact I tried doing so myself in a different setup, and got this outcome.
In order to learn to detect both traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function which dataset each image comes from, and then only compute the loss on the corresponding classes (i.e., zero out the loss on the classes that do not correspond to it). So, returning to our example, you only compute loss and backpropagate for traffic lights on the first dataset, and for cars on the second.
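A minimal sketch of that masking (names and shapes are assumptions, not a drop-in for the TF Object Detection API's internals):

```python
import tensorflow as tf

NUM_CLASSES = 2  # class 0 = traffic light, class 1 = car

# Row d marks which class columns carry real supervision for dataset d:
# dataset 0 annotates only traffic lights, dataset 1 annotates only cars.
DATASET_CLASS_MASK = tf.constant([[1.0, 0.0],
                                  [0.0, 1.0]])

def masked_classification_loss(logits, labels, dataset_ids):
    """logits, labels: (batch, anchors, NUM_CLASSES); dataset_ids: (batch,) ints."""
    per_class = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                        logits=logits)
    mask = tf.gather(DATASET_CLASS_MASK, dataset_ids)  # (batch, NUM_CLASSES)
    mask = mask[:, tf.newaxis, :]                      # broadcast over anchors
    # Unannotated classes contribute zero loss, so their missing boxes
    # are never punished as false examples.
    return tf.reduce_sum(per_class * mask)
```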
For completeness I will add that if resources are available, then the better option is to annotate all the classes on all datasets in order to avoid the suggested modification, since by only backpropagating certain classes, you do not get to use actual false examples for the other classes.
I have a use case where I have about 300 images of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with a Sequential model to detect images, but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for training.
So in short:
1) Can you do machine learning image detection with one training image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts as a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to train on that one image only once, the model probably wouldn't be able to detect it yet. If you train on it more, it will probably overfit and only recognize that one image. If that is what you are trying to do, then you should instead use an algorithm that searches the screen for that image (it will be more efficient).
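For instance, a classic template-matching sketch with OpenCV (file names and the threshold are placeholders):

```python
import cv2

screen = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("item_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the screen and score the match at every position.
result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:  # similarity threshold, tune for your images
    h, w = template.shape
    print(f"Found at {max_loc}, size {w}x{h}, score {max_val:.2f}")
else:
    print("Not found")
```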
1) You'll probably have problems with the generalization of your model because of the lack of training data. In other words, your model will not "learn" about each class.
2) It's good to have a larger training set in order to create a better model.
I am working on a system to simplify our image library, which grows by anywhere from 7k to 20k new pictures per week. The specific application is identifying which race cars are in pictures (all cars are similar shapes with different paint schemes). I plan to use Python and TensorFlow for this portion of the project.
My initial thought was to use image classification to classify the image by car; however, there is a very high probability of a picture containing multiple cars. My next thought is to use object detection to detect the car numbers (present in fixed locations on all cars [nose, tail, both doors, and roof] and in a consistent font week to week). Lastly, there is the approach of object recognition of the whole car. This, on the surface, seems the most practical; however, the paint schemes change often enough that it may not be.
Which approach will give me the best results? I have pulled a large number of images out for training, and obviously the different methods require very different training datasets.
The best approach would be to use all 3 methods as an ensemble: train all 3 of those models and pass the input image to all 3 of them. Then there are several ways to evaluate the outputs:
You can sum up the probabilities for all of the classes across all 3 models and then draw a conclusion based on the highest probability.
You can get a prediction from every model and decide based on the number of votes, e.g. model 1 -> class1, model 2 -> class2, model 3 -> class2 ==> class2.
You can do something like weighted decision making. Say the first model is the best and most robust one, but you don't trust it 100% and want to see what the other models say. Then you can weight the output of the first model by 0.6, and the outputs of the other two models by 0.2 each.
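A small sketch of the weighted variant (the weights and probability vectors are assumptions; each vector would come from one of your three models):

```python
import numpy as np

WEIGHTS = (0.6, 0.2, 0.2)  # trust model 1 most, models 2 and 3 less

def weighted_vote(probs_1, probs_2, probs_3):
    """Each argument is one model's class-probability vector for an image."""
    combined = (WEIGHTS[0] * np.asarray(probs_1)
                + WEIGHTS[1] * np.asarray(probs_2)
                + WEIGHTS[2] * np.asarray(probs_3))
    return int(np.argmax(combined))  # index of the winning class

# Models 2 and 3 prefer class 1, but the trusted model 1 wins: prints 0.
print(weighted_vote([0.7, 0.3], [0.4, 0.6], [0.3, 0.7]))
```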
I hope this helps :)
I'm building a CNN that will tell me if a person has brain damage. I'm planning to use the TF Inception v3 model and the build_image_data.py script to build the TFRecords.
The dataset is composed of brain scans. Every scan has about 100 images (different head poses, angles). On some images the damage is visible, but on some it is not. I can't label all images from a scan as damage-positive (or negative), because some of them would be labeled wrongly (e.g., if the scan is positive for damage, but the damage is not visible in a specific image).
Is there a way to label the whole scan as positive/negative and train the network that way?
And after training is done, pass a whole scan (not a single image) as input to the network and classify it?
It looks like multiple instance learning might be your approach. Check out these two papers:
Multiple Instance Learning Convolutional Neural Networks for Object Recognition
Classifying and segmenting microscopy images with deep multiple instance learning
The last one is implemented by @dancsalo (not sure if he has a Stack Overflow account) here.
It looks like the second paper deals with very large images and breaks them into sub-images, but labels the entire image. So it is like labeling a bag of images with one label instead of having to label each sub-image. In your case, you might be able to construct a matrix of images, i.e. a 10 image x 10 image master image for each of the scans...
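As a minimal Keras sketch of the bag-labeling idea (shapes and layer sizes are assumptions, not the papers' architectures):

```python
import tensorflow as tf

# Shared per-image encoder producing a single "damage" logit per image.
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])

# One training example = one whole scan of ~100 images (a "bag").
scan = tf.keras.Input(shape=(100, 128, 128, 1))
per_image = tf.keras.layers.TimeDistributed(encoder)(scan)   # (batch, 100, 1)
# MIL max pooling: a scan is positive if its most damaged-looking image is,
# so the scan-level label never forces a label onto individual images.
bag_logit = tf.keras.layers.GlobalMaxPooling1D()(per_image)  # (batch, 1)
bag_prob = tf.keras.layers.Activation("sigmoid")(bag_logit)

model = tf.keras.Model(scan, bag_prob)
model.compile(optimizer="adam", loss="binary_crossentropy")
```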
Let us know if you do this and if it works well on your data set!
I am trying to make an end-to-end unified model that detects (localizes) the object in an image. The object itself can be of many types, like "text in the wild", but the surrounding features of the object should determine where the region of interest is.
Like detecting a human face without considering the features of the face itself, i.e., it sits at some range of distance above the neck.
I'm expecting the output to be the coordinates of the object, or bounding boxes in the ImageNet style: [xmin, ymin, xmax, ymax].
I have a dataset of 500 images. Are there any examples of object detection in TensorFlow based on surrounding features, i.e. the feature maps from conv1 or conv2?
There is a TensorFlow-based framework for object detection/localization that you can check out:
https://github.com/Russell91/TensorBox
Though I am not sure that 500 images would be enough to successfully retrain the provided model(s).
Object detection using deep learning is broadly classified into one-stage detectors (YOLO, SSD) and two-stage detectors like Faster R-CNN. Google's repo [1] contains pre-trained models for various detection architectures.
You could pick a pre-trained model and then fine-tune it on your dataset. The two-stage models are modular, and you have a choice of different feature extractors depending on whether speed or accuracy is crucial for you.
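In the TF Object Detection API this amounts to pointing the pipeline config at the downloaded checkpoint; a trimmed sketch with placeholder paths and values:

```
train_config {
  batch_size: 24
  num_steps: 50000
  # Start from the pre-trained weights instead of from scratch:
  fine_tune_checkpoint: "pre-trained/faster_rcnn_resnet50_coco/model.ckpt"
  from_detection_checkpoint: true
}
```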
[1] Google's object detection repository