I have a use case where I have about 300 images of 300 different items, one image per item. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts with a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to train on that image only once, the model probably wouldn't be able to detect it yet. If you train it more, it will probably overfit and recognize only that one image. If that is what you are trying to do, then you should write an algorithm that searches the screen for that image (it will be more efficient).
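As a rough illustration of the "search for that exact image" idea, here is a minimal sketch in plain Python. It assumes images are 2D lists of pixel values and that the match must be pixel-exact; the names find_template, screen and template are made up for this example. A real application would more likely use a tolerance-based template matcher from an image library.

```python
def find_template(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    rows, cols = len(screen), len(screen[0])
    t_rows, t_cols = len(template), len(template[0])
    # Slide the template over every position where it fully fits.
    for r in range(rows - t_rows + 1):
        for c in range(cols - t_cols + 1):
            if all(screen[r + i][c:c + t_cols] == template[i] for i in range(t_rows)):
                return (r, c)
    return None


# Toy data: a 3x4 "screen" containing the 2x2 template at row 1, column 1.
screen = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
]
template = [[5, 6], [7, 8]]
match = find_template(screen, template)
print(match)
```

No training is involved here, which is the point: if only the exact image ever needs to be found, a direct search avoids the overfitting problem entirely.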
1) You'll probably have problems with the generalization of your model because of the lack of training data. In other words, your model will not really "learn" each class.
2) A larger training set, with several varied images per class, will give you a better model.
To explain the title better, I am looking to classify pictures between two classes. For example, let's say that 0 is white, and black is 1. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario for my work. All of the information I have found online is based on a binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume that this is possible, but I have no idea where to start. I do, however, already have a binary classifier written that achieves good accuracy.
Based on your given example, maybe a classification approach is not the best one. I think that what you have is a regression problem, as you want your output to be a continuous value in some range, where the value itself is meaningful (a higher or lower value means a darker or lighter picture).
Regression tasks usually have an output with linear activation, and they expect to have a continuous value as the ground truth.
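To make the regression idea concrete, here is a minimal sketch in plain Python, not a Keras model. It assumes each picture is reduced to one feature (its mean pixel value, with 0 as white and 1 as black) and fits a single linear unit y = w * x + b with mean squared error. The toy images, the target scores and all function names are invented for illustration.

```python
def mean_darkness(image):
    """Average pixel value of a 2D image whose pixels lie in [0, 1]."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def train_linear_unit(xs, ys, lr=0.5, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical gray training pictures with continuous targets (none is exactly 0 or 1).
images = [
    [[0.1, 0.3], [0.2, 0.2]],
    [[0.4, 0.4], [0.3, 0.5]],
    [[0.6, 0.6], [0.7, 0.5]],
    [[0.8, 0.9], [0.7, 0.8]],
]
targets = [0.2, 0.4, 0.6, 0.8]

w, b = train_linear_unit([mean_darkness(img) for img in images], targets)

# The output is a continuous score, not a class label.
prediction = w * mean_darkness([[0.5, 0.5], [0.5, 0.5]]) + b
print(round(prediction, 3))
```

In a Keras model the same idea corresponds to a final layer with one unit, a linear activation and a mean squared error loss, with the convolutional layers in front learning the features instead of the hand-written mean_darkness.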
I think you could start by taking a look at this tutorial.
Hope this helps!
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can recognize a wide variety of images (for example, classify 1,000 different categories). You will use it "headless" (without the last classification layer that takes the vectors and decides which of the 1,000 categories it falls most into). And you will train just the "last step" in the model (freezing all the previous layers) while training your binary classifier.
Alternatively you could train your own classifier from scratch. Specifically glance at my example (based off the book) cat-dog-classifier which trains its own binary classifier.
The model has been trained, and it reliably tells dogs from cats in tiny pictures like the following:
All these pictures are pretty much always centered on the cat/dog, and the cat/dog occupies almost all of the image frame. There is little to no additional surrounding context, which allows the network to train very efficiently.
The next step is, how to make sure that the same model will effectively tell that in the picture below, there happens to be a cat, similar to the ones used to train the model, but surrounded by a broader environment?
Are there some specific steps to take when the model is supposed to be used in production with images showing a broader context than in training? Or is the model able to detect it automagically?
In decreasing order of effectiveness, the steps you can take are:
1) Use more training data, with images having larger borders.
2) Augment existing training images with borders, maybe through random or mirrored padding.
3) Try cropping out the borders during inference, creating multiple images with different borders. Pick the run with the best result.
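The last option (cropping at inference time and keeping the best run) can be sketched as follows. This is only an illustration under assumptions: images are 2D lists, the crops are centered, and predict is any callable returning a (label, confidence) pair. The toy_predict function below is a made-up stand-in for a trained model.

```python
def center_crops(image, fractions=(1.0, 0.75, 0.5)):
    """Yield centered crops that keep the given fraction of each side."""
    height, width = len(image), len(image[0])
    for frac in fractions:
        crop_h = max(1, int(height * frac))
        crop_w = max(1, int(width * frac))
        top = (height - crop_h) // 2
        left = (width - crop_w) // 2
        yield [row[left:left + crop_w] for row in image[top:top + crop_h]]


def predict_best_crop(image, predict):
    """Run predict on every crop and keep the most confident (label, confidence)."""
    results = [predict(crop) for crop in center_crops(image)]
    return max(results, key=lambda result: result[1])


def toy_predict(crop):
    """Hypothetical model: confidence is the share of 'object' pixels in the crop."""
    pixels = [p for row in crop for p in row]
    return ("cat", sum(pixels) / len(pixels))


# A 4x4 picture whose "cat" occupies only the central 2x2 region.
picture = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
best = predict_best_crop(picture, toy_predict)
print(best)
```

Centered crops only help when the subject is near the middle of the frame. For subjects anywhere in the picture, a sliding window or a proper object detector is the more general tool.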
I'm new to Tensorflow and AI, so I'm having trouble researching my question. Either that, or my question hasn't been answered.
I'm trying to make a text classifier to put websites into categories based on their keywords. I have at minimum 5,000 sites and maximum 37,000 sites to train with.
What I'm trying to accomplish is: after the model is trained, I want it to continue to train as it makes predictions about the category a website belongs in.
The keywords that the model is trained on are chosen by clients, so they can always be different from those of the other websites in the same category.
How can I make TensorFlow retrain its model based on corrections made by me if its prediction is inaccurate? Basically, to keep training forever.
The key phrase you lack is fine-tuning. This is when you take a model that has finished its customary training (whatever that may be) but needs more work for the application you have in mind. You then give it additional training with new input; when that training has completed (training accuracy plateaus and is close to test accuracy), you deploy the enhanced model for your purposes.
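Here is a minimal sketch of that predict, correct, fine-tune loop. It is not TensorFlow code: a tiny keyword perceptron stands in for the real model so the example stays self-contained, and the class name, categories and keywords are all invented. The part that carries over is the structure: train starts from the current weights, so calling it again on a buffer of corrections fine-tunes the model rather than restarting it.

```python
from collections import defaultdict


class KeywordClassifier:
    """Toy multi-class perceptron over keywords (a stand-in for the real model)."""

    def __init__(self, categories):
        self.categories = list(categories)
        # weights[category][keyword] is that keyword's score for the category.
        self.weights = {c: defaultdict(float) for c in self.categories}

    def predict(self, keywords):
        scores = {c: sum(self.weights[c][k] for k in keywords) for c in self.categories}
        return max(self.categories, key=lambda c: scores[c])

    def train(self, examples, epochs=5):
        """Perceptron updates that continue from the current weights."""
        for _ in range(epochs):
            for keywords, label in examples:
                guess = self.predict(keywords)
                if guess != label:
                    for k in keywords:
                        self.weights[label][k] += 1.0
                        self.weights[guess][k] -= 1.0


# Initial training on the existing labeled sites.
model = KeywordClassifier(["sports", "cooking"])
model.train([(["football", "league"], "sports"), (["recipe", "oven"], "cooking")])

# Deployment: predict, and store the example whenever a human corrects the result.
corrections = []
site = ["grill", "barbecue"]
first_guess = model.predict(site)
if first_guess != "cooking":  # "cooking" is the human-supplied correction
    corrections.append((site, "cooking"))

# Fine-tune on the accumulated corrections, then keep serving the updated model.
model.train(corrections)
print(first_guess, "->", model.predict(site))
```

With a Keras model the equivalent step is calling fit again on the already trained model with the corrected examples, since fit continues from the current weights. Mixing some of the original training data into each fine-tuning batch is a common way to keep the model from forgetting what it already knew.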
This is often used in commercial applications -- for instance, when a large predictive model is updated to include the most recent week of customer activity. Another common use is to find a model in a zoo that is trained for something related to the application you want -- perhaps cats v dogs -- and use its recognition of facial features to shorten training for a model to identify two classes of cartoon characters -- perhaps Pokemon v Tiny Toons.
In this latter case, your fine-tuning will almost entirely eliminate what was learned by the last few layers of the model. What you gain is the early-layer abilities to find edges, regions, and features through eyes-nose-mouth combinations. This saves at least 30% of the overall training time.
I'm pretty new to object detection. I'm using the TensorFlow Object Detection API and model_main.py to train my model, and I'm now collecting datasets for my project.
I have found and transformed two quite large datasets of cars and traffic lights with annotations. And made two tfrecords from them.
Now I want to train a pretrained model, but I'm curious whether it will work. It is possible that an image, for example "001.jpg", will of course have some annotated bounding boxes of cars (it is from the car dataset), but if there is a traffic light in it as well, it won't be annotated. Will this lead to a bad learning rate? (There can be many of these "problematic" images.) How should I improve this? Is there any workaround? (I really don't want to annotate the images again.)
If it's a stupid question, I'm sorry. Thanks for any response; some links about this problem would be best!
Thanks!
The short answer is yes, it might be problematic, but with some effort you can make it possible.
If you have two urban datasets, and in one you only have annotations for traffic lights, and in the second you only have annotations for cars, then each instance of a car in the first dataset will be learned as a false (negative) example, and each instance of a traffic light in the second dataset will be learned as a false (negative) example.
The two possible outcomes I can think of are:
1) The model will not converge, since it tries to learn opposite things.
2) The model will converge, but will be domain specific. This means that the model will only detect traffic lights on images from the domain of the first dataset, and cars on the second.
In fact I tried doing so myself in a different setup, and got this outcome.
In order to learn to detect traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function which dataset each image comes from, and then only compute the loss on the corresponding classes (that is, zero out the loss on the classes that are not annotated in that dataset). So returning to our example, you only compute loss and backpropagate for traffic lights on the first dataset, and for cars on the second.
For completeness I will add that if resources are available, then the better option is to annotate all the classes on all datasets in order to avoid the suggested modification, since by only backpropagating certain classes, you lose the benefit of real negative examples for the other classes.
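A simplified sketch of such a masked loss, in plain Python. It treats the problem as per-image class presence with binary cross-entropy, which is much simpler than the per-box losses inside the Object Detection API, so take it as an illustration of the masking idea only. The dataset names and the ANNOTATED mapping are hypothetical.

```python
import math

CLASSES = ["car", "traffic_light"]
# Which classes each dataset actually annotates (hypothetical dataset names).
ANNOTATED = {"cars_dataset": {"car"}, "lights_dataset": {"traffic_light"}}


def masked_bce(predictions, targets, dataset):
    """Binary cross-entropy averaged only over the classes annotated in `dataset`."""
    total, count = 0.0, 0
    for cls in CLASSES:
        if cls not in ANNOTATED[dataset]:
            continue  # no trustworthy label: contributes no loss and no gradient
        p = min(max(predictions[cls], 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        y = targets[cls]
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count


# An image from the car dataset that also shows an unannotated traffic light.
# The model correctly predicts the light (0.8), but the label file says 0.
predictions = {"car": 0.9, "traffic_light": 0.8}
targets = {"car": 1, "traffic_light": 0}

loss = masked_bce(predictions, targets, "cars_dataset")
print(round(loss, 4))
```

Without the mask, the correct traffic light prediction would be penalized by an extra -log(1 - 0.8) term, which is exactly the contradictory signal described above.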
I am working on a system to simplify our image library which grows anywhere from 7k to 20k new pictures per week. The specific application is identifying which race cars are in pictures (all cars are similar shapes with different paint schemes). I plan to use python and tensorflow for this portion of the project.
My initial thought was to use image classification to classify the image by car; however, there is a very high probability of the picture containing multiple cars. My next thought is to use object detection to detect the car numbers (present in fixed location on all cars [nose, tail, both doors, and roof] and consistent font week to week). Lastly there is the approach of object recognition of the whole car. This, on the surface, seems to be the most practical; however, the paint schemes change enough that it may not be.
Which approach will give me the best results? I have pulled a large number of images out for training, and obviously the different methods require very different training datasets.
The best approach would be to use all 3 methods as an ensemble. You train all 3 of those models and pass the input image to all 3 of them. Then there are several ways to evaluate the output.
You can sum up the probabilities for all of the classes for all 3 models and then draw a conclusion based on the highest probability.
You can get a prediction from every model and decide based on the number of votes: if model 1 predicts class1, model 2 predicts class2, and model 3 predicts class2, the result is class2.
You can do something like weighted decision making. So, let's say that the first model is the best and the most robust one, but you don't trust it 100% and want to see what the other models will say. Then you can weight the output of the first model with 0.6, and the output of each of the other two models with 0.2.
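The three ways of combining the outputs could look roughly like this. The sketch assumes every model returns a dict of class probabilities over the same classes; the function names and the example probabilities are made up, and the weights are the 0.6 / 0.2 / 0.2 from above.

```python
from collections import Counter


def sum_probabilities(outputs):
    """Add up the per-class probabilities of all models and pick the largest total."""
    totals = {c: sum(o[c] for o in outputs) for c in outputs[0]}
    return max(totals, key=totals.get)


def majority_vote(outputs):
    """Each model votes for its top class; the class with the most votes wins."""
    votes = Counter(max(o, key=o.get) for o in outputs)
    return votes.most_common(1)[0][0]


def weighted_sum(outputs, weights):
    """Like sum_probabilities, but each model's output is scaled by its weight."""
    totals = {c: sum(w * o[c] for o, w in zip(outputs, weights)) for c in outputs[0]}
    return max(totals, key=totals.get)


# Hypothetical outputs of the three models for one image.
outputs = [
    {"class1": 0.90, "class2": 0.10},  # model 1: very sure about class1
    {"class1": 0.40, "class2": 0.60},  # model 2: leans towards class2
    {"class1": 0.45, "class2": 0.55},  # model 3: leans towards class2
]

by_sum = sum_probabilities(outputs)
by_vote = majority_vote(outputs)
by_weight = weighted_sum(outputs, [0.6, 0.2, 0.2])
print(by_sum, by_vote, by_weight)
```

Note that the strategies can disagree on the same input: voting ignores how confident each model is, while the summed and weighted variants let one very confident model outweigh two hesitant ones. Which behaviour you want is worth checking on a validation set.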
I hope this helps :)