as the title states, is there a way to build a object detection model (with a library pytorch or tensorflow) that trains straight from the objects picture. Here's an example. This is the inputted image.
(A Clash Royale Battle Scene)
And say that I wanted it to detect a troop (Let's say the valkyrie with orange hair). I could train it with how you'd normally train it (put a box around it) and do a bunch of examples, but is there a way that I could just give it the valkyrie image (below)
And train it on that. For my situation, this will be much easier. Also I know that I said tensorflow before, but if possible I'd like not to use it as I have a 32bit system. Any help is greatly appreciated.
Related
I am new to tensorflow and I could not find an answer for my question.
I am trying to make a simple program what recognises the type of van from the picture. I downloaded about 100 pictures for my dataset from each category.
My question is should I crop the pictures so only the van is visible on the picture?
Or should I use the original picture with the background for better accuracy?
The short answer is yes it will, but there is a lot more to consider when asking that question. For example, when I use this model how will the images look? Will there be someone to manually crop these images and then use the model or will someone be taking these photos off of a cell phone using an app? A core concept of Machine learning is to have the images in the production environment as close to the training data as possible so that your performance in production doesn't change.
If you're just trying to learn I would highly recommend trying to build a network on MNIST or lego bricks dataset before you try on your own images as if you get stuck on either there are a lot of great resources available :). Also, consider setting aside 10 images as a Test set so that you can evaluate model performance. And third, Tensorflow has a built-in image dataset generator which will greatly improve your model performance on a small dataset like this. The TensorFlow image dataset generator can scale, rotate, flip and, zoom your images which will produce huge improvements in model accuracy.
Good luck!
i try to determine the quality of a person's sitting posture. (e.g. sitting upright = good / sitting crouched = bad) with a webcam.
First try:
Image aquisition (with OpenCV python bindings)
Create a dataset with labeled images into good/bad
Feature detection (FAST)
Train a neuronal net on the dataset with that features(ANN_MLP)
The result was ok with a few restrictions:
not invariant to webcam movements, displacement, other persons, objects etc.
iam not sure if FAST features will fit
im pretty new to machine learning and want to try more sophisticated approaches with TensorFlow:
Second try:
I tried human pose detection via Tensorflow PoseNet
And got a mini example working which can determine probabilities of human bodypart positions. So now the challenge would be to detect the quality of a person's sitting posture out of the output of PoseNet.
What is a good way to proceed:
train a second TF model which gets probabilities of human bodypart
positions as input and outputs good/bad posture? (so PoseNet is used as fancy feature detector)
rework the PoseNet model to fit my output needs and retrain it?
transfer learning from PoseNet (i just read about it but have no clue how or if its even applicable here)?
or maybe a complete different approach?
I'm pretty new to object detection. I'm using tensorflow object detection API and I'm now collecting datasets for my project
and model_main.py to train my model.
I have found and transformed two quite large datasets of cars and traffic lights with annotations. And made two tfrecords from them.
Now I want to train a pretrained model however, I'm just curious will it work? When it is possible that an image for example "001.jpg" will have of course some annotated bounding boxes of cars (it is from the car dataset) but if there is a traffic light as well it wouldn't be annotated -> will it lead to bad learning rate? (there can be many of theese "problematic" images) How should I improve this? Is there any workaround? (I really don't want to annotate the images again)
If its stupid question I'm sorry, thanks for any response - some links with this problematic would be the best !
Thanks !
The short answer is yes, it might be problematic, but with some effort you can make it possible.
If you have two urban datasets, and in one you only have annotations for traffic lights, and in the second you only have annotations for cars, then each instance of car in the first dataset will be learned as false example, and each instance of traffic light in the second dataset will be learned as false example.
The two possible outcomes I can think of are:
The model will not converge, since it tries to learn opposite things.
The model will converge, but will be domain specific. This means that the model will only detect traffic lights on images from the domain of the first dataset, and cars on the second.
In fact I tried doing so myself in a different setup, and got this outcome.
In order to be able to learn your objective of learning traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function from which dataset each image comes from, and then only compute the loss on the corresponding classes (/zero out the loss on the classes do not correspond to it). So returning to our example, you only compute loss and backpropagate traffic lights on the first dataset, and cars on the second.
For completeness I will add that if resources are available, then the better option is to annotate all the classes on all datasets in order to avoid the suggested modification, since by only backpropagating certain classes, you do not enjoy using actual false examples for other classes.
I have a use case where I have about 300 images out of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out what features and combinations are important, and which are not, in discriminating the classes from one another. Training starts by a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test then against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to only train on that image once, it probably wouldn't be able to detect it yet. If you train it more, it will probably over fit and only recognize that one image. If that is what you are trying to do then you should make an algorithm to search the screen for that image (it will be more efficient).
1) You'll probably have problems with the generalization of your models because the lack of training set. In other words, your model will not "learn" about that class.
2) It's good to have a better training set in order to create a better model.
in my project I need to train a object detection model which is able to recognize hands in different poses in real-time from a rgb webcam.
Thus, I'm using the TensorFlow object detection API.
What I did so far is training the model based on the ssd_inception_v2 architecture with the ssd_inception_v2_coco model as finetune-checkpoint.
I want to detect 10 different classes of hand poses. Each class has 300 images which are augmented. In total there are 2400 images in for training and 600 for evaluation. The labeling was done with LabelImg.
The Problem is that the model isn't able to detect the different classes properly. Even if it still wasn't good I got much better results by training with the same images, but only with like 3 different classes. It seems like the problem is the SSD architecture. I've read several times that SSD networks are not good in detecting small objects.
Finally, my questions are the following:
Could I receive better results by using a faster_rcnn_inception architecture for this use case?
Has anyone some advice how to optimize the model?
Do I need more images?
Thanks for your answers!