I am detecting hand gestures using the TensorFlow Object Detection API, but I want to apply a condition to this detection: I want to detect a person's hand gesture only when that person is wearing a whistle, otherwise no detection. Can I achieve this using the TensorFlow Object Detection API? If not, please suggest some good methods of achieving this. Thanks :)
The first thing you should do is fine-tune a pre-trained TensorFlow model with a single class; using this technique you can generate a model that has only one class, named "hand" for example. But how do you do this? Don't worry, just follow the steps below:
Download the TensorFlow models repository from GitHub and build it (you can clone it with git instead of downloading it).
Once the build is complete, label your training images with the labelImg tool. labelImg writes one XML annotation file per image, and the usual tutorial scripts then merge these into a single CSV file.
In the next step, convert this CSV file to a TFRecord file.
Then train your own model.
After the model has been trained, export it and simply run the detection.
It is worth saying that if you don't have an Nvidia GPU, you should use Google Colab, because this process is very time-consuming on a CPU.
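The training step above also needs a label map that the Object Detection API reads alongside the TFRecord file. For the single-class "hand" example from this answer, a minimal label_map.pbtxt would look like this (the file name is the API's usual convention, not something from the question):

```
item {
  id: 1
  name: "hand"
}
```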
Best wishes to you.
You can prepare the dataset by labeling the hand as (1) wearing a whistle and (2) not wearing a whistle, then train a classification model on it, not forgetting to preprocess the images before training.
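If you go with the classification-model route suggested here, one common way to organize the dataset is one folder per class, which most high-level training utilities (e.g. Keras's image_dataset_from_directory) can read directly; the folder and file names below are purely illustrative:

```
dataset/
  hand_with_whistle/
    img_0001.jpg
    img_0002.jpg
  hand_without_whistle/
    img_0001.jpg
    img_0002.jpg
```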
I have a project in which I need first to detect whether an image is fake or not, and if it is fake, to detect the forged object. For the first part, I am using ELA and a CNN to detect whether the image is forged, but for the object detection I need to use Mask R-CNN, and unfortunately I have trouble understanding how to use it. I am using the CASIA v2 dataset, and I have the ground-truth masks for all the forged images.
I saw that every model online uses the COCO model for Mask R-CNN, but I need the model to be trained on my dataset. Also, I saw that I need a list of labels, but for my project I only need to display "Fake" on the detected object; is it alright if in label.txt I only write "Fake"?
Also, I am a little bit new to Deep Learning, so any help is useful.
I've just started with TensorFlow. I wrote a program that uses the Fashion-MNIST dataset to train a model and then predicts labels using 'test_images', and it's working well so far.
But I am curious how I can use my own image of a shoe or shirt for prediction, because all the test images have shape 28*28. How can I do this?
The task you are engaged in is data preparation and preprocessing. Among the things you must do, given a directory of images, is tagging the images; for this task I recommend labelImg.
If you also need the input to have a specific size, as in the example you give, you can use digital image processing software. The OpenCV library has image-resizing tools that work for this.
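As a sketch of that resizing step, here is a minimal NumPy-only version using nearest-neighbour sampling (in practice cv2.resize or PIL give better quality). The function name and the inversion step are assumptions about a typical Fashion-MNIST setup, where the training images are light-on-dark while photos are usually dark-on-light:

```python
import numpy as np

def preprocess_for_fashion_mnist(img):
    """Resize a 2-D uint8 grayscale image of any size to the 28x28
    float32 format a Fashion-MNIST model expects."""
    h, w = img.shape
    # Nearest-neighbour resample down to 28x28.
    rows = np.arange(28) * h // 28
    cols = np.arange(28) * w // 28
    small = img[rows[:, None], cols]
    # Invert (dark-on-light photo -> light-on-dark like the dataset),
    # then scale pixel values to [0, 1].
    small = 255 - small
    return small.astype("float32") / 255.0
```

The result can then be fed to the model with an added batch dimension, e.g. `model.predict(x[None, ...])`.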
As the title states, is there a way to build an object detection model (with a library like PyTorch or TensorFlow) that trains straight from the object's picture? Here's an example. This is the input image.
(A Clash Royale Battle Scene)
And say I wanted it to detect a troop (let's say the Valkyrie with orange hair). I could train it the way you'd normally train it (put a box around it in a bunch of examples), but is there a way I could just give it the Valkyrie image (below)
and train it on that? For my situation, that would be much easier. Also, I know I said TensorFlow before, but if possible I'd like not to use it, as I have a 32-bit system. Any help is greatly appreciated.
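Not a full answer, but one classical, training-free technique that fits this "single reference image" setup is template matching: slide the troop's image over the screenshot and score each position by normalized cross-correlation. A minimal NumPy sketch of the idea (OpenCV's cv2.matchTemplate with TM_CCOEFF_NORMED does the same thing much faster):

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` (both 2-D float arrays) and return
    the (row, col) of the best match plus its normalized
    cross-correlation score in [-1, 1]."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            p_norm = np.sqrt((p * p).sum())
            if p_norm == 0 or t_norm == 0:
                continue  # flat region or flat template: undefined score
            score = (p * t).sum() / (p_norm * t_norm)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

This only works when the sprite appears at roughly the same scale and orientation as the reference image; for anything more robust (pose changes, scale changes) you are back to few-shot detection or keypoint matching.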
In my project I need to train an object detection model that can recognize hands in different poses in real time from an RGB webcam.
Thus, I'm using the TensorFlow object detection API.
What I have done so far is train a model based on the ssd_inception_v2 architecture, with the ssd_inception_v2_coco model as the fine-tune checkpoint.
I want to detect 10 different classes of hand poses. Each class has 300 images, which are augmented. In total there are 2400 images for training and 600 for evaluation. The labeling was done with labelImg.
The problem is that the model isn't able to detect the different classes properly. Even though the results still weren't good, I got much better ones by training with the same images but only about 3 different classes. It seems like the problem is the SSD architecture; I've read several times that SSD networks are not good at detecting small objects.
Finally, my questions are the following:
Could I get better results by using a faster_rcnn_inception architecture for this use case?
Does anyone have advice on how to optimize the model?
Do I need more images?
Thanks for your answers!
Is it possible to have bounding boxes prediction using TensorFlow?
I found TensorBox on GitHub, but I'm looking for a better-supported or maybe official way to address this problem.
I need to retrain the model for my own classes.
It is unclear what exactly you mean. Do you need object detection? I assume so from the 'bounding boxes'. If so, Inception networks are not directly applicable to your task; they are classification networks.
You should look at object detection models, like the Single Shot Detector (SSD) or You Only Look Once (YOLO). They often use pre-trained convolutional layers from classification networks but add extra layers on top. If you want Inception (aka GoogLeNet), YOLO is based on it. Take a look at this implementation: https://github.com/thtrieu/darkflow, or any other you can find on Google.
The COCO 2016 winner for object detection was implemented in TensorFlow. Some state-of-the-art techniques are Faster R-CNN, R-FCN, and SSD. Check the slides from http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf (slide 14 has key TensorFlow ops for you to recreate this pipeline).
Edit 6/19/2017:
TensorFlow released some techniques to predict bounding boxes:
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html