Good approach to determine the quality of a human posture with TensorFlow - Python

I am trying to determine the quality of a person's sitting posture (e.g. sitting upright = good / sitting crouched = bad) with a webcam.
First try:
Image acquisition (with OpenCV Python bindings)
Create a dataset of images labeled good/bad
Feature detection (FAST)
Train a neural net (ANN_MLP) on the dataset with those features
The result was OK, with a few restrictions:
not invariant to webcam movements, displacement, other persons, objects, etc.
I am not sure FAST features are a good fit
I'm pretty new to machine learning and want to try more sophisticated approaches with TensorFlow:
Second try:
I tried human pose detection via TensorFlow PoseNet
and got a mini example working which can determine probabilities of human body part positions. So now the challenge is to determine the quality of a person's sitting posture from the output of PoseNet.
What is a good way to proceed?
train a second TF model which gets probabilities of human body part positions as input and outputs good/bad posture? (so PoseNet is used as a fancy feature detector; see the sketch after this list)
rework the PoseNet model to fit my output needs and retrain it?
transfer learning from PoseNet (I just read about it but have no clue how or whether it's even applicable here)?
or maybe a completely different approach?
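If you go with the first option, a minimal sketch could look like the following. This assumes the PoseNet keypoints have already been extracted into (x, y, score) triples per frame; the keypoints_to_features helper, the 17-keypoint layout and the layer sizes are illustrative assumptions, not part of PoseNet itself.

```python
import numpy as np
import tensorflow as tf

# Hypothetical helper: flatten 17 PoseNet keypoints (x, y, score) into one vector.
# The keypoint count and ordering depend on the PoseNet variant you use.
def keypoints_to_features(keypoints):
    return np.asarray(keypoints, dtype=np.float32).reshape(-1)  # shape: (17 * 3,)

# Small classifier on top of the pose features (PoseNet acts as the feature detector).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(17 * 3,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 1 = good posture, 0 = bad posture
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# X: pose feature vectors for your labeled frames, y: 0/1 posture labels.
# model.fit(X, y, epochs=50, validation_split=0.2)
```

The idea is that PoseNet stays frozen as the feature detector and only this small classifier is trained on your labeled good/bad frames, which should make the result much less sensitive to camera position and background than the raw-image approach.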

Related

simple way to detect street area in google map images (aerial images)

I'm trying to detect the area of the street in an image without any deep learning method.
Say I have this image:
I am looking for any simple method to detect the street portion of the image, like the following:
Now, I know this might not be very accurate, and accuracy is not the problem at all; I am trying to achieve this without using any deep learning method.
Hough lines can give a direct straight-line measure, but I don't think they will give you exactly what you want, as shown below.
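For reference, a minimal Hough-line sketch with OpenCV; the file name and all parameter values are placeholders that would need tuning per image:

```python
import cv2
import numpy as np

img = cv2.imread("aerial.png")                 # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: returns line segments, not the street area itself.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=50, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
cv2.imwrite("hough_lines.png", img)
```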
You need much more complicated algorithms, such as a deep semantic segmentation model, and to train based on that.
Even if you don't like deep learning, traditional algorithms such as variational analysis, SVM learning or AdaBoost are also very complicated and you won't be able to use them easily. You need a much deeper understanding of those topics.
If you really want to, you can start with variational analysis: an active contour model (snake energy) for extracting the road first. Variational analysis has been proven to work for complex scenes and to extract a particular model, as shown in the image below. Your road is the empty, low-gradient region, and all the buildings and trees nearby are high-gradient responses that you don't want.
My suggestion is to make your life easier by using a pre-trained model and extracting the surface model. Download it, run the Python script, and that's all.
There are a few open-source implementations that you can try such as this
https://github.com/ArkaJU/U-Net-Satellite
https://github.com/Paulymorphous/Road-Segmentation
https://github.com/avanetten/cresi
Based on the predicted mask, you can then extract the road area accurately, as shown below.
This would be the result that you are looking for.
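Once one of the repositories above has produced a binary road mask, the post-processing is only a few lines. A minimal sketch, assuming the predicted mask has been saved as an image (file names are placeholders):

```python
import cv2
import numpy as np

# Load the original aerial image and the mask predicted by the segmentation model.
image = cv2.imread("aerial.png")
mask = cv2.imread("predicted_mask.png", cv2.IMREAD_GRAYSCALE)

# Clean up the mask and keep only the road pixels.
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

road_only = cv2.bitwise_and(image, image, mask=mask)
cv2.imwrite("road_area.png", road_only)
```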
Regards
Shenghai Yuan

Does the size of the 'bounding box' for the training data matter when transfer learning an SSD model with the TensorFlow object detection library?

I am trying to transfer learn a mobilenet_v2_coco model on the publicly available GTSRB (German Traffic Signs) Dataset.
I selected 3 classes to have a faster training time and I've already trained for about 10,000 epochs. Usually I already get decent results at this point. But my SSD fails to find anything in a livestream video I access through a small Python program with my webcam. It even classifies almost the entire screen as one of the provided classes (the one that has more training data) with >90% confidence.
My guesses are that this is either because of the unbalanced dataset (class1 = 2000 images, class2 = 1000 images, class3 = 800) or because the images are filled with the object, without much noise or anything. So basically the ROI is almost as big as the dataset images, but the classifier is supposed to make predictions on dash-cam-like videos, where the signs are usually very small.
Or do I just have to train harder and longer this time to get decent results?
The second part of my question is whether there is a rule of thumb for what the images in the dataset need to fulfil in order to produce good predictions.

Using machine learning to detect images based on a single learning image

I have a use case where I have about 300 images of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts with a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to only train on that image once, it probably wouldn't be able to detect it yet. If you train it more, it will probably overfit and only recognize that one image. If that is what you are trying to do, then you should make an algorithm to search the screen for that image (it will be more efficient).
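If searching the screen for a single known image really is the goal, a minimal template-matching sketch with OpenCV could look like this (file names and the similarity threshold are placeholders; note it is not robust to scale or rotation changes):

```python
import cv2

scene = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)   # the frame to search
template = cv2.imread("item.png", cv2.IMREAD_GRAYSCALE)      # the single reference image

# Normalized cross-correlation: tolerant of small brightness changes.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:                                             # threshold to tune
    print("Found at", max_loc, "with score", max_val)
else:
    print("Item not found")
```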
1) You'll probably have problems with the generalization of your model because of the lack of training data. In other words, your model will not "learn" about that class.
2) It's good to have a better training set in order to create a better model.

Tensorflow - classify based on multiple images as input, not a single one

I'm building a CNN that will tell me if a person has brain damage. I'm planning to use the TF Inception v3 model, and the build_image_data.py script to build the TFRecord.
The dataset is composed of brain scans. Every scan has about 100 images (different head poses, angles). On some images the damage is visible, but on some it is not. I can't label all images from a scan as damage positive (or negative), because some of them would be labeled wrong (if the scan is positive for damage, but the damage is not visible in a specific image).
Is there a way to label the whole scan as positive/negative and train the network that way?
And after training is done, pass a scan as input to the network (not a single image) and classify it.
It looks like multiple instance learning might be your approach. Check out these two papers:
Multiple Instance Learning Convolutional Neural Networks for Object Recognition
Classifying and segmenting microscopy images with deep multiple instance learning
The last one is implemented by #dancsalo (not sure if he has a stack overflow account) here.
It looks like the second paper deals with very large images and breaks them into sub-images, but labels the entire image. So it is like labeling a bag of images with one label instead of having to make a label for each sub-image. In your case, you might be able to construct a matrix of images, i.e. a 10 image x 10 image master image for each of the scans...
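A minimal sketch of that bag-level idea in Keras: run the same small CNN over every image in a scan and pool the per-image scores into a single scan-level prediction (max pooling here, in the spirit of the MIL papers above). The image size, the bag size of 100 and the layer sizes are illustrative assumptions:

```python
import tensorflow as tf

# Per-image CNN that outputs a single damage score per image.
image_encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Bag input: one scan = up to 100 images; the label applies to the whole bag.
bag_input = tf.keras.layers.Input(shape=(100, 128, 128, 1))
per_image_scores = tf.keras.layers.TimeDistributed(image_encoder)(bag_input)  # (batch, 100, 1)
scan_score = tf.keras.layers.GlobalMaxPooling1D()(per_image_scores)           # max over the bag

model = tf.keras.Model(bag_input, scan_score)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```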
Let us know if you do this and if it works well on your data set!

Find most similar images by using neural networks

I am working with Python, scikit-learn and Keras. I have 3,000 images of front-facing watches like the following ones:
Watch_1, Watch_2, Watch_3.
I would like to write a program which receives as input a photo of a real watch, which may be taken under less ideal conditions than the photos above (different background colour, darker lighting, etc.), and finds the most similar watches among the 3,000. By similarity I mean that if I give as input a photo of a round, brown watch with a thin lace, then I expect as output watches of round shape, dark colour and with a thin lace.
What is the most efficient machine learning algorithm to do this?
For example, by following this link I have two different solutions in my mind:
1) Using a CNN as a feature extractor and comparing the distances between these features for every pair of images with reference to the input image.
2) Using two CNNs in a Siamese Neural Network to compare the images.
Are these two options the best ones for this task or would you suggest something else?
Do you know any pre-trained neural network (with pre-determined hyperparameters) for this task?
I have found some interesting posts on StackOverflow about this but they are pretty old: Post_1, Post_2, Post_3.
It's going to be difficult to define what exactly you mean by "similar" with your photos. Since they are all watches, you'll have to decide which features being most similar matters to you (shape, colour, numbers/blank face, etc.).
Here is an approach using the tensorflow library mixed with a nearest neighbor library with example code: http://douglasduhaime.com/posts/identifying-similar-images-with-tensorflow.html
It can at least get you started.
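A minimal sketch of option 1 along those lines, using a pre-trained Keras CNN as the feature extractor and scikit-learn for the nearest-neighbour search; catalog_paths and the query file name are placeholders:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

# Pre-trained CNN without the classification head: the pooled output is the feature vector.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]          # shape: (2048,)

# catalog_paths: list of the 3,000 watch image paths (placeholder name).
catalog_features = np.array([extract_features(p) for p in catalog_paths])
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(catalog_features)

# Query with a photo of a real watch and return the 5 most similar catalog watches.
distances, indices = index.kneighbors([extract_features("query_watch.jpg")])
print([catalog_paths[i] for i in indices[0]])
```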
