I'm trying to find information on how to implement localization loss when detecting multiple objects in an image. There is a lot of information on how to calculate localization loss when there is only one detection. On the other hand, there are also many implementations of state-of-the-art object detectors (RCNN, F-RCNN, SSD, etc.).
The reason for my question is that I would like to train a custom object detector whose output is a simple tensor of shape B x N x 4, without any anchors and so on.
So, if I understand correctly, to calculate the loss for multiple objects one first has to map each prediction to a ground-truth bounding box (using IoU, for instance), then calculate the smooth L1 loss for each pair and average them. How can I do this in TensorFlow?
Thanks in advance.
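The matching-then-loss procedure described in the question can be sketched as follows. This is a minimal NumPy illustration for a single image, not a TensorFlow implementation: the function names (pairwise_iou, smooth_l1, localization_loss) are hypothetical, boxes are assumed to be [xmin, ymin, xmax, ymax], and each prediction is simply assigned to the ground-truth box with the highest IoU. Each step is elementwise arithmetic plus an argmax and an index lookup, so it should carry over to tensor ops applied per batch element.

```python
import numpy as np

def pairwise_iou(pred, gt):
    """IoU between every predicted box and every ground-truth box.

    pred: (N, 4), gt: (M, 4), both as [xmin, ymin, xmax, ymax].
    Returns an (N, M) matrix.
    """
    # Intersection rectangle via broadcasting: (N, 1, 2) against (1, M, 2).
    top_left = np.maximum(pred[:, None, :2], gt[None, :, :2])
    bottom_right = np.minimum(pred[:, None, 2:], gt[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_pred[:, None] + area_gt[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def smooth_l1(diff, delta=1.0):
    """Quadratic below delta, linear above it."""
    abs_diff = np.abs(diff)
    return np.where(abs_diff < delta,
                    0.5 * abs_diff ** 2 / delta,
                    abs_diff - 0.5 * delta)

def localization_loss(pred, gt):
    """Assumed scheme: match each prediction to its best-IoU ground truth."""
    best_gt = pairwise_iou(pred, gt).argmax(axis=1)   # (N,)
    matched = gt[best_gt]                             # (N, 4)
    # Sum over the 4 coordinates, average over predictions.
    return smooth_l1(pred - matched).sum(axis=1).mean()

pred = np.array([[0.0, 0.0, 2.0, 2.0], [5.0, 5.0, 7.0, 7.0]])
gt = np.array([[5.0, 5.0, 7.0, 8.0], [0.0, 0.0, 2.0, 2.0]])
loss = localization_loss(pred, gt)
```

Note that with this greedy assignment several predictions can end up matched to the same ground-truth box, and predictions with zero overlap still get matched to something; a one-to-one assignment or an IoU threshold may be preferable depending on the detector.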
I was trying to tackle an ML problem with TensorFlow, but I'm not sure which algorithm I should use. I have tagged images in my dataset. When a new image comes in, I want to correlate it with the images I have, based on the tags. Where should I start?
What do you mean by correlate the images? Are you attempting to cluster the images based on their tags?
If so, you could train an encoder that runs over your images and produces a feature vector, and then cluster those feature vectors based on their image tags. For example, suppose you had multiple images with two tags: cars and cats. You could run an encoder (consisting of convolutional layers), flatten the final layer to get a feature vector, and run a clustering algorithm like K-means (with K=2, since you only have 2 tags, cars and cats).
Depending on the size and nature of the images in your dataset you might have to play around with the encoder architecture, collect more data, use alternate clustering algorithms etc.
If an image feature vector can belong to multiple classes and you would like to return the possible tags, you'll have to opt for soft clustering algorithms such as GMMs (Gaussian Mixture Models) or FCM (Fuzzy C-Means). These algorithms don't output a single class; they output a class score for each data point. So if you want the top 5 tags of a new image, you could:
Run an encoder to get a feature vector
Perform soft clustering on the feature vectors
Get the 5 highest scoring classes
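The last two steps above can be sketched roughly as follows, using the standard Fuzzy C-Means membership formula in NumPy. It assumes the cluster centers have already been fitted and that the encoder has already produced the feature vector; the names fuzzy_memberships and top_k_tags, the toy centers, and the tag names are all hypothetical.

```python
import numpy as np

def fuzzy_memberships(features, centers, m=2.0):
    """Fuzzy C-Means style membership of each feature vector in each cluster.

    features: (N, D), centers: (K, D), m > 1 is the fuzziness exponent.
    Each row of the result sums to 1.
    """
    dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero on exact hits
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def top_k_tags(scores, tag_names, k=5):
    """Return the k tag names with the highest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [tag_names[i] for i in order]

# Toy data: one cluster center per tag (assumed to be already fitted).
tag_names = ["car", "cat", "dog"]
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
new_feature = np.array([[1.0, 0.0]])  # stand-in for the encoder output

scores = fuzzy_memberships(new_feature, centers)[0]
best_two = top_k_tags(scores, tag_names, k=2)
```

With a GMM the scores would instead be the posterior probabilities of each mixture component, but the top-k selection would stay the same.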
I'm a bit of a beginner in the art of machine learning. Here is a rather conceptual question I've been wondering about:
Suppose I have a function X->Y, say y=x^2, then, generating enough data of X->Y, I can train a neural network to perform regression on the function, and get x^2 with any input x. This is basically also what the Universal Approximation Theorem suggests.
Now, my question is, what if I want the inverse relation, Y->X? In this case, X is a multi-valued function of Y: for y>0, x=+-sqrt(y). I can swap X and Y as input/output data to train the network, but for any given y there should be a 1/2 - 1/2 chance of x=sqrt(y) and x=-sqrt(y). Of course, if one trains it with mean squared error, the network wouldn't know this is a multi-valued function; it would just follow SGD on the loss function and output x=0, the average value, for any given y.
Therefore, I wonder if there is any way a neural network can model a multi-valued function? For instance, my guess would be
(1) the neural network can output a collection of, say, the top 2 possible values for X, and be trained with cross-entropy. The problem is, if X is a vector or even a matrix (like a bitmap image) instead of a number, we don't know how many solutions X there are for a given Y (there could very well be infinitely many, i.e. a continuous range), so a "list" of possible values and probabilities won't work. Ideally the neural network should output values randomly and continuously distributed across the possible X solutions.
(2) Does this perhaps fall into the realm of probabilistic neural networks (PNN)? Does a PNN model functions that support a given probability distribution (continuous or discrete) of vectors as its output? If so, is it possible to implement a PNN with popular frameworks like TensorFlow+Keras?
(Also, note that this is different from a "multivariate" function, which is the case where X,Y could be multi-component vectors, which is still something a traditional network can easily train on. The actual problem in question here is where the output could be a probabilistic distribution of vectors, which is something that a simple feed-forward network doesn't capture, since it doesn't have the inherent randomness.)
Thank you for your kind help!
Image of forward function Y=X^2 (can be easily modeled by network with regression)
Image of inverse function X=+-sqrt(Y) (the network cannot capture the two-value function and outputs the average value X=0 for any Y)
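The averaging effect described in the question can be reproduced without a neural network. The sketch below is only an illustration under simple assumptions: it replaces the network with a plain least-squares line x = a*y + b (also trained on squared error), and uses symmetric synthetic data where every y is paired once with +sqrt(y) and once with -sqrt(y).

```python
import numpy as np

y = np.linspace(0.1, 4.0, 50)

# Each y appears twice: once with x = +sqrt(y), once with x = -sqrt(y).
inputs = np.concatenate([y, y])
targets = np.concatenate([np.sqrt(y), -np.sqrt(y)])

# Least-squares fit of x = a*y + b, the simplest squared-error regressor.
design = np.stack([inputs, np.ones_like(inputs)], axis=1)
coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
a, b = coef

# Both branches pull equally in opposite directions, so the fit collapses
# to the conditional mean of x given y, which is 0 everywhere.
prediction_at_y4 = a * 4.0 + b
```

Here a, b, and therefore every prediction come out at (numerically) zero, which matches the flat X=0 output described above.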
Try reading the following paper:
https://onlinelibrary.wiley.com/doi/abs/10.1002/ecjc.1028
Mifflin's algorithm (or its more general version, SLQP-GS) mentioned in this paper is available here, and the corresponding paper describing it is here.
I'm now working on a 3D multi-class semantic segmentation task with Keras.
I know that approaches like weighted cross entropy can increase the weight of each class. But I'm wondering whether there is a way to apply different weights to different slices within one input volume.
For example, let X denote the number of slices, and let W be the weight. I'm trying to set up a dynamic weighted cross entropy with W=sqrt(X), but I don't know exactly how to implement it in Keras.
Could anyone help me?
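One possible reading of the weighting above, sketched in NumPy rather than Keras: it assumes X is the 1-based index of a slice along the depth axis, that labels are one-hot, and that predictions are softmax probabilities of shape (D, H, W, C). The function name slice_weighted_cross_entropy is hypothetical. In Keras the same arithmetic could go inside a custom loss function taking (y_true, y_pred).

```python
import numpy as np

def slice_weighted_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross entropy where slice number X gets weight W = sqrt(X).

    y_true: (D, H, W, C) one-hot labels.
    y_pred: (D, H, W, C) class probabilities.
    """
    depth = y_true.shape[0]
    # Assumption: X is the 1-based slice index along the first axis.
    weights = np.sqrt(np.arange(1, depth + 1, dtype=np.float64))
    # Per-voxel cross entropy, shape (D, H, W).
    per_voxel = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)
    # Mean over each slice, shape (D,).
    per_slice = per_voxel.mean(axis=(1, 2))
    # Weighted average over slices.
    return np.sum(weights * per_slice) / np.sum(weights)

# Tiny volume: 2 slices, 1x1 pixels, 2 classes.
y_true = np.array([[[[1.0, 0.0]]], [[[0.0, 1.0]]]])
y_pred = np.array([[[[0.5, 0.5]]], [[[0.25, 0.75]]]])
loss = slice_weighted_cross_entropy(y_true, y_pred)
```

If X is meant to be something else, such as the total number of slices in the volume, only the line that builds the weights would need to change.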
I'm trying to write a Python script that analyses an .stl data file (3D geometry) and says whether the model is convex or concave, whether it is watertight, and reports other properties...
I would like to use TensorFlow, scikit-learn or another machine learning library: create a database of example objects with tags, and in the future add more examples and simply re-train the model for better results.
But my problem is: I don't know how to recalculate or restructure 3D data so that it works with ML libraries. I have no idea.
Thank you for your help.
You have to first extract "features" out of your dataset. These are fixed-dimension vectors. Then you have to define labels which define the prediction. Then, you have to define a loss function and a neural network. Put that all together and you can train a classifier.
In your example, you would first need to extract a fixed dimension vector out of an object. For instance, you could extract the object and project it on a fixed support on the x, y, and z dimensions. That defines the features.
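One way to read "project it on a fixed support" is sketched below: scale the mesh vertices into a unit box and histogram them along x, y, and z with a fixed number of bins, so every object yields a vector of the same length. This is only an illustrative assumption; the function name axis_projection_features is hypothetical, and the vertices are assumed to have been loaded from the .stl file already.

```python
import numpy as np

def axis_projection_features(vertices, bins=16):
    """Fixed-length feature vector from an arbitrary number of vertices.

    vertices: (N, 3) array of x, y, z coordinates.
    Returns a vector of length 3 * bins.
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    lo = vertices.min(axis=0)
    extent = np.maximum(vertices.max(axis=0) - lo, 1e-12)
    unit = (vertices - lo) / extent  # every coordinate now lies in [0, 1]
    feats = []
    for axis in range(3):
        hist, _ = np.histogram(unit[:, axis], bins=bins, range=(0.0, 1.0))
        feats.append(hist / len(unit))  # normalise so each histogram sums to 1
    return np.concatenate(feats)

# Example: the four corners of a tetrahedron.
tetra = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
features = axis_projection_features(tetra)
```

Vertex histograms depend on how finely the mesh is tessellated, so sampling points uniformly on the surface or voxelizing the object may give more stable features.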
For each object, you'll need to label whether it's convex or concave. You can do that by hand, analytically, or by creating objects analytically that are known to be concave or convex. Now you have a dataset with a lot of sample pairs (object, is-concave).
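For the analytic labelling route, here is a rough sketch of a convexity test. It assumes a closed (watertight) triangle mesh with non-degenerate faces, and it checks that for every face all vertices lie on one side of that face's plane. The function name is_convex and the two toy meshes are hypothetical.

```python
import numpy as np

def is_convex(vertices, faces, tol=1e-9):
    """True if no face plane has mesh vertices strictly on both sides.

    vertices: (N, 3) coordinates, faces: iterable of (i, j, k) index triples.
    Assumes a closed mesh; the test ignores face orientation.
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    for i, j, k in faces:
        normal = np.cross(vertices[j] - vertices[i], vertices[k] - vertices[i])
        side = (vertices - vertices[i]) @ normal  # signed offset of each vertex
        if side.max() > tol and side.min() < -tol:
            return False
    return True

# A tetrahedron (convex).
tetra_v = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tetra_f = [(0, 1, 2), (0, 1, 3), (1, 2, 3), (0, 2, 3)]

# The same shape with its bottom face pushed inwards to an interior point.
dent_v = np.vstack([tetra_v, [[0.2, 0.2, 0.2]]])
dent_f = [(0, 1, 3), (1, 2, 3), (0, 2, 3), (0, 1, 4), (1, 2, 4), (2, 0, 4)]

labels = [is_convex(tetra_v, tetra_f), is_convex(dent_v, dent_f)]
```

Labels produced this way could be paired with the extracted features to build the (object, is-concave) dataset.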
For the loss function, you can simply use the negative log-probability.
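For a two-class problem like this one, the negative log-probability amounts to binary cross-entropy. A minimal NumPy sketch, assuming the network outputs the probability that an object is concave (the function name is hypothetical):

```python
import numpy as np

def negative_log_probability(p_concave, is_concave, eps=1e-7):
    """Mean negative log-probability assigned to the true labels.

    p_concave: predicted probabilities of the 'concave' class, in [0, 1].
    is_concave: true labels, 1 for concave and 0 for convex.
    """
    p = np.clip(np.asarray(p_concave, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(is_concave, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

# A network that always answers 0.5 pays log(2) per sample.
loss = negative_log_probability([0.5, 0.5], [1, 0])
```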
Finally, a feed-forward network with some convolutional layers at the bottom is probably a good idea.
I am trying to build an end-to-end unified model that detects (localizes) an object in an image. The object itself can be of many types, like "text in the wild", but the surrounding features of the object should determine where the region of interest is.
For example, detecting a human face without considering the features of the face itself, i.e. inferring that it is some distance above the neck.
I'm expecting the output to be the coordinates of the object, or bounding boxes in the ImageNet format, like: [xmin, ymin, xmax, ymax]
I have a dataset of 500 images. Are there any examples of object detection in TensorFlow based on surrounding features, i.e. the feature maps from conv1 or conv2?
There is a TensorFlow-based framework for object detection/localization that you can check out:
https://github.com/Russell91/TensorBox
That said, I am not sure that 500 images would be enough to successfully retrain the provided model(s).
Object detection using deep learning is broadly divided into one-stage detectors (YOLO, SSD) and two-stage detectors such as Faster RCNN. Google's repo [1] contains pre-trained models for various detection architectures.
You could pick a pre-trained model and then train it on your dataset. The two-stage model is modular, and you have a choice of different feature extractors depending on whether speed or accuracy matters more to you.
[1] Google's object detection repository