I am trying to write a script in Python to analyse an .stl data file (3D geometry) and say whether the model is convex or concave, whether it is watertight, and report other properties...
I would like to use TensorFlow, scikit-learn, or another machine learning library: create a database of example objects with tags, add more examples in the future, and simply re-train the model for better results.
But my problem is that I don't know how to recalculate or restructure 3D data so it can be used with ML libraries. I have no idea.
Thank you for your help.
You have to first extract "features" out of your dataset. These are fixed-dimension vectors. Then you have to define labels which define the prediction. Then, you have to define a loss function and a neural network. Put that all together and you can train a classifier.
In your example, you would first need to extract a fixed-dimension vector from each object. For instance, you could take the object and project it onto a fixed support along the x, y, and z dimensions. That defines the features.
For each object, you'll need to label whether it's convex or concave. You can do that by hand, analytically, or by creating objects analytically that are known to be concave or convex. Now you have a dataset with a lot of sample pairs (object, is-concave).
For the loss function, you can simply use the negative log-probability.
Finally, a feed-forward network with some convolutional layers at the bottom is probably a good idea.
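As a rough sketch of what such a pipeline could look like, assuming trimesh for loading the STL files and Keras for the classifier (the grid resolution, sample count, and network size below are placeholder choices, not a recommended setup):

    import numpy as np
    import trimesh
    import tensorflow as tf

    GRID = 32  # resolution of the fixed support

    def stl_to_projections(path, n_points=20000):
        """Sample the surface, voxelize into a GRID^3 occupancy grid, and project it
        onto the x, y, and z planes -> a fixed-size (GRID, GRID, 3) feature tensor."""
        mesh = trimesh.load(path)
        points, _ = trimesh.sample.sample_surface(mesh, n_points)
        points = points - points.min(axis=0)
        points = points / (points.max() + 1e-9)        # normalize into the unit cube
        grid, _ = np.histogramdd(points, bins=(GRID,) * 3, range=((0, 1),) * 3)
        occupancy = (grid > 0).astype(np.float32)
        return np.stack([occupancy.max(axis=0),
                         occupancy.max(axis=1),
                         occupancy.max(axis=2)], axis=-1)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(GRID, GRID, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # P(convex)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")  # negative log-probability
    # X: (num_objects, GRID, GRID, 3) stacked projections; y: 1 = convex, 0 = concave
    # model.fit(X, y, epochs=20)

The same flattened features could also feed a scikit-learn classifier; the important part is that every object ends up as a tensor of the same shape.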
I'm currently trying to find information on how to implement a localization loss for the task of detecting multiple objects in an image. There is a lot of information on how to calculate the localization loss when there is only one detection. On the other hand, there are also lots of implementations of SOTA object detectors (R-CNN, Faster R-CNN, SSD, etc.).
The reason for my question is that I would like to try to train a custom object detector whose output is simply a tensor of shape B x N x 4, without any anchors and so on.
So, if I understand correctly, to calculate the loss for multiple objects one has to first map each prediction to a ground-truth bounding box (using, for instance, IoU), then calculate the smooth L1 loss for each pair and average them. How can I do this within TensorFlow?
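As a rough sketch of what I mean for a single image (the [x1, y1, x2, y2] corner format and the simple argmax matching are just my simplifying assumptions):

    import tensorflow as tf

    def pairwise_iou(pred, gt):
        """pred: (N, 4), gt: (M, 4) boxes -> IoU matrix of shape (N, M)."""
        p = tf.expand_dims(pred, 1)                     # (N, 1, 4)
        g = tf.expand_dims(gt, 0)                       # (1, M, 4)
        lt = tf.maximum(p[..., :2], g[..., :2])         # intersection top-left
        rb = tf.minimum(p[..., 2:], g[..., 2:])         # intersection bottom-right
        wh = tf.maximum(rb - lt, 0.0)
        inter = wh[..., 0] * wh[..., 1]
        area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
        area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
        return inter / (area_p + area_g - inter + 1e-9)

    def localization_loss(pred, gt):
        """Match each ground-truth box to its best prediction, then average smooth L1."""
        iou = pairwise_iou(pred, gt)                    # (N, M)
        best_pred = tf.argmax(iou, axis=0)              # (M,) best prediction per gt box
        matched = tf.gather(pred, best_pred)            # (M, 4)
        diff = tf.abs(gt - matched)
        smooth_l1 = tf.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
        return tf.reduce_mean(smooth_l1)

For a B x N x 4 batch I assume this would have to be applied per image (e.g. with tf.map_fn), since the number of ground-truth boxes differs per image — is that the right direction?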
Thanks in advance.
I am making a project in which I have to predict a plane trajectory.
I have two types of trajectory: the first is the planned one, and the second is the real one that I recovered after the end of the flight.
The two trajectories are (x,y) points on a map and I want to predict the real one with the planned one.
What kind of model should I use? I have heard about multivariate regression and recurrent neural networks, but I am not sure about either: I think multivariate regression is not appropriate, and an RNN includes time as a parameter, which I would rather not use at first.
Do you have any ideas?
Thank you
You could try training single-target regression models and predicting the x and y variables independently. The other way to go about it is to use multi-target regression methods; a commonly used one is Predictive Clustering Trees. You can read about various methods at https://towardsdatascience.com/regression-models-with-multiple-target-variables-8baa75aacd to start with. I hope it is somewhat helpful. :)
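For the first option, a minimal scikit-learn sketch could look like the following (the flattened "planned trajectory in, real trajectory out" framing and the synthetic data are placeholder assumptions):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import MultiOutputRegressor

    # Each flight: planned trajectory as input, real trajectory as target,
    # both flattened to vectors of length 2 * n_points (x and y coordinates).
    n_flights, n_points = 200, 50
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_flights, 2 * n_points))      # placeholder planned trajectories
    y = X + rng.normal(scale=0.1, size=X.shape)         # placeholder real trajectories

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # One single-target regressor per output coordinate.
    model = MultiOutputRegressor(GradientBoostingRegressor())
    model.fit(X_train, y_train)
    print("R^2 on held-out flights:", model.score(X_test, y_test))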
I'm a bit of a beginner in the art of machine learning. Here is a rather conceptual question I've been wondering:
Suppose I have a function X->Y, say y=x^2, then, generating enough data of X->Y, I can train a neural network to perform regression on the function, and get x^2 with any input x. This is basically also what the Universal Approximation Theorem suggests.
Now, my question is, what if I want the inverse relation, Y->X? In this case, X is a multi-valued function of Y; for instance, for y > 0, x = ±sqrt(y). I can swap X and Y as input/output data to train the network alright, but for any given y there should be a random 1/2 - 1/2 chance that x = sqrt(y) or x = -sqrt(y). But of course, if one trains it with mean squared error, the network wouldn't know this is a multi-valued function; it would just follow SGD on the loss function and output x = 0, the average value, for any given y.
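To make this concrete, the experiment I have in mind looks roughly like this (a Keras sketch on synthetic data):

    import numpy as np
    import tensorflow as tf

    # Data from the forward function y = x^2, used in the inverse direction y -> x.
    x = np.random.uniform(-2.0, 2.0, size=(10000, 1)).astype("float32")
    y = x ** 2

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(y, x, epochs=20, verbose=0)        # train the inverse mapping y -> x

    # For y = 1 the true answers are +1 and -1, but the MSE-trained network
    # predicts roughly their average, i.e. about 0.
    print(model.predict(np.array([[1.0]], dtype="float32")))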
Therefore, I wonder if there is any way a neural network can model a multi-valued function? For instance, my guess would be
(1) the neural network can output a collection of, say, the top 2 possible values for X and be trained with cross-entropy. The problem is, if X is a vector or even a matrix (like a bit-map image) instead of a number, we don't know how many solutions X a given Y has (which could very well be an infinite number, i.e. a continuous range), so a "list" of possible values and probabilities won't work - ideally the neural network should output values randomly and continuously distributed across the possible X solutions.
(2) perhaps this falls into the realm of probabilistic neural networks (PNNs)? Do PNNs model functions that produce a given probabilistic distribution (continuous or discrete) of vectors as their output? If so, is it possible to implement a PNN with popular frameworks like TensorFlow + Keras?
(Also, note that this is different from a "multivariate" function, which is the case where X,Y could be multi-component vectors, which is still something a traditional network can easily train on. The actual problem in question here is where the output could be a probabilistic distribution of vectors, which is something that a simple feed-forward network doesn't capture, since it doesn't have the inherent randomness.)
Thank you for your kind help!
Image of forward function Y=X^2 (can be easily modeled by network with regression)
Image of inverse function X=+-sqrt(Y) (the network cannot capture the two-value function and outputs the average value X=0 for any Y)
Try to read the following paper:
https://onlinelibrary.wiley.com/doi/abs/10.1002/ecjc.1028
Mifflin's algorithm (or its more general version, SLQP-GS), mentioned in this paper, is available here, and the corresponding paper with a description is here.
I'm currently using the Fourier transform in conjunction with Keras for voice recognition (speaker identification). I have heard MFCC is a better option for voice recognition, but I am not sure how to use it.
I am using librosa in python (3) to extract 20 MFCC features. My question is: which MFCC features should I use for speaker identification?
In addition to this, I am unsure how to implement these features. What I would do is take the necessary features and make one long vector as input for a neural network. However, it is also possible to display the features as colors (an image), so could image recognition also be used, or is that more aimed at speech recognition rather than speaker recognition?
In short, I am unsure where I should start, as I am not very experienced with image recognition.
Thanks in advance!!
My question is: which MFCC features should I use for speaker identification?
I would say use all of them. Technically, MFCC features are outputs from different filter banks, and it is hard to say a priori which of them will be useful.
In addition to this, I am unsure how to implement these features. What I would do is take the necessary features and make one long vector as input for a neural network.
Actually, when you extract MFCCs for N samples you get an array like N x T x 20, where T represents the number of frames in the audio signal after MFCC processing. I would suggest using sequence classification with an LSTM; this will give better results.
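Roughly, something like this (the file paths, the fixed frame count T, and the number of speakers are placeholders):

    import numpy as np
    import librosa
    import tensorflow as tf

    T, N_MFCC, N_SPEAKERS = 300, 20, 10

    def extract_mfcc(path):
        """Load an audio file and return a fixed-size (T, N_MFCC) MFCC matrix."""
        signal, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T   # (frames, 20)
        mfcc = mfcc[:T]
        return np.pad(mfcc, ((0, T - mfcc.shape[0]), (0, 0)))           # pad to T frames

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(T, N_MFCC)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(N_SPEAKERS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # X: (N, T, 20) array of padded MFCC sequences; y: integer speaker ids
    # model.fit(X, y, epochs=20)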
In addition to this, I am unsure how to implement these features. What I would do is take the necessary features and make one long vector as input for a neural network.
For each sample, you get a 2D matrix of MFCCs of shape T x no_mfccs (in your case no_mfccs = 20), so the whole dataset is N x T x no_mfccs. To turn each sample into a single vector, many researchers take statistics such as the mean, variance, IQR, etc. over the time frames to reduce the feature dimension. Some also model it using multivariate regression, and some fit it to a Gaussian mixture model; it depends on the next stage. In your case, you can use statistics to convert it into a single vector,
OR, as Parthosarathi said, you can use an LSTM to preserve the sequential information across time frames.
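For example, statistics pooling over the time frames could look like this (which statistics you keep is up to you):

    import numpy as np
    from scipy.stats import iqr

    def pool_statistics(mfcc):
        """mfcc: (T, 20) matrix -> one fixed-length vector of per-coefficient statistics."""
        return np.concatenate([
            mfcc.mean(axis=0),     # 20 means
            mfcc.var(axis=0),      # 20 variances
            iqr(mfcc, axis=0),     # 20 inter-quartile ranges
        ])                         # fixed vector of length 60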
However, it is also possible to display the features as colors (an image), so could image recognition also be used, or is that more aimed at speech recognition rather than speaker recognition?
I would not recommend using the spectrogram (image) as a feature vector for a neural network, because visual images and spectrograms do not encode visual objects and sound events in the same manner.
When you feed an image to a neural network, it assumes that the features (pixel values) of the image carry the same meaning regardless of their location. But in the case of a spectrogram, the location of a feature matters a lot.
For example, moving the frequencies of a male voice upwards could change its meaning from a man to a child. Therefore, the spatial invariance that a 2D CNN provides might not perform as well for this form of data.
To learn more about it, see: What’s wrong with CNNs and spectrograms for audio processing?
You can use MFCCs with dense layers / multilayer perceptron, but probably a Convolutional Neural Network on the mel-spectrogram will perform better, assuming that you have enough training data.
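For instance, a minimal sketch of that mel-spectrogram + CNN setup could look like the following (the fixed frame count, mel resolution, and network size are placeholder choices):

    import numpy as np
    import librosa
    import tensorflow as tf

    N_MELS, T, N_SPEAKERS = 64, 300, 10

    def log_mel(path):
        """Load an audio file and return a fixed-size (N_MELS, T, 1) log-mel spectrogram."""
        signal, sr = librosa.load(path, sr=16000)
        mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=N_MELS)
        mel = librosa.power_to_db(mel)[:, :T]
        mel = np.pad(mel, ((0, 0), (0, T - mel.shape[1])))
        return mel[..., np.newaxis]

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MELS, T, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(N_SPEAKERS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")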
I am going through a TensorFlow tutorial. I would like to find a description of the following line:
tf.contrib.layers.embedding_column
I wonder if it uses word2vec or anything else, or maybe I am thinking in a completely wrong direction. I tried to click around on GitHub, but found nothing. I am guessing that looking on GitHub is not going to be easy, since the Python code might refer to some C++ libraries. Could anybody point me in the right direction?
I've been wondering about this too. It's not really clear to me what they're doing, but this is what I found.
In the paper on wide and deep learning, they describe the embedding vectors as being randomly initialized and then adjusted during training to minimize error.
Normally when you do embeddings, you take some arbitrary vector representation of the data (such as one-hot vectors) and then multiply it by a matrix that represents the embedding. This matrix can be found by PCA, or learned during training by something like t-SNE or word2vec.
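In toy form, that lookup is just this (made-up sizes):

    import numpy as np

    vocab_size, dimension = 5, 3
    embedding_matrix = np.random.randn(vocab_size, dimension)   # adjusted during training

    word_id = 2
    one_hot = np.eye(vocab_size)[word_id]
    print(one_hot @ embedding_matrix)       # identical to embedding_matrix[word_id]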
The actual code for the embedding_column is here, and it's implemented as a class called _EmbeddingColumn which is a subclass of _FeatureColumn. It stores the embedding matrix inside its sparse_id_column attribute. Then, the method to_dnn_input_layer applies this embedding matrix to produce the embeddings for the next layer.
    def to_dnn_input_layer(self,
                           input_tensor,
                           weight_collections=None,
                           trainable=True):
      output, embedding_weights = _create_embedding_lookup(
          input_tensor=self.sparse_id_column.id_tensor(input_tensor),
          weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
          vocab_size=self.length,
          dimension=self.dimension,
          weight_collections=_add_variable_collection(weight_collections),
          initializer=self.initializer,
          combiner=self.combiner,
          trainable=trainable)
So as far as I can see, it seems like the embeddings are formed by applying whatever learning rule you're using (gradient descent, etc.) to the embedding matrix.
I had a similar doubt about embeddings.
Here is the main point:
The ability to add an embedding layer alongside traditional wide linear models allows for accurate predictions by reducing the sparse dimensionality down to a low dimensionality.
Here is a good post about it!
And here is a simple example combining embedding layers, using the Titanic Kaggle data to predict whether or not a passenger will survive based on attributes like their name, sex, what ticket they had, the fare they paid, the cabin they stayed in, etc.
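For reference, a minimal sketch of the same idea with the newer tf.feature_column API (tf.contrib has since been removed from TensorFlow; the Titanic column names below are only illustrative):

    import tensorflow as tf

    # A small-vocabulary column gets one-hot encoded; a large, sparse one gets embedded.
    sex = tf.feature_column.categorical_column_with_vocabulary_list(
        "Sex", ["male", "female"])
    ticket = tf.feature_column.categorical_column_with_hash_bucket(
        "Ticket", hash_bucket_size=100)

    deep_columns = [
        tf.feature_column.indicator_column(sex),                   # one-hot, 2 dims
        tf.feature_column.embedding_column(ticket, dimension=8),   # learned 8-d embedding
    ]

    features = {"Sex": tf.constant(["male", "female"]),
                "Ticket": tf.constant(["A/5 21171", "PC 17599"])}
    dense = tf.keras.layers.DenseFeatures(deep_columns)(features)
    print(dense.shape)   # (2, 10): 2 one-hot dims + 8 embedding dims per passenger

The embedding weights here start out randomly initialized and are adjusted by whatever optimizer trains the rest of the model, exactly as described in the wide-and-deep paper.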