I would like to implement the code for model.predict (https://keras.io/models/model/) in C++, but I am unable to find the exact logic (equations, formulas) used in prediction.
For C++, I used the implementation here: https://github.com/Dobiasd/frugally-deep
but unfortunately could not find the equations behind the predict function. (frugally-deep exports the model as a .json file and does the prediction using its predict function.)
Would there be any resources that I could refer to find the equations for model.predict?
model.predict implements a forward pass of the model, so there is no single equation; the computation is determined by the model's computation graph.
To reproduce the same behavior, you have to do a forward pass through the layers of the model, where each layer implements its own computation. So it's not a simple matter of "use equation X"; there is a large set of computational formulas to implement, one for each kind of layer.
Looking at the repo, it appears you're looking for this.
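To make the "forward pass" idea concrete, here is a minimal sketch (in Python/NumPy for readability; the same arithmetic carries over to a C++ implementation) of what predict computes for a plain sequential model with dense layers. The weights and shapes are made up for illustration; a real implementation would load them from the exported model file.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict(x, layers):
    # Forward pass: apply each layer's own computation in order.
    # `layers` is a list of (W, b, activation) triples taken from the trained model.
    for W, b, activation in layers:
        x = activation(W @ x + b)  # dense layer: y = f(W x + b)
    return x

# Toy 4 -> 8 -> 3 network with random (untrained) weights, just to show the mechanics.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8), relu),
    (rng.normal(size=(3, 8)), np.zeros(3), softmax),
]
print(predict(rng.normal(size=4), layers))

Convolution, pooling, recurrent, and normalization layers each have their own formula, which is why there is no single predict equation.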
I would like to implement a cost-sensitive loss function in PyTorch. My two-class training dataset is heavily imbalanced, where 75% of the data are label '0' and only 25% of the data are label '1'.
I am new to PyTorch but my supervisor is adamant that I use it (they have more experience with it).
I found some implementations in Keras, but I am not that strong in coding to be able to port it over to PyTorch.
I have read around to find some resources to create a cost-sensitive loss function.
This paper uses something which I think might work (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9417097), but I do not understand how the code is implemented despite having access to it here (https://github.com/emadeldeen24/AttnSleep/blob/f993511426900f9fca20594a738bf8bee1116381/utils/util.py).
This website describes the math in great detail, but I do not understand it: https://medium.com/rv-data/how-to-do-cost-sensitive-learning-61848bf4f5e7
Here is an implementation in Keras which I have trouble converting to PyTorch: https://towardsdatascience.com/fraud-detection-with-cost-sensitive-machine-learning-24b8760d35d9
I also found this implementation in PyTorch, but have trouble understanding it: https://discuss.pytorch.org/t/dealing-with-imbalanced-datasets-in-pytorch/22596/21
Could you please help me to understand the last link's implementation of the cost-sensitive loss function?
Thank you.
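(Not the implementation from that last thread, but for reference, the most common cost-sensitive approach for a 75%/25% imbalance is a class-weighted loss, where errors on the rare class cost proportionally more. A minimal PyTorch sketch, with made-up tensors just to show the mechanics:)

import torch
import torch.nn as nn

# Weights inversely proportional to class frequency: label 0 is 75% of the data,
# label 1 is 25%, so errors on class 1 are weighted three times as heavily.
class_weights = torch.tensor([1.0 / 0.75, 1.0 / 0.25])  # roughly [1.33, 4.0]
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Illustrative usage with raw logits from a two-class model:
logits = torch.randn(8, 2, requires_grad=True)  # batch of 8 examples, 2 classes
targets = torch.randint(0, 2, (8,))             # ground-truth labels
loss = criterion(logits, targets)
loss.backward()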
I have a data set for which I use scikit-learn's decision tree regression to build a model for prediction purposes. Subsequently, I am trying to use the scipy.optimize package to solve for the minimizing solution subject to a given constraint.
However, I am not sure if I can take the decision tree model as the objective function for the optimization problem. What should be the approach in a situation like this? I have tried linear regression models such as LarsCV in the past and they worked just fine. But in a linear regression model, you can essentially extract the coefficients and the intercept from the model.
Yes; a linear regression model is a straightforward linear function of its inputs, fully described by its coefficients (one of which is the "intercept" or "bias").
The problem you have now is that a more complex model isn't quite so simple. You need to load the model into an appropriate engine, and to "call" the model you feed that engine the input vector (the cognate of a list of arguments) and wait for it to return the prediction.
You need to wrap this process in a function call, perhaps one that issues the model load and prediction as external system / shell commands and returns the results to your main program. Some applications are large enough that it makes sense to implement a full data-streaming pipeline, with a listener and a reporter, to handle the throughput.
Does that get you moving?
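If the model lives in the same Python process (the usual case with scikit-learn), the wrapper can be much simpler than shelling out: just call model.predict inside the objective function. A minimal sketch with made-up data follows; note that a decision tree's prediction is piecewise constant, so a gradient-free or global optimizer (e.g. differential_evolution or Nelder-Mead) is a safer choice than derivative-based minimizers.

import numpy as np
from scipy.optimize import differential_evolution
from sklearn.tree import DecisionTreeRegressor

# Toy stand-in for the real trained model.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 2))
y = (X ** 2).sum(axis=1)  # true function: x0^2 + x1^2
model = DecisionTreeRegressor(max_depth=6).fit(X, y)

def objective(x):
    # The objective is simply the model's prediction for the candidate input.
    return model.predict(x.reshape(1, -1))[0]

result = differential_evolution(objective, bounds=[(-5, 5), (-5, 5)])
print(result.x, result.fun)

Constraints can then be handled through the optimizer's own mechanisms (bounds, penalty terms, or the constraints argument of the chosen method).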
I read that adding a new function is not so straightforward in this answer, so that is not a viable option. It was also mentioned here that it is possible to implement a Gaussian using the tools made available by tensorflow. Can someone please give a detailed answer on how exactly to implement a Gaussian activation function in tf such that it behaves like a normal non-linear function and can be trained by back-prop?
In general, if you want to implement a complex function in tensorflow, you can compose it from the basic mathematical operations tensorflow provides; many common functions are already implemented.
The functions in tensorflow's API are usually implemented with gradient computation in mind, so if you build your complex function out of differentiable tensorflow operations, it will be compatible with gradient descent.
So when you have a new function to implement, look up its mathematical formula; most of the time you will find corresponding mathematical operations in tensorflow.
Providing a specific, ready-to-use answer for your particular problem is not in your best interest or SO's, but check the formula of a Gaussian function and it should be easy to implement.
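As an illustration of that general approach (not a drop-in answer), a Gaussian-shaped activation composed only of differentiable tensorflow ops might look like the following sketch; since every op used is differentiable, back-prop works through it automatically.

import tensorflow as tf

def gaussian(x):
    # f(x) = exp(-x^2), built only from differentiable tensorflow operations.
    return tf.exp(-tf.square(x))

# Check that gradients flow through it:
x = tf.Variable([0.0, 1.0, 2.0])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(gaussian(x))
print(tape.gradient(y, x))  # analytically: -2 * x * exp(-x^2)

# It can then be passed as an activation, e.g. tf.keras.layers.Dense(32, activation=gaussian)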
I am going through the tensorflow tutorial. I would like to find a description of the following line:
tf.contrib.layers.embedding_column
I wonder if it uses word2vec or anything else, or maybe I am thinking in a completely wrong direction. I tried to click around on GitHub, but found nothing. I am guessing looking on GitHub is not going to be easy, since the Python might refer to some C++ libraries. Could anybody point me in the right direction?
I've been wondering about this too. It's not really clear to me what they're doing, but this is what I found.
In the paper on wide and deep learning, they describe the embedding vectors as being randomly initialized and then adjusted during training to minimize error.
Normally when you do embeddings, you take some arbitrary vector representation of the data (such as one-hot vectors) and then multiply it by a matrix that represents the embedding. This matrix can be found by PCA or while training by something like t-SNE or word2vec.
The actual code for the embedding_column is here, and it's implemented as a class called _EmbeddingColumn which is a subclass of _FeatureColumn. It stores the embedding matrix inside its sparse_id_column attribute. Then, the method to_dnn_input_layer applies this embedding matrix to produce the embeddings for the next layer.
def to_dnn_input_layer(self,
                       input_tensor,
                       weight_collections=None,
                       trainable=True):
  output, embedding_weights = _create_embedding_lookup(
      input_tensor=self.sparse_id_column.id_tensor(input_tensor),
      weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
      vocab_size=self.length,
      dimension=self.dimension,
      weight_collections=_add_variable_collection(weight_collections),
      initializer=self.initializer,
      combiner=self.combiner,
      trainable=trainable)
So as far as I can see, it seems like the embeddings are formed by applying whatever learning rule you're using (gradient descent, etc.) to the embedding matrix.
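To illustrate the mechanics in isolation: an embedding lookup is equivalent to multiplying a one-hot vector by a trainable matrix, which just selects a row of that matrix. A small NumPy sketch with arbitrary sizes:

import numpy as np

vocab_size, embedding_dim = 5, 3

# Randomly initialized embedding matrix, adjusted during training to minimize error.
embedding_matrix = np.random.randn(vocab_size, embedding_dim)

# One-hot representation of sparse id 2 ...
one_hot = np.zeros(vocab_size)
one_hot[2] = 1.0

# ... and multiplying by the matrix simply selects row 2:
dense = one_hot @ embedding_matrix
assert np.allclose(dense, embedding_matrix[2])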
I had a similar question about embeddings.
Here is the main point:
Adding an embedding layer alongside traditional wide linear models allows for accurate predictions by reducing sparse, high-dimensional features to a low-dimensional dense representation.
Here is a good post about it!
And here is a simple example combining embedding layers. It uses the Titanic Kaggle data to predict whether or not a passenger will survive, based on attributes like their name, sex, the ticket they had, the fare they paid, the cabin they stayed in, etc.
I would like to ask if anyone has an idea or example of how to do support vector regression in Python with high-dimensional output (more than one dimension) using a Python binding of libsvm. I checked the examples and they all assume the output to be one-dimensional.
libsvm might not be the best tool for this task.
The problem you describe is called multivariate regression, and for regression problems SVMs are not necessarily the best choice.
You could try something like group lasso (http://www.di.ens.fr/~fbach/grouplasso/index.htm - matlab) or sparse group lasso (http://spams-devel.gforge.inria.fr/ - seems to have a python interface), which solve the multivariate regression problem with different types of regularization.
Support Vector Machines as a mathematical framework are formulated in terms of a single prediction variable, so most libraries implementing them reflect this by accepting a single target variable in their API.
What you could do is train a single SVM model for each target dimension in your data.
On the plus side, you can train them in parallel on a cluster, since each model is independent of the others.
On the minus side, the sub-models share nothing, so they won't benefit from what each individually discovers about the structure of the input data, and they can need a lot of memory to store because there is no shared intermediate representation.
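A minimal sketch of this one-model-per-target approach using scikit-learn's SVR (which wraps libsvm under the hood); MultiOutputRegressor just automates the loop over target columns, and the data here is synthetic:

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([X[:, 0] + X[:, 1], np.sin(X[:, 2])])  # 2-dimensional target

# One independent SVR per output column; n_jobs=-1 fits them in parallel.
model = MultiOutputRegressor(SVR(kernel="rbf"), n_jobs=-1)
model.fit(X, Y)
print(model.predict(X[:3]))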
Variants of SVMs can probably be devised in a multi-task learning setting to learn some common kernel-based intermediate representation suitable for reuse when predicting multi-dimensional targets; however, this is not implemented in libsvm AFAIK. Google for "multi-task learning SVM" if you want to learn more.
Alternatively, multi-layer perceptrons (a kind of feed-forward neural network) can naturally deal with multi-dimensional outcomes and hence should be better at sharing intermediate representations of the data across targets, especially if they are deep enough, with the first layers pre-trained in an unsupervised manner using an autoencoder objective.
You might want to have a look at http://deeplearning.net/tutorial/ for a nice introduction to various neural network architectures and practical tools and examples to implement them efficiently.
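For comparison with the per-target SVR sketch above, here is a minimal multi-output neural network sketch (without the unsupervised pre-training mentioned above), using scikit-learn's MLPRegressor, which handles multi-dimensional targets natively; the data is the same synthetic kind as before:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
Y = np.column_stack([X[:, 0] + X[:, 1], np.sin(X[:, 2])])

# One network with a shared hidden representation and a 2-dimensional output layer.
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
mlp.fit(X, Y)
print(mlp.predict(X[:3]).shape)  # (3, 2)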