I am running a logistic regression with statsmodels and I am trying to add robustness to my model, similar to Stata's robust option, but I can't find it in the documentation. Note that I am not looking for robust linear regression with sm.RLM(), as that is a different model, not an add-on to the model I want to use.
Thanks in advance!
I am new to machine learning, but I have decent experience in Python. I am faced with a problem: I need to find a machine learning model that would work well to predict the speed of a boat given the current environmental and physical conditions. I have looked into scikit-learn, PyTorch, and TensorFlow, but I am having trouble finding information on what type of model I should use. I am almost certain that linear regression models would be useless for this task. I have been told that non-parametric regression models would be ideal for this, but I am unable to find many in the scikit-learn library. Should I be trying to use regression models at all, or should I be looking more into neural networks? I'm open to any suggestions, thanks in advance.
I think a multiple linear regression model would work well for your case. I am assuming that the input data is just a set of environmental parameters, each with a corresponding boat speed. For such problems, regression usually works well. I would not recommend using neural networks unless you have a lot of training data and each input sample is also quite large (many features).
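Rather than settling the linear versus non-parametric question up front, one option is to compare candidates with cross-validation. A minimal sketch, assuming simulated data: the three feature columns stand in for hypothetical environmental measurements, and a random forest is only one of scikit-learn's non-parametric regressors (k-nearest neighbours and gradient boosting are others).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated stand-in for real measurements: 3 hypothetical features with a
# deliberately non-linear relationship to the target "speed".
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
speed = 3 * np.sin(X[:, 0]) + 0.05 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)

models = {
    "linear": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    # Mean R^2 over 5 folds; higher is better.
    scores[name] = cross_val_score(model, X, speed, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

This simulated data is non-linear by construction, so it favours the forest. If the real relationship is close to linear, the linear model will score about as well and is easier to interpret.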
I need to use KNIME for regression analysis. I am a Python user; I know KNIME as well, but not in depth.
I usually use statsmodels in Python for regression analysis and for working on statistical models.
However, to solve a regression problem as a machine learning problem, I use scikit-learn's regression models. Each of these Python packages has its own benefits depending on the task, and a different style of output, which really matters for addressing the problem in the right way.
Here is my question: does KNIME provide any special package for statistical models? If I plan to do a regression analysis, which nodes are recommended?
Many thanks for your help
There's a Linear Regression Learner node under Analytics > Mining > Linear/Polynomial Regression in the node repository. Does that do what you need?
I need to get the parameters so I can use the model in another program.
I tried cat_model.coef_ and cat_model.intercept_, but neither worked. Is it possible to get the parameters?
I solved this problem: what I was trying to do is called "saving the model".
cat_model.save_model('cat_model.cbm')
The attributes .coef_ and .intercept_ only exist on linear models, such as scikit-learn's implementations of linear regression and logistic regression, where they give you the slopes and the intercept (if fitted). You can use .feature_importances_ instead.
For CatBoost, your model has something called feature importances. Since it is a gradient-boosted tree model, what you get back is how heavily each feature is used when splitting the trees.
cat_model.feature_importances_
will tell you that. However, you should do more research into how the model works and what it returns, because interpreting these importances can be somewhat deceptive.
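The distinction can be seen without CatBoost itself. As a hedged illustration, the sketch below swaps in scikit-learn's GradientBoostingRegressor (also a gradient-boosted tree model, but not CatBoost) on simulated data: it has no coef_ attribute, and its feature_importances_ is one non-negative score per feature (normalised to sum to 1 in scikit-learn) rather than a set of parameters you could plug into a formula in another program.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Simulated data: only the first two (hypothetical) features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 4 * X[:, 0] + np.abs(X[:, 1]) + rng.normal(scale=0.1, size=300)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

print(hasattr(gbr, "coef_"))     # False: tree ensembles have no linear coefficients
print(gbr.feature_importances_)  # one importance score per feature
```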
Is it possible to use "reinforcement learning" or a feedback loop on a supervised model?
I have worked on a machine learning problem using a supervised learning model, more precisely a linear regression model, but I would like to improve the results by creating a feedback loop on the outputs of the prediction, i.e., telling the algorithm when it made mistakes on some examples.
As far as I know, this is basically how reinforcement learning works: the model learns from positive and negative feedback.
I found out that we can implement supervised learning and reinforcement learning algorithms using PyBrain, but I couldn't find a way to connect the two.
Most (or maybe all) iterative supervised learning methods already use a feedback loop on the outputs of the prediction. In fact, this feedback is very informative, since it gives the exact amount of error for each sample. Think, for example, of stochastic gradient descent, where you compute the error of each sample to update the model parameters.
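To make this concrete, here is a minimal NumPy sketch of stochastic gradient descent for linear regression on simulated data (the learning rate, epoch count and data are arbitrary choices for illustration). After every sample the model receives the signed size of its mistake, error, and adjusts its parameters in proportion to it.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b by per-sample stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            error = X[i] @ w + b - y[i]  # per-sample feedback: signed prediction error
            w -= lr * error * X[i]       # gradient step on 0.5 * error**2 w.r.t. w
            b -= lr * error              # gradient step w.r.t. the intercept
    return w, b

# Simulated data with true weights [2, -3] and intercept 0.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.5 + rng.normal(scale=0.1, size=200)

w, b = sgd_linear_regression(X, y)
print(w, b)
```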
In reinforcement learning, the feedback signal (i.e., the reward) is much more limited than in supervised learning. Therefore, in the typical setup of adjusting some model parameters, if you have a set of input-output pairs (i.e., a training data set), it probably makes no sense to apply reinforcement learning.
If you are thinking of a more specific case or problem, you should be more specific in your question.
Reinforcement Learning has been used to tune hyper-parameters and/or select optimal Supervised Learning Models. There's also a paper on it: "Learning to optimize with Reinforcement Learning".
Having read Pablo's answer, you may want to read up on backpropagation. It may be what you are looking for.
I have a data set for which I use scikit-learn's decision tree regressor to build a model for prediction purposes. Subsequently, I am trying to use the scipy.optimize package to find the minimizing solution subject to a given constraint.
However, I am not sure whether I can use the decision tree model as the objective function of the optimization problem. What should the approach be in a situation like this? I have tried linear regression models such as LarsCV in the past and they worked just fine, but with a linear regression model you can simply extract the coefficients and the intercept from the model.
Yes; a linear regression model is a straightforward linear function of coefficients (one of which is the "intercept" or "bias").
The problem you have now is that a more complex model isn't so simple. You need to load the model into an appropriate engine. To "call" the model, you feed that engine the input vector (the analogue of a list of arguments) and wait for the model to return the prediction.
You need to wrap this process in a function call, perhaps one that issues the model loading and processing as external system/shell commands and returns the results to your main program. Some applications are large enough that it makes sense to implement a full-bore data stream with a listener and a reporter to handle the throughput.
Does that get you moving?
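If the model lives in the same Python process, the wrapper can be an ordinary function that calls predict. A minimal sketch on simulated data, with a made-up target and a hypothetical constraint x0 + x1 <= 9: a decision tree's prediction is piecewise constant, so its gradient is zero almost everywhere and gradient-based methods such as SLSQP in scipy.optimize.minimize tend to stall. The sketch therefore swaps in a derivative-free global optimizer, scipy.optimize.differential_evolution, which accepts bounds and (from SciPy 1.4) constraints.

```python
import numpy as np
from scipy.optimize import LinearConstraint, differential_evolution
from sklearn.tree import DecisionTreeRegressor

# Simulated training data: 2 hypothetical inputs and a made-up target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = (X[:, 0] - 3) ** 2 + (X[:, 1] - 7) ** 2
tree = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X, y)

def objective(x):
    # predict() expects a 2-D array: one row, one column per feature.
    return float(tree.predict(x.reshape(1, -1))[0])

# Hypothetical constraint: x0 + x1 <= 9.
constraint = LinearConstraint([[1.0, 1.0]], -np.inf, 9.0)

result = differential_evolution(
    objective,
    bounds=[(0, 10), (0, 10)],
    constraints=(constraint,),
    seed=0,
    maxiter=300,
    polish=False,  # local polishing is gradient-based and adds little on a piecewise-constant objective
)
print(result.x, result.fun)
```

Because the tree is only an approximation of the underlying relationship, the optimum it reports is the best leaf the optimizer found, not necessarily the true optimum of the real system.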