I need to use KNIME for regression analysis. I am a Python user; I know KNIME as well, but not in depth.
I usually use statsmodels in Python for regression analysis and for working on statistical models.
However, when solving a regression problem as a machine learning problem I use the sklearn regression models. Each of these Python packages has its own benefits depending on your task, and also a different view of the output, which is really important for addressing the problem in the right way.
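To make that difference concrete, here is a minimal sketch on toy data (the DataFrame and column names x1, x2, y are purely illustrative, not from any real dataset): statsmodels gives the inferential summary, while sklearn gives a fitted estimator geared towards prediction.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# toy data standing in for whatever DataFrame you actually have
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2.0 * df["x1"] - 1.0 * df["x2"] + rng.normal(scale=0.5, size=100)

# Statistical view: coefficients plus t-tests, F-test, R^2, confidence intervals
X_sm = sm.add_constant(df[["x1", "x2"]])
print(sm.OLS(df["y"], X_sm).fit().summary())

# Machine-learning view: an estimator used mainly for prediction
model = LinearRegression().fit(df[["x1", "x2"]], df["y"])
print(model.coef_, model.intercept_)
```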
Here is my question: does KNIME provide any special package for statistical models? If I plan to do a regression analysis, which nodes are recommended?
Many thanks for your help
There's a Linear Regression Learner node under Analytics > Mining > Linear/Polynomial Regression in the node repository. Does that do what you need?
Let's assume we're dealing with continuous features and responses. We fit a linear regression model (say, first order) and after cross-validation we get a reasonably good R² (say R² = 0.8).
Why do we go for other ML algorithms? I've read some research papers where the authors tried different ML algorithms and took the simple linear model as the base model for comparison. In these papers the linear model outperformed the other algorithms, and what I have difficulty understanding is why we go for other ML algorithms at all. Why can't we just be satisfied with the linear model, especially in the specific case where the other algorithms perform poorly?
The other question is: what do the authors gain from presenting the other algorithms in their papers if those algorithms performed poorly?
The best model for predicting a continuous output is a regression model, especially if you build it with a neural network (polynomial or linear) and tune the hyperparameters to the problem.
Other ML algorithms such as decision trees or SVMs were designed primarily for classification; on paper they can do regression as well, but in practice they struggle to predict genuinely new values (a tree, for example, cannot extrapolate beyond the range seen in training).
Still, in research people always try to find better ways to predict values beyond plain regression, just as in the classification world we started with logistic regression, moved on to decision trees, and now have SVMs, ensemble models, and deep learning.
I think the answer is: because you never know.
"especially in the specific case where other algorithms perform poorly?"
You only know they performed poorly because someone tried those models. It's always worth trying various models.
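To illustrate "you never know", here is a minimal sketch (on synthetic data, with arbitrary model choices that are my own, not the paper's) of comparing a linear baseline against other regressors by cross-validated R²; which one wins is something you only learn by running the comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# synthetic continuous features/response standing in for real data
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=1.0, size=200)

models = {
    "linear (baseline)": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svr (rbf)": SVR(),
}

# 5-fold cross-validated R^2 for each candidate model
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:18s} mean CV R^2 = {scores.mean():.3f}")
```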
I am new to machine learning, but I have decent experience in Python. I am faced with a problem: I need to find a machine learning model that would work well to predict the speed of a boat given current environmental and physical conditions. I have looked into scikit-learn, PyTorch, and TensorFlow, but I am having trouble finding information on what type of model I should use. I am almost certain that linear regression models would be useless for this task. I have been told that non-parametric regression models would be ideal for this, but I am unable to find many in the scikit-learn library. Should I be trying to use regression models at all, or should I be looking more into neural networks? I'm open to any suggestions, thanks in advance.
I think a multiple linear regression model would work well for your case. I am assuming that the input is just a set of environmental parameters and that you have a boat speed corresponding to each observation. For such problems, regression usually works well. I would not recommend neural networks unless you have a lot of training data and each input is also fairly high-dimensional.
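As a concrete starting point, here is a minimal sketch assuming the data lives in a CSV with hypothetical columns such as wind_speed, current_speed, hull_load and a boat_speed target; the file name and column names are placeholders, not anything from the original question.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# hypothetical column and file names; replace with your actual data
feature_cols = ["wind_speed", "current_speed", "hull_load"]
boat_df = pd.read_csv("boat_data.csv")

X = boat_df[feature_cols]
y = boat_df["boat_speed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# fit a multiple linear regression and check generalization on held-out data
model = LinearRegression().fit(X_train, y_train)
print("held-out R^2:", r2_score(y_test, model.predict(X_test)))
```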
I am running a logistic regression with statsmodels and I am trying to add robust standard errors to my model, similar to Stata's robust option, but I can't seem to find it in the documentation. Note that I am not looking for robust linear regression via sm.RLM(), as that is a different model, not an add-on to the model I want to use.
Thanks in advance!
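For what it's worth, statsmodels lets you request heteroskedasticity-robust (sandwich) covariance at fit time via the cov_type argument, which is the closest analogue I know of to Stata's robust option. A minimal sketch on toy data, assuming a statsmodels version recent enough that Logit.fit accepts cov_type:

```python
import numpy as np
import statsmodels.api as sm

# toy binary-outcome data standing in for the real design matrix
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = (X @ np.array([0.2, 1.0, -0.5]) + rng.logistic(size=500) > 0).astype(int)

# cov_type="HC1" requests heteroskedasticity-robust (sandwich) standard
# errors, roughly analogous to Stata's ", robust" option
result = sm.Logit(y, X).fit(cov_type="HC1")
print(result.summary())
```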
Are there Python packages that help with statistical linear regression? For example, I would hope such a program could automatically perform different statistical tests (t-test, F-test, etc.), automatically remove redundant variables, correct for heteroskedasticity, and so on. Or is LASSO just the best option?
You can perform and visualize linear regression in Python with a wide array of packages such as:
scipy, statsmodels, and seaborn. LASSO is available through statsmodels as described here. When it comes to automated approaches to linear regression analysis, you could start with forward selection with statsmodels, as described in an answer to the post Stepwise Regression in Python.
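As a rough illustration of what statsmodels itself covers (per-coefficient t-tests and the overall F-test in the summary, a heteroskedasticity check, robust standard errors, and a LASSO-style penalized fit), here is a minimal sketch on synthetic data; beyond the package name, none of it comes from the original answer.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# synthetic data where the third predictor is deliberately redundant
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=200)
X = sm.add_constant(X)

ols = sm.OLS(y, X).fit()
print(ols.summary())  # per-coefficient t-tests and the overall F-test

# Breusch-Pagan test for heteroskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)

# heteroskedasticity-robust standard errors if the test looks suspicious
print(sm.OLS(y, X).fit(cov_type="HC3").summary())

# LASSO-style fit: the L1 penalty tends to shrink the redundant coefficient to zero
lasso = sm.OLS(y, X).fit_regularized(alpha=0.1, L1_wt=1.0)
print("penalized coefficients:", lasso.params)
```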
I am trying to identify phonemes in voices using a training database of known ones.
I'm wondering if there is a way of identifying common features within my training sample and using that to classify a new one.
It seems like there are two paths:
Give the process raw/normalised data and it will return similar ones
Extract certain metrics such as pitch, formants, etc. and compare them to the training set
My interest is the first!
Any recommendations on machine learning or regression methods/algorithms?
Since you tagged Python, I highly recommend looking into scikit-learn, an excellent Python library for machine learning. Their docs are very thorough and should give you a good crash course in machine learning algorithms and implementation (including classification, regression, clustering, etc.).
Your points 1 and 2 are not really separate: 1) is the end result of a classification problem, and 2) describes the features you feed into the classifier. What you need is a good classifier (SVM, decision trees, hierarchical classifiers, etc.) and a good set of features (the pitch, formants, etc. that you mentioned).
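As a rough sketch of that classifier-plus-features idea, assuming you have already extracted per-sample metrics (pitch, formant values, and so on) into a feature matrix with one phoneme label per row; the feature extraction itself is not shown, and the data below is random placeholder data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# placeholder feature matrix: one row per training utterance,
# columns = extracted metrics (pitch, formants, ...)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.choice(["a", "e", "i"], size=300)  # phoneme labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# scale the features, then classify with an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```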