Predicting the election winner among 5 candidates using linear regression in Python

I have a project in which I need to build a prediction model using linear regression. The case study: predict which of 5 candidates will win an election. I don't have any data, so I need to build a dataset on my own, but I am not able to figure out what parameters (features) to use. Any help with building the data would be highly appreciated.

You can start by getting previous years' election winners as your training data. If you don't have any training data, you have a problem using linear regression (or any supervised learning). After that, if you want to use Python, try these steps:
Work through a good beginner tutorial such as https://machinelearningmastery.com/machine-learning-in-python-step-by-step/, and join a community like https://www.kaggle.com/ to get ideas from its kernels about preprocessing the data and tuning parameters.

As I understand this question, you need to create a model based on data you don't have yet. Presumably you will get the data later on, by which time the model should already be implemented. You can create a fake data set using the numpy.random module. We'd need more details on what exactly you're trying to do, though.
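For instance, a minimal sketch of that fake-data approach (the three features here are invented placeholders for illustration, not a claim about what actually predicts elections):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_races = 200                                   # synthetic past races

# Hypothetical features: campaign budget, past vote share, approval.
X = rng.uniform(0, 1, size=(n_races, 3))
# Synthetic target: vote share as a noisy linear mix of the features.
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.05, n_races)

model = LinearRegression().fit(X, y)

# Score 5 candidates and call the highest predicted share the winner.
candidates = rng.uniform(0, 1, size=(5, 3))
predicted_share = model.predict(candidates)
print("predicted winner: candidate", int(np.argmax(predicted_share)))

Once real data arrives, only the fake X and y need to be replaced; the rest of the pipeline stays the same.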

Related

Suggestions for nonparametric machine learning models

I am new to machine learning, but I have decent experience in Python. I am faced with a problem: I need to find a machine learning model that would work well to predict the speed of a boat given current environmental and physical conditions. I have looked into scikit-learn, PyTorch, and TensorFlow, but I am having trouble finding information on what type of model I should use. I am almost certain that linear regression models would be useless for this task. I have been told that non-parametric regression models would be ideal, but I am unable to find many in the scikit-learn library. Should I be trying to use regression models at all, or should I be looking more into neural networks? I'm open to any suggestions; thanks in advance.
I think a multiple linear regression model would work well for your case. I am assuming that the input data is just a set of environmental parameters, each with a corresponding boat speed. For such problems, regression usually works well. I would not recommend neural networks unless you have a lot of training data and each input is also quite large.
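A minimal sketch of that suggestion, with fabricated data standing in for the environmental readings (the three feature columns are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
# Hypothetical columns: wind_speed, current_speed, wave_height.
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, reg.predict(X_test)))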

Efficient way to compare effects of adding/removing multiple data-cleaning steps on the performance of a deep learning model?

Somewhat of a beginner here with deep learning in Python, and with Stack Overflow.
I am currently working on something similar to a sentiment analysis of community posts using LSTM, and have been trying to add preprocessing steps to clean up the text data.
I have lots of ideas - say, 7 - for modifying/dropping certain data without sacrificing context that I think could improve my prediction accuracy, but I want to be able to see exactly how implementing one or some of these ideas can affect the prediction accuracy.
So is there a tool, statistical method, or technique that will drastically cut down on the number of experiments (training the model and predicting on the test set) I need to run to see how "toggling on" one, two, or several of these preprocessing steps affects my prediction accuracy, instead of doing something like 49 experiments and filling in a 7x7 table? I have used the Taguchi method of design of experiments on a different kind of problem before, but I am not sure it applies properly here, since the neural network will be trained in a completely different way depending on the data it is fed.
Thank you for any input and advice!
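For concreteness, a minimal sketch of the screening pass being described: enumerate the single and pairwise toggle combinations with itertools and record accuracy for each. run_experiment below is a hypothetical stand-in for the clean/train/evaluate loop; the step names are likewise invented.

from itertools import combinations

STEPS = ["lowercase", "strip_urls", "drop_stopwords", "lemmatize",
         "strip_emoji", "expand_contractions", "spell_correct"]  # the 7 ideas

def run_experiment(active_steps):
    # Hypothetical stand-in: clean the corpus with active_steps,
    # train the LSTM, and return test-set accuracy.
    return 0.0

results = {}
for r in (1, 2):                              # singles and pairs only
    for combo in combinations(STEPS, r):
        results[combo] = run_experiment(combo)

# C(7,1) + C(7,2) = 7 + 21 = 28 runs, versus the 2**7 = 128 full subsets.
for combo, acc in sorted(results.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{acc:.3f}", combo)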

How to predict the results of a tobit model (trained using the AER library in R) with rpy2?

I have a tobit regression model working completely well in R, where I am also able to predict the actual output values for the test set using the inverse Mills ratio. However, the rest of my project's code is in Python, so I wanted to explore the rpy2 API and migrate the code from R to Python. I have managed everything up to model training with AER.tobit() from the AER library (in R). However, when it comes to predicting on test data, the code does not perform as expected: when I call robjects.r.predict(model, newdata) from rpy2, it just gives me the fitted values for the training data instead of predictions for the test data. If anybody knows a way around this, it would be a great help! Thanks in advance!
Let me know if you'd need more clarification on the problem.
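One plausible workaround (an assumption on my part, not a confirmed rpy2 behavior): pass newdata as a keyword argument and convert the pandas test frame to an R data.frame explicitly, since a positional second argument may not land in predict()'s newdata slot:

import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

stats = importr("stats")

# Hypothetical test frame; column names must match the model formula.
test_df = pd.DataFrame({"x1": [0.1, 0.5], "x2": [1.2, 0.3]})

# Convert pandas -> R data.frame under a local converter.
with localconverter(robjects.default_converter + pandas2ri.converter) as cv:
    r_newdata = cv.py2rpy(test_df)

# `model` is assumed to be the fitted AER.tobit() object from earlier.
preds = stats.predict(model, newdata=r_newdata)  # keyword, not positional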

How to perform multi-step out-of-time forecast which does not involve refitting the ARIMA model?

I have an already existing ARIMA (p,d,q) model fit to a time-series data (for ex, data[0:100]) using python. I would like to do forecasts (forecast[100:120]) with this model. However, given that I also have the future true data (eg: data[100:120]), how do I ensure that the multi-step forecast takes into account the future true data that I have instead of using the data it forecasted?
In essence, when forecasting I would like forecast[101] to be computed using data[100] instead of forecast[100].
I would like to avoid refitting the entire ARIMA model at every time step with the updated "history".
I fit the ARIMAX model as follows:
train, test = data[:100], data[100:]
ext_train, ext_test = external[:100], external[100:]
model = ARIMA(train, order=(p, d, q), exog=ext_train)
model_fit = model.fit(disp=False)
Now, the following code allows me to predict values for the entire dataset, including the test set:
forecast = model_fit.predict(end=len(data)-1, exog=external, dynamic=False)
However, in this case, after 100 steps the ARIMAX predicted values quickly converge to the long-run mean (as expected, since after 100 time steps it is using only the forecasted values). I would like to know if there is a way to provide the "future" true values to give better online predictions. Something along the lines of:
forecast = model_fit.predict_fn(end = len(data)-1, exog=external, true=data, dynamic=False)
I know I can always keep refitting the ARIMAX model by doing:
historical = train
historical_ext = ext_train
predictions = []
for t in range(len(test)):
    model = ARIMA(historical, order=(p, d, q), exog=historical_ext)
    model_fit = model.fit(disp=False)
    output = model_fit.forecast(exog=ext_test[t])[0]
    predictions.append(output)
    observed = test[t]
    historical.append(observed)
    historical_ext.append(ext_test[t])
but this leads to training the ARIMAX model again and again, which doesn't make a lot of sense to me. It uses a lot of computational resources and is quite impractical. It also makes the ARIMAX model harder to evaluate, because the fitted parameters keep changing every iteration.
Is there something incorrect about my understanding/use of the ARIMAX model?
You are right: if you want to do online forecasting using new data, you will need to estimate the parameters over and over again, which is computationally inefficient.
One thing to note is that for the ARIMA model it is mainly the estimation of the MA parameters that is computationally heavy, since these are estimated by numerical optimization rather than ordinary least squares. Since the parameters calculated for the initial model tell you roughly what to expect from future fits (one extra observation won't change them much), you might be able to initialize the parameter search at the previous estimates to improve computational efficiency; a sketch of this idea follows below.
There may also be a way to do the estimation itself more efficiently: since you already have the old data and the fitted parameters, adding one datapoint only requires computing the theta and phi terms that involve the new observation, while reusing the combinations you have already computed, which would save quite some time. I very much like this book: Heij, Christiaan, et al. Econometric Methods with Applications in Business and Economics. Oxford University Press, 2004.
And this lecture might give you some idea of how this might be feasible: lecture on ARIMA parameter estimation
You would have to implement this yourself, I'm afraid. As far as I can tell, there is nothing readily available to do this.
Hope this gives you some new ideas!
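For what it's worth, a minimal sketch of that warm-start idea, under the assumption of statsmodels' SARIMAX (whose fit() accepts start_params, so the optimizer can begin at the previous estimates):

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

data = np.random.randn(101).cumsum()          # stand-in series
res = SARIMAX(data[:100], order=(1, 1, 1)).fit(disp=False)

# One new observation arrives: refit, but start the numerical
# optimizer at the old estimates so it converges in a few steps.
res_new = SARIMAX(data[:101], order=(1, 1, 1)).fit(
    start_params=res.params, disp=False)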
As this very good blog post suggests (3 facts about time series forecasting that surprise experienced machine learning practitioners):
"You need to retrain your model every time you want to generate a new prediction." It also gives an intuitive explanation of why this happens, with examples. That basically highlights the central challenge of time-series forecasting: constant change, which calls for refitting.
I was struggling with this problem too. Luckily, I found a very useful discussion about it. As far as I know, this case is not supported by ARIMA in Python; we need to use SARIMAX instead.
You can refer to the link of discussion: https://github.com/statsmodels/statsmodels/issues/2788
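Building on that, a minimal sketch of the SARIMAX route, assuming statsmodels >= 0.11 (where the results object exposes append(..., refit=False), which folds in new true observations without re-estimating the parameters):

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

data = np.random.randn(120).cumsum()          # stand-in series
external = np.random.randn(120)               # stand-in exogenous regressor
train, test = data[:100], data[100:]
ext_train, ext_test = external[:100], external[100:]

res = SARIMAX(train, exog=ext_train, order=(1, 1, 1)).fit(disp=False)

predictions = []
for t in range(len(test)):
    # One-step-ahead forecast using every observation appended so far.
    predictions.append(res.forecast(steps=1, exog=ext_test[t:t+1])[0])
    # Append the true observation WITHOUT re-estimating the parameters.
    res = res.append(test[t:t+1], exog=ext_test[t:t+1], refit=False)

This gives the online, true-data-aware forecasts from the question while fitting the model only once.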

Prediction model to predict whether future events will happen next month

I have to develop a prediction model in Python to predict whether or not a site will crash next month, based on the occurrences in the last 6 months. The input parameters are: Environment (Dev, Prod, Test), Region (NA, APAC, EMEA) and the date of the month.
I am using matplotlib, pandas and numpy. I am not sure whether the data should be a 2D DataFrame or a 3D Panel in pandas, since there are 3 input parameters: Region, Env and Date.
I think the machine learning algorithm below should be used:
from sklearn.linear_model import LinearRegression
Please correct me if I am wrong.
Linear regression is fine, but calling it is just two lines of work. I would suggest trying multiple machine learning algorithms, tuning their hyperparameters, and checking which gives the best performance. Moreover, you should look into feature engineering; maybe you can extract features from the data you already have.
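A minimal sketch of that advice on fabricated placeholder data (since "crash or not" is a yes/no target, two classifiers are compared here; the DataFrame below is invented, not real incident logs):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    "environment": ["Dev", "Prod", "Test", "Prod"] * 25,
    "region":      ["NA", "APAC", "EMEA", "NA"] * 25,
    "day":         list(range(1, 101)),
    "crashed":     [0, 0, 1, 0] * 25,         # target: site crashed or not
})

# One-hot encode the categorical inputs; keep the numeric day column.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["environment", "region"])],
    remainder="passthrough",
)

for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    pipe = make_pipeline(encode, clf)
    scores = cross_val_score(pipe, df[["environment", "region", "day"]],
                             df["crashed"], cv=5)
    print(type(clf).__name__, scores.mean())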
