Predict python dataframe [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I'm new to Python. I have a dataframe that contains annual records from 1959 to 2009. Could you please tell me how to use it to predict values for, say, 2010 to 2012?
Any help is appreciated!

First of all, plot your data and have a look at it. You should then have a feel for what's going on, and also a subjective prediction of your own.
If your data seems completely random, without any obvious trend, calculate its average and use it as a first-guess prediction. (For fully random data, that is what a linear regression would give you as well.)
You can then use linear regression, either with Pandas' ols regression tools or NumPy's polyfit. Make sure you plot your data together with the regression line to see how well your prediction is doing.
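A minimal sketch of the NumPy route, with synthetic data standing in for your records (the numbers and the column name below are made up for illustration; substitute your actual dataframe column):

import numpy as np

# Synthetic stand-in for your data; replace `values` with something like
# df["your_column"].to_numpy() (hypothetical column name).
years = np.arange(1959, 2010)
rng = np.random.default_rng(0)
values = 2.5 * (years - 1959) + rng.normal(0.0, 5.0, years.size)

# Fit a straight line (degree-1 polynomial) to the history.
slope, intercept = np.polyfit(years, values, deg=1)

# Extrapolate the fitted trend to 2010-2012.
future = np.arange(2010, 2013)
predictions = slope * future + intercept
print(dict(zip(future, predictions)))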
And don't expect this method to work miracles. Complicated processes are much harder to predict than a straight line can capture, and 50-year-long processes, whatever they may be, are usually complicated enough.

Related

Why is my stacking regressor scoring worse than its components? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I'm using the following snippet of code:
The function test_submodels calculates the R² test score of each submodel and tosses out the bad ones (in this case only the SVM model), returning the new list model_names. Then I calculate the R² score of my stacked regressor, which turns out to be awful. The output of this code can be seen below:
Here is some more clarification regarding the submodels; they are created as follows:
I ended up fixing the problem: I had to define the final estimator in the stacking regressor, for example like this:
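Since the original snippet isn't reproduced here, the following is a minimal sketch of what that looks like with scikit-learn's StackingRegressor (the base models are placeholders, not the actual submodels):

from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

# Placeholder base models; the actual submodels are not shown.
estimators = [
    ("ridge", Ridge()),
    ("rf", RandomForestRegressor(n_estimators=100)),
]

# Pass final_estimator explicitly instead of relying on the default.
reg = StackingRegressor(
    estimators=estimators,
    final_estimator=RandomForestRegressor(n_estimators=100),
)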
This improves the stacking score to roughly 0.9.

Can we use Principal Components (PCA) with other features? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a dataset of 10 features. Three of these are categorical; when I apply one-hot encoding to these three, they blow up into 96 features. I reduced these 96 features into 20 by PCA.
I plan to use the 20 principal components and the remaining 7 features as my final feature set. Is this a good idea: to combine principal components with actual features?
PCA tends to represent combinations of the actual features, and most of the time this combination leads to some information loss. That is usually a fair trade-off for the dimensionality reduction. Adding those seven actual features won't make the dimensionality too large, and it will bring "back" some of the information lost by PCA.
But my advice would still be to try both and choose the one that gives better results (given your specification).
There is no theoretical problem with this approach. From a statistical standpoint, all you've done is to exclude those seven features from the PCA reduction. This implies that you know, a priori, that those seven features are principal components -- that they're significant to the results, without having to analyze them for independence from the other features, and for relevance.
As loeschet already mentioned, you should try it both ways: once the way you're proposing, and once with all 103 features included in your PCA phase. See which gives you better results. Much of data set analysis consists of trying different approaches to see which gives you the best empirical results.
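A minimal sketch of the proposed setup, assuming scikit-learn 1.2 or newer (where OneHotEncoder's dense-output flag is named sparse_output) and hypothetical column positions:

from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical layout: columns 0-2 are the categoricals, 3-9 the rest.
categorical_cols = [0, 1, 2]
numeric_cols = list(range(3, 10))

preprocess = ColumnTransformer([
    # One-hot encode the categoricals (dense output so PCA can consume
    # it), then reduce the resulting ~96 columns to 20 components.
    ("cat", Pipeline([
        ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False)),
        ("pca", PCA(n_components=20)),
    ]), categorical_cols),
    # Pass the remaining 7 features through unchanged.
    ("num", "passthrough", numeric_cols),
])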

Convex Optimization in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I recently got interested in soccer statistics. Right now I want to implement the famous Dixon-Coles model in Python 3.5 (paper link).
The basic problem is that the model described in the paper yields a likelihood function with numerous parameters, which needs to be maximized.
For example: the likelihood function for one Bundesliga season involves 37 parameters. Naturally, I minimize the corresponding negative log-likelihood function instead. I know that this log-likelihood is strictly convex, so the optimization should not be too difficult. I also supplied the analytic gradient, but once the number of parameters exceeds ~10, the optimization methods from the SciPy package (scipy.optimize.minimize()) fail.
My question:
Which other optimization techniques are out there, and which are best suited to optimization problems involving ~40 independent parameters?
Some hints to other methods would be great!
You may want to have a look at convex optimization packages like https://cvxopt.org/ or https://www.cvxpy.org/. They are Python-based, hence easy to use!
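For a sense of the CVXPY workflow, here is a minimal sketch with a toy convex objective over ~40 variables (this is not the Dixon-Coles likelihood, just the mechanics):

import cvxpy as cp
import numpy as np

# Toy convex objective; random data stands in for a real problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 40))
b = rng.standard_normal(100)

x = cp.Variable(40)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
constraints = [cp.sum(x) == 0]  # e.g. an identifiability constraint

problem = cp.Problem(objective, constraints)
problem.solve()
print(problem.value, x.value[:5])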
You can make use of metaheuristic algorithms, which work on both convex and non-convex spaces. Probably the most famous of them is the genetic algorithm. It is also easy to implement, and the concept is straightforward. The beautiful thing about genetic algorithms is that you can adapt them to solve most optimization problems.
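Rather than hand-rolling a genetic algorithm, a quick way to try this family of methods is SciPy's differential_evolution, a closely related population-based evolutionary optimizer. A minimal sketch with a toy objective standing in for the real likelihood:

import numpy as np
from scipy.optimize import differential_evolution

# Toy objective standing in for a negative log-likelihood.
def objective(params):
    return np.sum((params - 1.0) ** 2)

bounds = [(-5.0, 5.0)] * 40  # ~40 independent parameters
result = differential_evolution(objective, bounds, seed=0)
print(result.fun, result.x[:5])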

complex non-linear equations in python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Is there any way to find the solutions of a set of non-linear complex equations in Python?
I need to solve the Bethe equations of the Heisenberg model (e.g. equation 15 of http://arxiv.org/pdf/1201.5627v1.pdf).
SciPy includes nonlinear numerical solvers, but you may want to consider dedicated software such as Wolfram Mathematica, especially for computation speed.
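A minimal sketch of the SciPy route: because SciPy's solvers work on real vectors, a complex equation is split into its real and imaginary parts (the toy equation below is for illustration, not the Bethe equations):

import numpy as np
from scipy.optimize import root

# Toy complex equation z**2 + 1 = 0, solved by writing z = x + iy and
# treating the real and imaginary parts as two real equations.
def equations(v):
    z = v[0] + 1j * v[1]
    f = z ** 2 + 1
    return [f.real, f.imag]

solution = root(equations, x0=[0.5, 0.5])
print(solution.x)  # approximately [0, 1], i.e. z = i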
If the maths itself becomes the problem at some point, consider posting on the Mathematics Stack Exchange site.
You can use Sage; the Sage notebook is its browser-based interface.
Most of the scientific/mathematical Python libraries (SciPy, NumPy, SymPy, ...) are integrated with Sage, so you do not have to call these libraries explicitly.

Fitting an exponential approach/asymptotic power law in R/Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
How can I fit my data to an asymptotic power law curve or an exponential approach curve in R or Python?
My data essentially shows that the y-value increases continuously, but the delta (increase) decreases as x increases.
Any help will be much appreciated.
Using Python, if you have NumPy and SciPy installed, you can use curve_fit from the scipy.optimize module. It takes a user-defined function and x- as well as y-values (x_values and y_values in the code below), and returns the optimized parameters and the covariance of the parameters.
import numpy
import scipy.optimize

def exponential(x, a, b):
    # Model: y = a * exp(b * x)
    return a * numpy.exp(b * x)

# x_values and y_values are your data as one-dimensional numpy arrays.
fit_data, covariance = scipy.optimize.curve_fit(exponential, x_values, y_values, p0=(1.0, 1.0))
This answer assumes you have your data as one-dimensional numpy arrays; you can easily convert your data into these, though.
The last argument (p0) contains starting values for your optimization. If you don't supply them, curve_fit may have trouble determining the number of parameters, and a poor default start can keep the fit from converging.
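Incidentally, for the shape described in the question (y keeps increasing while the increments shrink), an exponential-approach model such as y = a*(1 - exp(-b*x)) may be a better candidate than pure exponential growth. A minimal sketch with synthetic data (the model form is an assumption about your data):

import numpy
import scipy.optimize

def exponential_approach(x, a, b):
    # y rises toward the asymptote a; the increments shrink as x grows.
    return a * (1.0 - numpy.exp(-b * x))

# Synthetic demonstration data; replace with your own x_values/y_values.
x_values = numpy.linspace(0.0, 10.0, 50)
y_values = exponential_approach(x_values, 3.0, 0.7) + numpy.random.normal(0.0, 0.05, 50)

params, covariance = scipy.optimize.curve_fit(exponential_approach, x_values, y_values, p0=(1.0, 1.0))
print(params)  # roughly [3.0, 0.7]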
