I have some data that looks like this
What is the typical way to do a polynomial fit of z as a function of x and y? I have used numpy.polyfit in the past to do similar things in two dimensions, so I suppose I could just iterate through all the points and then fit those answers with another 1-D polyfit. However, it seems there should be a more straightforward way.
By the way, the picture shows two different sets of data that would be fit with different equations.
It seems to me that what you really want is to fit a surface (linear or spline) z(x, y), but you have only a single line of data. This is like solving for two unknowns with only one equation: the problem is, basically, how could you decide whether the difference in your red line from A to B was caused by the change in PSI or by the change in V?
My suggestions:
1. Fit a surface to your existing dataset; you will get something (see the sketch after this list).
2. Try to get more data, so that you can fit a more accurate surface.
3. Do what you first wanted: fit a function separately in each dimension, then combine them and use the best of the three functions (the one fitted for PSI, the one fitted for V, and the combined one).
4. Try to combine your PSI and V factors, with some clever physics-based trick, into one significant factor that contains both of them.
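For suggestion 1, here is a minimal sketch (the helper names and the choice of quadratic terms are my own) of fitting a polynomial surface z(x, y) with ordinary least squares via numpy.linalg.lstsq:

import numpy as np

def fit_poly_surface(x, y, z):
    # design matrix columns: 1, x, y, x*y, x^2, y^2
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def eval_poly_surface(coeffs, x, y):
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    return A @ coeffs

# usage: coeffs = fit_poly_surface(x, y, z); z_hat = eval_poly_surface(coeffs, x, y)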
Let's say the price of houses (the target variable) can easily be plotted against the area of houses (the predictor variable): we can see the data plotted and draw a best-fit line through it.
However, suppose we have several predictor variables (size, number of bedrooms, locality, number of floors, etc.). How am I going to plot all of these against the target variable and visualize them on a 2-D figure?
The computation shouldn't be an issue (the math works regardless of dimensionality), but the plotting definitely gets tricky. PCA can be hard to interpret and forcing orthogonality might not be appropriate here. I'd check out some of the advice provided here: https://stats.stackexchange.com/questions/73320/how-to-visualize-a-fitted-multiple-regression-model
Fundamentally, it depends on what you are trying to communicate. Goodness of fit? Maybe throw together multiple plots of residuals.
If you truly want a 2-D figure, that's certainly not easy. One possible approach would be to reduce the dimensionality of your data to 2 using something like Principal Component Analysis; then you can plot it in two dimensions again. Reducing to 3 dimensions instead of 2 might also still work; humans can understand 3-D plots drawn on a 2-D screen fairly well.
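As a rough sketch of that PCA idea (the data, coefficients, and column meanings below are made up for illustration; scikit-learn and matplotlib are assumed to be available):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # made-up predictors: size, bedrooms, locality, floors
y = X @ np.array([3.0, 1.5, 0.5, 2.0]) + rng.normal(scale=0.5, size=100)  # made-up prices

X2 = PCA(n_components=2).fit_transform(X)   # squash 4 predictors into 2 components

plt.scatter(X2[:, 0], X2[:, 1], c=y)        # colour encodes the target (price)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.colorbar(label="price")
plt.show()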
You don't normally need to do linear regression by hand though, so you don't need a 2D drawing of your data either. You can just let your computer compute the linear regression, and that works perfectly fine with way more than 2 or 3 dimensions.
Is there a more intelligent function than scipy.optimize.curve_fit in Python?
I also need to define a function to fit data with.
I've spent ages trying to fit data with it. I can fit only basic functions, and fitting two lines with a piecewise function seems impossible when the y-axis has low values like 0.01-0.05 and the x-axis has values like 20-60.
I know I have to plug in initial values, but still it takes too much time and sometimes it does not work.
EDIT
I added a graph of the data I fitted, where you can see the effect of changing the bounds in scipy.optimize.curve_fit.
The function I fit with is this one:
import numpy as np

def abslines(x, a, b, c, d):
    # two straight segments that meet at the break point x = -b/a
    return np.piecewise(x, [x < -b/a, x >= -b/a],
                        [lambda x: a*x + b + d, lambda x: c*(x + b/a) + d])
The initial conditions are the same every time, and I think they are close enough:
p0=[-0.001,0.2,0.005,0.]
because the best-fit parameter values are:
[-0.00411946 0.19895546 0.00817832 0.00758401]
Bounds are:
No bounds;
bounds=([-1.,0.,0.,0.],[0.,1.,1.,1.])
bounds=([-0.5,0.01,0.0001,0.],[-0.001,0.5,0.5,1.])
bounds=([-0.1,0.01,0.0001,0.],[-0.001,0.5,0.1,1.])
bounds=([-0.01,0.1,0.0001,0.],[-0.001,0.5,0.1,1.])
(starting with no bounds and ending with the best bounds)
Still, I think this takes too much time, and curve_fit should be able to do better. This way I practically have to specify the function myself, and it feels like I am doing the fitting by changing parameters rather than curve_fit doing it.
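For reference, here is a minimal runnable version of this setup (the data below are synthetic, generated from the best-fit parameters quoted above plus noise, just so the example runs):

import numpy as np
from scipy.optimize import curve_fit

def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a],
                        [lambda x: a*x + b + d, lambda x: c*(x + b/a) + d])

# synthetic data built from the quoted best-fit parameters, for illustration only
rng = np.random.default_rng(1)
x = np.linspace(20, 60, 200)
y = abslines(x, -0.00411946, 0.19895546, 0.00817832, 0.00758401)
y = y + rng.normal(scale=0.002, size=x.size)

p0 = [-0.001, 0.2, 0.005, 0.0]
popt, pcov = curve_fit(abslines, x, y, p0=p0)
print(popt)   # should land near the quoted best-fit values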
Without knowing exactly which regression algorithm Python uses here, it is quite impossible to give a definitive answer. The computation is probably iterative and requires initial guesses, which are probably derived from the specified bounds. So the bounds have an indirect effect on the convergence and the results.
I suggest trying a simpler algorithm (non-iterative, requiring no initial guess) from this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The code is easy to write in any computer language. I suppose this can be done with Python as well.
The piecewise function to be fitted is:

    y = p1*x + q1   for x <= a1
    y = p2*x + q2   for x >= a1

where a1 is the break point. The parameters to be computed are a1, p1, q1, p2 and q2.
The result is shown on the next figure, with the approximate values of the parameters.
Thus no bounds need to be specified, and as a consequence there are no problems related to bounds.
NOTE: The method is based on fitting a convenient integral equation, as shown in the paper referenced above. The numerical computation of the integral is subject to deviations if the number of points is too small. In the present case there are a large number of points, so even though the data is scattered, this is a favourable case for the practical application of the method.
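For completeness, a simple non-iterative alternative in the same spirit (no bounds, no initial guesses) is a brute-force scan over candidate break points. This is not the paper's integral-equation method, just a sketch that is easy to write in Python: fit the two segments by ordinary least squares for every split and keep the split with the smallest residual.

import numpy as np

def piecewise_linear_fit(x, y):
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(2, len(x) - 1):               # at least 2 points per segment
        p1, q1 = np.polyfit(x[:i], y[:i], 1)     # left line
        p2, q2 = np.polyfit(x[i:], y[i:], 1)     # right line
        resid = (np.sum((p1 * x[:i] + q1 - y[:i])**2) +
                 np.sum((p2 * x[i:] + q2 - y[i:])**2))
        if best is None or resid < best[0]:
            a1 = 0.5 * (x[i - 1] + x[i])         # break point between the segments
            best = (resid, a1, p1, q1, p2, q2)
    return best[1:]                              # a1, p1, q1, p2, q2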
1. The algorithms behind curve_fit expect differentiable functions, so things can go south if you give them a non-differentiable one.
2. For a more powerful interface to curve fitting, have a look at lmfit.
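A small sketch of what that looks like for the fit in the question (this assumes lmfit is installed, and reuses the abslines function and the x, y data from the example above):

from lmfit import Model

model = Model(abslines)                       # parameter names inferred from the signature
params = model.make_params(a=-0.001, b=0.2, c=0.005, d=0.0)
params["a"].max = 0.0                         # bounds are attached per parameter
result = model.fit(y, params, x=x)
print(result.fit_report())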
Basically, I'm trying to make this function happen:
where I'm solving for beta; gamma, alpha, and x all come from the data.
Originally, I just used the summary statistic mean(x_i/gamma_i), which meant that everything in that summation could be pre-calculated, and I would then present a simple np array to the non-linear optimizer. But now there is no way to pre-calculate the summary statistic, since it is not immediately clear how beta will affect f when f changes in response to alpha_i. So I'm not sure how to go about presenting that array. Is it possible to embed those covariates as lists (numpy objects) so as to still present a numpy array, and then unpack the list within the residual function? Am I going about this the wrong way?
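One way to do this (a sketch; the real model f is not shown in the question, so model() below is a hypothetical stand-in): pass the per-observation covariate arrays to scipy.optimize.least_squares through its args tuple and unpack them inside the residual function, with no need for object arrays.

import numpy as np
from scipy.optimize import least_squares

def model(beta, x, gamma, alpha):
    # hypothetical stand-in for whatever f(x_i, gamma_i, alpha_i; beta) really is
    return beta[0] * x / gamma + beta[1] * alpha

def residuals(beta, x, gamma, alpha, y):
    return model(beta, x, gamma, alpha) - y

# made-up data, just to show the mechanics
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
gamma = rng.uniform(1.0, 2.0, size=n)
alpha = rng.normal(size=n)
y = 2.0 * x / gamma + 0.5 * alpha + rng.normal(scale=0.1, size=n)

# the covariate arrays ride along through `args` and are unpacked per call
fit = least_squares(residuals, x0=[1.0, 1.0], args=(x, gamma, alpha, y))
print(fit.x)   # recovered beta, roughly [2.0, 0.5]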
I have a number of data sets, each containing x, y, and y_error values, and I'm simply trying to calculate the average value of y at each x across these data sets. However, the data sets are not all the same length. I thought the best way to bring them to an equal length would be to use scipy's interpolate.interp1d on each data set. However, I still need to be able to calculate the error on each of these averaged values, and I'm quite lost on how to accomplish that after doing an interpolation.
I'm pretty new to Python and coding in general, so I appreciate your help!
As long as you can assume that your errors represent one-sigma intervals of normal distributions, you can always generate synthetic datasets, resample and interpolate those, and compute the 1-sigma errors of the results.
Or just interpolate values+err and values-err, if all you need is a quick and dirty rough estimate.
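A minimal sketch of the resampling approach described above, assuming each y_error is a 1-sigma normal uncertainty and that x_grid lies inside every dataset's x range:

import numpy as np
from scipy.interpolate import interp1d

def average_with_errors(datasets, x_grid, n_draws=1000, seed=None):
    # datasets: list of (x, y, y_err) array triples
    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, len(x_grid)))
    for k in range(n_draws):
        curves = []
        for x, y, y_err in datasets:
            y_synth = y + rng.normal(scale=y_err)        # resample within 1-sigma errors
            curves.append(interp1d(x, y_synth)(x_grid))  # bring onto the common grid
        draws[k] = np.mean(curves, axis=0)               # average across data sets
    return draws.mean(axis=0), draws.std(axis=0)         # averaged y and its 1-sigma error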
So if you have 4 vectors, and you know their directions (which could all be different), and you can change all their magnitudes, what would you set the magnitudes to if you want the 4 vectors to add up to another vector that you know?
This might seem like a little bit too specific question, but the reason I ask this is for a program I'm making with kRPC for KSP in which 4 tilted engines hover an aircraft, even when the entire aircraft is tilted. I tried searching it, but I didn't know exactly what to search. I don't know a lot about the math of vectors. Thanks!
It may not always be possible; it depends on the vectors. Technically, your target vector must be in the linear span of your four input vectors.
You can write it as the following matrix equation:
Ax = b
with b the target vector, x the vector of unknown magnitudes (the coefficients), and A the matrix made by stacking your four vectors column-wise. A unique solution x exists if and only if the matrix A is invertible.
You can solve this using the numpy.linalg.solve function.
In your case, you may have a problem if you have 4 vectors of dimension 3 (or 2): the matrix is then not square, the solution is not unique, and it gets tricky. In practice, you would need to ditch one (or two) of the vectors to keep only 3 (or 2) independent vectors.
You can still use numpy.linalg.lstsq to get a least-squares solution, though.
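A small numpy sketch of both points (the engine directions and the target below are made up for illustration):

import numpy as np

# columns are the four (made-up) engine direction vectors
directions = np.array([[ 1.0, -1.0,  1.0, -1.0],
                       [ 1.0,  1.0, -1.0, -1.0],
                       [ 1.0,  1.0,  1.0,  1.0]])
target = np.array([0.0, 0.0, 4.0])           # e.g. thrust straight up

magnitudes, *_ = np.linalg.lstsq(directions, target, rcond=None)
print(magnitudes)                            # one of infinitely many exact solutions here
print(directions @ magnitudes)               # check: reproduces the target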
From what I understand, you are trying to sum vectors, and you are looking for a particular solution to get an end vector. In a way you are asking how to solve a problem that looks like this:
a1V1 + a2V2 + a3V3 + a4V4 = aV
where the V are vectors and the a are magnitudes. I am assuming that the V are fixed (the way your plane's jets are positioned?). I am also assuming that your vectors are 3-dimensional.
So from the math alone, you have 3 dimensions and 4 parameters, meaning that one of the 'a' values can be chosen freely. I don't think I can help any further than that without knowing the exact vectors.
In terms of code, solving for the 'a' values means solving a system of linear equations, typically done with scipy's linalg:
https://docs.scipy.org/doc/scipy-0.15.1/reference/linalg.html
In particular, scipy.linalg.solve().
Again, you have to be very aware of the dimensions of the problem: if you put all 4 vectors into the matrix, you will run into a lot of issues. Something must be done to determine one of the 'a' values before trying to solve the equations, as in the sketch below.
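A sketch of that, with made-up vectors: fix one magnitude by hand, then the remaining system is square and scipy.linalg.solve applies.

import numpy as np
from scipy.linalg import solve

V = np.array([[ 1.0, -1.0,  1.0, -1.0],      # columns are V1..V4 (made-up directions)
              [ 1.0,  1.0, -1.0, -1.0],
              [ 1.0,  1.0,  1.0,  1.0]])
target = np.array([0.0, 0.0, 4.0])

a4 = 1.0                                       # pick the free magnitude first
a123 = solve(V[:, :3], target - a4 * V[:, 3])  # the remaining 3x3 system is square
print(np.append(a123, a4))                     # all four magnitudes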
Edit: I thought a bit more about the problem and realized that there is a constraint in the system: the 'a' values cannot be negative. So it seems like there are unique solutions to the system.
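As a side note (my suggestion, not part of the original answer): that non-negativity constraint can be handled directly with scipy's non-negative least-squares solver, reusing the same illustrative vectors as above.

import numpy as np
from scipy.optimize import nnls

V = np.array([[ 1.0, -1.0,  1.0, -1.0],
              [ 1.0,  1.0, -1.0, -1.0],
              [ 1.0,  1.0,  1.0,  1.0]])
target = np.array([0.0, 0.0, 4.0])

magnitudes, residual = nnls(V, target)       # minimises ||V a - target|| subject to a >= 0
print(magnitudes)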