I have used scikit-learn polynomial regression to fit a polynomial of order 4 to some data. I am interested in finding the roots, in other words the points at which the curve crosses the x-axis (where y = 0). When I looked at numpy's polynomial class, I found that it has a function called roots to do this. Is there something similar in scikit-learn?
I have written the code below, which checks for a sign change between consecutive y values, giving me an approximate location of where the curve crosses the x-axis. But I want to know if there is a better method.
#Equation for reference
YY4 = clf4.intercept_[0] + clf4.coef_[0][1] * XX + clf4.coef_[0][2]*np.power(XX,2) + clf4.coef_[0][3]*np.power(XX,3) + clf4.coef_[0][4]*np.power(XX,4)
def neutralState(Y):  # Y holds the y-axis values
    NS_val, NS_index = None, None
    for i in range(1, len(Y)):  # start at 1 so Y[i-1] exists
        if (Y[i-1] < 0 and Y[i] > 0) or (Y[i-1] > 0 and Y[i] < 0):
            NS_val = Y[i]   # value just after the sign change
            NS_index = i
    return (NS_val, NS_index)  # last sign change found (None if no crossing)

print(neutralState(YY4))
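scikit-learn itself has no root finder, but since the fitted model is just a polynomial in its coefficients, one option is to hand those coefficients to numpy's polynomial class and call roots() there. A minimal sketch with stand-in coefficients (for the fitted model you would build the coefficient list from clf4.intercept_[0] and clf4.coef_[0][1:], in the ascending order of the equation above):

```python
import numpy as np

# Stand-in for the fitted coefficients; with a fitted model you would use
# np.polynomial.Polynomial([clf4.intercept_[0], clf4.coef_[0][1], ...,
#                           clf4.coef_[0][4]])
# Here we build a degree-4 polynomial with known roots for illustration.
poly = np.polynomial.Polynomial.fromroots([-3.0, -0.5, 1.0, 2.0])

roots = np.asarray(poly.roots())                     # all roots, possibly complex
real_roots = roots.real[np.abs(roots.imag) < 1e-9]   # keep only real crossings
print(np.sort(real_roots))
```

This recovers all crossings at once instead of scanning for sign changes, and also catches crossings that fall between two sampled XX values.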
I'm trying to find the distance between a fitted hyperplane and five points. Most of the responses I've read use SVM, but I'm not trying to do a classification problem. I know there are probably multiple ways to do this in Python, but I'm a little stumped.
As an example here are my points:
[[ 163.3828172 169.65537306 144.69201418]
[-212.50951396 -167.06555958 56.69388025]
[-164.65129832 -163.42420063 -149.97008725]
[ 41.8704004 52.2538316 14.0683657 ]
[-128.38386078 -102.76840542 -303.4960438 ]]
To find the equation of a fitted plane, I use SVD to compute the coefficients [a, b, c, d] of the plane equation ax + by + cz + d = 0.
import numpy as np

def fit_plane(points):
    assert points.shape[1] == 3
    centroid = points.mean(axis=0)
    x = points - centroid[None, :]
    U, S, Vt = np.linalg.svd(x.T @ x)
    # normal vector of the best-fitting plane is the left
    # singular vector corresponding to the smallest singular value
    normal = U[:, -1]
    # calculate the distance from the origin
    origin_distance = normal @ centroid
    return np.hstack([normal, -origin_distance])
fit_plane(X)
Giving the equation:
-0.67449074x + 0.73767288y -0.03001614z -10.75632119 = 0
Now how do I calculate the distance between the points and the hyperplane? The formula I've seen used in conjunction with SVMs is d = |w^T x + b| / ||w||, but I don't know how to apply it starting from the equation I already have.
You can find the distance between a plane π and a point P by dropping a perpendicular N from P to π and finding the point A where N and π intersect. The distance you are looking for is the distance between A and P.
This video explains the math of finding A (although it is about finding the reflection, finding A is part of it).
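Because the normal returned by the SVD fit above is already unit length, the perpendicular-drop construction collapses to exactly the SVM-style formula: the distance of a point from the plane [a, b, c, d] is |a·x + b·y + c·z + d| / ||(a, b, c)||. A small sketch (the plane z = 0 used here is just a hypothetical example with an easy-to-check answer; pass in the vector from fit_plane instead):

```python
import numpy as np

def point_plane_distances(points, plane):
    """Distances from the rows of `points` to the plane [a, b, c, d]
    defined by a*x + b*y + c*z + d = 0."""
    normal, d = plane[:3], plane[3]
    return np.abs(points @ normal + d) / np.linalg.norm(normal)

# hypothetical check: the plane z = 0 is [0, 0, 1, 0]
pts = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, -2.0]])
print(point_plane_distances(pts, np.array([0.0, 0.0, 1.0, 0.0])))
```

With the output of fit_plane, `point_plane_distances(points, fit_plane(points))` gives the perpendicular distance of each input point from the fitted plane.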
I am trying to estimate the integral of x·cos(x) over [0, 2π] using the Monte Carlo method (in Python):
I am using 1000 random points to estimate the integral. Here's my code:
import random
import numpy as np

N = 1000  # total number of points to be generated

def f(x):
    return x*np.cos(x)

## Points between the x-axis and the curve will be stored in these empty lists.
red_points_x = []
red_points_y = []
blue_points_x = []
blue_points_y = []

## The loop checks if a point is between the x-axis and the curve or not.
i = 0
while i < N:
    x = random.uniform(0, 2*np.pi)
    y = random.uniform(3.426*np.cos(3.426), 2*np.pi*np.cos(2*np.pi))
    if (0 <= x <= np.pi and 0 <= y <= f(x)) or (np.pi/2 <= x <= 3*np.pi/2 and f(x) <= y <= 0) or (3*np.pi/2 <= x <= 2*np.pi and 0 <= y <= f(x)):
        red_points_x.append(x)
        red_points_y.append(y)
    else:
        blue_points_x.append(x)
        blue_points_y.append(y)
    i += 1

area_of_rectangle = (2*np.pi)*(2*np.pi*np.cos(2*np.pi))
area = area_of_rectangle*(len(red_points_x))/N
print(area)
Output:
7.658813015245341
But that's far from 0 (the analytic solution)
Here's a visual representation of the area I am trying to plot:
Am I doing something wrong or missing something in my code? Any help would be much appreciated.
TLDR: I believe the way you calculate the approximation is slightly wrong.
Looking at the Wikipedia definition of Monte Carlo integration, the following estimator is given:
https://en.wikipedia.org/wiki/Monte_Carlo_integration#Example
Q_N = (V/N) * Σ f(x_i), where V is the volume (here, the length) of the region the sample points x_i are drawn from. For this integral that region is the x interval [0, 2π], so V = 2π. In other words, Q_N is V times the average of the function evaluated at the randomly generated points. Hence:
i = 0
total = 0
while i < N:
    x = random.uniform(0, 2 * np.pi)
    total += f(x)
    i += 1

interval_length = 2 * np.pi  # V, the length of the sampling interval
area = (interval_length * total) / N
Averaged over 1000 runs with N = 1000 (to smooth out the influence of the randomly generated values), this yields estimates close to the analytic value of 0. As you increase N, the accuracy increases.
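For reference, the standard mean-value estimator ∫f ≈ (b−a)·mean(f) can also be written without the explicit loop using numpy; a sketch, with N raised to 100000 to tighten the estimate:

```python
import numpy as np

N = 100_000
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=N)          # sample points in [0, 2*pi]
estimate = 2 * np.pi * np.mean(x * np.cos(x))  # V * average of f
print(estimate)                                # close to the analytic value 0
```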
You are on the right track!
A couple pointers to put you on course...
Make your bounding box bigger in the y dimension to alleviate some of the confusing math. Yes, it will converge faster if you get it to "just touch" the max and min, but don't shoot for that yet. Heck, just make it -5 < y < 10 and you will have a nice (larger) box that covers the area you want to integrate. So change your y generation to that range, and change the area-of-the-box calculation to match.
Don't change x, you have it right 0 < x < 2*pi
When you are comparing a point to see if it is "under the curve" you do NOT need to check the x value... right? Just check whether y is between f(x) and the axis; if so, it is "red". More on this in the next point.
Also on the point above, you will also need another category for the points that are BELOW the x-axis, because you will want to reduce your total by that amount. An alternate "trick" is to shift your whole function up by some constant such that the entire integral is positive, and then reduce your total by the size of that rectangle (constant * width)
Also, as you work on this, plot your points with matplotlib, it should be very easy the way you have your points gathered to overlay scatter plots with what you have and see if it looks visually accurate!
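Putting those pointers together, here is a hit-or-miss sketch along the suggested lines (the −5 < y < 10 box and the separate below-axis count are the ideas above; all names are illustrative):

```python
import numpy as np

def f(x):
    return x * np.cos(x)

N = 100_000
rng = np.random.default_rng(1)
x = rng.uniform(0, 2 * np.pi, size=N)
y = rng.uniform(-5, 10, size=N)          # the bigger, simpler bounding box

above = (y >= 0) & (y <= f(x))           # between axis and curve, curve above axis
below = (y <= 0) & (y >= f(x))           # between axis and curve, curve below axis

box_area = 2 * np.pi * 15                # width * height of the bounding box
estimate = box_area * (above.sum() - below.sum()) / N
print(estimate)                          # converges toward the analytic value 0
```

The boolean masks make the matplotlib check easy too: scatter x[above] / x[below] in one color and the rest in another and eyeball the picture.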
Comment me back w/ further q's... you got this!
I am trying to approximate a function using the Discrete Fourier Transform, being given 2M+1 values of the function.
I've seen a few different expression for the coefficients and the approximation, but the ones I was originally trying were (12) and (13) as in http://www.chebfun.org/docs/guide/guide11.html
(I apologize for the link, but apparently Stack Overflow does not support Latex.)
I have one function that computes the approximation given the coefficients, and another that calculates the coefficients (it also returns the first function, bound to those coefficients). I've tested with some values but the results weren't close at all. I compared both of them with numpy.fft.fft: the coefficients didn't match, and passing the FFT output to the first function did not result in a good approximation either, so the coefficients aren't the only problem.
Here is my code:
import math
import cmath
import numpy as np

def model(cks, x):
    n = len(cks)
    assert(n % 2 == 1)
    M = (n-1)//2
    def soma(s):
        soma = 0
        for i in range(n):
            m = -M + i
            soma += cks[i]*cmath.exp(1j*m*s)
        return soma
    soma = np.vectorize(soma)
    return soma(x)

def fourier(y):
    n = len(y)
    assert(n % 2 == 1)
    M = (n-1)//2
    def soma(k):
        soma = 0
        for i in range(n):
            t = 2*math.pi*i/n
            soma += y[i]*cmath.exp(-1j*k*t)
        return (1/n)*soma
    cks = np.zeros(n, dtype='complex')
    for i in range(n):
        j = -M + i
        cks[i] = soma(j)
    return cks, (lambda x: model(cks, x))
I'm not sure I understand your code, but it looks to me like you have a forward and an inverse DFT there. One of those doesn't use pi, but it should.
If you're interested in obtaining interpolating samples, you can apply the DFT, pad it with zeros, then compute the inverse DFT (I'm using MATLAB code, because that is what I know, but I think it's fairly easy to read):
f = randn(1,21); % an input signal to be interpolated
F = fft(f); % forward DFT
F = fftshift(F); % shift zero frequency to middle of array
F = [zeros(1,60),F,zeros(1,60)]; % pad with equal number of zeros on both sides
F = ifftshift(F); % shift zero frequency back to first array element
fi = ifft(F) * length(F)/length(f); % inverse DFT, normalize
% `fi` is the interpolated `f`
% plotting
x = linspace(1,length(fi)+1,length(f)+1);
x = x(1:end-1);
plot(x,f,'x');
xi = 1:length(fi);
hold on
plot(xi,fi);
If you feel like you need to implement the DFT and inverse DFT from scratch, know that you can implement the latter using the former.
If you want to create a continuous function as a summation of shifted sine functions, follow the equation for the Fourier series, with A_n and Φ_n given by the amplitude and phase of the elements of the DFT.
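For anyone working in Python, the same zero-padding interpolation can be done with numpy.fft. This sketch pads a 21-sample signal to 7× its length so that every 7th interpolated sample lands exactly on an original sample:

```python
import numpy as np

f = np.random.randn(21)                    # an input signal to be interpolated
F = np.fft.fftshift(np.fft.fft(f))         # shift zero frequency to the middle
F = np.concatenate([np.zeros(63), F, np.zeros(63)])  # pad both sides equally
fi = np.fft.ifft(np.fft.ifftshift(F)) * len(F) / len(f)  # inverse DFT, normalize
# `fi` is the interpolated `f`; it passes through the original samples:
print(np.allclose(fi[::7].real, f))
```

Padding symmetrically around the shifted spectrum preserves conjugate symmetry, so the imaginary part of `fi` is zero up to rounding error.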
I have used numpy's polyfit and obtained a very good fit (using a 7th order polynomial) for two arrays, x and y. My relationship is thus;
y(x) = p[0]* x^7 + p[1]*x^6 + p[2]*x^5 + p[3]*x^4 + p[4]*x^3 + p[5]*x^2 + p[6]*x^1 + p[7]
where p is the polynomial array output by polyfit.
Is there a way to reverse this method easily, so I have a solution in the form of,
x(y) = p[0]*y^n + p[1]*y^(n-1) + .... + p[n]*y^0
No, there is no easy way in general: closed-form root solutions are not available for arbitrary polynomials of degree five or higher, let alone the seventh order.
Doing the fit in the reverse direction is possible, but only on monotonically varying regions of the original polynomial. If the original polynomial has minima or maxima on the domain you are interested in, then even though y is a function of x, x cannot be a function of y because there is no 1-to-1 relation between them.
If you are (i) OK with redoing the fitting procedure, and (ii) OK with working piecewise on single monotonic regions of your fit at a time, then you could do something like this:
import numpy as np

# generate a random coefficient vector a
degree = 1
a = 2 * np.random.random(degree + 1) - 1

# an assumed true polynomial y(x)
def y_of_x(x, coeff_vector):
    """
    Evaluate a polynomial with coeff_vector and degree len(coeff_vector)-1 using Horner's method.
    Coefficients are ordered by increasing degree, from the constant term at coeff_vector[0],
    to the linear term at coeff_vector[1], to the n-th degree term at coeff_vector[n].
    """
    b = 0
    for c in coeff_vector[::-1]:
        b = b * x + c
    return b

# generate some data
my_x = np.arange(-1, 1, 0.01)
my_y = y_of_x(my_x, a)

# verify that polyfit in the "traditional" direction gives the correct result
# [::-1] b/c polyfit returns coeffs in backwards order rel. to y_of_x()
p_test = np.polyfit(my_x, my_y, deg=degree)[::-1]
print(p_test, a)

# fit the data using polyfit but with y as the independent var, x as the dependent var
p = np.polyfit(my_y, my_x, deg=degree)[::-1]

# define x as a function of y
def x_of_y(yy, a):
    return y_of_x(yy, a)

# compare results
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(my_x, my_y, '-b', x_of_y(my_y, p), my_y, '-r')
Note: this code does not check for monotonicity but simply assumes it.
By playing around with the value of degree, you should see that the code only works well for all random values of a when degree=1. It occasionally does OK for other degrees, but not when there are lots of minima/maxima. It never works perfectly for degree > 1 because approximating parabolas with square-root functions doesn't always work, and so on.
I am using Python. I know that to find the probability for a multivariate normal distribution I have to use the following density:
f(x1, …, xk) = 1/√((2π)^k |Σ|) * exp(−(1/2) (x−μ)^T Σ^(−1) (x−μ))
where x = [x1, x2]
I have different values of x1 and x2.
but here I have to find probability for:
0.5< x1<1.5 and 4.5< x2<5.5
I know how to use this formula for a single pair of values (x1, x2), but I am confused in this case. Please help.
What you need to do is find the area beneath the function over the rectangle bounded by 0.5 < x1 < 1.5 and 4.5 < x2 < 5.5.
As a quick and dirty solution, you could use this code to do a two-variable Riemann sum to estimate the integral. A Riemann sum just divides the rectangle into small regions and approximates the area under each region as if the function were flat.
Provided you've defined your distribution as the function f.
x1Low = 0.5
x1Hi = 1.5
x2Low = 4.5
x2Hi = 5.5
x1steps = 1000
x2steps = 1000
x1resolution = (x1Hi-x1Low)/x1steps
x2resolution = (x2Hi-x2Low)/x2steps
area = x1resolution*x2resolution
x1vals = [x1Low + i*x1resolution for i in range(x1steps)]
x2vals = [x2Low + i*x2resolution for i in range(x2steps)]
total = 0
for i in range(len(x1vals)):
    for j in range(len(x2vals)):
        total += area * f(x1vals[i], x2vals[j])
print(total)
Keep in mind that this sum is only an approximation, and not a great one either. It will seriously over- or under-estimate the integral in regions where the function changes too quickly.
If you need more accuracy, you can try implementing the trapezoidal rule or Simpson's rule, or look into scipy's numerical integration tools.
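One of those scipy tools fits this problem directly: scipy.stats.multivariate_normal exposes the joint CDF, and the rectangle probability follows by inclusion-exclusion, P = F(b1,b2) − F(a1,b2) − F(b1,a2) + F(a1,a2). A sketch assuming an example mean and covariance (substitute your own μ and Σ):

```python
from scipy.stats import multivariate_normal

mu = [1.0, 5.0]           # example mean -- replace with yours
cov = [[1.0, 0.0],        # example covariance -- replace with yours
       [0.0, 1.0]]
dist = multivariate_normal(mean=mu, cov=cov)

a1, b1 = 0.5, 1.5
a2, b2 = 4.5, 5.5
# inclusion-exclusion over the four rectangle corners
p = (dist.cdf([b1, b2]) - dist.cdf([a1, b2])
     - dist.cdf([b1, a2]) + dist.cdf([a1, a2]))
print(p)
```

This avoids the discretization error of the Riemann sum entirely, since scipy evaluates the CDF with an adaptive integration routine.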