Find tangent line satisfying certain requirements to curve equation with optimizer? - python

I have what seems to be a relatively simple geometric optimization question, but one in which I have absolutely no experience.
I have a bunch of data that I am fitting a spline through to find its equation, such as the below,
And this is the equation I fit to the dots,
with the code,
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
plt.figure(figsize=(6.5, 4))
plt.axis([0.017,.045,.0045,.014])
plt.scatter(x,y,color='orange')
spl = UnivariateSpline(x, y)
plt.plot(xs, spl(xs), 'b', lw=3)
plt.show()
What I want to do is find the equation of a line, y = mx + b, with a given y-intercept b, that touches the blue curve at a tangent point. So, for example, if my line has to pass through y=.006 (its y-intercept), what slope m makes the line tangent to the blue curve?
I thought about doing a regular optimization over the derivatives of the curve at a large number of points, but I am unsure how to set this up in a rigorous way.
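One possible way to set this up without scanning many points, sketched under assumptions: a line with y-intercept b is tangent to the curve at x0 when spl(x0) - x0*spl'(x0) = b, so the problem reduces to one-dimensional root finding, and the slope is then m = spl'(x0). The data below is synthetic (samples of sqrt(x), chosen only because the tangent point is known in closed form), s=0 and the search bracket are illustrative choices, and tangent_through_intercept is a hypothetical helper name.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.optimize import brentq

def tangent_through_intercept(spl, b, lo, hi):
    """Return (x0, m) so that y = m*x + b is tangent to spl at x0.

    Assumes g(x) = spl(x) - x*spl'(x) - b changes sign on [lo, hi].
    """
    dspl = spl.derivative()  # first derivative as a new spline
    g = lambda x: float(spl(x)) - x * float(dspl(x)) - b
    x0 = brentq(g, lo, hi)   # root of the tangency condition
    return x0, float(dspl(x0))

# Synthetic stand-in for the real data: samples of sqrt(x)
x = np.linspace(0.5, 4.0, 50)
y = np.sqrt(x)
spl = UnivariateSpline(x, y, s=0)  # s=0 interpolates; noisy data would want smoothing

x0, m = tangent_through_intercept(spl, b=0.75, lo=x[0], hi=x[-1])
print(x0, m)
```

For sqrt(x) the exact tangent point for b = 0.75 is x0 = 2.25 with slope 1/3, so the printed values should land close to those. With real data, the bracket [lo, hi] has to be chosen so that the tangency condition changes sign inside it; if several tangent lines exist, each needs its own bracket.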

Related

Curvature when extrapolated using Natural Cubic Spline

One of the assumptions behind the Natural Cubic Spline is that at the endpoints of the interval of interpolation, the second derivative of the spline polynomials is set to be equal to 0. I tried to show that using the Natural Cubic Spline via CubicSpline from scipy.interpolate in the example (code below).
from scipy.interpolate import CubicSpline
from numpy import linspace
import matplotlib.pyplot as plt
runge_f = lambda x: 1 / (1 + 25*x**2)
x = linspace(-2, 2, 11)
y = runge_f(x)
cs = CubicSpline(x, y, bc_type = "natural")
t = linspace(-5, 5, 1000)
plt.plot(x, y, "p", color="red")
plt.plot(t, runge_f(t), color="black")
plt.plot(t, cs(t), color="lightblue")
plt.show()
In the presented example, the extrapolated points' curvature is not equal to zero - shouldn't the extrapolation outside the interval be linear in the Natural Cubic Spline?
The curvature (second derivative) of the spline at the end points is indeed 0, as you can check by running this
print(cs(x[0],2), cs(x[-1], 2))
which calculates second derivatives at both ends of your x interpolation interval. However that does not mean the spline is flat beyond the limits -- it continues on as a cubic polynomial. If you want to extrapolate linearly outside the range, you have to do it yourself. It is easy enough to extrapolate flat: replace your cs=.. line with
from scipy.interpolate import interp1d
cs = interp1d(x,y,fill_value = (y[0],y[-1]), bounds_error = False)
to get something like this:
but a bit more work to extrapolate linearly. I am not sure there is a Python function for that (or rather I am sure there is one I just don't know what it is)
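A minimal sketch of doing the linear extrapolation by hand, assuming it is acceptable to wrap the spline in a small function: inside the knot range the spline is evaluated as usual, and outside it the curve continues along the tangent line at the nearest end point. The name with_linear_extrapolation is hypothetical, not a SciPy function.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def with_linear_extrapolation(cs):
    """Wrap a CubicSpline so it continues as a straight line outside its knots."""
    xa, xb = cs.x[0], cs.x[-1]      # first and last breakpoints
    ya, yb = cs(xa), cs(xb)         # values at the ends
    ma, mb = cs(xa, 1), cs(xb, 1)   # slopes (first derivatives) at the ends

    def f(t):
        t = np.asarray(t, dtype=float)
        inside = cs(np.clip(t, xa, xb))  # spline is only evaluated within its range
        left = ya + ma * (t - xa)        # tangent line at the left end
        right = yb + mb * (t - xb)       # tangent line at the right end
        return np.where(t < xa, left, np.where(t > xb, right, inside))

    return f

runge_f = lambda x: 1 / (1 + 25 * x**2)
x = np.linspace(-2, 2, 11)
y = runge_f(x)
cs = CubicSpline(x, y, bc_type="natural")
cs_lin = with_linear_extrapolation(cs)

t = np.linspace(-5, 5, 1000)
y_ext = cs_lin(t)  # can be plotted in place of cs(t)
```

Because the natural boundary condition already sets the second derivative to zero at both ends, the straight-line continuation joins the spline with matching value, slope and curvature.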

The line of best fit doesn't match the scatter plot

Below is my scatter plot with a linear regression line. Just by looking at how the markers are distributed on the plot, I feel like the line is not covering them correctly. From what I see, it is supposed to be more of a diagonal, straight line instead of a curve. Here is the code producing the plot:
for i in range(len(linkKarmaList)):
    plt.scatter(commentKarmaList[i], linkKarmaList[i], marker="o", s=len(clearModSet[i])*1.0*0.9)
x = numpy.asarray(commentKarmaList)
y = numpy.asarray(linkKarmaList )
plt.plot(numpy.unique(x), numpy.poly1d(numpy.polyfit(x, y, 1))(numpy.unique(x)))
plt.xlabel('Comment Karma ')
plt.ylabel('Link Karma')
plt.title('Link and comment Karma of most popular Forums on reddit')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.show()
Am I interpreting that correctly? What am I missing?
You're trying to fit a straight line y = a*x + b, which doesn't look like a straight line in log-space. Instead, you should be plotting a straight line in log-space.
This comes down to log(y) = a * log(x) + b
Which we can then rewrite to log(y) = log(x^a) + b
If we then take the exponent of this, we find:
y = x^a * 10^b or just y = C * x^a, where C (=10^b) and a are the fitting parameters and x and y are your data.
This is the function that makes a straight line in log-log space, which is the function you should try to fit against your data.
From what you show, I'd say the problem is that in the log-log plot the scatterplot looks more or less like a line.
The problem is that you're fitting against natural values and then plotting in a log-log plot.
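As a rough illustration of fitting in log-space, assuming all x and y values are strictly positive: fit a degree-1 polynomial to log10(x) and log10(y), then convert the intercept back with C = 10^b. The arrays below are synthetic stand-ins (an exact power law), not the karma data from the question.

```python
import numpy as np

# Synthetic stand-in for the karma data: an exact power law y = 3 * x**1.5
x = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
y = 3.0 * x**1.5

# Fit a straight line to the logarithms: log10(y) = a*log10(x) + b
a, b = np.polyfit(np.log10(x), np.log10(y), 1)
C = 10**b

# Evaluate the fitted power law y = C * x**a on the sorted x values
x_line = np.unique(x)
y_line = C * x_line**a
print(a, C)
```

Plotting x_line against y_line on the log-log axes (instead of the output of the linear polyfit) should then give a straight line through the markers.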

How to make two plots distinct in python(matplotlib) in same plot

I have written a simple Python program to solve the simple harmonic oscillator using both the Euler method and the analytical solution, but the two curves seem to coincide perfectly (I am not sure how or why, since I expected them to differ). Because the curves overlap, I have not been able to tell them apart. Even though they overlap, is there any way to make them distinct using matplotlib's features? Thanks
import matplotlib.pyplot as plt
import math as m
g=9.8
v=0.0 #initial velocity
h=0.01 #time step
x=5.0 #initial position
w=m.sqrt(10.0)
t=0.0
ta,xa,xb=[],[],[]
while t<12.0:
    ta.append(t)
    xa.append(x)
    xb.append(5*m.cos(w*t))
    v=v-(10.0/1.0)*x*h #k=10.0, m=1.0
    x=x+v*h
    t=t+h
plt.figure()
plt.plot(ta,xa,ta,xb,'bo--')
plt.xlabel('$t(s)$')
plt.ylabel('$x(m)$')
plt.show()
One way is changing color and reducing the opacity of one plot:
plt.plot(ta,xa)
plt.plot(ta,xb,c='red',alpha=.5)
instead of:
plt.plot(ta,xa,ta,xb,'bo--')
When zoomed in:
You can also scatter one and plot the other:
plt.plot(ta,xa)
plt.scatter(ta,xb,c='red',alpha=.3)
You could call the plot method twice, like
plt.plot(ta, xa, 'bo--')
plt.plot(ta, xb, 'gs')
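If the aim is also to check whether the two solutions really are identical, a possible complement to restyling the lines is to look at their difference on its own scale. The sketch below reuses the update rule and constants from the question and records numerical minus analytical at each step; diff and max_err are names introduced here for illustration.

```python
import math

h = 0.01                 # time step, as in the question
w = math.sqrt(10.0)      # angular frequency for k = 10, m = 1
x, v, t = 5.0, 0.0, 0.0  # initial position, velocity and time
ta, diff = [], []
while t < 12.0:
    ta.append(t)
    diff.append(x - 5 * math.cos(w * t))  # numerical minus analytical
    v = v - 10.0 * x * h                  # same update order as the question
    x = x + v * h
    t = t + h

max_err = max(abs(d) for d in diff)
print(max_err)
# plt.plot(ta, diff) would show the difference without the 5 m amplitude hiding it
```

The difference is small compared with the amplitude of 5, which is why the two curves look identical when drawn on the same axes.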

plotting curve decision boundary in python using matplotlib

I am new to machine learning with Python. I've managed to draw the straight decision boundary for logistic regression using matplotlib. However, I am having some difficulty plotting a curved boundary to understand the case of overfitting using a sample dataset.
I am trying to build a logistic regression model and use regularization to control overfitting on my data set.
I am aware of the sklearn library, but I prefer writing the code myself.
The test data sample I am working on is given below:
x=np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650')
y=np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0')
The decision boundary I am expecting is given in the graph below:
Any help would be appreciated.
I could plot a straight decision boundary using the code below:
# plot of x 2D
plt.figure()
pos=np.where(y==1)
neg=np.where(y==0)
plt.plot(X[pos[0],0], X[pos[0],1], 'ro')
plt.plot(X[neg[0],0], X[neg[0],1], 'bo')
plt.xlim([min(X[:,0]),max(X[:,0])])
plt.ylim([min(X[:,1]),max(X[:,1])])
plt.show()
# plot of the decision boundary
plt.figure()
pos=np.where(y==1)
neg=np.where(y==0)
plt.plot(x[pos[0],1], x[pos[0],2], 'ro')
plt.plot(x[neg[0],1], x[neg[0],2], 'bo')
plt.xlim([x[:, 1].min()-2 , x[:, 1].max()+2])
plt.ylim([x[:, 2].min()-2 , x[:, 2].max()+2])
plot_x = [min(x[:,1])-2, max(x[:,1])+2] # Takes a larger decision line
plot_y = (-1/theta_NM[2])*(theta_NM[1]*plot_x +theta_NM[0])
plt.plot(plot_x, plot_y)
And my decision boundary looks like this:
In an ideal scenario the above decision boundary is good, but I would like to plot a curved decision boundary that fits my training data very well but overfits my test data, similar to the one shown in the first plot.
This can be done by gridding the parameter space and setting each grid point to the value of the closest point. Then running a contour plot on this grid.
But there are numerous variations, such as setting it to a value of a distance-weighted average; or smoothing the final contour; etc.
Here's an example for finding the initial contour:
import numpy as np
import matplotlib.pyplot as plt
# get the data as numpy arrays
xys = np.array(np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650'))
vals = np.array(np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0'))[:,0]
N = len(vals)
# some basic spatial stuff
xs = np.linspace(min(xys[:,0])-2, max(xys[:,0])+1, 10)
ys = np.linspace(min(xys[:,1])-100, max(xys[:,1])+100, 10)
xr = max(xys[:,0]) - min(xys[:,0]) # ranges so distances can weight x and y equally
yr = max(xys[:,1]) - min(xys[:,1])
X, Y = np.meshgrid(xs, ys) # meshgrid for contour and distance calcs
# set each gridpoint to the value of the closest data point:
Z = np.zeros((len(xs), len(ys), N))
for n in range(N):
    Z[:,:,n] = ((X-xys[n,0])/xr)**2 + ((Y-xys[n,1])/yr)**2 # stack arrays of distances to each points
z = np.argmin(Z, axis=2) # which data point is the closest to each grid point
v = vals[z] # set the grid value to the data point value
# do the contour plot (use only the level 0.5 since values are 0 and 1)
plt.contour(X, Y, v, cmap=plt.cm.gray, levels=[.5]) # contour the data point values
# now plot the data points
pos=np.where(vals==1)
neg=np.where(vals==0)
plt.plot(xys[pos,0], xys[pos,1], 'ro')
plt.plot(xys[neg,0], xys[neg,1], 'bo')
plt.show()

Python Curve Fitting

I'm trying to use Python to fit a curve to a set of points. Essentially the points look like this.
The blue curve indicates the data entered (in this case 4 points), with the green being a curve fit using np.polyfit and np.poly1d. What I essentially want is a curve fit that looks very similar to the blue line but with a smoother change in gradient at points 1 and 2 (meaning I don't require the line to pass through these points).
What would be the best way to do this? The line looks like an arctangent, is there any way to specify an arctangent fit?
I realise this is a bit of a rubbish question but I want to get away without specifying more points. Any help would be greatly appreciated.
It seems that you might be after interpolation between points rather than fitting a polynomial. References: Spline Interpolation with Python and Fitting polynomials to data.
However, in either case here is a code snippet that should get you started:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x = np.array([0,5,10,15,20,30,40,50])
y = np.array([0,0,0,12,40,40,40,40])
# least-squares polynomial fit
coeffs = np.polyfit(x, y, deg=4)#you can change degree as you see fit
poly = np.poly1d(coeffs)
yp = np.polyval(poly, x)
# cubic interpolation through the points, evaluated on a new grid
new_length = 10
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = interp1d(x, y, kind='cubic')(new_x)
plt.plot(x, y, '.', x, yp, '-', new_x,new_y, '--')
plt.show()
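On the question of specifying an arctangent fit directly, one option is scipy.optimize.curve_fit with a user-defined model. The sketch below is only an illustration under assumptions: the four-parameter form a*arctan(b*(x - c)) + d and the name arctan_model are choices made here, the data is synthetic (generated from known parameters rather than taken from the question), and the starting guess p0 is the kind of rough estimate one might read off a plot.

```python
import numpy as np
from scipy.optimize import curve_fit

def arctan_model(x, a, b, c, d):
    """Scaled and shifted arctangent: amplitude a, steepness b, centre c, offset d."""
    return a * np.arctan(b * (x - c)) + d

# Synthetic data generated from known parameters (not the asker's points)
true_params = (12.0, 0.5, 17.0, 20.0)
x = np.linspace(0, 50, 26)
y = arctan_model(x, *true_params)

# Rough starting guess for (a, b, c, d)
p0 = [10.0, 1.0, 15.0, 20.0]
popt, pcov = curve_fit(arctan_model, x, y, p0=p0)
print(popt)
```

With only a handful of points and a very sharp step, the steepness parameter may be poorly constrained, so a sensible p0 (and possibly the bounds argument of curve_fit) matters more than it does in this clean example.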
