Calculating decision function of SVM manually - python

I'm attempting to calculate the decision_function of an SVC classifier MANUALLY (as opposed to using the built-in method), using the Python library scikit-learn.
I've tried several methods; however, I can only ever get the manual calculation to match when I don't scale my data.
z is a test datum (that's been scaled), and I think the other variables speak for themselves (also, I'm using an RBF kernel, if that's not obvious from the code).
Here are the methods that I've tried:
1. Looping method:
dec_func = 0
for j in range(np.shape(sup_vecs)[0]):
    norm2 = np.linalg.norm(sup_vecs[j, :] - z)**2
    dec_func = dec_func + dual_coefs[0, j] * np.exp(-gamma * norm2)
dec_func += intercept
2. Vectorized method:
diff = sup_vecs - z
norm2 = np.sum(diff**2, axis=1)  # squared Euclidean norm of each row
dec_func = dual_coefs.dot(np.exp(-gamma * norm2)) + intercept
However, neither of these ever returns the same value as decision_function. I think it may have something to do with rescaling my values, or, more likely, it's something silly that I've been overlooking!
Any help would be appreciated.

So after a bit more digging and head scratching, I've figured it out.
As I mentioned above, z is a test datum that's been scaled. To scale it I had to extract the .mean_ and .scale_ attributes from the preprocessing.StandardScaler() object (after calling .fit() on my training data, of course).
I was then using this scaled z as an input both to my manual calculations and to the built-in function. However, the built-in function was part of a pipeline which already had StandardScaler as its first step, and as a result z was getting scaled twice!
Hence, when I removed scaling from my pipeline, the manual answers "matched" the built-in function's answer.
I say "matched" in quotes, by the way, as I found I always had to flip the sign of my manual calculations to match the built-in version. Currently I have no idea why this is the case.
To conclude, I misunderstood how pipelines worked.
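For illustration (my sketch, not from the original post; the data and variable names are hypothetical), the pitfall looks like this: a pipeline that contains a StandardScaler must be given the raw datum, while the manual calculation uses the scaled one.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train = np.random.rand(50, 3)           # hypothetical training data
y_train = np.random.randint(0, 2, 50)
z_raw = np.random.rand(1, 3)              # one raw (unscaled) test datum

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
pipe.fit(X_train, y_train)

dec_pipe = pipe.decision_function(z_raw)  # the pipeline scales z_raw internally

# For the manual calculation, scale z exactly once with the fitted scaler
scaler = pipe.named_steps['standardscaler']
z_scaled = scaler.transform(z_raw)
# Passing z_scaled back into pipe.decision_function would scale it twice.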
For those who are interested, here are the final versions of my manual methods:
diff = sup_vecs - z_scaled

# Looping method
dec_func_loop = 0
for j in range(np.shape(sup_vecs)[0]):
    norm2 = np.linalg.norm(diff[j, :])
    dec_func_loop = dec_func_loop + dual_coefs[j] * np.exp(-gamma * (norm2**2))
dec_func_loop = -1 * (dec_func_loop - intercept)

# Vectorized method
norm2 = np.array([np.linalg.norm(diff[n, :]) for n in range(np.shape(sup_vecs)[0])])
dec_func_vec = -1 * (dual_coefs.dot(np.exp(-gamma * (norm2**2))) - intercept)
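As an aside (my addition, assuming the same variables as above), the per-row norm comprehension can be replaced by a single scipy call that computes all squared distances at once:
from scipy.spatial.distance import cdist

sq_dists = cdist(sup_vecs, z_scaled.reshape(1, -1), 'sqeuclidean').ravel()
dec_func_cdist = -1 * (dual_coefs.dot(np.exp(-gamma * sq_dists)) - intercept)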
Addendum
For those who are interested in implementing a manual method for a multiclass SVC, the following link is helpful: https://stackoverflow.com/a/27752709/1182556

Related

Inner workings of pytorch autograd.grad for inner derivatives

Consider the following code:
import torch
from torch import autograd

x = torch.tensor(2.0, requires_grad=True)
y = torch.square(x)
grad = autograd.grad(y, x)
x = x + grad[0]  # note: this rebinds the name x to a new, non-leaf tensor
y = torch.square(x)
grad2 = autograd.grad(y, x)
First, we have that ∇(x^2) = 2x. In my understanding, grad2 = ∇((x + ∇(x^2))^2) = ∇((x + 2x)^2) = ∇((3x)^2) = 9∇(x^2) = 18x. As expected, grad = 4.0 = 2x, but grad2 = 12.0 = 6x, and I don't understand where that comes from. It feels as though the 3 comes from the expression I had, but it is not squared, and the 2 comes from the traditional derivative. Could somebody help me understand why this is happening? Furthermore, how far back does the computational graph that stores the gradients go?
Specifically, I am coming from a meta-learning perspective, where one is interested in computing a quantity of the form ∇L(theta - alpha*∇L(theta)) = (1 + ∇^2 L(theta)) ∇L(theta - alpha*∇L(theta)) (here the derivative is taken with respect to theta). This computation, let's call it A, includes a second derivative. It is quite different from ∇_{theta - alpha*∇L(theta)} L(theta - alpha*∇L(theta)) = ∇_beta L(beta), which I will call B.
Hopefully it is clear how the snippet above relates to what I described in the second paragraph. My overall question is: under what circumstances does pytorch realize computation A vs. computation B when using autograd.grad? I'd appreciate any explanation that goes into technical detail about how this particular case is handled by autograd.
PS: The original code that made me wonder about this is here; in particular, lines 69 through 106, and subsequently line 193, which is where they use autograd.grad. To me that code is even more unclear because they do a lot of model.clone() and so on.
If the question is unclear in any way, please let me know.
I made a few changes:
I am not sure what torch.rand(2.0) is supposed to do (torch.rand expects integer sizes); going by the text, I simply set x to 2.
An intermediate variable z is added so that we can compute the gradient w.r.t. the original variable. Yours gets overwritten.
Set create_graph=True to compute higher-order gradients. See https://pytorch.org/docs/stable/generated/torch.autograd.grad.html
import torch
from torch import autograd

x = torch.ones(1, requires_grad=True) * 2
y = torch.square(x)
grad = autograd.grad(y, x, create_graph=True)  # grad[0] = 2x = 4, with its own graph
z = x + grad[0]                                # z = 3x = 6
y = torch.square(z)                            # y = 9x^2
grad2 = autograd.grad(y, x)                    # dy/dx = 18x = 36: computation A
# yours is more like autograd.grad(y, z)
print(x)
print(grad)
print(grad2)
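To make the A-vs-B distinction concrete, here is a short companion sketch (my addition): differentiating the same y with respect to z instead of x reproduces the 12.0 from the question, because computation B never differentiates through the inner gradient.
import torch
from torch import autograd

x = torch.ones(1, requires_grad=True) * 2
y = torch.square(x)
grad = autograd.grad(y, x, create_graph=True)
z = x + grad[0]               # z = 3x = 6
y = torch.square(z)
grad_b = autograd.grad(y, z)  # computation B: d(z^2)/dz = 2z
print(grad_b)                 # (tensor([12.]),) -- matches the question's grad2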

Gaussian process regression - explain behaviour

I'm looking into GP regression, but I'm getting some behaviour that I do not understand.
Basically, I wanted to show convergence of GP regression on the oscillatory Genz function (basically a periodic wave), which led me to this picture (GP convergence); sorry for the missing labels (x axis: number of samples, y axis: relative error measured at 2000 points).
This is OK, but I was curious why it took so long before the error started to drop. Plotting the resulting GP fit, I got this (busy) plot (GP fit in orange, true function in blue). What I don't understand is what happens up until it starts to capture the true function. I assumed it had something to do with the kernel. The plot here uses an RBF kernel with length_scale = 1 (I also tried both higher and lower values, but got the same results).
I kind of expected it to behave more smoothly even if it couldn't capture the true model.
So, to my question: why do I see this "spiky" behaviour? And can I do something to change it (kernel-wise or otherwise)?
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

kernel = RBF(length_scale=1, length_scale_bounds=(1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)  # X, y: training samples (defined elsewhere)

def genz(x, method='default'):
    d = x.shape[1]
    a = 10 / d
    w = 1 / 2
    num_points = x.shape[0]
    funcval = np.empty([1, num_points])
    for i in range(num_points):
        funcval[0, i] = np.cos(2 * np.pi * w + np.sum(a * x[i, :]))
    return funcval
It seems like the optimized length scale is very small compared to the domain. I also ran into odd behaviour when digging into this library; changing some hyperparameters and the number of optimizer restarts didn't work for me either. It might help to switch your kernel to a Matérn kernel and vary its smoothness parameter, though not by much. If you really want to customize things fully, I'd recommend GPyTorch, which offers a torch-like implementation, or the GPML MATLAB toolbox.
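For reference, a minimal sketch of the suggested kernel swap in scikit-learn (the nu value here is only an illustrative choice):
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# nu controls smoothness: 0.5 gives rough sample paths, 2.5 fairly smooth ones
kernel = Matern(length_scale=1.0, length_scale_bounds=(1e-2, 1e2), nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(X, y)  # X, y as in the question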

How to fit exponential function with python

I am new to Python and I am trying to learn how to plot and fit data.
I have an empirical formula describing the function y(x),
and I want to fit it to a power law of the form: y = a * x^b.
I am using numpy arrays, but I am not sure numpy.polyfit is useful here, because I do not want to fit with high-order polynomials, nor with exponentials of the form y = a * e^(b*x).
Can you please suggest a way to do this?
My function is this one, here written as y(E_n):
import numpy as np

E_n = np.linspace(1, 10**6, 10**6)
y = 0.018 * (E_n**(-2.7)) * (1/(1 + (2.77*np.cos(45)*E_n/115)) + 0.367/(1 + (1.18*np.cos(45)*E_n/850)))  # note: np.cos works in radians
Thank you
Consider using scipy.optimize.curve_fit. Define a function of the form you desire and pass it to curve_fit. Read the linked documentation well: in many cases you need to pass chosen initial values for the parameters, since curve_fit takes them all to be 1 by default, and this might not yield desirable results.
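For instance, a minimal sketch for the question's y = a * x^b model (the initial guesses in p0 are illustrative, taken from the leading term of the formula above):
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * x**b

# E_n, y as defined in the question
popt, pcov = curve_fit(power_law, E_n, y, p0=(0.018, -2.7))
a_fit, b_fit = popt
For power laws it is also common to fit in log space instead, i.e. a linear fit of log y against log x.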

Fitting Parametric Curves in Python

I have experimental data of the form (X,Y) and a theoretical model of the form (x(t;*params),y(t;*params)) where t is a physical (but unobservable) variable, and *params are the parameters that I want to determine. t is a continuous variable, and there is a 1:1 relationship between x and t and between y and t in the model.
In a perfect world, I would know the value of T (the real-world value of the parameter) and would be able to do an extremely basic least-squares fit to find the values of *params. (Note that I am not trying to "connect" the values of x and y in my plot, like in 31243002 or 31464345.) I cannot guarantee that in my real data, the latent value T is monotonic, as my data is collected across multiple cycles.
I'm not very experienced at fitting curves manually, and so far I've had to use extremely crude methods, without easy access to a basic scipy function that covers this case. My basic approach involves the following steps:
1. Choose some value of *params and apply it to the model.
2. Take an array of t values and put it into the model to create an array of model(*params) = (x(*params), y(*params)).
3. Interpolate X (the data values) into the model to get Y_predicted.
4. Run a least-squares (or other) comparison between Y and Y_predicted.
5. Do it again for a new set of *params.
6. Eventually, choose the best values for *params.
There are several obvious problems with this approach.
1) I'm not experienced enough with coding to develop a very good "do it again" step, other than "try everything in the solution space," or maybe "try everything in a coarse grid" and then "try everything again in a slightly finer grid in the hotspots of the coarse grid." I tried MCMC methods, but I never found any optimum values, largely because of problem 2.
2) Steps 2-4 are super inefficient in their own right.
I've tried something like the following (resembling pseudo-code; the actual functions are made up). There are many minor quibbles that could be made about using broadcasting on A and B, but those are less significant than the problem of needing to interpolate for every single step.
People I know have recommended using some sort of Expectation Maximization algorithm, but I don't know enough about that to code one up from scratch. I'm really hoping there's some awesome scipy (or otherwise open-source) algorithm I haven't been able to find that covers my whole problem, but at this point I am not hopeful.
import numpy as np
from scipy import interpolate

# X_data, Y_data: the experimental arrays (given elsewhere)

def x(t, A, B):
    return A**t + B**t

def y(t, A, B):
    return A*t + B

def interp(A, B):
    ts = np.arange(-10, 10, 0.1)
    xs = x(ts, A, B)
    ys = y(ts, A, B)
    f = interpolate.interp1d(xs, ys)
    return f

N = 101
lsqs = np.empty((N**2, 3))  # columns: A, B, sum of squares
count = 0
for i in range(N):
    A = 0.1*i  # checks A between 0 and 10
    for j in range(N):
        B = 10 + 0.1*j  # checks B between 10 and 20
        f = interp(A, B)
        y_fit = f(X_data)
        squares = np.sum((y_fit - Y_data)**2)
        lsqs[count] = (A, B, squares)  # store the values for comparison later
        count += 1  # move to the next cell

i = np.argmin(lsqs[:, 2])
A_optimal = lsqs[i, 0]
B_optimal = lsqs[i, 1]
If I understand the question correctly, the params are constants which are the same in every sample, but t varies from sample to sample. So, for example, maybe you have a whole bunch of points which you believe have been sampled from a circle
x = a+r cos(t)
y = b+r sin(t)
at different values of t.
In this case, what I would do is eliminate the variable t to get a relation between x and y -- in this case, (x-a)^2+(y-b)^2 = r^2. If your data fit the model perfectly, you would have (x-a)^2+(y-b)^2 = r^2 at each of your data points. With some error, you could still find (a,b,r) to minimize
sum_i ((x_i-a)^2 + (y_i-b)^2 - r^2)^2.
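As an illustration of that idea (my sketch, not part of the original answer; the data here are synthetic), the eliminated-t objective can be handed directly to scipy.optimize.least_squares:
import numpy as np
from scipy.optimize import least_squares

# Synthetic noisy points on a circle with a=1, b=2, r=3
t = np.random.uniform(0, 2*np.pi, 200)
X = 1 + 3*np.cos(t) + 0.05*np.random.randn(200)
Y = 2 + 3*np.sin(t) + 0.05*np.random.randn(200)

def residuals(params):
    a, b, r = params
    return (X - a)**2 + (Y - b)**2 - r**2  # one residual per data point

fit = least_squares(residuals, x0=(0.0, 0.0, 1.0))  # minimizes the sum of squared residuals
a_hat, b_hat, r_hat = fit.x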
Mathematica's Eliminate command can automate the procedure of eliminating t in some cases.
PS: You might do better at stats.stackexchange, math.stackexchange, or mathoverflow.net. I know the last one has a scary reputation, but we don't bite, really!

Multiple linear regression in python without fitting the origin?

I found this chunk of code at http://rosettacode.org/wiki/Multiple_regression#Python, which does a multiple linear regression in Python. Printing b in the following code gives you the coefficients of x1, ..., xN. However, this code fits the line through the origin (i.e. the resulting model does not include a constant).
All I'd like to do is exactly the same thing, except that I do not want to fit the line through the origin; I need the constant (intercept) in my resulting model.
Any idea if it's a small modification to achieve this? I've searched and found numerous documents on multiple regression in Python, but they are lengthy and overly complicated for what I need. This code works perfectly, except that I need a model that includes an intercept rather than passing through the origin.
import numpy as np
from numpy.random import random
n=100
k=10
y = np.mat(random((1,n)))
X = np.mat(random((k,n)))
b = y * X.T * np.linalg.inv(X*X.T)
print(b)
Any help would be appreciated. Thanks.
You only need to add a row to X that is all 1's.
Maybe a more stable approach would be to use a least-squares algorithm anyway. This can also be done in numpy in a few lines; read the documentation on numpy.linalg.lstsq.
Here you can find an example implementation:
http://glowingpython.blogspot.de/2012/03/linear-regression-with-numpy.html
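Combining the two suggestions, a minimal sketch (reusing the question's shapes: y is (1, n) and X is (k, n); np.mat is not needed):
import numpy as np
from numpy.random import random

n, k = 100, 10
y = random((1, n))
X = random((k, n))

X1 = np.vstack([X, np.ones((1, n))])  # append a row of ones for the intercept

# lstsq solves min ||X1.T @ b - y.T||^2, with samples in rows
b, *_ = np.linalg.lstsq(X1.T, y.T, rcond=None)
coefs, intercept = b[:-1], b[-1]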
What you have written out, b = y * X.T * np.linalg.inv(X * X.T), is the solution to the normal equations, which gives the least-squares fit with a multilinear model. swang's response is correct (as is EMS's elaboration): you need to add a row of 1's to X. If you want some idea of why it works theoretically, keep in mind that you are finding b_i such that
y_j = sum_i b_i x_{ij}.
By adding a row of 1's, you are setting x_{(k+1)j} = 1 for all j, which means that you are finding b_i such that:
y_j = (sum_i b_i x_{ij}) + b_{k+1}
because the (k+1)st x_{ij} term is always equal to one. Thus, b_{k+1} is your intercept term.
