pytorch gesv gives different result than scipy sparse solve - python

I'm trying to implement baseline ALS subtraction in PyTorch so that I can run it on my GPU, but I am running into problems because torch.gesv gives a different result than scipy.sparse.linalg.spsolve. Here is my code for SciPy:
def baseline_als(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
and here is my code for PyTorch:
def baseline_als_pytorch(y, lam, p, niter=10):
    diag = torch.tensor(np.repeat(1, L))
    diag = torch.diag(diag, 0)
    diag_minus_one = torch.tensor(np.repeat(-2, L - 1))
    diag_minus_one = torch.diag(diag_minus_one, -1)
    diag_minus_two = torch.tensor(np.repeat(1, L - 2))
    diag_minus_two = torch.diag(diag_minus_two, -2)
    D = diag + diag_minus_one + diag_minus_two
    D = D[:, :L - 2].double()
    w = torch.tensor(np.repeat(1, L)).double()
    for i in range(10):
        W = diag.double()
        Z = W + lam * torch.mm(D, D.permute(1, 0))
        z = torch.gesv(w * y, Z)
        z = z[0].squeeze()
        w = p * (y > z).double() + (1 - p) * (y < z).double()
    return z
Sorry that the PyTorch code looks so bad; I'm just starting out with it.
I've confirmed that Z, w, and y are all the same going into both the SciPy and PyTorch versions, and that z differs between them right after I solve the system of equations.
Thanks for the comment; here is an example:
I use 100000 for lam and 0.001 for p.
Using the dummy input: y = (5,5,5,5,5,10,10,5,5,5,10,10,10,5,5,5,5,5,5,5),
I get (3.68010263, 4.90344214, 6.12679489, 7.35022406, 8.57384278, 9.79774074, 11.02197199, 12.2465927, 13.47164891, 14.69711435, 15.92287813, 17.14873257, 18.37456982, 19.60038184, 20.82626043, 22.05215157, 23.27805103, 24.50400438, 25.73010693, 26.95625922) from scipy and
(6.4938312, 6.46912395, 6.44440175, 6.41963499, 6.39477958, 6.36977727, 6.34455582, 6.31907933, 6.29334844, 6.26735058, 6.24106029, 6.21443939, 6.18748732, 6.16024137, 6.13277694, 6.10515785, 6.07743658, 6.04965455, 6.02184242, 5.99402035) from pytorch.
This is with just one iteration of the loop. SciPy is correct; PyTorch is not.
Interestingly, if I use a shorter dummy input (5,5,5,5,5,10,10,5,5,5), I get the same answer from both. My real input is 1011 dimensional.

Your PyTorch function is wrong: you never update W at the first line inside the for loop. Moreover, I get the result you say you got from PyTorch from SciPy too.
Scipy version
def baseline_als(y, lam=100000, p=1e-3, niter=1):
    L = len(y)
    D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
Equivalent in PyTorch:
def baseline_als_pytorch(y, lam=100000, p=1e-3, niter=1):
    L = len(y)
    D = torch.diag(torch.ones(L), 0) + torch.diag(-2 * torch.ones(L-1), -1) + torch.diag(torch.ones(L-2), -2)
    D = D[:, :L-2].double()
    w = torch.ones(L).double()
    for i in range(niter):
        W = torch.diag(w)
        Z = W + lam * torch.mm(D, D.permute(1, 0))
        z = torch.gesv(w * y, Z)
        z = z[0].squeeze()
        w = p * (y > z).double() + (1 - p) * (y < z).double()
    return z
When I feed them with y = np.array([5,5,5,5,5,10,10,5,5,5,10,10,10,5,5,5,5,5,5,5], dtype='float64'):
scipy:
array([6.4938312 , 6.46912395, 6.44440175, 6.41963499, 6.39477958,
6.36977727, 6.34455582, 6.31907933, 6.29334844, 6.26735058,
6.24106029, 6.21443939, 6.18748732, 6.16024137, 6.13277694,
6.10515785, 6.07743658, 6.04965455, 6.02184242, 5.99402035])
pytorch:
tensor([6.4938, 6.4691, 6.4444, 6.4196, 6.3948, 6.3698, 6.3446, 6.3191, 6.2933,
6.2674, 6.2411, 6.2144, 6.1875, 6.1602, 6.1328, 6.1052, 6.0774, 6.0497,
6.0218, 5.9940], dtype=torch.float64)
If I increase niter to 10:
scipy:
array([5.00202571, 5.00199038, 5.00195504, 5.00191963, 5.0018841 ,
5.00184837, 5.00181235, 5.00177598, 5.00173927, 5.00170221,
5.00166475, 5.00162685, 5.00158851, 5.00154979, 5.00151077,
5.00147155, 5.0014322 , 5.00139276, 5.00135329, 5.0013138 ])
pytorch:
tensor([5.0020, 5.0020, 5.0020, 5.0019, 5.0019, 5.0018, 5.0018, 5.0018, 5.0017,
5.0017, 5.0017, 5.0016, 5.0016, 5.0015, 5.0015, 5.0015, 5.0014, 5.0014,
5.0014, 5.0013], dtype=torch.float64)
And it checks out with the code for baseline ALS that you linked to in your question.
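(A side note for readers on newer PyTorch versions: torch.gesv has since been deprecated and removed. A minimal sketch of the same solve with the current API, assuming PyTorch >= 1.9:)

import torch

# torch.linalg.solve(A, b) solves A @ x = b; note the argument order is
# reversed relative to the old torch.gesv(b, A), and only the solution is
# returned (gesv also returned the LU factorization).
Z = torch.eye(4, dtype=torch.float64)      # stand-in coefficient matrix
rhs = torch.ones(4, dtype=torch.float64)   # stand-in right-hand side
z = torch.linalg.solve(Z, rhs)             # replaces torch.gesv(rhs, Z)[0]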

Related

How can I do a double Riemann sum with one equation limit in Python

I'm trying to do a double Riemann sum with a limit b = (x^2 + y^2 = 16). The problem is that when I use SymPy, it raises a TypeError at the linspace line. I tried to define the equation differently, but nothing works. Am I doing something wrong, or should I change something?
import numpy as np
import matplotlib.pyplot as plt
import sympy
# function to integrate = x + 3*y + 1
# Limit function 'b' = (x^2 + y^2 = 16)
Width=15; Length=20;
x = sympy.Symbol('x')
a = 0
b =(sympy.sqrt(16-x**2))
c = 0
d = 3
#Height of x
deltax= (b - a) / Width
#Height of y
deltay = (d - c) / Length
#Area of each square
dA = deltax * deltay
x = np.linspace((a, b - deltax, Width));
y = np.linspace((c, d - deltay, Length));
f = lambda x,y: x +3*y + 1
[X, Y] = np.meshgrid(x, y);
#reimann sum
Suma=sum(dA * f(X, Y))
Suma = sum(Suma)
int(Suma)
print(Suma)
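No answer is shown for this question, but the immediate error comes from calling np.linspace with a single tuple argument and with a SymPy expression (b depends on the symbol x) as the stop value. A minimal numeric sketch that drops SymPy entirely, assuming the intended region is the part of the disk x^2 + y^2 <= 16 with 0 <= x <= 4 and 0 <= y <= 3:

import numpy as np

# Left Riemann sum of f(x, y) = x + 3*y + 1, masking out sample points
# that fall outside the circular boundary x^2 + y^2 = 16.
Width, Length = 15, 20
a, b = 0.0, 4.0                     # x from 0 to the circle's radius
c, d = 0.0, 3.0                     # y from 0 to 3
deltax = (b - a) / Width
deltay = (d - c) / Length
dA = deltax * deltay

# np.linspace takes separate arguments: linspace(start, stop, num)
x = np.linspace(a, b - deltax, Width)
y = np.linspace(c, d - deltay, Length)
X, Y = np.meshgrid(x, y)

f = lambda x, y: x + 3*y + 1
inside = X**2 + Y**2 <= 16          # keep only points inside the circle
Suma = np.sum(dA * f(X, Y) * inside)
print(Suma)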

use Theano to get the w_0 and w_1 parameters

I have a problem where I have to create a dataset.
Afterwards, I have to use Theano to get the w_0 and w_1 parameters of the following model:
y = log(1 + w_0 * |x|) + (w_1 * |x|)
The datasets are created, and I have computed the w_0 and w_1 values with NumPy using the following code, but I don't know how to compute the w_0 and w_1 values with Theano. How can I compute them using Theano?
Any help would be great, thank you :)
Code that I am using:
import numpy as np
import math
import theano as t
# code to generate datasets
trX = np.linspace(-1, 1, 101)
trY = np.linspace(-1, 1, 101)
for i in range(len(trY)):
    trY[i] = math.log(1 + 0.5 * abs(trX[i])) + trX[i] / 3 + np.random.randn() * 0.033
# code that produces w0 and w1; I want to compute these with theano
X = np.column_stack((np.ones(101, dtype=trX.dtype), trX))
print(X.shape)
Xplus = np.linalg.pinv(X)  # pseudo-inverse of X
w_opt = Xplus @ trY  # the @ symbol denotes matrix multiplication
print(w_opt)
x = abs(trX)  # abs returns the element-wise absolute values of the array
y = trY
for i in range(len(trX)):
    y[i] = math.log(1 + w_opt[0] * x[i]) + (w_opt[1] * x[i])
Good morning Hina Malik,
Using the gradient descent algorithm and with the right model selection, this problem can be solved. Also, you should create two shared variables (w and c), one for each parameter.
X = T.scalar()
Y = T.scalar()

def model(X, w, c):
    return X * w + c

w = theano.shared(np.asarray(0., dtype = theano.config.floatX))
c = theano.shared(np.asarray(0., dtype = theano.config.floatX))
y = model(X, w, c)

learning_rate = 0.01
cost = T.mean(T.sqr(y - Y))
gradient_w = T.grad(cost = cost, wrt = w)
gradient_c = T.grad(cost = cost, wrt = c)
updates = [[w, w - gradient_w * learning_rate], [c, c - gradient_c * learning_rate]]
train = theano.function(inputs = [X, Y], outputs = cost, updates = updates)

coste = []  # variable to store the cost values so they can be plotted
for i in range(101):
    for x, y in zip(trX, trY):
        cost_i = train(x, y)
        coste.append(cost_i)

w0 = float(w.get_value())
w1 = float(c.get_value())
print(w0, w1)
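Note that the snippet above fits a linear model (X * w + c), whereas the question's model is y = log(1 + w_0*|x|) + w_1*|x|. Only the symbolic expression needs to change; a sketch with that model instead (same training loop as above):

import numpy as np
import theano
import theano.tensor as T

X = T.scalar()
Y = T.scalar()
w = theano.shared(np.asarray(0., dtype=theano.config.floatX))
c = theano.shared(np.asarray(0., dtype=theano.config.floatX))

# the question's model: y = log(1 + w_0 * |x|) + w_1 * |x|
y = T.log(1 + w * T.abs_(X)) + c * T.abs_(X)

learning_rate = 0.01
cost = T.mean(T.sqr(y - Y))
updates = [[w, w - T.grad(cost, w) * learning_rate],
           [c, c - T.grad(cost, c) * learning_rate]]
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
# then iterate over zip(trX, trY) calling train(x, y), exactly as above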
I also replied to the same or a very similar topic in the Spanish version of StackOverflow here: go to solution
I hope this can help you
Best regards

Why isn't my gradient descent algorithm working?

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get the -inf error or an "overflow encountered in square" error.
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,5])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.0001
for q in range(1000):
    for i in range(len(x)):
        ypred = m*x[i] + b
        error += (ypred - y[i]) ** 2
    m = m - (x * error) * lr
    b = b - (lr * error)
print(b, m)
I expected my algorithm to return the best m and b values for my data (x and y) but it didn't work. What is going wrong?
import numpy as np
x = np.array([2,3,4,5])
y = 0.3*x + 0.6
m = np.random.randn()
b = np.random.randn()
lr = 0.001
for q in range(100000):
    ypred = m*x + b
    error = (1./(2*len(x))) * np.sum(np.square(ypred - y))  # eq 1
    m = m - lr * np.sum((ypred - y)*x)/len(x)  # eq 2 and eq 4
    b = b - lr * np.sum(ypred - y)/len(x)  # eq 3 and eq 5
print(m, b)
Output:
0.30007724168011807 0.5997039817571881
Math behind it
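For reference, the formulas those "eq" comments point to (via the "Math behind it" link) are the standard least-squares cost and gradient-descent updates, with n = len(x):

J = \frac{1}{2n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2                            % eq 1
m \leftarrow m - lr \cdot \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i    % eq 2, eq 4
b \leftarrow b - lr \cdot \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)         % eq 3, eq 5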
Use numpy vectorized operations to avoid loops.
I think you implemented the formula incorrectly:
use a summation over x times the error,
and divide by the length of x.
See the code below:
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,11])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.1
print(b, m)
for q in range(1000):
    ypred = []
    for i in range(len(x)):
        temp = m*x[i] + b
        ypred.append(temp)
        error += temp - y[i]
    m = m - np.sum(x * (ypred - y)) * lr/len(x)
    b = b - np.sum(lr * (ypred - y))/len(x)
print(b, m)
Output:
-1.198074371762264 0.058595039571115955 # initial weights
0.9997389097653074 2.0000681277214487 # Final weights

Scipy Minimization TNC Working, But Not CG

I'm trying to complete week 4 of the Machine Learning course on Coursera. The assignment uses the MNIST data for multi-class classification.
The dimensions are X (5000,401), y (5000,1), and theta (10,401), which start off as arrays. A column of 1's was inserted as the first feature column of X.
My cost and gradient functions are below:
def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    return g

def lrCostFunction(theta, X, y, my_lambda):
    m = float(len(X))
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # cost function:
    term1 = np.multiply(-y, np.log(sigmoid(X*theta.T)))
    term2 = np.multiply((1-y), np.log(1-sigmoid(X*theta.T)))
    reg = np.power(theta[:,1:theta.shape[1]], 2)
    J = np.sum(term1-term2)/m + (my_lambda/(2.0*m) * np.sum(reg))
    return J

def gradient(theta, X, y, my_lambda):
    m = float(len(X))
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # gradient:
    error = sigmoid(X * theta.T) - y
    g = (X.T * error /(m)).T + ((my_lambda/m) * theta)
    g[0,0] = np.sum(np.multiply(error, X[:,0])) / m
    return g
Here is my One vs All classification function with the TNC optimization:
def oneVsAll(X, y, num_labels, my_lambda):
    m = float(X.shape[0])
    n = float(X.shape[1]) - 1
    all_theta = np.zeros((num_labels, n+1))
    for K in range(1, num_labels + 1):
        theta = np.zeros(n+1)
        y_logical = np.array([1 if j == K else 0 for j in y]).reshape(m, 1)
        opt_theta = opt.minimize(fun=lrCostFunction, x0=theta, \
                                 args=(X, y_logical, my_lambda), \
                                 method='TNC', jac=gradient).x
        all_theta[K-1,:] = opt_theta
    return all_theta
When I try to run CG, however, it returns the error at line 8: "shapes (1,401) and (1,401) not aligned: 401 (dim 1) != 1 (dim 0)":
def oneVsAll(X, y, num_labels, my_lambda):
    m = float(X.shape[0])
    n = float(X.shape[1]) - 1
    all_theta = np.zeros((num_labels, n+1))
    for K in range(1, num_labels + 1):
        theta = np.zeros(n+1)
        y_logical = np.array([1 if j == K else 0 for j in y]).reshape(m, 1)
        opt_theta = opt.fmin_cg(f=lrCostFunction, x0=theta, \
                                fprime=gradient, \
                                args=(X, y_logical, my_lambda))
        all_theta[K-1,:] = opt_theta
    return all_theta
I saw elsewhere that CG only likes 1-D vectors for y. If I try to flatten y or reduce its dimension, however, everything else breaks. Is it generally a bad idea to use np.matrix as opposed to using np.dot with arrays? I like being able to easily transpose with matrices.
Any help would be greatly appreciated.
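One likely culprit, for what it's worth: scipy.optimize.fmin_cg requires the fprime callable to return a flat 1-D ndarray, while the gradient function above returns an np.matrix of shape (1, 401), which is what trips the "not aligned" error. A minimal sketch of a wrapper (gradient_flat is a hypothetical helper name; gradient is the function defined above):

import numpy as np

def gradient_flat(theta, X, y, my_lambda):
    # fmin_cg wants a (401,) ndarray, not a (1, 401) np.matrix
    g = gradient(theta, X, y, my_lambda)
    return np.asarray(g).ravel()

# then:
# opt_theta = opt.fmin_cg(f=lrCostFunction, x0=theta,
#                         fprime=gradient_flat,
#                         args=(X, y_logical, my_lambda))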

alternative to spsolve and spdiags in matlab/octave

I have Python code which I am trying to convert to MATLAB code. The code is for baseline correction of a wave.
def baseline_als(y, lam, p, niter=20):
    L = len(y)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in xrange(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
I have tried converting it like this:
function [z] = baseline_als(y, lam, p, niter=20)
  L = len(y)
  D = sparse.csc_matrix(diff(eye(L), 2))
  w = ones(L)
  for i = 1:niter
    W = sparse.spdiags(w, 0, L, L) % Not working
    Z = W + lam * dot(D, transpose(D))
    z = spsolve(Z, w*y) % Not working
    w = p * (y > z) + (1-p) * (y < z)
  end % End of for loop
end % End of function
However, there are no functions named spsolve and spdiags in Octave/MATLAB. Are there any alternate functions that I can use?
It's quite easy if you know what spsolve does. Let's focus on that, as spdiags seems easier to solve, doesn't it?
spsolve "Solve the sparse linear system Ax=b, where b may be a vector or a matrix."
This is exactly what MATLAB's \ (mldivide) does: it solves a system Ax = b for x. Happily for you, MATLAB can deal with both sparse and dense matrices with the same function, thus the change should be as easy as:
from:
z = spsolve(Z, w*y)
to:
z= Z\(w*y);
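spdiags needs no replacement at all: MATLAB and Octave ship a built-in spdiags with the same calling convention as SciPy's. A sketch of the full port under that assumption (untested):

function z = baseline_als(y, lam, p, niter)
  L = length(y);                 % len() is Python; MATLAB uses length()
  D = diff(speye(L), 2);         % sparse second-difference matrix, (L-2) x L
  w = ones(L, 1);                % ones(L) alone would give an L x L matrix
  for i = 1:niter
    W = spdiags(w, 0, L, L);     % built-in, same signature as scipy's spdiags
    Z = W + lam * (D' * D);      % matches D.dot(D.transpose()) in numpy
    z = Z \ (w .* y);            % mldivide replaces spsolve
    w = p * (y > z) + (1 - p) * (y < z);
  end
end

(y is assumed to be a column vector.)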
