I am trying to implement non-negative matrix factorization in Theano. In more detail, I am trying to find two matrices L and R such that their product L x R represents a given matrix M as accurately as possible.
To find the L and R matrices I use back propagation. At some point I noticed that the values in L and R can become negative (of course, nothing prevents back prop from doing that). I tried to correct this behaviour by adding the following lines after the back propagation step:
self.L.set_value(T.abs_(self.L).eval())
self.R.set_value(T.abs_(self.R).eval())
After that my program became much slower.
Am I doing something wrong? Am I updating the values of the tensors in the wrong way? Is there a way to do it faster?
ADDED
As requested in the comments, here is more code. This is how I define the function in __init__:
self.L = theano.shared(value=np.random.rand(n_rows, n_hids), name='L', borrow=True)
self.R = theano.shared(value=np.random.rand(n_hids, n_cols), name='R', borrow=True)
Y = theano.dot(self.L, self.R)
diff = X - Y
D = T.pow(diff, 2)
E = T.sum(D)
gr_L = T.grad(cost=E, wrt=self.L)
gr_R = T.grad(cost=E, wrt=self.R)
self.l_rate = theano.shared(value=0.000001)
L_ups = self.L - self.l_rate*gr_L
R_ups = self.R - self.l_rate*gr_R
updates = [(self.L, L_ups), (self.R, R_ups)]
self.backprop = theano.function([X], E, updates=updates)
Then in my train function I had this code:
for i in range(self.n_iter):
    costs = self.backprop(X, F)
    self.L.set_value(T.abs_(self.L).eval())
    self.R.set_value(T.abs_(self.R).eval())
A minor remark: I use the abs_ function, but it would actually make more sense to use a function that replaces negative values with zero.
You can force the symbolic update values for L and R to always be positive like this:
self.l_rate = theano.shared(value=0.000001)
L_ups = self.L - self.l_rate*gr_L
R_ups = self.R - self.l_rate*gr_R
# This forces L and R to always be updated to a positive value
L_ups_abs = T.abs_(L_ups)
R_ups_abs = T.abs_(R_ups)
# Use the update L_ups_abs instead of L_ups (same with R_ups)
updates = [(self.L, L_ups_abs), (self.R, R_ups_abs)]
self.backprop = theano.function([X], E, updates=updates)
and remove the lines
self.L.set_value(T.abs_(self.L).eval())
self.R.set_value(T.abs_(self.R).eval())
from your training loop.
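Following up on the remark in the question about replacing negative values with zero rather than taking the absolute value: here is a minimal sketch of the same symbolic update, assuming the variables from the snippets above (gr_L, gr_R, E, X), that clips the updated values at zero with T.maximum instead of mirroring them with T.abs_:

L_ups_clipped = T.maximum(self.L - self.l_rate*gr_L, 0)  # sketch: negative entries become 0
R_ups_clipped = T.maximum(self.R - self.l_rate*gr_R, 0)

updates = [(self.L, L_ups_clipped), (self.R, R_ups_clipped)]
self.backprop = theano.function([X], E, updates=updates)

This keeps the whole update inside the compiled Theano function, so there is no extra eval()/set_value round trip per iteration.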
I have an assignment for school. First of all, can you help me confirm that I have interpreted the question correctly? And does the code seem somewhat OK? There have been other tasks before this, like creating the class with a two-dimensional function, writing the Newton method, and so on. And now this question. I'm not finished programming it, but I'm a bit stuck and I feel like I don't know exactly what to do. What do I run my Newton method on? On the point P? Do I create it like I have done in the plot method?
This is the question:
Write a method plot that checks the dependence of Newton's method on several initial vectors x0. This method should plot what is described in the following steps:
• Use the meshgrid command to set up a grid of N² points in the set G = [a, b] × [c, d] (the parameters N, a, b, c and d are parameters of the methods). You obtain two matrices X and Y where a specific grid point is defined as p_ij = (X_ij, Y_ij).
# imports assumed; the original post does not show them
import numpy as np
from numpy import array, zeros, cos, exp, pi
from scipy.linalg import solve, norm

class fractals2D(object):
    Allzeroes = []  # a list to collect the zeroes found by every run of Newton's method

    def __init__(self, f, x):
        self.f = f
        f0 = self.f(x)        # function value at x, reused for the finite differences below
        n = len(x)            # size of the system
        jac = zeros((n, n))   # array to hold the Jacobian matrix (must be n x n for jac[:, i])
        h = 1.e-8             # step size for the finite-difference derivative
        self.jac = jac
        for i in range(n):    # loop taking partial derivatives with respect to each component of x
            temp = x[i]
            #print(x[i])
            x[i] = temp + h   # why setting x[i] two times?
            #print(x[i])
            f1 = f(x)
            x[i] = temp
            #print(x[i])
            jac[:, i] = (f1 - f0)/h

    def Newtons_method(self, guess):
        f_val = f(guess)
        self.guess = guess
        for i in range(40):
            delta = solve(self.jac, -f_val)
            guess = guess + delta
            if norm((delta), ord=2) < 1.e-9:
                return guess  # the zero found by this run

    def ZeroesMethod(self, point):
        point = self.guess
        self.Newtons_method(point)
        # adds the zero from this run of Newton's method to the list storing them all
        self.Allzeroes.append(self.guess)
        return (len(self.Allzeroes))  # returns how many zeroes were found

    def plot(self, N, a, b, c, d):
        x = np.linspace(a, b, N)
        y = np.linspace(c, d, N)
        P = [X, Y] = np.meshgrid(x, y)
        return P  # calling ZeroesMethod with our newly meshed point of several arrays

x0 = array([2.0, 1.0])  # creates an x and y value?
x1 = array([1, -5])
a = array([2, 8])
b = array([-2, -6])

def f(x):
    f = np.array(
        [x[0]**2 - x[1] + x[0]*cos(pi*x[0]),
         x[0]*x[1] + exp(-x[1]) - x[0]**(-1)])
This is the error message I'm receiving:
delta = solve(self.jac,-f_val)
TypeError: bad operand type for unary -: 'NoneType'
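For reference, a minimal sketch of the meshgrid setup the quoted task asks for, i.e. turning the N² grid points p_ij into a list of initial vectors x0 for Newton's method. This is just one possible reading of the task; the helper name make_grid is mine, not part of the assignment:

import numpy as np

def make_grid(N, a, b, c, d):
    # N^2 points in the set G = [a, b] x [c, d]
    x = np.linspace(a, b, N)
    y = np.linspace(c, d, N)
    X, Y = np.meshgrid(x, y)                         # X[i, j], Y[i, j] is the grid point p_ij
    return np.column_stack([X.ravel(), Y.ravel()])   # one row per initial vector x0

Each row of the returned array could then be passed to Newtons_method as the starting guess.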
My implementation of steepest descent for solving Ax = b is showing some weird behavior: for any matrix large enough (~10 x 10; I have only tested square matrices so far), the returned x contains huge values (on the order of 1e10).
import warnings
import numpy as np

def steepestDescent(A, b, numIter=100, x=None):
    """Solves Ax = b using the steepest descent method"""
    warnings.filterwarnings(action="error", category=RuntimeWarning)

    # Reshape b in case it has shape (n,)
    b = b.reshape(len(b), 1)

    exes = []
    res = []

    # Make a guess for x if none is provided
    if x is None:
        x = np.zeros((len(A[0]), 1))
    exes.append(x)

    for i in range(numIter):
        # Re-calculate r(i) using r(i) = b - Ax(i) every five iterations
        # to prevent roundoff error. Also calculates the initial direction
        # of steepest descent.
        if (i % 5) == 0:
            r = b - np.dot(A, x)
        # Otherwise use r(i+1) = r(i) - step * Ar(i)
        else:
            r = r - step * np.dot(A, r)
        res.append(r)

        # Calculate the step size. Catching the runtime warning allows the function
        # to stop and return before all iterations are completed. This is
        # necessary because once the solution x has been found, r = 0, so the
        # calculation below divides by 0, turning step into "nan", which then
        # goes on to overwrite the correct answer in x with "nan"s
        try:
            step = np.dot(r.T, r) / np.dot(np.dot(r.T, A), r)
        except RuntimeWarning:
            warnings.resetwarnings()
            return x

        # Update x
        x = x + step * r
        exes.append(x)

    warnings.resetwarnings()
    return x, exes, res
(exes and res are returned for debugging)
I assume the problem must be with calculating r or step (or some deeper issue) but I can't make out what it is.
The code seems correct. For example, the following test works for me (both linalg.solve and steepestDescent give close answers, most of the time):
import numpy as np
n = 100
A = np.random.random(size=(n,n)) + 10 * np.eye(n)
print(np.linalg.eig(A)[0])
b = np.random.random(size=(n,1))
x, xs, r = steepestDescent(A,b, numIter=50)
print(x - np.linalg.solve(A,b))
The problem is in the math. This algorithm is guaranteed to converge to the correct solution if A is a positive definite matrix. By adding 10 times the identity matrix to a random matrix, we increase the probability that all of the eigenvalues are positive.
If you test with large plain random matrices (for example A = random.random(size=(n,n))), you are almost certain to have a negative eigenvalue, and the algorithm will not converge.
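If you want a test matrix that is positive definite by construction rather than by probability, one small sketch (my own illustration, not part of the answer above) is to build it as MᵀM plus a scaled identity:

import numpy as np

# Sketch: a symmetric positive definite test matrix, on which steepest descent converges.
n = 100
M = np.random.random(size=(n, n))
A = np.dot(M.T, M) + n * np.eye(n)   # M^T M is positive semi-definite; the shift makes it definite
b = np.random.random(size=(n, 1))
# x, xs, r = steepestDescent(A, b, numIter=50)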
I'm a bit of a beginner, and I am in the process of moving an algorithm that performs minimum-variance optimization from scipy.optimize.minimize (which didn't perform properly) to CVXPY.
R is the expected returns, C the covariances and rf the risk-free rate. w is the optimal weights and r the various means along the efficient frontier for which the weights are calculated.
When I run the code below I get:
ValueError: setting an array element with a sequence.
I believe var is at fault here, but I don't know how else to structure it. Insight is much appreciated. On top of that, the rest of the code could have additional errors, so if you spot any, please do point them out!
def solve_frontier(R, C, rf, context):
    frontier_mean, frontier_var, frontier_weights = [], [], []
    n = len(R)
    w = cvx.Variable(n)
    r = cvx.Parameter(sign='positive')

    mean_1 = sum(R*w)
    var = dot(dot(w, C), w)
    penalty = (1/100)*abs(mean_1-r)

    prob = cvx.Problem(cvx.Minimize(var + penalty),
                       [sum(w)-context.allowableMargin == 0])

    r_vals = linspace(max(min(R), rf), max(R), num=20)
    for i in range(20):
        r.value = r_vals[i]
        prob.solve()
        frontier_mean.append(r)
        frontier_var.append(compute_var(prob.value, C))
        frontier_weights.append(prob.value)
        print "status:", prob.status

    return array(frontier_mean), array(frontier_var), frontier_weights
The problem was in frontier_mean.append(r), which should have been frontier_mean.append(r.value).
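As a side sketch of my own (not part of the fix above): if the var = dot(dot(w, C), w) expression also gives trouble, CVXPY provides quad_form for exactly this quadratic, which keeps the objective a CVXPY expression instead of a numpy array. Reusing the question's penalty and constraint as written:

var = cvx.quad_form(w, C)   # w' C w as a CVXPY expression
prob = cvx.Problem(cvx.Minimize(var + penalty),
                   [sum(w) - context.allowableMargin == 0])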
I need to fit a function
z(u,v) = C u v^p
That is, I have a two-dimensional data set, and I have to find two parameters, C and p. Is there something in numpy or scipy that can do this in a straightforward manner? I took a look at scipy.optimize.leastsq, but it's not clear to me how I would use it here.
import scipy.optimize

def f(x, u, v, z_data):
    C = x[0]
    p = x[1]
    modelled_z = C*u*v**p
    diffs = modelled_z - z_data
    return diffs.flatten()  # leastsq expects a 1-D array of residuals;
                            # it doesn't matter that the data are conceptually 2-D,
                            # provided we flatten them consistently

result = scipy.optimize.leastsq(f, [1.0, 1.0],        # initial guess at the starting point
                                args=(u, v, z_data))  # alternatively, use closure variables in f if you like
# result[0] is the best-fit point [C, p]
For your specific function you might be able to do better. For example, for any given value of p there is a single best value of C that can be determined by straightforward linear algebra.
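To make that last remark concrete, here is a small sketch of my own: minimizing sum((C*u*v**p - z)**2) over C alone, with p held fixed, gives the closed form C = sum(g*z)/sum(g*g) with g = u*v**p.

import numpy as np

def best_C_for_p(p, u, v, z):
    # For fixed p the model is linear in C, so the least-squares optimum is explicit.
    g = u * v**p
    return np.sum(g * z) / np.sum(g * g)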
You can transform the problem into a simple linear least squares problem, and then you don't need leastsq() at all.
z[i] == C * u[i] * v[i]**p
becomes
z[i]/u[i] == C * v[i]**p
And then
log(z[i]/u[i]) == log(C) + p * log(v[i])
Change variables and you can solve as a simple linear problem:
Z[i] == L + p * V[i]
Using numpy and assuming you have the data in arrays z, u and v, this is rendered as:
Z = np.log(z/u)
V = np.log(v)
p, L = np.polyfit(V, Z, 1)
C = np.exp(L)
You probably ought to put a try/except around it (or mask the data first) in case some of the u, v or z values are zero or negative, since the logarithm is undefined there.
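As a quick sanity check of this log-linear route, here is a synthetic example of my own (C = 2.5 and p = 1.7 chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.5, 2.0, 200)
v = rng.uniform(0.5, 2.0, 200)
z = 2.5 * u * v**1.7 * np.exp(0.01 * rng.standard_normal(200))  # small multiplicative noise

Z = np.log(z/u)
V = np.log(v)
p, L = np.polyfit(V, Z, 1)
C = np.exp(L)
print(C, p)   # recovers roughly 2.5 and 1.7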
I'm trying to implement Bayesian PCA using the PyMC library for Python, but I'm stuck where I define the lower-dimensional coordinates...
The model is
x = Wz + e
where x is the observation vector, W is the transformation matrix, and z is the lower-dimensional coordinate vector.
First I define a distribution for the transformation matrix W. Each column is drawn from a normal distribution (zero mean and identity covariance, for simplicity):
def W_logp(value):
    logLikes = np.array([multivariate_normal.logpdf(value[:, i], mean=np.zeros(dimX), cov=1)
                         for i in range(0, dimZ)])
    return logLikes.sum()

def W_random():
    W = np.zeros([dimX, dimZ])
    for i in range(0, dimZ):
        W[:, i] = multivariate_normal.rvs(mean=np.zeros(dimX), cov=1)
    return W
w0 = np.random.randn(dimX, dimZ)
W = pymc.Stochastic(
    logp=W_logp,
    doc='Transformation',
    name='W',
    parents={},
    random=W_random,
    trace=True,
    value=w0,
    dtype=float,
    rseed=116.,
    observed=False,
    cache_depth=2,
    plot=False,
    verbose=0)
Then I want to define the distribution for z, which is again multivariate normal (zero mean and identity covariance). However, I need to draw a separate z for each observation, while W is common to all of them. So I tried
z = pymc.MvNormal('z', np.zeros(dimZ), np.eye(dimZ), size=N)
However, pymc.MvNormal does not have a size parameter, so it raises an error. The next step would be
m = Data.mean(axis=0) + np.dot(W, z)
obs = pymc.MvNormal('Obs', m, C, value=Data, observed=True)
I did not give the specification for C above since it is irrelevant for now. Any ideas on how to implement this?
Thanks
EDIT
After Chris Fonnesbeck's answer I changed my code as follows:
numD, dimX = Data.shape
dimZ = 3
mm = Data.mean(axis=0)

tau = pymc.Gamma('tau', alpha=10, beta=2)
tauW = pymc.Gamma('tauW', alpha=20, beta=2, size=dimZ)

@pymc.deterministic(dtype=float)
def C(tau=tau):
    return (tau)*np.eye(dimX)

@pymc.deterministic(dtype=float)
def CW(tau=tauW):
    return np.diag(tau)

W = [pymc.MvNormal('W%i' % i, np.zeros(dimZ), CW) for i in range(dimX)]
z = [pymc.MvNormal('z%i' % i, np.zeros(dimZ), np.eye(dimZ)) for i in range(numD)]
mu = [pymc.Lambda('mu%i' % i, lambda W=W, z=z: mm + np.dot(np.array(W), np.array(z[i])))
      for i in range(numD)]
obs = [pymc.MvNormal('Obs%i' % i, mu[i], C, value=Data[i, :], observed=True) for i in range(numD)]

model = pymc.Model([tau, tauW] + obs + W + z)
mcmc = pymc.MCMC(model)
But this time, it tries to allocate a huge amount of memory (more than 8GB) when running pymc.MCMC(model), with numD=45 and dimX=504. Even when I try it with only numD=1 (so creating only 1 z, mu, and obs), it does the same. Any idea why?
Unfortunately, PyMC does not easily let you define vectors of multivariate stochastics. Hopefully we can make this happen in PyMC 3. For now, you would have to specify this using a container. For example:
z = [pymc.MvNormal('z_%i' % i, np.zeros(dimZ), np.eye(dimZ)) for i in range(N)]
Regarding the memory issue, try using a different backend for the traces. The default ("ram") keeps everything in RAM. You can try something like "pickle" or "sqlite" instead.
Regarding the plate notation, it might be something we could pursue for PyMC 3. Feel free to create an issue suggesting this in our issue tracker.
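To make the backend suggestion concrete, here is a minimal sketch of how a non-RAM trace backend is selected in PyMC 2 (to the best of my recollection of that API; the file name is just illustrative):

# Sketch: keep the traces on disk instead of in RAM.
mcmc = pymc.MCMC(model, db='sqlite', dbname='bpca_trace.sqlite')
mcmc.sample(iter=10000, burn=5000)
mcmc.db.close()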