I'm a bit of a beginner, and I'm in the process of moving an algorithm that performs minimum-variance optimization from scipy.optimize.minimize (which didn't perform properly) to CVXPY.
R are the expected returns, C the covariances, and rf the risk-free rate; w are the optimal weights, and r the various target means along the efficient frontier for which the weights are calculated.
When I run the code below I get:
ValueError: setting an array element with a sequence.
I believe var is at fault here, but I don't know how else to structure it. Insight is much appreciated. On top of that, the rest of the code may contain further errors, so if you spot any, please point them out!
def solve_frontier(R, C, rf, context):
    frontier_mean, frontier_var, frontier_weights = [], [], []
    n = len(R)
    w = cvx.Variable(n)
    r = cvx.Parameter(sign='positive')
    mean_1 = sum(R*w)
    var = dot(dot(w, C), w)
    penalty = (1/100)*abs(mean_1-r)
    prob = cvx.Problem(cvx.Minimize(var + penalty),
                       [sum(w) - context.allowableMargin == 0])
    r_vals = linspace(max(min(R), rf), max(R), num=20)
    for i in range(20):
        r.value = r_vals[i]
        prob.solve()
        frontier_mean.append(r)
        frontier_var.append(compute_var(prob.value, C))
        frontier_weights.append(prob.value)
        print "status:", prob.status
    return array(frontier_mean), array(frontier_var), frontier_weights
The problem was in frontier_mean.append(r), which should have been frontier_mean.append(r.value).
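Separately, expressing the quadratic term with CVXPY's own atoms is more robust than routing it through NumPy's dot. A minimal sketch (cvx.quad_form is CVXPY's atom for w'Cw; the exact spellings of the sum and abs atoms vary between pre-1.0 and 1.x versions):
# Sketch only: keep the whole objective symbolic in CVXPY
var = cvx.quad_form(w, C)                  # w' C w as a CVXPY expression
penalty = (1.0/100)*cvx.abs(mean_1 - r)    # symbolic absolute value
prob = cvx.Problem(cvx.Minimize(var + penalty),
                   [cvx.sum(w) - context.allowableMargin == 0])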
The simple optimization model below (a support vector machine; see https://www.supplychaindataanalytics.com/creating-a-support-vector-machine-using-gekko-in-python/ for further information) is an NLP with T=86 training points and U=6 input features (the dataset is generated for this minimal working example).
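For reference, the problem being built for each class j (after recoding the labels to y_t = ±1) is the standard linear SVM dual, which the code below implements directly:
maximize    sum_t alpha_t - (1/2) * sum_{t,t'} alpha_t * alpha_t' * y_t * y_t' * <a_t, a_t'>
subject to  sum_t alpha_t * y_t = 0,    alpha_t >= 0 for all t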
import numpy as np
import gekko as op
import itertools as it

a = np.random.rand(86, 6)
b = np.random.randint(0, 6, size=(86))

C = range(len(set(b))) #Set of classes
U = range(len(a[0]))   #Set of input features
T = range(len(b))      #Set of the training points

def model(C, U, T, a, b, solve="y"):
    save_b = tuple(b)
    alpha_c = [None for j in C]
    z_c = [None for j in C]
    for j in C:
        for t in T:
            if b[t] == j:
                b[t] = +1
            else:
                b[t] = -1
        print(b)
        m = op.GEKKO(remote=False, name='SupportVectorMachine')
        alpha = {t: m.Var(lb=0, ub=None) for t in T}
        n_a = {(t,i): a[t][i] for t,i in it.product(T,U)}
        n_b = {t: b[t] for t in T}
        objs = {0: m.sum([alpha[t] for t in T])
                   - 0.5*m.sum([alpha[t]*alpha[tt] * n_b[t]*n_b[tt]
                                * m.sum([n_a[(t,i)]*n_a[(tt,i)] for i in U])
                                for t,tt in it.product(T,T)])}
        cons = {0: {0: (m.sum([alpha[t]*n_b[t] for t in T]) == 0) for t in T}}
        m.Maximize(objs[0])
        for keys1 in cons:
            for keys2 in cons[keys1]:
                m.Equation(cons[keys1][keys2])
        if solve == "y":
            m.options.SOLVER = 1
            m.solve(disp=False)
            for keys in alpha:
                alpha[keys] = alpha[keys].value[0]
                print(f"alpha[{keys}]", alpha[keys])
        x = [None for i in U]
        for i in U:
            x[i] = sum(alpha[t]*b[t]*n_a[(t,i)] for t in T)
        for t in T:
            if alpha[t] > 0:
                z = b[t] - sum(x[i]*n_a[(t,i)] for i in U)
                break
        b = list(save_b)
        alpha_c[j] = alpha
        z_c[j] = z
    return m, z, alpha

m, z, alpha = model(C, U, T, a, b) #Model and solve the problem
With m.options.SOLVER=1, the code exits with:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\USERNAME\\AppData\\Local\\Temp\\tmpj66p0g5qsupportvectormachine\\options.json'
With m.options.SOLVER=2, the code exits with:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\USERNAME\\AppData\\Local\\Temp\\tmpgat29b25supportvectormachine\\options.json'
With m.options.SOLVER=3, the code exits with:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\USERNAME\\AppData\\Local\\Temp\\tmpgat29b25supportvectormachine\\options.json'
With m = op.GEKKO(remote=True, name='SupportVectorMachine'), the code seems to take far too long to run, and no output is reported.
Why does this happen, and how can I troubleshoot the code? Do we need to feed the solver an initial guess every time? (I am using gekko 1.21.5.)
Thanks in advance.
Edit:
On Google's Colaboratory, it exits with:
Exception: #error: Solution Not Found
Try IMODE=2 (regression mode), where the model is defined only once and automatically applied to each data point. This greatly improves the speed of model compilation. Below is a minimal example of IMODE=2. There is also a comparison to Scipy Minimize for this same example, as well as a nonlinear regression example.
import numpy as np
from gekko import GEKKO
# load data
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556,
               69.567,8.960,86.197,66.857,16.875,
               52.2697,93.917,24.35,5.118,25.126,
               34.037,61.4445,42.704,39.531,29.988])
ym = np.array([5.072,7.1588,7.263,4.255,6.282,
               6.9118,4.044,7.2595,6.898,4.8744,
               6.5179,7.3434,5.4316,3.38,5.464,
               5.90,6.80,6.193,6.070,5.737])
# define model
m = GEKKO()
# one value across all data
a,b,c = m.Array(m.FV,3,value=0)
c.LOWER = -100; c.UPPER = 100
# load data
x = m.Param(value=xm)
ymeas = m.Param(value=ym)
# predicted value
ypred = m.Var()
a.STATUS = 1; b.STATUS = 1; c.STATUS = 1
m.Equation(ypred == a + b/x + c*m.log(x))
m.Minimize(((ypred-ymeas)/ymeas)**2)
m.options.IMODE = 2 # regression mode
m.solve() # remote=False for local solve
print('Final SSE Objective: ' + str(m.options.objfcnval))
There are also a couple of things in the code that improve the performance.
The m.sum([n_a[(t,i)]*n_a[(tt,i)] for i in U]) in the objective function can be replaced with a regular Python summation, sum([n_a[(t,i)]*n_a[(tt,i)] for i in U]). When all of the arguments are floating-point numbers, this evaluates the value directly instead of building the Gekko symbolic form of the summation.
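Concretely, only the inner summation in the question's objective changes (a sketch of just this edit):
objs = {0: m.sum([alpha[t] for t in T])
           - 0.5*m.sum([alpha[t]*alpha[tt] * n_b[t]*n_b[tt]
                        # plain Python sum: all terms are floats, so the inner
                        # product of two data rows is evaluated immediately
                        * sum([n_a[(t,i)]*n_a[(tt,i)] for i in U])
                        for t,tt in it.product(T,T)])}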
Can the cons be defined once (dimension T) instead of the same constraint repeated (dimension T*T)?
cons = {0: m.sum([alpha[t]*n_b[t] for t in T]) == 0}
m.Maximize(objs[0])
for keys1 in cons:
    m.Equation(cons[keys1])
Thanks for posting the SVM code. If it can be transitioned to IMODE=2, then the solution efficiency can be greatly improved. If that is not possible (stay with the IMODE=3 default), then hopefully the suggestions above help.
We're working on a new module in Gekko to import sklearn (see the sklearn example), tensorflow, gpflow, and other models into Gekko. The potential is to use those specially crafted regression packages and export the models to the more general gekko optimization package. We've found that a general gradient-based solver may do too much work in the fitting by providing automatic differentiation of the constraints and objective with sparse 1st and 2nd derivatives when only 1st derivatives are required. The optimizers designed specifically for each regression method may be more efficient, especially with large data sets that train on GPUs. More documentation and results are expected in the next release of Gekko, v1.1+, around Sept 2022.
I'm trying to implement non-negative matrix factorization in Theano. In more detail, I'm trying to find two matrices L and R such that their product L x R approximates a given matrix M as accurately as possible.
To find the L and R matrices I use back-propagation. At some point I noticed that the values in L and R can become negative (of course nothing prevents backprop from doing that). I tried to correct this behavior by adding the following lines after the back-propagation step:
self.L.set_value(T.abs_(self.L).eval())
self.R.set_value(T.abs_(self.R).eval())
After that, my program became much slower.
Am I doing something wrong? Do I update the values of the tensors in a wrong way? Is there a way to do it faster?
ADDED
As requested in the comments, here is more code. This is how I define the function in __init__:
self.L = theano.shared(value=np.random.rand(n_rows, n_hids), name='L', borrow=True)
self.R = theano.shared(value=np.random.rand(n_hids, n_cols), name='R', borrow=True)
Y = theano.dot(self.L, self.R)
diff = X - Y
D = T.pow(diff, 2)
E = T.sum(D)
gr_L = T.grad(cost=E, wrt=self.L)
gr_R = T.grad(cost=E, wrt=self.R)
self.l_rate = theano.shared(value=0.000001)
L_ups = self.L - self.l_rate*gr_L
R_ups = self.R - self.l_rate*gr_R
updates = [(self.L, L_ups), (self.R, R_ups)]
self.backprop = theano.function([X], E, updates=updates)
Then in my train function I had this code:
for i in range(self.n_iter):
    costs = self.backprop(X, F)
    self.L.set_value(T.abs_(self.L).eval())
    self.R.set_value(T.abs_(self.R).eval())
A minor remark: I use the abs_ function here, but it would actually make more sense to use a function that replaces negative values with zero.
You can force the symbolic update values for L and R to always be positive like this:
self.l_rate = theano.shared(value=0.000001)
L_ups = self.L - self.l_rate*gr_L
R_ups = self.R - self.l_rate*gr_R
# This force R and L to always be updated to a positive value
L_ups_abs = T.abs_(L_ups)
R_ups_abs = T.abs_(R_ups)
# Use the update L_ups_abs instead of L_ups (same with R_ups)
updates = [(self.L, L_ups_abs), (self.R, R_ups_abs)]
self.backprop = theano.function([X], E, updates=updates)
and remove the lines
self.L.set_value(T.abs_(self.L).eval())
self.R.set_value(T.abs_(self.R).eval())
from your training loop.
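As a side note, the question remarks that replacing negative values with zero would make more sense than taking the absolute value; the same symbolic-update pattern supports that with T.maximum (a sketch of that variant):
# Clamp negative entries of the updates to zero instead of reflecting
# them with abs_ (a projected gradient step onto the non-negative orthant)
L_ups_clamped = T.maximum(L_ups, 0.)
R_ups_clamped = T.maximum(R_ups, 0.)
updates = [(self.L, L_ups_clamped), (self.R, R_ups_clamped)]
self.backprop = theano.function([X], E, updates=updates)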
My implementation of steepest descent for solving Ax = b is showing some weird behavior: for any matrix large enough (~10 x 10; I have only tested square matrices so far), the returned x consists entirely of huge values (on the order of 1e10).
import warnings
import numpy as np

def steepestDescent(A, b, numIter=100, x=None):
    """Solves Ax = b using the steepest descent method"""
    warnings.filterwarnings(action="error", category=RuntimeWarning)
    # Reshape b in case it has shape (n,)
    b = b.reshape(len(b), 1)
    exes = []
    res = []
    # Make a guess for x if none is provided
    if x is None:
        x = np.zeros((len(A[0]), 1))
    exes.append(x)
    for i in range(numIter):
        # Re-calculate r(i) using r(i) = b - Ax(i) every five iterations
        # to prevent roundoff error. Also calculates the initial direction
        # of steepest descent.
        if (i % 5) == 0:
            r = b - np.dot(A, x)
        # Otherwise use r(i+1) = r(i) - step * Ar(i)
        else:
            r = r - step * np.dot(A, r)
        res.append(r)
        # Calculate the step size. Catching the runtime warning allows the
        # function to stop and return before all iterations are completed.
        # This is necessary because once the solution x has been found,
        # r = 0, so the calculation below divides by 0, turning step into
        # "nan", which would then overwrite the correct answer in x with "nan"s.
        try:
            step = np.dot(r.T, r) / np.dot(np.dot(r.T, A), r)
        except RuntimeWarning:
            warnings.resetwarnings()
            return x
        # Update x
        x = x + step * r
        exes.append(x)
    warnings.resetwarnings()
    return x, exes, res
(exes and res are returned for debugging)
I assume the problem must be with calculating r or step (or some deeper issue), but I can't make out what it is.
The code seems correct. For example, the following test works for me (both linalg.solve and steepestDescent give close answers, most of the time):
import numpy as np
n = 100
A = np.random.random(size=(n,n)) + 10 * np.eye(n)
print(np.linalg.eig(A)[0])
b = np.random.random(size=(n,1))
x, xs, r = steepestDescent(A,b, numIter=50)
print(x - np.linalg.solve(A,b))
The problem is in the math. This algorithm is guaranteed to converge to the correct solution only if A is a positive definite matrix. By adding 10 * identity to a random matrix, we increase the probability that all of its eigenvalues are positive.
If you test with large random matrices (for example, A = random.random(size=(n,n))), you are almost certain to have a negative eigenvalue, and the algorithm will not converge.
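A quick way to test against a matrix that is positive definite by construction is to form M^T M plus a small diagonal shift (a sketch; the shift size 1e-3 is arbitrary):
import numpy as np

n = 100
M = np.random.random(size=(n, n))
# M.T @ M is symmetric positive semi-definite; the small diagonal
# shift makes it strictly positive definite
A = np.dot(M.T, M) + 1e-3 * np.eye(n)
assert np.all(np.linalg.eigvalsh(A) > 0)  # all eigenvalues are positive
b = np.random.random(size=(n, 1))
x, xs, r = steepestDescent(A, b, numIter=100)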
I'm working on programming an MLE for the Polya distribution using scipy. The Nelder-Mead method works; however, I get a "Desired error not necessarily achieved due to precision loss." error when running BFGS. The Nelder-Mead method seems too slow for my needs (I have a lot of fairly big data, say 1000 tables, in some cases as big as 10x10000). I've tried using the check_grad function and the result is smallish on the example below (order 10^-2), so I'm not sure whether that means there's a bug in the gradient of the log-likelihood or the likelihood is just very strongly peaked. For what it's worth, I've stared quite hard at my code and I can't see the issue. Here's some example code to recreate the problem:
#setup some data
import numpy as np
import pandas as pd
from numpy import exp
from numpy.random import dirichlet, multinomial
from scipy.optimize import minimize, check_grad
alpha = [10,30,50]
p = pd.DataFrame(dirichlet(alpha,200))
data = p.apply(lambda x: multinomial(500,x),1)
a = np.array(data.mean(0))
#optimize
result = minimize(lambda a: -1*llike(data,exp(a)),
                  x0=np.log(a),
                  method='Nelder-Mead')
x0 = result.x
result = minimize(lambda a: -1*llike(data,exp(a)),
                  x0=x0,
                  jac=lambda a: -1*gradient_llike(data,np.exp(a)),
                  method='BFGS')
exp(result.x) #should be close to alpha
#uhoh, let's check that this is right.
check_grad(func=lambda a: -1*llike(data,a),grad=lambda a: -1*gradient_llike(data,a),x0=alpha)
Here's the code for my functions:
from scipy.special import gammaln, psi

def log_polya(Z, alpha):
    """
    Z is a vector of counts
    https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution
    http://mimno.infosci.cornell.edu/info6150/exercises/polya.pdf
    """
    if not isinstance(alpha, np.ndarray):
        alpha = np.array(alpha)
    if not isinstance(Z, np.ndarray):
        Z = np.array(Z)
    #Concentration Parameter
    A = sum(alpha)
    #Number of Datapoints
    N = sum(Z)
    return gammaln(A) - gammaln(N+A) + sum(gammaln(Z+alpha) - gammaln(alpha))

def llike(data, alpha):
    return sum(data.apply(log_polya, 1, alpha=alpha))

def log_polya_derivative(Z, alpha):
    if not isinstance(alpha, np.ndarray):
        alpha = np.array(alpha)
    if not isinstance(Z, np.ndarray):
        Z = np.array(Z)
    if 0. in Z+alpha:
        Warning("invalid prior parameter, nans should be produced")
    #Concentration Parameter
    A = sum(alpha)
    #Number of Datapoints
    N = sum(Z)
    K = len(Z)
    return np.array([psi(A) - psi(N+A) + psi(Z[i]+alpha[i]) - psi(alpha[i]) for i in xrange(K)])

def gradient_llike(data, alpha):
    return np.array(data.apply(log_polya_derivative, 1, alpha=alpha).sum(0))
UPDATE: I'm still curious about this, but for those interested in a working implementation for this problem, the following code implementing the Minka fixed-point algorithm seems to work well (i.e., it quickly recovers values close to the true Dirichlet parameter).
def minka_mle_polya(data):
    """
    http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/minka-dirichlet.pdf
    """
    data = np.array(data)
    K = np.shape(data)[1]
    alpha = np.array(data.mean(0))
    alpha_new = np.ndarray((K))
    precision = 10
    while precision > 10**-5:
        for k in range(K):
            A = sum(alpha)
            N = data.sum(1)
            numerator = sum(
                psi(data[:,k]+alpha[k]) - psi(alpha[k])
            )
            denominator = sum(
                psi(N+A) - psi(A)
            )
            alpha_new[k] = alpha[k]*numerator/denominator
        precision = sum(abs(alpha_new - alpha))
        alpha_old = np.array(alpha)
        alpha = np.array(alpha_new)
        print "Gap", precision
    return alpha
I'm trying to implement Bayesian PCA using the PyMC library for Python. But I'm stuck at the point where I define the lower-dimensional coordinates...
The model is
x = Wz + e
where x is the observation vector, W is the transformation matrix, and z is the lower-dimensional coordinate vector.
First I define a distribution for the transformation matrix W. Each column is drawn from a normal distribution (zero mean and identity covariance, for simplicity):
import numpy as np
import pymc
from scipy.stats import multivariate_normal

def W_logp(value):
    logLikes = np.array([multivariate_normal.logpdf(value[:,i], mean=np.zeros(dimX), cov=1)
                         for i in range(0, dimZ)])
    return logLikes.sum()

def W_random():
    W = np.zeros([dimX, dimZ])
    for i in range(0, dimZ):
        W[:,i] = multivariate_normal.rvs(mean=np.zeros(dimX), cov=1)
    return W

w0 = np.random.randn(dimX, dimZ)

W = pymc.Stochastic(
    logp = W_logp,
    doc = 'Transformation',
    name = 'W',
    parents = {},
    random = W_random,
    trace = True,
    value = w0,
    dtype = float,
    rseed = 116.,
    observed = False,
    cache_depth = 2,
    plot = False,
    verbose = 0)
Then, I want to define the distribution for z, which is again multivariate normal (zero mean and identity covariance). However, I need to draw a z for each observation separately, while W is common to all of them. So I tried:
z = pymc.MvNormal('z', np.zeros(dimZ), np.eye(dimZ), size=N)
However, pymc.MvNormal does not have a size parameter, so it raises an error. The next step would be:
m = Data.mean(axis=0) + np.dot(W, z)
obs = pymc.MvNormal('Obs', m, C, value=Data, observed=True)
I did not give the specification for C above since it is irrelevant for now. Any ideas on how to implement this?
Thanks
EDIT
After Chris Fonnesbeck's answer, I changed my code as follows:
numD, dimX = Data.shape
dimZ = 3
mm = Data.mean(axis=0)
tau = pymc.Gamma('tau', alpha=10, beta=2)
tauW = pymc.Gamma('tauW', alpha=20, beta=2, size=dimZ)

@pymc.deterministic(dtype=float)
def C(tau=tau):
    return (tau)*np.eye(dimX)

@pymc.deterministic(dtype=float)
def CW(tau=tauW):
    return np.diag(tau)

W = [pymc.MvNormal('W%i'%i, np.zeros(dimZ), CW) for i in range(dimX)]
z = [pymc.MvNormal('z%i'%i, np.zeros(dimZ), np.eye(dimZ)) for i in range(numD)]
# bind i as a default argument so each mu uses its own z[i]
mu = [pymc.Lambda('mu%i'%i, lambda W=W, z=z, i=i: mm + np.dot(np.array(W), np.array(z[i])))
      for i in range(numD)]
obs = [pymc.MvNormal('Obs%i'%i, mu[i], C, value=Data[i,:], observed=True) for i in range(numD)]
model = pymc.Model([tau, tauW] + obs + W + z)
mcmc = pymc.MCMC(model)
But this time it tries to allocate a huge amount of memory (more than 8 GB) when running pymc.MCMC(model), with numD=45 and dimX=504. Even when I try it with only numD=1 (thus creating only one z, mu, and obs), it does the same. Any idea why?
Unfortunately, PyMC does not easily let you define vectors of multivariate stochastics. Hopefully we can make this happen in PyMC 3. For now, you would have to specify this using a container. For example:
z = [pymc.MvNormal('z_%i' % i, np.zeros(dimZ), np.eye(dimZ)) for i in range(N)]
Regarding the memory issue, try using a different backend for the traces. The default ("ram") keeps everything in RAM. You can try something like "pickle" or "sqlite" instead.
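For example, a minimal sketch using the PyMC 2 API (the dbname here is arbitrary):
# Store the trace on disk instead of keeping every sample in RAM
mcmc = pymc.MCMC(model, db='pickle', dbname='bpca_trace.pickle')
mcmc.sample(10000)
mcmc.db.close()  # flush the trace to disk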
Regarding the plate notation, it might be something we could pursue for PyMC 3. Feel free to create an issue suggesting this in our issue tracker.