I have written simple Python code to calculate the entropy of a set, and I am trying to write the same thing in Theano.
import math
# this computes the probabilities of each element in the set
def prob(values):
    return [float(values.count(v))/len(values) for v in values]
# this computes the entropy
def entropy(values):
    p = prob(values)
    return -sum([v*math.log(v) for v in p])
I am trying to write the equivalent code in Theano, but I am not sure how to do it:
import theano
import theano.tensor as T
v = T.vector('v') # I create a symbolic vector to represent my initial values
p = T.vector('p') # The same for the probabilities
# this is my attempt to compute the probabilities which would feed vector p
theano.scan(fn=prob,outputs_info=p,non_sequences=v,n_steps=len(values))
# considering the previous step would work, the entropy is just
e = -T.sum(p*T.log(p))
entropy = theano.function([values],e)
However, the scan line is not correct and I get tons of errors. I am not sure whether there is a simple way to compute the entropy of a vector, or whether I have to put more effort into the scan function. Any ideas?
Other than the point raised by nouiz, p should not be declared as a T.vector, because it will be the result of a computation on your vector of values.
Also, to compute something like entropy, you do not need Scan. Scan introduces computational overhead, so it should only be used when there is no other way to compute what you want, or to reduce memory usage. You can take an approach like this:
import theano
import theano.tensor as T
values = T.vector('values')
nb_values = values.shape[0]
# For every element in 'values', obtain the total number of times
# its value occurs in 'values'.
# NOTE : I've done the broadcasting a bit more explicitly than
# needed, for clarity.
freqs = T.eq(values[:,None], values[None, :]).sum(0).astype("float32")
# Compute a vector containing, for every value in 'values', the
# probability of that value in the vector 'values'.
# NOTE : these probabilities do *not* sum to 1 because they do not
# correspond to the probability of every element in the vector 'values'
# but to the probability of every value in 'values'. For instance, if
# 'values' is [1, 1, 0] then 'probs' will be [2/3, 2/3, 1/3] because the
# value 1 has probability 2/3 and the value 0 has probability 1/3 in
# 'values'.
probs = freqs / nb_values
entropy = -T.sum(T.log2(probs) / nb_values)
fct = theano.function([values], entropy)
# Will output 0.918296...
print(fct([0, 1, 1]))
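As a quick cross-check outside Theano, here is a NumPy-only sketch (the function names are hypothetical, and NumPy stands in for the Theano graph). It computes the entropy both the way the Theano graph does, from per-element probabilities divided by n, and the textbook way, over distinct values via np.unique. The two agree because each distinct value appears in the per-element sum exactly as many times as it occurs.

```python
import numpy as np

def entropy_per_element(values):
    """Mirror of the Theano graph: per-element probabilities, then divide by n."""
    values = np.asarray(values)
    n = values.shape[0]
    # freqs[i] = number of times values[i] occurs in values (pairwise equality, summed)
    freqs = (values[:, None] == values[None, :]).sum(axis=0)
    probs = freqs / n
    return -np.sum(np.log2(probs)) / n

def entropy_distinct(values):
    """Textbook definition: sum over distinct values only."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy_per_element([0, 1, 1]))  # ≈ 0.918296
print(entropy_distinct([0, 1, 1]))     # ≈ 0.918296
```

Both use log2 (bits), like the Theano answer; the question's pure-Python version uses math.log (nats) and sums over every element rather than every distinct value, so swapping np.log2 for np.log in entropy_distinct gives the entropy in nats.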
I am trying to generate a list of 12 random weights for a stock portfolio in order to determine how the portfolio would have performed in the past given different weights assigned to each stock. The sum of the weights must of course be 1 and there is an additional restriction: each stock must have a weight between 1/24 and 1/4.
Although I am able to generate random numbers that all fall within the interval by using random.uniform(), and to guarantee that their sum is 1 by dividing each weight by the sum of the weights, I'm finding that
a) each subsequent array of weights is very similar; I rarely get weights near the upper boundary of 1/4
b) random.seed() does not seem to work properly, whether I put it in the rand_weight() function or at the beginning of the for loop. I'm confused as to why, because I thought that reseeding would make my array of weights unique for each iteration. Currently, it's cyclical, with a period of 3.
The following is my code:
import random
import numpy as np
# boundaries on weightings
n = 12
min_weight = (1/(2*n))
max_weight = 25 / 100
def rand_weight(e):
    random.seed()
    return e + np.random.uniform(min_weight, max_weight)
for i in range(100):
    weights = np.empty(12)
    while not (np.all(weights > min_weight) and np.all(weights < max_weight)):
        weights = np.array(list(map(rand_weight, weights)))
        weights /= np.sum(weights)
I have already tried scattering the weights by changing min_weight and max_weight inside the for loop so that rand_weight generates newer values, but this makes the runtime really slow, because the "not" condition in the while loop takes longer to become false (the probability of all the numbers being in the range decreases).
Let's start with some simple facts. If you want 12 identically distributed numbers in the range [0.042, 0.25] that sum to one, then for the mean value:
$$\sum_{i=1}^{N} X_i = 1$$
$$E\Big[\sum_{i=1}^{N} X_i\Big] = \sum_{i=1}^{N} E[X_i] = N\,E[X_i] = 1$$
$$E[X_i] = \frac{1}{N} = \frac{1}{12} \approx 0.083$$
One corollary is that it will be hard to get numbers close to the upper boundary: the mean of 0.083 sits much closer to the lower bound of 0.042 than to the upper bound of 0.25.
Instead of sampling arbitrary values and then normalizing them to sum to 1, it is better to use a known distribution whose values sum to 1 to begin with.
So let's use the Dirichlet distribution and sample points uniformly on the simplex, which means the alpha (concentration) vector is all ones.
import numpy as np
N = 12
s = np.random.dirichlet(N*[1.0], 1)
print(np.sum(s))
Some values will be larger (or smaller) than your bounds, and you can reject those samples:
def sampleWeights(alpha, lo, hi):
    while True:
        s = np.random.dirichlet(alpha, 1)[0]
        if np.any(s > hi):
            continue # reject
        if np.any(s < lo):
            continue # reject
        return s # accept
and call it like this
N=12
alpha = N*[1.0]
q = sampleWeights(alpha, 1./24., 1./4.)
If you check, you will find that most rejections happen at the lower bound rather than the upper bound.
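One way to check that claim is to draw a batch of samples and count which bound each rejected sample violates. This sketch uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.dirichlet; the seed and the sample count are arbitrary choices.

```python
import numpy as np

N = 12
lo, hi = 1.0 / 24.0, 1.0 / 4.0
rng = np.random.default_rng(0)

# Draw many Dirichlet(1, ..., 1) samples at once: shape (num_samples, N)
samples = rng.dirichlet(N * [1.0], size=10000)

too_low = np.any(samples < lo, axis=1)    # sample has at least one weight below 1/24
too_high = np.any(samples > hi, axis=1)   # sample has at least one weight above 1/4

print("violates low bound: ", too_low.sum())
print("violates high bound:", too_high.sum())
print("accepted:", np.sum(~too_low & ~too_high))
```

For a flat Dirichlet over N components, the probability that every component exceeds a is (1 - N*a)^(N-1); here that is (1 - 12/24)^11 = 1/2048, so nearly every draw trips the lower bound and only a handful out of 10000 are accepted.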
The beauty of using a known distribution like the Dirichlet is that you can "concentrate" the sampled values around the mean, e.g.
alpha = N*[10.0]
q = sampleWeights(alpha, 1./24., 1./4.)
will produce identically distributed values with the same mean of 1/12 but a much smaller standard deviation, i.e. the random variables are much more concentrated around the mean.
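To see the effect of the concentration parameter, the sketch below compares the empirical per-component standard deviation for alpha = 1 and alpha = 10 with the closed-form Dirichlet variance, which for a symmetric concentration a is (1/N)(1 - 1/N)/(N*a + 1). The helper name dirichlet_spread is hypothetical, and it uses NumPy's Generator API.

```python
import numpy as np

def dirichlet_spread(a, N=12, size=20000, seed=0):
    """Empirical per-component mean/std of a symmetric Dirichlet(a, ..., a), plus the closed-form std."""
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(N * [a], size=size)   # shape (size, N); each row sums to 1
    emp_mean = samples.mean(axis=0)                # every entry should be close to 1/N
    emp_std = samples.std(axis=0).mean()           # per-component std, averaged over components
    # Var(X_i) = (1/N) * (1 - 1/N) / (N*a + 1) for a symmetric Dirichlet
    theory_std = np.sqrt((1.0 / N) * (1.0 - 1.0 / N) / (N * a + 1.0))
    return emp_mean, emp_std, theory_std

for a in (1.0, 10.0):
    emp_mean, emp_std, theory_std = dirichlet_spread(a)
    print(a, emp_mean.round(3), round(emp_std, 4), round(theory_std, 4))
```

The formula gives a standard deviation of about 0.077 for alpha = 1 and about 0.025 for alpha = 10, while the mean stays at 1/12 in both cases.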
And if you want non-identically distributed random variables, use different alphas:
alpha = [1.,2.,3.,4.,5.,6.,6.,5.,4.,3.,2.,1.]
q = sampleWeights(alpha, 1./24., 1./4.)
Then some of the random variables will be close to the upper boundary and some close to the lower boundary. There are lots of advantages to using a known distribution.
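The marginal means of the Dirichlet make this concrete: each component's mean is its alpha divided by the sum of all alphas.

$$E[X_i] = \frac{\alpha_i}{\sum_{j=1}^{12} \alpha_j}, \qquad \sum_{j=1}^{12} \alpha_j = 2\,(1+2+3+4+5+6) = 42$$

So the means range from 1/42 ≈ 0.024 (for α_i = 1) to 6/42 ≈ 0.143 (for α_i = 6). Since 1/42 is below the lower bound 1/24 ≈ 0.042, expect the components with α_i = 1 to account for many of the rejections in sampleWeights.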
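The following works. What was particularly confusing to me is that np.empty(12) seemed to always return the same array, so once it had been initialized, it stayed the same. That is because np.empty does not initialize its entries: it returns whatever values happen to be in the allocated memory, which can be the same on every call. Note also that random.seed() reseeds Python's random module, not NumPy's generator, so it has no effect on np.random.uniform.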
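Rescaling the random draws so they sum to 1 - n * min_weight and then adding min_weight to each one guarantees that every weight is above min_weight and that the weights sum to 1; the while loop only has to reject draws where some weight exceeds max_weight.
import numpy as np
from random import random, seed
# boundaries on weightings
n = 12
min_weight = (1/(2*n))
max_weight = 25 / 100
seed(666)
for i in range(100):
    weights = np.zeros(n)
    while not (np.all(weights > min_weight) and np.all(weights < max_weight)):
        weights = np.array([random() for _ in range(n)])
        # scale to the mass left over after reserving min_weight for each of the n weights
        weights *= (1 - min_weight * n) / np.sum(weights)
        # shift every weight up by min_weight, so the weights sum to 1
        weights += min_weight
    print(weights)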
I'm working on an assignment for an online course in which I am tasked with implementing PCA in Python. Unfortunately, when I run a comparison (provided by the course) between my implementation and scikit-learn's, my results differ too much.
After many hours of review, I am still unsure where it is going wrong. If someone could take a look and determine what step I have coded or interpreted incorrectly, I would greatly appreciate it.
import numpy as np
def normalize(X):
    """
    Normalize the given dataset X to have zero mean.
    Args:
        X: ndarray, dataset of shape (N,D)
    Returns:
        (Xbar, mean): tuple of ndarray, Xbar is the normalized dataset
        with mean 0; mean is the sample mean of the dataset.
    Note:
        You will encounter dimensions where the standard deviation is zero.
        For those ones, the process of normalization results in normalized data with NaN entries.
        We can handle this by setting the std = 1 for those dimensions when doing normalization.
    """
    # YOUR CODE HERE
    ### Uncomment and modify the code below
    mu = np.mean(X, axis = 0) # Setting axis = 0 will compute means column-wise. Setting it to 1 will compute the mean across rows.
    std = np.std(X, axis = 0) # Computing the std dev column wise using axis = 0.
    std_filled = std.copy()
    std_filled[std == 0] = 1
    # Compute the normalized data as Xbar
    Xbar = (X - mu)/std_filled
    return Xbar, mu, # std_filled
def eig(S):
    """
    Compute the eigenvalues and corresponding unit eigenvectors for the covariance matrix S.
    Args:
        S: ndarray, covariance matrix
    Returns:
        (eigvals, eigvecs): ndarray, the eigenvalues and eigenvectors
    Note:
        the eigenvals and eigenvecs should be sorted in descending
        order of the eigen values
    """
    # YOUR CODE HERE
    # Uncomment and modify the code below
    # Compute the eigenvalues and eigenvectors
    # You can use library routines in `np.linalg.*` https://numpy.org/doc/stable/reference/routines.linalg.html for this
    eigvals, eigvecs = np.linalg.eig(S)
    # The eigenvalues and eigenvectors need to be sorted in descending order according to the eigenvalues
    # We will use `np.argsort` (https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html) to find a permutation of the indices
    # of eigvals that will sort eigvals in ascending order and then find the descending order via [::-1], which reverse the indices
    sort_indices = np.argsort(eigvals)[::-1]
    # Notice that we are sorting the columns (not rows) of eigvecs since the columns represent the eigenvectors.
    return eigvals[sort_indices], eigvecs[:, sort_indices]
def projection_matrix(B):
    """Compute the projection matrix onto the space spanned by the columns of `B`
    Args:
        B: ndarray of dimension (D, M), the basis for the subspace
    Returns:
        P: the projection matrix
    """
    # YOUR CODE HERE
    P = B # (np.linalg.inv(B.T # B)) # B.T
    return P
def select_components(eig_vals, eig_vecs, num_components):
    """
    Selects the n components desired for projecting the data upon.
    Args:
        eig_vals: The eigenvalues sorted in descending order of magnitude.
        eig_vecs: The eigenvectors sorted in order relative to that of the eigenvalues.
        num_components: the number of principal components to use.
    Returns:
        The number of desired components to keep for projection of the data upon.
    """
    principal_vals, principal_components = eig_vals[:num_components], eig_vecs[:, range(num_components)]
    return principal_vals, principal_components
def PCA(X, num_components):
    """
    Projects normalized data onto the 'n' desired principal components.
    Args:
        X: ndarray of size (N, D), where D is the dimension of the data,
           and N is the number of datapoints
        num_components: the number of principal components to use.
    Returns:
        the reconstructed data, the sample mean of the X, principal values
        and principal components
    """
    # Normalize to have mean 0 and variance 1.
    Z, mean_vec = normalize(X)
    # Calculate the covariance matrix
    S = np.cov(Z, rowvar=False, bias=True) # Set rowvar = False to treat columns as variables. Set bias = True to ensure normalization is done with N and not N-1
    # Calculate the (unit) eigenvectors and eigenvalues of S. Sort them in descending order of importance relative to the magnitude of the eigenvalues.
    eig_vals, eig_vecs = eig(S)
    # Keep only the n largest Principal Components of the sorted unit eigenvectors.
    principal_vals, principal_components = select_components(eig_vals, eig_vecs, num_components)
    # Compute the projection matrix using only the n largest Principal Components of the sorted unit eigenvectors, where n = num_components.
    #P = projection_matrix(eig_vecs[:, :num_components])
    P = projection_matrix(principal_components)
    # Reconstruct the data by using the projection matrix to project the data onto the principal component vectors we've kept
    X_reconst = (P @ X.T).T
    return X_reconst, mean_vec, principal_vals, principal_components
And here is the test case I'm supposed to pass:
import numpy as np
random = np.random.RandomState(0)
X = random.randn(10, 5)
from sklearn.decomposition import PCA as SKPCA
for num_component in range(1, 4):
    # We can compute a standard solution given by scikit-learn's implementation of PCA
    pca = SKPCA(n_components=num_component, svd_solver="full")
    sklearn_reconst = pca.inverse_transform(pca.fit_transform(X))
    reconst, _, _, _ = PCA(X, num_component)
    # The difference in the result should be very small (<10^-20)
    print(
        "difference in reconstruction for num_components = {}: {}".format(
            num_component, np.square(reconst - sklearn_reconst).sum()
        )
    )
    np.testing.assert_allclose(reconst, sklearn_reconst)
As far as I can tell, there are a few things wrong with your code.
Your projection matrix is wrong.
If B is the D x M matrix whose columns are the selected eigenvectors of your covariance matrix, where M is the number of components you select and D is the dimension of the original data, then the projection matrix is just B @ B.T.
In the standard implementation of PCA, we typically do not scale the data by the inverse of the standard deviation. You seem to be trying to approximate whitened PCA (ZCA), but even then it looks wrong.
As a quick test, you can compute the normalized data without dividing by the standard deviation, and when you compute the covariance matrix, set bias=False.
You should also subtract the mean from the data before multiplying it by the projection operator, and add it back afterwards, i.e.,
X_reconst = (P @ (X - mean_vec).T).T + mean_vec
PCA is essentially just a change of basis, followed by discarding the coordinates that correspond to directions with low variance. The eigenvectors of the covariance matrix correspond to the new orthogonal basis, and the eigenvalues tell you the variance of the data along the direction of the corresponding eigenvectors. P = B @ B.T is just the change of basis to the new basis B (B.T, which also discards the dropped coordinates), followed by a change back to the original basis (B).
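Putting those fixes together (center without dividing by the standard deviation, P = B @ B.T, subtract the mean before projecting and add it back afterwards) gives something like the NumPy-only sketch below. Instead of scikit-learn it cross-checks against a truncated SVD of the centered data, which yields the same reconstruction; the function names are hypothetical. It also swaps np.linalg.eig for np.linalg.eigh, which is meant for symmetric matrices such as a covariance matrix and returns real eigenvalues in ascending order.

```python
import numpy as np

def pca_reconstruct(X, num_components):
    """Reconstruct X from its top principal components (centering only, no std scaling)."""
    mean_vec = X.mean(axis=0)
    Xc = X - mean_vec                        # center only; do not divide by the std
    S = np.cov(Xc, rowvar=False)             # D x D covariance; bias does not change eigenvectors
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # descending order of variance
    B = eigvecs[:, order[:num_components]]   # D x M basis of the kept directions
    P = B @ B.T                              # orthogonal projection onto span(B)
    return (P @ Xc.T).T + mean_vec           # project centered data, then add the mean back

def svd_reconstruct(X, num_components):
    """Reference: rank-k truncated SVD of the centered data."""
    mean_vec = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean_vec, full_matrices=False)
    k = num_components
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean_vec

X = np.random.RandomState(0).randn(10, 5)
for k in range(1, 4):
    diff = np.square(pca_reconstruct(X, k) - svd_reconstruct(X, k)).sum()
    print(k, diff)
```

Keeping all D components makes P the identity, so the reconstruction returns X itself, which is a handy sanity check.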
Edit
I'm curious to know which online course teaches people to implement PCA this way.
I want to randomly select sample points based on the probability distribution specified by prob for a given row. However, np.random.choice raises an error saying that the probabilities don't sum to 1. This is very weird, because I first normalize using the L1-norm along the rows, and then I fall back to a uniform distribution for rows whose values are all smaller than the threshold 1e-6.
import numpy as np
import torch.nn.functional as F
prob = F.normalize(outputs, p=1, dim=1).clone().data.cpu().numpy() # outputs is a torch.Tensor of shape (14, 6890)
all_zero = np.where(prob.max(1) < 1e-6)[0] # find indices of rows where all values are smaller
prob[all_zero] = np.full(prob.shape[1], 1 / prob.shape[1]) # fill those rows uniformly
# ... somewhere later inside a method
for j in range(14):
    sample = np.random.choice(6890, 4, replace=False, p=prob[j])
Do you understand why that is?
As the error suggests, prob[j] doesn't sum to 1.
Your epsilon of 1e-6 is way too big to be considered insignificant, and there is no need for this operation at all. If you insist on it, you have to redistribute the zeroed-out values so that what's left still sums to 1 (and it seems you did just that, actually).
All in all, you didn't normalize each row to sum to 1:
prob /= prob.sum(axis=1, keepdims=True) # make each row a prob dist; keepdims gives shape (14, 1) so it broadcasts across columns
By the way, broadcasting will extend a single number to the whole row, so there is no need for np.full:
prob[all_zero] = 1 / prob.shape[1]
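A slightly more defensive variant of the same idea, sketched under two assumptions: the scores are non-negative (an L1-normalized tensor can still contain negative entries, which np.random.choice rejects), and converting to float64 before renormalizing is acceptable. The float64 cast is an extra step not in the answer above; it keeps rounding in the row sums well below the tolerance np.random.choice checks against. The helper name to_prob_rows and the random stand-in scores are hypothetical.

```python
import numpy as np

def to_prob_rows(scores, tiny=1e-12):
    """Convert non-negative row scores into float64 rows that each sum to 1."""
    prob = np.array(scores, dtype=np.float64)      # float64 copy; float32 sums drift more
    row_sums = prob.sum(axis=1, keepdims=True)     # shape (rows, 1) so it broadcasts per row
    empty = row_sums[:, 0] < tiny                  # rows with (numerically) no mass
    prob[empty] = 1.0                              # uniform fallback: equal mass per column...
    row_sums[empty] = prob.shape[1]                # ...divided by the number of columns
    return prob / row_sums

scores = np.random.rand(14, 6890).astype(np.float32)   # stand-in for the network outputs
scores[3] = 0.0                                         # one all-zero row to exercise the fallback
prob = to_prob_rows(scores)
for j in range(14):
    sample = np.random.choice(6890, 4, replace=False, p=prob[j])
```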
I am attempting to write a custom loss function in Keras from this paper. Namely, the loss I want to create is this:
This is a type of ranking loss for multi-class multi-label problems. Here are the details:
Y_i = set of positive labels for sample i
Y_i^bar = set of negative labels for sample i (complement of Y_i)
c_j^i = prediction on i^th sample at label j
In what follows, both y_true and y_pred are of dimension 18.
from keras import backend as K
import tensorflow as tf
def multilabel_loss(y_true, y_pred):
    """ Multi-label loss function.
    More complete description here...
    """
    zero = K.tf.constant(0, dtype=tf.float32)
    where_one = K.tf.not_equal(y_true, zero)
    where_zero = K.tf.equal(y_true, zero)
    Y_p = K.tf.where(where_one)
    Y_n = K.tf.where(where_zero)
    n = K.tf.shape(y_true)[0]
    loss = 0
    for i in range(n):
        # Here i is the ith sample; for a specific i, I find all locations
        # where Y_p, Y_n belong to the ith sample; axis 0 denotes
        # the sample index space
        Y_p_i = K.tf.equal(Y_p[:,0], K.tf.constant(i, dtype=tf.int64))
        Y_n_i = K.tf.equal(Y_n[:,0], K.tf.constant(i, dtype=tf.int64))
        # Here I plug in those locations to get the values
        Y_p_i = K.tf.where(Y_p_i)
        Y_n_i = K.tf.where(Y_n_i)
        # Here I get the indices of the values above
        Y_p_ind = K.tf.gather(Y_p[:,1], Y_p_i)
        Y_n_ind = K.tf.gather(Y_n[:,1], Y_n_i)
        # Here I compute Y_i and its complement
        yi = K.tf.shape(Y_p_ind)[0]
        yi_not = K.tf.shape(Y_n_ind)[0]
        # The value to normalize the inner summation
        normalizer = K.tf.divide(1, K.tf.multiply(yi, yi_not))
        # This creates a matrix of all combinations of indices k, l from the
        # above equation; then it is reshaped
        prod = K.tf.map_fn(lambda x: K.tf.map_fn(lambda y: K.tf.stack( [ x, y ] ), Y_n_ind ), Y_p_ind )
        prod = K.tf.reshape(prod, [-1, 2, 1])
        prod = K.tf.squeeze(prod)
        # Next, the indices are fed into the corresponding prediction
        # matrix, where the values are then exponentiated and summed
        y_pred_gather = K.tf.gather(y_pred[i,:].T, prod)
        s = K.tf.cast(K.sum(K.tf.exp(K.tf.subtract(y_pred_gather[:,0], y_pred_gather[:,1]))), tf.float64)
        loss = loss + K.tf.multiply(normalizer, s)
    return loss
My questions are the following:
When I go to compile my graph, I get an error related to n, namely TypeError: 'Tensor' object cannot be interpreted as an integer. I've looked around, but I can't find a way to avoid this. My hunch is that I need to avoid the for loop altogether, which brings me to my second question:
How can I write this loss without for loops? I'm fairly new to Keras and have spent a solid few hours writing this custom loss myself. I'd love to write it more concisely. What keeps me from using only matrix operations is the fact that Y_i and its complement can have different sizes for each i.
Please let me know if you'd like me to elaborate more on my code. Happy to do so.
UPDATE 3
As per @Parag S. Chandakkar's suggestions, I have the following:
import numpy as np
import tensorflow as tf
from keras import backend as K
def multi_label_loss(y_true, y_pred):
    # set consistent casting
    y_true = tf.cast(y_true, dtype=tf.float64)
    y_pred = tf.cast(y_pred, dtype=tf.float64)
    # this gets all positive predictions and negative predictions
    # it also exponentiates them in their respective Y_i classes
    PT = K.tf.multiply(y_true, tf.exp(-y_pred))
    PT_complement = K.tf.multiply((1-y_true), tf.exp(y_pred))
    # this step gets the weight vector that we'll normalize by
    m = K.shape(y_true)[0]
    W = K.tf.multiply(K.sum(y_true, axis=1), K.sum(1-y_true, axis=1))
    W_inv = 1./W
    W_inv = K.reshape(W_inv, (m,1))
    # this step computes the outer product of two tensors
    def outer_product(inputs):
        """
        inputs: list of two tensors (of equal dimensions),
        for which you need to compute the outer product
        """
        x, y = inputs
        batchSize = K.shape(x)[0]
        outerProduct = x[:,:, np.newaxis] * y[:,np.newaxis,:]
        outerProduct = K.reshape(outerProduct, (batchSize, -1))
        # returns a flattened batch-wise set of tensors
        return outerProduct
    # set up inputs to outer product
    inputs = [PT, PT_complement]
    # compute final loss
    loss = K.sum(K.tf.multiply(W_inv, outer_product(inputs)))
    return loss
This is not an answer but more like my thought process which should help you to write a concise code.
Firstly, I don't think you should worry about that error for now because by the time you eliminate for loops, your code may look very different.
Now, I haven't looked at the paper, but the predictions c_j^i should be the raw values that come out of the last non-softmax layer (that is what I assume).
So you can add an additional exp layer and compute exp(c_j^i) for each prediction. The for loop comes from the summation. If you look closely, all it is doing is forming pairs of labels and then subtracting their corresponding predictions. So first express the exponential of that difference as a product, exp(c_l^i - c_k^i) = exp(c_l^i) * exp(-c_k^i). To see what is happening, take a simple example.
import numpy as np
a = [1, 2, 3]
a = np.reshape(a, (3,1))
Following the above explanation, you want the following result:
r1 = sum([1 * 2, 1 * 3, 2 * 3]) = sum([2, 3, 6]) = 11
You could get the same result by matrix multiplication (here, an outer product), which is a way to eliminate for loops.
r2 = a * a.T
# r2 = array([[1, 2, 3],
# [2, 4, 6],
# [3, 6, 9]])
Extract the strictly upper triangular part, i.e. 2, 3, 6, and sum it to get 11, which is the result you want. Your real case may differ; for example, you may need to form all the pairs exhaustively. Either way, you should be able to express it as a matrix multiplication.
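The same steps in NumPy, using broadcasting for the outer product and np.triu with k=1 to keep only the strictly upper triangular entries:

```python
import numpy as np

a = np.reshape([1, 2, 3], (3, 1))
r2 = a * a.T               # broadcasting (3,1) * (1,3) -> 3x3 outer product
upper = np.triu(r2, k=1)   # strictly upper triangle: keeps 2, 3, 6 and zeros the rest
r1 = upper.sum()           # 2 + 3 + 6
print(r1)                  # 11
```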
Once you have taken care of the summation term, the normalization term can easily be computed if you pre-compute the quantities |Y_i| and |\bar{Y_i}| for each sample i. Pass them as input arrays and feed them into the loss as part of y_pred. The final summation over i will be done by Keras.
Edit 1: Even if |Y_i| and |\bar{Y_i}| take on different values, you should be able to build a generic formula for extracting the upper triangular part irrespective of the matrix size, once you have pre-computed |Y_i| and |\bar{Y_i}|.
Edit 2: I don't think you understood me completely. In my opinion, NumPy shouldn't be used at all in the loss function. This is (mostly) doable using only Tensorflow. I will explain once more, while preserving my earlier explanation.
I now know that there is a cartesian product between the positive labels and the negative labels (i.e. Y_i and \bar{Y_i}, respectively). So first, put a layer of exp after the raw predictions (in TF, not in NumPy).
Now, you need to know which indices out of the 18 dimensions of y_true correspond to positive labels and which ones correspond to negative labels. If you are using one-hot (here, multi-hot) encoding, you can find this out on-the-fly by using tf.where and tf.gather (see here).
By now, you should know the indices j (in c_j^i) that correspond to positive and negative labels. What remains is to compute \sum_{(k, l)} exp(c_l^i - c_k^i) = \sum_{(k, l)} exp(-c_k^i) * exp(c_l^i) over all pairs (k, l) with k in Y_i and l in \bar{Y_i}. Form one tensor consisting of exp(-c_k^i) for all k (call it A) and another consisting of exp(c_l^i) for all l (call it B). Then compute sum(A * B^T). There is no need to extract the upper triangular part when you are taking the cartesian product. By now, you should have the result of the inner-most summation.
Contrary to what I said before, I think you could also compute the normalization factor on-the-fly from y_true.
You only have to figure out how to extend this to three dimensions to handle multiple samples.
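To check the algebra outside the graph, here is a NumPy sketch that compares an explicit double loop with a loop-free version over a whole batch. It assumes the per-sample term is exp(c_l^i - c_k^i) summed over k in Y_i and l in \bar{Y_i} and divided by |Y_i| * |\bar{Y_i}|, as in the explanation above and in UPDATE 3; the paper's own equation is not reproduced here, and the function names are hypothetical. In the actual Keras loss, the same operations would be written with TF ops such as tf.exp and tf.reduce_sum rather than NumPy.

```python
import numpy as np

def ranking_loss_loop(y_true, c):
    """Reference version: explicit loop over samples and over pairs (k, l) in Y_i x Ybar_i."""
    total = 0.0
    for yt, ci in zip(y_true, c):
        pos = np.flatnonzero(yt == 1)   # indices k in Y_i
        neg = np.flatnonzero(yt == 0)   # indices l in Ybar_i
        pair_sum = sum(np.exp(ci[l] - ci[k]) for k in pos for l in neg)
        total += pair_sum / (len(pos) * len(neg))
    return total

def ranking_loss_vectorized(y_true, c):
    """Loop-free version using masks; y_true is multi-hot with shape (batch, labels)."""
    A = y_true * np.exp(-c)          # exp(-c_k^i) where label k is positive, 0 elsewhere
    B = (1.0 - y_true) * np.exp(c)   # exp(c_l^i) where label l is negative, 0 elsewhere
    # sum of the per-sample outer product A_i B_i^T over both label axes
    pair_sum = np.einsum('ik,il->i', A, B)
    norm = y_true.sum(axis=1) * (1.0 - y_true).sum(axis=1)   # |Y_i| * |Ybar_i|
    return np.sum(pair_sum / norm)

rng = np.random.default_rng(0)
y = (rng.random((5, 18)) < 0.3).astype(float)
y[:, 0], y[:, 1] = 1.0, 0.0          # make sure every sample has a positive and a negative label
c = rng.normal(size=(5, 18))
print(ranking_loss_loop(y, c), ranking_loss_vectorized(y, c))
```

Because the double sum factorizes as (sum_k A_ik) * (sum_l B_il), the 18 x 18 outer product is not strictly needed; the per-sample product of two row sums gives the same value. A sample with no positive or no negative labels makes the normalizer zero, so such rows would need to be masked out.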
Note: The usage of Numpy is probably possible by using tf.py_func but does not seem necessary here. Just use functions of TF.
I have the following setup for kmeans clustering algorithm that I am implementing for a project:
import numpy as np
import scipy
import sys
import random
import matplotlib.pyplot as plt
import operator
class KMeansClass:
    #takes in an npArray like object
    def __init__(self,dataset,k):
        self.dataset=np.array(dataset)
        #initialize mins to maximum possible value
        self.min_x = sys.maxint
        self.min_y = sys.maxint
        #initialize maxs to minimum possible value
        self.max_x = -(sys.maxint)-1
        self.max_y = -(sys.maxint)-1
        self.k = k
        #a is the coefficient matrix that is continually updated as the centroids of the clusters change respectively.
        # It is an mxk matrix where each row corresponds to a training_instance and each column corresponds to a centroid of a cluster
        #Values are either 0 or 1. A value for a particular training_instance (data_point) is 1 only for that centroid to which the training_instance
        # has the least distance else the value is 0.
        self.a = np.zeros(shape=[self.dataset.shape[0],self.k])
        self.distanceMatrix = np.empty(shape =[self.dataset.shape[0],self.k])
        #initialize mu to zeros of the requisite shape array for now. Change this after implementing max and min methods.
        self.mu = np.empty(shape=[k,2])
        self.findMinMaxdataPoints()
        self.initializeCentroids()
        self.createDistanceMatrix()
        self.scatterPlotOfInitializedPoints()
    #pointa and pointb are npArray like vectors.
    def euclideanDistance(self,pointa,pointb):
        return np.sqrt(np.sum((pointa - pointb)**2))
    """ Problem Initialization And Visualization Helper methods"""
    ##############################################################################
    ##param: dataset : list of tuples [(x1,y1),(x2,y2),...(xm,ym)]
    def findMinMaxdataPoints(self):
        for item in self.dataset:
            self.min_x = min(self.min_x,item[0])
            self.min_y = min(self.min_y,item[1])
            self.max_x = max(self.max_x,item[0])
            self.max_y = max(self.max_y,item[1])
    def initializeCentroids(self):
        for i in range(self.k):
            #each value of mu is a tuple with a random number between (min_x - max_x) and (min_y - max_y)
            self.mu[i] = (random.randint(self.min_x,self.max_x),random.randint(self.min_y,self.max_y))
        self.sortCentroids()
        print self.mu
    def sortCentroids(self):
        #the following 3 lines of code are to ensure that the mu values are always sorted in ascending order first with respect to the
        #x values and then with respect to the y values.
        half_sorted = sorted(self.mu,key=operator.itemgetter(1)) #sort wrt y values
        full_sorted = sorted(half_sorted,key=operator.itemgetter(0)) #sort the y-sorted array wrt x-values
        self.mu = np.array(full_sorted)
    def scatterPlotOfInitializedPoints(self):
        plt.scatter([item[0] for item in self.dataset],[item[1] for item in self.dataset],color='b')
        plt.scatter([item[0] for item in self.mu],[item[1] for item in self.mu],color='r')
        plt.show()
    ###############################################################################
    #minimizing euclidean distance is the same as minimizing the square of the euclidean distance.
    def calcSquareEuclideanDistanceBetweenTwoPoints(self,point_a,point_b):
        return np.sum((point_a-point_b)**2)
    def createDistanceMatrix(self):
        for i in range(self.dataset.shape[0]):
            for j in range(self.k):
                self.distanceMatrix[i,j] = self.calcSquareEuclideanDistanceBetweenTwoPoints(self.dataset[i],self.mu[j])
    def createCoefficientMatrix(self):
        for i in range(self.dataset.shape[0]):
            self.a[i,self.distanceMatrix[i].argmin()] = 1
    #update functions for CoefficientMatrix and Centroid values:
    def updateCoefficientMatrix(self):
        for i in range(self.dataset.shape[0]):
            self.a[i,self.distanceMatrix[i].argmin()]= 1
    def updateCentroids(self):
        for j in range(self.k):
            non_zero_indices = np.nonzero(self.a[:,j])
            avg = 0
            for i in range(len(non_zero_indices[0])):
                avg+=self.a[non_zero_indices[0][i],j]
            self.mu[j] = avg/len(non_zero_indices[0])
    ############################################################
    def lossFunction(self):
        loss=0;
        for j in range(self.k):
            #vectorized this implementation.
            loss+=np.sum(np.dot(self.a[:,j],self.distanceMatrix[:,j]))
        return loss
My question pertains to lossFunction and how to use it with the scipy.optimize package. I would like to minimize the loss function iteratively by performing the following steps:
Repeat until convergence:
a) Optimize 'a' by keeping mu constant (I have an updateCoefficientMatrix method for updating the 'a' matrix, which is an mXk matrix where we have m training instances and k clusters.)
b) Optimize 'mu' by keeping 'a' constant (I have an updateCentroids method to do this, where mu is a mXk matrix wherein m is the number of training instances and k is the number of clusters and the number of centroids.)
I am very new to the scipy.optimize package, so I am writing to ask for help: how do I invoke scipy.optimize to achieve my optimization goal as stated above?
Basically, I have two mxk matrices, and I would like to minimize lossFunction() by first optimizing one mxk matrix while keeping the other constant, and in the next step optimizing the second matrix while keeping the first constant. This can be considered a special case of the expectation-maximization problem, but unfortunately I haven't quite understood what the documentation is trying to say so far, hence I thought I'd turn to SO for help.
Thanks in advance!
And this is part of a class assignment so please do not post code! Any guidance or explanation would be highly appreciated.
Use scipy.optimize.minimize twice with different objective functions.
First, run the optimization with an objective function that takes a as a parameter and returns the objective value. Note that minimize works on a 1-D parameter vector, so flatten a and reshape it inside the objective.
As the second step, run scipy.optimize.minimize a second time on a second objective function that takes mu as a parameter.
When writing the objective functions, remember that Python has nested functions, which avoid the need to pass mu (in the first case) or a (in the second case) as additional arguments, although that can be done with minimize(..., args=(mu,)) and minimize(..., args=(a,)).
Repeat the two-step process in a for loop until your convergence condition is satisfied.
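Since the question asks for no assignment code, the sketch below shows the same alternating pattern on a hypothetical two-variable toy objective f(x, y) that stands in for lossFunction; it is not k-means. The first step uses a nested function that closes over the fixed variable, and the second passes the fixed variable through minimize's args tuple, matching the two options described above.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, y):
    # hypothetical toy objective standing in for lossFunction(a, mu); convex in (x, y)
    return (x[0] - 3.0) ** 2 + (y[0] + 1.0) ** 2 + 0.5 * x[0] * y[0]

def alternate(x0, y0, max_iter=100, tol=1e-8):
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    prev = f(x, y)
    for _ in range(max_iter):
        # step 1: optimize x with y held fixed (the nested function reads y from the enclosing scope)
        def obj_x(x_var):
            return f(x_var, y)
        x = minimize(obj_x, x).x
        # step 2: optimize y with x held fixed, this time passing x through args
        y = minimize(lambda y_var, x_fixed: f(x_fixed, y_var), y, args=(x,)).x
        cur = f(x, y)
        if abs(prev - cur) < tol:   # convergence check on the change in the objective
            break
        prev = cur
    return x, y

x, y = alternate([0.0], [0.0])
print(x, y)
```

For this quadratic, setting both partial derivatives to zero gives x = 6.5/1.875 ≈ 3.467 and y = -1 - x/4 ≈ -1.867, which is where the alternation converges.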