Is it possible to fit a multivariate GMHMM in hmmlearn?


I know it is possible to fit several sequences into hmmlearn but it seems to me that these sequences need to be drawn from the same distributions.
Is it possible to fit a GMHMM with several observations sequences drawn from different distributions in hmmlearn?
My use case :
I would like to fit a GMHMM with K financial time series from different stocks and predict the market regime that generated the K stock prices at a specified time.
So the matrix input has dimension N (number of dates) × K (number of stocks).
If hmmlearn can't do that, please tell me if it is possible with another package in python or R?
Thanks for your help!

My approach to your problem would be to use a multivariate Gaussian for the emission probabilities.
For example, let's assume that K is 2, i.e., the number of locations is 2.
In hmmlearn, K is encoded in the dimensions of the mean matrix.
See the example "Sampling from HMM", which has a 2-dimensional output. In other words, X.shape = (N, K), where N is the length of the sample (500 in this case) and K is the dimension of the output (2 in this case).
Notice that the authors plotted each dimension on its own axis, i.e., the x-axis plots the first dimension X[:, 0], and the y-axis the second dimension X[:, 1].
To train your model, make sure that your sequences (say X1 and X2) have the same layout as the sampled X in the example, i.e. one row per time step and K columns, and form the training dataset as described here.
In summary, adapt the example to your case by setting K to your number of series instead of K=2, and switch from GaussianHMM to the Gaussian-mixture variant (the class is called GMMHMM in hmmlearn).
# Another example
import numpy as np
from hmmlearn import hmm

model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
K = 3  # Number of sites
model.n_features = K  # observations have K features (fit() also infers this from the data)
# Create a random training sequence (only 1 sequence) with length = 100.
X1 = np.random.randn(100, K)  # 100 observations for K sites
model.fit(X1)
# Sample the fitted model: X has shape (200, K), Z holds the 200 hidden states
X, Z = model.sample(200)
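The step "form the training dataset" can be sketched with NumPy alone. This is a minimal illustration under assumptions: the two sequences `X1` and `X2`, their lengths, and `K = 3` are made up, and the final `fit` call is only shown as a comment because it needs hmmlearn installed.

```python
import numpy as np

K = 3  # number of series (columns); hypothetical value
rng = np.random.default_rng(0)

# Two hypothetical observation sequences, each of shape (n_dates, K)
X1 = rng.standard_normal((100, K))
X2 = rng.standard_normal((80, K))

# Stack the sequences along the time axis and record each sequence's length
X = np.concatenate([X1, X2])
lengths = [len(X1), len(X2)]

print(X.shape, lengths)
# With hmmlearn installed, the pair would then be passed as model.fit(X, lengths)
```

Every row of `X` is one date and every column one series, so all K series observed on the same date are modelled jointly by one hidden state.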

Related

Draw categorical vectors from pyMC3 with dirichlet prior

I want to draw categorical vectors whose prior is a product of Dirichlet distributions. The categories are fixed and each element in the categorical vector corresponds to a different Dirichlet prior. Here is a categorical vector of length 33 with 4 categories, set up with a Dirichlet prior.
import numpy as np
import pymc3 as pm
with pm.Model() as model3:
    theta = pm.Dirichlet(name='theta', a=np.ones((33, 4)), shape=(33, 4))
    seq = [pm.Categorical(name='seq_{}'.format(str(i)), p=theta[i, :], shape=(1,)) for i in range(33)]
    step1 = pm.Metropolis(vars=[theta])
    step2 = [pm.CategoricalGibbsMetropolis(vars=[i]) for i in seq]
    trace = pm.sample(50, step=[step1] + [i for i in step2])
However this approach is cumbersome as I have to do some array indexing to get the categorical vectors out. Are there better ways of doing this?

You don't need to specify the shape. Note that the way you've set it up there are 33 different categorical variables; I'm assuming that's what you've intended. Here's the easier way to do that:
with pm.Model() as model:
    theta = pm.Dirichlet(name='theta', a=np.ones(4))
    children = [pm.Categorical(f"seq_{i}", p=theta) for i in range(33)]

Constructing a Custom Loss Function in Keras

I am attempting to write a custom loss function in Keras from this paper. Namely, the loss I want to create is this:
This is a type of ranking loss for multi-class multi-label problems. Here are the details:
Y_i = set of positive labels for sample i
Y_i^bar = set of negative labels for sample i (complement of Y_i)
c_j^i = prediction on i^th sample at label j
In what follows, both y_true and y_pred are of dimension 18.
def multilabel_loss(y_true, y_pred):
    """ Multi-label loss function.
    More complete description here...
    """
    zero = K.tf.constant(0, dtype=tf.float32)
    where_one = K.tf.not_equal(y_true, zero)
    where_zero = K.tf.equal(y_true, zero)
    Y_p = K.tf.where(where_one)
    Y_n = K.tf.where(where_zero)
    n = K.tf.shape(y_true)[0]
    loss = 0
    for i in range(n):
        # Here i is the ith sample; for a specific i, I find all locations
        # where Y_p, Y_n belong to the ith sample; axis 0 denotes
        # the sample index space
        Y_p_i = K.tf.equal(Y_p[:,0], K.tf.constant(i, dtype=tf.int64))
        Y_n_i = K.tf.equal(Y_n[:,0], K.tf.constant(i, dtype=tf.int64))
        # Here I plug in those locations to get the values
        Y_p_i = K.tf.where(Y_p_i)
        Y_n_i = K.tf.where(Y_n_i)
        # Here I get the indices of the values above
        Y_p_ind = K.tf.gather(Y_p[:,1], Y_p_i)
        Y_n_ind = K.tf.gather(Y_n[:,1], Y_n_i)
        # Here I compute Y_i and its complement
        yi = K.tf.shape(Y_p_ind)[0]
        yi_not = K.tf.shape(Y_n_ind)[0]
        # The value to normalize the inner summation
        normalizer = K.tf.divide(1, K.tf.multiply(yi, yi_not))
        # This creates a matrix of all combinations of indices k, l from the
        # above equation; then it is reshaped
        prod = K.tf.map_fn(lambda x: K.tf.map_fn(lambda y: K.tf.stack( [ x, y ] ), Y_n_ind ), Y_p_ind )
        prod = K.tf.reshape(prod, [-1, 2, 1])
        prod = K.tf.squeeze(prod)
        # Next, the indices are fed into the corresponding prediction
        # matrix, where the values are then exponentiated and summed
        y_pred_gather = K.tf.gather(y_pred[i,:].T, prod)
        s = K.tf.cast(K.sum(K.tf.exp(K.tf.subtract(y_pred_gather[:,0], y_pred_gather[:,1]))), tf.float64)
        loss = loss + K.tf.multiply(normalizer, s)
    return loss
My questions are the following:
1. When I go to compile my graph, I get an error revolving around n. Namely, TypeError: 'Tensor' object cannot be interpreted as an integer. I've looked around, but I can't find a way to stop this. My hunch is that I need to avoid a for loop altogether, which brings me to
2. How can I write this loss without for loops? I'm fairly new to Keras and have spent a solid few hours writing this custom loss myself. I'd love to write it more concisely. What's blocking me from using all matrices is the fact that Y_i and its complement can take on different sizes for each i.
Please let me know if you'd like me to elaborate more on my code. Happy to do so.
UPDATE 3
As per @Parag S. Chandakkar's suggestions, I have the following:
def multi_label_loss(y_true, y_pred):
    # set consistent casting
    y_true = tf.cast(y_true, dtype=tf.float64)
    y_pred = tf.cast(y_pred, dtype=tf.float64)
    # this get all positive predictions and negative predictions
    # it also exponentiates them in their respective Y_i classes
    PT = K.tf.multiply(y_true, tf.exp(-y_pred))
    PT_complement = K.tf.multiply((1-y_true), tf.exp(y_pred))
    # this step gets the weight vector that we'll normalize by
    m = K.shape(y_true)[0]
    W = K.tf.multiply(K.sum(y_true, axis=1), K.sum(1-y_true, axis=1))
    W_inv = 1./W
    W_inv = K.reshape(W_inv, (m,1))
    # this step computes the outer product of two tensors
    def outer_product(inputs):
        """
        inputs: list of two tensors (of equal dimensions,
        for which you need to compute the outer product
        """
        x, y = inputs
        batchSize = K.shape(x)[0]
        outerProduct = x[:,:, np.newaxis] * y[:,np.newaxis,:]
        outerProduct = K.reshape(outerProduct, (batchSize, -1))
        # returns a flattened batch-wise set of tensors
        return outerProduct
    # set up inputs to outer product
    inputs = [PT, PT_complement]
    # compute final loss
    loss = K.sum(K.tf.multiply(W_inv, outer_product(inputs)))
    return loss

This is not an answer but more like my thought process which should help you to write a concise code.
Firstly, I don't think you should worry about that error for now because by the time you eliminate for loops, your code may look very different.
Now, I haven't looked at the paper but the predictions c_j^i should be the raw values that come out of the last non-softmax layer (that is what I assume).
So you can add an additional exp layer and compute exp(c_j^i) for each prediction. Now, the for loop comes because of the summation. If you look closely, all it is doing is first forming pairs of all the labels and then subtracting their corresponding predictions. Now, first express the subtraction as exp(c_l^i) * exp(-c_k^i). To see what is happening, take a simple example.
import numpy as np
a = [1, 2, 3]
a = np.reshape(a, (3,1))
Following above explanation, you want the following result.
r1 = sum([1 * 2, 1 * 3, 2 * 3]) = sum([2, 3, 6]) = 11
You could get the same result from the outer product of a with itself, which is a way to eliminate for loops.
r2 = a * a.T
# r2 = array([[1, 2, 3],
# [2, 4, 6],
# [3, 6, 9]])
Extract the upper triangular part, i.e. 2, 3, 6 and sum the array to get 11, which is the result you want. Now, there may be some differences, for example, you may need to exhaustively form all the pairs. You should be able to convert it in the form of matrix multiplication.
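The toy example above can be checked directly. This is a small NumPy sketch of the upper-triangular trick only; the name `pair_sum` is made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3]).reshape(3, 1)

# Broadcasting a (3, 1) column against its (1, 3) transpose gives all pairwise products
outer = a * a.T

# Keep only the entries strictly above the diagonal (the pairs 1*2, 1*3, 2*3) and sum them
pair_sum = np.triu(outer, k=1).sum()
print(pair_sum)
```

`k=1` excludes the diagonal, so the squares 1, 4, 9 are not counted.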
Once you have taken care of the summation term, the normalization term can be easily computed if you pre-compute the quantities |Y_i| and |\bar{Y_i}| for each sample i. Supply them as input arrays and pass them into the loss as a part of y_pred. The final summation over i will be done by Keras.
Edit 1: Even if |Y_i| and \bar{Y_i} take on different values, you should be able to build a generic formula for extracting the upper triangular part irrespective of the matrix size once you have pre-computed |Y_i| and \bar{Y_i}.
Edit 2: I don't think you understood me completely. In my opinion, NumPy shouldn't be used at all in the loss function. This is (mostly) doable using only Tensorflow. I will explain once more, while preserving my earlier explanation.
I now know that there is a cartesian product between the positive labels and negative labels (i.e. Y_i and \bar{Y_i}, respectively). So first, put a layer of exp after the raw predictions (in TF, not in Numpy).
Now, you need to know which indices out of the 18 dimensions of y_true correspond to positive and which ones correspond to negative. If you are using one hot encoding, you can find this out on-the-fly by using tf.where and tf.gather (see here).
By now, you should know the indices j (in c_j^i) that correspond to positive and negative labels. All you need to do is compute \sum_(k, l) {exp(c_k^i) * (1 / exp(c_l^i))} for pairs (k, l). All you need to do is form one tensor consisting of exp(c_k^i) for all k (call it A) and another one consisting of exp(c_l^i) for all l (call it B). Then compute sum(A * B^T). No need to extract the upper triangular part too if you are taking cartesian product. By now, you should have the result of inner-most summation.
Contrary to what I said before, I think you could also compute the normalization factor on-the-fly from y_true.
You only have to figure out how to extend this to three dimensions to handle multiple samples.
Note: The usage of Numpy is probably possible by using tf.py_func but does not seem necessary here. Just use functions of TF.
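The algebra behind this can be checked outside of Keras. Because the pairs form a full cartesian product, the double sum factorises: the sum over (k, l) of A_k * B_l equals (sum of A) * (sum of B), so no explicit pair tensor is needed. The sketch below uses NumPy only to verify that identity against plain loops; inside a Keras loss the same element-wise products and row sums would be written with TF or backend ops. It assumes the sign convention of the asker's update (exp(-c) on positive labels, exp(c) on negative labels) and that every sample has at least one positive and one negative label; the function names are made up.

```python
import numpy as np

def pairwise_rank_loss(y_true, y_pred):
    """Sum over samples of the mean over (positive k, negative l) of exp(-(c_k - c_l))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pos = (y_true * np.exp(-y_pred)).sum(axis=1)            # sum_k exp(-c_k) over positives
    neg = ((1.0 - y_true) * np.exp(y_pred)).sum(axis=1)     # sum_l exp(c_l) over negatives
    norm = y_true.sum(axis=1) * (1.0 - y_true).sum(axis=1)  # |Y_i| * |Y_i^bar|
    return float(np.sum(pos * neg / norm))

def pairwise_rank_loss_loops(y_true, y_pred):
    """Reference implementation with explicit loops over samples and label pairs."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        pos_idx = [j for j, v in enumerate(yt) if v == 1]
        neg_idx = [j for j, v in enumerate(yt) if v == 0]
        s = sum(np.exp(-(yp[k] - yp[l])) for k in pos_idx for l in neg_idx)
        total += s / (len(pos_idx) * len(neg_idx))
    return total

# Hypothetical batch: 2 samples, 4 labels
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
y_pred = np.array([[0.5, -1.0, 2.0, 0.1], [0.3, 1.2, -0.7, 0.0]])
print(pairwise_rank_loss(y_true, y_pred), pairwise_rank_loss_loops(y_true, y_pred))
```

The vectorised version handles Y_i of different sizes per sample because the 0/1 mask in `y_true` zeroes out the labels that do not belong to each set.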

Vectorized SVM gradient

I was going through the code for SVM loss and derivative, I did understand the loss but I cannot understand how the gradient is being computed in a vectorized manner
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape) # initialize the gradient as zero
    num_train = X.shape[0]
    scores = X.dot(W)
    yi_scores = scores[np.arange(scores.shape[0]),y]
    margins = np.maximum(0, scores - np.matrix(yi_scores).T + 1)
    margins[np.arange(num_train),y] = 0
    loss = np.mean(np.sum(margins, axis=1))
    loss += 0.5 * reg * np.sum(W * W)
I understand everything up to here. After this point, I cannot understand why we sum the binary matrix row-wise and then subtract that sum:
    binary = margins
    binary[margins > 0] = 1
    row_sum = np.sum(binary, axis=1)
    binary[np.arange(num_train), y] = -row_sum.T
    dW = np.dot(X.T, binary)
    # Average
    dW /= num_train
    # Regularize
    dW += reg*W
    return loss, dW

Let us recap the scenario and the loss function first, so we are on the same page:
Given are P sample points in N-dimensional space in the form of a PxN matrix X, so the points are the rows of this matrix. Each point in X is assigned to one out of M categories. These are given as a vector Y of length P that has integer values between 0 and M-1.
The goal is to predict the classes of all points by M linear classifiers (one for each category) given in the form of a weight matrix W of shape NxM, so the classifiers are the columns of W. To predict the categories of all samples X, the scalar products between all points and all weight vectors are formed. This is the same as matrix multiplying X and W, yielding a score matrix Y0 whose rows are ordered like the elements of Y, i.e. each row corresponds to one sample. The predicted category for each sample is simply the one with the largest score.
There are no bias terms so I presume there is some kind of symmetry or zero mean assumption.
Now, to find a good set of weights we want a loss function that is small for good predictions and large for bad predictions and that lets us do gradient descent. One of the most straightforward ways is to punish, for each sample i, each score that is larger than the score of the correct category for that sample, and to let the penalty grow linearly with the difference. So if we write A[i] for the set of categories j that score more than the correct category, i.e. Y0[i, j] > Y0[i, Y[i]], the loss for sample i could be written as
sum_{j in A[i]} (Y0[i, j] - Y0[i, Y[i]])
or equivalently if we write #A[i] for the number of elements in A[i]
(sum_{j in A[i]} Y0[i, j]) - #A[i] Y0[i, Y[i]]
The partial derivatives with respect to the score are thus simply
                    { -#A[i]   if j == Y[i]
dloss / dY0[i, j] = {  1       if j in A[i]
                    {  0       otherwise
which is precisely what the first four lines you say you don't understand compute.
The next line applies the chain rule dloss/dW = dloss/dY0 dY0/dW.
It remains to divide by the number of samples to get a per-sample loss and to add the derivative of the regularization term, which is easy because the regularization is just a componentwise quadratic function.
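The derivation can be checked numerically. The sketch below re-implements the loss and gradient with plain NumPy arrays (unlike the question's code it avoids np.matrix, and it keeps the margin of 1 from the question, which shifts the set A[i] but not the form of the derivative) and compares the analytic gradient with central finite differences. The data, shapes and function names are made up for illustration.

```python
import numpy as np

def svm_loss_grad(W, X, y, reg):
    """Multiclass hinge loss (margin 1) and its gradient with respect to W."""
    num_train = X.shape[0]
    rows = np.arange(num_train)
    scores = X @ W                                   # Y0 in the answer's notation
    correct = scores[rows, y][:, None]
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[rows, y] = 0.0
    loss = margins.sum(axis=1).mean() + 0.5 * reg * np.sum(W * W)

    dscores = (margins > 0).astype(float)            # 1 for every j in A[i]
    dscores[rows, y] = -dscores.sum(axis=1)          # -#A[i] at the correct class
    dW = X.T @ dscores / num_train + reg * W         # chain rule, average, regularizer
    return loss, dW

def numeric_grad(f, W, h=1e-5):
    """Central finite differences of the scalar function f at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))      # P=6 points in N=4 dimensions
y = rng.integers(0, 3, size=6)       # M=3 categories
W = 0.1 * rng.standard_normal((4, 3))
reg = 0.1

loss, dW = svm_loss_grad(W, X, y, reg)
dW_num = numeric_grad(lambda W_: svm_loss_grad(W_, X, y, reg)[0], W)
max_err = np.abs(dW - dW_num).max()
print(max_err)
```

The two gradients should agree up to rounding, since the loss is piecewise linear plus a quadratic term and the check point is unlikely to sit exactly on a kink.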

Personally, I found it much easier to understand the whole gradient calculation through looking at the analytic derivation of the loss function in more detail. To extend on the given answer, I would like to point to the derivatives of the loss function
with respect to the weights as follows:
1) Loss gradient wrt w_yi (correct class)
Hence, we count the cases where w_j does not meet the margin requirement and sum those cases up. The negative of this sum is then used as the weight at the position of the correct class w_yi. (We later need to multiply this value by x_i; this is what line 5 of your code does.)
2) Loss gradient wrt w_j (incorrect classes)
where 1 is the indicator function, 1 if true, else 0.
In other words, "programmatically" we need to apply equation (2) to all cases where the margin requirement is not met, and add the negative sum of all unmet requirements to the true class column (as in (1)).
So what you did in the first 3 lines of your code is to determine the cases where the margin is not met, and to add the negative sum of these cases to the correct class column (y_i). In the fifth line, you do the final step where you multiply by the x_i's, and this completes the gradient calculation as in (1) and (2).
I hope this makes it easier to understand; let me know if anything remains unclear.
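The two formulas referred to as (1) and (2) did not survive in the text. For reference, in the standard formulation of the multiclass SVM loss for one sample, $L_i = \sum_{j \neq y_i} \max(0,\; w_j^\top x_i - w_{y_i}^\top x_i + \Delta)$, the gradients read as follows. This is a reconstruction from the surrounding description, not the answer's original rendering; $\Delta$ is the margin, which is 1 in the question's code.

```latex
\nabla_{w_{y_i}} L_i = -\Big( \sum_{j \neq y_i} \mathbb{1}\big( w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0 \big) \Big)\, x_i \tag{1}

\nabla_{w_j} L_i = \mathbb{1}\big( w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0 \big)\, x_i , \qquad j \neq y_i \tag{2}
```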

Randomly sample x% of each cluster

I am working on a project aiming to exploit the cluster structure of my dataset to improve a supervised active learning classifier for binary classification. I use the following code to cluster my data, X, using scikit-learn's K-Means implementation:
# precompute_distances is dropped here: it was deprecated and later removed from scikit-learn
k = KMeans(n_clusters=(i+2)).fit(X)
df = pd.DataFrame({'cluster' : k.labels_, 'percentage positive' : y})
a = df.groupby('cluster').apply(lambda cluster:cluster.sum()/cluster.count())
The two classes are positive (represented by a 1) and negative (represented by a 0) and are stored in an array y.
This code first clusters X and then stores in a data frame each cluster's number and the percentage of positive instances within it.
I would now like to randomly select points from each cluster, until I have sampled 15%. How can I do this?
As requested here is a simplified script including a test dataset:
from sklearn.cluster import KMeans
import pandas as pd
X = [[1,2], [2,5], [1,2], [3,3], [1,2], [7,3], [1,1], [2,19], [1,11], [54,3], [78,2], [74,36]]
y = [0,0,0,0,0,0,0,0,0,1,0,0]
# precompute_distances is dropped here: it was deprecated and later removed from scikit-learn
k = KMeans(n_clusters=4).fit(X)
df = pd.DataFrame({'cluster' : k.labels_, 'percentage positive' : y})
a = df.groupby('cluster').apply(lambda cluster:cluster.sum()/cluster.count())
print(a)
Note: The real datasets are much larger consisting of thousands of features and thousands of data instances.
In response to @SandipanDey:
I can't tell you too much, but basically we are dealing with a highly unbalanced dataset (1:10,000) and we are only interested in identifying the minority class examples with recall > 95% whilst reducing the number of labels requested. (Recall needs to be so high as it's related to healthcare.)
The minority examples cluster together, and any cluster containing a positive instance will usually contain at least x%, so by sampling x% we ensure that we identify all clusters with any positive instances. So we are able to quickly reduce the size of the dataset with potential positives. This partial dataset can then be used for active learning. Our approach is loosely inspired by 'Hierarchical Sampling for Active Learning'.

If I understood you correctly, the following code should serve the purpose:
import numpy as np
# For each cluster
# (1) Find all the points from X that are assigned to the cluster.
# (2) Choose x% from those points randomly.
n_clusters = 4
x = 0.15  # fraction of each cluster to sample (15%)
for i in range(n_clusters):
    # (1) indices of all the points from X that belong to cluster i
    C_i = np.where(k.labels_ == i)[0].tolist()
    n_i = len(C_i)  # number of points in cluster i
    # (2) indices of the points from X to be sampled from cluster i,
    # drawn without replacement so that no point is picked twice
    sample_i = np.random.choice(C_i, int(x * n_i), replace=False)
    print(i, sample_i)
Just for curiosity, how are you going to use these x% points for active learning?
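One detail worth noting: int(x * n_i) rounds down, so with x = 0.15 any cluster with fewer than 7 points contributes no samples at all, which is the case for most clusters in the small test dataset. A possible variant, sketched below with NumPy only, rounds up and takes at least one point per cluster. The helper name `sample_per_cluster` and the label array are made up for illustration; in practice `labels` would be `k.labels_`.

```python
import numpy as np

def sample_per_cluster(labels, frac, rng):
    """Return {cluster: sampled row indices}, at least one index per cluster."""
    labels = np.asarray(labels)
    picked = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)             # row indices in cluster c
        n_pick = max(1, int(np.ceil(frac * len(members))))  # round up, never zero
        picked[int(c)] = rng.choice(members, size=n_pick, replace=False)
    return picked

# Hypothetical cluster assignment for 12 points
labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 3])
samples = sample_per_cluster(labels, 0.15, np.random.default_rng(0))
for c, idx in samples.items():
    print(c, idx)
```

Rounding up means slightly more than 15% is sampled from small clusters, which seems acceptable when the aim is not to miss any cluster containing positives.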

multiple polynomial regression with different degrees for the different parameters

I have velocity data (mf) for a fluid at 5 axial locations (x) for 14 different combinations of two parameters of the fluid (Re, k). The velocity data is dependent on Re, k and x.
I would like to use sklearn to do polynomial regression of my data as in this post, but I am facing some problems:
1. How should I build the X matrix (the matrix of the independent variables)? It seems to me that there are 3 independent variables here (Re, k and x) but I have 14 values of Re, 14 values of k and only 5 values for x.
2. Would it be possible to regress with degree=1 w.r.t. Re and k and degree=3 w.r.t. x?
Any help is appreciated. Thanks!

If you have three 2d array-like objects Re, k, and x, you can create polynomial features of degree=3 on just x, by applying the PolynomialFeatures transformer to just x before stacking the features into a single matrix.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
poly_x = PolynomialFeatures(3)  # columns 1, x, x^2, x^3
X = np.hstack([Re, k, poly_x.fit_transform(x)])
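For the first part of the question, one way to reconcile 14 values of Re and k with 5 values of x is to put the data in "long" form: one row per (combination, location) pair, which gives 14 * 5 = 70 rows. The sketch below builds such a matrix with NumPy only and fits it by ordinary least squares. All numbers are synthetic, and the assumption that the velocities are stored as a (14, 5) array `mf` with one row per (Re, k) combination is mine, not the asker's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_x = 14, 5
Re = rng.uniform(100.0, 1000.0, n_cases)   # one value per combination (synthetic)
k = rng.uniform(0.1, 1.0, n_cases)         # one value per combination (synthetic)
x = np.linspace(0.0, 1.0, n_x)             # the 5 axial locations (synthetic)

# Long form: repeat each (Re, k) for all 5 locations, and tile x over the 14 cases
Re_long = np.repeat(Re, n_x)
k_long = np.repeat(k, n_x)
x_long = np.tile(x, n_cases)

# Degree 1 in Re and k, degree 3 in x, plus an intercept column
X = np.column_stack([np.ones_like(x_long), Re_long, k_long,
                     x_long, x_long**2, x_long**3])

# Synthetic velocities generated from known coefficients, stored as (14, 5)
true_coef = np.array([1.0, 0.002, -0.5, 0.3, -0.2, 0.1])
mf = (X @ true_coef).reshape(n_cases, n_x)

# Row-major flattening of mf matches the repeat/tile ordering used above
coef, *_ = np.linalg.lstsq(X, mf.ravel(), rcond=None)
print(X.shape, np.round(coef, 3))
```

With real data, `mf.ravel()` only lines up with `X` if the rows of `mf` follow the same order as `Re` and `k` and its columns follow the order of `x`.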
