Non-unique tensor decomposition for latent analysis - Python

I want to use Tucker and canonical polyadic decomposition (CP, also known as PARAFAC/CANDECOMP) of a 3-dimensional tensor for latent analysis.
I use Python, specifically the parafac function from tensorly.decomposition in the tensorly library.
import tensorly as tl
from tensorly.decomposition import parafac

# Rank of the CP decomposition
cp_rank = 5
# Perform the CP decomposition
weights, factors = parafac(result, non_negative=True, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)
# Reconstruct the tensor from the factors
cp_reconstruction = tl.kruskal_to_tensor((weights, factors))
The factor matrices and core are not unique (they can be multiplied by a non-singular matrix), so the factor matrices change every time the function is called.
Use this code to see it:
weights = 0
for i in range(100):
    error = weights
    weights, factors = parafac(result, non_negative=True, rank=8, normalize_factors=True, init='random', tol=10e-6)
    error -= weights
    print(tl.norm(error))
How can I describe or analyze each component of the tensor? Does each one have a meaning?
For a matrix I understand the SVD. What should I do for a tensor?

The decomposition you are using in your example (parafac, also known as the Canonical Polyadic (CP) decomposition) does not have a core. It expresses the original tensor as a weighted sum of rank-1 tensors, i.e. a weighted sum of outer products of vectors. These vectors are collected, one per mode (dimension), into the factor matrices. The weights of the sum form a vector. The CP decomposition, unlike the Tucker, does not have a core and is unique under mild conditions (you could consider CP a special case of Tucker, with the weight vector being the values of a diagonal core).
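To make this concrete, here is a minimal sketch (with made-up shapes and random factors, not your data) of how a rank-R CP model rebuilds a 3-way tensor as a weighted sum of R rank-1 terms; it should match what tl.kruskal_to_tensor((w, [A, B, C])) returns for the same factors:
import numpy as np

R = 3                       # hypothetical CP rank
A = np.random.rand(4, R)    # mode-0 factor matrix
B = np.random.rand(5, R)    # mode-1 factor matrix
C = np.random.rand(6, R)    # mode-2 factor matrix
w = np.ones(R)              # weights of the rank-1 terms

# Weighted sum of R rank-1 tensors, each the outer product of one column per mode.
T = np.zeros((4, 5, 6))
for r in range(R):
    T += w[r] * np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])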
However, there are several issues with directly comparing the factors: for one, even if the decomposition is unique, it is also invariant under permutations of the factors so you can't directly compare the factors. In addition, finding the actual rank of a tensor is in general NP-hard. What is typically computed with a CP decomposition is a low-rank approximation (i.e. best rank-R approximation) which is also, in general, NP-hard and the ALS is just a (good) heuristic. If you want to compare several factorized tensors, it is easier to compare the reconstructions rather than directly the factors.
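For instance, comparing the reconstructions of two runs (a sketch reusing the result tensor and cp_rank from your snippet) could look like this; the factors may differ by permutation and scaling, but the reconstructions should be close:
import tensorly as tl
from tensorly.decomposition import parafac

# Two runs with different random initializations on the same tensor.
w1, f1 = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)
w2, f2 = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)

rec1 = tl.kruskal_to_tensor((w1, f1))
rec2 = tl.kruskal_to_tensor((w2, f2))

# Relative difference between the two reconstructions.
print(tl.norm(rec1 - rec2) / tl.norm(rec1))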
For latent factor analysis I advise you to look at this paper, which shows how you can learn latent variable models by factorizing low-order observable moments.

Related

Calculating the derivative of a fully connected layer using numpy

I'm implementing my own neural network from scratch (educational purposes, I know there are faster and better libraries for this) and for that I'm trying to calculate the derivative of a fully connected layer. I know the following:
and assuming I have a way to calculate the derivative of f using f.derivative(<some_matrix>), how can I use numpy to efficiently calculate the derivative of f(XW) with respect to W as seen in the picture?
I want to be able to calculate the derivative for N different inputs at the same time (giving me a 4-d tensor, 1 dimension for the N samples, and 3 dimensions for the derivative in the image).
Note: f.derivative takes in a matrix of N inputs with d features each, and returns the derivative of each of the input points.
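The picture is not reproduced here, but under the usual reading (Y = f(XW) with f applied elementwise, X of shape (N, d) and W of shape (d, m)), one possible batched-Jacobian sketch in numpy is shown below; the helper name fc_jacobian and the exact axis layout are assumptions, not part of the original question:
import numpy as np

def fc_jacobian(X, W, f_derivative):
    # Assumes f acts elementwise, so d f(Z)[n, j] / d W[i, k] = f'(Z[n, j]) * X[n, i] * delta(j, k).
    # f_derivative stands in for the question's f.derivative: it takes the (N, m)
    # pre-activation matrix and returns f' evaluated at every entry.
    Z = X @ W                    # (N, m) pre-activations
    dF = f_derivative(Z)         # (N, m) elementwise derivative of f
    m = W.shape[1]
    # J[n, j, i, k] = dF[n, j] * X[n, i] * I[j, k]  ->  one (m, d, m) block per sample
    return np.einsum('nj,ni,jk->njik', dF, X, np.eye(m))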

Efficient pytorch broadcasting command not obtained

I have a class-wise feature vector for each of the 5 classes in my model. The feature vector for each class is 20-dimensional. I want to multiply each class's feature vector by a scalar gain and sum the resulting weighted feature vectors to form a new feature matrix.
The gain_matrix holds a scalar value for each (i, j) pair of classes. The new (20-dimensional) feature vector of the i-th class is the sum, over all classes j, of that scalar gain multiplied by the j-th class's feature vector.
The exact implementation code is shown below.
import torch

nClass = 5
feature_dim = 20
gain_matrix = torch.rand(nClass, nClass)
feature_matrix = torch.rand(nClass, feature_dim)  # in my implementation this is the output from the model
feature_matrix_new = torch.zeros(nClass, feature_dim)
for i in range(nClass):
    for j in range(nClass):
        feature_matrix_new[i, :] += gain_matrix[i][j] * feature_matrix[j, :]
The nested for loop is slowing down the implementation a lot.
Is there any efficient PyTorch broadcasting solution to avoid the nested for loop in my implementation?
I have seen the PyTorch broadcasting page, but it did not help me much.
This would be a good place to use torch.einsum:
>>> feature_matrix_new = torch.einsum('ij,jk->ik', gain_matrix, feature_matrix)
However, in this case it just comes down to a matrix multiplication:
>>> feature_matrix_new = gain_matrix @ feature_matrix
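A quick sanity check (reusing nClass, feature_dim, gain_matrix and feature_matrix as defined in the question) should show the one-liner matches the nested loop:
# Recompute the loop result and compare it to the matrix product.
loop_result = torch.zeros(nClass, feature_dim)
for i in range(nClass):
    for j in range(nClass):
        loop_result[i, :] += gain_matrix[i][j] * feature_matrix[j, :]

fast_result = gain_matrix @ feature_matrix
print(torch.allclose(loop_result, fast_result, atol=1e-6))  # expected: True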

Spectral norm 2x2 matrix in tensorflow

I've got a 2x2 matrix defined by the variables J00, J01, J10, J11 coming in from other inputs. Since the matrix is small, I was able to compute the spectral norm by first computing the trace and determinant
J_T = tf.reduce_sum([J00, J11])
J_ad = tf.reduce_prod([J00, J11])
J_cb = tf.reduce_prod([J01, J10])
J_det = tf.reduce_sum([J_ad, -J_cb])
and then solving the quadratic
L1 = J_T/2.0 + tf.sqrt(J_T**2/4.0 - J_det)
L2 = J_T/2.0 - tf.sqrt(J_T**2/4.0 - J_det)
spectral_norm = tf.maximum(L1, L2)
This works, but it looks rather ugly and it isn't generalizable to larger matrices. Is there cleaner way (maybe a method call that I'm missing) to compute spectral_norm?
The spectral norm of a matrix J equals the largest singular value of the matrix.
Therefore you can use tf.svd() to perform the singular value decomposition, and take the largest singular value:
spectral_norm = tf.svd(J, compute_uv=False)[..., 0]
where J is your matrix.
Notes:
I use compute_uv=False since we are interested only in singular values, not singular vectors.
J does not need to be square.
This solution works also for the case where J has any number of batch dimensions (as long as the two last dimensions are the matrix dimensions).
The ellipsis ... operation works as in NumPy.
I take the 0 index because we are interested only in the largest singular value.
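To tie this back to the question's scalar inputs, a minimal sketch (the constants below are placeholders for J00..J11, and tf.linalg.svd is the current name of tf.svd) might look like:
import tensorflow as tf

# Placeholder values standing in for the scalars coming from other inputs.
J00, J01, J10, J11 = [tf.constant(x) for x in (1.0, 2.0, 3.0, 4.0)]

# Stack the scalars into a 2x2 matrix and take the largest singular value.
J = tf.stack([tf.stack([J00, J01]), tf.stack([J10, J11])])
spectral_norm = tf.linalg.svd(J, compute_uv=False)[..., 0]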

How to generate a random covariance matrix in Python?

So I would like to generate a 50 X 50 covariance matrix for a random variable X given the following conditions:
one variance is 10 times larger than the others
the parameters of X are only slightly correlated
Is there a way of doing this in Python/R etc? Or is there a covariance matrix that you can think of that might satisfy these requirements?
Thank you for your help!
OK, you only need one matrix and randomness isn't important. Here's a way to construct a matrix according to your description. Start with an identity matrix 50 by 50. Assign 10 to the first (upper left) element. Assign a small number (I don't know what's appropriate for your problem, maybe 0.1? 0.01? It's up to you) to all the other elements. Now take that matrix and square it (i.e. compute transpose(X) . X where X is your matrix). Presto! You've squared the eigenvalues so now you have a covariance matrix.
If the small element is small enough, X is already positive definite. But squaring guarantees it (assuming there are no zero eigenvalues, which you can verify by computing the determinant -- if the determinant is nonzero then there are no zero eigenvalues).
I assume you can find Python functions for these operations.
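A minimal numpy sketch of that recipe (one reading of it, with the small value placed on the off-diagonal entries and set to 0.01 here) could be:
import numpy as np

n = 50
small = 0.01                     # the "small number"; pick it to control how correlated X is

M = np.full((n, n), small)       # small value everywhere...
np.fill_diagonal(M, 1.0)         # ...identity on the diagonal
M[0, 0] = 10.0                   # the one large entry from the recipe

cov = M.T @ M                    # squaring keeps the matrix symmetric positive definite

# Sanity checks: nonzero determinant of M means no zero eigenvalues in cov.
print(np.linalg.det(M) != 0, np.all(np.linalg.eigvalsh(cov) > 0))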

Jaccard's distance matrix with tensorflow

I would like to compute a distance matrix using the Jaccard distance. And do so as fast as possible. I used to use scikit-learn's pairwise_distances function. But scikit-learn doesn't plan to support GPU, and there's even a known bug that makes the function slower when run in parallel.
My only constraint is that the resulting distance matrix can then be fed to scikit-learn's DBSCAN clustering algorithm. I was thinking about implementing the computation with tensorflow but couldn't find a nice and simple way to do it.
PS: I have reasons to precompute the distance matrix instead of letting DBSCAN do it as needed.
Hi, I was facing the same problem.
Given that the Jaccard similarity is the ratio of true positives (tp) to the sum of true positives, false negatives (fn) and false positives (fp), I came up with this solution:
def jaccard_distance(self):
    # tf.multiply is the current name of the older tf.mul
    tp = tf.reduce_sum(tf.multiply(self.target, self.prediction), 1)
    fn = tf.reduce_sum(tf.multiply(self.target, 1 - self.prediction), 1)
    fp = tf.reduce_sum(tf.multiply(1 - self.target, self.prediction), 1)
    return 1 - (tp / (tp + fn + fp))
Hope this helps!
I am not a tensorflow expert, but here is the solution I got. As far as I know, the only ways in tensorflow to do a computation on all-pairs of a list is to do a matrix multiplication or use the broadcasting rules, this solution uses both at some point.
So let's assume we have an input boolean matrix of n_samples rows, one per set, and n_features columns, one per possible element. A value True in the i-th row, j-th column means the i-th set contains the element j, just like scikit-learn's pairwise_distances expects. We can then proceed as follows.
Cast the matrix to numbers, getting 1 for True and 0 for False.
Multiply the matrix by its own transpose. This produces a matrix where each element M[i][j] contains the size of the intersection between the i-th and j-th sets.
Compute a cardv vector that contains the cardinality of all the sets by summing the input matrix by rows.
Make a row and a column vector from cardv.
Compute 1 - M / (cardvrow + cardvcol - M). The broadcasting rules will do all the work when adding a row and a column vector.
This algorithm as a whole seems a bit hackish, but it works and produces results within a reasonable margin of the result computed by scikit-learn's pairwise_distances function. A better algorithm would probably make a single pass over every pair of input vectors and compute only half of the matrix, as it is symmetric. Any improvement is welcome.
setsin = tf.placeholder(tf.bool, shape=(N, M))
sets = tf.cast(setsin, tf.float16)
mat = tf.matmul(sets, sets, transpose_b=True, name="Main_matmul")
#mat = tf.cast(mat, tf.float32, name="Upgrade_mat")
#sets = tf.cast(sets, tf.float32, name="Upgrade_sets")
cardinal = tf.reduce_sum(sets, 1, name="Richelieu")
cardinalrow = tf.expand_dims(cardinal, 0)
cardinalcol = tf.expand_dims(cardinal, 1)
mat = 1 - mat / (cardinalrow + cardinalcol - mat)
I used float16 type as it seems much faster than float32. Casting to float32 might only be useful if the cardinals are large enough to make them inaccurate or if more precision is needed when performing the division. But even when the casts are needed, it seems to be still relevant to do the matrix multiplication as float16.
