I have the class-wise feature vectors for the 5 classes in my model; each class's feature vector is 20-dimensional. I want to multiply each class's feature vector by a scalar gain and sum the resulting weighted feature vectors to form a new feature matrix.
The gain_matrix holds a scalar value for each (i, j) pair of classes. The new feature vector (20-dimensional) of the i-th class is the sum over all classes j of gain_matrix[i][j] times the j-th class's feature vector.
The exact implementation code is shown below.
import torch

nClass = 5
feature_dim = 20
gain_matrix = torch.rand(nClass, nClass)
feature_matrix = torch.rand(nClass, feature_dim)  # in my implementation this is the output from the model
feature_matrix_new = torch.zeros(nClass, feature_dim)

for i in range(nClass):
    for j in range(nClass):
        feature_matrix_new[i, :] += gain_matrix[i][j] * feature_matrix[j, :]
The nested for loop is slowing down the implementation a lot.
Is there any efficient PyTorch broadcasting solution to avoid the nested for loop in my implementation?
I have seen the PyTorch broadcasting web page, but it did not help me much.
This would be a good place to use torch.einsum:
>>> feature_matrix_new = torch.einsum('ij,jk->ik', gain_matrix, feature_matrix)
However, in this case it just comes down to a matrix multiplication:
>>> feature_matrix_new = gain_matrix # feature_matrix
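As a quick sanity check (reusing the setup from the question above), both forms reproduce the nested loop exactly:
loop_result = torch.zeros(nClass, feature_dim)
for i in range(nClass):
    for j in range(nClass):
        loop_result[i, :] += gain_matrix[i][j] * feature_matrix[j, :]

einsum_result = torch.einsum('ij,jk->ik', gain_matrix, feature_matrix)
matmul_result = gain_matrix @ feature_matrix

print(torch.allclose(loop_result, einsum_result))  # True
print(torch.allclose(loop_result, matmul_result))  # True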
I want to use the Tucker and canonical polyadic decomposition (CPD, also known as PARAFAC/CANDECOMP) of a 3-dimensional tensor for latent analysis.
I am using Python and the parafac function from tensorly.decomposition in the tensorly library.
import tensorly as tl
from tensorly.decomposition import parafac

# Rank of the CP decomposition
cp_rank = 5

# Perform the CP decomposition
weights, factors = parafac(result, non_negative=True, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)

# Reconstruct the tensor from the factors
cp_reconstruction = tl.kruskal_to_tensor((weights, factors))
The factor matrices and core are not unique (they can be multiplied by a non-singular matrix), so the factor matrices change every time the function is called.
Use this code to see that:
weights = 0
for i in range(100):
    error = weights
    weights, factors = parafac(result, non_negative=True, rank=8, normalize_factors=True, init='random', tol=10e-6)
    error -= weights
    print(tl.norm(error))
How can I describe or analyse each component of the tensor? Does each component have any meaning?
For a matrix I understand the SVD decomposition. What should I do for a tensor?
The decomposition you are using in your example (parafac, also known as Canonical-Polyadic -CP- decomposition), does not have a core. It expresses the original tensor as a weighted sum of rank-1 tensors, i.e. a weighted sum of outer-products of vectors. These vectors are collected for each mode (dimension) into the factor matrices. The weights of the sum are a vector. The CP decomposition, unlike the Tucker, does not have a core and is unique under mild conditions (you could consider CP as a special case of Tucker, with the weight vector being the values of a diagonal core).
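To make that definition concrete, here is a hedged sketch that rebuilds the tensor "by hand" as a weighted sum of outer products, reusing the weights, factors, cp_rank and result names from the snippet in your question (assumed to be a 3rd-order tensor):
import numpy as np

reconstruction = np.zeros(result.shape)
for r in range(cp_rank):
    # outer product of the r-th column of each factor matrix, weighted by weights[r]
    reconstruction += weights[r] * np.einsum('i,j,k->ijk',
                                             factors[0][:, r],
                                             factors[1][:, r],
                                             factors[2][:, r])
# This should match tl.kruskal_to_tensor((weights, factors)) up to numerical error.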
However, there are several issues with directly comparing the factors: for one, even if the decomposition is unique, it is also invariant under permutations of the factors so you can't directly compare the factors. In addition, finding the actual rank of a tensor is in general NP-hard. What is typically computed with a CP decomposition is a low-rank approximation (i.e. best rank-R approximation) which is also, in general, NP-hard and the ALS is just a (good) heuristic. If you want to compare several factorized tensors, it is easier to compare the reconstructions rather than directly the factors.
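A hedged sketch of that comparison, in the style of the parafac call from your question (two independent runs, here without the non_negative flag, comparing the reconstructions rather than the factors):
weights_a, factors_a = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)
weights_b, factors_b = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)

rec_a = tl.kruskal_to_tensor((weights_a, factors_a))
rec_b = tl.kruskal_to_tensor((weights_b, factors_b))

# The factors may differ by permutation/scaling, but the reconstructions
# should be close if both runs converged to a good rank-R approximation.
print(tl.norm(rec_a - rec_b) / tl.norm(rec_a))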
For latent factor analysis I advise you look at this paper which shows how you can learn latent variable models by factorizing low-order observable moments.
In a standard ANN, for fully connected layers we use the following formula: tf.matmul(X, weight) + bias. This is clear to me, as we use matrix multiplication to connect the input with the hidden layer.
But in the GloVe implementation (https://nlp.stanford.edu/projects/glove/) we use the following formula for the embeddings multiplication: tf.matmul(W, tf.transpose(U)). What confuses me is the tf.transpose(U) part.
Why do we use tf.matmul(W, tf.transpose(U)) instead of tf.matmul(W, U)?
It has to do with the choice of column vs row orientation for the vectors.
Note that weight is the second parameter here:
tf.matmul(X, weight)
But the first parameter, W, here:
tf.matmul(W, tf.transpose(U))
So what you are seeing is a practical application of the matrix transpose identity (AB)^T = B^T A^T.
To bring it back to your example, let's assume 10 inputs and 20 outputs.
The first approach uses row vectors. A single input X would be a 1x10 matrix, called a row vector because it has a single row. To match, the weight matrix needs to be 10x20 to produce an output of size 20.
But in the second approach the multiplication is reversed. That is a hint that everything is using column vectors (so named because they have a single column): when the multiplication order is reversed, everything gets a transpose.
That's why the transpose is there. The way the GloVe authors have written their notation, with the multiplication reversed, the weight matrix W must already be transposed to 20x10 instead of 10x20, and they must be expecting a 20x1 column vector for the output.
So if the input vector U is naturally a 1x10 row vector, it also has to be transposed, to a 10x1 column vector, to fit in with everything else.
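To make the two conventions concrete, here is a minimal NumPy sketch of the same 10-input / 20-output example (the names are illustrative, not taken from GloVe):
import numpy as np

x_row = np.random.rand(1, 10)      # row-vector convention: 1x10 input
W_row = np.random.rand(10, 20)     # 10x20 weights
out_row = x_row @ W_row            # 1x20 output

W_col = W_row.T                    # column-vector convention: 20x10 weights
x_col = x_row.T                    # 10x1 input
out_col = W_col @ x_col            # 20x1 output

# Both conventions compute the same numbers, just laid out differently.
print(np.allclose(out_row, out_col.T))   # True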
Basically you should pick row vectors or column vectors, all the time, and then the order of multiplications and the transposition of the weights is determined for you.
Personally I think that column vectors, as used by GloVe, are awkward and unnatural compared to row vectors. It's better to have the multiplication ordering follow the data flow ordering.
I have two np.ndarrays, data with shape (8000, 500) and sample with shape (1, 500).
What I am trying to achieve is to compute various types of metrics between every row in data and sample.
When using sklearn.metrics.pairwise.cosine_distances I was able to take advantage of NumPy's broadcasting by executing the following line:
x = cosine_distances(data, sample)
But when I tried to use the same procedure with scipy.spatial.distance.cosine I got the error
ValueError: Input vector should be 1-D.
I guess this is a broadcasting issue and I'm trying to find a way to get around it.
My ultimate goal is to iterate over all of the distances available in scipy.spatial.distance that can accept two vectors and apply them to the data and the sample.
How can I replicate the broadcasting that happens automatically in sklearn in my scipy version of the code?
OK, looking at the docs, http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_distances.html
With (8000, 500) and (1, 500) inputs ((samples, features)), you should get back an (8000, 1) result ((samples1, samples2)).
I wouldn't describe that as broadcasting. It's more like a dot product, which performs some sort of calculation (a norm) over the features (the 500-sized dimension), reducing them down to one value. It's more like np.dot(data, sample.T) in its handling of dimensions.
scipy.spatial.distance.cosine (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html) "Computes the Cosine distance between 1-D arrays", so it behaves more like
for row in data:
    for s in sample:
        d = cosine(row, s)
or since sample has only one row
distances = np.array([cosine(row, sample[0]) for row in data])
In other words, the sklearn version does the pairwise iteration (maybe in compiled code), while the scipy.spatial one just evaluates the distance for one pair.
pairwise.cosine_similarity does
# K(X, Y) = <X, Y> / (||X||*||Y||)
K = safe_sparse_dot(X_normalized, Y_normalized.T, dense_output=dense_output)
That's the dot-like behavior I mentioned earlier, but with the normalization added.
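For completeness, a hedged sketch of the same normalized-dot trick in plain NumPy, reproducing cosine_distances(data, sample) without a per-row loop (shapes follow the question):
import numpy as np

data = np.random.rand(8000, 500)
sample = np.random.rand(1, 500)

data_norm = data / np.linalg.norm(data, axis=1, keepdims=True)
sample_norm = sample / np.linalg.norm(sample, axis=1, keepdims=True)

cosine_similarity = data_norm @ sample_norm.T    # (8000, 1)
cosine_distance = 1.0 - cosine_similarity        # matches sklearn's cosine_distances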
I have a tensor in the shape (n_samples, n_steps, n_features). I want to decompose this into a tensor of shape (n_samples, n_components).
I need a method of decomposition that has a .fit(...) method so that I can apply the same decomposition to a new batch of samples. I have been looking at Tucker decomposition and PARAFAC decomposition, but neither has the crucial .fit(...) and .transform(...) functionality. (Or at least I think they don't?)
I could use PCA and train it on a representative sample and then call .transform(...) on the remaining samples, but I would rather have some sort of tensor decomposition that can handle all of the samples at once, so as to get a better idea of the differences between each sample.
This is what I mean by "tensor":
In fact tensors are merely a generalisation of scalars and vectors; a scalar is a zero rank tensor, and a vector is a first rank tensor. The rank (or order) of a tensor is defined by the number of directions (and hence the dimensionality of the array) required to describe it.
If you have any questions, please ask, I'll try to clarify my problem if needed.
EDIT: The best solution would be some type of kernel but I have yet to find a kernel that can deal with n-rank Tensors and not just 2D data
You can do this using the development (master) version of TensorLy. Specifically, you can use the new partial_tucker function (it is not yet updated in the documentation...).
Note that the following solution preserves the structure of the tensor, i.e. a tensor of shape (n_samples, n_steps, n_features) is decomposed into a (smaller) tensor of shape (n_samples, n_components_1, n_components_2).
Code
Short answer: this is a very basic class that does what you want (and it would work on tensors of arbitrary order).
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker

class TensorPCA:
    def __init__(self, ranks, modes):
        self.ranks = ranks
        self.modes = modes

    def fit(self, tensor):
        # Decompose along the requested modes only, leaving the other modes untouched
        self.core, self.factors = partial_tucker(tensor, modes=self.modes, ranks=self.ranks)
        return self

    def transform(self, tensor):
        # Project a new tensor onto the subspace spanned by the fitted factors
        return tl.tenalg.multi_mode_dot(tensor, self.factors, modes=self.modes, transpose=True)
Usage
Given an input tensor, you can use the previous class by first instantiating it with the desired ranks (size of the core tensor) and modes on which to perform the decomposition (in your 3D case, 1 and 2 since indexing starts at zero):
tpca = TensorPCA(ranks=[4, 5], modes=[1, 2])
tpca.fit(tensor)
Given a new tensor originally called new_tensor, you can project it using the transform method:
tpca.transform(new_tensor)
Explanation
Let's go through the code with an example: first let's import the necessary bits:
import numpy as np
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker
We then generate a random tensor:
tensor = np.random.random((10, 11, 12))
The next step is to decompose it along its second and third dimensions, or modes (as the first dimension corresponds to the samples):
core, factors = partial_tucker(tensor, modes=[1, 2], ranks=[4, 5])
The core corresponds to the transformed input tensor while factors is a list of two projection matrices, one for the second mode and one for the third mode. Given a new tensor, you can project it to the same subspace (the transform method) by projecting each of its last two dimensions:
tl.tenalg.multi_mode_dot(tensor, factors, modes=[1, 2], transpose=True)
The transposition here is equivalent to an inverse since the factors are orthogonal.
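As a quick sanity check under the same example, the shapes follow the requested ranks, and projecting the original tensor onto the factors should recover the core:
print(core.shape)                     # (10, 4, 5)
print([f.shape for f in factors])     # [(11, 4), (12, 5)]

projected = tl.tenalg.multi_mode_dot(tensor, factors, modes=[1, 2], transpose=True)
print(np.allclose(projected, core))   # True, since the factors are orthogonal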
Finally, a note on the terminology: in general, even though it is sometimes done, it is probably best not to use order and rank of a tensor interchangeably. The order of a tensor is simply its number of dimensions, while the rank of a tensor is usually a much more complicated notion, which you could think of as a generalization of the notion of matrix rank.
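As an illustration of the distinction (using the random (10, 11, 12) tensor from above):
print(tensor.ndim)   # 3 -> the order of the tensor (its number of dimensions)
# The rank, in contrast, is the minimal number of rank-1 terms whose sum equals the
# tensor; there is no simple attribute for it and computing it is NP-hard in general.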
I am using Python 3.23 and I want to multiply a sparse VECTOR with a dense MATRIX. The idea of first unfolding the sparse vector into a dense one and then multiplying is of course silly from any standpoint except for memory management up until the actual unfolding; with all the zeros in there it would be more expensive.
Also, does anyone know of a good way for SciPy to keep one-dimensional matrices in sparse mode? The only one (admittedly) I have used is the classical notation of three vectors (x, y, value), so I have had to use np.ones(len(...)) to get it to work.
Well.. comments welcome!
Store the vector using the Scipy sparse matrix classes:
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.random.rand(1000) > 0.99).T
print(x.shape)  # (1000, 1)
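Building on that, a hedged sketch of the actual sparse-vector-times-dense-matrix product, assuming an illustrative dense matrix A of shape (1000, 500):
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.random.rand(1000) > 0.99).astype(float).T   # sparse (1000, 1) column vector
A = np.random.rand(1000, 500)                                 # dense matrix

result = x.T.dot(A)      # (1, 500); only the non-zero entries of x participate
print(result.shape)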