Probably a simple question, hopefully with a simple solution:
I am given a (sparse) 1D boolean tensor of size [1,N].
I would like to produce a 2D tensor out of it, of size [N,N], containing islands induced by the 1D tensor. This is easiest to see in the following image example, where the top row is the 1D boolean tensor and the matrix below it is the resulting matrix:
Given a mask input:
>>> x = torch.tensor([0,0,0,1,0,0,0,0,1,0,0])
You can compute the block sizes by applying torch.diff to the positions of the nonzero entries, padded with 0 at the start and len(x) at the end:
>>> index = x.nonzero()[:, 0].diff(prepend=torch.tensor([0]), append=torch.tensor([len(x)]))
tensor([3, 5, 3])
Then use torch.block_diag to build the block-diagonal matrix:
>>> torch.block_diag(*[torch.ones(i, i) for i in index.tolist()])
tensor([[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.]])
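As an alternative to building the blocks in a Python loop, each position can be labeled with the id of the island it belongs to (a running cumulative sum of the mask), and the ids compared pairwise with broadcasting. A minimal sketch, written in NumPy for portability; `islands` is a hypothetical helper name, and the same idea should carry over to tensors via torch.cumsum and broadcasting. It assumes, as in the example, that every nonzero entry starts a new island:

```python
import numpy as np

def islands(mask):
    """Return an N x N float matrix with 1 where positions i and j share an island.

    Assumption: every nonzero entry of `mask` starts a new island.
    """
    seg = np.cumsum(np.asarray(mask) != 0)  # island id of each position
    # Broadcasting (N, 1) against (1, N) compares every pair of ids.
    return (seg[:, None] == seg[None, :]).astype(float)

x = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
H = islands(x)
print(H.sum(axis=1))  # row sums equal the size of each row's block
```

This avoids materializing one small matrix per block, at the cost of an O(N^2) comparison, which is unavoidable anyway since the output itself is N x N.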
How can I make a matrix H from two smaller matrices H_0 and H_1 as shown in the attached image? The final dimension is finite.
Here is an example.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
b = np.ones(shape=(3,3))
a_r = a.reshape((-1,))
b_r = b.reshape((-1,))
b_r_ = np.diag(b_r, k=1)  # 10x10 with ones on the first superdiagonal
b_r_ = b_r_ + b_r_.transpose()  # mirror the ones onto the subdiagonal
for i in range(b_r_.shape[0]):
    if i < len(a_r):
        b_r_[i][i] = a_r[i]
    else:
        b_r_[i][i] = 0
Output:
array([[1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 2., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 3., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 4., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 5., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 6., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
Concern:
I think this is not the most computationally efficient way, but I think it works:
import numpy as np
H = np.kron(np.eye(r, dtype=int), H_0) + np.kron(np.diag(np.ones(r - 1), 1), H_1) + np.kron(np.diag(np.ones(r - 1), -1), H_1.conj().T)  # r = number of repetitions
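A self-contained sketch of the same Kronecker-product construction, wrapped in a hypothetical helper `block_tridiagonal` and run on made-up 2x2 blocks for H_0 and H_1. It assumes the intended layout is H_0 on the diagonal blocks, H_1 on the block superdiagonal and its conjugate transpose on the block subdiagonal; np.eye(r, k=1) is used as an equivalent of np.diag(np.ones(r - 1), 1):

```python
import numpy as np

def block_tridiagonal(H_0, H_1, r):
    """Build an (r*n) x (r*n) matrix: H_0 on the diagonal blocks,
    H_1 on the first block superdiagonal, H_1^dagger on the first block subdiagonal."""
    upper = np.eye(r, k=1)   # ones on the first superdiagonal
    lower = np.eye(r, k=-1)  # ones on the first subdiagonal
    return (np.kron(np.eye(r), H_0)
            + np.kron(upper, H_1)
            + np.kron(lower, H_1.conj().T))

# Hypothetical blocks, chosen only for illustration.
H_0 = np.array([[1.0, 2.0], [2.0, 1.0]])
H_1 = np.array([[0.0, 1j], [0.5, 0.0]])
H = block_tridiagonal(H_0, H_1, r=3)
print(H.shape)                      # (6, 6)
print(np.allclose(H, H.conj().T))   # Hermitian because H_0 is Hermitian
```

For large r most of H is zero; the same three-term sum could be built with sparse Kronecker products if memory becomes the bottleneck.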
Is there any Python package that can build a tensor by computing outer products of matrix columns,
like the "ktensor" function from MATLAB's Tensor Toolbox?
https://www.tensortoolbox.org/ktensor_doc.html
You can do this using the parafac function from the tensorly package. From their documentation:
import numpy as np
import tensorly as tl
tensor = tl.tensor([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
[ 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
[ 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
[ 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
from tensorly.decomposition import parafac
factors = parafac(tensor, rank=2)
factors now holds the decomposition: the list of factor matrices (in more recent TensorLy versions, bundled together with a vector of weights).
As mentioned in the previous answer, you can use TensorLy for manipulating tensors, including tensors in CP form ("Kruskal tensors"). The tensorly.kruskal_tensor module handles these particular operations.
If you want to reconstruct a Kruskal tensor from the factors (and optional vector of weights), you can use kruskal_to_tensor from TensorLy:
full_tensor = kruskal_to_tensor(kruskal_tensor)
or, equivalently:
full_tensor = kruskal_to_tensor((weights, factors))
If your decomposition has rank R, weights is a vector of length R and factors is a list of factor matrices with R columns each.
You can also directly use the underlying functions such as outer-product or Khatri-Rao.
As mentioned in the previous answer, if you have a full tensor, you can apply CP (parafac) decomposition to approximate the factors.
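For reference, the reconstruction that kruskal_to_tensor (and MATLAB's ktensor) performs is a weighted sum of outer products of the matching factor columns. A minimal NumPy sketch, using a hypothetical helper `cp_to_full` and random factors made up for illustration; it assumes each factor matrix has R columns:

```python
import numpy as np

def cp_to_full(weights, factors):
    """Rebuild a full tensor from CP (Kruskal) form.

    weights: shape (R,); factors: list of matrices, the n-th of shape (I_n, R).
    Returns sum_r weights[r] * outer(factors[0][:, r], factors[1][:, r], ...).
    """
    full = np.zeros(tuple(f.shape[0] for f in factors))
    for r, w in enumerate(weights):
        component = factors[0][:, r]
        for f in factors[1:]:
            # np.multiply.outer appends one axis per additional factor
            component = np.multiply.outer(component, f[:, r])
        full += w * component
    return full

rng = np.random.default_rng(0)
R = 2
A, B, C = rng.random((4, R)), rng.random((3, R)), rng.random((5, R))
w = np.ones(R)
T = cp_to_full(w, [A, B, C])
print(T.shape)  # (4, 3, 5)
# Cross-check against an einsum over the shared rank index r
print(np.allclose(T, np.einsum("r,ir,jr,kr->ijk", w, A, B, C)))
```

The loop form works for any number of modes; the einsum line is a compact equivalent for the 3-way case.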
In pandas or numpy, I can do the following to get one-hot vectors:
>>> import numpy as np
>>> import pandas as pd
>>> x = [0,2,1,4,3]
>>> pd.get_dummies(x).values
array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 1., 0.]])
>>> np.eye(len(set(x)))[x]
array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 1., 0.]])
From text, with gensim, I can do:
>>> from gensim.corpora import Dictionary
>>> sent1 = 'this is a foo bar sentence .'.split()
>>> sent2 = 'this is another foo bar sentence .'.split()
>>> texts = [sent1, sent2]
>>> vocab = Dictionary(texts)
>>> texts_idx = [[vocab.token2id[word] for word in sent] for sent in texts]
>>> texts_idx
[[3, 4, 0, 6, 1, 2, 5], [3, 4, 7, 6, 1, 2, 5]]
Then I have to apply the same pd.get_dummies or np.eye to get the one-hot vectors, but with pd.get_dummies one dimension is missing: I have 8 unique words, but the one-hot vectors have length 7:
>>> [pd.get_dummies(sent).values for sent in texts_idx]
[array([[ 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0.]]), array([[ 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 1., 0.],
[ 1., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0.]])]
It seems to build the one-hot vectors for each sentence separately as it iterates, instead of using the global vocabulary.
Using np.eye, I do get the right vectors:
>>> [np.eye(len(vocab))[sent] for sent in texts_idx]
[array([[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.]]), array([[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.]])]
Also, I currently need several steps: building a gensim.corpora.Dictionary, converting the words to their ids, and then getting the one-hot vectors.
Are there other ways to achieve the same one-hot vector from texts?
There are various packages that will do all the steps in a single function such as http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html.
Alternatively, if you already have your vocabulary and the text indexes for each sentence, you can create a one-hot encoding by preallocating an array and using fancy indexing. In the following, text_idx is a list of integers and vocab is a list mapping integer indexes to words.
import numpy as np
vocab_size = len(vocab)
text_length = len(text_idx)
one_hot = np.zeros((vocab_size, text_length))  # one column per token
one_hot[text_idx, np.arange(text_length)] = 1  # column t gets a 1 in row text_idx[t]
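Applied to the word ids from the question (8 words in the global vocabulary), the preallocation approach gives vectors of length 8 for every sentence. A small sketch wrapping it in a hypothetical `one_hot` helper and checking it against the np.eye indexing trick; note the result has one column per token, so it is the transpose of the np.eye version:

```python
import numpy as np

def one_hot(text_idx, vocab_size):
    """Return a (vocab_size, len(text_idx)) matrix whose column t is one-hot for text_idx[t]."""
    text_length = len(text_idx)
    out = np.zeros((vocab_size, text_length))
    out[text_idx, np.arange(text_length)] = 1
    return out

# Word ids taken from the question's gensim output.
texts_idx = [[3, 4, 0, 6, 1, 2, 5], [3, 4, 7, 6, 1, 2, 5]]
encoded = [one_hot(sent, vocab_size=8) for sent in texts_idx]
print(encoded[1].shape)  # (8, 7)
print(np.array_equal(encoded[1].T, np.eye(8)[texts_idx[1]]))
```

Because vocab_size comes from the global vocabulary rather than from the words seen in one sentence, the missing-dimension problem seen with pd.get_dummies does not occur.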
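To create a one-hot vector, you first need a vocabulary of unique words from the text:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
vocab = sorted(set(vocab))  # unique words; LabelEncoder also assigns ids in sorted order
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(vocab)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
one_hot_encoder = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2
one_hot = one_hot_encoder.fit_transform(integer_encoded)  # one row per word
doc = "dog"
index = label_encoder.transform([doc])[0]
one_hot_vector = one_hot[index]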
The 7th value is the "." (dot) in your sentences: it is separated from the previous word by a space, so split() counts it as a word!
I want to append a vector to a matrix in Python. I tried the append and concatenate methods, but I didn't get the result I wanted. I was previously working with MATLAB, where I used this:
m = zeros(10, 4) % define my matrix, 10x4
v = ones(10, 1) % my vector, 10x1
c = [m,v] % so simple! the result is: 10x5 (the vector added as the last column)
How can I do that in python using numpy?
You're looking for np.r_ and np.c_. (Think "row stack" and "column stack", which are also functions, but with MATLAB-style range generation.)
Also see np.concatenate, np.vstack, np.hstack, np.dstack, np.row_stack, np.column_stack etc.
For example:
import numpy as np
m = np.zeros((10, 4))
v = np.ones((10, 1))
c = np.c_[m, v]
Yields:
array([[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.]])
This is also equivalent to np.hstack([m, v]) or np.column_stack([m, v])
If you're not coming from matlab, hstack and column_stack probably seem much more readable and descriptive. (And they're arguably better in this case for that reason.)
However, np.c_ and np.r_ have additional functionality that folks coming from matlab tend to expect. For example:
In [7]: np.r_[1:5, 2]
Out[7]: array([1, 2, 3, 4, 2])
Or:
In [8]: np.c_[m, 0:10]
Out[8]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 2.],
[ 0., 0., 0., 0., 3.],
[ 0., 0., 0., 0., 4.],
[ 0., 0., 0., 0., 5.],
[ 0., 0., 0., 0., 6.],
[ 0., 0., 0., 0., 7.],
[ 0., 0., 0., 0., 8.],
[ 0., 0., 0., 0., 9.]])
At any rate, for matlab folks, it's handy to know about np.r_ and np.c_ in addition to vstack, hstack, etc.
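Since the question mentions trying append and concatenate: both work once the axis is given explicitly (without an axis, np.append flattens its inputs into a 1-D array). A short sketch comparing the equivalent calls on the arrays from the example; v1 is an added, hypothetical 1-D vector to show where the shape of the vector matters:

```python
import numpy as np

m = np.zeros((10, 4))
v = np.ones((10, 1))   # 2-D column vector, as in the MATLAB example

# Each of these appends v as the last column of m.
c1 = np.c_[m, v]
c2 = np.hstack([m, v])
c3 = np.column_stack([m, v])
c4 = np.concatenate([m, v], axis=1)
c5 = np.append(m, v, axis=1)     # without axis=1 the result would be flattened
print(all(np.array_equal(c1, c) for c in (c2, c3, c4, c5)))

# With a 1-D vector, hstack/concatenate need it reshaped into a column first;
# np.c_ and column_stack turn 1-D inputs into columns automatically.
v1 = np.ones(10)
print(np.column_stack([m, v1]).shape)      # (10, 5)
print(np.hstack([m, v1[:, None]]).shape)   # (10, 5)
```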
In numpy it is similar:
>>> m=np.zeros((10,4))
>>> m
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> v=np.ones((10,1))
>>> v
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
>>> np.c_[m,v]
array([[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.]])