I'm trying to build a model in TensorFlow that takes a number of points with n dimensions and finds a set of hyperplanes that form a hull around one set of points while including as few points from another set as possible.
To do this I would input a matrix of size [np, n], with np denoting the number of points and n the number of dimensions each point is defined in. Like:
[[ 0.04370488 -0.09842589 -0.01787493]
[ 0.1415032 0.05342565 0.63025913]
[-0.84298323 -0.91433908 -0.9716289 ]
[ 0.19159608 -0.68356499 0.55441537]
[ 0.34797942 0.55592542 -0.74667198]]
As a last layer I would like to have n+1 hyperplanes that are each defined by two vectors, one of them pointing to a point on the hyperplane, the other being the normal vector of the hyperplane. In three dimensions this would give me 4 hyperplanes each defined by 2 vectors with 3 dimensions. So I guess this would be a 4x2x3 matrix or 24 values. Like:
[[0, 0, 0] [1, 0, 0]]
[[0, 0, 0] [0, 1, 0]]
[[0, 0, 0] [0, 0, 1]]
[[5, 5, 5] [-1, -1, -1]]
I was thinking this layer could either be the output of the model, OR
be used to calculate whether a point is on the inside or outside of the hull, which could just be encoded as 0 or 1.
For now I have a barebones model where I managed to input a matrix of the correct size, but I couldn't yet manage to write a loss function or custom layer that makes it possible to evaluate whether a point is inside or outside of the hull.
The ys array is an (800, 1) array containing a label for each point, saying whether it should end up inside or outside the hull.
import numpy as np
from tensorflow import keras

def in_convex_hull(point, plane_point, plane_normal):
    # A point is on the inner side of the plane if the signed distance along the
    # normal is non-negative (assuming the normals point towards the inside of
    # the hull, as in the example planes above).
    if np.dot(plane_normal, point - plane_point) >= 0:
        return True
    return False

def custom_loss(actual, pred):
    # TODO: should penalize points that end up on the wrong side of the hull
    loss = 0
    return loss

def custom_layer():
    # TODO: should turn the 24 outputs into 4 (plane point, plane normal) pairs
    return

model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[800, 3])])
model.add(keras.layers.Dense(1000))
model.add(keras.layers.Dense(24))
model.compile(optimizer='Adam', loss='BinaryCrossentropy', metrics=["accuracy"])

xs = np.array([np.random.rand(800, 3) for i in range(1)])
ys = np.random.randint(0, 2, size=(1, 800, 1))  # random 0/1 placeholder labels
history = model.fit(xs, ys, epochs=10, batch_size=1, verbose=1)
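For reference, this is roughly how I imagine the differentiable in/out test could look once the 24 outputs are reshaped into 4 plane points and 4 plane normals (just a sketch of the idea; the function and argument names are mine):

import tensorflow as tf

def soft_inside(points, plane_points, plane_normals, sharpness=10.0):
    # points:        (np, n) candidate points
    # plane_points:  (k, n)  one point on each of the k hyperplanes
    # plane_normals: (k, n)  inward-pointing normal of each hyperplane
    diffs = points[:, None, :] - plane_points[None, :, :]               # (np, k, n)
    signed = tf.reduce_sum(diffs * plane_normals[None, :, :], axis=-1)  # (np, k) signed distances
    # sigmoid turns "on the inner side of plane j" into a smooth 0..1 score;
    # the product over planes is close to 1 only for points inside all of them
    return tf.reduce_prod(tf.sigmoid(sharpness * signed), axis=-1)      # (np,)

The result could then be compared against the 0/1 labels with binary cross-entropy.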
Any pointers on how this setup could be achieved are greatly appreciated.
I was trying to understand what is going on with sklearn's jaccard_score.
These are the results I got:
1. jaccard_score([0, 1, 1], [1, 1, 1])
0.6666666666666666
2. jaccard_score([1, 1, 0], [1, 0, 0])
0.5
3. jaccard_score([1, 1, 0], [1, 0, 1])
0.3333333333333333
I understand that the formula is
intersection / (size of A + size of B - intersection)
I thought the last one should give me 0.2, because the overlap is 1 and the total number of entries is 6, resulting in 1/5. But I got 0.33333...
Can anyone explain how sklearn calculates jaccard_score?
Per sklearn's docs, the jaccard_score function "is used to compare set of predicted labels for a sample to the corresponding set of labels in y_true". If the attributes are binary, the computation is TP / (TP + FP + FN), based on the confusion matrix. Otherwise, the same computation is done using the confusion matrix for each attribute value / class label.
The above definition for binary attributes / classes can be reduced to the set definition as explained in the following.
Assume that there are three records r1, r2, and r3. The vectors [0, 1, 1] and [1, 1, 1] -- which are the true and predicted classes of the records -- can be mapped to the two sets {r2, r3} and {r1, r2, r3}, respectively. Here, each element in a vector represents whether the corresponding record exists in the set. The Jaccard similarity of the two sets is then the same as the similarity value defined for the two vectors.
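A quick check of the third example (a small standalone snippet):

import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([1, 1, 0])
y_pred = np.array([1, 0, 1])

# sklearn's score for the positive class
print(jaccard_score(y_true, y_pred))        # 0.3333...

# same value from the confusion-matrix definition TP / (TP + FP + FN)
tp = np.sum((y_true == 1) & (y_pred == 1))  # 1
fp = np.sum((y_true == 0) & (y_pred == 1))  # 1
fn = np.sum((y_true == 1) & (y_pred == 0))  # 1
print(tp / (tp + fp + fn))                  # 0.3333...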
I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]".
What does this expression mean?
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.
    Inputs:
    W: A numpy array of shape (D, C) containing weights.
    X: A numpy array of shape (N, D) containing a minibatch of data.
    y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
    reg: (float) regularization strength
    Returns a tuple of:
    loss as single float
    gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dS = softmax_output.copy()
    dS[range(num_train), list(y)] += -1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
This expression means: index an array softmax_output of shape (N, C), extracting from it only the values corresponding to the training labels y.
A two-dimensional numpy.array can be indexed with two lists containing appropriate values (i.e. they should not cause an index error).
range(num_train) creates an index for the first axis, which allows selecting specific values in each row with the second index, list(y). You can find this in the numpy documentation on indexing.
The first index, range(num_train), has a length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; then for each row it selects the target value via the corresponding value in the second part of the index, list(y).
Example:
softmax_output = np.array( # dummy values, not softmax
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
)
num_train = 4 # length of the array
y = [2, 1, 0, 2] # labels; values for indexing along the second axis
softmax_output[range(num_train), list(y)]
Out:
[3, 5, 7, 12]
So, it selects the third element from the first row, the second from the second row, etc. That's how it works.
(P.S. Or did I misunderstand you, and you are interested in the "why", not the "how"?)
The loss here is the averaged cross-entropy over the N training samples:
loss = -(1/N) * sum over i, j of ( y[i, j] * log(softmax_output[i, j]) )
Here, y[i, j] is 1 for the class the datapoint belongs to and 0 for all other classes, so we are only interested in the softmax output of the datapoint's own class. The above equation can therefore be rewritten as
loss = -(1/N) * sum over i of log(softmax_output[i, y[i]])
The following code then represents this equation:
loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
The code softmax_output[range(num_train), list(y)] is used to select the softmax outputs of the respective classes: range(num_train) represents all the training samples and list(y) represents their respective classes.
This indexing is nicely explained by Mikhail in his answer.
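As a small sanity check with made-up numbers, the fancy-indexed formula gives the same result as a plain per-sample loop:

import numpy as np

softmax_output = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.3, 0.6]])
y = np.array([0, 2])
num_train = 2

vectorized = -np.sum(np.log(softmax_output[range(num_train), list(y)])) / num_train
looped = -sum(np.log(softmax_output[i, y[i]]) for i in range(num_train)) / num_train
print(vectorized, looped)  # both are approximately 0.434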
In Attention Is All You Need, the authors implement a positional embedding (which adds information about where a word is in a sequence). For this, they use a sinusoidal embedding:
PE(pos,2i) = sin(pos/10000**(2*i/hidden_units))
PE(pos,2i+1) = cos(pos/10000**(2*i/hidden_units))
where pos is the position and i is the dimension. It must result in an embedding matrix of shape [max_length, embedding_size], i.e., given a position in a sequence, it returns the tensor of PE[position,:].
I found Kyubyong's implementation, but I do not fully understand it.
I tried to implement it in numpy the following way:
import numpy as np
import matplotlib.pyplot as plt

hidden_units = 100  # Dimension of embedding
vocab_size = 10     # Maximum sentence length
# Matrix of [[0, ..., 99], [0, ..., 99], ...]
i = np.tile(np.expand_dims(range(hidden_units), 0), [vocab_size, 1])
# Matrix of [[0, ..., 0], [1, ..., 1], ...]
pos = np.tile(np.expand_dims(range(vocab_size), 1), [1, hidden_units])
# Apply the intermediate functions
pos = np.multiply(pos, 1/10000.0)
i = np.multiply(i, 2.0/hidden_units)
matrix = np.power(pos, i)
# Apply the sine function to the odd-indexed columns
matrix[:, 1::2] = np.sin(matrix[:, 1::2])
# Apply the cosine function to the even-indexed columns
matrix[:, ::2] = np.cos(matrix[:, ::2])
# Plot
im = plt.imshow(matrix, cmap='hot', aspect='auto')
I don't understand how this matrix can give information on the position of the inputs. Could someone first tell me whether this is the right way to compute it, and second, what the rationale behind it is?
Thank you.
I found the answer in a pytorch implementation:
# keep dim 0 for padding token position encoding zero vector
position_enc = np.array([
    [pos / np.power(10000, 2*i/d_pos_vec) for i in range(d_pos_vec)]
    if pos != 0 else np.zeros(d_pos_vec)
    for pos in range(n_position)])

position_enc[1:, 0::2] = np.sin(position_enc[1:, 0::2])  # dim 2i
position_enc[1:, 1::2] = np.cos(position_enc[1:, 1::2])  # dim 2i+1
return torch.from_numpy(position_enc).type(torch.FloatTensor)
where d_pos_vec is the embedding dimension and n_position the max sequence length.
EDIT:
In the paper, the authors say that this representation of the embedding matrix allows "the model to extrapolate to sequence lengths longer than the ones encountered during training".
The only difference between two positions is the pos variable.
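For completeness, here is a minimal numpy sketch of the paper's formula (the function name and the assumption of an even d_model are mine) that produces the [max_length, embedding_size] matrix directly:

import numpy as np

def positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    pos = np.arange(max_len)[:, None]                # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model // 2)
    angles = pos / np.power(10000, 2 * i / d_model)  # (max_len, d_model // 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

print(positional_encoding(10, 100).shape)  # (10, 100)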
I am trying to build a toy Hidden Markov Model with 2 states and 3 possible observations using the "MultinomialHMM" module, part of the scikit-learn library. My problem is that the module accepts (and generates predictions from) models even when the observation probabilities for a state add up to more or less than 1. Example:
import numpy
from sklearn import hmm
startprob = numpy.array([0.5, 0.5])
transition_matrix = numpy.array([[0.5, 0.5], [0.5, 0.5]])
model = hmm.MultinomialHMM(2, startprob, transition_matrix)
model.emissionprob_ = numpy.array([[0, 0, 0.2], [0.6, 0.4, 0]])
Notice that the probabilities of the signals emitted by state 0 are [0,0,0.2] (which add up to 0.2). The module does not complain when asked to generate a sample of observations:
model.sample(10)
(array([1, 0, 0, 0, 0, 2, 1, 0, 0, 0], dtype=int64), array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0]))
I can also specify emission probabilities that sum up to more than 1, and the model generates predictions with no complaints.
Is this the desired behavior? Are the probabilities being normalized in some way? If so, how?
First of all, HMM is deprecated in sklearn. You need to check out https://github.com/hmmlearn/hmmlearn, which is Hidden Markov Models in Python with a scikit-learn-like API.
BTW, the problem you describe looks like a bug. When you set emissionprob_, _set_emissionprob is called. This tries to re-normalize by calling normalize(emissionprob):
if not np.alltrue(emissionprob):
    normalize(emissionprob)
However, this code has two problems:
1. it does not set the axis properly, and
2. it is not in-place, even though the documentation says it is.
So it should be modified as follows:
if not np.alltrue(emissionprob):
    normalize(emissionprob, 1)  # added axis term
and
def normalize(A, axis=None):
    A += EPS
    Asum = A.sum(axis)
    if axis and A.ndim > 1:
        # Make sure we don't divide by zero.
        Asum[Asum == 0] = 1
        shape = list(A.shape)
        shape[axis] = 1
        Asum.shape = shape
    A /= Asum  # this is true in-place; it was `return A / Asum` <<= here
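As a quick illustration of why the axis matters (a standalone numpy sketch, not the library code):

import numpy as np

emissionprob = np.array([[0.0, 0.0, 0.2],
                         [0.6, 0.4, 0.0]])

# Without an axis the sum is taken over all entries (1.2 here),
# so the rows still do not sum to 1 afterwards.
print(emissionprob / emissionprob.sum())

# With axis=1 each state's emission distribution is normalized separately:
# [[0.  0.  1. ]
#  [0.6 0.4 0. ]]
print(emissionprob / emissionprob.sum(axis=1, keepdims=True))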
I am trying to run a Fisher's LDA (1, 2) to reduce the number of features of a matrix.
Basically, correct me if I am wrong: given n samples classified into several classes, Fisher's LDA tries to find an axis such that projecting onto it maximizes the value J(w), which is the ratio of the total sample variance to the sum of the variances within the separate classes.
I think this can be used to find the most useful features for each class.
I have a matrix X of m features and n samples (m rows, n columns).
I have a sample classification y, i.e. an array of n labels, each one for each sample.
Based on y, I want to reduce the number of features to, for example, the 3 most representative features.
Using scikit-learn I tried it this way (following this documentation):
>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA(n_components=3)
>>> clf.fit_transform(X, y)
array([[ 4.],
[ 4.],
[ 8.],
[-4.],
[-4.],
[-8.]])
At this point I am a bit confused: how do I obtain the most representative features?
The features you are looking for are in clf.coef_ after you have fitted the classifier.
Note that n_components=3 doesn't make sense here, since X.shape[1] == 2, i.e. your feature space only has two dimensions; moreover, the LDA projection can have at most min(n_classes - 1, n_features) components, which is 1 for your two classes, as the single-column output of fit_transform shows.
You do not need to invoke fit_transform in order to obtain coef_; calling clf.fit(X, y) will suffice.
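A minimal sketch of that ranking, assuming the current import path sklearn.discriminant_analysis.LinearDiscriminantAnalysis (sklearn.lda.LDA is the old location):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = LinearDiscriminantAnalysis()
clf.fit(X, y)

# For two classes coef_ has shape (1, n_features);
# rank the features by the absolute value of their weight.
importance = np.abs(clf.coef_).ravel()
print(np.argsort(importance)[::-1])  # indices of the most discriminative features first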