I need to build a custom categorical cross-entropy loss function in which I compare y_true against Q*y_pred instead of just y_pred, where Q is a matrix.
The problem is that the batch size is not equal to 1, so the dimensions no longer match.
How can I build a categorical cross-entropy loss function that works with batch_size=200?
For example, this is the custom categorical cross-entropy loss function that works correctly, but only for batch_size = 1.
I have 3 classes, so the shape of y_pred is (batch_size, 3, 1) and the shape of Q is (3, 3).
I also tried passing a multidimensional numpy array with shape (batch_size, 3, 3), but it did not work.
Q = np.matrix([[0, 0.7, 0.2], [0, 0, 0.8], [1, 0.3, 0]])

def alpha_loss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, K.dot(tf.convert_to_tensor(Q, dtype=tf.float32), K.reshape(y_pred, (3, 1))))
Since you are using the TensorFlow backend, this may work:
Q = np.matrix([[0, 0.7, 0.2], [0, 0, 0.8], [1, 0.3, 0]])

def alpha_loss(y_true, y_pred):
    # Edit: from the comments below it appears that y_pred has shape (batch_size, 3), so reshape it to (batch_size, 3, 1)
    y_pred = tf.expand_dims(y_pred, axis=-1)
    q_tf = tf.convert_to_tensor(Q, dtype=tf.float32)
    # Tile Q from shape (3, 3) to (batch_size, 3, 3)
    q_expanded = tf.tile(tf.expand_dims(q_tf, axis=0), multiples=[tf.shape(y_pred)[0], 1, 1])
    # Batched matrix multiplication of Q and y_pred gives a tensor of shape (batch_size, 3, 1)
    qy_pred = tf.matmul(q_expanded, y_pred)
    return K.categorical_crossentropy(y_true, qy_pred)
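The core trick is turning the single (3, 3) matrix Q into a batch of matrices so that tf.matmul performs one (3, 3) x (3, 1) product per sample. A minimal, standalone shape check of that step (my own sketch with random data, not from the original answer):

import numpy as np
import tensorflow as tf

batch_size = 200
Q = np.array([[0, 0.7, 0.2], [0, 0, 0.8], [1, 0.3, 0]], dtype=np.float32)
y_pred = tf.random.uniform((batch_size, 3, 1))                           # batched predictions

q_tf = tf.convert_to_tensor(Q)                                            # (3, 3)
q_expanded = tf.tile(tf.expand_dims(q_tf, axis=0), [batch_size, 1, 1])    # (200, 3, 3)
qy_pred = tf.matmul(q_expanded, y_pred)                                   # (200, 3, 1)
print(qy_pred.shape)                                                      # (200, 3, 1)

In recent TensorFlow versions, tf.matmul also broadcasts leading batch dimensions, so tf.matmul(q_tf, y_pred) with q_tf of shape (3, 3) would give the same (200, 3, 1) result without the explicit tf.tile.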
I am trying to implement a sigmoid function using numpy, to use in logistic regression for image classification. The datasets contain 209 training and 50 test images, with 12288 pixels per image. Each RGB image is represented as (Width, Height, 3 RGB channels).
The original datasets are of shape (209, 64, 64, 3) for training and (50, 64, 64, 3) for testing. I reshape them into numpy arrays of shape (209, 12288) and (50, 12288) respectively, in which each row represents an image. In the sigmoid function shown below, X is the training set, w are the weights, b is the bias, and Y_hat is the returned vector of probabilities:
ŷ = σ(Xw + b)
how is it possible to calculate the product of a matrix of shape (209, 12288) with a vector of shape (209, 1)? My approach is the following; first, preprocessing the data:
# Reshape into (209, 12288) and normalize to unit range
X_flat = X_train.reshape(X_train.shape[0], -1)
X_train = (X_flat - np.min(X_flat)) / (np.max(X_flat) - np.min(X_flat))
The Sigmoid:
# Sigmoid function, σ(x) = 1 / (1 + e^(-x))
def sigmoid(x):
    """
    Computes the sigmoid of x

    Arguments:
    x -- numpy array

    Returns:
    s -- sigmoid(x), scalar or numpy array of any size
    """
    s = 1 / (1 + np.exp(-x))
    return s
I try to calculate Y_hat using:
# Forward prop: computes loss from (X, Y)
Y_hat = sigmoid(X @ w + b)  # returns a col. vector of probabilities
This fails due to the mismatch in dimensions between the training set X (209, 12288) and the weights column vector w (209, 1). I tried transposing and some other approaches, for example:
Y_hat = sigmoid(np.matmul(X.T,w) + b)
but I am not sure this is analogous to what is needed; it produces a Y_hat of shape (12288, 1), which should rather be (209, 1).
So to summarize: how is it possible to calculate Y_hat given the shapes X_train (209, 12288) and w (209, 1)? Is my mistake in the vector multiplication part, or in reshaping into (209, 12288) at the preprocessing step?
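For reference, the standard shape rule for ŷ = σ(Xw + b) is that w must have as many rows as X has columns. A minimal numpy sketch of that rule (my own illustration; the (12288, 1) weight shape is an assumption on my part, not something stated above):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

m, n = 209, 12288
X = np.random.rand(m, n)      # one flattened image per row
w = np.random.rand(n, 1)      # hypothetical weights, one per pixel/feature
b = 0.0

Y_hat = sigmoid(X @ w + b)    # (209, 12288) @ (12288, 1) -> (209, 1)
print(Y_hat.shape)            # (209, 1)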
I am looking at an implementation of RCNN for text classification using PyTorch. Full Code. There are two points where the dimensions of tensors are permuted using the permute function. The first is after the LSTM layer and before tanh. The second is after a linear layer and before a max pooling layer.
Could you please explain why the permutation is necessary or useful?
Relevant Code
def forward(self, x):
    # x.shape = (seq_len, batch_size)
    embedded_sent = self.embeddings(x)
    # embedded_sent.shape = (seq_len, batch_size, embed_size)
    lstm_out, (h_n, c_n) = self.lstm(embedded_sent)
    # lstm_out.shape = (seq_len, batch_size, 2 * hidden_size)
    input_features = torch.cat([lstm_out, embedded_sent], 2).permute(1, 0, 2)
    # input_features.shape = (batch_size, seq_len, embed_size + 2*hidden_size)
    linear_output = self.tanh(
        self.W(input_features)
    )
    # linear_output.shape = (batch_size, seq_len, hidden_size_linear)
    linear_output = linear_output.permute(0, 2, 1)  # Reshaping for max_pool
    max_out_features = F.max_pool1d(linear_output, linear_output.shape[2]).squeeze(2)
    # max_out_features.shape = (batch_size, hidden_size_linear)
    max_out_features = self.dropout(max_out_features)
    final_out = self.fc(max_out_features)
    return self.softmax(final_out)
Similar Code in other Repositories
Similar implementations of RCNN use permute or transpose. Here are examples:
https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/models/RCNN.py
https://github.com/jungwhank/rcnn-text-classification-pytorch/blob/master/model.py
What the permute function does is rearrange the dimensions of the original tensor into the ordering you specify. Note that permute is different from reshape: permute moves whole dimensions around, so the elements follow the axis ordering you provide, whereas reshape simply re-reads the same elements into a new shape.
Example code:
import torch
var = torch.randn(2, 4)
pe_var = var.permute(1, 0)
re_var = torch.reshape(var, (4, 2))
print("Original size:\n{}\nOriginal var:\n{}\n".format(var.size(), var) +
"Permute size:\n{}\nPermute var:\n{}\n".format(pe_var.size(), pe_var) +
"Reshape size:\n{}\nReshape var:\n{}\n".format(re_var.size(), re_var))
Outputs:
Original size:
torch.Size([2, 4])
Original var:
tensor([[ 0.8250, -0.1984, 0.5567, -0.7123],
[-1.0503, 0.0470, -1.9473, 0.9925]])
Permute size:
torch.Size([4, 2])
Permute var:
tensor([[ 0.8250, -1.0503],
[-0.1984, 0.0470],
[ 0.5567, -1.9473],
[-0.7123, 0.9925]])
Reshape size:
torch.Size([4, 2])
Reshape var:
tensor([[ 0.8250, -0.1984],
[ 0.5567, -0.7123],
[-1.0503, 0.0470],
[-1.9473, 0.9925]])
With the role of permute in mind, we can see that the first permute reorders the concatenated tensor so it fits the input format of self.W, i.e. with batch as the first dimension. The second permute does something similar, because we want to max-pool linear_output along the sequence dimension and F.max_pool1d pools along the last dimension.
I am adding this answer to provide additional PyTorch-specific details.
It is necessary to use permute between nn.LSTM and nn.Linear because the output shape of LSTM does not correspond to the expected input shape of Linear.
nn.LSTM outputs output, (h_n, c_n). The output tensor has shape (seq_len, batch, num_directions * hidden_size) (see the nn.LSTM documentation). nn.Linear expects an input tensor of shape (N, *, H), where N is the batch size and H is the number of input features (see the nn.Linear documentation).
It is necessary to use permute between nn.Linear and nn.MaxPool1d because the output of nn.Linear is (N, L, C), where N is the batch size, C is the number of features, and L is the sequence length, while nn.MaxPool1d expects an input tensor of shape (N, C, L) (see the nn.MaxPool1d documentation).
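A compact, self-contained shape check of both permutes (my own sketch, with made-up sizes): the (seq_len, batch, features) tensor is moved to batch-first for the linear layer, then to channels-first for the pooling:

import torch
import torch.nn.functional as F

seq_len, batch_size, feat, hidden_size_linear = 10, 4, 8, 6
lstm_style_out = torch.randn(seq_len, batch_size, feat)         # (L, N, feat), like lstm_out
W = torch.nn.Linear(feat, hidden_size_linear)

linear_output = torch.tanh(W(lstm_style_out.permute(1, 0, 2)))  # (N, L, C)
pooled = F.max_pool1d(linear_output.permute(0, 2, 1),           # (N, C, L): channels-first for pooling
                      kernel_size=seq_len).squeeze(2)           # (N, C)
print(pooled.shape)                                             # torch.Size([4, 6])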
I reviewed seven implementations of RCNN for text classification with PyTorch on GitHub and gitee and found that permute and transpose are the normal ways to convert the output of one layer to the input of a subsequent layer.
When I attempt to use the softmax cross entropy function, I get a ValueError saying
ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
The thing is that my layers are built in such a way that my logits should output only 1 value.
The shape of my logits is (5, 1) but I have no idea why there is a 5. The X for each instance is a 5x7 matrix
X = tf.placeholder(shape=(1, 5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1, 1), name='outputs', dtype=tf.int32)

hidden1 = tf.layers.dense(X, 150)
hidden2 = tf.layers.dense(hidden1, 50)
logits = tf.layers.dense(hidden2, 1)

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Edit
Check the comment, and try this code.
X = tf.placeholder(shape=(1, 5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1,), name='outputs', dtype=tf.int32)

flattened = tf.layers.flatten(X)           # shape (1, 35)
hidden1 = tf.layers.dense(flattened, 150)  # shape (1, 150)
hidden2 = tf.layers.dense(hidden1, 50)     # shape (1, 50)
logits = tf.layers.dense(hidden2, 1)       # shape (1, 1)

with tf.name_scope("loss"):
    # expects logits of shape (1, 1) against labels of shape (1,)
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Original
Let's think through what's going on here.
You create an X placeholder with the shape (5,7) (presumably (batch_size, data_size)).
You feed it into a hidden layer, which transforms the shape from (batch_size, data_size) to (batch_size, units) (units here is 150)
Likewise for the next two layers with hidden2 and logits, resulting in logits having shape (batch_size, 1), which is (5, 1) in this case
You're computing cross entropy between the labels and logits. The shape requirement here is for logits to have shape (batch_size, num_classes), where each value is the score for a particular class, and for labels to have shape (batch_size,), where each value is the class number of that particular sample. So this is where things go wrong for you: your y has shape (1, 1), while TF is expecting a tensor of shape (5,).
From what I can tell, you're trying to feed X forward as the data of a single sample (i.e. a (5, 7)-shaped matrix). If that is the case, you should have X take the shape (1, 5, 7) to signify to TensorFlow that X represents only one piece of data.
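To make that shape contract concrete, here is a tiny standalone check (my own sketch, not part of the original answer) of what tf.nn.sparse_softmax_cross_entropy_with_logits expects:

import tensorflow as tf

logits = tf.random.normal((5, 3))                      # (batch_size, num_classes)
labels = tf.constant([0, 2, 1, 1, 0], dtype=tf.int32)  # (batch_size,): one class id per sample
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(xentropy.shape)  # (5,): one loss value per sample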
The thing is that my layers are built in such a way that my logits should output only 1 value.
That's not true. When X is an (a, b) tensor and you apply tf.layers.dense(X, c), you multiply X by a (b, c) weight matrix (and a bias of size c is also added), so the output has shape (a, c).
In your case, since the first dimension of X is 5, it remains 5 for the logits as well, so your logits are of size 5 by construction. You are definitely doing something wrong somewhere, but it is difficult to say what without more information.
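A quick illustration of that shape behaviour (my own sketch, using tf.keras.layers.Dense so it runs on current TensorFlow, whereas the question uses the older tf.layers.dense):

import tensorflow as tf

dense = tf.keras.layers.Dense(1)    # c = 1 output unit
x2d = tf.random.normal((5, 7))      # (a, b)
x3d = tf.random.normal((1, 5, 7))   # the question's placeholder shape

print(dense(x2d).shape)             # (5, 1): a is kept, b -> c
print(dense(x3d).shape)             # (1, 5, 1): only the last axis is transformed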
I am currently working on a neural network that takes some inputs and returns 2 outputs. I used the 2 outputs in a regression problem, where they are the two coordinates X and Y.
My problem doesn't need the X and Y values themselves, but the angle they define, which is atan2(y, x).
I am trying to create a custom Keras metric and a loss function that applies an atan2 operation to the elements of the predicted tensor and the true tensor, so as to better train the network on my task.
The shape of the output tensor in the metric is [?, 2], and I want a function that loops through the tensor and applies atan2(tensor[itr, 1], tensor[itr, 0]) to get another tensor of angles.
I have tried using tf.split and tf.slice.
I don't want to convert it into a numpy array and back to TensorFlow, for performance reasons.
I have tried to get the shape of the tensors using tensor.get_shape().as_list() and iterating through it.
self.model.compile(loss="mean_absolute_error",
optimizer=tf.keras.optimizers.Adam(lr=0.01),
metrics=[vect2d_to_angle_metric])
# This is the function i want to work on
def vect2d_to_angle_metric(y_true, y_predicted):
print("y_true = ", y_true)
print("y_predicted = ", y_predicted)
print("y_true shape = ", y_true.shape())
print("y_predicted shape = ", y_predicted.shape())
The printout of the above function is:
y_true = Tensor("dense_2_target:0", shape=(?, ?), dtype=float32)
y_predicted = Tensor("dense_2/BiasAdd:0", shape=(?, 2), dtype=float32)
y_true shape = Tensor("metrics/vect2d_to_angle_metric/Shape:0", shape=(2,), dtype=int32)
y_predicted shape = Tensor("metrics/vect2d_to_angle_metric/Shape_1:0", shape=(2,), dtype=int32)
Python pseudo-code of the functionality I want to apply to the TensorFlow function:
def evaluate(self):
    mean_array = []
    for i in range(len(x_test)):
        inputs = x_test[i]
        prediction = self.model.getprediction(i)
        predicted_angle = np.arctan2(prediction[1], prediction[0])
        real_angle = np.arctan2(float(self.y_test[i][1]), float(self.y_test[i][0]))
        mean_array.append((abs(predicted_angle - real_angle) / real_angle) * 100)
I expect to slice the two components of the tensor, [:, 0] and [:, 1], apply tf.atan2() to both of them, and finally make another tensor out of the result, so as to follow with other calculations and pass the custom loss.
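One possible way to express this without a Python loop (my own hedged sketch, not from the question): tf.atan2 works elementwise on whole columns of the (?, 2) tensors, so the metric could be written roughly as:

import tensorflow as tf

def vect2d_to_angle_metric(y_true, y_predicted):
    # Slice the y and x components of every row and apply atan2 elementwise.
    true_angle = tf.atan2(y_true[:, 1], y_true[:, 0])            # shape (batch,)
    pred_angle = tf.atan2(y_predicted[:, 1], y_predicted[:, 0])  # shape (batch,)
    # Mean absolute angular difference over the batch (a simplification of the
    # percentage formula in the pseudo-code above).
    return tf.reduce_mean(tf.abs(pred_angle - true_angle))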
I have been using Zhixuhao's implementation of U-Net to try to do semantic binary segmentation and I modified it slightly using suggestions from this Stackoverflow answer:
Keras, binary segmentation, add weight to loss function
to be able to do a pixel-wise weighted binary cross-entropy, as they do in the original U-Net paper (see page 5), to force my U-Net to learn border pixels. Essentially the idea is to add a lambda layer that computes the pixel-wise weighted cross-entropy within the model itself and then use an "identity loss" that just copies the output of the network.
Here is what my input data looks like:
[images: input image, groundtruth, weights]
And here is what my code looks like:
def unet(pretrained_weights=None, input_size=(256, 256, 1)):
    inputs = Input(input_size)
    # [... U-Net architecture from Zhixuhao's model.py file ...]
    conv10 = Conv2D(1, 1, activation='sigmoid', name='true_output')(conv9)
    mask_weights = Input(input_size)
    true_masks = Input(input_size)
    loss1 = Lambda(weighted_binary_loss, output_shape=input_size, name='loss_output')([conv10, mask_weights, true_masks])
    model = Model(inputs=[inputs, mask_weights, true_masks], outputs=loss1)
    model.compile(optimizer=Adam(lr=1e-4), loss=identity_loss)
    return model
And added those two functions:
def weighted_binary_loss(X):
    y_pred, weights, y_true = X
    loss = keras.losses.binary_crossentropy(y_pred, y_true)
    loss = multiply([loss, weights])
    return loss

def identity_loss(y_true, y_pred):
    return y_pred
And finally here is the relevant part of my main.py:
input_size = (256,256,1)
target_size = (256,256)
myGene = trainGenerator(5,'data/moma/train','img','seg','wei',data_gen_args,save_to_dir=None,target_size=target_size)
model = unet(input_size=input_size)
model_checkpoint = ModelCheckpoint('unet_moma_weights.hdf5',monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(myGene,steps_per_epoch=300,epochs=5,callbacks=[model_checkpoint])
Now this code runs fine; I can train my U-Net and it does learn border pixels, but only if I resize my input images to 256*256. If I instead use input_size=(256,32,1) and target_size=(256,32) in main.py, which are the relevant dimensions for my data and would let me use bigger batch sizes, I get the following error:
ValueError: Operands could not be broadcast together with shapes (256, 32, 1) (256, 32)
for the line loss = multiply([loss, weights]). And indeed, the weights have one extra singleton dimension. I don't understand why the error is not raised when I use 256*256 inputs. I tried to make both inputs the same dimensions with either K.expand_dims() or Reshape(); the code then runs without error and the loss converges, but when I test my network on new inputs I get blank outputs (i.e. fully grey, white, or black images, or outputs that have nothing to do with my inputs).
So, this is a lot of text for the following question: why does multiply() raise an error in the 256*32 case but not in the 256*256 case, and why does creating/removing dimensions on the inputs not help?
Thanks!
ps: In order to get the network to output the actual prediction instead of the pixel-wise loss after training, I remove the loss layer and the two extra input layers with the following code:
new_model = Model(inputs=model.inputs,outputs=model.get_layer("true_output").output)
new_model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy')
new_model.set_weights(model.get_weights())
This works fine (again, at least in the 256*256 case).
So for anyone who stumbles upon this question, here is how I implemented the loss function:
# Imports assumed by this snippet:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.ops import array_ops, math_ops

def pixelwise_weighted_binary_crossentropy(y_true, y_pred):
    '''
    Pixel-wise weighted binary cross-entropy loss.
    The code is adapted from the Keras TF backend
    (see their GitHub).

    Parameters
    ----------
    y_true : Tensor
        Stack of groundtruth segmentation masks + weight maps.
    y_pred : Tensor
        Predicted segmentation masks.

    Returns
    -------
    Tensor
        Pixel-wise weighted binary cross-entropy between inputs.
    '''
    try:
        # The weights are passed as part of the y_true tensor:
        [seg, weight] = tf.unstack(y_true, 2, axis=-1)
        seg = tf.expand_dims(seg, -1)
        weight = tf.expand_dims(weight, -1)
    except:
        pass

    # Turn the predictions back into logits, clipping away exact 0s and 1s:
    epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
    y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
    y_pred = tf.math.log(y_pred / (1 - y_pred))

    # Numerically stable sigmoid cross-entropy, as in the Keras/TF backend:
    zeros = array_ops.zeros_like(y_pred, dtype=y_pred.dtype)
    cond = (y_pred >= zeros)
    relu_logits = math_ops.select(cond, y_pred, zeros)
    neg_abs_logits = math_ops.select(cond, -y_pred, y_pred)
    entropy = math_ops.add(relu_logits - y_pred * seg, math_ops.log1p(math_ops.exp(neg_abs_logits)), name=None)

    # This is essentially the only part that is different from the Keras code:
    return K.mean(math_ops.multiply(weight, entropy), axis=-1)
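A short usage sketch (my own, with hypothetical variable names): because the weight map now travels inside y_true, the groundtruth mask and the weight map are stacked along the last axis before being fed to the model, which matches the tf.unstack(y_true, 2, axis=-1) call above:

import numpy as np

# Hypothetical batch: stack mask and weight map into a single target tensor.
seg_batch = np.random.randint(0, 2, size=(4, 256, 32, 1)).astype("float32")  # groundtruth masks
weight_batch = np.random.rand(4, 256, 32, 1).astype("float32")               # pixel-wise weight maps
y_true_batch = np.concatenate([seg_batch, weight_batch], axis=-1)            # (4, 256, 32, 2)

# Assuming `model` is the plain U-Net (single image input, single sigmoid output),
# without the extra weight/mask Input layers used in the Lambda-layer approach:
model.compile(optimizer=Adam(lr=1e-4), loss=pixelwise_weighted_binary_crossentropy)
# model.fit(image_batch, y_true_batch, ...)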