I am making a deep learning network that finds several points in 3D space.
The input is a stack of grayscale 1024 x 1024 images (the number of images varies from 5 to 20), and the output is a 64 x 64 x 64 volume. Each output voxel is 0 or 1, but my dataset contains only 2000 ones, so it is hard to tell from the training loss whether my network is training well.
For example, if my network only output np.zeros((64, 64, 64)), the accuracy would still be 1 - 2000/(64 x 64 x 64) ≈ 99.2%.
So I would like to know which deep learning network I should choose for finding a very small number of positives in 3D space. The input size is (1024 x 1024 x #img) and the output size is (64 x 64 x 64). I am currently experimenting with a 2D U-Net-like net and a 3D U-Net-like net, with a ReLU-with-ceiling final activation.
I would appreciate any recommendations or references. Thank you very much.
U-Net-like networks seem to be a good idea. Your problem does not come from the network itself, but from the loss and metrics you are using.
Indeed, if you use binary cross-entropy as the loss and accuracy as the metric, your score will stay near 100% because your classes are so imbalanced.
I suggest using the Dice or Jaccard coefficient as the metric and/or the loss (in which case the loss is 1 - Dice coefficient), and computing it only on the classes of interest, not on the background.
Depending on the framework you are using, you should easily find an existing implementation of these metrics. Then modify the code so that the background is excluded from the calculation.
For example, in Python/TensorFlow (Keras backend), for your volumes:
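from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1):
    # Hard Dice for use as a metric (K.round is not differentiable)
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(K.round(y_pred))  # threshold probabilities at 0.5
    # One-hot encode: column 0 = background, column 1 = foreground
    y_true_f = K.one_hot(K.cast(y_true_f, 'int32'), 2)
    y_pred_f = K.one_hot(K.cast(y_pred_f, 'int32'), 2)
    # Drop column 0 (background) and sum over all voxels (axis 0)
    intersection = K.sum(y_true_f[:, 1:] * y_pred_f[:, 1:], axis=0)
    union = K.sum(y_true_f[:, 1:], axis=0) + K.sum(y_pred_f[:, 1:], axis=0)
    # One Dice value per foreground class, then averaged
    dice = K.mean((2. * intersection + smooth) / (union + smooth))
    return dice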
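To see why accuracy is misleading here and what a Dice-based loss looks like, here is a minimal NumPy sketch (not the answer's TensorFlow code). It computes a soft Dice coefficient on the foreground only, Dice = (2 * sum(y * p) + s) / (sum(y) + sum(p) + s), using raw probabilities instead of rounded predictions so that 1 - Dice could serve as a differentiable loss. The volume size and the 2000 positives mirror the question; their random positions are made up for illustration.

```python
import numpy as np

def soft_dice(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient computed on the foreground only.

    y_true: binary array (1 = point of interest, 0 = background)
    y_pred: predicted foreground probabilities in [0, 1], same shape
    """
    y_true = y_true.astype(np.float64).ravel()
    y_pred = y_pred.astype(np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    denom = np.sum(y_true) + np.sum(y_pred)
    return (2.0 * intersection + smooth) / (denom + smooth)

def dice_loss(y_true, y_pred, smooth=1.0):
    # The loss suggested in the answer: 1 - Dice coefficient
    return 1.0 - soft_dice(y_true, y_pred, smooth)

# Toy volume with the same size and number of positives as in the question
rng = np.random.default_rng(0)
y_true = np.zeros((64, 64, 64), dtype=np.uint8)
idx = rng.choice(y_true.size, size=2000, replace=False)
y_true.flat[idx] = 1

all_zeros = np.zeros_like(y_true, dtype=np.float64)
accuracy = np.mean((all_zeros > 0.5) == y_true)
print(f"accuracy of all-zero output: {accuracy:.4f}")                 # 0.9924
print(f"Dice of all-zero output:     {soft_dice(y_true, all_zeros):.4f}")  # 0.0005
print(f"Dice of perfect output:      {soft_dice(y_true, y_true):.4f}")     # 1.0000
```

The all-zero output that scores about 99.2% accuracy gets a Dice close to 0 (a loss close to 1), so the Dice-based number actually reflects whether the few positives are found.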
Related
I'm currently working on an image segmentation project with Keras.
I am using an implementation of U-Net.
I have 2 classes identified by pixel value: 0 = background, 1 = the object I'm looking for.
I have a single output with a sigmoid activation function.
I'm using binary cross-entropy as the loss function.
Here is the problem: my data set is very unbalanced, with approximately 1 white pixel for every 100 black pixels. From what I understood, binary cross-entropy is not well suited to an unbalanced data set.
So I tried to implement weighted cross-entropy with the following formula:
This is my code:
from tensorflow.keras import backend as K

def weighted_cross_entropy(y_true, y_pred):
    w = [0.99, 0.01]  # w[0] weights the foreground (1) term, w[1] the background (0) term
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    val = -(w[0] * y_true_f * K.log(y_pred_f) + w[1] * (1 - y_true_f) * K.log(1 - y_pred_f))
    return K.mean(val, axis=-1)
I'm using the F1-Score/Dice to measure the result at the end of each epoch.
But after 5 epochs, the loss becomes NaN and the F1 score stays very low (0.02).
It seems my network is not learning, but I don't understand why. Maybe my formula is wrong?
I have also tried inverting the weight values, but the result is the same.
After some research, I noticed that it is possible to pass predefined weights directly to the fit function.
Like this:
import numpy as np
from sklearn.utils import class_weight
w = class_weight.compute_class_weight('balanced', classes=np.unique(y_train.ravel()), y=y_train.ravel())
# class_weight expects a dict {class_index: weight}
model.fit(..., epochs=1000, class_weight=dict(enumerate(w)))
With this approach and the basic binary cross-entropy loss, the network learns correctly and gives better results than without predefined weights.
So I don't understand the difference between the 2 methods. Is it really necessary to implement a weighted cross-entropy function?
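Two points may help explain what the question observes; the following is a hedged NumPy sketch, not an answer from the thread. First, K.log(y_pred) returns -inf as soon as a sigmoid output saturates at exactly 0 or 1, and 0 * (-inf) is NaN, a common cause of NaN losses; clipping predictions to [eps, 1 - eps] avoids this (Keras' own binary cross-entropy clips its input in a similar way). Second, in principle, multiplying each pixel's loss by the weight of its class, which is what class_weight asks fit to do, gives the same objective as a hand-written weighted cross-entropy. The value eps = 1e-7 and the toy arrays are assumptions.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant, similar to Keras' default epsilon

def weighted_bce(y_true, y_pred, w_pos, w_neg, clip=True):
    """Weighted binary cross-entropy averaged over all pixels."""
    y_true = y_true.astype(np.float64).ravel()
    y_pred = y_pred.astype(np.float64).ravel()
    if clip:
        # Keep predictions away from exactly 0 and 1 so log() stays finite
        y_pred = np.clip(y_pred, EPS, 1 - EPS)
    with np.errstate(divide="ignore", invalid="ignore"):
        val = -(w_pos * y_true * np.log(y_pred)
                + w_neg * (1 - y_true) * np.log(1 - y_pred))
    return val.mean()

def balanced_weights(y):
    # Same rule as scikit-learn's 'balanced' mode: n_samples / (n_classes * count)
    counts = np.bincount(y.ravel())
    return y.size / (len(counts) * counts)

# Toy mask: 1 foreground pixel per 100 background pixels, as in the question
y = np.zeros(10100, dtype=int)
y[:100] = 1
p = np.full(y.shape, 0.3)
p[0] = 1.0    # saturated sigmoid output on a foreground pixel
p[200] = 1.0  # saturated sigmoid output on a background pixel

print(weighted_bce(y, p, 0.99, 0.01, clip=False))  # nan
print(weighted_bce(y, p, 0.99, 0.01))              # finite value

w = balanced_weights(y)  # w[0] for background, w[1] for foreground
# Per-pixel class weights (what class_weight does) give the same objective
per_pixel_w = np.where(y == 1, w[1], w[0])
p_c = np.clip(p, EPS, 1 - EPS)
bce = -(y * np.log(p_c) + (1 - y) * np.log(1 - p_c))
print(np.isclose((per_pixel_w * bce).mean(), weighted_bce(y, p, w[1], w[0])))  # True
```

Here the 'balanced' weights come out as 0.505 and 50.5, roughly the same 1:100 ratio as the hand-picked [0.99, 0.01] but on a different overall scale.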
This is a rather interesting question about Siamese networks.
I am following the example from https://keras.io/examples/mnist_siamese/.
My modified version of the code is in this Google Colab notebook.
The Siamese network takes 2 inputs (2 handwritten digits) and outputs whether they are the same digit (1) or not (0).
Each of the two inputs is first processed by a shared base_network (3 Dense layers with 2 Dropout layers in between): input_a is mapped to processed_a, and input_b to processed_b.
The last layer of the Siamese network is a Euclidean distance layer between the two extracted tensors:
distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])
model = Model([input_a, input_b], distance)
I understand the reasoning behind using a Euclidean distance layer for the lower part of the network: if the features are extracted well, similar inputs should have similar features.
I was wondering why not use a normal Dense layer for the lower part instead:
# distance = Lambda(euclidean_distance,
#                   output_shape=eucl_dist_output_shape)([processed_a, processed_b])
# model = Model([input_a, input_b], distance)
# my model
subtracted = Subtract()([processed_a, processed_b])
out = Dense(1, activation="sigmoid")(subtracted)
model = Model([input_a, input_b], out)
My reasoning is that if the extracted features are similar, the Subtract layer should produce a small tensor (the difference between the extracted features). The next layer, the Dense layer, can then learn to output 1 if its input is small and 0 otherwise.
Because the Euclidean distance layer outputs a value close to 0 when the two inputs are similar and close to 1 otherwise, whereas my model should output 1 for similar pairs, I also need to invert the accuracy and loss functions:
# the version of loss and accuracy for Euclidean distance layer
# def contrastive_loss(y_true, y_pred):
#     '''Contrastive loss from Hadsell-et-al.'06
#     http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
#     '''
#     margin = 1
#     square_pred = K.square(y_pred)
#     margin_square = K.square(K.maximum(margin - y_pred, 0))
#     return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
# def compute_accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     pred = y_pred.ravel() < 0.5
#     return np.mean(pred == y_true)
# def accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))
### my version, loss and accuracy
def contrastive_loss(y_true, y_pred):
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    # return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
    return K.mean(y_true * margin_square + (1 - y_true) * square_pred)
def compute_accuracy(y_true, y_pred):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    pred = y_pred.ravel() > 0.5
    return np.mean(pred == y_true)
def accuracy(y_true, y_pred):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    return K.mean(K.equal(y_true, K.cast(y_pred > 0.5, y_true.dtype)))
The accuracy for the old model:
* Accuracy on training set: 99.55%
* Accuracy on test set: 97.42%
This slight change leads to a model that does not learn anything:
* Accuracy on training set: 48.64%
* Accuracy on test set: 48.29%
So my question is:
1. What is wrong with my reasoning for using Subtract + Dense in the lower part of the Siamese network?
2. Can we fix this? I have two potential solutions in mind but I am not confident about either: (1) a convolutional neural net for feature extraction, (2) more Dense layers in the lower part of the Siamese network.
For two similar examples, subtracting the two n-dimensional feature vectors (extracted by the shared base feature-extraction model) gives values at or near zero in most positions of the resulting n-dimensional vector, which is what the next (output) Dense layer works on. On the other hand, the weights of an ANN are learned so that less important features produce weak responses and the prominent features that drive the decision produce strong responses. Your subtracted feature vector works the opposite way: pairs from different classes produce strong responses, and pairs from the same class produce weak ones. Furthermore, with a single node in the output layer (and no hidden layer before it), it is quite difficult for the model to produce a high response from near-zero inputs when the two samples are of the same class. This may be the key to your problem.
Based on the above discussion, you may want to try the following ideas:
Transform the subtracted feature vector so that similar pairs produce high responses, for example by subtracting it from 1 or taking its reciprocal (multiplicative inverse), followed by normalization.
Add more Dense layers before the output layer.
I wouldn't be surprised if using a convolutional neural net instead of stacked Dense layers for feature extraction (as you are considering) does not improve your accuracy much, since it is just another way of doing the same thing (feature extraction).
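One more way to look at the problem, offered as an addition rather than as part of the answer above: a single Dense unit on the signed difference computes w · (a - b) + c, which is antisymmetric in its inputs apart from the bias, so swapping the two images flips the sign of the score. For a different-digit pair, the score averaged over both input orders therefore equals the score of an identical pair, which is consistent with the roughly 50% accuracy reported in the question. Feeding the element-wise absolute difference |a - b| instead, as some Siamese architectures do, removes that sign problem. The random feature vectors, weights and bias below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 128
w = rng.normal(size=dim)  # weights of the single Dense output unit (illustrative)
c = 0.5                   # bias of that unit (illustrative)

def score_signed(a, b):
    # Pre-sigmoid output of Dense(1) applied to Subtract()([a, b])
    return w @ (a - b) + c

def score_abs(a, b):
    # Pre-sigmoid output of Dense(1) applied to the element-wise |a - b|
    return w @ np.abs(a - b) + c

a = rng.normal(size=dim)  # features of one digit
b = rng.normal(size=dim)  # features of a different digit

# Averaged over both input orders, a different pair scores exactly like an identical pair
print(np.isclose((score_signed(a, b) + score_signed(b, a)) / 2, score_signed(a, a)))  # True
# The absolute-difference score does not depend on the input order
print(np.isclose(score_abs(a, b), score_abs(b, a)))  # True
```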
Although this is not strictly a programming question, I haven't found anything about this topic on this site. I am currently working with (variational) autoencoders ((V)AE) and plan to deploy them to detect anomalies. For testing purposes, I've implemented a VAE in TensorFlow for detecting handwritten digits.
The training went well and the reconstructed images are very similar to the originals. But to actually use the autoencoder, I need some kind of measure to determine whether a new image fed to the autoencoder is a digit or not, by comparing it to a threshold value.
At this point, I have two major questions:
1.) For training, I used a loss consisting of two components. The first one is the reconstruction error, which is a cross-entropy function:
import tensorflow as tf

# x: actual input
# x_hat: reconstructed input
epsilon = 1e-10  # small number for numeric stability within log
recons_loss = - tf.reduce_sum(x * tf.math.log(epsilon + x_hat) + (1 - x) * tf.math.log(epsilon + 1 - x_hat),
                              axis=1)
The second one is the KL divergence, which measures how much one probability distribution differs from another; it is included because we require the latent variable space to follow a distribution close to a Gaussian.
# z_mean: vector representing the means of the latent distribution
# z_log_var: vector representing the log-variances of the latent distribution
KL_div = -0.5 * tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
                              axis=1)
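For reference, the KL term in the code above is the closed-form KL divergence between the encoder's diagonal Gaussian q(z | x) = N(mu, diag(sigma^2)) and a standard normal prior, with z_mean = mu and z_log_var = log(sigma^2) over d latent dimensions:

```latex
D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\right)
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

It is zero exactly when every mu_j = 0 and sigma_j^2 = 1, that is, when the latent distribution equals the prior.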
To determine the reconstruction error of a new image, do I have to use both parts of the training loss? Intuitively, I would say no and just use recons_loss.
2.) How do I determine the threshold value? Is there already TensorFlow functionality implemented that I can use?
If you have any good sources on related topics, please share the links!
Thanks!
I had a similar problem recently. VAEs are very good at projecting high-dimensional data into a lower-dimensional latent space. Altering the latent vector and feeding it to the decoder part creates new samples.
If I understand your question correctly, you are trying to do anomaly detection with the encoder part, in the lower-dimensional latent space?
I guess you have trained your VAE on MNIST. What you can do is get the latent vectors of all MNIST digits and compare the latent vector of your new digit to them using the Euclidean distance. The threshold would be a maximum distance set by you.
The code would be something like this:
from scipy.spatial import distance

x_mnist_encoded = encoder.predict(x_mnist, batch_size=batch_size)  # array of MNIST latent vectors
test_digit_encoded = encoder.predict(x_testdigit, batch_size=1)  # latent vector of your test digit

# calc the distance
threshold = 0.3  # maximum Euclidean distance that still counts as a match
# True if the test digit lies within `threshold` of at least one MNIST latent vector
is_known_digit = any(distance.euclidean(vector, test_digit_encoded[0]) <= threshold
                     for vector in x_mnist_encoded)
VAE code is from https://blog.keras.io/building-autoencoders-in-keras.html
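For question 2, one common heuristic (not part of the answer above) is to compute the anomaly score, whether the reconstruction error or the latent nearest-neighbour distance from the answer, on held-out normal data, and set the threshold at a high percentile of those scores. The sketch below vectorizes the answer's distance loop with NumPy. The 99th percentile, the 2-D latent space and the random stand-in vectors are assumptions; in practice the arrays would come from encoder.predict and the percentile would be tuned on validation data.

```python
import numpy as np

def nearest_distance(queries, reference):
    """Euclidean distance from each query vector to its nearest reference vector."""
    # (n_queries, n_reference, dim) differences via broadcasting
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

rng = np.random.default_rng(0)
latent_dim = 2
train_latent = rng.normal(size=(1000, latent_dim))  # stand-in for encoder.predict(x_train)
val_latent = rng.normal(size=(200, latent_dim))     # stand-in for held-out normal digits

# Threshold: 99th percentile of the scores of normal validation data
val_scores = nearest_distance(val_latent, train_latent)
threshold = np.percentile(val_scores, 99)

# Score new samples: one that coincides with a training vector, one far away
new_latent = np.array([train_latent[0], [8.0, 8.0]])
is_anomaly = nearest_distance(new_latent, train_latent) > threshold
print(threshold, is_anomaly)
```

The percentile sets the false-alarm rate you are willing to accept on normal data: at the 99th percentile, roughly 1% of normal validation samples would be flagged.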
I'm working on a 200-class classification task (but it's a little different, because there can be multiple 1s in the y vector) using a 4-layer fully connected neural network. Most of the time y (the label vector) contains one or two 1s, and that's where the problem is. During training, the model tends to predict all labels as zero, even when they should be 1.
Thus the accuracy is low (below 99%, which is actually worse than predicting all zeros). The activation function for each layer is sigmoid. Could you give me some advice on improving the model?
This is my loss function. I consider the accuracy low because predicting all labels as 0 would already give almost 99% accuracy.
loss = tf.reduce_mean(tf.reduce_sum(
    -(sum_all - sum_one) / sum_all * tf.multiply(ys, tf.math.log(prediction))
    - sum_one / sum_all * tf.multiply((one - ys), tf.math.log(one - prediction)),
    axis=1))
sum_one is the number of 1s in the label. I implemented a weighting here.
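To illustrate why element-wise accuracy is the wrong yardstick for this multi-label setup, the NumPy sketch below (toy labels, not the question's data) compares the accuracy of an all-zero prediction with its micro-averaged precision, recall and F1 on the positive labels. With 200 labels and one or two positives per sample, predicting all zeros already scores about 99% accuracy while recall and F1 are 0, so tracking recall or F1 is one way to see whether the weighting actually helps.

```python
import numpy as np

def multilabel_scores(y_true, y_pred):
    """Element-wise accuracy plus micro-averaged precision, recall and F1."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return accuracy, precision, recall, f1

# Toy data: 4 samples, 200 labels, one or two positives each
y_true = np.zeros((4, 200), dtype=int)
y_true[0, 3] = 1
y_true[1, [10, 50]] = 1
y_true[2, 199] = 1
y_true[3, [0, 1]] = 1

all_zero = np.zeros_like(y_true)
print(multilabel_scores(y_true, all_zero))  # accuracy 0.9925, recall 0, F1 0
print(multilabel_scores(y_true, y_true))    # all four scores equal 1
```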
I want to implement an accuracy function for a triplet loss network so that I can see how the algorithm is doing during training. So far I have tried something, but I'm not sure whether it can actually work, and I also have trouble implementing it in Keras. My idea was to compare the predicted anchor-positive and anchor-negative distances (in y_pred), so that the positive distance is low enough and the negative one is large enough:
def accuracy(_, y_pred):
    pos_threshold = 0.4
    neg_threshold = 0.6
    return K.mean(y_pred[0] < pos_threshold and y_pred[1] > neg_threshold)
The problem is that I couldn't figure out how to implement this "and" condition in Keras.
Then I tried to find something on the topic of accuracy for triplet loss. One way is to define accuracy as the proportion of triplets in which the predicted distance between the anchor image and the positive image is smaller than the one between the anchor image and the negative image. I have even bigger problems implementing this in Keras.
I tried this (although I don't know whether it does what I described):
K.mean(y_pred[0] < y_pred[1])
which gives me an accuracy of around 0.5 all the time (probably something random). So I still don't know whether the model is bad or the accuracy function is bad.
So my question is: how do I implement any reasonable accuracy function in Keras? I don't really care whether it is one of these two.
This is what I use (the condition y_pred[0] < y_pred[1], but taking the batch dimension into account). Note that I'm not using a mean, so that it supports sample weights.
from tensorflow.keras import backend as K

def triplet_accuracy(_, y_pred):
    '''
    Input: y_pred shape is (batch_size, 2)
        [pos, neg]
    Output: shape (batch_size, 1)
        loss[i] = 1 if y_pred[i, 0] < y_pred[i, 1] else 0
    '''
    subtraction = K.constant([-1, 1], shape=(2, 1))
    diff = K.dot(y_pred, subtraction)  # neg - pos for each sample
    loss = K.maximum(K.sign(diff), K.constant(0))
    return loss
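The same computation can be checked outside Keras. The NumPy sketch below assumes, as the answer does, that y_pred has shape (batch_size, 2) with columns [pos_dist, neg_dist]. It shows the answer's dot-product trick, a simpler equivalent column comparison, and the question's two-threshold variant, where the "and" becomes an element-wise logical AND over columns instead of Python's "and" (tf.logical_and is the corresponding TensorFlow op). Indexing y_pred[0] and y_pred[1], as in the question, selects the first two samples of the batch rather than the two distance columns, which may explain the accuracy of around 0.5. The distance values are made up.

```python
import numpy as np

def triplet_accuracy_dot(y_pred):
    # The answer's trick: [pos, neg] . [-1, 1] = neg - pos, then keep only a positive sign
    diff = y_pred @ np.array([[-1.0], [1.0]])  # shape (batch_size, 1)
    return np.maximum(np.sign(diff), 0.0)

def triplet_accuracy_simple(y_pred):
    # Equivalent: 1.0 where the positive distance is smaller than the negative one
    return (y_pred[:, 0] < y_pred[:, 1]).astype(float)[:, None]

def threshold_accuracy(y_pred, pos_threshold=0.4, neg_threshold=0.6):
    # The question's first idea: positive close enough AND negative far enough
    ok = np.logical_and(y_pred[:, 0] < pos_threshold, y_pred[:, 1] > neg_threshold)
    return ok.astype(float)[:, None]

# Each row is [anchor-positive distance, anchor-negative distance]
y_pred = np.array([[0.2, 0.9],   # good triplet
                   [0.5, 0.7],   # correct order, but positive distance above 0.4
                   [0.8, 0.3],   # wrong order
                   [0.4, 0.4]])  # tie

print(triplet_accuracy_dot(y_pred).ravel())     # [1. 1. 0. 0.]
print(triplet_accuracy_simple(y_pred).ravel())  # [1. 1. 0. 0.]
print(threshold_accuracy(y_pred).ravel())       # [1. 0. 0. 0.]
```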