A simple feed forward DNN with relevant .csv files can be found here https://github.com/jhsmith12345/tensorflow/blob/normalize_prediction/tf_from_csv.py
This piece of code
classification = prediction.eval(feed_dict={x: [[9,3]]})
print (classification)
is outputting
[[ -12.2412138 -17.24327469 ]]
I am expecting a prediction that conforms to the labels, which are 1 or 0. Something like
[[ 0 1 ]]
I believe that my predictive values are not getting normalized by a softmax, but have no idea how to proceed. Any help is appreciated! Also, I'm more than happy to post the full code here but didn't want to clutter the post. Thanks!
To be clear, in your code
prediction = neural_network_model(x)
prediction.eval(feed_dict={x: [[9,3]]})
# output is [[ -12.2412138 -17.24327469 ]]
and you are confused about why the values are not in the range 0 to 1, right?
That is because softmax is never applied to prediction.
You use tf.nn.softmax_cross_entropy_with_logits.
As far as I know, this function applies softmax to prediction internally before computing the cross entropy,
but it does not change the value of prediction itself.
I think you can either
apply softmax yourself, then compute the cross entropy, and finally print prediction directly (which means you can't use tf.nn.softmax_cross_entropy_with_logits),
or change nothing in training, but apply softmax before printing prediction.
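As a rough illustration of the second option, here is a minimal sketch in plain Python of what applying softmax to the printed logits would give. The softmax helper is written by hand for this example; in the TensorFlow graph the corresponding op would be tf.nn.softmax applied to prediction. The input numbers are the ones quoted in the question.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Raw network output quoted in the question.
logits = [-12.2412138, -17.24327469]

probs = softmax(logits)
# Index of the most probable class (0 or 1).
predicted_class = probs.index(max(probs))

print(probs)            # two values in [0, 1] that sum to 1
print(predicted_class)  # index of the larger logit
```

Taking the index of the largest probability is what turns the normalized output into a hard 0/1 label.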
I am working on a NN with Pytorch which simply maps points from the plane into real numbers, for example
model = nn.Sequential(nn.Linear(2,2),nn.ReLU(),nn.Linear(2,1))
What I want to do, since this network defines a map h:R^2->R, is to compute the gradient of this mapping h in the training loop. So for example
for it in range(epochs):
    pred = model(X_train)
    grad = torch.autograd.grad(pred,X_train)
    ....
The training set has been defined as a tensor requiring the gradient. My problem is that even though the output for each fixed point is a scalar, I am propagating a set of N=100 points, so the output is actually an Nx1 tensor. This leads to an error saying that autograd can only compute gradients of scalar functions.
In fact, trying with the little change
pred = torch.sum(model(X_train))
everything works perfectly. However, I am interested in all the individual gradients, so is there a way to compute all of them together?
Computing the sum as shown above does give exactly the result I expect, of course, but I wanted to know whether this is the only possibility.
There are other possibilities, but using .sum() is the simplest way. Calling .sum() on the final output vector and computing dpred/dinput will give you the desired output. Here is why:
Since pred = sum(loss) = sum(f(xi)),
where i is the index of the input x,
dpred/dinput will be a matrix [dpred/dx0, dpred/dx1, dpred/dx...].
Consider dpred/dx0: it equals df(x0)/dx0, since df(xi)/dx0 is 0 for every other i.
PS: Please excuse the crappy mathematical expressions... SO does not support latex/math expressions.
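The argument can be checked numerically without any framework. The sketch below assumes a tiny 2 -> 2 -> 1 ReLU network with made-up weights (W1, b1, w2, b2 and the points in X are all hypothetical). It compares the finite-difference gradient of the summed output with the analytic gradient of each single output.

```python
# Hypothetical fixed weights for a tiny 2 -> 2 -> 1 ReLU network h: R^2 -> R.
W1 = [[1.0, -2.0], [0.5, 3.0]]
b1 = [0.1, -0.2]
w2 = [2.0, -1.0]
b2 = 0.3

def pre_activation(x, j):
    """Input to hidden unit j for one 2-D point x."""
    return W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]

def h(x):
    """Scalar output of the network for one point."""
    hidden = [max(0.0, pre_activation(x, j)) for j in range(2)]
    return w2[0] * hidden[0] + w2[1] * hidden[1] + b2

def grad_h(x):
    """Analytic gradient dh/dx for one point (the ReLU derivative is 0 or 1)."""
    active = [1.0 if pre_activation(x, j) > 0 else 0.0 for j in range(2)]
    return [sum(w2[j] * active[j] * W1[j][k] for j in range(2)) for k in range(2)]

def total(points):
    """Sum of the outputs over the batch: the scalar that gets differentiated."""
    return sum(h(p) for p in points)

def grad_of_total(points, i, eps=1e-6):
    """Central finite-difference gradient of total(points) w.r.t. point i."""
    g = []
    for k in range(2):
        plus = [list(p) for p in points]
        minus = [list(p) for p in points]
        plus[i][k] += eps
        minus[i][k] -= eps
        g.append((total(plus) - total(minus)) / (2 * eps))
    return g

X = [[1.0, 0.5], [-1.0, 2.0], [0.3, -0.7]]

rows_from_sum = [grad_of_total(X, i) for i in range(len(X))]
rows_per_point = [grad_h(p) for p in X]

for a, b in zip(rows_from_sum, rows_per_point):
    print(a, b)  # each pair should agree up to finite-difference error
```

Row i of the gradient of the sum only involves point i, which is why summing first loses nothing as long as the points do not interact inside the model.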
I am training a network on images for binary classification. The input images are normalized to have pixel values in the range [0,1], and the weight matrices are initialized from a normal distribution. However, the output of my last Dense layer with sigmoid activation differs only minutely between the two classes. For example:
output for class1- 0.377525 output for class2- 0.377539
The outputs for the two classes only differ after the 4th decimal place. Is there any workaround to make sure that the output for class 1 falls between 0 and 0.5 and the output for class 2 falls between 0.5 and 1?
Edit:
I have tried both the cases.
Case 1 - Dense(1, 'sigmoid') with binary crossentropy
Case 2- Dense(2, 'softmax') with binary crossentropy
For case 1, the output values differ by a very small amount, as mentioned in the problem above. As such, I am taking the mean of the predicted values as the classification threshold. This works up to some extent, but it is not a permanent solution.
For case 2 - the prediction overfits to one class only.
A sample code : -
inputs = Input(shape = (128,156,1))
x = Conv2D(.....)(inputs)
x = BatchNormalization()(x)
x = MaxPooling2D()(x)
...
.
.
flat = Flatten()(x)
out = Dense(1, activation='sigmoid')(flat)
model = Model(inputs,out)
model.compile(optimizer='adamax',loss='binary_crossentropy',metrics=['binary_accuracy'])
It seems you are confusing a binary classification architecture with a 2-label multi-class classification architecture.
Since you mention probabilities for two classes, class1 and class2, you have set up a single-label multi-class problem. That means you are trying to predict the probabilities of 2 classes, where a sample can have only one of the labels at a time.
In this setup, it's proper to use softmax instead of sigmoid. Your loss function would be binary_crossentropy as well.
Right now, with the multi-label setup and sigmoid activation, you are independently predicting the probability of a sample being class1 and class2 simultaneously (aka, multi-label multi-class classification).
Once you change to softmax, you should see more significant differences between the probabilities, if the sample really does belong to one of the 2 classes and if your model is well trained and confident about its predictions (compare validation and training results).
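For reference, here is a small numeric sketch of how the two output layers discussed above relate to each other. For exactly two classes, the softmax probability of class2 equals the sigmoid of the difference between the two logits, so how far apart the probabilities end up depends on the gap between the logits. The logit values z1 and z2 are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, the activation of a single-unit binary output."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z1, z2):
    """Softmax over exactly two logits, returned as (p_class1, p_class2)."""
    m = max(z1, z2)
    e1, e2 = math.exp(z1 - m), math.exp(z2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)

# Hypothetical logits for class1 and class2.
z1, z2 = 0.4, 1.9

p1, p2 = softmax2(z1, z2)
p2_via_sigmoid = sigmoid(z2 - z1)

# Thresholding at 0.5 gives the hard class label.
label = 2 if p2 >= 0.5 else 1

print(p1, p2, p2_via_sigmoid, label)
```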
First, I would like to say that the information you provided is insufficient to debug your problem exactly, because you didn't provide any code for your model and optimizer. I suspect there might be an error in the labels, and I also suggest you use a softmax activation function instead of the sigmoid function in the final layer, although your approach will still work, provided that the binary classification model outputs a single node and the loss is binary cross entropy.
If you want to receive an accurate solution, please provide more information.
I'm currently working on a dataset where I have to predict an integer output, ranging from 1 to N. I've built a network with mse as the loss function, but I feel that mse may not be the ideal loss to minimize when the output is an integer.
I'm also rounding my predictions to get an integer output. Is there a way to build or optimize the model better for integer outputs?
Can anyone provide some help on how to deal with integer outputs/targets? This is the loss function I'm using right now:
model.compile(optimizer=SGD(0.001), loss='mse')
You are using the wrong loss. Mean squared error is a loss for regression, and you have a classification problem (discrete outputs, not continuous).
So for this your model should have a softmax output layer:
model.add(Dense(N, activation="softmax"))
And you should be using a classification loss:
model.compile(optimizer=SGD(0.001), loss='sparse_categorical_crossentropy')
This should work as long as your labels are integers in the [0, N-1] range, so subtract 1 from your 1 to N targets before training. To make a prediction, you should do:
output = np.argmax(model.predict(some_data), axis=1) + 1
The +1 is needed because the model's class indices go from 0 to N-1, while the labels in the question go from 1 to N.
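The label shift can be sketched in plain Python as follows. The helper names to_class_index and to_label, the value of N, and the sample softmax row are all made up for illustration; in practice the row would come from model.predict.

```python
N = 5  # hypothetical number of classes

def to_class_index(label):
    """Map a label in 1..N to a class index in 0..N-1 (used for training)."""
    return label - 1

def to_label(probabilities):
    """Map one softmax output row back to a label in 1..N (argmax, then +1)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best + 1

raw_labels = [1, 3, 5]
train_targets = [to_class_index(y) for y in raw_labels]

# Made-up softmax output for one sample; the largest entry is at index 2.
row = [0.05, 0.10, 0.60, 0.20, 0.05]
predicted = to_label(row)

print(train_targets, predicted)
```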
Ordinal regression could be an appropriate approach, in case predicting the wrong month but close to the true month is considered a smaller mistake than predicting a value one year earlier or later. Only you can know that, based on the specific problem you want to solve.
I found an implementation of the appropriate loss function on github (no affiliation). For completeness, below I copy-paste the code from that repo:
from keras import backend as K
from keras import losses

def loss(y_true, y_pred):
    weights = K.cast(
        K.abs(K.argmax(y_true, axis=1) - K.argmax(y_pred, axis=1))/(K.int_shape(y_pred)[1] - 1),
        dtype='float32'
    )
    return (1.0 + weights) * losses.categorical_crossentropy(y_true, y_pred)
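To see what this weighting does, here is a plain-Python sketch of the same arithmetic for a single sample. It is a re-implementation for illustration only, not the repository's code; the function name ordinal_weighted_ce and the example vectors are made up. Both predictions below give the true class the same probability, so their plain cross entropy is identical and only the distance weight differs.

```python
import math

def ordinal_weighted_ce(y_true, y_pred):
    """(1 + |argmax(y_true) - argmax(y_pred)| / (K - 1)) * categorical cross entropy."""
    k = len(y_pred)
    true_idx = y_true.index(max(y_true))
    pred_idx = y_pred.index(max(y_pred))
    # Distance between true and predicted class, scaled to [0, 1].
    weight = abs(true_idx - pred_idx) / (k - 1)
    # Plain categorical cross entropy for a one-hot target.
    ce = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
    return (1.0 + weight) * ce

y_true = [1, 0, 0, 0]            # true class is index 0
near = [0.3, 0.5, 0.1, 0.1]      # predicts index 1 (one step away)
far = [0.3, 0.1, 0.1, 0.5]       # predicts index 3 (three steps away)

loss_near = ordinal_weighted_ce(y_true, near)
loss_far = ordinal_weighted_ce(y_true, far)

print(loss_near, loss_far)  # the far miss is penalized more
```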
I am trying to build two neural networks for classification: one for binary and one for multi-class classification. I am trying to use torch.nn.CrossEntropyLoss() as the loss function, but when I try to train my first neural network I get the following error:
multi-target not supported at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THNN/generic/ClassNLLCriterion.c:22
From my analysis, I found that my dataset has two problems that caused the error.
My dataset is one-hot encoded. I used one-hot encoding to preprocess it. The first target variable, Y_binary, has the shape torch.Size([125973, 1]) and is full of 0s and 1s indicating the classes 'No' and 'Yes'.
My data has the wrong dimensions? I found that I can't use a simple vector with the cross entropy loss function. Some people used the following code to reshape their target vector before feeding to the loss function.
out = out.permute(0, 2, 3, 1).contiguous().view(-1, class_number)
I didn't really understand the reasoning behind this code, but it seems to me that I need to keep track of the following variables: Class_Number, Batch_size, Dimension_Output. For my code, here are the dimensions:
X_train.shape: (125973, 122)
Y_train2.shape: (125973, 1)
batch_size = 64
K = len(set(Y_train2)) # Binary classification For multi class classification use K = len(set(Y_train5))
Should the target value be one-hot encoded? If not, how can I feed a nominal feature to the loss function?
If I need to reshape the output, can you help me do this for my code?
I am trying to use this loss function for both my neural networks.
Thank you in advance,
The error is due to the usage of torch.nn.CrossEntropyLoss(), which can be used if you want to predict 1 class out of N classes. For multi-label classification, where several classes can be present at once, you should use torch.nn.BCEWithLogitsLoss(), which combines a Sigmoid layer and the BCELoss in one single class.
In the multi-label case, if you use Sigmoid + BCELoss, the target needs to be a 0/1 vector per sample, i.e. something like [0 1 0 0 0 1 0 0 1 0], with a 1 at the position of each class that is present.
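The difference between the two losses can be sketched in plain Python for a single sample. This is illustrative arithmetic, not PyTorch's implementation, and the logits and targets are made up. The single-label loss takes one integer class index per sample, which is why a one-hot or (N, 1)-shaped target does not fit it. The multi-label loss takes one 0/1 entry per class.

```python
import math

def cross_entropy(logits, target_index):
    """Single-label loss: -log softmax(logits)[target_index]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def bce_with_logits(logits, targets):
    """Multi-label loss: mean binary cross entropy over independent sigmoids."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the logit
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

logits = [2.0, -1.0, 0.5]  # hypothetical raw scores for 3 classes

# Target is ONE class index: exactly one class is correct.
single = cross_entropy(logits, 0)

# Target is a 0/1 vector: classes 0 and 2 are both present.
multi = bce_with_logits(logits, [1, 0, 1])

print(single, multi)
```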
I have a cost function in Keras with 3 parts, each related to a different output of my network. Suppose this is my loss function:
aL1 + bL2 + cL3, where L1 is mse, L2 is binary cross-entropy, and L3 tries to minimize the number of output pixels whose value is neither 0 nor 1 (∑n (x≠0 and x≠1)), but I do not know how to write this last loss function. (a, b and c are the coefficients of the individual loss functions.)
The output should be a 28x28 binary image whose values are 0 or 1. By adding this term to the loss function, I am trying to force the output to be 0 or 1 and to push every other value into one of these two classes. But I do not know how to write this loss function or how to combine it with the others. If I only had the first two loss functions, I would do this:
model.compile(optimizer=opt, loss={'decoder_output':'mse','reconstructed_W':'binary_crossentropy'}, loss_weights={'decoder_output': 0.1, 'reconstructed_W': 1.0}, metrics=['mae'])
The third loss is related to reconstructed_W: I want to force its values to be only 0 or 1, but I do not know how to code this. Could you please help me with this issue? I would really appreciate your guidance.