I am learning AI with Python and have this situation: I created a deep learning model that has 10 neurons in its input layer and 3 neurons in the output layer. I split my data into 80% for training and 20% for testing.
The trained model is ready for testing.
Until now, I had always worked with models that have only one neuron in the output layer, so I tested the accuracy this way:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix

classifier = Sequential()
# ...
classifier.add(Dense(units=3, kernel_initializer='uniform', activation='sigmoid'))
# ...
y_pred = classifier.predict(np.array(X_test))
cm = confusion_matrix(y_test, y_pred)
which works great when the output layer has only ONE value in each prediction.
In my case, I have 3 values in each prediction.
y_pred = array([[3.142904686503911194e-11, 1.000000000000000000e+00, 1.729809626091548085e-16],
                [7.398544450698540942e-12, 1.000000000000000000e+00, 1.776427415878292515e-22],
                [4.224535246066807304e-07, 1.000000000000000000e+00, 7.929732391553923065e-12]])
And I want to compare it to my expected values, which are:
y_test = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
So, I have the option to make this work manually:
Set the highest value in each prediction row to 1 and set the other values to 0.
Compare the two vectors row by row.
It seems like there must be a better way to do it?
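For reference, here is a minimal sketch of the manual approach described above, using numpy's argmax to pick the winning class per row (it assumes y_pred and y_test are numpy arrays of matching shape):

import numpy as np
from sklearn.metrics import confusion_matrix

pred_classes = np.argmax(y_pred, axis=1)  # index of the highest value in each prediction row
true_classes = np.argmax(y_test, axis=1)  # index of the 1 in each one-hot row
cm = confusion_matrix(true_classes, pred_classes)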
You want to measure how "close" the prediction vector is to the expected vector. A good way to quantify the "amount of difference" between two vectors is to check the magnitude (or squared magnitude) of the delta vector (prediction - expected).
In this case, you can do something like this:
def square_magnitude(vector):
    return sum(x * x for x in vector)

def inaccuracy(pred, test):  # should only get equal-length items
    return square_magnitude([pred[i] - test[i] for i in range(len(pred))]) / len(pred)
Since you have three samples:
total_inaccuracy = sum(inaccuracy(y_pred[i], y_test[i]) for i in range(len(y_pred))) / len(y_pred)
This should be 0 when it's perfectly accurate and higher (positive) when it's less accurate.
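For completeness, a vectorized numpy equivalent of the helpers above (assuming y_pred and y_test are first converted to float arrays) might look like:

import numpy as np

y_pred_arr = np.asarray(y_pred, dtype=float)
y_test_arr = np.asarray(y_test, dtype=float)
# mean squared difference over all entries; 0 when perfectly accurate
total_inaccuracy = np.mean((y_pred_arr - y_test_arr) ** 2)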
Related
I have 2 loss functions in my model - Cross Entropy and Mean Squared Error.
I want my model to minimize both the losses but the model is only minimizing mean squared error during training.
def buildGenerator(dmodel, batch=100):
    inputs = Input(shape=(256, 256, 1))
    x = Conv2D(filters=32,
               kernel_size=3,
               padding='same',
               strides=1)(inputs)
    x = BatchNormalization(momentum=0.9)(x)
    x = LeakyReLU(alpha=0.2)(x)
    # .........................
    # ...........................
    outputs1 = Conv2D(filters=2,
                      kernel_size=3,
                      padding='same',
                      strides=1)(x)
    outputs2 = dmodel(outputs1)
    model = Model(inputs=inputs, outputs=[outputs2, outputs1], name='functional_model')
    model.compile(
        loss=['binary_crossentropy', 'mse'],
        optimizer='Adam',
        loss_weights=[1.0, 0.6],
        metrics=['accuracy', 'mse']
    )
    return model
In this code, dmodel is another model. I am using dmodel to classify outputs1 generated by the model and then finding cross-entropy between input labels and the output labels.
This is how I am training
dmodel = buildDiscriminator()
dmodel.load_weights('./GAN/discriminator')
dmodel.trainable = False
x, y1 = getGeneratorData()
y2 = np.ones((batch, 1))
model = buildGenerator(dmodel)
model.fit(x,[y2, y1],epochs=1)
I tried a lot of things, like changing loss_weights and changing loss functions, but nothing works. My model is only minimizing the MSE loss.
I don't understand what I am doing wrong.
I think using the discriminator model inside the generator is the issue but I am not sure.
I do not know whether there is a simple syntax to combine different loss functions, but you can try to define your own loss class. In another thread I found this code snippet that defines a custom loss class combining two other loss functions:
rho = 0.05

class loss_with_KLD(losses.Loss):
    def __init__(self, rho):
        super(loss_with_KLD, self).__init__()
        self.rho = rho
        self.kl = losses.KLDivergence()
        self.mse = losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.SUM)

    def call(self, y_true, y_pred):
        mse = self.mse(y_true, y_pred)
        kl = self.kl(self.rho, y_pred)
        return mse + kl
If you just replace the KLDivergence with binary cross entropy, this should work. Additionally, you would need to alter the call() function, since this implementation applies two loss functions to the same predicted y value, but you actually predict two different y values. In this case, your y_true and y_pred would both contain two values, and you would need to apply each loss function to only one of them. I do not know if it is easily possible to take a single element from a vector (in the style of y_true[0]), but if it is not, you could work around this by applying a "mask" to your vectors by multiplying them with [0, 1] or [1, 0], depending on the value you need. With that done, you can use the reduce_sum() function to get a single value and apply the loss function to your new y_true and y_pred.
This is a little bit more complicated, but it should get the job done.
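As a rough sketch of that idea (not tested against your exact setup, and the class name is just a placeholder), replacing the KL divergence with binary cross entropy in the same pattern could look like this; splitting or masking y_true and y_pred so that each loss only sees its own target, as described above, is still left to you:

import tensorflow as tf
from tensorflow.keras import losses

class combined_bce_mse(losses.Loss):
    def __init__(self):
        super(combined_bce_mse, self).__init__()
        self.bce = losses.BinaryCrossentropy()
        self.mse = losses.MeanSquaredError()

    def call(self, y_true, y_pred):
        # NOTE: this applies both losses to the same y_true/y_pred;
        # mask or slice them first if each loss should see a different target.
        return self.bce(y_true, y_pred) + self.mse(y_true, y_pred)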
When you specify 2 loss functions they apply to your 2 different outputs.
i.e. in your example, binary_crossentropy applies to outputs2, which has a y_true value of all ones and is the output of a non-trainable model.
It seems likely that you want to return a single value from the model, since you do not seem to have real labels for outputs2. While you could define your own custom loss function that combines both losses on the same value, I would advise against it. If the output value is a single class prediction (i.e. pixel on/off) then binary_crossentropy makes sense; if it is supposed to be a discrete value then mse makes sense.
I am using Ubuntu 19.04 (Disco Dingo), Python 3.7.3, and TensorFlow 1.14.0.
I noticed that the number of outputs given by the tensorflow.keras.Sequential.predict function is different than the number of inputs. Furthermore, it appears that there is no relation between the inputs and outputs.
Example:
import tensorflow as tf
import math
import numpy as np
import json
# We will train the model to recognize an XOR
x = [ [0,0], [0,1], [1,0], [1,1] ]
y = [ 0, 1, 1, 0 ]
xt = tf.cast(x, tf.float64)
yt = tf.cast(y, tf.float64)
# This model should be more than enough to learn an XOR
L0 = tf.keras.layers.Dense(2)
L1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
L2 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
L3 = tf.keras.layers.Dense(2, activation=tf.nn.softmax)
model = tf.keras.Sequential([L0,L1,L2,L3])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
model.fit(
    x=xt,
    y=yt,
    batch_size=32,
    epochs=1000,  # Try to overfit data
    shuffle=False,
    steps_per_epoch=math.ceil(len(x)/32)
)
# While it is training, the loss drops to near zero
# and the accuracy goes to 100%.
# The large number of epochs and the small number of training examples
# should mean that the network is overtrained.
print("testing")
for i in range(len(y)):
    m = tf.cast([x[i]], tf.float64)
    # m should be the ith training example
    values = model.predict(m, steps=1)
    best = np.argmax(values[0])
    print(x[i], y[i], best)
The output I always get is:
(input, correct answer, predicted answer)
[0, 0] 0 0
[0, 1] 1 0
[1, 0] 1 0
[1, 1] 0 0
or
[0, 0] 0 1
[0, 1] 1 1
[1, 0] 1 1
[1, 1] 0 1
So even though I thought the network would be overtrained, and even though the program said the accuracy was 100% and the loss was virtually zero, the output looks as though the network hadn't trained at all.
Stranger yet is when I replace the testing section with the following:
print("testing")
m = tf.cast([], tf.float64)
values = model.predict(m, steps=1)
print(values)
I would think that this would return an empty array or throw an exception. Instead it gives:
[[0.9979249 0.00207507]
[0.10981816 0.89018184]
[0.10981816 0.89018184]
[0.9932179 0.0067821 ]]
This corresponds to [0,1,1,0]
So even though it was given nothing to predict on, it still gives out predictions for something. And it appears as though the predictions match up with what we would expect from sending the entire training set into the predict method.
Replacing the testing section again:
print("testing")
m = tf.cast([[0,0]], tf.float64)
# [0,0] is the first training example
# the output should be something close to [[1.0,0.0]]
values = model.predict(m, steps=1)
for j in range(len(values)):
    print(values[j])
exit()
I get:
[0.9112452 0.08875483]
[0.00552484 0.9944752 ]
[0.00555605 0.99444395]
[0.9112452 0.08875483]
This corresponds to [0,1,1,0]
So asking it to predict on zero inputs gives out 4 predictions, and asking it to predict on one input gives out 4 predictions. Furthermore, the predictions it gives out look like what we would expect if we put the entire training set into the predict function.
Any ideas as to what's going on? How do I get my network to give exactly one prediction for each input given?
Providing the solution here (Answer Section), even though it is present in the Comment Section, for the benefit of the community.
Upgrading TensorFlow from 1.14.0 to >= 2.0 resolved the issue.
After upgrading, the test section works as expected:
m = tf.cast([[0,0]], tf.float64)
# [0,0] is the first training example
# the output should be something close to [[1.0,0.0]]
values = model.predict(m, steps=1)
for j in range(len(values)):
    print(values[j])
exit()
Output:
[0.9921625 0.00783745]
I want to do evaluation of a classification Tensorflow model.
To compute the accuracy, I have the following code:
predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)
accuracy = tf.metrics.accuracy(labels=label_ids, predictions=predictions)
It works well in single-label classification, but now I want to do multilabel classification, where my labels are arrays of integers instead of integers.
Here is an example of labels, [0, 1, 1, 0, 1, 0], stored in label_ids, and an example of predictions, [0.1, 0.8, 0.9, 0.1, 0.6, 0.2], from the tensor logits.
What function should I use instead of argmax to do so ? (My labels are arrays of 6 Integers with value of either 0 or 1)
If needed, we can suppose that there is a threshold of 0.5.
It is probably better to do this type of post-processing evaluation outside of tensorflow, where it is more natural to try several different thresholds.
If you want to do it in tensorflow, you can consider:
predictions = tf.math.greater(logits, tf.constant(0.5))
This will return a tensor of the original logits shape with True for all entries greater than 0.5. You can then calculate accuracy as before. This is suitable for cases where many labels can be simultaneously true for a given sample.
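For example, a minimal sketch of doing the thresholding outside TensorFlow with numpy (the arrays below are just the example values from the question):

import numpy as np

logits = np.array([[0.1, 0.8, 0.9, 0.1, 0.6, 0.2]])
label_ids = np.array([[0, 1, 1, 0, 1, 0]])

predictions = (logits > 0.5).astype(int)                       # apply the 0.5 threshold
per_label_accuracy = (predictions == label_ids).mean()         # fraction of matching entries
exact_match = (predictions == label_ids).all(axis=1).mean()    # rows where every label matches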
Use the code below to calculate accuracy in multiclass classification:
tf.argmax will return the index where the y value is max, for both y_pred and y_true (the actual y).
Then tf.equal is used to find the matches (it returns True/False).
Convert the booleans into floats (i.e. 0 or 1) and use tf.reduce_mean to calculate the accuracy.
correct_mask = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))
Edit
Example with data:
import numpy as np
y_pred = np.array([[0.1,0.5,0.4], [0.2,0.6,0.2], [0.9,0.05,0.05]])
y_true = np.array([[0,1,0],[0,0,1],[1,0,0]])
correct_mask = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))
with tf.Session() as sess:
    # print(sess.run([correct_mask]))
    print(sess.run([accuracy]))
Output:
[0.6666667]
I created an LSTM model for intraday stock predictions. My training data has the shape (290, 4). I did all the preprocessing, like normalizing the data, taking the difference, and using a window size of 4.
This is a sample of my input data.
X = array([[0, 0, 0, 0],
[array([ 0.19]), 0, 0, 0],
[array([-0.35]), array([ 0.19]), 0, 0],
...,
[array([ 0.11]), array([-0.02]), array([-0.13]), array([-0.09])],
[array([-0.02]), array([ 0.11]), array([-0.02]), array([-0.13])],
[array([ 0.07]), array([-0.02]), array([ 0.11]), array([-0.02])]], dtype=object)
y = array([[array([ 0.19])],
[array([-0.35])],
[array([-0.025])],
.....,
[array([-0.02])],
[array([ 0.07])],
[array([-0.04])]], dtype=object)
Note: I am feeding as well as predicting the difference value, so the input values are in the range (-0.5, 0.5).
Here is my Keras LSTM model :
dim_in = 4
dim_out = 1

model = Sequential()
model.add(LSTM(input_shape=(1, dim_in),
               return_sequences=True,
               units=6))
model.add(Dropout(0.2))
model.add(LSTM(batch_input_shape=(1, features.shape[1], features.shape[2]),
               return_sequences=False, units=6))
model.add(Dropout(0.3))
model.add(Dense(activation='linear', units=dim_out))
model.compile(loss='mse', optimizer='rmsprop')

for i in range(300):
    # print("Completed :", i+1, "/", 300, "Steps")
    model.fit(X, y, epochs=1, batch_size=1, verbose=2, shuffle=False)
    model.reset_states()
I am feeding the last sequence value of shape (1, 4) and predicting the output.
This is my prediction:
base_value = df.iloc[290]['Close']
prediction = []
orig_pred = []
input_data = np.copy(test[0, :])
input_data = input_data.reshape(len(input_data), 1)

for i in range(100):
    inp = input_data[i:, :]
    inp = inp.reshape(1, 1, inp.shape[0])
    y = model.predict(inp)
    orig_pred.append(y[0][0])
    input_data = np.insert(input_data, [i+4], y[0][0], axis=0)
    base_value = base_value + y
    prediction.append(base_value[0][0])

sqrt(mean_squared_error(test_output, orig_pred))
RMSE = 0.10592485833344527
Here is the prediction visualization of the difference values, along with the stock price prediction.
Fig 1: the LSTM (difference) prediction
Fig 2: the stock price prediction
I am not sure why it predicts the same output value after about 10 iterations. Maybe it is the vanishing gradient problem, or I am feeding too little input data (about 290 samples), or there is a problem in the model architecture. I am not sure.
Please help me understand how to get a reasonable result.
Thank you !!!
I don't work with Keras, but looking through your code and plots it seems like the complexity of your network might not be high enough to fit the data. Try enlarging the network with more units and also try larger window sizes.
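As a rough starting point only (the unit counts and window size here are assumptions to experiment with, not tuned values), a larger variant of the model from the question might look like:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

dim_in = 8    # larger window size than the original 4
dim_out = 1

model = Sequential()
model.add(LSTM(input_shape=(1, dim_in), return_sequences=True, units=64))
model.add(Dropout(0.2))
model.add(LSTM(return_sequences=False, units=64))
model.add(Dropout(0.3))
model.add(Dense(activation='linear', units=dim_out))
model.compile(loss='mse', optimizer='rmsprop')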
Because your regressor minimizes the cost function by replicating the feature you give as input. For example, if you have the BTC closing value as $6340 at time t, it will predict that (or a value close to it) at t+1. Ensure that you are not giving the regressor a direct numerical hint of what the predicted label might be, especially when working with time-series data.
I have written the following binary classification program in TensorFlow, and it is buggy: the cost is always zero, no matter what the input is. I am trying to debug a larger program that is not learning anything from the data. I have narrowed down at least one bug to the cost function always returning zero. The given program uses some random inputs and has the same problem. self.X_train and self.y_train are originally supposed to be read from files, and the function self.predict() has more layers forming a feedforward neural network.
import numpy as np
import tensorflow as tf

class annClassifier():

    def __init__(self):
        with tf.variable_scope("Input"):
            self.X = tf.placeholder(tf.float32, shape=(100, 11))
        with tf.variable_scope("Output"):
            self.y = tf.placeholder(tf.float32, shape=(100, 1))
        self.X_train = np.random.rand(100, 11)
        self.y_train = np.random.randint(0, 2, size=(100, 1))

    def predict(self):
        with tf.variable_scope('OutputLayer'):
            weights = tf.get_variable(name='weights',
                                      shape=[11, 1],
                                      initializer=tf.contrib.layers.xavier_initializer())
            bases = tf.get_variable(name='bases',
                                    shape=[1],
                                    initializer=tf.zeros_initializer())
            final_output = tf.matmul(self.X, weights) + bases
            return final_output

    def train(self):
        prediction = self.predict()
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=self.y))
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            print(sess.run(cost, feed_dict={self.X: self.X_train, self.y: self.y_train}))


with tf.Graph().as_default():
    classifier = annClassifier()
    classifier.train()
If someone could please figure out what I am doing wrong in this, I can try making the same change in my original program. Thanks a lot!
The only problem is the invalid cost function used. softmax_cross_entropy_with_logits should be used if you have more than two classes, as the softmax of a single output always returns 1, since it is defined as:
softmax(x)_i = exp(x_i) / SUM_j exp(x_j)
so for a single number (one dimensional output)
softmax(x) = exp(x) / exp(x) = 1
Furthermore, for softmax output TF expects one-hot encoded labels, so if you provide only 0 or 1, there are two possibilities:
True label is 0, so the cost is -0*log(1) = 0
True label is 1, so the cost is -1*log(1) = 0
Tensorflow has a separate function to handle binary classification which applies sigmoid instead (note, that the same function for more than one output would apply sigmoid independently on each dimension which is what multi-label classification would expect):
tf.nn.sigmoid_cross_entropy_with_logits
Just switch to this cost and you are good to go; you do not have to encode anything as one-hot either, as this function is designed exactly for your use case.
The only missing bit is that your code does not have an actual training routine: you need to define an optimiser, ask it to minimise the loss, and then run the train op in a loop. In your current setting you just compute the cost over and over, with a network that never changes.
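A minimal sketch of what such a training routine could look like as a replacement for the train() method in the class above (the optimiser choice, learning rate, and epoch count are arbitrary placeholders):

def train(self):
    prediction = self.predict()
    # sigmoid cross entropy handles the single-logit binary case
    cost = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=self.y))
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(100):
            _, c = sess.run([train_op, cost],
                            feed_dict={self.X: self.X_train, self.y: self.y_train})
        print(c)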
In particular, please refer to the Cross Entropy Jungle question on SO, which provides a more detailed description of all these different helper functions in TF (and other libraries), which have different requirements/use cases.
The softmax_cross_entropy_with_logits is basically a stable implementation of these 2 parts:
softmax = tf.nn.softmax(prediction)
cost = -tf.reduce_mean(labels * tf.log(softmax), 1)
Now in your example, prediction is a single value, so when you apply softmax to it, it is always going to be 1 irrespective of the value (exp(prediction)/exp(prediction) = 1), and so the tf.log(softmax) term becomes 0. That's why your cost is always zero.
Either apply sigmoid to get probabilities between 0 and 1, or, if you want to use softmax, encode the labels as [1, 0] for class 0 and [0, 1] for class 1.
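For illustration, a minimal sketch (using the 0/1 labels from the question) of converting them to the one-hot form that the softmax loss expects:

import numpy as np

y_train = np.random.randint(0, 2, size=(100, 1))
# class 0 -> [1, 0], class 1 -> [0, 1]
y_onehot = np.concatenate([1 - y_train, y_train], axis=1).astype(np.float32)
# note: the output layer would then also need 2 units instead of 1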