Keras - model.predict_classes gives wrong labels - python

My model is as follows:
print('Build main model...')
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(14, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
When I use model.evaluate([x_test1, x_test2], y_test), I get an accuracy of 90%, but when I use model.predict_classes([x_test1, x_test2]), I get totally wrong class labels, and the accuracy computed from them drops significantly. What is the difference between how model.evaluate and model.predict_classes work? Where am I making the mistake?

Since you ask for loss='binary_crossentropy' and metrics=['accuracy'] in your model compilation, Keras infers that you are interested in the binary accuracy, and this is what it returns in model.evaluate(); in fact, since you have 14 classes, you are actually interested in the categorical accuracy, which is the one you get when you compare the labels returned by model.predict_classes() with the true labels.
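To make the difference concrete, here is a small NumPy sketch (not Keras code) of the two metrics on 14-class one-hot targets. The labels and probabilities are hypothetical; `binary_accuracy` mimics an element-wise comparison with a 0.5 threshold, while `categorical_accuracy` compares the argmax, which is also how class labels are derived from a softmax output.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Element-wise: each of the 14 outputs is scored as an independent 0/1 label
    return float(np.mean((y_pred > threshold) == (y_true > threshold)))

def categorical_accuracy(y_true, y_pred):
    # Per sample: correct only if the top-scoring class is the true class
    return float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))

num_classes = 14
true_labels = np.array([0, 3, 7, 12])          # hypothetical ground truth
y_true = np.eye(num_classes)[true_labels]       # one-hot, shape (4, 14)

# Hypothetical softmax outputs that always put the most mass on the WRONG class
y_pred = np.full((4, num_classes), 0.05)
wrong_labels = (true_labels + 1) % num_classes
y_pred[np.arange(4), wrong_labels] = 0.35       # each row sums to 1.0

print(binary_accuracy(y_true, y_pred))       # about 0.93: 13 of 14 entries per row "match"
print(categorical_accuracy(y_true, y_pred))  # 0.0: every predicted class is wrong
```

With 14 classes, any model whose outputs all stay below 0.5 already scores 13/14 (about 93%) binary accuracy, so a high figure from evaluate() can coexist with poor class labels.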
So, you should change the loss function in your model compilation to categorical_crossentropy:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
If, for whatever reason, you want to stick with loss='binary_crossentropy' (admittedly a very unusual choice here), you should change the model compilation to state explicitly that you want the categorical accuracy, as follows:
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[categorical_accuracy])
In either case, you will find that the accuracy reported by model.evaluate() and the accuracy computed from the labels of model.predict_classes() are the same, as they should be.
For a more detailed explanation and an example using the MNIST data, see my answer here.

Related

What is the difference between tf.train.AdamOptimizer and use adam in keras.compile?

I was building a dense neural network to predict poker hands. At first I had a problem with reproducibility, but then I discovered the real cause: I could not reproduce my results because of the Adam optimizer, since with SGD it worked.
This means
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
did NOT work, whereas
opti = tf.train.AdamOptimizer()
model.compile(loss='sparse_categorical_crossentropy', optimizer=opti, metrics=['accuracy'])
worked with reproducibility.
So my question is now:
Is there any difference between using
tf.train.AdamOptimizer
and
model.compile(..., optimizer = 'adam')
I ask because I would like to use the first one to avoid the reproducibility problem.
They both implement the same algorithm. However, with tf.train.AdamOptimizer you can explicitly set the learning rate and the other hyperparameters:
tf.compat.v1.train.AdamOptimizer(
learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False,
name='Adam')
Tuning these can affect how well the model learns and how long training takes. With model.compile(optimizer="adam"), on the other hand, the learning rate, beta1, beta2, etc. are set to their default values; to change them there, pass a configured Keras optimizer object instead of the string.
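For reference, the hyperparameters listed above (learning_rate, beta1, beta2, epsilon) enter the standard Adam update as in the NumPy sketch below. It illustrates the algorithm only and is not TensorFlow's or Keras's actual code (the TensorFlow implementation may, for instance, place epsilon slightly differently); the quadratic objective f(x) = x^2 is a hypothetical example.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Exponential moving averages of the gradient and of its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: m and v start at zero, so early estimates are scaled up
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v

# Hypothetical objective f(x) = x^2 (gradient 2x), starting at x = 5
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
history = []
for t in range(1, 1001):  # t starts at 1 so the bias correction is well defined
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, learning_rate=0.1)
    history.append(float(theta[0]))

print(history[0])   # the first step moves by about learning_rate: roughly 4.9
print(history[-1])  # much closer to the minimum at 0
```

Because the first bias-corrected step is roughly learning_rate times the sign of the gradient, learning_rate sets the scale of the early updates, while beta1 and beta2 control how much gradient history is averaged.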

Using Tensorflow 2.0 and eager execution without Keras

This question might stem from a lack of knowledge about TensorFlow, but I am trying to build a multilayer perceptron with TensorFlow 2.0 without Keras.
The reason is that my machine learning course requires that we do not use Keras. Why, you might ask? I am not sure.
I have already implemented our model in TensorFlow 2.0 with Keras with ease, and now I want to do exactly the same thing without Keras.
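import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD, Adam
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)  # defined but not used: compile() below uses Adam()
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])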
X_train = X[:7000]
y_train = tf.keras.utils.to_categorical(y[:7000], num_classes=5)
X_dev = X[7000:]
y_dev = tf.keras.utils.to_categorical(y[7000:], num_classes=5)
model.fit(X_train, y_train,
epochs=100,
batch_size=128)
score = model.evaluate(X_dev, y_dev, batch_size=128)
print(score)
Here is my problem: whenever I look up the TensorFlow 2.0 documentation, even the guides on custom training use Keras.
As I understand it, placeholders and sessions are a thing of the past in TensorFlow 2.0, so I am a bit unsure how to structure the code.
I can create tensor objects, and I have the impression that I need to use eager execution and gradient tape, but I am still unsure how to put these pieces together.
So my question is: where should I look to get a better understanding? Which direction has the greatest descent?
Please do tell me if I am doing this Stack Overflow post wrong; it is my first time here.
As @Daniel Möller stated, there are tutorials for custom training and custom layers on the official TensorFlow page. As stated on the custom training page:
This tutorial used tf.Variable to build and train a simple linear model.
There is also this blog that creates custom layers and training without the Keras API. You can check this code on Google Colab, which uses CIFAR-10 with custom layers and training in the same manner.
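Following those guides, the Keras model from the question (784 -> 64 -> 64 -> 5 with Dropout(0.5)) can be rebuilt using only tf.Variable, tf.GradientTape and tf.nn ops. This is a minimal sketch assuming TensorFlow 2.x: the random X/y tensors are hypothetical stand-ins for the course data, the helper names are made up for illustration, and plain gradient descent via `assign_sub` replaces Adam, because the optimizers exposed under `tf.optimizers` are the Keras ones.

```python
import math
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical data with the question's shapes: 784 features, 5 classes
num_features, num_hidden, num_classes = 784, 64, 5
X = tf.random.normal((256, num_features))
labels = tf.random.uniform((256,), maxval=num_classes, dtype=tf.int32)
y = tf.one_hot(labels, num_classes)

def glorot_variable(fan_in, fan_out):
    # Same scheme as Keras' default 'glorot_uniform' initializer
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return tf.Variable(tf.random.uniform((fan_in, fan_out), -limit, limit))

# Weights and biases for 784 -> 64 -> 64 -> 5
params = [
    glorot_variable(num_features, num_hidden), tf.Variable(tf.zeros([num_hidden])),
    glorot_variable(num_hidden, num_hidden), tf.Variable(tf.zeros([num_hidden])),
    glorot_variable(num_hidden, num_classes), tf.Variable(tf.zeros([num_classes])),
]

def forward(x, training=False):
    w1, b1, w2, b2, w3, b3 = params
    h = tf.nn.relu(x @ w1 + b1)
    if training:
        h = tf.nn.dropout(h, rate=0.5)  # Dropout(0.5), active only while training
    h = tf.nn.relu(h @ w2 + b2)
    if training:
        h = tf.nn.dropout(h, rate=0.5)
    return h @ w3 + b3                  # logits; softmax is applied inside the loss

def loss_fn(x, y_true, training=False):
    logits = forward(x, training)
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))

learning_rate = 0.05

def train_step(x, y_true):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        loss = loss_fn(x, y_true, training=True)
    grads = tape.gradient(loss, params)
    for p, g in zip(params, grads):
        p.assign_sub(learning_rate * g)  # plain gradient-descent update
    return loss

initial_loss = float(loss_fn(X, y))
for epoch in range(100):
    # Mini-batches of 128, like batch_size=128 in model.fit
    for start in range(0, 256, 128):
        train_step(X[start:start + 128], y[start:start + 128])
final_loss = float(loss_fn(X, y))
print(initial_loss, final_loss)
```

The `training` flag plays the role that Keras handles automatically: dropout is applied during `train_step` but not when the loss is evaluated.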

Keras trained regression model predicts same output for all set of test features

I am trying to build a regression model that predicts the 'Ratings' for movies using the dataset https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset. However, after training the model, it outputs the same prediction for all test features. I have read previous similar questions that suggested adjusting the learning rate and the number of features, and checking that the model used for prediction is the same as the trained model. None of these has worked for me.
I load the data and process it:
links= pd.read_csv('../input/movie-lens-small-latest-dataset/links.csv')
movies=pd.read_csv('../input/movie-lens-small-latest-dataset/movies.csv')
...
dataset=movies.merge(ratings,on='movieId').merge(tags,on='movieId').merge(links,on='movieId')
to_drop=['title','genres','timestamp_x','timestamp_y','userId_y','imdbId','tmdbId']
dataset.drop(columns=to_drop,inplace=True)
dataset=pd.get_dummies(dataset)
The code below shows how I build the regression model. I have tried adjusting the number of neurons and layers; however, that has not influenced the output.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(13, input_dim=1586, kernel_initializer='zero', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal',activation='linear'))
# Compile model
adam = Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=adam,metrics=['mse','mae'])
model.summary()
history = model.fit(train_dataset,train_labels,batch_size=30, epochs=10,verbose=1, validation_split=0.3)
score = model.evaluate(validation_dataset,validation_labels)
print("Test score:", score)
Whenever I try to predict on the test dataset:
model.predict(test_dataset)
it predicts the value 3.97 for all samples, whereas I am expecting a range of values between 0 and 5.
You should never (I mean, never) use kernel_initializer='zero' - to be honest, I am surprised that the option even exists in Keras!
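A small NumPy sketch shows why an all-zero first-layer kernel is so damaging with ReLU: every hidden pre-activation is 0 for every input, ReLU outputs 0, and because the ReLU gradient at 0 is taken as 0 (as TensorFlow does), the first-layer weights and biases receive zero gradient and never move. The 13/6/1 layer sizes follow the question; the inputs, targets and the 20-feature width are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 20))        # hypothetical: 8 samples, 20 features
y = rng.uniform(0, 5, size=(8, 1))  # hypothetical ratings in [0, 5]

# Layer sizes as in the question (13 -> 6 -> 1); the first kernel is all zeros
W1, b1 = np.zeros((20, 13)), np.zeros(13)
W2, b2 = rng.normal(size=(13, 6)), np.zeros(6)
W3, b3 = rng.normal(size=(6, 1)), np.zeros(1)

# Forward pass
z1 = X @ W1 + b1
h1 = np.maximum(z1, 0)              # ReLU: all zeros, whatever X is
z2 = h1 @ W2 + b2
h2 = np.maximum(z2, 0)
out = h2 @ W3 + b3                  # linear output

# Backward pass for the MSE loss
d_out = 2 * (out - y) / len(X)
d_z2 = (d_out @ W3.T) * (z2 > 0)    # ReLU derivative, 0 at z == 0
d_z1 = (d_z2 @ W2.T) * (z1 > 0)
grad_W1 = X.T @ d_z1
grad_b1 = d_z1.sum(axis=0)

print(out.max() - out.min())                          # 0.0: same prediction for every sample
print(np.abs(grad_W1).max(), np.abs(grad_b1).max())   # 0.0 0.0: the first layer never updates
```

In this setup the only parameter that still gets a gradient is the output bias, so training can only move the model toward a single constant (for MSE, the mean of the training targets), which is consistent with the identical 3.97 seen in the question.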
Also, kernel_initializer='normal' is not recommended.
As a first step, remove all kernel_initializer arguments, so as to revert to the default and recommended one, kernel_initializer='glorot_uniform'; keep in mind that defaults are there for a reason (usually they work well), and you should change them only if you really have a reason to do so (which I trust you don't have here) and you know what you are doing.
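For reference, 'glorot_uniform' draws each kernel weight from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). Below is a NumPy sketch of that rule (not Keras's own code) for the question's first layer, Dense(13, input_dim=1586); the seed is arbitrary.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: the range shrinks as the layer gets wider
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)), limit

rng = np.random.default_rng(42)
W1, limit = glorot_uniform(1586, 13, rng)
print(W1.shape, round(float(limit), 4))  # (1586, 13) 0.0613
```

Unlike 'zero', every unit starts from different random weights, so the units receive different gradients and can learn different features.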
If you still don't get what you would expect, experiment with other parameters (number of layers/neurons, more epochs, etc.); you should leave the learning rate (lr) of the Adam optimizer as is for starters (it's also one of those default values that seem to work nicely across cases).

How can I get weights converged in a way that MSE minimizes?

Here is my code:
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from sklearn.metrics import mean_squared_error  # assumed source of mean_squared_error(y_true, y_pred)
import matplotlib.pyplot as plt
for _ in range(5):
    K.clear_session()
    model = Sequential()
    model.add(LSTM(256, input_shape=(None, 1)))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='RmsProp', metrics=['accuracy'])
    hist = model.fit(x_train, y_train, epochs=20, batch_size=64, verbose=0, validation_data=(x_val, y_val))
    p = model.predict(x_test)
    print(mean_squared_error(y_test, p))
    plt.plot(y_test)
    plt.plot(p)
    plt.legend(['testY', 'p'], loc='upper right')
    plt.show()
Total params: 330,241; samples: 2264.
Below is the result. I haven't changed anything; I only ran the for loop.
As you can see in the picture, the MSE is huge in some runs, even though I only reran the for loop.
I think the fundamental reason for this problem is that the optimizer cannot find the global minimum, and instead finds a local minimum and converges there. My reason is that, in all the loss graphs, the loss no longer decreases significantly (after 20 epochs). So, to solve this problem, I have to find the global minimum. How should I do this?
I tried adjusting batch_size and the number of epochs. I also tried changing the hidden layer size and the number of LSTM units, adding a kernel_initializer, changing the optimizer, etc., but could not get any meaningful result.
I wonder how can I solve this problem.
Your valuable opinions and thoughts will be very much appreciated.
If you want to see the full source, here is the link: https://gist.github.com/Lay4U/e1fc7d036356575f4d0799cdcebed90e
From your example, the problem simply comes from the fact that you have over 100 times more parameters than you have samples. If you reduce the size of your model, you will see less variance.
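The parameter count in the question can be checked by hand. A Keras LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, so it holds 4 * (units * (input_dim + units) + units) weights, while a Dense layer holds inputs * units + units (Dropout has none). A short Python sketch of that arithmetic:

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    # kernel + bias
    return inputs * units + units

total = lstm_params(256, 1) + dense_params(256, 256) + dense_params(256, 1)
samples = 2264
print(total, total / samples)  # 330241, roughly 146 parameters per training sample
```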
The wider question you are asking is a very interesting one that usually isn't covered in tutorials. Nearly all machine learning models are stochastic by nature: the output predictions will change slightly every time you train one, which means you will always have to ask the question: which model do I deploy to production?
Off the top of my head there are two things you can do:
Choose the first model trained on all the data (after cross-validation, ...)
Build an ensemble of models that all have the same hyper-parameters and implement a simple voting strategy
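For a regression model like this one, a "voting" strategy amounts to averaging the predictions of the models. A NumPy sketch with hypothetical predictions from five training runs (random noise around the targets) shows why this helps: because the squared error is convex, the MSE of the averaged prediction can never exceed the average MSE of the individual runs.

```python
import numpy as np

rng = np.random.default_rng(1)
y_test = np.sin(np.linspace(0, 6, 100))  # hypothetical targets
# Hypothetical predictions from 5 runs with the same hyper-parameters
runs = [y_test + rng.normal(scale=0.3, size=y_test.shape) for _ in range(5)]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

individual = [mse(y_test, p) for p in runs]
ensemble = mse(y_test, np.mean(runs, axis=0))  # average the 5 predictions point by point
print(np.mean(individual), ensemble)  # the ensemble error is the smaller of the two
```

How much the ensemble gains in practice depends on how correlated the errors of the individual runs are; the noise here is independent by construction.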
References:
https://machinelearningmastery.com/train-final-machine-learning-model/
https://machinelearningmastery.com/randomness-in-machine-learning/
If you want to always start from the same point, you should set the random seeds. If you use the TensorFlow backend in Keras (TensorFlow 1.x), you can do it like this:
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed  # TensorFlow 1.x only; in TensorFlow 2.x use tf.random.set_seed(2)
set_random_seed(2)
If you want to learn why you get different results from ML/DL models, I recommend this article.

LSTM accuracy too low

I'm a beginner; I'm trying to get the accuracy and validation accuracy using the code below:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from matplotlib import pyplot
model = Sequential()
model.add(LSTM(10, input_shape=(train_X.shape[1], train_X.shape[2])))
#model.add(Dropout(0.2))
#model.add(LSTM(30, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))  # return_sequences is an LSTM argument, not an argument of model.add()
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=120, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
print(history.history['acc'])  # the key is 'accuracy' in newer Keras versions
Although the loss value is very low (around 0.0136), I'm getting an accuracy of 6.9% and a validation accuracy of 2.3%, which is very low. Why?
That is because accuracy is meaningful only for classification problems; for regression (i.e. numeric prediction) ones, such as yours, accuracy is meaningless.
What's more, Keras unfortunately will not "protect" you or any other user from putting such meaningless requests in your code, i.e. you will not get any error, or even a warning, that you are attempting something that does not make sense, such as requesting the accuracy in a regression setting; see my answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)? for more details and a practical demonstration (the argument is identical in the case of MAE instead of MSE, since both loss functions signify regression problems).
In regression settings, the performance metric is usually the same as the loss (here MAE), so you should just remove the metrics=['accuracy'] argument from your model compilation and worry only about your loss (which, as you say, is low indeed).
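A NumPy sketch with hypothetical scaled targets illustrates why accuracy says nothing here: predictions that are off by only 0.01 give a tiny MAE, yet an accuracy-style metric, which counts matches (either exact matches, or predictions thresholded at 0.5 compared with the targets), is zero, because continuous targets are almost never hit exactly.

```python
import numpy as np

y_true = np.array([0.12, 0.35, 0.58, 0.81, 0.27])  # hypothetical scaled targets
y_pred = y_true + 0.01                            # every prediction off by only 0.01

mae = float(np.mean(np.abs(y_pred - y_true)))
exact_match = float(np.mean(y_pred == y_true))
# Thresholding at 0.5 yields 0 or 1, which can only match targets that are exactly 0 or 1
thresholded_match = float(np.mean((y_pred > 0.5).astype(float) == y_true))

print(mae, exact_match, thresholded_match)  # MAE about 0.01, both "accuracies" 0.0
```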
