How to interpret and transform the values predicted by a Keras classifier? - python

I'm training a Keras model to predict whether, given the provided parameters, a shot will be made or not, represented so that 0 means no and 1 means yes. However, when I try to predict, I get float values instead.
I've tried feeding in data identical to the training data so as to get 1, but that does not work.
I used the data below and followed this tutorial for the one-hot encoding:
https://github.com/eijaz1/Deep-Learning-in-Keras-Tutorial/blob/master/keras_tutorial.ipynb
import pandas as pd
from keras.utils import to_categorical
from keras.models import load_model
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
#read in training data
train_df_2 = pd.read_csv('diabetes_data.csv')
#view data structure
train_df_2.head()
#create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])
#check that the target variable has been removed
train_X_2.head()
#one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)
#check that the target column has been converted
train_y_2[0:5]
#create model
model_2 = Sequential()
#get number of columns in training data
n_cols_2 = train_X_2.shape[1]
#add layers to model
model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
#compile model using accuracy to measure model performance
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])
train_dft = pd.read_csv('diabetes_data - Copy.csv')
train_dft.head()
test_y_predictions = model_2.predict(train_dft)
print(test_y_predictions)
I wanted to get
[[0,1]
[1,0]]
However, I am getting
[[0.8544417 0.14555828]
[0.9312985 0.06870154]]
Additionally, can anyone explain to me what the value 0.8544417 means?

Actually, you may interpret the output of a model with a softmax classifier on top as confidence scores or class probabilities (the softmax function normalizes the values so that they are positive and sum to 1). So when you provide the model with a true label of [1, 0], you are saying this sample belongs to class 1 with probability 1 and to class 2 with probability 0. During training, the optimization process gets as close as possible to that label, but it will never exactly reach [1, 0] (due to softmax it might get as close as [0.999999, 0.000001], but never [1, 0]).
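To see what softmax does, here is a minimal NumPy sketch (the logits vector is just an illustrative pair of raw scores, not taken from your model):
import numpy as np
logits = np.array([2.0, 0.5])        # hypothetical raw outputs of the last layer
exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs = exp / exp.sum()              # normalize: positive values that sum to 1
print(probs)                         # [0.81757448 0.18242552]
print(probs.sum())                   # 1.0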
But that is not a problem, because we only need to get close enough: we take the class with the highest probability as the model's prediction. And you can easily do that by finding the index of the class with maximum probability:
import numpy as np
preds = model.predict(some_data)
class_preds = np.argmax(preds, axis=-1) # index of the max probability, e.g. [0.85, 0.15] gives 0, [0.15, 0.85] gives 1
Further, if you want to convert the predictions to either [0,1] or [1,0] for any reason, you can just round the values:
import numpy as np
preds = model.predict(some_data)
round_preds = np.around(preds) # this would convert [0.87, 0.13] to [1., 0.]
Note: rounding only works properly with two classes, not when you have more than two (e.g. [0.3, 0.4, 0.3] would become [0, 0, 0] after rounding).
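A quick sketch of that failure mode (probs here is a hypothetical three-class prediction):
import numpy as np
probs = np.array([[0.3, 0.4, 0.3]])  # hypothetical three-class softmax output
print(np.around(probs))              # [[0. 0. 0.]] -- rounding selects no class
print(np.argmax(probs, axis=-1))     # [1] -- argmax still picks the most likely class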
Note 2: Since you are creating the model with the Sequential API of Keras, as an alternative to the argmax approach described above you can directly use model.predict_classes(some_data), which gives the exact same output (though note that predict_classes has been deprecated and removed in recent versions of TensorFlow/Keras, so the argmax approach is the more future-proof one).

Related

Why am I getting a constant loss and accuracy?

This is my code:
# Importing the essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Getting the dataset
data = pd.read_csv("sales_train.csv")
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
# y = np.array(y).reshape(-1, 1)
# Getting the values for november 2013 and 2014 to predict 2015
list_of_november_values = []
list_of_november_values_y = []
for i in range(0, len(y)):
    if X[i, 0] == 10 or X[i, 0] == 22:
        list_of_november_values.append(X[i, 1:])
        list_of_november_values_y.append(y[i])
# Converting list to array
arr_of_november_values = np.array(list_of_november_values)
y_train = np.array(list_of_november_values_y).reshape(-1, 1)
# Scaling the independent values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(arr_of_november_values)
# Creating the neural network
from keras.models import Sequential
from keras.layers import Dense
nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1, activation='softmax'))
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)
# Saving the weights
nn.save_weights('weights.h5')
print("Weights Saved")
For my loss, I am getting the same value for every epoch. Is there a concept I am missing that is causing my loss to be constant?
Here is the dataset for the code.
The predominant reason is your odd choice of final-layer activation, paired with the loss function used. Reconsider this: you are using softmax activation on a single-unit fully-connected layer. Softmax takes a vector and scales it so that its values are positive and sum to one, preserving their proportions, according to the function softmax(x_i) = exp(x_i) / sum_j exp(x_j).
With a single unit, the denominator equals the numerator, so your network will only ever output 1; thus there are no gradients, and no learning.
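A minimal sketch demonstrating the problem (the input here is arbitrary random data, purely for illustration):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# a single-unit softmax layer: exp(x) / exp(x) == 1 for every input
m = Sequential([Dense(units=1, activation='softmax', input_shape=(5,))])
print(m.predict(np.random.rand(3, 5)))  # [[1.], [1.], [1.]] -- constant output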
To resolve this, change your final-layer activation to either ReLU or linear, depending on the structure of your dataset (I'm not going to work with the provided data myself, but I'm sure you understand your dataset's structure).
I expect there may be further issues regarding the structure of your network, but I'll leave that up to you. For now, the big issue is your final-layer activation.
Change this line:
nn.add(Dense(units=1, activation='softmax'))
To this line:
nn.add(Dense(units=1))
For a regression problem, you don't need an activation function on the output layer (Dense defaults to a linear activation).

Tensorflow model predicting Nans

I am new to the TensorFlow framework and I am trying to use it to predict survivors in this Titanic dataset: https://www.kaggle.com/c/titanic/data.
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#%%
titanictrain = pd.read_csv('train.csv')
titanictest = pd.read_csv('test.csv')
df = pd.concat([titanictrain,titanictest],join='outer',keys='PassengerId',sort=False,ignore_index=True).drop(['Name'],1)
#%%
def preprocess(df):
    df['Fare'].fillna(value=df.groupby('Pclass')['Fare'].transform('median'), inplace=True)
    df['Fare'] = df['Fare'].map(lambda x: np.log(x) if x > 0 else 0)
    df['Embarked'].fillna(value=df['Embarked'].mode()[0], inplace=True)
    df['CabinAlphabet'] = df['Cabin'].str[0]
    categories_to_one_hot = ['Pclass', 'Sex', 'Embarked', 'CabinAlphabet']
    df = pd.get_dummies(df, columns=categories_to_one_hot, drop_first=True)
    return df
df = preprocess(df)
df = df.drop(['PassengerId','Ticket','Cabin','Survived'],1)
titanic_trainandval = df.iloc[:len(titanictrain)]
titanic_test = df.iloc[len(titanictrain):] #test after preprocessing
titanic_test.head()
# split train into training and validation set
labels = titanictrain['Survived']
y = labels.values
test = titanic_test.copy() # real test sets
print(len(test), 'test examples')
Here I am trying to apply preprocessing to the data:
1. Drop the Name column and one-hot encode categorical features on both the train and test sets.
2. Drop ['PassengerId','Ticket','Cabin','Survived'] for simplicity.
3. Split train and test following the original order.
Here is a picture showing what the training set looks like.
"""# model training"""
from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model
X = titanic_trainandval.copy()
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(10, activation='relu')(input_layer)
dense_layer_2 = Dense(5, activation='relu')(dense_layer_1)
output = Dense(1, activation='softmax',name = 'predictions')(dense_layer_2)
model = Model(inputs=input_layer, outputs=output)
base_learning_rate = 0.0001
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), metrics=['acc'])
history = model.fit(X, y, batch_size=5, epochs=20, verbose=2, validation_split=0.1,shuffle = False)
submission = pd.DataFrame()
submission['PassengerId'] = titanictest['PassengerId']
Then I fit the model on the training set X to get the result. However, the training history shows that no matter how I change the learning rate and batch size, the result does not change: the loss is always nan, and the predictions on the test set are nan as well.
Could anybody explain where the problem is and give some possible solutions?
At first glance there are two major problems in your code:
Your output layer must be Dense(2, activation='softmax'). This is because yours is a binary classification problem, and if you are using softmax to generate probabilities the output dimension must equal the number of classes. (Alternatively, you can use one output dimension with sigmoid activation.)
You have to change your loss function accordingly: with softmax and an integer-encoded target, use sparse_categorical_crossentropy. (With sigmoid you can use binary_crossentropy, whose default is from_logits=False.)
PS: make sure to remove all NaNs from your original data before starting fit.
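A minimal sketch of the sigmoid variant of those fixes, reusing the variables and imports from the question (dropping NaN rows is the bluntest way to satisfy the PS; imputing missing values such as Age is usually better):
# drop rows that still contain NaNs (e.g. missing Age); imputation is gentler
mask = X.notna().all(axis=1)
X_clean, y_clean = X[mask], y[mask.values]
input_layer = Input(shape=(X_clean.shape[1],))
dense_layer_1 = Dense(10, activation='relu')(input_layer)
dense_layer_2 = Dense(5, activation='relu')(dense_layer_1)
output = Dense(1, activation='sigmoid', name='predictions')(dense_layer_2)  # sigmoid, not softmax
model = Model(inputs=input_layer, outputs=output)
model.compile(loss='binary_crossentropy',  # from_logits=False is the default
              optimizer=tf.keras.optimizers.Adam(0.0001),
              metrics=['acc'])
history = model.fit(X_clean, y_clean, batch_size=5, epochs=20, verbose=2, validation_split=0.1)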
Marco Cerliani is right on points 1 and 2.
The real reason you get NaNs is that you feed NaNs into your model. If you look carefully, even in your third photo, the 888th example in the Age column contains a NaN.
Solve that, apply Marco Cerliani's suggestions, and you're good to go.
Apart from the above answers, one more thing I would like to add: whenever you want to use from_logits=True for classification problems, use a linear activation, i.e. activation='linear', which is the default value for the activation function in the last layer.

How do I classify new input data using a trained model in Keras (Python)?

I am training on my data set with the following code.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
np.random.seed(5)
dataset = np.loadtxt('path to dataset', delimiter=',')
x_train=dataset[:700,0:3]
y_train=dataset[:700,3]
x_test=dataset[700:,0:3]
y_test=dataset[700:,3]
model =Sequential()
model.add(Dense(12, input_dim=3, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train,y_train,epochs = 100, batch_size =24)
score = model.evaluate(x_test,y_test)
The training data above consists of three attributes and two classes.
I would like to use the trained model to classify new data that does not include the class label.
For example, one training sample is (1, 34, 23, 0).
The data I want to classify is (3, 56, 33), and I want to predict its class
using the trained model and this attribute information.
I would be very grateful if you could give me a concrete method.
You can use the model.predict() function to determine the class of any new data point. In your case, if you want to predict the class of (3, 56, 33), you can write this after fitting the model:
model.fit(x_train,y_train,epochs = 100, batch_size =24)
prediction_prob = model.predict(np.array([[3, 56, 33]]))  # note the nested brackets: predict expects a batch of samples
Since you are using a sigmoid activation in the final layer, your output will be a single probability value between 0 and 1. You can then threshold the prediction to get the final class value.
prediction_prob = model.predict(np.array([[3, 56, 33]]))
class_name = int(round(prediction_prob[0][0]))  # threshold the probability at 0.5
print(class_name)
Remember that model.predict() takes a 2-D NumPy array of samples as input and outputs another NumPy array of predictions.
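And a small sketch of predicting several new points at once (the attribute values here are made up for illustration):
import numpy as np
new_points = np.array([[3, 56, 33],
                       [1, 34, 23]])         # one row per sample
probs = model.predict(new_points)            # shape (2, 1): one sigmoid output each
classes = (probs > 0.5).astype(int).ravel()  # threshold each probability at 0.5
print(classes)                               # e.g. [1 0]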

Customize Keras' loss function in a way that the y_true will depend on y_pred

I'm working on a multi-label classifier. I have many output labels [1, 0, 0, 1...] where 1 indicates that the input belongs to that label and 0 means otherwise.
In my case the loss function I use is based on MSE. I want to change the loss function so that when an output label is -1, it is replaced by the predicted probability of that label (which makes its error term zero).
The scenario is: when an output label is -1, I want its MSE term to be zero. For example (reconstructing the formulas from the attached images), with true labels [1, -1, 0] and predictions [p1, p2, p3], instead of
loss = (1 - p1)^2 + (-1 - p2)^2 + (0 - p3)^2
I want it to change to
loss = (1 - p1)^2 + 0 + (0 - p3)^2
so that the MSE of the second label (the middle output) is zero (this is a special case where I don't want the classifier to learn about this label).
This feels like a commonly needed method, and I don't believe I'm the first to think of it, so first I'd like to know whether there's a name for this way of training a neural net, and second, how I can do it.
I understand that I need to change the loss function, but I'm a newbie to Theano and not sure how to test a tensor for a specific value or how to change its contents.
I believe this is what you are looking for (this technique is usually called masking the loss).
import theano
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential
def customized_loss(y_true, y_pred):
    # where y_true == -1, that term contributes zero to the loss
    loss = K.switch(K.equal(y_true, -1), K.zeros_like(y_true), K.square(y_true - y_pred))
    return K.sum(loss)
if __name__ == '__main__':
    model = Sequential([Dense(3, input_shape=(4,))])
    model.compile(loss=customized_loss, optimizer='sgd')
    import numpy as np
    x = np.random.random((1, 4))
    y = np.array([[1, -1, 0]])
    output = model.predict(x)
    print(output)
    # [[ 0.47242549 -0.45106074  0.13912249]]
    print(model.evaluate(x, y))  # keras's loss
    # 0.297689884901
    print((output[0, 0] - 1)**2 + 0 + (output[0, 2] - 0)**2)  # double-check
    # 0.297689929093

How to decide the size of layers in Keras' Dense method?

Below is a simple example of a multi-class classification task with the IRIS data.
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                kernel_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the number of epochs the accuracy will increase until it
# drops at a certain point: epoch 50 gives accuracy 0.99, dropping to 0.977
# by epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, epochs=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                kernel_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
Where first parameter of Dense is 16 and second is 3.
Why do the two layers use different values for Dense?
How do we choose what's the best value for Dense?
Basically it is just trial and error. These are called hyperparameters and should be tuned on a validation set (split your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide a range and steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time.
Random search: Do the same, but just define a range for each parameter and try a random set of parameters drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess.
Unfortunately there is no other way to tune such parameters. About layers having different numbers of neurons: that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
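For illustration, here is a minimal sketch of random search over the size of the hidden layer for the IRIS model above (build_model, the trial count, and the 8-64 range are arbitrary choices of mine; it reuses train_X, train_y_ohe, test_X and test_y_ohe from the question, though in practice you would score on a separate validation split, not the test set):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
def build_model(hidden_units, n_features=4, n_classes=3):
    # same shape as the question's model, minus dropout/regularization
    model = Sequential()
    model.add(Dense(hidden_units, input_shape=(n_features,), activation='tanh'))
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
best_acc, best_units = 0.0, None
for _ in range(10):                      # 10 random trials
    units = np.random.randint(8, 65)     # sample the layer size uniformly
    model = build_model(units)
    model.fit(train_X, train_y_ohe, epochs=20, batch_size=8, verbose=0)
    _, acc = model.evaluate(test_X, test_y_ohe, verbose=0)
    if acc > best_acc:
        best_acc, best_units = acc, units
print(best_units, best_acc)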
There is no known way to determine a good network structure just from the number of inputs or outputs. It depends on the number of training examples, the batch size, the number of epochs, basically on every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a lower number of units can cause a model to have high bias and low accuracy. Once again, it depends on the size of the data used for training.
Sadly, it comes down to trying different values and seeing which give you the best adjustments. You may choose the combination that gives you the lowest loss and validation loss, as well as the best accuracy for your dataset, as said in the previous answer.
You could set the number of units in proportion to the number of classes, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a categorisation AI system. num_classes is the number of different categories the system has to choose from. For instance, in the IRIS dataset used above, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than other values. We need to adjust the parameters to the training dataset by making several different tries and then analysing the results in search of the best combination of parameters.
My suggestion is to use EarlyStopping(), then check the number of epochs it actually ran and the accuracy alongside the test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
rlp = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.8, min_lr=1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size=128, validation_split=0.1, verbose=1, callbacks=[rlp, es])
