I'm currently working on a text classification problem that requires us to classify text into one of four labels. After encoding, each y-value should be one of [0, 1, 2, 3], i.e. the predicted label.
However, the predictions this model makes seem to range over (0, 1), and I'm a bit confused. Also, can anyone clarify whether this is an ANN or an RNN? I have zero experience with TensorFlow and am still struggling...
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
from sklearn.preprocessing import LabelEncoder

# encode the labels
label_encoder = LabelEncoder()
y_train = np.array(label_encoder.fit_transform(train_labels))
x_train = np.array(train_features)
# use transform (not fit_transform) so dev labels share the training encoding
y_true = np.array(label_encoder.transform(dev_label))

# fit the model
model.fit(x_train, y_train, epochs=1)
y_pred = model.predict(dev_features)
And the error message: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
Let's say that the target column has 4 unique values: red, blue, green, yellow and the corpus is converted to TF-IDF values. The first 3 rows look like this:
word_1    word_2    target
0.567     0.897     red
0.098     0.238     blue
0.66      0.786     green
One-Hot Encoding
After one-hot encoding the target, your target looks like an array of the form:
array([[1., 0., 0., 0.],   <- category 'red'
       [0., 1., 0., 0.],   <- category 'blue'
       [0., 0., 1., 0.],   <- category 'green'
       ...])
Here, the target has shape (n_samples, n_classes), which is (n, 4). In this case the final activation should be softmax (sigmoid only if the classes are not mutually exclusive, i.e. a multi-label problem) and you train the model with the categorical_crossentropy loss. The code answering your question would be:
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
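For reference, a minimal sketch of producing such one-hot targets from integer-encoded labels (like the y_train produced by LabelEncoder in the question):
import numpy as np
from tensorflow.keras.utils import to_categorical

y_int = np.array([0, 1, 2, 0, 3])                  # hypothetical integer-encoded labels (0..3)
y_onehot = to_categorical(y_int, num_classes=4)    # shape (n_samples, 4)
print(y_onehot)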
Label-Encoding
After label-encoding the target, your target looks like an array of the form:
array([1, 2, 3 ...])
i.e. a 1D array of shape (n_samples,). Here the code will be:
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Prediction
These numbers are the probabilities of each class for the given input sample. For example, [[0.4846592 0.5153408]] means that the given sample belongs to class 0 with a probability of around 0.48 and to class 1 with a probability of around 0.52. You want the class with the highest probability, so use np.argmax to find which index (i.e. 0 or 1) holds the maximum value:
import numpy as np
pred_class = np.argmax(y_pred, axis=-1)
Further, this has nothing to do with the loss function of the model. These probabilities are given by the last layer of your model, which very likely uses softmax as the activation function to normalize the output into a probability distribution.
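If the integer classes came from a LabelEncoder (as in the question), you can then map the predicted indices back to the original string labels; a minimal sketch, assuming label_encoder is the fitted encoder from the question:
pred_labels = label_encoder.inverse_transform(pred_class)   # e.g. array(['red', 'blue', ...])
print(pred_labels[:10])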
Conclusion
The error you are getting happens when you pass the raw probability outputs of model.predict to a classification metric; take np.argmax of the predictions first (see Prediction above). Beyond that, make sure the loss function matches how the target is encoded:
If you have a 1D integer-encoded or LabelEncoded target, you should use sparse_categorical_crossentropy as the loss function.
If you have one-hot encoded your target to a 2D shape (n_samples, n_classes), you should use categorical_crossentropy.
The final dense layer should have 4 units and its activation should be "softmax" rather than "sigmoid", since we are performing multi-class (more than 2 classes) classification.
Also, if your targets are one-hot encoded, change the loss function to "categorical_crossentropy" (keep "sparse_categorical_crossentropy" if they stay integer-encoded).
Your code sample will look like this:
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Related
I have a classification model (Keras) with an LSTM for a dataset with 4 attributes labeled into 2 classes (safe and unsafe). With a sigmoid in the last layer I got a better accuracy, 98%, than with softmax.
My questions are:
1) If I use softmax in the last layer:
With softmax I have 2 output neurons, so I can compare the two scores and say which class the data belongs to.
For example, score_safe = 1.2945 and score_unsafe = -9.0, so I can say this row of the dataset belongs to the safe class.
2) If I use sigmoid in the last layer:
Then I have to use just one output neuron, so how can I compare the scores and say which class this row of the dataset belongs to?
model = Sequential()
model.add(LSTM(256, input_shape=(x_train.shape[1:]), activation='tanh', return_sequences=True))
#model.add(BatchNormalization())
model.add(Dense(128, activation='tanh'))
#model.add(BatchNormalization())
model.add(Dense(128, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
The output of a sigmoid is a single float between 0. and 1.
Typically, it is set such that if the output is below 0.5 the model is classifying as the first class (whichever class is represented as a 0 in your dataset). If the output is above 0.5 the model is classifying as the second class (represented as a 1 in your dataset).
The 0.5 threshold can be varied to introduce a bias toward one or the other class.
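A minimal sketch of turning the sigmoid output into a class label (x_test here is just a hypothetical batch of test inputs; the 0.5 threshold is the usual default and can be tuned as mentioned above):
import numpy as np

probs = model.predict(x_test)                    # shape (n_samples, 1), values in (0, 1)
pred_class = (probs > 0.5).astype(int).ravel()   # 0 -> first class, 1 -> second class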
I have trained a deep learning model based on a Bidirectional LSTM and a dense output layer. It is quite confusing which output probability obtained via model.predict(x) matches which of my actual labels (one-hot encoded). Moreover, the model.predict_classes(x) outputs (0, 1, 2) are also confusing. How can I relate these outputs to my original labels? Below is my code snippet for reference:
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.5))
model.add(Bidirectional(tf.keras.layers.LSTM(250, return_sequences=True,activation='tanh')))
model.add(Bidirectional(tf.keras.layers.LSTM(250)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
#model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
print(model.summary())
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), batch_size=32, epochs=10)
model.predict_classes(test_doc)
model.predict(test_doc)
Could someone please help me relate the outputs to the actual labels in this sequential model?
Assuming that you have made a one hot encoding for your labels (0, 1, 2), you will have vectors as output of your model.
So, for example, if you have an instance with class 0, your target vector will be:
[1, 0, 0]
if you have an instance with class 1, your target vector will be:
[0, 1, 0]
if you have an instance with class 2, your target vector will be:
[0, 0, 1]
The .predict method will give you a probability for every class in your target. Because you have 3 classes (0, 1, 2), you'll get a vector of size three with three probabilities:
model.predict(x) # vector of size 3 with 3 probabilities
Something like this:
#class0, class1, class2
[0.31, 0.4, 0.29]
And these probabilities will sum to 1 because you have used the softmax activation function.
The .predict_classes method will select the class with the highest probability from that vector and return its index.
So if you have a probabilities vector of:
#class0, class1, class2
[0.31, 0.4, 0.29]
you'll get 1, because the maximum value in the vector is at index 1 of the target vector, which represents class 1.
PS. You can skip one-hot encoding the target yourself (i.e. without using pd.get_dummies(df['Sentiment']).values) by changing the loss to 'sparse_categorical_crossentropy' and passing the integer labels directly:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
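Note that in recent TensorFlow/Keras versions predict_classes is no longer available on Sequential models; taking np.argmax over model.predict gives the same class indices:
import numpy as np

probs = model.predict(test_doc)           # shape (n_samples, 3), rows sum to 1
pred_class = np.argmax(probs, axis=-1)    # integer class indices 0, 1 or 2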
I've just begun using Keras to train a simple DNN and I'm struggling to set up my custom loss function. Here's the code of the model:
X_train = train_dataframe.to_numpy()[:, 0:4]
Y_train = train_dataframe.to_numpy()[:, 4]
model = Sequential()
model.add(Dense(1000, input_shape=(4,), activation='relu'))
model.add(Dense(1000, activation='relu'))
model.add(Dense(Y_train.shape[0], activation='linear', activity_regularizer=regularizers.l1(0.02)))
def custom_loss(y_true, y_pred):
    mse_loss = tf.keras.losses.mean_squared_error(y_true, np.ones((450, 4)) * y_pred)
    return mse_loss + y_pred
model.compile("adam", custom_loss(X_train, model.layers[2].output), metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=5, batch_size=1)
I will briefly explain. I have a training set of 450 samples with 4 features each as input, and a (450, 1) numerical vector paired with the training set.
Now, what I would like to obtain is a sort of LASSO regression by applying the activity regularizer on the last layer, and then building my custom loss function with an MSE between y_true (which is the input) and y_pred, which is not the raw output but the output-layer values multiplied by a (450, 4) matrix (for simplicity, filled with ones).
My problem is that I got this error when I run the script:
ValueError: Dimensions must be equal, but are 4 and 450 for 'mul' (op: 'Mul') with input shapes:
[450,4], [?,450].
Maybe it is because I'm not extracting the output-layer values correctly with model.layers[2].output. How can I do this properly using Keras?
I think you are making 2 crucial mistakes:
Don't pass arguments to the loss in .compile; Keras is smart enough to call it with y_true and y_pred for you:
model.compile(loss=custom_loss, optimizer='adam', metrics=["accuracy"])
If you want to apply a multiplication to the last layer's output, create a custom layer for that rather than doing it inside the loss function; the loss function's only job is to measure how far the predicted values are from the real ones. A minimal sketch of this idea follows below.
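A sketch of moving that multiplication into the model itself, assuming the intended computation is: multiply the 450-unit output by a constant (450, 4) ones matrix so it can be compared against the 4 input features with MSE:
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

n_samples, n_features = 450, 4  # sizes taken from the question

inputs = layers.Input(shape=(n_features,))
x = layers.Dense(1000, activation='relu')(inputs)
x = layers.Dense(1000, activation='relu')(x)
out = layers.Dense(n_samples, activation='linear',
                   activity_regularizer=regularizers.l1(0.02))(x)

# do the multiplication inside the model, not inside the loss:
# (batch, 450) @ (450, 4) -> (batch, 4), comparable with the 4 input features
ones = tf.ones((n_samples, n_features))
scaled = layers.Lambda(lambda t: tf.matmul(t, ones))(out)

model = models.Model(inputs, scaled)
model.compile(optimizer='adam', loss='mean_squared_error')
# model.fit(X_train, X_train, epochs=5, batch_size=1)  # y_true is the input itself, as in the question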
I have a dataset of weather forecasts and am trying to make a model that predicts which forecast will be more accurate the next day.
In order to do so, my y output has the form y=[1,0,1,0], because I have forecasts from 4 different organizations. A 1 means that this is the best forecast for the current record, and more than one 1 means that multiple forecasts were equally good.
My problem is that I want the model to train on these data but also learn that correctly predicting only one of the 1s is a 100% correct answer, since I only need one of the equally good forecasts as a result. I believe the way I am doing this 'shaves' accuracy off my evaluation. Is there a way to implement this in Keras? The architecture of the neural network is entirely experimental and there is no specific reason why I chose it. This is the code I wrote. My training dataset consists of 6463 rows × 505 columns.
model = Sequential()
model.add(LSTM(150, activation='relu',activity_regularizer=regularizers.l2(l=0.0001)))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(4, activation='softmax'))
#LSTM
# reshape input to be 3D [samples, timesteps, features]
X_train_sc =X_train_sc.reshape((X_train_sc.shape[0], 1, X_train_sc.shape[1]))
X_test_sc = X_test_sc.reshape((X_test_sc.shape[0], 1,X_test_sc.shape[1]))
#validation set
x_val=X_train.iloc[-2000:-1300,0:505]
y_val=y_train[-2000:-1300]
x_val_sc=scaler.transform(x_val)
# reshape input to be 3D for LSTM[samples, timesteps, features]
x_val_sc =x_val_sc.reshape((x_val_sc.shape[0], 1, x_val_sc.shape[1]))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
history= model.fit(x=X_train_sc, y=y_train ,validation_data=(x_val_sc,y_val), epochs=300, batch_size=24)
print(model.evaluate(X_test_sc,y_test))
yhat= model.predict(X_test_sc)
My accuracy is ~44%
If you want to make predictions of the form [1,0,1,0], i.e. the model should predict the probability of belonging to each of the 4 classes independently, then it is called multi-label classification. What you have coded is multi-class classification.
Multi-label classification
Your last layer will be a dense layer of size 4 (one unit per class) with sigmoid activation, and you will use a binary_crossentropy loss.
x = np.random.randn(100,10,1)
y = np.random.randint(0,2,(100,4))
model = keras.models.Sequential()
model.add(keras.layers.LSTM(16, activation='relu', input_shape=(10,1), return_sequences=False))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(4, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x,y)
Check
print (model.predict(x))
Output
array([[0.5196002 , 0.52978194, 0.5009601 , 0.5036485 ],
[0.508756 , 0.5189857 , 0.5022978 , 0.50169533],
[0.5213044 , 0.5254892 , 0.51159555, 0.49724004],
[0.5144601 , 0.5264933 , 0.505496 , 0.5008205 ],
[0.50524575, 0.5147699 , 0.50287664, 0.5021702 ],
[0.521035 , 0.53326863, 0.49642274, 0.50102305],
.........
As you can see, the probabilities in each prediction do not sum to one; rather, each value is the probability of the sample belonging to the corresponding class. So if a probability is > 0.5 you can say the sample belongs to that class (see the sketch below).
On the other hand, if you use softmax the probabilities sum to 1 and the sample is assigned to the single class with the highest probability.
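A small sketch of turning those independent per-class probabilities into 0/1 multi-label predictions (0.5 is just the usual default threshold):
import numpy as np

probs = model.predict(x)                       # shape (100, 4), each value in (0, 1)
multi_label_pred = (probs > 0.5).astype(int)   # rows like [1, 0, 1, 0]
print(multi_label_pred[:5])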
I have data like this:
There are 29 columns, out of which I have to predict winPlacePerc (at the extreme end of the dataframe), which ranges from 1 (high perc) to 0 (low perc).
Out of the 29 columns, 25 hold numerical data, 3 are IDs (object) and 1 is categorical.
I dropped all the ID columns (since they're all unique) and also one-hot encoded the categorical (matchType) data.
After doing all this I am left with 41 columns (after one-hot encoding).
This is how I am creating the data:
X = df.drop(columns=['winPlacePerc'])
#creating a dataframe with only the target column
y = df[['winPlacePerc']]
Now my X has 40 columns, and this is what my label data looks like:
> y.head()
winPlacePerc
0 0.4444
1 0.6400
2 0.7755
3 0.1667
4 0.1875
I also happen to have a very large amount of data, around 400k rows, so for testing purposes I am training on a fraction of it using scikit-learn:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.997, random_state=32)
which gives almost 13k rows for training.
For the model I'm using a Keras Sequential model:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras import optimizers
n_cols = X_train.shape[1]
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='Adam',
              metrics=['accuracy'])

model.fit(X_train, y_train,
          epochs=50,
          validation_split=0.2,
          batch_size=20)
Since my y-label data is between 0 and 1, I'm using a sigmoid layer as my output layer.
This is the training & validation loss & accuracy plot.
I also tried converting the label into binary using a step function together with a binary cross-entropy loss function;
after that, the y-label data looks like:
> y.head()
winPlacePerc
0 0
1 1
2 1
3 0
4 0
and changing the loss function:
model.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
This method was even worse than the previous one.
As you can see, it stops learning after a certain number of epochs, and this also happens even if I take all the data rather than a fraction of it.
After this did not work, I also used dropout and tried adding more layers, but nothing works here.
Now my question: what am I doing wrong here? Is it the wrong layers or something in the data, and how can I improve upon this?
To clear things up - this is a regression problem, so using accuracy doesn't really make sense, because you will never be able to predict the exact value of, say, 0.23124.
First of all, you certainly want to normalise your values (not the one-hot encoded ones) before passing them to the network. Try using a StandardScaler as a start.
Second, I would recommend changing the activation function in the output layer - try linear; and as a loss, mean_squared_error should be fine.
In order to validate your model's "accuracy", plot the predictions together with the actual values - this should give you a chance to validate the results visually (see the sketch after the code below). That being said, your loss already looks quite decent.
Check this post, it should give you a good grasp of which activation & loss functions to use and when.
from sklearn.preprocessing import StandardScaler
n_cols = X_train.shape[1]
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(n_cols,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
              optimizer='Adam',
              metrics=['mean_squared_error'])

model.fit(X_train, y_train,
          epochs=50,
          validation_split=0.2,
          batch_size=20)
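And, to validate the model visually as suggested above, a minimal plotting sketch (hypothetical, and note the test set should be scaled with ss.transform, not fit_transform):
import matplotlib.pyplot as plt

X_test_sc = ss.transform(X_test)               # reuse the scaler fitted on the training data
y_pred = model.predict(X_test_sc).ravel()

# predicted vs. actual: points close to the diagonal mean good predictions
plt.scatter(y_test.values.ravel(), y_pred, s=5, alpha=0.3)
plt.plot([0, 1], [0, 1], color='red')          # perfect-prediction line
plt.xlabel('actual winPlacePerc')
plt.ylabel('predicted winPlacePerc')
plt.show()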
Normalize data
Add more depth to your network
Make the last layer linear
Accuracy is not a good metric for regression. Let's look at an example:
predictions:  [0.9999999, 2.0000001, 3.000001]
ground truth: [1, 2, 3]
Accuracy = no. of correct / total = 0 / 3 = 0
Accuracy is 0, but the predictions are pretty close to the ground truth. On the other hand, the MSE will be very low, indicating that the deviation of the predictions from the ground truth is very small.
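To make that concrete, a quick check of the MSE for this example:
import numpy as np

preds = np.array([0.9999999, 2.0000001, 3.000001])
truth = np.array([1.0, 2.0, 3.0])
print(np.mean((preds - truth) ** 2))   # ~3e-13: essentially zero error, even though "accuracy" is 0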