I am trying to use GloVe embeddings to train a rnn model based on this article.
I have a labeled data: text(tweets) on one column, labels on another (hate, offensive or neither).
However the model seems to predict only one class in the result.
This is the LSTM model:
model = Sequential()
hidden_layer = 3
gru_node = 32
# model embedding matrix here....
for i in range(0,hidden_layer):
model.add(GRU(gru_node,return_sequences=True, recurrent_dropout=0.2))
model.add(Dropout(dropout))
model.add(GRU(gru_node, recurrent_dropout=0.2))
model.add(Dropout(dropout))
model.add(Dense(64, activation='softmax'))
model.add(Dense(nclasses, activation='softmax'))
start=time.time()
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
fitting the model:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
X_train_Glove,X_test_Glove, word_index, embeddings_index = loadData_Tokenizer(X_train, X_test)
model_RNN = Build_Model_RNN_Text(word_index,embeddings_index, 20)
model_RNN.fit(X_train_Glove,y_train,
validation_data=(X_test_Glove, y_test),
epochs=4,
batch_size=128,
verbose=2)
y_preds = model_RNN.predict_classes(X_test_Glove)
print(metrics.classification_report(y_test, y_preds))
Results:
classification report
Confusion matrix
Am I missing something here?
Update:
this is what the distribution looks like
and the model summary, more or less
How the distribution of your data looks like? The first suggestion is to stratify train/test split (here is the link for the documentation).
The second question is how much data do you have in comparison with the complexity of the model? Maybe, your model is so complex, that just do overfitting. You can use the command model.summary() to see the number of trainable parameters.
Related
I need to create a model to predict multiple labels based on sixteen (16) input features. The dataset has 4486 instances, each instance has a different number of labels (48 different labels).
This is how the data looks:
X Data example
Y Data example
The challenge is to predict labels on a new instance; I know the learning is the problem because the imbalance in the number of labels, this make the learning a bit difficult.
I will appreciate commets and advice regarding how to tackle this issue.
My best result is 30% in accuracy, but I've noticed it predicts the same labels sometimes and not given any satissfactory results so far...
This is the model I've implemented:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
n_inputs, n_outputs = X_train.shape[1], y_train.shape[1]
nodes = math.sqrt(n_inputs*n_outputs)
model = Sequential()
model.add(Dense(nodes, activation='relu'))
model.add(Flatten())
model.add(Dense(n_outputs, activation = 'sigmoid'))
model.compile(optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = [tf.keras.metrics.BinaryAccuracy(),tf.keras.metrics.AUC(), 'accuracy'])
history = model.fit(X_train, y_train, epochs=300, verbose=1, shuffle=True, validation_data=
(X_test, y_test), batch_size=8)
I'm trying to predict big 5 personality traits (Extraversion, Neuroticism, Agreeableness, Conscientiousness, Openness) based on text analysis written by user.
Here is my preprocessed data set:
And here is my word2vec model:
model_wv = Word2Vec(df_processed['TEXT'], sg=1, size=300, window=10, min_count=1)
Vocabulary consists of 26126 words.
features = ['cEXT', 'cNEU', 'cAGR', 'cCON', 'cOPN']
X = df_processed['TEXT']
y = df_processed[features]
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)
model = Sequential()
model.add(Embedding(max_features, 100))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(5, activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size,epochs=epochs, validation_data=(X_test, y_test))
My question is, how can I use word2vec vectors in tensorflow model?
Should I replace every word in every row in "X" with the vector from word2vec model? I think it will be quite expensive for calculation but what will be another possibility?
Sorry if my questions sounds dummy, I' m just really new in NLP and tensorflow.
For a school project, I'm trying to predict data using the keras framework, but it's returning 'nan' loss and values when I try to get predicted data.
Source code :
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)
# create model
model = Sequential()
model.add(Dense(950, input_shape=(425,), activation='relu'))
model.add(Dense(425, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer='sgd')
# Fit the model
model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=1)
#evaluate the model
y_pred = model.predict(X_test)
score = model.evaluate(X_test, y_test,verbose=1)
print(score)
# calculate predictions
predictions = model.predict(X_pred)
Data :
X_train and X_test are (panda)dataframes of 5000 rows(nber of samples) * 425 columns (number of dimensions).
y_train and y_test look like :
array([ 1.17899644, 1.46080518, 0.9662137 , ..., 2.40157461,
0.53870386, 1.3192718 ])
Can you help me with that ?
Thank you for you help!
Usually, this means that something converges to infinity. As #desertnaut pointed out in the comment, reducing the learning rate might help.
But the root of the issue is your input data. What do these 425 data points mean? Are they from different sources, different features, different parameters? Finding outliners or normalizing the data, could help.
Your code looks fine otherwise.
Make sure your target output is in range (0, 1) as you have sigmoid in the last layer.
sigmoid has an output between zero and one so if the target output is not in this range then (a) change the activation function or (b) normalize outputs in the required range.
Make sure the purpose of this model is the regression.
After considering the above three points, play around with learning rate (decrease) and the optimiser (replace with any other).
Try changing your optimizer to 'Adam' instead of SGD
You initialized your SGD optimizer in variable sgd but you're not using it in compile
I am new to machine learning.
I have a continuous dataset. I am trying to model the target label using several features. I utilize the train_test_split function to separate the train and the test data. I am training and testing the model using the code below:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = Sequential()
model.add(Dense(128, input_dim=X.shape[1], kernel_initializer = 'normal', activation='relu'))
model.add(Dense(1, kernel_initializer = 'normal'))
hist = model.fit(X_train.values, y_train.values, validation_data=(X_test.values,y_test.values), epochs=200, batch_size=64, verbose=1)
I can get good results when I use X_test and y_test for validation data:
https://drive.google.com/open?id=0B-9aw4q1sDcgNWt5TDhBNVZjWmc
However, when I use this model to predict another data (X_real, y_real) (which are not so different from the X_test and y_test except that they are not randomly chosen by train_test_split) I get bad results:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = Sequential()
model.add(Dense(128, input_dim=X.shape[1], kernel_initializer = 'normal', activation='relu'))
model.add(Dense(1, kernel_initializer = 'normal'))
hist = model.fit(X_train.values, y_train.values, validation_data=(X_real.values,y_real.values), epochs=200, batch_size=64, verbose=1)
https://drive.google.com/open?id=0B-9aw4q1sDcgYWFZRU9EYzVKRFk
Is it an issue of overfitting? If it is so, why does my model work ok with the X_test and y_test generated by train_test_split?
Seems that your "real data" differs from your train and test data.
Why do you have "real" and "training" data in the first place?
My approach would be:
1: Mix up all Data you have
2: Devide your Data randomly in 3 sets (train, test and validate)
3: use train and test like you do it now and optimize your classifier
4: When it's good enough validate the classifier with your validation set to make sure no overfitting occurs.
If you have less data then I would suggest you to try a different algorithm. Neural networks generally need a lot of data to get the weights right.
Also, your real data doesn't seem to belong to the same distribution as the train and test data. Don't keep anything hidden, shuffle everything and use Train/Validation/Test splits.
I have just started using Keras and was trying to train a model using Keras deep learning kit. Works till the epochs are runned but crashes just after it.
np.random.seed(1778) # for reproducibility
need_normalise=True
need_validataion=True
nb_epoch=2#8
#Creating model
model = Sequential()
model.add(Dense(512, input_shape=(dims,)))
model.add(PReLU())
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
opt=Adadelta(lr=1,decay=0.995,epsilon=1e-5)
model.compile(loss='binary_crossentropy', optimizer=opt)
auc_scores=[]
best_score=-1
best_model=None
print('Training model...')
if need_validataion:
for i in range(nb_epoch):
#early_stopping=EarlyStopping(monitor='val_loss', patience=0, verbose=1)
#model.fit(X_train, y_train, nb_epoch=nb_epoch,batch_size=256,validation_split=0.01,callbacks=[early_stopping])
model.fit(X_train, y_train, nb_epoch=2,batch_size=256,validation_split=0.15)
y_pre = model.predict_proba(X_valid)
scores = roc_auc_score(y_valid,y_pre)
auc_scores.append(scores)
print (i,scores)
if scores>best_score:
best_score=scores
best_model=model
plt.plot(auc_scores)
plt.show()
else:
model.fit(X_train, y_train, nb_epoch=nb_epoch, batch_size=256)
y_pre = model.predict_proba(X_test)[:,1]
print roc_auc_score(y_test,y_pre)
Error Recieved:
I have pasted it over here. Please have a look at it.
http://pastebin.com/dSw9ckkk
It looks like you have two classes, a positive class and a negative class, so that the positive class labels are 1 minus the negative class labels. In that case, you can discard the negative class labels and make it a single-class problem:
model.add(Dense(1), activation='sigmoid') # instead of Dense(nb_classes) and Activation('softmax')
Alternatively, you can still train the model on both classes and just use the positive class in the AUC calculation:
roc_auc_score(y_test[:, 1],y_pre[:, 1])