I have a time-series data set with 1 feature that represents multiple games. The goal is to classify each game as a win or loss - binary classification. Each game has 61 rows, and the feature has been scaled to be between 0 and 1:
x_train = array([[0.55340617],
[0.54956823],
[0.54588505],
...,
[0.87483364],
[0.8956947 ],
[0.90343248]])
y_train = array([0, 0, 0, ..., 0, 0, 0])
The problem should be quite easy, and I was expecting around 70% accuracy based on other models.
I'm trying to train an LSTM with the data. I think I should be resetting the state on every game, so the batch shape is defined by 61 timesteps and 1 feature:
timesteps = 61
n_features = 1
# Reshape data for LSTM
x_train = x_train.reshape(len(x_train)//timesteps, timesteps, n_features)
# Get class of each game
y_train = y_train[0: len(y_train): timesteps]
model = Sequential()
# Hidden layer
n_neurons = 8
model.add(LSTM(n_neurons,
               input_shape=(timesteps, n_features),
               stateful=False))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.fit(x_train,
          y_train,
          epochs=3,
          batch_size=1)
But when I train the model, the accuracy remains constant:
Epoch 1/3
301/301 [==============================] - 7s 23ms/step - loss: 8.2524 - accuracy: 0.4618
Epoch 2/3
301/301 [==============================] - 6s 21ms/step - loss: 8.2524 - accuracy: 0.4618
Epoch 3/3
301/301 [==============================] - 6s 21ms/step - loss: 8.2524 - accuracy: 0.4618
I have tried switching the optimiser to 'RMSprop', but I get the exact same result. Could the problem lie with the batch shape?
Any help would be greatly appreciated!
EDIT: Fixed some typos in the code. Sorry!
I am using model.fit() several times; each call is responsible for training a block of layers while the other layers are frozen.
CODE
# create the base pre-trained model
base_model = efn.EfficientNetB0(input_tensor=input_tensor,weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add a fully-connected layer
x = Dense(x.shape[1], activation='relu',name='first_dense')(x)
x=Dropout(0.5)(x)
x = Dense(x.shape[1], activation='relu',name='output')(x)
x=Dropout(0.5)(x)
no_classes=10
predictions = Dense(no_classes, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
for layer in base_model.layers:
    layer.trainable = False
#FIRST COMPILE
model.compile(optimizer='Adam', loss=loss_function,
              metrics=['accuracy'])
#FIRST FIT
model.fit(features[train], labels[train],
          batch_size=batch_size,
          epochs=top_epoch,
          verbose=verbosity,
          validation_split=validation_split)
# Generate generalization metrics
scores = model.evaluate(features[test], labels[test], verbose=1)
print(scores)
#Let all layers be trainable
for layer in model.layers:
    layer.trainable = True
from tensorflow.keras.optimizers import SGD
#SECOND COMPILE
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss=loss_function,
              metrics=['accuracy'])
#SECOND FIT
model.fit(features[train], labels[train],
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)
What is weird is that in the second fit, the accuracy of the first epoch is much lower than the accuracy of the last epoch of the first fit.
RESULT
Epoch 40/40
6286/6286 [==============================] - 14s 2ms/sample - loss: 0.2370 - accuracy: 0.9211 - val_loss: 1.3579 - val_accuracy: 0.6762
874/874 [==============================] - 2s 2ms/sample - loss: 0.4122 - accuracy: 0.8764
Train on 6286 samples, validate on 1572 samples
Epoch 1/40
6286/6286 [==============================] - 60s 9ms/sample - loss: 5.9343 - accuracy: 0.5655 - val_loss: 2.4981 - val_accuracy: 0.5115
I think the weights at the start of the second fit are not being taken from the end of the first fit.
Thanks in advance!!!
I think this is the result of using a different optimizer. You used Adam in the first fit and SGD in the second fit. Try using Adam in the second fit as well and see if it works correctly.
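A minimal sketch of that change, reusing the variable names from the question (the only difference is keeping Adam for the second compile instead of switching to SGD):
# Unfreeze everything, then recompile with the same optimizer (Adam)
for layer in model.layers:
    layer.trainable = True

model.compile(optimizer='Adam', loss=loss_function, metrics=['accuracy'])

#SECOND FIT
model.fit(features[train], labels[train],
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)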
I solved this by removing the second compile call.
I made a Keras NN model for fake news detection. My features are the avg length of the words, the avg length of the sentences, the number of punctuation signs, the number of capitalized words, the number of questions, etc. I have 34 features. I have one output, 0 or 1 (0 for fake and 1 for real news).
I have used 50000 samples for training, 10000 for testing and 2000 for validation. The values in my data range from -1 to 10, so there is not a big difference between values. I have used StandardScaler like this:
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.20, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
validation_features = scaler.transform(validation_features)
My NN:
model = Sequential()
model.add(Dense(34, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=0, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=64, validation_data=(validation_features, validation_results), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
Results:
Train on 50407 samples, validate on 2000 samples
Epoch 1/15
- 3s - loss: 0.3293 - acc: 0.8587 - val_loss: 0.2826 - val_acc: 0.8725
Epoch 2/15
- 1s - loss: 0.2647 - acc: 0.8807 - val_loss: 0.2629 - val_acc: 0.8745
Epoch 3/15
- 1s - loss: 0.2459 - acc: 0.8885 - val_loss: 0.2602 - val_acc: 0.8825
Epoch 4/15
- 1s - loss: 0.2375 - acc: 0.8930 - val_loss: 0.2524 - val_acc: 0.8870
Epoch 5/15
- 1s - loss: 0.2291 - acc: 0.8960 - val_loss: 0.2423 - val_acc: 0.8905
Epoch 6/15
- 1s - loss: 0.2229 - acc: 0.8976 - val_loss: 0.2495 - val_acc: 0.8870
12602/12602 [==============================] - 0s 21us/step
loss 23.95 acc 88.81
Accuracy check:
prediction = model.predict(validation_features , batch_size=64)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", acc_score) # 0.887
Saving the model:
model.save("new keras fake news acc 88.7.h5")
scaler_filename = "keras nn scaler.save"
joblib.dump(scaler, scaler_filename)
I have saved that model and that scaler.
When I load that model and that scaler and then make predictions, I get an accuracy of 52%, which is very low given that I had an accuracy of 88.7% when I was training the model.
I applied .transform on my new data for testing.
validation_df = pd.read_csv("validation.csv")
validation_features = validation_df.iloc[:,:-1]
validation_results = validation_df.iloc[:,-1].tolist()
scaler = joblib.load("keras nn scaler.save")
validation_features = scaler.transform(validation_features)
my_model_1 = load_model("new keras fake news acc 88.7.h5")
prediction = my_model_1.predict(validation_features , batch_size=64)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn - much lower
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", round(acc_score,2)) # 0.52
Can you tell me what I am doing wrong? I have read a lot about this on GitHub and Stack Overflow, but I couldn't find the answer.
It is difficult to answer that without your actual data. But there is a smoking gun, raising suspicions that your validation data might be (very) different from your training & test ones; and it comes from your previous question on this:
If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %).
Although using fit_transform on the validation data is indeed wrong methodology (the correct one being what you do here), in practice, it should not lead to such a high discrepancy in the accuracy.
In other words, I have actually seen many cases where people erroneously apply such fit_transform approaches on their validation/deployment data without ever realizing the mistake, simply because they don't get any performance discrepancy - hence they are not alerted. And such a situation is expected if, indeed, all these data are qualitatively similar.
But discrepancies such as yours here lead to strong suspicions that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded upon the (often implicit) assumption that our data (training, validation, test, real-world deployment ones etc) do not change qualitatively, and they all come from the same statistical distribution.
So, the next step here is to perform an exploratory analysis of both your training & validation data to investigate this (actually, this is always assumed to be step #0 in any predictive task). I guess that even elementary measures (mean & max/min values etc.) will show if there are strong differences between the two, as I suspect.
In particular, scikit-learn's StandardScaler uses
z = (x - u) / s
for the transformation, where u is the mean value and s the standard deviation of the data. If these values are significantly different between your training and validation sets, the performance discrepancy is not unexpected.
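For instance, one quick way to run that check (a sketch only; it assumes validation_features here is still the raw, unscaled dataframe loaded from validation.csv) is to compare the saved scaler's training statistics with the validation set's own statistics:
import numpy as np
import joblib

scaler = joblib.load("keras nn scaler.save")

# The fitted scaler attributes hold the training-set u and s used in z = (x - u) / s
print("train mean (u): ", np.round(scaler.mean_, 2))
print("train std  (s): ", np.round(scaler.scale_, 2))
print("validation mean:", np.round(np.mean(validation_features, axis=0), 2))
print("validation std: ", np.round(np.std(validation_features, axis=0), 2))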
I'm currently working on the CIFAR-10 Dataset which is an image classification problem with 10 classes.
I have started to develop with Tensorflow 2 a Linear Classification without the LinearClassifier Object.
X shape corresponds to 10 000 images of 32*32 RGB pixels = (10000, 3072)
Y_one_hot is a one hot vector = (10000, 10)
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary
Summary displays:
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1)                 3073
dense_1 (Dense)              (None, 10)                20
=================================================================
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always predicting the same value?
Thanks,
One remark:
The loss you used, loss="mean_squared_error", is not meant for classification; it is meant for regression. These are two very different problems. Try a cross-entropy loss instead. For example
`model.compile(optimizer=AdamOpt,
loss='categorical_crossentropy', metrics=['accuracy'])`
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. It is a notebook I used for a training session I gave. The network is a CNN, but you can replace it with yours.
Try that...
Best of luck, Umberto
My code is very simple, since my understanding is that an MLP can approximate any function:
def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(5, activation='tanh', input_shape=(4, ), name='a'),
        tf.keras.layers.Dense(5, activation='tanh'),
        tf.keras.layers.Dense(2, activation='sigmoid', name='b')])
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
    model.compile(loss='mse', optimizer=optimizer)
    return model

def train_benchmark_NN(x, y, epochs=10000):
    model = build_model()
    es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20, verbose=0)
    history = model.fit(x, y, batch_size=1000, epochs=epochs, validation_split=0.2, verbose=1, callbacks=[es])
    return model, history
I tried different numbers of layers (like [256, 128, 64, 32]), nodes, optimizers, initializers, and activation functions. I also tried handling the two outputs separately instead of training one model for both of them together, but the results were bad too. Actually, I don't have a good sense of how heavy my model should be for data like this. I tried training the model on known functions with the same number of inputs and outputs, and it always struggles when the function is something like
y1 = math.cos(x1) + math.cos(x2) + math.cos(x3) + math.cos(x4).
Can anyone tell me whether I should try a much heavier model, whether I did something wrong in my code, or whether I have to preprocess the data differently? I only normalized it with a z-score. The data size is ~6000 samples in total.
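For reference, a rough sketch of the kind of sanity check described above (the sampling range and the second target are made up for illustration; it reuses train_benchmark_NN from the code above):
import numpy as np

# Known function with the same shape as the real problem: 4 inputs, 2 outputs
x = np.random.uniform(-3, 3, size=(6000, 4))
y1 = np.cos(x).sum(axis=1)    # y1 = cos(x1) + cos(x2) + cos(x3) + cos(x4)
y2 = np.sin(x).sum(axis=1)    # second synthetic target, only to match the output shape
y = np.stack([y1, y2], axis=1)

# z-score normalization, as described above
x = (x - x.mean(axis=0)) / x.std(axis=0)
y = (y - y.mean(axis=0)) / y.std(axis=0)

model, history = train_benchmark_NN(x, y, epochs=1000)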
Current results:
Epoch 62/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2711 - val_loss: 3.9427
Epoch 63/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2686 - val_loss: 3.9444
Epoch 64/10000
4936/4936 [==============================] - 0s 3us/sample - loss: 0.2661 - val_loss: 3.9457
If I change validation_split from 0.2 to 0.01, the result becomes very different:
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3729 - val_loss: 0.0589
Epoch 96/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3683 - val_loss: 0.0356
Epoch 97/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3702 - val_loss: 0.0381
i: 0 , err_mean: 2.383471436639142
Although the val_loss became much smaller, that is probably because the validation set isn't big enough; when I plot the errors, they still look the same.
Some visualization of the relationships in my data:
The inputs are x1 - car speed, x2 - engine torque, x3 - DOC temperature, x4 - DPF temperature.
The outputs are y1 - tailpipe CO gas, y2 - tailpipe HC gas.
y1 plotted against x1, x2, x3 and x4 (plots not reproduced here).
Should this function be easy to approximate at all? Thanks!
I plotted the errors against the targets, and it seems the model didn't learn at all, because the errors are strongly correlated with the targets.
Training and validation are healthy for 2 epochs, but after 2-3 epochs the val_loss keeps increasing while the val_acc also keeps increasing.
I'm trying to train a CNN model to classify a given review into a single class from 1-5, so I treated it as a multi-class classification problem.
I've divided the dataset into 3 sets - 70% training, 20% testing and 10% validation.
The distribution of the training data across the 5 classes is as follows.
1 - 31613, 2 - 32527, 3 - 61044, 4 - 140005, 5 - 173023.
Therefore I've added class weights as follows.
{1: 5.47, 2: 5.32, 3: 2.83, 4: 1.26, 5: 1}
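These weights appear to be the largest class count divided by each class count; a small sketch of that calculation (my reading of the numbers above, not necessarily how they were produced):
counts = {1: 31613, 2: 32527, 3: 61044, 4: 140005, 5: 173023}
max_count = max(counts.values())
class_weights = {c: round(max_count / n, 2) for c, n in counts.items()}
# -> {1: 5.47, 2: 5.32, 3: 2.83, 4: 1.24, 5: 1.0}, close to the weights listed above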
Model structure is as below.
input_layer = Input(shape=(max_length, ), dtype='int32')
embedding = Embedding(vocab_size, 200, input_length=max_length)(input_layer)
channel1 = Conv1D(filters=100, kernel_size=2, padding='valid', activation='relu', strides=1)(embedding)
channel1 = GlobalMaxPooling1D()(channel1)
channel2 = Conv1D(filters=100, kernel_size=3, padding='valid', activation='relu', strides=1)(embedding)
channel2 = GlobalMaxPooling1D()(channel2)
channel3 = Conv1D(filters=100, kernel_size=4, padding='valid', activation='relu', strides=1)(embedding)
channel3 = GlobalMaxPooling1D()(channel3)
merged = concatenate([channel1, channel2, channel3], axis=1)
merged = Dense(256, activation='relu')(merged)
merged = Dropout(0.6)(merged)
merged = Dense(5)(merged)
output = Activation('softmax')(merged)
model = Model(inputs=[input_layer], outputs=[output])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
model.fit(final_X_train, final_Y_train, epochs=5, batch_size=512, validation_data=(final_X_val, final_Y_val), callbacks=callback, class_weight=class_weights)
1/5 - loss: 1.8733 - categorical_accuracy: 0.5892 - val_loss: 0.7749 - val_categorical_accuracy: 0.6558
2/5 - loss: 1.3908 - categorical_accuracy: 0.6917 - val_loss: 0.7421 - val_categorical_accuracy: 0.6784
3/5 - loss: 0.9587 - categorical_accuracy: 0.7734 - val_loss: 0.7595 - val_categorical_accuracy: 0.6947
4/5 - loss: 0.6402 - categorical_accuracy: 0.8370 - val_loss: 0.7921 - val_categorical_accuracy: 0.7216
5/5 - loss: 0.4520 - categorical_accuracy: 0.8814 - val_loss: 0.8556 - val_categorical_accuracy: 0.7331
Final accuracy = 0.7328754744261703
This seems to be overfitting behavior, but I've tried adding dropout layers, which didn't help. I've also tried increasing the data, which made the results even worse.
I'm totally new to deep learning, if anyone has any suggestions to improve, please let me know.
"val_loss keeps increasing while the val_acc keeps increasing" - this may be because of the loss function: the loss is calculated from the actual predicted probabilities, while the accuracy is calculated from one-hot vectors.
Let's take a 4-class example. For one of the reviews the true class is, say, class 1 (one-hot [0, 1, 0, 0]). The probabilities predicted by the system are [0.25, 0.30, 0.25, 0.20]. According to categorical_accuracy your output is correct, because the argmax picks class 1, but since your probability mass is spread out so much, categorical_crossentropy will give a high loss as well.
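To make that concrete, a small sketch (plain numpy, using the numbers from the example above) of how the two metrics can disagree:
import numpy as np

y_true = np.array([0., 1., 0., 0.])           # one-hot: the true class is index 1
y_pred = np.array([0.25, 0.30, 0.25, 0.20])   # predicted probabilities

# categorical_accuracy: the argmax matches, so the prediction counts as correct
accuracy = float(np.argmax(y_pred) == np.argmax(y_true))   # 1.0

# categorical_crossentropy: -log of the probability assigned to the true class
loss = -np.log(y_pred[1])                                   # -log(0.30) ≈ 1.20
print(accuracy, loss)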
As for the overfitting problem, I am not really sure why introducing more data is causing problems.
Try increasing the strides (a sketch follows below).
Don't make the data more imbalanced by adding data to any particular class.
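For the first suggestion, a rough sketch of what a larger stride would look like in one of the convolutional branches above (strides=2 is just an example value):
# Same branch as in the question, but with a larger stride
channel1 = Conv1D(filters=100, kernel_size=2, padding='valid',
                  activation='relu', strides=2)(embedding)
channel1 = GlobalMaxPooling1D()(channel1)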