Hi, I am seeing a strange problem. I run a model and expected fit to use 6783 samples, but the progress bar shows 212 instead.
The model is below. First, as you can see, X_train has these dimensions:
timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = 6
100
6
6783
The model goes like this:
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))
model.compile(optimizer=Adam(learning_rate = 0.0025), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
Then I fit the model:
model.fit(X_train, y_train, batch_size=32, epochs=50)
and I get:
212/212 [==============================] - 21s 99ms/step - loss: 1.1047 - accuracy: 0.5210
Any ideas? Thank you.
Update:
I changed the batch size to 1 and now it shows all the samples. But is this acceptable behaviour? I use Keras 2.2.4 and TensorFlow 2.2.0.
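For what it's worth, 212 is most likely the number of batches per epoch rather than the number of samples, since the progress bar in recent tf.keras versions counts steps. A minimal check of that arithmetic, using only the numbers quoted above (the helper name steps_per_epoch is made up for illustration):

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Batches needed to cover n_samples once; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(6783, 32))  # 212
print(steps_per_epoch(6783, 1))   # 6783
```

If that reading is right, all 6783 samples are still used with batch_size=32; only the unit shown in the progress bar differs.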
I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error. My neural network model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as optimizer, instead of "adam".
Changing the y label with another label
There are 9 features (columns) in X_train and 8 different classes in y_train.
X_train:
y_train:
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Fitting:
I tried the following variants, changing the target label each time. None of them helps the model train. Some give a "nan" loss, some go slightly up and down, but all of them stay below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
My dataset (a network traffic dataset on which we do binary classification):
The number of features is 25 and I have normalized the dataset.
My model:
verbose=1
epoch_number=1000
batch_size = 32
n_outputs = 1
model = Sequential()
model.add(Conv1D(filters=200, kernel_size=4, strides=3,activation='relu', input_shape=(25,1)))
model.add(Dropout(0.05))
model.add(BatchNormalization())
model.add(Conv1D(filters=200, kernel_size=5, strides=1,activation='relu', input_shape=(25,1)))
model.add(Dropout(0.05))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.05))
model.add(Flatten())
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(n_outputs, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['acc',f1_m,precision_m, recall_m])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
# fit network
model.fit(X_train, y_train,validation_data=(X_test, y_test),epochs=epoch_number, batch_size=batch_size, verbose=1,callbacks=[es])
# evaluate model
loss, accuracy, f1_score, precision, recall = model.evaluate(X_test, y_test, batch_size=batch_size, verbose=0)
print(loss,accuracy,f1_score,precision,recall)
My model stops after one epoch when I add the Keras EarlyStopping callback, even though the loss decreases after every epoch when I remove it.
If you had posted the training logs from a run without early stopping, this would have been easier to diagnose.
Now let's look at the possibilities. You have set up EarlyStopping as shown below.
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
That means your early stopping callback is equivalent to the one below, with every other parameter left at its default.
tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0,
    patience=0,
    verbose=1,
    mode="min",
    baseline=None,
    restore_best_weights=False,
)
So here patience=0, mode='min', min_delta=0 and monitor='val_loss'.
This simply means that training stops as soon as the validation loss fails to decrease from one epoch to the next.
In other words, if the validation loss is the same as or greater than in the previous epoch, training stops.
I would recommend increasing the patience parameter so that a single epoch without improvement does not end training.
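To make the effect of patience concrete, here is a small pure-Python sketch of the stopping rule. It is a simplified model of the idea, not Keras's actual implementation, and the function name and the validation losses are made up for illustration:

```python
def stopping_epoch(val_losses, patience=0, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: count consecutive epochs without an improvement of
    more than min_delta, and stop once that count reaches patience.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: noisy, but improving overall.
losses = [0.50, 0.52, 0.45, 0.44, 0.46, 0.47, 0.48]

print(stopping_epoch(losses, patience=0))  # 2
print(stopping_epoch(losses, patience=3))  # 7
```

With patience=0 the single bad epoch 2 ends training before the later improvement is ever seen, which is consistent with the behaviour described in the question.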
I am new to machine learning. I am having trouble getting my data into my network.
This is the error that I am receiving:
ValueError: Error when checking input: expected cu_dnnlstm_22_input to have 3 dimensions, but got array with shape (2101, 17)
I have tried adding model.add(Flatten()) before the dense layer. I would really appreciate your help!
BATCH_SIZE = 64
test_size_length = int(len(main_df)*TESTING_SIZE)
training_df = main_df[:test_size_length]
validation_df = main_df[test_size_length:]
train_x, train_y = training_df.drop('target',1).to_numpy(), training_df['target'].tolist()
validation_x, validation_y = validation_df.drop('target',1).to_numpy(), validation_df['target'].tolist()
#train_x.shape is (2101, 17)
model = Sequential()
# model.add(Flatten())
model.add(CuDNNLSTM(128, input_shape=(train_x.shape), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(CuDNNLSTM(128, return_sequences=True))
model.add(Dropout(0.1))
model.add(BatchNormalization())
model.add(CuDNNLSTM(128))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(2, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
# Compile model
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
)
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))
filepath = "RNN_Final-{epoch:02d}-{val_acc:.3f}" # unique file name that will include the epoch and the validation acc for that epoch
checkpoint = ModelCheckpoint("models/{}.model".format(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')) # saves only the best ones
# Train model
history = model.fit(
    train_x, train_y,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(validation_x, validation_y),
    callbacks=[tensorboard, checkpoint],
)
# Score model
score = model.evaluate(validation_x, validation_y, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
# Save model
model.save("models/{}".format(NAME))
The input to your LSTM layer (CuDNNLSTM) should have shape: (batch_size, timesteps, input_dim).
It looks like you are missing one of these dimensions.
It is easy to overlook the last dimension when the input dimension is 1. If that is the case with your model (that is, if you are predicting from a sequence of single numbers), then you might consider expanding the dimensions before the CuDNNLSTM layer with something like this:
model.add(Lambda(lambda t: tf.expand_dims(t, axis=-1)))
model.add(CuDNNLSTM(128))
Without knowing the problem you are working on, it is hard to say whether this is a valid way forward, but you should certainly keep in mind the input shape an LSTM layer requires and reshape or expand dims accordingly.
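As a rough illustration of the shapes involved, the sketch below turns a 2D array of the size quoted in the question into the two 3D layouts an LSTM could accept. Which one is right depends on what the 17 columns mean, so both options are assumptions, and the zero-filled array is only a stand-in for the real data:

```python
import numpy as np

# Stand-in for train_x from the question: 2101 rows, 17 columns.
train_x = np.zeros((2101, 17), dtype="float32")

# Option A (assumption): each row is a sequence of 17 scalar readings.
as_sequence = np.expand_dims(train_x, axis=-1)
print(as_sequence.shape)  # (2101, 17, 1)

# Option B (assumption): each row is a single timestep with 17 features.
as_single_step = train_x.reshape(len(train_x), 1, train_x.shape[1])
print(as_single_step.shape)  # (2101, 1, 17)
```

In either case input_shape should describe one sample without the batch dimension, so (17, 1) or (1, 17) here rather than train_x.shape.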
I just got started with machine learning and I tried to write a simple program where the nn will learn the simple function y = f(x) = 2x.
Here's the code:
# x is a 1D array of 1 to 999
x = np.arange(1,1000, 1)
y = x*2
xtrain = x[:750]
ytrain = y[:750]
xtest = x[750:]
ytest = y[750:]
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['accuracy'])
model.summary()
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
I get the following output, no matter how I change the architecture or the hyperparameters:
79999/79999 [==============================] - 1s 13us/step - loss: 8533120007.8465 - acc: 0.0000e+00 - val_loss: 32532613324.8000 - val_acc: 0.0000e+00
The accuracy is 0 all the time. What am I doing wrong?
This is actually what you should expect if you run gradient descent blindly and expect it to learn any function. The behaviour you observe has two causes:
The derivative that SGD uses to update the weights depends on the input. Take a very simple case, y = f(wx + b): by the chain rule, the derivative of y with respect to w is f'(wx + b)*x. So an update computed from a very large, unnormalised input blows up. Since the update is basically w' = w - alpha*gradient, the weight suddenly drops far below zero.
After a single gradient update the output became negative because SGD overshot. Since the final layer also uses relu, it then outputs 0 and training stalls, because the derivative of relu is 0 whenever its input is negative.
You can reduce the data to np.arange(1, 10) and reduce the number of hidden neurons to, say, 12 (more neurons make the output even more negative after a single update, as all their weights become negative as well), and you will be able to train the network.
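The overshoot described above can be reproduced by hand on a single relu neuron. This is only a toy sketch: the one-weight model, the starting weight of 3.0 and the learning rate are assumptions chosen for illustration, not the network from the question:

```python
def relu_neuron_sgd(x, y, w, lr, steps):
    """Run SGD on one neuron pred = relu(w * x) with squared-error loss.

    Returns the weight after every step, starting with the initial weight.
    """
    history = [w]
    for _ in range(steps):
        z = w * x
        pred = max(z, 0.0)                        # relu output
        relu_grad = 1.0 if z > 0 else 0.0         # derivative of relu at z
        grad = 2.0 * (pred - y) * relu_grad * x   # d(loss)/dw via the chain rule
        w = w - lr * grad
        history.append(w)
    return history

# Unnormalised input: one sample x = 750 with target y = 2x.
raw = relu_neuron_sgd(x=750.0, y=1500.0, w=3.0, lr=0.01, steps=3)
print(raw)

# Same sample scaled down by 1000: updates stay small and w approaches 2.
scaled = relu_neuron_sgd(x=0.75, y=1.5, w=3.0, lr=0.01, steps=1000)
print(scaled[-1])
```

In the unnormalised run the first update throws the weight far below zero, after which relu outputs 0, the gradient is 0 and the weight never moves again. The scaled run converges towards w = 2.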
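I think it works; check this out. I used randn instead of arange. Other things are pretty much the same.
import numpy as np
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
x = np.random.randn(1000)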
y = x*2
xtrain = x[0:750]
ytrain = y[0:750]
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
sgd = optimizers.SGD(lr=0.01, decay=1e-6)
model.compile(loss='mean_squared_error',
              optimizer=sgd,
              metrics=['mae'])
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
If you want to use the earlier dataset (i.e. arange), here is the accompanying code for that.
x = np.arange(1,1000, 1)
y = x*2
xtrain = x[0:750]
ytrain = y[0:750]
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
# Adam with a small learning rate copes better with the unnormalised inputs
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='mean_squared_error',
              optimizer=adam,
              metrics=['mae'])
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=200,
                    verbose=1,
                    validation_split=0.2)
I am training a simple neural network in Keras with the Theano backend, consisting of 4 dense layers connected to a Merge layer and then to a softmax classifier layer. Using Adam for training, the first few epochs take about 60s each (on the CPU), but after that the training time per epoch starts increasing, reaching more than 400s by epoch 70, which makes it unusable.
Is there anything wrong with my code, or is this supposed to happen?
This only happens when using Adam, not with sgd, adadelta, rmsprop or adagrad. I'd use any of the other methods but Adam produces far better results.
The code:
modela = Sequential()
modela.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelb = Sequential()
modelb.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelc = Sequential()
modelc.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modeld = Sequential()
modeld.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
model = Sequential()
model.add(Merge([modela, modelb, modelc, modeld], mode='concat', concat_axis=1))
model.add(Dense(258, init='uniform', activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
hist = model.fit([Xa, Xb, Xc, Xd], Ycat, validation_split=.25, nb_epoch=80, batch_size=100, verbose=2)