I am working on a seq2seq keras/tensorflow 2.0 model. Every time the user inputs something, my model prints the response perfectly fine. However on the last line of each response I get this:
You: WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 2 batches). You may need to use the repeat() function when building your dataset.
The "You:" is my last output, before the user is supposed to type something new in. The model works totally fine, but I guess no error is ever good, but I don't quite get this error. It says "interrupting training", however I am not training anything, this program loads an already trained model. I guess this is why the error is not stopping the program?
In case it helps, my model looks like this:
intent_model = keras.Sequential([
    keras.layers.Dense(8, input_shape=[len(train_x[0])]),        # input layer
    keras.layers.Dense(8),                                       # hidden layer
    keras.layers.Dense(len(train_y[0]), activation="softmax"),   # output layer
])
intent_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
intent_model.fit(train_x, train_y, epochs=epochs)
test_loss, test_acc = intent_model.evaluate(train_x, train_y)
print("Tested Acc:", test_acc)
intent_model.save("models/intent_model.h5")
To make sure that you have "at least steps_per_epoch * epochs batches", set the steps_per_epoch to
steps_per_epoch = len(X_train)//batch_size
validation_steps = len(X_test)//batch_size # if you have validation data
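For example, here is a minimal sketch of how these values could be passed to model.fit(); train_generator and val_generator are hypothetical generators (or keras.utils.Sequence objects) that yield batches of size batch_size:
batch_size = 32  # model.fit() also uses 32 by default if you don't batch the data yourself

steps_per_epoch = len(X_train) // batch_size
validation_steps = len(X_test) // batch_size  # only if you have validation data

history = model.fit(
    train_generator,                    # yields batches of size batch_size
    steps_per_epoch=steps_per_epoch,
    validation_data=val_generator,
    validation_steps=validation_steps,
    epochs=10,
)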
You can see the maximum number of batches that model.fit() can consume from the progress bar at the moment training is interrupted:
5230/10000 [==============>...............] - ETA: 2:05:22 - loss: 0.0570
Here, the maximum would be 5230 - 1 = 5229 batches.
Importantly, keep in mind that by default, batch_size is 32 in model.fit().
If you're using a tf.data.Dataset, you can also add the repeat() method, but be careful: it will loop indefinitely (unless you specify a number).
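For instance, a minimal sketch, assuming dataset is an already batched tf.data.Dataset and model has been compiled:
dataset = dataset.repeat()         # repeats indefinitely; steps_per_epoch now bounds each epoch
# or: dataset = dataset.repeat(3)  # repeats the data exactly 3 times

model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=10)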
I have also had a number of models crash with the same warning while trying to train them. The training dataset is created using tf.keras.preprocessing.image_dataset_from_directory() and split 80/20. I have created a variable to try not to run out of images. I am using ResNet50 with my own images.
TRAIN_STEPS_PER_EPOCH = np.ceil((image_count*0.8/BATCH_SIZE)-1)
# to ensure that there are enough images for the training batch
VAL_STEPS_PER_EPOCH = np.ceil((image_count*0.2/BATCH_SIZE)-1)
but it still runs out. BATCH_SIZE is set to 32, so I am taking 80% of the number of images, dividing by 32, and then subtracting 1 to leave a surplus... or so I thought.
history = model.fit(
    train_ds,
    steps_per_epoch=TRAIN_STEPS_PER_EPOCH,
    epochs=EPOCHS,
    verbose=1,
    validation_data=val_ds,
    validation_steps=VAL_STEPS_PER_EPOCH,
    callbacks=tensorboard_callback)
The error, after 3 hours of processing and a single successful epoch, is:
Epoch 1/25
374/374 [==============================] - 8133s 22s/step - loss: 7.0126 - accuracy: 0.0028 - val_loss: 6.8585 - val_accuracy: 0.0000e+00
Epoch 2/25
1/374 [..............................] - ETA: 0s - loss: 6.0445 - accuracy: 0.0000e+00
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 9350.0 batches). You may need to use the repeat() function when building your dataset.
This might help:
> print(train_ds)
> <BatchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int32)>
>
> print(val_ds)
> <BatchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int32)>
>
> print(TRAIN_STEPS_PER_EPOCH)
> 374.0
>
> print(VAL_STEPS_PER_EPOCH)
> 93.0
The solution that worked for me was to set drop_remainder=True while building the dataset. This drops any leftover partial batch, so the number of available batches always matches the step count.
For example:
dataset = tf.data.Dataset.from_tensor_slices((images, targets)) \
.batch(12, drop_remainder=True)
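As a rough illustration of the effect on a toy dataset (assuming TF 2.3+, where len() works on a dataset with known cardinality):
import tensorflow as tf

ds = tf.data.Dataset.range(100).batch(12, drop_remainder=True)
print(len(ds))  # 8 full batches; the 4 leftover elements are simply dropped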
I had the same problem, and decreasing validation_steps from 50 to 10 solved the issue.
If you create a dataset with image_dataset_from_directory, remove the steps_per_epoch and validation_steps parameters from model.fit.
The reason is that the number of steps is already determined by the batch_size passed into image_dataset_from_directory, and you can get that number of steps with len.
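For example, a minimal sketch (the directory path and image size are placeholders):
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/train",            # hypothetical directory
    image_size=(224, 224),
    batch_size=32,
)

print(len(train_ds))         # number of batches per epoch

# Keras infers the steps from the dataset itself:
model.fit(train_ds, epochs=10)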
I had the same problem in TF 2.1. It had something to do with the shape/type of the input, namely the query. In my case, I solved the problem as follows (my model takes 3 inputs):
x = [[test_X[0][0]], [test_X[1][0]], [test_X[2][0]]]
MODEL.predict(x)
Output:
WARNING:tensorflow:Your input ran out of data; interrupting training.
Make sure that your dataset or generator can generate at least
steps_per_epoch * epochs batches (in this case, 7 batches). You may
need to use the repeat() function when building your dataset.
array([[2.053718]], dtype=float32)
Solution:
x = [np.array([test_X[0][0]]), np.array([test_X[1][0]]), np.array([test_X[2][0]])]
MODEL.predict(x)
Output:
array([[2.053718]], dtype=float32)
I understand that this is completely fine. Firstly, it is a warning, not an error. Secondly, the situation is similar to one batch of data being trained per epoch, with the next epoch training the next batch, while the epochs value has been set too high, e.g. 500 (assuming your data size is not fixed but will be approximately <= 500). Suppose the data size is 480. Then the remaining epochs have no data left to process, hence the warning. As a result, it returns to the most recent state, when the last of the data was trained.
I hope this helps. Do let me know if the concept is misunderstood. Thanks!
Try reducing the steps_per_epoch value below the value you have currently set. This helped me solve the problem.
I am seeing this issue with TF-2.9 and a custom ImageDataGenerator.
The core issue appears to be that TF/Keras is not selecting the correct data adapter:
<python site-packages>/keras/engine/data_adapter.py
select_data_adapter() was selecting a GeneratorDataAdapter when it should be a KerasSequenceAdapter.
I updated the following file to work around the issue:
<python site-packages>/keras_preprocessing/image/iterator.py
try:
    DataSequence = get_keras_submodule('utils').Sequence
except:
    try:
        # Work-around for TF-2.9
        from keras.utils.data_utils import Sequence
        DataSequence = Sequence
    except:
        DataSequence = object
A better solution is to use:
data_amount = 0.5  # change this to control how much of the data is used when the data is large and you train for few epochs
steps_per_epoch = int(data_amount * (len(train_data) / EPOCHS))
# if you have validation data, then:
validation_steps = int(data_amount * (len(val_data) / EPOCHS))
Here 0.5 is a float value which determines how much of the data you want to fit.
This approach is better than the ones based on BATCH_SIZE, but it will always spread the whole dataset over the epochs, so change the data_amount value to adjust that.
I also got this while training a model in Google Colab. The reason was that there was not enough memory/RAM to store the amount of data per batch (if you are using batches), so after I lowered the batch_size it all ran perfectly.
Related
My dataset (a network traffic dataset on which we do binary classification):
The number of features in the data is 25.
This is the Transformer model:
embed_dim = 25  # Embedding size for each token
num_heads = 2   # Number of attention heads
ff_dim = 32     # Hidden layer size in the feed-forward network inside the transformer

inputs = layers.Input(shape=(25, 1))
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(inputs)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
outputs = layers.Dense(1, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
history = model.fit(
    x_train, y_train, batch_size=32, epochs=50, validation_data=(x_test, y_test))
But the accuracy is extremely poor and it isn't changing with epochs:
Epoch 1/50
1421/1421 [==============================] - 9s 6ms/step - loss: 0.5215 - accuracy: 0.1192 - val_loss: 0.4167 - val_accuracy: 0.1173
Overall, one should be able to get to 100% (train) accuracy, as long as the data is not contradictory. Arguably that is the best target to aim for before worrying about generalisation (test error). For your specific case:
- the final activation should be sigmoid (otherwise, with a single softmax output, we have f(x) = exp(x) / exp(x) = 1)
- there is no need for dropout (it will only make training accuracy lower)
- global pooling can remove important information - replace it with a Dense layer for the time being
- normalise your data; your features are in quite wide ranges, which can cause training to struggle to converge
- consider lowering your learning rate, as it will make it easier to overfit to the training data
A small sketch of these changes follows at the end of this answer. If all the above fail, just increase the size of the model, as the "20-25" range of your features might just not be big enough. Neural networks need quite a lot of redundancy to learn properly.
Personally I would also replace the whole model with just an MLP and verify everything works, I am not sure why transformer would be the model of choice here, and it will allow you to verify if the issue is with the model chosen, or with the code.
Finally - make sure that indeed 100% accuracy is obtainable, take your training data and check if there are any 2 datapoints that have exactly the same features, and different labels. If there are none - you should be able to get to 100% accuracy, it is just a matter of getting hyperparamters right.
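For the record, here is a minimal sketch of the suggested changes (an MLP baseline with normalisation, a sigmoid output, no dropout, and a lower learning rate); the layer sizes and learning rate are placeholder assumptions, not values from the question:
from tensorflow import keras
from tensorflow.keras import layers

# assumes x_train has shape (N, 25); if it is (N, 25, 1), reshape/squeeze it first
norm = layers.Normalization()   # requires TF 2.6+; use experimental.preprocessing.Normalization on older versions
norm.adapt(x_train)             # learn feature means/variances from the training data

inputs = layers.Input(shape=(25,))
x = norm(inputs)
x = layers.Dense(64, activation="relu")(x)   # plain MLP instead of the transformer, to verify the pipeline
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # sigmoid, not softmax, for a single output unit

model = keras.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),   # lower learning rate
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(x_train, y_train, batch_size=32, epochs=50,
                    validation_data=(x_test, y_test))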
I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs=epoch, batch_size=batch)
        save batch, epoch, accuracy
Afterwards I keep the smallest number of epochs with the smallest corresponding batch size that gives the highest accuracy, e.g. best_params: epoch = 10, batch = 150 => Accuracy = 94%.
My problem is that when I re-run my model with the best_params, it doesn't give me the same results (loss, accuracy); sometimes the accuracy is even very low (e.g. 10%).
I tried fixing the seed, but that gave no better result.
Regards
Djam75
df = pd.DataFrame(columns=['Nb_Batch', 'Nb_Epoch', 'Accuracy'])
i = 0
lst_loss = []
lst_accuracy = []
lst_epoch = list(np.arange(10, 200, 10))
lst_batch = list(np.arange(100, 400, 20))
for epoch in lst_epoch:
    print('---------------- Epoch ' + str(epoch) + ' ------------------')
    for batch in lst_batch:
        modelSimple.fit(X_train, Y_train, epochs=epoch, batch_size=batch, verbose=0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i, "Nb_Batch"] = batch
        df.loc[i, "Nb_Epoch"] = epoch
        df.loc[i, "Accuracy"] = score[1] * 100
        i = i + 1
This might be happening due to random parameter initialization. If you are building an end-to-end model without transferring learned weights, the architecture gets new random values for its parameters every time you train it.
In this case, a good practice is to use batch normalization layers after some layers, according to your architecture; a small sketch follows this answer.
tensorflow-implementation
pytorch-implementation
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation;
you can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations by using a NumPy-like matrix operation library.
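A minimal sketch of those two suggestions (fixed seeds plus BatchNormalization); the layer sizes are placeholders, and X_train/Y_train are the arrays from the question. Rebuild the model (and reset the seeds) before every (epochs, batch_size) configuration so that the runs are comparable:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# fix the seeds so repeated runs start from the same random initialization
np.random.seed(42)
tf.random.set_seed(42)

modelSimple = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
    keras.layers.BatchNormalization(),   # stabilises activations between layers
    keras.layers.Dense(64, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(Y_train.shape[1], activation="softmax"),  # assumes one-hot labels
])
modelSimple.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])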
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
Thanks for the idea (batch normalization) and the extra idea: do not use any for loops ;-)
Regards
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well so we can see the problem.
I am new to using Keras for my project. I have been working with generators in my model.
I am literally confused about what values I should give for:
1) fit_generator: steps_per_epoch & validation_steps?
2) evaluate_generator: steps?
3) predict_generator: steps?
I have referred to the Keras documentation and a few other questions (stack1, stack2), but I am not able to understand. It is better if I describe the shape of the data I am currently working with, and you can follow my questions accordingly. Also, please correct me if my understanding is wrong.
model.fit_generator(trainGen, steps_per_epoch=25, epochs = 100, validation_data=ValGen, validation_steps= 4)
Q1: For every epoch, there are 25 steps. For each step, trainGen yields a tuple of shape (244*100*4, 244*100*2) and training is performed.
What will my batch_size and batches be if my steps_per_epoch is 25?
Q2: I understood that val_acc and val_loss will be calculated at the end of the 25th step of an epoch. I chose validation_steps = 4, so ValGen yields a tuple of shape (30*100*4, 30*100*2) 4 times at the end of the 25th step of an epoch.
I have chosen validation_steps = 4 arbitrarily, but how do I choose the correct number of validation_steps? And how are val_loss & val_acc calculated (is the mean taken over the 4 yields as single batches, or using batch_size)?
Q3: Say, for example, that in evaluate_generator & predict_generator my generator yields a tuple of shape (30*100*4, 30*100*2) for both.
How do I choose the correct number for the steps argument for both evaluate_generator & predict_generator? In the Keras documentation it is described as the total number of steps (batches of samples) to yield from the generator before stopping. In my case, what will the batches of samples be?
If any additional information is required, let me know.
Steps are not a parameter that you "choose", you can compute it as:
steps = number of samples / batch size
So here the only parameter that you are free to choose is the batch size, which is chosen as a value at which the model does not run out of memory while training. Typical values are between 32 and 64.
For the training set, you take the number of samples in the training set and divide it by the training batch size; for the validation set, you divide the number of samples in the validation set by the validation batch size. Both batch sizes can be equal.
This applies to all functions that use generators.
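As a rough sketch of how that looks in code (trainGen and ValGen are the generators from the question; testGen and the sample counts are assumptions for illustration):
batch_size = 32                                     # pick the largest value that fits in memory

steps_per_epoch  = num_train_samples // batch_size  # for fit_generator
validation_steps = num_val_samples // batch_size
test_steps       = num_test_samples // batch_size   # for evaluate_generator / predict_generator

model.fit_generator(trainGen, steps_per_epoch=steps_per_epoch, epochs=100,
                    validation_data=ValGen, validation_steps=validation_steps)
score = model.evaluate_generator(testGen, steps=test_steps)
preds = model.predict_generator(testGen, steps=test_steps)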
I'm getting acquainted with LSTMs and I need clarity on something. I'm modeling a time series using t-300:t-1 to predict t:t+60. My first approach was to set up an LSTM like this:
# fake dataset to put words into code:
X = [[1,2...299,300],[2,3,...300,301],...]
y = [[301,302...359,360],[302,303...360,361],...]
# LSTM requires (num_samples, timesteps, num_features)
X = X.reshape(X.shape[0],1,X.shape[1])
model = Sequential()
model.add(LSTM(n_neurons[0], batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1]))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=1, verbose=1, shuffle=False)
With my real dataset, the results have been suboptimal, and on CPU it was able to train 1 epoch of around 400,000 samples in 20 minutes. The network converged quickly after a single epoch, and for any set of points I fed it, the same results would come out.
My latest change has been to reshape X in the following way:
X = X.reshape(X.shape[0],X.shape[1],1)
Training seems to be going noticeably slower (I have not tried it on the full dataset): it takes about 5 minutes to train over a single epoch of 2,800 samples. I toyed around with a smaller subset of my real data and a smaller number of epochs, and it seems to be promising; I am not getting the same output for different inputs.
Can anyone help me understand what is happening here?
In Keras, the timesteps dimension in (num_samples, timesteps, num_features) determines how many steps back BPTT (backpropagation through time) will propagate the error.
This, in turn, takes more time to do, hence the slowdown that you are observing.
X.reshape(X.shape[0], X.shape[1], 1) is the right thing to do in your case, since what you have is a single feature with 300 timesteps.
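As a small illustration of the two layouts (shapes only; the toy array below stands in for the poster's windows of 300 past values):
import numpy as np

X = np.arange(10 * 300).reshape(10, 300)               # 10 windows of 300 past values

# original approach: 1 timestep with 300 features -> BPTT unrolls a single step
X_single_step = X.reshape(X.shape[0], 1, X.shape[1])   # shape (10, 1, 300)

# corrected approach: 300 timesteps with 1 feature -> BPTT unrolls 300 steps (slower, but uses the sequence)
X_sequence = X.reshape(X.shape[0], X.shape[1], 1)      # shape (10, 300, 1)

print(X_single_step.shape, X_sequence.shape)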